The Enron emails, also known as the Enron Corpus, are one of the largest collections of real email data available to the public. We are talking about millions of files that can take up more than 2.5 GB. These emails were never meant for the general public, but they became accessible after Enron collapsed and went bankrupt in 2001. The collapse was a shocking event, since Enron was one of the biggest companies of its time.
These Emails Are Important for Research
Though Enron collapsed, the released emails are still being used for research. They form a unique collection of real emails that the public can access and study. Normally, legal and privacy restrictions make emails like these hard to obtain.
As a sign of how important these emails are, a computer scientist named Andrew McCallum of the University of Massachusetts Amherst bought a copy of the database for $10,000. He is believed to have later shared the copy with other researchers. That is how researchers in computing and social network analysis got access to this data.
Within the e-discovery industry, the Enron data set is regarded as the industry standard, and it is commonly used by researchers, scientists, and technology providers for a range of purposes. Though Enron filed for bankruptcy protection back in 2001, FERC (the Federal Energy Regulatory Commission) is still undertaking several investigations to find out what part Enron and other energy companies played in the energy crisis that affected the western markets.
A Lot of Investigations Are Still Underway
California was the state hit hardest by the crisis, and Senator Dianne Feinstein and Governor Gray Davis demanded answers about it. Enron was regarded as a major player in the crisis because the company owned and managed EnronOnline, a sizeable online B2B marketplace that dealt mainly in energy futures contracts. This is the platform that hosted the Enron employee emails as well as the company's enterprise data systems.
Analysis of the Enron Emails
Now that we know how massive this data source is, the question of how to analyze it arises. The analysis is a complicated affair, since it is hard to know where to start. Several versions of the Enron data set now exist because the original set contained duplicates. The original files are also believed to have contained many errors and viruses. Many online sites attempt to explain the data set and offer links to different versions of these emails. For instance, you can visit EDRM (the Electronic Discovery Reference Model) to access some versions of the Enron emails for research or testing.
There are also sites that take you through the various stages of analyzing this data set and recommend software to use during the analysis.
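Since duplicates are one of the main reasons different versions of the data set exist, removing them is a natural first stage of analysis. The sketch below is only an illustration using Python's standard library, not the method used to build any published version. The function names `fingerprint` and `dedupe`, the sample messages, and the rule that two emails are duplicates when their sender, date, subject, and whitespace-normalized body match are all assumptions made for this example.

```python
import hashlib
from email import message_from_string
from email.message import Message


def fingerprint(msg: Message) -> str:
    """Hash the sender, date, subject and body into one comparison key.

    Assumption for this sketch: two emails are duplicates when these
    four fields match after light normalization.
    """
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart message: fall back to the full flattened text.
        body = msg.as_string()
    parts = [
        (msg.get("From") or "").strip().lower(),
        (msg.get("Date") or "").strip(),
        (msg.get("Subject") or "").strip(),
        " ".join(body.split()),  # collapse runs of whitespace
    ]
    joined = "\x00".join(parts)
    return hashlib.sha256(joined.encode("utf-8", errors="replace")).hexdigest()


def dedupe(raw_messages):
    """Return parsed messages, keeping the first copy of each distinct email."""
    seen = set()
    unique = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        key = fingerprint(msg)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


# Hypothetical sample: the first two messages differ only in letter case
# and whitespace, so they count as one email under the rule above.
SAMPLE = [
    "From: alice@example.com\nDate: Mon, 1 Jan 2001 09:00:00 -0600\n"
    "Subject: Meeting\n\nSee you at 10.\n",
    "From: Alice@Example.com\nDate: Mon, 1 Jan 2001 09:00:00 -0600\n"
    "Subject: Meeting\n\nSee you  at 10.\n\n",
    "From: bob@example.com\nDate: Mon, 1 Jan 2001 12:00:00 -0600\n"
    "Subject: Lunch\n\nNoon?\n",
]

unique = dedupe(SAMPLE)
print(len(unique), "unique messages out of", len(SAMPLE))
```

To run this against a real copy of the emails, you would read each file's text and pass the strings to `dedupe`. A stricter or looser matching rule may suit a given version of the data set better, so treat the fields chosen here as a starting point.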