What is the Enron E-discovery data set?

The Enron E-discovery data set is now widely regarded as a standard data set in the E-discovery world. Yet to many people, its origin and what it represents remain a mystery. This article tries to shed some light on this data set, popularly known simply as the Enron data set.

According to a paper published in 2004 by two Carnegie Mellon University researchers, the Enron data set is a large collection of emails, commonly known as the Enron corpus, that was made public while the legal investigation into Enron was under way. The raw corpus analyzed in that paper contained around 619,445 messages from around 158 users.

Many accounts of the origin of the Enron data set have since been put forward, and a number of websites have tried to explain where it came from. Some of these sites go a step further and provide links to common versions of the data set. For instance, the Electronic Discovery Reference Model (EDRM) offers several versions of the Enron data set for different types of testing and research.

What is the actual origin of the Enron E-discovery data set?

The Enron E-discovery data set was originally sourced from the FERC Enron Investigation release made available by the Lockheed Martin Corporation. It has been a valuable resource for demonstrating E-discovery software. At first, the data could be downloaded from the EDRM site, but it was later moved to Amazon Web Services (AWS). However, after further discussion about personally identifiable information (PII), such as credit card numbers, home addresses, birth dates and phone numbers, which was present in the FERC release and therefore in the Enron E-discovery data set, the data set was removed from AWS.

It is important to note that the Enron E-discovery data set came about because of the shocking collapse of Enron, which followed massive fraud and other illegal activities by the firm's top management and executives. In the aftermath, several investigative agencies set out to determine what had led to the company's collapse.

Around February 2002, the Federal Energy Regulatory Commission (FERC) began a staff-level investigation into Enron and the Western energy markets. FERC asked Enron to preserve, collect and provide any electronic data in its possession that related to the matters under investigation. During this investigation, a large amount of data was gathered, including emails, records of telephone conversations and radio tapes. The material collected in this case is what ultimately gave rise to what is today commonly referred to as the Enron E-discovery data set.