Enron

Hermans, Felienne

doi:10.6084/m9.figshare.1222882.v2

document.pdf (290.37 kB)

Version 2 2015-01-19, 10:56

Version 1 2014-10-30, 15:03

journal contribution

posted on 2015-01-19, 10:56 authored by Felienne Hermans

Spreadsheets are used extensively in business processes around the world and are, as such, a topic of research interest. Over the past few years, many spreadsheet studies have been performed on the EUSES spreadsheet corpus. While this corpus has served the spreadsheet community well, the spreadsheets it contains were mainly gathered with search engines and as such do not represent spreadsheets used in companies. This paper presents a new dataset, extracted from the Enron Email Archive, containing over 15,000 spreadsheets used within the Enron Corporation. In addition to the spreadsheets, we also present an analysis of the associated emails, where we look into spreadsheet-specific email behavior.

Our analysis shows that 1) 24% of Enron spreadsheets with at least one formula contain an Excel error, 2) there is little diversity in the functions used in spreadsheets: 76% of spreadsheets in the presented corpus use only the same 15 functions, and 3) the spreadsheets are substantially more smelly than those in the EUSES corpus, especially in terms of long calculation chains. Regarding the emails, we observe that 1) spreadsheets are a frequent topic of email conversation, with 10% of emails either sending or referring to spreadsheets, and 2) the emails frequently discuss errors in and updates to spreadsheets.

Keywords

enron Spreadsheets Software Engineering

Licence

CC BY 4.0
