Enron Email Time-Series Network

<p>We use the <a href="https://www.kaggle.com/wcukierski/enron-email-dataset">Enron email dataset</a> to build a network of email addresses. It contains 614 586 emails sent between 6 January 1998 and 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which is 1448 days of email records in total. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6 600 nodes and 50 897 edges.</p>
<p>To build a graph <em>G = (V, E)</em>, we use email addresses as nodes <em>V</em>. Every node <em>v<sub>i</sub></em> has an attribute: a time-varying signal that gives the number of emails sent from this address on each day. We draw an edge <em>e<sub>ij</sub></em> between two nodes <em>i</em> and <em>j</em> if there is at least one email exchange between the corresponding addresses.</p>
<p>The column <em>'Count'</em> in the <em>'edges.csv'</em> file is the number of 'From'-&gt;'To' email exchanges between the two addresses. This column can be used as an edge weight.</p>
<p>The file <em>'nodes.csv'</em> contains, for each node, a dictionary that is a compressed representation of its time series. The format of the dictionary is <em>Day-&gt;The Number Of Emails Sent By the Address During That Day.</em> The total number of days is 1448.</p>
<p>The file <em>'id-email.csv'</em> contains the actual email addresses.</p>
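<p>Because the dictionaries in <em>'nodes.csv'</em> are a compressed form of the time series, most users will want to expand them into a fixed-length signal of 1448 daily counts. The following is a minimal sketch of that step, not the authors' own loader. It assumes that each dictionary is stored as a Python-style literal such as <em>{0: 3, 17: 1}</em>, that day indices are 0-based, and that days missing from the dictionary mean zero emails; none of these details are confirmed by the description above. The names <em>parse_sparse_series</em> and <em>expand_series</em> are hypothetical helpers introduced for illustration.</p>

```python
import ast

N_DAYS = 1448  # total number of days stated in the dataset description


def parse_sparse_series(cell: str) -> dict[int, int]:
    """Parse a dictionary literal such as "{0: 3, 17: 1}".

    The literal format is an assumption about how nodes.csv stores the data.
    """
    parsed = ast.literal_eval(cell)
    return {int(day): int(count) for day, count in parsed.items()}


def expand_series(sparse: dict[int, int], n_days: int = N_DAYS) -> list[int]:
    """Return a dense list of daily counts; days absent from `sparse` are 0.

    Assumes 0-based day indices, which the description does not confirm.
    """
    dense = [0] * n_days
    for day, count in sparse.items():
        if not 0 <= day < n_days:
            raise ValueError(f"day index {day} outside [0, {n_days})")
        dense[day] = count
    return dense


# Made-up example value, not taken from the dataset.
sparse = parse_sparse_series("{0: 3, 17: 1, 1447: 2}")
dense = expand_series(sparse)
```

<p>If the day indices in the file turn out to be 1-based, the range check raises an error for day 1448 instead of silently shifting the signal, which makes the wrong assumption easy to spot.</p>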