Yahoo Password Frequency Corpus

2015-12-23T21:21:19Z (GMT) by Joseph Bonneau
<div>This dataset includes sanitized password frequency lists collected from Yahoo in</div><div>May 2011. </div><div><br></div><div>For details of the original collection experiment, please see:</div><div><br></div><div>Bonneau, Joseph. "The science of guessing: analyzing an anonymized corpus of 70 </div><div>million passwords." IEEE Symposium on Security & Privacy, 2012.</div><div>http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf</div><div><br></div><div>This data has been modified to preserve differential privacy. For details of</div><div>this modification, please see:</div><div><br></div><div>Jeremiah Blocki, Anupam Datta and Joseph Bonneau. "Differentially Private </div><div>Password Frequency Lists." Network and Distributed System Security Symposium (NDSS), 2016.</div><div>http://www.jbonneau.com/doc/BDB16-NDSS-pw_list_differential_privacy.pdf</div><div><br></div><div>Each of the 51 .txt files represents one subset of all users' passwords observed</div><div>during the experiment period. "yahoo-all.txt" includes all users; every other</div><div>file represents a strict subset of that group.</div><div><br></div><div>Each file is a series of lines of the format:</div><div><br></div><div>FREQUENCY #OBSERVATIONS</div><div>...</div><div><br></div><div>with FREQUENCY in descending order. For example, the file:</div><div><br></div><div>3 1</div><div>2 1</div><div>1 3</div><div><br></div><div>would represent the frequency list (3, 2, 1, 1, 1), that is, one password</div><div>observed 3 times, one observed twice, and three separate passwords observed</div><div>once each.</div>
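<div><br></div><div>The file format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: it assumes each non-empty line holds two whitespace-separated integers, and the function names (parse_frequency_lines, expand_frequency_list, summarize) are hypothetical, chosen only for illustration.</div>

```python
from typing import Iterable, List, Tuple


def parse_frequency_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'FREQUENCY #OBSERVATIONS' lines into (frequency, observations) pairs.

    Assumption: fields are whitespace-separated integers; blank lines are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def expand_frequency_list(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand the pairs into the explicit frequency list, one entry per distinct password."""
    expanded = []
    for frequency, observations in pairs:
        expanded.extend([frequency] * observations)
    return expanded


def summarize(pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (number of distinct passwords, total number of password observations)."""
    distinct = sum(observations for _, observations in pairs)
    total = sum(frequency * observations for frequency, observations in pairs)
    return distinct, total


# The three-line example file from the description above.
example_lines = ["3 1", "2 1", "1 3"]
example_pairs = parse_frequency_lines(example_lines)
print(expand_frequency_list(example_pairs))  # [3, 2, 1, 1, 1]
print(summarize(example_pairs))              # (5, 8)
```

<div>To read one of the .txt files, pass an open file object in place of the example list, for instance parse_frequency_lines(open("yahoo-all.txt")). For the larger files it is likely preferable to work with the (frequency, observations) pairs directly, as summarize does, rather than expanding the full list in memory.</div>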