yahoo_password_frequencies_corpus.tar.gz (130.64 kB)

Yahoo Password Frequency Corpus

posted on 23.12.2015 by Joseph Bonneau
This dataset includes sanitized password frequency lists collected from Yahoo in
May 2011. 

For details of the original collection experiment, please see:

Bonneau, Joseph. "The science of guessing: analyzing an anonymized corpus of 70 
million passwords." IEEE Symposium on Security & Privacy, 2012.
http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf

This data has been modified to preserve differential privacy. For details of
this modification, please see:

Jeremiah Blocki, Anupam Datta and Joseph Bonneau. "Differentially Private 
Password Frequency Lists." Network and Distributed System Security Symposium
(NDSS), 2016.
http://www.jbonneau.com/doc/BDB16-NDSS-pw_list_differential_privacy.pdf

Each of the 51 .txt files represents one subset of all users' passwords observed
during the experiment period. "yahoo-all.txt" includes all users; every other
file represents a strict subset of that group.

Each file is a series of lines of the format:

FREQUENCY #OBSERVATIONS
...

with FREQUENCY in descending order. For example, the file:

3 1
2 1
1 3

would represent the frequency list (3, 2, 1, 1, 1), that is, one password
observed 3 times, one observed twice, and three separate passwords observed
once each.
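
As an illustration of the format described above, the following is a minimal
Python sketch of how such a file might be read. The function names
(parse_frequency_lines, expand, summarize) are hypothetical and not part of the
dataset, and the sketch assumes each non-blank line holds exactly two
whitespace-separated integers.

```python
from typing import Iterable, List, Tuple


def parse_frequency_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'FREQUENCY #OBSERVATIONS' lines into (frequency, count) pairs."""
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        frequency, count = int(fields[0]), int(fields[1])
        pairs.append((frequency, count))
    return pairs


def expand(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (frequency, count) pairs into the full frequency list."""
    return [frequency for frequency, count in pairs for _ in range(count)]


def summarize(pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (number of distinct passwords, total number of observations)."""
    distinct = sum(count for _, count in pairs)
    total = sum(frequency * count for frequency, count in pairs)
    return distinct, total


# The three-line example file from the description above.
sample = ["3 1", "2 1", "1 3"]
pairs = parse_frequency_lines(sample)
print(expand(pairs))     # [3, 2, 1, 1, 1]
print(summarize(pairs))  # (5, 8)
```

To read one of the actual files, an open file object can be passed in place of
the sample list, for example parse_frequency_lines(open("yahoo-all.txt")). For
the larger files, summarize works directly on the compact pairs, whereas expand
builds one list entry per distinct password and may use considerable memory.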
