yahoo_password_frequencies_corpus.tar.gz (130.64 kB)

Yahoo Password Frequency Corpus

Posted on 2015-12-23, 21:21; authored by Joseph Bonneau
This dataset includes sanitized password frequency lists collected from Yahoo in
May 2011. 

For details of the original collection experiment, please see:

Bonneau, Joseph. "The science of guessing: analyzing an anonymized corpus of 70 
million passwords." IEEE Symposium on Security & Privacy, 2012.
http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf

This data has been modified to preserve differential privacy. For details of
this modification, please see:

Jeremiah Blocki, Anupam Datta and Joseph Bonneau. "Differentially Private 
Password Frequency Lists." Network and Distributed System Security Symposium
(NDSS), 2016.
http://www.jbonneau.com/doc/BDB16-NDSS-pw_list_differential_privacy.pdf

Each of the 51 .txt files represents one subset of all users' passwords observed
during the experiment period. "yahoo-all.txt" includes all users; every other
file represents a strict subset of that group.

Each file is a series of lines of the format:

FREQUENCY #OBSERVATIONS
...

with FREQUENCY in descending order. For example, the file:

3 1
2 1
1 3

would represent the frequency list (3, 2, 1, 1, 1), that is, one password
observed 3 times, one observed twice, and three separate passwords observed
once each.
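
The file format described above can be read with a few lines of Python. The
following is a minimal sketch, not part of the original dataset: the function
names parse_frequency_list and expand are hypothetical, and the code assumes
that both columns are whitespace-separated integers, as in the example.

```python
def parse_frequency_list(text):
    """Parse 'FREQUENCY #OBSERVATIONS' lines into (frequency, observations) pairs.

    Assumption: both columns are whitespace-separated integers.
    Blank lines are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        frequency, observations = map(int, line.split())
        pairs.append((frequency, observations))
    return pairs


def expand(pairs):
    """Expand (frequency, observations) pairs into the full frequency list."""
    return [freq for freq, count in pairs for _ in range(count)]


# The three-line example file from the description.
example = "3 1\n2 1\n1 3\n"
pairs = parse_frequency_list(example)

frequency_list = expand(pairs)                 # [3, 2, 1, 1, 1]
distinct_passwords = sum(n for _, n in pairs)  # 5 distinct passwords
total_observations = sum(f * n for f, n in pairs)  # 8 observations in total
```

For the larger files, such as "yahoo-all.txt", it is probably better to work
with the (frequency, observations) pairs directly, since the expanded list has
one entry per distinct password.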
