Mozdeh gender detection accuracy and bias for YouTube commenters
Dataset posted on 11.12.2017 by Mike Thelwall
This file estimates the accuracy of gender detection by the software Mozdeh, which uses a list of first names that were at least 90% male or at least 90% female in the 1990 US census. The document shows that Mozdeh detects more males than females, introducing a small bias into the results.
The results are based on the names of school-age children, probably dominated by the UK and other English-speaking countries. Different biases can be expected for other countries and demographics.
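The name-list approach described above can be illustrated with a minimal Python sketch. This is not Mozdeh's code: the function names, the counts in `NAME_COUNTS`, and the choice to match on the first word of a commenter's username are all assumptions made for illustration. Only the 90% threshold comes from the description.

```python
# Hypothetical (male, female) counts per first name; illustrative only,
# NOT taken from the 1990 US census or from Mozdeh's actual name list.
NAME_COUNTS = {
    "james": (990, 10),
    "mary": (5, 995),
    "jordan": (700, 300),
}

THRESHOLD = 0.9  # a name must be at least 90% one gender to be kept


def build_gender_list(counts, threshold=THRESHOLD):
    """Keep only names whose dominant gender share meets the threshold."""
    gendered = {}
    for name, (male, female) in counts.items():
        total = male + female
        if total == 0:
            continue  # no data for this name
        if male / total >= threshold:
            gendered[name] = "male"
        elif female / total >= threshold:
            gendered[name] = "female"
    return gendered


def detect_gender(username, gender_list):
    """Return 'male', 'female', or None, matching on the first word (assumed)."""
    tokens = username.strip().lower().split()
    if not tokens:
        return None
    return gender_list.get(tokens[0])


GENDER_LIST = build_gender_list(NAME_COUNTS)
```

In this sketch, ambiguous names such as the hypothetical "jordan" entry are dropped from the list, so commenters with those names get no gender assigned. If the excluded names are not evenly split between genders, the detected sample can be skewed, which is one way a bias of the kind reported here could arise.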