A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects
Dataset posted on 2020-07-20, 11:02, authored by Andrew Simmons, Scott Barnett, Jessica Rivera-Villicana, Akshat Bajaj, Rajesh Vasa
This study investigates the extent to which data science projects follow coding standards: which standards are followed, which are ignored, and how does conformance differ from that of traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects with a reference group of 1099 non-Data Science projects of similar quality and maturity.
results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
source_code_anonymized.tar.gz: Anonymized source code used to identify, clone, and analyse the projects. Also includes the Jupyter notebooks used to produce the figures in the paper.
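Both files are gzip-compressed tar archives. The record does not describe their internal layout, so a sensible first step is to list an archive's contents before unpacking it. The sketch below uses only Python's standard `tarfile` module. The helper names `list_archive` and `extract_archive` are hypothetical and are not part of the published source code.

```python
import tarfile
from pathlib import Path


def list_archive(path):
    """Return the sorted names of regular files in a .tar.gz archive, without extracting it."""
    with tarfile.open(path, mode="r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def extract_archive(path, dest):
    """Extract a .tar.gz archive into dest (created if missing) and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(path, mode="r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

For example, `list_archive("results.tar.gz")` returns the file paths stored in the results archive, and `extract_archive("results.tar.gz", "results")` unpacks it. The "data" filter is applied when it is available because it refuses archive members that would be written outside the destination directory.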
Paper to appear in ESEM 2020