Replication Package of the paper "From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects"
ABSTRACT: Bugs appear in almost any software project. Fixing all, or at least a large part, of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have conducted bug-tracking analyses to better understand the problem and thus provide means to reduce costs and improve the efficiency of bug fixing. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports covering 10 years of bug-fixing activity in 55 projects from the Apache Software Foundation, distributed across 9 categories. We mined this information from the Jira issue tracking system, considering two perspectives of reports with closed/resolved status: static (the latest version of each report) and dynamic (the changes that occurred in each report over time). We also extract information from the commits (when they exist) that fix these bugs from their respective version control system (Git). In addition, we provide an analysis of the changes that occur in the reports as a way of illustrating and characterizing the proposed dataset. Since data extraction is an error-prone, nontrivial task, we believe initiatives like this one can support researchers in further, more detailed investigations.
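The abstract mentions linking Jira bug reports to the Git commits that fix them, but this README does not describe how that link is made. One widely used heuristic is to search commit messages for Jira issue keys (a project key, a hyphen, and a number, e.g. `HADOOP-1234`). The sketch below illustrates that heuristic only; it is not necessarily the procedure used to build this dataset, and the function name `link_commits_to_issues` and its `(commit_hash, message)` input format are hypothetical.

```python
import re

# Jira issue keys look like "<PROJECTKEY>-<number>", e.g. "HADOOP-1234".
# Assumption (illustrative heuristic): a commit relates to an issue if its
# message mentions that issue's key.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def link_commits_to_issues(commits, issue_keys):
    """Map each known issue key to the hashes of commits that mention it.

    commits: iterable of (commit_hash, message) pairs (hypothetical format).
    issue_keys: set of Jira keys, e.g. for closed/resolved bug reports.
    """
    links = {key: [] for key in issue_keys}
    for commit_hash, message in commits:
        # set() so a key mentioned twice in one message is linked only once
        for key in set(ISSUE_KEY.findall(message)):
            if key in links:
                links[key].append(commit_hash)
    return links


if __name__ == "__main__":
    commits = [
        ("a1b2", "HADOOP-1234: fix NPE in parser"),
        ("c3d4", "Refactor build"),
        ("e5f6", "Follow-up for HADOOP-1234 and HDFS-42"),
    ]
    print(link_commits_to_issues(commits, {"HADOOP-1234", "HDFS-42", "HDFS-99"}))
```

Reports with no matching commit keep an empty list, which mirrors the abstract's note that fixing commits are extracted only "when they exist". Message matching misses commits that never cite the key, so it tends to under-link rather than over-link.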
You can find the full paper at: https://doi.org/10.1145/3345629.3345639
If you use this dataset in your research, please cite the following paper:
@inproceedings{Vieira:2019:RBC:3345629.3345639,
author = {Vieira, Renan and da Silva, Ant\^{o}nio and Rocha, Lincoln and Gomes, Jo\~{a}o Paulo},
title = {From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects},
booktitle = {Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering},
series = {PROMISE'19},
year = {2019},
isbn = {978-1-4503-7233-6},
location = {Recife, Brazil},
pages = {80--89},
numpages = {10},
url = {http://doi.acm.org/10.1145/3345629.3345639},
doi = {10.1145/3345629.3345639},
acmid = {3345639},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Bug-Fix Dataset, Mining Software Repositories, Software Traceability},
}
P.S.: We added a new version of the dataset (v1.0.1), which fixes the Git commit features that track source (src) and test files. More information can be found in the fix-script.py file.
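To make the idea of commit features that track source and test files concrete, the sketch below counts how many changed paths in a commit fall into each group. The classification rules (a `test`/`tests` directory, or a file name starting or ending with `Test`, and a fixed list of source extensions) are hypothetical, loosely following common Maven-style layouts; they are not taken from fix-script.py, which remains the authoritative description of the v1.0.1 fix.

```python
from pathlib import PurePosixPath

# Hypothetical list of extensions treated as source code.
SOURCE_EXTENSIONS = {".java", ".scala", ".py", ".c", ".cpp", ".h", ".js", ".go"}


def classify_path(path):
    """Return 'test', 'src', or 'other' for a changed file path.

    Assumed rules (illustrative only): a path is a test file if any directory
    is named 'test' or 'tests', or its file name starts or ends with 'Test';
    otherwise it is a source file if its extension is in SOURCE_EXTENSIONS.
    """
    p = PurePosixPath(path)
    dirs = {part.lower() for part in p.parts[:-1]}
    stem = p.stem
    if {"test", "tests"} & dirs or stem.startswith("Test") or stem.endswith(("Test", "Tests")):
        return "test"
    if p.suffix in SOURCE_EXTENSIONS:
        return "src"
    return "other"


def count_changed_files(paths):
    """Count the changed paths of one commit per category."""
    counts = {"src": 0, "test": 0, "other": 0}
    for path in paths:
        counts[classify_path(path)] += 1
    return counts


if __name__ == "__main__":
    print(count_changed_files([
        "src/main/java/org/apache/Foo.java",
        "src/test/java/org/apache/TestFoo.java",
        "pom.xml",
    ]))
```

Checking the test rule before the source rule matters: test classes such as `TestFoo.java` also carry a source extension, and would otherwise be counted as source files.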