figshare
4 files

Dataset - pairs of smelly and refactored test codes

dataset
modified on 2024-02-20, 18:30

This repository presents the dataset containing pairs of smelly and refactored test codes.

The Complete Dataset file presents four sheets to describe the selection and analysis of the test files. They are:

  • All test files. The sheet contains 132,819 lines referring to all the test files modified from 2019 to 2021 in the 13 open-source Java projects. We ran a Git Diff script to mine the information stored in the columns: i) Project name, ii) SHA (commit identifier), iii) Date of the commit, iv) Log message, and v) File a and b, representing the file paths before and after the refactoring.
  • Filter test files. The sheet contains 14,429 lines referring to the test files with potential refactorings. To select those lines, we filtered the Log message field for messages containing the word "test" and, optionally, the words "improvement" or "refactoring".
  • 375 test classes. The sheet contains the stratified sample of 375 test files. We considered the projects as groups and ensured that each project received proper representation within the sample.
  • Instances. The sheet contains 611 instances of [smelly, refactored] test code pairs. To perform the classification, we added the columns: i) Refactoring operation, ii) Test smell, iii) Lines to save before the refactoring, iv) Lines to save after the refactoring, v) Excerpt code before, vi) Excerpt code after, and vii) Part of our catalog.
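
The filtering and sampling steps described above can be sketched roughly as follows. This is only an illustration, not the script used to build the dataset: the function names, the `project` dictionary key, the case-insensitive substring matching, and the proportional allocation with largest-remainder rounding are all assumptions made for the example.

```python
import random
from collections import defaultdict


def mentions_test(log_message):
    """Required keyword check: the log message must contain "test".

    Case-insensitive substring matching is an assumption of this sketch.
    """
    return "test" in log_message.lower()


def optional_keywords(log_message):
    """Return which of the optional keywords appear in the log message."""
    text = log_message.lower()
    return [kw for kw in ("improvement", "refactoring") if kw in text]


def stratified_sample(rows, total, seed=0):
    """Draw `total` rows, treating each project as a stratum.

    Assumes proportional allocation: each project receives a share of the
    sample proportional to its share of the rows, and the seats lost to
    rounding go to the projects with the largest fractional remainders.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["project"]].append(row)  # "project" is a hypothetical key

    n = len(rows)
    quotas = {p: len(g) * total / n for p, g in groups.items()}
    sizes = {p: int(q) for p, q in quotas.items()}

    # Hand out the remaining seats by largest fractional remainder.
    leftover = total - sum(sizes.values())
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - sizes[p], reverse=True)
    for p in by_remainder[:leftover]:
        sizes[p] += 1

    rng = random.Random(seed)
    sample = []
    for p, g in groups.items():
        sample.extend(rng.sample(g, sizes[p]))
    return sample
```

Substring matching also accepts words that merely contain "test" (for example "latest"), so a stricter word-boundary match may be preferable in practice.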

The Kappa CoderA x CoderB file presents the manual analysis performed by the authors. It contains the same columns as the previous sheets, but it is split into the analysis by Coder A, the analysis by Coder B, and the comparison between them.
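
For readers who want to recompute the agreement between the two coders, a minimal sketch of Cohen's kappa is given below. It assumes that the statistic in the file is Cohen's kappa over one categorical label per instance, which the page does not state explicitly; the function name and the labels in the usage example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each coder's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels, for illustration only.
coder_a = ["smell_x", "smell_x", "smell_y", "smell_y"]
coder_b = ["smell_x", "smell_x", "smell_y", "smell_x"]
kappa = cohen_kappa(coder_a, coder_b)  # observed 0.75, expected 0.5, kappa 0.5
```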

In addition, we submitted several pull requests to the subject projects to understand the developers' perceptions of test smell refactorings. The List of Submitted Pull Requests file contains the pull request links and their respective statuses.