
Method_Name_Consistency_A_Reality_Check

software
modified on 2024-05-28, 09:01

This is a public release of the artifacts for a study on method name consistency checking and recommendation.

'There are only two hard things in Computer Science: cache invalidation and naming things.' — Phil Karlton

This study is about ...

We note that existing datasets generally do not provide precise details about why a method was deemed improper and required to be changed. Such information can give useful hints on how to improve the recommendation of adequate method names. Based on insights from prior empirical studies, we propose to investigate code reviews as a source of justification for method naming choices, to help understand inconsistencies in method naming. Accordingly, we construct a novel method-naming benchmark, RENAME4J, by matching name changes with code reviews. We then present an empirical study of how state-of-the-art techniques detect or recommend consistent and inconsistent method names based on RENAME4J.

1. Benchmark: RENAME4J

The RENAME4J dataset consists of systematically filtered and manually double-checked method name change pairs with their associated review comments (i.e., BuggyName, FixedName, and Review). Please note that we provide our benchmark as a .xlsx file for easy access and better understanding.

Our benchmark contains the following information about methods:

  • Buggy_Name
  • Fixed_Name
  • Review_Comments
  • Pull_Request_URL
  • Review_URL
  • Commit_Hash
  • Class_File_Path
  • Class_File_Name
  • Buggy_Method_Declaration
  • Fixed_Method_Declaration
  • Project
  • Pull_Request_Status
  • Pull_Request_Number
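As an illustration, a minimal sketch of validating that a benchmark record carries all of the fields above (the sample row is hypothetical, not taken from the actual benchmark):

```python
# Minimal sketch: check that a benchmark record carries the documented fields.
# The column list mirrors the field list above; the sample row is hypothetical.

EXPECTED_COLUMNS = [
    "Buggy_Name", "Fixed_Name", "Review_Comments", "Pull_Request_URL",
    "Review_URL", "Commit_Hash", "Class_File_Path", "Class_File_Name",
    "Buggy_Method_Declaration", "Fixed_Method_Declaration",
    "Project", "Pull_Request_Status", "Pull_Request_Number",
]

def missing_fields(record: dict) -> list:
    """Return the expected columns that are absent from a benchmark record."""
    return [c for c in EXPECTED_COLUMNS if c not in record]

# Hypothetical sample record for demonstration only.
sample = {c: "" for c in EXPECTED_COLUMNS}
sample["Buggy_Name"] = "getData"
sample["Fixed_Name"] = "fetchData"

print(missing_fields(sample))  # an empty list means the record is complete
```

To run the same check on the actual sheet, one could load it with `pandas.read_excel(...)` and pass each row's dictionary to `missing_fields`.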

2. Target Techniques

  • Each directory in this repository is a clone of the corresponding tool's original repository. We re-train the models with the datasets described in our paper and test them on our benchmark.

-------------------------------------------------------------------------------------------------------------------------------------

Spot

- Preparing process:
1. `cd debug-method-name/simple-utils`
2. `mvn install`
3. `cd ../GitTraveller`
4. `mvn install`
5. `cd ../gumtree`
6. `mvn install -DskipTests=true`
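For convenience, the three builds above can be scripted; a sketch (dry-run by default, so it only prints the commands; the directory paths mirror the steps above):

```python
# Sketch: automate the three `mvn install` steps above. The module list and
# per-module flags mirror the instructions; dry_run=True only collects and
# prints the commands instead of executing them.
import subprocess

STEPS = [
    ("debug-method-name/simple-utils", ["mvn", "install"]),
    ("debug-method-name/GitTraveller", ["mvn", "install"]),
    ("debug-method-name/gumtree", ["mvn", "install", "-DskipTests=true"]),
]

def run_steps(dry_run: bool = True) -> list:
    """Return (and optionally execute) the build commands in order."""
    plan = []
    for directory, cmd in STEPS:
        plan.append((directory, cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=directory, check=True)
    return plan

for directory, cmd in run_steps(dry_run=True):
    print(directory, " ".join(cmd))
```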

- Prepare data: clone Java repositories from GitHub.
1. `cd ../Data/JavaRepos/`
2. `./git-clone-repos.sh`
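`git-clone-repos.sh` is provided in the artifact; as a rough idea of what such a bulk clone does, a sketch over a `repos.txt`-style list of `owner/name` entries (the repository names below are hypothetical):

```python
# Sketch: bulk-clone GitHub repositories listed one per line, roughly what
# git-clone-repos.sh does. Printed only; uncomment the run call to clone.
import subprocess

def clone_commands(repo_list: list) -> list:
    """Build one `git clone` command per `owner/name` entry."""
    return [["git", "clone", f"https://github.com/{r}.git"] for r in repo_list]

repos = ["apache/commons-lang", "google/guava"]  # hypothetical entries
for cmd in clone_commands(repos):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually clone
```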

- Prepare data: collect the renamed methods from the commit history of the Java programs.
1. `cd ../../RenamedMethodsCollector`
2. `mvn dependency:copy-dependencies`
3. `mvn package`
4. `mv target/RenamedMethodsCollector-0.0.1-SNAPSHOT.jar target/dependency`
5. `java -cp "target/dependency/*" -Xmx8g edu.lu.uni.serval.renamed.methods.Main`

- Prepare data: parse the methods in the Java projects.
1. `cd ../DebugMethodName`
2. `mvn dependency:copy-dependencies`
3. `mvn package`
4. `mv target/DebugMethodName-0.0.1-SNAPSHOT.jar target/dependency`
5. `java -cp "target/dependency/*" -Xmx8g edu.lu.uni.serval.MainParser <i>`, where `<i>` is the index of a Java project in the Java project list (`repos.txt`).
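Since `MainParser` takes a single project index, it has to be invoked once per entry in `repos.txt`; a driver sketch (dry-run by default, command line copied from the step above):

```python
# Sketch: run MainParser once per project listed in repos.txt. The java
# command mirrors the step above; dry_run=True only prints the commands.
import subprocess

def parser_cmd(index: int) -> list:
    """Command line for parsing the project at the given index."""
    return ["java", "-cp", "target/dependency/*", "-Xmx8g",
            "edu.lu.uni.serval.MainParser", str(index)]

def run_all(num_projects: int, dry_run: bool = True) -> list:
    cmds = [parser_cmd(i) for i in range(num_projects)]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Here `num_projects` would be the number of lines in `repos.txt`.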

- Prepare data: prepare the data for deep learning on methods.
1. `cd ../LearningModel`
2. `mvn dependency:copy-dependencies`
3. `mvn package`
4. `mv target/LearningModel-0.0.1-SNAPSHOT.jar target/dependency`
5. `java -cp "target/dependency/*" -Xmx8g edu.lu.uni.serval.dlMethods.DataPreparer`: prepare the data for the learning process.

- Model learning:
1. `java -cp "target/dependency/*" -Xmx256g edu.lu.uni.serval.dlMethods.EmbedCodeTokens`: embed the method body code tokens.
2. `java -cp "target/dependency/*" -Xmx1024g edu.lu.uni.serval.dlMethods.MethodBodyCodeLearner`: learn method body features with CNNs.
3. `java -cp "target/dependency/*" -Xmx1024g edu.lu.uni.serval.dlMethods.MethodNameLearner`: learn method name features with ParagraphVectors.


- Spot and refactor inconsistent method names:
1. `cd ../DebugMethodName`
2. `java -cp "target/dependency/*" -Xmx8g edu.lu.uni.serval.Main`

-------------------------------------------------------------------------------------------------------------------------------------

CogNac

- Data processing: run `dataextractor.py` to extract the training dataset.

- FastText training: run `train_fasttext.py` to train the FastText model on the data extracted in the previous step.

- Training: run `start_train.sh`.

- Evaluation: run `start_eval.sh` and `start_decode.sh`, in that order.

- To calculate the similarity, execute `cal_sim.py`.
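We do not restate the exact metric that `cal_sim.py` implements; for illustration, embedding-based name similarity is typically measured with cosine similarity, sketched here:

```python
# Illustrative sketch only: cosine similarity between two embedding vectors,
# the usual way FastText-style embeddings are compared. This is not
# necessarily the exact computation performed by cal_sim.py.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```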

-------------------------------------------------------------------------------------------------------------------------------------

GTNM

- Data processing

  1. `merge_project.py` to save the project information.
    --data_path: project data dir
    --save_path: dir to save the project information data (`java-train.pkl`, `java-eval.pkl`, `java-test.pkl`)
  2. `processor.py` to get the code schema and cross-project information.
    --input_file: dir of the project information data (for example: `data_path/java-train.pkl`)
    --schema_file: dir to save the code schema information (for example: `data_path/java-train_schema.pkl`)
    --output_file: dir to save the code schema and cross-project information (for example: `data_path/java-train_all.pkl`)
  3. `extract_data.py` to save the final pickle data.
    --sub_vocab_file: vocabulary for subtokens in the source code
    --doc_vocab_file: vocabulary for documentation of the methods
    --input_file_name: dir of the code schema and cross-project information (for example: `data_path/java-train_all.pkl`)
    --output_file_name: dir prefix for the final training and evaluation data (for example: `data_path/train_subword`; the following files will be saved: `data_path/train_subword_body/doc/pro/tag.pkl`)
  4. `invoked_save.py` to save the invoked mask for the project context.
    --data_path: dir of the final data for training and evaluation
    --prefix: data prefix (for example: `train_subword`)
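The four steps above form a linear pipeline; a driver sketch that chains them with the documented flags (all paths, including the vocabulary files, are placeholders; the loop only prints the command lines):

```python
# Sketch: chain the four GTNM preprocessing scripts using the flags documented
# above. All paths and vocabulary file names are placeholders.
import subprocess

def pipeline(data_path: str = "data_path", split: str = "train") -> list:
    """Build the four preprocessing command lines for one data split."""
    info = f"{data_path}/java-{split}.pkl"
    schema = f"{data_path}/java-{split}_schema.pkl"
    all_file = f"{data_path}/java-{split}_all.pkl"
    prefix = f"{split}_subword"
    return [
        ["python", "merge_project.py", "--data_path", data_path,
         "--save_path", data_path],
        ["python", "processor.py", "--input_file", info,
         "--schema_file", schema, "--output_file", all_file],
        ["python", "extract_data.py", "--sub_vocab_file", "sub_vocab.pkl",
         "--doc_vocab_file", "doc_vocab.pkl",
         "--input_file_name", all_file,
         "--output_file_name", f"{data_path}/{prefix}"],
        ["python", "invoked_save.py", "--data_path", data_path,
         "--prefix", prefix],
    ]

for cmd in pipeline():
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute
```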

- Model training and testing: parameters are configured in `hparams.py`.

- Training: `python train.py --gpu gpu_id --pro True`

- Testing: `python test.py --gpu gpu_id --pro True`