figshare

Code for Data Leakage Detection in Machine Learning Code

software
posted on 2025-02-12, 15:41, authored by Nouf Alturayeif, Jameleddine HASSINE

This is the code created and used in the paper: "Data Leakage Detection in Machine Learning Code: Transfer Learning, Active Learning, or Low-shot Prompting?"

The transfer learning and active learning approaches are in the AL4Code folder. The low-shot prompting approach is in the GPT4Code folder.

Prerequisites

  • Python 3.8.16
  • numpy 1.21.2
  • pandas 1.5.3
  • torch 1.11.0
  • scikit-learn 1.0.1
  • transformers 4.27.4
  • sentence_transformers 2.2.2
  • openai < 1.0.0
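
The pinned versions above can be checked against the active environment before running anything. The following is a minimal sketch, not part of the deposited code: the helper names `check_versions` and `openai_is_legacy` are hypothetical, and the distribution name `sentence-transformers` (with a hyphen) is assumed to be the name under which `sentence_transformers` is installed.

```python
from importlib import metadata

# Pinned versions copied from the prerequisites list.
# Assumption: the pip distribution for sentence_transformers is "sentence-transformers".
PINNED = {
    "numpy": "1.21.2",
    "pandas": "1.5.3",
    "torch": "1.11.0",
    "scikit-learn": "1.0.1",
    "transformers": "4.27.4",
    "sentence-transformers": "2.2.2",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build suffix such as "+cu113" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def openai_is_legacy(version):
    """True if the major version is below 1, i.e. satisfies `openai < 1.0.0`."""
    return int(version.split(".")[0]) < 1


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result from `check_versions()` means every listed package matches its pin. The `openai` constraint is an upper bound rather than a pin, so it is checked separately, for example with `openai_is_legacy(metadata.version("openai"))`.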

Demo

python demo.py \
    --n_query 50 \
    --n_init_labeled 50 \
    --strategy_name LeastConfidence \
    --active_learning True \
    --with_augmentation True \
    --patient 5
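
For readers unfamiliar with the `LeastConfidence` strategy named in the command, here is a minimal sketch of least-confidence sampling as it is usually defined in the active learning literature: the unlabeled samples whose highest predicted class probability is lowest are queried first. This is an illustration under that assumption, not the implementation in the AL4Code folder. The function name `least_confidence_query` and the toy probabilities are hypothetical, and reading `--n_query` as the number of samples selected per round is also an assumption.

```python
import numpy as np


def least_confidence_query(probs, n_query):
    """Return indices of the n_query samples with the lowest top-class probability."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)                  # probability of the predicted class
    order = np.argsort(confidence, kind="stable")   # least confident samples first
    return order[:n_query]


# Toy predicted class probabilities for four unlabeled samples (illustrative only).
pool_probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
])

selected = least_confidence_query(pool_probs, n_query=2)
print(selected)  # [3 1]
```

Samples 3 and 1 are picked because their top-class probabilities (0.51 and 0.55) are closest to chance, so labeling them is expected to be the most informative for the classifier.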
