Equivalent Mutants Analysed via Deductive Verification

Version 2 2025-02-28, 15:47
Version 1 2025-01-03, 14:21
dataset
posted on 2025-02-28, 15:47 authored by Serge Demeyer

This is the reproduction package complementing the paper "Equivalent Mutants: Deductive Verification to the Rescue" accepted at the Mutation 2025 Workshop.

In the paper we investigate the (supposedly) equivalent mutants from MutantBench, an open-source dataset designed for comparing the performance of tools that detect equivalent mutants [https://github.com/MutantBench/]. The dataset comes with 4,400 mutants injected into 18 C and 18 Java programs, of which 1,416 are marked as equivalent. The equivalence status itself was drawn from previously published papers on tools that detect equivalent mutants. This status has not been confirmed by independent authors and, as we show in our analysis, several mutants marked as equivalent were in fact false positives.

Below we list the mutants in this dataset: 30 mutants applied to 10 systems under test.

Code                   # loc   # mutants
----------------------------------------
Day                       59    4 / 14
Defroster                160    1 / 147
MathUtils (distance)      33    1 / 0
Mid                       59    5 / 5
Min                       23    9 / 9
Quicksort                 96    1 / 322
Triangle                  89    5 / 706
BinarySearch              37    1 / n.a.
FindLast                  22    1 / n.a.
FindMin                   40    2 / n.a.


# loc = lines of code of the program and mutant under test, including self composition

# mutants = the number of mutants analyzed versus the total number of mutants for the given system in the MutantBench dataset


The code in the MutantBench dataset required some adaptation to be amenable to analysis by means of Contract Schemas (the technique proposed in the paper). In particular, the code under test must satisfy the RIPR criterion (reach - infect - propagate - reveal), where the reveal part means we should be able to manipulate the result via the output of a function. Many of the programs under test wrote the result to the console via *print* statements; we replaced those with *return* statements. Other programs consisted of just a single *main()* method; we replaced it with a standard method declared on a class named after the file of the program under test.
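The adaptation described above can be illustrated with a minimal Java sketch. It is not taken from the reproduction package: the class names `MinOriginal`, `Min`, `MinMutant` and `SelfComposition`, the method bodies, and the chosen mutation (replacing `<` with `<=`) are all assumptions made for illustration. The final class only hints at what self composition compares. It runs both versions on one concrete input, whereas the paper relies on deductive verification rather than on execution.

```java
// Hypothetical "before" shape: a single main() whose result only reaches the console.
class MinOriginal {
    public static void main(String[] args) {
        int a = Integer.parseInt(args[0]);
        int b = Integer.parseInt(args[1]);
        int result = (a < b) ? a : b;
        System.out.println(result); // the result is printed, not returned
    }
}

// Hypothetical "after" shape: a standard method on a class named after the file,
// with the print statement replaced by a return statement.
class Min {
    static int min(int a, int b) {
        return (a < b) ? a : b;
    }
}

// Illustrative mutant (assumption): the relational operator < is replaced by <=.
// When a == b both branches yield the same value, so this mutant is equivalent.
class MinMutant {
    static int min(int a, int b) {
        return (a <= b) ? a : b;
    }
}

// Hypothetical self-composition harness: original and mutant receive the same
// inputs and their returned values are compared.
class SelfComposition {
    static boolean sameResult(int a, int b) {
        return Min.min(a, b) == MinMutant.min(a, b);
    }
}
```

Returning the value is what makes the comparison in `sameResult` expressible at all; with the printed variant there is no function output to relate between the program and its mutant.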

For MathUtils, the benchmark covers only the uninteresting *gcd*; we supplied a mutant for *distance* instead.

Most of the samples selected above lacked loops. We therefore added three search algorithms: BinarySearch, FindLast and FindMin.
