Automating software vulnerability detection (SVD) remains a critical challenge in an era of increasingly complex and interdependent software systems. Despite significant advances in Large Language Models (LLMs) for code analysis, prevailing evaluation methodologies often lack the context-aware robustness necessary to capture real-world intricacies and cross-component interactions. To address these limitations, we present VulnSage, a comprehensive evaluation framework and a dataset curated from diverse, large-scale open-source system software projects developed in C/C++. Unlike prior datasets, VulnSage leverages a heuristic noise pre-filtering approach combined with LLM-based reasoning to ensure a representative and minimally noisy spectrum of vulnerabilities. The framework enables rigorous assessment of LLMs by supporting multi-granular analysis at the function, file, and inter-function levels and by employing four diverse zero-shot prompting strategies: Baseline, Chain-of-Thought, Think, and Think & Verify.