FedMVA: Enhancing Software Vulnerability Assessment via Federated Multimodal Learning

Thesis posted on 2025-04-17, authored by Qingyun Liu

This study introduces FedMVA, a privacy-preserving framework for software vulnerability assessment that leverages federated learning and multimodal data fusion. To tackle the challenges of data heterogeneity and privacy in distributed environments, FedMVA enables decentralized training across multiple clients without sharing raw source code or annotations.
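FedMVA's exact aggregation protocol is detailed in the thesis; the sketch below only illustrates the decentralized setup described above with generic federated averaging, in which only model weights (never raw code or annotations) leave a client. The function names and the size-weighted average are assumptions, not the thesis's algorithm.

```python
# Minimal federated-averaging sketch (illustrative; FedMVA's actual
# aggregation, losses, and client weighting are defined in the thesis).
from typing import Dict, List
import numpy as np

def local_update(global_weights: Dict[str, np.ndarray],
                 client_data, epochs: int = 1) -> Dict[str, np.ndarray]:
    """Placeholder for a client's local training on its private
    vulnerability data; only the updated weights are returned."""
    local = {k: v.copy() for k, v in global_weights.items()}
    # ... run `epochs` of gradient descent on client_data here ...
    return local

def federated_round(global_weights: Dict[str, np.ndarray],
                    clients: List, sizes: List[int]) -> Dict[str, np.ndarray]:
    """One communication round: clients train locally, then the server
    averages their weights in proportion to local dataset size."""
    updates = [local_update(global_weights, c) for c in clients]
    total = float(sum(sizes))
    return {
        k: sum(w[k] * (n / total) for w, n in zip(updates, sizes))
        for k in global_weights
    }
```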

The model integrates three complementary modalities (a fusion sketch follows the list):

Lexical features (tokenized code)

Structural representations (from code property graphs)

Developer comments (semantic context)
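The encoders used for each modality are specified in the thesis; the sketch below only illustrates the general idea of reducing each modality to a fixed-size embedding and fusing the three before classification. The module name, dimensions, and concatenation-based fusion are assumptions.

```python
# Hypothetical tri-modal fusion sketch (PyTorch). The real FedMVA encoders
# (token model, code-property-graph encoder, comment encoder) produce the
# input vectors assumed here.
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    def __init__(self, lex_dim: int, struct_dim: int, comment_dim: int,
                 hidden: int = 128, num_classes: int = 4):
        super().__init__()
        self.proj_lex = nn.Linear(lex_dim, hidden)          # tokenized code features
        self.proj_struct = nn.Linear(struct_dim, hidden)    # code-property-graph embedding
        self.proj_comment = nn.Linear(comment_dim, hidden)  # developer-comment embedding
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Linear(3 * hidden, num_classes))

    def forward(self, lex, struct, comment):
        # Concatenate the projected modality embeddings, then classify.
        fused = torch.cat([self.proj_lex(lex),
                           self.proj_struct(struct),
                           self.proj_comment(comment)], dim=-1)
        return self.classifier(fused)
```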

To enhance model performance and robustness, FedMVA incorporates the following mechanisms (sketched after the list):

A weighted variance minimization loss to reduce divergence between local and global models

A momentum-based client weighting strategy

A dynamic learning rate mechanism to handle non-IID data
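The precise formulations of these three mechanisms are given in the thesis; the sketch below shows one plausible reading of each, and every formula in it is an illustrative assumption rather than the thesis's definition.

```python
# Illustrative forms of the three training-side ideas listed above.
import torch

def variance_penalty(local_params, global_params, weight: float = 0.1):
    """Assumed weighted variance-minimization term: penalize squared
    divergence between a client's parameters and the global model."""
    return weight * sum(
        torch.sum((l - g) ** 2) for l, g in zip(local_params, global_params))

def momentum_client_weight(prev_weight: float, local_loss: float,
                           beta: float = 0.9) -> float:
    """Assumed momentum-style client weighting: blend the previous weight
    with a score that favours clients whose local loss is low."""
    return beta * prev_weight + (1.0 - beta) / (1.0 + local_loss)

def dynamic_lr(base_lr: float, divergence: float) -> float:
    """Assumed dynamic learning rate: shrink the step size when a client's
    update diverges strongly from the global model (non-IID data)."""
    return base_lr / (1.0 + divergence)
```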

Experimental results on a tri-modal dataset constructed from CVE and GitHub repositories show that FedMVA outperforms state-of-the-art baselines in accuracy, F1-score, and MCC, while ensuring data privacy. This work highlights the power of integrating multimodal feature representations under federated settings for effective and scalable software vulnerability assessment.
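For reference, the reported metrics can be computed with scikit-learn as below; the toy labels are for illustration only and do not reflect the thesis's data or results.

```python
# Evaluation-metric sketch; accuracy, F1, and MCC as used in the abstract.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [0, 1, 1, 0, 1]   # toy ground-truth labels, illustration only
y_pred = [0, 1, 0, 0, 1]   # toy predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:      ", f1_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```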
