FedMVA: Enhancing Software Vulnerability Assessment via Federated Multimodal Learning
This study introduces FedMVA, a privacy-preserving framework for software vulnerability assessment that leverages federated learning and multimodal data fusion. To tackle the challenges of data heterogeneity and privacy in distributed environments, FedMVA enables decentralized training across multiple clients without sharing raw source code or annotations.
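To make the decentralized training setup concrete, the following is a minimal sketch of a FedAvg-style round in which each client fits a toy linear model on its private data and only parameters, never raw samples, reach the server. The linear task, client sizes, and size-weighted average are illustrative assumptions; FedMVA's actual aggregation additionally uses the momentum-based client weighting described below.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Client-side training on private data (toy linear regression here).
    Only the updated weights leave the client, never X or y."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Server-side aggregation, weighted by client data size (plain FedAvg).
    FedMVA refines this average with momentum-based client weighting."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * u for w, u in zip(weights, updates))

# Toy demo: three clients holding non-IID slices of the same linear task.
true_w = np.array([1.5, -2.0])
clients = []
for shift in (0.0, 1.0, 2.0):
    X = rng.normal(shift, 1.0, size=(40, 2))
    y = X @ true_w + rng.normal(0, 0.1, size=40)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):
    global_w = federated_round(global_w, clients)
print(global_w)  # approaches true_w without pooling any raw data
```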
The model integrates three complementary modalities (a fusion sketch follows the list):
Lexical features (tokenized code)
Structural representations (from code property graphs)
Developer comments (semantic context)
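As a sketch of how the three modalities could be combined, the snippet below assumes each modality has already been encoded into a fixed-size embedding (token encoder, code-property-graph encoder, comment encoder) and applies a simple late-fusion head. The embedding dimensions, the concatenation-based fusion, and the four-class severity output are assumptions for illustration; the paper's actual encoders and fusion strategy may differ.

```python
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    """Late fusion: project lexical, structural, and comment embeddings
    into a shared space, concatenate, and predict a severity class."""
    def __init__(self, lex_dim=768, graph_dim=256, comment_dim=384,
                 hidden=128, num_classes=4):
        super().__init__()
        self.lex_proj = nn.Sequential(nn.Linear(lex_dim, hidden), nn.ReLU())
        self.graph_proj = nn.Sequential(nn.Linear(graph_dim, hidden), nn.ReLU())
        self.comment_proj = nn.Sequential(nn.Linear(comment_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(3 * hidden, num_classes)

    def forward(self, lex, graph, comment):
        fused = torch.cat([self.lex_proj(lex),
                           self.graph_proj(graph),
                           self.comment_proj(comment)], dim=-1)
        return self.classifier(fused)

# Example with a batch of 8 and made-up embedding sizes.
model = TriModalFusion()
logits = model(torch.randn(8, 768), torch.randn(8, 256), torch.randn(8, 384))
print(logits.shape)  # torch.Size([8, 4])
```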
To enhance model performance and robustness, FedMVA incorporates three mechanisms (sketched after this list):
A weighted variance minimization loss to reduce divergence between local and global models
A momentum-based client weighting strategy
A dynamic learning rate mechanism to handle non-IID data
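The abstract does not give the exact formulations, so the sketch below shows one plausible reading of each mechanism: a squared-distance penalty between local and global parameters as the divergence term, client aggregation weights smoothed with momentum from per-client losses, and a learning rate that decays over rounds and shrinks further when client updates drift apart. All function names, hyperparameters, and formulas here are assumptions, not FedMVA's published definitions.

```python
import numpy as np

def local_loss_with_variance_penalty(w_local, w_global, X, y, mu=0.1):
    """Assumed divergence term: a weighted squared distance between local
    and global parameters added to the task loss to limit client drift."""
    task = 0.5 * np.mean((X @ w_local - y) ** 2)
    divergence = 0.5 * mu * np.sum((w_local - w_global) ** 2)
    return task + divergence

def momentum_client_weights(prev_weights, client_losses, beta=0.9):
    """Assumed momentum-based weighting: blend last round's client weights
    with new weights derived from per-client losses, so aggregation
    emphasis changes smoothly between rounds."""
    inv = 1.0 / (np.asarray(client_losses) + 1e-8)
    new = inv / inv.sum()
    blended = beta * np.asarray(prev_weights) + (1 - beta) * new
    return blended / blended.sum()

def dynamic_lr(base_lr, round_idx, drift, decay=0.05, kappa=1.0):
    """Assumed schedule: decay the rate over rounds and shrink it further
    when client updates diverge strongly (a proxy for non-IID data)."""
    return base_lr / (1 + decay * round_idx) / (1 + kappa * drift)

# Toy usage with made-up numbers.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
print(local_loss_with_variance_penalty(np.array([0.2, 0.1]), np.zeros(2), X, y))
print(momentum_client_weights([0.4, 0.3, 0.3], client_losses=[0.8, 0.5, 1.2]))
print(dynamic_lr(0.1, round_idx=10, drift=0.3))
```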
Experimental results on a tri-modal dataset constructed from CVE records and GitHub repositories show that FedMVA outperforms state-of-the-art baselines in accuracy, F1-score, and Matthews correlation coefficient (MCC) while preserving data privacy. This work demonstrates that combining multimodal feature representations with federated training enables effective and scalable software vulnerability assessment.