Supplementary Figure 9 from Federated Deep Learning Enables Cancer Subtyping by Proteomics

journal contribution
posted on 2025-09-04, 07:20 authored by Zhaoxiang Cai, Emma L. Boys, Zainab Noor, Adel T. Aref, Dylan Xavier, Natasha Lucas, Steven G. Williams, Jennifer M.S. Koh, Rebecca C. Poulos, Yangxiu Wu, Michael Dausmann, Karen L. MacKenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M. Barranco, Mark Basik, Elise D. Bowman, Roderick Clifton-Bligh, Elizabeth A. Connolly, Wendy A. Cooper, Bhavik Dalal, Anna DeFazio, Martin Filipits, Peter J. Flynn, J. Dinny Graham, Jacob George, Anthony J. Gill, Michael Gnant, Rosemary Habib, Curtis C. Harris, Kate Harvey, Lisa G. Horvath, Christopher Jackson, Maija R.J. Kohonen-Corish, Elgene Lim, Jia (Jenny) Liu, Georgina V. Long, Reginald V. Lord, Graham J. Mann, Geoffrey W. McCaughan, Lucy Morgan, Leigh Murphy, Sumanth Nagabushan, Adnan Nagrial, Jordi Navinés, Benedict J. Panizza, Jaswinder S. Samra, Richard A. Scolyer, John Souglakos, Alexander Swarbrick, David Thomas, Rosemary L. Balleine, Peter G. Hains, Phillip J. Robinson, Qing Zhong, Roger R. Reddel
Feature importance for selected proteins with utility in distinguishing cancer subtypes

ARTICLE ABSTRACT

Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a federated deep learning approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n = 1,260) and 29 cohorts held behind private firewalls (n = 6,265), representing 19,930 replicate data-independent acquisition mass spectrometry runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n = 625) in 14 cancer subtyping tasks compared with local models and matching centralized model performance. The approach’s generalizability was demonstrated by retraining the global model with data from two external, data-independent acquisition mass spectrometry cohorts (n = 55) and eight acquired by tandem mass tag proteomics (n = 832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, for example, for discovering predictive biomarkers or treatment targets while maintaining data privacy.

A federated deep learning approach applied to human proteomic data, acquired using two distinct proteomic technologies from 40 tumor cohorts across eight countries, enabled accurate cancer histopathologic subtyping while preserving data privacy. This approach will enable the privacy-compliant development of large-scale proteomic artificial intelligence models, including foundation models, across institutions globally.
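
The abstract states that local parameter updates were aggregated into a global model but does not give the aggregation rule. The sketch below illustrates one common choice, sample-size-weighted averaging in the style of FedAvg. This is an assumption, not necessarily the method used in ProCanFDL. The names `federated_average`, `site_a`, `site_b`, the parameter keys, and the sample counts are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def federated_average(site_params: Sequence[Params],
                      site_sizes: Sequence[int]) -> Params:
    """Sample-size-weighted mean of per-site parameters.

    FedAvg-style aggregation, assumed here for illustration only.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in site_params[0]:
        merged = [0.0] * len(site_params[0][name])
        for params, n in zip(site_params, site_sizes):
            weight = n / total  # larger sites contribute proportionally more
            for i, value in enumerate(params[name]):
                merged[i] += weight * value
        global_params[name] = merged
    return global_params


# Toy example with two hypothetical sites. Only parameters are shared;
# the raw data never leaves a site.
site_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
site_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
global_model = federated_average([site_a, site_b], site_sizes=[100, 300])
```

In this toy setup the second site holds three times as many samples, so its parameters carry three times the weight in the global model. A real deployment would repeat this aggregation over several communication rounds and would typically operate on tensors, not Python lists.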

    Cancer Discovery
