figshare

Assessment of computational methods in predicting TCR-epitope binding recognition

Version 7 2025-05-08, 01:11
Version 6 2025-04-28, 15:19
Version 5 2024-09-23, 13:23
Version 4 2024-09-19, 14:05
Version 3 2024-09-16, 08:53
Version 2 2024-09-16, 08:34
Version 1 2024-09-15, 06:11
dataset
Posted on 2025-05-08, 01:11; authored by Yanping Lu, Yuyan Wang, Meng Xu, Bingbing Xie, Yumeng Yang, Haodong Xu, Shengbao Suo

T-cell receptors (TCRs) are critical to the immune system's ability to recognize specific epitopes, and accurate prediction of TCR-epitope interactions is fundamental to understanding and enhancing immune responses. Despite significant advances in computational methods for TCR-epitope binding prediction, a thorough evaluation of these tools is still lacking. Here, we assessed 50 state-of-the-art TCR-epitope prediction models using 21 datasets covering 762 epitopes and hundreds of thousands of corresponding binding TCR sequences. Our analysis revealed that while the ratio of positive to negative samples only subtly influences performance, the source of negative TCR samples significantly impacts model accuracy: external negative data may introduce uncontrolled confounders that compromise evaluation reliability. A positive correlation between the number of TCRs per epitope and model performance highlights the importance of large, diverse datasets. Multi-feature models generally outperform models that use CDR3β alone, but generalization to unseen epitopes remains a challenge for all evaluated models. For objective performance assessment, it is crucial to use independently sourced test sets, which better reflect real-world applications, for both seen and unseen epitopes. These insights can guide the development of more accurate and robust computational tools for TCR-epitope interaction prediction, accelerating progress in this evolving field.
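The evaluation design described in the abstract hinges on two choices: where negative samples come from, and whether test epitopes were seen during training. As an illustrative sketch only (not the authors' pipeline), the Python below builds negatives by shuffling TCRs against other epitopes within the same dataset, one alternative to external negative data, and then splits labeled pairs so that no test epitope appears in training, which is what "unseen epitope" evaluation requires. The function names, the (CDR3β, epitope, label) tuple layout, and the toy sequences are hypothetical, and treating unobserved pairs as non-binders is itself an assumption.

```python
import random


def shuffled_negatives(positives, ratio=1, seed=0):
    """Hypothetical helper: pair each TCR with up to `ratio` epitopes it was
    not observed to bind in the same dataset. `positives` is a list of
    (cdr3b, epitope) tuples. Assumes unobserved pairs are non-binders."""
    rng = random.Random(seed)
    pos_set = set(positives)
    epitopes = sorted({epi for _, epi in positives})
    negatives = set()  # a set avoids duplicate negatives for repeated TCRs
    for tcr, _ in positives:
        # Candidate epitopes this TCR was never observed to bind
        candidates = [e for e in epitopes if (tcr, e) not in pos_set]
        k = min(ratio, len(candidates))
        for epi in rng.sample(candidates, k):
            negatives.add((tcr, epi))
    return sorted(negatives)


def split_by_epitope(labeled, test_fraction=0.2, seed=0):
    """Hypothetical helper: hold out whole epitopes so that every test
    epitope is unseen during training. `labeled` is a list of
    (cdr3b, epitope, label) tuples."""
    rng = random.Random(seed)
    epitopes = sorted({epi for _, epi, _ in labeled})
    rng.shuffle(epitopes)
    n_test = max(1, round(len(epitopes) * test_fraction))
    test_epitopes = set(epitopes[:n_test])
    train = [row for row in labeled if row[1] not in test_epitopes]
    test = [row for row in labeled if row[1] in test_epitopes]
    return train, test


if __name__ == "__main__":
    # Toy positives for illustration only; not taken from the benchmark.
    positives = [
        ("CASSLGQAYEQYF", "GILGFVFTL"),
        ("CASSIRSSYEQYF", "GILGFVFTL"),
        ("CASSPDRGEQFF", "NLVPMVATV"),
        ("CASRPGLAGGRPEQYF", "GLCTLVAML"),
    ]
    negatives = shuffled_negatives(positives, ratio=1)
    labeled = [(t, e, 1) for t, e in positives] + [(t, e, 0) for t, e in negatives]
    train, test = split_by_epitope(labeled, test_fraction=0.2)
    print(len(train), "training pairs;", len(test), "test pairs on unseen epitopes")
```

Because the negatives are drawn from the same source as the positives, this keeps TCR and epitope distributions within one dataset rather than mixing in an external repertoire, although within-dataset shuffling can carry its own biases. The `ratio` parameter is one way to vary the positive-to-negative ratio when probing its effect on performance.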

Funding

This work was supported in part by the National Natural Science Foundation of China (32370972), the Guangdong Basic and Applied Basic Research Foundation (2024B1515020052, 2023A1515011783), and the Major Project of Guangzhou National Laboratory (GZNL2023A02007, GZNL2023A03005, SRPG22017).
