MS/MS Spectrum Prediction for Modified Peptides Using pDeep2 Trained by Transfer Learning
Dataset posted on 25.06.2019 by Wen-Feng Zeng, Xie-Xuan Zhou, Wen-Jing Zhou, Hao Chi, Jianfeng Zhan, Si-Min He
In the past decade, tandem mass spectrometry (MS/MS)-based bottom-up proteomics has become the method of choice for analyzing post-translational modifications (PTMs) in complex mixtures. The key to identifying PTM-containing peptides and localizing the modified residues is measuring the similarity between theoretical and experimental spectra. Accurate prediction of the theoretical MS/MS spectra of modified peptides therefore improves this similarity measurement. Here, we propose pDeep2, a deep-learning-based model for predicting the spectra of peptides with PTMs. We trained pDeep2 by transfer learning, which makes training feasible with only a limited amount of benchmark PTM data. Using public synthetic PTM data sets, including synthetic phosphopeptides and 21 synthetic PTMs from ProteomeTools, we showed that the model trained by transfer learning was accurate (more than 80% of the Pearson correlation coefficients were higher than 0.9) and significantly better than the models trained without transfer learning. We also showed that accurately predicting the fragment ion intensities of PTM neutral losses, for example the phosphoric acid loss (−98 Da) of a phosphopeptide, improves the power to discriminate the true phosphorylated residue from its adjacent candidate sites. pDeep2 is available at https://github.com/pFindStudio/pDeep/tree/master/pDeep2.
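The accuracy figure quoted above is the share of spectra whose Pearson correlation coefficient exceeds 0.9. As a minimal sketch of how such a summary could be computed, assuming the coefficient is taken between the predicted and experimental intensities of matched fragment ions, the following Python uses only the standard library. The function names `pearson` and `fraction_above` and the toy intensity vectors are hypothetical illustrations; they are not part of pDeep2 and are not taken from the data set.

```python
import math


def pearson(pred, obs):
    """Pearson correlation between two equal-length intensity vectors."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    dev_p = [p - mean_p for p in pred]
    dev_o = [o - mean_o for o in obs]
    cov = sum(a * b for a, b in zip(dev_p, dev_o))
    denom = math.sqrt(sum(a * a for a in dev_p) * sum(b * b for b in dev_o))
    # A constant vector has no defined correlation; report 0.0 in that case.
    return cov / denom if denom > 0 else 0.0


def fraction_above(pccs, threshold=0.9):
    """Fraction of correlation coefficients strictly above the threshold."""
    if not pccs:
        raise ValueError("empty list of coefficients")
    return sum(1 for r in pccs if r > threshold) / len(pccs)


# Toy (predicted, experimental) fragment-intensity pairs, made up for illustration.
spectra = [
    ([0.1, 0.5, 1.0, 0.3], [0.12, 0.48, 0.95, 0.33]),  # close agreement
    ([0.2, 0.9, 0.4, 0.0], [0.9, 0.1, 0.3, 0.6]),      # poor agreement
]
pccs = [pearson(pred, obs) for pred, obs in spectra]
print(fraction_above(pccs))
```

Reporting the fraction of spectra above a fixed threshold, rather than a mean coefficient, keeps a few badly predicted spectra from being hidden by many good ones.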