Version 2 2021-06-22, 11:34
Version 1 2021-06-17, 13:06
journal contribution
posted on 2021-06-22, 11:34, authored by Bo Sun, Pawel Smialowski, Tobias Straub, Axel Imhof
Trypsin is one of
the most important and widely used proteolytic
enzymes in mass spectrometry (MS)-based proteomic research. It exclusively
cleaves peptide bonds on the C-terminal side of lysine and arginine residues. However,
the cleavage is also affected by several factors, including specific
surrounding amino acids, resulting in frequent incomplete proteolysis
and subsequent issues in peptide identification and quantification.
Accurate annotation of missed cleavages is crucial to database
searching in MS analysis. Here, we present deep-learning predicting
missed cleavages (dpMC), a novel algorithm for the prediction of missed
trypsin cleavage sites. The algorithm predicts missed cleavages with
very high accuracy: the areas under the curve (AUCs) in cross-validation
and holdout testing exceed 0.99, and the mean F1 score and the
Matthews correlation coefficient (MCC) are 0.9677 and 0.9349,
respectively. We tested our algorithm on data sets from
different species and experimental conditions, and it
outperforms other currently available prediction methods. In addition,
the method, coupled with propensity and motif analysis, provides better
insight into the detailed rules of trypsin cleavage. Moreover,
our method can be integrated into database searching in MS analysis
to identify and quantify mass spectra effectively and efficiently.
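
To illustrate why missed-cleavage annotation matters for database searching, the following minimal sketch enumerates the peptides an in silico tryptic digest would produce when up to a given number of cleavage sites are skipped. It is not the dpMC algorithm. It assumes only the simple rule stated above (cleavage after lysine, K, or arginine, R) and ignores the influence of surrounding residues. The function names `cleavage_sites` and `digest` and the example sequence are hypothetical.

```python
from typing import List


def cleavage_sites(protein: str) -> List[int]:
    """Return cut positions: position p means a cut between protein[p-1] and protein[p].

    Simplifying assumption: trypsin cuts after every K or R, with no
    context-dependent rules. The last residue is skipped because there is
    no bond after it.
    """
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa in "KR"]


def digest(protein: str, max_missed: int = 0) -> List[str]:
    """List all peptides containing at most `max_missed` uncut K/R sites."""
    if not protein:
        return []
    # Peptide boundaries: sequence start, every cut position, sequence end.
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break  # ran past the C-terminus of the protein
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    seq = "MAKGLRSEK"  # hypothetical toy sequence
    print(digest(seq, max_missed=0))
    print(digest(seq, max_missed=1))
```

Each additional allowed missed cleavage enlarges the candidate peptide list, so a predictor that flags which sites are likely to be missed could, in principle, keep the search space small while retaining the peptides that are actually observed.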