ac0c01687_si_001.pdf (9.42 MB)
Statistical Modeling for Enhancing the Discovery Power of Citrullination from Tandem Mass Spectrometry Data
journal contribution
posted on 2020-09-16, 14:05 authored by Sunghyun Huh, Daehee Hwang, Min-Sik Kim
Citrullination is a post-translational modification implicated
in various human diseases including rheumatoid arthritis, Alzheimer’s
disease, multiple sclerosis, and cancers. Due to a relatively low
concentration of citrullinated proteins in the total proteome, confident
identification of the citrullinated proteome is challenging in mass spectrometry
(MS)-based proteomic analysis. From these MS-based analyses, MS features
that characterize citrullination, such as immonium ions (IMs) and
neutral losses (NLs), called diagnostic ions, have been reported.
However, there has been a lack of systematic approaches to comprehensively
search for diagnostic ions and no statistical methods for the identification
of the citrullinated proteome based on these diagnostic ions. Here, we
present a systematic approach to identify diagnostic IMs, internal
ions (INTs), and NLs for citrullination from tandem mass (MS/MS) spectra.
Diagnostic INTs mainly consisted of internal fragment ions for di-
and tripeptides that contained two and three amino acids with at least
one citrullinated arginine, respectively. A statistical logistic regression
model was built for a confident assessment of citrullinated peptides
that database searches identified (true positives) and prediction
of citrullinated peptides that database searches failed to identify
(false negatives) using the diagnostic IMs, INTs, and NLs. Applications
of our model to complex global proteome data sets demonstrated
increased accuracy in the identification of citrullinated peptides,
thereby enhancing the size and functional interpretation of citrullinated
proteomes.
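
As a rough illustration of the modeling idea described above, the following sketch fits a logistic regression to binary indicators of whether a spectrum contains a diagnostic IM, INT, or NL. It is not the authors' implementation: the feature encoding, the toy spectra and their labels, and the helper names (fit_logistic, predict_proba) are all assumptions made for illustration, and the fit uses plain batch gradient descent rather than any particular statistics package.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this spectrum under the current model.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a spectrum with features x is citrullinated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical, made-up training spectra (not data from the paper).
# Each row is [has_IM, has_INT, has_NL]; label 1 = citrullinated peptide.
X = [
    [1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
p_all = predict_proba(w, b, [1, 1, 1])   # all three diagnostic ion types seen
p_none = predict_proba(w, b, [0, 0, 0])  # no diagnostic ions seen
```

Under these assumptions, a spectrum carrying all three diagnostic ion types receives a higher predicted probability than one carrying none. A real application would presumably use richer features (for example ion intensities or counts) and a probability cutoff chosen on held-out data, neither of which is specified here.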