Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing
journal contribution
posted on 2008-01-04, 00:00 authored by Sebastian Klie, Lennart Martens, Juan Antonio Vizcaíno, Richard Côté, Phil Jones, Rolf Apweiler, Alexander Hinneburg, Henning Hermjakob

Since the advent of public data repositories for proteomics data,
readily accessible results from high-throughput experiments have been
accumulating steadily. Several large-scale projects in particular
have contributed substantially to the amount of identifications available
to the community. Despite the considerable body of information amassed,
very few successful analyses of these data have been performed and
published, leaving the ultimate value of these projects far below
their potential. A prominent reason that published proteomics data are seldom
reanalyzed lies in the heterogeneous nature of the original sample
collection and the subsequent data recording and processing. To illustrate
that at least part of this heterogeneity can be compensated for, we
here apply latent semantic analysis to the data contributed by the
Human Proteome Organization’s Plasma Proteome Project (HUPO
PPP). Interestingly, despite the broad spectrum of instruments and
methodologies applied in the HUPO PPP, our analysis reveals several
obvious patterns that can be used to formulate concrete recommendations
for optimizing proteomics project planning as well as the choice of
technologies used in future experiments. It is clear from these results
that the analysis of large bodies of publicly available proteomics
data by noise-tolerant algorithms such as latent semantic analysis
holds great promise and is currently underexploited.
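The core idea of latent semantic analysis can be sketched briefly: identification results are arranged as a protein-by-experiment matrix, a truncated singular value decomposition projects the experiments into a low-dimensional latent space, and similarity in that space groups experiments with related identification patterns despite noise. The sketch below uses a made-up toy matrix, not HUPO PPP data, and is only an illustration of the general technique.

```python
import numpy as np

# Toy protein-by-experiment matrix (hypothetical data, not from the
# HUPO PPP): rows are proteins, columns are experiments, and an entry
# is 1 if the protein was identified in that experiment.
identifications = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

# LSA step: truncated singular value decomposition, keeping the k
# largest singular values as latent dimensions.
U, s, Vt = np.linalg.svd(identifications, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Project each experiment into the k-dimensional latent space. Cosine
# similarity there relates experiments with similar identification
# patterns, even when they share few proteins directly.
experiment_coords = (np.diag(s_k) @ Vt_k).T  # shape: (n_experiments, k)

def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Experiments 0 and 1 overlap strongly in identified proteins;
# experiments 0 and 2 share none, so their similarity is lower.
sim_01 = cosine(experiment_coords[0], experiment_coords[1])
sim_02 = cosine(experiment_coords[0], experiment_coords[2])
```

In practice the matrix for a large-scale project is far larger and sparser, and the choice of k trades noise suppression against loss of genuine signal; the SVD machinery itself is unchanged.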