Posted on 2024-01-25, 20:10. Authored by Chloe Engler Hart, Tobias Kind, Pieter C. Dorrestein, David Healey, Daniel Domingo-Fernández.
Calculating spectral similarity is a fundamental step in MS/MS
data analysis in untargeted metabolomics experiments, as it facilitates
the identification of related spectra and the annotation of compounds.
To improve matching accuracy when querying an experimental mass spectrum
against a spectral library, previous approaches have proposed increasing
peak intensities for high m/z ranges. Peaks at these high m/z values tend to be
lower in intensity, yet they carry more information for
identifying the chemical structure. Here, we evaluate the impact of
these weights on the identification of structurally related compounds
and on mass spectral library searches. Additionally, we propose a weighting
approach that (i) takes into account the frequency of the m/z values within a spectral library in
order to assign higher importance to the most common peaks and (ii)
increases the intensity of lower peaks, similar to previous approaches.
To demonstrate our approach, we applied weighting preprocessing to
modified cosine, entropy, and fidelity distance metrics and benchmarked
it against previously reported weights. Our results demonstrate how
weighting-based preprocessing can assist in annotating the structure
of unknown spectra as well as identifying structurally similar compounds.
Finally, we examined scenarios in which using weights
reduced performance, pinpointing spectral features
for which weighting might be detrimental.
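
The two kinds of weighting described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the exponents `mz_power` and `intensity_power`, the unit-width m/z binning, and in particular the `1 + frequency` form of the library-based weight. A plain binned cosine stands in for the modified cosine, entropy, and fidelity metrics, which are not reproduced here.

```python
import math
from collections import Counter


def mz_intensity_weight(peaks, mz_power=1.0, intensity_power=0.5):
    """Rescale each (mz, intensity) peak to mz**mz_power * intensity**intensity_power.

    Exponents are illustrative assumptions, not values from the paper.
    An intensity exponent below 1 boosts low-intensity peaks; a positive
    m/z exponent boosts peaks at high m/z.
    """
    return [(mz, (mz ** mz_power) * (inten ** intensity_power))
            for mz, inten in peaks]


def library_mz_frequency(library, decimals=0):
    """Fraction of library spectra that contain each rounded m/z bin."""
    counts = Counter()
    for spectrum in library:
        # A set ensures each spectrum counts a given bin at most once.
        counts.update({round(mz, decimals) for mz, _ in spectrum})
    n = len(library)
    return {mz_bin: c / n for mz_bin, c in counts.items()}


def frequency_weight(peaks, freq, decimals=0, intensity_power=0.5):
    """Hypothetical frequency-based weight: intensity**p * (1 + bin frequency).

    This form only illustrates 'more common peaks get more importance';
    it is not the weighting function used by the authors.
    """
    return [(mz, (inten ** intensity_power)
             * (1.0 + freq.get(round(mz, decimals), 0.0)))
            for mz, inten in peaks]


def binned_cosine(a, b, decimals=0):
    """Plain cosine similarity on binned peaks (not the modified cosine)."""
    va, vb = Counter(), Counter()
    for mz, inten in a:
        va[round(mz, decimals)] += inten
    for mz, inten in b:
        vb[round(mz, decimals)] += inten
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data, invented for this example.
query = [(100.0, 4.0), (200.0, 1.0)]
library = [[(100.1, 1.0), (200.2, 1.0)], [(100.0, 1.0)]]

freq = library_mz_frequency(library)
weighted_query = frequency_weight(query, freq)
score = binned_cosine(weighted_query, frequency_weight(library[0], freq))
```

In this sketch the weights are applied as a preprocessing step to both the query and the library spectrum before the similarity is computed, which mirrors the "weighting preprocessing" framing of the abstract; any real evaluation would need the metrics and weight definitions from the paper itself.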