Posted on 2024-10-23, 09:13. Authored by Maribel Pérez Ribera, Muhammad Faizan Khan, Roger Giné, Josep M. Badia, Sandra Junza, Oscar Yanes, Marta Sales-Pardo, Roger Guimerà
This item is divided into 3 parts:
Neural networks, each trained to predict the presence of a peak at a specific MZ position in a tandem mass spectrum. The name of every file includes the MZ position, written as the MZ value multiplied by 100 (mz*100).
ANN: Artificial Neural Networks
GNN: Graph Neural Networks
COM: Networks that combine the predictive power of ANN and GNN
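The mz*100 naming convention described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names and the example file name "ANN_9105.pt" are hypothetical, and the sketch assumes that the MZ code is the last run of digits in the file name and that the value is rounded to the nearest integer after multiplying by 100.

```python
import re
from pathlib import Path


def mz_to_code(mz: float) -> int:
    """Convert an MZ value to the integer code used in file names (mz*100).

    Assumption: the value is rounded to the nearest integer.
    """
    return int(round(mz * 100))


def code_to_mz(code: int) -> float:
    """Convert an integer file-name code back to an MZ value."""
    return code / 100


def mz_from_filename(filename: str) -> float:
    """Extract the MZ value from a model file name.

    Assumption: the MZ code is the last run of digits in the file name
    (extension excluded). The real naming pattern may differ.
    """
    stem = Path(filename).stem
    digit_runs = re.findall(r"\d+", stem)
    if not digit_runs:
        raise ValueError(f"No MZ code found in file name: {filename!r}")
    return code_to_mz(int(digit_runs[-1]))


# Hypothetical file name, used only to illustrate the convention.
example_mz = mz_from_filename("ANN_9105.pt")
```

If the actual file names place the MZ code elsewhere, only the regular expression in mz_from_filename needs to change.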
Mol2vecModel: Contains a Mol2vec model trained to produce a 300-dimensional vector from the SMILES string of a molecule.
modelData:
AllMostFreqMolGeneral_rep_dades1.pickle: file containing the number of peaks that fall in each MZ bin across the tandem mass spectra in the training set.
thresholdsANN.pickle: one threshold for each of the 1,000 most frequent MZ positions in the training set. If the prediction of an ANN model for a given MZ position is greater than or equal to the threshold for that position, a peak is predicted in that bin.
thresholdsGNN.pickle: same as above but for the GNN models.
thresholdsCOM.pickle: same as above but for the COM models.
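The thresholding rule described for the three threshold files can be sketched as below. The internal structure of the .pickle files is not documented here, so this example assumes a plain dictionary that maps an MZ code (mz*100) to a threshold. The function name and all numeric values are made up for illustration.

```python
from typing import Dict


def predict_peaks(
    scores: Dict[int, float], thresholds: Dict[int, float]
) -> Dict[int, bool]:
    """Apply per-position thresholds to model outputs.

    A peak is predicted for a position when its score is greater than
    or equal to the threshold for that same position. Positions with
    no threshold are skipped.
    """
    return {
        code: scores[code] >= thresholds[code]
        for code in scores
        if code in thresholds
    }


# Illustrative values only; real thresholds come from the .pickle files.
thresholds = {9105: 0.40, 10507: 0.55}
scores = {9105: 0.40, 10507: 0.30}

predicted = predict_peaks(scores, thresholds)
```

Note that a score exactly equal to its threshold counts as a predicted peak, matching the "greater than or equal to" rule.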
All data files are stored in .pickle format and were created with Python 3.8.19.
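A data file can be read with the pickle module from the Python standard library. The sketch below assumes that the files sit in a folder named modelData next to the script, which is a guess about the layout. If the pickled objects depend on third-party packages, those packages must be installed before loading. Because unpickling can execute arbitrary code, load only files from a trusted source.

```python
import pickle
from pathlib import Path
from typing import Any


def load_pickle(path) -> Any:
    """Load and return the object stored in a .pickle file."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Assumed location; adjust to wherever the files were extracted.
    thresholds_path = Path("modelData") / "thresholdsANN.pickle"
    if thresholds_path.exists():
        thresholds = load_pickle(thresholds_path)
        # Inspect the type first, since the structure is not documented here.
        print(type(thresholds))
```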