SingleFrag

dataset
posted on 2024-10-23, 09:13 authored by Maribel Pérez Ribera, Muhammad Faizan Khan, Roger Giné, Josep M. Badia, Sandra Junza, Oscar Yanes, Marta Sales-Pardo, Roger Guimerà

This item is divided into three parts:

  1. Neural networks trained to predict the presence of a peak at a specific MZ position in a tandem mass spectrum. Each file name encodes the MZ position as the MZ value multiplied by 100 (mz*100).
    1. ANN: Artificial Neural Networks
    2. GNN: Graph Neural Networks
    3. COM: Networks that combine the predictive power of ANN and GNN
  2. Mol2vecModel: a Mol2vec model trained to produce a 300-dimensional vector from a molecule's SMILES string.
  3. modelData:
    1. AllMostFreqMolGeneral_rep_dades1.pickle: the number of peaks that fall in each MZ bin across the tandem mass spectra in the training set.
    2. thresholdsANN.pickle: one threshold for each of the 1,000 most frequent MZ positions in the training set. If an ANN model's prediction for a given MZ position is greater than or equal to that position's threshold, a peak is predicted in that bin.
    3. thresholdsGNN.pickle: same as above but for the GNN models.
    4. thresholdsCOM.pickle: same as above but for the COM models.
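
The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `mz_to_key` and `predict_peaks` are hypothetical, the layout of the threshold data (a dictionary keyed by mz*100) is an assumption, rounding to the nearest integer is an assumption, and the scores and thresholds are toy values.

```python
def mz_to_key(mz):
    """Encode an MZ value as mz*100.

    Rounding to the nearest integer is an assumption; the item description
    only states that positions are written as mz*100.
    """
    return round(mz * 100)


def predict_peaks(scores, thresholds):
    """Return the MZ keys where a peak is predicted.

    A peak is predicted when the model score for a position is greater
    than or equal to the threshold stored for that same position.
    Both arguments are assumed to be dicts keyed by mz*100.
    """
    return sorted(
        key
        for key, score in scores.items()
        if key in thresholds and score >= thresholds[key]
    )


# Toy values for illustration only (not taken from the dataset).
thresholds = {mz_to_key(91.05): 0.40, mz_to_key(105.07): 0.55}
scores = {mz_to_key(91.05): 0.40, mz_to_key(105.07): 0.30}

predicted = predict_peaks(scores, thresholds)
print(predicted)
```

In this toy case the first position is predicted because its score equals its threshold, and the second is not because its score falls below its threshold.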

Every data file is stored in .pickle format, created with Python 3.8.19.
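
A .pickle file can be read with Python's standard `pickle` module. The sketch below is illustrative only: `load_pickle` is a hypothetical helper, and the file it reads is a stand-in written to a temporary directory with made-up contents, since the real structure of the files in this item is not documented here. Unpickling can execute arbitrary code, so only load files from a source you trust; objects that depend on third-party libraries may also require those libraries to be installed.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load and return the object stored in a pickle file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in file with made-up contents, so the example runs without the
# real download. The dict layout here is an assumption, not the dataset's.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "thresholdsANN.pickle"
    with open(demo_path, "wb") as fh:
        # Protocol 4 is the default pickle protocol in Python 3.8.
        pickle.dump({9105: 0.40}, fh, protocol=4)
    loaded = load_pickle(demo_path)

print(type(loaded).__name__, loaded)
```

For the real files, the path passed to `load_pickle` would point at the downloaded file instead of the temporary one.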
