ORDerly supplementary datasets

Version 3 2024-02-05, 22:42
Version 2 2023-08-29, 17:03
Version 1 2023-06-12, 16:43
dataset
posted on 2024-02-05, 22:42, authored by Daniel Wigh, Joe Arrowsmith, Alexander Pomberger, Kobi Felton, Alexei A. Lapkin

Supplementary datasets used in ORDerly (i.e. the non-benchmark datasets):

  • Condition prediction datasets: Parquet files for each of the four flavours of the ORDerly-condition dataset used in the ORDerly paper.
  • Condition prediction datasets config: The .log and .json files recording the parameters used in cleaning and the effect of each cleaning step on dataset size.
  • Transformer datasets: Plain-text (.txt) files containing the six transformer-ready datasets used for training and testing with Molecular Transformer.
  • Non-USPTO data: The datasets created with ORDerly from the non-USPTO data in ORD. These datasets were used as test sets for forward prediction and retrosynthesis prediction.
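
As a rough illustration of how the files listed above might be opened, the following Python sketch loads a parquet file into pandas and reads a plain-text dataset line by line. The function names are hypothetical and are not part of the ORDerly code base. The sketch also assumes that the text files hold one example per line with space-separated tokens, as is common for Molecular Transformer inputs; check the downloaded files before relying on that.

```python
from pathlib import Path


def read_condition_table(parquet_path):
    """Load one condition-prediction parquet file into a pandas DataFrame."""
    # Imported lazily so the text helpers below work without pandas installed.
    # pandas.read_parquet needs a parquet engine (pyarrow or fastparquet).
    import pandas as pd

    return pd.read_parquet(parquet_path)


def read_text_dataset(txt_path):
    """Return the non-empty lines of a plain-text dataset file."""
    with Path(txt_path).open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def detokenize(line):
    """Join space-separated tokens into one string (assumed file format)."""
    return "".join(line.split())
```

Calling `read_condition_table` on a downloaded parquet file and inspecting `.columns` and `.shape` is a quick way to confirm what each flavour contains, since the column names are not listed here.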

Preprint: https://chemrxiv.org/engage/chemrxiv/article-details/64ca5d3e4a3f7d0c0d78ca42

NeurIPS workshop paper: https://openreview.net/forum?id=R8FQMsECIS

Code: https://github.com/sustainable-processes/orderly

The ORDerly benchmark datasets can be found here: https://figshare.com/articles/dataset/ORDerly_chemical_reactions_condition_benchmarks/23298467

Please feel free to contact me, Daniel Wigh, at dsw46@cam.ac.uk in case of any questions.

Funding

UCB Pharma

Engineering and Physical Sciences Research Council via project EP/S024220/1, "EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies"

European Regional Development Fund via the project "Innovation Centre in Digital Molecular Technologies"
