
ORDerly supplementary datasets

Version 3 2024-02-05, 22:42
Version 2 2023-08-29, 17:03
Version 1 2023-06-12, 16:43
dataset
posted on 2024-02-05, 22:42 authored by Daniel Wigh, Joe Arrowsmith, Alexander Pomberger, Kobi Felton, Alexei A. Lapkin
<p dir="ltr">Supplementary datasets used in ORDerly (i.e. the non-benchmark datasets):</p><ul><li>Condition prediction datasets: parquet files for each of the four flavours of ORDerly-condition datasets used in the ORDerly paper.</li><li>Condition prediction datasets config: the .log and .json files showing the parameters used in cleaning and the effect of each cleaning step on dataset size.</li><li>Transformer datasets: plain .txt files containing the six transformer-ready datasets used for training and testing with Molecular Transformer.</li><li>Non-USPTO data: the datasets created with ORDerly from non-USPTO data in ORD. These datasets were used as test sets for forward prediction and retrosynthesis prediction.</li></ul><p dir="ltr">Preprint: https://chemrxiv.org/engage/chemrxiv/article-details/64ca5d3e4a3f7d0c0d78ca42</p><p dir="ltr">NeurIPS workshop paper: https://openreview.net/forum?id=R8FQMsECIS</p><p dir="ltr">Code: https://github.com/sustainable-processes/orderly</p><p dir="ltr">The ORDerly benchmark datasets can be found here: https://figshare.com/articles/dataset/ORDerly_chemical_reactions_condition_benchmarks/23298467</p><p dir="ltr">For any questions, please contact me, Daniel Wigh, at dsw46@cam.ac.uk.</p>

Funding

UCB Pharma

Engineering and Physical Sciences Research Council via project EP/S024220/1 EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies.

European Regional Development Fund via the project "Innovation Centre in Digital Molecular Technologies"

Related Materials

  1. URL - Is supplemented by ORDerly code repository
