Version 3 2024-02-05, 22:42
Version 2 2023-08-29, 17:03
Version 1 2023-06-12, 16:43
dataset
posted on 2024-02-05, 22:42, authored by Daniel Wigh, Joe Arrowsmith, Alexander Pomberger, Kobi Felton, Alexei A. Lapkin
<p dir="ltr">Supplementary datasets used in ORDerly (i.e. the non-benchmark datasets):</p><ul><li>Condition prediction datasets: Contains parquet files for each of the four flavours of the ORDerly-condition dataset used in the ORDerly paper.</li><li>Condition prediction datasets config: Contains the .log and .json files showing the parameters used in cleaning and the effect of each cleaning step on dataset size.</li><li>Transformer datasets: Contains plain .txt files with the six transformer-ready datasets used for training and testing with Molecular Transformer.</li><li>Non uspto data: Contains the datasets created with ORDerly from non-USPTO data in ORD (the Open Reaction Database). These datasets were used as test sets for forward prediction and retrosynthesis prediction.</li></ul><p dir="ltr">Preprint: https://chemrxiv.org/engage/chemrxiv/article-details/64ca5d3e4a3f7d0c0d78ca42</p><p dir="ltr">NeurIPS workshop paper: https://openreview.net/forum?id=R8FQMsECIS</p><p dir="ltr">Code: https://github.com/sustainable-processes/orderly</p><p dir="ltr">The ORDerly benchmark datasets can be found here: https://figshare.com/articles/dataset/ORDerly_chemical_reactions_condition_benchmarks/23298467</p><p dir="ltr">If you have any questions, please contact me, Daniel Wigh, at dsw46@cam.ac.uk.</p>
Funding
UCB Pharma
Engineering and Physical Sciences Research Council via project EP/S024220/1 EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies.
European Regional Development Fund via the project "Innovation Centre in Digital Molecular Technologies"