figshare

Reaction SMILES CRD 1.37M dataset

Dataset (download size 26.56 MB), posted on 2025-01-17, 18:43, authored by Rik van der Lingen.
Collection of reaction SMILES (reactants, reagents, solvents, products): 1.37M lines in total, taken from the patent literature (USPTO, 1976-2024) and from the academic literature (2.5% of the total). Part of the data was converted from an existing USPTO dataset [1], and the rest was generated with a custom-designed parser. Data extraction was done with OSCAR (semantic) or ChatGPT (LLM), and molecules were identified with OPSIN and a custom synonym list. All SMILES are RDKit-safe, and duplicate reactions have been removed. Please note that the data were collected in a semi-automated process, so the dataset is certainly not free of errors. More information is available at https://kmt.vander-lingen.nl.
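The description does not spell out the exact line layout of the download. The following is a minimal Python sketch that assumes each line holds one reaction in the standard `reactants>agents>products` reaction SMILES form, with reagents and solvents placed in the middle field (an assumption, not confirmed by the dataset). It splits a line into its molecules and skips repeated reactions. The helpers `parse_reaction_smiles`, `reaction_key` and `unique_reactions` are hypothetical names for illustration, not part of the dataset or of any library.

```python
from typing import Iterable, Iterator, NamedTuple


class Reaction(NamedTuple):
    """One reaction split into the three fields of a reaction SMILES."""
    reactants: tuple[str, ...]
    agents: tuple[str, ...]  # middle field: reagents/solvents (assumed layout)
    products: tuple[str, ...]


def parse_reaction_smiles(line: str) -> Reaction:
    """Split 'reactants>agents>products' into per-molecule SMILES strings.

    Assumption: one reaction per line. Only the first whitespace-separated
    token is used, so trailing columns or CXSMILES extensions are ignored.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    fields = tokens[0].split(">")
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 '>'-separated fields, got {len(fields)}: {tokens[0]!r}"
        )
    # '.' separates molecules; an empty field (as in 'A>>B') yields an empty tuple.
    reactants, agents, products = (
        tuple(mol for mol in field.split(".") if mol) for field in fields
    )
    return Reaction(reactants, agents, products)


def reaction_key(rxn: Reaction) -> tuple[tuple[str, ...], ...]:
    """Order-independent key: molecule order within each field is ignored."""
    return tuple(tuple(sorted(field)) for field in rxn)


def unique_reactions(lines: Iterable[str]) -> Iterator[Reaction]:
    """Yield each reaction once, skipping blank lines and string-level duplicates."""
    seen: set[tuple[tuple[str, ...], ...]] = set()
    for line in lines:
        if not line.strip():
            continue
        rxn = parse_reaction_smiles(line)
        key = reaction_key(rxn)
        if key not in seen:
            seen.add(key)
            yield rxn


if __name__ == "__main__":
    example = [
        "CCO.CC(=O)O>[H+]>CC(=O)OCC.O",
        "CC(=O)O.CCO>[H+]>O.CC(=O)OCC",  # same reaction, molecules reordered
        "c1ccccc1Br>>c1ccccc1",
    ]
    print(len(list(unique_reactions(example))))  # 2
```

The duplicate check here compares SMILES strings only. Detecting chemically identical reactions written with different SMILES would require canonicalizing each molecule first, for example with RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(smi))`.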

[1] Daniel Lowe, Chemical reactions from US patents (1976-Sep2016).
