posted on 2023-01-12, 13:51 authored by Thijs Stuyver, Kjell Jorner, Connor Coley

This dataset consists of 5269 reaction profiles computed in a high-throughput manner at the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level of theory with the help of autodE and Gaussian 16. Reaction IDs and SMILES, activation energies (G_act; in kcal/mol), and reaction energies (G_r; in kcal/mol) for each computed reaction profile are provided in CSV format (full_dataset.csv). XYZ-files for each reactant (both the original and stereo-constrained versions), transition state (TS), and product species, as well as a CSV file containing computed electronic energies and thermal corrections, are available in a compressed archive file, full_dataset_profiles.tar.gz.
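As a rough illustration of how full_dataset.csv might be inspected, the sketch below computes a few summary statistics using only the Python standard library. It assumes that the CSV header uses the column names G_act and G_r, mirroring the labels in the description above; the actual header names may differ and should be checked in the file. The function name summarize_energies is hypothetical.

```python
import csv
import statistics


def summarize_energies(csv_path, act_col="G_act", rxn_col="G_r"):
    """Summarize activation and reaction energies (kcal/mol) from a CSV file.

    The default column names are an assumption based on the dataset
    description; pass other names if the header differs.
    """
    g_act, g_r = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            g_act.append(float(row[act_col]))
            g_r.append(float(row[rxn_col]))
    return {
        "n_reactions": len(g_act),
        "mean_G_act": statistics.mean(g_act),
        "min_G_act": min(g_act),
        "max_G_act": max(g_act),
        "mean_G_r": statistics.mean(g_r),
    }


if __name__ == "__main__":
    # Assumes full_dataset.csv has been downloaded to the working directory.
    for key, value in summarize_energies("full_dataset.csv").items():
        print(f"{key}: {value}")
```

For the complete dataset, n_reactions would be expected to match the 5269 profiles stated above, provided every row has both energies filled in.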

The files have been organized per reaction profile, identified through the reaction ID. Within each directory, separate XYZ-files are provided for the reactants, the products, and the transition state. If the reactant dipole conformer had to be corrected to enforce stereochemical compatibility, the corrected XYZ-files are included as well. The energies for all of these species are summarized per directory in energies.csv.
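The per-reaction layout described above can be traversed with a short script. The following is a minimal sketch, assuming that the archive unpacks into one directory per reaction ID and that each of these directories holds an energies.csv file. The exact nesting inside the archive and the columns of energies.csv are not specified here, so the code searches recursively and returns the rows unparsed. The function names extract_archive and collect_energy_tables are hypothetical.

```python
import csv
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir and return that path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


def collect_energy_tables(root_dir, table_name="energies.csv"):
    """Map each reaction directory name to the rows of its energy table.

    Assumes the directory that contains the table is named after the
    reaction ID, as the dataset description suggests.
    """
    tables = {}
    for table_path in sorted(Path(root_dir).rglob(table_name)):
        with open(table_path, newline="") as handle:
            tables[table_path.parent.name] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    # Assumes the archive has been downloaded to the working directory.
    root = extract_archive("full_dataset_profiles.tar.gz", "profiles")
    tables = collect_energy_tables(root)
    print(f"Found energy tables for {len(tables)} reaction profiles")
```

Keying the result on the directory name keeps it easy to join with the reaction IDs in full_dataset.csv, although the ID format in the two files should be compared before merging.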

Additionally, all the benchmarking data are made available in the benchmarking_data.tar.gz archive.


Machine Learning for Pharmaceutical Discovery and Synthesis Consortium

International Postdoc grant from the Swedish Research Council (No. 2020-00314)


