ML DFT benchmarking
This dataset consists of 92 reaction profiles computed in a high-throughput manner at the CAMB3LYP-GD3BJ/6-311++G** level of theory in the gas phase, using autodE and Gaussian 16. The compressed archive full_dataset_profiles.tar.gz contains the following: Gaussian log-files for the IRC and final frequency calculations of each reactant, TS and product species; XYZ files of the final geometries; and a CSV file with the computed electronic energies and thermal corrections.
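Activation and reaction free energies follow from the CSV file by adding each species' thermal correction to its electronic energy and taking differences. The sketch below assumes a hypothetical CSV layout with columns reaction_id, species ('r', 'TS' or 'p'), E_el and G_corr, both in Hartree. These column names are illustrative only and should be adapted to the actual header of the file.

```python
import csv
from collections import defaultdict

# Conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL = 627.509474

def reaction_energetics(csv_lines):
    """Return {reaction_id: (dG_activation, dG_reaction)} in kcal/mol.

    csv_lines: any iterable of CSV text lines (e.g. an open file).
    Assumes the hypothetical columns reaction_id, species, E_el, G_corr.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(csv_lines):
        # Free energy = electronic energy + thermal correction (Hartree).
        g = float(row["E_el"]) + float(row["G_corr"])
        # Rows sharing a role (e.g. separated reactants) are summed.
        totals[row["reaction_id"]][row["species"]] += g

    result = {}
    for rxn_id, g in totals.items():
        if not {"r", "TS", "p"} <= g.keys():
            continue  # skip incomplete profiles under this assumed layout
        barrier = (g["TS"] - g["r"]) * HARTREE_TO_KCAL
        reaction = (g["p"] - g["r"]) * HARTREE_TO_KCAL
        result[rxn_id] = (barrier, reaction)
    return result
```

Passing an open file, e.g. `reaction_energetics(open(path, newline=""))`, works because csv.DictReader accepts any iterable of lines.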
The files are organized per reaction profile, with one directory per reaction ID. Within each directory, reactant XYZ files are named r_####.xyz, product XYZ files p_####.xyz, and transition-state XYZ files TS_####.xyz. Where a reactant or product had to be corrected to enforce stereochemical compatibility, the corrected geometry is included as r_alt_####.xyz or p_alt_####.xyz. The frequency log-files are in a frequency_logs directory.
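The naming scheme above can be used to sort the geometry files of one reaction directory by role and to read their coordinates. The following is a minimal sketch that assumes standard XYZ files (atom count, comment line, then one "element x y z" line per atom). The helper names classify_xyz and parse_xyz are hypothetical.

```python
import re
from collections import defaultdict

# r_/p_/TS_ prefixes, with an optional "_alt" infix for
# stereochemically corrected reactant/product geometries.
XYZ_NAME = re.compile(r"^(r|p|TS)(_alt)?_(.+)\.xyz$")

def classify_xyz(names):
    """Group file names into 'r', 'r_alt', 'p', 'p_alt' and 'TS'."""
    groups = defaultdict(list)
    for name in names:
        match = XYZ_NAME.match(name)
        if match is None:
            continue  # not a species geometry file
        role, alt, _ = match.groups()
        groups[role + (alt or "")].append(name)
    return {key: sorted(files) for key, files in groups.items()}

def parse_xyz(text):
    """Return [(element, x, y, z), ...] from the text of an XYZ file."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:  # skip count and comment lines
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms
```

For a hypothetical extracted directory reaction_dir, `classify_xyz(p.name for p in pathlib.Path(reaction_dir).glob("*.xyz"))` lists its geometries, and `parse_xyz(path.read_text())` reads one of them. Whether to prefer an `_alt` geometry over the original is left to the user.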
The single_points.tar.gz archive contains the log-files of the single-point computations with 20 different functionals.
The reference DLPNO-CCSD(T) log-files are available in the Reference values directory.
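Once barriers have been extracted from the single-point and reference log-files, each functional can be benchmarked against DLPNO-CCSD(T). The sketch below ranks functionals by mean absolute error (MAE) over the reactions they share with the reference. It assumes the barriers are already available as plain dictionaries in consistent units, and it does not parse the log-files themselves. The function name benchmark and the input structure are hypothetical.

```python
def benchmark(functional_barriers, reference):
    """Rank functionals by MAE against the reference values.

    functional_barriers: {functional_name: {reaction_id: barrier}}
    reference: {reaction_id: barrier}, e.g. DLPNO-CCSD(T) values.
    Only reactions present in both mappings are compared.
    """
    scores = {}
    for name, barriers in functional_barriers.items():
        common = barriers.keys() & reference.keys()
        if not common:
            continue  # nothing to compare for this functional
        errors = [abs(barriers[r] - reference[r]) for r in common]
        scores[name] = sum(errors) / len(errors)
    # Lowest MAE first.
    return sorted(scores.items(), key=lambda item: item[1])
```

MAE is only one possible metric. Signed mean errors or maximum absolute errors can be computed from the same `errors` list if systematic over- or underestimation matters.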
If this dataset is used as part of a publication, please cite the associated preprint.
Funding
CPJ grant (ANR-22-CPJ1-0093-01)
ERC AdG (project MaMa, no. 101097351)
ANR MoMoPlasm project (ANR-21-CE29-0003)
Labex SEAM ANR-10-LABX-096, ANR-18-IDEX-0001