Development of Scalable and Generalizable Machine Learned Force Field for Polymers
This is a dataset of 344,654 clusters of ethylene glycol (EG), diethylene glycol, denoted (EG)2, and triethylene glycol, denoted (EG)3, with DFT labels computed at the wB97X-D3BJ/def2-TZVPD level of theory.
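The three molecules are built from the same ethylene glycol repeat unit. Each is HO(CH2CH2O)nH with n = 1, 2, 3, so a single molecule contains 2n carbon, 4n+2 hydrogen, and n+1 oxygen atoms. The small helper below, which is hypothetical and not part of the dataset, spells out this naming convention:

```python
def eg_oligomer_formula(n: int) -> dict[str, int]:
    """Element counts for HO(CH2CH2O)_nH: n=1 is EG, n=2 is (EG)2, n=3 is (EG)3."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {"C": 2 * n, "H": 4 * n + 2, "O": n + 1}


for n, name in [(1, "EG"), (2, "(EG)2"), (3, "(EG)3")]:
    counts = eg_oligomer_formula(n)
    print(name, "C{C}H{H}O{O}".format(**counts))
# EG C2H6O2
# (EG)2 C4H10O3
# (EG)3 C6H14O4
```

A cluster in the dataset may contain several such molecules, so its total element counts are sums of these per-molecule counts.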
The primary key for each datapoint is a string representing an index from 0 to 344,653 inclusive. Each datapoint is a dictionary containing the following keys:
* atomicNumbers: atomic number of each atom
* charge: net charge for each system, in elementary charges (always 0 for this dataset)
* datasetTitle: the sampling step that produced the datapoint, one of the following six values:
  * EG_run: initial cluster sampling of EG, (EG)2, and (EG)3
  * decomp_EG_monomer: decomposition sampling for EG
  * decomp_EG_dimer: decomposition sampling for (EG)2
  * decomp_EG_trimer: decomposition sampling for (EG)3
  * active_one: first active learning round
  * active_two: second active learning round
* elements: element for each atom (same info as atomicNumbers)
* labels: a dictionary keyed by level of theory; for this dataset its only key is 'wB97X-D3BJ__def2-TZVPD', whose value contains the following keys:
  * atomizationEnergy: totalEnergy minus the sum of fitted per-atom reference energies, in Hartree (fitted values: H: -31.91939585331449, C: -6.840292406034887, O: -12.5395523076437)
  * dipoleMoment: DFT dipole moment of the system, in e·Angstrom
  * gradient: energy gradient dE/dx (the negative of the atomic forces), in Hartree/Angstrom
  * totalEnergy: total energy from wB97X-D3BJ/def2-TZVPD, in Hartree
  * xtbCharges: xtb partial charges, in elementary charge units
* multiplicity: spin multiplicity for each system (always 1 for this dataset)
* positions: xyz coordinates of atomic nuclei, in Angstroms
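The sketch below shows one way to work with this key layout. It assumes the datapoints have already been loaded into a Python dict keyed by the string indices; the distribution file format is not described here, so the loading step is left out. It pulls out NumPy arrays, turns the stored gradient into forces, recomputes atomizationEnergy from totalEnergy as a consistency check, and tallies datapoints per sampling step. The helper names (`unpack`, `recompute_atomization_energy`, `count_by_source`) are hypothetical. Two further points are assumptions: that `elements` holds element symbols such as 'C', and that `positions` and `gradient` can be reshaped to N x 3.

```python
from collections import Counter

import numpy as np

# The single level-of-theory key stored under "labels" in this dataset.
LEVEL = "wB97X-D3BJ__def2-TZVPD"

# Fitted per-atom reference energies in Hartree, copied from the atomizationEnergy entry.
ATOM_REF_HARTREE = {
    "H": -31.91939585331449,
    "C": -6.840292406034887,
    "O": -12.5395523076437,
}


def unpack(datapoint: dict) -> dict:
    """Convert one datapoint into NumPy arrays (assumes the key layout listed above)."""
    labels = datapoint["labels"][LEVEL]
    # reshape(-1, 3) accepts a flat list or a nested N x 3 list; the stored layout is an assumption.
    positions = np.asarray(datapoint["positions"], dtype=float).reshape(-1, 3)
    gradient = np.asarray(labels["gradient"], dtype=float).reshape(-1, 3)
    return {
        "elements": list(datapoint["elements"]),
        "positions": positions,                                    # Angstrom
        "forces": -gradient,                                       # Hartree/Angstrom, F = -dE/dx
        "total_energy": float(labels["totalEnergy"]),              # Hartree
        "atomization_energy": float(labels["atomizationEnergy"]),  # Hartree
    }


def recompute_atomization_energy(elements, total_energy: float) -> float:
    """totalEnergy minus the fitted per-atom energies (assumes symbols like 'C', 'H', 'O')."""
    return total_energy - sum(ATOM_REF_HARTREE[e] for e in elements)


def count_by_source(data: dict) -> Counter:
    """Number of datapoints produced by each sampling step (datasetTitle)."""
    return Counter(dp["datasetTitle"] for dp in data.values())
```

For example, with the loaded dict named `data`, `unpack(data["0"])` returns the first structure. Comparing `recompute_atomization_energy(...)` against the stored `atomization_energy` is a cheap check that the per-atom reference energies were applied as described.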
Authors:
Shaswat Mohanty,
James Stevenson,
Andrea Browning,
Leif Jacobson,
Karl Leswing,
Mathew D. Halls,
Mohammad Atif Faiz Afzal