ct8b00514_si_001.pdf (201.36 kB)
Formulation of Small Test Sets Using Large Test Sets for Efficient Assessment of Quantum Chemistry Methods
journal contribution
posted on 2018-07-13, 19:43 authored by Bun Chan
In the present study,
we have examined in detail literature data
of deviations for a wide range of (mainly) DFT methods for the extensive
MGCDB82 set (∼4400 data points) of main-group thermochemical
quantities. We use the data and standard statistical techniques (lasso
regularization and forward selection) to devise the MG8 model for
linearly combining assessment results of a collection of small data
sets to accurately estimate the mean absolute deviation (MAD) of MGCDB82. The MG8 model contains
a total of 64 data points representing noncovalent interactions, isomerization
energies, thermochemical properties, and barrier heights. It is thus
well suited for rapid evaluation of new quantum chemistry procedures.
We propose a value of ∼4 kJ mol⁻¹ for
the MAD estimated by the MG8 model (EMAD_MG8) as an initial
indicator of a highly robust quantum chemistry method, with large
deviations occurring mainly for properties (such as heats of formation)
that are known to be difficult to accurately compute. For methods
with larger EMADs, we emphasize the importance of more thorough testing,
as these methods are likely to have a larger number of outliers, and
it may be less trivial to anticipate circumstances under which large
deviations occur. In relation to this aspect, we have applied the
same generally applicable statistical techniques to further formulate
small-data-set models for assessing the accuracy for some properties
that are covered by neither MG8 nor MGCDB82. These include the MOR13
model for metal–organic reactions, the SBG5 model for semiconductor
band gaps, and MB13 for stress-testing methods with artificial species.
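
The idea of estimating a large-set MAD as a linear combination of small-subset MADs can be illustrated with a minimal sketch. It is not the author's procedure: it covers only greedy forward selection with an ordinary least-squares fit (the lasso step is omitted), it assumes a model without an intercept, and the function names (`solve`, `fit`, `sse`, `forward_select`, `estimate_mad`) as well as all numbers in the usage example are hypothetical placeholders rather than values from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y, cols):
    """Least-squares coefficients (no intercept) using only the chosen columns."""
    A = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    rhs = [sum(row[a] * t for row, t in zip(X, y)) for a in cols]
    return solve(A, rhs)


def sse(X, y, cols, coef):
    """Sum of squared errors of the linear model on the training rows."""
    return sum(
        (t - sum(c * row[j] for c, j in zip(coef, cols))) ** 2
        for row, t in zip(X, y)
    )


def forward_select(X, y, k):
    """Greedily add the subset (column) that lowers the fit error the most."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(len(X[0])) if j not in chosen]
        best = min(
            rest,
            key=lambda j: sse(X, y, chosen + [j], fit(X, y, chosen + [j])),
        )
        chosen.append(best)
    return chosen, fit(X, y, chosen)


def estimate_mad(subset_mads, cols, coef):
    """Estimated large-set MAD for one method from its small-subset MADs."""
    return sum(c * subset_mads[j] for c, j in zip(coef, cols))


# Made-up placeholder numbers, not data from the study.
# Rows: hypothetical methods; columns: MADs on three hypothetical subsets.
X = [[2, 5, 1], [4, 3, 2], [6, 8, 3], [8, 2, 5], [10, 7, 4]]
# Hypothetical large-set MADs for the same five methods.
y = [1.6, 3.2, 4.8, 6.8, 7.6]

cols, coef = forward_select(X, y, k=2)
print("selected subsets:", cols)
print("coefficients:", coef)
print("estimated MAD for a new method:", estimate_mad([3.0, 4.0, 2.0], cols, coef))
```

In this sketch, each method contributes one row of subset MADs, and the fitted coefficients play the role of the weights in the linear combination. A new method then needs to be evaluated only on the selected small subsets to obtain an estimated MAD.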