ct8b00514_si_001.pdf (201.36 kB)

Formulation of Small Test Sets Using Large Test Sets for Efficient Assessment of Quantum Chemistry Methods

journal contribution
posted on 2018-07-13, 19:43 authored by Bun Chan
In the present study, we have examined in detail the literature data on deviations for a wide range of (mainly) DFT methods for the extensive MGCDB82 set (∼4400 data points) of main-group thermochemical quantities. We use these data and standard statistical techniques (lasso regularization and forward selection) to devise the MG8 model, which linearly combines the assessment results of a collection of small data sets to accurately estimate the MAD for MGCDB82. The MG8 model contains a total of 64 data points representing noncovalent interactions, isomerization energies, thermochemical properties, and barrier heights. It is thus well suited for rapid evaluation of new quantum chemistry procedures. We propose a value of ∼4 kJ mol⁻¹ for the MAD estimated by the MG8 model (EMAD_MG8) as an initial indicator of a highly robust quantum chemistry method, with large deviations occurring mainly for properties (such as heats of formation) that are known to be difficult to compute accurately. For methods with larger EMADs, we emphasize the importance of more thorough testing, as these methods are likely to have a larger number of outliers, and it may be harder to anticipate the circumstances under which large deviations occur. In relation to this aspect, we have applied the same generally applicable statistical techniques to formulate further small-data-set models for assessing the accuracy for some properties that are covered by neither MG8 nor MGCDB82. They include the MOR13 model for metal–organic reactions, the SBG5 model for semiconductor band gaps, and the MB13 model for stress-testing methods with artificial species.
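As a rough illustration of the forward-selection idea mentioned in the abstract, the sketch below greedily picks a few small-set MADs whose linear combination best reproduces an overall MAD. It is not the author's code or the MG8 model: the function name `forward_select`, the array names, the set sizes, and all numbers are hypothetical, and the data are randomly generated. The lasso step is not shown.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily choose k columns of X whose least-squares linear
    combination (no intercept) gives the smallest residual sum of squares."""
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((X[:, cols] @ coef - y) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    # Refit the coefficients on the final set of columns.
    coef, *_ = np.linalg.lstsq(X[:, chosen], y, rcond=None)
    return chosen, coef

# Synthetic stand-in data (assumption): rows are methods, columns are
# MADs on candidate small data sets, in arbitrary energy units.
rng = np.random.default_rng(0)
n_methods, n_subsets = 40, 12
subset_mads = rng.uniform(1.0, 20.0, size=(n_methods, n_subsets))

# Hypothetical "overall MAD" built from three of the small sets.
true_weights = np.zeros(n_subsets)
true_weights[[2, 5, 9]] = [0.5, 0.3, 0.2]
overall_mad = subset_mads @ true_weights

chosen, coef = forward_select(subset_mads, overall_mad, 3)
estimated_mad = subset_mads[:, chosen] @ coef  # estimated MAD per method
print("selected small sets:", chosen)
print("weights:", np.round(coef, 3))
```

In this sketch the weights are refit by ordinary least squares after each selection step; a real application would use measured deviations for each method rather than random numbers.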
