Gauge effects in local hybrid functionals evaluated for weak interactions and the GMTKN30 test set

ABSTRACT The so-called ‘gauge problem’, due to the non-uniqueness of exchange-energy densities, is a fundamental challenge for density functionals depending on these energy densities, such as local hybrid functionals. We have recently demonstrated how gauge effects influence the potential-energy curves of the argon dimer, and other quantities depending on ‘non-physical’ Pauli repulsions introduced by incompatible gauges of (semi-)local and exact-exchange energy densities . Introduction of suitable calibration functions depending only on semi-local quantities allowed to correct for these deficiencies and suggested ways to obtain more accurate local hybrid functionals beyond the local spin density approximation (LSDA) exchange-energy density. Here we extend the study of the gauge problem by comparing a number of uncalibrated and calibrated local hybrids for (1) the potential-energy curves of further noble-gas dimers and (2) for the entire GMTKN30 test set and its individual subsets. We find that DFT-D3 dispersion corrections fitted to be compatible with uncalibrated local hybrids have to correct not only for missing London dispersion but also for gauge artefacts that make weak interactions too repulsive. This burden is taken away when using properly calibrated local hybrids, which perform much better for dispersion-sensitive quantities already without D3 corrections, and which require only the physically relevant dispersion to be corrected for. The present results suggest directions for further improvement of calibration functions for local hybrids.

Apart from challenges regarding the new nonstandard two-electron integrals implied by Equation (1), for which we have recently provided efficient implementations using semi-numerical techniques [20,21], the socalled 'gauge problem' [22][23][24] has been a major stumbling block for the further development of local hybrids beyond the already achieved successes. The exchangeenergy densities mixed in Equation (1) are ambiguous. It is well known that any energy density can be modified (calibrated) by adding some non-trivial (identically nonvanishing) calibration function (CF), G(r), which integrates to zero. In the context of exchange-energy densities, calibration implies the modification with While this has no consequences for global hybrids, where a 0 can be taken out of the integration, it becomes a central issue for local hybrids with position-dependent a σ (r). Indeed, it has already been shown [18,24] that calibration may have an appreciable influence on the performance of local hybrids based on GGA or meta-GGA exchangeenergy densities.
While our initial successful local hybrids have been based on a mixing of only LSDA and EXX [6,7,10,12,14,15], we have recently shown that CFs constructed only from (semi-)local ingredients can improve performance of GGA-based local hybrids for thermochemistry appreciably [18]. Importantly, we could identify too repulsive potential-energy curves for weakly interacting dimers, such as the Ar 2 complex, as 'smoking gun' for the gauge problem. Indeed, adding suitable CFs corrected such unphysical Pauli repulsions and provided curves between those of the underlying GGA (LSDA) and the corresponding global hybrid incorporating the maximum EXX admixture occurring in the local hybrid, as one might expect [18]. We had previously optimised atom-pairwise additive dispersion corrections of Grimme's DFT-D3 type [25] for some of our local hybrids [13]. While these were LSDA-based local hybrids, the recent observations regarding a gauge problem for weak interactions suggested that the optimised DFT-D3 parameters may have partly compensated not only for van-der-Waals-type dispersion interactions but also for too repulsive curves caused by gauge problems. Here we therefore compare local hybrids with and without calibration for (1) some potential-energy curves of noble-gas dimers and (2) for the entire GMTKN30 test set of Goerigk and Grimme [26,27]. The latter contains distinct subsets of thermochemical and kinetic data with or without a strong dependence on dispersion interactions. By evaluating the effects of DFT-D3 corrections for calibrated (cf. Equations (3) and (4)) and uncalibrated local hybrids for each subset, we aim at gaining insight into the role of calibration of exchange-energy densities, and at obtaining a wider validation of several existing local hybrid functionals on the large GMTKN30 data set. The obtained results should subsequently aid in constructing more refined local hybrids that properly take into account the calibration of exchange-energy densities.
(1) The so-called 't-LMF' , either spin-channel separated where ρ σ is the alpha or beta spin density, or common for alpha and beta spins (also called 'ct-LMF').
where τ W is the von Weizsäcker kinetic energy density and τ is the Kohn-Sham local kinetic energy density. where is the dimensionless density gradient.
where ζ (r) is the spin polarisation

Modified dynamical correlation contribution
Lacking suitable calibration, our earliest successful local hybrids were based on the LSDA (Slater) exchangeenergy density only, and they were combined with LSDA dynamical correlation (VWN5 [28] or PW92 [29] parameterisations) [6,7,12]. While this combination benefits from some error compensation between long-range LSDA exchange and correlation in regions of space dominated by local exchange, LSDA correlation is obviously a poor approximation to combine with in regions dominated by EXX. The improvement which we have made in [15] is range-separation and reduction of SIE in shortrange PW92 correlation. The two resulting local hybrids may be tagged with either 'self-interaction-reduction for short-range LSDA correlation' (Lh-LSDA-SIR-SRc, λ < 1 in Equation (11) below, equal to factor b in the t-LMF in Equation (6) In Equation (11) have been explored, based either on the so-called 'erfgau' range-separation scheme or on an 'erf ' scheme (see [30] for details). µ in Equation (11) is the range-separation parameter. Combined with the common t-LMF (Equation (6)), optimisation of Lh-LSDA-SIR-SRc gave λ = b = 0.646 and that of Lh-LSDA-SIF-SRc (i.e. λ = 1) gave b = 0.706. These larger prefactors compared to the original values b = 0.48 for the spin-channel t-LMF and b = 0.534 for the ct-LMF were found to be beneficial for properties affected by SIE while still providing accurate thermochemistry results.

Calibration functions
In [18] we have introduced the following 'semi-local' CF, which satisfies many exact constraints (argument r is omitted below for brevity).
where f (s σ ) is a cut-off function ensuring a proper decay rate of the CF. While different cut-off functions are conceivable, in [18] we chose the simple Gaussian function Here we will evaluate two local hybrids based on this form of CF, on GGA energy densities, and only on the simple 'spin-channel' t-LMF in Equation (5): ε GGAx x σ and E GGAc c are GGA corrections to exchange and correlation, respectively. The two possibilities explored in [18] are: (1) GGAx= B88 [31] and GGAc = LYP [32,33], and (2) GGAx = GGAc = PBE [34]. Label 'CG' specifies a CF with a Gaussian cut-off function (13). Functional (14) contains five semi-empirical parameters: a x , a c , c, and two internal parameters of the LMF (b) and of the CF (η). For both functionals, the five parameters were optimised to minimise the weighted mean absolute errors (MAE) for the G2/97 test set of 148 atomisation energies [35,36], and for the BH76 set of 76 reaction barriers (38 hydrogen transfer [37] and 38 heavier-atom transfer reactions [38]), with equal weights. The specific parameters of all functionals evaluated in this work are summarised in Table S1 in Supporting Information.

Parameters of the DFT-D3-type dispersion corrections
The functional-specific D3 parameters s r,6 and s 8 for each local hybrid (with Chai/Head-Gordon damping [39]) were fitted to the S22 weak-interaction test set [40], using the refined reference data of Takatani et al. [41]. The optimised D3 parameters are listed in Table 1 (those for the global and double hybrids are from the literature [42]).

GMTKN30 error weighting
We followed Goerigk/Grimme's [26,27] The subsets are explained in detail in [26,27]. Some of the contents will be mentioned in the discussion of the results.

Further computational details
All calculations were performed with a local development version of the TURBOMOLE program (V6.4) [43]. Uncontracted def2-QZVP basis sets [44] have been employed throughout. Energies have been evaluated nonself-consistently, based on B3LYP [2,32,33,45] orbitals, a procedure that is adequate for the present preliminary evaluation of new functionals for energies (cf. [46] and references therein). We have additionally checked this by comparison with fully self-consistent calculations on the MB08 subset (selected for having the electronically most difficult cases within the GMTKN30 test set). Using the LSDA-based local hybrid with t-LMF (with D3 corrections), the self-consistent calculations provided an MAE (MSE) of 7.27 kcal/mol (-0.66 kcal/mol), whereas the post-B3LYP calculations with the same functional gave 7.31 kcal/mol (-0.68 kcal/mol). This confirms the negligible differences between the two setups for relative energies. Evaluation of the EXX energy density used our recently implemented semi-numerical implementation [20].

Artificial Pauli repulsion in weakly bound systems as manifestation of the gauge problem
As mentioned above, potential-energy curves of weakly bound complexes are an interesting 'smoking gun' for the gauge problem in local hybrids. In [18] we focused on the argon dimer. In the absence of a non-local vander-Waals correlation functional or of additive dispersion corrections like those evaluated further below, standard density functionals as well as local hybrids should provide repulsive interactions between van-der-Waalsbound fragments, due to Pauli repulsion [47]. Artificial binding may of course be due to deficiencies of the functional, in particular for LSDA. Without calibration, both LSDA-based and GGA-based local hybrids provided far too repulsive curves, outside the range ('corridor') expected from the underlying LSDA/GGA and the corresponding global hybrid with maximum EXX admixture occurring in the given local hybrid [18]. This erroneous behaviour can be corrected by suitable CFs, as we demonstrated for Ar 2 using a few different functionals [18]. Here we extend the study to the Ne 2 and NeAr dimers to evaluate parameter transferability. Figure 1 shows that with the set of parameters previously optimised on atomisation energies and reaction barriers [18], the calibrated Lh-PBE-CG functional indeed gives much more realistic potential-energy curves compared to the too repulsive uncalibrated ones. However, with these parameters we are still outside the corridor of PBE (a 0 = 0.0) and the 'upper-limit' PBE global hybrid (a 0 = 0.5) for all three systems. An increase of the linear prefactor c of the CF, Equation (14), by a factor of about 2.3 gives much improved potentialenergy curves (Figure 2). We note in passing that at large inter-nuclear separations and low densities the minuscule energy changes may require larger integration grids than used here for the semi-numerical integration scheme. As a downside, the larger value of c deteriorates performance for atomisation energies and barriers (e.g. to MAEs of 4.02 kcal/mol for W4-08 atomisation energies and to 4.71 kcal/mol for the BH76 reaction-barrier test set; see below). Optimisation of more flexible CFs should thus be evaluated. This is outside the scope of the present work and will be reported elsewhere.

Evaluation for the GMTKN30 database
We evaluate overall nine local hybrids of different generations, of which seven are LSDA-based functionals without calibration (going from the tand s-LMF to their spinpolarised counterparts [6,7,10,12,14], and to their latest range-separated and self-interaction free/reduced versions [15]), and two are GGA-based (Lh-BLYP-CG and Lh-PBE-CG) with calibration according to Equation (14) [18]. For comparison, we provide data from [42] for three global hybrids (B3LYP, PBE0 [48], and PW6B95) [49], and for the DSD-BLYP double hybrid [50]. While B3LYP and PBE0 were chosen as 'parent' global hybrids related to the two GGA-based local hybrids we will evaluate, we have included PW6PW95 as the 'best-performing global hybrid' (the highly parameterised, τ -dependent M06-2X [51] gave a still slightly lower WTMAD in [42]) and DSD-BLYP as the best-performing double hybrid (and overall best-performing functional) from [42]. Table 1 provides the total WTMAD values for all functionals, which are graphically presented (with and without D3 corrections) in Figure 3. Let us start with the performance in the absence of the dispersion corrections: we first note the order of performance of the global hybrids (with PW6B95 performing best) and of the clearly superior DSD-BLYP double hybrid (the main-group energetics of GMTKN30 appears to be particularly suitable to emphasise the strengths of double hybrids).
Strikingly, the uncalibrated local hybrids perform relatively poorly in the absence of D3 correction terms, either comparable or somewhat better/worse than B3LYP. Only the calibrated GGA-based local hybrids perform better without D3 terms, in particular Lh-PBE-CG. This functional is close to the quality of the best global hybrid, PW6B95. While the performance of all functionals benefits from the dispersion corrections (below we will analyse the most affected subsets in detail), the 'improvement' for the uncalibrated local hybrids is by far most pronounced. Indeed, after addition of D3 terms, the differences in the performance of the local hybrids become rather small, and even some of the uncalibrated LSDA-based ones can now compete with the best global hybrids. This confirms our previous suspicion (see above) that exaggerated Pauli repulsions due to the gauge problem affect appreciably the performance of the uncalibrated local hybrids for cases where weak interactions become important (cf. Figures 1 and 2). This will be evaluated in more detail below. Regarding the overall WTMADs, we see that reduction by D3 terms tends to be below 1 kcal/mol for PBE0, PW6B95, and DSD-BLYP (B3LYP exhibits a reduction by 2 kcal/mol or by a factor of 1.54; it is known to have some problems with intermediate-range correlation effects [52,53]). A factor between 1.2 and 1.5 is a typical range of reduction also for many other functionals, except possibly for some Minnesota functionals where mediumrange correlation effects are already fitted into the functional and the D3 corrections tend to be smaller [42]. In contrast, the effect of the D3 corrections is much larger for the uncalibrated local hybrids, between ca. 2.3 kcal/mol for the Lh-LSDA with ζ -t-LMF and ca. 3.8 kcal/mol for Lh-LSDA-SIR-SRc or, probably more revealingly, by factors between 1.81 and 2.33 (Table 1). Calibration reduces the effect of the dispersion terms, to ca. 1.8 kcal/mol for Lh-BLYP-CG and to ca. 1.1 kcal/mol for Lh-PBE-CG, or to factors 1.73 and 1.42, respectively. These results indicate the need for calibration and its effect on weak interactions. They suggest also, however, that the presently used gauge functions may not yet remove the gauge artefacts completely. This holds in particular for Lh-BLYP-CG, where we note the relatively large exponent η = 0.5 of the Gaussian cut-off function of the CF Equation (13) [18]). Obviously, matters are already significantly better for Lh-PBE-CG. Here the calibration with a more diffuse cut-off of η = 0.12 appears more effective (cf. Figure 2), and the effect of the D3 corrections on the WTMAD approaches the genuine physical dispersion contributions found for typical standard functionals (differences in the mediumrange repulsive/attractive character of various functionals notwithstanding). This suggests already the preliminary conclusion that there is substantial room for improvement of the CFs to provide even better-performing local hybrid functionals. We note also, however, that even the relatively crudely optimised [18] Lh-PBE-CG functional clearly outperforms its 'underlying' global hybrid PBE0, and the same holds for Lh-BLYP-CG vs. B3LYP ( Figure 3 and Table 1). To analyse the above observations more closely, Figure 4 concentrates on the MAEs for some subsets of GMTKN30, which are known to be particularly sensitive to dispersion contributions (see Tables S2-S4 in Supporting Information for detailed statistics for the various GMTKN30 subsets). We compare one uncalibrated LSDA-based local hybrid (Lh-LSDA-SIF-SRc, blue) and the calibrated Lh-PBE-CG (black) with and without dispersion corrections. In the absence of D3 corrections (solid lines), the calibrated functional is vastly superior in most cases. This holds in particular for the WATER27 test set. It is obvious that the gauge problem affects the weak intermolecular interactions in these water clusters appreciably for the uncalibrated functional (leading to far too weak binding and to an MAE of more than 30 kcal/mol). The calibrated functional does not exhibit this problem and has a modest MAE of ca. 5 kcal/mol without dispersion. Consequently, the D3 corrections reduce the MAE dramatically for the uncalibrated local hybrid and much less so for the calibrated one ( Figure 4 and Table S4). Other subsets where the gauge artefacts apparently spoil the performance of the uncalibrated local hybrid most pronouncedly are BHPERI, Al2X, NBPRC, DARC, ALK6, IDISP, S22, and ADIM6. After adding the D3 corrections, even the uncalibrated functional provides low MAEs for most of the subsets (Al2X, ALK6, DARC, and WATER27 still are reproduced relatively poorly). However, this is likely the right answer for the wrong reason, as we burden the D3 terms with also correcting for the gauge artefacts.
Extending the comparison between the uncalibrated and calibrated functional to all of the subsets of GMTKN30 ( Figure 5), we seek to evaluate if the gauge problem is also found for cases where dispersion is less important. We see some notable albeit not dramatic effects of the D3 corrections for the uncalibrated functional (but not for the calibrated one!) for MB08-165, SIE11, ACONF, and HEAVY28. Other sets are clearly unaffected by both dispersion effects and by gauge problems. Examples are W4-08 (atomisation energies), G21IP (ionisation potentials), G21EA (electron affinities), PA (proton affinities), BH76 (barriers), ISO34 (isomerisation energies), as well as SCONF and CYCONF. In case of SIE11 (SIE-sensitive energies), it has been found before that the D3 corrections actually increase the MAE [42]. It appears that this increase is also exaggerated by 'over-fitted' corrections for the uncalibrated local hybrids (Table S2).
As the S22 set has been used to fit the parameters of the D3 corrections, it is obvious that exaggerated dispersion terms must result from such a fit in case of uncalibrated local hybrids. Indeed, Table S5 in Supporting Information shows for the individual energies of the S22 set how the gauge problem clearly increases the magnitude of the dispersion terms in an unphysical way. For example, without dispersion corrections the deviations from the reference values for Lh-PBE-CG for the dispersion-dominated subclass of complexes are lower than those for the uncalibrated Lh-LSDA-SIF-SRc by typically a factor of 2-4. After D3 corrections, deviations for both functionals are generally below 1 kcal/mol.

Conclusions
This work has highlighted the influence of the so-called gauge problem of exchange-energy densities of local hybrid functionals for weak interactions. Extension of our previous evaluation for noble-gas dimer potentialenergy curves to further systems has confirmed that in the absence of a calibration of the energy densities, local hybrid functionals give far too repulsive interaction curves when compared to appropriate global hybrid functionals. Introduction of a CF allows this to be remedied. However, the present results suggest that we need to increase the flexibility of the CF further to make it more transferable.
We have subsequently chosen the large GMTKN30 test set of Goerigk and Grimme to shine further light on the interrelations between the gauge/calibration problem of local hybrids and weak interactions. As we focused in particular on subsets that depend crucially on dispersion interactions, we have optimised the adjustable parameters of the DFT-D3 corrections for a larger variety of local hybrids to the S22 weak-interaction energies. The results show clearly that for uncalibrated local hybrids, the D3 corrections not only include the genuine dispersion contributions but also correct for the systematically too repulsive behaviour of the underlying functional. In contrast, use of calibrated energy densities diminishes significantly the extent of the D3 corrections, which now largely account indeed only for the physical interactions, as they were intended to do. For example, the calibrated Lh-PBE-CG functional performs comparably to the best global hybrids in the literature (and clearly superior to its underlying global hybrid PBE0) for the entire GMTKN30 set and its subsets, and it requires very similar dispersion corrections. The present insights provide us with further directions on how to improve CFs and to thus arrive at improved local hybrid functionals. Due to their enhanced flexibility, these should outperform global hybrids for a wide range of properties. This is important, as positiondependent EXX admixture promises improvements significantly beyond the energy quantities studied here.