The performance of Dunning, Jensen, and Karlsruhe basis sets on computing relative energies and geometries

ABSTRACT In an effort to assist researchers in choosing basis sets for quantum mechanical modeling of molecules (i.e. balancing calculation cost versus desired accuracy), we present a systematic study on the accuracy of computed conformational relative energies and their geometries in comparison to MP2/CBS and MP2/AV5Z data, respectively. In order to do so, we introduce a new nomenclature to unambiguously indicate how a CBS extrapolation was computed. Nineteen minima and transition states of buta-1,3-diene, propan-2-ol and the water dimer were optimized using 45 different basis sets. Specifically, this includes one Pople (i.e. 6-31G(d)), 8 Dunning (i.e. VXZ and AVXZ, X = 2–5), 25 Jensen (i.e. pc-n, pcseg-n, aug-pcseg-n, pcSseg-n, and aug-pcSseg-n, n = 0–4), and 9 Karlsruhe (e.g. def2-SV(P), def2-QZVPPD) basis sets. The molecules were chosen to represent both common and electronically diverse molecular systems. In comparison to MP2/CBS relative energies computed using the largest Jensen basis sets (i.e. n = 2,3,4), the use of smaller sizes (n = 0,1,2 and n = 1,2,3) provides results that are within 0.11–0.24 and 0.09–0.16 kcal·mol⁻¹, respectively. To practically guide researchers in their basis set choice, an equation is introduced that ranks basis sets based on a user-defined balance between their accuracy and calculation cost. Furthermore, we explain why the aug-pcseg-2, def2-TZVPPD and def2-TZVP basis sets are very suitable choices to balance speed and accuracy.


Introduction
In projects involving quantum mechanics (QM) calculations, a compromise must often be reached between the computational cost (theory level and basis set size) and the desired precision of the resulting data. The Dunning series of correlation-consistent polarized basis sets (i.e. VXZ and AVXZ) [1][2][3][4] is frequently used and considered among the best for generating accurate data. However, these basis sets are also among the most costly to use. In an effort to reduce calculation cost while maintaining good data accuracy, alternative basis sets have been proposed. Two such basis sets are Jensen's polarization-consistent (e.g. pc-n, aug-pcseg-n) [5][6][7][8][9][10] and response property-optimized Karlsruhe (i.e. def2) [11][12][13][14][15][16] basis sets.
The Jensen basis sets were developed for use in Hartree-Fock (HF) and density functional theory (DFT) calculations. These basis sets were designed to converge faster than Dunning basis sets without the loss of accuracy in certain observables (e.g. absolute energies, electron affinity, dipole moment, polarizability). Two groups have investigated how Jensen's pc-n basis set family performs for correlated wave function (i.e. MP2 and CCSD(T)) optimizations. For isolated molecules, Kupka and Lim showed that MP2 and CCSD(T) calculations using pc-n basis sets performed favorably in comparison to Dunning basis sets and at a reduced computational cost. [17] More recently, Kupka and coworkers analyzed the aug-pc-n basis sets for predicting the water dimer minimum's interaction energy using CCSD(T) theory, and their convergence to the complete basis set (CBS) limit using different fitting approaches. [18] ElSohly and Tschumper extended this work by investigating weakly-bonded systems. Using modified pc-n basis sets that include diffuse functions to better model weak forces, they found that the basis sets could reproduce benchmark values, but at a higher cost than the Dunning basis sets. [19] The early Karlsruhe basis sets (e.g. [15][16] The more recent basis sets that include diffuse functions were optimized for reproducing dipole polarizabilities computed by HF, DFT, MP2 and their corresponding post-SCF resolution-of-the-identity (RI) calculations. [11,12] The most recent of these basis sets were shown to calculate interaction energies (using the S22 data set [20]), electron affinities, dipole moments and polarizabilities of atoms and relatively inflexible molecules (e.g. He₂, Be(CH₃)₂, Mo(CO)₆) that are competitive with the larger Dunning basis sets. [11] For organic compounds (e.g. [23] In particular, one study computed the relative energy profile for a torsion rotation that included transition states (TS), but did not compare the resulting data to higher theory levels. [22]
To our knowledge, there have been two studies that include a comparison between Jensen and Karlsruhe basis sets. Johansson and Olsen computed the transition states for rotating about the central bond of biphenyl using pc-n (n = 1-4), aug-pc-n (n = 1-3), def2-SVP, def2-TZVPP, and def2-QZVPP basis sets using HF and DFT theories. [24] In a larger study, Witte et al. examined the basis sets' performances when used with DFT theory for reproducing intermolecular interactions of the S22 data set. [25] They concluded that once basis set superposition error has been corrected for, def2-SVPD performs well relative to similarly sized Dunning and Jensen basis sets, while def2-QZVPD is a practical alternative to pc-4.
Herein, we present an extension of the above papers by comparing an extensive collection of Jensen, Karlsruhe and Dunning basis sets for computing MP2 geometries and relative conformational energies of minima and transition states. The choice of MP2 theory was motivated by the fact that it is a well-established theory whose behavior is systematic as a function of basis set size (i.e. variational-like). While modern DFT methods can offer better performance at a reduced cost, we feel that the rapid progress and development of new functionals make the longevity of a given theory difficult to predict. The results herein will be of interest to researchers who model larger and more flexible molecules (e.g. carbohydrates, drugs), whose potential energy surfaces are not easily explored using more rigorous theories (e.g. CCSD(T)), but who want to include some level of electron correlation in their modeling.
The segmented contracted basis sets by Jensen (i.e. pcSseg-n and aug-pcSseg-n) were specifically created to calculate nuclear magnetic shielding constants. [10] Consequently, one might not expect them to provide highly accurate relative energies and geometries. Nevertheless, we include them in this study in the hope that they might fortuitously provide good results at a reasonable expense.
The target benchmark data include MP2/AV5Z optimized geometries and MP2/CBS relative energies for buta-1,3-diene, propan-2-ol and the water dimer. These molecules were chosen to represent diverse organic systems (i.e. containing H, C, and O) that comprise significant electron delocalization, non-negligible dipole moments and hydrogen bonds. By including both minima and transition states, we further enrich the variety of electronic configurations that are studied. Finally, we also compare two schemes for computing CBS energies and introduce an equation that ranks theory levels according to their calculation cost, accuracy and a user-specified weighting factor.

Optimizations and Frequency Calculations
PyMol [26] was used to create the initial conformations of buta-1,3-diene and propan-2-ol, which were then fully optimized at the MP2/AV5Z theory level. The resulting geometries were subsequently used as input for the other theory levels herein. For the water dimer, all input configurations were taken from the supplementary information of Tschumper et al., [27] which were computed at the CCSD(T)/TZ2P(f,d)+dif theory level. To simplify the writing, water configurations will be referred to as conformations. To characterize the structures' positions on their potential energy surface, MP2/AVTZ//MP2/AVTZ frequency calculations were computed using finite differences of gradients.
In all MP2 calculations, only the valence electrons were correlated, and the RI approximation [28,29] with auxiliary basis sets [16,30-33] was used (see SI material for full listing). Exceptions to this include calculations involving correlation-consistent polarized core-valence basis sets (i.e. ACVXZ) and two CCSD(T)/TZ2P(f,d)+dif calculations (i.e. water dimer 2 and 7), where core electrons were included in the correlation (see Tables S1 and S2). All optimizations were performed using C1 molecular symmetry unless otherwise stated, with the following convergence criteria specified: a maximum force of 1.5E-5, a root-mean-squared (RMS) force of 1.0E-5, a maximum displacement of 6.0E-5 and an RMS displacement of 4.0E-5.
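The four convergence thresholds listed above can be collected into a small check. The following is only an illustrative sketch: the class and function names are hypothetical, the values are assumed to be in the optimizer's native (atomic) units, and it assumes that all four criteria must be satisfied at once, which individual programs may relax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceCriteria:
    """Thresholds quoted in the text (units assumed to be atomic units)."""
    max_force: float = 1.5e-5
    rms_force: float = 1.0e-5
    max_disp: float = 6.0e-5
    rms_disp: float = 4.0e-5


def is_converged(max_force, rms_force, max_disp, rms_disp,
                 crit=ConvergenceCriteria()):
    """Return True only if every quantity is at or below its threshold.

    Assumption: all four criteria are required simultaneously.
    """
    return (abs(max_force) <= crit.max_force
            and abs(rms_force) <= crit.rms_force
            and abs(max_disp) <= crit.max_disp
            and abs(rms_disp) <= crit.rms_disp)
```

For example, `is_converged(1.0e-5, 5.0e-6, 5.0e-5, 3.0e-5)` passes, while raising the maximum force to `2.0e-5` fails the check.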

Complete Basis Set Extrapolation
In practice, extrapolating energies to the complete basis set limit can be done using two different schemes. These differ by the choice of the molecular geometry used for determining the component energy computed by each basis set. For clarification, a generalized representation for computing a CBS energy is

$$E^{\mathrm{THEORY}}_{\mathrm{CBS}}(R_{\mathrm{source}}) = E^{\mathrm{SCF}}_{\mathrm{CBS}}(R_{\mathrm{source}}) + E^{\mathrm{CORR}}_{\mathrm{CBS}}(R_{\mathrm{source}})$$

where $E^{\mathrm{THEORY}}_{\mathrm{CBS}}$ is the CBS limit of the total energy computed using a given theory level sequence, $E^{\mathrm{SCF}}_{\mathrm{CBS}}$ is the CBS limit of the HF energy, $E^{\mathrm{CORR}}_{\mathrm{CBS}}$ is the CBS limit of the correlation energy, and $R_{\mathrm{source}}$ indicates the source of the geometries used in the calculations.
In Scheme 1, a single source (i.e. reference) geometry is used for computing the required energies for an extrapolation. Generally, the geometry comes from either experimental spectroscopy or a high-level ab initio optimization. This scheme reduces the overall computational expense since multiple geometry optimizations are not performed. An example for noting this scheme that uses an MP2/AV5Z optimized geometry is where the numbers of the sequential basis sets used are given within the square brackets. The double slash notation within these brackets signifies that three and two basis sets (see below) were used to obtain the CBS limits of the HF and correlation energies, respectively.
In Scheme 2, multiple geometries are used in computing the component energies. Specifically, these are geometries that were optimized using each basis set specified for the extrapolation workflow. An example for an extrapolation that uses MP2/AVDZ, MP2/AVTZ, and MP2/AVQZ optimized geometries is One advantage of this scheme is that it can enable a CBS limit to be obtained for internal coordinates. [34]
In all CBS limit calculations, $E^{\mathrm{SCF}}_{\mathrm{CBS}}$ was obtained by extrapolation using three sequential basis sets and Feller's exponential equation. [35,36] Conversely, $E^{\mathrm{CORR}}_{\mathrm{CBS}}$ was obtained by extrapolation using the two largest basis sets within the sequence and Helgaker's power equation with α = 3.0. [37,38] The difference in the basis set involvement for computing $E^{\mathrm{SCF}}_{\mathrm{CBS}}$ and $E^{\mathrm{CORR}}_{\mathrm{CBS}}$ gives rise to the notation within the square brackets (e.g. MP2/AV[D,T,Q//T,Q]). In the current study, the individual component SCF energies were extracted from the MP2 optimizations rather than performing additional HF optimizations. Additional details of the equations used are given in the supplementary information.
For an unambiguous shorthand nomenclature of how a CBS limit was computed, one can extend the double slash notation (e.g. MP2/AVDZ//MP2/AV5Z) to include the numbers of the basis set triads (e.g. [T,Q,5]) used in the extrapolation workflow. For both schemes, this can be illustrated in the following examples:
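The two extrapolations can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in this study: it assumes the commonly quoted closed forms, namely a three-point exponential E(X) = E_CBS + A·exp(−B·X) for the SCF energy and a two-point power form E(X) = E_CBS + A·X^(−α) with α = 3 for the correlation energy, both for consecutive cardinal numbers X. The exact expressions in the supplementary information may differ in detail, and all function names are hypothetical.

```python
def scf_cbs_exponential(e1, e2, e3):
    """Three-point exponential extrapolation for consecutive cardinal numbers.

    Assumes E(X) = E_CBS + A*exp(-B*X); e1, e2, e3 are the SCF energies for
    X, X+1 and X+2. Written in the numerically stable Aitken form.
    """
    curvature = (e3 - e2) - (e2 - e1)
    if curvature == 0.0:
        raise ValueError("energies are collinear; exponential fit is undefined")
    return e3 - (e3 - e2) ** 2 / curvature


def corr_cbs_power(e_small, e_large, x_large, alpha=3.0):
    """Two-point power extrapolation, assuming E(X) = E_CBS + A*X**(-alpha).

    e_small and e_large are correlation energies for X-1 and X = x_large.
    """
    w_large = float(x_large) ** alpha
    w_small = float(x_large - 1) ** alpha
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def total_cbs(scf_energies, corr_energies, x_largest, alpha=3.0):
    """Combine the pieces in the [X1,X2,X3//X2,X3] pattern described above:
    three SCF energies, but only the two largest correlation energies."""
    e_scf = scf_cbs_exponential(*scf_energies)
    e_corr = corr_cbs_power(corr_energies[-2], corr_energies[-1],
                            x_largest, alpha)
    return e_scf + e_corr
```

Under Scheme 1 every input energy would come from single points on one reference geometry, whereas under Scheme 2 each energy would come from the geometry optimized with its own basis set; the arithmetic is identical in both cases.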

Indexing of Jensen Basis Sets
In order to make comparisons as easy to read and understand as possible, the Dunning and Jensen basis set nomenclature needed to be adjusted such that the indexing becomes consistent between them. Therefore, we chose an index range from 1 to 5 instead of 0 to 4, which also allows for clear CBS extrapolations. Consequently, the Jensen index will be adjusted by +1 (i.e. X = n + 1, n = 0,1,2,3,4) throughout this work, and will be referred to as such (e.g. pc-X, X = 1,2,3,4,5) unless specifically indicated otherwise.
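Because the shifted index is easy to confuse with Jensen's original labels, a small conversion helper can be useful when comparing with other publications. The sketch below is illustrative only; the function name and the regular expression are hypothetical conveniences, not part of any basis set library.

```python
import re

# Matches Jensen family names written with the original n index (0-4),
# e.g. "pc-0", "pcseg-2", "aug-pcSseg-4".
_JENSEN_NAME = re.compile(r"^(?P<family>(?:aug-)?pc(?:seg|Sseg)?)-(?P<n>[0-4])$")


def jensen_n_to_x(name):
    """Convert a Jensen label from n indexing to the X = n + 1 indexing."""
    match = _JENSEN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized Jensen basis set label: {name!r}")
    return f"{match.group('family')}-{int(match.group('n')) + 1}"
```

For instance, the basis set published as aug-pcseg-2 is written as aug-pcseg-3 under the X indexing, and pc-0 becomes pc-1.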

Results and Discussion
A total of four, seven, and eight conformations of buta-1,3-diene, propan-2-ol and the water dimer, respectively, were optimized by nearly all basis sets used herein, and are shown in Figure 1. The specific instances where an optimization was unable to converge include water dimer 4 (MP2/aug-pcSseg-2 and MP2/def2-SVPD) and 5 (MP2/aug-pcSseg-1, MP2/aug-pcSseg-2 and MP2/def2-SVPD). None of the theory levels employed were able to converge to the water dimers 2 and 7 that were reported in reference 27.
In an attempt to reproduce the water dimers 2 and 7 as stationary points, we performed full optimizations at the CCSD(T)/TZ2P(f,d)+diff theory level (i.e. the same theory level reported in reference 27), using our specified convergence criteria (see Methods), enforcing Cs symmetry for 7 and C1 for 2, and correlating all electrons. Using this theory level, we were able to optimize water dimer 7. A full optimization at MP2/TZ2P(f,d)+diff (valence electrons correlated) also found 7 as a stationary point. Finally, we added the three diffuse functions (i.e. α_s(H) = 0.03016, α_s(O) = 0.08993, and α_p(O) = 0.05840) to the VTZ basis set (i.e. VTZ+diff), which also made it possible to optimize 7 using MP2 theory. An MP2/VTZ optimization of 7 was also performed with Cs molecular symmetry enforced, which resulted in the structure optimizing to water dimer 1. Thus for water dimer 7, the presence of these diffuse functions is critical for its characterization as a stationary point. In all of these additional calculations, water dimer 2 remained elusive. Additional information can be found in the SI material.

CBS Extrapolation Using Schemes 1 and 2
As briefly mentioned in the introduction, one goal of this study is to evaluate the impact of the more severe approximations made in Scheme 1 in comparison to the more rigorous approach of Scheme 2. Due to its widespread and long-standing use, the AVXZ (X = D,T,Q,5) basis set family was used for this part of the study.
Table 1 provides the relative energies for all conformations investigated, computed using both CBS schemes. Included in this table are previously published CCSD(T) relative energies, computed using a variety of approaches, [27,46,47] which are the current state-of-the-art results.
On average, the use of Scheme 1 and 2 results in values that are within 0.002 kcal·mol⁻¹ of one another when using the [D,T,Q] or [T,Q,5] basis set triads. The largest deviation observed is |0.007| kcal·mol⁻¹, computed using the [T,Q,5] triad for water dimer 6. Thus, for the augmented Dunning basis sets and the molecules studied here, the two extrapolation schemes yield nearly equivalent MP2/CBS values. Note that this result is based on the fact that a very reliable geometry (i.e. MP2/AV5Z//MP2/AV5Z) was used for Scheme 1. It is reasonable to assume that if one were to use a less reliable geometry instead (e.g. computed at a lower theory level), then Scheme 1 and 2 would likely yield less equivalent results.
Within each scheme, increasing the basis set triad used (i.e. [D,T,Q] → [T,Q,5]) alters the relative energies by values ranging from −0.030 (water dimer 6) to +0.062 (buta-1,3-diene 3) kcal·mol⁻¹. The average absolute difference is computed to be 0.015 and 0.014 kcal·mol⁻¹ for Scheme 1 and 2, respectively. In comparison to the small differences observed when changing the CBS scheme used, the size of the basis set triad has a noticeable impact on the resulting MP2/CBS relative energy values.

Electronic Energies
Observing data trends and why they occur improves our understanding of basis set behavior and helps to identify questionable data in future calculations. Both the Dunning and Jensen basis set families allow for straightforward CBS limit extrapolations. As expected due to the variational-like behavior of MP2, all MP2 electronic energies (Table S3) asymptotically approach a CBS limit (Figures S1-S3). Using these energies, the CBS SCF and MP2 energies were computed and are graphically shown as a function of their triad sequence in Figures S4-S6. For both Dunning basis set families (i.e. VXZ, AVXZ), more negative MP2/CBS energies are always computed when using a larger basis set triad. This trend is also seen for the pcseg-X and aug-pcseg-X Jensen basis set families.
However, exceptions occur for pc-X, pcSseg-X and aug-pcSseg-X, where the use of a [X = 3,4,5] triad results in more positive MP2/CBS energies, depending upon the chemical system. As a mathematical property of the extrapolation, the MP2/CBS electronic energy computed using a specific triad results in a more negative value in comparison to the component energies that were used in the extrapolation process (e.g. AVDZ > AVTZ > AVQZ > A[D,T,Q]Z). Therefore, the fact that some [X = 3,4,5] CBS energies are more positive than those computed by the [X = 2,3,4] triad is a consequence of how their respective component energies are distributed (i.e. the shape of the curve formed by the three points). Recall also that the pcSseg basis sets were optimized for nuclear magnetic shielding constants, not for determining energies.
Decomposing the MP2/CBS energies into CBS SCF and correlation energies reveals that the SCF component consistently becomes more positive as the triad increases (e.g. [2,3,4] → [3,4,5]), with the single exception of the VXZ basis set family. Conversely, the correlation energies do not follow a consistent trend. By examining Figures S4-S6, it is clear that the MP2/CBS energy trends follow those of the CBS correlation energies as the triad sequence increases.

Relative Energies
Considering that multiple CBS values can be computed for a given basis set family (for example, a pc-X CBS energy can be computed using the [X = 1,2,3], [X = 2,3,4], or [X = 3,4,5] triad), we also examined how the extrapolated relative energies computed using the smaller triads compare to those computed using the largest triad. Here we make the assumption that the CBS values computed using the largest triad are the most accurate within a given family. Figures 2-4 plot the MP2/CBS relative energies (Scheme 2, Table S4) for the three systems studied as a function of the Dunning and Jensen triad sequence. What is reassuring, in comparison to the MP2/CBS electronic energies that show significant changes as the triad size increases (Figures S4-S6), is that the computed relative energies show little variation. Consequently, while the ultimate goal here is to obtain the best CBS relative energies through the use of the largest triad, one can get reasonably close values through the use of a smaller triad (e.g. [1,2,3]), as elaborated upon in the following paragraphs.
On average, the MP2/CBS relative energies computed using the [X = 1,2,3] Jensen triads result in values that are within 0.11-0.24 kcal·mol⁻¹ of those computed using their substantially larger [X = 3,4,5] triad (Table 2). Similarly, the use of [X = 2,3,4] triads provides improved agreement, with values within 0.09-0.16 kcal·mol⁻¹ of the [X = 3,4,5] computed CBS limit. These values are similar to the ones reported in Kupka and coworkers' investigation of the water dimer minimum at the CCSD(T) theory level. [18] One would expect these agreements to become slightly worse if further compared to extrapolated values computed using [X = 4,5,6] and larger triads.
Highlighting specific results, the aug-pcseg-[X = 1,2,3] triad CBS extrapolations perform the worst for buta-1,3-diene, with a mean absolute difference of 0.598 kcal·mol⁻¹. A significant improvement occurs when computing the aug-pcseg-[X = 2,3,4] extrapolations, with a resulting value of 0.138 kcal·mol⁻¹. For buta-1,3-diene, this improvement is mirrored within each of the Jensen basis set families. However, this is not consistently seen for propan-2-ol or the water dimer. Notable are the pcseg-X, aug-pcseg-X, and pcSseg-X families, where the [X = 1,2,3] triad results in better average agreement with the [X = 3,4,5] triad CBS extrapolated relative energies for both propan-2-ol and the water dimer. Finally, as seen in previous studies, [27] the SCF component of the CBS energy is largely responsible for the general distribution of conformer stability (see Figures S7-S9). Including correlation energy slightly modulates the separation and order of the relative conformer stability.

Including Core Electron Correlation
Correlating only the valence electrons introduces some error into the resulting relative energies. Table 3 provides the MP2/CBS relative energies computed using both the standard augmented valence Dunning basis sets (i.e. AVXZ) and the augmented Dunning basis sets that were specifically created for including core electron correlation in a calculation (i.e. correlation-consistent polarized core-valence basis sets: ACVXZ). [48] If one were to include core electron correlation in a CBS extrapolation involving valence basis sets (e.g. AVXZ), then an absolute difference of 0.12 kcal·mol⁻¹ would be, on average, expected based on the three molecules reported herein.
Alternatively, if the core-valence basis sets (i.e. ACVXZ) are used, including the core electrons in the correlation only changes the relative energies by an average absolute value of 0.02 kcal·mol⁻¹ in comparison to the equivalent valence basis sets (i.e. ACV[2,3,4]Z versus AV[2,3,4]Z) without core-electron correlation. However, this value reflects both the addition of core electron correlation and the basis set enlargement, since the ACVXZ basis sets contain more functions than the AVXZ basis sets. Thus from a practical viewpoint, the optimizations that correlate only the valence electrons using the valence basis sets provide relative energies that are very close to those that correlate both core and valence electrons using the core-valence basis sets. While one might say that neglecting core-electron correlation in the CBS extrapolation results in a 0.12 kcal·mol⁻¹ error, an apparent fortuitous cancellation of errors suggests that such a statement would be slightly misleading.

Individual Basis Set Performance
If computing MP2/CBS energies is not possible or desirable, then knowing how individual basis sets perform becomes important. For the remainder of the analysis, the MP2/AV[T,Q,5]Z//MP2/AV[T,Q,5]Z CBS values (Table 1, Scheme 2) will be used as target reference values. For geometry comparison, the MP2/AV5Z//MP2/AV5Z structures will be used as benchmark targets.

Relative Energies
The relative conformational energies computed using each basis set are given in Table S5. Table S6 provides the individual absolute errors, the mean absolute errors for each molecule and the overall mean absolute errors. The overall mean absolute errors of the MP2 relative energies computed using the Dunning, Jensen, and Karlsruhe basis sets are plotted in Figure 5. As expected, the general trend is that the smallest basis set within each family yields the least accurate relative energies, with the unpolarized Jensen basis sets being the least accurate. In the release of the pc-n basis sets, Jensen stated that pc-0 (herein labeled as pc-X, X = 1) was expected to be inaccurate. [5] The addition of a polarization function (e.g. pc-n, n = 1) reduces the error to under 0.3 kcal·mol⁻¹.
Interestingly, there are two instances where using a sequentially larger basis set results in a slightly larger overall mean absolute error for the relative energy: aug-pcseg-3 (0.03 kcal·mol⁻¹) → aug-pcseg-4 (0.05 kcal·mol⁻¹), and def2-SV(P) (0.37 kcal·mol⁻¹) → def2-SVP (0.39 kcal·mol⁻¹). Decomposing the overall mean error as a function of the molecule reveals that these results originate from propan-2-ol and the water dimer (see Table S6; Figures S10-S12). Concerning the Karlsruhe basis sets, def2-SVP (0.48 kcal·mol⁻¹) performs worse than def2-SV(P) (0.36 kcal·mol⁻¹) only for the water dimer, even though it includes an additional polarizing p function on hydrogen atoms. [16] Part of this comes from def2-SVP incorrectly finding the water dimer's conformer 4, a cyclic structure, to be more stable than the true global minimum, conformer 1.
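The error measure used in this section can be written out explicitly. The sketch below is a minimal illustration with hypothetical function names; it assumes that relative energies are taken with respect to one chosen conformer and converted from hartree with the standard factor of about 627.5095 kcal·mol⁻¹ per hartree. Whether the reference conformer (whose relative energy is zero by construction) is counted in the mean is a choice left to the caller.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard conversion factor


def relative_energies(energies_hartree, reference_index=0):
    """Relative conformer energies in kcal/mol with respect to one conformer."""
    ref = energies_hartree[reference_index]
    return [(e - ref) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


def mean_absolute_error(values, targets):
    """Mean absolute error between computed and target relative energies."""
    if len(values) != len(targets) or not values:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(v - t) for v, t in zip(values, targets)) / len(values)
```

As a made-up example, computed relative energies of 0.0, 1.0 and 2.5 kcal·mol⁻¹ compared against targets of 0.0, 1.2 and 2.4 kcal·mol⁻¹ give a mean absolute error of 0.1 kcal·mol⁻¹.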
If one would like to model a large molecule, HF/6-31G(d) optimizations are considered a good theory level due to a fortuitous cancellation of errors and the basis set's small size. If one can afford a slightly more expensive level, then the MP2/def2-TZVP theory level provides relative energies and geometries (Figure 6) that are significantly better than HF/6-31G(d). Also notable is the smaller MP2/def2-SVPD theory level. However, as seen in Figures S10-S18, its improvement is system dependent. For buta-1,3-diene, MP2/def2-SVPD provides nearly equivalent results to HF/6-31G(d), while for the water dimer it shows a significant improvement.

Table 3. MP2/CBS relative energies (kcal·mol⁻¹) computed with both valence and core electrons correlated using Scheme 2. Differences are provided relative to values computed when correlating only the valence electrons (see Table 1).

Geometries
With regard to reproducing MP2/AV5Z geometries, all-atom and heavy-atom RMSD values for each conformation are provided in Table S7, and their mean values computed for each molecule are plotted in Figures S13-S18. Figure 6 shows the overall mean all-atom RMSD values for all conformations investigated. As seen in this plot, the RMSD improves as the number of basis functions increases within each basis set family, with the exception of def2-SVP (0.146 Å versus def2-SV(P)'s 0.124 Å). This exception is due to the RMSD contributions from the water dimer (see Figure S15), and mirrors the increased error seen in def2-SVP's relative energies. Furthermore, through a comparison to the heavy-atom RMSD values (Table S7 and Figure S18), one can conclude that a significant amount of def2-SVP's error comes from the positions of the water dimer's hydrogen atoms. An examination of the mean all-atom RMSD values as a function of the molecule finds one additional minor inconsistency worth noting: MP2/def2-TZVPP results in a 0.003 Å worse geometry than MP2/def2-TZVP for the water dimer. On average, the geometries optimized using triple zeta basis sets provide structures that are within 0.005 to 0.034 Å of the target geometries, with the best and worst being aug-pcseg-3 and VTZ, respectively. Similarly, geometries optimized using quadruple zeta basis sets are within 0.005 Å of the AV5Z target geometries, with the exception of VQZ (0.017 Å). For both VTZ and VQZ basis sets, the water dimer contributes the most to the overall mean RMSD error. Consequently, one should preferentially use the AVDZ and AVTZ over the VTZ and VQZ basis sets since they provide better geometries and relative energies, which also holds true for propan-2-ol. Interestingly, however, the VTZ and VQZ basis sets perform better for buta-1,3-diene. Finally, AVQZ and all of the Jensen quintuple basis sets (i.e. X = 5) produce geometries that are within 0.002 Å of the target geometries.
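The all-atom and heavy-atom RMSD measures can be illustrated with a short function. This is a simplified sketch rather than the procedure used here: it assumes the two structures share the same atom ordering and have already been superimposed (no alignment step is performed), and it treats every non-hydrogen atom as a heavy atom. The function name and arguments are hypothetical.

```python
import math


def rmsd(coords_a, coords_b, elements=None, heavy_only=False):
    """RMSD (same units as the input, e.g. angstrom) between two structures.

    coords_a, coords_b: sequences of (x, y, z) tuples in identical atom order,
    assumed to be pre-aligned. If heavy_only is True, hydrogen atoms are
    skipped, which requires the element symbols.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    pairs = list(zip(coords_a, coords_b))
    if heavy_only:
        if elements is None or len(elements) != len(pairs):
            raise ValueError("element symbols are required for heavy-atom RMSD")
        pairs = [p for p, el in zip(pairs, elements) if el != "H"]
    if not pairs:
        raise ValueError("no atoms left to compare")
    squared = sum((a[i] - b[i]) ** 2 for a, b in pairs for i in range(3))
    return math.sqrt(squared / len(pairs))
```

Comparing the all-atom and heavy-atom values for the same pair of structures shows how much of the deviation comes from hydrogen positions, which is the reasoning applied to def2-SVP and the water dimer above.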

Timing
Since one goal of the Jensen and Karlsruhe basis sets is to reduce the overall calculation cost while maintaining accuracy, we present the relative calculation time for computing a single point energy calculation for propan-2-ol in Table 4 as a general guide for the cost of basis sets when used with MP2 theory. Note that the following results, analysis and rankings could change if one were to use alternative software (e.g. Gaussian, Molpro) or a different optimization algorithm. Consequently, we encourage researchers to conduct their own timing study if they wish to identify cost-effective theory levels using their software and workflow.
Choosing a theory level that balances the calculation cost and accuracy is often difficult and subjective. To provide a quantitative guide for evaluating a theory level as a function of both cost and accuracy, we propose the following ranking equation: where nC is a normalized cost, f is a scaling factor (i.e. 0 → 1) and nE is the normalized error. The smallest result of this equation (i.e. min(Rank)) will be the best ranked theory level for a given scaling factor, representing a user-desired balance between the theory's cost and its error. Since timing data is available, nC was chosen to be the normalized relative times (Table 4) and nE the normalized mean absolute error in the relative energies (e.g. Table S6). Table 5 gives the top three theories for five different scaling factors. An expanded version of this table can be found in the SI material (Table S9) that provides the five best theories for scaling factors given in 0.1 intervals. At the scaling factor extremes, the equation correctly ranks MP2/pcseg-5 as the most accurate theory relative to the MP2/CBS target values (i.e. f = 0.0), while HF/6-31G(d), MP2/pcseg-1 and MP2/pcSseg-1 are the fastest theories (i.e. f = 1.0). For all other scaling factors, different theories arise that represent different cost and error balances. Notable is that for scaling factors 0.20-0.90, the augmented Dunning basis sets (i.e. AVTZ and AVDZ) are ranked best. If one is willing to give up some accuracy for speed improvement, then aug-pcseg-3 and def2-TZVPPD often appear to be reasonable options (Table S9).
Alternatively, nC could represent a different normalized observable. For example, a reasonable observable would be the number of uncontracted basis functions (UCBF) for a given basis set. Consequently, a notable difference in theory ranking occurs, as seen in Table 5. A Pearson correlation analysis between the relative calculation time and the number of UCBF results in a value of 0.87 (p = 7.2E-14). While it is often used for this purpose, the number of basis functions (contracted or uncontracted) is not a perfect predictor of a theory's speed. As noted above, this conclusion is drawn for Psi4's algorithm, and different software could yield different results.
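How such a ranking behaves can be explored with a short script. The sketch below rests on two loudly stated assumptions that are not taken from this work: (i) the rank is a linear blend, Rank = f·nC + (1 − f)·nE, chosen only because it reproduces the limiting behavior described above (pure accuracy at f = 0, pure speed at f = 1), and (ii) costs and errors are normalized by dividing by their largest value. The actual equation and normalization may differ, and the theory-level labels and numbers in the usage note are invented for illustration.

```python
def normalize(values):
    """Scale positive values to (0, 1] by dividing by the maximum (assumed)."""
    largest = max(values)
    if largest <= 0.0:
        raise ValueError("values must contain a positive maximum")
    return [v / largest for v in values]


def rank_score(f, n_cost, n_error):
    """Assumed linear blend: f weights cost, (1 - f) weights error."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("scaling factor f must lie in [0, 1]")
    return f * n_cost + (1.0 - f) * n_error


def rank_levels(levels, f):
    """Return theory-level names sorted from best (lowest score) to worst.

    levels maps a name to a (cost, error) tuple, e.g. relative time and
    mean absolute error in kcal/mol.
    """
    names = list(levels)
    n_costs = normalize([levels[n][0] for n in names])
    n_errors = normalize([levels[n][1] for n in names])
    scores = {n: rank_score(f, c, e)
              for n, c, e in zip(names, n_costs, n_errors)}
    return sorted(names, key=lambda n: scores[n])
```

With three invented entries, `{"small": (1.0, 0.40), "medium": (10.0, 0.10), "large": (100.0, 0.02)}`, the ordering starts with "large" at f = 0, with "small" at f = 1, and with "medium" at f = 0.5, illustrating how intermediate scaling factors surface compromise choices.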
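For readers repeating this kind of check with their own timings, the Pearson coefficient can be computed directly from its definition. The following is a minimal sketch with a hypothetical function name; it returns only the correlation coefficient and does not compute the associated p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy / math.sqrt(s_xx * s_yy)
```

Here `xs` would hold the number of uncontracted basis functions and `ys` the relative timings for the same basis sets; a value near 1 indicates that function count tracks cost closely, while lower values indicate that it is a weaker proxy.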

Conclusions
Performance comparisons of different basis sets are an important means to aid researchers in choosing the optimal molecular orbital representation for their investigations. Usually, this choice is driven by a desire for high data accuracy and confined by how costly the calculation becomes. In this paper we compared the performance of Dunning, Jensen, and Karlsruhe basis sets, 45 in total, for computing relative electronic energies and geometries of 19 stationary points across three electronically diverse molecular systems (i.e. buta-1,3-diene, propan-2-ol, and the water dimer). For benchmark data, geometries were computed at the MP2/AV5Z//MP2/AV5Z theory level and relative energies were extrapolated to the CBS limit via MP2/AV[T,Q,5//Q,5]Z//MP2/AV[T,Q,5]Z. Two different CBS extrapolation schemes with different approximation levels were also compared. Finally, to practically guide researchers in their basis set choice, an equation was presented that ranks basis sets based on a user-defined balance between their accuracy and calculation cost.
The Jensen basis sets are noted using X = 1-5 indexing.
b A more complete list that contains 0.1 incremented scaling factors can be found in Table S9.
In summary, the highlights from this work are the following:
(1) In computing CBS relative energies using the augmented Dunning basis sets (i.e. AVXZ), one can reduce the calculation cost by using Scheme 1 without significantly reducing the accuracy of the results. Note, however, that the use of a less reliable geometry in Scheme 1 (e.g. originating from a lower theory level) can have a significant impact on the resulting extrapolated energies, and consequently the use of Scheme 1 and 2 would provide different results.
(2) The MP2/CBS relative energies computed using the Dunning basis set [2,3,4] triads were, on average, more accurate than those computed using the corresponding Jensen [X = 2,3,4] triads (Table 2). This is reasonable since the Dunning basis sets were optimized for use with electron correlated theories, while the Jensen basis sets were optimized for use in DFT theory.
(3) Including core-electron correlation in the CBS extrapolations using the valence basis sets alters, on average, the relative energies by 0.12 kcal·mol⁻¹. However, the valence-only correlations using the standard Dunning basis sets (i.e. AVXZ) result in extrapolations that are in close agreement with the more rigorous calculations that include core electrons and the correlation-consistent polarized core-valence Dunning basis sets (i.e. ACVXZ).
(4) On average for the Jensen basis sets, the MP2/CBS relative energies computed using [X = 1,2,3] triads resulted in values that were within 0.11-0.24 kcal·mol⁻¹ of those computed using the substantially larger [X = 3,4,5] triads (Table 2). Similarly, the use of Jensen [X = 2,3,4] triads provided CBS results that were within 0.09-0.16 kcal·mol⁻¹ of those computed by [X = 3,4,5] triads. However, the use of [X = 2,3,4] triads does not automatically guarantee that the results will be closer to the [X = 3,4,5] CBS limits. For propan-2-ol and the water dimer, pcseg-X, aug-pcseg-X, and pcSseg-X provided [X = 1,2,3] triad extrapolated CBS values that were in closer agreement with the [X = 3,4,5] CBS values.
(5) Caution should be employed when using def2-SVP since it incorrectly computed the most stable water dimer conformation as the cyclic structure (i.e. conformation 4). Caution should also be extended to def2-SVPD, aug-pcSseg-1 (i.e. aug-pcSseg-n, n = 0) and aug-pcSseg-2 (i.e. aug-pcSseg-n, n = 1) for their inability to optimize the water dimer's 4 and 5 conformations.
(6) Of the triple zeta basis sets, aug-pcseg-3 (i.e. aug-pcseg-n, n = 2) provides the best overall geometries in comparison to MP2/AV5Z structures. When Dunning basis sets are desired for studying polar molecules, one might consider using AVDZ and AVTZ rather than the VTZ and VQZ basis sets, respectively, since they provide better geometries and relative energies using fewer functions.
(7) Utilizing a cost/error equation provides a quantitative way of evaluating which theories should be preferentially explored when considering theory choice early within a study. In exploring different weighting factors, the aug-pcseg-3 and def2-TZVPPD basis sets appear to be reasonable options that balance computational cost and accuracy when using Psi4.
(8) When possible, the MP2/def2-TZVP theory level seems to be a better choice for the optimization of large molecules than HF/6-31G(d). The cheaper MP2/def2-SVPD theory level is also worth investigating, but shows more dependency on the system being optimized.

Figure 2. The MP2/CBS relative energies (Scheme 2) computed from the possible sequential Dunning and Jensen basis set triad combinations (e.g. [2,3,4]) for buta-1,3-diene conformations. The horizontal dashed lines are added as visual guides and indicate the CBS limit computed using the [3,4,5] triad within each basis set family.

Figure 3. The MP2/CBS relative energies (Scheme 2) computed from the possible sequential Dunning and Jensen basis set triad combinations for propan-2-ol conformations.

Figure 4. The MP2/CBS relative energies (Scheme 2) computed from the possible sequential Dunning and Jensen basis set triad combinations for the water dimer conformations.

Table 4. Average relative calculation times for computing a single point energy calculation.a,b
a A total of six separate SCF calculations were performed using the same propan-2-ol input geometry to generate the average values.
b All calculations were performed using one core of an AMD FX-8350 processor within a Linux Mint 17.1 desktop computer that contained 31.5 GB of RAM.
c The average raw time was 1.8 s.
d The raw timings can be found in Table S8.

Table 5. Top three theory levels a ranked by Equation 4 as a function of the scaling factor (f) used. Two sets of rankings are presented that are based on the normalized mean absolute energy error (nE), and either the normalized relative mean time or the normalized number of uncontracted basis set functions (nC). b