On the use of property-oriented basis sets for the simulation of vibrational chiroptical spectroscopies

We computed vibrational circular dichroism (VCD) and Raman optical activity (ROA) spectra for a test set of six chiral compounds using two standard density functionals and an array of basis sets. We analysed the performance of property-oriented basis sets against four key metrics, using a quadruple-zeta basis as the reference. We find little qualitative difference between the spectra produced by the larger basis sets (ORP, LPolX, aug-cc-pVTZ, and aug-cc-pVQZ), though their quantitative metrics exhibit wide variations. The smaller basis sets (rDPS, augD-3-21G, augT3-3-21G, Sadlej-pVTZ, and aug-cc-pVDZ) performed better for VCD rotatory strengths than for the corresponding ROA circular intensity differences (CIDs). However, this trend diminishes as the basis-set size is increased, lending validity to the conclusion that more robust property-oriented basis sets are required for generating ROA spectra than for generating VCD spectra. We observed improved performance in the mid-infrared region compared to the high-frequency regime, as well as overestimation of VCD rotatory strengths in the latter region as compared to the reference. We conclude that the ORP and LPol-ds basis sets are the most efficient and effective choices for the prediction of VCD and ROA spectra, as they provide highly accurate results at reduced computational expense.


Introduction
Over the last several decades, two complementary infrared characterisation techniques, vibrational circular dichroism (VCD) and Raman optical activity (ROA), have been developed for the determination of absolute configuration and conformational analysis [1]. VCD and ROA are defined as the differential absorption and differential scattering, respectively, of left- and right-circularly polarised light [2]. Their experimental requirements are flexible compared to those of electronic circular dichroism (ECD), X-ray crystallography, or nuclear magnetic resonance (NMR) spectroscopy. Unfortunately, the relationships between molecular structure and molecular properties are often highly complex and frequently require assistance from theoretical computations for robust interpretation. Simulation of chiroptical spectra is particularly challenging, however, as it requires the computation of the response of the electronic wave function with respect to external electric and magnetic fields: through dynamic molecular property tensors in the case of ROA or through static wave function derivatives for VCD (vide infra). The determination of such responses can be computationally expensive and the properties themselves tend to be highly sensitive to environmental effects, often requiring the inclusion of dynamic solvent models and conformational averaging [6,7].
A key component in the quantum chemical computation of molecular properties, including ROA and VCD spectra, is the choice of basis set. Traditional Gaussian-type basis sets come in a variety of flavours, but one commonality between them is that their orbital exponents are often optimised with respect to atomic energies, with contraction schemes designed to provide additional computational efficiency [8–12]. The quality of the basis set is pivotal to the robustness of simulations of electronic and vibrational chiroptical spectra, but balancing accuracy with computational cost is necessary for practical applications.
A number of efforts to elaborate on the basis-set dependence of VCD and ROA spectra have been reported in the literature. In 2004, Zuber and Hug [13] carried out a set of careful computations of the effect of using relatively simple basis sets (which they referred to as 'rarefied') to obtain the required circular intensity differences of ROA spectra. They found that they could obtain reasonable quantitative agreement with larger basis sets, provided the vibrational normal modes were obtained separately using a more complete set. In addition, they found that, for a test set of small chiral molecules containing hydrogen atoms, it was necessary to include diffuse p-type functions on those atoms, forming what they referred to as 'rDP' and 'rDPS' functions, depending on the initial basis set. This work was followed by a 2005 study by Reiher, Liégeois, and Ruud [14] that examined the performance of the Zuber and Hug minimal basis sets, as well as Dunning's correlation-consistent basis sets and Sadlej's polarised triple-zeta basis set (described below). Using a set of five representative chiral compounds ranging from (S)-methyloxirane up to (M)-σ-[4]-helicene, they agreed with Zuber and Hug's observation that the rarefied basis sets yielded a good balance between accuracy and cost for ROA intensities, provided a more complete basis set was used to obtain the force field. In 2011, Cheeseman and Frisch [15] carried out a larger study involving 11 molecular test cases, and concurred with the findings of Reiher et al. In addition, they found that they could modify Dunning's augmented correlation-consistent double-zeta (aug-cc-pVDZ) basis set by removing a set of diffuse d-type functions to reduce the computational cost of the basis set with little impact on the accuracy of the resulting ROA spectra.
Until recently, fewer studies have been reported analysing basis-set effects for VCD spectra. The earliest effort by Jalkanen, Stephens, Amos, and Handy [16] focussed on the basis-set dependence of the atomic axial tensors and atomic polar tensors underlying the VCD rotatory strengths. Using the NHDT isotopomer of ammonia as a chiral test case and a series of 12 relatively small basis sets, they determined that polarisation functions are essential for accurate calculations. Much more recently, Scholten, Engelage, and Merten [17] expanded on the literature surrounding density functionals and basis sets involved in VCD spectral prediction by exploring tosylates and sulfinates. In their study, they systematically varied the number of polarisation functions in the Pople 6-311++G(2d,p) basis set. They found that the S=O stretching modes are red-shifted when the basis set does not include higher-order polarisation functions and stressed that the minimum Pople-style basis that should be used for these types of molecules is the 6-311G(3df,2dp) set. In 2022, Eikås, Beerepoot, and Ruud [7] included an analysis of basis-set effects as part of their larger effort to develop a systematic protocol for simulating VCD spectra of cyclic oligopeptides. They focussed on the standard series of split-valence basis sets by Pople and co-workers including polarisation and diffuse functions. Using the 6-311++G** basis set as the reference, they found that the effect of polarisation functions was more significant than that of diffuse functions, and that heavy-atom polarisation functions were more important than those on hydrogen atoms. However, they also observed that shifts in the vibrational frequencies between basis sets led to poorer overlap between the reference spectra and those computed using smaller basis sets. Earlier this year, Groß et al. [18] compared the performance of different density functionals, basis sets, and solvation models for VCD spectra, including the Pople, Karlsruhe, and Dunning series of basis sets, for a test set of six molecules. They found that the largest Dunning basis set they tested, aug-cc-pVTZ, performed best overall, though they considered this basis too large for application to larger molecules.
In addition to the above studies, which focussed primarily on the use of conventional or slightly modified conventional basis sets, a number of researchers have designed basis sets specifically to target response and related properties. The aim of such property-oriented basis sets is two-fold: to increase the accuracy of the computed property and to reduce the number of basis functions required to obtain it [19]. One of the earlier attempts at developing such a bespoke basis set was reported by Sadlej [20], who used the basis set polarisation method to develop a medium-sized basis with the goal of predicting static electric dipole polarisabilities and related properties. Using the atomic Gaussian-type orbital (GTO) basis set of van Duijneveldt as the source, each shell was augmented by one diffuse GTO with orbital exponents determined by assuming an even-tempered sequence of the two most diffuse orbitals in that shell. Expansion coefficients were determined through self-consistent field Hartree-Fock (SCF HF) calculations and first-order polarisation functions were derived by applying the basis set polarisation method to the outermost shell. These functions were appended to the augmented set and a contraction scheme was determined based on the computed values of the polarisability anisotropy. As noted by Sadlej, this basis set, referred to as the Sadlej-pVTZ basis set, did not involve any explicit, property-oriented optimisation of orbital exponents. It did, however, provide a systematic method for introducing polarisation functions. This new basis set provided comparable results to much larger basis sets at both Hartree-Fock and correlated levels of theory.
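The augmentation step described above (one additional diffuse function per shell, with its exponent continuing the even-tempered, i.e. geometric, sequence defined by the two most diffuse exponents) can be sketched as follows. The function name and the example exponents are hypothetical and are not taken from the van Duijneveldt sets; this is an illustration of the arithmetic only, not of Sadlej's actual procedure.

```python
def even_tempered_diffuse(exponents):
    """Return one extra diffuse exponent for a shell.

    The new exponent continues the geometric progression defined by the
    two smallest (most diffuse) exponents a1 < a2:  a_new = a1 * (a1 / a2).
    """
    if len(exponents) < 2:
        raise ValueError("need at least two exponents to define a ratio")
    a1, a2 = sorted(exponents)[:2]  # a1 is the most diffuse exponent
    if a1 <= 0.0:
        raise ValueError("Gaussian exponents must be positive")
    return a1 * (a1 / a2)


# Hypothetical exponents for a single shell (illustrative values only).
shell = [10.0, 2.0, 0.4]
augmented = shell + [even_tempered_diffuse(shell)]
print(augmented)
```

With the hypothetical shell above, the ratio between the two most diffuse exponents is 0.4/2.0 = 0.2, so the added exponent is approximately 0.08.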
In 2010, Baranowska and Sadlej developed the LPolX basis sets using a similar method [21]. Again using the largest optimised GTO basis of van Duijneveldt as the source set, each shell was augmented with one diffuse GTO and the exponents determined by geometric progression. Expansion coefficients were computed through restricted open-shell Hartree-Fock (ROHF) calculations and first-order polarisation functions were derived. Following this, second-order polarisation functions were derived and their contraction coefficients were determined using ROHF calculations on the uncontracted set including first-order polarisation functions, which resulted in the LPol-fl set. Reducing the size of these basis sets resulted in the LPol-fs, LPol-dl, and LPol-ds sets, where 'f' and 'd' refer to the highest angular momentum type of the basis set and 'l' and 's' refer to the size of the chosen contraction scheme ('large' or 'small'). Using a test set of four small molecules, they found that the LPolX basis sets provided a high level of saturation for describing the polarisation effects caused by external electric fields, yielding accurate dipole moments, polarisabilities, and first hyperpolarisabilities. In addition, they speculated that the LPolX basis sets would also be useful for mixed electric-field/magnetic-field response properties such as optical rotations and circular dichroism spectra.
Following this work, Baranowska-Łaczkowska and Łaczkowski developed the optical rotation prediction (ORP) basis set, which, as the name indicates, targeted computation of specific rotation in chiral molecules [22]. Using the uncontracted VTZ basis of Ahlrichs and co-workers [23], they augmented each shell with one diffuse function which, again, had orbital exponents determined through geometric progression. Next, three uncontracted first-order polarisation functions with initial guesses at their orbital exponents were appended to the set. The orbital exponents were then optimised by minimising the errors in atomic polarisabilities using finite-field ROHF calculations. Finally, a contraction scheme was determined using ROHF calculations of the initial set with the polarisation functions remaining uncontracted. They found excellent agreement with rotations obtained using the larger aug-cc-pVTZ basis set. Three years later, Baranowska-Łaczkowska [24] introduced a smaller version of the ORP basis set, R-ORP, which was similarly constructed but includes fewer polarisation functions on each atom.
In 2020, Aharon and Caricato [25] introduced two highly compact basis sets also targeting optical rotations, constructed by combining the standard 3-21G Pople-type basis set with the diffuse functions from the Dunning aug-cc-pVDZ or aug-cc-pVTZ basis sets, and then reoptimising the latter to minimise statistical errors in long-wavelength specific rotations as compared to the full aug-cc-pVXZ basis set. Using a training set of 21 chiral compounds to build the new basis sets, named augD-3-21G and augT3-3-21G, they carried out a series of computations on a control set consisting of another 30 compounds. They found that their new basis sets yielded mean unsigned errors in sodium D-line specific rotations of approximately 4% at considerably less cost than the larger ORP basis set.
The current work is intended to be a component of a larger effort to develop a robust computational protocol for simulating the VCD and ROA spectra of glycans in solution. Such a goal requires exploration of numerous relevant variables, including choice of quantum chemical model (e.g. density functional, level of electron correlation), implicit vs. explicit solvent model, molecular dynamics force field (in the case of explicit solvation models), conformational/configurational sampling, and, of course, basis set.
The central question we seek to address is whether the recently developed basis sets targeting electronic response properties such as dipole polarisabilities and optical rotation provide similar accuracy at reduced computational cost for vibrational response properties. Given that these seemingly disparate properties depend on the same sundry response tensors and wave function derivatives, a compelling argument can be made that these new basis sets will perform admirably beyond their original purpose. Thus, we have carried out a series of benchmark VCD and ROA computations across a range of general and property-optimised basis sets to measure their performance against the effectively complete reference quadruple-zeta (aug-cc-pVQZ) basis set. We make use of several quantitative measures for this purpose, all oriented towards identifying the basis set(s) that provide a practical balance between accuracy and computational efficiency.

Theoretical background
Although VCD and ROA are both vibrational chiroptical spectroscopies, their theoretical formulations are distinct. Theoretical simulation of VCD requires computation of the rotatory strength, R^n_{vv'}, which is the dot product of the electric-dipole and magnetic-dipole vibrational transition moments (Equation (1)), where μ and m are the electric- and magnetic-dipole operators, respectively, n denotes the electronic state, and v denotes the vibrational state. The principal challenge in computing R^n_{vv'} is that the electronic contribution to the magnetic-dipole transition moment unphysically vanishes within the Born-Oppenheimer approximation. As shown by Stephens nearly 40 years ago [26], this problem may be overcome by first-order corrections of the adiabatic wave functions via Taylor expansions in the nuclear positions and the external magnetic field, leading to an expression (Equation (2)) in which the sum runs over all nuclear coordinates, ψ_n is the electronic wave function, χ_nv is the vibrational wave function, B_β is a particular Cartesian direction of the magnetic field, and the subscript/superscript 0 denotes the reference geometry or field-free state. The overlap between wave function derivatives, known as the electronic contribution to the atomic axial tensor, is the key challenge of formulating and computing VCD rotatory strengths. ROA, on the other hand, requires geometric derivatives of three dynamic linear response tensors, namely, the electric-dipole polarisability [27], the electric-dipole/magnetic-dipole polarisability, and the electric-dipole/electric-quadrupole polarisability. In Equations (3)-(5), the subscripts α, β, and γ refer to Cartesian directions, ω is the frequency of incident radiation, ω_n0 is the excitation energy between the ground and nth excited electronic states, and the additional operator entering the third tensor is the electric-quadrupole operator.
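For orientation, the rotatory strength and the three dynamic response tensors named above are commonly written in the following textbook sum-over-states forms. The notation is an assumption made here and may differ from the authors' own equations: Ψ_{nv} denotes the total vibronic wave function, Θ the electric-quadrupole operator, |0⟩ and |n⟩ the ground and nth excited electronic states, and ω_{n0} the corresponding transition frequency.

```latex
\begin{align}
% Rotatory strength of the v -> v' vibrational transition in electronic state n
R^{n}_{vv'} &= \operatorname{Im}\left[
  \langle \Psi_{nv} | \boldsymbol{\mu} | \Psi_{nv'} \rangle \cdot
  \langle \Psi_{nv'} | \mathbf{m} | \Psi_{nv} \rangle \right] \\
% Electric-dipole polarisability
\alpha_{\alpha\beta}(\omega) &= \frac{2}{\hbar} \sum_{n \neq 0}
  \frac{\omega_{n0}}{\omega_{n0}^{2} - \omega^{2}}
  \operatorname{Re}\left[ \langle 0 | \mu_{\alpha} | n \rangle
  \langle n | \mu_{\beta} | 0 \rangle \right] \\
% Electric-dipole/magnetic-dipole polarisability
G'_{\alpha\beta}(\omega) &= -\frac{2}{\hbar} \sum_{n \neq 0}
  \frac{\omega}{\omega_{n0}^{2} - \omega^{2}}
  \operatorname{Im}\left[ \langle 0 | \mu_{\alpha} | n \rangle
  \langle n | m_{\beta} | 0 \rangle \right] \\
% Electric-dipole/electric-quadrupole polarisability
A_{\alpha\beta\gamma}(\omega) &= \frac{2}{\hbar} \sum_{n \neq 0}
  \frac{\omega_{n0}}{\omega_{n0}^{2} - \omega^{2}}
  \operatorname{Re}\left[ \langle 0 | \mu_{\alpha} | n \rangle
  \langle n | \Theta_{\beta\gamma} | 0 \rangle \right]
\end{align}
```

The expression attributed to Stephens for the magnetic-dipole transition moment is not reproduced in this sketch, since its prefactor conventions vary between formulations.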
From the above, we see the key differences between VCD and ROA: the former requires an accurate description of the derivative of the electronic wave function with respect to the external magnetic field, while the latter involves derivatives of electric- and magnetic-field-dependent response tensors. (It is interesting to note that VCD can also be formulated in terms of dynamic response theory, namely as the frequency derivative of the linear response function involving the electronic part of the geometrical gradient operator and the magnetic-dipole operator [28].) Thus, we wish to methodically test the performance of both conventional and property-optimised basis sets for these methods.

Computational details
Our test set includes six relatively small molecules (Figure 1), selected because of their representative structures and bonding patterns: (P)-hydrogen peroxide, (P)-2,3-pentadiene, (R)-fluorooxirane, (S)-methyloxirane, (1S,4S)-norbornenone, and β-D-glucose. For each molecule, we carried out VCD and ROA computations using optimised geometries and vibrational force fields computed with the given basis set being tested. While some previous studies, such as those reported by Zuber and Hug [13] and by Reiher and co-workers [14], have used small basis sets to obtain the VCD and/or ROA intensities but larger basis sets for the geometries and vibrational force fields, this is not the conventional workflow in modern quantum chemistry packages. Given that the long-term purpose of the present work is to develop protocols for general use, we have elected to maintain a consistent choice of basis set for all components of the workflow.
The selection of basis sets includes (1) the augmented correlation-consistent (aug-cc-pVXZ) sets of Dunning and co-workers [10] with double-, triple-, and quadruple-zeta quality (X=D, T, and Q, respectively), (2) the rDPS basis set of Zuber and Hug [13], (3) the Sadlej-pVTZ basis [20], (4) the LPolX sets of Baranowska and Sadlej [21], (5) the ORP set of Baranowska-Łaczkowska and Łaczkowski [22] and the R-ORP set of Baranowska-Łaczkowska [24], and (6) the compact augD-3-21G and augT3-3-21G sets of Aharon and Caricato [25]. As available, all basis sets were obtained from the Basis Set Exchange of the Molecular Sciences Software Institute [29]. The number of basis functions for each molecule and basis set is listed in Table 1. For each molecule and basis-set combination, the corresponding structures and vibrational spectra were computed using the B3LYP [30][31][32] and CAM-B3LYP [33] density functionals. All computations were carried out using the Gaussian-09 quantum chemistry package [34]. Workflow control, spectrum generation, and subsequent analysis were carried out using the SoAPy package developed in our research group [35]. The spectrum simulations employ a Lorentzian lineshape for both VCD and ROA, using a full-width at half-maximum (FWHM) of 10⁻³ eV (8.06573 cm⁻¹). Geometry optimisations used tight convergence parameters, including maximum and root-mean-squared forces of 1.5 × 10⁻⁵ E_h/a_0. For glucose, only the lowest-energy conformer was considered, i.e. the computed spectra are not conformationally averaged. An external field wavelength of 532 nm was used with the in-phase dual circular polarisation (DCPI) at 180° for the ROA spectral simulations.
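The Lorentzian broadening step mentioned above can be sketched as follows. This is a minimal illustration, not the SoAPy implementation: the function name, the area normalisation of each band, and the example stick spectrum are assumptions made here, while the default FWHM of 8.06573 cm⁻¹ is the value quoted in the text.

```python
import numpy as np


def lorentzian_spectrum(grid, frequencies, intensities, fwhm=8.06573):
    """Broaden a signed stick spectrum with area-normalised Lorentzians.

    grid        : frequencies (cm^-1) at which the spectrum is evaluated
    frequencies : harmonic frequencies of the normal modes (cm^-1)
    intensities : signed rotatory strengths (VCD) or CIDs (ROA)
    fwhm        : full width at half maximum (cm^-1)
    """
    grid = np.asarray(grid, dtype=float)
    gamma = 0.5 * fwhm  # half width at half maximum
    spectrum = np.zeros_like(grid)
    for nu0, inten in zip(frequencies, intensities):
        # Each band integrates to its stick intensity (assumed convention).
        spectrum += inten * (gamma / np.pi) / ((grid - nu0) ** 2 + gamma ** 2)
    return spectrum


# Hypothetical two-mode stick spectrum with opposite signs.
grid = np.linspace(900.0, 1300.0, 4001)
spec = lorentzian_spectrum(grid, [1000.0, 1200.0], [2.0, -1.0])
print(grid[np.argmax(spec)], grid[np.argmin(spec)])
```

Because the intensities are signed, bands of opposite sign that lie within roughly one FWHM of each other partially cancel, which is why closely spaced modes can be hidden in the broadened spectrum.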
We have used four primary metrics to analyse the performance of these basis sets. The first of these is the singly normalised overlap (SNO) function, which involves integration of the overlap between the sample spectrum, f_s, and the reference spectrum, f_r, over the entire (or selected) spectral region(s), normalised with respect to only the reference function. The values produced by this function can range from −∞ to ∞, though for well-behaved functions, such as those examined here, we typically observe values between +1 and −1, where the former indicates that the spectra are identical and the latter would occur for the same spectrum but for the opposite enantiomer. Artificially inflated values can arise when the sample spectrum has a significantly greater area than that of the reference. This circumstance can be detected, however, by our second and third metrics, which are the doubly normalised overlap (DNO) function and the integrated difference function, respectively. The DNO function is similar in design to the SNO but, as the name indicates, it includes normalisation factors for both the sample and reference spectra, which formally bounds its values between +1 and −1. Thus, the DNO avoids the artificial inflation that can potentially occur for the SNO. However, it neglects differences in contributions of the peak intensities to the overlap between the two functions. For example, if a sample spectrum had a magnitude twice that of the reference, but was otherwise identical, the DNO function would still produce an overlap of 1. Thus, we choose to include both the SNO and DNO in our analysis. We refer to the third metric as the integrated difference function, which has the advantage that it is invariant to frequency shifts and is primarily a measure of the error in intensity differences between two spectra. This allows resolution of spectral differences into those from vibrational frequency shifts versus those from differences in absorption/scattering intensities. The integrated difference function ranges from +1 to −∞, where negative values indicate that the squared area of the sample is larger than that of the reference. Values closest to zero indicate that the differences between spectra are minimal, while values further away, either negative or positive, indicate greater differences between the two functions. For this reason, we refer to the results of the integrated difference function as 'errors' and our analysis primarily involves their absolute values. We choose to retain the sign in our reported results to offer the opportunity to identify potentially inflated singly normalised overlap values.
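The three spectrum-based metrics can be sketched numerically as follows. The functional forms used here are assumptions inferred from the verbal description above (overlap normalised by the reference only, overlap normalised by both spectra, and one minus the ratio of squared areas); they are not taken from the authors' equations or from the SoAPy source, and the function names and test spectra are hypothetical.

```python
import numpy as np


def _integrate(y, x):
    """Trapezoidal integration of y over the grid x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def sno(f_s, f_r, x):
    """Singly normalised overlap: normalised by the reference only."""
    f_s, f_r = np.asarray(f_s, float), np.asarray(f_r, float)
    return _integrate(f_s * f_r, x) / _integrate(f_r * f_r, x)


def dno(f_s, f_r, x):
    """Doubly normalised overlap: bounded between -1 and +1."""
    f_s, f_r = np.asarray(f_s, float), np.asarray(f_r, float)
    norm = np.sqrt(_integrate(f_s * f_s, x) * _integrate(f_r * f_r, x))
    return _integrate(f_s * f_r, x) / norm


def integrated_difference(f_s, f_r, x):
    """Assumed form: 1 - (squared area of sample)/(squared area of reference)."""
    f_s, f_r = np.asarray(f_s, float), np.asarray(f_r, float)
    return 1.0 - _integrate(f_s * f_s, x) / _integrate(f_r * f_r, x)


# Hypothetical single-band reference and three sample spectra.
x = np.linspace(0.0, 100.0, 2001)
f_ref = np.exp(-0.5 * ((x - 50.0) / 2.0) ** 2)
f_double = 2.0 * f_ref                             # same shape, twice the size
f_mirror = -f_ref                                  # opposite enantiomer
f_shift = np.exp(-0.5 * ((x - 60.0) / 2.0) ** 2)   # frequency-shifted band

for name, f in [("double", f_double), ("mirror", f_mirror), ("shift", f_shift)]:
    print(name, sno(f, f_ref, x), dno(f, f_ref, x),
          integrated_difference(f, f_ref, x))
```

Under these assumed definitions, the doubled spectrum inflates the SNO while leaving the DNO at 1 and driving the integrated difference negative, whereas the shifted band gives a near-zero overlap but an integrated difference close to zero, mirroring the separation of frequency and intensity errors described in the text.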
Our final metric for basis-set evaluation is the number of 'events', which include discrepancies in the sign between the sample and the reference, normal mode reordering, or a combination of the two. This metric has the advantage that it accumulates discrepancies that are not necessarily visible in the spectra for further comparison. Events are determined by comparing the sign of the sample intensity with that of the reference intensity for a given normal mode index.
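The event count described above amounts to a sign comparison per normal mode index, which can be sketched as follows. The function name, the zero-based indexing, and the treatment of exactly zero intensities as non-events are assumptions made for this illustration.

```python
def find_events(sample_intensities, reference_intensities):
    """Return the mode indices at which sample and reference differ in sign.

    Intensities are compared mode by mode (same index in both lists), so a
    reordering of two modes with opposite signs also shows up as events.
    A product of exactly zero is not counted (assumed convention).
    """
    if len(sample_intensities) != len(reference_intensities):
        raise ValueError("both spectra must have the same number of modes")
    return [
        index
        for index, (s, r) in enumerate(
            zip(sample_intensities, reference_intensities)
        )
        if s * r < 0.0
    ]


# Hypothetical rotatory strengths for four modes; the last two modes are
# swapped in the sample relative to the reference.
reference = [1.2, -0.4, 3.0, -2.5]
sample = [1.0, -0.5, -2.4, 3.1]
events = find_events(sample, reference)
print(len(events), events)
```

In this hypothetical case the swapped pair produces two events even though each mode, once correctly matched, carries the right sign, which is the situation that mode reordering creates.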
Ideally, we would define values for each of these metrics indicating a minimal level of acceptable performance, such as whether it was sufficient for assigning a molecule's absolute stereochemical configuration. However, such thresholds must ultimately be determined based on comparison to experimental data, whereas the goal of the present work is to measure the efficacy of each basis set relative to more complete sets. In many ways this is a more stringent requirement than configuration assignment, and smaller basis sets may offer significant advantages for experimental comparisons. Furthermore, it should be noted that we have chosen not to include the parent IR and Raman vibrational spectra, which are often analysed alongside their chiroptical counterparts. This is especially important when comparing to experimental data, as the parent spectra are critical for determining the frequency scaling factors that are commonly applied. These scaling factors are dependent on the choice of density functional, basis set, and functional groups associated with the molecule of interest [17]. Given that our study involves only comparisons to computed spectra in an effort to identify the most computationally effective and efficient protocol for VCD and ROA, inclusion of the parent IR and Raman spectra would only serve to provide redundant information in our basis-set evaluation, especially given the greater sensitivity of VCD and ROA spectra, relative to their achiral counterparts, to the level of theory and choice of basis set. In addition, while conformational dynamics and solvent effects are certainly critical for comparison to experiment, we have not considered them in the present work in order to focus specifically on basis-set performance.

Results
The general trends observed between the B3LYP and CAM-B3LYP functionals are sufficiently similar that we choose to focus on the former unless otherwise stated. However, all data, including spectra and metrics for all molecules, basis sets, and density functionals, are provided in the Supplementary Information. The results are presented in the following format for each molecule. First, the overlaps, integrated differences, and number of events are presented in groupings based on the performance of each basis set for VCD and ROA. Then, a brief synopsis of the best-performing basis sets is given, including contributions from basis-set size and the metrics used for evaluation of the basis sets. Finally, the events for each basis set are described in more detail, including the types of normal modes and their respective effects on the final spectrum.

(P)-Hydrogen peroxide
We choose to begin with the smallest of our molecular test cases because its small number of vibrational modes makes it simultaneously the simplest to analyse in detail and the most challenging for our four metrics. In addition, hydrogen peroxide is of interest because it is one of two compounds in our test set that exhibit axial, rather than point, chirality. The computed VCD and ROA spectra for (P)-hydrogen peroxide are given in Figures 2 and 3, where we have grouped the basis sets into 'small' (rDPS, augD-3-21G, augT3-3-21G, R-ORP, aug-cc-pVDZ, and Sadlej-pVTZ) and 'large' (ORP, LPol-ds, LPol-dl, LPol-fs, LPol-fl, and aug-cc-pVTZ) sets for easier comparison and analysis. The overlaps, integrated differences, and number of events for (P)-hydrogen peroxide are presented in Table 2.
Given the narrow margin for error because of the limited number of vibrational modes, it is unsurprising that most of the smallest basis sets under scrutiny (rDPS, augD-3-21G, augT3-3-21G, and R-ORP) exhibit wide variations in their representations of the VCD and ROA spectra of (P)-hydrogen peroxide, both in terms of the positions of the vibrational transitions and the corresponding rotatory strengths/scattering intensities. Only the very small rDPS basis exhibits an event (a sign discrepancy relative to the reference), and this error, which occurs in the O-H stretching region above 3500 cm⁻¹, is obscured in the VCD spectrum because another vibrational mode with the same sign lies ca. 1 cm⁻¹ away. The pair is slightly more separated with the augT3-3-21G basis set, and so the opposite-sign rotatory strengths are visible in the spectrum. The pair of O-H stretching vibrations are more distinct in the ROA spectrum for these small basis sets, but that, too, is at odds with the aug-cc-pVQZ spectrum, which exhibits only one peak due to our chosen FWHM. The SNO and DNO values for these basis sets fall into the range of 0.00-0.10 for VCD and −0.13 to 0.02 for ROA, which is exceedingly low relative to the reference aug-cc-pVQZ results. However, the integrated difference function, which is expected to emphasise differences in intensities and de-emphasise vibrational frequency shifts, reveals that the rDPS, augD-3-21G, and augT3-3-21G basis sets still exhibit substantial errors, but the discrepancies for the R-ORP basis are at least somewhat due to errors in the harmonic force field.
The aug-cc-pVDZ basis set performs similarly for both VCD and ROA spectra, with SNO/DNO values of ca. 0.65-0.66 and small integrated difference values. The Sadlej-pVTZ basis is somewhat poorer for the VCD spectrum of (P)-hydrogen peroxide, though the overlaps increase somewhat to 0.55/0.54 for the ROA spectrum. However, similarly to the smaller basis sets, its integrated difference values remain relatively small at −0.008 for VCD and −0.05 for ROA, which suggests that the principal source of the discrepancy is the vibrational frequency shifts rather than the rotatory strengths or CIDs. The LPol-ds and LPol-dl basis sets perform worse for the VCD spectrum of (P)-hydrogen peroxide than the aug-cc-pVDZ basis (SNO/DNO values of 0.60/0.60 for LPol-ds and 0.61/0.61 for LPol-dl), even though they are considerably larger. However, the performance improves considerably for the ROA spectrum, increasing to 0.88/0.84 for LPol-ds and 0.85/0.86 for LPol-dl. Among the larger basis sets, LPol-fs and LPol-fl yield very good results for both VCD and ROA spectra, while the aug-cc-pVTZ basis performs somewhat more poorly than expected given its size, with SNO/DNO values of 0.71/0.72 for VCD and 0.65/0.64 for ROA. However, in spite of the variations in overlaps and integrated differences, it should be noted that, for all of the larger basis sets, from ORP through aug-cc-pVTZ, the visual differences in both VCD and ROA spectra are relatively small.
Table 2. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of (P)-hydrogen peroxide using B3LYP.

(P)-2,3-Pentadiene
We include (P)-2,3-pentadiene in our test set because of its axial chirality, as well as its adjacent C=C bonds. The computed VCD and ROA spectra for (P)-2,3-pentadiene are given in Figures 4 and 5 using the same basis-set grouping as in the previous section. In addition, the overlaps, integrated differences, and number of events for (P)-2,3-pentadiene are given in Table 3.
The overlaps observed for the VCD and ROA spectra draw a clear distinction between the small basis sets (rDPS, augD-3-21G, augT3-3-21G, R-ORP, aug-cc-pVDZ, and Sadlej-pVTZ) and their larger counterparts. The remaining, larger basis sets performed considerably better, with VCD SNO/DNO values ranging from 0.77/0.78 to 0.99/0.99 and ROA SNO/DNO values ranging from 0.80/0.74 to 1.00/0.98. Most of the values produced by the integrated difference function for VCD showed only minor discrepancies and inflations, though, of note, there was a marked increase in the values for ROA, which were larger by an order of magnitude or more in some cases. Further investigation revealed that this was the result of a poor prediction of the CID value for the same troublesome peak as that for the smaller basis sets at ca. 1100 cm⁻¹, though the sign was correctly predicted with these larger sets. Given the performance and size of the ORP and LPol-ds basis sets, further evaluation of these promising bases is required.
Table 3. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of (P)-2,3-pentadiene using B3LYP.
In the VCD spectra, the ORP basis exhibits events at 1167.31 cm⁻¹, 3124.98 cm⁻¹, and 3125.00 cm⁻¹, while the LPol-ds basis produces events at 3114.27 cm⁻¹ and 3114.37 cm⁻¹. The frequency at 1167.31 cm⁻¹ corresponds to a symmetric allene (C=C) stretch, while the higher-frequency modes, which are not clearly distinguishable in the spectrum, are C-H stretches and so are less relevant to experimental comparisons. Nevertheless, the discrepancy in the high-frequency modes occurs because of reordering of the symmetric and antisymmetric stretching modes as compared to the aug-cc-pVQZ basis set. Thus, the VCD spectra of the ORP and LPol-ds basis sets have inverted signs at these mode indices because of the mode reordering, but they give the correct sign of the rotatory strength for the modes themselves. Meanwhile, we do not observe similar events in the corresponding ROA spectra for either basis set even though the same modes are reordered; hence, the ORP and LPol-ds basis sets appear to give the correct ordering of the signs only because they produce incorrect signs for the modes themselves. In addition, in comparing the ORP and LPol-ds basis sets, it is noteworthy that the ORP basis has a smaller absolute error in the integrated difference function for its VCD spectrum, while the opposite is true for the ROA spectra.

(R)-Fluorooxirane
The next test case, (R)-fluorooxirane, is the only compound studied here containing fluorine and thus increases the diversity of the test set for analysing the various basis sets [36]. (Several of the sets under consideration here have not yet been defined beyond fluorine; hence the limitation to first- and second-row elements in our benchmark.) The computed VCD and ROA spectra for (R)-fluorooxirane are given in Figures 6 and 7 using the same basis-set grouping as in the previous section. The overlaps, integrated differences, and number of events obtained for (R)-fluorooxirane are provided in Table 4.
As with previous molecules in our test set, the small bases -rDPS, augD-3-21G, augT3-3-21G, R-ORP, and aug-cc-pVDZ -performed rather poorly with VCD and ROA SNO/DNO values falling within the ranges −0.02/−0.02-0.15/0.18 and −0.42/−0.29-0.06/0.05,respectively.These large negative overlap values were attributed to frequency shifts of the smaller bases compared to the reference in regions of the spectrum with a large density of normal modes of alternating signs.Surprisingly, the LPol-ds and LPol-fs did not perform as well as was the case for previous molecules in our test set.The SNO/DNO values observed for the LPol-ds and LPol-fs were 0.59/0.62 and 0.52/0.62 for VCD and 0.60/0.59and 0.34/0.35for ROA, respectively.In addition, the LPol-fs basis set had a relatively large value from the integrated difference function for the VCD spectrum in contrast to the other larger basis sets.
The remaining bases performed well for VCD and ROA with SNO/DNO values between 0.69/0.73-0.97/0.97 and 0.75/0.73-0.92/0.93,respectively.This grouping included the Sadlej-pVTZ basis which outperformed the ORP and LPol-fl sets in the VCD spectrum.In the ROA spectrum, however, the SNO/DNO of the Sadlej-pVTZ basis were significantly poorer.Interestingly, within this group, only the LPol-fl basis had a larger than expected value of 0.20 produced by the integrated difference function.Unsurprisingly, aug-cc-pVTZ performed very well, with metrics indicating that it produced results nearly identical to those of aug-cc-pVQZ.
Of the larger/better performing bases, only the Sadlej-pVTZ and LPol-fl basis sets exhibited events.For both VCD and ROA, the Sadlej-pVTZ basis exhibited only one event.In the VCD spectra, the event occurred at 1154 cm −1 while for ROA the event was at 3203 cm −1 .These normal modes corresponded to an asymmetric hydrogen bending motion in the direction parallel to the plane of the ring and an asymmetric stretching motion involving the same hydrogen atoms, respectively.The LPol-fl basis exhibited events at 515 cm −1 and 1154 cm −1 and at 1398 cm −1 , for VCD and ROA respectively.These vibrational modes involved a bending motion of the fluorine and oxygen atoms respective to the chiral carbon, the same asymmetric hydrogen rocking motion, and a new asymmetric hydrogen bending perpendicular to the ring plane.All of these events, for both basis sets, were apparent in the spectra.

(S)-Methyloxirane
(S)-Methyloxirane is similar in structure to (R)-fluorooxirane, but its methyl torsional vibrations provide additional low-frequency modes for comparison. The computed VCD and ROA spectra for (S)-methyloxirane are given in Figures 8 and 9 using the same basis-set grouping as in the earlier sections. The overlaps, integrated differences, and number of events obtained for (S)-methyloxirane are provided in Table 5.
Paralleling trends from our previous test molecules, the smaller basis sets, including rDPS, augD-3-21G, augT3-3-21G, R-ORP, aug-cc-pVDZ, and Sadlej-pVTZ, yielded poor SNO/DNO overlaps ranging from 0.09/0.10 to 0.21/0.21 for VCD and from −0.02/−0.03 to 0.47/0.39 for ROA. The exceptions to these ranges were the rDPS ROA spectrum, which had an SNO/DNO of 1.13/0.35 (with an integrated difference of −9.20, a clear example of the type of inflation the SNO function can experience), and the Sadlej-pVTZ VCD spectrum, which had an SNO/DNO of 0.54/0.63. On average, these bases exhibited three or more events for both spectroscopies.
A drastic improvement was noted for the ORP, LPol-ds, LPol-dl, and LPol-fs sets, which yielded SNO/DNO values ranging from 0.69/0.68 to 0.78/0.80 for VCD and from 0.62/0.61 to 0.77/0.76 for ROA. Importantly, most of these basis sets had absolute errors from the integrated difference function below 0.10, which is rather reasonable. Additionally, these bases averaged one event for VCD and two events for ROA.
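The inflation of the SNO noted above for the rDPS ROA spectrum can be illustrated with a short Python sketch. The functional forms used here are assumptions made only for illustration and may differ from the definitions given earlier in the paper: the SNO is taken to be normalised by the reference spectrum alone, and the DNO by both spectra. The Lorentzian half-width, the stick spectra, and the function names `lorentzian_spectrum`, `sno`, and `dno` are likewise hypothetical.

```python
import math


def lorentzian_spectrum(modes, grid, hwhm=10.0):
    """Broaden (frequency, intensity) sticks with Lorentzian line shapes."""
    return [
        sum(i * hwhm**2 / ((x - f) ** 2 + hwhm**2) for f, i in modes)
        for x in grid
    ]


def inner(a, b):
    """Discrete inner product on a uniform grid (the spacing cancels in ratios)."""
    return sum(x * y for x, y in zip(a, b))


def sno(test, ref):
    # Assumed form: normalised by the reference spectrum only.
    return inner(test, ref) / inner(ref, ref)


def dno(test, ref):
    # Assumed form: normalised by both spectra, so its magnitude cannot exceed 1.
    return inner(test, ref) / math.sqrt(inner(test, test) * inner(ref, ref))


grid = [float(x) for x in range(0, 2001)]
ref_modes = [(800.0, 1.0), (1200.0, -0.5)]
# Hypothetical small-basis result: correct signs, intensities three times too
# large, and one spurious strong peak that is absent from the reference.
test_modes = [(800.0, 3.0), (1200.0, -1.5), (1500.0, 4.0)]

ref = lorentzian_spectrum(ref_modes, grid)
test = lorentzian_spectrum(test_modes, grid)

sno_value = sno(test, ref)
dno_value = dno(test, ref)
print(f"SNO = {sno_value:.2f}, DNO = {dno_value:.2f}")
```

Under these assumptions the SNO exceeds 1 because the test intensities are uniformly overestimated, whereas the DNO remains bounded and is lowered by the spurious peak. This is the qualitative pattern of a large SNO paired with a modest DNO.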
The remaining sets, LPol-fl and aug-cc-pVTZ, performed at nearly the aug-cc-pVQZ level, with SNO/DNO greater than 0.95/0.94 for VCD and 0.97/0.92 for ROA. Interestingly, LPol-fl and aug-cc-pVTZ both yielded absolute errors of 0.01 for VCD with no events. In contrast, the ROA values of this metric increased to 0.11 and 0.09, respectively, with one event each. For (S)-methyloxirane, LPol-ds is clearly the preferred basis for VCD and ROA calculations, with good performance at a moderate cost.
Once again, we focus our discussion of events on the larger basis sets, which are reasonable candidates for use in the development of robust computational protocols for simulating glycans in solution. The single event noted for the ORP basis in the VCD spectrum occurs at 3091.52 cm⁻¹ and corresponds to a sign change and normal-mode reordering with a neighbouring peak at 3094.41 cm⁻¹ (though neither is clearly observable in the simulated spectrum). In the ROA spectrum an event arises at 1168.68 cm⁻¹, which may be ascribed to hydrogen bending motions on all three carbon atoms. The resulting sign change is relatively subtle in the spectrum but does allow resolution of an additional peak not seen in the reference aug-cc-pVQZ spectrum. The same phenomenon as in the ORP VCD spectrum appears in the LPol-ds VCD spectrum at 3085.81 cm⁻¹ with the neighbouring mode at 3082.23 cm⁻¹. The modes in question are associated with a symmetric C−H stretch on the achiral carbon in the epoxide ring and an asymmetric C−H stretch on the methyl group, respectively. In this case, the reordering does not cause significant changes in the shape of the VCD spectrum with either basis set (even with our choice of linewidth). The ROA spectrum produced by the LPol-ds basis also yields events at these same normal modes; however, the impact is much more easily observed in the ROA spectrum due to the large shift in intensity relative to the aug-cc-pVQZ reference spectrum. Finally, the LPol-ds basis also exhibits events located at 1166.88 cm⁻¹ and 3087.60 cm⁻¹ in the ROA spectrum, corresponding to the same events as in the ORP ROA and ORP VCD spectra.
Table 4. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of (R)-fluorooxirane using B3LYP.

(1S,4S)-Norbornenone
The next test case, (1S,4S)-norbornenone, includes important new structural features such as the bicyclic cage and the carbonyl moiety. This chiral compound has been the focus of numerous computational and experimental studies, particularly regarding its electronic CD spectrum and its long-wavelength specific rotation [37–39]. The latter has proved particularly challenging for theoretical prediction due to its large magnitude and wide variation between solution- and gas-phase properties.
Figure 9. VCD (left) and ROA (right) spectra of (S)-methyloxirane computed with B3LYP across the entire spectral region using the ORP, LPol-ds, LPol-dl, LPol-fs, LPol-fl, aug-cc-pVTZ, and aug-cc-pVQZ basis sets.
Table 5. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of (S)-methyloxirane using B3LYP.
The computed VCD and ROA spectra for (1S,4S)-norbornenone are given in Figures 10 and 11 using the same basis-set grouping as in the previous section. The overlaps, integrated differences, and number of events obtained for (1S,4S)-norbornenone are provided in Table 6.
The results for (1S,4S)-norbornenone followed trends similar to those of the other molecules for the small rDPS, augD-3-21G, augT3-3-21G, and R-ORP bases, with overlaps between 0.03 and 0.07 for VCD and somewhat larger values, from 0.16 to 0.32, for ROA. The integrated difference values for the R-ORP set, however, are smaller than those of the other basis sets in this group by roughly an order of magnitude, which, again, suggests that at least some of the discrepancy is due to vibrational frequency shifting. On the other hand, the number of events observed for these small basis sets is large, in part because of the increased number of vibrational modes for this medium-sized molecule. Visual inspection of the spectra is consistent with these numerical data, with the rDPS, augD-3-21G, and augT3-3-21G basis sets exhibiting significant frequency shifts for several large peaks and a number of sign errors associated with smaller peaks.
As we have observed for other molecules in our test set, the aug-cc-pVDZ and Sadlej-pVTZ basis sets perform markedly better than their smaller counterparts (and better than R-ORP, which is identical in size to aug-cc-pVDZ), with SNO/DNO overlaps of 0.45/0.45 (VCD) and 0.66/0.58 (ROA) for the former and 0.56/0.55 (VCD) and a particularly impressive 0.83/0.62 (ROA) for the latter. Their integrated difference values are also reasonably small, though significantly worse for ROA than for VCD, and both exhibit a reasonably small number of events. The next largest basis set, ORP, gives particularly good results for (1S,4S)-norbornenone, with overlaps of 0.93/0.94 for VCD and 0.84/0.82 for ROA and small integrated difference errors. Interestingly, the LPol-X basis sets all perform approximately the same in spite of the variation in size from LPol-ds (392 functions) to LPol-fl (784 functions), with SNO/DNO overlaps ranging from 0.64/0.64 to 0.76/0.76 for VCD and significantly better values for ROA, from 0.92/0.91 to 0.98/0.98 (and only a handful of sign events). Only the large aug-cc-pVTZ basis set yields comparably strong performance for both VCD and ROA, with all overlaps greater than 0.9 and integrated difference values of 0.01 or less. In addition, for ORP and all larger basis sets, visual inspection of the spectra reveals few apparent differences relative to aug-cc-pVQZ. The most significant discrepancy occurs in the ROA spectrum at 961 cm⁻¹ for ORP and 959 cm⁻¹ for LPol-dl, which is an antisymmetric hydrogen out-of-plane bending motion. For this mode, both basis sets yield significantly larger scattering intensity differences compared to the reference basis set, though the sign of the peak is still correct.
Table 6. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of (1S,4S)-norbornenone using B3LYP.

β-D-Glucose
The last test molecule is β-D-glucose, which we have selected in part because of its less-rigid ring structure compared to (1S,4S)-norbornenone. In addition, the longer-term goal of this work is to develop a computational protocol for simulating the vibrational chiroptical spectra of glycans, of which β-D-glucose, as the most abundant monosaccharide, is among the more important building blocks [40]. Although β-D-glucose is conformationally flexible, our inquiry in this work focuses only on basis-set convergence, and thus we have selected only the lowest-energy configuration of the β epimer. The computed VCD and ROA spectra for β-D-glucose are given in Figures 12 and 13 using the same basis-set grouping as in the previous section. The overlaps, integrated differences, and number of events obtained for β-D-glucose are provided in Table 7.
As with the previous test cases, the smaller rDPS, augD-3-21G, augT3-3-21G, and R-ORP basis sets perform relatively poorly, though R-ORP was clearly the best of this group.
The aug-cc-pVDZ and Sadlej-pVTZ basis sets once again perform similarly to one another, though the latter exhibits a significant increase in SNO/DNO values between VCD (0.39/0.36) and ROA (0.73/0.64), while the accuracy of the former is relatively constant at 0.58/0.59 for VCD and 0.55/0.51 for ROA. Integrated difference errors are relatively small for these basis sets, apart from that for the ROA spectrum predicted by the Sadlej-pVTZ basis, with a value of −0.31. The most prominent visual difference between the reference aug-cc-pVQZ spectra and those produced by the aug-cc-pVDZ and Sadlej-pVTZ basis sets occurs near 800 cm⁻¹ in the ROA spectrum (a hydroxyl torsional motion); the smaller basis sets exhibit a strong, positive Cotton effect, while the larger basis set shows no sign change, but instead a strong positive peak with a small shoulder. Interestingly, the ORP basis set exhibits comparable SNO/DNO values for both VCD and ROA (ranging from 0.69 to 0.76) and integrated difference values that are significantly smaller than those of the Sadlej-pVTZ basis set, but, like its smaller counterparts, also incorrectly exhibits the positive Cotton peaks near 800 cm⁻¹ in its ROA spectrum. However, the ORP basis set also exhibits fewer events/sign errors as compared to the Sadlej-pVTZ basis set, which is also a contributing factor to its strong overlaps and small errors.
We observe a larger performance difference between LPol-ds and LPol-dl for β-D-glucose than for most of the other molecules in this study, particularly for VCD, for which the two basis sets give SNO/DNO values of 0.49/0.49 and 0.68/0.67, respectively. The difference is reduced for the ROA spectrum, for which the two basis sets give 0.78/0.77 (LPol-ds) and 0.88/0.81 (LPol-dl), and both exhibit only one or two sign discrepancies and small integrated difference errors. Once again, however, the ROA spectrum with both basis sets displays the same erroneous Cotton effect at 800 cm⁻¹ as observed for the smaller sets, though the rest of the spectrum bears strong visual similarity to the aug-cc-pVQZ reference spectrum. This spectral feature is finally reproduced correctly with the aug-cc-pVTZ basis set, which also exhibits strong overlaps, small integrated difference errors, and zero or one sign discrepancies, as compared to the reference. Of note, however, is the approximately 20% deviation of the aug-cc-pVTZ VCD spectrum from that of the reference, which we attribute to frequency shifts in the high-frequency region. Indeed, focussing on the 0–2000 cm⁻¹ region, the SNO/DNO for the aug-cc-pVTZ basis set is very high at 0.9625/0.9599, while the 2000–4000 cm⁻¹ region produces SNO/DNO values of 0.3968/0.4007. The integrated difference values in these two regions, on the other hand, are −0.0054 and 0.0192, respectively, indicating that the positions of the vibrational modes from these calculations are to blame for the discrepancy. (See Tables S90 and S91 in the Supplementary Information.) We were unable to obtain VCD and ROA spectra for β-D-glucose using the LPol-fs and LPol-fl basis sets due to numerical instabilities in the Kohn-Sham self-consistent-field procedure during the geometry optimisation step. Thus, we are unable to make a performance comparison for those basis sets for this test case.
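The argument that frequency shifts, rather than intensity errors, degrade the high-frequency overlap can be checked with a toy model. The Python sketch below is a minimal illustration under stated assumptions: the overlap is a doubly normalised inner product of Lorentzian-broadened spectra, and the integrated difference is the integral of the test spectrum minus the reference divided by the integrated absolute reference. Both forms, the 10 cm⁻¹ half-width, the mode positions and intensities, and the names `broaden` and `region_metrics` are hypothetical and are not taken from the paper.

```python
import math


def broaden(modes, grid, hwhm=10.0):
    """Lorentzian-broadened spectrum from (frequency, intensity) sticks."""
    return [
        sum(i * hwhm**2 / ((x - f) ** 2 + hwhm**2) for f, i in modes)
        for x in grid
    ]


def region_metrics(test_modes, ref_modes, lo, hi, step=0.5):
    """Return (overlap, integrated difference) restricted to [lo, hi]."""
    n = int((hi - lo) / step) + 1
    grid = [lo + k * step for k in range(n)]
    t = broaden(test_modes, grid)
    r = broaden(ref_modes, grid)
    # Assumed doubly normalised overlap.
    overlap = sum(a * b for a, b in zip(t, r)) / math.sqrt(
        sum(a * a for a in t) * sum(b * b for b in r)
    )
    # Assumed integrated difference, scaled by the integrated |reference|.
    int_diff = sum(a - b for a, b in zip(t, r)) / sum(abs(b) for b in r)
    return overlap, int_diff


# Hypothetical reference: two fingerprint modes and a +/- couplet of C-H stretches.
ref = [(1100.0, 1.0), (1400.0, -0.8), (3000.0, 1.0), (3020.0, -1.0)]
# Hypothetical test basis: identical intensities, but the fingerprint modes
# shift by 1 cm^-1 and the C-H stretches by 15 cm^-1.
test = [(1101.0, 1.0), (1401.0, -0.8), (3015.0, 1.0), (3035.0, -1.0)]

low = region_metrics(test, ref, 0.0, 2000.0)
high = region_metrics(test, ref, 2000.0, 4000.0)
print("0-2000:    overlap = %.3f, integrated difference = %.4f" % low)
print("2000-4000: overlap = %.3f, integrated difference = %.4f" % high)
```

In this model the intensities are exact, so the integrated difference stays close to zero in both regions, yet the overlap in the upper region falls sharply because the shifted positive peak lands close to the negative reference peak. The same mechanism would account for low or negative overlaps wherever modes of alternating sign are closely spaced.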

Discussion
Table 8 summarises the SNO and DNO values, as well as integrated difference errors, for each type of spectrum, averaged across the set of test molecules for each basis set. From these data, we can reasonably classify the basis sets into four groups based on their relative performance: (1) the small rDPS, augD-3-21G, augT3-3-21G, and R-ORP sets; (2) aug-cc-pVDZ and Sadlej-pVTZ; (3) ORP, LPol-ds, LPol-dl, and LPol-fs; and (4) LPol-fl and aug-cc-pVTZ.
… for each vibrational mode, all determined relative to the spectra obtained with the large aug-cc-pVQZ basis set.
Based on these metrics, as well as visual comparison of the various spectra, we were able to organise the basis sets into four groups of increasing accuracy. The smallest group, comprising the rDPS, augD-3-21G, augT3-3-21G, and R-ORP basis sets, generally yielded poor VCD and ROA spectra compared to the QZ reference basis set and thus is not recommended for robust results. The second group (aug-cc-pVDZ and Sadlej-pVTZ) performed reasonably well in most cases and may ultimately yield the best price/performance ratio among all the sets considered here. For the larger basis sets in groups three (ORP, LPol-ds, LPol-dl, and LPol-fs) and four (LPol-fl and aug-cc-pVTZ), however, the ROA and VCD spectra were reproduced well compared to the much larger aug-cc-pVQZ basis.
In addition, we found that the smaller basis sets performed better in the prediction of VCD rotatory strengths than of ROA CIDs. This feature diminished as basis-set size increased, with the performance being nearly indistinguishable between groups three and four. Additionally, vibrational frequencies were better predicted in the low-frequency ('fingerprint') region of the spectrum, 0–2000 cm⁻¹, resulting in improved overlaps between the spectra in this region as compared to the high-frequency region, 2000–4000 cm⁻¹. Interestingly, we also observed overestimation of VCD rotatory strengths by most of the basis sets for the high-frequency region, but not for the ROA CIDs.
The performance of the basis sets revealed promising results for the two-fold aim of property-oriented basis sets. The Sadlej-pVTZ basis set performs at approximately the same level as the slightly smaller aug-cc-pVDZ set, though it is known that for some molecules more robust basis sets are required for accurate property prediction. The LPol-dl basis typically yields better VCD and ROA spectra compared to the ORP and LPol-ds sets, making it a good alternative when higher accuracy is required, though its usage does come at greater computational expense (it is ca. 40% larger than the ORP basis set).
On average, the LPol-fs and LPol-fl basis sets provide better performance than their smaller counterparts, but given their large size (LPol-fl is approximately 2.2 times larger than ORP) and minor performance improvements (as well as producing numerical instabilities in the Kohn-Sham SCF procedure for (P)-2,3-pentadiene and β-D-glucose), we conclude that they are not ideal candidates for usage in robust computational protocols for simulating VCD and ROA spectra. On the other hand, the performance of the ORP and LPol-ds basis sets is particularly noteworthy: on average, they provided highly accurate results at a significantly reduced size (aug-cc-pVQZ is nearly three times larger than the ORP basis set). We therefore recommend these basis sets for standard computational protocols for the prediction of ROA and VCD spectra. However, if the size of the molecular system precludes their use, the aug-cc-pVDZ or Sadlej-pVTZ basis sets may provide sufficient accuracy.

Figure 1. The molecular test set for testing conventional and property-optimised basis sets.

Table 1. Number of basis functions for each molecule and basis set.

Table 7. Overlaps, integrated differences, and number of events for the VCD and ROA spectra of β-D-glucose using B3LYP.
For β-D-glucose, while the SNO/DNO values for the first three small basis sets (rDPS, augD-3-21G, and augT3-3-21G) range from −0.001 to 0.06 for VCD and from 0.10 to 0.26 for ROA, the R-ORP basis yields values of 0.21/0.22 for VCD and 0.40/0.37 for ROA, as well as smaller values of the integrated difference error. Not surprisingly, the large number of vibrational modes for β-D-glucose (66) results in a larger number of sign discrepancies, as many as 29 in the VCD spectrum with the rDPS basis set, many of which are clearly discernible in both the VCD and ROA spectra, especially in the higher-frequency domains.

Table 8. Average overlaps and integrated differences for the basis sets across all molecules in the test set for the VCD and ROA spectra computed with B3LYP across the entire spectral region.