Solvent Phase Optimizations Improve Correlations with Experimental Stability Constants for Aqueous Lanthanide Complexes

ABSTRACT Stability constants provide insight into ion complexation in water. While computational studies have been shown to model the energy of the complexation successfully using a thermodynamic cycle approach, it does not extend to calculating the stability constants for 1:1 lanthanide to ligand complexes in solution. Using B3LYP and 6-31+G* Pople basis with small core effective core potential (ECP) on the lanthanum ion, and a solvent model based on the full solute electron density (SMD) solvation model we computed and compared with previously published stability constants of the ligands: acetate, acetohydroximate, acetylacetonate, methanoate, tropolonate, hydroxide, catecholate, malonate, oxalate, phthalate, and sulfate. The best R2 values for the thermodynamic cycle can only be determined by separating the mono and divalent ions to achieve an R2 value of 0.86 and 0.74 for mono and divalent ions, respectively. We show that by optimizing the lanthanide-ligand structures in implicit solvent, we achieve an improved correlation between experimental and computed stability constants of R2 value of 0.89 for the combined mono and divalent ions.


Introduction
Interest in rare earth elements (REE) has experienced a resurgence as they become more prevalent in new technologies, and the limited number of global sources has led to their being classified as critical materials. [1,2] Diversifying supply is of paramount importance to stabilize prices and foster an environment for continued innovation. [1] However, obtaining REEs in an environmentally friendly and cost-effective manner is difficult due to the similarities in REE chemistry. One step in obtaining REEs that could be improved is the extraction and separations process typically performed using solvent extraction methods.
Understanding the subtle differences in the chemical environment, such as the factors influencing the stability of metal ligand complexes in water, would aid in developing improvements to extraction methods. Accurate predictions of stability constants enable the comparisons of the binding sites between ligands with different binding sites. For liquid-liquid extractions, the stability or binding constant can help focus experimental research on systems that selectively target specific metal ions. While liquid-liquid separations are governed by more factors than metal-ligand binding, methods, such as the TALSPEAK process [3] where separate aqueous and organic soluble ligands can work cooperatively to maximize selective extraction can directly benefit from aqueous phase binding predictions.
Accurately predicting the stability of a metal binding to a ligand in water remains a challenge. For a metal (M) and a ligand (L) in equilibrium with a metal ligand complex (ML) (Equation (1)), the equilibrium or stability constant, K 1 , is defined as the ratio of the concentrations of a one-to-one ratio of the metal-ligand complex, ML ½ �, and the product of the concentrations of free metal ion, M ½ �, and free ligand, L ½ �, in solution at a particular temperature, pressure, and ionic strength, Equation (2). In this study, we focus on predicting stability constants for lanthanum to represent the REE. To simplify the labelling and because only K 1 values are used, the subscript was omitted.
To calculate the logK values, we use the relationship between logK and the free energy of the reaction, ΔG * aq , as described by (Equation (3)), where the asterisk refers to the reference state of 1 M concentration in aqueous solution, the thermodynamic standard state used for solutes. The variables a ML , a M , and a L are the activity for the metal ligand complex, the free metal ion, and the free ligand, respectively. R is the ideal gas constant, and T is the absolute temperature. Stability constants have been successfully predicted computationally using a thermodynamic cycle (TC), Figure 1 for ions in solution. [4,5] Calculating the energy of the reaction using naked ions in implicit solvents is not accurate and the inclusion of explicit water molecules around the ion improves the accuracy of the calculations. [5] The TC method is used to indirectly determine the metal ligand binding energy using gas phase optimizations and calculating the solvent phase energy of the structure frozen in the gas phase geometry. [5] While the method is inexpensive with regard to the addition of solvent effects, it can produce poor results if the gas phase geometry significantly deviates from the geometry of the solvated complex. [6] Thus, our research shows that simply allowing the geometry to relax in the presence of solvent improves the correlation between computed and experimentally observed values. Indeed, in this study, we show for lanthanide complexes, optimizing the structures with an implicit solvent, or the solution phase structure (SPS) method, is better at predicting the stability constants than with the TC method and does not significantly increase the time or expense of the calculations. The SPS method has not been reported previously and the results of this study show that using it to compute binding constants, especially for lanthanide complexes, improves the computed values without significantly increasing the computational cost. Some properties of lanthanides require higher levels of theory such as multireference methods which can take calculations to the order of days or weeks in comparison to the SPS method which takes in the order of half a day in a single node per complex and maintains good correspondence with the experimental values (R 2 =0.89). The aqueous phase change in energy for 1 M concentration reference state, ΔG * aq , is calculated using the TC method described in Figure 1, to derive Equation (4), Each G � aq in Equation (4) is calculated as: and includes the ZPE and thermal energy correction resulting from a gas phase simulation at 0 K to a gas at 298.15 K, both calculated in the Hessian. There is an additional correction to the energy of the water, nRTlnð½H 2 O�Þ = 2.38 kcal/mol, that accounts for pure water having a concentration 55.34 M. [5] Methods All calculations were run using B3LYP [7][8][9] in NWChem 6.5. [10] La was modelled with the Stuttgart RSC Segmented basis set and ECP [11,12] for the 28 core electrons which reduced the calculation expense and accounted for Figure 1. The thermodynamic relationship between ΔG * aq , ΔG � g , and ΔG * solv . [5] . scalar relativistic effects. All gas and solvent phase optimizations were performed using the Pople 6-31+G* basis set [13][14][15][16][17][18][19] on the nonmetal atoms and SMD implicit solvent model for solvent phase calculations. This level of theory and basis has been shown to generate sufficiently accurate geometries for iron and uranium metal-ligand complexes. [4,20] The SMD implicit solvent model was chosen because it was parameterized with charged species similar to those used in this study. [21] To calculate ΔG * aq without using the thermodynamic cycle, we used our SPS method. For this method, the SMD implicit solvent model and 6-31+G* basis was used to optimize the geometries. For both gas and solvent phase optimized structures, a single point energy calculation using 6-311++G** basis set [17][18][19]22,23] was used to compute more accurate energies. The ΔG * aq obtained using the SPS method was determined by optimizing the solvent phase molecules in the chemical reaction described in the lower chemical reaction of the TC method, Figure 1. All structures were determined to be minima using vibrational analysis, with and without implicit solvent models for gas and solvent phase optimized structures, respectively. For the SPS method, we did not include thermal corrections due to the possible inaccurate accounting of temperature corrections resulting from the use of SMD in both the optimization and the Hessian.
Eight explicit water molecules surround the lanthanide ion and one or two water molecules are displaced by a mono or bidentate ligand, respectively. The water molecules form part of the first hydration sphere and including it improves the accuracy of the energies calculated for ions using implicit solvent models. [5] The computationally derived stability constants were compared to the experimental values from the critically reviewed stability constants compiled by Smith and Martell. [24] The stability constants used for comparisons were those at zero ionic strength and 25 ºC. When necessary, the value was corrected to an ionic strength of zero using the Davies equation. [25] Results and discussion  Figure 2 complexed to lanthanum have been studied computationally to understand their hydrated structures and be able to predict the stability constants, logK values. The structures were chosen for two main reasons: their relatively fixed conformations and the availability of reliable experimental data. The fixed geometries were expected to minimize errors due to structure differences between gas and solvent phase structures. The experimental data were obtained from the critically reviewed data compiled by Smith and Martell. [24] Using the TC method ( Figure 1) and Equation (3) to compute logK values found in Table 1, and comparing the results to the experimental logK values, a poor correlation of (R 2 = 0.47) is observed (SI Figure 1). However, separating monovalent and divalent ions as in Figure 3 improves the correlation for both monovalent and divalent ligands (R 2 = 0.86 and 0.74, respectively). After fitting the data using a linear regression, the best fit-line equation was used to extrapolate the logK values. When using the TC method, the ΔlogK TC has a root mean square deviation (RMSD) of 4.51 and the extrapolated values, ΔlogK ext have a RMSD of 1.39. The improved correlations and RMSD values obtained when separating mono and divalent ligands is also observed when using the TC method to determine logK values for uranyl. [4] As was previously discussed by J. Ho, [6] the TC method breaks down if the gas phase geometry is not representative of the structure in solution. Thus, despite choosing ligands with relatively rigid structures, the impact of   Figure 3; b logK TC -logK expt ; c logK ext -logK expt .
including the first solvation sphere in the gas phase optimization results in a geometry that maximizes the interactions between explicit solvent molecules and the ligand. This effect is particularly pronounced in systems where the ligand contains hydrogen bond acceptors that do not participate in coordinating to the metal, Figure 4.
A further improvement over the TC method is observed when using the SPS method, where a high correlation (R 2 = 0.89) between computed and experimental logK values is found, Figure 5. When the origin is included, such that the x-and y-intercept is zero, the higher correlation of (R 2 = 0.98) is observed, SI Figure 2. The high level of correlation comes from combined mono and divalent ligands, not separate correlations for the different valency as obtained when using the TC. The single trend may result from using the SMD solvation model in the optimization calculations. The computed values and the difference between computed and extrapolation values is found in Table 2. The RMSD for the computed values, logK SPS , is 6.23. The largest deviation in logK SPS was observed with catecholate, having a ΔlogK SPS of 10.75. The RMSD is higher for the SPS method but after using the linear fit to extrapolate the logK values, the difference between experimental and extrapolated values ranges from an absolute value of 0.05 to 1.27 log units and an RMSD of only 0.77. The largest difference was calculated for the oxalate ion with a ΔlogK ext of −1.27. The errors may arise from the errors associated with the SMD model parameters and how similar the molecules in this study are to the set used to parameterize and validate the SMD model. From Marenich et al. [21] we know that the error in energy between computed and experimental energies computed using the SMD model can range from 0 to 1.32 kcal/mol for the types of molecules and level of theory examined in this study. An error of 1 kcal/mol

Conclusion
We found that with lanthanide structures, especially where the gas phase and solution phase geometries differ significantly, the logK values can be predicted with our new SPS method of calculating ΔG * aq . Using the TC method leads to increased errors and poor predictions when the gas phase structures do not accurately reflect the structure in solvent. This case is especially pronounced when the ligand can hydrogen bond to the explicit water molecules in the gas phase. The SPS method, on the other hand, gives excellent correlations with the experimental values, even with ligands containing different charges.
The large effect on the computed logK values by the phase used to optimize the structure, whether gas or solution phase, may be improved by using a method that minimizes artifacts in the geometry optimization. For example, the TC method could be improved with the addition of a second solvation sphere of explicit water molecules. However, while adding water molecules may improve the results, it would certainly increase the computational cost. Another option might be to use the SPS method without any explicit water molecules. The method outlined in this paper allows for the screening of ligands and enables the exploration of possible candidates for selective ligands with high binding affinity at a reasonable computational cost with solvent phase optimizations of the largest systems taking on the order of 10-24 hours when using modest resources such as 12 to 24 processors per calculation.  Figure 5; b logK SPS -logK expt ; c logK ext -logK expt .