Predicting solubility and driving forces for crystallization using the absolute chemical potential route

This work uses a dual absolute chemical potential route with a centroid reference for the dissolved molecules and an Einstein crystal reference for the solid. We use these calculations to predict the vapour pressure, solubility limit, and chemical potential driving forces for crystallization of napthalene (large and rigid) and succinic acid (floppy and anharmonic). We examine each source of error in the free energy calculations and demonstrate that the chemical potentials can be computed with an overall precision of ca. 0.05 kcal/mol, leading to solubility predictions with precision of ca. 10%. The precise calculations should facilitate studies of nucleation kinetics, growth kinetics, and efforts to develop accurate force fields. GRAPHICAL ABSTRACT


Introduction
Greater than 90% of small molecule drugs delivered to patients are in crystalline form [1]. A key physical property that establishes the driving forces for crystallization is the solubility of the molecule/compound being crystallized. Solubility of a substance-the solute-is the maximum amount of solute that can physically dissolve in a given solvent at a specified temperature and pressure [2]. From a statistical thermodynamic point of view, solubility is the concentration of the solute in solution at which the solute's chemical potential equals that of the solid crystal form.
The driving force for crystallization at nonequilibrium conditions, is the difference in the solute's chemical potential between the crystal and fluid phases. For vapour growth, it is the chemical potential difference between the solid and vapour, and for solution CONTACT  growth, it is the chemical potential difference between the solid and solution. Therefore studies of crystal nucleation and growth rely on computational methods to predict chemical potential differences between phases beyond the coexistence conditions [3][4][5][6][7]. This article leverages the tools we have developed to compute a solute's absolute chemical potential in the solid, gas, and solution phases to predict solid-fluid equilibria and the driving forces for crystallization [8]. For highly flexible molecules in fluid phases we introduced a reference entity called the 'centroid' [8]. The absolute chemical potential of the centroid can be computed and then transformed into the absolute chemical potential of the real molecule by the use of thermodynamic integration. As explained in our previous work [8], the absolute chemical potential route circumvents the need to perform potentially difficult fluid-to-solid transformations.
For the solid phase, we use the Einstein crystal reference by tethering each atom to its lattice site and then transform it to the real solid [8][9][10][11][12][13][14].
We make predictions for two model compounds, naphthalene (a rigid molecule) and succinic acid (a floppy molecule with multiple conformers)-thereby spanning the two extremes on the rigidity scale. For succinic acid, we investigate its known stable polymorph (the β polymorph), and the recently discovered γ polymorph [15]. The γ polymorph predictions demonstrate the future potential of the digital approach where computational studies can predict thermodynamic properties for difficult-to-crystallize polymorphs/compounds subject to the accuracy of the employed force fields.
For each model compound, we begin by predicting the solid-vapour equilibrium. This prediction employs only a single force field-the one describing the solute-and thereby provides a stand-alone test of the solute's force field. Next, we predict the solid-solution equilibrium which involves the use of an additional solvent force field. The accuracy of this prediction relies upon the accuracy of all three interactions: solute-solute, solute-solvent, and solvent-solvent. In the following sections, we make the solid-fluid equilibria predictions and comment on the accuracy of the predictions.

Predicting solid-vapour equilibrium
At solid-vapour equilibrium we have, where, and, μ C , μ G are the chemical potentials of the solute in the crystal and the gas phase, respectively; T, P, and P sat are temperature, pressure, and vapour pressure, respectively, a * is the centroid's Helmholtz free energy, a * →G is the Helmholtz free energy difference between the molecule and centroid, and v G is the volume occupied by a centroid/molecule. As shown in Figure 1, we start with computing the reference system's free energy in each of the phases (Einstein crystal for the solid; centroid for the fluid) followed by computing the free energy change to transform to the real systems. Note, a pure compound system is being considered here. For solids with low sublimation vapour pressure, O(10 −3 Pa) -O(10 2 Pa), we can assume that the ideal gas law is valid, thus, we insert k B T in place for P sat v G in Equation (1b). Also, μ C (T, P sat ) ≈ a C (T, v C ) [8]. This allows us to use the Helmholtz free energy of the solid equilibrated at 1 atm in Equation (1a). Thus, we get From independent sets of simulations we can compute a C (T, v C ), a * →G (T) and a * ,intra (T). Inserting them in Equation (1c) and solving for v G we get the volume per ideal gas molecule. Finally, we get the vapour pressure by using the ideal gas law, i.e. P sat = k B T/v G . Note: a sym accounts for the degeneracy in sampling the gas phase for a symmetric molecule [9,10]. For naphthalene and succinic acid, β a sym = ln(4), and β a sym = ln(2), respectively.

Simulation details: solid and gas phases
We model naphthalene (NAP) using the OPLS-AA force field [16] and succinic acid (SA) with the General Amber Force Field (version 1.8) [17] and AM1-BCC charges [18,19] (see Figure 2 for a ball and stick model of these molecules). We compute the vapour pressure of naphthalene at 4 temperatures -298, 308.17, 318.17, 333.34 K, and that of succinic acid at 6 temperatures, namely 300, 305, 310, 315, 330, and 350 K. The temperature choice is based on two factors, (i) availability of experimental data for testing the predictions, and (ii) significant temperature difference between adjacent temperatures to span a 50 K range. All molecular dynamics (MD) simulations for the solid and gas phases were performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) package [20].
The simulation box size and atomic positions in the unit cell required for the NVT simulations are determined from an NPT simulation. All systems were equilibrated at their respective temperature and a pressure of 1 atm by simulating an NPT ensemble. A 10 ns equilibration run followed by a 10 ns production run was used to compute the average lattice parameters and atomic positions in the unit cell. The Lorentz-Berthelot combining rules [23][24][25] were used for computing the interatomic pair coefficients. We used a time step of 0.5 -1.0 fs. Nonbonded interactions were cutoff at 1 nm with standard tail corrections applied (see Supplementary Information for details), with long range electrostatics handled by LAMMPS' Particle-Particle-Particle-Mesh (PPPM) summation [26]. The NPT ensemble was simulated using a Langevin thermostat and Nose-Hoover barostat [27] with P = 1 atm. A 25 point Gauss Legendre quadrature method [28] was used to evaluate the free energy difference using thermodyanmic integration [29]. The NVT simulations were run for a total of 8 ns, of which the initial 3 ns were used for equilibration. Data every 1 ps was used to compute the thermodynamic averages.
The Einstein crystal method calculations must be carried out at fixed centre of mass for numerical reasons [14,30]. Therefore, to constrain the centre of mass we setup the simulation to have zero external force due to springs. To achieve this, we [31] (i) scale the spring constants with atomic masses so that each oscillator has equal angular frequency [32], (ii) initialise the system with zero centre of mass velocity, and, (iii) initialise each atom at its lattice position. We first set the spring constant for hydrogen atoms by matching the mean squared displacement and then scale it with atomic masses to get the spring constants for carbon such that k sp,i /m i = constant [32,33]. The spring constants for the reference Einstein crystal in simulations at all temperatures are chosen to reproduce the mean square displacement of hydrogen atoms in the real crystal at 298-300 K . For naphthalene, the spring constants for hydrogen and carbon were set to 8.09 and 96.44 k B T/Å 2 , respectively. For succinic acid, the spring constants for hydrogen, carbon, and oxygen were set to 22, 262.12, and 349.20 k B T/Å 2 , respectively.

Gas phase simulations
For naphthalene, the centroid was constructed by tethering the hydrogens and carbons to the centre of mass by springs having force constants 16.786 and 200 k B T/Å 2 , respectively. For succinic acid, the centroid was constructed by tethering the hydrogens, oxygens, and carbons to the centre of mass by springs having force constants 1.26, 20.0, and 15.01 k B T/Å 2 , respectively. The free energy of the centroid is computed using the procedure laid out in Section 3.1 of our previous work [8]. For adding each atom, the ratio Z n /Z o n is computed using a custom Monte Carlo (MC) code. We run 100 million MC steps with a acceptance ratio between 30% and 50%. Data every 1000 MC steps is used to compute thermodynamic averages.
The free energy difference a * →G was computed in 5 sets of stages (see Tables 1 and 2). All simulations were performed in an NVT ensemble with a Langevin thermostat. The simulation box was (100 Å) 3 . No long-range interactions were used in the gas phase simulations. The nonbonded cutoff (LJ and coulombic) was 20 Å, much larger than the molecule size in order to include all pairwise interactions. A time step of 0.1-0.5 fs was used. For succinic acid, Stages 3, 4, 5 were simulated using Replica Exchange Molecular Dynamics to sample the COOH dihedral. Eight replicas at T = 300, 305, 310, 315, 330, 350, 500, 600 K were run with swaps attempted every 250-500 fs. Note, LAMMPS log files dump replica indices at the end of the time step after the swap, whereas the thermodynamic output is dumped at the end of the time step before the swap, therefore, care must be taken in aligning the thermodynamic output. The remaining simulation details specific to each set of stages are specified in Tables 1 and 2.

Results and discussion
Tables 3 and 4 show the results of the dimensionless solid phase free energies computed at different temperatures for naphthalene and β-succinic acid, respectively. Tables 5 and 6 show their dimensionless intra-centroid free energies. Note, the computation of these free energies requires the centroid simulations described in Ref. [8] to be performed only at any one temperature. This is because we scale the spring constants of the centroid with     Figure 3. The saturation pressure and driving force for crystallization of naphthalene from vapour at 298 K. The shaded area respresents a 1 standard deviation confidence interval. temperature. Tables 5 and 6 also show the results of transformation free energies of the centroid to a molecule for naphthalene and succinic acid, respectively (see Supplementary Information for further breakdown of the transformation free energies). Using a C (T, P sim ), a * →G (T), and a * ,intra (T) obtained from our simulations we compute the volume (v G ) using Equation (1c). We then use the ideal gas equation of state to compute the vapour pressure. Figures 3 and 4 show the absolute chemical potentials of solute in each phase as a function of the vapour pressures of naphthalene and β-succinic acid, respectively. The predicted vapour pressures at different temeperatures for the two systems are reported in Tables 7 and 8.

Solid free energy results 2.2.2. Centroid free energies & centroid-to-molecule transformation free energies 2.2.3. Vapor pressure & vapour growth driving force predictions
Having computed the vapour pressures at various temperatures, we plot the natural logarithm of vapour pressure against inverse temperature, i.e. construct a Clausius-Clapeyron plot from the vapour pressure data for each molecule. As seen in Figures 5 and 6 the simulation Figure 5. The Clausius-Clapeyron plot of naphthalene's sublimation vapour pressure and temperature, where P † = 1 Pa, and T † is the corresponding sublimation temperature ; the secondary y-axis tick labels are the simulation data points.  We also plot experimental vapour pressure data by Ambrose et al. [34] for naphthalene and by Bilde et al. [35,36], Cappa et al. [37] and Saleh et al. [38] for βsuccinic acid. For β-succinic acid, since the vapour pressures being measured are extremely low ( ∼ mPa), their measurement is very difficult. This can be seen from the spread in experimental data between different research groups and the uncertainty in the respective data points. Our vapour pressure predictions lie within the spread of the experiments [36] for β-succinic acid and in close agreement with experiments for naphthalene. Note that the vapour pressure predictions exponentially magnify small errors in the free energy calculations (see Equation (1c)). Approximately, a 0.05% error in absolute free energy estimates translates to a 10% error in vapour pressure predictions. This is also demonstrated by the revised results for β-succinic acid. Increasing the accuracy of the free energy calculations by ≈ 1% (primarily by including long range dispersion corrections in the solid phase), the vapour pressure predictions decreased by 30-50%. Thus, we can conclude from these solid-vapour equilibrium predictions that OPLS and GAFF accurately capture the solute-solute interactions for naphthalene and β-succinic acid, respectively.

Predicting solid-solution equilibrium
At solid-solution equilibrium we have As we can see from Equation (2b) we just need additional solvation free energy computations at different concentrations to obtain the chemical potential of the solute in solution [39][40][41]. In the next sections we describe solvation free energy calculations for naphthalene and succinic acid in aqueous solutions.

Simulation details: solution phase
We perform hydration free energy calculations using the decoupling approach as laid out in our previous work, Khanna et al. [41]. All MD simulations for the solution phases were performed using OpenMM [42]. Naphthalene as described by the OPLS-AA [16] force field was solvated in 864 SPC water molecules [43]. Succinic acid as described by the GAFF [17] forcefield was solvated in 864 SPCE water molecules [44]. The Lorentz-Berthelot combining rules [23][24][25] were used for computing the interatomic pair coefficients. The initial configurations were generated using packmol [45]. We used a time step of 0.5 fs. Nonbonded interactions were cutoff at 1 nm, with long-range electrostatics handled by OpenMM's Particle Mesh Ewald (PME) summation [46]. Long-range dispersion corrections to energy and pressure were applied.
For naphthalene, the NPT ensemble was simulated using Langevin thermostat and a Monte-Carlo barostat at T = 298.0 K and P = 1.01325 bar. All systems were equilibrated for 4 ns followed by 8 ns production runs. A 15 point Gauss-Legendre quadrature method [28] was used to evaluate the coulombic solvation free energy. For computing the solvation free energy due to the LJ interactions we used a 45 point Gauss-Legendre quadrature. Data every 2 ps was used to compute the thermodynamic averages.
For succinic acid, to compute the coulombic solvation free energy a 5 point Gauss-Legendre quadrature was employed. Replica exchange molecular dynamics (REMD) simulations were performed to sample the COOH dihedral (see Supplementary Information). A total of 22 replicas spanning temepratures 300-470 K were used. Since, the temperatures employed in the REMD simulations were above the boiling point of water at 1 atm, we approximated the NPT ensemble by simulating a NVT ensemble using the average of the box volume computed at 300 K, 1 atm pressure, and N solute = 0 and 1, and take the mean value. Each replica was run for 18 ns. The first 8 ns were discarded as equilibration burn, and remaining 10 ns of production run data was sampled every 2 ps to compute thermodynamic averages. For computing the solvation free energy due to the LJ interactions we used a 45 point Gauss-Legendre quadrature. We discarded parts of the trajectory where the COOH dihedral flipped to the trans position by chance as we know from the COOH PMF that it prefers to be in the cis conformation until it is coupled with coulombic interactions with water. All simulations for the LJ part were run for a total of 8 ns, of which the first 4 ns of thermodynamic data was discarded as a part of equilibration burn. Data every 2 ps was sampled to compute thermodynamic averages.

Results and discussion
Tables 9 and 10 show the results for the dimensionless solvation free energies computed at different temperatures for naphthalene and succinic acid, respectively. Next, using Equation (2b), the dimensionless free energy results for the solid, gas and solution phases (Tables 3, 5,  and 9, for naphthalene and Tables 4, 6, and 10, for succinic acid) we generate the solubility plots as shown in Figures 8 and 9. These plots make use of two Table 9. Dimensionless solvation free energies of naphthalene (OPLS) in water (SPC). approximations for x NAP ≤10 −4 and x SA ≤0.05: (i) the volume of mixing is zero, and ii) G solv (x) ≈ constant.  Figure 9, the μ solution solute curve is shifted up by ca. 1 k B T or if the μ solid solute curve is shifted down by ca. 1 k B T, the force field predictions will align with experiments. This difference is less than 1% of the free energies being computed. However, due to exponentiation, the solubility predictions still deviate substantially from experimental measurements. Nonetheless, as is discussed in the later results, the solubility trend with temperature is captured well by GAFF + SPCE force fields. We hypothesise, that most of the ≈ 1k B T inaccuracy stems from the force field, in particular, the succinic acid -water interactions which follow standard Lorentz-Berthelot combining rules. Other sources seem less likely. For example, the succinic acid-succinic acid interactions are validated via the solid-vapour prediction and the waterwater interactions are from an extensively tested SPCE water model. Sampling errors should be small given the extensive sampling adopted here. Finally, assumptions of a constant solvation free energy and zero volume of mixing should be valid given the solute mole fraction never exceeds 5%.
We further compute the solubility of our two model compounds at different temperatures and plot the results on van't Hoff plots (ln(x sat ) vs. 1/T) to visualise the solubility trend and compute the enthalpy of dissolution-given by the slope of a van't Hoff plot. The solubility predictions lie on a straight line bolstering the results (see Figures 10 and 11). The enthalpies of dissolution for naphthalene and β-succinic acid are 7.8 ± 0.4 and 8.3 ± 0.7 kcal/mol, respectively-which are close to agreement with experimental values of 7.1 kcal/mol [47] and 7.6 kcal/mol [48], respectively. Figure 9. The solubility plot for (β and γ )-succinic acid in water at 300 K and 1 atm. This plot shows the driving forces for crystallization as predicted by GAFF + SPCE force fields. The shaded area respresents a 1 standard deviation confidence interval. We predict a vapour pressure of 0.0142 ± 0.001 mPa (using GAFF) and a solubility mole fraction of 0.0263 ± 0.002 (using GAFF and SPCE) for the γ polymorph of succinic acid at 300 K -whose vapour pressure and solubliity have not yet been measured experimentally. In fact, the γ polymorph has not been synthesised since its first and only concomitant appearance with the stable β form during an attempted purification of a peptide by cocrystallization [15]. The solid free energy results shown in Figure 9 indicate that the γ and β form are close in stability to one another at 300 K with the γ form being more stable by 0.39 k B T/molecule. Note, however that the GAFF force field doesn't accurately predict the crystal structure of the γ form (as shown in the Supplementary Information). Therefore, the true accuracy of these predictions remains to be seen until future experiments which successfully synthesise the γ polymorph and measure its solubility.

Conclusion
We have developed a decoupled route to compute fluidsolid equilibria avoiding the fluid-to-solid transformation steps. Our approach combines judicious reference states, advanced sampling methods, and solution free energy calculations to predict the absolute chemical potential of a solute in the gas, solution and crystalline solid phases. For the gas phase, we start from a gas phase 'centroid' reference system with the same number of vibrational, rotational and translational degrees of freedom as the real molecule. This enables us to accurately compute absolute free energies of 'floppy' polyatomic molecules. For the solution phase, we compute aqueous solvation free energies using multiple stages of free energy pertubations. For the solid phase, the absolute chemical potential is computed via the Frenkel-Ladd method. We used these techniques to compute sublimation vapour pressures (P sat ) in temperature ranges 298-333 K and 300-350 K for naphthalene and succinic acid, respectively. The predicted sublimation vapour pressures are in close agreement with available experiments, demonstrating the precission of the vapour and solid phase chemical potential calculations and the accuracy of the solute force field.
The computed solubility limits (x sat ) are 17% smaller than a measured value for naphthalene and 3 times larger than a measured value of succinic acid. However, standard error of the x sat calculations are extremely small ( < 10%). This demonstrates the precision of the methods and confirms that the inacuracies are due to solutesolvent and solvent-solvent parts of the force field.
The ability to compute precise fluid and solid phase chemical potentials should facilitate the testing and development of accurate force fields as shown here. And as also shown in previous work (Zimmermann et al. [6] and Joswiak et al. [7]) it should also help calculate accurate nucleation and growth rates.