Multipole models of sulphur for accurate anisotropic electrostatic interactions within force fields$

Abstract Nowadays, as computing has become much more available, a fresh momentum has been observed in the field of re-visioning and re-parameterizing the usual tools, as well as estimating for the incorporation of new qualitative capabilities, aimed at making more accurate and reliable predictions in drug discovery processes. Inspired by the success of modelling the electrostatic part of the halogen bonding (XB) by means of the distributed multipole expansion, a study is presented which attempts to extend this approach to a tougher case of σ-hole interaction: sulphur-based chalcogen bonding. To that end, 11 anisotropic models have been derived and tested for their performance in the reproduction of reference ab initio molecular electrostatic potential. A careful examination resulted in three models which have been selected for further examination as a part of the molecular mechanics force field (GAFF). The combined force field was used to estimate inter- and intra-molecular interactions for the molecular systems, capable of differentiating the binding from the σ-hole and other directions. The anisotropic models proposed were generally able to correct the wrong predictions of the sulphur models based only on isotropic charges and, thus, are a promising direction for further development of the refined electrostatics force fields.


Introduction
The recent surge in the supply of available computing power has led to a re-evaluation of many molecular modelling methods, which have been almost unmodified for a few decades. This is especially applicable to molecular mechanics force fields. One of the main driving forces for this movement is not related, however, to a uniform quantitative improvement of prediction quality, but rather stems from the inability of the classical force fields to describe certain interactions, even at a qualitative level. One such prominent example is halogen bonding (XB), where classical approaches using atom-centred charges are not capable of describing the observed interactions, due to the lack of detailed description of molecular electrostatic potential (MEP) distribution around heavy halogen atoms. The in-depth picture reveals high anisotropy of the MEP, with the area of the positive potential on the cap of halogen atoms (σ-hole) resulting in attractive interaction with electron-rich Lewis donors, impossible in isotropic 'first order' descriptions [1][2][3][4][5]. The new interaction, thus, could be loosely called a 'second order' one, since it is not described by the generally correct but rough isotropic electrostatic potential approximation; however, it is capable of describing the 'first order' effects well.
For the XB description at the empirical level (for the force fields) several models have been proposed [3] -most of them based on introducing asymmetry of the MEP, the most consistent being an extra-point (EP) charge approach [6][7][8][9], an atom-centred multipole expansion [10], in which a single quadrupole term leads to qualitative improvement [11], and some others [12].
Continuing the trend described above, attention was drawn to the sulphur (II) atom [13][14][15][16][17][18], which is rather abundant in natural ligands, approved drugs and drug-like compounds. Both experimental results and high-level calculations indicate that only a decent anisotropy of interaction of sulphur atoms could describe the conformational preference for certain molecular systems [15] within the classical force fields approach.
In this study, we employ an atom-centred multipole approach in order to correct the description of the above-mentioned interactions for sulphur resulting from the MEP anisotropy. In what follows we first derive several reasonable empirical multipole sulphur models and then compare their performance concerning reproduction of reference ab initio MEP on a representative series of molecules. Next, we interpret the models in terms of underlying physics and estimate further applicability and transferability of the model parameters among similar molecules. Finally, the resulting models are applied to a few examples of interactions, for which the anisotropic nature of sulphur plays a distinguishable role, and their implications to drug design are considered.

Empirical anisotropic models
A seemingly simple procedure of obtaining atomic charges suitable for molecular modelling from ab initio calculations has several pitfalls, such as a problem of molecular symmetry preservation, overfit of non-polar residues and resulting poor transferability. Several special protocols were developed to overcome these issues, now making an estimation of electrostatic interactions a routine practice. Stepping forward from isotropic atomic charges to anisotropic atomic electrostatic models amplifies the existing problems and introduces the problem of orientation of anisotropic atomic fields relative to other parts of a molecule. In order to create a potentially successful and simple to understand model, one should perform a thorough investigation of possible local coordinate frame selection and parameter minimization of the model early in development.
For this purpose, 15 different multipole models ( Figure 1) were investigated on the set of four sulphur-containing molecules: thiophene (2), thiazole (3), dimethylsulphide (4) and dimethyldisulphide (5) (Figure 2) as the models of sulphur in drug-like compounds. The furan (1) molecule was also investigated for comparison of oxygen and sulphur atoms. The anisotropic models differ in additional multipoles placed on the sulphur atom, their orientation and symmetry ( Figure 1). The first three models are AM1-BCC, RESP and Merz-Kollman (MK), all of which represent the well-established charge schemes [19][20][21][22][23]. The first two of them are default electrostatics choices for the AMBER family force fields [24,25], while the last one, the Merz-Kollman atomic charge approach, is a direct fit to reference MEP and serves as a reference point of maximum MEP reproduction quality, achievable with atomic charges. The next model, m ('m' states for monopole), employs atomic charges only and an additional dipole and quadrupole for the nitrogen atom of thiazole, to filter out the MEP error introduced by its anisotropy. It also includes topological constraints, so topologically symmetric atoms bear equal parameters. These constraints were applied for all further multipole models. The m model can be treated as an MK approach with anisotropy of the nitrogen atom 'saturated' by an excessive set of multipoles. This 'saturation' approach was successfully applied earlier [11] and here it was utilized by default for every multipole model unless otherwise stated.
The other models use dipoles and traceless quadrupoles. The following definitions for electrostatic potential of multipoles were used:  in which q α is a charge of the αth atom, located at r α , d α is a dipole moment vector and Q α is a quadrupole moment tensor. To increase the ease of parameter interpretation and minimize the overfit, the dipoles and quadrupoles were restrained by the local coordinate frame. Since the divalent sulphur fragment C-S-C has a C 2v symmetry we chose a local coordinate frame consistent with this symmetry: the origin is located in S nucleus, the z-axis is directed along the symmetry axis, the y-axis lies in the plane defined by C-S-C atoms and the x-axis is perpendicular to this plane. It is also possible to choose another coordinate frame, representing the σ-hole concept. In this case, we put the z-axis to be perpendicular to the C-S-C plane and rotated the x-and y-axes by 45° relative to the 'old' z-and y-axes. Since the valence angle of sulphur is very close to 90°, the resulting x-and y-axes are aligned with C-S bonds, making possible the fine-tuning of both σ-holes. This frame breaks the C 2v symmetry of the C-S-C fragment, but it is perspective for highly asymmetric molecules with different substituents in both C α -positions. The next model, md z , adds an atomic dipole, directed along the symmetry axis of the molecule, as shown in Figure 1. Models mq, mq x , mq y and mq z place the atomic quadrupole with main components directed along the axis of the local frame, as shown in the picture. The quadrupole in the mq model is symmetrically unrestricted, while in models mq x , mq y and mq z it is constrained to be symmetric relative to local x-, y-and z-axes, respectively. The quadrupole in the mq r model is also unrestricted, but the local frame is rotated by 45° so the axes lie on the continuation of C-S bonds -in the σ-hole area. The models md z q, md z q z , md z q x , md z q y and md v q r employ both the atomic dipole and the corresponding quadrupole. This set of models thus obtained covers all reasonable orientations and constraints of atomic multipoles and has only 1-3 additional parameters per sulphur atom relative to pure atomic charges.

Reference MEP calculation
RHF/6-31G* level of theory was chosen for reference MEP calculations, since it is a default choice for RESP atomic charges [21] and, thus, the AMBER family force fields [24,25].

Fitting models to reference MEP
For the fitting of all of the models, the reference MEP was estimated on five layers (1.6-, 1.8-, 2.0-, 2.2-and 2.4-times larger than van der Waals radius) of Connolly surfaces with a 1.12 Bohr -2 point density in Firefly v 8.2 software [26]. The point density was increased from the RESP standard one for better sampling. The molecular geometry was generated with OpenBabel v 2.4.0 software [27] (with-gen3D option) and left as is. The multipole model fitting was performed as described by Sigfridsson and Ryde [28] with in-house software Electrostatic Tools freely available on our website [29]. This technique allows for incorporation of an arbitrary number of linear constraints on the parameter values and uses the resulting system during the MEP fit.
We used only well-defined components of the fit design matrix. The final fit was carried out through singular value decomposition (SVD) pseudoinversion procedure. Only the vectors for which σ 1 /σ n < 10 7 were used, where σ 1 is the maximum singular value and σ i-1 ≥ σ i . Thus, all singular values greater than n were set to zero (σ n+1, ..., N = 0). This results in minimization of the components, where correlations do not affect the MEP reproduction. Notably, the RESP charges approach [21] strives to handle the same problem, but uses a less strictly defined technique based on restraints on absolute values of the charges fitted. This effectively reduces the almost random fluctuations of charge values pertinent to unrestrained MK charges, which complicate the consistent use of them within the force fields.

Reference calculations for energy estimation
To test the final models, two molecular mechanics experiments targeting inter-and intramolecular interactions were performed. We chose GAFF [25] as one of the widely-used classical force fields. It has also been parameterized so that ab initio MEP was used as the source of atomic charges, followed by the derivation of the rest of non-bonding parameters. Since GAFF has been chosen to incorporate the anisotropic sulphur models and compare them to the models based on only isotropic charges, the ab initio calculation level has been used as to exactly correspond to the level reported in the GAFF paper [25]. This ensures the most consistent comparison of the resulting force fields energies with the ab initio reference.

Intermolecular interactions
As the anisotropic sulphur atom incurs the difference in intermolecular interactions with electron-rich Lewis donor atoms, we exploited this idea explicitly to compare different models under study.
A thiophene-pyridine system was chosen for intermolecular interactions. Potential energy surface (PES) scans along three axes were performed ( Figure 3) with different S‧‧‧N distance values. In the first scan (Figure 3(a)), the C 2v -symmetric complex with thiophene and pyridine molecules occupying different planes was constructed and the S‧‧‧N distance was varied. This is 'head-to-head' collision of two molecules. In the second scan ( Figure 3(b)), the thiophene molecule was rotated so C α , S and N atoms lie on one line. In this case, the pyridine molecule approaches thiophene from the σ-hole of the sulphur atom. In the third case ( Figure  3(c)), the thiophene molecule is rotated by 90° so pyridine approaches it from the aromatic π-system (or from sulphur's lone pair position).
To comply with the GAFF calibration technique, the geometry of monomers was optimized at the MP2/aug-cc-pVTZ level of theory and frozen. The MP2/aug-cc-pVTZ energy values corrected for basis set superposition error (BSSE) were used as a reference for intermolecular interactions. The intermolecular scans were performed in ORCA v 4.0 software [30].

Rotational relative energies
For intramolecular scans, five molecules (#6-10) were selected based on the ligand cases listed in the review [15] (Figure 4). For each molecule, two conformations with torsional angle values of 0° and 180° were selected. In accordance with the GAFF calibration procedure, each conformation was optimized with the frozen torsional angle at MP2/6-31G* level, while the other degrees of freedom were unrestrained. Then a single point calculation at MP4/6-311G(d,p) was performed. Due to the complexity of molecule #8, the S-C-N-C torsional angle was frozen, so only one angle was varied during the scan. The constrained optimizations were performed in ORCA v 4.0 software [30]. The MP4 calculations were carried out in Firefly v 8.2 package [26].
The molecular mechanics calculations of inter-and intramolecular interactions were performed in a similar manner. First, GAFF energy values with zero atomic charges were estimated with the 'sander' program from the Amber v 11 package. Next, electrostatic energy values were estimated with an in-house software based on the Electrostatic Tools package [29].

Results and discussion
The anisotropic models for sulphur, comprised from different sets of atomic monopoles (charges), dipoles and quadrupoles, were derived as described in the Methods section. The models differ in dipole and quadrupole axes direction, symmetry constraints imposed on quadrupole components values and presence/lack of some components. Their performance was first estimated in terms of quality of reproduction of the quantum chemistry derived reference MEP. Then several models were selected for further detailed analysis, based on both MEP reproduction and additional theoretical and practical grounds. Finally, the chosen models were tested in the settings of the intended use -as the electrostatic part of the GAFF molecular mechanics force field on a few model systems described in the review [15].

MEP reproduction with empirical anisotropic models
As described in the Methods section, for the sake of correct comparison of the quality of MEP reproduction, anisotropy resulting from all the other polar atoms was minimized by saturating its description with non-restricted multipoles up to the quadrupole moment. This results in the leading source of the remaining MEP root mean square error (RMSE) being attributed to a different description of the sulphur atom in the models studied rather than to the rest of the molecule. The results are shown in Table 1.
First, the resulting RMSE obtained with MK charges are very close to the one for the charges only model m. The only reasonable difference is for structure #3, where the nitrogen atom of thiazole introduces a significant portion of MEP RMSE due to a high anisotropy, not perfectly described by atomic charges only. Here one can see that the nitrogen atom saturated with an atomic dipole and unrestrained quadrupole results in a far lower MEP RMSE. In the cases of the other structures, the MEP error for MK charges is lower due to the absence of the topological constraints, that is the MK model overfits the data. This verifies the algorithms adopted for MEP fitting work correctly. The performance of the charge-only models MK and m constitutes the bottom line for the models we study, since they are the most accurate for the MEP grids used within the charge-only models. The enhanced multipole-based models proposed should lead to better general MEP reproduction.
Second, one can see that both RESP and especially AM1-BCC charges result in worse RMSE performance due to their relevance to the MEP. RESP charges bear restraints and constraints [21], which on the one hand make them more consistent across similar molecules compared to MK charges, but, on the other hand, it is achieved by the price of certain deterioration in the direct purpose performance. AM1-BCC charges are by design just approximate MEP with charges first obtained via the AM1 method for the molecule under study and then adjusted by the generally fitted constant bond charge increments to mimic the charges obtained through RESP procedure [19,20]. Thus, they are unrelated to the MEP grid used to derive charges for the other models, causing the largest MEP error among the models compared. Still, AM1-BCC charges are more practical in the case of everyday applied modelling, as contrasted to refined in detail theoretical studies, because the method is significantly faster and readily accessible via AmberTools. Hence, the AMBER group recommends the AM1-BCC method as a practical tool for small molecules at everyday use basis.
As could be expected, the most accurate is the model with all components unrestrained (md z q). The choice of z-axis cannot be considered as an additional restraint, since the local symmetry of the fragments surrounding the sulphur atom leaves only this direction as reasonable, as described in the Methods section. Lifting this z-axis restraint could potentially result in certain negligible improvement in the MEP RMSE reproduction, due to the implicit account of anisotropy due to distant fragments of the molecules. Since the C-X-C fragments of molecules #1-4 are C 2v -symmetric, this only applies to molecule #5.
The addition of dipole moment to charge-only sulphur atoms in molecules #2-5 does not result in appreciable improvement, as compared to that for the oxygen atom in #1. This can be interpreted as the charge of the oxygen atom in the charge-only model should ensure the intricate balance between low values of MEP out of the aromatic plane and relatively low molecular dipole moment (see the Supplementary Materials for further details). The former favours greater modulo negative charge on oxygen, whereas the latter, on the contrary, favours lesser modulo value of the oxygen charge. This ill dependency is lifted with the addition of a dipole moment to furan. The MEP of thiophene out of the aromatic plane and close to sulphur is not so negative as for furan ( Figure 5), but the dipole moments are close for the two molecules. Thus, the sulphur charge is more comfortable to ensure the balance described for furan above and the additional atomic dipole moment does not bring significant improvement in MEP reproduction.
Within the series of models where quadrupoles are added above the charge-only model, the performance improves unevenly. Models mq and mq z show the more pronounced improvement. Interestingly, model mq y , where x-and z-components are constrained to be equal, does not lead to even negligible improvement in the oxygen-containing molecule #1. This fact pinpoints the difference in the peculiarities of MEP distribution between oxygen and sulphur atoms in the similar molecular environment. From the theoretical viewpoint this can be attributed to an enhanced p-character of molecular orbitals for sulphur compared to oxygen, leading to the more distinct out-of-plane direction of the lone electron pairs. This series of models also reveals the difference in electrostatic potential distribution between aromatic (molecules #2-3) and aliphatic (molecules #4-5) sulphur: models mq and mq z are capable of providing a better description for aromatic molecules.
Addition of a dipole moment to charge and quadrupole moment models (effectively along the z-axis) leads to further minimization of MEP RMSE. A notable exception to this trend is the md z q z model, where no significant improvement is observed for #2-5. Reference (RhF/6-31G*) mEP of furan (left) and thiophene (right) in the plane coming through the heteroatom (o and S, respectively) and the symmetry axis and out-of-plane for the aromatic ring. oxygen exerts more negative potential along the symmetry axis of the molecule, whereas sulphur generates less negative potential distributed predominantly out of the aromatic plane, both in accord with hybridization theory.
A very informative interpretation for sulphur atoms comes from analysing models mq z and md z q z , where out-of-plane and in-plane quadrupole components are forced to be equal (Q XX = Q YY ) during the fit. This constraint led to the MEP RMSE almost exactly equal to that of the charge-only model m. Moreover, an additional dipole oriented along the z-axis does not remove the error entirely, as can be seen by comparing mq z and md z q z model performance. This observation means that the most important requirement for good reproduction of sulphur anisotropy consists of allowing in-plane and out-of-plane components of the quadrupole to have different values. Particularly, the negative lobe of the quadrupole should be directed out of the aromatic system plane, where the lone pair should be anticipated for sulphur and the positive lobe should be directed in the aromatic plane, perpendicular to the z-axis (molecule symmetry axis for thiophene). For example, in the least restraint model md z q the quadrupole components were found to be -1.98, 1.29 and 0.70 a.u. for Q XX , Q YY and Q ZZ , respectively.
Better reproduction of the MEP by means of anisotropic multipole models could be seen by inspection of Figure 6, where the ab initio reference MEP is compared to the charge-only MK potential and the least restrained md z q models. It should be pointed out, however, that the MEP quality should be compared only in the outer region (beyond the van der Waals radii of atoms), practically reachable for interaction at biologically relevant temperatures.
An important result achieved is the residual MEP error almost completely vanished (see the furan and thiophene MEP errors analysis in the Supplementary Materials) for the anisotropic model md z q compared to the charge-only models, e.g. MK. Therefore, the source of errors in molecular mechanics energy can be substantially reduced with the use of the former. Definitely, an accurate description of interactions involving sulphur atom is tricky, since it is not dominated by electrostatic interactions: the proper account for dispersion and polarization interactions should also be in place. However, a common approach to deriving force fields parameters is to rely on the parameterization of all non-bonded interactions on top of the already settled parameters for the electrostatic interactions. Thus, the accurate and reliable electrostatic interactions are the prerequisite for the consistent parameterization of other non-bonded interaction contributions. From this perspective, the achieved results are encouraging.

Thorough model comparison
The practical purpose of the study is to choose the anisotropic models for further in-depth study and development. Several criteria should be considered to assess the models. The first one is the qualitative and quantitative accuracy of the reference MEP reproduction. The second is the interpretability of the model. The third is the balance between the model flexibility and its simplicity. Excessively flexible models could lead to overfitting. The fourth is the potential means to account for the asymmetry of substitution of the fragments containing sulphur, leading to additional sources of anisotropy.
The simplest to interpret model is mq, where an unconstrained quadrupole was added to the charge-only model of sulphur. This model introduces two independent parameters -the third quadrupole component is linearly dependent due to the traceless quadrupole definition adopted in our study. Its interpretation is rather straightforward. For example, for thiophene (2) quadrupoles are fitted in such a way that the negative potential is added in x-and z-directions, whereas the positive potential -in y-direction, which is quite consistent with the main residual anisotropy not explained by the atomic charges. This model leads to better improvements for the aromatic molecules #2-3 rather than for the aliphatic ones #4-5.
The most accurate and flexible is the md z q model, which almost completely exhausts the residual error in reference MEP. This model also provides a reasonably uniform quality among both aromatic and aliphatic molecules.
The md z q x model represents the intermediate case between the two described above. It is more accurate than mq, but contains the same number of extra degrees of freedom. As in the mq model, the aromatic molecules are better described.
The results of SVD analysis of the design matrix (A) of the fit to the reference MEP allows for shedding additional light on certain relevant details. The magnitude of singular values (components of the diagonal matrix Σ) reflects the degree of influence of the linear combination of parameters given by singular vectors in parameter space (V): The SVD decomposition for the mq model for the thiophene (2) molecule ( Table 2) can be interpreted as follows. The most influential (the first one, 4.25) collective movement in parameter space is such that the charges of the sulphur atom and both α-carbons and α-hydrogens have one sign, whereas the charges for β-carbons and β-hydrogens have the opposite sign. This shows that perturbing charges (above the optimal fitted values) resulting to the molecule dipole moment changes, lead to the largest MEP errors. The most significant involvement of the components of sulphur quadrupole is observed for the last two non-zero singular values σ 5 = 0.06 and σ 6 = 0.05 given in Table 2. Low singular values mean that variation of quadrupole components does not incur a significant influence on the reproduction of the MEP around the whole molecule. This is consistent with a local influence caused by quadrupoles since their potential decays faster than that of atomic charges (r -3 vs r -1 ). The last singular vector (σ 5 = 0.05) is most straightforward to interpret, since it does not involve significant contributions from atomic charges. It reads as simultaneously increasing Q XX and Q ZZ , whereas decreasing Q YY (or vice versa) leads to the least penalty in MEP reproduction. That means the other movements between the quadrupole components are penalized more in terms of greater MEP error. This is consistent with the above analysis regarding the primordial importance of different signs for Q XX and Q YY components (Q XX < 0 and Q YY > 0). Generally, quadrupole components are rather independent of the other parameters (atomic charges), the only exception being the appreciable contribution of the quadrupoles in the fourth singular vector (σ 4 = 0.23), which represents the variation in adjacent atoms polarization.
The SVD decomposition for the md z q model (Table 3) is generally similar, but has some new features. Again, the leading contribution (the largest singular value) comes from the effective dipole moment variation of the whole molecule. Here again, the sixth singular vector (σ 6 = 0.05 -the same as σ 5 for the mq model) represents coherent x-, z-vs y-axes variations. The dipole moment of the sulphur atom is engaged in many parameter variations. The most significant interplay between the dipole and quadrupole contributions is depicted by the vector corresponding to the last non-zero singular value (σ 7 = 0.03). Thus, despite the introduction of a dipole on sulphur reducing MEP error appreciably, the degree of orthogonality of the dipole component is low when the unrestricted quadrupole is also present. That may lead potentially to overfitting, so care should be taken to handle this issue when deriving the transferable parameters using this model. Finally, the characteristics of the models chosen for further analysis and development are shown in Table 4.

Performance in the force field
The three models (mq, md z q x and md z q) selected as most promising were estimated as an electrostatic part of the GAFF molecular mechanics force fields calculations. Since GAFF explicitly relies on RESP charges derived from RHF/6-31G* MEP, we may reasonably expect that the electrostatic models which reproduce the same level MEP better are at least compatible with the rest of GAFF and at best are capable of better reproduction of the total molecular mechanics energy.

Inter-molecular interactions
For more detailed comparison, the non-electrostatic contribution of the GAFF (equal to the van der Waals interactions) was also calculated and plotted, along with the ab initio reference interaction energies (Figure 7). The orientations chosen to sample intermolecular interaction energy are specifically such that any significant difference in Lewis donor (nitrogen of pyridine) approaching a σ-hole of the sulphur and other directions should be exposed if it is present in reference calculations. One can observe (Figure 7) that the reference calculation actually predicts that the B approach (directly to the sulphur's σ-hole) is the most energetically favourable. The reference interaction energies (in the approximation of frozen monomers) are ca. -0.8, -2.6 and -1.1 kcal/mol for A, B and C orientations, respectively.
Another general observation is that the repulsion at close distances manifests itself at significantly different separations for A, B and C orientations. Equilibrium minima are achieved at 3.7, 3.2 and 3.5 Å, respectively. An even more striking difference is observed for the steepness of the repulsion part of the curve: the zero energy is crossed at 3.2, 2.6 and 3.0 Å, respectively. This well corresponds to the polar flattening effect [31]. Therefore, the reference calculations imply that, for force field calculations to be accurate, they should mimic the  above difference in close distance repulsion. For the classic force field, it means that the van der Waals interactions should be anisotropic. This requirement has been revealed in a similar case of heavy halogen atoms to be able to describe halogen bonding (XB) accurately [12].
In the current study van der Waals interactions ('No elec' legend, Figure 7), on the contrary, reveal the uniform equilibrium distances and zero energy cross separations, the former being 3.5 (with a small variation) and the latter lying in the range 3.0-3.2 Å for all orientations studied. It should be noted here that at short distances the electrostatic interactions cannot override the repulsion part of the van der Waals interaction potentials due to the former changes as r -1 (or r -3 for quadrupole-charge interaction), whereas the latter changes as r -12 . So, this experiment reveals that the accurate account of interaction involving sulphur atoms requires an anisotropic repulsion term in force fields.
The addition of electrostatic contribution due to atomic charges (MK) alters the interaction curves. Different charges used in the study give very close interaction curves (see the Supplementary Materials), so we retained only the MK model here for clarity. As could be expected from the simple picture of the interaction of two Lewis isotropic donors (sulphur and nitrogen) for orientations A and C, the equilibrium minima are shifted by 0.4 and 1.3 kcal/mol, respectively. Surprisingly, for orientation B, no appreciable alteration occurs, perhaps due to the cancellation of electrostatic effects from several surrounding atoms. The final total energies with MK charges are -0.4, -1.0 and -0.2 kcal/mol for A, B and C, respectively.
All anisotropic models studied predictably added to differentiation of energies between A/C vs B directions above the MK curves. Their predictions are generally very consistent. A distinguishable difference in the behaviour of the models is observed for approach A. The mq model performs in a most aggressive manner, leading to the interaction energies in A, B, C series to be ca. -0.2, -1.4, 0.0 kcal/mol. The origin of difference is the precision of the mq model which reproduces a relative depletion of negative MEP along the z-axis less accurate than the other anisotropic models, due to its relative simplicity. This results in more negative electrostatic potential precisely in the direction of approaching of the stronger Lewis donor (the nitrogen atom of pyridine) in this case.
The main goal of this part is to compare what the anisotropic models could add above the current de facto standard description -RHF/6-31G(d) MEP derived charges, using the most accurate option playing for charge-only models -MK charges. Thus, one should compare results obtained by combining non-electrostatic GAFF part with MK charges and with the anisotropic models. This comparison reveals that the anisotropic models consistently favour σ-hole direction for interaction with donor, as judged by A-B and C-B minimum energy differences: 1.8 and 1.5 kcal/mol for reference; 0.6 and 0.8 kcal/mol for MK; and 1.0-1.2 and 1.3 kcal/mol for the anisotropic models. In the case of anisotropic van der Waals interactions, the sigma-hole direction -B orientation -would be even more preferred since, in that case, van der Waals repulsion starts overcoming attractive electrostatic interactions at closer distance, where gain from the anisotropic electrostatic interaction is higher.
Thus, the intermolecular interaction study reveals that: (a) GAFF interaction energies are significantly different from the reference energy obtained, as suggested by GAFF paper [25] for the system under consideration, (b) the van der Waals repulsion in force fields should be anisotropic in order to accurately describe the interactions involved, (c) the use of isotropic atomic charge models predictably offsets the equilibrium energies making them less favourable and (d) the anisotropic models predict larger energetic difference for different approaches, as should be expected.

Intra-molecular interaction
The main idea was to evaluate how well the conformational preferences, potentially affected by sulphur σ-hole interactions, are described for a part of the molecular systems reported in the review on sulphur bonding [15].
The set-up was similar to the one used in the above study of intermolecular interactions. The major difference is that for intermolecular interactions frozen monomers approximation has been used to rule out the force field bonding interactions. In an intramolecular interaction study that is not applicable, hence the force field bonding interactions (bond, angle and dihedral) are all turned on for all conformers, along with the van der Waals and electrostatic parts. Another point is that the conformers have been obtained by partially constrained optimization (for a torsional angle to have a defined value) at MP2/6-31G*//MP4(SDTQ)/6-311G(d,p) level, as reported in the GAFF paper [25] for deriving torsional parameters. One of the consequences of it, however, is that each conformer thus generated has different values for not only torsional angles, but also for the valence angles and even for the bond lengths. Consequently, each conformer total energy has different contributions from these energy terms, which somewhat complicates the analysis.
Finally, 'No elec' values, comprised of all the GAFF energy terms, except for the electrostatic ones, were also calculated and reported for comparison.
For each equilibrium studied (Figure 8), two conformers have been generated to represent the conformations by setting torsional constraints to 0 and 180° (more details in Methods). The differences in total energies of the right and the left conformers (Figure 8) comprise the relative conformational preference energies and are shown graphically in Figure 9.
One can see that generally the reference energies are not well described by the force field. For #8 and #10, the signs of energy difference are correctly predicted by nearly all models, whereas for the remaining three molecules this is not the case.
It is interesting to note that the 'No elec' contribution for all equilibria has the opposite sign in comparison to the reference energy, except for a single case of equilibrium for molecule #8.
To better understand the root of poor non-electrostatic force field energy, the component-wise GAFF force field energy differences were examined (see Table 1S in Supplementary Materials). For most systems, the conformational energy difference is dominated by terms with apparently hard degrees of freedom due to bond length and valence angle changes. The origin of these terms lies in the procedure adopted by us and used to derive molecular mechanics force field energies using the geometries obtained from ab initio partially constrained (by the desired torsional angle value) optimization. Thus, the bond length and valence angle slightly differ from the optimal set in GAFF. As long as those degrees of freedom are firmly penalized within the force fields, even slight differences can produce appreciable values of energy. Vice versa, if the molecules were optimized in the force field, penalties from these hard terms would quickly relax, unless the system is sterically hindered. The latter is not generally the case.
Thus, we decided to exclude the influence of those components (bond and angle terms) since they would cancel quickly if the systems were allowed to relax. The total energy difference for the conformers was adjusted as though those terms make zero contribution (E rel value in Table 1S in Supplementary Materials). A new set of conformational energy differences was derived using this adjusted value and the set of the electrostatic models studied (Figure 10).
Equilibrium #6 (Figure 8) represents the case where the possibility of one unconventional and weak C-H‧‧‧N hydrogen bond is effectively compared to possible sulphur σ-hole-nitrogen interaction, which we aim to model better. The charge models (MK and RESP) expectedly predict the reverse conformational preference due to the electrostatic repulsion of the two Lewis donors (S and N), both having negative charges in the charge-only models. The anisotropic models do not exhibit such behaviour. Models md z q and md z q x slightly offset the  non-electrostatic contribution to the correct direction. However, model mq adds ca. 1 kcal/ mol to the wrong direction, which is still far less than the offset introduced by the atomic charge-only models. Molecule #7 represents a tough case which compares directly two interactions: 'sulphur as chalcogen bond donor-carbonyl oxygen' and 'sulphur as hydrogen bond acceptor-polar hydrogen of amide' . Both interactions are not very strong, at least they should be less than the conventional hydrogen bond in energy. In this case, the reference calculation also predicts sulphur σ-hole interaction as slightly more favourable. The charge-only models (MK and RESP) predictably display the error, since the underlying approach effectively compares repulsive interactions of two Lewis donors, on one side, and a weak hydrogen bond on the other -both favour the right conformer (as in Figure 8). The anisotropic models consistently predict the correct sign of the energy difference and comparable absolute value, although the correspondence is not perfect. Model mq is distinguished by predicting the smallest energetic preference.
Molecule #8 represents a case similar to molecule #7, with an exception that the former represents the so-called 1-5 interactions, whereas the latter demonstrates the 1-4 ones. This implies somewhat different angles at which a Lewis donor approaches the sulphur atom. In 1-5 interactions this angle is closer to 180°, required for optimal σ-hole interaction. In accordance with the above reasoning, the reference energy preference is the most severe among the whole set of molecules, reaching 5 kcal/mol. The charge-only models are unable to reproduce this difference. However, the anisotropic models are quite capable of predicting both the correct sign and significant magnitude of the difference, the mq model giving the most conservative estimation.
The next two cases -molecules #9 and #10 -are the systems where the adjusted non-electrostatic force field energy (E rel ) significantly differs from zero, mainly due to the uncompensated van der Waals interactions. This also applies to molecule #8, but the van der Waals penalty here appeared to be compensated by non-zero residual dihedral energy. The force Figure 10. Estimated energy differences between conformations investigated using the adjusted nonelectrostatic force field energy. field energy breakdown reveals that for all these three molecules the left conformer is significantly more penalized by the van der Waals repulsion. This repulsion could most probably arise from the geometrically closer donor-donor distances and the deficiency of the isotropic van der Waals interaction for sulphur, which is not quite accurate, as was discussed in the Intermolecular interactions section. All the molecules #8-10 involve 1-5 interactions, with the oxygen/nitrogen Lewis donors approaching very close to the ideal direction for σ-hole sulphur complexes formation. This direction should be more depleted with electronic density; therefore, less repulsion should be perceived by the donor in this direction compared to the other directions. If we assume this deficiency of repulsion part of the force field potential is fixed, the E rel values should almost completely cancel out. The interpretation of the results for molecules #9 and #10 alters in such circumstances. The charge-only models (MK and RESP) still predict the wrong sign of the energy preference for #9. All the anisotropic models (after offsetting by E rel ) predict the correct sign and reasonable magnitude, comparable to the reference value. The closest values then would be provided by the mq model.
Molecule #10 differs from the other molecules studied above by the presence of a non-aromatic sulphur atom, which appears to be more electron rich generally. The latter can be judged by inspecting the monopole (atomic charge) values fitted in different models for molecules #2 and #4 (see Multipole values fitted to MEP for all molecules in the Supplementary Materials). In the former case, the monopole values are all positive, although not large in absolute values. On the contrary, most monopoles, especially in lean models, are negative for the latter molecule. This is in accordance with the reference MEP inspection (Figure 11). The aliphatic sulphur atom of #4 exerts far more negative electrostatic potential in the area of the potential lone pair. This more electron rich nature of the aliphatic sulphur explains the preference for the classical hydrogen bond (although rather weak) and the repulsion interactions (perhaps somewhat less for the direction of the σ-hole) with the oxygen Lewis donor. After applying E rel offset, the charge-only models correctly predict the more favourable conformer, but significantly exaggerate the energy difference. The anisotropic models give similar sign and value results as do the charge models. Model mq predicts a considerably smaller preference for the classical hydrogen bonded conformer.
Summing up, this intramolecular study reveals that: (a) the isotropic charge-only models are not capable of correct prediction of the conformational preference involving 'aromatic sulphur-Lewis donor' interactions, (b) the anisotropic models make predictions more consistent qualitatively and quantitatively with the reference values and (c) the molecular mechanics force fields should be reparameterized more accurately for the van der Waals parameters. Most probably, the van der Waals interaction anisotropy is necessary to obtain accurate predictions.

Future directions
The multipole anisotropic models for sulphur have two main avenues for further development and use. The first one relates to directly applying the models as a source of special descriptors to QSPR/QSAR models. To this end, the following set-up could be applied: (1) for each molecule of a set studied perform an ab initio calculation of the electrostatic potential (MEP); then (2) derive multipole values from the MEP by one of the methods proposed above; (3) optionally, take atomic charges from the other (non-S) atoms and, perhaps, derive usual descriptors from them; (4a) use the multipole values as separate descriptors, e.g. adding columns d z , Q XX , etc. to the regression model. This effectively assumes the molecules of the set are either aligned similarly with their sulphur atoms within a hypothetical site or alignment being not relevant -as for e.g. QSPR studies; and (4b) a finer and more consistent approach is to use the MEP values around the S atom, derived from the multipoles (including charges from the neighbouring atoms), as descriptors -e.g. maximum and minimum values of the MEP may well correlate to binding, as has been published many times for hydrogen and halogen bonding studies.
The second avenue is devoted to consistent building of the more accurate description of molecular interactions via force fields -i.e. aimed at their systematic improvement. The main question here is how to better delineate and parameterize different force field contributions. The electrostatic contribution, which, according to the perturbation theory, is a Coulomb interaction of the unperturbed charge densities of the interacting species, lays the ground and the correct long-range limits for interactions as applied to drug-like molecules. It is physically sound to derive the other contributions (such as polarization, dispersion and even torsional terms) above the well-established electrostatics. In fact, such an approach to derivation of parameters has been the general practice for the current generation of the force fields (e.g. MMFF94 [32] and GAFF [25]). It is evident, however, that, in order to derive more accurate force fields, one should start from the more accurate base -the electrostatic interactions -and then proceed systematically to other contributions, choosing the approximation levels and the desired accuracy for them as necessary.
In this perspective, within the study we provide a more accurate building block for anisotropic electrostatic interactions involving divalent sulphur atoms. The next step is to check variability and transferability of multipole parameters on a broad set of sulphur containing molecules. An interesting fact has been revealed during the study, aimed at correct Figure 11. comparison of the reference mEP for molecules #2 (thiophene, left) and #4 (dimethylsulphide, right). electrostatic description of halogen bonding (XB), of multipole parameters derived for heavy halogen (Cl, Br, I) atoms [11]. Despite the electrostatic potential around halogen atoms varying significantly with the variation of donor and acceptor substituents in the aromatic core, the quadrupole values varied only negligibly. Conceptually this complies fully to the theoretical view on σ-hole anisotropy origin as a result of a perturbation (in terms of perturbation theory) involving hypothetically placing the free atom of halogen into its actual atom-in-molecule environment. This environment effectively polarizes the atom -a significant portion of the valence electron density shifts to the region between the halogen and the nearest atom, i.e. on the chemical bond. It is reasonable to assume that such perturbation caused by the close proximity of the bonded atom is much higher in energy than the perturbations caused by more distant substituents of the molecule and proximity of different molecules. Quadrupole moment with the main axis oriented along the C-Hal axis effectively represents such "fixed" internal halogen atom polarization and, thus, does not depend much on aromatic substituents. Clearly a more robust and correct description of intra-and inter-molecule polarization is possible on top of the electrostatic description already including the above higher order internal polarization effects.
Operationally, the notion of intrinsically polarized atoms brings the opportunity to build an empirical model to derive all electrostatic parameters for molecules in such a way that the intrinsic atom-in-molecule anisotropy is described by nearly constant parameters, whereas the remaining description is done via proper description of atomic charges, based on e.g. electronegativity relaxation principle [33,34]. Such a scheme appeared to be well capable of reproducing the variation in the MEP caused by varying substituents [34].
Finally, a RHF/6-31G(d) level for MEP calculation is chosen in our study to ensure the compatibility with GAFF by design and, thus, make the models pluggable as the electrostatic part of this force field. This implies MK, RESP or AM1-BCC charges for other species as usual for GAFF. However, the anisotropic schemes themselves do not depend qualitatively on this level of quantum chemical calculation, but rather reflect the intrinsic anisotropy of the electrostatic potential of atom-in-molecule sulphur. Thus, multipole axes and relative magnitudes of the multipole sulphur models should be qualitatively transferable. As such, the models proposed are potentially mergeable with other charge calculation methods, provided those methods admit a sensible way to calibrate the magnitudes of the multipole components to harmonize them with the magnitudes of the atomic charges for the other atoms.

Conclusions
Several anisotropic electrostatic models for sulphur have been derived in the study and their performance was first evaluated as the achieved quality of the description of the reference MEP. In contrast to the electrostatic description of heavy halogens in XB, the electrostatic description of sulphur proved to be not as straightforward. This corresponds well with the generally accepted theory and empirical medicinal chemistry practice, where sulphur is considered mainly as not very polar, but rather a hydrophobic and polarizable contributor. Nonetheless, a few reasonably performing models have been identified and explained.
The models chosen were applied as an electrostatic part of the GAFF force field in estimating inter-and intra-molecular interactions. Anisotropic models resulted in a better agreement with the theoretical expectations than did the classic atomic charge approach with different charges. Thus, the approach of atom-centred multipoles lays the ground for a more accurate description of species having intrinsically anisotropic MEP around them.
However, the discrepancy between the force field and the reference potentials is high and, in some cases, is of qualitative nature. This fact emphasises the importance of further refinement of the force fields, not only in terms of inclusion of 'second order' anisotropic electrostatic effects, but rather in terms of general uniform quality of 'first order' interaction. We realize that this implies a lot of complex and scrupulous work to be done. However, we believe the outcome is warranted, since it will most probably settle the new level of molecular modelling quality and a more reasonable and fruitful use of computers and expert resources will be put into drug discovery.