Are hot-spots occluded from water?

Protein–protein interactions are the basis of many biological processes and are governed by focused regions with high binding affinities, the warm- and hot-spots. It was proposed that these regions are surrounded by areas with higher packing density leading to solvent exclusion around them – “the O-ring theory.” This important inference still lacks sufficient demonstration. We have used Molecular Dynamics (MD) simulations to investigate the validity of the O-ring theory in the context of the conformational flexibility of the proteins, which is critical for function, in general, and for interaction with water, in particular. The MD results were analyzed for a variety of solvent-accessible surface area (SASA) features, radial distribution functions (RDFs), protein–water distances, and water residence times. The measurement of the average solvent-accessible surface area features for the warm- and hot-spots and the null-spots, as well as data for corresponding RDFs, identify distinct properties for these two sets of residues. Warm- and hot-spots are found to be occluded from the solvent. However, it has to be borne in mind that water-mediated interactions have significant power to construct an extensive and strongly bonded interface. We observed that warm- and hot-spots tend to form hydrogen bond (H-bond) networks with water molecules that have an occupancy around 90%. This study provides strong evidence in support of the O-ring theory and the results show that hot-spots are indeed protected from the bulk solvent. Nevertheless, the warm- and hot-spots still make water-mediated contacts, which are also important for protein–protein binding.


Introduction
Protein-protein interactions (PPI) are the basis of many biological processes ranging from signal transduction and enzymatic regulation to the adhesion of cells. The great importance of these interactions has made protein-protein binding an area of significant interest in the pharmaceutical industry. PPI are governed by focused regions with high binding affinities, the hot-spots (Clackson, Ultsch, Wells, & de Vos, 1998; DeLano, Ultsch, de Vos, & Wells, 2000). The characterization of these structural key points is crucial to the development of means to inhibit protein-protein association, a therapeutic modality proposed for many diseases. Hot-spots were defined as residues whose mutation to alanine destabilizes the bound state ensemble relative to the unbound one (DeLano, 2002), generating a binding free energy difference (ΔΔG binding) higher than 4.0 kcal/mol. Warm-spots, also important for binding, generate upon alanine mutation a binding free energy difference between 2.0 and 4.0 kcal/mol; null-spots are residues whose mutation causes a binding free energy difference of <2.0 kcal/mol (Moreira, Fernandes, & Ramos, 2007e). Hot- and warm-spots are compact, centralized regions, and mutational studies have shown that the tolerance for substitution is clearly lower for these residues than for others at the protein surface (Moreira et al., 2007). Experimental and computational methods have been widely used to identify these types of sites on protein surfaces (Huo, Massova, & Kollman, 2002; Kollman et al., 2000; Kortemme, Kim, & Baker, 2004; Massova & Kollman, 1999; Moreira, Fernandes, & Ramos, 2006a, 2006b; Moreira et al., 2007a; Moreira et al., 2007b; Moreira et al., 2007c; Moreira et al., 2007d; Moreira et al., 2007e; Moreira et al., 2007f; Moreira, Fernandes, & Ramos, 2008; Martins, Ramos, & Moreira, 2013).
Results showed that the hot-spot of one monomer usually packs against the hot-spot of another, thereby establishing a region that is determinant for complex binding, and thus, may be a target site for drug discovery. The number of such hot-spots within densely packed regions is correlated essentially with the interface size, and the local organization of these hot-spots is a critical factor in stabilizing protein-protein interactions. Notably, however, Chothia and Janin have proposed a very different theory: the Buried Surface model (BSA) (Chothia & Janin, 1975;Janin & Chothia, 1990;Janin, 2009) that states that the interaction is distributed more or less evenly over the surface of proteins (Chakrabarti & Janin, 2002). Yet, these two apparently opposed theories can be reconciled. Closer inspection shows that protein-protein interfaces can be split into a core and a rim region, according to the solvent accessibility of interface atoms. While the rim is very similar to the protein's surface and displays very few hot-spots, the core contains the buried interface atoms and a larger number of hot-spots (Guharoy & Chakrabarti, 2005;Guharoy, Pal, Dasgupta, & Chakrabarti, 2011). Indeed, Guharoy and Chakrabarti clearly demonstrated a correlation between the ΔΔG binding and the contribution to the BSA of the buried residues of the core, a relationship that is not discernible for the rim residues (Guharoy & Chakrabarti, 2005;Guharoy et al., 2011).
It had been proposed earlier that hot-spots are surrounded by regions with higher packing density leading to solvent exclusion around them, which results in a lower local dielectric constant and enhancement of specific electrostatic and hydrogen bond (H-bond) interactions. As these structures resembled an O-ring (see Figure 1), the idea became known as the O-ring theory, or the 'Water Exclusion' hypothesis (Bogan & Thorn, 1998). This theory led to the inference that warm- and hot-spots tend to have low solvent-accessible surface area (SASA) values. Bogan and Thorn (1998) also speculated that warm- and hot-spots tend to be surrounded by energetically less important residues that would function to occlude bulk solvent from them. Other authors have already shown that this hypothesis is not universal. Kosloff, Travis, Bosch, Siderovski, and Arshavsky (2011) have shown for the RGS-domain-G protein interface that peripheral residues protect the hot-spots from the solvent and have essential roles in recognition, by optimizing the protein-protein specificity and activation (Kosloff et al., 2011). Li and colleagues also proposed a 'double water exclusion' theory in which they suggest that a ring of residues surrounds the warm- and hot-spots to prevent water invasion after binding and that these residues themselves are water free (Li & Liu, 2009). Rajamani et al. have also proposed that there are some residues at protein-protein interfaces that upon complex formation yield the largest decrease in SASA (Rajamani, Thiel, Vajda, & Camacho, 2004). These anchor residues were demonstrated to be functionally important (hot-spots) or kinetically important (Meireles, Domling, & Camacho, 2010; Rajamani et al., 2004).
Bogan and Thorn's (1998) original idea, the O-ring theory, has been used by different authors to develop new computational alanine scanning mutagenesis methods (Guharoy & Chakrabarti, 2005, 2010; Guharoy et al., 2011; Guney, Tuncbag, Keskin, & Gursoy, 2007; Li & Li, 2010; Liu & Li, 2010; Li, Wong, & Li, 2011; Tuncbag & Keskin, 2009; Xia, Zhao, Song, & Huang, 2010; Zhu & Mitchell, 2011), but the theory still lacks adequate experimental support.
It is now accepted that the biological function of protein macromolecular systems depends on a delicate interchange between the molecule itself and its environment (Ball, 2008). Water plays a key role in understanding the interplay between structure and function, which is a central goal in protein science (Levy & Onuchic, 2006). It is imperative to understand the role of water around the energetically important residues. Are these residues really occluded from solvent as stated by the O-ring theory? Are the warm- and hot-spots themselves water free as stated by the 'double water exclusion' theory (Li & Liu, 2009)? As proteins are dynamic polymers and their conformational flexibility is critical for function, particularly for interactions with other molecules (Chuang et al., 2010), the correct analysis of PPI requires dynamical information. Thus, in order to provide a long-needed answer to the questions raised above, we characterized different SASA features for a hot-spot database of various complexes subjected to explicit water MD simulation. We have also measured the RDF of water around these residues, the water occupancy, and protein-water distances. This allowed us to provide strong structural evidence for the O-ring theory.

System setup
Nine different complexes, comprising a total of 160 mutations, were studied. Complexes were chosen based on their different interfacial characteristics (i.e. size, hydrophobicity, and amino-acid composition) in order to have a meaningful database. A complete list is given in Table 1. The protonation states of the different residues of the various proteins were determined using the PDB2PQR server at http://kryptonite.nbcr.net/pdb2pqr/ (Dolinsky, Nielsen, McCammon, & Baker, 2004) with the PROPKA methodology (Bas, Rogers, & Jensen, 2008; Li, Robertson, & Jensen, 2005; Olsson, Sondergaard, Rostkowski, & Jensen, 2011). The binding free energy difference values upon alanine mutation were taken from the ASEdb (Alanine Scanning Energetics database) (Bogan & Thorn, 1998; Thorn & Bogan, 2001) and from the references in Table 1.

Molecular dynamics simulations
The MD simulations were performed using the AMBER9 package (Case et al., 2004) with the Cornell force field (Cornell et al., 1995). Table 1 summarizes the composition of a variety of systems under study. The complexes were solvated by explicit TIP3P water molecules that extended 10 Å from any edge of the box to the protein atoms; the whole system can, therefore, be replicated in three dimensions and treated with periodic boundary conditions. Counter ions were added to the box to neutralize the systems. In each of the simulations, the system was initially energy minimized to remove bad contacts by steepest descent followed by conjugate gradient algorithms. The systems were then subjected to 2 ns of heating procedure (in NVT ensemble) in which the temperature was gradually raised to 300 K, followed by 6 ns runs in NPT ensemble. The Langevin thermostat (Jackson, Gabb, & Sternberg, 1998;Loncharich, Brooks, & Pastor, 1992) was used and the electrostatic interactions were calculated by using the particle mesh Ewald (PME) method (Darden, York, & Pedersen, 1993). Bond lengths involving hydrogen atoms were constrained to their equilibrium values using the SHAKE algorithm (Ryckaert, Ciccotti, & Berendsen, 1977). The equations of motion were integrated with a 2 fs time-step and the nonbonded interactions were truncated with a 10 Å cutoff.

Analysis
VMD (Humphrey, Dalke, & Schulten, 1996) and the PTRAJ module from the AMBER9 package (Case et al., 2004) were used in the different analyses of the MD simulations. RMSDs (root-mean-square-deviations) were calculated for each simulation to ensure equilibration; for the nine complexes they ranged between 2.0 and 3.5 Å. Different SASA calculations were made to evaluate the importance of water molecules in the warm- and hot-spot microenvironment. The SASA of each interfacial residue within the complex (SASAcpx) and within the separated monomers (SASAmon) was averaged over the last 3 ns of the explicit MD simulation. SASA, as defined by Lee and Richards, is the area of the surface traced by the center of a probe sphere, whose radius is the nominal radius of the solvent, as it rolls over the van der Waals surface of the molecule (Lee & Richards, 1971). ΔSASA and SASArel, defined by Equations (1) and (2), respectively, were also calculated. SASArel allows the differentiation of residues with equal ΔSASA but different solvent exposure. For example, a residue with a solvent accessibility of 50 Å² in the monomer and 0 Å² in the complex and a residue with a solvent accessibility of 150 Å² in the monomer and 100 Å² in the complex are two very distinct situations. In both cases, ΔSASA is 50 Å², but the importance of solvent accessibility is strikingly different between the two. SASArel was already shown to be important in a previous work (Cho, Kim, & Lee, 2009).
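The worked example above can be reproduced with a short Python sketch. Equations (1) and (2) are not reproduced in this text, so the definitions below are assumptions: ΔSASA is taken as the monomer SASA minus the complex SASA (consistent with the 50 Å² values quoted), and SASArel as ΔSASA divided by the monomer SASA. The function name `sasa_features` is hypothetical.

```python
def sasa_features(sasa_mon, sasa_cpx):
    """Return (delta_sasa, sasa_rel) for one residue, areas in square angstroms.

    Assumed definitions (not confirmed by the text):
      delta_sasa = sasa_mon - sasa_cpx
      sasa_rel   = delta_sasa / sasa_mon
    """
    delta_sasa = sasa_mon - sasa_cpx
    # A residue with no exposed area in the monomer has nothing to bury.
    sasa_rel = delta_sasa / sasa_mon if sasa_mon > 0 else 0.0
    return delta_sasa, sasa_rel


# The two residues from the example: equal delta_sasa, different sasa_rel.
fully_buried = sasa_features(50.0, 0.0)
partly_buried = sasa_features(150.0, 100.0)
print(fully_buried, partly_buried)
```

Under these assumed definitions, the first residue loses all of its exposed area on binding while the second loses only a third, which is the distinction that ΔSASA alone cannot make.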
All four SASA features were analyzed for the 160 residues in the nine explicit water MD simulations. We have also analyzed the RDF, g(r), and the average number of waters within a given distance of all interfacial residues. g(r) gives the probability of finding an atom within a spherical shell of thickness Δr and central distance r from another atom, relative to the probability expected for a bulk solvent distribution at the same density. It was calculated by compiling a histogram with a spacing of Δr = 0.02 Å and a range of 8 Å. The occupancy of waters in the first coordination shell was calculated with in-house Python scripts, which take advantage of the ptraj module of the AMBER package. Tailor-made scripts were also used to evaluate the microenvironment surrounding each interfacial amino acid residue. Geometric criteria (donor-acceptor distance = 3.0 Å, angle cutoff 20°) were adopted to define the formation of an H-bond during the MD simulation. We have used the PISA webserver (Krissinel & Henrick, 2007; Krissinel, 2010) from the European Bioinformatics Institute to calculate the interface area composition. This software defines the interface as the protein surface area that becomes inaccessible to solvent upon binding, and so it is calculated as the difference in total accessible-surface areas of the isolated and interacting structures divided by two.
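The interface definition just described can be written compactly as follows. The symbols are illustrative assumptions rather than PISA's own notation: ASA_A and ASA_B denote the total accessible surface areas of the two isolated structures, and ASA_AB that of the interacting complex.

```latex
A_{\mathrm{interface}} = \frac{\mathrm{ASA}_{A} + \mathrm{ASA}_{B} - \mathrm{ASA}_{AB}}{2}
```

The division by two reflects that the buried area is shared between the two partners, so the value reported is the interface area per partner.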

Results and discussion
It is now accepted that only a few residues at each PPI, named the warm- and hot-spots, are responsible for the association driving force. The O-ring theory states that these residues tend to be clustered in the center of the interface, surrounded by null-spots that presumably occlude bulk solvent. To understand this phenomenon, we have carried out explicit water MD simulations of several known protein-protein complexes for which experimental ΔΔG binding values are available.

Molecular dynamics simulations
Proteins are not static structures; instead, they have an inherent flexibility that leads to a succession of structural changes that carve out a specific pathway. Fully atomistic MD simulations offer a unique opportunity to understand the structural and functional role of water around the warm- and hot-spots. MD stability was ensured by measuring the root-mean-square-deviation (RMSD) values of the backbone of all the residues of the system, of the different monomers, and of the interface residues under study. Figure 2 shows a representative result. The MD simulations are very stable and present low RMSD values, especially for the residues at the protein-protein interfaces (around 1 Å).

Database analysis
Our database is described in the methodological section in Table 1. To better understand the chemical composition of the PPI, we plotted in Figure 3 the percentage of each type of amino acid residue in the protein, in the interface, and as a warm- or hot-spot. The warm- and hot-spots at the PPIs in our database are enriched in Tyr (25%), Asp (16%), Leu (9%), and Glu (9%). In addition, Phe (7% vs. 3%), Trp (7% vs. 3%), and Met (5% vs. 2%) residues have higher percentages as energetic determinant residues than as PPI residues (% as hot-spots vs. % as PPI residues). So, as reported by Bogan and Thorn (1998), tyrosine is over-represented as an energetically crucial residue, and valine and threonine are under-represented. Arg is abundant as a warm- and hot-spot but does not exceed the expected level at the PPIs (5% vs. 7%). Cys, Pro, Ala, and Gly do not appear as warm- or hot-spots for reasons related to alanine scanning mutagenesis (ASM) itself or to structural motifs. More concretely, Pro, unlike other residues, can exist in a cis-configuration that translates into a very different backbone conformation, and, therefore, Pro-Ala mutations are disruptive and can produce abnormal changes. Ala is excluded for obvious reasons, and Gly and Cys are not usually mutated in alanine scans as their mutation leads to structural disruption. Our database has 44 warm- and hot-spots that correspond to 25% of the protein-protein interfacial residues. Tyr and Asp residues are widely present in almost all the interfaces. Glu and Trp are present in smaller percentages, and the rest of the warm- and hot-spots are characteristic of the specific protein-protein systems under study.

Solvent-accessible surface area
The O-ring theory states that occlusion of an amino acid from the bulk solvent plays a decisive role in determining the energetic contribution of that residue to protein-protein binding (Bogan & Thorn, 1998). Later, Guharoy and Chakrabarti (2005) illustrated that there is a correlation between energy change and decrease in the SASA of individual residues as a consequence of protein-protein binding. Solvent accessibility was also combined with conservation in an empirical formula to identify warm- and hot-spots computationally (Guney et al., 2007). We have supported the idea that hot-spots are protected from solvent by a rim region (Moreira et al., 2007e). Li, Keskin, Ma, Nussinov, and Liang (2004) suggested that warm- and hot-spots are found either on the complemented pockets or on the protruding surfaces to protect each other from the solvent. Various authors have previously focused on solvent accessibility to discriminate warm- and hot-spot residues, but a clear understanding is still far from being achieved. In order to evaluate the relationship between solvent accessibility and ΔΔG binding, we measured four different SASA features: SASAcpx, SASAmon, ΔSASA, and SASArel for the interfacial residues for which experimental data are available. The most used descriptors in the literature are SASAcpx and ΔSASA. [...] residue to be a warm- or hot-spot residue. However, these features are not by themselves effective in distinguishing between slightly buried residues, a large part of which are null-spots, and deeply buried residues that are very likely to be hot-spot residues. In fact, as hydrophilicity/hydrophobicity is a key physical property of a protein surface, and one of the most relevant for PPI, we have also investigated whether the nature of the residue has any influence on the qualitative nature of the relationship between SASA features and the ΔΔG binding values.
Figure 4 presents the average values of the four SASA features for all the amino acid residues in the database as well as for the negatively charged (Asp and Glu), positively charged (His, Lys and Arg), charged (Asp, Glu, Lys, Arg, His), polar (Ser, Thr, Asn, Gln, Tyr), nonpolar (Val, Ile, Leu, Met, Phe, Trp), and aromatic (Phe, Trp, Tyr, His) residues. Ala, Gly, Cys, and Pro were not considered for the reasons mentioned above. There is a striking difference in the average SASA feature values between the warm- and hot-spots and the null-spots, except in the case of SASAmon. It is clear that the warm- and hot-spots are more buried upon complexation than the null-spots, as SASAcpx, ΔSASA, and SASArel are markedly different for these two groups. The values for the null-spots are almost half of those for the warm- and hot-spots. This difference is even more pronounced for the positively charged residues. Although the standard deviations reach around 38% of the measured values, which highlights the difficulty in using SASA features as the sole criterion for hot-spot detection, the data clearly demonstrate that the energetically important residues are more occluded from solvent.

Radial distribution function
Although warm- and hot-spots are more deeply buried, it cannot be forgotten that water plays a pivotal role in determining the structure and dynamics of most biological systems: it establishes specific interactions, efficiently screens coulombic interactions, mediates proton transfer, and can even act as a structural component in protein secondary structure (Papoian, Ulander, & Wolynes, 2003; Rodier, Bahadur, Chakrabarti, & Janin, 2005). In this paper, we present an overview of the dynamical behavior of water within protein-protein interfaces. The water structure can be depicted in terms of an RDF, which is related to the liquid density ρ(r) around a particle: ρ(r) = ρb g(r), where ρb is the bulk density (Ball, 2008). The RDFs represent the probability density of finding any solvent molecule whose oxygen is at distance r from the specific solute atom. Usually, the RDF of water presents an oscillatory profile and a peak due to the presence of H-bonds. The RDF profiles were measured for the residues in our database. As an example, Figure 5 shows two RDF profiles, one for a representative hot-spot and another for a null-spot. They present very different behaviors. The peak around 2.5 Å seen in Figure 5(c) arises from the strong interaction between the water hydrogen atoms and the oxygen atoms of the carbonyl group of the null-spot. The following peaks, whose location varies according to the system considered, are due to the interaction between water molecules and the nonhydrogen atoms of the amino acid residue. These peaks are less defined for the warm- and hot-spots. Panels (b) and (d) show that the typical numbers of waters around warm- and hot-spots and around null-spots are also distinct. These values, measured at 3, 4, 5, and 6 Å for the two groups, are depicted in Figure 6. The average number of water molecules around warm- and hot-spots is lower than half of the number of water molecules in the null-spot microenvironment. These values are even lower around the aromatic and nonpolar residues than around the charged residues. Figure SI-1 shows the average values by type of residue for the warm- and hot-spots as well as for the null-spots. Gln, His, Tyr, Trp, Phe, and Leu, when acting as key energetic determinants, have fewer than two waters around them at a distance of 5 Å. When the same residues act as null-spots, the number of surrounding water molecules ranges on average from 1.48 to 8.86. The number of waters triples when charged residues act as null-spots.
Figure 3. The percentage of each type of amino acids in our database of nine complexes for all the proteins (number of residues of type X/number of residues at the database), protein-protein interfaces (number of residues of type X at interface/number of residues at the interface of all the complexes at the database), and as warm- and hot-spots (number of residues of type X at interface/number of residues that act as warm- and hot-spots at the interface of all the complexes at the database).
The O-ring theory was proposed upon a statistical study of the SASAcpx and ΔSASA values for static structures. The results presented here, ranging from the SASA calculations to the RDF calculations, take into account the dynamics of protein-protein systems and clearly show that the average number of water molecules in the microenvironment around warm- and hot-spots is much lower than for the null-spots. This study provides strong evidence that the warm- and hot-spots are indeed protected from solvent, as predicted by the O-ring theory.
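The two quantities used in this section, g(r) and the average number of waters within a cutoff, can be sketched in plain Python. This is a minimal illustration and not the ptraj workflow used in the study: it assumes the distances from one solute atom to the water oxygens have already been extracted for each frame, and the function names and the approximate bulk number density of water (about 0.0334 molecules per Å³) are assumptions introduced here.

```python
import math


def radial_distribution(frames, rho_bulk, dr=0.02, r_max=8.0):
    """Histogram estimate of g(r) around one solute atom.

    frames   : list of per-frame lists of solute-to-water-oxygen distances (angstrom)
    rho_bulk : bulk number density of water (molecules per cubic angstrom)
    """
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins
    for distances in frames:
        for d in distances:
            idx = int(d / dr)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    centers, g = [], []
    for i, count in enumerate(counts):
        r_lo, r_hi = i * dr, (i + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Normalise by the count expected in this shell for bulk solvent.
        expected = len(frames) * rho_bulk * shell_volume
        centers.append((i + 0.5) * dr)
        g.append(count / expected)
    return centers, g


def mean_waters_within(frames, cutoff):
    """Average number of waters per frame closer than `cutoff` (angstrom)."""
    total = sum(sum(1 for d in distances if d <= cutoff) for distances in frames)
    return total / len(frames)


# Two toy frames: 2 waters within 5 angstrom in the first, 1 in the second.
print(mean_waters_within([[2.0, 3.5, 7.0], [2.5, 9.0]], 5.0))
```

For an unstructured, bulk-like distribution this estimate fluctuates around 1 at all distances; a buried residue would instead show depleted values at short range and a lower count within each cutoff.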

Water's occupancy
Water-mediated contacts may provide an additional layer of recognition in protein binding. These can be highly specific and complement direct contacts in the task of recognizing a particular binding partner out of many competing proteins (Rodier et al., 2005). Interfacial waters are often too disordered to be resolved crystallographically. Therefore, explicit MD simulations can be used to describe the structural and dynamical organization of the H-bonds formed between the solvent and the protein and their implications for the dynamical properties of the macromolecular systems. Table 2 shows the average and standard deviation values for these interactions as well as the maximum water occupancy around these residues. We have focused on interactions involving the interfacial residues of our database. However, in certain cases, they interact with other residues for which no experimental ΔΔG binding information exists and that are therefore not under study. These interactions were found for five of the nine complexes studied. The occupancy of water molecules in specific hydration sites does not seem to depend in any simple way on the nature of the residue to which they bind (its polarity or H-bonding ability, for example). In the cases shown, the water occupancy around the warm- and hot-spots is higher than 90%. These solvent molecules contribute by forming H-bonding networks with the warm- and hot-spot cluster. In the 1DFJ complex, Asp435, a hot-spot, establishes a high number of water-mediated H-bonds with the main and side-chain atoms, which explains its energetically crucial role. For the 1DVF and 1BRS complexes, the cluster of warm- and hot-spots is tightly packed, establishing H-bonds with waters with high occupancy. For the 1DVF complex, Tyr32_A interacts with Asp207 and Tyr208 through two water molecules. For the 1BRS complex, an interaction mediated by two water molecules is made between Glu73 and Lys27. These interactions are extremely stable, as seen in Table 2. In these three complexes, at least two water molecules mediate the H-bonds between warm- and hot-spots from different monomers. In the 1FQ9 and 1FLT complexes, the hot-spots also form water-mediated interactions. For example, in the 1FQ9 complex, the same water molecule connects the energetically important residue Glu96 with two different residues.
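The occupancy values discussed here follow from applying the geometric H-bond criteria frame by frame. The Python sketch below illustrates the idea and is not the in-house script used in the study. It makes two assumptions that the text does not confirm: the 3.0 Å value is treated as a maximum donor-acceptor distance, and the 20° angle cutoff is read as the maximum deviation of the donor-hydrogen-acceptor angle from linearity. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.0, max_dev_deg=20.0):
    """Geometric H-bond test on three (x, y, z) positions in angstrom.

    Assumed reading of the criteria: donor-acceptor distance <= max_dist and
    donor-hydrogen-acceptor angle within max_dev_deg of 180 degrees.
    """
    if _norm(_sub(acceptor, donor)) > max_dist:
        return False
    u = _sub(donor, hydrogen)
    v = _sub(acceptor, hydrogen)
    cos_t = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    angle = math.degrees(math.acos(cos_t))
    return 180.0 - angle <= max_dev_deg


def occupancy_percent(frames, **criteria):
    """Percentage of frames in which the (donor, hydrogen, acceptor) triple is H-bonded."""
    hits = sum(1 for d, h, a in frames if is_hbond(d, h, a, **criteria))
    return 100.0 * hits / len(frames)


# Toy trajectory: nine frames with a linear, short contact and one bent frame.
bonded = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.0, 0.0))
bent = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.8, 0.0))
print(occupancy_percent([bonded] * 9 + [bent]))
```

A water-mediated contact between two residues would require this test to hold simultaneously for the water with each residue in the same frame; the sketch covers only the single-bond case.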

Conclusions
Despite the ongoing effort to decipher the complex nature of protein interactions, these are still not entirely understood. Protein-protein binding is achieved by the connection of small regions with specific chemical and geometric properties that contribute significantly to the binding free energy (over 2.0 kcal/mol), the warm- and hot-spots. Bogan and Thorn proposed that these regions are surrounded by O-rings, which exclude bulk water molecules from them. A useful indicator of how a surrounding medium affects protein structures is the surface area of protein atoms in contact with the solvent molecules (SASA). Although quantities derived from SASA are useful in many applications in protein design and structural biology, accurate SASA calculation over an ensemble of structures is computationally expensive. In order to test the O-ring theory, four SASA features were measured for the residues in our database. By measuring the average SASA features for the warm- and hot-spots as well as for the null-spots, it was possible to obtain a clear view of the different behavior of these two sets of residues. The RDFs once more allowed the two groups to be distinguished, and warm- and hot-spots are undoubtedly occluded from the solvent. The number of water molecules in their microenvironment is much lower than for the null-spots. This work has provided strong evidence that warm- and hot-spots are protected from the bulk solvent, validating the O-ring theory. However, we cannot forget that water-mediated interactions have significant power to construct an extensive and strongly bonded interface; indeed, warm- and hot-spots tend to form H-bond networks with single water molecules that have an occupancy around 90%.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2012.758598.