Molecular mechanisms of activation in CDK2

Cyclin-dependent kinases (CDKs) are enzymes involved in crucial cellular processes. Their biological activity is directly linked to their high conformational variability, which involves large protein conformational rearrangements. We present here the application of an enhancing sampling technique to the study of conformational transitions between the open and closed state of CDKs. The analysis of the conformational intermediates supports the idea that the process is regulated by two important protein regions, which sequentially rearrange in order to allow the protein to reach its final conformation. Furthermore, the two paths involve additional (minor) protein rearrangements which are specific to the paths. Our results show that our procedure can provide reasonable transition pathways between the two protein forms at a very reduced computational cost. The robustness and the simplicity of our approach make it of general application to describe virtually any macromolecular conformational transitions.


Introduction
Cyclin-dependent kinases (CDKs) are regulatory enzymes involved in cell cycle progression and transcription and their deregulation has been implicated in a number of diseases, including cancer (Johnson, 2009). They are small proteins (34-40 kD) containing a catalytic core which is conserved in all protein kinases. CDKs are foundby defaultin the inactive state and their full activation requires both the binding of a cyclin subunit and the phosphorylation of a threonine residue in the activation loop located near the active site. Both activation and inhibition mechanisms involve conformational changes in and around their catalytic cleft (Jeffrey et al., 1995), implying that flexibility plays a central role in their function (Huse & Kuriyan, 2002). That is, CDK2and kinases in general work because they are highly dynamical entities which allow a continuously functioning switch between active and inactive conformations (Jura, Zhang, Endres, & Seeliger, 2011). Notably, these peculiar characteristics indicate the possibility to inhibit kinases by trapping them in the inactive state. Due to the high conservation of the kinase ATP-binding site, the use of the inactive kinase conformation as binding target allowed the identification of high-specificity drugs to be used for kinase inhibition (Nagar et al., 2002). Considering that protein kinases are involved in a myriad of crucial cellular processes and that their misregulation often results in diseases (Lengyel, Sawada, & Salgia, 2007), it is not surprising the intense interest for the understanding of the molecular mechanisms regulating the conformational switching associated to their activation (Hanson et al., 2007;Henzler-Wildman et al., 2007;Lu & Wang, 2008;Nagar et al., 2003;Whitford, Miyashita, Levy, & Onuchic, 2007).
In CDK2, the active (inactive) state coincides with the open (closed) protein conformation, which refers to the position of both the central loop (the activation or Tloop) and the αC helix, located in the N-lobe. CDK2 structure has been solved in the free state (De Bondt et al., 1993), as well as in complexes with cyclin A (Jeffrey et al., 1995) and with KAP (Song et al., 2001). These structures point out that the essential differences between active and inactive conformations are essentially determined by the position of the helix αCwhich is swung outward from the N-lobe in the inactive state and results in moving of the conserved hydrophobic residue in the helix away from the active siteand by the Tloop, a protein region including an essential motif for the ATP binding (the aspartic-phenylalanine-glycine, DFG).
In the inactive (closed) state, the T-loop is displaced onto the C-terminal lobe, leaving the ATP-binding site accessible to substrates (Brown, Noble, Endicott, & Johnson, 1999;Jeffrey et al., 1995;Pavletich, 1999;Russo, Jeffrey, Patten, Massagué, & Pavletich, 1996). That is, the mechanisms of activation/deactivation of the CDK2 (and of other members of the CDK family) involve large conformational changes of different regions of the protein.
Therefore, a proper theoretical model able to describe the transition between the two states can be a very useful tool to understand the key steps of the process, which ultimately determine the protein biological activity. However, due to the relatively slow kinetics of the transitionestimated in the microsecond time-scalethe usual computational sampling techniques are almost usefulness. Therefore, the search for enhancing sampling algorithms represents an active field of research. A group of them, including but not limited to umbrella sampling (Bartels & Karplus, 1997), metadynamics (Laio & Parrinello, 2002) and conformational flooding (Grubmuller, 1995), an adaptive umbrella potential, is employed to destabilize the conformations already sampled, thus elevating the system from the bottom of a conformation minimum until transition into another minimum takes place. A second group of methods, making use of constraints or restraints along collective degrees of freedom, is able to drive the systems in selected regions of the conformational space.
In this respect, we present here the application of an enhanced sampling techniquethe essential dynamics sampling (EDS) (Amadei, Linssen, & De Groot, 1996) to the description of the CDK2 open-closed and closedopen transitions. Such a technique is based on the possibility to guide the sampling towards a target and to reject movements (in the conformational space) which are not in the desired direction. With respect to other enhancing sampling methodologies (for a brief overview on the subject, see for examples the review of Tai (Daidone & Amadei, 2012;Tai, 2004) and references therein), the advantages of the EDS technique are that the system moves along its own essential eigenvectors as provided by free (unbiased) molecular dynamics (MD) and no ad hoc terms are introduced in the Hamiltonian. Moreover, the contraction procedure guaranteesby constructionthat the system goes from an initial state to a final state, being only slightly constrained. In fact, the use of the first few essential eigenvectors associated to the principal motions of the protein alpha-carbonswhich typically describes up to 90% of the total atomic fluctuation of proteins and are likely to describe protein conformational changesonly biases a number of degrees of freedom which is an order of magnitude less than the total degrees of freedom of the whole protein. The EDS methodwhich was originally developed to overcome sampling issue, i.e. in expansion mode was successfully applied (contraction procedure) to study the mechanism of protein folding (Narzi, Daidone, Amadei, & Di Nola, 2008) and the accessibility of the closed and open domain conformations in an enzyme (Daidone, Amadei, Roccatano, & Di Nola, 2003). In addition, the combination of the EDS with the replica exchange technique was successfully applied to the study of the adenylate kinase conformational transitions (Kubitzki & de Groot, 2008), and an alternative enhanced sampling procedurebased on the use of the EDSwas recently proposed and applied on a set of 50 macromolecules to find conformational transition pathways (Sfriso et al., 2012).
In this work, we show how the application of the EDS technique to the study of the conformational transitions occurring in CDK2 protein can provide an efficient description of such a process at atomistic level, highlighting the existence of different transition paths which connectin the conformational spacethe closed and open forms of the protein. A graphical representation of the main differences between the two conformational states of the protein is reported in Figure 1.

Methods
The starting structures of both "open" and "closed" conformations were taken from the corresponding crystallographic structures, PDB entries of 1FIN (Jeffrey et al., 1995) and 3PXR (Betzi et al., 2011) respectively. These crystals represent the best structures for our goal (conformational states, absence of inhibitor in the crystal, number of resolved residues and resolution). The MD simulations were carried out at constant volume with the GROMACS program package (Hess, Kutzner, van der Spoel, & Lindahl, 2008), using the AMBER99 force field (Wang, Cieplak, & Kollmann, 2000) and the SPC water model (Berendsen, Postma, & van Gunsteren, Intermolecular Forces, 1981). The temperature was kept constant using isothermal-gaussian coupling at T = 300 K (Berendsen, Postma, van Gunsteren, Di Nola, & Haak, 1984). All bonds were constrained by using the LINCS algorithm (Hess, Bekker, Berendsen, & Fraaije, 1997). An integration time step of 2 fs was used. Lennard-Jones interactions were calculated with a cut-off of 11 Å. Electrostatic interactions were calculated explicitly at a distance smaller than 11 Å; long-range electrostatic interactions were calculated by particle-mesh Ewald summation with a grid spacing of 0.12 nm and fourth-order B-spline interpolation (Darden, York, & Pedersen, 1993). After removing the first few nanoseconds, the two trajectories (lasting 200 ns each) were concatenated and the covariance matrices of positional fluctuations (C-alpha only) were built and diagonalized. Eigenvectors are directions in configurational space and the corresponding eigenvalues indicate the mean square fluctuations along these axes. The procedure corresponds to a linear multidimensional least-squares fitting of a trajectory in configurational space. Sorting the eigenvectors by the size of the eigenvalues shows that the conformational space can be divided in a low-dimensional (essential) subspace in which most of the positional fluctuations are confined and a high-dimensional (near-constraints) space in which merely small uninteresting vibrations occur. The EDS method was applied in order to sample the configurational space and the conformational change between two states more efficiently than by classical molecular dynamics (MD). The EDS technique (Amadei et al., 1996) is based on a previous essential dynamics analysis and it is used to increase (expansion procedure) or decrease (contraction procedure) the distance from a reference structure. In the latter, used in this work, a regular MD simulation step is performed and the distance between the current structure and the reference structure is calculated. The step is accepted if the distance between the current structure and the reference does not increase (contraction procedure); otherwise, the coordinates and velocities are corrected radially back onto the hypersphere (in the chosen subspace) centred in the reference, with the radius length given by the distance from the reference in the previous step. Additional details of the EDS procedure are reported in the following section.

Results
The first step in our study was to select a number of starting structuresboth open and closedrepresentative of the protein conformational space as sampled by the free MD simulations. To this end, we selected a total of 21 structures (13 open and 8 closed) which fall in different regions of the corresponding essential subspaces (see Figure S1 in SI) and used them as starting and ending points for the EDS simulations, resulting in a total of 104 EDS trajectories.
As expected, the first eigenvectorobtained by means of principal component analysiswell discriminate between the two protein conformational states (see below). Considering that the main objective of the present work is to describe the CDK2 conformational transitions, we performed a preliminary set of EDS simulations varying the number of eigenvectors used to define the subspace providing the biased sampling, to assess the minimum dimension able to describe the transitions and leading to final structures very similar to the target structures extracted from the free MD simulations. As expected, the best structures (those with the lowest RMSD with respect to the target structure provided by the free MD simulations) are obtained using all the C-α eigenvectors; however, the use of the first 20 eigenvectors leads to structures which are still very similar to the corresponding targets (the range of the RMSD values observed for the free MD simulations as well as the RMSD of the final structures with respect to the target versus the number of eigenvectors used is reported in the Table S1 in Supporting Information). Therefore, the first 20 C-α eigenvectors were used to define the subspace involved in the EDS procedure to describe the openclosed and closed-open transitions in the CDK2. It is worth noting that the use of the EDS procedure based on 20-dimensional C-α eigenvector subspace provides a very limited bias on the MD trajectory as it leaves all the remaining degrees of freedom completely free in their sampling and relaxation.
In a previous work, the eigenvectors associated with the smallest eigenvalues were used in the EDS procedure to describe the folding pathways of two small proteins (Narzi et al., 2008). In that case, in fact, the main mechanical information needed to fold the systems were contained in the last eigenvectors (near constraints degrees of freedom), whereas here we focus on large concerted motions in the folded state, as occurring in the opening/closing conformational transitions, which do not disrupt the mechanical framework of the folded conformation, and, thus, are well described by the first (largest eigenvalues) C-α eigenvectors (essential subspace).
The projections of a representative set of 15 EDS trajectories on the essential subspace defined by the first two C-α eigenvectors are reported in Figure 2 (additional data-set projections are reported in Figures S4-S7 in SI). The most interesting point is that the two conformational transitions, i.e. the opening and the closure processes, follow different paths. In particular, the first eigenvector discriminates between the two equilibrium protein conformations (open and closed), while the second eigenvector characterizes the differences in the transitions paths. These results are confirmed for all the EDS trajectories we performed (see Figure S2 in SI). To explain and rationalize mechanistically the transition path differences between the opening and closing processes, we analysed the conformational intermediates as provided by the EDS trajectories. It is worth to note that although the structures along the pathways cannot be a representative of their equilibrium distribution, the high similarity of the conformational intermediates in the essential subspace observed in all the EDS simulations and the high conservation of the secondary structures (see Table 2 in Supporting Information) seems to exclude the presence of possible friction-dissipative artefacts.
From visual inspection of the atomic motions due to the first two essential eigenvectors connecting the two equilibrium conformations, it comes out that two protein subparts, mainly determining the conformational state (as shown in Figure 1) i.e. the T-loop and the α-C helix, rearrange in a different order. In fact, in the open-toclosed transition, there is first a rototraslation of the helix followed by the T-loop closure. The reverse transition sequence, i.e. the T-loop opening followed by the helix movement, is observed in the opening process. Interestingly, the same sequential transition mechanism was observed in a recent study on the Src kinase activation by combining multiple independent MD trajectories (Yang, Banavali, & Roux, 2009). Nevertheless, it is worth noting that computational approaches based on a simplified description of the protein (i.e. using the Gō model) led to opposite results in the sequentiality of the activation mechanism as occurring in the Src kinase (Huang, Zhao, Dickson, Skeel, & Post, 2012;Yang & Roux, 2008).
The alpha-C helix and T-loop conformational changes are coupled with additional structural rearrangements involving different protein regions, mainly located in the N-domain. In particular, during the opening process, there is a concerted movement of the T-loop and the G-rich loop, which plays an important role in both phosphoryl transfer and ATP/ADP exchange during the catalytic cycle (Kornev & Taylor, 2010). These loops, drifting away from the active site, allow the α-C helix to enter in the ATP-binding site. The closure mechanism involves the samebut reversedprotein conformational rearrangements. It should be remarked that such structural rearrangements cannot explain the clear differences between the two paths observed in the essential subspace (Figure 1) due to the different intermediate distribution along the second C-α eigenvector. In fact, for a conformational transition involving such structural changes, no transition path differentiation should be observed in the essential subspace as the closed-open rearrangement is simply the reverse of the open-close. Indeed, the second C-α eigenvector involves further structural changesmainly arising from two short protein regions, a loop near the α-C helix (residues 38-44) and a loop between two β-sheets (β-4) involving the residues 71-76resulting in the difference of the paths (the closed-to-open structural transition of these regions is not the reverse of the open-to-closed process).
The sequentiality of the two structural rearrangement mechanisms is well explained by the graph of the α-C helix rotation angle versus the T-loop displacement with respect to the initial states (Figure 3, for the sake of clarity only a trajectory for each transition is reported). This figure further supports our description and rationalization of the conformational transition mechanisms on a more structural ground.
Although the processes involve several additional (but narrower) protein rearrangements and it might be somewhat difficult to completely separate the T-loop, the C-α helix and additional smaller conformational rearrangements, our interpretation furnishes a simplified description of the protein conformational transitions on a structural ground.
From the computational viewpoint, the steep decrease of the contraction radius values versus the number of EDS steps (Figure 3) indicates that our procedure is very efficient: in fact, a number of steps between 20 and 40 kcorresponding to 40 and 80 ps of a free MD simulationare sufficient to reach the  target structures for both the transitions. In addition, the overall values of the contraction radius versus the number of steps ( Figures S2, SI) point out that the number of steps required to reach the target in the essential subspace is almost independent from the starting and target structures, thus indicating the presence of a common minimum-free-energy path in the conformational space connecting the two protein states.
In Figure 4, the derivative of the mean of the contraction radius with respect to the number of steps (bottom panel) has been shown. Interestingly, in the first 5 k steps, the absolute values of the derivatives are quite large and steeply changing, indicating that, in the EDS simulations, the protein finds the path towards the target region relatively fast, although rapidly reducing its efficiency. The derivatives reach a plateau between 5 and 25 k steps, roughly corresponding to the conformational intermediate regions. For further radius contractions, both the transitions slow down (as shown by the smaller absolute values of the derivatives), converging to zero.
The possibility to qualitatively reconstruct the freeenergy profile of the conformational transitions from the analysis of the number of steps needed to reach the target is currently under development.

Conclusions
In this work, we investigated the possibility to model large protein conformational rearrangements by means of an enhancing sampling procedure, based on the use of the essential protein motions. Our data point out that such a procedure is robust, computationally efficient and does not require any a priori knowledge of the regions involved in the transition but requires only a description of the starting and final conformational states. In the case of CDK2, we found that both the opening and closure follow common transition paths in the essential subspace, which can be used to highlight the structural determinants involved in such processes. In particular, such transitions are essentially described by well-defined structural rearrangements involving two important protein regions, the C-α helix and the activation loop. The sequentiality of such rearrangements determines the specific protein conformational transition, which is well characterized by the first essential eigenvector. Using the second eigenvector, the transition paths can be separated in the essential subspace, thus highlighting that additional (tiny) protein rearrangements occur during the protein conformational transitions. In summary, the presented procedure seems to be promising to provide a general, rather efficient and possibly accurate, theoretical-computational method to address protein conformational rearrangements in complex macromolecules, where such process is beyond the presently computational feasibility.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.10.1080/07391102.2013.844080.