AI-based AlphaFold2 significantly expands the structural space of the autophagy pathway

ABSTRACT A structural understanding of entire cellular processes has been an uncharted realm, until now. The artificial intelligence-based tool AlphaFold2 (AF2) has substantially changed the prediction accuracy, and predicted models of entire proteomes are now available. Here, we have examined AF2’s prediction of 38 core macroautophagic/autophagic proteins and 378 interacting partners representing the human autophagic interactome. Prior to AF2, ~50% of the proteins lacked atomistic level resolution and we found significant improvement in structural coverage by AF2, with an addition of ~ 47% of the residues modeled with reasonable confidence. We also augmented this structural information with μs timescale molecular dynamics simulations, in particular, ATG2, ATG10, and ATG14. ATG2A, a bipartite membrane protein with rodlike architecture was predicted with high accuracy and our simulations revealed dynamic transitions of cavity-lining residues that might play a critical role in regulating lipid transfer. In addition, a promising approach of multimeric prediction by AF2 revealed the architecture of ATG7-ATG10, a tetrameric complex that participates in conjugation machinery in autophagy. By combining computational and experimental approaches, we demonstrated that three salt bridges were crucial to ATG7-ATG10 complex formation and mutating these residues abrogated the binding. We have also generated a web resource with curated AF2 structural models, simulated conformational ensemble, and structural analysis that will be highly pertinent to the autophagy community. Altogether, our work presents a robust pipeline to utilize AF2 as a tool for a starting point to provide the dynamic behavior of molecules in a given biological pathway. Abbreviations AF2: AlphaFold2; AF2-Mult: AlphaFold2 multimer; ATG: autophagy-related; CTD: C-terminal domain; ECTD: extreme C-terminal domain; FR: flexible region; MD: molecular dynamics; NTD: N-terminal domain; pLDDT: predicted local distance difference test; UBL: ubiquitin-like


Introduction
Autophagy is a specialized catabolic process by which cells sequester cytoplasmic material and target it for lysosomal degradation.A membrane structure called phagophore is formed during the process which subsequently expands to form a double-membrane vesicle called autophagosome [1][2][3].The unique feature of autophagy to target cellular content to lysosomes is attractive to therapeutic strategies for cancers, myopathies, neurodegeneration and heart diseases [4][5][6][7][8][9][10][11][12].There are ~ 38 conserved core autophagic proteins that participate independently or in oligomeric complexes to drive autophagy toward completion.Broadly, there are four functional categories that regulate autophagosome biogenesis and maturation, namely, the ULK complex, the class III phosphatidylinositol 3-kinase (PtdIns3K) complex, the ATG9 complex and the ubiquitin-like (UBL) conjugation systems [2,13].Previous studies have elaborated on the role of autophagic proteins and their complexes, thus providing a comprehensive view of how the pathway proceeds [1,2,20].While the present structural understanding of autophagic machineries is expanding, the membrane localization and heterogeneous nature of autophagic complexes pose marked technical hurdles to exploit autophagic proteins as potential drug targets.
Recently, the availability of protein structure prediction platform AlphaFold2 (AF2) was announced as a watershed moment in the biology community [21][22][23][24][25].In the past, several in silico protein prediction tools utilizing homology modeling, ab initio, and threading provided good structure models for biologists [26][27][28][29][30][31][32][33].However, the bottlenecks in the field were requirement of closely resembling templates or ab initio based methods that rely on large compute resources.With AF2, several of these bottlenecks have been removed and the tool performs dramatically better compared to the previous options [34].Briefly, the artificial intelligence-based approach can predict structures with a high accuracy even for proteins with limited templates [25,34,35].The AF2 network essentially comprises two stages that capture spatial and evolutionary relationships between sequences, and further translate the information to produce atomic coordinates by utilizing physical principles.The current form of AF2 is available both in a web application termed AlphaFold database and as a stand-alone application [24].In addition to the monomer AF2, the standalone version has recently incorporated multimer functionality in which AF2 models are specifically trained for multimeric interfaces of known stoichiometry [36].The current version of the AF2 database contains over 214 million predicted structures across 48 complete proteomes [24].Owing to its numerous advantages, various applications have incorporated AF2 predictions into existing pipelines, such as SWISS-MODEL [37], UniProt [38], PDBSum [39], ColabFold [40], ChimeraX [41] etc.
The usefulness of AF2 predicted models in understanding protein structures in quantitative manner has just started to emerge.For instance, its implications in the context of designing proteins that bind to specified target [42], complex prediction [36], missense mutations [43][44][45], post-translational modifications [46], identifying disordered protein regions [47], structurebased drug designing [48,49] and relevance in the area of protein aggregation [50].Here, we present a curated understanding of the contribution of AF2 to the present comprehensive structural information of autophagic proteins.~50% of the autophagic interactome is currently lacking atomistic level structural resolution and, remarkably AF2 provides ~47% additional structural coverage with high accuracy.By consolidating information on core autophagic proteins, we evaluated high-confidence structural models and performed extensive molecular dynamics simulations of ATG2, ATG10 and ATG14 proteins.We also present an extensive pipeline to utilize AF2 multimer application (AF2-Mult), simulations and experiments to elaborate on molecular details of human ATG7-ATG10 complex.At end, we present a curated web-resource called RAPSAP (Resource of AF2 Predicted Structures of Autophagy Pathway, http://rap sap.igib.res.in) that contains structural information of proteins in autophagy pathway post AF2 era.

AF2 predictions largely increase the structural repertoire of autophagy-related proteins
The autophagy process is regulated by core autophagic proteins that interact with numerous other proteins to constitute the "autophagy interactome".A pool of ~ 400 proteins has been previously reported to orchestrate the process of mammalian autophagy [51].Based on this autophagy network, we mapped information of structural coverage for each node (protein), as shown in Figure 1A.We found that 205 out of 416 proteins lacked atomistic information, that represents ~ 50% of core autophagy interacting proteins.The network was also sub-divided into functional groups of core proteins.In particular, ULK complex, PtdIns3K complex, ATG9 complex and UBL conjugation system lacked ~ 35%, ~44%, ~42% and ~ 55%, of structural coverage, respectively.Thus, a lot of autophagy proteins and their complexes remain unexplored.
Here, we extracted structural information from predicted models of human autophagy-related proteins and their interacting partners (Figure 1B).We asked a question how accurate are the models provided by AlphaFold2 in a quantitative manner.The tool reports a metric called the predicted local distance difference test (pLDDT) for each residue as a measure of confidence in the correctness of the prediction.Residues with pLDDT > 90 are high confidence scores for both backbone and sidechain residues, pLDDT values between 70 and 90 are of intermediate confidence and represent accurate backbone prediction, values between 70 and 50 are low confidence, and residues with pLDDT below 50 depict low reliability.On a more global scale, ~9% (19 of 205) of the autophagyrelated proteins in the interactome fitted into more stringent confidence criteria of pLDDT > 90, which is proposed to result in accurate backbone as well as sidechain prediction.However, ~45% of the proteins (92 of 205) reported medium confidence scores (pLDDT > 70).At the local level, an average of ~ 47% of the residues of the autophagy interactome were observed to be additionally covered by AF2 with pLDDT score of > 70.Together, the data suggests significant expansion of the structural realm of autophagy proteins by AF2 predictions.

AF2 predictions enhance the structural coverage of core autophagic machinery
Out of 38 core autophagic proteins, only 19 proteins were experimentally resolved (Figure 2A).The unresolved category has broad spectrum of structural coverage from the proteins with no existing structural information to proteins with partial existing structural information.The latter category can further be subdivided into proteins with lacking structure information on domains or largely known structure with unresolved disordered regions.To identify core proteins in each of these categories, we have enlisted unresolved residues in each protein and labeled them as either structured or disordered depending on their IUPred2 scores.Figure 2B shows bar plot categorized each protein into resolved, unresolved structured and unresolved disordered segments.The key autophagic proteins ATG2, ATG10 and ATG14 were observed to have low structural coverage with no existing atomic resolution structures available in humans and were predicted by high pLDDT scores of greater than 70.Secondly, RB1CC1/FIP200, AMBRA1, and the ULKs have a percentage of residues whereby domain is predicted at good confidence.Lastly, LC3, ATG13, WIPI and other proteins have experimental structures albeit with missing regions that may be intrinsically disordered.These regions were predicted at pLDDT values of less than 50.When AF2 predicted pLDDT scores of core autophagic proteins were aligned against the secondary structure components of the protein sequence, coil regions mostly exhibited low pLDDT values (Figure S1).This is also in agreement with previous literature, where lower pLDDT is attributed to intrinsically disordered region [47].Overall, significant atomistic details are now available on core autophagic proteins, and one can navigate further experimental design based on proposed highconfidence AF2 structures.

Structural, dynamical and functional analysis of high confidence AF2 predicted structures through microsecond molecular dynamics simulations
Structures adopt multiple conformations.While the presented structural pool of AF2 predicted models provide important starting points, the structure exhibits many relevant functional states.We have carried out extensive all-atomistic molecular dynamics simulations to capture protein dynamics and reveal relevant functional states.We selected three fulllength proteins predicted with high confidence scores with no existing atomic resolution structure in humans, namely ATG2A, ATG10 and ATG14.
ATG2A, a bipartite membrane protein, is proposed to transfer lipids from ER to the phagophore.AF2 predicted  the structure of the protein with good accuracy.The model consisted of a concave, taco-shaped β-sheet like structure with a hydrophobic interior and hydrophilic exterior (Figure 3A; Figure S2A).The hydrophobic-lined groove formed a long twisted inner cavity that runs along the length of the protein, and has architecture consistent with the previously available cryo-EM data (Figure S2B) [52,53].The modular building block of the cavity is composed of eight "repeating βgroove" (RBG) domains, with each consisting of 5 antiparallel β strands (except terminal RBGs containing a smaller number of β strands) followed by unstructured loop that crosses back over the β sheet [54,55].By computing root mean square fluctuations of Cα atoms of each residue, it was observed that loops of RBG4 and RBG6 displayed more flexibility compared to other protein segments (Figure S2C).
Owing to the unavailability of complete experimental structure of ATG2, very little is known about the molecular mechanism of transfer of lipids through the protein.Figure 3B shows the time evolution of the cavity calculated from CAVER from 1 μs long molecular dynamics simulation (see Materials and Methods section).Interestingly, the dynamic movements of the protein showed changes in length and width of the protein.The tunnel length was observed to change from approximately 21 nm in the beginning of the simulation to 18 nm toward the end.This originated from RBG4 and 5 that are stacking together and hence decreasing the length of the protein (Figure S2D).On the other hand, to monitor the pore width, we analyzed the long rod-like architecture into four small cavities composed of two RBGs each (Figure 3C).In total, cavity consisted of 265 lining residues.To rigorously establish the pore dynamics, we traced all possible distance pairs within a given cavity i.e., in total 18,826 residue pairs were calculated.Based on maximum standard deviation of error, contributions of significant changes of distances are shown in Figure 3D.The pore was observed to be relatively larger in width in cavity 2 and showed dynamic opening and closing of RBG domains.The main interaction points are A443-T511 and W659-T692 that showed distances varying from ~2.0 nm to 4.3 nm (Figure 3E, Figure S2E).The inward and outward mobility of interaction sites were also quite evident in cavity 3, with multiple interactions showing closure of the pocket (L951-M1147, L1126-L1208, and P1125-I1193).At N-terminal level, the cavity was relatively stable which is also proposed to be extractor of membrane lipids [56].A previous study showed that N-terminal comprising of only 345 residues is sufficient to support lipid transfer activity [53].Given the importance of N-terminal, these residues that are involved in dynamic transitions may be critical for its function.Our results highlight the significance of MD simulations to reveal dynamic sites in ATG2 protein and for the subsequent experiments to test its specific activity.
Next, ATG10 is another such protein which does not have any existing atomic resolution structure available in humans and is predicted by AF2 with a pLDDT score of 78.9 (Figure 4A).The protein acts as an E2 enzyme in ubiquitin-like conjugation system, one of the key events in autophagy.The predicted structure of human ATG10 consisted of a conserved canonical E2 fold namely central four-stranded β sheet and two α helices, along with two additional α helices and two β strands inserted in the E2 core, as observed previously in yeast Atg10 [57][58][59].The structural similarities between human and yeast Atg10 are also expected owing to their functional similarities, despite their low sequence homology (~19% with Km yeast and ~ 23% with Sc yeast).Although the topology is mostly the same as other yeast proteins, five notable differences were found (Figure 4B).In particular, human ATG10 composed of three longer α helices (residues 125-142, 165-181, and 188-218), a longer flexible region (FR) region  and absence of one of the β strand (91-97 of Sc yeast).To understand the dynamics of the structural regions, 1 μs MD simulations of an AF2 predicted model of human ATG10 in explicit solvent was performed.During most of the trajectory length, the protein fold was maintained.However, key fluctuating residues were in the C-terminal and FR region (Figure 4C).The dynamic nature of these regions is likely to play an important role while interacting with its binding partners, enabling the protein to adopt diverse orientations.Interestingly, one of the longer α helical regions (165-181) of human ATG10 contained the catalytic cysteine residue (Cys166) at the beginning of the helix in AF2 predicted structure.Prominent early work by Hong et al. described 2.7 Å resolution structure of yeast with well characterized catalytic cysteine to be present at the loop albeit juxtaposed to helical residues at position 134-138 [58].Figure 4D depicts the time evolution of dihedral angles (φ and ψ) of Cys166 that showed transitions from characteristic alpha helical (−60, −40) to loop conformations for ~ 20% of trajectory duration.The ensemble of structures with cysteine in the loop region obtained from simulations is shown in Figure 4E.Given the presence of cysteine at initial helical region and rare transitions to the loop region, we speculate that precise location of cysteine conformation needs further validation.
Lastly, AF2 predicted a reasonable model of ATG14, with high confidence pLDDT score of 73.5 which is composed of N-terminal coiled-coil and a C-terminal BATS domains having pLDDT values of 95.7 and 44.7, respectively (Figure 4F).ATG14 protein forms a part of PtdIns3K complex and helps in mediating autophagosome nucleation [60].The protein exists as both monomeric and oligomeric states [61,62].The coiled-coil domain is known to interact with proteins such as BECN1/Beclin 1 in an extended helical conformation, and is required to target the PtdIns3K complex to the proper site [62,63].We observed coiled region to be modestly flexible, with calculated RMSF being maximum for the N-terminal disordered segment (1-70 residues).This conformational flexibility is likely to play a crucial role in rendering structural elasticity to ATG14 in oligomeric form, facilitating its interaction with different proteins and regulating its diverse functionality.Although MD simulations could capture the flexibility of the coiled coil domain in the monomeric form as opposed to the static rigid extended helical conformation of the domain predicted by AF2, the simulation results are still dependent on the starting structure.The presence of helical arrangement of residues of the domain in the AF2 predicted initial structure do not get dissolved completely over the course of 1 μs of MD Simulations at 310 K.This suggests large energy requirements to overcome such barriers and largely originates from theoretical application of monomeric protein prediction where oligomeric conformational space is vastly different.Such caveats need to be taken into account while interpreting the results obtained with such integrated approaches.Overall, the above extensive characterization of three autophagic proteins suggest that AF2 high confidence structures present excellent resources as starting points for detailed functional investigations, including therapeutic interventions and computer-aided drug discovery.

AF2 predicts structures of domains within ULK, RB1CC1, AMBRA1
In addition to the full-length proteins, AF2 predicted functional domains/regions of autophagic proteins (RB1CC1, ULK4 and AMBRA1) with high confidence.Figure 5A shows the domain architecture of RB1CC1, its N terminal domain is predicted with high confidence (pLDDT score > 70).It is predicted to be C-shaped, with maximum dimensions of around 17 nm, consistent with the recent cryo-EM data that shows 16-22 nm tip to tip distance [66].AF2 predicted RB1CC1's NTD displayed good fit into the reported density map as compared to its expected structural templates.In addition to the assigned structural domains, residues 2-110 were also predicted with high confidence of 90.9 pLDDT.
Certain regions within another crucial protein, AMBRA1, which is part of the PtdIns3K complex, was also predicted with high confidence by AF2 (Figure 5B).AMBRA1 functions to regulate the autophagic process and interacts with BCL2 at the mitochondrial outer membrane in the normal condition [64,[67][68][69].The interaction is mediated by residues at the N (1-532) as well as the C terminus (767-1269) [67].Under adverse conditions of nutrient starvation, however, it competes with interaction with BECN1 at the endoplasmic reticulum and mitochondria to stimulate autophagy.While the N terminus is structurally resolved with three WD40 domains assigned to it, no structural information exists for the C-terminal residues.Interestingly, AF2 predicted high pLDDT scores to the unknown structural region between 858-1039.This structural motif was observed to fold together with three WD40 units from the N-terminus, constituting another repeating unit of WD40, and forming a circularized beta-propeller structure.The relative positions of these WD40 units were also predicted with high confidence.Lastly, AF2 predicted the structure of ULK4's C terminal heat repeats with high pLDDT score of 84.8 (Figure 5C).The heat repeats are found in abundance in ULK4 homologs and also a common repeat in multiple other proteins.In Arabidopsis ULK4, these regions are involved in microtubule binding.Overall, the domain information predicted by AF2 within high accuracy may be attributed to already existing homology, however the accessibility of these domains may now be helpful for genetic perturbations.

Structure and dynamics of multimer human ATG7-ATG10 conjugation system
Autophagic protein ATG7 is an E1 enzyme that sequentially recognizes multiple conjugation partners to form unique multimeric complexes [70].Herein, we utilized AlphaFold2 multimer (AF2-Mult) application to generate ATG7-ATG10 complex that participates in ATG12 conjugation system [36].Although the complex is available in yeast, owing to less sequence homology, the spatial arrangement of tetrameric assembly in humans has remained unclear [71].The predicted structure of top ranked human ATG10-ATG7-ATG7-ATG10 complex with mapped pLDDT scores is shown in Figure 6A.The structure was predicted with reasonably high confidence for ATG7 having an average pLDDT score of 81.3, however ATG10 pLDDT score was only 52.7.It was observed that ATG7 adopts a bird shaped topology, with two N-terminal domains (NTD) connected to C-terminal domain (CTD) via linker.Figure 6B shows the complex architecture, with CTD containing adenylation domain that binds UBLs (ATG8 and ATG12), a part of it also refers to cross-over loop with catalytic cysteine 572, and finally extreme C-terminal domain (ECTD).ATG10 protein consists of a conserved canonical E2 fold namely central four-stranded β sheet and two α helices along with two additional α helices (B, C) and two β strands inserted in the E2 core as detailed in previous section.An overall comparison with previously available structures of yeast revealed an increased buried surface area in humans, where the buried surface area between ATG7-ATG10 in yeast was observed to be ~ 1745 Å 2 , while it was ~ 2790 Å 2 in human (Figure 6C).
To understand the contributions and interplay of individual arrangements of ATG7-ATG10 domains, we performed three independent molecular dynamics simulations of the complex predicted by AF2-Mult.The tetramer was observed to be highly dynamic, with the ECTD of ATG7 populating the major structural changes (Figure S3A).The NTD region fold was essentially maintained during the simulations with an average RMSD of 0.37 ± 0.07 nm across all simulations, however its relative orientation varied substantially (Figure S3B).
Further, four key motifs or specific protein segments were observed to be highly dynamical and as a result may play a critical role for the complex formation and function (Figure 6D).Firstly, the loops in NTD (218-226, 252-263, and 288-315 a.a.) along with two beta hairpins (200-210 and 269-279 a.a.) showed high fluctuations.Secondly, the linker that connects NTD to the CTD region also directly affected the NTD movement, with an average deviation of 0.12 ± 0.02 nm.Thirdly, the ECTD region in human ATG7 was predicted as a straight helix by AF2, however, in our simulations, we found it to be highly mobile, with high deviation (0.89 ± 0.27 nm).Lastly, the loops in ATG10 were found to be flexible, in particular, residues 33-42, 51-92, 128-142, and 175-187.It is interesting to note that the loop 53-90 a.a. is proposed to be the part of FR region of the ATG10.More importantly, the dominant interface region was contributed by the longer FR of ATG10 which is embedded deeply in ATG7 NTD.
Another interesting feature of ATG10-ATG7 is the transfer of UBL ATG12 that is mediated by two key cysteine residues; Cys572 located in the adenylation domain of the ATG7 and Cys166 of ATG10.The relative orientation and distance between these two catalytic cysteines regulate the feasibility of ATG12 transfer.Our simulation trajectories revealed dynamic movement of cysteines, with latter part of simulations showing stable Cys572-Cys166 distance to the opposite ATG10 (Figure 6E).There are two suggested mechanisms for the transfer of this ATG12 to ATG10 [59,71], where when the transfer occurs to the ATG10 bound to the same ATG7, it is termed as -cis while when it is transferred to the ATG10 bound to the opposite ATG7 unit, it is referred as -trans (Figure S3C).The distance between these two Cys residues in the -cis form remained ~4.6 nm or above throughout the simulation time (Figure S3D).These dynamical changes confirm that the feasibility of ATG12 transfer is favored in -trans form rather than -cis form.

Experimental validation of predicted protein-protein interface of ATG7 and ATG10
Our simulations provided key insights in the conformational changes of the two proteins in the complex.The proteinprotein interface induced a hydrogen bonding network between NTD residues of ATG7 and FR region of ATG10 (Table S1).Notably, 3 salt bridges viz.Arg246 (ATG7)-Asp80 (ATG10), Asp23 (ATG7)-Lys95 (ATG10) and Lys306 (ATG7)-Glu83 (ATG10) are proposed to contribute to the stability of ATG7-ATG10 interactions (Figure 7A).To further investigate the role of these salt-bridges in maintaining the stable ATG7-ATG10 conformation, we mutated these three residues in ATG7 to generate an in-silico triple-mutant tetramer complex of ATG7 D23K,R246D,K306E and analyzed its impact on the protein-protein interaction.Mutant simulations were observed to show larger backbone deviations and interestingly resulted in the disruption of the salt-bridges as observed by increased distance between Arg246 (ATG7)-Asp80 (ATG10), Asp23 (ATG7)-Lys95 (ATG10) and Lys306 (ATG7)-Glu83 (ATG10) (Figure 7B).
To validate predicted binding regions, we prepared and characterized wild-type ATG7 N-terminal (WT ATG7 NTD) and full-length ATG10 construct (Figure S4A-B).In addition, we performed site-directed mutagenesis of D23, R246 and K306 to K23, D246, and E306 respectively of ATG7 protein in N-terminal region.In order to investigate if three amino acid substitution brings any conformational changes in WT and mutant NTD protein, we recorded the circular dichroism (CD) spectra in the far UV range (190-250 nm) to detect their secondary structure.The CD spectrum of wild-type ATG7 was characterized by the minimum at 220 nm and positive ellipticity at ~200 nm and displayed a β sheet spectrum [72].Similar CD spectral curve of mutant ATG7 and the wild type suggested that amino acid residue substitution did not affect the secondary structure of the protein (Figure S4C).Further, we carried out isothermal titration calorimetry (ITC) of ATG10 with the N-terminal domain of ATG7 in a binding buffer at 25°C (Figure 7C, left panel).The raw titration data shown in the upper panel could be best fitted to the "one set of sites" binding model and the lower panel shows the endothermic heats of binding corresponding to the successive injections together with the fit obtained.The average parameters of three independent titration experiments yielded the binding constant (K d ) = 3.6 ± 0.8 µM, stoichiometry (n) = 0.87 ± 0.1, enthalpy of binding (∆H) = 4 ± 0.3 kcal/mol and entropy changes on binding (-T∆S) = −11.5 ± 0.3 kcal/mol.The entropy-driven thermodynamic force behind the ATG10 and WT ATG7 binding in the complex formation might be the contribution of changes in both solvent and protein conformation.Remarkably, mutant completely abolished the interaction with ATG10, as shown in Figure 7C (right panel) and Figure S4D.Overall, the biophysical experiments confirm the spatial binding mode predicted from simulations and provide in-depth analysis of conformational rearrangements in the binding interface.

RAPSAP: resource of AF2 predicted structures of autophagy pathway
The RAPSAP is a curated web-resource with comprehensive structural information on autophagy proteins in post-AlphaFold era (http://rapsap.igib.res.in).The resource was broadly categorized to host two set of information, first AF2 models and analysis, and secondly MD simulation related data (Figure 8).As a meta-data approach, we gathered predicted models by AF2 of core autophagy proteins and their interactors (416 proteins).The models were then analyzed on the basis of pLDDT scores and the structured and disordered ratio.The webpage with AF2 models and analysis reports precisely average pLDDT scores, BLASTp results, pLDDT score on the predicted segments, and intrinsically disordered analysis based on IUPred2.A list of non-IDRs with high pLDDT scores are provided to enable the readers to identify functionally important regions predicted by AF2.Secondly, ensemble of high confidence structures of three core autophagic proteins (ATG2, ATG10, and ATG14) was generated using μs MD simulations.In addition, AF2 multimeric complex models of ATG7-ATG10 along with structural ensemble from simulations, are made freely available to the readers.The key molecular interactions crucial to the complex formation were experimentally validated using mutagenesis and isothermal titration calorimetric experiments.Overall, we provide large amount of structural information as an open access that can be utilized to aid future experiments and expand structure-based understanding of autophagy pathway.

Discussion
Autophagy is a highly regulated catabolic process of degrading cytosolic components and is essential for maintenance of cellular homeostasis and adaptation to cellular stresses.However, autophagic proteins share low sequence identity in diverse organisms, have membranous localization and are of heterogeneous nature, posing constraints in their structural and functional characterization.An assessment based on available structural information suggested that ~ 50% of the autophagy interactome lacked atomistic level resolution (Figure 1).Motivated by transformative results of artificial intelligence-based protein structure prediction approach AF2, we studied its impact on autophagic structural space.Models with high confidence predictions of full length and structured segments in proteins previously unexplored were identified (Figure 2).
In particular, key autophagic proteins ATG2, ATG10, and ATG14 were predicted with relatively high confidence scores.These structures provide an excellent resource as starting points for detailed simulations that have the potential to explore their conformational landscape.In addition, AF2-Mult application was employed to identify structure of key intermediate in autophagy conjugation machinery namely ATG7-ATG10 tetrameric complex.The contributions and interplay of different segments of the interacting proteins were explored using multiple replicas of long length MD simulations.The binding interface residues and functional hypotheses generated were validated using biophysical experiments.Next, we discuss here the presented structural and dynamical understanding of core autophagy machinery available in AF2 in context of initiation, nucleation and elongation of autophagosome.Initiation.The initiation of the autophagic pathway in humans is mediated by the ULK complex that consists of ULK1, ULK2, ULK3 or ULK4, RB1CC1, ATG101, and ATG13.While ATG101 has complete structural information available in literature [73][74][75], the remaining proteins namely ULK1, ULK2, ULK3 and ULK4, RB1CC1, and ATG13 consists of certain domains with missing structural resolution in humans.The recently proposed model by Shi et al. sheds light on the arrangement of initiation complex which has got substantially enhanced by AF2 predicted interfaces of these proteins [66].Broadly, the domains that can be further explored include NTD in RB1CC1; MIT domain in ULK1 to ULK3 and MIM domain in ATG13; and HEAT domain in ULK4 (Figure 5).Firstly, RB1CC1's N-terminal dimerization domain is predicted with high confidence by AF2 having ~17 nm diameter, which is proposed to be compatible with localization on growing phagophore's rim or on highly curved ATG9containing vesicles [66].Secondly, the high confidence structural prediction of MIT domain in ULKs and MIM domain in ATG13 opens the door to explore its role in scaffolding the complex and in membrane binding.Although, the predicted structures comply with the previous structural arrangement of the complex in yeast, its structure in the apo form needs further investigation [76].Previous report points toward completely flexible MIT and MIM domains in unbound form in yeast [77].The overall structural organization of MIT domain was observed to be similar in ULK1 and ULK2, that might explain the functional redundancy in the two isoforms.ULK3, conversely, has different spatial arrangement of the domain, which could be the probable reason behind its differential functionality.Lastly, AF2's prediction of ULK4's heat repeat domain can be leveraged to explore the involvement of these extended helical regions in microtubule binding, and their communication to the (pseudo) kinase domain, etc [78][79][80][81].
Nucleation.Secondly, the nucleation step in autophagosome biogenesis is regulated by the PtdIns3K complex.The complex has two distinct types: PtdIns3K complex I and PtdIns3K complex II.While the structural information of the common core subunits viz PIK3C3, PIK3R4, and BECN1 of the two complexes are available in protein data bank, structure of the unique component of PtdIns3K complex I viz ATG14/ATG14L is proposed by AF2 (Figure 4).The predicted structure of ATG14 was observed to resemble its counterpart in PtdIns3K complex II, UVRAG, only partially, and substantial differences were observed at the N and C terminus [82].These structural differences can be explored for probable molecular determinants underlying the functional differences between PtdIns3K complex I and II.Further, the structure in the monomeric form adopts collapsed, curved conformation of its coiled coil domain.This domain forms the interaction site of numerous proteins [61,63], thus, its flexibility might govern diverse functionality.Downstream to the PtdIns3K complex formation is recruitment of ATG9 vesicles whose subcellular distribution is maintained by ATG2-WIPI complex [83,84].The complex acts as a tether, is proposed to play an essential role in transferring lipids to phagophores and in mediating expansion of the phagophore membrane [52,53,85].The mechanistic insights are however enigmatic, probable reason being lack of the structural information of ATG2.High confidence prediction by AF2, followed by MD simulations, suggested that ATG2 forms an elongated rod consisting of repeating β strand segments with hydrophobic residues forming the inner cavity that runs along the length of ATG2A (Figure 3).Based on simulation, atomistic details of long cavity were computed that showed dynamic transitions of residues showing closing of the pore.Interestingly, the cavity length also decreased during the simulation from 21 to 18 nm, and we could attribute this change to RBGs stacking.These factors might regulate kinetics of lipid transfer, facilitating the flow of lipid molecules along the rod to allow expansion of the acceptor membrane [53,85].
Elongation.Thirdly, the expansion of the phagophore membrane is mediated by two UBL cascades involving ATG12 and the LC3 conjugation systems [70].While the structural information of the core proteins involved in the latter system are available, the former conjugation system involves ATG10, with no atomistic resolution structure available in humans.It functions as E2 ligases to transfer ATG12 from E1 enzyme, ATG7, and conjugate ATG12 to ATG5.The resulting complex interacts with ATG16L to localize on the forming autophagosome, regulating its elongation.We presented structural model of one of the intermediates in the pathway ATG7-ATG10 tetrameric complex using AF2-Mult (Figure 6).The predicted structure has ATG7 being arranged in bird topology and ATG10 interacting via multiple interfaces.The protein-protein interfaces predicted by AF2 harbor regions that may adopt multiple conformations.Towards this, we performed three molecular dynamics simulations that provide dynamical information on functionally relevant regions such as strength of interactions during the course of simulations.These are usually a good measure of crucial interactions.Three proposed salt bridges were experimentally validated using site directed mutagenesis and ITC (Figure 7, Figure S4).
Finally, the present study is available as a web source RAPSAP (http://rapsap.igib.res.in) that offers a user-friendly way to access the data that is based on integrated approach of combining AF2 predictions, MD simulations and in vitro experiments (Figure 8).
The present study provides mechanistic insights into the functioning of different proteins in core autophagy machinery utilizing combination of AF2 predicted models with MD simulations and invitro experiments.The immediate extension of the current work would be generating different multiprotein complexes and identifying therapeutic targets.
Limitations of AF2 predictions.Some of the autophagic proteins have low confidence score regions (RB1CC1) and have large positional error in domain placement.For instance, ULK's kinase and MIT domains, C terminal region in ATG2A, BATS domain in ATG14 have large expected positional error in their orientation with respect to other domains.The poor alignment of the domains could also be due to flexible nature of the domains.Another point of caution while using AF2 predicted structures is that unstructured regions predicted by AF2 should not be interpreted as representative conformation that describes the ensemble but rather just as a prediction of disorder.AF2 has been recently reported to overestimate the radius of gyration of IDRs [47].Thus, subsequent processing steps of the IDRs, as also proposed by Ruff et.al., are required in order to remove steric clashes and refine them based on experimentally derived restraints [47].Third, potential limitation is posttranslational modifications that are not captured by AF2, and they can lead to major conformational changes as they often act as conformational switches.Also, AF2 have been previously found to be less capable of accurately predicting the structural effects of missense mutations [43][44][45].

Conclusions
Autophagy is a cellular degradation pathway in response to starvation or other kinds of stress.While our understanding of proteins involved in autophagy has advanced significantly, there remains little data on how these proteins achieve specificity.Several autophagic protein complexes tether to lipids on autophagosomal membrane, making them difficult targets to characterize experimentally.The present work evaluated the application of AF2 predictions in autophagy.We found that AF2 holds a great potential in identifying structural elements of the autophagy interactome and it expands the structural space by ~ 47%.Our study presents a unique pipeline by integrating AF2 predicted models as starting points, molecular dynamics simulations to provide relevant states and, finally experimentally validating our key results.Additionally, AF2 multimer application combined with simulations also provides an excellent framework to understand protein-protein interactions and sets a precedent to explore more autophagic protein complexes.The current work is also available as a web resource (RAPSAP; http://rapsap.igib.res.in) and the data is made freely available to gain further insights into the functioning of core autophagic machinery.

Generation of structural autophagy interactome
A set of 38 human core autophagic proteins was created and their interacting partners were extracted from the recent study by Behrends et al. that have utilized STRING [86], BioGRID [87,88] and MINT [89,90] databases for data acquisition [51].The uncharacterized proteins were filtered out, summing to 416 proteins in the interactome.Mammalian autophagy interaction network was thereafter constructed using Cytoscape [91,92].The nodes represented the protein, and the interactions were denoted by edges of the network.The edges and the nodes boundary were colored on the basis of their interaction with core autophagic proteins placed in four key functional categories viz ULK complex, PtdIns3K complex, ATG9 complex and UBL conjugation system.For clarity in Figure 1A, all interactors were colored as per their interaction with core proteins in the functional subgroup, i.e. both primary and secondary interactors were merged.The analysis was performed on primary (direct) interactors.In addition, proteins were also classified as per the availability of the structural information.In total, 416 proteins formed autophagy interactome.Individual proteins were selected as a query and a BLAST search was performed against all the proteins deposited in the Protein Data Bank dated 10 January 2022 [93].Hits with e-value less than 1e −7 were considered significant, in accord with previous study [94].Residual coverage was identified for each of the query proteins, and its percentage with respect to total sequence length was calculated.In-house scripts were used for extracting and analyzing data.Matplotlib library of python was used for plotting [95].Finally, the structural information was mapped in terms of color, the nodes of the proteins with structural coverage < 90 are colored green while remaining proteins are depicted by gray color.

Structural prediction by AlphaFold2
AlphaFold database was used to download AF2 predicted models of the autophagic interactome [22,24].pLDTT score of each residue was extracted from the B-factor column of the protein coordinate file of the model.The analysis was performed on proteins with structural coverage less than 90.ChimeraX 1.3 was used to depict the representative predicted structures with mapped pLDDT scores [41,96].Residues with pLDDT ≥ 70 are considered to be of high confidence.At a protein level, high confidence structures are defined as those with 50% of the residues having pLDDT ≥ 70.For core autophagic proteins, secondary structure was computed using Stride and coil regions were considered as disordered [65,97].

Disorder prediction
IUPred2A python script with the recommended prediction type of "long" disorder option, was used to identify intrinsically disordered regions in proteins of the autophagic interactome [98,99].The threshold was based on recent study by Valencia and group, where all the residues with score greater than 0.5 were considered as disordered [94].

Molecular dynamics simulations.
(i) System Preparation: For ATG10 and ATG14, AF2 standalone application was utilized to generate 5 different models of the protein.The lowest ranked structure obtained was used as starting structure for MD simulations.The original model used at CASP14 with no ensembling specified (by adding -model_-preset=monomer) was used.For ATG2A, predicted structure by AlphaFold Database was used as the starting structure.AF2-Mult application was used to generate 5 models of ATG7-ATG10 tetramer complex (using -model_preset=multimer) [36].For all the runs, we have used all the genetic databases used at CASP 14 viz.UniRef90, MGnify, Uniclust30, BFD, PDB70, PDB, PDB seqres (only for AF2-Mult), and UniProt (only for AF2-Mult).This is specified using -db_preset=full_dbs.All templates until 2022-03-03 was used by passing -max_template_date = 2022-03-03 flag.The computational time required for running monomeric and multimeric proteins varied according to the sequence length of the protein.
A multimer run took ~ 126 CPU hours for generating 5 output models.The lowest ranked ATG7-ATG10 tetramer structural model predicted by AF2-Mult was utilized to generate the mutant ATG7-ATG10 models.In silico single point substitution for ATG7 residues, namely-D23K, R246D and K306E were generated using Dunbrack 2010 backbonedependent rotamer library in Chimera [100].(ii) Simulations set up: Molecular dynamics simulations of ATG2, ATG10, ATG14, WT ATG7-ATG10 tetrameric complex and mutant tetrameric structural models were performed using GROMACS 2020.6 [101,102].All the systems were performed under similar conditions.Protein of interest was placed in a cubic box, solvated using TIP3P water model [103], and neutralized using Na + or Cl − , as required.CHARMM36 all-atom force field (March 2019) was employed [104].The equations of motion were numerically integrated with a 2 fs time step, and the coordinates were saved at every 100 ps.Particle Mesh Ewald was used for computing long range electrostatics [105].The steepest descent approach for energy minimization was used followed by two step equilibrations using NVT followed by NPT ensemble.The production run was carried out in NPT ensemble using Nose-Hoover thermostat and Parrinello-Rahman barostat to maintain the temperature and pressure at 310 K and 1 bar, respectively [106].Three simulations for WT ATG7-ATG10 tetrameric complex corresponding to 550 ns each were carried out.While for the triple-mutant system, a single production run up to 550 ns was generated.For ATG2, ATG14 and ATG10, simulations of 1 μs each were carried out.(iii) Analysis: Root Mean Square Fluctuations were computed for alpha carbon atoms of a protein, averaged over entire trajectory, using gmx rmsf utility of GROMACS.gmx distance and gmx mindist were utilized to compute distance between two atoms and minimum distance between two residues, respectively.gmx rama was used for dihedral angle calculations.Molecular lipophilicity potential maps for ATG2 were created using ChimeraX [41,96].For calculating hydrogen bond occupancy, VMD [107] was utilized.VMD and ChimeraX were used for representing snapshots at different time points.
Cavity calculation.For ATG2 cavity computation and detection, CAVER v3.0 software [108] was employed with a 0.8 Å probe radius, 10 Å shell radius, and 4 Å shell depth.The average-link hierarchical algorithm was used for clustering and the clustering threshold was set to 4. The structured beta-sheet residues were used for the input structure for central cavity calculation after the loop residues were eliminated.Over the 1 µs trajectory, the structures were taken every 100 ns.The starting position for the CAVER calculation was set to employ the center of mass of residues of the C-terminal RBG (1663-1714).The tunnel clusters obtained were further filtered by tunnel length to reduce redundancy.Finally, the lining residues were defined within 5 Å of the tunnel.We also utilized mdpocket, which is based on fpocket version 4.0, to detect the cavity and pocket volume of ATG2 as a quality check.
To define the pocket, we used a threshold of 0.5 isovalue.
The cavity was calculated using a minimum alpha sphere radius of 3.5 Å and a maximum radius of 10.0 Å, with a requirement of at least three alpha spheres to be present in the pocket.The results obtained for pocket lining residues were consistent with caver output.Angle calculations.Stacking angle of different RBG subunits was calculated by taking center of mass of the two concerned RBGs as the two reference points and the center of mass of the interface beta strands was taken as the third point.VMD was used for such computation every 1 ns across the trajectory.

Circular dichroism (CD) spectroscopy
CD spectra were recorded using the Jasco 815 spectropolarimeter.Spectra of WT ATG7 NTD and mutant proteins (~20 µM) were collected in the CD buffer (the composition of the buffer was the same as binding buffer except NaCl was replaced by NaF as chloride ions absorb strongly below 195 nm) [109].The cuvette with a 0.1 cm path length was employed to record spectra from 190 nm to 250 nm and averaged over 10 scans, with a 10 s averaging time per wavelength.The experiment was performed in triplicate.

Isothermal calorimetry
Calorimetric titrations were carried out at 25°C using a Microcal PEAQ-ITC (MicroCal, LLC).Proteins were exchanged into a binding buffer by gel filtration.Over the course of 19 injections of 2 μL each, (His) 6 ATG10 protein at 300 μM were titrated into 300 μL of 20-30 μM ATG7 constructs.All the experiments were performed in three replicates.The protein was titrated into the buffer to check that heat of dilution was not altering our titration results and almost negligible heat was detected.The MicroCal PEAQ-ITC Analysis software was used to fit integrated heat data from titration using the fitted offset option, which automatically subtracts the control heat from the titration.The isotherm was fitted with one set of site binding model.

Figure 1 .
Figure1.AF2 enhances the structural catalog of autophagic proteins.(A) Autophagy interaction network highlighting the availability of the structural information of four functional autophagic groups color coded by node border; ULK complex (red), PtdIns3K complex (light green), ATG9 complex (yellow) and UBL conjugation (black).The edges depict the interactions between proteins according to the functional group as above.Proteins with less than 90% of the structural coverage are shown by green colored nodes while the remaining proteins with existing structural data are shown by gray color nodes.(B) Autophagy interaction network mapped with AF2 predicted pLDDT scores based on residue-wise calculations is shown in the form of pie chart for each node.The gray color represents availability of structural information.The rest of the nodes were colored with pLDDT ≥ 70 (high confidence) and pLDDT < 70 (low confidence), in blue and orange, respectively.For clarity, core proteins are kept larger in size compared to other interactors.

Figure 2 .
Figure 2. AF2's prediction provides additional structural coverage of core autophagic proteins.(A) Violin plot representing the distribution of residues structurally covered in each core autophagic protein (proteins with structural coverage ≥ 90% are denoted by gray dots and remaining proteins are shown by green dots).Dark gray dots represent the protein with structural information available in humans.Dotted line divides the distribution into proteins with high structural coverage of ≥ 90% and low structural coverage of < 90%.(B) The top panel shows a stacked bar plot representing residues with structural information available in the PDB database (black); residues with AF2 predicted structured domains and residues with AF2 predicted disordered segments according to IUPred2 scores are colored in light and dark gray, respectively.The bottom panel shows residue-wise distribution of pLDDT scores.Proteins are divided into 3 categories namely models with AF2 predictions of full-length proteins (in dark blue box), models with AF2 predictions of functional domains (in light blue box) and AF2 predictions of disordered/ truncated segments of proteins (in orange box).

Figure 3 .
Figure 3. Structure and dynamics of ATG2A reveal key structural elements related to its function.(A) AF2 predicted model of ATG2A is depicted in cartoon representation.The residues are colored based on their pLDDT scores using ChimeraX.(B) Snapshots of ATG2A at different time points during the course of MD simulations, depicting cavity dynamics calculated using CAVER.(C) Representation of the long cavity into four small cavities comprising of two "Repeating β-groove" (RBG) units each, with cavity 1 composed of RBG1 and so on and so forth.(D) Time evolution of four distances between residue pairs lining the cavities that are significantly changing during the course of the simulations.(E) Snapshots of cavity lining residues in each small cavity (cavity 1-4) at different time points are depicted in iceblue color with vdw representation.The residue pairs are mapped with same color as in panel (D).

Figure 4 .
Figure 4. Structural and dynamical analysis of monomeric ATG10 and ATG14.(A) AF2 predicted structure of human ATG10 with mapped pLDDT scores in cartoon representation using ChimeraX.(B) Structural snapshots of human AF2 predicted model of ATG10, Sc Yeast Atg10 (pdb 4ebr) and Km Yeast Atg10 (pdb 2lpu).Particularly, human ATG10 is comprised of three longer α helices (residues 125-142 depicted in pink color, 165-181 in purple color, and 188-218 in sea green color), a longer FR region (53-90) colored burly wood and absence of one of the β strand (91-97 of Sc yeast) shown in iceblue color.Catalytic cysteines are shown in sphere representation with orange red color.(C) Snapshot of ATG10 at 1 μs of MD simulations with mapped root mean square fluctuations of Cα atoms averaged over the entire trajectory.(D) Time evolution of dihedral angles (phi and psi) of catalytic Cys 166 depicting transitions from characteristic alpha helical structures to loop conformational propensity.(E) The ensemble of superimposed ATG10 structures from MD simulations with Cys 166 (orange red) in the loop region

Figure 5 .
Figure 5. AF2 predicts small structural regions in RB1CC1, AMBRA1, and ULK4 with high confidence.(A) Domain architecture of RB1CC1 (top).The N terminal domain shows similarity with Atg17 in yeast (PDB 4HPQ [64]) and TBK1 (PDB 6NT9 [65]).Fitting of Atg17, TBK1 and predicted N terminal of RB1CC1 into the cryo-EM (EMD-21325 [55]) (bottom panel).AF2 predicted RB1CC1 N terminal shows better fitting compared to Atg17 and TBK1.(B) Domain architecture of AMBRA1 (top).AF2 predicts high pLDDT scores to the unknown structural region between 858-1039.This structural motif folds together with three WD40 units from the N-terminus, constituting another repeating unit of WD40.The structure of predicted WD40 is shown at the bottom.(C) Domain architecture of ULK4 (top).Heat repeats of ULK4 are predicted at high confidence by AF2.Structure of ULK4 heat repeats is shown at the bottom.Gray color represent domain with experimental structural coverage, blue color depicts domains structurally covered with high accuracy by AF2, and pink color represents low accuracy predictions.All the predicted structures are colored according to pLDDT scores.

Figure 6 .
Figure 6.Structural and dynamical analysis of human ATG7-ATG10 tetramer model.(A) Predicted structure of ATG7-ATG10 tetramer complex by AF2 multimer in cartoon representation.The residues are colored based on their pLDDT scores using ChimeraX.(B) AF2 multimer structure colored on the basis of domains.(C) Comparative buried interface between ATG7-ATG10 tetramer complex in yeast and human being shown in surface representation.(D) Snapshots representing different dynamic regions in the tetrameric complex at different points in simulations superimposed, viz NTD β hairpin (269-271), linker, ECTD and FR region of ATG10 from left to right.(E) Distance between catalytic Cys 572 in ATG7 and Cys 166 in ATG10 in -trans form.Snapshots at the beginning and end of simulations are depicted in the inset.

Figure 7 .
Figure 7. Key salt-bridges stabilize the ATG7-ATG10 interaction.(A) The zoomed-in view of the snapshots at 500 ns of three-key salt-bridge interactions between ATG7-ATG10 in both WT (blue) and mutant (purple), where interacting residues are shown in ball-stick representation.(B) Minimum distance between residues forming the salt bridge between ATG7 and ATG10 in WT and mutant (D23K, R246D, D306E) simulations of the complex.Sim1, Sim2 and Sim3 in the legend represents three independent MD simulations of WT complex.Left to right panels depict distances between residues 23-95, 246-80, 306-83.(C) Isothermal titration calorimetry (ITC) data for ATG10 binding to WT ATG7 NTD (left panel) and mutant ATG7 NTD (right panel).Upper panels show raw data; lower panels depict integrated heat values.Calorimetric titration for ATG10 (300 μM) titrated into WT ATG7 NTD (20 μM) is shown and yielded an endothermic binding isotherm.Derived values for Kd and stoichiometry (n) are shown.For mutant ATG7 NTD, values for Kd and stoichiometry (n) were not determined because of weak binding.

Figure 8 .
Figure 8. Summary of the approach used for generating the current resource termed as RAPSAP (Resource of AlphaFold2 Predicted Structures of Autophagy Pathway).The resource can be made accessible via http://rapsap.igib.res.in.