Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins using Frequent Subgraph Mining Method

Studying the interface region of a protein complex helps in understanding its dynamics and function. Existing approaches characterize a protein interface region by its residue composition, by the geometry of its interface residues, or by direct alignment of interface regions. Very few model interface regions as graphs. In this work, we build networks from the interface residues of a set of protein structures and then find the subgraphs that are frequent across those networks. To find such subgraphs, we use a scalable frequent subgraph mining algorithm that can mine frequent sub-network patterns of a specific size. From the mined subgraphs, we then discover the functional motif along the interface region of a given protein. In our experiments, we use PDB structures from two dimeric protein complexes: HIV-1 protease (329 structures) and triosephosphate isomerase (TIM) (86 structures). The proposed frequent-subgraph-based approach discovers the graphs representing the dimerization lock, which is formed at the base of the structure, in 323 of the 329 HIV-1 protease structures. Similarly, it discovers the dimerization lock formation in 50 of the 86 TIM structures. By representing the spatial positioning of the interfacial residues as graphs, our method captures the locking mechanism at the dimeric interface.
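The pipeline outlined in the abstract (interface residues to networks, then frequent fixed-size subgraphs) can be illustrated with a minimal Python sketch. It is not the scalable mining algorithm used in this work: it enumerates connected induced subgraphs of a given size by brute force, which is only practical for very small graphs. The function names, the one-point-per-residue input, the 6.0 distance cutoff, and the toy coordinates and labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations
from math import dist


def interface_graph(chain_a, chain_b, cutoff=6.0):
    """Return (labels, edges) for the interface of two chains.

    Each chain is a list of (label, (x, y, z)) tuples, one point per residue.
    The cutoff value is an illustrative assumption, not taken from the paper.
    """
    def near(residue, other_chain):
        return any(dist(residue[1], s[1]) <= cutoff for s in other_chain)

    # A residue is at the interface if it is near any residue of the other chain.
    residues = [r for r in chain_a if near(r, chain_b)]
    residues += [r for r in chain_b if near(r, chain_a)]
    labels = [label for label, _ in residues]
    # Connect every pair of interface residues that lie within the cutoff.
    edges = {(i, j) for i, j in combinations(range(len(residues)), 2)
             if dist(residues[i][1], residues[j][1]) <= cutoff}
    return labels, edges


def is_connected(nodes, edges):
    """Check whether the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


def canonical(labels, edges, nodes):
    """Smallest (labels, edges) encoding over all orderings of `nodes`."""
    best = None
    for perm in permutations(nodes):
        pos = {v: i for i, v in enumerate(perm)}
        key = (tuple(labels[v] for v in perm),
               tuple(sorted(tuple(sorted((pos[u], pos[v])))
                            for u, v in edges if u in pos and v in pos)))
        if best is None or key < best:
            best = key
    return best


def frequent_patterns(graphs, size=3, min_support=2):
    """Count, per pattern, the number of graphs containing it at least once."""
    support = Counter()
    for labels, edges in graphs:
        found = {canonical(labels, edges, nodes)
                 for nodes in combinations(range(len(labels)), size)
                 if is_connected(nodes, edges)}
        support.update(found)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy structures with made-up labels and coordinates (not PDB data).
s1 = ([("A:ILE", (0, 0, 0)), ("A:LEU", (3, 0, 0)), ("A:GLY", (50, 0, 0))],
      [("B:PHE", (0, 3, 0)), ("B:ALA", (60, 0, 0))])
s2 = ([("A:ILE", (1, 1, 1)), ("A:LEU", (4, 1, 1))], [("B:PHE", (1, 4, 1))])
s3 = ([("A:ILE", (0, 0, 0)), ("A:LEU", (5, 0, 0))],
      [("B:PHE", (-3, 0, 0)), ("B:GLY", (5, 4, 0))])

graphs = [interface_graph(a, b) for a, b in (s1, s2, s3)]
patterns = frequent_patterns(graphs, size=3, min_support=2)
for (pattern_labels, pattern_edges), count in patterns.items():
    print(pattern_labels, pattern_edges, "support:", count)
```

In this toy input, the fully connected three-residue pattern occurs in two of the three structures, so it is the only pattern that meets a minimum support of 2. Support is counted per structure rather than per occurrence, which mirrors how the abstract reports results as the number of structures in which a motif is found.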