BagPype: A Python package for the construction of atomistic, energy-weighted graphs from biomolecular structures

preprint
posted on 2021-02-16, 16:07 authored by Florian Song, Mauricio Barahona, Sophia N. Yaliraki
Atomistic, energy-weighted graphs of biomolecular structures allow for versatile and efficient modelling of their properties whilst keeping physico-chemical detail. Starting only with a priori knowledge of the spatial arrangement of individual atoms obtained from structural files available at the Protein Data Bank (PDB), we present a multi-step pipeline leading to an atomistic energy-weighted graph with individual atoms as nodes and chemical interactions as edges.
Whilst most graph approaches only consider strong interactions and typically only at the residue level, an advantage of our methodology lies in the inclusion of weaker interactions, such as hydrogen bonds, electrostatics, hydrophobic interactions and π-π stacking interactions in DNA. The latter enable the study of nucleic acids and their complexes with proteins. In addition, we provide an implementation of the framework in the Python programming language, which is made available under the GNU General Public License v3.0 at https://github.com/yalirakilab/BagPype. The computational efficiency of the programme is shown by obtaining wall-clock timing data for over 50,000 experimentally obtained structures spanning most of the PDB. We find that our implementation scales as a slow-growing second-order polynomial, where even the largest structures consisting of more than 60,000 residues can be processed in only a few minutes on a standard desktop computer. Finally, a case study of the well-studied lac operon repressor protein-DNA complex, comprising 10,937 atoms, showcases aspects of the methodology using a dynamics-based graph clustering technique, which has been previously applied successfully to elucidate protein rigidity and multi-scale organisation. The graphs obtained by the approach presented here can be combined with any method that uses graph-theoretic or network-scientific information.
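To make the idea of an atomistic, energy-weighted graph concrete, the following is a minimal sketch in plain Python. It does not use BagPype or reproduce its pipeline: the coordinates are invented, and the single distance cutoff and uniform edge energy (`BOND_CUTOFF`, `BOND_ENERGY`) are placeholder assumptions chosen only to show atoms as nodes and interactions as weighted edges.

```python
import math
from itertools import combinations

# Hypothetical toy input: (atom name, element, x, y, z) in angstroms.
# These coordinates are made up for illustration, not taken from a PDB entry.
ATOMS = [
    ("N", "N", 0.00, 0.00, 0.00),
    ("CA", "C", 1.46, 0.00, 0.00),
    ("C", "C", 2.00, 1.42, 0.00),
    ("O", "O", 1.30, 2.43, 0.00),
]

# Assumed placeholder values, not parameters from the preprint.
BOND_CUTOFF = 1.7  # angstroms
BOND_ENERGY = 1.0  # arbitrary energy units


def build_graph(atoms, cutoff=BOND_CUTOFF, energy=BOND_ENERGY):
    """Return {(i, j): weight} for every atom pair within the cutoff."""
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        # Euclidean distance between the two coordinate triples.
        distance = math.dist(a[2:], b[2:])
        if distance <= cutoff:
            edges[(i, j)] = energy
    return edges


graph = build_graph(ATOMS)
for (i, j), weight in sorted(graph.items()):
    print(ATOMS[i][0], ATOMS[j][0], weight)
```

A realistic pipeline would assign a different energy to each interaction type (covalent bonds, hydrogen bonds, electrostatics, hydrophobic and stacking interactions) rather than one uniform weight; the dictionary of weighted edges shown here can be passed to any graph library of choice.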

Funding

EPSRC Centre for Doctoral Training in Physical Sciences Innovation in Chemical Biology for Bioindustry and Healthcare

Engineering and Physical Sciences Research Council

EPSRC Centre for Mathematics of Precision Healthcare

Engineering and Physical Sciences Research Council
