A comparison of reduced coordinate sets for describing protein structure

This file contains supplementary material for the following publication:

Title: A comparison of reduced coordinate sets for describing protein structure
Authors: Konrad Hinsen, Shuangwei Hu, Gerald R. Kneller, Antti J. Niemi
Journal: Journal of Chemical Physics 139, 124115 (2013)

DOI: 10.1063/1.4821598

 

It contains the software implementing the computations described in the article, the input dataset, the output datasets, and the figures. A detailed list is given below.

 

Instructions for use

The file software_and_datasets.ap is an HDF5 file that can be read with any HDF5-compatible software, including the free HDFView package (http://www.hdfgroup.org/hdf-java-html/hdfview/). HDFView can be used to inspect the tables contained in this file. Reading the protein structures requires software that understands the Mosaic data model (http://bitbucket.org/molsim/mosaic/).
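For readers who prefer a script to HDFView, here is a minimal sketch that lists every group and dataset in the file. It assumes the h5py Python package, which is one HDF5 library among several. The function name list_hdf5_contents is hypothetical. The sketch only reads the HDF5 layer and does not interpret the Mosaic data stored inside.

```python
import h5py


def list_hdf5_contents(path):
    """Return (name, kind, shape) tuples for the groups and datasets in an HDF5 file.

    A rough command-line substitute for browsing the file in HDFView
    (hypothetical helper, not part of the ActivePapers software).
    """
    entries = []

    def visitor(name, obj):
        # visititems calls this once for every object below the root group
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape))
        elif isinstance(obj, h5py.Group):
            entries.append((name, "group", None))

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return entries
```

For example, list_hdf5_contents("software_and_datasets.ap") would return entries such as ("data/number_of_residues", "dataset", ...), assuming the file is in the current directory.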

The file software_and_datasets.ap has been prepared using the ActivePapers framework (http://bitbucket.org/khinsen/active_papers_py) for reproducible research. It contains all the software used in the study described in the article, from the download of the protein structures to the generation of the figures. All software is written in Python. For each script run, the ActivePapers framework records which data was generated by which script, together with the user name, the machine name, and the versions of all software packages used.
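To make this bookkeeping concrete, the sketch below collects the kind of information listed above for a single run: script, inputs, outputs, user name, machine name, and package versions. It is a hypothetical illustration that uses only the Python standard library. It is not the ActivePapers API, and both the record layout and the name provenance_record are assumptions.

```python
import getpass
import platform
from importlib import metadata


def _current_user():
    # getpass.getuser() can fail in minimal environments (e.g. containers)
    try:
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        return "unknown"


def provenance_record(script, inputs, outputs, packages=()):
    """Build a provenance record for one script run (hypothetical format)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed here
    return {
        "script": script,
        "inputs": list(inputs),
        "outputs": list(outputs),
        "user": _current_user(),
        "machine": platform.node(),
        "versions": versions,
    }
```

A call such as provenance_record("/code/compute_rmsd", ["/data/reconstructions"], ["/data/root_mean_square_distances"], packages=["numpy"]) would describe one run. The script path here is hypothetical.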

Readers wishing to modify and re-run the scripts, or to run them on different input data, should download and install the ActivePapers framework.

 

Datasets contained in the ActivePaper file

/data/pdb_structures

The original protein structures imported from the PDB. Only the backbone atoms relevant for our study (N, CA, C) are stored. There is one subgroup for each PDB code, and each subgroup contains a Mosaic universe and a Mosaic configuration.
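The stored atoms correspond to the backbone ATOM records of a PDB entry. The study itself imported the structures into Mosaic. As a separate illustration, the N, CA and C atoms can be picked out of a PDB-format file using the fixed column layout of ATOM records. The function below is hypothetical. It keeps only blank or "A" alternate-location indicators, which is one simple convention among several.

```python
def backbone_atoms(pdb_lines, atom_names=("N", "CA", "C")):
    """Extract backbone atoms from PDB-format ATOM records.

    Returns a list of (chain, residue_number, atom_name, (x, y, z)).
    Column positions follow the fixed-width PDB ATOM record layout.
    """
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, header and other records
        name = line[12:16].strip()       # columns 13-16: atom name
        if name not in atom_names:
            continue
        if line[16] not in (" ", "A"):   # column 17: alternate location
            continue
        chain = line[21]                 # column 22: chain identifier
        resseq = int(line[22:26])        # columns 23-26: residue number
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))  # columns 31-54
        selected.append((chain, resseq, name, (x, y, z)))
    return selected
```

Typical use would be backbone_atoms(open("2OVU.pdb")), assuming such a file has been downloaded locally.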

For more information about Mosaic, see https://bitbucket.org/molsim/mosaic/.

/data/coordinate_analysis/
  averages
  variances
  histograms

The distributions of the backbone internal coordinates, computed over all residues of all protein structures. For each coordinate, there is an average value, a variance, and a histogram of the values.
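The backbone internal coordinates are the bond lengths, bond angles and dihedral angles along the N, CA, C chain. Here is a minimal NumPy sketch that computes them and summarizes one coordinate by its average, variance and histogram. It assumes a single unbroken chain stored as an (n_atoms, 3) array in N, CA, C, N, ... order. The function names and the default binning are assumptions, not taken from the study's scripts.

```python
import numpy as np


def bond_lengths(x):
    """Distances between consecutive atoms; x has shape (n_atoms, 3)."""
    return np.linalg.norm(x[1:] - x[:-1], axis=1)


def bond_angles(x):
    """Angle (radians) at each interior atom between its two chain neighbours."""
    u = x[:-2] - x[1:-1]
    v = x[2:] - x[1:-1]
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding


def dihedral_angles(x):
    """Dihedral angle (radians, in [-pi, pi]) for each run of four consecutive atoms."""
    b0 = x[1:-2] - x[:-3]
    b1 = x[2:-1] - x[1:-2]
    b2 = x[3:] - x[2:-1]
    n1 = np.cross(b0, b1)  # normal of the plane through the first three atoms
    n2 = np.cross(b1, b2)  # normal of the plane through the last three atoms
    y = np.linalg.norm(b1, axis=1) * np.sum(b0 * n2, axis=1)
    return np.arctan2(y, np.sum(n1 * n2, axis=1))


def summarize(values, bins=50, value_range=None):
    """Average, population variance (ddof=0) and histogram (counts, edges)."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    return values.mean(), values.var(), counts, edges
```

Because the atoms repeat with period three, each coordinate type is a strided slice. For example, bond_lengths(x)[0::3] gives the N-CA distances. dihedral_angles(x)[0::3], [1::3] and [2::3] give the psi, omega and phi angles, each missing one terminal residue.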

/data/reconstructions

The protein backbone configurations reconstructed from various reduced coordinate sets.
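One common way to turn internal coordinates back into Cartesian positions is the natural extension reference frame (NeRF) construction, which places each atom using its three predecessors. The sketch below is only an illustration, and the article's own reconstruction procedure may differ. It assumes that a reduced coordinate set has first been completed so that every bond length, angle and dihedral is available. One way to do this would be to fill the omitted coordinates with their averages from /data/coordinate_analysis. The function names are hypothetical, and dihedrals follow the convention in which a trans arrangement is pi.

```python
import numpy as np


def place_atom(a, b, c, length, angle, dihedral):
    """Position of atom d given atoms a, b, c and the internal coordinates
    |d - c| = length, angle(b, c, d) = angle, dihedral(a, b, c, d) = dihedral."""
    bc = c - b
    bc = bc / np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)  # normal of the plane through a, b, c
    frame = np.column_stack((bc, np.cross(n, bc), n))  # orthonormal local frame
    local = length * np.array([
        -np.cos(angle),
        np.sin(angle) * np.cos(dihedral),
        np.sin(angle) * np.sin(dihedral),
    ])
    return c + frame @ local


def rebuild_chain(lengths, angles, dihedrals):
    """Cartesian coordinates of a chain of n atoms from its n-1 bond lengths,
    n-2 bond angles and n-3 dihedral angles (radians); requires n >= 3."""
    n_atoms = len(lengths) + 1
    x = np.zeros((n_atoms, 3))
    # The first three atoms fix the overall position and orientation,
    # which internal coordinates cannot determine.
    x[1] = [lengths[0], 0.0, 0.0]
    x[2] = x[1] + lengths[1] * np.array(
        [-np.cos(angles[0]), np.sin(angles[0]), 0.0])
    for i in range(3, n_atoms):
        x[i] = place_atom(x[i - 3], x[i - 2], x[i - 1],
                          lengths[i - 1], angles[i - 2], dihedrals[i - 3])
    return x
```

The construction breaks down if three consecutive atoms are collinear (a bond angle of pi), because the plane normal is then undefined. A reconstruction obtained this way is defined only up to a rigid-body motion, so it must be superposed on the original before distances are compared.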

/data/number_of_residues
/data/root_mean_square_distances
/data/radii_of_gyration

Tables containing, for each protein, the number of residues, the root-mean-square distances between each reconstruction and the initial structure, and the radii of gyration of the initial structure and of all reconstructions.
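Both quantities in these tables can be computed from Cartesian coordinates, as in the minimal NumPy sketch below. It rests on two assumptions, either of which may differ from the study's scripts. First, the RMSD is taken after optimal rigid-body superposition, using the Kabsch method. Second, the radius of gyration is unweighted, so all atoms count equally. The function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(x):
    """Unweighted radius of gyration of a set of points (shape (n, 3))."""
    centered = x - x.mean(axis=0)
    return np.sqrt(np.mean(np.sum(centered ** 2, axis=1)))


def superposed_rmsd(x, y):
    """RMSD between x and y after optimal rotation and translation (Kabsch)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    h = xc.T @ yc                      # 3x3 correlation matrix
    u, s, vt = np.linalg.svd(h)        # singular values in descending order
    d = np.sign(np.linalg.det(u @ vt)) # -1 would mean a reflection; exclude it
    e = np.sum(xc ** 2) + np.sum(yc ** 2) - 2.0 * (s[0] + s[1] + d * s[2])
    return np.sqrt(max(e, 0.0) / len(x))  # clamp tiny negative rounding errors
```

The RMSD is obtained from the singular values directly, so the optimal rotation never has to be applied to the coordinates.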

/data/rg_analysis

The fits of the asymptotic large-N behavior of the radii of gyration for the initial configurations and all reconstructions.
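The functional form of these fits is not stated here. Assuming a power law Rg ≈ a * N**nu, a least-squares straight line through log Rg versus log N is a simple way to estimate a and nu. In this sketch, a hypothetical cutoff n_min restricts the fit to large proteins. Both the power-law form and the function name fit_power_law are assumptions.

```python
import numpy as np


def fit_power_law(n_residues, rg, n_min=0):
    """Fit rg ~ a * N**nu by linear least squares on log(rg) versus log(N).

    Only proteins with N >= n_min enter the fit, as a crude way of
    focusing on the large-N regime. Returns (a, nu).
    """
    n = np.asarray(n_residues, dtype=float)
    r = np.asarray(rg, dtype=float)
    mask = n >= n_min
    # polyfit returns coefficients from the highest degree down: slope, intercept
    nu, log_a = np.polyfit(np.log(n[mask]), np.log(r[mask]), 1)
    return np.exp(log_a), nu
```

Fitting in log space weights relative rather than absolute deviations, which suits data spanning a wide range of protein sizes.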

/documentation/
  2OVU_initial.pdb
  2OVU_ca.pdb
  2OVU_phipsi.pdb

The PDB files for protein 2OVU: the initial configuration, the reconstruction from virtual-CA coordinates, and the reconstruction from phi-psi coordinates.

/documentation/
  histograms_distance.pdf
  histograms_angle.pdf
  histograms_dihedral.pdf

The histograms of the internal coordinate values.

/documentation/rmsd.pdf

The plot of the root-mean-square distances (RMSD) between the reconstructions and the initial configurations.

/documentation/rg.pdf

The plots of the radii of gyration for each reconstruction, with the asymptotic fits.