software_and_datasets.ap (146.46 MB)

A comparison of reduced coordinate sets for describing protein structure

Posted on 2013-09-13, 09:43, authored by Konrad Hinsen, Shuangwei Hu, Gerald R. Kneller, Antti J. Niemi

This file contains supplementary material for the following publication:

Title: A comparison of reduced coordinate sets for describing protein structure
Authors: Konrad Hinsen, Shuangwei Hu, Gerald R. Kneller, Antti J. Niemi
Journal: Journal of Chemical Physics 139, 124115 (2013)

DOI: 10.1063/1.4821598


It contains the software implementing the computations described in the article, the input dataset, the output datasets, and the figures. A detailed list is given below.


Instructions for use

The file software_and_datasets.ap is an HDF5 file that can be read with any HDF5-compatible software, including the free HDFView package. HDFView can be used to inspect the tables contained in this file. Reading the protein structures requires software that understands the Mosaic data model.
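As an alternative to a graphical viewer, the contents of an HDF5 file can also be listed from Python. The following is a minimal sketch, not part of the supplementary material: the function names walk and list_hdf5_paths are hypothetical, and the second function assumes that the third-party h5py package is installed. It relies only on the fact that HDF5 groups, like dictionaries, expose their members through items().

```python
def walk(node, prefix=""):
    """Yield (path, leaf) pairs for every leaf below a group-like node.

    Anything with an .items() method is treated as a group. This holds for
    h5py.File and h5py.Group objects and also for plain dicts.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "items"):
            # A group: descend into it.
            yield from walk(child, path)
        else:
            # A leaf (an HDF5 dataset, or any non-mapping value in a dict).
            yield path, child


def list_hdf5_paths(filename):
    """Return the paths of all datasets in an HDF5 file (assumes h5py)."""
    import h5py  # third-party; imported lazily so walk() works without it

    with h5py.File(filename, "r") as f:
        return [path for path, _ in walk(f)]
```

Calling list_hdf5_paths("software_and_datasets.ap") should return the dataset paths in the file. Interpreting the protein structures stored there still requires Mosaic-aware software, as stated above.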

The file software_and_datasets.ap has been prepared using the ActivePapers framework for reproducible research. It contains all the software used in the study that is described in the article, from the download of the protein structures to the generation of the figures. All software is written in the Python language. The ActivePapers framework keeps track of which data was generated using which script and also notes the user name, machine name, and versions of all software packages used when running each script.

Readers wishing to modify and re-run the scripts, or to run them on different input data, should download and install the ActivePapers framework.


Datasets contained in the ActivePaper file


The original protein structures imported from the PDB. Only the backbone atoms relevant for our study (N, CA, C) are stored. There is one subgroup for each PDB code, and each subgroup contains a Mosaic universe and a Mosaic configuration.

For more information about Mosaic, see


The distributions of the internal coordinates of the backbone, computed over all residues of all protein structures. For each coordinate, there is an average value and a variance, and a histogram of the values.
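The three quantities stored for each coordinate can be illustrated with a short sketch. This is not the script contained in the file: the function name summarize, the bin count, and the bin range are hypothetical, and the population variance is used by assumption. Angular coordinates that wrap around would need circular statistics, which this sketch does not attempt.

```python
from statistics import fmean, pvariance


def summarize(values, n_bins, lower, upper):
    """Return (mean, variance, histogram counts) for a list of floats.

    The histogram has n_bins equal-width bins covering [lower, upper];
    values outside that range are not counted.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lower <= v <= upper:
            # Clamp so that v == upper lands in the last bin.
            index = min(int((v - lower) / width), n_bins - 1)
            counts[index] += 1
    return fmean(values), pvariance(values), counts
```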


The protein backbone configurations reconstructed from various reduced coordinate sets.


Tables containing for each protein the number of residues, the root-mean-square distances from each reconstruction to the initial structure, and the radii of gyration for the initial structure and all reconstructions.
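For readers unfamiliar with the two measures in these tables, the sketch below shows one common way to compute them from lists of (x, y, z) positions. It is an illustration under stated assumptions, not the code in the file: all atoms are weighted equally, no superposition is applied before computing the root-mean-square distance, and the function names are hypothetical. Whether the published values use mass weighting or a prior superposition is not stated in this description.

```python
import math


def rmsd(conf_a, conf_b):
    """Root-mean-square distance between two equally long lists of (x, y, z)."""
    if len(conf_a) != len(conf_b):
        raise ValueError("configurations must have the same number of atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(conf_a, conf_b))
    return math.sqrt(total / len(conf_a))


def radius_of_gyration(conf):
    """Unweighted radius of gyration: RMS distance from the geometric center."""
    n = len(conf)
    center = tuple(sum(p[i] for p in conf) / n for i in range(3))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in conf) / n)
```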


The fits of the asymptotic large-N behavior of the radii of gyration for the initial configurations and all reconstructions.
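The functional form of these fits is not given in this description. A common choice for the large-N behavior of a radius of gyration is a power law, Rg ≈ r0 · N^nu, which becomes a straight line in log-log space. The sketch below fits that form by ordinary least squares. The power-law model, the function name fit_power_law, and the symbols r0 and nu are all assumptions made for illustration, not the article's method.

```python
import math


def fit_power_law(n_residues, radii):
    """Least-squares fit of log(Rg) = log(r0) + nu * log(N); returns (r0, nu)."""
    xs = [math.log(n) for n in n_residues]
    ys = [math.log(r) for r in radii]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope and intercept of the regression line in log-log space.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    nu = sxy / sxx
    r0 = math.exp(y_mean - nu * x_mean)
    return r0, nu
```

Restricting the input to the longest chains is one way to target the asymptotic regime. The cutoff to use is a modeling decision that this sketch leaves open.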


The PDB files for the initial configuration, the reconstruction from virtual-CA coordinates, and the reconstruction from phi-psi coordinates.


The histograms of the internal coordinate values.


The plot of the root-mean-square distances between the reconstructions and the initial configurations.


The plots of the radii of gyration for each reconstruction, with the asymptotic fits.


