Here, we integrate residue-level and protein-level data for different structural regions: disordered regions, structured domains, and surface and buried residues.
These datasets support the analysis of evolutionary constraints within and across structural regions of yeast proteins. They were analyzed in a manuscript submitted to Frontiers in Molecular Biosciences entitled "Abundance imparts evolutionary constraints of similar magnitude on the buried, surface, and disordered regions of proteins".
The protein sequences were taken from the S. cerevisiae reference proteome from SGD, filtered to the 3,797 proteins with known orthologs in 14 fungal species, as defined in Wapinski et al., Nature (2007).
Two datasets were used for our analysis:
- a protein-centric dataset (PROTEIN-EVORATE.tsv)
- a residue-centric dataset (RESIDUE-EVORATE.tsv)
Both datasets are provided as plain data tables in tab-separated values (TSV) format.
The residue-level dataset comprises 42 columns with features mapped onto 2,129,854 residues. The protein-level dataset summarizes 25 features related to evolutionary rate on different subsets of residues.
Each feature is described in a separate file, one per dataset, whose name is identical to the dataset's name preceded by "header-".
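As a quick sanity check after download, the dimensions stated above can be compared against the files themselves. The sketch below uses only the Python standard library and streams each file, so the large residue-level table is never held in memory. It assumes that the first line of each TSV file is a header row, that the residue-level file has one row per residue, and that both files sit in the current directory. None of these points is stated in this description, and the helper name tsv_shape is hypothetical.

```python
import csv
from pathlib import Path


def tsv_shape(path, has_header=True):
    """Return (n_columns, n_data_rows) for a tab-separated file.

    The file is streamed line by line. The column count is taken
    from the first line only.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        first = next(reader, None)
        if first is None:  # empty file
            return 0, 0
        n_rows = sum(1 for _ in reader)
        if not has_header:
            n_rows += 1  # the first line was data, not a header
        return len(first), n_rows


if __name__ == "__main__":
    # File names are those listed above; their location is an assumption.
    for name in ("PROTEIN-EVORATE.tsv", "RESIDUE-EVORATE.tsv"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found in the current directory")
            continue
        n_cols, n_rows = tsv_shape(path)
        print(f"{name}: {n_cols} columns, {n_rows} data rows")
```

If the assumptions hold, RESIDUE-EVORATE.tsv should report 42 columns and 2,129,854 data rows. The protein-level file may report more columns than the 25 summarized features if it also carries identifier columns.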