Hydrophobic cluster database
Class a: All alpha proteins
Class b: All beta proteins
Class c: Alpha and beta proteins (a/b)
Class d: Alpha and beta proteins (a+b)
Class e: Multi-domain proteins (alpha and beta)
Class f: Membrane and cell surface proteins and peptides
Class g: Small proteins
Class h: Coiled coil proteins
Class i: Low resolution protein structures
Class j: Peptides
Class k: Designed proteins
SCOP folds with >5 compatible HCA patterns
Synopsis
Gaboriaud et al. have shown that sequences with very similar distribution patterns of the hydrophobic residues V, I, L, F, Y, W and M (detected in a two-dimensional helical representation of the protein sequence) are most often structural homologs, even when the overall sequence identity is as low as 7%. This representation is obtained by writing the protein sequence on a classical alpha-helix (3.6 amino acids per turn) smoothed on a cylinder. Five turns span 18 residues, so residues i and i+18 occupy similar positions along a line parallel to the axis of the cylinder. To make this 3D representation easier to handle, the cylinder is then cut parallel to its axis and unrolled. Because unrolling the cylinder widely separates some amino acids that are adjacent in the sequence, the representation is duplicated, which makes the sequence easier to follow and gives a better impression of the environment of each amino acid.
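The geometry of this helical net can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name unrolled_net, the coordinate convention (angle in degrees, height in turns) and the side-by-side placement of the duplicate are assumptions made for illustration. Only the 3.6 residues per turn and the hydrophobic set V, I, L, F, Y, W, M come from the text.

```python
# Sketch of the unrolled, duplicated helical net described above.
# Assumptions (not from the source): names, units and layout of the duplicate.

RESIDUES_PER_TURN = 3.6          # classical alpha-helix, as stated in the text
DEGREES_PER_RESIDUE = 100        # 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("VILFYWM")     # hydrophobic set named in the text


def unrolled_net(sequence):
    """Return a list of (angle_deg, height_turns, residue, is_hydrophobic).

    Each residue appears twice: once in the unrolled net (angle 0-359)
    and once in its duplicate, shifted by 360 degrees.
    """
    points = []
    for i, aa in enumerate(sequence.upper()):
        angle = (i * DEGREES_PER_RESIDUE) % 360   # position around the cylinder
        height = i / RESIDUES_PER_TURN            # position along the axis, in turns
        for copy in (0, 1):                       # original net and its duplicate
            points.append((angle + 360 * copy, height, aa, aa in HYDROPHOBIC))
    return points


def same_column(i, j):
    """True if residues i and j share the same angular position on the cylinder."""
    return (i * DEGREES_PER_RESIDUE) % 360 == (j * DEGREES_PER_RESIDUE) % 360
```

With this convention, same_column(0, 18) holds because 18 residues correspond to exactly five turns, which is the i and i+18 relationship mentioned above.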

Clusters of these hydrophobic amino acids are good markers of regular secondary structures and have been extensively used in the detection of similar folds or similar motifs between sequences showing very limited sequence relatedness (reviewed by Callebaut et al.).
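As a rough illustration of how hydrophobic clusters can be delineated, the sketch below encodes a sequence as a binary string (1 for V, I, L, F, Y, W, M, 0 otherwise) and splits it into clusters. The splitting rule is an assumption taken from the general HCA convention rather than from this page: a run of at least four non-hydrophobic residues, or a proline, separates two clusters. The function names and the sequential (one-dimensional) simplification of the two-dimensional net are likewise hypothetical.

```python
# Simplified, one-dimensional sketch of hydrophobic cluster delineation.
# Assumptions (not from the source): MIN_GAP = 4, proline as a cluster
# breaker, and all function names.

HYDROPHOBIC = set("VILFYWM")   # hydrophobic set named in the text
MIN_GAP = 4                    # assumed minimum run of non-hydrophobic residues between clusters


def binary_code(sequence):
    """Encode a sequence as '1' (hydrophobic) / '0' (other)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence.upper())


def hydrophobic_clusters(sequence, min_gap=MIN_GAP):
    """Return a list of (start_index, binary_pattern) for each cluster.

    Every pattern starts and ends with a hydrophobic residue.
    """
    seq = sequence.upper()
    code = binary_code(seq)
    clusters = []
    start = None    # index of the first hydrophobic residue of the open cluster
    last_h = None   # index of the most recent hydrophobic residue

    def close():
        clusters.append((start, code[start:last_h + 1]))

    for i, aa in enumerate(seq):
        if aa == "P":                       # assumed breaker: close any open cluster
            if start is not None:
                close()
                start = None
        elif aa in HYDROPHOBIC:
            if start is not None and i - last_h - 1 >= min_gap:
                close()                     # gap long enough: start a new cluster
                start = None
            if start is None:
                start = i
            last_h = i
    if start is not None:
        close()
    return clusters
```

For example, hydrophobic_clusters("AVLAIGGGGGWKF") yields two clusters, because the five glycines between I and W exceed the assumed gap of four.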
Using a new methodology, we have shown that although most protein structural folds, as defined in the SCOP classification of protein structure (release 1.65), are very homogeneous in hydrophobic cluster composition, a large number of the described folds are compatible with a wide variety of hydrophobic patterns.
We have gathered every distinct hydrophobic cluster pattern present in each fold of the SCOP database (release 1.65) in this HCA database. Every cluster is linked to the relevant PDB structure, for ease of retrieval and comparison. We hope that this information will be helpful in:
the design of synthetic proteins with structural homology to any given fold
the recognition of protein folding cores
the identification of suitable templates for homology modelling of very divergent sequences
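The organisation described above, in which each fold holds its distinct cluster patterns and each pattern points to the PDB structures it was found in, could be modelled along the following lines. This is only a sketch: the record layout (fold identifier, binary pattern, PDB identifier), the function names and the default threshold are assumptions, and the actual schema of the HCA database is not given in the text.

```python
# Hypothetical in-memory index: fold -> pattern -> set of PDB identifiers.
# The record layout and all names are assumptions for illustration only.
from collections import defaultdict


def index_patterns(records):
    """Build the index from an iterable of (fold_id, pattern, pdb_id) tuples."""
    index = defaultdict(lambda: defaultdict(set))
    for fold_id, pattern, pdb_id in records:
        index[fold_id][pattern].add(pdb_id)   # duplicates collapse in the set
    return index


def folds_with_many_patterns(index, threshold=5):
    """Return the sorted fold identifiers having more than `threshold` distinct patterns."""
    return sorted(fold for fold, patterns in index.items() if len(patterns) > threshold)
```

Calling folds_with_many_patterns(index) with the default threshold corresponds to a listing such as "SCOP folds with >5 compatible HCA patterns".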
© 2006 The HCA database authors