PhD_ManuelCorpas_Manchester_2007.pdf (3.7 MB)

Folding Patterns in Protein Sequences

thesis
posted on 2014-03-18, 12:26 authored by Manuel Corpas

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences. School of Computer Science (2007).

 

ABSTRACT

In the wake of numerous fruitful genome projects, a growing number of protein sequences remain uncharacterised. Generally, computational techniques have not been developed to distinguish between regions that are conserved in proteins owing to evolutionary pressures to maintain structure and those that are conserved in order to preserve function. Interestingly, there is experimental evidence to suggest that some residues are conserved to maintain the protein fold, while others are conserved to maintain function at the cost of structural stability. Combining this observation with a fingerprint analysis of a representative dataset of globular proteins, here we explore folding signals affecting sequence conservation and, in particular, motifs. Initially, we aimed to determine whether folding signals are encoded in sequence motifs. It was found that those regions best conserved in evolutionary time (i.e., superfamily motifs) tend to concentrate the most favourable folding signals. A folding score, combining folding signals affecting sequence conservation, was created to distinguish structurally favourable motifs from those that are not. We found that integration of folding signals indeed added value over individual ones. Folding score troughs were observed to be highly conserved, while peaks had variable degrees of conservation. Coupled with the degree of conservation of residues, the folding score was used to delineate regions that are likely to contribute to (i) the stability of the fold (structural motifs), and (ii) the function of the protein (functional motifs). We present a few simple case studies to illustrate how the combined data can be used to pinpoint motifs with potential structural and functional roles. This method offers a means of automatic motif detection, which could be used for protein family characterisation and functional/structural annotation of evolutionarily conserved regions.
As folding information is contained in superfamily motifs, we explored whether folding signals can be used to enhance the diagnostic performance of distant similarity search. For that purpose, fingerprints were created according to motif boundaries imposed by the folding score. Automatically created motifs provided a new means to guide motif selection in fingerprints, whereas manually selected motifs had been chosen in an ad hoc manner, owing to being extracted from a highly conserved alignment. Similarly, fingerprinting using only automatically defined structural motifs had a greater number of distant matches than manually created fingerprints. Thus, this new approach to fingerprinting offers a complementary method to manual fingerprint creation and the ability to characterise conserved protein regions from a structural point of view.
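
The idea of coupling a per-residue folding score with residue conservation can be illustrated with a minimal sketch. The abstract does not define the folding score, its thresholds, or how troughs and peaks map onto structural and functional motifs, so everything below is an assumption made for illustration: the function name `label_conserved_regions`, the `cons_cutoff` and `min_len` parameters, the use of the profile mean to separate troughs from peaks, and the toy numbers. It is not the method used in the thesis.

```python
from typing import List, Tuple


def label_conserved_regions(
    folding: List[float],
    conservation: List[float],
    cons_cutoff: float = 0.8,  # hypothetical conservation threshold
    min_len: int = 3,          # hypothetical minimum region length
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) for each run of conserved residues.

    A run is labelled "trough" if its mean folding score lies below the
    mean of the whole profile, and "peak" otherwise. This rule is an
    illustrative assumption, not the thesis's definition.
    """
    if len(folding) != len(conservation):
        raise ValueError("profiles must have the same length")
    mean_fold = sum(folding) / len(folding) if folding else 0.0

    regions: List[Tuple[int, int, str]] = []
    start = None
    # The trailing sentinel is below any cutoff, so it closes a run
    # that extends to the end of the sequence.
    for i, c in enumerate(conservation + [float("-inf")]):
        if c >= cons_cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                avg = sum(folding[start:i]) / (i - start)
                label = "trough" if avg < mean_fold else "peak"
                regions.append((start, i, label))
            start = None
    return regions


# Toy profiles (invented values, one entry per residue).
folding = [0.9, 0.8, 0.1, 0.2, 0.1, 0.7, 0.9, 0.8, 0.9, 0.5]
conservation = [0.2, 0.3, 0.9, 0.95, 0.9, 0.1, 0.85, 0.9, 0.88, 0.2]
regions = label_conserved_regions(folding, conservation)
print(regions)
```

On these toy profiles the sketch reports two conserved regions, one sitting in a folding score trough and one on a peak. Deciding which kind of region corresponds to a structural or a functional motif is left to the analysis described in the thesis itself.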
