Supporting info to DOI:10.1002/prot.21803

Version 2 2014-04-10, 15:28

Version 1 2014-04-10, 14:43

dataset

posted on 2014-04-10, 14:43 authored by Pedro Silva

Supporting info to Silva, P.J. (2007) "Assessing the reliability of sequence similarities detected through hydrophobic sequence analysis" Proteins: Structure, Function and Bioinformatics, 70, 1588-1594.

HCA_db.zip contains the complete HCA patterns database, which is also available at http://homepage.ufp.pt/pedros/HCA_db

HCA_analyze (source code + Win executable).
Syntax: HCA_analyze input_file
input_file is an alignment file in PIR format, generated by ClustalW with the appropriate matrix (hca.txt, also present in the distribution). The aligned sequences may be up to 2800 aa long. The program outputs a tab-separated, human-readable file ("results.txt") that can be easily imported into common spreadsheet software for further analysis. ClustalW can be downloaded from the European Bioinformatics Institute.
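Because of the 2800 aa limit, it can be useful to check an alignment before passing it to HCA_analyze. The following Python sketch is not part of the distribution. It assumes the usual PIR/NBRF layout (a ">P1;name" header, one description line, then sequence lines ending in "*"), and it conservatively counts gap characters towards the limit, since the description above does not say whether gaps are included. The names read_pir and too_long are hypothetical.

```python
MAX_LEN = 2800  # limit stated in the HCA_analyze description


def read_pir(text):
    """Parse PIR/NBRF text into a dict {name: aligned_sequence}.

    Assumed layout: '>P1;name', one description line (may be empty),
    then sequence lines, the last of which ends with '*'.
    """
    records = {}
    name = None
    skip_description = False
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Header such as '>P1;1abf_e': keep the part after the semicolon.
            name = line.split(";", 1)[-1].strip()
            skip_description = True
            chunks = []
        elif name is not None and skip_description:
            # The line right after the header is free-text description.
            skip_description = False
        elif name is not None:
            chunks.append(line.strip())
            if line.rstrip().endswith("*"):
                records[name] = "".join(chunks).rstrip("*")
                name = None
    return records


def too_long(records, limit=MAX_LEN):
    """Return the names of sequences whose aligned length (gaps included)
    exceeds the limit. Counting gaps is an assumption, chosen to be safe."""
    return [name for name, seq in records.items() if len(seq) > limit]
```

A typical use would be to read the alignment file, call read_pir on its contents, and run HCA_analyze only if too_long returns an empty list.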

HCA_analyze_multiple_aligns (source code + Win executable).
Syntax: HCA_analyze_multiple_align input_file results
input_file is an alignment file in PIR format, generated by ClustalW with the appropriate matrix (hca.txt, also present in the distribution). The program outputs a tab-separated, human-readable file (results) that can be easily imported into common spreadsheet software for further analysis. Two further output files are created: "Distinct_HCA_patterns.txt" lists all sequences with less than 60% HCA similarity (relative to each other), and "minima.txt" includes the characteristics of the most divergent sequences (based on HCA score, charged amino acid similarity and proline distribution).
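As an alternative to a spreadsheet, the tab-separated output can be read with a few lines of Python. This is only a minimal sketch: the column layout of the results file is not documented here, so the rows are returned as plain lists of strings without interpreting them. The function name load_tab_separated is hypothetical.

```python
import csv


def load_tab_separated(path):
    """Read a tab-separated text file into a list of rows (lists of strings).

    No assumption is made about what the columns mean.
    """
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]
```

For example, load_tab_separated("results.txt") would return one list per line of the file written by HCA_analyze.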

SCOP (Win executable)
Syntax: SCOP input_file output_file
input_file should be an output file generated by HCA_analyze. The program outputs the SCOP_class of each protein present in the original input file. The program assumes that PDB codes were used to name the original alignment files analyzed by HCA_analyze. For example, an alignment of a sequence to chain E of PDB structure 1ABF should have been named 1abf_e.aln. The program REQUIRES lower-case PDB names, and reports the SCOP class according to SCOP release 1.73 (November 2007).
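The naming convention can be captured in a small helper, shown here as an illustrative Python sketch that is not part of the distribution. The function name alignment_filename is hypothetical. The pattern (PDB code, underscore, chain, ".aln", all lower case) is taken from the 1abf_e.aln example above.

```python
def alignment_filename(pdb_id, chain):
    """Build the lower-case alignment file name expected by the SCOP program,
    following the pattern of the documented example '1abf_e.aln'."""
    return f"{pdb_id}_{chain}.aln".lower()
```

With this helper, alignment_filename("1ABF", "E") gives "1abf_e.aln".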

Automated comparisons vs. PDB90.
Includes HCA_analyze and SCOP (Win executables), as well as every PDB sequence with less than 90% similarity to other PDB sequences (as of November 2008). Also included: a simple batch file that automates the alignment of a query sequence vs. every PDB90 sequence, followed by HCA analysis and SCOP class attribution. Before running this batch file, the user must place the query sequence to be analyzed in a new file called test.txt. DO NOT change the .bat file. Last updated on January 22nd, 2009.

Comparison of models generated through this method with the best CASP models and experimentally-derived structures.
Comparison of models generated through this method with experimentally-derived structures.
PDB coordinates of the model of amyloid beta peptide described in the paper.