Large-Scale Predictions of Gram-Negative Bacterial Protein Subcellular Locations

2006-12-01T00:00:00Z (GMT) by Kuo-Chen Chou, Hong-Bin Shen
Many species of Gram-negative bacteria are pathogens that can cause disease in a host organism. This pathogenic capability is usually associated with certain components of Gram-negative cells. Therefore, an automated method for fast and reliable prediction of Gram-negative protein subcellular location would allow us not only to annotate gene products in a timely manner but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when many location sites must be covered and when the query proteins have no significant homology to proteins of known subcellular location. PSORT-B, a recently updated version of PSORT widely used for predicting Gram-negative protein subcellular location, covers only five location sites. Also, the data set used to train PSORT-B contains many proteins with high sequence identity within the same location group and, hence, may carry a strong homology bias. To overcome these problems, a new predictor called “Gneg-PLoc” has been developed. It fuses many basic classifiers, each trained on a stringent data set in which proteins within the same location group share strictly less than 25% sequence identity with one another, and it covers eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. Compared with PSORT-B, the new predictor not only covers more subcellular locations but also yields remarkably higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. To meet the needs of researchers in related areas, a downloadable file at the same Web site lists the results identified by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotation or are annotated with uncertain terms.
The large-scale results will be updated twice a year to cover new entries of Gram-negative bacterial proteins and to reflect new developments in Gneg-PLoc.
Keywords: Gram-negative • Subcellular compartment • Gene ontology • Amphiphilic pseudo amino acid composition • Fusion • K-nearest neighbor rule
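The abstract names classifier fusion and the K-nearest neighbor rule but does not spell out the fusion scheme or the feature encoding. As a rough illustration only, the sketch below fuses several plain K-nearest-neighbor classifiers by simple majority vote. The toy feature vectors in `TRAIN`, the `k` values and feature subsets in `BASIC_CLASSIFIERS`, and the function names `knn_predict` and `fused_predict` are all hypothetical. They are not Gneg-PLoc's actual features (such as amphiphilic pseudo amino acid composition) or its actual fusion rule, which may differ substantially.

```python
import math
from collections import Counter

# Hypothetical toy training data: (feature vector, location label).
# These numbers are invented for illustration; they are NOT Gneg-PLoc features.
TRAIN = [
    ((0.10, 0.20, 0.70), "cytoplasm"),
    ((0.15, 0.25, 0.60), "cytoplasm"),
    ((0.80, 0.10, 0.10), "outer membrane"),
    ((0.75, 0.15, 0.10), "outer membrane"),
    ((0.30, 0.60, 0.10), "periplasm"),
    ((0.35, 0.55, 0.10), "periplasm"),
]

# Hypothetical ensemble: each basic classifier is (k, feature dimensions it uses).
BASIC_CLASSIFIERS = [
    (1, (0, 1, 2)),
    (3, (0, 1, 2)),
    (1, (0, 1)),
    (3, (1, 2)),
]


def knn_predict(train, query, k, dims):
    """Plain K-nearest-neighbor rule restricted to the feature dimensions in dims."""
    def project(vector):
        return tuple(vector[i] for i in dims)

    q = project(query)
    # Sort training samples by Euclidean distance to the projected query.
    neighbors = sorted(train, key=lambda item: math.dist(project(item[0]), q))[:k]
    votes = Counter(label for _, label in neighbors)
    # Ties go to the label encountered first among the nearest neighbors.
    return votes.most_common(1)[0][0]


def fused_predict(train, query, classifiers=BASIC_CLASSIFIERS):
    """Fuse the basic classifiers by simple majority vote over their predicted labels."""
    predictions = [knn_predict(train, query, k, dims) for k, dims in classifiers]
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(fused_predict(TRAIN, (0.78, 0.12, 0.10)))
```

Here the basic classifiers differ only in `k` and in which feature dimensions they see; a real system would more likely give each member its own sequence encoding. With this toy data, a query near the two outer-membrane examples, such as `(0.78, 0.12, 0.10)`, is labeled "outer membrane" by every member and therefore by the vote.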