TY - DATA
T1 - Large-Scale Predictions of Gram-Negative Bacterial Protein Subcellular Locations
PY - 2006/12/01
AU - Kuo-Chen Chou
AU - Hong-Bin Shen
UR - https://acs.figshare.com/articles/journal_contribution/Large_Scale_Predictions_of_Gram_Negative_Bacterial_Protein_Subcellular_Locations/3044164
DO - 10.1021/pr060404b.s002
L4 - https://ndownloader.figshare.com/files/4749193
KW - annotate gene products
KW - subcellular location annotations
KW - protein subcellular location prediction
KW - Protein Subcellular Locations
KW - PSORT
KW - location sites need
KW - location group
KW - subcellular locations
N2 - Many species of Gram-negative bacteria are pathogenic bacteria that can cause disease in a host organism. This pathogenic capability is usually associated with certain components in Gram-negative cells. Therefore, developing an automated method for fast and reliable prediction of Gram-negative protein subcellular location will allow us not only to annotate gene products in a timely manner, but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when more location sites need to be covered and when unknown query proteins do not have significant homology to proteins of known subcellular locations. PSORT-B, a recently updated version of PSORT that is widely used for predicting Gram-negative protein subcellular location, covers only five location sites. Also, the data set used to train PSORT-B contains many proteins with high degrees of sequence identity in the same location group and, hence, may bear a strong homology bias. To overcome these problems, a new predictor, called “Gneg-PLoc”, was developed.
The new predictor fuses many basic classifiers, each trained on a stringent data set in which proteins in the same location group share strictly less than 25% sequence identity with one another, and it covers eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. In comparison with PSORT-B, the new predictor not only covers more subcellular locations, but also yields remarkably higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. To meet the needs of researchers working in related areas, a downloadable file is provided at the same Web site listing the results identified by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotations or are annotated with uncertain terms. The large-scale results will be updated twice a year to cover new entries of Gram-negative bacterial proteins and to reflect new developments in Gneg-PLoc. Keywords: Gram-negative • Subcellular compartment • Gene ontology • Amphiphilic pseudo amino acid composition • Fusion • K-nearest neighbor rule
ER -