Performance of CONC with Different Input Features

2013-02-22T08:44:16Z (GMT) by Jinfeng Liu, Julian Gough, Burkhard Rost
<p><i>F</i>-measures (harmonic mean of specificity and sensitivity; see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020029#s3" target="_blank">Materials and Methods</a>) were calculated for the different SVMs for both the coding (A) and non-coding (B) predictions. Because the coding set was twice as large as the non-coding set, the percentage of incorrect predictions was higher for the non-coding set, hence its lower <i>F</i>-measures. Used individually, the input features achieved <i>F</i>-measures of 67.6 to 90.9 on the non-coding set. Combining the features raised performance to 97.4 for coding and 94.5 for non-coding predictions. In comparison, ESTScan achieved <i>F</i>-measures of 86.7 and 69.9 for coding and non-coding predictions, respectively. The top-performing features were the number of homologs in the protein database and the peptide length.</p>
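The class-size effect described above can be illustrated with a small sketch. It assumes, since the exact definitions are in Materials and Methods and not reproduced here, that "specificity" means TP / (TP + FP) (the fraction of predictions for a class that are correct) and "sensitivity" means TP / (TP + FN). The counts below are hypothetical and are not taken from the study: 2000 coding and 1000 non-coding sequences, with 5% of each set misclassified. The helper names `f_measure` and `class_f_measure` are illustrative only.

```python
def f_measure(specificity: float, sensitivity: float) -> float:
    """Harmonic mean of two rates in [0, 1]; returns 0.0 if both are 0."""
    if specificity + sensitivity == 0:
        return 0.0
    return 2 * specificity * sensitivity / (specificity + sensitivity)


def class_f_measure(tp: int, fp: int, fn: int) -> float:
    """Per-class F-measure in percent from confusion counts.

    ASSUMPTION: specificity = TP / (TP + FP) and sensitivity = TP / (TP + FN).
    """
    spec = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    return 100 * f_measure(spec, sens)


# Hypothetical set sizes (not from the paper): coding set twice as large.
coding_total, noncoding_total = 2000, 1000
coding_wrong = coding_total // 20        # 100 coding predicted as non-coding
noncoding_wrong = noncoding_total // 20  # 50 non-coding predicted as coding

# Each misclassified sequence is a false negative for its own class
# and a false positive for the other class.
f_coding = class_f_measure(
    tp=coding_total - coding_wrong, fp=noncoding_wrong, fn=coding_wrong)
f_noncoding = class_f_measure(
    tp=noncoding_total - noncoding_wrong, fp=coding_wrong, fn=noncoding_wrong)

print(f"coding F = {f_coding:.1f}, non-coding F = {f_noncoding:.1f}")
# prints: coding F = 96.2, non-coding F = 92.7
```

In this toy setting the error rate is identical for both sets, yet the 100 misclassified coding sequences become false positives for the smaller non-coding class and pull its specificity down (950/1050 versus 1900/1950), so its F-measure ends up lower.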