Detecting differentially expressed genes of heterogeneous and positively skewed data using half Johnson’s modified <i>t</i>-test

<p><i>Background</i>: Microarray technology allows the expression of thousands of genes to be measured simultaneously in a single experiment. In a two-sample setting, the Student’s <i>t</i>-test can be used to compare the mean expression of a gene, taken from replicate arrays, and so detect differential expression under the conditions being studied, such as a disease. However, a general-purpose statistical test may lack the power to correctly detect differentially expressed genes when the data are heterogeneous and positively skewed. <i>Methods</i>: Here we define a differentially expressed gene as one whose expression differs significantly in mean, variance, or both between the two groups of microarrays. Monte Carlo simulation shows that the “half Johnson’s modified <i>t</i>-test” maintains reasonably accurate type I error rates under both normal and non-normal distributions. It is also more powerful overall than the half Student’s <i>t</i>-test when the ratio of standard deviations between the case and control groups is greater than 1. <i>Results</i>: Analysis of a colon cancer data set shows that, with the false discovery rate (FDR) controlled at 0.05, the half Johnson’s modified <i>t</i>-test detects 429 differentially expressed genes, compared with 344 detected by the half Student’s <i>t</i>-test. To target 100 priority genes, the FDR needs to be set at only 4.28 × 10<sup>−8</sup> for the half Johnson’s modified <i>t</i>-test, versus 5.39 × 10<sup>−4</sup> for the half Student’s <i>t</i>-test. <i>Conclusions</i>: The half Johnson’s modified <i>t</i>-test is recommended for detecting differentially expressed genes in heterogeneous data, but only when the data are positively skewed.</p>
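<p>To make concrete how a gene count at a given FDR level might be obtained, the following minimal sketch selects genes from a list of per-gene p-values. It assumes the Benjamini-Hochberg step-up procedure, which the abstract does not name. The function name <code>benjamini_hochberg</code> and the example p-values are hypothetical and are not taken from the study.</p>

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of genes rejected by the Benjamini-Hochberg rule.

    Assumption: the FDR is controlled with the Benjamini-Hochberg step-up
    procedure; the abstract does not state which procedure was used.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order gene indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for six genes (illustrative only, not study data).
example_p = [0.001, 0.009, 0.04, 0.03, 0.20, 0.60]
selected = benjamini_hochberg(example_p, fdr=0.05)
print(len(selected), "genes selected:", selected)
```

<p>Under this assumption, the number of differentially expressed genes reported at a given FDR is the length of the returned list. A more powerful test tends to produce smaller p-values for truly differential genes, so the list is longer at the same FDR level.</p>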