Detecting differentially expressed genes of heterogeneous and positively skewed data using half Johnson’s modified t-test

Background: Microarray technology allows the expression of thousands of genes to be measured simultaneously in a single experiment. In a two-sample setting, Student’s t-test can be used to compare the mean expression of a gene across replicate arrays and so detect differential expression under the conditions being studied, such as a disease. However, a general-purpose statistical test may lack the power to correctly detect differentially expressed genes when the data are heterogeneous and positively skewed. Methods: Here we define a differentially expressed gene as one whose expression differs significantly in mean, variance, or both between the two groups of microarrays. Monte Carlo simulation shows that the “half Johnson’s modified t-test” maintains quite accurate type I error rates under both normal and non-normal distributions, and that it is more powerful overall than the half Student’s t-test when the ratio of standard deviations between the case and control groups is greater than 1. Results: Analysis of a colon cancer dataset shows that, with the false discovery rate (FDR) controlled at 0.05, the half Johnson’s modified t-test detects 429 differentially expressed genes, more than the 344 detected by the half Student’s t-test. To target 100 priority genes, the FDR need only be set to 4.28 × 10⁻⁸ for the half Johnson’s modified t-test, compared with 5.39 × 10⁻⁴ for the half Student’s t-test. Conclusions: The half Johnson’s modified t-test is recommended for detecting differentially expressed genes in heterogeneous data with positive skew only.
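The gene counts reported at an FDR of 0.05 come from applying an FDR-controlling rule to the per-gene p-values. The abstract does not say which procedure was used, so the following is only a minimal sketch that assumes the Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the colon cancer analysis.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a boolean mask of hypotheses rejected at the given FDR level.

    Assumption: the Benjamini-Hochberg step-up rule is used; the source
    abstract does not name its FDR procedure.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)

    order = np.argsort(p)                       # indices sorting p ascending
    ranked = p[order]
    thresholds = fdr * np.arange(1, m + 1) / m  # i * q / m for rank i

    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject every hypothesis up to the largest rank that passes.
        k = np.nonzero(below)[0].max()
        rejected[order[: k + 1]] = True
    return rejected


# Hypothetical per-gene p-values, for illustration only.
example_p = [0.02, 0.9, 0.035, 0.03]
example_mask = benjamini_hochberg(example_p, fdr=0.05)
print(int(example_mask.sum()), "genes flagged at FDR 0.05")
```

Counting the `True` entries of the mask gives the number of genes declared differentially expressed at the chosen FDR level, which is the kind of count compared between the two tests in the Results.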