A Workflow for the Analysis of DNA Microarray Time-Course Data

2012-10-26T12:00:52Z (GMT) by Robert M Flight
<p>The past two decades have witnessed the increasing use of high-throughput measurement technologies in biology and the advent of the –omics fields, including genomics, transcriptomics, proteomics, and metabolomics. These new measurement platforms have motivated the development of novel data-analysis methods and workflows. Nowhere is this more true than in transcriptomics, where DNA microarrays are widely used to measure gene expression. One area that has suffered from a lack of new analysis tools is the application of DNA microarrays to time-course data. The use of DNA microarrays to follow temporal changes in biological systems is particularly important, allowing the measurement of dynamic changes in gene expression and providing valuable insight into cellular regulation. However, analyzing this type of DNA microarray data poses many challenges that are distinct from those of other gene-expression experiments, thus necessitating the development of novel analysis methods.<br>This thesis reports the development of a workflow for the analysis of DNA microarray time-course data. Particular emphasis is placed on the estimation and incorporation of measurement uncertainties at each step, methods for data visualization and normalization, and the decomposition of data using biologically meaningful models. The emphasis on measurement uncertainties led to a study of operator effects (gridding, flagging) on expression ratios, as well as the validation of a bootstrap method to estimate measurement uncertainties in microarray data. The application of correlation heat maps to array-level time-course data allowed the visualization and interpretation of transcriptome-wide changes in gene expression, providing preliminary insights into the data. Microarray normalization was also investigated in the context of time-course experiments, with a comparison of traditional and novel data normalization methods. 
Finally, the application of multivariate curve resolution using weighted alternating least squares (MCR-WALS) to time-course data is considered, along with the extraction of biological information from the results using the Gene Ontology. The biological systems investigated in this work include <em>S. cerevisiae</em> (yeast; cell cycle and exit from stationary phase), <em>P. falciparum</em> (malaria parasite; intraerythrocytic developmental cycle) and <em>D. melanogaster</em> (fruit fly; life cycle). Through the implementation of the workflow described in this thesis, putative regulatory profiles were extracted for each of these systems that were ontologically consistent with the known biology.</p>