Nikolas Pontikos PhD Thesis 2015
PhD thesis (submitted July 2015).
Genetic association studies have discovered many variants that influence type 1 diabetes (T1D) risk and that also correlate with quantitative, cell-type-specific phenotypes. However, disease-associated differences can be small, and large numbers of samples are required to overcome the heterogeneity that exists between humans.
Novel high-throughput biotechnologies can measure large numbers of samples, but technical or within-batch variation may undermine the reproducibility of measurements.
In my thesis, I analyse two types of such datasets, both central to the study of T1D. The first is generated by flow cytometry, a biotechnology that uses light scatter and fluorescently stained markers to discriminate between cell types. Unfortunately, flow cytometry can be prone to batch effects, since blood samples are often collected, prepared and analysed at different times and by different operators. I consider several normalisation techniques to address these issues, using external or within-sample controls. The main objective of flow cytometry data analysis is to identify different cell types. While this is essentially a clustering problem, the most widely applied method is currently a manual approach, which can be inefficient and biased. I investigate how this process can be automated by fitting mixture models that emulate the manual process. I show that, in the absence of manual gates, data-driven approaches can be applied to detect new cell subsets, not targeted by manual gating, that respond to IL-2 in an in vitro stimulation experiment.
The second type of dataset is generated by qPCR and genotyping arrays, which are applied to DNA from T1D cases and controls to determine whether copy number variation in two Killer Immunoglobulin-like Receptor (KIR) genes is associated with T1D.
I apply normalisation to correct for batch effects between qPCR plates and clustering using mixture models to identify copy number groups.
Supervised clustering is then used to correlate qPCR copy number with SNP data, allowing for association testing in a sample size twenty-fold larger than any previously considered for KIR genes.
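The plate normalisation and mixture-model clustering steps described above can be illustrated with a minimal sketch. This is not the code used in the thesis: the function names (`centre_by_plate`, `fit_mixture`, `call_copy_groups`), the choice of median centring as the batch correction, the one-dimensional Gaussian mixture fitted by EM, and the simulated signal values are all assumptions made for illustration. Median centring in particular assumes that every plate carries a similar mix of copy numbers.

```python
import math
import random
from statistics import median

def centre_by_plate(values, plates):
    """Subtract each plate's median signal (an assumed, simple batch correction)."""
    by_plate = {}
    for v, p in zip(values, plates):
        by_plate.setdefault(p, []).append(v)
    plate_median = {p: median(vs) for p, vs in by_plate.items()}
    return [v - plate_median[p] for v, p in zip(values, plates)]

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_mixture(xs, init_means, n_iter=100, init_sigma=0.5, min_sigma=1e-3):
    """Fit a 1-D Gaussian mixture by EM; returns weights, means, sigmas, responsibilities."""
    k = len(init_means)
    means = list(init_means)
    sigmas = [init_sigma] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas, resp

def call_copy_groups(xs, init_means):
    """Assign each sample to its most probable component, ordered by component mean."""
    _, means, _, resp = fit_mixture(xs, init_means)
    order = sorted(range(len(means)), key=lambda j: means[j])
    rank = {j: i for i, j in enumerate(order)}
    return [rank[max(range(len(r)), key=lambda j: r[j])] for r in resp]

# Simulated example (hypothetical data): two plates with different offsets,
# each holding samples with 1, 2 or 3 copies.
rng = random.Random(0)
values, plates = [], []
for plate, offset in (("plate_A", 0.0), ("plate_B", 0.6)):
    for copies in [1] * 10 + [2] * 20 + [3] * 10:
        values.append(copies + offset + rng.gauss(0, 0.08))
        plates.append(plate)

centred = centre_by_plate(values, plates)
groups = call_copy_groups(centred, init_means=[-1.0, 0.0, 1.0])
```

Here `groups` holds an index per sample (0 for the lowest-signal group, 2 for the highest); mapping those indices to absolute copy numbers would need additional information, such as reference samples of known copy number.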
Finally, I conclude with what I have learned from applying these methods and how they may be further developed, with special attention to flow cytometry, where they remain underutilised. In particular, I discuss how normalisation and clustering relate, and how prior knowledge, when available, can be incorporated into the clustering process.