figshare
final thesis.pdf (12.92 MB)

Transcriptome Analysis of Cancer Data Using Next Generation Sequencing Platforms

thesis
posted on 2017-07-23, 12:50 authored by Afifa Yousafzai, Maimuna Najaf Khan

Breast cancer is an unpredictable and deadly disease. It is characterised by the formation of abnormal growths of malignant cells, called tumours, in the tissues of the breast. Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death among women worldwide. Somewhere in the world, a woman is diagnosed with breast cancer every 19 seconds, and someone dies from it every 74 seconds.

This thesis includes a statistical analysis of the breast cancer dataset with NCBI GEO accession number GSE48213. The dataset comprises 56 samples from five distinct groups of cell lines, covering 36,954 genes, and its RNA-Seq data were generated on the Illumina Genome Analyzer IIx next-generation sequencing platform. For this part of the thesis, 47 cell lines (samples) of the Basal, Luminal and Claudin-low subtypes were selected and compared in three pairs: Basal vs Claudin-low, Luminal vs Basal and Luminal vs Claudin-low. These samples were examined statistically in R in terms of expression profiles, isoform expression, data spread and similarity (clustering), and functional analyses were performed, contributing to a better molecular understanding of how genes behave in different breast cancer cell lines. Specifically, expression was compared across the cell lines to identify differentially expressed genes. After expression profiling and statistical analysis, the up- and down-regulated genes were analysed functionally in DAVID against KEGG cancer pathways. This helped reveal how different genes are involved in regulating pathways that ultimately affect expression in cancer cells.
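The up/down-regulation step can be illustrated in a few lines. This is a minimal sketch, assuming a hypothetical count matrix, counts-per-million (CPM) normalisation and an illustrative cut-off of |log2 fold change| ≥ 1. It is not the pipeline used in the thesis: it omits the statistical testing and multiple-testing correction that Bioconductor packages such as edgeR, DESeq2 or limma provide, and all function names and data are hypothetical.

```python
import math


def cpm(counts):
    """Counts-per-million normalisation of one sample's raw gene counts."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """Per-gene log2 fold change of mean CPM, group_a relative to group_b.

    Each group is a list of samples; each sample is a list of raw counts
    with the genes in the same order (hypothetical input layout).
    """
    norm_a = [cpm(sample) for sample in group_a]
    norm_b = [cpm(sample) for sample in group_b]
    changes = []
    for g in range(len(group_a[0])):
        mean_a = sum(s[g] for s in norm_a) / len(norm_a)
        mean_b = sum(s[g] for s in norm_b) / len(norm_b)
        # The pseudocount avoids division by zero and log2(0)
        changes.append(math.log2((mean_a + pseudocount) / (mean_b + pseudocount)))
    return changes


def classify(changes, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' using |log2 FC| >= threshold."""
    labels = []
    for value in changes:
        if value >= threshold:
            labels.append("up")
        elif value <= -threshold:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels


# Hypothetical toy data: 3 genes, two samples per subtype
luminal = [[800000, 100000, 100000], [800000, 100000, 100000]]
basal = [[200000, 700000, 100000], [200000, 700000, 100000]]
lfc = log2_fold_changes(luminal, basal)
print(classify(lfc))  # ['up', 'down', 'unchanged']
```

A threshold of 1 corresponds to a two-fold change. In practice such a cut-off is commonly combined with an adjusted p-value before gene lists are submitted to a functional annotation tool such as DAVID.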

Cardiomyopathy occurs when the heart is unable to pump enough blood to the rest of the body to meet its needs. Heart failure is a chronic disease whose signs include difficulty breathing, swelling of the legs and severe tiredness. The dataset, downloaded from GEO under accession number GSE55296, contains RNA-seq count data. Whole-transcriptome libraries were prepared from total poly(A) RNA samples and sequenced on the SOLiD 5500xl next-generation sequencing platform. The count datasets were analysed statistically and visualised with density plots, bar plots and multidimensional scaling (MDS) plots. Functional analysis was then performed to explore the biological processes and pathways in which these genes are involved. The analysis was performed on patients with dilated and ischemic heart failure who had been referred for heart transplantation. Protein levels of proANP, ANP, corin and furin in the left ventricle of these patients were compared with those of control donors. In addition, RNA expression of NPPA, the gene encoding ANP, was analysed in the patients' left ventricles.
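An MDS plot places samples so that the distances between points reflect differences in their expression profiles, which makes it easy to see whether patients separate from control donors. Below is a minimal sketch of classical (Torgerson) MDS, assuming a genes-by-samples count matrix transformed to log2(CPM + 1). The helper names and toy counts are hypothetical, and this is plain classical MDS on all genes rather than the exact procedure used in the thesis; limma's `plotMDS`, for instance, by default bases each pairwise distance on the top 500 genes.

```python
import numpy as np


def log_cpm(counts):
    """log2(CPM + 1) for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total reads per sample (column)
    return np.log2(counts / lib_sizes * 1e6 + 1.0)


def classical_mds(values, k=2):
    """Classical (Torgerson) MDS of the samples (columns) of `values`.

    Returns a samples x k array of coordinates whose pairwise Euclidean
    distances approximate those between the original sample columns.
    """
    x = np.asarray(values, dtype=float).T  # samples x genes
    n = x.shape[0]
    # Squared Euclidean distances between samples, via the Gram matrix
    gram = x @ x.T
    norms = np.diag(gram)
    sq_dist = norms[:, None] + norms[None, :] - 2.0 * gram
    # Double centring: B = -1/2 * J D^2 J, with J = I - (1/n) * ones
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ sq_dist @ j
    # Keep the k largest eigenvalues/eigenvectors of the symmetric matrix B
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop tiny negatives
    return eigvecs[:, top] * scale


# Hypothetical toy counts: 3 genes x 4 samples (two controls, two patients)
counts = [[900, 880, 100, 120],
          [100, 120, 900, 880],
          [500, 500, 500, 500]]
coords = classical_mds(log_cpm(counts))
print(coords)  # one (dim1, dim2) row per sample, ready for a scatter plot
```

The distance matrix is built from the Gram matrix rather than by subtracting every pair of sample vectors, which keeps memory use modest when each sample has tens of thousands of genes.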

The purpose of this thesis is to understand how next-generation sequencing techniques and statistical computing can help us identify the underlying causes of a disease. More specifically, it shows how basic bioinformatics skills and techniques can be used to explore and analyse the RNA-seq dataset of a disease, here cancer and heart disease.
