The hetnet awakens: understanding complex diseases through data integration and open science

Himmelstein, Daniel

doi:10.6084/m9.figshare.4724797.v1

thesis

posted on 2017-03-04, 22:42 authored by Daniel Himmelstein

The PhD Dissertation of Daniel S. Himmelstein.

This is my thesis from my PhD in Biological & Medical Informatics from the University of California, San Francisco.

The versionless DOI for this record is 10.6084/m9.figshare.4724797. The corresponding shortened URL is https://doi.org/b2nz.

Files

dhimmel-thesis-figshare.pdf — PDF version of the dissertation produced specially for figshare. This document is identical to the ProQuest version, except that the ProQuest copyright and UCSF library release pages have been removed. This version also has additional PDF metadata including a document outline.

dhimmel-thesis-sharelatex.zip — The LaTeX source of the thesis as downloaded from ShareLaTeX. The PDF output from compiling this source was used to create dhimmel-thesis-figshare.pdf. Note that dhimmel-thesis-figshare.pdf contains manually modified metadata and the official UCSF cover page.

dhimmel-ucsf-diploma.pdf — PDF photograph of my diploma from the Regents of the University of California.

Title: The hetnet awakens: understanding complex diseases through data integration and open science

Dates: I submitted my dissertation on June 2, 2016. However, my official graduation date was June 10, 2016.

Abstract

Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open.

To integrate data, we pioneered the hetnet—a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century.
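As a rough illustration of the definition above, a hetnet can be sketched in Python as a set of typed nodes plus typed relationships. This is a minimal sketch, not the Hetionet data model or its API: every node name, type, and relationship type below is a made-up placeholder, and the helper functions are hypothetical.

```python
from collections import Counter

# Toy hetnet: every identifier and type here is a made-up placeholder,
# not data taken from Hetionet.
node_types = {
    "compound_a": "Compound",
    "gene_x": "Gene",
    "gene_y": "Gene",
    "disease_d": "Disease",
}

# Each relationship is a (source, relationship type, target) triple.
relationships = [
    ("compound_a", "binds", "gene_x"),
    ("gene_x", "interacts", "gene_y"),
    ("disease_d", "associates", "gene_y"),
    ("compound_a", "treats", "disease_d"),
]


def count_relationship_types(relationships):
    """Count how many relationships exist for each relationship type."""
    return Counter(kind for _, kind, _ in relationships)


def neighbors(node, kind, relationships):
    """Return nodes joined to `node` by a relationship of type `kind`.

    Relationships are treated as undirected in this sketch.
    """
    found = []
    for source, rel_kind, target in relationships:
        if rel_kind != kind:
            continue
        if source == node:
            found.append(target)
        elif target == node:
            found.append(source)
    return found


if __name__ == "__main__":
    print(count_relationship_types(relationships))
    print(neighbors("gene_y", "associates", relationships))
```

What makes this a hetnet rather than an ordinary network is that both dictionaries carry type information: a query can ask for a specific relationship type between specific node types.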

To extract insights from data, we developed a machine learning approach for hetnets. To predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease–gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method.

After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. The quest led me to explore real-time open notebook science and to expose publishing delays at journals as well as the problematic licensing of publicly funded research data.

Thesis Committee

Sergio E. Baranzini (chair & advisor)
John S. Witte
Andrej Sali

ProQuest Information:

Dissertation/thesis number: 10133408

ProQuest document ID: 1801982909

ISBN-13: 9781339919881

ISBN-10: 1339919885

OCLC Number: 970819555

Funding

National Science Foundation, Graduate Research Fellowship Program, Grant Number 1144247
