p6_DSS2017_LINCSAnalytics.pdf (479.75 kB)

LINCSAnalytics: An integrated platform for the efficient query and computation across diverse LINCS signatures

poster
Posted on 2017-05-09, 19:57; authored by Michele Forlin, ZongJun Hu, Amar Koleti, John Turner, Raymond Terryn, Caty Chung, Dušica Vidović, Vasileios Stathias, Stephan Schürer

The Library of Integrated Network-based Cellular Signatures (LINCS) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. A signature, defined as a specific cellular response to a given perturbation, can therefore be expressed as a function of a set of parameters: the model system (typically a cell), the perturbation (e.g. a small molecule), the detected analytes (e.g. the transcripts measured in a transcriptional profiling assay), and additional experimental details (such as concentration and time). To use LINCS data effectively across a wide variety of scientific use cases, signatures need to be readily queryable, retrievable and accessible for computation as a function of all of these dimensions.
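The idea of a signature as a function of several dimensions can be sketched in a few lines of Python. This is only an illustration of the data model described above, not the platform's schema: the class name, the field names, the `query` helper and the sample values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Signature:
    """One measured response. All field names are illustrative assumptions."""
    model_system: str        # e.g. a cell line
    perturbation: str        # e.g. a small molecule
    analyte: str             # e.g. a transcript
    concentration_um: float  # experimental detail: concentration
    time_h: float            # experimental detail: time point
    response: float          # the measured value


def query(signatures: Iterable[Signature], **criteria) -> List[Signature]:
    """Return the signatures whose fields equal every given criterion."""
    return [
        s for s in signatures
        if all(getattr(s, name) == wanted for name, wanted in criteria.items())
    ]


# Made-up placeholder records, not LINCS data.
example = [
    Signature("cell_A", "compound_X", "gene_1", 10.0, 24.0, 1.8),
    Signature("cell_A", "compound_Y", "gene_1", 10.0, 24.0, -0.4),
    Signature("cell_B", "compound_X", "gene_1", 1.0, 6.0, 0.9),
]

# Query along any combination of dimensions.
hits = query(example, perturbation="compound_X", model_system="cell_A")
```

Because every dimension is an ordinary field, the same helper can filter by model system, perturbation, analyte, or any experimental detail without special cases.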


Here we present a computational platform built on top of the open source Cloudera Hadoop platform, which allows the distributed storage and processing of large datasets through a number of dedicated modules. LINCS signature data and standardized entity metadata are stored in the Hadoop Distributed File System (HDFS). Apache Hive and Impala provide fast query and retrieval of any data point, while computation and modeling are available through Apache Spark and its sparklyr R interface. Full access to the core of the platform is provided via a set of APIs, which also allow users to build and deploy custom applications. As an initial demonstration, we show a simple Shiny R application that interactively queries and retrieves LINCS signatures for any dimension of interest.

To enable the computational biology community to use LINCS data in their research via the LINCS Analytics platform, we deployed an R package that retrieves the available data and metadata for any dimension of interest. It also supports on-the-fly aggregation of replicates and filtering by desired output values.
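Replicate aggregation followed by filtering on the output value might look like the sketch below. It is written in Python purely for illustration and does not reproduce the R package's interface; the function name, the tuple layout, the choice of the mean as the aggregate, and the sample values are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate_replicates(rows, min_abs_response=0.0):
    """Average replicate measurements, then filter by magnitude.

    rows: iterable of (model_system, perturbation, analyte, response) tuples.
    Rows sharing the first three fields are treated as replicates (assumption).
    """
    groups = defaultdict(list)
    for model_system, perturbation, analyte, response in rows:
        groups[(model_system, perturbation, analyte)].append(response)

    # Collapse each group of replicates to its mean response.
    averaged = {key: mean(values) for key, values in groups.items()}

    # Keep only aggregated responses at or above the requested magnitude.
    return {
        key: value
        for key, value in averaged.items()
        if abs(value) >= min_abs_response
    }


# Made-up placeholder measurements, not LINCS data.
rows = [
    ("cell_A", "compound_X", "gene_1", 2.0),
    ("cell_A", "compound_X", "gene_1", 4.0),
    ("cell_A", "compound_X", "gene_2", 0.5),
    ("cell_A", "compound_X", "gene_2", -0.5),
]
result = aggregate_replicates(rows, min_abs_response=1.0)
```

Aggregating before filtering means the threshold applies to the combined response rather than to individual replicates, so noisy replicates that cancel out are dropped.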

Funding

Funded by the NIH BD2K and LINCS programs (U54HL127624).
