The shapes of scientific data
Figure 2. The shapes of scientific data. A wide variety of scientific data can be represented by a handful of fundamental data structures. A list might hold protein or gene identifiers. Networks represent regulatory influence, metabolic pathways or protein interactions. Numeric data resides in matrices, for example a gene expression matrix or a position-specific scoring matrix (PSSM) for a promoter motif. Combining tabular data with matrices could allow ChIP-chip data, tiling array data and genome features to be plotted by location in the genome. A bicluster, a set of genes co-expressed under specific conditions, might be represented by the combination of a list of genes, a list of conditions and a gene expression matrix, tied together in a tuple (hierarchically nested key-value pairs). Tuples may also represent experiment design (metadata about media, environmental variables or patient data).
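As an illustration of the bicluster case described in the caption, the sketch below composes a list of genes, a list of conditions and an expression matrix into nested key-value pairs. It is a minimal sketch, not the representation used by any particular tool: the key names, the gene and condition identifiers, the numeric values and the helper `bicluster_submatrix` are all hypothetical, chosen only to show how the basic shapes fit together.

```python
# All identifiers and values below are hypothetical, for illustration only.

# Lists of identifiers.
genes = ["geneA", "geneB", "geneC"]
conditions = ["cond1", "cond2", "cond3", "cond4"]

# Gene expression matrix: one row per gene, one column per condition.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 1.5, 2.5, 3.5],
    [9.0, 8.0, 7.0, 6.0],
]

# A bicluster as hierarchically nested key-value pairs (the "tuple" of the
# caption): a subset of genes, a subset of conditions, the full matrix with
# its row and column names, and experiment metadata.
bicluster = {
    "genes": ["geneA", "geneB"],
    "conditions": ["cond2", "cond4"],
    "matrix": {
        "row_names": genes,
        "column_names": conditions,
        "values": expression,
    },
    "metadata": {"medium": "example medium"},
}


def bicluster_submatrix(b):
    """Return the expression values restricted to the bicluster's genes and conditions."""
    m = b["matrix"]
    row_index = {name: i for i, name in enumerate(m["row_names"])}
    col_index = {name: j for j, name in enumerate(m["column_names"])}
    return [
        [m["values"][row_index[g]][col_index[c]] for c in b["conditions"]]
        for g in b["genes"]
    ]


submatrix = bicluster_submatrix(bicluster)
print(submatrix)
```

Keeping the row and column names alongside the values lets the bicluster refer to genes and conditions by identifier rather than by position, so the same lists can be reused against other matrices that share those identifiers.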