Additional file 1: Figure S1. of LSTrAP: efficiently combining RNA sequencing data into co-expression networks

Figure S1. Quality statistics for Sorghum bicolor samples. Gray dots indicate the quality statistics of the samples based on HTSeq-count and TopHat. Samples that fell below our suggested quality-control thresholds (within the red areas of the plot) were excluded from the final network.

Figure S2. Dendrogram and heatmap of Sorghum bicolor sample distances. The helper script matrix_heatmap.py calculates the Euclidean distance between samples and plots a hierarchically clustered heatmap of those distances, which can be used to detect outliers. Here, the most divergent samples (top left) are valid pollen and seed samples, which are known to have unique transcriptional profiles.

Figure S3. Node degree distribution of the Arabidopsis thaliana co-expression network. Co-expression networks are known to have a few genes with many connections to other genes and many genes with few connections. This behavior can clearly be observed in the Arabidopsis thaliana co-expression network based on the positive samples.

Table S1. Negative Arabidopsis thaliana dataset. The columns give the SRA run ID of each sample, a short description (description and type), and the mapping percentages for TopHat and HTSeq-count.

Table S2. Sorghum bicolor samples with organ annotation. Overview of all Sorghum bicolor samples used, organized by the organ from which the samples were derived.

Methods S1. Data source and curation. Methods S2. PCA analysis of expression data. Methods S3. Power law. (DOCX 411 kb)
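
The sample-distance idea described for Figure S2 can be illustrated with a minimal sketch. It is not the matrix_heatmap.py script itself: it leaves out the hierarchical clustering and the plotting, and only computes pairwise Euclidean distances and ranks samples by their mean distance to all others as a rough outlier indicator. The function names, the input layout (one list of per-gene expression values per sample) and the toy values are assumptions made for illustration only.

```python
import math


def euclidean_distance_matrix(expression):
    """Return sample names and a symmetric matrix of pairwise Euclidean distances.

    `expression` maps a sample name to its list of per-gene expression values.
    All lists are assumed to have the same length and the same gene order.
    """
    names = sorted(expression)
    matrix = [
        [math.dist(expression[a], expression[b]) for b in names]
        for a in names
    ]
    return names, matrix


def rank_by_mean_distance(names, matrix):
    """Sort samples from most to least divergent by mean distance to the others."""
    n = len(names)
    means = {
        # The diagonal entry is 0, so the row sum is divided by the other n - 1 samples.
        name: sum(row) / (n - 1)
        for name, row in zip(names, matrix)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy values, not taken from the Sorghum bicolor data.
toy_expression = {
    "leaf_1": [10.0, 2.0, 0.5],
    "leaf_2": [11.0, 2.5, 0.4],
    "root_1": [9.0, 3.0, 0.6],
    "pollen_1": [0.5, 40.0, 12.0],
}

names, distances = euclidean_distance_matrix(toy_expression)
ranking = rank_by_mean_distance(names, distances)
```

In this toy input the first entry of `ranking` is the pollen-like sample. As with the pollen and seed samples in Figure S2, a high rank does not by itself mean a sample is faulty; it may reflect a genuinely distinct transcriptional profile.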