Additional file 1: Table S1 of Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

Sample description. Intermediate headers show the database size and the analysis time when running with 1 thread (1×) or 20 threads (20×) for each CNIDARIA database group. Each row lists the sample name(s), sequence ID, source type, source name, reference, size of the JELLYFISH database, size of the input data, GC content, percentage of Ns, number of sequences in the input data and the samples used in the REFERENCEFREE comparison. For each k-mer size (11, 15, 17, 21 and 31 bp), the table gives the number of distinct k-mers, the total number of k-mers, the number of k-mers occurring only once, the number of shared k-mers and the percentage of k-mers shared. Input data are assembled genomes (genomic; FASTA files), raw genomic data (raw; FASTQ or BAM), filtered genomic data (raw filtered; BAM) or RNA-seq. The 34 samples of the extended dataset were analyzed only in the 21-mer dataset. Analysis times and database sizes are calculated for each dataset as a whole and do not correspond to the sums of the partial times and sizes. (XLS 87 kb)
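To make the per-sample columns concrete, the following is a minimal Python sketch of how such statistics could be computed from in-memory sequences. It is not the CNIDARIA or JELLYFISH implementation, and several choices are assumptions: k-mers are counted on the forward strand only (no canonical k-mers), windows containing N are skipped, GC content is computed over A/C/G/T bases only, and a k-mer counts as "shared" when it occurs in at least one other sample. The function names `sequence_stats`, `count_kmers` and `kmer_table` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def sequence_stats(seqs: List[str]) -> Dict[str, float]:
    """GC content, percentage of Ns and sequence count for one sample.

    Assumption: GC % uses only A/C/G/T bases as the denominator,
    while N % uses all bases.
    """
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts.values())
    acgt = counts["A"] + counts["C"] + counts["G"] + counts["T"]
    gc = 100.0 * (counts["G"] + counts["C"]) / acgt if acgt else 0.0
    pct_n = 100.0 * counts["N"] / total if total else 0.0
    return {"gc_percent": gc, "n_percent": pct_n, "num_sequences": len(seqs)}


def count_kmers(seqs: List[str], k: int) -> Counter:
    """Count forward-strand k-mers, skipping windows that contain N (assumption)."""
    kmers: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:
                kmers[kmer] += 1
    return kmers


def kmer_table(samples: Dict[str, List[str]], k: int) -> Dict[str, Dict[str, float]]:
    """Per-sample k-mer statistics for one k-mer size."""
    per_sample = {name: count_kmers(seqs, k) for name, seqs in samples.items()}

    # Number of samples in which each distinct k-mer occurs.
    presence: Counter = Counter()
    for counts in per_sample.values():
        presence.update(counts.keys())

    table: Dict[str, Dict[str, float]] = {}
    for name, counts in per_sample.items():
        distinct = len(counts)
        # Assumption: "shared" means present in at least one other sample.
        shared = sum(1 for kmer in counts if presence[kmer] > 1)
        table[name] = {
            "distinct": distinct,
            "total": sum(counts.values()),
            "singletons": sum(1 for c in counts.values() if c == 1),
            "shared": shared,
            "shared_percent": 100.0 * shared / distinct if distinct else 0.0,
        }
    return table


if __name__ == "__main__":
    toy = {"s1": ["ACGTACGT"], "s2": ["ACGTTTTT"]}
    for k in (3,):
        print(k, kmer_table(toy, k))
    print(sequence_stats(["ACGN"]))
```

With the toy input and k = 3, sample `s1` has 4 distinct k-mers, 6 k-mers in total and 2 singletons, and 2 of its 4 distinct k-mers (ACG, CGT) also occur in `s2`, giving 50% shared. Real datasets would be streamed from FASTA/FASTQ/BAM files and counted with a dedicated tool such as JELLYFISH rather than held in a Python `Counter`.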