Panama bacterial phyllosphere sequences
Bacterial 16S rRNA gene sequence data (chloroplast-excluding primers 799f/1115r) and metadata for bacterial communities sampled from plant leaf surfaces on Barro Colorado Island, Panama.
Data collection procedures are described in the paper "Relationships between phyllosphere bacterial communities and plant functional traits in a neotropical forest" by Kembel et al. (manuscript in press at PNAS).
Multiple sets of files are included in this data archive:
- a tab-delimited text file with information on barcode sequences and host species identity for each sample. The columns of the text file include a sample identifier (SampleID), forward and reverse barcode sequences* (ForwardBarcodeSequence and ReverseBarcodeSequence), and the identity of the host plant species from which each sample was collected (HostSpecies).
* Note that the raw sequence data use a combinatorial (dual-index) barcoding approach to uniquely identify samples. The first 6 bp of each barcode are on the forward read, and the final 6 bp are on the reverse read. Before reads are assigned to samples, the two halves must either be combined into a single 12 bp barcode or be parsed with software that can process dual-indexed sequences.
- raw sequence data in FASTQ format, produced by a 150 bp paired-end Illumina HiSeq sequencing run. Because of file size limitations, the original tar/gzip archives have been split into multiple parts. To recombine and uncompress the sequence files, run the following command in a Linux/Unix environment:
cat phyllobacteria.fastq.tgz.part-* | tar xz
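
The barcode handling described in the note above can be sketched in Python. This is a minimal illustration, not the procedure used in the paper. It assumes that the mapping file's header names match the column names listed above exactly, that each 6 bp barcode half sits at the very start of its read, and that the reverse barcode is stored in the same orientation in which it appears on the reverse read (if it is not, a reverse complement would be needed). The function names, sample IDs, barcodes, and reads below are hypothetical.

```python
import csv
import io


def load_barcode_map(handle):
    """Map each combined 12 bp barcode to (SampleID, HostSpecies).

    Assumes a tab-delimited file whose header contains SampleID,
    ForwardBarcodeSequence, ReverseBarcodeSequence and HostSpecies.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    barcode_to_sample = {}
    for row in reader:
        # Join the forward and reverse halves into one 12 bp key.
        combined = (row["ForwardBarcodeSequence"]
                    + row["ReverseBarcodeSequence"]).upper()
        if combined in barcode_to_sample:
            raise ValueError("duplicate barcode pair: " + combined)
        barcode_to_sample[combined] = (row["SampleID"], row["HostSpecies"])
    return barcode_to_sample


def assign_sample(forward_read, reverse_read, barcode_to_sample, half_length=6):
    """Return (SampleID, HostSpecies) for a read pair, or None if unmatched.

    Assumption: each barcode half occupies the first `half_length` bases
    of its read, with exact matching and no error tolerance.
    """
    key = (forward_read[:half_length] + reverse_read[:half_length]).upper()
    return barcode_to_sample.get(key)


# Hypothetical two-sample mapping file, for illustration only.
example_mapping = (
    "SampleID\tForwardBarcodeSequence\tReverseBarcodeSequence\tHostSpecies\n"
    "sampleA\tACGTAC\tTTGGCC\tHypothetical species one\n"
    "sampleB\tGGATCC\tAACCGG\tHypothetical species two\n"
)
barcode_to_sample = load_barcode_map(io.StringIO(example_mapping))
match = assign_sample("ACGTACGGGGTTTT", "TTGGCCAAAACCCC", barcode_to_sample)
```

For the real data, the mapping file would be opened with `open(path, newline="")` in place of the `io.StringIO` example, and the read sequences would be taken from the paired FASTQ files. Dedicated demultiplexing software that supports dual indexes will usually be a better choice for a full HiSeq run.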