Mirounga_raw_muscle.fa.zip (250.56 MB)

Mirounga angustirostris muscle raw transcriptome assembly

Version 3 2016-01-12, 09:14
Version 2 2016-01-12, 02:17
Version 1 2016-01-12, 02:15
dataset
posted on 2016-01-12, 02:15 authored by Jane Khudyakov, Likit Preeyanon, Cory Champagne, Rudy Ortiz, Daniel Crocker
Raw assembly of northern elephant seal (Mirounga angustirostris) skeletal muscle transcriptome.

Experimental Design:
Skeletal muscle tissue for transcriptome sequencing was collected from three anesthetized juvenile northern elephant seals at three time points during an acute stress challenge experiment: before 0.25 U/kg ACTH injection (“0 hr”), 2 hours after injection (“2 hr”), and 24 hours (“24 hr”) after injection.

RNA Isolation:
Tissue samples were stored at −80 °C until extraction. In the laboratory, 75–165 mg of muscle tissue was minced with a scalpel on ice, transferred to a glass tissue grinder (Kimble-Chase Kontes Duall, USA), and homogenized in 1 ml of TRIzol Reagent (Life Technologies, USA). RNA was extracted according to the manufacturer’s protocol and purified with the RNeasy mini kit, including a 30-minute on-column DNase I digest (Qiagen, USA). RNA was then treated with TURBO DNase I (Ambion, Life Technologies, USA) for 30 minutes according to the manufacturer’s protocol, and phenol:chloroform:isoamyl alcohol (Affymetrix, USA) extraction was performed to remove the DNase I. RNA concentration was quantified on a Qubit fluorometer (Life Technologies, USA). Total RNA integrity was evaluated using the 2100 Bioanalyzer with the RNA 6000 kit (Agilent, USA). All RNA samples had RNA integrity number (RIN) values of 7.6–9.0.

Library Preparation and Sequencing:
Libraries for sequencing were prepared according to the TruSeq protocol (Illumina, USA). Specifically, mRNA was isolated from total RNA samples using oligo-d(T)25 magnetic beads (Dynabeads: Invitrogen, USA) and used as the template for first-strand cDNA synthesis. After double-stranded (ds) cDNA synthesis, fragment overhangs were end-repaired by incubation in the presence of T4 DNA polymerase and Klenow polymerase. The polished fragments were phosphorylated by T4 PNK, followed by the addition of a single ‘A’ base to the 3′ end of the blunt-ended phosphorylated fragments. This ‘A’ base prepared the cDNA fragments for ligation to proprietary adapter oligonucleotides (Illumina, USA) that have a ‘T’ base at their 3′ end. Ligation products were subjected to a final PCR amplification step (8–10 cycles) before library quantification and validation. Individual libraries were barcoded, and all nine samples (biological triplicates of the 0 hr, 2 hr, and 24 hr samples) were pooled for sequencing on one lane. Sequencing was carried out for 100 cycles on the Illumina HiSeq 2500 platform with paired-end 100 bp reads and a library insert size of approximately 500 bp. An average of 28.5 ± 7.5 million reads was generated per sample, for a total of 256 million reads, 25.6 billion bases, and 66.3 GB of data. Fastq files were generated using the Illumina Casava pipeline v1.8.2.
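The sequencing totals quoted above can be cross-checked with simple arithmetic. The short sketch below assumes that "reads" counts individual 100 bp reads rather than read pairs, which is the reading under which the reported base count follows from the read count; the variable names are illustrative only.

```python
# Figures taken from the description above.
N_SAMPLES = 9                    # three animals x three time points
MEAN_READS_PER_SAMPLE = 28.5e6   # reported mean; the +/- 7.5 million spread is ignored here
READ_LENGTH_BP = 100             # 100 sequencing cycles

# Assumption: each read contributes READ_LENGTH_BP bases and is counted once.
total_reads = N_SAMPLES * MEAN_READS_PER_SAMPLE
total_bases = total_reads * READ_LENGTH_BP

print(f"total reads: about {total_reads / 1e6:.1f} million")
print(f"total bases: about {total_bases / 1e9:.2f} billion")
```

Under that assumption the products land within rounding of the reported 256 million reads and 25.6 billion bases.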

Assembly:
Raw sequencing data were deposited in the NCBI Sequence Read Archive under study accession SRP045540.
Transcriptome assembly was performed using the Eel Pond mRNAseq Protocol (https://khmer-protocols.readthedocs.org/). Analysis was conducted in the cloud on an Amazon EC2 m1.xlarge instance, except for assembly, which required an m2.2xlarge instance. Downloaded reads were trimmed of sequencing adapters and poor-quality sequences using Trimmomatic v0.30 with the TruSeq3-PE adapter sequences. Reads in which fewer than 50% of bases had a quality score of at least 30 were removed using the FASTX-Toolkit v0.0.13.2. Adapter and quality filtering reduced the amount of data from 66.3 GB to 23.0 GB and improved sequence quality scores as determined by FastQC v0.10.1; the per-sequence quality score was 38 for each sample after filtering. One round of digital normalization (diginorm) was then performed on all samples to remove redundant reads, with the coverage threshold and k-mer size both set to 20. Assembly was conducted with all nine sequenced samples using Trinity v2013-11-10 with default parameters (k-mer size of 25), a maximum memory size of 30 GB, and 4 CPUs. Trinity assembled 1.6 gigabases into 522,699 transcripts and 164,966 Trinity components (“gene families”) with 50.88 percent GC content. The mean, median, and N50 contig lengths were 3117 bp, 2298 bp, and 5501 bp, respectively. For a representative sample, 92.62% of left and 92.55% of right reads mapped back to the assembled transcriptome, with 86.60% mapped as proper pairs. Assembly metrics and alignment statistics were obtained using the bowtie and samtools scripts that accompany Trinity. Specifically, bowtie was run with its default parameters: a maximum of 2 mismatches in the seed (-n 2), a seed length of 28 (-l 28), and a maximum sum of Phred quality scores at all mismatched positions of 70 (-e 70).
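Two of the quantities in this section can be illustrated in a few lines: the read-level quality filter (quality 30, 50 percent) and the N50 contig length. The sketch below is not the FASTX-Toolkit or Trinity code. It assumes the filter keeps a read when at least 50% of its bases have a Phred score of 30 or more, and it uses the common definition of N50 as the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the assembly. The function names and toy data are hypothetical.

```python
def passes_quality_filter(quals, min_quality=30, min_percent=50):
    """Return True if at least min_percent of bases have Phred score >= min_quality."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_quality)
    return 100.0 * good / len(quals) >= min_percent


def n50(lengths):
    """Length of the contig at which the running total, longest first, reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig length")


# Toy Phred scores (hypothetical): 5 of 6 bases pass in the first read, 2 of 6 in the second.
toy_quals_good = [38, 38, 35, 12, 31, 30]
toy_quals_bad = [38, 12, 10, 8, 31, 2]

# Toy contig lengths (hypothetical): 20 bases in total, and 6 + 5 = 11 covers half.
toy_lengths = [2, 3, 4, 5, 6]

# Consistency check on the reported assembly size: transcripts x mean contig length.
approx_total_bases = 522_699 * 3117

print(passes_quality_filter(toy_quals_good), passes_quality_filter(toy_quals_bad))
print(n50(toy_lengths))
print(f"approximate assembly size: {approx_total_bases / 1e9:.2f} gigabases")
```

The last line multiplies the reported transcript count by the reported mean contig length, which gives a figure close to the 1.6 gigabases stated for the assembly.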

Reference:
Khudyakov JI, Preeyanon L, Champagne CD, Ortiz RM, Crocker DE. (2015) Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights. BMC Genomics 16: 64.

Funding

NSF, NIH
