Version 3 2016-01-12, 09:14
Version 2 2016-01-12, 02:17
Version 1 2016-01-12, 02:15
dataset
Posted on 2016-01-12, 02:15. Authored by Jane Khudyakov, Likit Preeyanon, Cory Champagne, Rudy Ortiz, Daniel Crocker
Raw assembly of northern elephant seal (Mirounga angustirostris) skeletal muscle transcriptome.
Experimental Design: Skeletal muscle
tissue for transcriptome sequencing was collected from three anesthetized juvenile
northern elephant seals at three time points during an acute stress
challenge experiment: before 0.25 U/kg ACTH injection (“0 hr”), 2 hours after
injection (“2 hr”), and 24 hours (“24 hr”) after injection.
RNA Isolation: Tissue samples were stored at −80 °C until extraction. In the
laboratory, 75–165 mg of muscle tissue were minced with a scalpel on
ice, transferred to a glass tissue grinder (Kimble-Chase Kontes Duall,
USA), and homogenized with 1 ml of TRIzol Reagent (Life Technologies,
USA). RNA was extracted according to the manufacturer’s protocol and
purified with the RNeasy mini kit including a 30-minute on-column DNase I
digest (Qiagen, USA). RNA was then treated with TURBO DNase I (Ambion, Life
Technologies, USA) for 30 minutes according to the manufacturer’s protocol.
Phenol:chloroform:isoamyl alcohol (Affymetrix, USA) extraction was
performed to remove DNase I. RNA concentration was quantified on a Qubit
fluorometer (Life Technologies, USA). Total RNA integrity was evaluated using a 2100 Bioanalyzer
with the RNA 6000 kit (Agilent, USA). All RNA samples had RNA integrity number (RIN) values of 7.6–9.0.
Library Preparation and Sequencing: Libraries for sequencing were prepared according to the TruSeq protocol
(Illumina, USA). Specifically, mRNA was isolated from total RNA samples
using oligo-d(T)25 magnetic beads (Dynabeads: Invitrogen, USA) and used
as template for first-strand cDNA synthesis. After double-stranded (ds)
cDNA synthesis, overhang fragments were end-repaired by incubation in
the presence of T4 DNA polymerase and Klenow polymerase. The polished
fragments were phosphorylated by T4 PNK, followed by the addition of a
single ‘A’ base to the 3′ end of the blunt-ended phosphorylated
fragments. This ‘A’ base prepared the cDNA fragments for ligation to
proprietary adapter oligonucleotides (Illumina, USA) that have a ‘T’
base at their 3′ end. Ligation products were subjected to a final PCR
amplification step (8–10 cycles) before library quantification and
validation. Individual libraries were barcoded, and all nine
samples (biological triplicates of 0 hr, 2 hr, and 24 hr samples) were
pooled for sequencing on one lane. Sequencing was carried out for
100 cycles on the Illumina HiSeq 2500 platform with paired-end 100 bp
reads and a library insert size of approximately 500 bp. The average
number of reads generated per sample was 28.5 ± 7.5 million. In total, the run produced 256 million reads, 25.6 billion bases, and
66.3 GB of data. Fastq files
were generated using the Illumina Casava pipeline v1.8.2.
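
The sequencing totals reported above are consistent with one another, and a quick calculation makes that explicit. The sketch below is only illustrative arithmetic on the figures given in this description; the variable names are made up for the example, and it assumes that every read is exactly 100 bp, the stated read length.

```python
# Illustrative check of the sequencing totals reported above.
# Assumption: every read is exactly 100 bp (the stated read length).
N_SAMPLES = 9                  # three animals x three time points
TOTAL_READS = 256_000_000      # total reads reported for the lane
READ_LENGTH_BP = 100

total_bases = TOTAL_READS * READ_LENGTH_BP
mean_reads_per_sample = TOTAL_READS / N_SAMPLES

print(f"total bases: {total_bases / 1e9:.1f} billion")
print(f"mean reads per sample: {mean_reads_per_sample / 1e6:.1f} million")
```

The total comes out at 25.6 billion bases, matching the reported figure. The per-sample mean computed from the rounded total is about 28.4 million, which agrees with the reported 28.5 million to within the rounding of the 256 million total.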
Assembly: Raw sequencing data were deposited at NCBI Sequence
Read Archive under study accession [SRP045540]. Transcriptome assembly was performed using the Eel Pond mRNAseq Protocol (https://khmer-protocols.readthedocs.org/).
Analysis was conducted in the cloud using an Amazon EC2 x-large machine
(m1.xlarge), except for assembly, for which a 2x-large machine
(m2.2xlarge) was required. Downloaded reads were trimmed of sequencing
adapters and poor quality sequences using Trimmomatic v0.30
with TruSeq3-PE adapter sequences. Reads in which fewer than 50% of bases
had a quality score of at least 30 were removed using FASTX-Toolkit
v0.0.13.2. Sequence quality was evaluated by FastQC v0.10.1;
per-sequence quality score was 38 for each sample after adapter and
quality filtering. One round of digital normalization
(diginorm) was performed on all samples to remove redundant reads, with the coverage cutoff
and k-mer size both set to 20. Adapter and quality filtering reduced the amount of
data from 66.3 GB to 23.0 GB and improved sequence quality scores as
determined by FastQC. Assembly was conducted with all nine
sequenced samples using Trinity v2013-11-10
with default parameters (k-mer size of 25) and maximum memory size set
to 30 GB with 4 CPUs. Trinity assembled 1.6 gigabases into 522,699 transcripts and 164,966
Trinity components (“gene families”) with 50.88 percent GC content. The
mean, median and N50 contig lengths were 3117 bp, 2298 bp, and 5501 bp,
respectively. For a representative sample, 92.62% of left and 92.55% of right
reads could be mapped back to the assembled transcriptome, with 86.60%
of reads mapped as proper pairs. Assembly metrics and alignment statistics were
obtained using the bowtie and samtools scripts that accompany Trinity.
Specifically, bowtie was run using its default parameters: a maximum of 2
mismatches in the seed (-n 2), a seed length of 28 (-l 28), and a maximum total of 70 for the Phred
quality scores at all mismatched positions throughout the alignment
(-e 70).
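
For readers unfamiliar with the contig length statistics reported for the assembly (mean, median, and N50), the following minimal sketch shows how they are commonly computed from a list of contig lengths. It is not the Trinity script used for this dataset; the function name n50 and the toy lengths are hypothetical and serve only as an illustration.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not taken from this assembly).
toy_lengths = [100, 200, 300, 400, 500]

print("mean:", mean(toy_lengths))
print("median:", median(toy_lengths))
print("N50:", n50(toy_lengths))
```

Because N50 weights contigs by the bases they contain, it is typically larger than the mean or median, as in the values reported here (5501 bp versus 3117 bp and 2298 bp). As a rough consistency check on the reported figures, 522,699 transcripts at a mean length of 3117 bp amounts to about 1.63 billion bases, in line with the stated 1.6 gigabases.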
Reference: Khudyakov JI, Preeyanon L, Champagne CD, Ortiz RM, Crocker DE. (2015) Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights. BMC Genomics 16: 64.