Transcript assembly and peptide sequences of Atlantic cod
The reference is based on a de novo Trinity [1,2] assembly because the official gene model releases did not contain full-length sequences for all genes.
The assembly consists of sequences from the following RNA-Seq data:
Gadus morhua Transcriptome or Gene expression
NCBI project : PRJNA277848
https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA277848
Developmental stages: 10-dph, 20-dph, 30-dph, 45-dph, 60-dph, 90-dph
SRR2045416 brain, SRR2045417 gills, SRR2045418 heart, SRR2045419 muscle, SRR2045420 liver, SRR2045421 kidney, SRR2045422 bones, SRR2045423 intestine, SRR2045425 embryo, SRR2045415 ovary
Three cod liver samples:
GEO accession: GSE106968 [3]
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106968
Samples: dcod_12_S3, dcod_1_S1, dcod_25_S25
Reads were assembled using Trinity through the Agalma pipeline version 0.5.0 [4]. The transcript assemblies from the different stages and tissue samples were mapped to both cod genomes (gadMor 1 and 2) [5,6], because each genome is missing some genes.
Transcripts were annotated with UniProtKB and with the zebrafish and medaka Ensembl gene models. The transcripts were grouped into clusters based on similarity. From each cluster we chose the transcript with the longest match to the genomes and the transcript with the best annotation BLAST score. When the same transcript had both the longest match to the genome and the best annotation BLAST score, it was added only once. Because the transcripts are assembled from many samples, and assemblies based on RNA-Seq may contain errors, we do not know whether the differences between transcripts in a cluster are splice variants or assembly errors.
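The per-cluster selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to build the reference: the `Transcript` record, its field names, and the function `select_representatives` are hypothetical, and the description does not say which BLAST statistic was used or how ties were resolved, so the sketch assumes a higher-is-better score and keeps the first transcript encountered on a tie.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # All field names are hypothetical; they are not taken from the dataset.
    transcript_id: str
    cluster_id: str
    genome_match_length: int  # longest match to either cod genome
    blast_score: float        # annotation BLAST score, assumed higher is better


def select_representatives(transcripts):
    """Return {cluster_id: [transcript_id, ...]} with one or two picks per cluster."""
    clusters = defaultdict(list)
    for t in transcripts:
        clusters[t.cluster_id].append(t)

    selected = {}
    for cluster_id, members in clusters.items():
        # max() keeps the first maximal element, so ties go to the earliest transcript.
        longest = max(members, key=lambda t: t.genome_match_length)
        best = max(members, key=lambda t: t.blast_score)
        picks = [longest.transcript_id]
        # Add the best-annotated transcript only if it is a different transcript.
        if best.transcript_id != longest.transcript_id:
            picks.append(best.transcript_id)
        selected[cluster_id] = picks
    return selected


# Made-up example values, for illustration only.
example = [
    Transcript("t1", "c1", 1200, 310.0),
    Transcript("t2", "c1", 900, 450.0),
    Transcript("t3", "c2", 800, 200.0),
    Transcript("t4", "c2", 500, 150.0),
]
representatives = select_representatives(example)
```

In this made-up example, cluster `c1` contributes two transcripts (one per criterion), while cluster `c2` contributes one, because the same transcript wins on both criteria.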
The aim was to make a more complete reference with full-length transcripts.
Bibliography
1 Grabherr, M.G. et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652
2 Haas, B.J. et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512
3 Yadetie, F. et al. (2018) RNA-Seq analysis of transcriptome responses in Atlantic cod (Gadus morhua) precision-cut liver slices exposed to benzo[a]pyrene and 17α-ethynylestradiol. Aquat. Toxicol. 201, 174–186
4 Dunn, C.W. et al. (2013) Agalma: an automated phylogenomics workflow. BMC Bioinformatics 14, 330
5 Star, B. et al. (2011) The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207–210
6 Tørresen, O.K. et al. (2017) An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 18, 95
FUNDING
DL: dCod 1.0 - decoding systems toxicology of cod (Gadus morhua) - environmental genomics for ecosystem quality monitoring and risk assessment
The Research Council of Norway
REFERENCES
- https://doi.org/10.1101/2020.06.23.162792
- https://www.nature.com/articles/nbt.1883
- https://www.nature.com/articles/nprot.2013.084
- https://www.sciencedirect.com/science/article/pii/S0166445X18305216
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-330
- https://www.nature.com/articles/nature10342
- https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3448-x