figshare

Transcript assembly and peptide sequences of Atlantic cod

Version 2 2020-10-08, 14:38
Version 1 2020-10-08, 14:33
Posted on 2020-10-08 - 14:38 authored by Xiaokang Zhang

The reference is based on a de novo Trinity [1,2] assembly because the official gene model versions did not contain full-length sequences for all genes.


The assembly consists of sequences from the following RNA-Seq data:


Gadus morhua Transcriptome or Gene expression

NCBI BioProject: PRJNA277848

https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA277848

Developmental stages: 10-dph, 20-dph, 30-dph, 45-dph, 60-dph, 90-dph


SRR2045416 brain, SRR2045417 gills, SRR2045418 heart, SRR2045419 muscle, SRR2045420 liver, SRR2045421 kidney, SRR2045422 bones, SRR2045423 intestine, SRR2045425 embryo, SRR2045415 ovary


Three cod liver samples:

GEO accession: GSE106968 [3]

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106968

Samples: dcod_12_S3, dcod_1_S1, dcod_25_S25


Reads were assembled with Trinity through the Agalma pipeline, version 0.5.0 [4]. The transcript assemblies from the different developmental stages and tissue samples were mapped to both cod genome assemblies (gadMor 1 and 2) [5,6]. Both genomes were used because each is missing some genes.


Transcripts were annotated against UniProtKB, the zebrafish ENSEMBL gene models and the medaka ENSEMBL gene models. The transcripts were then grouped into clusters based on sequence similarity. From the clusters we chose the transcript with the longest match to the genomes and the transcript with the best annotation BLAST score. Where the same transcript had both the longest genome match and the best annotation BLAST score, it was added only once. Because the transcripts are assembled from many samples, and assemblies based on RNA-Seq may contain errors, we do not know whether the differences between transcripts in a cluster reflect splice variants or assembly errors.
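
The selection rule described above can be illustrated with a minimal Python sketch. This is not the code used to build the reference: the record layout, the field names (`cluster_id`, `genome_match_len`, `blast_score`) and the example values are hypothetical, and the sketch assumes the rule is applied once per cluster, with genome match length and BLAST score already computed for every transcript.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    # All field names are hypothetical; they are not taken from the dataset.
    transcript_id: str
    cluster_id: str
    genome_match_len: int  # longest match to either genome, in bases
    blast_score: float     # best annotation BLAST score


def select_representatives(transcripts):
    """Return, per cluster, the IDs of the transcript with the longest
    genome match and the transcript with the best BLAST score."""
    clusters = defaultdict(list)
    for t in transcripts:
        clusters[t.cluster_id].append(t)

    selected = {}
    for cluster_id, members in clusters.items():
        # max() returns the first maximal element, so ties go to the
        # transcript that appears first in the input.
        longest = max(members, key=lambda t: t.genome_match_len)
        best = max(members, key=lambda t: t.blast_score)
        # dict.fromkeys drops the duplicate when one transcript wins on
        # both criteria, so it is added only once.
        ids = [longest.transcript_id, best.transcript_id]
        selected[cluster_id] = list(dict.fromkeys(ids))
    return selected


# Made-up example values, for illustration only.
example = [
    Transcript("t1", "c1", 900, 100.0),
    Transcript("t2", "c1", 500, 300.0),
    Transcript("t3", "c2", 1200, 450.0),
    Transcript("t4", "c2", 800, 200.0),
]
result = select_representatives(example)
print(result)
```

In this made-up example, cluster c1 keeps two transcripts because different members win on each criterion, while cluster c2 keeps a single transcript that wins on both.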


The aim was to make a more complete reference with full-length transcripts.


Bibliography

1 Grabherr, M.G. et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652

2 Haas, B.J. et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512

3 Yadetie, F. et al. (2018) RNA-Seq analysis of transcriptome responses in Atlantic cod (Gadus morhua) precision-cut liver slices exposed to benzo[a]pyrene and 17α-ethynylestradiol. Aquat. Toxicol. 201, 174–186

4 Dunn, C.W. et al. (2013) Agalma: an automated phylogenomics workflow. BMC Bioinformatics 14, 330

5 Star, B. et al. (2011) The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207–210

6 Tørresen, O.K. et al. (2017) An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 18, 95


FUNDING

DL: dCod 1.0 - decoding systems toxicology of cod (Gadus morhua) - environmental genomics for ecosystem quality monitoring and risk assessment

The Research Council of Norway
