figshare
8 files

Assembly and annotation for haddock with script

Version 2 2018-06-19, 10:20
Version 1 2017-07-07, 11:45
dataset
posted on 2018-06-19, 10:20 authored by Ole K. Tørresen
Approximately 160x coverage of Illumina paired-end reads and 20x coverage of PacBio reads were assembled with the Celera Assembler, resulting in a contig assembly. All Illumina reads were mapped to the contig assembly with BWA, and the scaffold module of SGA was used to scaffold the contigs. To reduce the number of gaps and improve the accuracy of the consensus sequence, all Illumina reads were then mapped to the scaffold assembly, and Pilon was run to correct the contigs using the high-coverage short-read information.
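The final map-and-polish step can be sketched as a short list of shell commands. This is a minimal, hypothetical sketch that only builds the commands without running them. It assumes `bwa mem` as the BWA mapping mode, samtools for sorting and indexing, and a local `pilon.jar`. The file names, thread count and Java heap size are placeholders, not the settings used for this dataset. The Celera Assembler and SGA scaffolding steps are deliberately left out.

```python
import shlex
from pathlib import Path
from typing import List


def polish_commands(assembly: str, reads_1: str, reads_2: str,
                    out_dir: str = "pilon_out", threads: int = 8,
                    pilon_jar: str = "pilon.jar") -> List[str]:
    """Build shell commands for one map-and-polish round (hypothetical layout)."""
    q = shlex.quote
    bam = str(Path(out_dir) / "illumina.sorted.bam")
    return [
        f"mkdir -p {q(out_dir)}",
        # Index the scaffold assembly so BWA can map reads against it.
        f"bwa index {q(assembly)}",
        # Map the paired-end Illumina reads and coordinate-sort the alignments
        # ("-" makes samtools sort read the SAM stream from stdin).
        f"bwa mem -t {threads} {q(assembly)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -",
        # Pilon needs an indexed BAM.
        f"samtools index {q(bam)}",
        # Pilon uses the paired-end ("fragment") alignments to correct bases
        # and fill gaps; the 64G heap is a placeholder assumption.
        f"java -Xmx64G -jar {q(pilon_jar)} --genome {q(assembly)}"
        f" --frags {q(bam)} --output polished --outdir {q(out_dir)}",
    ]


if __name__ == "__main__":
    for cmd in polish_commands("scaffolds.fasta", "reads_1.fq.gz", "reads_2.fq.gz"):
        print(cmd)
```

Each string can then be executed with `subprocess.run(cmd, shell=True, check=True)`. The shell is needed because of the `bwa mem | samtools sort` pipe. The design returns plain strings, so the commands can be reviewed before running them or pasted into a job script.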

An iterative automatic annotation with MAKER, using an Illumina-based transcriptome of haddock and proteins from UniProt/Swiss-Prot as evidence, produced 96,576 gene models. InterProScan was run on the predicted proteins, and gene names were assigned based on matches to proteins in UniProt/Swiss-Prot. We then created a filtered set containing only genes with an Annotation Edit Distance (AED) below 0.5. An AED of 0.0 indicates perfect agreement between the gene model and the evidence (mRNA and/or protein alignments), and 1.0 indicates no agreement. The filtered set contains 27,437 gene models.
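The AED score and the 0.5 cut-off can be illustrated with a small sketch. It is a simplified version that works at the level of individual nucleotides and is not MAKER's own code, which may differ in detail. Sensitivity (SN) is the fraction of evidence bases covered by the model, specificity (SP) is the fraction of model bases supported by evidence, and AED = 1 - (SN + SP) / 2. The filter assumes MAKER-style GFF3 output, where each `mRNA` feature carries an `_AED=` attribute. The function names and the example records are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GFF-style) coordinates


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals (e.g. exons) into the set of positions they cover."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Simplified nucleotide-level AED = 1 - (SN + SP) / 2."""
    m, e = covered_bases(model), covered_bases(evidence)
    if not m or not e:
        return 1.0  # no overlap is possible, so no agreement
    overlap = len(m & e)
    sensitivity = overlap / len(e)  # evidence bases explained by the model
    specificity = overlap / len(m)  # model bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def filter_mrnas_by_aed(gff_lines: Iterable[str], max_aed: float = 0.5) -> List[str]:
    """Return IDs of mRNA features whose `_AED` attribute is below max_aed."""
    kept: List[str] = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip GFF3 directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs and float(attrs["_AED"]) < max_aed:
            kept.append(attrs.get("ID", ""))
    return kept
```

For example, `aed([(1, 100)], [(51, 150)])` gives 0.5, because half of the model and half of the evidence overlap. This filter works at the transcript level. To get a gene-level set, map the kept mRNA IDs to their `Parent` genes.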
