Transcript and genome assemblies of Atlantic cod
Described here: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3448-x
NEWB454: An assembly based on 454 sequencing reads and sequenced BAC-ends. Assembled with Newbler 3.0.
CA454ILM: An assembly based on Illumina and 454 sequencing reads. Assembled with Celera Assembler.
CA454PB: An assembly based on Illumina, 454 and PacBio sequencing reads. Assembled with Celera Assembler.
ALPILM: An assembly based on Illumina reads. Assembled with ALLPATHS-LG.
gadMor2: A reconciliation of the NEWB454, ALPILM, CA454PB and CA454ILM assemblies, placed into linkage groups.
trinity_cod_rna: A Trinity assembly of Illumina RNA reads from 10.1016/j.redox.2015.06.003.
newbler_cod_rna: A Newbler assembly of 454 and Sanger reads from 10.1038/nature10342 and Sanger reads from 10.1186/1471-2164-13-443.
isoseq_cod_rna: An IsoSeq clustering of PacBio RNA reads.
te_comprehensive_cod_replib: The repeat library used for annotating TEs.
repeatmodeler_cod_replib: The repeat library used for annotation.
gadMor2_predicted_*_filtered: The predicted proteins/transcripts, filtered to those with an AED (Annotation Edit Distance) of less than 0.5.
gadMor2_predicted_transcripts_all: All predicted proteins/transcripts.
gadMor2_annotation_filtered_only_gene_models: Only the gene models from the annotation, restricted to genes with an AED of less than 0.5.
gadMor2_annotation_filtered: Only genes with an AED of less than 0.5, plus everything else (exonerate, blast, repeats etc).
gadMor2_annotation_complete: Everything MAKER outputs.
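The AED cutoff used for the filtered files above can be illustrated with a minimal Python sketch. It assumes the annotation is a GFF3 file in which mRNA features carry an `_AED=` attribute in the ninth column, as MAKER output normally does. The function names, the `LG01` sequence name and the two sample records are hypothetical and are not taken from the dataset. It is not necessarily how the filtered files here were produced.

```python
import re

# Matches the _AED attribute only (not _eAED) in a GFF3 attribute column.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def parse_aed(attributes):
    """Return the _AED value from a GFF3 attribute string, or None if absent."""
    match = AED_PATTERN.search(attributes)
    return float(match.group(1)) if match else None


def filter_mrna_by_aed(gff_lines, max_aed=0.5):
    """Yield mRNA feature lines whose _AED is strictly below max_aed."""
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(fields) != 9 or fields[2] != "mRNA":
            continue
        aed = parse_aed(fields[8])
        if aed is not None and aed < max_aed:
            yield line.rstrip("\n")


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "LG01\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1;_AED=0.12;_eAED=0.12",
    "LG01\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=mrna2;Parent=gene2;_AED=0.73;_eAED=0.70",
]
kept = list(filter_mrna_by_aed(example))
```

This sketch only selects the mRNA lines themselves. Producing a complete filtered annotation would also require keeping the parent gene and the child exon/CDS features of each retained transcript.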