figshare

Transcriptome and genome assemblies of Blasia pusilla.

dataset
posted on 2024-06-19, 07:10 authored by Peter Szoevenyi

Transcriptome data set:

We combined all RNA-seq data (PRJNA1099914: SRA submissions SUB14374705, SUB14439959, SUB14439962) and the draft genome sequence to create reference transcripts for B. pusilla, which were later used for gene expression estimation. We generated a genome-guided as well as a de novo Trinity transcriptome assembly using all collected RNA-seq reads (42 RNA libraries). We used the PASA pipeline to combine the de novo and genome-guided assemblies into a non-redundant set of transcripts and putative genes. We ran all transcripts through TransDecoder (https://github.com/TransDecoder/TransDecoder) to obtain their best ORF and peptide translation. To reduce the number of potential transcripts and gene models, we discarded all putative gene IDs without at least one complete ORF prediction. To identify potential contaminants in this filtered transcript file, we selected the longest ORF for each putative gene, searched these ORFs against the eggNOG database in TRAPID 2.0, and assessed their taxonomic assignment. The initial run indicated that most of the contamination came from yeast. To remove these transcripts, we searched all transcripts against the full transcriptome of Saccharomyces cerevisiae S288C (assembly R64) using blastn version 2.12.0+ and discarded all transcripts meeting the following filtering criteria: e-value ≤ 0.0001, similarity ≥ 90%, and query coverage ≥ 80%. (The yeast transcript sequences were downloaded from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_rna.fna.gz.) After that, we ran a further taxonomic assignment using TRAPID 2.0 and removed another, smaller set of genes assigned to Viruses, Bacteria, Archaea, Opisthokonta, and other Eukaryota. The final assembly contains 22,635 genes represented by 156,239 transcripts.
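The yeast filtering step described above can be illustrated with a minimal sketch. It assumes that blastn was run with a tabular output such as `-outfmt "6 qseqid sseqid pident evalue qcovs"` and that the "similarity" threshold corresponds to percent identity (`pident`). Both are assumptions for illustration, as are the function name and the example rows; the description does not state the output format actually used.

```python
import csv
import io


def yeast_like_ids(blast_tsv, max_evalue=1e-4, min_identity=90.0, min_qcov=80.0):
    """Return query IDs with at least one hit passing all three thresholds.

    Assumed column order (hypothetical): qseqid, sseqid, pident, evalue, qcovs.
    """
    flagged = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, _sseqid, pident, evalue, qcovs = row[:5]
        if (
            float(evalue) <= max_evalue
            and float(pident) >= min_identity
            and float(qcovs) >= min_qcov
        ):
            flagged.add(qseqid)
    return flagged


# Made-up example rows, not taken from the dataset.
example = io.StringIO(
    "tx1\tyeast_a\t95.2\t1e-50\t92\n"  # passes all three thresholds
    "tx2\tyeast_b\t88.0\t1e-30\t95\n"  # identity below 90%
    "tx3\tyeast_c\t97.0\t1e-20\t60\n"  # query coverage below 80%
)
print(sorted(yeast_like_ids(example)))
```

Transcripts whose IDs are returned would be the ones discarded as likely yeast contamination; all others are retained for the second TRAPID 2.0 screen.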


Genome:

Gametophytic cultures of B. pusilla grown in vitro on MS medium were used for genomic DNA extraction with the plant-EZ DNA extraction kit, and the DNA was sequenced using PacBio SMRT technology at the Institute of Biotechnology, University of Helsinki, Finland. Reads that mapped to organelle or cyanobacterial genomes with BLASR were removed. In addition to the PacBio sequencing, short-read sequencing was done to support hybrid genome assembly and polishing. Short reads were trimmed using fastp and assembled together with the filtered PacBio reads using the hybrid genome assembler MaSuRCA. The hybrid genome assembly was polished with the cleaned short reads using Pilon with a minimum read depth of 10. Finally, we used blobtools v.1.1.1, the NCBI nr database, and the average coverage (Illumina reads) of each scaffold to remove scaffolds of contaminant origin. After visual assessment, we kept scaffolds with a taxonomic assignment of Viridiplantae or Streptophyta. Scaffolds with other taxonomic assignments were discarded.
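The final taxonomy-based selection of scaffolds can be sketched as follows. This is only an illustration of the keep/discard rule: it assumes the taxonomic assignment of each scaffold has already been extracted into a simple mapping, and it does not reproduce the visual assessment or the blobtools output format. The function name, scaffold names, and assignments are hypothetical.

```python
KEEP_TAXA = {"Viridiplantae", "Streptophyta"}


def split_scaffolds(assignments, keep_taxa=KEEP_TAXA):
    """Split scaffold IDs into (kept, discarded) by taxonomic assignment."""
    kept, discarded = [], []
    for scaffold, taxon in assignments.items():
        if taxon in keep_taxa:
            kept.append(scaffold)
        else:
            discarded.append(scaffold)
    return kept, discarded


# Made-up assignments, not taken from the dataset.
assignments = {
    "scaffold_1": "Streptophyta",
    "scaffold_2": "Proteobacteria",
    "scaffold_3": "Viridiplantae",
    "scaffold_4": "no-hit",
}
kept, discarded = split_scaffolds(assignments)
print(kept)
print(discarded)
```

In this sketch, scaffolds without any hit are discarded along with those assigned to other taxa, which matches the rule as stated above; whether unassigned scaffolds were handled differently in practice is not specified.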
