Assemblies_and_data.zip

Download (4.28 GB)
dataset
Posted on 2025-01-13, 11:49; authored by Monika Wisniewska, Jiří Kyslík, Gema Alama-Bermejo, Alena Lövy, Martin Kolisko, Astrid S. Holzer, Anush Kosakyan

All datasets used for the assemblies, annotations, differential expression analysis, enrichment analysis, and identification of variant surface proteins (VSPs).


The Assemblies directory contains: raw Trinity assemblies (genome-guided: Trinity_genome_guided_assembly.fasta, de novo: Trinity_de_novo_assembly.fasta), assemblies cleaned based on taxonomy (genome-guided: Trinity_genome_guided_assembly_cleaned.fasta, de novo: Trinity_de_novo_assembly_cleaned.fasta), a file mapping the original headers of the genome-guided assembly to the simplified ones (Trinity_genome_guided_assembly_headers.map), predicted proteins for the genome-guided assembly (Trinity_genome_guided_assembly_transdecoder.pep), and an Excel spreadsheet with the combined annotations for the cleaned genome-guided assembly (Genome_guided_Trinity_assembly_annotations_summary.xlsx).
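As a quick sanity check after unpacking, the assembly FASTA files can be summarized with a few lines of Python. This is a minimal sketch using only the standard library; the helper names read_fasta and summarize are hypothetical, and the demo records are made up rather than taken from the dataset.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piecewise.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> Dict[str, int]:
    """Return the sequence count, total length, and longest sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    return {
        "n_sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Made-up two-record example, not data from the archive.
demo = io.StringIO(">t1 example\nACGT\nAC\n>t2\nGGG\n")
demo_summary = summarize(demo)
```

To apply it to a real file, open it in text mode and pass the handle, for example summarize(open("Assemblies/Trinity_de_novo_assembly_cleaned.fasta")); the path inside the unpacked archive is an assumption here.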

The cleaned_reads directory contains: reads used for the genome-guided and de novo assemblies.

The DE directory contains: a count matrix computed with featureCounts from the Subread R package (featureCounts_counts_matrix_for_DE_analysis.txt) and used for the differential expression analysis; a table of the differentially expressed genes with their fold-change values, annotations with assigned categories, and an indication of whether each gene is specific to S. molnari (DEGs_annotations_categories.xlsx); and a table (pathogenicity_related_transcripts.pdf) with the corresponding FASTA file (pathogenicity_related_transcripts.fas) of genes classified as pathogenicity-related.
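The count matrix can be loaded for inspection roughly as follows. This sketch assumes the file keeps the usual featureCounts layout: tab-separated, an optional comment line starting with "#", a Geneid column, the annotation columns Chr, Start, End, Strand and Length, then one column per sample. If the deposited file was reshaped before upload, the column handling would need adjusting. The function names and the demo table are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Annotation columns that featureCounts normally writes before the samples.
ANNOTATION_COLUMNS = ("Chr", "Start", "End", "Strand", "Length")


def read_featurecounts(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return (sample names, {gene id: counts per sample}) from a count table."""
    # Skip comment lines such as the "# Program:featureCounts ..." line.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Keep every column except the gene id (first) and the annotation columns.
    keep = [i for i, name in enumerate(header)
            if i > 0 and name not in ANNOTATION_COLUMNS]
    samples = [header[i] for i in keep]
    counts = {}
    for row in rows:
        if not row:
            continue
        # float() also accepts fractional counts, should the file contain any.
        counts[row[0]] = [float(row[i]) for i in keep]
    return samples, counts


def library_sizes(samples: List[str], counts: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the counts of all genes for each sample."""
    totals = [0.0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))


# Made-up table in the assumed layout, not data from the archive.
demo = io.StringIO(
    "# Program:featureCounts (made-up demo line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1\ts2\n"
    "g1\tc1\t1\t100\t+\t100\t10\t0\n"
    "g2\tc1\t200\t300\t-\t101\t5\t7\n"
)
demo_samples, demo_counts = read_featurecounts(demo)
demo_sizes = library_sizes(demo_samples, demo_counts)
```

Comparing the per-sample totals is a cheap way to spot a truncated download or a mislabeled column before running any differential expression tool.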

The S_molnari_unique_genes directory contains an Excel spreadsheet (S_molnari_unique_genes.xlsx) with the identifiers and sequences of the genes identified as unique to S. molnari.

The VSPs directory contains: the initial FASTA file with the selected putative VSP protein sequences (VSPs.faa), the alignment computed using mafft-linsi (VSPs.aln), and the manually trimmed alignment showing the transmembrane domain and the N-terminus motif (VSPs.trim).


Funding

Czech Science Foundation number 20-30321Y

Czech Science Foundation number 19-25536Y

Czech Science Foundation number 19-28399X

Centre for Research of Pathogenicity and Virulence of Parasites number CZ.02.1.01/0.0/0.0/16_019/0000759
