
18S and ITS read separation file and database

dataset
posted on 2024-12-12, 05:41 authored by NEIL YOUNG, Lucas Huggins
<p dir="ltr">PBlat Read Separation</p><p><br></p>
<p dir="ltr">Sequencing data were demultiplexed using MinKNOW, and the fastq files for each barcode were concatenated prior to downstream analysis.</p>
<p dir="ltr">Because the 18S rDNA and ITS1-to-ITS2 sequences were pooled and given the same barcode for each sample analysed, the reads first had to be separated using pblat (M. Wang & Kong, 2019) and seqtk (https://github.com/lh3/seqtk).</p>
<p dir="ltr">To separate and bin the 18S rDNA and ITS sequences into different files, a database of nematode 18S rDNA sequences was built using only the region of the 18S rDNA targeted by our primers.</p>
<p dir="ltr">Using a pblat minimum score value of 50, the reads from each barcode sequencing file were compared to our pblat 18S rDNA database. Reads that matched were extracted to form an 18S rDNA sequence file, whilst the remaining sequences were used to form an ITS1-to-ITS2 sequence file.</p><p><br></p>
<p dir="ltr">The code used is:</p><p><br></p>
<p dir="ltr">conda activate pblat</p>
<p dir="ltr">DIR=/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/MinION_Nemabiome_Trial-2_Trich-Capi-Strongy/pass/looping_test_on_nemabiome_trial-2</p>
<p dir="ltr">REF=18S_ref_seqs_for_Pblat_v2.fasta</p>
<p dir="ltr">for i in {01..75};</p>
<p dir="ltr">do</p>
<p dir="ltr"># Convert this barcode's reads to FASTA for pblat</p>
<p dir="ltr">conda run -n seqtk seqtk seq -A "${DIR}/barcode${i}-allfiles.fastq.gz" > tmp.fa</p>
<p dir="ltr"># Align the reads against the 18S rDNA reference database</p>
<p dir="ltr">conda run -n pblat pblat -noHead -minScore=50 -threads=48 "${REF}" tmp.fa barcode${i}.tmp.pblat.psl</p>
<p dir="ltr"># Count the unique reads with any hit (PSL column 10 is the read name)</p>
<p dir="ltr">awk -F"\t" '{print $10 }' barcode${i}.tmp.pblat.psl | sort | uniq | wc -l > barcode${i}.count.txt</p>
<p dir="ltr"># Keep the names of reads whose PSL column 1 (matches) is at least 50</p>
<p dir="ltr">awk -F"\t" '{if ($1>=50) print $10}' barcode${i}.tmp.pblat.psl | sort | uniq > barcode${i}.tmp.pblat.header.txt</p>
<p dir="ltr"># Extract the 18S reads by name</p>
<p dir="ltr">conda run -n seqtk seqtk subseq "${DIR}/barcode${i}-allfiles.fastq.gz" barcode${i}.tmp.pblat.header.txt > barcode${i}.tmp.pblat.fastq</p>
<p dir="ltr"># Write every remaining read (ITS and other non-18S) to a separate file</p>
<p dir="ltr">seqkit grep -v -f <(seqkit seq -ni barcode${i}.tmp.pblat.fastq) "${DIR}/barcode${i}-allfiles.fastq.gz" > barcode${i}.tmp.inverse.pblat.fastq</p>
<p dir="ltr">done</p><p><br></p>
<p dir="ltr">NOTE:</p>
<p dir="ltr">barcode[#].tmp.pblat.fastq = fastq file of 18S reads</p>
<p dir="ltr">barcode[#].tmp.inverse.pblat.fastq = fastq file of ITS reads and other non-18S reads</p>
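
The read-name filtering step can be tried without pblat, seqtk or seqkit installed. The sketch below is illustrative only and is not part of the deposited workflow: it builds a toy headerless PSL table and a toy list of read names, then splits the names into an 18S set and a remainder using the same awk test on PSL column 1 (matches) and column 10 (read name). The helper `psl_row`, the read names `read_A` to `read_D`, the reference name `ref_18S` and all numeric values are hypothetical.

```shell
set -eu

workdir=$(mktemp -d)

# Hypothetical helper: print one 21-column, tab-separated PSL record.
# $1 = matches (column 1), $2 = read name (column 10); the rest is filler.
psl_row() {
  printf '%s\t0\t0\t0\t0\t0\t0\t0\t+\t%s\t150\t0\t100\tref_18S\t900\t0\t100\t1\t100,\t0,\t0,\n' "$1" "$2"
}

# Toy alignments: read_A has two hits, read_B falls below the cut-off,
# and read_D (listed further down) has no hit at all.
{
  psl_row 120 read_A
  psl_row 95  read_A
  psl_row 30  read_B
  psl_row 60  read_C
} > "$workdir/toy.psl"

# Toy stand-in for the names of all reads in one barcode file.
printf '%s\n' read_A read_B read_C read_D > "$workdir/all_ids.txt"

# Same test as in the workflow: keep read names with at least 50 matches.
awk -F'\t' '$1 >= 50 { print $10 }' "$workdir/toy.psl" | sort -u > "$workdir/hits.txt"

# Everything not in the 18S list forms the remainder (ITS and other reads).
grep -vxFf "$workdir/hits.txt" "$workdir/all_ids.txt" > "$workdir/rest.txt" || true

hits=$(cat "$workdir/hits.txt")
rest=$(cat "$workdir/rest.txt")
n_all=$(wc -l < "$workdir/all_ids.txt")
n_hits=$(wc -l < "$workdir/hits.txt")
n_rest=$(wc -l < "$workdir/rest.txt")

echo "18S read names:   $(tr '\n' ' ' < "$workdir/hits.txt")"
echo "other read names: $(tr '\n' ' ' < "$workdir/rest.txt")"
echo "total=$((n_all)) 18S=$((n_hits)) other=$((n_rest))"
```

Here `read_A` and `read_C` land in the 18S list, while `read_B` and `read_D` fall into the remainder. The same check on real data (the 18S and non-18S read counts adding up to the total for a barcode) is a quick way to confirm that no reads were lost or duplicated during separation.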

    University of Melbourne
