Appendix 2-figure 2: Alat1.2 Blobtools results
Dataset modified on 2018-08-13, 17:48
Potential contaminants in Alat1.2 were identified using the blobtools toolset (v.1.0). First, scaffolds were compared to known sequences by performing a blastn (v2.5.0+) nucleotide sequence similarity search against the NCBI nt database and a diamond (v.0.9.10) translated nucleotide sequence similarity search against the UniProt reference proteomes (July 2017). Using this similarity information, scaffolds were taxonomically annotated with blobtools (parameters “-x bestsumorder”). We also inspected read coverage by mapping the paired-end reads (FFGPE_PE200) to the genome using bowtie2. Contigs derived from potential contaminants and/or of poor quality were then removed: these were contigs with high GC content (>50%) that had bacterial hits or no database hits and that showed low read coverage (<30x) (see Supplementary Figure 18.104.22.168). This process removed 1925 scaffolds (1.17 Mbp), representing 26.3% of the scaffolds and 1.3% of the nucleotides of Alat1.2, and produced the final filtered assembly, dubbed Alat1.3.
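The removal rule described above can be sketched in Python as follows. This is a minimal illustration, not the script used for Alat1.3: the `Scaffold` record, its field names, and the function names are hypothetical, and the rule is read here as "GC above 50%, and a bacterial hit or no hit, and coverage below 30x", which is one plausible interpretation of the text. Per-scaffold GC, coverage, and taxonomic assignment are assumed to have been extracted beforehand from the blobtools output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Scaffold:
    """Hypothetical per-scaffold summary (field names are assumptions)."""
    name: str
    gc_percent: float            # GC content in percent
    coverage: float              # mean read coverage (x)
    superkingdom: Optional[str]  # taxonomic assignment; None means no database hit


def is_contaminant(s: Scaffold,
                   gc_cutoff: float = 50.0,
                   cov_cutoff: float = 30.0) -> bool:
    """Assumed reading of the rule: high GC, bacterial or no hit, low coverage."""
    bacterial_or_no_hit = s.superkingdom is None or s.superkingdom == "Bacteria"
    return (s.gc_percent > gc_cutoff
            and bacterial_or_no_hit
            and s.coverage < cov_cutoff)


def filter_assembly(scaffolds: List[Scaffold]) -> Tuple[List[Scaffold], List[Scaffold]]:
    """Split scaffolds into (kept, removed) according to is_contaminant."""
    kept = [s for s in scaffolds if not is_contaminant(s)]
    removed = [s for s in scaffolds if is_contaminant(s)]
    return kept, removed
```

Under this reading, all three conditions must hold for a scaffold to be removed, so a high-GC scaffold with a eukaryotic hit, or a bacterial-hit scaffold with coverage at or above 30x, would be retained.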