Table S1 - Genomic sequencing library statistics

2018-02-23T01:16:53Z (GMT) by Timothy Fallon, Sarah Sander Lower

The level of non-eukaryote contamination in the raw read data for each P. pyralis library was assessed with kraken v1.0, using a dust-masked minikraken database to avoid comparisons against repetitive sequences. Overall contamination levels were low, in agreement with the low level of contamination in our final assembly. On average, contamination was 3.5% in the PacBio reads (whole body) and 1.6% in the Illumina reads (thorax only). There was no support for Wolbachia in any of the P. pyralis libraries, with the exception of a single read from a single library that had a kraken hit to Wolbachia.

QUAST v4.3 was used to calculate genome quality statistics for comparing and optimizing assembly methods.

BUSCO (v3.0.2) was used to estimate the percentage of expected single-copy conserved orthologs captured in our assemblies and in a subset of previously published beetle genome assemblies. The endopterygota_odb9 (metamorphosing insects) BUSCO set was used for this purpose. The bacteria_odb9 gene set was used to identify potential contaminants by screening contigs and scaffolds for conserved bacterial genes. For genome predictions from beetles, the parameter “--species tribolium2012” was used to improve the BUSCO internal Augustus gene predictions. For non-beetle insect genome predictions, “--species=fly” was used.
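As an illustration of how a per-library contamination percentage like those reported above could be derived, the following minimal Python sketch parses report text in the six-column, tab-separated layout produced by kraken v1's kraken-report (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, name). It is not the authors' script. The function names are hypothetical, the example report contains invented toy counts rather than data from this study, and treating "all classified reads outside Eukaryota" as contamination is an assumption made for the sketch.

```python
from typing import Dict

# NCBI taxonomy IDs used by the sketch.
UNCLASSIFIED_TAXID = 0   # kraken-report lists unclassified reads under taxid 0
ROOT_TAXID = 1           # root of the taxonomy: all classified reads
EUKARYOTA_TAXID = 2759   # Eukaryota


def clade_reads_by_taxid(report_text: str) -> Dict[int, int]:
    """Map each taxid to its clade-level read count (column 2 of the report)."""
    counts: Dict[int, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[int(fields[4])] = int(fields[1])
    return counts


def non_eukaryote_percent(report_text: str) -> float:
    """Percentage of all reads classified to a taxon outside Eukaryota.

    Assumption of this sketch: every classified read that does not fall
    under Eukaryota is counted as potential contamination.
    """
    counts = clade_reads_by_taxid(report_text)
    classified = counts.get(ROOT_TAXID, 0)
    total = classified + counts.get(UNCLASSIFIED_TAXID, 0)
    if total == 0:
        raise ValueError("report contains no reads")
    non_eukaryote = classified - counts.get(EUKARYOTA_TAXID, 0)
    return 100.0 * non_eukaryote / total


# Invented toy report (1000 reads); NOT data from the P. pyralis libraries.
EXAMPLE_REPORT = "\n".join([
    " 97.50\t975\t975\tU\t0\tunclassified",
    "  2.50\t25\t0\t-\t1\troot",
    "  0.50\t5\t5\tD\t2759\t  Eukaryota",
    "  1.90\t19\t19\tD\t2\t  Bacteria",
    "  0.10\t1\t1\tD\t10239\t  Viruses",
])

if __name__ == "__main__":
    print(f"{non_eukaryote_percent(EXAMPLE_REPORT):.1f}% non-eukaryote reads")
```

Averaging this value across the libraries of one sequencing technology would give a summary figure comparable in kind to the averages quoted above, provided the same definition of contamination is applied to every library.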