Using long reads, optical maps and long-range scaffolding to improve the Diaphorina citri genome

<div>The current <i>Diaphorina citri</i> draft genome assembly (NCBI-Diaci_1.1) was sequenced with Illumina paired-end and mate-pair reads. Low-coverage PacBio reads were used for scaffolding, and the scaffold N50 of this 486 Mb assembly (163,023 scaffolds) is 109.8 kb. A BUSCO (Simão et al., 2015) analysis shows that a substantial fraction of conserved single-copy markers is missing: 24.9% of the 3,350 markers in the Hemipteran set. A community-driven genome annotation effort has also identified a number of misassemblies and missing genes in the current genome (Saha 2017). These problems are due, in part, to the complexity of assembling a heterogeneous sample containing DNA from multiple psyllids, and were potentially exacerbated by the use of short reads. We have generated 36.2 Gb of PacBio long reads from 41 SMRT cells, giving approximately 80X coverage of the 400-450 Mb psyllid genome. The Canu assembler (Koren 2016) was used to create an interim assembly of 8,300 contigs with a contig N50 of 115.8 kb. This assembly will be polished with PacBio and Illumina paired-end reads and then scaffolded with Illumina mate-pair reads. Chicago libraries (Putnam 2016), which use chromatin crosslinks to associate sequences originating from the same large DNA fragment, will also be used for scaffolding. BioNano optical maps generated from high-molecular-weight DNA from adult psyllid tissue will be used to provide long-range scaffolding. This will be the first time all these methods have been applied to resolve an insect genome from a highly heterogeneous sample. The new assembly is available on and Ag Data Commons.</div><div><br></div>
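<div>The two summary statistics quoted in the abstract, N50 and fold coverage, can be illustrated with a minimal sketch. The function names <code>n50</code> and <code>fold_coverage</code> and the toy contig lengths are assumptions made for illustration only; they are not taken from the authors' pipeline. N50 is taken here in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. Fold coverage is simply total sequenced bases divided by genome size.</div>

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least
    half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


def fold_coverage(total_bases, genome_size):
    """Return average fold coverage: sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Toy contig lengths (hypothetical, not from the D. citri assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)

# Figures from the abstract: 36.2 Gb of reads, 400-450 Mb genome estimate.
cov_large_genome = fold_coverage(36.2e9, 450e6)
cov_small_genome = fold_coverage(36.2e9, 400e6)
```

<div>With the figures from the abstract, 36.2 Gb of reads corresponds to roughly 80X coverage if the genome is 450 Mb and roughly 90X if it is 400 Mb, so the quoted 80X matches the upper end of the genome size estimate.</div>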