A novel pipeline was employed to assemble the multinucleate, heterozygous genome of Rhizoctonia solani AG8.
(A) Genomic DNA from multinucleate cells with variable nuclear copy numbers was prepared for next-generation sequencing (NGS) as Illumina paired-end (B) and mate-paired (C) short-read libraries. The 3′ ends of read pairs from both (B) and (C) were tested for overlapping sequence, which indicates a short DNA fragment. Overlapping pairs in the mate-paired libraries (C) were discarded because they indicated paired-end contaminants that would lead to assembly errors. (D) De novo assembly was performed on the combined paired-end data: non-overlapping read pairs, plus overlapping read pairs that had been merged into longer single-end reads. (E) Redundant haplotypes, in which equivalent regions of the genome from multiple nuclei were present more than once in the assembly, were merged into a single haplotype sequence. (F) Non-overlapping mate-paired reads were used to join assembled sequences into larger scaffolds. (G) Stretches of unknown bases (polyN) in the assembly were filled where possible by aligning genomic NGS reads to the assembly, and (H) regions predicted to contain tandem-duplication errors were corrected. Steps F, G and H were repeated for several rounds to ensure complete assembly. (I) Minor assembly errors and repeat-induced point (RIP) mutations that differed between nuclei were corrected by substituting the dominant or pre-RIP allele. (J) The final RIP-depleted, haploid consensus genome assembly was manually annotated using a combination of RNA-seq and protein homology evidence, (K) producing a final dataset of 13,964 protein-coding genes.
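The 3′-overlap test applied to read pairs in (B) and (C) can be illustrated with a minimal Python sketch. This is not the software used in the pipeline; the function names (`find_overlap`, `merge_pair`, `triage_pair`), the exact-match criterion and the `min_overlap` threshold of 10 bases are assumptions chosen for illustration only. The sketch checks whether the 3′ end of the forward read matches the start of the reverse-complemented reverse read, merges overlapping paired-end reads into one longer single-end read, and flags overlapping mate-paired reads for discarding.

```python
# Illustrative sketch only: exact-match overlap detection between read pairs.
# All names and thresholds here are hypothetical, not taken from the pipeline.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_overlap(fwd, rev, min_overlap=10):
    """Length of the longest exact overlap between the 3' end of `fwd`
    and the start of the reverse-complemented `rev`; 0 if none is found."""
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rc[:k]:
            return k
    return 0


def merge_pair(fwd, rev, min_overlap=10):
    """Merge an overlapping pair into one longer read, or return None."""
    k = find_overlap(fwd, rev, min_overlap)
    if k == 0:
        return None
    return fwd + reverse_complement(rev)[k:]


def triage_pair(fwd, rev, library, min_overlap=10):
    """Decide what to do with a read pair, following the caption's logic.

    library: "paired-end" or "mate-paired" (labels assumed for this sketch).
    Returns ("keep", None), ("merge", merged_read) or ("discard", None).
    """
    merged = merge_pair(fwd, rev, min_overlap)
    if merged is None:
        return ("keep", None)      # non-overlapping pair: use as a pair
    if library == "mate-paired":
        return ("discard", None)   # likely paired-end contaminant
    return ("merge", merged)       # short fragment: use as a single read


if __name__ == "__main__":
    # Simulate a short fragment sequenced from both ends with 20-base reads.
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[-20:])
    print(triage_pair(fwd, rev, "paired-end"))
    print(triage_pair(fwd, rev, "mate-paired"))
```

Real reads contain sequencing errors, so a practical implementation would tolerate mismatches and weigh base qualities rather than require an exact match as this sketch does.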