Title: The Pan-genome of the Cultivated Soybean (PanSoy) Reveals an Extraordinarily Conserved Gene Content

Abstract: Studies on structural variation in plants have shown that a single reference genome cannot adequately represent an entire species. They suggest that a species-representative genome, called a pan-genome, is needed to better capture the extent of both structural and nucleotide variation. Here, we present a pan-genome of cultivated soybean (Glycine max), termed PanSoy, constructed from de novo genome assemblies of 204 phylogenetically and geographically representative improved accessions selected from the larger GmHapMap collection. PanSoy uncovers 108 Mb (~11%) of novel nonreference sequence encompassing 3,621 protein-coding genes (including 1,659 novel genes) absent from the soybean "Williams 82" reference genome. Nonetheless, the core genome represents an exceptionally large proportion of the genome, with >90.6% of genes shared by >99% of the accessions. Most gene-containing presence/absence variants (PAVs) could be confirmed by long-read sequencing of a subset of accessions. PanSoy is a major step towards capturing the extent of genetic variation in cultivated soybean and provides a resource for soybean genomics research and breeding.
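The core-genome figure above rests on a frequency threshold: a gene counts as core when it is present in more than 99% of accessions. The sketch below shows that rule on a gene presence/absence matrix. It does not reproduce the PanSoy pipeline, and it makes several assumptions. Each gene is assumed to have one 0/1 call per accession, the cutoff is assumed to be strict (`>` rather than `>=`), and the gene names and toy matrix are hypothetical.

```python
from typing import Dict, List

CORE_THRESHOLD = 0.99  # ">99% of the accessions"; strict ">" is an assumption


def classify_genes(pav: Dict[str, List[int]],
                   core_threshold: float = CORE_THRESHOLD) -> Dict[str, str]:
    """Label each gene 'core' or 'dispensable' from its 0/1 presence calls."""
    labels: Dict[str, str] = {}
    for gene, calls in pav.items():
        if not calls:
            raise ValueError(f"no presence/absence calls for {gene}")
        frequency = sum(calls) / len(calls)  # fraction of accessions carrying the gene
        labels[gene] = "core" if frequency > core_threshold else "dispensable"
    return labels


def core_fraction(labels: Dict[str, str]) -> float:
    """Fraction of genes labelled 'core'."""
    return sum(1 for label in labels.values() if label == "core") / len(labels)


# Hypothetical toy matrix over 204 accessions (not real PanSoy data).
toy_pav = {
    "gene_A": [1] * 204,             # present everywhere -> core
    "gene_B": [1] * 202 + [0] * 2,   # 202/204 ~ 0.990 > 0.99 -> core
    "gene_C": [1] * 201 + [0] * 3,   # 201/204 ~ 0.985 -> dispensable
    "gene_D": [1] * 150 + [0] * 54,  # clearly dispensable
}
labels = classify_genes(toy_pav)
print(labels)
print(core_fraction(labels))  # 0.5
```

With 204 accessions, a gene can be missing from at most two of them and still exceed the 99% cutoff. This is why gene_B counts as core while gene_C does not.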

