<p dir="ltr">Peppermint clones are sterile hybrids derived from hybridization events involving three parental genomes: the diploids <i>M. suaveolens</i> (apple mint) and <i>M. longifolia</i> (horse mint), and the octoploid <i>Mentha aquatica</i> (water mint). Genome assemblies for both <i>M. longifolia</i> and <i>M. suaveolens</i> have been published previously, but no water mint genome has been available so far. Water mints are highly polyploid, often auto-octoploid (2n = 96). Here we describe the sequencing, assembly, and annotation of the genome of a clone of <i>M. aquatica</i> obtained from the USDA germplasm collection and called <i>M. aquatica var citrata 14</i> (CMEN 678, PI 617488). Water mints are not typically grown commercially, but we expect that this genome will help inform the parental contributions to peppermint oil and enable future breeding approaches.</p><p dir="ltr">Two sets of PacBio HiFi long reads, totaling 4.9 million reads with an average length of 11.3 kb, were assembled using hifiasm. Hifiasm was run with an expected ploidy of 8x to generate a primary assembly. The resulting assembly had an N50 of 21 Mb, 458 contigs, and a total assembled size of 2.9 Gb. Next, to organize the contigs into pseudochromosomes, the largest 153 contigs (size > 5 Mb) were mapped using minimap2 to the chromosome-level assembly of <i>M. suaveolens</i> to assign contigs to chromosomes. The largest non-overlapping contigs representing every <i>Mentha</i> chromosome were manually selected and linked to form a consensus genomic reference. The resulting genomic reference contained 12 superscaffolds with an N50 of 30.28 Mb and an overall size of 353 Mb.</p><p dir="ltr">This superscaffold assembly was analyzed using the BUSCO eudicots_odb10 dataset, which reported 91.4% complete, 0.8% fragmented, and 7.8% missing BUSCOs. 
The assembly was annotated with the BRAKER3 pipeline using reads from 11 RNA-Seq libraries prepared from stems, runners, buds, flowers, developing leaves, mature leaves and roots, as well as four replicated libraries of mature leaves sampled from field-grown plants. The RNA-Seq libraries were combined and the reads were mapped to the draft genomic reference using hisat2. The resulting alignment file was converted to BAM format using samtools and used as input for BRAKER3. BRAKER3 generated a CDS annotation containing 95,553 unique transcripts, representing a coding space of 106 Mb. Finally, these CDS transcripts were imported into OmicsBox (BioBam Bioinformatics) for functional annotation, which included mapping to NCBI’s NR database, InterProScan protein mapping, and GO annotation. The resulting annotation included 69,536 transcripts with at least one type of functional annotation and 26,017 with no associated annotation data.</p><p dir="ltr">Limitations</p><p dir="ltr">This is a first draft assembly of a specific clone, <i>M. aquatica var citrata 14</i>, that was selected because of its high fertility and potential use in crosses. It is believed to be an auto-octoploid, and producing a haplotype-phased assembly was therefore challenging and outside the scope of this project. Instead, to turn the assembly into a usable resource, we selected contigs to form a coherent 1x haploid superscaffold genome. This selection was guided by comparison to the previously assembled genome of diploid <i>M. suaveolens</i>. It is therefore possible that the assembled consensus sequence represents a mixture of sequences from different haplotypes of the octoploid genome. 
Additionally, because the retained contigs were manually selected and not scaffolded, noticeable gaps remain in each chromosomal reference superscaffold.</p><p dir="ltr">Closer inspection of the repeat sequences in the scaffolds obtained for this assembly, and comparison with the repeats found in other <i>Mentha</i> species, leaves it unclear whether this particular <i>M. aquatica</i> clone is indeed an autopolyploid <i>M. aquatica</i> or a hybrid of two or more parental species.</p><p dir="ltr">The annotation was based on 11 libraries representing 8 tissue types. It is thus possible that some genes were not detected and annotated because they are not significantly expressed in any of the tissue types sampled. Conversely, many of the annotated transcripts received no functional or GO annotation, suggesting that they might have been incorrectly annotated. The quality of the mapped annotations therefore ranged from transcripts with no functional information to transcripts with protein BLAST hits, GO annotation, and protein functional domain(s).</p>
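<p dir="ltr">For readers less familiar with the N50 statistic quoted for the contig assembly and the superscaffold reference, the following is a minimal sketch of how it is commonly computed: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly size. The function name <code>n50</code> and the toy lengths are illustrative assumptions and are not taken from this dataset or from the tools used to produce it.</p>

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


# Hypothetical contig lengths in Mb, chosen only for illustration;
# these are NOT the lengths of the M. aquatica contigs.
toy_lengths_mb = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths_mb)
print(toy_n50)
```

<p dir="ltr">In practice the lengths would be read from the assembly FASTA or its index, and the same calculation applies to contigs and superscaffolds alike.</p>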
Funding
This work was funded by the Mint Industry Research Council.