First draft of an annotated genome for a lichenised strain of the green alga Diplosphaera chodatii (Prasiolales, Trebouxiophyceae)

ABSTRACT Although genome sequences of lichenized fungi are increasingly becoming available, genome sequences of microalgae involved in the lichen symbiosis are still scarce. For lichenized eukaryotic algae, genome sequencing has focused mostly on Trebouxia and Asterochloris, with little genomic data available for Stichococcus-like algae, such as Diplosphaera. The genus Diplosphaera is a common component of biological soil crusts, and often occurs associated with lichens of the family Verrucariaceae. It is characterized by cylindrical to spherical cells containing a plate-like chloroplast, and more specifically by a vegetative cell division that leads to the formation of typical two- to four-celled clusters. Here, we present a draft genome sequence for the algal partner of an Australian lichen specimen of Endocarpon pusillum. The genome was sequenced with PacBio long-read and Illumina short-read technologies, and transcriptome data were generated to inform the structural annotations. This algal strain is here identified as Diplosphaera chodatii based on nuSSU and ITS data. Compared with closely related lichenized and non-lichenized algae, the genome of D. chodatii stands out for its large size (85.6 Mb) and gene content (21,261 protein-encoding regions), as well as its high rate of duplicated genes (60% of the BUSCO genes are duplicated). These results suggest that whole genome duplication or large-scale segmental duplications may have occurred in the evolutionary history of this algal species.

HIGHLIGHTS Little genome data are available for lichenized algae. We generated the first genome for a lichenized Diplosphaera chodatii. Results suggest a possible whole genome duplication in this species.

As part of a project to improve the outcome of lichen in vitro resynthesis, including through the application of metabolic network modelling (Nazem-Bokaee et al., 2021), the mycobiont and photobiont of an Australian specimen of Endocarpon pusillum were isolated and grown axenically (Mead & Gueidan, 2021, this study) and their genomes were sequenced, assembled and annotated. In this article we present the draft genome of the algal partner Diplosphaera chodatii, while the fungal genome was published previously (Mead & Gueidan, 2020). This new algal genome is described and compared with available genomes of related green algae.

Strain and culture conditions
A specimen of the lichen Endocarpon pusillum (C.Gueidan 2364, CANB913709) was collected from Black Mountain (Canberra, Australia) in 2016 and fertile squamules were used to shoot fungal spores onto potato dextrose agar (PDA, Sigma-Aldrich) plates. Plates were then inspected using a stereomicroscope and single fungal spores surrounded by hymenial algae (Diplosphaera chodatii) were isolated and transferred to new PDA plates. Plates were then incubated for several weeks in a Climatron growth chamber (Thermoline Scientific) under the following conditions: day cycle of 12 hours of light (200 µmol m⁻² s⁻¹) at 20°C and 12 hours of dark at 18°C. Once algal colonies were visible, algal cells were isolated from the fungal spore using a sterile inoculation needle and transferred to plates with Bold's Basal Medium (BBM) agar (Bischoff & Bold, 1963, as modified by Starr & Zeikus, 1993; see also SAG medium recipe v. 10.2008) under the same light and temperature conditions. If no bacterial or fungal contamination was visible, these plates were then used to collect algal cells to transfer to 250 ml Erlenmeyer flasks with 100-150 ml of liquid BBM medium. These flasks were grown in the Climatron chamber on an OS1 orbital shaker (Bioline) under the same light and temperature conditions.
To obtain single-cell algal cultures, subsamples of these liquid cultures were run through a FACSAria sorting flow cytometer (BD Biosciences) at the Australian National University, using the alga's natural fluorescence. Fluorescence analyses were done with FACSDiva v.8 (BD Biosciences). Single cells were retrieved on 24- to 48-well cell plates with BBM agar medium. These single-cell isolates were incubated in the Climatron under the same light and temperature conditions for several weeks. Once colonies were visible, they were transferred to both liquid and solid BBM media and grown as described above. These liquid and solid cultures were transferred to new media when required. Fresh liquid cultures of Diplosphaera chodatii strain CS-1475 were used for DNA and RNA extractions. The strain CS-1475 was deposited at the Australian National Algae Culture Collection (CSIRO, Hobart, Australia).

Genome sequencing and assembly
Both short- and long-read data were generated for the Diplosphaera chodatii strain CS-1475. For short-read sequencing, algal cells were collected from liquid BBM medium using a vacuum Steritop filter (Millipore). They were added to a tube with 0.5 mm standard silica beads (Benchmark) containing 400 µl of phenol-chloroform and 400 µl of 2% SDS-1% BME lysis buffer. They were ground in a Precellys tissue grinder (Bertin Instruments) for three cycles of 10 s at 6000 rpm. The tube was centrifuged at 13 000 g for 5 min. The supernatant was transferred and washed twice in chloroform before being incubated at 37°C for 15 min with 1 µl of RNAse A. The DNA was then extracted using an equal volume of PEG-SPRI bead suspension (NEB). The beads were then washed with 80% ethanol, then resuspended in 30 µl of nuclease-free water to elute the DNA. The bead wash was repeated, eluting in 10 µl of water. A Nextera DNA Flex (Illumina) library was prepared for the genomic DNA of Diplosphaera chodatii and the library was sequenced together with one other Trebouxiophycean alga on one mid-output 2 × 75 bp NextSeq 500 run. Library preparation and sequencing were done at the Ramaciotti Centre for Genomics (UNSW Sydney, Australia).
For long-read sequencing, cells were collected from liquid BBM medium using a vacuum Steritop filter (Millipore). Algal cells collected on the filter were resuspended in a small volume of fresh medium, the cell suspension was transferred to three 2 ml Eppendorf tubes and the tubes were centrifuged to obtain pellets. Dry pellets were then resuspended in TES buffer and ground for 15 s at 6000 rpm using tubes with 0.5 mm standard silica beads (Benchmark) with a Precellys tissue grinder (Bertin Instruments). Proteinase K and RNAse A treatments were done successively after the grinding, then the genomic DNA was extracted using a CTAB protocol as described by Möller et al. (1992). Pippin pulse gel (Sage Science), Nanodrop (Thermo Scientific) and Qubit (Invitrogen) were used by Ramaciotti to assess the quality and quantity of genomic DNA. An additional AMPure bead cleanup (Beckman Coulter) was performed by Ramaciotti to remove remaining contaminants. A 20 kb library was then prepared by Ramaciotti and the library sequenced on one SMRT cell on the Sequel platform (Pacific Biosciences).
The assembly was done by AGRF Bioinformatics (Melbourne). PacBio subreads BAM files were parsed to FASTA using SAMtools (Danecek et al., 2021). The initial assembly was constructed using Flye (Kolmogorov et al., 2019). Assembly contiguity and completeness were assessed using QUAST (Gurevich et al., 2013) and BUSCO v. 5.2.2 using the Chlorophyta dataset (Waterhouse et al., 2018; Manni et al., 2021). In parallel, Illumina reads were quality trimmed using TrimGalore (https://github.com/FelixKrueger/TrimGalore). The cleaned and trimmed Illumina reads were then aligned to the initial assembly. The aligned reads and assembly were then subjected to two rounds of polishing with Pilon (Walker et al., 2014). To verify assembly improvements, contiguity and completeness were re-assessed as described above.
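The order of the assembly and polishing steps can be sketched as a small Python helper that only builds the command lines, without running them. This is an illustration under stated assumptions and not the commands actually used by AGRF: the Flye read mode (--pacbio-raw), the short-read aligner (BWA-MEM), the thread count, and all file names, including the trimmed read names and the pilon.jar path, are hypothetical placeholders, since the text does not specify them.

```python
from typing import List


def assembly_commands(subreads_bam: str, r1: str, r2: str,
                      polish_rounds: int = 2, threads: int = 8) -> List[str]:
    """Return shell command strings for long-read assembly plus short-read polishing.

    Nothing is executed; the strings only document the order of the steps.
    """
    cmds = [
        # Convert PacBio subreads from BAM to FASTA.
        f"samtools fasta {subreads_bam} > subreads.fasta",
        # Initial long-read assembly (read mode is an assumption).
        f"flye --pacbio-raw subreads.fasta --out-dir flye_out --threads {threads}",
        # Quality trimming of the paired Illumina reads.
        f"trim_galore --paired {r1} {r2} --output_dir trimmed",
    ]
    draft = "flye_out/assembly.fasta"
    for i in range(1, polish_rounds + 1):
        bam = f"round{i}.sorted.bam"
        cmds += [
            # Aligner is an assumption; trimmed file names are placeholders.
            f"bwa index {draft}",
            f"bwa mem -t {threads} {draft} trimmed/R1.fq.gz trimmed/R2.fq.gz"
            f" | samtools sort -o {bam} -",
            f"samtools index {bam}",
            f"java -jar pilon.jar --genome {draft} --frags {bam} --output pilon_round{i}",
        ]
        # Each polishing round takes the previous round's output as its draft.
        draft = f"pilon_round{i}.fasta"
    return cmds


if __name__ == "__main__":
    for cmd in assembly_commands("subreads.bam", "reads_R1.fq.gz", "reads_R2.fq.gz"):
        print(cmd)
```

The point of the loop is that the second Pilon round realigns the trimmed reads to the output of the first round rather than to the original Flye assembly.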

Transcriptome sequencing
For RNA sequencing, algal cells were collected from liquid BBM medium using a vacuum Steritop filter (Millipore). They were added to a tube with 0.5 mm standard silica beads (Benchmark) containing 200 µl of 2% SDS and 1% beta-mercaptoethanol in TE buffer. The tube was then briefly vortexed and 200 µl of acidic phenol-chloroform was added. The suspension was ground in a Precellys tissue grinder (Bertin Instruments) using three cycles of 10 s at 6000 rpm, with 3 s off between cycles. The tube was then centrifuged at 13 000 g for 3 min. The supernatant was added to a tube containing 400 µl of acidic phenol-chloroform and 200 µl of 3 M potassium acetate. The tube was briefly vortexed and centrifuged at 13 000 g for 3 min. The supernatant was then transferred to a tube containing 600 µl of RNAse-free chloroform, which was vortexed and then centrifuged for 30 s at 13 000 g. The supernatant was again transferred to a tube containing 600 µl of RNAse-free chloroform, which was vortexed and then centrifuged for 30 s at 13 000 g. Finally, the supernatant was added to a tube with 900 µl of isopropanol and 140 µl of 5 M sodium acetate. The tube was inverted 20 times and chilled at −20°C for 20 min before being centrifuged for 15 min at 20 000 g. The supernatant was then discarded, and the pellet washed with 200 µl of 100% ethanol, then dissolved in an appropriate amount of nuclease-free water with 0.5 µl of RiboLock RNAse inhibitor. The RNA was then purified using a DNAse I treatment and cleaned using SPRI beads (NEB). The RNA sample was submitted to the BRF (ANU) where a 75 bp PE TruSeq library (Illumina) was prepared and sequenced on the Illumina mid-output NextSeq 500 platform together with one other RNA sample.

Genome annotation
The structural annotation was done by AGRF Bioinformatics (Melbourne). To generate these annotations for Diplosphaera, Illumina RNA sequence reads were trimmed and cleaned using TrimGalore, assembled with Flye, and polished with Pilon. They were then mapped to the polished genome assembly using STAR (Dobin et al., 2013). Gene prediction was carried out with Augustus v. 3.4.0 (Stanke et al., 2008) using the assembled transcriptome. Structural annotation statistics were obtained with BRAKER2 (Brůna et al., 2021) using Augustus and GeneMark-EX v.4 (Brůna et al., 2021).
The functional annotation was performed using OmicsBox v. 2.1.14 (BioBam Bioinformatics). The predicted protein sequences were loaded in FASTA format, and a BLAST search was carried out using CloudBlast. The non-redundant protein sequence database nr v. 5 was queried with Chlorophyta as the taxonomy filter. The search was performed with a maximum e-value of 1.0e-3 and allowed a maximum of 20 BLAST hits. To identify GO terms, protein sequences with BLAST hits were mapped against the GOA database v. 2021.11 (Götz et al., 2008). Next, the functional labels were retrieved using an annotation cut-off value of 55 and a GO weight of five. Finally, 'Combined Analysis' was performed on the mapped sequences to identify enzyme codes by searching both the KEGG and Reactome databases.
Long genome sequence reads, short genome sequence reads and transcriptome sequence reads have been deposited in the NCBI Sequence Read Archive (PRJNA606981). Assembly and annotations are also available from NCBI under the same BioProject (PRJNA606981). Mitochondrial and chloroplast genome sequences and annotations were deposited at NCBI under the accession numbers OP846047 and OP846046, respectively.
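The effect of the two BLAST search settings (maximum e-value of 1.0e-3 and at most 20 hits per query) can be illustrated with a minimal Python sketch. It does not reproduce OmicsBox or CloudBlast; the function name filter_hits, the tuple layout and the toy identifiers are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query_id, subject_id, e_value); a hypothetical minimal hit record.
Hit = Tuple[str, str, float]


def filter_hits(hits: List[Hit], max_evalue: float = 1.0e-3,
                max_hits: int = 20) -> Dict[str, List[Hit]]:
    """Keep hits at or below the e-value threshold, best max_hits per query."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        if hit[2] <= max_evalue:
            by_query[hit[0]].append(hit)
    # Sort each query's hits from lowest (best) to highest e-value and truncate.
    return {q: sorted(hs, key=lambda h: h[2])[:max_hits]
            for q, hs in by_query.items()}


if __name__ == "__main__":
    toy_hits = [
        ("prot1", "subjA", 1e-50),
        ("prot1", "subjB", 0.5),    # discarded: e-value above the threshold
        ("prot2", "subjC", 1e-3),   # kept: equal to the threshold
        ("prot3", "subjD", 2.0),    # prot3 ends up with no hits
    ]
    print(filter_hits(toy_hits))
```

Queries whose hits all exceed the threshold are absent from the result, which corresponds to predicted proteins reported as having no BLAST hits.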

Strain phylogenetic placement
As both GenBank BLAST comparisons and a previous phylogenetic framework at the Trebouxiophyceae level (Thüs et al., 2011) suggested that Endocarpon photobionts belong to Diplosphaera, the following phylogenetic analysis was restricted to a clade including this genus and its sister taxon Tetratostichococcus, as shown in Pröschold & Darienko (2020). Sequences of three strains of Deuterostichococcus epilithicus were used as an outgroup (Pröschold & Darienko, 2020). The sequences included the nuclear ribosomal SSU and ITS regions. To find the same regions in the genome of our strain of Diplosphaera chodatii, a sequence from GenBank (MT078181) was mapped onto our assembled genome using Geneious Prime v. 2022.1.1 (Geneious) and the corresponding sequence saved as FASTA. All sequences (Supplementary table S1) were aligned using Mesquite v. 3.61 (Maddison & Maddison, 2011). Ambiguous regions delimited as described in Lutzoni et al. (2000) were excluded from the alignment. The dataset was analysed using a maximum likelihood (ML) criterion with RAxML v. 8.2.12 (Stamatakis et al., 2005, 2008), as implemented on the CIPRES Web Portal (http://www.phylo.org; Miller et al., 2010). The model used for the tree search was GTRCAT and 1000 pseudo-replicates were carried out for the bootstrap analysis. The resulting tree was visualized with FigTree v. 1.4.4 (https://github.com/rambaut/figtree) and edited with Illustrator v. 25.4.1 (Adobe).

Strain phylogenetic placement
The isolated algal strain displayed the characteristics of the algal genus Diplosphaera, with short cylindrical to spherical cells, which sometimes occurred as two-celled clusters, and a plate-like chloroplast in each cell (Fig. 1). The nuclear ribosomal SSU and ITS alignment included a total of 4003 nucleotide positions, 2776 of which could be unambiguously aligned. This corresponded to 118 distinct alignment patterns, and no gaps were present. The resulting ML tree (Fig. 2) shows high support (100% bootstrap) for the placement of the Diplosphaera strain CS-1475 within a clade of Diplosphaera chodatii strains. It therefore confirms the identification of the Diplosphaera associated with the lichenized fungus Endocarpon pusillum as Diplosphaera chodatii, as recently circumscribed by Pröschold & Darienko (2020).

Whole genome assembly and annotations
PacBio long and Illumina short reads were assembled into 62 contigs (Table 1). Most contigs (47 out of 62) were longer than 50 000 bp. The largest contig was 4.3 Mb, and the N50 value was 2.6 Mb. The total length of the assembly was 85.6 Mb and the estimated coverage about 200×. Based on a BUSCO search on the Chlorophyta dataset (1519 genes), the genome is estimated to be 88.4% complete (Table 2). Among the complete set of BUSCO genes, 60% were duplicated. The GC content was 60.2% for the nuclear genome, 27.1% for the chloroplast genome and 26.5% for the mitochondrial genome (Table 3).
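For readers unfamiliar with the contiguity metrics reported here, the following Python sketch shows how an N50 value and a GC percentage are conventionally computed. It is not the QUAST implementation; the function names n50 and gc_content and the toy inputs are assumptions for illustration, and the GC calculation ignores ambiguous bases such as N.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative size of contigs,
    taken from longest to shortest, first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    toy_lengths = [5, 4, 3, 2, 1]   # toy contig lengths, not the real assembly
    print(n50(toy_lengths))         # half of 15 is reached at the 4-long contig
    print(gc_content("GGCCAT"))
```

An N50 of 2.6 Mb therefore means that contigs of at least 2.6 Mb together account for at least half of the 85.6 Mb assembly.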
A total of 21 261 protein-encoding regions were structurally annotated using Augustus. The mean gene length was 3149 bp and the mean number of exons per CDS was 10 (Supplementary table S2). From the total of 21 261 protein sequences structurally annotated, 17 680 sequences (83%) were identified with BLAST hits, 16 374 (77%) were mapped to GO terms, and 11 197 (53%) were annotated with functional labels. There were 3581 protein sequences (17%) with no BLAST hits. The average length of the protein sequences was 508 aa and 99.4% of all protein sequences were shorter than 2500 aa (Supplementary fig. S1).
Figure 3 shows the counts of protein sequences (for 75% of total mapped sequences) based on the assigned GO terms for each of the main GO categories (i.e. biological processes, cellular components and molecular functions). Under the 'biological processes' GO category (Fig. 3a), the most frequent terms relate to metabolic processes, translation, transport and phosphorylation. The top 'molecular functions' were predicted to be transferase, hydrolase and oxidoreductase activities, as well as metal ion and nucleotide binding (Fig. 3b). With respect to the 'cellular components' GO category (Fig. 3c), the counts for protein sequences in membranes were almost double the counts of sequences in other major compartments such as the cytosol or nucleus.
The mitochondrial genome was 83 919 bp long and included 29 protein-encoding genes (Fig. 5). Compared with the complete mitochondrial genome of Trebouxia lynnae (Martínez-Alberola et al., 2019), 29 of the 33 protein-encoding genes annotated in Trebouxia were also found in Diplosphaera.
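The percentages in the functional annotation summary follow directly from the reported counts, as the short Python check below illustrates. The dictionary labels and the helper name percent are ours and are used for illustration only; the counts are those given in the text.

```python
TOTAL_PROTEINS = 21261

# Counts as reported in the text for each annotation stage.
COUNTS = {
    "with BLAST hits": 17680,
    "mapped to GO terms": 16374,
    "with functional labels": 11197,
}


def percent(count: int, total: int = TOTAL_PROTEINS) -> int:
    """Share of the predicted proteins, rounded to the nearest percent."""
    return round(100 * count / total)


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count} ({percent(count)}%)")
    # Proteins without any BLAST hit are the complement of the first stage.
    no_hits = TOTAL_PROTEINS - COUNTS["with BLAST hits"]
    print(f"without BLAST hits: {no_hits} ({percent(no_hits)}%)")
```

Each stage is a subset of the previous one, so the three percentages describe a funnel from BLAST hits to GO mapping to final functional labels.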

Comparison with other algal nuclear and organellar genomes and possible nuclear genome duplication
The genome of Diplosphaera chodatii CS-1475 was compared with the published genomes of two lichenized algae, Asterochloris glomerata (Armaleo et al., 2019) and Trebouxia lynnae (Martínez-Alberola et al., 2019, 2020; organellar genomes only), and three free-living algae, Chlorella variabilis (Blanc et al., 2010), Coccomyxa subellipsoidea (Blanc et al., 2012) and Stichococcus bacillaris (Lemieux et al., 2014). All these taxa belong to the Trebouxiophyceae. The organellar genomes of Diplosphaera chodatii appear standard in comparison with those of these other species (Table 3). The length of the mitochondrial genome of D. chodatii is 83 919 bp and within the range (65 497-110 932 bp) of the three other algal species with available data (A. glomerata, C. subellipsoidea and T. lynnae). The number of mitochondrial protein-encoding genes is also similar among these algal species, with 29 genes in D. chodatii, 32 in A. glomerata and 33 in T. lynnae. The length of the chloroplast genome of D. chodatii (174 073 bp) also falls within the range (116 952-303 163 bp) found in the other algae it was compared with (Table 3), and the number of chloroplast protein-encoding genes was similar among these algal species (75 genes for D. chodatii, 73 for A. glomerata and 77 for T. lynnae).
However, the nuclear genome of D. chodatii is significantly larger in size and in number of predicted genes (21 261 genes for D. chodatii vs. 10 025 genes for A. glomerata) than those of the other algae it was compared with. In eukaryotes, several processes can lead to increased nuclear genome size, including whole genome duplication and the accumulation of transposable elements (Ayala-Usma et al., 2021). Here, in addition to the larger genome size and higher number of genes, the BUSCO analysis reveals substantial duplication of core orthologous genes, which points towards whole genome duplication or, at least, the duplication of large-scale fragments of the genome. This genome duplication could be the result of autoploidy or hybridization. Although further work will be necessary to determine the cause of the large genome size and high gene duplication rate in this strain of D. chodatii, this work highlights the importance of genome studies of lichenized algae. As a result of their association with fungi, these poorly studied microalgae may have evolved unique genome characteristics and biological functions and processes.

Fig. 3. Counts of protein sequences in Diplosphaera chodatii based on their GO annotations: a) biological processes, b) cellular components, c) molecular functions. The sequence counts are only shown for the first 25 GO terms (including repeated sequences) in each GO category and at all GO levels (up to level 5).

Fig. 4. [Figure not reproduced. Legend categories: photosystem assembly/stability factors, RubisCO large subunit, ATP synthase, cytochrome b/f complex, photosystem II, photosystem I.]

Fig. 5. Gene map of the draft mitochondrial genome of the green alga Diplosphaera chodatii strain CS-1475. The inner graph represents the GC content. Transcription directions are as follows: clockwise if inside the circle and counterclockwise if outside the circle. An asterisk indicates the presence of at least one intron.

Table 1. Assembly metrics for the draft genome of Diplosphaera chodatii CS-1475.

Table 2. Assembly completeness of the draft genome of Diplosphaera chodatii CS-1475 as estimated using BUSCO and the Chlorophyta dataset.

Table 3. Comparison of genome statistics of Diplosphaera chodatii CS-1475 and other Trebouxiophycean algae, either lichenized or free-living.