DNA barcoding as a tool for early warning and monitoring alien duckweeds (Lemna sp.pl.): the case of Central Italy

Abstract Aquatic habitats are vulnerable to the invasion of alien species, so early warning protocols are necessary for eradication. The presence in Italy of two alien duckweeds in freshwaters has been documented: Lemna minuta, that showed high invasivity, and L. valdiviana, still confined to south Lazio. These two species may be mistaken for each other and for the domestic L. minor and L. gibba due to morphological variation. Here, we assess the applicability of DNA barcoding as a complement to morphological analysis for monitoring the spread of alien Lemna. We chose two chloroplast genome sequences for their ability to discriminate all Lemna species: the 5’ intron of the trnK gene and the matK gene. Among 48 samples of Lemna collected at 20 sites in Central Italy, 20 were identified as L. minor, 19 as L. minuta, five as L. trisulca and four as L. gibba. L. minuta was present at most sampling sites; in particular, at six locations of Lake Trasimeno, eight L. minuta samples were found. We demonstrate that DNA sequence analyses with cost-effective barcoding techniques can effectively support expert efforts in species determination for an early alert system of invasive Lemna species.


Introduction
According to Hulme et al. (2013), the term "invasive" can be referred to "established alien organisms that are rapidly extending their range in the new region, usually causing significant harm to biological diversity, ecosystem functioning, socio-economic values, and/or human health in invaded regions". Invasive alien species (IAS) are recognized as one of the main five pressures directly driving biodiversity loss (Slingenberg et al. 2009;Burkmar and Bell 2015). Of the 174 European species listed as critically endangered by the IUCN Red List, 65 are endangered also because of introduced species (Shine et al. 2010;Sonigo et al. 2011;Scalera et al. 2016). Invasive alien species can threaten biodiversity in various ways, directly competing with native species for resources, reducing autochthonous genetic variation, eroding gene pools, altering the structure and functioning of communities, ecosystems and habitats due to different interactions than native species with other organisms and abiotic factors (Hulme 2007). Biological invasion also determines a high socio-economic impact. Ecological changes result in changes of ecosystem services. To avoid or compensate the loss of these services or in order to directly act on the invaders, local and regional administrations spend a hardly accounting amount of money (Vil a et al. 2010).
It has been well-documented that an efficient early warning system aimed at a timely eradication can be a valid strategy to fight the establishment of new alien species (EEA 2010;Leuven et al. 2017;EU Regulation n.1143/2014. DNA sequence markers capable to unequivocally identify species, such as the so-called DNA barcodes (Erickson et al. 2008;Hollingsworth et al. 2009Hollingsworth et al. , 2011Bhargava and Sharma 2013) may be tremendously useful for early warning of the spread of invasive alien species, especially when morphological identification is not obvious. Fast vegetative spreading ability, large distribution, minimal size and the scarcity of distinctive morphologic traits make duckweeds a good case study to assess early warning strategies.
In Central Italy, two alien duckweeds native of America have been documented so far, L. minuta and L. valdiviana (Iberite et al. 2011, Ceschin et al. 2016a. L. minuta was first reported in Europe (France) in 1965 (Jovet and Jovet-Ast 1966); it is now widespread in many European countries (Misfud 2010;Hussner 2012) and is considered a highly invasive alien species. The presence of L. minuta has been demonstrated in Italy by several authors (Celesti-Grapow et al. 2009, Iberite et al. 2011, Ceschin et al. 2016b. L. valdiviana was documented for the first time in Italy by Podda et al. (2010) and shortly after by Iberite et al. (2011); its distribution in Italy seems to be limited to the Administrative Regions of Lazio and Sardegna, but this species could have been overlooked in other parts of Italy. Iberite et al. (2011) emphasize the importance of morphological and taxonomic analyses; in fact, this species can be mistaken not only with L. minuta but also with L. minor, due to its morphological variation (Kandeler 1975).
The rapidity of clonal propagation of duckweeds makes them attractive for several research and practical uses. They are an important food for birds and fish and a resource for aquaculture and animal feed (Goopy and Murray 2003;Khanum et al. 2005). Wastewater treatment with Lemnoideae (the so called Lemna system) has been proposed as an efficient way to decontaminate municipal water (Ozengin and Elmaci 2007); thanks to their rapid growth duckweeds have potential also as biofuel sources (Appenroth 2002;Cheng and Stomp 2009). Some duckweed species are used as standards for assessing water quality due to their sensitivity to a wide range of environmental contaminants such as heavy metals, nitrates, phosphates (Brain and Solomon 2007) and pesticides (Reale et al. 2016). For this reason, they are also useful as model plants for diverse studies in plant biology, toxicology and environmental pollution (Les et al. 2002, Mardanov et al. 2008Reale et al. 2016).
For such laboratory or field uses, sometimes alien species, or even taxonomically or geographically uncharacterized species or lab clones are adopted (Yamamoto et al. 2001). Therefore, specific protocols to prevent unwanted, accidental escape should be adopted (EU Regulation n. 1143/2014; Heywood and Brunel, 2012). However, such an extensive use of duckweeds, the rapid clonal propagation and the capability of these plants to be easily involuntarily dispersed by animals and humans make these plants very dangerous.
Species classification of Lemnoideae based on morphological traits is challenging due to the small size of the fronds, their simple and reduced morphology and infrequent flowering, resulting in a scarcity of morphological markers. In particular, L. minuta and L. minor are not easily distinguished on a morphological basis. Vein number and frond length are the main morphological characters discriminating the two species, but these two characters are not completely reliable (Ceschin et al. 2016c). Veins are very often not well visible at the microscope and frond shape dependents on nutrient and light availability. Although additional diagnostic characters (width, apex and shape of the frond root length; Ceschin et al. 2016c) can improve morphological discrimination, a DNA-based identification method would be of high value for the taxonomy of Lemnoideae and for studying their population dynamics (Wang et al. 2010). By complementing expert morphological analyses, simple DNA analysis techniques can assist alien species identification, which may not be certain based on morphology only.
The concept and techniques of DNA barcoding (Erickson et al. 2008;Hollingsworth et al. 2009Hollingsworth et al. , 2011Bhargava and Sharma 2013) have been applied to the taxonomy of Lemnoideae. Rothwell et al. (2004) used the trnL-trnF intergenic spacer to investigate phylogenetic relationships between Lemnoideae and Aroids. Cabrera et al. (2008) used both coding (rbcL, matK) and non-coding (trnK and trnL intron, trnL-trnF spacer) plastid sequences for a solid phylogenetic analysis of Araceae at the genus level. The rpS16 intron was used for a phylogenetic analysis of 25 accessions representing nine species of the genera Landoltia, Lemna, Wolffia and Spirodela (Martirosyan et al. 2009). Wang et al. (2010) tested the performance of seven plastid DNA sequences, both coding (rpoB, rpoC1, rbcL, matK) and noncoding (atpF-atpH, psbK-psbI and trnH-psbA) for interspecific and intraspecific identification of 97 accessions belonging to 31 duckweed species. They found that the atpF-atpH noncoding spacer provided the lowest intraspecific distances and sufficient interspecific distances, and they proposed this sequence as a universal DNA barcoding marker for species identification of Lemnoideae. However, L. minuta and L. valdiviana could not be separated from each other by any of the tested sequences. Bog et al. (2013) employed rps16 and rpl16 sequences, alone and combined with AFLP markers, to identify Wolffia species, with partial success. Appenroth et al. (2013) reviewed the genotyping techniques applied to the taxonomy of Lemnoideae. They concluded that the single most informative marker appeared to be atpF-atpH, but several plastid DNA sequences are required to obtain reliable species identification. Most recently, Xu et al. (2015) used the chloroplast ribosomal protein S16 intron (rps16) and the atpF-atpH spacer to effectively classify 230 Lemnoideae samples from the Hainan Island, China.
Whole plastid genome sequencing would provide the definitive solution to interspecific and intraspecific identification (Wang et al. 2010;Taylor and Harris 2012), and the plastid genome of L. minor is available (Mardanov et al. 2008). However, the use of one or a few short plastid DNA sequences may combine sufficient identification power with affordable costs for environmental monitoring.
The objectives of this work were threefold: i) assessing the presence of alien duckweeds in Central Italy, including not only the well-known L. minuta and L. valdiviana but also those reported in nearby European countries in the same habitats and possibly overlooked in Italy such as L. turionifera and L. obscura; ii) assessing the reliability of morphological identification by developing a molecular barcoding method for species identification in the genus Lemna, based on two short plastid genome sequences, the trnK 5' intron and the matK gene, and ii) assessing these sequences for their power in phylogenetic analyses and within-species diversity estimation.
To this end, 56 environmental samples of Lemna from Central Italy and 29 samples of four Lemna species provided by several labs were used. Our results can serve as a basis for implementing containment practices of invasive Lemna species.

Plant materials
Fifty-six samples of Lemna sp.pl. were collected at 20 sites in Central Italy (Abruzzo, Emilia Romagna, Lazio, Marche and Umbria regions, Figure 1).
Forty-eight of them were successfully subjected to molecular analysis ("Collected samples" in Table 1). Based on field evaluation of morphological traits (Landolt 1980(Landolt , 1986Iamonico et al. 2010), the collected samples were tentatively classified into five different species. To estimate the ability of the selected barcoding sequences to correctly represent phylogenetic relationships between Lemna species, 32 additional previously characterized samples ("Reference samples" in Table 1), kindly provided by K. Appenroth, E. Landolt and W. L€ ammler or by K. Sumberov a, were included.

Amplification of barcode sequences
The sequence of the trnK 5' intron and the matK gene of all the 14 Lemna species were downloaded from GenBank (Supplemental Information Table S1). The matK coding sequence is at position 2133-3671 of the L. minor plastid genome sequence (GenBank NC_010109.1), and has a total length of 1538 bp. The trnK gene is at position 1843-4453 and has a total length of 2601 bp. Primer3 (Untergasser et al. 2012) was used to design primer pairs for each sequence (Supplemental Information Table S2), which give a 752 bp amplicon (2176-2928) for matK and a 688 bp amplicon (3672-4360) for trnK 5'.
PCR amplifications were performed with the Phire V R Plant Direct PCR kit (Thermo Fisher) according to the manufacturer's instructions, using 10 mM of each primer in 20 mL final volume. Leaf disks of 0.5 mm diameter from fresh or dry frozen tissue were used. Cycling was as follows: 98 C for 5 min, 40 cycles at 98 C for 5 seconds, 57 C for 5 seconds, 72 C for 20 seconds and final extension at 72 C for 1 min. PCR products were examined and quantified after agarose gel electrophoresis. When an amplicon was not obtained, 2 mL of the reaction products were used as template for a second PCR with the same primer concentrations, 5 mM dNTPs, 50 mM MgCl 2, 1 Unit Phire Taq polymerase (Thermo Fisher) in a final volume of 23 ml; the cycling conditions were: 98 C for 1 min, 30 cycles at 98 C for 30 seconds, 57 C for 30 seconds, 72 C for 30 seconds and final extension at 72 C for 2 min. PCR products were treated with the ExoSAP-IT kit (GE Healthcare) and sequenced using the BigDye V R Terminator v3.1 kit (Applied Biosystems). Sequencing reactions were purified using the BigDye V R XTerminator TM Purification kit (Applied Biosystems) and run in a ABI3130 XL automatic sequencer.

Sequence data analysis
Raw sequences were controlled using the Vector NTI Advance V R 11.5 software (Invitrogen), end-trimmed to avoid low-quality bases, assembled using the Contig Express function, aligned using the ClustalW algorithm and then aligned and analyzed using the Molecular Evolutionary Genetics Analysis version 7.0 (MEGA7) software (Kumar et al. 2016). The alignments used for phylogenetic analysis were 709 bp (matK), 673 bp (trnK intron) or the combined 1382 sequence. The Maximum Likelihood (ML) algorithm was used; for matK the method General Time Reversible (GTR) þ Gamma distributed (G) was selected, whereas for the trnK 5' intron, the method General Time Reversible (GTR) þ Gamma distributed with invariant sites (G þ I) was selected (Supplemental Information Tables S3, S4, S5).

Choice of barcoding sequences
Four chloroplast sequences among those commonly used for barcoding were tested for their ability to discriminate among all Lemna species: the trnK 5' intron, rbcL, matK and rpl16. The published sequences of these four markers for the 14 Lemna species were downloaded from GenBank and analyzed using MEGA 7 (Supplemental Information Fig. S1). The rbcL and rpl16 sequences did not allow to discriminate L. minuta from L. valdiviana, both exotic species that have been found in Europe (Supplemental Information Fig. S1).
Therefore, two sequences were selected for this work: the matK sequence, capable to discriminate all Lemna species with the exception of the tropical species L. turionifera and L. ecuadoriensis; and the trnK 5' intron sequence capable to discriminate all the species with the exception of L. turionifera, L. ecuadoriensis and L. obscura. The two sequence combined (1382 bp), were able to discriminate all Lemna species with the exception of L. euadoriensis versus L. turionifera. L. turionifera has been reported in Central Europe (Germany, France, Netherlands) (Hussner 2012) and Czeck Republic (Kaplan et al. 2016), but L. ecuadorensis was not reported in Italy or Europe. Therefore, the combined matK and trnK 5' intron sequences were deemed suitable for detecting invasive Lemna species in Italy.

Efficiency of direct PCR amplification from plant tissue
Direct PCR allowed to obtain the trnK intron amplicons from all 56 field samples, whereas matK was successfully amplified from 48 samples. Both sequences were successfully amplified in all the 32 reference samples. Further analysis was performed with the samples for which both sequences were available. The two selected sequences were amplified at the first attempt in 75% (matK) or 82.5% (trnK intron) of all samples, whereas re-amplification using the first amplification product was necessary to obtain an amplicon from the rest of the samples. Dry tissue was used for some samples for which fresh tissue was not available. The Phire direct PCR kit is not recommended for dry leaf tissue; however, we found that successful amplification can be obtained from dry leaf tissue by a second round of amplification. The possibility to perform direct PCR from plant tissue, avoiding DNA extraction, is essential for a fast alert system for aquatic species. Our protocol appears suitable as regards PCR success rate.

Species identification based on morphological traits and phylogenetic relationships
Phylogenetic analysis was conducted with 94 sequences of the matK gene and the trnK 5' intron (48 from collected samples þ32 from reference samples þ14 reference sequences), excluding samples for which either one was not obtained. The Find-Best-DNA/Protein-Models function of Mega7 (Supplemental Information Tables S3-S5) showed that the T92 þ G model gave the lowest Bayesan Information Criterion (BIC) for both single sequences and their combination (Supplemental Information Tables S3-S5).
The phylogenetic trees for single sequences (Supplemental Information Fig. S2 and S3) and the combined sequences ( Figure 2) were in close agreement. Several samples from Central Italy, putatively identified on the basis of morphological traits were attributed to a different species based on genomic sequences (Table 2, Figure 2).
Among the samples from Central Italy, 20 were identified as L. minor, 19 as L. minuta, five as L. trisulca and four as L. gibba. L. minuta was found in all three central Italian Regions, Lazio, Marche and Umbria. At the six sampling locations of Lake Trasimeno, eight L. minuta samples were found, versus six of L. minor, four of L. gibba and two of L. trisulca. This indicates that the invasive L. minuta has made his way to become dominant at Lake Trasimeno. No other exotic species have been indentified in this work.

Intraspecific diversity
We found evidence of variation within species in two cases. First, the L. minuta accession from Peru and the reference   (-3258.59) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The percentage of trees in which the associated taxa clustered together is shown next to the branches (500 bootstraps). The tree is drawn to scale, with branch lengths proportional to the number of substitutions per site. sequence from GenBank share a single nucleotide polymorphism with respect to all other L. minuta sequences. Second, two of the three L. gibba accessions from Tuoro showed a polymorphism with respect of all other L. gibba samples (Figure 2). In both cases, variation consists of a single nucleotide difference in the matK sequence (Supplemental Information Fig. S1). Since the chloroplast genome does not recombine, polymorphism can only be attributed to mutation, and does not imply sexual reproduction. However, in the case of L. gibba, the polymorphism observed at one collection site is indicative of the coexistence of different clones, implying genetic variation.
Evolutionary divergence within and between groups of the four species for which multiple accessions were available were calculated using Mega 7. There was no variation within species (not shown), while variation between species ranged from 0.016 (L. minor -L. trisulca) to 0.068 (L. minuta -L. gibba) base substitutions per site (Table 3).

Discussion
We showed that L. minuta was present at most sampling sites, even those for which historical documentation and recent publications had reported the presence of the native species L. minor (Granetti 1965;Landucci et al. 2011). The diffusion of L. minuta in Central Italy is fast and the possibility of implementing containment strategies at this stage seems scanty. However, early identification of the species at new sites can speed up eradication efforts in order to limit further spread and protect the indigenous communities.
Sometimes the experience of taxonomists based on morphologic traits is not enough for accurate species identification in many genera. In this work, we found that it can be difficult to identify exotic Lemna species versus the native ones. The barcoding method was tested as a support to morphological analysis for routine screening of many samples of Lemna populations from large geographical areas. This can be useful in view of setting up an early warning system for aquatic ecosystems that are particularly sensitive to invasion by alien species with negative impacts on biodiversity and modifications of taxonomic composition (Stiers et al. 2011;Lastrucci et al. 2017).
Species identification was accomplished with the selected matK and trnK 5' plastid genome sequences, which showed the same interspecific discrimination power. Previous phylogenetic analyses of duckweeds based on plastid DNA sequences have shown that both resolution and overall bootstrap support for clades improve when multiple DNA regions are analyzed in combination (Cabrera et al. 2008). In our case, the single and combined analyses gave identical results. Adding another plastid genome region such as rbcL or rpl16 can allow to distinguish all Lemna species without ambiguity, but it would proportionally increase the costs of molecular analyses.
The barcoding method is aimed at species identification and overlooks genetic variation within species. However, the genetic variation of invading populations can be an important aspect of invasions, but can be addressed with different molecular markers, at selected invasion sites.
The barcoding techniques can be applied to herbarium specimens, even using direct PCR techniques, with the aim of confirming previous morphological determinations that may have missed exotic Lemna species; such an effort may allow reassessing the taxonomic and distribution patterns of Lemnoideae in Italy.
Our results demonstrate that DNA sequence analyses with timely and cost-effective barcoding techniques, not requiring DNA extraction, can effectively support expert efforts in species determination for early alert systems. DNA sequencing was performed in-house in this case, but the low cost and fast turnover of commercial Sanger sequencing is compatible with small to medium scale field studies. Next generation sequencing of plastid or nuclear genome sequences may be adopted for large-scale studies aimed at monitoring new invasions, or to gain knowledge about Lemna genome plasticity related to the variation of invaded environments.