GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2
We sampled plant lineages from the NCBI database for which ribosomal ITS/ITS2 sequences were available from closely related species. We focused on lineages with DNA barcode data, as these generally have adequate inter- and intraspecific sampling for effective species identification. The validity and coverage of species within each lineage were checked using The Plant List online service (http://www.theplantlist.org). The representativeness of the sampled lineages across the major angiosperm groups was assessed against the Angiosperm Phylogeny Group IV system (APG IV, 2016). All sequences from these lineages annotated as “internal transcribed spacer” or “internal transcribed spacer 2” were selected. ITS2 boundaries were then determined using GenBank annotations or the hidden Markov models implemented in the ITS2 database (http://its2.bioapps.biozentrum.uniwuerzburg.de/). The sequences of each genus were aligned and edited in BioEdit, and incomplete ITS2 sequences were excluded. In total, 8666 species representing 165 genera, 63 families, and 30 orders were retained for analysis. We also retrieved EST unigene data sets from Serres-Giardi’s previous study and tested the correlation between EST and ITS2 GC contents among the shared genera.
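The GC-content comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names (`gc_content`, `genus_mean_gc`, `gc_correlation`), the choice to ignore gaps and ambiguous bases, the use of per-genus means, and the use of Pearson's coefficient are all assumptions made for illustration; the text does not specify which correlation statistic was applied.

```python
from math import sqrt
from statistics import mean


def gc_content(seq):
    """Return the GC fraction of a sequence.

    Assumption: alignment gaps and ambiguous bases (e.g. '-', 'N') are
    ignored, so only A, C, G, T/U contribute to the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def genus_mean_gc(seqs_by_genus):
    """Map each genus to the mean GC fraction of its sequences."""
    return {
        genus: mean(gc_content(s) for s in seqs)
        for genus, seqs in seqs_by_genus.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def gc_correlation(its2_gc, est_gc):
    """Correlate ITS2 and EST GC contents over the genera present in both.

    Both arguments map genus name -> mean GC fraction.
    Returns (shared genera, Pearson r).
    """
    shared = sorted(set(its2_gc) & set(est_gc))
    if len(shared) < 2:
        raise ValueError("need at least two shared genera")
    r = pearson([its2_gc[g] for g in shared], [est_gc[g] for g in shared])
    return shared, r
```

In use, `genus_mean_gc` would be applied separately to the ITS2 sequences and to the EST unigenes, and the two resulting dictionaries passed to `gc_correlation`. Genera present in only one data set are dropped before the coefficient is computed.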