Multiple nuclear genes stabilize the phylogenetic backbone of the genus Quercus

Phylogenetic relationships among 108 oak species (genus Quercus L.) were inferred using DNA sequences of six nuclear genes selected from the existing genomic resources of the genus. Previous phylogenetic reconstructions based on traditional molecular markers are inconclusive at the deeper nodes. Overall, weak phylogenetic signals were obtained for each individual gene analysis, but stronger signals were obtained when gene sequences were concatenated. Our data support the recognition of six major intrageneric groups Cyclobalanopsis, Cerris, Ilex, Quercus, Lobatae and Protobalanus. Our analyses provide resolution at deeper nodes but with moderate support and a more robust infrageneric classification within the two major clades, the ‘Old World Oaks’ (Cyclobalanopsis, Cerris, Ilex) and ‘New World Oaks’ (Quercus, Lobatae, Protobalanus). However, depending on outgroup choice, our analysis yielded two alternative placements of the Cyclobalanopsis clade within the genus Quercus. When Castanea Mill. was chosen as outgroup, our data suggested that the genus Quercus comprised two clades corresponding to two subgenera as traditionally recognized by Camus: subgenus Euquercus Hickel and Camus and subgenus Cyclobalanopsis Øersted (Schneider). However, when Notholithocarpus Manos, Cannon and S. Oh was chosen as an outgroup subgenus Cyclobalanopsis clustered with Cerris and Ilex groups to form the Old World clade. To assess the placement of the root, we complemented our dataset with published data of ITS and CRC sequences. Based on the concatenated eight gene sequences, the most likely root position is at the split between the ‘Old World Oaks’ and the ‘New World Oaks’, which is one of the alternative positions suggested by our six gene analysis. Using a dating approach, we inferred an Eocene age for the primary divergences in Quercus and a root age of about 50–55 Ma, which agrees with palaeobotanical evidence. 
Finally, irrespective of the outgroup choice, our data boost the topology within the New World clade, where (Protobalanus + Quercus) is a sister clade of Lobatae. Inferred divergence ages within this clade and the Cerris–Ilex clade are generally younger than could be expected from the fossil record, indicating that morphological differentiation pre-dates genetic isolation in this clade.


Introduction
Oaks (Quercus spp.) belong to one of the most widely distributed forest-tree genera. They are dominant species in many forest types ranging from temperate deciduous forests in North America, Europe and Asia, to Mediterranean and desert scrub forest in America and Europe, and finally to tropical montane forest in South America and South East Asia. Besides their economic and ecological importance, oaks are also considered in many countries as cultural and patrimonial resources. Depending on the authors, there are between 300 and 600 reported oak species, making Quercus one of the largest tree genera of the northern hemisphere. Since the beginning of Linnaean taxonomy, the classification within the genus Quercus L. has been controversial, and numerous classifications have been proposed (Ørsted, 1867; Schwarz, 1936; Camus, 1936–1954; Menitsky, 1984, 2005; Nixon, 1993).
Discrepancies stem either from different nomenclatures used for the classification levels (subgenera, series, species) by the different authors, or from the lack of diagnostic morphological characters. Traditional taxonomic characters such as leaf shape and lobing, pubescence, acorn morphology and floral characters exhibit large intraspecific variation, and frequent hybridization in oaks further increases their range of variation. Two nuclear-encoded molecular markers, the widely used ITS region of the 35S rDNA cistron and a 2415 bp long part of the CRABS CLAW gene (CRC), have clarified the infrageneric classification by recognizing five or six major groups (Denk & Grimm, 2010), which are partly in agreement with the consensus of the classification systems of Camus (1936–1954), Schwarz (1936) and Menitsky (2005; reviewed in Denk & Grimm, 2010). These species groups are: the North American golden-cup oaks (Group Protobalanus), the red or black oaks (Group Lobatae), the white oaks (Group Quercus) with a ubiquitous northern hemispheric distribution, the exclusively Eurasian Group Cerris and Group Ilex (according to Denk & Grimm, 2010) and the exclusively East Asian (subtropical to tropical) cycle-cup oaks (Group Cyclobalanopsis), all of which have traditionally been treated as distinct genera (Ørsted, 1867; Schwarz, 1936) or subgenera (Camus, 1936–1938). Taxonomic formalization of these groups is pending, since morphological traits are highly homoplastic in oaks. Nevertheless, five of the six groups (Groups Protobalanus, Quercus-Lobatae, Cerris, Ilex and Cyclobalanopsis) can be unambiguously identified based on pollen ornamentation; furthermore, a number of pollen types are today exclusively found in North American red or white oaks (Solomon, 1983a, 1983b; Denk & Grimm, 2009; Denk et al., 2010, 2012).
Phylogenetic inferences and species delineation in oaks using molecular data have been hindered by high intraspecific genetic variation and low interspecific divergence in the genus, observed in nuclear and plastid datasets (Kremer & Petit, 1993; Denk & Grimm, 2010; Kremer et al., 2012; Simeone et al., 2013), and by frequent interspecific hybridization events (Rushton, 1993). In particular, plastid data seem to lack clear discriminating signals at the intrageneric level in oaks and other Fagaceae (Manos et al., 1999) and Nothofagaceae (Acosta & Premoli, 2010; Premoli et al., 2012). Hence, earlier phylogenetic studies (Manos et al., 1999; Manos & Stanford, 2001; Xu, 2004; Denk & Grimm, 2009) have so far failed to unambiguously resolve relationships among and within the six major groups of Quercus. Discrepancies among different studies and resolution problems could be partly attributed to incomplete (unrepresentative) taxon sampling (Manos & Stanford, 2001; Xu, 2004; Xu et al., 2005) and limited gene sampling (Denk & Grimm, 2010), but a further source of topological ambiguity seems to be outgroup selection (Manos & Stanford, 2001; Xu, 2004; Xu et al., 2005) and a generally weak phylogenetic signal in all sequence regions analysed so far. Equally problematic is a cladistic-systematic interpretation of the molecular-based phylograms, since Groups Protobalanus, Quercus and Ilex form either grades or clades depending on the gene and taxon sampling (Table 1).
Here, we explore recently developed genomic resources for the genus (Ueno et al., 2010) to find new nuclear gene sequences for clarifying phylogenetic relationships among the six groups. We also use a comprehensive taxon sampling. Our objectives were: (1) to test the monophyly of all of the infrageneric groups; (2) to improve the resolution at the deeper nodes of the phylogeny; (3) to investigate the effect of outgroups on the inferred genus root; and (4) to infer divergence ages of the major infrageneric splits.

Species sampling
Leaf or bud material of 108 species of the genus Quercus, and two additional species belonging to the putatively closely related genera Notholithocarpus Manos, Cannon & S. Oh and Castanea Mill. (Table 2) was collected. All samples were collected in situ from natural populations or collections (provenance tests or common garden experiments).
In situ collections of leaf or bud material in natural populations were made on one to seven individuals for each species located within a neighbourhood of less than 500 metres. The overall collection of specimens (210 individuals) covers a representative set of species from the six infrageneric groups (Groups Cyclobalanopsis, Quercus, Lobatae, Protobalanus, Cerris and Ilex) across their geographic distribution range (Table 2). A complete list of samples, their taxonomic classification and their geographic origin can be found in Table S1 (see online supplemental material).

DNA extraction, marker selection, amplification and sequencing
Collected tissues were immediately preserved in dried silica gel or were frozen at −80 °C. Genomic DNA was extracted using the Invisorb® DNA plant HTS 96 kit from Invitek Company using the manufacturer's protocol. DNA quality was visualized on a 1% (w/v) agarose gel stained with GelRed® (Biotium, USA). DNA concentration was quantified on an eight-channel Nanodrop® spectrophotometer and the concentration of each sample was adjusted manually to approximately 10 ng/µL.
We used the resequencing data of a set of about 800 gene fragments in three European species: Q. petraea (Matt.) Liebl., Q. robur L. (Group Quercus) and Q. ilex L. (Group Ilex) selected within the first oak EST library constructed for the NCBI database (Ueno et al., 2010). This resequencing project of 800 fragments was conducted on 11 individuals (each) of Q. petraea and Q. robur and one individual of Q. ilex. These 800 fragments corresponded to 621 candidate genes (http://www.evoltree.eu/index.php/candidate-genes) potentially involved in adaptive traits (such as growth, bud phenology, biotic interactions, etc.) and a subset of 87 control genes randomly selected in the EST database of Quercus petraea and Q. robur (Ueno et al., 2010). The sequences of the 87 control genes were analysed to select a subset of fragments exhibiting interspecific variation between Q. robur, Q. petraea and Q. ilex, and no intraspecific variation. We set several criteria in order to select genes that might be useful for phylogenetic investigations. To ensure that they could be sequenced in one run, we set length ranges from 500 to 900 bp. Finally, amplification success and sequence quality were used as final decision criteria. All these tests aiming at the selection of genes were conducted on a panel of 14 individuals, comprising two individuals from each of the six infrageneric groups of the genus Quercus and two trees of Castanea mollissima.
Table footnote f: Assignment of the study species to infrageneric groups (Group Cyclobalanopsis, Quercus, Lobatae and Protobalanus) was done according to Camus (1936–1954) and Valencia (2004). Assignments to Group Cerris and Group Ilex were done according to Menitsky (2005).
Using these criteria, we selected six markers, annotated through Blastx searches against the Swiss-Prot database (release July 2013) with an E-value cutoff of 10⁻⁵ (CL1155: No hit, CL8461: No hit, CL8561: NADP-dependent malic enzyme, CL5191: 1-aminocyclopropane-1-carboxylate oxidase homolog 1, CL9715: No hit, CL8745: Calnexin homolog). The primer sequences are listed in Table 3.
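The selection criteria described above (interspecific but no intraspecific variation, a length of 500 to 900 bp, and successful amplification) can be read as a simple filter over candidate fragments. The following is a minimal sketch only: the `Fragment` record, its field names and the example values are hypothetical and are not taken from the study's data or pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical summary of one resequenced gene fragment."""
    name: str
    length_bp: int
    interspecific_variable_sites: int  # sites differing between species
    intraspecific_variable_sites: int  # sites varying within a species
    amplified: bool                    # amplification and sequencing succeeded


def passes_filters(f: Fragment, min_len: int = 500, max_len: int = 900) -> bool:
    """Apply the four criteria named in the text to a single fragment."""
    return (
        min_len <= f.length_bp <= max_len
        and f.interspecific_variable_sites > 0
        and f.intraspecific_variable_sites == 0
        and f.amplified
    )


def select_markers(fragments: List[Fragment]) -> List[str]:
    """Return the names of all fragments that satisfy every criterion."""
    return [f.name for f in fragments if passes_filters(f)]


# Invented example values, for illustration only.
candidates = [
    Fragment("frag_a", 650, 4, 0, True),   # passes all criteria
    Fragment("frag_b", 1200, 6, 0, True),  # too long for a single read
    Fragment("frag_c", 700, 3, 2, True),   # varies within species
    Fragment("frag_d", 800, 0, 0, True),   # no signal between species
]
selected = select_markers(candidates)
```

In this sketch only `frag_a` is retained; sequence quality, which the text also mentions as a decision criterion, is folded into the single `amplified` flag for brevity.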
All amplifications were carried out in a 20 µL volume, containing 2 µL of 10× reaction buffer, 5 mM of MgCl2, 5 mM each dNTP, 4 mM each primer, 0.5 units of AmpliTaq Gold® 360 (Applied Biosystems) and approximately 20 ng of template DNA. The same touch-down PCR program was used for all PCR amplifications: an initial step at 95 °C for 10 min was followed by 2 cycles at 94 °C for 60 s, 59 °C for 60 s and 70 °C for 120 s; 17 cycles at 93 °C for 45 s, 59 °C for 30 s (with a decrease of temperature of 0.5 °C per cycle) and 70 °C for 120 s; 19 cycles at 92 °C for 30 s, 50 °C for 30 s and 70 °C for 120 s; and a final incubation at 72 °C for 10 min. PCR products were separated on a 3% agarose gel stained with GelRed® (Biotium, USA) for verification of amplification. All amplified fragments were sequenced in an ABI3730 automated sequencer (Applied Biosystems).
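To make the touch-down program easier to follow, the sketch below expands it into an explicit list of steps. It assumes that the first of the 17 touch-down cycles anneals at 59 °C and that each later cycle is 0.5 °C cooler, and that the unlabelled "60" in the first cycling block is in seconds; both readings are assumptions, and the function names are illustrative.

```python
from typing import List, Tuple

Step = Tuple[str, float, int]  # (label, temperature in deg C, duration in s)


def touchdown_annealing(start: float, decrement: float, cycles: int) -> List[float]:
    """Annealing temperature of each touch-down cycle (first cycle at `start`)."""
    return [start - decrement * i for i in range(cycles)]


def build_program() -> List[Step]:
    """Expand the PCR program from the text into one entry per step."""
    steps: List[Step] = [("initial denaturation", 95.0, 600)]
    for _ in range(2):  # 2 cycles with a fixed 59 deg C annealing step
        steps += [("denature", 94.0, 60), ("anneal", 59.0, 60), ("extend", 70.0, 120)]
    for temp in touchdown_annealing(59.0, 0.5, 17):  # 17 touch-down cycles
        steps += [("denature", 93.0, 45), ("anneal", temp, 30), ("extend", 70.0, 120)]
    for _ in range(19):  # 19 cycles at the final annealing temperature
        steps += [("denature", 92.0, 30), ("anneal", 50.0, 30), ("extend", 70.0, 120)]
    steps.append(("final extension", 72.0, 600))
    return steps


program = build_program()
touchdown = touchdown_annealing(59.0, 0.5, 17)
total_seconds = sum(duration for _, _, duration in program)
```

Under this reading the last touch-down cycle anneals at 51 °C, one step above the 50 °C used in the final 19 cycles. The total hold time excludes ramping between temperatures.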

Sequence alignments
The obtained sequences were imported into CodonCode Aligner version 3.7 (CodonCode Corporation, Dedham, MA, USA). We only used sequences for which forward and reverse reads were of good quality. All other sequences were discarded. All sequences were edited using CodonCode Aligner version 3.7 and aligned with the Muscle algorithm (Edgar, 2004) of the Seaview software version 4 (Gouy et al., 2010). Coding parts of the markers were checked for protein-coding frame shifts, to eliminate pseudogenes (Zhang & Hewitt, 1996), with Mega v4.1 (Tamura et al., 2007). Insertion-deletion events were treated as missing data.
When sequences from several individuals were available for a given species, we compiled a consensus sequence by using Bioedit v7.1.3 (Hall, 1999). This strategy allowed minimizing missing data (i.e. number of taxa for which gene sequences were missing). When for a site a nucleotide base was less frequent than 95%, the IUPAC (International Union of Pure and Applied Chemistry) ambiguity code was used. Accession numbers of consensus sequences are provided in Table S2 (see supplemental material online). In order to conduct a combined analysis we concatenated all sequences into a single alignment using Seaview. The final concatenated matrix comprised 112 taxa and six genes with a total of 2896 bp.
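The consensus rule described above can be illustrated with a short sketch. It is not the BioEdit procedure itself: it assumes that the sequences of one species are already aligned to equal length, that gaps and missing characters are ignored when counting bases, and that a site with no observed base is written as N. The function name and the example sequences are hypothetical.

```python
from collections import Counter
from typing import List

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(seqs: List[str], threshold: float = 0.95) -> str:
    """Per-site consensus: keep a base if its frequency reaches `threshold`,
    otherwise write the IUPAC code covering all bases observed at that site."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if not counts:  # only gaps or missing data at this site (assumption: N)
            out.append("N")
            continue
        base, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= threshold:
            out.append(base)
        else:
            out.append(IUPAC[frozenset(counts)])
    return "".join(out)


# Three hypothetical individuals of one species; site 2 is polymorphic (C/T).
example = consensus(["ACGT", "ACGT", "ATGT"])
```

With at most seven individuals per species, any polymorphic site falls below the 95% threshold, so in practice every observed within-species polymorphism is encoded as an ambiguity code.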

Phylogenetic reconstructions
Phylogenetic trees were rooted using Notholithocarpus densiflorus and/or Castanea mollissima as outgroups. Recent phylogenetic analysis of Fagaceae showed that these two genera are the closest relatives of oaks (Denk & Grimm, 2010). We first performed Maximum Likelihood (ML) analyses on each individual gene and on the concatenated sequences, with associated standard bootstrapping, using the MPI-parallelized RAxML 7.2.8-ALPHA (Stamatakis, 2006; Stamatakis et al., 2008). Given that α, which scales the shape of the gamma distribution, and the proportion of invariable sites cannot be optimized independently from each other, and following Stamatakis' personal recommendations (RAxML manual), we used GTR + Γ with 25 discrete rate categories for all partitions (-m GTRGAMMA). Support was established based on 1000 standard bootstrap replicates.
For Bayesian Inference (BI) analysis (using MrBayes version 3.1.2), the best fitting model was set for each partition. Parameters of the model were treated as unknown variables with uniform prior probabilities and were estimated during the analysis; they were allowed to vary across partitions. Each Bayesian analysis included two MCMC runs, each composed of four chains, three heated and one cold. Each Markov chain was started from a random tree and run for up to 20 × 10⁶ generations, sampling the chains every 1000th cycle. At the end of each run we considered the sampling of the posterior distribution to be adequate if the average standard deviation of split frequencies was < 0.05. The log-likelihood scores of sample points were plotted against generation time to determine when the chain became stationary. After discarding the 'burn-in' samples (25% of the trees), MCMC runs were summarized and further investigated for convergence of all parameters, using the sump and sumt commands in MrBayes version 3.1.2 and the computer program Tracer version 1.4 (Rambaut & Drummond, 2007). Data remaining after discarding burn-in samples were used to generate a majority-rule consensus tree where the percentage of samples recovering any particular branch of the consensus tree represented the clade's posterior probability (Huelsenbeck & Ronquist, 2001). Posterior probabilities (PP) of 0.95 or higher were considered as significant support. The mean, variance and 95% credibility intervals were calculated from the set of substitution parameters. RAxML and MrBayes analyses were conducted on a 150-core Linux Cluster at CBGP (Centre for Biology and Management of Populations, Montferrier-sur-Lez, France) and a 16-core Linux Cluster at INRA Bordeaux, France.
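The convergence criterion and the posterior probabilities mentioned above both derive from split (clade) frequencies in the sampled trees. The sketch below shows the idea in simplified form and is not MrBayes code: each sampled tree is represented as a set of clades (frozensets of taxon names) rather than parsed from a tree file, the standard deviation across runs is the sample standard deviation, and splits below a minimum frequency of 0.10 in every run are ignored. These choices, and all function names, are assumptions made for illustration.

```python
from collections import Counter
from statistics import stdev
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # a sampled tree, reduced to the clades it contains


def discard_burnin(samples: List[Tree], fraction: float = 0.25) -> List[Tree]:
    """Drop the first `fraction` of the samples of one run."""
    return samples[int(len(samples) * fraction):]


def split_frequencies(trees: List[Tree]) -> Dict[Clade, float]:
    """Fraction of trees containing each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def asdsf(runs: List[List[Tree]], burnin: float = 0.25, min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across runs."""
    freqs = [split_frequencies(discard_burnin(run, burnin)) for run in runs]
    deviations = []
    for clade in set().union(*freqs):
        values = [f.get(clade, 0.0) for f in freqs]
        if max(values) >= min_freq:
            deviations.append(stdev(values))
    return sum(deviations) / len(deviations) if deviations else 0.0


def posterior_probabilities(runs: List[List[Tree]], burnin: float = 0.25) -> Dict[Clade, float]:
    """Clade frequencies in the pooled post-burn-in samples of all runs."""
    pooled = [tree for run in runs for tree in discard_burnin(run, burnin)]
    return split_frequencies(pooled)


# Toy example with two runs of eight samples each (hypothetical clades).
AB, AC = frozenset({"A", "B"}), frozenset({"A", "C"})
run1 = [{AC}, {AC}] + [{AB}] * 6
run2 = [{AC}, {AC}] + [{AB}] * 3 + [{AC}] * 3
diagnostic = asdsf([run1, run2])
support = posterior_probabilities([run1, run2])
```

In the toy example the two runs disagree strongly, so the diagnostic is far above the 0.05 threshold used in the text, and clade AB receives a pooled posterior probability of 0.75, below the 0.95 level treated as significant.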
Finally, we extended the phylogenetic reconstruction by combining our sequences with sequences available in gene banks, namely ITS and CRC sequences (downloaded from NCBI GenBank, 6.10.2012). GenBank data included few to many accessions from one to several individuals per species (particularly ITS). Our data matrix was constructed using strict consensus sequences. Strict consensus sequences were computed for each species in order to concatenate the new data with data stored in GenBank.
Furthermore to detect deep phylogenetic splits that may otherwise be obscured by terminal noise, strict consensus sequences were computed for the following species clusters: Group Quercus: Four consensus sequences were computed corresponding to the major disjunct distribution areas Eastern Asia, Western Eurasia, Eastern North America (i.e. Eastern USA) and Western North America + Southern North America (i.e. Western USA + Mexico). Group Lobatae: Two consensus sequences were computed for Eastern North America and Western North America + Southern North America (i.e. Western USA + Mexico). Group Ilex: Two consensus sequences were computed for Eastern Asia (Q. franchetii, Q. dolicholepis) and Euro-Mediterranean basin (Q. coccifera, Q. ilex). Group Cerris: Two consensus sequences were computed for Eastern Asia and Western Eurasia. Group Protobalanus: One consensus sequence was computed for this Western North American endemic group (California and Oregon). Group Cyclobalanopsis: One consensus sequence was computed for all the (East Asian) species investigated. This approach was preferred over random selection of a placeholder sequence to avoid topological bias inflicted by unrepresentative sequences or missing data. Incompatible signal in the concatenated matrix was visualized using consensus networks based on the ML bootstrap replicate trees ('bipartition networks'; Grimm et al., 2006). Bipartition networks, in which the length of each edge is proportional to the frequency of the respective bipartition in the bootstrap tree sample, were computed with the consensus network module implemented in SplitsTree using the option 'COUNT' (Huson & Bryant, 2006). Additionally, we applied a gene jackknifing approach to test for the robustness of the result to the gene sampling. Bootstrapping analyses were run based on the complete data, single-gene data and 7-gene datasets excluding one partition in the original matrix. 
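The gene jackknifing step described above amounts to building one concatenated matrix per excluded partition. The following is a minimal sketch under stated assumptions: alignments are held as dictionaries mapping taxon names to aligned sequences, taxa missing from a gene are padded with "?", and the function names and toy data are hypothetical rather than taken from the study.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence of one gene


def concatenate(partitions: Dict[str, Alignment], genes: List[str]) -> Alignment:
    """Concatenate the chosen genes; taxa lacking a gene are padded with '?'."""
    taxa = sorted({taxon for g in genes for taxon in partitions[g]})
    matrix = {taxon: "" for taxon in taxa}
    for g in genes:
        length = len(next(iter(partitions[g].values())))
        for taxon in taxa:
            matrix[taxon] += partitions[g].get(taxon, "?" * length)
    return matrix


def gene_jackknife(partitions: Dict[str, Alignment]) -> Dict[str, Alignment]:
    """Map each left-out gene to the concatenation of all remaining genes."""
    genes = sorted(partitions)
    return {
        left_out: concatenate(partitions, [g for g in genes if g != left_out])
        for left_out in genes
    }


# Toy data: three short partitions, one of them missing taxon B.
toy = {
    "g1": {"A": "AC", "B": "AT"},
    "g2": {"A": "GGG"},
    "g3": {"A": "T", "B": "C"},
}
reduced = gene_jackknife(toy)
```

With eight partitions this yields eight 7-gene matrices, each of which would then be bootstrapped separately; a clade whose support drops sharply when one gene is removed owes much of its signal to that gene.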
The consensus sequences dataset was finally used to optimize the position of the two putative outgroups within the optimal ingroup topology using the evolutionary placement algorithm (EPA) implemented in RAxML (Berger et al., 2011). EPA provides a probability for placing a target taxon (sequence) in a given topology. Although invented for a different purpose, EPA results hence allow further investigation of alternative placements of outgroup-inferred roots (A. Stamatakis, pers. comm., 2013).

Molecular dating
Divergence ages were calculated in BEAST v. 1.8 (Drummond et al., 2012) on the strict consensus sequence matrix. Substitution models were initially defined for each partition as selected by Modeltest (see above). Ambiguities were treated as polymorphisms by changing the default setting in the .xml file generated with BEAUti (Drummond et al., 2012) ('useAmbiguities' = 'true') to compensate for the use of consensus sequences. We then selected a Yule tree prior with an uncorrelated lognormal molecular clock (UCLD).
To infer absolute ages for the diversification of the main species cluster, we selected five different ingroup fossils, fossils unambiguously assignable to one of the intrageneric lineages, for age calibration (Table 4). Calibrations used a single fossil as age constraint, a lognormal age prior with the uppermost limit of the time interval as a minimum hard bound (offset) and large standard deviations (1.5 for calibrations 1, 2 and 5; 1.2 for 3a, 1.3 for 3b, 4). For most of the fossil constraints relatively exact stratigraphic ages were available, except for constraint no. 3 (first Cerris-type pollen). To compensate for the age uncertainty in this case, two calibrations were done, using the minimum and maximum possible stratigraphic age of the fossil flora. To assess the reliability of our root, we ran analyses where the ingroup was constrained to be monophyletic and another batch without constraining the ingroup. Two runs of 150 million generations with sampling every 10 000 generations were performed for each calibration/rooting scheme. The two separate runs were then combined using LogCombiner v.1.8. We ensured convergence of all parameters using Tracer 1.5 (Drummond et al., 2012). We used TreeAnnotator v.1.8 to generate a maximum clade credibility (MCC) tree and FigTree v.1.3.1 to visualize the results.
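To show what a lognormal age prior with a hard minimum bound implies, the sketch below computes quantiles of an offset lognormal distribution. The log-scale standard deviation of 1.5 is the value given above for calibrations 1, 2 and 5; the offset of 48 Ma and the log-scale mean of 0 are hypothetical placeholders, since the actual values belong to Table 4 and the analysis files, and the function name is illustrative.

```python
from math import exp
from statistics import NormalDist


def offset_lognormal_quantile(p: float, offset: float, mu: float, sigma: float) -> float:
    """Quantile of offset + LogNormal(mu, sigma), with mu and sigma on the log scale."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return offset + exp(mu + sigma * z)


# Hypothetical calibration: minimum age 48 Ma, log-mean 0, log-SD 1.5.
OFFSET_MA, MU, SIGMA = 48.0, 0.0, 1.5
median_age = offset_lognormal_quantile(0.5, OFFSET_MA, MU, SIGMA)
upper_age = offset_lognormal_quantile(0.975, OFFSET_MA, MU, SIGMA)
lower_age = offset_lognormal_quantile(0.025, OFFSET_MA, MU, SIGMA)
```

The offset acts as the hard minimum: no quantile can fall below it, while a large standard deviation stretches the upper tail so that considerably older node ages remain plausible under the prior.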

PCR and sequencing results
Amplifications were successful for most but not all six nuclear genes selected (Table 5 and Table S2). Eleven species were represented by only two gene sequences, 14 by all six gene sequences. The whole dataset resulted in a supermatrix of 453 sequences out of the 672 potential sequences (= 112 samples × 6 genes), thus representing 67% of the theoretically possible data. Overall, the six selected gene sequences exhibited lower nucleotide diversity and fewer parsimony-informative characters than previous markers used for phylogenetic reconstruction (Table 5). The number of distinct alignment patterns ranged from 57 to 161 depending on the selected genes and outgroup (Table 5). Finally, the proportion of ambiguous nucleotide sites, which comprise intra-species polymorphism and possible sequencing errors, amounted to 0.0024, which is negligible in comparison to the between-species variation (Table 5).

Single gene analysis
The results of the single-gene ML analyses illustrate the differences in the signal provided by each gene region and the inclusion of either Castanea or Notholithocarpus (Figs S1–S4, see supplemental material online). The phylogenies were rooted on Notholithocarpus and Castanea except for CL8745, which was unrooted because this gene could not be amplified in both outgroup genera. The phylogenetic analysis of CL1155 retrieved four ingroup clades with high BP supports (BP generally > 90), while members of Groups Protobalanus and Quercus clustered together in another strongly supported clade (Figs S1 and S3). Phylogenetic analyses of CL8561 recovered three of the six intrageneric groups as clades (Groups Lobatae, Quercus and Cyclobalanopsis), with BP of 98, 78 and 93, respectively (Fig. S4) when Notholithocarpus was used as outgroup, but only two (Groups Lobatae and Quercus) when rooted on Castanea (Fig. S2). The third marker, CL5191, for which amplification was less successful, was also less informative (Figs S1 and S3). It divided the species into two principal groups, one clade mixing members of Group Cerris and Cyclobalanopsis and an unsorted grade with the remainder. In addition, there were some taxa that were scattered in the phylogeny (one red oak was found in the Cerris-Cyclobalanopsis clade, one member of Group Cyclobalanopsis was found as a sister taxon of the clade comprising Groups Quercus and Lobatae [...] S4). Overall, the single gene analyses retrieved clades comprising members of one or several infrageneric groups depending on the gene, but the relationships between these clades remained unresolved. These results prevented us from using any supertree approach (Scornavacca et al., 2008).

Combined analyses
We first inferred an ingroup tree comprising all species and using the concatenated sequences of all six genes (Fig. 1). This tree clearly separates the six major infrageneric groups. Topologies of ML trees including either Castanea or Notholithocarpus were highly concordant with those obtained by BI, as shown by the congruence between bootstrap values and posterior probabilities (Figs 2 and 3), and were much better resolved than the trees obtained with each individual gene. However, there was a slight topological shift depending on the outgroup species used. Using Castanea as outgroup (Fig. 2), the resulting tree resolved two strongly supported clades within Quercus corresponding to the two subgenera recognized by Camus (1936–1954) (Cyclobalanopsis and Euquercus). The nine species of Group Cyclobalanopsis (= subgenus Cyclobalanopsis according to Camus) formed a sister clade to the remaining taxa. The clade corresponding to subgenus Euquercus (according to Camus) was further subdivided into two moderately supported clades: one clade comprised the 'New World Oaks' (Manos et al., 1999) of Groups Quercus, Lobatae and Protobalanus (of which Group Quercus also includes Old World species); the second clade was made up of species belonging to Groups Cerris and Ilex. The three infrageneric groups within the 'New World Oaks' were retrieved by ML and BI analyses as clades with strong support values. The Cerris-Ilex clade comprised one subclade of Euromediterranean species of Group Ilex (Q. ilex, Q. coccifera L.) and a second subclade with all species of Group Cerris and the two analysed East Asian species of Group Ilex (Q. dolicholepis A. Camus and Q. franchetii Skan), however, with poor support.
When Notholithocarpus was included as outgroup and used to root the tree (Fig. 3), the positions of the ingroup root and of the Group Cyclobalanopsis changed, resulting in two poorly supported major clades: one corresponding to the 'New World Oaks' as above, the other to the 'Old World Oaks' (Manos et al., 1999) comprising Groups Cyclobalanopsis, Cerris and Ilex. The main difference between both analyses is the loss of the deeper split between the two subgenera, and the shift of Group Cyclobalanopsis from a 'sister-to-the-remaining-taxa' position to the 'Old World Oaks'.
Figure caption (fragment): Circles at nodes indicate the tested alternative ingroup roots. Filled grey circles correspond to placements for which the likelihood was not significantly different from the initial placement. Empty circles correspond to significantly lower likelihoods.
We tested alternative ingroup roots as shown in Figure 2 (Castanea is the outgroup) and in Figure 3 (Notholithocarpus is the outgroup) using the nonparametric Shimodaira–Hasegawa (SH) test implemented in RAxML. When Castanea was included in the matrix, the likelihood of all other topologies was significantly lower than that of the ML-preferred topology (Fig. 2), except when Castanea was connected to the basal branches within the Cerris-Ilex-Cyclobalanopsis subtree. When Notholithocarpus was selected as outgroup, the likelihood of alternative root positions was not significantly different from the ML-preferred position (Fig. 3), but was lower when Notholithocarpus was connected to the branch leading to Groups Cerris and Cyclobalanopsis. In conclusion, for both outgroups, relatively more likely topologies are obtained when the outgroups are connected to basal branches within the Cyclobalanopsis-Cerris-Ilex subtree (see also Fig. 1). This lends credibility to a basal split between the New World and Old World clades.

Stability of found ingroup and outgroup relationships against signals from CRABS CLAW (CRC) gene and ITS region of the 35S rDNA cistron
To investigate the effect of gene sampling on ingroup relationships and the placement of the two outgroup taxa, we complemented our data with ITS and CRC sequences harvested from gene bank data (see Materials and methods). The 8-gene bipartition (bootstrap) network obtained with the strict group consensus sequences (Fig. 4) corroborated the backbone topology of the 6-gene ML or Bayesian inferences (Fig. 1). The two main clades, the New World and the Old World clade, were strongly supported. Furthermore, there was moderate support for the ingroup–outgroup split. The subsequent gene jackknifing allowed further identification of phylogenetic footprints from each marker (Table 6). The sister relationship (and mutual monophyly) between Group Protobalanus and Group Quercus is mostly due to signal from CL1155. When this gene is removed from the analysis, the alternative Lobatae–Quercus sister relationship is favoured, although the support for a Protobalanus-Quercus relationship does not collapse to zero. Furthermore, it is the CRC sequences in combination with other partitions that generate the clear split between the ingroup Quercus and the outgroups Notholithocarpus and Castanea. The position of the two outgroup taxa shown in Fig. 4 could be confirmed using the Evolutionary Placement Algorithm (EPA) implemented in RAxML. Although the EPA analyses based on individual genes indicated different possible roots, partly with high probability (CL1155, CL8561, CL9715), the concatenated data provided unambiguous support for the New World-Old World split (Table 7) and the monophyly of the genus (Fig. 4).

Molecular dating
Divergence ages were calculated on the 8-gene consensus sequence matrix based on five different ingroup fossils (Table 4). Inferred divergence ages vary strongly depending on the constraint used (Fig. 5; Table S3, see supplemental material online). The overall best fit, an average difference of 10 million years (Ma) for inferred medians to expected ages (7 Ma regarding inferred min and max values), was obtained using constraint no. 1 (Table 4), the oldest unambiguous fossil record of Group Cyclobalanopsis. This 'best fit' suggests a root age of 54 (68–48) Ma for genus Quercus (Fig. 5). Divergence between the New World oaks (Groups Protobalanus, Quercus, Lobatae) and the Old World oaks (Groups Cyclobalanopsis, Ilex, Cerris) occurred shortly after, during the early Eocene. Radiations within each of these two clades lasted until the mid Miocene. Cyclobalanopsis, 49 Ma (48–52 Ma, constrained), and Lobatae, 34 (53–9) Ma, diverged already in the Eocene. Inferred divergence ages between Protobalanus and Quercus in the New World oaks, 15 (8–38) Ma, and between Cerris and (East Asian) Ilex, 10 (1–30) Ma, in the Old World oaks fall within the mid to late Miocene. Using a single minimum age constraint on the relatively complex sequence data resulted in large 95% highest posterior density (HPD) intervals. Figure 5 shows that, for most nodes, the different inferences converge to an interval encompassing the inferred median age using constraint no. 1. Exceptions are the deepest ingroup nodes referring to the MRCA of all oaks, the New World oaks, and the Old World oaks, respectively. A general trend is that inferred ages tend to be younger than indicated by the fossil record, irrespective of the constraint used.

Confirmation of the infrageneric subdivision of genus Quercus
In this study we explored phylogenetic reconstructions using a larger sample of nuclear genes than in previous investigations and an extensive taxon sampling. Up to now phylogenetic analyses were mainly based on ITS sequences (Manos et al., 1999; Manos & Stanford, 2001; Denk & Grimm, 2010), with one study also using sequences from the CRABS CLAW (CRC) gene, a single-copy nuclear gene that regulates carpel development in angiosperms. Despite a lower nucleotide diversity and fewer parsimony-informative characters (Table 5) than ITS, our data largely confirmed results from previous studies (Manos et al., 1999; Manos & Stanford, 2001; Denk & Grimm, 2010) that recognized six major infrageneric lineages (Protobalanus, Lobatae, Quercus, Cyclobalanopsis, Cerris and Ilex) but also provided new insights into the phylogenetic relationship between these lineages. The major split between the 'New World' and 'Old World' clade (Denk & Grimm, 2010) is further supported, and placed in the early Eocene (Fig. 5). Within the 'New World' clade, the red oaks, Group Lobatae, are resolved as sister to a clade comprising the golden-cup oaks and white oaks (Groups Protobalanus and Quercus), in contrast to previous studies. Within the 'Old World' clade, members of Group Cyclobalanopsis are most distinct, and Group Ilex is suggested to be paraphyletic to Group Cerris. Interestingly, the six infrageneric groups are generally reproductively isolated. Successful natural or controlled hybridization between species from different groups has only been shown between Groups Ilex and Cerris (Boavida et al., 2001; Burgarella et al., 2009). Very anecdotal hybridization events between species of Groups Quercus and Cerris (Mir et al., 2006) and between Groups Quercus and Ilex (Schnitzler et al., 2004) have also been reported.
However, interspecific hybridization is considered to occur very frequently within members of Group Quercus (Rushton, 1993) or Group Lobatae (Moran et al., 2012), and has been documented for Group Cerris (an arboretum hybrid between Q. suber L. and Q. trojana Webb, Denk & Grimm, 2010). The possibilities of natural hybridization have been less investigated in the other groups, but intra- and intergroup artificial hybridization experiments support the general conclusion of frequent cross-breeding within groups and very rare hybridization success between groups (Cottam et al., 1982). Similar observations were made in the Nothofagaceae by Acosta and Premoli (2010), who inferred past hybridization events by cpDNA sharing among species.

Tree topologies and outgroup choice
Results from traditional tree inference suggest two different ingroup roots and two alternative placements of the Cyclobalanopsis clade within the genus Quercus. With Castanea as outgroup, our data suggested two clades within Quercus corresponding to the subgeneric classification of Camus (1936–1954), also partially accepted by Nixon (1993) and by Menitsky to some degree (Menitsky, 1984, 2005): subgenus Quercus (called Euquercus by Camus) and subgenus Cyclobalanopsis (called Cyclobalanoides by Menitsky, 2005). This rooted topology differed markedly from the one observed when Notholithocarpus was chosen as outgroup: subgenus Cyclobalanopsis clustered with Groups Cerris and Ilex to form an Old World clade (Figs 2 and 3). Instability of the genus root was also observed in previous studies when different outgroups were used. When outgroups were chosen that are only distantly related to Quercus, for example members of the Trigonobalanus genus complex (Manos et al., 1999) or Fagus, Group Cyclobalanopsis was usually placed as sister clade to the remaining oaks. This is similar to the topology we obtained from our 6-gene data when Castanea was used as outgroup. However, when more closely related genera such as Lithocarpus Blume (Manos & Stanford, 2001) or Notholithocarpus were used as outgroups, Group Cyclobalanopsis was reconstructed as a member of the 'Old World Oaks'. Phylogenetic relationships between subgenus Quercus and subgenus Cyclobalanopsis were therefore ambiguously resolved and were very sensitive to the outgroup choice. In our dataset, Castanea, being more distant from Quercus than Notholithocarpus, may attract ingroup taxa that are well differentiated from the rest, such as species belonging to Group Cyclobalanopsis. Thus the split between two subgeneric clades (Fig. 2) may actually be an artefact due to long branch attraction (Bergsten, 2005).
However, we cannot entirely rule out that the true phylogeny actually has its root at the longest branch. The results of the SH test (Table 6), the phylogenetic placement algorithm (Table 7) and the general support patterns in the single genes demonstrate that the signal from Castanea is more explicit regarding its placement within the Quercus tree than that of Notholithocarpus. Whereas Notholithocarpus can be shifted within the oak phylogenetic backbone, Castanea is strongly attracted by Group Cyclobalanopsis, but also by East Asian members of Group Ilex. This attraction is strongest in the CL1155 gene, whereas the signal in other partitions is inconclusive regarding the best placement of Castanea in the all-oak tree. The same gene places Notholithocarpus far from Castanea, which directly implies that the genus Quercus is not monophyletic (see e.g. Manos et al., 2008). This incongruence is resolved when the data are concatenated and filtered against terminal noise (Fig. 4, Tables 6 and 7).

Table 7. Probabilities of the position of the two putative outgroups using the evolutionary placement algorithm (EPA). The placement with the highest probability (weight) is shown; in case several alternatives showed essentially the same probability (within a range of 0.05), the number of alternatives is given and the range of the probabilities. EPA could not be applied for CL8745 because no sequences were available for Castanea and Notholithocarpus (Table S2, see supplemental material online).
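The reporting rule stated in the Table 7 caption (show the placement with the highest weight, and count the alternatives whose weight lies within 0.05 of it) can be illustrated with a minimal Python sketch. This is not the code used in the study: the function name `summarize_placements`, the branch labels and the weights are hypothetical, and the placement weights are assumed to have been computed beforehand by a placement program.

```python
from typing import Dict, Tuple


def summarize_placements(
    weights: Dict[str, float], tolerance: float = 0.05
) -> Tuple[str, int, Tuple[float, float]]:
    """Summarize placement weights for one query taxon.

    Returns the branch with the highest weight, the number of branches whose
    weight lies within `tolerance` of that maximum (the best branch included),
    and the (min, max) range of those near-equal weights.
    """
    best_branch = max(weights, key=weights.get)
    best_weight = weights[best_branch]
    near_equal = [w for w in weights.values() if best_weight - w <= tolerance]
    return best_branch, len(near_equal), (min(near_equal), max(near_equal))


# Hypothetical weights for one outgroup in one gene partition (illustration only).
example_weights = {
    "branch_to_Cyclobalanopsis": 0.48,
    "branch_to_Ilex": 0.45,
    "root_branch": 0.07,
}
summary = summarize_placements(example_weights)
```

With these made-up weights, two placements fall within the 0.05 range, so a table entry would report two alternatives together with the range of their weights rather than a single best branch.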

Phylogenetic relationships and diversification within the Old World clade
The genetic distinctness of Group Cyclobalanopsis seen in our data is well established in the literature. Within the Cerris-Ilex clade, our analyses yielded two clades, or one grade and one clade, which corresponded to the two botanical entities: Group Ilex and Group Cerris (Figs 2 and 3). Resolution is slightly better when Notholithocarpus is chosen as outgroup. One of the (East) Asian Group Ilex species (Q. franchetii) is misplaced in both analyses; however, Q. dolicholepis (the second Asian Ilex species) is placed as direct sister to Group Cerris only when Castanea is the outgroup, and with low support. Reconstructions based on cpDNA sequences (Manos et al., 1999) also separated Cerris from Ilex, with one Asian Ilex species (Q. phillyreoides A. Gray) also placed within the Cerris clade. Evergreen and sclerophyllous Euro-Mediterranean oaks (Q. ilex, Q. coccifera and Q. coccifera subsp. calliprinos) assigned to the subgenus Sclerophyllodrys by Schwarz (Schwarz, 1936–39, 1993) form a clade in our analyses (Fig. 2), similar to previous work based on ITS sequence data restricted to Eurasian oak species (Samuel et al., 1998; Bellarosa et al., 2005), and with ITS+5S-IGS data (Denk & Grimm, 2010).

Fig. 5. Chronogram based on the 8-gene consensus sequence matrix. Node heights reflect median inferred divergence ages using constraint no. 1 (Table 4). 95% confidence intervals (based on the height posterior densities) of the six runs using different constraints are indicated by blue, transparent bars to highlight overlap or disagreement between calibrations. Constrained nodes indicated by solid blue lines for each analysis; numbers refer to constraints as listed in Table 4. Line thickness corresponds to the branch length in a non-ultrametric tree (thin = short branches, thick = long branches).
However, previous studies based on CRC sequences and on ITS and CRC sequences suggested that Groups Cerris and Ilex were mutually monophyletic (Manos & Stanford, 2001; Denk & Grimm, 2009), whereas we found them intermingled in the two groups of the Old World clade. Because Asian species assigned by Menitsky (2005) to Section Ilex (Q. franchetii and Q. dolicholepis), a placement confirmed by group-diagnostic pollen ornamentation in oaks (Denk & Grimm, 2009) and ITS data (Manos et al., 1999; Denk & Grimm, 2010), occupied different positions within the Old World clade, our data do not support recognition of Section Ilex (sensu Menitsky) or Section Cerris (sensu Camus or sensu Menitsky). That Group Cerris is a relatively recent offshoot of Group Ilex (Denk & Grimm, 2009) finds further evidence in the fossil record of both groups (Table 4; Kmenta, 2011; Denk & Tekleva, in press; Velitzelos et al., 2014) and in the results of our dating (Fig. 5). It would nevertheless be important to sample more members of Group Ilex.

Phylogenetic relationships and diversification within the New World clade
Like earlier investigations, our analyses yielded a moderately supported New World clade. However, our study is the first to show the subdivision of the New World clade into three well-supported subclades corresponding to Groups Protobalanus, Lobatae and Quercus. Phylogenetic relationships between these three groups varied depending on the gene sampling. ITS sequences suggested that Group Protobalanus is a sister clade to Groups Quercus and Lobatae, but with poor support, evidence of lineage mixing, and Group Lobatae usually nested in a Group Quercus grade (Manos & Stanford, 2001; Denk & Grimm, 2010). The close relationship of Groups Lobatae and Quercus is also mirrored by their pollen ornamentation, with Group Protobalanus showing a putatively more ancestral state (Solomon, 1983a, 1983b; Denk & Grimm, 2009). Similar results but with stronger support were obtained with CRC sequences and concatenated CRC+ITS sequences, whereas earlier reports based on ITS sequences and on ITS+cpDNA placed Lobatae as sister clade to (Protobalanus+Quercus) (Manos et al., 1999), as we also found in our results (Figs 2 and 3). The gene jackknifing analysis further showed that the Protobalanus-Quercus sister relationship was due to a strong signal from gene CL1155 adding to weak, but consistent, signals from other markers. Both lineages have been separated for at least 34 million years (Ma; Fig. 5). This appears to be too young, since both groups can be traced back in the fossil record at least to the Eocene-Oligocene boundary (Table 4; Denk, pers. comm.). Because of its highly incompatible signal, we refrained from including Q. sadleriana in the dating. Thus, we may have failed to infer the age of the first radiation in the Protobalanus-Quercus clade, and the inferred age (Fig. 5) mirrors the final (genetic) isolation of the modern members of both groups. Alternatively, we may conclude that morphological differentiation pre-dates genetic isolation in this clade.
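The idea behind a gene jackknife, as referred to above, is to repeat the analysis on concatenated matrices that each omit one gene, and then to check which relationships persist. A minimal Python sketch of how such leave-one-gene-out matrices could be assembled is given below. It is an illustration under stated assumptions, not the study's pipeline: the function names, the taxa `A` and `B`, the toy sequences and the gene labels `geneX` and `geneY` are hypothetical (only CL1155 is a gene name from this study), each per-gene alignment is assumed to be non-empty and of equal length across taxa, and the tree inference itself is left out.

```python
from typing import Dict, Iterator, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def concatenate(genes: Dict[str, Alignment]) -> Alignment:
    """Concatenate per-gene alignments in sorted gene order.

    Taxa lacking a gene are padded with '?' (missing data) for that gene.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    for gene in sorted(genes):
        aln = genes[gene]
        length = len(next(iter(aln.values())))  # assumes a non-empty alignment
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, "?" * length)
    return matrix


def gene_jackknife(genes: Dict[str, Alignment]) -> Iterator[Tuple[str, Alignment]]:
    """Yield (omitted gene, concatenated matrix of all remaining genes)."""
    for left_out in sorted(genes):
        kept = {g: aln for g, aln in genes.items() if g != left_out}
        yield left_out, concatenate(kept)


# Toy data for illustration only; sequences and most labels are made up.
toy_genes = {
    "CL1155": {"A": "ACGT", "B": "ACGA"},
    "geneX": {"A": "GG", "B": "GC"},
    "geneY": {"A": "TTT"},  # taxon B lacks this gene
}
replicates = dict(gene_jackknife(toy_genes))
```

Each replicate matrix would then be analysed separately; a relationship that disappears only when one particular gene is omitted is largely driven by that gene.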

Ancient rapid divergence and late radiation obscure Quercus molecular phylogenies
As in earlier phylogenetic reconstructions (Manos & Stanford, 2001; Denk & Grimm, 2009), our reconstructions are partly characterized by low support for deeper branches. To some degree, the overall lower support is due to equally supported, competing (alternative) relationships among the different taxonomic groups; in other cases it is due to a generally weak phylogenetic signal. None of the explored gene regions showed a phylogenetic signal that was in strong conflict with the consensus topology. Such patterns are generally considered as signatures of ancient rapid radiation followed by long time spans during which earlier splits were blurred (Rokas et al., 2005). Alternatively, similar patterns can also be obtained when the genome has been poorly explored for phylogenetic signals, or when target genes exhibit inappropriate substitution rates to detect ancient events (Whitfield & Kjer, 2008). Before exploring the rapid radiation hypothesis, we may therefore question whether our selected genes were appropriate to reveal old speciation events. Selecting genes for detecting targeted phylogenetic signals is a challenging task. On the one hand, it is difficult to explore the phylogenetic signal before conducting the analysis on the whole dataset; on the other hand, technical criteria such as PCR amplification success and sequence quality may prevail over phylogenetic criteria. Our selection of six protein-coding genes resulted from a compromise between technical criteria and optimizing phylogenetic information. The independent analyses of these genes resulted in phylogenetic trees with generally short basal branches (Figs S1 and S2). Their individual signals are comparable to those in available datasets (CRC, ITS). Short backbone branches were also found in previous reconstructions based on ITS sequences.
Despite the generally weak individual signals, the concatenated data converge towards a consensus topology, which in addition shows a distinct ingroup (genus Quercus)-outgroup (Notholithocarpus and Castanea) split (Fig. 4). Congruence of patterns across genes is likely to result from evolutionary processes that impact the whole genome, and therefore would support the rapid radiation hypothesis. According to the best-fit chronogram, the first divergence within Quercus took place directly after the putative establishment of the genus close to the Paleocene–Eocene boundary; the first radiations within the 'New World Clade' and 'Old World Clade' may have been finished already by the mid-Eocene (Fig. 5, Table 4). The Oligocene (34–23 Ma) has been suggested as the time when proliferation of oaks (and other Fagaceae (Denk & Grimm, 2009; Denk et al., 2012) and Nothofagaceae (Premoli et al., 2010)) occurred, with the modern diversity already established by the Miocene (Axelrod, 1983) (Table 4; Fig. 5). Based on sequence variation in non-coding plastid DNA, Manos & Stanford (2001) concluded that Old World and New World members of Group Quercus split at around 17 Ma (Burdigalian, late early Miocene), which appears too old regarding the dating based on the 8-gene nuclear gene matrix (Fig. 5). This discrepancy can be easily explained by phases of unhindered gene flow via the North-Atlantic Land Bridge in Group Quercus until at least 8 Ma (Denk et al., 2010) and Beringia until the latest Pliocene or Pleistocene interglacials. In contrast to plastids, nuclear genomes are biparentally inherited; hence, genetic signatures are propagated via pollen (partly long-distance dispersed in the wind-pollinated oaks) and seeds (short-distance dispersed by jays and squirrels).
There are no apparent geographic barriers to gene flow in modern Lobatae (Moran et al., 2012), and also in the case of Group Cerris the range may have been fragmented quite recently (analogous to the situation in Fagus; Denk & Grimm, 2009; T. Denk, pers. comm. 2014). The late and rapid radiation of modern lineages of Fagaceae in general, and possibly of some infrageneric lineages of Quercus (Groups Cerris, Lobatae, Quercus), may be linked to the long persistence of extinct and partly widespread fagaceous lineages (Smiley & Huggins, 1981, Pseudofagus; Crepet & Nixon, various organ taxa; Denk et al., 2012, Eotrigonobalanus, Trigonobalanopsis).