How likely is speciation in neutral ecology ?

Patterns of biodiversity predicted by the neutral theory rely on a simple phenomenological model of speciation. To further investigate the effect of speciation on neutral biodiversity, we analyze a spatially-explicit neutral model based on population genetics. We define the metacommunity as a system of populations exchanging migrants and we use this framework to introduce speciation with little or no gene flow (allopatric and parapatric speciation). We find that with realistic mutation rates, our metacommunity model driven by neutral processes cannot support more than a few species. Adding natural selection in the population genetics of speciation increases the number of species in the metacommunity but the level of diversity found in Barro Colorado Island is difficult to reach.


Introduction
How patterns of biodiversity arise through ecological and evolutionary processes is a central question in modern ecology (Fussmann et al. 2007;Johnson and Stinchcombe 2007). According to Hubbell's neutral theory of biodiversity (NTB), patterns of biodiversity such as species-abundance distributions can be explained by the balance between speciation, dispersal, and random extinction (Hubbell 2001;). The neutral theory provides a good fit to species distribution curves (Hubbell 2001) and has been extended in several ways (Volkov et al. 2005;de Aguiar et al. 2009;Haegeman and Etienne 2009;Rosindell et al. 2010). The neutral theory is flexible enough to fit nearly any species abundance distribution , but, apart from species abundance distributions, it provides valid starting points and interesting null hypotheses for many problems in community ecology (Alonso et al. 2006;Leigh 2007).
While much has been said about the assumption of ecological equivalence (Abrams 2001;Purves and Turnbull 2010), much less attention has been given to the speciation mode (Etienne et al. 2007), which is sometimes seen as the theory's weakest point (Kopp 2010). In recent years, several variants of the NTB have explored different speciation models (Etienne et al. 2007;de Aguiar et al. 2009;Haegeman and Etienne 2009;Rosindell et al. 2010). However, nothing has been done to relate the theory to population genetics and known models of speciation, despite the fact that, as Etienne et al. (2007) noted, such a mechanistic model could eventually force us to reject neutrality. The neutral theory with point speciation has also been criticized for predicting too many rare species and too many young species (Ricklefs 2003) and for assuming a direct relationship between abundance and speciation (Etienne et al. 2007).
In this article, we introduce a neutral theory of biodiversity with a speciation model derived from population genetics. We emphasize the role of allopatric and parapatric speciation. Speciation modes are most often distinguished according to the level of gene flow between the diverging populations. Allopatric speciation occurs when the new species originates from a geographically isolated population. By contrast, sympatric speciation is often defined as speciation without geographical isolation, in short, when the diverging populations share the same location. Finally, parapatric speciation covers the middle ground between these two extremes (Gavrilets 2003).
In the original neutral theory's formulation, Hubbell presented two models of speciation, point speciation and random fission speciation (Hubbell 2001). Both are phenomenological individual-based models. In the case of point speciation, a newly recruited individual is selected at random and undergoes speciation. In the case of random fission, the whole species is divided in two at random. The random fission model is more realistic and does improve some predictions related to speciation, but the resulting species abundance curves do not fit data as well as the point speciation model (Etienne and Haegeman 2011). In both cases, the probability of speciation of a given species is directly proportional to abundance and independent of dispersal. Hubbell (2001) associates the point speciation model with sympatric speciation and the random fission model with allopatric speciation. Some rare forms of sympatric speciation are indeed similar to the point speciation model, namely, polyploid speciation, but most sympatric speciation events involve a population being divided in two by nongeographical factors (Coyne and Orr 2004). Also, because neither model takes gene flow into consideration, neither can distinguish sympatric and allopatric speciation events.
While theoretical models have shown sympatric speciation to be possible, empirical studies have uncovered very few solid cases (Bolnick and Fitzpatrick 2007), and much of the theory is controversial (Barton and Polechova 2005;Spencer and Feldman 2005). Despite the growing acceptance of sympatric speciation as a plausible cause of speciation, most speciation events are still thought to occur with limited gene flow (Coyne and Orr 2004;Gavrilets and Vose 2005;Bolnick and Fitzpatrick 2007;Fitzpatrick et al. 2008). Sympatric speciation is difficult to achieve for two reasons (Coyne and Orr 2004, p. 127). First, there is antagonism between selection and recombination. As selection pushes the populations in different directions, gene flow tends to break combinations that would be beneficial for one population but not the other, creating maladapted genotypes. Also, the diverging populations have to coexist before and after reproductive isolation. Allopatric and parapatric speciation events are thought to be more common, but modeling them requires some details about the spatial structure of the metacommunity. Ricklefs (2008) argued that allopatric speciation is the creative force in community ecology, and we choose to base our model on the most common forms of speciation despite the increased complexity of a spatially explicit framework. We find that with realistic parameters, metacommunities cannot support more than a few species when the genetics of speciation is assumed to be neutral. We also considered a simple alternative pseudoselection model by adding natural selection at the genetic level but keeping the ecological equivalence assumption at the individual level. This model shows that the rates of speciation typical of the NTB cannot be obtained without selection pushing mutations to fixation.

Model
We model speciation with the Bateson-Dobzhansky-Muller model (BDM) in which reproductive isolation is caused by the accumulation of incompatible alleles (Bateson 1909;Dobzhansky 1937;Muller 1942;Orr 1996;Orr and Turelli 2001). While the BDM model is simple, we have many empirical and theoretical reasons to think that speciation events often follow a similar scheme (Gavrilets 2003;Coyne and Orr 2004). We use a two-loci and two-alleles version of the model where reproduction is not modeled explicitly (Gavrilets 2004, p. 131). A species can be divided into several populations living in different communities, with each population having its own set of incompatible alleles at different loci. A population in community x starts with the a x b x haplotype fixed. The allele at the first locus, a x , mutates to A x , and the allele at the second locus, b x , mutates to B x . Both mutate at the same rate, m. We follow Gavrilets (2004) and ignore back mutations. Back mutations have been shown to slow down speciation in this model but not dramatically (Gavrilets 2004, p. 131). The path from a x b x to A x B x can be seen as a process with three states: ( 1 ) The haplotype a x B x is absent because of an incompatibility between a x and B x . Speciation occurs when all individuals in the population carry the A x B x haplotype. Migration brings new individuals, always with the a x b x haplotype, at rate m. To integrate Gavrilets's BDM model in a metacommunity, we connect local communities composed of populations of one or more species. Speciation is a complex process, but this simple model captures many important characteristics of speciation events that are ignored in the NTB. First, speciation takes time. It is the result of a long process whereby a population diverges from the rest of the species to the point where reproductive isolation prevents them from producing fertile progenies (Coyne and Orr 2004). Second, with a few exceptions, the starting population size of the new species is likely to be higher than 1 (Gavrilets 2004;Rosindell et al. 2010). Third, gene flow (migration) has a strong homogenizing effect that will inhibit speciation (Coyne and Orr 2004;Fitzpatrick et al. 2008). Finally, speciation occurs as a population of a given species diverges, most often in well-defined geographic areas (Avise 2000;Coyne and Orr 2004). None of these characteristics are present in the original neutral theory (Hubbell 2001), although protracted speciation partially solves the first two problems by adding a parameter to account for the duration of speciation (Rosindell et al. 2010). Some of these problems have been solved within a population genetics framework with assortative mating (de Aguiar et al. 2009;Melián et al. 2010). In particular, de Aguiar et al.'s model covers the aspects mentioned above using a spatially explicit framework with positive assortative mating (de Aguiar et al. 2009). The model developed by de Aguiar et al. (2009) leads to distinct species without the need for physical barriers, in contrast to our approach, which is based on explicit physical barriers and limits to gene flow. It is difficult to distinguish populations in individualbased models. Because speciation is the result of divergences between populations, it is hard to model unless individuals are grouped into populations (Coyne and Orr 2004). In the NTB and most of its variants, only two levels Figure 1: Illustration of the dynamics of a metacommunity with three local communities (numbered from 1 to 3) of 10 individuals. Colors and shapes are used to distinguish species and haplotypes, respectively. a, At each time step, an individual is selected in each community. All individuals have the same probability 1/J x of being selected, with J x being the size of the local community. b, The individuals selected are then replaced either by migration (with probability m) or by local replacement (with probability ). In com-1 Ϫ m munity 1, the individual is replaced by a local replacement event. A blue individual is chosen with , and then the a 1 b 1 hap- the fitness of the various haplotypes. In community 2, a blue individual with haplotype a 2 b 2 is selected and mutates to A 2 b 2 (probability m). The individual in community 3 is replaced by a migrant. A blue individual is selected in community 2 with probability 0.8. While this individual carries A 2 b 2 , we assume different mutations are required in each community to achieve speciation. Thus, migrants carry no mutations at the focal loci for the population into which they move and the A 2 b 2 haplotype is irrelevant in community 3. c, At the end of the time step, speciation occurs if A x B x is fixed. Because all green individuals in community 1 carry A 1 B 1 , they speciate and are now represented by red triangles (a 1 b 1 ).
of organization are recognized: the individual and the species. To integrate Gavrilets's model in the NTB, we model populations in patches using graphs. Several approaches have been used to model the spatial structure of populations and local communities. Some are spatially explicit at the level of the individual. In these models, the location of each individual is known, generally by using a grid (Rosindell and Cornell 2007) or a graph (Lieberman et al. 2005). Another approach is to consider the position of populations but ignore the exact position of the individuals within the populations. Again, this method has been used with grids (Gavrilets and Vose 2005) and graphs (Minor and Urban 2007;Economo and Keitt 2008;Dale and Fortin 2010). We use the latter approach and model the metacommunity as a graph of n local communities (hereafter simply referred to as communities), where each community x can support a total of J x individuals. These communities, composed of one or more species, are connected by dispersal (Economo andKeitt 2008, 2010; fig. 1). This spatial representation allows us to distinguish three levels of organization: species, populations, and individuals. A population is simply the sum of all the individuals of a given species in a given community. A species can thus be divided into n populations. Dispersal between two communities will always be low enough to assume that the individuals in these two communities can be defined as distinct populations (Berryman 2002). All individuals have a haplotype (a x b x , A x b x , or A x B x ), and we follow explicitly their dynamics in each community. Because these haplotypes represent a path toward speciation in a particular community, they should be seen as different pairs of loci for each population. For example, if an individual migrates from community 1 to community 2, it will carry the a 2 b 2 haplotype in its new community, regardless of its haplotype in community 1. This assumption is not realistic in all situations because both mutational order and ecological speciation are known to be influenced by complex interactions between the diverging populations (Mani and Clarke 1990;Coyne and Orr 2004;Gavrilets 2004;Schluter 2009;Nosil and Flaxman 2011). Integrating the effect of these divergences would require many more assumptions about the nature of speciation and in most cases cannot be done without introducing the concept of niche (Schluter 2000). We ignore much of the details of speciation in favor of a simple model that captures many of the most fundamental characteristics of speciation as a population process (Gavrilets 2004). Because there is no niche differentiation, new mutations toward speciation are always allowed to appear regardless of the ecological context. As soon as all the individuals of a given species inside a local community x carry the haplotype A x B x , they undergo speciation ( fig. 1). An alternative approach would be to allow A x B x individuals to undergo speciation even in the presence of a x b x if there are no A i b i individuals present. However, this model would fail to account for the homogenizing effect of gene flow and would almost always lead to sympatric speciation. Because we want to model allopatric and parapatric speciation, we follow Gavrilets (2004) and allow speciation only when A x B x is fixed.
Metacommunity dynamics is similar to Hubbell's neutral model of biodiversity (Hubbell 2001) and the Moran model in population genetics (Moran 1962;Ewens 2004). It can be described in three steps ( fig. 1). (1) For each time step, an individual is selected in each community. All individuals have the same probability 1/J x of being selected, with J x being the size of the community ( fig. 1a). (2) The individuals selected in step 1 are replaced either by migration or by local replacement (fig. 1b). The probability of migration from x to y is given by the matrix m. In the case of migration from x to y ( ), the new individual x ( y will belong to species i with probability N ix /J x , with N ix being the population size of species i in community x. We assume that migrants carry no mutations at the focal loci for the population into which they move so the haplotype is ignored and the new individual will carry a y b y . In the case of local replacement events, the new individual will also belong to species i with probability N ix /J x . However, the fitness of the haplotypes is used to determine the new individual's haplotype. In the neutral model, we can simply select the species and haplotype using relative abundance. One of the basic tenets of the NTB is ecological equivalence, so to introduce selection within the framework of neutral ecology, the probability to pick an individual from one species has to ignore the internal genetic composition. In this pseudoselection model, we use a multiplicative fitness regime (Charlesworth and Charlesworth 2011, p. 166), leading to , , and w p 1 w p 1 ϩ s w p , where w denotes fitness and s is the selection 2 (1 ϩ s) coefficient. The neutral model is the special case . s p 0 In reality, if a population has many individuals with haplotypes A x b x and A x B x , it should have an advantage over a population with only a x b x individuals, but this would break the ecological equivalence assumption of neutral ecology, so we ignore it. After the haplotype is selected, a x b x mutates to A x b x and A x b x to A x B x at rate m. (3) In the last step, all populations with A x B x fixed undergo speciation ( fig. 1c). The individuals of the new species will carry a x b x , and a new path toward speciation is possible. This is similar to the infinite sites approach of population genetics (Crow and Kimura 1970) and is a direct consequence of neutrality.
We consider four different metacommunity shapes: circle, complete, star, and random. In the circle each community is linked by migration to its two neighboring communities. In the complete metacommunity, each community is linked to all the others. In the star, a single central community forms a link to all outer communities, which have no other links. The random graph is an assemblage of communities based on random geometric graphs (Penrose 2003). These random graphs are used to test algorithms designed for spatial structures such as maps (Sedgewick 2002). The migration matrix is built with a single parameter q, which represents the strength of the links between communities. The migration probability be-tween two linked communities x and y is found by dividing q by the sum of all links to community x plus 1 (for local replacement events). This method ensures that all rows in the migration matrix sum to 1 and that communities with more links are subjected to stronger migration. The probability that an individual selected in community x will be replaced by migration is x 1 ϩ cq with c being the number of communities linked to x. The 1 in the denominator stands for the weight given to local replacement events, and q is always much smaller than 1 so the average migration probability is approximately cq. Circle communities all have ; for communities in c p 2 the complete metacommunity, ; and for stars c p n Ϫ 1 we have , except for the central community, where c p 1 . The average number of links for the random c p n Ϫ 1 graphs depends on n but varies little for . With n ! 30 , the random graphs have on average 2.56 links. n p 10 We explored the model by simulations using an implementation in ANSI C99 (the code is available at https:// github.com/PhDP/origin). Each simulation starts with 20 species evenly distributed in the metacommunity. We compared simulations with communities of size to 2 J p 10 x 10 6 and found similar results. We thus use unless 4 J p 10 x otherwise noted. The mutation rate m for eukaryotes is generally between 10 Ϫ4 and 10 Ϫ6 (Drake et al. 1998;Gavrilets 2004), and we set m to the highest realistic value, . The simulations ran for 100,000 generations (a Ϫ4 m p 10 generation being J x time steps; Rosindell et al. 2010). We recorded the average local and regional species richness over the last 1,000 generations.

Results
We found that for our neutral metacommunity model with realistic parameter values, regardless of its size, shape, and dispersal rate, the local species richness never exceeds a few species. Communities are dominated by one species, sometimes with a few individuals from other species ( fig.  1). For all values of q, regional species diversity at equilibrium is equal to or less than the number of communities. Not surprisingly, we find that reducing the average migration rate increases the speciation rate, but it also increases the number of extinctions. For , the re-Ϫ5 q ! 10 gional species richness stabilizes at n, while for , Ϫ3 q 1 10 the entire metacommunity supports only a single species. We find a threshold migration rate around , Ϫ4 q p 10 where the regional species richness increases suddenly. Around this value, the number of species varies between 1 and n. When , the communities are so isolated Ϫ5 q ! 10 Figure 2: Average number of species in local communities at equilibrium in the pseudoselection model increases nonlinearly with selection. The number of species quickly increases between s p 0.00 and , in part because selection pushes the alleles toward s p 0.05 speciation but also because it reduces the fitness of migrants. We used random geometric graphs with and .
Ϫ4 q p 5 # 10 n p 10 Figure 3: Average local species abundance with selection from 0.05 to 0.35. Random geometric graphs were used with the most favorable set of parameters found ( and ). We set the Ϫ4 q p 5 # 10 n p 10 local community size J x to 22,000, and the species distributions are compared to a 50-ha plot in Barro Colorado Island, Panama (Condit et al. 2002). Selection increases local diversity but saturates quickly. Despite the favorable parameters and strong within-population selection, the level of diversity found in the Barro Colorado Island is reached in only less than 5% of the communities subjected to strong selection ( and ). s p 0.25 s p 0.35 that they are dominated by a single species, sometimes with a small number of individuals from one or two other species.
We studied the effect of increasing the mutation rate beyond realistic values. Keeping J x at 10 4 and q p 5 # , we ran simulations for several mutation rates. Even Ϫ4 10 a 10-fold increase in the mutation rate ( ) has Ϫ3 m p 10 little effect on the equilibrium regional species richness. The metacommunity sustains higher diversity around a mutation rate of . This mutation rate is well Ϫ2 m p 10 above the typical mutation rate (Drake et al. 1998;Gavrilets 2004). This finding lends credit to the theory that the NTB requires unrealistically high speciation rates (Ricklefs 2003).
Within-population selection has an important impact on species richness ( fig. 2). We explored the parameter space to find the values of q and n (the number of communities) where diversity is highest. For q, diversity peaks around , with little variation between the different Ϫ4 5 # 10 community shapes. Local diversity increases with n between 1 and 5 but quickly reaches a plateau. Except for the complete metacommunity and the star, increasing n beyond 10 has no effect on local diversity. Star, circle, and random metacommunities share a similar regional species abundance distribution, which is similar to a lognormal distribution with negative skewness (a long left tail). The complete metacommunity supports fewer species, espe-cially as n increases. There are fewer rare species than the regional distribution seen in the NTB, supporting the criticism of Ricklefs (2003), who argued that the NTB predicted too many rare species.
We illustrated the effect of selection on species abundance distribution with a comparison to the tropical forest in Barro Colorado Island, Panama (Condit et al. 2002). This plot of 50 ha contains 21,457 individuals and 225 species (Condit et al. 2002). To find the minimal amount of selection required in our model to reach this level of diversity, we use the most species-rich combination of parameters for random geometric graphs ( fig. 3) and then increase s to reach an average local species richness of 220 using communities of size . This point is never J p 22,000 x reached. Local diversity increases with s but saturates around ( fig. 3). Selection above has s p 0.15 s p 0.15 little effect on species richness but decreases the median life span from 380, with , to 240, with . s p 0.15 s p 0.35 Selection accelerates speciation events, so, as s increases, species will lose parts of their populations to daughter species at a faster pace, reducing the median life span. The median population size at speciation decreases slightly with selection but remains in the 550-600 range for 0.35 ≥ . This pattern is also related to the speciation rate. s ≥ 0.15 As s increases, newly established populations are more likely to undergo speciation before they can reach high abundances. A small number of communities supported a significantly higher diversity than average. Of the communities with and , 5% supported more s p 0.25 s p 0.35 than 200 species, with the most diverse local community supporting 233 species.

Discussion
In this study, we developed a framework to study speciation as a population process within neutral ecology. The speciation rate was not assumed to take any particular value; it is an emergent property of the system. It depends on selection, the mutation rate, migration, and the shape of the metacommunity. Also, we made no assumption about the relationship between abundance and the speciation rate. Species with more individuals are likely to occupy more communities, so they will have more opportunities to speciate, but the relationship will depend on the shape of the metacommunity and the spatial distribution of the populations. Our goal was to examine the relationship between neutral ecology and a more mechanistic model of speciation based on population genetics. Phenomenological models are not inherently inferior (Mc-Gill and Nekola 2010), but they should be compared to their mechanistic counterpart to determine whether they can provide a good approximation of reality, under what conditions this approximation can hold, and what kind of assumptions are required to make it hold (Kopp 2010). Our assumptions deliberately made speciation easy to achieve. We used the BDM model with only two steps required to reach speciation, and we ignored back mutations (Gavrilets 2004). The mutation rate chosen was plausible but high (Drake et al. 1998;Kumar and Subramanian 2002). There was always a mutation toward speciation available, arguably the most unrealistic assumption because the conditions for speciation are seldom common (Coyne and Orr 2004). All these assumptions greatly favor speciation, yet the model failed to produce metacommunities with many species unless selection is added or the mutation rate is set to impossible levels. The only element that could have a significant negative effect on speciation is the assumption that new migrants always carry the a x b x haplotype (Gavrilets 2004), although this assumption is supported by empirical evidence against speciation with gene flow (Coyne and Orr 2004). The level of diversity seen in the Barro Colorado Island data set is hard to reach with allopatric and parapatric speciation events within our neutral model, even with the addition of within-population selection. Yet recent studies have improved the speciation model within neutral ecology (Rosindell et al. 2010;Rosindell and Phillimore 2011) and it remains to be seen whether allopatric or parapatric speciation can be included explicitly in a neutral model with-out resulting in species-poor communities. This is an important challenge for the neutral theory given the importance of these modes of speciation. Adding temporal variation in q might increase diversity to more reasonable levels as allopatric speciation events often follow changes in the environment (Coyne and Orr 2004). Also, negative frequency dependence could improve the fit by allowing species to remain in local communities for more time (Volkov et al. 2005;Melián et al. 2010).
Speciation can be achieved easily if mutations toward speciation are given some positive selection coefficient. But new species, being the result of the accumulation of fitnessenhancing mutations, should have greater fitness, which would violate the NTB's ecological equivalence assumption. Zhou and Zhang's nearly neutral model showed that small differences between species lead to markedly different species distributions (Zhou and Zhang 2008). However, Du et al. (2011) argued that negative density dependence can offset the effect of competition and lead to neutral patterns. There is little doubt that selection plays an important role in speciation events (Coyne and Orr 2004), and few neutral models of speciation have been developed (Nei et al. 1983). Speciation by drift alone is simply too slow . A possible alternative is to model speciation with positive assortative mating, in which individuals mate more often with similar individuals (Felsenstein 1981;Coyne and Orr 2004). This has been attempted in two recent models (de Aguiar et al. 2009;Melián et al. 2010) based on Higgs and Derrida's (1992) work on neutral dynamics with assortative mating. Although de Aguiar et al. (2009) rely on positive assortative mating, which is generally associated with sympatric speciation (Coyne and Orr 2004, p. 130), their model does not assume a particular biogeography for speciation and covers both sympatric and parapatric speciation events. In their model speciation is also inhibited by gene flow (de Aguiar et al. 2009, fig. 1). Melián et al. (2010) reached the conclusion that frequency-dependent selection leads to more species than neutral models.
When comparing models, one aspect to consider is their complexity. Theoretical population genetics is based mostly on mathematical models that are simple enough to be analytically tractable, which has led to a tendency to ignore spatial complexity (Epperson et al. 2010). Because allopatric and parapatric speciation events rely on this spatial complexity, we have few theoretical models to study the effect of these forms of speciation on diversity. We chose to base our theory on the most common forms of speciation and introduced a simple method to model allopatric and parapatric speciation in complex spatial structures. While using graphs adds a layer of complexity to neutral ecology, our approach fixes some of the problems of the point speciation model without adding new parameters for speciation, as we replace the speciation rate v with the mutation rate m. More importantly, this approach allows us to divide a species into populations, a fundamental unit in evolution. Ricklefs (2003) argued that new species under the point speciation model would not be recognized as species, because those species have appeared instantaneously and are likely too similar. More importantly, one of the problems with a fixed speciation rate v is that speciation is directly influenced by ecological factors such as isolation and habitats. In particular, the inhibiting effect of gene flow on speciation is ignored in most community models with speciation (Hubbell 2001;Volkov et al. 2005;Etienne et al. 2007;Rosindell et al. 2010; but see Rosindell and Phillimore 2011), despite the fact that gene flow shapes patterns of speciation (Coyne and Orr 2004), which in turn has an important influence on the predictions (Etienne et al. 2007;Etienne and Haegeman 2011).