figshare
dissertation.pdf (1.69 MB)

Multi-Process Statistical Modeling of Species’ Joint Distributions

thesis
posted on 2016-03-14, 17:19 authored by David Harris

As a discipline, community ecology emphasizes a cluster of related questions: what processes cause some species to co-occur but not others? How can we accurately but economically describe the structure of an assemblage with many species? What kinds of assemblages are possible, and under what conditions? Ecologists have developed a broad spectrum of largely unrelated techniques for addressing different aspects of these closely related questions, from modeling the multivariate geometry of the data with ordination techniques, to testing various metrics of the observed data against their null distributions, to fitting “stacks” of independent regression models that describe each species’ occurrence probabilities under different environmental conditions. Each of these approaches relies on a different set of potentially incompatible assumptions, and the conclusions drawn from one approach can be difficult to reconcile with those drawn from another.


In this dissertation, I propose a more unified approach, based on estimating the joint probability distribution across all the species. From this perspective, the objective is to determine how likely a given combination of species is to exist in nature, and under what conditions it could occur. To the extent that the model structure includes the important ecological processes, such as environmental filtering and species interactions, the roles of these processes can be inferred from the model’s coefficients. When these different ecological forces are all included in the same model, ecologists can draw clearer conclusions about their relative importance than would be possible if each effect were tested separately with independent models that make potentially incompatible assumptions.


The models presented here take two complementary approaches toward this objective. Chapter 1 approaches the problem from the perspective of species distribution models (SDMs). Ordinarily, SDMs only produce occurrence probabilities for individual species (rather than whole assemblages). For this reason, a “stack” of such models, which simply identifies sets of climatically compatible species, can yield systematic errors at the assemblage level. To address this problem, Chapter 1 introduces a stochastic neural network model that estimates assemblage-level patterns using a combination of observed and latent environmental variables. The latent variables, whose true values are not measured, enable the model to describe a mixture of different outcomes and to concentrate its probability mass on assemblages with realistic co-occurrence patterns.
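The assemblage-level error from stacking can be illustrated with a toy simulation. This is a minimal sketch, assuming a single unmeasured (latent) environmental variable that drives every species; it uses a simple latent-variable logistic model rather than the stochastic neural network of Chapter 1, and all names and parameter values are hypothetical. Species co-occur because they share the latent driver; a stack that knows only each species' prevalence reproduces mean species richness but understates its variance, whereas drawing one shared latent value per site restores it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_species = 20_000, 10

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical latent environmental variable: it affects every species but is never measured.
z = rng.normal(size=n_sites)

# Hypothetical species responses: random intercepts and a shared positive slope on z.
intercepts = rng.normal(-0.5, 0.5, size=n_species)
slopes = np.full(n_species, 2.0)

# Simulated "true" assemblages (sites x species); species co-occur via the shared z.
y = rng.random((n_sites, n_species)) < logistic(intercepts + np.outer(z, slopes))

# Stack of independent models: with no observed covariates in this toy setup,
# each species' best independent prediction is its marginal prevalence.
prevalence = y.mean(axis=0)
y_stacked = rng.random(y.shape) < prevalence

# Latent-variable model: draw one new z per site and share it across all species.
z_new = rng.normal(size=n_sites)
y_latent = rng.random(y.shape) < logistic(intercepts + np.outer(z_new, slopes))

richness_true = y.sum(axis=1)
richness_stacked = y_stacked.sum(axis=1)
richness_latent = y_latent.sum(axis=1)

print("mean richness (true, stacked, latent):",
      richness_true.mean(), richness_stacked.mean(), richness_latent.mean())
print("richness variance (true, stacked, latent):",
      richness_true.var(), richness_stacked.var(), richness_latent.var())
```

In this setup the stacked predictions match the mean richness but have a much smaller variance, because independent draws cannot represent the positive covariance that the shared latent variable induces. The latent model here reuses the true parameters; a fitted model would have to estimate them from data.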


In Chapters 2 and 3, I address the more difficult question of how to detect direct interactions among pairs of species (such as competition and mutualism) using co-occurrence data. Detecting these interactions can be difficult, partly because the number of potential interactions grows quadratically with the number of species (n species form n(n-1)/2 pairs), and especially because it is important to distinguish the effects of direct pairwise species interactions from other factors, such as indirect interactions or the effects of abiotic environmental variables. Here, I show that assemblage-level co-occurrence patterns can be modeled using a set of direct interactions in a Markov network (an undirected graphical model also known as a Markov random field). Chapter 2 introduces this modeling approach and shows that it outperforms existing techniques in distinguishing direct interactions from indirect ones. Chapter 3 introduces an approximation to the model’s likelihood gradient, which enables ecologists to fit these models to much larger assemblages and to partition species’ co-occurrence patterns into the component driven by biotic interactions and the component driven by abiotic environmental filtering.
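The distinction between direct and indirect interactions can be sketched with a three-species binary Markov network. This minimal example assumes the common Ising-type parameterization p(y) ∝ exp(Σ α_i y_i + Σ_{i<j} β_ij y_i y_j) for presence/absence vectors y, with invented parameter values: species A and C have no direct interaction (β_AC = 0), yet they co-occur more often than independence predicts because both interact with B. Exact enumeration works only for small n, since there are 2^n possible assemblages. The log-likelihood gradient for β_ij is the observed mean of y_i y_j minus its expectation under the model; the block estimates that expectation with Gibbs sampling, a generic Monte Carlo approximation that is not necessarily the one developed in Chapter 3.

```python
import itertools
import numpy as np

# Hypothetical parameters for species A, B, C (indices 0, 1, 2).
alpha = np.array([-1.0, -1.0, -1.0])           # per-species log-odds terms
beta = np.array([[0.0, 2.0, 0.0],              # symmetric pairwise interactions:
                 [2.0, 0.0, 2.0],              # A-B and B-C interact directly,
                 [0.0, 2.0, 0.0]])             # A-C has no direct term (beta[0, 2] == 0)

def log_potential(y):
    """Unnormalized log-probability of one presence/absence vector y."""
    return alpha @ y + 0.5 * y @ beta @ y      # 0.5 because beta counts each pair twice

# Exact enumeration over all 2**n assemblages (feasible only for small n).
states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
weights = np.exp([log_potential(y) for y in states])
probs = weights / weights.sum()

def marginal(i):
    return probs @ states[:, i]

def pair(i, j):
    return probs @ (states[:, i] * states[:, j])

# A and C co-occur more often than independence predicts, despite beta[0, 2] == 0.
print("P(A and C):", pair(0, 2), " P(A)P(C):", marginal(0) * marginal(2))

# Conditional on B, A and C are independent: their association is indirect.
for b in (0.0, 1.0):
    sub = probs * (states[:, 1] == b)
    sub = sub / sub.sum()
    p_a, p_c = sub @ states[:, 0], sub @ states[:, 2]
    p_ac = sub @ (states[:, 0] * states[:, 2])
    print(f"B={int(b)}: P(A,C|B)={p_ac:.4f}  P(A|B)P(C|B)={p_a * p_c:.4f}")

rng = np.random.default_rng(1)

def gibbs_samples(n_samples, burn_in=500):
    """Draw approximate samples from the network by Gibbs sampling."""
    y = np.zeros(3)
    out = []
    for t in range(burn_in + n_samples):
        for i in range(3):
            # Conditional log-odds of species i given the others (beta[i, i] == 0).
            logit = alpha[i] + beta[i] @ y
            y[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
        if t >= burn_in:
            out.append(y.copy())
    return np.array(out)

# Expected statistic E[y_A * y_B]: exact value vs. Monte Carlo estimate.
samples = gibbs_samples(20_000)
exact_ab = pair(0, 1)
approx_ab = np.mean(samples[:, 0] * samples[:, 1])

# Gradient for beta[0, 1] = observed mean of y_A * y_B minus its model expectation.
# Y is a hypothetical data set drawn from the exact distribution, so the gradient is near 0.
Y = states[rng.choice(len(states), size=5_000, p=probs)]
grad_ab = np.mean(Y[:, 0] * Y[:, 1]) - approx_ab
print("E[yA*yB] exact vs Gibbs:", exact_ab, approx_ab, " gradient:", grad_ab)
```

Gibbs sampling needs only each species' conditional log-odds, so its cost per sweep grows with the number of species rather than with the 2^n assemblages that exact enumeration must visit.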
