Ecological Archives E084-081-A1

Perry de Valpine. 2003. Better inferences from population-dynamics experiments using Monte Carlo state-space likelihood methods. Ecology 84:3064–3077.

Appendix A. Markov chain Monte Carlo (MCMC) algorithms.

This appendix discusses the MCMC algorithms used to sample from the required distributions for the MCKL method, the kernel density estimator for the MCKL method, and the calculation of final likelihood values after MCKL estimation of the maximum-likelihood estimate (MLE). First I describe an MCMC algorithm for the process noises of a single replicate (conditional on the data and parameters), and then I describe the algorithm for the joint distribution of parameters and process noises. For each I describe example Metropolis-Hastings sampling steps in detail and then list the full combination of sampling steps used.

To set the basic concepts (Gilks et al. 1996, Robert and Casella 1999), consider a Metropolis-Hastings MCMC algorithm to sample from a target distribution π(w). Let w and w* be the current and next values of the chain. MCMC works by considering a random proposal value, w′, and using an acceptance probability α to decide whether w* = w′ or w* = w. Specifically, if the proposal density is q(w′ | w), the acceptance probability is

α = min{1, [π(w′) q(w | w′)] / [π(w) q(w′ | w)]}.   (A.1)

Metropolis-Hastings algorithms for different dimensions or blocks of dimensions of w can be used iteratively or randomly to produce a chain with stationary distribution π(w).
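To make the accept/reject mechanics of Eq. A.1 concrete, here is a minimal random-walk Metropolis sampler. This is a sketch in Python, not the model of the text: the target density, step size, and chain length are all illustrative.

```python
import math
import random

def metropolis_hastings(log_target, x0, step=1.0, n=20000, seed=0):
    """Random-walk Metropolis sampler (minimal sketch).

    log_target -- log of the (possibly unnormalized) target density pi(w)
    step       -- standard deviation of the symmetric normal proposal
    Because the proposal is symmetric, q(w'|w) = q(w|w'), the Hastings
    ratio in Eq. A.1 reduces to pi(w')/pi(w).
    """
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n):
        x_prop = x + rng.gauss(0.0, step)        # proposal w'
        log_alpha = log_target(x_prop) - log_target(x)
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = x_prop                           # accept: chain moves to w'
        chain.append(x)                          # else it stays at w
    return chain

# Sanity check: with target pi(w) proportional to exp(-(w - 2)^2 / 2),
# i.e. a Normal(2, 1) density, the chain mean should approach 2.
chain = metropolis_hastings(lambda w: -0.5 * (w - 2.0) ** 2, x0=0.0)
mean = sum(chain) / len(chain)
```

Working with log densities, as here, avoids numerical underflow when the target is a product of many likelihood terms.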

MCMC Algorithm 1

For sampling the process noises of a single replicate (omitting the replicate subscript), write the components of the noise vector w by time step, with three dimensions per time step. To sample one such component, draw a proposal from a normal proposal distribution centered on the current value, with covariance proportional to the 3 × 3 identity matrix. Let w′ be the full noise vector with this proposal value substituted. Then the acceptance probability is

α = min{1, f(y, w′ | θ) / f(y, w | θ)}.   (A.2)

In this case, the ratio of "backward" to "forward" proposals, q(w | w′)/q(w′ | w), from Eq. A.1 is always 1 because the normal proposal is symmetric, and it has been omitted. The probabilities of the unchanged noise components are the same for w and w′, so they also cancel. To complete this sampling step, a uniform random variable on [0, 1] is drawn, and w′ is accepted if the uniform draw is less than the acceptance probability. Otherwise w′ is rejected and the next value in the Markov chain is the same as the previous value.

To obtain sampling steps for a positive quantity (denote it σ), I used a lognormal proposal distribution with the log of the proposal centered on the log of the current value:

q(σ′ | σ) = [1 / (σ′ τ √(2π))] exp{−(log σ′ − log σ)² / (2τ²)}   (A.3)

where τ is the proposal standard deviation used for this step. The acceptance probability is calculated similarly to Eq. A.2, with σ′ and σ in place of w′ and w, but in this case the proposal distribution is not symmetric, so the ratio of "backward" to "forward" proposals must be included:

α = min{1, [π(σ′) q(σ | σ′)] / [π(σ) q(σ′ | σ)]} = min{1, [π(σ′) σ′] / [π(σ) σ]},   (A.4)

since the ratio q(σ | σ′)/q(σ′ | σ) reduces to σ′/σ for this lognormal proposal.

A more general way to think about this type of sampling step is that we can transform coordinates to work with the log of the quantity instead of the quantity itself, use a symmetric proposal distribution in those coordinates, and then account for the transformation back to the original coordinates when we calculate the acceptance probability, which yields the same result. This approach will be used for the next MCMC algorithm.
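The lognormal step above can be sketched as a random walk on the log scale with the σ′/σ correction of Eq. A.4 included in the acceptance probability. This is an illustrative Python sketch: the target density and the tuning value `tau` are assumptions, not values from the text.

```python
import math
import random

def log_scale_step(log_target, sigma, tau=0.5, rng=random):
    """One Metropolis-Hastings step for a positive quantity, proposed on
    the log scale (sketch; `tau` is an assumed tuning value).

    Proposing log(sigma') ~ Normal(log(sigma), tau^2) is equivalent to a
    lognormal proposal in the original coordinates, so the backward-to-
    forward proposal ratio q(sigma | sigma') / q(sigma' | sigma) equals
    sigma'/sigma, which enters the acceptance probability (cf. Eq. A.4).
    """
    sigma_prop = math.exp(math.log(sigma) + rng.gauss(0.0, tau))
    log_alpha = (log_target(sigma_prop) - log_target(sigma)
                 + math.log(sigma_prop) - math.log(sigma))
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return sigma_prop
    return sigma

# Sanity check: with target density exp(-sigma) on sigma > 0, i.e. an
# Exponential(1) with mean 1, the chain mean should approach 1.
rng = random.Random(1)
s, total, n = 1.0, 0.0, 30000
for _ in range(n):
    s = log_scale_step(lambda x: -x, s, rng=rng)
    total += s
mean = total / n
```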

Sampling steps for different components of the noise vector can be combined in many ways. One useful approach is to sample several adjacent values together. For example, to sample five adjacent noise values, we could add the same normally distributed shift (with mean 0) to each of them, i.e., use the same normal shift for several values at once. We then calculate the acceptance probability in the same way as Eq. A.2.
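The blocked step just described can be sketched as follows (Python; the block length and the tuning value `scale` are illustrative assumptions):

```python
import random

def block_shift_proposal(w, start, length=5, scale=0.1, rng=random):
    """Propose the same normal shift for several adjacent noise values.

    Sketch of the blocked step described above; `scale` is an assumed
    tuning value. The shared shift is symmetric, so the acceptance
    probability is computed as in Eq. A.2 with no Hastings correction.
    """
    eps = rng.gauss(0.0, scale)
    w_prop = list(w)
    for t in range(start, start + length):
        w_prop[t] += eps
    return w_prop

# A single shared shift moves exactly five adjacent entries.
rng = random.Random(5)
w = [0.0] * 20
w_prop = block_shift_proposal(w, start=3, rng=rng)
changed = [i for i in range(len(w)) if w_prop[i] != w[i]]
```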

One iteration of the full sampler consisted of samples from blocks of 10 values, blocks of 4 values, individual samples from each of the 20 values, and 5 samples from the positive scalar quantity. In the MCMC literature, a sampling algorithm that efficiently moves around the target distribution is termed "well mixed," and in the model here mixing turns out to be especially important, which is why 5 sampling steps for that quantity were used in each iteration.

MCMC Algorithm 2

Metropolis-Hastings sampling in the MCKL algorithm for the joint distribution of parameters and process noises was considerably more complicated because there were strong correlations among the parameter dimensions and between the parameter dimensions and the process noise dimensions. MCMC algorithms that sample in one direction at a time can be inefficient if the sampling directions do not match the directions of correlation in the target distribution. An example of a correlation among parameter dimensions is that fecundity was negatively correlated with survival, because higher fecundity and lower survival can together produce population trajectories, and thus likelihoods, similar to those of lower fecundity and higher survival. An example of a correlation between parameter dimensions and process noise dimensions is that different process noises were more likely (given the data) with higher fecundity and lower survival than with lower fecundity and higher survival.

To obtain good mixing, I used a more complex set of Metropolis-Hastings steps than in MCMC Algorithm 1. A first change was that for some of the parameters I used a parameterization different from that of the text. Second, working from this parameterization, I combined a number of sampling steps that move in directions of parameters and/or states motivated by biological interpretations of the model. An example to be explained in detail is sampling a survival parameter and the fecundities together. This makes sense because a shift in survival changes the survival values for each replicate, so a matching shift in fecundity values can produce more likely state trajectories. A biologically motivated way to do this is to draw a random proposal value of the survival parameter and then choose fecundity values that hold constant the number of eggs that would mature to juveniles in the mean environment. Given a proposal value of survival, this quantity can be kept constant by an appropriate choice of each proposed fecundity value. However, this adjustment affects the Metropolis-Hastings algorithm and must be accounted for as follows:

View this adjustment as a coordinate transformation, u = g(θ). The proposal probability in the transformed coordinates involves only the survival dimension. The proposal probability in the original coordinates is given by standard probability theory as:

q(θ′ | θ) = q̃(g(θ′) | g(θ)) |det J(θ′)|   (A.5)

where J(θ′) is the Jacobian matrix of derivatives of g with respect to each of its arguments, and |det J| is the absolute value of its determinant. In this example, the Jacobian has nonzero values only on the diagonal and in the left-most column. The left-most column elements drop out because the determinant of a matrix with this structure reduces to the product of the diagonal elements.

Now the acceptance probability is:

α = min{1, [π(θ′) q̃(g(θ) | g(θ′)) |det J(θ)|] / [π(θ) q̃(g(θ′) | g(θ)) |det J(θ′)|]}   (A.6)

where as before a prime indicates proposal values and no prime indicates current values. For the proposal distribution in the transformed coordinates, I used a "reflected normal" proposal, obtained by adding a normal random variable to the current survival value and "reflecting" about 0 or 1, since survival must lie between 0 and 1. "Reflecting" means that, for example, if the normal random variable is 0.07 and the nominal proposal would land above 1, say at 1.03, we would instead use 0.97, the same distance below 1. (In theory one might need multiple reflections, but in practice that did not arise for the proposals used here.) With reflections the proposal is still symmetric, so the backward and forward proposal densities cancel.
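The reflection rule can be sketched directly (Python; the current value 0.96 and the proposal scale are illustrative assumptions):

```python
import random

def reflect(x):
    """Fold a proposed value back into [0, 1] about the boundaries."""
    while x < 0.0 or x > 1.0:
        x = 2.0 - x if x > 1.0 else -x
    return x

def reflected_normal_proposal(s, scale=0.07, rng=random):
    """Reflected-normal proposal for a parameter constrained to [0, 1].

    Sketch; `scale` is an assumed tuning value. Reflection preserves
    symmetry, q(s | s') = q(s' | s), so no Hastings correction is needed.
    """
    return reflect(s + rng.gauss(0.0, scale))

# A nominal proposal of 1.03 reflects about 1 down to 0.97, and every
# reflected draw stays inside [0, 1].
rng = random.Random(2)
draws = [reflected_normal_proposal(0.96, rng=rng) for _ in range(1000)]
```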

With this general approach to constructing Metropolis-Hastings steps in temporarily transformed coordinate spaces, with the transformation accounted for by a Jacobian rule like Eq. A.5, one can easily use steps that take advantage of biological interpretations of the model. Of course there are simpler approaches to obtain valid MCMC samplers, but the steps used here greatly improve efficiency over sampling one dimension at a time. The Metropolis-Hastings steps I used for a full sampling iteration were: each of , , and in log coordinates; with each held constant by changing ; with each held constant; and (both randomly sampled) with each held constant; and (both randomly sampled with in log coordinates) with each held constant; with each (lifetime reproductive success) held constant; in coordinates; in coordinates; holding and each constant; in log coordinates; in log coordinates; and all process noises sampled with the same combination of steps described above for MCMC Algorithm 1.
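The determinant shortcut used after Eq. A.5 — a Jacobian that is nonzero only on the diagonal and in the left-most column is lower triangular, so its determinant is just the product of the diagonal entries — can be checked numerically. A small Python sketch with illustrative values:

```python
def det(m):
    """Determinant by cofactor expansion along the first row
    (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        if m[0][j] == 0.0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1.0) ** j * m[0][j] * det(minor)
    return total

# A 4 x 4 matrix (illustrative values) that is nonzero only on the
# diagonal and in the left-most column: it is lower triangular, so the
# first-column entries drop out of the determinant.
J = [
    [2.0,  0.0, 0.0,  0.0],
    [7.0,  3.0, 0.0,  0.0],
    [-1.5, 0.0, 0.5,  0.0],
    [4.0,  0.0, 0.0, -2.0],
]
d = det(J)
prod_diag = 2.0 * 3.0 * 0.5 * -2.0
```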

The state of the chain was saved after every 15 iterations until the desired sample of states had been obtained. (Implemented in GNU C++ on a 2.0 GHz Pentium processor, each run took approximately one hour.) Finally, the "priors" for the parameters were: an essentially flat prior (centered on 4 but providing no real prior information) for one parameter; exponential distributions with mean 100 for six parameters; and uniform distributions between 0 and 1 for the three parameters constrained to that interval.

Kernel Density Estimation

For Monte Carlo kernel likelihood (MCKL), samples were reparameterized into standardized principal components before maximization. This linear transformation gives approximately independent, similarly scaled coordinates. In general, reparameterization of the samples after sampling but before kernel density maximization requires consideration of the Jacobian of the parameter transformation (as in Eq. A.5). However, the transformation to principal coordinates is linear, so the Jacobian is constant and MCKL maximization is unaffected.

The kernel was multivariate Gaussian and independent along the principal component axes, with a smoothing bandwidth in each direction chosen separately for the null and alternative hypotheses. These values were chosen so that, for a unit normal Gaussian likelihood (approximately the shape of the principal components sample) and the MC sample size used, there would be a high chance of obtaining an estimate with likelihood close to that of the true maximum, which for log-likelihood-based hypothesis testing is quite accurate.
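The kernel density maximization can be sketched as follows. This is an illustrative Python sketch, not the values above: the bandwidth, sample, and the choice to maximize over the sample points themselves are all assumptions for the example.

```python
import math
import random

def kde_log_density(point, sample, h):
    """Log of a product-Gaussian kernel density estimate with a common
    bandwidth h in each (already standardized) coordinate."""
    logs = []
    for s in sample:
        sq = sum((p - si) ** 2 for p, si in zip(point, s))
        logs.append(-0.5 * sq / (h * h))
    m = max(logs)
    # log-sum-exp for numerical stability; normalizing constants are
    # common to all evaluation points and can be dropped for maximization.
    return m + math.log(sum(math.exp(l - m) for l in logs))

def kde_argmax(sample, h):
    """Crude maximization sketch: the sample point with the highest
    kernel density estimate (assumes standardized coordinates)."""
    return max(sample, key=lambda p: kde_log_density(p, sample, h))

# Toy check: with 2-D standard-normal draws, the KDE maximizer should
# land near the origin, the mode of the underlying density.
rng = random.Random(3)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
best = kde_argmax(sample, h=0.5)
```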

Final Likelihood Values

The MCKL method gives a maximum likelihood known only up to an unknown normalizing constant. To estimate the actual likelihood at the MLE, I used MCMC Algorithm 1 to obtain a sample of process noises for each replicate, estimated the mean and covariance matrix of this sample, and used a multivariate normal distribution with that mean and covariance as an importance-sampling distribution to calculate the likelihood. This importance-sampling calculation is similar to Eq. 10 of the main text, with a different sampling distribution for each replicate. For MCMC Algorithm 1, process noises were saved after every five sampling iterations until a sample of 2000 was obtained to estimate the mean and covariance of the sampling distribution, and then a sample of 10,000 points from that distribution was used for the likelihood estimate.
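The importance-sampling calculation can be sketched in one dimension. This is an illustrative Python sketch under stated assumptions: the "joint density" here is a toy function with a known integral, and the normal sampling distribution stands in for the one fitted to the MCMC sample in the appendix.

```python
import math
import random

def importance_likelihood(log_joint, g_mean, g_sd, n=100000, seed=4):
    """Importance-sampling estimate of L = integral of f(y, w) dw
    (1-D sketch). The sampling distribution g is a normal; in the
    appendix it is fitted to an MCMC sample of the process noises.

        L ~= (1/n) * sum_i f(y, w_i) / g(w_i),   with w_i drawn from g.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(g_mean, g_sd)
        log_g = (-0.5 * ((w - g_mean) / g_sd) ** 2
                 - math.log(g_sd * math.sqrt(2.0 * math.pi)))
        total += math.exp(log_joint(w) - log_g)
    return total / n

# Toy check: f(y, w) = exp(-w^2 / 2), whose integral over w is
# sqrt(2*pi) ~= 2.5066; a wider-than-f normal keeps the weights bounded.
L_hat = importance_likelihood(lambda w: -0.5 * w * w, g_mean=0.0, g_sd=1.5)
```

Choosing a sampling distribution somewhat wider than the integrand, as here, keeps the importance weights bounded and the estimator's variance low.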

Literature cited

Gilks, W. R., S. Richardson, and D. J. Spiegelhalter, editors. 1996. Markov chain Monte Carlo in practice. Chapman and Hall, New York, New York, USA.

Robert, C. P., and G. Casella. 1999. Monte Carlo statistical methods. Springer, New York, New York, USA.


