Appendix A. Markov chain Monte Carlo (MCMC) algorithms.
This appendix discusses the MCMC algorithms used to sample from the distributions required by the MCKL method, the kernel density estimator used for the MCKL method, and the calculation of final likelihood values after MCKL estimation of the maximum-likelihood estimate (MLE). First I describe an MCMC algorithm for sampling the process noises of a single replicate (MCMC Algorithm 1), and then I describe the algorithm for the joint distribution of parameters and process noises used by MCKL (MCMC Algorithm 2). For each I describe example Metropolis-Hastings sampling steps in detail and then list the full combination of sampling steps used.
To set the basic concepts (Gilks et al. 1996, Robert and Casella 1999), consider a Metropolis-Hastings MCMC algorithm to sample from a distribution $\pi(x)$. Let $x$ and $x^{*}$ be the current and next values of the chain. MCMC works by considering a random proposal value, $x'$, and using an acceptance probability to decide whether $x^{*} = x'$ or $x^{*} = x$. Specifically, if the proposal density is $q(x' \mid x)$, the acceptance probability is

$$\alpha = \min\left\{1, \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right\}. \quad (A.1)$$
MCMC Algorithm 1

For sampling from the distribution of process noises for a single replicate (omitting the replicate subscript), write the components of the noise vector as $\epsilon = (\epsilon_1, \ldots, \epsilon_T)$, where $\epsilon_t$ is the three-dimensional process noise for time $t$. To sample $\epsilon_t$, draw a proposal $\epsilon_t'$ from $N(\epsilon_t, \sigma^2 I_3)$, a normal proposal distribution with mean $\epsilon_t$ and covariance matrix $\sigma^2 I_3$, where $I_3$ is the 3 × 3 identity matrix and $\sigma$ is a fixed tuning constant. Let $\epsilon'$ be the noise vector with this proposal value, $\epsilon' = (\epsilon_1, \ldots, \epsilon_{t-1}, \epsilon_t', \epsilon_{t+1}, \ldots, \epsilon_T)$. Then the acceptance probability is

$$\alpha = \min\left\{1, \frac{P(\text{data}, \epsilon' \mid \theta)}{P(\text{data}, \epsilon \mid \theta)}\right\}, \quad (A.2)$$

where $\theta$ denotes the parameters, which are held fixed in this algorithm.
In this case, the ratio of "backward" to "forward" proposals, $q(\epsilon_t \mid \epsilon_t') / q(\epsilon_t' \mid \epsilon_t)$, from Eq. A.1 is always 1 because the normal proposal is symmetric, and it has been omitted. The probabilities of observations before time $t$ and of the other noise components are the same for $\epsilon$ and $\epsilon'$, so their probabilities also cancel from the ratio in Eq. A.2. To complete this sampling step, a uniform random variable is drawn, and $\epsilon_t'$ is accepted if the uniform draw is less than $\alpha$. Otherwise $\epsilon_t'$ is rejected and the next value in the Markov chain is the same as the previous value, $\epsilon$.
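As a concrete illustration, here is a minimal C++ sketch of this single-component Metropolis-Hastings update. It is a sketch under assumptions, not the code used in the analysis: log_target is a placeholder for the log of the unnormalized target density $P(\text{data}, \epsilon \mid \theta)$, and its body below simply treats the noises as independent standard normals so that the sketch is self-contained.

#include <array>
#include <cmath>
#include <random>
#include <vector>

// Placeholder for the log of the unnormalized target density P(data, eps | theta).
// Here it treats the noises as independent standard normals, for illustration only;
// the real model would compute the state trajectory and observation probabilities.
double log_target(const std::vector<std::array<double, 3>>& eps) {
    double lp = 0.0;
    for (const auto& e : eps)
        for (double v : e) lp -= 0.5 * v * v;
    return lp;
}

// One Metropolis-Hastings update of the noise component eps[t], using the symmetric
// normal random-walk proposal with covariance sigma^2 * I_3 described above (Eq. A.2).
// (In the real model only the factors from time t onward would need recomputation,
// as noted in the cancellation argument above.)
void update_noise_component(std::vector<std::array<double, 3>>& eps,
                            std::size_t t, double sigma, std::mt19937& rng) {
    std::normal_distribution<double> step(0.0, sigma);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    const double log_p_current = log_target(eps);
    const std::array<double, 3> saved = eps[t];

    for (double& component : eps[t]) component += step(rng);   // propose eps_t'

    const double log_alpha = log_target(eps) - log_p_current;  // symmetric proposal: q ratio is 1
    if (std::log(unif(rng)) >= log_alpha) eps[t] = saved;       // reject: keep the current value
}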
To obtain sampling steps for a positive-valued quantity $x$, I used a lognormal proposal distribution with the log of the proposal centered on the log of the current value,

$$\log x' \sim N(\log x, \sigma_x^2), \quad (A.3)$$

for which the proposal ratio in Eq. A.1 is $q(x \mid x') / q(x' \mid x) = x'/x$, so the acceptance probability is

$$\alpha = \min\left\{1, \frac{\pi(x')}{\pi(x)} \cdot \frac{x'}{x}\right\}. \quad (A.4)$$

A more general way to think about this type of sampling step is that we can transform coordinates to work with $\log x$ instead of $x$, use a proposal distribution in those coordinates, and then account for the transformation back to our original coordinates when we calculate the acceptance probability, which yields the same result. This approach will be used for the next MCMC algorithm.
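To see the equivalence concretely, here is a short C++ sketch of a lognormal random-walk update of a single positive quantity: the proposal is made in log coordinates and the extra factor $x'/x$ enters the acceptance ratio as in Eq. A.4. The function log_target_x is a hypothetical stand-in for the log of the unnormalized conditional density of $x$, not the model code used here.

#include <cmath>
#include <random>

// Hypothetical stand-in for the log unnormalized target density of x (x > 0);
// an exponential-shaped placeholder so the sketch is self-contained.
double log_target_x(double x) { return -x; }

// One lognormal random-walk update of a positive quantity x (Eqs. A.3 and A.4).
double update_positive(double x, double sigma_x, std::mt19937& rng) {
    std::normal_distribution<double> step(0.0, sigma_x);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    const double x_prop = x * std::exp(step(rng));              // log x' ~ N(log x, sigma_x^2)
    const double log_alpha = log_target_x(x_prop) - log_target_x(x)
                             + std::log(x_prop / x);            // proposal (Jacobian) factor x'/x

    return (std::log(unif(rng)) < log_alpha) ? x_prop : x;      // accept or keep current value
}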
Sampling steps for different components of the noise vector can be combined in many ways. One useful approach is to sample adjacent $\epsilon_t$ values together. For example, to sample five adjacent values, $\epsilon_t, \ldots, \epsilon_{t+4}$, we could propose $\epsilon_s' = \epsilon_s + \delta$ for $s = t, \ldots, t+4$, where $\delta$ is normally distributed with mean 0 and covariance matrix $\sigma^2 I_3$, i.e., use the same normal shift for several $\epsilon_t$'s. We then calculate the acceptance probability in the same way as Eq. A.2.
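A sketch of such a block update follows, in the same hypothetical setting as the single-component sketch above (it reuses that sketch's log_target); the block position, block length, and tuning constant are illustrative arguments, not the values used in the analysis.

#include <algorithm>
#include <array>
#include <cmath>
#include <random>
#include <vector>

double log_target(const std::vector<std::array<double, 3>>& eps);  // as defined in the sketch above

// One block update: add the same normal shift delta to several adjacent noise
// components and accept or reject the whole block with the ratio of Eq. A.2.
void update_noise_block(std::vector<std::array<double, 3>>& eps,
                        std::size_t first, std::size_t length,
                        double sigma, std::mt19937& rng) {
    std::normal_distribution<double> step(0.0, sigma);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    const double log_p_current = log_target(eps);
    const std::vector<std::array<double, 3>> saved(eps.begin() + first,
                                                   eps.begin() + first + length);

    std::array<double, 3> delta;
    for (double& d : delta) d = step(rng);                        // one shared shift
    for (std::size_t s = first; s < first + length; ++s)
        for (std::size_t i = 0; i < 3; ++i) eps[s][i] += delta[i];

    const double log_alpha = log_target(eps) - log_p_current;     // symmetric proposal
    if (std::log(unif(rng)) >= log_alpha)                         // reject: restore the block
        std::copy(saved.begin(), saved.end(), eps.begin() + first);
}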
One iteration of the full sampler for a single replicate consisted of samples from blocks of 10 adjacent $\epsilon_t$ values, blocks of 4 adjacent values, individual samples of each of the 20 $\epsilon_t$ values, and 5 samples of one further quantity. In the MCMC literature, a sampling algorithm that moves efficiently around the target distribution is termed "well mixed", and in the model here the mixing of that last quantity turns out to be especially important, which is why 5 sampling steps for it were used in each iteration.
MCMC Algorithm 2
Metropolis-Hastings sampling in the MCKL algorithm, which targets the joint distribution of parameters and process noises across all replicates, was considerably more complicated because there were strong
correlations among the parameter dimensions and between the parameter dimensions
and the process noise dimensions. MCMC algorithms that sample in one direction
at a time can be inefficient if the sampling directions do not match the directions
of correlation in the target distribution. An example of a correlation among
parameter dimensions is that the fecundity parameter was negatively correlated with the survival parameter,
because higher fecundity and lower survival can together
produce similar population trajectories, and thus similar likelihoods, as lower
fecundity and higher survival. An example of a correlation between parameter
dimensions and process noise dimensions is that different process noises were
more likely (given the data) with higher fecundity and lower survival than with
lower fecundity and higher survival.
To obtain good mixing, I used a
more complex set of Metropolis-Hastings steps than in MCMC Algorithm 1. A first change was that, for three of the parameters, I used a parameterization different from that of the main text. Second, working from this parameterization, I combined a number of sampling steps that move in directions of parameters and/or states that are motivated by biological interpretations of the model. An example to be explained in detail is sampling the survival parameter and the replicate fecundity values together. This makes sense because a shift in the survival parameter changes the survival values for each replicate, so a matching shift in the fecundity values can produce more likely state trajectories. A biologically motivated way to do this is to draw a random proposal value of the survival parameter, $s$, and then choose fecundity values that hold constant the number of eggs that would mature to juveniles in the mean environment, $c_k = f_k\, m(s)$, where $f_k$ is the fecundity for replicate $k$ ($k = 1, \ldots, K$) and $m(s)$ is egg-to-juvenile survival in the mean environment. Given a proposal value $s'$, each $c_k$ can be kept constant by choosing the proposal value $f_k' = f_k\, m(s)/m(s')$ for each $k$. However, this adjustment affects the Metropolis-Hastings algorithm and must be accounted for as follows:
View this adjustment as a coordinate transformation, $T(s, f_1, \ldots, f_K) = (s, c_1, \ldots, c_K)$. The proposal probability in the transformed coordinates involves only the $s$ dimension. The proposal probability in the original coordinates is given by standard probability theory as

$$q(s', f_1', \ldots, f_K' \mid s, f_1, \ldots, f_K) = \tilde{q}(s' \mid s)\, \left|\det J(s', f_1', \ldots, f_K')\right|, \quad (A.5)$$

where $J(\cdot)$ is the Jacobian matrix of derivatives of $T$ with respect to each of its arguments, and $|\det J|$ is the determinant of the Jacobian. In this example, the Jacobian has non-zero values only on the diagonal and in the left-most column. The diagonal values are 1 for the $s$ dimension and $m(s)$ for each of the $K$ fecundity dimensions. The left-most column elements drop out because the determinant of this matrix structure reduces to the product of the diagonal elements, giving $|\det J(s, f_1, \ldots, f_K)| = m(s)^K$. Now the acceptance probability is

$$\alpha = \min\left\{1, \frac{\pi(s', f_1', \ldots, f_K')\; \tilde{q}(s \mid s')\; |\det J(s, f_1, \ldots, f_K)|}{\pi(s, f_1, \ldots, f_K)\; \tilde{q}(s' \mid s)\; |\det J(s', f_1', \ldots, f_K')|}\right\}, \quad (A.6)$$
where, as before, primes indicate proposal values and no primes indicate current values, and $\pi$ denotes the target density (dimensions that do not change cancel as before). For the proposal distribution $\tilde{q}(s' \mid s)$, I used a "reflected normal" proposal, which is obtained by adding a normal random variable to $s$ and "reflecting" about 0 or 1, since we must have $0 < s < 1$. "Reflecting" means that, for example, if $s = 0.95$ (say) and the normal random variable is 0.07, then instead of $1.02$ we would reflect about 1 and use $0.98$. (In theory one might need multiple reflections, but in practice that did not arise for the proposals used here.) With reflections it still turns out that the proposal is symmetric, $\tilde{q}(s \mid s') = \tilde{q}(s' \mid s)$, so the proposal ratio in Eq. A.6 equals 1 and only the target ratio and the Jacobian ratio $[m(s)/m(s')]^K$ remain.
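To make this step concrete, here is a hedged C++ sketch of the joint update of $s$ and the $f_k$, combining the reflected normal proposal with the fecundity adjustment and the Jacobian factor $[m(s)/m(s')]^K$ of Eq. A.6. The functions maturation_survival and log_target_joint are placeholders standing in for the model-specific calculations (their trivial bodies below only make the sketch self-contained), and sigma_s is an illustrative tuning constant.

#include <cmath>
#include <random>
#include <vector>

// Placeholder stand-ins so the sketch compiles; the real model supplies these.
double maturation_survival(double s) { return s; }      // m(s): egg-to-juvenile survival, mean environment
double log_target_joint(double s, const std::vector<double>& f) {
    double lp = -50.0 * (s - 0.5) * (s - 0.5);          // arbitrary illustrative target
    for (double fk : f) lp -= 0.5 * std::log(fk) * std::log(fk);
    return lp;
}

// Reflect a proposed value back into (0, 1) about whichever boundary it crossed.
double reflect01(double x) {
    if (x < 0.0) return -x;
    if (x > 1.0) return 2.0 - x;
    return x;
}

// One joint update of the survival parameter s and the replicate fecundities f_k,
// holding each c_k = f_k * m(s) constant and applying the Jacobian factor of Eq. A.6.
void update_s_and_f(double& s, std::vector<double>& f, double sigma_s, std::mt19937& rng) {
    std::normal_distribution<double> step(0.0, sigma_s);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    const double s_prop = reflect01(s + step(rng));      // reflected normal proposal (symmetric)
    const double ratio = maturation_survival(s) / maturation_survival(s_prop);  // m(s)/m(s')

    std::vector<double> f_prop(f.size());
    for (std::size_t k = 0; k < f.size(); ++k)
        f_prop[k] = f[k] * ratio;                        // f_k' = f_k m(s)/m(s') keeps c_k constant

    const double log_alpha = log_target_joint(s_prop, f_prop) - log_target_joint(s, f)
                             + static_cast<double>(f.size()) * std::log(ratio);  // [m(s)/m(s')]^K

    if (std::log(unif(rng)) < log_alpha) {               // accept: adopt the joint proposal
        s = s_prop;
        f = f_prop;
    }
}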
With this general approach to
constructing Metropolis-Hastings steps in temporarily transformed coordinate
spaces, with the transformation accounted for by a Jacobian rule like Eq.
A.5, one can easily use steps that take advantage of biological interpretations
of the model. Of course there are simpler approaches to obtain valid MCMC samplers,
but the steps used here greatly improve efficiency over sampling one dimension
at a time. The Metropolis-Hastings steps I used for a full sampling iteration
were: each of three parameters updated individually in log coordinates; five joint updates in which one or two parameters were proposed and replicate-level values were adjusted to hold a biologically interpretable quantity constant for each replicate (such as the mean-environment maturation numbers of the example above or, in one of these steps, lifetime reproductive success); two single-parameter updates in transformed coordinates; one parameter updated while holding a second parameter and each replicate's corresponding values constant; two further single-parameter updates in log coordinates; and all process noises sampled with the same combination of steps described above for MCMC Algorithm 1.
The state of the sampler (parameters and process noises) was saved after every 15 iterations until the desired number of states had been obtained. (Implemented in GNU C++ on a 2.0 GHz Pentium processor, each run took approximately one hour.) Finally, the "priors" for the parameters were: a very diffuse distribution for one parameter (although this distribution is centered on 4, it is essentially flat and provides no prior information); an exponential distribution with mean 100 for six of the parameters; and a uniform distribution between 0 and 1 for the remaining three parameters.
Kernel Density Estimation
For Monte Carlo kernel likelihood
(MCKL), samples were reparameterized into standardized principal components
before maximization. This linear transformation gives approximately independent,
similarly scaled coordinates. In general, reparameterization of the parameters after sampling but before kernel density maximization requires
consideration of Jacobians of the parameter transformation (as in Eq.
A.5). However, the transformation to principal coordinates is linear, so
the Jacobian is constant and MCKL maximization is unaffected.
The kernel was multivariate Gaussian and independent along the principal component axes, with a single smoothing bandwidth in each direction; different bandwidths were used for the null and alternative hypotheses. These values were chosen so that, for a unit normal Gaussian likelihood (approximately the shape of the principal components sample) and the Monte Carlo sample size used, there would be a high probability of obtaining an estimate whose likelihood is within a small tolerance of the true maximum, which for log-likelihood-based hypothesis testing is quite accurate.
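For concreteness, here is a hedged C++ sketch of evaluating a product-Gaussian kernel density estimate on the standardized principal-component coordinates. It shows only the plain, unweighted kernel density calculation with a single bandwidth h for every direction; the bandwidth value, data layout, and any weighting used in the MCKL maximization itself are not specified here and follow the main text.

#include <cmath>
#include <vector>

// Evaluate a kernel density estimate at the point x, using a multivariate Gaussian
// kernel that is independent along each standardized principal-component axis and
// has the same bandwidth h in every direction.
double kernel_density(const std::vector<std::vector<double>>& sample,  // MC sample, one draw per row
                      const std::vector<double>& x,                    // evaluation point
                      double h) {
    const double pi = 3.14159265358979323846;
    const std::size_t d = x.size();
    const double norm = std::pow(h * std::sqrt(2.0 * pi), static_cast<double>(d));

    double sum = 0.0;
    for (const std::vector<double>& draw : sample) {
        double q = 0.0;
        for (std::size_t j = 0; j < d; ++j) {
            const double z = (x[j] - draw[j]) / h;
            q += z * z;
        }
        sum += std::exp(-0.5 * q);                       // Gaussian kernel, product over axes
    }
    return sum / (static_cast<double>(sample.size()) * norm);
}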
Final Likelihood Values
The MCKL method gives a maximum likelihood known only up to an unknown constant of proportionality. To estimate the actual likelihood at the MLE, I used MCMC Algorithm 1 to obtain a sample of process noises for each replicate, estimated the mean and covariance matrix of this sample, and used a multivariate normal distribution with that mean and covariance as an importance sampling distribution to calculate the likelihood. This importance sampling calculation is similar to Eq. 10 of the main text, but with a different sampling distribution for each replicate. For MCMC Algorithm 1, process noises were saved after every five sampling iterations until a sample of 2000 was obtained to estimate the mean and covariance for the sampling distribution, and then a sample of 10,000 points from that distribution was used for the likelihood estimate.
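A hedged C++ sketch of this kind of importance sampling average for one replicate follows. The functions log_joint_density and log_importance_density are placeholders for the model's joint density of data and process noises at the MLE and for the fitted multivariate normal importance density (their trivial bodies below only make the sketch self-contained); the log-sum-exp arrangement is a standard numerical-stability device, not a claim about the original code.

#include <cmath>
#include <limits>
#include <vector>

// Placeholder densities (log scale) so the sketch compiles; the real model supplies these.
double log_joint_density(const std::vector<double>& eps) {        // log P(data, eps | theta-hat)
    double lp = 0.0; for (double e : eps) lp -= 0.5 * e * e; return lp;
}
double log_importance_density(const std::vector<double>& eps) {   // log of fitted multivariate normal
    double lp = 0.0; for (double e : eps) lp -= 0.45 * e * e; return lp;
}

// Importance sampling estimate of the log-likelihood for one replicate:
// L = (1/n) sum_i P(data, eps_i | theta-hat) / g(eps_i), with eps_i drawn from g.
double log_likelihood_estimate(const std::vector<std::vector<double>>& draws) {
    std::vector<double> log_w;
    log_w.reserve(draws.size());
    double max_log_w = -std::numeric_limits<double>::infinity();
    for (const std::vector<double>& eps : draws) {
        const double lw = log_joint_density(eps) - log_importance_density(eps);
        log_w.push_back(lw);
        if (lw > max_log_w) max_log_w = lw;
    }
    double sum = 0.0;                                              // log-sum-exp for stability
    for (double lw : log_w) sum += std::exp(lw - max_log_w);
    return max_log_w + std::log(sum / static_cast<double>(draws.size()));
}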
Literature Cited

Gilks, W. R., S. Richardson, and D. J. Spiegelhalter, editors. 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, New York, New York, USA.
Robert, C. P., and G. Casella.
1999. Monte Carlo Statistical Methods.
Springer, New York, New York, USA.