Appendix B. Derivations of the distributions of the estimated parameters.
Distribution of the $\hat{\sigma}^2_{slp}$ estimate, Eq. 2
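Consider, for the moment, an idealized $\hat{\sigma}^2_{slp}$ estimate using subsampled data to eliminate overlap in the $N_{t+\tau}/N_t$ ratios and $L = 1$. Let's call it the idealized estimate. We can derive its distribution by observing that the slope of $\mathrm{var}(\ln(N_{t+\tau}) - \ln(N_t))$ vs. $\tau$ ($\tau = 1, 2, \ldots, \tau'$) is basically $[\mathrm{var}(\ln(N_{t+\tau'}/N_t)) - \mathrm{var}(\ln(N_{t+1}/N_t))]/(\tau' - 1)$, since the $\mathrm{var}(\ln(N_{t+\tau}) - \ln(N_t))$ vs. $\tau$ line is generally straight. Using […], where $\Gamma(\alpha, \beta)$ is a gamma distribution with shape $\alpha$ and scale $\beta$, the distribution of the idealized estimate is straightforward to derive:
[…] (B.1)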
Note that the sequential $N_{t+\tau'}/N_t$ ratios are chosen so that there is no overlap; thus each ratio is independent. Assume for the moment that the two gamma distributions are independent, which they are not. In this case, we can show as follows that the limiting distribution of Eq. B.1 as $df_{\tau'}$ and $df_1$ become large is $\chi^2$ with $df_{\tau'}$ degrees of freedom.
The moment generating function of a gamma distribution with shape $\alpha$ and scale $\beta$ is $(1 - \beta t)^{-\alpha}$. Thus the moment generating function for the distribution in Eq. B.1 is […]. Take the natural log of this to get […]. Using the Taylor expansion for $\ln(1+x)$ and multiplying the second element by […], we obtain […]. Ignoring higher order terms, the ln(mgf) has the form […], which is the ln(mgf) of the following $\chi^2$ distribution: […]
As noted, the gamma distributions for the variances of $\ln(N_{t+\tau'}/N_t)$ and $\ln(N_{t+1}/N_t)$ are actually correlated.
The effect of the correlation, as seen from numerical experiments, is to cause
the distribution in Eq. B.1 to approach the limiting distribution faster (i.e.,
when the dfs in the gamma distributions are smaller).
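As a reminder of the standard facts this argument relies on, the sketch below writes out the moment generating function of a difference of two independent gamma variables and the leading terms of its logarithm, next to the same expansion for a scaled $\chi^2$ variable. The symbols $X$, $Y$, $\alpha_i$, $\beta_i$, $c$ and $\nu$ are placeholders introduced here for illustration; this is not a reconstruction of Eq. B.1 or of the intermediate steps of the original derivation.

```latex
\begin{align*}
X \sim \Gamma(\alpha_1,\beta_1),\quad Y \sim \Gamma(\alpha_2,\beta_2)\ \text{independent}
  &\;\Rightarrow\; M_{X-Y}(t) = (1-\beta_1 t)^{-\alpha_1}\,(1+\beta_2 t)^{-\alpha_2},\\
\ln M_{X-Y}(t) &= -\alpha_1\ln(1-\beta_1 t) - \alpha_2\ln(1+\beta_2 t)\\
  &= (\alpha_1\beta_1-\alpha_2\beta_2)\,t
     + \tfrac{1}{2}\left(\alpha_1\beta_1^2+\alpha_2\beta_2^2\right)t^2 + O(t^3),\\
\ln M_{c\chi^2_\nu}(t) &= -\tfrac{\nu}{2}\ln(1-2ct) = \nu c\,t + \nu c^2 t^2 + O(t^3).
\end{align*}
```

Comparing the coefficients of $t$ and $t^2$ in the last two lines is one way to see how a difference of gamma variables can be matched to a scaled $\chi^2$ distribution once higher order terms are ignored.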
The $\hat{\sigma}^2_{slp}$ used in the Dennis-Holmes method is somewhat different from the idealized estimate used in this derivation. First, the $N_{t+\tau'}/N_t$ ratios cannot generally be subsampled because the time series are short. This means the ratios are correlated and $df_{\tau'}$ is substantially less than the number of ratios minus one; additionally, the lack of subsampling makes $\hat{\sigma}^2_{slp}$ biased. Second, the data are running-sum transformed ($L > 1$); this leads to further bias. These are trade-offs that improve estimation for short corrupted time series by reducing the number of negative variance estimates (percent errors column in Table B1). Despite the differences, understanding the limiting distribution for the idealized estimate helps us understand why, when we estimated a non-idealized $\hat{\sigma}^2_{slp}$ ($L > 1$ and data not subsampled) from simulated data, we observed that $\hat{\sigma}^2_{slp}$ showed a distribution of the form […] for a wide range of time series lengths, non-process to process error ratios, and filter lengths (Table B1).
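To make the non-idealized estimate concrete, here is a minimal Python sketch of a slope-based variance estimate computed from running sums with all overlapping ratios. The function names (`running_sum`, `ols_slope`, `sigma2_slp`) are hypothetical, and fitting the slope by ordinary least squares with an intercept is an assumption of this sketch rather than something stated above.

```python
import math
import statistics


def running_sum(obs, L):
    """Running sums R_t = O_t + ... + O_{t+L-1}; returns n - L + 1 values."""
    return [sum(obs[t:t + L]) for t in range(len(obs) - L + 1)]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (intercept included; an assumption)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def sigma2_slp(obs, L, max_tau=4):
    """Slope of var(ln(R_{t+tau}/R_t)) against tau = 1..max_tau.

    Every overlapping ratio is used (no subsampling), so the ratios are
    correlated and the estimate can be negative for short, noisy series.
    Needs at least L + max_tau + 1 observations.
    """
    R = running_sum(obs, L)
    taus = list(range(1, max_tau + 1))
    variances = []
    for tau in taus:
        log_ratios = [math.log(R[t + tau] / R[t]) for t in range(len(R) - tau)]
        variances.append(statistics.variance(log_ratios))
    return ols_slope(taus, variances)
```

A negative return value corresponds to what Table B1 counts as an error (a negative variance estimate).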
Monte Carlo estimation was used to numerically estimate the $\chi^2$ distributions for the $\hat{\sigma}^2_{slp}$ estimates used in the Dennis-Holmes method (= the slope of $\mathrm{var}(\ln(R_{t+\tau}/R_t))$ vs. $\tau$ for $\tau = 1, 2, 3, 4$). Monte Carlo estimation uses parameter estimates from samples of data generated with simulations to calculate the distribution of the parameter estimate (this is akin to parametric bootstrapping). We generated 5000 time series of length $n$ using the model
$N_{t+1} = N_t \exp(\mu + \varepsilon_p)$, $O_t = N_t \exp(\varepsilon_{np})$,
where the process error $\varepsilon_p \sim \mathrm{Normal}(0, \sigma_p)$ and the non-process error $\varepsilon_{np} \sim \mathrm{Normal}(0, \sigma_{np})$.
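The simulation model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the tables: `simulate_counts` is a hypothetical name, the starting size `n0` and the value of `sigma_np` are arbitrary, and the second argument of Normal is treated here as a standard deviation, which the text does not confirm.

```python
import math
import random


def simulate_counts(n, mu, sigma_p, sigma_np, n0=1000.0, rng=None):
    """Return (true_sizes, observed_counts), each of length n.

    N_{t+1} = N_t * exp(mu + eps_p),  O_t = N_t * exp(eps_np),
    with eps_p and eps_np drawn as zero-mean normals whose standard
    deviations are sigma_p and sigma_np (an assumption of this sketch).
    """
    rng = rng or random.Random()
    true_sizes = [n0]
    for _ in range(n - 1):
        eps_p = rng.gauss(0.0, sigma_p)                  # process error
        true_sizes.append(true_sizes[-1] * math.exp(mu + eps_p))
    observed = [N * math.exp(rng.gauss(0.0, sigma_np))   # non-process error
                for N in true_sizes]
    return true_sizes, observed


# 5000 replicate observed series of length 20 (sigma_np = 0.2 is illustrative only)
rng = random.Random(1)
replicates = [simulate_counts(20, mu=-0.01, sigma_p=0.1, sigma_np=0.2, rng=rng)[1]
              for _ in range(5000)]
```

Passing a seeded `random.Random` instance keeps the replicates reproducible.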
Let mean($\hat{\sigma}^2_{slp}$) denote the mean of all 5000 $\hat{\sigma}^2_{slp}$ estimates. For each simulation, we calculated the statistic […] = […]. We then found the best fitting $df_{slp}$ parameter such that […]. This was done by finding the $df_{slp}$ that maximized the P value from a Kolmogorov-Smirnov goodness of fit test. The fitting process was repeated for different time series lengths ($n$), filter lengths ($L$), ratios of process to non-process error ($\sigma_p/\sigma_{np}$), and $\mu$ and $\sigma_p$. The best fitting $df_{slp}$ values for different $n$, $L$ and ($\sigma_p/\sigma_{np}$) are given in Table B1 with the P values for the fitted distribution.
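The fitting step might look like the following stdlib-only Python sketch. Several things here are assumptions: the scaled statistic is taken to be df times the estimate divided by the mean estimate, the search is a simple grid, and the code minimizes the Kolmogorov-Smirnov distance instead of maximizing the P value (for a fixed sample size the P value decreases as the distance grows, so the two criteria pick the same df). All function names are hypothetical.

```python
import math
import random


def chi2_cdf(x, df):
    """Chi-square CDF via the power series of the regularized lower
    incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total and k < 100000:
        k += 1
        term *= z / (a + k)
        total += term
    log_front = a * math.log(z) - z - math.lgamma(a + 1.0)
    return min(1.0, total * math.exp(log_front))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_df(estimates, grid):
    """Return (df, distance) for the grid value whose chi-square(df)
    distribution is closest to df * estimate / mean(estimates)."""
    mean_est = sum(estimates) / len(estimates)   # mean of all estimates
    kept = [e for e in estimates if e > 0]       # negative estimates dropped
    best_df, best_d = None, float("inf")
    for df in grid:
        scaled = [df * e / mean_est for e in kept]
        d = ks_statistic(scaled, lambda x: chi2_cdf(x, df))
        if d < best_d:
            best_df, best_d = df, d
    return best_df, best_d


# Synthetic check: estimates drawn so that 3 * e / mean(e) is chi-square(3)
rng = random.Random(42)
fake = [0.01 * rng.gammavariate(1.5, 2.0 / 3.0) for _ in range(2000)]
best_df, best_d = fit_df(fake, [1.0 + 0.25 * i for i in range(29)])
```

On the synthetic sample the selected df should land near 3; with real simulation output, the input would be the 5000 slope estimates for one combination of $n$, $L$ and error ratio.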
The observed bias and […] parameters from the simulations are given in Table B2. The degrees of freedom depended mainly on the length of the time series, $n$, and the length of the filter, $L$. There was an approximately linear relationship between $n$, $L$ and the $df_{slp}$ values in Table B1. The following formula gives a close approximation of the numerically calculated $df_{slp}$:
$df_{slp} = 0.333 + 0.212\,n - 0.387\,L$ for $n > 15$.
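The approximation is easy to wrap in a small helper. The function name is hypothetical, and raising an error outside the stated range $n > 15$ is a design choice of this sketch.

```python
def approx_df_slp(n, L):
    """Linear approximation to the fitted degrees of freedom (valid for n > 15)."""
    if n <= 15:
        raise ValueError("approximation is stated only for n > 15")
    return 0.333 + 0.212 * n - 0.387 * L
```

For $n = 20$ and $L = 3$ this gives about 3.41, next to the values of 3.1 to 3.4 listed for that row in Table B1.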
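Variance of $\hat{\mu}$
Given $n$ observations, $O_1, O_2, O_3, \ldots, O_n$, of the true population size, $N_1, N_2, N_3, \ldots, N_n$, the $O_t$ series is transformed into a running sum, $R_1, R_2, R_3, \ldots, R_r$, where $r = n - L + 1$ and $R_t = \sum_{i=0}^{L-1} O_{t+i}$. Denote $\bar{N}_t$ as the mean of the $N$'s that comprise the running sum $R_t$: $\bar{N}_t = \frac{1}{L}\sum_{i=0}^{L-1} N_{t+i}$, and recall $O_t = \varepsilon_{np,t} N_t$.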
Note that $\hat{\mu}$ is the mean of the $\ln(R_{t+1}/R_t)$ ratios from the time series; however, for corrupted time series, the variance of $\hat{\mu}$ is not $1/(n-L)$ times the variance of the $\ln(R_{t+1}/R_t)$ ratios, as would be the case for uncorrupted time series: […]
Using the variance of the $\ln(R_{t+1}/R_t)$ ratios would lead to substantial overestimation of the variance of $\hat{\mu}$. This overestimation is greater for smaller $L$.
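For illustration, the sketch below computes the mean of the $\ln(R_{t+1}/R_t)$ ratios together with the naive variance (sample variance of the ratios divided by $n - L$) that the text warns against for corrupted series. The function name is hypothetical, and the running sum is assumed to be an unweighted sum of $L$ consecutive observations.

```python
import math
import statistics


def mu_hat_and_naive_var(obs, L):
    """Return (mu_hat, naive_var) from running sums of length L.

    mu_hat is the mean of ln(R_{t+1}/R_t); naive_var is the sample variance
    of those ratios divided by their count, n - L. The naive variance is
    the quantity described as an overestimate for corrupted series.
    """
    R = [sum(obs[t:t + L]) for t in range(len(obs) - L + 1)]
    log_ratios = [math.log(R[t + 1] / R[t]) for t in range(len(R) - 1)]
    mu_hat = statistics.fmean(log_ratios)
    naive_var = statistics.variance(log_ratios) / len(log_ratios)
    return mu_hat, naive_var
```

Because the log ratios telescope, `mu_hat` equals $\ln(R_r/R_1)/(n-L)$, so it depends only on the first and last running sums.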
Estimate of the distribution of $\hat{\mu}$
If […] and […] were known, it would be straightforward to specify the distribution of $\hat{\mu}$ (i.e., Normal([…])); however, we instead have to use estimates of […] and […], which themselves have some distribution. Below is outlined an estimate of the distribution of $\hat{\mu}$ which uses only […]. Deriving a distribution based on both […] and […] appears problematic given the nature of the distribution of […] (see below) and given that the […] estimate is not independent of […].
By simple algebra, we can rewrite […] as […]
Point estimate of […]
A point estimate of […] can be calculated by noting that […], thus […], where […] from the data.
Tables
Table B1. Numerically calculated degrees of freedom for the $\chi^2$ distribution describing the ratio […] and percent errors (negative variance estimates). P values give the fit of the empirical distribution from a Kolmogorov-Smirnov goodness of fit test. For the degrees of freedom calculations, negative $\hat{\sigma}^2_{slp}$ estimates were removed from the sample. For the simulations, $\mu = -0.01$ and […] $= 0.1$. The results were not sensitive to alternate parameter values in the ranges $-0.2 < \mu < 0.2$ or $0.01 <$ […] $< 0.3$.
Years | L | df | P value | % errors | df | P value | % errors | df | P value | % errors | df | P value | % errors | df | P value | % errors |
10 | 3 | 1.3 | 0.02 | 15 | 1.4 | 0.09 | 19 | 1.4 | 0.03 | 25 | 1.4 | 0.03 | 31 | 1.4 | 0.10 | 32 |
10 | 4 | 1.3 | 0.02 | 19 | 1.4 | 0.02 | 21 | 1.3 | 0.00 | 26 | 1.1 | 0.00 | 33 | 0.9 | 0.00 | 35 |
10 | 5 | 1.5 | 0.28 | 38 | 1.5 | 0.05 | 42 | 1.5 | 0.18 | 50 | 0.9 | 0.00 | 58 | 0.7 | 0.00 | 58 |
15 | 3 | 2.1 | 0.19 | 1 | 2.2 | 0.51 | 2 | 2.2 | 0.52 | 3 | 2.1 | 0.35 | 7 | 2.1 | 0.60 | 8 |
15 | 4 | 1.8 | 0.04 | 0 | 1.9 | 0.08 | 1 | 2.0 | 0.06 | 1 | 1.9 | 0.06 | 3 | 1.9 | 0.09 | 3 |
15 | 5 | 1.6 | 0.01 | 2 | 1.6 | 0.01 | 3 | 1.6 | 0.03 | 5 | 1.6 | 0.01 | 10 | 1.6 | 0.00 | 9 |
20 | 3 | 3.3 | 0.29 | 0 | 3.1 | 0.18 | 0 | 3.2 | 0.54 | 1 | 3.4 | 0.85 | 2 | 3.4 | 0.75 | 3 |
20 | 4 | 2.8 | 0.08 | 0 | 2.8 | 0.05 | 0 | 2.9 | 0.03 | 0 | 3.3 | 0.23 | 0 | 3.2 | 0.27 | 0 |
20 | 5 | 2.4 | 0.06 | 0 | 2.3 | 0.06 | 0 | 2.3 | 0.06 | 0 | 2.4 | 0.12 | 2 | 2.4 | 0.11 | 2 |
30 | 3 | 5.7 | 0.20 | 0 | 5.5 | 0.31 | 0 | 6.0 | 0.78 | 0 | 5.9 | 0.98 | 0 | 5.7 | 0.94 | 0 |
30 | 4 | 4.8 | 0.10 | 0 | 4.9 | 0.12 | 0 | 5.4 | 0.18 | 0 | 5.7 | 0.22 | 0 | 5.6 | 0.33 | 0 |
30 | 5 | 4.1 | 0.07 | 0 | 4.1 | 0.16 | 0 | 4.5 | 0.30 | 0 | 4.5 | 0.31 | 0 | 4.3 | 0.31 | 0 |
40 | 3 | 7.8 | 0.49 | 0 | 7.9 | 0.34 | 0 | 8.0 | 0.28 | 0 | 8.4 | 0.93 | 0 | 7.8 | 0.94 | 1 |
40 | 4 | 7.0 | 0.17 | 0 | 6.7 | 0.04 | 0 | 7.4 | 0.15 | 0 | 8.2 | 0.81 | 0 | 8.1 | 0.68 | 0 |
40 | 5 | 6.3 | 0.09 | 0 | 6.6 | 0.06 | 0 | 6.4 | 0.17 | 0 | 6.6 | 0.82 | 0 | 6.2 | 0.87 | 0 |
Table B2. Numerically calculated mean bias between […] and […], expressed as a percentage of […], and […] parameter (Eq. 11) describing the relationship between […] and […]. Negative $\hat{\sigma}^2_{slp}$ estimates were removed from the sample before calculations. For the simulations, $\mu = -0.01$ and […] $= 0.1$. The results were not sensitive to alternate parameter values in the ranges $-0.2 < \mu < 0.2$ or $0.01 <$ […] $< 0.3$.
Years | L | % bias | Eq. 11 parameter | % bias | Eq. 11 parameter | % bias | Eq. 11 parameter | % bias | Eq. 11 parameter | % bias | Eq. 11 parameter |
10 | 3 | -75 | 0.25 | -71 | 0.27 | -61 | 0.35 | -17 | 0.60 | 54 | 0.45 |
10 | 4 | -87 | 0.12 | -86 | 0.14 | -80 | 0.18 | -53 | 0.35 | -1 | 0.32 |
10 | 5 | -95 | 0.05 | -94 | 0.05 | -91 | 0.08 | -77 | 0.17 | -51 | 0.16 |
15 | 3 | -48 | 0.52 | -44 | 0.55 | -31 | 0.65 | 24 | 1.01 | 120 | 0.92 |
15 | 4 | -61 | 0.40 | -58 | 0.41 | -48 | 0.50 | -7 | 0.79 | 70 | 0.79 |
15 | 5 | -73 | 0.26 | -71 | 0.28 | -66 | 0.33 | -40 | 0.51 | 7 | 0.53 |
20 | 3 | -34 | 0.65 | -31 | 0.68 | -17 | 0.80 | 33 | 1.15 | 128 | 1.15 |
20 | 4 | -45 | 0.54 | -43 | 0.57 | -31 | 0.67 | 11 | 1.00 | 93 | 1.09 |
20 | 5 | -57 | 0.42 | -55 | 0.44 | -48 | 0.51 | -20 | 0.72 | 34 | 0.81 |
30 | 3 | -22 | 0.77 | -19 | 0.81 | -7 | 0.91 | 44 | 1.32 | 140 | 1.48 |
30 | 4 | -33 | 0.67 | -29 | 0.70 | -20 | 0.79 | 25 | 1.16 | 110 | 1.42 |
30 | 5 | -43 | 0.56 | -41 | 0.58 | -35 | 0.64 | -5 | 0.89 | 53 | 1.09 |
40 | 3 | -19 | 0.81 | -15 | 0.84 | -4 | 0.94 | 46 | 1.36 | 136 | 1.63 |
40 | 4 | -29 | 0.71 | -25 | 0.74 | -15 | 0.83 | 28 | 1.21 | 110 | 1.56 |
40 | 5 | -39 | 0.61 | -37 | 0.63 | -30 | 0.68 | -1 | 0.94 | 54 | 1.20 |