Precision of systematic and random sampling in clustered populations: habitat patches and aggregating organisms

Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, separately and in combination. The standard one-start aligned systematic survey design, a uniform 10 × 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (v) that corrected for inter-transect correlation (v8 and vW) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (v2 and v3) that are now more commonly applied in systematic surveys. Similar variance estimator performance rankings were found with a second differently generated set of spatial point populations, v8 and vW again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimators tested were free from bias. On balance, systematic designs bring narrower confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence intervals. The search continues for better estimators of sampling variance for the systematic survey mean.


RICHARD MCGARVEY,1,3 PAUL BURCH,1,2 AND JANET M. MATTHEWS1
1 SARDI Aquatic Sciences, P.O. Box 120, Henley Beach, South Australia 5022 Australia

INTRODUCTION
One major challenge in field sampling is that most natural populations are not distributed uniformly or randomly in space (Perry et al. 2002). Plants and animals cluster for (at least) two reasons. First, species are restricted to patches of suitable habitat, for example, in strata of favorable elevation, soil substrate, plant cover, or depth. Second, within patches of suitable habitat, individual organisms of the same species tend to aggregate, for reasons including predatory defense (Brönmark et al. 1984, Green and Nunez 1986, Dew 1990), enhanced reproductive success (Stevens et al. 1994, Babcock and Keesing 1999), or plant offspring settling near parents (Ehrlen and Eriksson 2000, Levine and Murrell 2003, Suzuki et al. 2005, Seidler and Plotkin 2006). Clustering renders field surveys of populations using quadrats or transects less precise, often yielding many zero counts and a few very high counts. These skewed sample distributions greatly reduce the precision of survey estimates of mean population density (Pennington and Vølstad 1994, Pennington et al. 2002), or of quantities that depend on density. Clustered (spatially autocorrelated) populations pose a formidable cost challenge to environmental monitoring studies, requiring larger sample sizes to achieve equivalent levels of survey estimate precision. In this study we consider the two most common survey designs for environmental monitoring and ecological field study to assess their precision in populations that spatially cluster by these two processes.
The comparison of systematic and random survey methods was first actively studied from the late 1930s to the 1950s. Statistical studies (Madow and Madow 1944, Cochran 1946, Yates 1948, Matérn 1960, Bellhouse 1977) concurred that systematic sampling schemes are more precise in autocorrelated populations. Some of this work (Madow and Madow 1944, Cochran 1946) was done assuming linear populations were to be sampled, such as an alphabetic list of households to be surveyed by telephone. A parallel succession of studies undertaken in the population biological literature (Bourdeau 1953), notably for application to sampling a two-dimensional space, surveying forests for timber yield estimation (Hasel 1938, Finney 1948, 1949), concluded that the precision advantage of systematic designs was minor and that random sampling is preferred (Greig-Smith 1983). More recent statistical investigation (D'Orazio 2003, Wolter 2007) assumed systematic sampling to be more precise in autocorrelated populations and focused on the still unsolved problem of how to reliably estimate the variance of the estimate of the mean from a systematic survey for linear (Wolter 1984, 2007) and two-dimensional (i.e., spatial) populations (D'Orazio 2003). Published studies in the applied ecological literature have now shifted to general agreement that systematic designs are more precise in spatially autocorrelated populations (e.g., Dunn and Harrison 1993, Ambrosio et al. 2004, Aune-Lundberg and Strand 2014).
In practice, field ecologists continue to apply both random and systematic sampling designs (Legendre et al. 2002). Random sampling satisfies the assumption of independence among samples, assuring accurate estimates of confidence interval by the standard error formula, $\sqrt{s^2/n}$, where the statistic $s$ is the estimator of standard deviation and $n$ is the number of observations. Systematic designs are often chosen for ease of design and implementation. Both random and systematic designs are unbiased (for systematic, see Madow and Madow [1944]), a major advantage achieved with no prior knowledge of the population to be sampled. Freedom from bias is not guaranteed with more sophisticated model-based or adaptive sampling approaches, though these can achieve higher precision when spatial model assumptions are met.
As a first objective, we evaluate and compare the precision of random and systematic survey designs for measuring population density in populations that cluster by habitat patchiness, aggregating behavior, or both. These two spatial processes, affecting probabilities or patterns of organism locations in space, are typical of many or most natural populations. High levels of clustering are a particularly common source of imprecision in marine surveys (Pennington 1996). Using simulated survey transects allocated randomly or systematically within a square two-dimensional study region, we quantified the imprecision of both survey designs by the variance of the repeated survey estimates, as the spread of replicated simulation survey mean densities. A measure of the cost advantage that may accrue from choosing the more precise design is inferred for each spatial population.
As a second objective, we evaluate variance estimators for the sample mean used to compute confidence intervals in systematic surveys. We tested 13 variance estimators, including 10 previously published (D'Orazio 2003, Wolter 2007). A comparison of 13 different variance estimators has not been undertaken, to our knowledge. Wolter's (1984, 2007) study, extended by D'Orazio (2003), has been the most comprehensive to date. We also tested three "covered grid" variance estimators proposed here (Appendix), which extend Yates' (1960) and Wolter's (2007) balanced difference method. In covered grid estimators, each balanced difference term uses all transects in each row or column of the systematic grid to uniformly cover the full width, the full length, or both, of the survey study region.

METHODS
Comparing the precision of random and systematic survey designs requires (1) simulated or enumerated spatial point populations of organism locations distributed within a study region, and (2) a method to simulate survey sampling of these spatial populations by the two survey designs to be tested.
Here, to test survey design performance in populations clustering by patchiness and aggregating organisms, we (1) generated six, and then a further 12, spatial point populations (Diggle 1983), each organism's location represented by an xy coordinate point in a 1-km² study region, and (2) simulated transect sampling, measuring population density in each clustered (or unclustered) population. Using simulated rather than enumerated spatial populations permitted testing of a controlled range of clustering processes typical of natural populations, namely due to habitat patchiness and aggregating behavior, together and separately. Using populations with these combinations, we assessed the effect of these two forms of spatial clustering in natural populations for (1) comparing the relative precision of systematic vs. random sampling, and (2) comparing 13 estimators for the variance of the mean from a systematic sample. Each simulated transect yields a single organism count, a sample measure of density. The accuracy and precision of each survey design was assessed by comparing replicated transect survey estimates of mean density with the true population density of each simulated population, which is known without error.

Simulated spatial populations
Six spatial point populations with differing degrees of patchiness and aggregating behavior alone or in combination were generated (Fig. 1). Population densities of the six simulated populations are given in Table 1 (row 1). Population a is spatially random, i.e., unclustered, with organism locations assigned independently and randomly. In populations b and c, organisms are again assigned randomly, but restricted to two drawn band-shaped patches of habitat covering ~22% of the study region. Populations d-f (Fig. 1d-f) are Matérn aggregated, in which organisms are distributed randomly within circles of 50 m radius, with the circle center points positioned randomly. The aggregations of population d are distributed across the entire study region (Fig. 1d), while populations e and f are Matérn aggregated within the two habitat patches (Fig. 1e, f). Matérn cluster circles can, by chance, overlap (Fig. 1d-f), doubling the density in that aggregation. The two patches of habitat contain organisms of either approximately equal population density (Fig. 1b, e), or of densities that differed between the two patches by an approximate factor of 10 (Fig. 1c, f). These six simulated populations represent a wide range of clustering, from no clustering as complete spatial randomness (CSR, Fig. 1a; Diggle 1983) to both aggregated and patchy (Fig. 1e, f). The spatstat R package (Baddeley and Turner 2005) was used for both simulation steps of generating clustered populations and sampling them with transects (R code given in Supplement 1).
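The two unpatched population types described above (CSR and Matérn aggregated) can be illustrated with a minimal Python sketch. This is not the spatstat R code of Supplement 1. The function names, the fixed numbers of clusters and of organisms per cluster, and the choice to discard points falling outside the study region are all assumptions made for illustration; only the 50 m cluster radius and the 1-km² (1000 × 1000 m) region come from the text.

```python
import math
import random

SIDE = 1000.0  # side of the square 1-km^2 study region, in metres


def csr_points(n_points, side=SIDE, seed=None):
    """Complete spatial randomness: independent uniform xy locations."""
    rng = random.Random(seed)
    return [(rng.random() * side, rng.random() * side) for _ in range(n_points)]


def matern_cluster_points(n_parents, points_per_cluster, radius=50.0,
                          side=SIDE, seed=None):
    """Simplified Matern-style clustering (illustrative assumption).

    Cluster centres are placed uniformly at random; organisms are placed
    uniformly within a circle of the given radius around each centre.
    A fixed count per cluster is used here for simplicity, and points
    landing outside the region are dropped.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_parents):
        cx, cy = rng.random() * side, rng.random() * side
        for _ in range(points_per_cluster):
            # sqrt makes the points uniform over the disc area
            r = radius * math.sqrt(rng.random())
            theta = 2.0 * math.pi * rng.random()
            x, y = cx + r * math.cos(theta), cy + r * math.sin(theta)
            if 0.0 <= x < side and 0.0 <= y < side:
                points.append((x, y))
    return points
```

Restricting either generator to habitat patches (populations b, c, e, f) would amount to keeping only points, or only cluster centres, that fall inside the drawn patch boundaries, which are not specified numerically in the text.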

Simulated survey sampling: random and systematic
For each simulated survey, organism counts were obtained from a sample of 100 transects. Each transect was 1 × 50 m; 20 000 transects completely cover the study region without overlapping. These 20 000 distinct transect positions provide the sample frame. For random surveys, 100 transects were selected independently and randomly, without replacement (Fig. 2, top right), from among the 20 000 possible transect locations. For systematic sampling, the 1-km² study region was partitioned into a 10 × 10 array of grid blocks, each a square of size 100 × 100 m (Fig. 2, left). Two rows of 100 transects (each 1 × 50 m) cover each grid block, giving 200 possible transect positions per block. For each systematic sample of 100 transects, a first transect was chosen at random in the first (upper left) grid block. Following the standard procedure of one-start (Madow and Madow 1944) aligned systematic sampling in a two-dimensional study region (Quenouille 1949, Cochran 1977), the randomly chosen start position within the first grid block was used for the position of all remaining transects, one transect in each of the other 99 grid blocks (Fig. 2, bottom right). Each survey yields a measured mean density as the mean from 100 transects. For each population and survey design, 10 000 replicated Monte Carlo survey samples were drawn.
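The two allocation schemes can be sketched in Python as follows. This is an illustrative reading of the design, not the authors' R code: the indexing of the frame as 1000 columns (1 m wide) by 20 rows (50 m long), the function names, and the counting helper are assumptions.

```python
import random

N_X, N_Y = 1000, 20        # 1 m wide x 50 m long positions: 1000 * 20 = 20 000
BLOCK_X, BLOCK_Y = 100, 2  # positions per 100 x 100 m grid block (200 in total)
TRANSECT_LENGTH = 50.0     # metres


def random_design(n=100, rng=random):
    """Simple random sample of n transect positions, without replacement."""
    frame = [(ix, iy) for ix in range(N_X) for iy in range(N_Y)]
    return rng.sample(frame, n)


def systematic_design(rng=random):
    """One-start aligned systematic sample on the 10 x 10 array of blocks.

    A single start position is drawn within the first block and reused
    at the same offset in every block.
    """
    sx, sy = rng.randrange(BLOCK_X), rng.randrange(BLOCK_Y)
    return [(sx + BLOCK_X * bx, sy + BLOCK_Y * by)
            for bx in range(10) for by in range(10)]


def transect_counts(points, transects):
    """Count the organisms (xy points, in metres) inside each transect."""
    counts = {t: 0 for t in transects}
    for x, y in points:
        key = (int(x), int(y // TRANSECT_LENGTH))
        if key in counts:
            counts[key] += 1
    return [counts[t] for t in transects]
```

Under this indexing, the list returned by `systematic_design` runs down each column of grid blocks before moving to the next column, which is one possible way to realise an ordering of transects by column.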

Accuracy and precision of the survey mean under random and systematic survey designs
We measure the accuracy (lack of bias) of each survey design by the closeness of the mean of 10 000 replicated simulated survey density estimates to the true overall population density.
To compare the precision (inverse of variance) of the two survey designs, the true sampling variance of each survey design in each population was computed as the sample variance of mean density estimates from each set of 10 000 simulated surveys. The cost advantage obtained by choosing the more precise survey design was inferred as the effective sample size of randomly allocated transects that would be required to equal the precision of the systematic survey design, knowing the variance of a randomly sampled mean varies inversely with sample size (as $n$ in $s^2/n$).
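The effective sample size argument can be written out as a short Python sketch. The function names are hypothetical; the calculation simply applies the inverse proportionality between the variance of a random-sample mean and n stated above.

```python
import statistics


def true_sampling_variance(survey_means):
    """Sample variance of the replicated survey mean densities."""
    return statistics.variance(survey_means)


def equivalent_random_n(var_random, var_systematic, n=100):
    """Random-design sample size matching the systematic precision.

    Because the variance of a random-sample mean scales as 1/n, a random
    survey needs n * (var_random / var_systematic) transects to reach
    the variance achieved by n systematic transects.
    """
    return n * var_random / var_systematic
```

For instance, a systematic-to-random variance ratio of one-third corresponds to `equivalent_random_n(3.0, 1.0)`, i.e., 300 random transects per 100 systematic transects.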

Variance estimators for the survey mean
For systematic surveys, a variety of methods have been proposed to compute a confidence interval for the survey mean. In this study, we tested the performance only of design-based methods that use a single formula for the estimate of sampling variance. We estimate the accuracy and precision of each tested variance estimator for systematic surveys using simulation by comparing the (single) true variance (defined in the preceding subsection) with the mean of 10 000 estimator variances, one for each survey of 100 simulated transects.
The variance estimator (v) of the mean for independent and identically distributed random sampling is $s^2/n$. Under sampling without replacement, the finite population correction, $1 - f$, is applied as a multiplier (Wolter 2007:300). This correction is often small since $f$, the proportion of the population sampled, is small for most surveys. Following Cochran (1977) and Wolter (2007), the finite population correction was included in all 13 variance estimators tested.
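As a minimal sketch, the textbook estimator with the finite population correction can be coded as below. The function name and signature are hypothetical, and the form (1 − f) s²/n is the standard finite-population expression rather than a formula quoted from the Appendix. For the surveys described here, f = 100/20 000 = 0.005.

```python
import statistics


def v1_textbook(counts, frame_size):
    """Textbook variance estimator of the sample mean, with the fpc.

    counts     : transect counts from one survey
    frame_size : number of possible transect positions (20 000 here)
    """
    n = len(counts)
    f = n / frame_size                # sampled fraction of the frame
    s2 = statistics.variance(counts)  # sample variance (n - 1 divisor)
    return (1.0 - f) * s2 / n
```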
Ten of the evaluated variance estimators for the systematic survey mean were from published sources. Wolter (2007) [...] estimators (Appendix), extending the balanced difference method of Yates (1960) and Wolter (2007). In addition to the three covered grid estimators described in the Appendix, formulas are included for v1 (the textbook estimator) and v8 (given in Discussion). The remaining eight variance estimator formulas are detailed by their authors (D'Orazio 2003, Wolter 2007). The 13 variance estimators for systematic samples were coded as subroutines in R (R Core Team 2013; Supplement 2) and applied to each simulated systematic survey sample of 100 transects. Applying Wolter's v2-v8 requires a specific ordering of transects, which was done here in a vertical orientation, starting from the top left transect, down each column of grid blocks, columns succeeding from left to right.

RESULTS
Bias of random and systematic survey designs
The mean densities of 10 000 simulated surveys, for both simple random sampling (Table 1, row 2) and systematic sampling (Table 1, row 3), were close to the true population density (values in these two rows are close to 1), implying both survey designs are unbiased. Because these designs are both representative, neither using a priori information about population density in allocating samples within the study region, this result of non-bias is expected and well known. It was first proven for systematic sampling by Madow and Madow (1944).

True precision of random and systematic survey designs
While both random and systematic designs are unbiased, they differ in precision (sampling variability). For all five clustered populations (b-f, Fig. 1b-f), with n = 100 transects, the systematic survey design was much more precise. The true variances for the systematic survey estimate of mean density (from 10 000 survey estimates) were one-third (36%) to one-fifth (18%) of the true variances achieved by the random design (Table 1, row 5, columns b-f).
For population d, with the population clustering only by Matérn aggregating of organisms (Fig. 1d), true systematic survey variances were one-third (0.33, Table 1, row 5, column d) those of randomly allocated transects. Theory of effective sample size (Pennington and Vølstad 1994), based on the inverse proportionality of sampling variance to sample size under independent random sampling, implies that a survey using the random design would require three times as many transects (301 vs. 100, Table 1, row 6) to equal systematic precision, given the observation that the random sampling variance is three times larger.
When organisms were distributed with complete spatial randomness (no aggregating behavior), but were restricted in habitat range to two patches of either roughly equal density (Fig. 1b), or density that differed by 10-fold (Fig. 1c), the systematic survey design again yielded true sampling variances that were much smaller, 29% and 23%, respectively, of those obtained using the random design (Table 1, row 5, columns b, c). For these two populations, which expressed clustering only via habitat patchiness, 3.4 and 4.3 times as many random transects (344 and 430 vs. 100, Table 1, row 6, columns b, c) would be needed to equal the precision obtained using the 10 × 10 square-grid systematic design of 100 transects.
When individuals were both Matérn aggregated and restricted to suitable habitat, systematic sampling variances for populations distributed in two patches of roughly equal density (Fig. 1e) or 10-fold differing densities (Fig. 1f) were 18% and 36% those of the 10 000 random surveys (Table 1, row 5, columns e, f), respectively, implying that over five or nearly three times (565 and 275 vs. 100, Table 1, row 6, columns e, f) as many random transects would be needed to equal systematic survey precision.
In summary, for these simulated clustered populations, a systematic 10 × 10 square grid of transects produced estimates of absolute population density that were three to five times more precise, thus requiring approximately one-third to one-fifth as many transect sample counts to achieve equivalent precision in the estimate of the sample mean. This precision advantage for systematic survey held about equally for the two tested sources of clustering in natural populations, habitat patchiness and aggregating behavior.
By contrast, in the purely spatially random (CSR) population (Fig. 1a), the systematic survey design was slightly less precise, the true variance of 10 000 systematic survey estimates being 14% larger (1.14, Table 1, row 5, column a) than that obtained by random sampling.

Testing variance estimators for systematic surveys
For the random survey design, in all six populations, the textbook (sample) variance estimator (v1) gave accurate agreement with the true variance (results not shown). This was expected because random survey designs satisfy the assumptions of identical independent sampling.
However, under systematic sampling, the v1-estimated variances differed from the true variances in all six tested populations (Table 1, row 9), implying bias. In the CSR population, v1 slightly underestimated (0.87) the true spread of systematic mean density estimates. In the five clustered populations, v1 overestimated the true sampling variance, by 2.79 up to 5.63 times (Table 1, row 9, columns b-f), implying v1 underestimated the true precision of systematic survey estimates of mean density in clustered populations.
The true variances of the systematic sample for populations a-f, each computed as the variance of the replicated sample means from 10 000 simulated systematic surveys, are given in row 4 of Table 1.
Table 1 notes: Section 1 assesses bias of random and systematic sampling, giving the means of 10 000 survey-measured densities compared with true population densities. Section 2 compares true precision of random with systematic designs. Sections 3 and 4 show bias of 13 tested variance estimators under systematic sampling, quantified by the means of 10 000 simulation-survey estimated sampling error variances (v) divided by the true sampling error variance for each population. Covered grid variance estimators are row-covered (RCG), column-covered (CCG), and twice-covered (TCG).
The accuracy of each variance estimator was quantified by the ratio of the mean of 10 000 variance estimates divided by the true variance. Mean values near 1 in Sections 3 and 4 of Table 1 imply accurate (unbiased) estimates of systematic sampling variance, and are graphically shown as red dots (means of 10 000 estimated variances) near the blue line (true variance) in Fig. 3. Values greater than 1 imply overestimated (overly wide) confidence intervals.
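The accuracy ratio just described can be expressed as a short Python sketch. The function name and arguments are hypothetical; the inputs would be the per-survey variance estimates and the per-survey mean densities from the same set of replicated systematic surveys.

```python
import statistics


def variance_estimator_ratio(variance_estimates, survey_means):
    """Mean estimated variance divided by the true sampling variance.

    variance_estimates : one estimate of var(mean) per replicated survey
    survey_means       : the mean density from each replicated survey
    Ratios near 1 indicate low bias; ratios above 1 indicate
    overestimated variance (overly wide confidence intervals).
    """
    true_variance = statistics.variance(survey_means)
    return statistics.mean(variance_estimates) / true_variance
```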
In the purely random CSR population (Fig. 1a), all 13 variance estimators underestimated the true variance of systematic sampling (Fig. 3a). Estimators other than v8 and vW underestimated by 10-13% (Table 1, column a, rows 9-21). The least biased were two of the covered grid estimators (0.90 for vTCG [twice-covered grid estimator] and vCCG [column-covered grid estimator]); v8 and vW underestimated the true variance by about one-third.
One variance estimator, v7, performed particularly poorly, giving unacceptably wide ranges of variance estimates, with values reaching close to zero for all populations (Fig. 3) other than b (Fig. 3b), where it was highly biased. Means of v7 well above medians (Fig. 3) imply large upper tails. This exceptionally unstable performance of v7 is sufficient to rule it out for general use, as Wolter (2007) had earlier concluded for linear populations. We do not consider v7 henceforth.
For the population of [...] underestimation biased (0.62, Table 1, row 18, column b; Fig. 3b). Populations b and c showed similar results overall (Fig. 3b and c), with v8 the best performer, being both accurate and precise, and with most others overestimating or greatly overestimating the true variance of the systematic sample mean; vW considerably underestimated the true variance in these two populations. After v8, covered grid estimators were the next least overestimation biased (Fig. 3b and c).
In population d, with Matérn clustering but no patchiness of habitat (Fig. 1d), v8 (1.09) and vW (1.19) showed relatively low bias (Table 1, column d, rows 16 and 18; Fig. 3d). The others (v1-v6, vSTR2, and three covered grid estimators, Table 1, column d; Fig. 3d) showed much higher and similar bias, overestimating the true variance on average by ~2.4-3.1 times. In addition to being nearly unbiased, v8 and vW also had much tighter spreads of variance estimates (boxes and whiskers, Fig. 3d), and so were also more precise.
Both patchiness and aggregation within patches were simulated in populations e and f. For population e (Fig. 1 [...] 1, column e; Fig. 3e), though all overestimated. After vW and v8, covered grid estimators showed the least, though still large, bias (~3.73-3.93, Table 1, column e). For this common ecological case of a population aggregating within habitat patches, v8 and the closely related vW were again clearly superior, showing both lower bias and narrower interquartile ranges (Fig. 3e), though overestimation remained.
For the most highly clustered population (Fig. 1f), with Matérn clustering and restriction to two habitat patches of unequal density, all variance estimators performed poorly, overestimating and showing a wide spread of estimates; vW showed the lowest bias (Table 1, column f, row 18) and a modestly smaller than average interquartile range (Fig. 3f), while v8 performed less well (Fig. 3f). Covered grid estimators were the next least biased (Table 1, column f).
FIG. 4. Twelve spatially autocorrelated populations, generated using the R package gstat, where the degree of autocorrelation was controlled by two parameters of the variogram, psill and range. Boxplots show the spread of 10 000 variance estimates for each of the 13 tested variance estimators, as in Fig. 3.

Variance estimator comparison with a second set of spatial populations
The principal new result obtained from these simulations was the appearance of v8 and vW as more reliable estimators of sampling variance for the systematic survey mean in clustered populations. This was observed for the five spatial point populations (b-f), which express clustering by drawn habitat patch boundaries and Matérn (circular) aggregations. The question naturally arises: how do these results extend to other plausible clustered populations? To test for sensitivity to the spatial point populations generated, we used a different set of algorithms to produce a second set of 12 spatially autocorrelated populations (Fig. 4, populations g-r). To sample these new populations, the same simulated systematic survey design was applied (Fig. 2) using n = 100 transects of 1 × 50 m. The 13 variance estimators were again applied to 10 000 simulated systematic surveys in each population to evaluate estimator performance.
The autocorrelation properties of these new populations were controlled by two parameters of the variogram, psill and range. Following Goslee (2006), we used the gstat R package (Pebesma and Wesseling 1998, Pebesma 2004), which provides routines for kriging and simulation, to allocate organism point locations. The range specifies the separation distance beyond which points are effectively uncorrelated. For all simulated populations of Fig. 4, we set the nugget (i.e., measurement error of repeated sampling at the same location) equal to zero. With a zero nugget, the psill parameter we control equals the sill as normally defined, being the asymptotic value of the semivariance at separation distances greater than the range. Two values of psill (1 and 10) were chosen, and for each of these, six values of the range of autocorrelation: 0.001, 10, 50, 100, 150, and 200 m. The R code used to generate this second set of populations is given in Supplement 3.
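The relationship between nugget, psill, and sill can be written out explicitly. The general definition of the semivariance is standard; the exponential form shown second is only an illustrative assumption, since the text does not state which variogram model was used. Here $c_0$ denotes the nugget, $c$ the psill, and $a$ a range parameter.

```latex
\gamma(h) = \tfrac{1}{2}\,\mathrm{E}\!\left[\bigl(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})\bigr)^{2}\right],
\qquad
\gamma_{\mathrm{exp}}(h) = c_0 + c\left(1 - e^{-h/a}\right), \quad h > 0,
\qquad
\lim_{h \to \infty} \gamma_{\mathrm{exp}}(h) = c_0 + c .
```

With $c_0 = 0$, as set for all populations of Fig. 4, the asymptote (the sill) reduces to $c$, i.e., the psill.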
By testing the two values of psill, we examine two suites of populations where the maximum density of tight clusters differs by an order of magnitude, broadening these results to apply for a wider range of possible natural field populations. Simulating a broad array of autocorrelation distances (range values) tests the robustness of these results from populations that are effectively uncorrelated in space (range = 0.001 m) to those correlated over long distances (range = 200 m) greater than the length of transects (50 m) and the distance between neighboring transects (100 m).
By this method of generating autocorrelated spatial populations, unoccupied areas arose naturally within the study region (see point maps of these 12 populations in Fig. 4). Clustering is visually evident in the point maps as range increases.
The results for this second set of 12 populations (Fig. 4) show qualitative similarity, with some differences, to those observed for the six populations described previously. The textbook estimator (v1) was again consistently the worst performer, overestimating the sampling variance greatly for increasingly autocorrelated populations (Fig. 4 boxplots, populations i-l and o-r).
In Fig. 3, for the four populations restricted to patches of habitat (b, c, e, less strongly for f), the variance estimators can be approximately ranked in terms of bias starting with the worst bias of [...] In Fig. 3, only the population with no habitat patches (d) did not express this trend, yielding similar overestimation bias for estimators v2-v6 and vRCG-vTCG, though v1 was again the worst, and v8 and vW were by far the best.
This same bias trend was evident for a majority of the gstat populations in Fig. 4, strongly (i, l, q) or weakly (j, o, p), but was less evident for others (k, r).
Again v7 showed unacceptably poor performance for all 12 populations of Fig. 4. D'Orazio's extension of v2, vSTR2, as in Fig. 3, showed high overestimation bias similar to v2, overestimating more than other estimators (except v1) for the high-range autocorrelated populations (k, l, q, r).
The performance outcomes for v8 and vW were similar to those of Fig. 3, but differed between the two levels of psill. For psill = 10, v8 and vW were the best performers in the more clustered populations (p-r), showing both low bias and much tighter spreads of variance estimates, much like for populations b-e. However, with psill = 1, for populations j and k, v8 and vW showed large underestimation bias, as previously reported by Wolter (2007) for v8 and by D'Orazio (2003) for vW, despite being much more precise. For populations l and r, with the longest tested autocorrelation range (200 m) under both psill levels, v8 and vW were both much less biased and much more precise.
One general trend was identifi ed by varying the range parameter.At some value of the range, between 10 and 50 m for psill = 1, and below range = 10 for psill = 10, all variance estimators shift from below to above the blue line (Fig. 4 , h compared to i and m to n), that is, from all underestimating to all, except v 8 and v W , overestimating the true sampling variance.
Overall, the trends for populations g-r showed similarities to those of populations a-f, with generally lower or much lower bias of v 8 and v W , always higher precision of v 8 and v W in more highly clustered populations, and tendency of v 8 and v W to underestimate, even in some clustered populations (b, c, i, j, k).Between these two best performers, v W showed greater underestimation bias for some populations (b, c, j, l, q).For all unclustered populations (a, g, h, m), v 8 and v W underestimated the true sampling variance more than other estimators.

True precision of random and systematic survey designs
The square grid systematic design gave variances of the survey mean that were three to five times smaller than the random survey design in populations that clustered by habitat patchiness, aggregating behavior, or both (Table 1, row 5, columns b-f). This implies that a random design would require ~300-500 transects to equal the precision of 100 systematically allocated transects for estimating population density in these clustered populations.
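The step from a variance ratio to an equivalent sample size can be written out briefly. This is a sketch that assumes the variance of the mean under simple random sampling scales as 1/n (finite-population correction ignored); the symbols R, n_sys, and n_equiv are our shorthand here and do not appear elsewhere in the text.

$$
\operatorname{Var}_{\mathrm{ran}}(\bar{y};\,n) = \frac{\sigma^{2}}{n}
\quad\Longrightarrow\quad
n_{\mathrm{equiv}} = n_{\mathrm{sys}} \cdot
\frac{\operatorname{Var}_{\mathrm{ran}}(\bar{y};\,n_{\mathrm{sys}})}{\operatorname{Var}_{\mathrm{sys}}(\bar{y};\,n_{\mathrm{sys}})}
= 100\,R, \qquad R \in [3, 5] \;\Rightarrow\; n_{\mathrm{equiv}} \approx 300\text{-}500 .
$$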
Such large (three- to fivefold) reductions in sampling error variance under a systematic design were not found in the majority of previous studies. One possible reason for this is that this study uses synthetic populations, while many studies used enumerated real-world data. Another reason is that we have sought to replicate levels of clustering more commonly observed in marine sampling, while these studies more often used forest or landscape data. However, in Cochran's (1977:223) table of variance ratios, stratified random variances were sometimes two to five times larger than systematic, and all survey design comparison studies we cited have found that systematic designs are more precise than simple random or stratified random. Cochran (1946, see also Iachan 1982) analytically proved that a systematic design is more precise for linear populations whose spatial autocorrelation declines exponentially with separation distance. Examples of systematic designs bringing higher survey precision in spatially autocorrelated populations were reported in a range of applied ecological contexts (e.g., Dunn and Harrison 1993, Ambrosio et al. 2004, Aune-Lundberg and Strand 2014), in the application of Euclidean distance analysis to animal habitat selection (Benson 2013), and in stereology (Gundersen et al. 1999).
Restricting organisms that are otherwise randomly distributed to habitat patches covering 22% of the survey study area was alone sufficient to yield a high (~300-400%) level of precision improvement with systematic sampling (Table 1, rows 5 and 6, columns b, c). Previous studies comparing random with systematic designs (or comparing the variance of systematic variance estimators) have not investigated the effect of demarcated patchiness in species habitat. Since restriction to favorable habitats is typical of most natural populations, this source of imprecision must be encountered by many environmental and ecological surveys.
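A minimal sketch of why habitat restriction alone favors the grid design follows. It is not the simulation used in this study: the study region is simplified to a 100 × 100 lattice of cells, sample units are single cells rather than transects, and the two circular patches (centers, radii, and an in-patch density of 5 per cell) are illustrative assumptions chosen only so that habitat covers roughly 22% of the region. Because every aligned grid start is equally likely, the systematic design variance can be enumerated exactly over the 100 possible starts, and the simple random sampling variance follows from the standard finite-population formula.

```python
import numpy as np


def patchy_population(rng, cells=100, density=5.0):
    """Poisson counts per cell; organisms occur only inside two circular patches.

    Patch centres and radii are illustrative assumptions (not from the study),
    chosen so that the patches cover roughly 22% of the lattice.
    """
    rows, cols = np.mgrid[0:cells, 0:cells]
    in_patch = ((rows - 30) ** 2 + (cols - 30) ** 2 <= 20.0 ** 2) | (
        (rows - 68) ** 2 + (cols - 70) ** 2 <= 17.3 ** 2
    )
    counts = rng.poisson(density * in_patch)  # zero intensity outside patches
    return counts, in_patch


def systematic_design_variance(counts, step=10):
    """Exact variance of the sample mean over all step*step aligned grid starts."""
    means = [
        counts[i::step, j::step].mean()
        for i in range(step)
        for j in range(step)
    ]
    # All starts are equally likely and partition the lattice, so the
    # population variance of these means is the design variance.
    return np.var(means)


def random_design_variance(counts, n=100):
    """Exact variance of the mean under simple random sampling without replacement."""
    N = counts.size
    return (1 - n / N) * counts.var(ddof=1) / n


rng = np.random.default_rng(1)
counts, in_patch = patchy_population(rng)
var_sys = systematic_design_variance(counts)
var_ran = random_design_variance(counts)
print(f"habitat cover: {in_patch.mean():.2f}")
print(f"variance ratio (random / systematic): {var_ran / var_sys:.1f}")
```

The size of the ratio in this toy lattice depends on patch shape, patch size relative to the grid spacing, and in-patch density, so it should not be read as reproducing the values in Table 1; only the direction of the effect is the point.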
In clustered populations, meaningful improvements in survey efficiency permit either lower-cost surveys or more precise information from a given sample size, affording large increases in statistical power. As the cost constraints on field studies and environmental monitoring increase, meaningful improvements in efficiency favor the choice of a systematic design.
In addition to (1) higher survey precision in clustered populations, two further natural advantages favor a systematic design: (2) systematic designs are more practical to implement. Positioning sample locations on a grid makes systematic designs easier to plan and carry out. Field studies often use systematic designs for this reason. (3) A systematic grid of measurements is superior to random sample locations for use as data input to geostatistical (i.e., contour) mapping.
Spatial mapping inference, modeling the distribution of organisms across the study region, for example by ordinary kriging (Marchant and Lark 2007) or spatial modeling using maximum likelihood (Diggle and Ribeiro 2007), generally gives more reliable estimates when survey sample locations are allocated uniformly across the study region, ideally augmented by additional samples at shorter distance scales (Diggle and Ribeiro 2007).
Random sample locations provide an uneven coverage and so a less consistent source of spatial information, and random sampling is not often chosen for mapping applications. In application to abalone fishery management, from surveys using semi-systematic (Byth and Ripley 1980) density measurements, maps of spatial distribution were validated in a fish-down experiment (McGarvey et al. 2008). Contour maps generated from semi-systematic surveys before and after fishing were faithful predictors of where higher abalone densities were confirmed by commercial fishing, and where population density changed most by fishing removals. Geostatistical mapping methods, including kriging (Marchant and Lark 2007, Li and Heap 2008) and spatial statistical modeling (Diggle and Ribeiro 2007), also benefit from information about autocorrelation over shorter separation distances than the distance between systematic samples (Diggle and Lophaven 2006). Subdividing each transect into quadrats and recording organism counts in each quadrat along (within) transects permits shorter (1-100 m) scales of spatial autocorrelation to be measured directly (McGarvey et al. 2010). Other modifications of systematic designs, such as combinations of systematic, stratified, and random (Quenouille 1949), have been proposed.
One practical advantage favors a random design. Harbitz and Pennington (2004) showed that the shortest path among random sample locations traces, on average, a shorter sampling pathway than along a systematic grid. This outcome reflects one objective of systematic sampling, to maintain a wider mean distance between neighboring sample locations, minimizing the effect of spatial autocorrelation, and covering the survey region more uniformly. In practice, identifying the shortest path among random sample locations can be challenging, particularly for larger sample sizes, there being no known efficient general solution to the traveling salesman problem. Moreover, with varied topography and surveys run monthly or yearly over large areas, a shorter path may matter less to researchers implementing surveys in the field than the other practical advantages of a systematic design.
Systematic designs probably achieve the greatest precision advantage in pelagic sampling applications. In terrestrial studies, the availability of remote sensing and direct observation means that differences in habitat type can be readily discerned and mapped, permitting reliable stratification of the study region, often greatly improving survey precision. In pelagic sampling using a net, with either vertical or horizontal tows, very high levels of sampling variance are common (Pennington 1983, 1996). Remote sensing can only provide presurvey knowledge of the surface layer, and the pelagic environment varies over time. High levels of clustering are common in pelagic environments, due to both high patchiness and dense aggregations spread over three dimensions, resulting in differences among sample counts of sometimes two to three orders of magnitude (many samples with zero counts, and a few samples with very high counts of hundreds or thousands of eggs or larvae, or of adults when fish are schooled). Aggregations of copepods over coral reefs were measured in swarm densities of hundreds of thousands per cubic meter (Hamner and Carleton 1979).
In choosing between a random or systematic allocation of sample locations in environmental monitoring or ecological field work, researchers face a trade-off. In autocorrelated (clustered) populations (identified, in practice, by meaningful numbers of low or zero transect counts and a few big counts), a systematic survey design will nearly always be more precise (Cochran 1946, 1977, D'Orazio 2003). In more highly autocorrelated populations, such as the five clustered populations (b-f) of Fig. 1, the improvement in survey estimate precision with systematic sampling is large. On the other hand, a random design brings more reliably estimated, unbiased confidence intervals using the textbook estimator, v 1, by meeting the assumption of independence. The trade-off is better (more precise or efficient) estimates with a systematic design in clustered populations, but how (much more) precise remains uncertain.

Testing variance estimators for systematic surveys
To address this uncertainty, following Wolter (1984, 2007), we tested a suite of variance estimation formulas applicable for use with systematic sampling. Some recent studies of systematic variance estimators have limited comparison of the textbook variance estimator to one of several forms of post-stratification estimator (Dunn and Harrison 1993, Aune-Lundberg and Strand 2014), represented in this study by v 2 and v 3. We examined a wider range of variance estimators than previous studies, evaluating Wolter's eight variance estimators and five variance estimators designed for use in spatially resolved (not linear) populations: two proposed by D'Orazio (2003), and three covered grid balanced difference estimators proposed here (Appendix).
We did not assess model-based approaches, which show considerable promise (Simmonds and Fryer 1996, Aubry and Debouzie 2000, 2001, Diggle and Ribeiro 2007, Cressie and Wikle 2011, Fewster 2011, Opsomer et al. 2012) but are more difficult to code and implement than the simpler design-based systematic survey variance estimators tested here. Model-based methods depend on a correct choice of model assumptions for the true spatial distribution of organisms inside the survey study region, including its autocorrelation structure (Särndal et al. 1978, Diggle and Ribeiro 2007). This can work very well for studies carried out on restricted sites, but semi-variograms included in Aune-Lundberg and Strand (2014) demonstrate the challenges of fitting models to real-world data for large and variable regions. Modeling methods continue to advance, and Bayesian methods (Cressie and Wikle 2011) or Gaussian random fields across the spatial domain (Thorson et al. 2015) offer further statistical modeling power for surmounting the high and complex variability common in real spatial data.
The other important class of survey methods we did not examine was adaptive sampling (Thompson and Seber 1996, Seber and Salehi 2013). These can be designed for estimating density in clustered populations by placing higher sampling intensity where the species of interest occurs with higher probability as inferred from the survey itself. Generally these must be tailored to each application, and the aim is to find adaptive designs that are unbiased. Asking researchers to implement decisions in the field in real time about where and whether to sample can be challenging, with overestimation a potential risk (McGarvey 2006:93). An alternative method for targeting the higher-density sub-areas with higher sampling intensity, for which unbiasedness can be assured by classical statistics, is to (pre-)stratify the study region. Strata can be drawn in subsequent iterations of a monitoring survey after a contour map from an initial uniform-grid systematic survey demarcates areas of high and low density. Systematic sampling within well-drawn strata (Cochran 1977:226-227, Aune-Lundberg and Strand 2014) offers two ways to reduce survey sampling variance.
Among the eight that he tested for systematic samples, Wolter (1984, 2007:332) recommended the two post-stratification variance estimators (v 2 and v 3). Fewster (2011) noted that these have become the most commonly applied (e.g., Millar and Olsen [1995], one by Simmonds and Fryer [1996], Kingsley [2000]) for systematic surveys. However, the simulation results here (Table 1 and Fig. 3) found v 2 and v 3, after v 1, to be the most overestimation biased in clustered populations. Similar overestimation bias by v 2 and v 3 was also evident in the 12 populations (i-l, o-r) of Fig. 4, though for these, the differences between v 2 and v 3 and other estimators (besides v 8 and v W) were less pronounced. D'Orazio's (2003) v STR2, an adaptation of Wolter's v 2, was similarly biased. This poor performance by the post-stratification variance estimators could reflect their having been developed and tested by Wolter for linear, rather than spatially, autocorrelated populations. We did not test the Matérn (1947) two-dimensional versions of local spatial difference estimators used in Scandinavian forest surveys (reviewed by Heikkinen 2006, Aune-Lundberg and Strand 2014).
We found Wolter's versions of balanced difference estimators (v 4 -v 6) to be less biased than the post-stratification estimators for populations b, c, e, i, l, q, and less clearly so for f, j, o, and p. Covered grid estimators, which we developed as extensions of Wolter's v 4 -v 6, improved on v 4 -v 6 in spatially clustered populations b-c, e-f, l, p, q, and more weakly so for i, j, k, o. Covered grid estimators (v RCG, v CCG, v TCG) were the second best performing category of variance estimator.
For clustered populations, covered grid (v RCG, v CCG, v TCG) and other estimators (v 1 -v 6, v STR2) were considerably more biased, and much less precise, than the best performers among the 13 we tested here, v 8 and v W. However, for v W in populations b, c, q, and for both v 8 and v W in populations i, j, k, underestimation of sampling variance for some clustered populations was evident (Figs. 3 and 4).
These results were effectively reversed in unclustered populations, where all 13 estimators underestimated the true variance (a, h, m), with v 8 and v W being the most, rather than the least, biased (a, g, h, m). Future work could address the considerable underestimation bias that remains for v 8 and v W in unclustered populations. Wolter (2007:332) reported that v 8 has remarkably good properties for the simulated populations with spatial autocorrelation or a linear trend. D'Orazio (2003:293) also concluded that v W was the appropriate choice for populations with high spatial correlation. But Wolter and D'Orazio observed underestimation bias of v 8 and v W respectively, and overall found in favor of post-stratification estimators, notably for lower sample sizes and lower levels of autocorrelation.

Future work: improving on v 8 and v W
The problem of estimating the variance of the systematic sample mean was first addressed in Scandinavian forestry in the 1920s and 1930s (Heikkinen 2006), and later in North American forestry (Hasel 1938, Osborne 1942, Finney 1948, 1949, Bourdeau 1953). A series of studies by statistical sampling theorists from the 1940s onward, first reviewed by Buckland (1951), extended Cochran's idea of a superpopulation model to investigate statistical properties of estimators in spatial populations where systematic sampling was more precise (Quenouille 1949, Das 1950, Jowett 1952, Williams 1956, Hannan 1962, Iachan 1982, Bellhouse and Sutradhar 1988). However, this theoretical approach has not yet produced a robust, unbiased variance estimator generally usable with systematic data sets (see e.g., Cochran 1977). Wolter (1984, 2007), D'Orazio (2003), and others proposed and investigated a selection of diverse estimators. The unexpectedly strong performance that we observed for v 8 and its close variant v W may suggest a pathway forward toward still more reliable variance estimates under systematic sampling.
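These two least biased and most precise variance estimators for clustered populations, v 8 and v W, use formulas that explicitly correct for the observed level of autocorrelation (if positive) between successive systematic samples. In particular, v 8 is written (Wolter 2007:302):

$$
v_8 =
\begin{cases}
v_1 \left[\, 1 + \dfrac{2}{\ln \rho} + \dfrac{2}{\rho^{-1} - 1} \,\right], & \rho > 0, \\[2ex]
v_1, & \rho \le 0,
\end{cases}
$$

where v 1 is the textbook variance estimator.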
The autocorrelation ρ in v 8 quantifies observed correlation between successive points in the systematic sample grid. In D'Orazio's v W (2003), Moran's index of spatial autocorrelation was substituted for ρ in seeking a two-dimensional generalization of v 8. Future development of better systematic variance estimators in two-dimensional autocorrelated populations should seek to account for the full autocorrelation function vs. separation distance (Quenouille 1949, Matérn 1960, Diggle and Ribeiro 2007) to improve on v 8 and v W. Relatively low bias by v 8 and v W for positively autocorrelated (clustered) populations suggests that the ρ-dependent correction terms can work well. We conjecture that one important improvement to v 8 and v W, which could reduce the underestimation by v 8 and v W in unclustered populations, would be to extend their reach to permit more direct use of ρ estimates that are negative. In Wolter's v 8, the correction terms, $2/\ln\rho + 2/(\rho^{-1} - 1)$, are applied only for positively autocorrelated (clustered) populations, necessary, in particular, since ρ is used as the argument of a log function. In randomly distributed populations, where ρ averages around zero, the chance of a sampled negative or positive value is about equal. The ρ sign asymmetry in Wolter's v 8 formula specifies that no correction is made to the textbook variance estimator for ρ ≤ 0, which plausibly leads, in part, to the observed underestimation bias of v 8 and v W in CSR populations. A systematic variance estimator is sought which does not truncate for negative ρ and which can incorporate not merely a single scalar ρ but the full dependence of autocorrelation on separation distance within the survey study region (Cochran 1946, Quenouille 1949, Diggle and Ribeiro 2007; P. J. Diggle, personal communication).
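The ρ-truncated correction described above can be sketched in code. The function below is an illustration under stated assumptions, not the implementation used in this study: it assumes a one-dimensional ordering of the systematic sample, estimates ρ as the lag-1 sample autocorrelation (an assumed estimator form), multiplies the textbook estimate by 1 + 2/ln(ρ) + 2/(ρ⁻¹ − 1) when ρ > 0, and returns the textbook estimate unchanged otherwise. The names `v8_estimate`, `rho`, and `f` (sampling fraction) are hypothetical.

```python
import math

import numpy as np


def v8_estimate(y, rho=None, f=0.0):
    """Sketch of an autocorrelation-corrected variance of a systematic sample mean.

    y   : sample values in their systematic (grid) order
    rho : optional autocorrelation estimate; if None, the lag-1 sample
          autocorrelation of y is used (an assumption of this sketch)
    f   : sampling fraction for the finite-population correction (assumed 0)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    s2 = y.var(ddof=1)
    if s2 == 0.0:
        return 0.0  # constant sample: no variance to correct
    v1 = (1.0 - f) * s2 / n  # textbook (simple random sampling) estimator

    if rho is None:
        d = y - y.mean()
        rho = float(np.sum(d[1:] * d[:-1]) / ((n - 1) * s2))

    if rho <= 0.0:
        return v1  # truncation: no correction for zero or negative rho
    if rho >= 1.0:
        raise ValueError("rho must be < 1 for the log-based correction")

    correction = 1.0 + 2.0 / math.log(rho) + 2.0 / (1.0 / rho - 1.0)
    return v1 * correction


# Usage: a smoothly trending sample (positive rho) versus an alternating one.
trend = np.linspace(0.0, 9.0, 10)
alternating = np.array([1.0, -1.0] * 5)
print(v8_estimate(trend), v8_estimate(alternating))
```

Passing a Moran's index as `rho` gives a v W-style estimate in the sense described in the text. For ρ = 0.5 the multiplier is about 0.11, which illustrates how strongly the correction shrinks the textbook estimate once positive autocorrelation is detected, and why a sampled ρ near zero in an unclustered population can push the estimate below the true variance.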

CONCLUSIONS
Systematic survey designs are superior to random designs in clustered populations for three important reasons. They provide more precise survey estimates in populations that cluster by habitat patchiness or aggregating behavior, they are easier to implement, and they produce input data suitable for geostatistical mapping. The better precision of systematic designs for estimating population density should extend to any environmental or ecological quantity that depends on population density such as standing stock biomass, CO2 absorbed or released, or prey consumed. To address the overestimation of confidence intervals for systematic surveys in clustered populations, among those evaluated, v 8 and v W were clearly superior for estimating the variance of the survey sample mean, but these were the worst performing in unclustered populations. The statistical search is ongoing for a variance estimator usable with systematic surveys that is unbiased (or meaningfully less biased) in both clustered and random populations.
FIG. 1. Six simulated populations used for testing the performance of random and systematic survey designs. These populations express two forms of clustering: patchy habitat and aggregating behavior, here simulated by a Matérn process. Population a is generated as a complete spatial random (CSR) process. Populations b and c are CSR within two band-shaped patches, where in population c the lower patch has a density ~1/10th that of the upper patch. Population d expresses only Matérn clustering with no patches, while organisms in e and f Matérn cluster within two habitat patches, in population f with patch densities again differing by a factor of 10.
FIG. 2. Diagram of the two transect sampling designs, random and systematic. For both designs, a simulated sample of 100 transects was allocated within the study region to measure absolute population density. Here, the population is Matérn clustered in patches of unequal density (Fig. 1f). The 1000 × 1000 m study region (left) was divided into 100 grid blocks, each 100 × 100 m, to construct the systematic design. Boxes to the right illustrate possible differences in transect placement for a 300 × 300 m central portion of the study region. Under (aligned) systematic sampling, transects (red lines) are uniformly spread over the study area (one transect allocated to the same position in each grid block). Under simple random sampling, each transect is allocated within the study region independently of others, resulting by chance in some grid blocks being sampled multiple times while others are left unsampled.
FIG. 3. Boxplots displaying systematic variance estimator precision and accuracy. Thirteen variance estimators (v 1 -v TCG) for survey mean density were tested using replicated systematic sampling of the six populations of Fig. 1. The first 10 variance estimators include the eight presented by Wolter (2007), v 1 -v 8, plus two extensions of Wolter estimators proposed by D'Orazio (2003), v STR2 and v W. The three covered grid variance estimators (v RCG, v CCG, v TCG; row-covered, column-covered, and twice-covered, respectively) we propose here and describe in the Appendix. Each survey gave a measured mean density from 100 simulation transect counts. Box (25% and 75% quantiles) and whiskers (10% and 90% quantiles; arrows indicate that whiskers extend beyond the displayed y-axis range) display variance estimate precision as the spread of 10 000 simulated estimated variances. The horizontal line in each box shows the median. Each red dot marker shows the mean of 10 000 variance estimates. Y-axis scaling is given by dividing each estimator variance by the (single) true variance of 10 000 replicate survey mean densities. Blue lines show the value (1) at which estimated survey variance agrees with true.

TABLE 1. Test statistics from simulation sampling by 100 transects in the six simulated populations of Fig. 1a-f.