Robust algorithms for simulating spatial cluster processes

We review existing algorithms for generating simulated realisations of a spatial Neyman–Scott cluster point process and propose new algorithms. In real applications, the values of the model parameters passed to the algorithm can be extremely large or extremely small. The existing algorithms fail for such extreme cases, due to excessive computation time and memory demands. Fundamental reasons for this failure are analysed. We propose new algorithms that combine the best features of the existing algorithms and achieve control over the computation time and memory demands, even for extreme values of the model parameters. We analyse performance using theory and simulation experiments. Efficiency improvements of several orders of magnitude are achieved.


Introduction
A cluster process is a spatial random point process, constructed by first generating a random spatial pattern of 'parent' points, and then replacing each parent point by several 'offspring' points, randomly displaced from the parent [1][2][3]; [4,Chap. 12].
Algorithms for generating simulated realisations of a cluster process are important for many applications. The challenge is that the distance from parent to offspring could be arbitrarily large. In order to generate a simulated realisation inside a bounded spatial window W, the algorithm must generate parent points lying outside W.
'Naïve' simulation algorithms involve specifying a permitted maximum distance b from parent to offspring. The original window is expanded by this distance. One may then simply generate the process of parents in the expanded window, generate offspring of these parents, and restrict the offspring to those which lie inside W. Naïve algorithms are not exactly correct in general, and they become costly when the spatial scale of clusters is large.
Brix and Kendall [5] developed elegant simulation algorithms for Cox and Poisson cluster point process models, based on the insight that the parents which have offspring in the window W constitute a Poisson point process, with a nonuniform intensity ρ(x) throughout space, and containing a finite total number of points. Their algorithms generate the parent process directly according to this intensity. These algorithms are exact (i.e. the output has exactly the desired distribution) and are much more efficient than the naïve algorithm when the cluster scale is large.
Unfortunately, existing simulation algorithms break down when the cluster scale, and other model parameters, are extremely large or extremely small. This is important because extreme values of the model parameters are frequently obtained in real applications, when the parameters are estimated from data. In a separate article [6], we explain the fundamental causes and show that these extreme parameter values still correspond to valid point process models.
Consequently, real applications require the simulation algorithm to be 'robust' against extreme values of the model parameters, in the sense that the expected computation time and expected memory usage are uniformly bounded.
In this note, we review existing algorithms, demonstrate that they are not robust, and propose two new 'hybrid' algorithms which are robust. Algorithm performance is analysed theoretically and measured experimentally.
The plan of the paper is as follows. Section 2 defines the Neyman-Scott Cox point process model. Section 3 describes the Naïve and Naïve Conditional simulation algorithms. Section 4 describes the Brix-Kendall algorithms [5]. In Section 5, we propose a 'super-dominating' algorithm that may be more computationally efficient than the Brix-Kendall algorithm. In Section 6, we propose 'hybrid' algorithms, which are computationally efficient, and robust against extreme values of the model parameters. Section 7 reports on a comprehensive simulation study of the performance of all the algorithms mentioned. We end with a Discussion in Section 8. Some calculations are relegated to the Appendix. Open-source code for the new algorithms is available.

Model definition
Neyman and Scott [1] developed a class of point process models which are used frequently in applications, and have been extensively studied [2,7,8]. In the models of interest here, the cluster process is formed by a two-step procedure. In the first step, a point process X of 'parent' points is generated. In the second step, each 'parent' point x_i ∈ X gives rise to a random pattern of 'offspring' points y_ij. The point pattern consisting of all the offspring points y_ij, regardless of their parentage, forms the realisation of the cluster point process Y.
In order to make progress, we confine attention to the models defined below.

Definition 2.1:
A stationary and isotropic Neyman-Scott Cox process, with cluster scale parameter s, is a point process Y in ℝ² formed as follows:

(1) The process X of 'parent' points is a homogeneous Poisson point process with intensity κ > 0;
(2) Given X, each point x_i in X gives rise to a finite set Φ_i = {y_ij : j = 1, . . ., N_i} of 'offspring' points y_ij such that
(a) the number n(Φ_i) = N_i of offspring of x_i has a Poisson(μ) distribution;
(b) for each parent x_i, the locations of its offspring y_i1, y_i2, . . . are independent and identically distributed with probability density h_s(u − x_i), where h_s(u) = s^{−2} h(u/s), s > 0 is the cluster scale parameter, and h is a probability density on ℝ² called the template kernel, assumed to be isotropic, h(u) = h(‖u‖);
(c) the offspring processes Φ_1, Φ_2, . . . of different parents are independent point processes;
(3) The cluster process Y consists of all offspring points, that is, the superposition of the clusters Φ_i.

Note that the parent points are excluded from the final cluster process and that some 'parents' may have no offspring. For a review see [9, Chap. 7], [4, Chap. 12], [10, Chap. 6]. Neyman and Scott [1] considered much wider classes of models, including multi-stage constructions; however, the term 'Neyman-Scott cluster process' is usually interpreted as the two-stage model specified in Definition 2.1, with the exception of Assumption 2(a), so that the number of offspring per parent may follow any discrete distribution. We proposed the name 'Neyman-Scott Cox process' in [4, Chap. 12] for the case where the number of offspring has a Poisson distribution.
If Definition 2.1 holds, Assumption 2(a) implies that, given X, each cluster Φ_i is a finite Poisson process with intensity μ h_s(u − x_i), and since the clusters are independent, their superposition Y is a Poisson point process with intensity (given X)

Λ(u) = μ Σ_{x_i ∈ X} h_s(u − x_i).   (3)

That is, Y is a Cox process, with driving random intensity (3). This representation is important in the analysis.
The parameters of the model are θ = (κ, μ, s) with parameter space Θ = (0, ∞)³. In applications, h might depend on additional 'shape' parameters, but these are implicitly assumed to be fixed.
The intensity of the process is λ = κμ. The expected 'sample size' (total number of offspring points in W) is λ|W|, where |W| denotes the area of W.
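To make the two-step construction concrete, here is a minimal R sketch of Definition 2.1 for the Gaussian (Thomas) template kernel. It generates parents only inside the unit square, deliberately ignoring the edge-effect problem (parents lying outside the window) that the rest of the paper addresses; the parameter values are illustrative.

```r
# Minimal sketch of Definition 2.1 for a Thomas process (Gaussian template
# kernel) in the unit square, ignoring parents outside the window.
set.seed(42)
kappa <- 10; mu <- 10; s <- 0.1             # theta = (kappa, mu, s), illustrative
np <- rpois(1, kappa)                       # step 1: Poisson(kappa |W|) parents, |W| = 1
px <- runif(np); py <- runif(np)
ni <- rpois(np, mu)                         # step 2(a): Poisson(mu) offspring counts
ox <- rep(px, ni) + rnorm(sum(ni), sd = s)  # step 2(b): i.i.d. displacements from h_s
oy <- rep(py, ni) + rnorm(sum(ni), sd = s)
Y <- cbind(x = ox, y = oy)                  # step 3: superposition of all clusters
```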

Model examples
We shall consider three common models in this class, representing progressively more difficult challenges.

In the Matérn cluster process, the template kernel is the uniform probability density on the unit disc, so that offspring are uniformly distributed in a disc of radius R = s centred on the parent. In the (modified) Thomas process, the template kernel is the standard bivariate Gaussian density, so that offspring are displaced from the parent by isotropic Gaussian displacements with standard deviation σ = s in each coordinate.

In the Cauchy cluster process [15,16], the template kernel is the standard Cauchy density in two dimensions, h(x) = (1/2π)(1 + ‖x‖²)^{−3/2}. This density has unbounded support and infinite variance for any s > 0.
Figure 1 shows a simulated realisation of the Thomas process in the unit square, together with the parent points and the links from each parent to its offspring.

Extreme values of the parameters
In applications, it is often required to generate simulations from the Neyman-Scott Cox model with parameters obtained by fitting the model to data. In such cases the model parameters κ, μ, s may sometimes be extremely large or extremely small. The reasons have been studied in our companion paper [6]. An example is the case where κ is very large, μ is very small, and λ = κμ is a reasonable value that is neither small nor large in the context. Despite the extreme values of the parameters, the model is well-defined, and is approximately a Poisson process with intensity λ. Such parameter estimates can occur because standard model-fitting techniques [17] first estimate the intensity λ from the data as λ̂ = n/|W|, where n is the observed number of points. The parameters κ and s are then estimated by a separate procedure. If the data do not contain strong evidence of clustering, then it is likely that the estimate of κ will be very large and the estimate of μ will be very small. We have established that this can occur even if the data contain evidence of clustering.
Extreme values of the parameters may cause failure of the existing simulation algorithms. Failures include termination by the computer system due to excessive memory demands, excessive computation time, or non-response to system interrupts; or error exit due to detection of numerical problems (failure to converge in numerical integration; failure to find solutions in numerical root-finding; numerical error due to rounding, etc.).

Algorithm performance and robustness
The most important practical measures of algorithm performance are computation time, memory resources and probability of failure. These were recorded in our simulation experiments. However, these performance measures are difficult to analyse theoretically, because they depend on details of the implementation. Instead, for our theoretical analysis, we shall measure an algorithm's performance using the expected total number of random proposal points generated during the algorithm (whether or not they are retained or deleted subsequently). This is much easier to analyse and is indirectly related to both memory usage and computation time.
We shall say that an algorithm is (theoretically) robust if, for a given expected sample size (i.e. given expected number of points in the simulated point pattern), the expected number of proposal points is uniformly bounded as a function of the model parameters. Our goal is to find robust algorithms for simulating the Neyman-Scott Cox processes described above. We will then confirm by experiment that the algorithms are also robust with regard to computation time and absence of failures.

Naïve simulation algorithm
Many existing algorithms for simulating Neyman-Scott models use a simple strategy in which the window is expanded by a border region of finite width b, and parents are generated with uniform intensity κ in the expanded window W_⊕b, the dilation of W by distance b. For the Matérn cluster process, the naïve algorithm is exact (i.e. outputs of the algorithm have exactly the desired distribution) when the expansion distance b is taken to be greater than or equal to the disc radius R defining the process. For cluster kernels which do not have compact support, such as the Gaussian and Cauchy kernels, the naïve algorithm is approximate, because it omits parents lying at distances greater than b away from W. Common practice is to let b equal the 1−p quantile of the distribution of distance from a parent to its offspring, where p is a specified small probability. Scaling properties imply that b = c_p s for a constant c_p depending only on p. The error involved in this truncation is calculated in the Appendix.
For the naïve algorithm, the expected number of parents generated in step 1 is κA_b, and the expected number of proposed offspring in step 2 (before deletion of points outside W) is μκA_b = λA_b, where A_b = |W_⊕b| is the area of the expanded window. The expected total number of proposal points is T = (κ + λ)A_b. This is not uniformly bounded in (κ, μ, s) for fixed λ = κμ, because κ can be made arbitrarily large. That is, the naïve algorithm is not robust against extreme values of κ.
The algorithm can be improved by generating the parent points conditionally on having at least one offspring point. That is, parents are generated with intensity κ† = κ(1 − e^{−μ}), and for each parent, the number of offspring points is a zero-truncated Poisson variable (a Poisson random variable N with mean μ, conditioned on N > 0). For fixed λ = κμ we have κ† = κ(1 − e^{−λ/κ}), and a little calculus shows that κ† is an increasing function of κ satisfying κ† ≤ min(κ, λ) and κ† → λ as κ → ∞. This ensures that the expected number of parents to be generated is bounded by a multiple of the expected sample size. The conditional naïve algorithm should be more computationally efficient than the naïve algorithm when μ is small. The expected total number of proposals is now T = (κ† + λ)A_b ≤ 2λA_b. The conditional naïve algorithm is robust for fixed cluster scale s.
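As a concrete illustration, here is a minimal R sketch of the conditional device: parents at intensity κ† and zero-truncated Poisson offspring counts, the latter sampled by inverting the Poisson distribution function above its atom at zero. The rectangular window expansion and the parameter values are illustrative simplifications.

```r
# Zero-truncated Poisson(mu) sampler: restrict the uniform variate to lie
# above the CDF value at 0, which is P{N = 0} = exp(-mu) = dpois(0, mu).
rztpois <- function(n, mu) qpois(runif(n, dpois(0, mu), 1), mu)

# Conditional naive parent generation for the unit square expanded by b
# (rectangular expansion used here for simplicity).
kappa <- 100; mu <- 0.1; b <- 0.3            # small mu: conditioning helps most
kdag <- kappa * (1 - exp(-mu))               # intensity of parents with offspring
Ab <- (1 + 2*b)^2                            # area of the expanded square
np <- rpois(1, kdag * Ab)
px <- runif(np, -b, 1 + b); py <- runif(np, -b, 1 + b)
ni <- rztpois(np, mu)                        # every retained parent has >= 1 offspring
```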
If W is convex, the area A_b = |W_⊕b| is a quadratic function of b, according to the Steiner formula [18]. Since b = c_p s, the expected total number of proposals T is also a quadratic function of the cluster scale s. Consequently, the naïve and conditional naïve algorithms are not robust against large cluster scales.
Figure 2 shows the results of a simulation experiment which measured the average computation time and number of proposals involved in generating one realisation of the Thomas process with κ = 10 and μ = 10, as a function of the scale parameter σ, using the Naïve and Naïve Conditional algorithms implemented in the R package spatstat [4,19]. Technical details are given at the end of the article.
As expected, for large σ the computation time and number of proposals increase quadratically with the cluster scale. For very large σ, the algorithm sometimes terminated with an error message related to memory limits, or became unresponsive.
In the case shown (with μ = 10), the Conditional Naïve algorithm offers a negligible improvement in computational efficiency over the Naïve algorithm. Other experiments confirm that there is a great improvement in efficiency for small μ.

Brix-Kendall algorithms
Brix and Kendall [5] developed simulation algorithms for cluster process models, based on the insight that the point process of parents which have offspring in the window W is a spatial Poisson point process, with nonuniform intensity ρ(x) throughout space. Their algorithms generate the parent process directly according to this intensity. The algorithms are exact (i.e. the distribution of the output is exactly the desired point process distribution), and more efficient than the naïve algorithm when the cluster radius is large, but require more extensive computation.

Direct algorithm
Consider a stationary Neyman-Scott Cox model with template kernel h(x). The point process X of parents is a stationary Poisson process with intensity κ > 0. For a parent at location x, the cluster of offspring of x consists of a Poisson(μ) number of random points, independently and identically distributed with probability density k(u | x) = h_s(u − x), where h_s(x) = s^{−2} h(x/s) is the rescaled kernel and s > 0 is the cluster scale parameter. For a parent at x, let

P(W | x) = ∫_W k(u | x) du

be the probability that an offspring of a parent at location x will lie inside W. The number of offspring of x falling in W is a Poisson random variable with mean μP(W | x). The probability that a parent at x has any offspring in W is therefore 1 − exp(−μP(W | x)). Let X_[W] be the sub-process of X consisting of parents which have any offspring in W. By a thinning argument [20][21][22], X_[W] is a Poisson process with intensity

ρ(x) = κ[1 − exp(−μP(W | x))].   (5)

Thus, the following algorithm generates exact realisations of the Neyman-Scott process, by generating only X_[W]. Brix and Kendall [5] called this the 'Method in Simplest Form'.
(1) Generate a Poisson process of 'parents' in ℝ² with intensity ρ(x).
(2) For each parent point x_i,
(a) Generate a random integer N_i from the zero-truncated Poisson distribution with mean parameter μP(W | x_i);
(b) Generate N_i offspring points y_i1, . . ., y_iN_i, independently and identically distributed with probability density f_i(y) = k(y | x_i)/P(W | x_i) for y ∈ W, and zero otherwise.
(3) Collect all offspring points y_ij. Delete those which fall outside W. Return the retained offspring points as the realisation of the cluster process Y inside W.
The direct algorithm is a mathematical algorithm rather than a detailed procedure, because it does not specify exactly how step (1) should be performed.

Dominating algorithm
It may be difficult to perform step (1) in Algorithm 4.1, because the intensity function ρ(x) may be complicated because of the geometry of W. To avoid this, Brix and Kendall [5] propose a thinning procedure.

Algorithm definition
First, the window W is enclosed in a larger set D ⊇ W with simpler geometry. Next, we choose a 'dominating kernel' k̄(u | x), which is any function such that, for u ∈ D, we have k̄(u | x) ≥ k(u | x) for all x. Note that k̄(· | x) is not a probability density; implicitly, k̄(u | x) should be designed to be more analytically tractable than k(u | x). Correspondingly,

P̄(D | x) = ∫_D k̄(u | x) du

(which is not a probability) satisfies P̄(D | x) ≥ P(W | x) for all x, and

ρ̄(x) = κ[1 − exp(−μP̄(D | x))]

satisfies ρ̄(x) ≥ ρ(x) for all x. Brix and Kendall [5] called the following algorithm the 'Modified Method'.

(1) Enclose W in a set D ⊃ W which has simpler or more convenient geometry.
(2) Generate a Poisson process X̄ in ℝ² of parents with intensity ρ̄(x).
(3) For each parent point x_i in X̄,
(a) Generate a random integer N_i according to the zero-truncated Poisson distribution with mean parameter μP̄(D | x_i);
(b) Generate N_i proposed offspring points y_i1, . . ., y_iN_i, which are independently and identically distributed, with probability density f_i(y) = k̄(y | x_i)/P̄(D | x_i) for y ∈ D, and zero otherwise;
(c) Randomly delete or retain each proposed offspring y_ij of x_i with retention probability k(y_ij | x_i)/k̄(y_ij | x_i), independently of other offspring points and other parents.
(4) Collect all offspring points y_ij that are retained in step (3c). Delete those which fall outside W. Return the retained offspring points as the realisation of the cluster process Y inside W.

Validity
To see that Algorithm 4.2 is valid, we observe that the combined effect of step (2) and steps (3a–b) is equivalent to generating a Poisson process of parents with intensity κ, and for each parent point x_i, generating a finite Poisson process of 'proposed offspring' y_ij with intensity μk̄(y | x_i), then deleting parents which have no proposed offspring. After independent thinning in step (3c), the offspring of each parent constitute a finite Poisson process with intensity μk(y | x_i). Consequently, given all parents, the retained offspring constitute a Poisson process with intensity (3). Thus the output of the algorithm has the desired distribution, exactly.
In order that the algorithm terminates in finite time, the expected number of parents E[n(X̄)] and the expected number of proposed offspring E[n(Ȳ)] should be finite. We have E[n(X̄)] = ∫_{ℝ²} ρ̄(x) dx, so ρ̄(x) is required to be integrable; and, since a parent at x produces on average μP̄(D | x)/(1 − exp(−μP̄(D | x))) proposed offspring, E[n(Ȳ)] = κμ ∫_{ℝ²} P̄(D | x) dx, so P̄(D | x) is required to be integrable. The expected total number of proposals is T = E[n(X̄)] + E[n(Ȳ)].

Implementation of Brix-Kendall dominating algorithm
Implementation of the Brix-Kendall dominating Algorithm 4.2 requires the artful choice of a dominating kernel k̄, making it possible to evaluate the terms P̄(D | x) and ρ̄(x) at any point x, and to perform any calculations required for the random generation steps (2) and (3b).

Standard recipe
One recipe for choosing k̄ was proposed in [5, Sec. 3.1], and it is this recipe which is usually known as 'the' Brix-Kendall algorithm. The dominating kernel is chosen to be

k̄(u | x) = sup_{v ∈ D} k(v | x),  u ∈ D,   (9)

which trivially satisfies the requirement that k̄(u | x) ≥ k(u | x) for u ∈ D. Assume the kernel h is isotropic, h(x) = h_1(‖x‖) say, continuous at 0, and monotone decreasing with radius: r < t implies h_1(r) ≥ h_1(t). These assumptions are true for all the example models considered here.
Then it will be advantageous to choose D to be a disc, of radius r_D. Take the coordinate origin to be the centre of the disc. Then, for any x, the supremum in (9) is attained at the point of D closest to x, which gives

P̄(D | x) = πr_D² h_s((‖x‖ − r_D)_+)   (10)

and

ρ̄(x) = κ[1 − exp(−μπr_D² h_s((‖x‖ − r_D)_+))].   (11)

Figure 3 sketches these definitions. We verify below that ρ̄(x) and P̄(D | x) are integrable. Brix and Kendall [5] noted that ρ̄(x) is radially symmetric, ρ̄(x) = ρ̄_1(‖x‖) say, so that step (2) of the dominating algorithm (Algorithm 4.2) can be performed by generating a one-dimensional Poisson process of radial distances r_i with one-dimensional intensity ζ(r) = 2πr ρ̄_1(r), and assigning x_i = r_i(cos θ_i, sin θ_i), where θ_i is uniformly distributed on [0, 2π]. See [23]. Define the radial cumulative integral

M(r) = ∫_0^r ζ(t) dt,   (12)

and note that M(∞) < ∞. The radii r_i can be generated as follows. (1) Compute M(∞). (2) Generate a Poisson random variable N with mean M(∞). (3) Generate N independent random variables U_1, . . ., U_N uniformly distributed on [0, M(∞)]. (4) Compute the radii r_i = M^{−1}(U_i). The resulting points x_i are the points of a Poisson process with intensity ρ̄(x), as required for step (2) of the dominating algorithm (Algorithm 4.2).
For a further efficiency, we note that ρ̄(x) is constant for all x ∈ D, so that M(r) = Br² for r ≤ r_D, where B = πκ[1 − exp(−μπr_D² h_s(0))].
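To illustrate, here is a minimal R sketch of the radial inversion method for step (2), using the Matérn kernel (Section 4.4) as a concrete case in which, by the closed forms derived below, ρ̄ is constant on the disc of radius r_D + s and zero beyond it, so that M is quadratic and M^{−1} is a square root. The helper name and parameter values are ours.

```r
# Generate a Poisson process with radially symmetric intensity, by inversion:
# N ~ Poisson(M(infinity)), U_i ~ Unif(0, M(infinity)), r_i = Minv(U_i).
rpoispp_radial <- function(Mtot, Minv) {
  N <- rpois(1, Mtot)
  r <- vapply(runif(N, 0, Mtot), Minv, numeric(1))
  theta <- runif(N, 0, 2 * pi)
  cbind(x = r * cos(theta), y = r * sin(theta))
}

# Concrete case: Matern kernel, where rho-bar(r) = c0 on [0, rD + s], 0 beyond,
# so M(r) = pi * r^2 * c0 there and Minv is a square root.
kappa <- 10; mu <- 10; s <- 0.2; rD <- 1/sqrt(2)
c0 <- kappa * (1 - exp(-mu * rD^2 / s^2))
Mtot <- pi * (rD + s)^2 * c0
Minv <- function(m) sqrt(m / (pi * c0))
parents <- rpoispp_radial(Mtot, Minv)
```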

Validity and theoretical performance
We now check that this recipe satisfies the integrability conditions. These calculations are also important for studying performance. First, note that, using the inequality 1 − e^{−x} ≤ x,

∫_{ℝ²} ρ̄(x) dx ≤ κμ ∫_{ℝ²} P̄(D | x) dx = λπr_D² ∫_{ℝ²} h_s((‖x‖ − r_D)_+) dx.   (13)

It suffices to compute the last integral. Inside D the integrand is constant, contributing πr_D² h_s(0) = πr_D² h(0)/s². The remaining integral is

∫_{‖x‖ > r_D} h_s(‖x‖ − r_D) dx = 2π ∫_0^∞ (t + r_D) h_s(t) dt = 1 + (r_D/s) E‖V‖^{−1},   (14)

where V denotes the random vector from a parent to its offspring when the cluster scale is set equal to 1. Since h(x) is assumed to be continuous at x = 0, we have h(0) < ∞, and E‖V‖^{−1} = 2π ∫_0^∞ h(r) dr < ∞ because h is bounded and 2πr h(r) is integrable. Hence the integrability conditions are satisfied.
The expected total number of proposals is

T = E[n(X̄)] + E[n(Ȳ)],   (15)

where E[n(X̄)] = M(∞) = ∫ ρ̄(x) dx and

E[n(Ȳ)] = κμ ∫_{ℝ²} P̄(D | x) dx = λπr_D² [πr_D² h(0)/s² + 1 + (r_D/s) E‖V‖^{−1}].   (16)

This indicates that the Brix-Kendall dominating Algorithm 4.2 is robust against large cluster scales (s → ∞) but is not robust against small scales (s → 0). Indeed, for a parent x inside the disc D, the dominating kernel (9) is uniform on D with value h_s(0) = s^{−2}h(0), which is unbounded as s → 0.
A related observation is that the Brix-Kendall dominating Algorithm 4.2 cannot be applied if the kernel is infinite at the origin, h(0) = ∞, for in that case the dominating kernel would be infinite everywhere inside D. An example of such a kernel is the 'variance-gamma' or Bessel model [16, p. 126] when the shape exponent parameter ν is negative.

Details for example models
Here we find explicit expressions for the dominating intensity functions for the Thomas, Matérn and Cauchy cluster process models. These are important for the study of performance and are useful for efficient software implementation.

Details for Matérn model
Here the template kernel is the uniform density on the unit disc, h(x) = 1/π for ‖x‖ ≤ 1 and zero otherwise, so that h_s((‖x‖ − r_D)_+) = 1/(πs²) if ‖x‖ ≤ r_D + s, and zero otherwise. The dominating intensity (11) is therefore constant, ρ̄(x) = κ[1 − exp(−μr_D²/s²)], on the disc of radius r_D + s, and zero outside it. Consequently the cumulative radial integral (12) and its inverse are known analytically.

Details for Thomas model
Here the template kernel is the standard Gaussian density h(x) = (1/2π) exp(−‖x‖²/2), so that h_σ(x) = (1/(2πσ²)) exp(−‖x‖²/(2σ²)) with σ = s, and the intensity of dominating parents (11) is

ρ̄(x) = κ[1 − exp(−μ (r_D²/(2σ²)) exp(−(‖x‖ − r_D)_+²/(2σ²)))].

Calculation of the cumulative radial integral (12) requires numerical integration, except on [0, r_D] as noted above.

Details for Cauchy model
Here the template kernel is h(x) = (1/2π)(1 + ‖x‖²)^{−3/2}, so that h_s(x) = (1/(2πs²))(1 + ‖x‖²/s²)^{−3/2}, and (10) becomes

P̄(D | x) = (r_D²/(2s²))(1 + (‖x‖ − r_D)_+²/s²)^{−3/2},

and the dominating intensity (11) is

ρ̄(x) = κ[1 − exp(−μ(r_D²/(2s²))(1 + (‖x‖ − r_D)_+²/s²)^{−3/2})].

Calculation of the cumulative radial integral (12) requires numerical integration, except on [0, r_D].
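As an illustration of the numerical integration involved, the following R sketch evaluates M(r) for the Cauchy model with adaptive quadrature; this is our sketch of (12), not the spatstat internals, and the parameter values are illustrative.

```r
# Radial cumulative integral M(r) for the Cauchy dominating intensity,
# evaluated by numerical quadrature (analytic only on [0, rD]).
kappa <- 10; mu <- 10; s <- 0.25; rD <- 1/sqrt(2)
hs <- function(t) (1 / (2 * pi * s^2)) * (1 + (t/s)^2)^(-3/2)  # radial profile of h_s
rho_bar <- function(r) kappa * (1 - exp(-mu * pi * rD^2 * hs(pmax(r - rD, 0))))
M <- function(r) integrate(function(t) 2 * pi * t * rho_bar(t), 0, r)$value
B <- pi * kappa * (1 - exp(-mu * pi * rD^2 * hs(0)))           # M(r) = B r^2 on [0, rD]
c(M(rD), B * rD^2)   # the two evaluations should agree
```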

Performance in experiments
Figure 4 compares the empirical performance (mean computation time and number of proposals) of the dominating Brix-Kendall Algorithm 4.2 with that of the naïve Algorithm 3.1 for the Thomas process with κ = 10 and μ = 10, as a function of cluster scale σ. As predicted, the dominating algorithm is inefficient for small scales, but efficient for large scales.
Roughly speaking, if the cluster scale σ is smaller than about 3 times the diameter of the disc enclosing the original window, then the naïve algorithm is more efficient; otherwise the Brix-Kendall dominating algorithm is more efficient. However, we recall that the naïve algorithm is not exact.

Super-dominating algorithm
Based on our theoretical and experimental analysis of the existing algorithms, we shall now propose some new algorithms.

Motivation
The dominating intensity (11) may be difficult to handle numerically, as shown in the examples above. One possible strategy is to use a simpler intensity which dominates ρ̄(x). One choice is

ρ*(x) = κμP̄(D | x),   (17)

which satisfies ρ*(x) ≥ ρ̄(x) by the inequality 1 − e^{−x} ≤ x. We call this the 'super-dominating' intensity. Generating parent points with intensity ρ*(x) may be computationally simpler, but will also generate more points, so there is a tradeoff. Efficiency depends on analytical tractability of the radial cumulative integral

M*(r) = 2π ∫_0^r t ρ*_1(t) dt,   (18)

where ρ*(x) = ρ*_1(‖x‖), and of its inverse function (M*)^{−1}.
(1) Enclose W in a disc D of radius r_D, centred at the origin. (2) Compute M*(∞). (3) Generate a Poisson random variable N with mean M*(∞). (4) Generate N independent random variables U_1, . . ., U_N uniformly distributed on [0, M*(∞)]. (5) Compute radii r_i = (M*)^{−1}(U_i). (6) Randomly accept or delete each proposed radius r_i, independently of other radii, with acceptance probability ρ̄(r_i)/ρ*(r_i). (7) For each accepted radius r_i, (a) Generate a parent location x_i = r_i(cos θ_i, sin θ_i), where θ_i is uniformly distributed on [0, 2π); (b) Generate a random integer N_i according to the zero-truncated Poisson distribution with mean parameter μP̄(D | x_i); (c) Generate N_i proposed offspring points y_i1, . . ., y_iN_i, independently and identically distributed with probability density f_i(y) = k̄(y | x_i)/P̄(D | x_i) for y ∈ D, and zero otherwise; (d) Randomly delete or retain each proposed offspring y_ij of x_i with retention probability k(y_ij | x_i)/k̄(y_ij | x_i), independently of other offspring points and other parents; (e) Restrict the retained offspring y_ij to those which fall inside W.
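Note that the acceptance probability in step (6) has a simple closed form, since the factor κ cancels: ρ̄(x)/ρ*(x) = (1 − e^{−m})/m with m = μP̄(D | x). A one-line R sketch:

```r
# Acceptance probability for thinning super-dominating parents back to rho-bar:
# rho_bar/rho_star = (1 - exp(-m))/m at m = mu * Pbar(D | x).
accept_prob <- function(m) ifelse(m > 0, (1 - exp(-m)) / m, 1)
accept_prob(c(0.01, 1, 10))   # near 1 for small m, about 0.63, then about 0.1
```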

Theoretical performance
Relative to the Brix-Kendall dominating Algorithm 4.2, the super-dominating Algorithm 5.1 has an increased number of proposed parent points, but has the same number of proposed offspring. The expected number of proposed parents is increased from M(∞) to M*(∞). The value of M*(∞) was effectively calculated in (13)–(16) above:

M*(∞) = λπr_D² [πr_D² h(0)/s² + 1 + (r_D/s) E‖V‖^{−1}].

The super-dominating parents are thinned at step (6) of Algorithm 5.1 to achieve intensity ρ̄(x), and the subsequent procedure is the same as in Algorithm 4.2. Consequently the expected number of proposed offspring is the same in both algorithms, and is equal to (16), which equals M*(∞). The total cost of the super-dominating Algorithm 5.1 is therefore T* = 2M*(∞). The increase in cost of the super-dominating algorithm over the dominating algorithm is M*(∞) − M(∞). When the cluster scale s is large, ρ*(x)/ρ̄(x) will be close to 1, and the increase in cost will be small. The difference in computation time between the two algorithms is more difficult to evaluate, since it depends on the time taken to compute M(r) and M*(r) and their inverses.
In the example models discussed in Section 4.4, computation of M(r) required numerical integration, while in the next section, we obtain analytic expressions for M*(r) for the example models. Hence the super-dominating algorithm is potentially faster than the Brix-Kendall algorithm despite the increase in the number of proposal points.

Details for Matérn model
The dominating intensity for the Matérn model is analytically tractable, so there is probably no benefit in using the 'super-dominating' intensity. However, we shall derive it for comparison. The 'super-dominating' intensity (17) is ρ*(x) = λr_D²/s² for ‖x‖ ≤ r_D + s, and zero otherwise, so that M*(r) is again quadratic in r and analytically invertible.

Details for Thomas model
Here the 'super-dominating' intensity (17) is

ρ*(x) = λ (r_D²/(2σ²)) exp(−(‖x‖ − r_D)_+²/(2σ²)).

The radial cumulative integral (18) is

M*(r) = (λπr_D²/(2σ²)) A(r),

where A(r) = r² if r ≤ r_D, and

A(r) = r_D² + 2σ²[1 − exp(−(r − r_D)²/(2σ²))] + 2r_Dσ√(2π)[Φ((r − r_D)/σ) − 1/2]

if r > r_D, with Φ denoting the cumulative distribution function of the standard normal distribution. The limit is M*(∞) = (λπr_D²/(2σ²))[r_D² + 2σ² + r_Dσ√(2π)]. The inverse function (M*)^{−1} can be evaluated analytically on [0, M*(r_D)], and computed rapidly by numerical root-finding elsewhere.
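The inversion can be sketched in a few lines of R, using the analytic quadratic part on [0, M*(r_D)] and uniroot elsewhere; to stay independent of the exact closed form for A(r) above, this sketch evaluates M* beyond r_D by quadrature. Names and parameter values are ours.

```r
# Super-dominating radial cumulative integral M*(r) for the Thomas model,
# with inversion: analytic on [0, M*(rD)], numerical root-finding beyond.
kappa <- 10; mu <- 10; sigma <- 0.25; rD <- 1/sqrt(2); lambda <- kappa * mu
rho_star <- function(r)
  lambda * (rD^2 / (2 * sigma^2)) * exp(-pmax(r - rD, 0)^2 / (2 * sigma^2))
MrD <- lambda * pi * rD^4 / (2 * sigma^2)            # M*(rD): end of quadratic part
Mstar <- function(r) {
  if (r <= rD) return(lambda * pi * rD^2 * r^2 / (2 * sigma^2))
  MrD + integrate(function(t) 2 * pi * t * rho_star(t), rD, r)$value
}
Mstar_inv <- function(m) {
  if (m <= MrD) return(sqrt(m * 2 * sigma^2 / (lambda * pi * rD^2)))  # analytic part
  uniroot(function(r) Mstar(r) - m, lower = rD, upper = rD + 10 * sigma,
          extendInt = "upX")$root
}
```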

Performance in experiments
Figure 5 compares the performance of the naïve, Brix-Kendall dominating, and super-dominating algorithms for the same test case. The super-dominating algorithm is more costly than Brix-Kendall for small cluster scales, but for larger scales (σ > 0.4) the super-dominating algorithm is faster than Brix-Kendall by a factor of 4.

Hybrid algorithms

Motivation
The naïve algorithm is efficient for small cluster scales, while the Brix-Kendall algorithm is efficient for large cluster scales. One simple strategy would be to select the algorithm according to the value of the cluster scale. However, it should be remembered that the naïve algorithm is not exact, in general, because it truncates the cluster kernel. This strategy would fail to produce exact realisations of the model in many cases.
A more elegant approach is to use the Brix-Kendall algorithm to generate parents outside the disc D, but to use the naïve algorithm to generate parents inside the disc. This is valid because, in Definition 2.1, the parent process X is a Poisson process, so that the parents inside and outside D are independent point processes.
Experiments show that this remedy is not entirely effective. The reason is clear from the middle panel of Figure 3, which shows that P̄(D | x) can also be excessively large for a parent point x that lies just outside D, at a short distance from the boundary of D. The function P̄(D | x) has values of order O(s^{−2}) when d(x, D) ≤ O(s), so the expected number of proposed offspring generated by parents outside D is O(s^{−1}) as s → 0. A more precise calculation is given below.
The solution is to make the demarcation between the two algorithms occur at a positive distance outside D. Let E be the disc of fixed radius r_E > r_D that is concentric with D. In our experiments we use r_E = 2r_D. Then we use the Brix-Kendall dominating Algorithm 4.2 to generate parents outside E (using the disc D to define the dominating kernel) and the naïve conditional Algorithm 3.2 to generate parents inside E.

Algorithm definition

(1) Enclose W in a disc D of radius r_D centred at the origin, and let E be the concentric disc of radius r_E > r_D.
(2) Generate parents outside E using the Brix-Kendall dominating Algorithm 4.2 (with the dominating kernel defined using D). These parents have intensity ρ̄(x). Generate offspring of these parents uniformly inside D, following the steps for the Brix-Kendall dominating algorithm.
(3) Generate parents inside E using the naïve conditional Algorithm 3.2. These parents have uniform intensity κ† = κ(1 − exp(−μ)). The offspring clusters are generated from the kernel in ℝ², conditional on at least one offspring for each parent, but without spatial constraints.
(4) Superimpose the two sets of offspring points, and delete those which fall outside W.

Note that the hybrid algorithm remains valid even when the kernel is infinite at the origin, h(0) = ∞, provided r_E > r_D.
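To make the structure of the hybrid concrete, the following self-contained R sketch implements the logic of Algorithm 6.1 for the Thomas model in the unit square, with r_E = 2r_D, using the formulas reconstructed above. It is an illustration only, not the spatstat implementation.

```r
set.seed(1)
kappa <- 10; mu <- 10; sigma <- 0.25
rD <- 1/sqrt(2); rE <- 2 * rD; cx <- 0.5; cy <- 0.5   # D, E centred on the square
rztpois <- function(n, m) qpois(runif(n, dpois(0, m), 1), m)  # zero-truncated Poisson

hs <- function(t) exp(-t^2 / (2 * sigma^2)) / (2 * pi * sigma^2)  # radial h_sigma
Pbar <- function(r) pi * rD^2 * hs(pmax(r - rD, 0))               # dominating "probability"
rho_bar <- function(r) kappa * (1 - exp(-mu * Pbar(r)))

## Parents inside E: naive conditional (uniform, intensity kappa-dagger)
kdag <- kappa * (1 - exp(-mu))
nin <- rpois(1, kdag * pi * rE^2)
rin <- rE * sqrt(runif(nin)); ain <- runif(nin, 0, 2 * pi)
px <- cx + rin * cos(ain); py <- cy + rin * sin(ain)
nof <- rztpois(nin, mu)                      # at least one offspring per parent
ox <- rep(px, nof) + rnorm(sum(nof), sd = sigma)
oy <- rep(py, nof) + rnorm(sum(nof), sd = sigma)

## Parents outside E: Brix-Kendall dominating, radial inversion of M on (rE, Inf)
Mtail <- function(r) integrate(function(t) 2 * pi * t * rho_bar(t), rE, r)$value
Mtot <- Mtail(Inf)
nout <- rpois(1, Mtot)
rout <- vapply(runif(nout, 0, Mtot), function(m)
  uniroot(function(r) Mtail(r) - m, lower = rE, upper = rE + 10 * sigma,
          extendInt = "upX")$root, numeric(1))
aout <- runif(nout, 0, 2 * pi)
for (i in seq_len(nout)) {
  Ni <- rztpois(1, mu * Pbar(rout[i]))                    # zero-truncated count
  rr <- rD * sqrt(runif(Ni)); aa <- runif(Ni, 0, 2 * pi)  # proposals uniform in D
  yx <- cx + rr * cos(aa); yy <- cy + rr * sin(aa)
  d2 <- (yx - (cx + rout[i] * cos(aout[i])))^2 + (yy - (cy + rout[i] * sin(aout[i])))^2
  keep <- runif(Ni) < hs(sqrt(d2)) / hs(pmax(rout[i] - rD, 0))  # k / k-bar thinning
  ox <- c(ox, yx[keep]); oy <- c(oy, yy[keep])
}
inW <- ox >= 0 & ox <= 1 & oy >= 0 & oy <= 1   # restrict to W
Y <- cbind(x = ox[inW], y = oy[inW])
```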

Theoretical performance
The expected number of proposed offspring can be calculated using previous results. Inside E the parents have constant intensity κ† = κ(1 − e^{−μ}) and the expected number of proposed offspring per parent is μ/(1 − e^{−μ}), so the expected number of proposed offspring of parents in E is πr_E²κμ = λπr_E², which does not depend on the cluster scale s. For proposed parents outside E, the expected total number of proposed offspring is

κμ ∫_{‖x‖ > r_E} P̄(D | x) dx = λπr_D² [P{‖V_s‖ > r_E − r_D} + r_D E(‖V_s‖^{−1} 1{‖V_s‖ > r_E − r_D})],

where V_s denotes the random vector from a parent to its offspring at cluster scale s, and the last line is obtained by modifying the derivation of (14). Clearly

E(‖V_s‖^{−1} 1{‖V_s‖ > r_E − r_D}) ≤ P{‖V_s‖ > r_E − r_D}/(r_E − r_D),

so the expected number of proposed offspring of parents outside E is at most λπr_D²[1 + r_D/(r_E − r_D)]. The bound on the right-hand side is uniformly bounded as a function of s. Hence, the hybrid algorithm with r_E > r_D is robust against s → 0 and against s → ∞. These calculations require modification if the kernel is infinite at the origin, h(0) = ∞, but similar conclusions are obtained.

Performance in experiments
Figure 6 compares the performance of the hybrid Algorithm 6.1 and the existing algorithms, for the same test case, a Thomas process with κ = 10 and μ = 10 in the unit square. The hybrid algorithm with r_E = 2r_D is robust against extreme cluster scales (s → ∞ and s → 0). The hybrid algorithm with r_E = r_D is not robust against s → 0.
The naïve algorithm is the most efficient when the cluster scale is small, but its computation time and memory use increase without bound for large cluster scales. The Brix-Kendall algorithm has unbounded time and cost for small cluster scales, but stable, bounded time and cost for large cluster scales. Its computation time is minimised when the cluster scale σ is about 0.07. The hybrid algorithm with r_E = r_D (denoted naive/BK) is uniformly better than the Brix-Kendall algorithm, but still has the same weaknesses. The hybrid algorithm with r_E = 2r_D (denoted inflated naive/BK) is competitive with the naïve algorithm for small scales, competitive with the Brix-Kendall algorithm for large scales, and is robust. It is less efficient than the naïve algorithm over a range of cluster scales from about 0.1 to 10.

Hybrid naïve/super-dominating algorithm
The super-dominating algorithm can also be combined with the naïve algorithm to make an efficient hybrid algorithm. This will be our recommended optimal algorithm. (1) Enclose W in a disc D of radius r_D centred at the origin, and let E be the concentric disc of radius r_E > r_D. (2) Generate parents outside E using the super-dominating Algorithm 5.1 (with the dominating kernel defined using D), and generate their offspring following the steps of Algorithm 5.1. (3) Generate parents inside E using the naïve conditional Algorithm 3.2, with uniform intensity κ† = κ(1 − exp(−μ)). The offspring clusters are generated from the kernel in ℝ², conditional on at least one offspring for each parent, but without spatial constraints. (4) Superimpose the two sets of offspring points, and delete those which fall outside W.
The expected total number of proposal points (parents and offspring) generated by Algorithm 6.2 can be bounded by combining the bound for the naïve conditional component inside E with the bound for the super-dominating component outside E. This is uniformly bounded, so that Algorithm 6.2 is robust. This conclusion extends to the case where the kernel is infinite at the origin.
Experimental performance results for this algorithm are reported in Section 7.

Choice of inflated radius r E
It would be possible to choose the inflated radius r_E to depend on the parameters of the cluster process, especially on the cluster scale s. For the Matérn cluster process, a simple argument shows that T is minimised at r_E = r_D + R if R < r_D (reducing to the Naïve Conditional algorithm), and otherwise is minimised at r_E = r_D.
If T is continuously differentiable with respect to r_E (which excludes the Matérn cluster process), then T achieves its minimum at a root of ∂T/∂r_E = 0, provided s < r_D πh(0)/a, where a = 1 + κ†/λ; otherwise the minimum of T is achieved at r_E = r_D.
For the Thomas process, T is minimised at an explicitly computable value of r_E when s is sufficiently small, and otherwise T is minimised at r_E = r_D; the same holds for the Cauchy process. For s → 0 the optimal inflation is r_E − r_D = O(s^{1/3}).
We emphasise that these calculations minimise the expected number of proposals, which is not necessarily equivalent to minimising computation time.

Comprehensive study of performance
We conducted an extensive set of simulation experiments to compare the performance of the algorithms presented above. This section gives a summary of the experiments, showing results for one choice of the parameters μ and κ, and noting any differences that were observed with other parameter values.

Matérn cluster process
Figure 7 shows the performance of different variants of the algorithm for simulating the Matérn cluster process, in the same test case κ = 10, μ = 10. The labels are as follows: naive is the naïve conditional Algorithm 3.2 with expansion b = R (which is exact for this model); BK is the Brix-Kendall dominating Algorithm 4.2; super is our super-dominating Algorithm 5.1; naive/BK is the hybrid Algorithm 6.1 with r_E = r_D, and inflated naive/BK is Algorithm 6.1 with r_E = 2r_D; naive/super is the hybrid Algorithm 6.2 with r_E = r_D, and inflated naive/super is Algorithm 6.2 with r_E = 2r_D. Further details of implementation are given at the end of the article.
The Brix-Kendall dominating algorithm is efficient for large R, but becomes very inefficient for small R. There is no benefit in using the super-dominating process. Inflation (i.e. allowing r_E > r_D) substantially improves the performance of the hybrid algorithms. Optimal performance is achieved by the hybrid Algorithm 6.1 with r_E > r_D. For the Matérn model, the inverse functions of M and M* are known analytically in simple form, and these forms were used in our implementation. Consequently, the computation times are much shorter than those experienced in the other examples, where the inverse function must be evaluated by numerical root-finding.
Results in the cases κ = 100, μ = 1 and κ = 1, μ = 100 are shown in the Supplementary Material. In all three cases, the comparison of performance is similar. Computation time increases with κ.

Thomas process
Figure 8 shows the performance of different variants of the algorithm for simulating the Thomas process, in the same test case κ = 10, μ = 10.
As expected, the naïve conditional Algorithm 3.2 is efficient for small cluster scale σ, but inefficient for large σ. The cost increases quadratically with σ. The Brix-Kendall dominating algorithm is efficient for large σ, but becomes very inefficient for small σ. The super-dominating Algorithm 5.1 is very inefficient for small scales but more efficient than Brix-Kendall for large scales. The hybrid Algorithm 6.2 with r_E = r_D has the same computation time as the super-dominating Algorithm 5.1. The hybrid Algorithm 6.2 with r_E = 2r_D is efficient for both small and large scales. The naïve algorithm is best for scales between 0.5 and 5, but we note that the naïve algorithm is not exact, because the kernel is truncated. We conclude that the hybrid Algorithm 6.2 with r_E = 2r_D should be recommended.
Results in the cases κ = 100, μ = 1 and κ = 1, μ = 100 are shown in the Supplementary Material. In all three cases, the comparison of performance is similar.

Cauchy cluster process
Figure 9 shows the performance measures for simulating the Cauchy cluster process in one case. For the Cauchy model, the naïve algorithm is extremely costly, even for quite small cluster scales, and prohibitive for large scales. The computation time curve for the Brix-Kendall dominating algorithm has a similar shape in this case as it did for the Matérn and Thomas models. The hybrid algorithms are uniformly better than the other algorithms in this case. The hybrid Algorithm 6.2 with r_E = 2r_D is uniformly optimal.
Simulation runs frequently failed for the Brix-Kendall dominating algorithm at scale s ≈ 5, with an error message about failure of the numerical integration procedure (such as 'maximum number of subdivisions reached', or 'the integral is probably divergent').
Results in the cases κ = 100, μ = 1 and κ = 1, μ = 100 are shown in the Supplementary Material. In all three cases, the comparison of performance is similar.

Comparison of performance against scale
The scale parameters R, σ, s for the Matérn, Thomas and Cauchy models are not comparable quantities, since they are idiosyncratic to the particular model. When we seek to compare the performance of an algorithm across different models, it may be appropriate to re-express the scale of clustering using the median distance from parent to offspring, or the median distance between offspring of the same parent. These are given in Table 1.
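For the template kernels defined above, the median parent-to-offspring distance at scale s = 1 can be computed in closed form (our calculation): the radial distance has density 2πr h(r), giving medians 1/√2 (Matérn), √(2 ln 2) (Thomas, the chi distribution with 2 degrees of freedom) and √3 (Cauchy, whose radial distribution function is 1 − (1 + r²)^{−1/2}). A quick Monte Carlo check in R:

```r
# Monte Carlo check of median parent-to-offspring distances at scale s = 1.
set.seed(1)
n <- 1e6
median(sqrt(runif(n)))                  # Matern: ~ 1/sqrt(2)      = 0.7071
median(sqrt(rnorm(n)^2 + rnorm(n)^2))   # Thomas: ~ sqrt(2*log(2)) = 1.1774
u <- runif(n)
median(sqrt((1 - u)^(-2) - 1))          # Cauchy: ~ sqrt(3)        = 1.7321
```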

Discussion
In this article we demonstrated, empirically and theoretically, that existing algorithms for simulating spatial cluster processes have unbounded computation time and memory requirements as a function of cluster scale. A simple analysis of the performance of these algorithms led to new proposals. We developed new 'hybrid' algorithms which are robust in the sense of having uniformly bounded computation time and memory requirements, and which also have better numerical properties. Performance comparisons were broadly similar across different examples and depended mainly on the asymptotic behaviour of the tail of the kernel.
Performance has been increased by several orders of magnitude in many cases. Our empirical performance experiments had only modest numbers of replicates, ironically because the existing algorithms are so slow.
This study has wider implications for the simulation of spatial random processes. It suggests that researchers should consider the robustness of an algorithm against extreme values of parameters, where these extremes specify a well-defined random process.
We used simple first-order methods to analyse the performance of the simulation algorithms. This was sufficient for the purpose, and makes it possible to generalise the study to wider classes of models. This approach was also sufficient to detect cases where the existing algorithms are theoretically invalid (namely when the kernel is infinite at the origin).
The article treated a very limited class of models, in order to make progress and gain insight. Topics for further research could include robust algorithms for shot-noise Cox processes, since the original Brix-Kendall algorithm also applied to these models [5]; and simulation of inhomogeneous Neyman-Scott Cox models, which is currently performed inefficiently by thinning a stationary Neyman-Scott Cox model.

Implementation
For the experiments, algorithms were implemented in the R language using the open-source R package spatstat [4,19]. This allowed a fair comparison with existing implementations in the R language in spatstat. Code was byte-compiled as a small R package, and run under R version 4.1.2 using the command line interface. Most simulation experiments were run on a Linux laptop (Lenovo ThinkPad X1 Carbon Gen 9) with 32 GB RAM running Ubuntu 21.10 at 2.8 GHz. Experiments were conducted under identical conditions (e.g. always connected to the power supply, mounted on a cooling board, with no other user applications running). Timing was measured using the R system utility proc.time. The garbage collector gc was called immediately before each timed computation. Various shortcuts in the spatstat code were disabled to ensure a fair comparison.
The validity of the simulation output was checked using the first two moments. Empirical confidence intervals for the intensity λ were computed from the simulated patterns and compared with the correct value. Empirical estimates of the pair correlation function were computed from the simulated patterns, pooled using the ratio-of-sums estimator [4, Equation (16.1), p. 681], and compared with the true pair correlation function.
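A minimal sketch of this kind of check, using the standard spatstat tools rThomas and pcf, compared against the known closed-form pair correlation function of the Thomas process; this illustrates the idea, not the exact pooled procedure used in the experiments.

```r
# Compare the empirical pair correlation function of a simulated Thomas
# pattern with the theoretical g(r) = 1 + exp(-r^2/(4 sigma^2))/(4 pi kappa sigma^2).
library(spatstat)
kappa <- 10; mu <- 10; sigma <- 0.1
X <- rThomas(kappa, scale = sigma, mu = mu, win = square(1))
g <- pcf(X)                                   # kernel estimate of the pcf
plot(g, main = "Thomas process: empirical vs theoretical pcf")
curve(1 + exp(-x^2 / (4 * sigma^2)) / (4 * pi * kappa * sigma^2),
      add = TRUE, lty = 2)
```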
These algorithms have now been publicly released in the spatstat package, including both R and C language implementations.The C implementations achieve a further acceleration of one to two orders of magnitude.

Appendix. Truncation error of the naïve algorithm
The naïve Algorithm 3.1 (Section 3) expands the simulation window by a finite distance b and generates parent points inside this expanded window.The resulting simulated patterns do not exactly conform to the desired Neyman-Scott cluster process model; there is a 'truncation' error, because parent points were restricted to lie closer than a distance b from the simulation window [5].
It is straightforward to calculate a bound on this error. Assume that W = b(0, r_D) is the disc of radius r_D centred at the origin. The output of the naïve algorithm can be viewed as a subset of the true cluster process, obtained by restricting the offspring to lie in W and restricting the parents to lie in the expanded disc b(0, r_D + b). Let N be the number of parents which have offspring in W but fall outside the expanded disc. Then

E[N] = κ ∫_{b(0,r_D+b)^c} [1 − exp(−μP(W | x))] dx
  ≤ λ ∫_{b(0,r_D+b)^c} P(W | x) dx
  ≤ λ|W| ∫_{b(0,r_D+b)^c} sup_{v∈W} h(v − x) dx
  ≤ λ|W| ∫_{b(0,r_D+b)^c} h(‖x‖ − r_D) dx,

where in the last line we use the assumption that h(x) is isotropic and monotone decreasing in ‖x‖. Transforming to polar coordinates gives

E[N] ≤ λ|W| ∫_b^∞ (1 + r_D/r) f(r) dr ≤ λ|W| (1 + r_D/b) P{V > b},

where V denotes the distance from a parent to an offspring point, with probability density f(r) = 2πr h(r). By a coupling argument [6], the total variation distance between the distribution of the true cluster process and the distribution of the output of the naïve algorithm is bounded by P{N > 0}, which is bounded by E[N] by Markov's inequality.
Standard practice is to choose b to be the (1 − p) quantile of the distribution of V where p is a specified small probability.In that case, the analysis above shows that the truncation error in the sense of total variation distance is bounded by λ|W|(1 + r D /b)p.
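For the Thomas model the bound is explicit, since the parent-to-offspring distance is Rayleigh, P{V > b} = exp(−b²/(2σ²)); a short numerical illustration with our parameter values:

```r
# Truncation error bound for the naive algorithm, Thomas model, W = disc of
# radius rD: choose b as the (1-p) quantile of V, bound = lambda |W| (1 + rD/b) p.
kappa <- 10; mu <- 10; sigma <- 0.25; p <- 1e-4
rD <- 1/sqrt(2); lambda <- kappa * mu; areaW <- pi * rD^2
b <- sigma * sqrt(2 * log(1/p))        # solves exp(-b^2/(2 sigma^2)) = p
lambda * areaW * (1 + rD/b) * p        # total variation bound, about 0.026
```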

Figure 1.
Figure 1. Simulated realisation of the stationary Thomas process with κ = 10, μ = 10 and σ = 0.25 in the unit square, and the parent points of these offspring. Filled circles: offspring points. Crosses: parent points. Grey lines connect parents to their offspring.

Figure 2.
Figure 2. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Thomas process using the naïve algorithm and the naïve conditional algorithm, as implemented in the function rThomas in the spatstat package. Stationary Thomas process with κ = 10 and μ = 10 in the unit square. Note logarithmic scales on both axes.

Figure 3.
Figure 3. Functions k̄(u | x), P̄(D | x) and ρ̄(x) for the Thomas model with parameters μ = 10, κ = 10, σ = 0.25 in the unit disc D. The horizontal axis corresponds to a ray in two dimensions, extending out from the centre of disc D. Dashed line indicates disc radius r_D = 1. Left: For a parent point x at distance ‖x‖ = 1.25 from the origin, grey shading shows the true kernel k(u | x) as a function of offspring location u; line shading shows the dominating kernel k̄(u | x). Middle: Dominating 'probability' P̄(D | x) as a function of parent location x. Right: Dominating intensity ρ̄(x) as a function of parent location x.

Figure 4.
Figure 4. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Thomas process using the naïve algorithm (dashed line/red line) and the Brix-Kendall dominating algorithm (solid line/blue line). Stationary Thomas process with κ = 10 and μ = 10 in the unit square. Bounding disc radius r_D = 1/√2 ≈ 0.71. Note logarithmic scales on both axes.

Figure 5.
Figure 5. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Thomas process with κ = 10 and μ = 10 in the unit square, using the naïve algorithm (naive), the Brix-Kendall dominating algorithm (BK), and the super-dominating algorithm (super-BK). Note logarithmic scales on both axes.

Figure 6.
Figure 6. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Thomas process with κ = 10 and μ = 10 in the unit square, using the naïve algorithm (naive), the Brix-Kendall dominating algorithm (BK), the hybrid algorithm with r_E = r_D (naïve/BK), and the hybrid algorithm with r_E = 2r_D (inflated naïve/BK), where r_D = 1/√2 ≈ 0.71 is the radius of the disc enclosing the unit square. Note logarithmic scales on both axes.

Figure 7.
Figure 7. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Matérn cluster process with κ = 10 and μ = 10 in the unit square, using various algorithms. Note logarithmic scales on both axes.

Figure 8.
Figure 8. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Thomas process with κ = 10 and μ = 10 in the unit square, using various algorithms, as a function of cluster scale σ. Note logarithmic scales on both axes.

Figure 9.
Figure 9. Computation time (Left) and number of proposal points generated (Right) to generate one realisation of the stationary Cauchy cluster process with κ = 10 and μ = 10 in the unit square, using various algorithms. Note logarithmic scales on both axes.

Table 1.
Median distances for the example models.