Adaptive semiparametric Bayesian differential equations via sequential Monte Carlo

Nonlinear differential equations (DEs) are used in a wide range of scientific problems to model complex dynamic systems. The differential equations often contain unknown parameters that are of scientific interest and have to be estimated from noisy measurements of the dynamic system. Generally, there is no closed-form solution for nonlinear DEs, and the likelihood surface for the parameters of interest is multi-modal and highly sensitive to the parameter values. We propose a Bayesian framework for nonlinear DE systems. A flexible nonparametric function is used to represent the dynamic process so that expensive numerical solvers can be avoided. A sequential Monte Carlo algorithm in the annealing framework is proposed to conduct Bayesian inference for the DE parameters. In our numerical experiments, we use examples of ordinary differential equations and delay differential equations to demonstrate the effectiveness of the proposed algorithm. We have developed an R package, available at \url{https://github.com/shijiaw/smcDE}.


Introduction
Nonlinear differential equations (e.g. nonlinear ordinary or delay differential equations) are commonly used to model dynamic systems in ecology, physics, and engineering. Delay differential equations (DDEs) are described by equations dx(t)/dt = g(x(t), x(t − τ)|θ), where θ is the vector of unknown parameters and τ is the time delay parameter. These are continuous-time models for interactions between the variables x(t) subject to a time delay τ. Ordinary differential equations (ODEs) are often written as dx(t)/dt = g(x(t)|θ) and can be regarded as a special case of DDEs with τ = 0. The form of g(·) is generally proposed by specialists with scientific intuition. For example, ecologists proposed the simple Lotka-Volterra model (Rosenzweig and MacArthur, 1963) to understand and predict the populations of predators and prey in an ecosystem. Given a concrete form of the function g(x(t), x(t − τ)|θ), the parameters θ and τ are unknown and need to be estimated from observations. Differential equations (DEs) are often observed with measurement error. We assume that the observed y(t) is linked to x(t) through an additive error model, y(t) = x(t) + ε(t), where ε(t) is measurement error. The estimation of parameters in DEs is of great interest and usually requires us to solve the DEs dx(t)/dt = g(x(t), x(t − τ)|θ).
Many DE systems do not admit an analytic solution. One alternative is to solve the DEs numerically (Butcher, 2016), for example with the Euler method (Jain, 1979; Bulirsch and Stoer, 1966), exponential integrators (Hochbruck et al., 1998; Hochbruck and Ostermann, 2010), or the Runge-Kutta method (Jameson et al., 1981; Ascher et al., 1997). However, numerical DE solvers are computationally expensive, especially for DDEs. Various methods have been proposed in recent decades to solve DEs more efficiently. The idea of using smoothing splines to fit dynamic data was first proposed by Varah (1982). Ramsay and Silverman (2007), Poyton et al. (2006), and Chen and Wu (2008) extended the idea of smoothing to a two-stage approach. In the first stage, spline coefficients are optimized by minimizing the sum of squared distances between the data and the spline functions at the observation times. In the second stage, using the estimated spline coefficients, DE parameters are optimized by minimizing the residuals of the DE models. The two-stage approach may lead to inconsistent estimates (Ramsay et al., 2007). Ramsay et al. (2007) proposed a generalized smoothing approach, called "parameter cascading", based on data smoothing methods and a generalization of profiled estimation. In their approach, the spline coefficients are treated as nuisance parameters. The method iterates between optimizing the objective function with respect to the spline coefficients given the current DE parameter estimates, and optimizing it with respect to the DE parameters given the estimated spline coefficients; the iteration is repeated until convergence. The parameter estimates are consistent and asymptotically normally distributed under mild conditions (Qi et al., 2010; Pang et al., 2017). There are several variations of the parameter cascading approach. Cao et al. (2011) proposed a robust algorithm to estimate parameters from measurements with outliers based on smoothing splines. Cao et al. (2012) proposed a method to estimate time-varying parameters in ODEs, in which the ODE parameters are also modelled by smoothing splines. Wang and Cao (2012) developed a semiparametric method with smoothing splines to estimate DDE parameters.
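As a concrete illustration of the classical numerical solvers mentioned above, the sketch below (a toy example of ours, not from the paper) contrasts a forward-Euler step with a classical fourth-order Runge-Kutta step on a simple linear ODE whose exact solution is known:

```python
import numpy as np

def euler_step(f, x, t, h):
    # One forward-Euler step: x_{n+1} = x_n + h * f(t_n, x_n)
    return x + h * f(t, x)

def rk4_step(f, x, t, h):
    # One classical fourth-order Runge-Kutta step
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def solve(f, x0, t_grid, step):
    # Integrate f over t_grid starting from x0 with the given stepper
    xs = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        xs.append(step(f, xs[-1], t0, t1 - t0))
    return np.array(xs)

# Toy ODE dx/dt = -theta * x with known solution x(t) = x0 * exp(-theta * t)
theta, x0 = 2.0, 7.0
f = lambda t, x: -theta * x
t = np.linspace(0.0, 3.0, 61)
exact = x0 * np.exp(-theta * t)
err_euler = np.max(np.abs(solve(f, x0, t, euler_step) - exact))
err_rk4 = np.max(np.abs(solve(f, x0, t, rk4_step) - exact))
```

The gap between the two errors illustrates why higher-order solvers are preferred despite their extra cost per step; repeating such a solve at every likelihood evaluation is what the spline representation below avoids.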
Using smoothing splines to model DEs is computationally efficient since we do not need to solve the DEs numerically. Most smoothing-based methods for estimating DE parameters are derived from a frequentist perspective. Bayesian methods are of interest since they quantify the uncertainty of the parameters. Campbell and Steele (2012) proposed a smooth functional tempering algorithm to conduct posterior inference for ODE parameters; the idea originates from parallel tempering and model-based smoothing. Zhang et al. (2017) proposed a high-dimensional linear ODE model to accommodate the directional interaction between areas of the brain, with parallelized Markov chain Monte Carlo schemes for estimation. Bhaumik et al. (2015) investigated a two-stage procedure that estimates parameters by minimizing a penalized ODE criterion.
There are several lines of work on estimating DE parameters from a Bayesian perspective based on numerical DE solvers. Dass et al. (2017) proposed a two-step approach to approximate posterior distributions of parameters of interest: they first applied a numerical algorithm to solve the ODEs, then integrated out nuisance parameters using Laplace approximations. Bhaumik et al. (2017) modified Bhaumik et al. (2015) by directly considering the distance between the function in the nonparametric model and that obtained from a fourth-order Runge-Kutta (RK4) method. Calderhead et al. (2009) presented a novel Bayesian sampler to infer parameters in nonlinear delay differential equations; the derivatives and time delay parameters were estimated via Gaussian processes. To make DE estimation more consistent, Dondelinger et al. (2013) proposed an adaptive gradient matching approach to jointly infer the hyperparameters of a Gaussian process as well as the ODE parameters. Barber and Wang (2014) simplified previous approaches with a more natural generative model via Gaussian processes, which directly links state derivative information with system observations. Standard sequential Monte Carlo (SMC) methods (Doucet et al., 2000, 2001; Liu and Chen, 1998) are popular approaches for estimating dynamic models (e.g. state space models). SMC methods combine importance sampling and resampling algorithms. Under mild conditions, consistency and asymptotic normality hold (Chopin et al., 2004). Del Moral et al. (2006) proposed a general SMC framework for sampling sequentially from a sequence of intermediate probability distributions defined on a common space. This general framework has made SMC methods popular in areas beyond state space models. For example, Wang et al. (2020) proposed an annealed SMC algorithm for phylogenetics by designing an artificial sequence of intermediate distributions. Several SMC methods have been proposed to estimate parameters in ODE models. Zhou et al. (2016) presented an adaptive SMC sampling strategy to estimate parameters and conduct model selection, demonstrated on a simple ODE example. Lee et al. (2018) introduced additive Gaussian errors into the ODE trajectory provided by numerical solvers and proposed a particle filter to infer ODE parameters. In addition, Gaussian processes have been used to avoid numerical integration. These works rely on numerically solving the ODE models.
In this article, we propose a semiparametric Bayesian model for nonlinear DEs and design an annealed SMC algorithm to conduct inference efficiently. The DE trajectories are represented using a linear combination of basis functions, so our method avoids expensive numerical solvers, especially those for DDEs. It instead estimates the basis coefficients together with the other parameters in the DEs. In other words, the parameters of interest include the DE parameters, the basis coefficients of the smoothing spline functions, and the parameters in the observation model. In addition, a tuning parameter is used to balance the fit to the data and fidelity to the DEs. We estimate the tuning parameter in the Bayesian framework to avoid tuning it through expensive cross-validation. Inspired by the reference distribution of Fan et al. (2011) in the context of model selection, we design an artificial sequence of intermediate distributions that starts from a reference distribution, which is easy to sample from, and gradually approaches the target distribution through a sequence of annealing parameters. The proposed annealed SMC can effectively sample parameters with multiple isolated posterior modes and basis function coefficients of high dimensionality. It adopts the adaptive scheme of Zhou et al. (2016) and Wang et al. (2020) to choose the sequence of annealing parameters that determines the intermediate target distributions of SMC. Our numerical experiments demonstrate the effectiveness of our algorithm in estimating parameters and DE trajectories for both ODEs and DDEs.
The rest of the article is organized as follows. In Section 2, we construct a fully Bayesian framework for nonlinear DEs. In Section 3, we introduce our new algorithm for Bayesian inference in nonlinear DEs. In Sections 4 and 5, we use a real data analysis and numerical experiments to show the effectiveness of our method. We conclude in Section 6.

Hierarchical Bayesian differential equations
In this section, we introduce a hierarchical Bayesian structure for DE models. In Section 2.1, we introduce the likelihood function for DEs. In Section 2.2, we construct a Bayesian model for the DE model. In Section 2.3, we discuss the selection of the tuning parameter λ. In Section 2.4, we derive the posterior distribution of the DE model.

DE models
We use x(t) = (x_1(t), ..., x_I(t))' to denote the DE variables (i.e. the solution of a DE system), where x_i(t) denotes the i-th DE variable and I denotes the total number of DE variables. Each DE variable x_i(t), i = 1, ..., I, is a dynamic process modelled with one differential equation,

dx_i(t)/dt = g_i(x(t), x(t − τ)|θ),

where t ∈ [t_1, t_max] with t_1 = 0 unless specified otherwise, θ denotes the vector of unknown parameters in the DE model, τ is the delay parameter in a DDE model (τ = 0 in an ODE model), and x_i(0) is the initial condition for the i-th DE variable, which is also unknown and needs to be estimated. Delay differential equations (DDEs) are time-delayed systems: the time delay τ makes the present state of a DE variable depend on its past state.
We refer readers to Section 4 for a more detailed description of DDE models.
We do not observe the DEs directly; instead, we observe them with measurement error. In addition, we often only observe a subset of the I DE variables, indexed by I_0 ⊆ {1, ..., I}. We let y_i = (y_{i1}, ..., y_{iJ_i})' denote the observations of the i-th DE trajectory. The j-th observation of y_i is assumed to be normally distributed with mean x_i(t_{ij}|θ, τ, x_{i0}) and variance σ_i², where x_i(t_{ij}|θ, τ, x_{i0}) denotes the DE solution given θ, τ, and initial condition x_{i0}, and t_{ij} denotes the time at which the j-th observation of y_i is made.
The joint likelihood function of θ, τ, x(0), and σ² = (σ_1², ..., σ_I²) admits the form

p(y|θ, τ, x(0), σ²) = ∏_{i ∈ I_0} ∏_{j=1}^{J_i} N(y_{ij}; x_i(t_{ij}|θ, τ, x_{i0}), σ_i²).

We use a figure (see Figure 1) to illustrate the multi-modality of the likelihood surface.

A Bayesian structure for DE model
Numerically solving DEs can be extremely computationally intensive, especially for DDE models.
We propose to solve the differential equations by penalized smoothing. More specifically, we represent each trajectory by a linear combination of basis functions,

x_i(t) = Φ_i(t)' c_i,

where Φ_i(t) is the vector of basis functions evaluated at time t and c_i denotes the vector of basis coefficients. See Figure 1 in the Supplementary Material for an example of cubic B-spline functions (Ramsay, 2004; De Boor, 1972). Here we do not distinguish the true x_i(t) from its spline approximation, which is a common practice in nonparametric smoothing (Berry et al., 2002; Ramsay, 2004; Wood, 2017): the number of basis functions is chosen to be sufficiently large that the basis expansion avoids bias from model over-simplification. The error in approximating x_i(t) by its B-spline approximation is typically negligible compared to the estimation error, so we can treat the two as equivalent (see page 161 of Berry et al. (2002)). The initial condition for the i-th DE variable is then x_i(0) = Φ_i(0)' c_i. One advantage of using smoothing spline functions to model DE trajectories is that we can avoid explicitly estimating the initial condition x(0); instead, it is estimated as x̂_i(0) = Φ_i(0)' ĉ_i, where ĉ_i is the vector of estimated basis coefficients.
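The representation x_i(t) = Φ_i(t)'c_i can be sketched as below. This is a minimal, self-contained illustration using the Cox-de Boor recursion for cubic B-splines on a clamped knot sequence of our own choosing (the paper's knot placement and basis counts differ by example):

```python
import numpy as np

def bspline_basis(t, knots, l, k):
    """Cox-de Boor recursion: the l-th B-spline of degree k evaluated at t."""
    if k == 0:
        return 1.0 if knots[l] <= t < knots[l + 1] else 0.0
    val = 0.0
    if knots[l + k] > knots[l]:
        val += (t - knots[l]) / (knots[l + k] - knots[l]) * bspline_basis(t, knots, l, k - 1)
    if knots[l + k + 1] > knots[l + 1]:
        val += (knots[l + k + 1] - t) / (knots[l + k + 1] - knots[l + 1]) * bspline_basis(t, knots, l + 1, k - 1)
    return val

# Clamped cubic knot sequence on [0, 10] with 8 equally spaced knots
deg = 3
interior = np.linspace(0.0, 10.0, 8)
knots = np.concatenate([[0.0] * deg, interior, [10.0] * deg])
L = len(knots) - deg - 1                    # number of basis functions (here 10)

def Phi(t):
    # Basis vector Phi(t); the trajectory is x(t) = Phi(t) @ c
    return np.array([bspline_basis(t, knots, l, deg) for l in range(L)])

rng = np.random.default_rng(1)
c = rng.normal(size=L)                      # basis coefficients c_i
x_at_0 = Phi(0.0) @ c                       # initial condition x(0) = Phi(0)' c
```

With a clamped knot sequence, Phi(0) puts all its mass on the first basis function, so x(0) equals the first coefficient, which is what makes the implicit initial-condition estimate x̂(0) = Φ(0)'ĉ convenient.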
We specify the prior on the basis coefficients c = (c_1', ..., c_I')' through a DE-based penalty,

π_0(c|θ, τ, λ) ∝ exp(−(λ/2) Σ_{i=1}^{I} R_i), where R_i = ∫_{t_1+τ}^{t_max} [dx_i(t)/dt − g_i(x(t), x(t − τ)|θ)]² dt.

In the Bayesian framework, we also need to assign appropriate priors to the model parameters θ, τ, and σ_i², i = 1, ..., I. The following priors are specified:

θ ~ N(0_D, σ_θ² I_D), σ_i² ~ Inverse-Gamma(g_0, h_0),

with a uniform prior on the delay τ, where σ_θ², g_0, and h_0 are hyper-parameters of the prior distributions and D is the dimension of the vector θ. The vector of all zeros is denoted by 0 and I is an identity matrix; their subscripts denote the vector/matrix dimension.

The choice of λ
The tuning parameter λ is important in balancing the fit to the data against fidelity to the DE model. A small value of λ imposes little information from the DE model; as λ → 0, we end up fitting the spline coefficients to the data by least squares. If we choose a large value of λ, the prior information from the DE system dominates and little information from the data is taken into account. Hence, it is crucial to choose a value of λ that properly balances the DE model and the data.
One approach to choosing λ is cross-validation (Wang and Cao, 2012; Reiss and Todd Ogden, 2009) over a range of reasonable values. However, this approach is infeasible in a Bayesian framework as it significantly increases the computational cost. We propose to treat λ as an unknown parameter by specifying a prior distribution on λ and estimating its posterior distribution. This idea is adapted from Berry et al. (2002), who automatically select a smoothing parameter for splines. We choose the prior distribution of the smoothing parameter to be Gamma(a_λ, b_λ).

Posterior distribution of DE model
The likelihood function is

p(y|β) = ∏_{i ∈ I_0} ∏_{j=1}^{J_i} N(y_{ij}; Φ_i(t_{ij})' c_i, σ_i²).

We introduce the notation β = (θ', τ, c', σ', λ)' to denote all the parameters of interest. Let π_0(β) denote the prior distribution, which is specified in Equations (3) to (6), with a Gamma(a_λ, b_λ) prior for λ. We are interested in the posterior distribution of β,

π(β) = γ(β) / ∫ γ(β) dβ,

where γ(β) = π_0(β) p(y|β) is the unnormalized posterior distribution. The integral in each penalty term R_i usually has no closed-form expression, but it can be evaluated by numerical quadrature. Letting (s_{i1}, ..., s_{iζ_i}) denote the knots placed within [t_1 + τ, t_max] for the i-th DE function, where ζ_i is the number of knots used for the i-th DE function, we approximate the integral with the composite Simpson's rule (Burden et al., 2001), evaluating the integrand at M quadrature points within each knot interval and summing these evaluations with the corresponding quadrature weights v_{im}^l.
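The composite Simpson's rule used for the penalty integral can be sketched as follows; the integrand here is a stand-in for the squared DE residual (dx/dt − g)², chosen so that the exact integral is known:

```python
import numpy as np

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals on [a, b]."""
    if n % 2:
        raise ValueError("n must be even")
    x = np.linspace(a, b, n + 1)
    w = np.ones(n + 1)
    w[1:-1:2] = 4.0          # odd interior nodes get weight 4
    w[2:-1:2] = 2.0          # even interior nodes get weight 2
    return (b - a) / (3.0 * n) * np.sum(w * f(x))

# Stand-in residual: x(t) = sin(t) with a slightly mis-specified g, so that
# (x'(t) - g)^2 = (cos t - 0.9 cos t)^2 = 0.01 cos^2(t), whose integral is known
f = lambda t: (np.cos(t) - 0.9 * np.cos(t)) ** 2
approx = composite_simpson(f, 0.0, np.pi, 40)
exact = 0.01 * np.pi / 2.0   # integral of 0.01 cos^2 over [0, pi]
```

Simpson's rule is fourth-order accurate, so a modest number of quadrature points per knot interval already makes the quadrature error negligible next to the estimation error.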

Methodology
One classical methodology for Bayesian inference of nonlinear DE parameters is Markov chain Monte Carlo (MCMC). In MCMC, we construct an ergodic Markov chain that admits the normalized posterior as its stationary distribution. If we run the chain long enough, convergence to the posterior is guaranteed.
We show the details of this method in Supplementary Section 2.1.
However, MCMC (more specifically, the Metropolis-Hastings (MH) algorithm) is inefficient for estimating parameters of nonlinear DEs for several reasons. First, the posterior surface is extremely sensitive to the DE parameters θ: there may exist isolated modes in the posterior distribution, and the posterior may change considerably even with a tiny change in a parameter value. Second, the computation of the likelihood function involves numerically solving nonlinear DEs, which is computationally expensive. Third, the convergence of MCMC is generally difficult to assess.

An annealed sequential Monte Carlo for Bayesian DE inference
To address these shortcomings of MCMC, we propose a sequential Monte Carlo (SMC) algorithm in the framework of Del Moral et al. (2006) for the static setting of Bayesian DEs. This special case of SMC is a generic method to approximate a sequence of intermediate probability distributions {π_r(β)}_{0≤r≤R} defined on a common measurable space (E, ℰ). It differs from the standard SMC algorithm (Doucet et al., 2000, 2001), in which the intermediate probability distributions are generally defined on measurable spaces of increasing dimension.
The SMC algorithm in the static setting approximates the target distribution π(β) in R steps. We are interested in sequentially sampling from the distributions {π_r(β)}_{0≤r≤R}: we first approximate π_0(β), then π_1(β), and so on. The subscript r denotes the index of the intermediate distribution, with π_r(β) = γ_r(β)/Z_r, where γ_r can be evaluated pointwise and Z_r is an unknown normalizing constant. There are several reasons to use multiple distributions in SMC.
At iteration r + 1, each particle β_r^{(k)} is propagated with a Markov kernel T_{r+1}(β_r, β_{r+1}). One typical approach in the SMC framework for the static setting is to select T_{r+1} to be a π_{r+1}-invariant MCMC kernel; this will be detailed later in the paper. We then compensate for the difference between the particle approximation {β_r^{(k)}}_{k=1}^K of π_r(β) and the new target π_{r+1}(β) by updating the weights. At iteration r + 1, we first compute the incremental importance weight

w̃_{r+1}^{(k)} = γ_{r+1}(β_{r+1}^{(k)}) L_r(β_{r+1}^{(k)}, β_r^{(k)}) / [γ_r(β_r^{(k)}) T_{r+1}(β_r^{(k)}, β_{r+1}^{(k)})],

where L_r(β_{r+1}, β_r) is an artificial backward kernel (Del Moral et al., 2006, 2012), denoting the probability of moving from β_{r+1} back to β_r. Then we calculate the unnormalized weight from the previous unnormalized weight and the incremental importance weight, w_{r+1}^{(k)} = w_r^{(k)} · w̃_{r+1}^{(k)}. The normalized weights are W_{r+1}^{(k)} = w_{r+1}^{(k)} / Σ_{k'} w_{r+1}^{(k')}. The selection of the backward kernel L_r is important, as it affects the variance of {W_{r+1}^{(k)}}_{k=1}^K; we refer readers to Del Moral et al. (2006) for a more detailed discussion of this artificial backward kernel. A convenient backward Markov kernel that allows an easy evaluation of the importance weight is

L_r(β_{r+1}, β_r) = π_{r+1}(β_r) T_{r+1}(β_r, β_{r+1}) / π_{r+1}(β_{r+1}).

With this backward kernel, the weight update function simplifies to w̃_{r+1}^{(k)} = γ_{r+1}(β_r^{(k)}) / γ_r(β_r^{(k)}); thus, we do not require pointwise evaluation of the forward kernel T_{r+1} to compute the weight update. In this article, we propose a sequence of annealed intermediate target distributions (Neal, 2001; Wang et al., 2020) {π_r(β)}_{0≤r≤R} to facilitate the exploration of the posterior space:

π_r(β) ∝ ρ(β)^{1−α_r} γ(β)^{α_r},

where ρ(β) is a reference distribution (Fan et al., 2011) and 0 = α_0 < α_1 < ... < α_R = 1 is the sequence of annealing parameters. When α_0 = 0, the first distribution is the reference distribution ρ(β); when α_R = 1, the last distribution is our target distribution, the posterior distribution of β.
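A minimal sketch of this simplified weight update, computed stably in log space. The Gaussian reference ρ and the Gaussian stand-in for the unnormalized target γ are illustrative assumptions; in the paper, γ is the unnormalized DE posterior:

```python
import numpy as np

# Illustrative stand-ins: reference rho = N(0, 2^2) and target gamma ~ N(3, 1)
log_rho = lambda b: -0.5 * (b / 2.0) ** 2
log_gamma = lambda b: -0.5 * (b - 3.0) ** 2

def log_incremental_weight(beta, a_prev, a_new):
    # With the convenient backward kernel, the incremental weight reduces to
    # [gamma(beta) / rho(beta)]^(a_new - a_prev), evaluated at the *current*
    # particle -- no forward-kernel density is ever needed.
    return (a_new - a_prev) * (log_gamma(beta) - log_rho(beta))

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 2.0, size=1000)      # K draws from the reference rho
logw = log_incremental_weight(particles, 0.0, 0.1)
W = np.exp(logw - np.max(logw))                  # subtract the max for stability
W /= W.sum()                                     # normalized weights
```

Working with log weights and subtracting the maximum before exponentiating avoids underflow when γ/ρ varies over many orders of magnitude, which is typical for high-dimensional basis coefficients.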
The reference distributions should be easy to sample from and ideally they are close to the modes of the target distribution.Since we can easily sample from all the prior distributions except for the prior of c, we use the same reference distribution as the prior distribution for all parameters except for c.
In this case, we specify the reference distribution for c based on its maximum likelihood estimate:

c_i ~ N(ĉ_i, σ_c² I),

where ĉ_i is the MLE of c_i, obtained by maximizing p(y|β) π_0(c|θ, τ, λ) with respect to τ, θ, c, and σ for a given λ, and σ_c² is a hyper-parameter of the reference distribution.
If there are isolated modes in π(β), MCMC may get stuck in one of the modes which is close to the initial value.A sequence of intermediate distributions is introduced to avoid this.With a small annealing parameter α r , the intermediate distribution surface is flat, which makes MCMC samples move easily between modes.The intermediate distribution with a higher value of annealing parameter is closer to the true posterior.The samples move closer to the target posterior distribution if we increase α r .
One simple choice is to space the annealing parameters evenly across [0, 1], i.e. α_r = r/R for r = 0, ..., R. We now introduce an SMC algorithm for a given sequence of intermediate targets. First, we initialize the particles {β_0^{(k)}}_{k=1}^K by sampling from the reference distribution ρ(β). We let {β̃_r^{(k)}}_{k=1}^K denote the particles after the resampling step (see Step 3). We iterate between the following three steps to approximate the successive intermediate distributions.

Step 1. We compute the weights of the particles at iteration r,

w_r^{(k)} = w_{r−1}^{(k)} · w̃_r^{(k)}, with w̃_r^{(k)} = [γ(β̃_{r−1}^{(k)}) / ρ(β̃_{r−1}^{(k)})]^{α_r − α_{r−1}}.

Note that the weight update for particles at the r-th iteration only depends on particles at the (r − 1)-th iteration, which differs from the standard SMC algorithm (Doucet et al., 2000, 2001).

Step 2. We propagate each particle with one MCMC move admitting π_r as its stationary distribution, β_r^{(k)} ~ T_r(β̃_{r−1}^{(k)}, ·).
Step 3. We conduct a resampling step to prune particles with small weights.The particles after the resampling step are denoted by { β(k) r } K k=1 , and all particles are equally weighted.The simplest resampling method is multinomial resampling based on the normalized particle weights.However, advanced resampling schemes such as stratified resampling (Kitagawa, 1996;Hol et al., 2006) or residual resampling (Douc and Cappé, 2005) are preferable to multinomial resampling, since multinomial resampling will create more variance for the SMC estimator when compared with advanced resampling algorithms.
In our numerical experiments, we use systematic resampling (Carpenter et al., 1999).
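Systematic resampling can be sketched in a few lines: it draws a single uniform and uses K evenly spaced pointers into the cumulative weights, which guarantees that each particle's offspring count stays within the floor and ceiling of K times its weight:

```python
import numpy as np

def systematic_resample(W, rng):
    """Systematic resampling: one uniform draw, K evenly spaced pointers
    into the cumulative weights; returns ancestor indices."""
    K = len(W)
    positions = (rng.uniform() + np.arange(K)) / K   # u/K, (u+1)/K, ..., (u+K-1)/K
    return np.searchsorted(np.cumsum(W), positions)

rng = np.random.default_rng(42)
W = np.array([0.5, 0.3, 0.2])                        # normalized particle weights
idx = systematic_resample(W, rng)
counts = np.bincount(idx, minlength=len(W))          # copies of each particle
```

Here particle 0, with expected offspring count 3 × 0.5 = 1.5, receives either 1 or 2 copies; multinomial resampling would allow anywhere from 0 to 3, which is the extra variance the advanced schemes avoid.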
It is not recommended to resample at every iteration, as resampling adds variance to the estimator (Chopin et al., 2004). Our resampling step is instead triggered when the relative effective sample size (rESS) falls below a given threshold ς. The effective sample size (ESS) at iteration r can be computed as

ESS_r = 1 / Σ_{k=1}^{K} (W_r^{(k)})²,

which can be interpreted as the number of "perfect" samples used to approximate the intermediate distribution π_r. The effective sample size takes values between 1 and K: it equals K if all particles are equally weighted, and it is close to 1 if one particle has a much larger weight than the others. The rESS normalizes the ESS to lie between zero and one: rESS_r = ESS_r / K. If we never resample, the annealed SMC algorithm reduces to annealed importance sampling (Neal, 2001).
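The ESS and rESS computations can be sketched as follows, with illustrative weight vectors for the two extreme cases described above:

```python
import numpy as np

def ess(W):
    """Effective sample size of normalized weights: ESS = 1 / sum_k W_k^2."""
    W = np.asarray(W, dtype=float)
    return 1.0 / np.sum(W ** 2)

K = 500
equal = np.full(K, 1.0 / K)              # all particles equally weighted
skewed = np.full(K, 0.001 / (K - 1))     # one particle carries almost all weight
skewed[0] = 0.999
rel_ess_equal = ess(equal) / K           # rESS = ESS / K; equals 1 here
rel_ess_skewed = ess(skewed) / K         # close to 1/K: would trigger resampling
```

With a resampling threshold of, say, ς = 0.5, the first weight vector would be left untouched while the second would trigger a resampling step.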
After conducting the annealed SMC algorithm, we obtain a list of weighted samples that empirically represents the posterior distribution π(β),

π(β) ≈ Σ_{k=1}^{K} W_R^{(k)} δ_{β_R^{(k)}}(β),

where δ_β(·) denotes a point mass at β.
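Putting the pieces together, a minimal annealed SMC run on a one-dimensional bimodal toy target; the reference, target, MH kernel, and thresholds are all illustrative stand-ins for the DE posterior machinery described above:

```python
import numpy as np

rng = np.random.default_rng(3)
K, phi, threshold = 500, 0.9, 0.5

# Toy problem: reference rho = N(0, 3^2), bimodal unnormalized target gamma
log_rho = lambda b: -0.5 * (b / 3.0) ** 2
log_gamma = lambda b: -0.5 * (b ** 2 - 4.0) ** 2          # modes at +/-2
log_ratio = lambda b: log_gamma(b) - log_rho(b)

def rcess(W, logr, delta):
    # Relative conditional ESS for a step of size delta (log-space safe)
    w = np.exp(delta * logr - np.max(delta * logr))
    return np.sum(W * w) ** 2 / np.sum(W * w ** 2)

beta = rng.normal(0.0, 3.0, size=K)                       # particles from rho
W = np.full(K, 1.0 / K)
alpha = 0.0
while alpha < 1.0:
    # Adaptive annealing step: bisection on rCESS(alpha) = phi
    if rcess(W, log_ratio(beta), 1.0 - alpha) >= phi:
        new_alpha = 1.0
    else:
        lo, hi = alpha, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if rcess(W, log_ratio(beta), mid - alpha) > phi else (lo, mid)
        new_alpha = 0.5 * (lo + hi)
    # Step 1: reweight with incremental weight [gamma/rho]^(delta alpha)
    logW = np.log(W) + (new_alpha - alpha) * log_ratio(beta)
    W = np.exp(logW - np.max(logW)); W /= W.sum()
    alpha = new_alpha
    # Step 2: one random-walk MH move per particle, invariant for pi_alpha
    log_pi = lambda b: (1.0 - alpha) * log_rho(b) + alpha * log_gamma(b)
    prop = beta + 0.5 * rng.normal(size=K)
    accept = np.log(rng.uniform(size=K)) < log_pi(prop) - log_pi(beta)
    beta = np.where(accept, prop, beta)
    # Step 3: systematic resampling when rESS drops below the threshold
    if 1.0 / np.sum(W ** 2) / K < threshold and alpha < 1.0:
        idx = np.searchsorted(np.cumsum(W), (rng.uniform() + np.arange(K)) / K)
        beta, W = beta[idx], np.full(K, 1.0 / K)

second_moment = np.sum(W * beta ** 2)                     # roughly 4 for this target
```

Because the early intermediate targets are flat, particles from both modes survive to the end; a single MH chain started in one mode would typically never visit the other.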

Properties of the annealed SMC algorithm
We discuss some properties of our annealed SMC method. Note that the general SMC algorithm proposed by Del Moral et al. (2006) includes our annealed SMC as a special case; hence, consistency and asymptotic normality follow from Del Moral et al. (2006). We summarize these properties in the following propositions. First, our annealed SMC method provides a consistent representation of the intermediate target posterior distributions.
Proposition 1. The annealed SMC method provides asymptotically consistent estimates: as K → ∞,

Σ_{k=1}^{K} W_r^{(k)} ψ(β_r^{(k)}) → ∫ ψ(β) π_r(β) dβ,

where the convergence is in the L_2 norm and ψ is a target function that satisfies regularity conditions, for example a bounded ψ. Del Moral (2004) and Chopin et al. (2004) discuss more general conditions that cover our annealed SMC algorithm.
The central limit theorem shown below can be used to assess the total variance of Monte Carlo estimators.
Proposition 2. Under the integrability conditions given in Theorem 1 of Chopin et al. (2004), or in Del Moral (2004), pages 300-306 of Section 9.4, when multinomial resampling is performed at each iteration,

√K [Σ_{k=1}^{K} W_r^{(k)} ψ(β_r^{(k)}) − ∫ ψ(β) π_r(β) dβ] → N(0, σ_r²(ψ)),

where the convergence is in distribution. The form of the asymptotic variance σ_r²(ψ) depends on the Markov kernel T_r and the artificial backward kernel L_r; we refer readers to Del Moral et al. (2006) for details of this asymptotic variance.
Note that Propositions 1-2 hold if the integral in R_i can be computed exactly; they no longer hold exactly when the integral is approximated numerically, because of the additional approximation error introduced.
In addition, the annealed SMC algorithm can be easily parallelized, by allocating particles across different cores (Del Moral et al., 2006;Wang et al., 2020).

Adaptive annealing parameter scheme in SMC
In the annealed SMC algorithm, one challenge is to properly select the sequence of annealing parameters.
If we choose α_0 = 0 and α_1 = 1, the annealed SMC sampler degenerates to importance sampling. A large number of annealing parameters improves the performance of the algorithm, but it is computationally more intensive. If we select an insufficient number of annealing parameters or an improper annealing scheme, the algorithm may collapse. We propose an adaptive annealing parameter scheme based on controlling the discrepancy between successive intermediate targets. In this article, we measure this discrepancy with the relative conditional effective sample size (rCESS) (Zhou et al., 2016), which normalizes the conditional effective sample size (CESS) to lie between zero and one. The rCESS is defined as

rCESS_r(W_{r−1}, w̃_r) = [Σ_{k=1}^{K} W_{r−1}^{(k)} w̃_r^{(k)}]² / Σ_{k=1}^{K} W_{r−1}^{(k)} (w̃_r^{(k)})²,

which takes a value between 1/K and 1. The rCESS is equal to the relative ESS if we resample at every SMC iteration. Using the fact that w̃_r^{(k)} = [γ(β̃_{r−1}^{(k)}) / ρ(β̃_{r−1}^{(k)})]^{α_r − α_{r−1}}, rCESS_r is a decreasing function of α_r for α_r ∈ (α_{r−1}, 1]. We control the rCESS over iterations by selecting the annealing parameter α_r such that

f(α_r) := rCESS_r(α_r) = φ,

where φ is a value between 0 and 1. A small value of φ leads to a high value of α_r, while a large value of φ leads to a low value of α_r. There is no closed-form solution α* of f(α) = φ, but we can use a bisection method for this one-dimensional search on the interval (α_{r−1}, 1]. Since f(α_{r−1}) − φ > 0, f(1) − φ < 0 (if f(1) ≥ φ, we simply set α_r = 1), and f(α) − φ is continuous, the existence of a solution α* of f(α) = φ is guaranteed. Algorithm 1 provides a detailed description of the SMC algorithm.
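The bisection search for α_r can be sketched as follows, using a random-normal stand-in for the log-ratio log(γ/ρ) evaluated at the particles (an assumption purely for illustration):

```python
import numpy as np

def rcess(W_prev, log_ratio, delta):
    """Relative CESS for an annealing step of size delta, computed in log space."""
    logw = delta * log_ratio
    w = np.exp(logw - np.max(logw))          # rCESS is invariant to rescaling w
    return np.sum(W_prev * w) ** 2 / np.sum(W_prev * w ** 2)

def next_alpha(W_prev, log_ratio, a_prev, phi, iters=100):
    """Bisection search for alpha_r in (a_prev, 1] with rCESS(alpha_r) = phi."""
    if rcess(W_prev, log_ratio, 1.0 - a_prev) >= phi:
        return 1.0                            # final target already reachable
    lo, hi = a_prev, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rcess(W_prev, log_ratio, mid - a_prev) > phi:
            lo = mid                          # step too conservative; go further
        else:
            hi = mid                          # step degrades rCESS too much; back off
    return 0.5 * (lo + hi)

rng = np.random.default_rng(7)
logratio = rng.normal(size=200)               # stand-in for log(gamma/rho) at particles
W_prev = np.full(200, 1.0 / 200)              # equally weighted particles
alpha1 = next_alpha(W_prev, logratio, 0.0, phi=0.9)
```

A larger φ forces rCESS to stay high at every step, producing smaller increments and therefore more intermediate distributions, which matches the trade-off discussed above.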

Algorithm 1: Annealed SMC for Bayesian DE inference.
1. Initialize particles {β_0^{(k)}}_{k=1}^K by sampling from ρ(β); initialize weights to unity, w_0^{(k)} = 1; set α_0 = 0.
2. For r = 1, 2, ...:
(a) Compute the annealing parameter α_r using a bisection method to solve rCESS_r(α) = φ.
(b) Compute the unnormalized weights w_r^{(k)} = w_{r−1}^{(k)} [γ(β̃_{r−1}^{(k)})/ρ(β̃_{r−1}^{(k)})]^{α_r − α_{r−1}} and normalize them.
(c) Sample particles β_r^{(k)} with one MCMC move admitting π_r as its stationary/invariant distribution, using particles β̃_{r−1}^{(k)} and the propagation step in Supplementary Section 1.
(d) If α_r = 1, return the current particle population.
(e) If rESS < ς, resample the particles and reset the weights w_r^{(k)} = 1 for k ∈ {1, ..., K}.

Real data analysis

In the dynamic system of the blowfly population, resource limitation acts with a time delay, roughly equal to the time for an egg to grow into a pupa. One classic experiment on resource competition in laboratory populations of Australian sheep blowflies (Lucilia cuprina) was conducted by Nicholson (1954).
The blowflies were cultivated in a room with the temperature maintained at 25°C, and the population was measured every day for approximately one year. Figure 2 displays the counts of blowflies over time studied in Nicholson (1954); the time unit is one day. The oscillations displayed in the blowfly population are caused by the time lag between stimulus and reaction (Berezansky et al., 2010). May (1976) proposed to model the counts of blowflies with the DDE model

dx(t)/dt = ν x(t) [1 − x(t − τ)/P],

where x(t) is the blowfly population, ν is the rate of increase of the blowfly population, P is a resource limitation parameter set by the supply of food, and τ is the time delay, roughly equal to the time for an egg to grow into a pupa. Our goal is to estimate the initial value x(0) and the three parameters ν, P, and τ from the noisy Nicholson's blowfly data y(t). The observed count y(t) is assumed to follow a lognormal distribution such that log y(t) ~ N(log x(t), σ²).
The blowfly count x(t) is a positive function. Instead of modelling the constrained function x(t) directly with a linear combination of cubic B-spline basis functions, we write x(t) = e^{W(t)} and model the unconstrained function W(t) = Φ(t)' c with B-spline basis functions. Equivalently, we solve the delay differential equation for W(t), with noisy observations log y(t) ~ N(W(t), σ²).
We approximate the DDE solution using cubic B-splines with 34 equally spaced interior knots over the time span; the total number of knots is 36, and the total number of cubic B-spline functions is L = 38. Selection of the number of basis functions is explored in Section 5.1.2. Our prior/reference distributions for the parameters of interest (c, ν, P, τ, σ², λ) include ν ~ N(0, 5²)I(ν > 0), P ~ N(0, 5²)I(P > 0), and τ ~ Unif(0, 50). In our adaptive SMC, we set the rCESS threshold φ = 0.9 and the resampling threshold ς = 0.5. The number of particles is K = 500. Under this setting, the number of SMC iterations is R = 227. Figure 2 in the Supplementary Material shows the annealing parameter sequence α_{1:R} under the adaptive scheme. In Section 5.1.2, we compare the performance of our method using different values of φ and K. Table 1 reports the estimates for the DDE parameters in Equation (10). Note that our point estimates are similar to those obtained by Wang and Cao (2012), in which the same nonparametric function expressed using B-splines is estimated by maximizing the DDE-defined penalized likelihood function. However, the uncertainty of these parameters is significantly underestimated by their frequentist approach; in contrast, our Bayesian approach provides more reasonable estimates of the parameter uncertainty. More concretely, we compare the estimates of the main parameter of interest, the delay parameter τ, which can be interpreted as the time for an egg to grow into a pupa. From Table 1, our posterior mean of τ and its 95% CI is 8.368 (5.656, 9.916), while the maximum likelihood estimate of τ is 8.781 with a standard error of 0.039 in Wang and Cao (2012).
Recall that ν is the rate of increase of the blowfly population and τ is the time delay, roughly the time for an egg to grow into a pupa. The small positive correlation between τ and ν suggests that a longer egg-to-pupa development time is associated with a slightly higher rate of increase of the population. The parameter P is related to resource limitation. The relatively large positive correlation between ν and P is easily understood: the blowfly population grows faster when there is a larger food supply. The tiny positive correlation between τ and P implies that the amount of food supply has little impact on the duration of the pupal stage.

Simulation Study
We use simulation studies to demonstrate the effectiveness of our proposed model and method.The experiments include both ODE and DDE examples.We use the R package deSolve (Soetaert et al., 2010) to simulate differential equations.

A nonlinear ordinary differential equation example
In this section, we use a nonlinear ODE example to illustrate the numerical behaviour of the SMC algorithm.
We generate ODE trajectories according to the following ODE system, where θ_1 = 2, θ_2 = 1, and the initial conditions are x_1(0) = 7 and x_2(0) = −10. The observations y_i are simulated from a normal distribution with mean x_i(t|θ) and variance σ_i², where σ_1 = 1 and σ_2 = 3. We generate 121 observations for each ODE function, equally spaced within [0, 60] (see Figure 3 in the Supplementary Material). Under this setting, the posterior distribution of θ_1 and θ_2 has multiple local modes (see Figure 1).
We use cubic B-spline functions (see Figure 1 in the Supplementary Material) to represent the ODE trajectories, placing equally spaced knots, one for every eight observations. The total number of knots is 16, including 14 interior knots, and the total number of cubic B-spline functions is L_1 = L_2 = 18. We select weak prior/reference distributions of β for the SMC algorithm.
We simulate the ODE trajectories using the "lsoda" method in the R package deSolve (Soetaert et al., 2010). In SMC-spline, we set the rCESS threshold φ = 0.9 and the resampling threshold ς = 0.5, and use K = 500 particles. Given samples θ^{(n)}, x(0)^{(n)} in the SMC-deSolve and MCMC-deSolve methods, we use the "euler" method in deSolve to solve the ODEs and obtain x(t_{ij}|θ^{(n)}, x(0)^{(n)}); we purposely use a different ODE solver to mimic the fact that no data-generation information is available for real data. We select weak prior distributions for θ and x(0). In the SMC-deSolve algorithms, we use K = 500 and φ = 0.999 and utilize the same prior distributions and MCMC moves as in the MCMC-deSolve methods. We run both MCMC algorithms for 400,000 iterations. [A table of posterior estimates appears here; e.g., for x_2(0) (true value −10), the reported estimates with 95% CIs are −10.66 (−17.86, −2.26), −4.47 (−5.63, −3.23), −10.28 (−14.18, −6.71), and 0.68 (0.65, 0.70) across the compared methods.] We also run SMC-deSolve and MCMC-deSolve using the "lsoda" method in deSolve with the same settings. Note that "lsoda" is the ODE solver used to simulate the data, and it will therefore favour SMC-deSolve and MCMC-deSolve. MCMC-deSolve does not converge to the posterior distribution. Table 3 displays the posterior means of θ and x(0) together with the 95% CIs from SMC-deSolve and MCMC-deSolve. The posterior means from SMC-deSolve are close to the true values, and the credible intervals are narrow; the true value of θ_2 lies on the boundary of its 95% CI.
The methods with DE solvers rely heavily on the choice of numerical solver, and they tend to ignore the uncertainty from approximating the DE solutions with these solvers.

Comparison of SMCs using different tuning parameters
We conduct experiments investigating the selection of the tuning parameters φ, K, λ, and the number of basis functions.
1. The parameter estimates get closer to the true values, and the RMSE of the ODE trajectories gets smaller, when we increase the rCESS threshold φ. A higher rCESS threshold is equivalent to more intermediate target distributions.
2. The proposed SMC method performs better when we use a large number of particles.
3. For a given amount of computation, a relatively small K and a large φ are optimal. However, a very small value of K is not recommended, as it may lead to large Monte Carlo variance.
4. A sufficient number of basis functions is important in ODE trajectory estimation. However, we do not recommend an overly large number of basis functions, because it gains little in improving the approximation of the ODE trajectories and makes sampling the high-dimensional basis function coefficients challenging.
5. The Bayesian scheme for sampling λ performs satisfactorily in terms of parameter estimates and estimated ODE trajectories.
The details of the experiments are given in Supplementary Sections 4.1.2 and 4.1.3.

Hutchinson's equation
Our first DDE example is Hutchinson's equation, which is used to model the blowfly data in Section 4; τ, ν, and P are the parameters of interest in the DDE. We set x(0) = 3500, τ = 3, ν = 0.8, and P = 2 to simulate the DDE trajectory. The DDE trajectory is observed with measurement error; the error is lognormally distributed with mean 0 and standard deviation σ. We investigate the influence of the number of observations per time step and the influence of the standard deviation σ of the error. We simulate data sets under two scenarios. In the first scenario, we simulate 3 data sets, with 101, 201, and 401 observations respectively, equally spaced in [0, 100]; the standard deviation of the error is σ = 0.4. In the second scenario, we simulate 36 data sets with 201 observations equally spaced in [0, 100]; the standard deviations of the error for the 36 data sets are σ = (0.1, 0.5, 1.5), with 12 data sets for each level of σ.
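Hutchinson's equation is the delayed logistic model, commonly written as dx(t)/dt = ν x(t)[1 − x(t − τ)/P]. As a rough standalone stand-in for the deSolve-based simulation used in the paper, a fixed-step Euler scheme with a constant pre-history can generate such trajectories (the step size, constant-history convention, and function name below are our own choices):

```python
import numpy as np

def simulate_hutchinson(x0, nu, P, tau, t_max, dt=0.01):
    """Euler integration of dx/dt = nu * x(t) * (1 - x(t - tau) / P).

    The history is assumed constant at x0 for t <= 0.
    """
    n = int(round(t_max / dt))
    lag = int(round(tau / dt))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x_delay = x0 if i < lag else x[i - lag]   # delayed state x(t - tau)
        x[i + 1] = x[i] + dt * nu * x[i] * (1.0 - x_delay / P)
    return np.linspace(0.0, t_max, n + 1), x
```

Lognormal measurement noise as in this simulation study can then be added multiplicatively, e.g. `y = x * np.exp(sigma * rng.standard_normal(x.size))`.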
We enforce the positivity constraint via the transformation x(t) = e^{W(t)} and use B-spline basis functions to model the unconstrained function W(t) = Φ(t)'c. This is equivalent to solving the delay differential equation displayed in Equation (10). We place 51 equally spaced knots in [0, 100], including 49 interior knots. The total number of cubic B-spline functions is L = 53. The hyper-parameters in the DDE parameter prior/reference distributions and the sequential Monte Carlo setup are the same as in Section 4.
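The basis representation W(t) = Φ(t)'c can be built with any standard cubic B-spline routine. As one illustration (using scipy, which the paper does not use; the function name is ours), padding the 51 knots on [0, 100] with repeated boundary knots yields exactly L = 53 cubic basis functions:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(knots, degree=3):
    """Return a function t -> design matrix Phi(t) of B-spline basis values."""
    # Repeat the boundary knots (clamped spline) so the bases span the ends.
    full = np.concatenate([np.repeat(knots[0], degree),
                           knots,
                           np.repeat(knots[-1], degree)])
    n_basis = len(full) - degree - 1   # = len(knots) + degree - 1

    def design(t):
        t = np.atleast_1d(t)
        cols = []
        for j in range(n_basis):
            coef = np.zeros(n_basis)
            coef[j] = 1.0                      # isolate the j-th basis function
            cols.append(BSpline(full, coef, degree, extrapolate=False)(t))
        return np.nan_to_num(np.column_stack(cols))

    return design, n_basis

knots = np.linspace(0.0, 100.0, 51)
design, L = bspline_design(knots)              # L = 53 cubic basis functions
Phi = design(np.linspace(0.0, 100.0, 201))     # design matrix on a fine grid
```

W(t) is then the matrix-vector product `design(t) @ c`, and x(t) = exp(W(t)) is positive by construction.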
Table 4 displays the estimated parameters (ν, P, τ, W(0)) and the RMSE for W(t), defined in Equation (12) in the Supplementary, using the data sets simulated in the first scenario. For the same DDE function, a larger number of observations improves the estimation. Figure 6 shows the estimated parameters (ν, P, τ, W(0)) and the RMSE using the data sets simulated in the second scenario; it indicates that a smaller value of σ improves the estimation. We also evaluate the quality of the uncertainty estimation of the parameters through a simulation study using Hutchinson's DDE model. The true parameters are set to ν = 0.22, P = 2, τ = 8, and σ = 0.2. We generate 50 data sets of 200 observations in the time interval [0, 130] with different random seeds and run our SMC algorithm on each of them. We use 16 equally spaced knots in [0, 130]. Table 5 shows the percentage of the 95% CIs that cover the true parameter values, together with the averaged 95% CI. The coverage probability is close to the nominal 95%.

A nonlinear delay differential equation example
In this section, we investigate a nonlinear delay differential equation model proposed by Monk (2003) to model the feedback inhibition of gene expression. The nonlinear DDE is described as follows: in Equation (13), x1(t) denotes the expression of mRNA at time t, and x2(t) denotes the expression of a protein at time t. There is a delayed repression of mRNA production by the protein. The DDE system depends on the transcriptional delay τ, the degradation rates μm and μp, the expression threshold p0, and the Hill coefficient n. As noted in Monk (2003), there is significant nonlinearity in the DDE system when the Hill coefficient n > 4. We simulate a delay differential equation system and noisy observations.

Discussion
We proposed an adaptive semi-parametric Bayesian framework to solve nonlinear differential equations and to estimate the DE parameters using an efficient annealed sequential Monte Carlo method. The main idea is to represent the DE trajectories by a linear combination of basis functions and to estimate the coefficients of these basis functions, together with the other DE parameters, using an annealed sequential Monte Carlo algorithm. The proposed method avoids DE solvers, which can be computationally expensive and sensitive to the initial state and model parameters. Our Bayesian method has two clear advantages over its frequentist counterpart. First, the Bayesian method easily provides uncertainty estimates. Second, we avoid expensive tuning of the global smoothing parameter by treating it in the same way as the other parameters.
We represent the DE variables with a linear combination of basis functions. A prior distribution on the basis function coefficients is used to control the trade-off between fit to the data and fidelity to the DE model. Our model is related to the generalized profiling approach developed by Ramsay et al. (2007), in which the coefficients of the basis functions and the DE parameters are estimated by a penalized smoothing procedure. Qi et al. (2010) investigated the asymptotic bias induced by the spline approximation in the generalized profiling approach, and Pang et al. (2017) loosened the assumptions for the asymptotic properties. We refer readers to Qi et al. (2010) and Pang et al. (2017) for the details of the assumptions, theorems, and proofs.
We developed a sequential Monte Carlo method in an annealing framework to estimate the DE parameters. The annealed SMC considers the same parameter space for all the intermediate distributions.
Consequently, MCMC moves used in the literature on Bayesian inference for differential equations can be repurposed as SMC proposal distributions in the annealed SMC, and more advanced MCMC moves can be used to further improve its performance. Annealed SMC is preferred over MCMC algorithms for several reasons. First, the developed SMC method can fully explore the multimodal posterior surface of the DE parameters. Second, the proposed method is a semi-automatic algorithm that requires minimal tuning from the user; given a criterion for the relative conditional effective sample size and the number of particles, it adaptively chooses the sequence of annealing parameters that determines the intermediate target distributions of the SMC. Third, the annealed SMC is embarrassingly parallel. Unlike running an MCMC chain for a long time until it converges, the annealed SMC is more efficient because a large number of particles can be run on different CPUs or GPUs simultaneously.
We used different simulation scenarios to explore the numerical behavior of our model and method, and demonstrated that it performs well in both ODE and DDE parameter estimation. Our simulation studies provide guidelines for choosing the rCESS threshold and the number of particles: to obtain accurate estimates of the DE parameters from the annealed SMC, a rule of thumb is to choose a large rCESS threshold and to avoid an extremely small number of particles. We also applied our method to a real data example, modelling the population dynamics of blowflies with a delay differential equation. The delay parameter in DDEs is usually challenging to estimate, but our application shows that our method is superior to the previous frequentist method.
There are several possible improvements and extensions of our proposed method for future work. There may be other satisfactory ways to construct the intermediate distributions for our SMC algorithm such that the last distribution of the sequence is the target distribution. In theory, when the number of particles and the number of SMC iterations are large enough, such an alternative SMC algorithm can also approximate the same target distribution well; in practice, different sequences of intermediate distributions may perform differently under the same computing budget. It is therefore worth exploring and comparing alternative constructions of the intermediate distributions. For simplicity, we have used the same reference distributions as the prior distributions for most of the parameters; the performance of the annealed SMC can be improved by using reference distributions that are closer to the target distribution. In all of our current numerical experiments, the knots for the smoothing splines are equally spaced and their number is pre-determined before running the experiments. In future work, we will explore using a smaller number of well-placed knots and letting the data determine the number of knots and their locations. Adaptive control of the knots in smoothing splines will benefit the estimation of DEs, especially those with sharp changes. In practice, several DE models are often proposed to describe the same dynamic system, which requires selection among various differential equation models; one direction of future work is to explore model selection methods for DEs. Another line of future work is to develop more scalable SMC algorithms for estimating parameters in large systems of differential equations.

Supplementary Materials
Technical material: The file supplementary.pdf (PDF file) provides details of the algorithms and simulation studies in the article.

R-package:
The R package "smcDE" implements our proposed methods, the real data analysis, and the simulation studies. It is available from https://github.com/shijiaw/smcDE.

• If the full conditional distribution πr(ci|τ, θ, σ, c−i, λ) does not admit a closed form, we perform a random walk MH algorithm with a Gaussian kernel.
• If the full conditional distribution πr(θ|c, τ, λ) does not admit a closed form, we perform a random walk MH algorithm with a Gaussian kernel.

Three Monte Carlo methods for inference of parameters in Bayesian differential equations

In this section, we introduce three other Monte Carlo methods for inference of parameters in Bayesian differential equations.

Markov chain Monte Carlo targeting π(β)
The first approach we introduce is Markov chain Monte Carlo targeting π(β). We name it MCMC-spline for simplicity.
We perform a random walk MH algorithm with a Gaussian kernel to propose τ .
• If the full conditional distribution π(ci|τ, θ, σ, c−i, λ) does not admit a closed form, we perform a random walk MH algorithm with a Gaussian kernel.
• If the full conditional distribution π(θ|c, τ, λ) does not admit a closed form, we perform a random walk MH algorithm with a Gaussian kernel.
• The full conditional posterior distribution of λ is Gamma distributed.

The second method we introduce is the classical Markov chain Monte Carlo algorithm targeting π(θ, τ, x(0), σ²). We name it MCMC-deSolve for simplicity. We follow the notation described in Section 2 of the main manuscript; the joint likelihood function of θ, τ, x(0), and σ² follows from the measurement error model. To construct a Bayesian framework, we assign appropriate prior distributions to the model parameters θ, τ, x(0), and σ², denoted by π0(θ), π0(τ), π0(x(0)), and π0(σ²). The full conditional posterior distribution of σi² is Inverse-Gamma distributed. The full conditional posterior distributions of θ, τ, and x(0) do not have analytical forms, so we conduct random walk MH algorithms to sample new parameters. We use τ as an illustrative example: conditional on the samples at the n-th iteration, θ(n), x(0)(n), and σ²(n), we solve the DEs numerically to obtain x(tij|θ(n), τ, x(0)(n)).
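To make the random-walk step concrete, here is a minimal MCMC-deSolve-style sketch for an ODE (τ = 0), substituting scipy's solver for deSolve. The toy exponential-decay model dx/dt = −θx, the Exp(1) prior, the noise variance, and the step size are our own illustrative assumptions, not the paper's setup:

```python
import numpy as np
from scipy.integrate import solve_ivp

def log_post(theta, t_obs, y_obs, sigma2=0.01):
    """Unnormalized log-posterior: solve the ODE numerically, then score."""
    if theta <= 0:                       # prior support: theta > 0
        return -np.inf
    sol = solve_ivp(lambda t, x: -theta * x, (0.0, t_obs[-1]), [1.0],
                    t_eval=t_obs, rtol=1e-8)
    resid = y_obs - sol.y[0]             # additive Gaussian measurement error
    return -0.5 * np.sum(resid ** 2) / sigma2 - theta   # Exp(1) prior on theta

def rw_mh(t_obs, y_obs, n_iter=2000, step=0.1, seed=0):
    """Random walk Metropolis-Hastings with a Gaussian proposal kernel."""
    rng = np.random.default_rng(seed)
    theta, lp = 1.0, log_post(1.0, t_obs, y_obs)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lp_prop = log_post(prop, t_obs, y_obs)
        if np.log(rng.uniform()) < lp_prop - lp:        # MH acceptance step
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain
```

Every log-posterior evaluation requires a full numerical ODE solve, which is precisely the cost the spline representation in the main text is designed to avoid.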
The third method combines annealed SMC with a numerical DE solver; we name it SMC-deSolve for simplicity. The SMC-deSolve framework is the same as described in Section 3 of the main manuscript.
We iterate between the weighting, propagation, and resampling steps to obtain approximations of the intermediate target posterior distributions. We refer readers to Section 3.1 for details of the weighting and resampling steps. We use the MCMC-deSolve moves described in the previous section to propagate new particles.

Real Data Analysis
Figure 2 shows the annealing parameter sequence α 1:R under the adaptive scheme.

Simulation Study
We use simulation studies to demonstrate the effectiveness of our proposed model and method.
The experiments include both ODE and DDE examples. We use the R package deSolve (Soetaert et al., 2010) to simulate the differential equations.

A nonlinear ordinary differential equation example
In this section, we use a nonlinear ODE example to illustrate the numerical behaviour of the SMC algorithm. We generate ODE trajectories according to the following ODE system, where θ1 = 2 and θ2 = 1, and the initial conditions are x1(0) = 7 and x2(0) = −10. The observations yi are simulated from a normal distribution with mean xi(t|θ) and variance σi², where σ1 = 1 and σ2 = 3. We generate 121 observations for each ODE function, equally spaced within [0, 60] (see Figure 3). We use cubic B-spline functions (see Figure 1) to represent the ODE trajectories. We place equally spaced knots at every eighth observation; the total number of knots is 16, including 14 interior knots, and the total number of cubic B-spline functions is L1 = L2 = 18. We select weak prior/reference distributions of β for the SMC algorithm: σ1² ∼ IG(1, 1), σ2² ∼ IG(1, 1), λ ∼ Gamma(1, 1).

One bimodal example
We first alter Equation (10) to produce a symmetric, bimodal posterior for θ1. In our adaptive SMC, we set the rCESS threshold φ = 0.9 and the resampling threshold ς = 0.5.
The total number of particles we use is K = 500. Under this setting, the number of annealing parameters is 752. We show the approximated intermediate posterior distributions for θ and σ for r = 1, R/6, R/2, R, with colors from light grey to dark grey (see the two panels in the first row of Figure 4). As the annealing parameter increases, the particles gradually move toward the posterior distribution. We create two main modes for θ1 in Equation (11). The SMC algorithm is able to find both global modes of θ1 while avoiding getting stuck in local modes. We report the estimated ODE trajectories and the 95% pointwise credible bands in the two bottom panels of Figure 4. The estimated mean ODE trajectories are very close to the true ODE trajectories.
The 95% pointwise credible bands cover the true ODE trajectories.

Comparison of SMCs using different φ and K
We conduct experiments to investigate the performance of the SMC algorithm under different choices of φ and K, using the ODE system shown in Equation (10). We simulate 20 datasets according to Section 4.1 and compare the performance of the SMC algorithm in terms of the estimated θ, σ, and ODE trajectories. With the estimated basis coefficients ĉi, we compute the estimated i-th ODE trajectory x̂i(t) = Φi(t)'ĉi, and we measure the distance between x̂i(t) and the true ODE trajectory xi(t) used to simulate the data by the RMSE. We select three levels of the rCESS threshold φ (φ = 0.8, 0.9, 0.99). We place equally spaced interior knots at every 12th observation; the total number of basis functions is 13.
The number of particles we use is K = 500. For each level of φ, we run the SMC algorithm 20 times (once for each dataset). Figure 5 displays boxplots for θ, σ, and RMSE(xi(t)) from SMC with the different levels of φ. We then select three levels of K (K = 10, 100, 2000), again with equally spaced interior knots and 13 basis functions, and set the rCESS threshold φ = 0.9. For each level of K, we run the SMC algorithm 20 times (once for each dataset). Figure 6 displays boxplots for θ, σ, and RMSE(xi(t)) from SMC with the different levels of K. It indicates that the proposed SMC method performs better when we use a large number of particles. The consistency of the SMC algorithm holds as the number of particles goes to infinity (Chopin, 2004; Wang et al., 2020; Del Moral et al., 2006). However, we cannot use an arbitrarily large value of K, as the computational cost of the SMC algorithm is a linear function of K. We instead recommend increasing φ in SMC (using a larger number of intermediate distributions R), as increasing R does not increase the memory burden. In addition, we conducted an experiment to investigate the relative importance of φ and K in improving the quality of estimation by SMC. We selected four (K, φ) combinations: (K, φ) = (10, 0.99), (100, 0.8), (1000, 0.08), (6000, 0.001). For each value of K, we selected a corresponding value of φ such that the total computational cost K · R is close to 80,000. We simulate 20 datasets according to Section 4.1. Figure 7 displays boxplots of the ODE parameter estimates for the different (K, φ) combinations. These results indicate that for a given amount of computation, a relatively small K and a large φ are optimal. However, a very small value of K is not recommended, as it may lead to large Monte Carlo variance. The parameter estimates get closer to the true values, and the RMSE values of the estimated ODE trajectories decrease, when we increase the number of basis functions from 7 to 16. However, the parameter estimates and the RMSE of the estimated ODE trajectories become worse when we use a very large number of basis functions. This experiment indicates that a sufficient number of basis functions is important in ODE trajectory estimation, but we do not recommend an overly large number of basis functions, as it can cause overfitting and induces a heavy computational cost.
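The RMSE criterion compares the spline-reconstructed trajectory with the true trajectory on a time grid. A minimal sketch of this computation, under our reading of the (elided) Equation (12) as a root mean squared error on the grid, with our own function name:

```python
import numpy as np

def fitted_trajectory_rmse(Phi, c_hat, x_true_grid):
    """RMSE between the basis-expansion estimate x_hat(t) = Phi @ c_hat
    and the true trajectory x(t) evaluated on the same time grid."""
    x_hat = Phi @ c_hat                      # reconstructed trajectory values
    return np.sqrt(np.mean((x_hat - np.asarray(x_true_grid)) ** 2))
```

Here `Phi` is the basis design matrix on the evaluation grid and `c_hat` is a posterior summary (e.g. the posterior mean) of the basis coefficients.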
The second experiment compares the performance of the SMC algorithm under different fixed choices of λ (λ = 0.1, 1, 10, 100) against the Bayesian scheme (BS). We place equally spaced knots and set the number of basis functions to 16. We set K = 1000 and the rCESS threshold φ = 0.95 for the SMC algorithm. For each choice of λ, we run the SMC algorithm with one replicate on each of the 40 datasets simulated according to Section 4.1.

A nonlinear delay differential equation example
In this section, we investigate a nonlinear delay differential equation model proposed by Monk (2003) to model the feedback inhibition of gene expression. The nonlinear DDE is described as follows: in Equation (13), x1(t) denotes the expression of mRNA at time t, and x2(t) denotes the expression of a protein at time t. There is a delayed repression of mRNA production by the protein. The DDE system depends on the transcriptional delay τ, the degradation rates μm and μp, the expression threshold p0, and the Hill coefficient n. As noted in Monk (2003), there is significant nonlinearity in the DDE system when the Hill coefficient n > 4.
We simulate a delay differential equation system with τ = 25, p0 = 100, μm = 0.03, μp = 0.03, and n = 8. The observations yi(t) are simulated from a normal distribution with mean xi(t|θ) and variance σi², where σ1 = 1 and σ2 = 5. We generate 101 observations for each DDE function, equally spaced in [0, 500]. Figure 11 shows the simulated DDE system, which exhibits oscillations in mRNA and protein expression.
In our adaptive SMC, we set φ = 0.9 and the resampling threshold ς = 0.5. The total number of particles we use is K = 300. Under this setting, the number of annealing parameters is R = 850.
We show the parameter estimates and the corresponding 95% credible interval (CI) in Table 1.
The posterior means of the parameters are fairly close to the true values, and the 95% credible intervals cover the true values. The estimated posterior mean of λ is 0.225. The initial conditions are also recovered: x1(0) (true value 7) is estimated as 6.87 (4.39, 9.14), and x2(0) (true value −10) as −10.75 (−16.66, −5.06). We report the estimated DDE trajectories and the 95% pointwise credible intervals in Figure 12. The estimated mean DDE trajectories are generally very close to the true DDE trajectories.
The 95% pointwise credible intervals cover the true DDE trajectories.
We use a DE model (Figure 1 (b)) to show an example of the log-likelihood surface over the DE parameters θ; for the setup of this model, we refer to Section 5.1. The log-likelihood surface for θ has multiple isolated modes, and it is very sensitive to different parameter values.

Figure 1: (a) Graphical representation of DEs; (b) Log-likelihood surface for a DE model.
Figure 1 (a) represents the graphical structure of the proposed DE model. The unknown parameters in our DE model include the spline coefficients ci, the time delay parameter τ (which is known to be τ = 0 in an ODE), the DE parameters θ, and the variance parameters σi². With the basis function representation, finding the DE solution becomes a problem of estimating the basis function coefficients, c = (c1, c2, ..., cI)'. In the Bayesian framework, we specify a prior distribution for c conditional on the DE parameters θ, τ, and a smoothing parameter λ, of the form π0(c|θ, τ, λ) ∝ exp{−(λ/2) Σi ∫ [dxi(t)/dt − gi(x(t), x(t − τ)|θ)]² dt}, where xi(t) = Φi(t)'ci. This prior distribution measures how well the estimated DE variables x(t) satisfy the DE system defined on [t1, tmax]. It is based on treating the penalty term proposed in Ramsay et al. (2007) as a prior, similar to how Berry et al. (2002) incorporated a penalty as a prior. The smoothing parameter λ controls the trade-off between fit to the data and fidelity to the DE model. Details on selecting a proper λ are discussed in Section 2.3.
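The fidelity penalty inside this prior can be evaluated numerically. A sketch under our own simplifications (a single state variable, trapezoid quadrature on a fixed grid, with the basis matrix and its time derivative supplied by the caller):

```python
import numpy as np

def log_prior_c(c, Phi, dPhi, g, lam, t_grid):
    """Unnormalized log of pi_0(c | theta, tau, lambda):
    -(lambda / 2) * integral of (x'(t) - g(x(t)))^2 dt,
    with x(t) = Phi(t) @ c approximated on a quadrature grid."""
    x = Phi @ c            # trajectory values on the grid
    dx = dPhi @ c          # trajectory derivative on the grid
    resid = dx - g(x)      # pointwise violation of the DE
    sq = resid ** 2
    dt = np.diff(t_grid)
    integral = np.sum(0.5 * dt * (sq[:-1] + sq[1:]))   # trapezoid rule
    return -0.5 * lam * integral
```

If the spline trajectory satisfies the DE exactly, the residual vanishes and the log-prior is 0 (its maximum); larger λ penalizes departures from the DE more heavily.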

First, in online problems, data arrive sequentially and we aim to do inference sequentially in time. The distribution πr(β) is then the unnormalized posterior distribution conditioned on the first r batches of data. Second, as in this work, a sequence of intermediate target distributions is introduced to facilitate the exploration of the state space. At each step r, we use a collection of K samples, denoted {βr(k)}k=1,...,K, to represent πr(β). Each of these K samples is called a particle, and a positive weight is associated with each particle βr(k). From iteration r to r + 1, we move particles from {βr−1(k)} to {βr(k)}, following the seminal work of Del Moral et al. (2012); Zhou et al. (2016); Wang et al. (2020). Note that the weight function (Equation 8) for iteration r only depends on the particles of the (r − 1)-th iteration and on the difference between two successive annealing parameters, αr − αr−1. This indicates that we can "manipulate" wr(k) by changing the annealing parameter αr. If αr is close to αr−1, the incremental weight function is closer to 1 than it would be if we chose a larger value of αr. This provides the intuition that we can control the discrepancy between two successive intermediate target distributions by manipulating αr.
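In the adaptive scheme, the next αr is chosen by solving rCESS(αr) = φ, which is monotone in αr and can be found by bisection. A sketch of this step as we understand the rCESS-based scheme of Zhou et al. (2016), with our own function names, where the incremental weights are powers of the per-particle likelihood, u_k = L_k^{αr − αr−1}:

```python
import numpy as np

def rcess(log_w_prev, log_like, delta):
    """Relative conditional ESS for annealing increment delta."""
    W = np.exp(log_w_prev - np.max(log_w_prev))
    W /= W.sum()                               # normalized previous weights
    log_u = delta * log_like
    u = np.exp(log_u - np.max(log_u))          # incremental weights, rescaled
    return np.sum(W * u) ** 2 / np.sum(W * u ** 2)

def next_alpha(alpha_prev, log_w_prev, log_like, phi, tol=1e-10):
    """Bisection search for the next annealing parameter alpha_r such that
    rCESS(alpha_r - alpha_prev) = phi; returns 1.0 if already reachable."""
    if rcess(log_w_prev, log_like, 1.0 - alpha_prev) >= phi:
        return 1.0                             # jump straight to the posterior
    lo, hi = 0.0, 1.0 - alpha_prev
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rcess(log_w_prev, log_like, mid) >= phi:
            lo = mid                           # increment still mild enough
        else:
            hi = mid
    return alpha_prev + lo
```

A larger threshold φ forces smaller increments αr − αr−1 and hence more intermediate distributions, matching the tuning behaviour reported in the simulation studies.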

Figure 2: Blowfly population in one experiment published in Nicholson (1954); the time unit is one day.

Figure 3: Estimated posterior mean trajectory and 95% pointwise credible intervals for the delay differential equation modelling the population dynamics of blowflies.

Figure 5 shows the comparison of the four algorithms in terms of estimating θ. Panels (a) and (b) show samples of the intermediate posterior distributions for θ obtained by running SMC-spline and SMC-deSolve, respectively. The points with colors from light grey to dark grey are samples for r = 1, R/6, R/2, R, where R = 942 in SMC-spline and R = 1220 in SMC-deSolve. Panels (c) and (d) display the trace plots for θ obtained by running MCMC-spline and MCMC-deSolve, respectively. The black dots indicate the true parameter values in the ODE. For SMC-spline and SMC-deSolve, the particles gradually move towards the posterior distribution as the annealing parameter increases. For MCMC-spline, the acceptance ratio of the MH algorithm is 21.4%; it explores one mode rather than the two modes created in Equation (12). For MCMC-deSolve, the acceptance rate of the MH algorithm is 25.2%; it gets stuck in local modes close to the initial value and cannot explore the two modes. Table 2 shows the posterior means of θ and x(0) as well as the corresponding 95% credible intervals (CIs) using SMC-spline, SMC-deSolve, MCMC-spline, and MCMC-deSolve. SMC-spline and MCMC-spline provide posterior means close to the true values, and the corresponding credible intervals cover the true values, while MCMC-spline can only explore one mode of θ_1. Our experiment in Supplementary Section 4.1.1 reports the estimated ODE trajectories and the corresponding credible bands.

Figure 5: (a) and (b): intermediate posterior distributions for θ obtained by running SMC-spline and SMC-deSolve, respectively. The points with colors from light grey to dark grey are samples for r = 1, R/6, R/2, R. (c) and (d): trace plots for θ obtained by running MCMC-spline and MCMC-deSolve, respectively. The black dots indicate the true parameter values used to generate the ODE data.
We use B-spline functions to represent the DDE trajectories and use SMC to estimate the parameters. The posterior means of the parameters are fairly close to the true values, and the 95% credible intervals cover the true values. The estimated mean DDE trajectories are generally very close to the true DDE trajectories, and the 95% pointwise confidence bands cover them. The details of the setups and results are given in Supplementary Section 4.2.

Figure 1: The thirteen B-spline basis functions defined on [0, 1] with degree d = 3 and 9 equally spaced knots; each basis function is positive over at most d + 1 adjacent subintervals. The continuity properties of cubic B-spline functions make the segment joints smooth (Vince and Vince, 2010).
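The basis in this figure can be reproduced with the Cox-de Boor recursion. The sketch below is illustrative (names are not from the smcDE package): with a clamped knot vector of degree d = 3 and 9 equally spaced interior knots on [0, 1], there are 4 + 9 + 4 = 17 knots and hence 17 − d − 1 = 13 basis functions, which are nonnegative and sum to 1 on [0, 1).

```python
import numpy as np

def bspline_basis(t, knots, i, d):
    """Cox-de Boor recursion: value of the i-th degree-d B-spline basis at t.
    Degree-0 pieces are indicators of the half-open intervals [t_i, t_{i+1})."""
    if d == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + d] > knots[i]:        # guard against 0/0 at repeated knots
        left = ((t - knots[i]) / (knots[i + d] - knots[i])
                * bspline_basis(t, knots, i, d - 1))
    if knots[i + d + 1] > knots[i + 1]:
        right = ((knots[i + d + 1] - t) / (knots[i + d + 1] - knots[i + 1])
                 * bspline_basis(t, knots, i + 1, d - 1))
    return left + right

d = 3
interior = np.linspace(0, 1, 11)[1:-1]              # 9 equally spaced interior knots
knots = np.r_[[0.0] * (d + 1), interior, [1.0] * (d + 1)]
n_basis = len(knots) - d - 1                        # = 13
ts = np.linspace(0, 1, 101)[:-1]                    # stay in [0, 1) for half-open pieces
B = np.array([[bspline_basis(t, knots, i, d) for i in range(n_basis)] for t in ts])
```

Each row of B holds the 13 basis values at one time point; the rows sum to 1 (partition of unity), which is why a constant coefficient vector reproduces a constant trajectory.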

Figure 3: Simulated ODE trajectories and observations. Red lines refer to simulated ODE trajectories and blue points to simulated observations.

Figure 4: Intermediate posterior distributions for θ and σ (two panels in the first row; points with colors from light grey to dark grey are samples for r = 1, R/6, R/2, R, and black dots indicate the true values used to generate the ODE data), and estimated ODE trajectories with 95% confidence bands (two panels in the second row).

Figure 5: ODE parameter estimates with different rCESS thresholds in SMC.

Figure 6: Boxplots of ODE parameter estimates with different values of K in SMC.
Figure 9 displays the ODE parameter estimates with different choices of λ. The Bayesian scheme performs satisfactorily in terms of parameter estimates and RMSE of the ODE trajectories. Figure 10 displays the posterior samples of λ for one SMC replicate.

Figure 8: ODE parameter estimates obtained from SMC with different numbers of knots.

Figure 9: ODE parameter estimates with different choices of λ.

Table 1: Posterior mean and corresponding 95% credible interval (CI) for parameters in the population dynamics of blowflies.

Table 4: Estimated parameters and MSE of W(t) for three simulated data sets.

Table 5: Coverage probability and average value of the 95% CI.
Figure 5 displays boxplots of the posterior samples of θ, σ, and RMSE(x_i(t)) from SMC with different rCESS thresholds. It indicates that the parameter estimates get closer to the true values, and the RMSE of the ODE trajectories gets smaller, as the rCESS threshold increases. A higher rCESS threshold corresponds to more intermediate target distributions.
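The last point can be checked numerically. The toy sketch below (not the smcDE implementation) counts annealing steps under likelihood tempering when each α_r is chosen by bisection so that the rCESS equals a fixed threshold; the log-likelihood values are made up, and the weights are kept uniform, as if resampling at every step. A higher threshold forces smaller steps and hence more intermediate distributions.

```python
import numpy as np
from scipy.special import logsumexp

def rcess(logW, loglik, delta):
    """rCESS of the step alpha -> alpha + delta under likelihood tempering."""
    logu = delta * loglik
    return np.exp(2.0 * logsumexp(logW + logu) - logsumexp(logW + 2.0 * logu))

def count_steps(loglik, thresh, iters=50):
    """Number of annealing steps until alpha reaches 1 when each step size is
    chosen by bisection so that rCESS meets `thresh`."""
    K = len(loglik)
    logW = np.full(K, -np.log(K))        # uniform weights throughout (toy setting)
    alpha, steps = 0.0, 0
    while alpha < 1.0:
        lo, hi = 0.0, 1.0 - alpha
        if rcess(logW, loglik, hi) >= thresh:
            alpha = 1.0                  # final step lands exactly on the posterior
        else:
            for _ in range(iters):       # bisection for the largest admissible step
                mid = 0.5 * (lo + hi)
                if rcess(logW, loglik, mid) >= thresh:
                    lo = mid
                else:
                    hi = mid
            alpha += lo
        steps += 1
    return steps

loglik = np.linspace(-20.0, 0.0, 200)    # hypothetical log-likelihoods
more = count_steps(loglik, 0.99)
fewer = count_steps(loglik, 0.9)
```

Here `more > fewer`: tightening the threshold from 0.9 to 0.99 multiplies the number of intermediate target distributions, mirroring the trade-off reported for the rCESS threshold.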

Table 1: Parameter estimates and the corresponding 95% credible interval (CI) for the nonlinear DDE.