Bootstrapping Two-Stage Quasi-Maximum Likelihood Estimators of Time Series Models

Abstract
This article provides results on the validity of bootstrap inference methods for two-stage quasi-maximum likelihood estimation involving time series data, such as those used for multivariate volatility models or copula-based models. Existing approaches require the researcher to compute and combine many first- and second-order derivatives, which can be difficult to do and is susceptible to error. Bootstrap methods are simpler to apply, allowing the substitution of capital (CPU cycles) for labor (keeping track of derivatives). We show the consistency of the bootstrap distribution and consistency of bootstrap variance estimators, thereby justifying the use of bootstrap percentile intervals and bootstrap standard errors.


Introduction
Many econometric models are estimated in stages in order to make them more computationally tractable. For example, copula-based models often have the marginal distribution parameters estimated in initial stages, and then the copula parameters estimated in a final stage (see, e.g., Joe 1995; Patton 2006). Similarly, the well-known Dynamic Conditional Correlation (DCC) GARCH model of Engle (2002) first estimates univariate GARCH models for each variable, and then estimates the correlation matrix component. In empirical finance, the ubiquitous Fama-MacBeth (1973) procedure involves estimating assets' exposures to risk factors in a first stage, before estimating the risk premia associated with those factors in a second stage. In all cases, valid inference on parameters in the final stage must account for estimation error accumulated in earlier stages.
Standard methods to account for estimation error from earlier estimation stages require computing many first- and second-order derivatives of the overall objective function. 1 In most cases, it is difficult or tedious to obtain these analytically, and many authors instead rely on numerical derivatives. Both of these approaches are susceptible to error: human, in the first case; numerical, in the second. Inference based on the bootstrap is an attractive alternative, allowing the substitution of capital (CPU cycles) for labor (keeping track of derivatives). See, for example, Cochrane (2001, chap. 15.2) for bootstrap inference in two-pass regressions and Patton (2012) for bootstrap inference in copula-based models.
Some related results are available in the literature. For example, Chen, Linton, and van Keilegom (2003) and Armstrong, Bertanha, and Hong (2014) consider the bootstrap for two-step nonlinear parametric and semiparametric models, and Cattaneo, Jansson, and Ma (2019) consider bootstrap inference for two-step GMM estimators, but all of these articles consider only iid data. Bootstrap methods for one-step quasi-maximum likelihood estimators (QMLE) have been considered (e.g., Gonçalves and White 2004) for both iid and time series data, but no general results are available for multistep QMLE. We address this gap and show bootstrap validity for such applications under very general time series dependence and heterogeneity.
We consider two approaches for bootstrap inference. The first jointly resamples the contributions to the quasi-log-likelihood functions in the various stages of estimation. Since model misspecification at any stage can induce time series dependence in the scores of each model, we rely on the moving blocks bootstrap (MBB) of Künsch (1989) and Liu and Singh (1992). This approach involves re-estimating the model on each bootstrapped objective function, and this "fully optimized" approach is thus computationally demanding.
The second approach is a fast resampling method that avoids any optimization problem in the bootstrap world. In particular, our proposal is to resample the score function underlying the asymptotic linear representation of the final-step QMLE, evaluated at the parameters estimated in the real data. This is related to the resampling method proposed in Armstrong, Bertanha, and Hong (2014); however, in their approach the first-stage model is re-estimated on each bootstrap sample (which has advantages in the semiparametric settings considered in that article), while our proposed method requires no estimation in the bootstrap world. 2 We prove two sets of results for both bootstrap methods. For simplicity, we provide formal results only for the two-step QMLE. 3 First, we show the consistency of the bootstrap distribution of the final-stage parameter estimator for the usual limit distribution of that parameter, justifying the construction of bootstrap percentile intervals. We consider regularity conditions that allow for time series dependence and heterogeneity of unknown form, extending the work of Gonçalves and White (2004), who show the asymptotic validity of the MBB for inference on one-step QMLE. Second, we prove the consistency of bootstrap standard errors. Although these are very popular in applied work, their consistency does not automatically follow from the consistency of the bootstrap distribution. This was recently emphasized by Hahn and Liao (2020), who proved that bootstrap standard errors for smooth functions of iid data may lead to conservative inference. Here, we show that the bootstrap variance estimator of the two-step QMLE is consistent by verifying a certain uniform integrability condition. We follow the approach of Kato (2011) and Cheng (2015) and rely on empirical process theory to prove our results. The results of those two articles apply to one-step M-estimators with iid data and do not cover time series applications.
They also do not cover two-step QMLE, even for iid data. Similarly, although the results of Gonçalves and White (2005) allow for time series dependence, they are specific to the one-step least squares estimator. Thus, no results appear to be available regarding the consistency of bootstrap variance estimators of one- or multi-stage QMLE with general time series dependence.
We present Monte Carlo simulations to illustrate the usefulness of our results. We consider the estimation of a bivariate copula model, where estimation is done in stages and the parameter of interest is the copula parameter, which is estimated in the final (third) stage. In addition to the standard confidence interval that relies on analytical standard errors, we consider two types of bootstrap-based intervals: intervals that rely on bootstrap standard errors but use the normal critical value, and bootstrap percentile intervals. We find that all methods provide similarly good coverage probabilities, even for the smaller sample sizes. This is in agreement with the theory of the bootstrap, since none of the methods promises asymptotic refinements. However, we find that the competing inference methods differ in their confidence interval lengths. In particular, the intervals based on the fully optimized bootstrap method tend to be narrower than the intervals based on either the fast resampling method or the usual asymptotic approach using analytical standard errors. We show that this is due to the fully optimized bootstrap standard error having a smaller mean squared error than the competing methods. Thus, although more computationally intensive than the fast resampling method, the fully optimized bootstrap intervals have better finite sample properties.
2 Other articles that have proposed fast resampling methods include Davidson and MacKinnon (1999), Andrews (2002), Gonçalves and White (2004), Hong and Scaillet (2004), and La Vecchia, Moor, and Scaillet (2020), among others. However, all of these articles consider one-step M or GMM estimators. 3 These results can easily be generalized to multistage QMLE. In our Monte Carlo simulations, we estimate a copula model involving a three-stage QMLE.
The rest of the article is organized as follows. In Section 2, we present the framework and provide an example of a two-step QMLE based on the bivariate copula model. In Section 3, we describe our two bootstrap methods, and we prove their consistency in Section 4. Section 5 contains the simulation results and Section 6 concludes. Assumptions are collected in Appendix A. The supplementary materials contain the proofs of the bootstrap distribution consistency and bootstrap variance consistency results. The supplementary materials also contain two general bootstrap consistency theorems for two-step M and GMM bootstrap estimators, which may be of independent interest, as their high-level conditions can be verified for any bootstrap scheme. Our focus in the main text on the two-stage QMLE based on the moving blocks bootstrap is justified by the fact that it covers the main applications in finance we have in mind.

Two-Stage Quasi Maximum Likelihood Estimation
Suppose $X_t: \Omega \to \mathbb{R}^l$, $t \in \mathbb{N}$, denotes a sequence of $\mathbb{R}^l$-valued random vectors defined on some probability space $(\Omega, \mathcal{F}, P)$. Let $\Theta = A \times B$, where $A$ and $B$ are compact subsets of finite dimensional Euclidean spaces. Given data $\{X_t: t = 1, \ldots, n\}$, our goal is to estimate a parameter vector $\beta_0 \in B \subset \mathbb{R}^p$ by a two-stage quasi-maximum likelihood (2QMLE) estimator. We focus on two-stage QMLE for ease of presentation, but our results generalize directly to multi-stage QMLEs (e.g., our simulation study considers a three-stage QMLE problem). In the first step, we estimate $\alpha_0 \in A \subset \mathbb{R}^k$ with
$$\hat{\alpha}_n = \arg\max_{\alpha \in A} Q_{1n}(\alpha),$$
where $Q_{1n}(\alpha) \equiv n^{-1} \sum_{t=1}^n \log f_{1t}(X^t, \alpha)$, with $X^t \equiv (X_1, \ldots, X_{t-1}, X_t)$, for some quasi-likelihood function $f_{1t}(X^t, \alpha): \mathbb{R}^{lt} \times A \to \mathbb{R}^+$. To simplify the notation, we sometimes write $f_{1t}(\alpha) \equiv f_{1t}(X^t, \alpha)$. A similar notation is used for any other function of $X^t$ throughout.
In the second step, we estimate $\beta_0$ with
$$\hat{\beta}_n = \arg\max_{\beta \in B} Q_{2n}(\hat{\alpha}_n, \beta),$$
where $Q_{2n}(\alpha, \beta) \equiv n^{-1} \sum_{t=1}^n \log f_{2t}(X^t, \alpha, \beta)$ for some quasi-likelihood function $f_{2t}$. We allow for time heterogeneity in $f_{1t}(\alpha)$ and $f_{2t}(\alpha, \beta)$ (i.e., the functional forms may depend on $t$) and we also allow for the possibility that these functions depend on the past information up to time $t$ (i.e., $X^t$ is a vector of possibly growing dimension).
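To fix ideas, here is a minimal numerical sketch of the two-stage recipe on a toy problem (not the paper's application): stage one fits a location parameter $\alpha$ with a unit-variance Gaussian quasi-likelihood, and stage two plugs $\hat{\alpha}_n$ into a Gaussian likelihood for a variance parameter $\beta$. The model and all names are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy two-stage QMLE: stage 1 estimates a location alpha with a
# unit-variance Gaussian quasi-likelihood; stage 2 estimates a variance
# beta, plugging in alpha_hat from stage 1.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=500)

def Q1n(alpha):          # first-stage average quasi-log-likelihood
    return np.mean(-0.5 * (x - alpha) ** 2)

def Q2n(alpha, beta):    # second-stage average quasi-log-likelihood
    return np.mean(-0.5 * np.log(beta) - (x - alpha) ** 2 / (2 * beta))

alpha_hat = minimize_scalar(lambda a: -Q1n(a)).x
beta_hat = minimize_scalar(lambda b: -Q2n(alpha_hat, b),
                           bounds=(1e-6, 100), method="bounded").x
```

In this toy case both stages have closed forms (the sample mean and the plug-in variance), which makes the numerical optimizers easy to check; in realistic applications such as GARCH or copula likelihoods, only the numerical route is available.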

An Example: Copula Models
Copula-based multivariate models are an example of time series models that are often estimated in multiple stages. These models combine separately estimated marginal distributions via a copula function to form a joint distribution. When the parameters that characterize the marginal distributions are different from those that characterize the copula density function, estimation and inference can be done in stages. Our results can be useful in this context.
To illustrate, let $X_t \equiv (X_{1t}, X_{2t})'$ denote a random vector whose joint conditional density we would like to model. By the usual decomposition, we can write the joint log-likelihood as $\sum_{t=1}^n \log g_t(X_t|\mathcal{F}_{t-1}; \theta)$, where $g_t(X_t|\mathcal{F}_{t-1}; \theta)$ is the conditional density function of $X_t$ given $\mathcal{F}_{t-1}$. Suppose $X_{it}|\mathcal{F}_{t-1} \sim G_{it}(\alpha_i)$, some distribution function parameterized by a set of parameters $\alpha_i$ with density function $g_{it}(\alpha_i)$. The joint (conditional) pdf of $X_t$ is then given by
$$g_t(X_t|\mathcal{F}_{t-1}; \theta) = g_{1t}(X_{1t}; \alpha_1)\, g_{2t}(X_{2t}; \alpha_2)\, c_t\left(G_{1t}(X_{1t}; \alpha_1), G_{2t}(X_{2t}; \alpha_2)\,|\,\mathcal{F}_{t-1}; \beta\right),$$
where $c_t(\cdot, \cdot; \beta)$ is a copula density function parameterized by $\beta$, and $\theta = (\alpha_1', \alpha_2', \beta')'$ denotes the full set of parameters. It follows that the joint log-likelihood function can be written as
$$\log g_t(X_t|\mathcal{F}_{t-1}; \theta) = \log g_{1t}(X_{1t}; \alpha_1) + \log g_{2t}(X_{2t}; \alpha_2) + \log c_t\left(G_{1t}(X_{1t}; \alpha_1), G_{2t}(X_{2t}; \alpha_2)\,|\,\mathcal{F}_{t-1}; \beta\right).$$
When the parameters characterizing the marginals and the copula function are separable (i.e., the parameters that enter one marginal do not enter another marginal nor the copula function, and there are no cross-equation restrictions), these parameters can be estimated in stages. In particular, first estimate $\alpha_i$ by QMLE:
$$\hat{\alpha}_{in} = \arg\max_{\alpha_i} n^{-1} \sum_{t=1}^n \log g_{it}(X_{it}; \alpha_i), \quad i = 1, 2,$$
and then estimate the copula parameter $\beta$ in a second stage by
$$\hat{\beta}_n = \arg\max_{\beta} n^{-1} \sum_{t=1}^n \log f_{2t}(X^t, \hat{\alpha}_n, \beta),$$
where $\hat{\alpha}_n = (\hat{\alpha}_{1n}', \hat{\alpha}_{2n}')'$ and $f_{2t}(X^t, \hat{\alpha}_n, \beta) \equiv c_t\left(G_{1t}(X_{1t}, \hat{\alpha}_{1n}), G_{2t}(X_{2t}, \hat{\alpha}_{2n})\,|\,\mathcal{F}_{t-1}, \beta\right)$.
The contributions to this quasi-log-likelihood function depend on the sample of $X_t = (X_{1t}, X_{2t})'$ up to time $t$ through the probability integral transforms $G_{it}(X_{it}, \hat{\alpha}_{in})$.

Asymptotic Properties of Two-Stage QMLE
We now briefly review the asymptotic properties of the two-stage QMLE to help explain the properties that the bootstrap needs in order to be asymptotically valid. These properties are well known in the literature; see, for example, White (1994), Newey and McFadden (1994), and Wooldridge (1994).
Let $\alpha_0$ be the unique maximizer of $\bar{Q}_1(\alpha) \equiv \lim_{n\to\infty} E(Q_{1n}(\alpha))$ on $A$ and let $\beta_0$ be the unique maximizer of $\bar{Q}_2(\alpha_0, \beta) \equiv \lim_{n\to\infty} E(Q_{2n}(\alpha_0, \beta))$ on $B$. 4 Then, under Assumption A in Appendix A, we can show that $\hat{\beta}_n \xrightarrow{P} \beta_0$ and
$$\sqrt{n}\left(\hat{\beta}_n - \beta_0\right) \xrightarrow{d} N\left(0,\, H_0^{-1} J_0 H_0^{-1}\right),$$
where $H_0$ is the limit of the Hessian of $Q_{2n}$ with respect to $\beta$ and $J_0$ is the long-run variance of the scaled average of the extended scores $s_t(\alpha_0, \beta_0) \equiv s_{2t}(\alpha_0, \beta_0) - F_0 A_0^{-1} s_{1t}(\alpha_0)$; here $s_{1t}(\alpha) \equiv \partial \log f_{1t}(\alpha)/\partial\alpha$ and $s_{2t}(\alpha, \beta) \equiv \partial \log f_{2t}(\alpha, \beta)/\partial\beta$ are the first- and second-stage scores, $A_0$ is the limit of the Hessian of $Q_{1n}$ with respect to $\alpha$, and $F_0$ is the limit of the cross-derivative matrix $\partial^2 Q_{2n}/\partial\beta\,\partial\alpha'$, all evaluated at $(\alpha_0, \beta_0)$. As this result shows, the impact of the first-stage estimation of $\alpha_0$ is not negligible asymptotically except when $F_0 = 0$. This implies that we need to adjust the standard errors of $\hat{\beta}_n$ for the added estimation uncertainty of $\hat{\alpha}_n$. Although a consistent estimator of $J_0$ can be obtained by applying a HAC (heteroscedasticity and autocorrelation consistent) estimator to $s_{2t}(\hat{\alpha}_n, \hat{\beta}_n) - \hat{F}_n \hat{A}_n^{-1} s_{1t}(\hat{\alpha}_n)$ (where $\hat{F}_n$ and $\hat{A}_n$ are consistent estimators of $F_0$ and $A_0$), in practice the bootstrap is often used. Our goal is to provide a set of conditions that justify this practice in time series applications.
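The HAC-based plug-in variance estimator described above can be sketched as follows. The function name, the Bartlett (Newey-West) kernel, and the bandwidth rule are our own illustrative choices, and the Hessian estimates `A_hat` and `H_hat` are assumed symmetric.

```python
import numpy as np

def two_step_avar(s1, s2, A_hat, F_hat, H_hat, lag=None):
    """Sandwich variance for the second-step QMLE, H^{-1} J H^{-1}, with J a
    Bartlett-kernel (Newey-West) long-run variance of the extended scores
    s_t = s2_t - F A^{-1} s1_t.  s1 is (n, k), s2 is (n, p); A_hat (k, k)
    and H_hat (p, p) are symmetric Hessian estimates, F_hat is (p, k)."""
    n = s1.shape[0]
    s = s2 - s1 @ np.linalg.solve(A_hat, F_hat.T)    # extended scores, (n, p)
    s = s - s.mean(axis=0)
    L = lag if lag is not None else int(np.floor(4 * (n / 100) ** (2 / 9)))
    J = s.T @ s / n
    for j in range(1, L + 1):
        w = 1 - j / (L + 1)                          # Bartlett weights
        G = s[j:].T @ s[:-j] / n
        J += w * (G + G.T)
    Hinv = np.linalg.inv(H_hat)
    return Hinv @ J @ Hinv.T    # avar of sqrt(n) * (beta_hat - beta0)
```

When $\hat{F}_n = 0$ this reduces to the usual one-step sandwich; otherwise the first-stage correction term changes $J$ and hence the standard errors.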

Bootstrap Methods
The asymptotic validity of the bootstrap depends on its ability to mimic the asymptotic variance-covariance matrix of $\hat{\beta}_n$. The form of $J_0$ suggests that the bootstrap should replicate the time series dependence and the heterogeneity properties of the score vector $s_t(\alpha_0, \beta_0)$. Model misspecification at any stage can induce time series dependence, and our approach is to use a block bootstrap. In particular, we rely on the moving blocks bootstrap (MBB) of Künsch (1989) and Liu and Singh (1992). See also Gonçalves and White (2002, 2005) for the validity of the MBB under general time series dependence and heterogeneity. Although we focus on the MBB, other bootstrap methods which are robust to serial dependence could be used. Examples include the tapered block bootstrap of Paparoditis and Politis (2002) and the kernel block bootstrap of Parente and Smith (2021). The validity of these alternative bootstrap schemes could be established by verifying the high-level conditions given in Theorems S4.3 and S4.4 in the supplementary materials.
We consider two different methods. One is based on resampling the contributions to the likelihood functions $f_{1t}(\alpha)$ (which yields a bootstrap QMLE $\hat{\alpha}_n^*$) and $f_{2t}(\hat{\alpha}_n^*, \beta)$ (which is optimized over $\beta$ to yield $\hat{\beta}_n^*$). The same bootstrap indices obtained with the MBB are used across the two stages, ensuring that this method mimics the time series dependence of the extended score. Because it requires two (or multiple) sets of maximizations, this method may be computationally intensive. For this reason, we also propose another bootstrap method which directly resamples the estimated score $s_t(\hat{\alpha}_n, \hat{\beta}_n) \equiv s_{2t}(\hat{\alpha}_n, \hat{\beta}_n) - \hat{F}_n \hat{A}_n^{-1} s_{1t}(\hat{\alpha}_n)$. Our simulations show that this method yields wider confidence intervals than the fully optimized bootstrap method. In particular, the fast resampling standard errors have larger mean squared errors compared to the fully optimized standard errors, especially for the smaller sample sizes. This translates into wider confidence intervals for the parameter of interest.
Both methods involve resampling certain functions of the data using the MBB to obtain new indices, which can be described as follows. For a generic time series $\{Z_t: t = 1, \ldots, n\}$, let $B_{t,\ell} = \{Z_t, Z_{t+1}, \ldots, Z_{t+\ell-1}\}$ denote the block of $\ell$ consecutive observations starting at $Z_t$ ($\ell = 1$ corresponds to the standard iid bootstrap). For simplicity, take $n = k\ell$. The MBB draws $k = n/\ell$ blocks randomly with replacement from the set of overlapping blocks $\{B_{1,\ell}, \ldots, B_{n-\ell+1,\ell}\}$. Letting $I_1, \ldots, I_k$ be iid random variables distributed uniformly on $\{0, \ldots, n-\ell\}$, the MBB resampling indices are given by $(\tau_1, \ldots, \tau_n) = (I_1+1, \ldots, I_1+\ell, \ldots, I_k+1, \ldots, I_k+\ell)$.
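The MBB index mechanics just described can be sketched in a few lines (a hypothetical helper, not code from the paper; indices are zero-based as is conventional in Python):

```python
import numpy as np

def mbb_indices(n, block_len, rng):
    """Moving blocks bootstrap: draw k = n // block_len block starts
    uniformly from the n - block_len + 1 overlapping blocks and concatenate
    the blocks into one length-n vector of resampling indices tau_1..tau_n."""
    k = n // block_len
    starts = rng.integers(0, n - block_len + 1, size=k)      # I_1, ..., I_k
    return (starts[:, None] + np.arange(block_len)).ravel()  # tau_t

# block_len = 1 reduces to the standard iid bootstrap
rng = np.random.default_rng(0)
tau = mbb_indices(12, 3, rng)
```

The resulting `tau` is then applied to whatever array of per-observation quantities needs resampling (data, likelihood contributions, or scores).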

The Fully Optimized Bootstrap Method
The first method we consider requires resampling the contributions to the two (or more) likelihood functions $f_{1t}$ and $f_{2t}$ and then computing $\hat{\alpha}_n^*$ and $\hat{\beta}_n^*$ using these resampled likelihood functions. More specifically, the bootstrap analogue of $\hat{\alpha}_n$ is
$$\hat{\alpha}_n^* = \arg\max_{\alpha \in A} Q_{1n}^*(\alpha), \quad Q_{1n}^*(\alpha) \equiv n^{-1} \sum_{t=1}^n \log f_{1\tau_t}(\alpha),$$
where the indices $\tau_t$ are chosen by the bootstrap. Thus, we resample the functions $f_{1t}(\alpha)$ rather than the data directly. The bootstrap analogue of $\hat{\beta}_n$ is then
$$\hat{\beta}_n^* = \arg\max_{\beta \in B} Q_{2n}^*(\hat{\alpha}_n^*, \beta), \quad Q_{2n}^*(\alpha, \beta) \equiv n^{-1} \sum_{t=1}^n \log f_{2\tau_t}(\alpha, \beta).$$
Resampling both functions $f_{1t}$ and $f_{2t}$ with the same set of indices is important because this preserves the form of dependence between the two functions. In particular, this guarantees that the bootstrap is able to mimic the dependence in the score vector $s_t(\hat{\alpha}_n, \hat{\beta}_n)$. If instead we used two different sets of indices, say $\tau_{1t}$ and $\tau_{2t}$, generated independently of each other, this would induce an independence between $f_{1t}^*$ and $f_{2t}^*$ which would not necessarily exist for the original functions.
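A minimal self-contained sketch of the fully optimized method on a toy two-stage problem (location, then variance), assuming block length $\ell = 10$ and 200 bootstrap replications. The toy model and all names are illustrative, not the paper's copula application; the key point is that the same indices `tau` drive both estimation stages within each resample.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.5, size=300)
n, ell = len(x), 10                                  # block length ell

def fit_two_stage(xs):
    # stage 1: location (Gaussian quasi-likelihood); stage 2: variance
    a = minimize_scalar(lambda a: np.mean((xs - a) ** 2)).x
    b = minimize_scalar(lambda b: np.mean(np.log(b) + (xs - a) ** 2 / b),
                        bounds=(1e-6, 50), method="bounded").x
    return a, b

beta_boot = []
for _ in range(200):
    starts = rng.integers(0, n - ell + 1, size=n // ell)
    tau = (starts[:, None] + np.arange(ell)).ravel() # SAME tau for both stages
    beta_boot.append(fit_two_stage(x[tau])[1])       # re-optimize both stages
se_beta = np.std(beta_boot)                          # bootstrap standard error
```

Each replication re-solves both optimization problems, which is what makes this approach computationally demanding relative to the fast resampling method below.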

A Fast Resampling Method
Bootstrapping multi-stage extremum estimators can be computationally intensive, as this may require solving multiple optimization problems for each resample. For this reason, we also consider a fast resampling method for bootstrapping the two-step QMLE which has a closed-form expression and avoids any numerical optimization. To describe this estimator, let $\hat{H}_n \equiv n^{-1} \sum_{t=1}^n \frac{\partial^2}{\partial\beta\,\partial\beta'} \log f_{2t}(\hat{\alpha}_n, \hat{\beta}_n)$ denote the estimated second-stage Hessian. The fast resampling two-step QMLE is given by
$$\hat{\beta}_{1,n}^* = \hat{\beta}_n - \hat{H}_n^{-1}\, n^{-1} \sum_{t=1}^n s_{\tau_t}(\hat{\alpha}_n, \hat{\beta}_n),$$
where $s_{\tau_t}(\hat{\alpha}_n, \hat{\beta}_n)$ is a resampled version of $s_t(\hat{\alpha}_n, \hat{\beta}_n) \equiv s_{2t}(\hat{\alpha}_n, \hat{\beta}_n) - \hat{F}_n \hat{A}_n^{-1} s_{1t}(\hat{\alpha}_n)$. Thus, $\hat{\beta}_{1,n}^*$ is a one-step bootstrap QMLE which updates $\hat{\beta}_n$ using the estimated Hessian $\hat{H}_n$ and the bootstrap scores $s_{\tau_t}(\hat{\alpha}_n, \hat{\beta}_n)$, evaluated at $\hat{\alpha}_n$ and $\hat{\beta}_n$. A special case of $\hat{\beta}_{1,n}^*$ is a version of the one-step bootstrap QMLE considered in Gonçalves and White (2004) (henceforth GW (2004)) in the context of one-stage QMLE. In that article, $\hat{\beta}_n$ does not depend on a first-stage estimator $\hat{\alpha}_n$, implying that $s_{2t}(\hat{\alpha}_n, \hat{\beta}_n) = s_{2t}(\hat{\beta}_n)$ and $s_t(\hat{\alpha}_n, \hat{\beta}_n) = s_{2t}(\hat{\beta}_n)$. The only difference with respect to GW (2004) in this case is that our proposal only resamples the estimated scores and does not involve resampling the contributions to the Hessian $\hat{H}_n$ (their one-step bootstrap QMLE involves resampling both; see also Davidson and MacKinnon (1999) and Andrews (2002), who proposed $k$-step bootstrap methods that resample the contributions to the Hessian and the score vector at each iteration, starting from the original estimators).
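The fast resampling draw can be sketched as follows. The extended scores and the Hessian are computed once on the real data; here they are replaced by placeholder values purely to illustrate the one-step update, so the numbers carry no empirical content.

```python
import numpy as np

# Fast-resampling sketch: no re-optimization in the bootstrap world.
# s_ext stands in for the extended scores s_t(alpha_hat, beta_hat) and
# H_hat for the estimated Hessian; both are placeholders for illustration.
rng = np.random.default_rng(2)
n, ell, p = 400, 10, 2
beta_hat = np.array([0.5, 1.0])
s_ext = rng.normal(size=(n, p))
H_hat = -np.eye(p)

draws = []
for _ in range(500):
    starts = rng.integers(0, n - ell + 1, size=n // ell)
    tau = (starts[:, None] + np.arange(ell)).ravel()         # MBB indices
    s_bar = s_ext[tau].mean(axis=0)
    draws.append(beta_hat - np.linalg.solve(H_hat, s_bar))   # one-step update
draws = np.asarray(draws)
cov_boot = n * np.cov(draws.T)   # estimates avar of sqrt(n)(beta_hat - beta0)
```

Each replication costs one matrix-vector solve rather than a full numerical optimization, which is the source of the computational savings.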
$\hat{\beta}_{1,n}^*$ is also related to a fast resampling approach proposed by Armstrong, Bertanha, and Hong (2014) in the context of a two-step GMM estimator with iid data, where the first step is a potentially nonparametric estimator (see also Chen, Linton, and van Keilegom 2003; Chen and Liao 2015). In our context, it amounts to
$$\tilde{\beta}_{1,n}^* = \hat{\beta}_n - \hat{H}_n^{-1}\, n^{-1} \sum_{t=1}^n s_{2\tau_t}(\hat{\alpha}_n^*, \hat{\beta}_n).$$
There are two main differences between $\tilde{\beta}_{1,n}^*$ and $\hat{\beta}_{1,n}^*$. First, $\tilde{\beta}_{1,n}^*$ requires computing $\hat{\alpha}_n^*$ whereas $\hat{\beta}_{1,n}^*$ does not. Hence, $\tilde{\beta}_{1,n}^*$ only avoids the computational burden of the second step and not of the first step. Instead, our method avoids computing $\hat{\alpha}_n^*$ for each resample and is therefore computationally more attractive. Second, $\tilde{\beta}_{1,n}^*$ resamples the scores of the second-stage model (evaluated at $(\hat{\alpha}_n^*, \hat{\beta}_n)$), whereas our method involves resampling $s_t(\hat{\alpha}_n, \hat{\beta}_n) = s_{2t}(\hat{\alpha}_n, \hat{\beta}_n) - \hat{F}_n \hat{A}_n^{-1} s_{1t}(\hat{\alpha}_n)$. We can think of this vector as an "extended" version of the scores for the second stage, extended by the term $-\hat{F}_n \hat{A}_n^{-1} s_{1t}(\hat{\alpha}_n)$. This term corrects for the added uncertainty due to the first step. We note that it would not be valid to resample $s_{2t}(\hat{\alpha}_n, \hat{\beta}_n)$ unless $\hat{F}_n = 0$.

Bootstrap Theory
We discuss two uses of the bootstrap for inference on $\beta$ using $\hat{\beta}_n$. In Section 4.1 we consider using the bootstrap distribution of $\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)$ (or $\sqrt{n}(\hat{\beta}_{1,n}^* - \hat{\beta}_n)$) to approximate the quantiles of the distribution of $\sqrt{n}(\hat{\beta}_n - \beta_0)$. This approach underlies the construction of percentile bootstrap intervals for $\beta$. Even though it does not promise asymptotic refinements, it is empirically attractive as it does not require computing standard errors for $\hat{\beta}_n$. An alternative is to use the bootstrap to estimate standard errors, which we consider in Section 4.2.

Bootstrap Distribution Consistency for Nonstudentized Statistics
The first-order asymptotic validity of the MBB based on the fully optimized bootstrap two-step QMLE $\hat{\beta}_n^*$ follows by showing that the bootstrap distribution of $\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)$ is consistent for the distribution of $\sqrt{n}(\hat{\beta}_n - \beta_0)$. This result requires that we strengthen Assumption A as follows.
Assumption B. For some $r > 2$ and some $\delta > 0$, these assumptions hold. They are weaker than those used by GW (2004) (see their Assumption 2.1) and are sufficient to show that a bootstrap CLT applies to the resampled scores, as shown by Gonçalves and de Jong (2003). Assumption B.4 is a restatement of Assumption 2.2 of Gonçalves and White (2002) and is satisfied when the models are correctly specified or when the scores $s_{1t}(X^t, \alpha_0)$ and $s_{2t}(X^t, \alpha_0, \beta_0)$ are stationary (this follows if $\{X_t\}$ is a strictly stationary process, the likelihood functions $f_{1t}(\alpha)$ and $f_{2t}(\alpha, \beta)$ depend only on a finite number of lags of $X_t$, and there is no time heterogeneity in $f_{1t}$ and $f_{2t}$). Under this assumption, the bootstrap covariance matrix of the scaled average of $s_{2t}^*(\alpha_0, \beta_0) - F_0 A_0^{-1} s_{1t}^*(\alpha_0)$ converges to $J_0$, the correct asymptotic covariance matrix of $\sqrt{n}(\hat{\beta}_n - \beta_0)$. In the following theorem, and throughout, we let $E^*$, $\mathrm{var}^*$ and $P^*$ denote the bootstrap expectation, variance and probability measure induced by the resampling, conditional on the original sample.
Theorem 4.1. Let Assumption A as strengthened by Assumption B hold. If $\ell_n \to \infty$ and $\ell_n = o(n^{1/2})$, then
$$\sup_{x \in \mathbb{R}^p} \left| P^*\left(\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n) \le x\right) - P\left(\sqrt{n}(\hat{\beta}_n - \beta_0) \le x\right) \right| \xrightarrow{P} 0.$$
To prove Theorem 4.1, we verify the conditions of Theorem S2.4 in the supplementary materials. This result shows the consistency of the bootstrap distribution of a general two-step M-estimator $\hat{\beta}_n^*$ (based on an asymptotically linear first-step estimator $\hat{\alpha}_n^*$) under a set of bootstrap high-level conditions (Assumption B*). We show that Assumption A strengthened by Assumption B verifies Assumption B*.
The first-order asymptotic validity of the fast resampling method is given in the next result. Its proof is a by-product of the proof of Theorem 4.1 and is omitted.

Bootstrap Distribution Consistency for Studentized Statistics
Here, we focus on testing hypotheses about $\beta_0$ based on Wald statistics. 5 Let $r: B \to \mathbb{R}^q$, where $B \subset \mathbb{R}^p$, $q \le p$, be a continuously differentiable function on $B$ such that $R_0 \equiv \frac{\partial}{\partial\beta'} r(\beta_0)$ has full row rank $q$. The Wald statistic for testing $H_0: r(\beta_0) = 0$ is
$$W_n = n\, \hat{r}_n' \left( \hat{R}_n \hat{H}_n^{-1} \hat{J}_n \hat{H}_n^{-1} \hat{R}_n' \right)^{-1} \hat{r}_n,$$
where $\hat{r}_n = r(\hat{\beta}_n)$, $\hat{R}_n = \frac{\partial}{\partial\beta'} r(\hat{\beta}_n)$, $\hat{H}_n \equiv n^{-1} \sum_{t=1}^n \frac{\partial^2}{\partial\beta\,\partial\beta'} \log f_{2t}(\hat{\alpha}_n, \hat{\beta}_n)$ is a consistent estimator of $H_0$, and $\hat{J}_n$ is such that $\hat{J}_n - J_0 = o_P(1)$. We define the bootstrap Wald statistic as
$$W_n^* = n\, (\hat{r}_n^* - \hat{r}_n)' \left( \hat{R}_n^* \hat{H}_n^{*-1} \hat{J}_n^* \hat{H}_n^{*-1} \hat{R}_n^{*\prime} \right)^{-1} (\hat{r}_n^* - \hat{r}_n).$$
For the fully optimized bootstrap method, we set $\hat{r}_n^* = r(\hat{\beta}_n^*)$ and $\hat{R}_n^* = \frac{\partial}{\partial\beta'} r(\hat{\beta}_n^*)$, with $\hat{H}_n^*$ and $\hat{J}_n^*$ the bootstrap analogues of $\hat{H}_n$ and $\hat{J}_n$ computed from the resampled likelihood contributions, evaluated at $(\hat{\alpha}_n^*, \hat{\beta}_n^*)$. For the fast resampling method, we set $\hat{r}_n^* = r(\hat{\beta}_{1,n}^*)$ and $\hat{R}_n^* = \frac{\partial}{\partial\beta'} r(\hat{\beta}_{1,n}^*)$, with the bootstrap variance estimator based on the resampled extended scores $s_{\tau_t}(\hat{\alpha}_n, \hat{\beta}_n)$. For the fast resampling approach proposed by Armstrong, Bertanha, and Hong (2014), we set $\hat{r}_n^* = r(\tilde{\beta}_{1,n}^*)$, $\hat{R}_n^* = \frac{\partial}{\partial\beta'} r(\tilde{\beta}_{1,n}^*)$ and $\hat{J}_n^* = J_{1,n}^*$, where
$$J_{1,n}^* = k^{-1} \sum_{i=1}^k \left( \ell^{-1/2} \sum_{t=1}^{\ell} s_{I_i+t}(\hat{\alpha}_n^*, \hat{\beta}_n) \right) \left( \ell^{-1/2} \sum_{t=1}^{\ell} s_{I_i+t}(\hat{\alpha}_n^*, \hat{\beta}_n) \right)',$$
with $s_{\tau_t}(\hat{\alpha}_n^*, \hat{\beta}_n) = s_{2\tau_t}(\hat{\alpha}_n^*, \hat{\beta}_n) - \hat{F}_n \hat{A}_n^{-1} s_{1\tau_t}(\hat{\alpha}_n^*)$. Next, we show that the sampling distribution of $W_n$ is well approximated by the bootstrap distribution of $W_n^*$. For this, we strengthen Assumption B.4 as follows.
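Assuming the standard sandwich form $W_n = n\,\hat r_n'(\hat R_n\hat H_n^{-1}\hat J_n\hat H_n^{-1}\hat R_n')^{-1}\hat r_n$, the statistic is a short computation given the estimated pieces. The helper below is hypothetical (and assumes a symmetric $\hat H_n$), purely to make the matrix algebra concrete:

```python
import numpy as np

def wald_stat(n, r_val, R, H, J):
    """Wald statistic W_n = n * r' (R H^{-1} J H^{-1} R')^{-1} r for
    H0: r(beta0) = 0, using the two-step sandwich variance.  r_val is
    the q-vector r(beta_hat), R is (q, p), H and J are (p, p); H is
    assumed symmetric (a Hessian)."""
    HinvR = np.linalg.solve(H, R.T)           # H^{-1} R'
    V = R @ np.linalg.solve(H, J @ HinvR)     # R H^{-1} J H^{-1} R'
    return float(n * r_val @ np.linalg.solve(V, r_val))
```

The bootstrap analogue replaces $\hat r_n$ by $\hat r_n^* - \hat r_n$ and the estimated matrices by their bootstrap counterparts, so the same routine can be reused inside the bootstrap loop.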
The proof of Theorem 4.3 is in the supplementary materials; it builds upon results in Gonçalves and White (2004, Theorem 3.1).

Bootstrap Variance Consistency
Bootstrap standard errors are often used in applied work as they are easy to compute, avoiding the need to look up complicated formulas. This is especially true in multistage estimation, where these formulas quickly become involved due to the need to keep track of the added uncertainty contributed by each estimation stage. Instead, bootstrap standard errors are easily computed by Monte Carlo simulation. For instance, we can approximate the bootstrap variance of the $j$th element of $\hat{\beta}_n^*$, $\mathrm{var}^*(\hat{\beta}_{n,j}^*)$, with the sample variance obtained across $B$ bootstrap replications of $\hat{\beta}_{n,j}^*$:
$$\widehat{\mathrm{var}}^*\left(\hat{\beta}_{n,j}^*\right) = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\beta}_{n,j}^{*(b)} - \bar{\beta}_{n,j}^* \right)^2, \quad \bar{\beta}_{n,j}^* \equiv \frac{1}{B} \sum_{b=1}^{B} \hat{\beta}_{n,j}^{*(b)}.$$
The corresponding bootstrap standard error is the square root of this expression. The previous results (Theorems 4.1 and 4.2) do not by themselves justify the consistency of bootstrap standard errors based on $\hat{\beta}_n^*$ or $\hat{\beta}_{1,n}^*$. The reason is that convergence in distribution of a random sequence does not imply convergence of moments. For instance, Ghosh et al. (1984) and Shao (1992) give examples of the inconsistency of bootstrap variance estimators for the sample median and for smooth functions of sample means, respectively, in the iid context. Despite this, applied researchers routinely apply the bootstrap when computing standard errors. For a recent article emphasizing this point, see Hahn and Liao (2020), who show that bootstrap standard errors can lead to conservative inference.
The main goal of this section is to provide a theoretical justification for computing bootstrap standard errors in the context of two-step QMLE with time series data. The current bootstrap literature does not cover this case, as it has either assumed iid data (as in Kato (2011) and Cheng (2015), who prove the consistency of bootstrap variance estimators for one-step M-estimators) or has considered time series least squares estimators, as in Gonçalves and White (2005). No results appear to be available for multistage QMLE, even for iid data.
Given our previous bootstrap distribution consistency results, a sufficient condition for the consistency of the corresponding bootstrap standard errors is that a uniform integrability condition holds. In particular, to show that $\mathrm{var}^*(\sqrt{n}\,\hat{\beta}_n^*)$ is consistent, it suffices to show that $E^*\|\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)\|^{2+\delta} = O_P(1)$ for some small $\delta > 0$. Because $\hat{\beta}_{1,n}^*$ has a closed-form expression, it is substantially easier to verify this condition for the fast resampling method than for the fully optimized bootstrap two-stage QMLE. For this reason, we focus on this estimator first.
We impose a smoothness condition on the vectors of scores $\{s_{1t}\}$ and $\{s_{2t}\}$ which we did not need for bootstrap distribution consistency. Next, we consider the fully optimized bootstrap estimator $\hat{\beta}_n^*$. Similarly to Theorem 4.4, we prove the consistency of $\mathrm{var}^*(\sqrt{n}\,\hat{\beta}_n^*)$ by relying on Theorem 4.1 and showing that $E^*\|\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)\|^{2+\delta} = O_P(1)$ for some small $\delta > 0$. Because $\hat{\beta}_n^*$ does not have a closed-form expression, this condition is much harder to verify than for $\hat{\beta}_{1,n}^*$ and requires a different method of proof and a different set of assumptions.
Our proof and regularity conditions are inspired by Kato (2011) and Cheng (2015). Kato (2011) shows the consistency of bootstrap moment estimators for M-estimators of parametric models, whereas Cheng (2015) allows for semiparametric models, where the parameter of interest is finite dimensional but the model also contains a nuisance parameter that is potentially infinite dimensional. Both articles focus on one-step M-estimators and give sufficient conditions for bootstrap variance consistency that only cover iid data. Our contribution is to extend those results to two-stage M-estimation with time series data.
To present our regularity conditions, we need to introduce more notation. First, because our proof is based on showing that the unconditional moment of $\|\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)\|^{2+\delta}$ is finite, we need to introduce the joint probability measure $\mathbb{P} = P \times P^*$ that accounts for the two sources of randomness in $\hat{\beta}_n^*$: the randomness that comes from the original data (described by $P$) and the randomness that comes from the resampling, conditional on the original sample (described by $P^*$). In the following, we write $\mathbb{E}$ to denote the expected value with respect to $\mathbb{P}$. Second, to prove that $\mathbb{E}\|\sqrt{n}(\hat{\beta}_n^* - \hat{\beta}_n)\|^{2+\delta} < \infty$, we assume the uniform square integrability of the original two-step QMLE (i.e., we assume that $E\|\sqrt{n}(\hat{\beta}_n - \beta_0)\|^{2+\delta} < \infty$) and provide regularity conditions that allow us to show that $\mathbb{E}\|\sqrt{n}(\hat{\beta}_n^* - \beta_0)\|^{2+\delta} < \infty$. We follow Kato (2011) and Cheng (2015) and use an argument that entails bounding the tail probability $\mathbb{P}(\|\sqrt{n}(\hat{\beta}_n^* - \beta_0)\| > u)$ for large $u$. This requires empirical process theory and maximal inequalities. In particular, we impose bounds on the $L_p$-moments (with $p > 2 + \delta$) of the suprema of certain empirical processes, which we describe next.
For any class of functions $\mathcal{F} = \{f_t\}$, define the empirical process $\mathbb{G}_n f = n^{-1/2} \sum_{t=1}^n (f_t - E f_t)$ and let its norm be given by $\|\mathbb{G}_n\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |\mathbb{G}_n f|$.
Our assumptions are as follows.
vi. sup
Assumption B.6(i) assumes that the log-likelihood functions $f_{2t}$ and the population criterion function $\bar{Q}_2(\alpha, \beta) \equiv E\left[\log f_2(X^t, \alpha, \beta)\right]$ are time invariant (the latter follows from the former under stationarity of $\{X_t\}$).
To understand Assumptions B.6(ii)-(iv), suppose that $\beta$ is a scalar and consider, for example, the generated regressor problem. In particular, suppose the regression model $y_t = \beta q_t + e_t$, where the regressor $q_t = w_t'\alpha_0$ is latent because we do not observe $\alpha_0$. If $x_t = w_t'\alpha_0 + \eta_t$, where $x_t$ and $w_t$ are observed, we can obtain a consistent estimator of $\alpha_0$ by running an OLS regression of $x_t$ on $w_t$. This yields a generated regressor $\hat{q}_t = w_t'\hat{\alpha}_n$, which we can then use to obtain a consistent estimator of $\beta$. In terms of our notation, and letting $e_t|w_t, x_t \sim \text{iid } N(0, \sigma^2)$, the log-likelihood function underlying the second step is given by
$$\log f_{2t}(\alpha, \beta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(y_t - w_t'\alpha\,\beta\right)^2,$$
which is time invariant; this verifies condition B.6(i). To verify condition B.6(ii), note that by a second-order Taylor expansion of $\bar{Q}_2(\alpha_0, \beta)$ around $\beta_0$,
$$\bar{Q}_2(\alpha_0, \beta) - \bar{Q}_2(\alpha_0, \beta_0) = \frac{1}{2}\frac{\partial^2}{\partial\beta^2}\bar{Q}_2(\alpha_0, \bar{\beta})\,(\beta - \beta_0)^2,$$
for some $\bar{\beta}$ between $\beta$ and $\beta_0$ (the first-order term vanishes because $\beta_0$ maximizes $\bar{Q}_2(\alpha_0, \cdot)$). So, the condition will be satisfied if we can bound $\frac{\partial^2}{\partial\beta^2}\bar{Q}_2(\alpha_0, \beta)$ by a negative constant $-K$, for any value of $\beta$. This is true if $\bar{Q}_2(\alpha_0, \beta)$ is a quadratic function of $\beta$, as in the generated regressor problem. This is a strong condition, since it imposes a global restriction on $\bar{Q}_2(\alpha_0, \beta)$, but it is crucial for controlling the tail probability $\mathbb{P}(\|\sqrt{n}(\hat{\beta}_n^* - \beta_0)\| > u)$, as Kato (2011) and Cheng (2015) note. A similar condition is also used by Nishiyama (2010) to prove the moment convergence of the original M-estimator.
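A small Monte Carlo check of the generated-regressor example above: with $\beta_0 = \alpha_0 = 1$ and unit-variance $e_t$ and $\eta_t$ (our illustrative parameter choices), the true sampling variance of $\hat{\beta}_n$ is roughly twice the naive OLS variance that treats $\hat{q}_t = w_t\hat{\alpha}_n$ as a known regressor, illustrating why the first-stage correction term cannot be dropped.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta0, alpha0 = 200, 2000, 1.0, 1.0
b_hats, naive_vars = [], []
for _ in range(reps):
    w = rng.normal(size=n)
    x = w * alpha0 + rng.normal(size=n)        # first-stage regression data
    y = beta0 * (w * alpha0) + rng.normal(size=n)
    a_hat = (w @ x) / (w @ w)                  # stage 1: OLS of x on w
    q_hat = w * a_hat                          # generated regressor
    b_hat = (q_hat @ y) / (q_hat @ q_hat)      # stage 2: OLS of y on q_hat
    b_hats.append(b_hat)
    resid = y - b_hat * q_hat
    naive_vars.append(resid @ resid / (n - 1) / (q_hat @ q_hat))
ratio = np.var(b_hats) / np.mean(naive_vars)   # ~2: naive SEs are too small
```

The ratio near 2 matches the analytical calculation: the first-stage noise contributes an extra variance term of the same size as the second-stage term in this parameterization.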
Assumption B.6(iii) (see Equation (5)) is a high-level condition on the empirical process $\mathbb{G}_n$. Cheng (2015) relies on a similar assumption to show the consistency of bootstrap one-step moment estimators of any order $p \ge 1$ for iid data. This so-called $L_p$-maximal inequality condition can be verified under more primitive conditions involving the structure of the function class $\mathcal{N}_\eta$; for example, Cheng (2015) shows that (for one-step M-estimators with iid data) condition (5) is implied by a finite uniform entropy integral condition, which is verified when the functions in $\mathcal{N}_\eta$ are Lipschitz continuous. Our Assumption B.6(iii) adapts Cheng (2015)'s high-level condition to the two-step QMLE context, where we use the function class $\mathcal{N}_\eta$.
To verify condition B.6(iii) in the context of the generated regressor problem, note that, given that $\beta$ and $\beta_0$ lie in $B$, a compact subset of $\mathbb{R}$, it follows that
$$\left|\log f_{2t}(\alpha, \beta) - \log f_{2t}(\alpha, \beta_0)\right| \le C\,|w_t|\left(|y_t| + |w_t|\right)|\beta - \beta_0|,$$
for some sufficiently large constant $C$. Hence, the condition follows by Remark 2.3 of Kato (2011) (relying, for instance, on Lemma 2.14.1 of van der Vaart and Wellner (1996)), provided that $E\left(|w_t|(|y_t| + |w_t|)\right)^p < \infty$ for some $p > 2 + \delta$.
Assumptions B.6(iv) and (v) are new to the two-step estimators we treat here. Part (iv) imposes a Lipschitz continuity condition on the score and the Hessian of $\log f_{2t}(\alpha, \beta)$ with respect to $\alpha$. Note that in the example of OLS with a generated regressor, condition (iv) follows provided that $E\left(w_t^2 + y_t^2\right)^{\frac{\varepsilon}{\varepsilon-1}p} < \infty$, for $p > 2 + \delta$ as in Assumption B.6(iii) and for some $\varepsilon > 1$. Part (v) imposes uniform integrability conditions on the first-step estimator $\hat{\alpha}_n$ and its bootstrap analogue $\hat{\alpha}_n^*$. Similarly, part (vi) assumes the uniform integrability condition on $\hat{\beta}_n$. These high-level conditions could be derived from more primitive conditions such as the ones used by Cheng (2015) or Kato (2011), but we prefer to state them as high-level conditions since our focus is on the second-step bootstrap estimator $\hat{\beta}_n^*$. It is nevertheless interesting to note that stronger-than-usual uniform square integrability conditions on the first-step estimators are imposed in order to verify the uniform square integrability condition on the second-stage bootstrap estimator $\hat{\beta}_n^*$. In particular, we require the existence of a bit more than six moments for $\sqrt{n}(\hat{\alpha}_n - \alpha_0)$ and its bootstrap analogue, three times more than the number of moments required for the second-step estimators $\hat{\beta}_n$ and $\hat{\beta}_n^*$. Under these assumptions, we can prove the following theorem.
Theorem 4.5. Suppose Assumptions A and B, strengthened by Assumption B.6, hold. Then, for some δ > 0,

Monte Carlo Simulations
We here assess the properties of the bootstrap approximations proposed in Sections 3 and 4. We do so via detailed and realistic Monte Carlo simulations, and we start by describing the design of the study. We consider a bivariate copula-based model. Each variable's marginal distribution is an AR(1)-GARCH(1,1) with standardized Student's t errors. The dependence between the two variables X_1 and X_2 is captured by the Clayton copula with parameter β = 1, which roughly implies a linear correlation of 0.5 (see Nelsen 1999 for more on this copula). We use parameters similar to those found in applied work (see, e.g., Oh and Patton 2013): GARCH parameters [0.05, 0.05, 0.9], and ν_1 = ν_2 = ν with ν ∈ {6, 10}. Thus, we have two DGPs, which differ only in the value of the Student's t parameter ν, which controls the thickness of the tails of the distributions. Note that as ν → ∞, η ∼ N(0, 1). We generate repeated trials of length n ∈ {200, 500, 2500} from these processes and conduct inference based on a misspecified model that assumes φ_1,i = 0 (i.e., we estimate a constant mean-GARCH(1,1)-Student's t-Clayton copula model for each trial).
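As an illustration, the DGP above can be simulated along the following lines. This is a minimal Python sketch under stated assumptions: we abstract from the AR(1) term (consistent with the constant-mean model estimated on each trial), take the GARCH parameters to be (ω, a, b) = (0.05, 0.05, 0.9) in that order (our labeling of the [0.05, 0.05, 0.9] vector above), and draw from the Clayton copula by the standard conditional-inverse method; all function names are ours.

```python
import numpy as np
from scipy import stats

def simulate_dgp(n, beta=1.0, nu=6, omega=0.05, a=0.05, b=0.90, rng=None):
    """Simulate bivariate data with Clayton(beta)-dependent standardized
    Student's t innovations and GARCH(1,1) conditional variances."""
    rng = np.random.default_rng() if rng is None else rng
    # Clayton copula draws via the conditional-inverse method
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)
    u2 = (u1 ** (-beta) * (w ** (-beta / (1.0 + beta)) - 1.0) + 1.0) ** (-1.0 / beta)
    # standardized Student's t innovations (scaled to unit variance)
    scale = np.sqrt((nu - 2.0) / nu)
    eta = scale * np.column_stack([stats.t.ppf(u1, nu), stats.t.ppf(u2, nu)])
    # GARCH(1,1) recursion, started at the unconditional variance
    X = np.zeros((n, 2))
    h = np.full(2, omega / (1.0 - a - b))
    for t in range(n):
        X[t] = np.sqrt(h) * eta[t]
        h = omega + a * X[t] ** 2 + b * h
    return X
```

With β = 1 the Clayton copula implies Kendall's τ = 1/3, and the simulated series display linear correlation near 0.5, as noted above.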
It is easy to see that our bivariate density models constructed using copulas can be partitioned into elements relating only to the marginal distributions and elements relating only to the copula. As pointed out by Joe (1995) and Patton (2006), when such a partition is not possible, the familiar one-stage maximum likelihood estimator is the natural estimator to employ. However, when this partitioning is possible, as in our simulation setting, great computational savings may be achieved by employing a multi-stage estimator. Therefore, in the following we consider the multi-stage maximum likelihood estimator (MSMLE).
Our estimation steps are:
1. Estimate the conditional mean of X_i, i = 1, 2, using the sample mean n^{-1} Σ_{t=1}^n X_it. Obtain the estimated residuals ε̂_it = X_it − n^{-1} Σ_{t=1}^n X_it.
2. Estimate the conditional variance parameters using QML (with the normal log-likelihood) and the residuals from step 1, conditioning on the realizations for t = 1. Obtain the estimated standardized residuals η̂_it.
3. Estimate ν using ML and the standardized residuals from step 2. Obtain the estimated probability integral transforms (PITs) Ĝ_it.
4. Estimate β using ML and the estimated PITs from step 3.
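The four steps above can be sketched as follows. This is a simplified Python implementation under stated assumptions: Gaussian QML is done by Nelder-Mead with a penalty outside the stationarity region, the Student's t density is parameterized to have unit variance, and the Clayton log-density is written out directly; the function names, starting values, and optimizer bounds are our own choices, not the article's.

```python
import numpy as np
from scipy import stats, optimize

def garch_filter(params, eps):
    """GARCH(1,1) variance recursion, conditioning on the t = 1 realization."""
    omega, a, b = params
    h = np.empty_like(eps)
    h[0] = eps.var()
    for t in range(1, len(eps)):
        h[t] = omega + a * eps[t - 1] ** 2 + b * h[t - 1]
    return h

def gaussian_qml_nll(params, eps):
    omega, a, b = params
    if omega <= 0 or a < 0 or b < 0 or a + b >= 1:
        return 1e10  # penalty outside the stationarity region
    h = garch_filter(params, eps)
    return 0.5 * np.sum(np.log(h) + eps ** 2 / h)

def std_t_nll(nu, eta):
    """Negative log-likelihood of a unit-variance Student's t."""
    s = np.sqrt((nu - 2.0) / nu)
    return -np.sum(stats.t.logpdf(eta / s, nu) - np.log(s))

def clayton_nll(beta, u, v):
    """Negative Clayton copula log-likelihood."""
    return -np.sum(np.log1p(beta) - (1.0 + beta) * (np.log(u) + np.log(v))
                   - (2.0 + 1.0 / beta) * np.log(u ** -beta + v ** -beta - 1.0))

def multistage_fit(X):
    n, d = X.shape
    pits = np.empty_like(X)
    for i in range(d):
        # Steps 1-3 per margin: sample mean, Gaussian QML for GARCH(1,1), ML for nu
        eps = X[:, i] - X[:, i].mean()
        res = optimize.minimize(gaussian_qml_nll, x0=[0.05, 0.05, 0.9],
                                args=(eps,), method="Nelder-Mead")
        eta = eps / np.sqrt(garch_filter(res.x, eps))
        nu = optimize.minimize_scalar(std_t_nll, bounds=(2.1, 60.0),
                                      args=(eta,), method="bounded").x
        s = np.sqrt((nu - 2.0) / nu)
        pits[:, i] = np.clip(stats.t.cdf(eta / s, nu), 1e-6, 1 - 1e-6)
    # Step 4: ML for the Clayton parameter on the estimated PITs
    beta = optimize.minimize_scalar(clayton_nll, bounds=(0.01, 20.0),
                                    args=(pits[:, 0], pits[:, 1]),
                                    method="bounded").x
    return beta, pits
```

Note that each stage conditions only on the previous stage's output, which is what makes the estimation error accumulate across stages and motivates the bootstrap inference studied here.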
More generally, in a multivariate d-dimensional application (with d ≥ 2) there are a total of 3d + 1 estimation steps: three steps for each marginal distribution and one step for the copula. Under misspecification, as in our context, the scores are generally correlated, justifying the use of blockwise bootstrap methods. Furthermore, for misspecified models the QMLE is generally inconsistent for the true copula parameter β; instead, confidence intervals are obtained for the copula pseudo-true parameter β_0. We evaluate the pseudo-true parameter by simulation, using 1 million observations.
To generate the bootstrap data, we use the MBB. The number of Monte Carlo trials is 1000, with B = 999 bootstrap replications each. We implement three resampling methods: the fully optimized bootstrap procedure (B1), the fast resampling approach (B2), and the fast resampling approach proposed by Armstrong, Bertanha, and Hong (2014) (B3). To select the block size, we rely on the asymptotic equivalence between the MBB and the Bartlett kernel variance estimators, and set the block size equal to the bandwidth chosen by Andrews's (1991) automatic procedure for the Bartlett kernel.
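A minimal sketch of the MBB index draw (the block length here stands in for the Andrews-selected bandwidth; the function name is ours):

```python
import numpy as np

def mbb_indices(n, block_len, rng):
    """Moving-block bootstrap: draw overlapping blocks of length block_len
    uniformly at random and concatenate them to total length n."""
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])
    return idx[:n]
```

Resampling time indices (rather than observations directly) lets the same block draw be reused across the resampling schemes B1, B2, and B3.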
We consider three types of confidence intervals for the copula pseudo-true parameter β_0: asymptotic normal theory-based confidence intervals, computed using the quantiles of the standard normal distribution; bootstrap percentile confidence intervals; and bootstrap percentile-t confidence intervals, where the latter two use the bootstrap methods (B1, B2, and B3) to compute critical values for the nonstudentized and studentized statistics based on β̂_n, respectively. The asymptotic normal theory-based confidence interval for β_0 is given by β̂_n ± 1.96 · SE(β̂_n), where SE(β̂_n) is a consistent estimator of [var(β̂_n)]^{1/2}.
Four choices are used to compute SE(β̂_n). For the first, we use the multi-stage maximum likelihood (MSML) standard error estimator described in detail in Section 3.1.1 of Patton (2012) (see Equation (41)): the MSML-variance, asymptotic normal theory-based confidence interval for β_0 uses SE(β̂_n) = SE_MSML(β̂_n), where SE_MSML(β̂_n) is the estimated MSML standard error of β̂_n (which has a sandwich form). The second, third, and fourth choices use the bootstrap approaches B1, B2, and B3. In particular, a fully optimized bootstrap-variance, asymptotic normal theory-based confidence interval for β_0 uses SE(β̂_n) = SE_B1(β̂_n), the estimated standard error of β̂_n based on B1; fast resampling-variance intervals are obtained analogously with SE(β̂_n) = SE_B2(β̂_n) and SE(β̂_n) = SE_B3(β̂_n). The standard errors SE_B1(β̂_n), SE_B2(β̂_n), and SE_B3(β̂_n) are obtained from the corresponding bootstrap statistics. The second set of intervals we consider are bootstrap percentile confidence intervals, which are very simple to compute since they avoid the need to explicitly compute standard errors. For each resampling approach (B1, B2, and B3), a symmetric bootstrap percentile confidence interval for β_0 is given by β̂_n ± p*_95, where p*_95 is the 95% quantile of the bootstrap distribution of |β̂*_n − β̂_n|. Finally, we also consider bootstrap percentile-t confidence intervals. For each resampling approach, a symmetric bootstrap percentile-t confidence interval for β_0 is given by β̂_n ± q*_95 · SE_MSML(β̂_n), where q*_95 is the 95% quantile of the bootstrap distribution of √n |β̂*_n − β̂_n| / Ĉ*_n, with Ĉ*_n as in (2), (3), and (4) for B1, B2, and B3, respectively.
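Given the bootstrap replications, the interval constructions above reduce to a few lines. In this sketch, `beta_boot` collects the B bootstrap estimates β̂*_n, `t_boot` the studentized bootstrap statistics, and `se_hat` a standard error estimate such as SE_MSML(β̂_n); the function names are ours.

```python
import numpy as np

def bootstrap_se(beta_boot):
    """Bootstrap standard error: standard deviation of the replications."""
    return beta_boot.std(ddof=1)

def symmetric_percentile_ci(beta_hat, beta_boot, level=0.95):
    """beta_hat +/- the level-quantile of |beta* - beta_hat|."""
    p = np.quantile(np.abs(beta_boot - beta_hat), level)
    return beta_hat - p, beta_hat + p

def symmetric_percentile_t_ci(beta_hat, t_boot, se_hat, level=0.95):
    """beta_hat +/- q * se_hat, with q the level-quantile of |t*|."""
    q = np.quantile(np.abs(t_boot), level)
    return beta_hat - q * se_hat, beta_hat + q * se_hat
```

The percentile interval needs only the replications themselves, while the percentile-t interval additionally requires a standard error for the studentization, which is why it retains SE_MSML(β̂_n) in our implementation.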
Table 1 reports the coverage rates and lengths of 95% confidence intervals for the copula pseudo-true parameter β_0 for the two DGPs. The results in Table 1 are not very sensitive to the value of the Student's t parameter ν. For a given sample size, all intervals have approximately the same coverage rate, with only small differences among them. Both fast resampling procedures, B2 and B3, and the MSML approach perform well even for the small sample size n = 200. The presence of serial correlation in the scores is confirmed by the average block size chosen by Andrews's (1991) method, which equals 3.90 in our simulations. However, as Table 1 suggests, there are notable differences among the methods in confidence interval length: for all DGPs, the intervals based on B1 (whether CLT-based, bootstrap percentile, or percentile-t) tend to be shorter at the smaller sample sizes than the CLT-MSML, B2, and B3 intervals.
Note that all four asymptotic normal theory-based confidence intervals differ only in the way the estimated standard errors of β̂_n are computed. To gain further insight into the "relatively" good performance of these asymptotic normal theory-based confidence intervals in finite samples, we compute the ratio of the estimated standard error to its true value and the mean-square error (MSE) of the estimated standard errors. The results are presented in Table 2.
Notes to Table 1: CLT-MSML, CLT-B1, CLT-B2, and CLT-B3 denote intervals based on the normal distribution using estimated standard errors from MSML, B1, B2, and B3, respectively; B1 denotes bootstrap intervals based on the fully optimized procedure, B2 bootstrap intervals based on the fast resampling procedure, and B3 bootstrap intervals based on the fast resampling approach proposed by Armstrong, Bertanha, and Hong (2014). 1000 Monte Carlo trials with 999 bootstrap replications each. Pseudo-true parameters were calculated using one million observations: β_0 = 0.99 for ν = 6 and β_0 = 1.07 for ν = 10.
For small sample sizes, MSML and B2 on average overestimate the standard errors, with ratios of estimated to true standard error above 1. For instance, when n = 200 and ν = 6, this ratio is 1.12 for MSML and 1.14 for B2, whereas it is 1.01 for B1 and 0.99 for B3. Consequently, confidence intervals based on estimated standard errors from MSML and B2 are longer than those based on B1 or B3. The gains associated with the bootstrap methods can be quite substantial when the researcher's main goal is to estimate standard errors. The results in Table 2 favor the bootstrap, particularly for small sample sizes. More specifically, the full resampling method B1 is better than using MSML or B2 standard errors: in small samples, B1 estimates the standard error of the copula parameter estimator β̂_n more precisely than the MSML, B2, and B3 approaches, while for large sample sizes all four methods perform approximately equally well. For instance, when n = 200 and ν = 10, the MSEs of the estimated standard errors of β̂_n based on MSML, B1, B2, and B3 are 7.03·10^{-3}, 0.96·10^{-3}, 6.31·10^{-3}, and 3.28·10^{-3}, respectively, whereas for n = 2500 and ν = 10 the MSEs become 0.17·10^{-3}, 0.02·10^{-3}, 0.15·10^{-3}, and 0.03·10^{-3}, respectively. Thus, although all four methods (MSML, B1, B2, and B3) are asymptotically equivalent and the full resampling method B1 may be computationally much more demanding, in small samples the improved standard error estimates from B1 may outweigh its computational cost. Overall, the performance of B2 is comparable to that of MSML, B3 outperforms B2, and B1 outperforms B2, B3, and MSML, providing more accurate estimators of the standard errors, especially when the sample size is small.

Conclusion
This article proposes and theoretically justifies bootstrap methods for inference on nonlinear dynamic models that are estimated in two (or more) stages of quasi-maximum likelihood estimation (QMLE). In particular, we show the consistency of the bootstrap distribution of the two-step QML estimator under dependence and heterogeneity conditions similar to those used by Gonçalves and White (2004) for the one-step case. We also prove the consistency of bootstrap standard errors for two-step QMLE, a result that does not seem to be available even for iid data. This justifies the standard practice of computing bootstrap standard errors instead of analytical standard errors, which quickly becomes cumbersome in the multi-stage QMLE context. Our simulation results show that intervals based on bootstrap standard errors or bootstrap percentile intervals obtained with the fully optimized method, which resamples the log-likelihood functions jointly, are shorter on average than intervals based only on asymptotic theory or on the fast resampling method we propose. Thus, although more computationally demanding, the fully optimized bootstrap method has better finite sample properties than the other methods we consider.