Parameter Estimation Robust to Low-Frequency Contamination

We provide methods to robustly estimate the parameters of stationary ergodic short-memory time series models in the potential presence of additive low-frequency contamination. The types of contamination covered include level shifts (changes in mean) and monotone or smooth time trends, both of which have been shown to bias parameter estimates toward regions of persistence in a variety of contexts. The estimators presented here minimize trimmed frequency domain quasi-maximum likelihood (FDQML) objective functions without requiring specification of the low-frequency contaminating component. When proper sample size-dependent trimmings are used, the FDQML estimators are consistent and asymptotically normal, asymptotically eliminating the presence of any spurious persistence. These asymptotic results also hold in the absence of additive low-frequency contamination, enabling the practitioner to robustly estimate model parameters without prior knowledge of whether contamination is present. Popular time series models that fit into the framework of this article include autoregressive moving average (ARMA), stochastic volatility, generalized autoregressive conditional heteroscedasticity (GARCH), and autoregressive conditional heteroscedasticity (ARCH) models. We explore the finite sample properties of the trimmed FDQML estimators of the parameters of some of these models, providing practical guidance on trimming choice. Empirical estimation results suggest that a large portion of the apparent persistence in certain volatility time series may indeed be spurious. Supplementary materials for this article are available online.


INTRODUCTION
Empirical evidence of level shifts (changes in mean) and other deterministic trends has been recognized for many years. Of the numerous and varied examples, Garcia and Perron (1996) found large level shifts in U.S. real interest rate series; Qu (2011) rejected the null hypothesis of stationary short or long memory against the alternative of level shifts or deterministic trends for U.S. inflation rate series; Eraker, Johannes, and Polson (2003), Stărică and Granger (2005), and Qu and Perron (2013) found that incorporating jumps into volatility models substantially improves the capture of the dynamics of stock market return volatility; and McCloskey and Perron (2013) provided evidence that contaminating components bias standard memory parameter estimates upward for daily stock market return volatility. The literature also indicates that the presence of level shifts or trends leads to the phenomenon of "spurious persistence," in which the econometrician is misled into believing that a process is more persistent than it actually is. For example, Perron (1990) demonstrated that abrupt changes in mean often induce spurious nonrejection of the unit root hypothesis, while Bhattacharya, Gupta, and Waymire (1983) demonstrated that certain deterministic trends can induce spurious long-memory features in the data. Perron (1990) also showed that level shifts asymptotically bias autoregressive coefficient estimates toward one. Similarly, Lamoureux and Lastrapes (1990), through simulation evidence, and Mikosch and Stărică (2004) and Hillebrand (2005), analytically, have shown that when the mean of a GARCH process changes, the sum of its estimated autoregressive parameters converges to unity. Much recent attention has focused on the phenomenon that changes in mean also lead to spurious long memory.
For example, Diebold and Inoue (2001), Granger and Hyung (2004), Mikosch and Stărică (2004), and Perron and Qu (2010) have shown through simulation and theory that level shifts induce hyperbolically decaying autocorrelations and spectral density estimates that diverge at the zero frequency. Thus, the empirical evidence suggests that an estimation method robust to these contaminating components would be quite useful in practice. The current study provides such a method.
This article presents a robust estimation technique that exploits the difference in stochastic orders of magnitude between the periodograms of an additive low-frequency contaminating process and a weakly dependent contaminated process whose parameters we wish to estimate. Depending on the frequency range under study, one of these periodograms asymptotically dominates the other. The estimators fit the periodogram of the observed process to the spectral density function of a candidate model within certain (higher) frequency ranges for which the periodogram of the contaminated process dominates that of the contaminating process. The robust estimators we focus on are trimmed versions of the frequency domain quasi-maximum likelihood (FDQML) estimator in which the trimming depends on the sample size. The asymptotic properties of the (untrimmed) FDQML estimator have been extensively studied in standard (uncontaminated) linear process contexts by Dunsmuir and Hannan (1976), Dunsmuir (1979), and Hosoya and Taniguchi (1982), among others. We obtain the asymptotic properties of the trimmed version of this estimator in the presence of additive low-frequency contamination.
To overcome the issue of spurious persistence caused by level shifts in the context of fully parametric estimation, a substantial amount of work has been done to explicitly estimate mean changes and change points within a sample (see, among many others, Bai and Perron 1998). After accounting for mean changes by estimating them, one may subsequently estimate the other parameters of a given process. Similarly, one may attempt to explicitly model an underlying trend function that could be causing spurious persistence. However, in practice one may not have a good idea of the form a contaminating component takes, or whether contamination is present at all. Our robust estimators perform well in both the presence and absence of a wide variety of contaminating components. The types of contamination we are interested in are not amenable to high-pass filtering because they cause the observed periodogram to diverge in a shrinking band of frequencies surrounding the origin. Moreover, an approximating high-pass filter, such as that detailed by Baxter and King (1999), is unnecessary in this context since estimation can be performed directly in the frequency domain. (Such filtering is not only unnecessary and not designed for the problem at hand; preliminary simulation results also indicate that its performance in this context is clearly inferior to that of the technique advanced here.) In a separate regression context, Engle (1974) also noted that ignoring a band of low frequencies could help to improve estimation performance when the model being estimated is only valid within a higher frequency range. However, his analysis concerned the properties of estimators that ignore a fixed band of frequencies, which, in the present context, is both unnecessary and wasteful of valuable information on the parameters of the contaminated process.
The methodology presented here complements the work of Müller and Watson (2008), who focus upon modeling the low-frequency variability of a time series by examining a shrinking band (but fixed number) of low-frequency ordinates.
The fully parametric estimation methodology we introduce in this article is related to the robust semiparametric memory parameter estimators of Iacone (2010), McCloskey and Perron (2013), and Hou and Perron (2014). However, the estimators in these articles do not specify the other, short-run dynamics of the time series and hence require a bandwidth parameter that limits the highest frequency ordinate examined. In contrast, our fully parametric approach does not require this upper bound, resulting in an estimator that is consistent at the fully parametric rate. On the other hand, the types of contaminated processes considered in this article are restricted to have absolutely summable autocovariances. In another related article, McCloskey (2013) showed how to use the trimmed FDQML estimator presented here to estimate the fully parametric long-memory stochastic volatility model robust to low-frequency contamination.
Our trimmed FDQML estimators can be used to consistently estimate the parameters of a large class of models, including ARMA, stochastic volatility (SV), ARCH, and GARCH models, in the presence of a wide variety of additive low-frequency contaminating components. They are asymptotically normal under mild conditions. The asymptotic variances of these estimators are the same as those in the uncontaminated and untrimmed case. Depending upon the model being estimated, the trimming can be chosen to balance the competing finite sample biases arising from the potential presence of contaminating components and from ignoring lower frequency ordinates when fitting the model. We provide a simulation study that addresses the issue of trimming parameter choice for popular ARMA, SV, and GARCH models. We subsequently use high-frequency exchange rate volatility and daily stock market volatility data to robustly estimate the parameters of particular specifications of these models, finding that robust estimation significantly reduces the persistence of the fitted models.
The rest of this article proceeds as follows. In Section 2, we detail the robust estimation technique and provide the reasoning behind the use of trimming. Here, we also discuss some of the contaminating processes of interest. Section 3 supplies the asymptotic properties of the trimmed FDQML estimators under general assumptions on the contaminated and contaminating processes and the class of models being estimated. The finite sample properties of the trimmed FDQML estimator are analyzed for three different popular time series models in Section 4 while Section 5 comprises the empirical application of the robust estimation technique to financial volatility data. Section 6 concludes. Proofs of the main results are contained in a supplemental mathematical appendix available online.
In what follows, $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real numbers and integers; $\lfloor K \rfloor$ denotes the largest integer not exceeding any generic $K \in \mathbb{R}$; $O(\cdot)$, $o(\cdot)$, $O_p(\cdot)$, and $o_p(\cdot)$ denote the usual (stochastic) orders of magnitude; $a_T \sim b_T$ means $a_T/b_T \to 1$; $I(\cdot)$ is the indicator function; $\xrightarrow{p}$ and $\xrightarrow{d}$ indicate convergence in probability and in distribution; $A_{hk}$ denotes the entry in the $h$th row and $k$th column of a generic matrix $A$. All convergence concepts are taken as the sample size goes to infinity.

ROBUST ESTIMATION TECHNIQUE
Let $\{y_t\}$ denote a generic covariance stationary process. The discrete Fourier transform and periodogram of $\{y_t\}$ at frequency $\lambda \in [-\pi, \pi]$ are, respectively, defined as
$$w_y(\lambda) = (2\pi T)^{-1/2}\sum_{t=1}^{T} y_t e^{-i\lambda t}, \qquad I_y(\lambda) = |w_y(\lambda)|^2,$$
where $T$ is the sample size. The estimators we examine involve evaluating the observed time series' periodogram at a subset of the Fourier frequencies $\lambda_j \equiv 2\pi j/T$ for $j = -\lfloor T/2 \rfloor + 1, \ldots, \lfloor T/2 \rfloor - 1, \lfloor T/2 \rfloor$. We wish to estimate the model parameters of a contaminated process $\{v_t\}$, but we observe the process
$$x_t = \mu + v_t + u_t,$$
where $\mu$ is a constant, and $\{v_t\}$ and $\{u_t\}$ are independent. Henceforth, "contaminated process" refers to $\{v_t\}$ and "contaminating process" refers to $\{u_t\}$.
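A minimal NumPy sketch of the periodogram at the Fourier frequencies, assuming the normalization $w_y(\lambda) = (2\pi T)^{-1/2}\sum_{t=1}^{T} y_t e^{-i\lambda t}$ (the sign of the exponent does not affect $I_y$). It returns signed indices in $(-T/2, T/2]$ with $j = 0$ removed, which is the index set $F_1$ used by the FDQML objective; the function name `periodogram` is hypothetical.

```python
import numpy as np


def periodogram(y):
    """Return (j, I): signed Fourier indices j in F_1 = (-T/2, T/2] minus {0}, and I_y(lambda_j).

    Sketch assuming w_y(lambda) = (2*pi*T)^(-1/2) * sum_t y_t * exp(-i*lambda*t).
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    # np.fft.fft sums over t = 0..T-1 instead of 1..T; that only multiplies w_y
    # by a unit-modulus phase, so |w_y(lambda_j)|^2 is unchanged.
    w = np.fft.fft(y) / np.sqrt(2.0 * np.pi * T)
    I = np.abs(w) ** 2
    k = np.arange(T)
    # The DFT is 2*pi-periodic, so array index k > T/2 corresponds to j = k - T < 0.
    j = np.where(k > T // 2, k - T, k)
    nonzero = j != 0  # the zero frequency only carries the sample mean
    return j[nonzero], I[nonzero]


# Usage: periodogram ordinates of simulated white noise and their frequencies.
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
j, I = periodogram(y)
lam = 2.0 * np.pi * j / y.size  # lambda_j = 2*pi*j/T
```

Because $I_y(-\lambda) = I_y(\lambda)$ for real data, sums over $F_1$ could be computed over positive indices only; the sketch keeps both signs so that it maps one-to-one onto $F_1$.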

Contaminating Processes of Interest
Many contaminating processes satisfy $E[I_u(\lambda_j)] = O(T/j^2)$; we list some of them here.
Random Level Shifts (RLS): $u_t = \sum_{s=1}^{t} \pi_{T,s}\eta_s$, where $\eta_t$ is iid with $E[\eta_t] = 0$ and $E[\eta_t^2] = \sigma_\eta^2 > 0$, and $\pi_{T,t}$ is iid Bernoulli, taking the value 1 with probability $p/T$ for some $p \ge 0$. The components $\pi_{T,t}$ and $\eta_t$ are mutually independent.
Deterministic Level Shifts (DLS), where $B$ is a fixed positive integer (the number of breaks plus one), where $\phi \in (-1/2, 1/2)$.
Outliers: $u_t = \sum_{i=1}^{M} m_i I(t = T_i)$, where $M$ is a fixed positive integer (the number of outliers), $|m_i| < \infty$ for $i = 1, \ldots, M$, and $0 < T_1 < \cdots < T_{M-1} < T_M \le T$.
The Bernoulli probability of the RLS process is sample size-dependent so that the level shifts are rare; otherwise, $\{u_t\}$ would be better construed as a random walk. For $p = 0$, the RLS process nests the case of no level shifts and no trend. For a discussion of how $E[I_u(\lambda_j)] = O(T/j^2)$ is satisfied for the above processes, we refer the interested reader to McCloskey (2013) and references therein.
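The sketch below simulates the RLS process and compares a Monte Carlo average of $I_u(\lambda_j)$ with a closed form. Assumptions: $\eta_t$ is Gaussian (the text only requires iid, mean zero, positive variance), the DFT normalization is $(2\pi T)^{-1/2}\sum_{t=1}^{T}$, and the closed form is our own short calculation rather than a result quoted from the article. Because the increments $\pi_{T,t}\eta_t$ are iid with mean zero and variance $p\sigma_\eta^2/T$, $u_t$ has the second moments of a random walk, and summing by parts at the Fourier frequencies gives $E[I_u(\lambda_j)] = p\sigma_\eta^2/(4\pi T \sin^2(\pi j/T))$, which behaves like $p\sigma_\eta^2 T/(4\pi^3 j^2)$ when $j/T$ is small: an explicit instance of $O(T/j^2)$. All function names are hypothetical.

```python
import numpy as np


def simulate_rls(T, p, sigma_eta, rng):
    """Random level shifts u_t = sum_{s<=t} pi_{T,s} * eta_s.

    pi_{T,s} ~ iid Bernoulli(p/T); eta_s ~ iid N(0, sigma_eta^2). Normal eta is an
    illustrative assumption; the text only requires iid, mean zero, variance > 0.
    """
    shift_now = rng.random(T) < p / T
    eta = rng.normal(0.0, sigma_eta, size=T)
    return np.cumsum(np.where(shift_now, eta, 0.0))


def mean_rls_periodogram(T, p, sigma_eta, js, reps=200, seed=0):
    """Monte Carlo average of I_u(lambda_j) at the Fourier indices js (0 < j < T)."""
    rng = np.random.default_rng(seed)
    idx = np.asarray(js)
    total = np.zeros(idx.size)
    for _ in range(reps):
        u = simulate_rls(T, p, sigma_eta, rng)
        w = np.fft.fft(u) / np.sqrt(2.0 * np.pi * T)
        total += np.abs(w[idx]) ** 2
    return total / reps


def rls_expected_periodogram(T, p, sigma_eta, js):
    """Closed form E[I_u(lambda_j)] = p*sigma_eta^2 / (4*pi*T*sin^2(pi*j/T)) (own derivation)."""
    j = np.asarray(js, dtype=float)
    return p * sigma_eta ** 2 / (4.0 * np.pi * T * np.sin(np.pi * j / T) ** 2)


# Usage: the averages fall off roughly like 1/j^2 as j moves away from the origin.
js = [1, 2, 5, 50]
print(mean_rls_periodogram(1000, 10, 1.0, js))
print(rls_expected_periodogram(1000, 10, 1.0, js))
```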

Trimmed FDQML Estimation
Frequency domain estimation is designed to select the parameters that provide the best fit of a spectral density function within a given class of models to the periodogram of the observed time series. The FDQML estimator (sometimes referred to as a Whittle estimator) minimizes the negative logarithm of the frequency domain approximation to the Gaussian likelihood function,
$$L_T(\theta) = \frac{1}{T}\sum_{j \in F_1}\left[\log f(\lambda_j;\theta) + \frac{I_x(\lambda_j)}{f(\lambda_j;\theta)}\right],$$
where $F_1 \equiv (-T/2, T/2] \cap \mathbb{Z} \setminus \{0\}$ and $f(\lambda;\theta)$ is the spectral density function of the candidate model with parameters $\theta$, evaluated at frequency $\lambda$. For the contaminated processes we examine, the contaminating component of the observed process $\{x_t\}$ may bias this minimizer away from the true parameter value when frequencies close to zero enter the objective function. This motivates trimming: letting $l \in (0, T/2)$ denote the trimming parameter, the trimmed FDQML estimator is $\hat\theta_T \equiv \arg\min_{\theta \in \Theta} L_{T,l}(\theta)$ for some appropriate parameter set $\Theta$, where
$$L_{T,l}(\theta) = \frac{1}{T}\sum_{j \in F_l}\left[\log f(\lambda_j;\theta) + \frac{I_x(\lambda_j)}{f(\lambda_j;\theta)}\right], \qquad F_l \equiv \{j \in F_1 : |j| \ge l\}.$$
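The trimmed objective can be sketched directly in NumPy. This is an illustration under stated assumptions, not the authors' code: it assumes the DFT normalization $(2\pi T)^{-1/2}$, the trimmed index set $F_l = \{j \in F_1 : |j| \ge l\}$, and a brute-force grid search over candidate parameter vectors standing in for a numerical optimizer. The helper names (`trimmed_periodogram`, `trimmed_fdqml`, `ar1_spec`) are hypothetical, and the AR(1) spectral density $f(\lambda; a, \sigma^2) = \sigma^2/(2\pi(1 - 2a\cos\lambda + a^2))$ serves as the candidate model.

```python
import numpy as np


def trimmed_periodogram(x, l):
    """Return (lambda_j, I_x(lambda_j), T) for j in F_l = {j in F_1 : |j| >= l} (assumed definition)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    I = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * T)   # I_x at array indices k = 0..T-1
    k = np.arange(T)
    j = np.where(k > T // 2, k - T, k)                    # signed index in (-T/2, T/2]
    keep = (j != 0) & (np.abs(j) >= l)                    # drop j = 0 and the trimmed band |j| < l
    return 2.0 * np.pi * j[keep] / T, I[keep], T


def trimmed_fdqml(x, l, spec_density, candidates):
    """Grid-search minimizer of L_{T,l}(theta) = (1/T) sum_{F_l} [log f + I_x / f]."""
    lam, I, T = trimmed_periodogram(x, l)

    def objective(theta):
        f = spec_density(lam, theta)
        return np.sum(np.log(f) + I / f) / T

    values = [objective(theta) for theta in candidates]
    return candidates[int(np.argmin(values))]


def ar1_spec(lam, theta):
    """AR(1) candidate: f(lambda; a, sigma2) = sigma2 / (2 pi |1 - a e^{-i lambda}|^2)."""
    a, sigma2 = theta
    return sigma2 / (2.0 * np.pi * (1.0 - 2.0 * a * np.cos(lam) + a * a))
```

Passing `l=0` keeps every $j \in F_1$ and so reproduces the untrimmed estimator; for example, `trimmed_fdqml(x, len(x) ** 0.51, ar1_spec, grid)` with `grid` a list of `(a, sigma2)` tuples gives the $l = T^{0.51}$ version. The constant $\mu$ only affects $\lambda_0$, which is excluded, so it never enters the fit.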

ASYMPTOTIC PROPERTIES OF ROBUST ESTIMATORS
In what follows, $\theta_0$ denotes the true parameter vector.

Consistency and Asymptotic Normality of the Trimmed FDQML Estimator
We begin with sufficient conditions for consistency of the trimmed FDQML estimator of a linear process.
Assumption 1. The process $\{x_t\}$ is generated by the following data-generating process (DGP):
$$x_t = \mu + v_t + u_t,$$
where $\mu$ is a finite constant and $v_t$ and $u_t$ are independent at all leads and lags. (b) $\Theta \subset \mathbb{R}^s$ is compact and the following properties hold over $\Theta$: (i) $g(\lambda;\theta) \equiv |\sum_{j=0}^{\infty} c(j;\theta)e^{-i\lambda j}|^2$ is continuous in $(\lambda,\theta) \in [-\pi,\pi]\times\Theta$; (ii) $\sigma^2(\theta)$ is continuous and strictly greater than zero over $\Theta$; (iii) $g(\lambda;\theta) > 0$ for all $(\lambda,\theta) \in [-\pi,\pi]\times\Theta$; (iv) if $\theta_0 \ne \theta \in \Theta$, then $g(\lambda;\theta) \ne g(\lambda;\theta_0)$. Furthermore, for $\theta_1, \theta_2 \in \Theta$, if $\theta_1 \ne \theta_2$, then $f(\lambda;\theta_1) \ne f(\lambda;\theta_2)$ on a subset of $[-\pi,\pi]$ of positive Lebesgue measure.
Note that Assumption 1(a) implies that the spectrum of $\{v_t\}$ takes the form $f(\lambda;\theta) = \sigma^2(\theta)g(\lambda;\theta)/(2\pi)$, evaluated at $\theta = \theta_0$. Parts (a) and (b) are variants of standard assumptions for frequency domain estimation of models with linear representations (see, e.g., Condition B of Dunsmuir and Hannan 1976). Part (a) limits the dependence of the contaminated process, and part (b) is composed of identification conditions. Assumption 1 contrasts with the assumptions imposed in the related articles of McCloskey and Perron (2013) and McCloskey (2013) in that it applies to short-memory processes with a fully parametric spectrum that is not necessarily derived from a stochastic volatility model. Part (c) is a high-level assumption that encompasses contamination by many processes of interest.
Remark 1. The methods of this article are useful for estimating the parameters of a short-memory process that is potentially contaminated by a process $\{u_t\}$ with $E[I_u(\lambda_j)] = O(T/j^2)$. If the user is instead interested in semiparametrically estimating the memory parameter of a long-memory process robustly to the same form of contamination, one may use the methods of McCloskey and Perron (2013) or Hou and Perron (2014). If the user is interested in parametrically estimating all of the parameters of a long-memory stochastic volatility model (advanced by Breidt, Crato, and de Lima 1998; Harvey 1998) robustly to the same form of contamination, one may use the methods of McCloskey (2013). The literature does not appear to contain results for parametric estimation of long-memory models robust to low-frequency contamination outside of this long-memory stochastic volatility class. This would likely require a completely different treatment building upon the work of Hosoya (1997) and is beyond the scope of the present article.
For consistency, the trimming parameter must grow fast enough to asymptotically rid the FDQML objective function of its dependence on the frequencies at which the contaminating component dominates the observed periodogram.
Assumption 2. $l^{-1}\log^4 T + l/T \to 0$.
This is quite a weak assumption, since it only requires the trimming to grow ($l \to \infty$) faster than the slowly growing function $\log^4 T$ but slower than the sample size $T$. We then have the following consistency result.
Theorem 1. Under Assumptions 1 and 2, $\hat\theta_T \xrightarrow{p} \theta_0$.
To establish asymptotic normality of the trimmed FDQML estimator, we strengthen the moment conditions on the innovations in the linear representation of the process and the conditions on the coefficients of this representation, and we impose smoothness conditions on the spectral density function.
Assumption 3. For the process $\{v_t\}$ in Assumption 1, the following hold for its spectrum and innovations:
Parts (a)-(c) are standard (see, e.g., Condition C of Dunsmuir 1979). Parts (a) and (b) impose smoothness conditions on the spectral density function of the contaminated process. As in Dunsmuir (1979), we impose linearity in part (c), with innovations that are third-order martingale differences, solely to obtain a parametric form for the asymptotic variance. This may be restrictive in some applications where the second- and third-order martingale difference properties are unknown or higher moments may not exist. Nevertheless, simulation evidence in Section 4 indicates that the trimmed FDQML estimator continues to perform well in the presence of heavy-tailed data. Part (d) requires the contaminated process to be weakly dependent.
Asymptotic normality requires that we also impose a stronger negligible trimming condition (Assumption 4). In particular, requiring $l \to \infty$ faster than $T^{1/2}$ but slower than $T$ ensures that identification and the $T^{1/2}$-asymptotics follow from the periodogram $I_v$ of the stationary process $\{v_t\}$.
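The two rate conditions can be motivated by a back-of-the-envelope bound. This is a heuristic sketch, not the article's proof. It assumes $E[I_u(\lambda_j)] \le C\,T/j^2$ for all $j \in F_1$ and some constant $C$ (one reading of $O(T/j^2)$), and it uses $f(\lambda;\theta) \ge c > 0$ on $[-\pi,\pi]\times\Theta$, which follows from the continuity and positivity conditions of Assumption 1(b) on a compact set. Taking $F_l \equiv \{j \in F_1 : |j| \ge l\}$ with $l \ge 1$,

$$\frac{1}{T}\sum_{j\in F_l}\frac{E[I_u(\lambda_j)]}{f(\lambda_j;\theta)} \;\le\; \frac{C}{c}\sum_{j\in F_l}\frac{1}{j^2} \;\le\; \frac{2C}{c}\sum_{j\ge \lceil l\rceil}\frac{1}{j^2} \;\le\; \frac{2C}{c}\Big(\frac{1}{\lceil l\rceil^2}+\frac{1}{\lceil l\rceil}\Big) \;\le\; \frac{4C}{c\,l}.$$

The average contribution of the contaminating component to the trimmed objective is therefore $O(1/l)$ and vanishes as $l \to \infty$. Score-type terms are scaled by $T^{1/2}$; if $\partial f/\partial\theta$ is also bounded, the same argument gives a contribution of order $T^{1/2}/l$, which vanishes only when $l$ grows faster than $T^{1/2}$. At the other end, the share of discarded Fourier ordinates is roughly $2l/T$, which vanishes when $l/T \to 0$. The sketch ignores the cross terms between the Fourier transforms of $v_t$ and $u_t$ and does not account for the $\log^4 T$ rate in the consistency condition.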
Theorem 2. Under Assumptions 1, 3, and 4, $T^{1/2}(\hat\theta_T - \theta_0)$ is asymptotically multivariate normal with zero mean and covariance matrix given by
It is worth emphasizing here that consistency (Theorem 1) and asymptotic normality (Theorem 2) cannot be obtained by estimation in the time domain unless one takes a parametric approach to the form of the contamination $u_t$. Assumption 1(c) permits a much more robust approach, allowing for a wide range of contamination, or even its absence altogether.
Remark 2. The asymptotic covariance matrix of Theorem 2 is standard in the literature on FDQML estimation (see Dunsmuir 1979). Hence, neither the sample size-dependent trimming nor the presence of low-frequency contamination reduces the asymptotic efficiency of the estimator.

Remark 3. If $E[e_t^2] = \sigma^2$ is parameterized separately from $\theta$, one may alternatively estimate $\theta$ by minimizing the simplified objective function $\sum_{j \in F_l} I_x(\lambda_j)/g(\lambda_j;\theta)$ (see, e.g., Hannan 1973 for linear processes and Giraitis and Robinson 2001 and Mikosch and Straumann 2002 for GARCH processes). In the absence of contaminating components, this simplified estimator is asymptotically equivalent to the untrimmed FDQML estimator. However, this equivalence breaks down when a trimming large enough to make the simplified estimator robust to low-frequency contamination is used. In this case, the estimator becomes asymptotically biased, with a degenerate limiting distribution and a slower than standard parametric rate of convergence. (These results are omitted for brevity and are available from the authors upon request.) We therefore recommend against trimming this simplified objective function for robust estimation.
When the memory parameter $d$ of a long-memory contaminating process $\{u_t\}$ is positive (the typical range of interest in economic applications), (i) for all $\lambda_j$ such that $\lim_{T\to\infty} j/T = 0$, $I_u(\lambda_j)$ would asymptotically dominate the periodogram of the observed process, and (ii) for all $\lambda_j$ such that $\lim_{T\to\infty}|j|/T \in (0, 1]$, neither $I_u(\lambda_j)$ nor $I_v(\lambda_j)$ would dominate. Thus, it does not appear possible to robustly estimate the parameters of a short-memory process $\{v_t\}$ contaminated by a long-memory process $\{u_t\}$ using the frequency domain trimming employed in this article. However, by parametrically specifying the spectral density function of $\{x_t\}$ to include a short-memory component and a separate long-memory component, existing frequency domain estimation methods would allow one to form consistent and asymptotically normal estimates of the parameters of both components. See, for example, Hosoya (1997) and references therein.

Examples of Processes Satisfying the Assumptions
We now discuss a few popular models for the contaminated process {v t }.
3.2.1 ARMA Models. Stationary ergodic ARMA(p, q) processes satisfy Assumption 1(a). If the parameter set $\Theta$ is restricted to models for which (i) the roots of the autoregressive and moving average polynomials, $A(z;\theta)$ and $B(z;\theta)$, lie outside the unit circle, (ii) $A(z;\theta)$ and $B(z;\theta)$ have no common roots, and (iii) the highest-order AR and MA parameters are nonzero, then these ARMA(p, q) models satisfy Assumptions 1(b) and 3(a)-(b).
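A sketch of the ARMA(p, q) spectral density together with a check of restrictions (i)-(iii). The text does not fix sign conventions, so this assumes $A(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ and $B(z) = 1 + \vartheta_1 z + \cdots + \vartheta_q z^q$ with innovation variance $\sigma^2$, giving $f(\lambda) = \frac{\sigma^2}{2\pi}|B(e^{-i\lambda})|^2/|A(e^{-i\lambda})|^2$. Both function names are hypothetical, and the common-root test uses a numerical tolerance of our choosing.

```python
import numpy as np


def arma_spectral_density(lam, ar, ma, sigma2):
    """f(lambda) = sigma2/(2 pi) |B(e^{-i lambda})|^2 / |A(e^{-i lambda})|^2.

    Assumed conventions: A(z) = 1 - ar[0] z - ... - ar[p-1] z^p,
                         B(z) = 1 + ma[0] z + ... + ma[q-1] z^q.
    """
    z = np.exp(-1j * np.asarray(lam, dtype=float))
    A = np.ones_like(z)
    for k, c in enumerate(ar, start=1):
        A = A - c * z ** k
    B = np.ones_like(z)
    for k, c in enumerate(ma, start=1):
        B = B + c * z ** k
    return sigma2 / (2.0 * np.pi) * np.abs(B) ** 2 / np.abs(A) ** 2


def satisfies_arma_conditions(ar, ma, tol=1e-6):
    """Check (i) roots outside the unit circle, (ii) no common roots, (iii) nonzero top lags."""
    ar = np.asarray(ar, dtype=float)
    ma = np.asarray(ma, dtype=float)
    if (ar.size and ar[-1] == 0.0) or (ma.size and ma[-1] == 0.0):
        return False  # (iii) highest-order coefficients must be nonzero
    # np.roots expects coefficients ordered from the highest degree down.
    roots_a = np.roots(np.r_[1.0, -ar][::-1])
    roots_b = np.roots(np.r_[1.0, ma][::-1])
    if np.any(np.abs(roots_a) <= 1.0) or np.any(np.abs(roots_b) <= 1.0):
        return False  # (i) all roots must lie outside the unit circle
    if roots_a.size and roots_b.size:
        if np.min(np.abs(roots_a[:, None] - roots_b[None, :])) < tol:
            return False  # (ii) A and B must not share a root
    return True
```

For an AR(1) with $a = 0.5$ and $\sigma^2 = 1$, integrating $f$ over $[-\pi, \pi]$ returns the process variance $1/(1 - a^2) = 4/3$, which is a convenient sanity check on the normalization.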
3.2.2 Stochastic Volatility Models. The standard short-memory SV model with an additive low-frequency component contaminating the volatility is given by a returns process $\{r_t\}$ that follows (2) and (3), with the polynomials $A(z;\theta)$ and $B(z;\theta)$ satisfying the same conditions as in the ARMA(p, q) case above and $\{u_t\}$ satisfying Assumption 1(c). Qu and Perron (2013) studied a special case of a contaminated AR(1) version of this model, providing a Bayesian procedure for estimation and inference. In contrast, our frequentist methods apply to a much larger class of contaminated and contaminating processes.
For this contaminated SV model, the log-squared returns have the following decomposition, where $\theta_1 = \sigma_\xi^2$ and $\theta_2 = \sigma_\zeta^2$ are the first two entries of the parameter vector. Under the same assumptions imposed on the ARMA component $\{\tilde v_t\}$ as those imposed above for the pure ARMA model, $\{v_t\}$ is weakly stationary and therefore has the linear representation required by Assumption 1(a). If the ARMA component $\{\tilde v_t\}$ is purely autoregressive of finite order and all roots of $A(z;\theta)$ lie outside the unit circle for $\theta \in \Theta$, Assumptions 1(b) and 3(a)-(b) follow from results in Pagano (1974).
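For the ARSV specification used in Section 4 ($A(L;\theta) = 1 - aL$, $B(L) = 1$), a natural reading of the decomposition is $\log r_t^2 = \text{const} + u_t + \tilde v_t + \zeta_t$, where $\tilde v_t$ is an AR(1) driven by innovations of variance $\sigma_\xi^2$ and $\zeta_t$ is white noise with variance $\sigma_\zeta^2$, independent of $\tilde v_t$. That decomposition is our assumption here, since its display is not reproduced in this text. Under it, the spectral density of $v_t = \tilde v_t + \zeta_t$ is the AR(1) spectrum plus a flat noise floor, as sketched below (the function name is hypothetical). If $\varepsilon_t \sim N(0,1)$ and $\zeta_t$ is the centered $\log \varepsilon_t^2$, then $\sigma_\zeta^2$ equals the known variance of $\log\chi^2_1$, namely $\pi^2/2$, which a quick simulation confirms.

```python
import numpy as np


def arsv_logsq_spectral_density(lam, a, sigma2_xi, sigma2_zeta):
    """Spectral density of v_t = v~_t + zeta_t: AR(1) part plus a flat white-noise floor.

    Assumes v~_t = a v~_{t-1} + xi_t with var(xi_t) = sigma2_xi, and zeta_t white noise
    with var(zeta_t) = sigma2_zeta, independent of v~_t.
    """
    lam = np.asarray(lam, dtype=float)
    ar1_part = sigma2_xi / (1.0 - 2.0 * a * np.cos(lam) + a * a)  # sigma2_xi / |1 - a e^{-i lam}|^2
    return (ar1_part + sigma2_zeta) / (2.0 * np.pi)


# If eps_t ~ N(0, 1) and zeta_t is the centered log(eps_t^2), then var(zeta_t) = pi^2 / 2.
rng = np.random.default_rng(0)
z = rng.standard_normal(10 ** 6)
print(np.var(np.log(z ** 2)), np.pi ** 2 / 2)  # the two numbers should be close (about 4.93)
```

The flat floor $\sigma_\zeta^2/(2\pi)$ is why the measurement noise does not interfere with trimming: it adds the same amount at every frequency, while the contamination is concentrated near the origin.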

(G)ARCH Models.
Finally, we turn to contaminated (G)ARCH models for the returns process $\{r_t\}$, where $\{u_t\}$ satisfies Assumption 1(c) and $\mathcal{I}_t^{\varepsilon}$ is the $\sigma$-field of events generated by $\varepsilon_s$, $s \le t$. From (4), several specifications for the returns process $\{r_t\}$ may arise, perhaps the most natural being (5), which nests the standard (G)ARCH framework when $u_t = 0$. One may view the process $\{\tilde v_t\}$ as the squares of a latent (G)ARCH process whose parameter vector $\theta$ we wish to estimate. For this contaminated (G)ARCH model, the quantities $\xi_t \equiv \tilde v_t - h_t$ are martingale differences. If $\{\tilde v_t\}$ is second-order stationary, $\{v_t\}$ has the spectral density function (6). The GARCH(p, q) model similarly has an ARMA(max(p, q), q) representation, and an invertible GARCH(p, q) model is a subcase of the model for $\tilde v_t$ in (5). The functional form of the spectral density function (6) resembles that given in Assumption 1(a), but in this contaminated (G)ARCH case the sequence $\{\xi_t\}$ is not independent in second moments, violating Assumption 3(c). Hence, a separate proof of asymptotic normality for the trimmed FDQML estimator is required.
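For the GARCH(1,1) case studied in Section 4, take $h_t = \omega + a\,\tilde v_{t-1} + b\,h_{t-1}$ with $\tilde v_t$ the latent squared GARCH values; reading $a$ as the ARCH coefficient and $b$ as the GARCH coefficient is our assumption, consistent with the constraint $a + b \le 0.99$ used there. Substituting $h_t = \tilde v_t - \xi_t$ gives the standard ARMA(1,1) form $\tilde v_t = \omega + (a+b)\tilde v_{t-1} + \xi_t - b\,\xi_{t-1}$, so the spectral density of the squares is $\frac{\sigma_\xi^2}{2\pi}|1 - b e^{-i\lambda}|^2 / |1 - (a+b)e^{-i\lambda}|^2$. This is a sketch of that textbook representation, not a transcription of (6); it requires $a + b < 1$ and a finite fourth moment so that $\sigma_\xi^2$ exists. The function name is hypothetical.

```python
import numpy as np


def garch11_sq_spectral_density(lam, a, b, sigma2_xi):
    """Spectral density of squared GARCH(1,1) values via the ARMA(1,1) representation
    v~_t = omega + (a + b) v~_{t-1} + xi_t - b xi_{t-1}, with var(xi_t) = sigma2_xi.
    """
    if a + b >= 1.0:
        raise ValueError("need a + b < 1 for a second-order stationary ARMA(1,1) form")
    lam = np.asarray(lam, dtype=float)
    ma_part = 1.0 - 2.0 * b * np.cos(lam) + b * b                # |1 - b e^{-i lam}|^2
    ar_part = 1.0 - 2.0 * (a + b) * np.cos(lam) + (a + b) ** 2   # |1 - (a+b) e^{-i lam}|^2
    return sigma2_xi / (2.0 * np.pi) * ma_part / ar_part


# Low-frequency mass grows with persistence a + b (parameter pairs from Section 4).
for a, b in [(0.05, 0.3), (0.05, 0.6), (0.05, 0.9)]:
    print(a, b, garch11_sq_spectral_density(0.0, a, b, 1.0))
```

The growth of $f(0)$ as $a + b$ approaches one is the spectral signature of persistence, which is also why level shifts, which inflate the periodogram near the origin, push untrimmed estimates toward $a + b = 1$.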

Consistency and Asymptotic Normality for Contaminated (G)ARCH Models
We now provide conditions under which the trimmed FDQML estimator of the parameters of a contaminated (G)ARCH process given by (4) is consistent and asymptotically normal.
Remark 5. Assumption 1(b)(i) holds when $\{\tilde v_t\}$ is modeled as the squares of a stationary ARCH(p) or GARCH(p, q) process, which follows from its implicit ARMA(max(p, q), q) representation. The remainder of Assumption 1(b) is implied by Assumptions ARCH-4 and 2 of Giraitis and Robinson (2001).
Remark 6. As noted by previous authors (see Giraitis and Robinson 2001; Mikosch and Straumann 2002), Assumption ARCH-8 may be strong in certain contexts. For example, it implies the existence of the eighth moment of the ARCH innovations $\varepsilon_t$, whereas standard time domain methods typically impose only the existence of the fourth moment. Nevertheless, Corollary 1 shows that consistency is still attainable under the weaker Assumption ARCH-4.
Remark 7. The limiting covariance matrices of Theorems 2 and 3 can be difficult to estimate in practice, particularly in the (G)ARCH framework (see Remark 2.2 of Giraitis and Robinson 2001). This challenge, which is also present for standard frequency domain estimators in the absence of data contamination, is exacerbated by the potential presence of data contamination. Estimates of these variances and/or valid bootstrap procedures may still be feasible. The works of Taniguchi (1982) and Chiu (1988) are likely to provide useful starting points for this direction of research.

FINITE SAMPLE PROPERTIES OF ROBUST FDQML ESTIMATORS
We now study the finite sample properties of our trimmed FDQML estimators in the presence and absence of low-frequency contamination, and under thin and heavy tails. We chose to examine the models below for their popularity in applied work and the ease of interpreting their "persistence parameters." The values reported in this section all result from 1000 Monte Carlo replications.

The AR(1) Model
We begin by fitting an AR(1) time series in the absence of contamination:
$$x_t = v_t = a v_{t-1} + e_t, \qquad e_t \sim \text{iid}(0, \sigma_e^2). \tag{7}$$
We use $\sigma_{e0}^2 = 1$ and $a_0 = 0, 0.5, 0.95$, as these values are representative of the range of persistence for $a_0$. We estimate $(a_0, \sigma_{e0}^2)$ on $[0, 0.99] \times [0.01, 10]$. (Because the minimization is conducted over $a \in [0, 0.99]$, Assumption 3(a)(ii) does not technically hold when $a_0 = 0$. However, if asymptotic normality is required, one can easily expand the search space.) We compute the standard FDQML estimator and three trimmed FDQML estimators with trimmings $l = T^{0.4}, T^{0.51}, T^{0.6}$ to illustrate the broad picture. We use sample sizes $T = 1000$, 2000, and 4000, which align with the sizes available for financial returns data. The finite sample biases and root mean squared errors (RMSEs) of the estimators of $a_0$ for the uncontaminated processes are given in the top three panels of Tables 1 and 2. The first column shows that for Gaussian innovations, $e_t \sim \text{iid } N(0, 1)$, all four estimators of $a_0$ perform well, with very little bias and only small RMSE differences, except when using the largest trimming $l = T^{0.6}$. The third column of Tables 1 and 2 shows that making the innovation distribution heavy-tailed hardly changes the results. Here, $e_t$ has a symmetric Paretian distribution such that $P(e_t < -c) = P(e_t > c) = 0.5(1 + c)^{-\kappa}$ for $c \ge 0$, with tail index $\kappa = 2.5$, standardized so that $\sigma_e^2 = E[e_t^2] = 1$. In this case, $v_t$ has an infinite third moment, in violation of Assumption 3(c).
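One way to draw these symmetric Paretian innovations is by inversion: if $U$ is uniform on $(0, 1]$, then $M = U^{-1/\kappa} - 1$ satisfies $P(M > c) = (1 + c)^{-\kappa}$, and attaching an independent random sign gives the stated two-sided tail. For $\kappa > 2$, $E[e_t^2] = E[M^2] = 2/((\kappa - 1)(\kappa - 2))$, which equals $8/3$ at $\kappa = 2.5$, so dividing by $\sqrt{8/3}$ yields $E[e_t^2] = 1$. This inversion scheme and the function name are our illustration; the article does not say how its draws were generated.

```python
import numpy as np


def symmetric_paretian(size, kappa, rng, standardize=True):
    """Draw e with P(e < -c) = P(e > c) = 0.5 * (1 + c)^(-kappa) for c >= 0 (inversion sampler)."""
    u = 1.0 - rng.random(size)              # uniform on (0, 1], avoids u = 0
    magnitude = u ** (-1.0 / kappa) - 1.0   # P(magnitude > c) = (1 + c)^(-kappa)
    sign = np.where(rng.random(size) < 0.5, -1.0, 1.0)
    e = sign * magnitude
    if standardize:
        if kappa <= 2.0:
            raise ValueError("E[e^2] is infinite for kappa <= 2; cannot standardize")
        # E[e^2] = E[magnitude^2] = 2 / ((kappa - 1) * (kappa - 2)) for this tail
        e = e / np.sqrt(2.0 / ((kappa - 1.0) * (kappa - 2.0)))
    return e


# Usage: unit-variance innovations with tail index 2.5, as in the heavy-tailed design.
e = symmetric_paretian(4000, 2.5, np.random.default_rng(0))
```

With $\kappa = 2.5$ the third moment is infinite, so sample moments of order three and higher are unstable; checks on such draws are better based on tail frequencies than on sample kurtosis.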
Next, we contaminate the AR(1) model by adding $u_t$ to $v_t$ in (7), where $\{u_t\}$ is the RLS process described in Section 2.1. Again, $\sigma_{e0}^2 = 1$, but now the variance of a level shift, $\sigma_\eta^2$, is set equal to the variance of $v_t$. We examine a range of values of $p$; results for the representative value $p = 10$ are presented here. The finite sample biases and RMSEs of the estimators of $a_0$ for these processes are displayed in the final three panels of Tables 1 and 2. Beginning with the case of contaminated white noise ($a_0 = 0$) with Gaussian or heavy-tailed innovations, we see that the level shifts induce a very large upward bias in the untrimmed estimator of $a_0$ that appears to stabilize around 0.51. Trimming removes most of this bias, resulting in substantially lower RMSE. Adding short-run dynamics to $\{v_t\}$ by setting $a_0 = 0.5$ or 0.95 does not change the general pattern, except that the finite sample biases decrease. Too large a trimming (e.g., $l = T^{0.6}$) can induce a downward bias that overwhelms the upward bias due to the level shifts but, as in the uncontaminated case, this problem can be avoided by using a more moderately sized trimming.
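A small-scale version of this experiment shows the same qualitative pattern for contaminated white noise. It departs from the design above in several labeled ways: 20 replications rather than 1000, Gaussian $\eta_t$, a grid search over $a$ instead of a numerical optimizer, and $\sigma^2$ profiled out without the $[0.01, 10]$ bound (for fixed $a$, the minimizing $\sigma^2$ is $2\pi$ times the average of $I_x/g$ over $F_l$, where $F_l = \{j \in F_1 : |j| \ge l\}$ is assumed). It is not meant to reproduce the numbers in Tables 1 and 2, and all function names are hypothetical.

```python
import numpy as np


def simulate_ar1_plus_rls(T, a, p, rng):
    """x_t = v_t + u_t: Gaussian AR(1) v_t (sigma_e^2 = 1) plus RLS u_t with sigma_eta^2 = var(v_t)."""
    e = rng.standard_normal(T)
    v = np.empty(T)
    v[0] = e[0] / np.sqrt(1.0 - a * a)           # start v in its stationary distribution
    for t in range(1, T):
        v[t] = a * v[t - 1] + e[t]
    sigma_eta = 1.0 / np.sqrt(1.0 - a * a)        # level-shift s.d. equals the s.d. of v_t
    shifts = np.where(rng.random(T) < p / T, rng.normal(0.0, sigma_eta, T), 0.0)
    return v + np.cumsum(shifts)


def trimmed_fdqml_ar1(x, l, a_grid=np.linspace(0.0, 0.99, 199)):
    """AR(1) trimmed FDQML with sigma^2 profiled out, minimized by grid search over a."""
    T = x.size
    I = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * T)
    k = np.arange(T)
    j = np.where(k > T // 2, k - T, k)
    keep = (j != 0) & (np.abs(j) >= l)            # assumed F_l = {j in F_1 : |j| >= l}
    lam, I = 2.0 * np.pi * j[keep] / T, I[keep]
    A = a_grid[:, None]
    g = 1.0 / (1.0 - 2.0 * A * np.cos(lam)[None, :] + A ** 2)   # f = sigma^2 * g / (2 pi)
    sigma2 = 2.0 * np.pi * np.mean(I / g, axis=1)                # optimal sigma^2 for each a
    profiled = np.log(sigma2 / (2.0 * np.pi)) + np.mean(np.log(g), axis=1) + 1.0
    return a_grid[np.argmin(profiled)]


def mc_mean_estimates(T=2000, a0=0.0, p=10, reps=20, seed=0):
    """Average untrimmed (l = 0) and trimmed (l = T^0.51) estimates of a0 over replications."""
    rng = np.random.default_rng(seed)
    untrimmed, trimmed = [], []
    for _ in range(reps):
        x = simulate_ar1_plus_rls(T, a0, p, rng)
        untrimmed.append(trimmed_fdqml_ar1(x, 0))
        trimmed.append(trimmed_fdqml_ar1(x, T ** 0.51))
    return float(np.mean(untrimmed)), float(np.mean(trimmed))
```

One would expect the untrimmed average to sit far above zero, reflecting the spurious persistence induced by the level shifts, while the trimmed average should be close to the true value $a_0 = 0$. Profiling $\sigma^2$ is legitimate here because, in the AR(1) candidate, $\sigma^2$ enters $f$ only as a multiplicative scale.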
Remark 8. The trimmed FDQML estimator continues to perform quite well in the presence of heavy-tailed data that violate Assumption 3(c). This is also the case for the autoregressive SV model of the following subsection. These simulation results appear to complement the theoretical results of Mikosch et al. (1995), who found that frequency domain estimators continue to perform well even in the presence of an infinite second moment.

The Autoregressive SV Model
The (contaminated) autoregressive SV (ARSV) model we examine is specified in (2) and (3) with $A(L;\theta) = 1 - aL$, $B(L) = 1$, $\varepsilon_t \sim \text{iid } N(0, 1)$, and $\{u_t\}$ an RLS process. Our thin-tailed specification sets $\xi_t \sim \text{iid } N(0, 1)$, while our heavy-tailed one draws $\xi_t$ from the symmetric Paretian distribution with tail index 2.5 defined in the previous subsection. FDQML estimation of $\theta = (a, \sigma_\xi^2, \sigma_\zeta^2)$ is performed using the periodogram of the logarithm of the squares of $r_t$. We examine $a_0 = 0.5$ and 0.95. Using the same sample sizes and trimmings and the analogous parameter space as in Section 4.1, the finite sample biases and RMSEs of the estimators of $a_0$ are provided in Table 3. Beginning with the case of no contamination ($p = 0$) and $a_0 = 0.5$, we again see good performance and little difference across estimators under thin or heavy tails. For the case of high persistence, $a_0 = 0.95$, we encounter the same phenomenon as in the AR(1) model: a moderate trimming should be used.
Adding an RLS contaminating component by setting $p = 10$ and setting the variance of a level shift, $\sigma_\eta^2$, equal to the variance of $\tilde v_t$, we first note that, unlike for the AR(1) models, $E[\hat a]$ does not appear to stabilize at some value less than one; instead, it seems that $E[\hat a] \to 1$. For $a_0 = 0.5$, trimming removes the vast majority of the bias induced by level shifts, and the trimming $l = T^{0.51}$ seems to perform best. We again see competing biases: too large a trimming may discard valuable information about $v_t$, while too small a trimming subjects the estimator to upward bias from $u_t$. This bias competition is quite pronounced in the cases with $a_0 = 0.95$, and its presence is visible in the RMSE values. Though often to a lesser degree, this bias competition is a generic feature of trimmed FDQML estimation in the presence of low-frequency contamination, manifesting itself to varying extents across models and parameter configurations; the present case is a particularly dramatic one. Figures 1 and 2 illustrate this phenomenon more clearly, displaying the absolute bias and RMSE of $\hat a$ as a function of the trimming exponent for the three sample sizes when $a_0 = 0.95$ and $p = 10$.
Remark 9. There is an important parallel between the competing biases present in this context and those arising in semiparametric frequency domain estimation of the memory parameter that is also robust to low-frequency contamination, as in McCloskey and Perron (2013). In the latter case, too large a trimming induces bias in the estimate of the memory parameter in the presence of unmodeled noise or short-run dynamics.

The GARCH(1,1) Model
FDQML estimation of $\theta = (a, b, \sigma_\xi^2)$ is performed on the periodogram of the squares of $r_t$. We examine $(a_0, b_0) = (0.05, 0.3)$, $(0.05, 0.6)$, and $(0.05, 0.9)$. (Assumption ARCH-8 does not hold for the last specification, but ARCH-4 and the other assumptions required for consistency do.) The parameter space is the subset of $[0.01, 0.99] \times [0.01, 0.99] \times [0.1, 10]$ on which $a + b \le 0.99$. We use the same trimmings as above but consider sample sizes of 1000, 4000, and 16,000, since accurate FDQML estimation of the GARCH(1,1) model requires somewhat larger samples. The finite sample biases and RMSEs of the estimators of $(a_0, b_0)$ for these uncontaminated GARCH(1,1) processes are recorded in the top three panels of Tables 4 and 5, where "QML" denotes the standard time domain quasi-maximum likelihood estimator, included for comparison. When $(a_0, b_0) = (0.05, 0.3)$ or $(0.05, 0.6)$, the (trimmed) FDQML estimators exhibit almost no bias when moderate trimmings are used. The standard time domain estimator has lower RMSE than the FDQML estimators, though not by a large margin. For the most persistent case, $(a_0, b_0) = (0.05, 0.9)$, the FDQML estimators run into some problems even under moderate trimming, with substantial downward biases, though estimates using the small trimming $l = T^{0.4}$ perform fairly well in larger samples.
We now add a DLS component to the GARCH(1,1) model: and $\varepsilon_t \sim \text{iid } N(0, 1)$. Letting $\mathrm{var}(\tilde v_t) = \sigma_v^2$, is a DLS process with three level shifts. We now wish to estimate the parameters of the latent $\{\tilde v_t\}$ process. The finite sample biases and RMSEs of the estimators are given in the lower three panels of Tables 4 and 5. Starting with the least persistent case, the trimmed FDQML estimators of $a$ and $b$ clearly outperform both the standard FDQML and time domain estimators. As the sample size grows, the level shifts appear to drive the standard estimators toward $E[\hat a] + E[\hat b] \to 1$, consistent with the results of Lamoureux and Lastrapes (1990) and Hillebrand (2005). By contrast, the trimmed FDQML estimators remove very substantial portions of the biases in $\hat a$ and $\hat b$, with moderate trimmings typically yielding the lowest bias. For $(a_0, b_0) = (0.05, 0.6)$, the overall pattern is quite similar. The results are more nuanced in the highly persistent case, though the trimmed FDQML estimators often have lower biases than the standard time domain estimator.
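Trimming works against a DLS component because the periodogram of a step function has an envelope that decays roughly like $j^{-2}$ in the frequency index, so almost all of its mass sits at the lowest Fourier frequencies. The Python sketch below checks this for a hypothetical path with three level shifts (levels 0, 2, −1, 1.5, changing at $T/4$, $T/2$, and $3T/4$) and $T = 4000$; the path is an illustrative assumption, not the DLS specification used for Tables 4 and 5.

```python
import numpy as np


def low_freq_share(x, l):
    """Share of the periodogram mass of x over j = 1..floor((T-1)/2)
    that falls in the l lowest Fourier frequencies."""
    T = len(x)
    m = (T - 1) // 2
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2.0 * np.pi * T)
    return I[:l].sum() / I.sum()


T = 4000
# Hypothetical DLS path: three level shifts, at T/4, T/2 and 3T/4.
u = np.array([0.0, 2.0, -1.0, 1.5])[np.arange(T) * 4 // T]
for delta in (0.4, 0.51):
    l = int(T ** delta)
    print(f"l = {l}: share of DLS periodogram mass at j <= l is {low_freq_share(u, l):.3f}")
```

The share that survives trimming is what remains to bias the estimator, which is one way to see why too small a trimming leaves some upward bias.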
The Monte Carlo results of this section show that, when additive low-frequency contamination may be present, trimmed FDQML estimators can provide major improvements in terms of bias and RMSE, provided care is taken in choosing the trimming. A moderate trimming, for example, $l = T^{0.51}$, performs quite well in most scenarios.
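The arithmetic behind a sample size-dependent trimming is simple. With $l = \lfloor T^{0.51} \rfloor$ (the floor is our assumption about rounding), the number of discarded frequencies grows with $T$ while their share of the $\lfloor (T-1)/2 \rfloor$ available Fourier frequencies shrinks. A short Python check for the sample sizes used in the GARCH experiments:

```python
import math


def trimming(T, delta=0.51):
    """Return (l, M, l / M) for the assumed trimming l = floor(T**delta),
    where M = floor((T - 1) / 2) is the number of Fourier frequencies j >= 1."""
    l = math.floor(T ** delta)
    M = (T - 1) // 2
    return l, M, l / M


for T in (1000, 4000, 16000):
    l, M, share = trimming(T)
    print(f"T = {T}: l = {l} of M = {M} frequencies ({share:.1%} discarded)")
```

For $T = 1000$, 4000, and 16,000 this gives $l = 33$, 68, and 139, that is, roughly 6.6%, 3.4%, and 1.7% of the frequencies.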

EMPIRICAL APPLICATIONS
We now demonstrate the practical differences between low-frequency contamination-robust and nonrobust estimation by estimating the parameters of two popular time series models.

Autoregressive SV Estimation of High-Frequency Exchange Rate Returns
For this application, we follow Deo, Hurvich, and Lu (2006) in modeling high-frequency log returns via an SV model, although we do not allow for long-memory. As we will see, allowing for long-memory in the model may be unnecessary once low-frequency contamination is accounted for. The data are 30 min log returns of the Japanese yen per U.S. dollar exchange rate from 10:30 pm on December 12, 1986, to 10:00 pm on June 29, 1999 (148,416 observations). (The cleaned data and Andersen and Bollerslev (1997) deseasonalization code were kindly provided by Denis Tkachenko.) The 30 min sampling frequency is chosen to reduce microstructure noise (see Deo, Hurvich, and Lu 2006).
We use two different deseasonalization approaches to rid the data of intraday periodicity (see Andersen and Bollerslev 1997).
The first is the flexible Fourier form approach of Andersen and Bollerslev (1997) (see their Appendix B). We deseasonalize 5 min return data using six trigonometric terms and then aggregate the 5 min deseasonalized returns to the 30 min sampling frequency. (In the notation of (A.3) of Andersen and Bollerslev (1997), we set $J = 0$, $D = 0$, and $P = 6$.) The second deseasonalization approach is due to Deo, Hurvich, and Lu (2006). It is conducted directly in the frequency domain by ignoring certain Fourier frequencies of the periodogram that are fixed with the sample size, allowing the deseasonalization to occur concurrently with the (trimmed) FDQML parameter estimation on the 30 min data. We ignore both the Fourier frequencies $\lambda_j$ Deo, Hurvich, and Lu (2006) but our estimation results are insensitive to this choice.) As noted by Deo, Hurvich, and Lu (2006), excluding these frequencies from the objective function has no effect on the asymptotic properties of the parameter estimates while making them robust to the presence of intraday periodicity. The FDQML parameter estimates of the model are reported in Table 6 for the two trimmings $T^{0.4}$ and $T^{0.51}$ and the two deseasonalization procedures. We report the estimated half-life of a shock to the short-memory component of log-squared returns and the percentage reduction in the estimated half-life when moving from standard to robust estimation. The results are striking: robustly estimating the parameters via trimmed FDQML reduces the implied half-life of a shock by roughly 90% relative to standard estimation. This result is insensitive to the trimming and to the deseasonalization procedure.
Figure 2. RMSE, $a_0 = 0.95$.
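The frequency-domain deseasonalization and the trimming both amount to choosing which Fourier frequencies enter the objective, so they can be combined into a single mask. The exact seasonal frequencies excluded in the article are not recoverable from this text, so the choice in the Python sketch below, dropping the Fourier frequency nearest each daily harmonic $2\pi k/48$ (48 half-hour intervals per day), is a hypothetical assumption, as are the function and parameter names.

```python
import numpy as np


def fdqml_frequency_mask(T, delta, period=48, width=0):
    """Boolean mask over Fourier frequencies j = 1..floor((T-1)/2).

    Keeps j > floor(T**delta) (low-frequency trimming) and drops every j within
    `width` of the Fourier frequency nearest each harmonic 2*pi*k/period
    (hypothetical seasonal exclusion)."""
    m = (T - 1) // 2
    j = np.arange(1, m + 1)
    keep = j > int(T ** delta)
    for k in range(1, period // 2 + 1):
        jk = round(k * T / period)  # index of the frequency nearest 2*pi*k/period
        keep &= np.abs(j - jk) > width
    return keep


T = 148416  # number of 30 min observations in the yen/dollar sample
keep = fdqml_frequency_mask(T, delta=0.51)
print(keep.sum(), "of", keep.size, "Fourier frequencies enter the objective")
```

With $T = 148{,}416$, $T/48 = 3092$ is an integer, so each daily harmonic falls exactly on a Fourier frequency $j = 3092k$; the harmonic $k = 24$ coincides with the Nyquist frequency, which lies outside $j \le \lfloor (T-1)/2 \rfloor$, so 23 frequencies are removed in addition to the trimmed ones.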

GARCH(1,1) Estimation of Daily Stock Market Returns
Finally, we examine low-frequency robust estimation of the GARCH(1,1) model for two standard daily stock market returns time series: the S&P 500 and Dow Jones Industrial Average (DJIA). The S&P 500 series consists of daily returns from January 8, 1926, to March 25, 2004 observations) and The results are again quite notable. The standard estimates of the model parameters produce the usual result of (nearly) integrated GARCH (IGARCH). However, the robust estimators clearly do not, reducing the implied half-lives of shocks by 96%-98%. These results are again insensitive to the trimming used.
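The half-lives quoted in both applications can be computed from the estimated persistence under the usual convention $h = \ln(0.5)/\ln(\rho)$, where $\rho$ is $a$ for the short-memory SV component and $a + b$ for GARCH(1,1); this is our reading of "implied half-life" and is an assumption. The Python helpers below also compute the percentage reduction when moving from standard to robust estimates; the persistence values in the usage line are hypothetical, not the estimates reported in the article.

```python
import math


def half_life(rho):
    """Periods until half of a shock has dissipated when it decays at rate rho
    per period (rho = a for the SV component, a + b for GARCH(1,1))."""
    if rho <= 0.0:
        raise ValueError("persistence must be positive")
    if rho >= 1.0:
        return math.inf  # integrated case: shocks never die out
    return math.log(0.5) / math.log(rho)


def pct_reduction(rho_standard, rho_robust):
    """Percentage reduction in half-life from standard to robust estimates."""
    return 100.0 * (1.0 - half_life(rho_robust) / half_life(rho_standard))


# Hypothetical persistence estimates, for illustration only.
print(half_life(0.995), half_life(0.9), pct_reduction(0.995, 0.9))
```

As the persistence approaches one the half-life diverges, which is why (nearly) integrated GARCH estimates imply extremely long-lived shocks, and why even a moderate drop in $a + b$ translates into a large percentage reduction in half-life.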

CONCLUSION
This article addresses the well-documented issue of spurious persistence from an estimation standpoint. We introduce trimmed FDQML estimation methods that are robust to low-frequency contamination, yielding consistent and asymptotically normal estimators. In the potential presence of low-frequency contaminating components, we have shown that trimmed FDQML estimation outperforms existing estimation techniques, removing large biases and decreasing RMSEs for a wide variety of models. The empirical applications of this article provide evidence that an estimation methodology that is robust to low-frequency contamination is practically important for fitting time series models to economic and financial data.

SUPPLEMENTARY MATERIALS
The supplemental appendix provides technical proofs for the main results of the article. These proofs make use of several supporting lemmas which are also stated and proved in the supplemental appendix.

ACKNOWLEDGMENTS
A previous version of this article circulated under the title "Parameter Estimation Robust to Low-Frequency Contamination with Applications to ARMA, GARCH and Stochastic Volatility Models." The authors are very grateful to Pierre Perron and Zhongjun Qu for numerous helpful discussions and to Nickolai Riabov for research assistance. This article also benefited from the comments of Ivan Fernandez-Val, Hiroaki Kaido, Rasmus Varneskov, the editors, and two anonymous referees. Thanks are due to Denis Tkachenko for kindly sharing cleaned high-frequency foreign exchange data and deseasonalization code.