Quasi-Likelihood Estimation of a Censored Autoregressive Model With Exogenous Variables

ABSTRACT Maximum likelihood estimation of a censored autoregressive model with exogenous variables (CARX) requires computing the conditional likelihood of blocks of data of variable dimensions. As the random block dimension generally increases with the censoring rate, maximum likelihood estimation quickly becomes numerically intractable with increasing censoring. We introduce a new estimation approach using the complete-incomplete data framework, with the complete data comprising the observations that would have been obtained had there been no censoring. We propose a system of unbiased estimating equations, motivated by the complete-data score vector, for estimating a CARX model. The proposed quasi-likelihood method reduces to maximum likelihood estimation when there is no censoring, and it is computationally efficient. We derive the consistency and asymptotic normality of the quasi-likelihood estimator under mild regularity conditions. We illustrate the efficacy of the proposed method by simulations and a real application concerning phosphorus concentration in river water.


Introduction
Censored time series data are frequently encountered in diverse fields including environmental monitoring, medicine, economics, and social sciences. Censoring may arise when a measuring device is subject to some detection limit beyond which the device cannot yield a reliable measurement. For instance, the total phosphorus concentration in river water is an important indicator of water quality, and its fluctuations over time are often monitored in environmental studies. However, the phosphorus concentration cannot be measured exactly if it falls below a certain detection limit.
There is an extensive literature on regression analysis with censored responses since the pioneering work of Buckley and James (1979). However, the case of regression with both the response and covariates subject to censoring is relatively under-explored. Censored time-series regression analysis, for instance the Tobit model with autocorrelated regression errors (Tobin 1958; Robinson 1982a), falls in the latter framework. Robinson (1980) studied maximum likelihood (ML) estimation of Gaussian time series models with left (right) censored data, and showed that with sufficiently sparse censoring, the likelihood of a censored autoregressive (AR) model requires only 1-dimensional integration. Robinson (1982a) showed that Gaussian likelihood estimation of a Tobit model that assumes independent errors is still strongly consistent and asymptotically normal, even if the Gaussian errors are autocorrelated. Moreover, Robinson (1982b) proposed two methods, namely conditional least squares and the method of moments, for estimating the residual autocorrelations, and established their strong consistency and asymptotic normality under the normal error assumption. The consistent autocorrelation estimates can then be used to provide consistent estimation of any finite-order autoregressive-moving-average (ARMA) model specified for the regression errors. Zeger and Brookmeyer (1986) studied maximum likelihood estimation of a regression model with the errors driven by an autoregressive (AR) model of known order p ≥ 0. We first outline their approach in the simple case of no covariates and AR(1) errors. Without censoring, the Markov property of the AR(1) errors implies that the log-likelihood can be decomposed as the sum over t of the conditional log-likelihoods of Y_t given Y_{t−1}.
The presence of censoring invalidates the Markov property, but Zeger and Brookmeyer (1986) pointed out a generalized Markov property: the conditional log-likelihood of Y_t given all past Y's is equal to that of Y_t given Y_{t−j}, j = 1, . . . , t − m, where m < t is the largest index for which Y_m is uncensored. Thus, the data can be partitioned into blocks that are preceded by an uncensored observation, except possibly the initial block, and the log-likelihood is the sum of the conditional log-likelihoods of each block of data given the uncensored observation preceding it, plus that of the initial block. For AR(p) errors, the blocks are delineated by p consecutive uncensored observations, and the log-likelihood is then decomposed as the sum of the conditional log-likelihoods of each block given the p consecutive uncensored observations preceding it, plus the log-likelihood of the initial block; if exogenous variables are available, the (conditional) likelihoods are further conditional on the exogenous variables. (See Robinson (1980) for a similar result in the normal case.) However, the block dimension increases quickly with increasing censoring and AR order; see the supplement. Thus, maximum likelihood estimation quickly becomes numerically intractable with increasing censoring, even for moderately high AR order. Zeger and Brookmeyer (1986) also briefly discussed a pseudo-likelihood approach, but it was not fully developed. Park, Genton, and Ghosh (2007) introduced an imputation method for estimating a censored time series model under the assumption that the complete data form an ARMA process. They proposed to impute the censored values by random values simulated from their conditional distribution given the observed data and the censoring information, and to treat the imputed time series as complete data, with which any estimation procedure for complete time-series data can be used.
However, they focused on the AR(1) model and relied on simulation studies to demonstrate their method, without deriving its theoretical properties.
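The block structure underlying the Zeger-Brookmeyer likelihood decomposition is easy to make concrete. The sketch below (our own illustrative bookkeeping, not the authors' code) partitions a censored series into blocks closed off by runs of p consecutive uncensored observations; the number of censored coordinates inside a block is the dimension of the integral that its conditional likelihood requires, which is why the computation degrades quickly as censoring becomes heavier.

```python
def censored_blocks(censored, p):
    """Partition a censored series into blocks for the likelihood
    decomposition: a block ends as soon as p consecutive uncensored
    observations have accumulated; those p values then serve as the
    conditioning segment for the next block.

    `censored` is a list of booleans (True = censored).  Returns the
    block lengths; the final block may be incomplete."""
    blocks, run, current = [], 0, 0
    for c in censored:
        current += 1
        run = 0 if c else run + 1
        if run == p:                 # p consecutive uncensored obs close a block
            blocks.append(current)
            current, run = 0, 0
    if current:
        blocks.append(current)       # possibly incomplete final block
    return blocks
```

With no censoring and p = 1, every observation is its own block and the likelihood factorizes term by term, matching the uncensored Markov decomposition.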
Our aim is to derive a computationally efficient estimation method, via solving a system of a fixed number of unbiased estimating equations, for estimating a censored regression model with autoregressive errors. The basic idea of our approach is that the score of the complete-data conditional log-likelihood of Y*_t given Y*_{t−j}, j = 1, . . . , p, and the covariates has a closed-form expression, and so does its conditional expectation given the (possibly censored) observed time series, evaluated at the same set of model parameters. Setting this conditional mean score to zero then provides an unbiased estimating equation for estimating the model. The proposed quasi-likelihood method becomes maximum likelihood estimation in the absence of censoring. Furthermore, we derive the consistency and asymptotic normality of the proposed estimator under some mild regularity conditions. Implementation of the proposed method for the important special case of normal innovations is discussed in detail.
In Section 2, we elaborate on the model, the proposed estimation procedure, and its theoretical properties. We report the empirical performance of the proposed method in Section 3, and in Section 3.6 we apply the proposed method to a real series of censored phosphorus concentrations in river water. We briefly conclude in Section 4. All technical details are given in the supplement.

The CARX Model
The censoring region is frequently an interval of the form (−∞, c) or (c, ∞), or a finite interval (c_l, c_u), which are referred to as left, right, and interval censoring, respectively (Park et al. 2007). In practice, the reported value for a censored observation is typically the censoring limit c for left or right censoring, or the midpoint c = (c_l + c_u)/2 for interval censoring. Below, the censoring pattern is not restricted, but it is assumed to be pre-determined and independent of the underlying process. The proposed method is applicable to left censoring, right censoring, interval censoring, and more general censoring patterns.
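These censoring conventions can be summarized in a few lines of code. The helper below (an illustrative sketch; the function name and return convention are ours) maps a latent value to its reported value and a censoring indicator, using the reporting rules just described.

```python
def censor(y, lo, hi):
    """Apply the censoring rule: the latent value y is observed exactly
    when it falls outside the censoring region (lo, hi); otherwise only
    a conventional reported value is recorded.

    Returns (reported_value, is_censored).  lo = -inf gives left
    censoring at hi; hi = +inf gives right censoring at lo; finite
    (lo, hi) gives interval censoring, reported as the midpoint."""
    if lo < y < hi:
        if lo == float("-inf"):
            return hi, True            # left censoring: report the limit
        if hi == float("inf"):
            return lo, True            # right censoring: report the limit
        return (lo + hi) / 2, True     # interval censoring: report midpoint
    return y, False
```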
Additionally, let X t be a vector covariate which bears a linear regression relationship with Y t , with the regression errors assumed to follow an autoregressive model of order p, where p is some known nonnegative integer. (The proposed method can be readily extended to a nonlinear regression model.) {X t } is assumed to be always observable.
Below, let v′ denote the transpose of a vector or matrix v. For a time series {s_t}, let s_{i:j} = (s_i, s_{i−1}, . . . , s_j)′ if i > j, and s_{i:j} = (s_i, s_{i+1}, . . . , s_j)′ otherwise. We now state the model, with the latent process given by

Y*_t = X′_t β + η_t, (1)
η_t = ψ_1 η_{t−1} + · · · + ψ_p η_{t−p} + ε_t, (2)

and its linkage to the observations given by

Y_t = Y*_t if Y*_t ∉ C, and Y_t = c otherwise, (3)

where {ε_t} is an independent and identically distributed process with mean 0 and variance σ², C is the censoring region, and c is the reported value for a censored observation. For the general case of a variable censoring region, C and c in Equation (3) are replaced by C_t and c_t. The preceding model is referred to as the censored auto-regressive model with eXogenous variables (CARX). Let B denote the backshift operator, so that B s_t = s_{t−1} for any time series {s_t}; then Equations (1) and (2) can be rewritten as ψ(B)(Y*_t − X′_t β) = ε_t, where ψ(z) = 1 − ψ_1 z − · · · − ψ_p z^p. Let ψ = (ψ_1, . . . , ψ_p)′. Throughout, θ = (β′, ψ′, σ)′ denotes a generic parameter vector, while θ_0 denotes the true parameter vector; it is assumed that σ_0 > 0.
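To make the data-generating mechanism of Equations (1)-(3) concrete, the following sketch simulates a CARX path with a single standard normal covariate and left censoring (function and argument names are ours; the paper's simulations use a two-dimensional covariate and higher AR orders).

```python
import random

def simulate_carx(n, beta, psi, sigma, c, seed=0):
    """Simulate a CARX path: latent response Y*_t = beta * X_t + eta_t
    with AR(p) errors eta_t = sum_j psi_j * eta_{t-j} + eps_t, and left
    censoring at limit c (values below c are reported as c).  The first
    100 steps are discarded as burn-in to wash out initialization."""
    rng = random.Random(seed)
    p = len(psi)
    eta = [0.0] * p
    xs, latent, observed = [], [], []
    for t in range(n + 100):
        x = rng.gauss(0.0, 1.0)
        eta_t = sum(psi[j] * eta[-1 - j] for j in range(p)) + rng.gauss(0.0, sigma)
        eta.append(eta_t)
        if t >= 100:
            y = beta * x + eta_t
            xs.append(x)
            latent.append(y)
            observed.append((max(y, c), y < c))   # (reported value, censored?)
    return xs, latent, observed
```

With c near the center of the stationary distribution, roughly half the responses are censored, mimicking the heavier censoring scenarios studied later.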

Estimation
We consider the problem of estimating a CARX model with data {(Y_t, X_t)}_{t=1}^n generated from the CARX model with unknown parameter θ_0. Further notation is required. For each t, let G_t denote the σ-algebra generated by the observed, possibly censored, data up to time t together with the relevant covariates, with F*_t its complete-data counterpart generated by the latent {Y*_s}. Our proposed method is motivated by maximum likelihood estimation were Y*_t observable: supposing {ε_t} admits a marginal probability density function (pdf) f_θ(·), the joint log-likelihood of Y*_{n:1} conditional on X_{n:1}, with contributions from the initial values suppressed, takes the simpler form

ℓ*_n(θ) = Σ_{t=p+1}^n log f_θ(ε_t(θ)), where ε_t(θ) = (Y*_t − X′_t β) − Σ_{j=1}^p ψ_j (Y*_{t−j} − X′_{t−j} β),

the maximization of which entails the first-order optimality condition ∇ℓ*_n(θ) = 0, where ∇ denotes taking the partial derivative with respect to θ. However, censoring in the observed time series {Y_t} entails that its joint log-likelihood cannot be reduced to a similarly simple form (Zeger and Brookmeyer 1986).
The proposed method of estimation is motivated by the observation that, for all t,

E_{θ_0}{ E_{θ_0}[ ∇ log f_θ(ε_t(θ)) |_{θ=θ_0} | G_t ] } = 0,

so that setting the sum of conditional mean scores to zero,

Σ_{t=p+1}^n E_θ[ ∇ log f_θ(ε_t(θ)) | G_t ] = 0, (6)

yields a system of unbiased estimating equations that combines information from all data. In principle, the σ-algebra G_t can be chosen to include more or less information; for instance, G_t may be enlarged to the σ-algebra generated by all data, namely Y_1, Y_2, . . . , Y_n, in which case the imputed score defined by the left side of (6) is the observed-data score, and the associated Z-estimation method corresponds to (conditional) maximum likelihood estimation. The current choice of G_t is motivated by the ease of computation and the fact that, in the absence of censoring, solving the preceding estimating equation reduces to maximum likelihood estimation. Below, we state an iterative algorithm for solving (6).
Step 1. Initialize the parameter estimate by some estimate, denoted by θ^(0).
Step 2. For each k = 1, 2, . . . , obtain an updated estimate

θ^(k) = argmax_θ Q(θ | θ^(k−1)), (7)

where, for any current estimate θ^(c), Q(θ | θ^(c)) = Σ_{t=p+1}^n Q_t(θ | θ^(c)) and Q_t(θ | θ^(c)) = E_{θ^(c)}[ log f_θ(ε_t(θ)) | G_t ].
Step 3. Iterate Step 2 until ‖θ^(k) − θ^(k−1)‖_2 / ‖θ^(k−1)‖_2 < ϵ for some positive tolerance ϵ ≈ 0. Let θ̃ be the estimate obtained from the last iteration.
We now justify the proposed iterative algorithm. In many cases Q(θ | θ^(c)) is differentiable with respect to θ, so solving Equation (7) is equivalent to solving the equation ∇Q(θ | θ^(c)) = 0. Under suitable regularity conditions, differentiation and expectation can be interchanged; throughout, for any density f(·, θ) appearing above, we assume that ∫ ∇f(x, θ) dx = 0. Thus, θ^(k) solves the following equation:

Σ_{t=p+1}^n E_{θ^(k−1)}[ ∇ log f_θ(ε_t(θ)) | G_t ] = 0.

Hence, if the iteration converges to a limit, denoted by θ̃, then it holds that

Σ_{t=p+1}^n E_{θ̃}[ ∇ log f_θ(ε_t(θ)) |_{θ=θ̃} | G_t ] = 0,

so that θ̃ solves the estimating Equation (6).
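To see the iteration in its simplest incarnation, take p = 0, no covariates, and normal innovations: the algorithm then reduces to the familiar EM iteration for a left-censored normal sample, in which the Q-maximization is available in closed form via truncated normal moments. A minimal self-contained sketch under these assumptions (all names are ours, not the authors' implementation):

```python
import math, random

def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def em_censored_normal(obs, c, iters=200):
    """Steps 1-3 for a left-censored N(mu, sigma^2) sample with limit c.
    obs is a list of (value, is_censored) pairs.  Each pass replaces a
    censored coordinate by the mean and variance of N(mu, sigma^2)
    truncated to (-inf, c) at the current parameters (building Q), then
    maximizes Q in closed form (the mean/variance updates)."""
    vals = [v for v, _ in obs]
    n = len(obs)
    mu, s = sum(vals) / n, 1.0                     # crude initialization
    for _ in range(iters):
        s1 = s2 = 0.0
        for v, cens in obs:
            if not cens:
                s1 += v
                s2 += v * v
            else:
                a = (c - mu) / s
                lam = phi(a) / Phi(a)              # inverse Mills ratio
                m = mu - s * lam                   # E[Y | Y < c]
                var = s * s * (1.0 - a * lam - lam * lam)  # var[Y | Y < c]
                s1 += m
                s2 += var + m * m
        mu = s1 / n
        s = math.sqrt(max(s2 / n - mu * mu, 1e-12))
    return mu, s
```

In this degenerate case the fixed point of the iteration is the maximum likelihood estimate; in the general CARX case the fixed point solves the estimating Equation (6) instead.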
Remark 2.1. Since Equation (6) may have multiple roots, it is desirable to initialize the preceding algorithm with some estimate close to the true value. As mentioned in Section 1, in the case of normal innovations and left (right) censoring, the estimators introduced by Robinson (1982a) and Robinson (1982b) can be used to provide consistent estimates that can serve as initial values for the proposed algorithm. However, these estimators are generally inefficient; for instance, the initial AR estimates use information only from the subset of the data whose lagged responses from lags 1 to p are uncensored. Consequently, for small samples, the initial AR estimates could fall outside the stationarity region. Similarly, the innovation variance estimator could be non-positive (Amemiya 1973). We note that the methods introduced by Robinson (1982a) and Robinson (1982b) may be lifted to the case of non-normal innovation distributions that admit explicit formulas for their truncated moments; see Jawitz (2004) for a survey of such formulas. An alternative initialization scheme consists of replacing the censored observations by their censoring limits (for left or right censoring) or by the mean of the censoring limits (for interval censoring), and fitting model (1) with the modified data as if they were uncensored. While the initial estimates so obtained are biased (Park et al. 2007), our limited experience suggests that the algorithm so initialized generally converges without problems.
We present another characterization of the proposed estimator θ̃. The conditional pdf of Y_{t:t−p} given X_{t:t−p}, denoted by f(y_{t:t−p} | x_{t:t−p}, θ), is defined with respect to a dominating measure that is, coordinate by coordinate, the Lebesgue measure for an uncensored coordinate and the counting measure induced on the censoring region R for a censored one, for instance, the counting measure if R is a singleton. Hence, we can estimate the parameter by maximizing the composite likelihood L̃_n(θ) = Σ_{t=p+1}^n log f(Y_{t:t−p} | X_{t:t−p}, θ). Indeed, this provides a consistent estimator since, under some general regularity conditions including ergodicity and stationarity, n^{−1} L̃_n(θ) converges to a limit that is uniquely maximized at θ_0, the true parameter. A major difficulty of this approach is that the maximization problem is generally intractable. Thus, we extend the definition of f(y_{t:t−p} | x_{t:t−p}, θ) by decoupling the parameter, say δ, indexing the conditional density of the censored coordinates from the parameter θ indexing the rest, yielding an extended objective function L̃_n(δ, θ). The new objective function suggests an iterative estimation scheme: first update δ to some value that increases L̃_n(·, θ) over its current value, with θ fixed at its current estimate, then update θ to the just-updated δ, repeating until convergence. This is achieved by the proposed estimator θ̃ because, by Jensen's inequality, each iteration of the proposed algorithm does not decrease the objective function. The consistency of θ̃ derived in the next section shows that it maximizes the objective function L̃_n(θ) asymptotically. Henceforth, the proposed method will be referred to as quasi-likelihood estimation.

Asymptotic Properties of the Estimator
In general, it is difficult to establish the global consistency of an estimator, and this is also the case in our setting. Fortunately, as remarked earlier, a consistent estimator is generally available, so it suffices to establish local consistency for the proposed estimator. Henceforth, the initial estimate θ^(0) for the iterative algorithm is assumed to be consistent. Therefore, we can and shall restrict the parameter space to Θ, a neighborhood of θ_0 that, without loss of generality, is further assumed to be compact.
The covariate process {X_t} is assumed to be stationary and β-mixing with exponentially decaying mixing coefficients, which is a mild assumption. The same is assumed of the regression error process {η_t}; this holds if (i) all roots of the characteristic polynomial 1 − Σ_{j=1}^p ψ_j z^j lie outside the unit circle (Cryer and Chan 2008) and (ii) ε_t admits a pdf (Pham and Tran 1985).
The asymptotic properties of the estimator depend on the functions

Z_t(θ) = E_θ[ ∇ log f_θ(ε_t(θ)) | G_t ],

in terms of which the estimating Equation (6) is equivalent to Σ_{t=p+1}^n Z_t(θ) = 0. To establish the desired consistency and asymptotic normality of the proposed estimator, we make heavy use of empirical process theory for Vapnik-Chervonenkis (VC) classes of functions (Arcones and Yu 1994); see Van der Vaart (2000) for a review of VC classes. We shall require {Z_t(θ); θ ∈ Θ} to be a VC class satisfying a moment condition.
In summary, the following assumptions are imposed, where q ∈ (2, ∞) is some fixed real number.
A1. The initial estimate θ^(0) is a consistent estimator, and the parameter space Θ is a compact neighborhood of the true parameter θ_0.
A2. The covariate process {X_t} is β-mixing with exponentially decaying mixing coefficients.
A3. The censoring region may vary with t, but it is assumed to be an interval C_t = (c_{t,l}, c_{t,u}) with possibly infinite censoring limits; alternatively, the censoring region is the complement of an interval, C_t = (c_{t,l}, c_{t,u})^c, to accommodate simultaneous left and right censoring. The process {(Y*_t, X_t, c_{t,l}, c_{t,u})} is stationary. The censoring limits are β-mixing processes with exponentially decaying mixing coefficients, independent of {X_t} and {Y*_t}.
A4. For all |z| ≤ 1, the polynomial 1 − Σ_{j=1}^p ψ_j z^j ≠ 0.
A5. The distribution of ε_t has a twice differentiable pdf.
A6. The class of functions {Z_t(θ) : θ ∈ Θ} is a VC subgraph class, and there exists an envelope function F with E(F^q) < ∞.
A7. The matrix M_1 = E{∇Z_t(θ_0)} exists and is nonsingular at the true parameter θ_0.
The asymptotic properties of the proposed estimator are established in the following theorem.
Theorem 2.1. Under assumptions A1-A6, the estimator θ̃ is consistent. Furthermore, if A7 holds, it is also asymptotically normal, that is,

√n (θ̃ − θ_0) → N(0, M_1^{−1} M_0 (M_1^{−1})′) in distribution,

where M_0 = Σ_{k=−∞}^{∞} cov(Z_t(θ_0), Z_{t+k}(θ_0)). Remark 2.2. Note that the asymptotic covariance matrix has a sandwich form involving two matrices: the first, M_1, is assumed to be nonsingular, and the second, M_0, is an infinite sum of autocovariance matrices.
Both matrices are not easy to compute. In this regard, a parametric bootstrap procedure is proposed to estimate the covariance matrix and to construct confidence intervals for the unknown parameters. Specifically, given the estimator θ̃ and using the observed {X_t}, uncensored observations of the same sample size can be readily simulated. Then, under assumption A3 that the censoring process is independent of the X's and the Y*'s, the uncensored observations can be subjected to the observed censoring scheme to yield simulated censored time-series responses, with which a bootstrap estimate of θ can be obtained using the iterative algorithm. The bootstrap procedure can be replicated, say, B times to yield a sample of bootstrap parameter estimates, whose sample covariance matrix provides an estimate of the covariance matrix of θ̃. Also, confidence intervals can be constructed directly from the sample quantiles of the bootstrap parameter estimates. The consistency of the proposed bootstrap method can be established via some general arguments (Van der Vaart 2000, problem 5, p. 340), under some regularity conditions including the central limit theorem for triangular arrays of β-mixing processes (Andrews 1991).
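The bootstrap loop itself is straightforward to sketch. In place of the full CARX refit, the toy below bootstraps the standard error of the sample mean of a fitted AR(1) (all names are ours); the CARX version would resimulate Y*, reapply the observed censoring scheme, and rerun the iterative estimator inside the loop.

```python
import math, random

def ar1_path(n, phi, sigma, rng):
    """Simulate an AR(1) path of length n, with 50 burn-in steps."""
    x, out = 0.0, []
    for _ in range(n + 50):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out[50:]

def bootstrap_se(series, phi_hat, sigma_hat, B=200, seed=1):
    """Parametric bootstrap of the sample mean of an AR(1): resimulate
    B series from the fitted model, recompute the statistic each time,
    and report the bootstrap sample's standard deviation as the
    standard-error estimate.  Percentile confidence intervals come from
    the empirical quantiles of the same bootstrap sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(B):
        sim = ar1_path(len(series), phi_hat, sigma_hat, rng)
        stats.append(sum(sim) / len(sim))
    mean_b = sum(stats) / B
    return math.sqrt(sum((s - mean_b) ** 2 for s in stats) / (B - 1))
```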

Normal Innovations
We now specialize to the important case where the innovations {ε_t} are normally distributed with zero mean and common variance σ². Then, apart from an additive constant, the log-likelihood of Y*_t conditional on F*_{t−1} is

ℓ*_t(θ) = −log σ − ε_t²(θ)/(2σ²), with ε_t(θ) = (Y*_t − X′_t β) − Σ_{j=1}^p ψ_j (Y*_{t−j} − X′_{t−j} β).

For any given parameter vector θ, the first two conditional moments of Y*_{t:t−p} given G_t can be readily computed based on the explicit formulas for the moments of a truncated multivariate normal variable (Tallis 1961); see also the R (R Core Team 2015) package mvtnorm (Genz and Bretz 2009; Genz et al. 2014). The Q-function in Step 2 is then available in closed form, and the maximization can be carried out via block coordinate descent that sequentially updates β, ψ, and σ², block by block, as follows: 1. For given θ^(k), ψ, and σ, update the regression coefficient β by solving the corresponding least-squares problem, with the censored observations replaced by their imputed conditional moments. 2. For given θ^(k), β, and σ, update ψ by maximizing the Q-function, which is a quadratic function of ψ, so the optimization can be readily done; alternatively, it suffices to update ψ to another feasible vector that increases Q(·|θ^(k)). 3. For given θ^(k), β, and ψ, update σ² in closed form by equating it to the conditional expectation of the average squared innovation. Furthermore, it is clear that assumption A5 holds. A6 can also be verified, as in Proposition 1 in the supplement, under some mild assumptions on X_t. In addition, A7 can be verified for the case of left (right) censoring with a constant censoring limit. The proof techniques used there can be extended to other continuous innovation distributions, under certain regularity conditions.
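The truncated normal moments that drive the Q-function computation reduce, in one dimension, to classical closed-form expressions (the univariate special case of the Tallis (1961) formulas; mvtnorm handles the multivariate case). A sketch using only the standard library (helper names are ours):

```python
import math

def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) truncated to a finite
    interval (a, b), via the standard closed-form expressions:
      mean = mu + sigma * (phi(al) - phi(be)) / Z,
      var  = sigma^2 * (1 + (al*phi(al) - be*phi(be))/Z
                          - ((phi(al) - phi(be))/Z)^2),
    where al = (a - mu)/sigma, be = (b - mu)/sigma, Z = Phi(be) - Phi(al)."""
    al, be = (a - mu) / sigma, (b - mu) / sigma
    Z = Phi(be) - Phi(al)
    mean = mu + sigma * (phi(al) - phi(be)) / Z
    var = sigma ** 2 * (1.0 + (al * phi(al) - be * phi(be)) / Z
                        - ((phi(al) - phi(be)) / Z) ** 2)
    return mean, var
```

For one-sided truncation, the corresponding limit of these formulas (inverse Mills ratio form) should be used instead, to avoid 0 · ∞ indeterminacy at an infinite endpoint.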

Model Prediction
Consider the problem of predicting the future values Y*_{n+h}, h = 1, 2, . . . , H, given the observations {(Y_t, X_t)}_{t=1}^n and supposing the availability of the future covariate values {X_{n+h}}_{h=1}^H. To simplify the derivation of the predictive distribution, we assume normality of ε_t and a known θ_0, although the following method can be readily extended to non-normal innovations. Owing to the autoregressive nature of the regression errors, the predictive distribution L_{n,h} of Y*_{n+h} depends on the data only from time τ onward, where τ marks the start of the most recent run of p consecutive uncensored observations; see Zeger and Brookmeyer (1986). If τ = n − p + 1, that is, the most recent p Y's are uncensored, the prediction problem is the same as that for an ordinary time series regression model. In particular, for any h = 1, . . . , H, L_{n,h} is a normal distribution. Specifically, the predicted value Ŷ*_{n+h} is given recursively by Ŷ*_{n+h} = X′_{n+h} β + η̂_{n+h}, with η̂_{n+h} = Σ_{l=1}^p ψ_l η̂_{n+h−l} and η̂_t = Y_t − X′_t β for t ≤ n. The prediction error can be written as e_{n+i} = ε_{n+i} + Σ_{l=1}^p ψ_l e_{n+i−l} = Σ_{j=0}^{i−1} ω_{i,j} ε_{n+i−j}, with e_t = 0 for t ≤ n, where the coefficients ω_{i,j} can be calculated recursively through the preceding identity and the initial condition ω_{i,0} = 1; this yields the prediction variance var(Y*_{n+h} − Ŷ*_{n+h}) = σ² Σ_{j=0}^{h−1} ω²_{h,j}. If τ < n − p + 1, then L_{n,h} is generally a truncated multivariate normal distribution. Although its first and second moments can be computed analytically, they are not as useful in constructing predictive intervals. Here a Monte Carlo method is proposed to estimate any characteristic of interest of the predictive distribution of Y*_{n+h}. Note that the regression errors {η_t = Y*_t − X′_t β}_{t=τ}^n follow a multivariate normal distribution, unconditionally. Let η_c and η_o be the subvectors of η_{τ:n} whose corresponding elements of Y_{τ:n} are censored and observed, respectively.
Then, given Y_{τ:n}, η_c follows a truncated multivariate normal distribution, whose realizations can be readily simulated, so we can draw realizations from the conditional distribution of Y*_{n−p+1:n}, and thence those of Y*_{n+h}, h = 1, . . . , H, given {(Y_t, X_t)}_{t=1}^n and {X_{n+h}}_{h=1}^H. We can then construct predictive intervals of Y*_{n+h} from a random sample from its predictive distribution, using the percentile method.
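The percentile method is simple to sketch for AR(1) errors in the case τ = n − p + 1 (names and arguments are ours): simulate many future error paths from the fitted recursion, add the regression part, and read off empirical quantiles. With recent censoring one would first draw η_n itself from its truncated normal conditional distribution, as described above.

```python
import random

def predict_interval(eta_n, psi, sigma, xbeta_future, h, nsim=2000, seed=2):
    """Monte Carlo predictive interval for Y*_{n+h} under AR(1)
    regression errors: simulate nsim paths
        eta_{n+i} = psi * eta_{n+i-1} + eps_{n+i},  eps ~ N(0, sigma^2),
    add xbeta_future[i-1] = X'_{n+i} beta at the horizon of interest,
    and take the empirical 2.5% and 97.5% quantiles.
    Returns (lower, point_estimate, upper)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(nsim):
        eta = eta_n
        for _i in range(h):
            eta = psi * eta + rng.gauss(0.0, sigma)
        draws.append(xbeta_future[h - 1] + eta)
    draws.sort()
    lo, hi = draws[int(0.025 * nsim)], draws[int(0.975 * nsim)]
    return lo, sum(draws) / nsim, hi
```

For the AR(1) case the exact two-step-ahead predictive distribution is N(ψ²η_n + X′β, σ²(1 + ψ²)), which the simulated quantiles should approximate.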

Simulated Residuals
In the presence of censoring, there are several ways to define residuals, for instance, generalized residuals and simulated residuals (Gourieroux et al. 1987; Hillis 1995); see also Cox and Snell (1968). If Y*_t is observed, the corresponding residual is universally defined as Y*_t − Ŷ*_{t|t−1}, where Ŷ*_{t|t−1} is the mean of L_{t−1,1}, evaluated at the parameter estimate. In the presence of censoring, so that some Y*_t's are unobserved, we compute the simulated residuals as follows: first, impute each unobserved Y*_t by a realization from the conditional distribution L(Y*_t | {(Y_s, X_s)}_{s=1}^t), evaluated at the parameter estimate. Then, refit the model with the {(Y*_t, X_t)} so obtained, via conditional maximum likelihood; the residuals from the latter model are the simulated residuals ε̂_t. Let the corresponding parameter estimate of θ be θ̂. The corresponding (simulated) partial residuals for the X's, that is, X′_t β̂ + ε̂_t, can be used to assess the relationship between Y and X, after adjusting for the autoregressive errors. Gourieroux et al. (1987) showed that, under some regularity conditions, model diagnostic tests using the simulated residuals have the same asymptotic null distributions as in the uncensored case. The limited simulation study reported below shows that the asymptotic null distribution of the Ljung-Box test statistic based on the simulated residuals is the same as that for uncensored data; hence, it provides a useful tool for model diagnostics.

Empirical Performance of the Proposed Estimation Method
We study the empirical performance of the proposed method by simulations. Data were simulated from the CARX model subject to left censoring with a constant censoring limit c, with the covariates X_t being independent two-dimensional random vectors comprising independent standard normally distributed components. The regression errors follow an AR(3) process with normally distributed innovations of zero mean and variance σ². Left censoring is enforced with the censoring limit being −1.5, −0.7, or −0.2, to make approximate censoring rates of 5%, 20%, and 40%, respectively. Several sample sizes, including 100, 200, 500, and 1000, were tried. Unless stated otherwise, all experiments were replicated 1000 times. As a comparison, we contrast the proposed method (method 1 in Table 1) with the incorrect but convenient method of ignoring censoring and applying conditional maximum likelihood estimation with the censored data simply replaced by their censoring limits (method 0), whose estimates served as the initial values for the proposed method. Table 1 reports the sample mean of each parameter estimate and the sample standard deviations, enclosed in round brackets, for both methods. In addition, the empirical coverage rates of the 95% confidence intervals obtained by parametric bootstrap are listed in square brackets, but only for the proposed method, because conditional maximum likelihood estimation ignoring censoring is quite biased in the case of a moderately high censoring rate. It can be seen that the proposed method performed well in all cases, while ignoring censoring led to bias increasing with the censoring rate. For both methods, the variance of the estimator increased with the censoring rate and decreased with the sample size. The parametric bootstrap seems to work well, as the empirical coverage rates of the bootstrap confidence intervals were quite close to the nominal 95%.
Table 1. Simulation study. The true parameters are displayed in the first row. For each sample size (100, 200, 500, or 1000) and left censoring limit (−1.5, −0.7, or −0.2), we contrast the proposed method (labeled 1) with conditional maximum likelihood estimation ignoring censoring (labeled 0), that is, with the censored data simply replaced by the censoring limit c. For each parameter, the reported values are the sample mean and sample standard deviation (in round brackets). In addition, the empirical coverage rates of the 95% bootstrap confidence intervals based on the proposed method are enclosed in square brackets.
The MLE was computed by implementing the EM algorithm in Section 3.2 of Zeger and Brookmeyer (1986), except that the AR parameter update was done by directly maximizing the imputed log-likelihood. Here, we only report the results for the case of sample size equal to 400, as they are representative of the results for other sample sizes; results for ψ = −0.5 are similar to those for ψ = 0.5 and hence omitted. Figure 1 plots the ratio of the mean squared error (MSE) of the proposed estimator to that of the MLE against the censoring rate, for each parameter of the model. It shows that, for ψ = 0.5, there is little loss of efficiency as measured by the MSE, with at most a 6% loss at a 50% censoring rate. Moreover, the AR parameter seemed to have been slightly more efficiently estimated by the proposed method than by ML estimation at low censoring rates, perhaps because the numerical (possibly high-dimensional) integration required by ML estimation more than offsets its theoretical efficiency in such cases. For ψ = 0.9, the loss of efficiency is greater but below 8%, except for the AR(1) coefficient estimate, whose loss rises to 14% at a 50% censoring rate. Mean computation time is another important metric for comparing the two methods. Figure 2 plots, against the censoring rate, the ratio of the mean computation time of ML estimation to that of the proposed method, which demonstrates that the proposed method is computationally much more efficient: it was almost 50 (800) times faster than ML estimation at a 50% censoring rate, for ψ = 0.5 (0.9).

The Method Applied to Missing Data
As noted by Zeger and Brookmeyer (1986), missing responses can be regarded as resulting from left censoring with an infinite censoring limit. In particular, a CARX model with the response missing at random provides another setting for comparing the proposed method with Gaussian likelihood estimation, which can be readily carried out via Kalman filtering as implemented by the arima function in the stats package (R Core Team 2015; Ripley 2002). Data of sizes 100, 200, and 400 were simulated from the model in Section 3.2, except that the regression errors now follow an AR(2) model. Data cases were subsequently discarded at random to produce missing rates of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. With each simulated series, the CARX model was estimated once by ML estimation via the arima function and once by the proposed method, again with the true parameter values as the initial values. The results were similar across the different sample sizes, so we only report the case of sample size 400. Figure 3 plots, against the missing rate, the ratio of the MSE of the proposed method to that of ML estimation, for each parameter. It shows that the MSE of the proposed method is less than 5% higher than that of ML estimation for missing rates up to 20%, and generally no more than 10% higher even at a 50% missing rate, except for the parameter σ. The simulation results in this and the preceding subsections indicate that the proposed method generally incurs relatively little loss of efficiency compared with ML estimation.

The Robustness of the Proposed Method
As the proposed method may also be regarded as a generalization of Gaussian likelihood estimation or conditional least squares, it is of interest to assess its robustness against departure from the normal innovation assumption. We did this with a simulation study where the innovations were t-distributed, while the model estimation was done based on normal innovations.
Data were simulated from a CARX model with independent two-dimensional standard normal covariates and AR(3) errors with t-distributed innovations with degrees of freedom equal to 5, 10, 20, or ∞ (i.e., the normal distribution), and sample size equal to 100, 200, or 400. See Table 2 for the true parameter values. The responses were censored whenever their magnitude exceeded a threshold chosen to make the censoring rate approximately 20%. The innovation distributions were scaled to have identical, unit standard deviation. Computation of the parametric bootstrap confidence intervals was based on 500 bootstrap replications, and each experiment was replicated 1000 times. The results summarized in Table 2 show that the proposed method is robust to innovation distributions with heavier tails than the normal, as the estimates are comparable in terms of bias and standard deviation across the range of degrees of freedom, and the empirical coverage rates are all close to the nominal 95%, except for the parameter σ. That the estimator of σ is nonrobust is to be expected, as this is the case even without censoring; this highlights the open problem of developing a more robust estimator of σ.

The Ljung-Box Test Statistic Based on Simulated Residuals
Next, we report some simulation results on the empirical performance of the Ljung-Box test statistic based on the simulated residuals, used as a tool for model diagnostics. The null model is an AR(2) model with a two-dimensional covariate comprising two independent standard normal variables, with β = (0.2, 0.4)′, ψ = (0.28, 0.25)′, and σ = 0.6. We computed the Ljung-Box statistic using the first 10 or 20 lags of the sample autocorrelation function (ACF) of the simulated residuals. For assessing the power of the test, the AR part of the model is embedded in an AR(3) model with AR operator (1 − δB)(1 − ψ_1 B − ψ_2 B²), where δ = 0.0, 0.1, 0.2, . . . , 0.8. The data are left-censored so as to achieve a long-run censoring rate of 15% or 30%, plus the case of no censoring as a benchmark, with sample size 100 or 200. The empirical rejection rates based on 1000 replications are shown in Figure 4. The sizes of the test under all settings are quite close to the nominal 5% level. The power generally increases with greater deviation of δ from zero, lesser censoring, larger sample size, and fewer lags used in computing the Ljung-Box statistic. That using more lags resulted in lower power is expected, owing to the geometric decay of the ACF, so that its higher lags quickly become uninformative.

An Application to the Total Phosphorus Concentration in River Water
Phosphorus is one of the two nutrients of main concern in Iowa river water, as excessive phosphorus in river water can result in eutrophication. Phosphorus concentration in river water has been closely monitored under the ambient water quality program conducted by the Iowa Department of Natural Resources (Libra, Wolter, and Langel 2004). Here, we analyze a series of 120 monthly phosphorus concentrations (P), in mg/L, in river water collected at an ambient site located at Whitebreast Creek near Knoxville, Iowa, USA, from October 1998 to March 2010. There is a gap of missing P data from September 2008 to March 2009, when data collection was suspended owing to lack of funding. The data were censored when P fell below certain detection limits c_t (dashed red line in Figure 5) that varied over time, resulting in about 10% censoring. It is known that P is generally correlated with the water discharge (Q) (Schilling et al. 2010). The main interest is to explore the relationship between P and Q with censored P data. The Q data were obtained from the website of the U.S. Geological Survey. See Figure 5 for the time plots of P, Q, and the historical censoring limits.
Preliminary analysis (unreported) shows that taking the logarithmic transformation of both P and Q renders their relationship more linear. Model diagnostics with a preliminary linear regression model log(P_t) = β_0 + β_1 log(Q_t) + η_t indicate (i) the presence of residual serial autocorrelation, hence we model η_t as an autoregressive process, and (ii) that the P-Q relationship is seasonal. We model the seasonal relationship by introducing the seasonal dummy variables S_j, j = 1, 2, 3, 4, for quarters 1-4 and their interactions with the discharge log(Q_t), where the first quarter comprises January to March, the second quarter April to June, etc. In summary, the model is log(P_t) = Σ_{j=1}^{4} {β_{0,j} + β_{1,j} log(Q_t)} S_{j,t} + η_t, where the regression errors {η_t} follow an autoregressive model as defined in Equation (2) with ε_t independent and identically distributed as N(0, σ²). The coefficients β_{0,1} and β_{1,1} are the intercept and slope for quarter 1, β_{0,2} and β_{1,2} those for quarter 2, etc.
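The regression part of this specification amounts to a design matrix of quarterly dummies and their interactions with log discharge. A sketch with synthetic log(Q) values (the variable names are ours, for illustration only):

```python
import numpy as np
import pandas as pd

# monthly dates spanning the study period (the actual series has a gap in 2008-2009)
idx = pd.date_range("1998-10-01", periods=120, freq="MS")
# stand-in for log(Q_t); the real series would come from USGS discharge data
logQ = pd.Series(np.random.default_rng(0).standard_normal(120), index=idx)

quarter = idx.quarter
design = pd.DataFrame(index=idx)
for j in range(1, 5):
    S_j = (quarter == j).astype(float)
    design[f"S{j}"] = S_j                          # dummy for quarter j -> intercept beta_{0,j}
    design[f"S{j}:logQ"] = S_j * logQ.to_numpy()   # interaction -> slope beta_{1,j}
```

Because the four dummies partition the year, each row activates exactly one intercept and one slope, so the model fits a separate P-Q line per quarter.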
Since the AR order is unknown and the seasonal P-Q relationship is uncertain, we fitted a number of models, with or without a seasonal P-Q relationship and with AR order from 1 to 3, altogether 8 models. Model selection was then carried out using an information criterion similar to the AIC, with the log-likelihood replaced by the conditional expectation defined in Equation (8). A seasonal regression model with AR(2) regression errors was selected. The final model fit is summarized in Table 3; the parametric bootstrap 95% confidence intervals are based on 1000 replicates.
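The selection step can be sketched generically as follows, where `qhat` stands for the conditional-expectation analogue of the log-likelihood, Q(θ̂), from Equation (8). The criterion values below are hypothetical illustrative numbers, not the fitted values from the paper.

```python
def aic_like(qhat, n_params):
    """AIC-style criterion: -2 x (conditional-expectation analogue of the
    log-likelihood) + 2 x (number of free parameters)."""
    return -2.0 * qhat + 2 * n_params

# hypothetical (qhat, n_params) for each candidate: (seasonal?, AR order)
fits = {
    ("nonseasonal", 1): (-20.3, 4),
    ("nonseasonal", 2): (-18.9, 5),
    ("nonseasonal", 3): (-18.7, 6),
    ("seasonal", 1):    (-12.4, 10),
    ("seasonal", 2):    (-10.1, 11),
    ("seasonal", 3):    (-10.0, 12),
}
best = min(fits, key=lambda m: aic_like(*fits[m]))  # smallest criterion wins
```

As with the usual AIC, the penalty term guards against the richer seasonal specification and higher AR orders winning on fit alone.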
The P-Q relationships in quarters 2 and 4 are quite similar. Indeed, constraining the regression coefficients to be identical for these two quarters slightly reduces the AIC from 17.027 to 17.013. The rate of change in log(P) per unit change in log(Q) is highest in quarter 1 and lowest in quarter 3; see Figure 6, which shows the simulated partial residual plot of log(Q) (cf. Section 2.6), with the fitted quarterly linear relationships superimposed. These relationships are consistent with the fact that discharge is generally lowest in quarter 1 and highest in quarter 3.
The AR estimates are moderate in value, suggesting rather short memory in the data. Figure 7 plots the ACF of the simulated residuals, which suggests no residual autocorrelation. The Ljung-Box test statistic using the first 10 lags of the residual ACF is 5.28, with p-value 0.27, suggesting no serial autocorrelation in the residuals and that the model provides a good fit to the data.
Finally, Figure 8 shows the time plot of P together with the exponentiated fitted values; the fitted values generally track the data well, though less so for the larger P peaks.
As an illustration of prediction with a censored time series, we refitted the selected model with the last six observations withheld, for assessing out-of-sample prediction performance.
The point predictors and their 95% prediction intervals, computed using the procedure elaborated in Section 2.5 under the assumption of normal innovations, are shown in Figure 9, with the actual data superimposed. Overall, the predictions track the data movement reasonably well, although the last three data points lie somewhat close to the lower prediction limits. We note that the prediction results (unreported) are similar when the innovations are bootstrapped by drawing randomly, with replacement, from the simulated residuals.
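Under normal innovations, and ignoring any censoring at the forecast origin (the censored case requires the full procedure of Section 2.5), the multi-step predictors and interval half-widths for the AR error component follow the usual forecast recursion and ψ-weight expansion; a sketch:

```python
import numpy as np

def ar_forecast(eta_hist, psi, sigma, h):
    """h-step point forecasts and 95% normal prediction intervals for a
    mean-zero AR(p) process: forecast recursion (future innovations set to
    zero) plus psi-weights of the MA(infinity) representation for the
    forecast-error variance."""
    p = len(psi)
    preds = list(eta_hist[-p:])
    for _ in range(h):
        # next forecast: psi_1 * last + psi_2 * second-last + ...
        preds.append(float(np.dot(psi, preds[:-p - 1:-1])))
    preds = np.array(preds[p:])
    # psi-weights: w_0 = 1, w_j = sum_{i=1}^{min(p,j)} psi_i w_{j-i}
    w = [1.0]
    for j in range(1, h):
        w.append(sum(psi[i] * w[j - 1 - i] for i in range(min(p, j))))
    se = sigma * np.sqrt(np.cumsum(np.square(w)))  # h-step forecast std errors
    return preds, preds - 1.96 * se, preds + 1.96 * se

point, lo, hi = ar_forecast(np.array([2.0]), np.array([0.5]), 1.0, 2)
```

Adding the regression term x'β̂ to each forecast gives the predictor on the log(P) scale; exponentiating the interval endpoints then yields intervals on the original scale.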

Discussion and Conclusion
We have proposed a new method for estimating a censored regression model with autoregressive errors. The consistency and asymptotic normality of the estimator are established under some general conditions. The proposed method can be readily implemented in the case of normal innovations. Simulation studies indicate that the proposed method enjoys excellent sampling properties, whereas ignoring censoring results in substantial bias when the censoring rate is moderately high. We illustrate the efficacy of the proposed method by analyzing the seasonal phosphorus-discharge relationship with phosphorus concentration data. Interesting future work includes extending the proposed method to censored regression models with autoregressive moving average regression errors, and studying the theoretical properties of the proposed estimator when the covariate process X_t is nonstationary, for instance in a trend analysis. It is also of practical importance to study the estimation method in the case of non-Gaussian innovation distributions, and to study the theoretical properties of the simulated residuals in the current setting.