Working Paper 2013053

A varying-coefficient expectile model for estimating Value at Risk

This paper develops a nonparametric varying-coefficient approach for modeling the expectile-based Value at Risk (EVaR). EVaR has an advantage over the conventional quantile-based VaR (QVaR) of being more sensitive to the magnitude of extreme losses. EVaR can also be used for calculating QVaR and Expected Shortfall (ES) by exploiting the one-to-one mapping from expectiles to quantiles, and the relationship between VaR and ES. Previous studies on conditional EVaR estimation considered only parametric autoregressive model set-ups, which account for the stochastic dynamics of asset returns but ignore other exogenous economic and investment-related factors. Our approach overcomes this drawback and allows expectiles to be modeled directly using covariates that may be exogenous or lagged dependent variables in a flexible way. Risk factors associated with profits and losses can then be identified via the expectile regression at different levels of prudentiality. We develop a local linear smoothing technique for estimating the coefficient functions within an asymmetric least squares minimization set-up, and establish the consistency and asymptotic normality of the resultant estimator. To save computing time, we propose a one-step weighted local least squares procedure to compute the estimates. Our simulation results show that the computing advantage afforded by this one-step procedure over full iteration is not compromised by a deterioration in estimation accuracy. Real data examples are used to illustrate our method.


Introduction
Value at risk (VaR) is a popular measure for evaluating the market risk of a portfolio. VaR identifies the loss that will not be exceeded, with a specified probability that generally ranges between 0.90 and 0.99, over a defined period. VaR is therefore a quantile of the portfolio loss distribution; however, the use of VaR is not without criticism. It is generally agreed that VaR has three major drawbacks. First, it focuses exclusively on the lower tail of the distribution, and hence it conveys only a small slice of the information about the loss distribution. Second, it lacks subadditivity: the VaR of a portfolio can be larger than the sum of the VaRs of its components, which contradicts the conventional wisdom that diversification reduces risk. Third, VaR tells us nothing about the magnitude of the loss, as it accounts only for the probabilities of losses but not their sizes. In light of these shortcomings of the VaR, Artzner et al. (1999) proposed to measure portfolio risk by expected shortfall (ES) instead. The ES is defined as the conditional expectation of a loss given that the loss exceeds the VaR. Contrary to VaR, ES provides information on the magnitude of losses beyond the VaR level and is subadditive. However, the calculation of ES can be an intricate computational exercise due to the lack of a closed form formula (Yuan and Wong, 2010).
Focusing on the third of the above-mentioned drawbacks of the VaR, Kuan et al. (2009) proposed an expectile-based VaR (EVaR) as an alternative to the quantile-based VaR (QVaR) as a downside risk measure. Expectile regression estimates are obtained by minimizing an asymmetrically weighted sum of squared errors (Aigner et al., 1976; Newey and Powell, 1987). Expectile estimates are thus more sensitive to the extreme values of the data than quantile estimates, which are based on absolute errors. This feature makes the EVaR correspondingly more sensitive to the scale of losses than the conventional QVaR. Moreover, expectile estimates and their covariances are easier to compute, reasonably efficient under normality conditions (Efron, 1991; Schnabel and Eilers, 2009), and can always be calculated regardless of the quantile level, while the empirical quantiles can be undefined at the extreme tails. It has been shown that for a given distribution, there is a one-to-one mapping between quantiles and expectiles (Efron, 1991; Jones, 1994; Yao and Tong, 1996). In view of this, Efron (1991) proposed that the τ-quantile be estimated by the expectile for which 100τ% of the sample observations lie below it. As pointed out by Kuan et al. (2009), this one-to-one relationship permits the interpretation of EVaR as a flexible QVaR, in the sense that its tail probability is determined not a priori but by the underlying distribution. Kuan et al. (2009) showed that the asymmetry parameter in the weighted mean squared errors may be interpreted as the relative cost of the expected margin shortfall, which represents prudentiality, and the EVaR is thus a risk measure under a given level of prudentiality. As pointed out by Taylor (2008), there is an algebraic link between EVaR and ES under a given distribution of returns. This relationship permits a simple calculation of the ES associated with a given EVaR estimate.
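To make the asymmetric least squares idea concrete, the θ-expectile of a sample can be computed by iterating weighted means, since minimizing the asymmetric squared error loss is a weighted least squares problem. The following Python sketch is our own illustration (the function name, starting value, and tolerances are not from the paper):

```python
import numpy as np

def expectile(y, theta, tol=1e-10, max_iter=200):
    """Compute the theta-expectile of a sample by iteratively
    reweighted least squares under the asymmetric squared error loss
    Q_theta(z) = |theta - I(z <= 0)| z^2."""
    y = np.asarray(y, dtype=float)
    nu = y.mean()  # theta = 0.5 gives the mean, a natural starting value
    for _ in range(max_iter):
        w = np.where(y <= nu, 1.0 - theta, theta)  # asymmetric weights
        nu_new = np.sum(w * y) / np.sum(w)         # weighted-mean update
        if abs(nu_new - nu) < tol:
            break
        nu = nu_new
    return nu

rng = np.random.default_rng(0)
y = rng.normal(size=10000)
print(expectile(y, 0.5))   # equals the sample mean
print(expectile(y, 0.95))  # lies above the mean: sensitive to the upper tail
```

Because the weights depend on the squared distance of each observation from the current estimate, a single large loss moves the expectile more than it would move the corresponding quantile, which is the sensitivity property emphasized above.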
Another contribution of Kuan et al.'s (2009) study is the development of a class of conditional autoregressive expectile (CARE) models, which allow expectiles to be estimated in a dynamic context where past returns influence present returns through special types of autoregressive processes. The allowance for asymmetric effects of positive and negative returns on the tail expectiles is a notable feature of the CARE models. Kuan et al. (2009) also extended the asymmetric least squares (ALS) regression theory of Newey and Powell (1987) to allow for stationary and weakly dependent data. The main disadvantage of the CARE approach is that it considers only the stochastic dynamics of returns but ignores current information about the investment environment, such as the state of the economy and the financial environment at the time. It is possible to extend the CARE models to include both past observed returns and exogenous variables containing economic and market information as risk factors, and the objective of this paper is to take steps in this direction. Unlike the work of Kuan et al. (2009), which is based on a parametric set-up, we adopt the semiparametric varying-coefficient approach (Cleveland et al., 1991; Hastie and Tibshirani, 1993). Nonparametric and semiparametric methods have the general benefit of allowing the data to speak for themselves instead of assuming a functional form a priori. The varying-coefficient approach, which allows coefficients to vary with other variables, has in particular the advantage of circumventing the curse of dimensionality that afflicts the estimation of many nonparametric and semiparametric models. It also allows dynamic patterns and interactions of the covariates to be modeled in a flexible way. Owing to these important advantages, the varying-coefficient approach is now a widely accepted modeling approach. The literature on varying-coefficient models is extensive; for a general appreciation of its scope and purpose, see Fan and Huang (2005) and Fan and Zhang (2008).
Recent developments of the varying-coefficient approach emphasize time series analysis (Cai et al., 2000a; Cai and Xu, 2008), longitudinal analysis (Fan and Zhang, 2000; Fan and Li, 2004) and survival analysis (Fan et al., 2006; Cai et al., 2007). To the best of our knowledge, Honda (2004), Kim (2007) and Cai and Xu (2008) are the only existing studies that consider varying-coefficient models for conditional quantiles; Honda (2004) and Cai and Xu (2008) used local polynomials to estimate conditional quantiles with varying coefficients, while Kim (2007) proposed an estimation methodology based on polynomial splines. However, the varying-coefficient approach to the estimation of expectiles remains heretofore unexamined. We call our model the varying-coefficient expectile regression model. It generalizes the functional coefficient autoregressive (FAR) model of Chen and Tsay (1993), and also encompasses the CARE model of Kuan et al. (2009) as a special case. Our model estimation is based on the ALS method using local linear smoothing along the lines of Yao and Tong (1996). It is shown that the proposed estimator possesses the desirable large sample properties of consistency and asymptotic normality. We also develop an efficient algorithm to facilitate the computation of the EVaR estimates.
The plan of the paper is as follows. Section 2 describes the varying-coefficient expectile model, and develops an ALS-based nonparametric approach for model estimation. The same section also discusses an iterative weighted local least squares approach together with a one-step algorithm for computing the coefficient estimates. Section 3 derives the asymptotic properties of the estimator and develops an estimator of the variance. Section 4 reports results of a simulation study and empirical applications. The technical proofs of theorems are presented in the Appendix.

Model Framework and an ALS Nonparametric Estimation Approach
Let (Y_t, X_t, U_t), t = 1, 2, ..., T, be a sequence of strictly stationary random vectors, each having the same distribution as (Y, X, U). We let Y be asset returns, X = (X_1, X_2, ..., X_p)^T be risk factors that may include lagged returns or other economic and financial factors, and U be a single effect-modifying risk factor. We assume that all series in (Y, X, U) are strictly stationary processes satisfying the strong mixing (α-mixing) condition and E(Y²) < ∞. The θ-expectile of Y, denoted ν_θ(Y), is defined as the minimizer of E[Q_θ(Y − ν)] over ν, where

Q_θ(z) = |θ − I(z ≤ 0)| z²    (2.1)

is the asymmetric squared error loss function, and θ ∈ (0, 1) is an asymmetry parameter that controls the degree of loss asymmetry. Note that ν_θ(Y) is different from the τ-quantile q_τ(Y), which minimizes the expectation of the asymmetric absolute error loss function |τ − I(z ≤ 0)| |z|, with τ the corresponding asymmetry parameter. Owing to the squared error loss in (2.1), expectiles are more sensitive to the tails of the distribution than quantiles. Clearly, when θ = τ = 0.50, the θ-expectile and τ-quantile of Y reduce to the mean and median of Y respectively.
In this study, we let the θ-conditional expectile of Y be modeled by the varying-coefficient model

ν_θ(Y | X, U) = Σ_{j=1}^p a_{j,θ}(U) X_j,    (2.3)

where a_θ(U) = (a_{1,θ}(U), ..., a_{p,θ}(U))^T is a vector of smooth varying-coefficient functions of U, and the a_{j,θ}(U)'s, j = 1, ..., p, may depend on θ. For notational simplicity, we write a_{j,θ}(U) simply as a_j(U) hereafter whenever there is no confusion. The varying-coefficient set-up is flexible in that the responses are linearly associated with a set of covariates, while the regression coefficients can vary with U. This framework also overcomes the curse of dimensionality because the a_j(U)'s are all low-dimensional functions. The coefficient function a_j(U) thus characterizes the manner in which the relationship between the risk factor X_j and asset returns Y changes as the level of the effect-modifying risk factor U changes. We refer to the θ-conditional expectile as the EVaR with θ-level of prudentiality. As mentioned in Section 1, the θ-level EVaR may be converted to QVaRs with tail probabilities that depend on the underlying distribution. When p = 1 and X_1 ≡ 1, model (2.3) reduces to the ordinary nonparametric expectile regression model studied by Yao and Tong (1996). Another interesting special case of (2.3) is the following functional-coefficient autoregressive (FAR) model introduced by Chen and Tsay (1993):

Y_t = a_1(Y_{t−d}) Y_{t−1} + ··· + a_p(Y_{t−d}) Y_{t−p} + ε_t,    (2.5)

where d is a positive delay parameter and ε_t is a sequence of i.i.d. random variables distributed independently of Y_{t−i} for any i ≥ 1. The FAR model is a very inclusive model that contains the threshold autoregressive (TAR) model (Tong, 1983, 1990), the exponential autoregressive (EXPAR) model (Haggan and Ozaki, 1981), and the smooth transition AR (STAR) model (Granger and Teräsvirta, 1993) as special cases. To show that (2.5) satisfies the strong mixing condition, one may impose conditions on the a_j(·)'s under which {Y_t} is geometrically ergodic, which in turn implies that Y_t is a strong mixing process (Pham, 1986).
Minimizing the local weighted asymmetric least squares objective (2.6) with respect to β yields the estimating equation

Σ_{t=1}^T L_θ(Y_t − Z_t^T β) Z_t K_h(U_t − u_0) = 0,    (2.7)

where L_θ(z) = 2z |θ − I(z ≤ 0)| is the derivative of the loss function Q_θ(z) defined in (2.1), and Z_t = X_t ⊗ (1, (U_t − u_0)/h)^T, with ⊗ denoting the Kronecker product. When p = 1 and X_1 ≡ 1, the solution of (2.7) reduces to a special case of the local M-estimator given in Cai and Ould-Saïd (2003).

An Algorithm for Computing Estimates
Many different approaches can be used to solve equation (2.7) and obtain estimates of the unknown parameters β. Here, we use an iterative weighted local least squares (IWLLS) approach in conjunction with a one-step algorithm for obtaining the estimates. IWLLS approaches similar to ours have been used in other studies of expectiles (e.g., Newey and Powell, 1987; Yao and Tong, 1996). The one-step algorithm has the advantage of saving computing time. To describe the IWLLS approach, re-write the estimating equation (2.7) in weighted least squares form. At the (j+1)-th iteration, the IWLLS estimator may be expressed as

β̂_θ^{(j+1)}(u_0) = { Σ_{t=1}^T w_t^{(j)} Z_t Z_t^T }^{-1} Σ_{t=1}^T w_t^{(j)} Z_t Y_t,    (2.9)

where w_t^{(j)} = |θ − I(e_t(θ) ≤ 0)| K_h(U_t − u_0) and e_t(θ) = Y_t − Z_t^T β̂_θ^{(j)}(u_0), provided that the matrix inverse in (2.9) exists. Iteration continues until convergence is achieved. A natural initial value for the iteration is the least squares estimator obtained by setting θ = 0.50, because this estimator is easy to compute.
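A minimal sketch of the IWLLS iteration at a single point u_0 is given below, assuming a local linear design and the Epanechnikov kernel used later in the paper; the interface, starting value, and convergence settings are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

def iwlls_at(u0, Y, X, U, theta, h, tol=1e-8, max_iter=100):
    """IWLLS sketch for the local linear expectile fit at u0.
    Y: (T,) responses, X: (T, p) covariates, U: (T,) effect modifier.
    Returns (a_hat, slope_hat), estimates of a(u0) and its derivative."""
    Z = np.hstack([X, X * (U - u0)[:, None]])        # local linear design
    k = epanechnikov((U - u0) / h)                   # kernel weights
    # initial value: kernel-weighted least squares (the theta = 0.5 fit)
    beta = np.linalg.lstsq(Z * np.sqrt(k)[:, None], Y * np.sqrt(k), rcond=None)[0]
    for _ in range(max_iter):
        e = Y - Z @ beta
        w = k * np.where(e <= 0, 1.0 - theta, theta)  # kernel x asymmetric weight
        WZ = Z * w[:, None]
        beta_new = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    p = X.shape[1]
    return beta[:p], beta[p:]
```

Each pass solves the weighted normal equations implied by the estimating equation, with weights that combine kernel localization in U and the asymmetric expectile weights; at θ = 0.5 the loop converges immediately to the kernel-weighted least squares fit.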
The computational burden involved in applying this iterative procedure to estimate the entire coefficient function β_θ(u) is likely to be enormous, as it entails the calculation of estimates at hundreds of points even over a small interval. One way to reduce the computational burden is to resort to the following one-step algorithm, which has been used in several other studies of varying-coefficient models (e.g., Cai et al., 2000a; Cai et al., 2007). Let β̂^{(0)}(u_0) be an initial estimate at any given point u_0. A single application of (2.9) starting from β̂^{(0)}(u_0) yields the one-step estimator, which we denote by (2.10). The choice of a good initial estimate is the key to the success of this one-step algorithm. Following Fan and Chen (1999), it is not difficult to show that the one-step estimator shares the same asymptotic properties as the fully iterated estimator provided the initial estimate is sufficiently close to the true value. As pointed out by Cai et al. (2007), if this sufficient condition is not satisfied, a multi-step estimator that repeatedly applies the one-step algorithm k times should be used instead, in which case the condition on the initial estimate is correspondingly relaxed. In practice, we first compute fully iterated IWLLS estimates at a small number of distant points and use them as initial values for their nearest grid points, at each of which we compute the one-step estimate based on (2.10). Then we use the newly computed one-step estimates as initial values for their nearest grid points to compute the one-step estimates at those points. We propagate in this manner until the one-step estimates at all grid points are obtained.
For example, in our simulation study, our aim is to estimate the coefficient functions at n_grid = 200 grid points. To do so we first compute the IWLLS estimates at five distant points: u_20, u_60, u_100, u_140, u_180. We then use, for instance, β̂(u_60) as an initial value for calculating the estimates β̂(u_59) and β̂(u_61) based on the one-step estimator formula (2.10), and subsequently proceed to use these estimates as initial values for calculating the one-step estimates β̂(u_58) and β̂(u_62), and so on. We continue this process until the one-step estimates at all the points in the neighborhood of u_60, namely the points between u_40 and u_79, are calculated. In Section 4, we will show that estimates resulting from this one-step procedure are as efficient as those obtained from IWLLS based on full iterations.
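The propagation logic described above can be expressed as a schedule that pairs each grid point with the neighbouring point supplying its initial value. The sketch below is our own illustration of the ordering (function name and seed layout follow the example in the text):

```python
def propagation_schedule(n_grid=200, seeds=(20, 60, 100, 140, 180)):
    """Return (target, initial) index pairs in computation order: every
    grid point is reached from its nearest fully iterated seed, one step
    at a time, so each initial value is an already-computed neighbour."""
    order = []
    # process points in increasing distance from their nearest seed
    points = sorted(range(1, n_grid + 1),
                    key=lambda i: min(abs(i - K) for K in seeds))
    for i in points:
        if i in seeds:
            continue  # seeds are computed by full IWLLS iteration
        K = min(seeds, key=lambda s: abs(i - s))  # nearest seed
        step = 1 if i < K else -1                 # direction toward the seed
        order.append((i, i + step))               # neighbour supplies the initial value
    return order
```

Sorting by distance to the nearest seed guarantees that the neighbour toward the seed has already been computed when a point is reached, which is exactly the condition the one-step algorithm needs for a good initial value.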

Bandwidth Selection
Various bandwidth selection techniques have been developed for nonparametric regression.
Here, we adopt the method proposed by Cai et al. (2000a), which may be regarded as a modified multi-fold cross-validation criterion that takes into consideration the structure of stationary time series data.
Let m and H be two positive integers such that T > mH. The method first uses H subseries, each of length T − km (k = 1, ..., H), to estimate the unknown coefficient functions. Then it computes, based on the estimated models, the one-step forecast errors over the subsequent subseries, each of length m. Specifically, we select the optimal bandwidth h = h_opt that minimizes the average mean squared (AMS) forecast error

AMS(h) = Σ_{k=1}^H AMS_k(h),  with  AMS_k(h) = (1/m) Σ_{t=T−km+1}^{T−km+m} [Y_t − Σ_{j=1}^p â_{j,k}(U_t) X_{tj}]²,

where the â_{j,k}(·)'s are computed with bandwidth h from the subsample {(Y_t, X_t, U_t), 1 ≤ t ≤ T − km}. In practice, the choices of m and H usually do not have a large impact on the value of h chosen by this method as long as mH is reasonably large, so that the forecast errors are stable.
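A sketch of this modified multi-fold cross-validation criterion follows, with the fitting routine passed in as a callable; the `fit_predict` interface is a hypothetical stand-in for the local linear expectile fit, not the paper's code:

```python
import numpy as np

def ams_error(h, Y, fit_predict, m=20, H=5):
    """Modified multi-fold cross-validation in the spirit of Cai et al.
    (2000a): for k = 1, ..., H, fit on the first T - k*m observations,
    forecast the next m, and average the squared forecast errors.
    fit_predict(train_idx, test_idx, h) returns forecasts for test_idx."""
    T = len(Y)
    errs = []
    for k in range(1, H + 1):
        train = np.arange(T - k * m)                    # estimation subseries
        test = np.arange(T - k * m, T - (k - 1) * m)    # next m observations
        pred = fit_predict(train, test, h)
        errs.append(np.mean((Y[test] - pred) ** 2))
    return np.mean(errs)

# h_opt would then be chosen by minimizing over a candidate grid, e.g.:
# h_opt = min(h_grid, key=lambda h: ams_error(h, Y, fit_predict))
```

Because only past observations enter each fit, the criterion respects the time ordering of the data, unlike ordinary leave-one-out cross-validation.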
We will apply this bandwidth selection method in the simulation and real data analysis of Section 4.

Using Expectile to Estimate VaR and ES
The θ-level expectile is the EVaR at the θ-level of prudentiality. As mentioned previously, for any given distribution and θ, there is always a τ such that the θ-level EVaR equals the τ-level QVaR. This special feature makes it possible to use EVaR to compute QVaR. Specifically, let F(y) be the distribution function of asset returns Y, and for any τ ∈ (0, 1), let θ(τ) be the expectile level such that ν_{θ(τ)}(Y) = q_τ(Y). Yao and Tong (1996) showed that θ(τ) and τ are related by

θ(τ) = [τ q_τ(Y) − ∫_{−∞}^{q_τ(Y)} y dF(y)] / [E(Y) − 2 ∫_{−∞}^{q_τ(Y)} y dF(y) − (1 − 2τ) q_τ(Y)].    (2.13)

Thus, instead of calculating the QVaR at a predetermined τ level as is commonly done, a more sensible strategy is to compute the EVaR at a given θ, and then let the data reveal the corresponding tail probability and the QVaR at that level. Figure 1 and Table 1 illustrate this relationship between θ and τ under various distributions.

As mentioned in Section 1, ES overcomes certain weaknesses of VaR and is becoming a widely used downside risk measure. Expectiles can also be used to calculate ES by exploiting the relationship between VaR and ES, and that between expectiles and quantiles. Taylor (2008) provided a formula linking these quantities. He showed that the first-order condition resulting from the minimization of E[Q_θ(Y − ν)] over ν may be written as

θ [E(Y) − ν_θ(Y)] = (1 − 2θ) [F(ν_θ(Y)) ν_θ(Y) − ∫_{−∞}^{ν_θ(Y)} y dF(y)].

Now, from the relationship between expectile and quantile, we have F(ν_θ(Y)) = τ, so that ν_θ(Y) = q_τ(Y) and ∫_{−∞}^{ν_θ(Y)} y dF(y) = τ ES(τ), where ES(τ) = E[Y | Y ≤ q_τ(Y)]. Hence the above expression becomes

ES(τ) = (1 + θ/[(1 − 2θ)τ]) ν_θ(Y) − θ E(Y)/[(1 − 2θ)τ].

This expression relates the ES associated with the τ-quantile to the corresponding θ-expectile.
The expression applies to the ES in the lower tail of the distribution; the corresponding upper tail expression may be obtained by replacing τ and θ by (1 − τ ) and (1 − θ) respectively.
Taylor's (2008) empirical results showed that ES estimates computed from expectiles are very close to those obtained by other methods. The conditional ES(τ | ·) can be obtained by replacing the unconditional quantities in the above expression with their conditional counterparts, that is, by using the conditional expectile ν_θ(Y | ·) and the conditional mean in place of ν_θ(Y) and E(Y).
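The expectile-quantile mapping and Taylor's (2008) ES link can be illustrated with sample analogues. In the sketch below, the partial moment ∫_{−∞}^q y dF(y) is estimated empirically; the function names are our own:

```python
import numpy as np

def theta_of_tau(y, tau):
    """Empirical version of the Yao-Tong mapping from a quantile level
    tau to the expectile level theta(tau), using sample analogues."""
    q = np.quantile(y, tau)
    G = np.mean(y * (y <= q))   # sample analogue of E[Y 1(Y <= q)]
    mu = np.mean(y)
    return (tau * q - G) / (mu - 2.0 * G - (1.0 - 2.0 * tau) * q)

def es_from_expectile(nu, mu, tau, theta):
    """Taylor's (2008) link: expected shortfall implied by the
    theta-expectile nu (equal to the tau-quantile) and the mean mu."""
    c = theta / ((1.0 - 2.0 * theta) * tau)
    return (1.0 + c) * nu - c * mu
```

By construction, the ES implied by Taylor's formula agrees algebraically with the empirical lower-tail partial mean G/τ, which provides a useful consistency check on an implementation.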

Asymptotic Properties and Variance Estimation
The purpose of this section is to explore the asymptotic properties of the estimator β̂_θ. In particular, we prove that β̂_θ is consistent and asymptotically normal. We also provide an empirical method for estimating the variance of the estimator.

Asymptotic Properties
Further, denote by f_U(·) the marginal density of U, and by a′(u) and a″(u) the first and second derivatives of a(u) respectively. We have the following theorems:

Corollary 1 Under the conditions of Theorem 1, we have
in probability for any u ∈ U, with U being the support set of u.

Corollary 2 Under the conditions of Theorem 2, we have
Corollaries 1 and 2 focus on the estimator of the functional coefficient vector, â_θ, and are special cases of Theorems 1 and 2 respectively. By Corollary 1, â_θ converges in probability to the true coefficient vector a_0 when the sample size is sufficiently large and is thus consistent. By Corollary 2, â_θ is asymptotically normal. The result of Corollary 2 is useful for constructing confidence intervals for the unknown coefficient functions.

Remark 2:
The conditional expectile for a given level of θ may be calculated from the estimated coefficients by virtue of equation (2.3). This expectile measure is also the EVaR, which may be converted to a τ-level QVaR under an assumed distribution using the one-to-one relationship between expectiles and quantiles. The ES corresponding to a given EVaR may also be calculated using Taylor's (2008) procedure.

Variance Estimation
We consider estimation of the covariance matrix of β̂_θ by the sandwich method. To do so, we first estimate D(θ, u_0) and Σ(θ, u_0) by their sample analogues D̂(θ, u_0) and Σ̂(θ, u_0), constructed with a^{⊗2} = aa^T, where Q′_θ(·) and Q″_θ(·) are respectively the first and second derivatives of the function Q_θ(·) with respect to its argument. We show in Lemmas A.1 and A.2 in the Appendix that D̂(θ, u_0) and Σ̂(θ, u_0) are consistent estimators of their respective unknowns. Thus, a consistent estimator of the sandwich covariance matrix of β̂_θ is

D̂(θ, u_0)^{-1} Σ̂(θ, u_0) D̂(θ, u_0)^{-1}.    (3.5)

The asymptotic covariance matrix of â(u_0) is the p × p north-west submatrix of (3.5).

Simulation Study

We assess the finite-sample performance of the estimator by the root average squared error (RASE), measured by

RASE = { n_grid^{-1} Σ_{k=1}^{n_grid} Σ_{j=1}^p [â_j(u_k) − a_j(u_k)]² }^{1/2},

where the u_k's, k = 1, ..., n_grid, are the grid points at which the coefficient functions a_j(u) are estimated. In each case we report the empirical median and the standard deviation of the RASE across 500 repetitions. We consider sample sizes of T = 200, 400, 800, and use the Epanechnikov kernel function K(u) = (3/4)(1 − u²)I(|u| ≤ 1) for local linear smoothing. The method discussed in Section 2.3 is used to select the bandwidth.
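As an illustration of the sandwich construction discussed in this section, the following sketch builds sample analogues of D and Σ from a local design matrix, residuals, and kernel weights. The scaling constants and interface are illustrative assumptions, not the paper's exact expressions:

```python
import numpy as np

def sandwich_cov(Z, e, k, theta, h):
    """Sandwich covariance sketch D^{-1} Sigma D^{-1} for a local
    expectile estimator. Z: (T, q) local design matrix, e: residuals,
    k: kernel weights at u0; scaling by (T*h) is an illustrative choice."""
    T = len(e)
    w = np.abs(theta - (e <= 0))      # |theta - I(e <= 0)|
    d1 = 2.0 * w * e                  # first derivative of the loss at e
    d2 = 2.0 * w                      # second derivative of the loss
    D = (Z * (d2 * k)[:, None]).T @ Z / (T * h)
    S = (Z * (d1**2 * k**2)[:, None]).T @ Z / (T * h)
    Dinv = np.linalg.inv(D)
    return Dinv @ S @ Dinv / (T * h)
```

The "bread" uses the second derivative of the asymmetric squared loss and the "meat" the squared first derivative, so the resulting matrix is symmetric and positive definite whenever the weighted design has full rank.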
Our application of the one-step estimation algorithm involves the following steps:

Step 1: Set n_grid = 200 and divide the interval of U into the grid points {u_1, u_2, ..., u_200}.

Step 2: Select the five distant points u_K, K = 20, 60, 100, 140, 180.

Step 3: Compute the fully iterated IWLLS estimate β̂_θ(u_K) at each of the five points selected in Step 2.

Step 4: Calculate, using formula (2.10), the one-step estimator at u_{K−1} and u_{K+1} using β̂_θ(u_K) obtained in Step 3 as the initial value, K = 20, 60, 100, 140, 180, and then use the resultant estimates as the initial values for calculating β̂(u_{K−2}) and β̂(u_{K+2}) based on formula (2.10). Continue this process until all the one-step estimates at u_{n_k} for n_k = K − 20, ..., K + 19 are computed.
We consider the following three experimental designs. Similar designs were used by Cai et al. (2000a) and Cai and Xu (2008) in their simulation studies.

Design 3: Varying-coefficient model with exogenous regressors:
We set c = −5 and P = 0.05. The optimal bandwidths are chosen in the same manner as under the previous two designs.
Although the estimates of a_1(·) and a_2(·) are obtained based on θ = 0.50, they can also be obtained using other values of θ since, as under Design 2, the coefficients a_1(·) and a_2(·) do not depend on θ. The figures show results very similar to those in Figure 2 of Kuan et al. (2009). Specifically, when P < τ, the EVaR varies with c, but the corresponding QVaR is relatively insensitive to the magnitude of extreme losses under the assumed error distribution. Although the QVaR changes with c when P ≥ τ, its magnitude is smaller than that of the EVaR for all c. These results suggest that the EVaR is a risk measure more sensitive to catastrophic losses than the QVaR.

Real data examples

Example 1.
This example is based on 2348 daily closing bid prices of the Euro in terms of the U.S. dollar between January 1, 2004 and December 31, 2012. We denote the bid price by p_t and compute the returns Y_t as 100 times the difference of the log prices, i.e., Y_t = 100 log(p_t/p_{t−1}). We let the θ-level expectile be modeled by a varying-coefficient specification in two lagged returns, and denote the corresponding model as VC(2). Fan et al. (2003) and Cai and Xu (2008) also considered the modeling of exchange rate data by a varying-coefficient approach, although they did not apply their methods to expectile estimation. Following these authors, we let U_t = p_{t−1}/M_t − 1 be the effect modifier, where M_t = Σ_{j=1}^L p_{t−j}/L is a moving average of past prices that serves as a proxy for the trend at time t − 1, and L is the number of periods in the moving average. We choose L = 10 to account for a two-week period of trading days. As discussed in Fan et al. (2003) and Cai and Xu (2008), this choice of U_t corresponds to a moving average technical trading rule commonly used in finance.
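The construction of the effect modifier U_t from a price series can be sketched as follows; the function name and the NaN padding for the first L observations are our own implementation choices:

```python
import numpy as np

def effect_modifier(p, L=10):
    """U_t = p_{t-1} / M_t - 1, where M_t is the L-period moving average
    of the prices p_{t-1}, ..., p_{t-L} (the trend proxy)."""
    p = np.asarray(p, dtype=float)
    T = len(p)
    U = np.full(T, np.nan)          # undefined until L lags are available
    for t in range(L, T):
        M = p[t - L:t].mean()       # average of p_{t-1}, ..., p_{t-L}
        U[t] = p[t - 1] / M - 1.0
    return U
```

U_t is positive when the last price sits above its recent moving average (an up-trend under the trading rule) and negative when it sits below, which is what makes it a natural effect modifier for the coefficient functions.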
We use the first 1500 observations for model estimation and the remaining observations for out-of-sample forecast evaluation, and adopt the Epanechnikov kernel function K(u) = (3/4)(1 − u²)I(|u| ≤ 1) for local linear smoothing. For comparison purposes, we also consider the SQ(2) and ABS(2) parametric models used in Kuan et al.'s (2009) empirical analysis, which model the conditional expectile using squared and absolute lagged returns respectively. Table 5 reports results corresponding to θ = 0.01, 0.05, 0.10, where τ_in and τ_out are the in-sample and out-of-sample tail probabilities for the estimated expectiles respectively. We also considered larger values of θ, but as the results are qualitatively similar to those reported in Table 5 we omit them for brevity. We observe from the table that for any given θ value, which represents prudentiality, all three models invariably produce larger τ_in and τ_out than θ, and larger τ_out than τ_in. The former observation indicates that the QVaR, if calculated at the same level as the desired level of prudentiality, will likely underestimate the value at risk. The latter observation is of no surprise, given the higher level of volatility of Y_t in the out-of-sample period, as reflected in Figure 5(a). That said, among the three models, the VC(2) model yields the smallest |τ_out − τ_in|/τ_in for any given θ. This may be taken as an indication that the VC(2) model produces more stable estimates than the SQ(2) and ABS(2) models. The τ_in and τ_out values in Table 5 can also be compared against the tail probabilities under various distributions for different values of θ shown in Table 1. The strong asymmetry of some of the estimated functions is a notable feature of the results. For example, Figure 6(b) shows that a_{0.95}(·) has a large positive value when U < −0.25, but a smaller negative value when U > 0.2. As well, the estimated functions at different θ levels frequently intersect one another. Furthermore, Figures 6(b) and (c) reveal that Y_{t−1} and Y_{t−2} mostly have a negative impact on the conditional expectile except at levels of θ higher than 0.5.
We also apply the VaRs estimated from the three models to construct forecast intervals for the last 848 observations. This entails the application of the correspondence between expectile and quantile as described in Section 2.4. Specifically, we first apply kernel smoothing to estimate the density function f(y) using the first 1500 observations. This allows us to calculate E(Y), q_τ(Y) and ∫_{−∞}^{q_τ(Y)} y dF(y) in (2.13), and hence the corresponding expectile levels θ(τ) for different τ's. Then we estimate ν_θ(Y | ·), the conditional EVaR at the various expectile levels, by the varying-coefficient model. As discussed previously, ν_θ(Y | ·) is identical to q_τ(Y | ·) at the τ-level. For example, when τ = 0.05, θ(0.05) = 0.0216 (based on h_opt = 0.023), and when τ = 0.95, θ(0.95) = 0.9801 (based on h_opt = 0.031). Figures 7(a)-(c) present the estimated 5% and 95% VaR using the VC(2), SQ(2) and ABS(2) models respectively, together with the actual observations. The estimated 90% prediction interval of the VC(2) model contains 87.26% of the observations. This is closer to the nominal 90% and higher than the corresponding 84.32% and 85.14% achieved by the SQ(2)- and ABS(2)-based prediction intervals.

Example 2.
In this example, we use dummy variables representing the five weekdays and the returns of the S&P index as exogenous variables to model the returns of the Shanghai Stock Exchange Composite (SHCOMP) Index. Our data are based on daily observations between January 4, 2007 and September 24, 2012, totalling 1358 observations. We denote by p_t and p*_t the daily closing values of the SHCOMP and S&P indices, and define the corresponding daily returns as Y_t = 100 log(p_t/p_{t−1}) and U_t = 100 log(p*_t/p*_{t−1}). We let the θ-level expectile of Y_t be modeled by the varying-coefficient model with the weekday dummies D_i, i = 1, ..., 5, as covariates and U_t as the effect modifier. The inclusion of these dummy variables allows the assessment of the day-of-the-week effect. The descriptive statistics and t-test results presented in Table 6 show that Monday, Wednesday and Friday are generally characterized by significant positive returns, while the opposite is observed for Tuesday and Thursday. We use the first 1333 observations for estimation and the remaining 25 observations for forecast evaluation. As in the last example, we use the Epanechnikov kernel function K(u) = (3/4)(1 − u²)I(|u| ≤ 1) for local linear smoothing, together with the bandwidth selection method described in Section 2.3. Table 7 gives the 95% prediction intervals for the last 25 observations. By setting τ = 0.025 and τ = 0.975, we have θ(0.025) = 0.0062 and θ(0.975) = 0.9907 based on formula (2.13) that relates expectiles to quantiles. We construct these intervals using the method described in the previous example. Table 7 shows that in 24 out of 25 cases the prediction intervals contain the true observations. This indicates that the proposed method is effective.

Appendix: Assumptions and proofs of technical results
Assumptions:

(A.1) a_j(u) is twice continuously differentiable in u ∈ U, j = 0, 1, ..., p.

(A.6) Let f(μ, ν | x_0, x_l) be the conditional density of (U_0, U_l) given (X_0, X_l), which is assumed to be bounded.

For technical convenience, we reparameterize the estimating function (2.6) in terms of ξ, and let ξ̂ minimize the reparameterized objective function (4.5). The proof of Theorem 1 is based on an approximation of the objective function in (4.5) by the quadratic function given in Lemma A.1 (see below). It can then be shown that the estimator ξ̂ shares the same asymptotic behaviour as the minimizer of the quadratic function, which is asymptotically normal. The convexity lemma (Pollard, 1991) plays a key role in the approximation; this proof technique is similar to those used by Fan et al. (1994) and Yao and Tong (1996). By Lemma A.1, ξ̂ can be expressed explicitly as in (4.7), uniformly for ξ in a compact set K. From (4.7), we obtain (4.8) uniformly for u_0 ∈ U. Using Lemma A.2 in conjunction with (4.8) proves Theorem 1. Theorem 2 is an immediate consequence of (4.8), Lemma A.2 and Slutsky's theorem.