Recursive kernel estimator in a semiparametric regression model

Sliced inverse regression (SIR) is a recommended method for identifying and estimating the central dimension reduction (CDR) subspace, the subspace underlying the description of the conditional distribution of the response Y given a d-dimensional predictor vector X. Two versions of SIR are very popular for estimating this space: the slice version and the kernel version. A recursive method for the slice version has already been the subject of a systematic study. In this paper, we study the kernel version and propose a recursive method based on a stochastic approximation algorithm. The asymptotic normality of the proposed estimator is proved. A simulation study is presented that shows the good numerical performance of the proposed estimator and allows its performance to be evaluated against existing methods. A real dataset is also used to illustrate the approach.


Introduction
In the classical theory of linear sufficient dimension reduction, we consider the d-dimensional random vector X = (X_1, . . . , X_d)^T and the real random variable Y, where X is the predictor and Y is a univariate response. Much research has been devoted to finding a good link between X and Y. A large part of this research assumes that X can be replaced by a linear combination of its components, B^T X, without losing information on the conditional distribution of Y given X, where B = (β_1, . . . , β_N) is a d × N matrix (see Duan and Li 1991; Li 1991; Zhu and Fang 1996; Aragon and Saracco 1997; Nkiet 2008, among many others). This assumption can be expressed as

Y ⊥⊥ X | B^T X,   (1)

where '⊥⊥' stands for independence. Thus, B^T X summarises the information in the predictors relevant to predicting Y. An example where (1) holds is the semiparametric model proposed in Li (1991), defined by

Y = F(β_1^T X, . . . , β_N^T X, ε),

where F is an arbitrary unknown function on R^{N+1} and ε is a real random variable independent of X. Equation (1) expresses the fact that the projection of X onto the N-dimensional subspace spanned by β_1^T X, . . . , β_N^T X, named the effective dimension reduction (EDR) space, contains all the information about the response variable Y. The EDR space is a subspace of R^d spanned by the columns of B. It is, therefore, necessary to estimate B. Two estimation methods have received much attention: Sliced Inverse Regression (SIR) (Duan and Li 1991; Li 1991) and Sliced Average Variance Estimation (SAVE) (Cook and Weisberg 1991, 2000; Li and Zhu 2007). SIR is the first, and the most widely used, method to estimate B, owing to its simplicity, computational efficiency and generality compared to SAVE. Indeed, sliced inverse regression uses only the first moments E(X | Y), while sliced average variance estimation uses first and second moments. In this paper, we are interested in the SIR method. For uniqueness, we are interested in a subspace of minimal dimension. Under mild conditions, this minimal subspace is often uniquely defined in practice and coincides with the intersection of all subspaces satisfying (1) (Cook 1994, 1996). This intersection is called the central dimension reduction (CDR) space and is written S_{Y|X}. In this article, we assume that S_{Y|X} exists.
Since the directions β_1, . . . , β_N of B are, under some conditions, characterised as eigenvectors of the covariance matrix of E(X | Y), the aforementioned estimation problem reduces to the estimation of Λ = Cov(E(X | Y)). The SIR method is based on slicing the range of Y; see Li (1991) and also Duan and Li (1991). An alternative method based on kernel smoothing was introduced by Zhu and Fang (1996). Recently, Nkou and Nkiet (2022) used a wavelet-based estimation to construct an estimator of the CDR subspace. The resulting estimators of these methods (slice, kernel and wavelet) are non-recursive. In Bercu, Nguyen, and Saracco (2012) and Nguyen and Saracco (2010), the authors proposed and studied recursive estimators of Λ using the slice method. However, such a recursive estimator has not yet been proposed using the kernel method. That is why we propose in this paper a recursive estimation method of the CDR space related to this model, based on recursive kernel estimators of the density and regression functions involved in Λ.
The great advantage of recursive estimators over non-recursive estimators is that updating them from a sample of size n to a sample of size n + 1 requires considerably less computation. This property is particularly important in the framework of density and regression function estimation; these two estimators intervene in the estimation of the matrix Λ by the kernel method (Zhu and Fang 1996). The first recursive version of Rosenblatt's kernel density estimator, and the most famous one, was introduced by Wolverton and Wagner (1969) and was widely studied; see, among many others, Yamato (1971), Davies (1972) and Devroye (1979). The application of stochastic approximation algorithms was first introduced by Révész (1973, 1977) and then extended by Mokkadem, Pelletier, and Slaoui (2009a) to estimate a regression function. Using this type of algorithm, Tsybakov (1990) approximated the mode of a probability density, and Mokkadem, Pelletier, and Slaoui (2009b) estimated a multivariate probability density. According to the work of the aforementioned authors, this estimator mainly depends on two important parameters: the bandwidth and the stepsize of the stochastic algorithm. With an adequate choice of these two parameters, the proposed recursive estimator can be very competitive with the non-recursive kernel estimator, especially in terms of computational cost and speed. Guidance on the choice of these parameters can be found in Slaoui (2014, 2015) and Mokkadem et al. (2009b).
The basic objective of the present work is to introduce recursive kernel estimators of the density and the regression function, defined by applying a stochastic approximation algorithm, into the estimation of the CDR space. This construction aims to improve the speed of the estimation while preserving its accuracy as much as possible. We select parameters favourable to obtaining our theoretical results and to achieving this goal.
The remainder of the paper is organised as follows. In Section 2, we present our recursive estimator, built from the recursive estimators of the density and the regression function. In Section 3, we state our main results. Section 4 is dedicated to our simulation results, and a real dataset is used to illustrate our approach. Concluding remarks are given in Section 5. Finally, Section 6 is devoted to the proofs of our theoretical results.

Kernel recursive version of SIR
In this section, we present a recursive form of the estimation. First, we recall the non-recursive form introduced by Zhu and Fang (1996).

Kernel non-recursive estimator of Λ
As already mentioned in the introduction, Zhu and Fang (1996) introduced a kernel estimator of Λ = Cov(E(X | Y)). This estimator is built from kernel estimators of the density and the regression function. We therefore consider the sample (X_i, Y_i)_{1≤i≤n} made of n independent pairs of random variables with the same distribution as (X, Y). Letting f be the density of Y, we suppose that f(y) > 0 for any y ∈ R. In the sequel, for any j = 1, . . . , d, we consider the functions

g_j(y) = ∫ x f_{(X_j, Y)}(x, y) dx  and  R_j(y) = g_j(y) / f(y) = E(X_j | Y = y),   (2)

f_{(X_j, Y)} being the density of the pair (X_j, Y). Then, we have the random vector R(Y) = (R_1(Y), . . . , R_d(Y))^T. The non-recursive kernel estimators of f and g_j are, respectively,

f̂_n(y) = (n h_n)^{−1} Σ_{i=1}^n K((y − Y_i)/h_n)  and  ĝ_{j,n}(y) = (n h_n)^{−1} Σ_{i=1}^n X_{ij} K((y − Y_i)/h_n),

while the kernel estimator of R_j is

R̂_{j,n}(y) = ĝ_{j,n}(y) / f̂_n(y),

with X_i = (X_{i1}, . . . , X_{id})^T. Considering that E(X) = 0, a non-recursive kernel estimator of Λ is

Λ̂_n = n^{−1} Σ_{i=1}^n R̂_n(Y_i) R̂_n(Y_i)^T,  where R̂_n(y) = (R̂_{1,n}(y), . . . , R̂_{d,n}(y))^T.
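For illustration, a compact R sketch of this non-recursive estimator might look as follows (the helper names epanechnikov and kernel_sir are ours, not part of the paper; the default bandwidth h = n^{−1/5} matches the choice used in Section 4):

```r
# Non-recursive kernel SIR estimator (sketch, after Zhu and Fang 1996).
# X: n x d matrix of centred predictors; Y: numeric response vector.
epanechnikov <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

kernel_sir <- function(X, Y, h = length(Y)^(-1/5)) {
  n <- nrow(X)
  W <- epanechnikov(outer(Y, Y, "-") / h)   # W[i, l] = K((Y_i - Y_l) / h)
  f_hat <- rowSums(W) / (n * h)             # density estimates f(Y_i)
  g_hat <- (W %*% X) / (n * h)              # g_j(Y_i) estimates, an n x d matrix
  R_hat <- g_hat / f_hat                    # R_j(Y_i) = g_j(Y_i) / f(Y_i)
  Lambda_hat <- crossprod(R_hat) / n        # (1/n) sum_i R(Y_i) R(Y_i)^T
  eigen(Lambda_hat, symmetric = TRUE)       # EDR directions: leading eigenvectors
}
```

The leading eigenvectors of Λ̂_n then estimate the EDR directions; note that all n² kernel evaluations must be redone whenever a new observation arrives, which is precisely what the recursive version below avoids.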

A kernel recursive estimation method of Λ
The estimators involved in the construction of Λ̂_n above are the non-recursive estimators of f and g_j. However, these functions also have families of recursive kernel estimators, defined by the stochastic approximation method introduced in Mokkadem et al. (2009b). Consider, on the one hand, the sample (X_i, Y_i)_{1≤i≤n} of n independent pairs with the same distribution as (X, Y) (used to construct Λ̂_n above) and, on the other hand, the (n + 1)st observation (X_{n+1}, Y_{n+1}) which is added. To construct a stochastic algorithm approximating the function f at a given point y, we define an algorithm of search of the zero of the function m : z → f(y) − z. Following the Robbins–Monro procedure (Révész 1973, 1977), this algorithm is defined by setting f_0(y) ∈ R and, for all n ≥ 0,

f_{n+1}(y) = f_n(y) + γ_{n+1} W_{n+1}(y),

where the stepsize (γ_n) is a sequence of positive real numbers that goes to zero, W_{n+1}(y) = h_{n+1}^{−1} K((y − Y_{n+1})/h_{n+1}) − f_n(y), K is a kernel (a function satisfying ∫ K(x) dx = 1) and (h_n) is a bandwidth (a sequence of positive real numbers that goes to zero). The stochastic approximation algorithm introduced to recursively estimate the density f at the point y can thus be written as

f_n(y) = (1 − γ_n) f_{n−1}(y) + γ_n h_n^{−1} K((y − Y_n)/h_n).   (3)

This estimator was introduced by Mokkadem et al. (2009b). Throughout this paper, we will consider the sequence (π_n) defined, for n ≥ 1, by π_n = ∏_{i=1}^n (1 − γ_i), and we obviously assume that 0 < γ_1 < 1. Setting f_0(y) = 0, it follows from (3) that one can estimate f recursively at the point y by

f_n(y) = π_n Σ_{i=1}^n π_i^{−1} γ_i h_i^{−1} K((y − Y_i)/h_i).   (4)

For the recursive kernel estimators of the functions g_j(y) = f(y) R_j(y), j ∈ {1, . . . , d}, defined by the stochastic approximation method, Slaoui (2015) follows the approach used to construct that of f. Let (X_{1j}, Y_1), . . . , (X_{nj}, Y_n) be independent, identically distributed pairs of random variables. In order to construct a stochastic algorithm for the estimation of g_j at a point y, we search for the zero of the function u : z → g_j(y) − z. This algorithm is defined by setting g_{j,0}(y) ∈ R and, for all n ≥ 0, g_{j,n+1}(y) = g_{j,n}(y) + γ_{n+1} V_{n+1}(y), with V_{n+1}(y) = h_{n+1}^{−1} X_{n+1,j} K((y − Y_{n+1})/h_{n+1}) − g_{j,n}(y). Then, the estimator g_{j,n} recursively estimating the function g_j at the point y can be written as

g_{j,n}(y) = (1 − γ_n) g_{j,n−1}(y) + γ_n h_n^{−1} X_{nj} K((y − Y_n)/h_n).   (5)

We suppose that, for any j ∈ {1, . . . , d}, g_{j,0}(y) = 0, and it follows from (5) that one can estimate g_j recursively at the point y by

g_{j,n}(y) = π_n Σ_{i=1}^n π_i^{−1} γ_i h_i^{−1} X_{ij} K((y − Y_i)/h_i).   (6)

The recursive properties (3) and (5) are particularly useful for large samples, because f_n and g_{j,n} can be updated easily with each additional observation. These families of recursive estimators generalise the family of recursive estimators introduced in Amiri (2009): one can check easily that, with an appropriate choice of the stepsize, we recover the families of densities in Amiri (2009), and a more general form is obtained with the cumulative distribution function H of the random variable y − Y. Now, using (5), we build a stochastic algorithm for the estimation of the functions R_j defined in (2). Setting

R_{j,n}(y) = g_{j,n}(y) / f_n(y),

let R_n(Y) = (R_{1,n}(Y), . . . , R_{d,n}(Y))^T be the corresponding random vector, defined for n ∈ N, with R_0(Y) = (R_{1,0}(Y), . . . , R_{d,0}(Y))^T given by the initialisations above and, for n ≥ 1, R_n(Y) updated through (3) and (5). Thus, we consider the following estimator of Λ:

Λ_n = n^{−1} Σ_{i=1}^n R_n(Y_i) R_n(Y_i)^T.   (7)

Using the above relation, we deduce a recursive form of Λ_n: expressing R_{n+1}(Y_i) in terms of R_n(Y_i) and of the correction induced by the (n + 1)st observation yields an update formula giving Λ_{n+1} in terms of Λ_n, denoted (8) in what follows.

Remark 2.1: Equation (8) can also be written with a remainder term ξ_{n+1}(Y). Since ξ_{n+1}(Y) depends on Y, this form does not bring out a recursive relation between Λ_{n+1} and Λ_n: the resulting matrix would not be the desired one.
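To make the recursions (3) and (5) concrete, here is a minimal R sketch of one recursive step, under the parameter choices used later in Section 4 (γ_n = n^{−1} and h_n = n^{−0.2}); the function name update_fg and the state representation are our own illustrative conventions, not part of the paper:

```r
# One recursive update of f_n and g_{j,n} at a grid of evaluation points y.
# state: list(f = numeric vector over y, g = length(y) x d matrix, n = count so far).
update_fg <- function(state, x_new, y_new, y,
                      kern = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)) {
  n1    <- state$n + 1
  gamma <- 1 / n1                 # stepsize gamma_n = n^{-1}
  h     <- n1^(-0.2)              # bandwidth h_n = n^{-1/5}
  Ky    <- kern((y - y_new) / h) / h
  state$f <- (1 - gamma) * state$f + gamma * Ky                # recursion (3)
  state$g <- (1 - gamma) * state$g + gamma * outer(Ky, x_new)  # recursion (5)
  state$n <- n1
  state
}
```

Starting from the initialisations f_0(y) = g_{j,0}(y) = 0, applying update_fg to each incoming observation reproduces the closed forms (4) and (6); each update costs O(length(y) · d) operations, independently of the number of observations already processed.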
Note the following complexity presented by the estimator Λ_n of Λ based on the kernel method introduced by Zhu and Fang (1996): the estimator uses the sample (Y_i)_{1≤i≤n} twice, first for the construction of R_n and second as arguments of the obtained functions. Thus (8) seems rather heavy to us. However, a recursive method which seems simpler than (8) for obtaining Λ_{n+1} is to use (7); we then have the update formula (9).

Remark 2.2:
The estimation approach in (9) can be thought of as dividing the data into two subsets, one containing the n initial observations and the other containing only one new observation; the estimate is thus updated using the information from a single data point. However, we may receive a batch of p new observations rather than just one. It is therefore interesting to establish an estimate that takes this into account, as detailed below.
From (3) and (4), we deduce, for p ≥ 1,

f_{n+p}(y) = (π_{n+p}/π_n) f_n(y) + π_{n+p} Σ_{i=n+1}^{n+p} π_i^{−1} γ_i h_i^{−1} K((y − Y_i)/h_i)   (10)

and

g_{j,n+p}(y) = (π_{n+p}/π_n) g_{j,n}(y) + π_{n+p} Σ_{i=n+1}^{n+p} π_i^{−1} γ_i h_i^{−1} X_{ij} K((y − Y_i)/h_i).   (11)

Now, from (10) and (11), we deduce the expression R_{j,n+p}(y) = g_{j,n+p}(y)/f_{n+p}(y) for p ≥ 1, and we consider the random vector R_{n+p}(Y) = (R_{1,n+p}(Y), . . . , R_{d,n+p}(Y))^T. In the sequel, for n ∈ N* and p ≥ 1, the new observations give rise to correction terms η_{n+p} and η_{n+p,j}; note that, in the definitions of η_{n+p} and η_{n+p,j}, the term B_{n,p}(Y) = 0 if p = 0. Finally, denoting by Λ_{n,p} the resulting matrix, we obtain the batch update formula (12). In (12), for p = 1, we recover (9) with n + 1. Another method is to use only (9) even when p additional data are available: assuming that the p data are observed one by one, the idea is to apply (9) repeatedly, inserting the p observations one at a time. This procedure seems even simpler, and it is the one we used to obtain the simulation results.
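A hedged R sketch of this batch update, in the spirit of (10) and (11), is given below (the name update_fg_batch is ours; it reuses the state representation of the earlier sketch, and the weights w implement the factors π_{n+p} π_i^{−1} γ_i h_i^{−1}):

```r
# Batch update of f_n and g_{j,n} with p new observations, following (10)-(11).
update_fg_batch <- function(state, X_new, Y_new, y,
                            kern = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)) {
  p     <- length(Y_new)
  idx   <- state$n + seq_len(p)     # global indices n+1, ..., n+p
  gamma <- 1 / idx                  # stepsizes gamma_i
  h     <- idx^(-0.2)               # bandwidths h_i
  ratio <- prod(1 - gamma)          # pi_{n+p} / pi_n
  # s[k] = pi_{n+p} / pi_{n+k} = prod_{l = k+1}^{p} (1 - gamma_l), with s[p] = 1
  s <- rev(cumprod(c(1, rev(1 - gamma)[-p])))
  w <- s * gamma / h                # weights pi_{n+p} pi_i^{-1} gamma_i / h_i
  f_corr <- numeric(length(y))
  g_corr <- matrix(0, length(y), ncol(X_new))
  for (k in seq_len(p)) {
    Ky     <- kern((y - Y_new[k]) / h[k])
    f_corr <- f_corr + w[k] * Ky
    g_corr <- g_corr + w[k] * outer(Ky, X_new[k, ])
  }
  state$f <- ratio * state$f + f_corr   # equation (10)
  state$g <- ratio * state$g + g_corr   # equation (11)
  state$n <- state$n + p
  state
}
```

For p = 1, update_fg_batch coincides with update_fg, mirroring the fact that (12) with p = 1 recovers (9).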

Remark 2.3: As was done in Zhu and Fang (1996), in order to avoid the effect of the small values in the denominator, we consider

f_n^{b_n}(y) = max{f_n(y), b_n},

where (b_n) is a sequence of positive real numbers which tends to zero as n → +∞ and f_n is the recursive estimator defined in (4). The use of such a truncation f_n^{b_n} is becoming more and more common in papers using regression function estimation (Zhu and Fang 1996; Ferré and Yao 2005; Nkou and Nkiet 2019). In this paper, it has the advantage of not imposing a lower bound on f when operations are performed on the R_{j,n}. For example, fixing a strictly positive constant c such that f > c > 0, as done in Zhu and Zhu (2007), would exclude the density function of the normal distribution.
In view of this Remark 2.3, we will consider

R_{j,n}^{b_n}(y) = g_{j,n}(y) / f_n^{b_n}(y),

where g_{j,n} is the estimator defined in (6). With the random vector R_n^{b_n}(Y) = (R_{1,n}^{b_n}(Y), . . . , R_{d,n}^{b_n}(Y))^T, and considering that E(X) = 0, the corresponding kernel estimator of Λ is

Λ_n^{b_n} = n^{−1} Σ_{i=1}^n R_n^{b_n}(Y_i) R_n^{b_n}(Y_i)^T.

Obviously, the recursive relation for Λ_n^{b_n} is obtained from the one established for Λ_n in (12) by replacing R_{j,n} by R_{j,n}^{b_n}.
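Putting the pieces together, a minimal sketch of the resulting estimator of Λ with the truncated denominator is shown below (assuming the max-truncation form of Remark 2.3 and the practical choice b_n = min(a, n^{−c_2}) of Remark 3.1 below; the constants a = 0.1 and c_2 = 0.04 < 1/20 are illustrative):

```r
# Estimator of Lambda from the recursive estimates, with truncated denominator.
# Assumes state$f and state$g hold f_n and g_{j,n} evaluated at y = Y_1, ..., Y_m.
lambda_hat <- function(state, a = 0.1, c2 = 0.04) {
  b_n <- min(a, state$n^(-c2))          # b_n = min(a, n^{-c_2}), cf. Remark 3.1
  R   <- state$g / pmax(state$f, b_n)   # R_{j,n}^{b_n} = g_{j,n} / max(f_n, b_n)
  crossprod(R) / nrow(R)                # (1/m) sum_i R(Y_i) R(Y_i)^T
}
```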

Assumptions and main results
In this section, we present our main result on the asymptotic distribution of √n(Λ_n − Λ). We start with the definition of the class of regularly varying sequences.

Definition 3.1:
Let v* ∈ R and let (v_n)_{n≥1} be a nonrandom positive sequence. We say that (v_n) ∈ GS(v*) if

lim_{n→+∞} n [1 − v_{n−1}/v_n] = v*.

The acronym GS stands for Galambos and Seneta. The notion of GS-sequence was introduced by Galambos and Seneta (1973) to define regularly varying sequences, and was used by Mokkadem and Pelletier (2007) in the context of stochastic approximation algorithms. Note that typical sequences in GS(v*) are, for a ∈ R, n^{v*}(log n)^a and n^{v*}(log log n)^a. The case a = 0 will be particularly used to obtain some of our results.
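As a quick numerical check of Definition 3.1 (a sketch; the helper gs_limit is ours), one can verify in R that the sequences used later satisfy the GS property:

```r
# Numerical check of the GS property: lim_n n * (1 - v_{n-1} / v_n) = v*.
gs_limit <- function(v, n) n * (1 - v(n - 1) / v(n))
n <- 1e6
gs_limit(function(k) k^(-1),   n)   # about -1   : (gamma_n) = (n^{-1})   in GS(-1)
gs_limit(function(k) k^(-1/5), n)   # about -0.2 : (h_n)     = (n^{-1/5}) in GS(-1/5)
```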
We recall that we have an independent, identically distributed sample (X_i, Y_i)_{i=1,...,n} of the pair of random variables (X, Y) verifying the model (1). To establish this result, we will need the following assumptions.

A1: For any sample X_1, . . . , X_n of independent random variables with the same distribution as X, there exists a sequence (G_n) of strictly positive numbers, with G_n ∼ log n, which satisfies the required growth bound. Assumption A1 is valid when X is bounded; it is in particular verified when X follows a multivariate normal distribution, a case which would be excluded if X were required to be bounded by a strictly positive constant.
A2: The density f of Y and the functions g_j = f R_j, j = 1, . . . , d, are three times differentiable, and their third derivatives satisfy the Lipschitz property: there exists c > 0 such that the stated inequality holds.

Assumption A3, which gathers the usual conditions on the kernel K, is traditional in nonparametric estimation and is checked in particular by the Gaussian kernel, the Epanechnikov kernel and many other kernels.
A4: The bandwidth (h_n), the sequence (b_n) and the stepsize (γ_n) verify the following conditions: (h_n) ∈ GS(−c_1) and (b_n) ∈ GS(−c_2), where c_1 and c_2 are numbers satisfying c_1 ≥ 1/5 and 0 < c_2 < 1/20, together with conditions on (γ_n) (in particular a constraint, with lower bound 1/8, linking the exponents).

Remark 3.1: Assumption A4 combines the assumptions on (h_n), (b_n) and (γ_n) used by Slaoui (2013, 2015), Mokkadem et al. (2009b) and Zhu and Fang (1996). Moreover, it was shown in Mokkadem et al. (2009b) that the bandwidth (h_n) which minimises the MISE of f_n depends on the choice of (γ_n); they show in particular that the sequence (γ_n) = (n^{−1}) belongs to the admissible set, under some regularity conditions on f, and that the corresponding bandwidth satisfies h_n = O(n^{−1/5}). On the other hand, in practice, for (b_n), we will consider b_n = min(a, n^{−c_2}), where a is a fixed, sufficiently small, strictly positive number. This yields a more accurate estimation of f; see Remark 3.1 in Nkou and Nkiet (2019) for more details. It is clear that we always have (b_n) ∈ GS(−c_2) despite this consideration. On the other hand, Assumption A4(2) is standard in the framework of stochastic approximation algorithms; it implies in particular that the limit of (nγ_n)^{−1} is finite. Throughout this paper, we shall use the notation λ_{k,ℓ}, 1 ≤ k, ℓ ≤ d, for the entries of Λ, and consider sequences (a_n)_{n∈N} such that a_n ∼ b_n as n → +∞.
Assumptions A5 and A6 were made by Zhu and Fang (1996) to establish the asymptotic normality of the kernel estimate of Λ.
We now present the main results.

Simulation study and real data application
In this section, we present the results of simulation studies evaluating the performance of the recursive kernel estimator of SIR and comparing it with the recursive slice estimator of SIR. We have four estimators (the notations used are in parentheses): the non-recursive slice version (Li 1991) (S−NR), the recursive slice version (Bercu et al. 2012) (S−R), the non-recursive kernel version (Zhu and Fang 1996) (K−NR) and the recursive kernel version of this paper (K−R). We essentially compare the convergence of the estimators, as measured by the correlation coefficient R² proposed in Li (1991). It is the correlation coefficient between two vectors: this criterion measures the squared cosine of the angle between an estimated direction β̂_j of the CDR space and the corresponding true direction β_j of S_{Y|X}, the estimated CDR space being spanned by the directions β̂_1, . . . , β̂_N associated with the N largest eigenvalues of Λ_n. The closer this coefficient is to one, the better the estimate. The criterion R²(β̂_j) is defined by

R²(β̂_j) = (β̂_j^T Σ β_j)² / [(β̂_j^T Σ β̂_j)(β_j^T Σ β_j)],  with Σ = Cov(X);

see Li (1991) for more details. Regarding the time taken, we compare the computational times needed by each estimator to accomplish the task. The results were obtained using the same samples. All the methods (S−NR, S−R, K−NR, K−R) have been implemented in R, and the simulation study was done with this software; the corresponding codes are available from the author. Note that the results are presented for illustrative purposes only, as a thorough comparison between the cited estimation methods, especially for the estimation of the CDR space by a recursive kernel method, may require further investigation. We present these results in three parts. First, we present the results showing the quality of the estimator K−R, in terms of its convergence (R²(β̂_j)) and of the time saving it offers. Secondly, we look at the computation time of the other recursive estimator (S−R). Finally, we apply the four estimators to a real dataset.
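Since Σ = I_5 in the simulations below, the criterion reduces to the squared cosine between the estimated and true directions; a minimal sketch of its computation (hypothetical helper name) is:

```r
# Squared cosine between an estimated direction and a true one (here Sigma = I_5).
r2_criterion <- function(beta_hat, beta) {
  sum(beta_hat * beta)^2 / (sum(beta_hat^2) * sum(beta^2))
}
r2_criterion(c(0.51, 0.48, 0.52, 0.49, 0.02), c(1, 1, 1, 1, 0))  # close to 1
```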

Results with simulated data
We considered the following two models. Model 1 corresponds to N = 1 with β_1 = (1, 1, 1, 1, 0)^T and can be written as Y = (β_1^T X)³ + ε, while Model 2 corresponds to N = 2 with β_1 = (1, 0, 0, 0, 0)^T and β_2 = (0, 1, 0, 0, 0)^T. Each data set was obtained as follows: X = (X_1, X_2, X_3, X_4, X_5)^T is generated from a multivariate normal distribution N(0, I_5), where I_5 is the 5 × 5 identity matrix, ε is generated from a standard normal distribution, and Y is computed according to the above models. In the slice versions, the number of slices is H = 5. The common kernel used is the Epanechnikov kernel K(x) = 0.75(1 − x²)1_{[−1,1]}(x). For the choice of the stepsize (γ_n) and the bandwidth (h_n), we favoured parameters which verify the assumptions used in the proofs; for this reason, we considered a notable result of Slaoui (2014, 2015) and took (h_n) = (n^{−0.2}) and (γ_n) = (n^{−1}). The boxplot results were obtained from 200 independent replications. Note that a comparison between the non-recursive slice version and the non-recursive kernel version was made in Nkou and Nkiet (2022).
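For reference, a sample from Model 1 can be generated as follows (a sketch; Model 2 is analogous once its link function is specified):

```r
# One sample from Model 1: Y = (beta_1^T X)^3 + eps, with X ~ N(0, I_5).
set.seed(1)
n     <- 300
beta1 <- c(1, 1, 1, 1, 0)
X     <- matrix(rnorm(n * 5), n, 5)   # rows are the observations X_i
eps   <- rnorm(n)
Y     <- as.vector((X %*% beta1)^3 + eps)
```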
Let us recall the difference between recursive and non-recursive methods. Having already obtained a first estimator with n data points, when p further data points are added, the recursive methods use the results already obtained with the n initial data and make an update with the p additional data, while the non-recursive methods restart all the calculations each time data are added. To obtain the results, in particular those of the computation of R²(β̂_j), and to evaluate the computational times, we started with the initialisation n = 300; the variations are therefore made on the additional data p. For this initialisation, which is very important for a good start, we used the non-recursive methods. The algorithm then enters the p added data one by one using the recursive methods, as sketched below. Table 1 presents the results of R²(β̂_1) and R²(β̂_2) obtained from the two models with different values of p: 400, 800, 1200, 1600 and 2000. The recursive methods were thus used with sample sizes n + p.
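Schematically, this protocol can be written in R as follows, reusing the helpers introduced above (as a simplification, the initialisation is performed here with the recursive updates and the evaluation points are fixed at the first n = 300 responses, whereas the paper initialises with the non-recursive estimators):

```r
# Experimental protocol (simplified sketch): initialise with the first n = 300
# observations, then insert p additional observations one by one.
p     <- 400
X_add <- matrix(rnorm(p * 5), p, 5)
Y_add <- as.vector((X_add %*% beta1)^3 + rnorm(p))   # new data from Model 1
yg    <- Y                                           # fixed evaluation points

state <- list(f = numeric(n), g = matrix(0, n, 5), n = 0)
for (i in seq_len(n)) state <- update_fg(state, X[i, ], Y[i], yg)
for (i in seq_len(p)) state <- update_fg(state, X_add[i, ], Y_add[i], yg)

beta_hat <- eigen(lambda_hat(state), symmetric = TRUE)$vectors[, 1]
r2_criterion(beta_hat, beta1)                        # R^2 criterion of Section 4
```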
First, a presentation of the results obtained. Table 1 presents the convergence speeds for Models 1 and 2; Figure 1 presents the boxplots obtained from the 200 replications of the four estimators under Model 1; Figure 2 presents the boxplots of the estimators of β_1 under Model 2; and Figure 3 shows the boxplots of the estimators of β_2 under Model 2. Finally, Table 2 shows the computational times obtained for different values of p.
The results illustrate a fact already noted about recursive estimators: they have a lower convergence speed than their non-recursive equivalents. Still, in general, the recursive estimators behave well, with R² ≥ 0.86. We note, however, a very slow reaction when estimating the direction β_2 of Model 2; the evolution of R² in this case as a function of the increase in p suggests that a large sample size is needed to obtain a valid coefficient (R² ≥ 0.86). Another remark: the recursive kernel estimator seems to give a better estimate than the slice method, and this superiority is clearest for the estimation of the second direction β_2 in Model 2. However, Figure 1 shows that the recursive kernel estimate has a greater dispersion than the slice method. Table 2 presents the computation times obtained for the kernel estimation methods, non-recursive and recursive.
The results of Tables 1 and 2 and the boxplots of Figures 1, 2 and 3 point to one fact: in terms of accuracy, non-recursive methods remain significantly better than recursive methods. Thus, for an estimation of the CDR space, and for an update where rapidity is not required, it is preferable to use the non-recursive estimators rather than the recursive estimators.

Computational times of recursive version of SIR
The computational time of an estimator corresponds to the number of operations it performs. The estimator of Λ by the slice method of Li (1991) and Duan and Li (1991) therefore has a much lower computation time than the estimator of Λ by the kernel method of Zhu and Fang (1996). Indeed, constructing an element of the kernel estimator of Λ in Zhu and Fang (1996) requires a greater number of operations than constructing the slice estimator of Li (1991) and Duan and Li (1991). The results of Table 3 illustrate this fact; the table presents the computation times obtained for the non-recursive and recursive slice versions. The purpose of this article is thus not to make such an obvious comparison.

Results with real data
We illustrate our approach on a real dataset. The data concern diabetes and correspond to 768 patients. They were collected and made available by the National Institute of Diabetes and Digestive and Kidney Diseases as part of the Pima Indian Diabetes Database. Several constraints were placed on the selection of these instances from a larger database; in particular, the patients here are females aged 21 and above. These data are available at www.kaggle.com. We apply the four estimators with Y = Glucose (plasma glucose concentration over 2 hours in an oral glucose tolerance test) as the response variable and the predictor X = (X_1, . . . , X_6) such that X_1 = BMI (body mass index), X_2 = Pregnancies (number of times pregnant), X_3 = DPF (diabetes pedigree function: a function which scores the likelihood of diabetes based on family history), X_4 = Age (years), X_5 = Insulin and X_6 = BP (diastolic blood pressure). We therefore considered the situation Y = F(β^T X) and estimated the direction β = (β_1, . . . , β_6). Since the direction to be estimated is unknown, unlike in a simulation, the criterion used to check the quality of the obtained estimators is the squared multiple correlation coefficient between the estimated reduced variable β̂^T X and the ideally reduced variable (Li 1991). We therefore have four estimators: two non-recursive and two recursive. In Table 4, we give the estimates β̂_1 obtained from the four estimators of the direction β (unknown here), preceded by their respective coefficients R²(β̂_1). The results of the non-recursive estimators were obtained with n + p = 768 and those of the recursive estimators with n = 300 and p = 438. For the slice versions, the number of slices used is H = 5. The results were obtained with normalised data.
For these different estimators, we looked at the estimation rate on these real data according to the different values taken by R²(β̂). The results are given in Table 5.
For instance, we note that, out of the 768 data points, each of the estimators produces an R² greater than 0.75 for more than half of them, and more than 25% are estimated with an R² greater than 0.98.
Thus, the recursive kernel estimator that we propose in this paper seems to perform as well as the existing estimators.

Concluding remarks
In this paper, we have proposed a recursive method to estimate the matrix Λ. This method is based on a stochastic approximation algorithm, an approximation method making it possible to obtain recursive estimators of the regression function. We have established the convergence in probability and the asymptotic normality of this estimator. Finally, we have illustrated, through simulations and on real data, the good numerical performance of this procedure, and we have shown the advantage of the recursive versions from a computation time point of view. However, the SIR method used here is known to fail in some cases. In particular, SIR fails in symmetric regressions Y = F(B^T X) + ε, where F is a symmetric function of the argument B^T X (Cook and Weisberg 1991). Theoretically, SAVE should therefore be a more powerful method than SIR, under regularity conditions, to estimate the CDR space. In view of this consideration, a natural direction for further investigation, in order to improve the work proposed here, is to propose a recursive version of the SAVE method for estimating the CDR space, in both its slice and kernel forms. For the latter, one would consider the recursive estimators f_n (4) and g_{j,n} (6) and propose recursive estimators of R_k(y) = E(X_k X | Y = y) and G_k(y) = R_k(y) f(y). The use of the parameter b_n of Remark 2.3 is recommended, as carried out in Nkou (2022); with this parameter, the conditions seem more flexible.

Proofs
In this section, we give the proof of Theorem 3.1, which relies on technical lemmas given later in Section 6.3; these are themselves based on consistency results for f_n and the g_{j,n}'s stated in Section 6.2. In the sequel, C, C_1, C_2, . . . denote finite positive constants whose values are unimportant and may change from line to line; n is the size of a sample following the same distribution as (Y, X), and we assume that n ≥ n_0 for some n_0 ∈ N.

Proof of Theorem 3.1
Let us denote by λ^n_{k,ℓ} the (k, ℓ)th entry of the d × d matrix Λ_n. Applying an argument similar to the one used by Zhu and Fang (1996) and Nkou and Nkiet (2022), and using the results of Propositions 6.2 and 6.3, we obtain a first decomposition of λ^n_{k,ℓ}. Moreover, by setting ν_{k,ℓ} accordingly, Lemmas 6.7 and 6.10 yield the equalities (15) and (16). It follows that, according to (15), (16) and Lemmas 6.9, 6.10, 6.11 and 6.12, we get the expansion (17). Using (17) and Lemmas 6.5 and 6.6, we obtain √n ν_{k,ℓ} = √n λ_{k,ℓ} + o(1), where λ_{k,ℓ} = E(R_k(Y) R_ℓ(Y)) is the (k, ℓ)th entry of the d × d matrix Λ. Hence, putting H_n = √n(Λ_n − Λ) and H^n_{k,ℓ} = √n(λ^n_{k,ℓ} − λ_{k,ℓ}), the announced convergence follows.

Asymptotic properties of f_n and g_{j,n}
Before setting out the proofs of the results used to prove Theorem 3.1, we introduce the following Lemmas 6.1 and 6.2, which will be used throughout the proofs.
Moreover, an analogous statement holds for all positive sequences (α_n) such that lim_{n→+∞} α_n = 0 and all a ∈ R. Lemma 6.1 has been proven in Mokkadem and Pelletier (2008). It is widely used for stochastic approximation algorithms (Mokkadem and Pelletier 2007; Mokkadem et al. 2009b; Slaoui 2015) and is applied throughout the proofs. Let us underline that it is its application which requires Assumption A4(2). The following lemma falls within the same context as the lemma above; it gives a result for the case v_n = 1 and will be particularly used to prove the results of the next section.
It is therefore clear that, from the above equality, we have Σ_{n=1}^{+∞} exp{−C ε_n² n³ h_n²} < +∞, and we then deduce the stated convergence.

Proof:
The proof of this proposition is obtained using Proposition 6.1 and Proposition 1 in Mokkadem et al. (2009b).
Proposition 6.3 is obtained by a method similar to Proposition 6.1 combined with Proposition 1 in Slaoui (2015). Hence we omit the details of its proof.
Proof: From a Taylor expansion of g_j and Assumption A2, we get the required bound, and the proof of the lemma is then completed using Assumption A3(3).
Lemma 6.4: Under Assumptions A4 and A6, we have the stated bound, which implies the corresponding equality. On the other hand, by Assumption A6, we obtain a further bound; consequently, from (18) we get (19). Next, the terms V_{i,n} are controlled according to Lemma 6.3. On the other hand, we can notice that, if c > 0, then (h_n^c) ∈ GS(−c c_1); moreover, the first equality of Lemma 6.1 ensures that, using (20),

√n b_n^{−1} π_n Σ_{i=1}^n π_i^{−1} γ_i h_i^4 ∼ √n b_n^{−1} h_n^4.

From Assumption A4, we have √n b_n^{−1} h_n^4 ∼ n^{−(4c_1 − c_2 − 1/2)} < n^{−1/4}; therefore, we deduce (21). Combining (19) and (21), we obtain the proof of the lemma.
The result of Lemma 6.6 is obtained using a method similar to that used in Zhu and Fang (1996); we therefore omit the details here.

Proof:
The proof is identical to that of step 2 of the proof of Theorem 2.1 in Zhu and Fang (1996).
To obtain a similar result for the case where s = 2, we will need the lemma below.
Using Lemma 6.8 together with (23), we get the desired result.