Partial linear additive distortion measurement errors models

Abstract We consider partial linear regression models in which all variables are measured with additive distortion measurement errors. To eliminate the effect caused by the distortion, we propose a conditional mean calibration to obtain calibrated variables. A profile least squares estimator for the parameter is obtained, together with confidence intervals based on the normal approximation and on the empirical likelihood. For hypothesis testing on the parameters, a restricted estimator under the null hypothesis and a test statistic are proposed. A smoothly clipped absolute deviation penalty is employed to select the relevant variables. The resulting penalized estimators are shown to be asymptotically normal and to have the oracle property. Lastly, a score-type test statistic is proposed for checking the validity of partial linear models. Simulation studies demonstrate the performance of our proposed procedures, and a real example is analyzed to illustrate their practical usage.


Introduction
In many applications of regression analysis, the variables of interest may be observed with measurement errors. A partial linear additive distortion measurement errors model is written as
$Y = X^{T}\beta_0 + g(Z) + \varepsilon$, $\tilde{Y} = Y + \phi(U)$, $\tilde{X} = X + \psi(U)$, $\tilde{Z} = Z + \xi(U)$,   (1.1)
where $Y$ is an unobservable response variable, $X = (X_1, X_2, \ldots, X_p)^{T}$ is an unobservable continuous covariate vector (the superscript $T$ denotes the transpose of a vector or a matrix throughout this paper), $\beta_0$ is an unknown $p \times 1$ parameter vector lying in a compact parameter space $\Theta_{\beta} \subset R^{p}$, $Z$ is an unobservable univariate covariate, $g(\cdot)$ is an unknown smooth function, and $\varepsilon$ is an error term with finite variance satisfying $E(\varepsilon \mid X, Z) = 0$ and $E(\varepsilon^2 \mid X, Z) < \infty$. The distorted variables $\tilde{Y}$ and $(\tilde{X}, \tilde{Z})$ are the observed response variable and covariates. The confounding variable $U \in R^{1}$ is observable and independent of $(Y, X, Z)$. The additive distortion function $\psi(\cdot)$ is a $p \times 1$-dimensional vector $(\psi_1(\cdot), \ldots, \psi_p(\cdot))^{T}$. Moreover, we assume that $(\phi(\cdot), \psi_r(\cdot), \xi(\cdot))$, $r = 1, \ldots, p$, are unknown continuous distortion functions. Note that $\phi(\cdot)$, $\psi(\cdot)$ and $\xi(\cdot)$ distort the unobserved $Y$, $X$ and $Z$ in an additive fashion. More generally, this model is useful for describing the relationship between variables that are influenced by a confounding variable, and for checking whether this relationship persists once the effect of the confounding variable has been removed.
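To fix ideas, the following sketch simulates data from model (1.1). All concrete choices here (the distortion functions, $g$, the error scale and the sample size) are illustrative assumptions, not the paper's simulation design; they are chosen only so that the zero-mean identifiability conditions of Section 2 hold.

```python
# A minimal data-generating sketch for model (1.1); phi, psi and xi are
# hypothetical distortions with E[phi(U)] = E[psi_r(U)] = E[xi(U)] = 0.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
beta0 = np.array([2.0, -1.0, 0.0])       # true parameter, as in Example 1
U = rng.uniform(-1.0, 1.0, n)            # confounder, independent of (Y, X, Z)
X = rng.normal(size=(n, p))              # latent covariate vector
Z = rng.uniform(-1.0, 1.0, n)            # latent univariate covariate
g_Z = np.sin(np.pi * Z)                  # hypothetical smooth g(z)
eps = rng.normal(scale=0.5, size=n)
Y = X @ beta0 + g_Z + eps                # unobserved response

phi = U**2 - 1.0 / 3.0                   # E[U^2] = 1/3 for U ~ Unif(-1, 1)
psi = np.column_stack([U, U**3, np.sin(np.pi * U)])   # mean-zero columns
xi = U**3

Y_tilde = Y + phi                        # observed, distorted variables
X_tilde = X + psi
Z_tilde = Z + xi
```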
This additive structure of measurement errors on the response variable and the covariates is commonly seen in biomedical and health-related studies. In particular, in some medical experiments, obtaining the true values of $Y$ and $(X, Z)$ may be very expensive or even impossible; instead, the surrogates $\tilde{Y}$ and $(\tilde{X}, \tilde{Z})$ are available. This motivates a number of measurement error models. The additive distortion measurement error model, for example $\tilde{X} = X + \psi(U)$, differs from the classical errors-in-variables model $W = X + e$, where $W$ is error-prone, $X$ is error-free and $e$ is the measurement error. Although both kinds of measurement error models have a similar additive structure, they differ in terms of estimation and statistical inference. In the literature, most of the approaches proposed for the model $W = X + e$ depend on normality (or other distributional) assumptions for the unobserved covariates $X$ and the measurement error $e$. If distributional assumptions are not imposed, it is necessary to include some moment conditions. For example, the least squares estimator between the response variable and $W$ in a linear regression model suffers from attenuation bias (Fuller 1987), which is caused by the variance (covariance matrix) of $e$. In this case, a key issue is how to eliminate the attenuation bias to obtain an unbiased or consistent estimator, for instance by the attenuation reduction method (Fuller 1987) or the simulation extrapolation (SIMEX) method (Carroll et al. 2006; Li, Zhang, and Feng 2016; Yang, Tong, and Li 2019). In model (1.1), the response variable $Y$ and the covariates $(X, Z)$ are both contaminated by a common confounding variable $U$. Unlike the measurement error $e$, the additive distortion functions $\phi(u)$, $\psi(u)$ and $\xi(u)$ can be estimated by kernel smoothing methods from the observed variables $(\tilde{Y}, \tilde{X}, \tilde{Z}, U)$; after obtaining the estimates $\hat{\phi}(u)$, $\hat{\psi}(u)$ and $\hat{\xi}(u)$, one can use the calibrated variables $\hat{Y} = \tilde{Y} - \hat{\phi}(U)$, $\hat{X} = \tilde{X} - \hat{\psi}(U)$ and $\hat{Z} = \tilde{Z} - \hat{\xi}(U)$ to conduct standard statistical methods for estimation or statistical inference.

In this paper, we revisit partial linear models with additive distortion measurement errors. When the covariate $Z$ in the nonparametric part is observed without distortion, i.e., $\xi(u) \equiv 0$, estimation and hypothesis testing for partial linear models were studied in Zhang, Zhou, et al. (2017) and Zhang (2019). In this paper, the covariate $Z$ is unobservable and distorted, so the methods proposed in Zhang et al. (2016), Zhang, Zhou, et al. (2017) and Zhang (2019) fail to work, because the covariate $Z$ is distorted by the confounding variable. For the partial linear models considered in this paper, no existing work studies estimation and statistical inference when the covariate in the nonparametric part is unobserved and distorted. It is remarkable that the extension of the direct plug-in estimation procedures of Feng, Chen, and Zhang (2020) and Zhang, Lin, et al.
(2019) is by no means trivial. One of the difficulties lies in the profile least squares estimation of $\beta_0$, which requires estimating $E(Y\mid Z)$ and $E(X\mid Z)$. Because the nonparametric covariate $Z$ is unobserved and distorted, we can use the calibration estimation method proposed in Feng, Chen, and Zhang (2020) to obtain $\hat{Z}$, and then use a local linear estimation procedure to estimate $E(Y\mid Z)$ and $E(X\mid Z)$. However, the statistical theory for inference on the parameters (such as confidence intervals, hypothesis tests on the parameters, variable selection and model checking) when the calibrated variable $\hat{Z} = \tilde{Z} - \hat{\xi}(U)$ is used to estimate $E(Y\mid Z)$ and $E(X\mid Z)$ remains unclear. Recently, Zhang, Lin, and Feng (2020) studied the partial linear model and solved the estimation of $E(Y\mid Z)$ and $E(X\mid Z)$ under the multiplicative distortion $\tilde{Z}^{*} = \xi^{*}(U)Z$ rather than the additive distortion $\tilde{Z} = \xi(U) + Z$, which is very different from the content of this paper. We will study these problems for model (1.1) in a general setting.
In this paper, we first obtain the calibrated variables $\{\hat{Y}, \hat{X}, \hat{Z}\}$. With these calibrated variables, a profile least squares estimator is obtained. We derive the asymptotic normality of this estimator and estimate its asymptotic covariance matrix to construct asymptotic confidence intervals. We also construct an empirical likelihood based statistic to build confidence intervals for the parameters. Moreover, we present the asymptotic results for the local linear estimators of $E(X_r\mid Z = z)$ and $g(z)$; these theoretical results are new in the additive distortion literature. Secondly, we consider the problem of testing whether $\beta_0$ satisfies some linear constraints. A restricted estimator and a test statistic under the null hypothesis are proposed. Thirdly, to implement variable selection, we propose a penalized least squares method with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Peng 2004). Lastly, we propose a score-type statistic to develop a lack-of-fit test for checking the adequacy of the partial linear regression model with additive distortion measurement errors in all variables. The quadratic forms of the scaled test statistics are shown to be asymptotically chi-squared distributed under the null hypothesis, and to follow noncentral chi-squared distributions under local alternative hypotheses that converge to the null hypothesis. We conduct Monte Carlo simulation experiments to examine the performance of the proposed estimation and test procedures.
The rest of the paper is organized as follows. In Section 2, we introduce the calibration procedure and present asymptotic results for the estimators. In Section 3, we construct confidence intervals for the parameters. In Section 4, we consider the hypothesis testing problem under linear restrictions. In Section 5, a variable selection procedure is proposed. In Section 6, we develop a score-type statistic for checking the adequacy of partial linear regression models and give the theoretical properties of the test statistic. In Section 7, we conduct Monte Carlo simulation experiments to examine the performance of the estimators and test statistics. An analysis of the energy efficiency data is reported in Section 8. All technical proofs of the asymptotic results are given in the online supplementary materials.

Calibration procedure
In this subsection, we first calibrate the unobserved $(Y, X, Z)$ using the observed i.i.d. sample $\{\tilde{Y}_i, \tilde{X}_i, \tilde{Z}_i, U_i\}_{i=1}^{n}$. The identifiability conditions are
$E[\phi(U)] = 0$, $E[\psi_r(U)] = 0$, $r = 1, \ldots, p$, $E[\xi(U)] = 0$,   (2.1)
which were introduced by Şentürk and Müller (2005, 2006). These assumptions on $\phi(\cdot)$, the $\psi_r(\cdot)$'s and $\xi(\cdot)$ are implied by the natural assumption that the mean distorting effect should correspond to no distortion, i.e., $E(\tilde{Y}) = E(Y) + E[\phi(U)] = E(Y)$ and $E(\tilde{Z}) = E(Z) + E[\xi(U)] = E(Z)$. If we do not impose condition (2.1), then, for example, neither $E(Y)$ nor $E[\phi(U)]$ can be identified (or estimated), because both $E(Y)$ and $E[\phi(U)]$ are unknown. The identifiability condition (2.1) is analogous to the zero-mean assumption on the errors in the classical additive measurement error model $W = X + e$ with $E(e) = 0$; see, for example, Li, Zhang, and Feng (2016); Yang, Tong, and Li (2019).
Under the independence between $U$ and $(Y, X, Z)$, the identifiability conditions (2.1) imply that
$\phi(u) = E(\tilde{Y}\mid U = u) - E(\tilde{Y})$,   (2.2)
$\psi_r(u) = E(\tilde{X}_r\mid U = u) - E(\tilde{X}_r)$, $r = 1, \ldots, p$,   (2.3)
$\xi(u) = E(\tilde{Z}\mid U = u) - E(\tilde{Z})$.   (2.4)
Using equations (2.2)-(2.4), the local linear estimators $\hat{\phi}(u)$, $\hat{\psi}_r(u)$ and $\hat{\xi}(u)$ are defined by local linear regressions of $\tilde{Y}_i$, $\tilde{X}_{ri}$ and $\tilde{Z}_i$ on $U_i$, centered at the corresponding sample means. From the proofs of the theorems in the Appendix, the estimated distortion functions $\hat{\phi}(U_i)$, $\hat{\xi}(U_i)$ and $\hat{\psi}_r(U_i)$, $r = 1, \ldots, p$, $i = 1, \ldots, n$, have asymptotic biases of the order $O(h^2)$. Under condition (C5) in Subsection 2.3, we have $nh^4 \to 0$ as $n \to \infty$, and hence $\sqrt{n}\,h^2 \to 0$. Consequently, these biases can be controlled analytically in an asymptotic way, so that they have no impact on the estimator of $\beta_0$, which has the root-$n$ convergence rate. For nonparametric quantities such as $g(z)$, the asymptotic bias terms involved in $\hat{\phi}(U_i)$, $\hat{\xi}(U_i)$ and $\hat{\psi}_r(U_i)$ do have an impact on the asymptotic results of the nonparametric estimators, which converge at the root-$(nh)$ rate. The detailed theoretical results will be given in Subsection 2.3.
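A minimal sketch of the calibration step follows, continuing the simulated data above. The local linear smoother and the rule-of-thumb bandwidth are illustrative stand-ins for the estimators defined through (2.2)-(2.4); they are not the paper's exact formulas.

```python
# Estimate each distortion function by a local linear fit of the distorted
# variable on U, centered by the sample mean as in (2.2)-(2.4), and subtract.
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[y | x = t] at each t in x0 (Epanechnikov)."""
    out = np.empty(len(np.atleast_1d(x0)))
    for j, t in enumerate(np.atleast_1d(x0)):
        u = (x - t) / h
        k = np.maximum(0.75 * (1.0 - u**2), 0.0)        # kernel weights
        s1 = np.sum(k * (x - t)); s2 = np.sum(k * (x - t) ** 2)
        w = k * (s2 - (x - t) * s1)                     # local linear weights
        out[j] = np.sum(w * y) / np.sum(w)
    return out

def calibrate(v_tilde, U, h):
    """v_hat = v_tilde - estimated distortion evaluated at U."""
    d_hat = local_linear(U, U, v_tilde, h) - v_tilde.mean()
    return v_tilde - d_hat

h = 1.06 * U.std() * n ** (-1 / 5)                      # illustrative bandwidth
Y_hat = calibrate(Y_tilde, U, h)
Z_hat = calibrate(Z_tilde, U, h)
X_hat = np.column_stack([calibrate(X_tilde[:, r], U, h) for r in range(p)])
```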

Profile least squares estimation
In the following, we define $A^{\otimes 2} = AA^{T}$ for any matrix or vector $A$. For model (1.1), we have
$Y - E(Y\mid Z) = \{X - E(X\mid Z)\}^{T}\beta_0 + \varepsilon$.   (2.8)
Based on model (2.8), a profile least squares estimator of $\beta_0$ (at the population level) is obtained as
$\beta_0 = [E\{X - E(X\mid Z)\}^{\otimes 2}]^{-1} E[\{X - E(X\mid Z)\}\{Y - E(Y\mid Z)\}]$.   (2.9)
Recalling that $U$ is independent of $(Y, X, Z)$, the identifiability condition (2.1) entails that $E(\tilde{X}\mid Z) = E(X\mid Z)$ and $E(\tilde{Y}\mid Z) = E(Y\mid Z)$. Thus, equation (2.9) remains valid with $(Y, X)$ replaced by $(\tilde{Y}, \tilde{X})$. To obtain the estimator of $\beta_0$, we first use local linear estimators to estimate $S_Y(z) = E(\tilde{Y}\mid Z = z)$ and $s_{X_r}(z) = E(\tilde{X}_r\mid Z = z)$ from the partially calibrated data $\{\tilde{Y}_i, \tilde{X}_i, \hat{Z}_i\}_{i=1}^{n}$; denote the resulting estimators by $\hat{S}_Y(z)$ and $\hat{s}_{X_r}(z)$. The sample analogue of (2.9) then yields the profile least squares estimator
$\hat{\beta} = \left[\sum_{i=1}^{n}\{\tilde{X}_i - \hat{s}_X(\hat{Z}_i)\}^{\otimes 2}\right]^{-1} \sum_{i=1}^{n}\{\tilde{X}_i - \hat{s}_X(\hat{Z}_i)\}\{\tilde{Y}_i - \hat{S}_Y(\hat{Z}_i)\}$,
and the local linear estimator $\hat{g}(z)$ in (2.12) is obtained by smoothing the partial residuals $\tilde{Y}_i - \tilde{X}_i^{T}\hat{\beta}$ on $\hat{Z}_i$. In Theorems 1-3 below, we present the asymptotic results of $\hat{s}_{X_r}(z)$, $\hat{\beta}$ and $\hat{g}(z)$. It is noted that the residual-based estimation proposed in Zhang and Feng (2017) is not applicable here: writing the index parameter there as $\theta = (1, \theta_{0,(-1)}^{T})^{T}$, one needs to estimate $\theta_{0,(-1)}$ first; because $\theta = 1$ is a one-dimensional parameter and $\theta_{0,(-1)}$ is in fact an empty set, the leave-one-out component estimation method is not workable for the residual-based transformed model.
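The next sketch implements the sample version of (2.9), continuing the previous sketches: the conditional means are replaced by local linear estimates computed from the partially calibrated data $\{\tilde{Y}_i, \tilde{X}_i, \hat{Z}_i\}$. The bandwidth choice is again an illustrative assumption.

```python
# Profile least squares: center Y_tilde and X_tilde by their estimated
# conditional means given Z_hat, then run ordinary least squares.
import numpy as np

h1 = 1.06 * Z_hat.std() * n ** (-1 / 5)                  # illustrative bandwidth
S_Y = local_linear(Z_hat, Z_hat, Y_tilde, h1)            # hat S_Y(Z_hat_i)
s_X = np.column_stack(
    [local_linear(Z_hat, Z_hat, X_tilde[:, r], h1) for r in range(p)]
)
Y_c = Y_tilde - S_Y                                      # centered response
X_c = X_tilde - s_X                                      # centered covariates

Sigma_hat = X_c.T @ X_c / n
beta_hat = np.linalg.solve(X_c.T @ X_c, X_c.T @ Y_c)     # profile LS estimator
g_hat = local_linear(Z_hat, Z_hat, Y_tilde - X_tilde @ beta_hat, h1)
resid = Y_c - X_c @ beta_hat                             # residuals eps_hat_i
```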

Asymptotic results
We first list the assumptions needed in the following theorems.
(C1) The distortion functions $\phi(u)$, $\xi(u)$ and $\psi_r(u)$, $r = 1, \ldots, p$, have continuous third-order derivatives for all $u \in [U_L, U_R]$, where $[U_L, U_R]$ denotes the compact support of $U$. The density function $f_U(u)$ of the random variable $U$ is bounded away from 0 and satisfies a Lipschitz condition of order 1 on $[U_L, U_R]$.

(C2)-(C3) Moment and smoothing conditions: the density function $f_Z(z)$ of $Z$ is bounded away from zero; moreover, $f_Z(z)$, $E(X_s\mid Z = z)$, $E(Y\mid Z = z)$ and $g(z)$ have bounded continuous second-order derivatives.

(C4) The kernel function $K(\cdot)$ is a symmetric bounded density function supported on $[-A, A]$ and satisfies a Lipschitz condition; $K(\cdot)$ also has bounded continuous second-order derivatives, with $\mu_2 = \int t^2 K(t)\,dt < \infty$.

(C7) The weight function $w(\cdot, \cdot)$ satisfies $E[w^2(X, Z)] < \infty$ and $E[w_r^2(X, Z)] < \infty$, where $w_r(x, z) = \partial w(x, z)/\partial x_r$, $r = 1, \ldots, p$, and $w_{p+1}(x, z) = \partial w(x, z)/\partial z$.

These are mild conditions that are satisfied in most practical situations. Condition (C1) is used in the study of additive distortion measurement error models for estimating the unknown distortion functions; see, for example, Feng, Chen, and Zhang (2020); Zhang (2019); Zhang, Chen, et al. (2017); Zhang and Feng (2017); Zhang, Lin, et al. (2019); Zhang and Yang (2022). Conditions (C2)-(C3) contain moment conditions and smoothing conditions that are used in the calibration procedures and in the proofs of the theorems; see, for example, Härdle and Liang (2007); Zhang, Feng, and Xu (2015); Zhang et al. (2016); Zhang, Zhou, et al. (2017). Condition (C4) is the usual condition on the kernel function $K(\cdot)$; the Epanechnikov kernel satisfies it. Condition (C5) is the condition on the bandwidths $(h, h_1)$ in the nonparametric kernel smoothing; the under-smoothing condition $nh^4 \to 0$ ensures that the bias of the regression estimation is negligible, so that a root-$n$ consistent estimator of $\beta_0$ and the test statistics can be obtained; see Theorem 2 and Theorems 4-9 below. Condition (C6) is the technical condition involved in the SCAD variable selection procedure; see Fan and Peng (2004); Liang et al. (2010); Peng and Huang (2011). Condition (C7) is used in the score-type test statistics for the model checking problem (Feng, Chen, and Zhang 2020; Zhang, Zhu, et al. 2019).
Remark. If the distortion function $\xi(u)$ is a linear function such that $\xi(u) \in G_w$, where $G_w = \{w(u) : w(u) = au + b,\ E[w(U)] = 0,\ \text{for some constants } a, b\}$, then $\xi''(u) = 0$ and $\kappa_r(z) = 0$, so the asymptotic bias term $\kappa_r(z)$ vanishes. In this case, the asymptotic bias of $\hat{s}_{X_r}(z)$ in Theorem 1 is the same as that of the classical local linear estimator of $s_{X_r}(z)$ (Fan and Gijbels 1996) when the variables $X_r$ are observed exactly, without additive distortion. The extra term $\mathrm{Var}(\psi_r(U))$ in the asymptotic variance arises because we use the partially calibrated variables $\{\tilde{X}_{ri}, \hat{Z}_i\}_{i=1}^{n}$ to estimate $s_{X_r}(z) = E(\tilde{X}_r\mid Z = z)$, $r = 1, \ldots, p$; that is, the additive distortion $\psi_r(u)$ enlarges the asymptotic variance. In addition, if the $X_r$'s are observed exactly without additive distortions ($\psi_r(u) \equiv 0$) and $\xi(u) \in G_w$, the asymptotic result obtained in Theorem 1 is the same as that of the classical local linear estimator of $s_{X_r}(z)$ (Fan and Gijbels 1996).
Remark. It is worth noting that the asymptotic covariance matrix of the estimator $\hat{\beta}$ is the same as the asymptotic covariance matrix obtained in Härdle and Liang (2007) and Härdle, Liang, and Gao (2000), and also the same as that of the transformation-based estimator obtained in Zhang, Zhou, et al. (2017). In other words, the profile least squares estimation procedure eliminates the effect caused by the additive distortion functions $\phi(u)$, the $\psi_r(u)$'s and $\xi(u)$; i.e., the effect of the additive distortion measurement errors vanishes. If we further assume that $\varepsilon$ satisfies $E[\varepsilon^2\mid X, Z] = \sigma^2$, then $\Sigma_* = \sigma^2\Sigma$, where $\Sigma = E[\{X - E(X\mid Z)\}^{\otimes 2}]$ and $\Sigma_* = E[\varepsilon^2\{X - E(X\mid Z)\}^{\otimes 2}]$, and the asymptotic covariance matrix in Theorem 2 becomes $\sigma^2\Sigma^{-1}$.

Remark. If the distortion function $\xi(u)$ is a linear function such that $\xi(u) \in G_w$, we have $\xi''(u) = 0$, and then the asymptotic bias term $\rho(z)$ vanishes. In this case, the asymptotic bias of $\hat{g}(z)$ in Theorem 3 is the same as that of the classical local linear estimator of $g(z)$ in Härdle and Liang (2007) and Härdle, Liang, and Gao (2000) when the variables $\{Y, X, Z\}$ are observed exactly, without additive distortions. The extra term $\mathrm{Var}(\phi(U) - \psi^{T}(U)\beta_0)$ in the asymptotic variance arises because we use the partially calibrated variables $\{\tilde{Y}_i - \tilde{X}_i^{T}\hat{\beta}, \hat{Z}_i\}_{i=1}^{n}$. According to Theorem 2, the estimator $\hat{\beta}$ is root-$n$ convergent, faster than the nonparametric convergence rate root-$(nh_1)$; thus, the asymptotic result of $\hat{g}(z)$ is equivalently obtained from the "ideal" partially calibrated variables $\{\tilde{Y}_i - \tilde{X}_i^{T}\beta_0, \hat{Z}_i\}_{i=1}^{n}$. Based on this observation, the bias term $\rho(z)$ and the additional variance term $\mathrm{Var}(\phi(U) - \psi^{T}(U)\beta_0)$ in Theorem 3 are analogous to the bias term $\kappa_r(z)$ and the additional variance term $\mathrm{Var}(\psi_r(U))$ in Theorem 1. When the additive distortions satisfy $\xi(u) \in G_w$ and $P(\phi(U) - \psi^{T}(U)\beta_0 = 0) = 1$, the asymptotic result obtained in Theorem 3 is the same as that of the classical local linear estimator of $g(z)$ in Fan and Gijbels (1996), Härdle and Liang (2007) and Härdle, Liang, and Gao (2000).

Asymptotic normal approximation
According to Theorem 2, a $(1 - \alpha) \times 100\%$ $(0 < \alpha < 1)$ confidence interval for $\beta_{0r}$ can be obtained with the estimated asymptotic variances. Let $e_r$ be the $p$-dimensional vector with 1 in the $r$-th position and 0 elsewhere, $r = 1, \ldots, p$, and define
$\hat{\sigma}_r^2 = e_r^{T}\hat{\Sigma}^{-1}\hat{\Sigma}_*\hat{\Sigma}^{-1}e_r$,   (3.1)
where $\hat{\Sigma}$ and $\hat{\Sigma}_*$ are the plug-in estimators of $\Sigma$ and $\Sigma_*$ based on the calibrated data and the residuals $\hat{\varepsilon}_i = \tilde{Y}_i - \hat{S}_Y(\hat{Z}_i) - \{\tilde{X}_i - \hat{s}_X(\hat{Z}_i)\}^{T}\hat{\beta}$. The normal approximation confidence interval is
$[\hat{\beta}_r - z_{\alpha/2}\,\hat{\sigma}_r/\sqrt{n},\ \hat{\beta}_r + z_{\alpha/2}\,\hat{\sigma}_r/\sqrt{n}]$,
where $\hat{\beta}_r$ is the $r$-th component of $\hat{\beta}$, and $z_{\alpha/2}$ is the quantile satisfying $P(N(0, 1) \ge z_{\alpha/2}) = \alpha/2$.
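A sketch of the normal-approximation intervals follows, continuing the previous sketches; the sandwich form below is the plug-in version of (3.1) under the assumption that the asymptotic covariance is $\Sigma^{-1}\Sigma_*\Sigma^{-1}$.

```python
# Sandwich covariance estimate and componentwise 95% confidence intervals.
import numpy as np
from scipy.stats import norm

Sigma_star = (X_c * resid[:, None]).T @ (X_c * resid[:, None]) / n
Avar = np.linalg.solve(Sigma_hat, np.linalg.solve(Sigma_hat, Sigma_star).T)
se = np.sqrt(np.diag(Avar) / n)                  # hat sigma_r / sqrt(n)
z = norm.ppf(0.975)                              # z_{alpha/2} for alpha = 0.05
ci = np.column_stack([beta_hat - z * se, beta_hat + z * se])
```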

Empirical likelihood method
Another popular method for constructing confidence intervals without estimating the asymptotic variance (or asymptotic covariance matrix) is the empirical likelihood (EL) method proposed by Owen (1991). EL uses likelihood-type inference, and it has certain inherent advantages over resampling methods: it determines the shape of the confidence regions according to the data structure, it can combine data from multiple sources, and it facilitates the incorporation of side information. Briefly, it is a useful tool for statistical inference when it is not easy to assign a distribution to the data. The EL method is now attracting serious attention from researchers in econometrics and biostatistics, as well as from statisticians; see, for example, Cui et al. (2009); Kiwitt and Neumeyer (2012); Lian (2012); Liang et al. (2009); Yang, Li, and Tong (2015); Zhu et al. (2010). In the following, we construct confidence intervals based on the EL principle. In general, the EL method needs an auxiliary random vector. Recalling that model (2.8) is a linear regression model, at the population level the auxiliary random vector can be constructed as
$\eta(\beta) = \{X - E(X\mid Z)\}\left[Y - E(Y\mid Z) - \{X - E(X\mid Z)\}^{T}\beta\right]$,
which satisfies $E[\eta(\beta_0)] = 0$; its sample analogue $\hat{\eta}_i(\beta)$ replaces the conditional means by the local linear estimators based on the calibrated data. Using the Lagrange multiplier method, we have
$\hat{l}_n(\beta) = 2\sum_{i=1}^{n}\log\{1 + \lambda^{T}\hat{\eta}_i(\beta)\}$, where $\lambda = \lambda(\beta)$ solves $\sum_{i=1}^{n}\hat{\eta}_i(\beta)/\{1 + \lambda^{T}\hat{\eta}_i(\beta)\} = 0$.

Theorem 4. Suppose the conditions in Theorem 2 hold. Then $\hat{l}_n(\beta_0)$ converges in distribution to $\chi^2_p$, a centered chi-squared distribution with $p$ degrees of freedom.
Using Theorem 4, we can construct a confidence region for $\beta_0$ as $I_{\alpha} = \{\beta_0 : \hat{l}_n(\beta_0) \le c_{\alpha}\}$, where $c_{\alpha}$ denotes the upper $\alpha$-quantile of the $\chi^2_p$ distribution. If we focus on a confidence interval for a single parameter $\beta_{0r}$, we can construct a componentwise EL statistic $\hat{l}_n^{(r)}(\beta_{0r})$ in which the remaining components are replaced by their profile least squares estimates $\hat{\beta}_t$, $t \ne r$, where $\hat{\beta}_t$ is the $t$-th component of $\hat{\beta}$. Similar to the proof of Theorem 4, a confidence interval for $\beta_{0r}$ is $I_{\alpha}^{(r)} = \{\beta_{0r} : \hat{l}_n^{(r)}(\beta_{0r}) \le c^{*}_{\alpha}\}$, where $c^{*}_{\alpha}$ denotes the upper $\alpha$-quantile of the $\chi^2_1$ distribution.
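The following sketch evaluates the empirical log-likelihood ratio at a hypothesized parameter value, continuing the previous sketches. The damped Newton search for the Lagrange multiplier is a common implementation device, not the paper's prescription.

```python
# Empirical log-likelihood ratio l_n(beta) for the linear model (2.8),
# with auxiliary vectors eta_i(beta) = X_c_i * (Y_c_i - X_c_i^T beta).
import numpy as np

def el_ratio(beta, X_c, Y_c, n_iter=50):
    G = X_c * (Y_c - X_c @ beta)[:, None]        # auxiliary vectors eta_i(beta)
    lam = np.zeros(G.shape[1])
    for _ in range(n_iter):
        denom = 1.0 + G @ lam
        grad = (G / denom[:, None]).sum(axis=0)  # sum_i eta_i / (1 + lam' eta_i)
        hess = -(G / denom[:, None] ** 2).T @ G
        step = np.linalg.solve(hess, grad)
        while np.any(1.0 + G @ (lam - step) <= 1e-10):
            step *= 0.5                          # keep implied weights positive
        lam = lam - step
    return 2.0 * np.sum(np.log1p(G @ lam))

stat = el_ratio(beta0, X_c, Y_c)   # approximately chi^2_p at the true beta0
```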

Hypothesis testing
In the previous subsection, we considered the estimation of $\beta_0$. In many applications, in addition to the sample information, we may have prior information about the parameters that can be used to improve the estimation. That is, in addition to the model given in (1.1), suppose that non-sample information exists in the form of the possible constraints
$A\beta_0 = b$,   (4.1)
where $A$ is a known $k \times p$ full-row-rank matrix with $\mathrm{rank}(A) = k \le p$, and $b$ is a $k$-vector of known constants. The full-row-rank assumption is chosen for convenience and can be justified by the fact that every consistent linear system can be transformed into an equivalent system whose coefficient matrix has full row rank. The restricted model (4.1) is widely applicable to general hypothesis testing problems in regression models; in particular, such hypothesis tests are usually used to check special structures of the parameter $\beta_0$ or the influence of components of $X$.
If the null hypothesis $H_0: A\beta_0 = b$ is true, the constraint $A\beta_0 = b$ can be used in estimating $\beta_0$. Recalling that model (2.8) is a linear regression model, a restricted least squares estimation procedure based on the Lagrange multiplier technique is proposed:
$Q_n(\beta, \lambda) = \sum_{i=1}^{n}\left[\tilde{Y}_i - \hat{S}_Y(\hat{Z}_i) - \{\tilde{X}_i - \hat{s}_X(\hat{Z}_i)\}^{T}\beta\right]^2 + 2\lambda^{T}(A\beta - b)$,
where $\lambda$ is a $k \times 1$ vector of Lagrange multipliers. Differentiating $Q_n(\beta, \lambda)$ with respect to $\beta$ and $\lambda$ and setting the derivatives to zero yields the first-order conditions; solving them for $\lambda$ and substituting the solution back, the restricted profile least squares estimator of $\beta_0$ is
$\hat{\beta}_R = \hat{\beta} - \hat{\Sigma}^{-1}A^{T}(A\hat{\Sigma}^{-1}A^{T})^{-1}(A\hat{\beta} - b)$,
where $\hat{\Sigma} = n^{-1}\sum_{i=1}^{n}\{\tilde{X}_i - \hat{s}_X(\hat{Z}_i)\}^{\otimes 2}$. The asymptotic normality of $\hat{\beta}_R$ is presented in Theorem 5; its asymptotic covariance matrix involves a matrix $\Omega_A$.

Remark. From the definition of $\Omega_A$, it is seen that $A\Omega_A = 0$. Consequently, the asymptotic variance of $A\hat{\beta}_R - A\beta_0$ under the null hypothesis $H_0$ is a zero matrix, which is because the linear constraint $A\hat{\beta}_R = b$ holds exactly when estimating $\beta_0$ under the restriction.

Intuitively, if the null hypothesis $H_0$ is false, i.e., $A\beta_0 \ne b$, the value of $A\hat{\beta} - b$ should be significantly large. We therefore use a weighted quadratic form of $A\hat{\beta} - b$ as the test statistic:
$T_n = n(A\hat{\beta} - b)^{T}(A\hat{\Sigma}^{-1}\hat{\Sigma}_*\hat{\Sigma}^{-1}A^{T})^{-1}(A\hat{\beta} - b)$.

Theorem 6. Suppose the conditions in Theorem 2 hold. Under the null hypothesis $H_0$, $T_n$ converges in distribution to $\chi^2_k$, a centered chi-squared distribution with $k$ degrees of freedom.

Next, we consider the local alternative hypothesis
$H_{1n}: A\beta_0 = b + n^{-1/2}c$
for a fixed nonzero vector $c$. The asymptotic properties of $\hat{\beta}_R$ and $T_n$ under the local alternative hypothesis $H_{1n}$ are given in the following theorem.
Theorem 7. Suppose the conditions in Theorem 2 hold. Under the local alternative hypothesis $H_{1n}$, $T_n$ converges in distribution to $\chi^2_k(\pi_c)$, the noncentral chi-squared distribution with $k$ degrees of freedom, where $\pi_c$ is the noncentrality parameter determined by $c$.
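A sketch of the restricted estimator and the statistic $T_n$ follows, continuing the previous sketches; the closed form for $\hat{\beta}_R$ matches the Lagrange-multiplier derivation above, and the constraint matrix is an illustrative choice testing whether the third coefficient is zero.

```python
# Restricted profile least squares estimator and the Wald-type statistic T_n.
import numpy as np
from scipy.stats import chi2

A = np.array([[0.0, 0.0, 1.0]])                  # k x p constraint matrix
b = np.array([0.0])                              # H_0: A beta_0 = b

Si = np.linalg.inv(Sigma_hat)
beta_R = beta_hat - Si @ A.T @ np.linalg.solve(A @ Si @ A.T, A @ beta_hat - b)

W = A @ Avar @ A.T                               # Avar from the CI sketch
diff = A @ beta_hat - b
T_n = float(n * diff @ np.linalg.solve(W, diff)) # weighted quadratic form
p_value = chi2.sf(T_n, df=A.shape[0])            # chi^2_k calibration under H_0
```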

Variable selection
Variable selection is very important in the process of model building. In real data analysis, selecting the relevant variables in a regression model with a large number of covariates matters because keeping only the relevant variables improves the quality of estimation, prediction and interpretation; thus, variable selection is of fundamental interest in statistical modeling. Model (2.8) is linear in $\beta_0$, and it is of interest to select covariates when only a few of them actually influence the response variable. Penalty-based variable selection is more appealing than criteria such as stepwise regression or best subset selection, because the latter procedures suffer from several drawbacks, such as a lack of stability and a failure to incorporate the stochastic errors inherited from the variable selection stage. Various penalized variable selection methods have been developed for statistical models, for example the least absolute shrinkage and selection operator (LASSO; Tibshirani 1996) and the smoothly clipped absolute deviation (SCAD; Fan and Peng 2004). These methods select important variables and estimate the regression coefficients of the covariates simultaneously. Such variable selection methods have also been applied to other models, such as frailty models and partial linear regression models (Lian, Liang, and Wang 2014; Wang et al. 2014). In this section, we use the SCAD penalty function (Fan and Peng 2004) to obtain penalized least squares estimators. The SCAD penalty function $p_{\zeta}(\cdot)$ satisfies $p_{\zeta}(0) = 0$ and $p'_{\zeta}(0+) > 0$, and its first-order derivative is
$p'_{\zeta}(t) = \zeta\left\{I(t \le \zeta) + \frac{(a\zeta - t)_{+}}{(a - 1)\zeta}I(t > \zeta)\right\}$, $t > 0$,
where $a$ is a positive constant with $a > 2$ and $\zeta$ is a tuning parameter. Fan and Peng (2004) suggested using $a = 3.7$ from a Bayesian point of view, and this value is used throughout this paper. So far, very little attention has been paid to variable selection in additive distortion measurement error models. Zhang and Feng (2017) considered using SCAD penalty functions for simultaneous variable selection and parameter estimation in the linear part of partial linear single-index regression models, with no additive distortion effects on the covariates in the single-index structure. No existing work discusses the variable selection problem for the partial linear model considered in this paper, especially when the covariate in the nonparametric component is also additively distorted.
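The SCAD derivative just displayed can be coded directly; this small helper is used in the tuning sketch below.

```python
# First-order derivative of the SCAD penalty with a = 3.7 (Fan and Peng 2004).
import numpy as np

def scad_deriv(t, zeta, a=3.7):
    """p'_zeta(t): zeta on (0, zeta], then linear decay, zero beyond a*zeta."""
    t = np.abs(np.asarray(t, dtype=float))
    return zeta * ((t <= zeta).astype(float)
                   + np.maximum(a * zeta - t, 0.0) / ((a - 1.0) * zeta)
                   * (t > zeta))
```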
We propose the following SCAD penalized estimator:
$\hat{\beta}_P = \arg\min_{\beta}\left[\sum_{i=1}^{n}\left\{\tilde{Y}_i - \hat{S}_Y(\hat{Z}_i) - (\tilde{X}_i - \hat{s}_X(\hat{Z}_i))^{T}\beta\right\}^2 + n\sum_{j=1}^{p}p_{\zeta_j}(|\beta_j|)\right]$,
where $p_{\zeta}(\cdot)$ is the SCAD penalty function with tuning parameter $\zeta$. We study the sampling properties of the resulting penalized least squares estimator. Without loss of generality, assume that $\beta_0 = (\beta_{01}^{T}, \beta_{02}^{T})^{T}$, where $\beta_{01}$ collects the $p_0$ nonzero components of $\beta_0$ and $\beta_{02}$ is a $(p - p_0) \times 1$ vector of zeros; accordingly, let $X_1$ consist of the first $p_0$ components of $X$.

Theorem 8. Under conditions (C1)-(C6), the estimator $\hat{\beta}_P = (\hat{\beta}_{P,1}^{T}, \hat{\beta}_{P,2}^{T})^{T}$ satisfies: (a) (consistency) with probability tending to one, $\hat{\beta}_{P,2} = 0$; (b) (asymptotic normality) $\sqrt{n}(\hat{\beta}_{P,1} - \beta_{01})$ is asymptotically normal after a bias correction induced by the penalty terms $\{p''_{\zeta_1}(|\beta_{01}|), \ldots, p''_{\zeta_{p_0}}(|\beta_{0p_0}|)\}$.

Remark. The SCAD penalized estimator possesses the oracle property: it performs as well as the oracle procedure in terms of selecting the non-zero components of $\beta_0$, i.e., $\hat{\beta}_{P,2} = 0$ holds with probability tending to one, while keeping the $\sqrt{n}$-normality up to an extra bias term induced by the penalty. Under the condition that this penalty-induced bias is $o(n^{-1/2})$, the asymptotic result of Theorem 8(b) is the same as that of Theorem 2, as if we had known the non-zero components of $\beta_0$ beforehand. Theorem 8 thus indicates that the proposed variable selection procedure possesses the oracle property with proper choices of the tuning parameter $\zeta$; we discuss the selection of the tuning parameter in the following.
For the selection of the tuning parameters $\zeta_j$, we adopt the BIC selector suggested by Liang et al. (2010). Minimizing BIC over the $p$-dimensional grid of regularization parameters $\{\zeta_1, \ldots, \zeta_p\}$ is difficult because the computational cost is very high, so we follow Liang et al. (2010) and reduce the $p$-dimensional problem to a one-dimensional one: let $\zeta_j = \zeta_0\hat{\sigma}_j$, where the $\hat{\sigma}_j$'s are defined in (3.1). The BIC score $\mathrm{BIC}(\zeta_0)$ is defined in (5.2), where $\hat{\beta}_{P,\zeta}$ is the resulting penalized estimator of $\beta_0$ with tuning parameter $\zeta = (\zeta_1, \ldots, \zeta_p)^{T}$, $\zeta_j = \zeta_0\hat{\sigma}_j$. Thus, the minimization over the $\zeta_j$'s is reduced to a one-dimensional minimization over $\zeta_0$, whose minimizer can be obtained by a grid search. In our simulations, the range of $\zeta_0$ is selected to be wide enough that the $\mathrm{BIC}(\zeta_0)$ score reaches its minimum approximately in the middle of the range. In detail, we set 50 grid points $\zeta_{0,1} < \zeta_{0,2} < \cdots < \zeta_{0,50}$, calculate $\mathrm{BIC}(\zeta_{0,s})$, $s = 1, \ldots, 50$, according to (5.2), and obtain $\zeta_{0,be} = \arg\min_s \mathrm{BIC}(\zeta_{0,s})$. The final tuning parameters are then $\zeta_j = \zeta_{0,be}\,\mathrm{se}(\hat{\beta}_j)$, $j = 1, \ldots, p$. The grid number 50 is based on our simulation experience; in practice, the range $(\zeta_{0,1}, \zeta_{0,k})$ should be selected wide enough that the minimizer of $\{\mathrm{BIC}(\zeta_{0,s})\}_{s=1}^{k}$ lies approximately at the center of the range, with $k$ grid points set over the range of $\zeta_0$.
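A sketch of the penalized fit and the one-dimensional BIC search follows, continuing the previous sketches. The local quadratic approximation is one common way to handle the SCAD penalty, and the BIC formula log(RSS/n) + df·log(n)/n is an assumed stand-in for the display (5.2), which is not reproduced here.

```python
# SCAD-penalized profile least squares via local quadratic approximation,
# tuned by a one-dimensional BIC grid search over zeta_0.
import numpy as np

def scad_fit(X_c, Y_c, zetas, n_iter=30, floor=1e-8):
    n_, _ = X_c.shape
    beta = np.linalg.solve(X_c.T @ X_c, X_c.T @ Y_c)    # unpenalized start
    for _ in range(n_iter):
        ab = np.maximum(np.abs(beta), floor)
        d = scad_deriv(ab, zetas) / ab                   # LQA ridge weights
        beta_new = np.linalg.solve(X_c.T @ X_c / n_ + np.diag(d),
                                   X_c.T @ Y_c / n_)
        if np.max(np.abs(beta_new - beta)) < 1e-6:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < 1e-4] = 0.0                      # threshold tiny values
    return beta

def bic_score(beta, X_c, Y_c):
    n_ = len(Y_c)
    rss = np.sum((Y_c - X_c @ beta) ** 2)
    return np.log(rss / n_) + np.sum(beta != 0) * np.log(n_) / n_

sigma_j = se * np.sqrt(n)                    # hat sigma_j from (3.1), CI sketch
grid = np.linspace(0.01, 1.0, 50)            # 50 grid points for zeta_0
fits = [scad_fit(X_c, Y_c, z0 * sigma_j) for z0 in grid]
beta_P = fits[int(np.argmin([bic_score(bt, X_c, Y_c) for bt in fits]))]
```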

Model checking
In this section, we consider the model checking problem
$H^{*}_0: E(Y\mid X, Z) = X^{T}\beta_0 + g(Z)$ a.s. for some $g(\cdot)$ and $\beta_0$.
To check whether partial linear models (PLMs) provide a satisfactory fit to the data, there exist many methods, such as residual-marked empirical process methods, methods detecting the difference between nonparametric and parametric regressions, and score-type tests; see, for example, Cook and Weisberg (1982); Härdle and Mammen (1993); Stute and Zhu (2005); Xu and Guo (2013); Xu and Zhu (2015); Zhu and Cui (2005). Score-type tests have a long tradition in statistics, and they are usually optimal once the direction from which the alternative tends to the null model has been specified. We adopt a score-type test using $E(\varepsilon\mid X, Z) = 0$ under the null hypothesis $H^{*}_0$, with an appropriate weight function of $(X, Z)$, for the partial linear additive distortion measurement errors model (1.1). An ideal test statistic with a proper weight function $w(X, Z) \in R^{1}$ is defined as
$T^{*}_{n,\mathrm{ideal}} = n^{-1/2}\sum_{i=1}^{n}\varepsilon_i\,w(X_i, Z_i)$.   (6.1)
Under $H^{*}_0$, we have $E[\varepsilon\,w(X, Z)] = 0$; thus, the ideal statistic $T^{*}_{n,\mathrm{ideal}}$ asymptotically converges to a normal distribution with mean zero. However, the test statistic $T^{*}_{n,\mathrm{ideal}}$ cannot be used directly, because $\varepsilon_i$ is unavailable and $(X_i, Z_i)$ is unobservable and distorted. As a remedy, we use the calibrated variables and residuals to define an executable test statistic. Recalling the definition of $\hat{\varepsilon}_i$ in Subsection 3.1, the proposed test statistic based on (6.1) is defined as
$T^{*}_n = n^{-1/2}\sum_{i=1}^{n}\hat{\varepsilon}_i\,w(\hat{X}_i, \hat{Z}_i)$.   (6.2)
Before stating the theorem, we define $\mathrm{Avar}(T^{*}_n)$ as the asymptotic variance of $T^{*}_n$.

Theorem 9. Suppose conditions (C1)-(C5) and condition (C7) hold.
a. Under the null hypothesis $H^{*}_0$, $T^{*}_n$ converges in distribution to $N(0, \mathrm{Avar}(T^{*}_n))$.
b. Under the local alternative hypothesis
$H^{*}_{1n}: E(Y\mid X, Z) = X^{T}\beta_0 + g(Z) + n^{-1/2}m(X, Z)$
for some nonzero function $m(X, Z)$, $T^{*}_n$ is asymptotically normal with the mean shift
$J^{*}_n = E[m(X, Z)w(X, Z)] - E[w(X, Z)X^{T}]\,\Sigma^{-1}E[\{X - E(X\mid Z)\}m(X, Z)]$,
and with the same asymptotic variance as under $H^{*}_0$.

Remark. If the weight function $w(x, z)$ satisfies $E[w(X, Z)\mid Z] = 0$, then the asymptotic expression of $T^{*}_n$ reduces to the form (6.3), which is also a score test statistic in which $\check{\beta}$ and $\check{g}(\cdot)$ are the profile least squares estimator and the local linear estimator proposed in Liang, Härdle, and Carroll (1999) and Härdle, Liang, and Gao (2000). Moreover, there are no distortion effects in the asymptotic expression (6.3): such a weight function with $E[w(X, Z)\mid Z] = 0$ eliminates $\phi(U)$, the $\psi_r(U)$'s and $\xi(U)$ in an asymptotic way.
Under the local alternative hypothesis $H^{*}_{1n}$, if $m(X, Z) = \ell(Z)$ depends on $Z$ only, then $E[\{X - E(X\mid Z)\}\ell(Z)] = 0$ and the bias term of $\hat{\beta}$ vanishes, i.e., the asymptotic result for $\hat{\beta}$ is the same as that in Theorem 2. For $T^{*}_n$, the asymptotic shift then reduces to $E[\ell(Z)w(X, Z)]$; moreover, if we choose the nonzero "ideal" weight function $w(X, Z) = \ell(Z) \not\equiv 0$, the asymptotic shift becomes $E[\ell^2(Z)]$, which is a nonzero constant. On the other hand, the statistic $T^{*}_n$ fails to detect both the null hypothesis $H^{*}_0$ and the local alternative hypothesis $H^{*}_{1n}$ when $w(X, Z) = m(X, Z) = \beta_0^{T}\{X - E(X\mid Z)\}$, because then $E[w(X, Z)\mid Z] = 0$ and also $J^{*}_n \equiv 0$. To avoid this, such weight functions should not be chosen in practical use.
To conduct the test, the asymptotic variance $\mathrm{Avar}(T^{*}_n)$ should be estimated. We define the estimator $\widehat{\mathrm{Avar}}(T^{*}_n)$ by plugging in the calibrated data, the residuals $\hat{\varepsilon}_i$, the estimator $\hat{g}(z)$ obtained in (2.12), and $\hat{w}_X(z)$, the local linear estimator of $E[w(X, Z)\mid Z = z]$ based on the data set $\{w(\hat{X}_i, \hat{Z}_i), \hat{Z}_i\}_{i=1}^{n}$. Finally, we propose the standardized test statistic with the quadratic form
$T^{*2}_{n,Sc} = (T^{*}_n)^2 / \widehat{\mathrm{Avar}}(T^{*}_n)$.
If $H^{*}_0$ is true, the value of $T^{*2}_{n,Sc}$ is small, and $T^{*2}_{n,Sc}$ asymptotically follows a chi-squared distribution with one degree of freedom. If $H^{*}_0$ is not true, the value of $T^{*2}_{n,Sc}$ is large; under the local alternatives $H^{*}_{1n}$, $T^{*2}_{n,Sc}$ asymptotically follows a noncentral chi-squared distribution whenever $E[m(X, Z)w(X, Z)] - E[w(X, Z)X^{T}]\,\Sigma^{-1}E[\{X - E(X\mid Z)\}m(X, Z)] \ne 0$, i.e., whenever $J^{*}_n \ne 0$.
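A sketch of the standardized score-type statistic follows, continuing the previous sketches, with the weight function $w_3(x, z) = \cos(\pi z)$ used in the real data analysis. The variance estimate below is a crude plug-in that centers the weight by a local linear estimate of $E[w(X, Z)\mid Z]$; the paper's $\widehat{\mathrm{Avar}}(T^{*}_n)$ contains additional correction terms.

```python
# Standardized score-type statistic T*^2_{n,Sc} with a chi^2_1 calibration.
import numpy as np
from scipy.stats import chi2

def score_test(resid, Z_hat, w_vals, h1):
    n_ = len(resid)
    T_star = resid @ w_vals / np.sqrt(n_)          # sample version of (6.1)
    w_Z = local_linear(Z_hat, Z_hat, w_vals, h1)   # hat E[w(X, Z) | Z]
    avar = np.mean(resid ** 2 * (w_vals - w_Z) ** 2)   # crude plug-in variance
    T2 = T_star ** 2 / avar
    return T2, chi2.sf(T2, df=1)

w_vals = np.cos(np.pi * Z_hat)                     # weight w_3 from Section 8
T2, pval = score_test(resid, Z_hat, w_vals, h1)
```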

Implementation
Simulation studies are conducted in this section to show the performance of the proposed methods. The Epanechnikov kernel $K(t) = 0.75(1 - t^2)_{+}$ is used. For the parameter estimation, the bandwidths $h$ and $h_1$ should be chosen according to condition (C5); the optimal bandwidths for $h$ and $h_1$ cannot be used because under-smoothing ($nh^4 \to 0$) is necessary. The consequence of under-smoothing is that the biases of the nonparametric estimates are kept small, which precludes the optimal bandwidths. The asymptotic covariance matrices in the theorems for $\hat{\beta}$ and the test statistics depend on neither the bandwidths $(h, h_1)$ nor the kernel function $K(t)$. Hence, for the bandwidth $h$ involved in $\hat{\phi}(u)$, the $\hat{\psi}_r(u)$'s and $\hat{\xi}(u)$ (and thus in obtaining $\hat{Z}$), we can use a rule of thumb based on the sample standard deviation of $\{U_i\}_{i=1}^{n}$; for the bandwidth $h_1$ used in estimating $\hat{g}(z)$, a standard plug-in bandwidth (Wand and Jones 1995) for local linear estimators is computed from the calibrated data. The rule-of-thumb and plug-in bandwidth methods mentioned above are fairly effective and easy to implement in practice, and our numerical results were stable when we shifted several values around these data-driven bandwidths.

Example 1. (1.1) Estimation of $\beta_0$. In Table 1, we report the true estimator $\hat{\beta}_T$ (the profile least squares estimator using the simulated data $\{Y_i, X_i, Z_i\}_{i=1}^{n}$), the proposed estimator $\hat{\beta}$, and the naive estimator $\hat{\beta}_N$ (the profile least squares estimator using $\{\tilde{Y}_i, \tilde{X}_i, \tilde{Z}_i\}_{i=1}^{n}$ without calibration). Comparing the true estimator $\hat{\beta}_T$ with the proposed estimator $\hat{\beta}$, it is not surprising that the true estimator performs better; Theorem 2 shows that $\hat{\beta}$ is asymptotically as efficient as $\hat{\beta}_T$. In Table 1, the MSE values of $\hat{\beta}_T$ are all slightly smaller than those of $\hat{\beta}$; for $\hat{\beta}$, all mean values are close to the true value $(2, -1, 0)^{T}$, and the MSE values decrease as the sample size $n$ increases. It is also seen that the naive estimator $\hat{\beta}_N$ has large biases, especially for $\beta_{02}$ and $\beta_{03}$ in case 1, and for $\beta_{01}$ and $\beta_{02}$ in case 2.
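Continuing the previous sketches, the comparison reported in Table 1 can be mimicked as follows; the "true" and naive fits simply rerun the profile least squares routine on the latent and the uncalibrated data, respectively.

```python
# True, proposed, and naive profile least squares fits, as in Example 1.
import numpy as np

def profile_ls(Yv, Xv, Zv, h1):
    EY = local_linear(Zv, Zv, Yv, h1)
    EX = np.column_stack(
        [local_linear(Zv, Zv, Xv[:, r], h1) for r in range(Xv.shape[1])]
    )
    Xc, Yc = Xv - EX, Yv - EY
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)

beta_T = profile_ls(Y, X, Z, h1)                        # latent (true) data
beta_N = profile_ls(Y_tilde, X_tilde, Z_tilde, h1)      # no calibration
print("proposed:", beta_hat, "true-data:", beta_T, "naive:", beta_N)
```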
(1.2) Estimation of $g(z)$. The performance of the estimator $\hat{g}(z)$ is evaluated by the average squared error (ASE)
$\mathrm{ASE} = n_0^{-1}\sum_{k=1}^{n_0}\{\hat{g}(z_k) - g(z_k)\}^2$,
where $\{z_1, \ldots, z_{n_0}\}$ are grid points evenly distributed on $[-1, 1]$ and $n_0 = 400$. In Table 2, we report the means and standard errors of the ASE for the true estimator $\hat{g}_T(z)$ (the local linear estimator using the undistorted data $\{Y_i, X_i, Z_i\}_{i=1}^{n}$), the proposed estimator $\hat{g}(z)$, and two naive estimators $\hat{g}_{N1}(z)$ and $\hat{g}_{N2}(z)$ (local linear estimators based on the observed distorted data without full calibration). We compare $\hat{g}_{N1}(z)$, $\hat{g}_{N2}(z)$ and $\hat{g}(z)$ to see how the distortion function $\xi(U)$ affects the estimation of $g(z)$. In Table 2, we find that the mean ASE values of $\hat{g}(z)$ are close to those of $\hat{g}_T(z)$ as the sample size gets larger for both cases, while the mean ASE values of $\hat{g}_{N1}(z)$ and $\hat{g}_{N2}(z)$ are larger and decrease to zero much more slowly as the sample size increases, which indicates that biases always exist and result in inconsistent estimators. In Figure 1, we plot $\hat{g}_T(z)$, $\hat{g}(z)$, $\hat{g}_{N1}(z)$ and $\hat{g}_{N2}(z)$ from one simulation with sample size $n = 1000$ to visualize the differences among these estimators; $\hat{g}_{N1}(z)$ and $\hat{g}_{N2}(z)$ have much larger biases than $\hat{g}_T(z)$ and $\hat{g}(z)$. The figures and the simulation results in Table 2 again coincide with the results in Table 1: ignoring the distortion $\xi(U)$ eventually violates the estimation no matter how large the sample size is.
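The ASE criterion can be evaluated directly, continuing the previous sketches; the true $g$ here is the hypothetical $g(z) = \sin(\pi z)$ from the first sketch.

```python
# Average squared error of g_hat on an even grid over [-1, 1], n_0 = 400.
import numpy as np

z_grid = np.linspace(-1.0, 1.0, 400)
g_fit = local_linear(z_grid, Z_hat, Y_tilde - X_tilde @ beta_hat, h1)
ase = np.mean((g_fit - np.sin(np.pi * z_grid)) ** 2)
```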
(1.3) Confidence intervals. We report the 95% normal approximation (NA) confidence intervals and the empirical likelihood (EL) confidence intervals for $\beta_{0s}$, $s = 1, 2, 3$; the simulation results are reported in Table 3. When the sample size $n$ gets larger, the EL confidence intervals show satisfactory performance in terms of both the average lengths of the confidence intervals and the coverage probabilities, while the NA confidence intervals are wider and have larger coverage probabilities than the EL intervals. It is worth noting that the EL method does not need to estimate the asymptotic variances of the estimators, while the NA method does. Generally, both the NA asymptotic intervals and the EL intervals are recommended in practice when the sample size is large.

Table 3. Simulation results of confidence intervals. "NA" stands for the normal approximation and "EL" for the empirical likelihood; "Lower" and "Upper" stand for the lower and upper bounds, "AL" for the average length, and "CP" for the coverage probability.
(1.4) Restricted estimator. We consider the restricted estimator under two sets of constraints, $A_1\beta_0 = 0$ and $A_2\beta_0 = 1$. In Table 4, the MSE values of the restricted estimator for $\beta_{02}$ under $A_1$ are much smaller than those obtained in Table 1 for case 1, and the MSE values for $\beta_{01}$ and $\beta_{03}$ also improve. This indicates that the constraint $A_1$ improves the estimation efficiency for $\beta_{02}$ without losing much estimation efficiency for $\beta_{01}$ and $\beta_{03}$. Under $A_2$, the mean squared errors of the restricted estimator for $\beta_{0s}$, $s = 1, 2, 3$, are all smaller than those obtained in Table 1, which implies that $A_2$ definitely improves the estimation efficiency for $\beta_0$.

Example 2. In this example, we conduct 1000 simulations and use the test statistic $T_n$ for the hypothesis testing problem
$H_0: \beta_{03} = 0$ versus $H_1: \beta_{03} = c \ne 0$,
where $c = \pm 0.1, \pm 0.2, \pm 0.3$ under the alternative hypothesis $H_1$. We set $A = (0, 0, 1)$ for the test statistic $T_n$ in this example. The parameter $\beta_0$ is set to $\beta_0 = (2, -1, 0)^{T}$ under the null hypothesis $H_0$ and $\beta_0 = (2, -1, c)^{T}$ under the alternative hypothesis $H_1$. The sample size $n$ in this example is chosen to be $n = 300$, $n = 500$ and $n = 1000$. In Table 5, it can be seen that the rejection probabilities are all close to the nominal sizes $\alpha = 0.01$, $\alpha = 0.05$ and $\alpha = 0.10$ under the null hypothesis $H_0$ ($c = 0$ in Table 5). As the absolute value of $c$ increases, the power of the test statistic $T_n$ increases rapidly. We can also see that as the sample size $n$ increases, the power curves tend to one, which shows that the test statistic $T_n$ is powerful for this hypothesis testing problem.
Example 3. To measure the selection and estimation accuracy, we denote by $\omega_{u,\beta_0}$, $\omega_{c,\beta_0}$ and $\omega_{o,\beta_0}$ the proportions of underfitted, correctly fitted and overfitted models. For the overfitted models, the columns labeled "1", "2" and "$\ge 3$" give the proportions of models including 1, 2, and more than 2 insignificant covariates. Denote by $\mathrm{Mse}_{\beta_0}$ the mean squared error $\|\hat{\beta}_P - \beta_0\|_2^2$, where $\hat{\beta}_P$ is the final penalized estimator. In addition, "$C_{\beta_0}$" and "$IN_{\beta_0}$" are the average number of zero coefficients that were correctly set to zero and the average number of non-zero coefficients that were incorrectly set to zero, respectively.
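These summaries can be computed as below; the function is a sketch whose input fits is a hypothetical list of penalized estimates, one per simulation replication.

```python
# Selection-accuracy summaries: underfitted/correct/overfitted proportions,
# plus the average counts C (correct zeros) and IN (incorrect zeros).
import numpy as np

def selection_summary(fits, beta0):
    true_zero = (beta0 == 0)
    under = correct = over = C = IN = 0
    for bt in fits:
        zero = (bt == 0)
        C += np.sum(zero & true_zero)       # zeros correctly set to zero
        IN += np.sum(zero & ~true_zero)     # non-zeros wrongly set to zero
        if np.any(zero & ~true_zero):
            under += 1                      # a relevant covariate was dropped
        elif np.array_equal(zero, true_zero):
            correct += 1
        else:
            over += 1                       # irrelevant covariates retained
    m = len(fits)
    return dict(under=under / m, correct=correct / m, over=over / m,
                C=C / m, IN=IN / m)
```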
In Table 6, we report the true penalized estimator (using the true covariates $(Y, X, Z)$), the penalized estimator $\hat{\beta}_P$, and the naive penalized estimator (directly using the observed data $(\tilde{Y}, \tilde{X}, \tilde{Z})$). It can be found that the values of "$C_{\beta_0}$" for the true penalized estimator and $\hat{\beta}_P$ are close to the true values 17 ($p = 20$), 27 ($p = 30$) and 37 ($p = 40$), and the values of "$IN_{\beta_0}$" are close to 0 when the sample size $n \ge 500$ and are at most 0.05. For the true penalized estimator and $\hat{\beta}_P$, the proportion of correctly fitted models (column $\omega_{c,\beta_0}$) is above 90% when the sample size $n \ge 300$. The proportions of underfitted models (column $\omega_{u,\beta_0}$) and overfitted models (columns under $\omega_{o,\beta_0}$) for the true penalized estimator and $\hat{\beta}_P$ are about 6% and 5% when $n = 500$ and $n = 1000$. In the overfitted case, the proportion of models including 1 insignificant covariate dominates those including 2 or more insignificant covariates; the latter is nearly 0% in most situations. This indicates that the true penalized estimator and $\hat{\beta}_P$ most likely select models that are very close to the true model. Moreover, the mean squared error $\mathrm{Mse}_{\beta_0}$ of $\hat{\beta}_P$ is much smaller than that of the naive penalized estimator. The naive penalized estimator seldom selects the correct model in most situations, even when the sample size is $n = 1000$: the underfitted and overfitted proportions remain large as the sample size increases to 1000, which indicates that the naive penalized estimator eventually results in a wrong model. It also ruins the oracle property of the SCAD penalty, and its larger values of $\mathrm{Mse}_{\beta_0}$ reflect biases that do not vanish as the sample size increases.

Example 4. In this example, we consider the model checking problem. We generate 1000 experiments with sample size $n = 300$, $n = 500$ and $n = 1000$ under local deviations from model (1.1) indexed by a constant $c$. The power functions are calculated at the nominal sizes $\alpha = 0.01$, $\alpha = 0.05$ and $\alpha = 0.10$. The rejection probabilities are reported in Table 7. All empirical levels are close to 0.01, 0.05 and 0.10 when $c = 0$, which indicates that our proposed model checking method provides proper rejection probabilities. As $c$ increases, the power functions increase rapidly to one when the sample size is $n = 500$ or $n = 1000$.

Real data analysis
We apply our method to the energy efficiency data as an illustration. The data set is available at http://archive.ics.uci.edu/ml/datasets/Energy+efficiency. It contains 768 samples and concerns the energy analysis of different building shapes simulated in Ecotect. The buildings differ with respect to the surface area ($X_1$), wall area ($X_2$), relative compactness ($X_3$), roof area ($Z$) and cooling load ($Y$). The confounding variable for this data set is the heating load ($U$). It is of interest to investigate the association between the cooling load $Y$, the area features $X_s$ and the roof area $Z$ through the partial linear regression model (1.1).
We present the patterns of $\hat{\phi}(u)$, the $\hat{\psi}_s(u)$'s and $\hat{\xi}(u)$ in Figures 2 and 3. The plots in Figure 2 show that these distortion functions are non-constant, which indicates that the confounding variable $U$ affects the observed variables $\tilde{Y}$, the $\tilde{X}_r$'s and $\tilde{Z}$. The estimator of $(\beta_{01}, \beta_{02}, \beta_{03})$ is $(\hat{\beta}_{01}, \hat{\beta}_{02}, \hat{\beta}_{03})^{T} = (-0.1437, 0.0943, -40.4290)^{T}$. The 95% confidence intervals of the $\beta_{0r}$'s based on the normal approximation method are $(-0.1590, -0.1284)$, $(0.0910, 0.0975)$ and $(-54.1758, -26.6821)$, respectively; the 95% empirical likelihood confidence intervals are $(-0.1458, -0.1416)$, $(0.0913, 0.0973)$ and $(-42.2600, -38.6050)$, respectively. Both sets of confidence intervals exclude zero, which implies that the $\beta_{0r}$'s are non-zero at the 5% significance level. Moreover, the empirical likelihood confidence interval for $\beta_{02}$ is similar to the normal approximation one, but the empirical likelihood confidence intervals for $(\beta_{01}, \beta_{03})$ are much shorter than those obtained from the normal approximation method. In Figure 4, we present the pattern of $\hat{g}(z)$. The plot in Figure 4 shows that the nonparametric function $g(z)$ is non-linear, which indicates that a linear regression model is not appropriate for this data set. Lastly, we use the statistic $T^{*2}_{n,Sc}$ to check whether model (1.1) is adequate for fitting these data. Three weight functions are used: $w_1(x, z) = \exp(x_3)$, $w_2(x, z) = \sin(x_1 + x_2 + x_3)$ and $w_3(x, z) = \cos(\pi z)$. The corresponding p-values are 0.9957, 0.7930 and 0.7521, suggesting a good fit. Moreover, the mean of the squared residuals is $n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_i^2 = 3.9939$. If we do not account for the effect of the confounding variable $U$ and directly fit the model $\tilde{Y} = \tilde{X}^{T}\tilde{\beta} + \tilde{g}(\tilde{Z}) + \varepsilon$ to the observed data $\{\tilde{Y}_i, \tilde{X}_i, \tilde{Z}_i\}$, the profile least squares estimation procedure yields a much larger mean of squared residuals than 3.9939. This again indicates that the confounding variable $U$ cannot be ignored, and that model (1.1) with the confounding variable $U$ is more appropriate for this data set.

Discussions and further research
In this paper, we have studied estimation, hypothesis testing, variable selection and model checking for partial linear models in which all variables are observed with additive distortion measurement errors. This paper provides a basis for studying semiparametric regression models in which the variables in the nonparametric part are observed with additive distortion measurement errors. In future work, semiparametric models such as partial linear single-index models and partial linear varying coefficient models can be considered when all variables are additively distorted. For semiparametric models with additive distortions, one can also consider exponential calibrations (Zhang and Xu 2021; Zhang and Yang 2022) or mixed calibrations, in which some variables are calibrated using the zero-mean identifiability conditions (2.1) and the others are obtained with exponential calibrations (Zhang 2022). In another direction for additive distortion measurement errors models, one can consider high-dimensional problems in future work. Research on this topic is ongoing.


Figure 1. Estimated curves of $g(z)$ in Example 1 for case 1 and case 2: $\hat{g}_T(z)$ (solid line), $\hat{g}(z)$ (dot-dash line), $\hat{g}_{N1}(z)$ (dashed line) and $\hat{g}_{N2}(z)$ (dotted line). The sample size $n$ is 1000.


Table 1. Simulation results of Mean (M), Standard Error (SD) and Mean Squared Error (MSE) for the true estimator $\hat{\beta}_T$, the proposed estimator $\hat{\beta}$, and the naive estimator $\hat{\beta}_N$. MSE is in the scale of $10^{-3}$.

Table 2. Mean (M) and Standard Error (SD) for ASE. The values are in the scale of $10^{-2}$.

Table 4. Simulation results of Mean (M), Standard Error (SD) and Mean Squared Error (MSE) for $\hat{\beta}_R$ with $A_1\beta_0 = 0$ and $A_2\beta_0 = 1$. All values of MSE are in the scale of $10^{-3}$.

Table 5. Simulation results for power calculations of $T_n$ in Example 2; $c = 0$ corresponds to the null hypothesis $H_0$.

Table 7. Simulation results for power calculations of $T^{*2}_{n,Sc}$ in Example 4.