From Conditional Quantile Regression to Marginal Quantile Estimation with Applications to Missing Data and Causal Inference

Abstract It is well known that information on the conditional distribution of an outcome variable given covariates can be used to obtain an enhanced estimate of the marginal outcome distribution. This can be done easily by integrating out the marginal covariate distribution from the conditional outcome distribution. However, to date, no analogy has been established between marginal quantile and conditional quantile regression. This article provides a link between them. We propose two novel marginal quantile and marginal mean estimation approaches through conditional quantile regression when some of the outcomes are missing at random. The first of these approaches is free from the need to choose a propensity score. The second is double robust to model misspecification: it is consistent if either the conditional quantile regression model is correctly specified or the missing mechanism of outcome is correctly specified. Consistency and asymptotic normality of the two estimators are established, and the second double robust estimator achieves the semiparametric efficiency bound. Extensive simulation studies are performed to demonstrate the utility of the proposed approaches. An application to causal inference is introduced. For illustration, we apply the proposed methods to a job training program dataset.


Introduction
It is well known that marginal mean estimation can be enhanced by taking advantage of the additional information provided by covariates (Matloff 1981). Marginal mean estimation derived from the conditional mean has a very useful property. More concretely, let Y and X be the response and covariate vector, respectively. Denote by μ = E(Y) and μ(x) = E(Y | X = x), respectively, the marginal mean of Y and the conditional mean of Y given X = x. It is easy to see that μ = ∫ μ(x) dG(x), where G(x) is the marginal distribution of X. If the regression function μ(x) has a known parametric form, Matloff (1981) estimated μ by the average of the estimated regression function values at the sample points. He showed that such an estimator can offer a substantial improvement over the sample mean of Y and that in no case does the former estimator have larger asymptotic variance than the latter one.
The marginal mean reflects the location of the outcome and permits an easy interpretation, but it is not capable of reflecting the shape and capturing the entire distribution. Quantiles emerge as an important alternative, since they provide not only central features but also the tail properties of the response distribution (Koenker 2005). However, marginal quantiles cannot be identified from the conditional mean E(Y|X) if it is specified without any distribution assumption. Quantile regression, pioneered by Koenker and Bassett (1978), has attracted much attention in recent decades. By investigating different segments of the response distribution, quantile regression can flexibly and robustly capture the relationship between the outcome of interest and explanatory covariates. Wang and Zhou (2010) investigated estimation of the conditional mean through quantile regression, which provides a natural method for marginal mean estimation. It is still unclear how to estimate the marginal quantile from quantile regression. It is expected that the information in quantile regression can be exploited to improve efficiency over that provided by a simple marginal approach that only uses Y. Unfortunately, mathematically it is incorrect to average out the conditional quantile of Y given X for the marginal quantile estimation of Y. In this article, we provide a link between marginal quantile and conditional quantile regression, which suggests an alternative and direct approach to marginal quantile estimation. Simulation results (see Section 3) show that the estimator enhanced by quantile regression is more efficient than the simple quantile estimator.
Estimating marginal quantities, including quantile and mean, is of great interest in the context of missing data and causal inference. Causal inference could be considered as a special case of missing data, where the baseline measurements are available for all individuals, while the treatment outcomes are available only for those assigned to the treatment group. The treatment outcomes for those assigned to controls are missing. By contrast, the control outcomes are missing for all those assigned to the treatment group. Understanding the causal effect of a treatment or exposure on the outcome of interest is fundamentally important in statistics and economics, as well as other disciplines. Examples include predicting the impact of cleaning up a local hazardous waste site on housing prices (Stock 1991), decomposing the wage difference between men and women (Chernozhukov, Fernández-Val, and Melly 2013), evaluating a job training program on incomes (Donald and Hsu 2014), examining the effect of participation in school meal programs (Chan, Yam, and Zhang 2016), and delineating the causal effect of flexible sigmoidoscopy screening on colorectal cancer survival (Kianian et al. 2021). A comprehensive introduction to causal inference with various applications can be found in Imbens and Rubin (2015). In causal inference, it is impossible to estimate the joint distribution of outcomes for treatment and control. The best we can do is to estimate the marginal distributions or functionals of them (Qin 2017). The two estimands of main interest are the quantile treatment effect (QTE) (Firpo 2007) and the average treatment effect (ATE) (Rosenbaum and Rubin 1983; Hirano, Imbens, and Ridder 2003).
Based on the link that we have established, we propose two approaches for marginal quantile estimation in a missing data framework. The first approach is free of the propensity score. It works quite well when the quantile regression model is correctly specified, but may not be efficient. To avoid efficiency losses and potential biases caused by misspecification of the quantile regression model, in the second approach, we carefully adapt the augmented inverse probability weighting (AIPW) technique (Robins, Rotnitzky, and Zhao 1994) to improve robustness and efficiency. The resulting estimator is consistent as long as either the quantile regression model or the propensity score model is correctly specified. It is semiparametrically efficient if the two models are correctly specified. In contrast to existing AIPW estimators, the proposed estimator incorporates the link, which does, however, bring additional challenges, both numerical and theoretical. We introduce tailored numerical algorithms that are easily implemented, together with the required theoretical techniques. Similarly, using the relationship between the marginal mean and conditional quantile regression, we propose two marginal mean estimators, the second of which (the AIPW one) achieves the semiparametric efficiency bound.
It is straightforward to apply the proposed approaches to causal inference. QTE, ATE, the quantile treatment effect on the treated (QTT), and the average treatment effect on the treated (ATT) are evaluated. The advantage of the first approach over that of Firpo (2007) and Hirano, Imbens, and Ridder (2003) is that it avoids propensity score estimation. It is useful in observational studies where the propensity score is unknown. The advantage of the second approach is its double robustness. It is worth noting that Chernozhukov, Fernández-Val, and Melly (2013) studied inference of counterfactual distributions based on quantile regression. The marginal quantile (and mean) can be estimated from the counterfactual distribution. However, if the ultimate goal is to estimate marginal quantities, this indirect approach is complicated and numerical algorithms are scarce.
There are three advantages of using conditional quantiles rather than the conditional mean. First, conditional quantiles provide a comprehensive and flexible relationship and have practical utility in analyzing skewed, heavy-tailed, and heteroscedastic data. Marginal quantiles can be identified from conditional quantiles, but cannot be identified from the conditional mean model if it is specified without any distribution assumption. Second, conditional quantiles have an appealing equivariance property to monotone transformations, that is, Q_{h(Y)}(τ | X) = h{Q_Y(τ | X)} for any nondecreasing function h(·), where Q_Y(τ | X) denotes the τth conditional quantile of Y given X. Based on this property, it is easy to calculate the transformed mean. The conditional mean does not have this equivariance property. It is therefore difficult to calculate the transformed mean through the conditional mean. Third, conditional quantiles are more robust than the conditional mean.
The remainder of this article is organized as follows. In Section 2, we introduce the proposed approaches, estimation algorithms, and asymptotic properties. Extensive simulation studies are conducted in Section 3. Section 4 applies the proposed methods to causal inference. A job training program dataset is analyzed in Section 5. Some concluding remarks are made in Section 6. Technical proofs are relegated to the Appendix and supplementary materials.

Methodology and Asymptotic Properties
Quantile regression is capable of accommodating heavy-tailed data, robust in the presence of outliers, and invariant to monotone transformation. We assume the quantile regression model
Q_Y(τ | X) = X^T β(τ), τ ∈ (0, 1), (1)
where Q_Y(τ | X) denotes the τth conditional quantile of Y given X and β(τ) is the vector of quantile regression coefficients. The model (1) reflects the complete relationship between Y and X. We denote by θ_q the qth marginal quantile of Y, which is defined by E{M(Y, θ_q)} = 0, where M(Y, θ_q) = I(Y ≤ θ_q) − q, 0 < q < 1. Unfortunately, in contrast to the conditional density model, in general, averaging out the conditional quantile will not yield a consistent estimate of the marginal quantile. We provide the following link between the marginal quantile and quantile regression:
E[∫_0^1 I{F^{-1}(τ | X) ≤ θ_q} dτ] = q, (2)
where F(y | X) is the conditional distribution function of Y and F^{-1}(τ | X) is its inverse.
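As a numerical illustration of the link (2), the following sketch uses a toy model that is not taken from the paper: Y = X + ε with X and ε independent standard normal, so that the conditional quantile is x + Φ^{-1}(τ) and the marginal law of Y is N(0, 2). The function names and the τ-grid are hypothetical choices made only for this example. The sketch compares the average of the conditional quantiles with the true marginal quantile, and evaluates the averaged integral of the indicator at the true marginal quantile.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_quantiles(x, taus):
    """Q_Y(tau | X = x) = x + z_tau for the toy model Y = X + eps, eps ~ N(0, 1)."""
    z = np.array([STD_NORMAL.inv_cdf(t) for t in taus])
    return x[:, None] + z[None, :]            # shape (n, K)

def link_value(theta, x, taus):
    """Approximate E[ int_0^1 I{Q_Y(tau | X) <= theta} d tau ].

    The expectation is replaced by a sample average over x and the integral
    by an average over an equally spaced tau grid.
    """
    return np.mean(conditional_quantiles(x, taus) <= theta)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                      # X ~ N(0, 1), hence Y ~ N(0, 2)
taus = (np.arange(200) + 0.5) / 200            # midpoints of 200 cells of (0, 1)
q = 0.8

theta_true = np.sqrt(2.0) * STD_NORMAL.inv_cdf(q)              # true marginal quantile
theta_naive = conditional_quantiles(x, np.array([q])).mean()   # averaged conditional quantile
link_at_truth = link_value(theta_true, x, taus)                # should be close to q

print(f"true marginal quantile      : {theta_true:.3f}")
print(f"average conditional quantile: {theta_naive:.3f}")
print(f"link evaluated at the truth : {link_at_truth:.3f}")
```

In this toy setting the averaged conditional quantile targets Φ^{-1}(q) rather than √2 Φ^{-1}(q), whereas the averaged indicator integral evaluated at the true marginal quantile is close to q, up to Monte Carlo and grid error.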
In the following, we focus on the missing data framework, where the response Y is potentially missing whereas the covariates are always observed. We will discuss its application to causal inference in Section 4. Let {(Y_i, X_i, D_i), i = 1, ..., n} be the observed data, where Y_i is the outcome of interest, X_i is the associated p-dimensional vector of covariates, and D_i is the missing indicator. That is, D_i = 1 if Y_i is available, and D_i = 0 otherwise. We adopt the missing at random assumption (Little and Rubin 2002),
P(D = 1 | Y, X) = P(D = 1 | X) = π(X), (3)
which implies the conditional independence of Y and D given X.

Marginal Quantile Estimation
In this section, we introduce two approaches to estimate θ_q through quantile regression. The first approach is based on the link (2), where the specific form of π(X_i) is not required. The second approach adapts the AIPW technique to combine the link (2) with the propensity score π(X_i), where the explicit form of π(X_i) is assumed but is not necessarily correct.
The link (2) motivates us to seek an enhanced estimation for the marginal quantile, where the quantile regression coefficient β(τ) is replaced by its estimator β̂(τ). The integration of β̂(τ) is over 0 to 1. However, extreme quantile estimation is challenging both numerically and theoretically. As pointed out by Wang, Li, and He (2012), quantile regression estimation often suffers from high variability at high or low tails because of data sparsity. It is quite difficult to estimate the τth quantile when τ approaches 0 or 1 at any rate. Estimators for extreme quantiles are quite unstable with finite samples. According to Gutenbrunner and Jurecková (1992) and Koenker (2005), the uniform consistency of the quantile regression coefficients β̂(τ) and the uniform Bahadur representation of √n{β̂(τ) − β_0(τ)} hold only for τ ∈ [ε, 1 − ε], a subset of (0, 1), where ε is arbitrarily chosen in (0, 1/2). To overcome the challenges posed by extreme quantile estimation, we use α_n-trimmed integration. We consider the trimmed estimating equation (5), where α_n is a sequence that converges to 0.
The stochastic integration representation in (5) suggests a grid-based numerical approximation. One can approximate β̂(·) by a right-continuous piecewise-constant function that jumps only on a grid α_n = τ_0 < τ_1 < ... < τ_K = 1 − α_n. We define the size of the grid as max{τ_k − τ_{k−1}: k = 1, ..., K}. Without loss of generality, we assume the grid to be equally spaced, with size (1 − 2α_n)/K. The resulting estimating equation is
S_n(θ_q, β̂) = 0. (6)
Because S_n in (6) is not continuous, an exact root may not exist. The proposed estimator θ̂_q is defined as a generalized solution. Common strategies, such as the Newton-Raphson algorithm, cannot be applied to find θ̂_q.
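To make the idea of a generalized solution of a discontinuous, grid-based estimating equation concrete, the following sketch may help. It is not the paper's algorithm: the assumed form of the estimating function (stated in the docstring), the function name marginal_quantile_from_path, and the toy coefficient path are all hypothetical. Under that assumed form, the estimating function is a nondecreasing step function of θ, and its generalized root is the smallest fitted value at which the function becomes nonnegative, which is a weighted quantile of the pooled fitted conditional quantiles.

```python
import numpy as np
from statistics import NormalDist

def marginal_quantile_from_path(X, B, taus, q):
    """Generalized root of an assumed grid-based estimating function.

    X    : (n, p) covariate matrix (first column of ones for an intercept)
    B    : (K, p) quantile regression coefficients, row k fitted at taus[k]
    taus : (K,) equally spaced grid running from alpha_n to 1 - alpha_n
    Assumed form (an illustration, not necessarily the paper's equation):
        S_n(theta) = alpha_n + (step / n) * #{(i, k): X_i' B_k <= theta} - q
    """
    n = X.shape[0]
    step = taus[1] - taus[0]
    fitted = np.sort((X @ B.T).ravel())            # pooled fitted conditional quantiles
    # Value of S_n + q just after each sorted fitted value has been passed.
    cum_mass = taus[0] + step / n * np.arange(1, fitted.size + 1)
    idx = np.searchsorted(cum_mass, q, side="left")  # first index with S_n >= 0
    return fitted[min(idx, fitted.size - 1)]

# Toy check with a known coefficient path: Y = 1 + X + eps, eps ~ N(0, 1).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
taus = np.linspace(0.02, 0.98, 97)                 # grid {0.02, 0.03, ..., 0.98}
z = np.array([NormalDist().inv_cdf(t) for t in taus])
B = np.column_stack([1.0 + z, np.ones_like(z)])    # true beta(tau) = (1 + z_tau, 1)

theta_median = marginal_quantile_from_path(X, B, taus, q=0.5)
theta_upper = marginal_quantile_from_path(X, B, taus, q=0.8)
print(theta_median, theta_upper)
```

In practice B would hold estimated coefficients, for example from a quantile regression fit at each grid point; the true path is used here only so that the example stays self-contained.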
We denote by θ_{q,0} the true value of θ_q. Following similar arguments as those in the appendix of Peng and Fine (2009), we show in the Supplementary Material that obtaining θ̂_q is equivalent to minimizing an L_1-type convex function that involves a sufficiently large number R*, selected as an upper bound for all θ_q in the compact parameter space for θ_{q,0}. One may directly apply the rq() function in the R package quantreg to solve a median regression problem with an augmented dataset.
It is worth pointing out that (5) does not use any information about the propensity score. In real applications, the covariates may affect both the outcome of interest and the indicator of missingness. Next, we adapt the AIPW technique discussed in Robins, Rotnitzky, and Zhao (1994) to estimate θ_q. We suppose a parametric logistic regression model for the propensity score, that is,
P(D = 1 | X) = π(X^T γ) = exp(X^T γ)/{1 + exp(X^T γ)}. (7)
It should be noted that the covariate vectors in the quantile regression model (1) and the propensity score model (7) can be different. For notational simplicity, we write both of them as X. By fitting this model, information in a vector of covariates is transformed to a scalar propensity score. We can maximize the log-likelihood to find the estimator γ̂. For grid-based approximation, the proposed AIPW estimator θ̂_q^A is the solution of the AIPW-type estimating equation (8).
Because the potential negativity of the AIPW weights makes it difficult to adapt the numerical algorithm of Peng and Fine (2009), we use a different optimization approach. We show in the supplementary materials that finding the solution to (8) can be reformulated as locating the minimizer of an objective function. Because the weight 1 − D_i/π(X_i^T γ̂) may be negative, the rq() function in R is no longer applicable. Instead, we use the optimize() function to obtain θ̂_q^A. Note that θ̂_q^A possesses the double robustness property: it is consistent provided that the quantile regression is correctly specified or the logistic regression for D is correctly specified, but not necessarily both.
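The double robustness idea can be sketched numerically as follows. The estimating function below is the textbook AIPW form for a quantile, with the conditional distribution function approximated from fitted conditional quantiles on a τ-grid; it is an assumption made for illustration and may differ in detail from the paper's equation (8). The function names, the toy data-generating model, and the candidate search (a scan over observed outcomes instead of a one-dimensional optimizer) are likewise hypothetical.

```python
import numpy as np
from statistics import NormalDist

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def cdf_from_path(theta, fitted, taus):
    """Grid approximation to F(theta | X_i) from fitted conditional quantiles (n, K)."""
    step = taus[1] - taus[0]
    return taus[0] + step * (fitted <= theta).sum(axis=1)

def aipw_score(theta, y, d, pi, fitted, taus, q):
    """Assumed AIPW estimating function: IPW term minus augmentation term."""
    w = d / pi                                   # zero for units with a missing outcome
    ipw_part = w * ((y <= theta) - q)            # only observed outcomes contribute
    aug_part = (w - 1.0) * (cdf_from_path(theta, fitted, taus) - q)
    return np.mean(ipw_part - aug_part)

def aipw_quantile(y, d, pi, fitted, taus, q):
    """Scan observed outcomes and return the one whose score is closest to zero."""
    candidates = np.sort(y[d == 1])
    scores = np.array([aipw_score(t, y, d, pi, fitted, taus, q) for t in candidates])
    return candidates[np.argmin(np.abs(scores))]

# Toy data: Y = 1 + X + eps, outcome observed with probability expit(0.5 + X).
rng = np.random.default_rng(2)
n = 1500
x = rng.normal(size=n)
y_full = 1.0 + x + rng.normal(size=n)
pi_true = expit(0.5 + x)
d = rng.binomial(1, pi_true)
y = np.where(d == 1, y_full, 0.0)                # placeholder value where Y is missing

taus = np.linspace(0.02, 0.98, 97)
z = np.array([NormalDist().inv_cdf(t) for t in taus])
fitted = (1.0 + x)[:, None] + z[None, :]         # true conditional quantiles, shape (n, K)

theta_both = aipw_quantile(y, d, pi_true, fitted, taus, q=0.5)
theta_wrong_pi = aipw_quantile(y, d, np.full(n, d.mean()), fitted, taus, q=0.5)
print(theta_both, theta_wrong_pi)                # both target the true median, 1
```

The second call deliberately uses a constant (misspecified) propensity score while keeping the conditional quantiles correct, which is one of the two situations covered by double robustness.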
The technical proofs of Theorems 1-3 are relegated to the Appendix. The proofs have notable differences from those of Wang and Zhou (2010) and Peng and Fine (2009). The proposed estimating function is not differentiable, and to attain the large-sample properties, we cannot adapt the techniques of Wang and Zhou (2010) because their estimator has an explicit form and is continuous in terms of β̂(τ). Instead, we employ theoretical techniques introduced by Chen, Linton, and Van Keilegom (2003), which, however, did not address the link between quantile regression and the marginal quantile and did not consider the estimation error caused by the numerical approximation of the integration. Peng and Fine (2009) studied inference of the quantile regression coefficient and assumed the covariates to be uniformly bounded, which is too strong because it excludes the normal distribution. We impose a less restrictive assumption on the covariates. Peng and Fine (2009) did not consider a second-stage marginal estimation, nor did they take account of the numerical approximation error.

Marginal Mean Estimation
In this section, we introduce the estimation of the marginal mean of Y, θ_m, through quantile regression when Y is potentially missing. It follows from Wang and Zhou (2010) that θ_m = E{∫_0^1 X^T β(τ) dτ}. The marginal mean of Y can be estimated by the α_n-trimmed sample version, which is approximated by its grid-based counterpart; we denote the resulting estimator by θ̂_m. As an alternative approach, we adapt the AIPW strategy to estimate θ_m, and the resulting estimator, again approximated on the grid, is denoted by θ̂_m^A. θ̂_m^A also possesses the double robustness property: it is consistent if either the model (1) or the model (7) is correctly specified, but not necessarily both.
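The two marginal mean strategies can be sketched as follows, under stated assumptions. The Riemann-sum approximation of the trimmed integral, the textbook AIPW form of the mean estimator, the function names, and the toy model (Y = X + ε with true marginal mean 0 and a known coefficient path) are illustrative choices, not the paper's exact formulas.

```python
import numpy as np
from statistics import NormalDist

def conditional_mean_from_path(X, B, taus):
    """Riemann-sum approximation to the trimmed integral of X_i' beta(tau) over tau."""
    step = taus[1] - taus[0]
    return step * (X @ B.T).sum(axis=1)

def mean_regression_based(X, B, taus):
    """Average of the quantile-regression-based conditional means (no propensity score)."""
    return conditional_mean_from_path(X, B, taus).mean()

def mean_aipw(y, d, pi, X, B, taus):
    """Assumed AIPW form: inverse-probability-weighted outcomes minus an augmentation term."""
    m = conditional_mean_from_path(X, B, taus)
    w = d / pi
    return np.mean(w * y - (w - 1.0) * m)

# Toy data: Y = X + eps, outcome observed with probability expit(0.5 + X).
rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_full = x + rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))
d = rng.binomial(1, pi_true)
y = np.where(d == 1, y_full, 0.0)                 # placeholder value where Y is missing

taus = np.linspace(0.02, 0.98, 97)
z = np.array([NormalDist().inv_cdf(t) for t in taus])
B = np.column_stack([z, np.ones_like(z)])         # true beta(tau) = (z_tau, 1)

theta_reg = mean_regression_based(X, B, taus)
theta_aipw = mean_aipw(y, d, pi_true, X, B, taus)
theta_cc = y_full[d == 1].mean()                  # complete-case mean, for comparison
print(theta_reg, theta_aipw, theta_cc)
```

Because units with larger X are more likely to be observed in this toy design, the complete-case mean is shifted upward, while the two quantile-regression-based estimates stay near the true value 0.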
Theorem 5 states that the asymptotic normality of θ̂_m^A is guaranteed by weaker conditions for α_n and K. This is due to the explicit form of the marginal mean estimation and the elegant structure of the AIPW estimator. The technical proofs of Theorems 4-6 are relegated to the Supplementary Material.

Simulation Studies
In this section, we conduct simulation studies to investigate the finite-sample performance of the proposed estimators. We generate simulated data from a model in which X_1 and X_2 are independently generated from the normal distribution with mean 0 and standard deviation 2/3, N(0, 4/9), and the error term follows the standard normal distribution, N(0, 1). The true marginal mean of Y is 0. We use 100,000 replications of simulated Y to approximate the true marginal quantiles. The resulting true marginal quantiles at q = 0.2, 0.3, ..., 0.8 are -1.259, -0.747, -0.354, 0.000, 0.354, 0.747, and 1.259, respectively. The missing indicator D is generated from a logistic regression model, which follows the propensity score model (7) if the covariate vector is (X_1, X_2^2)^T. The missing rate is approximately 25.8%. To investigate the robustness of the proposed estimators, we consider scenarios where β̂(τ) and γ̂ may not be consistent. Specifically, we consider the following three cases:
Case I: Both β̂(τ) and γ̂ are consistent. We specify correct covariate vectors to fit the models (1) and (7).
Case II: γ̂ is consistent, but β̂(τ) is inconsistent. We use the correctly specified covariate vector (1, X_1, X_2^2)^T to fit the logistic regression model (7), but the misspecified covariate vector (1, X_1^3, X_2^3, X_1^2, X_1X_2)^T to fit the quantile regression model (1).
Case III: β̂(τ) is consistent, but γ̂ is inconsistent. We use the correctly specified covariate vector (1, 2 ) to fit the quantile regression model (1), but the misspecified covariate vector (1, X_1, X_2)^T to fit the logistic regression model (7).
We consider a sample size n = 200 and generate 500 replicates of simulated data. We choose α_n = 0.02, which is smaller than n^{-1/2}. The τ-grid is chosen as {0.02, 0.03, ..., 0.98}. β̂(τ) is obtained using the rq() function in the R package quantreg. We apply the proposed approaches to estimate the marginal quantiles of Y at q = 0.2, 0.3, ..., 0.8 and the marginal mean. The ideal case (full data analysis), which assumes complete knowledge of Y_i, and the complete case, which ignores the data with missing Y_i, are considered for comparison. One hundred bootstrap samples are used to estimate the standard error.
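A bootstrap standard error of the kind mentioned above can be sketched generically. The helper names bootstrap_se and wald_interval, the resampling of whole rows, and the normal-approximation interval are assumptions made for illustration; the paper does not state which type of confidence interval is used.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=100, seed=0):
    """Nonparametric bootstrap standard error of estimator(data).

    data      : array whose first axis indexes the n units, e.g. rows (Y_i, X_i, D_i)
    estimator : function mapping a resampled data array to a scalar estimate
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample units with replacement
        reps[b] = estimator(data[idx])
    return reps.std(ddof=1)

def wald_interval(estimate, se, z=1.96):
    """Normal-approximation interval with nominal level about 95% for z = 1.96."""
    return estimate - z * se, estimate + z * se

# Toy usage with the sample mean, whose standard error is known analytically.
rng = np.random.default_rng(7)
sample = rng.normal(size=200)
se_boot = bootstrap_se(sample, np.mean, n_boot=100)
se_analytic = sample.std(ddof=1) / np.sqrt(sample.size)
lower, upper = wald_interval(sample.mean(), se_boot)
print(se_boot, se_analytic, (lower, upper))
```

Any of the marginal quantile or marginal mean estimators could be passed as the estimator argument, provided it refits the nuisance models (quantile regression and propensity score) within each resample.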
We first investigate the performance of marginal quantile estimation. The full data estimator θ̂_q^F and the complete case estimator θ̂_q^C serve as benchmarks: θ̂_q^F is an "oracle" estimator and θ̂_q^C is a naive estimator. The proposed estimator with full data, which we denote by θ̂_q^{P,F}, is also considered. θ̂_q^{P,F} is derived exactly like θ̂_q, except that the quantile regression coefficient is estimated using the full data. We also include the estimator of Firpo (2007), where the propensity score is estimated by fitting the logistic regression model with covariates (1, X_1, X_2^2)^T, (1, X_1, X_2, X_1^2, X_2^2, X_1X_2)^T, and (1, X_1, X_2)^T under Cases I-III, respectively. Table 1 and Tables S1 and S2 in the supplementary materials present simulation results under Cases I-III, respectively, where the true parameters (True), the empirical biases (Bias), the empirical standard deviations (SD), the average of estimated standard errors (SE), and the empirical coverage probabilities (CP) are displayed. Under Case I, the two proposed estimators work quite well: they have negligible biases and reasonable coverage probabilities that are approximately equal to the nominal level 95%. The empirical standard deviations and the estimated standard errors are close to each other.
The proposed estimators are less efficient than θ̂_q^F, which is in accordance with our expectation; the full data analysis, however, cannot be executed in practice. The standard error of θ̂_q^{P,F} is generally smaller than that of θ̂_q^F, which shows that the proposed estimator is enhanced by the conditional quantile regression. Under Case II, β̂(τ) is inconsistent, because the associated covariate vector is misspecified. We find in Table S1 that the proposed estimator θ̂_q is biased, which is reasonable because the consistency of θ̂_q is guaranteed by the correct specification of the quantile regression model. Under Cases II and III, one model is misspecified, but we find from Tables S1 and S2 that the AIPW estimator θ̂_q^A still has excellent performance. This shows its double robustness. However, the complete case estimator has nonnegligible biases and small coverage probabilities. The inverse probability weighting estimator of Firpo (2007) is unstable, and the estimated standard errors deviate from standard deviations under Cases I and III. We find from Table S2 that the estimator of Firpo (2007) has a large bias if the propensity score is incorrectly specified. Even if the propensity score is estimated by a logistic series under Case II, the coverage probability may still be poor: for example, the coverage probability of Firpo (2007) at q = 0.8 is 88.6%, which is far from the nominal level 95%. We also consider other simulation setups, including one where the quantile regression model and propensity score model are linear in terms of X_1 and X_2, as well as those with asymmetric and heavy-tailed errors, heteroscedastic errors, and endogenous errors. More details and simulation results are presented in the supplementary materials.
Now we investigate the performance of marginal mean estimation. The two benchmark estimators are the full data estimator θ̂_m^F and the complete case estimator θ̂_m^C. The proposed estimator with full data, θ̂_m^{P,F}, is derived exactly like θ̂_m, except that the quantile regression coefficient is estimated by the full data. The estimator of Hirano, Imbens, and Ridder (2003) is also considered, where the propensity score is estimated by fitting the logistic regression model with covariates (1, X_1, X_2^2)^T, (1, X_1, X_2, X_1^2, X_2^2, X_1X_2)^T, and (1, X_1, X_2)^T under Cases I-III, respectively. Table 2 presents the simulation results. We find that the two proposed estimators work quite well under Cases I and III, even when γ̂ is inconsistent under Case III. As predicted, the proposed estimator θ̂_m is a little biased under Case II because of the inconsistency of β̂(·). The complete case estimator has nonnegligible biases and small coverage probabilities. The inverse probability weighting estimator of Hirano, Imbens, and Ridder (2003) is biased when γ̂ is inconsistent.
Finally, we conduct simulation studies to show the appealing equivariance property of the proposed quantile regression-based approaches to monotone transformations. For any nondecreasing function h(·), Q_{h(Y)}(τ | X) = h{Q_Y(τ | X)}, that is, the conditional quantile of the transformed h(Y) is equal to the transformed quantile of Y. This is not the case for the conditional expectation unless h(·) is a linear function. We generate data from a transformed linear model for log(Y), where the error e ∼ N(0, 1/4), the normal distribution with mean 0 and standard deviation 1/2, and X_1 and X_2 are generated in the same way as before. When calculating the proposed marginal mean estimators θ̂_m, θ̂_m^A, and θ̂_m^{P,F}, we first use quantile regression to estimate the conditional quantile of log(Y) and then estimate the conditional quantile of Y by performing the exponential transformation. The remaining steps proceed in the same way. The naive transformed mean estimator θ̂_trans, which is obtained by first fitting a linear regression to estimate the conditional mean of log(Y) and then averaging the exponential transformation of the conditional mean, is included for comparison. Simulation results based on θ̂_m, θ̂_m^A, θ̂_m^{P,F}, θ̂_m^F, θ̂_m^C, and θ̂_trans are shown in Table 3, where the sample size is chosen as 100 or 200. We find from Table 3 that θ̂_trans has a large bias, whereas the proposed estimators work well. This shows the advantage of the proposed approaches in marginal mean estimation.
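The contrast between transforming conditional quantiles and transforming a conditional mean can be checked in a small example. The model below, log(Y) = X + e with X ∼ N(0, 1) and e ∼ N(0, 1/4), is a hypothetical stand-in whose linear predictor is not taken from the paper; the known conditional quantiles and conditional mean of log(Y) are used in place of fitted ones so that the sketch stays self-contained.

```python
import numpy as np
from statistics import NormalDist

SIGMA = 0.5                                       # standard deviation of e
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
taus = (np.arange(1000) + 0.5) / 1000             # midpoints of 1000 cells of (0, 1)
z = np.array([NormalDist().inv_cdf(t) for t in taus])

# Conditional quantiles of log(Y), then the exponential transformation (equivariance).
log_quantiles = x[:, None] + SIGMA * z[None, :]   # Q_{log Y}(tau | X_i)
mean_via_quantiles = np.exp(log_quantiles).mean() # average over units and over the tau grid

# Naive route: exponentiate the conditional mean of log(Y), which equals X_i here.
mean_naive = np.exp(x).mean()

# Reference value: E(Y | X_i) = exp(X_i + SIGMA^2 / 2), averaged over the same sample.
mean_reference = np.exp(x + SIGMA**2 / 2).mean()

print(mean_via_quantiles, mean_naive, mean_reference)
```

In this toy model the naive route understates the mean by the factor exp(−σ²/2), while the quantile route agrees with the reference value up to the grid approximation error.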

Application to Causal Inference
In this section, we apply the proposed methodology to causal inference. We adopt the potential outcome framework (Rubin 1974) and consider a binary treatment. Let X be the covariate vector. Let V be the indicator of treatment, with V = 1 indicating treatment and V = 0 otherwise. Each subject has four quantities, (Y(1), Y(0), X, V), where Y(1) and Y(0) are the potential outcomes with treatment and control, respectively. Y(1) and Y(0) cannot be observed simultaneously. The observed data for each subject are (Y, X, V), where Y = VY(1) + (1 − V)Y(0). We make the unconfoundedness assumption (11) (Rosenbaum and Rubin 1983), which is similar to the missing at random assumption in a missing data framework. The unconfoundedness assumption (11) implies that V and (Y(0), Y(1)) are independent conditional on X.
First, we apply the proposed approaches to estimate QTE and ATE. For a particular value of 0 < q < 1, QTE(q) = Q_{Y(1)}(q) − Q_{Y(0)}(q), where Q_{Y(1)}(q) and Q_{Y(0)}(q) are the qth quantiles of Y(1) and Y(0), respectively. ATE = E{Y(1)} − E{Y(0)}, which is the mean difference between treatment group and control group. We impose quantile regression models for the potential outcomes Y(1) and Y(0), and we assume that the propensity score follows a logistic regression model with parameter α. Using the first approach, Q_{Y(1)}(q) is estimated by solving an estimating equation built from the estimated quantile regression coefficients. The second approach additionally uses the propensity score evaluated at α̂, the maximum likelihood estimator of α. The estimators of Q_{Y(0)}(q) and of ATE are obtained in the same manner.
Next, we apply the proposed approaches to evaluate QTT and ATT, which are the counterparts of QTE and ATE for the treated subpopulation (V = 1). In the corresponding estimators, n_1 = Σ_{i=1}^n V_i is the number of treated units.
The second estimator of E{Y(0 .
In the supplementary materials, we provide simulation results for the proposed two estimators of QTT and ATT. The results show that the proposed estimators work well.
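The correspondence between causal inference and missing data can be sketched for the ATE: Y(1) is treated as missing when V = 0, and Y(0) as missing when V = 1, so each potential-outcome mean is a marginal mean with outcomes missing at random. The sketch below uses the textbook AIPW form for each arm; the function names, the toy model, and the use of known conditional means m1 and m0 (in practice these could come from integrating fitted conditional quantiles) are assumptions made for illustration.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def naive_ate(y, v):
    """Difference between the raw means of the treated and the controls."""
    return y[v == 1].mean() - y[v == 0].mean()

def aipw_ate(y, v, ps, m1, m0):
    """AIPW-type ATE estimate built from two missing-data problems.

    ps     : P(V = 1 | X_i), the propensity score
    m1, m0 : estimates of E{Y(1) | X_i} and E{Y(0) | X_i}
    """
    mu1 = np.mean(v * y / ps - (v / ps - 1.0) * m1)                           # Y(1) missing if V = 0
    mu0 = np.mean((1 - v) * y / (1 - ps) - ((1 - v) / (1 - ps) - 1.0) * m0)   # Y(0) missing if V = 1
    return mu1 - mu0

# Toy data with a true ATE of 1 and treatment assignment that depends on X.
rng = np.random.default_rng(4)
n = 4000
x = rng.normal(size=n)
ps = expit(x)
v = rng.binomial(1, ps)
y1 = 1.0 + x + rng.normal(size=n)
y0 = x + rng.normal(size=n)
y = v * y1 + (1 - v) * y0                          # only one potential outcome is observed

ate_naive = naive_ate(y, v)
ate_aipw = aipw_ate(y, v, ps, m1=1.0 + x, m0=x)
print(ate_naive, ate_aipw)
```

Because treated units tend to have larger X in this toy design, the naive difference in means overstates the effect, while the AIPW-type estimate stays close to 1.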

Real Data Analysis
In this section, we apply the proposed approaches to a job training program dataset that was first analyzed by LaLonde (1986) and later by Dehejia and Wahba (1999), Firpo (2007), and Chan, Yam, and Zhang (2016), among others. The National Supported Work (NSW) Program was a randomized experiment in the 1970s to study whether a job training program would increase income levels among workers. The program provided extensive job training to workers over a 9-18-month period. Both intervention and control groups were present in the program, because it randomly assigned some applicants to the program and randomly denied some other applicants. LaLonde (1986) combined this experimental dataset with the observational dataset to examine whether the results agreed with the unbiased results of a randomized experiment, where the observational dataset used information from the Panel Study of Income Dynamics (PSID) and was constituted by controls only. For a more detailed description of the data, see LaLonde (1986) and Dehejia and Wahba (1999).
We analyze the dataset that is constituted by 185 treated units from the NSW and 429 control units from the PSID. This combined nonexperimental dataset is called lalonde in the R package MatchIt. We are interested in the causal effects of the job training program on post-training earnings in 1978. The proposed approaches are applied to estimate QTT and ATT. QTT examines the effect of the job training program on different segments of the distribution of the post-training earnings for those individuals who participated in the program. QTT(q), QTT at quantile level q, is the difference between the qth quantile of the outcome distribution when individuals were trained by the program and the qth quantile of the potential outcome distribution in the (hypothetical) situation where individuals were not trained, given that the individuals were participants in the program. ATT evaluates the mean effect of the program on post-training earnings for treated individuals.
We consider baseline covariates, including age, education (years of education), black (indicator of Black race), hispanic (indicator of Hispanic race), married (indicator of married status), nodegree (indicator of a lack of a degree), earn74 (earnings in 1974), and earn75 (earnings in 1975), where earnings in 1974 and 1975 are the two pre-training earnings. Table 4 provides summary statistics for the treatment group and the nonexperimental comparison group. Constant and linear terms of all covariates are chosen as analysis covariates in the logistic propensity score model and the quantile regression model.
Table 5 note: QTT: quantile treatment effect on the treated; ATT: average treatment effect on the treated; SE: standard error; Firpo: Firpo (2007); HIR: Hirano, Imbens, and Ridder (2003).
The QTT at quantile levels 0.25, 0.5, and 0.75 and the ATT are displayed in Table 5, where the proposed estimators (Proposed I and Proposed II), the well-known inverse probability weighting estimators developed by Firpo (2007) and Hirano, Imbens, and Ridder (2003), and the naive estimator (Naive) are shown. Proposed II refers to the AIPW-type estimator. The naive estimator is calculated by naively taking differences between the quantiles or means of the treated and controls in the combined nonexperimental data. The corresponding standard errors, which are derived from 300 bootstrap samples, are also reported, along with confidence intervals.
We find from Table 5 that the proposed estimators and the estimators of Firpo (2007) and Hirano, Imbens, and Ridder (2003) are close, yielding similar conclusions. The job training program has a positive effect on post-training earnings, although the effect is not significant. The proposed estimators generally have smaller standard errors than those of Firpo (2007) and Hirano, Imbens, and Ridder (2003). However, there are large discrepancies between the proposed and naive estimators. For QTT(0.75), the naive estimator shows that the incomes of participants were significantly lower than those of nonparticipants, which implies that the job training program has a negative effect. This particular finding has a misleading implication. The naive estimator yields an opposite conclusion to that of the other estimators, which shows the bias of the naive strategy.

Concluding Remarks
Matloff (1981) pointed out that, compared with the sample mean, an improved estimate of the marginal mean can be derived by averaging out the estimated regression function values at the sample points. Unfortunately, this nice property is not inherited by marginal quantile and conditional quantile regression. In this article, we have constructed a mathematical connection between them. Moreover, we have proposed two methods to estimate the marginal quantile and marginal mean in the missing data framework where the response suffers from missingness. We have introduced an easily implemented computational algorithm to solve the discontinuous estimating equation. The proposed estimators have been shown to have desirable large-sample properties and excellent finite-sample performance. Finally, we have applied the proposed methodology to causal inference to evaluate treatment effects.

Appendix
In this appendix, we give the technical proofs of Theorems 1-3. Following Gutenbrunner and Jurecková (1992), we use the topology of uniform convergence on compact subsets of (0, 1) to approximate the process n^{1/2}{β̂(τ) − β_0(τ)}. Let B be the infinite-dimensional functional space that contains β_0(·). We assume that B is endowed with a pseudometric ||·||_B, which is approximated by the sup-norm metric on compact subsets of (0, 1). Let o_I(a_n) denote a term that converges uniformly to 0 almost surely for τ ∈ I after being divided by a_n. It follows from conditions (C3)-(C5) that E[D X X^T f{X^T β_0(τ) | X}] is positive definite. According to the continuity of f(y|X) and the Lipschitz continuity of β_0(τ), it follows from the Glivenko-Cantelli theorem that n ∈ (0, 1/2). Following Koenker (2005), we have As alternative definitions, we write the proposed estimators as
(a) Denote by 1 (θ q , β 0 ) the derivative of s(θ q , β 0 ) with respect to θ q .It follows from simple algebra that 1 2) follows from the continuity and nonnegativity of f (•|X), which is guaranteed by the regularity condition (C3).

Table 1. Simulation results for marginal quantile estimation under Case I.

Table 2. Simulation results for marginal mean estimation.

Table 3. Simulation comparisons among five marginal mean estimators.

Table 5. QTT and ATT for the job training data.