Uniformly Semiparametric Efficient Estimation of Treatment Effects With a Continuous Treatment

This article studies identification, estimation, and inference of general unconditional treatment effects models with continuous treatment under the ignorability assumption. We show identification of the parameters of interest, the dose–response functions, under the assumption that selection to treatment is based on observables. We propose a semiparametric two-step estimator, and consider estimation of the dose–response functions through moment restriction models with generalized residual functions that are possibly nonsmooth. This general formulation includes average and quantile treatment effects as special cases. The asymptotic properties of the estimator are derived, namely, uniform consistency, weak convergence, and semiparametric efficiency. We also develop statistical inference procedures and establish the validity of a bootstrap approach to implement these methods in practice. Monte Carlo simulations show that the proposed methods have good finite sample properties. Finally, we apply the proposed methods to estimate the unconditional average and quantile effects of mothers’ weight gain and age on birthweight. Supplementary materials for this article are available online.


INTRODUCTION
There is a growing literature on program evaluation studies. Estimation of treatment effects has provided a valuable method of statistical analysis of the effects of policy variables. This is especially true for program evaluation in economics and statistics, where these methods help to analyze how treatments or social programs affect the outcome distributions of interest.
The literature on unconditional treatment effects (TE) concentrates almost exclusively on the special case of discrete treatments, such as binary and multi-valued treatment assignments. On the binary TE models, Hahn (1998), Heckman et al. (1998), Hirano, Imbens, and Ridder (2003), Abadie and Imbens (2006), Imbens, Newey, and Ridder (2006), and Li, Racine, and Wooldridge (2009) studied efficient estimation of the average treatment effects (ATE). The study of ATE has been extended to the quantile treatment effects (QTE) by Firpo (2007). There is also literature on estimation of ATE and QTE for multi-valued TE, see, for example, Imbens (2000), Lechner (2001), and Cattaneo (2010). However, the literature on continuous TE is relatively sparse. Among others, Hirano and Imbens (2004) and Imai and van Dyk (2004) developed a generalized propensity score for continuous average treatment models. Flores (2007) proposed nonparametric estimators for average dose-response functions (ADRF). Florens et al. (2008) considered identification of ATE using control functions. Lee (2012) studied the unconditional distribution of potential outcomes with continuous treatments. Despite this sparsity, many empirical questions of interest in applied research involve continuous treatments. For example, in the study of the effects of mothers' weight gain during pregnancy on birthweight, the weight gain in pounds is a continuous variable.
This article studies identification, estimation, and inference of general unconditional TE models with a continuous dose of the treatment. We consider estimating the parameters of interest, the dose-response functions (DRF), through moment restriction models (Z-estimators) in which the generalized residual functions are possibly nonsmooth. In this general formulation, the range of models for which the methods are applicable is very broad. For instance, the framework includes the average and quantile DRF (ADRF and QDRF) as special cases, and consequently ATE and QTE are direct applications of the proposed methods. Throughout the article, we use the examples of ADRF and QDRF as applications of the general theory. Thus, we extend the literature on ATE and QTE from discrete treatments to a continuous dose of treatment. In addition, we extend the literature on continuous treatments, which to our knowledge only allows for ADRF and ATE, to general DRF, with QDRF and QTE being important examples.
Consistent estimation of the DRF requires identification of the parameters of interest. In this article, following Rosenbaum and Rubin (1983), the relevant restriction for identification is the ignorability assumption, that is, the selection to treatment is based on observable variables. The ignorability assumption states that given a set of observed covariates, each individual is randomly assigned to either the treatment group or the control group. This condition has been largely employed in the literature, see, for example, Rubin (1977), Heckman et al. (1998), Dehejia and Wahba (1999), Firpo (2007), Flores (2007), and Cattaneo (2010).
Based on the identification, we propose a two-step estimation procedure. The practical implementation of the estimator is simple. In the first step, one estimates a ratio of conditional densities, similar to a propensity score. In the second step, an optimization problem is solved. Notice that, once the identification is achieved and a DRF is estimated, other parameters of interest can be easily estimated. For example, one can estimate TE, which are defined as differences of the DRF evaluated at different levels of treatment. In addition, one could estimate the entire curve of potential outcomes.
We establish the asymptotic properties of the two-step estimator, namely, uniform consistency, weak convergence, and semiparametric efficiency. It is shown that the two-step estimator of the DRF is uniformly consistent over the set of treatments. Since the parameters of interest are now infinite dimensional, the results need to be uniform over the treatment levels. Moreover, we show that the estimator converges weakly to a Gaussian process, and that it is uniformly semiparametric efficient.
We develop statistical inference procedures. Inference on the DRF is uniform over the set of treatments. Unlike the binary or multi-valued treatment models, in which pointwise results are equivalent to uniform results (because the number of treatment levels is finite), when treatment levels belong to an interval $\mathcal{T}$, uniform results are stronger than pointwise results, and consequently pointwise results alone are often not adequate for inference. We propose tests for general hypotheses. The test statistics used are of Kolmogorov and Cramér-von Mises types, which detect deviations from the null hypothesis. Since the parameter of interest is infinite dimensional and the weak limits of these statistics are not standard, we compute critical values using a bootstrap method. We provide sufficient conditions under which the bootstrap is valid, and discuss an algorithm for its practical implementation.
We conduct Monte Carlo simulations to evaluate the finite sample performance of the methods. The simulations show that the estimators are approximately unbiased and that the Cramér-von Mises-type test statistic has empirical size close to the nominal level and high power against selected alternatives. In addition, the results improve as the sample size increases and are not sensitive to the number of bootstrap replications.
To illustrate the methods, we consider an empirical application to a birthweight study using data from the National Vital Statistics System. We estimate the unconditional average and quantile dose-birthweight functions for both the mother's weight gain during pregnancy and the mother's age. The empirical results document important heterogeneity in the dose-birthweight functions for weight gain and age. The estimates of the dose-birthweight functions regarding the mother's weight gain reveal interesting heterogeneity both across the treatment (weight gain) and across quantiles. The findings provide evidence that, in general, more weight gain during pregnancy leads to higher birthweight. However, the TE differ across levels of weight gain. For a given quantile, positive impacts are larger for low and high weight gains and relatively smaller in the middle range of weight gain. The quantile dose-birthweight functions of the mother's age are downward-sloping. The impact of the mother's age on birthweight remains negative, for all quantiles, as age increases. In addition, for a given age, this impact becomes more severe for lower parts of the distribution of birthweight.
The remainder of the article is organized as follows. Section 2 establishes identification and a two-step estimator. Section 3 derives the asymptotic properties. Section 4 discusses inference. Section 5 provides simulations. The application to dose-birthweight functions is in Section 6. Section 7 concludes. The proofs of the main results are collected in the Appendix.
Notations: Let $E$ and $\mathbb{E}_n$ be the expectation and the sample average, respectively. Let $\rightsquigarrow$, $\xrightarrow{p}$, and $\xrightarrow{p^*}$ denote weak convergence, convergence in probability, and convergence in outer probability, respectively.

THE MODEL, IDENTIFICATION, AND ESTIMATION
In this article, we assume that a random sample of size $n$ is available. The objective is to learn how an outcome variable changes as the dose of some treatment variable varies. The dose is denoted by $t$, where $t \in \mathcal{T}$, an interval in $\mathbb{R}$, and the outcome is denoted by $Y(t)$. More specifically, for each $t \in \mathcal{T}$, $Y(t)$ is the outcome when the dose of treatment is $t$. When $t$ varies in $\mathcal{T}$, a random process $Y(t)$ is defined. The random process $Y(t)$ indexed by $t \in \mathcal{T}$ denotes potential outcomes under treatment levels in $\mathcal{T}$. However, in practice, one cannot observe $Y(t)$ for all $t \in \mathcal{T}$. Rather, only a single $Y(t_0)$ can be observed, where $t_0$ is the realization of a random variable $T$. Thus, the observed outcome is the random variable $Y := Y(T)$.
Ideally we would like to estimate the value of the DRF at $t_0$ using the sample with $T = t_0$. However, in general, due to the self-selection problem, bias can arise by direct use of sample counterparts. To illustrate this point, we consider the estimation of ATE, from the DRF, as an example. For any $t_1 < t < t_2$, the difference between the observed conditional means at $t_2$ and $t_1$ decomposes into an average treatment effect on the treated plus two selection bias terms (bias 1 and bias 2). This simple calculation indicates that, because of the selection biases 1 and 2, one cannot directly use the sample counterparts to compute TE.
To solve this problem, it is common in the literature to assume the existence of a set of random variables $X$ conditional on which $Y(t)$ is independent of $T$ for all $t \in \mathcal{T}$. In such a case, $E[Y(t_2) - Y(t_1) \mid X] = E[Y \mid T = t_2, X] - E[Y \mid T = t_1, X]$, which has a causal interpretation. This is the ignorability condition, which is discussed in more detail below. Finally, we need to combine the results for $X$ to obtain an unconditional TE. By the law of iterated expectations, this unconditional expectation can be recovered.
The objective of this article is to study continuous treatments considering general forms of dose-response for TE models. To accomplish this aim, we develop a general framework for generic moment restriction estimators (Z-estimators) with possibly nonsmooth functions. For each $t \in \mathcal{T}$, the parameter of interest $\beta(t) \in B \subset \mathbb{R}$ is assumed to uniquely solve the identifying condition $E[m(Y(t); \beta(t))] = 0$, where $m(\cdot)$ is defined as the generalized residual function. Then for each $t$, the DRF is defined as the value $\beta(t)$ that solves the moment condition. We will use the important examples of average and quantile dose-response functions (ADRF and QDRF) as applications of the general theory. In addition, we use the ADRF and QDRF estimators in the empirical application to a birthweight study. The following examples show that ADRF and QDRF are special cases of $\beta(\cdot)$, which result from choosing specific forms of $m(\cdot)$.
Example 1 (Average). ADRF is a special case of the general model. Taking $m(Y(t); \beta(t)) = Y(t) - \beta(t)$, the moment condition is solved by the mean of the potential outcome, $\mu(t) := E[Y(t)]$.
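Example 2 (Quantile). QDRF is another special case of the general model. For a quantile index $\tau \in (0, 1)$, taking $m(Y(t); \beta(t)) = \tau - 1\{Y(t) \le \beta(t)\}$, the moment condition is solved by the $\tau$th quantile of the potential outcome, $q_0^\tau(t)$.
From the QDRF, one can estimate the QTE as $QTE(t, t') = q_0^\tau(t) - q_0^\tau(t')$. In this article, QTE is defined as the difference of the $\tau$th quantile at different levels of treatment. See Firpo (2007) for a detailed discussion of QTE.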
We state assumptions on the general model for identification of the parameters of interest.
I.I For each $t \in \mathcal{T}$, $\beta_0(t)$ uniquely solves $E[m(Y(t); \beta(t))] = 0$, where $m : \mathbb{R} \times B \to \mathbb{R}$ is measurable.
I.II For all $t \in \mathcal{T}$, we have: 1. (ignorability) conditional on $X$, $Y(t)$ is independent of $T$; 2. the density of the treatment level is positive.
I.III Assume that 1. There exists a function $e(y)$ with $\int e(y)\,dy < \infty$ such that |m(y; … Also the interval $\mathcal{T}$ is right open.
Assumption I.I is an identification condition provided that $Y(t)$ are observable. The parameter of interest, $\beta(t)$, is defined by the moment condition. However, this condition cannot be used directly to estimate $\beta(t)$ because our data are not experimental and $Y(t)$ are not observable for all $t \in \mathcal{T}$. Therefore, condition I.II.1, the ignorability assumption, is fundamental. According to I.II.1, although the assignment of the treatment level is not random, it is random within subpopulations characterized by $X$. This assumption has been used, among others, by Heckman et al. (1998), Dehejia and Wahba (1999), and Hirano and Imbens (2004). I.II.2 states that the density of treatment levels is positive. Thus, the triple $(X, Y, T)$ is observable, and a random sample of size $n$ can be obtained. I.III allows for changing the order of limits and integrals. The set $\mathcal{T}$ is right open without loss of generality.
The identification result is presented in the following theorem. For notational convenience, denote $u := (x', y)'$ and $U := (X', Y)'$.
Proof. See the Appendix.
The result in Equation (1) allows identification of the DRF. The left-hand side of (1) is used to define β(t), which involves the unobservable Y (t). Consequently, it cannot be used to estimate β(t). Nevertheless, the right-hand side of (1) is expressed in terms of the observables (X, Y, T ), and hence, can be used to estimate β(t). Note that Y (t) is not observable while Y is. The intuition behind the result is that the existence of X delivers identification of the parameter of interest. That is, conditional on observed covariates X, each individual is randomly assigned to a treatment level.
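The identity behind this result can be traced in a few lines. The display below is only a sketch: it assumes that the nuisance function is the ratio of conditional densities, $\pi_0(u; t) = f_{T|X,Y}(t|x, y) / f_{T|X}(t|x)$, that all densities exist, and that the order of integration can be exchanged. It is not a substitute for the proof in the Appendix.

```latex
\begin{align*}
E[m(Y(t); \beta(t))]
  &= E\big\{ E[m(Y(t); \beta(t)) \mid X] \big\}
     && \text{(iterated expectations)} \\
  &= E\big\{ E[m(Y(t); \beta(t)) \mid X, T = t] \big\}
     && \text{(ignorability)} \\
  &= E\big\{ E[m(Y; \beta(t)) \mid X, T = t] \big\}
     && (Y = Y(t) \text{ when } T = t) \\
  &= E\Big\{ \int m(y; \beta(t))\, f_{Y|X,T}(y \mid X, t)\, dy \Big\} \\
  &= E\Big\{ \int m(y; \beta(t))\,
       \frac{f_{T|X,Y}(t \mid X, y)}{f_{T|X}(t \mid X)}\,
       f_{Y|X}(y \mid X)\, dy \Big\}
     && \text{(Bayes' rule)} \\
  &= E\big[ m(Y; \beta(t))\, \pi_0(U; t) \big].
\end{align*}
```

The last line involves only the observables $(X, Y, T)$, which is what makes a sample analog feasible.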
Remark 1. The result in Theorem 1 has a similar format to eq. (2) of Cattaneo (2010), after we transform the latter. To see this, we begin with $E\left[\frac{1\{T = t\}\, m(Y; \beta(t))}{p_t(X)}\right] = 0$. By the law of iterated expectations, the left-hand side of the previous equation equals $E\left[\frac{m(Y; \beta(t))}{p_t(X)}\, E[1\{T = t\} \mid X, Y]\right]$. Noting that $E[1\{T = t\} \mid X, Y] = P(T = t \mid X, Y)$ and, by definition, $p_t(X) = P(T = t \mid X)$, the last expression equals $E\left[m(Y; \beta(t))\, \frac{P(T = t \mid X, Y)}{P(T = t \mid X)}\right]$. Thus, our result "replaces" the conditional probabilities by conditional densities.
Remark 2. The result in Theorem 1 is also related to Hirano and Imbens (2004). They extended the propensity score method to a setting with continuous treatment for ADRF. This article complements their results by providing a more general model for estimating DRF. This general formulation includes ADRF as a special case. In addition, we generalize the results in Hirano and Imbens (2004) by establishing uniform asymptotic results of the two-step estimator. In particular, in the next section we show uniform consistency, weak convergence, and semiparametric efficiency of the proposed estimator. We also develop practical inference.
Given the identification condition in Equation (2) of Theorem 1, we are able to estimate the parameters of interest. We propose a two-step estimator as follows.
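Step 1. Estimate the nuisance parameter $\pi_0(u; t)$, the ratio of conditional densities, and denote the estimator by $\hat\pi(u; t)$.
Step 2. For each $t \in \mathcal{T}$, find $\hat\beta(t)$ as a zero of the following condition: $\frac{1}{n} \sum_{i=1}^{n} m(Y_i; \beta(t))\, \hat\pi(U_i; t) = 0$. (3)
The estimator $\hat\beta(t)$ is defined as the zero of the equation above. The identification conditions and the estimator are illustrated in the following ADRF and QDRF examples.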
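To make Step 2 concrete, the following Python sketch solves the sample moment condition at a single dose $t$ for the two leading cases. It assumes the residual functions $m(y; \beta) = y - \beta$ (average) and $m(y; \beta) = \tau - 1\{y \le \beta\}$ (quantile), and it takes the first-step weights $\hat\pi(U_i; t)$ as given and nonnegative. The function names and the toy inputs are hypothetical, introduced only for illustration.

```python
def adrf_estimate(y, weights):
    """Zero of sum_i (y_i - beta) * w_i = 0, i.e. the weighted mean of y."""
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, y)) / total


def qdrf_estimate(y, weights, tau):
    """Approximate zero of sum_i (tau - 1{y_i <= beta}) * w_i = 0.

    The sample moment is a step function of beta, so an exact zero may not
    exist. This returns the smallest observed y at which the weighted share
    of observations with y_i <= beta reaches tau (a weighted quantile).
    """
    total = sum(weights)
    cumulative = 0.0
    for yi, w in sorted(zip(y, weights)):
        cumulative += w
        if cumulative >= tau * total:
            return yi
    return max(y)


# Hypothetical inputs: outcomes and first-step weights pi_hat(U_i; t) at one dose t.
y = [1.0, 2.0, 3.0, 4.0]
pi_hat_at_t = [0.5, 0.5, 1.5, 1.5]
mu_hat = adrf_estimate(y, pi_hat_at_t)
q_hat = qdrf_estimate(y, pi_hat_at_t, tau=0.5)
```

Repeating the call over a grid of doses, with the weights recomputed at each $t$, traces out the estimated dose-response curve.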

ASYMPTOTIC PROPERTIES
In this section, we derive the asymptotic properties of the two-step estimator. We establish the uniform consistency and the weak limit of the DRF, $\hat\beta(\cdot)$, in $\ell^\infty(\mathcal{T})$. In addition, as important applications and examples, we specialize the general results and establish the corresponding asymptotic properties of the ADRF and QDRF. Moreover, we discuss estimation of the nuisance parameter, $\pi_0$. For conciseness, the uniform semiparametric efficiency is established in the online supplemental appendix. The uniform efficiency is based on the pointwise semiparametric efficiency and the weak convergence to a tight random process.

Consistency
Consistency is a desired property for most estimators. In this article, unlike the binary or multi-valued treatment models, the treatment levels take values on an interval $\mathcal{T}$. Thus, consistency is established uniformly over $\mathcal{T}$. For the general two-step estimator in (3) to be uniformly consistent, we state the following sufficient conditions.
Condition C.I defines the Z-estimator and C.II is an identification condition. Pakes and Pollard (1989) and Chen, Linton, and Van Keilegom (2003) had similar assumptions. For a detailed discussion of this type of identification assumption, see van der Vaart (1998). C.III only requires the moment of the estimating equation to be finite. This is analogous to 4(b) in Cattaneo (2010). C.IV requires consistent estimation of the nuisance parameter. This is a usual requirement corresponding to (1.4) of Theorem 1 of Chen, Linton, and Van Keilegom (2003). We discuss estimation of the nuisance parameter in Section 3.3. C.V implies a uniform law of large numbers, which is standard; see, for example, Newey and McFadden (1994). We provide more primitive conditions to establish consistency for the specific cases of ADRF and QDRF. Now we state the uniform consistency result for the DRF estimator.
As an illustration of the general result, we discuss uniform consistency over $t \in \mathcal{T}$ of the two-step estimators of the ADRF and QDRF given in (4) and (5), respectively. In these specific cases, the functional form of $m(\cdot)$ is given and, hence, we can check the high level conditions of Theorem 2 (C.I-C.V). We show that both ADRF and QDRF satisfy C.I-C.III. They are formally verified in Corollaries 1 and 2. Assumptions C.IV-C.V refer to the estimation of the nuisance parameter $\pi_0$ in the first step, and examples satisfying these conditions are given in Section 3.3. Consider the assumptions for the ADRF.
AC.I requires the expectation of Y (t) to be finite. The diameter of the parameter space is finite, which is common for M-estimators; see, for example, Chen, Linton, and Van Keilegom (2003). Hirano, Imbens, and Ridder (2003) assumed the second moment of Y (1) and Y (0) to be finite, which is slightly stronger. AC.II is a high level condition on the nuisance parameter, parallel to C.V, and will be discussed in more detail below. Nevertheless, there are many functional classes that satisfy this condition. Examples include the smooth function class in Example 19.9 of van der Vaart (1998) for sufficiently smooth functions and sufficiently small tail probabilities. Uniform consistency of the ADRF is summarized in the following corollary. The intuition of this result is that under AC.I and AC.II we are able to check conditions C.I-C.V.
Proof. See the Appendix.
For the uniform consistency of the QDRF over $t \in \mathcal{T}$, consider the following conditions.
QC.I Uniformly in $t$, the density $f_{Y(t)}(y)$ is bounded above and $f_{Y(t)}(q_0^\tau(t)) > 0$. Also, for any …
QC.III The function class $\{\psi_{2,\pi,t} : \pi \in \Pi_\delta, t \in \mathcal{T}\}$ is Glivenko-Cantelli and has an integrable envelope $F_2(y)$.
QC.I is a standard identification condition in the quantile regression literature. It allows one to verify C.I. It is similar to A.2-A.3 of Angrist, Chernozhukov, and Fernandez-Val (2006), and corresponds to Assumption 2 of Firpo (2007). QC.II imposes boundedness on the joint density of $(U, T)$, analogous to Assumption 1(ii) of Firpo (2007). QC.III is weaker than AC.II, since $\tau - 1\{\cdot\}$ is uniformly bounded. Those two conditions are imposed for estimation of the nuisance parameter. Under QC.I-QC.III, we can check the general conditions C.I-C.V. Now we provide uniform consistency of the QDRF.
Proof. See the Appendix.

Weak Convergence
Now we derive the limiting distribution of the general two-step estimator in (3). We impose the following sufficient conditions. As an application of the general results, we later establish the results for the ADRF and QDRF and check the general conditions in these examples.
In G.I, $r_n$ is a random sequence referring to a possible nonparametric estimation of $\pi_0$ in the first step, for example, a bandwidth parameter in kernel estimation. Note that when $\pi_0$ is estimated parametrically in the first step, we have $r_n \equiv 1$. When $\pi_0$ is estimated nonparametrically, $r_n$ converges to 0 and, therefore, the second term disappears. Assumption G.I defines the Z-estimator. This type of $o_p(n^{-1/2})$ condition is slightly stronger than C.I but still allows the right-hand side to be zero only approximately. G.II requires the model to be differentiable in $\beta$ and the derivative to be invertible. G.I and G.II are also assumed in Theorem 3.3 of Pakes and Pollard (1989). G.III corresponds to Assumption 6(b) of Cattaneo (2010). G.I-G.III are high level and we provide concrete examples using the ADRF and QDRF applications. G.IV strengthens C.IV such that the estimator of the nuisance parameter converges at a rate faster than $n^{-1/4}$. A similar assumption appears in Chen, Linton, and Van Keilegom (2003). G.V and G.VI are high level conditions, similar to Cattaneo (2010), and will be discussed below in the estimation of $\pi_0$. Now we present the weak convergence result.
Proof. See the Appendix.
The result in Theorem 3 shows that the limiting distribution of the two-step DRF estimator is nonstandard (a Gaussian process). This result is due to the presence of the set of continuous treatments. However, if one fixes the treatment at $t$, then the limiting distribution collapses to a simple normal distribution. In spite of this, below we provide inference methods for the DRF over the set of treatments that are simple to implement in practice. Nevertheless, Theorem 3 is an important step to rigorously establish the inference procedure.
This result has significant practical applications, for instance in the ADRF and QDRF examples. In these cases, the functional forms of $m(\cdot)$ are given and, hence, we can check the conditions in Theorem 3. We show that both ADRF and QDRF satisfy G.I-G.III; they are verified in Corollaries 3 and 4. G.IV-G.VI refer to the estimation of the nuisance parameter $\pi_0$. For the weak convergence of the ADRF, we impose the following assumptions.

AG.I The parameter space for
AG.I is standard and requires the parameter space to be bounded. Also the second moment of $Y(t)$ is bounded, which is used in Hirano, Imbens, and Ridder (2003) and Cattaneo (2010). Many functional classes satisfy AG.II, for example, the smooth class discussed above. AG.III is a high level condition on the nuisance parameter, and we provide estimators that satisfy this condition below.
Corollary 3 (Average). The two-step estimator of the ADRF is $\sqrt{n}\,r_n$-consistent and converges weakly in $\ell^\infty(\mathcal{T})$, provided conditions AG.I-AG.III and G.IV hold.
Proof. See the Appendix.
To obtain the weak convergence of the QDRF, we impose the following assumptions.
Examples satisfying QG.II include smooth function classes. QG.I is a high level condition and will be discussed in the section on the estimation of $\pi_0$. This assumption is similar to AG.III and a version of G.VI. Now we state the weak convergence of the QDRF.
Proof. See the Appendix.

Estimation of π 0
We have been assuming that the estimator $\hat\pi$ of the nuisance parameter $\pi_0$ satisfies various conditions. In this section, we discuss the estimation of $\pi_0$, that is, $\pi_0(u; t) = f_{T|X,Y}(t|x, y) / f_{T|X}(t|x)$. The estimation of the nuisance parameter in the first step is very important for practical implementation of the proposed methods.
3.3.1 Nonparametric Estimation of $\pi_0$. A potential nonparametric estimator for the nuisance parameter is the kernel estimator. Although $\pi_0$ is the ratio of two densities, in fact the estimation is about the density of $w \equiv (x, y, t)$. Under some assumptions described below, Giné and Nickl (2008), in their Proposition 4, showed that the kernel estimator converges weakly. Consider the following assumptions.
Proof. See the Appendix.

NP.I The density function
Although the nonparametric estimation of $\pi_0$ is appealing from the theoretical point of view, there are issues with its practical implementation. First, in most empirical applications the number of variables in $X$ used to satisfy the ignorability condition is relatively large. However, the dimension of $X$ has an adverse effect on nonparametric methods due to the curse of dimensionality. Hence, practical estimation might be infeasible. Second, as Theorem 3 shows, when using nonparametric estimation in the first step, the convergence rate of the estimator is slower relative to parametric estimation. In addition, in the TE examples, for a formal verification of G.VI, parametric estimation in the first step is required. Therefore, there are compelling reasons to use flexible parametric models to estimate the ratio of the conditional density functions, $\pi_0$. In the next subsection, we describe such estimators.
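For readers who want to see what a nonparametric first step involves, the Python sketch below forms the ratio from four kernel density estimates, using $f_{T|X,Y}(t|x,y) / f_{T|X}(t|x) = f_{X,Y,T}\, f_X / (f_{X,Y}\, f_{X,T})$. It assumes a scalar $X$, a product Gaussian kernel, and one common bandwidth `h`; these choices, and the function names, are illustrative assumptions and not the article's kernel estimator.

```python
import math
import random


def kde(point, sample, h):
    """Product Gaussian-kernel density estimate at `point` (sample: list of tuples)."""
    d = len(point)
    norm = (h * math.sqrt(2.0 * math.pi)) ** d
    total = 0.0
    for obs in sample:
        total += math.exp(-0.5 * sum(((p - o) / h) ** 2 for p, o in zip(point, obs)))
    return total / (len(sample) * norm)


def pi_hat_kernel(x0, y0, t0, xs, ys, ts, h=0.5):
    """Kernel plug-in for f(t0 | x0, y0) / f(t0 | x0) with scalar X (assumption)."""
    f_xyt = kde((x0, y0, t0), list(zip(xs, ys, ts)), h)
    f_xy = kde((x0, y0), list(zip(xs, ys)), h)
    f_xt = kde((x0, t0), list(zip(xs, ts)), h)
    f_x = kde((x0,), [(x,) for x in xs], h)
    return f_xyt * f_x / (f_xy * f_xt)


# Toy data with mutually independent components, so the true ratio equals 1.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(600)]
ys = [rng.gauss(0.0, 1.0) for _ in range(600)]
ts = [rng.gauss(0.0, 1.0) for _ in range(600)]
ratio_at_origin = pi_hat_kernel(0.0, 0.0, 0.0, xs, ys, ts)
```

Even in this three-dimensional toy case the estimate is noisy, and the noise grows quickly with the dimension of $X$, which is the practical concern raised in this subsection.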

3.3.2 Parametric Estimation of $\pi_0$. It is common in the literature to estimate nuisance parameters in two-step estimators using parametric models, see, for example, Murphy and Topel (1985), Newey and McFadden (1994), Chernozhukov and Hong (2002), Hirano and Imbens (2004), and Wei and Carroll (2009). We follow this literature and also propose a flexible parametric approach. For estimators of $\pi_0$ to have the desirable properties, we impose the following assumptions.
P.I Assume $\pi = \pi(u; t; \vartheta)$, where $\vartheta \in \mathbb{R}^{d_\vartheta}$ with $d_\vartheta$ being a positive integer. $\pi(u; t; \vartheta)$ is a smooth function of $\vartheta$ with uniformly continuous, bounded, and square integrable first derivative with respect to $\vartheta$.
Condition P.I is a smoothness and boundedness condition on the function to be estimated. P.II assumes there is an estimator of the parameter that is asymptotically normal.
Proof. See the Appendix.
We provide examples to illustrate the estimation of π 0 in practice. These examples satisfy P.I-P.II and are a direct application of Proposition 2 such that the high level conditions C.IV-C.V and G.IV-G.VI are automatically satisfied.
is some known function, and $b$ is an unknown parameter to be estimated. Using nonlinear least squares we obtain estimates of the conditional mean and variance, and therefore, the conditional density of $T$ given $X$ and $Y$. Similarly, we estimate $f_{T|X}(t|x)$.
Similarly, we estimate $f_{T|X}(t|x)$.
Example 7. A simple approach to estimate $\pi_0$ is to assume that $(t, x, y)$ follows a known multivariate distribution, for instance a normal distribution. Then MLE can be applied, $f_{T,X,Y}(t, x, y)$ calculated, and $\hat\pi$ obtained.
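A minimal Python sketch of the normal case follows. It assumes a scalar $X$ and joint normality of $(T, X, Y)$, under which both $T \mid X, Y$ and $T \mid X$ are Gaussian with linear conditional means, so each conditional density can be fitted by least squares with a residual variance. This route is equivalent to working from the joint normal fit but is written here for brevity; the function name `fit_pi_normal` and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_pi_normal(t, x, y):
    """Return pi_hat(x0, y0, t0) = f(t0 | x0, y0) / f(t0 | x0) under joint normality."""
    n = len(t)
    mt, mx, my = fmean(t), fmean(x), fmean(y)
    tc = [a - mt for a in t]
    xc = [a - mx for a in x]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    syy = sum(a * a for a in yc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxt = sum(a * b for a, b in zip(xc, tc))
    syt = sum(a * b for a, b in zip(yc, tc))

    # T | X: linear mean, residual standard deviation s1.
    b_x = sxt / sxx
    s1 = math.sqrt(sum((c - b_x * a) ** 2 for a, c in zip(xc, tc)) / n)

    # T | X, Y: solve the 2x2 normal equations for the centered regressors.
    det = sxx * syy - sxy ** 2
    c_x = (sxt * syy - syt * sxy) / det
    c_y = (syt * sxx - sxt * sxy) / det
    s2 = math.sqrt(
        sum((c - c_x * a - c_y * b) ** 2 for a, b, c in zip(xc, yc, tc)) / n
    )

    def pi_hat(x0, y0, t0):
        num = NormalDist(mt + c_x * (x0 - mx) + c_y * (y0 - my), s2).pdf(t0)
        den = NormalDist(mt + b_x * (x0 - mx), s1).pdf(t0)
        return num / den

    return pi_hat


# Simulated check: Y does not depend on T given X, so the true ratio is 1.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
t = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]
pi_hat = fit_pi_normal(t, x, y)
value_at_origin = pi_hat(0.0, 0.0, 0.0)
```

The fitted function can be evaluated at each observation $(X_i, Y_i)$ and each dose $t$ on a grid to produce the weights used in the second step.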

INFERENCE ON THE DRF
In this section, we turn our attention to inference on the DRF. Important questions posed in the econometric and statistical literatures concern the nature of the impact of a policy intervention or treatment on the outcome distributions of interest, for example, whether a policy exerts a significant effect, a constant versus heterogeneous effect, or a nondecreasing effect. Thus, we consider the following general null hypothesis, $H_{01}: \beta(t) = r(t)$ uniformly in $t \in \mathcal{T}$, where $r(t)$ is assumed to be known, continuous in $t$ over $\mathcal{T}$, and $r \in \ell^\infty(\mathcal{T})$. Inference is carried out uniformly over the set of treatment levels, $\mathcal{T}$. The basic inference process is $V_n(t) := \sqrt{n}\,r_n(\hat\beta(t) - r(t))$. General hypotheses on the vector $\beta(t)$ can be accommodated through functions of $V_n(\cdot)$. We consider the Kolmogorov and Cramér-von Mises-type test statistics, $T_n = f(V_n(\cdot))$, where $f(\cdot)$ represents the functionals for those two test statistics, as $T_{1n} := \sup_{t \in \mathcal{T}} |V_n(t)|$ and $T_{2n} := \int_{\mathcal{T}} V_n(t)^2\,dt$. These statistics and their associated limiting theory provide a natural foundation for testing the null hypothesis. It is possible to formulate a wide variety of tests using variants of the proposed tests. The following are examples of hypotheses that may be considered.
Example 8 (The hypothesis of a significant effect). A basic hypothesis is that the treatment impact summarized by β(t) is ineffective for all doses, that is, statistically equal to zero for all t ∈ T . The alternative is that the treatment differs from zero at least for some t ∈ T . In this case, r(t) = 0.
Example 9 (The hypothesis of a monotone nondecreasing effect). The test of the nondecreasing dose-response hypothesis involves the composite null $\beta(t) \ge 0$, for all $t \in \mathcal{T}$, versus the alternative of $\beta(t) < 0$, for some $t \in \mathcal{T}$. In this case, the least favorable null involves $r(t) = 0$.
Now we present the limiting distributions of the test statistics under the null hypothesis. From Theorem 3 and under the null hypothesis ($H_{01}: \beta(t) = r(t)$), it follows that $\sqrt{n}\,r_n(\hat\beta(t) - r(t)) \rightsquigarrow \mathbb{G}(t)$. Thus, the following corollary summarizes the limiting distributions.
Corollary 5. Assume the conditions of Theorem 3. Under $H_{01}: \beta_0(t) = r(t)$, as $n \to \infty$, $T_n = f(V_n(\cdot)) \rightsquigarrow f(\mathbb{G}(\cdot))$.
Proof. The assertion holds by Theorem 3 and the continuous mapping theorem.
In addition to testing the hypothesis β(t) = r(t) with known r ∈ C(T ), we could also test the hypothesis with unknown r, in which case, the estimation of r is needed.
Example 10 (The hypothesis of a constant effect vs. heterogeneous effects). An important hypothesis is whether the treatment impact does not vary across the dose, that is, $\beta(t) = \beta$ for some $\beta$ for all $t$. In this case, $r(t) = \beta$, and one can estimate this component by an average of $\hat\beta(t)$ over $\mathcal{T}$. The alternative is the hypothesis of a heterogeneous effect, that is, $\beta(t)$ varies across $t$. This can also be interpreted as a test for asymmetry of the dose-response function.
Now we display the limiting distributions of the test statistics under the null hypothesis.
Corollary 6. Assume the conditions of Theorem 3. Under $H_{02}: \beta_0(t) = r(t)$, as $n \to \infty$,
Proof. The assertion holds by Theorem 3 and the continuous mapping theorem.
The weak limits in Corollaries 5 and 6 are not standard. Therefore, to make practical inference we suggest the use of bootstrap techniques to approximate the limiting distribution.
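The statistics themselves are simple to compute once the DRF has been estimated on a grid of doses. The Python sketch below is written under assumptions: the inference process is taken to be $\sqrt{n}\,r_n(\hat\beta(t) - r(t))$, the Kolmogorov-type statistic is the maximum of its absolute value over the grid, and the Cramér-von Mises-type statistic is a Riemann sum of its square over an equally spaced grid. The function names and the numbers are hypothetical.

```python
import math


def process_values(beta_hat, r, n, r_n=1.0):
    """V_n(t) on the grid: sqrt(n) * r_n * (beta_hat(t) - r(t)) (assumed scaling)."""
    return [math.sqrt(n) * r_n * (b - ri) for b, ri in zip(beta_hat, r)]


def kolmogorov_stat(v):
    """Maximum absolute value of V_n over the grid."""
    return max(abs(a) for a in v)


def cramer_von_mises_stat(v, grid):
    """Riemann-sum approximation of the integral of V_n(t)^2 (equally spaced grid)."""
    step = grid[1] - grid[0]
    return sum(a * a for a in v) * step


# Hypothetical estimates of the DRF on a coarse grid, with n = 100 observations.
grid = [0.0, 0.25, 0.5, 0.75]
beta_hat = [0.1, -0.2, 0.0, 0.3]
n = 100

# Known r: test of no effect, r(t) = 0 for all t.
v_known = process_values(beta_hat, [0.0] * len(grid), n)
ks_known = kolmogorov_stat(v_known)
cvm_known = cramer_von_mises_stat(v_known, grid)

# Unknown constant r: replace r(t) by the grid average of beta_hat.
r_bar = sum(beta_hat) / len(beta_hat)
v_const = process_values(beta_hat, [r_bar] * len(grid), n)
ks_const = kolmogorov_stat(v_const)
```

With a parametric first step one would set `r_n=1.0`, as in the default above.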

Implementation of Testing Procedures
Implementation of the proposed tests in practice is simple. To test $H_{01}$ with known $r(t)$, one needs to compute the test statistic $T_{1n}$ or $T_{2n}$. Analogously, to test $H_{02}$ one computes the analogous statistics $T_{1n}$ or $T_{2n}$ based on the estimated $r(t)$. The steps for implementing the tests are as follows.
First, the estimates of $\beta(t)$ are computed by solving the problem in Equation (3). For special cases of the DRF, such as the ADRF and QDRF, one replaces (3) with (4) and (5), respectively. Second, $V_n$ is calculated by centering $\hat\beta(t)$ at $r(t)$, and $T_{1n}$ or $T_{2n}$ is computed by taking the maximum over $t$ ($T_{1n}$) or summing over $t$ ($T_{2n}$). For the general case, $H_{02}$ with unknown $r(t)$, the tests are computed in the same fashion. The only adjustment is the use of the estimated $r(t)$ to compute $V_n$. Third, after obtaining the test statistic, it is necessary to compute the critical values. We propose the following scheme. We use the test statistic $T_{1n}$ as an example, but the procedure is the same for the other cases. Take $B$ as a large integer. For each $b = 1, \dots, B$, a bootstrap statistic $T^b_{1n}$ is computed. Let $c^B_{1-\alpha}$ denote the empirical $(1 - \alpha)$-quantile of the simulated sample $\{T^1_{1n}, \dots, T^B_{1n}\}$, where $\alpha \in (0, 1)$ is the nominal size. We reject the null hypothesis if $T_{1n}$ is larger than $c^B_{1-\alpha}$. In practice, the maximum in Step (iii) is taken over a discretized subset of $\mathcal{T}$.
Proof. This theorem is a restatement of Lemma 3 in the supplemental appendix.
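The critical-value computation can be sketched in Python as follows. This is a generic nonparametric (resampling with replacement) bootstrap written under assumptions, not a transcription of the article's algorithm: each bootstrap statistic is taken to be the maximum over the grid of $|\sqrt{n}(\hat\beta^*(t) - \hat\beta(t))|$, with $r_n = 1$, and the user supplies an `estimate(sample, grid)` function that reruns both estimation steps. All names are hypothetical.

```python
import math
import random


def sup_stat(v):
    """Kolmogorov-type functional: maximum absolute value over the grid."""
    return max(abs(a) for a in v)


def bootstrap_critical_value(data, estimate, grid, B=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha)-quantile of B bootstrap sup-statistics."""
    rng = random.Random(seed)
    n = len(data)
    beta_hat = estimate(data, grid)
    stats = []
    for _ in range(B):
        # Resample observations with replacement and re-estimate the DRF.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        beta_star = estimate(resample, grid)
        centered = [math.sqrt(n) * (bs - bh) for bs, bh in zip(beta_star, beta_hat)]
        stats.append(sup_stat(centered))
    stats.sort()
    k = min(B - 1, max(0, math.ceil((1.0 - alpha) * B) - 1))
    return stats[k]


# Toy usage: the "estimator" is just the sample mean, repeated at every grid point.
def toy_estimate(sample, grid):
    mean = sum(sample) / len(sample)
    return [mean for _ in grid]


toy_data = [float(v) for v in range(20)]
crit = bootstrap_critical_value(toy_data, toy_estimate, [0.0, 0.5, 1.0], B=50, alpha=0.1, seed=3)
```

The null hypothesis is rejected when the observed statistic exceeds `crit`. Passing a fixed `seed` makes the critical value reproducible across runs.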
Theorem 4 establishes the consistency of the bootstrap procedure. It is important to highlight the connection between this result and the previous section. In fact, Theorem 4 shows that the limiting distribution of the bootstrap estimator is the same as that of Theorem 3, and hence the above resampling scheme is able to mimic the asymptotic distribution of interest.
] converges weakly to a tight random element $\mathbb{G}$ in L in $P^*$-probability.
Corollary 7 (Average). Under AG.IB-AG.IIIB and G.IV with "in probability" replaced by "almost surely," the bootstrap estimator of the ADRF is $\sqrt{n}\,r_n$-consistent and converges weakly in $\ell^\infty(\mathcal{T})$.

Proof. It is an application of Theorem 4 and parallel to that of weak convergence of μ(t).
Consider the following conditions for QDRF.
Proof. It is an application of Theorem 4 and parallel to that of weak convergence of q τ (t).
As a remark, note that given the above framework, inference on the TE is simple. For example, consider inference on the QTE from treatment level $t_1$ to $t_2$. The point estimate is $\hat q(t_2) - \hat q(t_1)$, which is asymptotically normal and centered at $q(t_2) - q(t_1)$, and its variance is computable from the covariance kernel of the weak limit of $\hat q(\cdot)$. In a related work, He, Hsu, and Hu (2010) developed a testing procedure to test the hypothesis of no TE against a class of alternatives where the two outcome distributions differ only or mainly in the right tail.
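As a rough illustration of this remark, the variance of the difference of two estimates follows from their variances and covariance, which would be read off the (estimated) covariance kernel at the two treatment levels. The sketch below assumes those three quantities are already available on the scale of the estimator; the function name and the numbers are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist

def te_interval(est_t1, est_t2, var_t1, var_t2, cov_12, level=0.90):
    """Normal-approximation interval for est_t2 - est_t1.

    var_t1, var_t2, cov_12 are assumed to be the variances and covariance of
    the two point estimates (covariance kernel evaluated at (t1, t2))."""
    diff = est_t2 - est_t1
    var_diff = var_t1 + var_t2 - 2.0 * cov_12  # Var(A - B) = Var A + Var B - 2 Cov
    se = math.sqrt(var_diff)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal critical value
    return diff, se, (diff - z * se, diff + z * se)

# Hypothetical inputs, for illustration only.
diff, se, (lower, upper) = te_interval(3.20, 3.50, 0.010, 0.010, 0.002)
```

A positive covariance between the two estimates shrinks the variance of the difference, which is why the kernel's off-diagonal term should not be ignored.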

MONTE CARLO
We conduct numerical experiments to assess the bias, size, and power properties of the proposed methods in finite samples. The data-generating process has treatment level $t \in [0, 1]$, with N(0, 1). The treatment assignment is generated by $T_i = \arg\max_{t \in \{0, 0.01, \ldots, 0.99, 1\}} H_{t,i}$, where $H_{t,i} = \sin(2\pi t) X_i + \epsilon_i(t)$ with independent innovations $(\epsilon_i(0), \epsilon_i(0.1), \ldots, \epsilon_i(1)) \sim N(0, 1)$. Thus, a sample of $n$ iid random elements $(X_i, \epsilon_i(t), v_i(t))$ whose components are mutually independent is generated. We set $\gamma = 0$ under the null hypothesis. We examine the bias and the empirical rejection frequencies of 5% nominal level tests for different choices of sample size $n = \{250, 500, 1000\}$, number of bootstrap replications (250, 500), and parameter $\gamma = \{0, 0.05, 0.10\}$. We use the Cramér-von Mises test for the simulations. For completeness, we report results for both average and median estimates, and for both parametric and nonparametric estimation of $\pi_0$. In the former case, we use a normal distribution to estimate the densities. In the latter case, we employ the nonparametric conditional density estimator of Hall, Racine, and Li (2004). The number of replications is 2000. Moreover, for comparison we report results for Cattaneo's (2010) estimators. Although Cattaneo's (2010) methods are ...
We first report results on bias in Table 1. The bias of the proposed estimators is defined as the supremum of the pointwise biases. As expected, the results show that the bias for the mean and the median is approximately zero for both parametric and nonparametric estimation of the nuisance parameter in the first step. In addition, the bias decreases as the sample size increases. On the other hand, Cattaneo's (2010) methods show some bias in small samples.
Now we present the empirical size and power. Table 2 collects the results for the mean and the median under parametric estimation of the nuisance parameter, and Table 3 under nonparametric estimation. For comparison, the results for Cattaneo (2010) are in Table 4.
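The treatment-assignment step of this design can be sketched as follows. Only the assignment rule is covered, since the outcome equation is not restated here. Drawing an independent standard normal innovation at every point of the 0.01-spaced grid is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def simulate_treatment(n, rng):
    """Draw X_i ~ N(0, 1) and assign T_i = argmax_t H_{t,i} over a grid,
    with H_{t,i} = sin(2*pi*t) * X_i + eps_i(t)."""
    grid = np.linspace(0.0, 1.0, 101)          # {0, 0.01, ..., 0.99, 1}
    x = rng.standard_normal(n)
    eps = rng.standard_normal((n, grid.size))  # assumed: independent across t and i
    h = np.sin(2.0 * np.pi * grid)[None, :] * x[:, None] + eps
    t = grid[np.argmax(h, axis=1)]             # treatment level maximizing H_{t,i}
    return x, t

x, t = simulate_treatment(500, np.random.default_rng(42))
```

Because the coefficient on $X_i$ varies with $t$ through $\sin(2\pi t)$, the assigned treatment depends on the covariate, which is what makes the first-step adjustment necessary in the simulated data.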
In Table 2, we observe that the empirical sizes (γ = 0) are close to the nominal 5% level, and the size improves with the sample size. The power is already high for small deviations from the null, γ = 0.05, and it improves substantially for γ = 0.10. We also study the impact of the sample size and the number of bootstrap replications. The power increases with the sample size but is not sensitive to the number of bootstrap replications, implying that a smaller number of replications is satisfactory. Overall, Table 2 shows that the uniform tests are quite powerful and have small size distortions even in small samples. The results show that the use of parametric estimation in the first step leads to reliable, powerful, and computationally attractive inference.
The results for the nonparametric estimation are in Table 3 and show empirical size (γ = 0) slightly smaller than the nominal 5%. Thus, the parametric estimation controls the size better than the nonparametric estimation. As in the previous case, the empirical power is high, even for small deviations from the null (γ = 0.05), and it increases with the sample size, but it is not very sensitive to the number of bootstrap replications. The simulations provide numerical evidence that, in practice, although one is able to use nonparametric estimation in the first step, parametric methods are preferable and perform very well in finite samples.
Finally, Table 4 collects the results for Cattaneo's (2010) methods. The results show that there are large size distortions when using Cattaneo (2010) to analyze continuous data. In addition, our proposed methods substantially outperform Cattaneo's in terms of power. There is a significant loss of power when using a methodology that is not designed to handle continuous treatments. These results are in fact expected, since it is known from the literature that categorizing continuous treatments generally leads to a number of serious problems. Discretizing continuous outcomes leads to loss of power in testing, misclassification (which is associated with potential bias), problems for prediction, and even difficulties in interpreting the results and coefficients of interest (see, e.g., Cox 1957; Cohen 1983; van Belle 2008; Fedorov, Mannino, and Zhang 2009).
Overall, the results suggest that the proposed methods have good finite sample performance. Our main proposal, the uniform test, in addition to having considerably better power than other methods, makes the bootstrap method a practical inference procedure.

Figure 1. Mother's weight gain during pregnancy and level of birthweight (in kilograms). The horizontal lines represent the high and low birthweight, respectively. The solid curve is the average and the dashed curves are the 90%, 75%, 50%, 25%, and 10% quantiles of birthweights.

APPLICATION: DOSE-BIRTHWEIGHT FUNCTIONS
We illustrate the use of the two-step estimator with a study of dose-birthweight functions. We estimate the unconditional average and quantile dose-birthweight functions for both mothers' weight gain during pregnancy and mothers' age. Recently, birthweight has been shown to be the foremost indicator of infant health. In addition, unhealthy births have large economic costs, both in immediate medical costs and in longer-term care costs.
Infants are classified as low birthweight (LBW) when weighing less than 2.5 kilograms (kg) at birth. There is empirical evidence showing that LBW is associated with several health problems for the baby, and the direct medical costs of LBW are quite high. Almond, Chay, and Lee (2005) documented that the hospital costs for newborns are elevated. The expected costs of delivery and initial care of a baby weighing 1 kg at birth can exceed $100,000 (in year 2000 dollars). The costs remain elevated even among babies weighing 2-2.1 kg. The infant mortality rate also increases at lower birthweights. On the other hand, problems associated with high birthweight have become more widely recognized. For instance, babies weighing more than 4 kg (defined as high birthweight (HBW)) and especially those weighing more than 4.5 kg (classified as very high birthweight) are more likely to require cesarean-section births, have higher infant mortality rates, and develop health problems later in life. Recent research also suggests that giving birth to infants over 4.5 kg carries significant risks to both the infant and the mother; see, for example, Cesur and Kelly (2010). Fetal disorders such as shoulder dystocia, stillbirth, Erb's palsy, jaundice, and respiratory distress have been found to be more common in HBW infants, in addition to greater levels of obesity later in life.

Data
The data in this study are from the 2004 public use natality data of Wisconsin from the National Vital Statistics System of the Centers for Disease Control and Prevention. We consider only live, singleton births (without missing values of any used characteristics) to new, white mothers who are not older than 45, with less than 5 years of college, and whose counties of occurrence (birthing) and residence are the same. By using a more homogeneous sample, we focus on the effects of birth inputs on the birthweights. This results in a sample of 13,581 births. We emphasize that the inferences using the sample should be applicable only to the subpopulation represented by the sample choice. Table 5 displays the descriptive statistics for birthweight (measured in kg), the mother's age, the mother's weight gain during pregnancy (WG), number of cigarettes per day (Cigarettes), number of prenatal care visits (No. care), and the mother's years of education for the sample. Out of 13,581 births, 6508 are female (proportion 0.4792), and 7732 mothers are married (proportion 0.5693).

Estimation of Nuisance Parameter π 0
The estimation strategy for the nuisance parameter $\pi_0(u; t)$ entering (4) and (5) is the same for both the effects of the mother's age and weight gain during pregnancy. We describe the details of the estimation procedure using the mother's age as an example.
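The mothers' ages in our sample range from 14 to 45 years old. Therefore, it is natural to treat the mother's age as a continuous variable in the interval [13, 46]. For the estimation of the conditional density, we assume that $\log\left(\frac{T_i - 13}{46 - T_i}\right) = X_i \theta_0 + \epsilon_i$, where $\epsilon_i$ is independent of $X_i$ and has density $N(0, \sigma_0^2)$. The choice of $X$ is discussed below. The log-ratio form of the dependent variable restricts mothers' age to [13, 46]. Hence, $\frac{T_i - 13}{46 - T_i} =: \eta_i$ is log-normal, $\log\text{-}N(X_i \theta_0, \sigma_0^2)$. The density of $T|X$ is obtained by differentiating the distribution function $F_{0T|X}(t|x) = F_{\eta|X}\left(\frac{t-13}{46-t} \,\middle|\, x\right) = \Phi\left(\frac{\log\frac{t-13}{46-t} - x\theta_0}{\sigma_0}\right)$, which gives $f_{0T|X}(t|x) = \frac{33}{(t-13)(46-t)} \, \frac{1}{\sigma_0} \, \phi\left(\frac{\log\frac{t-13}{46-t} - x\theta_0}{\sigma_0}\right)$, where $\Phi$ and $\phi$ are the distribution and density of a standard normal. For the conditional density of $T|X, Y$, we assume that $\log\left(\frac{T_i - 13}{46 - T_i}\right) = U_i \vartheta_0 + \varepsilon_i$, where $\varepsilon_i$ is independent of $U_i$ and has density $N(0, \varsigma_0)$. Thus, the conditional density of $T|X, Y$ is obtained in the same way.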
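The conditional density implied by the log-ratio normal specification can be evaluated directly by a change of variables. The sketch below is illustrative only: the function and argument names are hypothetical, `mean` stands for the fitted linear index (for example $x\theta_0$), `sd` for the innovation standard deviation, and the parameter values in the check are made up.

```python
import math
import numpy as np

LOW, HIGH = 13.0, 46.0  # bounds of the treatment interval used in the text

def conditional_density(t, mean, sd):
    """Density of T on (LOW, HIGH) when log((T - LOW) / (HIGH - T)) ~ N(mean, sd^2)."""
    t = np.asarray(t, dtype=float)
    z = (np.log((t - LOW) / (HIGH - t)) - mean) / sd
    # Jacobian of the log-ratio transform: d/dt log((t - LOW) / (HIGH - t))
    jac = (HIGH - LOW) / ((t - LOW) * (HIGH - t))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return phi * jac / sd

# Numerical check with made-up parameters: the density should integrate to one.
cells = 200_000
dx = (HIGH - LOW) / cells
midpoints = LOW + dx * (np.arange(cells) + 0.5)
total = float(np.sum(conditional_density(midpoints, mean=0.2, sd=0.8)) * dx)
```

In an application, `mean` and `sd` would be replaced by first-step estimates, and the ratio of two such densities (one conditioning on the covariates only, one also on the outcome) would supply the estimated nuisance function.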

Mother's Weight Gain During Pregnancy.
In the study of the mother's weight gain and age, we control for characteristics of mothers, that is, marital status, years of education, and number of cigarettes per day during pregnancy. It is important to note that although we are controlling for some characteristics of mothers, we are estimating the unconditional DRF. The results regarding the mother's weight gain during pregnancy show evidence that, after controlling for a mother's characteristics, larger weight gain during pregnancy in general leads to higher birthweight. Just as with the estimation of the unconditional treatment effects (TE) of the mother's age, we are estimating the unconditional TE of the mother's weight gain during pregnancy. Figure 1 reports the estimates of the average and selected quantiles of the birthweight for different levels of the mother's weight gain during pregnancy. From the figure, we see that the slopes are relatively larger for low or high weight gain. The shape of the curves resembles a simple cubic function with steeper slopes at the extremes. This implies that additional weight gain generates larger increases in birthweight at low and high levels of weight gain. For low weight gains, the impact on the birthweight is higher for upper quantiles and relatively mild for low quantiles. However, for the middle range of weight gain, all the curves are relatively parallel. The disaggregated plots with 90% confidence bands are shown in Figure 2. The confidence bands are in general wider at the extremes of weight gain because the data are sparse there. In addition, Table 6 describes the TE for selected weight gains, reporting 20-pound effects. The results show that the impact of gaining weight is positive and statistically significant.

Mother's Age.
The empirical results display negative effects of the mother's age on birthweight, after controlling for a mother's characteristics (i.e., age, marital status, years of education, number of cigarettes per day, and the month of first prenatal care visit). As expected, the birthweight decreases as age increases. Figure 3 plots the point estimates for the mean, 10%, 90%, and the three quartiles of birthweights for mothers' ages from 14 to 45. The impact of age on birthweight is negative for all the quantiles as well. However, for a given age, the impact becomes more severe for lower parts of the distribution of birthweight. In particular, the impact is very prominent for the 10% quantile for mothers older than 40. From the estimated curves with confidence bands (Figure 4), one can see that the 90% confidence bands are narrower in the middle ages because there are more data for that age range. In contrast, the confidence interval for the 10% quantile at the age of 45 is relatively wide. Table 7 describes the TE for selected ages, reporting 5-year effects. Most of them are statistically significant, and the negative values show evidence that aging is negatively related to birthweight.
The results also show evidence of certain risks for birthweight when the mother's age is at the sample boundary. Although on average the birthweight is within the "healthy range," between 2.5 and 4 kg, estimates show that mothers younger than 20 years are likely to have HBW infants, while mothers older than 44 years are likely to have LBW infants.

CONCLUSION
In this article, we first study the identification of the DRF with continuous treatment levels. In empirical studies, we usually have observational data. Agents can choose the levels of treatment they desire. Under the ignorability assumption, we derive moment conditions that are identified by observational data. Based on the moment conditions, we propose a two-step estimator. Throughout the article, we use the important examples of average and quantile DRF as applications of the general theory. Sufficient conditions are provided for establishing the asymptotic properties of the estimator. We develop uniform inference for the two-step estimator. More specifically, we are interested in testing the null hypothesis that a DRF β(t) = r(t) with t ∈ T for known or unknown r(t). Because the parameters are of infinite dimension and the weak limits of test statistics are not standard, we use a bootstrap method when conducting practical inference. Finally, we apply the methods to the study of unconditional effects of mother's age and weight gain during pregnancy on infants' birthweight, illustrating the usefulness of the new approach. An important topic for future research includes further analysis of potential nonparametric estimation in the first-step for the continuous treatment effects models.

APPENDIX: PROOFS
This appendix collects the proofs of the results given in the text.
Proof. Fixing t = t 0 , by law of iterated expectations, E[m(Y (t 0 ); β(t 0 ))] = E{E[m(Y (t 0 ); β(t 0 ))|X]}. For the conditional expectation, where the first equality is by condition I.II.1, the second equation is by the fact that if T = t 0 , then Y = Y (t 0 ), and the third equality is by condition I.III.2. Moreover, we have By law of total expectation the right-hand side of Equation (A.1) equals where F T |X denotes the conditional distribution function of T given X.
Noting that where 1 and 2 are fixed numbers in [0,1]. The second equality is by mean value theorems for differentiation and integration. And the third equality is by condition I.III.1 and dominated convergence theorem.
To demonstrate Theorems 2, 3, and 4, we make use of Lemmas 1, 2, and 3, respectively, given in the online supplemental appendix. These lemmas establish, respectively, uniform consistency, weak convergence, and validity of the bootstrap for generic Z-estimators with possibly nonsmooth functions and a nuisance parameter, when both the parameter of interest and the nuisance parameter are possibly infinite dimensional. The results allow for the case of a profiled nonparametric estimator, that is, one that depends on the parameters.
Condition C.II holds by condition QC.I. Condition C.III is satisfied because $\tau - 1\{y < q_\tau(t)\}$ is a bounded function. Condition C.V is implied by the fact that the function class $\{\psi_{1q,t} : q \in \ell^{\infty}(\mathcal{T}), t \in \mathcal{T}\}$ is Glivenko-Cantelli because it is a Vapnik-Červonenkis class and by condition QC.III. Hence, all the conditions of Theorem 2 are satisfied.