Nonparametric Specification Testing of Conditional Asset Pricing Models

Abstract This article presents an adaptive omnibus specification test of asset pricing models where the stochastic discount factor is conditionally affine in the pricing factors. These models provide constraints that conditional moments of returns and pricing factors must satisfy, but most of them do not provide information on the functional form of those moments. Our test is robust to functional form misspecification, and also detects any relationship between pricing errors and conditioning variables. We give special emphasis to the test implementation and calibration, and extensive simulation studies demonstrate that it works well in practice. Our empirical applications show a conditional counterpart of a well-known problem of unconditional models. The lack of rejection of consumption based conditional models seems to be due to a poor conditional correlation between consumption and stock returns.


Introduction
Asset pricing theory must consider that asset returns and pricing factors are predictable, in the sense of a significant time variation in their joint conditional distribution. For this purpose, conditional asset pricing models provide constraints that conditional moments of returns and pricing factors must satisfy. However, most of these models do not provide much information on the functional form of those conditional moments. This article presents an adaptive omnibus specification test for conditional asset pricing models based on kernel smoothing. In this way, our test is robust to functional form misspecification of both conditional moments and prices of risk, and has power against any relationship between pricing errors and conditioning variables.
There is a vast literature that uses parametric methods to evaluate conditional asset pricing models, and concludes that conditioning information substantially improves their empirical performance. For instance, Jagannathan and Wang (1996) found that the conditional capital asset pricing model (CAPM) can explain the cross section of stock returns even though the static CAPM cannot, while Lettau and Ludvigson (2001b) found that the value premium can be explained by a consumption CAPM (CCAPM) with a time-varying price of risk. Other authors suspected that this may be an illusion caused by low statistical power of standard asset pricing tests, see, for example, Lewellen and Nagel (2006).
In this line, there have been significant contributions that avoid specifying the conditional distribution of returns and factors by using semi-parametric techniques. Nagel and Singleton (2011) estimated nonparametrically the first and second conditional moments, but worked with asset pricing models where the prices of risk are parametric. Local moment-based methods have also been used in empirical asset pricing by Gagliardini, Gourieroux, and Renault (2011), Antoine, Proulx, and Renault (2018), and Gagliardini and Ronchetti (2020). Nonparametric prices of risk have been considered by Wang (2003) and Roussanov (2014) using kernel methods, and their work can be interpreted in terms of two stages. In the first stage, they estimate the first and second moments of returns and factors by kernel methods, and use those moments to estimate the prices of risk. However, in the second stage, they use parametric methods to test the asset pricing model. We start from their estimators of prices of risk, but we develop a formal nonparametric test of the model instead.
In particular, Wang (2003) used nonparametric methods to evaluate conditional variants of the CAPM and the Fama-French (FF) model. He used the stochastic discount factor (SDF) approach to first estimate the prices of risk nonparametrically, and then tested whether pricing errors are linear in the conditioning variables. This is easy to implement but has zero power against pricing errors that exhibit nonlinear dependence on the conditioning variables. We introduce a test that is general enough to detect arbitrary nonparametric dependence of pricing errors on the conditioning variables. Roussanov (2014) estimated nonparametric prices of risk by minimizing a quadratic form of conditional pricing errors, which nests the prices of risk in Wang (2003), with a focus on linearized consumption based models. However, he did not develop a formal test of the asset pricing model. Instead, he tested whether a particular pricing error has a zero average, and displayed figures of pricing errors with pointwise bootstrap bands. We derive a formal test and develop the asymptotic theory for it. Moreover, our test considers jointly all the pricing errors, and does not focus on unconditional means. Cai, Ren, and Sun (2015) developed the asymptotic distribution of a locally linear estimator of the market price of risk in the CAPM, but they do not provide a formal test of this model.
In econometrics and statistics, there exists a plethora of specification tests of parametric models against a broad set of alternatives. Two fundamental approaches have mainly been considered: statistics that appear as a weighted average of the residuals, and statistics that compare parametric and nonparametric fits (see Hart 1997). We follow the second alternative, along the lines of Härdle and Mammen (1993). Their test is based on an integrated squared difference between the two estimated curves, where the parametric curve is also smoothed. They show that standard bootstrap does not work for this test, whereas wild bootstrap does.
Our test is not a trivial application of this test to asset pricing models. Our test allows richer data dynamics, but more importantly our null hypothesis is nonparametric because a complex structural semiparametric model explains the conditional risk premia of excess returns. So the null contains an asset pricing model in which both the risks, or factor sensitivities, and the prices of risk are nonparametric functions of some given observables, requiring several non- and semiparametric estimation steps. The general problem of nonparametrically testing against structural non- or semiparametric alternatives was studied, among others, by Rodriguez-Poo, Sperlich, and Vieu (2015). While our null hypothesis and estimators are much more complex, we can partly follow their construction of a test statistic and its calibration to achieve reasonable power while holding the level, even for small samples.
We study two empirical applications. In the first one, we work with traded factors and monthly data. Both the conditional CAPM and the FF model are rejected with size and book-to-market sorted portfolios, even though the latter model yields lower pricing errors. In the second application, we work with a mixture of traded and non-traded factors, and quarterly data. We find strong empirical evidence against the conditional CAPM. However, we do not find evidence against zero pricing errors when we test a linearized CCAPM with consumption growth as the only pricing factor. In fact, in the case of a CCAPM with Epstein-Zin preferences, where both the market portfolio and consumption growth are considered, the market price of risk seems to be zero.
Importantly, low pricing errors with the consumption factor do not necessarily mean that this is an economically meaningful pricing factor. The lack of rejection seems to be due to a poor conditional correlation between consumption and the cross section of stock returns. We find a conditional counterpart of a well-known problem with unconditional asset pricing models. Kan and Zhang (1999) argued that some empirical models rely on useless factors with a zero unconditional covariance with returns. Recently, the empirical relevance of this problematic case has been emphasized by Peñaranda and Sentana (2015), Burnside (2016), Bryzgalova (2016), Gospodinov, Kan, and Robotti (2017), and Kleibergen and Zhan (2020).
We focus our analysis on asset pricing models whose SDF is conditionally affine in the pricing factors, but we are flexible in the conditional moments of returns and factors, and the prices of risk of such an SDF. There are earlier examples of different flexible SDF methods in empirical asset pricing. Bansal and Viswanathan (1993) did not start from a particular asset pricing model, instead they only impose that the SDF depends on some variables, and are flexible in that dependence. Other flexible approaches are Aït-Sahalia and Lo (1998) and Rosenberg and Engle (2002), who focused on option pricing and the corresponding state-price density or SDF projection, respectively, as a function of the underlying asset's payoffs.
The rest of the article is organized as follows. Section 2 reviews the estimation of conditional asset pricing models. Section 3 develops our test, and the corresponding asymptotic behavior. Section 4 describes the implementation and calibration of the test, jointly with a Monte Carlo exercise, while Section 5 reports our applications. Finally, Section 6 concludes. Proofs and auxiliary results are deferred to appendices that are available online.

Conditional Asset Pricing Models
This section describes the nonparametric estimation of prices of risk along the lines of Wang (2003) and Roussanov (2014), who represent a conditional asset pricing model by means of its SDF. The convenience of the SDF for presenting a general theory of asset pricing is well recognized, see Cochrane (2005).

Prices of Risk
The investment set is given by N assets, with a vector of excess returns r_{t+1} that is known at time t + 1. The information set known at t is given by a d × 1 vector z_t of conditioning variables that measure the state of the economy. For instance, the excess returns could be associated with stock portfolios, and a conditioning variable could be the default spread, which is high during recessions. Standard arguments such as the lack of arbitrage opportunities, or the first-order conditions of a representative investor, imply the pricing conditions

E(m_{t+1} r_{t+1} | z_t) = 0 (1)

for some random variable m_{t+1}, called the SDF, which discounts uncertain pay-offs in such a way that their expected discounted value equals their cost. Appendix E introduces a riskless asset in the investment set, and describes its SDF implications. The usual approach in empirical finance is to model the SDF as an affine transformation of some K < N observable risk factors f_{t+1}, such as the return on the market portfolio or aggregate consumption growth. There are two common variants to express that SDF: the uncentered SDF, which is affine in the factors themselves,

m_{t+1} = 1 − δ(z_t)' f_{t+1}, (2)

and the centered SDF, which is affine in the factors in deviation from their conditional means, with prices of risk τ(z_t). Wang (2003) used the former variant, while Roussanov (2014) used the latter. In what follows we focus on the uncentered SDF, as there are relevant situations for which this variant exists but the centered SDF does not, as we find in our empirical application. Appendix C develops the centered SDF and the relationship between the two variants.
The conditional extensions of the CAPM and the FF model are well-known examples of conditionally affine SDFs, see Wang (2003) and the references therein. In the former model, the excess return on the market portfolio is the only pricing factor, while the latter model has two additional factors associated with portfolios that capture size and value effects, denoted SMB (long/short in small/large capitalization stocks) and HML (long/short in high/low book-to-market stocks), respectively. These two models rely on traded factors only, and hence the SDF pricing conditions follow directly from the conditional mean-variance efficiency of a portfolio of the factors. But we can also use similar pricing conditions with models that include non-traded factors, such as linear approximations to the CCAPM and its extensions, which are often used in empirical studies, see Roussanov (2014) and the references therein. The linearized version of the canonical CCAPM with power utility has a single pricing factor, the aggregate consumption growth, and the price of risk is constant and equal to the relative risk aversion. Lettau and Ludvigson (2001b) motivated a time-varying price of risk by means of a linearized version of the habit model of Campbell and Cochrane (1999). The CCAPM with Epstein-Zin preferences identifies both the market return and consumption growth as pricing factors.
The vectors δ(z_t) and τ(z_t) represent the prices of risk, that is, the risk premium that is obtained per unit of risk, where risk is measured as sensitivity with respect to the factors. In the simple setting of a single-factor SDF in Equation (2), we can rewrite the pricing conditions (1) as all excess returns satisfying E(r_{i,t+1} | z_t) = δ(z_t) E(r_{i,t+1} f_{t+1} | z_t). If the factor is traded, then the factor f_{t+1} itself should satisfy the pricing conditions when the model holds, and we can interpret the price of risk as

δ(z_t) = E(f_{t+1} | z_t) / E(f_{t+1}^2 | z_t), (4)

where the factor's risk is its second moment. If the factor is non-traded, we still have a similar interpretation of risk premium per unit of risk when the asset pricing model holds, but in terms of the corresponding factor mimicking portfolio (the projection of the factor on the returns). In empirical work, the prices of risk are usually chosen to minimize some metric of pricing errors. Let us denote the vector of risk premia and, respectively, the risks or factor sensitivities

μ(z_t) = E(r_{t+1} | z_t) and D(z_t) = E(r_{t+1} f_{t+1}' | z_t). (5)

Given some prices of risk δ(z_t), the model pricing errors are

e(z_t; δ(z_t)) = μ(z_t) − D(z_t) δ(z_t), (6)

and the particular vector δ(z_t) can be chosen by minimizing the quadratic form

q(z_t; δ(z_t)) = e(z_t; δ(z_t))' W(z_t) e(z_t; δ(z_t)) (7)

for some weighting matrix W(z_t) that may depend on z_t, but not on δ(z_t). To keep our notation simple, we also denote by δ(z_t) the particular prices of risk that minimize the criterion q(z_t; δ(z_t)). The first-order conditions that pin down these prices of risk are

D(z_t)' W(z_t) e(z_t; δ(z_t)) = 0, (8)

which are equivalent to the exact pricing of the portfolios with excess returns D(z_t)' W(z_t) r_{t+1}, and we obtain

δ(z_t) = (D(z_t)' W(z_t) D(z_t))^{-1} D(z_t)' W(z_t) μ(z_t). (9)

The chosen metric of pricing errors is irrelevant under the null hypothesis, when there is a vector of risk prices that makes the pricing errors equal to zero. However, under the alternative hypothesis different weighting matrices pin down different prices of risk.
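To fix ideas, the weighted least-squares solution of the quadratic criterion can be sketched numerically. The snippet below is a minimal illustration, not the authors' code; the function name and toy data are ours, and the conditional moments μ(z), D(z), W(z) at a given state z are taken as given.

```python
import numpy as np

def prices_of_risk(mu, D, W):
    """Weighted least-squares prices of risk:
    delta = (D' W D)^{-1} D' W mu, minimizing e' W e with e = mu - D delta."""
    A = D.T @ W @ D
    delta = np.linalg.solve(A, D.T @ W @ mu)
    e = mu - D @ delta  # pricing errors e(z; delta(z))
    return delta, e

# Toy check: when mu lies in the column span of D (the null holds at this z),
# the minimizing delta recovers the true prices of risk and the errors vanish.
rng = np.random.default_rng(0)
N, K = 10, 2
D = rng.standard_normal((N, K))
true_delta = np.array([0.5, -0.2])
mu = D @ true_delta
delta, e = prices_of_risk(mu, D, np.eye(N))
```

Different choices of W change delta only when the pricing errors cannot be driven to zero, mirroring the remark that the metric is irrelevant under the null but matters under the alternative.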
Two widely used choices of W(z_t) are the identity matrix, which makes the criterion (7) equal to the sum of squared pricing errors, and the inverse of the conditional second-moment matrix of returns,

W(z_t) = E(r_{t+1} r_{t+1}' | z_t)^{-1}, (10)

which makes the criterion (7) equal to the conditional counterpart of the squared Hansen and Jagannathan (1997) distance (HJD). These authors showed that the mean-square distance between the SDF implied by a model and the set of SDFs with zero pricing errors is equal to a quadratic form in the model pricing errors, with this weighting matrix. The HJD is widely used in empirical finance, see Gagliardini and Ronchetti (2020) and the references therein. On the other hand, in the simpler setting of a parametric SDF, we could think of optimal IV like Nagel and Singleton (2011), and weight the pricing conditions with the inverse of E(m_{t+1}^2 r_{t+1} r_{t+1}' | z_t) evaluated at the true SDF m_{t+1} to optimize the large-sample properties of the estimators.
The HJD was developed to compare competing models, but the choice of weighting matrix (10) is also useful to interpret the prices of risk. This choice makes the first-order conditions (8) equivalent to the exact pricing of the factor mimicking portfolios, and the corresponding prices of risk equal the risk premia of those portfolios per unit of risk. That is, we obtain prices of risk as in Equation (4), even if the asset pricing model does not hold. In fact, the choice of prices of risk in Wang (2003) with traded factors is equivalent to the choice of the weighting matrix (10), even though he follows mean-variance arguments without an explicit weighting matrix. In the case of traded factors, the pricing conditions (1) should also include f_{t+1}, pricing the vector x_{t+1} = (r_{t+1}', f_{t+1}')' instead of only r_{t+1}. We can show that, if we use W(z_t) = E(x_{t+1} x_{t+1}' | z_t)^{-1} to weight the pricing errors of x_{t+1}, then the corresponding first-order conditions (8) yield Wang's prices of risk.

SDF Estimation
A fully parametric prespecified SDF would inherit all problems of the various potential misspecifications. We relax such a framework, which actually requires a series of different nonparametric (pre-)estimates that entail the typical issues like bandwidth choice, boundary problems, or bias reduction. Practically, we have to estimate μ(z) in Equation (5), and D(z) in Equation (5), and maybe also W(z), to estimate the prices of risk δ(z) in Equation (9). Our estimation strategy is the following. We estimate the standard regression problem μ(z) in Equation (5) by local linear kernel smoothing with bandwidth h_1. In this way, we nest linear regression models also for finite samples, see Fan and Gijbels (1995). We denote μ_{h1}(z) the corresponding estimator. For the second-order moments D(z) in Equation (5), and potentially W(z) in Equation (10), we follow Yin et al. (2010), using a local constant kernel regression with bandwidth h_2. We denote D_{h2}(z) and W_{h2}(z) the corresponding estimators. In particular, for a kernel function K(·), all proposed estimators are the respective minimizer b_0 of the local least-squares criterion

Σ_{t=1}^{T} { y_{t+1} − b_0 − b_1'(z_t − z) }^2 K((z_t − z)/h_l).

For instance, for y_{t+1} = r_{1,t+1} and h_l = h_1, b_0 gives the first element of the vector estimate μ_{h1}(z). For making b_0 a Nadaraya-Watson estimate of matrix element D_{i,j}(z), set b_1 := 0, h_l = h_2, and y_{t+1} = r_{i,t+1} f_{j,t+1}. Analogously, we can obtain an estimate for E(r_{t+1} r_{t+1}' | z_t). The estimators and bandwidths have to fulfill conditions (C.1), (C.2), (A.6), (A.7), and those mentioned in Theorem 1 of Section 3.2. To satisfy the bandwidth conditions, higher dimensional z may require higher order kernels K(·). In our simulations and applications, d is 1, so that we can take the quartic kernel. Bandwidths h_1 and h_2 should be chosen to calibrate the test, and therefore differently from Wang (2003) or Roussanov (2014). See Section 4 for the corresponding details of our data adaptive bandwidths.
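The two kernel smoothers just described can be sketched as follows for d = 1 with the quartic kernel. This is a minimal illustration under our own naming; it omits the data-adaptive bandwidth choice of Section 4 and any boundary or bias corrections.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel, support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def local_linear(z_obs, y, z, h):
    """Local linear estimate of E[y | z] at the point z (bandwidth h1, for mu)."""
    w = quartic((z_obs - z) / h)
    X = np.column_stack([np.ones_like(z_obs), z_obs - z])
    XtW = X.T * w
    b = np.linalg.solve(XtW @ X, XtW @ y)
    return b[0]  # the intercept b0 is the fit at z

def local_constant(z_obs, y, z, h):
    """Nadaraya-Watson estimate (set b1 := 0; bandwidth h2, for second moments)."""
    w = quartic((z_obs - z) / h)
    return np.sum(w * y) / np.sum(w)

# Toy check on a linear conditional mean, which local linear fits without bias,
# illustrating that linear regression models are nested in finite samples.
rng = np.random.default_rng(1)
z_obs = rng.uniform(-1.0, 1.0, 2000)
y = 1.0 + 2.0 * z_obs + 0.1 * rng.standard_normal(2000)
mu_hat = local_linear(z_obs, y, 0.0, 0.3)
```

In the paper's setting, `y` would be an excess return for the first moments, or a product of a return and a factor for the second moments.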
Given the previous estimators, the estimator of the prices of risk (9) is

δ̂(z_t) = (D_{h2}(z_t)' W_{h2}(z_t) D_{h2}(z_t))^{-1} D_{h2}(z_t)' W_{h2}(z_t) μ_{h1}(z_t). (12)

This estimator depends on bandwidths (h_1, h_2) through the estimator of the first moments μ_{h1}(z_t) and the second moments D_{h2}(z_t). The matrix W_{h2}(z_t) may not require a bandwidth, as it could be a known matrix, or even a constant matrix that does not depend on z_t.
These prices of risk can be interpreted as the outcome of a two-pass procedure where, first, the conditional moments μ(z_t) and D(z_t) are estimated from time series data, and, second, the prices of risk are obtained from cross-sectional data, with a weighted least-squares projection of μ_{h1}(z_t) on D_{h2}(z_t). This interpretation is similar in spirit to Fama-MacBeth regressions, which are an alternative to SDF methods that may also allow for time variation in risk and risk premia, see Cochrane (2005) for instance.

Omnibus Specification Test
We introduce a test for conditional asset pricing models, addressing the two-dimensional nonparametric nature of such a test: the SDF that represents the model, and the pricing conditions that represent the null hypothesis.

Test Statistic
The main problem in nonparametric testing is to (i) estimate the distribution of the test statistic under the null, and to (ii) calibrate this distribution estimate for a given significance level to guarantee that the practitioner can control the error of the first type, that is, the rejection rate when the null hypothesis is correct. Finally, it has to be checked that (iii) this calibrated version exhibits some power against all interesting alternatives. Out of a huge set of theoretically valid test statistics we have to find one for which we can provide solutions to these points (i) to (iii).
From the pricing conditions (1) and an uncentered SDF (2), the null hypothesis of a correctly specified asset pricing model is

H_0 : e(z; δ(z)) = 0 for all z ∈ Z,

or equivalently, by Equation (6), μ(z) = D(z) δ(z) for all z ∈ Z. Therefore, our statistic must check whether the pricing errors are significantly different from zero, or any other given constant. This can be done by a statistic that integrates the quadratic form in Equation (7),

S = ∫ e(z; δ(z))' W(z) e(z; δ(z)) π_1(z) dz, (15)

where π_1(z) is a given weight function, typically used for trimming. Under the null hypothesis, S is small, while it is large under the alternative. Note that, while it is theoretically possible to choose a weighting matrix W(z) in the test statistic (15) that is different from that in the criterion function (7) that pins down the prices of risk, in practice we recommend the same choice. The reason is that W(z) defines the metric of pricing errors that we are interested in, and this metric should drive both the choice of prices of risk and the test of the model. In other words, we should compare the null model to a nonparametric alternative under the same metric as the one defining the null model estimate. Note further that the standard arguments from parametric inference for efficient estimation, and those for standardization to obtain chi-squared tests, do not directly apply to the analysis of nonparametric functions. Therefore, we favor the choice of W(z) in terms of the interpretation of the corresponding metric of pricing errors and the implied prices of risk. For instance, by choosing Equation (10), our statistic integrates the squared conditional HJD, and the prices of risk are given by the pricing of the factor mimicking portfolios.
To obtain a sample analogue of Equation (15), we start from the standard approach used in specification testing for regression functions. We look at the residuals under the null model,

u_{t+1} = r_{t+1} − D(z_t) δ(z_t). (16)

Note that E(u_{t+1} | z_t) = e(z_t; δ(z_t)) = 0 under the null. Let us define the residuals

û_{t+1} = r_{t+1} − D_{h2}(z_t) δ̂(z_t),

where δ̂(z_t) is the nonparametric estimator (12) of the prices of risk. These residuals depend on bandwidth h_2 through D_{h2}(z_t), while δ̂(z_t) adds the bandwidth h_1.
With K(·) being a standard d-variate kernel function and h a bandwidth, we obtain a sample counterpart Ŝ_h of Equation (15), in which π_2(z) corresponds to π_1(z) p^2(z). Once again, the matrix W_{h2}(z_t) may not require a bandwidth. This test statistic can be interpreted as a standard approach in nonparametric specification testing, based on the difference between a direct estimator of μ(z) and an estimator of the null D(z) δ(z) convolved with the same smoother as the raw output r_{t+1}. Note that this statistic considers û_{t+1} only at the values of z_t within the support of p(·), and pays substantially less attention to areas where the data are sparse. This not only simplifies the theoretical derivations, but also makes the statistic stable in practice.
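As a rough illustration of a smoothed-residual statistic of this type, the sketch below kernel-averages the residuals at each observed state and then averages the weighted quadratic form, so that density weighting arises implicitly. It is our own simplification (constant W, no trimming function, d = 1), not the paper's exact statistic.

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def smoothed_residual_statistic(z_obs, u_hat, W, h):
    """Kernel-average the null residuals u_hat (T x N) at every observed state z_t,
    then average the weighted quadratic form m(z_t)' W m(z_t) over t."""
    T = len(z_obs)
    Kmat = quartic((z_obs[:, None] - z_obs[None, :]) / h)  # T x T kernel weights
    M = (Kmat @ u_hat) / (T * h)                           # smoothed residuals m(z_t)
    return float(np.mean(np.einsum("ti,ij,tj->t", M, W, M)))

# Under the null the residuals have zero conditional mean and the statistic is
# small; residuals that depend on the state variable inflate it.
rng = np.random.default_rng(2)
T, N = 400, 3
z_obs = rng.uniform(-1.0, 1.0, T)
u_null = 0.1 * rng.standard_normal((T, N))
u_alt = u_null + np.column_stack([z_obs, z_obs, z_obs])
s_null = smoothed_residual_statistic(z_obs, u_null, np.eye(N), 0.3)
s_alt = smoothed_residual_statistic(z_obs, u_alt, np.eye(N), 0.3)
```

Because the statistic is a sum of squares of smoothed residuals, it is nonnegative by construction and grows with any systematic relationship between residuals and states.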
The statistic Ŝ_h depends directly on h, but also indirectly on (h_1, h_2) through û_{t+1}. As explained in Sperlich (2014), (h_1, h_2) have to be chosen in a way that the test is calibrated, that is, does not over- (or too much under-) reject under the null. We maximize the power along h by looking at a studentized version of Ŝ_h, recentered and scaled by estimates of its mean and standard deviation under H_0, for any h and given (h_1, h_2). As bootstrap is used in Section 4 to approximate the distribution of our statistic under H_0, the resulting bootstrap samples can also be used for estimating these quantities. In sum, we have a test statistic which is quite attractive, with a natural interpretation, an automatic procedure for maximizing power, and two calibration parameters, (h_1, h_2). We review previous approaches to testing asset pricing models with nonparametric prices of risk in Appendix B.

Asymptotic Behavior
In this section, we show the asymptotic normal distribution of the (recentered and scaled) test statistic under the null hypothesis, as well as under the local alternatives to the null hypothesis for which the test is consistent.
The level and power of our test depend on the properties of μ_{h1}(z) and D_{h2}(z), which appear in the test through the prices of risk δ̂(z) defined in Equation (12). Therefore, we first analyze the properties of this estimator, and afterwards the asymptotic properties of the test. To do this, we introduce two different sets of assumptions. The first one is introduced to obtain the asymptotic properties of δ̂(z), whereas the second set of assumptions is introduced to derive the asymptotic properties of our test statistic. Specifically, we assume the following:

(C.1) Let Z be a compact subset of R^d and let μ_{h1}(z) be an estimator of μ(z) at observation points z such that p(z) > 0, with sup_{z∈Z} |μ_{h1}(z) − μ(z)| = O_p(h_1^{α_1} + (log T / (T h_1^d))^{1/2}) for some α_1 > 0, as T tends to infinity.

(C.2) Let D_{h2}(z) be an estimator of D(z) at observation points z such that p(z) > 0, with sup_{z∈Z} |D_{h2}(z) − D(z)| = O_p(h_2^{α_2} + (log T / (T h_2^d))^{1/2}) for some α_2 > 0, as T tends to infinity.
The assumptions above give uniform asymptotic bounds for μ_{h1}(z) and D_{h2}(z). They are fulfilled by many of the existing nonparametric estimators of μ(z) and D(z), including our proposals in Section 2.2. As is well known, these bounds depend on the bandwidth (either h_1 or h_2), the dimensionality d, and a parameter (α_1 or α_2) that is related to the smoothness of μ(z) and D(z), respectively. Specifically, it is implicitly assumed that the α_1-th derivatives of μ are Lipschitz continuous, and analogously the α_2-th derivatives of D. In many articles dealing with nonparametric estimators, conditions (C.1) and (C.2) are shown to hold for estimators that rely on rather primitive conditions. For example, for a local linear estimator of μ(z), Masry (1996) showed under some mild regularity conditions a similar bound to the one in (C.1). In the case of a local constant estimator of D(z), Yin et al. (2010) showed a similar bound to the one in (C.2). In sum, we have introduced (C.1) and (C.2) for the sake of generality, that is, to make clear that our test is valid for a broad range of nonparametric estimators of μ(z) and D(z). The next conditions are needed for identification:

(C.3)-(C.4) The elements of W(z), D(z), and μ(z), at observation points z such that p(z) > 0, fulfill boundedness conditions on the first and second order moments, and D(z)' W(z) D(z) is nonsingular at those points, so that the prices of risk (9) are well defined.

(C.5) At observation points z such that p(z) > 0, the weighting matrix W(z) is bounded from above and below.
Conditions (C.3) and (C.4) are standard bounds on first and second order moment conditions. Note also that the matrix W(z) needs to be bounded at values of z such that p(z) > 0. These conditions are similar to the assumptions imposed in Hansen (2008) to show weak uniform convergence for general nonparametric estimators. Now we can state that the estimator of the prices of risk satisfies the following:

Theorem 1. Let Z be a compact subset of R^d and assume conditions (C.1) to (C.5) hold. Then, as κ_T^{-1} h_1 and κ_T^{-1} h_2 tend to zero in such a way that κ_T^2 T h_1^d → ∞ and κ_T^2 T h_2^d → ∞, we have that, under the null hypothesis, sup_{z∈Z} |δ̂(z) − δ(z)| = O_p(κ_T) as T tends to infinity.
Next, we introduce the assumptions for deriving the asymptotic properties of the test statistic Ŝ_h.
(A.1) The process (z_t, r_t) is absolutely regular, that is, β(s) = sup_t E[ sup { |P(A | F_1^t) − P(A)| : A ∈ F_{t+s}^∞ } ] → 0 as s → ∞, where F_t^s is the σ-field generated by {(z_k, r_k) : k = t, ..., s}. Furthermore, it is assumed that β(s) exhibits a geometric rate of decay.

(A.2) z_t has a bounded density function p(z_t). The joint density of distinct elements of (z_1, r_1, z_s, r_s, z_t, r_t), (t > s > 1), is continuous and bounded by a constant independent of s and t. Furthermore, p(z_t) is twice continuously differentiable in all its arguments.
(A.3) E(ε_{i,t+1} | z_t) = 0 and the conditional moments of ε_{i,t+1} given z_t are suitably bounded, for all t and i = 1, ..., N. Furthermore, for all i ≠ j and t ≠ s, E(ε_{i,t+1} ε_{j,s+1} | z_t, z_s) = 0. Finally, E(|μ(z_t)|^{16} + |r_{t+1}|^{16}) < ∞.

(A.4) K(·) is a product kernel, that is, K(z) = ∏_{i=1}^{d} k(z_i), where k(·) is a symmetric density function with bounded support in R and |k(z_1) − k(z_2)| ≤ c|z_1 − z_2| for all (z_1, z_2) in its support.

There are many examples of absolutely regular processes in discrete and continuous time; see Bradley (1986) and Doukhan (1994). Further examples are renewal processes and solutions of stochastic differential equations; see Heinrich (1992) and Veretennikov (1987), respectively. Assumption (A.1) is needed to obtain the asymptotic distribution of the test statistic under the null. More precisely, this assumption is necessary to apply a central limit theorem for U-statistics under dependence. Several CLTs under dependence are available in the literature, see among others Hjellvik, Yao, and Tjøstheim (1996), Fan and Li (1999), and Gao and Hong (2008). Assumption (A.2) specifies that the density p(z) is bounded, while (A.3) controls the tail behavior of the conditional expectation of the errors ε_t. We further need to make assumptions about the structure of the matrix E(ε_{t+1} ε_{s+1}' | z_t, z_s) for t ≠ s. Assumption (A.4) on the kernel is also standard in nonparametrics. Finally, (A.5)-(A.7) relate all bandwidths included in the procedure. In particular, (A.5) is concerned with the rate of the bandwidth of the test. In fact, for bandwidths (h_1, h_2) we assume rates that are arbitrarily close to the optimal rates for nonparametric estimation (minimizing the average mean squared error), and for bandwidth h somewhat faster, in accordance with nonparametric testing theory. While it generally cannot be directly compared to optimal rates for nonparametric testing when the null is fully parametric, it is interesting to note that it is close to the lower bound of optimal bandwidths found by Horowitz and Spokoiny (2001).
More specifically, for κ_T going arbitrarily slowly to infinity, (A.5) to (A.7) imply that (h_1, h_2) have rates slower than or equal to T^{-(1-ε)/(3d)} but faster than T^{-1/(d+2α_j)}, j = 1, 2, while h simply follows (A.5). Roughly, α_1 and α_2 should not be smaller than the dimension d, which reflects the well-known curse of dimensionality in nonparametric analysis.
Theorem 2. Under conditions (C.1) to (C.5) and (A.1) to (A.7), and if the null hypothesis H_0 is true, then the recentered and scaled statistic Th^{d/2} Ŝ_h converges in distribution to a normal random variable as T tends to infinity.
For nonparametric methods, it is well known that, while the first-order approximations can be quite useful to understand the theoretical properties, they are less helpful for the empirical analysis. Typically, large samples are required before higher order terms become negligible; the same holds for the distributional approximation by normality. Furthermore, the asymptotic bias and variance have to be estimated nonparametrically which is not trivial but has an important impact on the test performance. All this makes the use of a bootstrap procedure recommendable in order to approximate the distribution of the test under the null.
Before we turn to the bootstrap and practical implementation issues, we study the power of our test against local alternatives to the null hypothesis. Let us define the sequence of local alternatives to the null hypothesis

H_{1T} : e(z; δ(z)) = γ_T Δ(z), (22)

where γ_T is a sequence that tends to zero such that γ_T^2 T h^{d/2} → ∞, as T tends to infinity, and Δ(z) is a bounded function (uniformly in z) that fulfills

(H.2) At observation points z such that p(z) > 0, we assume that sup_{z∈Z} |W(z) Δ(z)| < ∞.
Assumption (H.1) is related to the bias problem mentioned above. Combined with (A.5), it ensures that the smoothing bias is asymptotically negligible relative to the local alternatives. The power of our test is described in the following result.

Theorem 3. Under conditions (C.1) to (C.5), (A.1) to (A.7), (H.1), and (H.2), and under the sequence of local alternatives (22), P(Th^{d/2} Ŝ_h > c_T) → 1 as T tends to infinity, where c_T denotes the critical value of the test.
Theorem 3 indicates that our test has nontrivial power only against sequences of local alternatives for which γ_T tends to zero at a rate slower than T^{-1/2}. As shown in Andrews (1997), tests based on weighted parametric residuals have nontrivial power against local alternatives for which the rate is exactly T^{-1/2}. Thus, at least in terms of asymptotic local power, those tests appear to dominate tests that require slower rates. However, as shown in Horowitz and Spokoiny (2001), at the exact rate T^{-1/2}, no test can have nontrivial power uniformly over reasonable classes of functions Δ(·) in (22).

Implementation and Calibration
The test distribution is estimated using wild bootstrap. We study different bootstrap procedures for our statistic, and show the consistency of the implemented procedure. Calibration and power are obtained by a proper combination of the bandwidth choices. Importantly, we develop a feasible testing procedure even though three bandwidths are required: two for the estimation of the first and second conditional moments, and a third one for the test statistic.

Wild Bootstrap-Type Resampling
Härdle and Mammen (1993) studied three different bootstrap procedures and concluded that wild bootstrap is the most pertinent method for testing the regression structure. Therefore, we adopt a wild bootstrap scheme to estimate the distribution of our statistic. Note that Ŝ_h can be written as a quadratic form in û_{t+1} = (I − P̂) r_{t+1}, where I is the identity matrix and P̂ is built from H_{h1}, the so-called hat or smoothing matrix of the kernel estimator used when estimating μ(·). Following Härdle and Mammen (1993), we generate B bootstrap analogues of Ŝ_h defined exactly as above but replacing û_{t+1} by u*_{t+1} = (I − P̂) r*_{t+1}, with r*_{t+1} being generated under the null hypothesis. We may then set r*_{t+1} = D_{h2}(z_t) δ̂_b(z_t) + ε*_{t+1}, with bootstrap residuals ε*_{t+1} (see below for details) and the bootstrap bandwidth in the prices of risk δ̂_b(z_t). One may rather want to generate bootstrap returns that satisfy the null hypothesis with respect to the bootstrap estimates themselves; that is, these bootstrap excess returns are defined only implicitly. As simulations with the simplified version based on D_{h2} give no satisfying results, one needs to look for alternatives.
For example, though in a quite different testing context, Kreiss, Neumann, and Yao (2008) proposed to use only ε*_{t+1} instead of u*_{t+1} when simulating the distribution of the statistic under the null. In fact, from Equations (16) and (A.3), it is easy to see that Ŝ_h is asymptotically equivalent to a quadratic form, say S̃_h, in the innovations ε_{t+1} = r_{t+1} − μ(z_t), which are invariant under the null hypothesis. It is not hard to show that under the null hypothesis Th^{d/2}S̃_h is asymptotically normal and, more importantly, that its asymptotic distribution is the same as that of Th^{d/2}Ŝ_h. This indicates that we could mimic the distribution of Ŝ_h by bootstrapping S̃_h. The latter does not depend on the validity of the null hypothesis, although Ŝ_h does, and hence it follows the guidelines set by Hall and Wilson (1991). Kreiss, Neumann, and Yao (2008) discussed various advantages of their bootstrap testing approach compared to that of Härdle and Mammen (1993). In their simulations, however, they only considered cases where the null hypothesis is a parametric model. That is, while their findings are useful for large samples, they miss the calibration problem one faces under non- and semiparametric null hypotheses, highlighted in Sperlich (2014). For our case, this means that the impact of P̂ on the test distribution matters in finite samples. This is confirmed by simulations not shown here.
The obvious compromise is to generate bootstrap replicates of the test by substituting u*_{t+1} = (I − P̂)ε*_{t+1} for û_{t+1} in Equation (23). This can be seen either as a refinement of Kreiss, Neumann, and Yao (2008) by including P̂, or as a simplification of Härdle and Mammen (1993) by reducing r*_{t+1} to ε*_{t+1}. In all versions, the bootstrap innovations ε*_1, ..., ε*_T have to be conditionally independent given the observed data {(r_{t+1}, z_t) : 1 ≤ t ≤ T}, and must satisfy the usual wild bootstrap moment conditions, where E* denotes the expectation under the bootstrap distribution. In practice, we can define ε*_{t+1} = (r_{t+1} − μ̂_{h*_1}(z_t)) η_{t+1}, using bandwidth h*_1 (see below for more details), where η_{t+1} is a sequence of iid random variables following a standard Gaussian distribution.
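As an illustration, the wild bootstrap residual construction just described can be sketched in a few lines. This is our own minimal sketch in Python, not the authors' code; the function name and the array layout (rows are time periods, columns are assets) are our assumptions. One Gaussian multiplier η_{t+1} is drawn per time period and shared across the cross section, which preserves the conditional second-moment structure of the residuals.

```python
import numpy as np

def wild_bootstrap_residuals(r, mu_hat, rng):
    """Wild bootstrap residuals eps*_{t+1} = (r_{t+1} - mu_hat(z_t)) * eta_{t+1},
    with iid standard Gaussian multipliers eta_{t+1}.

    r, mu_hat : (T, N) arrays of excess returns and fitted conditional means.
    One multiplier per period, shared across the N assets, so the conditional
    cross-sectional covariance of the residuals is reproduced in expectation."""
    resid = np.asarray(r) - np.asarray(mu_hat)   # centered residuals
    eta = rng.standard_normal(resid.shape[0])    # iid N(0,1), one per period
    return resid * eta[:, None]
```

By construction E*[ε*_{t+1}] = 0 and E*[ε*_{t+1} ε*'_{t+1}] equals the outer product of the observed residuals, conditionally on the data.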
More specifically, we work with the bootstrap statistic Ŝ*_h, for which we can show the following bootstrap distribution.
Theorem 4. Assume that the conditions of Theorem 2 hold. For the bootstrap statistic Ŝ*_h defined in Equation (26), we have, as T tends to infinity and conditionally on {(r_{t+1}, z_t) : 1 ≤ t ≤ T}, asymptotic normality with the same variance V as given in Theorem 2. We reject H_0 if Ŝ_h > t*_α, where t*_α is the upper α-quantile of the conditional distribution of Ŝ*_h obtained via bootstrap; that is, the p-value of the test is the relative frequency of the event Ŝ*_h > Ŝ_h in the bootstrap replications. Theorem 4 thus shows that the bootstrap is asymptotically correct in the sense that its significance level converges to α as T tends to infinity.
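The decision rule just stated — reject when Ŝ_h exceeds the upper α-quantile t*_α of the bootstrap replicates, equivalently when the bootstrap p-value falls below α — can be sketched as follows. These are hypothetical helper functions of our own, not code from the article.

```python
import numpy as np

def bootstrap_pvalue(S_hat, S_boot):
    """p-value = relative frequency of the event {S*_b > S_hat}
    among the B bootstrap replications S_boot."""
    return float(np.mean(np.asarray(S_boot) > S_hat))

def reject(S_hat, S_boot, alpha=0.05):
    """Reject H0 when S_hat exceeds the upper-alpha bootstrap quantile,
    equivalently when the bootstrap p-value is below alpha."""
    return bootstrap_pvalue(S_hat, S_boot) < alpha
```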
In practice, the test performance certainly depends on the bandwidths (h_1, h_2), h, and h*_1. To understand the calibration problem in nonparametric testing, the following facts must be recognized. While bootstrap methods in nonparametric inference work quite well for variance estimation, they are much less successful at approximating the smoothing bias. Moreover, even if the variance is well approximated and the bias problem perhaps reduced, the bootstrap distribution may differ from the true one in some higher moments, so that the critical values may not be estimated very well. There exist bootstrap methods that asymptotically adapt the higher moments, but there is also an important literature showing that these methods only improve performance for either huge samples or very particular cases.
To address these problems at once, we recommend interpreting the quantiles of the distribution of Ŝ_h | H_0, say t_α, as a function of the smoothing parameters; see Sperlich (2014). Furthermore, it is recommended to maximize the power of the test along h by using the statistic S in Equation (19). Finally, we look for (h_1, h_2, h*_1) such that, for the corresponding quantile of S* | H_0, say t*_α, our test holds the nominal level α, as implemented in the next section. Here, S* is defined as S but with Ŝ_h replaced by Ŝ*_h. We do not provide an extension of Theorem 2 to S, as such a proof is purely technical and would follow steps similar to the extension in Rodriguez-Poo, Sperlich, and Vieu (2015).
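To illustrate the calibration idea, the following sketch picks, from a set of candidate bandwidth multipliers, the one whose simulated rejection rate under H_0 stays closest to the nominal level α without exceeding it. The function name, the input layout, and the tie-breaking rule are our own assumptions; the article does not prescribe this particular selection rule.

```python
import numpy as np

def calibrate_cstar(h0_pvalues_by_c, alpha=0.05):
    """Given Monte Carlo p-values simulated under H0 for each candidate
    multiplier c* (dict: c* -> array of p-values), return the c* whose
    empirical rejection rate is closest to alpha without over-rejecting."""
    best_c, best_gap = None, np.inf
    for c, pvals in h0_pvalues_by_c.items():
        rate = np.mean(np.asarray(pvals) < alpha)   # empirical level under H0
        if rate <= alpha and alpha - rate < best_gap:
            best_c, best_gap = c, alpha - rate
    return best_c
```

Candidates whose simulated level exceeds α are discarded, so the calibrated test is conservative by design when no candidate hits the level exactly.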
Note that the proposed test statistic is the best among the different alternative (consistent) tests that we studied. In fact, this statistic exhibits numerically the best performance; moreover, some alternatives turned out to be very hard to calibrate. Recall that, even though one has some freedom to choose (h_1, h_2, h*_1) in practice, it is not guaranteed that for a given sample there exists a bandwidth vector such that the test works. As outlined in Henderson and Sheehan (2018), Rodriguez-Poo, Sperlich, and Vieu (2015), and Sperlich (2014), this is a common problem entailed by facing non- or semiparametric null hypotheses.

Simulation Results
Appendix D describes the DGPs of the Monte Carlo studies in detail. We perform two studies that parallel our empirical applications: in the first, we work with traded factors and monthly data; in the second, we study a mixture of traded and non-traded factors with quarterly data. Essentially, we design the experiments around our real datasets, which are described in the next section; that is, the data-generating functions in the model are estimated from the real datasets. Therefore, the simulation study below reflects a realistic situation, and we can use its outcome for the calibration of the test when it is applied to the real data.
There are several parameters and estimators to be chosen. The conditional expectations of factors and excess returns are calculated with local linear estimators in order to eliminate biases in the linear direction and keep boundary effects small. The second moment matrices D and W (when we use the weighting matrix (10)) are estimated by a Nadaraya-Watson estimator. We use the quartic kernel throughout and run our simulations with bandwidths h_j := c_j · h^CV_j for j = 1, 2, that is, cross-validation bandwidths multiplied by constants c_j > 0. It is most convenient to simply set c_1 = c_2 = 1 and calibrate the test only along h*_1. In particularly small and/or noisy samples, one might also use these constants for further calibration.
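For concreteness, a minimal sketch of the quartic kernel and the Nadaraya-Watson estimator used for the second moments might look as follows. This is our own illustrative implementation for a univariate conditioning variable, not the authors' code.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel K(u) = (15/16)(1 - u^2)^2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nadaraya_watson(z_eval, z, y, h):
    """Nadaraya-Watson estimator of E[y | z] at the points z_eval,
    with bandwidth h and the quartic kernel."""
    w = quartic_kernel((np.asarray(z_eval)[:, None] - np.asarray(z)[None, :]) / h)
    return w @ np.asarray(y) / w.sum(axis=1)
```

The local linear estimator used for the conditional means differs only in fitting a local intercept and slope instead of a local constant, which removes the linear-direction bias mentioned above.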
For calculating our test, the integral in Equation (18) is computed using π_2 only for trimming, setting π_2 = 1 otherwise. Trimming is often done to prevent boundary effects of the nonparametric estimators from distorting the test. However, the interesting features often occur right at the boundaries of the support. Therefore, in our simulations we compare the case of no trimming with a trimming that cuts off 5% (2.5% on each side) of the most extreme values of z_t.
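The trimming device can be sketched as an indicator π_2 that discards the 5% most extreme values of z_t (2.5% on each side). This is an illustrative implementation of our own based on empirical quantiles, not the article's code.

```python
import numpy as np

def trimming_indicator(z, cut=0.025):
    """pi_2(z_t): 1 if z_t lies between the cut and (1 - cut) empirical
    quantiles of z (5% of the most extreme values removed in total for
    cut = 0.025), else 0."""
    lo, hi = np.quantile(z, [cut, 1.0 - cut])
    return ((z >= lo) & (z <= hi)).astype(float)
```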
In our first experiment, we generate time series of length T = 648, as in the real monthly dataset. In particular, we use two DGPs, both with a predictor that represents the log of the default spread and six excess returns, the first three of which represent the FF factors: (a) the CAPM, where the SDF depends only on the first FF factor (the market portfolio return); (b) the FF model, where the SDF depends on all three FF factors. The simulation was repeated 500 times under the null H_0 and 100 times under each H_1 scenario we considered. In these simulations, we used 250 bootstrap samples for estimating the p-values. It turns out that the sample is large enough for using h_j := h^CV_j, j = 1, 2, and calibrating the test over h*_1 = c* · h_1, with c* > 0 (typically ≥ 1) if needed.
Appendix D.1 shows the pricing errors under the alternative hypothesis for the two models, which correspond to γ_T times the direction functions of z in the local alternatives (22) of Theorem 3. These two components are artificial constructs used to define the local alternatives against which our test is consistent, while it is clearly consistent against global alternatives. In the simulations, we study neither the rate of convergence nor, in detail, how the power increases with sample size for a fixed alternative. Therefore, to get an idea of the direction functions in our simulations, we can simply think of the corresponding pricing errors, with γ_T ≡ 1 as sample size and bandwidths are fixed (i.e., no convergence to infinity or zero, respectively).

Table 1. CAPM and FF model with monthly data: Average p-values (av.pv) and percentages of rejections for rejection levels of α = 0.1, 0.05, and 0.01 for S with W following (10) under different calibration scenarios with and without trimming, T = 648.

In Table 1 the simulation results are summarized for the test statistic S with W following (10) in the one- and three-factor models. Of course, this is just a small extract of all the simulations we performed; for example, we also repeated the simulations with W being the identity matrix, with other bootstrap procedures, using local linear estimators for the second moments, etc. Table 1 shows that our test meets the error of the first kind α quite well, and rejects in 100% of all cases under the simulated alternatives. We also see that the test is well calibrated simply by a proper choice of c* for the bootstrap bandwidth, while taking the cross-validation bandwidths for (h_1, h_2) and selecting h along Equation (19). These findings hold for both models and independently of the trimming function π_2. What can happen if the sample size is much smaller is illustrated below.
In our second simulation study, we not only generate time series with T = 251, as in the real quarterly dataset, but also repeat our simulations with T = 500 to see how performance improves when moving from small to moderate samples. The predictor represents the log of the default spread again, and there are three excess returns that represent the three FF factors. We simulate data under two asset pricing models: (a) the CAPM, as explained above; (b) the linearized Epstein-Zin CCAPM, a two-factor model with both the market return and aggregate consumption growth. The simulation was repeated 500 times under the null H 0 and 100 times under H 1 . In these simulations we used 250 bootstrap samples for estimating the p-values.
As can be seen from Tables 2 and 3, at least when T is small, we need (c_1, c_2) for calibration. As cross-validation bandwidths are known for their tendency to undersmooth, it is not surprising that we achieve calibration with c_1, c_2 > 1. We show only results for c_1 = c_2 with c* = 1, that is, h*_1 = h_1. Not surprisingly, for testing problem (b), which contains more nonparametric function estimates, the calibration turns out to be harder. In Tables 2 and 3, the simulation results for test S refer to W following Equation (10). For the sake of brevity, we show results with T = 500 only for (b), and skip the values for the trimmed statistics. It can be seen clearly that in the two-factor model with only T = 251, the calibration does not work that well. As expected, this problem disappears for increasing samples. In any case, the low power in Table 3 is due to the calibration of H_1 to the data, where the pricing errors are low for this model for reasons explained in the empirical application.
Like in the first experiment, this is just a small extract of all the simulations we performed.

Table 3. Epstein-Zin CCAPM with quarterly data: Average p-values (av.pv) and percentages of rejections for rejection levels of α = 0.1, 0.05, and 0.01 for S with W following (10) under different calibration scenarios without trimming, comparing the performance for T = 251 and T = 500.

We obtain similar results for simulations with W being the identity matrix and/or using other conditioning variables. We also simulated a linearized CCAPM, where there is a single nontraded pricing factor, and obtained similar results. Generally, when using the same bandwidths as for the CAPM, the test becomes a bit more conservative; that is, the same bandwidths give slightly larger p-values. Therefore, in the application we use slightly larger (c_1, c_2) for the CCAPM.
Putting it all together, our test works well even for relatively small samples. For appropriately chosen parameters (c_1, c_2), the test is calibrated to hold the rejection level under the null and exhibits reasonable power.

Empirical Applications
We study two empirical applications. In the first, we have asset pricing models with traded factors and monthly data; in the second, we study a mixture of traded and nontraded factors with quarterly data. Note that our simulations can directly be used for the calibration of the tests, that is, for the bandwidth choice that guarantees no over-rejection, because the DGPs were fully based on our real data. For the first application, cf. Table 1, these are exactly the cross-validation bandwidths, that is, c_1 = c_2 = c* = 1. In the second application, we take c_1 = c_2 = 1.0 for the CAPM and 1.5 for the CCAPM, but 3.0 for the two-factor model. As a robustness check, however, we also ran the tests with slightly larger and smaller (c_1, c_2) and larger c*. The findings consistently remained the same.

Traded Factors: CAPM and FF Model
The vector of excess returns r_{t+1} is given by the six FF size and book-to-market sorted portfolios. They are constructed as the intersections of two portfolios formed on size (market equity) and three portfolios formed on the ratio of book equity to market equity. These portfolios are widely used in empirical finance, and the corresponding data are available from Kenneth French's Data Library. We subtract the risk-free return from the portfolio returns to obtain the corresponding excess returns. For z_t we alternately use two prominent predictors: the logarithm of the default spread (lds), constructed from FRED data using yields on AAA- and BAA-rated bonds, and the logarithm of the price-earnings ratio (lpe), taken from Robert Shiller's web page. We apply our test to two popular asset pricing models, or equivalently two choices of f_{t+1}. The first model is the conditional CAPM, where the excess return on the market portfolio (Mk) is the only pricing factor. The second model is the conditional FF model, where the SDF depends on the market, size (SMB), and value (HML) factors. We obtain the pricing factors from Kenneth French's Data Library; see his web page, as well as Fama and French (1993), for details.
We use monthly nominal data from 1964 to 2016 (T = 648). We show the results when z_t is lds; the performance of the models is similar if we use lpe instead, and the corresponding figures are available upon request. We find that the estimated conditional means are positive and clearly nonlinear in lds. These functions increase for most values of lds, which is expected as we can associate high values of this conditioning variable with "bad states" and low values with "good states." We also find the size and value effects: the conditional mean of small firms is higher than that of big firms, and the conditional mean of high book-to-market firms is higher than that of low book-to-market firms. Figure 1 shows the prices of risk obtained from the estimated conditional moments with W following (10). This figure and the following ones display smooth 90% point-wise confidence intervals obtained from the wild bootstrap. To better appreciate the local behavior of our estimators and confidence bands, we indicate the kernel density of the conditioning variable in all figures (scaled for visibility, not normalized). The price of risk of the CAPM is positive and increasing for most values of lds. The FF model, on the other hand, has three prices of risk, all of them positive. The market price of risk is relatively flat, but the size price of risk has a clear U-shape. The value price of risk is decreasing and becomes negative for high lds, but that region does not seem statistically significant.
The first row of Figure 2 shows the square root of the criterion function that the prices of risk minimize, and the second row the conditional mean of the associated SDFs. As discussed, the square root of the criterion function (7) can be interpreted as an HJD when we use the weighting matrix (10). The HJD has a U-shape for both models, but is clearly lower for the FF model. The SDF mean is positive across the default spread values, even though an affine SDF does not necessarily yield a positive SDF. The importance of studying the SDF mean is explained in Appendix E, where it is related to the correlation between the SDF and returns.
Finally, using 1000 bootstrap replications, the p-values of our tests are always zero for both models, whether we condition on lds or lpe, and independently of whether we apply trimming. This does not change when trying some smaller and/or larger bandwidths. Therefore, we find clear empirical evidence against both the conditional CAPM and the conditional FF model.

Traded and Nontraded Factors: CCAPM and Epstein-Zin Model
In our second application, we apply our test to models that have nontraded factors. In particular, we test three popular asset pricing models, or equivalently three choices of f t+1 : the conditional CAPM, the linearized CCAPM where the per capita consumption growth (cgp) is the only pricing factor, and the linearized CCAPM with Epstein-Zin preferences (EZ), which nests the previous two models. Consequently, this application considers a model with a traded factor, a model with a nontraded factor, and a model with the two types of factors.
As the models in this section have at most two factors, we can directly test them with the vector of returns r_{t+1} given by the three FF factors (the market, SMB, and HML), which capture the relevant properties of the FF size and book-to-market sorted portfolios. The conclusions of our analysis are similar if we use the corresponding six portfolios. We work with two prominent predictors as components of z_t: the default spread, which was also used in the previous application, and the cointegrating residual of consumption and wealth (cay). The latter predictor was developed by Lettau and Ludvigson (2001a), and we take its data from Martin Lettau's web page. We work with quarterly real data from 1951 to 2014 (T = 251). Like in the previous application, we show the results with conditioning variable lds, but we also computed results conditioning on cay. The estimated first and second moments are available from the authors. They show that the market risk premium increases with lds. This is expected as we can associate high values of lds with "bad states" and low values with "good states." In fact, the market variance also increases with lds. The size premium increases for most of the predictor values, but decreases for extreme values. The value premium increases for most of the predictor values, but it is much flatter than the other two risk premia.
Like in the previous application, we show the results when looking at the HJD, that is, choosing W as in (10), and the figures display smooth 90% point-wise confidence intervals obtained from wild bootstrap. Figure 3 shows the prices of risk that are obtained from the estimated conditional moments. Both the market and consumption factors have a positive and nonmonotonic price of risk in the CAPM and CCAPM, respectively. Nevertheless, when we join the two factors in the EZ model, the market factor does not seem to be a relevant pricing factor in the SDF. We provide a reason for such a crowding out effect below.
The first row of Figure 4 shows the HJD and the second row the conditional mean of the associated SDFs. The minimized criterion is clearly lower for the CCAPM and EZ models than for the CAPM, being close to zero for the CCAPM and EZ models at many values of lds. The SDF mean is also closer to zero for those two models in those regions, and hence those low criterion functions must be interpreted with care; see Appendix E. There we describe the problematic case of an SDF that is uncorrelated with the cross section of returns, which translates into both zero pricing errors and a zero SDF mean. In fact, the confidence intervals show that the conditional covariances of returns and consumption are not significantly different from zero. This may also explain why consumption crowds out the market in the EZ model; see the bottom row of Figure 3. Importantly, Appendix E also clarifies that introducing a riskless asset in this investment set does not fully solve the problem. Table 4 reports the p-values of our test for the three models, with and without trimming, based on 1000 bootstrap replications. The first row of Table 4 reports the results for z := lds, while the second row focuses on cay. We also computed the p-values for other admissible bandwidths, along the lines of Tables 2 and 3, obtaining essentially the same results: p-values of 0 for the CAPM, about 0.5 for the CCAPM and EZ when conditioning on lds, but around 0.1 when conditioning on cay. The trimming has no visible effect. As expected from the findings in the simulation study, larger (c_1, c_2) tend to give smaller p-values.
In sum, we find that only the CAPM is clearly rejected. Roussanov (2014) found evidence against the correct pricing of some portfolios by the CCAPM and EZ models with similar data, but using the centered SDF approach. However, finding empirical evidence against a joint hypothesis (i.e., in a multiple testing problem) is certainly harder than it is for individual testing problems. Moreover, Peñaranda and Sentana (2015) found discrepancies between the centered and uncentered SDF approaches when testing unconditional asset pricing models. In their data, this seems to be due to a poor unconditional correlation between consumption growth and the cross-section of excess returns to be priced. Interestingly, we find a conditional counterpart of this problem.
Following Appendix E, nontraded factors that are conditionally uncorrelated with excess returns will automatically price those returns with an SDF whose conditional mean is zero. The second row of Figure 4 shows SDF means for the CCAPM and EZ models that are close to zero in the regions where the pricing errors are low. Note that the figures display point-wise confidence bands, and uniform bands would be wider.

Table 4. P-values of asset pricing tests with c_1 = c_2 = 1 for the CAPM, c_1 = c_2 = 1.5 for the CCAPM, and c_1 = c_2 = 3 for the EZ model.

We do not provide a formal test of a zero SDF mean, but we expect that such a test would not reject for the CCAPM and EZ models. In that case, the centered SDF approach is not well defined, so that the corresponding asymptotic theory of the estimators and test would break down. When we implemented the centered approach, the procedure suffered from severe numerical problems simply because, for some values of z (lds or cay), the nonparametric conditional covariances were almost zero. Further inference based on their inverse is then meaningless and actually corresponds to identification problems. Consequently, we rather recommend the uncentered SDF approach, which is well defined even in those problematic cases, together with a careful check of the behavior of the SDF mean to detect an uncorrelated SDF.

Conclusions
In this article, we present an adaptive omnibus specification test of conditional asset pricing models. These models provide constraints that conditional moments of returns and pricing factors must satisfy, but frequently do not provide information on the functional form of those conditional moments. The main interest of our test is that it is not only robust to functional form misspecification of conditional moments, but it also detects any relationship between pricing errors and conditioning variables.
This last issue is of crucial interest for power in testing conditional models. Our test statistic belongs to the class of consistent specification tests that compare parametric and nonparametric fits. However, our model under the null is also nonparametric, which is another crucial difference with respect to the existing tests. We develop the asymptotic theory of the test and place special emphasis on practical issues. The test distribution is estimated using the wild bootstrap, and calibration and power are obtained by proper bandwidth choices. Our test works well even for small samples.
We find interesting results in our empirical studies. In the first one, we study asset pricing models with traded factors and monthly data. Both the conditional CAPM and the FF model are rejected with size and book-to-market sorted portfolios, even though the latter model yields lower pricing errors. In the second application, we study a mixture of traded and non-traded factors with quarterly data. While we find strong empirical evidence against the conditional CAPM, we do not find such evidence against zero pricing errors with models that consider consumption growth. Yet, the low pricing errors do not necessarily mean that consumption is an economically meaningful pricing factor. The nonrejection seems to be due to the poor conditional correlation between consumption and the cross section of stock returns.
There are several interesting avenues for future research. We have focused our analysis on linear SDFs, which are prevalent in empirical finance, but we could extend our results to cover nonlinear models. This extension could be interesting given the problems that we have found with linearized variants of the CCAPM.
We have related our test statistic to the HJD for a particular choice of the metric of pricing errors. The HJD is often used to compare competing models that are likely misspecified, and such comparisons represent another interesting extension of our theoretical setting. The choice of predictors is even more relevant in that setting, and hence it would be interesting to explore a larger set of conditioning variables. However, in the case of several covariates, bias- (or dimension-) reducing methods are needed, such as higher-order kernels or separability structures.
Following Fama and French (2015) among others, the empirical asset pricing literature has enriched the usual cross section of size and book-to-market sorted portfolios with profitability and investment sorted portfolios. It would be interesting to apply our test to wider cross sections. In fact, recently there has been interest in testing asset pricing models with individual stocks instead of portfolios. We plan to extend our econometric methodology to the case of a number of assets that can grow with sample size.
Finally, there are recent applications of machine learning to asset pricing such as Kozak, Nagel, and Santosh (2020) that consider both many conditioning variables and stocks. This new literature does not start from some given traded and non-traded factors like our testing methodology does, but instead extracts traded factors from the return data. In particular, firm-specific conditioning variables (e.g., size and book-to-market ratio) are used to obtain new pricing factors in the spirit of the FF model. We plan to extend our econometric methodology to accommodate both macroeconomic and firm-specific conditioning variables, and provide a formal asset pricing test to this new literature.