Uniform Nonparametric Inference for Spatially Dependent Panel Data

Abstract This article proposes a uniform functional inference method for nonparametric regressions in a panel-data setting that features general unknown forms of spatio-temporal dependence. The method requires a long time span, but does not impose any restriction on the size of the cross section or the strength of spatial correlation. The uniform inference is justified via a new growing-dimensional Gaussian coupling theory for spatio-temporally dependent panels. We apply the method in two empirical settings. One concerns the nonparametric relationship between asset price volatility and trading volume as depicted by the mixture of distribution hypothesis. The other pertains to testing the rationality of survey-based forecasts, in which we document nonparametric evidence for information rigidity among professional forecasters, offering new support for sticky-information and noisy-information models in macroeconomics.


Introduction
Nonparametric regressions allow empirical researchers to study the conditional mean function of a dependent variable given certain covariates in a flexible manner. While classical methods were originally motivated to study iid data (Nadaraya 1964; Watson 1964; Andrews 1991a; Newey 1997), a vast literature has emerged to accommodate both time-series and spatial dependence (Robinson 1983, 2011; Chen and Shen 1998; Jenish 2012; Chen, Liao, and Sun 2014; Lee and Robinson 2016). The prior literature has mainly focused on the pointwise inference of the unknown function by providing confidence intervals for the function's value evaluated at a given point. This may be unsatisfactory in empirical work, because a practitioner's main goal in performing a nonparametric estimation in the first place is often to make inferential statements regarding the entire function, which would require a uniform inference theory. The contribution of this article is to develop such a method for panel data, which accommodates general unknown forms of dependence in both time-series and cross-sectional (i.e., spatial) dimensions that are now well known to be important in various empirical settings (Bertrand, Duflo, and Mullainathan 2004; Petersen 2009).
The key challenge for conducting uniform inference is that the asymptotic analysis for the nonparametric functional estimator is a non-Donsker problem, because the estimator does not admit a functional central limit theorem in the usual weak-convergence sense. This issue is particularly easy to understand in the context of series regression (Eubank and Spiegelman 1990; Andrews 1991a; Newey 1997; Huang 1998, 2003), where the nonparametric estimation is carried out by regressing the dependent variable on an asymptotically growing number of approximating functions (e.g., polynomials) of the covariates. Because the dimensionality of the set of regressors increases with the sample size, conventional central limit theorems and the "textbook" notion of convergence in distribution can no longer be used to capture the joint asymptotic normality of the regression coefficients.
In a cross-sectional setting with independent data, Chernozhukov, Lee, and Rosen (2013) and Belloni et al. (2015) make an important contribution to address this non-Donsker issue. These authors show that the growing-dimensional regression coefficient in the nonparametric series estimation may be strongly approximated, or "coupled," by a Gaussian random vector. Consequently, the estimation error of the functional estimator may be further strongly approximated by a divergent sequence of Gaussian processes. Li and Liao (2020) extend this theory to a general time-series setting for heterogeneous mixingales, which permits a broader range of empirical applications.
The use of nonparametric methods in the time-series context, however, may be hindered by a small sample size: For example, the number of observations for macroeconomic time series is typically in the low hundreds. The limited information embodied in the small sample may render nonparametric estimators too noisy to provide interesting empirical discoveries. Panel data is helpful in this regard: If the researcher is willing to assume that the conditional mean function is shared among cross-sectional units (e.g., countries, states, cities, or firms), more accurate nonparametric estimates may be obtained by further pooling the cross-sectional information.
This consideration motivates us to develop a uniform nonparametric inference method tailored for panel-data applications, for which it is important to accommodate different types of spatio-temporal dependence encountered by empirical researchers (Bertrand, Duflo, and Mullainathan 2004; Petersen 2009). In the baseline context of linear panel regressions, this mainly manifests as alternative ways of computing standard errors. One popular choice is the clustered standard error proposed by Arellano (1987), which is White's (1980) standard error formed using the cross section of time-series averages. This approach allows for general serial correlation, but relies on cross-sectional/spatial independence. Ruling out spatial correlation is undesirable for applications in macroeconomics and finance, which are our main empirical focus. Driscoll and Kraay (1998) propose an alternative approach under which the standard errors are computed using heteroscedasticity and autocorrelation consistent (HAC) estimators (Newey and West 1987; Andrews 1991a) of cross-sectional averages. An advantage of the Driscoll-Kraay approach is that it allows for arbitrary spatial dependence in the cross-sectional dimension and, at the same time, accommodates a type of "weak" serial dependence commonly employed in time-series analysis.
We develop an analogous panel-data method under a spatio-temporal dependence structure similar to that in Driscoll and Kraay (1998), by combining their insight with some technical results developed by Li and Liao (2020). Like Driscoll and Kraay, we allow for arbitrary spatial dependence and derive asymptotics under a "large T" setting by exploiting the weak dependence in the time-series dimension. Needless to say, our statistical objective is quite distinct from that of the prior work: Driscoll and Kraay's (1998) study is about how to construct a HAC estimator for the standard error of a classical extremum estimator, whereas we focus on how to make uniform functional inference for the conditional expectation function.
To illustrate the usefulness of the proposed procedure, we conduct two empirical applications. The first concerns the functional relationship between price volatility and trading volume of financial assets (Clark 1973; Tauchen and Pitts 1983; Andersen 1996). Specifically, we study how the volume-volatility relationship estimated using a recent panel consisting of all stocks listed in the U.S. equity market has changed after the outbreak of the COVID-19 pandemic, and document a significantly higher market impact of trades during the post-COVID period. In the second application, we study the rationality of survey-based forecasts, by estimating the nonparametric relationship between the average ex post forecast error and ex ante forecast revision in the Survey of Professional Forecasters (SPF). Consistent with the theoretical prediction from the information-rigidity theory (Mankiw and Reis 2002; Sims 2003; Woodford 2003; Reis 2006; Coibion and Gorodnichenko 2015), our nonparametric estimate of the conditional mean function of the forecast error is increasing in the forecast revision, and so provides robust nonparametric evidence for the presence of sticky or noisy information.
The remainder of the article is organized as follows. Section 2 presents our uniform nonparametric inference method for panel data. The empirical applications are provided in Section 3. Section 4 concludes. The supplemental appendix contains all proofs and reports the finite-sample performance of the proposed method in a Monte Carlo study.

The Statistical Method
In this section, we present the uniform nonparametric inference procedure. We describe the setting and some relevant background in Section 2.1. Section 2.2 presents new growing-dimensional Gaussian coupling results for spatio-temporally dependent panel data, which are then used to construct uniform confidence bands in Section 2.3.

The Setting and Background
Consider an $N \times T$ panel $(Y_{it}, X_{it})_{1 \le i \le N,\, 1 \le t \le T}$, where $Y_{it}$ is a scalar-valued dependent variable and the covariate $X_{it}$ takes value in a compact set $\mathcal{X} \subseteq \mathbb{R}^d$. Like Driscoll and Kraay (1998), we are interested in a setting with "weak" time-series dependence, whereas the spatial dependence among cross-sectional units may be arbitrarily strong with an unknown form. Correspondingly, we derive asymptotic results in a "large $T$" thought experiment, but do not make any assumption on the cross-sectional dimension $N$. That is, $T \to \infty$ and $N$ may be fixed or grow to infinity.
The inferential target is the conditional expectation function of $Y_{it}$ given $X_{it}$, denoted by $g(x) \equiv E[Y_{it} \mid X_{it} = x]$. Setting the disturbance term as $\epsilon_{it} \equiv Y_{it} - g(X_{it})$, we may equivalently state the problem as a nonparametric regression
$$Y_{it} = g(X_{it}) + \epsilon_{it}, \qquad E[\epsilon_{it} \mid X_{it}] = 0. \tag{2.1}$$
Our main goal is to nonparametrically estimate $g(\cdot)$ and construct a uniform confidence band for it. More precisely, for a given confidence level $1 - \alpha$, we aim to construct a pair of functional estimates $[L(\cdot), U(\cdot)]$ such that
$$P\big(L(x) \le g(x) \le U(x) \text{ for all } x \in \mathcal{X}\big) \to 1 - \alpha. \tag{2.2}$$
Our procedure is built on the series regression method (Eubank and Spiegelman 1990; Andrews 1991a; Newey 1997; Huang 1998, 2003). Under the series approach, the nonparametric estimation can be performed by running a (pooled) least-squares regression of $Y_{it}$ on a collection of approximating functions of $X_{it}$. Specifically, consider a column vector of approximating functions $P(\cdot) = (p_j(\cdot))_{1 \le j \le m}$, which may be polynomials, splines, trigonometric functions, wavelets, etc.; see Chen (2007) for a comprehensive review. Regressing $Y_{it}$ on $P(X_{it})$ yields the regression coefficient
$$\hat{b} \equiv \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} P(X_{it}) P(X_{it})^\top\Big)^{-1} \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} P(X_{it}) Y_{it}\Big), \tag{2.3}$$
and the resulting nonparametric estimator for $g(\cdot)$ is then given by
$$\hat{g}(\cdot) \equiv P(\cdot)^\top \hat{b}. \tag{2.4}$$
This nonparametric series estimator is very simple to implement and naturally generalizes the commonly used ordinary least squares. The key element of the nonparametric theory is to let the number of series terms $m \to \infty$ asymptotically, so that the unknown function $g(\cdot)$ can be well approximated by a growing set of approximating functions. The growing dimensionality is exactly the main source of complication for the theoretical analysis.
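To make the mechanics of (2.3) and (2.4) concrete, the following minimal Python sketch runs a pooled series regression on a toy panel. It is only an illustration under stated assumptions: the Legendre basis, the helper names (`legendre_basis`, `solve`, `series_fit`, `g_hat`), and the simulated noiseless data are hypothetical choices that do not come from the article.

```python
import random

def legendre_basis(x, m):
    """Return [P_0(x), ..., P_{m-1}(x)] using Bonnet's recurrence."""
    vals = [1.0, x][:m]
    for k in range(1, m - 1):
        # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def series_fit(Y, X, m):
    """Pooled least squares of Y_it on P(X_it); Y and X are N x T nested lists."""
    gram = [[0.0] * m for _ in range(m)]
    cross = [0.0] * m
    for y_row, x_row in zip(Y, X):
        for y, x in zip(y_row, x_row):
            p = legendre_basis(x, m)
            for j in range(m):
                cross[j] += p[j] * y
                for k in range(m):
                    gram[j][k] += p[j] * p[k]
    return solve(gram, cross)

def g_hat(x, b):
    """Series estimator evaluated at x: P(x)' b."""
    return sum(bj * pj for bj, pj in zip(b, legendre_basis(x, len(b))))

# Toy panel (hypothetical): N = 5 units, T = 40 periods, g(x) = 1 + 2x - x^2.
random.seed(0)
N, T, m = 5, 40, 4
X = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(N)]
Y = [[1 + 2 * x - x * x for x in row] for row in X]  # noiseless for clarity
b = series_fit(Y, X, m)
```

Because the toy target is a quadratic and the basis contains polynomials up to degree three, the fitted function reproduces it up to rounding error; with noisy data the same code returns the least-squares series fit.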
The pointwise inference for $g(x)$ at a given point $x \in \mathcal{X}$ has been extensively studied in the prior literature using standard econometric techniques; see, for example, Andrews (1991a) and Newey (1997). The uniform inference, however, is much more challenging because it is a non-Donsker problem (i.e., $\hat{g}(\cdot)$ does not admit a functional central limit theorem in the sense of weak convergence). This theoretical difficulty stems from the growing dimensionality of the series regression. In particular, one cannot use classical central limit theorems to characterize the asymptotic normality of $\hat{b}$, because its dimensionality is divergent asymptotically. This in turn leads to difficulties in establishing the asymptotic Gaussianity of the functional estimator $\hat{g}(\cdot)$.
In a cross-sectional setting (i.e., $T = 1$) with independent data, Belloni et al. (2015) show that the aforementioned non-Donsker issue may be addressed by using a strong Gaussian approximation theory. For ease of discussion, denote $h_{it} \equiv P(X_{it}) \epsilon_{it}$, so that the score vector for the (single cross-section) series regression may be written as $N^{-1/2} \sum_{i=1}^{N} h_{i1}$. Under the assumption that the cross-sectional units are independent, Belloni et al. invoke Yurinskii's coupling to show that the growing-dimensional (i.e., $m \to \infty$) score vector may be strongly approximated, or "coupled," by a zero-mean Gaussian random vector $\xi_N$ with the same variance-covariance matrix, that is,
$$\Big\| N^{-1/2} \sum_{i=1}^{N} h_{i1} - \xi_N \Big\| = o_p(1), \tag{2.5}$$
where $\|\cdot\|$ denotes the Euclidean norm. Consequently, the estimation error in $\hat{b}$ also admits a Gaussian coupling in the form of
$$\big\| N^{1/2} (\hat{b} - b^*) - Q^{-1} \xi_N \big\| = o_p(1),$$
where $b^*$ is the population regression coefficient and $Q$ is the population Gram matrix $E[N^{-1} \sum_{i=1}^{N} P(X_{i1}) P(X_{i1})^\top]$. The coupling of the regression coefficient in turn implies that the scaled estimation error function $N^{1/2} (\hat{g}(\cdot) - g(\cdot))$ may be coupled by a Gaussian process $P(\cdot)^\top Q^{-1} \xi_N$, which can then be used to construct uniform confidence bands for $g(\cdot)$.
The logic above reveals that the key to establishing a uniform inference theory is the growing-dimensional Gaussian coupling for the score vector as described in (2.5). Along this line, Li and Liao (2020) generalize Yurinskii's coupling from the independent-data setting to one with heterogeneous mixingales, which enables them to extend Belloni et al.'s method to various time-series settings. Li and Liao's coupling theory implies that the score for the $i$th time series admits a Gaussian coupling in the form of
$$\Big\| T^{-1/2} \sum_{t=1}^{T} h_{it} - \xi_T^{(i)} \Big\| = o_p(1). \tag{2.6}$$
Not surprisingly, the variance-covariance matrix of the coupling variable $\xi_T^{(i)}$ is generally the long-run variance-covariance matrix of the $(h_{it})_{t \ge 1}$ series. For the conduct of feasible inference, these authors also show that standard HAC estimators (Newey and West 1987; Andrews 1991b) remain valid even under the growing-dimensional setting.
The present article further extends the aforementioned theory to the panel-data setting, without restricting (i) the size of the cross section (i.e., $N$ may be fixed or growing) or (ii) the degree of spatial dependence between cross-sectional units. These features are now well recognized to be important in many applied scenarios. A seemingly natural approach is to "stack" the cross-sectional units into a multivariate time series and then directly apply Li and Liao's (2020) coupling theory to obtain a "stacked" version of (2.6). This approach, however, would have two drawbacks. First, note that the stacking would substantially increase the dimensionality of the coupling problem and, as a consequence, the joint coupling can only be obtained under very stringent restrictions on how fast $N$ and/or $m$ may grow as $T \to \infty$. As a matter of fact, $N$ could only grow at a much slower rate than $T$, which is undesirable in applications with even moderately large cross sections. Second, to conduct feasible inference, one would need to perform a HAC estimation for the $(Nm) \times (Nm)$ long-run variance-covariance matrix of the stacked score vector. A satisfactory HAC estimation is known to be difficult even if $N$ is moderately large and $m$ is fixed (Driscoll and Kraay 1998). This issue ought to be more severe in the present growing-dimensional setting with $m \to \infty$.
We thus consider an alternative approach that is inspired by Driscoll and Kraay (1998). These authors' key insight, when applied to the present context, is to rewrite the scaled score vector as
$$N^{-1} T^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} h_{it} = T^{-1/2} \sum_{t=1}^{T} H_t, \quad \text{where} \quad H_t \equiv N^{-1} \sum_{i=1}^{N} h_{it}.$$
This simple rewriting highlights the fact that the analysis for spatially dependent panels closely resembles the (seemingly) simpler time-series problem, except that the $H_t$ time series is now "generated" as a cross-sectional average of the unit-specific influence function $h_{it}$. Combining this powerful idea with an adaptation of the technical results in Li and Liao (2020), we shall construct a uniform inference procedure for the conditional expectation function $g(\cdot)$ for spatio-temporally dependent panels. We now turn to the details.
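The rewriting can be checked numerically. The Python sketch below assumes the cross-sectional average uses the $N^{-1}$ normalization and works with a tiny hypothetical array `h[i][t]` of two-dimensional vectors; the function names are illustrative and not part of the article.

```python
def cross_sectional_average(h):
    """h[i][t] is an m-vector (list); return H[t] = N^{-1} * sum_i h[i][t]."""
    N, T, m = len(h), len(h[0]), len(h[0][0])
    return [[sum(h[i][t][j] for i in range(N)) / N for j in range(m)]
            for t in range(T)]

def scaled_score(H):
    """Return T^{-1/2} * sum_t H[t] for a list of m-vectors H."""
    T, m = len(H), len(H[0])
    return [sum(H[t][j] for t in range(T)) / T ** 0.5 for j in range(m)]

# Hypothetical toy panel with N = 2 units, T = 4 periods, m = 2 coordinates.
N, T = 2, 4
h = [[[float(i + t), float(i * t)] for t in range(T)] for i in range(N)]

H = cross_sectional_average(h)   # one m-vector per period
S = scaled_score(H)              # the "time-series" form of the score

# Direct double sum N^{-1} T^{-1/2} sum_i sum_t h_it, for comparison.
direct = [sum(h[i][t][j] for i in range(N) for t in range(T)) / (N * T ** 0.5)
          for j in range(2)]
```

Once the panel is collapsed to the single series `H`, every subsequent step only involves a $T$-length time series of $m$-vectors, regardless of how large $N$ is.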

Growing-Dimensional Gaussian Coupling for Panel Data
We now present the aforementioned new results concerning the growing-dimensional Gaussian coupling for spatio-temporally dependent panel data. This section may be skipped by readers who are mainly interested in applications.
The formal theoretical setting is as follows. Let $h_{it}$ be an $m$-dimensional random vector for $1 \le i \le N$ and $1 \le t \le T$. In this section, we write $m$ as $m_T$ to emphasize that $m_T \to \infty$ as $T \to \infty$, whereas $N$ may be fixed or divergent. We also consider a filtration $\mathcal{F}_t$. We do not always assume that $h_{it}$ is measurable with respect to $\mathcal{F}_t$, but the filtration is useful to specify the serial dependence of $h_{it}$.
Our goal is to construct a Gaussian coupling for a sequence of $m_T$-dimensional random vectors given by
$$S_T \equiv T^{-1/2} \sum_{t=1}^{T} \Big( a_N^{-1} \sum_{i=1}^{N} h_{it} \Big).$$
The normalizing sequence $a_N$ is introduced to ensure that $a_N^{-1} \sum_{i=1}^{N} h_{it}$ is nondegenerate. For example, one may set $a_N = N$ if the $h_{it}$ variables are strongly dependent on the cross section, or set $a_N = N^{1/2}$ when the cross-sectional dependence is weak (e.g., independence). More generally, it is possible to have $a_N = N^{\gamma}$ for some $\gamma \in (1/2, 1)$ if $h_{it}$ exhibits some form of spatial weak dependence (Conley 1999; Kelejian and Prucha 2007). Introducing $a_N$ helps streamline our theoretical presentation, but the user does not need to know its specific form for implementation because it will be canceled through studentization.
We stress that the key novelty of the coupling theory under study is that it concerns the asymptotic normality of growing-dimensional statistics, which is very different from conventional central limit theorems for fixed-dimensional statistics or tight empirical processes. Like Driscoll and Kraay (1998), we rewrite $S_T$ as
$$S_T = T^{-1/2} \sum_{t=1}^{T} H_{T,t}, \quad \text{where} \quad H_{T,t} \equiv a_N^{-1} \sum_{i=1}^{N} h_{it}.$$
The $T$ subscript in $H_{T,t}$ highlights the fact that it is generally treated as a triangular array.
Our coupling theory will be developed in two steps: We first consider the baseline case in which each time series $(h_{it})_{t \ge 1}$ forms a martingale difference sequence (MDS) with respect to the filtration $\mathcal{F}_t$, and then extend the theory to the more general setting in which $(H_{T,t})_{t \ge 1}$ forms a mixingale. It is well known that the mixingale class includes linear processes (e.g., ARMA), various mixing processes, and certain near-epoch processes as special cases, and hence accommodates a majority of dependence structures seen in time-series applications; we refer the reader to Davidson's (1994) monograph for a comprehensive review of these well-known facts. Singling out the MDS special case is useful, because it quite commonly arises from rational-expectation models, for which the feasible inference does not require HAC estimation.
Regularity conditions for the MDS setting are collected in Assumption 1, in which $r_T$ in (2.8) is a real sequence such that $r_T = o(1)$.
Condition (i) of Assumption 1 states that $h_{it}$ forms an MDS for each $i$ with respect to the filtration $\mathcal{F}_t$, which further implies that $H_{T,t}$ is a martingale difference array. Condition (ii) requires that the variance-covariance matrix of $H_{T,t}$ is nondegenerate, and condition (iii) further imposes a bound on its third moment. Condition (iv) mainly requires that the conditional variance-covariance matrix $E[H_{T,t} H_{T,t}^\top \mid \mathcal{F}_{t-1}]$ satisfies a matrix law of large numbers at a certain convergence rate. It is worth noting that this condition holds trivially for $r_T = 0$ if $H_{T,t}$ is conditionally homoscedastic (i.e., the conditional second moments coincide with the unconditional ones).
Theorem 1, below, establishes the Gaussian coupling for the $S_T$ statistic when $h_{it}$ forms an MDS.
Theorem 1. Under Assumption 1, there exists a sequence $\xi_T$ of $m_T$-dimensional random vectors with distribution $N(0, \mathrm{var}[S_T])$ such that
$$\| S_T - \xi_T \| = O_p\big( m_T^{1/2} r_T^{1/2} + T^{-1/6} m_T^{5/6} \big). \tag{2.9}$$
A couple of remarks are in order. First, we note that the variance-covariance matrix of the coupling variable is
$$\mathrm{var}[S_T] = T^{-1} \sum_{t=1}^{T} E\big[ H_{T,t} H_{T,t}^\top \big],$$
which does not involve any autocovariance, because $h_{it}$ forms an MDS with respect to the common filtration $\mathcal{F}_t$. Consequently, the related feasible inference will not require HAC estimation. Second, observe that the rate of convergence of the coupling error is the same as what Li and Liao (2020) obtain in the time-series setting. As alluded to above, if we had directly applied Li and Liao's result by stacking the cross-sectional units into an $N m_T$-dimensional time series, the resulting rate (i.e., $N^{1/2} m_T^{1/2} r_T^{1/2} + T^{-1/6} N^{5/6} m_T^{5/6}$) would be much slower when $N$ is large. We have avoided this issue by relying on Driscoll and Kraay's (1998) insight.
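As a rough numerical illustration of the second remark, the Python sketch below evaluates the two orders of magnitude for a hypothetical configuration. It assumes that the pooled rate has the form obtained by replacing $N m_T$ with $m_T$ in the stacked rate quoted above, it ignores all constants, and the values of `T`, `N`, `m`, and `r` are arbitrary choices made only for illustration.

```python
def pooled_rate(T, m, r):
    """Order m^{1/2} r^{1/2} + T^{-1/6} m^{5/6} (constants ignored)."""
    return (m * r) ** 0.5 + T ** (-1 / 6) * m ** (5 / 6)

def stacked_rate(T, N, m, r):
    """Same order with the dimension m replaced by N*m, as under stacking."""
    return pooled_rate(T, N * m, r)

# Hypothetical configuration, not taken from the article.
T, N, m, r = 400, 1000, 6, 0.01
pooled = pooled_rate(T, m, r)
stacked = stacked_rate(T, N, m, r)
ratio = stacked / pooled  # lies between N^{1/2} and N^{5/6}
```

The ratio grows polynomially in $N$, which is why the stacked approach requires $N$ to grow much more slowly than $T$ for the coupling error to vanish.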
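We next extend Theorem 1 to the more general case in which the triangular array $H_{T,t} = a_N^{-1} \sum_{i=1}^{N} h_{it}$ forms a mixingale. The mixingale assumption is stated as follows: For a sequence of constants $\bar{c}_T = O(1)$ and a summable nonnegative sequence $(\psi_k)_{k \ge 0}$, each entry $H_{T,t}^{(j)}$ of $H_{T,t}$ satisfies, for all $t$ and $k \ge 0$,
$$\big\| E[ H_{T,t}^{(j)} \mid \mathcal{F}_{t-k} ] \big\|_q \le \bar{c}_T \psi_k, \qquad \big\| H_{T,t}^{(j)} - E[ H_{T,t}^{(j)} \mid \mathcal{F}_{t+k} ] \big\|_q \le \bar{c}_T \psi_{k+1}, \tag{2.10}$$
where $\|\cdot\|_q$ denotes the $L_q$-norm of a random variable for some $q \ge 1$, and the constants $\bar{c}_T$ and $\psi_k$ control the magnitude and the serial dependence of the $(H_{T,t})_{t \ge 1}$ variables, respectively. Note that if $H_{T,t}$ forms a martingale difference array and each of its entries has bounded $q$th moment, it is trivially a mixingale that verifies (2.10) with $\psi_k = 0$ for all $k \ge 1$. We extend Theorem 1 to the more general mixingale case via a martingale approximation. Specifically, under the maintained assumption $\sum_{k \ge 0} \psi_k < \infty$, it can be shown that
$$\Big\| S_T - T^{-1/2} \sum_{t=1}^{T} H_{T,t}^* \Big\| = O_p\big( T^{-1/2} m_T^{1/2} \big), \tag{2.11}$$
where
$$H_{T,t}^* \equiv \sum_{s=-\infty}^{\infty} \big\{ E[ H_{T,t+s} \mid \mathcal{F}_t ] - E[ H_{T,t+s} \mid \mathcal{F}_{t-1} ] \big\}. \tag{2.12}$$
Observe that $H_{T,t}^*$ forms a martingale difference array and so $T^{-1/2} \sum_{t=1}^{T} H_{T,t}^*$ admits a strong Gaussian coupling by Theorem 1. Since $S_T$ can be approximated by $T^{-1/2} \sum_{t=1}^{T} H_{T,t}^*$ up to a relatively small $O_p(T^{-1/2} m_T^{1/2})$ error, this further implies that $S_T$ also admits a Gaussian coupling. Theorem 2 formalizes this logic.
Theorem 2. Suppose (i) the triangular array $H_{T,t}$ forms a mixingale satisfying (2.10) for some $q \ge 3$; (ii) the martingale difference array $H_{T,t}^*$ defined in (2.12) satisfies Assumption 1; (iii) the largest eigenvalue of $\mathrm{var}[S_T]$ is bounded; and (iv) $m_T = o(T)$. Then there exists a sequence $\xi_T$ of $m_T$-dimensional random vectors with distribution $N(0, \mathrm{var}[S_T])$ such that
$$\| S_T - \xi_T \| = O_p\big( m_T^{1/2} r_T^{1/2} + T^{-1/6} m_T^{5/6} \big). \tag{2.13}$$
Theorem 2 establishes the strong Gaussian approximation for $S_T$ when $H_{T,t}$ forms a mixingale. The convergence rate in (2.13) is the same as that in Theorem 1. It is also important to note that $\mathrm{var}[S_T]$ is a long-run variance-covariance matrix given by $\mathrm{var}[S_T] = \mathrm{var}[T^{-1/2} \sum_{t=1}^{T} H_{T,t}]$ that generally involves all autocovariances $\mathrm{cov}(H_{T,t}, H_{T,s})$. Since $H_{T,t} = a_N^{-1} \sum_{i=1}^{N} h_{it}$, we have $\mathrm{cov}(H_{T,t}, H_{T,s}) = a_N^{-2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{cov}(h_{it}, h_{js})$, which clarifies how the spatio-temporal correlation across the panel contributes to the sampling variability in $S_T$.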

Uniform Nonparametric Inference Procedures for Panel Data
The Gaussian coupling results developed in the previous section (Theorems 1 and 2) allow us to construct uniform confidence bands for the conditional expectation function $g(\cdot)$. Below, we discuss the implementation and accompanying heuristics; the technical justification is given by Theorem 3. Turning to the details, we first recall from (2.3) that $\hat{b}$ is the least-squares coefficient obtained by regressing $Y_{it}$ on $P(X_{it})$, and $\hat{g}(\cdot) = P(\cdot)^\top \hat{b}$ is the nonparametric series estimator for $g(\cdot)$. When the number of series terms $m \to \infty$, we have $g(\cdot) \approx P(\cdot)^\top b^*$ for some "population" regression coefficient $b^*$ and so
$$\hat{b} - b^* \approx \Big( \sum_{i=1}^{N} \sum_{t=1}^{T} P(X_{it}) P(X_{it})^\top \Big)^{-1} \Big( \sum_{i=1}^{N} \sum_{t=1}^{T} P(X_{it}) \epsilon_{it} \Big), \tag{2.14}$$
which obviously resembles the representation of the estimation error in the "textbook" least-squares regression, though in the latter case (2.14) would hold as an equality.
The approximation in (2.14) suggests that the asymptotic normality of $\hat{b}$ may be established by applying the aforementioned Gaussian coupling theorems for the panel $h_{it} = P(X_{it}) \epsilon_{it}$. To do so, we set $H_t = a_N^{-1} \sum_{i=1}^{N} h_{it}$ as in Section 2.2 (the $T$ subscript of the triangular array is omitted here for simplicity). With this notation, we rewrite (2.14) as
$$T^{1/2} N a_N^{-1} (\hat{b} - b^*) \approx \hat{Q}^{-1} \Big( T^{-1/2} \sum_{t=1}^{T} H_t \Big), \quad \text{where} \quad \hat{Q} \equiv (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} P(X_{it}) P(X_{it})^\top. \tag{2.15}$$
Theorem 2 above shows that $T^{-1/2} \sum_{t=1}^{T} H_t$ may be strongly approximated by a Gaussian vector $\xi_T \sim N(0, A)$, where
$$A \equiv \mathrm{var}\Big[ T^{-1/2} \sum_{t=1}^{T} H_t \Big]. \tag{2.16}$$
The estimation error in $\hat{b}$ thus admits the following Gaussian approximation in distribution:
$$T^{1/2} N a_N^{-1} (\hat{b} - b^*) \overset{d}{\approx} Q^{-1} A^{1/2} N_m^*, \tag{2.17}$$
where $Q \equiv E[\hat{Q}]$ is the population Gram matrix and $N_m^*$ is a generic copy of an $m$-dimensional standard normal vector. Since $g(\cdot) \approx P(\cdot)^\top b^*$ and $\hat{g}(\cdot) = P(\cdot)^\top \hat{b}$, we have the following analogous result for the functional estimator:
$$T^{1/2} N a_N^{-1} \big( \hat{g}(\cdot) - g(\cdot) \big) \overset{d}{\approx} P(\cdot)^\top Q^{-1} A^{1/2} N_m^*. \tag{2.18}$$
In particular, the standard error function for the scaled functional estimation error displayed on the left-hand side is given by
$$\sigma(\cdot) \equiv \sqrt{ P(\cdot)^\top Q^{-1} A Q^{-1} P(\cdot) }. \tag{2.19}$$
To carry out the feasible inference, we need a consistent estimator for the long-run variance-covariance matrix $A$. A natural choice is the Newey-West estimator given by
$$\hat{A}(a_N) \equiv \hat{\Gamma}_0 + \sum_{s=1}^{M_T - 1} \Big( 1 - \frac{s}{M_T} \Big) \big( \hat{\Gamma}_s + \hat{\Gamma}_s^\top \big), \tag{2.20}$$
where $\hat{\Gamma}_s \equiv T^{-1} \sum_{t=s+1}^{T} \hat{H}_t \hat{H}_{t-s}^\top$, $\hat{H}_t \equiv a_N^{-1} \sum_{i=1}^{N} P(X_{it}) \hat{\epsilon}_{it}$, $\hat{\epsilon}_{it} \equiv Y_{it} - \hat{g}(X_{it})$, and $M_T$ is the bandwidth parameter for nonparametric HAC estimation, which may be chosen using Andrews's (1991b) procedure.¹ If the number of regressors $m$ were fixed, we could directly use Driscoll and Kraay's (1998) theory to justify the consistency of $\hat{A}(a_N)$. However, since $m \to \infty$ in the present nonparametric setting, we instead invoke the HAC estimation theory for growing-dimensional triangular arrays developed in Li and Liao (2020); see their Theorem 6.² In applications, it is useful to note that if $P(X_{it}) \epsilon_{it}$ forms an MDS, then the user does not need to include sample autocovariances in $\hat{A}(a_N)$, which amounts to setting $M_T = 1$. Equipped with $\hat{A}(a_N)$, we estimate the standard error function $\sigma(\cdot)$ defined in (2.19) via its sample analogue
$$\hat{\sigma}(\cdot; a_N) \equiv \sqrt{ P(\cdot)^\top \hat{Q}^{-1} \hat{A}(a_N) \hat{Q}^{-1} P(\cdot) }. \tag{2.21}$$
For notational simplicity in the discussion below, we omit $a_N$ in the notation when $a_N = N$; in particular, we write $\hat{A} = \hat{A}(N)$ and $\hat{\sigma}(\cdot) = \hat{\sigma}(\cdot; N)$. It is worth noting that $\hat{A}$ and $\hat{\sigma}(\cdot)$ are feasible estimators as they no longer involve the generally unknown normalizing sequence $a_N$.

¹ In the empirical applications of this article, a simplified version of the optimal rule proposed in Andrews (1991b) is employed for setting the bandwidth in the Newey-West estimator. Specifically, we set $M_T = \lfloor 0.75 T^{1/3} \rfloor$, where $\lfloor a \rfloor$ denotes the largest integer smaller than or equal to $a$. This simplified rule has been found to work well in our simulation study, as demonstrated in the supplemental appendix of the article.

² It is worth noting that while the Newey-West estimator is commonly used in practice, it may not be optimal in terms of mean square error (as discussed in Andrews (1991b)). The optimal HAC estimator proposed in Andrews (1991b) replaces the Bartlett kernel in (2.20) with the Quadratic Spectral kernel. By Theorem 6 of Li and Liao (2020), this alternative estimator is consistent in the general case with a divergent $m$. However, the optimality of this alternative estimator and the optimal choice of bandwidth have not been established in the general case, to the best of our knowledge.
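The Newey-West construction can be sketched in a few lines of Python. The sketch assumes the Bartlett-kernel convention in which lags $s = 1, \ldots, M_T - 1$ receive weight $1 - s/M_T$, so that $M_T = 1$ drops all autocovariances as stated above; the function names and the toy series are hypothetical, and the input `H` stands for the period-by-period cross-sectional averages of $P(X_{it})$ times the residual.

```python
def newey_west(H, M):
    """Bartlett-kernel long-run variance of the m-vector series H[0..T-1].

    A = G_0 + sum_{s=1}^{M-1} (1 - s/M) * (G_s + G_s'),
    with G_s = T^{-1} * sum_{t=s}^{T-1} H[t] H[t-s]' (0-based indexing).
    """
    T, m = len(H), len(H[0])
    A = [[0.0] * m for _ in range(m)]
    for s in range(M):
        w = 1.0 - s / M
        for t in range(s, T):
            for j in range(m):
                for k in range(m):
                    if s == 0:
                        A[j][k] += H[t][j] * H[t][k] / T
                    else:
                        # adds w * (G_s + G_s')[j][k]
                        A[j][k] += w * (H[t][j] * H[t - s][k]
                                        + H[t][k] * H[t - s][j]) / T
    return A

# Hypothetical scalar series with strong negative first-order autocorrelation.
H_toy = [[1.0], [-1.0], [1.0], [-1.0]]
A_mds = newey_west(H_toy, 1)   # no autocovariances: plain sample second moment
A_hac = newey_west(H_toy, 2)   # adds the lag-1 term with weight 1/2
```

Rescaling every `H[t]` by a constant `c` rescales the output by `c ** 2`, which is the mechanism behind the cancellation of $a_N$ under studentization.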
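Our uniform nonparametric inference is based on a "sup-t" statistic defined as
$$\hat{\tau} \equiv \sup_{x \in \mathcal{X}} \frac{ \big| T^{1/2} N a_N^{-1} \big( \hat{g}(x) - g(x) \big) \big| }{ \hat{\sigma}(x; a_N) }.$$
At first glance, this statistic might appear "infeasible" because we do not assume that the form of the $a_N$ normalizing factor is known a priori. This is not an issue, because $\hat{\tau}$ is in fact invariant to $a_N$. To see this, we note that $\hat{\sigma}(\cdot; a_N) = N a_N^{-1} \hat{\sigma}(\cdot)$ by definition, and hence, we may rewrite $\hat{\tau}$ as $\hat{\tau} \equiv \sup_{x \in \mathcal{X}} T^{1/2} | \hat{g}(x) - g(x) | / \hat{\sigma}(x)$. In view of the Gaussian approximation in (2.18), we may approximate the distribution of $\hat{\tau}$ via that of
$$\sup_{x \in \mathcal{X}} \frac{ \big| P(x)^\top Q^{-1} A^{1/2} N_m^* \big| }{ \sigma(x) },$$
for which a feasible approximation may be further constructed as
$$\tau^* \equiv \sup_{x \in \mathcal{X}} \frac{ \big| P(x)^\top \hat{Q}^{-1} \hat{A}^{1/2} N_m^* \big| }{ \hat{\sigma}(x) }.$$
Critical values for the sup-t statistic can be computed as the tail quantile of $\tau^*$ via simulation, which in turn can be used to construct the uniform confidence bands for $g(\cdot)$.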
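The invariance claim can be spelled out in two steps. The derivation below is a sketch that assumes the Newey-West estimator is a quadratic form in the sample averages $\hat{H}_t(a_N) \equiv a_N^{-1} \sum_{i} P(X_{it}) \hat{\epsilon}_{it}$, with hats denoting sample quantities; the notation $\hat{H}_t(a_N)$ is introduced here only for this illustration.

```latex
\begin{align*}
\hat{H}_t(a_N) &= \frac{N}{a_N}\,\hat{H}_t(N)
  \;\Longrightarrow\;
  \hat{A}(a_N) = \Big(\frac{N}{a_N}\Big)^{2}\hat{A}(N)
  \;\Longrightarrow\;
  \hat{\sigma}(x; a_N) = \frac{N}{a_N}\,\hat{\sigma}(x; N), \\[4pt]
\hat{\tau}
  &= \sup_{x\in\mathcal{X}}
     \frac{\big|T^{1/2} N a_N^{-1}\big(\hat{g}(x)-g(x)\big)\big|}
          {N a_N^{-1}\,\hat{\sigma}(x; N)}
   = \sup_{x\in\mathcal{X}}
     \frac{T^{1/2}\,\big|\hat{g}(x)-g(x)\big|}{\hat{\sigma}(x; N)} .
\end{align*}
```

The factor $N a_N^{-1}$ appears in both the numerator and the denominator and cancels, so the user can always compute the statistic with $a_N = N$.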
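For ease of application, we detail the construction of a $1 - \alpha$ level two-sided uniform confidence band for $g(\cdot)$ in the following algorithm.

Algorithm 1 (Uniform Confidence Band for $g(\cdot)$)
Step 1. Run a pooled panel least-squares regression of $Y_{it}$ on $P(X_{it})$ and obtain $\hat{b}$ as described in (2.3). Set the nonparametric estimator $\hat{g}(\cdot) = P(\cdot)^\top \hat{b}$.
Step 2. Compute $\hat{Q} \equiv (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} P(X_{it}) P(X_{it})^\top$, $\hat{A} = \hat{A}(N)$, and $\hat{\sigma}(\cdot) = \hat{\sigma}(\cdot; N)$ according to (2.20) and (2.21) with $a_N = N$.
Step 3. Draw $N_m^*$ from the $m$-dimensional standard normal distribution many times. For each draw, compute $\tau^* \equiv \sup_{x \in \mathcal{X}} | P(x)^\top \hat{Q}^{-1} \hat{A}^{1/2} N_m^* | / \hat{\sigma}(x)$, where the supremum may be computed on a discretized mesh of $\mathcal{X}$. Set the critical value $cv_{1-\alpha}$ as the $1 - \alpha$ empirical quantile of the simulated $\tau^*$.
Step 4. Report the $1 - \alpha$ level two-sided uniform confidence band for $g(\cdot)$ as $\big[ \hat{g}(x) - cv_{1-\alpha} T^{-1/2} \hat{\sigma}(x),\ \hat{g}(x) + cv_{1-\alpha} T^{-1/2} \hat{\sigma}(x) \big]$ for $x \in \mathcal{X}$.

We may also adapt this procedure to make uniform inference for the derivative function of $g(\cdot)$. For ease of discussion, we consider the case with scalar-valued $X_{it}$ and denote the derivative of $g(\cdot)$ by $\partial g(\cdot)$.

Algorithm 2 (Uniform Confidence Band for $\partial g(\cdot)$)
Step 1. Compute $\hat{b}$, $\hat{Q}$, and $\hat{A}$ as described in Algorithm 1. Set $\partial \hat{g}(\cdot) = \partial P(\cdot)^\top \hat{b}$ and $\hat{\sigma}_{\partial}(\cdot) \equiv \sqrt{ \partial P(\cdot)^\top \hat{Q}^{-1} \hat{A} \hat{Q}^{-1} \partial P(\cdot) }$.
Step 2. Draw $N_m^*$ from the $m$-dimensional standard normal distribution many times. For each draw, compute $\tau^* \equiv \sup_{x \in \mathcal{X}} | \partial P(x)^\top \hat{Q}^{-1} \hat{A}^{1/2} N_m^* | / \hat{\sigma}_{\partial}(x)$, where the supremum may be computed on a discretized mesh of $\mathcal{X}$. Set the critical value $cv_{1-\alpha}$ as the $1 - \alpha$ empirical quantile of the simulated $\tau^*$.
Step 3. Report the $1 - \alpha$ level two-sided uniform confidence band for $\partial g(\cdot)$ as $\big[ \partial \hat{g}(x) - cv_{1-\alpha} T^{-1/2} \hat{\sigma}_{\partial}(x),\ \partial \hat{g}(x) + cv_{1-\alpha} T^{-1/2} \hat{\sigma}_{\partial}(x) \big]$ for $x \in \mathcal{X}$.

The statistical procedures described in Algorithms 1 and 2 can be readily implemented using an accompanying Stata package. We also note that, since the underlying asymptotic theory is developed in a general setting with triangular arrays, the proposed method can be easily extended (mainly at the cost of more complicated notation) to the setting in which the number of observations in each cross-section, say $N_t$, depends on $t$. The only modification needed is to replace the cross-sectional average $N^{-1} \sum_{i=1}^{N}$ with $N_t^{-1} \sum_{i=1}^{N_t}$ when computing various sample-average statistics. This more general setting is accommodated in the Stata package as well.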
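The steps of Algorithm 1 can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the accompanying Stata package: it assumes a scalar covariate on $[-1, 1]$, a Legendre basis, the Bartlett convention with lags up to $M_T - 1$, a Cholesky factor as the matrix square root of the Newey-West estimate, and simulated toy data; all function names are hypothetical.

```python
import math, random

def basis(x, m):  # Legendre polynomials P_0..P_{m-1} on [-1, 1]
    v = [1.0, x][:m]
    for k in range(1, m - 1):
        v.append(((2 * k + 1) * x * v[k] - k * v[k - 1]) / (k + 1))
    return v

def solve(A, y):  # Gaussian elimination with partial pivoting
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def cholesky(A):  # lower-triangular L with A = L L'
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def uniform_band(Y, X, m, M_T, grid, alpha=0.05, draws=2000, seed=0):
    N, T = len(Y), len(Y[0])
    P = [[basis(X[i][t], m) for t in range(T)] for i in range(N)]
    # Step 1: pooled least squares, b = Q^{-1} (NT)^{-1} sum P(X_it) Y_it
    Q = [[sum(P[i][t][j] * P[i][t][k] for i in range(N) for t in range(T)) / (N * T)
          for k in range(m)] for j in range(m)]
    c = [sum(P[i][t][j] * Y[i][t] for i in range(N) for t in range(T)) / (N * T)
         for j in range(m)]
    b = solve(Q, c)
    g = lambda x: sum(bj * pj for bj, pj in zip(b, basis(x, m)))
    # Step 2: H_t = N^{-1} sum_i P(X_it) * residual_it, then Newey-West A
    H = []
    for t in range(T):
        res = [Y[i][t] - sum(bj * pj for bj, pj in zip(b, P[i][t])) for i in range(N)]
        H.append([sum(P[i][t][j] * res[i] for i in range(N)) / N for j in range(m)])
    A = [[0.0] * m for _ in range(m)]
    for s in range(M_T):
        w = 1.0 - s / M_T
        for t in range(s, T):
            for j in range(m):
                for k in range(m):
                    if s == 0:
                        A[j][k] += H[t][j] * H[t][k] / T
                    else:
                        A[j][k] += w * (H[t][j] * H[t - s][k] + H[t][k] * H[t - s][j]) / T
    L = cholesky(A)
    # v(x) = L' Q^{-1} P(x): sigma_hat(x) = ||v(x)|| and P'Q^{-1}L z = v(x).z
    V = []
    for x in grid:
        q = solve(Q, basis(x, m))
        V.append([sum(L[k][j] * q[k] for k in range(m)) for j in range(m)])
    sig = [math.sqrt(sum(vj * vj for vj in v)) for v in V]
    # Step 3: simulate the sup-t statistic on the grid
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        sims.append(max(abs(sum(vj * zj for vj, zj in zip(v, z))) / s_
                        for v, s_ in zip(V, sig)))
    sims.sort()
    cv = sims[min(draws - 1, math.ceil((1 - alpha) * draws) - 1)]
    # Step 4: band g_hat(x) -/+ cv * T^{-1/2} * sigma_hat(x)
    return [(g(x) - cv * s_ / math.sqrt(T), g(x) + cv * s_ / math.sqrt(T))
            for x, s_ in zip(grid, sig)], cv

# Hypothetical toy data: N = 4, T = 60, g(x) = x^2 plus Gaussian noise.
random.seed(1)
X = [[random.uniform(-1, 1) for _ in range(60)] for _ in range(4)]
Y = [[x * x + random.gauss(0, 0.5) for x in row] for row in X]
grid = [-1 + 0.1 * k for k in range(21)]
band, cv = uniform_band(Y, X, m=3, M_T=3, grid=grid)
```

Any matrix square root of the variance estimate gives the same distribution for the simulated statistic, so the Cholesky factor is used here purely for convenience. The simulated critical value exceeds the pointwise normal quantile because the band must cover the whole function at once.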
Theorem 3 justifies the theoretical validity of the confidence bands described in the algorithms above. The subsequent technical discussion may be skipped by readers who are mainly interested in applications. To facilitate exposition, we collect the requisite regularity conditions in the following high-level assumption. These conditions are either standard in the series estimation literature or can be verified using the limit theorems developed in the current article. The following notation is needed. For $j = 0, 1$, we denote $\zeta_m^j \equiv \sup_{x_1, x_2 \in \mathcal{X}} \| \partial^j P(x_1) - \partial^j P(x_2) \| / \| x_1 - x_2 \|$, where $\partial^0 P(x) \equiv P(x)$ and $\partial^1 P(x) \equiv \partial P(x)$. For any vector $a \equiv (a_1, \ldots, a_d)$ of nonnegative integers, the derivative $\partial^a h(x)$ of any differentiable function $h(x)$ is defined as $\partial^a h(x) \equiv \partial^{|a|} h(x) / \partial x_1^{a_1} \cdots \partial x_d^{a_d}$ with $|a| \equiv a_1 + \cdots + a_d$.

Assumption 2. Suppose: (i) there exists a sequence $b^*$ of $m$-dimensional constant vectors such that [...], where $g_m^*(x) \equiv P(x)^\top b^*$; (ii) the eigenvalues of $Q$ and $A$ are bounded from above and away from zero; (iii) the sequence $T^{-1/2} \sum_{t=1}^{T} H_t$ admits a strong approximation $\xi_T \sim N(0, A)$ such that [...].

Assumptions 2(i, ii) are fairly standard in series estimation; see, for example, Andrews (1991a), Newey (1997), Chen (2007), and Belloni et al. (2015). In particular, condition (i) specifies the precision for approximating the unknown function $g(\cdot)$ via approximating functions, for which comprehensive results are available from numerical approximation theory. For example, when $x$ is univariate, it is well known that [...] for polynomials and $\| g - g_m^* \|_{1,\infty} = O(m^{-\alpha + 1})$, where $\alpha$ is related to the smoothness of the unknown function $g$ (see, e.g., Newey (1997) for references and related discussions). When $H_t$ forms a martingale difference array, Assumption 2(iii) can be verified by using Theorem 1. More generally, this condition can be verified by using Theorem 2 for mixingales. Assumption 2(iv) pertains to the convergence rates of $\hat{Q}$ and $\hat{A}(a_N)$, which can be verified under low-level conditions using Lemma B1 and Theorem 6 in Li and Liao (2020), respectively. Assumption 2(v) includes restrictions on the approximating functions. Since many approximating functions, such as polynomials and cubic splines, include a constant and a linear term, the lower bound condition in Assumption 2(v) is easily satisfied. The upper bound $\zeta_m^j$ is known for many commonly used approximating functions. For example, $\zeta_m^j$ [...].

Theorem 3. Under Assumption 2, the $1 - \alpha$ level two-sided uniform confidence bands constructed in Algorithms 1 and 2 cover $g(x)$ and $\partial g(x)$, respectively, uniformly over $x \in \mathcal{X}$ with probability converging to $1 - \alpha$.

Empirical Applications
In this section, we demonstrate how the proposed uniform inference method may be used in empirical work via two examples: The first concerns the nonparametric relationship between the volatility and trading volume of financial assets, and the second pertains to testing the rationality of survey-based forecasts.

Volume-Volatility Relationship during COVID-19
A large literature in finance has been devoted to understanding the relationship between asset price volatility and trading volume; see, for example, Clark (1973), Tauchen and Pitts (1983), Andersen (1996), Bollerslev, Li, and Xue (2018), and the many references therein. A common theme in this literature is the Mixture of Distribution Hypothesis (MDH), which postulates that the trading volume and price volatility are both driven by news arrival, and so implies a positive relationship between volume and volatility. While the MDH is the leading theory for understanding the volume-volatility relationship, it does not force any specific functional form between these variables, which makes nonparametric procedures a natural choice for empirical study. In the first application, we apply the proposed nonparametric inferential tools to examine the volume-volatility relationship for the U.S. equity universe, with a particular focus on how this relationship may have changed after the outbreak of the ongoing COVID-19 pandemic.
The dataset used in this empirical study consists of daily time series of Parkinson's volatility measure (Parkinson 1980) and turnover for all stocks listed on NYSE, AMEX, and NASDAQ. Recall that Parkinson's volatility is defined as $Y_{it} = \log(h_{it}/l_{it})/\sqrt{4\log 2}$, where $h_{it}$ and $l_{it}$ are the intraday high and low prices of stock i on day t, and the stock's turnover $X_{it}$ is defined as the number of traded shares divided by the total number of shares outstanding. The data are obtained from the Center for Research in Security Prices (CRSP) via Wharton Research Data Services (WRDS). Our post-COVID subsample ranges from February 24, 2020 to June 30, 2021, consisting of 342 trading days. The cutoff date, February 24, 2020, is when the U.S. government issued an official response to fight the pandemic. Meanwhile, we form the pre-COVID subsample using data from October 11, 2018 to February 21, 2020, so that it contains exactly the same number of trading days as the post-COVID sample. Our full sample thus contains 684 trading days in total, and there are 7452 firms in the cross-section on an average day. To make the volatility and volume observations more comparable across firms, we consider the logarithm of each series and normalize it via its full-sample mean and standard deviation. As in the simulation study (see the supplemental appendix), we further transform the (log-normalized) turnover onto the [−1, 1] interval via the $x \mapsto 2\Phi(x) - 1$ transformation, where $\Phi(\cdot)$ denotes the standard normal distribution function. The nonparametric procedure is then carried out using Legendre polynomials up to the sixth order, which is guided by the rule $m = 2T^{0.19}$.
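The data construction described above can be illustrated with a minimal sketch. It is not the authors' code: the toy prices and turnover are simulated, the helper names (`parkinson_vol`, `log_normalize`, `to_unit_interval`) are hypothetical, and reading "up to the sixth order" as polynomial degrees 0 through 6 is an assumption.

```python
import math
import numpy as np
from numpy.polynomial import legendre

def parkinson_vol(high, low):
    """Parkinson (1980) volatility: log(h / l) / sqrt(4 log 2)."""
    return np.log(high / low) / math.sqrt(4.0 * math.log(2.0))

def log_normalize(series):
    """Take logs, then standardize with the sample mean and standard deviation."""
    z = np.log(series)
    return (z - z.mean()) / z.std()

def to_unit_interval(x):
    """Map x to [-1, 1] via x -> 2*Phi(x) - 1, which equals erf(x / sqrt(2))."""
    return np.vectorize(math.erf)(x / math.sqrt(2.0))

# Simulated data for one hypothetical stock over 342 trading days.
rng = np.random.default_rng(0)
low = rng.uniform(10.0, 20.0, size=342)
high = low * np.exp(rng.uniform(0.001, 0.05, size=342))
turnover = rng.lognormal(mean=-5.0, sigma=1.0, size=342)

y = log_normalize(parkinson_vol(high, low))       # dependent variable
x = to_unit_interval(log_normalize(turnover))     # conditioning variable in [-1, 1]

# Legendre design matrix with columns P_0(x), ..., P_6(x) (assumed degree 6).
P = legendre.legvander(x, 6)
```

In the actual panel, the normalization would be applied to each stock's series separately before pooling; the sketch shows a single series only.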
Figure 1 plots the estimated conditional mean functions of volatility given turnover for the pre- and post-COVID subsamples. To gauge the sampling variability of these functional estimates, we also plot the associated 95% uniform confidence bands computed according to Algorithm 1. It is worth reemphasizing that the uniform confidence band is designed to cover the entire conditional mean function simultaneously, and hence allows us to make inferential statements regarding the whole function, rather than its value at a specific point. From the figure, we see that the confidence bands are reasonably tight, which highlights the advantage of harnessing the rich information from the panel dataset. The functional estimates reveal a clear positive relationship between volatility and turnover. Interestingly, the estimated curves appear to be steeper when the turnover is exceptionally high (roughly corresponding to the highest quintile). This finding suggests that, during extreme market conditions with "trading frenzy," the price impact of trades tends to be larger, which is likely due to an elevated level of asymmetric information among market participants during such circumstances (Kyle 1985). In addition, we note that the post-COVID estimate is higher than the pre-COVID estimate uniformly across all levels of trading activity. The nonoverlapping confidence bands for the two estimated functions suggest that the observed difference between them is statistically highly significant (indeed, the formal test for their difference yields a p-value that is virtually zero). In view of the economic logic underlying the MDH, this result suggests that each "unit" of news during the pandemic period exerts more impact on stock prices than it would during normal times, even after conditioning on the same level of trading activity in a completely nonparametric fashion.
It is interesting to further investigate whether these findings may differ across small and large firms. To do so, we divide the stocks evenly into four quartile groups according to their market capitalizations. Figure 2 plots the nonparametric estimates and uniform confidence bands for these size-based groups. Remarkably, the results appear quite similar across all four groups. Hence, the findings seen from the full-sample analysis in Figure 1 are not driven by any particular size group. This observed homogeneity also confirms that pooling information across the groups, as we have done in Figure 1, is empirically sound.

Forecast Rationality and Information Rigidity
In our second empirical application, we apply the proposed method to study the rationality of survey-based forecasts. Expectation surveys are routinely conducted among individuals, corporate officials, and professional researchers, and are widely used to gauge their forward-looking beliefs about various aspects of socioeconomic life, business operation, and government policy. Arguably one of the most important surveys of this type is the Survey of Professional Forecasters (SPF), which collects institutional researchers' forecasts on leading macroeconomic indicators such as GDP, inflation, and unemployment. A large literature in macroeconomics has argued that these forecasts are not rational, in the sense that the ex post forecasting error can be partly predicted using a priori available information. An important mechanism for explaining this phenomenon is information friction. Coibion and Gorodnichenko (2015) show that both the sticky-information model of Mankiw and Reis (2002) and the noisy-information model of Woodford (2003) and Sims (2003) imply a positive relation between the average ex post forecast errors across forecasters and their average ex ante forecast revisions, and find empirical support for this hypothesis. Set against this background, we revisit Coibion and Gorodnichenko's (2015) analysis by employing the proposed functional inference method, which allows us to study a more general nonparametric notion of information rigidity.
It is instructive to briefly recall the empirical framework developed by Coibion and Gorodnichenko (2015). Let $Z_{k,t+h}$ denote the kth economic variable to be forecast, which is realized at time t + h. The average time-t forecast across a group of forecasters for this variable is denoted by $F_t Z_{k,t+h}$. The average ex post forecast error is thus $Z_{k,t+h} - F_t Z_{k,t+h}$, and the average ex ante forecast revision from t − 1 to t is $F_t Z_{k,t+h} - F_{t-1} Z_{k,t+h}$. Coibion and Gorodnichenko's (2015) baseline specification is the following linear regression:

$$Z_{k,t+h} - F_t Z_{k,t+h} = \beta \left( F_t Z_{k,t+h} - F_{t-1} Z_{k,t+h} \right) + e_{k,h,t}, \qquad (3.1)$$

where the error term $e_{k,h,t}$ is mean-independent of the time-t information, and β measures the degree of information rigidity. If the average forecast is rational, one would have β = 0; otherwise, both sticky-information and noisy-information models imply β > 0. This testable implication is quite intuitive: information frictions tend to make the average forecast revision "too conservative" relative to the rational-expectation benchmark. For ease of notation in our discussion below, we shall use i = (k, h) to index both the variable of interest and the forecast horizon, and set $Y_{it} = Z_{k,t+h} - F_t Z_{k,t+h}$, $X_{it} = F_t Z_{k,t+h} - F_{t-1} Z_{k,t+h}$, and $\varepsilon_{it} = e_{k,h,t}$. The specification in (3.1) may be rewritten more concisely as a linear panel regression

$$Y_{it} = \beta X_{it} + \varepsilon_{it}. \qquad (3.2)$$

A natural generalization of (3.2) is the nonparametric regression in (2.1), namely, $Y_{it} = g(X_{it}) + \varepsilon_{it}$. In the nonparametric context, information rigidity implies that $g(\cdot)$ is an increasing function. Compared to the baseline linear specification, the nonparametric model allows the marginal response (i.e., the derivative of $g(\cdot)$) to depend on the level of forecast revision itself. This may be formalized in the sticky-information (resp. noisy-information) model if the agents' updating frequency (resp. Kalman gain) is a function of the revision.
We carry out the empirical analysis using the data from the SPF. For ease of comparison, we employ exactly the same data as in Coibion and Gorodnichenko (2015), obtained from American Economic Review's website.5 The dataset consists of quarterly time series of forecast errors and revisions from 1969 to 2014 for 5 macroeconomic variables (i.e., GDP price deflator, real GDP, industrial production, housing starts, and the unemployment rate) and 4 horizons (i.e., h = 0, 1, 2, 3). The resulting panel corresponds to N = 20 and T = 173. In this setting, it is clearly implausible to rule out cross-sectional dependence between forecast errors across different variables and/or horizons, or to impose ad hoc "weak" spatial dependence. Since forecasts over multiple horizons are involved, one cannot rule out serial dependence either (Hansen and Hodrick 1980). Our proposed method is designed to accommodate this type of general dependence structure for nonparametric analysis. As in Section 3.1, we normalize each time series with its sample mean and standard deviation, and transform the conditioning variable (i.e., forecast revision) onto the [−1, 1] interval via $x \mapsto 2\Phi(x) - 1$. The series basis consists of Legendre polynomials up to the fifth order, which is guided by the rule $m = 2T^{0.19}$.
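As a quick check on the tuning rules quoted in the text and figure notes, the following sketch evaluates $m = 2T^{0.19}$ and the Newey-West bandwidth $0.75T^{1/3}$ for the two sample lengths mentioned in this section. Rounding the series order down to an integer is an assumption, as are the function names; the article states only the rules themselves.

```python
import math

def series_order(T):
    """Series order from the rule m = 2 * T**0.19 (rounding down is assumed)."""
    return math.floor(2.0 * T ** 0.19)

def newey_west_bandwidth(T):
    """Newey-West bandwidth from the rule 0.75 * T**(1/3)."""
    return 0.75 * T ** (1.0 / 3.0)

# T = 342: one COVID subsample; T = 173: the SPF panel.
for label, T in [("volume-volatility subsample", 342), ("SPF panel", 173)]:
    print(label, T, series_order(T), round(newey_west_bandwidth(T), 2))
```

Under this reading, T = 342 gives order 6 and T = 173 gives order 5, which matches the sixth- and fifth-order Legendre bases used in the two applications.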
We now turn to the empirical results. On the left panel of Figure 3, we plot the nonparametric estimate of the conditional mean function of the forecast error given the transformed forecast revision, along with its 90% uniform confidence band computed according to Algorithm 1.
Since the uniform confidence band does not fully cover zero, the functional estimate is statistically significantly different from zero, which provides nonparametric evidence against the hypothesis that the SPF forecasts are fully rational.
Importantly, the estimated conditional expectation function appears to be an increasing function of the forecast revision, which, as mentioned above, is consistent with the presence of information rigidity. To see this more clearly, on the right panel of Figure 3, we plot the nonparametric estimate of the derivative function, together with its 90% uniform confidence band computed according to Algorithm 2. The plot reveals that the derivative estimate is indeed almost always positive, and the nonparametric functional estimate as a whole is significantly different from zero.6 Overall, these findings provide strong support for those of Coibion and Gorodnichenko (2015): the information rigidity documented in the prior work is not solely driven by the linear specification but holds quite robustly in a nonparametric setting; moreover, information rigidity is not only an on-average phenomenon (as summarized by the scalar β), but also appears to hold uniformly in a functional sense across different levels of forecast revision.
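To make the derivative estimate concrete, the sketch below fits a Legendre series by least squares and differentiates the fitted expansion term by term. It is only an illustration under stated assumptions: the data are simulated from a hypothetical increasing function, the fit is plain pooled least squares, and the uniform confidence band of Algorithm 2 is not reproduced.

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical increasing conditional mean on [-1, 1] (noise-free for clarity).
def g_true(x):
    return x + 0.3 * x ** 3

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=500)   # stands in for the transformed revision
y = g_true(x)

# Series least squares on Legendre polynomials of degree 0..5 (assumed order).
P = legendre.legvander(x, 5)
coef, *_ = np.linalg.lstsq(P, y, rcond=None)

# Differentiate the fitted Legendre expansion to get the marginal response.
dcoef = legendre.legder(coef)
grid = np.linspace(-1.0, 1.0, 101)
g_hat = legendre.legval(grid, coef)
dg_hat = legendre.legval(grid, dcoef)

# An increasing g corresponds to a derivative estimate that is positive on the grid.
is_increasing = bool(np.all(dg_hat > 0.0))
```

Note that the derivative here is taken with respect to the transformed conditioning variable, as in the right panel of Figure 3.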
We may also formally test whether the linear specification (3.2) is in fact compatible with the observed data. This type of specification test can be easily carried out using the proposed method. To do so, we first estimate the linear model and obtain the residuals; we then nonparametrically regress the residuals on the covariate. If the linear model is correctly specified, the nonparametrically estimated conditional mean function of the linear-regression residual should be statistically zero; otherwise, the linear specification should be rejected. Figure 4 plots the estimated conditional mean function of the residual and the associated 90% uniform confidence band. Since the confidence band always covers zero, we cannot reject the linear specification in (3.2). This suggests that the rigidity parameter β is likely constant across different levels of forecast revision, and so Coibion and Gorodnichenko's (2015) linear specification (with constant β) is indeed adequate from this perspective.
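The two-step logic of this specification test can be sketched as follows. This is a simplified illustration, not the authors' implementation: the data are simulated from a hypothetical linear model, the linear step is a pooled regression without intercept (assuming demeaned series), the same covariate is used in both steps, and the uniform confidence band from Algorithm 1 that the formal test relies on is not computed.

```python
import numpy as np
from numpy.polynomial import legendre

def linear_residuals(y, x):
    """Step 1: pooled least squares of y on x (no intercept); return residuals and slope."""
    beta = (x @ y) / (x @ x)
    return y - beta * x, beta

def series_fit(y, x, degree):
    """Step 2: least-squares fit of y on Legendre polynomials of x up to `degree`."""
    P = legendre.legvander(x, degree)
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return coef

# Simulated pooled observations from a correctly specified linear model.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=2000)
y = 0.5 * x + rng.normal(scale=0.1, size=2000)

resid, beta_hat = linear_residuals(y, x)
coef = series_fit(resid, x, degree=5)

# Estimated conditional mean of the residual on a grid; under a correct
# linear specification it should stay close to zero everywhere.
grid = np.linspace(-1.0, 1.0, 201)
g_hat = legendre.legval(grid, coef)
```

Whether the deviation of `g_hat` from zero is statistically significant would then be judged against the uniform confidence band, which accounts for the spatio-temporal dependence that this sketch ignores.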

Conclusion
Nonparametric regressions offer flexible empirical designs but need more data for informative inference. This need could hinder macroeconomic applications in which the number of observations for a typical time series is often in the low hundreds. A reasonable and oft-used empirical strategy to overcome this issue is to pool the richer information from a panel. The related inference should be done carefully due to the presence of serial and cross-sectional dependence in the data. The proposed uniform nonparametric inference method readily accommodates general spatio-temporal dependence. It may be used to make functional inference concerning the conditional mean function in panel-data settings with a large T, regardless of whether N is small or large.

72141305), and the Fundamental Research Funds for General Universities (S20220165).

Figure 1. Volume-volatility relationship before and after COVID-19. NOTE: The figure plots the nonparametric estimates of the conditional mean functions of the volatility given the (transformed) turnover for the pre- and post-COVID subsamples using all stocks. Each time series of volatility and turnover is log-normalized to have zero sample mean and unit standard deviation over the full sample period from October 11, 2018 to June 30, 2021. The transformed turnover is defined by transforming the normalized data via the $x \mapsto 2\Phi(x) - 1$ transformation so that it takes values on the [−1, 1] interval. The nonparametric series estimation is implemented using Legendre polynomials up to the sixth order, which is guided by the rule $m = 2T^{0.19}$. The 95% uniform confidence bands are computed using Algorithm 1, where the Newey-West bandwidth is set to $0.75T^{1/3}$.

Figure 2. Volume-volatility relationship for small and large firms. NOTE: The figure plots the nonparametric estimate of the conditional mean function of volatility given the transformed turnover for each quartile of stocks sorted by their market capitalizations. The estimation is done separately for each size group following the same procedure as described in Figure 1.


Figure 3. Nonparametric estimation of information rigidity. NOTE: The left panel plots the nonparametric estimate of the conditional mean function of the forecast error given the transformed forecast revision, and the right panel plots the nonparametric estimate of its derivative. Each time series of forecast error or forecast revision is normalized to have zero sample mean and unit standard deviation. The transformed forecast revision is defined by transforming the normalized data via $x \mapsto 2\Phi(x) - 1$ so that it takes values on the [−1, 1] interval. The nonparametric series estimation is implemented using Legendre polynomials up to the fifth order, which is guided by the rule $m = 2T^{0.19}$. The 90% uniform confidence band is computed using Algorithm 1 (resp. Algorithm 2) for the left (resp. right) panel, where the Newey-West bandwidth is set to $0.75T^{1/3}$.

Figure 4. Test for linear specification. NOTE: The figure plots the nonparametric estimate of the conditional mean function of the linear-regression residual given the transformed forecast revision. Each time series of forecast error and forecast revision is normalized to have zero sample mean and unit standard deviation. The residuals are obtained from a pooled linear panel regression according to (3.2). The transformed forecast revision is defined by transforming the normalized data via $x \mapsto 2\Phi(x) - 1$ so that it takes values on the [−1, 1] interval. The nonparametric series estimation is implemented using Legendre polynomials up to the fifth order, which is guided by the rule $m = 2T^{0.19}$. The 90% uniform confidence band is computed using Algorithm 1, where the Newey-West bandwidth is set to $0.75T^{1/3}$.
• S denotes the matrix spectral norm and H T,t denotes the jth component of H T,t .
uniformly for any sequence T of integers that satisfies T ≤ T and T