Model averaging estimation of panel data models with many instruments and boosting

Applied researchers often confront two issues when using the fixed effect-two-stage least squares (FE-2SLS) estimator for panel data models. One is that it may lose its consistency due to too many instruments. The other is that the gain of using FE-2SLS may not exceed its loss when the endogeneity is weak. In this paper, an $L_{2}$Boosting regularization procedure for panel data models is proposed to tackle the many instruments issue. We then construct a Stein-like model-averaging estimator to take advantage of the FE and FE-2SLS-Boosting estimators. Finite sample properties are examined in Monte Carlo simulations and an empirical application is presented.


Introduction
In the social and behavioral sciences, research on the causal effect of one variable on another is far from settled. Regression analysis may fail to give a reliable estimate of the causal effect for many reasons. In a panel data or longitudinal data model, the fixed effects estimator may suffer from endogeneity issues that arise from correlated unobserved effects or from correlation between the explanatory variables and the idiosyncratic errors. In the presence of such correlations, both fixed effects (FE) and random effects (RE) estimators yield biased and inconsistent estimates of the parameters. The resulting bias cannot be removed by differencing. The commonly used technique to overcome this problem is to use instruments for the endogenous explanatory variables. The most basic approach is two-stage least squares (2SLS) estimation. In most applied settings, many analysts have favored the 2SLS approach, as shown in Hausman and Taylor [19], Amemiya and MaCurdy [2], and Breusch et al. [10]. These papers use instrumental variable (IV) procedures to estimate the parameters of the panel data model with endogenous regressors. IV estimation was first introduced by Wald [28]. A major breakthrough came later with the proposal of 2SLS by Basmann [6] and Theil [27]. Reference [4] studies the widely used fixed effects-two-stage least squares (FE-2SLS) estimator, which is popular mainly because of its efficiency within the class of IV estimators under conditional homoscedasticity.
However, there are two issues that often arise in practice. The first issue is when the endogeneity is weak: the consistency gain of using FE-2SLS may not exceed its efficiency loss. The other issue is when there are too many instruments: FE-2SLS may converge to FE, losing its consistency even when all the instruments are relevant and valid. In this paper, we propose a two-step procedure to handle these two issues.
First, if the endogeneity is strong, FE is inconsistent and the gains from FE-2SLS may be big. However, if the endogeneity is weak, the consistency gain from using FE-2SLS can be small. As FE-2SLS is less efficient than FE, there is a trade-off between the inefficient FE-2SLS estimation and the inconsistent FE estimation. Under this scenario, the FE-2SLS and FE estimators can be combined to take advantage of both and obtain a combined estimator with a lower total mean squared error risk.
Second, 2SLS needs a sufficient number of IVs. Otherwise, the finite-order integer moments of the 2SLS estimator may not exist [21]. Mogstad et al. [22] use the Web of Science database to survey all papers on the topic of instruments published in the top five economic journals between January 2000 and October 2018, and find that more than half of these empirical studies report 2SLS results from a specification based on multiple IVs. Their results confirm that empirical researchers often use more IVs than endogenous variables in 2SLS estimation. However, applied researchers often confront problems when there is a large number of instruments in panel data models, e.g. when the IVs are formed from lagged endogenous variables. As shown in Bekker [7], too many instruments cause inconsistency of the 2SLS estimator. Since the FE-2SLS estimator is sensitive to a large number of instruments, a regularization method is necessary to reduce the dimension of the instruments for consistent estimation. For cross-sectional models, there is a rich literature on regularization approaches. For example, in Belloni et al. [8] and Belloni and Chernozhukov [9], Lasso is used for instrument selection. Caner [12], Fan and Liao [15], and Cheng and Liao [13] extend Lasso-type regularization to the generalized method of moments (GMM). Other than these regularized methods, Donald et al. [14] use information criteria for moment selection, while [23] uses $L_2$Boosting for instrumental variable selection. For panel data models, little research is available on data-driven regularization. In this paper, we extend the $L_2$Boosting of Bühlmann [11] to regularize the FE-2SLS estimator for a large $n$ and large $T$ panel data model. We call this estimator 'FE-2SLS-Boosting'. We perform a series of Monte Carlo simulations to examine the issue that FE-2SLS becomes inconsistent if too many instruments are used, and show that the proposed FE-2SLS-Boosting restores consistency when there are many instruments.
We finally apply the proposed two-step approach to U.S. real house price data and examine the extent to which house price fluctuations are driven by fundamental factors. This application is policy relevant. The empirical results indicate that house prices that deviate from the equilibrium will eventually revert [20] (hereafter HPY). In this paper, the original HPY panel data of 49 states over the 29-year period 1975-2003 are used to discuss the responses of the U.S. housing market to fluctuations in three fundamentals: incomes, population, and interest rates.
In sum, our procedure handles these two problems in the following two steps. First, $L_2$Boosting (modified for a panel data model) is applied to regularize FE-2SLS by selecting instruments. Then FE and FE-2SLS-Boosting are combined to deal with weak endogeneity. With too many instruments, FE-2SLS can be inconsistent and can be worse than FE. Regularization by $L_2$Boosting maintains consistency of the FE-2SLS-Boosting estimator. The combination of FE and FE-2SLS-Boosting is shown to further improve over the FE-2SLS-Boosting estimator. Our Monte Carlo experiment demonstrates the relative performance of the FE, FE-2SLS, FE-2SLS-Boosting, and regularized combined (model-averaging) estimators.
The rest of the paper is organized as follows. In Section 2, the FE and FE-2SLS estimators are discussed. In Section 3, we present a combined estimator that combines the FE and FE-2SLS estimators. In Section 4, we discuss the need for and benefits of regularization when there are many instruments and introduce FE-2SLS-Boosting. Section 5 provides a Monte Carlo simulation. Section 6 presents an empirical example of real house prices in the U.S. Section 7 concludes with a brief discussion. All proofs are in the supplemental materials.

FE and FE-2SLS
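Consider the following large $n$ and large $T$ panel data model with fixed effects:

$$y_{it} = x_{it}'\beta + \alpha_i + u_{it}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T, \tag{1}$$

where $x_{it}$ is $q \times 1$, and $\beta$ is a $q \times 1$ vector of unknown parameters. The $\alpha_i$'s are fixed effects and the $u_{it}$'s are the random disturbances. In matrix notation, Equation (1) can be written as

$$y = X\beta + D\alpha + u, \tag{2}$$

where $y = (y_{11}, \dots, y_{1T}, \dots, y_{n1}, \dots, y_{nT})'$ is $nT \times 1$, $X$ and $u$ are stacked in the same order, $\alpha = (\alpha_1, \dots, \alpha_n)'$, and $D = I_n \otimes \iota_T$ is the $nT \times n$ matrix of individual dummies with $\iota_T$ a $T \times 1$ vector of ones. Premultiplying by the within-transformation matrix $Q = I_{nT} - D(D'D)^{-1}D'$ gives

$$Qy = QX\beta + Qu, \tag{3}$$

where $QD = 0$. Noting that $Q$ is idempotent, $\hat{\beta}_{FE}$ can be obtained as

$$\hat{\beta}_{FE} = (X'QX)^{-1}X'Qy.$$

Let plim denote the probability limit operator as $n, T \to \infty$. According to Corollary 3.1 in Bai [3], as $n, T \to \infty$, under the i.i.d. assumption on $u_{it}$, the asymptotic distribution of $\hat{\beta}_{FE}$ is

$$\sqrt{nT}\,(\hat{\beta}_{FE} - \beta) \xrightarrow{d} N\!\left(0,\; \sigma_u^2 \left(\operatorname{plim} \frac{X'QX}{nT}\right)^{-1}\right).$$

The feasible estimator $\hat{\sigma}_u^2$ of $\sigma_u^2$ can be obtained from the residuals $\hat{\varepsilon}_{it} = y_{it} - x_{it}'\hat{\beta}_{OLS}$ of the ordinary least squares (OLS) regression of $y$ on $X$, where $\hat{\beta}_{OLS} = (X'X)^{-1}X'y$.

The endogeneity occurs due to (i) correlation of $\alpha_i$ with $x_{it}$ or (ii) correlation of $u_{it}$ with $x_{it}$. We consider the latter case here, for which the FE estimator becomes inconsistent. With $u_{it}$ and $x_{it}$ correlated, the vector $x_{it}$ is endogenous. Performing 2SLS on (3) with $QZ$ as the set of instruments, one gets the FE-2SLS estimator

$$\hat{\beta}_{FE\text{-}2SLS} = (X'P_{QZ}X)^{-1}X'P_{QZ}\,y,$$

where $P_{QZ} = QZ(Z'QZ)^{-1}Z'Q$ and $Z$ is the $nT \times \ell$ matrix of instruments.

Remark 2.1: If a subset of regressors is treated as endogenous, consider the following structural equation of a panel data model:

$$y = X_1\beta_1 + Z_1\beta_2 + D\alpha + u,$$

where $X = (X_1, Z_1)$ and $\beta = (\beta_1', \beta_2')'$. Let $X_1$ be the $q_1$ endogenous variables, $Z_1$ be the $\ell_1$ included exogenous variables, and $q = q_1 + \ell_1$. Let $Z = (Z_1, Z_2)$ be the set of $\ell$ $(= \ell_1 + \ell_2)$ exogenous variables (instrumental variables). This equation is identified if $\ell \geq q$, and therefore $\ell_2 \geq q_1$. In this case, $QZ$ can be used as the set of instruments to get the FE-2SLS estimator as $\hat{\beta}_{FE\text{-}2SLS} = (X'P_{QZ}X)^{-1}X'P_{QZ}\,y$.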
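The two estimators of this section can be illustrated with a minimal NumPy sketch. It assumes a balanced panel whose rows are stacked unit by unit, so that the within transformation $Q$ reduces to subtracting each unit's time mean. The function names (`within_transform`, `fe_estimator`, `fe_2sls_estimator`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def within_transform(A, n, T):
    """Apply Q = I - D(D'D)^{-1}D' by subtracting each unit's time mean.
    Assumes a balanced panel with rows stacked unit by unit (T rows per unit)."""
    A3 = np.asarray(A, dtype=float).reshape(n, T, -1)
    return (A3 - A3.mean(axis=1, keepdims=True)).reshape(n * T, -1)

def fe_estimator(y, X, n, T):
    """FE estimator: OLS of Qy on QX, i.e. (X'QX)^{-1} X'Qy."""
    Xq = within_transform(X, n, T)
    yq = within_transform(y, n, T)
    return np.linalg.lstsq(Xq, yq, rcond=None)[0].ravel()

def fe_2sls_estimator(y, X, Z, n, T):
    """FE-2SLS estimator: 2SLS on the within-transformed model with QZ as instruments."""
    Xq, yq, Zq = (within_transform(A, n, T) for A in (X, y, Z))
    # First stage: fitted values of QX from the projection on QZ
    X_hat = Zq @ np.linalg.lstsq(Zq, Xq, rcond=None)[0]
    # Second stage: (X' P X)^{-1} X' P y with P the projection on QZ
    return np.linalg.solve(X_hat.T @ Xq, X_hat.T @ yq).ravel()
```

Using `lstsq` for the first stage avoids forming the $nT \times nT$ projection matrix explicitly, which matters when $nT$ is large.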

Combined estimator of FE and FE-2SLS
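The FE-2SLS estimator is preferred to the FE estimator as it is consistent under endogeneity (which can be ensured by the regularization of many instruments), while the FE estimator is inconsistent. However, in small samples, FE-2SLS can have a much larger variance, so FE can have a smaller mean squared error (MSE), especially when the extent of endogeneity is not severe. Motivated by this observation, we follow [17] to propose the following combined estimator $\hat{\beta}_c$, which is the weighted average of the FE and FE-2SLS estimators with the weights depending on the Hausman statistic [18]. The combination of FE and FE-2SLS is expected to improve the estimation precision. Let

$$\hat{\beta}_c = w\,\hat{\beta}_{FE} + (1 - w)\,\hat{\beta}_{FE\text{-}2SLS},$$

where $\hat{\beta}_{FE\text{-}2SLS}$ is the FE-2SLS estimator, and

$$w = \begin{cases} \tau / H_n & \text{if } H_n \geq \tau, \\ 1 & \text{if } H_n < \tau, \end{cases}$$

where

$$H_n = nT\,(\hat{\beta}_{FE\text{-}2SLS} - \hat{\beta}_{FE})'\,W\,(\hat{\beta}_{FE\text{-}2SLS} - \hat{\beta}_{FE}),$$

$\tau$ is a shrinkage parameter, and $W$ is a weighting matrix.

The model has the following reduced-form representation for $x_{it}$:

$$x_{it} = \Pi' z_{it} + v_{it},$$

with $E(z_{it} v_{it}') = 0$. Next, write the structural equation error $u_{it}$ as a linear function of $v_{it}$ and $\varepsilon_{it}$:

$$u_{it} = v_{it}'\rho + \varepsilon_{it},$$

with $E(v_{it}\varepsilon_{it}) = 0$. We use the local asymptotic approach, in which $\rho$ is local to zero:

$$\rho = \frac{\delta}{\sqrt{nT}}, \tag{16}$$

where $\delta$ is a $q \times 1$ localizing parameter, which indexes the degree of correlation between $u_{it}$ and $v_{it}$. $\delta$ (and thus $\rho$) controls the degree of endogeneity. We make the following assumptions:

Assumption A2: $u_{it}$ are i.i.d. over $t$ and $i$. $\alpha_i$ are independent over $i$. $\alpha_i$ and $x_{it}$ are independent of $u_{it}$ for any $i$ and $t$.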
In Assumption A1, the $x_i$ are not necessarily identically distributed across different $i$. More details can be found in Ahn and Moon [1]. For a vector or matrix $A$, its norm is defined as $\|A\| = (\operatorname{tr}(A'A))^{1/2}$. Assumption A2 rules out the cases in which the regressors include lagged dependent variables, although a dynamic panel introduces an even larger number of instruments, for which the proposed method of using $L_2$Boosting to select instruments would be useful. More discussion of the i.i.d. assumption on the idiosyncratic errors in a large $T$ panel is provided in Bai [3]. Assumption A3 is standard in the literature. Assumption A4 is the rank condition that ensures the coefficient $\beta$ is identified.
Denote $V = (v_{11}, \dots, v_{1T}, \dots, v_{n1}, \dots, v_{nT})'$, with second-moment matrix $E(VV')$. We then have the following theorem for the joint asymptotic normality of the FE and FE-2SLS estimators.

Proof: See supplemental materials.

Theorem 3.1 extends [17] to panel data models and gives expressions for the joint asymptotic distribution of the $\hat{\beta}_{FE}$ and $\hat{\beta}_{FE\text{-}2SLS}$ estimators, the Hausman statistic, and the combined estimator under the local-to-exogeneity setup in Equation (16). These alternative estimators (FE, FE-2SLS, and combined) are compared in terms of asymptotic risk. Following [16,17], the asymptotic risk of any sequence of estimators $\hat{\beta}$ of $\beta$ is defined as

$$R(\hat{\beta}, \beta, W) = \lim_{\zeta \to \infty} \liminf_{n, T \to \infty} E \min\left\{ nT\,(\hat{\beta} - \beta)'\,W\,(\hat{\beta} - \beta),\; \zeta \right\}.$$

Denote the largest eigenvalue of a matrix $A$ by $\lambda_{\max}(A)$. The following theorem is an extension of Theorem 3.2 of Hansen [17] to the panel data model with a general weighting matrix $W$.
Proof: See supplemental materials.

Equation (26) shows that the asymptotic risk of the combined estimator is strictly less than that of the FE-2SLS estimator, so long as the shrinkage parameter $\tau$ satisfies the condition (24). In the special case $W = (V_2 - V_1)^{-1}$, the condition (24) simplifies to $q > 2$ and $0 < \tau \leq 2(q - 2)$. The assumption $q > 2$ is Stein's [26] classic condition for shrinkage.
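As a quick check of the simplified condition, the following worked case takes $W = (V_2 - V_1)^{-1}$ and plugs in $q = 3$, the number of regressors used later in the simulation and the empirical application, and $q = 2$ for contrast. It is only arithmetic on the stated bound, not an additional result.

```latex
\begin{aligned}
q = 3 &: \quad 0 < \tau \le 2(q-2) = 2, \\
q = 2 &: \quad 2(q-2) = 0 \;\Rightarrow\; \text{no admissible } \tau > 0 .
\end{aligned}
```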
The following two corollaries are obtained under Assumptions A1-A4 and the local-to-exogeneity setup. Corollary 3.1 indicates that when endogeneity is weak ($\rho$, and hence $\delta$, is close to zero), the FE estimator may perform better than the FE-2SLS estimator.
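The weighting rule of this section can be sketched in a few lines of NumPy. The sketch assumes the Hansen-type weight $w = \tau / H$ when $H \geq \tau$ and $w = 1$ otherwise, with the special-case weighting matrix $W = (V_2 - V_1)^{-1}$. It further assumes that `V_fe` and `V_iv` are estimated finite-sample covariance matrices of the two estimators (already scaled by the sample size), so that $H$ is the usual Hausman statistic. The function name `combined_estimator` and its arguments are hypothetical.

```python
import numpy as np

def combined_estimator(b_fe, b_iv, V_fe, V_iv, tau):
    """Stein-like average of the FE and FE-2SLS estimates.

    b_fe, b_iv : FE and FE-2SLS coefficient estimates (length q)
    V_fe, V_iv : their estimated covariance matrices (q x q), assumed to be
                 finite-sample covariances so that H is the Hausman statistic
    tau        : shrinkage parameter, must be positive
    Returns (combined estimate, weight on FE, Hausman statistic).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    b_fe = np.asarray(b_fe, dtype=float)
    b_iv = np.asarray(b_iv, dtype=float)
    diff = b_iv - b_fe
    # Special-case weighting matrix W = (V_2 - V_1)^{-1}
    W = np.linalg.inv(np.asarray(V_iv, dtype=float) - np.asarray(V_fe, dtype=float))
    H = float(diff @ W @ diff)
    # Small H (weak evidence of endogeneity): all weight on FE.
    # Large H: the weight on FE shrinks as tau / H.
    w = tau / H if H >= tau else 1.0
    return w * b_fe + (1.0 - w) * b_iv, w, H
```

When the two estimates are close relative to their variance difference, the rule returns FE; as the Hausman statistic grows, the estimate moves toward FE-2SLS without ever discarding FE entirely.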

FE-2SLS-Boosting
According to [7], the 2SLS estimator is inconsistent when the number of instruments is large. By replacing $X$ with $X^* = QX$ and $Z$ with $Z^* = QZ$, we extend the Bekker theorem on the inconsistency of 2SLS to FE-2SLS in panel data models, i.e. FE-2SLS is inconsistent unless $\ell / nT \to 0$. As shown in the supplemental materials, while $\hat{\beta}_{FE}$ is inconsistent due to the endogeneity, $\hat{\beta}_{FE\text{-}2SLS}$ may also be inconsistent due to the large number of instruments. The reduced form equation for $x^*_{it}$ can be written as

$$x^*_{it} = \Pi' z^*_{it} + v_{it}, \tag{27}$$

with $E(z^*_{it} v_{it}') = 0$. The instrument vector $z^*_{it}$ is $\ell \times 1$, $\Pi$ is an $\ell \times q$ matrix, and $\Pi_k$ is the $k$th column of $\Pi$ for $k = 1, \dots, q$.
In order to ensure the consistency of the FE-2SLS estimator, we extend the regularization method $L_2$Boosting of Bühlmann [11] to Equation (27) for the panel data model. We use this regularization method to select a subset of instruments and compute the FE-2SLS estimator based on the selected instruments. We refer to the FE-2SLS estimator using $L_2$Boosting as 'FE-2SLS-Boosting'.
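Let $m$ denote the $m$th iteration in the Boosting procedure, and $\bar{M}$ denote the maximum number of iterations. At each iteration $m$, we have a weak learner $f_{m,it,k}$ that gives a less accurate estimate of $X^*_k$. But the summation of the weak learners up to step $M$ gives a strong estimate of $X^*_k$. We refer to this summation as a strong learner, $F_{M,it,k} = f_{0,k} + \sum_{m=1}^{M} c_m f_{m,it,k}$, where $c_m$ is the learning rate that controls the step of the learning process in Boosting. To simplify the notation, we drop the subscript $k$ in the procedure, so $f_{m,it} = f_{m,it,k}$ and $F_{m,it} = F_{m,it,k}$. However, the procedure is repeated for each $k \in \{1, \dots, q\}$. The algorithm for instrument selection using $L_2$Boosting for each $X^*_k$ is as follows:

(1) When $m = 0$, the initial estimate for $x^*_{it,k}$ is

$$f_0 = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} x^*_{it,k},$$

which is the simple mean of $x^*_{it,k}$. Denote $F_{0,it} = f_0$ for all $i$ and $t$.

(2) For $m = 1, \dots, \bar{M}$:

(a) Compute the current residual

$$\hat{v}_{m,it} = x^*_{it,k} - F_{m-1,it}. \tag{29}$$

(b) Regress the current residual on each instrument $z^*_{j,it}$, $j = 1, \dots, \ell$, one at a time, with OLS coefficient $\hat{\gamma}_{m,j}$. We select the instrument that has the minimum sum of squared residuals, such that

$$j_m = \arg\min_{j \in \{1, \dots, \ell\}} \sum_{i=1}^{n} \sum_{t=1}^{T} \left( \hat{v}_{m,it} - z^*_{j,it}\,\hat{\gamma}_{m,j} \right)^2.$$

(c) The weak learner is

$$f_{m,it} = z^*_{j_m,it}\,\hat{\gamma}_{m,j_m}, \tag{31}$$

where $z^*_{j_m,it}$ is the instrument that is selected at iteration $m$.

(d) The strong learner $F_{m,it}$ is updated as

$$F_{m,it} = F_{m-1,it} + c_m f_{m,it},$$

where $c_m > 0$ is a learning rate.

(3) We repeat Steps 1 and 2 for $k = 1, \dots, q$.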
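The selection steps above can be sketched as componentwise $L_2$Boosting for a single regressor. This is a minimal illustration under stated assumptions: the learning rate is held constant (`rate`), the number of iterations `n_iter` is fixed in advance rather than chosen by a stopping rule, and every instrument column is assumed to be non-degenerate. The function name `l2boost_select` is hypothetical.

```python
import numpy as np

def l2boost_select(x, Z, n_iter=50, rate=0.1):
    """Componentwise L2Boosting of one (within-transformed) regressor x on the
    columns of Z. Returns the fitted values and the sorted indices of the
    instruments selected at least once. Assumes a constant learning rate and
    a fixed number of iterations (no data-driven stopping rule)."""
    x = np.asarray(x, dtype=float).ravel()
    Z = np.asarray(Z, dtype=float)
    F = np.full_like(x, x.mean())            # step (1): start from the simple mean
    zz = (Z ** 2).sum(axis=0)                # z_j'z_j for every instrument
    selected = []
    for _ in range(n_iter):
        resid = x - F                        # (a) current residual
        gamma = Z.T @ resid / zz             # univariate OLS slope per instrument
        ssr = (resid ** 2).sum() - gamma ** 2 * zz   # SSR of each univariate fit
        j = int(np.argmin(ssr))              # (b) instrument with the smallest SSR
        F = F + rate * gamma[j] * Z[:, j]    # (c)-(d) weak learner, damped update
        selected.append(j)
    return F, sorted(set(selected))
```

Running this once per column of the within-transformed regressor matrix and taking the union of the selected indices gives one possible instrument set to pass to a post-selection FE-2SLS step.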
A stopping rule is necessary in order to avoid over-fitting. Extending [11], we introduce a modified AIC for panel data models to choose the optimal number of iterations $\hat{M}$. Let $\hat{V}_m = (\hat{v}_{m,11}, \dots, \hat{v}_{m,nT})'$, $f_m = (f_{m,11}, \dots, f_{m,nT})'$, and $F_m = (F_{m,11}, \dots, F_{m,nT})'$. We define $P_m = Z^*_{j_m}(Z^{*\prime}_{j_m} Z^*_{j_m})^{-1} Z^{*\prime}_{j_m}$ to be an $nT \times nT$ matrix. From Equation (31), $f_m = P_m \hat{V}_m$. When $m = 0$, $P_{j_0}$ is an $nT \times nT$ matrix with all elements equal to $1/nT$. Then the strong learner at each iteration $m$ is

$$F_m = F_{m-1} + c_m P_m (X^*_k - F_{m-1}).$$

Since the $\mathrm{AIC}_c$ in Bühlmann [11] does not provide enough penalty in panel data models, we modify the $\mathrm{AIC}_c$ and denote the modified criterion by $\mathrm{AIC}^*_c$. Then $\hat{M} = \arg\min_{m = 1, \dots, \bar{M}} \mathrm{AIC}^*_c(m)$ yields the stopping rule for the iterations.
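For orientation, the baseline that the modified criterion builds on can be written out. The block below derives the Boosting operator $B_m$ implied by the update in step (d) and states the unmodified corrected AIC of Bühlmann [11], written here with $N = nT$ observations as an assumption about how it carries over to the stacked panel. It is not the paper's $\mathrm{AIC}^*_c$, which imposes a stronger penalty.

```latex
\begin{aligned}
X^*_k - F_m &= (I - c_m P_{j_m})(X^*_k - F_{m-1})
  \;\Longrightarrow\; F_m = B_m X^*_k, \\
B_m &= I - (I - c_m P_{j_m})(I - c_{m-1} P_{j_{m-1}}) \cdots (I - c_1 P_{j_1})(I - P_{j_0}), \\
\mathrm{AIC}_c(m) &= \log\!\big(\hat{\sigma}^2_m\big)
  + \frac{1 + \operatorname{tr}(B_m)/N}{1 - \big(\operatorname{tr}(B_m) + 2\big)/N},
  \qquad \hat{\sigma}^2_m = \frac{1}{N}\,\lVert X^*_k - F_m \rVert^2 .
\end{aligned}
```

The trace of $B_m$ plays the role of the degrees of freedom used up after $m$ iterations, so the penalty grows as more Boosting steps are taken.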
For L 2 Boosting to be consistent for panel data models, we impose the following regularity conditions, under which FE-2SLS-Boosting is consistent for the conditional mean of x * it in quadratic mean under a panel data model.We make the following assumptions extending [11]: | and denotes the underlying probability space.
In Assumption B1, the dimension of the instruments is allowed to grow exponentially with respect to the number of observations, so the instruments can be high-dimensional. Assumption B2 gives an $L_1$-norm sparseness condition that the sum of the absolute coefficients over all $j$ is bounded. In this case, all instruments may be relevant, but the contribution of many instruments is very small. Hence weakly relevant instruments are allowed in the model. Assumption B3 states that, with the growth rate of $nT$ restricted, the maximum realization of the random variable $Z^*_j$ over the sample space needs to be bounded. Assumption B4 specifies the existence of some higher moments of the error term $v_{it}$, and the number of existing moments depends on $\eta$ from Assumption B1. Thus the number of existing moments and the growth rate of $nT$ are related.
Similarly to Theorem 3.1 in Bühlmann [11], under Assumptions B1-B4, the $L_2$Boosting estimate converges to the conditional mean of $x^*_{it}$ in quadratic mean. Thus, FE-2SLS-Boosting is able to shrink to zero the elements of the coefficient matrix $\hat{\Pi}$ that correspond to weak instruments, and the subsequent application of FE-2SLS treats the removed instruments in $Z^*$ as irrelevant to the endogenous variables $X^*$. Once the instruments are selected, we use the selected instruments to compute the FE-2SLS-Boosting estimator, and then the combined estimator of FE and FE-2SLS-Boosting.

Remark 4.1: Suppose some elements in $\Pi$ are zeros; then only a subset of $z_{it}$ is relevant to $x_{it}$. However, $L_2$Boosting does not require some elements in $\Pi$ to be zeros. Assumption B2 only requires a much weaker form of sparsity, namely the sparseness condition that the sum of the absolute values of the coefficients over all $j$ is bounded. Hence, only a finite number of instruments are strongly relevant, even if all the coefficients may be nonzero.

Monte Carlo
In this section, we carry out a series of simulations to show that FE-2SLS can perform poorly if too many instruments are used, and that the regularization method FE-2SLS-Boosting restores the consistency of the FE-2SLS estimator and the efficiency of the combined estimator. We also evaluate the finite sample performance of the regularized combined estimator and compare its risk with that of the other estimators. A design similar to [17] is used. We consider the following data generating process (DGP):

$$y_{it} = x_{it}'\beta + \alpha_i + u_{it}, \tag{35a}$$

$$x_{it} = v_{it} + \theta \odot v_{i,t-1}, \tag{35b}$$

where $\odot$ denotes the element-wise product. Recall that $x_{it}$ is a $q \times 1$ vector. For simplicity, we consider the case when all elements of the $q \times 1$ moving average (MA) parameter vector $\theta$ are the same and set at the value 0.3. $\rho$ is then simplified to a scalar. The $v_{it}$ are i.i.d. $N(0, I_q)$. Each pair of elements of the errors $u_{it}$ and $v_{it}$ has covariance $\rho / \sqrt{q}$, with all other correlations zero. The $\alpha_i$ are i.i.d. $N(0, 1)$. The parameter $\rho$ can vary in $(-1, 1)$ to control the extent of endogeneity of $x_{it}$. The results are not quantitatively sensitive to the value of $\beta$, thus we set $\beta$ to zero without loss of generality.
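A design of this kind can be simulated with a short NumPy sketch. Several details below are assumptions made for illustration rather than taken from the paper: $u_{it}$ is given unit variance, $x_{it}$ follows an MA(1) in $v_{it}$ with a common scalar `theta`, the instruments are the first `p` lags of $x_{it}$, and `p + 1` pre-sample periods are generated so that all $T$ periods remain usable (the paper instead shortens the effective time length). The function name `simulate_panel` is hypothetical.

```python
import numpy as np

def simulate_panel(n=49, T=29, q=3, rho=0.5, theta=0.3, p=2, beta=None, seed=0):
    """Simulate one panel from an MA(1)-regressor design with endogeneity rho.
    Returns stacked (y, X, Z, u, v) with rows ordered unit by unit."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(q) if beta is None else np.asarray(beta, dtype=float)
    burn = p + 1                               # pre-sample periods for the MA term and lags
    v = rng.normal(size=(n, T + burn, q))
    eps = rng.normal(size=(n, T + burn))
    # u has unit variance (assumed) and covariance rho/sqrt(q) with each element of v
    u = (rho / np.sqrt(q)) * v.sum(axis=2) + np.sqrt(1.0 - rho ** 2) * eps
    x = v.copy()
    x[:, 1:, :] += theta * v[:, :-1, :]        # x_it = v_it + theta * v_{i,t-1}
    alpha = rng.normal(size=(n, 1))            # fixed effects, i.i.d. N(0, 1)
    y = x @ beta + alpha + u
    s = burn                                   # first in-sample period
    # Instruments z_it = (x_{i,t-1}, ..., x_{i,t-p}), giving q * p columns
    Z = np.concatenate([x[:, s - lag:s - lag + T, :] for lag in range(1, p + 1)], axis=2)
    return (y[:, s:].reshape(-1), x[:, s:, :].reshape(-1, q),
            Z.reshape(-1, q * p), u[:, s:].reshape(-1), v[:, s:, :].reshape(-1, q))
```

Looping this over a grid of `rho` values and many seeds, and applying each estimator to every draw, gives the ingredients for risk comparisons of the kind reported in this section.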
Our goal is the consistent and efficient estimation of the structural parameter $\beta$. In the DGP, the variable $x_{it}$ is endogenous, following an invertible vector moving average VMA(1) process in (35b), which we approximate by a VAR($p$) model of order $p \in \{1, 2, \dots, 12\}$. We consider the lagged variables of $x_{it}$ as instruments, i.e. $z_{it} \equiv (x_{i,t-1}', \dots, x_{i,t-p}')'$. The parameter $\theta$ controls the strength of the instruments $z_{it}$, as they are taken from the lagged $x_{it}$. The number of instruments equals $\ell = q \times p$. To mimic the situation in the empirical application, which estimates U.S. house prices using three endogenous variables in 49 states over 29 years, we consider $n = 49$, $T = 29$, $q = 3$. We set the range of $\rho$ on a 20-point grid on $[0, 0.975]$. Estimates of the three estimators, $\hat{\beta}_{FE\text{-}2SLS}$, $\hat{\beta}_{FE}$, and $\hat{\beta}_c$, are computed from 2000 Monte Carlo replications. To compare these three estimators, we calculate the median squared error (MSE) of each estimator.

We present the results graphically. Figure 1(a) plots $\mathrm{MSE}(\hat{\beta})$ for $p = 1$, $\ell = 3$. This is the case of just-identified instruments. Figure 1(a) shows that the combined estimator has lower $\mathrm{MSE}(\hat{\beta})$ than FE-2SLS, regardless of the degree of endogeneity. It also shows the classical problem of the 2SLS estimator that its moments do not exist in the just-identified case [cf. 24,25]. Observe that $\mathrm{MSE}(\hat{\beta})$ of FE-2SLS in Figure 1(a) is quite large compared to that in Figure 1(b) for all degrees of endogeneity $\rho$. This shows that the number of instruments in Figure 1(a) is too small and that it would be necessary to increase the number of instruments. If there are more instruments, then the system is over-identified, with more moment restrictions than parameters to estimate. This ensures that FE-2SLS has a finite mean with one over-identifying restriction and a finite variance with two over-identifying restrictions [21]. Figure 1(b) plots the MSE for $p = 2$, $\ell = 6$. Now the model is over-identified and FE-2SLS is well behaved. Figure 1(b) shows that the combined estimator has similar $\mathrm{MSE}(\hat{\beta})$ to FE for the small values of $\rho$ where FE has small $\mathrm{MSE}(\hat{\beta})$. The reduction in risk achieved by the combined estimator is large unless $\rho$ is large. In the following panels of Figure 1, the combined estimator achieves some reduction in $\mathrm{MSE}(\hat{\beta})$ relative to FE-2SLS for small values of $\rho$. However, when the number of instruments becomes larger, as in Figure 1(f) with $p = 6$, $\ell = 18$, it starts to show that for large $\rho$, FE-2SLS becomes biased towards FE. This becomes more apparent as $\ell$ becomes even larger, as shown in the subsequent plots. Continuing to Figure 1(g)-(l), it is easy to see that FE-2SLS is biased and that the bias in FE-2SLS tends to get worse as more instruments are used.
To fix this bias problem in FE-2SLS, we use the extended $L_2$Boosting for panel data models to select the instruments, which makes the FE-2SLS estimator and the combined estimator more robust and restores their consistency when there are many potential instruments. To demonstrate this, we consider the setup in Figure 1(l) with $p = 12$, $\ell = 36$. In Figure 2(a), we zoom in on Figure 1(l) and add 'FE-2SLS-Boosting' and 'Combined-Boosting', where FE-2SLS-Boosting is the Boosting-regularized (post-selection) FE-2SLS estimator and Combined-Boosting is the combined estimator of the FE estimator and the FE-2SLS-Boosting estimator. Notice that the vertical scale of Figure 2(a) is between 0 and 0.2. By using $L_2$Boosting to regularize a panel data model with many instruments, the FE-2SLS-Boosting estimator restores its consistency. The MSEs of the FE-2SLS-Boosting estimator and the Combined-Boosting estimator are substantially reduced when compared to the MSE of the FE-2SLS estimator without instrument selection. When the endogeneity is weak (small values of $\rho$), the Combined-Boosting estimator (cyan-colored long-dashed) dominates the FE-2SLS-Boosting estimator (green-colored dotted). As the endogeneity gets stronger, the combining weight in the Combined-Boosting estimator shifts toward FE-2SLS-Boosting.
In addition, we examine the effect of the sample size on the risk behavior in finite samples with different $n$ and $T$. We fix the maximum lag at 12, and thus the effective time length is $T - 12$. When we have $n = 49$ and $T = 29$ (as in Figure 2

Remark 5.1: In the presence of many instruments, the combined estimator may behave poorly, and the conclusion of Theorem 3.2 may not hold for moderate to large values of $\rho$. Theorem 3.2 says that the combined estimator is always better than the FE-2SLS estimator in terms of asymptotic risk. Figure 1(g)-(l) and Figure 2 show, however, that this ranking does not hold when $p$ is large. This is because Theorem 3.2 holds only when FE-2SLS is consistent. When both $p$ and $\ell$ are large, FE-2SLS is inconsistent, and therefore Theorem 3.2 does not apply when there are many instruments. After selecting instruments through $L_2$Boosting, we reduce the dimension of the instruments used to compute the FE-2SLS estimator. This restores the conditions of Theorem 3.2 and ensures the consistency of the FE-2SLS-Boosting estimator and the efficiency of the combined estimator.

Estimation of house prices panel data model in United States
The U.S. housing price index, published by the Federal Housing Finance Agency (FHFA), is a measure designed to capture changes in the value of houses in the U.S. based on data provided by Fannie Mae and Freddie Mac. HPY (2010) suggest a possible spatial pattern in U.S. housing prices using the common correlated effects estimator. Baltagi and Li [5] replicated the results of HPY, extending the sample period to 2011 and using housing price indexes at the metropolitan area level. The HPY results are shown to be robust. As noted in Baltagi and Li [5], 'The U.S. housing price index rose by nearly 46 % from 2000 to 2006, followed by a sharp 28 % drop, unprecedented in American history'; the extended period of 2004-2011 thus covers a quite rare cycle of boom, bubble, crash, and recovery. What drives house prices? In this paper, the original HPY panel data of 49 states over the 29-year period 1975-2003 are used to examine the extent to which housing price fluctuations in the U.S. are driven by fundamental fluctuations. We consider the following panel data model

$$p_{it} = \alpha_i + \beta_1 y_{it} + \beta_2 c_{it} + \beta_3 g_{it} + u_{it},$$

where $i = 1, \dots, 49$, $t = 1, \dots, 29$, and $p_{it}$ is the logarithm of the real house price for the $i$th state in year $t$. $y_{it}$ is the logarithm of the real disposable income per capita. $c_{it} = r_{it} - \Delta p_{it}$ is the net cost of borrowing. $r_{it}$ is the long-term real interest rate. $g_{it}$ is the population growth rate. $\alpha_i$ is the state-specific effect, capturing the endowment of location, culture, etc. A more detailed description of the data can be found in HPY.

This is because Boosting selects 19 out of 36 instruments during the selection process. On the other hand, FE-2SLS uses all 36 instruments. The standard errors of the estimators are computed using the bootstrap. It is important to note that the combined estimators in each panel in Table 1(c,d) have smaller standard errors than the corresponding FE-2SLS estimators, which is in accordance with Theorem 3.2. For example, in Table 1(d), the standard errors of the FE-2SLS-Boosting estimates of the three coefficients are 0.0493, 0.6410, and 0.1153, respectively. After combining with FE, the standard errors of the Combined-Boosting estimates reduce to 0.0476, 0.6125, and 0.1099. The combined estimator thus yields smaller standard errors than FE-2SLS-Boosting.

Conclusion
The FE-2SLS estimator for panel data models is a widespread choice in empirical research. However, the FE-2SLS estimator is sensitive to the number of instruments and can be inconsistent when many instruments are used, even when all the instruments are relevant and valid. In this paper, we propose a two-step procedure: using $L_2$Boosting to select instruments, and then combining FE with FE-2SLS-Boosting. It is demonstrated that $L_2$Boosting for the selection of relevant instruments is important, as it ensures the consistency of the FE-2SLS-Boosting estimator and the efficiency of the combined estimator. The proposed procedure provides improvement over both the FE and FE-2SLS-Boosting estimators in terms of asymptotic risk. Our empirical application yields results similar to those in Holly et al. [20]: economic fundamentals play an important role in affecting U.S. real house prices at the state level.