Bayesian Doubly Adaptive Elastic-Net Lasso For VAR Shrinkage

We develop a novel Bayesian doubly adaptive elastic-net Lasso (DAELasso) approach for VAR shrinkage. DAELasso achieves variable selection and coefficient shrinkage in a data-based manner. It deals constructively with explanatory variables that tend to be highly collinear by encouraging a grouping effect, and it allows different degrees of shrinkage for different coefficients. Rewriting the multivariate Laplace distribution as a scale mixture, we establish closed-form conditional posteriors that can be drawn from with a Gibbs sampler. Empirical analysis shows that the forecasts produced by DAELasso and its variants are comparable to those of other popular Bayesian methods, which provides further evidence that the forecast performance of large and medium-sized Bayesian VARs is relatively robust to prior choices, and that in practice simple Minnesota-type priors can be more attractive than their complex and well-designed alternatives.

∗I would like to thank Gary Koop, Esther Ruiz, and two anonymous referees for their constructive comments. I would also like to thank the conference participants of CFE11, ESEM2012, and RCEF2012 for helpful discussions. Any remaining errors are my own responsibility.


Lasso
This section presents the priors, posteriors, and full conditional Gibbs schemes for Lasso, adaptive Lasso, e-net Lasso, and adaptive e-net Lasso.

Lasso VAR Shrinkage
Following Song and Bickel (2011), we define the Lasso estimator for a VAR. Correspondingly, the conditional multivariate mixture prior for β takes the form of equation (2), where Γ = [γ_1, γ_2, ..., γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L1 penalty are conditional on Σ through f_j(Γ). In equation (2), the variances of β_a and β_b for a ≠ b are related through M; however, β_a and β_b themselves are independent of each other.
We need to find an f_j(Γ) that yields tractable posteriors. The last term in equation (2) takes the form of a multivariate Normal distribution, Γ ∼ N(0, M). For ease of exposition, we first write out the N²k × N²k covariance matrix M. We then construct independent variables τ_j for j = 1, 2, ..., N²k using standard textbook techniques (e.g., Anderson, 2003; Muirhead, 1982).
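The construction of independent τ_j from a correlated Γ ∼ N(0, M) can be illustrated with standard Cholesky whitening. This is a generic sketch of the textbook technique the section cites, not necessarily the paper's exact transformation (which has unit Jacobian); the dimensions N and k are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for the paper's N-variable, k-lag VAR,
# where beta (and Gamma) have N^2 * k entries.
N, k = 3, 2
dim = N * N * k

# An arbitrary positive-definite Sigma; M = Sigma kron I_{Nk} as in the text.
A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)
M = np.kron(Sigma, np.eye(N * k))

# If M = L L' (Cholesky) and Gamma = L z with z ~ N(0, I),
# then tau = L^{-1} Gamma recovers independent standard-normal entries.
L = np.linalg.cholesky(M)
Z = rng.standard_normal((dim, 20000))
Gammas = L @ Z                       # 20000 draws of Gamma ~ N(0, M)
taus = np.linalg.solve(L, Gammas)    # whitened draws

# The sample covariance of the whitened draws is close to the identity.
ok = np.allclose(np.cov(taus), np.eye(dim), atol=0.1)
print(ok)
```

The same mechanism underlies the σ²_{γ_j} computations later in the section: sparsity of M (here block-diagonal under the Kronecker structure) keeps the factorization cheap.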
The Jacobian of transforming Γ ∼ N(0, M) to (8) is 1. Defining η_j = τ_j/λ_1, we can rewrite (8) accordingly. Letting f_j(Γ) = 2η_j², the scale mixture prior follows in (10). The last two terms in (10) constitute a scale mixture of Normals (with an exponential mixing density), which can be expressed via the univariate inverse-Gaussian density.¹ Equation (10) shows that the conditional prior for β_j is N(0, 1/(2η_j²)), and the conditional prior for β is the corresponding multivariate Normal. Priors for Σ and λ₁² can be elicited following standard practice in the VAR and Lasso literature; in this paper, we set a Wishart prior for Σ⁻¹ and a Gamma prior for λ₁². The remaining full conditional posteriors are then available in closed form. Conditional on arbitrary starting values, the Gibbs sampler contains six steps; in step 5, Γ is calculated based on the draws of Σ and 1/(2η_j²) in the current iteration.
1 We adopt the same form of the inverse-Gaussian density used in Park and Casella (2008).
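The six-step sampler can be illustrated with the closely related single-equation Bayesian lasso of Park and Casella (2008), which uses the same inverse-Gaussian mixing draws. This is a hedged sketch, not the paper's VAR sampler: the Wishart step for Σ⁻¹ and the Γ-recovery step are omitted, the regression is univariate-response, and the Gamma hyperparameters a = b = 0.1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated sparse regression standing in for one VAR equation.
n, p = 200, 8
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

a, b = 0.1, 0.1                 # assumed Gamma hyperparameters for lambda^2
inv_tau2 = np.ones(p)           # mixing parameters 1 / tau_j^2
sigma2, lam2 = 1.0, 1.0
keep = []

for it in range(2000):
    # 1. beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1/tau_j^2)
    A = X.T @ X + np.diag(inv_tau2)
    A_inv = np.linalg.inv(A)
    beta = rng.multivariate_normal(A_inv @ X.T @ y, sigma2 * A_inv)
    # 2. 1/tau_j^2 | rest ~ inverse-Gaussian(sqrt(lam2*sigma2/beta_j^2), lam2)
    inv_tau2 = rng.wald(np.sqrt(lam2 * sigma2 / beta**2), lam2)
    # 3. sigma2 | rest ~ inverse-Gamma
    resid = y - X @ beta
    sigma2 = ((resid @ resid + beta @ (inv_tau2 * beta)) / 2) / \
        rng.gamma((n - 1 + p) / 2)
    # 4. lam2 | rest ~ Gamma (conjugate update given the tau_j^2)
    lam2 = rng.gamma(p + a, 1.0 / (b + np.sum(1.0 / inv_tau2) / 2))
    if it >= 500:               # discard burn-in
        keep.append(beta)

post_mean = np.mean(keep, axis=0)
print(np.round(post_mean, 2))
```

The posterior mean recovers the large coefficients while shrinking the null ones toward zero; the VAR version replaces step 3 with the Wishart draw for Σ⁻¹ and recovers Γ from the current Σ and 1/(2η_j²) draws.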

Adaptive Lasso VAR Shrinkage
We define the adaptive Lasso estimator for a VAR analogously. Correspondingly, the conditional multivariate mixture prior for β takes the form of equation (13), where Γ = [γ_1, γ_2, ..., γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L1 penalty are conditional on Σ through f_j(Γ). We need to find an f_j(Γ) that yields tractable posteriors. The last term in equation (13) takes the form of a multivariate Normal distribution, Γ ∼ N(0, M). For ease of exposition, we first write out the N²k × N²k covariance matrix M. We then construct independent variables τ_j for j = 1, 2, ..., N²k using standard textbook techniques (e.g., Anderson, 2003; Muirhead, 1982).
The joint density of the τ_j follows accordingly. Letting f_j(Γ) = 2η_j², the scale mixture prior follows in (21). Equation (21) shows that the conditional prior for β_j is N(0, 1/(2η_j²)), and the conditional prior for β is the corresponding multivariate Normal. Priors for Σ and the λ²_{1,j} can be elicited following standard practice in the VAR and Lasso literature; in this paper, we set a Wishart prior for Σ⁻¹ and Gamma priors for the λ²_{1,j}. Γ cannot be drawn directly from the posteriors, but it can be recovered in each Gibbs iteration using the draws of 1/(2η_j²) and Σ.
Conditional on arbitrary starting values, the Gibbs sampler contains six steps; in step 5, Γ is calculated based on the draws of Σ and 1/(2η_j²) in the current iteration.
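Relative to the plain Lasso sampler, the adaptive variant changes only the mixing and penalty draws, which become element-wise with one λ²_{1,j} per coefficient. A minimal sketch of those two steps, with assumed Gamma(a, b) hyperpriors and placeholder values for the other blocks:

```python
import numpy as np

rng = np.random.default_rng(2)

p = 8
beta = rng.standard_normal(p)   # current beta draw (placeholder values)
sigma2 = 1.0                    # current sigma^2 draw (placeholder)
a, b = 0.1, 0.1                 # assumed Gamma hyperparameters
lam2 = np.ones(p)               # coefficient-specific penalties lambda_{1,j}^2

# Mixing draws: inverse-Gaussian, now with each coefficient's own penalty.
inv_tau2 = rng.wald(np.sqrt(lam2 * sigma2 / beta**2), lam2)

# Penalty updates: each lambda_{1,j}^2 has its own Gamma full conditional,
# depending only on its own tau_j^2 = 1 / inv_tau2[j].
lam2 = rng.gamma(1 + a, 1.0 / (b + 0.5 / inv_tau2))
print(lam2.shape)
```

Because each λ²_{1,j} is updated from its own τ_j² alone, heavily penalized and lightly penalized coefficients coexist, which is what delivers the "different degrees of shrinkage" described in the abstract.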

E-net Lasso VAR Shrinkage
We define the e-net Lasso estimator for a VAR analogously. Correspondingly, the conditional multivariate mixture prior for β takes the form of equation (24), where Γ = [γ_1, γ_2, ..., γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L1 penalty are conditional on Σ through f_j(Γ). We need to find an f_j(Γ) that yields tractable posteriors. The last term in equation (24) takes the form of a multivariate Normal distribution, Γ ∼ N(0, M). For ease of exposition, we first write out the N²k × N²k covariance matrix M. We then construct independent variables τ_j for j = 1, 2, ..., N²k using standard textbook techniques (e.g., Anderson, 2003; Muirhead, 1982).
The joint density of τ_1, τ_2, ..., τ_{N²k} follows accordingly; note that it is computationally feasible to derive σ²_{γ_j} when M is sparse.
The last two terms in (32) constitute a scale mixture of Normals (with an exponential mixing density), which can be expressed via the univariate inverse-Gaussian density. Equation (32) shows that the conditional prior for β_j is N(0, 2η_j²/(2λ_2η_j² + 1)), and the conditional prior for β is the corresponding multivariate Normal. Priors for Σ, λ₁², and λ_2 can be elicited following standard practice in the VAR and Lasso literature; in this paper, we set a Wishart prior for Σ⁻¹ and Gamma priors for λ₁² and λ_2. Conditional on arbitrary starting values, the Gibbs sampler contains six steps, with the element-wise draws running over j = 1, 2, ..., N²k.
In step 6, Γ is calculated based on the draws of Σ and 1/(2η_j²) in the current iteration.
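With the additional L2 penalty, the β full conditional gains a ridge contribution. The sketch below follows the generic Bayesian elastic-net form, in which the posterior precision is X'X + diag(1/τ_j²) + λ_2·I, for a single-equation analogue; the parameterization may differ in detail from the paper's equations (24)-(32), and the fixed values for the mixing parameters and λ_2 stand in for draws made elsewhere in the sampler.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 200, 8
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]) \
    + rng.standard_normal(n)

inv_tau2 = np.ones(p)   # L1 mixing parameters (drawn elsewhere in the sampler)
lam2_ridge = 0.5        # L2 penalty lambda_2 (drawn elsewhere, Gamma posterior)
sigma2 = 1.0

# Elastic net: the beta full conditional combines the lasso mixing
# precisions with an extra ridge term lam2_ridge * I.
A = X.T @ X + np.diag(inv_tau2) + lam2_ridge * np.eye(p)
A_inv = np.linalg.inv(A)
beta = rng.multivariate_normal(A_inv @ X.T @ y, sigma2 * A_inv)
print(beta.shape)
```

The ridge term is what produces the grouping effect for collinear regressors: it keeps the posterior precision well conditioned even when columns of X are nearly parallel.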

Adaptive E-net Lasso VAR Shrinkage
In line with Zou and Zhang (2009), we define the adaptive e-net Lasso estimator for a VAR. Correspondingly, the conditional multivariate mixture prior for β takes the form of equation (35), where Γ = [γ_1, γ_2, ..., γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L1 penalty are conditional on Σ through f_j(Γ).
We need to find an f_j(Γ) that yields tractable posteriors. The last term in equation (35) takes the form of a multivariate Normal distribution, Γ ∼ N(0, M). For ease of exposition, we first write out the N²k × N²k covariance matrix M. We then construct independent variables τ_j for j = 1, 2, ..., N²k using standard textbook techniques (e.g., Anderson, 2003; Muirhead, 1982).
The joint density of τ_1, τ_2, ..., τ_{N²k} follows accordingly; note that it is computationally feasible to derive σ²_{γ_j} when M is sparse.
In step 6, Γ is calculated based on the draws of Σ and 1/(2η_j²) in the current iteration.
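The adaptive e-net sampler combines the two earlier modifications: element-wise λ²_{1,j} mixing draws and the ridge term in the β full conditional. A compact single-equation sketch, again with illustrative placeholder values for the blocks drawn elsewhere:

```python
import numpy as np

rng = np.random.default_rng(4)

n, p = 100, 8
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)          # current beta draw (placeholder)
y = X @ beta + rng.standard_normal(n)
sigma2, lam2_ridge = 1.0, 0.5
lam2 = np.ones(p)                      # coefficient-specific L1 penalties

# Adaptive part: element-wise inverse-Gaussian mixing draws.
inv_tau2 = rng.wald(np.sqrt(lam2 * sigma2 / beta**2), lam2)

# E-net part: the beta full conditional's precision adds lam2_ridge * I
# on top of the coefficient-specific mixing precisions.
A = X.T @ X + np.diag(inv_tau2) + lam2_ridge * np.eye(p)
beta_draw = rng.multivariate_normal(np.linalg.solve(A, X.T @ y),
                                    sigma2 * np.linalg.inv(A))
print(beta_draw.shape)
```

This is the "doubly adaptive" structure of DAELasso in miniature: per-coefficient L1 shrinkage plus a shared L2 grouping term.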

Detailed Forecast Evaluation Results
Tables 1-4 report the DAELasso forecast results along with those of the Lasso, adaptive Lasso, e-net Lasso, and adaptive e-net Lasso, as well as those of the factor models and the seven popular Bayesian shrinkage priors in Koop (2011). The results for factor-augmented VAR models with one and four lagged factors are labelled 'Factor model p=1' and 'Factor model p=4', respectively. We refer to Koop (2011) for a lucid description of these priors. Sums of log predictive likelihoods are reported in parentheses.