Stock return predictability: A factor-augmented predictive regression system with a shrinkage method

ABSTRACT To predict stock market behaviors, we use a factor-augmented predictive regression with shrinkage to incorporate the information available across thousands of financial and economic variables. The system is constructed in terms of both expected returns and the tails of the return distribution. We establish the variable selection consistency and asymptotic normality of the estimator. To select the regularization parameter, we employ the prediction error, with the aim of predicting the behavior of the stock market. Through analysis of the Tokyo Stock Exchange, we find that a large number of variables provide useful information for predicting stock market behaviors.


Introduction
The finance literature has long identified sources of comovement in stock returns over time, and the correct model specification for the prediction of asset returns is of central importance to investment practitioners. Some common sources of stock return comovements include the book-to-market ratio (Kothari and Shanken, 1997), the dividend payout ratio (Lamont, 1998), the dividend-price ratio (Fama and French, 1988), expected business conditions (Campbell and Diebold, 2009), interest rate spreads (Campbell, 1987), labor income (Santos and Veronesi, 2006), lagged industry portfolio returns (Hong et al., 2007), nominal interest rates (Ang and Bekaert, 2007), and oil prices (Driesprong et al., 2008), to name but a few. There is also some evidence that we can use a variety of economic variables to predict stock returns. However, evidence of stock return predictability remains controversial in the literature. Moreover, even if a particular asset-pricing model performs well with in-sample forecasting, it may not necessarily have the same accuracy for out-of-sample forecasts. Indeed, Goyal and Welch (2008) concluded that the predictive ability of a variety of popular economic variables does not hold up well in out-of-sample forecasting.
Since the recent subprime mortgage crisis, followed by the collapse of Lehman Brothers and the European sovereign debt crisis, investment practitioners have come to realize that stock prices are highly variable during macroeconomic shocks. As a result, investors now consider macroeconomic information to be an important factor in their decision making. Furthermore, the idea that quantitative models should reflect macroeconomic information in their construction for predicting asset returns has become one of the standards in asset management. Because the stock market reflects the economic outlook, it may be possible to improve the power of a predictive system by incorporating macroeconomic variables (Brown et al., 2009). Chen et al. (1986), for example, argued that innovations in macroeconomic fundamentals could explain expected stock returns, given that economic fundamentals reflect current and future consumption at various horizons. As one development, subsequent studies have confirmed that the book-to-market value and size factors are associated with economic fundamentals (Hahn and Lee, 2006; Petkova, 2006; Vassalou and Xing, 2004).
In practice, the channel through which macroeconomic conditions affect stock returns is expressed by the fundamental equation

P_t = E_t[ Σ_{k=1}^{∞} d_{t+k} C_{t+k} ],

where P_t is a security price at time t, C_{t+k} is the future cash flow paid at time t + k, and d_{t+k} is the discount factor, consisting of the risk-free interest rate and the risk premium. The expectation is based on investor expectations about future economic activities. According to this equation, macroeconomic conditions can affect stock prices through cash flows and the discount factor. If the economic outlook is indeed reflected in the stock market, market information, including turnover, liquidity, and the implied cost of equity capital, would then be helpful in stock return predictions. Therefore, awareness of these issues in practice is valuable for prediction of the future behavior of stock indices or prices, including expected returns, early confirmation of market turning points, and changes in market trends.
Obviously, we can also employ such information as trading signals for asset allocations. As this viewpoint becomes more widely held, and effective investment styles change on a market-by-market basis, it is now becoming common for investors to consider many types of broader environmental information in their decision making. Motivated by this practical need, we address the question of which types of information concerning the dynamic behavior of the stock market and the real economy are important for capturing the future characteristics of the stock market. In the analysis, we use a factor-augmented predictive regression system (see Tsay, 2011, 2014; Bai, 2006; Bernanke et al., 2005; Ludvigson and Ng, 2007; Stock and Watson, 2002a,b) to evaluate the usefulness of the dynamic behavior of the stock market and the real economy in predicting the behavior of the Japanese stock market. Investigation of the Japanese market in particular can assist in assessing the impact of such dynamic behavior on other stock markets, given that Japan is the world's third-largest economy. Furthermore, the Japanese stock market is a rare example among developed economies of a market where the momentum strategy does not work well, whereas the value strategy works very well (Asness, 2011; Fama and French, 2012).
We construct a predictive system for predicting the conditional mean and the 5% and 95% quantile points. This is in stark contrast with previous studies, which have addressed only the prediction of expected stock returns. Of course, while the conditional mean is the central parameter in most investor decision-making processes, including asset allocation, portfolio optimization, market timing, and so on, the conditional 5% and 95% quantile points also represent important information, especially for those wishing to control tail or downside risks. It is also easy to understand that the 5% quantile points link directly to the concept of "value at risk." Alternatively, the conditional 95% quantile points provide information on the upside potential, and on value at risk involving shorting strategies. This is because if the prediction objective is an index for which derivatives are available, many practitioners use these derivatives to control for the exposure of their position to the market, mainly by taking a short position. For this purpose, the 95% quantile becomes "the value at risk of 5%" in addition to the 5% quantile. Finally, movements or significant shifts in the 5% and 95% quantile points are expected to reflect information on the distance to turning points in the market trend. This information helps practitioners to determine their investment stance, choosing between being, say, a trend follower or a contrarian.
A crucial issue in factor-augmented predictive regression modeling is the selection of the combination of predictors from among the many candidate or potential predictors. We note here that some traditional model selection criteria, such as the Akaike and Bayesian information criteria, make some distributional assumptions for stock returns, even though the true distribution is unknown. In addition, other problems with model selection criteria arise when there is a large set of predictors with no natural ordering, such that the enumeration of a very large number of predictive regressions is necessary. For instance, even with just 20 potential predictor variables, the number of candidate models exceeds a million. Although there has been rapid progress in information technology recently, evaluating such a large number of models is impractical.
We deal with these issues using the regularization method. This approach allows us to select a handful of predictors to be included in the predictive regression system, while the remaining variables are discarded. The idea of using the regularization method has become increasingly popular, with recent work in this area by Caner (2009), Fan and Li (2001), Fan and Lv (2008), Fan and Peng (2004), Tibshirani (1996), Zhang (2010), Zou (2006), Zou and Hastie (2005), and Zou et al. (2007), among many others. Together with the use of the regularization method, our study focuses on the question of what type of information is important for prediction. As discussed later, we select the regularization parameter using the prediction error.
Our study makes the following contributions. First, we construct a predictive system for predicting the 5% and 95% quantile points, as well as the conditional mean, whereas previous studies focused only on the prediction of expected stock returns. Second, we develop the variable selection consistency and asymptotic normality of our estimator. Third, we introduce a number of variables that represent the dynamic behavior of the stock market and the real economy in predicting the characteristics of the stock market. We also construct a set of new factors based on liquidity and the implied cost of equity capital. Fourth, we find that a wide variety of types of information concerning the dynamic behavior of the stock market and the real economy contribute to out-of-sample stock return forecasting. However, most of these contribute little to the prediction of the 5% and 95% quantile points. Finally, we obtain more solid out-of-sample evidence for 12-months-ahead forecasting than for one-month-ahead forecasting. This implies that there is a long-term relationship between a number of the variables representing the dynamic behavior of the stock market and the real economy as well as the future stock market, and that the short-term relationships are much weaker.
The remainder of the article is organized as follows. Sections 2 and 3 review the method of factor-augmented predictive regression with shrinkage. Section 4 describes the data and specifies the model structure. Section 5 presents and analyzes the empirical results. We evaluate the approach using the mean squared forecasting errors (MSFE), mean absolute forecasting errors (MAFE), and correct sign prediction (SIGN). Section 6 concludes the article.

Predictive regression system
We focus on stock return predictability, and for this purpose, we review several key methods. Suppose that our aim is to predict the future rate of returns for equity portfolios y_{t+h} using some predictive regression system, where h denotes the forecasting horizon. Our predictive system has a factor-augmented predictive regression structure as follows:

y_{t+h} = α′f_t + β′w_t + e_{t+h} = θ′z_t + e_{t+h}, (1)

where h is the lead time between information availability and the dependent variable, θ = (α′, β′)′ and z_t = (f′_t, w′_t)′ are p-dimensional vectors, and α and β are r- and (p − r)-dimensional coefficient parameters, respectively. There are two types of predictor: r-dimensional unobservable factors f_t = (f_{1t}, . . . , f_{rt})′, which represent the market dynamics, and w_t, which is a set of prepared predictors. In total, there is a set of p predictors {f_t, w_t}.
We assume that the N-dimensional stock returns x_t = (x_{1t}, . . . , x_{Nt})′ are derived from the set of r factors f_t:

x_t = Λf_t + ε_t, (2)

where ε_t = (ε_{1t}, . . . , ε_{Nt})′ is an N-dimensional random noise vector with mean 0 and a variance satisfying a condition described later. The N × r matrix Λ = (λ_1, . . . , λ_N)′ contains the factor loadings. Instead of f_t, we observe a panel of data X = (x_1, . . . , x_T)′ that contains information about f_t. Then, (1) and (2) constitute the diffusion index forecasting model in Stock and Watson (2002a).
In this article, we also consider a quantile regression model with factor-augmented predictors. Quantile regression (Koenker and Bassett, 1978), a comprehensive methodology for estimating models of conditional quantile functions, is widely used in the economics literature. By complementing the focus of classical linear regression on the conditional mean, quantile regression allows us to estimate the effects of covariates, not only on the center of a distribution but also on the upper and lower tails. In contrast to the factor-augmented regression model in (1), in which a conditional expectation of the response variable is the focus, quantile regression attempts to estimate the τth conditional quantile of y_{t+h} given f_t and w_t:

Q_{y_{t+h}}(τ | f_t, w_t) = α(τ)′f_t + β(τ)′w_t, (3)

where α(τ) and β(τ) are the vectors of coefficients that depend on the quantile τ. Section 3.2 in this article provides details of the assumptions imposed on the factor-augmented regression models in (1) and (3). Explanations of the factor method are available in the literature (Bai, 2003; Bai and Ng, 2002, 2008; Bernanke and Boivin, 2003; Connor and Korajczyk, 1986; Forni and Reichlin, 1998; Forni et al., 2000, 2004; Koop and Potter, 2004; Stock and Watson, 2002a,b). In the following section, we provide the inference procedure.

Inference with shrinkage methods
The inference procedure consists of two stages. In the first stage, we estimate common factors from the panel of data using the principal components method. In the second stage, we apply the estimated factors, as described below.
The full set of observations X can be expressed in matrix form as X = FΛ′ + ε, where F = (f_1, . . . , f_T)′ and ε = (ε_1, . . . , ε_T)′. To estimate the factor model in (2), we use the method of asymptotic principal components (Connor and Korajczyk, 1986). Given a value of the dimension r, the estimates F̂ and Λ̂ of F and Λ can be obtained by minimizing tr[(X − FΛ′)′(X − FΛ′)]/(NT), where the T × r matrix F and the N × r matrix Λ are subject to the normalization condition F′F/T = I_r, with I_r the r × r identity matrix. Although the number of factors is unknown, it can be determined by some model evaluation criteria, including those explained in Bai and Ng (2002).
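As an illustration of this estimation step, the following sketch computes the asymptotic principal components from a panel of returns. The function name, array shapes, and the loadings formula are our own illustrative choices, not the article's code:

```python
import numpy as np

def estimate_factors(X, r):
    """Asymptotic principal components (Connor and Korajczyk, 1986).

    X : (T, N) panel of stock returns; r : assumed number of factors.
    Returns F_hat (T, r), normalized so that F_hat'F_hat/T = I_r,
    and the least-squares loadings L_hat (N, r).
    """
    T, N = X.shape
    # Eigenvectors of XX'/(NT) for the r largest eigenvalues
    eigval, eigvec = np.linalg.eigh(X @ X.T / (N * T))
    idx = np.argsort(eigval)[::-1][:r]
    F_hat = np.sqrt(T) * eigvec[:, idx]   # sqrt(T) times top eigenvectors
    L_hat = X.T @ F_hat / T               # loadings given the normalization
    return F_hat, L_hat
```

Given the normalization F′F/T = I_r, the loadings follow by least squares as Λ̂ = X′F̂/T.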
The asymptotic principal component estimate of F is √T times the eigenvectors corresponding to the r largest eigenvalues of XX′/(NT). Substituting the estimated factors into the factor-augmented regression models in (1) and (3) yields

y_{t+h} = α′f̂_t + β′w_t + e_{t+h} (4)

and

Q_{y_{t+h}}(τ | f̂_t, w_t) = α(τ)′f̂_t + β(τ)′w_t. (5)

The unknown parameters θ = (α′, β′)′ in the factor-augmented regression model (4) are estimated by a shrinkage method, obtained by minimizing

(1/T) Σ_t (y_{t+h} − α′f̂_t − β′w_t)² + p(θ), (6)

where p(θ) is a function of the coefficients indexed by the regularization parameter κ that controls the trade-off between the loss function and the penalty. In a similar manner, the unknown parameters θ(τ) = (α(τ)′, β(τ)′)′ in (5) are estimated by minimizing the objective function

(1/T) Σ_t ρ_τ(y_{t+h} − α(τ)′f̂_t − β(τ)′w_t) + p(θ(τ)), (7)

with ρ_τ(u) = u(τ − I(u < 0)).
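The check function ρ_τ(u) = u(τ − I(u < 0)) is what makes the quantile objective work: minimizing the average check loss over a constant recovers (approximately) the sample τ-quantile. A minimal numerical illustration, our own rather than the article's:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

# Minimizing the average check loss over a constant q recovers the
# sample tau-quantile of y (up to the grid resolution used here).
rng = np.random.default_rng(1)
y = rng.standard_normal(1000)
tau = 0.05
grid = np.linspace(y.min(), y.max(), 2001)
avg_loss = [check_loss(y - q, tau).mean() for q in grid]
q_hat = grid[int(np.argmin(avg_loss))]
```

The same loss, applied to regression residuals as in (7), estimates the conditional τ-quantile rather than the conditional mean.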
We use the shrinkage method to estimate the coefficients of irrelevant predictors to be exactly equal to zero. This operation is equivalent to selection of the relevant predictors. Fan and Li (2001) have shown that the Lasso penalty produces biases because of its linear increase of the penalty on regression coefficients. To remedy this bias issue, several procedures have been proposed, including the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the adaptive Lasso penalty (Zou, 2006). The adaptive Lasso requires data-adaptive weights for the penalty term. To avoid this difficulty, we use the SCAD penalty for p(·), which is formally given as p(θ) = Σ_{k=1}^{p} p_{κ,γ}(θ_k) with

p_{κ,γ}(θ) = κ|θ| if |θ| ≤ κ,
p_{κ,γ}(θ) = (2γκ|θ| − θ² − κ²)/(2(γ − 1)) if κ < |θ| ≤ γκ,
p_{κ,γ}(θ) = κ²(γ + 1)/2 if |θ| > γκ,

for κ > 0 and γ > 2. This penalty first applies the same rate of penalization as the Lasso and then reduces the rate to zero as the coefficient moves further away from zero. Therefore, this penalty can avoid the bias issue of the Lasso. For quantile regression with shrinkage methods, see, for example, Koenker and Xiao (2006), Wu and Liu (2009), and Belloni and Chernozhukov (2011). An estimation algorithm for quantile regression with the SCAD penalty is described in Wu and Liu (2009). If we take an extremely large value of the regularization parameter κ, almost all elements of θ will be estimated as zero. In such a case, we could exclude an important predictor. Conversely, too small a regularization parameter could include a number of unrelated predictors because almost all elements of θ will not disappear at zero. Therefore, we need to balance these options and determine a proper size for the regularization parameter κ. To select a penalty size, we use the model selection procedure described in Section 3.3.
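The piecewise SCAD penalty can be evaluated directly. The sketch below is our own helper, using the conventional default γ = 3.7 suggested by Fan and Li (2001); note how the penalty is Lasso-like near zero and flattens to a constant for large coefficients:

```python
import numpy as np

def scad_penalty(theta, kappa, gamma=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise.

    Linear (Lasso-like) for |theta| <= kappa, quadratic taper on
    (kappa, gamma*kappa], and constant kappa^2*(gamma+1)/2 beyond.
    """
    a = np.abs(np.asarray(theta, dtype=float))
    small = a <= kappa
    mid = (a > kappa) & (a <= gamma * kappa)
    return np.where(
        small,
        kappa * a,
        np.where(
            mid,
            (2 * gamma * kappa * a - a**2 - kappa**2) / (2 * (gamma - 1)),
            kappa**2 * (gamma + 1) / 2,
        ),
    )
```

Because the penalty is constant beyond γκ, large coefficients are not shrunk at all, which is the source of the oracle property discussed below.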

Some asymptotic results
In this section, we investigate some asymptotic properties of the parameter estimates. We first clarify the assumptions imposed on the factor-augmented regression model in (1) for predicting the conditional mean. The following assumptions are adopted.
Assumption A concerns the factor model and is identical to the assumptions made in Bai and Ng (2008). It assumes the existence of r common factors and allows for heteroscedasticity and weak time-series and cross-section dependence in the idiosyncratic component. See also Assumption C in Bai and Ng (2006). Assumption B implies that e_{t+h} is conditionally unpredictable. Assumption C assumes that the data are strictly stationary and ergodic, and is identical to R(ii) of Cheng and Hansen (2013). This assumption simplifies the asymptotic theory. Assumptions D and D′ are similar to M1(iii) in Bai and Ng (2008) and provide the central limit theorem. Assumption E is needed for the variable selection consistency, and the use of the SCAD penalty implicitly imposes this assumption on the penalty term p(·). See both Theorem 1 and Lemma 1 in .
We establish the asymptotic normality of the estimated parameter for the model (1). In the next theorem, it is shown that the method described in Section 3.1 can identify the set of true explanatory variables. Recall that θ_0 = (θ′_{10}, θ′_{20})′ is the true parameter value and θ̂ = (θ̂′_1, θ̂′_2)′ is the corresponding parameter estimate. In addition to the asymptotic normality of θ̂_1, we show that the estimator must possess the sparsity property, θ̂_2 = 0. All proofs are provided in the Appendix.
Theorem 1. Suppose that Assumptions A-D hold and that the regularization parameter satisfies κ → 0 and √T κ → ∞ as T → ∞. Furthermore, if √T/N → 0, then the following variable selection consistency holds:

P(θ̂_2 = 0) → 1.

Moreover,
√T(θ̂_1 − θ_{10}) is asymptotically normal with mean 0 and a block diagonal asymptotic variance-covariance matrix. Here, V_0 is the probability limit of V, with V being the r_0 × r_0 diagonal matrix consisting of the r_0 eigenvalues of XX′/(TN) such that the corresponding r_0 factors are true predictors in the predictive regression system. Based on this set of r_0 factors, Q is defined as the probability limit of F′F/T, Σ_Λ is the probability limit of N^{−1}Λ′Λ, and Σ_{zz} is the probability limit of T^{−1} Σ_t z_t z′_t. Similar results are also obtained for the estimated quantile forecasting model. The result is summarized in the following theorem.
Theorem 2. Suppose that Assumptions A-C and D′ hold and that the regularization parameter satisfies κ → 0 and √T κ → ∞ as T → ∞. Furthermore, if T^{5/8}/N → 0, then the following variable selection consistency holds:

P(θ̂_2(τ) = 0) → 1.
Remark 1. In this article, the SCAD penalty is used for the shrinkage. We point out that the oracle property in Theorems 1 and 2 can be obtained even if we use the adaptive Lasso for the penalty. However, in this case, we would require adaptive weights that satisfy the regularity conditions.

Remark 2. We extracted the factor structure through one panel X. There may be situations where there are several groups of variables; for example, a panel of exchange rates, a panel of leading economic indicators, and so on. We can extract their factor structures by analyzing each of the panels separately. In such a case, we impose Assumption A on each of the panels and still obtain the same oracle property as in Theorems 1 and 2.

Remark 3. In Step 1, the factors are estimated from the full panel X, and the factors are expressed as linear combinations of the variables in the original panel. As pointed out by a referee, if all variables are included as potential predictors, then there is no further gain from adding factors estimated from the same predictors (see also De Mol et al., 2008). However, we point out that the total number of predictors z_t (compared with the number of observations T) should not be large when using the shrinkage procedure. Fan and Lv (2008) reported that, in general, it is challenging to estimate the sparse parameter vector θ accurately when the number of potential predictors z_t is much larger than T. Owing to the dimensionality, the maximum spurious correlation between an unrelated predictor and the response can be large (see Fan and Lv, 2008). In our application, the panel X consists of individual firms' stock return series, noting that we analyze roughly 1,700 firms in total. Thus, the direct use of all variables in the panel might cause the problem of dimensionality. Our empirical result in Section 5.4 also shows the usefulness of the factor-augmented predictive regression system in terms of prediction accuracy.
Remark 4. Our theorems take the factor estimation error into account, and hence the development of these results is not straightforward. With observed factors, Theorem 1 reduces to the result of Fan and Li (2001), and Theorem 2 reduces to the result of Wu and Liu (2009). We point out that the number of potential predictors p should be fixed. If it diverges, we need stricter assumptions on the model; in particular, the growth of the number of potential predictors p should be strictly restricted (see, for example, Fan and Peng, 2004). Several existing papers have developed theories for shrinkage estimation using estimated factors as regressors, including Lu and Su (2013) and . Ando and Bai (2013a,b) developed the variable selection consistency and the asymptotic normality of panel data models with unobserved factor errors. Caner and Han (2014) proposed a group bridge estimator to select the correct number of factors in approximate factor models.

Regularization parameter selection
A natural performance measure for fitted factor-augmented regression models such as (4) is the predictive mean squared error (PMSE). Let ŷ_{t+h} = α̂′f̂_t + β̂′w_t be the predicted value under the given value of κ, where the parameter estimates α̂ and β̂ are obtained by using the past T observations {y_{t+h−1}, y_{t+h−2}, . . . , y_{t+h−T}}. In other words, ŷ_{t+h} is constructed based on the historical information, up to time t − 1. The PMSE is then

PMSE(κ) = (1/n) Σ (y_{t+h} − ŷ_{t+h})²,

where n is the number of prediction points and the sum runs over those points. We can select the value of the regularization parameter in (6) by minimizing the PMSE. This PMSE is a measure investigated by Inoue and Kilian (2006). If the dataset involves cross-sectional data, then the PMSE reduces to the well-known cross-validation score that evaluates regression models. The following theorem justifies the use of the PMSE.
Therefore, we can expect that the proposed approach asymptotically minimizes the expected squared error. Once we select the value of the regularization parameter κ, we construct the predicted mean at time t + h + n + 1 as ŷ_{t+h+n+1} = α̂′f̂_{t+n+1} + β̂′w_{t+n+1}. Cheng and Hansen (2013) proved similar results for the averaging estimator under a weaker assumption, in the sense that the error term e_{t+h} (h > 1) may be serially correlated. In their Theorem 2, it is shown that the cross-validated error is an asymptotically unbiased estimate of the in-sample squared error from the leave-h-out estimator, plus an irrelevant term. See also Theorem 2 of Hansen (2010), where cross-validation for multistep forecasting is investigated.
Remark 5. This out-of-sample procedure is a natural predictive measure and is the most common in practice (Inoue and Kilian, 2001). In the context of standard regression, the abovementioned criterion corresponds to cross-validation (see . However, Theorem 3 does not establish that the selected regularization parameter is asymptotically efficient. Although a rigorous development of the asymptotic optimality is interesting, it is beyond the scope of this article. Nevertheless, the asymptotic unbiasedness of the PMSE is shown in Theorem 3.
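The regularization-parameter search of Section 3.3 amounts to a rolling out-of-sample loop: for each candidate κ, refit on data up to each prediction point and accumulate squared forecast errors. The sketch below is ours; it uses a closed-form ridge fit purely as a short stand-in for the SCAD estimator, and the function names and grid are illustrative assumptions:

```python
import numpy as np

def pmse_select(Z, y, kappas, n_pred, fit):
    """Select the regularization parameter by predictive MSE.

    Z : (T, p) predictors (estimated factors stacked with w_t),
    y : (T,) responses aligned with the rows of Z,
    n_pred : number of hold-out prediction points at the sample end,
    fit : fit(Z_train, y_train, kappa) -> coefficient vector.
    Returns the kappa with the smallest PMSE over the hold-out points.
    """
    T = len(y)
    scores = []
    for kappa in kappas:
        errs = []
        for t in range(T - n_pred, T):
            theta = fit(Z[:t], y[:t], kappa)  # estimate on past data only
            errs.append((y[t] - Z[t] @ theta) ** 2)
        scores.append(np.mean(errs))
    return kappas[int(np.argmin(scores))]

def ridge_fit(Z, y, kappa):
    """Stand-in penalized estimator (the article uses SCAD instead)."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + kappa * len(y) * np.eye(p), Z.T @ y)
```

Swapping `ridge_fit` for a SCAD solver reproduces the article's procedure; the selection loop itself is unchanged.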

Data and empirical model speci cations
We obtain Tokyo Price Index (TOPIX) returns for different horizons. These serve as the dependent variable, y_{t+h}, in the factor-augmented regression models in (1) and (3). To capture the dynamics of the stock market, we use returns for individual stocks listed on the Tokyo Stock Exchange (TSE) for x_t in (2). In addition, we collect w_t candidates for potential predictors that might affect the drift term of TOPIX returns, or that bring in new information from outside the Japanese stock market. Thus, some indexes are selected that indicate current market conditions, changes in the oil price, the foreign exchange rate, and the stock indexes of foreign countries. More details of the model description are as follows.

Daily models
The first model predicts the conditional mean and the 5% and 95% quantiles for short-term TOPIX returns, including daily returns. The lead times between information availability and the dependent variable, h, are set to one, two, three, five, 10, and 20 days. We construct the factor-augmented regression models (1) and (3) and develop one-day-ahead predictive regression systems. We use daily returns for individual stocks listed on the TSE for x_t in (2). For the w_t candidates, we specify the variables in Table 1.
It is well known that daily Japanese stock market returns and those for the U.S. stock market on the previous day often move in the same direction. Indeed, many Japanese market participants check the news about the U.S. stock market's movement in newspapers and morning news shows on television. Therefore, it would be useful to include the S&P 500 index (W6). In the U.S. stock market, the Chicago Board Options Exchange (CBOE) volatility index is a key measure of market expectations of near-term volatility conveyed by S&P 500 stock index option prices. Thus, we can consider it to be the premier global barometer of investor sentiment and market volatility: it is also known as the "fear index" among practitioners, who consider that it provides information about the degree of possibility of a market crash. In Japan, the Nikkei volatility index (W7-W10) is a proxy variable for investor sentiment. The interest rate spread (W2) is one possible proxy variable for the economic outlook. Crude oil prices (W13-W16) act as a proxy variable for commodity prices and are a major cost factor for various economic activities in Japan. Currency movements directly affect the earnings of Japanese companies, so we also include the Japanese yen/U.S. dollar exchange rate (W17-W20). In addition, this exchange rate may cause changes in monetary policy that give rise to changes in consumer behavior and capital flows to the stock market. To consider trade with Europe, we include the Japanese yen/euro exchange rate (W21-W24). The publication cycle for typical macroeconomic variables such as gross domestic product (GDP) growth is monthly or quarterly, and thus these variables become nearly constant in the estimation period. Therefore, the Daily Models include only macroeconomic variables that are updated daily, including the yield spread, the exchange rate, and crude oil price changes.
The three-factor model of Fama and French (1992, 1993), one of the best-known asset-pricing models, argues that the β, book-to-market, and size factors play an important role in explaining the cross section of expected stock returns. We employed the original types of Fama and French's three factors by varying the term of the return calculation (W25-W33).
In this analysis, we introduce a new set of investment-style-related factors. The first three factors relate to liquidity and are similar to the illiquidity (ILLIQ) measure in Amihud (2002). To calculate the daily variation in stock prices, we adopt the σ_6 measure of Garman and Klass (1980), which uses open, high, and low price information rather than just the absolute value of daily returns. As a measure of liquidity, we specify the average value for the past 25 business days, which we hereafter refer to as the modified ILLIQ (MILLIQ).
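For concreteness, a Garman-Klass daily variance estimate can be computed from open/high/low/close prices as below. This is the widely used practical Garman-Klass estimator; the article's σ_6 variant may differ in detail, so treat this as an illustrative stand-in:

```python
import numpy as np

def garman_klass_var(o, h, l, c):
    """Garman-Klass (1980) style daily variance estimate from OHLC.

    Uses the common practical form
      0.5 * ln(H/L)^2 - (2*ln(2) - 1) * ln(C/O)^2,
    which exploits intraday range information rather than only the
    absolute close-to-close return.
    """
    o, h, l, c = (np.asarray(v, dtype=float) for v in (o, h, l, c))
    return 0.5 * np.log(h / l) ** 2 - (2 * np.log(2) - 1) * np.log(c / o) ** 2
```

Averaging such daily estimates over the past 25 business days, as the text describes, then yields the MILLIQ-type liquidity measure.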
The set of variables W34-W39 is calculated in much the same way as the three well-known Fama and French (1993) factors. At the end of August each year, we sort all firms listed on the First Section of the TSE by market capitalization and include them in two portfolios, small (S) and big (B). Next, we independently sort the same set of stocks into three portfolios according to their book-to-market value. These portfolios are denoted H1 (highest), H2 (medium), and H3 (lowest). The cutoff points are the 30% and 70% quantiles. We then form six portfolios (S/H1, S/H2, S/H3, B/H1, B/H2, and B/H3), comprising combinations of the size and book-to-market measures. Then, we calculate the market capitalization-weighted average of the liquidity measure MILLIQ for the six portfolios for each day over the year following the formation of the portfolio. We repeat this procedure each year and obtain a set of market cap-weighted average MILLIQ series over the various data periods. The W34 variable (market level of MILLIQ) is then the market cap-weighted average of the liquidity measure for all firms listed on the First Section of the TSE. The W35 variable (MILLIQ small-minus-big factor) is the simple average liquidity measure for the three small portfolios minus that for the three big portfolios:

MILLIQ small-minus-big factor = (MILLIQ_{S/H1} + MILLIQ_{S/H2} + MILLIQ_{S/H3})/3 − (MILLIQ_{B/H1} + MILLIQ_{B/H2} + MILLIQ_{B/H3})/3.

We calculate the W36 variable (MILLIQ high-minus-low factor) in the same manner:

MILLIQ high-minus-low factor = (MILLIQ_{S/H1} + MILLIQ_{B/H1})/2 − (MILLIQ_{S/H3} + MILLIQ_{B/H3})/2.

The set of variables W37-W39 is calculated in the same manner as above but using the one-day value of the liquidity measure; i.e., the value before averaging. The reasons for using these variables are as follows. First, liquidity measures, such as Amihud's ILLIQ, reflect the liquidity premium and the gap between supply and demand. Second, the Japanese stock market clearly shows a value effect.
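Given the six size/book-to-market portfolio MILLIQ series described above, the two factor spreads follow the usual Fama-French-style averaging. The sketch below is our own; the exact averaging convention for the high-minus-low factor is an assumption based on the Fama-French construction:

```python
def milliq_factors(m):
    """Small-minus-big and high-minus-low MILLIQ factor spreads.

    m : mapping from the six portfolios ('S/H1','S/H2','S/H3',
        'B/H1','B/H2','B/H3') to their cap-weighted MILLIQ values
        (scalars or aligned arrays). Averaging convention assumed
        to follow the Fama-French factor construction.
    """
    smb = (m['S/H1'] + m['S/H2'] + m['S/H3']) / 3 \
        - (m['B/H1'] + m['B/H2'] + m['B/H3']) / 3
    hml = (m['S/H1'] + m['B/H1']) / 2 - (m['S/H3'] + m['B/H3']) / 2
    return smb, hml
```

Applied day by day, this produces the W35/W36-type series (and, with unaveraged one-day MILLIQ values, the W38/W39-type series).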
We also observe a size effect, although reversal of the estimated sign for the size effect sometimes occurs (Jagannathan et al., 1998; Kubota and Takehara, 2003). Therefore, there is a possibility that the market players may differ across the six groups. Importantly, stock markets do not move as a mass: some segments move first, and other segments follow them, to eliminate any arbitrage opportunities made available. Therefore, by calculating the liquidity variable based on our groupings, we may be able to grasp the potential magnitude of any such distortion in the market.
The TSE trading turnover ratio (W40) measures trading activity, and W42, calculated as the ratio of the difference between the numbers of rising and falling stocks to the total number of listed stocks, measures the direction of the market. The ratio of the number of traded stocks to the total number of listed stocks (W43) also measures trading activity. The TSE yield (W41) is a component of the rate of return on equity investment.
In addition to these variables, we include variables W44-W46 relating to stock market states. We use the approach in Bai and Ng (2002) to identify the number of common factors in this large panel of data X. In this article, we adopted their PC_1 criterion. The identified number of factors is then used to calculate the corresponding cumulative eigenvalue divided by the total sum of eigenvalues, and the first eigenvalue divided by the total sum of eigenvalues. Kritzman et al. (2010) introduced similar variables as a measure of implied systemic risk, called the absorption ratio, which equals the fraction of the total variance of a set of asset returns explained by a fixed number of eigenvectors. They argued that this variable captures the extent to which markets are unified (or tightly coupled). Indeed, the fewer the number of effective factors, the greater is the tendency for the stock market to be more volatile.
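The absorption-ratio-type quantities behind W44-W46 can be sketched as an eigenvalue share of the return covariance matrix. The function below is our own illustration of the Kritzman et al. (2010) measure, not the article's exact construction:

```python
import numpy as np

def absorption_ratio(X, k):
    """Fraction of total return variance explained by the top k
    eigenvectors of the covariance matrix (Kritzman et al., 2010).

    X : (T, N) panel of asset returns; k : number of factors retained.
    """
    eigval = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eigval = np.sort(eigval)[::-1]          # descending eigenvalues
    return eigval[:k].sum() / eigval.sum()  # cumulative variance share
```

When one factor dominates the panel, the ratio for k = 1 approaches one, consistent with the "tightly coupled" market interpretation in the text.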
Note that Bai and Ng (2002) proved that their PC and IC criteria asymptotically select the true number of factors. Thus, the use of the PC criteria and the IC criteria is asymptotically identical for constructing W44-W46. Also, the set of common factors f̂_1, . . . , f̂_r̂ is prepared based on the asymptotic principal components, where r̂ is the number of factors selected by Bai and Ng's (2002) PC_1 criterion. However, note that the set of predictors of the factor-augmented regression models, given in (4) and (5), is selected by using PMSE(κ) and PE(κ), defined in Section 3.3.
The prediction period is from December 30, 1999 to October 31, 2012. In this interval, full-sample estimation is available for all models in this article. Although this period is shorter than that for the Monthly Models, described below, we chose this period because the Japanese yen/euro exchange rate has only been available since 1999. To check the robustness of the results that we obtained, we also separated the forecasting period into two parts: before and after September 15, 2008, the day of the "Lehman shock."

Monthly models
The second model considers the following specifications for (1) and (3). TOPIX returns are used for y_{t+h}. The lead times between information availability and the dependent variable, h, are set to one, two, three, six, and 12 months; this model therefore produces longer-horizon forecasts. To capture the dynamics of the stock market, we use monthly returns for individual stocks listed on the First Section of the TSE for x_t in (2). Table 2 provides the set of variables to be used for w_t. In addition to the new investment factors based on liquidity, we introduce three factors related to the market-implied cost of equity capital. For the Daily Models, we introduced a set of three illiquidity factors; using the market-implied cost of equity capital, we introduce W40-W54. In calculating the implied cost of equity, we use the formula in Frankel and Lee (1998), so that the value computed by the evaluation formula and the market capitalization match each other at the time of evaluation. We specify the two-period-ahead expected value according to estimates by stock market analysts, as recorded in a database provided by Toyo Keizai, where BV_t, ROE_t, and r_e denote the book value, the return on equity, and the implied cost of equity capital, respectively. The price evaluation formula shows that the implied cost of equity capital is the discount rate extracted from the market price. In other words, this is direct information concerning pricing.
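As a concrete illustration, a two-period residual income valuation in the spirit of Frankel and Lee (1998) can be written as follows; the horizon and terminal-value conventions shown here are our assumption and the paper's exact specification may differ:

```latex
P_t = BV_t + \frac{(ROE_{t+1} - r_e)\,BV_t}{1 + r_e}
           + \frac{(ROE_{t+2} - r_e)\,BV_{t+1}}{(1 + r_e)\,r_e}
```

Given analysts' forecasts of ROE_{t+1} and ROE_{t+2} and the book values, r_e is the discount rate that makes the model value P_t equal the observed market capitalization, which is how the implied cost of equity capital is extracted from the market price.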
The motivation for using variables W1-W39, W55-W77, and W105-W107 is as described for the Daily Models. For the Monthly Models, unlike the Daily Models, we can use macroeconomic variables as w_t candidates. Variables W78-W104 are "Indexes of Business Conditions," published by the Cabinet Office of the Government of Japan, and their movements. We regard these factors as business cycle variables that play an important role in forecasting stock returns. Lettau and Ludvigson (2001) argued that stock returns are forecastable by business cycle variables. As for the Daily Models, the set of common factors is prepared based on the asymptotic principal components together with Bai and Ng's (2002) PC1 criterion.
The prediction period for this model is from December 1995 to October 2012. As with the Daily Models, full-sample estimation is available for all models in this interval. Again, we separated the forecasting period into two parts: from December 1995 to August 2008, and from September 2008 to October 2012.

Forecasting results
In our modeling framework, the factor-augmented regression models (1) and (3) are determined based on the past n observations using the prediction errors (PMSE and PE in Section 3.3). To obtain the prediction errors in each time period, T past observations are used.
We set n = 60 and T = 60 in this study. Following Fan and Li's (2001) suggestion, we set γ in the SCAD penalty to γ = 3.7 to reduce the computational burden. The Daily Models start from December 28, 1999 for the full-size estimation periods. From a practical viewpoint, setting the models' estimation periods equal to 60, or to a multiple of 60, appears reasonable because, in the case of the Daily Models, 60 days is approximately equal to a quarter, the shortest interval of analysis of business activity in a fiscal year. For the Monthly Models, 60 months is equal to five years, the basic unit of time from the strategic point of view for businesses. Moreover, many practitioners doubt that the structure of stock price dynamics is stable, so they tend to set shorter estimation periods while keeping the sample size large enough. However, because there is no theoretical background supporting the optimality of setting n = 60, we also discuss the effect of using different estimation periods later.
The regularization parameter takes values of κ = {10, 1, 0.1, 0.01, 0.001, 0.0001}. Given an optimized value of κ (i.e., the optimized set of predictors w_t and f_t and the corresponding model parameters), the expected return in the next period t + h, ŷ_{t+h}, and the 5% and 95% quantiles, ŷ_{t+h}(5%) and ŷ_{t+h}(95%), are calculated as ŷ_{t+h} = α̂′f̂_t + β̂′w_t, ŷ_{t+h}(5%) = α̂(0.05)′f̂_t + β̂(0.05)′w_t, and ŷ_{t+h}(95%) = α̂(0.95)′f̂_t + β̂(0.95)′w_t. Rolling the sample forward by one period, we repeat this procedure until the last period. Thus, all empirical tests in this study are based exactly on the outputs from out-of-sample predictions.
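The rolling selection-and-forecast loop can be sketched as follows. This is only an illustration of the mechanics, not the paper's estimator: ridge regression stands in for the SCAD-penalized fit (which has no closed form), and choosing κ by held-out error within the window mimics the spirit, not the exact definition, of PMSE(κ).

```python
import numpy as np

def rolling_forecasts(Z, y, n=60, val=10,
                      kappas=(10, 1, 0.1, 0.01, 0.001, 0.0001)):
    """Illustrative rolling one-step-ahead forecast loop. Z stacks the
    candidate predictors (the w_t variables and estimated factors f_t);
    ridge regression is used as a simple convex stand-in for the
    SCAD-penalized estimator. For each window, kappa is chosen by the
    prediction error on the last `val` observations of the window."""
    p = Z.shape[1]
    preds = []
    for t in range(n, len(y)):
        fit = slice(t - n, t - val)            # estimation part of the window
        hold = slice(t - val, t)               # held-out part for selecting kappa
        best = None
        for k in kappas:
            beta = np.linalg.solve(Z[fit].T @ Z[fit] + k * np.eye(p),
                                   Z[fit].T @ y[fit])
            pe = np.mean((y[hold] - Z[hold] @ beta) ** 2)
            if best is None or pe < best[0]:
                best = (pe, beta)
        preds.append(Z[t] @ best[1])           # out-of-sample forecast for period t
    return np.array(preds)

# Synthetic example: 5 predictors, sparse true coefficients
rng = np.random.default_rng(1)
Z = rng.standard_normal((120, 5))
y = Z @ np.array([1.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(120)
preds = rolling_forecasts(Z, y)
```

Every forecast in `preds` uses only data from before its target period, which is the sense in which the empirical tests in the paper are "exactly out-of-sample."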
We computed the MSFE and the MAFE associated with our forecasts, given by MSFE = (t_max − t_min + 1)^{-1} Σ_{t=t_min}^{t_max} (y_{t+h} − ŷ_{t+h})² and MAFE = (t_max − t_min + 1)^{-1} Σ_{t=t_min}^{t_max} |y_{t+h} − ŷ_{t+h}|, where t_min and t_max are the forecast start and end points, respectively. We also computed the correct sign (SIGN) prediction measure, SIGN = (t_max − t_min + 1)^{-1} Σ_{t=t_min}^{t_max} δ(y_{t+h}, ŷ_{t+h}), where δ(y_{t+h}, ŷ_{t+h}) = 1 if the signs of y_{t+h} and ŷ_{t+h} are the same, and δ(y_{t+h}, ŷ_{t+h}) = 0 otherwise. Furthermore, we evaluated the statistical relationships between the observed values y_{t+h} = μ_{t+h} + e_{t+h} and the predicted values ŷ_{t+h}. Here, μ_{t+h} is the true conditional mean, and e_{t+h} is the idiosyncratic shock. We regressed y_{t+h} on ŷ_{t+h} using the simple linear regression y_{t+h} = α + βŷ_{t+h} + e_{t+h}. Under Assumptions A-E and √T/N → 0, ŷ_{t+h} is a consistent estimator of μ_{t+h}, and max_{t=1,...,T} ‖f̂_t − Hf_t‖ = o_p(1) (see Lemma 2 in the Appendix). Thus, in the asymptotic sense, we can regard the idiosyncratic shock e_{t+h} as independent of ŷ_{t+h}, and the estimated β is equivalent to the beta coefficient estimated from the simple linear regression as if the true factor were used in the predictive regression system. We calculated the estimated β, its t-value and p-value, and the adjusted R² score. The adjusted R² measures how well ŷ_{t+h} predicts y_{t+h}; a higher adjusted R² indicates a better forecasting result. Gathering out-of-sample evidence is important in avoiding the problem of data snooping (Lo and MacKinlay, 1990). Table 3 presents MSFE, MAFE, and SIGN values for our forecasts, as well as estimated β values and the corresponding t-values, p-values, and adjusted R². As shown in Table 3, the Daily Model (h = 1) produces a good prediction result. We also tested the hypothesis β = 1 against β ≠ 1 and found that the Daily Model with h = 1 did not reject the hypothesis β = 1 at the 5% significance level. The results of the test are not included in Table 3.
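The three accuracy measures are straightforward to compute; the sketch below is a direct transcription of the definitions given above (the function name is ours).

```python
import numpy as np

def forecast_metrics(y, y_hat):
    """MSFE, MAFE, and the correct-sign (SIGN) measures used to
    evaluate the out-of-sample forecasts: simple averages of the
    squared error, absolute error, and sign-agreement indicator
    over the forecast period."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    err = y - y_hat
    msfe = np.mean(err ** 2)                        # mean squared forecast error
    mafe = np.mean(np.abs(err))                     # mean absolute forecast error
    sign = np.mean(np.sign(y) == np.sign(y_hat))    # fraction of correctly signed forecasts
    return msfe, mafe, sign

# Example with three forecast periods (returns in decimals)
msfe, mafe, sign = forecast_metrics([0.010, -0.020, 0.030],
                                    [0.005, -0.040, -0.010])
```

SIGN is of independent interest to practitioners because a model can have a small MSFE yet still call the market direction incorrectly.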
Table 4 reports the list of the top-10 variables in the Daily Model for the conditional mean prediction (h = 1), in the following order: the number, the average number of times selected, and the longest period for which the variable was continuously selected. The extracted factors are also included among the top-10 variables. To understand the meanings of the factors, we calculated the correlation between the extracted factors and the variables in Table 1. We found that the correlation is time varying, and thus the interpretation of the extracted factors also depends on the time point.
In Table 4, we found that the main explanatory variable for this model is W6, the S&P 500 index return at time t. For the same calendar day, the closing time of the US stock market is about half a day after the closing time of the Japanese stock market, so investors cannot use this information to set their investment positions until the next market opening. The behavior of some of the other variables is comparable to this case. However, this does not indicate that the results for the Daily Model (h = 1) are invalid: although the result is difficult to apply to daily-basis trading strategies, the model still has high power to predict future prices. To consider applicability to investment strategies, we also calculated MSFE, MAFE, and so on, using returns from the opening price at time t + 1 to the closing price at time t + h as the observed values y_{t+h}, instead of the returns from the closing price at time t to the closing price at time t + h, to eliminate this half-day front-running effect. (Details are summarized in the Appendix.) Compared with Table 3, the results indicate that while the prediction power of the Daily Model (h = 1) becomes weaker, a statistical relationship still exists between the observed values y_{t+1} and the predicted values ŷ_{t+1}. Overall, almost all of the Daily Models have significant p-values, although the prediction results are not as strong as those of the Daily Model (h = 1). We also found that the adjusted R² values calculated for the period after the Lehman shock are higher than those calculated for the period before it. From this evidence, we conclude that the results we obtained are robust.
Unfortunately, the monthly return forecasting results were not as robust. First, if the prediction term h is shorter than six months, none of the Monthly Models has a significant p-value, and even if h is extended to six or 12 months, the p-values calculated for the "after Lehman" period are insignificant. Although the p-values and adjusted R² indicate good prediction results for h = 6 and h = 12 in the "before Lehman" period, the signs of β for those models are negative. Indeed, the adjusted R² for h = 12 is high enough if we use ŷ_{t+h} as a predictor, but overall, the resulting outputs from the prediction model are very poor. Table 5 shows the proportion of observed values greater than the 95% quantile point and less than the 5% quantile point. As shown in Table 5, only the results for the Daily Models with h = 1 and h = 2 are close to 5%. Tables 6 and 7 provide a summary of the selected predictors. We can see that the predictors for quantile point forecasting are weaker than those for the conditional mean: the number of times that the top-10 predictors are selected is much smaller than in the case of the conditional mean forecasting model.
We then checked how the number of predictors (the dimension of w_t and f_t) varies over time in forecasting the conditional mean and the 5% and 95% quantile points for the TOPIX. Figure 1 shows the total number of predictors for the Daily Model (h = 1). It is clear that the optimal number of predictors is time varying. Moreover, it is evident that a variety of information types relating to the dynamic behavior of the stock market and the real economy contribute to the out-of-sample forecasting of stock returns. Panels (b) and (c) of Fig. 1 summarize the predictor selection results for forecasting the 5% and 95% quantile points. In contrast to the results illustrated in panel (a) of Fig. 1, most of this information does not contribute to the 5% and 95% quantile points. We obtained similar results for the remaining daily and monthly forecasts. This indicates that the information useful for constructing the conditional mean and the tail points of TOPIX returns may differ. Table 4 indicates that suitable variables for the conditional mean prediction of the Daily Model (h = 1) contain various types of information: global "price change" variables such as the S&P 500 index return, the yen/dollar exchange rate, and oil price changes; "price dynamics" variables such as Principal Component Analysis (PCA) factors and Fama and French's style factors (Small-Minus-Big (SMB) and High-Minus-Low (HML)); and market condition variables such as the dividend yield, the volatility index, and the style-related illiquidity factors that we introduced in this article.

Role of macroeconomic factors
To check the role of macroeconomic factors and the new set of style-related illiquidity factors, we undertook prediction with two additional models (hereafter, Model A and Model B), which are distinguished from each other and from the basic model, the Daily Model (h = 1), by the set of candidate predictors. The candidate predictors for Model A are the call rate, W1, and the set of PCA factors identified by Bai and Ng's (2002) PC1 criterion. For Model B, the candidates are the same as for the Daily Models, with the exception of the style-related illiquidity factors, W34-W39. We calculated the prediction accuracy measures, including MSFE, MAFE, and so on. Table 8 summarizes the results. The results for Model A, the model with PCA factors, are reasonable but not as strong as those for Model B and the Base Model. The PCA factors, a type of endogenous variable of stock market dynamics, do not have enough power to predict future prices. Adding "price change" variables such as the S&P 500 index return, the yen/dollar exchange rate, and oil price changes, together with some well-known market condition variables, to the potential predictors of Model A yields Model B. This expansion of potential predictors brings significant predictive power to the model. Finally, adding the style-related illiquidity factors improves the forecast accuracy further. As shown in Table 8, the p-values for the test of the null hypothesis β = 1 are more than 1%, and the Base Model is the only model that maintained this p-value level even for the period after the Lehman shock. This evidence indicates the important role of macroeconomic factors and market condition variables in predicting future TOPIX prices. Moreover, the "price dynamics" variables, the PCA factors, tend to increase in predictive importance in situations following a large shock such as the Lehman shock. We now compare the variable selection results in Table 4 across the prediction term h (see also Tables 2-4 in the Appendix).
When the prediction term h is extended from 1 to 2 and 20, the PCA factors disappear. Moreover, while changes in commodity prices and the foreign exchange rate fell in rank in terms of their predictive power, Japanese domestic market-related variables, such as the yield spread, the interest rate change, the dividend yield, and TOPIX's own past returns, rose in rank. This indicates that over quite a short period, even a single day, common global information impacts the TOPIX return. However, for short-term predictions, market participants tend to consider the condition of the Japanese stock market and their own past behavior. As shown by the list of variables included in the Monthly Model (h = 12), style factors using the implied cost of capital and the illiquidity measure are included as model variables at a high frequency. As shown earlier, the three-factor Fama-French model has predictive power on a daily basis, that is, for short-term predictions. Although we created our style factors based on a procedure similar to that developed by Fama and French (1993), our results indicate that our proposed factors potentially include information on the characteristics of the market pricing system from time to time. That is, they also capture the degree of market distortion by considering value and size measures, and extract information on future trends in the Japanese stock market. We investigated the time series of the estimated β coefficients for the different prediction periods. We found that, in almost all cases, the set of selected factors includes the same variable (MILLIQ or ICC) and style (Market, SMB, or HML) but that the calculation term varies.

Table 6. List of the top-10 variables in the daily model, 5% quantile prediction (h = 1), in order of the number, the average number of times selected, and the longest period for which the variable was continuously selected.
Therefore, we investigated the sum of the estimated β coefficients with the same name but different terms. The time series plots in Fig. 2 are for the MILLIQ-based style factors in the Daily Model, h = 1. As shown in the figure, the signs of the estimated coefficients are closely clustered. We obtained similar results for the different prediction periods (see also Figs. 1-3 in the Appendix) and for the remaining coefficients. In particular, only a few of the estimated coefficients for the series w_t displayed the same sign over the entire period; however, each plus or minus sign tended to persist for a certain period rather than alternating frequently. This means that the role of each variable in the prediction of future stock price changes depends on differences in the economic environment, business confidence, and the business cycle. In addition, we observed higher predictive power for the 12-months-ahead prediction models than for the models with relatively short-term prediction periods, such as the one-, two-, three-, and six-months-ahead models. This implies that the set of variables w_t used in this analysis can be used for predicting whether the stock market direction will continue for a certain period, rather than just predicting the effect of shocks that could occur in the very near future.

Table 8. Summary of the Mean Squared Error (MSE), the Mean Absolute Error (MAE), and the Correct Sign Prediction Measure (SIGN) for one-day-prior predictions. We regress y_{t+1} on ŷ_{t+1} using a simple linear regression: y_{t+1} = α + βŷ_{t+1} + ε_{t+1}. The table also includes the estimated β coefficients, t-values, p-values, and adjusted R² scores. The t-values and p-values are not for the test of H0: β = 0, but for the test with H0: β = 1. Model A: Daily Models that include the short-term interest rate and the PCA factors derived from TSE individual stock returns only. Model B: Daily Models that exclude the liquidity variables, W34-W39, of Table 1 as potential candidates. Base Model: Daily Model (h = 1) with the full set of potential predictors in Table 1.

Figure 3 plots the time series of the estimated β coefficients of W6, the S&P 500 index return, for one-day-ahead prediction models. We found that even though this is the most powerful predictor for the one-day-ahead prediction model, the estimated value is time varying. Therefore, we ask whether the evidence supports the practitioners' commonsense understanding that the structure of stock price dynamics is not stable. To test this, we compared the results of three types of one-day-ahead prediction models, which are basically very similar to the Daily Model (h = 1) but which vary in terms of the size of the estimation period. We have already seen the result from setting n = 60 in this study. In addition, we set n = 120 and also attempted to set n equal to all available periods, i.e., covering all past data. We have daily data for estimating model parameters from June 30, 1999, so the full-period model starts from n = 125, and the number of samples increases by one every business day. Table 9 presents MSFE, MAFE, and SIGN values for our forecasts, as well as estimated β values and the corresponding t-values, p-values, and adjusted R². The t-values and p-values are for testing whether β = 1. Given the excessively time-consuming nature of the full-period prediction, we stopped our calculations before reaching the end of our data; therefore, MSFE, MAFE, and so on are calculated with samples from December 30, 1999 to January 21, 2008. The highest adjusted R² is that of n = 60, our basic model. Compared with the previously obtained results, the shorter estimation period is thus the more practical choice, confirming the practitioners' experience.

The role of factor extraction
In our estimation procedure, the set of factors is extracted from the T × N-dimensional large panel. As noted by a referee, the factors are linear combinations of the variables in the original panel. If all panel variables x_t are included as potential predictors, then there is no gain from adding factors estimated from them; (2008) reported that a shrinkage regression using all x_t as predictors works as an alternative. Hereafter, we refer to this as the shrinkage without factor extraction. We therefore compared the predictive performance of this procedure with the proposed method that employs the factor approach. Setting h = 1 in the Daily Models, we evaluated the shrinkage without factor extraction by calculating the mean squared error (MSE), the mean absolute error (MAE), and the SIGN. The prediction period is from December 30, 1999 to October 31, 2012, and the corresponding MSE, MAE, and SIGN are 0.0002, 0.0098, and 0.5947, respectively. These numbers were then compared with those reported in Table 3. We found that the proposed method is better than the shrinkage regression using all x_t as predictors.
We then regressed y_{t+h} on ŷ_{t+h} using the simple linear regression y_{t+h} = α + βŷ_{t+h} + ε_{t+h}. The estimated coefficient β̂ = 0.7694 rejects the hypothesis β = 1, whereas the β̂ from our procedure did not reject the hypothesis. We also found that the adjusted R² of the shrinkage without factor extraction is 0.070, about half of the corresponding value, 0.1301, in Table 3. In summary, we believe that factor extraction provides a useful tool for forecasting.

Modeling under the presence of structural breaks
In Table 3, we found that the prediction results of each model differ between the period before the Lehman bankruptcy and the period after it. If a structural break occurred in the estimation period, it is natural to consider that the latest information should receive more weight in the model parameter estimation. Let (a_1, ..., a_T)′ be the weight vector that reflects the importance of information. Then the unknown parameters θ = (α′, β′)′ in the factor-augmented regression model (4) are estimated by a shrinkage method, obtained by minimizing Σ_{t=1}^{T} a_t (y_{t+h} − α′f̂_t − β′w_t)² + Σ_j p_{κ,γ}(θ_j), where p_{κ,γ}(θ) is the SCAD penalty used in Section 3.1. Let t_0 denote the date of the Lehman bankruptcy. Then we set a_t = 1/NC for t < t_0 and a_t = φ/NC for t ≥ t_0, where NC is the normalizing constant chosen so that Σ_{t=1}^{T} a_t = 1, and φ (> 1) puts more weight on the information observed after the Lehman bankruptcy. We point out that various functional forms of a_t can be considered when the purpose of a_t is to place more weight on the information observed after the Lehman bankruptcy; however, exploring the best functional form of a_t is beyond the scope of this analysis. In our analysis, we prepared three levels of φ = 1 + 10^k (k = 0, 1, 2). The values of the regularization parameter κ and the weight parameter φ are selected by minimizing the PMSE score defined in Section 3.3.
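The weight vector described above can be sketched as follows. This is one simple step-weight choice consistent with the description in the text; the paper's exact functional form of a_t may differ, and the function name and example values are our own.

```python
import numpy as np

def break_weights(T, t0, phi):
    """Illustrative weight vector a_t for estimation under a structural
    break: observations at or after the break index t0 receive relative
    weight phi (> 1), earlier observations receive weight 1, and the
    normalizing constant NC makes the weights sum to one."""
    raw = np.where(np.arange(T) >= t0, float(phi), 1.0)
    return raw / raw.sum()                     # divide by NC so that sum(a_t) = 1

# Example: T = 10 observations, break at index 6, phi = 1 + 10^1 = 11
a = break_weights(10, 6, 11.0)
```

In the weighted least-squares objective, each squared residual is simply multiplied by its a_t, so post-break observations dominate the fit by a factor of φ.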
Setting h = 20, we computed the prediction results using the above approach, and then using the method that does not take the structural break into account. The prediction period is from September 16, 2008 to September 30, 2010. We found that consideration of the structural break contributed to increasing the adjusted R²: the improved adjusted R² (0.0637) is higher than that from the standard method (0.0192).
Recently, several forecasting procedures that take structural breaks into account have been proposed, including those of Pesaran et al. (2006) and Pesaran and Timmermann (2007). Although a rigorous theoretical investigation of the above methods is interesting, it is beyond the scope of this paper; we would like to investigate this problem in a future study.

Conclusions
In this article, we addressed the importance of macroeconomic variables in the Japanese stock market in the context of factor-augmented predictive regression. After the recent subprime crisis, followed by the collapse of Lehman Brothers and the European sovereign debt crisis, investment practitioners have realized that stock prices are highly variable during macroeconomic shocks. As awareness of these issues increases in practice, the ability to predict the future behavior of stock markets becomes even more valuable.
Motivated by practical views on the stock market, we addressed the question of whether macroeconomic variables play an important role in capturing the characteristics of the Japanese stock market. Using a factor-augmented predictive regression system, we constructed quantitative models for predicting the conditional mean and the 5% and 95% quantile points for the TOPIX. As theoretical contributions, we developed the variable selection consistency and asymptotic normality of our estimators. The results show that macroeconomic variables play an important role in predicting the characteristics of the stock market in Japan, after taking into consideration the various asset-pricing factors. We also checked the robustness of the predictability by regressing the empirical return on the predicted return, which revealed a significant relationship between these variables.

Lemma 1 (Bai and Ng, 2002). Suppose that Assumptions A hold. Then T^{-1} Σ_{t=1}^{T} ‖f̂_t − Hf_t^0‖² = O_p(C_{NT}^{-2}), where f_t^0 is the true factor, C_{NT} = min{√N, √T}, and H = V̂^{-1}(F̂′F^0/T)(Λ′Λ/N), with V̂ being the r × r diagonal matrix consisting of the r largest eigenvalues of X′X/(TN).
Proof of Lemma 2. See the proof of Lemma 2 of Bai and Ng (2008).
As in Bai and Ng (2008), max_{t=1,...,T} ‖f̂_t − Hf_t‖ = o_p(1) if T^{1/4}/N → 0. Lemma 2 allows us to establish the consistency of our estimators θ̂, because our sample loss function converges uniformly to the population loss function. In other words, we require the uniform convergence of f̂_t to the space spanned by f_t. More details are discussed in Bai and Ng (2008).