Structural Breaks in Grouped Heterogeneity

Abstract. Generating accurate forecasts in the presence of structural breaks requires careful management of bias-variance tradeoffs. Forecasting panel data under breaks offers the possibility of reducing parameter estimation error without inducing any bias if there exists a regime-specific pattern of grouped heterogeneity. To this end, we develop a new Bayesian methodology to estimate and formally test panel regression models in the presence of multiple breaks and unobserved regime-specific grouped heterogeneity. In an empirical application to forecasting inflation rates across 20 U.S. industries, our method generates significantly more accurate forecasts relative to a range of popular methods.


Introduction
Economic time series forecasts are plagued by structural breaks (Stock and Watson 1996; Pettenuzzo and Timmermann 2011). Generating accurate forecasts in the presence of breaks requires careful management of bias-variance tradeoffs. Forecasts are typically generated from parameters estimated using only post-break data, which ensures unbiased estimates but may result in considerable parameter estimation error, particularly in short regimes. Pooling parameter estimates across pre- and post-break data reduces parameter estimation error at the expense of inducing bias. Optimal management of bias-variance tradeoffs is a challenging problem that explains much of the poor forecasting performance in the presence of breaks (Pesaran and Timmermann 2007).
There is an emerging interest in forecasting panel data. Panel data present the opportunity to manage bias-variance tradeoffs through the cross-section. If series in the cross-section have patterns of grouped heterogeneity in which series within a group share a common parameter, pooling parameter estimates within groups reduces estimation error without inducing any bias. Exploiting this commonality within groups requires identifying the number of groups and which series belong to each. The group structure may be further complicated if it varies through time.
This article develops a new Bayesian panel regression approach that estimates unobserved patterns of grouped heterogeneity in which the regression coefficients and the grouping structure itself are allowed to change at multiple unknown times identified by structural breaks. Some studies on breaks in panel models or multivariate time series include Bai, Lumsdaine, and Stock (1998), Qu and Perron (2007), and Baltagi, Feng, and Kao (2016). Breaks are assumed to be common, such that all series in the cross-section are hit at the same time (for a framework that incorporates noncommon breaks, see Smith 2018). Parameters are pooled within, but differ across, regime-specific groups. Allowing the number of groups and the cross-sectional ordering of series to vary across regimes enables improved management of bias-variance tradeoffs relative to existing methods. This approach allows, in a given regime, any number of groups, ranging from one (fully pooled) to N (unit-specific). Such flexibility is critical when generating forecasts: one might wish to increase the amount of pooling by reducing the number of groups during short and volatile regimes that are susceptible to estimation error, such as around the global financial crisis, while allowing more groups and less pooling during longer and less volatile regimes.
Figure 1 illustrates the importance of allowing for a regime-specific grouping structure. For the petroleum and machinery industries, the left window displays the total number of industries within their respective groups from 2004 to 2018, estimated from the model with breaks and regime-specific grouped heterogeneity when regressing the 20 U.S. industry-level inflation rates on an intercept, a single lag of inflation, and the lagged industry-level unemployment rate. The middle window displays the evolution of the posterior mean of the slope coefficient on the lagged unemployment rate for the corresponding series, along with the posterior mean of β estimated from the model with breaks and parameters pooled across the entire cross-section (green dashed line). The right window graphs the standard deviation of the slope coefficient for the machinery industry estimated from the model with breaks and regime-specific grouped heterogeneity (red dashed line) and the model with breaks and unit-specific parameters (purple dotted line).
The estimates are from a panel regression of U.S. industry-level inflation rates on lagged industry-level inflation and unemployment rates in a model that allows for regime-specific grouped heterogeneity (the data are described in more detail in Section 4). Machinery (red dotted line) has its own group until the onset of the financial crisis in February 2008. Thereafter its parameter is pooled across six industries until the crisis regime ends in October 2009, when it returns to its own group. At the same time, the method identifies that petroleum is an industry unlike any other and therefore assigns it its own group throughout the sample period (the petroleum industry has the most unusual inflation series, with high volatility and frequent, large outliers). Panel break models can obtain more precise estimates by pooling parameters across the entire cross-section, but this introduces bias into the estimates if the coefficients are not identical across all series in the cross-section. The middle window graphs the evolution over the sample of the slope coefficient on the lagged unemployment rate, β, estimated from our model for the machinery (red dotted line) and petroleum (black solid line) industries. The path of β estimated from the model with breaks and pooled parameters is also graphed (green dashed line). Our model is flexible enough to simultaneously increase the absolute magnitude of β for the petroleum industry during the crisis regime, implying more predictability, and decrease the magnitude of β for the machinery group, implying less predictability. The model with breaks and pooled parameters, however, treats all series equally, increasing the absolute magnitude of the slope for all series during the crisis. This model is therefore likely to generate particularly inaccurate inflation forecasts for the series in the machinery group during the crisis.
The right window graphs the evolution of the standard deviation of β for the machinery industry estimated by our model (red dotted line) and by the model with breaks and unit-specific parameters (purple dotted line). During the crisis regime, pooling the slope coefficient across the six industries in machinery's group results in a more precise estimate, with a standard deviation peaking at 0.8. The model with breaks and unit-specific parameters has a much larger standard deviation of 1.9 during this period. The imprecision of the parameter estimate from the model with unit-specific parameters will likely translate into inaccurate forecasts.
While allowing for regime-specific grouped heterogeneity may improve model fit, it also increases model complexity relative to simpler models. Our estimation algorithm directly delivers the posterior model probabilities of our model relative to five nested models: a constant grouped heterogeneity (no break) model, panel break models with either pooled or unit-specific parameters in each regime, and no break panel models with either pooled or unit-specific parameters. The algorithm integrates over all model parameters and inherently penalizes more complex models through the densities that enter the acceptance probabilities of the proposed moves, thereby guarding against overparameterization.
For high-dimensional problems, Han and Carlin (2001) recommend estimators that explore the model-parameter space jointly. We therefore employ a reversible jump Markov chain Monte Carlo (RJMCMC) estimator (Green 1995). Specifically, we develop a five-step RJMCMC algorithm that jointly estimates the number of breaks, their locations, the number of groups in each regime, group membership, and the regression coefficients. First, a Gibbs step samples parameters from their full conditional distributions. Second, a random-walk Metropolis-Hastings step estimates the break locations. Third, within each regime the cross-sectional ordering of series is estimated. Fourth, in each regime a reversible jump step estimates the number of groups by attempting, with equal probability, either to merge two existing groups into one or to split an existing group into two. Fifth, a reversible jump step estimates the number of breaks by attempting, with equal probability, either to introduce a break at a randomly chosen location or to remove a randomly selected existing break.
To demonstrate the ability of the method to estimate models with regime-specific grouped heterogeneity, we conduct a simulation study in which the true data-generating process is known. When the true data-generating process is characterized by regime-specific grouped heterogeneity, panel models with either breaks and pooled parameters or time-invariant grouped heterogeneity produce biased parameter estimates. A panel model with breaks and unit-specific parameters delivers parameter estimates that are unbiased but imprecise. The parameters estimated from the new method we develop are unbiased and more precise, thereby dominating the aforementioned models from a bias-variance standpoint.
An empirical application that explores the ability of 20 lagged U.S. industry-level unemployment rates to forecast the corresponding industry-level PPI inflation rates over the period 2009-2018 demonstrates the practical use of the new method (Faust and Wright 2013). Two breaks are detected, at February 2008 and October 2009, along with large fluctuations in grouped heterogeneity across the three regimes. For at least 11 out of 21 cases (20 industries and the U.S. aggregate) our method produces statistically significantly more accurate inflation forecasts than seven popular benchmark models, and it is significantly outperformed by the benchmarks in only 13 out of 147 cases (21 series against each of the seven benchmark models).
The remainder of the article is set out as follows. Section 2 sets out the model and prior specifications. Section 3 presents the simulation study. Section 4 presents an empirical application to forecasting U.S. industry-level inflation rates and Section 5 concludes.

Methodology
Our panel regression framework with i = 1, ..., N series and t = 1, ..., T time periods estimates the number of structural breaks K and their locations τ = (τ_1, ..., τ_K) that separate K + 1 regimes. Breaks are common, such that every series is hit at the same date. The common break assumption seems reasonable in our empirical application, which compares the accuracy of U.S. industry-level inflation forecasts generated from our model relative to a number of benchmark models. Carlino and DeFina (1998), for example, report that monetary shocks are aggregate shocks, which correspond to structural breaks in our framework, that have a common yet heterogeneous effect. The latter is allowed for by our grouping structure.
The kth regime contains the observations (τ_{k-1} + 1, ..., τ_k) and thus its duration is l_k = τ_k − τ_{k-1}. The kth regime is further characterized by an estimated number of regime-specific cross-sectional groups g_k = 1, ..., G_k + 1, whereby N_{g_k} denotes the number of series in the g_kth group. The cross-sectional ordering of the series, that is, which series belong to which group, is also estimated. Let c_t = (c_{1t}, c_{2t}, ..., c_{Nt}) denote the group membership vector at time t, in which c_{it} ∈ {1, ..., G_k + 1} such that c_{it} = g_k means the ith series at time t belongs to group g_k in regime k. Parameters are pooled within, but differ across, regime-specific groups.
Our framework allows for grouped fixed effects, slopes, and volatilities that vary across regimes. The baseline model groups on all of these parameters, which are assumed to follow the same grouping structure; later we discuss restricted versions that group on only one parameter. The framework nests several popular approaches. The heterogeneous time-invariant panel model with fixed effects is obtained if we estimate no breaks and N groups. Detecting breaks with N groups in every regime yields the unit-specific panel model with common breaks and regime-specific fixed effects (Smith and Timmermann 2017b).

Model with Regime-Specific Grouped Heterogeneity
Let y denote the (N × T) matrix containing the observations on the dependent variable and X the (κ × N × T) array containing the observations on the κ regressors. The model is

y_{it} = Σ_{g_k=1}^{G_k+1} 1(c_{it} = g_k) x_{it}' β_{g_k} + u_{it},  (1)

in which u_{it} ∼ N(0, σ²_{g_k}) and 1(c_{it} = g_k) is an indicator function that equals one if the ith series at time t belongs to the g_kth group and equals zero otherwise. For the g_kth group, let β_{g_k} denote a (κ × 1) vector of slope coefficients on the κ regressors, where the first regressor is a constant such that the first element of β_{g_k} is a group fixed effect for regime k. Let σ²_{g_k} denote the error-term variance, and collect β = (β_1, ..., β_{K+1}) and σ² = (σ²_1, ..., σ²_{K+1}). Letting c = (c_1, c_2, ..., c_T), the likelihood for this model is

p(y | X, β, σ², τ, c) = ∏_{k=1}^{K+1} ∏_{g_k=1}^{G_k+1} ∏_{t=τ_{k-1}+1}^{τ_k} ∏_{i: c_{it}=g_k} (2π σ²_{g_k})^{−1/2} exp( −(y_{it} − x_{it}' β_{g_k})² / (2σ²_{g_k}) ).  (2)
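To make the bookkeeping concrete, the following sketch evaluates a Gaussian panel log-likelihood of this form for given breaks, group labels, and parameters. The array layout (y as N × T, X as κ × N × T, parameters stored as nested lists indexed by regime and group) is our own assumption for illustration, not taken from the article:

```python
import numpy as np

def log_likelihood(y, X, tau, c, beta, sigma2):
    """Gaussian panel log-likelihood in the spirit of Equation (2).

    y: (N, T); X: (kappa, N, T); tau: regime boundaries [0, tau_1, ..., T];
    c: (N, T) integer group labels (fixed within regimes);
    beta[k][g]: (kappa,) slopes and sigma2[k][g]: variance of regime k, group g.
    """
    ll = 0.0
    for k in range(len(tau) - 1):
        lo, hi = tau[k], tau[k + 1]
        for g in range(len(beta[k])):
            mask = c[:, lo:hi] == g                       # members of group g
            fitted = np.einsum('jnt,j->nt', X[:, :, lo:hi], beta[k][g])
            r = (y[:, lo:hi] - fitted)[mask]              # group residuals
            ll += (-0.5 * mask.sum() * np.log(2 * np.pi * sigma2[k][g])
                   - 0.5 * np.sum(r ** 2) / sigma2[k][g])
    return ll
```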

Prior Distributions
We now specify the prior distributions over the regime durations, cross-sectional groupings, and the regression coefficients.

Prior on Regime Durations
Following Koop and Potter (2007) and Smith and Timmermann (2017b), regime durations have a Poisson distribution, l_k ∼ Poisson(λ_k), where λ_k, the expected duration of the kth regime, has a conjugate Gamma prior, λ_k ∼ Gamma(c, d). Note that group membership is fixed within regimes, so if the ith series belongs to the g_kth group then it does so for all time periods within the kth regime, that is, for t = τ_{k-1} + 1, ..., τ_k.
The prior over the structural breaks is then given by the product of the regime-duration densities.

Prior on Cross-Sectional Groupings
A Poisson prior distribution is placed over the number of series in a group, where the expected number of series in every group in the kth regime has a conjugate Gamma prior with hyperparameters e and f. The prior on the grouping structure is therefore the product of these group-size densities.

Priors on Regression and Variance Parameters
Conjugacy helps maintain computational efficiency, which is beneficial when estimating panel models with large time-series and/or cross-sectional dimensions. For regimes k = 1, ..., K + 1, we specify an inverse gamma prior distribution over the error-term variances and a conjugate Gaussian prior distribution over the slopes (and fixed effects) β_{g_k}, conditional on the variance σ²_{g_k}, where a and b characterize the prior expected error-term variance and σ²_β characterizes the prior variance of β_{g_k}. In panel settings, one often has many potential predictors available, and thus it is natural to consider hierarchical shrinkage priors, for example, the global-local prior of Polson and Scott (2010). Nonconjugate priors are possible within our framework, but they would require careful tuning of the proposal distributions of the trans-dimensional moves in our RJMCMC algorithm to maintain mixing and computational efficiency. The conjugate priors enable the parameters to be integrated out, so they do not need to be proposed in the trans-dimensional moves, circumventing any need to tune the proposal distributions. Tuning the proposal distributions of trans-dimensional moves in RJMCMC algorithms has been an impediment to some users, and we therefore do not recommend deviating from conjugate priors. Appendix A of the supplementary materials explains how the specification of multiple independent Poisson distributions over group sizes is inferentially equivalent to a specification that uses a single Multinomial distribution.
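To illustrate why conjugacy matters computationally, the sketch below evaluates, in closed form, the marginal likelihood of a single group's regression with β and σ² integrated out under a Normal-Inverse Gamma prior. This is standard conjugate algebra with illustrative names and hyperparameter values, not the article's code:

```python
import numpy as np
from math import lgamma

def nig_log_marginal(y, X, mu0, V0, a, b):
    """p(y | X) with beta | s2 ~ N(mu0, s2 * V0) and s2 ~ InvGamma(a, b)
    integrated out; also returns the updated posterior hyperparameters so
    the calculation can be chained over blocks of data."""
    n = len(y)
    V0_inv = np.linalg.inv(V0)
    Vn = np.linalg.inv(V0_inv + X.T @ X)          # posterior scale of beta
    mun = Vn @ (V0_inv @ mu0 + X.T @ y)           # posterior mean of beta
    an = a + 0.5 * n
    bn = b + 0.5 * (y @ y + mu0 @ V0_inv @ mu0 - mun @ np.linalg.inv(Vn) @ mun)
    log_ml = (-0.5 * n * np.log(2 * np.pi)
              + 0.5 * (np.linalg.slogdet(Vn)[1] - np.linalg.slogdet(V0)[1])
              + a * np.log(b) - an * np.log(bn)
              + lgamma(an) - lgamma(a))
    return log_ml, mun, Vn, an, bn
```

Because the posterior after one block of data is again Normal-Inverse Gamma, the marginal likelihood factors sequentially; this closed form is what lets trans-dimensional moves score proposed regimes and groups without ever proposing β or σ².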

Performing Inference
Inference is performed on the posterior distribution, which combines prior information with information in the data. Multiplying the likelihood function by the prior distributions and marginalizing over β and σ², we obtain a collapsed posterior over the breaks and groupings whose component densities, for regimes k = 1, ..., K + 1 and groups g_k = 1, ..., G_k + 1, are available in closed form.

Estimating the Model
The model is estimated using Markov chain Monte Carlo methods. There are five estimation steps. First, the parameters are drawn from their full conditional distributions. Second, the timing of the structural breaks is estimated using a random-walk Metropolis-Hastings step. Third, the cross-sectional ordering of series in each regime is estimated using a block updating scheme. Fourth, the number of groups in each regime is estimated using a reversible jump Markov chain Monte Carlo algorithm: in a given regime, the algorithm attempts with equal probability either to split an existing group into two or to merge two existing groups into one. Fifth, the number of structural breaks is also estimated using a reversible jump step: with equal probability, the algorithm attempts either to introduce a structural break at a random date or to remove a randomly selected existing break.
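The logic of the fifth step can be seen in miniature on a univariate toy problem. The sketch below is our own illustration, not the article's algorithm: a reversible jump sampler over the number and location of breaks in the mean of a single series, with the segment mean integrated out analytically. With uniform priors on the break count and locations, the prior and proposal ratios of the birth/death moves cancel, so a move is accepted with probability min(1, marginal-likelihood ratio):

```python
import numpy as np

def seg_log_ml(y, v=200.0):
    """Log marginal likelihood of one segment: y_t ~ N(mu, 1), mu ~ N(0, v)."""
    n, s = len(y), y.sum()
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * v)
            - 0.5 * (np.sum(y ** 2) - v * s ** 2 / (1 + n * v)))

def log_ml(y, breaks, v=200.0):
    """Sum of segment marginals for a sorted list of interior break points."""
    bounds = [0, *breaks, len(y)]
    return sum(seg_log_ml(y[a:b], v) for a, b in zip(bounds, bounds[1:]))

def rjmcmc_breaks(y, iters=5000, v=200.0, seed=0):
    """Birth/death reversible jump over the number and location of breaks."""
    rng = np.random.default_rng(seed)
    T = len(y)
    breaks, cur = [], log_ml(y, [], v)
    draws = []
    for _ in range(iters):
        if rng.random() < 0.5:                        # birth: add a break
            free = [t for t in range(1, T) if t not in breaks]
            prop = sorted(breaks + [int(rng.choice(free))])
        elif breaks:                                  # death: drop a break
            prop = list(breaks)
            prop.pop(rng.integers(len(prop)))
        else:                                         # death from K = 0: reject
            draws.append(list(breaks))
            continue
        new = log_ml(y, prop, v)
        # with uniform priors, prior and proposal ratios cancel exactly,
        # leaving only the marginal-likelihood ratio in the acceptance step
        if np.log(rng.random()) < new - cur:
            breaks, cur = prop, new
        draws.append(list(breaks))
    return draws
```

On data with a single large mean shift, the sampler concentrates on one break at the true location; in the full model the same accept/reject logic operates on the collapsed panel posterior, with the group structure of the newly formed regimes proposed jointly.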

Related Approaches
Finite mixture models have been used to estimate models with grouping structures. A key limitation of finite mixtures, however, is that the number of groups must be prespecified. Such methods typically involve fitting several finite mixture models with different numbers of groups and estimating the number of groups using, say, BIC or marginal likelihoods, making the implementation quite cumbersome. Dirichlet process mixtures offer an attractive alternative as they allow for infinitely many components by construction. Posterior inference focuses on the grouping structure induced by the Dirichlet process prior, in which the number of nonempty groups is random and can be inferred from the data using straightforward MCMC samplers. Frühwirth-Schnatter and Malsiner-Walli (2019) made a significant advance by illustrating that sparse finite mixtures are generic and can be extended beyond Gaussian settings. Sparse finite mixtures perform model selection with respect to the number of groups in one sweep, circumventing the need to design proposals within trans-dimensional approaches such as reversible jump MCMC, which have been an impediment to some users. However, they assume group membership is time-invariant. We allow group membership to vary across regimes. In addition, by exploiting conjugate priors, the mean and volatility parameters are integrated out of the posterior, meaning they need not be proposed in the trans-dimensional moves (Smith, Bulkley, and Leslie 2020); the properties of the natural conjugate Normal-Inverse Gamma prior are well known (see, e.g., Chapter 2 of Chan et al. 2019), and the collapsed density p(y|X, τ, c) is obtained by multiplying the priors by the likelihood and integrating over β and σ². Section 1 of the supplementary materials explains each estimation step in detail. This feature (i) reduces the computational burden, (ii) improves mixing, and (iii) makes the method easier to implement for applied users.
Of course, these key features depend upon the conjugate prior specification and for that reason we do not recommend applied users implement our approach in settings without conjugate priors.
Generalizing the time-invariant clustering framework of Frühwirth-Schnatter and Kaufmann (2008), Wang and Tsay (2019) develop a new Bayesian approach to estimate panel data models with multiple breaks, after which group membership and group-specific parameters can change. Like us, they assume (i) group membership is fixed within, but can shift across, regimes, (ii) common parameters for series within a regime-group, (iii) Normal-Inverse Gamma conjugate priors on the regime-group-specific mean and volatility parameters, and (iv) no prior information is imposed on series' group assignments. We now point out five key differences between our frameworks. First, Wang and Tsay (2019) impose a maximum number of groups in each regime; we do not. Setting this maximum too low prevents the algorithm from identifying the true number of groups, while setting it too high results in some groups containing no series, which is inefficient. (They estimate the maximum number of groups by experimenting with different values: bridge sampling techniques obtain the corresponding marginal likelihoods, and the value assigned the highest posterior model probability is selected. Our approach does not require this additional computation.) Second, we assume that regime durations follow a Poisson prior distribution, while they assume that breaks occur randomly following a Bernoulli process. Third, we specify a Multinomial prior on the number of series in a group, while in their framework series are assigned to groups according to the probabilities implied by a common Dirichlet process. Fourth, we estimate our model using reversible jump MCMC, while they employ a state-space framework in which regime and group membership are latent variables. Finally, they condition on the break dates and subsequently estimate the groups within regimes. This approach contrasts with our birth and death moves, which allow joint estimation of the breaks and the group structure in the newly formed regimes. This can be critical to mixing, because doing it in two moves might mean the algorithm must first move to an area of low posterior density (which will likely get rejected) before it can move to an area of high posterior density (Fearnhead 2006).
Two related points: Frühwirth-Schnatter and Malsiner-Walli (2019) also show that finite and Dirichlet process mixtures yield similar inference on the number of groups once the two approaches' hyperpriors are matched, and Creal and Kim (2021) develop a new Bayesian factor model with regression tree priors that performs variable selection from a large set of characteristics using a spike-and-slab prior, identifying the set of economy-wide variables that drive aggregate time series return predictability, and forms portfolios based on a clustering technique.

Simulation Study
To demonstrate the success of the method in estimating panel regression models with structural breaks and regime-specific grouped heterogeneity, we now conduct a simulation exercise in which the true underlying data-generating process is known. The framework we develop is compared with the time-invariant clustering model of Frühwirth-Schnatter and Kaufmann (2008).

Data-Generating Process
The time series dimension is T = 100 and we consider cross-section dimensions of N = 20, N = 100, and N = 500. We consider three different breakpoint scenarios for the data-generating process. First, we consider one break occurring at t = 50. Second, we consider two breaks such that τ_1 = 35 and τ_2 = 70. Finally, we consider three breaks such that τ_1 = 25, τ_2 = 50, and τ_3 = 75. We assume that the number of groups in the kth regime, G_k + 1, is equal to either 2, 3, 5, or 10. We assume the number of groups is fixed across regimes, but group membership is not. We allocate series to groups randomly and independently in each regime, subject to the restriction that each group contains at least one series. This gives us the break dates and the indicator functions that reflect which group the ith series is allocated to at time t.
We construct the dependent variable as

y_{it} = Σ_{g_k=1}^{G_k+1} 1(c_{it} = g_k) x_{it} β_{g_k} + u_{it},

in which x_{it} ∼ N(0, 1) and u_{it} ∼ N(0, σ²_{g_k}). We assume that σ²_{g_k} = 0.5 across all regime-groups and that the regime-group-specific slope coefficients are simulated from a standard Normal distribution.
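The design above can be sketched as follows, shown here for the one-break, three-group case; the array shapes and the per-regime reassignment scheme are our reading of the text, not the authors' code:

```python
import numpy as np

def simulate_panel(N=20, T=100, breaks=(50,), n_groups=3, sigma2=0.5, seed=0):
    """Panel with common breaks and regime-specific grouped slopes."""
    rng = np.random.default_rng(seed)
    tau = [0, *breaks, T]                        # regime boundaries
    X = rng.standard_normal((N, T))              # x_it ~ N(0, 1)
    y = np.empty((N, T))
    c = np.empty((N, T), dtype=int)              # group labels c_it
    for k in range(len(tau) - 1):
        lo, hi = tau[k], tau[k + 1]
        # random membership, each group seeded with at least one series
        labels = np.concatenate([np.arange(n_groups),
                                 rng.integers(0, n_groups, N - n_groups)])
        rng.shuffle(labels)
        beta = rng.standard_normal(n_groups)     # regime-group-specific slopes
        u = rng.normal(0.0, np.sqrt(sigma2), (N, hi - lo))
        y[:, lo:hi] = beta[labels][:, None] * X[:, lo:hi] + u
        c[:, lo:hi] = labels[:, None]
    return y, X, c
```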
For each breakpoint/group membership scenario, we simulate H = 50 different panels. For each simulated panel, we estimate our model using 10,000 iterations. We discard the first 2000 iterations and thin the remaining 8000 at an interval of 10. This leaves 800 iterations for each simulated panel and a total of 40,000 iterations across the 50 simulated panels. Inference is performed on these 40,000 iterations.
Our prior assumption is that a new break onsets every 120 periods, since c = 240 and d = 2. The hyperparameters a and b of the inverse gamma distribution on the error-term variance are both set equal to 2, and the prior variance of β, σ²_β, is set equal to 0.1. Finally, we set f = 1 and e = 7.

Statistical Efficiency
We evaluate the statistical efficiency of our proposed method and compare it with the time-invariant grouped heterogeneity approach. To do this, we obtain β^{m,h}_{i,t} from Equation (12) using the mth retained draw from the hth simulated dataset for the ith series at time t. Letting β_{i,t} denote the true value of β for the ith series at time t, the average mean squared estimation error (AMSE) from our baseline model is

AMSE_RSGRP = (1 / (M H N T)) Σ_{m=1}^{M} Σ_{h=1}^{H} Σ_{i=1}^{N} Σ_{t=1}^{T} (β^{m,h}_{i,t} − β_{i,t})².  (13)

Next, we compute the same measure using the time-invariant grouped heterogeneity model and denote this AMSE_GRP. We report the ratio AMSE_RSGRP / AMSE_GRP, in which a value less (greater) than one indicates that the regime-specific grouped heterogeneity model is more (less) efficient than the time-invariant grouped heterogeneity model.
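In code, the AMSE and the reported ratio reduce to a single mean over draws, datasets, series, and periods; the (M, H, N, T) array layout is assumed for illustration:

```python
import numpy as np

def amse(draws, truth):
    """Average mean squared estimation error in the spirit of Equation (13).

    draws: (M, H, N, T) retained posterior draws of beta;
    truth: (H, N, T) true values of beta.
    """
    return np.mean((draws - truth[None]) ** 2)

# the reported statistic is then amse(draws_rsgrp, truth) / amse(draws_grp, truth),
# with values below one favoring the regime-specific grouped heterogeneity model
```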

Mixing
To evaluate the mixing of the algorithm and whether the chain has converged, we use the Gelman and Rubin (1992) diagnostic statistic, which compares the within-chain variance, computed for each of the H chains, with the between-chain variance B. As M → ∞ and B → 0, this statistic approaches the value of one from above; the closer the statistic is to one, the more likely the chain has converged. Gelman and Rubin (1992) suggest that a value above 1.2 indicates nonconvergence, but we follow recent convention in using the more stringent threshold of 1.1 to assess convergence.
[Table: ratios AMSE_RSGRP/AMSE_GRP relative to the time-invariant clustering approach of Frühwirth-Schnatter and Kaufmann (2008), averaged across the retained draws, H = 50 simulated datasets, N series, and T = 100 time periods. The Gelman-Rubin statistic to diagnose convergence of our algorithm and its computation time in minutes (rounded to the nearest minute) are also reported; bold values indicate statistics above the 1.1 threshold, typically used to imply that the sampler has not converged. We use the simulated data and hyperparameters detailed in Section 3.1. Each MCMC sampler is run for 10,000 iterations with a burn-in period of 2000 iterations and the remaining 8000 iterations thinned at an interval of 10; this leaves 800 iterations per panel and a total of 40,000 iterations across the H = 50 panels on which inference is performed for each scenario.]
Across all 36 cases, the AMSE ratio is less than one, which implies an improvement in accuracy of our approach relative to the time-invariant approach. These improvements are marginally larger when the number of series is smaller and moderately larger when the number of breaks is smaller (but not zero) and when the number of groups is not too large. Specifically, the gains increase as the number of groups increases beyond two, but then decrease sharply as the number of groups in each regime increases from 5 to 10.
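A textbook version of the Gelman-Rubin diagnostic used here, for H parallel chains of M draws of a scalar parameter, can be sketched as follows (our own sketch, not the authors' implementation):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (Gelman and Rubin 1992).

    chains: array (H, M) of H parallel chains with M draws each.
    """
    chains = np.asarray(chains, dtype=float)
    H, M = chains.shape
    chain_means = chains.mean(axis=1)
    B = M * chain_means.var(ddof=1)              # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    var_plus = (M - 1) / M * W + B / M           # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Values near one indicate the chains are sampling from the same distribution; values above the chosen threshold (1.1 here) flag nonconvergence.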

Findings
The table also displays the Gelman-Rubin statistic from our baseline model to diagnose convergence of the algorithm; statistics above 1.1 (displayed in bold font) imply that the algorithm has not converged. According to this statistic, the algorithm has converged in 33 out of the 36 scenarios we consider. The only cases in which it has not converged are those with 500 series and 10 groups in each regime with either two or three breaks, and with 100 series and 10 groups in each regime with three breaks.
In addition, the computation time increases with the number of series N, the number of breaks K, and the number of groups G. However, even the most computationally intensive scenario, with three breaks, 500 series, and 10 groups in each regime, takes just 30 min to complete. Taken together, these results imply that the method is quite scalable as we increase the number of series, breaks, and groups. The main constraint on this scalability arises as the number of groups grows large, which can be exacerbated when there are multiple regimes (each with a large number of groups). Our approach is therefore suitable for applications with large cross-sections so long as the total number of groups is not too large.
Finally, we evaluate the ability of our approach to jump across five simpler nested models: the break model with either pooled or unit-specific parameters, the no break model with either pooled or unit-specific parameters, and the time-invariant grouped heterogeneity model. The algorithm places more than 99% posterior probability on the regime-specific grouped heterogeneity model. (In these simulations, the time-invariant clustering model delivers biased parameter estimates; the heterogeneous panel breakpoint model correctly identifies the break dates and thus delivers unbiased, but imprecise, parameter estimates; and our model delivers unbiased parameter estimates that are far more precise than the heterogeneous breakpoint model. To save space, these results are not shown but are available upon request.) We also conduct simulation studies in which the true underlying model in the data-generating process is (i) a time-invariant grouped heterogeneity model, (ii) a pooled breakpoint model, or (iii) a pooled no breakpoint model. In each case, the posterior model probability assigned to the true underlying model is never lower than 97%. This suggests that the approach can be used successfully for model comparison.

Discussion
Our model specifies both the mean coefficients and the variances to be regime-group-specific. As noted by Wang and Tsay (2019), this choice, combined with conjugate priors, enables the parameters to be integrated out and thus improves the mixing and computational efficiency of the estimation, particularly for applications which involve large cross-sections and thus potentially many groups. It also circumvents the need to carefully tune the proposal distributions in the RJMCMC algorithm. However, these benefits come at the cost of (i) assuming quite a lot of homogeneity across the mean and volatility parameters for groups that contain many series and (ii) performing the clustering on both the mean and volatility parameters which might be undesirable if one wanted to understand the grouping structure based on only the mean parameter.
Regarding the first point, in the prior distribution specified in Equation (5), the user could specify the hyperparameters e and f to favor models with fewer series in each group, reducing the probability that any one group will be assigned many series. This, however, implies a larger number of groups and so may only be practical for smaller cross-sections. As the cross-section grows large, specifying such a prior may result in a large number of groups and this can cause problems for the estimation both in terms of mixing and computation, as illustrated in the simulation study. On the second point, for many applications clustering on both mean and variance may be beneficial. For instance, Pástor and Stambaugh (2001) link the mean and volatility parameters of the U.S. equity premium to better identify breaks because (i) it is typically easier to identify breaks in the volatility of stock returns than in their mean and (ii) large shifts in the mean of the distribution of stock returns may coincide with periods of high volatility, for example, the Global Financial Crisis. In such situations, clustering on mean only will require a larger (cross-sectional) break magnitude for the algorithm to identify it relative to clustering on both mean and volatility. Given the difficulty of identifying breaks in such volatile data, clustering on both mean and volatility may be preferable. For applications in which one wants to group only on the mean parameter, however, one could either apply (i) our procedure with the caveat that the volatility parameter may distort the grouping structure or (ii) a different methodology.

Out-of-Sample Forecasting
Forecasting inflation is challenging since a range of considerations that may vary through time, such as monetary policy and the inflation expectations of economic agents, must be accounted for. In line with this view, the literature has documented evidence of instability in the predictive performance of leading predictors of inflation, such as the unemployment rate and the growth of real output (Faust and Wright 2013).
19 To save space, these results are not shown but are available upon request.
The empirical application analyses the ability of industry-level predictors to forecast inflation across 20 U.S. industries using univariate predictive panel regressions. The lead predictor is the industry-level unemployment rate. We also present results for industrial production.
It is perhaps most common to focus on Consumer Price Indexes (CPI) or Personal Consumption Expenditures (PCE) when forecasting U.S. inflation rates. However, we focus on Producer Price Indexes (PPI) for five reasons. First, the PPI is particularly useful for measuring real output growth by deflating revenue sources, while the CPI is more useful for measuring cost-of-living adjustments. Second, the PPI may be a leading indicator of the CPI since increased producer costs are typically passed on to consumers with a lag. For forecasting purposes, the focus of this article, predicting PPIs may therefore be useful even if the user is ultimately interested in predicting CPIs. Third, the PPI may offer a truer output measure since it is not affected by consumer demand. Fourth, it has been argued that monetary policy rules should target the PPI. 20 Finally, as we show, the PPI has very different dynamics and clearly exhibits structural breaks and grouped heterogeneity, features that highlight the benefits of our methodology when modeling and forecasting inflation.
Following convention, our predictive regression includes a lagged dependent variable. Dynamic panel models often introduce an asymptotic bias into the parameter estimates (Nickell 1981). For this reason, this section focuses on the model's out-of-sample forecasting performance rather than on inference on the parameter estimates.
The number and timing of breaks, along with the regime-specific grouped patterns of heterogeneity, are first analysed. Second, we compare out-of-sample forecasts produced by our predictive panel regression with regime-specific grouped heterogeneity against forecasts generated by seven benchmark models. 21 The first is a panel model with grouped heterogeneity but no breaks. The second is a panel model with breaks and parameters pooled across the entire cross-section in each regime. The third is a panel model with breaks and unit-specific parameters in every regime. The fourth benchmark is an AR(1) model estimated by Ordinary Least Squares and applied independently to each series in the cross-section. The fifth is the approach of Wang and Tsay (2019). The sixth is the univariate unobserved components model of Stock and Watson (2007). The final model is a large Bayesian Vector Autoregressive (VAR) model with proper shrinkage using the approach of Bańbura, Giannone, and Reichlin (2010). 22

Data
The dataset consists of 20 industries: accommodation, chemicals, computer and electric products, electrical equipment, food, furniture and related products, financials, hospitals, machinery, manufacturing, mining, nonmetallic mineral products, other, petroleum and coal products, plastics and rubber products, real estate, telecommunications, transportation, utilities, and wood products.
Each monthly series begins in January 2004 and ends in March 2018. Industry-level inflation rates, sourced from the Bureau of Labor Statistics (BLS), are constructed as y_it = 1200 × log(PPI_it / PPI_i,t−1), in which PPI_it is the Producer Price Index for industry i in month t.
Monthly unemployment rates are sourced from the BLS. Monthly industrial production in real terms (seasonally adjusted) for each industry is constructed as X_it = 1200 × log(IP_it / IP_i,t−1), in which IP is the index of industrial production. This data is from Federal Reserve Economic Data (FRED). 23 We use relative importance weights (contribution to the total industrial production index) from FRED as industry weights to construct aggregate U.S. inflation forecasts from the industry-level forecasts.
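Both transforms are annualized monthly log differences. A minimal helper (generic; the names are ours, not BLS or FRED series codes):

```python
import numpy as np

def annualized_monthly_log_growth(index):
    """Convert a monthly index level series (e.g., a PPI or IP index)
    into annualized monthly log growth in percent:
    1200 * log(I_t / I_{t-1})."""
    index = np.asarray(index, dtype=float)
    return 1200.0 * np.diff(np.log(index))
```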
We construct an aggregate U.S. inflation series as y_US,t = 1200 × log(PPI_US,t / PPI_US,t−1), in which PPI_US,t is the PPI Commodity index for All Commodities (not seasonally adjusted) at month t, sourced from the BLS. 24 Table 2 presents summary statistics (mean, standard deviation, autocorrelation coefficient, minimum, and maximum) for the monthly inflation rate of each of the 20 industries over January 2004 to March 2018. Average monthly inflation rates range from −0.826 for computer and electrical products to 5.827 for mining, with three industries (computer and electrical products, financials, and telecommunications) experiencing deflation over the sample.
22 Koop (2013), and Carriero, Clark, and Marcellino (2015).
23 Due to data availability constraints, industrial production is not available for financials, accommodation, transportation, real estate, and hospitals.
24 We apply an x13 filter to seasonally adjust the industry-level and aggregate U.S. inflation series.

Constructing Inflation Forecasts
On each iteration on which inference is performed, our framework estimates G_k + 1 groups in each of the K + 1 regimes. The group structure and parameter estimates from the G_{K+1} + 1 groups in the final regime are used to produce forecasts. The Markov chain Monte Carlo sweep consists of many iterations; on each iteration a different number of breaks may be estimated, at different times, with a different number of groups and cross-sectional ordering of series in each regime. Uncertainty surrounding all of these elements of the model is thereby incorporated into the parameter estimates, and thus the inflation forecasts, through Bayesian model averaging. Aggregate U.S. inflation forecasts ŷ_US,t+1 are computed as a weighted average of the industry-level forecasts, using the importance weights for each industry at time t.
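Mechanically, the forecasting step reduces to averaging draw-specific predictive means over retained MCMC iterations and then weighting the industry forecasts. A schematic sketch (array layout and weight normalization are our assumptions, not the paper's code):

```python
import numpy as np

def bma_point_forecasts(draws):
    """draws: (n_iterations, n_industries) array of one-step-ahead
    predictive means, one row per retained MCMC iteration. Averaging
    over rows integrates over uncertainty in the number of breaks,
    groups, and group memberships (Bayesian model averaging)."""
    return np.asarray(draws, dtype=float).mean(axis=0)

def aggregate_forecast(industry_forecasts, weights):
    """Weighted average of industry-level forecasts using relative
    importance weights (normalized here to sum to one)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, industry_forecasts))
```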

Priors
In the empirical application we assign the same prior values to each of the competing models. Specifically, we assume that a new regime begins, on average, every 10 years by setting the hyperparameters c = 240 and d = 2. The prior variance of β is determined by σ²_β, which is set equal to 0.1. The hyperparameters a and b determine the error-term variance, and we set a = b = 2. For the models with grouped heterogeneity, the prior expected number of groups (which is constant across any identified regimes) is determined by the hyperparameters e and f. We set f = 1 and e = 7.
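A quick sanity check on this calibration (reading the stated prior means as simple hyperparameter ratios is our interpretation, not a formula given in the text):

```python
# Hypothetical sanity check on the prior calibration described above.
# We read the stated prior means as hyperparameter ratios.
c, d = 240, 2  # regime-duration hyperparameters
e, f = 7, 1    # group-count hyperparameters

expected_duration_months = c / d  # 120 months, i.e., a new regime every 10 years
expected_groups = e / f           # 7 groups expected a priori
```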

Evidence of Breaks
The model with regime-specific grouped heterogeneity identifies two breaks with 100% posterior probability, occurring in February 2008 and October 2009. The posterior model probability for the baseline model is 0.87, while just 0.08 posterior probability is assigned to the constant grouped heterogeneity model, indicating that allowing the group structure to vary through time is highly important when attempting to model the underlying data-generating process. 25 The pooled panel break model is assigned just 0.02 posterior probability, suggesting that allowing only for grouped heterogeneity is more important than allowing only for breaks when fitting the data.

Out-of-Sample Inflation Forecasts
We first estimate the model using only the data from January 2004 through December 2008, corresponding to an initial estimation period of five years. We recursively move through the sample one month at a time, generating forecasts at each point using only the data available at the time the forecast is made.
25 We assume a uniform prior across these models.
The final forecast for March 2018 is generated from parameters estimated using the data from January 2004 through February 2018.
Regime-specific grouped heterogeneity allows for a data-driven approach to bias-variance management that varies through time by estimating the number of groups and cross-sectional ordering of series in each regime. To avoid biases, some series may be allocated their own group, such as the petroleum industry, whose inflation series is notably different from all the other industries, while others may be grouped together to allow the pooled parameter within their group to be estimated more precisely. Allowing for time-variation in the group dynamics is critical since the optimal bias-variance tradeoffs implied by the grouping structure may be unstable; for example, one might expect fewer groups and more pooling during highly volatile episodes such as the global financial crisis.

Performance of Inflation Forecasts
The forecasting performance of our model relative to each of the benchmarks is measured through the out-of-sample R² value, R²_OoS = 1 − MSE_RSGrp/MSE_bmk, in which MSE_RSGrp and MSE_bmk denote the mean squared forecast errors obtained from the model with regime-specific grouped heterogeneity and the benchmark model. If our model outperforms the benchmark in terms of lower mean squared forecast errors, the R²_OoS values will be positive. All reported R²_OoS values are multiplied by 100. Each window in Figure 2 corresponds to a different benchmark and displays the R²_OoS relative to that benchmark across the 21 cases (20 industries and the U.S. aggregate). Our model produces positive R²_OoS values for all but four industries, with 15 cases having an R²_OoS value greater than 10%. It outperforms for 16 out of the 21 cases relative to the large Bayesian VAR and for 12 cases relative to the approach of Wang and Tsay (2019).
Table 3. Comparisons are made relative to the AR(1) model, the model with breaks and parameters pooled across the entire cross-section in every regime, the model with breaks and unit-specific parameters in every regime, the model with constant grouped heterogeneity, the univariate unobserved components model of Stock and Watson (2007), the model of Wang and Tsay (2019), and a large Bayesian VAR with proper shrinkage using the approach of Bańbura, Giannone, and Reichlin (2010). The predictive variable is either the unemployment rate (unmpl) or industrial production (indpr). Significance is evaluated using the Diebold-Mariano (DM) and Clark-West (CW) tests. For each procedure the table displays the number of industries for which our method produces significantly worse, insignificantly worse, insignificantly better, and significantly better forecasts at the 5% level. † indicates the bin in which the t-statistic for aggregate U.S. inflation forecasts lies.
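The out-of-sample R² can be computed directly from the two forecast-error series. A small sketch (scaled by 100, as in the reported results):

```python
import numpy as np

def r2_oos(errors_model, errors_bmk):
    """Out-of-sample R^2 in percent: 100 * (1 - MSE_model / MSE_bmk).
    Positive values mean the model beats the benchmark on MSE."""
    mse_model = np.mean(np.square(errors_model))
    mse_bmk = np.mean(np.square(errors_bmk))
    return 100.0 * (1.0 - mse_model / mse_bmk)
```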
To assess the statistical significance of any out- or underperformance delivered by our method, we first implement the procedure of Diebold and Mariano (1995) and second, to account for the nonstandard distribution of the test statistic that may arise because our forecasting models are nested, the MSE-adjusted test statistic of Clark and West (2007).
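Both statistics are simple functions of the forecast errors. The sketch below uses the plain sample variance of the loss differential rather than a HAC estimator (a simplifying assumption on our part; serially correlated or multi-step losses would call for a Newey-West variance):

```python
import numpy as np

def dm_stat(errors_bmk, errors_model):
    """Diebold-Mariano statistic on squared-error loss. Positive values
    indicate the model outperforms the benchmark."""
    d = np.square(errors_bmk) - np.square(errors_model)
    n = len(d)
    return np.mean(d) / np.sqrt(np.var(d, ddof=1) / n)

def cw_stat(errors_bmk, errors_model, yhat_bmk, yhat_model):
    """Clark-West MSE-adjusted statistic for nested models: the loss
    differential is corrected by the squared difference of the two
    forecasts, restoring a standard-normal limit under the null."""
    f = (np.square(errors_bmk) - np.square(errors_model)
         + np.square(np.asarray(yhat_bmk) - np.asarray(yhat_model)))
    n = len(f)
    return np.mean(f) / np.sqrt(np.var(f, ddof=1) / n)
```

A statistic above 1.64 then corresponds to significant outperformance at the 5% level in a one-sided test, matching the bins reported in Table 3.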
The results are presented in Table 3. The table displays for how many of the 21 cases our method significantly underperforms (t < −1.64), insignificantly underperforms (−1.64 < t < 0), insignificantly outperforms (0 < t < 1.64), or significantly outperforms (t > 1.64) when using the Diebold-Mariano (DM) or the Clark-West (CW) test. The seven panels correspond to the seven benchmark models. The first row of each panel displays results when using the unemployment rate (unmpl) as the predictive variable alongside an intercept and a single lag of inflation. The bin in which U.S. aggregate inflation falls is indicated by the † symbol.
Focusing on the DM statistic, relative to the univariate AR(1) model our method produces significantly more accurate point forecasts for 20 out of 21 cases including the U.S. aggregate. The model still outperforms for the remaining industry but the outperformance is insignificant.
The results are almost as strong relative to the panel break models with pooled and unit-specific parameters, significantly outperforming for 20 and 18 cases, respectively, including the U.S. aggregate. Our model significantly outperforms the constant grouped heterogeneity model for 17 cases. Across all 147 cases (21 cases against each of the seven benchmarks) our model significantly underperforms just 13 times.
Using the Clark-West statistic, the results are slightly stronger. Our method significantly outperforms for between 11 and 21 cases, significantly underperforms for just 12 out of the 147 cases, and always significantly outperforms for the U.S. aggregate. These formal tests of statistically significant out- or underperformance are measured over the entire out-of-sample period and thus average across all observations. A strong forecasting model, however, should consistently outperform the benchmark models over the entire out-of-sample period, not just in one or two short windows. The cumulative sum of squared forecast error differences relative to each of the seven benchmark models for the U.S. aggregate is upward-sloping throughout the out-of-sample period (results not shown to save space but available upon request), indicating that our model produces consistently more accurate forecasts than each of the seven benchmarks. In addition, for the majority of industries, our method consistently outperforms the seven benchmarks throughout the sample.
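The consistency diagnostic described here, the cumulative sum of squared forecast error differences, is a one-liner given the two error series; an upward-sloping path means the model beats the benchmark period by period rather than on average only. A minimal sketch:

```python
import numpy as np

def cssed(errors_bmk, errors_model):
    """Cumulative sum of squared forecast error differences. The series
    trends upward whenever the model is outperforming the benchmark."""
    return np.cumsum(np.square(errors_bmk) - np.square(errors_model))
```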

Regime-Specific Grouped Heterogeneity
Comparing the forecasting performance of the competing models in the previous section provides evidence in favour of allowing for regime-specific grouped heterogeneity when generating out-of-sample inflation forecasts. Allowing for either grouped heterogeneity or structural breaks alone is unable to rival the forecast accuracy generated from allowing for regime-specific grouped heterogeneity. It is therefore critical for economic agents to allow for regime-specific grouped heterogeneity when forecasting inflation in real time.
The top left window of Figure 3 graphs the posterior mode number of groups estimated from the model that allows for regime-specific grouped heterogeneity (solid black line) and the model that allows for constant grouped heterogeneity (red dashed line). Relative to the model with constant grouped heterogeneity, our model identifies notably fewer groups in the second regime that corresponds to the global financial crisis. Five groups are identified compared with 10 and 9 in the preceding and subsequent regimes. The top right and two lower windows display histograms of the posterior mode number of series in each group across the three regimes identified by our model. Identifying fewer groups results in more series sharing groups and thus parameters being pooled across more series in the crisis regime (bottom left) compared with the other two regimes (top right and bottom right).
Only the petroleum industry, which has a markedly different inflation series from all the other industries, is assigned its own group during the crisis regime compared with five series being assigned their own groups in the other regimes. This illustrates the ability of our framework to vary the grouped heterogeneity through time enabling fewer groups and more pooling during highly volatile and short regimes that are more susceptible to parameter estimation error. Conversely, it can identify more groups with less pooling, even allocating unitspecific parameters to a number of series, reducing the amount of bias induced in the parameter estimates during longer and less volatile regimes, which is particularly useful for series that are unlike any other.
Models that ignore time-varying grouped heterogeneity lack this flexibility. The model that allows for constant grouped heterogeneity, for example, identifies 11 groups throughout the sample (dashed red line in top left window), with six series being assigned their own groups (not shown). This inability to allow the number of groups and degree of pooling to vary through time results in suboptimal management of bias-variance tradeoffs and ultimately less accurate forecasts, as detailed in the previous section. By the same token, the panel break models with pooled and unit-specific parameters are unable to identify grouped heterogeneity, resulting in biased or imprecise parameter estimates and less accurate forecasts.
Finally, using the full sample our approach identifies two breaks, in February 2008 and October 2009, while the approach of Wang and Tsay (2019) detects only one break, in September 2009. Our algorithm detects 10, 5, and 9 groups across the three regimes, while theirs detects 9 and 8 across their two regimes, with similar patterns of group membership. Both algorithms appear to have converged: the Gelman-Rubin statistic is 1.08 for our model and 1.09 for theirs. The key difference therefore is that our algorithm detects the short regime from February 2008 to October 2009, during which the number of groups falls sharply and the degree of pooling correspondingly increases. This helps to robustify the forecasts from our model during the highly uncertain and volatile period of the Global Financial Crisis (by reducing the variance, from a bias-variance tradeoff standpoint), which their algorithm cannot exploit because it does not detect the first breakpoint. Indeed, cumulative sum of squared forecast error plots (not shown) confirm that the forecasts generated from our model strongly outperform those generated from their model around this period.
26 Our model's superior predictive accuracy could derive partially from faster detection of changing group membership in real time. If a model quickly detects changing group membership, it may realize most of the potential
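The convergence comparison above relies on the Gelman-Rubin potential scale reduction factor. A basic implementation of the classic statistic for a scalar parameter (the paper does not state which variant it computes, so this is a generic sketch):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for a scalar parameter.
    chains: (m, n) array of m chains with n draws each. Values near 1
    indicate the chains have mixed."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Values near 1, such as the 1.08 and 1.09 reported above, indicate that between-chain and within-chain variability agree.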

Conclusion
This article develops a new approach to estimate panel regression models in the presence of structural breaks and regime-specific grouped heterogeneity. The number and timing of breaks are estimated, as are the number of groups and the cross-sectional ordering of series within each regime. Parameters are pooled within, but differ across, regime-specific groups. The fixed effects, slope coefficients, and volatilities follow the same unobserved grouping structure that is estimated using Markov chain Monte Carlo methods. Moreover, this grouping structure and the model parameters are allowed to change an unknown number of times at unknown locations identified by structural breaks.
In an empirical application forecasting U.S. industry-level inflation rates using lagged industry-level unemployment rates and monthly data from 2004 to 2018, our method detects two breaks, at February 2008 and October 2009, enclosing a short and volatile regime corresponding to the global financial crisis. Grouped heterogeneity varies considerably across the three regimes. Models that allow for grouped heterogeneity but ignore breaks, or that allow for breaks but pool parameters across the entire cross-section, induce biased parameter estimates, whereas panel break models with unit-specific parameters are estimated imprecisely. The improvement in out-of-sample forecasting performance relative to seven popular benchmarks for 20 U.S. industries and the U.S. aggregate suggests that our methodology successfully generates accurate inflation forecasts in real time.

Supplementary Materials
The supplementary materials provide additional information. Specifically, Section 1 of the supplementary appendix provides more details of the estimation procedure and Section 2 gives details of various robustness checks conducted in the empirical analysis. Finally, the supplementary appendix formally sets out the relation between the Multinomial and Poisson distributions.

Disclaimer
The views expressed in this article are those of the author and do not necessarily reflect the views and policies of the Board of Governors or the Federal Reserve System.