Identification and Bayesian Estimation of Dynamic Factor Models

We consider a set of minimal identification conditions for dynamic factor models. These conditions have economic interpretations and require fewer restrictions than the static factor framework. Under these restrictions, a standard structural vector autoregression (SVAR) with measurement errors can be embedded into a dynamic factor model. More generally, we also consider overidentification restrictions to achieve efficiency. We discuss general linear restrictions, either in the form of known factor loadings or cross-equation restrictions. We further consider serially correlated idiosyncratic errors with heterogeneous dynamics. A numerically stable Bayesian algorithm for the dynamic factor model with general parameter restrictions is constructed for estimation and inference. We show that a square-root form of the Kalman filter improves robustness and accuracy when sampling the latent factors. Confidence intervals (bands) for the parameters of interest such as impulse responses are readily computed. Similar identification conditions are also exploited for multilevel factor models, and they allow us to study the “spill-over” effects of the shocks arising from one group to another. Supplementary materials for technical details are available online.


INTRODUCTION
Dynamic factor models of high dimension are increasingly used in data-rich environments. This is particularly the case in economics and finance where common shocks, such as an oil price shock, drive the comovements of economic variables. As a result, economists are increasingly relying on these models for policy analysis. Examples include Bernanke, Boivin, and Eliasz (2005), Stock and Watson (2005), and Kose, Otrok, and Whiteman (2003), among others. This article considers identification issues for a general class of dynamic factor models and their Bayesian estimation.
Three sources of dynamics are considered. The first is that the latent factor is itself dynamic. An example would be x_it = λ_i f_t + e_it, where f_t is a vector autoregressive process. While f_t is dynamic, the relationship between x_it and f_t is static. The second source of dynamics is that f_t affects x_it dynamically, for instance, x_it = γ_i f_t + τ_i f_{t−1} + e_it, so that the lags of f_t directly impact x_it. We shall refer to the first example as a static factor model in spite of f_t being dynamic. The second example can be put into a static factor framework: let F_t = (f_t, f_{t−1}) and λ_i = (γ_i, τ_i); then x_it = λ_i F_t + e_it. The third source of dynamics is that the idiosyncratic errors, e_it, are serially correlated. There exists a large literature for analyzing static factor models. There are drawbacks, however, to treating a dynamic factor model as a static one. For example, the number of static factors is doubled here. To fix the rotational indeterminacy, the number of normalization restrictions would be (2q)² = 4q², quadrupling the number of restrictions that are actually needed for identification.
The model considered in this article has more structure than the generalized dynamic factor model of Forni et al. (2000). We incorporate restrictions on the factor loadings and study a set of simple identification schemes for dynamic factor models. These identification restrictions have economic interpretations. For instance, they allow a structural vector autoregression model (SVAR) to be analyzed through the factor analysis framework, while permitting measurement errors. We then consider the Bayesian estimation of dynamic factor models under these identification schemes. The Bayesian method proposed here has a number of desirable features. First, we allow cross-equation restrictions as well as sign restrictions on the factor loadings, which is not available in the principal component analysis of factor models as in Stock and Watson (1998) and Bai and Ng (2002). Second, we employ the Jeffreys' priors to account for the lack of a priori information about model parameters. Third, we combine the structural restrictions with a square-root form of the Kalman filter to improve the numerical robustness and accuracy when sampling the latent factors. Fourth, to make random draws from the posterior of the factor loadings and the latent factors, we provide a robust algorithm that guarantees the variance of the conditional normal distribution to be numerically positive definite when it should be.
The Bayesian analysis framework proposed in this article is able to perform estimation and hypothesis testing of large (N, T ) dynamic factor models with rich dynamic structures in both the measurement equations and the law of motion for the dynamic factors. We show how all three sources of dynamics can be naturally handled within the Bayesian framework.
Our method is closely related to the factor-augmented vector autoregression (FAVAR) approach of Bernanke, Boivin, and Eliasz (2005), and the dynamic factor model of Stock and Watson (2005). The model of Bernanke, Boivin, and Eliasz (2005) features a static factor model for the measurement equation plus a VAR specification for the latent factors and a few observed aggregate macro variables. Stock and Watson (2005) further considered a setup featuring lags of dynamic factors and serially correlated error terms in the measurement equation, which in turn implies a high-dimensional VAR with a large number of restrictions on the autoregressive coefficients. Both articles consider a two-step approach that applies either a direct principal components method or an iterated principal components method to estimate the static factor space, and then uses the estimates to obtain information about dynamics of either static factors or dynamic factors. Bernanke, Boivin, and Eliasz (2005) also considered a Bayesian approach using the Gibbs sampling method for state-space models. They imposed conjugate priors for model parameters, and derive the posterior under the identifying assumption for conventional static factor models.
Our method differs from that of Bernanke, Boivin, and Eliasz (2005) or Stock and Watson (2005) in several important aspects. Most importantly, we consider a minimal set of identifying restrictions to directly identify the dynamic factors instead of the static factors or the associated factor space. Second, different from most of the existing literature, our identifying restrictions allow the dynamic factors to be cross-correlated, and thus allow the study of the impulse response function of one factor to the innovation of another factor. Similar identification conditions were considered by Bai and Wang (2014); here, however, we provide a direct and simpler proof. In addition, we consider identification for multilevel factor models as well as SVAR models with measurement errors. Third, we develop a Bayesian approach to analyze the dynamic factor model with general cross-equation restrictions along with sign restrictions on factor loadings; the Jeffreys' priors are considered for the model parameters. Finally, we also consider the Bayesian estimation of serial correlations in the idiosyncratic errors, which is important for economic time series. The outcome of the Bayesian analysis is the posterior distribution of model parameters along with the latent dynamic factors. The impulse responses and the associated confidence bands are natural by-products.
To conduct model comparison so as to determine the number of dynamic factors, we use the partial Bayes' factor, in which the first few observations are used to form a proper prior distribution for computing the marginal likelihood of the latter sample. This is in line with O'Hagan's (1995) fractional Bayes' factors or Berger and Pericchi's (1996) intrinsic Bayes' factors. The Bayesian method proposed in this article can also incorporate proper prior distributions so as to derive the conventional Bayes' factors or the marginal likelihood of the data for the purpose of model comparison.
Another related approach is to conduct Bayesian analysis of dynamic factor models with time-varying parameters, such as stochastic volatilities (Aguilar and West 2000), or both time-varying factor loadings and stochastic volatilities (Del Negro and Otrok 2008). The main focus of both articles is the time-varying feature of the dynamic factor model. The identifying restrictions assume that factors are mutually independent. The measurement equations are static. Our identification scheme targets the dynamic factors themselves instead of the static factors, and the dynamic factors are identified without the independence assumption. The time-varying parameters version of the dynamic factor model is an important model capable of characterizing the evolution of business cycles. For example, using a time-varying parameters VAR, Cogley and Sargent (2005) have presented evidence of evolving inflation dynamics in the U.S. To limit our focus, however, we do not explore that direction in this article.
In the literature, the maximum likelihood estimation of dynamic factor models has been considered by many authors, such as Geweke (1977), Watson and Engle (1983), Quah and Sargent (1993), Kim and Nelson (1999), Jungbacker and Koopman (2008), and Doz, Giannone, and Reichlin (2012). The maximization can be implemented through the expectation-maximization (EM) method as in Doz, Giannone, and Reichlin (2012). The EM method iterates between the expectation step and the maximization step until convergence so as to achieve the local maximum of the likelihood function. The generalized least-squares estimator for nondynamic factor models was studied by Breitung and Tenhofen (2011) and Choi (2012).
The Bayesian Markov chain Monte Carlo (MCMC) approach provides a more complete picture regarding the sampling distribution of the objective of interest, the outcome of which is the posterior joint distribution of the model parameters and the latent factors. Our focus is on the analysis of dynamic factor models with structural restrictions, in which factors are potentially correlated with each other. The existing literature focuses on estimating a rotation of the underlying factors. We consider the identification of factors without rotations.

DYNAMIC FACTOR MODELS: IDENTIFICATION CONDITIONS
We consider a dynamic factor model featuring a dynamic factor representation of the observables and a law of motion for the factors given by a VAR(h) process. Using the same notation as in Bai and Ng (2007), the model is given by (1) and (2):

X_t = Λ_0 f_t + Λ_1 f_{t−1} + · · · + Λ_s f_{t−s} + e_t, (1)

f_t = Φ_1 f_{t−1} + · · · + Φ_h f_{t−h} + ε_t, (2)

where the q × 1 vector f_t is the latent dynamic factor, the N × 1 vector X_t is the observable at time t, and the N × q matrix Λ_j is the dynamic factor loading for f_{t−j}, j = 0, 1, . . . , s. To fix ideas, we first assume that e_t is iid N(0, R), and will relax this assumption later when we consider serially correlated error terms. The dynamic factor follows the VAR(h) process (2), where ε_t is iid N(0, Q) and Φ_j is a q × q matrix of autoregressive coefficients, 1 ≤ j ≤ h. We assume that {e_t}_{t=1}^T is independent of {ε_t}_{t=1}^T. The above dynamic factor model has two different aspects of dynamics. First, there are s lagged factors entering Equation (1), representing a dynamic relationship between f_t and the observable X_t. Second, the dynamics of the factors are explicitly parameterized by the VAR(h) process (2). This article emphasizes the dynamics in Equation (1), because it is these dynamics that make a major distinction between dynamic and static factor analysis. When s = 0 (no lags entering the measurement equations), the model will be regarded as a static factor model, as is customary in the existing literature, even though f_t is dynamic. The number of dynamic factors is the dimension of f_t, that is, q, irrespective of s and h.
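As a concrete illustration, the data-generating process in (1) and (2) can be simulated in a few lines. The sketch below uses q = 2, s = 1, h = 1 and hypothetical parameter values chosen for illustration only; it is not a calibration from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, q, s = 10, 200, 2, 1

# Hypothetical parameter values (illustration only)
Lam0 = rng.uniform(0, 1, (N, q))   # Lambda_0: contemporaneous loadings
Lam1 = rng.uniform(0, 1, (N, q))   # Lambda_1: loadings on f_{t-1}
Phi1 = np.array([[0.5, 0.0],
                 [-0.1, 0.2]])     # VAR(1) coefficient matrix for f_t

# Law of motion (2): f_t = Phi_1 f_{t-1} + eps_t, with eps_t ~ N(0, I_q)
f = np.zeros((T + 1, q))
for t in range(1, T + 1):
    f[t] = Phi1 @ f[t - 1] + rng.standard_normal(q)

# Measurement equation (1): X_t = Lam_0 f_t + Lam_1 f_{t-1} + e_t
X = np.zeros((T, N))
for t in range(T):
    X[t] = Lam0 @ f[t + 1] + Lam1 @ f[t] + rng.standard_normal(N)
```

The simulated panel X then has both sources of dynamics: lagged factors in the measurement equation and a VAR law of motion for the factors.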
Model (1) can be put into a static factor form. Let F_t = (f_t', f_{t−1}', . . . , f_{t−s}')' and Λ = [Λ_0, Λ_1, . . . , Λ_s], so that

X_t = Λ F_t + e_t. (3)

There are q(s + 1) static factors, which is the dimension of F_t. Because of rotational indeterminacy, factor models, either in dynamic form or in static form, are not identifiable without additional restrictions. Classical factor analysis (e.g., Anderson and Rubin 1956, or Lawley and Maxwell 1971) focuses on static factor models. Classical factor analysis assumes a fixed N, and the model is often estimated by the maximum likelihood method. Under large N, the principal components (PC) method is often used to estimate (3). The PC method would need to impose q²(s + 1)² restrictions to uniquely fix the rotational indeterminacy. These restrictions were discussed by Bai and Ng (2013). Let F = (F_1, . . . , F_T)'. With r = q(s + 1), the restrictions in static factor analysis take one of the following forms: (i) F'F/T = I_r, and Λ'Λ is diagonal; or (ii) F'F/T = I_r, and the first r × r block of Λ is lower triangular; or (iii) F_t is unrestricted, and the first r × r block of Λ is an identity matrix.
In each form, there are q²(s + 1)² restrictions. These restrictions do not necessarily structurally identify F_t (or f_t), but a rotation of F_t. The reason is that these restrictions are normalization restrictions, and they may not have economic interpretations in general. For some special cases, for example, when s = 0, (iii) allows the identification of f_t and Λ. In general, we only estimate rotations of F_t and Λ under the static factor framework. The identification method to be introduced requires only q² restrictions, far fewer than the static counterpart. We impose restrictions on the innovations ε_t, instead of f_t. Because f_t is a VAR process, the components of f_t are mutually correlated, even when the components of the innovations are uncorrelated. So it is appropriate not to assume f_t to have uncorrelated components.
It is well known that the dynamic factor model defined in (1) and (2) is not identified without further restrictions. One can always premultiply the dynamic factor f_t by an arbitrary full rank q × q matrix to define a new model. The newly defined model will be observationally equivalent to the original dynamic factor model. We consider two types of identifying restrictions and discuss their implications for structural factor analysis as well as structural VAR analysis. The key identification result is that we only need q² restrictions. Partition the contemporaneous loading matrix as Λ_0 = [Λ_01', Λ_02']', where Λ_01 is q × q. Two different sets of identification restrictions, referred to as DFM1 and DFM2, are given below. DFM1: var(ε_t) = I_q, and the q × q matrix Λ_01 is a lower-triangular matrix with strictly positive diagonal elements. This is sufficient to identify the dynamic factor model. There are a total of q² restrictions in DFM1, irrespective of the number of lags s and h in the model. The requirement var(ε_t) = I_q imposes q(q + 1)/2 restrictions, and the requirement that Λ_01 is lower triangular imposes q(q − 1)/2 restrictions. The latter restriction says that the first variable X_1t is affected contemporaneously only by the first dynamic factor, the second variable X_2t is affected contemporaneously only by the first two dynamic factors, and so on. In practice, the choice of the first q variables is important.
It makes more sense to assume the innovations ε_t to be mutually uncorrelated than to assume f_t to be mutually uncorrelated. This leaves the autoregressive coefficient matrices Φ_j, j = 1, . . . , h, in (2) unrestricted, allowing the dynamic factors to interact with each other. Under DFM1, we shall say Λ_0 is lower triangular for ease of exposition.
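To see numerically how the q² restrictions in DFM1 remove the rotational indeterminacy, the sketch below takes an arbitrary (rotated) q × q upper loading block and recovers the unique orthogonal rotation that makes it lower triangular with a strictly positive diagonal, via an LQ-type factorization. This is only a numerical illustration of the normalization, not the article's proof.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3
# An arbitrary (rotated) upper q x q loading block Lambda_01
A = rng.standard_normal((q, q))

# QR of A' gives A = L Q' with L lower triangular and Q orthogonal
Qmat, Rq = np.linalg.qr(A.T)
L = Rq.T
# Flip signs so the diagonal of L is strictly positive (the DFM1 sign
# convention); Q absorbs the same flips, so L Q' = A still holds
S = np.diag(np.sign(np.diag(L)))
L, Qmat = L @ S, Qmat @ S
```

Rotating the factors (and innovations) by Q' leaves var(ε_t) = I_q intact while delivering the lower-triangular, positive-diagonal loading block, and this rotation is unique.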
Proposition 1. Consider the dynamic factor model as in (1) and (2). Under DFM1, the dynamic factors f_t and the dynamic factor loadings Λ_j, j = 0, 1, . . . , s, are uniquely identified.

The proof is in the Appendices. Similar to Proposition 1, if ε_t has an identity covariance matrix and Λ_j is lower triangular for some given j ∈ {0, 1, . . . , s}, then the dynamic factor model in (1) and (2) is identified up to a sign change. We summarize the result in Corollary 1.

Corollary 1.
Consider the dynamic factor model as in (1) and (2). Assume that var(ε_t) = I_q, Λ_j is a lower-triangular matrix for some given j ∈ {0, 1, . . . , s}, and the diagonal elements of the first q × q block of Λ_j are all strictly positive. Then the dynamic factors f_t and the dynamic factor loadings Λ_j, j ∈ {0, 1, . . . , s}, are uniquely identified.
Notice that in DFM1, other normalizations on the factor loadings also work, as long as they imply q(q − 1)/2 nonredundant restrictions. For example, the normalization that Λ_j'Λ_j is diagonal for some given j ∈ {0, 1, . . . , s} also works, which is similar to the normalization in principal component analysis of factor models (see Anderson 1984; Connor and Korajczyk 1986; Bai and Ng 2002; Stock and Watson 2002; Bai 2003). A variation on DFM1 is that var(ε_t) is diagonal but Λ_01 has 1's on the diagonal. We next consider a different set of identification conditions. DFM2: The upper q × q block of Λ_0 is an identity matrix, that is, Λ_01 = I_q. This is also sufficient to identify the factor model. Again, there are q² restrictions, and no restrictions are imposed on var(ε_t); in particular, DFM2 does not restrict the variance matrix of the dynamic factors f_t or the static factors F_t to be an identity matrix.

Proposition 2.
Consider the dynamic factor model as in (1) and (2). Under the normalization that the upper q × q block of Λ_0 is an identity matrix (DFM2), the dynamic factors f_t and the dynamic factor loadings Λ_j, j ∈ {0, 1, . . . , s}, are uniquely identified.

Similar to Proposition 2, if the upper q × q block of Λ_j is an identity matrix for some j ∈ {0, 1, . . . , s}, then the dynamic factor model in (1) and (2) is identified. We summarize the result in Corollary 2.
Corollary 2. Consider the dynamic factor model as in (1) and (2). Under the normalization that the upper q × q block of Λ_j is an identity matrix for some given j ∈ {0, 1, . . . , s}, the dynamic factors f_t and the dynamic factor loadings Λ_j, j ∈ {0, 1, . . . , s}, are uniquely identified.

Using a state-space representation of the factor model, Doz, Giannone, and Reichlin (2012) explicitly considered the dynamics in factors. Although they considered a vector autoregression in the factors, the factors considered therein are still "static factors." Only the current factors f_t enter the measurement equation at time t. Their specification also differs from Amengual and Watson (2007) in that their shocks to factors have a full rank covariance matrix. Stock and Watson (2005) considered a more general version of dynamic factor models. They allowed the error terms in the observation equation to be serially correlated. Like Doz, Giannone, and Reichlin (2012), their identification is also based on restricting the static factors, so that the principal component analysis is embedded into an iterative least-squares estimation. Our identification scheme is directly on the dynamic factors f_t, not on the static factors F_t = (f_t', f_{t−1}', . . . , f_{t−s}')'.

Identification for Multilevel Factor Models
The dynamic factor model with a multilevel factor structure has been increasingly applied to study the comovement of economic quantities at different levels (see Gregory and Head 1999; Kose, Otrok, and Whiteman 2003; Crucini, Kose, and Otrok 2011; Moench, Ng, and Potter 2013). When researchers confront the model with data, they usually find ways to impose economic meaning on the factors. For example, Kose, Otrok, and Whiteman (2003) and a number of subsequent articles considered a dynamic factor model with a multilevel factor structure to characterize the comovement of international business cycles on the global level, regional level, and country level, respectively. For ease of presentation, we will focus on the case with only two levels of factors: a world factor and a country-specific factor. Extensions to models with more than two levels are straightforward.
Consider C countries, each having an n_c × 1 vector of country variables X^c_t, t = 1, . . . , T, c = 1, . . . , C. One may model X^c_t as being affected by a world factor f^W_t and a country factor f^c_t, all factors being latent:

X^c_t = Λ^c_W f^W_t + Λ^c_C f^c_t + e^c_t, (4)

where Λ^c_W, Λ^c_C are the matrices of factor loadings of country c, and e^c_t is the vector of idiosyncratic error terms (possibly serially correlated) for country c's variables. (In a specification with a regional level, a regional factor f^{r_c}_t would also enter, where r_c denotes the region that country c belongs to.) The data-generating process for the factors is given by a vector autoregression (VAR) with a diagonal autoregressive coefficient matrix and independent error terms. Let F^C_t be a vector collecting all the country factors. Kose, Otrok, and Whiteman (2003) assumed both f^W_t and f^c_t are scalars and considered a VAR(1) specification of the factors given by

[f^W_t; F^C_t] = diag{ρ_W, ρ_C} [f^W_{t−1}; F^C_{t−1}] + [u^W_t; U^C_t], (5)

where {ρ_W, ρ_C} are diagonal matrices conformable to the dimension of the corresponding factors. The innovation to the factors, vec[u^W_t, U^C_t], is independent of e^c_t at all leads and lags and is iid normal, with

var(vec[u^W_t, U^C_t]) = diag{σ²_W, σ²_C}, (6)

where {σ²_W, σ²_C} are themselves diagonal matrices. Given some sign restriction, this special VAR specification allows one to separately identify the factors at different levels up to a scale normalization. To achieve identification up to a sign normalization, it is sufficient to assume that the variance of vec[u^W_t, U^C_t] is an identity matrix.
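The zero-block structure of the two-level loading matrix implied by (4) can be made explicit in code. The helper below, with hypothetical names and dimensions, assembles the stacked loading matrix in which every country loads on the world factor but only on its own country factor.

```python
import numpy as np

def multilevel_loading(Lam_W, Lam_C):
    """Assemble the stacked loading matrix of a two-level factor model:
    every country loads on the world factor, and only on its own country
    factor, producing the characteristic zero blocks.

    Lam_W: list of n_c x q_W world-factor loading matrices, one per country.
    Lam_C: list of n_c x q_C country-factor loading matrices."""
    C = len(Lam_W)
    world = np.vstack(Lam_W)                      # world-factor column block
    rows = sum(L.shape[0] for L in Lam_W)
    qC = Lam_C[0].shape[1]
    country = np.zeros((rows, C * qC))            # block-diagonal country part
    r = 0
    for c, L in enumerate(Lam_C):
        country[r:r + L.shape[0], c * qC:(c + 1) * qC] = L
        r += L.shape[0]
    return np.hstack([world, country])
```

The zero blocks alone do not pin down the rotation; the restrictions of Proposition 3 are still needed on top of this structure.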
Observe that it is sufficient but not necessary to assume independence among {f^W_t, f^c_t, c = 1, . . . , C} to achieve identification of the factors. Equation (5) rules out the possibility that different factors are related. In practice, factors might be correlated through "spill-over" effects, that is, the impulse responses of one factor to the innovation of another factor are not zero. Take the world factor, for example. It might be affected by an exogenous oil shock as well as "spill-over" effects from individual countries or regions. A shock that originates from a large economy like the United States might have an impact on other economies over time. This implies that the innovation to an individual country might have a "spill-over" effect on the rest of the world, and thus possibly an effect on the dynamics of the world factor.
According to the identification scheme given in Proposition 1 (DFM1), our new identification scheme allows the autoregressive coefficient matrix in Equation (5) to be unrestricted. Under our assumption and using the same notation as (5), we have

[f^W_t; F^C_t] = Φ [f^W_{t−1}; F^C_{t−1}] + [u^W_t; U^C_t], (7)

where the innovation to the factors, vec[u^W_t, U^C_t], is as in (6); unlike (5), we do not require the autoregressive coefficient matrix Φ in (7) to be diagonal. When confronting the model with data, we will be able to test Equation (7) against Equation (5) using a Wald test in classical estimation, or using the Bayes' factors in Bayesian estimation. We summarize the identification conditions in the following proposition.
Proposition 3. Define a dynamic factor model as in (4), (7), and (6). Assume that σ²_W and σ²_C are identity matrices, Λ^1_W and {Λ^c_C}^C_{c=1} are all lower-triangular matrices with strictly positive diagonal terms, and [Λ^c_W, Λ^c_C] is of full column rank (c = 1, . . . , C); then the factor model is uniquely identified.
The proof is in the Appendices. Proposition 3 can be easily extended to the case with three or more levels. A key feature of multilevel factor models is that there are many zero blocks in the loading matrix. The zero blocks are not sufficient to prevent rotational indeterminacy; additional restrictions are needed. For efficient estimation, we will take into account the zero restrictions. Finally, note that Equation (4) is static. The presence of lags of f^W_t and f^c_t will not affect the identification conditions, and the proposition still holds, similar to the first two propositions. A similar set of identification conditions, which does not require sign restrictions on the factor loadings, is to require both σ²_W and σ²_C to be diagonal, and Λ^1_W and {Λ^c_C}^C_{c=1} to be lower-triangular matrices with 1's on the diagonal. Under a static factor model setup, Wang (2010) estimated model (4) using an iterated principal components method. The identification conditions therein assume that the global factors are orthogonal to all country factors while country factors can be correlated with each other. Different from Wang (2010), we directly restrict the innovations of the factors and allow lags of the factors to enter Equation (4). Such restrictions naturally allow all factors to be correlated with each other.

Structural VAR With Measurement Errors
The dynamic factor model is useful for analyzing structural VAR (SVAR) models, especially when the variables are measured with errors. Consider a standard SVAR given by

A_0 Z_t = A_1 Z_{t−1} + · · · + A_h Z_{t−h} + a_t,

where Z_t is a q × 1 vector of economic variables, and a_t is the vector of structural shocks with E(a_t a_t') = I_q. Let Φ_j = A_0^{−1} A_j and ε_t = A_0^{−1} a_t. The reduced-form VAR is given by

Z_t = Φ_1 Z_{t−1} + · · · + Φ_h Z_{t−h} + ε_t.

SVAR analysis aims to identify A_0 under structural restrictions. Suppose that Z_t is not directly observable, but observed with a measurement error η_t such that

Y_t = Z_t + η_t.

In this case, it is difficult to estimate the SVAR model based on Y_t. Now suppose that a large number of other observable variables (W_t) are determined by

W_t = Γ_0 Z_t + Γ_1 Z_{t−1} + · · · + Γ_s Z_{t−s} + v_t.

Essentially, W_t is also driven by the fundamental structural shocks a_t through Z_t. Then it is possible to recover Z_t so that SVAR analysis can be performed. Let

X_t = (Y_t', W_t')', f_t = Z_t, e_t = (η_t', v_t')'.

Then we have a structural dynamic factor model

X_t = Λ_0 f_t + Λ_1 f_{t−1} + · · · + Λ_s f_{t−s} + e_t, f_t = Φ_1 f_{t−1} + · · · + Φ_h f_{t−h} + ε_t, (8)

in which the upper q × q block of Λ_0 is an identity matrix. According to the identification scheme DFM2, Equation (8) is identified and can be analyzed using a Bayesian approach.
In particular, without further assumptions, we are able to estimate the reduced-form dynamics of Z_t despite the measurement errors. We may also incorporate additional structural restrictions (such as long-run restrictions) as in standard SVAR analysis to identify A_0. The impulse responses to structural shocks can be obtained as ∂Y_{t+k}/∂a_t = ∂Z_{t+k}/∂a_t = ∂f_{t+k}/∂a_t, where the partial derivative is taken for each component of a_t when a_t is a vector.
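The impulse responses above follow the standard MA recursion: with reduced-form VAR coefficients Φ_j and the structural impact matrix A_0^{−1}, the moving-average coefficients satisfy Ψ_k = Σ_j Φ_j Ψ_{k−j}, Ψ_0 = I. A minimal sketch (function name and arguments are hypothetical):

```python
import numpy as np

def impulse_responses(Phi, A0_inv, horizons):
    """Structural impulse responses of a VAR(h).

    Phi is a list of q x q reduced-form AR coefficient matrices
    [Phi_1, ..., Phi_h]; A0_inv maps structural shocks a_t into
    reduced-form innovations.  Returns d f_{t+k} / d a_t' for
    k = 0, ..., horizons via the recursion Psi_k = sum_j Phi_j Psi_{k-j}."""
    q = A0_inv.shape[0]
    Psi = [np.eye(q)]
    for k in range(1, horizons + 1):
        Pk = np.zeros((q, q))
        for j, Pj in enumerate(Phi, start=1):
            if k - j >= 0:
                Pk += Pj @ Psi[k - j]
        Psi.append(Pk)
    return [Pk @ A0_inv for Pk in Psi]
```

In the Bayesian analysis, applying this recursion to each posterior draw of {Φ_j, A_0^{−1}} yields the posterior distribution of the impulse responses, from which confidence bands are read off.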

BAYESIAN ESTIMATION USING GIBBS SAMPLING
Our estimation of the model is based on a likelihood approach, using the Bayesian estimation of a restricted linear state-space system. The likelihood estimation is feasible and computationally efficient because q ≪ N. An alternative likelihood approach is to use the expectation-maximization algorithm, which is not considered in this article. We choose the Bayesian approach mainly because it allows coherent hypothesis testing, and the Bayes' factors allow a simple model comparison procedure to determine the number of dynamic factors in models (1) and (2). It is helpful to form a state-space representation of the dynamic factor model, and then Carter and Kohn's (1994) multimove Gibbs-sampling algorithm can be modified for estimation. Our specific estimation procedure has several desirable features. For example, it explicitly takes into account parameter restrictions and can accommodate cases with or without serial correlation in the error terms. In addition, we focus on directly estimating the dynamic factors instead of the static factors. Last but not least, to avoid numerical singularities due to round-off errors, we use the square-root form of the Kalman filter (Bierman 1977) and apply the Sherman-Morrison-Woodbury identity to guarantee that the covariance matrices are always numerically positive definite. Here, we focus on how to formulate the parameter restrictions so that a simple Bayesian algorithm can be designed. We then discuss the case with serially correlated measurement errors.
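The square-root form of the Kalman filter propagates a Cholesky-type factor S of the covariance P = SS', so P stays numerically positive semi-definite by construction. The measurement update below is a generic textbook array algorithm in the spirit of Bierman (1977), not the article's exact implementation; all names are the sketch's own.

```python
import numpy as np

def sqrt_kf_update(x_pred, S_pred, y, H, R_chol):
    """One measurement update of the array square-root Kalman filter.

    P_pred = S_pred @ S_pred.T; R_chol is a Cholesky factor of the
    measurement noise covariance.  Returns the updated state mean and a
    lower-triangular square root of the posterior covariance."""
    m, n = len(y), len(x_pred)
    # Pre-array: [[R_chol, H S], [0, S]]
    pre = np.zeros((m + n, m + n))
    pre[:m, :m] = R_chol
    pre[:m, m:] = H @ S_pred
    pre[m:, m:] = S_pred
    # Triangularize: pre = L @ Q', L lower triangular, via QR of pre'
    _, U = np.linalg.qr(pre.T)
    L = U.T
    Re = L[:m, :m]                # factor of the innovation variance H P H' + R
    Kbar = L[m:, :m]
    S_post = L[m:, m:]            # square root of the posterior covariance
    K = Kbar @ np.linalg.inv(Re)  # Kalman gain
    x_post = x_pred + K @ (y - H @ x_pred)
    return x_post, S_post
```

Because only orthogonal transformations and triangular solves are involved, the update never forms P explicitly, which is the source of the improved robustness when sampling the latent factors.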

Formulating the Restrictions on Factor Loadings
In addition to the minimal identifying restrictions as in DFM1 or DFM2, there may be other overidentifying restrictions. For example, the multilevel factor model has many zero blocks. Cross-equation restrictions may also be present. We show how these restrictions can be imposed in a systematic way. We start with a static factor representation of (1),

X_t = Λ F_t + e_t,

where F_t = [f_t', f_{t−1}', . . . , f_{t−s}']' and Λ = [Λ_0, . . . , Λ_s]. Let X be the (T − s − 1) × N data matrix and F the (T − s − 1) × q(s + 1) matrix of the static factors; then we have a matrix representation of the factor model,

X = F Λ' + E, (9)

where λ = vec(Λ) and vec(E) ∼ N(0, R ⊗ I_{T−s−1}). The identifying restrictions in DFM1 or DFM2 impose linear restrictions on the factor loadings. Consider the following restriction:

λ = B δ + C, (10)

where δ is a vector of free parameters with dim(δ) ≤ dim(λ). In general, B and C are known matrices and vectors that are defined by either identifying restrictions or other structural model restrictions. We give an example of how B and C in (10) can be obtained from DFM2.
• Example: In DFM2, the upper-left q × q block of Λ_0 is I_q. So B is an Nq(s + 1) × [Nq(s + 1) − q²] selection matrix consisting of zeros and ones, C is a vector of zeros and ones, and δ is an [Nq(s + 1) − q²] × 1 vector of free parameters. There are q ones and q² − q zeros among the restricted elements of λ, for which the corresponding rows of B are given by zeros and the corresponding elements of C are given by either one or zero. Combining all other rows of B, we obtain an identity matrix I_{Nq(s+1)−q²}. For this particular case, one only needs to sample the free parameters equation-by-equation due to the absence of cross-equation restrictions.
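For concreteness, the selection matrices B and C of (10) under DFM2 can be constructed mechanically. The sketch below (function name hypothetical) assumes vec(Λ) stacks the columns of the N × q(s + 1) matrix Λ = [Λ_0, . . . , Λ_s].

```python
import numpy as np

def dfm2_restriction(N, q, s):
    """Build B and C such that vec(Lambda) = B @ delta + C under DFM2.

    DFM2 fixes the upper q x q block of Lambda_0 at I_q; all other
    loadings are free.  vec() stacks the columns of the N x q(s+1)
    loading matrix Lambda = [Lambda_0, ..., Lambda_s]."""
    k = N * q * (s + 1)
    fixed = []
    C = np.zeros(k)
    for j in range(q):            # column j of Lambda_0
        for i in range(q):        # row i of the upper q x q block
            idx = j * N + i       # position of (i, j) in vec(Lambda)
            fixed.append(idx)
            C[idx] = 1.0 if i == j else 0.0
    free = [i for i in range(k) if i not in set(fixed)]
    B = np.zeros((k, len(free)))
    for col, idx in enumerate(free):
        B[idx, col] = 1.0         # free rows of B form an identity matrix
    return B, C
```

For any draw of δ, the reconstructed Λ then has I_q as its upper-left q × q block by construction.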
In view of (10), we may rewrite the restricted factor model (9) in terms of the free parameters δ. Impose the Jeffreys' prior for δ and R. This implies a normal conditional posterior for δ, given in (11), centered at the restricted least-squares fit, with Λ̂ = X'F(F'F)^{−1}. Thus, we may draw δ according to (11) and construct the associated loading matrix Λ(δ). In the meantime, we may draw R according to an inverse-Wishart distribution. The Jeffreys' prior p(δ, R) for δ and R is improper. An important open question is under what conditions the corresponding posterior is proper. While the simulation results below show that the Gibbs sampler always converges and the posterior is well behaved (the point estimates are close to the true factors), further theoretical investigation is needed. We leave this as a future research topic.

Restrictions on R
If the cross-section dimension N is small in (1), we may allow R = var(e_t) to be an unrestricted covariance matrix. Given our knowledge of the model structure, the factors and model parameters can be identified through the autocorrelation function implied by the dynamic model, in contrast to the static factor model that is based on the variance only. However, if N is large, it is practically infeasible to make inference on an unrestricted R. On the other hand, for large N, the majority of the cross-correlation in the data matrix is accounted for by the dynamic factors. To preserve a parsimonious model structure, we may consider restricting R to be diagonal when N is large (e.g., when N ≥ 20). Other plausible restrictions include making R a block-diagonal matrix. For example, in the multilevel factor model, we may allow the error term e_t to be cross-correlated within the same group, but uncorrelated between groups.
We consider a diagonal R = var(e_t) = diag{σ²_1, . . . , σ²_N}. Impose the Jeffreys' prior for δ (the free parameters in the loadings) and R with p(δ, R) ∝ ∏_{n=1}^N (1/σ²_n). Then, we may draw σ²_n one-by-one from its conditional inverse-Gamma posterior, with shape parameter equal to half the number of observations and scale parameter equal to half the sum of squared residuals of the nth equation. When the restrictions on the factor loadings are on an equation-by-equation basis, we may also sample the factor loadings equation-by-equation, where the restriction for equation i is given by λ_i = B_i δ_i + C_i.
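Under the Jeffreys' prior p(σ²_n) ∝ 1/σ²_n, the conditional posterior of σ²_n given the factors and loadings is inverse-Gamma. A minimal sampler sketch, assuming the residuals of equation n have already been formed:

```python
import numpy as np

def draw_sigma2(resid, rng):
    """Draw sigma_n^2 from its conditional posterior under the Jeffreys
    prior p(sigma_n^2) ∝ 1/sigma_n^2: inverse-Gamma with shape T/2 and
    scale (sum of squared residuals)/2.  `resid` holds the residuals
    e_n = X_n - F @ lambda_n for one measurement equation."""
    T = len(resid)
    shape = T / 2.0
    scale = resid @ resid / 2.0
    # Inverse-Gamma draw via the reciprocal of a Gamma draw
    return scale / rng.gamma(shape)
```

Looping this over n = 1, . . . , N gives one Gibbs update of the diagonal R.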

Serial Correlation in the Measurement Errors
Serial correlation in the measurement errors in (1) can be conveniently modeled using a VAR(1) process,

e_t = ρ e_{t−1} + Σ^{1/2} u_t,

where ρ = diag{ρ_1, . . . , ρ_N}, Σ^{1/2} = diag{σ_1, . . . , σ_N}, and u_t is iid N(0, I_N). Let L be the lag operator and apply (I − ρL) to both sides of the measurement Equation (1); we obtain

(I − ρL) X_t = (I − ρL) Λ(L) f_t + Σ^{1/2} u_t,

where Λ(L) = Λ_0 + Λ_1 L + · · · + Λ_s L^s. Thus, conditional on parameters {ρ, Λ, R, Φ, Q}, the latent factors can be analyzed in the same way as the case with serially uncorrelated measurement errors. Conditional on the data and the factors, the Gibbs sampler for the model parameters is constructed by dividing the parameters into three blocks: {ρ, R}, {Λ}, and {Φ, Q}. Note that for the ith measurement equation,

x_it − ρ_i x_{i,t−1} = λ_i(L) f_t − ρ_i λ_i(L) f_{t−1} + σ_i u_it,

where λ_i(L) is the ith row of Λ(L). Given {ρ, R}, the factor loadings Λ can be analyzed from the regression analysis with known variance. Given Λ, the parameters ρ and R can be analyzed equation-by-equation from the regression e_it = ρ_i e_{i,t−1} + σ_i u_it. Sampling Φ and Q is basically the same as before. The same method applies to the case with VAR(p) measurement errors. We provide the detailed sampling algorithm in the supplementary material.
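Conditional on ρ, the quasi-differencing step is a one-liner. The sketch below assumes a diagonal ρ stored as a length-N vector; the same filter must also be applied to the factor terms on the right-hand side before the standard sampler is used.

```python
import numpy as np

def quasi_difference(X, rho):
    """Apply the filter (I - rho L) series-by-series: returns
    X*_t = X_t - rho * X_{t-1} for t = 2, ..., T.  After the same filter
    is applied to the factor terms, the transformed measurement errors
    are serially uncorrelated.  X is T x N; rho holds one AR(1)
    coefficient per series (a diagonal rho matrix)."""
    X = np.asarray(X, dtype=float)
    return X[1:] - rho * X[:-1]
```

This loses the first observation of each series, which is why the transformed sample runs over t = 2, . . . , T.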

Serially Uncorrelated Measurement Errors
This section uses simulation studies to show the effectiveness of the Bayesian estimation method given our identifying restrictions. We first consider a dynamic factor model with q = 2, s = 1, h = 2. In particular, the data are generated according to (1) and (2), where the parameters are drawn as λ_ij ∼ iid U(0, 1), e_t ∼ iid N(0, R), ε_t ∼ iid N(0, Q). We choose R = I_N, Q = I_q, Φ_1 = [0.5, 0; −0.1, 0.2], and restrict the loading matrix so that the first row is affected only by the first dynamic factor, the second row by the second dynamic factor only, the third row by lagged factors only, the fourth row by current factors only, and the fifth row is unrestricted.
With the sample sizes chosen from N = 10, 20, 50, 100, 150, 200 and T = 50, 100, 200, we first generate a sample of size N = 200, T = 200. Then, we estimate the model on 18 subsamples with different combinations of N and T. We use the Jeffreys priors throughout the estimation. The length of the Gibbs sampling chain is chosen to be 10,000. (We found that our Bayesian algorithm converges very fast: chain sizes of 100,000 or 5000 give essentially the same results. In addition, for the model with N = 200, T = 200, the CPU time for a chain of 10,000 draws is less than 1200 sec on a standard Intel Core i7 computer with 2 GB DDR3 memory. In all the simulations, the confidence bands cover the true factors well.) We discard the first 5000 draws before calculating the posterior means and standard deviations. We use the posterior mean as our point estimate of the dynamic factors and compare it to the true factors. To evaluate the estimation outcome, we project the true factors on the estimates to obtain the adjusted R-squared of a regression through the origin. For example, for the jth factor, we regress f_j on f̂_j to obtain the adjusted R-squared R̄_j^2. We also regress f_j on both f̂_1 and f̂_2 to measure the closeness of the true factors to the estimated factor space. Figure 1 shows the results. In each of the four graphs, there are three lines, each connecting the adjusted R-squared across different N for a given T. For example, the solid line shows how the adjusted R-squared increases with the cross-section dimension N given T = 200. Such a pattern is in line with the theory of large dimensional factor models (such as Bai 2003). The other two lines show the same pattern for T = 50 and T = 100. In each graph, all three lines are very close to each other, especially when N is large.
This implies that for the given data-generating process, the estimated factors reasonably well approximate the true factors and the overall estimation precision mainly comes from the cross-section dimension N.
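The fit measure used above can be computed as in the following sketch (Python/NumPy; the degrees-of-freedom adjustment for a no-intercept regression shown here is one common convention and is our assumption, not taken from the paper's code):

```python
import numpy as np

def adj_r2_origin(y, X):
    """Adjusted R-squared of regressing y on X through the origin,
    using the uncentered R-squared appropriate for a no-intercept fit."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    T, k = X.shape
    r2 = 1.0 - (resid @ resid) / (y @ y)        # uncentered R-squared
    return 1.0 - (1.0 - r2) * T / (T - k)       # degrees-of-freedom adjustment
```

Regressing the true factor on a single estimated factor gives R̄_j^2; passing both estimated factors as columns of X measures closeness to the estimated factor space.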
For the case with N = 10 or 100 and T = 200, Figures 2 and 3 plot the two SE error bands of the estimated factors and compare them to the true factors. As is evident from the figures, factors are more precisely estimated when the sample has a larger N. This is also in line with the theory on large dimensional factor models such as Bai (2003).
Under the identification condition DFM2, we obtain similar results. We use the same DGP as described at the beginning of this section, except that we set the upper q × q block of the factor loading Λ_0 in (1) to be an identity matrix. We also impose the same extra restrictions on every five rows of the factor loadings. When estimating the model, we use DFM2 and treat the error variance Q in the factor process (2) as unknown parameters. Figure 4 shows that with large N, the confidence bands are rather tight while still covering the true factors.
Given the data that produce Figure 3, we examine how the marginal likelihood varies with the choice of q. We use T_0 = 20 observations as the training sample to obtain the prior, and then use this prior to compute the marginal likelihood for the sample t = 21, . . . , 200. To fix ideas, we set s = 1, h = 2 and vary only the number of factors q. Table 1 reports the associated marginal likelihoods of the data. The marginal likelihood achieves its maximum when the number of factors equals the true value q = 2.
An alternative model comparison method is to use the deviance information criterion (DIC); see, for example, Berg, Meyer, and Yu (2004), Celeux et al. (2006), and Li, Zeng, and Yu (2013). The DIC does not require a proper prior and thus is applicable to our case with Jeffreys priors; it also works for time series observations. Out of the seven versions of DIC in Celeux et al. (2006), we adopt their DIC_1 and report the results in Table 1 as well. A computationally simple approximation to DIC_1 is considered by Li, Zeng, and Yu (2013). In the table, we use the MCMC approximation. Note that DIC_1 is defined as

DIC_1 = −4 E_θ[log p(X|θ) | X] + 2 log p(X|θ̄),

where θ is the vector of model parameters and θ̄ is the posterior mean. We may use the MCMC posterior draws {θ^(m), m = 1, . . . , M} to obtain the approximation

DIC_1 ≈ −(4/M) Σ_{m=1}^M log p(X|θ^(m)) + 2 log p(X|θ̄),

where p(X|θ) can be evaluated at any given θ using the Kalman filter. In Table 1, the DIC achieves its minimum when the number of factors equals the true value q = 2.
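The MCMC approximation of DIC_1 can be sketched generically as follows (Python; here loglik is any function returning log p(X|θ), for instance computed by a Kalman filter — the interface is our own):

```python
import numpy as np

def dic1(loglik, draws):
    """DIC_1 = -4 E[log p(X|theta) | X] + 2 log p(X|theta_bar),
    with the posterior expectation replaced by an average over MCMC
    draws and theta_bar by the mean of the draws."""
    draws = np.asarray(draws, dtype=float)      # M x dim(theta)
    mean_ll = np.mean([loglik(th) for th in draws])
    ll_at_mean = loglik(draws.mean(axis=0))
    return -4.0 * mean_ll + 2.0 * ll_at_mean
```

Smaller values are preferred; the gap between the averaged log-likelihood and the log-likelihood at the posterior mean acts as the effective-parameter penalty.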

Serially Correlated Measurement Errors
This section explores the implications of serially correlated error terms for the analysis of the dynamic factors. We use the same data-generating process as the previous section, except that the error terms in (1) follow a VAR(1) process given in (13). The number of dynamic factors is set at two. We fix N = 50, T = 200, s = 1, h = 2. Two methods are used to compare the estimation results, one with ρ i being free parameters to be estimated, the other being under the assumption that ρ i = 0 for all i. Thus, the first estimation is conducted under the true model, while the second one is under the assumption of serially uncorrelated error terms and thus is using a misspecified model. This exercise helps us to understand how serial correlation in the error terms affects the estimation of the dynamic factors.
We use the adjusted R-squared to measure the overall fit of the estimated factors. We project the true factors on the corresponding estimated factors to obtain one set of R̄^2. We also project the true factors on the entire estimated factor space to obtain another set of R̄^2. Table 2 reports the results. To fix ideas, we choose ρ_i = ρ for all i = 1, . . . , N and examine the adjusted R̄^2 for different values of ρ ∈ {0.99, 0.9, 0.8, 0.5, 0.0}. (Notes to Table 2: the two numbers within each pair of parentheses are the adjusted R̄^2 of regressing the true f_1 and f_2, respectively, on the corresponding estimated factors (left panel) or the entire factor space (right panel). N.A. means the MCMC chain fails to converge.)
From the table, we notice that for not-so-persistent measurement errors, say ρ ≤ 0.5, both estimation methods provide reasonably good fit. That is, ignoring the serial correlation still gives good fit. However, as the persistence increases, the fit of factors deteriorates when ignoring the serial correlation. For example, when ρ ≥ 0.8, the fit becomes rather poor when ρ i is set to zero. On the other hand, across all cases, the estimation taking into account serial correlation of error terms performs uniformly well in terms of fit, even in the near unit root case. To demonstrate the performance of our Bayesian algorithm in the case of persistent measurement errors, we estimate the model for N = 50, T = 200, and ρ i = 0.9 for all i. Figure 5 plots the two SE error bands of the estimated factors and compares them to the true factors. As is evident from the figures, the true factors are well covered by the tight confidence bands.

COMOVEMENT IN INTERNATIONAL BOND YIELDS
In this section, we apply dynamic factor models to the analysis of international bond yields. We adopt the monthly nominal zero-coupon bond yields data constructed by Wright (2011), which cover nine countries from January 1971 to May 2009. We use a balanced panel from December 1992 to May 2009. The nine countries are the U.S., Canada, U.K., Germany, Sweden, Switzerland, Japan, Australia, and New Zealand. For each country, the data are constructed at 60 maturities from 3 months to 180 months, except for Sweden with 40 maturities. Let X^c_t(τ) be the time t bond yield of country c at maturity τ. For c = 1, . . . , C, assume that a global factor g_t and a country-specific factor f^c_t combine to affect the bond yield:

X^c_t(τ) = γ_c(τ) g_t + λ_c(τ) f^c_t + e^c_t(τ). (14)

We further assume that the factors follow a VAR(1) process. In particular,

(g_t, f^1_t, . . . , f^C_t)' = Φ (g_{t−1}, f^1_{t−1}, . . . , f^C_{t−1})' + ε_t, ε_t ~ iid N(0, Q). (15)

Assuming that Q = I_{C+1} and that both γ_c(τ) and λ_c(τ) are strictly positive at τ = 3 months, the multilevel factor model is identified. Using a similar but different model, Diebold, Li, and Yue (2008) found an economically significant global yield factor for Germany, Japan, the U.K., and the U.S. Our empirical exercise differs from theirs in two important ways. First, we adopt a much larger dataset with nine countries and more maturities, which should better represent the global bond yield. Second, we treat the AR coefficient matrix Φ as unrestricted, which allows us to study the "spill-over" effects from one country to another, such as how a shock to the U.S. factor affects other countries. By contrast, almost all existing literature on multilevel factor models assumes that different factors are independent of each other. This is restrictive because the country factors could be correlated with each other due to regional interactions or other economic linkages such as cross-border capital flows. Models (14) and (15) are estimated using the Bayesian method described in the previous sections.
The posterior mean of the global factor g t along with two standard deviation error bands is reported in Figure 6. The large cross-section size N leads to very narrow error bands for both the global factor and the country factors. To save space, we do not report results for country factors.
We also report the variance decomposition results for each country in Table 3. (We calculate the variances due to the global factor, the country factors, and the idiosyncratic errors, respectively, and then use the variance ratios to measure the variance contribution of each component.) The result is similar to Diebold, Li, and Yue (2008) in that the global factor explains a large proportion of the variance in country bond yields, and, compared with most other markets, the United States has a stronger country factor.

Figure 7. Variance decomposition of country yields at maturities of 3, 12, 60, and 120 months. Remark: the line with squares represents the variance explained by the global factor, and the line with crosses by the country factors.
We conduct a further variance decomposition exercise of country yields at four representative maturities of 3, 12, 60, and 120 months. Figure 7 shows that, for all the nine countries considered, the global factor tends to explain more variation at longer maturities, while the country factor shows the opposite pattern. For Sweden, the majority of variation is explained by the global factor.
Given the multilevel factor structure, we are particularly interested in how different factors interact with each other. To estimate the impulse response functions, we use the posterior median as the point estimate and the lower and upper fifth percentiles as the error bands around it. The top left panel of Figure 8 shows the impulse responses of the global factor g_t up to 24 months to a one-standard-deviation shock to each of the nine country factors. We report the impulse responses of the country factors f^c_t given a shock to the U.S. factor, the U.K. factor, and the Japan factor, respectively, in the other three panels of Figure 8. (Here, we only report results for the U.S., U.K., and Japan. The results for other countries are given in the appendices.) To briefly summarize, the global factor does not seem to respond much to country factors, except for some mild responses to several countries, as can be seen from the top left panel of Figure 8. On the other hand, there are significant interactions between country factors, based on evidence from the other panels of Figure 8. For example, a shock originating from the U.S. factor seems to significantly affect most other countries, especially Canada and the European countries including the U.K. The U.K. factor seems not to have a statistically significant impact on the U.S. factor, although it significantly affects Canada, Sweden, Japan, Australia, and New Zealand. Meanwhile, the Japan factor affects most other countries, such as the U.S. and the U.K. Such rich evidence in the impulse response functions points to complex interstate relations, which cannot simply be accounted for by geographic relations. Our modeling strategy naturally allows the study of such complicated pairwise relations between countries, which are not strong enough to be summarized by global factors or by geographically defined regional factors.

Figure 8. Impulse responses of global factor and country factors.
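The pointwise bands described above can be computed directly from the posterior draws of the impulse responses (a minimal NumPy sketch; the array layout of draws-by-horizons is our assumption):

```python
import numpy as np

def irf_bands(irf_draws, lo=5, hi=95):
    """Posterior median and lower/upper percentile bands, computed
    pointwise over an (M draws) x (horizons) array of IRF draws."""
    irf_draws = np.asarray(irf_draws, dtype=float)
    return (np.percentile(irf_draws, 50, axis=0),
            np.percentile(irf_draws, lo, axis=0),
            np.percentile(irf_draws, hi, axis=0))
```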
Most of the literature on dynamic factor models assumes that factors are independent of each other, and thus is not suitable for studying such rich interstate relations.

CONCLUSION
In this article, we discuss the minimal requirements for identifying dynamic factor models. Our identification scheme allows interaction among different factors, which provides a useful framework for studying the structural implications of the factor model. The empirical study of the cross-country linkage of bond yields demonstrates the importance of allowing correlated factors. In addition, a more general class of impulse response functions can be derived under our framework, and the associated confidence bands are readily constructed from the Bayesian estimation results. The proposed identification scheme naturally accommodates structural VAR analysis as well. We also show how Bayes factors can help determine the number of dynamic factors. Furthermore, the Bayesian method can easily handle serially correlated idiosyncratic error terms, greatly improving the fit of the estimated factors compared to the case where such correlation is ignored in estimation.

APPENDIX A. PROOF OF PROPOSITIONS
Proof of Proposition 1: It is well known that a static factor model is identified up to an r × r rotation, where r = q(s + 1) is the number of static factors (Lawley and Maxwell 1971). We show that under the dynamic specification in the observation equation, the only admissible r × r rotation matrix is a block diagonal matrix with the same q × q matrix on the main diagonal. To see this, consider the case s = 1 and define the static factor as F_t = (f_t', f_{t−1}')'. From the theory of static factor models, we may identify a rotation of F_t, given by

F̃_t = [A, B; C, D] F_t = (A f_t + B f_{t−1}, C f_t + D f_{t−1})',

where the second block must equal the first block lagged one period, that is, C f_t + D f_{t−1} = A f_{t−1} + B f_{t−2}. Write the factor process as f_t = Φ(L) f_{t−1} + ε_t, where Φ(L) ≡ Φ_1 + Φ_2 L + · · · + Φ_h L^{h−1}. Combining terms, we obtain

C ε_t = (A − D − C Φ(L)) ε_{t−1} + ((A − D) Φ(L) − C Φ(L)^2 + B) f_{t−2}.

By assumption, ε_t is uncorrelated with both (A − D − C Φ(L)) ε_{t−1} and ((A − D) Φ(L) − C Φ(L)^2 + B) f_{t−2}. This implies C ε_t = 0, or C = 0, because var(ε_t) > 0. And thus

0 = (A − D) ε_{t−1} + ((A − D) Φ(L) + B) f_{t−2}.

No correlation between ε_{t−1} and ((A − D) Φ(L) + B) f_{t−2} implies A − D = 0, or A = D, which leads to B f_{t−2} = 0 and thus B = 0 because var(f_{t−2}) > 0. To summarize, the dynamic structure implies A = D and B = C = 0, so the dynamic factor model is identified up to a q × q rotation of the dynamic factor.
Let A be a full rank q × q rotation matrix. Left-multiply the dynamic factors f_t by A and right-multiply the loading matrices Λ_j by A^{−1}, j ∈ {0, 1, . . . , s}. After the rotation, the new factors f̃_t = A f_t have a VAR(h) representation

f̃_t = A Φ_1 A^{−1} f̃_{t−1} + · · · + A Φ_h A^{−1} f̃_{t−h} + A ε_t.

The observation Equation (1) after the rotation becomes

X_t = (Λ_0 A^{−1}) f̃_t + · · · + (Λ_s A^{−1}) f̃_{t−s} + e_t.

If we can show that the only admissible A is a diagonal matrix with either 1 or −1 on the diagonal, then both the factors and factor loadings are identified up to a sign change. The normalization var(A ε_t) = I_q implies that A A' = I_q, that is, A is orthogonal. The normalization of the factor loadings then requires the upper q × q block of Λ_0 A^{−1} to be lower triangular. Given the assumption that λ_ii ≠ 0 and λ*_ii ≠ 0, i = 1, . . . , q, we obtain a_ij = 0 for any i, j such that i < j; that is, A^{−1} is lower triangular. An orthogonal matrix that is lower triangular must be diagonal. In sum, we have proved that A^{−1} = diag{a_11, . . . , a_qq}.
Given that A^{−1} (A^{−1})' = I_q, we obtain a_ii = 1 or −1 for all i = 1, . . . , q. So the rotation matrix A is also a diagonal matrix with either 1 or −1 on the diagonal.
This proves that the proposed normalization DFM1 identifies both the dynamic factors and the corresponding dynamic factor loadings up to a sign change.
As a normalization, we may assume λ ii > 0 so that both the dynamic factors and the associated dynamic factor loadings are fully identified.
Proof of Proposition 2: Similar to the proof of Proposition 1, let A be a full rank q × q rotation matrix. Left-multiply the dynamic factors f_t by A and right-multiply the loading matrices Λ_j by A^{−1}, j ∈ {0, 1, . . . , s}. DFM2 implies that both Λ_0 and the new dynamic factor loading Λ̃_0 = Λ_0 A^{−1} must have I_q as their upper q × q block. This implies I_q A^{−1} = I_q, or A = I_q, which in turn implies that the factors and factor loadings are identified.
Proof of Proposition 3: Let M be a rotation matrix. Left-multiply the factors by M and right-multiply the loading matrix by M^{−1}. If, under the current normalization, the only admissible M is an identity matrix, then both the factors and factor loadings are identified. From Wang (2010), the multilevel factor structure implies that the only admissible rotation matrix M takes the form given in (A.4). Let H_t collect the global and country factors. Then H_t follows a VAR(1) process H_t = ρ H_{t−1} + U_t, with U_t ~ iid N(0, I). After the rotation, we obtain H̃_t = M H_t, and H̃_t follows a VAR(1) process H̃_t = M ρ M^{−1} H̃_{t−1} + M U_t. The normalization on U_t and Ũ_t implies that M M' = I. Along with (A.4), this in turn implies that M is block diagonal: M = blkdiag{A, A_1, . . . , A_C}. Similar to the proof of Proposition 1, the requirement that the global-factor loading block be lower triangular with strictly positive diagonal terms implies that A is an identity matrix. Likewise, {A_c, c = 1, . . . , C} are all identity matrices. In sum, the rotation matrix M becomes an identity matrix. This proves that the proposed normalization separately identifies the world factor, the country factors, and the corresponding loadings.

APPENDIX B. IDENTIFYING DYNAMIC FACTOR MODELS WITH A MULTILEVEL FACTOR STRUCTURE
This section extends the result of Section 2.1 to the case with dynamic factors. Consider a dynamic version of Equation (4), given in (B.1). Stacking the observations from all countries, we can write (B.1) in matrix form.
The following corollary is a natural extension of Proposition 3.
The proof of Corollary 3 is basically the same as that of Proposition 3.

APPENDIX C. MODEL COMPARISON WITH IMPROPER PRIORS
Under the minimal identification scheme DFM1 or DFM2, we are able to discriminate between models with different lags and numbers of dynamic factors using Bayes factors. To fix ideas, we impose the identification scheme DFM2 (the upper q × q block of Λ_0 being an identity matrix) throughout. Let M_1 and M_2 be two competing models for the time series X ≡ {X_1, . . . , X_T}, both dynamic factor models but with different lags h, s and numbers of factors q. The Bayes factor for model M_2 against M_1 from data X is the ratio

B_21 = p(X|M_2) / p(X|M_1).

Note that if we multiply the Bayes factor by the ratio of the priors for the two models, we obtain the posterior odds of the two models. A key object of interest is the marginal likelihood of the data

p(X|M_j) = ∫ p(X|θ, M_j) π(θ) dθ,

where p(X|θ, M_j) is the likelihood under model M_j and π(θ) is the prior. Note that if we use a diffuse prior, p(X|M_j) in general is not well defined unless θ has a compact support. Possible alternatives to Bayes factors in the case of diffuse priors include fractional Bayes factors (O'Hagan 1995) and intrinsic Bayes factors (Berger and Pericchi 1996). To fix ideas, we adopt a version of the fractional Bayes factor for the purpose of comparing different dynamic factor models under diffuse priors; the intrinsic Bayes factor can be obtained in a similar way. Given a fraction b of the data, X^0 ≡ {X_1, . . . , X_[Tb]}, calculate the posterior based on the training sample,

p(θ|M_j, X^0) ∝ p(X^0|θ, M_j) π(θ).

Then calculate the marginal likelihood for X^1 ≡ {X_[Tb]+1, . . . , X_T} conditional on the training sample:

p(X^1|M_j, X^0) = ∫ p(X^1, θ|M_j, X^0) dθ = ∫ p(X^1|θ, M_j, X^0) p(θ|M_j, X^0) dθ.
The fractional Bayes factor is then given by

B^F_21 = p(X^1|M_2, X^0) / p(X^1|M_1, X^0).

Note that

p(X^1|θ, X^0) p(θ|X^0) ∝ p(X^1, X^0|θ) π(θ),

where π(θ) is the diffuse prior. This suggests two equivalent sampling schemes to sample from the posterior p(θ|X^1, X^0): one is to treat p(θ|X^0) as the prior and p(X^1|θ, X^0) as the likelihood; the other is to treat π(θ) as the prior and p(X^1, X^0|θ) as the likelihood. Thus, to calculate the fractional Bayes factor, we may use the same algorithm as in the previous section to obtain posterior draws from p(θ|X^1, X^0). The algorithm consists of the following three steps.
Step 1. Assume Jeffreys priors on θ. Conduct a Bayesian analysis of the training sample X^0 to obtain a sequence of posterior draws {θ^(l), F^(l)}_{l=1}^L. Obtain the posterior density given X^0:

p(θ|M_j, X^0) = ∫ p(θ, F|M_j, X^0) dF = ∫ p(θ|M_j, F, X^0) p(F|M_j, X^0) dF ≈ (1/L) Σ_{l=1}^L p(θ|M_j, F^(l), X^0).

Note that p(θ|M_j, F^(l), X^0) has a closed-form solution.

Figure 9. Impulse response functions with 2 SE error bands given one std change of f_1's innovation. Remark: the center line represents the true impulse response function.
Step 2. Repeat Step 1 for the whole sample X = {X^0, X^1} to obtain another sequence of posterior draws {θ^(l), F^(l)}_{l=1}^L. Obtain the posterior density given X:

p(θ|M_j, X) ≈ (1/L) Σ_{l=1}^L p(θ|M_j, F^(l), X).
A convenient alternative to Step 3 is to calculate the fractional marginal likelihood for X^1 conditional on X^0 according to the identity (Chib 1995)

p(X^1|M_j, X^0) = p(X^1|θ, M_j, X^0) p(θ|M_j, X^0) / p(θ|M_j, X^1, X^0).
Thus, we may obtain the logarithm of the fractional marginal likelihood

log p(X^1|M_j, X^0) = log p(X^1|θ*, M_j, X^0) + log p(θ*|M_j, X^0) − log p(θ*|M_j, X^1, X^0),

where θ* is the posterior mean of θ. Note that given the recursive structure of the dynamic factor model, it is easy to calculate the conditional likelihood p(X^1|θ, M_j, X^0), which is multivariate normal:

p(X^1|θ, M_j, X^0) = ∏_{t=[Tb]+1}^T p(X_t|X_{t−1}, . . . , X_1, θ, M_j) = ∏_{t=[Tb]+1}^T N(E(X_t|X_{t−1}, . . . , X_1, θ, M_j), var(X_t|X_{t−1}, . . . , X_1, θ, M_j)),

in which the mean and variance of the normal distribution are given by the Kalman filter.
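In log form, the Chib identity is a three-term sum; the helper below (our own wrapper, with the three log quantities assumed to be computed as described above) makes the bookkeeping explicit:

```python
def chib_log_marginal(log_cond_lik, log_post_given_x0, log_post_given_all):
    """Chib (1995) identity evaluated at the posterior mean theta*:
    log p(X^1|M_j, X^0) = log p(X^1|theta*, X^0)
                        + log p(theta*|X^0) - log p(theta*|X^1, X^0)."""
    return log_cond_lik + log_post_given_x0 - log_post_given_all
```

The difference of this quantity across two models gives the log fractional Bayes factor directly.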
Regarding the choice of the fraction, O'Hagan (1995) proposed three alternatives: b = n_0/n, b = n_0 log(n)/(n log(n_0)), or b = √(n_0/n), ranked by increasing robustness, where n_0 is the minimal training sample size such that the posterior is well defined.

APPENDIX D. NUMERICALLY POSITIVE-DEFINITE CONDITIONAL COVARIANCE MATRIX
In real applications, numerical round-off errors can result in an indefinite covariance matrix even though the matrix should be positive definite. This situation arises more frequently for high-dimensional covariance matrices. We provide an algorithm to calculate the variance of a conditional normal distribution such that it is always positive definite. (The algorithm is implemented in the Matlab function schur_pos.m in the accompanying computer program files.) Let

(X, Y)' ~ N((μ_X, μ_Y)', [P_11, P_12; P_21, P_22]),

where X is p × 1 and Y is q × 1.
Then for the conditional distribution of Y|X, var(Y|X) = P_22 − P_21 P_11^{−1} P_12, which is assumed to be positive definite. However, numerical round-off errors might produce an indefinite var(Y|X). To handle this, we propose a numerically stable algorithm to calculate var(Y|X) that is always positive definite as long as var((X', Y')') is positive definite.
By the partitioned inverse formula, var(Y|X) = P_22 − P_21 P_11^{−1} P_12 = ((P^{−1})_{22})^{−1}. We first conduct a singular value decomposition (SVD) of P to obtain P = U S V', where U_m is (p + q) × (p + q), S_m is (p + q) × q, and V_m is q × q. This admits a V_m D V_m' representation of (P^{−1})_{22}, where D = S_m' S_m is q × q. This shows that ((P^{−1})_{22})^{−1} = V_m D^{−1} V_m', whose numerical outcome is always positive definite. In a computationally efficient Matlab implementation, D = diag(diag(S_m).^2).

Figure 13. Impulse responses of country factors given one unit shock to Sweden factor.
Figure 14. Impulse responses of country factors given one unit shock to Switzerland factor.
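The idea can be illustrated in Python/NumPy. The sketch below mirrors the spirit of schur_pos.m — compute var(Y|X) through ((P^{−1})_{22})^{−1} using SVD-based inverses with explicit symmetrization — but it is our own sketch, not a translation of the Matlab code:

```python
import numpy as np

def conditional_cov(P, p):
    """Numerically stable var(Y|X) = ((P^{-1})_{22})^{-1} for a joint
    normal covariance P partitioned after the first p coordinates."""
    U, s, Vt = np.linalg.svd(P)                 # P = U diag(s) V'
    Pinv = (Vt.T / s) @ U.T                     # P^{-1} = V diag(1/s) U'
    block = Pinv[p:, p:]                        # (P^{-1})_{22}
    Ub, sb, Vbt = np.linalg.svd(block)          # invert the block by SVD
    cond = (Vbt.T / sb) @ Ub.T
    return 0.5 * (cond + cond.T)                # symmetrize away round-off
```

For well-conditioned P the result coincides with the direct Schur complement P_22 − P_21 P_11^{−1} P_12, while the SVD route plus symmetrization keeps the output symmetric positive definite under round-off.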

APPENDIX E. SQUARE-ROOT FORM OF THE KALMAN FILTER
We implement the square-root form of the Kalman filter in the Matlab program k_filter_srf.m, based on Bierman (1977), Evensen (2009), and Tippett et al. (2003). Consider a state-space representation of the dynamic factor model:

X_t = Λ F_t + e_t, e_t ~ iid N(0, R),
F_t = Φ F_{t−1} + ε_t, ε_t ~ iid N(0, Q).
(E.1)

Figure 15. Impulse responses of country factors given one unit shock to Australia factor.
Figure 16. Impulse responses of country factors given one unit shock to New Zealand factor.
Let the matrix square-root representations of P_{t|t−1} and P_{t|t} (not unique) be P_{t|t−1} = S_{t|t−1} S_{t|t−1}' and P_{t|t} = S_{t|t} S_{t|t}'.

ACKNOWLEDGMENT: The authors acknowledge the support of Hong Kong University of Science and Technology (Project DAG09/10.BM06).
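A QR-based (array form) measurement update, in the spirit of the square-root filter referenced above, can be sketched as follows (Python/NumPy; our own simplified implementation, not a translation of k_filter_srf.m):

```python
import numpy as np

def sqrt_kf_update(f_pred, S_pred, H, R_chol, x):
    """Square-root measurement update. P_{t|t-1} = S_pred S_pred',
    R = R_chol R_chol'. A single QR factorization yields square roots of
    the innovation covariance and of P_{t|t} without forming P explicitly."""
    N, r = H.shape
    pre = np.block([[R_chol, H @ S_pred],
                    [np.zeros((r, N)), S_pred]])
    # QR of the transposed pre-array: B is lower triangular with
    # B B' = pre pre' = [[R + H P H', H P], [P H', P]]
    B = np.linalg.qr(pre.T, mode='r').T
    Sy = B[:N, :N]                  # square root of innovation covariance
    G = B[N:, :N]                   # = P H' (Sy')^{-1}
    S_filt = B[N:, N:]              # square root of P_{t|t}
    f_filt = f_pred + G @ np.linalg.solve(Sy, x - H @ f_pred)
    return f_filt, S_filt
```

Because only square roots are propagated, the implied covariances stay symmetric positive semidefinite by construction, which is the robustness advantage highlighted in the text.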