Generalized Dynamic Principal Components

ABSTRACT Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator which is consistent under stationarity. Here, we propose a new, entirely empirical approach to DPC. The main differences from existing methods, mainly Brillinger's procedure, are that (1) the DPC we propose need not be a linear combination of the observations, and (2) it can be based on a variety of loss functions, including robust ones. Unlike Brillinger, we do not establish any consistency results; however, contrary to Brillinger's concept, which has a very strong stationarity flavor, ours aims at a better adaptation to possible nonstationary features of the series. We also present a robust version of our procedure that allows one to estimate the DPC when the series are contaminated by outliers. We give iterative algorithms to compute the proposed procedures that can be used with a large number of variables. Our nonrobust and robust procedures are illustrated with real datasets. Supplementary materials for this article are available online.


Introduction
Dimension reduction is very important in vector time series because the number of parameters in a model grows very fast with the dimension m of the vector of time series. Therefore, finding simplifying structures or factors in these models is important to reduce the number of parameters required to apply them to real data. Besides, these factors, as we will see in this article, may allow us to reconstruct the datasets with a small error and therefore to reduce the amount of information to be stored. Dimension reduction is usually achieved by finding linear combinations of the time series variables which have interesting properties. Consider the time series vector z_t = (z_{1,t}, ..., z_{m,t})', where 1 ≤ t ≤ T, and assume, for simplicity, that \bar{z} = T^{-1} \sum_{t=1}^T z_t, which estimates the mean if the process is stationary, is zero. Let Z be the T × m matrix whose rows are z_1', ..., z_T', let C = Z'Z/T be the sample covariance matrix, λ_1 ≥ λ_2 ≥ ... ≥ λ_m the eigenvalues of C, and v_i = (v_{1,i}, ..., v_{m,i})', 1 ≤ i ≤ m, the corresponding eigenvectors satisfying v_i'v_j = δ_{ij}. Then

p_{i,t} = v_i' z_t, 1 ≤ t ≤ T,   (1)

is the ith principal component. Let k < m, let P_k be the k × T matrix with rows p_1, ..., p_k, and call V_k the m × k matrix whose ith column is v_i. More generally, let P_k = (p_{i,t}) be any k × T matrix and V_k = (v_{j,i}) any m × k matrix, and suppose that we reconstruct all the data z_{j,t} using the k columns of V_k by \hat{z}_{j,t} = \sum_{i=1}^k v_{j,i} p_{i,t}. Then, the mean squared error of the reconstructed series is

MSE(V_k, P_k) = (1/(mT)) \sum_{j=1}^m \sum_{t=1}^T ( z_{j,t} - \sum_{i=1}^k v_{j,i} p_{i,t} )^2.

Okamoto and Kanazawa (1968) showed that this MSE is minimized when V_k contains the first k eigenvectors of C and P_k the corresponding principal components, in which case

MSE(V_k, P_k) = (1/m) \sum_{i=k+1}^m λ_i.   (2)

Note that in (2), the minimization is performed over all possible k × T matrices P_k, and therefore it is not required that the elements p_{i,t} of P_k be linear combinations of z_t. However, the elements of the optimal matrix P_k given by (1) have this property. One drawback of this reconstruction is that it is static, that is, to reconstruct an observation of period t only the values of the components in that period are used.
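As a quick illustration of the static reconstruction criterion, the following sketch (synthetic data; all variable names are our own choices) computes the first k principal components of C and checks the resulting mean squared reconstruction error against the discarded eigenvalues:

```python
import numpy as np

# Sketch of static principal-component reconstruction on synthetic data.
rng = np.random.default_rng(0)
T, m, k = 200, 6, 2
Z = rng.standard_normal((T, m)) @ rng.standard_normal((m, m))
Z = Z - Z.mean(axis=0)              # center, so the sample mean is zero

C = Z.T @ Z / T                     # sample covariance matrix
eigval, eigvec = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
V_k = eigvec[:, ::-1][:, :k]        # loadings: top-k eigenvectors (m x k)

P_k = Z @ V_k                       # principal components p_{i,t} = v_i' z_t
Z_hat = P_k @ V_k.T                 # optimal rank-k static reconstruction
mse = np.mean((Z - Z_hat) ** 2)
# Okamoto-Kanazawa optimality: the minimum MSE equals the sum of the
# discarded eigenvalues divided by m
```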
Ku, Storer, and Georgakis (1995) proposed to apply classical principal components to the augmented observations z*_t = (z_{t-h}', z_{t-h+1}', ..., z_t')', h + 1 ≤ t ≤ T, which include the values of the series up to lag h. These principal components provide the linear combinations of the present and past values of the time series with largest variance. However, it is not clear from their definition that these principal components have good reconstruction properties.
An alternative way to find interesting linear combinations was proposed by Box and Tiao (1977), who suggested maximizing the predictability of the linear combinations c_t = γ'z_t. Other linear methods for dimension reduction in time series models have been given by the scalar component models, SCM (Tiao and Tsay 1989), and the reduced-rank models (Ahn and Reinsel 1990; Reinsel and Velu 1998). Brillinger (1981) addressed the reconstruction problem as follows. Suppose {z_t}, -∞ < t < ∞, is a zero-mean m-dimensional stationary process. The dynamic principal components are defined by searching for m × 1 vectors c_h, -∞ < h < ∞, and β_j, -∞ < j < ∞, so that if we take as first principal component the linear combination

f_t = \sum_{h=-∞}^{∞} c_h' z_{t-h},   (3)

the mean squared reconstruction error

E || z_t - \sum_{j=-∞}^{∞} β_j f_{t+j} ||^2   (4)

is minimum. Brillinger elegantly solved this problem by showing that c_h is the inverse Fourier transform of the principal components of the cross-spectral matrices at each frequency, and β_j is the inverse Fourier transform of the conjugates of the same principal components. See Brillinger (1981) and Shumway and Stoffer (2000) for the details of the method. Note that when this procedure is adapted to finite samples, the number of lags in (3) and in the reconstruction of the series must be truncated. We can mention two shortcomings of Brillinger's procedure: (1) it can be used with nonstationary series, but in this case the mean squared error of the reconstructed series may not be close to its possible minimum value; (2) it is not clear how to robustify these principal components using a reconstruction criterion. A related line of research is factor models for time series. Static factor models assume a contemporaneous relationship between the series and a small number of factors. Some of these models assume stationarity (Peña and Box 1987; Stock and Watson 1988, 2002; Bai and Ng 2002; and Lam and Yao 2012, among others).
All these models use the eigenvalues of the lag covariance matrices of the process and are related to the principal components (PC) of the time series. Some generalizations to the nonstationary case are Peña and Poncela (2006) for integrated processes, Pan and Yao (2008) for general nonstationary processes and Motta, Hafner and von Sachs (2011) and Motta and Ombao (2012) for locally stationary processes.
Dynamic factor models are closely related to dynamic principal components (DPC), because they assume that an important part of the original series can be explained in a dynamic way by a relatively small number of common factors. Forni et al. (2000) proposed a very general dynamic factor model allowing for an infinite number of factor lags and low correlation between any two idiosyncratic components. They showed that the common component of the series can be consistently estimated by increasing the number of series to infinity. This estimator is obtained by projecting the data on the first q dynamic principal components, which include leads and lags. These principal components are obtained, as in Brillinger, by the Fourier transforms of the eigenvectors of the spectral matrix. This model was applied for prediction by Forni et al. (2005), where the authors proposed a one-sided method of estimation of a dynamic factor model for forecasting. The forecasts generated with this procedure improve the ones derived by Stock and Watson (2002) using static principal components. Forni et al. (2009) proposed a model that can be seen either as a static model with r factors or as a dynamic model with q factors, q < r, and developed estimation methods for the factor structure with finite-dimensional factor space. Forni and Lippi (2011) proposed a one-sided solution for the general dynamic factor model of Forni et al. (2000). Forni et al. (2015) proposed a model with possibly infinite-dimensional factor spaces and obtained a one-sided representation for the dynamic factor model. While some of these models (Stock and Watson 1988, 2002; Bai and Ng 2002; Forni et al. 2009) make assumptions on the relation between the series under study and the way factors are loaded, others (Forni et al. 2000; Forni et al. 2015) do not make any assumption of that type. Hallin and Lippi (2013) give a general presentation of the methodological foundations of dynamic factor models.
The procedure we propose is different from these approaches as follows: (1) it is entirely data-analytic and does not assume any given model; (2) it does not assume a fixed number of factors to be identified. Instead, the number of components is chosen to achieve a desired degree of accuracy in the reconstruction of the original series. These differences are the usual ones between the two classical approaches for dimension reduction: principal components and factor models.
In this article, we address the sample reconstruction of a vector of time series from the DPCs using a finite number of lags. Even if we do not prove consistency, some interesting features of our procedure with respect to previous ones are: (1) the DPC we propose need not be a linear or stationary combination of the data, and consequently it may adapt better to possible nonstationary behavior of the series; and (2) it can be easily robustified by changing the loss function to be minimized, for example, from the mean squared error criterion to a robust scale.
The rest of this article is organized as follows. In Section 2, we describe the proposed dynamic principal components of a vector time series based on a reconstruction criterion. In Section 3, we study the particular case where the proposed dynamic principal components depend only on one lag. In Section 4, we show the results of a Monte Carlo study that compares the proposed dynamic principal components procedure with the ordinary principal components used in a dynamic way and with Brillinger's DPC procedure. The comparison is done for both stationary and nonstationary series. We also compare our procedure with the one developed by Forni et al. (2009) for the factor models introduced in that work. We show the good performance of our proposal in all these models, including the case of a very large number of series. We also apply our procedure to two real examples of nonstationary vector time series. In Section 5, we define robust dynamic principal components using a robust reconstruction criterion and illustrate in one example the ability of this estimator to drastically reduce the influence of outliers. In Section 6, some final conclusions are presented. Supplemental material containing mathematical derivations is available online.

Finding Time Series with Optimal Reconstruction Properties
Suppose that we observe z_{j,t}, 1 ≤ j ≤ m, 1 ≤ t ≤ T, and consider two integers k_1 ≥ 0 and k_2 ≥ 0. We define the first dynamic principal component with k_1 lags and k_2 leads as a vector f = (f_t), -k_1 + 1 ≤ t ≤ T + k_2, such that the reconstruction of the series z_{j,t}, 1 ≤ j ≤ m, as a linear combination of (f_{t-k_1}, ..., f_{t+k_2}) is optimal under the mean squared error (MSE) criterion. More precisely, given a possible factor f, an m × (k_1 + k_2 + 1) matrix of coefficients γ = (γ_{j,i}), 1 ≤ j ≤ m, -k_1 ≤ i ≤ k_2, and α = (α_1, ..., α_m)', the reconstruction of the original series z_{j,t} is defined as

\hat{z}_{j,t} = \sum_{i=-k_1}^{k_2} γ_{j,i} f_{t+i} + α_j.

Let k = k_1 + k_2 and put

β_{j,i} = γ_{j,i-k_1}, 0 ≤ i ≤ k, and f*_t = f_{t-k_1}.   (5)

Then, the reconstructed series can also be obtained as

\hat{z}_{j,t} = \sum_{i=0}^{k} β_{j,i} f*_{t+i} + α_j.

Hence, without loss of generality we can use indistinctly k lags or k leads of the principal component to reconstruct the series. Although the reconstruction of the series with k lags is intuitively more appealing, we will derive the optimal solution for the case of k leads. The reason is that deriving the optimal solution using lags requires dealing with more cumbersome equations due to the occurrence of negative subscripts. In any case, once the forward optimal solution is obtained, we can immediately obtain the backward one using (5).
Given β = (β_{j,i}), 1 ≤ j ≤ m, 1 ≤ i ≤ k + 1, and α = (α_1, ..., α_m)', the MSE loss function when we reconstruct the m series using k leads is given by

MSE(f, β, α) = (1/(Tm)) \sum_{j=1}^m \sum_{t=1}^T r_{j,t}^2(f, β, α),   (6)

where r_{j,t}(f, β, α) = z_{j,t} - \sum_{i=1}^{k+1} β_{j,i} f_{t+i-1} - α_j are the reconstruction residuals. Note that this loss function is well defined and makes sense even in the case of nonstationary vector time series. The optimal choice is

(f̂, β̂, α̂) = argmin MSE(f, β, α).   (7)

Clearly, if f is optimal, γf + δ is optimal too. Thus, we can choose f so that \sum_{t=1}^{T+k} f_t = 0 and (1/(T + k)) \sum_{t=1}^{T+k} f_t^2 = 1. We call f̂ the first DPC of order k of the observed series z_1, ..., z_T. Note that the first DPC of order 0 corresponds to the first regular principal component of the data. Set r_{j,t}(f, β, α) = 0 for t outside the range 1 ≤ t ≤ T. Differentiating (6) with respect to f_t, we derive in Section 1 of the supplemental material the equation

\sum_{j=1}^m \sum_{i=1}^{k+1} β_{j,i} r_{j,t-i+1}(f, β, α) = 0, 1 ≤ t ≤ T + k,   (10)

which is a linear system in f. The coefficients β_j = (β_{j,1}, ..., β_{j,k+1})' and α_j, 1 ≤ j ≤ m, can be obtained using the least-squares estimator, that is,

(β_j', α_j)' = (F(f)'F(f))^{-1} F(f)' z^{(j)},   (11)

where F(f) is the T × (k + 2) matrix whose tth row is (f_t, f_{t+1}, ..., f_{t+k}, 1) and z^{(j)} = (z_{j,1}, ..., z_{j,T})'. Then, the first DPC is determined by Equations (10) and (11). The second DPC is defined as the first DPC of the residuals r_{j,t}(f̂, β̂, α̂). Higher-order DPCs are defined in a similar manner.

Computational Algorithm for the DPC
To define an iterative algorithm to compute (f̂, β̂, α̂), it is enough to give an initial value f^(0) and a rule describing how to compute (β^(h), α^(h)) from f^(h), and f^(h+1) from (β^(h), α^(h)). The following two steps, based on (10) and (11), describe a natural rule to perform this recursion.

Step 1. Based on (11), compute β^(h) and α^(h) by least squares, regressing each series z^(j) on F(f^(h)).

Step 2. Based on (10), define f^(h+1) as the solution of the linear system obtained by plugging β^(h) and α^(h) into (10), normalized so that it has zero mean and unit variance.

The initial value f^(0) can be chosen equal to the standard (nondynamic) first principal component, completed with k zeros. We stop the iterations when the relative decrease of the MSE between two consecutive iterations is smaller than some value ε.
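The two steps above can be sketched as an alternating least-squares routine. The following is an illustrative implementation (not the authors' code; names, defaults, and the stacked-regression form of Step 2 are our own choices), in which Step 2 is solved by stacking the m series into one least-squares problem for f:

```python
import numpy as np

def first_dpc(Z, k, n_iter=30, tol=1e-8):
    """Alternating least-squares sketch of the first DPC of order k.

    Z is a T x m data matrix (columns preferably centered). Returns the
    factor f (length T + k), the loadings beta ((k+1) x m), the
    intercepts alpha, and the final MSE.
    """
    T, m = Z.shape
    # f^(0): ordinary first principal component, completed with k zeros
    v = np.linalg.eigh(Z.T @ Z / T)[1][:, -1]
    f = np.concatenate([Z @ v, np.zeros(k)])
    prev = np.inf
    for _ in range(n_iter):
        # Step 1: given f, fit beta_j and alpha_j by least squares
        F = np.column_stack([f[i:i + T] for i in range(k + 1)] + [np.ones(T)])
        coef, *_ = np.linalg.lstsq(F, Z, rcond=None)    # (k+2) x m
        beta, alpha = coef[:k + 1], coef[k + 1]
        mse = np.mean((Z - F @ coef) ** 2)
        if prev - mse < tol:
            break
        prev = mse
        # Step 2: given beta and alpha, each z_{j,t} - alpha_j is linear
        # in (f_t, ..., f_{t+k}); update f by solving the stacked system
        A = np.zeros((T * m, T + k))
        for j in range(m):
            for i in range(k + 1):
                A[j * T + np.arange(T), np.arange(T) + i] = beta[i, j]
        b = (Z - alpha).T.ravel()
        f, *_ = np.linalg.lstsq(A, b, rcond=None)
        f = (f - f.mean()) / f.std()                    # identifiability
    return f, beta, alpha, mse
```

Since each step minimizes the same objective exactly in one block of variables, the recorded MSE is nonincreasing, and with the static-PC initialization the fit is never worse than the ordinary first principal component.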
Remark 2.1. Note that the dimensions of the matrices to be inverted to compute f^(h), β^(h), and α^(h) are independent of the number of time series, and therefore we can deal with a large number of variables.
Remark 2.2. Note also that there are no restrictions on the values of f; in particular, we do not assume, as in Brillinger, that its components must be linear combinations of the series. In this way, the values of f can adapt to the nonstationary character of the time series.
Assume we are considering p dynamic principal components of order k, and let β_{j,i,s}, 1 ≤ j ≤ m, 1 ≤ i ≤ k + 1, be the coefficient β_{j,i} corresponding to the sth component, 1 ≤ s ≤ p. Then, the number of values required to reconstruct the original series is the (T + k)p values of the p factors, plus the (k + 1)mp values of the coefficients β_{j,i,s}, plus the m intercepts α_j. Thus, the proportion of the original information required to reconstruct the series is ((T + k)p + (k + 1)mp + m)/(mT), and when T is large compared to k and m, this ratio is close to p/m. In practice, the number of lags used to reconstruct the series, k, and the number of principal components, p, need to be chosen. Of course, the accuracy of the reconstruction improves when either of these two numbers is enlarged, but the size of the information required will also increase. For large T, increasing the number of components introduces more values to store than increasing the number of lags. However, we should also take into account the reduction in MSE due to enlarging each of these numbers. In general, increasing the number of lags beyond some point will have a negligible effect on the MSE. Then, if the level of the MSE is larger than desired, a new component should be added. Thus, one possible strategy is to start with one principal component and increase the number of lags until the reduction obtained with further lags is smaller than some value ε. Then, a new principal component is introduced and the same procedure is applied. The process stops when the MSE reaches some satisfactory value. Note that this rule is similar to what is generally used for determining the number p in ordinary principal components.
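A worked check of the storage-ratio formula, with illustrative sizes of our own choosing:

```python
# Worked check of the storage ratio ((T + k)p + (k + 1)mp + m) / (mT);
# the sizes below are illustrative choices, not from the article.
T, m, k, p = 5000, 100, 10, 2
stored = (T + k) * p + (k + 1) * m * p + m   # factors + loadings + intercepts
ratio = stored / (m * T)                     # 12320 / 500000 = 0.02464
# close to p/m = 0.02, since T is large compared to k and m
```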

Dynamic Principal Components When k = 1
To illustrate the computation of the first DPC, let us consider the simplest case k = 1. Then, we search for β = (β_{j,i}), 1 ≤ j ≤ m, 1 ≤ i ≤ 2, and f = (f_1, ..., f_{T+1})' minimizing the loss (6). Suppose now that z_t is stationary. Then, in Section 2 of the supplemental material it is shown that, except near both ends of the series, f_t can be approximated by a linear combination of the stationary series z_{j,t} + \sum_{i=1}^{∞} c^i (z_{j,t+i} + z_{j,t-i}) and z_{j,t-1} + \sum_{i=1}^{∞} c^i (z_{j,t-1+i} + z_{j,t-1-i}), 1 ≤ j ≤ m, where |c| < 1. These series give the largest weight to periods t and t - 1, respectively, and the weights decrease geometrically as we move away from these values. We conjecture that in the case of the first DPC of order k, a similar approximation of f_t outside both ends by a stationary process can be obtained.

Simulation Results for the Stationary Case
We perform a Monte Carlo study using a vector series z_t = (z_{1,t}, z_{2,t}, ..., z_{m,t})', 1 ≤ t ≤ T, generated by a factor model, (14), in which the series load on f_t, f_{t-1}, and f_{t-2}, where f_t, -2 ≤ t ≤ T, and u_{i,t}, 1 ≤ t ≤ T, 1 ≤ i ≤ m, are iid random variables with distribution N(0, 1). We compute three different principal components: (1) the ordinary principal component used in a dynamic way with k lags to reconstruct the original series (OPC_k); (2) the dynamic principal component (DPC_k) proposed here; (3) Brillinger's dynamic principal components (BDPC_k) adapted for finite samples by truncating the filter in (3) at k lags and leads, where the c_h are the coefficients defined below (4) in Section 1. In this simulation, we used OPC_2, DPC_2, and BDPC_10. To reconstruct the original series with the three procedures, we used least squares. The BDPC_k components were computed using the R code developed for Hörmann, Kidziński, and Hallin (2014), which was kindly provided by the authors. However, we were not able to run this program for dimension m = 1000 because of lack of memory (our computer has 8 GB of installed memory). We performed 1000 Monte Carlo replications for each case, and in Table 1 we show the MSE of the reconstructed series. We observe that the performances of DPC_2 and BDPC_10 are quite similar and that the MSEs are close to one, that is, to the variance of the error terms u_{i,t}. We also observe that the MSE of the series reconstructed with the OPC_2 procedure is much larger.
In Figure 1, we plot the estimated factor loadings for a dataset satisfying (14) with T = 200 and m = 1000.
We observe that the estimated loadings follow a pattern quite close to the one satisfied by the true values: sin, cosine, and a linear trend.

Simulation Results in the Nonstationary Case: VARI(1, 1) Model
In this case, we consider a VARI(1, 1) m-dimensional vector series z_t generated as follows. Consider a stationary VAR(1) process x_t = A x_{t-1} + u_t, with iid innovations u_t, and define the integrated series z_t = z_{t-1} + x_t. We consider 1000 replications, and in each replication we generate a new matrix A of the form A = V Λ V', where V is an orthogonal matrix generated at random with uniform distribution and Λ is a diagonal matrix whose diagonal elements are independent with uniform distribution on the interval [0, 0.9]. We took m = 20, 100, and 200 and T = 400. We obtained the first principal component using the OPC_10, DPC_10, and BDPC_10 procedures. In Table 2, we show the percentage of explained variance for each procedure. We consider that, since in this case we are not dealing with a factor model, this measure of performance is easier to interpret than the MSE of the reconstructed series. We observe that the best performance is achieved by DPC_10. We also tried the different procedures with larger numbers of lags, obtaining essentially the same results.
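The data-generating mechanism above can be sketched as follows. The standard normal innovations are our assumption (the article does not specify the innovation distribution); the construction of A matches the description:

```python
import numpy as np

# Sketch of the VARI(1,1) generation described above.
rng = np.random.default_rng(0)
m, T = 20, 400

# A = V Lambda V' with V random orthogonal (QR of a Gaussian matrix)
# and Lambda diagonal with entries uniform on [0, 0.9]
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
lam = rng.uniform(0, 0.9, size=m)
A = V @ np.diag(lam) @ V.T

# stationary VAR(1): x_t = A x_{t-1} + u_t (u_t standard normal here,
# an assumption); integrated series z_t = z_{t-1} + x_t
x = np.zeros((T, m))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.standard_normal(m)
z = np.cumsum(x, axis=0)
```

Since A is symmetric with eigenvalues in [0, 0.9), the VAR(1) part is stationary, while the cumulative sum makes z_t integrated of order one.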

Simulation Results Using the Factor Model in Forni et al. (2009)
In this section, we compare by means of a Monte Carlo simulation the dynamic principal components procedures with the one developed by Forni et al. (2009) (FGLR) for a special class of factor models that may be seen either as static models with r factors or as dynamic models with q factors, q < r. In our Monte Carlo study, we consider a vector series z_t = (z_{1,t}, z_{2,t}, ..., z_{m,t})', 1 ≤ t ≤ T, generated by loading the vector of static factors f_t = (f_{1,t}, f_{2,t})' plus idiosyncratic errors u_{i,t}, where all the u_{i,t} are independent with distribution N(0, 1). The factor vector satisfies the autoregressive model f_t = A f_{t-1} + v_t b, 1 ≤ t ≤ T, where the v_t are independent N(0, 1), A is a 2 × 2 diagonal matrix with diagonal (-0.8, 0.7), and b = (1, 1)'. We took m = 5, 100, and 1000, T = 200 and 400, and the number of replications was 1000. Note that in this case r = 2 and q = 1. In Table 3, we show the MSEs of the reconstructed series. The code for the FGLR procedure was kindly provided by the authors of Forni et al. (2009). As explained in Section 4.1, we were not able to run the BDPC_10 procedure with m = 1000. We observe that for small m the MSEs of the series reconstructed with the principal components procedures are smaller than those reconstructed with FGLR. Instead, for large m, both reconstruction errors are close to the variance of the idiosyncratic component u_{i,t}. The reason for this is that the goal of factor analysis models is to account for cross-correlations (the instantaneous ones in the static case, the instantaneous and lagged ones in the dynamic case), while the goal of principal components is dimension reduction, that is, to obtain an approximate reconstruction of the original data based on a small number of unobserved variables. However, it can be proved that when m increases, the MSEs of the series reconstructed using dynamic factor analysis or dynamic principal components converge to the variance of the idiosyncratic component.
Therefore, the comparison between both principal component methods and the FGLR factor analysis method is only relevant when m is large. The results presented in Table 3 indicate that both DPC and FGLR give very good results in this case.

Example 1
We illustrate the DPC with a small dataset of six series corresponding to the Industrial Production Index (IPI) of France, Germany, Italy, the United Kingdom, the USA, and Japan. We use monthly data from January 1991 to December 2012, taken from Eurostat. The six series are plotted in Figure 2.
In Table 4, we show the percentage of variance explained by the OPC_k and DPC_k procedures for k = 1, 5, 10, 12 and by the BDPC_12 procedure. The reason to take only k = 12 for the BDPC is to stay close to the original Brillinger definition.
We note that the reconstruction of the series using the DPC is notably better than those obtained by means of the OPC and BDPC procedures with the same number of lags. Increasing the number of lags obviously improves the reconstruction obtained by both components, although the improvement is larger for the DPC. Figure 3 shows the boxplots of the absolute values of the reconstruction errors for the OPC_12 and DPC_12 procedures. Note that the reconstruction errors are significantly smaller when using the DPC_12 procedure.

Example 2
In this example, the dataset is composed of 30 daily stock prices from the Madrid stock market corresponding to the 251 trading days of the year 2004. The source of the data is the Ministry of Economy, Spain. In Table 5, we show the percentage of variance explained by the OPC_k, DPC_k, and BDPC_k procedures. We observe that the best performance corresponds to the DPC_k procedure. As shown in Table 5, including lags in the OPC does not make much difference in the results, but it has an important effect on the DPC. In Figure 4, we show, for the 30 stock prices, the percentage of variance explained by the DPC_5 against the one explained by the OPC_5. We note that for most of the variables the best performance corresponds to the DPC_5 procedure. Figure 5 presents the first OPC_5 and DPC_5. The first DPC_5, which is much smoother than the first OPC_5, seems to be very useful to represent the general trend of the set of time series.

Robust Generalized DPCS
Like most procedures defined by minimizing the mean squared error, the DPC given by (7) is not robust. In fact, a very small fraction of outliers may have an unbounded influence on (f, α, β). The procedure proposed by Brillinger seems to be very difficult to robustify. At first sight, it may seem that it can be robustified by using a robust estimator of the spectral matrix. For example, Spangl and Dutter (2005) and Li (2012) proposed robust estimators of the spectral matrix. However, this is not enough to obtain robust DPCs. In fact, a robust estimator of the spectral matrix only guarantees the robustness of the coefficients of the linear combinations defining the principal components. However, the result of applying these coefficients to the original series may still be largely affected by outlying observations. Instead, the DPC procedure defined by (7) is easier to robustify. A standard way to obtain robust estimates for many statistical models is to replace the minimization of the mean squared error by the minimization of a robust M-scale. This strategy has been used for many statistical models, including, among others, linear regression (Rousseeuw and Yohai 1984), the estimation of a scatter matrix and multivariate location for multivariate data (Davies 1987), and the estimation of ordinary principal components (Maronna 2005). The estimators defined by means of a robust M-scale are called S-estimators. In this section, we introduce S-estimators for DPC. Special care is required for time series with strong seasonality. The reason is that when the values corresponding to a particular season are very different from the others, they will be considered as outliers by a robust procedure and therefore downweighted. As a consequence, the reconstruction of this season may be affected by large errors. Thus, the procedure we present here assumes that the series have been seasonally adjusted to avoid this problem.

Generalized S-DPCs
Let ρ_0 be a function which is symmetric, nondecreasing for x ≥ 0, and such that ρ_0(0) = 0. Given a sample x = (x_1, ..., x_n), the M-scale estimator S(x) is defined as the solution s of

(1/n) \sum_{i=1}^n ρ_0(x_i/s) = b.

If ρ_0 is bounded, then the breakdown point to ∞ of S(x), that is, the minimum fraction of outliers that can take S(x) to ∞, is b/max ρ_0. Moreover, the breakdown point to 0, that is, the minimum fraction of inliers that can take S(x) to 0, is 1 - (b/max ρ_0). Note that if b/max ρ_0 = 0.5, both breakdown points are 0.5 (see Section 3.2.2 in Maronna, Martin, and Yohai 2006). In what follows, we assume without loss of generality that max ρ_0 = 1. Moreover, ρ_0 is chosen so that E_Φ(ρ_0(x)) = b, where Φ is the standard normal distribution. This condition guarantees that for normal samples S(x) is a consistent estimator of the standard deviation. One very popular family of ρ functions is the Tukey biweight family, defined by

ρ_c(x) = 1 - (1 - (x/c)^2)^3 if |x| ≤ c, and ρ_c(x) = 1 otherwise.

Given f = (f_1, ..., f_{T+k}), β_j = (β_{j,0}, β_{j,1}, ..., β_{j,k})', and α_j, define the residuals r_{j,t}(f, β_j, α_j) = z_{j,t} - \sum_{i=0}^{k} β_{j,i} f_{t+i} - α_j and r_j(f, β_j, α_j) = (r_{j,1}(f, β_j, α_j), ..., r_{j,T}(f, β_j, α_j))'. Let β be the m × (k + 1) matrix whose jth row is β_j', let α = (α_1, ..., α_m)', and define the objective

SRS(f, β, α) = \sum_{j=1}^m S^2(r_j(f, β_j, α_j)).   (20)

We define the first S-DPC f̂, with its coefficients β̂ and α̂, by

(f̂, β̂, α̂) = argmin SRS(f, β, α).

Note that β̂ and α̂ are the coefficients that should be used to reconstruct the z_{j,t}'s from f̂. We can observe that the only difference with the definition given in (7) is that instead of minimizing the MSE of the residuals, we minimize the sum of squares of the robust M-scales applied to the residuals of the m series. Put ψ = ρ_0', w(u) = ψ(u)/u, and s_j = S(r_j(f, β_j, α_j)). Then s_j satisfies

(1/T) \sum_{t=1}^T ρ_0( r_{j,t}(f, β_j, α_j)/s_j ) = b.   (19)

Define the weights

w_{j,t}(f, β_j, s_j) = w( r_{j,t}(f, β_j, α_j)/s_j ).

Differentiating (20) with respect to f_t, we derive in Section 3 of the supplemental material a weighted analogue of (10), equation (25). Let F(f) be the T × (k + 2) matrix with tth row (f_t, f_{t+1}, ..., f_{t+k}, 1), and let W_j(f, β, s) be the diagonal matrix with diagonal elements w_{j,1}(f, β_j, s_j), ..., w_{j,T}(f, β_j, s_j).
Then, differentiating (20) with respect to β_{j,i} and α_j, we derive in Section 3 of the supplemental material the weighted least-squares equations

(β_j', α_j)' = (F(f)' W_j(f, β, s) F(f))^{-1} F(f)' W_j(f, β, s) z^{(j)}.   (26)

Then, the first S-DPC is determined by Equations (19), (25), and (26). Note that the estimator defined by (7) is an S-estimator corresponding to ρ_0(u) = u^2 and b = 1; in this case, (25) and (26) become (10) and (11), respectively. The second S-DPC is defined as the first S-DPC of the residuals r_{j,t}(f̂, β̂, α̂). Higher-order S-DPCs are defined in a similar manner.
One important point is the choice of b. At first sight, b = 0.5 may seem a good choice, since in this case we are protected against up to 50% of large outliers. However, the following argument shows that this choice may not be convenient. The reason is that with this choice, the procedure has the so-called 50% exact fitting property. This means that when 50% of the r_{j,t}(f, β_j, α_j) are zero, the scale S(r_j(f, β_j, α_j)) is 0 no matter the values of the remaining residuals. Moreover, if 50% of the |r_{j,t}(f, β_j, α_j)| are small, the scale S(r_j(f, β_j, α_j)) is small too. Then, when b = 0.5, the procedure may choose f, β, and α so as to reconstruct well the values corresponding to only 50% of the data, even if the dataset does not contain outliers. For this reason, it is convenient to choose a smaller value of b, for example b = 0.10. In that case, to obtain S(r_j(f, β_j, α_j)) = 0, it is required that 90% of the r_{j,t}(f, β_j, α_j) be 0. One may wonder why for regression it is common to use b = 0.5 and the 50% exact fitting property does not cause the problems mentioned above. The reason is that in this case, if there are no outliers, the regression hyperplane fitting 50% of the observations also fits the remaining 50%. This does not occur in the case of the dynamic principal components.
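The M-scale with the Tukey biweight can be sketched with a standard fixed-point iteration (the iteration scheme and starting value are our own choices; the constants b = 0.1 and c = 5.13 are those used in Example 3 below):

```python
import numpy as np

def rho_biweight(u, c=5.13):
    # Tukey biweight normalized so that max rho = 1
    v = np.minimum((u / c) ** 2, 1.0)
    return 1.0 - (1.0 - v) ** 3

def m_scale(x, b=0.1, c=5.13, n_iter=200, tol=1e-12):
    """M-scale S(x): solve (1/n) sum rho((x_i)/s) = b by fixed point."""
    x = np.asarray(x, dtype=float)
    s = np.median(np.abs(x)) / 0.6745     # robust starting value
    if s == 0:
        return 0.0
    for _ in range(n_iter):
        # if mean rho > b the scale is too small, so increase it
        s_new = s * np.sqrt(np.mean(rho_biweight(x / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s
```

For Gaussian samples, the stated choice of (b, c) makes S approximately consistent for the standard deviation, and S is exactly scale-equivariant.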

Computational Algorithm for the S-DPC
An iterative algorithm similar to the one described for the DPC in Section 2 can be used to compute the S-DPC. The only difference is that it should be based on (25) and (26) instead of on (10) and (11).
The initial value f^(0) can be chosen equal to a regular (nondynamic) robust principal component, for example, the one proposed in Maronna (2005). Once f^(0) is computed, we can use this value to compute the matrix F^(0) whose ith row is (f^(0)_i, ..., f^(0)_{i+k}, 1). The jth row of β^(0) and α^(0)_j can then be obtained using a regression S-estimator taking z^{(j)} as response and F^(0) as design matrix. Finally, s^(0)_j = S(r_j(f^(0), β^(0)_j, α^(0)_j)). A procedure similar to the one described at the end of Section 2 can be used to determine a convenient number of lags and components, replacing the MSE by the SRS.
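The regression S-estimation step can be sketched with a simple iteratively reweighted least-squares (IRLS) scheme. This is a simplified stand-in for a full S-regression (which typically uses random resampling for initialization); all names, constants, and the plain least-squares start are our own choices:

```python
import numpy as np

def rho_biweight(u, c=5.13):
    v = np.minimum((u / c) ** 2, 1.0)
    return 1.0 - (1.0 - v) ** 3

def m_scale(x, b=0.1, c=5.13, n_iter=200, tol=1e-10):
    s = np.median(np.abs(x)) / 0.6745
    for _ in range(n_iter):
        s_new = s * np.sqrt(np.mean(rho_biweight(x / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s

def w_biweight(u, c=5.13):
    # w(u) = psi(u)/u for the biweight, up to a constant factor
    v = (u / c) ** 2
    return np.where(v <= 1.0, (1.0 - v) ** 2, 0.0)

def s_regression_irls(X, y, n_iter=50):
    """IRLS sketch of a regression S-estimate; last coefficient is the
    intercept. Starts from ordinary least squares."""
    X1 = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    for _ in range(n_iter):
        r = y - X1 @ coef
        s = m_scale(r)
        W = w_biweight(r / s)          # large residuals get weight 0
        coef = np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (W * y))
    return coef
```

Because observations with large scaled residuals receive zero weight, the fit is driven by the bulk of the data rather than by the outliers.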

Example 3
We use the data of Example 2 to illustrate the performance of the robust DPC. The dataset was modified as follows: each of the 7530 values composing the dataset was contaminated with probability 0.05 by adding 20 to the real value. In Table 6, we show the MSEs of the series reconstructed with the DPC. Since the DPC is very sensitive to the presence of outliers, we also compute the S-DPC. Since the MSE is also very sensitive to outliers, we evaluate the performance of the dynamic principal components procedures using the SRS criterion. We take as ρ_0 the bisquare function with c = 5.13 and b = 0.1. These values make the M-scale consistent for the standard deviation in the Gaussian case. Table 6 gives the MSEs and the SRSs for the DPC_k and S-DPC_k procedures for k = 1, 5, and 10. We observe that the robust measure of performance, SRS, corresponding to the S-DPC is much smaller than the one corresponding to the DPC. In Figure 6, we show the boxplots for the first 16 stock prices with the DPC_k and with the S-DPC_k. For a better visualization, we have eliminated the outliers larger than 10. These boxplots show that the S-DPC_k is much less affected by the outliers than the DPC_k. The boxplots of the 14 remaining stocks are similar, but are not shown for brevity.
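The contamination scheme above is straightforward to reproduce; here it is applied to a synthetic stand-in for the 251 × 30 price panel (the actual stock-price data is not reproduced here):

```python
import numpy as np

# Contamination scheme of Example 3: each value is modified, with
# probability 0.05, by adding 20 to it. Z below is synthetic data
# standing in for the 251 x 30 panel of stock prices.
rng = np.random.default_rng(0)
Z = rng.standard_normal((251, 30))
mask = rng.random(Z.shape) < 0.05   # about 5% of the 7530 values
Zc = Z + 20 * mask
```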

Conclusions
We have proposed two dynamic principal components procedures for multivariate time series: the first one using a mean squared error criterion to evaluate the reconstruction of the original time series, and the second one using a robust scale criterion. Both criteria make sense even in the case of nonstationary series. A Monte Carlo study shows that the proposed dynamic principal component based on the MSE criterion can considerably improve the reconstruction obtained using ordinary principal components in a dynamic way. In the case of stationary series, the performance of the proposed procedure is comparable to a finite-sample version of Brillinger's approach, and in the case of nonstationary series our procedure seems to behave better. We have also shown in an example that the robust procedure based on a robust scale is not much affected by the presence of outliers.
A simple heuristic rule to determine a convenient value for the number of components, p, and the number of lags, k, is suggested. However, further research may lead to better methods to choose these parameters trading off accuracy in the series reconstruction and economy in the number of values stored for that purpose.
Although the proposed DPCs seem to be very powerful for data reconstruction, they have some limitations for forecasting, because they use information from leads as well as lags to reconstruct the series. However, they may be useful to find the dimension of the factor space in factor models, and this will be the subject of further research.