A new multivariate EWMA scheme for monitoring covariance matrices

Abstract To monitor covariance matrices, most of the existing control charts are based on some omnibus test and thus usually are not powerful when one is interested in detecting shifts that occur in a small number of elements of the covariance matrix. A new multivariate exponentially weighted moving average control chart is developed for the monitor of the covariance matrices by integrating the classical -norm-based test with a maximum-norm-based test. Numerical studies show that the new control chart affords more balanced performance under various shift directions than the existing ones and is thus an effective tool for multivariate SPC applications. The implementation of the proposed control chart is demonstrated with an example from the health care industry.


Introduction
In modern manufacturing and service industries, it is common to simultaneously monitor several quality characteristics of a process (Alfaro et al. 2009;Du, Lv, and Xi 2012). Recently, a great deal of research interest has been directed at multivariate statistical process control (MSPC) and the techniques thus developed have been popular in areas such as signal processing, network security, health care surveillance, image processing, genetics, finance and economics. Woodall and Montgomery (1999) and Stoumbos et al. (2000) pointed out that the multivariate control charts are one of the most rapidly developing areas of SPC and suggested that basic and applied research should be conducted on methods for monitoring multiple parameters in a process. An extensive overview is given by Bersimis, Psarakis, and Panaretos (2007) for the existing multivariate control charts.
In addition to the mean vector, it is also important to monitor changes in the covariance matrix in many industrial settings. Whether a process is in-control (IC) or not could very well depend on the dispersion and interaction of multiple variables, represented by the elements in the covariance matrix. In the health care industry, for example, the foetal state may be determined using a technique called cardiotocography which measures multiple features such as the number of accelerations per second, the number of foetal movements per second and so on. Any significant deviation of the mean vector or the covariance matrix of those features from the normal state indicates that the foetus may have a health problem. Detecting the deviations as soon as they occur facilitates timely and effective treatments. In a high-dimensional case, the covariance matrix would contain more elements of interest and thus deviations occur with a higher probability. In other words, monitoring the covariance matrix is even more important in high-dimensional cases.
In this study, we focus on the on-line monitoring of the process covariance matrix. We assume that an IC reference sample of size m 0 has been collected in the Phase I analysis, and it can be used for estimating certain IC parameters. To be more specific, we assume that there are m 0 independent and identically distributed historical reference observations, x −m 0 +1 , . . . , x 0 ∈ R p , for some integer, p ≥ 1, and the i th future observation, x i , is collected over time from the following multivariate change-point model . . . ,0,1,. . . ,τ ;N p (μ,OC ), where τ is the unknown change point and IC = OC are, respectively, the IC and out-of-control (OC) covariance matrices.
In this study, as our focus is on monitoring the covariance matrix, it is assumed that μ would not change, which is consistent with the existing literature. As a Phase II SPC convention, it is usually assumed that the IC parameters are known or, equivalently, that they are estimated from a sufficiently large reference data-set. Without loss of generality, (μ, IC ) can be simplified to (0, I p× p ). Some multivariate control charts in this sphere have been proposed and discussed based on the multivariate exponentially weighted moving covariance matrix (MEWMC) recently. See in Yeh, Huwang, and Wu (2005), Yeh, Lin, and McGrath (2006) and Huwang, Yeh, and Wu (2007). More specifically, let a subgroup of size n at time t, x t,1 , x t,2 , . . . , x t,n , be given, where n is an integer great than or equal to 1. The MEWMC is defined as that, for t = 1, 2, . . ., where and 0 = I p× p . Based on the MEWMC, Hawkins and Maboudou-Tchao (2008) proposed a control chart, termed "HMT" chart here, and chosen its charting statistic to be the likelihood ratio statistic (Alt 1984), It is natural to define the charting statistic as a likelihood ratio. Similar consideration can be found in Yeh, Huwang, and Wu (2004). The control charts mentioned above with omnibus charting statistics like (4) are powerful for detecting shifts that occur in the majority elements of . In practice, however, shifts often occur in only a few of the variance and/or covariance elements (Yeh, Li, and Wang 2012). This is the so-called "sparsity" characteristic. A fault is more likely to be caused by a hidden source, which is reflected in unknown changes characterised by this sparsity (see Zou and Qiu 2009). In such cases, more powerful control charts are needed and two control charts have been thus developed via penalised likelihood estimation. Li, Wang, and Yeh (2013) developed a penalized likelihood ratio (PLR) control chart under the premise that the OC covariance matrix OC remains sparse in the sense that only a few diagonal elements are not equal to 1 and a few off-diagonal elements are not equal to 0 (assuming IC = I p× p ). Let the subgroup size n be larger than or equal to the dimensionality p and the observation at time t be x t , they defined a precision matrix ρ = −1 as the solution to the following penalised likelihood function modified from that developed by Rothman et al. (2008): where A 1 = p j=1 p i=1 |a i j | for A = [a i j ] p× p and ρ is a data-dependent tuning parameter which can be tuned to achieve different levels of sparsity of the . Then the charting statistic of the PLR chart can be written as If we change S t to t in Equations 5 and 6, a MEWMC-based PLR chart can be obtained. For the cases with individual observations, Yeh, Li, and Wang (2012) discussed a Lasso-MEWMC chart, termed "LMEWMC" chart in this paper. Its ρ is defined as the solution to the following penalised likelihood function: Correspondingly, the charting statistic of the LMEWMC chart is Both the PLR chart and the LMEWMC chart have been shown to perform well in many cases. However, their appealing performance depends heavily on the choice of ρ, the regularisation parameter which plays a vital role in balancing the robustness and sensitivity of the two charts against various shifts. Unfortunately, this tuning parameter is not easy to set as demonstrated by Yeh, Li, and Wang (2012) and Li, Wang, and Yeh (2013). Related discussions on this issue are provided by Zou and Qiu (2009), Wang and Jiang (2009), Capizzi and Masarotto (2011) and Zou, Ning, and Tsung (2012. They also used the penalised likelihood-based control schemes but the focuses are mainly on the monitoring of location parameters. To circumvent this problem of setting the regularisation parameter, a new control chart based on the MEWMC is proposed for monitoring deviations in a covariance matrix in this paper. A new charting statistic that takes the sparsity characteristic into account, together with a L 2 norm and a maximum norm, is developed accordingly. The proposed test offers more balanced protection against various possible shifts than the HMT chart in the sense that it is much more sensitive to process shifts characterised by sparsity yet its ability to detect shifts not characterised by sparsity is only slightly inferior to that of the HMT chart. In addition, the proposed control scheme often delivers detection results approaching the best result of the LMEWMC chart and the PLR chart. Unlike the two charts, the new approach does not require the setting of any meta parameters (except for λ when applying an EWMA-type control chart) and does not involve numerical minimisation. We describe in detail the MSPC scheme in Section 2, and investigate its numerical performance in Section 3. In Section 4, we demonstrate the proposed method by applying it to a real data-set from the health care industry. We conclude with several remarks in Section 5.

The proposed MaxNorm chart
Recall the MEWMC sequence defined in (2). When the process is IC, t would close to the identity matrix. Hence, monitoring changes in the covariance matrix is tantamount to determining whether t is significantly different from I p× p . Let C t = t − I p× p . We can then examine the deviation of variance/covariance elements in C t from 0. Define is the element in covariance matrix C t and i ≤ j. Then d t is a [ p( p + 1)/2] × 1 vector consisting of p diagonal elements and p( p − 1)/2 upper triangular off-diagonal elements.
Under the IC condition, d t is a natural estimator of the zero vector 0 p( p+1)/2 = (0, 0, . . . , 0) . Thus in Phase II monitoring, a significant deviation of d t from 0 p( p+1)/2 would indicate a change in the process. A similar idea of arranging all the elements in a covariance matrix in a vector and compare it with the IC vector was discussed as early as in Yeh, Huwang, and Wu (2005). An intuitive way of assessing the deviation is to calculate the distance between the two vectors based on a certain type of measure. Two commonly used measures, the L 2 norm and the L ∞ norm, defined by and are adopted in this study. When t deviates farther from I p× p , T t 1 and/or T t 2 will tend to have larger values. T t 1 is similar to the locally most powerful invariant test for sphericity under the multivariate normality assumption (John 1971), defined as Some control charts are essentially developed based on Q J . See Yeh, Lin, and McGrath (2006) for an excellent review. Intuitively speaking, T t 2 is more effective than T t 1 if potential shifts occur in only a few elements in . On the other hand, T t 1 would outperform T t 2 if shifts occur in a moderate to large number of covariance elements. Hence, combining these two statistics offers a more balanced detection ability. At time t, let E(T t 1 ) and E(T t 2 ) denote the expectations of T t 1 and T t 2 under the IC state, respectively. Also let Var(T t 1 ) and Var(T t 2 ) denote the corresponding variances. When t tends to infinity, the expectations and variances of T t 1 and T t 2 tend to their limits correspondingly. In this study, the limits are estimated through Monte Carlo simulation. Let E(T 1 ), E(T 2 ), Var(T 1 ) and Var(T 2 ) be those limits. Then the charting statistic of our proposed control chart, termed the "MaxNorm (maximum of the two norms) chart" for convenience, is naturally defined, for t ≥ 1, as Both T t 1 and T t 2 are standardised so that they are comparable. Choosing the "max", instead of the "average" or "min", as the combination method is in purpose of deriving a sensitive control chart to the shifts. This method is similar to that of Zou and Qiu (2009), who combined a sequence of "directional" likelihood ratio test statistics obtained from a LASSO-type minimisation.
After generating random data-sets from N (0, I p× p ), one can plot T t MaxNorm against time t and an OC signal will be issued as soon as the value of T t MaxNorm exceeds a pre-determined upper control limit (UCL).

Performance studies
In this section, we compare the chart performance of the proposed MaxNorm chart with that of the other three control charts introduced in Section 1, including the HMT chart, the LMEWMC chart and the MEWMC-based PLR chart. Note that the original PLR chart discussed by Li, Wang, and Yeh (2013) was built on the sample covariance matrices. Therefore, we also conduct a comparison between the MaxNorm chart with its λ being 1 and the original PLR chart in this section. In particular, we study the OC average run length (ARL), ARL 1 , assuming that all control charts are designed to have the same IC ARL, ARL 0 . Following the simulations conducted in Hawkins and Maboudou-Tchao (2008), Yeh, Li, and Wang (2012) and Li, Wang, and Yeh (2013), we set λ = 0.1 for all the control charts. Moreover, we set ARL 0 to 200 the simulation size to 10, 000. Assume that = I p× p when the process is IC and = OC = I p× p when the process is OC, where OC is some specific covariance matrix. To evaluate the performance of control charts under various situations, we follow Li, Wang, and Yeh (2013)  to test seven specific OC 's, denoted by OC j , j ∈ {1, 2, . . . , 7}, in the simulation studies. The seven covariance matrices (see Appendix) can be divided into three categories. In the category (I), covariance matrices OC 1 and OC 2 , the shifts are induced in a different number of diagonal elements. The shifts in off-diagonal elements are induced in the category (II), covariance matrices OC 3 and OC 4 . In the category (III), covariance matrices OC 5 , OC 6 and OC 7 , the shifts are induced in both the diagonal and off-diagonal elements. Note that OC j , j ∈ {2, 4, 6}, were previously considered in Hawkins and Maboudou-Tchao (2008) and OC j , j ∈ {2, 3, 4, 6}, were studied by Yeh, Li, and Wang (2012). Let us start with the comparison between the MaxNorm chart and the HMT chart. Notice that the HMT chart usually performs well when majority elements change in the covariance matrix. Therefore, the comparison is used to show the advan- tage of the MaxNorm chart in the cases of sparsity and also the satisfactory performance of the chart when majority elements change in the covariance matrix. In addition to the HMT chart, Yeh, Huwang, and Wu (2005) and Huwang, Yeh, and Wu (2007) proposed the MaxMEWMV chart and the MEWMS chart based on the MEWMC, respectively, both of which are powerful for detecting shifts that occur in the majority elements. We compared the three control charts through the simulations and found the following facts. The MEWMS chart with the charting statistic of tr( t ) is particularly powerful when the diagonal elements shifted in a covariance matrix. However, it is usually uncompetitive in detecting shifts of off-diagonal elements. The HMT chart and the MaxMEWMV chart have their merits in different scenarios of OC covariance matrix respectively. In general, the overall performance of the HMT chart and the MaxMEWMV chart are comparable, and both of which are better than the overall performance of the MEWMS chart under various OC scenarios. Therefore, in this study, we choose the HMT chart for the comparison.
We simulate the data based on the seven specific OC 's individually at the beginning of Phase II monitoring. The ARL 1 , which indicates the number of samples taken before the first OC signal is detected, is calculated. The UCL of each control chart for a given dimensionality p ∈ {5, 10, 30} and subgroup size n ∈ {1, 20, 50} is determined through Monte Carlo simulation before the monitoring process. Let the deviation where j ∈ {1, 2, . . . , 7}, increases from 1.6 to 8.0 in steps of 1.6. The overall performance, represented by ARL 1 and its standard deviation (STD), is presented in Tables 1-3. When the OC covariance matrix is one of the category (I) covariance matrices, the MaxNorm chart outperforms the HMT chart regardless of the dimensionality p or the subgroup size n. For a given n, the superiority of the MaxNorm chart grows as p increases. As increasing the dimensionality is equivalent to increasing the degree of sparsity of the covariance matrix, it can be concluded that, for the category (I) covariance matrices, the better performance of the MaxNorm chart becomes all the more apparent as the degree of sparsity increases.
For the category (II) covariance matrices, with a sufficiently large n (e.g. n ≥ 20 in the simulation study), the HMT chart performs better with a small p while the MaxNorm chart turns the tables when the degree of sparsity enhances. When n is small, even though the MaxNorm chart is still not able to outperform the HMT chart, the performance gap between the two narrows gradually as the sparsity increases.
In the case of the category (III) covariance matrices, with a fixed n, the MaxNorm chart tends to perform better than the HMT chart when the shifts are characterised by a more significant sparsity. The advantage of the MaxNorm chart in terms of sparsity becomes more significant when n becomes large.
In conclusion, the MaxNorm chart is usually competitive when shifts are characterised by sparsity. The superiority of the MaxNorm chart becomes more obvious when the degree of sparsity increases. When sparsity is not significant, the performance of the MaxNorm chart is similar to the HMT chart as long as the subgroup size is large enough. Therefore, the MaxNorm chart is in general good at detecting shifts in the covariance matrix.
Next, we compare the performance of the MaxNorm chart with that of the two penalised likelihood-based control charts, i.e. the MEWMC-based PLR chart when the subgroup size n ≥ 1 and the LMEWMC chart when n = 1. The regularisation parameter ρ in these two charts needs to be determined according to the specific OC covariance matrix to achieve the best performance of the charts. In other words, for a specific OC , the MEWMC-based PLR chart and the LMEWMC chart with To ensure that all the OC 's are singular in the following simulations, the deviation in 13 is set to change from 0.8 to 2.0 in steps of 0.4. Let the subgroup size n ∈ {1, 50} and the dimensionality p ∈ {5, 30}. The performance of the charts in terms of ARL 1 and its STD is shown in Tables 4-6. When n = 1, we compare the MaxNorm chart with both the MEWMC-based PLR chart and the LMEWMC chart. Tables 4 and 5 indicate that different ρ's give different ARL 1 's of the PLR chart and the LMEWMC chart when other parameters are kept constant. In the case of OC 2 with p = 30 and the cases of OC 3 and OC 4 with p = 5, the value of ρ even becomes the decisive factor for the performance of the two penalised likelihood-based control charts. When n = 50, we investigate the performance of the MaxNorm chart and that of the MEWMC-based PLR chart. The results are recorded in Table 6. Again, in the cases of OC 3 and OC 4 with either p = 5 or p = 30, inappropriate ρ's lead to the unacceptable performance of the MEWMC-based PLR chart.
Therefore, we can conclude that the performance of the MEWMC-based PLR chart and that of the LMEWMC chart depend crucially on ρ and that this parameter must be set carefully. However, the optimal value of ρ is partially determined by the specific OC which is usually unknown in reality. Setting an appropriate ρ seems to be the major obstacle to applying the MEWMC-based PLR chart and the LMEWMC chart in real cases.
In spite of the impractical problem, we compare the ARL 1 's of the two penalised likelihood-based control charts with those of the MaxNorm chart. In most cases, the MaxNorm chart performs as well as or even better than the two charts at their peak among a grid value of ρ's. Though sometimes the MaxNorm chart is not competitive to the best performance of the other two charts, it will never lead to a detection performance worse than the penalised likelihood-based charts when inappropriate values of ρ are selected. In the case of a higher dimensionality and/or a larger subgroup size, the MaxNorm chart carries a more significant advantage. Based on the above analysis, the MaxNorm chart is recommended when there is no foreknowledge of the OC covariance matrix. Recall that the original PLR chart discussed by Li, Wang, and Yeh (2013) was constructed on the sample covariance matrices. Therefore, in the following, we compare the performance of the original PLR chart with that of the MaxNorm chart with λ = 1 in (2). Again let ρ ∈ {0.05, 0.20, 1.00} and change from 0.8 to 2.0 in steps of 0.4. Note that it requires n ≥ p in the PLR chart. Therefore, we consider the simulations with p ∈ {5, 10} when n = 20 and p ∈ {5, 30} when n = 50. Corresponding results are recorded in Tables 7 and 8. The results reflect the same facts we have discussed above. The value of ρ plays an important role in determining the performance of the PLR chart. Its impact can be easily observed in the cases of OC j , j ∈ {3, 4}. In addition, the MaxNorm chart always performs better than the PLR chart with an improper ρ and in some cases outperforms the PLR chart at its peak among a grid value of ρ's. We thus conclude that the advantage of the MaxNorm chart in real applications is valid on the whole.
In the above simulations, we set λ = 0.1 for all the cases. To verify that the relative performance of these control charts does not change with the value of λ, we further evaluate the performance of the MEWMC-based PLR chart, the HMT chart and the MaxNorm chart with λ ∈ {0.1, 0.2, 0.3} when p = 10 and n = 20. The simulation result is presented in Tables 9 and 10. It indicates that, though the absolute value of ARL 1 of each control chart changes with the value of λ, the relative performance is robust and interrelated to the smoothing parameter. Also, the findings from above simulations are still valid randomly extracting the standardised observation in Phase I with replacement, instead of randomly generating variables from a multivariate normal distribution. Through this procedure, the control limits for T t MaxNorm and T t HMT are determined to be 0.5215 and 52.50, respectively. In Phase II, to alleviate the effect the order of the incoming data streams might have, a bootstrap type testing procedure is considered. That is, a random sample of size 295 is drawn from the 295 standardised observations labelled "suspect" without replacement. Then both charts are applied to this random example and the run lengths are recorded. This procedure is repeated 100, 000 times and the resulting ARL 1 's are found to be 35.25 and 79.28 for the MaxNorm chart and the HMT chart respectively.
After the detection, we look back to investigate the reason that the MaxNorm chart significantly outperforms the HMT chart in this real example. Note that we have assumed that I = I p× p and standardised all observations in this example. Therefore, we compare the followingˆ I I , derived from the 295 standardised observations labelled "suspect", with I p× p . Obviously, inˆ I I , there are only a small part of the elements that are significantly different from the expected values, 0 for the off-diagonal elements and 1 for the diagonal elements. Recall that in Section 3, we have shown that the proposed MaxNorm chart is more sensitive when the dimensionality/the degree of sparsity is high. The results in this section certainly confirm this.

Conclusion
As mentioned in Sections 1 and 3, Hawkins and Maboudou-Tchao (2008) proposed a chart based on MEWMC (i.e. the HMT chart) that is particularly effective when shifts occur in the majority elements. However, its performance is disappointing when the shifts are characterised by sparsity. With the consideration of sparsity, a LMEWMC chart was developed particularly for the cases with individual observations (Yeh, Li, and Wang 2012). And later, a PLR chart based on the sample covariance matrices was proposed by Li, Wang, and Yeh (2013) for the more general cases with the subgroup size n ≥ p, where p denotes the dimensionality. Both the LMEWMC chart and the MEWMC-based PLR chart, which is derived from the original PLR chart, perform better than the HMT chart in most cases characterised by sparsity. However, their performance relies heavily on the regularisation parameter ρ, which is difficult to determine without foreknowledge of the specific OC covariance matrix. Therefore, an insurmountable obstacle exists in the applications of the LMEWMC and the MEWMC-based PLR charts.
To overcome these problems, a new multivariate control chart based on the MEWMC, termed "the MaxNorm chart", is proposed in this study. Note that a modified MaxNorm chart based on the sample covariance matrices can be easily derived with the same idea. Composing the charting statistic with a L 2 norm and a L ∞ norm, the proposed chart is able to detect various shifts in the covariance matrix satisfactorily, as demonstrated through simulation studies. A real example from the health care industry further shows that the proposed MaxNorm chart can also work well in applications.
In this study, we consider, in particular, the monitoring of the covariance matrix and assume that the mean vector is constant and known. In practice, however, the mean vector rarely stays the same and thus the impact of the mean shifts should be investigated. Then, according to how the mean shifts affect the shift detection in the covariance matrix, a single chart can be developed in a future study for simultaneously monitoring the mean vector and the covariance matrix. Related problems have been studied by several researchers, including Chen, Cheng, and Xie (2005), Zhang, Zou, andZhang, Li, andWang (2010). Moreover, notice that the observations are usually assumed to come from a multivariate normal process in MSPC. But when that is not the case, a robust multivariate control chart for monitoring the mean vector and/or the covariance matrix would be needed.