Adaptive Sampling for Monitoring Multi-Profile Data with Within-and-between Profile Correlation

Abstract Multi-profile data can provide within-and-between profile information for efficiently modeling and monitoring system status. In practice, however, acquisition of such data requires a large number of sensors, which raises various concerns and difficulties, for example, cost, energy, and data transmission bandwidth, in accessing the full data from each sensor. In this article, we propose an adaptive sampling strategy for multi-profile monitoring that uses only a limited portion of the data. The proposed sampling and monitoring scheme incorporates the within-and-between profile correlation and features a balance between random search and greedy search in identifying the most informative profiles. More specifically, multivariate functional principal component analysis (MFPCA) is used to capture the within-and-between profile correlation, and the MFPC scores are augmented for unobservable profiles to feed into a multivariate CUSUM chart. Two properties of the proposed method for allocating sampling resources among sensors are investigated. Numerical and case studies are conducted under various scenarios to demonstrate the effectiveness of the method.


Introduction
Profile data represent the functional relationship between a response variable and one or more explanatory variables, and are one of the most widely recorded data types in engineering practice (Wang and Tsung 2005; Fallahdizcheh and Wang 2022a). Some examples of profile data in engineering contexts can be found in tonnage readings within a stamping process (Jin and Shi 2000) and the relationship between engine torque and revolution speed (Amiri et al. 2010). With the development of data acquisition techniques and the prevalence of the industrial internet of things (IIoT), more and more sensors are equipped on systems to provide information for real-time monitoring and quality control purposes. This motivates the development of multi-profile modeling and monitoring techniques, which focus on extracting the between-profile correlation to enhance the monitoring performance. For example, in Paynabar, Zou, and Qiu (2016), tonnage profiles from four different channels/sensors were recorded and analyzed together to provide superior monitoring performance compared with using single-profile data.
However, in many engineering applications, due to various constraints, it is difficult to take advantage of all the profiles in real-time. We denote these constraints as "sampling constraints," which include but are not limited to bandwidth or energy for data transmission, data acquisition cost, and processing capability (Liu, Mei, and Shi 2015; Xian, Wang, and Liu 2018). For example, a semiconductor manufacturing process has hundreds or even thousands of sensors that generate terabyte-level streaming data (i.e., profile data) in real-time (Bleakie and Djurdjanovic 2013). It is impossible to store and monitor all data streams in real-time (limitation of data storage and processing capabilities). In volcano activity monitoring, the seismic sensors (generating periodic seismic profile data) located in different locations have limited battery life, and replacing the batteries is very expensive (Zhao, Li, and Wu 2020). As a result, only a subset of the sensors will be active while the rest remain in a sleep state (Sasidhar, Sreeresmi, and Rekha 2014). Another example is the use of unmanned aerial vehicles (UAV) to take and transmit real-time images (a type of profile data) (Mitchell and Chen 2013; Skorput, Mandzuka, and Vojvodic 2016), where the number of UAVs is often limited compared with the areas to be monitored. It is clear from these examples that the key challenge for monitoring under sampling constraints is to strategically select the most informative sensors in real-time. In this article, we focus on developing such a real-time monitoring framework for multi-profile data, where the within-and-between profile correlations are considered under sampling constraints.
Adaptive sampling is an effective solution to monitoring under sampling constraints (Montgomery 2007;Li and Qiu 2014). The basic idea of adaptive sampling is to identify a partial set of observable sensors in each sampling step. The observable sensors in each sampling step will change dynamically and sequentially based on samples obtained from observable sensors in previous sampling steps. The key challenge for determining these observable sensors is how to handle the corresponding unobservable sensors in each sampling step. To deal with this issue, adaptive sampling first augments data for the unobservable sensors to facilitate data availability for all sensors, that is, observed data or augmented data. Then, local monitoring statistics, for example, cumulative sum (CUSUM), are constructed based on observed/augmented data for each sensor. These local statistics are compared and ordered, and sensors with high local monitoring statistics are selected as observable ones for the next sampling step. Based on this idea, different adaptive sampling methods have been developed in terms of how to augment unobservable sensors and how to generate local monitoring statistics.
The most straightforward way to augment unobservable sensors is to assign a fixed value, which was adopted in the top-r based adaptive sampling (TRAS, Liu, Mei, and Shi 2015). In this method, a tuning parameter was used to represent the data of all the unobservable sensors, and the CUSUM local statistics were compared to select sensors for the next sampling step. TRAS is effective in detecting shifts in both positive and negative directions and thus is superior to traditional adaptive methods, for example, the causation-based adaptive (CBA) algorithm (Li, Jin, and Shi 2008; Liu, Zhang, and Shi 2013). However, TRAS requires three assumptions that limit its practicability (Guo et al. 2020; Nabhan, Mei, and Shi 2021): (i) the value of the tuning parameter is assumed to be known and fixed, (ii) all the sensors are assumed to be iid, and (iii) all the sensors are assumed to be Gaussian distributed variables. The nonparametric adaptive sampling (NAS, Xian, Wang, and Liu 2018) was then proposed to relax the Gaussian assumption in TRAS and achieved superior performance in monitoring both Gaussian and non-Gaussian variables. Built upon the idea of NAS, the rank-based adaptive sampling (R-SADA, Xian et al. 2021) was proposed to deal with the issue of a fixed tuning parameter and demonstrated its effectiveness. The Thompson-sampling-Shiryaev-Roberts-Pollak (TSSRP, Zhang and Mei 2022) procedure was also proposed to exclude the impacts of the tuning parameter by formulating the adaptive sampling as a multi-armed bandit problem. However, NAS, R-SADA, and TSSRP still rely on the iid assumption and thus ignore the correlation among sensors. In fact, sensor correlations widely exist, and taking these correlations into consideration can significantly improve the monitoring performance (Guo et al. 2020; Nabhan, Mei, and Shi 2021). Many researchers have also realized the importance of considering sensor correlations in adaptive sampling. For example, the spatial-adaptive sampling and monitoring (SASAM, Wang et al. 2018) was proposed to consider the spatial correlation, where the information from spatially adjacent variables was used to augment data. The correlation-based dynamic sampling (CDS, Nabhan, Mei, and Shi 2021) employed the conditional expectation to infer unobservable sensor values and formulated the sampling resource allocation as an optimization problem. Thompson sampling and smooth-sparse decomposition were used in Guo et al. (2020) to extract information about sensor correlations, which was fed into a Bayesian hypothesis testing procedure to detect change points. Tensor completion was applied in Gómez, Li, and Paynabar (2022) to exploit the temporal and spatial correlation when implementing adaptive sampling.
Nevertheless, all the existing methods in adaptive sampling focus on scalar variables, that is, the data from each sensor is a scalar value. The key difference between a scalar and a profile is that a profile naturally contains within-profile correlation, while a scalar variable does not. This difference makes the direct extension of existing adaptive sampling methods to profile data a very challenging, if not impossible, task. This is because the adaptive sampling methods developed for scalar variables only consider augmentation for a single data point, while the augmentation of an unobservable profile requires information about the whole functional relationship. This fact also reveals that the information loss from missing a profile sensor is much greater than that from missing a scalar sensor. More specifically, incorporating the within-profile correlation in profile data results in two major challenges for developing an adaptive sampling framework. The first challenge is to appropriately model the within-profile correlation and integrate it with the between-profile correlation. An intuitive idea is to first fit each profile using regression models, for example, polynomials or splines, to characterize the within-profile correlation, and then quantify the between-profile correlation by modeling the correlation among the low-dimensional coefficients from the regression models. Unfortunately, such an idea only works when all profiles share the same regression basis. For example, if one profile is best fit with a polynomial model, and another profile is best fit with a spline model, then it is difficult to quantify the between-profile correlation since the low-dimensional coefficients in the two models are not comparable. On the other hand, some existing adaptive sampling methods considered the between-sensor correlation among scalar variables. However, it is not straightforward to generalize them to model within-and-between profile correlations.
For example, the CDS in Nabhan, Mei, and Shi (2021) used a multivariate Gaussian distribution to model correlations among multiple scalar variables. The extension of this method to model the within-and-between profile correlation in multi-profile data requires the construction of a multivariate Gaussian process (MGP). However, MGP incurs a very heavy computational load, which limits its application in real-time modeling and monitoring (Li et al. 2021). The second challenge is how to augment unobservable profiles. Different from scalar variables, an unobservable profile means missing a whole sample of the functional relationship. It requires the algorithm to augment not only the values of a profile but also the within-profile correlation among the augmented values, which makes direct data augmentation very challenging for profiles. For example, augmenting a profile with a fixed constant (as in TRAS) constructs a straight line, which cannot provide any information about the functional relationship in a profile. As a result, it is imperative to develop a new framework to overcome these challenges for adaptive sampling and monitoring of multi-profile data.
In this article, we propose to employ multivariate functional principal component analysis (MFPCA) to model the within-and-between profile correlation and realize the adaptive sampling for multi-profile data. MFPCA is a widely used tool for modeling multi-profile data (Paynabar, Zou, and Qiu 2016; Fang, Paynabar, and Gebraeel 2017; Fallahdizcheh and Wang 2022b). It decomposes multi-profile data into a summation of eigenfunctions multiplied by MFPC scores. The MFPC scores are mutually correlated random variables, whose dimension is the same as the number of profiles. The covariance matrix of the MFPC scores thus characterizes the between-profile correlation. The eigenfunctions are added together using the MFPC scores as weights, which captures the unique within-profile correlation of each profile. As a result, the challenge of modeling the within-and-between profile correlation can be appropriately handled by MFPCA. Our basic idea to handle the data augmentation challenge is to augment the MFPC scores instead of the raw profile data. The rationale is that the MFPC scores contain most of the correlation information about the profiles. As a result, MFPCA can well handle the two above-mentioned challenges, which makes it a preferable tool for adaptive sampling and monitoring of multi-profile data. To realize our idea, historical (fully observed) profile data are fed into the MFPCA to provide an empirical estimation of the eigenfunctions and MFPC scores, which is denoted as the offline stage. In the online monitoring stage, observable profiles can be used to directly calculate their MFPC scores, and the empirical MFPC scores are sampled to provide augmentation for the MFPC scores of unobservable profiles. After the MFPC scores (observed and augmented) are obtained, the CUSUM procedure is adopted to construct the monitoring statistics and control chart for detecting shifts in profiles.
The major contribution of our work is that we develop an adaptive sampling framework for multi-profile data, where the two above-mentioned challenges are addressed comprehensively by incorporating MFPCA. In addition, we investigate two properties of the developed multi-profile control chart, which provide theoretical guarantees for the performance of the proposed method. Intensive numerical studies and one real-world case study are also conducted to demonstrate the superiority of the proposed multi-profile adaptive sampling chart in detecting shifts in profiles.
The rest of the article is organized as follows. Section 2 lists the assumptions for formulating the problem, where the CUSUM control charts and MFPCA are also briefly reviewed. Section 3 presents the multi-profile adaptive sampling monitoring scheme based on MFPCA, where two important properties associated with the sampling behavior are also discussed. In Section 4, numerical studies are conducted to compare the performance of the proposed method with selected benchmarks. A real case study of monitoring temperature profiles in roasting machines is presented in Section 5 to validate the method. Finally, Section 6 draws concluding remarks and discusses some open topics based on this work.

Assumptions and Problem Formulation
Without loss of generality, we assume there are p profiles, that is, P = {1, . . . , p}, and the data/reading of each profile is generated from a sensor. The ith sampling step of the jth profile constructs a profile reading, that is, y_{i,j} = [y_{i,j}(t_1), . . . , y_{i,j}(t_l)]^T, where i = 1, 2, . . . is the index of the sampling step, and t_k, k = 1, . . . , l, is the value of the explanatory variable within a profile sample. As a result, the full data generated in the ith sampling step is denoted as y_i = [y_{i,1}, . . . , y_{i,p}]. To facilitate the data analysis of MFPCA, we list two assumptions for the data generation process:

A1 The generated data y_i in each sampling step are iid samples from a stationary multivariate Gaussian process.

A2 All the profiles share the same t_k, that is, the values of the explanatory variable in each profile are the same.
The assumption A1 guarantees that the within-and-between correlation structure in the profiles remains the same over sampling steps.
A1 also provides sufficient condition for the normality of MFPC scores (Ghanem and Spanos 2003), and the normality of MFPC scores can be checked by hypothesis testing in practice (Mardia 1970). The assumption A2 provides convenience for computation in MFPCA, and it can be relaxed by using kernel function regression (Yao, Müller, and Wang 2005).
Due to the sampling constraints, however, not all the profiles can be recorded for analysis. Let r be the number of profiles that can be recorded at each sampling step, where r ≤ p. To explicitly represent the sampling constraints, we denote the set of observable and unobservable profiles at the ith sampling step as O(i) and Ō(i), respectively. Then |O(i)| = r and |Ō(i)| = p − r represent the number of observable and unobservable profiles in each sampling step, where | · | is the cardinality of a set. Built upon these definitions and notations, the profile adaptive sampling can be formulated as identifying O(i + 1) and Ō(i + 1) based on previously observed profile data. The CUSUM procedure is a widely adopted tool for realizing the adaptive sampling, which can be formulated as follows (Cheng and Thaga 2005):

W_{i,j} = max(W^+_{i,j}, W^−_{i,j}),  (1)

where W^+_{i,j} and W^−_{i,j} detect the positive and negative shift of the jth profile at the ith sampling step, respectively, and each follows a CUSUM procedure:

W^+_{i,j} = max(0, W^+_{i−1,j} + X_{i,j} − u_min/2),
W^−_{i,j} = max(0, W^−_{i−1,j} − X_{i,j} − u_min/2),  (2)

where X_{i,j} is the index for the information obtained from the jth profile at the ith sampling step, and u_min is a constant parameter that represents the minimum interested mean-shift magnitude. The updated value of W_{i,j} in (1) will be used to rank the profiles to be observed in the next sampling step. Let s_{i,(n)} ∈ P denote the index of the profile that ranks nth (in descending order) according to W_{i,j}; the indices of observable/unobservable profiles at the (i + 1)th sampling step can be determined as (Liu, Mei, and Shi 2015):

O(i + 1) = {s_{i,(1)}, . . . , s_{i,(r)}},  Ō(i + 1) = P \ O(i + 1),  (3)

where the intuition is that the larger the value of W_{i,j}, the higher the possibility that the jth profile suffers an anomaly; thus, sampling resources should be allocated to profiles with higher values of W_{i,j}. For example, if W_{i,4} is the highest among all W_{i,j}'s, then s_{i,(1)} = 4 and profile number 4 will become observable at the next sampling step.
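The recursions in (1)-(3) can be sketched in code as follows. This is an illustrative implementation under the assumption that the local indices X_{i,j} are supplied as a vector with one entry per profile; the function names are ours, not part of the original method.

```python
import numpy as np

def cusum_update(W_prev_pos, W_prev_neg, x, u_min=1.5):
    """Two-sided CUSUM update of (2) for one sampling step.
    x holds the local indices X_{i,j}, one entry per profile."""
    W_pos = np.maximum(0.0, W_prev_pos + x - u_min / 2.0)
    W_neg = np.maximum(0.0, W_prev_neg - x - u_min / 2.0)
    # W_{i,j} of (1): the larger of the two one-sided statistics
    return W_pos, W_neg, np.maximum(W_pos, W_neg)

def allocate(W_local, r, rng):
    """Rank W_{i,j} in descending order (ties broken at random, as in (3))
    and return the observable / unobservable index sets for step i+1."""
    noise = rng.uniform(0.0, 1e-9, size=W_local.shape)  # random tie-breaking
    order = np.argsort(-(W_local + noise))
    return order[:r], order[r:]
```

For instance, if one profile's local index is much larger than the others, its CUSUM statistic grows and that profile enters the observable set at the next step, mirroring the W_{i,4} example above.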
To apply (1)-(3) to profile adaptive sampling, there are two critical challenges: (i) X_{i,j} is a scalar while the observed profile data y_{i,j} contains a sequence of values, which makes the direct plug-in of y_{i,j} into (2) infeasible. More importantly, the within-and-between profile correlation should be considered when implementing the adaptive sampling and monitoring; and (ii) the augmentation of p − r profiles in each sampling step. Note the specific profiles that need augmentation might differ in each sampling step since the set Ō(i) changes with i.
In our work, we propose to use MFPCA to decompose the multi-profile data into eigenfunctions and MFPC scores so that the within-and-between profile correlation can be modeled. We define y_i(t) = [y_{i,1}(t), . . . , y_{i,p}(t)]^T as the observation vector for all p profiles in the ith sampling step. Note the difference between y_i(t) and the previously defined y_i: y_i denotes the whole data in the p profiles, while y_i(t) denotes specific data points (at explanatory variable t) from the p profiles. Then, MFPCA decomposes y_i(t) as

y_i(t) = μ(t) + Σ_{d=1}^{d_0} ξ_{i,d} υ_d(t).  (4)

Equation (4) shows that μ(t) characterizes the mean trend of each profile and Σ_{d=1}^{d_0} ξ_{i,d} υ_d(t) characterizes the residual, where the MFPC scores ξ_{i,d} provide the between-profile correlation and the eigenfunctions υ_d(t) present the within-profile correlation. The formulation in (4) requires the values of μ(t), Σ_d, and υ_d(t), which can be estimated from m_0 fully observed training samples (Paynabar, Zou, and Qiu 2016), that is, y_1, . . . , y_{m_0}. A traditional monitoring framework using MFPCA estimates the MFPC scores of newly collected data to serve as monitoring statistics:

ξ̂_{i,d} = Σ_{k=1}^{l} [y_i(t_k) − μ̂(t_k)] υ̂_d(t_k),  (5)

where i = m_0 + 1, m_0 + 2, . . . represents the sampling step in online monitoring. Under sampling constraints, however, only limited sampling resources can be used in each sampling step, that is, y_i(t_k) has p − r missing values. As a result, augmentation must be conducted to obtain ξ̂_{i,d} for constructing the monitoring statistics.
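The offline estimation for (4) and the online scoring in (5) can be sketched with a discretized eigendecomposition under assumptions A1-A2. This is a deliberately simplified illustration (shared eigenfunctions estimated from the covariance pooled over profiles on the common grid), not the exact MFPCA procedure of Paynabar, Zou, and Qiu (2016); all names are ours.

```python
import numpy as np

def fit_mfpca(Y, d0):
    """Simplified offline stage for (4)-(5).
    Y: (m0, p, l) array of m0 fully observed samples of p profiles
    evaluated on a common grid t_1..t_l (assumption A2)."""
    m0, p, l = Y.shape
    mu = Y.mean(axis=0)                      # estimated mean trend, shape (p, l)
    R = (Y - mu).reshape(m0 * p, l)          # pooled residual curves
    C = R.T @ R / (m0 * p)                   # l x l covariance on the grid
    vals, vecs = np.linalg.eigh(C)
    idx = np.argsort(vals)[::-1][:d0]        # keep the d0 leading components
    V = vecs[:, idx]                         # eigenfunctions on the grid, (l, d0)
    scores = (Y - mu) @ V                    # score vectors xi_{i,d}, (m0, p, d0)
    Sigma = [np.cov(scores[:, :, d].T) for d in range(d0)]  # p x p covariances
    return mu, V, scores, Sigma

def score_new(y, mu, V):
    """Online scoring of (5) for one fully observed sample y of shape (p, l)."""
    return (y - mu) @ V                      # shape (p, d0)
```

The returned covariance matrices play the role of Σ̂_d and are what the augmentation step samples from.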
In this article, we propose to augment the missing values in ξ_{i,d} by strategically sampling from the estimated distribution of ξ_{i,d}, that is, MVN(0, Σ̂_d), so that the value of W_{i,j} can be computed for unobservable profiles in the CUSUM framework. A control chart will be established based on the proposed multi-profile adaptive sampling framework. The developed control chart features two important properties: P1 In-control property: When all the profiles are in-control, the proposed sampling framework will obtain data from each profile an infinite number of times, even though only a limited number of sampling resources is available in each sampling step.
P2 Out-of-control property: When some profiles become out-of-control, once any of these profiles are observed, sampling resources will eventually always be allocated to the out-of-control profiles.
The property P1 guarantees every profile will be observed even if the sampling resources are limited, that is, r ≤ p. This property makes the in-control sampling a random search among all profiles so that shifts occurring anywhere and at any time can be quickly captured. Once the out-of-control profiles are observed by the random search, the property P2 switches the random search into a greedy search, that is, focusing on allocating sampling resources to the out-of-control profiles. This property helps allocate the limited sampling resources to the out-of-control profiles adaptively and accurately, which maximizes the monitoring capability under the sampling constraints. It is clear that these two properties together guarantee exploration (P1) for the in-control stage and exploitation (P2) for the out-of-control stage. We will introduce the details of our sampling framework and control chart for realizing these two properties in the next section.

Adaptive Sampling Strategy for Monitoring Multi-profile Data
In this section, we will introduce the proposed adaptive sampling strategy for monitoring multi-profile data. The following sections will elaborate the procedures of the proposed strategy in detail. Section 3.1 constructs a local CUSUM statistic for each profile through the extracted MFPC score, which will be used to redistribute sampling resources in each sampling step. Section 3.2 merges local statistics into a global statistic to develop the control chart for monitoring multi-profile data under sampling constraints. In Section 3.3, the in-control and out-of-control properties are mathematically formulated and investigated. Finally, Section 3.4 discusses practical guidelines and potential limitations for using the proposed chart.

Local Statistics
According to (5), there are still r observable profiles for constructing the MFPC scores, which we denote as ξ^o_{i,d}; the scores of the unobservable profiles are denoted as ξ^u_{i,d}. We provide an example of obtaining the marginalized distribution in Figure 1, where we set p = 5 and r = 2. Assuming in the ith sampling step we have O(i) = {2, 5} and Ō(i) = {1, 3, 4}, then the unobservable MFPC scores ξ^u_{i,d} will be sampled from the marginalized distribution shown in Figure 1.
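The marginal sampling step for the unobservable scores can be sketched as follows; the function name and interface are ours, assuming Σ_d is given as a p × p array and profile indices are zero-based.

```python
import numpy as np

def augment_scores(xi_obs, observed, Sigma_d, rng):
    """Keep the computed scores for observable profiles and draw the
    unobservable part of xi_{i,d} from the marginal MVN(0, Sigma_d[u, u])."""
    p = Sigma_d.shape[0]
    unobserved = np.setdiff1d(np.arange(p), observed)
    xi = np.empty(p)
    xi[observed] = xi_obs
    cov_uu = Sigma_d[np.ix_(unobserved, unobserved)]   # marginalized covariance
    xi[unobserved] = rng.multivariate_normal(np.zeros(unobserved.size), cov_uu)
    return xi
```

For the Figure 1 example with p = 5 and r = 2, observable profiles {2, 5} correspond to zero-based indices {1, 4}, and the three unobservable scores are drawn from the 3 × 3 marginal block of Σ_d.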
The rationale of augmenting ξ^u_{i,d} by sampling from MVN(0, Σ_d) is tied to the two properties introduced in Section 2: the property P1 guarantees that shifted profiles can be observed in the future, and the property P2 provides a guarantee of sticking to the shifted profiles after they are observed.
In practice, we can only obtain ξ^o_{i,d} and ξ^u_{i,d} from the estimated result. We denote the MFPC scores estimated from (5) as ξ̂_{i,d} (Paynabar, Zou, and Qiu 2016).
To construct monitoring statistics for each profile, we propose to aggregate all the ξ̂_{i,d}, d = 1, . . . , d_0, into one index, that is, ξ̂_i = Σ_{d=1}^{d_0} ξ̂_{i,d}, which is equivalent to making a linear transformation of all the ξ̂_{i,d}, d = 1, . . . , d_0. We sum all scores up across d so that any projected shifts can be captured. Other forms of aggregation can also be applied to construct ξ̂_i, for example, a weighted average or choosing the largest score.
The ξ̂_i can be further normalized. Note all the ξ_{i,d}, d = 1, . . . , d_0, are independent in terms of d (as defined in (4)), so the normalization of ξ̂_i can be achieved by dividing each element in ξ̂_i by the square root of its variance. The variance of each element in ξ̂_i can be obtained from the diagonal elements of the covariance matrices Σ̂_d, denoted as σ̂_{d,(j,j)}. If we construct a p × p diagonal matrix D with elements (Σ_{d=1}^{d_0} σ̂_{d,(j,j)})^{−1/2} from j = 1 to j = p, then the normalized MFPC scores are:

ξ̂*_i = D ξ̂_i.

Note that each element in ξ̂*_i asymptotically follows N(0, 1), and ξ̂*_i has a standardized correlation matrix. Now we can plug the jth element of ξ̂*_i into (2) to replace X_{i,j} and obtain the local statistics for the jth profile. Note we set u_min = 1.5 in (2), and more discussions about the selection of u_min are provided in Section 3.4.
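The diagonal scaling above can be written compactly; a minimal sketch assuming the estimated covariances Σ̂_d are supplied as a list of p × p arrays (variable names are ours).

```python
import numpy as np

def normalize_scores(xi_sum, Sigmas):
    """Normalize the aggregated score vector xi_i = sum_d xi_{i,d}.
    D is the p x p diagonal matrix with entries (sum_d sigma_{d,(j,j)})^(-1/2),
    so each element of the result is asymptotically N(0, 1)."""
    var = sum(np.diag(S) for S in Sigmas)   # per-profile variance of the sum
    D = np.diag(var ** -0.5)
    return D @ xi_sum
```

Each normalized element is then plugged into the CUSUM recursion (2) in place of X_{i,j}.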
To determine the sensors to be observed in the next sampling step, all the local statistics can be ordered in a descending sequence:

W_{i,s_{i,(1)}} ≥ W_{i,s_{i,(2)}} ≥ · · · ≥ W_{i,s_{i,(p)}},

where we randomly assign the sequence if multiple local statistics tie with each other. The O(i + 1) and Ō(i + 1) can then be updated using (3). To determine whether observations should be taken from O(i + 1), we need to first judge whether the monitoring stops at sampling step i, which depends on the global statistic introduced in the next section.

Global Statistic and Multi-Profile Control Chart based on Adaptive Sampling
We will use a multivariate CUSUM (MCUSUM) statistic to construct the global statistic and determine the stopping time. There are two ways of constructing the MCUSUM global statistic (Pignatiello Jr and Runger 1990): one is to individually calculate the local CUSUM statistics first and combine them into a single quadratic form, and the other is to first calculate a local Hotelling T² statistic to incorporate the correlation and then accumulate it in a univariate CUSUM. We take the first approach of MCUSUM since it naturally fits our local statistics introduced in (2):

G(i) = W_i^T W_i,  (9)

where W_i = [W_{i,1}, W_{i,2}, . . . , W_{i,p}]^T, and W_{i,j} is calculated from (1). A similar MCUSUM global statistic was also used in Nabhan, Mei, and Shi (2021) with a focus on adaptive sampling and monitoring for correlated scalar variables. Consequently, the stopping time for the entire procedure is defined as

T = inf{i : G(i) > h},  (10)

where h is the control limit specified by the in-control average run length, that is, ARL_0. In statistical process control, the ARL_0 is defined as the expected number of sampling steps until the control chart signals when all profiles are in control. ARL_0 can be defined as ARL_0 = 1/α (Montgomery 2007), where α is the false alarm rate. By setting α to a specific value, the control limit h can be obtained by satisfying:

P(G(i) > h) = α,  (11)

where G(i) should be obtained when all profiles are in-control.
In numerical studies and the real case study, we obtain h by taking the 1 − α empirical quantile of the histogram of in-control G(i)'s. Some other methods for identifying h using Markov chain Monte Carlo are introduced in Liu, Mei, and Shi (2015) and Xian, Wang, and Liu (2018). To evaluate the chart performance, the ARL_1 will be calculated. The ARL_1 is the expected number of sampling steps required for G(i) to first exceed h after any of the profiles become out-of-control. As a result, under the same ARL_0, a lower ARL_1 represents better performance of a control chart.
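The quadratic-form global statistic and the empirical 1 − α quantile rule for h can be sketched as follows. This is an illustration with our own function names; the optional weighting matrix M is an assumption for generality (the simplest choice, used by default, is the identity).

```python
import numpy as np

def global_statistic(W, M=None):
    """Quadratic-form MCUSUM global statistic built from the vector of
    local CUSUM statistics W_i; M is an optional positive-definite
    weighting matrix (identity by default)."""
    return float(W @ W) if M is None else float(W @ M @ W)

def control_limit(G_in_control, alpha):
    """Control limit h: the (1 - alpha) empirical quantile of in-control
    G(i) values, consistent with ARL_0 = 1 / alpha."""
    return float(np.quantile(G_in_control, 1.0 - alpha))
```

In practice, G_in_control would be a long record of global statistics computed while all profiles are in control, and alpha is the targeted false alarm rate.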
To summarize the framework of the proposed multi-profile control chart, detailed procedures of constructing and using the proposed chart are provided in Algorithm 1. We also provide the numerical study code as online supplementary materials.

Properties of the Proposed Method
As we mentioned in Section 2, the proposed multi-profile chart has two important properties that guarantee its performance and practicability, which makes it attractive in engineering practice. In this section, we present the mathematical form of these two properties so that detailed proofs can be facilitated. We want to point out that much of the existing literature on adaptive-sampling-based control charts (Liu, Mei, and Shi 2015; Xian, Wang, and Liu 2018; Xian et al. 2021) also has these two properties. However, our work is technically different from those works: besides the previously mentioned difference in monitoring objects, that is, scalar data versus profile data, the procedures in the proofs are also different. This is because most of the existing literature conducts augmentation with either independent or fixed tuning parameters. Our work, however, augments the unobservable profiles with a vector of correlated random variables, that is, ξ̂^u_{i,d}, which means the augmented values in each sampling step change dynamically with correlations. Compared to existing data augmentation techniques for adaptive sampling, the unique advantage of our chart is that it requires much fewer tuning parameters before conducting monitoring, which improves the robustness of the proposed chart (as demonstrated later in numerical studies).

Algorithm 1 Multi-profile control chart with adaptive sampling strategy
Initialize u_min, r, d_0, and α.
Offline stage:
1. Use y_1, . . . , y_{m_0} to estimate μ̂(t_k), υ̂_d(t_k), and Σ̂_d in (4).
2. Use in-control partially observed data to obtain G(i)'s and use (11) to determine h.
Online stage: For i = m_0 + 1, m_0 + 2, . . .
3. Observe the profiles in O(i) and compute ξ̂^o_{i,d} using (5).
4. Augment ξ̂^u_{i,d} by sampling from MVN(0, Σ̂_d).
5. Compute the local statistics W_{i,j}, ∀j ∈ P, using (1).
6. Compute G(i) using (9).
7. Order the local CUSUM statistics W_{i,j}, ∀j ∈ P, and update O(i + 1) and Ō(i + 1) based on (3).
End for
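As a stand-alone illustration of the online stage of Algorithm 1, the following toy run collapses the MFPCA step (each "score" is simply the sensor reading), draws augmented scores from an in-control N(0, 1) marginal as a stand-in for MVN(0, Σ̂_d), and uses the identity weighting in the quadratic form. All names and simplifications here are ours, for illustration only.

```python
import numpy as np

def run_chart(stream, r, h, u_min=1.5, seed=0):
    """Toy online monitoring loop: observe r profiles per step, augment the
    rest, update two-sided CUSUMs, signal when the global statistic exceeds h.
    stream: (n_steps, p) array of readings. Returns the stopping step or None."""
    rng = np.random.default_rng(seed)
    n, p = stream.shape
    Wp = np.zeros(p); Wn = np.zeros(p)
    observed = rng.choice(p, size=r, replace=False)     # initial allocation
    for i in range(n):
        x = rng.standard_normal(p)                      # augmented scores
        x[observed] = stream[i, observed]               # observed scores
        Wp = np.maximum(0.0, Wp + x - u_min / 2)        # (1)-(2): local CUSUMs
        Wn = np.maximum(0.0, Wn - x - u_min / 2)
        W = np.maximum(Wp, Wn)
        if W @ W > h:                                   # global statistic vs. h
            return i                                    # stopping time
        # (3): reallocate to the top-r local statistics, random tie-breaking
        observed = np.argsort(-(W + rng.uniform(0, 1e-9, p)))[:r]
    return None
```

With a sustained mean shift injected into the stream, the loop signals within a few steps, while an in-control stream with a very large h runs to completion without an alarm.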
We first formulate the in-control property, that is, the adaptive sampling behavior when all the profiles are in-control. P1: Let U denote the set of profiles j ∈ P that will not be included in O(i) for any i greater than some finite sampling step m_1, that is, j ∉ O(i), ∀i > m_1. Then as h → ∞, P(U = ∅) → 1, where ∅ represents the empty set.
The proof of P1 is in Section A of the supplementary materials. This property illustrates that although the sampling resources are limited, none of the profiles will be ignored. This indicates that no matter which profile goes out-of-control at any time, it will always be observed. As a result, P1 lays a solid foundation for capturing out-of-control profile(s), which leads to the second property.
P2: Let B = {j ∈ P : E(ξ*_{i,j}) ≠ 0, ∀i > m_0} denote the set of out-of-control profiles after sampling step m_0, where ξ*_{i,j} is the jth element of ξ*_i, the normalized version of Σ_{d=1}^{d_0} ξ_{i,d}. Then once any profile j ∈ B is observed at sampling step m_1 (m_1 > m_0), that is, j ∈ O(m_1), there exists a sampling step m_2, m_2 ≥ m_1, and a non-empty set B_{m_2} ⊆ B such that P(B_{m_2} ⊂ O(i) for ∀i ≥ m_2) = 1.
The proof of P2 is in Section B of the supplementary materials. This property reveals that there will always be sampling resources allocated to out-of-control profiles after they are observed. We also want to mention that P2 shows the scheme keeps exploring other out-of-control profiles instead of sticking to the one already observed, which means the set B_{m_2} might change. Combining with P1, we can see the out-of-control profiles will be treated as in-control when they are not observed, and they will be eventually observed according to P1. After they are observed, the out-of-control profiles will eventually always have sampling resources allocated to them.

Discussions
In this section, we provide discussions about some practical guidelines and potential limitations for using the proposed chart.
Practical guidelines: There are some parameters to be selected before implementing the chart. Meanwhile, the proposed model in (4) has its own application scenarios. We list the guidelines as follows:

• The choice of d_0 can be determined by the percentage of total variance explained by the extracted MFPC scores or by using cross-validation on Akaike's information criterion (Shibata 1981; Yao, Müller, and Wang 2005). Once the value of d_0 is determined, it will not change across different sampling steps. In our article, d_0 is determined to account for 95% of the total variance.

• The choice of u_min represents the minimum interested mean-shift magnitude. In general, a smaller u_min works better for smaller mean shifts, and a larger u_min helps detect larger mean shifts more quickly since it enables reallocating sampling resources to unobservable sensors more frequently. The value u_min = 1.5 is suggested because it demonstrates good performance under different magnitudes of shift (Xian et al. 2021).

• The choice of m_0 is highly related to d_0. Intuitively, a larger d_0 needs a larger m_0 to achieve an accurate estimation. The suggested m_0 for d_0 = 4, 8, and 15 is 100, 200, and 400, respectively. More theoretical discussions about the convergence rate for eigenfunctions can be found in Happ and Greven (2018).

• Equation (4) works better when all profiles represent the same measured quantity, for example, temperature, but are collected under different environments or locations. The multi-tonnage readings in Paynabar, Zou, and Qiu (2016) and the multi-temperature readings in our case study fit such a setting. Another type of multi-profile data is collected from different measured quantities, for example, temperature and vibration, in the same environment, which can be modeled by another decomposition in Ramsay and Silverman (2008). When the magnitudes of different profiles vary significantly, normalization can first be applied before using (4).
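The 95% variance rule in the first guideline can be sketched as a short helper over the estimated eigenvalues; the function and variable names are ours.

```python
import numpy as np

def choose_d0(eigvals, frac=0.95):
    """Pick d0 as the smallest number of components whose eigenvalues
    explain at least `frac` of the total variance (the 95% rule used here)."""
    ratios = np.cumsum(np.sort(eigvals)[::-1]) / np.sum(eigvals)
    return int(np.searchsorted(ratios, frac) + 1)
```

For eigenvalues (8, 1, 0.5, 0.5), the first three components explain 9.5/10 = 95% of the variance, so d_0 = 3.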
Potential limitations: There are also some limitations worth noting when using the proposed chart in practice.
• Local statistics aggregation. In Section 3.1, we propose to aggregate local statistics from different MFPCA components. Although such aggregation considers the projections in all d_0 components, it might suffer from a "canceling out" issue when detecting certain types of shifts. For example, if the jth profile has a shift of the form υ_d(t) − υ_{d′}(t), then the estimated score ξ̂_{i,j,d} will increase by one and ξ̂_{i,j,d′} will decrease by one. In this case, summing them together will not change ξ_i. Although such shifts rarely occur in real applications, our numerical studies (not reported here) show that this type of shift indeed impairs the detection power. In practice, if we know in advance that the shift pattern has the form υ_d(t) − υ_{d′}(t), we can choose alternative aggregation methods; for example, we can use only the score with the largest eigenvalue, that is, ξ_{i,1}.
• Detection of correlation shift. Our method focuses on detecting partial or full mean shifts in any profile. It also has the potential to detect correlation shifts, that is, shifts in the between-profile covariance of the scores. However, unlike a mean shift, which causes an increase in W_{i,j}, it is unclear how such covariance shifts influence the distribution of W_{i,j}. In particular, they might make the distribution of W_{i,j} highly skewed, which defeats the detection. As a result, caution is needed when applying the method to detect correlation shifts. This reflects a balance between detection performance and feasibility of the algorithm. We provide detailed discussions about this concern in Section C of the supplementary materials.
• The assumption listed in A1. Although many processes satisfy this stationarity assumption, nonstationary processes are also widely observed in practice. The stationarity among profile samples can be tested by treating each sample as a functional time series (Horváth, Kokoszka, and Rice 2014). We have also noticed recent works that relax the stationarity assumption in the context of adaptive sampling, and we provide a more detailed discussion on this point in Section D of the supplementary materials.
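The "canceling out" issue discussed above can be illustrated with a small numeric sketch. This is a hypothetical Python illustration of the arithmetic only, not the chart's actual aggregation code; all names and values are ours.

```python
import numpy as np

# Hypothetical illustration: a shift of the form v_d(t) - v_d'(t)
# raises one estimated MFPC score by 1 and lowers another by 1.
d0 = 4
in_control = np.zeros(d0)        # score deviations under no shift
shifted = in_control.copy()
shifted[0] += 1.0                # component d moves up
shifted[1] -= 1.0                # component d' moves down

# A plain sum over components cancels the shift completely ...
assert shifted.sum() == in_control.sum()

# ... whereas using only the leading score (largest eigenvalue),
# as suggested in the text, still registers the change.
assert abs(shifted[0]) > abs(in_control[0])
```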

General Setting
To validate the effectiveness of the proposed chart and investigate its performance in detecting shifts under sampling constraints, we select two types of benchmarks for comparison. Our motivation for selecting benchmarks is to demonstrate the unique advantages of the proposed chart in characterizing within-and-between profile correlation and augmenting profile data, which are the two major challenges introduced in Section 1. The first benchmark applies functional principal component analysis (FPCA) instead of MFPCA when modeling profile data and is denoted the "FPCA chart." The only difference between the FPCA chart and the proposed chart is that the FPCA chart models each profile independently, that is, it ignores the between-profile correlation. In this case, the augmented samples for unobservable profiles are drawn from iid distributions instead of correlated distributions. The comparison with this benchmark demonstrates the importance of capturing between-profile correlation when monitoring multi-profile data under sampling constraints.
The second benchmark is built upon the first, with the data augmentation further simplified by assigning a fixed tuning value instead of sampling from the null distributions. Clearly, the performance of such charts depends on the chosen value and might change for different profile settings. We denote these methods "fixed-augmentation FPCA charts" and consider three values of the tuning parameter, 0.01, 0.05, and 0.1. Note that the augmented values in this benchmark are always positive, which means not all the local statistics should be used in calculating the global statistic; otherwise, the global statistic would keep increasing even under in-control status. To handle this issue, it is suggested to add up only a small number of large-value local statistics to construct the global statistic (Liu, Mei, and Shi 2015). The number of local statistics used should be as close to the number of out-of-control sensors as possible and smaller than the number of available sampling resources (Liu, Mei, and Shi 2015; Xian, Wang, and Liu 2018). In our numerical setting, we add up the top 4 local statistics to construct the global statistic.
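The top-k aggregation suggested by Liu, Mei, and Shi (2015) can be sketched as follows. This is an illustrative Python fragment; the function name `global_statistic` and the example values are ours, not the paper's implementation.

```python
import numpy as np

def global_statistic(local_stats, top_k=4):
    """Sum only the top_k largest local statistics, so that
    always-positive augmented values do not inflate the global
    statistic under in-control status."""
    w = np.sort(np.asarray(local_stats, dtype=float))[::-1]
    return float(w[:top_k].sum())

# Hypothetical local statistics for 20 sensors: most near zero,
# a few elevated (potentially out of control)
stats = [0.1] * 16 + [2.0, 1.5, 3.0, 0.8]
g = global_statistic(stats)  # 3.0 + 2.0 + 1.5 + 0.8 = 7.3
```

Summing all 20 values instead would add 1.6 of in-control "noise" to the statistic, which is exactly what the truncation avoids.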
Two different profile models are used to generate data for comparing these charts. The profile data takes the general form y_i ∈ R^{50×20}; that is, there are p = 20 profiles, and each profile is collected at l = 50 points. The basis functions υ_d(t_k), k = 1, 2, . . . , 50, are selected as the dth B-spline basis function of order 3 with 50 equally spaced knots in [0, 1]. Two different correlation structures of the scores ξ_{i,d} are specified as follows:
1. Model I: ξ_{i,d} follows a 20-dimensional multivariate normal distribution MVN(0, C_1), where the (j, j′)th element of C_1 is 0.8^{|j−j′|}, j, j′ = 1, 2, . . . , 20.
2. Model II: ξ_{i,d} follows a 20-dimensional multivariate normal distribution MVN(0, C_2), where the (j, j′)th element of C_2 is exp(−(j−j′)²/p), j, j′ = 1, 2, . . . , 20.
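The two score-covariance structures can be constructed directly from their definitions. The following Python sketch builds C_1 and C_2 and draws one 20-dimensional score vector per model (the paper's studies were run in R; this is only an illustrative translation).

```python
import numpy as np

p = 20  # number of profiles

j = np.arange(1, p + 1)
J, Jp = np.meshgrid(j, j, indexing="ij")

# Model I: AR(1)-style correlation, element (j, j') = 0.8^{|j - j'|}
C1 = 0.8 ** np.abs(J - Jp)

# Model II: squared-exponential correlation, exp(-(j - j')^2 / p)
C2 = np.exp(-((J - Jp) ** 2) / p)

# One draw of the 20-dimensional score vector xi_{i,d} per model
rng = np.random.default_rng(0)
xi_model1 = rng.multivariate_normal(np.zeros(p), C1)
xi_model2 = rng.multivariate_normal(np.zeros(p), C2)
```

Both matrices are symmetric with unit diagonal; Model I decays geometrically off the diagonal while Model II decays with the squared index distance.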

Numerical Study Results
We first demonstrate the ARL_1 for partial shifts. The results are shown in Figure 2, based on 500 repetitions. The x-axis in Figure 2 represents the shift magnitude, and the y-axis represents the value of ARL_1. Tables of all numerical study results, including the standard errors of ARL_1, are provided in Section E of the supplementary materials.
Based on Figure 2, we make the following observations:
• The proposed chart performs the best among all methods, especially when the shift is small. This is due to the consideration of correlation among profiles when implementing the augmentation for unobservable sensors.
• The ARL_1 decreases as either δ or r becomes larger. This is intuitively expected, since a larger shift makes the anomaly easier to detect and more sampling resources make the abnormal sensors easier to observe.
• The proposed method outperforms the FPCA chart, which demonstrates the importance of incorporating between-profile correlation.
• The performance of the fixed-augmentation FPCA charts is similar to that of the FPCA chart. A detailed inspection of the exact ARL_1 values (see Section E in the supplementary materials) also shows that the performance of the fixed-augmentation charts is not stable and depends on the chosen tuning value, r, and δ. This shows the limitation of the fixed-augmentation approach and the difficulty of choosing an appropriate value for different combinations of r and δ.
The comparison results for ARL_1 under full shifts are provided in Figure 3. Note that we adjust the shift amounts δ so that the ARL_1 falls in a reasonable range for comparison purposes. It is clear that a full shift yields a smaller ARL_1 than a partial shift of the same amount. The proposed chart outperforms the benchmarks in all cases in Figure 3. We also point out that one fixed-augmentation FPCA chart provides comparable results in Figure 3(a); we believe this is because the combination of model, shifts, and sampling resources happens to fit the chosen tuning value in this scenario. We report these results to further illustrate the sensitivity of the fixed-augmentation FPCA charts and the robustness of the proposed chart.
In addition to the ARL_1 performance, it is also interesting to demonstrate the trajectory of local statistics when the process goes out of control. Figure 4 shows the indices of the four sensors with the top 4 local statistics at each sampling step of our proposed method. The profiles are generated from Model II, and sensors j ∈ {3, 4, 5, 8, 13} are set to have a full shift of magnitude δ = 0.4. The number of observable profiles is 4, that is, r = 4. The CUSUM chart raises a global alarm at RL = 29. Figure 4 shows that, starting from the 23rd sampling step, all sampling resources are concentrated on sensors 3, 4, 5, and 13, all of which are out-of-control sensors. This clearly demonstrates the tracking capability of the proposed method and validates property P2 that the available sampling resources will eventually stick to out-of-control sensors.
Finally, we report the computation load for the numerical studies. All numerical studies were run in R version 4.1.2 on a Windows 10 operating system with an Intel(R) Core(TM) i9-10980XE CPU @ 3.00 GHz and 64 GB RAM. During the training stage, we use partially observable in-control profile samples to establish the global statistic threshold. The training times for the proposed method, the FPCA chart, and the fixed-augmentation FPCA charts with values 0.01, 0.05, and 0.1 are 2.32, 2.22, 2.05, 2.06, and 2.05 min, respectively. During the online detection stage, the ARL_1 and the detection time are affected by the number of observable resources, the magnitude of mean shifts, the number of out-of-control sensors, etc. Thus, it is more informative to report the time taken between consecutive sampling steps. When r = 6, the time between two sampling steps, including computing monitoring statistics and identifying the next sampling sensors, for the proposed method, the FPCA chart, and the fixed-augmentation FPCA charts with values 0.01, 0.05, and 0.1 is 7.5, 6.5, 6.3, 6.4, and 6.2 milliseconds, respectively.
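To give intuition for how a mean shift accumulates in a CUSUM-type local statistic before an alarm, the following Python sketch runs a generic one-sided univariate CUSUM recursion with allowance tied to u_min = 1.5. This is a simplified stand-in under our own assumptions, not the exact multivariate local statistic of the proposed chart.

```python
import numpy as np

def cusum_path(obs, u_min=1.5):
    """One-sided CUSUM recursion W_i = max(0, W_{i-1} + x_i - k),
    with allowance k = u_min / 2 tied to the minimum shift of
    interest. A generic univariate sketch, not the exact local
    statistic of the proposed chart."""
    k, w, path = u_min / 2.0, 0.0, []
    for x in obs:
        w = max(0.0, w + x - k)
        path.append(w)
    return np.array(path)

rng = np.random.default_rng(1)
in_control = cusum_path(rng.normal(0.0, 1.0, 200))  # hovers near zero
shifted = cusum_path(rng.normal(1.5, 1.0, 200))     # drifts upward
```

The in-control path repeatedly resets toward zero, while a mean shift of δ = 1.5 makes the statistic drift upward roughly linearly until it would cross a threshold h.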
More numerical studies and discussions are provided in Section F of the supplementary materials, including the performance when the covariance of ξ_{i,d} is an identity matrix and when different numbers of out-of-control profiles are present.

Case Study
In this section, we validate the proposed multi-profile adaptive sampling chart using multi-sensor temperature profiles recorded in a roasting machine. The data is publicly available on Kaggle, and we provide access to the data in the supplementary materials.
The roasting machine consists of 5 chambers, each with a set of three sensors placed at specific locations to obtain temperature profiles. Note that the sensing locations are far enough apart that the three sensors in a chamber record different temperature profiles. As a result, 15 temperature profiles (p = 15) can be collected in one roasting cycle, that is, 1 hr. The data collection frequency is one temperature reading per minute, which provides 60 data points per profile, that is, l = 60. After one roasting cycle finishes, all 60 data points from each sensor are transmitted wirelessly and simultaneously. The temperature profile data is critical to the quality of the final product. In the ideal case, 15 temperature sensors provide fully collected temperature profiles in one roasting cycle; we show 6 such roasting cycles in Figure 5, where each cycle is represented by a color. It is clear that the profiles have strong within-and-between profile correlations, and we further characterize the between-profile correlation by the correlation matrix among the 15 profiles in Figure 6. In multi-profile monitoring, it is important that all the profile data are synchronized to the same starting time (Zhang et al. 2018), which is especially critical for MFPCA since it requires that samples from different sensors be synchronized (assumption A2). However, the synchronization error increases at least linearly with the number of sensors in the network and is further compounded by noise in the wireless communication environment (Kim et al. 2012). A common solution for synchronization in a wireless sensing network is to use a commercial synchronization server, but the cost of such servers is high; for example, one SyncServer S600 from Microsemi® costs more than $8000. By using the adaptive sampling strategy, we can significantly reduce the number of sensors acquired in real time and thus the number of such servers needed.
In this case study, we use only one set of sensors, that is, r = 3, to monitor the quality of the product.
The quality of the final products is measured by overall scores that evaluate the industry standards of size and hardness. We select the products with the top 10th percentile of scores (6000 products) as in-control samples, of which m_0 = 200 are used to estimate the parameters and the remaining 5800 are used to set the threshold h (ARL_0 = 100). The products with the bottom 20th percentile of scores are regarded as defective, representing the out-of-control status. Figure 7 compares the out-of-control detection performance among the five methods in one out-of-control run, where 25 out-of-control products (samples) are used; the algorithm settings are the same as those in Section 4. As a result, the case study further validates the effectiveness of the proposed multi-profile chart based on adaptive sampling. We also provide the trajectory tracking of local statistics and the boxplots of ARL_1 for the five methods in Section G of the supplementary materials.
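To illustrate how a threshold h can be calibrated to a target in-control ARL_0 = 100, the following Python sketch bisects on h using Monte Carlo run lengths of a generic one-sided CUSUM on iid N(0,1) data. This is a simplified univariate stand-in under our own assumptions (allowance k = 0.75, small replication counts), not the paper's global statistic or its R code.

```python
import numpy as np

def run_length(h, rng, k=0.75, max_steps=2000):
    """Steps until a one-sided CUSUM of iid N(0,1) data exceeds h
    (censored at max_steps for tractability)."""
    w = 0.0
    for i in range(1, max_steps + 1):
        w = max(0.0, w + rng.normal() - k)
        if w > h:
            return i
    return max_steps

def average_run_length(h, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    return float(np.mean([run_length(h, rng) for _ in range(reps)]))

# Bisection on h: the in-control ARL increases with h, so raise the
# threshold while alarms come too often, lower it otherwise.
lo, hi = 0.5, 6.0
for _ in range(12):
    mid = (lo + hi) / 2.0
    if average_run_length(mid) < 100:
        lo = mid
    else:
        hi = mid
h = (lo + hi) / 2.0
```

In practice the same search would be run on the chart's actual global statistic using the 5800 held-out in-control samples.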

Conclusion
In this article, we propose a multi-profile monitoring framework under sampling constraints. The feature of the framework is that it incorporates the within-and-between profile correlation via MFPCA to help detect shifts more efficiently. The key to the proposed framework is to transform the unobservable profiles into unobservable MFPC scores so that the augmentation can be conducted in an MCUSUM chart. This is a significant difference from existing works that focus on scalar data monitoring. The in-control and out-of-control properties of the proposed framework are investigated to provide theoretical guarantees for the monitoring performance. Numerical studies are conducted under different combinations of data generation models, resource availability, and shift patterns to provide a comprehensive evaluation and comparison of the proposed framework. A case study of monitoring temperature in a roasting machine is also conducted. Both the numerical and case studies demonstrate the effectiveness and robustness of the method.
Several open topics can be further studied based on the proposed framework. First, the proposed method still needs m_0 fully recorded training samples. Although such samples can be obtained in certain scenarios, for example, during system maintenance, relaxing this requirement could further improve the practicability of the method. A straightforward way to relax it is to estimate the MFPCA parameters using incomplete profile data (Rošt'áková and Rosipal 2017). In this way, the proposed method would become a self-starting scheme and have the potential to sequentially update the correlation information as more data arrive. However, the consistency of such estimators still needs to be investigated to satisfy the conditions of the two properties. Another interesting topic would be relaxing assumption A1 to allow nonstationary and temporally dependent profiles. The idea of tensor completion applied to multivariate scalar data (Gómez, Li, and Paynabar 2022) might be a straightforward way to facilitate this relaxation. However, as we discuss in Section D of the supplementary materials, the method in Gómez, Li, and Paynabar (2022) requires smooth changes in the nonstationary data, which poses difficulties and limitations for applications in the multi-profile context. Moreover, the dimension of the tensors and the number of parameters would increase significantly in the multi-profile setting. As a result, the accuracy of parameter estimation and the effectiveness in real-time monitoring still need to be carefully investigated. We will study these topics in our future work.

Supplementary Materials
There are two supplementary materials for this article: (i) an R code file that contains comprehensive procedures to reproduce the numerical study results and the link to the real case study data; and (ii) a description file that contains the proofs of properties P1 and P2, discussions on the proposed augmentation method and other state-of-the-art methods, and additional results of the numerical and case studies.