Monitoring wafers’ geometric quality using an additive Gaussian process model

ABSTRACT The geometric quality of a wafer is an important quality characteristic in the semiconductor industry. However, it is difficult to monitor this characteristic during the manufacturing process due to the challenges created by the complexity of the data structure. In this article, we propose an Additive Gaussian Process (AGP) model to approximate a standard geometric profile of a wafer while quantifying the deviations from the standard when a manufacturing process is in an in-control state. Based on the AGP model, two statistical tests are developed to determine whether or not a newly produced wafer is conforming. We have conducted extensive numerical simulations and real case studies, the results of which indicate that our proposed method is effective and has potentially wide application.


Introduction
A wafer's geometric quality, which can be manifested by the thickness, roughness, or flatness profile of the entire surface layer, is an important quality feature in the semiconductor industry. More specifically, in the manufacture of electronic chips, a silicon ingot is usually sliced into sections using wire saws. After several flattening steps, including lapping, polishing, and cleaning, the wafers are sent to front-end and back-end processes to form the final chips (O'Mara et al., 1990). An undesired geometric quality often results in a large number of defective dies on the wafer during front-end processes (Doering and Nishi, 2007), causing production delays or economic loss. Due to the importance of geometric quality, people in the semiconductor industry are looking for effective methods to monitor and control quality.
Statistical testing methods enable us to quantitatively monitor quality characteristics. Prior to conducting statistical tests, several preparation procedures, such as data sampling and data modeling, may be applied to help construct the tests. In industrial practice, engineers have developed relatively systematic approaches to determine the geometric conformity of sliced wafers. As shown in Fig. 1, either the contact (using mechanical probe) or non-contact (using capacitance probe or wavelength scanning interferometer) method is able to produce numerous measurements on a single wafer that contain rich information about the geometric quality.
Then several indicators are derived from these measurements to measure the geometric quality based on the International Technology Roadmap for Semiconductors. These indicators include total thickness variation, non-linear thickness variation, bow, warp, and sori (Schmitz et al., 2003). Despite their importance in safeguarding the geometric quality, these summary indicators cannot provide a comprehensive view of the geometric quality for several reasons. First, the aggregated indicators are usually summary statistics, which lose the majority of the rich information the metrology equipment may provide. Second, although the aggregated indicators are effective in screening out nonconforming units, the efficiency of the indicators for identifying process changes is usually unsatisfactory. Jin et al. (2012) reported that the contact method may take more than 8 hours to measure a typical batch of wafers (400 in one production run). The non-contact method could take an even longer time. Third, and more important, when quality deterioration is noticed from the aggregated indicators, they cannot provide detailed insight about the failure patterns or root causes due to their loss of measurement information. Therefore, a more systematic and efficient method to utilize these data to model and monitor the geometric quality of a wafer is desired. However, there are several difficulties that make this task a challenge.
First, as demonstrated in Fig. 1, the thickness profile is rather complex. No simple patterns or trends can be visually identified, and it is difficult to accurately model the profile using parametric functions. As a result, traditional profile monitoring techniques (Zou, Zhou, Wang, and Tsung, 2007; Jensen et al., 2008) that approximate the profile by a parametric function and then monitor the parameter vector are difficult to apply in this case. Second, the measurement locations on different wafers may not be perfectly aligned due to different crystal orientations of the ingots and wafer rotation during the measurement. Therefore, conventional multivariate monitoring schemes such as a T 2 chart are not suitable, as the variables being monitored essentially vary from one wafer to another. Third, the measurements are spatially correlated due to the similar conditions experienced by physically adjacent points. As a result, methods that assume errors are independently and identically distributed (i.i.d.) are no longer applicable. Last, but by no means least, not only do changes in the mean or variance of the deviations reflect potential process shifts, but changes in their spatial correlation are also a symptom of unexpected process shifts and should be monitored. Therefore, we need a comprehensive monitoring scheme that is effective in detecting all types of changes that may occur in the complex geometric data.
Due to these challenges, only a limited number of papers have been published. In particular, Jin et al. (2012) suggested using a Gaussian process to model the thickness profile of the entire wafer. To speed up the process, they proposed a sequential measurement strategy that adaptively determines the next measurement locations. Using their method, only a small set of measurements needs to be taken, and then a Gaussian process model is built to accurately characterize the entire geometric profile. As a result, the measurement time can be significantly reduced. Despite its importance, their method was developed for measuring a single wafer. Therefore, it is not suitable for quality monitoring, as each geometric profile is modeled as an independent Gaussian process, and there are no statistical rules to determine whether or not the fitted Gaussian process is in control. Zhao et al. (2011) proposed a partial differential equation-constrained Gaussian process model to predict the wafer thickness profile. The model integrates physical knowledge of the slicing process and the observed data to better characterize the geometric quality. However, their method also focuses on modeling a single wafer and lacks a quantification of the variations when the manufacturing process is in control.
Although the aforementioned works did not solve the problem, they have demonstrated that the Gaussian process is a suitable model for spatially correlated data (Cressie, 1993), and it can also characterize complex geometric profiles. In addition, compared with other non-parametric methods such as B-splines (e.g., De Boor (2001)) or kernel smoothing (e.g., Hastie and Loader (1993)), the Gaussian process is much easier to extend to higher input dimensions when the manufacturing process involves other controllable or uncontrollable variables. One thing to be noted is that a single Gaussian process model may not help to detect point-wise deviation between two profiles since they can be independent realizations generated from a single Gaussian process. However, in the early stage of wafer manufacturing, the point-wise deviation is not crucial. Instead, it is the spatial pattern of the surface that influences the downstream manufacturing. For example, Fig. 2 shows two curves (solid line and dashed line representing two profiles) that are point-wise different. However, in terms of their impact on the downstream quality, they are indistinguishable because their spatial patterns are the same. On the other hand, we also want to make sure that the surface does not significantly deviate from the desired profile. Therefore, considering the advantages of the Gaussian process and targeting the limitations in existing works, we propose an Additive Gaussian Process (AGP) model to characterize the geometric profile of a wafer using data measured on a group of wafers. The AGP model is composed of two independent Gaussian processes with different covariance structures. The first Gaussian process is used to approximate the unknown desired (or standard) geometric profile, whereas the second one is used to quantify the "distribution" of spatially correlated deviations from the standard profile when the manufacturing process is in control. 
By using this approach to construct the model, we are able to detect point-wise changes in the standard profile and spatial pattern changes in the deviations. We would like to highlight that we are not the only ones proposing this "additive" concept. Ba and Joseph (2012) proposed a similar structure called the Composite Gaussian Process (CGP) model. However, the CGP model is mainly focused on modeling non-stationary output data from computer simulations, which is different from our AGP model. A detailed discussion on the differences between the CGP and AGP models will be provided in the next section. Since our AGP model considers a group of geometric profiles collaboratively, it allows distinct measurement locations on different wafers. In addition, only a small set of measurements need to be taken from each wafer. Therefore, compared with traditional methods, the required measurement time is significantly shortened. Based on the AGP model, we also develop two statistical monitoring methods, the T 2 chart and the Generalized Likelihood Ratio (GLR) chart, to rigorously analyze whether a tested wafer conforms to the standard within an acceptable variation. The proposed monitoring schemes are able to detect different patterns in changes on the geometric profile including mean shifts, variance shifts, and correlation shifts.
The remainder of this article is organized as follows: Section 2 formulates the problem and introduces the AGP model used to model the standard geometric profile of a wafer and its deviation; Section 3 describes the statistical monitoring schemes we propose to monitor the geometric conformity of a wafer; Section 4 uses extensive simulation studies to investigate the performance of the proposed method; Section 5 presents an application in wafer manufacturing, which monitors the thickness of sliced wafers; Section 6 concludes the article and discusses future directions.

Problem formulation and notation
We assume a group of N 0 wafers have been produced when the manufacturing process was in control. On the ith wafer, we take measurements at n i different locations X i ≡ [x i1 , x i2 , · · · , x in i ] T , where each x i j denotes the two-dimensional coordinate on the wafer.
We use the function f (x) : R 2 → R to denote the standard or desired quality value (e.g., thickness, roughness, or flatness) we expect at location x. Due to different sources of variation in the process, each produced wafer can be modeled as the summation of the standard profile f (x) and a random error; that is,

y i (x i j ) = f (x i j ) + ε i (x i j ), i = 1, 2, · · · , N 0 ; j = 1, 2, · · · , n i , (1)

where ε i (x i j ) is the deviation of the quality measurement at location x i j on wafer i from the standard value f (x i j ), including both process variations and measurement errors. Different from conventional models, in Equation (1) ε i (x i j ) and ε i (x ik ), k ≠ j, are typically correlated because points on the same wafer undergo similar processing conditions, which induces inherent spatial correlations of the deviations between locations x i j and x ik . In contrast, ε i (x i j ) and ε i′ (x i′k ), i′ ≠ i, can be considered as independent from each other because different wafers are produced independently. Figure 3 uses a one-dimensional example to demonstrate our notation. When a new wafer is produced, we take measurements at locations X l ≡ [x l1 , x l2 , · · · , x ln l ] T with values Y l ≡ [y l1 , y l2 , · · · , y ln l ] T . Using the quality measurements (X l , Y l ), we want to develop a systematic monitoring scheme to detect whether or not the manufacturing process is in control. If abnormal deviations from the standard are discovered, appropriate diagnostic and corrective actions need to be taken to improve the process quality.

Gaussian process regression
Gaussian process regression is a popular model in spatial statistics (Cressie, 1993), and it also plays an important role in metamodeling to approximate complex functions (see, e.g., Sacks et al. (1989)). The Gaussian process regression model can be expressed as

y(x) = λ(x) + Z(x), (2)

where λ(x) is a deterministic function, and Z(x) is a realization of a Gaussian process with zero mean. The Gaussian process can be viewed as a distribution over a set of continuous functions, and any finite sample from the Gaussian process follows a multivariate normal distribution. As a result, [y(x 1 ), y(x 2 ), · · · , y(x n )] T follows a multivariate normal distribution whose covariance matrix is determined by the covariance function k(x, x′) ≡ Cov(Z(x), Z(x′)), where k(·, ·) is a positive-definite kernel function. Commonly used kernel functions include the squared exponential function, Matérn functions, etc. (Rasmussen and Williams, 2006). The Gaussian process is attracting increasing interest due to its high levels of flexibility and conceptual simplicity. More important, it can provide a unique statistical view on the prediction errors. This feature also makes it useful in simulation optimization (Jones et al., 1998; Huang et al., 2006) and sequential sampling and experimental design (Jin et al., 2012). We also want to highlight that Equation (2) is essentially a nonparametric model. In other words, only knowing the parameters of λ(x) and Z(x) is insufficient to completely determine y(x). The observed data (whether noisy or not) are also required to provide meaningful predictions.
To demonstrate this, Fig. 4 illustrates two independent realizations of the same Gaussian process with radically different characteristics. As a result, simply monitoring the parameters of the Gaussian process models that are fitted for each individual wafer is insufficient to effectively detect changes in the geometric profiles.
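As a quick illustration of this point, the following sketch draws two independent realizations from the same zero-mean Gaussian process with a squared exponential kernel (the hyperparameter values here are illustrative, not those of any wafer model) and confirms that the two sample paths differ point-wise even though their distributions are identical:

```python
import numpy as np

def sq_exp_kernel(X1, X2, sigma2=1.0, theta=5.0):
    """Squared exponential covariance k(x, x') = sigma2 * exp(-theta * (x - x')^2)."""
    d = X1[:, None] - X2[None, :]
    return sigma2 * np.exp(-theta * d ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
K = sq_exp_kernel(x, x) + 1e-8 * np.eye(50)   # jitter for numerical stability
L = np.linalg.cholesky(K)

# Two independent realizations of the same Gaussian process: identical
# distribution, yet radically different sample paths (cf. Fig. 4).
z1 = L @ rng.standard_normal(50)
z2 = L @ rng.standard_normal(50)
max_gap = np.max(np.abs(z1 - z2))              # point-wise disagreement
```

This is why fitting a separate Gaussian process per wafer and comparing fitted parameters cannot, by itself, detect profile changes: the parameters of the two draws are identical by construction.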

AGP model
In Section 2.1, we use f (x) to represent the standard geometric profile as the desired or designed output from the process. However, the exact function is often unknown and needs to be estimated from historical data. Considering the flexibility requirement on approximation and the characteristics of spatial correlation, in this article we propose an AGP model to quantify the geometric variations when the process is in control. More specifically, we assume f (x) is a realization of a Gaussian process with mean μ and covariance function

s(x, x′) = σ 2 exp{−(x − x′) T diag(θ 1 )(x − x′)}, (3)

where θ 1 is the two-dimensional correlation parameter, and diag(θ 1 ) is the diagonal matrix with diagonal vector θ 1 . For demonstration purposes, in this article we use the squared exponential covariance function. Other covariance functions suggested in Rasmussen and Williams (2006) can also be easily applied in the AGP model. In addition, to model the spatial correlation in ε i (x), we assume ε i (x), i = 1, 2, · · · , N 0 , are independent realizations of another Gaussian process with mean zero and covariance function

v(x, x′) = τ 2 exp{−(x − x′) T diag(θ 2 )(x − x′)}. (4)

As a result, the observed quality measurements are simply the sum of the realizations of two Gaussian processes, which we refer to as the AGP.
Using the measurement data from in-control wafers, we can approximate the standard profile f (x) and quantify the amount of variation in ε i (x); this provides us with a baseline to monitor newly manufactured wafers. Combining all of the in-control measurements, we denote Y IC ≡ [Y 1 T , Y 2 T , · · · , Y N 0 T ] T , where M 0 = Σ i n i is the total number of measurements from all in-control wafers. Also, we denote β ≡ [μ, σ 2 , θ 1 , τ 2 , θ 2 ] as the entire set of parameters in the AGP model. Given β, we note that Y IC follows a multivariate normal distribution based on the property of the Gaussian process, with joint density function

p(Y IC |β) = (2π) −M 0 /2 |Σ 0 | −1/2 exp{−(1/2)(Y IC − μ1 M 0 ) T Σ 0 −1 (Y IC − μ1 M 0 )}, (5)

where 1 p is a p × 1 vector with all ones, and Σ 0 is the M 0 × M 0 covariance matrix of Y IC . Based on the AGP model, the covariance between elements of Y IC takes the form

Cov(y i (x i j ), y i′ (x i′k )) = s(x i j , x i′k ) + 1{i = i′} v(x i j , x ik ), (6)

because the deviation ε i (x) is assumed to be independent from ε j (x), j ≠ i, and within one wafer ε i (x i j ) and ε i (x ik ) are spatially correlated. Because of this special structure, Σ 0 is in fact the sum of two covariance matrices, as illustrated in Fig. 5.
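The additive structure of Σ 0 (Fig. 5) can be assembled directly: a dense matrix from the shared-surface covariance plus a block-diagonal matrix from the per-wafer deviations. A minimal sketch, using one-dimensional locations and illustrative parameter values (3 wafers, 4 measurements each) rather than real wafer data:

```python
import numpy as np

def sq_exp(X1, X2, amp2, theta):
    """Squared exponential covariance amp2 * exp(-theta * (x - x')^2), 1-D locations."""
    d = X1[:, None] - X2[None, :]
    return amp2 * np.exp(-theta * d ** 2)

# Hypothetical in-control data: N0 = 3 wafers, n_i = 4 measurements each.
# Parameters sigma2/theta1 (shared surface) and tau2/theta2 (deviations)
# are illustrative values, not estimates from any real process.
rng = np.random.default_rng(1)
X_ic = [np.sort(rng.uniform(0, 1, 4)) for _ in range(3)]
sigma2, theta1, tau2, theta2 = 0.2, 3.0, 0.05, 10.0

X_all = np.concatenate(X_ic)               # stacked locations, M0 = 12
S = sq_exp(X_all, X_all, sigma2, theta1)   # dense shared-surface covariance
V = np.zeros_like(S)                       # block-diagonal deviation covariance
start = 0
for Xi in X_ic:
    n_i = len(Xi)
    V[start:start + n_i, start:start + n_i] = sq_exp(Xi, Xi, tau2, theta2)
    start += n_i
Sigma0 = S + V                             # covariance of Y_IC, as in Fig. 5
```

The off-diagonal blocks of Σ 0 contain only the shared-surface term, reflecting the independence of deviations across wafers in Equation (6).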
Given the in-control measurements Y IC , we can compute the distribution of the measurements at any location on a new wafer if the process is still in control. In particular, the measurements Y l at locations X l on a new wafer follow a jointly normal distribution with Y IC :

[Y IC ; Y l ] ∼ N( μ1 M 0 +n l , [Σ 0 , Σ l,0 T ; Σ l,0 , Σ l ] ), (7)

where Σ l,0 is the n l × M 0 covariance matrix between Y l and Y IC , and Σ l is the n l × n l covariance matrix of Y l . Since ε l (x) is independent from previous deviations, the elements of Σ l,0 are simply s(x l j , x ik ), ∀ j = 1, 2, · · · , n l and all in-control locations x ik . Following Equation (7), the conditional distribution of Y l given Y IC still follows a multivariate normal distribution with mean vector

μ̃ l = μ1 n l + Σ l,0 Σ 0 −1 (Y IC − μ1 M 0 ) (8)

and covariance matrix

Σ̃ l = Σ l − Σ l,0 Σ 0 −1 Σ l,0 T . (9)

In other words, when the process is in control, we expect Y l to follow the normal distribution with mean μ̃ l and covariance Σ̃ l . We can use this information to develop monitoring statistics to detect changes in the process. If β is unknown, it can be substituted by the estimated value β̂, which can be obtained by maximizing the log-likelihood function (up to a constant) of the in-control samples:

ℓ(β) = −(1/2) log |Σ 0 | − (1/2)(Y IC − μ1 M 0 ) T Σ 0 −1 (Y IC − μ1 M 0 ).

In the case of two-dimensional location data, β has seven dimensions, and direct optimization of β might be difficult.
In Appendix A, we propose the maximum profile likelihood method that reduces the dimension of β and has better numerical stability.
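The conditional mean and covariance above are the standard Gaussian conditioning formulas. A minimal sketch of the prediction step, using one in-control wafer, one-dimensional locations, and illustrative parameter values (not the appendix's profile-likelihood estimates):

```python
import numpy as np

def sq_exp(X1, X2, amp2, theta):
    """Squared exponential covariance, 1-D locations."""
    d = X1[:, None] - X2[None, :]
    return amp2 * np.exp(-theta * d ** 2)

rng = np.random.default_rng(2)
mu, sigma2, theta1, tau2, theta2 = 1.0, 0.2, 3.0, 0.05, 10.0

# Hypothetical in-control measurements and new-wafer locations.
X_ic = np.linspace(0.0, 1.0, 8)
Sigma0 = sq_exp(X_ic, X_ic, sigma2, theta1) + sq_exp(X_ic, X_ic, tau2, theta2)
Y_ic = rng.multivariate_normal(mu * np.ones(8), Sigma0)

X_new = np.array([0.25, 0.75])
# Deviations on the new wafer are independent of past wafers, so the
# cross-covariance involves only the shared-surface kernel s(., .).
S_l0 = sq_exp(X_new, X_ic, sigma2, theta1)
S_l = sq_exp(X_new, X_new, sigma2, theta1) + sq_exp(X_new, X_new, tau2, theta2)

A = np.linalg.solve(Sigma0, Y_ic - mu)                       # Sigma0^{-1} (Y_IC - mu 1)
mu_tilde = mu + S_l0 @ A                                     # conditional mean
Sigma_tilde = S_l - S_l0 @ np.linalg.solve(Sigma0, S_l0.T)   # conditional covariance
```

Note that the single factorization of Σ 0 is reused for every new wafer, which is the computational point made in Remark 1.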
Remark 1. For general Gaussian process regression, the inverse of the covariance matrix can become numerically unstable when the sample size n is large. In addition, the computational complexity is of the order O(n 3 ), which increases significantly as n increases. These problems are well recognized in the literature. On the other hand, because of the many nice properties of the Gaussian process, there have been significant developments on large-scale computation of Gaussian processes (Cressie and Johannesson, 2008; Haaland and Qian, 2011; Ranjan et al., 2011). These improvements have extended its application to very large datasets. However, in our application, we do not require a large dataset to construct the AGP model. Also, as shown in Fig. 5, our covariance matrix contains block diagonal components. This structural advantage can also help to improve the numerical stability. Moreover, in our AGP model estimation, given the in-control sample data, the most expensive computations, such as inverting Σ 0 , only need to be performed once. All subsequent predictions only involve matrix multiplication. This further improves the computational stability.
Remark 2. It is also interesting to note that our AGP model is indeed different from the CGP model proposed by Ba and Joseph (2012), although they look similar to each other. The CGP model is mainly focused on tackling non-stationary simulation output. The authors use two Gaussian process covariance structures, one to model global correlation and one to model local correlation. Even though the resulting Gaussian process is still stationary, it performs much better in modeling heterogeneous simulation output compared with a single covariance structure. In fact, similar ideas were used in Haaland and Qian (2011), where even more layers of covariance structures were used to improve the accuracy. However, both models are used to approximate a single realization of a Gaussian process. In contrast, our model was motivated by a radically different setup. In our model, each surface corresponds to a different realization of the underlying Gaussian Process. As a result, the first component of the Gaussian Process is used to characterize the shared mean surface, whereas the second component of the Gaussian Process reflects the characteristics of the deviation surface. Different from the CGP model, which accepts measurements from one surface (realization) as input and finds covariance functions for global and local correlations, the AGP model needs measurements from multiple surfaces (realizations). It then estimates the common mean function and the distribution of the deviations from the mean. In summary, our AGP model is different from the CGP model in both motivation and mathematical details.

Statistical monitoring of geometric quality
The AGP model provides a quantification of the geometric profiles when the process is in control. Based on the model predictions, we can further set up control charts to monitor the geometric quality. In this article, we only consider simple Shewhart-type control charts. In other words, each new wafer is tested independently without information aggregation as in the Cumulative Sum (CUSUM) chart or Exponentially Weighted Moving Average (EWMA) chart. As a result, studying the Average Run Length (ARL) of the charts is equivalent to studying the α and β errors of the statistical testing procedure.

T 2 test
As previously mentioned, conditioned on the historical in-control measurements Y IC , the measurements on a new wafer follow a multivariate normal distribution if the process is in control. A natural choice to test whether or not Y l conforms with the predicted distribution can be stated as

H 0 : Y l ∼ N(μ̃ l , Σ̃ l ) versus H 1 : otherwise.

A commonly used test statistic is the T 2 statistic:

T 2 l = (Y l − μ̃ l ) T Σ̃ l −1 (Y l − μ̃ l ),

which follows the χ 2 n l distribution when H 0 is true (the process is in control). When T 2 l is larger than the control limit H T , we can reject H 0 (the process is out of control). The control limit can be determined such that the α error of the T 2 test meets a specified value ARL 0 .
When the numbers of measurements taken from different wafers differ, the distribution of T 2 l also varies with n l . In this case, we can use the p-value of the T 2 statistic as the monitoring statistic: p l = 1 − F χ 2 (T 2 l |ν), where F χ 2 (·|ν) denotes the Cumulative Distribution Function (CDF) of the χ 2 distribution with ν degrees of freedom. When p l is smaller than a given limit H p , we can declare that the process is out of control.
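The T 2 statistic and its p-value take only a few lines to compute. A sketch using scipy.stats.chi2, with illustrative (hypothetical) values for the predicted mean and covariance:

```python
import numpy as np
from scipy.stats import chi2

def t2_test(Y_l, mu_tilde, Sigma_tilde):
    """T^2 statistic and its chi-square p-value with n_l degrees of freedom."""
    r = Y_l - mu_tilde
    t2 = r @ np.linalg.solve(Sigma_tilde, r)
    p = 1.0 - chi2.cdf(t2, df=len(Y_l))
    return t2, p

# Illustrative in-control check: Y_l drawn from the predictive distribution.
rng = np.random.default_rng(3)
mu_tilde = np.array([1.0, 1.0])
Sigma_tilde = np.array([[0.05, 0.01], [0.01, 0.05]])
Y_l = rng.multivariate_normal(mu_tilde, Sigma_tilde)
t2, p = t2_test(Y_l, mu_tilde, Sigma_tilde)
H_T = chi2.ppf(0.99, df=2)     # control limit for alpha = 0.01, n_l = 2
```

Because the p-value is uniform under H 0 regardless of n l , the same limit H p applies when the number of measurements varies from wafer to wafer.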
Remark 3. In this article, we focus on Phase II analysis; i.e., we assume the parameters of the AGP model are known or have already been accurately estimated. In this case, the test statistics have an exact χ 2 distribution. Unfortunately, when the parameters of the model are unknown and need to be estimated from limited samples, the test statistics do not have known distributions. This issue is beyond the scope of this article, and we will investigate it in our future work.

GLR test
Despite the simplicity of the T 2 test, it is designed to detect omnibus changes. However, in our application, when changes are detected, we may want to further analyze their root causes, so it is desirable to identify the specific change types. Based on engineering knowledge, there are three typical change scenarios in surface fabrication: mean shift, variance change, and roughness change. Figure 6 demonstrates these change scenarios using one-dimensional curves as an example. In this circumstance, a GLR test that is able to provide change-type information can be applied.
In this section, we illustrate the procedure using one example involving multiple types of changes. In more detail, we assume that when the process is out of control, another geometric deviation is added to the model (1), leading to

y l (x l j ) = f (x l j ) + ε l (x l j ) + ξ l (x l j ),

where ξ l (x) denotes the additional geometric deviation due to the out-of-control manufacturing process. Again, without prior knowledge on the forms of the deviation, we assume that it is an independent realization of another Gaussian process with mean δ and covariance function

w(x l j , x lk ) = γ 2 exp{−(x l j − x lk ) T diag(θ l )(x l j − x lk )},

the structure of which is consistent with Equations (3) and (4). More important, each parameter of this new Gaussian process component corresponds to a different scenario in Fig. 6. For example, a mean shift leads to a non-zero δ; increased variance leads to a larger γ 2 , etc. For notational simplicity, we use Σ w , which depends on θ l and γ 2 , to represent the covariance matrix of ξ l (x) evaluated at locations X l . According to this assumption, testing whether or not the wafer is in conformance is equivalent to testing whether or not the deviation ξ l (x) is significantly different from zero. In other words, the hypothesis can be stated as

H 0 : δ = 0 and γ 2 = 0 versus H 1 : δ ≠ 0 or γ 2 > 0, (10)

for some non-zero δ, γ 2 , θ l .
Consequently, the GLR statistic in this context can be expressed as

R l = 2[ max δ,γ 2 ,θ l ℓ 1 (Y l ; δ, γ 2 , θ l ) − ℓ 0 (Y l ) ], (11)

where ℓ 1 and ℓ 0 denote the log-likelihoods of Y l under H 1 and H 0 , respectively. To find the distribution of R l when H 0 is true (the process is in control), we can reformulate the hypothesis: when γ 2 = 0, θ l is meaningless and does not need to appear in H 0 . In other words, we only require a non-negative θ l in both H 0 and H 1 . Since the condition γ 2 = 0 in H 0 is on the boundary of the parameter space, R l approximately follows a 50%:50% mixture of the χ 2 1 and χ 2 2 distributions when n l is large, according to Self and Liang (1987). In other words, the CDF of R l can be expressed as

F R l (r) = (1/2)F χ 2 1 (r) + (1/2)F χ 2 2 (r).

It is noted that this distribution of R l does not depend on n l or X l . Therefore, the same control limit can be used regardless of the number or locations of the measurements. When R l is larger than the control limit H R , we can reject H 0 (the process is out of control).
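Given the mixture null distribution above, the control limit H R is obtained by inverting the mixture CDF numerically; a sketch using scipy:

```python
from scipy.stats import chi2
from scipy.optimize import brentq

def mixture_cdf(r):
    """CDF of the 50%:50% mixture of chi2_1 and chi2_2 (large-sample null of R_l)."""
    return 0.5 * chi2.cdf(r, df=1) + 0.5 * chi2.cdf(r, df=2)

def glr_control_limit(alpha):
    """Control limit H_R with false-alarm rate alpha under the mixture null."""
    return brentq(lambda r: mixture_cdf(r) - (1.0 - alpha), 0.0, 100.0)

H_R = glr_control_limit(0.01)
```

For α = 0.01 this yields H R ≈ 8.27, the value used in the simulation study of Section 4.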

Remark 4.
When an out-of-control signal is received, the fitted parameters (δ, γ 2 , θ l ) in the statistic R l can be used to diagnose the specific type of change. The GLR test generally performs well when the changes from H 0 are sufficiently characterized by the alternative hypothesis. However, when the changes are different from the types stated in the alternative hypothesis, the performance of the GLR test might not be satisfactory. The statistic (11) is only one possible choice of the GLR statistic and is designed to characterize shifts occurring globally on the surface. If the process engineers have relevant knowledge on the shift patterns and regions, a more specific alternative hypothesis could be used in Equation (10), and the GLR method can still be applied based on the new hypothesis.
Remark 5. Generally speaking, the T 2 test does not require prior knowledge on the possible shift patterns. It is suitable when knowledge on shift patterns is limited. On the other hand, the GLR test is developed to detect specific types of change determined through the alternative hypothesis. A specifically designed GLR test is expected to be more powerful than a T 2 test toward certain types of changes. Unfortunately, the increase in its detection power comes at a price: it is not robust against other types of changes. As a result, which test to use largely depends on the practical problems and available information.

Simulation studies
In this section, we present some simulation studies to demonstrate the effectiveness of our proposed method. We first illustrate that the AGP model is indeed effective in approximating complex profiles, and the estimation procedure is numerically  stable. Then we further analyze the performance of the statistical monitoring schemes based on the AGP model. For easier illustration, we use a one-dimensional curve instead of a two-dimensional geometric profile in the demonstrations in this section.

Approximation by the AGP and its estimation performance
We first demonstrate that the AGP model is sufficient to approximate the complex standard profile and quantify the in-control variations from a group of samples with spatially correlated errors. In the simulation, we use a one-dimensional function (Shpak, 1995) as the standard curve. The spatially correlated error ε(x) was generated from a Gaussian process with mean η = 0, τ 2 = 0.05, and θ 2 = 5 in its covariance function (4). We generated N 0 = 10 in-control curves, with n i = 10 measurements taken from each curve. The measurement locations were randomly selected according to a Latin Hypercube Sampling (LHS) strategy. We used the maximum profile likelihood method to estimate the parameters β̂ of the AGP model and then predicted y at different locations.
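A one-dimensional LHS places exactly one point in each of n equal strata of the input range, which spreads the measurement locations more evenly than plain uniform sampling. A minimal sketch (the function name is ours, not a library routine), with the N 0 = 10, n i = 10 setting of this study:

```python
import numpy as np

def latin_hypercube_1d(n, rng):
    """One-dimensional Latin hypercube sample on [0, 1): one point per stratum."""
    edges = np.arange(n) / n                        # left edges of n equal strata
    pts = edges + rng.uniform(0.0, 1.0 / n, size=n) # one uniform draw per stratum
    return rng.permutation(pts)

rng = np.random.default_rng(4)
# N0 = 10 curves, n_i = 10 measurement locations per curve.
locations = [latin_hypercube_1d(10, rng) for _ in range(10)]
```

For the two-dimensional wafer surface, a standard 2-D LHS (permuting strata independently per coordinate) plays the same role.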
As a comparison, we use a Gaussian process with noisy observations (Rasmussen and Williams, 2006), which we call the Noisy Gaussian Process (NGP) model in this article. The NGP model assumes

y i j = f (x i j ) + e i j , i = 1, 2, · · · , N 0 ; j = 1, 2, · · · , n i ,

to fit the data, where the e i j are i.i.d. noises. More specifically, it still uses a Gaussian process to approximate the standard profile f (x i j ), but it simply uses i.i.d. noise to model the deviations between individual samples and the standard profile. Figure 7 compares the predicted means and covariances at different locations using both the AGP and NGP models.
It is shown in Fig. 7(a) that the difference between the mean predictions from AGP and NGP is not significant. Both predictions are very close to the exact function. More quantitatively, the mean prediction from the AGP model has an Integrated Mean Squared Error (IMSE) of 0.0052, whereas the IMSE of NGP is 0.0077. However, when predicting covariance, Fig. 7(d) clearly shows that the NGP model failed to predict the correct structure. This is simply because our simulation model includes a correlated noise, whereas the NGP model with an i.i.d. noise assumption will no longer closely fit the simulated data. In contrast, covariance prediction from AGP (Fig. 7(c)) is much closer to the exact case ( Fig. 7(b)). This comparison demonstrates that although the NGP is equally effective in mean prediction, our AGP model is overall more appropriate to model complex profiles with spatially correlated deviations.
We also conducted extensive simulations to show that the estimation procedure of the AGP model is accurate and numerically stable. In each simulation replication, we generated N 0 curves from an AGP model with parameters β ≡ [μ = 1, σ 2 = 0.2, θ 1 = 3, τ 2 = 0.05, θ 2 = 10]. n 0 measurements were taken from each curve based on the LHS strategy. From these data, the AGP parameters were estimated using the profile Maximum Likelihood Estimator (MLE) as presented in Appendix A. We repeated this procedure K = 1000 times for different pairs of (N 0 , n 0 ) and calculated the bias and Root Mean Squared Error (RMSE) of each component inside β̂. Using μ as an example:

Bias(μ̂) = (1/K) Σ K i=1 (μ̂ (i) − μ), RMSE(μ̂) = [ (1/K) Σ K i=1 (μ̂ (i) − μ) 2 ] 1/2 ,

where i is the replication index. Calculations for the other components followed the same manner. The numerical results are summarized in Table 1. It clearly shows that increasing the sample size, either larger N 0 or larger n 0 , can generally reduce the bias and RMSE. In addition, for the same sample size N 0 × n 0 , it is more helpful in terms of estimation to have a larger n 0 rather than a larger N 0 . Table 1 also reveals that the parameters of the second Gaussian process component are much easier to estimate than those of the first one. In addition, it is expected that a more sophisticated selection method of the measurement locations could further improve the estimation performance.
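The bias and RMSE computations are straightforward to implement; the sketch below applies them to synthetic "estimates" of μ = 1 rather than actual profile-MLE output:

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of a parameter estimate over K replications."""
    est = np.asarray(estimates, dtype=float)
    bias = np.mean(est - true_value)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, rmse

# Illustrative check with K = 1000 synthetic estimates of mu = 1
# (placeholder noise, not the MLE output summarized in Table 1).
rng = np.random.default_rng(5)
est_mu = 1.0 + 0.1 * rng.standard_normal(1000)
bias, rmse = bias_and_rmse(est_mu, 1.0)
```

Note the identity RMSE² = Bias² + variance of the estimates, which is why the two columns of Table 1 shrink together as the sample size grows.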

Known in-control parameters β
In this section, we further investigate the performance of the statistical monitoring schemes proposed in Section 3. We start by studying the charting performance when the standard function f (x) and the parameters of ε(x) are known exactly. In this case, when the process is in control, the measurements Y l at locations X l follow a normal distribution with mean f l ≡ [ f (x l1 ), f (x l2 ), · · · , f (x ln l )] T and covariance matrix Σ l = [v(x li , x l j |θ 2 )] n l ×n l . It is noted that since f (x) is exactly known, the first Gaussian process component in the AGP model is not needed, and Σ l is obtained using a single covariance function. Consequently, the T 2 statistic becomes

T 2 l = (Y l − f l ) T Σ l −1 (Y l − f l ).

Similarly, the GLR statistic can be simplified by replacing the conditional mean μ̃ l and covariance Σ̃ l with f l and Σ l . For comparison, we also consider another simple method called the Max-Min statistic, which is currently used in the semiconductor industry to monitor the thickness profile of a wafer. The Max-Min statistic calculates the difference between the largest and smallest measurements among all measured locations on each wafer. When the difference exceeds a certain limit H M , the process is considered to be out of control.
In this simulation, we used the same standard curve (12) and noise process as in Section 4.1. For each wafer to be tested, we randomly selected 20 locations to measure based on the LHS strategy. The control limits of the T² test and the GLR test can be analytically obtained from the χ²₂₀ distribution and the 50:50 mixture of the χ²₁ and χ²₂ distributions, respectively. However, the control limit of the Max-Min statistic can only be obtained through simulation. We chose the control limits such that the α error equals 0.01, and the corresponding control limits were H_T = 37.57, H_R = 8.27, and H_M = 5.39.
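The two analytical limits above can be reproduced with standard χ² quantiles; in the sketch below the search bracket [0, 50] is an arbitrary interval known to contain the mixture quantile:

```python
from scipy.stats import chi2
from scipy.optimize import brentq

alpha = 0.01

# T^2 limit: upper-alpha quantile of chi^2 with n_l = 20 degrees of freedom.
H_T = chi2.ppf(1 - alpha, df=20)

# GLR limit: upper-alpha quantile of the 50:50 mixture of chi^2_1 and chi^2_2,
# found by inverting the mixture CDF numerically.
H_R = brentq(
    lambda x: 0.5 * chi2.cdf(x, df=1) + 0.5 * chi2.cdf(x, df=2) - (1 - alpha),
    0.0, 50.0,
)

print(round(H_T, 2), round(H_R, 2))  # prints: 37.57 8.27
```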
Using these control limits, we compared the performance of the three tests in detecting different types of changes, including mean shifts, variance shifts, and correlation shifts. These correspond to changes in η, τ², and θ₂ from the values listed in Section 4.1. In each change scenario, we estimated the β error of the tests using 20 000 simulation replications. The Operating Characteristic (OC) curves for the different types of shifts are shown in Fig. 8.
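A single point on such an OC curve is a short Monte Carlo computation. The sketch below substitutes iid standard normal noise with identity covariance for the paper's AGP noise, so the resulting β errors are only qualitative:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def beta_error_t2(delta, n_l=20, alpha=0.01, n_rep=20000):
    """Monte Carlo beta error of the T^2 test against a mean shift of size
    delta, assuming a known zero mean and identity covariance in control."""
    limit = chi2.ppf(1 - alpha, df=n_l)
    y = rng.normal(delta, 1.0, size=(n_rep, n_l))  # shifted test wafers
    t2 = (y ** 2).sum(axis=1)                      # T^2 under identity covariance
    return float(np.mean(t2 <= limit))             # missed-detection rate
```

By construction, beta_error_t2(0.0) is close to 1 − α = 0.99, and the β error falls as the shift magnitude grows, which traces out one OC curve.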
It shows that both the T² test and the GLR test can effectively detect all three types of changes. However, the Max-Min test is not able to detect mean shifts or correlation shifts, because the difference between the largest and smallest measurement values remains the same (in distribution) when η or θ₂ changes. The Max-Min method also has a much larger β error in detecting variance shifts. Furthermore, the GLR test is much more sensitive than the T² test in detecting mean shifts, although it is slightly worse than the T² test in detecting variance shifts. When detecting correlation shifts, the GLR test can detect both increasing and decreasing θ₂, whereas the T² test is only able to detect increasing θ₂, which corresponds to an increased roughness of the geometric profile. On the other hand, the T² test performs much better than the GLR test in detecting large shifts of θ₂. This is mainly because when θ₂ changes, the scenario differs from the alternative hypothesis (10) stated in the GLR test, so the general-purpose GLR test is not very effective. However, if faster detection of correlation shifts is of interest, the GLR test can be improved by changing the alternative hypothesis. As we remarked in Section 3.2, the T² test is easy to implement and capable of detecting multiple types of changes without assumptions on the change scenario, whereas the GLR test is more complex yet flexible enough to cater to different detection requirements and able to provide additional information on the change type. Overall, both the T² and GLR tests have their own advantages, and we recommend that users choose between them based on practical considerations.
It is also interesting to note that the β error of the GLR test cannot decrease to zero even for a large magnitude of θ₂ shift. This might be explained by the Nyquist-Shannon sampling theorem (Shannon, 1949). Recall that the GLR test needs to estimate the shifted parameters from the data. Thus, with a finite sample size (20 in the simulation), it is unable to estimate large θ₂ values that correspond to high-frequency changes in the observations. To confirm this, we chose different numbers of measurements (nₗ = 10, 20, and 30) on each wafer and adjusted the control limits such that their α errors were equal. The performance of the tests for different nₗ is compared in Fig. 9.
It clearly indicates that a larger nₗ leads to better performance in detecting correlation shifts for both the T² test and the GLR test. Their performance in detecting variance shifts also improves with a larger nₗ, although the improvement is relatively small. As a result, if correlation changes (roughness changes) are of major interest, it is better to take more measurements from each wafer to conduct the statistical tests.

(Table 2: Average and standard error (in parentheses) of the real α error of the GLR and T² tests with nominal α₀.)

Unknown in-control parameters
The simulation studies in Section 4.2.1 compared the testing performance in the ideal case in which both f(x) and the process parameters of ϵ(x) are known exactly. In practice, this information is typically unknown, and we need to use the AGP model and the corresponding tests developed in Sections 2.3 and 3. As expected, with less information the performance is not as good as in the case of Section 4.2.1.
When the AGP model parameters β are known and the predictions are exact, the T² test statistic follows the χ² distribution with nₗ degrees of freedom exactly, whereas the GLR test statistic (11) asymptotically follows the 50:50 mixture of the χ²₁ and χ²₂ distributions when nₗ is large. However, when the parameters are estimated from historical data, these results are no longer valid: simply using the critical values derived from the theoretical distributions can lead to α errors that differ from the designed values (Jensen et al., 2006). To investigate the impact of parameter estimation and model approximation on the real α error, we designed the following simulation study.
In this simulation, we chose different combinations of in-control samples (N₀, n₀). We also considered different numbers of testing samples nₗ = 10, 20, and 30. In each replication, the in-control data were generated according to the specific setting, and the parameters of the AGP model were estimated from these data. With the critical values determined from the theoretical distributions at nominal error α₀, we estimated the real α error using 20 000 testing replications. This procedure was repeated 200 times, and the mean and standard error of the real α error are reported in Table 2.
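The mechanism behind this study can be illustrated with a much simpler stand-in: iid normal noise instead of the AGP, and plain mean/variance estimates instead of the profile MLE. The achieved false-alarm rate then deviates from the nominal α₀ when N₀ × n₀ is small:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def real_alpha(N0, n0, n_l, alpha0=0.05, n_test=20000):
    """Achieved false-alarm rate of a T^2-style test whose mean and variance
    are estimated from N0 * n0 in-control points (iid-normal stand-in)."""
    x = rng.normal(0.0, 1.0, size=N0 * n0)         # in-control phase-I data
    mu_hat, var_hat = x.mean(), x.var(ddof=1)      # plug-in parameter estimates
    limit = chi2.ppf(1 - alpha0, df=n_l)           # nominal critical value
    y = rng.normal(0.0, 1.0, size=(n_test, n_l))   # in-control test wafers
    t2 = ((y - mu_hat) ** 2 / var_hat).sum(axis=1)
    return float(np.mean(t2 > limit))
```

With a large phase-I sample the achieved rate approaches α₀, while small (N₀, n₀) can push it visibly off the nominal value.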
The table reveals that more in-control samples, i.e., a larger M₀ = N₀ × n₀, can indeed lead to a smaller discrepancy between the real and nominal α errors. In particular, a larger N₀ rather than a larger n₀ is more effective in reducing the difference. It also appears that nₗ has a negative effect on the accuracy of the α error, especially when nₗ > n₀. If a smaller α₀ is required, more in-control samples are needed to compensate for the effects of the estimated parameters.
When comparing performance in detecting different types of shifts, we adjusted the control limits such that all of the tests had the same real α error of 0.01. Both the T² test and the GLR test were constructed for each of three cases: Exact, AGP, and NGP, where Exact refers to the case in which all in-control parameters are known, as in Section 4.2.1. The OC curves of these tests in detecting different shifts are shown in Fig. 10.
It shows that the differences between the AGP model and the Exact scenario are small in most cases. In contrast, tests based on the NGP model are generally worse than those based on the AGP model, except in the mean-shift scenario. As compared in Section 4.1, the NGP model is as effective as the AGP model in predicting the mean, so their GLR test performances in the mean-shift scenario do not differ much. However, as we can observe from the figure, the T² test using the NGP model seems to be better than that using the AGP model and even the Exact case. This is because the NGP model does not account for the spatial correlation between samples; as a result, the T² statistic is more sensitive when the mean changes in one direction. A similar phenomenon in the context of time-series monitoring was reported by Zhang (1998), who found that when the time series is correlated, directly monitoring the samples (without accounting for the correlation) detects mean shifts better than monitoring the residuals (which are uncorrelated when the time-series model is correct). In practice, however, it is impractical to adjust the control limits in this way, because we cannot use simulation to evaluate the real α error. When using the limits based on nominal α errors, the real α error can be far from the nominal values; similar findings have been reported in the literature (Neuhardt, 1987; Montgomery et al., 1991). In summary, the presented results confirm that the AGP model is more suitable for approximating complex profiles with spatially correlated errors. Consequently, charts based on the AGP model generally have better performance in monitoring a variety of shifts.
Similar to its influence on the α error when the in-control parameters are unknown, the number of in-control samples (N₀, n₀) also has an impact on the detection performance. Using the same pairs of (N₀, n₀), we compare the OC curves of the tests constructed from the AGP model in Fig. 11.
Again, it shows that a larger in-control sample size generally leads to faster detection of the different shifts. The improvement is especially evident in detecting variance and correlation shifts; in contrast, only a marginal improvement can be observed in detecting mean shifts. Together with the earlier findings in this section, we can conclude that more in-control samples benefit the monitoring performance, resulting in a more accurate α error and a lower β error. When in-control samples are limited, a self-starting strategy (Sullivan and Jones, 2002) can be used to continuously improve the AGP model estimate during the monitoring process.

Application to the monitoring of wafer thickness profiles
In this section, we apply our proposed method to monitor the thickness profile of a silicon wafer after the slicing process. The data were collected from a real semiconductor fabrication plant (with preprocessing to remove sensitive information). A total of 38 wafers were measured, among which 8 wafers were identified as normal and used as in-control samples. Although a perfectly flat wafer surface is desired, variations in the manufacturing process often cause roughness in the thickness profile. However, as long as the deviation from a flat surface is acceptable, the surface can be considered in control. Figure 12 shows heat maps of the Gaussian-process-predicted thickness profiles of two different in-control wafers. Each prediction used 480 measurements at different locations. These two heat maps, together with subsequent heat maps, use the same color scale as Fig. 1. Figure 12 clearly indicates that the wafers are not as flat as we expect and not very smooth due to process variations. More importantly, the thickness profiles of the in-control samples are quite different, which makes it challenging to monitor the geometric quality using existing methods.
To monitor the remaining wafers, we used the eight in-control wafers to fit the AGP model. We collected 60 measurements from each of the in-control samples using a space-filling sampling strategy. These data were used to estimate the parameters of the AGP model, and the MLE values were μ̂ = −0.0159, σ̂² = 0.0043, θ̂₁ = [1.29 × 10⁻⁴, 3.82 × 10⁻⁴], τ̂² = 0.0022, and θ̂₂ = [0.0051, 0.0061]. Figure 13 shows the predicted standard thickness profile from the AGP model. It is interesting to note that the standard profile is not a simple flat surface. This is because the raw silicon ingot is highly likely to undergo stress deformation during the slicing process, which results in the slicing direction being non-perpendicular to the axial direction of the ingot. Therefore, the thickness profile of a sliced wafer has a specific geometric feature, whose common pattern is depicted by the standard profile. Despite the non-flatness, compared with Fig. 12 we can observe that the standard profile is much smoother, because the process variation has been filtered out of the standard profile in the AGP model. In addition, the deviations between each in-control sample and the standard profile were obtained and used to quantify the process variations. These eight deviation profiles were numerically inspected, and the results indicate that the spatial patterns of the deviations are similar and consistent with the assumptions of the AGP model. Please refer to the supplementary material for more details.
Based on the estimated AGP model, we can use the T² test and the GLR test to determine whether or not the remaining 30 wafers are conforming. From each wafer to be tested, 120 measurements were taken using the space-filling sampling strategy, and both test statistics were calculated using the procedures discussed in Section 3. To compare the T² test and the GLR test, we converted the test statistics to p-values; the results are shown in Fig. 14. The test results indicate that most of the wafers conform to the standard with acceptable variations. However, a few wafers fail both tests with α = 0.01. In Fig. B1 in Appendix B, these thickness profiles are shown in more detail, and they indeed display discrepancies from the standard profile and the other in-control wafers. The failed wafers are either much thicker or thinner in particular regions and overall much rougher than the in-control samples and the standard profile. We notice that wafer no. 12 failed the GLR test but passed the T² test. Analyzing the parameters of the GLR statistic, we found that a variance shift caused wafer no. 12 to fail the GLR test. As we can observe from Fig. B1(a), the thickness profile in the northeast region fluctuates considerably, which increases the overall variance. If the measurements had happened to miss this region or did not sample it sufficiently, the test might not have detected the abnormality on this wafer. This issue highlights the importance of the sampling strategy: by adaptively selecting the measurement points based on existing sampling information, the detection performance is expected to improve. We also report the performance of the charts based on the NGP model in the supplementary material; interested readers can refer to it for more discussion.
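The p-value conversion used here only requires the two null distributions described in Section 3 (χ² with nₗ degrees of freedom for T², and, asymptotically, the 50:50 χ²₁/χ²₂ mixture for the GLR). A sketch:

```python
from scipy.stats import chi2

def t2_pvalue(t2, n_l):
    """p-value of the T^2 statistic under its chi^2_{n_l} null distribution."""
    return float(chi2.sf(t2, df=n_l))

def glr_pvalue(glr):
    """p-value under the 50:50 mixture of chi^2_1 and chi^2_2."""
    return float(0.5 * chi2.sf(glr, df=1) + 0.5 * chi2.sf(glr, df=2))

# A wafer is flagged non-conforming when its p-value drops below alpha = 0.01.
```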
In short, since the NGP model does not account for the spatial correlation of the data, and since it is infeasible in practice to adjust the control limit through simulation, a significant number of false alarms is expected.

Conclusions and future directions
This article presented a systematic method for monitoring the geometric quality of a wafer. We proposed an AGP model to approximate the unknown standard geometric profile and to quantify the spatially correlated deviations of an in-control manufacturing process. Based on the AGP model, we developed two statistical tests, namely the T² test and the GLR test, to determine whether or not a newly produced wafer is conforming. Numerical simulations and real case studies have demonstrated that the proposed method is effective.
There are several topics worth further investigation. First, as demonstrated in Section 4.2.2, estimating the AGP parameters from the in-control samples often leads to an inaccurate α error for the developed tests. This problem is especially important when the number of in-control samples is limited; adjusting the control limit to account for parameter uncertainty may therefore improve the accuracy of the tests. Second, in this research all measurements were sampled using a space-filling strategy such as an LHS plan. A more sensible approach is to sequentially determine the measurement locations based on the current AGP model and the detection objective; such an adaptive sampling strategy is expected to improve the efficiency and effectiveness of the tests. Finally, other types of monitoring schemes that aggregate information across multiple wafers can be investigated to allow faster detection of changes.