Root cause estimation of faults in production processes: a novel approach inspired by approximate Bayesian computation

This paper presents a methodology for estimating the root causes of faults in multistage mass production processes that have three properties: (1) only the final inspection data can be acquired, (2) hundreds of products are manufactured in a lot-wise manner, and (3) the acquired dataset does not always follow a Gaussian distribution. The proposed method consists of two components: (i) derive the distribution of part variables from the inspection dataset by fusing approximate Bayesian computation (ABC) with the process model, and (ii) derive root cause scores from the normal and abnormal datasets, which quantify how much each part contributes to the abnormal condition. The proposed method can estimate candidates of the fault causes, and numerical experiments are performed to explore the effectiveness and limitations of the method. Furthermore, the application to actual data from an internal camera module yields results consistent with design information given as domain knowledge.


Introduction
Products are made up of various parts and fixtures through multistage manufacturing processes, and part fixturing strongly affects the quality of the final product (Whitney 2004). If all processes operate normally, the product will be of good quality, but faulty products are sometimes produced due to fixture errors arising somewhere under an abnormal process condition. For example, an internal camera module consists of several reflectors, detectors, and pins, the fixture deviations of which affect the optical quality (Figure 1). The module is mass-produced through a multistage assembly process with deviations, and faulty products are sometimes produced. Root cause identification (RCI) of faults is therefore a key issue for productivity improvement in quality control management and for high-throughput operations in manufacturing processes. If the root causes of faults can be identified automatically and rapidly, process downtime can be reduced because expert exploratory analysis is not required (Mahto and Kumar 2008). Furthermore, there is a possibility of finding fault sources that are difficult for experts to notice.
In the present study, root cause estimation is defined as the problem of extracting candidates of possible fault sources from the inspection dataset in the manufacturing processes, which can be regarded as a part of the RCI problem. (Supplemental data for this article can be accessed at https://doi.org/10.1080/00207543.2022.2042611.)
Various approaches to solving these problems have been proposed depending on the application scenario. Failure Mode and Effects Analysis (FMEA), Fault Tree Analysis (FTA), and fishbone analysis are traditional methods for estimating the root causes of failure events of a process (Cristea and Constantinescu 2017; Vesely et al. 1981; Hung and Sung 2011; Oliveira, Miguéis, and Borges 2021). Since they are usually based on the domain knowledge of experienced people, it is difficult to use the information in accumulated data straightforwardly and to develop an automated RCI system that feeds back into the actual situation. Data-driven approaches are useful for cases in which rich measurement data are available. For example, a Bayesian network (BN) is widely used to identify the graphical structures that describe relationships between variables in production processes (e.g. Carbery, Woods, and Marshall 2018; Lokrants, Gustavsson, and Jirstrand 2018; Jin, Liu, and Lin 2012; Verron, Li, and Tiplica 2010).
Alternative approaches are the combination of statistical analysis and the process (design) models that describe the product quality information flow in multistage manufacturing processes. Ding, Ceglarek, and Shi (2002) proposed a methodology called fault matching for the diagnosis of fixture failures based on the linear state-space model (SSM), which describes the product quality using information that includes the part fixturing layout geometry and sensor locations. Their approach is a pattern matching method based on principal component analysis (PCA), conducted by comparing the fault feature patterns extracted from the in-line process to pre-determined fault patterns generated from the analytical model. Zhou, Chen, and Shi (2004) estimated the mean and variance of the process fault based on maximum-likelihood estimation (MLE) and a hypothesis test by regarding the SSM as a generalised linear mixed model. These methods for fault source identification based on the linear process model were reviewed in Shi and Zhou (2009) and Ding, Zhou, and Chen (2005) and have been used successfully in various fields, e.g. the part assembly process (Mantripragada and Whitney 1999; Ding, Zhou, and Chen 2005; Mao, Chen, and Zhang 2016) and the machining process (Zhou, Huang, and Shi 2003; Loose, Zhou, and Ceglarek 2007; Abellan-Nebot et al. 2012). Du, Yao, and Huang (2015) proposed a Bayesian method to estimate process parameters such as the mean and variance during the ramp-up phase based on the SSM. More recently, Lee et al. (2020) showed that a Bayesian approach with a sparsity-enhancing prior can identify fault sources from measurement data of the key product characteristics. Most of these methods are based on the variance estimation of part-related variables that affect the product quality, where both the measurements and the variables are assumed to follow a Gaussian distribution.

Figure 1. Schematic diagram of the internal mechanics of the camera module. The optical parts, such as reflectors, detectors, and fixtures (e.g. pins), are assembled in the module. The incident light passes through two reflectors and enters the detectors. The product quality $y$ is measured as the light-receiving coordinate in the detector, i.e. the image coordinate. The reflectors shown in dotted lines indicate the ideal setting, in which the incident light enters the centre of the detectors. The reflectors shown as solid lines indicate a special case, in which a fixture deviation of the $i$th part, $x_i$, corresponding to the pin of Reflector1 occurs, so that the coordinate of the received light on the detectors shifts.
Our assembly processes have three major properties: (1) only the inspection data for the product quality can be acquired and there are no intermediate process data, i.e. end-of-line sensing; (2) hundreds of products are manufactured in the production lines in a lot-wise manner, i.e. a mass production process, so a set of inspection data is obtained per lot; and (3) the acquired dataset does not always follow a distribution that can be described by the error variance introduced in the process model, due to various types of errors (e.g. fixture-dependent errors, measurement errors, shape and size errors among parts, and individual differences in inspection equipment induced by shape deviations of moulds). Indeed, our process model only includes adjustment errors, modelled as a Gaussian distribution with a small variance. Furthermore, the distribution of the inspection datasets is sometimes asymmetric or multi-modal in abnormal conditions.
The present paper proposes a methodology for root cause estimation of faults in multistage assembly processes by formulating the problem as a Bayesian inversion problem. While most previous studies focus on identifying the root causes exactly, the proposed method instead provides candidates of the faulty part variables. Unlike RCI, the proposed method works even if the dimension of the part variables, which are unobservable, is larger than the dimension of the measurements. Narrowing down the fault sources, or confirming which part variables are normal, is beneficial for practical purposes.
The proposed method consists of two components: (1) derive the posterior distribution of part variables from the inspection dataset by using the design model, and (2) derive the root cause scores from the normal and abnormal datasets, which quantify how much each part contributes to the abnormal condition. The key of the proposed method for posterior estimation is called the ABC inspired inverse, which is based on approximate Bayesian computation (ABC) with an efficient algorithm for density ratio estimation, the Kullback-Leibler importance estimation procedure (KLIEP). The proposed method is fairly general and has the following advantages: it (1) requires only the inspection data of the final products, (2) does not depend on the functional form of the design model (the algorithm works even if the model cannot be described as a linear model), (3) is based on a non-parametric method, i.e. distribution-free estimation, for estimating the distribution of part variables from the inspection dataset, so a probabilistic model (e.g. a Gaussian distribution) does not have to be assumed, and (4) can quantify the root causes of the fault, thus providing candidates for the fault sources.

The comparison with typical related methods, namely fault matching (FM; Ding, Ceglarek, and Shi 2002), variance estimation (VE; Zhou, Chen, and Shi 2004), and Bayesian sparse estimation (BSE; Lee et al. 2020), is summarised in Table 1.
The properties of our product and assembly process are introduced in the next section. Section 3 shows that our problem reduces to a Bayesian inversion problem and reviews ABC and its naïve application to our problem. In Section 4, we present a general explanation of the ABC inspired inverse and the algorithm for root cause estimation based on it. Section 5 provides numerical experiments for the ABC inspired inverse using toy models. The actual analysis of the internal camera module is presented in Section 6. The present paper is concluded in Section 7.

Internal camera module
The internal camera modules are assembled from optical parts such as the detectors, reflectors, and several fixtures, e.g. pins or hinges that control the posture of the reflectors. The light emitted from the light source passes through the reflector and reaches the detectors (Figure 1). The posture of Reflector2 depends on that of Reflector1 because the reflectors move in conjunction with each other. Thus, the fixtures should be designed carefully considering the locations on the reflectors, the tolerances, and the adjustment processes in the actual assembly lines (Section 2.2).
In the present study, we define the part variables, $x$, as fixture and part deviations, e.g. pin or hinge deviations, size deviations of the reflectors, and location deviations of the detectors. Figure 1 shows the simplified structure of the internal camera module, which actually has twenty part variables: nine variables for Reflector1, seven variables for Reflector2, two variables for Detector1, and two variables for Detector2. The product quality $y$ is defined as the coordinates of the light (image) on the two-dimensional screens of the two detectors, so $y$ is a four-dimensional vector.

Ray-tracing model in the multistage assembly process
The ray-tracing technique has been commonly used in the design of optical systems (Hopkins 1950;Spencer and Murty 1962). Sasaki et al. (1998) and Sasaki et al. (1999) developed the virtual Production Try (PT) system of optical products based on ray tracing and reported how much each part affects the final optical performance.
In ray tracing, the intersection calculation and bending calculation are performed for each lens plane by treating the light beam (ray) emitted from the light source as a single straight line, and the path is traced in order from the light source to the image plane.
Our process model based on the ray-tracing model outputs the optical deviation $y$ from the part variables $x$ through the multistage assembly process (Figure 2). The internal camera module is mainly assembled in six stages: the first assembly stage, three adjustment stages, the final assembly stage, and the inspection stage. The first assembly stage determines the posture of the reflectors $u$ by installing 16 part variables, $x^{(1)} = (x_1, \ldots, x_{16})$. In the first adjustment process, the specific component $u^{(1)}_k$ is adjusted to the target value $\hat{u}_k$ by fixing the position of parts $x^{(1)}$, where an adjustment error $v_k$, defined as a Gaussian random variable with small variance, is introduced. For quality improvement, this adjustment is repeated three times, and then $u^{(2)}$, $u^{(3)}$, and $u^{(4)}$ are obtained as outputs. The external part variables $x_e = (x_{17}, \ldots, x_{20})$, corresponding to the positions of the detectors, are assembled with $u^{(4)}$ in the final assembly stage. Finally, the inspection is conducted, and then $y$ is acquired.
After some straightforward manipulation, the final form of the model can be written as Equation (1), where $A$ and $C = [C_u, C_e]$ represent the assembly matrices in the first and final assembly stages, respectively, and $B(k_j, l_j)$ encodes the adjustment part information, in which the $k$th component of the reflector posture is adjusted by controlling the $l$th part variable at the $j$th adjustment process. The transition matrix $\prod_a^b$ is defined by $\prod_{j=a}^{b} \bigl( I - B(k_j, l_j) A_{k_j,\cdot} \bigr)$. Given Equation (1), our model can be regarded as a state-space model, a well-known approach used to recursively describe variation propagation at the process level of a multistage assembly process (e.g. Jin and Shi 1999; Ding, Ceglarek, and Shi 2002; Zhou, Chen, and Shi 2004). The derivation of Equation (1) is presented in Supplementary Section S2. Note that the model only includes the adjustment error propagation and does not include other possible errors induced by the production process. This is a crucial problem because a straightforward simulation of the model does not follow the distribution of the dataset obtained from our actual production process. We discuss these points in Section 2.3 in detail.
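To make the structure of this forward process concrete, the following is a minimal Python sketch of one product's simulation, with hypothetical dimensions, matrices, and adjustment schedule; the real $A$, $C_u$, $C_e$, targets $\hat{u}_k$, and $(k_j, l_j)$ pairs come from the product design and are not given in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and matrices, for illustration only; the real A, C_u,
# C_e, targets, and adjustment schedule come from the product design.
D_X1, D_U, D_XE = 16, 8, 4
A = rng.normal(size=(D_U, D_X1))        # first assembly stage: u = A @ x1
C_u = rng.normal(size=(4, D_U))         # final assembly: y = C_u @ u + C_e @ x_e
C_e = rng.normal(size=(4, D_XE))
adjustments = [(0, 2), (1, 5), (2, 7)]  # (k, l): adjust posture k via part l
u_hat = np.zeros(D_U)                   # target values u_hat_k
sigma_v = 1e-3                          # std of the Gaussian adjustment error v_k

def simulate(x1, x_e):
    """Forward process model: part variables -> inspection output y."""
    x1 = np.array(x1, dtype=float)
    u = A @ x1                              # first assembly stage
    for k, l in adjustments:                # three adjustment stages
        v_k = rng.normal(0.0, sigma_v)      # adjustment error
        # move part l so that posture component k hits its (noisy) target
        x1[l] += (u_hat[k] + v_k - u[k]) / A[k, l]
        u = A @ x1
    return C_u @ u + C_e @ np.asarray(x_e)  # final assembly and inspection

y = simulate(rng.normal(0.0, 0.05, D_X1), rng.normal(0.0, 0.05, D_XE))
```

Note that each adjustment overwrites the effect of the adjusted part variable, which is exactly why faults on adjusted variables are hard to observe downstream (see Section 6).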

Root cause estimation
The internal camera modules are assembled through mass production processes, in which product groups are managed, for example, daily or lot by lot, so many products are handled collectively and the product qualities are inspected (Figure 3). Following this process, we obtain the inspection dataset $D = \{y_i\}_{i=1}^{N}$, where $y_i$ corresponds to the $i$th product quality. However, the intermediate information of the assemblies cannot be acquired except in special cases. Thus, it is not possible to track which part of which product was assembled in which process.
If all products were properly assembled and there were no possible errors, they would all be equally good. However, in actual cases, there are differences between products due to various types of error sources, such as fixture-dependent errors, measurement errors, shape and size errors among parts, thermal deformation of the lens, and individual differences in inspection equipment induced by shape deviations in the moulds. Thus, $D$ follows a distribution with a larger variance than that of the outputs of the ray-tracing model, because that model only includes adjustment errors. Suitable tolerances of the optical modules are determined at the design stage, and the processes are regarded as working normally when all of the assembled products are within the standard range defined in the product specifications. By monitoring $D$ (or its statistics, such as an average), we can grasp the product quality trend of the product group and easily determine whether the processes are working normally.
If $D$ changes from the normal condition, for example becoming asymmetric or bimodal, it indicates that faulty products have been included for some reason (Figure 3(b)). There are various potential fault patterns, and the difference between the normal and abnormal conditions can be complicated because many parts and fixtures are related to the product quality.
Our goal is to estimate the root causes of the fault, which reduces to the following question: 'Which part variables contribute to the abnormal condition?' In our process, although the inspection dataset is available, the part variable data cannot be acquired. Thus, the problem can be divided into two subproblems: (i) estimate the part variables from the inspection dataset, and (ii) score the degree to which each part variable affects the abnormality.
We propose a novel root cause estimation method based on a probabilistic approach to achieve our goal, which has two processes: for problem (i), we estimate the part variables $x$ as posterior distributions based on Bayesian estimation from the normal dataset by using the process model that describes the assembly process; for problem (ii), we derive the fault cause scores of each part variable from the outputs of the method for problem (i), given datasets containing normal and abnormal conditions.

Approximate Bayesian computation
In this section, we explain why we solve problem (i) described in Section 2.3 as a Bayesian inversion problem and introduce ABC to solve the problem. If we measured a sufficient number of indices as the product quality $y$ and the process model could be described as a deterministic model, then we could straightforwardly identify the root causes by inversion analysis of the process model. However, for the internal camera module, the product quality $y$ is a four-dimensional vector (Section 2.1), so it is quite difficult to identify the root causes of the fault from only the inspection dataset $D$ and the process model. Thus, the main strategy of the proposed method is to generate the part variable data virtually and execute forward analysis of the process model to compare with the inspection dataset $D$, instead of directly inverting the process model.

Figure caption: Under the normal process condition, the distribution of each part can be estimated using the ABC inspired inverse. In an abnormal condition, the distribution of $y$ is expected to change. In this case, the distribution of Part 3 is changed compared to the normal condition; the cause of the fault is then said to correspond to Part 3.

Bayesian inversion problem
The process model, i.e. the design simulator, describes the relationship between the part variable $x$ and the product quality $y$:
$$y = f_0(x) + \epsilon, \qquad (2)$$
where $x \in \mathbb{R}^{D_x}$, $f_0(x) \in \mathbb{R}^{D_y}$ is the deterministic part of the model, and $\epsilon$ represents errors, such as assembly errors, that are random components contained in the model. The process model projects $x$ onto $y$ with a certain noise width (Figure 4(a)). There are two major difficulties in deriving $x$ from $y$. First, in the case in which $D_x > D_y$, the system is underdetermined: the number of equations is smaller than the number of unknowns, so we cannot derive a unique inverse of the deterministic part of the model, $f_0^{-1}$. For example, if $f_0(x) = x_1 + x_2$ (so $D_x = 2 > D_y = 1$), every $x$ on the line $x_1 + x_2 = y$ maps to the same $y$. Second, even if we can derive the inverse $f_0^{-1}$, $y$ may lie outside $\mathrm{Im}\, f_0$ due to the random noise $\epsilon$, and we cannot apply the inverse.
To overcome these difficulties, Bayesian approaches have been widely used to estimate $x$ from $y$ (Dashti and Stuart 2015). Bayes' theorem gives
$$p(x \mid y) = \frac{p(y \mid x)\,\pi(x)}{p(y)}, \qquad (3)$$
where $p(x \mid y)$ is the posterior, $p(y \mid x)$ is the likelihood corresponding to the forward process defined by Equation (2), and $\pi(x)$ is the prior, which should be preset based on the initial degree of belief in $x$. Using this theorem, we can estimate the distribution of $x$ without directly using the inverse function. Furthermore, the noise $\epsilon$ can be incorporated into $p(y \mid x)$. The approach that treats the inverse problem as the problem of computing the posterior is called Bayesian inversion. As mentioned in Section 2, the outputs of the process model cannot cover the observations $D$ in principle (Figure 4(b)) because the process model only partially captures the features of the data. The goal of problem (i) is to find the set of variables $X^*$ mapped from the data $D$, which is inevitably represented as a distribution with a finite, wide width (Figure 4(c)). Even though the simulator does not completely represent the distribution of $D$, the proposed method may find $X^*$ by extending the techniques of the approximate Bayesian method with density ratio estimation. Furthermore, applying the method for problem (i) to the normal dataset $D_A$ and the fault dataset $D_B$, respectively, we can obtain the corresponding part variables $X^*_A$ and $X^*_B$. There is a difference between $X^*_A$ and $X^*_B$ depending on the change between $D_A$ and $D_B$ (Figure 3). Problem (ii) in Section 2 thus reduces to the problem of quantifying how much each part variable affects this difference.

Approximate Bayesian computation and its naïve application scenarios
In this subsection, we review ABC, particularly the ABC rejection algorithm, and introduce its naïve application scenarios to problem (i). Approximate Bayesian computation is a powerful tool for deriving the posterior distribution in Equation (3), even when the models are so complicated that calculating the likelihood is not practical. The ABC rejection algorithm draws a candidate $x^{(i)}$ from the prior $\pi(x)$, simulates $y^{(i)}$ through the model, and accepts $x^{(i)}$ if
$$\rho\bigl(S(D), S(y^{(i)})\bigr) \leq \epsilon,$$
where $D$ is the dataset, i.e. $D = \{y_i\}_{i=1}^{N}$, $S(\cdot)$ denotes summary statistics, $\rho(\cdot, \cdot)$ is a distance criterion, and $\epsilon$ is a tolerance level that is chosen to be small.
The distribution of the accepted samples in ABC rejection yields
$$p_{\mathrm{ABC}}(x \mid D) \propto \pi(x) \int p(y \mid x)\, \mathbb{I}\bigl[\rho(S(D), S(y)) \leq \epsilon\bigr]\, \mathrm{d}y,$$
where $\mathbb{I}(\cdot)$ denotes the indicator function. When $S(\cdot)$ is sufficient, the algorithm exactly gives the posterior distribution with $\epsilon = 0$, whereas it gives the prior as $\epsilon \to \infty$ (Sisson, Fan, and Beaumont 2016; Wilkinson 2008). Based on the ABC rejection algorithm, we can consider two specific algorithms for our problem, denoted ABC1 and ABC2.
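As a reference point for ABC1 and ABC2 below, the following is a minimal sketch of plain ABC rejection on a toy problem; the callables and their names are illustrative placeholders, not the paper's Algorithms 1 or 2.

```python
import numpy as np

def abc_rejection(data, prior_sample, simulate, summary, n_sim=100_000, eps=0.05):
    """Plain ABC rejection: keep x whose simulated summary is close to the data's.

    `prior_sample`, `simulate`, and `summary` are problem-specific callables;
    the names here are placeholders, not taken from the paper's algorithms.
    """
    s_data = summary(data)
    accepted = []
    for _ in range(n_sim):
        x = prior_sample()                    # draw a candidate from the prior
        y = simulate(x)                       # forward model
        if np.linalg.norm(summary(y) - s_data) <= eps:  # rho(S(D), S(y)) <= eps
            accepted.append(x)
    return np.asarray(accepted)

# Toy usage: infer the mean of a Gaussian from its sample mean.
rng = np.random.default_rng(1)
data = rng.normal(0.3, 0.1, size=200)
post = abc_rejection(
    data,
    prior_sample=lambda: rng.uniform(-1.0, 1.0, size=1),
    simulate=lambda x: rng.normal(x, 0.1, size=200),
    summary=lambda d: np.atleast_1d(np.mean(d)),
)
```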

ABC1
The top row in Figure 5 illustrates the inputs/output and a schematic diagram of the procedure in ABC1. The detailed algorithm of ABC1 is described in Algorithm 1. This is a simplified treatment of ABC rejection for our problem, but it has the following irrational points:
• The distance criterion $\rho$, the summary statistics $S$, and the tolerance level $\epsilon$ must be chosen appropriately.
• The comparison of $S(D)$ and $S(y^{(i)})$ is unreasonable.
The former is a general property of the ABC rejection algorithm. In particular, the choice of summary statistics critically affects the performance of ABC-based inference: poorly chosen summary statistics lead to a significant loss of distribution information, and then the accuracy, acceptance ratio, and stability may decrease (Csilléry et al. 2010; Marjoram and Tavaré 2006). The latter is a crucial problem in our setting because our process model includes only limited errors. Thus, it cannot completely reproduce the inspection dataset $D$: the simulated outputs $y^{(i)}$ have small deviations compared to the deviations of the data $D$.

ABC2
The middle row in Figure 5 illustrates the inputs/output and a schematic diagram of the procedure in ABC2. The detailed algorithm of ABC2 is described in Algorithm 2. In this method, we estimate the distribution of the hyperparameters $\phi$ in a generative model of the variable $x$. Note that the generative model $g(x \mid \phi)$ must be prepared separately from the process model $f$. Based on $\phi^{(j)}$ sampled from the prior $\pi(\phi)$, we can generate a sample set $\{x\}$ from $g(x \mid \phi^{(j)})$. Then, simulating $f(x)$ for each sample, we can obtain the set of $y$, i.e. $Y = \{f(x)\}$. ABC2 naturally outputs the distribution of $y$, so $Y$ and $D$ are comparable. However, ABC2 has the following disadvantages:
• The distance criterion $\rho$, the summary statistics $S$, and the tolerance level $\epsilon$ must be chosen appropriately.
• The generative model $g(x \mid \phi)$ and the prior $\pi(\phi)$ must be defined additionally.
• The distribution of $x$ cannot be directly obtained.
The first problem also occurs in ABC1. The second point is a crucial problem in ABC2 because, in general, the generative model $g(x \mid \phi)$ cannot be determined reasonably so as to produce $D$. Furthermore, it is not easy to interpret $\pi(\phi)$ in practical usage, because domain experts usually deal with data samples of the part variables but are unfamiliar with distributions over their deviations. In addition, when the distribution of $x$ is needed, we should compute $\int p(x \mid \phi)\,\pi(\phi)\,\mathrm{d}\phi$, which incurs an additional computation cost.
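For concreteness, the following sketch illustrates the flow of ABC2 under the assumption that $g(x \mid \phi)$ is Gaussian with $\phi = (\mu, \sigma)$ and the process model is a simple stand-in; the real generative model, prior over $\phi$, and summary statistics would have to be designed separately.

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.tan                                   # stand-in for the process model

def abc2(data, n_sim=20_000, n_lot=200, eps=0.05):
    """Sketch of ABC2: ABC on the hyperparameters of a generative model of x.

    Here g(x | phi) is assumed Gaussian with phi = (mu, sigma); these choices
    are illustrative assumptions, not the paper's Algorithm 2.
    """
    s_data = np.array([data.mean(), data.std()])     # summary statistics of D
    accepted = []
    for _ in range(n_sim):
        mu = rng.uniform(-1.0, 1.0)                  # phi ~ pi(phi)
        sigma = rng.uniform(0.0, 0.3)
        x = rng.normal(mu, sigma, size=n_lot)        # {x} ~ g(x | phi)
        y = f(x)                                     # Y = {f(x)}
        s_sim = np.array([y.mean(), y.std()])
        if np.linalg.norm(s_sim - s_data) <= eps:    # compare Y with D
            accepted.append((mu, sigma))
    return np.asarray(accepted)

phis = abc2(f(rng.normal(0.5, 0.05, size=500)))
```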

Root cause estimation method based on ABC and density ratio estimation
In this section, we describe the details of the proposed methods.

ABC inspired inverse
In this subsection, we propose a method by which to infer the unobserved information of the part variable set, which we call the ABC inspired inverse. A schematic diagram of this method is shown in the bottom row of Figure 5. Note that while the conventional ABC schemes calculate the distance between the summary statistics of the simulation output and the data in each iteration and determine acceptance/rejection per sample, the proposed method performs $N_{\mathrm{sim}}$ simulations in advance by generating part variable data virtually from a wide prior. After that, the sample set $D_{\mathrm{sim}}$ generated from the simulator can be obtained by combining the outputs of the process model.
The key idea of the ABC inspired inverse is to accept samples with probability proportional to the density ratio
$$r(y) = \frac{p_{\mathrm{data}}(y)}{p_{\mathrm{sim}}(y)},$$
where $p_{\mathrm{data}}$ and $p_{\mathrm{sim}}$ are the densities of the data and the simulated samples, respectively. The acceptance criterion in this method has the following properties:
• If $r(y^{(j)})$ is small, then there are few data in the region. Thus, the corresponding sample $x^{(j)}$ that outputs $y^{(j)}$ through the process model should be rejected.
• If $r(y^{(j)})$ is large, then the proportion of the observation data is high and the simulated data are relatively rare. Thus, the corresponding sample $x^{(j)}$ that outputs $y^{(j)}$ through the process model should be accepted.
As a result, we can obtain the sample set $\{y^{*(j)}\}_{j=1}^{n}$ following the distribution of the observation dataset and the corresponding part variables $\{x^{*(j)}\}_{j=1}^{n}$, where $n$ denotes the number of accepted samples. The detailed algorithm of the ABC inspired inverse is given as Algorithm 3. In contrast to ABC1 and ABC2, extra tuning factors such as $\rho$, $S$, $\epsilon$, and $g(x \mid \phi)$ are not required in this method.
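A minimal sketch of this acceptance scheme is given below; `density_ratio` stands for any estimator of $r(y)$ evaluated at the simulated points (e.g. the KLIEP sketch in the next part of this subsection), and all function names are illustrative rather than the paper's implementation.

```python
import numpy as np

def abc_inspired_inverse(data, prior_sample, simulate, density_ratio,
                         n_sim=100_000, seed=0):
    """Sketch of the ABC inspired inverse (cf. Algorithm 3): accept pre-simulated
    samples with probability proportional to r(y) = p_data(y) / p_sim(y).

    `density_ratio(data, Y)` is any estimator of r evaluated at the simulated
    points; all names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    X = np.stack([prior_sample() for _ in range(n_sim)])   # virtual part variables
    Y = np.stack([np.atleast_1d(simulate(x)) for x in X])  # simulate all in advance
    r = density_ratio(data, Y)                             # r(y^(j)) for each sample
    keep = rng.random(n_sim) < r / r.max()                 # probabilistic acceptance
    return X[keep], Y[keep]
```

Normalising by the maximum of the estimated ratio matches the choice of proportionality constant stated below.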
To clarify the properties of our algorithm in a mathematical manner, we present the following lemma and theorem; the proofs of Lemma 4.1 and Theorem 4.2 are given in Supplementary Section S1. Here, we set the proportionality constant of the density ratio to the reciprocal of $\max_y r(y)$.
To execute the proposed method, it is necessary to accurately estimate the density ratio. The Kullback-Leibler importance estimation procedure (KLIEP) is a useful algorithm that directly estimates the density ratio $r$ between $p_{\mathrm{data}}$ and $p_{\mathrm{sim}}$ without estimating the individual densities (Sugiyama, Suzuki, and Kanamori 2012). In the KLIEP, the density ratio is estimated by minimising the generalised Kullback-Leibler divergence between $p_{\mathrm{data}}$ and $p_{\mathrm{sim}} r_\theta$:
$$\mathrm{gKL}(p_{\mathrm{data}} \,\|\, p_{\mathrm{sim}} r_\theta) = \int \mathrm{d}y\, p_{\mathrm{data}}(y) \log \frac{p_{\mathrm{data}}(y)}{p_{\mathrm{sim}}(y)\, r_\theta(y)} - 1 + \int \mathrm{d}y\, p_{\mathrm{sim}}(y)\, r_\theta(y), \qquad (7)$$
where the density ratio $r_\theta(y)$ is parametrised by $\theta$. Although various models for the density ratio are possible, we consider the Gaussian kernel model
$$r_\theta(y) = \sum_{l} \theta_l \exp\left(-\frac{\|y - y_l\|^2}{2h^2}\right),$$
where the kernel centres $y_l$ are taken from the data samples and the hyperparameter $h$ denotes the bandwidth of the kernel, which can be determined using cross-validation. The Gaussian kernel model can represent rich non-linearities, i.e. complicated forms of the density ratio. Based on Equation (7), we can derive the final form of the optimisation by replacing the expectations with the empirical distributions as follows:
$$\min_{\theta}\; -\frac{1}{N_{\mathrm{obs}}} \sum_{i=1}^{N_{\mathrm{obs}}} \log r_\theta(y_i^{\mathrm{data}}) + \frac{1}{N_{\mathrm{sim}}} \sum_{j=1}^{N_{\mathrm{sim}}} r_\theta(y_j^{\mathrm{sim}}),$$
where we ignored terms unrelated to $\theta$.
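The following is a minimal KLIEP sketch along the lines of Sugiyama et al. (2008), using gradient ascent with constraint satisfaction; the number of kernel centres, learning rate, iteration count, and bandwidth handling are simplified assumptions rather than the paper's settings.

```python
import numpy as np

def _kernel(X, C, h):
    """Gaussian kernel matrix K[i, l] = exp(-||X_i - c_l||^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h ** 2))

def kliep(y_data, y_sim, h, n_centers=100, lr=1e-3, n_iter=2000, seed=0):
    """Minimal KLIEP sketch: fit r_theta(y) = sum_l theta_l K(y, c_l) by
    maximising the log-likelihood on the data samples subject to
    mean_j r_theta(y_j^sim) = 1 and theta >= 0. Inputs are 2-D arrays of
    shape (N, D); in practice h is chosen by cross-validation.
    Returns a callable estimate of r(y)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y_data), size=min(n_centers, len(y_data)), replace=False)
    C = y_data[idx]                              # kernel centres at data points
    K_d = _kernel(y_data, C, h)                  # (N_data, b)
    b_vec = _kernel(y_sim, C, h).mean(axis=0)    # constraint vector, shape (b,)
    theta = np.ones(len(C)) / len(C)
    for _ in range(n_iter):
        theta += lr * K_d.T @ (1.0 / (K_d @ theta + 1e-12))       # ascend likelihood
        theta += (1.0 - b_vec @ theta) * b_vec / (b_vec @ b_vec)  # satisfy constraint
        theta = np.maximum(theta, 0.0)                            # non-negativity
        theta /= max(b_vec @ theta, 1e-12)                        # renormalise
    return lambda y: _kernel(np.atleast_2d(y), C, h) @ theta
```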

Remark
According to Lemma 4.1 and Theorem 4.2, the ABC inspired inverse gives exact inference as long as the density ratio is estimated perfectly. It is known that when a non-parametric model (e.g. kernel basis functions centred at test samples) is used for importance estimation, the KLIEP converges to the true density ratio with a convergence rate slightly slower than order $N^{-1/2}$ under $N = N_{\mathrm{sim}} = N_{\mathrm{obs}}$ (see Theorems 1 and 2 in Sugiyama et al. 2008). Thus, our algorithm gives an approximate posterior, and the inference becomes exact as $N \to \infty$, provided that the samples are taken from a sufficiently wide prior distribution so that the outputs of the simulator $f$ cover the entire dataset.

Root cause scores
In this subsection, we define the root cause scores based on the ABC inspired inverse, which quantify how much each part variable affects the abnormal condition.

We have two datasets, $D_A$ and $D_B$, acquired in normal and abnormal conditions, respectively. The distribution of $D_B$ should be different from that of $D_A$ because $D_B$ is supposed to contain faults. It is natural to assume that the part variables that cause faults affect the shape of the distribution of $D_B$; in other words, variables that are not fault sources do not contribute to the difference between the distributions of $D_A$ and $D_B$. Our strategy is to score how much each part variable affects this difference.
First, applying the ABC inspired inverse to the normal-condition dataset $D_A$ in a straightforward manner, we obtain the sample set $X^*_A$, which can generate $D_A$. Next, we apply the method to the abnormal-condition dataset $D_B$, where a variable $x_d$ is sampled from the original prior $\pi(x_d)$ and the other variables $x_{\setminus d}$ are sampled from $X^*_A$. Based on this procedure, we count the number of accepted samples, which is defined as $a_d$. The key point of the proposed method is that $a_d$ can be regarded as the level of abnormal reproducibility, i.e. how well the dataset $D_B$ can be reproduced. If the value of $a_d$ is large, then the abnormal reproducibility is high, and $x_d$ can be the root cause of the abnormal condition, i.e. $x_d$ could be a faulty part. If the value of $a_d$ is small, then the abnormality cannot be reproduced, and $x_d$ is considered not to be a root cause. We can therefore assess root causes based on the numbers of accepted samples $\{a_d\}_{d=1}^{D_x}$. A schematic diagram of root cause estimation is illustrated in Figure 6.
Note the following when calculating the final scores of a root cause of the fault:
• Baseline: due to the effects of the design specifications and adjustment processes, the sensitivity of the final product quality $y$ varies depending on the part variable.
• Random number sequence dependency: since the proposed method is a probabilistic algorithm, its outputs depend on the random number sequence.
Based on the above properties, the final root cause score $c_d$ for $x_d$ is defined by Equation (13), which accounts for the baseline and the random-seed dependence described above.
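A sketch of the scoring loop implied by this procedure is given below; since the exact form of Equation (13) is not reproduced here, the baseline correction is left out and only the seed-averaged acceptance counts $a_d$ (with their standard deviations) are returned. All names are illustrative.

```python
import numpy as np

def root_cause_scores(D_B, X_A_star, prior_sample_d, simulate, density_ratio,
                      n_sim=20_000, n_seeds=10):
    """Sketch of the scoring loop behind the root cause scores. For each part
    variable d, x_d is drawn from its original prior while the remaining
    variables are resampled from the normal-condition posterior samples X*_A;
    a_d counts how many hybrid samples reproduce the abnormal dataset D_B.
    Averaging over seeds addresses the random-sequence dependency; the baseline
    correction and the exact form of Equation (13) are not reproduced here."""
    D_x = X_A_star.shape[1]
    counts = np.zeros((n_seeds, D_x))
    for s in range(n_seeds):
        rng = np.random.default_rng(s)
        for d in range(D_x):
            rows = rng.integers(0, len(X_A_star), size=n_sim)
            X = X_A_star[rows].copy()                  # x_{\d} from X*_A
            X[:, d] = prior_sample_d(d, n_sim, rng)    # x_d from the wide prior
            Y = np.stack([np.atleast_1d(simulate(x)) for x in X])
            r = density_ratio(D_B, Y)                  # ratio w.r.t. abnormal data
            accepted = rng.random(n_sim) < r / r.max()
            counts[s, d] = accepted.sum()              # a_d for this seed
    return counts.mean(axis=0), counts.std(axis=0)     # seed-averaged a_d
```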

Numerical experiments for ABC inspired inverse
In this section, we present an empirical performance evaluation of the ABC inspired inverse, i.e. Algorithm 3, using numerical experiments in comparison with ABC1 and ABC2.

Settings
An overview of the performance evaluation procedure is as follows:
(1) Generate datasets:
• generate $N$ samples of $x$ from the normal distribution $x_i \sim \mathcal{N}(\mu_0, \sigma_0^2)$;
• simulate $y$ based on a benchmark model.
(2) Apply the ABC inspired inverse to the data $D$ and obtain $Y^*$ and $X^*$: compute the density ratio using KLIEP, $r(y) = p_{\mathrm{data}}(y)/p_{\mathrm{sim}}(y)$, and accept each sample $x^{(j)}$ with probability proportional to the density ratio.
(3) Evaluate the performance measures described below:
• KL divergence in $y$ space: $\mathrm{KL}(p_{\mathrm{data}}(y) \,\|\, p_{\mathrm{ac}}(y))$;
• KL divergence in $x$ space: $\mathrm{KL}(p_{\mathrm{true}}(x \mid D) \,\|\, p_{\mathrm{ac}}(x \mid D))$;
• acceptance rate;
• computation time.
Here, we define $p_{\mathrm{data}}(y)$ and $p_{\mathrm{ac}}(y)$ as the distributions of $D$ and $Y^*$, respectively, while $p_{\mathrm{true}}(x \mid D)$ and $p_{\mathrm{ac}}(x \mid D)$ are the true and estimated posterior distributions. In our settings, $N = 1000$, $\mu_0 = 0.5$, and $\sigma_0 = 0.05$. The outputs $y$ were computed through the benchmark models $f(x)$, whose mathematical formulations, parameter settings, and performance measures are described in the following subsections.

Benchmark models: y = f(x)
We used typical models, including non-linear functions, whose posteriors can be computed analytically, as follows:
• Monotonic function: $f_m(x) = \tan x$. In probabilistic form, the conditional densities can be written explicitly; the problem in this case is a well-posed inverse problem, and hence the posterior is equal to the generative model of $x$, $\mathcal{N}(\mu_0, \sigma_0^2)$.
• Quadratic function: $f_q(x)$. The conditional densities are given by Equation (17), where $x_{ij}$ represents the $j$th solution of $y_i$ and $f_q'(x_{ij})$ is the derivative of $f_q$ at $x_{ij}$. This case corresponds to a specific version of the next benchmark model, and the derivation of Equation (17) is described in Supplementary Section S3. The parameters in $f_q$ were set to $a = 1$ and $b = -1$.
• Monotonic function with small noise: $f_{mn}(x) = f_m(x) + \mathcal{N}(0, \sigma_f^2)$. The conditional densities are given by Equation (19), which is derived in Supplementary Section S3.
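Putting the pieces together, the following sketch runs the monotonic benchmark end to end, reusing the illustrative `abc_inspired_inverse` and `kliep` functions from Section 4; the bandwidth value and prior range are arbitrary choices, not the paper's settings.

```python
import numpy as np

# Monotonic benchmark from Section 5.1, reusing the sketches above
# (abc_inspired_inverse and kliep are the illustrative functions defined earlier).
rng = np.random.default_rng(3)
N, mu0, sigma0 = 1000, 0.5, 0.05
D = np.tan(rng.normal(mu0, sigma0, size=N))[:, None]     # dataset, shape (N, 1)

X_star, Y_star = abc_inspired_inverse(
    data=D,
    prior_sample=lambda: rng.uniform(0.0, 1.0, size=1),  # wide prior over x
    simulate=np.tan,
    density_ratio=lambda data, Y: kliep(data, Y, h=0.1)(Y),
    n_sim=100_000,
)
# For the well-posed monotonic case, X_star should approximately recover
# N(mu0, sigma0^2), cf. Figure 7.
print(X_star.mean(), X_star.std())
```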

Performance measures
To evaluate the performance of the algorithms, we introduce the following performance measures:
• KL divergence in $y$ space: $\mathrm{KL}(p_{\mathrm{data}}(y) \,\|\, p_{\mathrm{ac}}(y))$. The algorithms output the accepted samples in $y$ space, i.e. $Y^*$, so we can compare the distributions using the KL divergence. The KL divergence from $Q$ to $P$ is defined as
$$\mathrm{KL}(P \,\|\, Q) = \sum_{z \in \chi} P(z) \log \frac{P(z)}{Q(z)},$$
where the discrete probability distributions $P$ and $Q$ are defined on the same probability space $\chi$. The KL divergence is zero if and only if $P = Q$ almost everywhere. It can be estimated simply through discretisation of $P$ and $Q$ with finite bins in $\chi$ (Bettinardi 2020).
• KL divergence in $x$ space: $\mathrm{KL}(p_{\mathrm{true}}(x \mid D) \,\|\, p_{\mathrm{ac}}(x \mid D))$. As above, we can quantify the difference of the estimated $X^*$ from the true posterior.
• Acceptance rate: how many samples were accepted out of the fixed number of samples. This is a key performance metric for ABC-based approaches because it represents the efficiency of the calculation.
• Computation time: the runtime of each algorithm.

Figure 7 shows the results for the ABC inspired inverse on each benchmark model. The panels in the left-hand column show $Y^*$ with the data $D$, and the panels in the right-hand column show $X^*$ with the posterior $p_{\mathrm{true}}(x \mid D)$ for the case in which $N_{\mathrm{sim}} = 10^5$. The accepted samples $Y^*$ and $X^*$ were able to sufficiently recover the original data distribution and the true posteriors in all cases. The quadratic function in our case has two inverse solutions, so the posterior is a bimodal distribution. The posterior of the monotonic function with noise is asymmetric due to the noise, whereas that of the monotonic function is symmetric, corresponding to the generative model $x_i \sim \mathcal{N}(\mu_0, \sigma_0^2)$ described in Section 5.1.

Figure 8 shows the KL divergence in $y$ space (left column) and $x$ space (right column) for ABC1, ABC2, and the ABC inspired inverse. The ABC inspired inverse performed well overall compared to the other methods. The performance depends on the tolerance level $\epsilon$ in ABC1 and ABC2, whereas the proposed method does not have a corresponding parameter. The results using ABC1 were poor because of the output mismatch between the process model and the data. The KL divergences in ABC2 became smaller as $\epsilon$ became smaller, because the posterior of ABC2 agrees with the true posterior as $\epsilon \to 0$, even though the number of accepted samples was then small (left-hand panels in Figure 9). The acceptance rates of the ABC inspired inverse were relatively high in all cases, and the acceptance rates of both ABC1 and ABC2 increased as the tolerance level $\epsilon$ increased. The right-hand panels in Figure 9 show the computation time as a function of the number of simulations $N_{\mathrm{sim}}$. Although the ABC inspired inverse took longer to run when $N_{\mathrm{sim}}$ was small, its computation time increased only mildly as $N_{\mathrm{sim}}$ increased compared with the other methods. The ABC inspired inverse processes all simulations before making acceptance decisions, so the dependency of the runtime on $N_{\mathrm{sim}}$ is considered to be small.

Figure 8. Kullback-Leibler divergence in the $y$ space (left-hand panels) and $x$ space (right-hand panels). The box plots represent the median, the 25th and 75th percentiles, and the outliers.
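As a small illustration of the binned KL estimate mentioned above, assuming one-dimensional sample sets:

```python
import numpy as np

def kl_divergence_hist(p_samples, q_samples, n_bins=50, eps=1e-12):
    """Estimate KL(P || Q) from two 1-D sample sets by discretising both onto a
    shared set of finite bins, as described for the performance measures above."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(p_samples, bins=bins)
    q, _ = np.histogram(q_samples, bins=bins)
    p = p / p.sum() + eps                     # normalised histogram of P
    q = q / q.sum() + eps                     # normalised histogram of Q
    return float(np.sum(p * np.log(p / q)))  # KL(P || Q) over the shared bins
```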

Root cause estimation of the internal camera module
In this section, we apply Algorithm 4 to the assembly process of the internal camera module.

Properties of the root cause estimation
First, we examined the properties of root cause estimation in the internal camera module through numerical experiments using the ray-tracing model. Here, we considered representative examples in which typical changes occurred from the normal state.
When running the ray-tracing model, $N$ samples of each part variable $x_d$ are generated from the uniform distribution $U(\lambda^1_d, \lambda^2_d)$, where $\lambda^1_d$ and $\lambda^2_d$ can be determined from the product design. The procedure of the experiments is as follows:
(1) We generated samples $x_i$ from a Gaussian distribution with mean $\mu$ and covariance $\Sigma$, a diagonal matrix with $\sigma_d^2$ as the $d$th diagonal element, and then obtained $y_i$ through the process model, where $N = 500$. The dataset generated with these parameters was regarded as the normal dataset $D_A$.
(2) Assuming that a fault occurs in the $f$th part, $x_f$, we generated $N$ samples of $x_f$ by shifting the mean and variance from the normal quantities, i.e. $\mathcal{N}(\mu + \delta\mu_f, (\sigma_f + \delta\sigma_f)^2)$. The other variables ($d \neq f$) were generated from the same distribution as in the normal condition. This process produced the data $\{y_i\}_{i=1}^{N}$ corresponding to $D_B$.
(3) We applied Algorithm 4 to $D_A$ and $D_B$ and obtained the root cause scores $c$ defined by Equation (13).

Figure 10 shows the results of the experiments. In the case that $\delta\mu_{19} = \delta\sigma_{19} = 0.1$, the distribution of $y_3$ in $D_B$ was different from that in $D_A$, whereas there was almost no change in $y_1$, $y_2$, or $y_4$ (left-hand panels in Figure 10(a)). This is because $x_{19}$ directly affects $y_3$ in our process model. The root cause score $c_{19}$ took the highest value, which corresponds to the true root cause (right-hand panels in Figure 10(a)). Although the variables $x_{13}$, $x_{14}$, and $x_2$ also yielded relatively high scores, this is a natural result because these parameters affect $y_3$ (see Supplementary Table S1). To be precise, our formulation provides candidates for the root causes of faults: although part variables with low scores can be considered not to be root cause factors, part variables with high scores are not always root causes of the faults.
The proposed method also worked well in a more complicated case (Figure 10(b)), where $\delta\mu_7 = \delta\sigma_7 = 0.05$ and the variable $x_7$ is upstream in the process and affects multiple components of the product quality $y$. However, the proposed method could not identify the root cause for the case in which $\delta\mu_1 = \delta\sigma_1 = 0.05$, as shown in Figure 10(c). This is because the value of $x_1$ is adjusted in the adjustment process, i.e. it is reset to control the position of the reflectors. Hence, even if the distribution of $x_1$ is shifted, the shift does not affect the product quality $y$. Thus, the proposed method cannot estimate root causes, such as adjustment variables, that do not affect the final outputs.
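For reference, steps (1) and (2) of the procedure above can be sketched as follows, reusing the illustrative `simulate` function from Section 2.2; the means, standard deviations, and shift sizes here are placeholders, not the paper's design values.

```python
import numpy as np

# Sketch of steps (1)-(2): generate the normal lot D_A and an abnormal lot D_B
# by shifting the mean and variance of one part variable. `simulate` is the
# illustrative forward model sketched in Section 2.2.
rng = np.random.default_rng(4)
D_x, N, f_idx = 20, 500, 18                 # fault injected on x_19 (0-based 18)
mu = np.zeros(D_x)                          # normal-condition means (assumed)
sigma = np.full(D_x, 0.05)                  # normal-condition stds (assumed)
d_mu, d_sigma = 0.1, 0.1                    # delta mu_f and delta sigma_f

def make_lot(mu, sigma):
    X = rng.normal(mu, sigma, size=(N, D_x))
    # first 16 variables are internal (x1..x16), last 4 external (x17..x20)
    return np.stack([simulate(x[:16], x[16:]) for x in X])

D_A = make_lot(mu, sigma)                   # normal dataset
mu_B, sigma_B = mu.copy(), sigma.copy()
mu_B[f_idx] += d_mu                         # shift the faulty part's mean
sigma_B[f_idx] += d_sigma                   # and widen its variance
D_B = make_lot(mu_B, sigma_B)               # abnormal dataset
```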

Real data analysis
Finally, we report the results for an actual production process of the internal camera module. The real datasets were acquired from the actual process, where several optical products are assembled every day (Figure 11). We had two datasets, each containing 500 samples: one was the normal dataset and the other was the abnormal dataset. All of the products are checked in the inspection process as to whether their qualities meet the standards; if the inspection is passed, the products are shipped. The normal dataset was collected from the set of products to be shipped, while the abnormal dataset was assumed to contain faulty products to be repaired. As shown in the left-hand panels of Figure 12, the distributions of the normal and abnormal datasets actually differ. Our goal is to estimate the root causes of the fault in the absence of the part variable data (Section 2.2). We applied Algorithm 4 to evaluate the root cause score of each part variable by using both $D_A$ and $D_B$. The results of the root cause estimation showed that $c_{19}$, $c_{13}$, and $c_{14}$ took high values; thus, the 19th, 13th, and 14th part variables can be interpreted as candidates for the root cause of the fault. Mechanical experts had confirmed, using heuristic factor analysis from a design point of view, that the 19th part variable was a critical cause of the fault. By analysing the simulation model, it can be shown that $x_{19}$, $x_{13}$, and $x_{14}$ are among the parts that are sensitive to $y_3$ (Supplementary Table S1). In contrast to such expert analysis, the proposed method was able to extract these variables automatically, with quantified scores.

Conclusion
In the present study, we proposed a probabilistic method for estimating root causes of faults in the multistage assembly processes. In particular, our target assembly processes only provide the inspection dataset of hundreds of products through a mass production process, whereas part variable data cannot be acquired.
The key to the proposed method is to estimate the posterior based on ABC and the KLIEP by formulating the problem as a Bayesian inversion problem, and to define the root cause score of each part variable, which indicates how much that variable contributes to the abnormal condition. Note that the root cause scores represent candidates of the fault sources, whereas variables with low scores are considered not to affect the abnormal condition. This concept of root cause estimation differs from that of traditional root cause identification and gives useful information for practical use.
The numerical experiments on toy examples showed that the ABC inspired inverse can estimate the posterior of part variables with high accuracy and efficient computation time compared to naïve applications of ABC, even when the process model is a non-linear function. Furthermore, we verified the properties of our root cause estimation using the process model of the internal camera module, which was formulated as a linear state-space model. The proposed method made it possible to quantify the degree of influence of each part variable on faults and to capture candidates of root causes in the internal camera module. We applied the proposed method to the actual datasets and showed that the result was in good agreement with the expert analysis.
The proposed method is applicable to more complex cases because it is fairly general and provides fault candidates more stably and rapidly than expert exploratory analysis. Furthermore, the proposed method may find fault sources that are difficult for experts to notice. Hence, it can be introduced into a process diagnosis system to help improve productivity in quality control management and high-throughput operations.
We recognise two major limitations of the proposed method. The first is that our method could not detect the root cause when the fault occurs on a part variable that is adjusted (see Figure 10(c)), where a deviation of the part is reset to a target value and the part variable therefore does not affect the final inspection data. This limitation cannot be overcome unless we directly measure the part variable, but this case does not cause any practical issues because such a part variable is adjusted even if faulty. The second is that we formulated only the adjustment error in the design model, although there are actually various types of error (see Figure 4). In our scenario, it was difficult to evaluate other error profiles individually, so the proposed method handles the error variations by sampling the part variables from a wide prior distribution. If each error profile were available, we might be able to develop a more accurate algorithm, but clarifying the error profiles is also challenging.

Figure 12. The data obtained in the actual assembly processes (left-hand and centre panels) and root cause scores (right-hand panel). The blue and red bars represent the normal and abnormal data, respectively, in the left-hand panels. The error bars in the right-hand panel represent the standard deviations when running the algorithm by changing the random seed 10 times.
Regarding performance comparison, although it is a promising research direction, other RCI methods are not easily applied to our scenario. Fault matching (Ding, Ceglarek, and Shi 2002) requires listing all fault patterns in advance; in our scenario, a fluctuation of every part variable can be regarded as a fault pattern, so the number of fault patterns is larger than the dimension of the measurements, which makes the problem underdetermined. Variance estimation (Zhou, Chen, and Shi 2004) and Bayesian sparse estimation (Lee et al. 2020) could, in principle, be applied to our scenario. However, those methods require parametrising the data distribution, and, as we saw in the real dataset, the actual data distribution is so complex that it is hardly explainable by simple distributions such as the Gaussian. Hence, quantitative comparison with those methods should be considered together with how to parametrise the distribution of the actual dataset.

Note
1. Approximate Bayesian computation has been used for estimating the posterior of parameters in a model. In the present study, the parameters of the process model are determined at the design stage. In the context of Bayesian inversion, x is regarded as random variables instead of parameters, but the algorithm is essentially unchanged.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Yosuke Otsubo is Technical Expert and Section Manager of the Mathematical Sciences Research Laboratory (MSRL) in the Advanced Technology R&D Division of Nikon Corporation. He has been a visiting researcher at the RIKEN Center for Advanced Intelligence Project, Japan, since 2018. He studied quantum computation and received a Ph.D. in Science from the University of Tokyo, Japan in 2014. His current research interests include interpretable/explainable machine learning and AI systems using domain knowledge in the industrial sector. He is working with various business units involved in the imaging solution business, semiconductor equipment business, and healthcare business.

Naoya Otani is Senior Researcher of the Mathematical Sciences Research Laboratory (MSRL) in the Advanced Technology R&D Division of Nikon Corporation. He studied applied physics and received a master's degree from the University of Tokyo, Japan in 2012. His current research interests include machine learning algorithms with imperfect information and the implementation of such algorithms on edge devices. He is collaborating with the Healthcare Business Unit in Nikon to develop functions of AI microscopes. He has also been a visiting researcher at RIKEN, Japan, since 2020.

Megumi Chikasue is a member of the Mathematical Sciences Research Laboratory (MSRL) in the Advanced Technology R&D Division of Nikon Corporation. She studied physics and received a master's degree from Yokohama National University, Japan in 2012. Her current research interests include data analysis for manufacturing processes. She is collaborating with several factories for digital cameras and FPD lithography systems in Nikon.

Mineyuki Nishino is Section Manager of the Mathematical Sciences Research Laboratory (MSRL) in the Advanced Technology R&D Division of Nikon Corporation. He studied the mechanics of structures and materials for aerospace applications and received a Ph.D. in aerospace engineering from the University of Tokyo, Japan in 2006. His research interests include the manufacturing of precision equipment, data analytics, and simulation for manufacturing processes. He is collaborating with several factories for digital cameras, lenses, and lithography systems in Nikon.

Masashi Sugiyama is Director of the RIKEN Center for Advanced Intelligence Project and Professor at the University of Tokyo, Japan. He received a Ph.D. degree in Computer Science from Tokyo Institute of Technology, Japan in 2001. His research interests include theories and algorithms of machine learning, and a wide range of applications such as signal processing, image processing, and robot control. He is a recipient of the Japan Academy Medal in 2017.

Data availability statement
Data are not available due to industry-specific restrictions: owing to the nature of this research, the participants of this study did not agree for their data to be shared publicly.