Tensor Mixed Effects Model with Applications in Nanomanufacturing Inspection

Raman mapping technique has been used to perform in-line quality inspections of nanomanufacturing processes. In such an application, massive high-dimensional Raman mapping data with mixed effects is generated. In general, fixed effects and random effects in the multi-array Raman data are associated with different quality characteristics such as fabrication consistency, uniformity, defects, et al. The existing tensor decomposition methods cannot separate mixed effects, and existing mixed effects model can only handle matrix data but not high-dimensional multi-array data. In this paper, we propose a tensor mixed effects (TME) model to analyze massive high-dimensional Raman mapping data with complex structure. The proposed TME model can (i) separate fixed effects and random effects in a tensor domain; (ii) explore the correlations along different dimensions; and (iii) realize efficient parameter estimation by a proposed iterative double Flip-Flop algorithm. We also investigate the properties of the TME model, existence and identifiability of parameter estimation. The numerical analysis demonstrates the efficiency and accuracy of the parameter estimation in the TME model. Convergence and asymptotic properties are discussed in the simulation and surrogate data analysis. The case study shows an application of the TME model in quantifying the influence of alignment on carbon nanotubes buckypaper. Moreover, the TME model can be applied to provide potential solutions for a family of tensor data analytics problems with mixed effects.

accuracy of the parameter estimation in the TME model.Convergence and asymptotic properties are discussed in the simulation and surrogate data analysis.The case study shows an application of the TME model in quantifying the influence of alignment on carbon nanotubes buckypaper.
Moreover, the TME model can be applied to provide potential solutions for a family of tensor data analytics problems with mixed effects.

Introduction
Carbon nanotubes (CNTs) buckypaper is an important multifunctional platform material with great potential for creating lightweight and high-performance materials for various applications due to buckypaper's superior mechanical and electrical characteristics.One of the critical bottlenecks in the massive production and applications of high-quality buckypaper is quality inspection and monitoring of nanomanufacturing processes.The challenges include: (i) applying quick and accurate quality metrology to obtain information associated with microstructure, (ii) characterizing and analyzing in-line data to extract useful quality information for inspection and monitoring.
As an effective characterization method for nanostructure information, Raman spectroscopy is very suitable for in-line quality inspection of nanomanufacturing processes.As an example, one Raman spectrum of single-walled CNTs buckypaper is shown in Fig. 1.In the figure, the Raman peak intensity corresponds to material concentration and distribution; peak frequency is associated with molecular structure and phase; bandwidth is associated with crystallinity and phase (Salzer and Siesler 2009); intensity ratio of D-band and G-band can be affected by degree of functionalization (Cheng et al. 2009).Therefore, numerous vital information about buckypaper quality is hidden in the Raman spectra data, which provides unprecedented opportunities for quality inspection, system informatics, and monitoring.Due to the recent development of metrology technologies, Raman mapping (also called Raman spectral imaging) can be used to perform in-line quality inspection in continuous CNTs buckypaper nanomanufacturing processes.Raman mapping is a technique for generating detailed multi-array Raman spectra including numerous information about nanomaterials.Meanwhile, it is a challenging task to conduct data analytics, feature extraction, pattern recognition and in-line decision making, due to the high-dimensionality, large data size, as well as complex spatial and temporal correlation structures of Raman mapping.Specifically, in a Raman mapping measurement for single-walled CNTs buckypaper, about 600 Raman spectra can be collected per minute from a rectangular zone with a dimension of 10 micrometers by 60 micrometers.As shown in Fig. 2, multiple measurement points are chosen from a rectangular zone, and each measurement point generates one Raman spectrum.Every Raman spectrum includes 1024 Raman shifts and intensities.The correlations along x/y directions of the rectangular zone are different due to the alignment of carbon nanotubes in the CNTs buckypaper.Meanwhile, the correlation along the Raman shift is different from the aforementioned spatial correlation.

Fig. 2. Raman mapping from the rectangular zone in CNTs buckypaper
According to the data structure of Raman mapping, tensor is an efficient mathematical tool for formulating the Raman mapping for nanomanufacturing inspection.Tensors (also called multidimensional arrays) have become increasingly important because they provide a concise mathematical framework for formulating the high-dimensional data.Similar to the linear regression model in classical statistics, people use high-order tensor decompositions in high-dimensional statistics, such as CANDECOM/PARAFAC (CP) decomposition and Tucker decomposition, to separate different components inherent to the data.Kolda and Bader (2009) provided an overview of higher-order tensor decompositions, their applications, and available software.Corresponding to the generalized linear model (GLM) in statistics, Zhou et al. (2013) proposed a GLM model in the tensor domain, which extends the classical vector-valued covariate regression to an array-valued covariate regression.However, these tensor-based methods do not consider the multilevel variabilities (mixed effects) in the datasets.
A mixed effects model is a statistical model containing both fixed effects and random effects.It has nice properties, including (i) the capability to handle multilevel hierarchical data, such as longitudinal data with multiple measurements collected over time for an individual sensor; (ii) its ability to take complex association structures, including the correlation between different groups and correlation within an individual group, into consideration.Thus, the mixed effects model is widely used in a variety of disciplines such as physics, biology, engineering and social sciences (Demidenko 2013;Galecki and Burzykowski 2013).However, the classical mixed effects model treats multivariate data as a vector or a matrix, which is insufficient for analysis of highdimensional data, such as tensor-type Raman mapping with its high dimensionality and complex correlations.Thus we need to develop a novel tensor mixed effects (TME) model that can explore mixed effects in the tensor domain.
We emphasize the motivation of developing a TME model with an example in nanomanufacturing.Raman mapping data are collected to inspect the quality of continuously fabricated CNTs buckypaper.There are multiple components in the data that are associated with different critical quality characteristics.Specifically, fixed effects measure the fabrication consistency of quality features (such as degree of alignment, degree of functionalization, nanotube distribution, and dispersion).This indicates whether there is a gradual mean shift in the roll-to-roll fabrication process of CNTs buckypaper.In addition, random effects are relevant to the uniformity and defect information.The uniformity pertains to the status of the quality indices, while the defect information consists of the number and the pattern of defects in the CNTs buckypaper (Yue et al. 2018).Therefore, it is necessary to use a mixed effects model to decompose different effects in the Raman mapping data.From another point of view, Raman mapping data have tensor structures.
One Raman mapping dataset usually contains multiple dimensions: two measurement coordinates, Raman shift (frequency) and Raman intensity.If matricization or vectorization is conducted to process the Raman data, a classical mixed effects model can be developed.However, this vectorized linear mixed effects (vLME) model has three limitations: (i) the dimension after vectorization becomes very high and a large sample size is required for accurate parameter estimation; Also, vectorization destroys the tensor structure and results that the corresponding basis matrices are not full rank; (ii) the computation cost will be large, and it cannot meet the needs of in-line inspection and monitoring; (iii) the transformation alters the inherent multi-way correlation structures, which makes the correlation along different dimensions unobtainable.To overcome these three limitations, this paper proposes the tensor mixed effects (TME) model.
The TME model can effectively and efficiently explore multilevel variabilities (including fixed effects and random effects) inherent to tensor-structured high-dimensional data.It can be regarded as a logical extension from a vector-valued or matrix-valued mixed effects model to an array-valued mixed effects model.It is a challenging task to develop the TME model because (i) it deals with high-dimensional datasets with tensor structure; (ii) an efficient algorithm is required to do parameter estimation; (iii) it is necessary to ensure the identifiability of multi-dimensional correlations.In this paper, we propose the TME model and explore its properties.An iterative Flip-Flop algorithm is proposed for parameter estimation.
The remainder of this paper is organized as follows.Section 2 introduces basic tensor notation and preliminaries, and further proposes the TME model.Section 3 describes the proposed maximum likelihood estimation (MLE) algorithm for the TME model, in addition to investigating the existence of the MLE and the identifiability of the TME model.In Section 4, a double Flip-Flop algorithm is proposed to conduct parameter estimation of the TME model.In addition, initialization and convergence criteria of the algorithm are provided.Sections 5 presents a numerical simulation, a surrogated data analysis and a real case study of Raman mapping to test the performance of the TME model.Finally, a brief summary is provided in Section 6.

Tensor Mixed Effects Model
In this section, we first introduce the tensor notation and preliminaries.Then, we propose the TME model and explore the random distribution of tensor responses.Next, we discuss the maximum likelihood estimation (MLE) for the TME model, the conditions for the existence of the MLE, and the constraints to ensure the identifiability.

Tensor Notation and Preliminaries
In this section, basic notations, definitions, and matrix/array operators in multilinear (tensor) algebra are introduced and summarized.The terminology used here remains as consistent as possible with the terminology of previous publications (Kolda and Bader 2009;Zhou et al. 2013) in the area of tensor algebra.Scalars are denoted by lowercase italic letters, e.g., ; vectors by lowercase italic boldface letters, e.g., ; matrices by uppercase italic boldface letters, e.g., ; and tensors by calligraphic letters, e.g., .The order of a tensor is the number of dimensions (modes).
For example, an  -order tensor is denoted by  ∈ ℝ  1 ×•••×  , where   denotes the  -mode dimension of .The  th entry of a vector  is denoted by   , the element (, ) of a matrix  is denoted by   , and the element (, , ) of a third-order tensor  is denoted by   .Indices range from 1 to their capital versions, e.g.,  = 1,•••, .
Matricization, also known as unfolding or flattening, is the process of reordering the elements of a tensor into a matrix (Kolda and Bader 2009).The -mode matricization of a tensor . vec() is the vectorization of a tensor .The -mode product of a tensor  ∈ ℝ  1 ×•••×  with a matrix  ∈ ℝ ×  is denoted by  ×   and elementwise, we , where all the indices range from 1 to their capital versions, e.g., the index  goes from 1,2, … , , and the index   goes from 1,2, … ,   .The kronecker product of matrices  and  are denoted by ⨂ .The kronecker product is an operation on two matrices resulting in a block matrix and it is a generalization of the outer product.
Similar to the classical mixed effects model, we assume that the specification of the random effects core tensor   and the residual errors tensor   follow tensor normal distributions.
Particularly, the tensor normal distribution of random effects core tensor   is   2 , 2 , 2 (;  r ,  r ,  r ), where the mean tensor  is a zero tensor, and the covariance matrices In addition to the fixed effects core tensor and design matrices, the TME model includes two sources of random components: the random effects accounting for covariance along different dimensions, and the residual errors   relevant to the inevitable random noise.Based on the properties of tensor normal distribution, we can derive the random distribution of   , as shown in Proposition 1.
Proposition 1.The response tensor (1) follows a tensor normal distribution, that is Proof: please see the appendix A.1 in the supplementary materials.
In this section, we proposed the TME model and specified the distribution of random effects core tensor and errors tensor.We also derived the random distribution of the response tensors in Proposition 1, which lays a foundation for inference in Section 3.

Inference of the TME Model
This section discusses how to estimate the parameters in the TME model by using the maximum likelihood estimation (MLE).Generally speaking, the parameter estimation of a TME model involves three steps: (i) constructing a log-likelihood function for the MLE with relevant probability distribution functions; (ii) deriving the MLE of fixed effects and total covariance matrices along different dimensions; and (iii) obtaining the MLE for covariance matrices of residual errors based on the conditional probability distribution.

Maximum Likelihood Estimation of Fixed Effects and Total Covariance Matrices
We know that the response tensor   follows the tensor normal distribution in Equation ( 2) and the -mode matricization  () follows the matrix normal distributions as shown in Equations (3-5).Thus, we have Proposition 2.
Proposition 2. The log-likelihood functions of Equations (3-5) are the same, and can be represented as Proof: please see the appendix A.2 in the supplementary materials.
Maximization of the log-likelihood function ( 6) yields the MLE estimators, which are shown in Proposition 3.
Moreover, we can show that the estimator vec( ̂) given in Equation ( 7) is uniquely determined regardless of the parametrization of the covariance matrices, which is explored in appendix A.3.
From Equations (8-13), we can see that the estimators of covariance matrices are crossrelated.A Flip-Flop type algorithm is designed to compute them.We will describe the algorithm in Section 4. Before that, we continue to explore the MLE for the covariance matrices of residual errors based on the conditional probability distributions.

Maximum Likelihood Estimation of Random Effects and Residual Covariance Matrices
After finishing the estimation of the fixed effects and total covariance matrices, we consider the estimation for the covariance matrices of random effects and residual errors.The distribution of random effects core tensor   conditional on response tensors follows a tensor normal distribution.Assuming   and  ̃ are known, the estimation of   is the expectation of   |  and it can obtain The distribution of   −  ̃− ⟦  ;   (1) ,   (2) ,   (3) ⟧ conditioned on the random effects core tensor   is a tensor normal distribution given by (  −  ̃− ⟦  ;   (1) ,   (2) ,   (3) ⟧) |  ∽  ,, (;  ε ,  ε ,  ε ).
Similar to Proposition 3, we have the maximum likelihood estimators of  ε ,  ε ,  ε are Comparing Equations (15-17) with Equations (8-10), we notice similar patterns.The mean components change from   −  ̃ to   −  ̃−  ̃ based on the estimation of random effects.After that, a progressive estimation for covariance matrices  ε ,  ε and  ε are obtained.
We know that the covariance matrices   ,   ,   and  ε ,  ε ,  ε should be positive definite.
In order to ensure the positive definite property in Equations (8-10, 15-17), the existence of the MLE should be explored.This is shown in Section 3.2.Based on   = (1)  r   (1)  +  ε ,   =   (2)  r   (2)  +  ε ,   =   (3)  r   (3)  +  ε , we know that the covariance matrices of random effects and residual errors are not unique.The identifiability should be investigated, which is discussed in Section 3.3.The Proof is straightforward according to the conclusion in the paper (Manceur and Dutilleul 2013) The proof is straightforward by using the conclusion (Burg et al. 1982), and it is an extension of Theorem 3 in page 6 of Roś et al. (2016).

Existence of the MLE
In Proposition 6 is a three-dimensional extension of Theorem 8 on page 14 of Roś et al. (2016).
matrices are equal to 1.The constraints do not restrict the application of the TME model since the relative magnitude of the entries in the covariance matrix will be relevant to the key information that we care about.
Based on   =   (1)  r   (1)  +  ε ,   =   (2)  r   (2)  +  ε and   =   (3)  r   (3)  +  ε , we know that the covariance matrices of random effects and residual errors are not unique.In order to ensure identifiability in a similar way to the classical mixed effects model (Demidenko 2013), we need to ensure that the design matrices   (1) ,   (2) ,   (3) have full rank and to specify the structure of the covariance matrices  ε ,  ε ,  ε .In general, there are two ways to specify the structure of the covariance matrices: One way is to assume the covariance matrices  ε ,  ε ,  ε are diagonal matrices (corresponding to the independent noise).Notably, the covariance matrices for random effects  r ,  r ,  r and the total covariance matrices   ,   ,   are not diagonal.Another way is to determine the noise pattern based on a phase-I analysis.For example the noise is found to have a signal dependent property and the noise parameters are consistent for different data acquisition time in Raman inspection of nanomanufacturing (Yue et al. 2017).

Double Flip-Flop Algorithm
For existing maximum likelihood estimation of covariance matrices with Kronecker structure, a Flip-Flop algorithm has been proposed to update the estimation of several components sequentially and iteratively (Dutilleul 1999;Lu and Zimmerman 2004;Manceur and Dutilleul 2013; Sakata 2016).We have derived the maximum likelihood estimators for the TME model in Section 3. In this section, we will propose a double Flip-Flop algorithm to implement the parameter estimation iteratively.The algorithm is shown in Table 1.➢ Calculate the mean response tensor  ̅ , and then use high-order orthogonal iteration (HOOI) to compute a rank-( 1 ,  1 ,  1 ) Tucker decomposition,  ̅ = ⟦ {0} ;   (1){0} ,   (2){0} ,   (3){0} ⟧.The decomposed core tensor and factor matrices work as the initialized fixed effect core tensor and design matrices for fixed effects.
Step 5: Iterate between steps 2 and 4 until convergence or until reaching a predetermined number of iterations .
Step 10: Iterate between steps 7 and 9 until convergence or until reaching a predetermined number of iterations .
Note that the algorithm involves two iterative loops.The first one is related to the computation of the fixed effects and total covariance matrices, and the second one is relevant to the computation of covariance matrices of residual errors and random effects.Each loop follows the characteristic of a Flip-Flop algorithm, and that is why it is named after the double Flip-Flop algorithm.

Initialization of the Algorithm
Obtaining good initial values is important for parameter estimation in the TME model.For the initialization, we use high-order orthogonal iteration (HOOI) to compute a rank-( 1 ,  1 ,  1 ) Tucker decomposition.In the HOOI algorithm, a higher-order Singular Value Decomposition (HOSVD) is applied to initialize the factor matrices, and a set of orthogonal constraints to ensure the core tensor is all-orthogonal.This improves the uniqueness of Tucker decomposition (Kolda and Bader 2009).It is typically a challenging task to determine the parameters  1 ,  1 ,  1 .Basically, the parameters  1 ,  1 ,  1 should be relevant to the rank of given tensor.However, there is no straightforward algorithm to determine the rank of a specific given tensor, which is NP-hard (Hastad 1990, Kolda andBader 2009).In the implementations, we determine the key dimensional parameters by using the following procedures: (1) with the assumption that only part of features have random effects, we test that the ranges of these parameters   ,   ,   are J: K: L~1:1:1.The ranges of parameters   ,   ,   are   :   :   ~1:1:1.
(2) we conduct tensor decomposition for each combination of parameters.
(3) after obtaining core tensor from tensor decomposition, we check the sparsity for each combination of parameters.The sparsity is usually represented by the number of far-from-zero entries.For example, we choose a sparsity criterion that the summation of absolute value at one row of core tensor should be larger than a specific threshold.(4) we select the key dimensional parameters that have the largest summation of these parameters, as well as satisfy the sparsity criterion.The design matrices   (1){0} ,   (2){0} ,   (3){0} are determined by the factor matrices from the Tucker decomposition.The design matrices for random effects   (1){0} ,   (2){0} ,   (3){0} can be chosen as a subset of appropriate columns of the design matrices   (1){0} ,   (2){0} ,   (3){0} .While the columns are determined by possible random effects relevant to features of interest in a phase-I data analysis.In phase-I data analysis, a set of process data is gathered and analyzed all at once in a retrospective analysis, and the features of interest will be chosen by multiple trials.

Convergence of the Algorithm
Lu and Zimmerman ( 2004) have explored the convergence of a Flip-Flop algorithm.
According to the paper (Lu and Zimmerman 2004), the likelihood function of successive iterations of a Flip-Flop algorithm cannot decrease.Provided  ≥ , the algorithm is guaranteed to converge.However, whether it converges to a MLE is not ensured because the space of the covariance matrices is not convex.An empirical study of the convergence is investigated in Section 5.1.
The most commonly used stopping criteria are ones that based on the relative change in either the covariance parameters between successive iterations or differences between successive log-likelihood functions.Considering all the covariance matrices, the stopping criteria for the first  et al. 2008).We also investigate the asymptotic properties in Sections 5 through simulation and surrogated data analysis.

Computational Complexity of the Algorithm
Since the double Flip-Flop algorithm uses the HOOI algorithm to do initialization and then conducts two iterative Flip-Flop loops, we need to analyze the computational cost of this algorithm.
For simplicity, we assume the dimensions  =  =  ,  1 =  1 =  1 ,  2 =  2 =  2 .In the initialization part, each iteration in HOOI involves six tensor-by-matrix products and three maximization problems, where the computational complexity for each HOOI iteration is ( 3  1 +  1 4 +  1 6 ) (Elden and Savas 2009).The computational complexity of step 3 is ( 5 ), while step 4 is ( 1 9 +  3  1 3 ).Therefore the cost for each iteration of the first Flip-Flop loop is ( 5 +  1 9 +  3  1 3 ).Similarly, the computational complexities for step 8 and step 9 are ( 5 ) and ( 3  2 ), respectively.Thus, the computation cost for each iteration of the second Flip-Flop loop is ( 5 ), which is dominated by step 8.The computational time will also be impacted by the iteration number.According to the simulation study in Section 5.1, the algorithm will converge quickly.
To show the computational advantage of the proposed TME model, we consider the conventional linear mixed effects model for vectorized responses (marked as vLME) (Galecki and Burzykowski 2013).After vectorization of the tensor responses, the dimension of each response becomes  3 .Thus the computational complexity of the vLME model is ( 9 ) (Lippert et al. 2011).It is much larger than the complexity of the TME model, which is ( 5 +  1 9 +  3  1 3 ).

Simulation Study
In this section, the performance of the iterative algorithm is evaluated through simulation studies.In order to simulate the response tensor with mixed effects, we generate the fixed effects tensor with dimension 30×5×5.The dimensions of core tensors for fixed effects and random effects are 8×3×3 and 3×2×2, respectively.The covariance matrices of random effects are generated from random symmetric positive definite matrices.Two covariance matrices of residual errors are generated by isotropic matrices (an isotropic matrix is an identity matrix multiplied by a positive number) with dimension 5×5 and another covariance matrix with dimension 30×30.
1000 response tensors are generated to test the performance.A computer with Intel Core i7-4500U processor and 8.00GB RAM is used to conduct the numerical analysis.
After we generate the dataset, we run the double Flip-Flop algorithm for parameter estimation of a tensor mixed effect model.In order to explore the quantitative estimation accuracy and the asymptotic properties, we conduct the parameter estimation of the TME model for different sample sizes from 50 to 800.
One hundred simulation runs are tested, where the mean and the standard deviation of the quantitative indices   ,  = {,   ,   ,   ,   ,   ,   } are calculated.The results are listed in Table 2.For the convergence speed, we notice that as the increase of sample size from 50 to 800, the average iteration number becomes smaller (from 9.32 to 6.05).While the average time per iteration increases from 0.62 seconds to 9.89 seconds in the first loop and from 0.22 seconds to 3.48 seconds in the second loop.For the quantitative estimation accuracy, the indices   ,    ,    ,    ,    ,    ,    become smaller as the sample size increases.Which means that the estimated parameters are more accurate for the larger sample size.Of course, it costs more to obtain larger size of samples.We compare our proposed TME method with two benchmark methods.The first is the tensor normal model with a structured mean (Nzabanita et al. 2015), which corresponds to the model that only considers fixed effects.We name it as the Tensor Fixed Effects (TFE) model and it is shown in Equation ( 18).
where the distribution of the residual errors tensor   is  ,, (;  ε ,  ε ,  ε ), and the noise covariance matrices along different dimensions are  ε ∈ ℝ × ,  ε ∈ ℝ × ,  ε ∈ ℝ × .The second benchmark method is the Tucker decomposition (TD) for the average tensor response.In this method, we do not consider that the residual errors follow the tensor normal distribution.
In the simulation, we used the tensor toolbox from the Sandia National Laboratories (Bader, et al. 2015) when writing codes for the TME, the TFE and the TD.In order to evaluate the performance of the proposed TME model and benchmark methods, we use the mean square error for each sample denoted as MSE  = ‖  −  ̂‖  2 / with  = 1,•••, .The mean and standard deviation for MSE are calculated for different sample sizes and presented in Table 3.The time in Table 3 denotes the total running time of the corresponding model.The results of mean and standard deviation for MSE and computational time in different sample sizes are shown in Table 3.For the general pattern, as the sample size increases, the mean of MSE tends to become smaller for all those methods.This is because the quantitative estimation accuracy is low when the sample size is small.When the sample size is 50, 80, or 100, the MSE of the proposed TME model is larger than that of the TFE model and the TD model.The reason is that the sample size is lower than the number of unknown parameters needed to be estimated, which is 152 in this simulation example.Therefore the parameter estimation is not accurate.It indicates that if the sample size is low, the error from parameter estimation will significantly reduce the effectiveness of the model that considers random effects.Hence, it is better to only consider the fixed effects.When the sample size is larger than 200, the proposed TME model outperforms the TFE model and the TD model with respect to MSE.This is especially true when the sample size is comparable or larger than the total dimensions ( •  •  in this example).The reason is that the TME model considers not only the fixed effects, but also the random effects.Additionally, the computational time of the TME model, the TFE model, and the TD model are comparable.
We also tried to conduct the vectorized Linear Mixed Effects (vLME) model, which conducts the typical linear mixed effects model (Galecki and Burzykowski 2013) after vectorization of the tensor responses.We used the fitlmematrix function in Matlab to conduct the vLME model.In this simulation, we transform  = { 1 ,  2 , … ,   } into a matrix, and get basis matrices from the vectorized parameters in {  (1) ,   (2) ,   (3) ,   (1) ,   (2) ,   (3) } .However, vectorization destroys the tensor structure and results that the corresponding basis matrices are not of full rank.Therefore the vLME model failed in this situation.If we use different basis design in the TME and vLME models, the comparison will be unfair.Therefore, the TME model and vLME cannot be compared in the simulation.
It is worth mentioning that the small MSE is to show the TME model can realize accurate parameter estimation when there exist fixed effects and random effects in the multi-dimensional arrays.The main reason that we recommend the TME model is not that it can realize smaller MSE, but it can (i) separate fixed effects and random effects in a tensor domain; (ii) explore the correlations along different dimensions.If we know that there are not random effects in the tensor datasets according to domain knowledge, the TFE model and the TD model are recommended.

Surrogated Data Analysis of Raman Mapping
In this section, the performance of the TME model is evaluated through the surrogated Raman mapping data from a real CNTs buckypaper fabrication process.The setup of in-line Raman spectroscopy is shown in Fig. 5.In the experimental setup, Near Infra-Red (NIR) laser with a wavelength of 785nm and a laser output power of 150mW were used to eliminate the effect of ambient light.A low magnification lens was used to achieve a larger focus tolerance.
Raman mapping data have been collected from multiple rectangular zones, and the Raman data from each zone corresponds to one tensor.One Raman mapping tensor is shown in Fig. 2.
Red dots represent measurement points, and there is a Raman spectrum in each measurement point.
The mean response tensor is computed, and Tucker decomposition is conducted to obtain the design matrices.The dimension of the response tensor is 256×5×5.The dimensions of core tensor of fixed effects and random effects are 8×3×3 and 4×2×2, respectively.The covariance matrices of random effects are generated by weighted summation of diagonal matrices with random values and identity matrices.Two covariance matrices of residual errors are generated by the identity matrices with dimension 5×5 and another covariance matrix with dimension 256×256 that is diagonal with a given signal-dependent noise from experimental data.
After generating the surrogated Raman mapping data, the proposed TME is applied to extract different components including fixed effects, random effects, and signal-dependent noise.

Real Case Study
In this section, we show a real case study of applying the proposed TME model.The setup of in-line Raman spectroscopy is shown in  • For the random effects covariance along the horizontal direction, we observe that the coefficient  r (1,2) tends to become negative after alignment and as the degree of alignment increases, the absolute magnitude becomes larger.The covariance coefficient  r (1,2) changes from 0.0097 to -0.1267.It indicates that negative correlation along the alignment direction occurs, and the covariance coefficient changes with the alignment of CNTs buckypaper.This can be explained by the conservation of mass in a local zone.
Alignment introduces systematic ridges and valleys in the microstructure pattern, and a ridge will be close to a valley, that indicates the negative correlation in the height.The height will impact the measurement distance between the laser head in Raman spectroscopy and the Raman mapping.The covariance coefficient in  r (1,3) becomes negative after alignment, but the absolute magnitude becomes closer to zero.One physical interpretation is that after alignment, the distance between the first measurement line and the third measurement line becomes larger, and their correlation relationship becomes weaker.
• For the random effects covariance along the vertical direction, coefficients  r (1,2) becomes closer to zero, but the absolute magnitude of  r (1,3) becomes larger.The physical interpretation is that as the stretch ratio increases, the high-frequency surface roughness becomes smaller, while the low-frequency surface roughness becomes larger.
• For the range of diagonal entries in covariance matrices of Raman mapping, the change of  r ,  r and  ε for CNTs buckypaper with different degrees of alignment are quite random.
However,  ε has a larger quantitative difference between the maximum entry and the minimum one after alignment.Without alignment, the diagonal coefficients range is 0.0464, while after alignment with a stretch ratio of 60%, the range becomes as large as 0.3121.
Which means that for different measurement lines along the alignment direction, the variability becomes larger.This makes sense because the alignment creates systematic ridges and valleys along the alignment directions.We can use this index to quantify the degree of alignment.
In summary, based on the covariance matrices from the TME model, we can quantify the influence of alignment based on the range of diagonal entries in  ε and covariance coefficients  r (1,2) .
The quantitative changes after alignment can be interpreted by engineering knowledge.We want to point out that this case study is chosen to illustrate the performance of our approach.Other approaches may also work well for quantifying the degree of alignment.In addition, our TME approach can be extendable to other applications.

Summary
In this paper, we proposed a novel TME model that effectively and efficiently explores the

Finding
the estimation of the average component, fixed effects core tensor  ̂, is straightforward given the positive definite covariance matrices.Hereafter, we focus on the exploration of the existence of MLE for the total covariance matrices  ̂ ,  ̂ ,  ̂ , shown in Equations (8-13).A necessary condition for the existence of the MLE can be derived based on the paper (Manceur and Dutilleul 2013), which is demonstrated in Proposition 4. Proposition 4. If maximum likelihood estimators for the covariance matrices   ,   ,   in the TME model (1) exist, the sample size  of the response tensors   ( = 1,•••, ) satisfies the condition summary of Propositions 4 and 5, if  < max ( the MLEs of covariance matrices do not exist.However, if  ≥ , the MLEs exist with probability 1.Moreover, the dimensions of tensor samples are usually large, and  ≥  is hard to guarantee in practice.When the covariance matrices satisfy special structures, such as diagonal structures, the sufficient condition of existence will be changed.We have Proposition 6 to illustrate the existence conditions with the additional assumptions of diagonal matrices.Proposition 6.The response tensors   ( = 1,•••, ) satisfy the model (1) with the additional assumption that   is diagonal.If  ≥ max (, max ( the maximum likelihood estimators for the covariance matrices   ,   ,   in the TME model exist with probability 1. For convergence, we test several convergence indices that are the divided  1 norm of the difference between covariance matrices in two successive iterationsfor the second loop.We can see the convergence indices versus iterative histories in Fig.3.We can find that the convergence history is monotonic and fast.(a) Convergence of the first loop (b) Convergence of the second loop Fig. 3. Convergence of the iterative algorithm When we do the parameter estimation for the TME model, we will get the design matrices   (1) ,   (2) ,   (3) first by a Tucker decomposition for the mean response tensor.The design matrices .After convergence, we compare the estimated parameters with a sample size of 600 and the ones in the simulation model (underlying true parameters).The parameters that we consider include core tensor of fixed effects, total covariance matrices from the first loop and covariance matrices of residual errors from the second loop.The results are shown in Fig. 4. We can see the estimations are quite consistent with the simulated parameters (underlying true parameters).
(a) Core tensor of fixed effects (b) Covariance matrices  ̂ and  ̂ (c) Covariance matrices  ̂ and  ̂ (d) Covariance matrices  ̂ and  ̂ Fig. 4. Comparison between estimated parameters and simulated parameters In order to quantitatively evaluate the estimation accuracy, we introduce indices, including   ,    ,    ,    for the first loop, and    ,    ,    for the second loop.Where   = ‖ ̂− ‖  /‖‖  , and  ̂−  denote the difference between estimated and true matrix/tensor.Moreover,  = {,   ,   ,   ,   ,   ,   }, and ‖•‖  denotes Frobenius norm.Furthermore, we introduce three indices for showing the convergence speed of different sample sizes.These indices are the iteration number, the time per iteration in the first loop, and the time per iteration in the second loop.

Fig. 5 .
Fig. 5. Renishaw™ inVia micro-Raman system with custom-designed remote optical probe and roller sample stage

Fig. 5 .
Similar to the surrogated data analysis, NIR laser with a wavelength of 785nm and a laser output power of 150mW were used to eliminate the effect of ambient light.The multi-walled CNTs buckypaper before alignment and after alignment are measured by Raman mapping technique.The scanning electron microscope (SEM) pictures for CNTs buckypaper are shown in Fig.6.Alignment was conducted by stretching with different stretch ratio, including 0%, 20%, 35%, and 60%.When stretch ratio equals to 0%, it is referring to the CNTs buckypaper without alignment.The matrices  r and  ε are associated with the correlation along the horizontal direction, while the matrices  r and  ε are associated with the correlation along the vertical direction.

Fig. 6 .
Fig. 6.SEM pictures of CNTs buckypaper with different stretch ratios fixed effects and random effects inherent to the data in tensor domain.The advantages of this model include (i) its capability to handle multilevel hierarchical data; (ii) its ability to take complexed association structures, including correlation along different dimensions, into consideration; (iii) analyzing the mixed effects for the high-dimensional datasets.The proposed TME model can be viewed as a logical extension from a vector/matrix-valued mixed effects model to an array-valued mixed effects model.The proposed TME model is applied in the nanomanufacturing inspection.Moreover, the TME model can be applied to provide potential solutions for a family of tensor data analytics with mixed effects, such as problems in the research fields of multimodality imaging analysis, chemometrics, neuroimaging, multichannel signal processing, etc.For the TME model, the distribution of response tensors and its -mode matricization were explored.We also derived the log-likelihood function for the TME model.Maximum likelihood estimators for fixed effect core tensor and covariance matrices were derived.Existence of the MLE and identifiability of the TME model were illustrated.Moreover, an iterative double Flip-Flop algorithm has been developed for parameter estimation, and the initialization and convergence criteria have been discussed.The computational complexity of the Flip-Flop algorithm has been derived.The TME model was shown to outperform vectorized LME model from a computational complexity perspective.By simulation and surrogated data analysis, we found that the algorithm can realize very quick convergence.The iteration number becomes smaller and time per iteration becomes longer as the sample size increases.In addition, the asymptotic property was investigated in the simulation and surrogate data analysis.The estimation accuracy of total covariance matrices and covariance matrices for the error terms improve as the sample size increases.In the simulation study, we also show that the TME model outperforms two benchmark methods which do not 2 × 2 are positive definite.We know, from the properties of tensor normal distribution, vec(  ) is distributed as a multivariate normal distribution with mean vec() and covariance matrix  r ⨂ r ⨂ r .We write that  ∽   2 , 2 , 2 (;  r ,  r ,  r ) if vec(  ) ∽   2  2  2 (vec(),  r ⨂ r ⨂ r ).Similarly, the distribution of the residual errors tensor   is  ,, (;  ε ,  ε ,  ε ), and the noise covariance matrices along different dimensions are  ε ∈ ℝ × ,  ε ∈ ℝ × ,  ε ∈ ℝ × .
(Roś et al. 2016)ndition shown in Proposition 4 is necessary for the existence of the MLE, it is not sufficient because it cannot ensure that all the iterations of the algorithm have full rank matrices.Similar to the existence of the MLE for the model with Kronecker product covariance structure(Roś et al. 2016), it could happen that covariance matrices in the updated iterations do not have a full rank with the likelihood of the TME model converging to the supremum.The reason is that the space {  ⨂  ⨂  :   ∈ ℝ × ,   ∈ ℝ × ,   ∈ ℝ × ;   ,   ,   are positive definite} with any norm is not closed.If we choose a stronger condition, for a space  (equipped with the Frobenius norm) of positive definite  ×  matrices that have a kronecker structure such that   ⨂  ⨂  ∈ , where   ,   ,   are also positive definite, Then  is closed, according to the natural extension of(Roś et al. 2016).Based on this conclusion, we can formulate the sufficient condition for the existence of the MLE for the total covariance matrices, as shown in Proposition 5.

Table 2 .
Quantitative results for convergence speed and accuracy for different sample size

Table . 3
. Comparison of mean square error in the TME model and benchmark methods f1 ,    ,    ,    ,    ,    ,    that were defined in Section 5 have been used to evaluate the results under different sample sizes.The results are shown in Table4.

Table . 4
. Quantitative results for convergence speed and accuracy for surrogated Raman In comparison to Table2, the results in Table4show similar asymptotic patterns, which means the estimated parameters become more accurate as the sample size increases.For example, when the sample size is 600, the indices   ,    ,    ,    are as low as 0.0012, 0.0702, 0.2107 and 0.2206, respectively, which indicates the accuracy of the parameter estimation.However, the iteration number and time per iteration become larger for the same sample size due to the dimension increases from 30 to 256.Specifically, the computation time for 600 samples are 90.73 seconds per first loop and 42.32 per second loop.