Estimation and selection in linear mixed models with missing data under compound symmetric structure

It is quite appealing to extend existing theories in classical linear models to correlated responses where linear mixed-effects models are utilized and the dependency in the data is modeled by random effects. In the mixed modeling framework, missing values occur naturally due to dropouts or non-responses, which is frequently encountered when dealing with real data. Motivated by such problems, we aim to investigate the estimation and model selection performance in linear mixed models when missing data are present. Inspired by the property of the indicator function for missingness and its relation to missing rates, we propose an approach that records missingness in an indicator-based matrix and derive the likelihood-based estimators for all parameters involved in the linear mixed-effects models. Based on the proposed method for estimation, we explore the relationship between estimation and selection behavior over missing rates. Simulations and a real data application are conducted for illustrating the effectiveness of the proposed method in selecting the most appropriate model and in estimating parameters.


Introduction
Fisher [4] extended classical linear models through adding random effects to model the correlated structure in data, and the resulting model is referred to as the linear mixed model (LMM).
The long-standing problem in introducing random effects to classical linear models is in estimating parameters in the covariance structure for depicting the variability by additional random effects. Analogous to classical linear models, parameter estimation in linear mixed models are also categorized to maximum likelihood-based (ML) and restricted maximum likelihood (REML) approaches. The former is obtained by maximizing the likelihood of the full data; whereas, the latter is developed by maximizing the likelihood of residuals [9].
Both ML and REML estimators are especially popular for their purposes. Miller [7,8] developed asymptotic properties of ML estimators for variance components where consistency and asymptotic normality were shown, and Harville [5] discussed the attractive features of the maximum likelihood approach for the estimation problem in variance components. It should be noted that the above methods rely on iterative procedures to obtain final estimates. One of the most commonly used procedures is the Expectation-Maximization (EM) algorithm [3], and in fact, the lme function in the lme4 library in R employs the approach by Laird and Ware [6] that incorporates the EM algorithm for estimation.
To develop an estimation method under the circumstances where missing values are present, we propose and derive the estimators that incorporate missing rates with the indicator matrix and matrix theory in the linear mixed models with compound symmetric covariance structure. We then inspect the performance of the proposed approach in selection and estimation over different missing rates. We further investigate the performance of the proposed approach with missing data from various mechanisms, such as missing completely at random (MCAR) and missing not at random (MNAR) [10], and we compare the results with the maximum likelihood estimator (MLE).
In addition to the proposed method in estimation involving missing rates, we conduct model selection in the linear mixed models, which can be fulfilled by testing the equality of means. Because of its consistency and discreteness, we utilize the commonly used selection criterion, Schwarz Information Criterion (SIC), in Schwarz [11]. With the SIC, the posterior probability for each model is computed, and the model with the maximized posterior probability out of all candidate models is selected. The performance of model selection is therefore investigated given the estimators are produced from the proposed method in estimation.
The rest of the paper is organized as follows. In Section 2, we discuss the LMMs with a compound symmetric covariance structure, propose and derive the likelihood estimator with incorporation of missing indicator matrix and study the convergence speed of the proposed estimators. In Section 3, the simulations on selection and estimation performance with various sample sizes and missing rates are conducted for demonstrating the effectiveness of the proposed method. In Section 4, a real data application is also conducted to illustrate the effectiveness of the proposed method in estimation and model selection with missing values. Section 5 concludes and provides discussion.

The linear mixed model with missing data
We consider a class of linear mixed effects models with the presence of missing responses and code every missing response with 0. The model can be written as where y i is a t-vector of observed responses for subject i, i = diag{δ i1 , . . . , δ it }, where δ ij denotes the missingness, with δ ij = 1 when y ij is observed and 0 when y ij is missing, X i is a known t × p covariates matrix, Z i is a known t × q matrix usually a subset of X i , and b i and i are unobserved independent q and t-vectors of normally distributed random variables with mean zero and respective variance-covariance matrices D and R i . The marginal distribution of y i is N( i X i β, i ), where i = i (Z i DZ T i + R i ) i , which yields the log-likelihood function as where μ i = i X i β.
It should be noted that the variance-covariance component i is no longer positive definite, so it is not invertible. However, as shown in the Appendix, we can establish the approximations of the determinant and inverse matrix. With these approximations, we derive the likelihood and estimators of the parameters in the LMM.

Compound symmetric covariance structure
Let y ij denote the response of subject i at time j, where i = 1, . . . , m and j = 1, . . . , t, and consider a compound symmetric covariance structure for which the correlation is equal for all time gaps between any pair of observations. We consider the presence of missing responses and code every missing response with 0. The model is written as where X i are p-dimensional covariates, β denotes fixed effects in R p where β 1 , . . . , β t are fixed effects, respectively, at time 1, . . . , t, and the corresponding design matrix X it is a tdimensional identity matrix, δ ij denote the missingness with δ ij = 1 when y ij is observed and 0 otherwise, α i is the random effect corresponding to subject i, and ij denote the pure errors, for i = 1, . . . , m, j = 1, . . . , t. With a compound symmetry structure, it is assumed that α i ∼ N(0, σ 2 τ ), ij ∼ N(0, σ 2 ), and α i and ij are independent for all i = 1, . . . , m, j = 1, . . . , t. Define . Model (1) can be vectorized as where . . , m. We will derive the maximum likelihood estimators for all parameters in the following section.

The maximum likelihood estimators with missing data
Denoting the approximation of i as * i and by Remarks 3 and 5 in the Appendix, we have From the marginal distribution for the model with one random effect and the approximations of the variance-covariance component, the log-likelihood of the model takes the form of Differentiating the log-likelihood with respect to β and setting the result equal to zero yields the maximum likelihood estimator: To find the maximum likelihood estimators of σ 2 and σ 2 τ , we first obtain the partial derivatives of ( * i ) −1 with respect to σ 2 and σ 2 τ , as shown in the following equations: For the brevity of notation, we define The maximum likelihood estimator of σ 2 τ is derived as shown in the following equation: We can recall that p i = 1 T t i 1 t , and let wt i = σ 2 + σ 2 τ p i , so the likelihood estimator of σ 2 τ takes a form ofσ Similarly, the maximum likelihood estimator of σ 2 is derived and takes the following form: The detailed likelihood estimators derivations are presented in Section A.2 of the Appendix. Since they are not of closed forms, an iterative procedure is required to obtain the final solutions. We utilize the fixed-point iterative approach in the following procedure: (1) Choose initial values for β, σ 2 , σ τ .
(2) At iteration k, we update the parameter estimates with these three equations: (3) Repeat step (2) until convergence.
The convergence properties are evaluated by the simulations in Section 2.5.

Theoretical properties
To consistently denote the variance-covariance components with the approximations of the determinant and the inverse of the variance-covariance components in use to obtain the proposed estimators, we rewrite the model as with the corresponding determinant and inverse as in Remarks 3 and 5 in the Appendix. We show thatβ,σ 2 τ , andσ 2 in Equations (3), (4), and (5) are unbiased estimators with equations (A4), (A5), and (A6) in Section A.3 of the Appendix.

Simulation studies of convergence in estimation
We evaluate the successive fixed-point iteration technique in calculating the proposed estimators by simulated data, and we construct the convergence plots for the estimates of β, σ 2 and σ 2 τ , with data under the missing rates of 10%, 50%, and 80%. The samples are generated from a compound symmetric mixed-effects model for 100 individuals (subjects) with three repeated measurements. The fixed effects associated with time are chosen to be β t i , and the coefficients for each known covariate are set to be β T , where β T t i = (21, 11, 7) and β T = (−20, 13, 27, 0, 0) yielding the model where data truly consists of x 1 , x 2 , x 3 covariates with the known covariates x 1 , x 2 , x 3 , x 4 , and x 5 generated uniformly. The variance components are chosen to be σ 2 = 1 and σ 2 τ = 4. We study the convergence behaviors under two scenarios: (1) with the model being correctly specified and (2) with the model being incorrectly specified, i.e. fitting data with redundant covariate terms. All initial values for fixed effects parameters are chosen to be zero and initial values for variances are set to be one.
described at the end of Section 2.3. The iteration processes are recorded and displayed in Figure 1. The top row shows the convergence behavior of the fixed effects βs in the model, and all of them converge within five iterations for the missing rates of 10% and 50%. For the data that have 80% of missing values, the process takes about 10 iterations to converge.
Both variance estimates σ 2 and σ 2 τ converge within five iterations as well with the missing rates of 10% and 50%, as shown in middle and bottom rows of Figure 1. Similarly, both estimates take about 25 iterations to converge when 80% of data are not observed.
In Scenario (2), the analogous analyzes are carried out with the specified model, including a redundant variable x 4 and x 5 , and the results are shown in Figure 2. All estimates converge within the first five iterations. The process takes about two more iterations to converge when the missing rate increases to 80%.
The aforementioned simulation results on convergence demonstrate that generally the iterative process of estimating the parameters using the proposed method only takes a few iterations to converge even with a high missing rate (80%).

Objectives and generic settings
There are four objectives in this section for the simulations to examine (1) the selection performance under a series of data generated with various missing rates based on the proposed estimators, (2) the selection performance with missing data from different missing mechanisms based on the proposed estimators, (3) the accuracy, precision, and coverage Convergence of (top to bottom) β, σ 2 , and σ 2 τ with (left to right) 10%, 50%, and 80% missing values for Scenario (2). The beta indices 1-6 denote the estimates for β t 1 , β t 2 , β t 3 , β 1 , β 2 , and β 3 . probability of the estimators from different missing mechanisms, and (4) the comparison of the proposed approach with MLEs in estimation.
All data are generated from a compound symmetric normal mixed-effects model with three repeated measures in each subject. The number of subjects in the data varies and is specified in each of the simulations in this section. The mean parameter associated with time j is denoted as β t j , and the mean parameter associated with the covariate i is β i . The between-subject and within-subject variances are denoted, the same as before, by σ 2 τ and σ 2 , respectively.

Selection performance with missing rates
It is well known that the SIC, in Schwarz [11], is consistent in model selection and suitable for discrete models. Some variants of the SIC or alternate methods have been developed for improving the selection performance. The SIC f in Shang [12] is known as a variant criterion of the SIC in linear mixed models. In the case of considering different prior model probabilities, the SIC f is favored compared to the SIC. Delattre et al. [2] derived an alternative to SIC by substituting the sample size in the penalty term with two components: one referring to the number of estimated fixed effects and the other associating with the number of estimated variance components in random effects. Similar to the method in Shang and Cavanaugh [13], the posterior probability is a measure for selecting the most appropriate model.
Our main goal is to examine the effectiveness of the proposed method in estimation in Section 2 with the presence of missing data. For succinctness, we utilize the most commonly used selection criterion, SIC, to conduct the model selection in the mixed models, Since the fixed-point iterative method yields stable estimates, we move forward to illustrate the model selection performance with the implementation of the proposed method. We evaluate the performance of the proposed method with a series of data generated under a sequence of missing rates ranging from 0% to 80%, and we use the SIC as the assessment of the selection.
Data are generated from a compound symmetric normal mixed effects model with three repeated measures for each subject, and subjects of 30, of 50, and of 100 are considered. The mean parameters are chosen to be β T t j = (21, 11, 7) and β T = (−20, 13, 27, 0, 0) yielding a true model consisting of x 1 , x 2 , and x 3 covariates. We carry out studies with (1) small variances σ 2 = 1, σ 2 τ = 4 and (2) large variances σ 2 = 16, σ 2 τ = 100. These parameters are randomly specified, and different βs and both small and large variance component parameters are considered. This is the approach to selecting the true parameters through all simulations in this section. Among all 2 5 possible models, we consider the following 10 models for illustrative purposes (Table 1): We compare the selection performance based on the SIC values in a sequence of missing proportions from 0% to 80%. In each iteration, the SIC will be computed using the proposed estimators and the model that yields the lowest SIC is the most appropriate. The accuracy rate is calculated as the ratio of the correct model yielding the lowest SIC to the total number of replicates. For each setting, the process is repeated 200 times, resulting in the model selection rate of the true model as the percentage of model 3 yielding the lowest SIC . Figure 3 summarizes the correctly selected rates of the true model against the missing rates of 0%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, and 80% on the x-axis with sample sizes of 30, 50, and 100. As the sample size increases, the selection rates of the true model increase slightly regardless of the percent of missing values presented in the data. With a sample size of 100, it is observable that the selected rates of the true model are higher than 90% until the missing proportion rises to 40%. It is expected that the selected rates of the true model drop drastically when the majority of data are not observed. With a 60% missing rate in a sample of 100, only 40 observations are considered, and the SIC favors the model with more covariates such as model 4, model 5 or model 9. Figure 4 demonstrates the selection performance with larger variances, σ 2 = 16 and σ 2 τ = 100. The effectiveness of the proposed estimators on data with larger variability for the selected rates of the true model is ideal until the missing rate increases to 40%, just as in the previous scenario. From these two cases, we summarize that the proposed method can be utilized to return a reliable model selection result in choosing a set of variables with a moderate amount of missing data.
In addition to choosing variables, the SIC can also serve as an assessment for hypothesis type questions, such as testing Here, we conduct a study to evaluate the performance in choosing these types of models by the SIC. We consider the same model specification for data with the mean parameters of β T t j = (5, 5, 5) and β T = (−20, 13, 27) and the variances σ 2 = 1 and σ 2 τ = 4. For illustrative purposes, we consider the three candidate models shown in Table 2. With the setup of the parameters, model 3 is expected to yield the lowest SIC.  Figure 5 shows the selection result with a missing proportion less than 20% is optimal. A further investigation finds that the difference in the SIC between the three models is only within 0.01 to 1 when the missing proportion reaches 40%, though not shown here. One resolution for observing the results is to calculate the posterior probabilities for each candidate model by the SIC values, and this implementation is advised in real data analysis not in the simulations since it would be redundant to output all 200 sets of posterior probabilities.

Selection performance under MNAR
With an indicator function considered as a random variable, the core of the parameter estimation and model inferences rely on the assumption of the missing mechanism. In the derivation of the proposed estimators, we view the missing indicator as a part of the design matrix, so the assumption of the missing mechanism has no influence on the proposed estimators. Nevertheless, this paper is dedicated to evaluating the selection performance under as many scenarios as possible. In this subsection, we investigate the selection performance with the missing responses generated under missing not at random (MNAR).
The samples are generated in the same fashion as in the previous section with the mean parameters as β T t j = (5, 5, 5) and β T = (−20, 13, 27) and two sets of variances (1) σ 2 = 1 and σ 2 τ = 9 and (2) σ 2 = 16 and σ 2 τ = 100. As mentioned previously in Section 3.2, these parameters are randomly specified, and different βs and both small and large variance component parameters are considered. Again, this principle is the approach to selecting true parameters through all simulations in this section. Missingness is generated from a Bernoulli distribution with the specified missing proportion as the success p, and the probability of missing remains the same if the proceeding observation is observed and shifts to 0.25 otherwise. The resulting missingness follows the MNAR mechanism, where the missingness depends on unobserved data.   Figure 6 summarizes the selected rates of the true model against the missing rates of 0%, 5%, 10%, 20%, 50%, 60%, 70%, and 80% on 200 replicates, and the data are generated with small variances. Each trajectory represents the selected rates with the specified sample size. As the sample size increases, the selection rate increases regardless of the percent of missing values present in the data. Compared with Figure 3 where data are generated under MCAR, the selection rates seem better, and it is in fact due to the missingness generating process where some observations have only 25% chance of missing. Figure 7 shows the competitive performance under large variances scenario versus the missing rates compared with the results shown in Figure 6 with small variances, indicating that the proposed method behaves well in selecting the true model with large variances.

Estimation under MCAR and MNAR
Following the same data generating process to obtain samples with 100 observations, we investigate the accuracy and precision of the estimates through 1000 replicates under the scenarios of σ 2 = 1, σ 2 τ = 9 and σ 2 = 4, σ 2 τ = 100, each with missingness from MCAR and MNAR.
In the case of small variances, the results are presented in Table 3. All fixed effect estimates are close to their actual values, and the standard error of estimates increases as the missing rate increases, a direct consequence from the loss of a large portion of data. Under MCAR, the estimates of both σ 2 and σ 2 τ have a downward bias, but the magnitude of the bias is very small. With 80% of missingness, the estimates of σ 2 and σ 2 τ are off about 2% to 3% from their respective true values.
Under MNAR, the resulting estimates do not exhibit a systematic downward bias as the missing proportion increases. Overall, all estimates are very accurate with decreasing precision as the missing rate escalates.
The estimates on data with more substantial variances, σ 2 = 4 and σ 2 τ = 100, are shown in Table 4, the results have a similar pattern as those in the small variances.
All simulations results in Tables 3 and 4 demonstrate that the proposed estimators generate satisfactory results in estimation regardless of the missing mechanism with the presence of moderate missingness in responses.

Comparison with MLE
In this subsection, we compare the proposed estimators with the maximum likelihood estimates on data with 0%, 20%, 50%, and 80% missing rates. Missing data are generated under missing completely at random (MCAR) and missing not at random (MNAR). Each scenario is carried out with (1): σ 2 = 1 and σ 2 τ = 9, and (2): σ 2 = 16 and σ 2 τ = 100. Both cases are iterated 1000 times to obtain the final estimates and standard errors. The  (2) the former utilizes matrix theory to obtain the approximations of the determinant and the inversion of the variance-covariance components.
To find the coverage probabilities for the confidence interval (95%) estimation, the estimation process is iterated 500 times where each iteration contains 1000 data sets generated from each simulation setting: (1-a) MCAR with small variances, (1-b) MNAR with small variances, (2-a) MCAR with large variances, and (2-b) MNAR with large variances. The coverage probability is calculated over the 500 empirical 95% confidence intervals in each simulation setting. (1-a): MCAR with σ 2 = 1 and σ 2 τ = 9 Table 5 exhibits the mean estimate and the standard error on 1000 replicates. All fixed effects, β t 1 , β t 2 , β t 3 , β 1 , β 2 , and β 3 are almost identical in both the proposed estimators and MLE. A more noticeable increase in the standard errors of the estimates is present in all fixed effects as the missing proportion increases. The proposed method yields a more accurate σ 2 and σ 2 τ estimates under all considered missing rates, and the differences become more noticeable with moderate and excessive missingness.

Scenario
To better investigate the biases, we calculate the following reference value called relative bias defined as where θ k denotes the estimate of the kth sample, θ is the true value, and N is the number of replications in the study. In our simulation settings, N = 1000. Table 6 features the relative  biases, and the values in the table reassure our findings that the proposed method yields notably better estimates of σ 2 and σ 2 τ . Table 7 features the coverage probabilities in Scenario (1-a) for 95% confidence intervals. For the parameters of fixed effects, the values of coverage probabilities show that as the missing rate increases, the proposed method performs better than the MLE method, closer to the true confidence level, indicating that the proposed method is a good choice when the data have missing values in the LMM. Here, we comment on the coverage probabilities for the variance components σ 2 and σ 2 τ . The coverage probabilities for σ 2 in the proposed method are significantly better than those for MLE due to the bias of MLE estimator of σ 2 . As observed in the estimates of σ 2 in Table 5, the MLE estimate is notably off the true value. For σ 2 τ , the coverage probabilities show that the proposed method outperforms the MLE method when the missing rate is small or medium, but not when the missing rate is as high as 80%.  (1-b): MNAR with σ 2 = 1 and σ 2 τ = 9 We conduct a similar study with the scenario that the missing values are generated under MNAR where the succeeding response has 25% chance of missing if the previous observation is missing. Therefore, the missingness depends on the response itself and thus follows the MNAR mechanism. All parameters and sample sizes remain the same as in the previous scenario. Table 8 displays the mean estimates and standard errors of 1000 replicates for all parameters, and the estimates of the proposed method in the table demonstrate that they are very close to and closer to the true parameters compared with those under MLE. Table 9 shows the corresponding relative biases, and these relative biases are very close to zero. Note that the unbiased property of the proposed estimators is proved in Section A.3 of the Appendix. The observed values in Tables 8 and 9 confirm the two aspects of the proposed estimators: (1) the fixed effects associated with the covariates β 1 , β 2 , and β 3 estimates obtained from the proposed estimators are unbiased regardless of the missing proportions, and (2) the proposed estimates are less biased compared with the MLE method.

Scenario
Again, the coverage probabilities for the interval estimation are calculated using 500 replicates where each replicate contains 1000 generated data. Table 10 shows that for the fixed effect parameters, the proposed method produces the coverage probabilities closer to the true confidence level as the missing rate increases, which implies the proposed method performs well with various amount of missing values. As mentioned previously, the coverage probabilities for σ 2 in the proposed method are significantly better than those for MLE due to the bias of MLE estimator of σ 2 and the bias is shown in the point estimates of σ 2 in Table 8. For σ 2 τ , the coverage probabilities show that the proposed method outperforms the MLE method when the missing rate is small or medium, but not when the missing rate is very high.

Scenario (2-a): MCAR with σ 2 = 16 and σ 2 τ = 100
The mean estimates, standard errors, and relative biases for the MCAR case are shown in Tables 11 and 12. The performance from both approaches in estimating the fixed effects, β t 1 , β t 2 , β t 3 , β 1 , β 2 , and β 3 , are almost identical. Again, a noticeable increase in the standard errors of the estimates is observed in all fixed effects as the missing rate increases. For the variance components, the proposed method produces much more accurate estimate values under all considered missing rates than the MLE method.    Table 13 features the coverage probabilities for Scenario (2-a) MCAR with large variances, and the values show that both the proposed method and MLE produce competitive results in estimating the parameters of fixed effects. Again, as in Scenarios (1-a) and (1-b),  the coverage probabilities for σ 2 in the proposed method are apparently better than those for the MLE method due to the bias of MLE estimator of σ 2 . For σ 2 τ , when the missing rate is 80%, the coverage probability of proposed is not optimal. Generally, the coverage rates for σ 2 and σ 2 τ demonstrate that the proposed method outperforms the MLE in estimation. Therefore, we can conclude that the proposed method generally behaves efficaciously with the missing values present, but may not perform well with a high missing rate. (2-b): MNAR with σ 2 = 16 and σ 2 τ = 100 Following the same setups for generating samples, the evaluations of data with high noise and MNAR are in Tables 14 and 15. The performance from both approaches in estimating fixed effects, β t 1 , β t 2 , β t 3 , β 1 , β 2 , and β 3 , are again almost identical. It can be observed that as the missing proportion increases, the standard errors of the estimates in all fixed effects increase. For the variance components, the proposed method overall produces much more accurate values than MLE. For the σ 2 τ estimates, the proposed method has less precision  compared with the MLE method with a high missing rate of 80%. For the σ 2 estimates, the MLE method owns notably less accuracy because of the bias of MLE estimator of σ 2 . Table 16 features the coverage probabilities of confidence interval estimation for Scenario (2-b) MNAR with large variances. The values in the table show that the proposed method produces competitive coverage probabilities under almost all four missing rates for the parameters of fixed effects. As in the last three scenarios, based on the empirical confidence intervals, the coverage probabilities for σ 2 in the proposed method are significantly better than those for the MLE method due to the bias of MLE estimator of σ 2 and the bias is also shown in Table 14. This bias also implies the reason that the REML is more effective than the MLE for σ 2 in the LMM. For σ 2 τ , the coverage probabilities show that the proposed method outperforms the MLE method when the missing rate is small or medium, but not when the missing rate is very high. We can conclude that the proposed method in general performs effectively when involving missing values even though it may not be very optimal with a high missing rate. In the above four scenarios, we note that for the MLE method, the between-subject variability (σ 2 τ ) is mostly caught in the estimation, but not for the within-subject variability (σ 2 ), which implies the MLE method provides a good estimate of σ 2 τ and a poor estimate of σ 2 . Practically, a high missing rate reduces the between-subject variability, so the estimate of σ 2 τ is off the true value, leading to the lower values of the coverage probability under 80% missing rate compared with small and medium missing rates. Therefore, when the missing rate is very high, the coverage rate for σ 2 τ is not well-behaved for the proposed method.

Summary of selection and estimation performance
The simulations above are conducted and the results show that: (1) the performance of the proposed method in model selection under various settings are of satisfaction in the sense that the selected rates of the correct model are above 90% with a moderate amount of missingness (missing rate of 40% or less), (2) the estimates by the proposed method are close to the true parameters in both the small and large variability data, (3) the comparison of the proposed method with MLE in point estimation shows that the proposed method produces notably better estimates of both σ 2 and σ 2 τ , and (4) the comparison of the proposed method with MLE in the coverage probability of confidence interval estimation shows that the proposed method performs effectively and outperforms MLE with missing values in the LMM.

An application
In this section, we apply the proposed approach to incomplete blood lead concentrations data. The data recorded the results from the treatment of lead-exposed children, a placebocontrolled and randomized study on Succimer, a treatment that enhances urinary excretion of lead, in children with blood levels of 20-44 μ g/ dl. The utilized data were from TLC Research Group [14], and the collection of the data was also described in it. As mentioned in TLC Research Group [14], the data were collected following the rules of the Centers for Disease Control and Prevention (CDC) as described in CDC [1]. We analyze the blood lead levels from 100 children who were randomly assigned to either the lead Chelation treatment with Succimer or to the placebo group. Every participant was observed four times; each at baseline (μ 1 ), week 1 (μ 2 ), week 4 (μ 3 ), and week 6 (μ 4 ). The incompleteness comes from unobserved blood lead levels. A subset of data is presented in Table 17.
In this case, the ability to model the correlated blood lead levels from one individual would be essential in examining the changes in blood lead levels over the four-time courses. For this reason, we accommodate the linear mixed-effects model to measure the magnitude of the variations from subjects to subjects along with the variations from within-subject.
One straightforward model we can employ is to assign a varying intercept for each subject, where the varying intercept is characterized by the subject variability, σ 2 τ . The primary objective in the analysis is to examine the changes in blood lead levels, so statistically, we test whether the blood lead levels at time t, μ t , for t = 1, . . . , 4, are significantly different. Figure 8 shows the blood lead levels at each of the four weeks from 50 subjects. Notice that each individual appears to have their linear trajectory with some fluctuations over time. The variation from one subject to the other subject (from one linear trajectory to the other) is often referred to as the between-subject variation. As the name suggests, the variation is subject-specific and treating the subject as a random effect in modeling considers the magnitude of the variation. The variation from the individual observations to the individual's trajectory is referred to as the within-subject variation. It represents the deviations among the observations within the specific individual.
The triangular-shaped points represent the average blood lead levels at each week based on the available data. For the children receiving the active treatment, the mean blood lead level at baseline is about 26 μg/ dl, and the mean concentration at week 1, one week after the TLC treatment was received, dropped to 13 μ g/ dl. The mean blood lead levels for the children in the placebo group stayed around 26 and 24 μ g/dl from week 0 to week 6. The preliminary results based on the graphical inspection of the data show that (1) the linear trajectories for group 0 should be different from group 1 and (2) the between-subject variations in both groups may vary since the individual lines are more vertically spread out in the treatment group than in the placebo group. To statistically formulate the problem, we denote the blood lead level from subject i at time t as y it and denote the mean blood lead level at time t as μ t . The model addresses the subject-specific variation, denoted as b i , referring as a subject-specific random effect in the linear mixed model scheme. The effect of groups is also considered and denoted as β with an indicator covariate X i having a value 1 if the subject was in the treatment group and a value of 0 otherwise. The explicit model specification is as follows: . . , 100, and t = 1, . . . , 4, with b i ∼ N(0, σ 2 τ ) and it iid ∼ N(0, σ 2 ). Note that σ 2 τ models the subject variability, and it models the within-subject variability. Since the data consist of some missing values, we employ the approach in Section 2.2, which yields likelihood estimators. Notice that the current model considers the within-subject variations for both individuals in the placebo group and individuals in the TLC treatment group are of the same magnitude.
We carry out the model selection by the implementation of the SIC as the model selection assessment to choose the most appropriate model. The model posterior probabilities are then calculated for each candidate model based on the SIC value. The model that yields the highest posterior probability will be the most appropriate model for the data. To establish the ultimate goal of the analysis, that is, to determine whether the mean blood lead levels have significant changes over time with consideration of both within and between-subject variations, the candidate models in consideration are decided by the equality between the effects on blood lead level at time t. Based on the graphical inspection, we consider 15 models shown in Table 18. For instance, model 'μ 1 = μ 2 = μ 3 = μ 4 ' indicates there are no significant changes in the blood lead levels from week 0 to week 6; whereas, model 'μ 1 , μ 2 , μ 3 , μ 4 ' represents that there are significant differences in the blood lead levels from week 0 to week 6. Model 'μ 1 = μ 2 , μ 3 , μ 4 ' represents the blood lead levels at week 0 and week 1 are not different but very different from the blood lead levels both at week 4 and at week 6, and differences in blood lead levels at week 4 and week 6 are also significant. Table 18 features the 15 model posterior probabilities, suggesting that the differences in mean blood levels from week 1 to week 4 are not significant but the differences both from week 0 to week 1 and from week 4 to week 6 are statistically significant. With model 3 being the most appropriate model yielding the highest model posterior probability, the parameter estimates of mean blood lead levels are (μ 1 ,μ 2 ,μ 3 ,μ 4 ) T = (26.74, 20.02, 20.02, 22.27), and the standard deviation estimates areσ τ = 5.44 andσ = 4.43. The estimated group factor is -0.2173 with a standard error of 0.9527, which does not exhibit a significant group effect. A follow-up investigation for fitting the model without the group factor is executed, and the SIC infers the model without a group factor in consideration yields a better fit. These results may also be a consequence of the negligence of the varying scales in betweensubject variations between the treatment and placebo groups. The updated model yields the estimated mean blood lead levels as (26.41, 19.70, 19.70, 21.95) and the estimated standard deviations asσ τ = 5.44 andσ = 4.43.
One resolution for considering a non-homogeneity within-subject variation between the treatment and placebo groups is to fit a separate mixed effects model for each group. Following the same notations for the blood lead levels, the subject-specific random effect, and the within-subject variation, the model in consideration has a form of y it = μ t + b i + it , for i = 1, . . . , 50, and t = 1, . . . , 4, with b i ∼ N(0, σ 2 τ ) and it iid ∼ N(0, σ 2 ). We employ the same estimators in Equations (3), (4), and (5) to resolve the incompleteness and re-evaluate the 15 candidate models. Then the SIC values are calculated, and based on these values, the posterior probability for each model is computed. Table 19 features the posterior probabilities. With the suggested model as 'μ 1 , μ 2 = μ 3 , μ 4 ', the estimated

Concluding remarks and discussion
We propose an approach to incorporate incomplete data for estimation problems in the linear mixed-effects model with compound symmetric covariance structures. We derive likelihood-based estimators with the implementation of the indicator matrix for the accommodation of missing data. Due to the non-closed-form expressions in estimators, as in all other estimation approaches in linear mixed models, we apply the fixed-point iterative approach to obtain the final estimates. The iterative algorithm is explicitly listed, and simulations are conducted to demonstrate the successful iterative process. The simulation results show that all processes converge within the first few iterations on data with a small, moderate, and excessive amount of missingness.
To evaluate the proposed approach on incomplete data, numerous simulations are conducted to examine (1) the model selection performance under various amounts of missingness, (2) the selection performance under different missing mechanisms, (3) the estimation performance under various missing rates, and (4) the comparison in point estimation and coverage probabilities of confidence interval estimation between the proposed method and MLE.
The simulation results demonstrate the selected rates of the true model, accessed by the SIC with the implementation of the proposed approach, are satisfactory with a moderate amount of missingness, regardless of the missing mechanisms (MCAR or MNAR).
In the studies of the comparison between the proposed approach and MLE, the proposed method provides a noticeably better estimate for the variance components σ 2 and σ 2 τ , and yields competitive estimates for fixed effects in the compound symmetric models. We apply the proposed approach to the blood lead concentration data. The applied results illustrate the effectiveness of the proposed approach on estimation and on model selection with missing values in the linear mixed modeling framework with the compound symmetric covariance structure.
The proposed method focuses on estimation and selection in the linear mixed models under compound symmetric structures. We will extend it to other correlation structures in future research. We are currently exploring a similar method under AR(1) covariance structure in a follow-up project.