Checking identifiability of covariance parameters in linear mixed effects models

ABSTRACT To build a linear mixed effects model, one needs to specify the random effects and often the associated parametrized covariance matrix structures. Inappropriate specification of the structures can result in the covariance parameters of the model not being identifiable. Non-identifiability can result in extraordinarily wide confidence intervals and unreliable parameter inference. Sometimes software output indicates model non-identifiability, but not always. In the simulations of fitting non-identifiable models we tried, about half of the time the software output did not look abnormal. We derive necessary and sufficient conditions for identifiability of the covariance parameters that do not require any prior model fitting. The results are easy to implement and are applicable to commonly used covariance matrix structures.


Introduction
Observations collected from data with a clustered, longitudinal, or repeated measures structure often exhibit correlations within the same cluster or the same unit of analysis. Linear mixed effects models provide an approach to model the correlation and are widely applied in a variety of disciplines such as agriculture, biology, medicine, physics, and the social sciences. See, for instance, David et al. [2] for a modeling in genetic studies of animal feeding, and Wang [13] and Wu [17] for analyses of data with a complex structure. Let y be the response and let X be a known design matrix associated with unknown fixed effects β. A linear mixed effects modeling of y is y = Xβ + ∑_{j=1}^J Z_j u_j, where the Z_j's are known design matrices associated with the random effects u_j's. Usually, u_J represents the residual and is written as ε, with the associated Z_J = I. To ease discussion, we use the notations {Z_J, u_J} and {I, ε} interchangeably.
The u_j's are assumed mutually independent with covariance matrices Σ_j's. It follows that the covariance matrix Σ_y of the response is modeled as Σ_y = ∑_{j=1}^J Z_j Σ_j Z_j′. We get a single-level random effects model when J = 2, and a multi-level random effects model when J > 2. For a single-level random effects model, we usually write y = Xβ + Zu + ε and Σ_y = Z Σ_u Z′ + Σ_ε. It is clear that β is identifiable if X is of full rank. Hence, throughout the article, model identifiability refers to identifiability of the parameters in the covariance matrices.
To build a linear mixed effects model, one often needs to choose parametrized covariance matrix structures for the Σ_j's [4,5,9,10]. Commonly used patterns include multiple of an identity (MI), variance components (VC), compound symmetry (CS), and symmetric and positive definite (UN). To find the model fitting the data best, information criteria and likelihood ratio tests are usually used to select the covariance structures [16]. Let θ contain all the parameters in Σ_y. Improper specification of the covariance matrix structures, however, can lead to θ not being identifiable, even if Σ_y is not over-parametrized. That is, there may exist θ* ≠ θ producing the same Σ_y, with both Σ_j(θ) and Σ_j(θ*) positive definite for each j.
It is clear that Σ_j being positive definite (Σ_j > 0) for all j is not necessary for Σ_y > 0. One can still get Σ_y > 0 even if Σ_j > 0 fails for some j's. That is, the parameter space {θ : Σ_y > 0} is larger than the parameter space {θ : Σ_j > 0, j = 1, . . . , J}. If the model is not identifiable in the latter space, neither is it in the former space. Thus, chances of model non-identifiability are relatively higher in the space {θ : Σ_y > 0}. Indeed, Verbeke and Molenberghs [11, p. 112] note that to estimate a UN type Σ_j (UN Σ_j) of the random effects, the SAS MIXED procedure (after release 6.10) only requires the estimated diagonal elements to be non-negative. Hence, the estimate of a UN Σ_j is not necessarily positive definite. Routinely, the u_j's are normally distributed, and so is y. As a result, distinct parameter values can produce the same Σ_y component of the likelihood of y. Parameter estimates based on maximizing the likelihood are thus vulnerable to non-identifiability.
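As a concrete numeric illustration of distinct parameter values producing the same Σ_y, consider a random intercept model with a CS error structure (an example revisited in Section 3). The sketch below uses hypothetical numbers; it is not drawn from the paper's data.

```python
# Sketch (hypothetical numbers): random intercept model y = Xb + Zu + e with
# Z = 1_n, Var(u) = s2u, and a CS error written in linear form s2*I + tau*(J-I).
# Then Sigma_y = s2u*J + s2*I + tau*(J-I) = (s2 - tau)*I + (s2u + tau)*J,
# so shifting (s2u, s2, tau) -> (s2u - d, s2 + d, tau + d) leaves Sigma_y unchanged.

n = 4

def sigma_y(s2u, s2, tau):
    """Sigma_y of the random-intercept + CS model, as a list of rows."""
    return [[s2u + tau + (s2 - tau if i == j else 0.0)
             for j in range(n)] for i in range(n)]

theta      = (1.0, 2.0, 0.5)   # (s2u, s2, tau); CS correlation rho = 0.25
theta_star = (0.5, 2.5, 1.0)   # shifted by d = 0.5; rho* = 0.4, still valid

# Both parameter values lie in the parameter space, yet Sigma_y is identical.
assert sigma_y(*theta) == sigma_y(*theta_star)
```

Since both parameter values keep Σ_u and Σ_ε positive definite, a likelihood based solely on Σ_y cannot distinguish them.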
In a mixed effects model, the u_j's and their Best Linear Unbiased Predictors (BLUPs) usually have concrete interpretations. For example, in longitudinal modeling of students' test scores, the BLUP provides the estimate of each individual teacher's effect contributed to student outcome variability [7]. When the model is not identifiable, the BLUP of the random effects is not uniquely determined. The same ambiguity occurs for the residual; both are often used in model diagnostics [11]. For instance, consider the single-level random effects model. We have the BLUP û = Σ_u(θ)Z′Σ_y^{−1}(y − Xβ) and the residual y − Xβ − Zû. In the non-identifiable case, the BLUP and the residual can be û* = Σ_u(θ*)Z′Σ_y^{−1}(y − Xβ) and y − Xβ − Zû*. In practice, there are N individuals and each response vector y_i is modeled as y_i = X_i β + ∑_{j=1}^J Z_{ij} u_{ij}, where the u_{ij}'s are all mutually independent. Demidenko [3, p. 47] and Wang [12,14] study identifiability of the single-level random effects model; [3, p. 118] derives a sufficient condition of identifiability where the covariance matrices of u_i and ε_i are respectively UN and MI structured (UN + MI model). Wang [12] focuses the identifiability study on the UN structured covariance matrix of u_i, and shows that the UN + MI model is usually identifiable. Wang [12] also studies other structures under a special 2 × 2 Z_i. Wang [14] obtains a characterization of the design matrix when the model is not identifiable, and then proposes sufficient conditions of model non-identifiability for various structures. The approaches in Wang [12,14] are mainly constructive, involving seeking a θ* and checking whether it lies in the parameter space. It is not straightforward to generalize the results to models with multi-level random effects.
We study identifiability in a multi-level random effects framework which includes single-level random effects models as a special case. We drop the full column rank assumption on Z_{ij}, which is crucial in Wang [12,14]. Our approach can be applied to various covariance matrix structures, and it allows the structures or parameters to vary between individual models. The derived conditions are necessary and sufficient, and are easy to implement. We only need to verify whether a collection of matrices is linearly independent, or whether a matrix has full rank. The rest of the article is organized as follows. Section 2 contains the notation and the setup of the study. Conditions of model identifiability are presented in Section 3. In Section 4, we apply our results to check identifiability of a model in the literature. Simulation studies are presented in Section 5, and a discussion of the study is presented in Section 6. An R program implementing the approach to check identifiability is provided in the supplementary material.

Notation and setup
In mixed effects models, the covariance matrix of the random parts is usually parametrized and has a certain structure. For instance, a VC structured covariance matrix with m distinct elements is written as diag{σ_1² I, σ_2² I, . . . , σ_m² I}, where each I is an identity matrix of appropriate dimension. We study identifiability in the setup below, and show that a model being identifiable is equivalent to a map being one-to-one. The setup focuses on an individual model, and can be easily generalized to accommodate the joint model of all individuals. In Section 3.2, when we study joint model identifiability, we use sub/super-script indices to differentiate individuals.

Parametrized covariance matrix
Let S_n be the collection of real symmetric n × n matrices. Let E_{p,q} ∈ S_n be an elementary matrix with the (p, q)th and the (q, p)th elements being 1 and the others being 0. Let I = {(p, q) : 1 ≤ p ≤ q ≤ n} be the index set of matrix elements. It is clear that {E_{p,q} : (p, q) ∈ I} is a basis of S_n. Let S_n^+ ⊆ S_n be the collection of n × n symmetric positive definite matrices. Throughout the article, we study covariance matrices obtained through a map, to be defined shortly below, from a parameter space to S_n^+. Suppose θ is of length m, that is, θ = (θ_1, . . . , θ_m)′ ∈ R^m. Let I_k be a subset of I, with I_k ∩ I_{k′} empty whenever k ≠ k′. Let Σ(θ) = ∑_{k=1}^m θ_k ∑_{(p,q)∈I_k} E_{p,q}. Commonly used covariance matrices can be written in the above form. This class includes structured covariance matrices such as MI, UN, VC, and CS. For instance, a CS matrix is written as σ²[I + ρ(J − I)], where σ² > 0, −1/(n − 1) < ρ < 1, and J is a matrix of 1's. Let τ = σ²ρ. An alternative parametrization yields the linear form σ²I + τ(J − I), where −1/(n − 1) < τ/σ² < 1.
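The equivalence of the two CS parametrizations can be checked directly. The following sketch (with hypothetical values of σ² and ρ) builds both forms and confirms they coincide when τ = σ²ρ:

```python
# Sketch: the CS covariance sigma^2 [I + rho (J - I)] equals the linear form
# sigma^2 I + tau (J - I) with tau = sigma^2 * rho, i.e. a linear combination
# of the basis matrices I and J - I.

n = 3

def cs_multiplicative(sigma2, rho):
    """sigma^2 [I + rho (J - I)] as a list of rows."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(n)] for i in range(n)]

def cs_linear(sigma2, tau):
    """sigma^2 I + tau (J - I) as a list of rows."""
    return [[sigma2 if i == j else tau for j in range(n)] for i in range(n)]

sigma2, rho = 2.0, 0.25   # hypothetical values; rho within (-1/(n-1), 1)
assert cs_multiplicative(sigma2, rho) == cs_linear(sigma2, sigma2 * rho)
```

The linear form is what makes the basis-matrix representation Σ(θ) above applicable to the CS structure.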

Covariance matrix of the response
From the random effects modeling, we have Σ_y = ∑_{j=1}^J Z_j Σ_j Z_j′. Suppose each Σ_j is of dimension n_j × n_j and is parametrized by θ_j. Using the notation in the previous subsection with a sub/super-script j, we get the domain of the jth individual map as Θ_j = {θ_j : Σ_j ∈ S_{n_j}^+}. Suppose each θ_j is of length m_j, and let the elements of θ_j be the θ_{jk}'s. We get Σ_j = ∑_{k=1}^{m_j} θ_{jk} ∑_{(p,q)∈I_{jk}} E_{p,q}. In the previous subsection, the notation θ represents a parameter vector. Abusing notation slightly, we let θ = (θ_1′ · · · θ_J′)′ and let Θ = Θ_1 × · · · × Θ_J.

Definition 2.1: For a given θ ∈ Θ, the model is not identifiable if ∃ θ* ∈ Θ, θ* ≠ θ, producing the same Σ_y.
Suppose each Z_j is of dimension n × n_j. We define a map from Θ to S_n by θ ↦ Σ_y = ∑_{j=1}^J Z_j Σ_j(θ_j) Z_j′. It is clear that the model is identifiable with respect to θ ∈ Θ if and only if this map is one-to-one. In the following, we assume: (i) each Σ_j is identifiable with respect to its θ_j ∈ Θ_j, (ii) θ_{jk} and θ_{j′k′} are distinct parameters whenever j ≠ j′ or k ≠ k′, and (iii) elements of the θ_j's do not appear in β. Assumption (i) avoids trivial cases. Assumption (ii) generally holds for linear mixed effects models. The validity of our derived results does not rely on this assumption, but we still list it to ease notation. Otherwise, we would have to use another notation for the unique elements in θ = (θ_1′ · · · θ_J′)′, and the parametrization of Σ_y would have to change accordingly. Assumption (iii) separates parameters in the fixed effects from those in the random effects.

Identifiability conditions
In the following two subsections, we investigate identifiability of an individual model Σ_y = ∑_{j=1}^J Z_j Σ_j Z_j′, and then identifiability of the joint model for all individuals, {Σ_{y_i} : i = 1, . . . , N}. All technical derivations are presented in the supplementary material. We consider an example of a single-level random effects model y = Xβ + Zu + ε. Suppose Z is a column vector of 1's, which corresponds to a random intercept model. Since ZZ′ = J, it is clear that ZZ′ is in the span of {I, J − I}. Thus, the model is not identifiable if ε has a CS structure. From the theorem, we can get sufficient conditions of identifiability and sufficient conditions of non-identifiability separately. When we verify the conditions, we only need to consider θ_j ∈ R^{m_j} without the restriction θ_j ∈ Θ_j. The model is identifiable if ∑_{j=1}^J Z_j Σ_j Z_j′ = 0 leads to θ_j = 0 for all j = 1, . . . , J. The model is not identifiable if there exists at least one θ_j ≠ 0 such that ∑_{j=1}^J Z_j Σ_j Z_j′ = 0. To check model identifiability, Theorem 3.1 involves examining whether several matrices are linearly independent. We derive an easy-to-use computational approach. Let 'vec' be the operator which transforms a matrix into a vector by stacking the columns one underneath another. Let ⊗ denote the Kronecker product, which maps two arbitrarily dimensioned matrices into a larger matrix with a special block structure.
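The rank-based check can be sketched numerically. Under the linear parametrization of Section 2, identifiability amounts to the vectors vec(Z_j G Z_j′), with G ranging over the basis matrices of all the Σ_j's, being linearly independent; the exact construction of the V matrix in Corollary 3.2 is given in the supplementary material, so the layout below is our assumption. The sketch reproduces the random intercept example: an MI error keeps the model identifiable, while a CS error does not.

```python
# Sketch of the rank-based identifiability check (our reading of the
# vec approach; the exact V construction in Corollary 3.2 may differ).
# The model is identifiable iff the columns vec(Z_j G Z_j') are linearly
# independent, G ranging over the basis matrices of the parametrizations.

def vec(M):
    # Stack the columns of M one underneath another.
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def rank(cols, tol=1e-9):
    # Rank of the matrix whose columns are `cols`, by Gauss-Jordan elimination.
    rows = [list(r) for r in zip(*cols)]
    r = 0
    for c in range(len(cols)):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and abs(rows[i][c]) > tol:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

n = 4
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
J = [[1.0] * n for _ in range(n)]
J_minus_I = [[J[i][j] - I[i][j] for j in range(n)] for i in range(n)]

# Random intercept: Z is a column of 1's, so Z Sigma_u Z' = sigma_u^2 * J.
# MI error: basis {I}        -> columns {vec(J), vec(I)}: identifiable.
# CS error: basis {I, J - I} -> vec(J) = vec(I) + vec(J - I): not identifiable.
cols_mi = [vec(J), vec(I)]
cols_cs = [vec(J), vec(I), vec(J_minus_I)]

assert rank(cols_mi) == 2   # full column rank: identifiable
assert rank(cols_cs) == 2   # rank 2 < 3 columns: not identifiable
```

The same routine applies to any structure that admits the linear basis representation, with one column per covariance parameter.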

Joint model
It is clear that if each individual model is identifiable, so is the joint model. However, the reverse is not necessarily true. We generalize the results in the previous subsection and obtain necessary and sufficient conditions of identifiability for the joint model. We use similar notation as in the previous subsection, with a sub/super-script i to indicate the ith individual model. Suppose each Σ_{ij} is parametrized by θ_{i,j} and there are N individual models. The parameters in the joint model thus consist of the θ_{i,j}'s. We assume there is at least one j such that θ_{i,j} is the same for all i. Otherwise, identifiability of the joint model is no different from identifiability of each individual model.

Let W_i consist of blocks W_{i,j}, where W_{i,j} is the corresponding V matrix of Corollary 3.2 if the jth parameter vector of the joint model appears in the ith individual model, and W_{i,j} = 0 otherwise.
Theorem 3.3: The joint model is identifiable if and only if W is of full column rank, where W = (W_1′ · · · W_N′)′.
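A minimal numeric sketch of the joint check, under the assumption that W stacks the per-individual column blocks with shared parameters sharing columns (the exact layout of W is given in the supplementary material). The example is hypothetical and not from the paper: a random-slope model in which one individual has a single measurement (not identifiable alone) while another has two, so the joint model is identifiable.

```python
# Sketch of the joint check (our reading of Theorem 3.3; the exact block
# layout of W is an assumption). Hypothetical example: random-slope model
# y_i = X_i b + Z_i u_i + e_i with Z_i = t_i, a shared slope variance and
# a shared MI error variance across individuals.

def vec(M):
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def rank(cols, tol=1e-9):
    # Rank of the matrix whose columns are `cols`, by Gauss-Jordan elimination.
    rows = [list(r) for r in zip(*cols)]
    r = 0
    for c in range(len(cols)):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and abs(rows[i][c]) > tol:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def blocks(t):
    """Columns vec(Z_i Z_i') and vec(I) for one individual with times t."""
    n = len(t)
    ZZt = [[t[i] * t[j] for j in range(n)] for i in range(n)]
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [vec(ZZt), vec(I)]

W1 = blocks([2.0])        # one measurement: two parameters, one equation
W2 = blocks([1.0, 2.0])   # two distinct times

assert rank(W1) == 1      # rank 1 < 2 columns: not identifiable alone
assert rank(W2) == 2      # identifiable on its own

# Stack the individuals' rows; shared parameters share columns.
W = [c1 + c2 for c1, c2 in zip(W1, W2)]
assert rank(W) == 2       # full column rank: the joint model is identifiable
```

The converse direction also shows up here: the joint model is identifiable even though the first individual model is not.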

Example
Linear mixed effects models have been used in the analysis of longitudinal measurements of the CD4 count of human immunodeficiency virus (HIV) infected patients. For instance, Abrams et al. [1] analyze the change in CD4 cell counts from baseline comparing the effects of two drugs. Littell et al. [6, Section 16.11] revisit the study using the UN + MI and UN + VC structures. We check identifiability of the latter models using Corollary 3.2 and Theorem 3.3. Without loss of generality, we study a fictitious group of 4 patients for illustration purposes. We show that the covariance parameters are not identifiable in some individual models, but are identifiable in the joint model. Finally, we show that if an alternative form of the VC structure is used, the joint model is not identifiable.

Background
In the study of Abrams et al. [1], patients who had failed or had been intolerant to zidovudine treatment were enrolled in a multicenter, randomized clinical trial. There were 467 patients in 16 study units. Patient characteristics included age, gender, race, zidovudine treatment status (failure or intolerance), biochemical values, body weight, Karnofsky performance scores, and other hematologic variables. During the trial, patients received one of two drugs, didanosine or zalcitabine. The response variable CD4 count was measured for every patient at 0 (baseline), 2, 6, 12, or 18 months. The number of measurements varies across patients, ranging from 1 to 5. Let t_{i,j} be the time of the jth measurement of the ith patient, who had n_i measurements, 1 ≤ n_i ≤ 5. We let t_i = (t_{i,1} · · · t_{i,n_i})′ be the measurement times of the ith patient. For the 4 subjects, we assume they have respectively t_1 = (0 2 6 12 18)′, t_2 = (0 2)′, t_3 = (0 2 6)′ and t_4 = (0 2 6 12)′.

Modeling
For subject i, the fixed effects in the model of Littell et al. [6, Section 16.11] include the clinical center effect, the drug effect, the zidovudine treatment status, a baseline indicator variable b_i, where b_{i,j} = 0 (baseline) or b_{i,j} = 1 (not baseline), and t̄_i, where t̄_{i,j} = max{0, t_{i,j} − 2}. Interactions of fixed effects are also considered. Let y_{i,j} be the square root of the CD4 count of the ith subject at time t_{i,j}. In the random part of the model, the random u_i = (u_{i1} u_{i2} u_{i3})′ has a UN Σ_u. Let y_i = (y_{i,1} · · · y_{i,n_i})′ and ε_i = (ε_{i,1} · · · ε_{i,n_i})′. We write y_i = X_i β + Z_i u_i + ε_i. The u_i's and the ε_i's are mutually independent. The error ε = (ε_1′ · · · ε_N′)′ has either an MI structure [6, p. 719] or a VC structure [6, p. 720].
In the VC structure, there are 4 elements corresponding to the different combinations of the drug and the zidovudine treatment status. As each patient receives one drug and has one zidovudine treatment status, there is only one element in the covariance matrix of ε_i. That is, ε_i has an MI covariance structure in any individual model. Hence, all individual models have a UN + MI structure, although the joint model has a UN + VC structure.

Identifiability verification
The joint model has either a UN + MI structure or a UN + VC structure. As discussed above, all individual models have a UN + MI structure. Given the t_i's, i = 1, . . . , 4, we obtain the associated Z_i matrices. We notice that when i = 2 and i = 3, Σ_{y_i} is not identifiable due to over-parametrization. For the UN + MI structure, the matrix W is of full column rank, and so the joint model is identifiable. For the UN + VC structure, the matrix W is also of full column rank, and thus we obtain joint identifiability.
Suppose the ε_i in each individual model has another VC structure whose elements depend on the timing of the measurement. That is, the variance at t_{i,j} is the same for all i, but varies with j. The joint model is not identifiable as the matrix W is not of full column rank. In fact, none of the individual models is identifiable. The second and the third individual models are over-parametrized. In the first and the fourth individual models, the V matrix in Corollary 3.2 is not of full rank.
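The rank deficiency for the first individual model can be checked numerically. The sketch below assumes Z_1 = (1 t_1 t̄_1), a UN Σ_u with the elementary matrices E_{p,q} as basis, and a diagonal basis {e_j e_j′} for the time-indexed VC error structure; the exact V in Corollary 3.2 may be arranged differently, but the rank conclusion is the same.

```python
# Sketch: check the first individual model (t_1 = (0, 2, 6, 12, 18)') under
# the alternative VC error structure whose variance varies with the
# measurement occasion. Assumptions: Z_1 = (1  t_1  tbar_1) with
# tbar_{1,j} = max(0, t_{1,j} - 2), and the linear bases of Section 2.

def vec(M):
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def rank(cols, tol=1e-9):
    # Rank of the matrix whose columns are `cols`, by Gauss-Jordan elimination.
    rows = [list(r) for r in zip(*cols)]
    r = 0
    for c in range(len(cols)):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and abs(rows[i][c]) > tol:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

t = [0.0, 2.0, 6.0, 12.0, 18.0]
tbar = [max(0.0, x - 2.0) for x in t]
n = len(t)
Z = [[1.0, t[i], tbar[i]] for i in range(n)]   # n x 3 design for u_i

def z_e_zt(p, q):
    # Z E_{p,q} Z' for the 3 x 3 symmetric elementary matrix E_{p,q}
    return [[Z[i][p] * Z[j][q] + (Z[i][q] * Z[j][p] if p != q else 0.0)
             for j in range(n)] for i in range(n)]

cols = [vec(z_e_zt(p, q)) for p in range(3) for q in range(p, 3)]   # UN: 6 columns
cols += [vec([[1.0 if i == j == k else 0.0 for j in range(n)]
              for i in range(n)]) for k in range(n)]                # VC by time: 5 columns

assert len(cols) == 11
assert rank(cols) < len(cols)   # rank deficient: the model is not identifiable
```

The deficiency arises because t̄_1 = t_1 − 2·1 + 2e_1, so a combination of the UN basis images reproduces the diagonal matrix e_1 e_1′, which the time-indexed VC basis also spans.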

Simulation
Statistical software such as R, SAS, SPSS, STATA and HLM can all fit a linear mixed effects model [6,8,11,15]. In particular, the 'lme' function of R is widely used. Failure of confidence interval construction or extremely wide confidence intervals for the covariance parameters are indications of model non-identifiability [8,15]. We conducted a simulation study to examine the frequency of such indications. When fitting the two non-identifiable models described below, about half of the time the output did not look abnormal. On the other hand, model non-identifiability can be easily detected by the R program provided in the supplementary material. We use the 'lme' function of the package nlme 3.1-120 of R 3.1.2, and then apply the function 'intervals' to obtain the 95% confidence intervals for the standard deviations of the covariance parameters.

Settings
We simulate data from a single-level random effects model y_i = Xβ + Zu_i + ε_i.
Each y_i is of length n = 4, and Z =

Tables
For both the ML and REML estimates, the fitting algorithm converges in all 1000 simulations. Table 1 shows that about half of the time, the 95% confidence intervals can be obtained. Usually more confidence intervals are obtained for the bigger sample size N = 500, except for the ML method in model A.
To check the lengths of the 95% confidence intervals, we summarize the quantiles of the lengths in Table 2, where we usually observe shorter intervals when N = 500. In model A, using the ML method, there are extraordinarily wide confidence intervals for the parameter σ, and for the parameter σ_{u2} when N = 500. Extreme intervals of length 4/3 for the parameter ρ are also observed, except when N = 500 using the REML method. In model B, extremely wide confidence intervals are observed for σ and ρ, and for σ_{u2} when N = 500 using the ML method. However, in both models, at least 99% or 95% of the lengths do not seem abnormal.
We further examine the coverage proportion for each parameter. In Table 3, we observe notable deviations from the nominal 95% level for the parameters σ and ρ. In model A, the proportions are over 95% (bold) for ρ, and for σ when N = 200. In model B, the proportions are remarkably lower than 95%. Histograms of the estimated σ and ρ, and coverage plots of ρ, are provided in the supplementary material.

Summary and discussion
Linear mixed effects models are frequently used to analyze data with a clustered or a longitudinal structure. However, the fundamental issue of model non-identifiability has not received much attention. Inappropriate specification of the covariance structures can result in the model parameters not being identifiable. Non-identifiability can lead to unreliable parameter inference. Software output may indicate non-identifiability, but in the simulation settings we tried, about half of the time there was no such indication.
One possible criticism of the identifiability study is that only the inference of the fixed effects deserves attention and the random effects should not matter. We observe that if a mixed effects model is preferred, the random effects and often the covariance matrix structures need to be specified. Otherwise, one can always fit a multiple linear regression model with fixed effects only. Therefore, it is useful to develop a way of checking model identifiability prior to any model fitting. We derive necessary and sufficient conditions of identifiability, and the approach is easy to implement.