Mixture Modeling for Longitudinal Data

In this article, we propose an unbiased estimating equation approach for a two-component mixture model with correlated response data. We adapt the mixture-of-experts model for the component distributions and a generalized linear model for the mixing proportion. The new approach requires only the marginal distributions of the component densities and the latent variables. We use the serial correlation from subjects' subgroup memberships, which improves estimation efficiency and classification accuracy, and we show that estimation consistency does not depend on the choice of the working correlation matrix. The proposed estimating equation is solved by an expectation-estimating-equation (EEE) algorithm. In the E-step of the EEE algorithm, we propose a joint imputation based on the conditional linear property of the multivariate Bernoulli distribution. In addition, we establish asymptotic properties for the proposed estimators and the convergence property of the EEE algorithm. Our method is compared to an existing competitive mixture model approach in both simulation studies and an election data application. Supplementary materials for this article are available online.


INTRODUCTION
The mixture model has been extensively applied in many fields due to its flexibility in capturing the heterogeneity arising from subgroups (components) in the whole population. The mixture model can also be viewed as a two-level hierarchical structure with incomplete data, where the first level consists of latent variables indicating subjects' subgroup memberships, and the second level consists of outcome variables. However, because the latent variables are unobservable, an individual's subgroup membership has to be inferred from the location of the outcome response.
Existing mixture-model methods with covariates for independent data include, but are not limited to, Jacobs et al.'s (1991) mixture-of-experts model for a mixture of component regressions and Jordan and Jacobs's (1994) extension using a generalized linear model for the mixing proportion. For correlated or longitudinal outcome data, Rubin and Wu (1997) introduced a linear random-effects model for the components' densities under the mixture framework. Alternatively, Rosen, Jiang, and Tanner (2000) proposed a generalized estimating equation for the component distributions to incorporate correlations. These approaches all assume that the latent variables are independent. However, modeling correlation from the outcome variables alone is not sufficient to address correlation in subgroup membership over time.
One notable approach to modeling the correlated latent structure is the hidden Markov chain model (Qian and Titterington 1992). However, this is not well suited to longitudinal data, since the Markov chain assumption neither holds for nor approximates some common longitudinal correlation structures, such as the exchangeable structure. Another well-known approach of mixture modeling for longitudinal data is growth-curve mixture modeling (Muthén and Muthén 2000), where a subject's group membership is fixed over time, while different trajectory classes represent different mixture components. Huang et al. (2014) proposed a kernel smoothing method for a mixture of Gaussian processes incorporating both functional and heterogeneous types of dense longitudinal data. In addition, to incorporate individual effects, Sun, Rosen, and Sampson (2007) proposed a multivariate Bernoulli mixture model using random effects in the generalized linear model for the mixing proportion.
Although the mixed-effects model is widely used to handle dependent data, incorporating serial correlation for latent variables in mixture modeling for longitudinal data is still limited. In this article, we are interested in developing an efficient method in mixture modeling of longitudinal data, where within-subject subgroup memberships could be correlated.
One challenge in formulating a mixture model for dependent data is that the joint likelihood function usually does not have an explicit form, because the latent variables are correlated categorical variables. In addition, the latent indicator variables are unobservable and require imputation. Although one can assume independent structure for the latent variables, the estimation efficiency will be compromised.
In this article, we allow group memberships to change over time in addition to taking serial correlation into account. We adopt the generalized estimating equation (GEE) approach (Liang and Zeger 1986) for both component distributions and mixing proportion in a two-component mixture model. This can be accomplished by treating the estimating equations for incomplete data as the conditional expectations of those for complete data. Specifically, we apply the expectation-estimating-equation (EEE) algorithm to solve the equations. To impute the latent variable in the E-step, we provide an approximation method to calculate the joint conditional expectation using the conditional linear property for correlated binary variables (Qaqish 2003; Qu, Lindsay, and Lu 2010).
In contrast to the joint likelihood approach, the proposed method only requires the marginal distributions for both components' densities and latent variables. In the estimating step, we fully use the serial correlation while treating it as a working structure. Allowing different working correlation structures enables one to incorporate various correlated latent structures, although the proposed method does not require knowledge of the true correlation structure to produce consistent estimators of the mean parameters. However, if the correlation structure is correctly or closely specified, we gain efficiency in parameter estimation and improve the classification accuracy of the model-based clustering.
The article is organized as follows: Section 2 introduces some notation and background knowledge; Section 3 presents our method and provides some theoretical results, as well as the EEE algorithm and imputation methods; Section 4 provides simulation studies; Section 5 considers an application to the Election data; and Section 6 offers a brief summary and some further discussion.

BACKGROUND AND NOTATION
For longitudinal data, let y_it be the response of subject i at time t, and x_it be a p-dimensional covariate vector, where i = 1, . . . , n and t = 1, . . . , T_i. For ease of notation, we first assume T_i = T for all i, representing the balanced-data case; the unbalanced-data case will be discussed later. Denote y_i = (y_i1, . . . , y_iT)' as a T × 1 response vector and x_i = (x_i1, . . . , x_iT) as a p × T covariate matrix. Let z_it denote the binary latent variable associated with y_it, and let μ_r(·) (r = 1, 2) be the inverse link function for the rth component's mean regression. In this article, we consider a two-component mixture model, in which the outcome variable y_it comes from one of two subgroup populations and thus is assumed to follow the mixture distribution π_it f_1(y_it | x_it, β_1, φ_1) + (1 − π_it) f_2(y_it | x_it, β_2, φ_2), where π_it is the mixing proportion, defined as the probability that a response y_it comes from the specified subgroup, f_1(y_it | x_it, β_1, φ_1) and f_2(y_it | x_it, β_2, φ_2) are the components' distributions, and φ = (φ_1, φ_2)' is the dispersion parameter. This two-component mixture model can be regarded as a hierarchical structure. Specifically, y_it | (z_it = 1) ∼ f_1(y_it | x_it, β_1, φ_1) and y_it | (z_it = 0) ∼ f_2(y_it | x_it, β_2, φ_2), with P(z_it = 1) = π_it, where the latent variable z_it represents the subgroup membership indicator.
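For concreteness, the mixture density above can be evaluated directly once the component densities are specified. The sketch below uses normal components, as in the later simulation studies; the function names are ours, not from the article.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def mixture_density(y, pi_it, mu1, sigma2_1, mu2, sigma2_2):
    """Two-component mixture density pi*f1 + (1 - pi)*f2 at observation y."""
    return (pi_it * normal_pdf(y, mu1, sigma2_1)
            + (1 - pi_it) * normal_pdf(y, mu2, sigma2_2))
```

With pi_it = 1 the mixture reduces to the first component density, and by symmetry two components N(−1, 1) and N(1, 1) mixed equally give the same value at 0 as either component alone.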
The most common assumption for this hierarchical model is that the outcome variables y_it are conditionally independent given the subgroup membership labels z_it; the joint likelihood of the complete data (y_i, z_i) can then be written as

f_c(y_i, z_i) = p(z_i | π_i) ∏_{t=1}^T f_1(y_it | x_it, β_1, φ_1)^{z_it} f_2(y_it | x_it, β_2, φ_2)^{1 − z_it}.   (1)

Under the mixture framework, the latent variable z_it is missing. The simplest way to get around the latent structure is to assume independence of subgroup memberships at different times within each subject, so that p(z_i | π_i) = ∏_{t=1}^T π_it^{z_it} (1 − π_it)^{1 − z_it}. In this independent mixture model, the correlations among longitudinal observations and their subgroup memberships are not fully used. Jordan and Jacobs (1994) proposed a hierarchical mixture-of-experts model that takes covariates into consideration for both the latent variable and the component distributions, where the latent variable z_it is assumed to follow a logistic model, π_it = exp(η'x_it)/{1 + exp(η'x_it)}, and the components' densities have a mean regression form. Here, we denote θ = (η', β_1', β_2')' as the grand mean parameter vector; then, under the independence assumption within subject, the log of the joint likelihood for complete data is

L_c(y, z, θ, φ) = Σ_{i=1}^n Σ_{t=1}^T [ z_it {log π_it + log f_1(y_it | x_it, β_1, φ_1)} + (1 − z_it) {log(1 − π_it) + log f_2(y_it | x_it, β_2, φ_2)} ].   (2)

For classical mixture models where the latent variable z_it is missing, the maximum likelihood estimator (MLE) is typically computed through the expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977). The EM algorithm proceeds iteratively in two steps. In the E-step, we take the conditional expectation of the complete-data log-likelihood given the observed outcome response y and the current parameter estimates (θ^(k), φ^(k)). Since the complete-data log-likelihood L_c(y, z, θ, φ) in (2) is a linear function of the latent variable z_it, the E-step only requires imputing z_it by its conditional expectation w_it^(k) = E[z_it | y, θ^(k), φ^(k)], denoted as the mixing weight.
In the M-step, we obtain the (k + 1)th updates of the parameter estimates by maximizing the imputed complete-data log-likelihood, that is, L_c in (2) with z_it replaced by w_it^(k). It has been shown that the sequence (θ^(k), φ^(k)) obtained from the EM algorithm converges to the MLE of the mixture distribution f(y, θ, φ) = Σ_z f_c(y, z, θ, φ) (Dempster, Laird, and Rubin 1977).
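The E- and M-steps described above can be sketched for the simplest case: a two-component univariate normal mixture with a constant mixing proportion π (in the model above, π_it would instead follow a logistic regression). This is an illustrative sketch under that simplifying assumption, not the authors' implementation.

```python
import math

def em_two_normal(y, iters=200):
    """EM for a two-component univariate normal mixture with a constant
    mixing proportion pi (simplified illustration of the E- and M-steps)."""
    n = len(y)
    ybar = sum(y) / n
    mu1, mu2 = min(y), max(y)                       # crude initialization
    s1 = s2 = sum((v - ybar) ** 2 for v in y) / n   # pooled variance start
    pi = 0.5
    for _ in range(iters):
        # E-step: mixing weights w = P(z = 1 | y) by Bayes' rule
        w = []
        for v in y:
            f1 = math.exp(-(v - mu1) ** 2 / (2 * s1)) / math.sqrt(2 * math.pi * s1)
            f2 = math.exp(-(v - mu2) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            w.append(pi * f1 / (pi * f1 + (1 - pi) * f2))
        # M-step: weighted maximum likelihood updates
        sw = sum(w)
        pi = sw / n
        mu1 = sum(wi * v for wi, v in zip(w, y)) / sw
        mu2 = sum((1 - wi) * v for wi, v in zip(w, y)) / (n - sw)
        s1 = sum(wi * (v - mu1) ** 2 for wi, v in zip(w, y)) / sw
        s2 = sum((1 - wi) * (v - mu2) ** 2 for wi, v in zip(w, y)) / (n - sw)
    return pi, mu1, mu2, s1, s2
```

On well-separated data the weights approach 0/1 labels and the component means converge to the cluster means.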

UNBIASED ESTIMATING EQUATIONS
Due to the correlated nature of longitudinal data, it is important to incorporate correlation for mixture modeling as it provides more efficient estimation and benefits the longitudinal subgrouping. In this section, we first propose an efficient unbiased estimating equation for complete data, and then extend it to the incomplete-data mixture model.
To account for within-subject correlation among group memberships in (1), we have to deal with the joint likelihood function p(z_i | π_i) of the latent multivariate Bernoulli variable z_i. Unlike the multivariate Gaussian distribution, there is no explicit form of the likelihood function for the multivariate Bernoulli distribution. The unknown membership in the mixture model makes the incomplete-data likelihood function even more complicated, since the latent z_i needs to be integrated out from the complete-data likelihood function. One possible approximation is to apply the Bahadur representation (Bahadur 1961) for the multivariate Bernoulli distribution by ignoring higher-order moments. However, this has several drawbacks; for example, the correlations are constrained by the marginal means and covariates in a complicated way (Diggle, Liang, and Zeger 1994). In addition, the dimension of the correlation parameters could be very high when the repeated measurement size T is large, which leads to increasing computational demands.
To address this problem, we first introduce an unbiased estimating equation approach and establish it for the complete-data model. The proposed unbiased estimating equation for the mean parameter θ is formulated through the log-likelihood function L_c in (2). Maximizing L_c with respect to θ by solving the corresponding score equation is equivalent to solving the quasi-likelihood equations (Wedderburn 1974; McCullagh 1983):

Σ_{i=1}^n (∂π_i/∂η)' V_i^{-1} (z_i − π_i) = 0,
Σ_{i=1}^n (∂μ_1i/∂β_1)' U_1i^{-1} Z_i (y_i − μ_1i) = 0,   (4)
Σ_{i=1}^n (∂μ_2i/∂β_2)' U_2i^{-1} (I − Z_i)(y_i − μ_2i) = 0,

where μ_ri = (μ_ri1, . . . , μ_riT)' and μ_rit = E[y_it | z_it = 2 − r] = μ_r(β_r' x_it) (r = 1, 2) are the corresponding mean functions for the components' densities, Z_i = diag(z_i1, . . . , z_iT) is a diagonal matrix of the corresponding latent labels, and U_1i and U_2i are the diagonal covariance matrices of the component densities, U_ri = var(y_i | z_i = 2 − r). The covariance matrices U_1i and U_2i could be functions of both the mean parameters μ_1i, μ_2i and the dispersion parameter φ. As a result, given θ, the dispersion parameter φ can be consistently estimated via the second moment conditions of the component distributions. For the independent model, V_i = A_i, where A_i = diag{π_i1(1 − π_i1), . . . , π_iT(1 − π_iT)} is the diagonal matrix of marginal variances of z_i. To account for the serial correlation induced by the latent variable z_i, we instead assume V_i = A_i^{1/2} R(ρ) A_i^{1/2}, analogous to the generalized estimating equation (GEE) approach (Liang and Zeger 1986), where R(ρ) is a working correlation matrix. If R is the true correlation matrix for z_i, then V_i = cov(z_i).
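The working covariance in the usual GEE form, V_i = A_i^{1/2} R(ρ) A_i^{1/2}, is straightforward to construct once a working correlation structure is chosen. A minimal sketch (our own helper, covering the AR-1, exchangeable, and independence structures discussed later):

```python
def working_cov(pi_i, rho, structure="ar1"):
    """Build V_i = A^{1/2} R(rho) A^{1/2} for latent Bernoulli variables,
    where A = diag(pi_it * (1 - pi_it)) holds the marginal variances."""
    T = len(pi_i)
    if structure == "ar1":
        R = [[rho ** abs(s - t) for t in range(T)] for s in range(T)]
    elif structure == "exchangeable":
        R = [[1.0 if s == t else rho for t in range(T)] for s in range(T)]
    else:  # independence working structure
        R = [[1.0 if s == t else 0.0 for t in range(T)] for s in range(T)]
    a = [(p * (1 - p)) ** 0.5 for p in pi_i]  # sqrt of marginal variances
    return [[a[s] * R[s][t] * a[t] for t in range(T)] for s in range(T)]
```

When R(ρ) is the true correlation of z_i, this matrix equals cov(z_i); otherwise it is only a working covariance.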
In the following, we extend the proposed estimating equation to the incomplete-data model. To handle the missing latent indicator z_i, we take the conditional expectations of the equations in (4) given the outcome observation y, and therefore construct the unbiased estimating equations for the incomplete data as

Σ_{i=1}^n (∂π_i/∂η)' V_i^{-1} (w_i − π_i) = 0,
Σ_{i=1}^n (∂μ_1i/∂β_1)' U_1i^{-1} W_i (y_i − μ_1i) = 0,   (5)
Σ_{i=1}^n (∂μ_2i/∂β_2)' U_2i^{-1} (I − W_i)(y_i − μ_2i) = 0,

where w_i = E_z[z_i | y, θ] is the mixing weight representing the inferred probability of the subgroup memberships, and W_i = diag(w_i1, . . . , w_iT) is a diagonal matrix of the mixing weights associated with y_i. Similar to the GEE method, our approach can handle unbalanced data if the missing mechanism is missing completely at random (MCAR; Rubin 1976). If the missing mechanism is not MCAR, then the estimating equation estimators could be biased and inefficient, and weighted generalized estimating equations (WGEE) can be employed to deal with cases that are not missing completely at random (Robins, Rotnitzky, and Zhao 1995; Rotnitzky, Robins, and Zhao 1998).
To solve the estimating Equation (5), we present a two-step expectation-estimating-equation (EEE) algorithm. Analogous to the EM algorithm, the EEE algorithm follows an iterative estimating process. Specifically, at the (k + 1)th step, based on the current estimate θ^(k), we impute w_i^(k) = E_z[z_i | y, θ^(k)], and then update θ^(k+1) by solving Equation (5) with the mixing weights fixed at w_i^(k); we refer to this fixed-weight equation as (6). A more detailed EEE algorithm is provided in Section 3.3. The main difference between the above method and Rosen, Jiang, and Tanner's (2000) approach is that we incorporate an additional estimating equation associated with the mixing proportion. This enables us to use the longitudinal correlations among the unobservable subgroup memberships from the same subject. Furthermore, in our estimating Equation (5), we can also use the correlation information from the outcome variable y_i simultaneously through the working correlation structures of U_1i and U_2i. In this article, we focus on using the subject's group membership correlation over time. We assume that the y_it's given the z_it's are conditionally independent within the same subject, and thus U_1i and U_2i are set to be diagonal.
Indeed, the proposed model with a working correlation structure has the same identifiability problem as the independent model, which is equivalent to the mixture-of-experts model. We refer to Jiang and Tanner's (1999) discussion of identifiability for the mixture-of-experts model.

ASYMPTOTIC PROPERTIES
In this article, we use the latent variable's serial correlation information to improve the parameter estimation and the subject's group membership prediction. We first examine the optimization properties for the complete-data equations.
Once we establish the optimal estimating equation S_c(z, y, θ) for complete data, the conditional expectation S(y, θ) = E[S_c | y] is the best estimating equation for the incomplete data, in the sense that the L_2 norm ‖S_c(z, y, θ) − S(y, θ)‖ is minimized. In other words, if the proposed S_c(z, y, θ) is efficient enough, then S(y, θ) will inherit some efficiency from the complete-data model. In the following proposition, we establish the optimality of the unbiased estimating equations (4) for complete data.
Proposition 1. If the working correlation structure is correctly specified, that is, V i = var(z i ), the estimating equations in (4) are the optimal linear estimating equations with respect to (z i , y i ), in the sense that the asymptotic variance of the estimator solved by (4) reaches the minimum.
The proof is provided in the Appendix (A.2). Indeed, incorporating the serial correlation information is more important for the mixture model since z i cannot be observed. For the complete-data estimating equations in (4), the first set of equations with respect to the group membership and the other two equations associated with component densities are uncorrelated with each other. Therefore, the induced serial correlations only affect the estimation of mixing proportion parameters in π (η), but have no influence on estimating the component parameters in μ 1 (β 1 ) and μ 2 (β 2 ). However, for the incomplete data, the three equations in (5) are correlated because the imputed term w i is a function of y i . For the longitudinal data of the subjects, taking advantage of the information at the other time points through accounting for the serial correlation improves the estimation efficiency of both the component parameters and the mixing proportion parameters.
It is known that the consistency of the above estimator only depends on correct specification of the mean functions, but does not rely on correct specification of the working correlation structure nor on the estimation of the correlation parameters ρ. This is one desirable property for the proposed approach since the serial correlation induced from unobservable latent variables could be difficult to model and estimate precisely. In contrast to the full joint likelihood approach, we only require the second moment approximation. Furthermore, the following theorem establishes the asymptotic properties of the proposed estimators.
Theorem 1. Let γ = (θ', φ')' denote the grand parameter vector in the proposed unbiased estimating Equation (5), and assume that the following regularity conditions are satisfied: (i) φ is consistently estimated given θ via some unbiased estimating equation H_i(φ | θ), together with the usual regularity conditions for estimating equations. Let G_i(γ) = (S_i(θ | φ)', H_i(φ | θ)')' be the augmented unbiased estimating function. Then the asymptotic distribution of γ̂ obtained from Σ_{i=1}^n G_i = 0 is n^{1/2}(γ̂ − γ) → N(0, V_g), where the asymptotic covariance matrix is given by the sandwich form

V_g = {E[∇G_i]}^{-1} cov(G_i(γ)) {E[∇G_i]'}^{-1}.

Theorem 1 is established based on the unbiasedness of the augmented estimating equations G_i. A sketch of the proof of Theorem 1 is given in the Appendix. Note that even if w_i is imputed through the marginal expectation w_it = E[z_it | y_it], the estimating equations in (5) are still unbiased. The estimating equation H_i for the dispersion parameter is established based on the second moment condition provided in the Appendix (A.1).
In practice, a consistent variance estimator of γ̂ can be obtained by replacing cov(G_i(γ)) with Ĝ_i Ĝ_i' and ∇G_i with ∇Ĝ_i, evaluated at the estimates θ̂, ρ̂, φ̂. One practical difficulty arises in obtaining an explicit form of the gradient ∇Ĝ_i because of the mixing weights w_i. The imputation term w_i is the posterior probability of z_i, and it depends on both the response value y_i and all the parameters θ, φ, and ρ. The complicated form of w_i makes it difficult to calculate the exact derivatives, especially when the conditional expectation w_i = E[z_i | y_i] requires specifying the joint likelihood function for z_i. We present numerical approximation methods to calculate w_i and ∇G_i; more details are provided in Section 3.3.
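To make the sandwich form concrete, consider the scalar-parameter case, where the empirical variance estimator reduces to (Σ ∇Ĝ_i)^{-1} (Σ Ĝ_i²) (Σ ∇Ĝ_i)^{-1}. The sketch below is our own scalar illustration, not the article's multivariate implementation:

```python
def sandwich_variance(G, dG):
    """Empirical sandwich variance for a scalar parameter:
    V = (sum of squared estimating functions) / (sum of gradients)^2,
    with G and dG evaluated at the parameter estimate."""
    bread = sum(dG)                  # estimated "bread" term
    meat = sum(g * g for g in G)     # estimated "meat" term
    return meat / bread ** 2
```

For example, with the sample-mean estimating function G_i = y_i − μ̂ (so ∇G_i = −1), this returns the usual robust variance estimate Σ(y_i − ȳ)²/n².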

ALGORITHM AND IMPLEMENTATION
In this section, we provide the two-step iterative EEE algorithm and its theoretical properties. In addition, the details of marginal imputation and the joint imputation methods are provided.

In Step 3 of Algorithm 1, with the current mixing weights, we apply the Newton-Raphson method to solve the estimating equations in (6) and obtain the update θ^(k+1) for the mean parameter. The updated dispersion parameter φ^(k+1) can then be estimated using the second moment conditions given θ^(k+1); see Appendix (A.1) for more details. In the Newton-Raphson algorithm, the first derivative of the estimating function can be calculated explicitly and takes a simple diagonal form. For the EEE Algorithm 1, the estimating function in (5) can be regarded as a bivariate function S(θ | λ) restricted to the subspace θ = λ, which is denoted as S(θ | θ). In fact, the first argument θ of S(θ | θ) comes from the mean regression part, while the second argument comes from the conditional expectations. The EEE Algorithm 1 simply updates the estimate of θ by solving S(θ | θ^(k)) = 0 in (6). If the iterative sequence of estimators obtained from the EEE algorithm converges, it converges, by continuity, to the solution of the original equation S(θ | θ) = 0. The following lemma provides a local convergence result for the estimator based on Algorithm 1 in a more general form.
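The Newton-Raphson update used here has the generic form θ ← θ − [∇S(θ)]^{-1} S(θ). A one-dimensional sketch of the solver alone (not the mixture-specific equations, whose derivative terms are described in the text):

```python
def newton_solve(S, dS, theta0, tol=1e-10, max_iter=100):
    """Newton-Raphson for S(theta) = 0: theta <- theta - S(theta)/S'(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = S(theta) / dS(theta)  # Newton step from current derivative
        theta -= step
        if abs(step) < tol:          # stop once updates are negligible
            break
    return theta
```

In the EEE algorithm this solver is rerun at each outer iteration with the mixing weights held fixed.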
Lemma 1. Suppose that S(θ | λ) is a bivariate function on the product space satisfying the following regularity conditions: (a) S(θ | λ) is continuous and both ∂S/∂θ and ∂S/∂λ exist, together with the additional conditions stated in the Appendix; then the iterative sequence generated by Algorithm 1 converges locally to the solution of S(θ | θ) = 0. The proof of Lemma 1 is provided in the Appendix. Lemma 1 provides a sufficient but not necessary condition for the algorithm's convergence. Indeed, the convergence of the algorithm is also associated with general convergence theory for iterative solutions of nonlinear equations. More detailed discussion of this type of algorithm can be found in Ortega and Rheinboldt (1970).
In practice, it is possible to have multiple roots for the proposed estimating equations. One suggested method is to choose the root closest to the independent estimator which is consistent (Small, Wang, and Yang 2000). Seo and Kim (2012) proposed to exclude all singular solutions and therefore reduce the risk of selecting spurious roots of the likelihood equation. Their method is powerful in selecting reasonable roots for the independent finite normal mixture model. In addition, we may also need to provide a "good" initial value for the iterations to converge. One way is to select different initial values randomly until the algorithm converges; another choice is to set the initial value as the estimator obtained from the existing independent mixture model.
In Algorithm 1, the E-step (Step 2) requires one to impute the missing latent variable z_i through its conditional expectation w_i = E[z_i | y, θ, φ]. However, this joint conditional expectation involves the joint density of the multivariate binary variable z_i, which is very difficult to calculate. One possible way is to use the marginal imputation w_it = E[z_it | y_it, θ, φ], since we only assume a marginal distribution for each individual observation (y_it, z_it) in our approach. The marginal imputation guarantees that the estimating equations in (5) remain unbiased, with

w_it = π_it f_1(y_it | x_it, β_1, φ_1) / {π_it f_1(y_it | x_it, β_1, φ_1) + (1 − π_it) f_2(y_it | x_it, β_2, φ_2)},

where f_1 and f_2 are the component densities. In fact, the imputation w_it provides the inferred subgroup membership of the outcome observation y_it. The drawback of the marginal imputation is that it uses only local information to infer the group membership. If the two subgroups are well separated, that is, most w_it are close to either 1 or 0, then the marginal imputation is sufficient, since the local information y_it dominates the prediction of the subgroup membership. However, if the two subgroups are not well separated, then the marginal imputation does not fully use the within-subject correlation to improve the subgroup classification.
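The marginal imputation is a direct application of Bayes' rule. A minimal sketch, where f1_val and f2_val denote the two component densities already evaluated at y_it (names are ours):

```python
def marginal_weight(pi_it, f1_val, f2_val):
    """Marginal imputation w_it = E[z_it | y_it]: the posterior probability
    that y_it comes from component 1, via Bayes' rule."""
    num = pi_it * f1_val
    return num / (num + (1 - pi_it) * f2_val)
```

When the two densities are equal at y_it, the weight simply returns the prior mixing proportion π_it.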
Therefore, we provide an alternative approximation of the joint imputation w*_it = E[z_it | y_i], which relies only on the second moment condition of z_i. To predict the subgroup membership of y_it, we consider the conditional expectation E[z_it | y_it, z_i(−t)] based on the local observation y_it and all latent group labels z_i(−t) at the other time points for the ith subject, where z_i(−t) = (z_i1, . . . , z_i(t−1), z_i(t+1), . . . , z_iT)'. Since z_it given z_i(−t) is a Bernoulli variable with mean π*_it = E[z_it | z_i(−t)], the conditional expectation E[z_it | y_it, z_i(−t)] has the same form as the marginal imputation with π_it replaced by π*_it. By taking account of z_i(−t), this imputation w*_it allows us to use the group membership information from the other time points within the same subject as well.
However, the exact value of E[z_it | z_i(−t)] still requires the joint likelihood function for z_i. Motivated by Qaqish's (2003) conditional linear family for the multivariate binary distribution in generating correlated binary data, we consider a linear approximation:

E[z_it | z_i(−t)] ≈ π_it + b_it'(z_i(−t) − π_i(−t)).

Based on the fact that for any two random variables X and Y, cov(X, Y) = cov(X, E[Y | X]), we have cov(z_i(−t)) b_it = cov(z_i(−t), z_it). Denote cov(z_i(−t), z_it) as s_it and cov(z_i(−t)) as V_i(−t); then b_it = V_i(−t)^{-1} s_it. Here, both s_it and V_i(−t) can be obtained from the second moment condition cov(z_i).
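Given the second moment cov(z_i), the coefficient b_it = V_i(−t)^{-1} s_it and the resulting conditional probability involve only a small linear solve. The following is our own sketch of this step (function names are illustrative):

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]               # partial pivoting
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[r][n] / M[r][r] for r in range(n)]

def conditional_prob(pi_i, V, t, z_other):
    """pi*_it = E[z_it | z_{i(-t)}] under the conditional linear family:
    pi_it + b'(z_{i(-t)} - pi_{i(-t)}), with b = V_{i(-t)}^{-1} s_it."""
    idx = [s for s in range(len(pi_i)) if s != t]      # other time points
    V_sub = [[V[r][c] for c in idx] for r in idx]      # V_{i(-t)}
    s_vec = [V[r][t] for r in idx]                     # s_it
    b = solve_linear(V_sub, s_vec)
    return pi_i[t] + sum(bj * (z_other[j] - pi_i[idx[j]])
                         for j, bj in enumerate(b))
```

With an independence working structure, the correction term vanishes and π*_it reduces to the marginal π_it.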
To implement Algorithm 1, at the (k + 1)th step, we use the current estimates θ^(k) and V^(k)_i(−t) to obtain π*_it^(k+1) and thus impute w*_it^(k+1). Here, the true latent label z_i(−t) is replaced by the last prediction ẑ^(k)_i(−t). A similar technique is used by Qian and Titterington (1992) to approximate P(z_it | ẑ^(k)_i(−t)) in hidden Markov field modeling. In Section 3.2, we mentioned that obtaining the robust variance estimator of γ̂ requires calculating the gradient of G_i(γ). Here, we take a numerical approximation of the gradient ∇G_i(γ): its (j, l)th component is the ratio of the jth component of G_i(γ^(k+1)) − G_i(γ^(k)) to the lth component of γ^(k+1) − γ^(k), where γ^(k) and γ^(k+1) are obtained from the two adjacent final iterations.

SIMULATION STUDY
In this section, we conduct two simulation studies to illustrate the performance of the proposed method on mixture modeling for longitudinal data. We are particularly interested in comparing the performance of our method to other approaches when the serial correlation is induced by latent variables. Our simulation results show that we can gain efficiency on parameter estimation by using correlation information.
In the first simulation study, we consider a two-component mixture of univariate normal distribution, and mainly focus on the estimating performance of the mixing proportion parameters with different dependence structures. In the second simulation study, we consider a mixture of two regression models, where the separation levels of two subgroups will change over time. The proposed method has extra power in both estimation and prediction, especially at poorly separated time points.

STUDY 1: TWO-COMPONENT MIXTURE OF UNIVARIATE NORMAL DENSITIES
In this simulation study, we first generate component indicator variables Z_i from Bernoulli distributions for subjects i = 1, . . . , n. For each Z_i, we assume there are T repeated measurements over time, where Z_i = (Z_i1, . . . , Z_iT), and Z_it is a binary variable following a logistic regression with a time covariate; that is, E[Z_it] = π_it and log{π_it/(1 − π_it)} = η_0 + η_1 t/T. Conditional on Z_it, we generate the outcome response Y_it from two univariate normal distributions: Y_it | (Z_it = 1) ∼ N(μ_1, σ_1²) and Y_it | (Z_it = 0) ∼ N(μ_2, σ_2²). For the latent variable Z_i, we assume independence among different subjects but a common correlation structure (either AR-1 or exchangeable) within a subject over time. For the outcome response Y_i, we assume independence among different subjects and conditional independence within a subject given the component indicator Z_i. In our simulation studies, we generate the correlated binary variables through the R package "mvtBinaryEP".
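A similar data-generating step can be approximated without the mvtBinaryEP package by thresholding a latent Gaussian AR-1 process, a common way to obtain serially correlated Bernoulli labels. This is only an approximation: the induced binary correlation differs from ρ, unlike the exact Emrich-Piedmonte construction used by mvtBinaryEP. A hedged sketch of our own:

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def simulate_subject(T, eta0, eta1, rho, mu1, mu2, sigma1, sigma2, rng):
    """One subject's (Z_i, Y_i): labels with approximately AR-1 dependence
    via a Gaussian copula, then normal outcomes given the labels."""
    # latent Gaussian AR-1 process with unit marginal variance
    u = [rng.gauss(0, 1)]
    for _ in range(1, T):
        u.append(rho * u[-1] + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    z, y = [], []
    for t in range(T):
        pi_t = 1 / (1 + math.exp(-(eta0 + eta1 * (t + 1) / T)))
        # threshold so that marginally P(Z_it = 1) = pi_t
        z_t = 1 if norm_cdf(u[t]) < pi_t else 0
        z.append(z_t)
        y.append(rng.gauss(mu1 if z_t == 1 else mu2,
                           sigma1 if z_t == 1 else sigma2))
    return z, y
```

Since Φ(u_t) is uniform, the thresholding preserves the marginal logistic probabilities exactly while introducing within-subject dependence.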
In the following simulation studies, we choose the initial value randomly in a neighborhood of the independent estimator until the EEE algorithm converges. The estimation results are summarized based on 1000 replicates. In our simulations, the empirical standard errors are quite close to the standard errors calculated from the robust sandwich variance, so we only provide the empirical standard errors in our tables. In practice, label-switching issues might arise, especially for Bayesian mixture models. Yao (2012) and Yao and Lindsay (2009) discussed many feasible labeling methods. In our simulation study, we solve the labeling problem by imposing an ordered constraint on the components' mean parameters.
We compare the estimators based on the proposed unbiased estimating equation with either joint imputation (UEE Joint ) or marginal imputation (UEE Mar ), with working correlation structure of either exchangeable, AR-1 or unstructured, the mixtures-of-experts model (Jacobs et al. 1991) which is the same as the independent estimating equation (UEE Ind ), and the oracle estimators (Oracle) assuming the true values of latent variable Z i are known. The oracle estimator serves as a benchmark estimator, where the generalized estimating equation (GEE) uses the correlation structure for the binary data Z i , and the MLE estimators of component parameters are obtained given Z i . In the following tables, we denote the correlation structure in the superscript such as UEE AR1 Joint . In the text, to avoid redundant notation, we omit the superscript if the correlation structure is correctly specified.
In addition, we also compare the proposed method with the random-effects model (Sun, Rosen, and Sampson 2007). In the random-effects model, we assume E[Z_it] = π_it and log{π_it/(1 − π_it)} = η_0 + η_1 t/T + γ_i, where γ_i is a random intercept accounting for the subject effect. Given γ_i, the latent variables within the ith subject are independent. The random-effects estimators (RE) are obtained by the EM algorithm.
In Tables 1 and 2, we observe that all the estimating equation estimators are consistent, as discussed in Section 3, including those using misspecified correlation structures. However, estimation efficiency improves when the correct correlation information is incorporated, especially for the joint-imputed model: the UEE Joint estimators have much smaller standard errors than the other estimators in both the well-separated and poorly separated cases. Indeed, the UEE Joint performs almost the same as the Oracle estimator. For a well-separated case, or when an exchangeable correlation structure is assumed, the performance of the UEE Mar is quite similar to that of the UEE Ind, since the longitudinal data generated here are balanced (Liang and Zeger 1986). But when the two subgroups are poorly separated and the latent variable has an AR-1 correlation structure, the UEE Mar has smaller standard errors than the UEE Ind.
In addition, if we compare Table 1 and Table 2, we notice that the standard errors of the independent estimators increase significantly from a well-separated case to a poorly separated case, while the proposed UEE Joint is much more stable and performs similarly to the Oracle estimator. Table 3 provides the classification error rates from the model-based clustering. The proposed UEE Joint model has sufficient power in predicting the subgroup membership since we incorporate the information from other time points for the same subject.
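The classification error rates reported in Table 3 compare predicted and true memberships; for a two-component model, the error is naturally minimized over the two possible label assignments so that label switching does not inflate it. A minimal sketch (our own helper, not the article's code):

```python
def classification_error(z_true, z_pred):
    """Classification error rate of model-based clustering, minimized over
    the two possible labelings of the two components (label switching)."""
    n = len(z_true)
    mismatch = sum(a != b for a, b in zip(z_true, z_pred))
    return min(mismatch, n - mismatch) / n
```

A prediction that is the exact complement of the truth therefore scores an error of zero, as it identifies the same partition.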
In general, it is difficult to know the true correlation structure for latent variables. The unstructured working correlation is always a possible choice because of its flexibility, as it does not assume any pattern for the correlation structure. However, the unstructured correlation leads to additional computational cost, as it has T(T − 1)/2 correlation parameters to estimate, compared with the single parameter of the AR-1 and exchangeable working correlations. In addition, the extra variation introduced by the unstructured correlation leads to less efficient estimation of the regression parameters and increases the chance of convergence problems in the EEE algorithm. The unstructured model is recommended when the sample size n is large and the repeated measurement size T is relatively small, in the well-separated case. In Tables 1 and 2, UEE Uns Mar and UEE Uns Joint do not show significant improvement in estimation, but they have more power in prediction compared with the independent and misspecified models in Table 3.
In Tables 1 and 2, the random-effects estimators perform poorly, with large biases and standard errors. This is because Sun, Rosen, and Sampson's (2007) approach can only incorporate a random intercept, which might not be sufficient to handle the within-subject dependence when group membership changes over time. In addition, we also conduct a simulation study where the latent variable Z_i is generated from the random-effects model only (Table 4). The random-effects model generates correlations that do not have an obvious pattern, so the proposed estimating equation, which assumes a certain pattern of working structure for the serial correlations, might not be the most efficient in estimation. Nevertheless, Table 4 indicates that although the standard errors of the estimating equation estimators are slightly overestimated, they are still acceptable. In all, regardless of the source of the within-subject dependence, the proposed UEE estimators are robust and efficient in general.

STUDY 2: TWO-COMPONENT MIXTURE OF LINEAR REGRESSION MODELS
In simulation study 2, we consider a mixture of two linear regression models. In this case, not only the mixing proportion, but also the component densities follow a mean regression model with time covariates.
Similar to simulation study 1, we generate the component indicator variables Z i from a logistic model, and for each subject i we assume there are T repeated measurements over time, with Z i = (Z i1 , . . . , Z iT ). The mixing proportion E[Z it ] = π it is generated from the logistic model logit(π it ) = η 0 + η 1 t it , with either an AR-1 or exchangeable correlation structure for Z i . Conditional on Z it , we generate the outcome response Y it from one of two normal distributions: Y it ~ N(β (1) 0 + β (1) 1 t it , σ 2 1 ) if Z it = 1, and Y it ~ N(β (2) 0 + β (2) 1 t it , σ 2 2 ) if Z it = 0. In this study, we let the sample size n = 100 and the number of time points T = 5; the mixing proportion parameters in the logistic model are set to (η 0 , η 1 ) = (−0.3, 0.5); the components' regression parameters are set to (β (1) 0 , β (1) 1 ) = (−3, 2) and (β (2) 0 , β (2) 1 ) = (3, −2); and the variance parameters are set to (σ 2 1 , σ 2 2 ) = (1, 1). In contrast to the first simulation setting, the mixture components are regression functions of time covariates, so the component means change over time, leading to different separation levels at different times. Figure 1 illustrates four different separations of the two components at different time points in this simulation study. This motivates us to take advantage of serial correlations within subjects to improve accuracy in predicting class memberships and thus to improve the estimation of the regression parameters. In addition, we allow subjects to change group memberships over time (see Figure 2 for an illustration), which makes our approach more flexible than growth-curve mixture modeling.
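A minimal sketch of this data-generating design follows. The article does not specify the exact mechanism for inducing serial correlation among the Bernoulli indicators, so this sketch uses a Gaussian-copula construction as one plausible choice; the time-covariate scaling and the value rho = 0.5 are likewise assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, T = 100, 5
eta0, eta1 = -0.3, 0.5            # mixing-proportion parameters
beta1 = (-3.0, 2.0)               # component 1 intercept and slope
beta2 = (3.0, -2.0)               # component 2 intercept and slope
sigma = (1.0, 1.0)                # component standard deviations
rho = 0.5                         # assumed AR-1 correlation for the indicators

t = np.arange(1, T + 1) / T       # assumed time covariate on (0, 1]
pi = 1.0 / (1.0 + np.exp(-(eta0 + eta1 * t)))   # marginal mixing proportion

# Correlated Bernoulli indicators via a Gaussian copula with AR-1 dependence:
# marginal P(Z_it = 1) = pi_t, while the latent normals carry serial correlation.
R = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
L = np.linalg.cholesky(R)
U = norm.cdf(rng.standard_normal((n, T)) @ L.T)
Z = (U < pi).astype(int)          # Z_it = 1 marks membership in component 1

# Outcome drawn from the component regression selected by Z_it
mu = np.where(Z == 1, beta1[0] + beta1[1] * t, beta2[0] + beta2[1] * t)
Y = mu + np.where(Z == 1, sigma[0], sigma[1]) * rng.standard_normal((n, T))
```

Because the component means cross over time under these slopes, the separation between components varies across t, which is exactly the feature the serial correlation is meant to exploit.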
The results in Table 5 show that the proposed UEE Joint estimators gain extra efficiency, with smaller standard errors in estimating both the mixing proportion parameters and the component regression parameters, especially for the slope estimators. Table 6 indicates that our approach, using the within-subject correlation information, provides more predictive power, especially at poorly separated times. In general, when the within-subject correlation is large, our approach is able to "borrow" information from well-separated observations to enhance membership prediction for poorly separated observations by incorporating the correlations within each individual subject. Consequently, improved prediction of subgroup membership leads to better estimation of the slope parameter associated with the time effect.

REAL-DATA EXAMPLE: 2008 ELECTION DATA
In this section, we apply the proposed method to the 2007-2008 AP-Yahoo News election panel study (http://www.knowledgenetworks.com/GANP/election2008/index.html). The study was conducted by Knowledge Networks on behalf of the Associated Press and Yahoo! News (APYN) and intended to measure opinion changes from the primary elections through the presidential election in November 2008. The data consist of 4965 participants over 11 waves from November 2007 to December 2008, with nine waves before the election, one wave on election day, and the last wave after the election. The primary goal of the study is to investigate the change in participants' interest in the election over time and the important factors associated with their interest. One such factor is interest in election news.
Therefore, we choose one of the survey questions: "Question LV3: How much interest do you have in the following news about the campaign for president: a great deal, quite a bit, only some, very little, or no interest at all?" The five categories of opinion are recorded as an ordinal response variable taking values 1, 2, 3, 4, and 5, where a smaller value corresponds to a higher level of interest in the election news. To measure opinion change toward the election, we focus on the first nine waves, which occurred before the election date. There are 1200 participants who completed "Question LV3" over the first nine waves, so in the following we have n = 1200 and repeated measurement size T = 9. In this study, we intend to classify all the participants into two groups, depending on whether or not they actively follow the election, based on their responses to question "LV3" in the AP-Yahoo survey. Since the survey collects participants' responses longitudinally, it is not surprising that their interest in the election could differ across time points, resulting in changes of group membership over time. The covariates include time, gender, and race, where gender is 1 for female and 0 for male, and race consists of "white," "black," and "other," coded as dummy variables with "white" as the reference. In addition, we include an interaction term between time and gender. We formulate a two-component mixture model as follows. Let the latent variable Z it indicate whether participant i at time point t is interested in election news (Z it = 1) or not (Z it = 0). We model the mixing proportion using logistic regression to capture changes in group membership, and a univariate Gaussian distribution for each component.
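As a concrete reading of this covariate specification, the sketch below builds one row of the design matrix for the logistic mixing-proportion model. The function name and argument encoding are hypothetical; only the covariate list (intercept, time, gender, race dummies with "white" as reference, and the time-by-gender interaction) comes from the text.

```python
import numpy as np

def mixing_design_row(time, gender, race):
    """Covariate vector for the logistic mixing-proportion model:
    intercept, time, gender, black, other, and the time x gender interaction.

    gender: 1 for female, 0 for male; race: "white", "black", or "other".
    """
    black = 1.0 if race == "black" else 0.0
    other = 1.0 if race == "other" else 0.0   # "white" is the reference level
    return np.array([1.0, time, gender, black, other, time * gender])

# A female "black" participant at wave 3:
x = mixing_design_row(time=3, gender=1, race="black")
print(x)  # [1. 3. 1. 1. 0. 3.]
```

With this coding, the coefficient on the interaction term captures how the gender gap in interest changes as the election approaches, which is the effect discussed below for Table 7.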
We compare estimators from the proposed unbiased estimating equations using different working correlation structures: independent (UEE Ind ), exchangeable (UEE Exch ), AR-1 (UEE AR1 ), and unstructured (UEE Uns ), in addition to the random-effects model (RE). The joint imputation is applied in all cases. We focus on modeling the mixing proportion since we are interested in opinion changes over time. The estimates of the mixing proportion parameters, with corresponding p-values, are summarized in Table 7. The p-values are calculated based on the asymptotic normal distribution since the sample size n = 1200 is quite large. Table 7 indicates that the participants became more and more interested in the election campaign as election day approached. Male participants are more interested in election news than females on average; however, females become increasingly more interested in election news than their male counterparts as the election gets closer. In addition, the "black" group is more interested in election news than the "white" group, while the "other" group is slightly less interested than the "white" group. In total, about 43.5% of the participants changed their group membership regarding interest in election news over time.
Furthermore, Table 7 shows that the proposed estimating equation estimators using different working correlations are similar in identifying the effects of gender and of the gender-by-time interaction, which are highly significant. However, the estimators for gender and the interaction term are not significant under the random-effects model, with much larger p-values. In addition, the Pew Research Center (2010) reports that race played a significant role in the 2008 presidential election, with "black" voters showing more interest in the election than other races in general. In this survey, the "black" group makes up only about 7.5% of the participants (90 out of 1200), which makes the race effect more difficult to detect. Neither the random-effects model nor the independent estimating equations are able to detect a significant difference between the "black" group and the "white" group. However, the proposed method using the unstructured correlation identifies the "black" group as significantly different from the "white" group, with a p-value of 0.04. This implies that by accounting for the serial correlation, the proposed method can improve estimation efficiency and increase the power to detect a factor that might be missed by other approaches.

DISCUSSION
In this article, we propose an unbiased estimating equation approach for mixture modeling of longitudinal data, and illustrate how to induce correlation at the level of the latent subgroup indicator variables. The proposed method does not require each subject to belong to the same group at different times. To circumvent the complicated form of the joint likelihood of the multivariate Bernoulli variables, we propose an unbiased estimating equation approach using the first two moments to approximate the full likelihood, where the serial correlation is incorporated through a working correlation structure. The proposed estimating equations can be regarded as a projection of the optimal estimating equations obtained from the complete data, via taking the conditional expectation over the missing latent variables.
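As a schematic only (this is a generic GEE-type form, not necessarily the article's exact notation), the estimating equation for the mixing-proportion parameters η, with the unobserved indicators replaced by their conditional expectations from the E-step, can be written as

```latex
U(\eta) \;=\; \sum_{i=1}^{n} D_i^{\top} V_i^{-1}
\bigl(\widehat{Z}_i - \pi_i(\eta)\bigr) \;=\; 0,
\qquad
\widehat{Z}_{it} \;=\; E\bigl[Z_{it} \mid Y_i\bigr],
```

where $D_i = \partial \pi_i / \partial \eta^{\top}$ and $V_i$ is a working covariance matrix built from the marginal Bernoulli variances and a working correlation $R(\alpha)$. The projection interpretation corresponds to replacing $Z_i$ in the complete-data score by $\widehat{Z}_i$.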
Our numerical studies show that incorporating correlation information allows one to gain estimation efficiency for the mean regression parameters and the mixing proportion parameters compared to the independent models. The efficiency improvement is significant if the correlation from the longitudinal data is strong and the working structure is correctly specified. In addition, we can improve the classification accuracy for the boundary observations through joint imputation for the missing latent variable.
We can further generalize the current method to more than two subgroups in the population through an extended generalized estimating equation, applying the cumulative logit model for a multinomial latent variable. In addition, we can extend the univariate outcome variable to a multivariate component distribution. If the dimension of the multivariate distribution is high, we can employ variable selection by incorporating penalty terms (Xie, Pan, and Shen 2010). In this article, we mainly focus on modeling serial correlation arising from the latent variable; it would be worthwhile to consider a more complicated setting where within-subject dependence is induced by both the latent variable and the outcome variable.