Stepwise Latent Class Models for Explaining Group-Level Outcomes Using Discrete Individual-Level Predictors

Explaining group-level outcomes from individual-level predictors requires aggregating the individual-level scores to the group level and correcting the group-level estimates for measurement errors in the aggregated scores. However, for discrete variables it is not clear how to perform the aggregation and correction. It is shown how stepwise latent class analysis can be used to do this. First, a latent class model is estimated in which the scores on a discrete individual-level predictor are used to construct group-level latent classes. Second, this latent class model is used to aggregate the individual-level predictor by assigning the groups to the latent classes. Third, a group-level analysis is performed in which the aggregated measures are related to the remaining group-level variables while correcting for the measurement error in the class assignments. This stepwise approach is introduced in a multilevel mediation model with a single individual-level mediator, and compared to existing methods in a simulation study. We also show how a mediation model with multiple group-level latent variables can be used with multiple individual-level mediators and this model is applied to explain team productivity (group level) as a function of job control (individual level), job satisfaction (individual level), and enriched job design (group level).

Typically, multilevel models are used to explain an individual-level dependent variable with individual-and group-level predictors (Goldstein, 2010;Hox, Moerbeek, & van der Schoot, 2010;Raudenbush & Bryk, 2002;Snijders & Bosker, 2012). However, predicting group-level outcomes is equally important. Snijders and Bosker refer to the latter as micro-macro analysis since a micro-(or individual-) level predictor is assumed to affect a macro-(or group-) level outcome. Such a micro-macro analysis is relevant, for example, when a health psychologist is interested in whether students' attitudes (micro-level predictor) affect teachers' stress Correspondence concerning this article should be addressed to Margot Bennink, Statistical Innovations Inc., 375 Concord Avenue, Belmont, MA 02478-3084. E-mail: margot@statisticalinnovations.com Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/hmbr. (macro-level outcome). Micro-macro analysis may also be relevant in multilevel mediation research. For example, a developmental psychologist may wish to investigate whether parenting practices (macro-level predictor) affect children's study habits (micro-level mediator), which in turn affects children's achievement (micro-level mediator), which subsequently affects parental stress (macro-level outcome; Bovaird & Shaw, 2012). These micro-macro (mediation) relationships cannot be addressed within the mainstream multilevel framework (Preacher, Zyphur, & Zhang, 2010).
Traditionally, data from micro-macro designs are analyzed using aggregation, that is, the group means of the individual-level variables are assigned to the groups, and subsequently, a group-level analysis is performed using the aggregated individual scores and group-level variables. This is in fact a stepwise procedure since the aggregation (measurement model) is separated from the group-level analysis (structural model). An example from group-performance research is provided by van Veldhoven (2005), who studied the relationships between perceived human resource practices, work climate, and job stress, and prospective and retrospective financial performance. Because these financial performance indicators are available only at the business level, individual survey scores were aggregated to mean scores to perform a single-level analysis at the business level.
Although this aggregation approach appears correct, it has serious drawbacks. First, implicit in this approach is the assumption that the group members provide perfect information about their group, while in practice it is more realistic to assume that the group means contain measurement error. Croon and van Veldhoven (2007) showed that ignoring this measurement error causes a bias in the estimates of the structural model parameters. A second problem is that it is not clear how to aggregate categorical predictors to the group level. For example, for nominal variables with more than two categories, the group mean has no substantive meaning. While it might be more appropriate to use group modes instead, the problem of ignoring measurement error remains.
The current article shows how to keep working in the same stepwise manner, to separate the aggregation from the group-level analysis but also correct the group-level estimates for the bias caused by measurement error in the aggregated scores. To overcome the measurement error issue, a latent variable model for two-level data is developed, where the individual-level responses serve as indicators for a group-level latent variable, which is used as a predictor of the group-level outcome variable. This model is described for continuous data by Croon and van Veldhoven (2007). For categorical variables, a latent class model with a categorical group-level latent variable can be used (Bennink, Croon, & Vermunt, 2013). This implies that group-level latent classes differ regarding the group members' responses. For example, one class can contain groups mostly consisting of members with high scores, and another class can contain groups consisting mostly of members with low scores.
A brief description of a three-step latent class approach follows: Step (1) Estimate a latent class model where the scores on the discrete individual-level predictor are used as indicators for a group-level latent class variable (measurement model).
Step (2) Use this latent class model to aggregate the individual-level predictor to the group level by assigning the groups to the latent classes.
Step (3) Perform a group-level analysis in which the aggregated measures are related to the remaining group-level variables (structural model), while adjusting for the measurement errors in the class assignments. The latter adjustments are based on work by Bakk, Tekle, and Vermunt (2013); Bolck, Croon, and Hagenaars (2004);. This stepwise latent class approach to micro-macro analysis with discrete data makes it possible to make adjustments in the structural model without changing the measurement model on which the aggregation is based. Thus, the assigned or aggregated group-level scores remain unchanged when a group-level variable is added or removed from the group-level analysis. This would not be the case when the measurement and the structural model are estimated simultaneously in a one-step procedure.
This article is organized as follows. First, we introduce a latent class model for discrete micro-macro analysis using a mediation model containing a single individual-level mediator variable. Second, we show how this model can be applied in a stepwise manner, and thirdly, we evaluate the model in a simulation study. Fourth, this stepwise procedure is applied to a more complex mediation model with two individuallevel mediating variables, and we conclude with a stepwise application to real data where team productivity (macro-level outcome) is affected directly by enriched job design (macrolevel predictor) and indirectly by job control (micro-level mediator) and job satisfaction (micro-level mediator).

MULTILEVEL LATENT CLASS MODEL FOR MICRO-MACRO ANALYSIS
Consider a model in which group-level predictor X j is expected to affect group-level outcome Y j , directly and indirectly through the individual-level mediator variable Z ij . Subscript j is used to denote a particular group and i to denote the individuals within a group. A group-level intervention theory can be tested with this model when a characteristic of the group members influences a group-level (performance) measure both directly and indirectly. These types of models are sometimes referred to as 2-1-2 models since the effect of the level-2 independent variable (X j ) on the level-2 dependent variable (Y j ) is mediated by a level-1 variable (Z ij ). In addition, the term "bathtub model" is used here because of the shape of the conceptual model as shown in Figure 1 (Coleman, 1990). As can be seen from the conceptual model, the (latent) group-level variable ζ j is introduced to estimate the effect of the individual-level predictor Z ij on the group-level outcome. The observed scores Z ij are interpreted as exchangeable indicators for the latent group-level variable ζ j . If ζ j were observed, the analysis would break down to a single-level analysis at the group level, and a well-known (logistic) regression analysis could be performed. Since ζ j is not observed, scores have to be assigned based on the individual-level variable Z ij that is observed. Because ζ j is an unobserved (latent) variable, the assigned scores will always contain a certain amount of measurement error.
When mean or mode aggregation is used to estimate a 2-1-2 model, researchers implicitly apply this conceptual model. The group means or modes are computed based on the individual-level variable Z ij , which is a way of assigning scores to ζ j based on Z ij . Afterwards, a single-level analysis at the group level is performed. This method ignores that ζ j contains measurement error, and as explained in the introduction, it is also questionable whether using a mean or mode group score is the optimal method for aggregating FIGURE 1 Micro-macro model with group-level predictor X j , individual-level predictor Z ij , group-level latent class variable ζ j , and group-level outcome variable Y j . a discrete variable. However, by explicitly introducing the latent group variable ζ j , it becomes clear how a latent class model can solve the problems associated with mean or mode aggregation.
The solution proposed here formulates a latent class model for two-level data where the scores of the individual-level units i within group j on the micro-level predictor Z ij are treated as indicators of a discrete latent class variable defined at the group level, ζ j . In other words, a (latent) typology of groups is constructed based on the responses of the group members. This part of the model is referred to as the measurement part, with the number of indicators of the latent variable equal to the number of individuals within a group. All group members are treated equally so that judgments from one group member are considered to be no more accurate than judgments of others. Thus, the relationship between the individual-level variable and the group-level latent variable is assumed to be identical for all individuals within a group. According to the local independence assumption commonly made in latent class analysis, responses of individual group members are independent given the score of their group on the latent variable. In the structural part of the model, the group-level latent classes are related to the group-level independent variable X j and the group-level dependent variable Y j .
For the general case where all variables in the latent class model are discrete, the model can be formally described with three multinomial logit models (Agresti, 2012). Let there be P, Q, R, and S nominal response categories for Z ij , ζ j , X j , and Y j , respectively, where a particular category is denoted by p, q, r, and s. Then there are P − 1, Q − 1, and S − 1 different logit equations defined for Z ij , ζ j , and Y j , respectively, in which each category is compared to an arbitrary baseline category. Taking the first categories as the reference (baselines), the multinomial logistic equations are: and Each equation contains an intercept term (β Z p , β ζ q , and β Y s ) and an effect for each predictor variable (β Z . Equation (1) describes the measurement part of the model in which the scores of the individual-level units on the microlevel predictor Z ij are treated as exchangeable indicators of a discrete latent class variable defined at the group level, ζ j . The structural part of the model is defined in Equations (2) and (3) in which the group-level latent variable is related to the other group-level variables: ζ j is regressed on X j , and the outcome Y j is regressed on ζ j and X j .
The parameters of this latent class model can be estimated simultaneously with full information maximum likelihood estimation, and thus, this method is referred to as the onestep approach. This approach has two drawbacks. First, it is counterintuitive to aggregate the individual-level variable to the group level and relate the aggregated scores to the remaining variables simultaneously, since researchers are used to working in a stepwise matter when a manifest mean or mode is used in the group-level analysis. Second, the latent group-level variable is defined not only by the micro-level indicators but also by the remaining variables in the structural model. Thus, the full model must be re estimated when the structural part of the model is altered by, for example, adding or removing a level-2 covariate or outcome. Consequently the measurement model may change. Since the theory dictates the latent classes to predict the outcome variable, allowing these classes to be affected by the outcome variable is undesirable. These problems associated with the one-step approach are circumvented with a stepwise approach.

STEPWISE ESTIMATION
Stepwise estimation of the micro-macro latent class model proceeds as follows: • First step: Estimate the measurement model (i.e., relate the micro-level variable to the latent group-level variable). • Second step: Aggregate the micro-level predictor to the group level by assigning groups to latent classes (i.e., a typology of groups is constructed). • Third step: Estimate the structural part of the model while correcting for the classification errors that were made in the second step.
Thus, as illustrated graphically in Figure 2A, a measurement model is defined for ζ j , and the corresponding latent class model is estimated in the first step of the analysis: Here, the vector Z j contains the I j responses Z ij of the members of group j. The model parameters are the class proportions P (ζ j = q) and the conditional response probabilities P (Z ij |ζ j = q), which are parameterized typically using a logit equation. As in the one-step model, the responses of the individual group members on Z ij are assumed to be exchangeable and independent given the latent classes, but the meaning of the latent classes is now determined by the individual-level scores on the micro-level predictor and no longer by the scores at the group-level variables.
The number of latent classes of ζ j is also determined during this step. The optimal number of latent classes is determined by comparing the fit of several latent class models differing only in the number of group-level classes. Model fit can be evaluated using the Bayesian information criterion (BIC) or similar statistics. Since the decision is about the number of classes at the group level, the fit indices that incorporate the sample size should be based on the number of groups (Lukočienė, Varriale, & Vermunt, 2010). The best fitting model is the model with the lowest value on the fit statistics.
In the second step, the groups are assigned to the Q latent classes based on their scores Z j . Let W j denote the assigned class for group j. Figure 2B illustrates the process by which W j is constructed. As in a standard latent class analysis, the class assignments are obtained using the posterior class membership probabilities P (ζ j = q|Z j ) from the first step. Several types of deterministic and probabilistic assignment rules have been proposed, but in the current article, we focus on modal and proportional assignment. With modal assignment, each group is assigned to the latent class for which the posterior probability is largest. With proportional assignment, a group is assigned to each class with a probability equal to the posterior membership probability for the class concerned.
Unless the micro-level predictor is a perfect indicator for the group-level latent class variable, classification errors will be made during the assignment. To account for these classification errors (in Step 3), we use the Q × Q classification table with the entries P (W j = t|ζ j = q). Note that P (W j = t|ζ j = q) is the conditional probability that a group belonging to class q is assigned to class t of W j (t = 1, · · · Q). The off-diagonals of this table represent classification error probabilities. In Appendix A, we show how these probabilities can be obtained from the latent class model parameters.
At the end of the second step, there will be Q types of groups, and each group will be categorized (with known classification error) as a particular group type based on the Z ij scores of the individuals within the group.
In the third step, the structural model is estimated, that is, the assigned scores W j are related to the other group-level variables, in this case X j and Y j . As shown by Bolck et al. (2004) in single-level latent class models, biases are caused by the classification errors introduced in the second step, that is, by the fact that we have W j instead of ζ j . Bolck et al. also showed how to prevent bias by adjusting for the classification errors. The key to this adjustment is the following relationship between P (Y j , W j |X j ) and P (Y j , ζ j |X j ):

666
BENNINK, CROON, VERMUNT This equation shows that a model for P (Y j , ζ j |X j ) can be obtained by estimating a model for P (Y j , W j |X j ) and correcting for the probabilities P (W j |ζ j = q), which were computed in Step 2. The resulting model is illustrated graphically in Figure 2C.
The unknown probabilities P (ζ j = q|X j ) and P (Y j |X j , ζ j = q) in Equation (5) can be estimated with maximum likelihood (ML) if the error probabilities P (W j |ζ j = q) are treated as known. In this analysis, W j is used as a single indicator for ζ j . The probabilities P (ζ j = q|X j ) and P (Y j |X j , ζ j = q) and the corresponding logit coefficients (see Equation [2] and [3]) are freely estimated, while P (W j |ζ j = q) is treated as known and thus fixed. We refer to this approach as the ML three-step method (Bakk et al., 2013;Vermunt, 2010). An alternative proposed by Bolck et al.-and that we therefore call the Bolck-Croon-Hagenaars three-step approach (BCH)-involves reformulating the problem into a weighted analysis (see also Vermunt, 2010). More specifically, by weighting the data points by the inverse of the classification probabilities P (W j |ζ j ), we adjust for the fact that W j contains classification errors. The reweighted data can then be used as observed data to estimate the structural parameters of interest.
Previous research on three-step latent class analysis has shown that the bias-adjusted stepwise approaches work very well in situations encountered in practice. However, this approach may overcorrect when a small sample size is combined with a very large proportion of classification errors (Bakk et al., 2013;Vermunt, 2010). In such situations, the maximum likelihood estimates of the first-step latent class model will tend to yield classes that are more different than they truly are (Galindo-Garre & Vermunt, 2006). Consequently, classification error is underestimated, and the structural parameters are not adjusted to a sufficient degree.
The proportion of classification errors is reflected in the entropy-based R-square (Vermunt & Magidson, 2013b), a common measure for class separation: How well class membership is predicted given the observed data depends on Entropy(ζ ), the total error when ζ is predicted without using information on Z and Y, and on Entropy(ζ |Z, Y ), the prediction error if all observed information from the cases is used. Formally, Entropy(ζ ) and Entropy(ζ |Z, Y ) are described by: and Thus, Entropy(ζ |Z, Y ) is the (weighted) average of the case-specific errors. The proportion of classification errors and the class separation are mainly a function of the number of indicator variables (in micro-macro analysis, this is the number of individuals within each group), the number of classes, and the response probabilities for the most likely response. Since all group members are assumed to be exchangeable, the response probabilities for the most likely response are the same for individuals within the same group, which makes the measurement model of a micro-macro model more parsimonious than a regular latent class analysis. In the following simulation study, we investigated under which conditions the class separation was large enough to perform a stepwise analysis for the current model. Thus, Entropy(ζ |Z, Y ) is the (weighted) average of the casespecific errors. The proportion of classification errors and the class separation are mainly a function of the number of indicator variables (in micro-macro analysis, this is the number of individuals within each group), the number of classes, and the response probabilities for the most likely response. Since all group members are assumed to be exchangeable, the response probabilities for the most likely response are identical for all individuals within the same group, which makes the measurement model of a micro-macro model more parsimonious than a standard latent class analysis. In the following simulation study, we investigate the conditions under which the class separation is large enough to perform a stepwise analysis for the current model.

SIMULATION STUDY
In this section, we begin by evaluating the performance of the stepwise approaches and compare it to the performance of manifest aggregation with a group mode, one-step latent aggregation, and stepwise latent aggregation without correcting for measurement error. Then we attempt to find the lower boundary for the class separation at which the stepwise approaches yield correct results. All analyses were carried out with Latent GOLD 5.0 (Vermunt & Magidson, 2013a).
In order to keep the setup simple, since this was the first simulation study performed in this field of research, data were generated according to the model shown in Figure 1 with all variables being dichotomous. Later discrete variables with more than two categories are used in the empirical data example. A typical setting was created by fixing the betweengroup effects from the structural part of the model to .4 on a logistic scale using effect coding (β Y X = β ζ X = β Y ζ = .4). 1 The number of groups was fixed at 100 (J = 100) and the number of individuals within a group at 10 (n j = 10), which are the minimum sample sizes for this type of analysis (Bennink et al., 2013). It was expected that larger samples would provide slightly better results. The relationship between the scores on the micro-level variable and the latent variable varied from weak to strong, again using effect coding (β Zζ = .2, .4, .6, or .8). This corresponds to class separations R 2 entr = . 24, .64, .88, and .97 where R 2 entr denotes the entropy-based R-square. In a single-level latent class analysis, R 2 entr = .24 or .97 would be very unlikely. In the current situation, these values refer to the situation in which individual scores are unreliable measures of the overall measure and to the situation in which the number of individuals within the groups yield high entropy. For each of the four conditions, 500 data sets were generated and analyzed with manifest mode aggregation, the latent variable one-step approach, and the latent variable three-step approaches. The threestep procedures were applied with modal and proportional assignment. 2 The estimated structural parameters under each condition are averaged, and one-sample t tests of whether they differ from the true value .4 are summarized in Table 1. As expected, the one-step and the corrected three-step procedures recovered the true values of the estimates in most conditions. Where indicator quality was poor and estimates of class separation being overestimated (β Zζ = .2), both methods performed less well. The standard deviations of the estimates decrease as the quality of the indicators increases. There are only two conditions in which the BCH method with proportional assignment performs better than the other corrected stepwise approaches. BCH and ML, and modal and proportional assignment, provide very similar results in the other conditions. 2 In the one-step and the three-step ML methods, we used weakly informative priors for the model probabilities to prevent boundary estimates for the logit parameters. Because this does not work in the three-step BCH method, 82 data sets with the modal assignment rule and 73 data sets with the proportional assignment rule did not converge and were excluded from further analysis.
The other two methods, mode aggregation and uncorrected stepwise estimation, recovered the true values of the parameters only under the condition of good indicator quality. When the uncorrected three-step method was used, the estimates of the group-level relationships in which ζ j was involved were underestimated in line with Bolck et al. (2004), regardless of whether the modal or proportional assignment rule was used. Because the indirect effect was underestimated, the direct effect compensates by being overestimated. The discrepancy between the estimated and true value decreased as the strength of the relationship between Z ij and ζ j increased. When Z ij was aggregated to the group level using the manifest group mode, the estimates of the between-group parameters were not correct in most conditions, the estimates improving as the indicator quality improved. In line with previous research (Bennink et al., 2013), the parameter estimates were correct only when the strength of the relationship between the individual-level predictor and the corresponding group-level variable was very good (β Zζ = .8). The standard deviations of the estimates obtained with mode aggregation and the uncorrected three-step method were stable over the true quality of the indicators.
As shown above, the corrected three-step approaches performed less well with poor indicators, this situation corresponding to extremely low class separation of .24. 3 To Note. R 2 entr : entropy-based R-square; β Zζ : regression parameter of Z regressed on ζ .
illustrate why this occurs, the true and the estimated proportion of classification errors are shown in Table 2 for each condition. As can be seen, the proportion of classification errors is underestimated in the condition in which R 2 entr = .24 resulting in third-step parameters that are not sufficiently corrected. To explore how large the amount of class separation needs to be to get better results, we added indicators with β Zζ values of .25, .30, and .35, corresponding to R 2 entr values of .35, .45, and .55. As can be seen, with a class separation of .45, the estimated proportion of classification errors gets close to the actual proportion. This was true under both modal and proportional assignment. It can also be seen that the variability (the standard deviation) of the estimated proportion of classification error decreased as the true quality of the indicators increased.
To conclude, the adjusted three-step approaches provide estimates that are as good as the estimates from a one-step analysis, so long as the class separation is sufficient (R 2 entr = .45). It does not matter whether a BCH or ML correction is used or whether modal or proportional assignment is used. When the relationship between the microlevel predictor and the group-level variable is very strong, all methods provide correct estimates, but this is not a realistic situation in practice. When the relationship between the micro-level predictor and the group-level variable is moderate, a latent variable should be used for the aggregation since the manifest mode aggregation provides incorrect estimates. When the relationship between the micro-level predictor and the group-level variable is extremely weak, all methods may provide incorrect estimates for the group-level sample size investigated in the current simulation study.

MULTIPLE MACRO-LEVEL LATENT VARIABLES
In this section we extend the simple model discussed thus far to a model with two micro-level predictors, Z 1ij and Z 2ij , and thus, two latent variables, 4 ζ 1j and ζ 2j (see Figure 3). The multinomial logistic equations for this model are: and Where X j , Z 1ij , Z 2ij , ζ 1j ζ 2j , and Y j contain, respectively, R, M, P W, Q, and S categories and a particular category is denoted by r, m, p, w, q, and s. q are associations. For continuous individual-level predictors, it is common practice to include their between-group and within-group correlation in the model. In the case of discrete variables, this concept is translated by incorporating an association between Z 1ij and Z 2ij (a Z 1 Z 2 ), and between ζ 1j and ζ 2j (a ζ 1 ζ 2 ). Thus, Z 1ij and Z 2ij are not expected to be independent given the latent group-level variables. Instead, while keeping the latent group-level variables constant, there may still be some residual within-group association between the micro-level predictors. At the between level, it is also expected that there is some residual association between ζ 1j and ζ 2j , after controlling for X j .
FIGURE 3 Micro-macro model with group-level predictor X j , individual-level predictors Z 1ij and Z 2ij , group-level latent class variables ζ 1j and ζ 2j , and group-level outcome Y j .
Although it is straightforward how to estimate this model in a single step, under stepwise estimation, the first-step model(s) must be defined. One alternative is to formulate a separate measurement model for ζ 1j and ζ 2j as described in Equation (4). By formulating two measurement models, the meaning of the latent classes is influenced only by the individual-level scores on the corresponding micro-level predictors. The number of latent classes for ζ 1j and ζ 2j can then be chosen without being influenced by the variables from the structural part of the model, but the eventual residual within-group association among Z 1ij and Z 2ij is ignored.
Another alternative is then to formulate a single simultaneous measurement model for the two latent variables, including the residual within-group association between Z 1ij and Z 2ij : By doing such, the meaning of the latent classes is still not influenced by the group-level variables from the structural part of the model, but the number of latent classes for ζ 1j and ζ 2j needs to be determined simultaneously. This decision can again be based on fit indices.
Next we examined whether ignoring the residual withingroup association among Z 1ij and Z 2ij affected the estimates of the between-level parameters. Since sampling fluctuation was not of primary interest, one very large data set (J = 10,000, I j = 100) was generated in each investigated condition. If it turned out that ignoring the within-group association provided incorrect estimates in these very large samples, the estimates in smaller samples could be expected to be even worse because of the sampling fluctuation.
The population models varied in the strength of the relationship between the latent variables and the corresponding micro-level predictors (indicators) and the strength of the within-group association among the micro-level predictors (within-group association): • indicators (β Z 1 ζ 1 and β Z 2 ζ 2 ): 0.2, 0.4, or 0.6 • within-group association (a Z 1 Z 2 ): 0, 0.2, 0.4, or 0.6.
For the sake of consistency, the other conditions were maintained from the previous simulation study. Therefore, all observed and latent variables were dichotomous, and effect coding was used to fix the between-group effects from the structural part of the logistic model (β ζ 1 X = β ζ 2 X = a ζ 1 ζ 2 = β Y ζ 1 = β Y ζ 2 = β Y X = .4). Because of the large number of individuals within a group, the R 2 entr value was very high (>.90) in all conditions.
The generated data sets were analyzed with the one-step and the bias-adjusted three-step approaches. In the one-step  No Yes No Yes No Yes No Yes No .0 .40 .40 .39 .40 .40 .40 .39 .40 .40 .40 .2 .40 .44 .40 .43 .40 .44 .40 .43 .40 .44 .4 .40 .50 .40 .48 .40 .50 .40 .48 .40 .50 .6 .41 .60 .42 .56 .43 .59 .42 .56 .43 .59 Note. BCH: Bolck-Croon-Hagenaars three-step method; ML: maximum likelihood three-step method; modal: modal assignment rule; prop: proportional assignment rule; a Z 1 Z 2 : association among Z 1 and Z 2 ; yes: a Z 1 Z 2 is incorporated in the measurement model; no: a Z 1 Z 2 is not incorporated in the measurement model. J = 10,000, n j = 100, and true value a ζ 1 ζ 2 = 0.4. procedure, we used the correct specification containing the residual within-group association and the incorrect model excluding this association. In the first step of the BCH and ML bias-adjusted stepwise procedures, we used either a single joint measurement model with the association between Z 1ij and Z 2ij or two separate measurement models that ignored this association. The modal and proportional assignment rules were used to assign the groups to latent classes in the second step of the analysis. The manifest mode aggregation and uncorrected three-step procedures were not used because the previous simulation study showed that these methods had failed in a simpler model. Table 3 presents the results for the conditions under which a Z 1 Z 2 varied and β Z 1 ζ 1 and β Z 2 ζ 2 were fixed to .4. The results reported concern the between-level parameter that is most strongly affected by ignoring the within-group association, that is, the association between ζ j 1 and ζ j 2 (a ζ 1 ζ 2 ), which has a true value of .4. As can be seen, when the within-group association among Z 1ij and Z 2ij is correctly modeled, the one-step method and the bias-adjusted three-step methods provide estimates that are close to the true values. However, when this within-group association is not taken into account, the between-group association estimate is incorrect for all estimation procedures. The larger the ignored value, the more the between-group association among ζ 1j and ζ 2j is overestimated. Note that the estimates obtained with the one-step method and the various types of bias-adjusted three-step estimates are all very similar. The estimates of the remaining between-group effects deviate less from the true value than the estimates of the between-group association. Any encountered discrepancy is an underestimation that is probably due to compensating the overestimation of the between-group association among ζ 1j and ζ 2j . Table 4 shows how the bias caused by ignoring the withingroup association among the micro-level predictors interacts with the quality of the micro-level scores as indicators for the group-level latent variables. With poor indicators and a strong ignored within-group association (β Z 1 ζ 1 = β Z 2 ζ 1 = 0.2 and a Z 1 Z 2 = 0.6 ), the estimates of the between-group association among ζ 1j and ζ 2j clearly deviate from the true value, while with good indicators and a small ignored withingroup association (β Z 1 ζ 1 = β Z 2 ζ 1 = 0.6 and a Z 1 Z 2 = 0.2 ) the estimates are still close to the true value.
All together, these results show that the bias-adjusted three-step procedures can be used for this micro-macro model just as well as the one-step procedure, so long as the withingroup association among the micro-level predictors is modeled in the first step. The within-group association can be ignored only when the micro-level scores are very good indicators of the group-level latent variables and the within-group association is small.

DATA EXAMPLE
The stepwise micro-macro model with two micro-level predictors was then applied to a real data example. Since the BCH and ML correction procedures and the modal and proportional assignment rules provided similar results in the simulation study, only one method was applied, namely, the ML three-step approach with modal assignment. The inspiration for the current data example came from a paper by Croon, van Veldhoven, Peccei, and Wood (2014). These authors showed the relevance of bathtub multilevel mediation models, such as the one discussed in this paper, for research on human resource management (more specifically on job design) and organizational performance by using an example from the Workplace Employment Relations Survey 2004 (WERS2004). This is a publicly available large-scale data set from the United Kingdom with representative sampling at the employee and the workplace level. More information about the survey can be found at (www.wers2004.info). Croon et al. investigated to what extent the relationship between the adoption of enriched job designs at the level of the workplace and workplace labor productivity was mediated at the individual level by employees' experienced sense of job control and job satisfaction. For the current application, all measures from Croon et al., enriched job design, job control, and job satisfaction, were categorized into variables with three levels of approximately equal size (low, medium, and high), while labor productivity was transformed into a variable with two approximately equal-sized categories (low and high). To keep the discussion simple, the current illustration ignores that the variables were originally measured with multiple items. The analyses were performed on 18,505 employees nested within 1,455 workplaces. The Latent GOLD 5.0 syntax (Vermunt & Magidson, 2013a) used in this analysis is given in online supplementary material.
In the first step of the procedure, a measurement model was formulated in which the individual-level scores on job control (Z 1ij ) and job satisfaction (Z 2ij ) were used as exchangeable indicators for two latent variables at the group Note. BCH: Bolck-Croon-Hagenaars three-step method; ML: maximum likelihood three-step method; modal: modal assignment rule; prop: proportional assignment rule; β νμ : regression parameter of ν regressed on μ; a ζ 1 ζ 2 : association among ζ 1 and ζ 2 . J = 10,000, n j = 100, and true value between-group effects = .4.
level (ζ 1j and ζ 2j ). In addition, a residual within-group association (a Z 1 Z 2 ) among the individual-level measures on job control and job satisfaction was included in the model. The optimal number of latent classes for the two latent variables was determined simultaneously by comparing the fit indices of models with all possible combinations of one to five classes for each latent variable. Because the decision was about the number of classes at the group level, the fit indices that incorporate the sample size were based on the number of groups (Lukočienė et al., 2010). The BIC, Akaike information criterion 3 (AIC3), consistent AIC (CAIC), and sample-size adjusted BIC (SABIC) were lowest for the model with three latent classes for job control and job satisfaction.
In contrast, the AIC was lowest when both latent variables contained four latent classes. Since most fit indices pointed toward this direction and the individual-level observed variables had three response categories, the three-class solutions for both group-level latent variables were retained. Table 5 displays the class sizes and the class-specific response probabilities for both latent variables. This information was used to interpret the latent classes. Table 5A shows that the first latent variable has a class containing 35% of the groups, and in these workplaces, most employees score low (p = .70) on job control. The second latent class contains 56% of the groups, and in those workplaces, most employees score medium on job control (p = .85). The final class of workplaces contains 9% of groups in which, similar to the ones from the second latent class, most employees score medium (p = .66), but in addition about one third of the employees score high (p = .33) on job control. From Table 5B, it can be seen that the second latent variable has a class containing 61% of the groups, and these workplaces have mostly employees who score low on job satisfaction (p = .88). The second class contains 23% of the workplaces with most employees scoring medium (p = .62) on job satisfaction. The last class contains 16% of the workplaces in which employees mostly score high on job satisfaction. The groups from the third class also have about a third of their employees scoring low (p = .34) on job satisfaction (p = .69). Based on these posterior probabilities, all workplaces were assigned to a particular latent class using a modal assignment rule. The class separation was .39 for the latent variable job control and .43 for job satisfaction, which is relatively low for a three-step analysis.
In the third step, the assigned latent class variables were related to the group-level measures of enriched job design (X j ) and labor productivity (Y j ) in a latent class model. For this latent class model, the assigned class membership scores from the second step were used as single indicators with known measurement errors, these errors being fixed at the classification errors from the second step. The group-level parameters obtained with the ML bias-adjusted three-step approach are  The overall effect of ζ 1j (job control) on labor productivity is not significant (χ 2 = 1.37, df = 2, p = 0.51), while the overall effects of ζ 2j (job satisfaction) and enriched job design are significant (χ 2 = 22.87, df = 2, p < 0.001 and χ 2 = 4.97, df = 2, p = 0.08), although the latter only at a significance level of 10%. The category-specific parameters are presented in Table 6D. All categories except β Y 2 ζ 2 3 score higher than the reference group.
To conclude, enriched job design has a significant positive direct effect on labor productivity. The two parts of the indirect effect of enriched job design (macro-level) on labor productivity (macro-level) through job satisfaction (microlevel) are also significant, while only the first part of the indirect path through job control (micro-level) is significant and the second part is not significant.

DISCUSSION
A stepwise multilevel latent class model was proposed to predict group-level outcomes with discrete individual-and group-level predictors. In the first step, a latent class model is estimated in which the individual-level predictor is used as an indicator for a group-level latent class variable (measurement model). In the second step, the individual-level predictor is aggregated to the group level based on the latent class model from the first step. This has two important advantages. First, the measurement error in the aggregated scores is known. Second, it is an elegant way of aggregating a discrete variable since it was not clear how to do this with a manifest mean or mode. Next, the aggregated scores are related to the remaining group-level variables while correcting for the known measurement error (structural model). The results of the bias-adjusted stepwise procedures are similar to the parameters obtained when the measurement and structural model are estimated simultaneously in a one-step analysis. Since researchers are used to working in a stepwise manner when they aggregate using a manifest mode, they can continue to work with a familiar method while accounting for measurement error in the aggregated scores.
Two issues are important. First, the results from the simulation studies suggest that when the model contains multiple macro-level latent variables, the within-group association among the micro-level indicators must be included in the first step of the stepwise procedure, unless the within-group association is small and the micro-level scores are very good indicators of the group-level latent variables. Conceptually, it would suit the philosophy of stepwise estimation better to formulate two separate measurement models in the first step, one for each latent variable, but the simulation study showed that ignoring this within-group association provides incorrect estimates of the between-group association among the latent variables. It is unrealistic to assume that there is no residual within-group association among the predictors since that would imply that all associations among the micro-level predictors can be explained through the group-level latent variables. Second, class separation must be sufficiently high (R 2 entr = .45 ) since the results with poorly separated classes are correct only with large sample sizes. However, this is not a problem in practice since it is of little use to aggregate a variable that is only weakly (or not at all) related to the group level. As with any simulation study, the simulations on which these conclusions are based suffered from various limitations. Most important is that not all conditions were varied. For example, the sample sizes at both levels, the strength of the between-group effects, and the number of categories of the variables were fixed. Furthermore, the simulation study cannot be generalized to other (more complex) models without further research.
The stepwise ML procedure was also applied to a real data example in which labor productivity (group-level outcome) was explained by enriched job design (group-level predictor), job control (individual-level predictor), and job satisfaction (individual-level predictor). All variables from the example were constructed from multiple items but were used as single variables in the model. For the continuous version of the current application, Croon et al. (2014) found that the factor analytic model was better equipped to detect bathtub-type links than a model using scale scores. A good direction for further research would be to see whether this is also the case for discrete variables, and thus look at micro-macro models in which the group-level outcome and predictor, as well as the micro-level predictors are latent variables measured with multiple items.
In the current article, the focus was the parameter estimates and not the standard errors of these parameters. Theory suggests that when fixed-parameter estimates, obtained in the first step of the stepwise procedure, are plugged into the likelihood function, the effect of their sampling variability on the uncertainty about the estimates in the third step should be accounted for (Murphy & Topel, 1985). Fortunately, an easily accessible correction method for the standard errors is available from Bakk, Oberski, and Vermunt (2014). In situations with a large sample size, such as the current simulation study and data example, correction of the standard errors is not needed because the uncertainty about the estimates from the first step is very small (Bakk et al., 2014).
Finally, future research should investigate the optimal method for choosing the number of group-level classes. We know from Lukočienė et al. (2010) that decisions about group-level latent classes should be based on the BIC computed based on the number of groups. The simulation study on which these conclusions are based are about the simultaneous decision of individual-and group-level classes in a multilevel latent class analysis. In the current application, only a decision about the number of group-level classes must be made. The conclusions might change in a stepwise situation compared to a one-step situation.
Our final conclusion is that a stepwise estimation procedure that separates the estimation of the measurement model from the structural model can be applied to micro-macro (mediation) models with one micro-level mediator as long as the structural model parameters are corrected for the known measurement errors of the measurement model. In a model with two micro-level mediating variables, the within-group association among the micro-level mediators must be included in the measurement error, unless the within-group association is small and the micro-level scores are very good indicators of the group-level latent variables.

ARTICLE INFORMATION
Conflict of Interest Disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described. Ethical Principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data. Funding: This work was supported by Grant NWO 400-09-018 from the Netherlands Organization for Scientific Research. Role of the Funders/Sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Technical Details on the Computation of P(W = t|ζ = q)
This appendix shows how P (W = t|ζ = q) is computed using the classification information from the second step of the stepwise analysis. A more detailed description is provided by Bakk et al. (2013) and Vermunt (2010).
Let P (ζ j = q|Z j ) denote the posterior class membership probability for group j and P (W j = t|Z j ) denote the probability by which a group is assumed to belong to class t of W j given the applied assignment rule. Using the modal class assignment rule, also called modal a posterior assignment (MAP), groups are assigned to that category of W j for which P (ζ j = q|Z j ) is largest: Using the proportional assignment rule, each group is assumed to belong to a particular latent class with a probability equal to the posterior membership probability for the class concerned. Therefore, P (W j = t|Z j ) = P (ζ j = t|Z j ) .
The probability of being assigned to class t conditional on belonging to the true class q, P (W = t|ζ = q), is theoretically defined as: P (W = t|ζ = q) = Z P (Z)P (ζ = q|Z)P (W = t|Z) P (ζ = q) .
(15) Note that the sum is taken over all possible patterns of Z. Because the number of possible patterns can be very large, it is more practical to take the sum over the data pattern of the groups present in the available data sets, which yields: As shown by Vermunt (2010), when the specified model is correct, the theoretical and empirical definition of P (W = t|ζ = q) provide very similar results.