A Comparison of Imputation Strategies for Ordinal Missing Data on Likert Scale Variables

This article compares a variety of imputation strategies for ordinal missing data on Likert scale variables (number of categories = 2, 3, 5, or 7) in recovering reliability coefficients, mean scale scores, and regression coefficients of predicting one scale score from another. The examined strategies include imputing using normal data models with naïve rounding or without rounding, using latent variable models, and using categorical data models such as discriminant analysis and binary logistic regression (for dichotomous data only), and multinomial and proportional odds logistic regression (for polytomous data only). The results suggest that both the normal model approach without rounding and the latent variable model approach perform well for either dichotomous or polytomous data regardless of sample size, missing data proportion, and asymmetry of item distributions. The discriminant analysis approach also performs well for dichotomous data. Naïvely rounding normal imputations or using logistic regression models to impute ordinal data are not recommended, as they can potentially lead to substantial bias in all or some of the parameters.

Ordinal categorical data such as those measured using Likert scales (e.g., 3- or 5-point Likert scales) are very common in the social and behavioral sciences. When there are missing data on the ordinal variables due to nonresponse or attrition, the missing data need to be handled appropriately so that accurate parameter estimates and valid statistical inference on population parameters can be obtained. One popular way to handle missing data is to impute them (i.e., replace the missing data with plausible values). However, most of the missing data literature in the social and behavioral sciences has focused on imputation methods for normally distributed data; methods for categorical variables have not been systematically investigated. We focus on imputation in particular because recent software advances provide researchers with more than one option for handling discrete response scales.
Correspondence concerning this article should be addressed to Wei Wu, Psychology Department, University of Kansas, 1415 Jayhawk Blvd., Lawrence, KS 66044. E-mail: wwei@ku.edu. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/hmbr.

The appropriate methods for handling missing data are contingent on the cause of missingness (the missing data mechanism). Rubin (1976) outlined three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). MCAR indicates a scenario where the probability of missing a score is related neither to any observed variable nor to the missing value itself. MAR indicates a scenario where the probability of missing a score depends on an observed variable but not on the missing value itself; once the observed variable is controlled for, missingness has no association with the unobserved scores on the incomplete variable. MNAR indicates a scenario where the probability of missing a score depends on the missing value itself even with observed variables controlled.
When the missing data mechanism is MCAR or MAR (referred to as ignorable missingness), multiple imputation (MI) is an appropriate method to impute the missing data (Little & Rubin, 2002; Rubin, 1987; Sinharay, Stern, & Russell, 2001). MI generates multiple copies of independently imputed data sets. To account for uncertainty in estimation due to missingness, missing data are predicted from observed data based on a different set of parameter values in each imputed data set (Schafer & Olsen, 1998; Rubin, 1987). After imputation, the target analysis (e.g., a regression analysis) is applied to each imputed data set. The point and standard error estimates from these analyses are then pooled to generate the final results (Rubin, 1987).
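The impute-analyze-pool cycle described above can be sketched as follows. This is a minimal illustration of Rubin's (1987) pooling rules in Python/NumPy; the function name and the toy numbers are ours, not taken from the article.

```python
import numpy as np

def pool_estimates(estimates, std_errors):
    """Pool point and standard error estimates from m imputed data sets
    using Rubin's (1987) rules."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    m = len(estimates)
    pooled_point = estimates.mean()                      # average of the m point estimates
    within_var = (std_errors ** 2).mean()                # average within-imputation variance
    between_var = estimates.var(ddof=1)                  # between-imputation variance
    total_var = within_var + (1 + 1 / m) * between_var   # total sampling variance
    return pooled_point, np.sqrt(total_var)

# Hypothetical regression coefficients and SEs from m = 5 imputed data sets
point, se = pool_estimates([0.52, 0.48, 0.50, 0.55, 0.45],
                           [0.10, 0.11, 0.10, 0.12, 0.10])
```

Note that the pooled standard error exceeds the average within-imputation standard error; the between-imputation component is what propagates the uncertainty due to missingness into the final inference.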
Three strategies are generally used to impute missing ordinal data (described in more detail below): treat the data as normally distributed and apply well-established normal data imputation models (e.g., linear regression models and saturated models), generate imputations using a true categorical data model (e.g., logistic imputation models), or generate imputations based on a latent variable model. Given the existence of multiple strategies, it is beneficial for researchers to know which strategy or strategies are appropriate given how the ordinal data are to be analyzed following imputation. The current study aims to address this question in a commonly encountered context where the ordinal variables are items in Likert scales and will be aggregated (summed or averaged) into scale scores for further analysis. In this context, MI has advantages over other popular missing data techniques such as full information maximum likelihood (FIML). MI can impute missing data at the item level, whereas FIML relies on the analysis model, in which the item scores are not directly included (Gottschall, West, & Enders, 2012). As a result, the observed information in the data set is fully utilized by MI but not by FIML. In addition, if the missingness on one item is determined by another, MI minimizes the bias due to the missingness by borrowing information from the item that predicts the missing data. Note that in practice, researchers often deal with missing item scores within a scale by simply averaging over the remaining items. This is not recommended because it produces biased parameter estimates for analyses of scale scores and biased reliability measures (Enders, 2003; Enders, 2010, p. 51).
The rest of the article is organized as follows. We start with an introduction of the commonly used algorithms in MI followed by the strategies to impute missing ordinal data. We then review the past research on the available strategies. After the literature review, a simulation study is conducted to evaluate the strategies in the target context. An empirical example is also provided to illustrate the strategies. This article is concluded by discussing the implications and limitations of the simulation study as well as potential directions for future research.

IMPUTATION ALGORITHMS
Multiple algorithms have been developed to generate multiply imputed data. The most popular one is the Markov chain Monte Carlo (MCMC; Schafer, 1997) algorithm. Briefly speaking, MCMC facilitates MI by drawing values of the imputation parameters (e.g., the mean vector and covariance matrix if the imputation model is saturated, or the intercept and regression coefficients if the imputation model is a logistic regression model) from their posterior distribution(s) through an iterative procedure. Note that we use "MCMC algorithm" as a general term for a family of algorithms commonly used in MI, such as the data augmentation algorithm and the Gibbs sampler; these algorithms are special cases of an MCMC algorithm.
MCMC allows missing data on multiple variables to be imputed from either a joint distribution or a series of univariate distributions. The joint distribution approach imputes missing data on multiple variables simultaneously, which is particularly suitable for normal data. In contrast, the univariate distribution approach imputes missing data on a variable-by-variable basis. The univariate approach is also called chained equation or sequential regression imputation because the variables are imputed separately and sequentially at each iteration. It is also referred to as fully conditional imputation because the prediction of missing data on one variable is conditional on the current values of all of the other variables at a specific iteration (Raghunathan, Lepkowski, van Hoewyk, & Solenberger, 2001; van Buuren, 2007, 2012). One major advantage of the univariate approach is that the imputation model can be easily tailored to the nature of each variable. It is thus very flexible in accommodating mixed types of missing data (e.g., continuous, categorical, and semicontinuous). For instance, one variable can be imputed using a categorical data model while another can be imputed using a normal data model. In the following, we illustrate the MCMC algorithm with joint and chained equation imputation in more detail, as this will help readers understand how imputation is actually done for the strategies considered in the article.

Joint Distribution Approach
For joint imputation, MCMC starts with initial values for the imputation parameters and then iterates between two steps, which Schafer (1997) describes as the posterior step (P step) and the imputation step (I step). Let $y_{obs}$ and $y_{miss}$ represent the observed and missing data on a vector of variables, respectively; the two steps at a specific iteration, say the tth iteration, can be described as follows.

P step: Draw a random set of imputation parameter values $\theta^{(t)}$ from the posterior distribution of the imputation parameters $\theta$, which typically contains the elements of a mean vector and covariance matrix (e.g., when the variables are normally distributed):

$$\theta^{(t)} \sim p(\theta \mid y_{obs}, y_{miss}^{(t-1)}) \quad (1)$$

Here $y_{miss}^{(t-1)}$ represents the missing data predicted at the (t−1)th iteration.

I step: Predict the missing data from the observed data conditional on $\theta^{(t)}$. This is equivalent to drawing random values from the predictive probability distribution of $y_{miss}$, which is typically multivariate normal:

$$y_{miss}^{(t)} \sim p(y_{miss} \mid y_{obs}, \theta^{(t)}) \quad (2)$$

The predicted missing data are then carried to the P step of the next iteration to update the posterior distribution of the imputation parameters.
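As a concrete, deliberately simplified sketch of the two steps, the following code imputes one of two bivariate normal variables with missing values. For simplicity, the P step here draws the mean vector and covariance matrix from a nonparametric bootstrap of the current completed data rather than from the exact posterior; all names and settings are ours, not from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate bivariate normal data with ~30% MCAR missingness on y2
n = 300
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=n)
miss = rng.random(n) < 0.3
y1, y2 = data[:, 0], data[:, 1].copy()
y2[miss] = np.nan

y2[miss] = rng.choice(y2[~miss], size=miss.sum())   # crude starting imputation

for t in range(200):   # MCMC iterations
    # P step (simplified): draw mu and Sigma via a bootstrap of the completed data
    idx = rng.integers(0, n, size=n)
    boot = np.column_stack([y1, y2])[idx]
    mu = boot.mean(axis=0)
    sigma = np.cov(boot, rowvar=False)
    # I step: draw y2_miss from the conditional normal p(y2 | y1, theta)
    slope = sigma[0, 1] / sigma[0, 0]
    cond_mean = mu[1] + slope * (y1[miss] - mu[0])
    cond_var = sigma[1, 1] - slope * sigma[0, 1]
    y2[miss] = cond_mean + rng.standard_normal(miss.sum()) * np.sqrt(cond_var)
```

After convergence, the completed data at a given iteration would be saved as one imputed data set, and the chain continued to produce the next one.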

Univariate Distribution Approach
In the univariate distribution approach, the two steps described above apply to every variable to be imputed. Suppose there are p variables to be imputed in the order $y_1, \ldots, y_p$; each MCMC iteration then contains 2p steps. Note that the order is arbitrary in this illustration; in practice, it can be determined based on the amount of missing data on each variable. The two steps for $y_1$ at iteration t can be described as follows.

P step: Draw a random set of parameter values $\theta_1^{(t)}$ from the posterior distribution of the imputation parameters, conditional on the values of the other variables (imputed or complete) from the previous iteration t−1:

$$\theta_1^{(t)} \sim p(\theta_1 \mid y_{1(obs)}, y_2^{(t-1)}, \ldots, y_p^{(t-1)}) \quad (3)$$

I step: Impute the missing data on $y_1$ conditional on all of the other variables and $\theta_1^{(t)}$. Again, this is equivalent to drawing random values from the predictive probability distribution of $y_{1(miss)}$:

$$y_{1(miss)}^{(t)} \sim p(y_{1(miss)} \mid y_{1(obs)}, y_2^{(t-1)}, \ldots, y_p^{(t-1)}, \theta_1^{(t)}) \quad (4)$$

The imputed values $y_{1(miss)}^{(t)}$ are then used to update the posterior distribution of the imputation parameters for the next variable $y_2$ at iteration t:

$$\theta_2^{(t)} \sim p(\theta_2 \mid y_1^{(t)}, y_{2(obs)}, y_3^{(t-1)}, \ldots, y_p^{(t-1)}) \quad (5)$$

The missing data on $y_2$ are then predicted by a random draw from the predictive probability distribution of $y_{2(miss)}$, conditional on $\theta_2^{(t)}$ and the current values of the other variables:

$$y_{2(miss)}^{(t)} \sim p(y_{2(miss)} \mid y_1^{(t)}, y_{2(obs)}, y_3^{(t-1)}, \ldots, y_p^{(t-1)}, \theta_2^{(t)}) \quad (6)$$

The two steps are repeated for the rest of the variables in the iteration. As mentioned above, each variable can be imputed by a different model: when y is normal, $\theta$ may contain linear regression coefficients or the elements of a mean vector and covariance matrix; when y is dichotomous, $\theta$ may contain binary logistic regression coefficients; when y is ordinal, $\theta$ may contain ordinal logistic regression coefficients; and so on. Correspondingly, the predictive probability distribution of $y_{miss}$ is tailored to the scale: normal data are drawn from a normal distribution, binary data from a binomial distribution, polytomous data from a cumulative logistic distribution, and so on.
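To make the variable-by-variable idea concrete, the sketch below chains two tailored models: a normal linear model for a continuous variable and a binary logistic model for a dichotomous one. For brevity it uses point estimates of the imputation parameters rather than posterior draws (i.e., stochastic regression imputation rather than proper MI, so the P step is omitted); all names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic(x, y, iters=25):
    """Binary logistic regression (intercept + one slope) via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1 - p))[:, None])      # observed information
        b = b + np.linalg.solve(H, X.T @ (y - p))   # Newton step
    return b

# Simulate a continuous y and a related binary d, each with ~20% MCAR missingness
n = 400
y = rng.standard_normal(n)
d = (y + rng.standard_normal(n) > 0).astype(float)
miss_y = rng.random(n) < 0.2
miss_d = rng.random(n) < 0.2
y[miss_y] = np.nan
d[miss_d] = np.nan

y[miss_y] = np.nanmean(y)            # crude starting values
d[miss_d] = np.round(np.nanmean(d))

for t in range(20):                  # chained-equation iterations
    # Impute y from d with a normal linear model (draw around the prediction)
    slope, intercept = np.polyfit(d, y, 1)
    resid_sd = np.std(y - (intercept + slope * d))
    y[miss_y] = (intercept + slope * d[miss_y]
                 + rng.standard_normal(miss_y.sum()) * resid_sd)
    # Impute d from y with a binary logistic model (Bernoulli draw)
    beta = fit_logistic(y, d)
    p_miss = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * y[miss_d])))
    d[miss_d] = (rng.random(miss_d.sum()) < p_miss).astype(float)
```

The point is the dispatch: each variable gets an imputation model, and a draw, matched to its scale, which is exactly what the chained equation framework permits.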
The MCMC algorithm with joint or chained equation imputation typically cycles through many iterations. The algorithm converges when the posterior distribution(s) of the imputation parameters stabilize. After convergence, imputed data sets can be saved; however, they are saved at specified intervals rather than at every iteration, to avoid autocorrelation among the imputation parameters at adjacent iterations.
Most of the strategies considered in the study are implemented using an MCMC algorithm. Another algorithm relevant to the current study is the so-called EMB algorithm, which combines the expectation-maximization (EM) algorithm with bootstrapping to impute missing data (Honaker, King, & Blackwell, 2011). Specifically, EMB applies bootstrapping to the sample data and then uses the EM algorithm to obtain the maximum likelihood estimate of the mean vector and covariance matrix of the variables in the imputation model for each bootstrapped sample. The EM estimate from each bootstrapped sample is then treated as a random draw of the imputation parameters and used to impute the missing data. For more information on the EM algorithm and bootstrapping, researchers can refer to Dempster, Laird, and Rubin (1977) and Efron (1979), respectively.
EMB assumes multivariate normality and is thus suitable for joint imputation of normal data. In this regard, it is theoretically equivalent to the MCMC algorithm but runs much faster, for two reasons: (1) convergence of the EM algorithm is more straightforward to assess than convergence of the MCMC algorithm, so the usual MCMC convergence diagnostics are not required, and (2) unlike MCMC, no autocorrelation exists among the imputation parameter values from different bootstrap samples. As a result, every bootstrap sample can be used to generate an imputed data set.
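A compact sketch of the EMB idea follows. To keep it self-contained, the "EM" step is replaced by the closed-form maximum likelihood solution that exists for a bivariate pattern with missingness on one variable only (the factored likelihood; see Little & Rubin, 2002); for general missingness patterns an actual EM loop would be needed. All names and settings are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def ml_bivariate(y1, y2):
    """ML mean vector/covariance matrix for bivariate data with missingness on
    y2 only (closed-form factored likelihood; stands in for a general EM loop)."""
    obs = ~np.isnan(y2)
    mu1, var1 = y1.mean(), y1.var()                     # full-sample y1 moments
    slope, intercept = np.polyfit(y1[obs], y2[obs], 1)  # regression from complete cases
    resid_var = np.var(y2[obs] - (intercept + slope * y1[obs]))
    mu2 = intercept + slope * mu1
    cov12 = slope * var1
    var2 = resid_var + slope ** 2 * var1
    return np.array([mu1, mu2]), np.array([[var1, cov12], [cov12, var2]])

# Simulate bivariate normal data with ~30% missingness on y2
n = 300
data = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], size=n)
y1, y2 = data[:, 0], data[:, 1]
y2[rng.random(n) < 0.3] = np.nan
miss = np.isnan(y2)

m = 20                      # number of imputed data sets
imputations = []
for _ in range(m):
    idx = rng.integers(0, n, size=n)             # bootstrap the sample
    mu, sigma = ml_bivariate(y1[idx], y2[idx])   # "EM" estimate for this sample
    slope = sigma[0, 1] / sigma[0, 0]
    cond_mean = mu[1] + slope * (y1[miss] - mu[0])
    cond_sd = np.sqrt(sigma[1, 1] - slope * sigma[0, 1])
    y2_imp = y2.copy()
    y2_imp[miss] = cond_mean + rng.standard_normal(miss.sum()) * cond_sd
    imputations.append(y2_imp)
```

Because each bootstrap sample yields its own parameter estimate, the variability across bootstrap samples plays the role of the posterior draws in the MCMC-based P step.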

IMPUTATION STRATEGIES
Having illustrated the predominant imputation algorithms, we now turn to the strategies that can be used to predict missing ordinal data. As mentioned above, we focus on three types of strategies. The first is to treat the ordinal missing data as normally distributed and use models that assume normality to impute the missing data. Because it is easy to define the joint distribution of normally distributed variables, missing data are usually imputed jointly with either an MCMC or EMB algorithm, although the imputation can also be done sequentially with an MCMC algorithm. The normal model strategy is available in almost all MI packages and has been the most widely used by researchers. The resulting imputed values are continuous and may be used in analysis directly or rounded to the discrete metric.
The second is to impute ordinal missing data using models designed for categorical data, such as linear discriminant analysis and logistic regression models. For the categorical data models, imputation is usually done using the MCMC algorithm with chained equation imputation. The imputed values from these approaches preserve the original metric of the ordinal variables. In what follows, we briefly describe the major characteristics of the categorical data models.
We only consider the discriminant analysis model for dichotomous data, although it can also be used for polytomous data. Discriminant analysis assumes that the predictors of group membership follow a multivariate normal distribution with means that vary across categories but a covariance matrix that is constant over categories (often pooled over categories; see Schafer, 1997). For each imputation, the group means and pooled covariance matrix are drawn from their posterior distributions and used to derive a linear combination of the predictors (the discriminant function) that maximizes the group difference. The discriminant function scores are then used along with the prior probabilities of the categories (usually the sample proportions of each category) to compute the posterior probability of a participant belonging to each category (van Buuren & Groothuis-Oudshoorn, 2011). The missing values are imputed based on these posterior probabilities, typically by assigning a participant to the category with the highest posterior probability. This imputation model is currently available in MI packages such as SAS PROC MI, MICE in R, and IVEware.
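The mechanics can be sketched as follows for a binary variable and two continuous predictors. For brevity, point estimates of the group means and pooled covariance are used rather than posterior draws, and the imputed category is drawn from the posterior probabilities (a common variant) rather than assigned to the modal category; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)

def impute_discriminant(X, d, rng):
    """Impute a binary variable d from predictors X via linear discriminant
    analysis: group means, pooled covariance, and posterior probabilities."""
    obs = ~np.isnan(d)
    means = [X[obs & (d == k)].mean(axis=0) for k in (0, 1)]
    priors = [np.mean(d[obs] == k) for k in (0, 1)]
    # Pooled within-group covariance matrix
    centered = np.vstack([X[obs & (d == k)] - means[k] for k in (0, 1)])
    S = centered.T @ centered / (obs.sum() - 2)
    Sinv = np.linalg.inv(S)
    d_imp = d.copy()
    for i in np.where(~obs)[0]:
        # Log normal density of x_i under each group, plus log prior
        scores = np.array([np.log(priors[k])
                           - 0.5 * (X[i] - means[k]) @ Sinv @ (X[i] - means[k])
                           for k in (0, 1)])
        post = np.exp(scores - scores.max())
        post /= post.sum()                        # posterior probabilities
        d_imp[i] = rng.choice([0, 1], p=post)     # draw a category
    return d_imp

# Simulate two predictors and a related binary variable with ~25% missingness
n = 300
X = rng.standard_normal((n, 2))
d = (X[:, 0] + rng.standard_normal(n) > 0).astype(float)
d[rng.random(n) < 0.25] = np.nan
d_imp = impute_discriminant(X, d, rng)
```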
There are different types of logistic regression models depending on the number of categories (K) in the ordinal data. For dichotomous data (i.e., K = 2), binary logistic regression models link the predictors linearly with the log odds (logit) of falling into one category versus the other. The parameters in a binary logistic regression model include the intercept and a regression coefficient associated with each of the predictors. For polytomous data (K ≥ 3), multinomial and proportional odds logistic regression models may be used. The former is designed for nominal data but can also be used for ordinal data. Suppose there are K categories; a multinomial regression model contains K−1 equations linking the predictors linearly with the log odds of falling into each category versus category K. The intercepts and regression coefficients associated with the predictors are allowed to differ across the equations. A proportional odds regression model also contains K−1 equations. These equations link the predictors linearly with the log odds of falling into or below each category versus falling above it (the cumulative logit). However, unlike the multinomial model, the effects of the predictors (i.e., the regression coefficients) are assumed to be constant across equations (the proportional odds assumption). That is, the influence of the predictors is assumed to be the same across all adjacent categories of the ordinal variable (e.g., a one-unit increase in a predictor has the same influence on the probability of belonging to categories 1 and 2, categories 2 and 3, etc.).
In theory, the multinomial and proportional odds logistic regression models have their own pros and cons. The proportional odds model takes the order of response categories into account while the multinomial model does not. In addition, it is more parsimonious (contains fewer parameters) than the multinomial model. On the other hand, the multinomial model is not restricted to the proportional odds assumption. Thus when this assumption is violated, it may be preferred to the proportional odds model. More information on the logistic regression models can be found in Agresti (2007). The binary and proportional odds logistic regression models for imputation are available in SAS PROC MI, MICE in R, and IVEware. The multinomial logistic regression imputation model is available in MICE in R, and IVEware.
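The cumulative logit structure is easy to see in code. The sketch below draws an ordinal value from a proportional odds model with given (purely illustrative) intercepts and a single shared slope; in an actual imputation these parameters would be drawn from their posterior distribution at each P step.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_proportional_odds(x, alphas, beta, rng):
    """Draw an ordinal value 1..K from a proportional odds model:
    logit P(Y <= k) = alpha_k - beta * x.  A single slope beta is shared by
    all K-1 cumulative equations (the proportional odds assumption)."""
    alphas = np.asarray(alphas, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(alphas - beta * x)))   # P(Y <= k), k = 1..K-1
    cum = np.append(cum, 1.0)                          # P(Y <= K) = 1
    probs = np.diff(np.append(0.0, cum))               # per-category probabilities
    return rng.choice(np.arange(1, len(probs) + 1), p=probs)

# Impute a 5-category item from three hypothetical predictor scores
alphas = [-2.0, -0.7, 0.7, 2.0]   # K-1 = 4 increasing intercepts (illustrative)
imputed = [draw_proportional_odds(x, alphas, beta=1.2, rng=rng)
           for x in (-1.5, 0.0, 1.5)]
```

A multinomial model would instead carry K−1 separate intercept/slope pairs, one per non-reference category, at the cost of more parameters to estimate.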
The third type of strategy is to assume that there is a continuous latent variable underlying each observed ordinal variable (Asparouhov & Muthén, 2010a; McKelvey & Zavoina, 1975). The underlying latent variables are typically assumed to follow a multivariate normal distribution. Missing values are first imputed at the latent variable level using a normal data model and then discretized based on estimated thresholds (see below for more details). This latent variable model is a formulation of a cumulative/ordinal probit model and is currently available in Mplus (Asparouhov & Muthén, 2010a).
Let $y^*$ represent the vector of latent variables underlying a vector of categorical variables. The following equation shows the default imputation model (a saturated model) for $y^*$ in Mplus (Asparouhov & Muthén, 2010a; 2010b):

$$y^* \sim MVN(\mu, \Sigma)$$

MVN stands for the multivariate normal distribution. $\mu$ is a vector of latent means that are fixed to 0 for identification purposes, because each latent mean is redundant with one of the threshold parameters of the corresponding ordinal variable; fixing the means to 0 allows all of the thresholds to be freely estimated. $\Sigma$ is the covariance matrix of the latent variables. The diagonal elements of $\Sigma$ are fixed at 1 to set the scale, and the off-diagonal elements are freely estimated. Specifically, for complete cases, the $y^*$ values follow a truncated normal distribution, such that they are bounded by the appropriate threshold parameters. For incomplete cases, the $y^*$ values are unbounded, since we are unable to condition on the discrete scores.
Once the imputed values are obtained for $y^*$, they are discretized using the thresholds as follows (Asparouhov & Muthén, 2010a; 2010b):

$$y = k \quad \text{if} \quad \tau_{k-1} \le y^* < \tau_k, \qquad k = 1, \ldots, K,$$

where $\tau_1, \ldots, \tau_{K-1}$ are the estimated thresholds, $\tau_0 = -\infty$, and $\tau_K = +\infty$.
Because missing data are imputed at the latent variable level and the latent variables are assumed to follow a multivariate normal distribution, it is straightforward to establish the joint distribution even when there is a mixture of ordinal and continuous variables (Asparouhov & Muthén, 2010a). Thus the missing data are usually imputed jointly with this strategy. The imputation process involves a few additional steps that are not present in the joint model described in Equations (1) and (2), e.g., drawing the threshold parameters from their posterior distributions and categorizing the latent normal imputations using the threshold parameters. Interested researchers can refer to Asparouhov and Muthén (2010b) and Cowles (1996) for more detail on the imputation process.
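The impute-then-discretize step is simple to demonstrate. The sketch below draws latent values for two 5-category items from a standardized multivariate normal and cuts them at thresholds; the correlation and threshold values are illustrative, not estimates from any data set.

```python
import numpy as np

rng = np.random.default_rng(6)

def discretize(y_star, thresholds):
    """Map latent normal values to ordinal categories 1..K using K-1 increasing
    thresholds: y = k if tau_{k-1} <= y* < tau_k (tau_0 = -inf, tau_K = +inf)."""
    return np.searchsorted(thresholds, y_star) + 1

# Latent-level imputation for two correlated 5-category items (illustrative values)
corr = np.array([[1.0, 0.5], [0.5, 1.0]])      # diagonal fixed at 1 to set the scale
thresholds = np.array([-1.5, -0.5, 0.5, 1.5])  # K-1 = 4 thresholds
y_star = rng.multivariate_normal([0.0, 0.0], corr, size=1000)
y_ordinal = discretize(y_star, thresholds)
```

In the actual Mplus procedure, both the latent draws and the thresholds themselves are sampled within the MCMC chain; here the thresholds are simply held fixed to isolate the discretization step.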

LITERATURE REVIEW
The past research on imputation strategies for ordinal data has been focused on three issues: (1) the robustness of the normal model approach without rounding to discontinuity and nonnormality due to ordinal data, (2) the rounding strategies for the resulting continuous imputed data from the normal model approach, and (3) the comparison of the normal model approach to categorical model approaches.
With regard to the first issue, many previous studies suggested that with a moderate proportion of missing data (≤30%), the normal model approach without rounding was able to accurately reproduce the mean of polytomous data (Ake, 2005; Horton, Lipsitz, & Parzen, 2003), a linear regression coefficient predicting a continuous outcome from a binary variable with missing data (Allison, 2005), and Cronbach's alpha for Likert scales and Pearson correlations among ordinal items with 3, 5, or 7 categories for MCAR and MAR data (see Enders, 2003; Leite & Beretvas, 2004). However, with a larger proportion of missing data, the normal model approach without rounding can potentially result in biased parameter estimates. For example, Leite and Beretvas (2004) found that with 50% missing data, the normal model approach led to underestimated correlations among the ordinal items.
Many rounding strategies have been proposed to discretize the continuous imputed data from the normal model approach. The simplest strategy is to round the imputed values to the nearest integer (naïve rounding). However, naïve rounding is generally not recommended for estimation of proportions, or of regression coefficients when the ordinal variables are predictors (see Ake, 2005; Allison, 2005; Horton et al., 2003; Lee, Galati, Simpson, & Carlin, 2012). Finch and Margraf (2008) also reported substantial bias in parameter estimates due to naïve rounding in item response theory analysis. However, naïve rounding seemed acceptable for estimation of parameters such as the odds ratio in logistic regression (Bernaards, Belin, & Schafer, 2007). Alternative rounding strategies (e.g., adaptive rounding, two-stage calibration rounding, and indicator-based rounding strategies) have also been developed over the past decades (see Demirtas, 2010; Lee et al., 2012; Yucel, He, & Zaslavsky, 2008). However, these rounding strategies are generally cumbersome to implement. In addition, none of them has emerged as superior to the others under all analytical scenarios examined in past research. For example, Lee et al. (2012) found that all of the rounding strategies resulted in biased estimates of the proportions of ordinal variables with 4 categories. Based on this review, the current study considers only naïve rounding, for its simplicity.
Limited research has compared the normal model approach to a subset of the categorical model approaches mentioned above. Using simulated data of N = 500, Allison (2005) found that the binary logistic regression and discriminant analysis approaches were comparable to the normal model approach without rounding when imputing MCAR data on a binary variable, and superior to it when imputing MAR data, in estimating the mean (proportion) of the binary variable, especially when the distribution of the binary variable was severely asymmetric (proportion ≤ .2). However, the normal model approach without rounding performed as well as the logistic regression and discriminant analysis approaches in recovering the linear regression coefficient when the binary variable with missing data was used to predict a continuous variable. Finch (2010) compared the performance of the normal model approach with naïve rounding and the multinomial logistic regression approach in imputing 5 ordinal items with 5 categories. He examined the effect of imputation strategy on the regression coefficients in an ordinal regression analysis in which one of the ordinal variables is predicted by all of the rest. He varied sample size (N = 200, 500, or 1000) and missing data mechanism (MCAR or MAR) and set the percentage of missing data on the 5 variables to 25%. He found that the multinomial logistic regression approach yielded more bias in the point and standard error estimates of the logistic regression coefficients than the normal model approach with naïve rounding. However, the biases tended to decrease as N increased.

PURPOSE OF STUDY
Our literature review suggests that past research on the imputation strategies has been conducted in a variety of analytical contexts (e.g., linear and logistic regression models, coefficient alpha, and estimation of variable means). However, little research has been done to evaluate the imputation strategies in a commonly encountered scenario in which the ordinal variables are to be aggregated into scale scores for further analysis. In addition, some imputation strategies have only become available or accessible to researchers in recent years and have not been included in the previous literature. For example, past research has rarely compared the proportional odds logistic regression approach (see footnote 4) or the latent variable approach.
This article is thus designed to compare a large set of currently available imputation strategies for Likert scale variables using Monte Carlo simulation. We restrict our attention to scale scores, as this is an important starting point for understanding the behavior of categorical imputation models and has been overlooked in previous research. Specifically, we examine the extent to which the imputation strategies influence the estimation of reliability coefficients (Cronbach's alpha), mean scale scores, and regression coefficients of predicting one scale score from another. The examined strategies include the normal model approach without rounding and with naïve rounding, the latent variable approach, the binary logistic regression and linear discriminant analysis approaches (for dichotomous data only), and the multinomial and proportional odds logistic regression approaches (for polytomous data only).

SIMULATION STUDY

Data Generation
In the simulation study, we generated 14 variables in total. Twelve of them were ordinal (A 1 to A 6 , and B 1 to B 6 ) and two were continuous (aux1 and aux2). The 12 ordinal variables belong to two scales, A and B, each containing 6 items. For the purpose of generating data, we assumed that there was a continuous variable underlying each ordinal variable. We first generated 12 standard normal variables using a two-factor model with each factor indicated by 6 variables. The correlation between the two factors was set to .3 (a medium effect). The loadings of the variables on the corresponding factor were all set to .7. On the continuous metric, this model yields within-scale item correlations of .49 and between-scale item correlations of .15. This model also yields Cronbach's alphas equal to or above .7 (a commonly used threshold for acceptable internal reliability) for each scale on the discrete metric (Nunnally, 1978). The ordinal variables were then created by discretizing the continuous variables using thresholds. With K categories, K−1 thresholds need to be specified (more details on the thresholds used to generate the ordinal variables follow). After generating the ordinal variables, two scale scores (X A and X B ) were computed by summing the corresponding item scores.

Footnote 4: Finch (2010) examined a stochastic regression imputation method in which ordinal missing data are predicted by a proportional odds logistic regression model (see Sulis & Porcu, 2008). He found that this approach provided comparable performance to the normal model approach with naïve rounding. However, stochastic regression imputation is not directly comparable to imputation done through an MCMC algorithm.
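The data generation model can be sketched as follows. The factor structure (loadings of .7, factor correlation of .3) matches the description above; the thresholds shown are an illustrative symmetric 5-category set, not necessarily the exact values in Table 1.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two correlated factors, 6 standard normal items each, loadings of .7
n = 500
fcorr = np.array([[1.0, 0.3], [0.3, 1.0]])
factors = rng.multivariate_normal([0.0, 0.0], fcorr, size=n)
loading = 0.7
resid_sd = np.sqrt(1 - loading ** 2)            # keeps each item standard normal
items = (np.repeat(factors, 6, axis=1) * loading
         + rng.standard_normal((n, 12)) * resid_sd)

# Discretize with K-1 thresholds (illustrative symmetric set for K = 5)
thresholds = np.array([-1.5, -0.5, 0.5, 1.5])
ordinal = np.searchsorted(thresholds, items) + 1

# Scale scores by summing item scores
X_A = ordinal[:, :6].sum(axis=1)
X_B = ordinal[:, 6:].sum(axis=1)
```

With loadings of .7, the implied continuous-metric correlations are .7 × .7 = .49 within scale and .7 × .3 × .7 ≈ .15 between scales, as stated above.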

Missing Data Generation
We imposed missing data on half of the ordinal items in each scale (A 1 -A 3 and B 1 -B 3 ). The two continuous variables (aux1 and aux2) were used to generate the missing data. aux1 and aux2 had a correlation of .5 with each other and with each of the two scale scores. The missingness on A 1 -A 3 was determined by aux1. We rank ordered aux1 from smallest to largest and computed the probability of being missing from the rank order: for each case, the probability of having missing data on A 1 was computed as 1 minus the rank order of the case's value on aux1 divided by N. This probability was then compared to a random number drawn from a uniform distribution; if the probability was higher than the random number, the observation was set to missing. This procedure continued until the desired percentage (e.g., 30%) of missing data was achieved. The missing data on the B items were imposed in a similar fashion, except that the missingness was determined by the rank of the sum of aux1 and aux2.
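The rank-based MAR mechanism can be sketched as follows; the function name and sampling details are our reconstruction of the procedure described above, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(8)

def impose_mar(item, aux, prop, rng):
    """Set a proportion `prop` of `item` to NaN, with missingness more likely
    for cases with low values on `aux` (rank-based MAR, as described above)."""
    n = len(item)
    ranks = aux.argsort().argsort() + 1        # 1 = smallest aux value
    p_miss = 1 - ranks / n                     # low aux -> high missingness probability
    out = item.astype(float).copy()
    # Keep drawing cases until the desired missing proportion is reached
    while np.isnan(out).mean() < prop:
        i = rng.integers(n)
        if not np.isnan(out[i]) and rng.random() < p_miss[i]:
            out[i] = np.nan
    return out

aux1 = rng.standard_normal(1000)
A1 = rng.integers(1, 6, size=1000)             # a 5-category item
A1_mar = impose_mar(A1, aux1, prop=0.30, rng=rng)
```

Because missingness depends only on the fully observed aux1, the mechanism is MAR, and including aux1 in the imputation model makes the missingness ignorable.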
The missing data generated in this way are MAR. We considered only MAR data in the simulation because (1) pure MCAR is rare in practice, and (2) a method that works for MAR data usually also works for MCAR data. To account for the MAR missingness, aux1 and aux2 were included in the imputation models as auxiliary variables.

Design Factors
The following factors are manipulated in the simulation study: (1) Number of categories in the ordinal items (K = 2, 3, 5, or 7). (2) Sample size (N = 125 or 500). These Ns fall in the range of the typical sample sizes in social and behavioral science research. For ease of presentation, we arbitrarily call N = 125 a small sample size and N = 500 a large sample size.
(3) Proportion of missing data on each of the 6 incomplete items (30% or 50%). These missing data proportions are moderate or large proportions examined in the previous literature. We selected relatively large proportions to reveal the difference among the imputation strategies. These missing data rates are not uncommon in some studies such as longitudinal studies (Gustavson, von Soest, Karevold, & Røysamb, 2012). (4) Asymmetry of thresholds used to discretize continuous distributions (symmetric, moderately asymmetric, or severely asymmetric). Because the distributions of the ordinal items are not necessarily symmetric in practice, we manipulated the asymmetry of thresholds at the three levels to introduce different levels of asymmetry in the item distributions. Table 1 shows the thresholds used to generate the levels of asymmetry for ordinal variables with different Ks. These thresholds are borrowed from Rhemtulla, Brosseau-Liard, and Savalei (2012). To visualize the resulting item distributions, the theoretical frequency distributions for the ordinal data with N = 100 are presented in Figure 1.
In sum, there are 48 conditions for each imputation strategy. Each condition was replicated 1,000 times. To remove any possible simulation error from the comparison of strategies, the same set of simulated data was used across imputation strategies. Since some of the variables have up to 50% missing data, 50 imputed data sets were obtained for each replication, in accordance with White, Royston, and Wood's (2011) suggestion.

Parameters to be Evaluated
In the current study, we examine how imputation strategy influences the estimation of three parameters: Cronbach's alpha coefficient (a measure of internal consistency) of scale B, the mean scale score of scale B, and the linear regression coefficient of predicting X_B from X_A. Note that we report Cronbach's alpha and the mean scale score for scale B only, because the items in scales A and B are generated in the same fashion and the results for the two scales are therefore comparable. The mean scale score and regression coefficient are scale-level quantities. The alpha coefficient is an item-level statistic; following typical practice, it is calculated from the Pearson correlation coefficients averaged across imputations:

α = p·r̄ / [1 + (p − 1)·r̄],

where p is the number of items and r̄ is the average inter-item correlation.
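For concreteness, computing the standardized alpha from the average inter-item correlation can be sketched in Python. This is an illustrative sketch only; the item count and correlation values below are hypothetical, not values from the study:

```python
# Standardized Cronbach's alpha from the average inter-item correlation:
# alpha = p * r_bar / (1 + (p - 1) * r_bar).
# `corrs` is a hypothetical list of pairwise Pearson correlations among
# the p items (here assumed already averaged across imputations).

def standardized_alpha(corrs, p):
    r_bar = sum(corrs) / len(corrs)        # average inter-item correlation
    return p * r_bar / (1 + (p - 1) * r_bar)

# Six items whose pairwise correlations all equal .3 give alpha of about .72.
pairwise = [0.3] * 15                      # p*(p-1)/2 = 15 pairs for p = 6
print(round(standardized_alpha(pairwise, p=6), 2))
```
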
The population values for the three parameters for ordinal data with different numbers of categories and shapes of item distributions are presented in Table 2. The population values are computed based on a single large simulated data set (N = 500,000) of complete ordinal data because the categorization process prevents us from analytically determining the parameter values.

Bias in Point Estimate
Following Collins, Schafer, and Kam (2001), we used % standardized bias to quantify the bias in a specific parameter (θ):

% standardized bias = 100 × (θ̄ − θ₀) / s_θ,

where θ̄ is the mean of the parameter estimates θ̂ᵢ (i = 1, …, q) across the q replications, θ₀ is the population parameter value, and s_θ is the standard deviation of the parameter estimates across replications (i.e., the empirical standard error). Collins et al. (2001) suggested that a % standardized bias greater than 40% (i.e., more than 40% of a standard error) represents estimation bias of practical significance.
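A minimal sketch of this computation, using hypothetical replication estimates rather than values from the study:

```python
# Percent standardized bias (Collins, Schafer, & Kam, 2001):
# 100 * (mean estimate - true value) / SD of estimates across replications.
import statistics

def pct_standardized_bias(estimates, theta0):
    mean_est = statistics.mean(estimates)
    emp_se = statistics.stdev(estimates)   # empirical standard error
    return 100 * (mean_est - theta0) / emp_se

# Hypothetical estimates from five replications of one condition.
estimates = [0.52, 0.48, 0.55, 0.45, 0.60]
bias = pct_standardized_bias(estimates, theta0=0.40)
print(abs(bias) > 40)   # flags bias of practical significance
```
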

Mean Square Error (MSE)
For each parameter, MSE is the averaged squared discrepancy between the parameter estimate and the population parameter value:

MSE = (1/q) Σᵢ (θ̂ᵢ − θ₀)².
MSE is a measure of overall accuracy (the extent to which sample estimates deviate from the population parameter value) that combines both the bias and the variance of estimation. A strategy with a smaller MSE is more accurate and thus preferred. To facilitate comparison of MSEs among multiple strategies, a ratio of MSEs was computed for each strategy relative to the normal model approach without rounding, given that it is the strategy that has enjoyed the most widespread use among researchers. An MSE ratio larger than 1 indicates that a specific strategy is less accurate than the normal model approach without rounding.
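The MSE and the MSE ratio relative to the baseline strategy can be sketched as follows (all estimates below are hypothetical):

```python
# MSE = average squared discrepancy between estimate and true value.
# The ratio compares a strategy's MSE to the normal-model-without-rounding
# baseline; a ratio > 1 means the strategy is less accurate.

def mse(estimates, theta0):
    return sum((e - theta0) ** 2 for e in estimates) / len(estimates)

baseline = [0.50, 0.54, 0.46]    # hypothetical: normal model, no rounding
candidate = [0.42, 0.60, 0.38]   # hypothetical: a competing strategy
ratio = mse(candidate, 0.50) / mse(baseline, 0.50)
print(ratio > 1)                 # candidate is less accurate here
```
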

95% CI Coverage
We also report the 95% CI coverage for the target parameters, calculated for each parameter as the proportion of the 95% CIs that cover the population value across replications. The upper and lower limits of the CIs for the regression slope and mean scale score are computed from the corresponding standard error estimates. The limits of the CI for Cronbach's alpha are computed as follows (Fan & Thompson, 2001):

lower = 1 − (1 − α̂)·F(1 − γ/2; df1, df2),  upper = 1 − (1 − α̂)·F(γ/2; df1, df2),

where α̂ is the sample estimate of the alpha coefficient, γ is the significance level (γ = .05 for a 95% CI), and F(γ/2; df1, df2) and F(1 − γ/2; df1, df2) are the γ/2 and 1 − γ/2 percentiles, respectively, of an F distribution with df1 = N − 1 and df2 = (N − 1)(p − 1), where N is the sample size and p is the number of items.
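Assuming SciPy's F quantile function, the Fan and Thompson (2001) interval can be sketched as follows; the alpha estimate, N, and p below are illustrative values, not results from the study:

```python
# F-based CI for Cronbach's alpha (Fan & Thompson, 2001):
#   lower = 1 - (1 - alpha_hat) * F_{1 - gamma/2; df1, df2}
#   upper = 1 - (1 - alpha_hat) * F_{gamma/2; df1, df2}
# with df1 = N - 1 and df2 = (N - 1)(p - 1).
from scipy.stats import f

def alpha_ci(alpha_hat, n, p, gamma=0.05):
    df1, df2 = n - 1, (n - 1) * (p - 1)
    lower = 1 - (1 - alpha_hat) * f.ppf(1 - gamma / 2, df1, df2)
    upper = 1 - (1 - alpha_hat) * f.ppf(gamma / 2, df1, df2)
    return lower, upper

# Illustrative: alpha_hat = .80, the small-N condition, a 6-item scale.
lo, hi = alpha_ci(alpha_hat=0.80, n=125, p=6)
print(lo < 0.80 < hi)
```
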

Data Analysis
A repeated measures ANOVA was conducted to examine the effects of the design factors on the standardized biases for each of the four types of ordinal data and each of the three parameters (12 ANOVAs in total). We did not run analogous analyses for MSE or CI coverage because these outcomes violate the normality assumption of ANOVA. In addition, since a larger bias tended to be associated with a larger MSE and lower CI coverage (see the Results section), the ANOVAs also give some sense of how the design factors affected these two outcomes.
Among the design factors, sample size, missing data proportion, and level of asymmetry are between-subject factors. The type of strategy is a within-subject factor because the same set of simulated data was used across imputation strategies. We included all possible main and interaction effects in the ANOVA models and computed an eta squared (η²) for each effect as the effect size measure. For a between-subject effect, η² is computed as the ratio of the sum of squares explained by the effect to the total between-subject sum of squares; for a within-subject effect, η² is computed as the ratio of the sum of squares explained by the effect to the total within-subject sum of squares. To simplify presentation, we only report effects with η² > .05.
The software packages R, Mplus, and SAS were used to perform the tasks in the simulation study. The normal model approach was implemented using Amelia II in R (Honaker et al., 2011), which uses the EMB algorithm for imputation. The latent variable model approach was implemented in Mplus 7 (Muthén & Muthén, 1998-2012) through MCMC with joint imputation. Mplus automatically monitors convergence of the MCMC algorithm using the potential scale reduction convergence criterion (Asparouhov & Muthén, 2010b; Gelman & Rubin, 1992). After convergence, the imputed data sets were saved every 100 iterations. The discriminant analysis and logistic regression approaches were implemented using MICE in R (van Buuren & Groothuis-Oudshoorn, 2011) through the MCMC algorithm with chained equations imputation. Five burn-in iterations (the iterations that occur before convergence) were specified following the general recommendation of the MICE developers (van Buuren & Groothuis-Oudshoorn, 2011). To ensure that the MCMC algorithm converged appropriately with five burn-in iterations, we checked the trace plots of the parameter estimates across MCMC iterations. We also compared the results from five burn-in iterations to those from 20 burn-in iterations for one condition (50% missing data, K = 7, and severely asymmetric item distributions) and saw no substantial difference between the two sets of results. R was used to compute the alpha coefficient, the mean scale score, and the regression coefficient of predicting X_B from X_A. PROC GLM in SAS 9.2 was used to perform the repeated measures ANOVAs.

RESULTS
The substantial effects (η 2 > .05) from the repeated measures ANOVAs are summarized in Table 3. The standardized bias, MSE, and CI coverage for each method under each combination of N and distribution asymmetry are summarized in Tables 4-7 for K = 2, 3, 5, and 7, respectively. As shown in Table 3, missing data proportion did not have a substantial influence on standardized bias for any of the parameters, controlling for the other factors. The MSE and CI coverage results also showed similar patterns between the two missing data proportions (30% and 50%). Thus, for simplicity, we present the average outcomes across the two proportions in Tables 4-7. In what follows, we describe the results for the dichotomous data first followed by those for the polytomous data.

Dichotomous Data
For dichotomous data, the standardized bias was substantially influenced by imputation strategy, N, and distribution asymmetry for all of the parameters (see Table 3). Specifically, imputation strategy interacted with distribution asymmetry (η² ranged from .09 to .29) and with N (η² ranged from .08 to .12), indicating that the effects of distribution asymmetry and N differed across imputation strategies. Figure 2 plots the % standardized biases against imputation strategies for each N and level of distribution asymmetry for the regression slope; the same patterns were seen in the two interaction effects for the other parameters. As shown in Figure 2, the logistic regression approach and naïve rounding were more sensitive to N than the other approaches, and N influenced the two approaches differently: increasing N improved the performance of the logistic regression approach but worsened the performance of naïve rounding. All of the approaches were influenced by distribution asymmetry to some extent, but the logistic regression approach was the most sensitive, with its bias increasing dramatically as distribution asymmetry increased. A closer look at the magnitude of the standardized biases, MSEs, and CI coverage probabilities under each combination of N and distribution asymmetry shows that the normal model approach without rounding, the latent variable approach, and the discriminant analysis approach not only were least sensitive to the design factors, but also yielded more accurate estimates (|standardized bias| < 40%), lower MSEs, and better CI coverage rates for the parameters under examination than naïve rounding and the logistic regression approach (see Table 4).

Polytomous Data
For polytomous data, a two-way interaction effect between imputation strategy and distribution asymmetry (η² ranged from .09 to .47) was found for many of the conditions. The two-way interaction effects for the regression slope under K = 3, 5, or 7 are plotted in Figure 3; the same interaction effects for the other parameters shared a similar pattern. As shown in Figure 3, only naïve rounding and the two logistic regression approaches (multinomial and proportional odds) were influenced by distribution asymmetry, with larger biases generally associated with asymmetric distributions. However, the interaction effect exhibited different patterns between K = 3 and K = 5 or 7. Specifically, with K = 3, whether the distributions were moderately or severely asymmetric made a substantial difference: the biases associated with moderate asymmetry were substantially lower than those associated with severe asymmetry. With K = 5 or 7, by contrast, the two levels of asymmetry led to very similar results for the two logistic regression approaches.
A two-way interaction effect was also found between imputation strategy and N (η² ranged from .08 to .12) for many of the conditions (see Table 3). Representative interaction effects for K = 3, 5, or 7 are plotted in Figure 4. Again, naïve rounding and the two logistic regression approaches were more influenced by N than the others. Specifically, the proportional odds approach yielded more bias with a smaller N regardless of the number of categories. However, the N effect for the multinomial approach differed between K = 3 and K = 5 or 7: with K = 3, more bias was associated with a smaller N, while with K = 5 or 7, more bias was associated with a larger N. This result is likely because the multinomial approach ignores the order information in the response categories; the bias due to this problem only became noticeable with K > 3 and was exacerbated as N increased. Naïve rounding seemed to be influenced by N only when K was small. Similar to the result for dichotomous data, the bias due to naïve rounding increased as N increased.
Furthermore, a three-way interaction among strategy, N, and distribution asymmetry was found for the regression slope for ordinal data with K = 5 categories (η² = .09), and for the mean scale score for ordinal data with any number of categories (η² ranged from .10 to .14). These three-way interactions suggest that N and distribution asymmetry interacted with each other to affect the outcomes in these cases, and that this interaction differed across imputation strategies. Representative three-way interaction effects are plotted in Figure 5. As shown in Figure 5, the interaction seemed to occur only with naïve rounding and the two logistic regression approaches. For these approaches, N affected the bias most when the item distributions were severely asymmetric within each K. For the proportional odds logistic regression approach, the largest bias occurred when severe asymmetry was combined with a small N; for the multinomial regression approach and naïve rounding, the largest bias occurred when severe asymmetry was combined with a large N.
A further look at the magnitudes of the standardized biases, MSEs, and CI coverage probabilities associated with the examined methods for polytomous data (see Tables 5-7) makes it clear that the latent variable model approach and the normal model approach without rounding outperformed the other approaches regardless of the parameter and the number of categories. Between the two, the latent variable approach performed slightly better than the normal model approach, producing smaller standardized biases and MSEs when the distributions were severely asymmetric.
Naïve rounding produced acceptable estimates for the regression slope under all conditions with only one exception: K = 3, the item distributions were severely asymmetric, and N = 500. However, it yielded underestimated alpha coefficients (highest standardized bias = −83%, highest MSE ratio = 1.64, lowest CI coverage = 73%), and overestimated mean scale scores (highest standardized bias = 118%, highest MSE ratio = 2.15, lowest CI coverage = 79%) when the distributions were severely asymmetric and N was large.
The results also suggest that the performance of naïve rounding and the two logistic regression approaches tended to be influenced by the number of categories (K). As K increased, the logistic regression approaches became less and less likely to yield acceptable results. For instance, with K = 3, the multinomial approach was acceptable for all three parameters when N = 500 and the item distributions were not severely asymmetric. However, with K = 5, it yielded acceptable results only for the scale-level parameters, and only when the item distributions were symmetric. Once K increased to 7, it failed under all conditions. Turning to the proportional odds logistic regression model, it was able to produce acceptable results for the scale-level parameters when the item distributions were not severely asymmetric with K = 3 categories. With more categories (5 or 7), however, it performed well only with symmetric distributions, and it required a larger N with 7 categories than with 5. For naïve rounding, the direction was the opposite: as K increased, naïve rounding became more and more likely to yield acceptable results, especially with a small N.

FIGURE 2
The two-way interaction effects between imputation strategy and N and between imputation strategy and distribution asymmetry for the regression slope for dichotomous data.
Another interesting observation is that the standardized bias in the alpha coefficient was generally greater than that in the scale-level parameters (mean scale score and regression slope) when the item distributions were not severely asymmetric, implying that the item-level parameters (correlations in this case) may be more impacted by imputation strategy than the scale level parameters. This is not surprising as the scale-level parameters probably absorbed this bias in a way that yields more acceptable parameter estimates.

FIGURE 3
Representative two-way interaction effects between imputation strategy and distribution asymmetry for polytomous data.

EMPIRICAL EXAMPLE
An empirical example is provided to illustrate the examined strategies for polytomous data. The data for the example come from a study conducted by Salyers et al. (2014) examining the effect of burnout among staff working in community mental health on quality of care. The participants were 113 employees at a Community Mental Health Center. Staff burnout and quality of care were measured using 7-point and 5-point Likert scales, respectively. Each scale contains several subscales; for simplicity, we selected one subscale from each scale for analysis. Specifically, we selected the personal accomplishment subscale (8 items, measuring feelings of competence and successful achievement in one's work) from the burnout scale and the client-centered care subscale (11 items, measuring the degree to which a clinician works with clients around their identified goals) from the quality of care scale.

FIGURE 4
Representative two-way interaction effects between imputation strategy and N for polytomous data.
In the original data set, the missing data on the client-centered care items ranged from 2.7% to 27.1% (average = 14.6%), and the missing data on the personal accomplishment items ranged from 1.8% to 4.4%. To approximate the amount of missing data used in the simulation study (the average missing data proportion was 15% for the conditions with 30% missing data on half of the items), we imposed an additional 20% missing data on half of the personal accomplishment items such that the missingness was determined by gender (i.e., MAR). With the additional missing data, the missing data on the personal accomplishment items ranged from 1.8% to 24.8% (average = 13.0%).
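A minimal sketch of imposing covariate-dependent (MAR) missingness of this kind follows. The variable names and deletion rates are hypothetical and chosen for illustration; they are not the authors' exact scheme:

```python
# Impose MAR missingness: the probability that an item response is deleted
# depends only on an always-observed covariate (gender here), not on the
# item value itself. Rates and names are illustrative.
import random

random.seed(1)
n = 1000
gender = [random.choice(["F", "M"]) for _ in range(n)]
item = [random.randint(1, 7) for _ in range(n)]   # a 7-point Likert item

# Delete responses at a higher rate for one group.
rates = {"F": 0.30, "M": 0.10}
observed = [None if random.random() < rates[g] else y
            for g, y in zip(gender, item)]

n_missing = sum(y is None for y in observed)
print(n_missing > 0)
```

Because the deletion probability is a function of the fully observed covariate only, the resulting missingness is MAR rather than MNAR.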
Since the data in the two scales have 5 or 7 categories, we applied the strategies for ordinal data with more than 2 categories (i.e., the normal model approach without rounding and with naïve rounding, the latent variable, the multinomial regression, and the proportional odds regression model approaches). The software packages used for the strategies are the same as those used in the simulation study (the R and Mplus codes are included in the Appendices A and B, respectively). Fifty imputations were obtained for each strategy. After imputation, a scale score was computed for each subscale by summing the corresponding item scores. In parallel to the parameters considered in the original study and the simulation study, we computed the Cronbach's alpha and mean scale score for each subscale, and the linear regression coefficient of predicting client-centered care from personal accomplishment at the scale score level. It is expected that higher sense of personal accomplishment is associated with higher quality of client-centered care.
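After imputation, scale-level estimates such as the regression coefficient must be pooled across the 50 imputed data sets. The article does not spell out its pooling formulas, but the standard approach is Rubin's rules, sketched here with hypothetical per-imputation estimates and variances:

```python
# Rubin's rules: the pooled point estimate is the mean of the per-imputation
# estimates; the total variance combines within-imputation variance (W) and
# between-imputation variance (B) as T = W + (1 + 1/m) * B.
import statistics

def pool(estimates, variances):
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    w = statistics.mean(variances)            # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance
    total_var = w + (1 + 1 / m) * b
    return q_bar, total_var ** 0.5            # estimate and pooled SE

# Hypothetical regression coefficients and their squared SEs from 4 imputations.
est, se = pool([0.51, 0.48, 0.53, 0.50], [0.004, 0.005, 0.004, 0.005])
print(round(est, 3))
```
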
The results are reported in Table 8. Consistent with the simulation study, the latent variable model approach and the normal model approach without rounding produced almost identical results on all of the parameters. Naïve rounding performed comparably to the normal model approach without rounding in this case, owing to the small N and the relatively large numbers of categories in the two scales. The multinomial and proportional odds logistic regression approaches yielded estimates that differed from those of the other approaches, especially for the regression coefficient.
However, there were a few results from the empirical example that were different from those from the simulation study. One is that the two logistic regression approaches produced larger regression coefficients than those from the other approaches, while the simulation study suggests that they tended to underestimate the regression coefficient. This discrepancy is probably due to the fact that most of the items in the example had negatively skewed distributions (the average skewness of the personal achievement items was -1.04 and the average skewness of the client-centered care items was -0.91) while only positively skewed distributions were generated in the simulation study (see Figure 1). This implies that the direction of biases would be influenced by the direction of skewness.
The other is that the imputation strategies did not differ as much in the estimation of the alpha coefficient or mean scale score as they did in the simulation study. A possible explanation is that we reported standardized biases in the simulation study but raw estimates in the empirical example, given that the population values and empirical standard errors were unknown in the empirical example. A small difference in the raw metric can translate to a large difference in the standardized metric if the empirical standard error is small. For example, the standard error of the mean client-centered care scale score was .77 for the proportional odds model approach. If this standard error is used to standardize the difference in the mean scale score between the latent variable and proportional odds approaches, the 1.29-unit difference in the raw metric (see Table 8) corresponds to a standardized difference of 1.68 (i.e., 168%), which would be deemed substantial based on the threshold of 40%.
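The standardization in this example is simple arithmetic, using the numbers given in the text:

```python
# Back-of-the-envelope standardization from the text: a 1.29-unit raw
# difference in the mean scale score divided by its .77 standard error.
raw_diff = 1.29
se = 0.77
standardized = raw_diff / se
print(round(standardized, 2))   # 1.68, i.e., 168%, well above the 40% cutoff
```
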

CONCLUSION AND DISCUSSION
In the current article, a large-scale simulation study was conducted to compare the performance of a variety of imputation strategies for ordinal missing data in recovering reliability coefficients, mean scale scores, and regression coefficients of predicting one scale score from another. Based on the results of the simulation study, we recommend the latent variable approach and the normal model approach without rounding for either dichotomous or polytomous data, and the discriminant analysis approach for dichotomous data. These approaches were least influenced by the design factors and yielded reliable and consistent results across all conditions examined in the study, producing not only the lowest biases, but also the best MSEs and CI coverage rates in general.
We generally do not recommend naïve rounding for ordinal data for two reasons: (1) it does not offer better performance than the recommended approaches, although it seems acceptable for the estimation of the regression slope, and (2) it can result in substantial bias in alpha coefficients and mean scale scores, especially when N is large and the item distributions are asymmetric.
Nor do we recommend the logistic regression approaches (i.e., binary, multinomial, and proportional odds) for either dichotomous or polytomous data. These approaches, although designed specifically for categorical data, yielded poor estimates, MSEs, and CI coverage probabilities for the parameters considered in the study under many of the conditions. They were also highly sensitive to N, the asymmetry of the item distributions, and the number of categories. This sensitivity is probably related to the fact that logistic regression models are strongly affected by sparseness of the ordinal data. To reduce sparseness, a large N is usually required, and this N requirement increases as the distributions become more asymmetric, the number of categories increases, or the number of variables in the imputation model increases. The reason Allison (2005) found that the binary logistic regression model approach yielded satisfactory results with N = 500 is probably that there was only one binary variable in his imputation model. In our study, there were 12 ordinal variables in the imputation model, and the variables could have more than 2 categories; in this case, N = 500 was apparently insufficient. In real-world studies, imputation models can be even more complicated than ours. Thus, although the logistic regression model approaches may be theoretically sound, their real-world applicability may be highly compromised by the sample size requirement. This problem is also noted in Schafer (1997, p. 239).

LIMITATIONS AND FUTURE DIRECTIONS
Given the scope of the study, we could not cover every analytical context that is of interest to researchers. For instance, it would be interesting to know how the strategies would perform if an ordinal variable with missing data was used to predict a continuous variable in a linear regression analysis or if the ordinal variable was the dependent variable in an ordinal logistic regression analysis. In the latter case, the normal model approach without rounding would not be applicable because the imputed values have to be discrete. As reviewed previously, Finch (2010) found that the multinomial regression model approach was inferior to the normal approach with naïve rounding in the estimation of the ordinal logistic regression coefficients. However, it is unclear how the proportional odds logistic regression and the latent variable approaches would perform in this context. A study that considers these analytical contexts is currently being conducted.
In addition, we only considered parametric models for imputation in the current study. Recent research has shown that ordinal missing data may also be imputed using nonparametric models such as random forest models (Stekhoven & Bühlmann, 2012). The random forest model is a machine learning technique that has many desirable features. First, it can be applied to a variety of categorical data types (including ordinal data) and can be used when there is a mix of categorical and continuous data (Stekhoven & Bühlmann, 2012). Second, it does not require any distributional assumption. Third, it can accommodate nonlinear relationships such as interactions among variables (Doove, van Buuren, & Dusseldorp, 2014;Shah, Bartlett, Carpenter, Nicholas, & Hemingway, 2014). This method has been implemented in R packages MICE and missForest (Doove, van Buuren, & Dusseldorp, 2014;Stekhoven & Bühlmann, 2012). It would be interesting to compare this nonparametric approach to the parametric model strategies.
One might argue that the reason the latent variable approach performed well in the current study is that it matches with the data generation mechanism: the ordinal data are generated from normally distributed latent variables. Thus it would be interesting to know how this approach would perform if the latent distributions are in fact nonnormal or if the ordinal data are generated from truly categorical distributions such as cumulative logistic distributions. We briefly investigated the former case by generating ordinal data from moderately nonnormal latent distributions (skewness = 2.25, kurtosis = 7) for a subset of the conditions. We found that the conclusion remained the same, implying that the shape of the latent variable distributions might not matter as much as the shape of the observed item distributions. We are not clear on how the imputation strategies examined in the article would perform under the latter case especially whether the logistic regression approaches would perform better if they conform to the data generation mechanism. This may be an avenue for future research.
Another limitation is that we computed the alpha coefficient based on Pearson correlations. Although this is a common practice, it is more appropriate to compute the alpha coefficient based on polychoric correlations if the items cannot be assumed to be continuous (Zumbo, Gadermann, & Zeisser, 2007). Future research needs to be conducted to examine whether the conclusion of the simulation study can be generalized to alpha coefficients computed based on polychoric correlations.
Furthermore, the items were simulated to share the same level of distribution asymmetry. In practice, the shape of item distribution is likely to vary across items. As a preliminary investigation, we mixed items with symmetric, moderately asymmetric, or severely asymmetric distributions in one imputation model for a subset of the conditions examined in the study. The results suggest that the influence of the items with asymmetric distributions seemed to be offset by the items with symmetric distributions to a certain degree. However, even with mixed distributions, the latent variable approach and the normal model approach without rounding still outperformed the other approaches. In addition, as mentioned above, we only considered positively skewed distributions. Thus the result from the simulation study particularly the direction of the biases may not be directly generalized to negatively skewed distributions. As shown in the empirical example, with negatively skewed distributions, the logistic regression approaches led to overestimated instead of underestimated regression coefficients. This issue needs to be further investigated.
Finally, we only examined Ns up to 500. Since the performance of the binary and proportional odds logistic regression approaches seemed to improve as N increased, it would be interesting to investigate whether a larger N would enable them to deliver satisfactory results. As a preliminary investigation, we examined N = 1000 for the imputation strategies for ordinal data with 3 categories. We found that N = 1000 did improve the performance of the proportional odds logistic regression approach; however, the standardized biases in the parameters from this approach were still above the threshold (40%) when the distributions were severely asymmetric. Another interesting observation is that as N increased to 1000, the normal model approach without rounding started to yield unacceptable standardized biases (>40%) in the alpha coefficient when the item distributions were severely asymmetric. In contrast, the results on all of the parameters from the latent variable approach stayed below the threshold, implying that the latent variable approach may outperform the normal model approach without rounding with a large N. Based on these preliminary results, a future study is warranted to investigate the large-sample properties of the imputation strategies.
In sum, we compared multiple imputation strategies for ordinal missing data in the context where the incomplete ordinal variables are to be aggregated to scale scores. This comparison resulted in the recommendation of the normal model approach without rounding and the latent variable model approach. In practice, researchers are often faced with data sets containing missing data on a wide range of variable types. For example, continuous and nominal missing data may coexist with ordinal missing data. It is important to note that the two recommended approaches are still applicable in this case if the nominal missing data are imputed as sets of dummy codes. Allison (2002, p. 40) suggested that missing dummy codes can be imputed under a normal model and proposed a simple process to recreate the dummy codes from the normal imputations. Nominal missing data can also be imputed as a single variable using categorical models (e.g., multinomial logistic regression model) or nonparametric models (e.g., random forest). To our knowledge, these imputation models cannot be implemented along with the latent variable approach in the existing software packages. However, they can be implemented along with the normal model approach through chained equation imputation (e.g., MICE), with ordinal and continuous missing data imputed under a normal model. A study is currently being conducted to compare these