Revisiting Savalei’s (2011) Research on Remediating Zero-Frequency Cells in Estimating Polychoric Correlations: A Data Distribution Perspective

Abstract In Savalei’s (2011) simulation that evaluated the performance of polychoric correlation estimates in small samples, two methods for treating zero-frequency cells, adding 0.5 (ADD) and doing nothing (NONE), were compared. Savalei tentatively suggested using ADD for binary data and NONE for data with three or more categories. Yet, Savalei’s suggestion could be explained by the skewness of the data distribution being severe for binary data and slight for three-category data. To rule out this alternative explanation, we extended Savalei’s design by incorporating the degree of skewness into our simulation. With slightly skewed data, NONE is recommended due to its high-quality estimates. With severely skewed data, only ADD is recommended for binary data, and only when the skewness of the two variables has the same sign and the underlying correlation is expected to be strong. Methods for improving polychoric correlation estimates with severely skewed data merit further study.


Introduction
A polychoric correlation coefficient estimates the product-moment correlation between two normally distributed variables assumed to underlie the observed ordered categorical data (Drasgow, 2005; Pearson, 1900; Pearson & Pearson, 1922; Ritchie-Scott, 1918). Specifically, the observed ordered categorical variables are hypothesized to be obtained by discretizing the underlying normal distribution with certain thresholds for each variable, and their data can be displayed in a contingency table. Based on the frequencies observed in such a table, polychoric correlations are estimated using two sets of parameters: the thresholds for each variable and the underlying correlation. These parameters can be estimated jointly in one step or using a two-step procedure, with thresholds estimated first, followed by the correlation. Although polychoric correlations are biased estimates, the biases are negligible if the contingency table contains no low-frequency cells (Olsson, 1979).
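The two-step procedure can be sketched in plain Python. This is a minimal illustration, not the routine used in this study (which relied on the psych package in R): the rectangle-probability quadrature, the truncation of infinite bounds at ±8, the 0.01 grid for the correlation, and all function names are our own simplifications.

```python
from math import inf, log, sqrt
from statistics import NormalDist

ND = NormalDist()  # standard normal

def rect_prob(a1, b1, a2, b2, rho, n_grid=200):
    """P(a1 < X <= b1, a2 < Y <= b2) under a standard bivariate normal
    with correlation rho, by midpoint quadrature over x of
    phi(x) * [Phi((b2 - rho*x)/s) - Phi((a2 - rho*x)/s)], s = sqrt(1 - rho^2)."""
    lo, hi = max(a1, -8.0), min(b1, 8.0)   # truncate +-infinite bounds
    s = sqrt(1.0 - rho * rho)
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = lo + (i + 0.5) * h
        upper = ND.cdf((min(b2, 8.0) - rho * x) / s)
        lower = ND.cdf((max(a2, -8.0) - rho * x) / s)
        total += ND.pdf(x) * (upper - lower)
    return total * h

def polychoric_two_step(table):
    """Two-step estimate: thresholds from the marginal proportions, then a
    grid search for the rho maximizing the multinomial log-likelihood."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    def thresholds(margins):
        cum, ts = 0.0, [-inf]
        for m in margins[:-1]:            # k categories need k-1 thresholds
            cum += m / n
            ts.append(ND.inv_cdf(cum))
        return ts + [inf]
    tr, tc = thresholds(row_sums), thresholds(col_sums)
    best_rho, best_ll = 0.0, -inf
    for k in range(-99, 100):             # rho in {-0.99, ..., 0.99}
        rho = k / 100.0
        ll = 0.0
        for i, row in enumerate(table):
            for j, n_ij in enumerate(row):
                p = rect_prob(tr[i], tr[i + 1], tc[j], tc[j + 1], rho)
                ll += n_ij * log(max(p, 1e-12))
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho
```

For a 2×2 table whose cell proportions exactly match a bivariate normal with both thresholds at zero and ρ = 0.5 (diagonal cells 1/3, off-diagonal cells 1/6), `polychoric_two_step([[400, 200], [200, 400]])` recovers an estimate near 0.5.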
Contingency tables with zero-frequency cells, which commonly occur at small sample sizes, can produce severely biased polychoric correlation estimates. As demonstrated by Brown and Benedetti (1977), a contingency table with one zero-frequency cell can easily be obtained from a sample of size 100 with highly skewed binary variables (e.g., a probability of 0.90 of responding 0 and 0.10 of responding 1). They further showed that even if the population correlation is zero, a near-boundary estimate (e.g., greater than 0.90) can be obtained from a contingency table with merely one zero-frequency cell. Thus, Brown and Benedetti, borrowing the concept from Yates' (1934) continuity correction, suggested adding 0.5 (ADD) to the zero-frequency cell to avoid the severely biased estimate.
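In code, the ADD correction is a one-line adjustment of the observed frequency table before estimation; a minimal sketch (the function name is ours):

```python
def add_half_to_zero_cells(table):
    """ADD correction in the sense of Brown and Benedetti (1977): add 0.5
    to every zero-frequency cell, leaving nonzero cells untouched."""
    return [[cell if cell > 0 else 0.5 for cell in row] for row in table]

# Example: a 2x2 table from a small, skewed sample with one empty cell.
observed = [[90, 8], [2, 0]]
adjusted = add_half_to_zero_cells(observed)  # [[90, 8], [2, 0.5]]
```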
Many computer programs provide options for adjusting zero-frequency cells when estimating polychoric correlations, including the R packages psych (Revelle, 2022), lavaan (Rosseel, 2012), and EFAutilities (Zhang et al., 2022), as well as EQS (Bentler & Wu, 2005) and Mplus (Muthén & Muthén, 1998-2017). Using a small binary dataset with a zero-frequency cell (i.e., N = 55 from Maydeu-Olivares & Böckenholt, 2005), Savalei (2011) demonstrated that different programs consistently yielded near-boundary estimates if no adjustment to zero-frequency cells was made (NONE), 0.866 and 1.00 from EQS and Mplus, respectively, whereas the corresponding estimates under ADD were quite different, being 0.407 and 0.373. Due to the nontrivial difference between the estimates obtained by NONE and ADD, Savalei (2011) compared the performance of these two estimators using a numerical experiment. To increase the likelihood of generating at least one zero-frequency cell, Savalei manipulated two- and three-category data distributions so that a small proportion (i.e., 6.7%, 15.9%, or 21.2%) fell into the minimal or maximal response category. In the case of two-category data, the remaining proportion was assigned to the other response category; in the case of three-category data, it was distributed evenly between the remaining two response categories. Savalei tentatively suggested using ADD for dichotomous data and NONE for data with three or more categories because the performance of ADD and NONE in dichotomous data differed from that in data with three or more categories. This suggestion has been widely adopted in empirical studies (e.g., Burnett et al., 2014; Disabato et al., 2016; Disabato et al., 2019; Jenkins et al., 2018; Lacko et al., 2022; Recio-Román et al., 2021) and simulation research (e.g., Lubbe, 2019; Schuberth et al., 2018).
However, Savalei's (2011) numerical experiment may overlook a crucial factor that affects polychoric correlation estimates in the presence of zero-frequency cells. According to Brown and Benedetti's (1977) demonstration using binary data, increasingly skewed responses would increase the likelihood of generating zero-frequency cells in a contingency table, resulting in a severely biased estimate of the polychoric correlation. Yet, Savalei did not control the degree of skewness over different numbers of categories. When the skewness of the two- and three-category data distributions manipulated by Savalei was calculated (i.e., 1.43, 1.85, or 3.45 for two-category data and 0.32, 0.43, or 0.49 for three-category data), a notable difference in skewness was observed. This difference in skewness may explain why the performance of NONE and ADD in Savalei's two-category data differed from that in the three-category data. However, to the best of our knowledge, no studies have been conducted to rule out this alternative explanation.¹ We thus reexamined whether Savalei's suggestions to use ADD for binary data and NONE for data with three or more categories were applicable when the skewness of the data distributions was considered. Given the wide-ranging impact of Savalei's research, it would be beneficial to reexamine the findings. Our study results are expected to enhance the present understanding of the NONE and ADD estimators for future studies that employ polychoric correlations with ordered categorical data.
This paper is organized as follows. First, Savalei's (2011) simulation is briefly introduced. Second, we illustrate the need to reexamine the conclusion of Savalei's (2011) research. Third, our simulation design for comparing the NONE and ADD estimators is presented, followed by a summary of the results. The paper concludes with suggestions for dealing with zero-frequency cells when estimating polychoric correlations using these two approaches.

Simulation Study of Savalei (2011)
After noticing a nontrivial disparity between the NONE (i.e., doing nothing) and ADD (i.e., adding 0.5) approaches to correcting zero-frequency cells, Savalei (2011) compared these two estimators in a numerical experiment by manipulating conditions that would yield zero-frequency cells in the EQS environment, with parameters jointly estimated using the least squares method. To increase the likelihood of generating a zero-frequency cell in a contingency table, Savalei used one extreme threshold to categorize the underlying standard normal distributions for dichotomous data. For three-category data, in addition to the one extreme threshold used for dichotomous data, Savalei considered another threshold that equally divided the larger proportion of the distribution. In addition to manipulating the degree of extreme thresholds, other variables that may affect zero-frequency cells were also examined, including the presence of same- or opposite-signed thresholds for the two variables, small sample sizes (N = 100 or 200), and population correlations (R = 0.3, 0.5, 0.7, or 0.9). Any difference between the NONE and ADD methods was caused by the treatment of zero-frequency cells. Prior to comparing the performance of the NONE and ADD estimators, Savalei (2011) checked whether simulated contingency tables would contain at least one zero-frequency cell in each condition. The likelihood of a contingency table containing at least one zero-frequency cell was greater in three-category conditions than in dichotomous conditions, and it increased further as the size of the correlation grew and the threshold became more extreme. Furthermore, conditions with opposite-signed thresholds had a higher probability than those with same-signed thresholds. The effect of sample size appeared to be minor under the simulated conditions. Consequently, it was anticipated that the difference between the two estimators would be proportional to the number of categories, strength of correlation, degree of extreme thresholds, and presence of same- or opposite-signed thresholds.
To evaluate these two estimators, Savalei (2011) considered the quality of correlation estimates (i.e., bias, empirical standard error, and the empirical distribution of correlation estimates) and the quality of standard error estimates (i.e., the ratio of estimated to empirical standard error and the coverage of 95% confidence intervals). According to Savalei's results, the two estimators performed differently depending on the number of data categories. Under two-category data, ADD yielded biased estimates with small empirical standard errors, particularly at weak correlations (i.e., 0.3 and 0.5), while NONE yielded severely biased estimates with large empirical standard errors. Moreover, as the extreme thresholds became severe and the signs of the thresholds were in opposite directions, the empirical distribution of NONE turned out to be bimodal with certain proportions of near-boundary estimates (i.e., values greater than 0.90), whereas the distribution of ADD remained unimodal without any near-boundary estimates. On the contrary, under three-category data, using NONE instead of ADD yielded unbiased estimates and comparable empirical standard errors in all conditions. Based on these findings, Savalei thus recommended that researchers use ADD for two-category data and NONE for three-category data.

Reexamining Savalei's (2011) Findings from a Data Distribution Perspective

Savalei's (2011) tentative suggestion on what to do about zero-frequency cells in estimating polychoric correlations has been popular in the scientific literature (e.g., Disabato et al., 2019; Jenkins et al., 2018; Lacko et al., 2022; Lubbe, 2019; Recio-Román et al., 2021; Schuberth et al., 2018). However, due to Savalei's manipulation of thresholds, the skewness of the simulated data under two and three categories varied greatly, signaling the need to reevaluate Savalei's suggestion. For dichotomous data, having only one extreme threshold in a contingency table could result in a severely skewed distribution; however, for three-category data, using an additional threshold that halved the larger portion would result in a distribution with merely a slight degree of skewness. The slight skewness in three-category data would lead to well-performing NONE estimates of polychoric correlation because these data contain too few zero-frequency cells to yield near-boundary estimates under the NONE option.

1
Our review revealed that Savalei's (2011) research has been widely referenced in one of two contexts: (1) to support the use of ADD (e.g., Disabato et al., 2019; Jenkins et al., 2018; Lacko et al., 2022; Lubbe, 2019; Recio-Román et al., 2021; Schuberth et al., 2018) or (2) to substantiate the argument that tables with zero-frequency cells can result in biased polychoric correlation estimates (e.g., Bainter & Forster, 2019; DiStefano et al., 2018; DiStefano et al., 2021; Flora et al., 2012; Olvera Astivia, 2013; Pendergast et al., 2017; Yang & Xia, 2019).
As shown in Table 1, in Savalei's design, distributions for two-category data could be severely skewed (beyond an absolute skewness of 1.43), with a small proportion (i.e., 0.21, 0.16, or 0.07) in one category and a large proportion (i.e., 0.79, 0.84, or 0.93) in the other, resulting in increasing bias in the correlation estimates from NONE as the skewness becomes extreme and opposite-signed. In contrast, distributions for three-category data were merely slightly skewed (below an absolute value of 0.49), with the same small proportion as in the dichotomous data in the minimal or maximal response category, but with the remainder distributed equally to the other two categories (i.e., 0.40, 0.42, or 0.47). Even though such manipulation would result in a higher likelihood of containing at least one zero-frequency cell in three-category data than in two-category data, the polychoric correlation estimates were relatively unbiased. Based on this observation, the different performance of NONE under data with two or three categories may be due to inadequate manipulation of skewness.
To explore whether skewness influences the bias of correlation estimates from NONE, we controlled the degree of skewness for two- and three-category data to be slight or severe. As displayed in Table 1, skewness better explains the magnitude of bias in polychoric correlation estimates from NONE than does the likelihood of a contingency table containing at least one zero-frequency cell. Under both two- and three-category data, slight skewness led to negligible bias, whereas severe skewness resulted in trivial to significant bias. With severely skewed data, the bias increased as skewness grew, and it deteriorated when the skewness was of opposite sign. Generally, an increase in skewness results in an increase in zero-frequency cells, thereby exacerbating the bias of polychoric correlation estimates from NONE. Notably, a high likelihood of containing at least one zero-frequency cell in a contingency table does not guarantee a severely biased estimate from NONE. Instead, conditional on each number of categories, the average number of zero-frequency cells appears to be proportional to the bias of NONE. Based on this preliminary investigation, skewness appears to be a strong predictor of NONE's performance for two- and three-category data in the presence of zero-frequency cells. Therefore, the suggestion made by Savalei (2011) regarding the performance of the two estimators (NONE and ADD) must be revisited by controlling for the degree of skewness over different numbers of categories of data.

Note (Table 1). The population correlation was set to 0.30 with a sample size of 100. The number of replications was set to 2,000. Since the category proportions for negative skewness are the same as those for positive skewness with only their labels reversed, we present the category proportions for positive skewness. Likelihood: the likelihood of a contingency table containing at least one zero-frequency cell. Average: the average number of zero-frequency cells in a contingency table.
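The skewness values discussed above follow directly from the category proportions: scoring the categories 0, 1, …, k − 1 (skewness is invariant to the linear scoring chosen), the skewness of the discretized variable is its third standardized central moment. A minimal check in Python (the function name is ours):

```python
def categorical_skewness(props):
    """Skewness of an ordinal variable scored 0, 1, ..., k-1 with the
    given category proportions (third standardized central moment)."""
    mean = sum(i * p for i, p in enumerate(props))
    var = sum(p * (i - mean) ** 2 for i, p in enumerate(props))
    m3 = sum(p * (i - mean) ** 3 for i, p in enumerate(props))
    return m3 / var ** 1.5

# A 93.3%/6.7% binary split gives skewness of about 3.46, matching the 3.45
# reported for Savalei's most extreme binary condition up to rounding; the
# three-category split with the larger portion halved (46.65%/46.65%/6.7%)
# gives about 0.49.
binary_skew = categorical_skewness([0.933, 0.067])
three_cat_skew = categorical_skewness([0.4665, 0.4665, 0.067])
```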

Purpose of This Study
This study aimed to reexamine the performance of the polychoric correlation estimates obtained by doing nothing (NONE) and by adding 0.5 (ADD) to zero-frequency cells in a contingency table from the perspective of the data distribution. The reexamination was facilitated by conducting a Monte Carlo simulation that replicated Savalei's (2011) study conditions while controlling for the degree of skewness. The slight and severe skewness observed in Savalei's three- and two-category data were considered in simulating our data distributions. The simulation was extended in two respects. First, four-category data were included to cover the range of ordered categorical data that simulation researchers have suggested should be considered discrete (e.g., Preston & Colman, 2000). Second, a zero correlation was included for two reasons: (a) in the presence of zero-frequency cells, researchers may erroneously conclude, on the basis of a polychoric correlation, that two unrelated ordered categorical variables are highly correlated (e.g., Brown & Benedetti, 1977), which would hinder scientific progress; and (b) when applying parallel analysis to ordered categorical data, it is recommended that researchers use polychoric correlations on randomly generated data to determine the baseline for selecting the number of factors (Garrido et al., 2013; Lubbe, 2019; Timmerman & Lorenzo-Seva, 2011). If simulated polychoric correlations on randomly generated data are biased, then the results of parallel analysis will be compromised.
The performance of NONE and ADD was compared in terms of the quality of their respective parameter estimates. The quality of the parameter estimates was examined using the same dependent variables as in Savalei's research: biases, empirical standard errors, and empirical distributions of the two estimators.

Methods
The following sections describe the manipulated variables, the data generation and analysis procedures, and the dependent variables used in this simulation study.

Population Conditions
We manipulated five sizes of correlation between the two underlying variables: 0.0, 0.3, 0.5, 0.7, and 0.9. The number of categories was set to two, three, or four. Two types of skewness (slight and severe) were considered, each consisting of six levels. Slight skewness was set to 0.49, 0.43, 0.32, −0.32, −0.43, or −0.49, and severe skewness was set to 3.45, 1.85, 1.43, −1.43, −1.85, or −3.45. The levels of slight and severe skewness were based on the levels used in Savalei's design for three- and two-category data, respectively.
Under each type of skewness, all pairwise combinations of the six levels of skewness were examined, with the exception of nine that were deleted because they had the same skewness as other combinations with only the signs of the skewness of the two variables reversed (e.g., 3.45/3.45 and −3.45/−3.45 for the skewness of the first/second variable). Therefore, for each type of skewness, 12 skewness combinations were considered, half of which had the same sign of skewness and the other half the opposite sign. All skewness values, along with their corresponding kurtosis values, manipulated thresholds, and marginal probabilities, are summarized in Table 2 and Figure 1. A total of 360 population conditions (5 correlations × 3 numbers of categories × 2 types of skewness × 12 combinations of skewness) were constructed.
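The manipulated thresholds are recoverable from the target marginal probabilities: each threshold is the standard normal quantile of the cumulative proportion below it. A sketch using Python's `statistics.NormalDist` (the function name is ours):

```python
from statistics import NormalDist

def thresholds_from_props(props):
    """Thresholds that discretize a standard normal variable into
    categories with the given marginal proportions."""
    nd, cum, thresholds = NormalDist(), 0.0, []
    for p in props[:-1]:          # k categories need k-1 thresholds
        cum += p
        thresholds.append(nd.inv_cdf(cum))
    return thresholds

# A 93.3%/6.7% binary split needs a single threshold near 1.50; halving the
# larger (93.3%) portion adds a second threshold near -0.08.
binary_ts = thresholds_from_props([0.933, 0.067])            # about [1.50]
three_cat_ts = thresholds_from_props([0.4665, 0.4665, 0.067])  # about [-0.08, 1.50]
```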

Data Generation and Analyses
In each population condition, 2,000 datasets were sampled with the sample size set at 100 or 200. All data generation and analyses were conducted in R version 4.1.1 (R Core Team, 2022). Polychoric correlations were estimated with two-step maximum likelihood estimation (Olsson, 1979) using the polychoric function in the R package psych (Revelle, 2022). Every dataset was analyzed with the NONE method (leaving zero-frequency cells as they were) and the ADD method (adding 0.5 to zero-frequency cells).
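The data generation step can be mimicked in plain Python (the study itself used R): draw correlated standard normal pairs via Y = ρX + √(1 − ρ²)Z and cut each coordinate at its thresholds. The function names and the bisect-based categorization are our own illustrative choices.

```python
import random
from bisect import bisect_left

def generate_ordinal_pairs(n, rho, thresholds_x, thresholds_y, seed=1):
    """Sample n pairs from a standard bivariate normal with correlation
    rho, then discretize each coordinate at the given (sorted) thresholds."""
    rng = random.Random(seed)
    s = (1.0 - rho * rho) ** 0.5
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)   # corr(X, Y) = rho
        pairs.append((bisect_left(thresholds_x, x), bisect_left(thresholds_y, y)))
    return pairs

def contingency_table(pairs, n_cat_x, n_cat_y):
    """Cross-tabulate the discretized pairs into a frequency table."""
    table = [[0] * n_cat_y for _ in range(n_cat_x)]
    for i, j in pairs:
        table[i][j] += 1
    return table
```

For example, `contingency_table(generate_ordinal_pairs(100, 0.3, [1.50], [1.50]), 2, 2)` produces the kind of small, skewed 2×2 table in which zero-frequency cells are common.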

Dependent Variables
Under each condition, the biases and empirical standard errors of the correlation estimates derived from NONE and ADD were evaluated and compared. The bias of the polychoric correlation estimates was calculated as ρ̄̂ − ρ, where ρ̂ is the estimate of ρ (the population correlation) and ρ̄̂ is the mean of ρ̂ across all the datasets in a condition. The empirical standard error of the polychoric correlation estimates was calculated as √(Σᵣ(ρ̂ᵣ − ρ̄̂)²/(Rep − 1)), where ρ̂ᵣ is the estimated correlation from the rth replication and Rep is the number of replications (i.e., 2,000).
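These two dependent variables reduce to a few lines of Python; this sketch assumes, as is standard, the Rep − 1 denominator for the empirical standard error (the function name is ours):

```python
from math import sqrt

def bias_and_ese(estimates, rho):
    """Bias (mean replicate estimate minus the population correlation) and
    the empirical standard error of a set of replicate estimates."""
    rep = len(estimates)
    mean_est = sum(estimates) / rep
    bias = mean_est - rho
    ese = sqrt(sum((e - mean_est) ** 2 for e in estimates) / (rep - 1))
    return bias, ese

# e.g., two replicates 0.2 and 0.4 around rho = 0.3 give
# bias = 0.0 and ese = sqrt(0.02), about 0.141.
```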

Results
The results for each dependent variable are presented in the same order as in Savalei's (2011) study on NONE and ADD to facilitate comparison. To gain an initial understanding of the performance of the two methods, the tendency of zero-frequency cells generated under all conditions is analyzed first. Biases and empirical standard errors are then presented to evaluate the quality of the NONE and ADD estimates, followed by an examination of near-boundary estimates through visualizing the empirical distributions of the estimates obtained from the two methods. In each section, we first compare our results to Savalei's under identical conditions to ensure successful replication, and then analyze the performance of the two methods with skewness controlled over data with varying numbers of categories. The findings for a sample size of 200 revealed trends similar to those for a sample size of 100, with only a minor improvement in the quality of the correlation estimates obtained from the two methods, and are presented in Appendix A.

Zero-Frequency Cells
The likelihood of at least one zero-frequency cell in the data, as shown in Appendix B, was identical to what Savalei (2011, Tables 2 and 3 on pp. 258-259) found under the same conditions. Generally, the likelihood increased with increasing correlation and skewness and was greater for two variables of opposite-signed skewness than of same-signed skewness. Moreover, three-category data had a greater likelihood than two-category data. Despite the successful replication of the likelihood of the occurrence of zero-frequency cells, such a likelihood has low explanatory power for polychoric correlation estimates. Rather, the average number of zero-frequency cells is a better indicator of the performance of NONE and ADD. It is expected that the disparities between the two methods mirror the trends in the rise of the number of zero-frequency cells.
The average number of zero-frequency cells is presented in Table 3. As known from the preceding section (see also Table 1), the increase in zero-frequency cells from two- to four-category data was expected. Within each number of categories, the number of zero-frequency cells continued to rise as the degree of skewness increased and the sign of skewness changed from same to opposite, accompanied by a slight increase with the size of the correlation. These findings showed that, for data with a particular number of categories, skewness dominated the number of zero-frequency cells, supporting the notion that skewness should be accounted for when the performance of NONE and ADD is evaluated. Given the salient impact of the degree of skewness, the following results are presented separately for slight and severe skewness.
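The two diagnostics compared here, the likelihood of at least one zero-frequency cell and the average number of zero-frequency cells, are straightforward to compute across replications; a sketch (the function name is ours):

```python
def zero_cell_diagnostics(tables):
    """Given a list of contingency tables (lists of rows), return the
    proportion of tables with at least one zero-frequency cell and the
    average number of zero-frequency cells per table."""
    zero_counts = [sum(cell == 0 for row in t for cell in row) for t in tables]
    likelihood = sum(c > 0 for c in zero_counts) / len(tables)
    average = sum(zero_counts) / len(tables)
    return likelihood, average

# e.g., one clean table and one with two empty cells:
# zero_cell_diagnostics([[[5, 3], [2, 1]], [[9, 0], [0, 2]]]) -> (0.5, 1.0)
```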

Bias of Polychoric Correlation Estimates
Table 4 displays the biases of the polychoric correlation estimates of NONE and ADD with slight skewness. For three-category data, our results successfully replicated Savalei's (2011, Table 5 on p. 263) findings that estimates of NONE were unbiased under all conditions and those of ADD tended to be underestimated, especially for variables with opposite-signed skewness and a strong correlation. Even after we extended the slight skewness to two and four categories, NONE still yielded unbiased estimates. In contrast, the severity of the underestimation of ADD increased with the number of categories. As a result, NONE appears to provide unbiased estimates in the presence of slight skewness, whereas ADD introduces additional bias.

Table 5 shows that the estimation biases of NONE and ADD were greater under severe skewness than under slight skewness. Our dichotomous results mirrored those of Savalei (2011, Table 4 on p. 261), who found that for same-signed skewness, estimates of NONE were slightly more biased than those of ADD, while for opposite-signed skewness, NONE overestimated but ADD underestimated. After we extended the severe skewness to three and four categories, the pattern was slightly different. Generally, for same-signed skewness, NONE became relatively unbiased except for the underestimation at small correlations of 0.0 and 0.3 with at least one variable being extremely skewed (i.e., 3.45), whereas ADD tended to overestimate the smallest correlations of 0.0 and 0.3 while underestimating the largest correlation of 0.9. For opposite-signed skewness, the estimates of NONE and ADD exhibited severe biases, with NONE overestimating and ADD underestimating. However, as the number of categories and the strength of correlation increased, NONE tended to improve while ADD deteriorated.
Overall, the bias of the NONE and ADD estimates varied with skewness. Under slight skewness, estimates of NONE were unbiased and outperformed those of ADD regardless of the number of categories in the data. In contrast, under severe skewness, ADD performed better than NONE only for binary data with same-signed skewness, consistent with Savalei (2011), and its performance deteriorated as the number of categories and the strength of correlation increased.

Empirical Standard Error of Polychoric Estimates
Table 6 displays the empirical standard errors of the polychoric estimates of NONE and ADD with slightly skewed data. Savalei (2011) also manipulated three-category data with slight skewness, and our results were consistent with Savalei's (see Table 5 on p. 263) in that the empirical standard errors of NONE and ADD were comparable. As the number of categories was varied to two or four, the empirical standard errors of the two methods remained similar, and both mainly decreased as the strength of correlation and the number of categories increased.
As shown in Table 7, with severely skewed data, the dichotomous results were also consistent with the findings of Savalei (2011, Table 6 on p. 264) in that ADD had the advantage of smaller empirical standard errors than NONE, especially when the correlation was weak (0.0-0.5). After extending the severe skewness to conditions involving a larger number of categories, we found that the performance of both NONE and ADD improved and that NONE remained inferior to ADD. Despite the superior performance of ADD over NONE, the empirical standard errors were relatively high for binary data in the presence of weak correlations (e.g., 0.0-0.5 for same-signed skewness; 0.0-0.3 for opposite-signed skewness).

Distribution of Polychoric Estimates
According to Savalei (2011), examining the entire shape of the distribution of polychoric estimates can reveal whether the empirical distribution of estimates from the two methods reasonably approximates a normal distribution. For brevity, we selected a small number of levels from each manipulated variable: both slight and severe skewness, one skewness combination under same- and opposite-signed skewness (0.49/0.43 and 0.49/−0.43 for slight skewness; 3.45/1.85 and 3.45/−1.85 for severe skewness), correlations of 0.0, 0.3, and 0.7, and either two or four categories.
Figure 2 depicts the distributions of NONE and ADD for these selected conditions. For slight skewness, NONE and ADD showed perfect bell-curved shapes under all conditions, whereas for severe skewness, NONE exhibited bimodal distributions while ADD displayed unimodal distributions. As also mentioned in Savalei (2011), the bimodal distributions could be viewed as a mixture of two components; accordingly, we elaborate on how the bimodal distribution changed under our simulated conditions based on severely skewed data.
For NONE, the bimodal distributions showed different patterns for same- and opposite-signed skewness. For same-signed skewness, a large proportion of near-boundary estimates at −1.0 could be found under two-category data with a zero correlation, and the proportion decreased as the number of categories and the strength of correlation increased. It is further noteworthy that, except for those negative near-boundary estimates, the remainder of the estimates seemed to remain unbiased. For opposite-signed skewness, similar to Savalei's findings (2011, Figure 1, p. 266), one of the components consisted of near-boundary estimates at 1.0, with the proportion decreasing as the number of categories increased, which could explain the decline in the bias of NONE. Also, aside from the near-boundary estimates, the peaks of the distribution of non-near-boundary estimates seemed to be close to their true values. As a result, while NONE would yield near-boundary estimates in the presence of severe skewness, the other, non-near-boundary estimates appeared to be reliable. For ADD, although no near-boundary estimates were found and the distributions of ADD were unimodal, the peaks of ADD's distributions deviated from their true values, especially for opposite-signed skewness with four-category data at a correlation of 0.7, consistent with the severe underestimation observed in the bias section.

Summary
In this simulation, we successfully replicated Savalei's (2011) findings under two- and three-category data in that the same conclusions were reached. Under two-category data with severe skewness, the two methods both yielded severely biased estimates, but ADD outperformed NONE due to its relatively small empirical standard error and unimodal distribution. Under three-category data with slight skewness, NONE performed better than ADD because it provided unbiased estimates and had an empirical standard error extremely similar to that of ADD.
To investigate whether Savalei's (2011) conclusion still holds after controlling for the severity of skewness, we extended the severe skewness to data with three and four categories and the slight skewness to data with two and four categories. The results showed that the degree of skewness affected the performance of both methods. Under slight skewness, NONE yielded unbiased estimates in all conditions, while ADD tended to underestimate and became more biased as the number of categories increased. Also, NONE had nearly identical empirical standard errors to ADD. Under severe skewness, NONE and ADD both yielded severely biased estimates when the two variables were skewed in opposite directions. When the two variables were skewed in the same direction, NONE yielded relatively unbiased estimates under three- and four-category data but was accompanied by large empirical standard errors. ADD, despite being relatively unbiased under binary data, had large empirical standard errors in the presence of weak correlations.
Based on the visualization of the distributions of estimates, the estimates of NONE appeared to be bimodal mostly under severe skewness, with portions of near-boundary estimates at −1.0 or 1.0 for same- and opposite-signed skewness, respectively. However, after those near-boundary estimates were excluded, the other estimates seemed to be only slightly biased. Compared to NONE, although ADD yielded estimates that approximated a bell-curved shape under all simulated conditions, the estimates of ADD seemed to deviate from the true values.
Overall, NONE yielded high-quality estimates for slightly skewed data but obtained certain proportions of nearboundary estimates for severely skewed data.Except for dichotomous variables that are severely skewed in the same direction and highly correlated (i.e., greater than 0.5), ADD did not improve the quality of correlation estimates but rather introduced additional bias.Given that NONE (doing nothing) performed poorly for severely skewed data and ADD (adding 0.5 to zero cells) can be used only for dichotomous variables that are severely skewed in the same direction and highly correlated, it would be beneficial to investigate additional methods or options other than ADD.We thus searched for variations of the ADD estimator, as well as other methods and options available in R packages and investigated the performance of those additional methods/estimators, relative to the ADD estimator.Our search identified three additional estimators: (1) adding 0.5 to all cells (ADD.All; see Bonett & Price, 2005, 2007;Long et al., 2009), (2) adding 0.5 to zero cells with margins being held constant in tables with two categories (ADD.Margin; the option of zero.keep.margins in the function lavCor of the R package lavaan, Rosseel, 2012), and (3) adding 1/(#cells in a contingency table) to all cells (ADD#Cell; the default in the function get.RGamma of the R package EFAutilities; Zhang et al., 2022).Using the same design as in the Method section, we conducted an additional simulation to investigate the performance of these three estimators against ADD, especially under severe skewness for which correcting zero-frequency cells was required.Our findings indicated that ADD.All and ADD.Margin performed nearly identically with ADD.The ADD#Cell estimator exhibited lower biases than ADD; yet it consistently yielded larger empirical standard errors than ADD.Based on these results, the three additional estimators (i.e., ADD.All, ADD.Margin, and ADD#Cell) did not exhibit superior performance 
over ADD.In summary, none of these three additional estimators are available for correcting zerofrequency cells under severely skewed data 3 , and ADD is acceptable for data conditions with two categories when variables are skewed in same direction and highly correlated.We present the three additional estimators' biases, empirical standard errors, and empirical distributions in Supplementary Materials.
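To make the compared corrections concrete, the following Python sketch applies each cell-frequency adjustment to a 2x2 table with one zero-frequency cell. The function names are ours; the original implementations live in the cited R packages, and lavaan's ADD.Margin variant, which additionally rescales to preserve the margins, is omitted here for brevity.

```python
def add_half_to_zero_cells(table):
    """ADD: add 0.5 only to zero-frequency cells (the estimator studied by Savalei, 2011)."""
    return [[cell if cell > 0 else cell + 0.5 for cell in row] for row in table]

def add_half_to_all_cells(table):
    """ADD.All: add 0.5 to every cell (see Bonett & Price, 2005)."""
    return [[cell + 0.5 for cell in row] for row in table]

def add_inverse_cell_count(table):
    """ADD#Cell: add 1/(number of cells) to every cell (EFAutilities default)."""
    n_cells = sum(len(row) for row in table)
    return [[cell + 1.0 / n_cells for cell in row] for row in table]

# A 2x2 contingency table with one zero-frequency cell.
table = [[50, 25],
         [25, 0]]

print(add_half_to_zero_cells(table))   # [[50, 25], [25, 0.5]]
print(add_half_to_all_cells(table))    # [[50.5, 25.5], [25.5, 0.5]]
print(add_inverse_cell_count(table))   # [[50.25, 25.25], [25.25, 0.25]]
```

Each adjusted table is then passed to the usual maximum-likelihood polychoric estimation; only the treatment of the cell frequencies differs among the estimators.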

Discussion
Dealing with zero-frequency cells in contingency tables is critical in polychoric correlation estimation. Savalei (2011) examined the performance of two approaches for dealing with zero-frequency cells, NONE (doing nothing) and ADD (adding 0.5 to zero-frequency cells), given two- and three-category data. Savalei further found that for dichotomous data, ADD yielded unbiased and stable estimates under correlations of 0.3 and 0.5, whereas NONE yielded severely biased and unstable estimates with a bimodal distribution due to near-boundary estimates, as also found by Brown and Benedetti (1977). For three-category data, NONE produced unbiased estimates whereas ADD tended to underestimate, and the empirical standard errors of both methods were small. Based on these results, Savalei (2011) suggested using ADD for dichotomous data, owing to its relatively acceptable quality of polychoric correlation estimates, and NONE for three-category data to obtain unbiased estimates of polychoric correlations. However, Savalei (2011) did not control for the skewness of the data across different numbers of categories: the skewness was severe in the dichotomous data but slight in the three-category data. Therefore, this study reexamined the performance of NONE and ADD while controlling for skewness across different numbers of categories.

2 We sincerely thank Dr. Savalei for her constructive comments on expanding the scope of our work to include additional methods and options beyond NONE and ADD, which performed poorly under severely skewed data in our simulations.

3 We also examined the performance of the three additional estimators for slightly skewed data and found that none of them outperformed NONE. Therefore, under slight skewness, NONE is still recommended.
In this study, we first successfully replicated Savalei's (2011) results under severe skewness with two-category data and slight skewness with three-category data. When we extended both degrees of skewness to different numbers of categories, NONE and ADD performed differently under slight and severe skewness. Under slight skewness, NONE was unbiased and had small empirical standard errors, whereas ADD, despite its small empirical standard errors, tended to underestimate, and this underestimation became more severe as the number of categories increased. In contrast, under severe skewness, both NONE and ADD yielded severe biases when the two variables were skewed in opposite directions. When the two variables were skewed in the same direction, NONE was relatively unbiased for three- and four-category data, but the estimates were extremely unstable, with a certain proportion of near-boundary estimates. Even though ADD performed well for binary data with strong correlations (greater than 0.5), its performance deteriorated as the number of categories increased. Based on our findings, the degree of skewness confounded Savalei's suggestion to use ADD for dichotomous data and NONE for three-category data. We instead suggest that with slightly skewed data, NONE can be applied to any number of categories because of its high-quality polychoric correlation estimates. With severely skewed data, however, NONE is not recommended because of the possibility of obtaining near-boundary estimates under either same-signed or opposite-signed skewness. ADD (i.e., adding 0.5 to zero-frequency cells), in turn, is tentatively recommended only for dichotomous variables that are severely skewed in the same direction and highly correlated (greater than 0.5), because ADD tended to introduce additional biases into the estimates in other conditions.
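The near-boundary behavior of NONE can be reproduced in a few lines. The sketch below is an illustrative, stdlib-only reimplementation (the function names and the grid-search simplification are ours, not the code used in the simulations): thresholds are taken from the margins, and the correlation is chosen by a grid search maximizing the multinomial log-likelihood of a 2x2 table whose high-high cell is empty.

```python
import math
from statistics import NormalDist

ND = NormalDist()  # standard normal: pdf, cdf, inv_cdf

def bvn_cdf(a, b, rho, steps=400):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho,
    via midpoint-rule integration of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2))."""
    lo = -8.0  # effectively -infinity for the standard normal
    h = (a - lo) / steps
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += ND.pdf(x) * ND.cdf((b - rho * x) / scale)
    return total * h

def cell_probs(tau1, tau2, rho):
    """Probabilities of the four cells of a 2x2 table (low/high on each variable)."""
    p11 = bvn_cdf(tau1, tau2, rho)   # low-low
    p1 = ND.cdf(tau1)                # P(X <= tau1)
    p2 = ND.cdf(tau2)                # P(Y <= tau2)
    return p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11

def tetrachoric_none(table):
    """Two-step NONE estimate: thresholds from the margins, then a grid search
    over rho maximizing the multinomial log-likelihood of the observed counts."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    tau1 = ND.inv_cdf((n11 + n12) / n)  # threshold of the row variable
    tau2 = ND.inv_cdf((n11 + n21) / n)  # threshold of the column variable
    best_rho, best_ll = 0.0, -math.inf
    for k in range(-99, 100):           # rho grid: -0.99, -0.98, ..., 0.99
        rho = k / 100.0
        ll = 0.0
        for count, p in zip((n11, n12, n21, n22), cell_probs(tau1, tau2, rho)):
            if count > 0:
                ll += count * math.log(max(p, 1e-300))
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho

# The high-high cell is empty: no case is high on both variables, so the
# likelihood keeps increasing as rho approaches the -1 boundary, and the
# grid search returns the most extreme negative value it can reach.
print(tetrachoric_none([[50, 25], [25, 0]]))
```

With the zero cell present, the estimate lands at the edge of the grid; with all cells occupied (e.g., [[40, 10], [10, 40]]), the same routine returns an interior, well-behaved estimate.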
It should be noted that the recommendations of Savalei (2011) and of this study differ under the same simulated conditions. For binary data with severe skewness, Savalei tentatively suggested using ADD unless the two observed variables had opposite-signed skewness and were highly correlated. We, however, recommend ADD under more restricted conditions in light of two of our findings. First, estimates of ADD tended to be severely biased when the two variables had opposite-signed skewness and were slightly correlated (less than 0.5). Second, estimates of ADD were unstable under same-signed skewness with weak correlations. Therefore, we conservatively suggest that ADD be used only for data with same-signed skewness and a strong correlation.
When we set the correlation to zero, the quality of the polychoric correlation estimates was low for both NONE and ADD, particularly for severely skewed data. NONE yielded a high proportion of near-boundary estimates. Although ADD produced no near-boundary estimates, it also yielded severely biased estimates. These findings indicate that researchers may incorrectly conclude that two unrelated variables are highly correlated when either NONE or ADD is used. The performance of parallel analysis with ordered categorical data could likewise be compromised when the polychoric correlation is used to estimate the correlations among randomly generated data. Studies that suggest using polychoric correlations with ordinal data for parallel analysis (e.g., Garrido et al., 2013; Lubbe, 2019; Timmerman & Lorenzo-Seva, 2011) may therefore require reexamination under severely skewed data distributions.
As noted by Brown and Benedetti (1977) and Savalei (2011), zero-frequency cells can result in severely biased polychoric estimates. In this study, we further found that the polychoric correlation is inappropriate under severe skewness with a small sample size, and that adding 0.5 to zero-frequency cells did not improve the quality of the estimates. Under such challenging data conditions, a natural question arises: under severe skewness, what sample size is required to achieve well-behaved polychoric correlation estimates? Increasing the sample size is expected to raise the frequencies in the contingency table, reduce the number of zero-frequency cells, and thereby may lead to proper polychoric correlation estimates. However, a modest increase in the sample size in our study, from 100 to 200, showed only a trivial improvement in the quality of the estimates. The identification of the sample size needed in each condition therefore warrants future study.
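The link between sample size and empty cells can be made explicit: under a multinomial model, a cell with probability p stays empty in a sample of size N with probability (1 − p)^N, so the expected number of zero-frequency cells is the sum of these terms over cells. A minimal sketch, with hypothetical cell probabilities chosen to mimic two severely skewed dichotomous variables (the probabilities are our illustrative choice, not values from the simulation):

```python
def expected_zero_cells(cell_probs, n):
    """Expected number of zero-frequency cells in a multinomial sample of size n:
    E[#zero cells] = sum over cells of (1 - p)**n (exact by linearity of expectation)."""
    return sum((1.0 - p) ** n for p in cell_probs)

# Hypothetical 2x2 cell probabilities for two severely skewed binary variables.
probs = [0.80, 0.12, 0.07, 0.01]

for n in (100, 200, 500):
    print(n, round(expected_zero_cells(probs, n), 3))
```

With these probabilities the rare cell (p = 0.01) dominates the expectation, and doubling N from 100 to 200 still leaves a nontrivial chance of an empty cell, consistent with the modest improvement observed above.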
According to the current study's findings, when the skewness of two variables is severe, the results of polychoric correlation estimation cannot be relied upon because of the high probability of obtaining near-boundary estimates. Once the near-boundary estimates were excluded, however, the remaining estimates appeared unbiased or only slightly biased. It is thus possible that the polychoric correlation will provide accurate estimates in the presence of severe skewness if the data sets that lead to near-boundary estimates can be identified and excluded. Future research could therefore focus on developing a method for identifying and excluding such data sets, thereby facilitating reliable results.
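A simple operationalization of this exclusion idea is to flag estimates whose absolute value exceeds some cutoff; the cutoff of 0.99 below is our illustrative choice, not a validated screening rule.

```python
def screen_near_boundary(estimates, cutoff=0.99):
    """Split correlation estimates into retained and near-boundary groups."""
    retained = [r for r in estimates if abs(r) <= cutoff]
    flagged = [r for r in estimates if abs(r) > cutoff]
    return retained, flagged

estimates = [0.31, 0.28, 0.999, 0.35, -0.995, 0.30]
retained, flagged = screen_near_boundary(estimates)
print(retained)  # [0.31, 0.28, 0.35, 0.3]
print(flagged)   # [0.999, -0.995]
```

A validated method would also need to decide whether a flagged estimate reflects a genuinely strong correlation or a zero-cell artifact, which is the open problem noted above.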
Given that NONE and ADD, as well as the three additional estimators (i.e., ADD.All, ADD.Margin, and ADD#Cell), failed to address the severe bias caused by zero-frequency cells in the presence of severe skewness, it would be beneficial to develop new methods for correcting zero-frequency cells in the contingency table. Until such a method is developed, researchers should be cautious when interpreting polychoric correlations computed from highly skewed data. This research has several limitations. First, it did not consider moderate degrees of skewness for the two variables (i.e., between 0.49 and 1.43), a range of skewness commonly observed in empirical studies (see Cain et al., 2017). Second, extremely small sample sizes (e.g., 50), which do occur in empirical studies, were not included in our simulation. Researchers are therefore encouraged to examine a wider range of conditions in future studies to determine whether the present conclusions generalize.

Figure 1. Marginal probabilities for different numbers of categories and types of skewness.

Figure 2. Distribution of correlation estimates of NONE and ADD under same- or opposite-signed skewness, correlations of 0.0, 0.3, or 0.7, and two or four categories with N = 100. The distributions of NONE and ADD are presented in blue and red, respectively. True values are shown by dashed lines. Frequencies exceeding the upper bound (500) are displayed on the bin. R = correlation; C = number of categories in the data.

Table 1. Number of zero-frequency cells and bias of polychoric correlation estimates for NONE under two- and three-category data with different skewness and their corresponding category proportions.

Table 2. Two types of skewness under different numbers of categories, along with kurtosis, thresholds, and marginal probabilities.

Table 3. Average number of zero-frequency cells in a data set under different numbers of categories, sizes of correlations, and combinations of skewness at N = 100. ±0.49 and ±3.45 are shown in bold; ±1.43 and ±0.32 are shown in italic.

Table 4. Bias of polychoric correlation estimates of NONE and ADD for different numbers of categories, sizes of correlations, and combinations of slight skewness at N = 100.

Table 5. Bias of polychoric correlation estimates of NONE and ADD for different numbers of categories, sizes of correlations, and combinations of severe skewness at N = 100.

Table 6. Empirical standard error of estimates of NONE and ADD for different numbers of categories, sizes of correlations, and combinations of slight skewness at N = 100.

Table 7. Empirical standard error of estimates of NONE and ADD for different numbers of categories, sizes of correlations, and combinations of severe skewness at N = 100. Values less than 0.10 are shown in italic. Values greater than 0.18 are shaded in gray, and values over 0.25 are further displayed in bold.