Rotation group bias and the persistence of misclassification errors in the Current Population Surveys

Abstract We develop a general misclassification model to explain the so-called “Rotation Group Bias (RGB)” problem in the Current Population Surveys, where different rotation groups report different labor force statistics. The key insight is that responses to repeated questions in surveys can depend not only on unobserved true values, but also on previous responses to the same questions. Our method provides a framework to understand why unemployment rates in rotation group one are higher than those in other rotation groups in the CPS, without imposing any a priori assumptions on the existence and direction of RGB. Using our method, we provide new estimates of the U.S. unemployment rates, which are much higher than the official series, but lower than previous estimates that ignored persistence in misclassification.


Introduction
The Current Population Survey (CPS) has been the primary source of labor force statistics for the United States since the 1940s. A prominent feature of the CPS is its 4-8-4 rotating panel design, where an individual is interviewed for 4 consecutive months, idle for 8 months, and then interviewed for another 4 consecutive months before permanently being dropped out of the sample. In any given month, the CPS sample consists of eight rotation groups in the 4-8-4 rotating structure. In theory, all rotation groups should report the same statistics, apart from sampling errors, as they are all independent representative samples of the U.S. civilian population. However, it has been observed that rotation group one systematically posts much higher unemployment rates than other rotation groups, leading to the so-called phenomenon of "Rotation Group Bias (RGB)". 1 As shown in Panel A of Fig. 1, the average unemployment rates over 1995-2018 for the first rotation group and others are 6.4% and 5.8%, respectively.
In this paper, we propose a measurement error model to explain the existence of RGB. We allow reported labor force status (LFS) to depend not only on latent true status, as in Feng and Hu (2013), but also on reported status in the previous interview. Rotation group one, and to a lesser extent, rotation group five, are structurally different from other rotation groups because they have no previously-reported status to condition on. Therefore, although reported unemployment rates from all rotation groups are potentially biased due to misclassification errors, there exist systematic differences in the (mis)reporting processes among rotation groups, leading to different reported statistics and hence RGB. Under certain conditional independence assumptions, we identify the distributions of misclassification errors and the distributions of latent true LFS for different rotation groups. To the best of our knowledge, this is the first rigorous statistical model incorporating RGB in a general misclassification framework. Our model does not impose any assumptions regarding which rotation group would post more accurate statistics, nor does it have any a priori implications on the existence and direction of RGB. In contrast, our framework is flexible enough to allow for all possibilities regarding RGB. 2 This flexibility is important because not all rotating panel surveys display the same patterns of RGB as in the CPS, as noted by Krueger, Mas, and Niu (2017) for the Canadian Labor Force Survey.
We find strong evidence for our key assumption that the reported LFS depends not only on the latent true LFS, but also on the previously-reported LFS. For ongoing rotation groups where respondents were interviewed also in the immediate previous month, i.e., rotation groups 2-4 and 6-8, the misclassification matrices, which map true status into reported status, differ significantly for those who reported different LFS in the previous period. This suggests persistence in reporting behavior not captured by previous studies. Our out-of-sample corrections eliminate RGB between rotation group one and other rotation groups. We also provide estimated true unemployment rates, which are higher than the official ones.
Finally, in light of the evidence found in this paper, we examine the plausibility of alternative theories on panel conditioning. We rule out the "practice effect" hypothesis as our results suggest that later rotation groups underreport unemployment more than rotation group one. We show that the most likely explanations consistent with a higher unemployment rate for rotation group one are related to survey manipulations. Previous studies suggest that survey respondents may manipulate survey instruments in order to save time or energy costs associated with survey participation, or to avoid stigmatized answers such as being unemployed. We provide some empirical support for these hypotheses. However, the evidence is still preliminary and we do not rule out other mechanisms and alternative approaches to understanding RGB. 3
The rest of the paper is organized as follows. Section 2 reviews prior studies related to RGB. Section 3 presents our measurement error model and discusses key assumptions. Section 4 presents the main results, including estimated misclassification probabilities, out-of-sample corrections, and corrected labor force statistics based on our method. Section 5 discusses related studies and potential behavioral explanations for the observed RGB patterns. The last section concludes. The online Appendix includes data construction, detailed examinations of our key assumptions, robustness checks, and some additional results.

Literature review
Rotation group bias in the CPS has been well recognized and studied by researchers (Bailar 1975; Solon 1986; Erkens 2012; Krueger, Mas, and Niu 2017; Ahn and Hamilton 2022). 4 Many empirical exercises have been conducted to identify important factors that might affect rotation groups differently. A widely discussed factor is sample attrition, and in particular non-interviews, as rotation group one typically has higher Type-A non-interview rates than other rotation groups. 5 Although studies find that some RGB can be attributed to differences in response probabilities among rotation groups (Williams and Mallows 1970; Krueger, Mas, and Niu 2017), non-interviews do not explain the very existence of RGB. As Krueger, Mas, and Niu (2017) show, RGB exists even for those respondents who are surveyed in all eight interviews, and it remained stable before and after the 1994 CPS redesign.
2 In Section B of the online Appendix, we use Monte Carlo simulations to show that our framework can deal with positive, zero, and negative RGB.
3 See more discussion in Section 5.
4 RGB has also been observed in Canada (Ghangurde 1982), the Netherlands (Van den Brakel and Krieg 2015) and New Zealand (Silverstone and Bell 2011).
5 Type-A non-interview households are those eligible for the CPS interview but not interviewed because of refusal, temporary absence, non-contact, and other non-interview reasons.
Many other factors associated with the CPS interviewing process have also been discussed, including survey modes (computer-assisted vs. paper-and-pencil, telephone vs. in-person), types of survey respondents (self vs. proxy response), and types of interviewers (McCarthy 1978; U.S. Bureau of the Census 1978). These factors have been found to be inconsequential in general. For example, RGB remains even during the early CPS when all interviews were conducted in person (Hansen et al. 1955). In particular, the U.S. Census Bureau conducted a Methods Test Panel in 1978, including a set of twelve different treatment combinations in terms of survey modes, types of respondents, and types of interviewers. 6 Using the data from this experiment, Irvine (1984) finds that RGB is not attributable to any of these three types of treatments.
There is a strand of literature that studies how participating in a repeated survey may affect later responses to similar questions or subsequent behavior, which is known as "panel conditioning" . Different hypotheses have been proposed to explain why panel conditioning arises. On the one hand, panel conditioning might arise due to changes in true behavior. For example, respondents may become more conscious of the importance of the topics after being interviewed repeatedly, thereby changing their true behavior, known as the channel of "cognitive stimulus" (Sturgis, Allum, and Brunton-Smith 2009;Zwane et al. 2011). On the other hand, panel conditioning might also lead to changes in reporting behavior. It is possible that repeated interviewing may improve understanding of the rules that govern the interview process, leading to more accurate responses through the so-called "practice effect" (Bailar 1989;Waterton and Lievesley 1989). In contrast, repeated interviewing may also lead to inaccuracy. Respondents are more likely to report falsely to avoid socially non-normative or stigmatized answers, or to reduce the time or energy costs associated with survey participation, as noted by Warren and Halpern-Manners (2012). So far, no rigorous behavioral studies have been conducted with respect to the labor force status in the CPS to identify possible sources of RGB.
Lacking a clear understanding of why RGB arises, the Bureau of Labor Statistics report averages of all eight rotation groups in their official reports of labor force statistics. Other studies, such as Bailar (1975), focus instead on changes in unemployment rates, assuming RGB to be additive and constant over time. Nevertheless, Solon (1986) provides empirical tests that reject the assumption of additive rotation group effects. Alternatively, Ahn and Hamilton (2022) define RGB as a tendency that an individual changes his or her responses depending on the number of times surveyed, and use rotation group one as the benchmark to correct for RGB in other rotation groups. They also separately correct for general misclassification errors and nonrandom sample attrition common to all rotation groups.
On the other hand, there is a strand of statistical literature that tries to identify general measurement errors in the CPS, irrespective of their sources, thereby estimating the "true" labor force statistics. However, most of these studies have ignored RGB so far. These studies rely on the local independence assumption that the reported status in one period is independent of everything else conditional on the true status in the same period, under which the misreporting behavior can be summarized as a simple 3-by-3 misclassification matrix. To identify such a misclassification matrix, some early studies use the responses from the reconciled reinterview surveys as the "true" LFS (Abowd and Zellner 1985; Poterba and Summers 1986, 1995; Magnac and Visser 1999). Other studies also use the reinterview data but impose strong assumptions for model identification (Chua and Fuller 1987; Sinclair and Gastwirth 1996). For example, Sinclair and Gastwirth (1996) assume that the misclassification probabilities are the same for different subsamples. Utilizing the panel structure of the CPS, more recent studies develop a latent process of the underlying true LFS together with the misclassification process, such as Biemer and Bushery (2000); Bassi and Trivellato (2009); Feng and Hu (2013); Shibata (2019). Nevertheless, the extent to which the bias in labor force statistics can be corrected depends on the specification of the misclassification errors. As Bound, Brown, and Mathiowetz (2001) point out, if measurement errors are persistent across periods, then the existing correction approaches might produce biased results.
Two previous studies are most similar to our paper in some methodological elements. Shockey (1988) studies RGB in the CPS using a latent class approach, treating RGB as a result of different misclassification errors among rotation groups, with the same distribution of underlying LFS. However, Shockey (1988) only uses one month of CPS data and does not introduce persistence in misclassification errors. On the other hand, Bassi and Trivellato (2009) allow correlated classification errors by allowing the reported status in period $t$ to depend on the true statuses in both periods $t$ and $t + 1$, as their data is retrospective and information in period $t$ is collected in period $t + 1$. In this paper, we extend existing models and introduce persistence in reporting behaviors by allowing reported status to depend not only on latent true status, but also on previously-reported status. Our approach takes care of RGB along with other misclassification errors common to all rotation groups simultaneously.

A model of rotation group bias
In this section, we develop a model based on the 4-8-4 rotating structure in the CPS. Suppose that for each individual in a random sample, we observe the reported LFS for two spells of four consecutive periods as follows:
$$\{U_1, U_2, U_3, U_4, U_{13}, U_{14}, U_{15}, U_{16}\}.$$
Between periods 4 and 13, there are 8 drop-out periods, on which we have no information as individuals are not interviewed. The reported LFS $U_t$ takes one of three values (employed, unemployed, or not-in-labor-force), with $U_t = 1$ denoting employed. The latent true status $U_t^*$ shares the same support as the reported status $U_t$. We are interested in how the reported status $U_t$ is associated with the latent true status $U_t^*$ in the whole process $\{U_t, U_t^*\}$. Following Hu (2008), we need four periods of matched data $(U_{t+j}, U_t, U_{t-1}, U_{t-i})$ to identify misclassification probabilities. To maximize variations in LFS across different periods and to be consistent with our identification assumptions, we choose $(U_{13}, U_4, U_3, U_1)$. 7 Letting $\Pr(\cdot)$ stand for the probability distribution function of its arguments, we outline the following assumption to describe the reporting behavior:
Assumption 1. The reported status in period $t$ only depends on the true status in period $t$ and the reported status in period $t - 1$. That is, …
This assumption allows for correlation between the current reported status $U_t$ and the previously-reported status $U_{t-1}$ conditional on the current true status $U_t^*$, which can be supported by some studies on panel conditioning, such as Halpern-Manners and Warren (2012). 8,9 Assumption 1 is much weaker than the widely-used local independence assumption, which further imposes $\Pr(U_t \mid U_t^*, U_{t-1}) = \Pr(U_t \mid U_t^*)$, as in Feng and Hu (2013) among others. Note that LFS is not observed in the immediate previous periods before the starting periods (periods 1 and 13). This implies that the misclassification errors may only depend on the current true value for the starting periods. 10 That is, for $t = 1, 13$, …
We then derive the joint distribution $\Pr(U_{13}, U_4, U_3, U_1)$ as follows: …
Having assumed the conditional independence of the reporting process, our next assumption deals with the dynamics of the latent true LFS:
Assumption 2. Conditional on the true status in period $t$ and the reported status in period $t - 1$, the true status nine months later is independent of the reported values in other periods. That is, …
Technically, this assumption is not completely comparable with that in Feng and Hu (2013), which assumes $\Pr(U_{t+9}^* \mid U_{t+8}^*, U_{t-1}^*) = \Pr(U_{t+9}^* \mid U_{t+8}^*)$, as our Assumption 2 involves reported status. 11 Nevertheless, they are both much weaker than the Markov process assumption on the whole process $\{U_t, U_t^*\}$, as in Biemer and Bushery (2000); Shibata (2019).
Assumption 2 implies that … 12
Therefore, Equation (3) can be further written as follows:
$$\Pr(U_{13}, U_4, U_3, U_1) = \sum_{U_4^*} \Pr(U_{13} \mid U_4^*, U_3)\, \Pr(U_4 \mid U_4^*, U_3)\, \Pr(U_4^*, U_3, U_1). \quad (7)$$
This means that we may apply the identification strategy in Hu (2008) to identify the unknown conditional distributions on the right hand side for each $u_3$, and therefore, to identify the misclassification probabilities $\Pr(U_4 \mid U_4^*, U_3)$.
7 In Section E of the online Appendix, we also provide robustness checks using alternative matched periods, such as $(U_{13}, U_4, U_3, U_2)$ and $(U_{13}, U_3, U_2, U_1)$, and the results are still robust. The tradeoff is that if two periods are close to each other, the matrix $M_{U_4, u_3, U_2}$ or $M_{U_3, u_2, U_1}$ (required in Assumption 3) is more likely to be nearly singular, so the variances would be larger.
8 In Section B of the online Appendix, we use Monte Carlo simulations to evaluate the robustness of this assumption through two deviations. The first is … The simulation results show that our estimators are robust to reasonable deviations.
9 We have also tried to allow for higher-order persistence in the misclassification error process along the lines of Bollinger and David (2005), and estimated $\Pr(U_t \mid U_t^*, U_{t-1}, U_{t-2})$ using the CPS data. However, this would require more periods of matched data, leading to large standard errors in some estimates. The results are similar to our baseline results, as shown in Section E of the online Appendix.
10 For period 13, it is possible that people still remember their previous reports after eight drop-out periods. However, it is important to note that similar to period 1 (the first interview), dependent interviewing is also not used in period 13 (the fifth interview), which might considerably weaken the correlations between the reported value in period 13 and previous reports. Please refer to https://www.census.gov/programs-surveys/cps/technical-documentation/methodology/collecting-data.html. Despite this, we perform a simulation study to show that even if this assumption is relaxed to $\Pr(U_{13} \mid U_{13}^*, \ldots)$ …
11 Suppose that there is no misclassification error and the $U_{t-1}$ in $\Pr(U_{t+9}^* \mid U_t^*, U_{t-1})$ is replaced with $U_{t-1}^*$; we may then be able to argue that our Assumption 2 is weaker than that in Feng and Hu (2013), because our assumption is imposed on true labor dynamics across eight drop-out periods, while theirs is imposed across two consecutive periods.
12 In Section B of the online Appendix, we use simulations and show that even if relaxing this assumption to …
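To see how a reporting process of this kind can generate differences across rotation groups, the following sketch computes the reported LFS distribution in the first and second interviews exactly. It rests on assumptions that are ours, not the paper's: the true status follows a stationary first-order Markov chain, and every probability value is purely illustrative. Columns index the true status and rows index the reported status.

```python
import numpy as np

# Hypothetical values throughout. Status order: 0 = employed,
# 1 = unemployed, 2 = not in labor force.

# True-status transitions: T[a, b] = Pr(U*_2 = b | U*_1 = a).
T = np.array([[0.96, 0.02, 0.02],
              [0.25, 0.55, 0.20],
              [0.04, 0.03, 0.93]])

# Stationary distribution of the true status (left eigenvector for eigenvalue 1).
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# First interview: M1[r, s] = Pr(U_1 = r | U*_1 = s).
M1 = np.array([[0.98, 0.15, 0.05],
               [0.01, 0.65, 0.03],
               [0.01, 0.20, 0.92]])

# Later interview: M2[p][r, s] = Pr(U_2 = r | U*_2 = s, U_1 = p).
M2 = np.array([
    [[0.99, 0.70, 0.55], [0.005, 0.22, 0.03], [0.005, 0.08, 0.42]],  # previously E
    [[0.80, 0.05, 0.10], [0.15, 0.90, 0.30], [0.05, 0.05, 0.60]],    # previously U
    [[0.75, 0.10, 0.01], [0.05, 0.40, 0.01], [0.20, 0.50, 0.98]],    # previously N
])

# Reported distribution in the first interview.
rep1 = M1 @ pi

# joint[p, s] = Pr(U_1 = p, U*_2 = s), summing over the true status in period 1.
joint = (M1 * pi) @ T

# Reported distribution in the second interview, summing over (U_1, U*_2).
rep2 = np.einsum("prs,ps->r", M2, joint)


def unemployment_rate(p):
    """Unemployed share of the labor force for a distribution (E, U, N)."""
    return p[1] / (p[0] + p[1])


print("true rate:            ", unemployment_rate(pi))
print("reported, interview 1:", unemployment_rate(rep1))
print("reported, interview 2:", unemployment_rate(rep2))
```

The true distribution is identical in both interviews by construction, so any gap between the two reported rates comes only from the different reporting processes. Its sign and size depend entirely on the illustrative matrices chosen here.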

Identification of misclassification probabilities for ongoing rotation groups
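In this subsection, we specify the main procedures to identify the misclassification probabilities for ongoing rotation groups, i.e., rotation groups 2-4 and 6-8. We start with Equation (7) with $U_{13} = 1$ and $U_3 = u_3$. 13 That is,
$$\Pr(U_{13} = 1, U_4, U_3 = u_3, U_1) = \sum_{U_4^*} \Pr(U_{13} = 1 \mid U_4^*, U_3 = u_3)\, \Pr(U_4 \mid U_4^*, U_3 = u_3)\, \Pr(U_4^*, U_3 = u_3, U_1). \quad (8)$$
We first define matrices as follows, with indices $i$, $j$, $k$ running over the three labor force statuses and $u_{13} = 1$:
$$M_{u_{13}, U_4, u_3, U_1} = \left[ \Pr(U_{13} = u_{13}, U_4 = i, U_3 = u_3, U_1 = k) \right]_{i,k},$$
$$M_{U_4, u_3, U_1} = \left[ \Pr(U_4 = i, U_3 = u_3, U_1 = k) \right]_{i,k},$$
$$M_{U_4 \mid U_4^*, u_3} = \left[ \Pr(U_4 = i \mid U_4^* = j, U_3 = u_3) \right]_{i,j},$$
$$M_{U_4^*, u_3, U_1} = \left[ \Pr(U_4^* = j, U_3 = u_3, U_1 = k) \right]_{j,k},$$
$$D_{u_{13} \mid U_4^*, u_3} = \mathrm{diag}\left( \Pr(U_{13} = u_{13} \mid U_4^* = j, U_3 = u_3) \right)_{j}.$$
Therefore, Equation (8) is equivalent to
$$M_{u_{13}, U_4, u_3, U_1} = M_{U_4 \mid U_4^*, u_3}\, D_{u_{13} \mid U_4^*, u_3}\, M_{U_4^*, u_3, U_1}. \quad (9)$$
Similarly, we have
$$M_{U_4, u_3, U_1} = M_{U_4 \mid U_4^*, u_3}\, M_{U_4^*, u_3, U_1}. \quad (10)$$
In order to identify the unknown matrix $M_{U_4 \mid U_4^*, u_3}$, we need a technical assumption as follows:
Assumption 3. For each $u_3$, $M_{U_4, u_3, U_1}$ has full rank.
Since this assumption is imposed on the observed probabilities matrix, it can be directly tested using the CPS data. We use bootstrapping to show that the determinant of the observed matrix $M_{U_4, u_3, U_1}$ for each $u_3$ is significantly different from zero even if controlling for observed heterogeneity, which means that $M_{U_4, u_3, U_1}$ is invertible. 14 Eliminating $M_{U_4^*, u_3, U_1}$ in Equations (9) and (10) leads to
$$M_{u_{13}, U_4, u_3, U_1}\, M_{U_4, u_3, U_1}^{-1} = M_{U_4 \mid U_4^*, u_3}\, D_{u_{13} \mid U_4^*, u_3}\, M_{U_4 \mid U_4^*, u_3}^{-1}. \quad (11)$$
Following Hu (2008), $\Pr(U_4 \mid U_4^*, U_3 = u_3)$ for each $u_3$ can be identified from an eigendecomposition.
13 The identification procedure holds for each value of $U_{13}$, implying that the model is over-identified. Here we use the subsample with $U_{13} = 1$ to illustrate the identification strategy. In Section E of the online Appendix, we also provide robust results using different subsamples.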
In order to determine the eigenvector uniquely for each given eigenvalue, we need the eigenvalues to be distinctive, which can be formally stated as follows:
Assumption 4. For each $u_3$, the three eigenvalues of the observed matrix $M_{u_{13}, U_4, u_3, U_1}\, M_{U_4, u_3, U_1}^{-1}$ are distinct.
We can test this assumption directly by calculating the differences between the eigenvalues. The bootstrapping results show that the absolute differences between the eigenvalues are significantly different from zero even if controlling for observed heterogeneity, which means that the eigenvalues are distinctive. Intuitively, this assumption implies that the current true status has an impact on the probability of reporting being employed nine months later.
In order to determine the ordering of the eigenvectors, we impose the following assumption:
Assumption 5. … is the smallest element in column $j$.
Assumption 5 implies that given the previously-reported status, when the true status is the same as the previously-reported status, individuals are always more likely to report that status than if the true status is otherwise. Further, when the true status is different from the previously-reported status, then the least likely choice for the individual to report would be the status other than the true or previously-reported status. 15 Assumption 5 is an intuitive extension of the standard assumption in the literature that the reported status is assumed to only depend on the latent true status, and people are more likely to report the truth than otherwise, such as in Feng and Hu (2013) and other studies reviewed by Bound, Brown, and Mathiowetz (2001). In our extended framework, both the latent true status and the previously-reported status could matter in the current interview.
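The eigendecomposition step and the ordering rule can be sketched numerically. The code below is an illustration under our own assumptions, not the authors' implementation: it builds the two observed matrices from a known misclassification matrix, then recovers that matrix from the product of one observed matrix with the inverse of the other. The entries 0.004 and 0.008, the eigenvalues in `D`, and the joint matrix `J` are hypothetical; the remaining entries of `M_true` are the probabilities quoted in the text for a previously-reported status of employed. The ordering function encodes our reading of the verbal description of Assumption 5.

```python
import numpy as np


def identify_misclassification(M_obs4, M_obs3, u3):
    """Recover Pr(U4 | U4*, U3 = u3) from two observed 3x3 matrices.

    M_obs4[i, k] = Pr(U13 = 1, U4 = i, U3 = u3, U1 = k)
    M_obs3[i, k] = Pr(U4 = i, U3 = u3, U1 = k)
    u3 is the 0-based index of the previously reported status.
    """
    C = M_obs4 @ np.linalg.inv(M_obs3)
    eigvals, eigvecs = np.linalg.eig(C)
    eigvals, eigvecs = np.real(eigvals), np.real(eigvecs)
    # Each eigenvector is a conditional distribution: rescale columns to sum to 1.
    V = eigvecs / eigvecs.sum(axis=0, keepdims=True)

    # Ordering (our reading of Assumption 5): the column with the largest entry
    # in row u3 belongs to true status u3; in each other column the smallest
    # entry sits in the row of the "third" status.
    order = [None, None, None]
    c_same = int(np.argmax(V[u3, :]))
    order[u3] = c_same
    others = [s for s in range(3) if s != u3]
    for c in range(3):
        if c == c_same:
            continue
        least = int(np.argmin(V[:, c]))
        candidates = [s for s in others if s != least]
        if len(candidates) != 1 or order[candidates[0]] is not None:
            raise ValueError("ordering rule does not pin down the columns")
        order[candidates[0]] = c
    return V[:, order], eigvals[order]


# Rows: reported E, U, N. Columns: true E, U, N. Previously reported: employed.
M_true = np.array([[0.988, 0.740, 0.559],
                   [0.004, 0.197, 0.022],
                   [0.008, 0.063, 0.419]])
D = np.diag([0.95, 0.40, 0.25])        # hypothetical Pr(U13 = 1 | U4*, u3)
J = np.array([[0.50, 0.02, 0.03],      # hypothetical Pr(U4*, U3 = u3, U1)
              [0.01, 0.02, 0.01],
              [0.02, 0.01, 0.04]])

M_obs3 = M_true @ J
M_obs4 = M_true @ D @ J
M_hat, lam_hat = identify_misclassification(M_obs4, M_obs3, u3=0)
print(np.round(M_hat, 3))
```

In practice the observed matrices would be sample frequencies from the matched data, so the recovered matrix would carry sampling error. This sketch uses exact population matrices to isolate the algebra.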
In fact, under the maintained assumptions above, we may use the method in Hu and Shum (2012) to identify the misclassification probabilities for each ongoing rotation group. For the convenience of estimation, we impose stationarity and apply the misclassification probabilities $\Pr(U_4 \mid U_4^*, U_3)$ to all other ongoing rotation groups, instead of identifying them separately, i.e., $\Pr(U_j \mid U_j^*, U_{j-1}) = \Pr(U_4 \mid U_4^*, U_3)$ for $j = 2, 3, 14, 15, 16$.
14 See Section C of the online Appendix. The statistical inference is based on the bootstrap. For example, we construct the bootstrap samples by resampling with replacement from the matched samples $(U_{13}, U_4, U_3, U_1)$ for $B$ repetitions. Then the bootstrap estimates of standard error can be calculated as follows: … where $\hat{p}_i$ is the $i$-th bootstrap estimate and $\bar{p} = \frac{1}{B} \sum_{i=1}^{B} \hat{p}_i$.
15 In Section D of the online Appendix, we illustrate how Assumption 5 works. Furthermore, we show results from alternative ordering possibilities and discuss the validity of this assumption.
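The bootstrap procedure described in the footnote can be sketched as follows. This is a minimal illustration under our own assumptions: the matched sample is simulated rather than drawn from the CPS, the status coding (1, 2, 3) and the example estimator are hypothetical, and the divisor $B - 1$ is our choice because the exact formula is not recoverable from the text.

```python
import numpy as np


def bootstrap_se(sample, estimator, B=500, seed=0):
    """Bootstrap standard error of estimator(sample).

    sample: (n, 4) array of matched tuples (U13, U4, U3, U1).
    Rows are resampled with replacement B times.
    """
    rng = np.random.default_rng(seed)
    n = len(sample)
    estimates = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # draw n row indices with replacement
        estimates[b] = estimator(sample[idx])
    p_bar = estimates.mean()
    # Divisor B - 1 is an assumption; the source formula is not recoverable.
    return np.sqrt(np.sum((estimates - p_bar) ** 2) / (B - 1))


# Simulated stand-in for the matched sample (hypothetical coding 1, 2, 3).
rng = np.random.default_rng(42)
data = rng.choice([1, 2, 3], size=(2000, 4), p=[0.60, 0.05, 0.35])


def share_u4_equals_2(x):
    """Example estimator: share of records with U4 equal to 2."""
    return np.mean(x[:, 1] == 2)


se = bootstrap_se(data, share_u4_equals_2)
print("bootstrap standard error:", se)
```

For a simple proportion the result can be compared with the textbook formula $\sqrt{\hat{p}(1 - \hat{p})/n}$. For the misclassification probabilities themselves, the estimator passed in would be the full identification procedure.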

Identification of misclassification probabilities for rotation groups one and five
To ensure that each rotation group posts the same true labor force statistics, we impose the following assumption similar to Shockey (1988): Assumption 6. In a given calendar month, the marginal distribution of the true LFS does not depend on rotation groups.
As discussed in the last subsection, we may just use $\Pr(U_4 \mid U_4^*, U_3)$ as the misclassification probabilities for all ongoing rotation groups, and then obtain the distribution of the true LFS. Let $s$ stand for a calendar month, $\Pr(U_{i,s}^*)$ for $i = 1, 13$ be the distribution of the true LFS for rotation groups one and five respectively, and $\Pr(U_{o,s}^*)$ be the distribution of the true LFS for ongoing rotation groups with $o \in \{2, 3, 4, 14, 15, 16\}$ in month $s$. In addition, we define …
Therefore, $p(U_{o,s}^*)$ for each $s \in \{t_1, t_2, \ldots, t_n\}$ can be estimated as follows: …
Assumption 6 implies that $p(U_{i,s}^*) = p(U_{o,s}^*)$ for $i = 1, 13$. Since we also observe the distribution of the reported LFS for rotation groups one and five, i.e., $\left[ p(U_{i,t_1}), p(U_{i,t_2}), \ldots, p(U_{i,t_n}) \right]$ for $i = 1, 13$, we have the matrix form with stationarity as follows:
$$\left[ p(U_{i,t_1}), p(U_{i,t_2}), \ldots, p(U_{i,t_n}) \right] = M_{U_i \mid U_i^*} \left[ p(U_{o,t_1}^*), p(U_{o,t_2}^*), \ldots, p(U_{o,t_n}^*) \right]. \quad (12)$$
The only unknown in Equation (12) is the misclassification matrix in the starting periods, $M_{U_i \mid U_i^*}$. We may identify and estimate $M_{U_i \mid U_i^*}$ with observations from a sufficient number of calendar months. In practice, we estimate $M_{U_i \mid U_i^*}$ by minimizing the $L_2$ distance between the left-hand side and the right-hand side of Equation (12). 16 Since we only deal with sample proportions in different subpopulations, the consistency and the asymptotic properties can be straightforwardly derived and are omitted here.
16 In estimation, we impose the restriction that all probabilities are in $[0, 1]$; see Hu (2017) for details.
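The least-squares step can be illustrated with a small numerical sketch. It assumes that the reported distributions are stacked as a 3-by-n matrix with one column per calendar month, and likewise for the true distributions from the ongoing rotation groups. All numbers are hypothetical except the 0.61 entry, which echoes the 61.0% reported for rotation group one. Unlike the estimation described above, this sketch does not impose the restriction that probabilities lie in [0, 1]; it only solves the unconstrained problem.

```python
import numpy as np


def estimate_start_matrix(P_reported, P_true):
    """Unconstrained least-squares estimate of M in P_reported ~ M @ P_true.

    Both inputs are 3 x n arrays with one column per calendar month.
    Minimizes the Frobenius (L2) distance between the two sides.
    """
    # Transpose so the unknown appears on the right: P_true.T @ M.T = P_reported.T
    M_T, *_ = np.linalg.lstsq(P_true.T, P_reported.T, rcond=None)
    return M_T.T


# Hypothetical first-interview misclassification matrix (columns = true status).
M_start = np.array([[0.97, 0.17, 0.04],
                    [0.01, 0.61, 0.03],
                    [0.02, 0.22, 0.93]])

# Hypothetical true distributions over six calendar months.
E = np.array([0.60, 0.61, 0.59, 0.62, 0.58, 0.60])
U = np.array([0.05, 0.04, 0.07, 0.05, 0.06, 0.08])
P_true = np.vstack([E, U, 1.0 - E - U])

# Reported distributions implied for the starting rotation group.
P_reported = M_start @ P_true

M_est = estimate_start_matrix(P_reported, P_true)
print(np.round(M_est, 3))
```

With noise-free inputs the unconstrained solution coincides with the matrix used to generate the data. With sampled proportions, a constrained solver would be needed to keep every entry in [0, 1], and more months of data would help because monthly variation in the true distribution is small.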

Misclassification probabilities
We use the public-use monthly CPS data from September 1995 to February 2018. Figure 1 displays the reported unemployment rates, labor force participation rates and employment-population ratios. There is clear evidence of RGB, that is, rotation group one posts higher levels in almost all periods among the three labor force statistics. These differences are also statistically significant although we do not report standard errors in the graph. On average, the reported unemployment rate from rotation group one is about 0.61 percentage points (6.40% vs. 5.79%) higher than that from other rotation groups, while the differences for labor force participation rates and employment-population ratios are 1.28 percentage points (66.46% vs. 65.18%) and 0.81 percentage points (62.22% vs. 61.41%), respectively. To utilize the method presented in the last section and estimate misclassification probabilities $M_{U_4 \mid U_4^*, U_3}$, we need to obtain the joint distribution $\Pr(U_{13}, U_4, U_3, U_1)$. We follow the algorithm proposed by Madrian and Lefgren (2000) to match the CPS monthly files. Specifically, we first match the CPS sample based on household identifier, household replacement number, and personal identifier, then use the information on sex, age, and race to "certify" the crude matches. Nevertheless, such a matched sample may not be representative of the cross-sectional sample due to sample attrition (Feng 2008). Therefore, we generate a matching weight to correct for sample attrition using the same procedure as in Feng and Hu (2013). 17 In order to have a large sample size, we pool all the matched samples over our study period to estimate misclassification probabilities.
17 See detailed procedures on matching the monthly CPS data and correcting for sample attrition in Section A of the online Appendix.
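The two-step matching logic described above (a crude match on identifiers, then certification on demographics) can be sketched in a few lines. The field names below are hypothetical placeholders rather than actual CPS variable names, and the age tolerance is our assumption; the exact certification criteria are those of Madrian and Lefgren (2000).

```python
def match_records(month_a, month_b, max_age_gain=1):
    """Match person records across two monthly files.

    Step 1: crude match on household id, household replacement number,
            and person id.
    Step 2: keep the pair only if sex and race agree and age did not fall
            or rise by more than max_age_gain (an assumed tolerance).
    Records are dicts; all keys are hypothetical field names.
    """
    def key(r):
        return (r["hh_id"], r["hh_num"], r["person_id"])

    index_b = {key(r): r for r in month_b}
    matched = []
    for ra in month_a:
        rb = index_b.get(key(ra))
        if rb is None:
            continue  # no crude match in the later file
        same_sex = ra["sex"] == rb["sex"]
        same_race = ra["race"] == rb["race"]
        age_ok = 0 <= rb["age"] - ra["age"] <= max_age_gain
        if same_sex and same_race and age_ok:
            matched.append((ra, rb))
    return matched


first = [
    {"hh_id": 1, "hh_num": 1, "person_id": 1, "sex": 1, "race": 1, "age": 30},
    {"hh_id": 1, "hh_num": 1, "person_id": 2, "sex": 2, "race": 1, "age": 40},
]
later = [
    {"hh_id": 1, "hh_num": 1, "person_id": 1, "sex": 1, "race": 1, "age": 31},
    {"hh_id": 1, "hh_num": 1, "person_id": 2, "sex": 1, "race": 1, "age": 40},
    {"hh_id": 2, "hh_num": 1, "person_id": 1, "sex": 2, "race": 2, "age": 25},
]
pairs = match_records(first, later)
print(len(pairs), "certified match(es)")
```

The second person is dropped because the recorded sex differs across files, which is the kind of crude match the certification step is meant to reject. Attrition weights would be built afterwards from the records that fail to match.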
Panel A of Table 1 shows the estimated misclassification probabilities for ongoing rotation groups. 18 When the current true status and the previously-reported status are the same, the probability of reporting that status is always the highest. For example, conditional on that the previously-reported status is employed, 98.8% of truly employed people would report being employed. We also find that even conditional on the current true status, the misclassification probabilities are still highly related to the previously-reported status. For example, conditional on that the previously-reported status is employed, 74.0% of truly unemployed individuals and 55.9% of truly not-in-labor-force individuals report being employed falsely. 19 In addition, the results also suggest that when the current true status and the previously-reported status are different, the third status (different from both the current true status and the previously-reported status) becomes the least likely to be reported. This also confirms our re-ordering Assumption 5. For example, conditional on that the current true status is unemployed and the previously-reported status is employed, the probability of reporting being not-in-labor-force is only 6.3%, less than that of reporting being employed (74.0%) or unemployed (19.7%). When the current true status is not-in-labor-force and the previously-reported status is employed, the probability of reporting being unemployed is only 2.2%, less than that of reporting being employed (55.9%) or not-in-labor force (41.9%).
To make sure that our results on the persistence in misreporting behavior come from correlations between the reported statuses in two periods, we conduct a falsification test. The idea is that if the reported status is generated independently, we would not obtain the result of persistence as in Panel A of Table 1. After matching $(U_{13}, U_4, U_3, U_1)$, we fix the marginal distribution of $U_3$ but randomly re-assign the reported status to generate $(U_{13}, U_4, \tilde{U}_3, U_1)$. This procedure makes the reported status $U_4$ independent of the counterfactual previously-reported status $\tilde{U}_3$. 20 Panel B of Table 1 shows that there is no significant difference among the three misclassification matrices, implying that the current reported status is no longer affected by the previously-reported status conditional on the current true status. 21
Once we have identified the misclassification probabilities for ongoing rotation groups, we can estimate the distribution of true LFS, which can then be used to back out the misclassification probabilities for rotation groups one and five. Panel C of Table 1 shows that the misclassification matrices of rotation groups one and five are similar to those in earlier studies, in which the diagonal elements are always the largest in each column, i.e., people are more likely to report the truth. Note that this is no longer an imposed assumption as in previous studies. Similar to what was found in previous studies such as Abowd and Zellner (1985); Poterba and Summers (1986); Feng and Hu (2013), there exist considerable misclassification errors, especially for those whose true status is unemployed, whose probabilities of correctly reporting being unemployed are only 61.0% and 57.1% for rotation groups one and five, respectively.
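The re-assignment step of the falsification test described above can be sketched as a random permutation of the previously-reported status, which keeps its marginal distribution fixed by construction. Whether the authors used a permutation or another re-assignment scheme is not stated, so this is one possible implementation; the data and status coding are simulated placeholders.

```python
import numpy as np


def shuffle_previous_report(matched, seed=0):
    """Return a copy of the matched sample with U3 randomly permuted.

    matched: (n, 4) integer array with columns (U13, U4, U3, U1).
    Only column 2 (U3) is changed; its marginal distribution is preserved.
    """
    rng = np.random.default_rng(seed)
    out = matched.copy()
    out[:, 2] = rng.permutation(matched[:, 2])
    return out


# Simulated stand-in for the matched sample (hypothetical coding 1, 2, 3).
rng = np.random.default_rng(1)
sample = rng.choice([1, 2, 3], size=(1000, 4), p=[0.60, 0.05, 0.35])
placebo = shuffle_previous_report(sample)
```

Re-running the identification procedure on the permuted sample should then produce three misclassification matrices that no longer differ by previously-reported status, if the persistence result is driven by the link between consecutive reports.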

Out-of-sample corrections and corrected labor force statistics
We next perform out-of-sample corrections to investigate whether our methodology eliminates RGB. For each period $t$, we first estimate misclassification probabilities for ongoing rotation groups, as well as for rotation groups one and five, using the CPS data from September 1995 up to period $t - 1$. Second, we use the estimated misclassification probabilities to correct the marginal distribution of LFS for period $t$ and compare the labor force statistics from rotation group one and other rotation groups. Panel A of Fig. 2 displays the results for unemployment rates from January 2013 to December 2016. The first row simply shows the existence of RGB in reported unemployment rates between rotation group one and other rotation groups. As we can see, almost all the differences lie above the horizontal zero-line. On average, the unemployment rate of rotation group one is 0.66 percentage points higher than that of other rotation groups for this time period. In the second row, we correct for misclassification errors using the method proposed in Feng and Hu (2013), where the misclassification probabilities are assumed to only depend on the current true status. Overall, the patterns regarding RGB are almost the same as the reported ones. In the third row, we report the correction results using our proposed method in this paper. On average, RGB almost disappears completely, as the average difference between rotation group one and other rotation groups is only 0.12 percentage points, compared to 0.66 percentage points in the first row and 1.02 percentage points in the second row. Similar results also appear in the out-of-sample corrections for labor force participation rates and employment-population ratios, as shown in Panels B and C, respectively.
18 We also control for observed heterogeneity and estimate $\Pr(U_t \mid U_t^*, U_{t-1}, X)$, and the patterns remain. See Section F of the online Appendix.
19 Using a linked CPS-UI data set, Abraham et al. (2013) report that, among those workers who receive wages in the UI records ("true" status is employed), 6.4% do not report working in the CPS, while among those who do not receive wages in the UI records, 17.6% report working in the CPS. That is, $\Pr(U = NE \mid U^* = E) = 6.4\%$ and $\Pr(U = E \mid U^* = NE) = 17.6\%$, where $E$ and $NE$ refer to "working" and "not working", respectively. In comparison, we integrate out previously-reported status in our misclassification probabilities and re-classify "unemployed" and "not-in-labor-force" as "not working", then calculate two corresponding probabilities, $\Pr(U = NE \mid U^* = E) = 5.2\%$ and $\Pr(U = E \mid U^* = NE) = 10.0\%$, which are comparable with those in Abraham et al. (2013).
20 Note that the partial relationship between $U_4$ and $U_3^*$, if any, is not affected.
21 A formal test based on bootstrap is shown in Table F7 of the online Appendix.
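The correction step can be illustrated for the simplest case, a starting rotation group, where the reported distribution equals the misclassification matrix times the true distribution. The sketch below inverts that relationship. The matrix and distributions are hypothetical, and the code does not cover ongoing rotation groups, whose correction also involves the previously-reported status.

```python
import numpy as np


def correct_distribution(p_reported, M):
    """Solve M @ p_true = p_reported for the true LFS distribution.

    M[r, s] = Pr(reported = r | true = s); columns sum to one.
    """
    return np.linalg.solve(M, p_reported)


def unemployment_rate(p):
    """Unemployed share of the labor force for a distribution (E, U, N)."""
    return p[1] / (p[0] + p[1])


# Hypothetical misclassification matrix for a starting rotation group.
M_first = np.array([[0.97, 0.17, 0.04],
                    [0.01, 0.61, 0.03],
                    [0.02, 0.22, 0.93]])

p_true_example = np.array([0.60, 0.05, 0.35])   # hypothetical true shares
p_reported_example = M_first @ p_true_example   # what the survey would record

p_corrected = correct_distribution(p_reported_example, M_first)
print("reported rate: ", unemployment_rate(p_reported_example))
print("corrected rate:", unemployment_rate(p_corrected))
```

In an out-of-sample exercise the matrix would be estimated on data up to the previous period only, and the reported distribution would come from the current period.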
Although misclassification errors in the CPS have been realized and studied before, these studies have ignored the issue of RGB and not incorporated persistence in the reporting process, such as Abowd and Zellner (1985); Poterba and Summers (1986); Feng and Hu (2013). Figure 3 compares our corrected labor force statistics with the reported series, as well as the corrected series based on Feng and Hu (2013). In terms of levels, Panel A shows that the "Reported" unemployment rates are lower than our "Corrected" series on average, which shows that the official unemployment rates systematically underestimate true levels of unemployment, consistent with the main findings of Feng and Hu (2013). However, our "Corrected" series, with an average of 7.43%, is lower than the "Corrected-FH" series in levels (8.26%). Therefore, ignoring persistence Figure 2. Out-of-sample corrections: differences of labor force statistics between rotation group one and other rotation groups. Note: "Reported" series are based on reported labor force statistics. "Corrected" series are calculated using the misclassification probabilities in this paper, while "Corrected-FH" series are calculated using the misclassification probabilities in Feng and Hu (2013). Each dot denotes the difference of labor force statistics between rotation group one and other rotation groups in a given month. The dashed horizontal lines are average values of the differences over the covered periods. The corresponding 95 percent confidence intervals are calculated from bootstrapped standard errors based on 500 repetitions. in reporting errors leads to an overestimation of unemployment rates. In addition, Panel B shows that our "Corrected" labor force participation rates nearly coincide with the "Reported" series, which is substantially lower than the "Corrected-FH" series. Similarly, in Panel C, our "Corrected" employment-population ratios are also lower than both the "Reported" and "Corrected-FH" series. 
Nevertheless, a reassuring pattern is that all these series track one another closely, implying that labor market developments over time appear to be robust to misclassification errors.
Note: "Reported" series are reported labor force statistics. "Corrected" series are calculated using the misclassification probabilities in this paper, while "Corrected-FH" series are calculated using the misclassification probabilities in Feng and Hu (2013). All numbers are seasonally-adjusted using the Census's X-12-ARIMA algorithm.

Comparison with Ahn and Hamilton (2022)
Recently, Ahn and Hamilton (2022) also correct for RGB and other inconsistencies in the CPS data, including sample attrition, misclassification, and unemployment duration. In terms of RGB, they argue that it is related to the number of times interviewed and interpret this as differences in interview technology among rotation groups. Ahn and Hamilton's framework essentially considers the following counterfactual question: how would labor force statistics have changed if a rotation group had instead been interviewed using the interview technology of another rotation group? They use rotation group one as a benchmark for labor force statistics of other rotation groups, a choice supported by several strands of empirical evidence, including evidence on survey disengagement and stigma, among others. In comparison, our model does not impose any assumption regarding which rotation group would report more accurate labor force statistics, nor does it have any a priori implications on the existence or direction of RGB.
Regarding misclassification errors, Ahn and Hamilton (2022) utilize the information on reported unemployment duration. In particular, they re-classify a UNU transition as UUU if the final U reports a duration of job search greater than 4 weeks, as they find that the job-finding probabilities reported by people who make NU and UU transitions with unemployment duration above 4 weeks are essentially the same. 22 However, Ahn and Hamilton only focus on the misclassified N, ignoring that other labor force statuses are also subject to potential misclassification errors. For instance, a validation study has shown that 6.4% of the workers who receive wages in the UI records would not report working in the CPS, while 17.6% of those who do not receive wages would report working (Abraham et al. 2013). In contrast, our approach considers potential misclassification across all three labor force statuses jointly. Although the two papers utilize very different frameworks, the corrected unemployment rates are similar, with an average of 7.9% over 2001-2018 in this paper and 8.2% in Ahn and Hamilton (2022). We also find that the reported unemployment rates from rotation group one are closest to the corrected unemployment rates, which is consistent with Ahn and Hamilton's choice of rotation group one as a benchmark in correcting for RGB.
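The duration-based rule described above can be illustrated with a short sketch. It assumes a respondent's monthly statuses are held in a list and that the reported search duration in weeks is available for each month (or `None` when not reported). The function name, argument names, and data layout are hypothetical, and the actual procedure in Ahn and Hamilton (2022) operates on matched CPS records and may differ in detail.

```python
def reclassify_unu(statuses, durations_weeks, min_weeks=4):
    """Re-classify the middle N of a U-N-U sequence as U.

    statuses: list of "E", "U" or "N", one entry per consecutive month.
    durations_weeks: reported job-search duration for each month, or
        None when no duration is reported (hypothetical layout).
    The middle N becomes U only if the final U reports a search
    duration strictly greater than min_weeks.
    """
    if len(statuses) != len(durations_weeks):
        raise ValueError("statuses and durations_weeks must align")
    corrected = list(statuses)
    # Patterns are matched on the original statuses, so one
    # re-classification never triggers another.
    for i in range(1, len(statuses) - 1):
        is_unu = (
            statuses[i - 1] == "U"
            and statuses[i] == "N"
            and statuses[i + 1] == "U"
        )
        if not is_unu:
            continue
        final_duration = durations_weeks[i + 1]
        if final_duration is not None and final_duration > min_weeks:
            corrected[i] = "U"
    return corrected
```

Setting `min_weeks` to a negative value re-classifies every UNU sequence whose final U reports a duration, which approximates the unconditional rule used in part of the earlier literature.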

Potential mechanisms
In this subsection, we discuss theories in the "panel conditioning" literature that may shed light on why RGB arises in the CPS. One possible explanation is the "practice effect", that is, respondents may gain a better understanding of survey questions as interviews continue. According to this conjecture, later rotation groups should be closer to the truth than rotation group one. However, comparisons from Table 2 show that the reported unemployment rate from later rotation groups deviates more from the corrected unemployment rate. This suggests that unemployment tends to be under-reported systematically in all rotation groups, and that repeated interviewing leads to progressively greater inaccuracy. This pattern is robust for different sub-periods, demographic groups, and regions. 23 Therefore, these empirical results cast doubt on the hypothesis that RGB arises through the "practice effect".
On the other hand, two other hypotheses are potentially consistent with the pattern of progressively more misclassification errors, both related to survey manipulation by respondents. The first is that respondents are more likely to avoid stigmatized answers such as being unemployed (Halpern-Manners and Warren 2012). As Ahn and Hamilton (2022) note, self-reported respondents may care more about stigmatization than the proxy-reported. Based on our estimates in Table 3, conditional on the previously reported status being unemployed, the probability that truly unemployed people report being not-in-labor-force is higher for the self-reported (6.9%) than for the proxy-reported (0.0%). Ahn and Hamilton (2022) also find that unemployment falls more quickly across rotation groups among self-reported respondents than among the proxy-reported.
The second hypothesis, as suggested in Halpern-Manners and Warren (2012), is that respondents may become disengaged from repeated interviews and strategically choose answers that may lead to less interview time. Using records on households' cumulative interview time in the CPS, we find that an increase in the number of unemployed people in a household is associated with a 12% increase in interview time, as shown in Table 4. This suggests that there is an incentive for respondents to not report being unemployed if they want to reduce interview time. Similarly, our results show that conditional on household size, total interview time is negatively related to the number of disabled people in a household. This is consistent with the findings in Ahn and Hamilton (2022) that people in later rotation groups are more likely to report being not-in-labor-force for the reason of disability, and disabled people in later rotation groups are more likely to return to the labor force in the future than those in rotation group one. In the meantime, the number of multiple job holders is positively related to interview time, consistent with the findings in Hirsch and Winters (2016) that people in rotation group one are more likely to report holding multiple jobs. All of these are consistent with the hypothesis of avoiding lengthy interviews. However, additional research, preferably controlled experiments, is necessary to provide more credible causal evidence.

22 E, U and N stand for employment, unemployment and not-in-labor-force, respectively. This procedure differs from that in Rothstein (2011); Farber and Valletta (2015); Elsby, Hobijn, and Şahin (2015) among others, which re-classify all UNU transitions as UUU.
23 See Section F of the online Appendix.

Conclusion
In this paper, we develop a general statistical framework to study the issue of rotation group bias. In particular, we allow the distribution of misclassification errors to depend not only on current true status, but also on previously-reported status. Under certain conditional independence assumptions, we identify the distributions of misclassification errors for different rotation groups, with which we correct for the misclassification errors in labor force status and calculate the corrected labor force statistics.
Using the CPS data, we find strong evidence for our key assumption that both the current true status and the previously-reported status matter. In other words, what respondents reported in the previous interview does affect the reported value in the current interview. The out-of-sample corrections show that our proposed method largely eliminates rotation group bias. We also provide new estimates of the U.S. unemployment rates, which are much higher than the official series, but lower than the previous estimates that ignored persistence in misclassification.
Although this paper specifically focuses on the issue of rotation group bias in the CPS, many surveys have rotating structures similar to that of the CPS, and our method can be applied to them with some modification. More generally, researchers who take misclassification errors seriously can extend our proposed model to many empirical studies using panel data.