Testing Measurement Invariance over Time with Intensive Longitudinal Data and Identifying a Source of Non-invariance

Abstract Longitudinal measurement invariance (LMI) is a critical prerequisite to assessing change over time with intensive longitudinal data (ILD). For LMI testing with ILD, we propose cross-classified factor analysis (CCFA) to detect non-invariant item parameters and alignment optimization (AO) to detect non-invariant time points as a supplement to CCFA. In addition, we use a covariate in CCFA to identify a source of non-invariance. To evaluate the proposed models under unique features of ILD, such as autoregression (AR), we conducted a Monte Carlo simulation study. The results showed CCFA can be an excellent tool for ILD LMI testing regardless of simulation factors even when AR was misspecified and can identify a source of non-invariance using a covariate. AO can supplement CCFA to find non-invariant time points although AO requires a large number of persons. We provide detailed discussions and practical suggestions.

Intensive longitudinal data (ILD) that are collected at numerous time points (e.g., !20 time points; Collins, 2006) through real-time data collection methods, such as ecological momentary assessments, experience sampling, and daily diaries, are widely used to investigate changes (growth, trend, fluctuation, etc.) and the unfolding of dynamic processes over time (e.g., temporal dependency) (Hamaker & Wichers, 2017;Walls & Schafer, 2006). For instance, to study temporal dynamics of affect (e.g., Stavrakakis et al, 2015;Wen et al., 2022), researchers may use a psychological measure and collect participants' positive affect at the moment four times a day for two weeks and investigate how their positive affect changes over time within person (e.g., throughout a day, a week, or the entire period of study time) and also how positive affect dynamics and its relation with other factors (e.g., physical activities) are different between persons.
Measurement invariance (MI) is a critical prerequisite to assessing change over time with intensive longitudinal data (McNeish et al., 2021;Vogelsmeier et al., 2021). Longitudinal measurement invariance (LMI) is the equivalence of measurement (specifically, item parameters) over time (Kim & Willson, 2014;Liu et al., 2017;Meredith & Horn, 2001;Millsap, 2011;Vandenberg & Stanley, 2009;Widaman et al., 2010;Wu et al., 2010). The observed changes and patterns in responses over time are interpretable only if the meaning of the construct (e.g., positive affect) remains the same over time with the scale used to measure the construct. When LMI does not hold, changes in observed scores cannot be meaningfully interpreted.
Suppose participants interpret an item of a positive affect measure differently in the morning and afternoon (or on weekdays and weekend) or respond to the item differently depending on the contexts where they are situated, for example, with family members vs. with friends regardless of the true level of positive affect. Then, the change in their scores on this item reflects not the change in their positive affect but the change in their interpretation of the item. Therefore, LMI is imperative to make valid inferences about change over time.
Despite the importance of LMI with ILD, LMI testing with ILD could be challenging methodologically because ILD has unique features: the number of measurement occasions is generally very large (e.g., 20-100 time points), the assumption of independent observations does not hold due to repeated measures, and other complex features (e.g., autoregression) could be present (Walls & Schafer, 2006). Thus, traditional approaches to testing LMI (e.g., longitudinal measurement models; e.g., Liu et al., 2017;Widaman et al., 2010) may not be viable options. In the longitudinal measurement model, each factor represents a measure at each time point (assuming the unidimensionality of a measure for simplicity) and the multiple factors of repeated measures are allowed to be correlated (factor at time 1 with factor at time 2, etc.) as well as item residuals because the same items are used repeatedly over time (e.g., item 1 residual at time 1 with item 1 residuals of all the other time points). That is, with 100 time points we need to build 100 correlated factors with their item residuals correlated.
Along with the development of dynamic structural equation modeling for intensive longitudinal data analysis (e.g., Asparouhov et al., 2018;Hamaker et al., 2018), cross-classified factor analysis (CCFA) was introduced for LMI testing with ILD (McNeish et al., 2021). In the CCFA framework, observations in ILD are considered cross-classified by two higher-level factors (time and people). When observations are nested within time, we can evaluate the random effects of item parameters (i.e., factor loadings and intercepts) over time at the time level: that is, the variability of factor loadings and the variability of intercepts across time points. No variability of each item parameter over time indicates LMI (the details of CCFA are presented in the following section). Even though CCFA was demonstrated as a promising method for LMI testing with ILD (McNeish et al., 2021), the adequacy of CCFA for LMI testing has not been systematically investigated yet considering the unique features of ILD. Also, CCFA does not provide information about noninvariant time points (namely, the source of non-invariance), which could be of interest to applied researchers. Thus, in this study, we examined how CCFA performs in detecting measurement non-invariance (MNI) or noninvariant item parameters in ILD under various research conditions and examined how a covariate can be used to detect a source of MNI (non-invariant time points) in CCFA. Alignment optimization (AO; Asparouhov & Muth en, 2014) is gaining popularity for measurement invariance testing due to its flexibility in accommodating a large number of groups and comparing means without requiring exact invariance (e.g., Byrne & van de Vijver, 2017;Kim et al., 2017;Lai, 2021). However, its applications are generally limited to multiple-group comparisons (e.g., comparisons across countries). Recently, beyond group comparisons, Lai (2021) demonstrated its application to growth modeling under the violation of LMI (or partial invariance) by applying the alignment optimization algorithm to a configuralinvariance longitudinal measurement model with four time points. However, AO in this case has the same limitations as traditional approaches to ILD (e.g., building 100 correlated factors with 100 time points). In particular, AO for a longitudinal model requires a configural invariance model (all item parameters are freely estimated across all time points except minimal constraints; for example, 100 factor loadings per item), and thus constructing and estimating a longitudinal measurement model with ILD would be very challenging in AO.
Additionally, treating time points as groups and applying multiple-group AO instead of a longitudinal measurement model seemingly violates the independence assumption because repeated measures are less likely independent of each other. Previous studies (e.g., Kim et al., 2012) showed that ignoring data dependency in MI testing resulted in inflated Type I error (false detection of non-invariance when invariant). Even though we are aware of this obvious limitation of multiple-group AO with ILD, we believe it is still worthwhile investigating its performance for LMI testing for two major reasons. First, AO is a very appealing approach to approximate invariance (Asparouhov & Muth en, 2014) in which item parameters are not assumed to be exactly identical over, for instance, 100 time points. Second, AO produces information that is not available in CCFA: the exact location of MNI, that is, non-invariant time points for each item. We particularly focused on the second feature in AO because this is one of the biggest limitations of CCFA approaches to LMI testing. Given these benefits, we evaluated its feasibility for ILD LMI testing, particularly as a supplement to CCFA to identify the source of non-invariance.
Thus, the purposes of this paper are 3-fold. First, we examined the adequacy of CCFA for LMI testing with ILD under various conditions (e.g., with autoregression). Second, we investigated how a covariate can be used to identify the source of MNI across time points in CCFA. Third, in addition to CCFA, we evaluated AO for its feasibility with ILD in LMI testing. To this end, we conducted a simulation study. Through the systematic investigation, we (1) explore the behaviors of CCFA and AO specifically for ILD LMI testing, (2) evaluate and suggest the cutoffs to determine non-invariance in CCFA, (3) assess the impact of AR on the performance of CCFA and AO, and (4) promote practices to identify a source of non-invariance beyond the detection of non-invariance with practical suggestions.

Longitudinal Measurement Invariance
In the structural equation modeling framework, longitudinal measurement invariance (LMI) can be tested by showing the equivalence of measurement parameters (or item parameters) over time using confirmatory factor analysis (CFA) (Kim & Willson, 2014;McArdle & Nesselroade, 2014;Meredith & Horn, 2001;Millsap, 2011;Widaman et al., 2010). With a set of observed variables or items in a scale (Y) repeatedly measured over time, a measurement model at a time point t is mathematically defined in matrix form as: where m t , K t , g t , and e t denote the intercept, factor loading, latent factor, and residual, respectively at a time point t. The measurement model shows the relation of observed variables (Y t ) with their corresponding latent factors (the constructs that underlie the observed variables, g t ). The LMI is summarized as: Traditionally, LMI is tested hierarchically from configural invariance (equal forms), metric invariance (equal factor loadings; K t ¼ K), scalar invariance (equal loadings and intercepts; K t ¼ K, m t ¼ m), to strict invariance (equal loadings, intercepts, and residual variances; (Kim et al., 2020;Meredith, 1993;Widaman et al., 2010). Scalar invariance is required for a meaningful interpretation of change over time (Meredith & Horn, 2001;Widaman et al., 2010;Wu et al., 2010). It should be noted that in this study we assumed configural invariance and tested the invariance of factor loadings and intercepts (Equations 2 and 3) simultaneously using CCFA and AO as described in the Method section.

Cross-Classified Factor Analysis
In CCFA, intensive longitudinal data are considered crossclassified: observations are cross-classified into individuals and time points. For a CCFA model formulation, Equation (1) is rewritten as: with subscripts i for an individual and t for a time point to allow the parameters to vary across persons and time (namely, random effects). Therefore, the latent factor (g it ), factor loading (K it ) and intercept (m it ) are decomposed into the fixed effect, between-person random effect, and between-time random effect as shown below (McNeish et al., 2021).
The first component in each equation is the fixed effect which is the average of the corresponding parameters across persons and across time points. Note that the fixed effect of g it , that is, factor grand mean (a 00 ) is constrained at zero for identification. 1 The second component is the random effect of a person i denoted by p i : That represents, for example, how much the factor loadings of an individual i deviate from the averages. The random effects at the person level (p ig , p iK , p im ) are assumed to be multivariate normal: p i $ MVMð0, X ðpÞ Þ and estimated at the person level. The third component in each equation is the random effect of time t denoted by s t : That represents, for example, how much the factor loadings at time point t deviate from the averages. The random effects at the time level (s ig , s iK , s im ) are assumed to be multivariate normal: s i $ MVMð0, X ðsÞ Þ and estimated at the time level. In addition, the within-level residuals e it are assumed to be multivariate normal: e it $ MVMð0, HÞ: The residuals and random effects at different levels (within, person, and time) are all assumed to be independent of each other. It is also commonly assumed that H, X ðpÞ , and X ðsÞ are all diagonal matrices, which means that residuals or random effects within the same level are uncorrelated.
Let the variances of p ig , p iK , and p im be x K , and x ðpÞ m , respectively, which are components in X ðpÞ at the person level. Let the variances of s tg , s tK and s tm be x ðsÞ g , x ðsÞ K , and x ðsÞ m , respectively, which are components in X ðsÞ at the time level. For the LMI testing in CCFA, of focal interest are the variances of item parameters over time (i.e., between-time variances): specifically, the variances of two random effects at the time level (x ðsÞ K and x ðsÞ m ). x ðsÞ K indicates the variability of factor loadings over time. When x ðsÞ K ¼ 0 (no variability over time), it indicates that (3) for the intercept invariance. In practice, when the between-time variances are close to zero (e.g., .002;McNeish et al., 2021), the item parameters can be considered as invariant; when the variances are substantially large (e.g., 0.07;McNeish et al., 2021), MNI can be evidenced.

Identifying the Source of Non-invariance with a Covariate in CCFA
When the variance of a measurement parameter is large, which indicates measurement non-invariance, a covariate can be included to explain the variability. To explain the non-invariance across persons, a person-level covariate is entered at the between-person level: where X i is a covariate at the person level that explains the random effects of factor loading (p iK ) and intercept (p im ), c pK and c pm are the regression coefficients of X i , and u iK and u im are the residuals for factor loading and intercept, respectively. When the regression coefficients are statistically significant and/or the residual variances of u iK and u im (x ðpÞ K and x ðpÞ m ) are zero or close to zero, the covariate is identified as a source of non-invariance across persons. Some examples of person-level covariates are participant demographic variables, such as gender.
Similarly, to explain the non-invariance over time, a time-level covariate is entered at the between-time level: where X t is a covariate at the time level that explains the random effects of factor loading (s tK ) and intercept (s tm ), c tK and c tm are the regression coefficients of X t , and u tK and u tm are the residuals for factor loadings and intercepts, respectively. When the regression coefficients are statistically significant and/or the residual variances of u tK and u tm (x ðsÞ K and x ðsÞ m ) become zero or close to zero, the covariate is identified as a source of non-invariance over time. Some examples of time-level covariates are time of a day, day of the week, and some contexts at the moment of assessment (e.g., alone or accompanied).
When the residual variances are still large after the inclusion of a covariate, multiple covariates can be considered. Of note is that the identified source of non-invariance is a covariate that is related to the variability of item parameters and a causal inference (e.g., interpreting it as a cause of non-invariance) should not be made.

1
For identification, the within-level factor variance is fixed at one.

Data Requirements for CCFA
In intensive longitudinal studies, the types of ILD are exceedingly diverse depending on study aims, study design, assessment schedule, and sampling strategies (Shiffman et al., 2008). CCFA may be suitable for some types of ILD but not others. To use CCFA for longitudinal measurement invariance testing, time should be a unit of analysis (e.g., an ID variable at the time level in CCFA). Thus, CCFA would be applicable to longitudinal data collected based on an interval-contingent design, which is one of the common designs in ILD studies. 2 In this design, participants provide response data at set times determined by the researcher (Bolger & Laurenceau, 2013), for example, data collected at three predetermined times a day ($10:00 a.m., $16:00 p.m., and $22:00 p.m.) for 30 days (Stavrakakis et al., 2015). The interval-contingent designs tend to have the advantage of being more predictable and less disruptive for the participant, and also, analyses that require fixed intervals, such as time-series modeling techniques can be used.
In intensive longitudinal studies, missing data (nonresponse to some measurement occasions) are likely prevalent. In CCFA, the data are structured in the long format (also called univariate format) in which time is a variable (ID) and the values of time (time points) are repeated over persons. Thus, all available time points per person (e.g., if participant 1 responded to all 50 time points, 50 time points for participant 1, if participant 2 skipped 10 times, 40 time points for participant 2, and so on) are included in CCFA. Although CCFA can be done with only available time points without losing any participants with skipped time points, missing rates, patterns, and mechanisms should be scrutinized as a standard practice, and some missing data treatment may be considered (e.g., including participants with the compliance rate 80% or above).
In terms of the measurement model, because CCFA is a CFA model with cross-classified data, it requires the same identification rules as CFA does. Thus, for a CCFA model to be identified, at least three reflective indicators (or items) for a single factor and at least two reflective indicators per factor for two correlated factors are needed. CCFA also assumes configural invariance over time because only a single CFA model is specified in CCFA and MI is evaluated based on the variability of the estimated measurement parameters.

Alignment Optimization
Alignment optimization (AO; Asparouhov & Muth en, 2014) aims to find the measurement model that has a minimal number of items with large non-invariance while keeping the majority of items approximately invariant, which is similar to the rotation in exploratory factor analysis in which large factor loadings are retained but small factor loadings are minimized. LMI testing with ILD treats time points as independent groups and starts with the most relaxed model over time, that is, a configural invariance model, without constraining any parameters to be equal across time points. Factor mean and variance of time points can be estimated while searching for the optimal measurement invariance model that has a minimal amount of measurement non-invariance. Factor means and variances of each measured occasion are computed based on the sum of measurement non-invariance in both intercepts and factor loadings of all pairs of time points. The total simplicity F function that sums the total measurement non-invariance is listed in Equation (13).
where p is the number of observed variables, o m and o n indicate time points m and n (m 6 ¼ n) for every pair of time points in the data, k po m and k po n represent the factor loadings of m and n, respectively, and similarly, m po m and m po n are the intercepts of m and n, respectively. There are two different types of implementation of AO: free and fixed optimizations. In the free optimization method, the factor means and variances of all time points are freely estimated, whereas in fixed optimization the factor mean and variance of the first time point are fixed at 0 and 1, respectively. AO (as implemented in Mplus) provides detailed invariance analyses and results that are of focal interest in this study. For each item parameter, an invariant set of time points is achieved, such that each item parameter of any time point in this invariant set is not statistically significant from the average of the parameter of this invariant set at the alpha level .001 (Asparouhov & Muth en, 2014). On the contrary, for each time point that does not belong to the invariant set, its parameter is statistically different from the average of the parameter of the invariant set. Therefore, non-invariant time points are identified for the factor loading and intercept of each item, which can assist researchers in detecting the potential source of non-invariance (e.g., time points over the weekend are non-invariant from the other time points that occur over weekdays). In Mplus, noninvariant time points are marked with parentheses. However, there is a lack of guidelines about the degree of non-invariance that is allowed for meaningful interpretation of factor mean difference (Kim et al., 2017) in addition to the requirement of configural invariance and the violation of the independence assumption with ILD. Interested readers can refer to Asparouhov and Muth en (2014), Flake and The interval-contingent design is in contrast to signal-contingent and eventcontingent designs (Bolger & Laurenceau, 2013). With signal-contingent designs participants provide data at time points randomly signaled by the researcher and in event-contingent designs, participants provide data when a predefined event has occurred (e.g., smoking event). Each of these designs has advantages and disadvantages and the decision to use a particular design is usually determined by a number of factors that include the nature of the construct under investigation (e.g., amount of variability), research questions, and practical considerations (e.g., participant compliance). Papini et al.'s (2020) systematic review reported that the studies they examined (n ¼ 29) were predominantly based on either random prompts (n ¼ 13) or fixed time intervals (n ¼ 10). The examples of ILD data we observed in the literature are presented in the online supplements: https://osf.io/8pf6k/?view_only= a0e99ef8916b433baabceae49a725eeb.
McCoach (2018), Kim et al. (2017), and Lai (2021) for additional technical details, other features, and discussions about the benefits and limitations of AO in non-ILD settings.

Research Questions
In this study, we evaluated the performance of cross-classified factor analysis (CCFA) and alignment optimization (AO) for longitudinal measurement invariance (LMI) testing with intensive longitudinal data (ILD) through three research questions.
Research Question 1 (RQ1): How well does CCFA detect noninvariant items across time points with ILD?
Research Question 2 (RQ2): How well does a covariate identify a source of non-invariance in CCFA?
Research Question 3 (RQ3): How does AO detect non-invariant items and time points in LMI testing with ILD when time points are treated as independent groups?
To address the three research questions, we conducted a simulation study. Data were generated under one scheme for all three questions. However, fitted models and simulation outcomes are different across RQs and hence results are presented by the research question. Of note is that CCFA and AO are different analytic approaches to LMI with ILD producing different types of outcomes, and we did not intend to compare their performances.

Data Generation
The simulated data should be realistic and plausible based on research settings that are likely to occur in reality. Thus, we adopted population parameters from the real data example that was found in McNeish et al. (2021) and generated data based on research conditions we observed in ILD applied studies (e.g., Hardin & Smith, 2022;Madden et al., 2020;Thompson et al., 2012) as well as systematic reviews of intensive longitudinal studies (Heron et al., 2017;Rabasco & Sheehan, 2022). We also reviewed simulation studies on ILD (e.g., Asparouhov et al. Data were generated under the framework of CCFA. That is, we generated observations nested within time as well as people. At the within level, four continuous and multivariate normally distributed observed variables (Y1-Y4) under a single factor were created. 3 The factor loading fixed effects (K 00 ) for the four items were 0.8, 0.6, 0.7, and 0.7, respectively. The intercept fixed effects (m 00 ) were set at 3.0 for all four items. Residual variances of items (H) were all 1.0. The factor mean (a 00 ) was fixed at 0. Factor variance at the within level was set at 1.0. Note that the factor mean and within-level factor variance were constrained at 0 and 1, for identification. The factor variance at the person level (x ðpÞ g ) was simulated at 1.0, and the time level variance (x ðsÞ g ) was varied to create different levels of intraclass correlation (ICC; see Simulation Factors).
At the person level, we generated factor loading variance (x ðpÞ K ) and intercept variance (x ðpÞ m ) across persons at 0.07 (SD ¼ 0.26) and 0.10 (SD ¼ 0.32), respectively, which are considered as substantial differences across persons in terms of factor loadings and intercepts (i.e., factor loading noninvariance and intercept non-invariance across persons). Although our focal interest in this study was non-invariance over time, we generated non-invariance between persons to represent the situation where non-invariance is present between persons and moreover to ensure that LMI can be tested and established independent of MI across persons (regardless of invariance status across persons). In CCFA, the variances of item parameters were estimated at between person level (x ðpÞ K and x ðpÞ m ) and would be identified as noninvariance, but because non-invariance across persons was not of interest, its performance was not investigated in this study.

Simulation Factors
The simulation factors and their levels were selected based on our review of the aforementioned applied and methodological papers of ILD and also based on the results of our preliminary simulation.

Number of Time Points (t ¼ 15, 50, 100)
Because the parameters of interest in this study (factor loadings and intercepts across time points) are at the time level, the number of time points is expected to be related to the performance of CCFA and AO in detecting non-invariance. In the ILD applied studies we reviewed (n ¼ 11), t was as small as 12 and the mean was about 50. In previous simulation studies of ILD, generally, the minimum ranged from 10 to 30 and the maximum ranged from 100 to 300. Thus, 15 was selected as a small t in ILD. Although it was not uncommon to observe time points over 100, we expected the results of t ¼ 100 could be generalizable to a larger t.

Number of Persons (n ¼ 50, 100, 200)
The two systematic reviews of applied ILD studies reported 63 (with a range of 13 to 248) and 77 (range 6-303), respectively, for the average number of participants. In the simulation studies, n generally ranged from 10 to 300 with 200 as a more common maximum. We considered 50 and 100 as common sample sizes and 200 as a large sample size in ILD studies.

3
Because participants in ILD studies need to respond to survey prompts numerous times (e.g., five times a day for 10 days), the length and complexity of a survey needs to be very limited (Shiffman et al., 2008) and it would be easier to have fewer items for one latent construct. For example, the average number of items was 2.75 in the range of 1-9 in a systematic review of ILD studies of suicidal ideation (Rabasco & Sheehan, 2022).
2.3.3. Factor ICC at the Time Level (ICC ¼ .20,.60) The factor ICC at the time level was computed as ICC ¼ where the notations were defined earlier and 1 is the fixed within factor variance. The between-time factor ICC was generated by varying the between-time factor variance (x ðsÞ g ) at two levels (0.5 and 3.0) 4 while fixing the between-person factor variance (x ðpÞ g ) at 1. Thus, with an ICC of .20, the between-time factor variance (0.5) was half of the between-person factor variance (1.0); with an ICC of .60, the first (3.0) was three times larger than the second (1.0).

Autoregression (AR
There are two conditions: no autoregression and large autoregression. The condition of no autoregression served as a baseline to compare with AR 0.6 to investigate how AR affected the performance of the LMI testing methods. In the reviewed applied studies, AR ranged from 0.016 (not significant) to 0.37; in the simulation studies, the range was 0-0.70. We selected a large AR to ensure that we did not miss any potential impact of AR. It should be kept in mind that autoregression was generated only for the intercept non-invariance conditions: when autoregression is present or estimated, factor loadings are not allowed to vary over time (no random slopes over time). Thus, AR was applicable only to the intercept non-invariance conditions.

MNI Location (Factor Loading, Intercept)
Measurement non-invariance (MNI) was generated at either factor loadings or intercepts of two items (Y3 and Y4) out of four. The two items without MNI (Y1 and Y2) were used to evaluate the false detection of MNI.
For the following two simulation factors (MNI proportion and MNI size), we used covariate effects on the between-time random effects of item parameters (s tK , s tm ) to manipulate the MNI proportion and size. Using a covariate has advantages over directly manipulating the sizes of the between-time variances (x ðsÞ K , x ðsÞ m ) when generating MNI in CCFA. First, we were interested in identifying the source of MNI over time, and using a between-time covariate (X) allowed us to create a set of specific time points to be non-invariant (X ¼ 0 for invariant time points; X ¼ 1 for non-invariant time points). Second, we could also control the proportion of non-invariant time points by adjusting the cut point when creating a binary covariate, which is explained below.
2.3.6. MNI Proportion (25%, 50%) A binary covariate (X) that was related to non-invariant time points (0 for no MNI and 1 for MNI) was generated from a normally distributed variable. When the normally distributed variable ($N(0,1)) was dichotomized, we controlled the proportion of 1s by using a cut point. The cut point 0 created 50% of 1s or non-invariant time points, and the cut point 0.67 created 25% of 1s or non-invariant time points.
2.3.7. MNI Size (No, Small, Large; within-Subjects Factor) The size of MNI over time in this study was the variances of factor loadings or intercepts across time points (x ðsÞ K or x ðsÞ m ). First, MNI size was a within-subjects simulation factor in this study. That is, MNI size varied across four items. Y1 and Y2 were generated as approximately invariant over time (namely, size ¼ no). The variances of Y1 and Y2 at the time level were 0.0004 (SD ¼ 0.02) for factor loadings and 0.002 (SD ¼ 0.045) for intercepts, which were observed in McNeish et al. (2021) when item parameters were considered as invariant. A large size MNI and a small size MNI were generated for Y3 and Y4, respectively, using a covariate effect as shown in Equations (11) and (12). The covariate effects for factor loadings (c K ) were 0.7 and 0.4 for large and small MNI, respectively; those for intercepts (c m ) were 1.0 and 0.6, respectively. For factor loadings, the covariate effects 0.7 and 0.4 with a residual variance (the variance of u tK ) 0.0004 yielded the corresponding large and small loading variances (x ðsÞ K ) 0.092 and 0.030 for 25% MNI and 0.123 and 0.040 for 50% MNI conditions. For intercepts, the covariate effects 1.0 and 0.6 with a residual variance (the variance of u tm ) 0.002 yielded the corresponding large and small intercept variances (x ðsÞ m ) 0.190 and 0.070 for 25% MNI and 0.252 and 0.092 for 50% MNI conditions. The population parameters are summarized in Table 1.
For the intercept MNI, all the factors were crossed except the within-subjects factor (MNI size), which yielded 3 Â 3 Â 2 Â 2 Â 2 ¼ 72 conditions. With 3 Â 3 Â 2 Â 2 ¼ 36 conditions (no AR conditions) for the loading MNI, a total of 108 conditions were simulated. For each condition, 100 replications were generated. It should be noted that the In McNeish et al. (2021), the between-time factor variance was 0.012 while the between-person factor variance was 0.913. In our empirical investigation with one set of ILD, the between-time factor variance was 1.359 while the between-person factor variance was 1.074. execution time to run a single CCFA model with random intercepts and random slopes was about 30 min 5 (although it varied considerably depending on sample size and the presence of random slope). Because we fitted multiple CCFA models in addition to AO for each replication, we had to limit the number of replications to 100 and also curtail the simulation conditions.

RQ1
To examine how well CCFA can detect non-invariant items, we fitted CCFA to the generated data. When data were generated without AR, we did not specify AR in the analysis model (CCFA-noAR). However, when data were generated with AR, we ran two CCFA models: CCFA-AR and CCFA-noAR. In CCFA-AR, a first-order autoregressive effect was specified for the factor at the within level. We fitted CCFA-noAR in addition to CCFA-AR when the data were generated with AR to investigate the impact of ignored AR in detecting MNI because researchers might ignore modeling AR in LMI testing. CCFA-AR was not applicable to the factor loading non-invariance conditions and applied only to the intercept non-invariance conditions. For both CCFA models, we freely estimated all between-person and between-time variances for factor loadings and intercepts because in reality, researchers do not know which items are invariant or non-invariant. The factor variance at the within level and factor mean were constrained at 1 and 0 for model identification as noted earlier.

RQ2
To evaluate the covariate effect on the random effects of item parameters at the time level as a source of MNI, we ran CCFA with a covariate. The covariate was specified as a time-level variable and entered as a predictor of the random factor loadings of each item for factor loading non-invariance conditions and as a predictor of the random intercepts of each item for the intercept non-invariance conditions at the time level as shown in Equations (10) and (11). The covariate effect was specified for all four items. Note that when AR was present in the population, we specified AR to be estimated (CCFA-AR); when it was absent, we ran CCFA-noAR. In other words, AR was always correctly specified because the preliminary analysis (under the four conditions with the largest t and n) showed that the ignored AR impact on the covariate effect estimation was negligible when CCFA-noAR was fitted to the data with AR. The other model specification was identical to the fitted models in RQ1.

RQ3
To investigate the behaviors of AO in ILD LMI testing, we fitted AO models to all the generated data: intercept MNI with AR, intercept MNI without AR, and factor loading MNI (without AR). The time points were specified as known classes and a single-factor CFA model was built for each time point for LMI testing. We used the fixed optimization method because free and fixed methods did not show any notable differences in the preliminary analysis. For data generation and subsequent analyses, we used Mplus version 8.4 or higher (Muth en & Muth en, 2021) and R package MplusAutomation (Hallquist & Wiley, 2018) to execute Mplus and extract simulation outcomes. Simulation outcomes were analyzed and summarized with R and SAS. See Appendix A for the Mplus syntax for data generation and each fitted model. For CCFA, Bayesian estimation with GIBBS sampler was utilized with the default non-informative priors. A random cross-classified model was run with two processors and 2500 BITERATIONS using the procedure of potential scale reduction (PSR; Gelman et al., 1996) based on the last half of the iterations. For convergence, we used the default PSR criterion with BCONVERGENCE ¼ 0.05 in Mplus which indicates that the PSR is below 1.10 for all model parameters (Asparouhov & Muth en, 2010). For AO, ML estimation with the EM algorithm was utilized by default.

Research Question 1
For RQ1, we evaluated (1) the MNI detection rates as a primary simulation outcome, (2) the adequacy of the cutoff we used to determine MNI, and (3) parameter recovery (bias and root mean square error) under the conditions of ignored AR.
2.5.1.1. MNI Detection Rate. The major simulation outcome for RQ1 was the MNI detection rate which was defined as the proportion of replications in which a designated item parameter was detected as MNI out of 100 replications using the criterion we developed as discussed below. The MNI detection rates were computed for all four items. The detection rates of invariant items (Y1 and Y2) were regarded as false positive (FP) and those of non-invariant items (Y3 and Y4) as true positive (TP).
We considered the examined item parameter (factor loading or intercept) as MNI if the corresponding betweentime variance was larger than 0.02. 6 Given a lack of guidelines, we selected 0.02 as a cutoff because in the Bayesian approximate invariance literature the variance of 0.01 is often considered as an acceptable level of heterogeneity of item parameters across many groups (e.g., Cieciuch et al., 2014;van de Schoot et al., 2013). Also, the variance of 0.02 (SD ¼ 0.14) corresponds to a 95% confidence interval 5 We used R package parallel for simultaneous running of multiple models to maximize the computer capacity. With 10 cores utilized, running four conditions of the largest sample size took about 3 days.

Adequacy of the Cutoff
To evaluate whether the cutoff of 0.02 is reasonable for both factor loading and intercept non-invariance, we inspected the distribution of between-time variances across 100 replications for each item parameter of four items one by one. Specifically, we inspected the 1st, 5th, 10th, 50th, 90th, 95th, and 99th percentiles to determine a reasonable cutoff for MNI (what would be considered as a substantial size of between-time variance for factor loadings and intercepts, respectively?).

Parameter Recovery under the Ignored AR Conditions
To assess the impact of ignored AR on the CCFA performance, in addition to the MNI detection rates (based on the between-time variances), we inspected all the other parameters in the model (CCFA-noAR). For any influenced parameters, we checked the parameter recovery using bias/relative bias and root mean square error (RMSE) as secondary simulation outcomes and compared their values with those of the correctly specified model (CCFA-AR).

Research Question 2
The simulation outcomes for RQ2 include (1) power and Type I error of the covariate effect, (2) parameter recovery (bias and RMSE) of the covariate effect, and (3) the size of residual variance (unexplained by the covariate).

Power and Type I Error
The focal interest in RQ2 is the covariate effect on the random factor loadings and intercepts at the time level because a substantial size of the covariate effect and its statistical significance (based on the 95% credible interval) can indicate this covariate as a source of non-invariance over time. The primary simulation outcomes were the power and Type I error rates of the covariate effects on non-invariant item parameters (Y3 and Y4) and invariant item parameters (Y1 and Y2), respectively. The estimated covariate effect was considered statistically significant if the 95% credible interval did not capture zero.

Parameter Recovery
In addition, we investigated the parameter recovery of the covariate effect with bias (also relative bias) and RMSE.

Size of Residual Variance
If the covariate is a source of non-invariance over time and explains the between-time variance, the corresponding residual variance would become smaller and negligible (close to the variance of invariant item parameters, such as 0.002). Thus, we also evaluated the size of residual variance of each item parameter at the time level.

Research Question 3
For RQ3 with AO, two major simulation outcomes were evaluated: (1) MNI detection rates for item parameters and (2) accuracy in identifying non-invariant time points as a source of non-invariance.

MNI Detection Rates
In AO, if an item parameter at a time point is statistically different from the average of the parameter across invariant time points, it indicates the non-invariance of the item parameter at that time point, which appears in parentheses in the Mplus output of "approximate measurement invariance (non-invariance) for groups" (Asparouhov & Muth en, 2014). Thus, we defined an item parameter as MNI if the number of parentheses (i.e., the number of detected non-invariant time points) was larger than 10% of the total number of time points. 7 That is, two parentheses or more with 15 time points, six parentheses or more with 50 time points, and 11 parentheses or more with 100 time points were considered as MNI of the item parameter. Note that there is no specific guideline on how to determine MNI with parentheses information in the literature especially for each item parameter although Muth en and Asparouhov (2013) suggested a limit of 25% non-invariance across all item parameters for valid mean comparisons across groups. Given a lack of guidelines, we chose the 10% rule considering that in the population we generated there were a minimum of 25% non-invariant time points with a small size of MNI (in Y4), which can be considered as a plausible lower bound of MNI in reality.

Accuracy in Identifying the Non-invariant Time Points
The second simulation outcome for AO is the accuracy in identifying the non-invariant time points which is one of the main interests of this study. Note that we considered the detected non-invariant time points as a source of noninvariance. Because the proportion of non-invariant time points varied by condition, we examined the accuracy for the invariant time points (X ¼ 0) and for the non-invariant time points (X ¼ 1) separately: 100% accuracy for the invariant time points means that none of the invariant time points had parentheses and 100% accuracy for the noninvariant time points means that all non-invariant time points had parentheses.

MNI Detection Rates
First, for all the fitted models in RQ1 and RQ2 using CCFA, the models generally converged with admissible 7 We also examined the proportion of non-invariant time points (i.e., the number of time points that are detected as MNI divided by the total number of time points). We did not use this proportion as a detection rate because it depends on the proportion of MNI and cannot reach 1.0 (.50 maximum with 50% MNI; .25 maximum with 25% MNI).
solutions; the convergence rate was .98 or above. The nonconverged replications were excluded in the subsequent data analyses. When AR was not present in the population (no AR conditions), the MNI detection rates of the four items based on the cutoff of 0.02 for factor loading and intercept variances across time points are summarized in Table 2. Regarding factor loading MNI conditions, the FP rates for Y1 and Y2 were below .05 across all conditions. The TP rates were mostly 1.00 for large MNI (Y3). For small MNI (Y4), the TP rates were also usually 1.00 if the MNI proportion was 50%, but ranged from .78 to 1.00 with 25% noninvariant time points. For the intercept MNI conditions, the TP rates for Y3 and Y4 were above .90 in all conditions. However, the FP rates were seriously inflated in small sample size conditions. With 15 time points, the FP rates were mostly >.05, ranging from .01 to .63 and decreasing as the number of persons increased and ICC became smaller. With t ¼ 50 and n ¼ 50, the FP rates ranged from .00 to .10. In all the other conditions with more time points and more persons, the FP rates were close to zero.
When AR was present in the population under the intercept non-invariance conditions, we did not observe any notable impact of AR on the MNI detection rates. That is, the MNI detection rates of CCFA-noAR that ignored the AR were very comparable with those of CCFA-AR. Furthermore, whether there was AR in the population or not (AR vs. no AR conditions) did not make notable differences in MNI detection although the FP inflation with small samples was generally more serious under the presence of AR in the population regardless of fitted models. Because the results of AR conditions were not apparently different from those of no AR conditions, we do not present their detection rates in the paper, but the complete results are available online 8 (see Table S1). In sum, the TP rates were near 1.00 and the FP rates were close to zero for both factor loading and intercept non-invariance conditions when t ! 50 and n ! 100 regardless of the presence of AR in the population.

Adequacy of the Cutoff
To evaluate the adequacy of the cutoff 0.02 and explore potential cutoffs for MNI detection, we further examined the distributions of factor loading variances and intercept variances at the time level across 100 replications. The 1st, 5th, Table 2. Detection rates of non-invariant items using 0.02 cutoffs for the variances of intercepts and factor loadings in cross-classified factor analysis (CCFA).  10th, 50th, 90th, 95th, and 99th percentiles of the betweentime variance for each item parameter are presented in Tables S2 and S3 (from two fitted models as examples: CCFA-AR under intercept non-invariance and CCFA-noAR under factor loading non-invariance, respectively). The cutoff 0.02 we used as a substantial size of between-time variances appeared reasonable for both factor loadings and intercepts, especially if the number of time points is 50 or more and the number of persons is 100 or more as evidenced by the satisfactory detection rates presented above. However, except for the smallest sample size condition (t ¼ 15, n ¼ 50), the 95th percentiles of invariant intercept variances (Y1 and Y2) usually did not exceed 0.04 while the 5th percentiles of non-invariant intercept variances of Y3 (large MNI) and the 10th percentiles of the corresponding Y4 variances (small MNI) exceeded 0.04 (see Table S2). Thus, the cutoff 0.04 could generally achieve TP .90 or above even for small MNI while controlling for FP .05 or below. Similarly, the 95th percentiles of invariant factor loading variances usually did not exceed 0.01 while the 5th percentiles of non-invariant factor loading variances of Y3 and Y4 exceeded 0.01 except for one condition (see Table  S3), which indicates the cutoff 0.01 could be considered for TP .95 or higher and FP .05 or lower. We applied these two alternative cutoffs for intercept and factor loading MNI conditions, respectively. The detection rates based on the new cutoffs are presented in Table 3 for t ¼ 15 (see Table S4 for all conditions). For intercept MNI conditions, the FP rates in small sample size conditions decreased with the cutoff of 0.04 compared to the cutoff of 0.02; for factor loading MNI conditions, the TP rates of Y4 (small MNI) improved considerably. In sum, the cutoff of 0.02 was reasonable when t ! 50 and n ! 100, but when t < 50 and n < 100, the cutoff of 0.01 for factor loading variances and 0.04 for intercept variances showed improved performance.

Parameter Recovery under the Ignored AR Conditions
Although we did not find a noticeable impact of ignored AR on the MNI detection rates, we further examined whether the ignored AR affected the other parameters in the CCFA-noAR model as secondary simulation outcomes. Hence, under the conditions of large AR in the population, the parameter estimates of CCFA-AR and CCFA-noAR were compared systematically. The comparison showed that some parameter estimates at the person level of the CCFA models were impacted but parameters at the within level and time level were not. Thus, it is not surprising the MNI detection rates at the time level were robust to the misspecified AR. The bias and RMSE of the affected parameters (fixed-effect factor loadings, between-person factor loading variances, and between-person factor variance) from the two CCFA models (CCFA-noAR vs. CCFA-AR) were examined (Table S5). These parameters were all overestimated substantially in CCFA-noAR compared with those in CCFA-AR. For example, the bias in the fixed-effect factor loading of Y1 at the person level ranged from 0.10 to 0.18 for CCFA-noAR whereas the bias ranged from 0 to 0.03 for CCFA-AR. The corresponding RMSE values were also notably larger in CCFA-noAR than CCFA-AR. In sum, when the AR effect was ignored, fixed-effect factor loadings, between-person factor loading variances, and between-person factor variance were overestimated with larger RMSE.

Power and Type I Error of the Covariate Effect
In each of the population models, the correctly specified model with the covariate effect on random factor loadings or random intercepts at the time level was run to investigate the power/Type I error rate as well as the recovery of the covariate effect. Because the presence of AR in the population model did not have a noticeable impact on the covariate effect estimation, we present the results of no AR conditions in Table 4 (see Tables S6 and S7 for the complete sets of results). For loadings MNI conditions, the Type I error rates of the covariate effect on Y1 and Y2 were around or below .05 except for a few conditions with 50 or 100 time points combined with 100 persons, but they did not exceed .10. The covariate effect was significant in 100% of the replications for Y3 and Y4 in almost all conditions. For Table 3. Detection rates of non-invariant items using new cutoffs for the variances of intercepts and factor loadings in cross-classified factor analysis (CCFA).  intercepts MNI conditions, when the covariate effect on the random intercepts was simulated as zero for Y1 and Y2, Type I error rates were all under good control. When there were 15 time points, the power rates ranged from .73 to 1.00, and from .21 to .98 for Y3 and Y4, respectively. It varied majorly as a function of the covariate effect size and the proportion of non-invariant time points, decreasing with a smaller covariate effect (Y4) and 25% of non-invariant time points. When there were 50 or more time points, the covariate effect was significant in 100% of the replications. In sum, regardless of the AR status, both Type I error and power of the covariate effects were adequate across conditions except when there was small non-invariance in the intercepts with t ¼ 15 and n 100 where power was lower.

Bias and RMSE
The bias and RMSE of the covariate effect were also examined in the CCFA models as shown in Table S7. The bias of the covariate effect estimates was close to zero in most conditions irrespective of MNI location. RMSE ranged from 0.01 to 0.12 in the loading MNI conditions and from 0.02 to 0.23 in the intercept MNI conditions, deceasing with more time points and persons as expected. In sum, the covariate effects were unbiased across simulation conditions.

Results of RQ3
First, the AO models that were fitted to the data under factor loading MNI, intercept MNI without AR, and intercept MNI with AR conditions generally converged with admissible solutions; the convergence rate was .96 or above across conditions. The non-converged replications were excluded in the subsequent data analyses. The proportion of replications that had 10% or more time points detected as noninvariant (i.e., 10% or more parentheses) for each item parameter was defined as the detection rate in AO: FP rates for Y1 and Y2 and TP rates for Y3 and Y4. In the loading MNI conditions, we observed item switching in terms of noninvariant items (namely, item switching). To be more specific, in the population, Y3 and Y4 were simulated to be non-invariant; but the AO output in some replications showed Y1 and Y2 as non-invariant while Y3 and Y4 as invariant. Item switching for factor loadings happened usually in the 25% MNI conditions (switching rates up to .66 with large t and n). To evaluate the simulation outcomes Notes. MNI: measurement non-invariance; AR: autoregression; prop.: proportion of non-invariant time points (%); ICC: factor intraclass correlation. Type I error rates for Y1 and Y2 (invariant items); power rates for Y3 and Y4 (non-invariant items).
accurately, we identified those cases and switched the item labels (switch between Y1 and Y3; between Y2 and Y4). See Discussion for more information on switching.

MNI Detection Rates
The MNI detection rates of AO are presented in Figure 1 and Table S8. In the loadings MNI conditions, when there were only 50 persons, the detection rates of Y3 (large MNI) were close to zero regardless of all other design factors. When there were 100 or more persons, the detection rates ranged from .09 to 1.00, increasing with more time points, more persons, a larger proportion of non-invariant time points, and smaller ICC. The detection rates for Y4 (small MNI) had similar patterns but were generally lower than those for Y3, ranging from .01 to .80 when there were 100 or more persons. The FP rates for Y1 and Y2 were mostly zeros except for the conditions with a larger sample size (t ¼ 50 or 100, n ¼ 200) and 25% of non-invariant time points. A close inspection of these unacceptably high FP conditions revealed that Y1 seemed to be a duplicate of Y3 and thus was detected as MNI.
In terms of the intercept MNI conditions, regardless of AR status, the FP rates were mostly near zero (Figure 1). The TP rates were generally higher than those of loading MNI conditions, but the patterns of TP rates were similar between intercept and loading conditions. The impact of n was notable. For example, holding t ¼ 50, MNI proportion ¼ 50%, and ICC ¼ .20, as n increased from 50, 100, to 200, the TP rates of Y3 increased from .17, .88, and 1.00. The ICC was negatively related to the detection rates: for example, the detection rates were 1.00 and .78 for small and large ICCs, respectively, when t ¼ 100, n ¼ 100, and MNI proportion ¼ 50% under AR. Note that generally ICC was not related to the detection rates in CCFA.
The impact of AR on the detection rates was observed in some conditions of intercept MNI. When MNI size was small (Y4), the detection rates under no AR conditions were often higher than those under AR (e.g., .16-.91 under no AR vs. .09-.43 under AR for Y4 intercept when t ¼ 50 and n ¼ 200) indicating some negative impact of AR on detecting small MNI using AO. However, for large MNI (Y3) the detection rates under AR were usually higher than under no AR when ICC was large, t was small, and MNI proportion was small. In sum, the FP rates were close to zero across conditions except for the loading non-invariance conditions with n ¼ 200 and MNI proportion ¼ 25%. The TP rates were associated with all simulation factors, but particularly with the number of persons (the larger, the higher).

Accuracy of Identifying Non-invariant Time Points
To examine the accuracy of AO in identifying the invariant and non-invariant time points, the true invariant time points (that corresponds to the covariate value of 0) and the detected invariant time points by AO were compared (X0 accuracy); the true non-invariant time points (with a covariate value of 1) and the detected non-invariant time points by AO were compared (X1 accuracy) separately. Of note is that we observed switching between invariant and noninvariant time points in AO (namely, time point switching) generally when the proportions of invariant and non-invariant time points were balanced (50% MNI). The switching rates reached about 50% when n was 200 (see Discussion for more information on switching). When it occurred, we switched back invariant and non-invariant time points. Because the accuracy was not meaningful when the items were not detected as MNI with at least 10% non-invariant time points, we excluded the conditions in which the MNI detection rates were too low: n ¼ 50 and 100 for the factor loading non-invariance and n ¼ 50 for the intercept noninvariance. Because the patterns were similar between factor loading and intercept MNI conditions, we present the accuracy rates of the second only in Figure 2 (see Table S9 for the complete results).
Across all conditions (e.g., intercept or loading MNI, with or without AR), the accuracy in identifying invariant time points (X0) was very high (above 90%). However, the accuracy rates of non-invariant time points (X1) varied depending on simulation factors as observed in the MNI detection rates. Smaller n, smaller MNI, and larger ICC resulted in substantial deterioration in the accuracy. More time points and more non-invariant time points improved the accuracy. Taken together, regardless of AR status and the proportion of MNI, when t ¼ 50 or more, n ¼ 200, and ICC ¼ .20, the accuracy rates of Y3 (large MNI) were 1.00 or close to 1.00. In contrast, when n ¼ 100 and MNI size was small (Y4), the accuracy rates were near zero. That is, the time points with small MNI could not be detected with n ¼ 100 or less. However, with n ¼ 200, at least half of the non-invariant time points of Y3 could be correctly identified (accuracy rates over .50) across all conditions. The impact of AR was also observed on the accuracy of non-invariant time points (X1). The AR impacts were mixed depending on ICC and MNI size. The accuracy rates of Y3 (large MNI) were generally higher with AR than without AR in the population and this pattern was more notable when ICC was large (.60). On the contrary, the accuracy rates of Y4 (small MNI) were generally higher without AR and this pattern was more obvious when ICC was small (.20). In sum, the accuracy of detecting invariant time points was 90% or above across simulation conditions. The accuracy of detecting non-invariant time points depended on simulation factors such that higher accuracy was associated with more time points, more persons, larger MNI, larger MNI proportions, and smaller ICC.

Discussion
This simulation study found that for intensive longitudinal data (ILD) that allow time as a unit of analysis (e.g., ILD based on the interval-contingent design), CCFA was an excellent tool for longitudinal measurement invariance (LMI) testing: CCFA detected non-invariant item parameters almost always (100% or near 100%) across conditions while controlling for false positive rates near zero when the sample size was sufficiently large. We consider CCFA as an excellent method because its performance was generally robust to the simulation factors in this study. That is, when t ! 50 or n ! 100, CCFA reached optimal points (e.g., high TP and low FP) regardless of simulation factors. Moreover, CCFA allows a covariate to explain non-invariance. For the covariate effect, both power and Type I error control were excellent and the estimated covariate effect was unbiased across simulation conditions. In the following sections, we discuss salient findings and the limitations of the current study along with future research directions. We also provide practical recommendations for applied researchers.

Major Findings
First, the presence of the AR effect in the population did not have a notable impact on the LMI testing using CCFA, and furthermore modeling AR or not (CCFA-AR vs. CCFA-noAR) did not matter in the detection of MNI. Given the prevalence of AR effects in ILD, the robustness of CCFA to the misspecified AR in detecting MNI would be considered a major strength. However, because the ignored AR effect was related to severe upward bias and inflated RMSE in some parameters at the person level (specifically, betweenperson factor variance, between-person factor loading variances, and fixed-effect factor loadings), it is still important to properly specify AR.
We support the use of the cutoff 0.02 for the item parameter variance at the time level to detect MNI. The CCFA performance was very adequate with this cutoff. However, close inspection of the item parameter variances across 100 replications showed that 0.04 for intercept MNI and 0.01 for factor loading MNI could be reasonable cutoffs when the sample size is small (t ¼ 15; t ¼ 50 with n ¼ 50). These cutoffs (0.04 or 0.01) can also be considered when researchers have concerns about false positives in detecting intercept non-invariance or want to detect even a small size of noninvariance in factor loadings, respectively.
Instead of using a cutoff, a researcher can evaluate the sizes of variances and interpret them in context as demonstrated in McNeish et al. (2021). For example, assuming factor loadings are normally distributed, $95% of factor loadings would fall within two standard deviations of the mean (fixed effect). If the obtained factor loading (fixed effect) is 0.8 and the factor loading variance is 0.07 (SD ¼ 0.26), about 95% of factor loadings would fall between 0.28 (¼ 0.8 À 2 Ã 0.26) and 1.32 (¼ 0.8 þ 2 Ã 0.26) with a range of 1.04, which can be considered non-invariant in the context of the study. If the factor loading variance is 0.002 (SD ¼ 0.04), about 95% of factor loadings would fall between 0.72 (¼ 0.8 À 2 Ã 0.04) and 0.88 (¼ 0.8 þ 2 Ã 0.04) with a range of 0.16, which can be considered invariant in context.
This study also demonstrated that researchers could investigate a source of non-invariance over time using a covariate as a predictor of item parameters (random effects) at the time level. For example, if a theoretical consideration indicates that people are likely to respond to an item differently in the morning compared to other time points, this morning indicator could be used as a predictor of the intercept and factor loading of the item. If the covariate effect is statistically significant, the covariate is considered as a source of non-invariance. In addition to checking the magnitude and statistical significance of covariate effects, it is strongly recommended to check the residual variances after modeling the covariate as a source of non-invariance. If the covariate is a source of non-invariance, the residual variances would become notably smaller and ideally close to zero because the between-time variance is explained by the covariate, which was observed in this simulation study.
Furthermore, this practice can be applied to identifying the source of MNI across persons at the person level with a person-level covariate as shown in Equations (9) and (10). When the factor loading or intercept of an item has a substantially large variance at the person level, a person-level covariate is entered to explain the large variance across persons. A statistically significant covariate effect that results in the reduction in the person-level variance indicates that the covariate is related to MNI of the item parameter across persons.
As expected, the performance of AO heavily depended on the simulation factors. Most of all, we observed the consequences of ignoring data dependency in AO. That is, as ICC increased, the detection rates notably deteriorated. We also observed the impact of ignored AR although the AR effects on AO performance (detection of non-invariant items and non-invariant time points) were more complicated with other simulation factors. On the other hand, the impacts of ICC and AR were not notable in CCFA for LMI testing.
In general, AO required a much larger sample size than CCFA. In CCFA, for the detection of non-invariance in item parameters, it is sufficient to evaluate their variances at the time level. However, in AO, item parameters were estimated for each time point (e.g., 50 factor loadings per item when t ¼ 50) and an item parameter at each time point is compared with the average across invariant time points. For example, the number of free parameters in AO with t ¼ 100 was 1299 whereas that was 30 in CCFA-noAR without a covariate. Given a large number of free parameters to estimate, AO needed a large sample size.
Specifically, AO required a large number of persons per time point whereas the number of time points had less impact on the MNI detection rates. In this study, we found that regardless of the number of time points, AO did not perform well with n ¼ 50 and its performance improved notably as n increased. For example, when n ¼ 50, the TP rates did not exceed 50% even with t ¼ 100. On the other hand, even with t ¼ 15, the TP rates reached over 90% if n ¼ 200. Because we treated time points as groups in AO (e.g., 15 time points as 15 groups), this finding is reasonable and also consistent with the literature. Previous studies on MI across multiple groups (e.g., Asparouhov & Muth en, 2014; Lai, 2021; Muth en & Asparouhov, 2013) reported that AO performed reasonably with the number of groups as low as two and with group sizes as low as 100, which correspond to two time points and 100 persons per time point, respectively. On the other hand, in CCFA, time is a unit of analysis at the time level, and thus, more time points are needed for stable estimation of the parameters at the time level.
We propose the use of AO specifically to find non-invariant time points to supplement CCFA. Because CCFA does not produce the information on which time point is invariant or non-invariant, we were very interested in how AO accurately identifies the invariant vs. non-invariant time points. The accuracy in detecting non-invariant time points was high (over 90%) in some conditions, but the accuracy decreased notably as n was small, ICC large, MNI magnitude small, and MNI proportion small. However, the good news is that AO seldom misidentified an invariant time point as non-invariant. The accuracy of detecting invariant time points as invariant was about 90% or above across all conditions irrespective of simulation factors. In other words, if a time point is detected as non-invariant, the detected time point is highly likely a non-invariant one. Thus, even though the accuracy for detecting non-invariant time points was not very high (e.g., 50%), the detected set of non-invariant time points (e.g., 50% of morning time points) could be very informative for researchers to identify a source of noninvariance. Thus, we conclude that AO is a reasonable method to supplement CCFA by providing information on non-invariant time points. To use AO for this purpose, it is recommended to have a large sample size if all possible, as discussed earlier.
We observed two types of switching across replications in AO: item switching and time point switching. The switching is very problematic in a simulation study because a different item or time point is identified as MNI across replications. We acknowledge that the switching mechanism we described here is based on our observations, which should be confirmed in a study with a more systematic design to delineate different patterns of switching. It should also be noted that when the detection rates were very low, we could not definitely discern whether switching occurred or not.
The item switching (i.e., the two invariant items Y1 and Y2 were detected as non-invariant while the two non-invariant items Y3 and Y4 were detected as invariant) occurred in the factor loading MNI conditions only. We scrutinized the replications with item switching (Y1 and Y2 detected) in comparison to the replications without item switching (Y3 and Y4 detected). We speculate that item switching happened because in some replications the higher factor loadings relative to the others were considered as noninvariance whereas in other replications the lower factor loadings relative to the others were considered as noninvariance. Suppose the estimated factor loadings of Y1 and Y3 are 0.8 and 1.4, respectively for t ¼ 1. In one replication, Y1 is detected as non-invariance because it is significantly lower than the average, but in another replication, Y3 is detected as non-invariance because it is significantly higher than the average. This type of switching was usually observed when the number of non-invariant time points was small (25%).
We also observed that time point switching occurred (i.e., invariant time points were labeled as non-invariant with parentheses, and vice versa) when the invariant and non-invariant time points were balanced. In other words, when there was no dominant set of time points between invariant vs. non-invariant sets, AO could arbitrarily label one of them as non-invariant. It occurred less with a smaller sample size possibly because non-invariant time points were less well detected and thus the invariant time points could be dominant as the invariant set. Although these two types of switching will be problematic for simulation researchers, they may not be of concern in practice. For example, applied researchers can tell what half is different from the other half without labeling one of them as non-invariant.

Limitations and Future Directions
The findings of the study should be viewed with caution when they are generalized beyond the scope of the study. Future research can expand the range of conditions examined in the present study to evaluate the generalizability of the present findings. For example, we assumed that the variables in a scale can be treated as continuous. However, items often have <5 response categories. Thus, the performance of CCFA with ordered-categorical variables needs to be investigated. Although we used a Bayes estimator in CCFA, we utilized the software default settings and the results were not necessarily interpreted in the Bayesian framework. Future research could incorporate, for example, the use of informative priors and the model evaluation of CCFA in the Bayesian framework. In terms of the factor structure, only a single factor model was investigated in this study. However, Cao and Liang (2022) found that the performance of MI testing could be different with multiple factors (e.g., the decreasing sensitivity of some fit measures in MI testing as model size increases), and thus the extension of this study to a more complex factor structure in CCFA is called for.
We used a covariate to identify a source of non-invariance (non-invariant time points). Mixture modeling has been suggested to assess measurement invariance across groups and to identify a source of non-invariance (e.g., Kim et al., 2017;Lubke & Muth en, 2005;Maij-de Meij et al., 2010;Wang et al., 2021). A similar strategy can be applied to LMI testing with time points. For example, a mixture model with ILD was illustrated in latent Markov factor analysis (Vogelsmeier et al., 2019) which allows researchers to explore the changes of a measurement model over time. We limited our focus on LMI without considering the test of non-invariance across persons. Future research can include a simultaneous investigation of MNI at both time and person levels particularly when there is an interaction between them. We also assumed that time is discrete to implement CCFA and AO. Future studies on LMI testing with different types of time variables are called for.

Practical Suggestions
In terms of sample size for CCFA, if all possible, we do recommend t ¼ 50 or more and n ¼ 100 or more to investigate longitudinal measurement invariance in intensive longitudinal data. The variance cutoff of 0.02 appears reasonable to detect non-invariance. However, when researchers need to evaluate ILD LMI with smaller ends of sample size, the cutoff of 0.04 variance for intercepts and 0.01varaince for factor loadings instead of 0.02 are recommended. However, LMI testing should not be considered with t 15 and n 50.
Although this study found no notable impact of AR in the CCFA LMI testing with ILD, the AR effect should be properly modeled in CCFA because the deleterious impact of ignored AR was apparent in the other parameters at the person level. Especially, researchers should be aware that the variances of factor loadings are inflated at the person level, which could lead to false detection of non-invariance across persons.
We recommend the use of AO as a supplement to CCFA. One of the major limitations of CCFA is the lack of information about non-invariant time points. We demonstrated using a covariate as a feasible solution to this limitation, but in practice, it may not be easy to find a good covariate. The information about non-invariant time points obtained from AO could be very useful to identify a covariate. Researchers can scrutinize the characteristics of detected non-invariant time points and select a potential covariate or covariates. However, applied researchers should be mindful that AO requires a large number of participants and its performance is optimal when ICC is small. When t ¼ 15, n should be at least 200; when t ¼ 50 or more, 100 or more participants are recommended.
In addition, theoretical considerations are strongly recommended in the selection of covariates. For example, if a researcher hypothesizes that the interpretation of an item measuring alcohol expectancy may differ depending on when they are alone or accompanied at the moment, a covariate "being alone" can be used to explain the detected noninvariance. Or, if an ILD study includes an intervention and it is hypothesized that the intervention changes not only the outcome but also possibly the way participants interpret some items of the outcome measure, the intervention can be considered as a covariate. This kind of theory-based covariates needs to be considered before data collection so that they can be included in data collection. As more researchers identify potential covariates, these covariates can then be built into future data collections.
Because CCFA assumes configural invariance, it is strongly recommended to use a well-established measure, utilize previous studies on the psychometrics of a selected measure, and conduct analyses, such as CFA before CCFA to examine the measurement properties of the measure with the sample. It should also be noted that there are LMI testing methods that are more exploratory (e.g., models that combine factor analysis and latent Markov modeling: Vogelsmeier et al., 2019;Xia et al., 2016) and they are advantageous in detecting unknown heterogeneity of measurement models over time without assuming configural invariance (Vogelsmeier et al., 2019).
After the detection of non-invariance over time and the source of non-invariance, further investigation is highly recommended. For example, the impact of detected non-invariance on subsequent data analyses needs to be evaluated. Also, the identified source of non-invariance (covariate) can be incorporated in subsequent data analyses and researchers should consider the potential impacts of non-invariance in the interpretations of the results. The information on noninvariance should be utilized in the next iterations of scale development or in the future ILD studies. Importantly, the identified source of non-invariance is a covariate and should not be interpreted as a cause of non-invariance. Further research (including qualitative studies) can be conducted to discover the cause of non-invariance.

Concluding Remarks
In conclusion, we propose CCFA for LMI testing with ILD and AO as a supplement to CCFA. For the detection of non-invariant items over time, CCFA is a very adequate tool. If some of the item parameters are detected as noninvariant through CCFA, AO can be utilized for the detection of non-invariant time points as a source of non-invariance when theory is lacking to identify a good covariate. Then, a relevant covariate based on the findings of AO can be used as a predictor to explain non-invariance at the time level and evaluated as a source of non-invariance.