Optimizing baseline and post-concussion assessments through identification, confirmation, and equivalence of latent factor structures: Findings from the NCAA-DoD CARE Consortium

Abstract Objective: Concussion evaluations use a multidimensional assessment to evaluate unique patient function dimensions (e.g., subjective symptoms differ from balance assessments), but the overarching latent factor structure has not been empirically substantiated. Our objective was to determine the cumulative latent factor structure of pre-injury baseline and acute (<48-h) post-concussion assessment battery outcomes, and to determine measurement equivalence among common factors in collegiate student-athletes. Methods: Collegiate student-athletes at baseline (n = 21,865) and post-concussion (n = 1,537) across 25 institutions completed standardized assessments. Individual items were used from the baseline and post-concussion assessments and consisted of: Sport Concussion Assessment Tool, Brief Symptom Inventory-18, Standardized Assessment of Concussion, Balance Error Scoring System, Immediate Post-Concussion Assessment and Cognitive Test, and vestibular-ocular motor screening. Exploratory factor analysis was used on half the baseline data, and confirmatory factor analysis on the remaining baseline data and post-concussion data separately. Measurement equivalence was assessed between sex, sport contact classification, concussion history, and time. Results: A 10-factor exploratory model was established, comprising: depression, somatic, vestibulo-ocular, headache, postural stability, neurocognition, emotional, fatigue, cognitive, and consciousness clouding. The 10-factor model was confirmed at baseline and post-concussion with strong measurement equivalence between timepoints. Strong to strict measurement equivalence was observed for sex, sport contact classification, and concussion history at both timepoints separately. Conclusion: Our findings established a robust 10-factor latent factor model equivalent across timepoints and common factors among healthy and concussed collegiate athletes.
Clinicians can use these findings to target specific factors while reducing redundant elements to provide efficient, comprehensive post-concussion assessments.


Introduction
Concussion is a common sport injury across all competition levels and sports (Chandran et al., 2022; Kerr et al., 2019), yet is one of the most elusive and challenging sport pathologies to diagnose since there is currently no single gold-standard measure (McCrory et al., 2017). Medical organizations and international concussion consensus groups therefore recommend multimodal battery examinations comprised of physical, mental, sensorimotor, cognitive, and/or vestibular oculomotor assessments to increase diagnostic certainty (Broglio et al., 2014; Harmon et al., 2019; McCrory et al., 2017). Each component of the multimodal battery can provide complementary insights for comprehensive patient function evaluation. For example, a subjective symptom inquiry provides clinicians information about a patient's perceived function that overall differs from the other clinical assessments. However, this premise is based primarily on expert opinion rather than evidence, as the overarching interrelationships and multidimensional latent factor structure of current concussion assessments have been minimally explored (Kissinger-Knox et al., 2021). Limited insight into the overarching latent factor structure is a concern because the assessments and their underlying components may overlap in the constructs assessed (i.e., be redundant), and thus optimizing the multimodal battery may be possible by reducing which components are included.
Determining if redundant outcomes are present both within and between assessments is important for minimizing the cumulative health care burden for clinicians and patients. The most comprehensive multimodal batteries come with relatively large time, financial, and personnel demands for stakeholders. For example, implementing a battery that thoroughly evaluates the recommended domains can require at least one clinician to spend 60-90 minutes per individual (Mihalik et al., 2013) and use assessments that often carry per-use costs (e.g., most computerized neurocognitive testing batteries), with additional clinicians, and thus time and salary support, needed for pre-injury baseline testing. To reduce these burdens, we must first determine if redundancy is present so assessments can then be examined for ways to potentially optimize data elements. Multiple methods exist for identifying and reducing assessment redundancy (Garcia et al., 2020), with the most common approach being factor analysis to examine the dimensionality present.
Factor analysis examining outcome structures has been used in prior concussion research on symptom assessment checklists to either optimize the assessments by reducing survey items (Joyce et al., 2015; Piland et al., 2003), or identify overarching symptom clusters (i.e., latent factors) to help understand commonalities and guide patient assessment and rehabilitation (Anderson et al., 2020; Barker-Collo et al., 2018; Brett et al., 2020). This technique has also been used on computerized neurocognitive test platforms to simultaneously reduce data dimensions and improve diagnostic accuracy through construct use over composite scores (Schatz & Maerlender, 2013). Only one study to our knowledge has examined the dimensionality of a multimodal concussion assessment battery, using principal component analysis among the post-concussion symptom checklist (PCSS), Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT), and vestibular-ocular motor screening (VOMS) (Kissinger-Knox et al., 2021). Kissinger-Knox et al. (2021) examined 237 adolescents within one week post-concussion and identified three distinct latent factors (symptoms, cognition, vestibular-ocular), with not every outcome meaningfully loading onto a factor (i.e., potential battery redundancy or unnecessary components). However, the factor structure observed among this adolescent population may not translate to collegiate student-athletes. Further, Kissinger-Knox et al. (2021) tested a relatively incomplete multimodal battery, as current evidence-based guidelines (McCrory et al., 2017) and clinical practice patterns (Lempke et al., 2020) suggest sensorimotor and cognitive assessments provide valuable information. Lastly, measurement equivalence (also known as measurement invariance) (Karr & Iverson, 2022; Putnick & Bornstein, 2016) was not assessed between demographic factors with known confounding potential, such as sex, concussion history, and sport contact classification (Iverson et al., 2017). Measurement equivalence is an analytic technique to assess to what degree a latent factor structure is consistent either between groups or across timepoints (Bontempo & Hofer, 2007; Vandenberg & Lance, 2000), and specifies whether the latent factors being measured are consistent and truly comparable (i.e., whether meaningful factor group comparisons are appropriate). This is critically important as we draw inferences, for example, from changes between pre-injury baseline and post-concussion assessment outcomes. Deeper investigations are thus necessary to contextualize the latent factor structures of the multimodal concussion assessment battery pre- and post-concussion and examine measurement equivalence, given the scarcity and limitations of current literature examining dimensionality (Kissinger-Knox et al., 2021).
The purpose of our study was to 1) examine the latent factor structure of pre-injury baseline and acute (<48-h) post-concussion assessment battery outcomes through exploratory and confirmatory factor analysis, and 2) determine measurement equivalence for sex, concussion history, sport contact classification, and between baseline and post-concussion among collegiate student-athletes.

Methods
This study was part of the National Collegiate Athletic Association (NCAA)-Department of Defense (DoD) Concussion Assessment, Research and Education (CARE) Consortium (Broglio et al., 2017), a study aimed at examining the effects of concussion in collegiate student-athletes and United States military service academy cadets. We examined varsity student-athletes who completed previously described pre-injury baseline testing and longitudinal serial testing after a concussion (Broglio et al., 2017) between Fall 2014 and Spring 2020. Longitudinal assessments occurred at standardized post-concussion timeframes, with data collected at the <48 h timepoint examined in the present study (Broglio et al., 2017). Some enrolled participants had multiple evaluations and assessments completed at the <48 h timepoint; in those cases, the first occurring assessment outcomes were used. This study is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (von Elm et al., 2008).

Participants
The initial sample of 60,720 baseline cases and 5,485 concussion cases from collegiate student-athletes and military cadets was examined for eligibility. The CARE Consortium cohort has been previously described regarding pre-injury baseline assessment performance, post-concussion injury characteristics, and recovery trajectories elsewhere (Broglio et al., 2022). Student-athletes were excluded for the following self-reported factors: learning disability, autism, bipolar disorder, schizophrenia, moderate or severe traumatic brain injury history, brain surgery history, and any sleep, balance, vestibular, or psychological disorder. We excluded individuals with these self-reported factors because low case counts were observed in these data (≤3.5% of baseline cases), and therefore controlling or modeling them was not possible. This methodological decision was made to minimize additional variance that may confound the factor analyses, but it likely limits the generalizability of these findings to individuals with the excluded factors. We also excluded participants enrolled in the CARE Consortium study who were non-varsity military cadets and midshipmen, because these individuals completed different computerized neurocognitive tests than the student-athletes, as well as any individual who had incomplete assessment outcomes and subscores except for the VOMS. Only the first baseline or post-concussion data occurring in time were used for analyses in the event of multiple records per student-athlete. Stepwise exclusion flowcharts are presented in Supplementary Figure 1, with the final sample size and demographics provided in Table 1. The United States Army Medical Research and Materiel Command Human Research Protection Office, the University of Michigan Institutional Review Board, and each CARE Consortium site's Institutional Review Board reviewed all study procedures. All participants provided written informed consent prior to participation.

Baseline and post-concussion assessments
Participants completed identical questionnaires during study enrollment across each site consisting of demographic, sport, concussion, and medical health history questionnaires (Broglio et al., 2017). Specific to the present study, the demographic characteristics of sex (male, female), sport contact classification as defined by Rice and The Council on Sports Medicine and Fitness (2008) (contact [e.g., football, soccer, basketball], limited contact [e.g., baseball, softball, volleyball], non-contact [e.g., golf, swimming, track]), and concussion history before enrolling in the study (yes vs. no) were examined via measurement equivalence given their potential confounding roles in the latent factor structure (Iverson et al., 2017). The specific sport in which student-athletes participated was considered for use, but this was not possible given the numerous sports present in these data, and thus sport contact classification was used as a proxy.
The CARE Consortium assessments used at baseline and post-concussion and their psychometric properties within this dataset have been described elsewhere (Broglio et al., 2017, 2018). In brief, the core assessments included the following assessments and outcomes for factor analysis use: Sport Concussion Assessment Tool (SCAT) symptom inventory (each of the 22 symptom items separately) (Bpgl, 2013; Guskiewicz et al., 2013), Brief Symptom Inventory-18 (BSI-18; each of the 18 items separately) (Derogatis, 2001; Lancaster et al., 2016), Standardized Assessment of Concussion (SAC; orientation, immediate memory, concentration, and delayed recall subscores) (McCrea et al., 1997, 1998), Balance Error Scoring System (BESS; all 6 subscores from the surface and stance conditions) (Guskiewicz et al., 2001), and neurocognitive testing platforms at the discretion of each CARE site. The Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT; Schatz et al., 2006) was used by 83% (25/30) of CARE sites, and therefore its standard domain scores (verbal and visual memory, visuomotor speed, and reaction time composite scores) were used for analysis. The VOMS was a supplementary assessment used among a subset of 16 CARE Consortium sites, and thus its outcomes (summed symptom provocation during smooth pursuits, horizontal and vertical saccades, convergence, horizontal and vertical vestibular ocular reflex, and visual motion sensitivity conditions separately) (Mucha et al., 2014) were included in our study as exploratory outcomes and modeling given its growing clinical use (Lempke et al., 2020) and relevance to concussion diagnostics (Ferris et al., 2021, 2022; Kontos et al., 2021; Mucha et al., 2014).

Statistical analysis
Descriptive statistics via means and standard deviations, medians and interquartile ranges, and frequencies and proportions for participant baseline and <48 h post-concussion demographics (age, sex, race, sport contact level, division level, concussion history) were calculated separately and where applicable. All analyses were conducted using the R Project for Statistical Computing (v 4.2.1; R Foundation for Statistical Computing, Vienna, Austria) (R Core Team, 2018), with the exploratory factor analyses carried out with the psych package (Revelle, 2022), and confirmatory factor analysis and measurement equivalence via the lavaan package (Rosseel, 2012). The baseline dataset consisting of the SCAT, BSI-18, SAC, BESS, and ImPACT outcomes was split equally and randomly into two datasets (training: n = 10,933; testing: n = 10,932), with the training dataset used for exploratory factor analysis and the testing dataset for confirmatory factor analysis and measurement equivalence testing. All outcomes were treated as discrete or continuous, as is common for factor analysis modeling.
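The random half-split described above can be sketched as follows. This is not the study's code (the analyses were run in R); it is an illustrative Python sketch, with the function name and seed being hypothetical choices.

```python
import numpy as np

def split_half(n_rows, seed=42):
    """Randomly split row indices into two near-equal halves:
    one half for exploratory factor analysis (training) and the
    other for confirmatory factor analysis (testing)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)           # shuffle all row indices once
    half = n_rows // 2 + n_rows % 2         # training gets the extra row on odd n
    return idx[:half], idx[half:]

# With the baseline sample of 21,865 this reproduces the reported
# 10,933 training and 10,932 testing rows.
train_idx, test_idx = split_half(21865)
```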

Exploratory factor analysis
The training dataset was assessed for factor analysis suitability through the Kaiser-Meyer-Olkin test (≥0.60) (Kaiser, 1974) and Bartlett's sphericity test (p ≤ 0.05) (Tobias & Carlson, 1969) at their respective a priori thresholds. Factor analysis suitability was confirmed, and a subsequent scree plot with parallel analysis was used to objectively determine an appropriate number of factors based on the eigenvalues, resulting in a 10-factor structure (Supplementary Figure 2). Importantly, the factor labels used throughout the manuscript were assigned by the author team based upon the items within a given factor, consistent with factor analysis approaches (Karr & Iverson, 2022; Kissinger-Knox et al., 2021).
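The two suitability tests above have closed-form definitions that can be reimplemented directly from a correlation matrix. The study used the R psych package; the sketch below is an illustrative Python reimplementation of the standard formulas (KMO from the ratio of squared correlations to squared correlations plus squared partial correlations; Bartlett's χ² from the log-determinant of the correlation matrix).

```python
import numpy as np
from scipy import stats

def kmo(R):
    """Kaiser-Meyer-Olkin sampling adequacy from a correlation matrix R."""
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)       # anti-image partial correlations
    np.fill_diagonal(partial, 0.0)
    R_off = R - np.eye(len(R))              # zero out the diagonal of R
    return (R_off**2).sum() / ((R_off**2).sum() + (partial**2).sum())

def bartlett_sphericity(R, n):
    """Bartlett's test that R is an identity matrix (n = sample size).
    Returns the chi-square statistic and its p-value."""
    p = len(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

# Toy example: four equally correlated items (r = 0.5).
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
# kmo(R) is exactly 0.8 here, above the 0.60 adequacy cutoff,
# and Bartlett's test rejects sphericity at any reasonable n.
```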
The exploratory factor analysis model was fitted with a promax rotation (i.e., an oblique rotation method that allows inter-factor correlation) using maximum likelihood estimation. Exploratory factor analysis statistics consisted of factor loadings, factor-level variance explained, uniqueness (outcome variance not shared with other outcomes), Hoffman's index of complexity (how much an outcome reflects a single construct; a value of 1.0 indicates the outcome loads onto a single factor, with higher values indicating relatively more cross-loading), and the Tucker-Lewis Index (TLI; ≥0.90 indicates adequate fit) (Bentler & Bonett, 1980). Baseline assessment outcomes were required to meaningfully load onto a factor (≥0.40) to be included in the model, and any outcome demonstrating significant cross-loading (≥0.40 on two or more factors) was excluded, as significant cross-loading indicates the outcome is not a specific measure of the latent factor (Costello & Osborne, 2005; Howard, 2016). Once the above criteria were met, the exploratory factor model was passed on for confirmatory factor analysis testing.
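The item-retention rules above (keep only items with exactly one loading ≥0.40; drop weak items and cross-loaders) amount to a simple screen over the loading matrix. The sketch below is illustrative, with a hypothetical 4-item, 2-factor loading matrix; it is not the study's code.

```python
import numpy as np

def screen_items(loadings, threshold=0.40):
    """Apply the retention rules to an items-by-factors loading matrix:
    an item is kept only if exactly one |loading| meets the threshold."""
    hits = (np.abs(loadings) >= threshold).sum(axis=1)
    keep = hits == 1            # loads meaningfully on exactly one factor
    dropped_weak = hits == 0    # no meaningful loading anywhere
    dropped_cross = hits >= 2   # significant cross-loading
    return keep, dropped_weak, dropped_cross

# Hypothetical 4-item x 2-factor example:
L = np.array([[0.72, 0.10],    # clean loading on factor 1 -> keep
              [0.15, 0.65],    # clean loading on factor 2 -> keep
              [0.25, 0.30],    # no loading >= 0.40        -> drop (weak)
              [0.45, 0.50]])   # >= 0.40 on both factors   -> drop (cross)
keep, weak, cross = screen_items(L)
```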

Exploratory VOMS model
An exploratory factor analysis was applied to the subset of data with complete VOMS assessment outcomes at pre-injury baseline (n = 5,530, k = 13), but not acutely post-concussion given the relatively small sample size (n = 515). The VOMS exploratory model followed the same analytic guidelines as the exploratory factor analysis above, except that both 11-factor and 10-factor models were fitted, with confirmatory factor analysis comparing the two models for best fit.

Confirmatory factor analysis
The testing dataset and post-concussion dataset were separately fit to the final exploratory factor model. Confirming the model fit at post-concussion relative to pre-injury baseline provides an empirical evaluation of whether the same overall factor structure is applicable before and after a concussion (i.e., whether or not we should clinically be using the same assessment constructs to evaluate pre- and post-injury). Confirmatory factor analysis and measurement equivalence models used maximum likelihood estimation with robust standard errors within the lavaan package (Rosseel, 2012). Model fit indices of χ² goodness-of-fit, comparative fit index (CFI), and root mean square error of approximation (RMSEA) were used to determine whether an acceptable fit was present. Established threshold values for CFI (≥0.90) (Bentler & Bonett, 1980) and RMSEA (≤0.06) (Hu & Bentler, 1999) were used to determine if an adequate fit was present, with decreases in CFI ≥0.01 (Cheung & Rensvold, 2002) or increases in RMSEA ≥0.015 (Chen, 2007) indicating worse fit of the confirmatory model relative to the exploratory model (Chen, 2007; Cheung & Rensvold, 2002). Non-significant p-values from the χ² test were not considered for determining differences in model fit, as they are discouraged due to being highly affected by large sample sizes (Tanaka, 1987). If confirmatory factor analysis parameter thresholds were not exceeded, the initial exploratory factor analysis model would be deemed valid, visually depicted using a path diagram, and used for subsequent measurement equivalence testing.
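Both fit indices above can be computed from the model and baseline χ² statistics. The sketch below uses one common formulation (lavaan's defaults differ slightly in the RMSEA denominator); it is illustrative only, with made-up χ² values, not results from the study.

```python
import math

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index from the tested model (m) and the
    null/baseline model (b): 1 - max(chi2_m - df_m, 0) / max(...)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 - (num / den if den > 0 else 0.0)

def rmsea(chi2_m, df_m, n):
    """Root mean square error of approximation (one common formulation,
    using n - 1 in the denominator)."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

# Hypothetical values: chi2 = 250 on 200 df (model), 5000 on 231 df (null),
# n = 1000. CFI is about 0.990 and RMSEA about 0.016, so this hypothetical
# model would meet the CFI >= 0.90 and RMSEA <= 0.06 cutoffs used above.
fit_cfi = cfi(250, 200, 5000, 231)
fit_rmsea = rmsea(250, 200, 1000)
```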

Measurement equivalence
The established method (Bontempo & Hofer, 2007; Cheung & Rensvold, 2002; Vandenberg & Lance, 2000) of using additive model constraints was used to evaluate the level of measurement equivalence (configural, weak, strong, strict) in model factor structures between the levels of each factor (sex, sport contact classification, concussion history) at baseline and post-concussion, as well as between these two timepoints, as these factors may be considerable confounders in the literature (Iverson et al., 2017). For example, females, individuals participating in high-contact sports, and individuals with a prior concussion have been reported to have greater concussion risk along with differences in concussion assessment performance at pre-injury baseline and post-injury. Evaluating the factor structure between sex, sport contact categories, and concussion history provides insight into whether differences are attributable to these factors, or whether they may be partially attributable to the unique constructs underlying and being evaluated by these assessments. Determining measurement equivalence pre- and post-concussion quantitatively evaluates whether the same factor structure and parameter weighting are present, and thus whether we can truly compare these constructs between timepoints or across confounders.
In brief, configural equivalence constrains the same assessment outcomes to load on the same respective factors between comparisons. Weak equivalence adds to the configural constraints the requirement that the loading values are similar between comparisons. Strong equivalence further adds to weak by specifying that the assessment outcome thresholds be equivalent between comparisons. Lastly, strict equivalence adds to the strong equivalence model the constraint that assessment outcome residuals are invariant between comparisons. Strong equivalence reveals that changes in assessment outcomes are accounted for by the respective factor level (i.e., the measured construct) itself rather than the assessment outcome. Strict equivalence indicates that differences in assessment outcomes are entirely due to changes in the respective factor's mean (i.e., strong measurement equivalence) and that the remaining unaccounted variance is equal between comparisons. Decreases in CFI ≥0.01 (Cheung & Rensvold, 2002) or increases in RMSEA ≥0.015 (Chen, 2007) across the additive measurement equivalence models relative to the referent configural model were used as thresholds for worse model fit (i.e., the model constraint is not appropriate) between the factor levels (Chen, 2007; Cheung & Rensvold, 2002).
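The additive decision rule above (walk configural → weak → strong → strict, stopping once ΔCFI or ΔRMSEA relative to the configural referent crosses threshold) can be sketched as below. The fit values are hypothetical, chosen only to illustrate a case where strong but not strict equivalence is supported; this is not the study's code.

```python
def equivalence_level(fits, d_cfi=0.01, d_rmsea=0.015):
    """Given (CFI, RMSEA) per constraint level, return the most
    constrained level whose fit does not meaningfully worsen relative
    to the configural referent, using the thresholds described above."""
    base_cfi, base_rmsea = fits["configural"]
    supported = "configural"
    for level in ("weak", "strong", "strict"):
        if level not in fits:
            break
        cfi_l, rmsea_l = fits[level]
        # A CFI drop >= 0.01 or RMSEA rise >= 0.015 rejects the constraint.
        if (base_cfi - cfi_l) >= d_cfi or (rmsea_l - base_rmsea) >= d_rmsea:
            break
        supported = level
    return supported

# Hypothetical (CFI, RMSEA) per level; the strict model's CFI drops
# by 0.012 from configural, so strict equivalence fails here.
fits = {"configural": (0.950, 0.040),
        "weak":       (0.948, 0.041),
        "strong":     (0.945, 0.045),
        "strict":     (0.938, 0.058)}
```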

Results
A total of 21,865 student-athletes with no missing data elements at pre-injury baseline and 1,537 student-athletes <48 h post-concussion, each across 25 CARE Consortium sites, met inclusion criteria. Both the baseline and post-concussion cohorts had a median age of 18 years, and the majority were male, white, participated in contact sports at the Division I level, and did not have a self-reported concussion history prior to study enrollment (Table 1).

Multimodal assessment battery factor analysis
An initial 10-factor exploratory factor model was conducted on the training baseline dataset with all assessment outcomes included. Insufficient factor loading was observed for the following items, which were thus removed from the model: SCAT: trouble falling asleep, neck pain, nausea or vomiting, sensitivity to light, sensitivity to noise, confusion; BSI-18: "Feeling so restless you couldn't sit still", "Feeling tense or keyed up", "Thoughts of ending your life", "Feeling fearful", "Nervousness or shakiness inside", "Suddenly scared for no reason", "Spells of terror or panic"; BESS: firm and foam double limb stance; SAC: all SAC outcomes. Following exclusion of these variables, a 10-factor model with sufficient loading across all items was established, accounting for 46.9% of the total variance (Figure 1; Supplementary Table 1). The 10-factor model resulted in the following factors with associated explained variance: depression (6.7%), emotional (6.1%), somatic (5.5%), fatigue (4.9%), consciousness clouding (4.8%), vestibulo-ocular (3.9%), cognitive (3.8%), headache (3.8%), neurocognition (3.7%), and postural stability (3.6%). Individual item uniqueness and Hoffman's index of complexity are presented in Supplementary Table 1.
Confirmatory factor analyses were conducted to compare the 10-factor baseline model derived from the training dataset to the baseline testing dataset and the post-concussion dataset (Table 2). No meaningful differences in the 10-factor model fit parameters were observed between the baseline training and testing datasets (ΔRMSEA = 0.002, ΔTLI = -0.003). A potentially meaningful fit difference between the baseline and post-concussion datasets was observed based on the CFI threshold being crossed (ΔCFI = -0.011), demonstrating a marginal 1.1% worse fit at post-concussion relative to baseline.

Measurement equivalence across factors
Measurement equivalence via stepwise model constraints across sex, sport contact type, and concussion history at baseline and post-concussion separately, and between these timepoints, is presented in Table 3. Overall, strong measurement equivalence was observed for sex and sport contact type at baseline, and between the two timepoints. Strict measurement equivalence was observed at baseline for concussion history, and at post-concussion for sex, sport contact type, and concussion history. No comparisons resulted in measurement equivalence worse than strong (Table 3).

VOMS subset factor analysis
An initial 11-factor exploratory factor model was applied to the subset of baseline data with the VOMS also completed (n = 5,530 of the original sample). Insufficient factor loading was present among the same aforementioned items, along with the BSI-18 "Feeling of worthlessness" item. The 11-factor model displayed sufficient loading across all items and accounted for 54.2% of the total variance (RMSEA = 0.033, 90% CI: 0.032-0.035; TLI = 0.963). Confirmatory factor analysis compared a VOMS-included 11-factor model (variance explained = 53.3%) to a VOMS-included 10-factor model, with the two models not fitting differently from each other (11 vs. 10: ΔCFI = 0.009, ΔRMSEA = -0.004) and both models demonstrating all VOMS outcomes loading onto a single factor. Thus, the 10-factor VOMS-included model was used due to similar fit coupled with the reduced factor quantity (Supplementary Figure 3). The 10-factor VOMS-included model factors and associated explained variance were overall similar to the original 10-factor model (Figure 1), with the exception of the original consciousness clouding and fatigue factors merging into one (fatigue-clouding; 5.2% variance explained) and the VOMS outcomes loading as one distinct factor (symptom provocation; 16.0% variance explained).

Discussion
Our study comprehensively analyzed components of the multifactorial concussion assessment battery to understand overarching and cross-assessment latent factors at play among healthy and concussed collegiate student-athletes. Our findings expand upon prior work by Kissinger-Knox et al. (2021) by identifying and confirming a robust 10-factor latent factor model consisting of: depression, somatic, vestibulo-ocular, headache, postural stability, neurocognition, emotional, fatigue, cognitive, and consciousness clouding. Further, measurement equivalence was determined across time and between sex, sport contact classification, and concussion history. Our findings ultimately provide awareness of the heterogeneity of concussion and of the multifactorial factor structure and potential redundancy at play when assessing collegiate student-athletes, and provide a framework for future research to consider reducing overlap in assessment items without compromising diagnostic accuracy. Important points for consideration were noted when interpreting the 10-factor model structure. The 10-factor model resulted in numerous items across assessments being excluded. Specifically, six items from the SCAT symptom checklist, seven items from the BSI-18, two items from the BESS, and all items from the SAC were not included in the model. Item exclusion occurred because our a priori loading guidelines were not met, with one reason being that the specific item did not explain any additional, unique variance beyond that already accounted for by other assessment outcomes.
Clinicians may therefore be able to forego these excluded assessment items when the others are used, as they do not assess the observed 10-factor constructs. Further, moderate to high correlations were observed between and within latent factors derived from the BSI-18 and SCAT (Figure 1). Therefore, unique factor structures are present, but with varying inter-factor relationships worth considering among the 10-factor model. Prior work has identified that some symptom presentations are dependent upon others presenting in collegiate athletes (Chandran et al., 2022), and thus the inter-factor correlations further highlight the symptom presentation intricacies and synchronies at play. Though overlap existed among the subjective BSI-18 and SCAT items, none of their derived factors correlated with the postural stability or neurocognition constructs derived from the BESS and ImPACT, respectively. Thus, our findings further highlight that subjective and objective function are not aligned, and clinicians should continue providing thorough evaluations comprised of both elements.
Commonalities in factor constructs are present when comparing applicable components of our 10-factor model to a prior factor analysis model (Brett et al., 2020). Brett et al. used data from the CARE Consortium between 2014 and 2018 to conduct exploratory and confirmatory models solely on the SCAT symptom checklist, observing a 7-factor, bifactor model comprised of general symptoms, headache, vestibulo-ocular, sensory, cognitive, fatigue, and emotional symptoms (Brett et al., 2020). Our present study took a further step by examining the factor structure of multiple assessments in the CARE Consortium from 2014 to 2020. Five of the six SCAT symptom-based factors between the prior work (Brett et al., 2020) and our present study are alike in the underlying items contributing to each factor, with the exception of the symptoms we removed from our model due to insufficient loading. Thus, current evidence using large-sample data across numerous institutions indicates that examining SCAT symptoms within these domains, along with the other assessments in the multifactorial concussion battery, helps provide a holistic examination for collegiate student-athletes.
Determining measurement equivalence is an important analytic procedure to understand whether an assessment measures the same construct between groups or across time, and thus informs clinicians whether examining direct underlying factor differences is an acceptable practice (Liu et al., 2017). Prior work from Brett et al. also examined measurement equivalence, and our present study expands on this work by examining the multifactorial concussion assessment battery. Interestingly, Brett et al. did not observe measurement equivalence over time, which differs from our present findings. We identified strong measurement equivalence for the 10-factor model between pre-injury baseline and <48-h post-concussion, along with strong to strict measurement equivalence for sex, sport contact classification, and concussion history at both timepoints separately (Table 3). Differences between the prior work (Brett et al., 2020) and ours are likely attributable to 1) our modeling considering more assessment outcomes, and thus constructs, than the prior work that only examined the SCAT checklist, or 2) our exclusion of items that did not meet our a priori loading criteria. Our cumulative measurement equivalence findings indicate that assessment outcome changes among these common factors and between pre- and post-concussion are accounted for by the identified constructs. Thus, our findings provide assurance to clinicians that the same multimodal assessment battery can effectively be used and interpreted before and after a concussion, and between sex, concussion history, or sport contact classification levels.
The exploratory model incorporating the VOMS (Supplementary Figure 3) with the multifactorial assessment battery revealed an overall similar factor structure to the original 10-factor model (Figure 1). The main difference surrounds the VOMS loading as its own unique factor (symptom provocation), resulting in the original fatigue and consciousness clouding factors converging into a single fatigue-clouding factor. Though the VOMS items loaded onto the symptom provocation factor, there was high redundancy among those items, as exemplified by factor loadings ranging from 0.93 to 1.00. The symptom provocation factor had weak correlations with the other latent factors. Our findings and prior work (Kissinger-Knox et al., 2021) therefore indicate that symptom provocation via the VOMS may be a relatively unique latent factor for the multifactorial concussion assessment, but within the symptom provocation factor, the underlying VOMS subscores were heavily redundant with each other at pre-injury baseline. Clinicians can use these findings to incorporate a more succinct VOMS assessment to examine this construct, an approach further supported by a recent study (Ferris et al., 2022) indicating a modified VOMS is sufficient for concussion diagnostics.

Limitations
Our study examined a collegiate athlete sample, and these findings may not generalize to other populations, such as high school athletes or individuals with disorders excluded from analysis (e.g., learning or psychological disorders). All factor models used and reported were limited to the assessments and items used to evaluate them, and a different structure may therefore emerge in the presence or absence of other assessments. As mentioned, the factor labels used throughout the manuscript were assigned by the author team based upon the items within a given factor to succinctly describe the findings observed, which is consistent with factor analysis approaches (Karr & Iverson, 2022; Kissinger-Knox et al., 2021). This is noteworthy because the factor labels should not be interpreted as concrete; rather, the underlying factor structure and loadings of these items are what matter (i.e., the labels could change, but the factor items and loadings would remain). Additionally, the factor analysis items were presumably administered in a time-dependent order, since they are part of existing assessments typically performed in a stepwise order. Therefore, our findings may be biased by inter- and intra-assessment administration order and the associated fatigue and effort effects (Lempke et al., 2022). Further, the models evaluated were developed from the baseline data and then used to evaluate the post-concussion data relative to baseline, which may have biased the overall constructs identified. Lastly, it is important to note that our findings do not consider the effectively loading items, or their latent factors, in the context of previously established assessment diagnostics (Broglio et al., 2019; Garcia et al., 2020). Therefore, researchers and clinicians should not consider altering these assessments until future work can incorporate these latent factors with diagnostic modeling to ensure safe and efficacious patient care.

Conclusions
Our multi-site findings established a 10-factor latent factor model robust to the effects of concussion and common confounding factors, consisting of: depression, somatic, vestibulo-ocular, headache, postural stability, neurocognition, emotional, fatigue, cognitive, and consciousness clouding. Strong to strict measurement equivalence was observed, indicating the same constructs are appropriately assessed across time and between common confounders. Our findings identify the multifactorial factor structure, and the potential redundancy, at play when using concussion assessments among collegiate student-athletes. Clinicians should aim to implement assessments that measure components from each identified latent factor to ensure a comprehensive examination before and after a concussion. Future research should simultaneously determine how removing potentially superfluous items impacts diagnostic accuracy, to provide researchers, clinicians, and ultimately patients with the strongest and most time-efficient multifactorial concussion assessment battery possible.
no. W81XWH-14-2-0151.Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the Department of Defense (Defense Health Program funds).

Figure 1
Figure 1 legend. Boxes represent each individual assessment and the outcome from it. Black arrows with the numeric value above represent the factor loading onto each latent factor. Ovals represent the labeled latent factor with the variance the cumulative factor explained. Connecting lines with color and line type weighting indicate the strength of the between-factor correlations. Numerous outcomes from assessments, and all Standardized Assessment of Concussion (SAC) subscores, did not meaningfully load in the model in the presence of the other included outcomes (i.e., did not explain additional unique variance beyond what was already accounted for), and were therefore excluded. Excluded outcomes from the SCAT were: trouble falling asleep, neck pain, nausea or vomiting, sensitivity to light, sensitivity to noise, and confusion. Excluded BSI-18 outcomes: restless, tense, end life, fearful, nervous, scared, panic. Excluded BESS outcomes: firm and foam double-limb stance scores. Abbreviations: BSI-18 = 18-item Brief Symptom Inventory; SCAT = Sport Concussion Assessment Tool; BESS = Balance Error Scoring System; ImPACT = Immediate Post-Concussion Assessment and Cognitive Test.
Bolded rows indicate the equivalence level determined based upon the criteria statistics, also bolded. Abbreviations: df = degrees of freedom; RMSEA = root mean square error of approximation; CI = confidence interval; TLI = Tucker-Lewis Index; CFI = Comparative Fit Index.

Table 2 .
Confirmatory factor analysis -overall model fit indices.

Table 3 .
Measurement equivalence at baseline and post-concussion for 10-factor model.