Validation of the work-ability support scale in individuals seeking to return to work after severe acquired brain injury

Abstract
Purpose: To assess the reliability and validity of the work-ability support scale (WSS) in a severe traumatic/acquired brain injury (TBI/ABI) population seeking to return to work (RTW).
Materials and methods: One hundred forty-four clients were enrolled in a vocational rehabilitation (VR) intervention trial through the Brain Injury Rehabilitation Program in New South Wales, Australia. Each client's primary brain injury clinician and VR provider completed the WSS pre- and post-intervention. Validating measures assessing dysexecutive behavior, disability, participation, and work instability were also completed. Several aspects of reliability and validity were evaluated.
Results: Internal consistency was excellent for Part A (Cronbach's αs > 0.9) but unacceptably low to questionable for Part B (αs < 0.6). Inter-rater reliability between clinicians and VR providers was generally fair to moderate for Part A (κw < 0.6) and worse for Part B (κw < 0.5), with both improving slightly at post-intervention. Strong support was found for predictive and convergent validity, but not for divergent validity. Confirmatory factor analysis indicated a poor fit for Part A, whereas most Part B fit indices met criteria.
Conclusions: The WSS can play a useful role in assessing RTW potential and in RTW planning and evaluation after severe TBI/ABI. Training could improve the consistency of administration among staff working across the health and VR service sectors.
IMPLICATIONS FOR REHABILITATION
The work-ability support scale (WSS) has potential as a screening tool to assist return to work (RTW) assessment, planning, and evaluation following severe traumatic brain injury and acquired brain injury.
Employment success following an RTW intervention was predicted by the initial WSS Part A total score.
The low inter-rater reliability between brain injury clinicians in health settings and vocational rehabilitation providers suggests that training will be important to improve consistency in WSS administration across service sectors.


Introduction
Return to work (RTW) is a high-priority rehabilitation goal for many individuals recovering from severe traumatic brain injury (TBI) or other forms of acquired brain injury (ABI; e.g., stroke, hypoxic-ischemic injury, brain malignancy). Successful RTW is associated with many positive outcomes [1][2][3][4][5][6]. Despite the importance of RTW, few instruments have been developed to assist with assessing, planning, and evaluating vocational re-entry after ABI and thereby facilitate the vocational rehabilitation (VR) process [7].
Work-ability is broadly defined as an individual's functional capacity to meet the requirements of their job adequately and safely [8]. Assessing work-ability as part of RTW planning is complex, both because VR is multidimensional [9] and because the concept is inherently intertwined with the specific requirements of the job the individual plans to undertake. In their review of the literature on work-ability within the VR context, Fadyl et al. [10] identified six contributing categories: physical, psychological, cognitive, social/behavioral, workplace factors, and factors outside the workplace. They identified and evaluated 10 measures of work-ability but found that none comprehensively assessed all of these categories. In response to the lack of an appropriate work-ability measure, Fadyl and colleagues created the work-ability support scale (WSS).
The WSS was developed through an international collaboration between VR researchers in New Zealand (NZ) and the United Kingdom (UK) [11] and was refined over multiple phases of data collection in an iterative process with input from several key stakeholder groups. The WSS is intended to be completed by a health professional as part of a vocational assessment, to assess a client's ability to work, as well as their support needs within a current or proposed work environment, following acquired disability [12]. The client's current or estimated level of work performance within specific functional domains is rated through direct observation and interviews with employers and/or colleagues. The WSS consists of two parts: Part A comprises 16 items covering three domains of work-related function, and Part B comprises 12 contextual factors.
Psychometric properties of the WSS were initially established by Turner-Stokes et al. [12], who reported on scoring accuracy against an agreed reference standard for a series of case vignettes, as well as on inter-rater and intra-rater reliability for individual occupational therapists and teams of occupational therapists working in community- and hospital-based VR services in NZ and the UK. Scoring accuracy for the case vignettes was acceptable, with intra-class correlation coefficients (ICCs) ranging from 0.95-0.96 for Part A and 0.78-0.84 for Part B. In addition, item-level analysis indicated substantial to almost perfect agreement for all Part A items (weighted kappa (κw) = 0.71-0.94) and for 8 of the 12 Part B items (κw = 0.61-0.91). Inter-rater reliability and one-month intra-rater reliability were similarly high.
Beyond this initial study, investigation of the reliability and validity of the WSS in neurological populations has been very limited. Guo et al. [13] translated and validated a Chinese version of the WSS among community-based young and middle-aged stroke survivors. They found acceptable to excellent internal consistency (Part A: α = 0.93; Part B: α = 0.76), adequate inter-rater reliability (κ > 0.60) for all Part A items and 10 of the 12 Part B items, and acceptable content validity. Confirmatory factor analysis (CFA) of the original three-factor structures of Parts A and B also supported construct validity, after several correlated error terms were added. However, the sample was not actively engaged in VR, and only 11% had successfully resumed work.
To date, the WSS has not been validated in a severe TBI sample. An opportunity to test its psychometric properties arose during a controlled trial evaluating the efficacy of a novel VR program, the Vocational Intervention Program (VIP), in facilitating return to competitive employment for people with severe TBI or ABI. The VIP1.0 trial was conducted in New South Wales (NSW), Australia [14]. The WSS was one of the measures employed, alongside an indicator of VR outcome and other validated measures of impairment and disability.
The VIP involved establishing partnerships between six community-based brain injury rehabilitation services (NSW Health) and three private VR service providers. The partnership model sought to address VR providers' limited knowledge of the needs of people with TBI or ABI and to improve coordination between health-based rehabilitation and private VR services [5,15]. Brain injury-specific resources and mentoring about RTW after brain injury were provided to both the VR providers and the health-based brain injury clinicians to increase their knowledge and skills. Using a three-arm design, the VIP was compared with one existing health-based brain injury VR service (H-VR) and with matched treatment-as-usual (TAU) controls from another five community-based brain injury rehabilitation services (NSW Health), who did not receive a specialized VR intervention.
As part of the trial, the brain injury clinicians and VR providers independently rated participating clients on the WSS pre- and post-VR intervention. This provided an opportunity to test different aspects of reliability, namely (i) internal consistency and (ii) inter-rater reliability between clinicians and VR providers at pre-intervention, as well as possible improvement in inter-rater reliability from pre- to post-intervention. Using the trial's RTW indicator, (iii) predictive validity was tested by examining the association between pre-intervention WSS ratings and successful vocational re-entry post-intervention. The other validated measures provided the opportunity to evaluate (iv) convergent and divergent validity. Finally, (v) construct validity was tested through CFA, examining whether the original three-factor structures [11] adequately fitted the Part A and Part B data.

Sample
This secondary analysis of the VIP1.0 efficacy trial dataset [14] included 144 clients with severe TBI or ABI from all 12 adult sites of the NSW Brain Injury Rehabilitation Program (BIRP) [16]. The study was conducted between July 2015 and September 2018.
The VIP arm trialed the partnership model described above at six BIRP sites in conjunction with three private VR providers, who were appointed by the study sponsor through a tender process. Potential clients for the VIP arm were identified through systematic reviews of service caseloads. Inclusion criteria were: (1) a primary diagnosis of severe TBI (i.e., post-traumatic amnesia (PTA) duration > 24 h) or severe ABI (e.g., stroke, hypoxia, brain malignancy, or infection); (2) an active rehabilitation goal (to return to pre-injury employment, or otherwise being agreeable to undertaking an unpaid work training placement); and (3) independence in self-care and mobility, and living in stable accommodation. Exclusion criteria were: (4) behavior considered inappropriate for a workplace or training environment; and (5) current alcohol/substance use or a mental health disorder that could compromise program engagement.
The first comparator arm (H-VR) was based at a single BIRP site, the Head2Work VR service at the Liverpool Hospital Brain Injury Unit [17]. Clients were a consecutive series of referrals to Head2Work, recruited concurrently with the VIP arm, who also met the above eligibility criteria. The second comparator arm (TAU) comprised clients from the other five BIRP sites who were matched to TBI clients enrolled in the VIP arm on demographic and injury variables. ABI clients in the VIP arm were not matched because of the difficulty of accurately matching injury severity in this population.

Intervention
To help understand the context in which the psychometric properties of the WSS were tested, a brief background to the trial is provided. Clients in the VIP and H-VR arms received individually tailored VR interventions [14]. For those seeking to return to their pre-injury employer, possible interventions included functional and workplace assessments, employer education, development and implementation of graded RTW plans, equipment prescription and strategy development, on-site reviews, and support for the client and employer. This intervention lasted up to 6 months. For those without an identifiable employer, or for whom return to their pre-injury employer was not possible, VR providers sourced a 12-week work training placement as an initial step towards returning to the workforce. Specific activities included completing vocational assessments to establish RTW goals, identifying a suitable employer and negotiating work training placement conditions, providing on-site training and support, addressing equipment and/or training needs, monitoring and upgrading the work training placement, and planning for onward VR. Further details about the VIP can be found in previous reports [14,15].
The main distinction between the H-VR comparator and the VIP arm was that the Head2Work team had specialist experience and worked exclusively with TBI and ABI clients, whereas the private VR providers worked with a much broader range of disability groups. Clients in the TAU comparator group were undertaking outpatient brain injury rehabilitation, varying in scope from case management only to a comprehensive program involving case management, medical management, and multidisciplinary allied health therapy. While some may have received informal support for employment from their case manager, none were offered a formal VR intervention.

Raters
Clients in the VIP arm were rated by both a brain injury clinician and a VR provider, whereas clients in the H-VR and TAU arms, by design, had only a clinician rating. Consequently, psychometric analyses of clinician-rated WSS scores could use a larger subset of the original VIP1.0 sample (comprising clients from all three trial arms), whereas analyses of VR provider ratings involved the VIP arm only (see Figure 1 for a more detailed breakdown).
In total, 15 clinicians from the 12 community rehabilitation units of the NSW BIRP and seven VR providers independently rated clients on the WSS. The brain injury clinicians all had allied health backgrounds and substantial experience working with people with TBI or ABI. The VR providers came from a similar range of occupational backgrounds and were experienced in delivering RTW programs for people with disabilities, but had limited experience working with clients with brain injury. All brain injury clinicians and VR providers participating in the trial completed a 1-hour training session on administering the WSS and the other validating measures.

Work-ability support scale
The WSS Part A comprises 16 items covering three domains as applied to the work context: physical/environmental (items 1-5), thinking/communication (items 6-10), and social/behavioral (items 11-16). Each item is scored from 1 ("completely unable/constant support required") to 7 ("completely independent"), with higher scores indicating a lower need for support and/or greater work productivity. Items can be summed to produce a total score ranging from 16 to 112 and domain scores ranging from 5 to 35 for the physical/environmental and thinking/communication domains and from 6 to 42 for the social/behavioral domain [11,12].
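The Part A scoring rules amount to simple sums over fixed item ranges. The Python sketch below illustrates them; the function name `score_part_a` and the dictionary layout are hypothetical conveniences for illustration, not part of any published WSS scoring software.

```python
from __future__ import annotations

# Hypothetical layout: item numbers 1-16 mapped to the three Part A domains
# described in the text. Names are illustrative, not an official WSS API.
PART_A_DOMAINS = {
    "physical_environmental": range(1, 6),   # items 1-5
    "thinking_communication": range(6, 11),  # items 6-10
    "social_behavioral": range(11, 17),      # items 11-16
}


def score_part_a(items: dict[int, int]) -> dict[str, int]:
    """Sum Part A item ratings (1-7) into domain scores and a total score."""
    if set(items) != set(range(1, 17)):
        raise ValueError("Part A requires ratings for items 1-16")
    for number, rating in items.items():
        if not 1 <= rating <= 7:
            raise ValueError(f"item {number}: rating {rating} is outside 1-7")
    scores = {
        name: sum(items[i] for i in numbers)
        for name, numbers in PART_A_DOMAINS.items()
    }
    scores["total"] = sum(items.values())  # possible range 16-112
    return scores


# Example: modified independence (6 out of 7) on every item.
print(score_part_a({i: 6 for i in range(1, 17)}))
# -> {'physical_environmental': 30, 'thinking_communication': 30, 'social_behavioral': 36, 'total': 96}
```

Checking every rating against the 1-7 range before summing means a data-entry error raises immediately instead of silently shifting a domain score.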
Part B consists of 12 items covering contextual factors that can influence RTW. The items can be grouped into three domains: personal factors (items 1-4), environmental factors within the workplace (items 5-8), and barriers to RTW (items 9-12). Each item is scored on a 3-point scale reflecting the overall effect of the contextual factor (+1 = positive effect, 0 = neutral/unknown effect, −1 = negative effect), with items 9-11 (reflecting different barriers) reverse-scored. Although domain scores can be produced by summing individual items, interpretation at the item level is generally considered more informative, because different contextual factors can affect work-ability in different ways.
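A minimal sketch of Part B scoring follows. It assumes (hypothetically) that responses to the barrier items 9-11 are recorded as +1 for "yes (or probably)", 0 for unknown, and −1 for "no (or probably not)", so that reverse scoring amounts to flipping the sign; how responses are actually captured on the form may differ. Item 12 is not reversed, matching the text, and the function name is illustrative only.

```python
from __future__ import annotations

# Hypothetical layout of Part B items (1-12) into the three domains in the text.
PART_B_DOMAINS = {
    "personal_factors": range(1, 5),       # items 1-4
    "workplace_environment": range(5, 9),  # items 5-8
    "barriers_to_rtw": range(9, 13),       # items 9-12
}
REVERSED_ITEMS = {9, 10, 11}  # barrier items that are reverse-scored


def score_part_b(raw: dict[int, int]) -> tuple[dict[int, int], dict[str, int]]:
    """Return reverse-scored item values and summed domain scores (-4 to +4)."""
    if set(raw) != set(range(1, 13)):
        raise ValueError("Part B requires responses for items 1-12")
    if any(value not in (-1, 0, 1) for value in raw.values()):
        raise ValueError("Part B responses must be -1, 0, or +1")
    # Assumption: flipping the sign of items 9-11 makes +1 mean "positive effect on RTW".
    items = {i: -v if i in REVERSED_ITEMS else v for i, v in raw.items()}
    domains = {
        name: sum(items[i] for i in numbers)
        for name, numbers in PART_B_DOMAINS.items()
    }
    return items, domains


items, domains = score_part_b({i: 1 for i in range(1, 13)})
print(domains)  # -> {'personal_factors': 4, 'workplace_environment': 4, 'barriers_to_rtw': -2}
```

Because item-level interpretation is preferred for Part B, the sketch returns the reverse-scored item values alongside the domain sums rather than only the sums.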

RTW indicator and other validating measures
Demographic and injury data were collected at enrolment. Four validating measures with established reliability and validity in ABI populations [18,19] were also selected, together with an RTW outcome indicator (employed vs. not employed). Three of these measures (the DEX, DRS, and SPRS-2, described below) were selected because they are clinician-rated, brief to complete, and commonly used in the TBI field to measure the relevant validating constructs. The TBI-WIS was selected to also capture client perspectives, as it is the only self-report measure in the field that assesses clients' self-evaluation of their performance in the workplace.
Dysexecutive Questionnaire (DEX; Informant Version) [20] measures behavioral changes and everyday problems associated with frontal lobe dysfunction. It consists of 20 items rated on a 5-point scale (0 = never, 4 = very often) that are summed to produce a total score out of 80, with higher scores indicating greater executive functioning deficits.
Disability Rating Scale (DRS) [21] is an 8-item outcome measure of disability following brain injury, with higher scores indicating greater disability. Each item is rated on an item-specific scale, ranging from 0-3 up to 0-5. For this study, item 8 (the "employability" item) was examined separately from the other items to avoid confounding the DRS total with the primary trial outcome (RTW); excluding it made 26 the maximum possible total score.
Sydney Psychosocial Reintegration Scale-2 (SPRS-2; Clinician Form B) [22] is a 12-item measure of community participation following brain injury. Items are grouped into three domains (Occupation, Relationships, and Independent Living; each with four items), rated on a 5-point scale (0 = very poor, 4 = very good), and summed to produce scores ranging from 0 to 16 for each domain and a total score from 0 to 48, with higher scores reflecting greater levels of participation.
TBI-Work Instability Scale (TBI-WIS) [23] is a 36-item self-report screening tool measuring work instability resulting from a mismatch between functional ability and work tasks. Items cover perceived cognitive and emotional difficulties at work as well as workplace accommodations made by colleagues and employers. Each item has a binary response (1 = true, 0 = not true), and responses are summed to produce a total score ranging from 0 to 36.
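The scoring rules for the four validating measures reduce to simple sums, sketched below in Python. The function names are hypothetical. The SPRS-2 item order (items 1-4 Occupation, 5-8 Relationships, 9-12 Independent Living) is an assumption for illustration, and the DRS helper only drops item 8 as described above; it checks the widest item range (0-5) rather than each item's own maximum.

```python
from __future__ import annotations


def _check(items: list[int], n_items: int, low: int, high: int) -> None:
    """Raise if the item count or any item value is out of range."""
    if len(items) != n_items or any(not low <= x <= high for x in items):
        raise ValueError(f"expected {n_items} items scored {low}-{high}")


def dex_total(items: list[int]) -> int:
    """DEX: 20 items scored 0-4; total 0-80 (higher = greater deficit)."""
    _check(items, 20, 0, 4)
    return sum(items)


def drs_total_without_employability(items: list[int]) -> int:
    """DRS: sum of items 1-7, leaving out item 8 (employability); maximum 26."""
    _check(items, 8, 0, 5)  # only checks the widest item range (0-5)
    return sum(items[:7])


def sprs2_scores(items: list[int]) -> dict[str, int]:
    """SPRS-2: 12 items scored 0-4; three 4-item domains (0-16) and a total (0-48)."""
    _check(items, 12, 0, 4)
    scores = {
        "occupation": sum(items[0:4]),           # assumed items 1-4
        "relationships": sum(items[4:8]),        # assumed items 5-8
        "independent_living": sum(items[8:12]),  # assumed items 9-12
    }
    scores["total"] = sum(items)
    return scores


def tbi_wis_total(responses: list[bool]) -> int:
    """TBI-WIS: 36 true/false items; total = number of 'true' responses (0-36)."""
    if len(responses) != 36:
        raise ValueError("expected 36 responses")
    return sum(1 for r in responses if r)
```

Keeping the DRS employability item out of the sum mirrors the decision described above to analyze it separately from the rest of the instrument.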

Procedure
Ethical approval for the VIP arm of the study was granted by the Northern Sydney Local Health District Human Research Ethics Committee (NSLHD HREC; RESP/15/161), and clients provided written informed consent. Additional ethical approval (NSLHD HREC; RESP/15/188) was subsequently granted to waive consent for the other two arms (H-VR and TAU), as the measures were rated by the primary brain injury clinician on the basis of their clinical knowledge of the client, without the need to contact the client.
Clinician and VR provider ratings on the WSS and other validating measures were collected at two time points (pre-intervention and post-intervention), except for the TBI-WIS, which was completed only by clients enrolled in the VIP arm, at post-intervention. The post-intervention rating occurred up to 6 months after enrolment if the client was attempting to return to their pre-injury employer, or around 12 weeks after enrolment if they were seeking new employment. For TAU clients, ratings were obtained from the clinician at the same time as for their matched TBI client in the VIP arm. Ratings were entered into a secure online Excel database.

Statistical analysis
Data were aggregated into a single file. Statistical analyses were conducted primarily in JMP v15, with SAS® Studio v3.8, SPSS v26, and AMOS v26 used for specific analyses. Statistical significance was set at p < .05 for all relevant comparisons. Unless otherwise specified, analyses focused on pre-intervention, clinician-rated WSS scores to maximize the available sample size and avoid interference from the VR intervention.
Descriptive statistics were generated to characterize the demographic and injury-related profile of the sample. Mann-Whitney U tests and chi-square tests of independence were used to test for differences between the VIP arm and the other two VIP1.0 trial arms pooled together (i.e., H-VR + TAU). Wilcoxon signed-rank tests were used to examine the responsiveness of the WSS, i.e., changes in scores from pre- to post-intervention.
Psychometrics of the WSS Parts A and B were evaluated using several methods in line with the study objectives:
i. Internal consistency of the total and domain scores was assessed using Cronbach's α coefficient.
ii. Inter-rater agreement between the clinician's and the VR provider's ratings of each client in the VIP arm on each of the 16 Part A and 12 Part B items was assessed with Fleiss-Cohen κw [24], with the strength of agreement interpreted according to Landis and Koch [25]: poor < 0.00; slight 0.00-0.20; fair 0.21-0.40; moderate 0.41-0.60; substantial 0.61-0.80; almost perfect 0.81-1.00. Similarly, inter-rater consistency for the Part A total and domain scores was assessed using ICCs (one-way random-effects model, single rater) [26,27], with the strength of agreement interpreted according to Koo and Li [26]: poor < 0.50; moderate 0.50-0.75; good 0.75-0.90; excellent > 0.90. In line with the conventions of previous WSS validation papers [12,13], inter-rater reliability of Part B domain scores was not examined. 95% confidence intervals (CIs) were calculated for each κw and ICC.
iii. Predictive validity was assessed using point-biserial correlations of the pre-intervention WSS Part A total and domain scores with the dichotomous RTW indicator (employed/not employed) at post-intervention.
iv. Convergent and divergent (construct) validity were assessed separately at pre- and post-intervention using Spearman's ρ correlations between the WSS Part A total score and the validating measures. We focused on the relationship of Part A with these measures because they assess cognitive impairment, functional capacity, and participation, which are more closely linked to work-related functions than are the Part B contextual factors. The correlation of the Part A total score with the TBI-WIS total score at post-intervention (for VIP arm clients) was also evaluated.
v. Construct validity was further tested by conducting separate CFAs of the WSS Parts A and B.
For Part A, three models were examined. Model 1 evaluated the original three-factor structure proposed by Fadyl et al. [11], assuming that the three latent variables representing the domain scores were correlated. Model 2 tested the final model reported by Guo et al. [13] in a sample of stroke survivors; it was nearly identical to Model 1, using the same item-factor loadings but additionally allowing correlated errors involving five of the items. Model 3 fitted a three-factor solution derived from an exploratory factor analysis (EFA) of the pre-intervention, clinician-rated WSS dataset (n = 144). This solution involved a subtle change to the item loadings, with items 1-4 loading onto a physical/environmental domain, items 5-8 onto a thinking/cognitive domain, and items 9-16 onto a social/communication/behavior domain.
For Part B, only two models were examined. Model 1 evaluated the original three-factor structure proposed by Fadyl et al. [11], assuming that the three latent variables representing the domain scores were correlated. Model 2 tested the final model reported by Guo et al. [13], which had the same item-factor loadings but additionally allowed correlated errors among four items.
Each model was compared against common criteria of model fit and parsimony, including absolute fit indices: χ2 (overall model fit; nonsignificant), root mean square error of approximation (RMSEA) with 90% CI (<0.08), and standardized root mean square residual (SRMR) (<0.05 good); incremental fit indices: Bentler's comparative fit index (CFI), normed fit index (NFI), and Tucker-Lewis index (TLI) (each >0.90); and parsimony fit indices: Akaike's information criterion (AIC; lower values indicate a better balance of fit and parsimony) [28,29]. For Model 1 (Parts A and B) and Model 3 (Part A), modification indices were inspected to determine whether model fit could be improved by allowing additional item error terms to covary.
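Most of these indices can be computed directly from the χ2 statistics of the fitted model and of the baseline (independence) model. The Python sketch below uses standard textbook formulas; the function and argument names are hypothetical, SRMR is omitted because it requires the residual correlation matrix, and AIC is computed here as χ2 + 2q (q = number of free parameters), one common convention. Software packages differ in details (e.g., N versus N − 1 in the RMSEA denominator), so values may not match AMOS output exactly.

```python
from __future__ import annotations

from math import sqrt


def fit_indices(chi2: float, df: int, chi2_base: float, df_base: int,
                n: int, n_params: int) -> dict[str, float]:
    """Common CFA fit indices from model and baseline (independence) chi-squares."""
    # RMSEA: misfit per degree of freedom, scaled by sample size (N - 1 convention).
    rmsea = sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    # CFI: proportional reduction in noncentrality relative to the baseline model.
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    cfi = 1.0 if d_base == 0 else 1.0 - d_model / d_base
    # TLI: compares the chi-square/df ratios of the baseline and target models.
    tli = (chi2_base / df_base - chi2 / df) / (chi2_base / df_base - 1.0)
    # NFI: proportional reduction in chi-square relative to the baseline model.
    nfi = (chi2_base - chi2) / chi2_base
    # AIC convention used here: chi-square plus twice the number of free parameters.
    aic = chi2 + 2 * n_params
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli, "NFI": nfi, "AIC": aic}


# Hypothetical values for illustration only (not results from this study).
print(fit_indices(chi2=100.0, df=50, chi2_base=1000.0, df_base=66, n=101, n_params=30))
```

Each returned value can then be compared against the cut-offs listed above (e.g., RMSEA < 0.08; CFI, NFI, and TLI > 0.90).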

Sample characteristics and WSS scores
Sample demographic and injury characteristics are displayed in Table 1. The 56 clients from the VIP arm for whom both the clinician and the VR provider rated the WSS generally did not differ from the rest of the sample (who had clinician WSS ratings only), aside from including a higher proportion of ABI relative to TBI cases (25% vs. 10%; p = .02). This was an artefact of the original study design: all clients in the TAU arm had sustained a TBI and were each selected specifically to match a TBI client in the VIP arm, whereas the VIP and H-VR arms each included a sizeable minority of ABI cases.
Ratings on the WSS Part A were close to ceiling (Table 2), with the median pre-intervention clinician-rated total score (94/112) corresponding to modified independence (a score of 6 out of 7) across most items. There was a small further improvement from pre- to post-intervention, with the median total score rising to 102 out of 112 (p < .0001). At the domain level, the median thinking and communication domain score declined slightly from pre- to post-intervention (p < .0001), whereas the other two domain scores improved slightly over the course of the intervention (ps < .002). A similar pattern emerged for VR provider-rated scores, except that the median thinking and communication domain score increased slightly from pre- to post-intervention, in line with the other domain scores (p < .0001).
Differences between clinician and VR provider ratings were greater for Part B (Table 3). Specifically, at pre-intervention, clinicians generally rated personal factors as having a positive impact on RTW more often than environmental factors, although negative ratings were relatively infrequent across these items. There was reasonable concordance between clinicians and VR providers in the frequency of positive ratings for personal factor items. However, for environmental factor items, VR providers gave positive ratings for a higher proportion of clients than clinicians did. For barriers to RTW, the frequency of positive, neutral, and negative ratings was more mixed but relatively consistent across rater types.

Internal consistency
Cronbach's α for the clinician-rated WSS Part A total score was 0.94 at pre-intervention (n = 144) and 0.97 at post-intervention (n = 99), while the corresponding α coefficients for VR provider ratings were 0.93 (n = 56) and 0.95 (n = 47), indicating excellent internal consistency overall. Cronbach's α for clinician ratings of the domain scores ranged from 0.77 to 0.96 across the two time points (physical/environmental: 0.77-0.88; thinking and communication: 0.91-0.96; social/behavioral: 0.94-0.96). Similar α coefficients were obtained for the corresponding VR provider ratings, which ranged from 0.76-0.79, 0.93-0.95, and 0.90-0.92, respectively. Overall, these results reflect acceptable to excellent internal consistency for the domain scores.
Internal consistency was much lower for Part B. Specifically, Cronbach's α for the clinician-rated total score was 0.53 at pre-intervention and 0.60 at post-intervention, compared with −0.09 and 0.42 for VR provider ratings, indicating unacceptably low to questionable internal consistency. There was considerable variability at the domain level, with α coefficients for clinician ratings ranging from 0.08-0.60 for personal factors, 0.68-0.77 for environmental factors within the workplace, and −0.07 to 0.12 for barriers to RTW. The equivalent α coefficients for VR provider ratings were 0.29-0.53, 0.59-0.77, and −1.49 to −1.30. The negative coefficients for the barriers to RTW domain appeared to reflect the strong negative correlation of item 12 (other barriers) with items 9-11.
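Negative α values, such as those reported for the barriers to RTW domain, arise when items covary negatively: the sum of the item variances then exceeds the variance of the total score. The Python sketch below computes α from its standard formula, α = k/(k − 1) × (1 − Σ item variances / total-score variance), and reproduces this behavior with a tiny hypothetical data set; it is an illustration, not the study's analysis code.

```python
from __future__ import annotations

from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (sample variances)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_variances / total_variance)


# Two items that move together give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # -> 1.0
# Hypothetical -1/0/+1 ratings in which the second item runs against the first.
print(cronbach_alpha([[1, -1], [0, 0], [-1, 0]]))  # negative (about -6)
```

Because α is bounded above by 1 but not below, a handful of negatively related items in a short 3-point domain can drive it far below zero, as seen for the VR provider ratings.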

Inter-rater reliability
At pre-intervention, Fleiss-Cohen κw coefficients for the WSS Part A items (Table 4) indicated only fair to moderate agreement on 14 of the 16 item scores, and only item 14 (interpersonal (colleagues)) achieved substantial agreement. There was moderate agreement for the total score (ICC = 0.54) and for the physical/environmental (ICC = 0.53) and social/behavioral domain scores (ICC = 0.55). By contrast, agreement was poor for the thinking and communication domain (ICC = 0.41) (Supporting Information Table S1). Inter-rater reliability was worse for Part B, with poor agreement on 2 of the 12 items, slight agreement on 6 items, fair agreement on 2 items, and moderate agreement on item 4 (personal support) only. A weighted kappa coefficient could not be calculated for item 1 because the VR providers gave all 56 clients a positive rating. Inter-rater reliability was slightly better at post-intervention for both parts of the WSS (Table 5), although it remained less than acceptable. For Part A, the range of item-level κw coefficients remained similar to pre-intervention, but there was a general trend towards slight improvement, with substantial agreement observed on more items (7/16). Reliability was slightly higher, while remaining within the moderate range, for the Part A total score (ICC = 0.67) and the physical/environmental (ICC = 0.65) and social/behavioral domain scores (ICC = 0.64). However, agreement on the thinking and communication domain score was much worse (ICC = 0.09), due in part to the much lower absolute and partial agreement among its constituent items than was seen for the other two domains (Supporting Information Table S1).
Table 3 note. VR: vocational rehabilitation. Domain scores are presented as median (IQR); item scores as n (%). Percentages may not add up to 100% due to rounding. Items 9-11 are presented with reverse scoring applied (i.e., a "yes (or probably)" response coded as −1 and a "no (or probably not)" response coded as +1) to maintain consistency with the scoring of the rest of the scale. Clinician-rated dataset = aggregated VIP + H-VR + TAU arms. VR provider-rated dataset = VIP arm only.
Table 4 note. n = 56 paired ratings from the VIP arm only. Strength of agreement interpreted according to Landis and Koch [25]. κw could not be calculated for item B1 because the VR providers gave all 56 clients a positive rating.
Agreement at post-intervention was also slightly better for Part B, with fair (4 items) or moderate (2 items) agreement observed for half of the items. Of note, the 95% CIs derived from this analysis were wide, often spanning more than 0.3-0.4 for the total and domain scores and more than 0.5 for the item scores.
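The two agreement statistics used above can be sketched as follows. Fleiss-Cohen weights are quadratic, so κw can be written as one minus the ratio of observed to chance-expected squared disagreement; the ICC shown is the one-way random-effects, single-rater form, ICC(1,1) = (MSB − MSW)/(MSB + (k − 1)MSW), where MSB and MSW are the between- and within-client mean squares and k is the number of raters. Function names are hypothetical, confidence intervals are not computed, and this is an illustration rather than the statistical packages used in the study. The kappa helper returns None when chance-expected disagreement is zero (e.g., both raters giving every client the same rating), because κw is then undefined.

```python
from __future__ import annotations

from collections import Counter


def fleiss_cohen_kappa(r1: list[int], r2: list[int], categories: list[int]) -> float | None:
    """Quadratic-weighted (Fleiss-Cohen) kappa for two raters on an ordinal scale."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equally long, non-empty rating lists")
    pos = {c: i for i, c in enumerate(categories)}  # category -> ordinal position
    n = len(r1)
    # Observed squared disagreement, summed over clients.
    observed = sum((pos[a] - pos[b]) ** 2 for a, b in zip(r1, r2))
    # Chance-expected squared disagreement from the two raters' marginal counts.
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum((pos[a] - pos[b]) ** 2 * m1[a] * m2[b] for a in m1 for b in m2)
    if expected == 0:
        return None  # undefined: no disagreement is expected by chance
    # 1 - (observed / n) / (expected / n**2)
    return 1 - observed * n / expected


def icc_oneway_single(ratings: list[list[float]]) -> float:
    """ICC(1,1): one-way random effects, single rater; rows = clients, columns = raters."""
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(map(sum, ratings)) / (n * k)
    client_means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in client_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, client_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    if ms_between + (k - 1) * ms_within == 0:
        raise ValueError("all ratings identical; ICC is undefined")
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


part_a_scale = list(range(1, 8))
print(fleiss_cohen_kappa([5, 6, 7, 6], [5, 6, 7, 6], part_a_scale))  # -> 1.0
print(icc_oneway_single([[90, 94], [100, 98], [80, 85]]))
```

With only 56 paired ratings, sampling error in both statistics is substantial, which is consistent with the wide confidence intervals noted above.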

Predictive validity
Evidence supported the WSS Part A total score as a predictor of RTW outcome at the end of the VR intervention. There was a medium-sized positive correlation between the pre-intervention Part A total score and post-intervention employment status (clinician: rpb = 0.38, p < .0001, n = 107; VR provider: rpb = 0.35, p = .02, n = 47). This was largely underpinned by the thinking and communication (clinician: rpb = 0.38, p < .0001; VR provider: rpb = 0.32, p = .03) and social/behavioral domain scores (clinician: rpb = 0.39, p < .0001; VR provider: rpb = 0.49, p = .0005). By contrast, the small positive association between the clinician-rated physical/environmental domain score and post-intervention employment status was only marginally significant (rpb = 0.19, p = .051), while the corresponding VR provider-rated domain score did not predict employment outcome at all (rpb = 0.07, p = .66).
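The point-biserial correlation is Pearson's r between a continuous score and a 0/1 variable, and can be written as rpb = (M1 − M0)/s × √(pq), where M1 and M0 are the mean scores of employed and not-employed clients, s is the population standard deviation of all scores, and p and q = 1 − p are the group proportions. A minimal Python sketch follows; the function name and toy data are hypothetical, and no p-value is computed.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores: list[float], employed: list[bool]) -> float:
    """Point-biserial r between a continuous score and a dichotomous outcome."""
    yes = [s for s, e in zip(scores, employed) if e]
    no = [s for s, e in zip(scores, employed) if not e]
    if not yes or not no:
        raise ValueError("both outcome groups must be non-empty")
    p = len(yes) / len(scores)  # proportion employed
    return (mean(yes) - mean(no)) / pstdev(scores) * sqrt(p * (1 - p))


# Toy example: higher pre-intervention totals among clients later employed.
print(point_biserial([80, 90, 100, 110], [False, False, True, True]))
```

The group-mean form makes the interpretation explicit: a positive rpb means clients who were later employed had higher pre-intervention WSS scores on average.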

Convergent validity
Pre-intervention WSS Part A total scores (clinician- and VR provider-rated) were negatively correlated with the DEX total score, the DRS employability item, and the DRS total score excluding the employability item (clinician-rated only). They were also positively correlated with the SPRS-2 total score and Occupation domain score (Table 6).
Post-intervention, most of these relationships held and increased in strength, reaching medium to large effect sizes (Table 6). In particular, the VR provider-rated WSS Part A total score was far more strongly correlated with the DRS total score excluding the employability item, reaching statistical significance at this time point. The WSS Part A total score was also negatively correlated with the TBI-WIS total score. Overall, these results provide solid evidence of convergent validity for Part A.
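Because WSS Part A scores cluster near ceiling, tied values are common, and Spearman's ρ is then usually computed as Pearson's r on average (mid-) ranks. The sketch below shows this in plain Python; the function names and data are hypothetical, and significance testing is omitted.

```python
from __future__ import annotations

from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical WSS totals vs. a measure where higher = more impairment.
print(spearman_rho([94, 102, 110, 80], [40, 30, 20, 50]))  # -> -1.0
```

Averaging the ranks of tied scores keeps ρ well defined when many clients share the same near-ceiling total.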

Divergent validity
Unexpectedly, the pre-intervention WSS Part A total score was positively correlated with the SPRS-2 Relationships and Independent Living domain scores. Post-intervention, the correlations between the clinician-rated WSS Part A total score and these two SPRS-2 domains were slightly stronger, as was the relationship between the VR provider-rated WSS Part A total score and the SPRS-2 Independent Living domain. In contrast, the correlation between the VR provider-rated WSS Part A total score and the SPRS-2 Relationships domain was weaker than at pre-intervention and only marginally significant.

Confirmatory factor analysis
Formal evaluation revealed a high level of multicollinearity among WSS Part A items, particularly between items 6-8 and items 13-15, as well as among items 5-7 of Part B (Supporting Information Figure S1). In addition, Part A and Part B item score distributions were significantly negatively skewed (aside from Part B items 5 and 7), with most demonstrating ceiling effects. Overall, there was a significant departure from multivariate normality for both parts (Part A: zCR = 26.44, p < .0001; Part B: zCR = 14.748, p < .0001). To address this violation of assumptions, we initially ran the CFAs on the raw item scores and then repeated the procedure after: (1) log transforming the item scores; (2) using asymptotically distribution-free (ADF) estimation; and (3) bootstrapping for model comparisons. However, none of these techniques substantially improved the goodness of fit of any model. Therefore, for simplicity of interpretation, Table 7 presents the results of the initial approach based on raw item scores.
A visual representation of the pathways and parameter estimates for the Part A and Part B models is presented in Supporting Information Figures S2 and S3. Overall, none of the three Part A models provided an adequate fit to the data. In the best-fitting model (Model 2; Guo et al. [13]), four of the six measures of model fit and parsimony (χ², RMSEA, SRMR, and NFI) fell outside the recommended criteria. Further adjustments based on the modification indices provided by the AMOS software did not significantly improve the fit of any model.
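Table 7 is not reproduced in this section, so the sketch below uses hypothetical numbers only. It shows how two of the indices named above, RMSEA and NFI, are conventionally derived from the fitted model's χ² and the independence (null) model's χ². The helper names are hypothetical; SRMR needs the residual correlation matrix and is omitted, and the cutoffs applied in this study are not restated here.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of the root mean square error of approximation.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); some software uses n
    instead of n - 1 in the denominator.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nfi(chi2_model, chi2_null):
    """Bentler-Bonett normed fit index relative to the independence model."""
    return (chi2_null - chi2_model) / chi2_null


# Hypothetical inputs for illustration only (not values from Table 7)
print(round(rmsea(chi2=300.0, df=87, n=144), 3))  # 0.131
print(round(nfi(chi2_model=300.0, chi2_null=2000.0), 3))  # 0.85
```

Because the RMSEA numerator is χ² minus its degrees of freedom, it rewards parsimony, whereas NFI only compares the model with the independence baseline.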
In contrast, for Part B the original factor structure provided a reasonable fit after correlated error terms were added between items 6 and 8 (Model 1b), with all measures of fit and parsimony except SRMR and NFI falling within recommended criteria. The standardized factor loadings ranged from 0.35 to 1.23 for 9 of the 12 items, with small negative loadings (−0.28 to −0.01) for items 3, 4, and 12 (realistic expectations, personal support, and other factors). Correlations between the three factors were small to moderate (rs = 0.13-0.39).

Discussion
The current study establishes the initial psychometric properties of the WSS in a clinical sample of people with severe TBI or ABI. For Part A, the study found acceptable internal consistency as well as strong support for several forms of validity (i.e., criterion validity and convergent construct validity). However, inter-rater reliability between brain injury clinicians and VR providers was below acceptable levels, although agreement improved by the end of the intervention, and the original three-factor model of the WSS Part A did not provide a good fit to clinician-rated scores obtained before the RTW intervention.
The internal consistency of the WSS Part A total and domain scores was strong for both clinicians and VR providers, consistent with Guo et al. [13]. Internal consistency was lower for the Part B total and domain scores, which was also consistent with the previous report [13]. In contrast to that report, however, the internal consistency of Part B in this study fell below acceptable levels for both rater types. This may be due in part to the use of a three-point scale in Part B, compared with the seven-point scale in Part A.
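For reference, Cronbach's α for one rater's item scores can be computed from the item variances and the variance of the total score. The sketch below assumes a complete-case matrix with respondents in rows and items in columns; the function name is hypothetical and this is an illustration, not the software used in the study.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    Assumes complete data; sample variances use ddof=1.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

The same function can be applied separately to each rater type, to the columns of a single domain, or to all items of a part.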
Inter-rater reliability between clinicians and VR providers was lower than reported in previous studies [12,13]. We propose two factors that could account for this finding. First, the raters in this study were all practicing clinicians rating actual clients, whereas previous studies used research-trained clinical staff [13] or clinicians rating vignettes [12]. Second, raters were paired across service sectors; had raters been paired within the clinician group and within the VR provider group, inter-rater agreement would likely have been stronger.
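The κw statistics compare paired clinician and VR provider ratings of the same client. This section does not state whether linear or quadratic weights were used, so the sketch below exposes both as an option. `weighted_kappa` is a hypothetical helper, and the ordered list of response categories (e.g., the seven Part A response levels) must be supplied by the caller.

```python
import numpy as np


def weighted_kappa(rater1, rater2, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same clients.

    categories: ordered list of all possible scores.
    weights: "linear" or "quadratic" disagreement weights.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    observed = np.zeros((k, k))
    for a, b in zip(rater1, rater2):
        observed[index[a], index[b]] += 1
    observed /= observed.sum()
    # Agreement expected by chance from each rater's marginal distribution
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((k, k))
    w = np.abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximal disagreement
    if weights == "quadratic":
        w = w ** 2
    return 1 - (w * observed).sum() / (w * expected).sum()
```

Partial credit for near-misses is the point of weighting: on an ordinal scale, a one-point disagreement is penalized less than a disagreement spanning the whole scale.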
In this study, criterion validity was established for the WSS for the first time, with pre-intervention Part A scores predicting vocational re-entry at VIP completion. The relationship was driven primarily by the thinking/communication and social/behavioral domains rather than the physical/environmental domain. Previous studies have identified impairments in these domains as predictors of job loss and RTW failure following severe TBI and ABI [30-35].
Strong support was found for the convergent validity of the WSS Part A (Part B was not included in these analyses, as the validating measures more closely matched the content of Part A). The Part A total score was strongly correlated with the SPRS-2 total score, another common measure of participation, as well as with its Occupation domain, which directly measures work participation. Higher Part A total scores were also associated with lower ratings of disability (DRS) and dysexecutive behavior (DEX), reinforcing the link between greater injury severity or challenging behavior and a reduced likelihood of post-injury employment [35]. Furthermore, post-intervention, higher Part A scores were associated with greater work stability (i.e., lower TBI-WIS scores).
Conversely, divergent validity was not supported: the WSS Part A total score was as strongly associated with the SPRS-2 Relationships and Independent Living domain scores as with the Occupation domain and total scores. Closer inspection of the items within these two domains shows that they do in fact correspond with aspects of the work ability model [11,36], particularly the social/behavioral category. In addition, these SPRS-2 items tap better communication and interpersonal skills, higher levels of social support, and independent access to transport, all of which are recognized as distal factors associated with a successful RTW outcome [32,37-39].
The CFA indicated that the three-factor model of the WSS Part A proposed by Fadyl et al. [11] was a poor fit for the current dataset. Additionally, several statistical issues were identified that are likely to have caused model fit to be considerably underestimated. These included multicollinearity, negatively skewed distributions for all item scores, and a less-than-optimal sample size for CFA (n < 200) [40]. The fact that ADF estimation and bootstrapping for model comparison had negligible impact on the fit indices lends further support in this regard. A high level of multicollinearity among some Part A items was found both in this study (items 6-8 and 13-15) and previously by Guo et al. [13] (items 9-10 and 13-15), suggesting some redundancy within the thinking/communication and social/behavioral domains.
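The text does not say which diagnostic underlay the formal evaluation of multicollinearity reported in the Results. One common option is the variance inflation factor (VIF), and the sketch below computes it for every item at once, assuming complete item-score data. The function name is hypothetical, and any flagging threshold (values above 5 or 10 are often cited) is a convention rather than something this study reports.

```python
import numpy as np


def variance_inflation_factors(X):
    """Variance inflation factor for each item (column) of an (n, p) matrix.

    Uses the identity VIF_j = [R^-1]_jj, where R is the inter-item
    correlation matrix; this equals 1 / (1 - R_j^2) from regressing item j
    on all other items.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))
```

Items that are near-duplicates of one another, such as the redundant item clusters described above would be, receive large VIFs, whereas an item unrelated to the rest stays close to 1.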
The major clinical implications of this study relate to the different applications of the WSS and how these are reflected in its psychometric properties. In Guo et al. [13], the WSS was used to measure work ability and support needs among community-dwelling stroke survivors with respect to potential future employment. However, relatively few were actively seeking employment during that study (only 11% were in work; the rest were on sick leave or had retired early). Reflecting this, a greater range of scores was observed across WSS Part A items, with individual item scores averaging 1-2 points lower than in the current study. In contrast, the current sample comprised people actively seeking work or attending work programs, which more closely reflects the setting for which the WSS was originally intended [11]. Within this context, the WSS can help ensure that a comprehensive workplace assessment is conducted. Furthermore, the predictive validity findings support using the WSS to consider work potential in greater depth early in the rehabilitation continuum (e.g., in the inpatient setting) [11]. The relatively poorer psychometric properties of Part B are acknowledged by the WSS authors, who have suggested that the Part B contextual factors are likely always to be vulnerable to varying interpretation and might be better used as a checklist than as a component of the measurement tool [12].
The study had a number of limitations. Because the sample was actively involved in RTW programs, it was not representative of the full spectrum of disability following severe TBI or ABI; had clients with a broader range of injuries been included, the ceiling effects might have been less pronounced. Next, as noted earlier, inter-rater reliability may have been stronger had rater pairs been within rather than across service sectors. Finally, a larger sample might have yielded stronger model fit statistics in the CFA. Future research could further validate the WSS in more diverse rehabilitation populations actively seeking RTW, in countries beyond Australia, New Zealand, the United Kingdom, and China.

Figure 1. Schematic illustrating the subsets of the VIP1.0 trial dataset used for each type of psychometric analysis conducted in this study. Note: DEX: dysexecutive questionnaire; DRS: disability rating scale; H-VR: health-based brain injury vocational rehabilitation service; RTW: return to work; SPRS-2: Sydney psychosocial reintegration scale-2; TAU: treatment-as-usual; TBI-WIS: traumatic brain injury work instability scale; VIP: vocational rehabilitation program; VR: vocational rehabilitation; WSS: work-ability support scale. The aggregated sample (N = 144) comprised 72 VIP, 33 H-VR, and 39 TAU clients with clinician-rated WSS scores at pre-intervention; n = 56 VIP clients were additionally rated on the WSS by their VR provider at pre-intervention.
DEX: dysexecutive questionnaire; DRS: disability rating scale; SPRS-2: Sydney psychosocial reintegration scale-2; VR: vocational rehabilitation; TBI-WIS: traumatic brain injury work instability scale; WSS: work-ability support scale. Convergent and divergent validity were not explored for Part B items, as contextual factors were considered to be less related to cognitive impairment, functional capacity, and participation. Clinician-rated dataset = aggregated VIP + H-VR + TAU arms. VR provider-rated dataset = VIP arm only. ᵃAdministered at the post-intervention time point only.

Table 1. Sample demographic and injury characteristics.
Note. ABI: acquired brain injury; PTA: post-traumatic amnesia; TAFE: technical and further education; TBI: traumatic brain injury; VR: vocational rehabilitation. Percentages may not add up to 100% due to rounding. Clinician-rated dataset = aggregated VIP + H-VR + TAU arms. Clinician + VR provider-rated dataset = VIP arm only.

Table 2. Median (IQR) WSS Part A total, domain, and item scores by rater and time point.

Table 3. WSS Part B domain and item scores by rater and time point.

Table 4. Inter-rater agreement between clinicians and VR providers at pre-intervention.

Table 5. Inter-rater agreement between clinicians and VR providers at post-intervention.

Table 6. Convergent and divergent validity of WSS Part A total score by rater and time point.