Reliability and responsiveness of the Danish version of The Chelsea Critical Care Physical Assessment tool (CPAx)

ABSTRACT Introduction Measurement instruments are important in clinical practice and research for assessing physical function in critically ill patients in the intensive care unit (ICU). Objective To investigate inter-rater reliability and responsiveness of the Danish version of the CPAx (CPAx-D). Method Critically ill patients from three Danish ICUs were included. Patients were assessed with CPAx-D by two blinded testers during a regular physiotherapy session. Follow-up tests were performed in patients who stayed in the ICU for more than 24 hours and were neither transferred to another hospital nor given palliative care. Floor and ceiling effects were examined in all assessments. Results For the reliability analysis, 66 patients were included. There was no significant difference between raters. For the total score, the intraclass correlation coefficient (ICC) was 0.996 (95% CI: 0.993; 0.997), the standard error of measurement was 0.72 points and the minimal detectable change was 2.0 points. The Bland-Altman plot revealed no heteroscedasticity. The responsiveness results of 24 patients showed an effect size of 1.2 and a standardized response mean of 1.1, in accordance with the hypothesis. No ceiling or floor effect was revealed. Conclusion The CPAx-D showed excellent inter-rater reliability and responsiveness.


Introduction
An increasing number of patients are surviving critical illness due to advances in medical care (Graf et al., 2005). However, both the critical illness itself and the iatrogenic effects of its management, such as enforced immobilization, sedation, mechanical ventilation, and physical inactivity, can result in severe and rapid peripheral and respiratory muscle wasting (Latronico and Bolton, 2011;Puthucheary et al., 2013). This is referred to as 'Intensive Care Unit-Acquired Weakness' (ICU-AW). ICU-AW affects around 43% (IQR 9-86%) of critically ill patients (Appleton, Kinsella, and Quasim, 2015;Vanhorebeek, Latronico, and Van den Berghe, 2020) and is linked to the presence of sepsis and multi-organ failure (Fan et al., 2014). The rapid and substantial loss of muscle mass and reduced muscle strength that occurs during the ICU stay can result in prolonged weaning from mechanical ventilation, physical disability, and impaired activities of daily living (ADL) (Herridge et al., 2011;Vanhorebeek, Latronico, and Van den Berghe, 2020;Visser et al., 2002). Early physiotherapy for patients in the ICU is essential to minimize the physical consequences of critical illness (Anekwe, Biswas, Bussières, and Spahija, 2020;Schaller et al., 2016;Schweickert et al., 2009) and improve long-term outcomes and survival (Iwashyna, Ely, Smith, and Langa, 2010;Needham et al., 2012).
Assessing and monitoring physical function is essential for tracking progress, which helps to focus patient care, support the treatment plan and ensure continuity of care from the ICU to the ward (Häggström and Bäckström, 2014;Rosa et al., 2016). Several measurement instruments have been developed to assess and monitor physical function in ICU patients in a standardized way (e.g. Physical Function in ICU Test-scored; Functional Status Score for the ICU; Perme Mobility Scale and The Chelsea Critical Care Physical Assessment tool (CPAx)) (Corner, Soni, Handy, and Brett, 2014;Parry et al., 2015;Perme, Nawa, Winkelman, and Masud, 2014). The CPAx tool is unique in that it incorporates assessment of respiratory function and the ability to cough as well as functional muscle testing, thereby monitoring the effects of ICU-AW on both peripheral and respiratory muscles. These two items separate the CPAx from other ICU-specific measurement instruments (Parry et al., 2015;Parry, Huang, and Needham, 2017).
It is important that measurement instruments have good clinimetric properties such as acceptable reliability and responsiveness. Reliability reflects the consistency of a measurement method (Mokkink et al., 2010c). A component of this is measurement error, which tests how similar the results of the repeated measurements are, and allows quantification of the systematic and random error of a score that is not attributed to true change in the construct to be measured (Mokkink et al., 2010a). Responsiveness is defined as the ability of an instrument to accurately detect change over time (Mokkink et al., 2010b).
The original (English) version of the CPAx has shown good inter-rater reliability (ICC 0.988 to 0.996), validity, responsiveness, and a limited floor and ceiling effect in trauma and general ICU (Corner, Handy, and Brett, 2016;Corner, Soni, Handy, and Brett, 2014;Parry et al., 2015). The CPAx has undergone translation and cross-cultural adaptation from English to Danish including evaluation of face validity of the Danish version of the CPAx (called CPAx-D) (Astrup, Corner, Hansen, and Petersen, 2020). Whether the CPAx-D is reliable and responsive to change remains to be investigated. Therefore, the objective of this study was to evaluate the inter-rater reliability and the responsiveness of the CPAx-D in a population of critically ill patients.

Methods
The study was performed in accordance with COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) (Mokkink et al., 2010a). The study was conducted at the Department of Physiotherapy and Occupational Therapy at Aarhus University Hospital in Denmark. An application for ethical approval was submitted, but approval was considered unnecessary because the study did not involve changes to usual care. The study was approved by the Data Protection Agency (Reference number 681665).

CPAx tool
The CPAx consists of 10 domains (respiratory, cough, moving within the bed, supine to sitting on the edge of bed, dynamic sitting, standing balance, sit to stand, transferring from bed to chair, stepping, and grip strength) which are rated on a 6-point scale from complete dependency (score 0) to independency (score 5) (Corner et al., 2013). The total sum score ranges from 0 to 50, with a higher score indicating a better physical function.
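The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the domain labels are copied from the text, while the function name `cpax_total` and the dictionary layout are hypothetical and not part of the CPAx itself.

```python
# Domain labels follow the list in the text; the data layout is an assumption.
CPAX_DOMAINS = (
    "respiratory", "cough", "moving within the bed",
    "supine to sitting on the edge of bed", "dynamic sitting",
    "standing balance", "sit to stand", "transferring from bed to chair",
    "stepping", "grip strength",
)


def cpax_total(scores):
    """Sum the 10 domain scores (each 0-5) into a total score of 0-50."""
    missing = [d for d in CPAX_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in CPAX_DOMAINS:
        value = scores[domain]
        if not isinstance(value, int) or not 0 <= value <= 5:
            raise ValueError(f"{domain}: score must be an integer from 0 to 5")
    return sum(scores[d] for d in CPAX_DOMAINS)


# Invented example: nine domains scored 3 and grip strength scored 1.
example = {d: 3 for d in CPAX_DOMAINS}
example["grip strength"] = 1
print(cpax_total(example))  # 9 * 3 + 1 = 28
```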

Participants
Critically ill patients were recruited from three different ICUs, representing a large variation in terms of diagnosis. Inclusion criteria were: 1) adult patients (age 18 and above) and 2) patients considered clinically stable and suitable to receive physiotherapy treatment. Exclusion criteria were: 1) acute neurological diagnoses (e.g. Guillain-Barré syndrome, cerebral hemorrhage or other diseases with acute CNS involvement); and 2) patients unable to speak or understand Danish. Patients with acute neurological diagnoses were not included because the original (English) version of the CPAx was validated in ICU patients without acute neurological diseases, other than ICU-AW.
The following demographic data were extracted from the medical records: sex, age, body mass index (BMI), number of comorbidities, use of mobility aid prior to hospitalization, reason for ICU admission, number of days with mechanical ventilation, length of hospital admission before the ICU and the length of the ICU stay.

Raters
The raters were seven physiotherapists who routinely treated patients in the ICU (2-15 years of clinical experience in the ICU). Prior to the study, the raters completed the English E-learning program (Corner, Handy, and Brett, 2016), followed by a short training period to familiarize themselves with the CPAx. The raters were calibrated by assessing at least 13 patients in the ICU with the CPAx-D and discussing the assessments with a CPAx-experienced supervisor. During the process of completing the E-learning course and the calibration period, the CPAx-D underwent a few adjustments for a clearer understanding of the content. These adaptations were approved by the original developer of the CPAx tool, E.J. Corner, before the use of the final version of CPAx-D in this study (Appendix 1). After the calibration period, all seven raters completed two pilot tests in order to practice the standardized reliability test procedure.

Inter-rater reliability
Each patient was assessed on the CPAx-D by two of the seven raters. To do this, the raters observed a physiotherapy session performed by a physiotherapist independent of the project, who guided the patient through all 10 items of the CPAx-D. Meanwhile, the two raters present in the room during the treatment session individually assessed the patient's ability to perform these 10 items on the CPAx-D, without any involvement in the treatment or discussion between raters. Both raters were blinded to the assessment of the other rater. The session lasted for approximately 30-40 minutes.

Responsiveness
Responsiveness was investigated according to the COSMIN guideline (Angst, 2011;de Vet, Bouter, Bezemer, and Beurskens, 2001) using the construct approach. Overall, it seems reasonable to assume that the patients' condition will improve considerably from the point of ICU admission to the point of ward transfer. The study group hypothesized that the change in the total CPAx-D score from early admission to leaving the ICU would show a large effect size (ES) and standardized response mean (SRM) (≥ 0.8) (Cohen, 1988).
For the responsiveness analysis, two assessments were needed, one at baseline and one at follow-up. The baseline assessment was collected at an early stage during ICU admission as part of the inter-rater reliability testing, using the score of one of the raters. The follow-up assessment was completed by one of the two raters who had performed the baseline assessment, before the patient was transferred from the ICU to the general ward or shortly after arriving at the general ward (± one day).
All patients involved in the inter-rater reliability test were eligible for the responsiveness analysis, except patients who: 1) were moved from the ICU to a regular ward within 24 hours after the inter-rater reliability assessment; 2) were moved to the regular ward for terminal or palliative care; 3) were transferred to another hospital before the follow-up test; or 4) died before the follow-up test.

Statistical analysis
A sample size of at least 50 is recommended for inter-rater reliability testing (Mokkink et al., 2010c). Descriptive statistics were used to present the characteristics of the study population. Normally distributed data were described by the mean and standard deviation (SD), otherwise by median and interquartile range or percentage.
The difference in total CPAx-D score between raters 1 and 2 was analyzed with a paired t-test. Reliability of the total CPAx-D score was investigated using the intraclass correlation coefficient (ICC) model 2.1 with 95% confidence intervals (CI), and a quadratic weighted kappa for the 10 items (de Vet, Terwee, Mokkink, and Knol, 2011). ICC and Kappa values between 0.75 and 0.90 were considered to indicate good reliability, and values ≥ 0.90 excellent reliability (Koo and Li, 2016).
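As an illustration of the two reliability statistics named above, the following Python sketch computes a single-rater, absolute-agreement ICC from two-way ANOVA mean squares (the usual reading of "model 2.1") and a quadratic weighted kappa for item scores of 0-5. It is a sketch under stated assumptions, not the Stata procedure used in the study: the function names and all data are invented for the example, and confidence intervals are omitted.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per patient with one score per rater.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                    # between patients
    ms_cols = ss_cols / (k - 1)                    # between raters
    ms_error = ss_error / ((n - 1) * (k - 1))      # residual
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def quadratic_weighted_kappa(rater1, rater2, n_categories=6):
    """Cohen's kappa with quadratic weights for scores 0..n_categories-1."""
    total = len(rater1)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater1, rater2):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]

    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row_totals[i] * col_totals[j] / total
    return 1 - disagreement_observed / disagreement_expected


# Invented total scores (rater 1, rater 2) for five patients.
totals = [[12, 13], [25, 25], [31, 30], [8, 8], [44, 43]]
# Invented scores (0-5) given by the two raters on a single item.
item_r1 = [0, 2, 3, 5, 4, 1]
item_r2 = [0, 2, 4, 5, 4, 1]
print(round(icc_2_1(totals), 3))
print(round(quadratic_weighted_kappa(item_r1, item_r2), 3))
```

The quadratic weights penalize a disagreement of several scale steps far more than a one-step disagreement, which suits ordinal item scores such as those of the CPAx-D.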
Measurement error of the total CPAx-D score was assessed with standard error of measurement (SEM) and minimal detectable change (MDC), and percentage agreement for the 10 items. SEM was calculated as SEM = SD/√2. Next, SEM was converted into MDC (MDC = 1.96 × √2 × SEM). A Bland-Altman plot of the total CPAx-D score was made including 95% limits of agreement (LOA) (de Vet, Terwee, Mokkink, and Knol, 2011).
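The formulas above can be sketched as follows. The text does not state which SD enters the SEM formula; the sketch assumes it is the SD of the paired differences between the two raters, which is consistent with the reported values (LOA of ±2.0 points gives an SD of about 1.02, and 1.02/√2 ≈ 0.72). The function name and the data are invented for illustration.

```python
import math
import statistics


def measurement_error(rater1, rater2):
    """SEM, MDC and Bland-Altman 95% limits of agreement for paired totals.

    Assumption: SD in SEM = SD/sqrt(2) is the SD of the rater differences.
    """
    diffs = [a - b for a, b in zip(rater1, rater2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    sem = sd_diff / math.sqrt(2)
    mdc = 1.96 * math.sqrt(2) * sem
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return {"mean_diff": mean_diff, "sd_diff": sd_diff,
            "sem": sem, "mdc": mdc, "loa": loa}


# Invented total scores from two raters for four patients.
result = measurement_error([10, 20, 30, 40], [11, 19, 31, 39])
print({key: value for key, value in result.items()})
```

Under this assumption the MDC equals 1.96 times the SD of the differences, so it coincides with the half-width of the limits of agreement whenever the mean difference between raters is close to zero.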
Responsiveness was assessed using ES and SRM with values between 0.5 and 0.8 considered moderate and ≥ 0.8 considered large (Cohen, 1988). Responsiveness was evaluated by testing the hypothesis that ES and SRM were ≥ 0.8. Possible floor and ceiling effects were also examined using a 15% cut-off. The significance level (alpha) was set at 0.05. Statistical analyses were conducted with STATA 16.1 software (STATA Corp, College Station).
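A minimal sketch of these responsiveness and floor/ceiling checks is given below. It assumes the conventional definitions, which the text does not spell out: ES as the mean change divided by the SD of the baseline scores, SRM as the mean change divided by the SD of the change scores, and a floor or ceiling effect flagged when more than 15% of patients obtain the lowest (0) or highest (50) total score. Function names and data are invented.

```python
import statistics


def responsiveness(baseline, follow_up):
    """Return (ES, SRM) for paired baseline and follow-up total scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = statistics.mean(change)
    es = mean_change / statistics.stdev(baseline)   # relative to baseline spread
    srm = mean_change / statistics.stdev(change)    # relative to spread of change
    return es, srm


def floor_ceiling(totals, minimum=0, maximum=50, cutoff=0.15):
    """Share of patients at the scale extremes, flagged against the cut-off."""
    n = len(totals)
    floor_share = sum(t == minimum for t in totals) / n
    ceiling_share = sum(t == maximum for t in totals) / n
    return {"floor_share": floor_share, "ceiling_share": ceiling_share,
            "floor_effect": floor_share > cutoff,
            "ceiling_effect": ceiling_share > cutoff}


# Invented scores for three patients.
es, srm = responsiveness([10, 20, 30], [20, 35, 40])
print(round(es, 2), round(srm, 2), es >= 0.8 and srm >= 0.8)
print(floor_ceiling([4, 12, 25, 31, 44]))
```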

Results
A total of 66 patients were included in the reliability study with 24 of these included in the responsiveness assessment. The characteristics of the study population are presented in Table 1. The mean age was 66 years, 65% were men, mean BMI was 27 (SD 5.6), 94% had at least one comorbidity, 68% had 3 or more comorbidities and 32% used a mobility aid prior to hospital admission.

Inter-rater reliability
The range of the total CPAx-D score at baseline was 4-44 points, and the range of the CPAx-D scores among the 24 follow-up tests was 10-49 points. There was no significant difference between raters (p = .81). The ICC was 0.996 (95% CI: 0.993; 0.997), SEM was 0.72 points and MDC was 2.0 points (Table 2). The Bland-Altman plot revealed no signs of heteroscedasticity and LOA were +2.0/-2.0 points (Figure 1). The quadratic weighted kappa for the 10 individual items ranged between 0.914 and 0.995 and the percentage agreement between 97.9% and 99.9% (Table 3).

Responsiveness
The mean difference in CPAx-D score between the baseline and follow-up test was 9.8 points (95% CI 6.2; 13.5) (p < .0001). ES was 1.2 and SRM was 1.1, which was in accordance with the hypothesis that the change in the total CPAx-D score from early admission to leaving the ICU would show a large ES and SRM (≥ 0.8).

Floor and ceiling effect
None of the 66 included patients scored zero or fifty points on the total CPAx-D score at either assessment. This means there was no ceiling effect or floor effect of the total CPAx-D score.

Discussion
The objective of this study was to evaluate the inter-rater reliability and the responsiveness of the CPAx-D in a population of critically ill patients in the ICU. Excellent inter-rater reliability was found both for the total score (ICC = 0.996) and all ten individual items (Kappa = 0.914-0.995). The measurement error in terms of MDC was 2.0 points equal to 8.1% of the mean score of the two raters, which is considered acceptable for individual assessment in CPAx.
Our results are consistent with those found in two other studies investigating reliability. A study of the original CPAx tool demonstrated ICC values ranging from 0.988 to 0.996 (Corner, Handy, and Brett, 2016). However, in that study, the CPAx assessments were based on videotaped sessions. Another study investigating the Swedish CPAx used the same method as in our study and found results comparable to ours (ICC = 0.97 and quadratic weighted Kappa 0.86-0.98), although the quadratic weighted Kappa values in our study were slightly higher than in the Swedish study (Holdar et al., 2021). This difference might be due to a different training and calibration procedure of the raters.
The results of the change score from baseline to follow-up showed an ES of 1.2 and an SRM of 1.1. This result is in accordance with the predefined hypothesis, indicating that the CPAx-D is able to detect a change of the expected magnitude from early in the ICU admission to the time of transfer to a regular ward.
For comparison, a feasibility study investigated the ES of the CPAx in a complex Neuro-rehabilitation Unit (Wilson-Barry, Spencer, and Haworth, 2019), and found an ES of the CPAx of 1.02 which is similar to our result. However, these studies should be compared cautiously due to the difference in patient groups.
The range of the total scores from 4 to 49 points showed that no floor or ceiling effect was present in our population. Furthermore, the range of scores recorded in this study suggests that the full spectrum of the CPAx scores in all 10 domains was used, indicating that the CPAx is sensitive to the full range of function from the weakest, most passive and unstable patients in the ICU to the patients able to mobilize independently. A previous study of floor and ceiling effects of the CPAx in an ICU population described a limited floor effect (3.2%) and ceiling effect (0.8%) (Corner, Soni, Handy, and Brett, 2014), which supports the efficacy of CPAx during the overall ICU admission.

Limitations of the study
The present study has some limitations. First, our results can be generalized only to physiotherapists and not necessarily to other health professionals in the ICU. We included only physiotherapists as raters in this reliability study because the items of the CPAx-D focus on aspects of physical function that are part of the regular assessment and treatment performed by physiotherapists working in the ICU. Second, the sample size for responsiveness was small, including only 24 patients. The remaining 42 patients were excluded from the follow-up assessment in line with the exclusion criteria, i.e., due to transfer to the regular ward within 24 hours after the baseline assessment, transfer to another hospital or death. Nevertheless, baseline characteristics of patients excluded from the follow-up assessment did not differ from those of the patients included in the responsiveness analysis.

Strengths of the study
First, random variability between test scores is often caused by subjective evaluations of the raters. In this study, we attempted to prevent bias and inaccuracy between the raters by having all seven raters complete a training period. This period consisted of taking the English E-learning course, gaining experience with the CPAx-D during a calibration period and finally completing two pilot tests followed by discussion with a supervisor before participating in the reliability test procedure.
These steps were applied to ensure that the raters had the same level of understanding and experience when applying the CPAx-D. The rationale is that these steps should also be applied before implementing the CPAx tool in clinical practice to ensure consistency.
Secondly, the raters were physiotherapists who had ample experience of treating patients in the ICU on a daily basis. This choice was made to reflect usual clinical practice in the ICU setting, where physiotherapists need to be trained and have some clinical experience before treating patients.
Another strength of CPAx is the ease of use, as the assessment can be done as part of the usual physiotherapeutic intervention with the patient. The assessment itself only requires the usual equipment for mobilization and a dynamometer to test the grip strength. Subsequently, it takes less than 5 minutes to complete the CPAx form.

Perspective and further research
The aim is for the CPAx-D to support interdisciplinary goal setting for ICU patients as they reach different milestones toward independent respiratory function, effective coughing and physical independence, and to optimize the written documentation for the benefit of interdisciplinary collaboration.
Having a core set of measurement instruments to assess physical functioning and treatment effect in patients in the ICU as well as during the overall hospital admission is important. Having just one measurement instrument to cover the entire hospitalization period would be ideal but may not be possible because of the large variations in physical functioning from early ICU admission until hospital discharge. The CPAx-D could also be used to explore patient recovery trajectories from the ICU to hospital discharge.

Conclusion
The CPAx-D showed excellent inter-rater reliability and responsiveness. No floor or ceiling effect was present in the study population. This makes CPAx-D suitable for use in any ICU population both in clinical practice and research.