Stroke self-efficacy questionnaire – Denmark (SSEQ-DK): test–retest of the Danish version

ABSTRACT Background In stroke rehabilitation, tools that measure self-efficacy and have sound psychometric properties are needed. The Stroke Self-Efficacy Questionnaire (SSEQ) has recently been translated into Danish and validated (SSEQ-DK). Objectives To evaluate the test–retest reliability of the SSEQ-DK. Methods Fifty people with stroke aged ≥ 18 years in the subacute and chronic phase were included from February 2019 to August 2020. The SSEQ-DK was completed twice: on day 1 and on day 7–14. Test–retest reliability of the single items was assessed using weighted Cohen's kappa and percentage agreement. The activity and self-management scales were assessed by the intraclass correlation coefficient (ICC). Measurement error was assessed by calculating the Smallest Detectable Change (SDC) based on the standard error of measurement. Results Overall, kappa values showed fair to substantial test–retest reliability of the single items. However, several kappa values were missing as the statistical prerequisites were not met. The percentage agreement ranged from 78% to 94%. Based on the reported confidence intervals of the estimated ICCs, the test–retest reliability of the activity and self-management scales was poor to excellent in all analyses. Ceiling effects appeared in the single items. Conversely, no floor effect was seen. Conclusion The SSEQ-DK showed good test–retest reliability of the single items based on agreement among a population with stroke in the subacute and chronic phase. Broad ICC confidence intervals bar any firm conclusions concerning the test–retest reliability of the activity and self-management scales. Trial registration ClinicalTrials.gov NCT03183960. Reg. 15 June 2017.


Introduction
Worldwide, stroke is a major cause of health-related problems, and the number of people living with the consequences of stroke is growing globally due to expanding populations and higher survival rates. 1 The annual number of stroke events in Europe is expected to increase by 34% from 2015 to 2035. 2 The number of people living with post-stroke sequelae is therefore increasing, and many societies are accordingly facing major challenges in stroke rehabilitation. 3 In Denmark, with a population of 5.5 million, the annual incidence of stroke is estimated at 5,297 cases, 2 of which 20-25% result in post-stroke sequelae. 4 Stroke affects people's lives on multiple levels: physical, psychological, cognitive and behavioral. This can reduce the individual's ability to manage life and to maintain the best possible quality of life post-stroke. [5][6][7] Self-efficacy theory forms the basis of many post-stroke self-management programs, as reflected in two recent systematic reviews by Fryer et al. and Wray et al. 8,9 The concept of self-efficacy was first described by Bandura and is a psychological construct referring to confidence in one's ability to perform a specific task or specific behavior. 10,11 Self-efficacy has been found to predict both quality of life and disablement in people post-stroke. 12 This is supported by a systematic review by Jones and Riazi, who reported self-efficacy to be positively associated with quality of life, depression, activities of daily living (ADL) and social activities in people after a stroke. 13 Moreover, self-efficacy has been found to be a predictor of mental health, life satisfaction and quality of life for people with other chronic conditions. [14][15][16] In sum, individuals who display higher levels of self-efficacy post-stroke experience and perceive less functional decline and greater control over many important aspects of their lives, and they improve their chances of better and sustained rehabilitation effects. 17
For this reason, self-efficacy is considered to be an important psychological construct in stroke rehabilitation. 17 In this light, monitoring the individual's self-efficacy is an essential part of the rehabilitation process, and measurement tools with sound psychometric properties are needed. The Stroke Self-Efficacy Questionnaire (SSEQ) is, to our knowledge, the only measure developed specifically for the stroke population. The SSEQ has recently been translated into Danish and validated (SSEQ-DK). 18 The aim of this study is to evaluate the test-retest reliability of the SSEQ-DK.

Materials and methods
This study follows the recommendation of the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) for the reliability domain. On this basis, reliability and measurement error are included. Reliability is defined as the proportion of the total variance in the measurement which is due to true differences between patients, while measurement error is defined as the systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured. [19][20][21] In addition, floor and ceiling effects are calculated for the single items. Internal consistency is not included, as it was evaluated in a validity study by Kristensen and Pallesen. 18

Study population and setting
People diagnosed with stroke (brain infarction or brain hemorrhage), fluent in Danish and over the age of 18 years were included from February 2019 to August 2020 from two different wards at a specialized neurorehabilitation hospital, three different municipal outpatient neurorehabilitation centers and one private outpatient neurorehabilitation clinic. To recruit participants, the managers of the involved departments and clinic were contacted to approve participation in the project. Subsequently, clinicians in charge of the rehabilitation received oral and written instruction on the inclusion and exclusion criteria in order to recruit participants for the project. To determine the test-retest reliability of the SSEQ-DK, people with stroke were requested to complete the questionnaire on two occasions. Follow-up was conducted after seven to no more than 14 days, an interval considered appropriate to prevent recall bias. The questionnaires were handed out and collected by the authors and two assistants. The participants were considered medically and clinically stable. On this basis, the participants were assumed to be mentally stable (with respect to confusion and consciousness) in the interval between the two tests. None of the participants were aware of their scores on the first administration prior to the second administration. Potential participants were excluded if the two tests could not be answered under similar test conditions, e.g., at the hospital both times or at home both times. Individuals, including people with severe aphasia, who needed assistance to understand the written instructions and/or to answer the questions were also excluded.

The SSEQ instrument
The Stroke Self-Efficacy Questionnaire (SSEQ) is a stroke-specific measure developed by Jones et al. in 2008 to determine the level of self-efficacy in people with stroke. 22 The 13-item questionnaire is self-administered. Each item is scored on a four-point Likert scale from 0 (Not at all confident) to 3 (Very confident). The answers are based on the individual's belief in his or her personal capability to achieve each item. 23 The SSEQ is divided into two scales, with items 1 to 8 forming an activity scale and items 9 to 13 forming a self-management scale. The activity and self-management scales range from 0-24 and 0-15, respectively. The two subscales are unidimensional and measure two separate constructs. 23 The total score is not evaluated in the present study.
The SSEQ has demonstrated good psychometric properties in the original language, English. 22,23 It also appears to be reliable and valid in Italian, Chinese, Turkish and Portuguese. [24][25][26][27] The Danish adaptation, the Stroke Self-Efficacy Questionnaire - Denmark (SSEQ-DK), showed good face validity and internal consistency for the activity and self-management scales. Furthermore, no floor effects and only minimal ceiling effects appeared. 18 The SSEQ-DK is presented in the supplemental material.
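As a minimal sketch of the scoring described above, the following Python function sums the 13 item responses into the two scale scores. The function name `score_sseq` and the list-based input format are illustrative assumptions and are not part of the SSEQ or SSEQ-DK materials.

```python
from typing import Sequence, Tuple

# Zero-based positions of the items on each scale (hypothetical layout:
# responses are listed in questionnaire order, item 1 first).
ACTIVITY_ITEMS = range(0, 8)          # items 1 to 8
SELF_MANAGEMENT_ITEMS = range(8, 13)  # items 9 to 13


def score_sseq(responses: Sequence[int]) -> Tuple[int, int]:
    """Return (activity score 0-24, self-management score 0-15)."""
    if len(responses) != 13:
        raise ValueError("expected 13 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item is scored from 0 to 3")
    activity = sum(responses[i] for i in ACTIVITY_ITEMS)
    self_management = sum(responses[i] for i in SELF_MANAGEMENT_ITEMS)
    return activity, self_management
```

A respondent answering "Very confident" on every item would obtain the maximum on both scales, and no total score is returned, in line with the present study not evaluating it.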

Statistical analysis
Test-retest reliability was analyzed for the single items and the activity and self-management scales. Descriptive statistics were generated for participants' characteristics and for each item to determine floor and ceiling effects. These effects were considered present if more than 25% of responses were at the lowest or highest possible category within a single item of the first test (arbitrarily chosen). [28][29][30] The single items were presented on an ordinal scale, and the Wilcoxon matched-pairs signed-rank test was used to evaluate differences between test and retest. Test-retest reliability was assessed using weighted Cohen's kappa and was defined as poor (κ = 0.00-0.20), fair (κ = 0.21-0.40), moderate (κ = 0.41-0.60), substantial (κ = 0.61-0.80) and almost perfect (κ = 0.81-1.00). 31 A skewed response distribution makes the kappa statistic unstable, so an item can show a low kappa value despite a high percentage agreement. To avoid flagging such items as problematic, items with κ > 0.60 and/or percentage agreement ≥ 80% were considered to display good reliability (arbitrarily chosen). [32][33][34] To ensure that the statistical prerequisites of the weighted kappa were met, kappa values were not calculated if differences between the test and retest were statistically significant and/or if a cell with zero counts occurred in the main diagonal of the contingency table.
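The single-item checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's STATA code: the function names are hypothetical, linear weights are assumed because the weighting scheme is not stated, and the returned agreement is the weighted observed agreement, which may differ from the percentage agreement reported in the study. The Wilcoxon prerequisite is not implemented here.

```python
from typing import List, Optional, Sequence, Tuple


def floor_ceiling_flags(first_test: Sequence[int], n_categories: int = 4,
                        threshold: float = 0.25) -> Tuple[bool, bool]:
    """Flag a floor/ceiling effect when more than `threshold` of the
    first-test responses fall in the lowest/highest category."""
    n = len(first_test)
    floor_share = sum(1 for r in first_test if r == 0) / n
    ceiling_share = sum(1 for r in first_test if r == n_categories - 1) / n
    return floor_share > threshold, ceiling_share > threshold


def contingency_table(test: Sequence[int], retest: Sequence[int],
                      n_categories: int = 4) -> List[List[int]]:
    """Cross-tabulate test (rows) against retest (columns)."""
    table = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(test, retest):
        table[a][b] += 1
    return table


def weighted_kappa(table: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Return (kappa, weighted observed agreement), or None when a
    main-diagonal cell is empty (the prerequisite named in the text)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    if any(table[i][i] == 0 for i in range(k)):
        return None
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)  # linear weights (assumption)
            observed += weight * table[i][j] / n
            expected += weight * row_share[i] * col_share[j]
    return (observed - expected) / (1 - expected), observed
```

Returning `None` rather than a number mirrors the rule that kappa is simply not reported for items whose contingency table has an empty diagonal cell.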
The activity and self-management scales were converted into continuous variables, and differences between test and retest were calculated. Systematic differences were assessed by paired t-test, and scatters of the differences between test and retest were plotted against the means to indicate whether the differences were related to the level of the activity and self-management scales. Bland-Altman plots with 95% confidence intervals (CI) and 95% limits of agreement (LOA) were created to investigate agreement between test and retest. The reliability of the variables was examined using the intraclass correlation coefficient (ICC). The ICC was calculated using a single-rating, absolute-agreement, two-way random-effects model with the corresponding 95% CI. 32 ICC values were defined as poor (ICC <0.5), moderate (ICC ≥0.5 to <0.75), good (ICC ≥0.75 to <0.9) and excellent (ICC ≥0.90). 32 Measurement errors were estimated by calculating the standard error of measurement (SEM), and SEMs were converted into the Smallest Detectable Change (SDC) (SDC = 1.96 × √2 × SEM). The SDC defines the smallest within-person change that can be interpreted as a 'real' change above the measurement error. 35 All of the analyses mentioned were performed on the pooled sample and on the in- and outpatients separately. Participants with missing item values were excluded from the analysis. STATA 16.1 software (Stata Corp, College Station) was used for statistical analyses.
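To make the scale-level quantities concrete, the sketch below computes a single-rating, absolute-agreement, two-way random-effects ICC from the two-way ANOVA mean squares, then the SEM, the SDC and the Bland-Altman limits of agreement. It is not the study's STATA code. The function name `reliability_summary` is hypothetical, confidence intervals are omitted, and because the text does not state how the SEM was derived, the agreement form (square root of the between-occasion plus residual variance) is assumed.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def reliability_summary(test: Sequence[float],
                        retest: Sequence[float]) -> Dict[str, float]:
    """ICC (absolute agreement, single rating), SEM, SDC and 95% LOA."""
    n, k = len(test), 2  # n participants, k = 2 occasions
    scores = list(test) + list(retest)
    grand = mean(scores)
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    occasion_means = [mean(test), mean(retest)]

    # Two-way ANOVA sums of squares and mean squares.
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_error = max(ss_total - ss_subjects - ss_occasions, 0.0)
    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error
        + k / n * (ms_occasions - ms_error))

    # SEM in its agreement form (assumption): systematic + random error.
    var_occasions = max((ms_occasions - ms_error) / n, 0.0)
    sem = math.sqrt(var_occasions + ms_error)
    sdc = 1.96 * math.sqrt(2) * sem  # formula given in the text

    # Bland-Altman 95% limits of agreement on retest minus test.
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff, sd_diff = mean(diffs), stdev(diffs)
    return {
        "icc": icc, "sem": sem, "sdc": sdc, "mean_diff": mean_diff,
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
    }
```

With the absolute-agreement form, a constant shift between test and retest lowers the ICC and raises the SEM even when participants keep exactly the same rank order, which is why systematic differences matter for the scale-level results.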

Ethics
All participants gave informed consent to participate. The project was reported to the Danish Data Protection Agency (no. 1-10-72-264-16) in accordance with Danish law. Participation was voluntary, and anonymity was preserved.

Participants
During the recruitment period, 68 people with stroke agreed to participate. Thirteen participants were excluded as they needed assistance to understand the written instructions and/or to answer the questions due to poorer cognitive abilities than expected. Among them, three participants had severe aphasia. Furthermore, five participants were excluded due to one or more missing item values. This left 50 participants for the study. Baseline characteristics of the 50 included participants are presented in Table 1.
The mean age of the enrolled participants was 61.78 years (SD 10.93), with 31 being men (62%) and 19 women (38%). Most of the participants had suffered ischemic infarction (78%) and were in the subacute to chronic phase, with days from stroke onset to the first test varying from 2 to 4,748 (mean 785 days). The mean duration between the test and retest was 7.44 days (SD 1.20). A statistically significant difference in stroke duration (p = .00) between in- and outpatients was seen; inpatients had a mean stroke duration of 82 days (SD 213, range 2-1,071), while outpatients had a mean stroke duration of 1,435 days (SD 1,423, range 58-4,748). An overview of the characteristics of the in- and outpatients is presented in the supplemental material. No statistically significant difference was seen between the included participants and the 18 excluded people, except that significantly more included participants were working.

Single items
In the pooled analysis, ceiling effects occurred in 10 questions as 34% to 64% of all responses fell in the upper categories. No floor effects occurred. However, a tendency toward floor effect was seen in one question (Table 3). For each item, weighted kappa and observed agreement were calculated if applicable. Assessments revealed significant differences between test and retest in three questions, and cells with zero counts occurred in the main diagonal of the contingency table in two questions. As the statistical prerequisites of the weighted kappa were not met in these five questions, it was not possible to calculate kappa values. The remaining eight kappa values of test-retest reliability showed "moderate to substantial reliability" with kappa values ranging from 0.47 to 0.70, of which four values were below the acceptable value (0.60). Agreement ranged from 83% to 89%; thus, all items were above the acceptable level of 80%. Based on this agreement, the test-retest reliability was considered "good." An overview is presented in Table 2.
Analysis of the in- and outpatients revealed significant differences between test and retest among inpatients in two questions, and cells with zero counts occurred in the main diagonal of the contingency table in three questions. The remaining eight questions showed "fair to substantial reliability" with kappa values ranging from 0.40 to 0.78, of which five values were below the acceptable value (0.60). The agreement was 83% to 92% in eleven questions. The remaining two questions were just below the cutoff value (80%). Overall, the agreement showed "good reliability." Regarding the outpatients, no significant differences between test and retest were revealed. However, cells with zero counts occurred in the main diagonal of the contingency table in two questions. The remaining eleven questions showed "fair to substantial agreement" with kappa values ranging from 0.35 to 0.72, of which six values were below the acceptable value (0.60). Agreement ranged from 83% to 94%, which suggests that the reliability can also be considered "good" for the outpatients. An overview of the results is presented in Table 2.

Activity and self-management scales
For both the activity and self-management scales, paired t-test revealed significant differences between the test and the retest in the pooled analysis (p = .01). Differences between test and retest plotted against the mean of the two tests are visualized in Figure 1 and Table 3.
In the analysis of the in- and outpatients, significant differences were found among inpatients on the activity scale (p = .02). Furthermore, the p-value on the self-management scale (p = .08) was close to the cutoff p-value (0.05). No significant differences were found among the outpatients. However, the p-value for the self-management scale (p = .09) was also close to the cutoff p-value. Differences between test and retest plotted against the mean of the two tests are visualized in Figure 1 and showed no systematic bias between the test and the retest. Based on the confidence intervals of the ICCs, the test-retest reliability for the activity and self-management scales was also considered "poor to excellent" when the in- and outpatients were analyzed. The results from the analysis are presented in Table 3.

Discussion
This study aimed to investigate the test-retest reliability of the Danish version of the SSEQ (SSEQ-DK). The test-retest reliability of the single items was good, as agreement was above or close to the acceptable level of 80% for all items. The kappa values showed fair to substantial test-retest reliability. However, several kappa values were missing due to significant differences between the test and the retest and/or a skewed response distribution. Based on the reported confidence intervals of the estimated ICCs, the test-retest reliability for the activity and self-management scales was poor to excellent.
In the present study, single-item ceiling effects were most pronounced on the activity scale among outpatients. This could indicate that no categories challenged those with stroke who are independent of physical assistance. This is in line with the original study by Jones et al. (2008), where a ceiling effect was seen in participants with a high degree of independence in ADL and mobility. 22 Other studies have also highlighted a ceiling effect in the SSEQ. 18,26,27 However, ceiling effects occurred among both the in- and outpatients on both the activity and self-management scales, which could be explained by impaired judgment or unrealistic beliefs about one's own ability after stroke due to lack of self-awareness. [36][37][38]
The broad confidence intervals of the ICCs could be explained by heterogeneity between the participants, as the consequences of stroke vary in terms of difficulty, severity and the impact on the individual's activities and participation. 39 Therefore, confidence in one's ability to perform a specific task or specific behavior could also be expected to vary. The confidence intervals reported in the present study cannot be compared with those of other SSEQ test-retest studies, which either did not calculate ICCs or did not report ICC confidence intervals. Regardless, no firm conclusions can be drawn from these results.
The level of the estimated SDCs from the pooled analysis implies that 5.48 points on the activity scale and 3.40 points on the self-management scale are needed to detect a 'true' within-person change. It is slightly higher for inpatients (6.63 and 3.97, respectively) and slightly lower for outpatients (4.00 and 2.84, respectively). The relatively large SDCs combined with the tendency toward ceiling effects should be taken into consideration by clinicians who use SSEQ-DK, as it could be difficult to measure change over time.
The results from the analysis of in- and outpatients showed a tendency toward greater progress on the activity scale among inpatients than among outpatients. This difference was not significant, as the estimate for outpatients fell within the confidence interval for inpatients. However, spontaneous recovery due to shorter stroke duration among inpatients could have affected the difference in progress. 40,41 The estimates of the difference on the self-management scales indicate decline and progress among both in- and outpatients. Previous studies have highlighted that self-management questions seem to require the ability to think abstractly, which could challenge people with stroke. 18,27 In this light, clinicians should be aware that some people with stroke might need assistance to answer these questions. As a high level of self-efficacy appears to be beneficial for people with stroke, it could be valuable to focus on stroke-specific self-management programs, which seem effective in improving self-efficacy for people with stroke. 6,8,[42][43][44][45] To our knowledge, the SSEQ is the only measure developed specifically for people with stroke. Therefore, the SSEQ could be a useful measure for clinicians if used alongside other measures of functional performance to gain greater insight into the relationship between self-efficacy and other stroke outcomes.

Study limitations
Although serious attempts were made to include people with stroke regardless of the severity of their stroke and the effect it had on their ADL and participation, some participants found it difficult to understand the written instructions or needed assistance to answer the questions and were therefore excluded. This may have reduced the generalizability of the present findings, which may not apply to individuals with stroke who have alexia or severe cognitive impairment. However, such individuals could constitute an especially vulnerable group for whom rehabilitation is particularly important. As nobody declined the invitation to participate, an inherent selection bias may exist, as we may have included participants who were motivated to participate because they had a friendly relationship with the clinicians in charge of the rehabilitation and/or felt grateful for the rehabilitation they received. We chose to recruit both in- and outpatients for convenience. However, the difference in stroke duration made the two groups quite different in terms of spontaneous recovery. Furthermore, the small sample sizes in both groups decreased the power of the non-pooled estimates, so these findings should be interpreted with caution.

Conclusion
Based on the percentage agreement, the SSEQ-DK showed good test-retest reliability of the single items among a population with stroke in the subacute to chronic phase. The single items of the SSEQ-DK could be used in clinical practice to identify individuals who require more targeted support within a single aspect in order to build self-confidence. Moreover, the single items of the SSEQ-DK could be used in research as secondary outcomes to provide data on whether the participants believe they can perform certain activities. However, further detailed testing of the single items is required. Firm conclusions about the test-retest reliability of the activity and self-management scales cannot be made due to the broad confidence intervals of the ICCs. On this basis, it is not recommended to use the sum scores of the activity and self-management scales in either clinical practice or research contexts until more psychometric studies on the SSEQ-DK have been conducted. Ceiling effects appeared in the single items. Conversely, a floor effect was absent.