Effect assessment in work environment interventions: A methodological reflection

This paper addresses a number of issues for work environment intervention (WEI) researchers in light of the mixed results reported in the literature. If researchers emphasise study quality over intervention quality, reviews that exclude high quality case studies and multifactorial interventions may be vulnerable to 'quality criteria selection bias'. Learning from 'failed' interventions is inhibited by both publication bias and reporting length limits that restrict information on relevant contextual and implementation factors. The authors argue for the need to develop evaluation approaches consistent with the complexity of multifactorial WEIs that: a) are owned by and aimed at the whole organisation; and b) include intervention in early design stages where potential impact is highest. Context variety, complexity and instability in and around organisations suggest that attention might usefully shift from generalisable 'proof of effectiveness' to a more nuanced identification of intervention elements and the situations in which they are more likely to work as intended.

Statement of Relevance: This paper considers ergonomics interventions from the perspectives of what constitutes quality and 'proof'. It points to limitations of traditional experimental intervention designs and argues that the complexity of organisational change, and the need for multifactorial interventions that reach deep into work processes for greater impact, should be recognised.


Introduction
An increasing number of literature reviews on the effectiveness of workplace interventions are being published in scientific journals. Comparisons between reviews sometimes reveal substantial differences in the conclusions drawn (Eklund et al. 2006). In the field of ergonomics, this has been apparent (Kuorinka and Forcier 1995, Westgaard and Winkel 1997, Hansson and Westerholm 2001, Karsh et al. 2001, Cole et al. 2005). Many of the reasons behind these differences seem to lie in how the original papers have been evaluated and thus included in or excluded from the review. The different conclusions are problematic for practitioners who wish to apply scientific findings; this is therefore an issue of high priority. The aim of this discussion paper is to identify and summarise problems and contradictions in evaluating the effects of working environment interventions and to discuss methodological issues that may be contributing to the diverging conclusions published in the literature. The terms working environment (WE) and work environment intervention (WEI) used in this paper are rooted in Scandinavian conceptions of workplace health and safety and are similar to the broad international definition of 'ergonomics' as it applies to workplaces (IEA Council 2000). The aspects of WEI research identified and discussed by the authors include study quality, quality criteria selection bias, publication bias, evaluation strategies, effect sizes, time issues, systems perspectives on intervention, levels of intervention, stability of findings and the need for new approaches to both WEI and WEI research.

Issues in work environment intervention research

Study quality
The quality of WEI research studies has generally been seen as 'low' when compared to medical experimental traditions. van Poppel et al. (1997), for example, found only 11 studies with control groups and none using 'blinding' techniques such as those used in pharmacological studies. While systematic reviews have justified the exclusion of many studies due to 'quality' deficits, many studies exist that have indeed shown positive effects of interventions aimed at musculoskeletal disorder (MSD) prevention. Kilroy and Dockrell (2000), for example, present a typical case study of this kind, with pre-post measures, no control group and follow-up measures taken shortly after the intervention. Studies with more stringent experimental designs, using control groups for example, generally show either modest effects (Morken et al. 2002) or, in an example of a 'high quality' randomised control trial (RCT) of participatory ergonomics, no effect (Haukka et al. 2008). Feuerstein et al. (2004) had a different problem with their control groups: both the intervention group (workplace modifications) and the control group (who received training in stress management) demonstrated significant reductions in reported MSD symptoms even after 12 months. While such reductions might be explained as the Hawthorne effect, it is also possible that both interventions were effective in this case, another 'problem' with the use of control groups in these settings. It is also a well-known phenomenon that interventions may spread spontaneously to a control group (e.g. Tuomivaara et al. 2008). The ethics of using control groups, who are denied an intervention, have also been seen as a barrier to applying classical experimental study designs. Managers concerned with employee welfare may refuse to keep control groups isolated from the interventions (Bell et al. 2008).
Similarly, Karltun (2007) studied an intervention for a total population in one occupational group, where a control group within the population was not isolated for ethical reasons. Volinn (1999), examining the relationship between study quality and intervention effectiveness, points out that the quality of a study design is often inversely related to the reported effect. It seems as if the more rigorous the scientific study design is, the weaker the effect that is found (Griffiths 1999, Karsh et al. 2001). Many of the systematic reviews in the literature aim to determine whether or not ergonomics interventions cause a reduction in occupational injuries. One of the strongest approaches for demonstrating causality of safety interventions comes from the medical experimental tradition emphasising randomisation and control groups (Robson et al. 2001). RCTs, seen as the strongest study design, also suffer from many drawbacks when it comes to intervention evaluation, including vulnerability to differential changes (e.g. drop-out), context specificity (they are not generalisable), experimental uniqueness from 'real' programmes (a 'research' intervention is not transferable), non-applicability to full coverage programmes, high cost, spill-over in the workplace causing contamination and, finally, unsuitability in early stages of intervention development (Rossi et al. 1999). In many workplaces RCTs are problematic (Warming et al. 2008) or even unrealistic, and alternative designs such as before-after trials, either without or (better) with control groups, are more appropriate. The application of individual level randomised control studies is probably not appropriate in many situations, as the selection of individuals for a WEI may have an effect on both the control group individuals as well as those receiving the intervention, thus biasing the study results.
It is also likely that the organisational configuration required for an RCT of ergonomics interventions may actually lead to compromises in the quality of the intervention itself (Volinn 1999). These arguments do not negate the potential utility of experimental approaches in general, but they do serve to question the utility of such design alternatives for the evaluation of WEIs, especially those aiming at change at the organisational level. There seems to be a conflict between calls for high quality multifactorial interventions engaging the whole organisation, which provide the best opportunity for successful outcomes (Westgaard and Winkel 1997, Karsh et al. 2001, Silverstein and Clark 2004, Hartman et al. 2005, Holden et al. 2008, Wilson et al. 2009), and the desire for experimentally rigorous testing. Viable alternatives to experimental approaches include longitudinal and case series designs that can be paired with comprehensive, high quality interventions run with process evaluations (Cole et al. in press).

Quality criteria selection bias
A recent review of the literature suggests that positive WEI results can be achieved with appropriate interventions and many case studies exist demonstrating this (Eklund et al. 2006). If, on the other hand, reviewers apply extensive methodological criteria, such as requiring randomised controls, there will be considerably fewer studies with positive results. If reviews base inclusion criteria exclusively on research study design, including only experimental studies for example, then well-conducted multifactor interventions at the organisational level (the interventions seen to have the highest success potential) will be excluded and experimental studies with limited interventions will be included. Such reviews cannot be expected to demonstrate much effect of WEIs. That reasoning is to some extent supported by evidence that participative interventions against MSDs have been shown to be effective when study quality was rated using 11 methodological strength criteria, of which comparison groups and randomisation were only two (Cole et al. 2005). Similarly, the results of meta-analyses of the available literature vary depending on whether or not 'grey' literature is included in the analysis (McAuley et al. 2000). Eklund et al. (2006) described the potentially biasing effects of inclusion criteria on literature review results as 'quality criteria selection bias'.

Publication bias
Similar to the quality criteria selection bias is the so-called publication bias: the tendency for only successful projects to be a) written up and b) published (Dickersin and Min 1993, Torgerson 2006). This will tend to overestimate the success of intervention approaches and may prevent researchers from learning why some interventions are not successful. Failure of an intervention may relate to deficits in the intervention theory or deficits in how the intervention is executed (Nielsen et al. 2006, Cole et al. in press). This can be difficult to unravel, as the ergonomic effects of a given company strategy (including an ergonomic intervention) can depend on how the strategy is realised and the context in which it is implemented (Dul and Neumann 2009); reporting of implementation and context is therefore important to understanding possible effects. Reports from many different interventions are needed if the impact of a range of contextual and implementation factors is to be understood, and publication bias may be inhibiting progress in this area. Such reporting is constrained both by length limits in journals and by the tendency to report only 'successful' interventions. Publication bias leaves researchers and practitioners unable to learn from intervention deficits and thereby hindered in developing more effective intervention tactics and strategies. Editors, peer reviewers and researchers should be encouraged to explore 'failed' interventions more deeply and to allow publication of such studies.

Evaluation strategies
How can one measure, identify or evaluate the effects of efforts to improve the WE? While this will depend on context, a better result is likely if evaluation is planned at the same time as the intervention (Vedung 1998). Validity of the evaluation will be improved if each stage along the hypothesised chain of effects is considered. For example, a researcher could measure or describe: (1) the WE changes; (2) the individual's immediate experience of the WE changes; (3) the individuals' reactions to the WE changes (e.g. turnover, job changes); (4) the health consequences for the individual (long term); (5) the system effects in terms of safety, productivity or quality.
Each step in this chain provides different insight along the hypothesised effect pathway and can be considered as providing indicators at different time points: leading indicators first, with lagging indicators later in the causal chain (Cole et al. 2003). Any evidence from along the chain can provide insight into the effects of change efforts. Karsh et al. (2001) have argued that, since many high quality studies have demonstrated the link between risk factors and MSDs, it is sufficient for an intervention to demonstrate a reduction in risk factor exposures. From this position, the emphasis is no longer on reinforcing the epidemiological evidence but on understanding the impact of the WEI itself along a proposed chain of effects on the people and systems being studied.

Size of effect
The aetiologically preventable fraction of ill health also poses a challenge for using health outcomes to evaluate interventions. In epidemiological studies of MSDs, it is unusual for any one exposure variable to account for much more than 10% of the injury variance (e.g. Kerr et al. 2001). If the intervention only partially eliminates this factor, then the possible impact that can be expected is quite small. The research implication is that the sample sizes needed to detect this difference become enormous. This problem has also been framed in terms of the 'intensity' of the intervention: the extent to which the intervention makes a real and substantial change in risk factor exposure amongst the workforce population being served (Cole et al. 2003, Wells et al. in press). It has been suggested that at least a 14% reduction in mechanical load is required to achieve any detectable change in MSDs (Lötters and Burdorf 2002). Without sufficient intervention intensity, or very large sample sizes, statistically significant reductions in injuries are unlikely. Proponents of 'macroergonomics' suggest that much larger effects are possible if the intervention aims at the organisational level rather than just selected workstations (Nagamachi 1996, Hendrick and Kleiner 2001, Kleiner 2006, Imada and Carayon 2008).
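The sample-size arithmetic behind this argument can be made concrete with a standard two-proportion power calculation (normal approximation). The incidence figures below are illustrative assumptions chosen by us, not values taken from the studies cited:

```python
# Sketch: per-group sample size for comparing injury incidence in an
# intervention group (p2) against a reference group (p1).
# Illustrative figures only; normal-approximation formula.
from math import ceil, sqrt
from statistics import NormalDist

def required_n(p1, p2, alpha=0.05, power=0.80):
    """Participants needed per group to detect a drop from p1 to p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # power term
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# A low-intensity intervention removing one-tenth of a 10% annual
# incidence (0.10 -> 0.09) demands well over ten thousand people per
# group; a high-intensity intervention halving the risk (0.10 -> 0.05)
# needs only a few hundred.
small_effect = required_n(0.10, 0.09)
large_effect = required_n(0.10, 0.05)
```

The calculation illustrates why, without sufficient intervention intensity, statistically significant injury reductions are unlikely: a small absolute change in incidence pushes the required study population far beyond what most workplaces can supply.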

Time as an issue
While time is known to be a key issue in MSD development (Wells et al. 2007), it also becomes a critical issue in evaluating WEIs. Cancer, for example, can have a 30-year latency. MSDs can also have a long 'incubation' period and it can also take a long time for disorders to resolve. Some MSDs must be considered as chronic and will not be resolved by post hoc preventive measures. If permanent damage has occurred then the best an intervention can achieve is to prevent the injury from worsening and minimise associated disability -even though the existing MSD remains. While the elimination of certain risk factors may be achieved and measured, it is more difficult to demonstrate the effects of these changes on musculoskeletal symptoms or complaints. To measure the prevalence and intensity of MSDs is itself difficult, as there is day-to-day variation for afflicted individuals. To apply these measures as indicators of the effects of an intervention is even more problematic.
A second aspect of time relates to the latency of intervention effects. Some interventions can take a long time before they are fully integrated in the company and start to have an effect on the work system; 3-5 years has been suggested for organisational change efforts (Toulmin and Gustavsen 1996). Thus, long intervention and evaluation periods are needed to identify effects and this may conflict with the limited time-window of available research funding programmes. Furthermore, within this long intervention period, other interventions in the company or economy might overshadow the effects of a WEI or the intervention process itself might become compromised, resulting in programme deficits. Personnel turnover and production changes are other difficulties in longitudinal studies (Bell et al. 2008).
Third, timing also plays a role in the cost and effectiveness of interventions. It is well known in design science that the cost of a change increases dramatically throughout a development project, becoming maximal during the actual implementation phase (Miles and Swift 1998). Alexander (1998) has suggested that, in the case of ergonomics interventions, implementation in already running systems costs 5-10 times more than in the early design phase. These costs limit which elements of the system it is feasible to change. If a risk is related to a core feature of the system that requires extensive retrofitting to correct, then it is less likely to be addressed in an intervention. Thus, both the cost and the potential effectiveness of interventions aimed at retrofitting existing systems will tend to be compromised right from the start. It is worth noting that companies with a vision of 'zero accidents' must build prevention into design, where hazards can be eliminated before employees are exposed to the risks; one cannot achieve 'zero' by working reactively. Studying design stage WEIs, however, poses new problems for evaluation: how can one 'prove' a benefit from decisions made that have no counter-example in reality? Comparison here relies on available models, and 'virtual ergonomics' tools remain an important area for further research and development.

A systems perspective
From a systems theory perspective, both the initial conditions of a complex system and the ongoing changes within that system over time can have pronounced effects on how a given intervention attempt may affect that system (Skyttner 2001, Backström et al. 2002). Empirical reports seem consistent with this view, suggesting that intervention effects can be modified or compromised by macroeconomic changes, management culture and a company's current rationalisation efforts (Bao et al. 1996, Polanyi et al. 2005). Interventions can also be influenced by the more micro issues of normal life events, such as marriage, parenting and the death of personnel engaged in the change process. Relationships between system elements also change over time: strength and spinal load tolerance, for example, change with age and fitness level. Intervention effectiveness may depend heavily on initial conditions, which can be critically determining of the system's response to perturbation, dubbed the 'Butterfly Effect' (Skyttner 2001). Response to intervention attempts may also depend on the sequence of intervention activities in the particular case during implementation, referred to as path dependency. Core features of the company and, hence, the working environment (such as the human resource systems) change very slowly or only infrequently. These changes are often not clearly delineated in time but form an ongoing developmental process in the company. This makes the study of these rare and slow changes very difficult in a fast-moving and dynamic economic context that has been called 'hyper-competitive' (d'Aveni 1994). While isolating the effects of an intervention programme in this context can be exceedingly difficult, this does not mean there is no effect.
Furthermore, reductionist strategies attempting to understand systems by detailed studies of individual intervention components are inappropriate for complex systems with multiple levels of interaction, since there are many uncontrolled variables (Cronbach 1975, Griffiths 1999, Karsh et al. 2001), as is the case with organisational level WEI efforts. The separate system components studied cannot be combined linearly to recreate the whole (Checkland 1985, Skyttner 2001). In fact, organisational components combine to create synergy effects (Eklund and Berggren 2000) and other emergent system behaviours. Thus, studies of multifactor interventions in complex systems create a challenge for evaluation, a challenge that may in part be addressed by the use of a series of intermediate outcomes, each shedding some light on the overall situation (Karsh et al. 2001, Cole et al. 2003).

Levels of intervention
Relatively few WEI studies address MSD prevention at the organisational level; instead they focus on workstation level 'microergonomics' aspects (Hendrick and Kleiner 2001, Wilson et al. 2009), which are easier to study experimentally. While 'macroergonomic' approaches aim at intervening at the organisational level, they are mostly presented as case studies and may be dismissed by those seeking experimental evidence. In contrast, studies in the field of labour economics have used analyses of large-scale, multicompany longitudinal 'panel' datasets to reveal a time lag between investment in employee training and firm-level financial performance, which provides evidence of a causal relationship, in this case between employee training and firm performance (d'Arcimoles 1997, Bassi et al. 2004, Hansson et al. 2004). Applying this approach to WEI research would require developing organisational level indicators of prevention activity, such as an audit tool or score card, which could be correlated with outcomes of interest such as injury rates or firm performance. Interventions at the organisational or societal level represent the most complex types of interventions, with many interacting elements and synergistic interactions whose effects are difficult, if not impossible, to isolate (Ekberg 1994). Such interventions operate at a level of system complexity one order higher than the individual human whose health is to be improved (Skyttner 2001). The more complex the system subjected to intervention is, the more complex the intervention itself needs to be and the more difficult the evaluation will be. Sometimes it is not possible to conduct an experimental evaluation at all. Furthermore, complexity in systems implies that relationships will vary widely between systems (in this case, organisations) and that these relationships are unstable and may shift over time (Skyttner 2001, Backström et al. 2002).
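The panel-data logic described above can be illustrated with a toy calculation. Everything below is hypothetical: the audit score, the firm-level figures and the two-year lag are invented purely to show the mechanics of correlating an organisational-level prevention indicator with an outcome measured some years later.

```python
# Toy sketch: lagged association between a hypothetical prevention
# audit score (year t) and simulated firm injury rates (year t + 2).
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(1)  # reproducible synthetic 'panel' of 200 firms
audit_score_t = [random.uniform(0, 100) for _ in range(200)]
# Simulated injury rate two years later: constructed so that firms with
# a higher prevention score tend to have fewer injuries, plus noise.
injury_rate_t2 = [50 - 0.3 * s + random.gauss(0, 5) for s in audit_score_t]

r_lagged = pearson(audit_score_t, injury_rate_t2)  # negative by construction
```

A real panel analysis would use observed firm data, control for confounders and examine several lags; the sketch only shows why a lagged association across many firms can be informative where experimental control is impossible.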
Under these circumstances, the usefulness of the concept of 'proof of effectiveness' can be questioned, since generalisability is not possible: there is no single 'general' case on which to intervene. It is appropriate, therefore, to question the purpose of the evaluation and consider a shift in focus from 'proof of effectiveness' to the generation of information that can guide further action (Patton 1997). Those outside the intervention must consider the transferability of findings to their own context: what is it about the new case that mirrors the situation in the previous case? Similarly, those inside the intervention need to ensure that the efforts to improve the working environment are adapted to the changes ongoing in the organisation. This research approach is more consistent with research in areas of business practice, such as total quality management, where the emphasis is less on 'proving effectiveness' and more on determining which features of implementation appear to be associated with better performance (Ennals 1999).

Findings are not stable
A tendency observed in the literature is that the reported effects of health promotion interventions have increased over time, while older biomechanical intervention studies show more substantial and positive results than newer ones (Eklund et al. 2006). This might be related to any one of a number of hypotheses. The obvious and severe risks in older workplaces might have been improved, leaving only issues that are more difficult to intervene upon. Risk may have shifted from high-load biomechanical mechanisms to more psychosocial and low-level prolonged loading mechanisms as workplaces developed (Wells et al. 2007). Work rotation has become more common and this spreads risk around the workforce, diluting the effect of a change and reducing variance in the study population. Notions of what constitutes 'science' in reporting may also have changed in recent decades, thus changing the perception of effectiveness. Returning again to systems theory, workplaces are complex systems with dynamic and unstable relationships between system elements that pose moving targets for researchers (Backström et al. 2002). Solutions that work today, therefore, may not be appropriate tomorrow and the design of good working environments may demand ongoing development as the organisation itself adapts to changing social, technical and economic contexts. This may help explain why interventions may work in some contexts but not others, a problem noted by Karsh et al. (2001).

Alternative approaches needed
To focus on evaluation quality rather than intervention quality may be seen as putting the cart before the horse. It risks compromising the uptake of potentially useful ergonomic knowledge into workplaces by creating a negative bias in the literature (Dempsey 2007). Interventions should engage company stakeholders, as leadership and participation are frequently seen as crucial for the success of such programmes (e.g. Cohen et al. 1997, Holden et al. 2008, Vink et al. 2008). This approach is consistent with calls for 'macroergonomic' approaches to WEI (Hendrick and Kleiner 2001, Kleiner 2006, Imada and Carayon 2008, Genaidy et al. 2009) as well as with other organisational development approaches to WEI (Gustavsen et al. 1996, Toulmin and Gustavsen 1996), to organisational learning (Senge 1990, Ekman Philips et al. 2002) and to work system design (Jensen 2002, Broberg 2007, Wilson et al. 2009). Organisational level approaches with direct engagement of company stakeholders also imply a loss of researcher control over the intervention process and compromise the appropriateness of experimental approaches. This built-in conflict, between researcher control over the intervention and user participation and control over the intervention, may partially explain the difficulty in obtaining good quality intervention research at the same time as good quality interventions. Furthermore, the demand for control groups in such interventions may also compromise the organisational change process by excluding and alienating certain stakeholders and disrupting the organisation's dynamics. Insisting on control groups can also pose ethical problems, in that certain parts of the organisation are denied hazard elimination efforts and held back developmentally in order to satisfy researcher demands for experimental methods. This view questions the very idea that scientific evidence should be obtained only through experimentation and calls for alternative forms of investigation.
There remains a need for research on how WEIs aimed at the organisational level can affect the health of individuals in the company (Fishman 1999). Action research provides one example of a non-experimental methodological approach for researchers operating collaboratively with firms to improve organisational processes (Toulmin and Gustavsen 1996, Reason and Bradbury 2001, Ottosson 2003). Unfortunately, here again, it can be difficult to publish such action research projects (Ottosson 2003), and the length of text required to describe comprehensive interventions is not always appreciated in scientific journals more used to shorter descriptions of experimental results. While new evaluation approaches are needed, it appears unlikely that single experimental studies will 'prove' causal effects of WEIs beyond a reasonable doubt. This implies that other aspects suggesting causality, such as the time sequence of effects and consistency across studies, will become more important (cf. Hill 1965). Expectations for consistency of 'successful' results, however, should be considered within the context of organisational change programmes generally, which are reported to fail more often than they succeed (Clegg et al. 2002, Smith 2003). Longitudinal designs that can use reference groups as benchmarks could be helpful. Furthermore, since interventions can be expected to yield different results at different times in different contexts, reports should emphasise rich detail of the context, content and process of the interventions so that crucial judgements can be made as to the similarity, and hence possible transferability, of past cases to current situations and contexts. This represents a move away from the notion of a single 'general' solution towards a 'smorgasbord' of possible intervention approaches that have been seen to work (or not) in different contexts and that could be adapted to a new situation according to the needs of the local stakeholders.

Concluding remarks
From a reductionist perspective demanding traditional experimental methods, there is no certain proof that WEIs have an effect on MSDs. It seems particularly difficult to reduce work-related MSD symptoms once they have begun. There are, however, many case studies in both 'white' and 'grey' literature that report positive effects of these interventions. Thus, interventions in the working environment appear to have positive measurable effects in some cases but not others. Increased reporting of context and process-related information may help improve the understanding of factors affecting WEI effectiveness. There seems to be general agreement that multifactor interventions have better chances of success, while single factor interventions can rarely demonstrate a sizeable impact. Single factor interventions, however, are more amenable to experimental evaluation and this may contribute to a body of 'null effect' experimental research. If experimental study designs do lead to compromises in the quality of the intervention itself, then systematic reviews that focus only on such experimental studies may be subject to quality criteria selection bias. Publication bias and journal paper length limits may also be limiting the reporting and examination of context and process factors influencing WEI success in action research and other case studies.
Conclusions from published literature reviews appear to have changed over time. Reviews of biomechanical interventions suggest that the effects have moderated over time, while the effects of health promotion interventions have increased. It is rare for studies to have both high-quality interventions and high-quality evaluations. Multifactor interventions are much more difficult to evaluate but are widely reported to pose the best opportunity for achieving a substantial improvement in the working environment. There is a need for new investigation strategies suitable for complex systems and for practical, well-conducted multifactor interventions, since these are not usually suitable for evaluation by traditional experimental approaches. There is also a need to move beyond studies of individual employees towards interventions aimed at the organisational level. The long timelines for organisational change in WEI create further methodological challenges. The variation in contexts and the complex dynamics of the systems under study suggest the need to shift focus from a binary general 'proof of effectiveness' to a more nuanced identification of influential intervention elements and the contexts in which they might prove most useful. Further development of new research and intervention approaches is needed in consideration of these problems.