How Well Is Quality Improvement Described in the Perioperative Care Literature? A Systematic Review

Background— Quality improvement (QI) approaches are widely used across health care, but how well they are reported in the academic literature is not clear. A systematic review was conducted to assess the completeness of reporting of QI interventions and techniques in the field of perioperative care. Methods— Searches were conducted using Medline, Scopus, the Cochrane Central Register of Controlled Trials, the Cochrane Effective Practice and Organization of Care database, and PubMed. Two independent reviewers used the Template for Intervention Description and Replication (TIDieR) checklist, which identifies 12 features of interventions that studies should describe (for example, How: the interventions were delivered [e.g., face to face, internet]; When and how much: duration, dose, intensity), to assign scores for each included article. Articles were also scored against a small number of additional criteria relevant to QI. Results— The search identified 16,103 abstracts from databases and 19 from other sources. Following review, full text was obtained for 223 articles, 100 of which met the criteria for inclusion. The mean TIDieR score for completeness of reporting of QI in the perioperative care literature across the 100 included articles was 6.31 (of a maximum 11). More than a third (35%) of the articles scored 5 or lower. Particularly problematic was reporting of fidelity (absent in 74% of articles) and whether any modifications were made to the intervention (absent in 73% of articles).
Conclusions— The standard of reporting of quality interventions and QI techniques in surgery is often suboptimal, making it difficult to determine whether an intervention can be replicated and used to deliver a positive effect in another setting. This suggests a need to explore how reporting practices could be improved.
Health care is increasingly the subject of quality improvement (QI),1 which can be understood as purposeful efforts to make changes that will lead to better patient outcomes, better system performance, and better professional development.2 QI efforts often involve a quality intervention (specific changes to clinical or organizational systems) and a QI technique (a method used to support the implementation of the intervention, such as the Model for Improvement).3 Surgery is a particularly important area for QI. Fourteen record-review studies together indicated that adverse events occurred in 14.4% of 16,424 patients undergoing surgery and that potentially preventable adverse events occurred in 5.2% of them.4 For 3.6% of the 16,424 patients, the consequences were fatal, and for around 10.4%, severe.4 In the United States, adverse events in surgery account for approximately half (48%) of all adverse events in hospitals.5 Given that an estimated 234 million surgical interventions are performed every year worldwide,6 improving quality and safety of surgical care is a global priority.7 Perioperative care, which encompasses care delivered before, during, and after surgery,8 makes an important contribution to the outcomes and experiences of surgery. Systematic reviews of QI efforts in diverse surgical specialties have reported that improvements are possible across the entire perioperative journey.1,9-12 However, as with randomized controlled trials (RCTs) in surgery,13 there are indications that important information may be missing from reports of surgical QI studies.14 This is not a problem unique to surgery: Notwithstanding relevant guidance,15 reporting of QI is often weak, lacking, for example, details of implementation context, potential harm from QI activities, intervention components, and the duration of individual Plan-Do-Study-Act (PDSA) cycles.16,17
One challenge in producing full and explicit accounts of interventions, QI or otherwise, has been the absence of clearly articulated expectations about what should be reported. A welcome recent development, therefore, is the TIDieR (Template for Intervention Description and Replication) checklist,18 which identifies 12 features of interventions that studies should describe. TIDieR is recommended by the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network19 as an extension of the Consolidated Standards of Reporting Trials (CONSORT)20 and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)21 statements to improve reporting across all "evaluative" study designs.
A second challenge in QI reporting is the wide variation in study design used in the field. QI studies can be heuristically distinguished as QI projects, which are characteristically conducted with the primary aim of securing change in a defined service using structured methods, or research and evaluation studies, which are conducted with the primary aim of generating knowledge.22 QI projects and research and evaluation studies may use broadly similar methods (for example, in the data-analysis techniques they apply), but they differ in their aims, assumptions about process, and the nature of the claims that they make about generalizability.
Thus, for both QI projects and research/evaluation studies, the study design used may affect how both quality interventions and QI techniques are reported.23,24 Many study designs used in the improvement field are vulnerable to problems of both internal and external validity (including QI projects and many trial designs, for example) and thus require detailed reporting not just of the "nuts and bolts" of quality interventions and QI techniques, such as what was delivered and by whom, but also of the contextual factors (such as QI team, QI support and capacity, and organization) relevant to implementation.25 Poor reporting frustrates improvement in health care systems: Among other problems, it poses threats to the internal validity of studies (for example, by making it difficult to determine the components and mechanisms of the intervention under study and the relevant aspects of context) and to external validity (that is, the ability to replicate in other settings).26,27 Yet how well QI in surgery is reported is not known. We aimed to assess, using systematic review methods, the completeness of reporting of quality interventions and QI techniques in the perioperative literature, and, in particular, to identify which elements of reporting are most frequently missing.

Protocol
The protocol describing the design of this systematic review was submitted for external peer review28 and was registered with PROSPERO, an international database of prospectively registered systematic reviews in health and social care29 (CRD42014012845).

Eligibility Criteria
In this systematic review, we sought to include the following:
The taxonomy created by Shojania et al. identifies nine QI "strategies," but, as we recognize, those strategies are of different kinds, ranging from reminder systems to financial incentives.31 We therefore found it useful to distinguish between quality interventions and QI techniques. We defined quality interventions as specific changes to clinical or organizational systems. We defined QI techniques as the methods used to support the change, characteristically involving a predefined set of steps. Thus, while a reminder system for hand washing would be classified as a quality intervention, methods such as PDSA cycles, which are intended to support the implementation of the reminder system, would be classified as QI techniques. Accordingly, we modified the Shojania et al. taxonomy31 by classifying the strategies numbered 1 through 9 as examples of quality interventions and those numbered 10 and 11 as QI techniques (Table 1). Of note, the distinction between a quality intervention and a QI technique is not hard and fast; it is a heuristic and is, to some extent, context specific. For example, feedback is listed within the taxonomy as both an intervention and a technique because feedback can be delivered as part of a quality intervention such as a reminder system but can also be delivered as part of a QI technique, such as audit and feedback.
To qualify for inclusion in this systematic review, articles had to report both a quality intervention (strategies 1-9, Table 1) and an associated QI technique (strategies 10 and 11, Table 1). We sought to improve agreement on whether candidate articles met the inclusion criteria by discussing the interventions described in articles with experts in the field and by contacting authors when clarification was needed to classify study design and the type of intervention.37

Search Strategy
In an attempt to design a search that would be sensitive enough to ensure retrieval of all relevant studies and specific enough to ensure that irrelevant articles would be excluded, we adapted a QI search strategy that had previously been used by the Health Foundation in a research scan for literature available as of December 2010 on the concept and practice of improvement science.38 Our search strategy was designed to capture terms relating to (1)

Use of the TIDieR Checklist to Assess Reporting of Quality Interventions
The QI interventions (items 1-9, Table 1) were scored using a modified version of the TIDieR checklist (Table 2), which contains 12 items relating to reporting criteria.39 Item 9 (tailoring: personalization or titration of the intervention) was removed for purposes of this review because the interventions we studied were not titrated for individual patients. This resulted in a modified TIDieR checklist with 11 items, so that the maximum score that could be obtained by any article was 11/11. Scoring was guided by the TIDieR group's explanatory statement,18 which was further clarified through e-mail correspondence with TIDieR's first author (Hoffmann).
Articles were scored as "Yes" for each item that could be assessed as reported in full. If the description was unclear or if no description was given, the article was scored as "No" for that item. For example, when an article clearly described the modifications made to an intervention in a manner judged to be fully explicit, it was rated "Yes" under the TIDieR item "modification" (item 9, column 2, Table 2). An example of an article achieving "Yes" under this criterion described the modifications made as follows: "After multiple trials of various insulin protocols, a simplified high-infusion protocol replaced the low-infusion protocol with intermittent boluses."40(p. 25) Many articles reported on multiple interventions, such as a safety bundle.41 To accurately replicate a multifaceted program, all its components needed to be fully described. Therefore, we scored each article once against each TIDieR item, regardless of the number of interventions.
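The scoring rule described above can be illustrated with a minimal Python sketch. The item labels, the function name score_article, and the example ratings are assumptions introduced here for illustration; they are not taken from the review's data-extraction template.

```python
from typing import Dict

# The 11 items of the modified TIDieR checklist (the original item 9,
# tailoring, is removed). The labels are shorthand chosen for this sketch.
MODIFIED_TIDIER_ITEMS = [
    "brief_name", "why", "what_materials", "what_procedures", "who",
    "how", "where", "when_and_how_much", "modifications",
    "how_well_planned", "how_well_actual",
]


def score_article(ratings: Dict[str, str]) -> int:
    """Return the number of checklist items rated "Yes".

    Any item that is missing from `ratings`, or rated anything other than
    "Yes" (for example, an unclear description), is treated as "No".
    """
    return sum(
        1 for item in MODIFIED_TIDIER_ITEMS
        if ratings.get(item, "No") == "Yes"
    )


# Hypothetical article: name and rationale fully reported,
# modifications described unclearly, nothing else reported.
example_ratings = {"brief_name": "Yes", "why": "Yes", "modifications": "Unclear"}
example_score = score_article(example_ratings)
```

Because each article is scored once per item, a multifaceted program earns a "Yes" on an item only if the reviewer judges that item fully reported for the program as a whole.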

Use of the Checklist of Additional Items (Table 2) to Assess Reporting of QI Techniques
TIDieR was designed to aid assessment of the reporting of interventions. We also required a means of evaluating the reporting of QI techniques. We decided to score the QI techniques (items 10 and 11 in Table 1) in the studies in our sample by using a checklist of relevant items (Table 2). The checklist items were based on the Cochrane EPOC review group's data-collection checklist,42 which had previously been used in systematic reviews to consider reporting features specific to QI measurement.43 We selected relevant items from the EPOC checklist, including, for example, baseline measurement, data-collection schedule, data analysis, missing data, and named outcomes, for our checklist. A further item relating to data volume/duration was added in response to a recent publication by Taylor and colleagues17 on the reporting of PDSA cycles.

Additional Reporting Features Included in the Data-Extraction Template
In addition to the TIDieR checklist and the checklist of QI techniques, we included in our data-extraction template items relating to reporting of patient and public involvement (PPI), adverse events, patient-reported outcomes, and use of the Standards for QUality Improvement Reporting Excellence (SQUIRE) guidelines44 (Table 2). PPI, defined as the incorporation of the knowledge, skills, and experience of patients, caregivers, and the public into a study,45 was included because it is encouraged across all types of surgical interventional studies.46 We defined an adverse event as any unfavorable or unintended sign, symptom, or event associated with the intervention; reporting of such events is important to enable the full understanding of possible benefits and harms of interventions.16,46 The SQUIRE guidelines, which support the quality of reporting of QI studies, are recommended by the EQUATOR Network.

Absent Reporting Features
In view of best-practice recommendations produced by the Centre for Reviews and Dissemination, University of York, we also report what was not reviewed as part of this systematic review. Methodological flaws and risk of bias were not examined because the review did not focus on intervention effect.48

Data Extraction
Data

Study Characteristics
Of the 100 eligible articles, 40 focused on two or more surgical specialties. The remaining 60 articles named a specialty: cardiothoracic (21), colorectal/general (19), musculoskeletal
Study designs were varied (Appendix 2, available in online article). Many articles (65) did not explicitly identify their study design but on inspection were found to be before-and-after studies (a design using data collected at defined time points before and after the introduction of an intervention, also known as the pretest/posttest design).49 Nine studies were labeled as cohort studies50-58 yet did not appear to feature true observational study designs, and one study was mislabeled as a case-control study.59 The United States was the most frequently reported country for study setting (67/100).
The most commonly reported targeted clinical issue for undertaking QI was that of reducing infection (30), followed by improving intraoperative clinical processes (such as reducing "never events") (18) and reducing postoperative complications (such as bleeding and prolonged intubation) (15). The least frequently cited aims were improving the postoperative discharge process (3), improving self-management (3), and reducing the postoperative incidence of venous thromboembolism (1) (Appendix 3, available in online article).

Completeness of Reporting: Quality Interventions and Quality Improvement Techniques
In this section, we report our appraisal of the completeness of reporting of the TIDieR checklist items and QI techniques (Table 2). A full list of all 100 included articles can be found in Appendix 4 (available in online article).

Completeness of Reporting: Quality Interventions (TIDieR)
All articles used a combination of quality interventions, such as introducing a care pathway, providing staff education, changing the timing of ward rounds, and issuing reminders. No specific combination of interventions was used more often than any other. The most commonly reported intervention (classified according to the modified Shojania et al. QI taxonomy31) was education (59% of articles), including any form of teaching and learning, such as workshops. Nine studies provided access to Web links for additional material such as Web-based educational modules. Checklists were reported as quality interventions in 14% of articles; protocols were reported as quality interventions in 43%. More than half (51%) of the studies included feedback as part of the quality intervention.
The distribution of TIDieR scores for the reporting of quality interventions across the 100 articles was approximately bell-shaped, with a slight skew toward higher ratings (Figure 2). The most common (modal) score was 7/11, and the average (arithmetic mean) score was 6.31/11. The TIDieR items most often fully reported were why (complete in 98% of articles), brief name of intervention (complete in 94% of articles), where (complete in 77% of articles), what (procedures) (complete in 69% of articles), and who (complete in 52% of articles) (Figure 3).
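As an illustration of how summary figures of this kind could be derived from per-article totals, the following sketch computes the mean, the modal score or scores, and the share of articles at or below a threshold. The function name summarize_scores and the scores shown are hypothetical; they are not the review's data.

```python
from statistics import mean, multimode
from typing import Dict, List


def summarize_scores(scores: List[int], threshold: int = 5) -> Dict[str, object]:
    """Summarize a list of per-article checklist totals."""
    return {
        # Arithmetic mean, rounded to two decimals as in the text.
        "mean": round(mean(scores), 2),
        # multimode returns every most-common value, so ties are not hidden.
        "modes": multimode(scores),
        # Proportion of articles scoring at or below the threshold.
        "share_at_or_below": sum(s <= threshold for s in scores) / len(scores),
    }


# Hypothetical totals for eight articles (each out of 11).
example_scores = [3, 5, 6, 7, 7, 7, 8, 9]
example_summary = summarize_scores(example_scores)
```

Reporting the mode alongside the mean is useful for a slightly skewed distribution, because the two can differ even when the overall shape is roughly bell-like.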
How well the researchers actually adhered to the intervention protocol and reported intervention fidelity (item 11: how well actual, Table 2) was the most frequently incomplete TIDieR item (Figure 3), absent in 74% of the articles. An example of good reporting of intervention fidelity is provided in Thomassen et al.: "Our checklist was used in 61% of all anaesthesias during the testing period."60(p. 1183) Modifications to interventions were also generally poorly reported (incomplete in 73% of the articles). Other items that were not fully reported in more than half of the included articles were what (materials: any physical or informational materials used in the intervention and details on how they can be accessed) (incomplete in 62% of articles), when and how much (incomplete in 60% of articles), and how well (planned) (incomplete in 53% of articles).
Assessed against the QI technique criteria (Table 2), the most frequently complete items were naming the QI technique (fully reported in 95% of the articles) and outcome measures (86%). The most common incomplete items were the provision of a primary outcome measure (missing in 90% of the articles) and the description of missing data (not complete in 83% of the articles) (Figure 4). These were followed by incomplete reporting of an explicit prediction of change (78%) and data volume (for example, length and number of PDSA cycles) (74%). Just over a third (38%) of articles discussed whether or not the results might be transferable to another setting (Figure 4).
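Item-level figures such as these amount to, for each checklist item, the share of articles in which the item was not rated "Yes." A minimal sketch of that tabulation follows; the function name incomplete_rates, the item labels, and the four example articles are assumptions made for illustration only.

```python
from typing import Dict, List


def incomplete_rates(
    articles: List[Dict[str, str]], items: List[str]
) -> Dict[str, float]:
    """Share of articles in which each item is not fully reported.

    Items are returned in descending order of incompleteness. A missing
    rating, or any rating other than "Yes", counts as incomplete.
    """
    n = len(articles)
    if n == 0:
        raise ValueError("at least one article is required")
    rates = {
        item: sum(1 for ratings in articles if ratings.get(item, "No") != "Yes") / n
        for item in items
    }
    return dict(sorted(rates.items(), key=lambda pair: pair[1], reverse=True))


# Hypothetical ratings for four articles on three checklist items.
example_articles = [
    {"why": "Yes", "modifications": "No", "how_well_actual": "No"},
    {"why": "Yes", "modifications": "Yes", "how_well_actual": "No"},
    {"why": "Yes", "modifications": "No", "how_well_actual": "No"},
    {"why": "No", "modifications": "No", "how_well_actual": "Yes"},
]
example_rates = incomplete_rates(
    example_articles, ["why", "modifications", "how_well_actual"]
)
```

Sorting by incompleteness puts the weakest-reported items first, which is the ordering used when naming the most frequently missing elements.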

Discussion
Adequate reporting and methodology are required to enhance the contribution that QI studies could make to improving care and reducing harm15,38 for the millions of patients undergoing surgery each year.69,70 Full descriptions are important to determine whether an intervention can be replicated and used to deliver a positive effect in a new setting, as well as what resources are required and how they should be allocated, and, ultimately, to ensure that patients benefit.18,39 Our systematic review has demonstrated that the reporting of QI in the perioperative care literature is suboptimal (Figure 2), with important details often lacking. More than a third (35%) of the articles scored ≤ 5 out of a maximum of 11 (Figure 2) on completeness of intervention reporting. The poor quality of reporting of QI studies identified here is likely to lead to frustration for interested readers.
Complete reporting is necessary to ascertain whether an intervention can be replicated, but it has another equally important function, which is that of informing decisions about whether an intervention should be replicated.71 When the results of QI studies are compelling and interesting, interventions must be reported in a way that allows recognition of all of their strengths and weaknesses. Explicit description will help the reader to understand how much the intervention might contribute toward changing practice for the better across many settings and the notable caveats. It was therefore particularly disappointing that we found only one study reporting adverse events resulting from applying the QI intervention.
Incomplete reporting is, among many other problems, also implicated in research waste. Studies that are not fully reported can necessitate additional or futile research that would not be required if the full findings were known. The drive to reduce waste has already been embraced in surgery with initiatives such as the "restoring invisible and abandoned trials" (RIAT) initiative,72 which encourages the publication of all research outcomes, and the IDEAL collaborative,23 which encourages the publication of accurate and transparent intervention development with the aim of avoiding waste through suboptimal reporting or distortion.
Although our aim in this review was to assess the completeness of reporting of QI, we also identified problems in the reporting of studies themselves. Nine articles50-58 were incorrectly described as cohort studies because they were not observational (they included interventions aimed at change),73 and many others did not explicitly identify the design used. Inappropriate categorization of studies is not unusual74 but may be particularly challenging in QI studies in which conventional descriptors derived from epidemiological study designs might not be optimally suited to use of, say, statistical process control (SPC) methods. Consistent application of study design terminology is therefore likely to be helpful to QI reporting in the future.
A further challenge is that most studies in our review used designs, such as before-and-after studies, that have weaknesses in controlling for bias and in making causal inferences, if judged by the standards of traditional epidemiology. However, these designs are characteristic of QI projects. As we noted earlier, QI and traditional research are distinct (though sometimes overlapping) enterprises: QI projects are primarily aimed at securing change in a specific environment, in contrast to research of which the primary purpose is generating new knowledge.22 This makes the description of context particularly important, yet details of many aspects of context were missing in the articles we reviewed.
The range of areas targeted by QI in the articles we included was narrow. For example, we identified a paucity of QI studies examining the discharge process, patient information, and handover (handoff) to primary care on discharge, and coordination within and between specialties in emergency care, even though all of these are known to be problematic.75-77 Only two articles reported on patient and public involvement,62,63 despite encouragement for improvement strategies to include patients,78 again suggesting that many opportunities for QI remain to be addressed.
There are limitations to this study. The possible scope of QI literature is wide, as reflected in the fact that it was difficult to pin down an accepted definition of the term. The use of MeSH terms and keywords has been inconsistent,79 and the 100 articles themselves used myriad terms. There is also no consensual definition of the distinction between quality interventions and QI techniques. The taxonomy we applied was fairly generic and might have resulted in literature being missed or studies being misclassified, although we believe that we reached a good compromise between robustness and pragmatism.
It is a difficult balance for the systematic reviewer to obtain enough articles to ensure that nothing is missed while also reducing "noise" to ensure that the project is manageable. More than 16,000 articles were identified, indicating that our search strategy had low sensitivity and specificity that had to be resolved by detailed review. It is likely that problems in search strategy design were related to lack of consensus on how QI terminology should be applied47 and lack of standardization of MeSH terms for QI article indexing.79 It is possible that our search might not have captured all studies stimulated, for example, by the 2014 Improving Trauma Care Act in the United States80 or the Emergency and Urgent Care Review in the United Kingdom.81 The exclusion of three non-English-language articles and of unpublished reports may have introduced some bias,82 but this would have greater importance if the review had intended to estimate the size of the interventions' effect rather than describing their content.
A final limitation of this review is the possibility that reviewers' scoring of the articles might have been imperfect. Overly positive scoring could have occurred if the modification of one element of a multifaceted intervention was scored as "fully reported," when the possibility remains that other modifications were made but went unreported. The reviewers did not contact authors to identify missing aspects of intervention reporting, and the articles were scored as seen. Overly negative scoring might also have occurred. When content was completely absent, the item was scored as not fully reported, but nonreporting might have occurred for good reasons; for example, authors might not have reported on modifications if the intervention was never intended to be modified.
This systematic review has identified suboptimal reporting of QI within surgical literature but did not attempt to identify the possible causes of this problem. It might therefore be necessary to further consider what QI authors believe is required to create an environment in which improved reporting might flourish. The TIDieR checklist, of course, is not designed for assessing reporting of QI articles specifically, and topics for future investigation might include the adaptation of existing reporting guidance, such as TIDieR, to enable better description of features specific to QI.44 Benefit might also be gained from exploring journals' word-count limitations, checklist endorsement, and collaborative approaches to learning and sharing information, all of which might offer creative routes to securing fuller reporting.83 The key is to identify what is required for authors to generate QI reports that provide a relevant and full account of the QI intervention and technique.

Conclusions
QI projects in the perioperative literature are suboptimally reported, but it is not yet clear why. Further exploration of poor reporting in surgery may help to orient research toward ways to improve it. This may then contribute toward the development of a comprehensive, coherent, and valid framework for the design and reporting of quality interventions and QI techniques.

Supplementary material
Refer to Web version on PubMed Central for supplementary material.

Figure 2. The distribution of TIDieR scores for the reporting of quality interventions across the 100 articles approximately followed a normal bell-shaped curve, with a slight skew toward higher ratings. The most common (modal) score was 7/11, and the average (arithmetic mean) score was 6.31/11.

Table 1. Quality Improvement (QI) Taxonomy*

QI Strategy | Definition | Examples of Methods | Surgical Examples

Note: Articles reporting any QI intervention (1-9) must include 1 additional item (10 & 11) from Table 1.

Provider reminder systems | Any "clinical encounter-specific" information intended to prompt a clinician to recall information or consider a specific process of care