Are all first-generation antipsychotics equally effective in treating schizophrenia? A meta-analysis of randomised, haloperidol-controlled trials.

Abstract Objectives: Narrative, unsystematic reviews revealed no differences in efficacy between the various first-generation antipsychotics (FGAs) resulting in the psychopharmacological assumption of comparable efficacy between the different FGAs. We sought to determine if the assumption of comparable efficacy of all FGAs can be regarded as evidence-based using meta-analytic statistics. Methods: A systematic literature survey (Cochrane Schizophrenia Group trial register) was applied to identify all RCTs that compared oral haloperidol with another oral FGA in schizophrenia. Primary outcome was dichotomous treatment response. Secondary outcomes were symptom severity measured by rating scales, discontinuation rates, and specific adverse effects. Results: Altogether, 79 RCTs with 4343 participants published between 1962 and 1999 were included. We found a significant between-group difference only between haloperidol and nemonapride, but not for the remaining 19 investigated FGAs. There were no significant differences for discontinuation rates. Conclusions: As most of the single meta-analytic comparisons can be regarded as underpowered, the evidence for the assumption of comparable efficacy of all FGAs is inconclusive. We therefore cannot confirm or reject the statements of previous narrative, unsystematic reviews in this regard. Our findings were limited by the small sample size in the individual comparisons and the low methodological quality in many included studies.


Introduction
Previous narrative, unsystematic reviews found no differences in efficacy between the various so-called first-generation ("conventional", "typical") antipsychotic drugs (FGAs; Klein and Davis 1969; Davis and Garver 1978). Hence, many textbooks and guidelines codified that all FGAs are characterised by comparable antipsychotic efficacy (Davis et al. 1989; Buchanan and Carpenter 2000; Lehman et al. 2004; Buchanan et al. 2010). However, this assumption has never been investigated with a systematic methodological approach and meta-analytic statistics. To close this empirical gap, we aimed to compare haloperidol with all other FGAs in our meta-analysis in order to examine whether the statement of comparable efficacy of all antipsychotic drugs is true.
That this assumption often contrasts with the clinical impression can be seen, for example, from the frequent selection of the high-potency compound haloperidol, especially for acutely ill schizophrenic patients. Haloperidol still has a high market share in Europe and the United States (Kaye et al. 2003; Paton et al. 2003; Roh et al. 2014). Furthermore, it has been used as comparator drug in many clinical trials for the introduction of other antipsychotic agents, including the ("atypical") second-generation antipsychotics (SGAs; Hasan et al. 2012, 2013, 2015). Moreover, haloperidol is on the list of essential drugs of the World Health Organisation (WHO 2013). Because of its outstanding role, research on haloperidol is still justified and, therefore, we selected haloperidol as comparator drug in our meta-analysis.
The objective of the present meta-analysis was to determine the efficacy, acceptability, and tolerability of haloperidol in comparison to all other FGAs in the pharmacotherapy of schizophrenia and related disorders based on all available randomised, controlled trials (RCTs). We did not aim to investigate SGAs as comparator drugs because these have been extensively compared with haloperidol in previous systematic reviews (Essali et al. 2009; Leucht et al. 2009, 2013).

Study selection
Without any limits concerning the administered antipsychotic dose, we included all RCTs that compared oral pharmacotherapy with haloperidol to any other orally administered FGA (direct comparison, "head-to-head") in schizophrenia or related disorders (schizoaffective, schizophreniform, or delusional disorder; any diagnostic criteria). We applied the definition of the FGAs implemented in the pharmacological treatment guidelines of the international psychiatric societies (Lehman et al. 2004; Buchanan et al. 2010; Hasan et al. 2012). At least 75% of all participants within a trial had to suffer from a schizophrenic disorder, or results for people with schizophrenia had to be reported separately. In randomised cross-over trials, we analysed exclusively data up to the point of first cross-over to avoid possible carry-over effects.

Search strategy
All trade and substance names of haloperidol (for exact phrases see Supplementary Table 1 available online) were employed to screen the Cochrane Schizophrenia Group trial register without language restrictions (until July 2012). This register contains methodical searches of BIOSIS, CINAHL, Dissertation abstracts, EMBASE, LILACS, MEDLINE, PSYNDEX, PsycINFO, RUSSMED, and Sociofile, supplemented with hand searching of relevant journals and numerous conference proceedings. The search was updated by screening PubMed/Medline in February 2015. Additionally, we evaluated the electronic trial registers ClinicalTrials.gov and Clinicaltrialsregister.eu and screened the references of the included studies for further relevant publications. Furthermore, all manufacturers of FGAs were contacted for unpublished trials.

Study selection and data extraction
Study selection and data extraction were conducted independently by at least two authors (MD, MT, MS, CL, and/or SL). Disputes were resolved through discussion and, if necessary, we contacted trial authors for clarification. Moreover, data extraction forms were sent to the original corresponding authors with a request to provide missing data and the opportunity to make corrections. The data collection process was accomplished according to the PRISMA statement ("Preferred Reporting Items for Systematic Reviews and Meta-Analyses") (Moher et al. 2009).

Outcome criteria
Primary outcome was the number of participants who achieved a clinically important response to treatment, analysed as a dichotomous variable. If presented, we used a cut-off of at least 50% reduction of the baseline value of the Positive and Negative Syndrome Scale (PANSS; Kay et al. 1987) or, if not available, of the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham 1962). Statistical analyses demonstrated that this definition is clinically meaningful (Leucht et al. 2005a, 2005b). In the absence of PANSS or BPRS values, we used the definitions applied in the individual studies. Secondary outcomes were alterations in schizophrenic symptom severity assessed by mean change in total scores on the PANSS or, if not available, on the BPRS. Further secondary outcomes were all-cause discontinuation (dropouts due to any reason), dropouts due to inefficacy of treatment as well as due to adverse events, and the occurrence of single adverse effects.
"Intention to treat (ITT)" data were used insofar as available. If not available, we assumed for dichotomous data that those participants lost to follow-up would have had the same percentage of events as those who remained in the study and were analysed. Continuous data were used as presented in the original studies without any assumptions concerning those with premature discontinuation of treatment.
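The imputation rule for dichotomous data described above can be illustrated with a minimal sketch. The function name `impute_events` and the example counts are hypothetical and are not taken from the included trials; the sketch only assumes that participants lost to follow-up are assigned the event rate observed among those analysed.

```python
def impute_events(events_observed, n_analysed, n_randomised):
    """Return the event count for all randomised participants, assuming that
    those lost to follow-up had the same event rate as those analysed."""
    if n_analysed <= 0 or n_randomised < n_analysed:
        raise ValueError("need 0 < n_analysed <= n_randomised")
    if not 0 <= events_observed <= n_analysed:
        raise ValueError("events_observed must lie between 0 and n_analysed")
    rate = events_observed / n_analysed      # event rate among analysed participants
    n_lost = n_randomised - n_analysed       # participants lost to follow-up
    return events_observed + rate * n_lost   # observed plus imputed events


# Hypothetical arm: 30 randomised, 24 analysed, 12 responders among the analysed.
imputed_responders = impute_events(12, 24, 30)
print(imputed_responders)  # 12 observed + 3 imputed = 15.0
```

Under this assumption the response proportion in the arm is unchanged (12/24 equals 15/30); only the denominator, and therefore the precision of the estimate, changes.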

Statistical analyses
Mantel-Haenszel risk ratios (MH-RRs) with the associated 95% confidence intervals (CIs) were calculated for dichotomous data (e.g., number of responders and dropouts). Continuous data (mean PANSS/BPRS changes) were analysed using standardised mean differences (Hedges's g) with the associated 95% CIs. Statistical significance was assumed if the 95% CIs did not include the numerical value of 0 (Hedges's g) or 1 (RR), and/or the P value of the comparison was <0.05.
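As an illustration of the two effect-size measures, the following sketch computes a single-trial risk ratio with a log-scale 95% CI, the Mantel-Haenszel pooled risk ratio (point estimate only), and Hedges's g with the usual small-sample correction. All function names and input numbers are hypothetical; trials with zero events would need a continuity correction, which is not handled here, and this is not a reproduction of the software used for the analyses.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def risk_ratio(events_1, n_1, events_2, n_2):
    """Risk ratio of arm 1 vs. arm 2 with a 95% CI computed on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    return rr, lower, upper


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio (point estimate only).

    tables: list of (events_1, n_1, events_2, n_2), one tuple per trial.
    """
    numerator = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in tables)
    denominator = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in tables)
    return numerator / denominator


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Hedges's g (bias-corrected standardised mean difference) with 95% CI."""
    df = n_1 + n_2 - 2
    sd_pooled = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / sd_pooled
    correction = 1 - 3 / (4 * df - 1)          # small-sample correction factor J
    g = correction * d
    var_d = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2))
    se_g = correction * math.sqrt(var_d)
    return g, g - Z_95 * se_g, g + Z_95 * se_g


# Hypothetical trial: 10/20 responders vs. 5/20 responders.
rr, rr_low, rr_high = risk_ratio(10, 20, 5, 20)
print(f"RR = {rr:.2f}, 95% CI {rr_low:.2f} to {rr_high:.2f}")

# Hypothetical mean score changes in two arms of 20 participants each.
g, g_low, g_high = hedges_g(10.0, 4.0, 20, 8.0, 4.0, 20)
print(f"g = {g:.2f}, 95% CI {g_low:.2f} to {g_high:.2f}")
```

In the hypothetical risk-ratio example the CI includes 1, so the difference would not count as statistically significant under the criterion stated above.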
To consider variability between the different included trials (Huf et al. 2011), the Mantel-Haenszel random-effects model of DerSimonian and Laird (1986) was employed to calculate the pooled binary and continuous effect sizes. The degree of heterogeneity between the studies was assessed statistically with the I² statistic and the chi-squared test of homogeneity (significance level of heterogeneity: I² > 50% and P < 0.1). In case of significant heterogeneity, we reported this, checked data extraction of outlier trials, investigated reasons for their different findings, and assessed the effects of study exclusion in post-hoc sensitivity analyses.
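The random-effects pooling and the heterogeneity statistics can be sketched as follows. This is a minimal illustration that uses inverse-variance weights with the DerSimonian-Laird moment estimator of the between-study variance; it is an assumption that this matches the software implementation used for the analyses, and the function name and inputs are hypothetical. Effects are expected on an additive scale (log risk ratios or Hedges's g).

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects estimator.

    effects:   per-study effect sizes (e.g., log risk ratios or Hedges's g)
    variances: per-study sampling variances of those effects
    """
    k = len(effects)
    w = [1 / v for v in variances]                       # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² in percent
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci_lower": pooled - 1.959964 * se,
        "ci_upper": pooled + 1.959964 * se,
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical log risk ratios and variances from three trials.
result = dersimonian_laird([0.10, 0.35, -0.05], [0.04, 0.09, 0.06])
print(result)
```

For log risk ratios, the pooled estimate and its CI limits are exponentiated to return to the risk-ratio scale. An I² value above 50% together with a Q test P value below 0.1 would correspond to the heterogeneity threshold stated above.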
The likelihood of publication bias was examined by funnel-plot visualisation and calculation of Egger's regression intercept test (two-tailed, significance level: P < 0.05) (Egger et al. 1997) for the primary outcome.
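Egger's test regresses the standardised effect (effect divided by its standard error) on precision (the reciprocal of the standard error) and examines whether the intercept differs from zero. The sketch below, with a hypothetical function name and made-up inputs, computes the intercept, its standard error, and the t statistic using ordinary least squares; the two-tailed P value would be read from a t distribution with k - 2 degrees of freedom, which is left out here to keep the example free of external libraries.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression: standardised effect on precision, unweighted OLS.

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are required")
    x = [1 / se for se in std_errors]                    # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardised effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, k - 2


# Hypothetical log risk ratios and their standard errors from five trials.
effects = [0.12, 0.30, -0.05, 0.22, 0.41]
std_errors = [0.10, 0.25, 0.15, 0.30, 0.45]
intercept, se_int, t_stat, df = egger_intercept(effects, std_errors)
print(f"intercept = {intercept:.3f}, SE = {se_int:.3f}, t = {t_stat:.2f}, df = {df}")
```

An intercept close to zero is compatible with a symmetrical funnel plot, whereas a clearly non-zero intercept suggests small-study effects.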
In unrestricted maximum-likelihood meta-regression analyses (significance level: P < 0.05), we investigated the impact of the following continuous moderators on effect sizes: (1) mean haloperidol dose, (2) dose ratio (haloperidol dose/dose of comparator drug; only if more than five individual studies contributed to a pooled meta-analytic comparison), and (3) publication year.
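The idea of a meta-regression with one continuous moderator can be sketched as a weighted least-squares fit. Note the simplification: in this sketch the between-study variance `tau2` is supplied as an input, whereas an unrestricted maximum-likelihood fit would estimate it iteratively together with the regression coefficients. The function name and all numbers are hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares meta-regression with a single moderator.

    Weights are 1 / (sampling variance + tau2); tau2 is taken as given
    (a simplification of the maximum-likelihood approach).
    Returns (slope, intercept, se_slope, two_sided_p).
    """
    w = [1 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return slope, intercept, se_slope, p_value


# Hypothetical mean haloperidol doses (mg/day), log risk ratios, and variances.
doses = [5.0, 10.0, 20.0, 30.0]
log_rrs = [0.15, 0.05, 0.20, 0.10]
variances = [0.04, 0.05, 0.03, 0.06]
slope, intercept, se_slope, p_value = weighted_meta_regression(
    log_rrs, variances, doses, tau2=0.01
)
print(f"slope = {slope:.4f} per mg/day, SE = {se_slope:.4f}, P = {p_value:.3f}")
```

A slope whose P value is at or above 0.05 would be read as no significant relationship between the moderator and the effect size.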
To test the robustness of our results, we decided a priori to perform sensitivity analyses (significance level: P < 0.05). In these, we (1) employed a fixed-effects model instead of a random-effects one, (2) excluded RCTs that were not double-blind, (3) excluded RCTs that used a cross-over study design, (4) excluded RCTs that included children or adolescents (mean age <18 years), (5) excluded RCTs that enrolled participants with treatment-resistant schizophrenia, (6) excluded RCTs with trial duration of >3 months or unclear follow-up period, and (7) excluded RCTs carried out in China. All meta-regressions and sensitivity analyses were performed for the primary outcome.

Risk of bias assessment
Every included trial was assessed independently by at least two reviewers (MD, MT, MS, CL, and/or SL) regarding methodological quality with the "risk of bias" tool described in the Cochrane Collaboration handbook (Higgins and Green 2011). This set of criteria contains a rating of sequence generation, allocation concealment, blinding, incomplete outcome data, selective reporting, and risk for other biases.

Results of the literature search and characteristics of the included studies
The electronic literature search identified a total of 2944 references. Of these, 174 citations seemed potentially appropriate according to title and abstract and were closely inspected. A detailed description of the individual search steps according to the PRISMA statement (Moher et al. 2009) is shown in Figure 1. Contacting the manufacturers of FGAs did not yield further relevant studies. Finally, 79 RCTs with 86 relevant study arms and a total of 4343 participants were included. The enrolled studies were published between 1962 (Hollister et al. 1962; Lempérière et al. 1962) and 1999 (Cosar et al. 1999).
In seven trials, fixed doses of the drugs were administered (Cocchi et al. 1971; Darondel et al. 1981; Nedopil and Ruther 1981; Nishikawa et al. 1984; Fux and Belmaker 1991; Kinon et al. 1993; Mauri et al. 1994). The mean haloperidol dose ranged from 3.27 mg/day (Nishikawa et al. 1984) to 34.75 mg/day (Cosar et al. 1999) (mean: 14.1 ± 9.01 mg/day). Fifty-five studies allowed the administration of anticholinergic drugs as concomitant medication to attenuate extrapyramidal adverse effects of the antipsychotics. The largest share of trials (39.2%) was carried out in the US. No study enrolled exclusively subjects with first-episode schizophrenia, and six trials enrolled treatment-resistant patients (Hall et al. 1968; Howard 1974; Teja et al. 1975; McCreadie and MacDonald 1977; Kinon et al. 1993; Shalev et al. 1993). Further characteristics of the single included RCTs are summarised in Supplementary Table 2 (available online).

Methodological quality of the included studies
Supplementary Figures 1 and 2 (available online) illustrate graphically the single ratings for each item of the "risk of bias" tool. Briefly, in agreement with the inclusion criteria, all studies were stated to be randomised, but only three of the 79 included trials described an adequate random sequence generation procedure (Engelhardt et al. 1978; Giordana and Frenay 1984; Gerlach et al. 1985). No study mentioned adequate concealment of allocation. The blinding of participants and personnel (performance bias) was sufficient in 52 studies and the blinding of outcome assessment (detection bias) in 10 trials. The risk of bias for incomplete outcome data was judged to be low in 19 studies, unclear in 37, and high in 23. Only six studies appeared to be free of selective reporting, and in 35 trials we found evidence for a high risk of other biases that probably limit the study results.

Secondary outcome: premature discontinuation of treatment
We did not identify any significant between-group differences for the number of participants leaving the study early due to any reason (N = 32; n = 1535) (

Secondary outcome: adverse effects
All effect sizes for the occurrence of adverse effects are provided in Table 1, Supplementary Table 3 (available online), and in the forest plots in Supplementary Figures 8-19 (available online). Due to space limitations, we only present here the statistically significant findings without the effect sizes: patients treated with haloperidol experienced significantly more adverse effects (occurrence of at least one adverse effect) compared to perazine, pimozide and thioridazine, and significantly more extrapyramidal symptoms compared to chlorpromazine and pimozide. Under haloperidol treatment, there was a higher prevalence of tremor in comparison to bromperidol and levomepromazine, and of dystonia in comparison to bromperidol and perazine. Haloperidol-treated patients received significantly more antiparkinson medication than perazine-treated subjects but less than patients treated with pimozide. Weight gain occurred significantly more rarely under haloperidol than under thiothixene, and sedation more rarely under haloperidol than under loxapine. No significant between-group differences were found for akathisia, dyskinesia, hypotension, rigor, and tardive dyskinesia.

Meta-regressions
The a-priori defined unrestricted maximum-likelihood meta-regression analyses revealed no significant relationship between the effect sizes and mean haloperidol doses, dose ratios, or publication years (Supplementary Figure 20 (available online)).

Sensitivity analyses
Haloperidol was statistically significantly more efficacious than chlorpromazine when applying a fixed-effects model instead of a random-effects model, after removing Chinese studies, and after the exclusion of non-double-blind RCTs. On the other hand, removing trials with a cross-over design, mean age <18 years, treatment-resistant patients, and trial duration >3 months did not alter the meta-analytic findings regarding statistically significant between-group differences (Supplementary Figures 21-27 (available online)).

Publication bias
Visual inspection of the funnel plot did not indicate any evidence for the presence of a publication bias (Supplementary Figure 28 (available online)), which was corroborated by a non-significant Egger's regression intercept test (P = 0.22).

Discussion
Meta-analysing 79 RCTs with altogether 4343 participants that compared haloperidol with any other FGA in schizophrenia, we found a significant between-group difference for the proportion of treatment responders only in comparison to nemonapride, but this finding was based on only one trial. There were no statistically significant differences for the remaining 19 investigated FGAs. In terms of mean PANSS/BPRS total score changes, haloperidol was significantly more efficacious than trifluoperazine and less efficacious than flupenthixol and loxapine. However, the statistically significant differences for trifluoperazine and flupenthixol were again based on the results of only one trial, and the finding for loxapine was limited by a significant level of heterogeneity.
The theoretical background of our study was the assumption of comparable efficacy of all FGAs established by previous narrative, unsystematic reviews (Klein and Davis 1969; Davis and Garver 1978). We aimed to examine this presumption applying a structured approach and high-quality meta-analytic methodology for the first time. Therefore, we systematically evaluated the evidence derived from randomised, haloperidol-controlled trials. The fact that only one of the 20 haloperidol-FGA comparisons exhibited a significant between-group difference in achieving treatment response tends to corroborate the evaluated assumption of comparable efficacy of all FGAs. However, the included trials were characterised by mostly small sample sizes (mean: 56 participants) and the predefined outcomes were often incompletely reported. As a consequence, many meta-analytic comparisons were underpowered, and we can neither definitely confirm nor refute the assumption of comparable efficacy of all FGAs, at least not for the haloperidol-FGA comparisons.
Besides the present meta-analysis investigating haloperidol versus other FGAs, other systematic reviews have also aimed to determine the differences in antipsychotic efficacy between the various FGAs. Samara et al. (2014) compared chlorpromazine with other FGAs and additionally a couple of SGAs. Based on 128 RCTs with a total of 10,667 participants, they found that chlorpromazine differed significantly from only a few other FGAs and concluded that there is no convincing evidence for confirming or rejecting the dogma of equal efficacy of all FGAs. Similar findings were reported by further systematic reviews investigating the differences in efficacy between various FGAs (Dold et al. 2015; Leucht et al. 2008; Tardy et al. 2014). To appraise the antipsychotic efficacy of haloperidol in comparison to all other antipsychotic drugs, a corresponding meta-analysis comparing haloperidol to all SGAs should be taken into account (Leucht et al. 2009). In this systematic review, only four SGAs (amisulpride, clozapine, olanzapine, and risperidone) were significantly more efficacious than haloperidol.
In order to determine the impact of the different drug doses administered in the individual included RCTs, we performed a-priori defined meta-regressions with mean haloperidol doses and dose ratios as continuous moderators. This seems appropriate especially because of the high variability of the dispensed haloperidol doses (mean: 14.1 ± 9.01; range: 3.27 to 34.75 mg/day). The non-significant results of these meta-regressions do not suggest that our statistical findings were influenced by the different administered antipsychotic doses in the single RCTs. As a limitation, it should be taken into account that the numbers of individual trials contributing to the meta-regressions were rather small and, therefore, some meta-regressions could be underpowered. However, our findings are concordant with those of other systematic reviews investigating the efficacy of different haloperidol dose regimens (Davis and Chen 2004; Donnelly et al. 2013), which did not identify convincing evidence for an association between high doses of haloperidol and high antipsychotic efficacy.
We did not find any significant differences in terms of all-cause discontinuation and premature study discontinuation due to adverse effects. This indicates comparable acceptability and tolerability of all FGAs in comparison to haloperidol. All-cause discontinuation in particular is increasingly used as an outcome in effectiveness trials evaluating antipsychotics (Lieberman et al. 2005; Kahn et al. 2008) because it combines both efficacy and safety aspects of the psychopharmacotherapy. It should be considered in the individual drug choice that any gains in pure efficacy-related outcomes are not necessarily accompanied by similar effects in such a global measure.
The occurrence of specific adverse effects was reported only rarely in the majority of the individual trials. However, our meta-analytic findings tend to corroborate the classical distinction between antipsychotics by potency. According to this distinction, low- and mid-potency antipsychotic drugs are typically characterised by more anticholinergic (e.g., dry mouth and obstipation), antiadrenergic (e.g., orthostatic dysregulation), and antihistaminergic (e.g., sedation and weight gain) adverse effects compared to the high-potency antipsychotics, which are usually associated with a higher proportion of movement disorders than low-potency antipsychotics. Our results for the adverse effects seem to support these different characteristics in the risk profile of the FGAs: the low/medium-potency FGAs caused significantly more sedation (loxapine) and weight gain (thiothixene) than the high-potency FGA haloperidol. On the other hand, haloperidol was associated with a higher incidence of extrapyramidal symptoms compared to the low- and medium-potency FGAs chlorpromazine and pimozide.
In addition to the above discussed limitations caused by small sample sizes and incomplete outcome data reporting, it should be considered that the included RCTs differed in terms of the investigated participants (e.g., subjects with multiple schizophrenic episodes, treatment-resistant patients, or patients in remission), treatment modalities (outpatient or inpatient treatment), trial duration, the administered antipsychotic doses, the assessed outcomes and the applied diagnostic criteria. However, according to the sensitivity analyses, our findings did not appear to be influenced by aspects of study methodology, age of participants, trial duration, or origin of the study.
The interpretation of the meta-analytic results is further limited by poor methodological quality in a large number of the included studies. Although only RCTs have been incorporated, it remained unclear for many trials whether the procedure of randomisation was appropriate. Nine included RCTs were not performed in a double-blind manner, and for the remaining trials it was not clearly indicated that the double blinding was maintained throughout the whole study course. Furthermore, 23 studies exhibited very high dropout rates (>25% of the randomised participants). With regard to the evaluation of the occurrence of adverse effects, it must be noted that only a selection of the most important and most common side effects was investigated, and further adverse effects might exist.
In the same way, it was not possible to obtain enough data to investigate specific domains of schizophrenia such as positive or negative symptoms. Potentially, more statistically significant differences would emerge when analysing exclusively the positive symptoms of schizophrenia, for instance by the PANSS positive subscale, instead of total scores comprising also negative symptoms and aspects of general psychopathology. However, this was not practicable due to a lack of data in the individual trials.
As a further methodological limitation, the possibility of an industry sponsorship bias must be noted because haloperidol was the active comparator reference drug in most of the studies and the other FGAs were the drugs of interest in the original trials.
The symmetrical funnel plot and the non-significant Egger's regression intercept test did not provide any evidence for the presence of a publication bias. However, we cannot exclude that some study results relevant for this meta-analysis were not published and subsequently not covered by our systematic literature search. Thus, the possibility of publication bias needs to be considered.