Accounting for deaths in neonatal trials: is there a correct approach?

INTRODUCTION
The Disability and Perinatal Care report published by the National Perinatal Epidemiology Unit and Oxford Regional Health Authority in 1994 emphasised that data on the neurodevelopmental outcomes of neonates requiring intensive care should be formally collected. 1 Over the last 40 years, survival rates of high-risk infants have improved, but these improvements have not been matched by parallel improvements in neurodevelopmental outcomes. 2–4 Consequently, the focus of neonatal care has shifted increasingly towards reducing long-term morbidity and neurodevelopmental impairment. 1 2 Improved long-term neurodevelopment is now considered the 'Holy Grail' in neonatology. 1 5
These developments have led to a change in focus of perinatal trials, which have moved away from survival as the primary outcome towards long-term functional outcomes. 2 This raises the question of how to deal with deaths in trials where neurodevelopmental impairment is of primary interest. In perinatal trials that recruit high-risk infants, some infants will inevitably die, and quantifying the outcome for these infants has led to a range of approaches, none of which is without compromise. 6–9
This issue has become more pertinent because some interventions designed to improve neurodevelopmental outcomes may have no biologically plausible effect on mortality. Nevertheless, mortality is substantially higher in neonatology than in other fields of medicine, particularly among very preterm infants, 10 and this strongly influences both trial design and analysis.
This review considers approaches that have been taken by trialists regarding the role of death in their outcome measures, the pros and cons of the various approaches, the effect on the outcomes measured and the subsequent interpretation of the findings.

How is neurodevelopmental impairment measured?
There is broad global consensus that neurodevelopmental outcomes should be measured at 18-24 months of age corrected for prematurity. This is a pragmatic compromise between identifying adverse neurodevelopmental outcomes as early as possible, and using tools that are reliable and likely to be predictive of impairments in later life. 11 12 This also allows results to become available in a timescale that is not too far removed from the perinatal intervention, while minimising the duration and costs of the trial.
Neurodevelopmental outcomes are usually assessed using validated psychometric instruments designed to quantify a child's developmental progress. These have typically been formal standardised tests assessing multiple developmental domains, including cognitive, language and motor development, but parent-report measures have become increasingly popular as cost-effective alternatives to formal assessment. A variety of tools are commonly used to assess neurodevelopment at 18 months to 2 years of age in perinatal trials (table 1). One of the most widely used and recently standardised developmental tests is the Bayley Scales of Infant and Toddler Development, third edition (Bayley-III), 14 which provides separate scores for cognitive, language and motor development. Perinatal trials typically use either the cognitive score or a combination of the domains to assess cognitive development. 15 This 'outcome' is often combined with other measures of neuromotor and sensory impairment (eg, vision, hearing and cerebral palsy) to establish a 'broad-spectrum' assessment of neurodevelopmental outcome (see table 2). Opinions vary as to which combination should be used, but the broad approach to defining neurodevelopmental impairment is clear. 11 However, there is no consensus on how death should be incorporated into the analysis of such composite primary outcomes.

DIFFERENT APPROACHES TO ACCOUNTING FOR DEATH IN CLINICAL TRIALS
Use of a composite outcome
One common approach in neonatal trials is to use a composite of death or neurodevelopmental impairment as the primary outcome. Neurodevelopmental impairment for these purposes is usually dichotomised (ie, present or absent). In this approach, a score on a particular psychometric test or combination of measures may be used as a 'cut-off' for defining an adverse outcome (see table 1). For example, standardised index scores more than 3 SDs below the normative mean of 100 (ie, scores <55) on the Bayley Scales of Infant Development second edition (BSID-II) 13 are generally accepted as defining severe impairment. Therefore, a trial could be based on a primary outcome of infants who either died before 2 years corrected age or had a BSID-II index score <55. Several major national and international studies have used this approach (see table 3).
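The dichotomised composite described above reduces to a simple per-infant rule. The sketch below assumes a hypothetical record type (`Infant`, holding a death flag and a BSID-II index score that is `None` for infants who died); the <55 threshold is the severe-impairment cut-off described above, and the example arm is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# BSID-II index scores have mean 100 and SD 15, so "more than 3 SD below the
# mean" corresponds to a score below 55.
SEVERE_CUTOFF = 55


@dataclass
class Infant:
    """Hypothetical per-infant record (illustrative, not a trial data standard)."""
    died_before_2y: bool
    bsid_index: Optional[float]  # None when the infant died


def composite_event(infant: Infant, cutoff: float = SEVERE_CUTOFF) -> bool:
    """Death before 2 years corrected age OR BSID-II index score below the cut-off."""
    if infant.died_before_2y:
        return True
    if infant.bsid_index is None:
        # A surviving infant without an assessment is missing data, not an event.
        raise ValueError("survivor with no BSID-II score")
    return infant.bsid_index < cutoff


def composite_rate(arm: List[Infant]) -> float:
    """Proportion of infants in one arm with the composite outcome."""
    return sum(composite_event(i) for i in arm) / len(arm)


example_arm = [
    Infant(died_before_2y=True, bsid_index=None),  # death -> event
    Infant(died_before_2y=False, bsid_index=50),   # severe impairment -> event
    Infant(died_before_2y=False, bsid_index=55),   # 55 is not < 55 -> no event
    Infant(died_before_2y=False, bsid_index=92),   # no event
]
print(composite_rate(example_arm))  # 0.5
```

Note that a death and a score of 50 contribute identically to the event count; the rule carries no information about how far below the cut-off a survivor falls.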
The main advantage of composite outcomes is statistical efficiency: they increase the number of events and therefore the statistical power, as demonstrated by the National Institute of Child Health and Human Development trial 24 of whole-body hypothermia for hypoxic-ischaemic encephalopathy. The authors reported that whole-body hypothermia was associated with a reduction in the primary outcome (death or moderate-to-severe neurodisability) compared with usual care in infants with moderate or severe hypoxic-ischaemic encephalopathy (risk ratio 0.72, 95% CI 0.54 to 0.95; p=0.01). However, the individual components of the primary outcome did not differ significantly between groups when analysed as secondary outcomes. This illustrates the benefit of a composite primary outcome, especially when its components are important outcomes for clinicians and parents alike.
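The efficiency gain can be made concrete with the standard normal-approximation sample size formula for comparing two proportions. The control-arm event rates below (20% for a single component, 40% for a composite that also counts deaths) and the common risk ratio of 0.75 are hypothetical, and the sketch assumes the intervention has the same relative effect on the composite as on the single component, an assumption that does not always hold.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, risk_ratio, alpha=0.05, power=0.9):
    """Normal-approximation sample size per arm for comparing two proportions.

    n = (z_{1-alpha/2} + z_{power})^2 * [p_c(1-p_c) + p_t(1-p_t)] / (p_c - p_t)^2
    """
    p_treat = p_control * risk_ratio
    z = NormalDist().inv_cdf
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * variance_sum / (p_control - p_treat) ** 2)


# Hypothetical control-arm event rates, same assumed risk ratio of 0.75.
single_component = n_per_arm(0.20, 0.75)
composite = n_per_arm(0.40, 0.75)
print(single_component, composite)
```

Under these assumptions the single-component comparison needs roughly 1,200 infants per arm and the composite fewer than 500, which is the sense in which more events buy power.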
However, there are potential problems in defining the primary outcome in this way. For example, it cannot always be assumed that all components of the composite outcome will be affected by the intervention in the same direction. 29 Composites work best when an intervention anticipated to reduce morbidity is also expected to improve survival, and this may not always be true, as a trial of target ranges of oxygen saturation in extremely preterm infants 30 illustrates. The oxygen saturation component of this factorial trial tested the hypothesis that a lower target range of oxygen saturation (85%-89%), as compared with a higher target range (91%-95%), would reduce the incidence of the composite outcome of severe retinopathy of prematurity or death among infants born between 24+0 and 27+6 weeks' gestation. The results showed no evidence of a difference in the composite outcome overall. However, the lower target range resulted in an increase in mortality and a substantial decrease in severe retinopathy of prematurity among survivors. 30

Treating neurodevelopmental impairment as a dichotomous outcome in analysis
A further complication of treating neurodevelopmental impairment as a dichotomous outcome is that it effectively becomes 'all or nothing'. For example, if a study defines a BSID-II index score of <70 as representing moderate-to-severe neurodevelopmental impairment, then a child with a score of 70 would be classified as unimpaired, while a child with a score of 69 would be classified as impaired, even though the difference between these scores is not clinically significant. In addition, moderate or severe neurodevelopmental impairment is then treated mathematically as equivalent to death. This may be a reasonable compromise, but it illustrates the problems that can arise when interpreting study results.
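The 'all or nothing' behaviour, and the information it throws away, can be sketched as follows. The <70 cut-off is the hypothetical definition used above; the power comparison additionally assumes normally distributed scores with SD 15, a 5-point difference in means and 100 infants per arm, all invented for illustration, and uses simple normal approximations rather than any specific trial's analysis.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical study definition: BSID-II index < 70 = moderate-to-severe impairment.
CUTOFF = 70
STD_NORMAL = NormalDist()


def adverse(died, score, cutoff=CUTOFF):
    """'All or nothing' outcome: a death and any score below the cut-off count the same."""
    return died or score < cutoff


def event_rate(arm):
    """Proportion of (died, score) pairs classified as adverse."""
    return sum(adverse(died, score) for died, score in arm) / len(arm)


def power_means(mean_c, mean_t, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing mean scores."""
    se = sd * sqrt(2 / n_per_arm)
    return STD_NORMAL.cdf(abs(mean_t - mean_c) / se - STD_NORMAL.inv_cdf(1 - alpha / 2))


def power_dichotomised(mean_c, mean_t, sd, n_per_arm, cutoff=CUTOFF, alpha=0.05):
    """Approximate power after dichotomising normally distributed scores at the cut-off."""
    p_c = NormalDist(mean_c, sd).cdf(cutoff)  # proportion below the cut-off, control
    p_t = NormalDist(mean_t, sd).cdf(cutoff)  # proportion below the cut-off, treated
    se = sqrt((p_c * (1 - p_c) + p_t * (1 - p_t)) / n_per_arm)
    return STD_NORMAL.cdf(abs(p_c - p_t) / se - STD_NORMAL.inv_cdf(1 - alpha / 2))


# A score of 69 and a score of 70 fall on opposite sides of the line...
print(adverse(False, 69), adverse(False, 70))  # True False
# ...and one death counts exactly as much as one score of 69.
arm_a = [(True, None), (False, 90), (False, 95)]
arm_b = [(False, 69), (False, 90), (False, 95)]
print(event_rate(arm_a) == event_rate(arm_b))  # True

# Assumed scenario: scores ~ Normal(mean, 15), 5-point mean difference, 100 per arm.
print(power_means(100, 105, 15, 100), power_dichotomised(100, 105, 15, 100))
```

In this assumed scenario the comparison of means has roughly 65% power, whereas the comparison of proportions below 70 has roughly 11%, because few children in either arm fall below the cut-off.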
Furthermore, an intervention capable of producing a clinically significant difference in the mean neurodevelopmental outcome of the population may be missed completely. A randomised trial (MOMS) comparing prenatal with postnatal repair of myelomeningocele 9 used the BSID-II Psychomotor Development Index (PDI) score as a secondary outcome. There was a significant difference in the mean PDI score between the two groups (p=0.03). However, when the proportions of infants with a PDI score ≥50 were compared, there was no significant difference between the two groups (p=0.15). This was true even when a higher cut-off of 85 was used (p=0.06). Dichotomising a continuous outcome measure using a cut-off may lead to a loss of power, 31 so a result that is significant on a continuous outcome may no longer be significant once the outcome is dichotomised.

| Domain | Moderate impairment | Severe impairment |
|---|---|---|
| Hearing | Hearing loss corrected or partially corrected with aids | No useful hearing even with aids |
| Vision | Moderately reduced vision but better than severe impairment, or blind in one eye with good vision in the contralateral eye | Blind or can only perceive light |
| Speech and language | Some words or signs but fewer than 5, or unable to comprehend an un-cued command but able to comprehend a cued command | No meaningful words, or unable to comprehend a cued command |
| Cognitive function | Score 2 to 3 SD below the normative mean | Score more than 3 SD below the normative mean |

Ongoing UK trial: neurodevelopmental status at 2 years corrected age (for prematurity), defined by the three main domains of the Bayley-III scales, that is, cognitive score, language composite score and motor composite score.

Despite being consistent with an intention-to-treat analysis using complete data, this approach involves compromises.
Allocating any single value (ie, using single imputation) to participants who have died is technically problematic, since the data may not be missing at random. 32 It involves the compromise of assigning those who have died a score similar, if not identical, to that of severely disabled survivors, and it also raises the possibility that the score imputed for the deceased is higher than the minimum possible neurodevelopmental score for survivors on that scale; trialists need to guard against this when considering imputation. Moreover, imputing a Bayley score implies that we know the 'trade-off' between level of disability and death, and so imposes extraneous value judgements that could vary from individual to individual. 32 The value assigned to participants who have died also affects the precision, and hence the interpretation, of the results. At a more fundamental level, when planning a study, single imputation for participants who have died is likely to artificially inflate the overall SD, thereby reducing the precision of the results. This in turn affects the sample size and the appropriateness of the statistical test used.
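The effect on the SD can be illustrated by assigning one value to every death and recomputing both the SD and the normal-approximation sample size needed to detect a 5-point difference in means. The survivor scores, the imputed value of 40 and the number of deaths are all hypothetical, chosen only to show the direction of the effect.

```python
from math import ceil
from statistics import NormalDist, mean, stdev

# Hypothetical survivor scores, invented for illustration (not trial data).
SURVIVOR_SCORES = [78, 85, 88, 92, 95, 98, 100, 102, 105, 108, 110, 115, 122]
IMPUTED_DEATH_SCORE = 40  # hypothetical single value assigned to every death
N_DEATHS = 3


def n_per_arm_means(sd, mean_diff, alpha=0.05, power=0.9):
    """Normal-approximation sample size per arm for a difference in means."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / mean_diff) ** 2)


# Single imputation: every death receives the same score.
with_deaths = SURVIVOR_SCORES + [IMPUTED_DEATH_SCORE] * N_DEATHS

sd_survivors = stdev(SURVIVOR_SCORES)
sd_imputed = stdev(with_deaths)

print(f"survivors only: mean {mean(SURVIVOR_SCORES):.1f}, SD {sd_survivors:.1f}")
print(f"with imputed deaths: mean {mean(with_deaths):.1f}, SD {sd_imputed:.1f}")
print("n per arm for a 5-point difference:",
      n_per_arm_means(sd_survivors, 5), "vs", n_per_arm_means(sd_imputed, 5))
```

Because the required sample size scales with the square of the SD, even a few imputed low values can multiply it several-fold, and the resulting bimodal distribution also strains the normality assumption behind a t-test.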

Focusing solely on neurodevelopmental impairment
An alternative to incorporating death into a composite primary outcome is to consider the developmental test score among survivors only. In this case, the study findings would reflect the impact of the intervention solely on neurodevelopmental outcome, and not on death, and the difference in test scores is easy to interpret. Death may be reported as a secondary outcome, although the study would not normally be powered to show a difference in survival. However, such an analysis is a non-randomised comparison and is therefore at increased risk of bias, the extent of which depends on the magnitude of the death rate and on whether it differs between the groups being compared.
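A small simulation shows how conditioning on survival can create a difference where none exists. It assumes a hypothetical latent 'frailty' that both lowers the developmental score and raises the risk of death, and an intervention that only reduces mortality (modelled as a higher death threshold) while leaving every infant's score unchanged; all parameter values are invented.

```python
import random
from statistics import mean


def survivors_mean(death_threshold, n=20000, seed=1):
    """Mean score among survivors in one hypothetical trial arm.

    Each infant has a latent frailty f ~ N(0, 1). Frailer infants score lower
    (score = 100 - 10 * f + noise) and die if f exceeds death_threshold.
    The intervention is assumed to change ONLY the death threshold.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        f = rng.gauss(0, 1)
        if f > death_threshold:
            continue  # died: no developmental score recorded
        scores.append(100 - 10 * f + rng.gauss(0, 8))
    return mean(scores)


control = survivors_mean(death_threshold=1.0, seed=1)  # about 16% die
treated = survivors_mean(death_threshold=1.5, seed=2)  # about 7% die
print(f"control survivors: {control:.1f}, treated survivors: {treated:.1f}")
```

Because the treated arm keeps more of its frailest infants alive, its survivors score lower on average, so a survivors-only comparison would wrongly suggest neurodevelopmental harm from an intervention that, by construction, has none.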
A study 7 that examined the effect of mild hypothermia for neuroprotection in infants requiring extracorporeal membrane oxygenation used this approach. The investigators argued that mild hypothermia was unlikely to influence mortality, and they therefore focused on the outcome for which improvement was biologically plausible. 7 The primary outcome, the Bayley-III cognitive composite score, was analysed as a continuous variable.
However, studies that focus solely on neurodevelopmental impairment may miss an important effect on survival, as illustrated by a trial of oxygen saturation targets in preterm infants (BOOST-II UK). 22 The investigators compared targeting an oxygen saturation of 85%-89% with a range of 91%-95% in infants born before 28 weeks' gestation. The primary outcome was a composite of death or serious neurosensory disability at 2 years corrected age. The trial was stopped early because of significantly increased mortality at 36 weeks postmenstrual age in the group treated with the lower oxygen saturation target. 22 Because deaths in neonatal trials occur mainly in the first few weeks of life, safety monitoring in such trials, typically by an independent Data Monitoring Committee, requires composite primary outcomes to be uncoupled for this purpose, given that the 'whole' primary outcome is not available until much later.

Other approaches
The ongoing OPPTIMUM 6 trial is examining whether prophylactic vaginal progesterone to prevent preterm birth has long-term neonatal or infant benefit. For the analysis of the childhood primary outcome (Bayley-III cognitive composite scale at 2 years of age, a continuous measure), the investigators plan to incorporate deaths in a two-stage statistical model. 6 Their rationale for including deaths in the analysis is twofold: the number of deaths may not be negligible, and the deaths may not be balanced across the two groups. In the two-stage model, deaths will be modelled using a binomial test and survivors using a generalised linear model; the two parts will then be combined to form the appropriate test statistic.
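The text does not say how the two parts will be combined. As an assumption, the sketch below uses one simple two-part test: a pooled two-proportion z statistic for death, a z statistic for the difference in mean score among survivors (standing in for an unadjusted Gaussian generalised linear model with a single treatment term), and the sum of their squares referred to a chi-square distribution with 2 degrees of freedom. It is not necessarily the OPPTIMUM method, and all data and function names are hypothetical.

```python
from math import exp, sqrt
from statistics import mean, variance
from typing import Sequence


def z_deaths(deaths_a: int, n_a: int, deaths_b: int, n_b: int) -> float:
    """Binary part: pooled two-proportion z statistic for death (A minus B)."""
    p_pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (deaths_a / n_a - deaths_b / n_b) / se


def z_survivor_scores(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Continuous part: z statistic for the difference in mean score among survivors."""
    se = sqrt(variance(scores_a) / len(scores_a) + variance(scores_b) / len(scores_b))
    return (mean(scores_a) - mean(scores_b)) / se


def two_part_test(deaths_a, n_a, scores_a, deaths_b, n_b, scores_b):
    """Combine both parts as z1**2 + z2**2, referred to chi-square with 2 df."""
    if len(scores_a) != n_a - deaths_a or len(scores_b) != n_b - deaths_b:
        raise ValueError("each survivor needs exactly one score")
    z1 = z_deaths(deaths_a, n_a, deaths_b, n_b)
    z2 = z_survivor_scores(scores_a, scores_b)
    chi2 = z1 ** 2 + z2 ** 2
    p_value = exp(-chi2 / 2)  # upper tail of chi-square with 2 df
    return z1, z2, chi2, p_value


# Hypothetical data: 100 randomised per arm, 10 vs 15 deaths.
scores_a = [100 + (i % 11) - 5 for i in range(90)]
scores_b = [99 + (i % 11) - 5 for i in range(85)]
z1, z2, chi2, p = two_part_test(10, 100, scores_a, 15, 100, scores_b)
print(f"z(deaths)={z1:.2f}, z(scores)={z2:.2f}, chi2={chi2:.2f}, p={p:.4f}")
```

Squaring the two z statistics discards their direction, so an intervention that increased deaths but raised survivors' scores could still yield a small p value; reporting both component statistics alongside the combined test keeps this visible.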
A different approach was used in the MOMS 9 trial, in which the second primary outcome, at 30 months, was a composite of the BSID-II Mental Development Index and the child's motor function. Each of the two components was ranked across all infants, with fetal, neonatal and infant deaths assigned the lowest rank. The composite score for each infant was the sum of the two ranks, and this was compared between the two groups.
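The ranking scheme can be sketched as follows. The sketch assumes that higher values are better on both components and gives all deaths the same (average) lowest rank; because the text does not name the test used to compare the groups, it assumes a Wilcoxon rank-sum comparison with a normal approximation and no tie correction. The function names and the eight-infant data set are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Optional, Sequence


def _sort_key(value: Optional[float]):
    # Deaths (None) sort below every observed score, so they share the lowest rank.
    return (0, 0.0) if value is None else (1, float(value))


def midranks(values: Sequence[Optional[float]]) -> List[float]:
    """Ascending ranks (1 = worst); tied values, including all deaths, get their average rank."""
    order = sorted(range(len(values)), key=lambda i: _sort_key(values[i]))
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and _sort_key(values[order[end + 1]]) == _sort_key(values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_composite(mdi, motor):
    """Each infant's composite: rank on the first component plus rank on the second."""
    return [a + b for a, b in zip(midranks(mdi), midranks(motor))]


def rank_sum_z(composite, in_group_a):
    """Wilcoxon rank-sum z statistic for group A (normal approximation, no tie correction)."""
    ranks = midranks(composite)
    n_a = sum(in_group_a)
    n_b = len(composite) - n_a
    w = sum(r for r, a in zip(ranks, in_group_a) if a)
    expected = n_a * (n_a + n_b + 1) / 2
    sd = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - expected) / sd


# Hypothetical data; None marks a death.
mdi = [None, 90, 100, 80, None, 70, 85, 95]
motor = [None, 3, 4, 2, None, 1, 2, 3]
in_group_a = [True] * 4 + [False] * 4
z = rank_sum_z(rank_composite(mdi, motor), in_group_a)
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

The output is a test statistic and p value only; there is no effect estimate on the original score scale to report or adjust, which is the limitation noted below.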
For both of these approaches, the final results are potentially more difficult to interpret. The advantage of the former is that it avoids the need to make value judgements, and it may well become the methodology of choice; however, it remains to be seen whether such an approach affects the interpretation and impact of the results. The latter does take deaths into account, but its major disadvantage is that only p values can be calculated and adjusted analyses are not possible.

The views of families
The opinions of the ultimate beneficiaries of treatment (patients and families) may well be highly useful in identifying the appropriate 'trade-off' between neurodevelopmental impairment and death that all of the above approaches require. However, it would not be possible to extrapolate the views of families involved in one trial to those in another, since the risk of death or disability will vary between trials. Hence, parental views of what 'trade-off' is acceptable must relate directly to a particular intervention in a specific clinical scenario.

CONCLUSIONS
The recent change of focus within day-to-day neonatal care, with its increasing attention on reducing neurodevelopmental impairment, has been mirrored in the outcomes used in many perinatal trials. Clinical trials have increasingly incorporated neurodevelopment into their primary outcome, and this has raised the question of how to deal with death in these studies. A range of possible solutions has emerged, each of which involves pragmatic statistical and clinical compromises, and there does not seem to be a correct approach.
The resources required to run large multicentre trials and the finite population of high-risk neonates limit the potential size and feasibility of neonatal trials. Catastrophic events (ie, the typical negative outcomes of interest) are thankfully uncommon, which drives up the sample size unless we compromise and create meaningful composites. Further work is needed to clarify how, and to what extent, each of the designs and analyses used to date can affect the findings and impact of a trial. Where value judgements are needed, the views of patient groups should be considered; these may enable trialists to make better judgements about which approach to choose for their particular trial.
Contributors SAP, DJF and EJ were involved in conception of this review article. SAP wrote the initial draft manuscript. SP and SJ designed the tables. SAP, SJ, DJF and EJ were all involved in the design and contributed to subsequent drafts. All authors revised the final manuscript.

Competing interests None.
Provenance and peer review Commissioned; externally peer reviewed.
To cite Parekh SA, Field DJ, Johnson S, Juszczak E.