Derivation and assessment of a sex-specific fetal growth standard

Abstract
Purpose: To derive a prescriptive sex-specific fetal growth standard and assess clinical management and outcomes according to sex-specific growth status.
Materials and methods: This was a secondary analysis of the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b), a prospective observational study of 10,038 nulliparas from eight U.S. centers who underwent ultrasounds at 14–20 and 22–29 weeks with outcomes ascertained after delivery. From these, we selected a nested cohort of lower risk participants (excluded those with chronic hypertension, pre-gestational diabetes, suspected aneuploidy, and preterm delivery) to derive a sex-specific equation for expected fetal growth using fetal weights by ultrasound and at birth. We compared the male-female discrepancy in the rate of weight <10th (small for gestational age [SGA]) and >90th (large for gestational age [LGA]) percentiles between the sex-specific and sex-neutral (Hadlock) standards. Using the full unselected cohort, we then assessed outcomes and clinical management according to sex-specific SGA and LGA status.
Results: Overall, 7280 infants in the lower risk nested cohort were used to derive a sex-specific equation with fetal sex included as an equation intercept. The sex-neutral standard diagnosed SGA more often in female newborns (21% vs. 13%, p < .001) and LGA more often in male newborns (5% vs. 3%, p < .001). The sex-specific standard resolved these disparities (SGA: 9% vs. 10%, p = .23; LGA: 13% vs. 13%, p = .58). To approximate an unselected population, 1059 participants initially excluded for risk factors for abnormal growth were then included for our secondary objective (N = 8339). In this unselected cohort, 39% (95% CI 37.0–42.0%) of the 1498 newborns classified as SGA by the sex-neutral standard were reclassified as appropriate for gestational age (AGA) by the sex-specific standard. These reclassified newborns were more likely to be delivered for growth restriction despite having lower risk of morbidity (females) or comparable risk of morbidity (males) compared to newborns considered AGA by both methods. Of the 6485 newborns considered AGA by the sex-neutral standard, 737 (11.4%, 95% CI 10.6–12.2%) were reclassified as LGA by the sex-specific standard. These reclassified newborns had higher rates of cesarean for arrest of descent, cesarean for arrest of dilation, and shoulder dystocia than newborns considered AGA by both methods. None were reclassified from LGA to AGA by the sex-specific standard.
Conclusion: The Hadlock sex-neutral standard generates sex disparities in SGA and LGA at birth. Our sex-specific standard resolves these disparities and has the potential to improve accuracy of growth pathology risk stratification.


Introduction
Ultrasonographic assessment of fetal growth in the antenatal period is commonplace in the United States [1][2][3]. Evaluation typically consists of calculation of estimated fetal weight (EFW) and comparison against a population average to generate a percentile value, with percentiles <10th considered as fetal growth restriction (FGR) or small for gestational age (SGA), and >90th as large for gestational age (LGA) [2][3][4]. Accuracy of prediction of morbidity from abnormal fetal growth in either direction remains poor, probably due to the assumption inherent in our current approach that all fetuses share a similar growth potential [5,6].
Male newborns have long been recognized to be larger than female newborns of the same gestational age, such that neonatal growth charts in the United States are sex-specific [7][8][9]. Despite this, intrauterine growth charts remain sex-neutral [10][11][12][13]. This is true even for growth charts that were developed in the era when fetal sex was routinely visible on prenatal ultrasound [10,12]. A prior analysis found that the Hadlock standard, a widely used, sex-neutral fetal growth standard published in 1991, was twice as likely to consider female fetuses as being <10th percentile compared with male fetuses, even though female fetuses had significantly lower morbidity than male fetuses [14,15]. Population fetal growth standards that do not account for fetal sex, such as the Hadlock standard, may generate disparities in diagnoses of abnormal growth between fetal sexes that may not be justified by morbidity. Given the knowledge of sex differences in fetal growth and the routine prenatal assessment of fetal genitalia [1], the lack of investigation into sex-specific intrauterine growth standards represents an important gap in both research and clinical practice. Therefore, our objectives were (1) to derive a sex-specific fetal growth standard and (2) to compare metrics of clinical outcomes and management according to growth status using sex-neutral versus sex-specific growth standards in an unselected cohort.

Methods
We conducted a secondary analysis of the prospective observational Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b), as previously described [16]. Briefly, 10,038 pregnant participants were recruited from 8 U.S. clinical centers between 2010 and 2013. Recruitment was of singleton pregnancies with a first trimester ultrasound and no prior deliveries beyond 20 weeks' gestation. Study visits occurred at 6 0/7–13 6/7 weeks (visit 1), 16 0/7–21 6/7 weeks (visit 2), and 22 0/7–29 6/7 weeks (visit 3) and included collection of clinical parameters, ultrasound, and detailed questionnaires. Outcomes were ascertained by medical record abstraction after birth. Ultrasounds took place at visits 2 and 3, and EFWs were calculated using biparietal diameter, head circumference, abdominal circumference, and femur length [17]. The nuMoM2b study was approved by ethical review committees at participating institutions and was registered at clinicaltrials.gov (NCT01322529).
Our analysis included nuMoM2b participants who delivered at or beyond 24 weeks' gestation with available delivery information. We excluded participants who underwent pregnancy termination, who had missing key information or implausible fetal weight measurements, or whose newborns were not assigned a sex at birth or were born after 41 weeks, when birth weight percentile could not be calculated. From these, we selected a nested cohort of lower risk participants to carry out our primary objective to derive a prescriptive, sex-specific fetal growth standard. For this nested cohort, we excluded those with pregnancies affected by several complications known to be associated with poor fetal growth, including preterm birth, preexisting hypertension, pre-gestational diabetes, suspected chromosomal or fetal anomalies, or stillbirth. To carry out our secondary objective to assess the sex-specific standard, we used the full eligible cohort, including participants initially excluded for risk factors for abnormal growth, so as to approximate an unselected population that would be more generalizable to clinical practice. The analysis for the second objective therefore excluded only those missing key variables or whose pregnancies ended prior to 24 weeks.
Our primary endpoint was birth weight percentile. SGA was defined as birth weight <10th percentile, LGA was defined as birth weight >90th percentile, and appropriate for gestational age (AGA) was defined as birth weight of 10th percentile to 90th percentile. Because the primary endpoints utilize birth weight, all weights <10th percentile are referred to as "SGA" in this analysis and "FGR" is only used when referring to nuMoM2b variables reflecting a prenatal diagnosis.
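The size categories defined above reduce to two threshold comparisons on the birth weight percentile. The following is a minimal sketch of that rule; the function name `classify_size` is illustrative and not part of the study's analysis code.

```python
def classify_size(percentile: float) -> str:
    """Map a birth weight percentile to SGA, AGA, or LGA.

    SGA: below the 10th percentile; LGA: above the 90th percentile;
    AGA: 10th to 90th percentile inclusive, as defined in the text.
    """
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile < 10.0:
        return "SGA"
    if percentile > 90.0:
        return "LGA"
    return "AGA"


if __name__ == "__main__":
    for pct in (4.2, 10.0, 55.0, 90.0, 93.7):
        print(pct, classify_size(pct))
```

Note that the boundaries themselves (exactly the 10th or 90th percentile) fall in the AGA category under this definition.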
The Hadlock formula was used as the referent sex-neutral standard to calculate weight-for-age percentiles, which were applied to both EFWs and birth weights in order to maintain continuity between weights measured before and after delivery. Hereafter, the Hadlock standard is referred to as the "sex-neutral" standard. Birth weight percentiles were also calculated using a validated sex-specific birth weight standard (Olsen) as a reference for the expected balance of SGA and LGA between fetal sexes [8].
To complete our primary objective, we used the lower risk nested cohort to regress EFWs and birth weights on fetal sex and weeks' gestation using a longitudinal mixed-effects regression model to estimate an equation representing fetal growth (detailed statistical explanation available in the Online Appendix).
Once the sex-specific, longitudinal equation was finalized, percentiles were calculated for birth weights using the new sex-specific standard, the sex-neutral standard, and the Olsen birth weight standard, and the proportions of SGA and LGA newborns by fetal sex were compared under each. Rather than assume that male and female fetuses should automatically have the same proportions of SGA or LGA, we used the Olsen standard as a reference for the expected rates of SGA and LGA for each sex [8].
Our secondary objective was to compare metrics of clinical outcomes and management according to growth status using sex-neutral versus sex-specific growth standards in an unselected cohort. To do this, we used the full eligible cohort. We then compared interventions and perinatal outcomes between two size classifications: newborns designated as AGA by both sex-neutral and sex-specific standards and newborns whose growth status was reclassified by the sex-specific standard. This comparison was performed separately for male and female fetuses as well as for SGA and LGA.
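The comparison groups described above can be thought of as a cross-tabulation of each newborn's category under the two standards. The sketch below is one way to label and count those groups; the function names and the example pairs are hypothetical and are not study data.

```python
from collections import Counter

VALID_CATEGORIES = {"SGA", "AGA", "LGA"}


def reclassification_group(neutral: str, specific: str) -> str:
    """Label a newborn by its category under each standard.

    `neutral` is the category by the sex-neutral standard and
    `specific` the category by the sex-specific standard.
    """
    if neutral not in VALID_CATEGORIES or specific not in VALID_CATEGORIES:
        raise ValueError("categories must be one of SGA, AGA, LGA")
    if neutral == specific:
        return f"{neutral} by both"
    return f"{neutral} to {specific}"


def tabulate_groups(pairs):
    """Count newborns in each (sex-neutral, sex-specific) group."""
    return Counter(reclassification_group(n, s) for n, s in pairs)


if __name__ == "__main__":
    # Hypothetical example pairs, not study data.
    example = [("SGA", "AGA"), ("AGA", "AGA"), ("AGA", "LGA"), ("AGA", "AGA")]
    for group, count in sorted(tabulate_groups(example).items()):
        print(group, count)
```

In the analysis described here, the "AGA by both" group serves as the referent against which each reclassified group is compared, separately by sex.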
Two types of clinical measures were used to assess the sex-specific growth standard. First, we assessed a composite of perinatal morbidity, which was defined as the presence of any of the following individual outcomes: stillbirth occurring at ≥24 weeks, need for mechanical ventilation, neonatal death before discharge, NICU stay >48 h, confirmed sepsis, respiratory distress syndrome, seizures, necrotizing enterocolitis (NEC), and intraventricular hemorrhage (IVH). Second, we selected measures that were specific to SGA and LGA to assess the clinical relevance of the sex-specific standard. Measures specific to SGA included admission to labor and delivery for FGR, delivery for FGR, clinical suspicion of FGR before delivery, scheduled labor induction or cesarean without labor before 39 weeks' gestational age, and cesarean delivery for non-reassuring fetal status. Clinical outcomes specific to LGA included cesarean delivery for arrest of dilation, cesarean for arrest of descent, shoulder dystocia, and brachial plexus injury. This analysis was not an effort at internal validation, but rather an exploratory assessment of whether reclassification from normal to abnormal or vice versa by the sex-specific standard was inappropriate.
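Because the composite is defined as the presence of any one component, it can be expressed as a logical OR over the individual outcomes. The sketch below illustrates this; the field names are hypothetical and do not correspond to nuMoM2b variable names, and treating a missing field as "absent" is an assumption made only for illustration.

```python
# Hypothetical field names for the nine components listed in the text.
COMPOSITE_COMPONENTS = (
    "stillbirth",
    "mechanical_ventilation",
    "neonatal_death_before_discharge",
    "nicu_stay_over_48h",
    "confirmed_sepsis",
    "respiratory_distress_syndrome",
    "seizures",
    "necrotizing_enterocolitis",
    "intraventricular_hemorrhage",
)


def has_composite_morbidity(record: dict) -> bool:
    """Return True if any component outcome is present in the record.

    Assumption: a component missing from the record counts as absent.
    """
    return any(bool(record.get(name, False)) for name in COMPOSITE_COMPONENTS)


if __name__ == "__main__":
    print(has_composite_morbidity({"seizures": True}))
    print(has_composite_morbidity({"nicu_stay_over_48h": False}))
```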
Comparisons of SGA and LGA overall, and comparisons of clinical outcomes and management by SGA and LGA were tested with a chi-square test.
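For a comparison of a binary outcome between two groups, the chi-square test operates on a 2×2 table of counts. The following is a minimal sketch of the Pearson statistic and its p value for one degree of freedom. The source does not state whether a continuity correction was applied, so none is used here, and the counts in the example are hypothetical rather than study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows are the two groups; columns are outcome present / absent.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be nonzero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: outcome present/absent in two groups of 30.
    stat, p = chi_square_2x2(10, 20, 20, 10)
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```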
Statistical analysis was performed using SAS software, Version 9.4 of the SAS System for Windows. Copyright © 2006 SAS Institute Inc., Cary, NC, USA.
Graphics were created using GraphPad Prism version 9.1.2 for Windows, GraphPad Software, La Jolla, CA, USA.

Results
Overall, 8339 pregnancies were eligible for analysis (Figure 1). The most common reasons for exclusion were delivery beyond 41 weeks (n = 897) and loss to follow-up at delivery (n = 353). An additional 1059 participants were excluded from derivation of the fetal growth equation for complications known to be associated with abnormal growth (preexisting hypertension or diabetes, preterm birth, chromosomal abnormalities, and stillbirth, Figure 1). Therefore, 7280 participants were included in the lower risk nested cohort for the primary analysis to derive a sex-specific fetal growth standard, with 80% of participants contributing three measurements, 19% contributing two measurements, and 1% contributing one measurement. The distribution of weight assessments across gestation is illustrated in Figure 2(A). Because study ultrasounds were occasionally performed later than planned, EFW measurements extended to 34 weeks, with nearly all being completed before 32 weeks (Figure 2(A)). Characteristics of the study population are described in Table 1.
When sex was accounted for using a sex-specific intercept, it was statistically significant (p < .001). From this point forward in the analysis, we used the sex-specific equation given in Figure 2(C). A detailed description of the derivation of the final fetal growth equation is available in the Online Appendix ("Growth equation derivation results" and Table S1). The sex-neutral and nuMoM2b sex-specific curves are plotted in Figure 2(B) with the accompanying formulas in Figure 2(C). Across all gestational ages, the standard deviation of fetal weights was ±11.38% of the expected weight, such that the formula for fetal weight z score could be expressed as $z = (\text{weight} - \text{expected weight}) / (\text{expected weight} \times 0.1138)$.
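To illustrate how the z score formula above would be applied, the sketch below computes z from an observed and an expected weight and converts it to a percentile. The expected weight is taken as an input because the sex-specific growth equation itself (Figure 2(C)) is not reproduced in this text. Converting z to a percentile with the standard normal distribution is an assumption made here for illustration, and the weights in the example are hypothetical.

```python
import math

# Standard deviation as a fraction of expected weight, from the text.
SD_FRACTION = 0.1138


def weight_z_score(weight_g: float, expected_weight_g: float) -> float:
    """z = (weight - expected weight) / (expected weight * 0.1138)."""
    if expected_weight_g <= 0:
        raise ValueError("expected weight must be positive")
    return (weight_g - expected_weight_g) / (expected_weight_g * SD_FRACTION)


def z_to_percentile(z: float) -> float:
    """Percentile under a standard normal distribution (assumed here)."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical values: observed 3000 g against an expected 3400 g.
    z = weight_z_score(3000.0, 3400.0)
    print(f"z = {z:.2f}, percentile = {z_to_percentile(z):.1f}")
```

Under the normality assumption, the 10th percentile corresponds to z of about -1.28, that is, a weight roughly 14.6% below the expected weight (1.2816 × 0.1138 ≈ 0.146).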
When we applied the sex-neutral and sex-specific standards to the cohort, we found that the sex-neutral standard labeled significantly more female newborns as SGA than male (21% vs. 13%, p < .001), whereas the sex-specific standard did not (9% vs. 10%, p = .23). The sex-neutral standard labeled significantly more male newborns as LGA than female (5% vs. 3%, p < .001), while the sex-specific standard did not (13% vs. 13%, p = .58). Rates of SGA by the sex-specific standard were the same as when using a national sex-specific birth weight standard while rates of LGA were higher (Supplemental Figure S2).
For our secondary objective to assess intervention measures and outcomes, we included the whole eligible cohort (N = 8339) in order to better represent an unselected population (Figure 1). The distribution of weights across gestation in this full unselected cohort is illustrated in Supplemental Figure S1. Of the 1498 newborns classified as SGA by the sex-neutral standard, 591 (39.5%, 95% CI 37.0-42.0%) were reclassified as AGA by the sex-specific standard. Conversely, of the 5753 considered AGA by the sex-neutral standard, only 5 (0.09%, 95% CI 0.03-0.2%) were reclassified as SGA by the sex-specific standard.
When analyzed by sex, female newborns reclassified from SGA to AGA by the sex-specific standard had lower rates of the composite perinatal morbidity and similar rates of cesarean delivery for "non-reassuring fetal status" as the group considered AGA by both standards, despite being more likely to receive a prenatal diagnosis of FGR, to be admitted for labor and delivery for FGR, and to be delivered for FGR (p < .001 for all three comparisons, Table 2). Male newborns reclassified from SGA to AGA by the sex-specific standard had comparable rates of the composite perinatal morbidity to the group considered AGA by both standards. However, they were more likely than male newborns considered AGA by both standards to be diagnosed with FGR before birth, to be admitted to labor and delivery for FGR, to be delivered for FGR, and to undergo cesarean for "non-reassuring fetal status" (p < .05 for all four comparisons). Neither female nor male fetuses who were reclassified from SGA to AGA by the sex-specific standard experienced higher rates of scheduled delivery before 39 weeks compared to those considered AGA by both standards (Table 2). Due to the small number reclassified from AGA to SGA by the sex-specific standard (n = 5, all male), we did not perform a pairwise comparison of this group against the group that was AGA by both standards. However, it is noteworthy that all five newborns reclassified as SGA by the sex-specific standard that would be considered AGA by the sex-neutral standard experienced the composite morbidity outcome (Online Appendix, Table S2). Comparisons across all possible growth classifications are available in the Online Appendix (Tables S2, S3).
We also assessed outcomes according to LGA classification by sex-neutral and sex-specific standards using the whole eligible cohort (N = 8339). Of the 6485 newborns classified as AGA by the sex-neutral standard, 737 (11.4%, 95% CI 10.6-12.2%) were reclassified as LGA by the sex-specific standard. Conversely, of the 351 considered LGA by the sex-neutral standard, none were reclassified as AGA by the sex-specific standard.
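The reclassification proportions reported in this section are binomial proportions with 95% confidence intervals. The source does not state which interval method was used. As an illustration only, the sketch below uses the Wilson score interval, which is an assumption; other methods (for example, an exact interval) can give slightly different limits, particularly for very small counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion.

    Returns (lower, upper) as proportions; z defaults to the
    two-sided 95% normal quantile.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Counts taken from the text: SGA to AGA, and AGA to LGA.
    for k, n in ((591, 1498), (737, 6485)):
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n}: {100 * k / n:.1f}% ({100 * lo:.1f}-{100 * hi:.1f}%)")
```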
Both male and female newborns reclassified from AGA to LGA by the sex-specific standard had comparable rates of perinatal morbidity to the group considered AGA by both standards. However, their births were more likely to be complicated by cesarean delivery for arrest of descent, cesarean delivery for arrest of dilation, and shoulder dystocia. Female newborns reclassified as LGA also had a higher rate of brachial plexus injury, whereas male newborns did not (Table 2). Among the six instances of brachial plexus injury, four occurred in association with shoulder dystocia.

Main findings
Using data from a large, prospectively enrolled cohort of nulliparous participants, we derived a sex-specific fetal growth standard that resolves the sex disparity in SGA and LGA created by the sex-neutral standard. The sex-specific standard identified infants labeled SGA by the Hadlock sex-neutral standard who were not at increased risk of morbidity but who did experience more interventions, suggesting that it may be reasonable to safely consider them as normally-grown. It also identified a group of otherwise unrecognized LGA infants who were at increased risk of cesarean delivery for arrest of dilation, cesarean delivery for arrest of descent, and shoulder dystocia, suggesting that the sex-neutral standard under-recognizes LGA.
Figure 2. Distribution of growth assessments across gestation in the lower risk nested cohort used to derive the sex-specific growth curves. Panel A: Scatter plot of EFWs and birth weights used to derive the new sex-specific standard. Colors denote the study visit at which the weight was measured. Panel B: Fitted nuMoM2b male and female fetal growth curves alongside the Hadlock sex-neutral growth curve. For ease of interpretation, the x axis (gestational age) in panels A and B are aligned, and the y axis (weight) scales are equivalent. Panel C: Sex-specific formulas for expected fetal weight at a given gestational age, where GA is in weeks, decimal format. EFW: estimated fetal weight; GA: gestational age.
Overall, the sex-specific standard labeled fewer newborns as SGA and more newborns as LGA than the sex-neutral standard. This is because the sex-neutral standard predicts larger fetal/newborn size at the end of pregnancy than the sex-specific nuMoM2b standard (Figure 2). The reason that the Hadlock standard predicts larger size at birth than term birth weights from our cohort likely relates to differences in the study samples since term birth weight-derived curves are generally similar to ultrasound-derived fetal growth curves at term [12,18,19]. However, our new standard is consistent with other newer standards, which also predict smaller size than Hadlock [20,21].
Our finding that male fetuses are larger than female fetuses at any gestational age is consistent with reported literature [9,13]. It is not surprising that female fetuses in our cohort were significantly more likely to be considered SGA and less likely LGA than male fetuses by the sex-neutral fetal growth standard. Given the commonplace use of sex-specific neonatal growth charts, it is noteworthy that sex-neutral intrauterine growth standards still predominate [10,12,19]. A notable exception is the growth standard published by the World Health Organization (WHO), which also demonstrated significant differences between sexes but does not have a readily usable percentile formula for clinical use [13].

Strengths and limitations
Our study has multiple strengths. Whereas Hadlock used a single fetal weight estimate per participant to construct the growth curve [11], our sex-specific standard is based on longitudinal assessments, with the first EFWs obtained starting at 16 weeks, which is earlier than the Hadlock standard. Our inclusion of only term births in the derivation of the sex-specific equation removed bias that would be introduced by the association of preterm birth with poor growth. Because of this, our sex-specific standard is more representative of expected fetal growth in ongoing pregnancies. A final strength is the assessment of differences in clinical management and outcomes for newborns who were classified differently by the sex-specific standard than by the sex-neutral standard, which provided empiric substantiation of the clinical relevance of the differences between sex-neutral and sex-specific curves. Our use of a nested cohort to derive and then an expanded cohort to assess the sex-specific standard is valid because this study is different from a traditional derivation-validation approach. In such an approach, separate cohorts are needed because the primary outcome is used to derive the model, making it invalid to test the model's prediction of the same outcome in the same cohort. In our case, this would be analogous to deriving a fetal growth equation based on its prediction of morbidity and then testing its prediction of morbidity. However, our approach was to derive an equation for fetal growth based on how well it represents available fetal measurements and then assess how designations based on this fetal growth equation are associated with clinical outcomes in the parent cohort. Even so, our analyses of clinical outcomes and management should be interpreted as exploratory and hypothesis-generating rather than as validating.
Table footnote: Maternal age and BMI were ascertained at the initial study visit in the first trimester. Data summarized as N (%) or mean ± SD as indicated. GA: gestational age; HTN: hypertension; NICU: neonatal intensive care unit; RDS: respiratory distress syndrome; NEC: necrotizing enterocolitis; IVH: intraventricular hemorrhage; BMI: body mass index; SD: standard deviation. (a) Transfer to higher level of care when delivery occurred at a facility with level II NICU or lower.
The primary limitation of our study is that ultrasound EFWs were not collected uniformly across gestation but were instead concentrated around nuMoM2b study visits; EFWs collected throughout pregnancy may better represent expected fetal growth. Additionally, sex was ascertained at birth, so our sex-specific curve needs to be validated using a cohort with prenatally identified fetal sex. Furthermore, we cannot rule out that clinical management based on prenatal suspicion of FGR may have introduced bias by lowering clinicians' thresholds for cesarean delivery. This could explain why the group of male newborns considered SGA by only the sex-neutral standard had higher cesarean rates for fetal compromise but did not experience concrete morbidity more often than the AGA group. This is unlikely to explain our findings, however, since it is implausible that growth-restricted fetuses who undergo delivery for FGR would have lower rates of morbidity than the AGA group, which is what we found among female newborns. Finally, information on prenatal suspicion for LGA or macrosomia was not collected in the nuMoM2b study, so we are unable to determine whether this may have also altered clinical decisions related to mode of delivery.

Conclusion
If the new sex-specific standard indeed represents normal fetal growth better than the Hadlock sex-neutral method, our findings suggest that fewer diagnoses of SGA and more diagnoses of LGA may be necessary. Under this assumption, it is female fetuses who would be most adversely affected by continued use of the sex-neutral standard, since they are more likely to be inappropriately labeled as SGA and less likely to be appropriately labeled as LGA. This is concerning because fetuses with estimated size <10th percentile for gestational age in the United States undergo an intensive regimen of surveillance and often undergo delivery before 39 weeks even without evidence of compromise [3]. Simultaneously, failing to recognize LGA may be associated with additional maternal and neonatal morbidity [4]. Thus, our development of a sex-specific fetal growth standard represents a meaningful opportunity to reduce the sex disparity in diagnosis of both SGA and LGA created by the use of a sex-neutral growth standard. Whether or not a sex-specific fetal growth standard is ultimately found to improve prediction of perinatal morbidity and mortality, the fact that it resolves both statistically and clinically significant sex disparities in SGA and LGA is reason enough to consider its use.