The Association between Sex Hormones, Pubertal Milestones and Benzophenone-3 Exposure, Measured by Urinary Biomarker or Questionnaire

ABSTRACT Experimental studies have suggested benzophenone-3 (BP-3), a sunscreen ingredient, may have endocrine-disrupting properties. A cohort of girls were recruited at ages 6–7 years and returned semi-annually for pubertal maturation staging, provided blood for serum hormone analyses [estradiol, estrone, testosterone, dehydroepiandrosterone-sulfate (DHEA-S)], and urine to measure BP-3 concentrations. We found a significant negative linear association between amount of reported sunscreen use and testosterone levels at the onset of puberty (N = 157, adjusted β = −0.0163, 97.5% CI:-0.0300,-0.0026). The 2nd quartile of the BP-3 biomarker had earlier thelarche compared to the 1st quartile (N = 282, adjusted HR = 1.584, 97.5% CI:1.038,2.415). Results suggest that higher report of sunscreen use may be associated with lower testosterone levels at thelarche and a non-linear relationship between the BP-3 urinary biomarker and onset of puberty, although the clinical significance of the finding is limited and may be a random effect. Improved methods of BP-3 exposure characterization are needed.


Introduction
Benzophenone-3 (BP-3), also known as oxybenzone, is the common active agent in most sunscreens. Recent studies have examined BP-3 as a candidate endocrine disrupting chemical, potentially contributing to the decrease in age of puberty among females in the United States. In addition, multiple animal studies suggest weak estrogenicity (Schlumpf et al. 2001; Takatori et al. 2003; Kerdivel et al. 2013; Kim and Choi 2014; Centers for Disease Control and Prevention 2017). BP-3 exposure is widespread and levels are relatively high compared to other well-studied endocrine disruptors such as bisphenol-A (Centers for Disease Control and Prevention 2017). The detection rate of BP-3 in urine in the 2003-2004 National Health and Nutrition Examination Survey (NHANES) cohort (at the time of the study reported here) was 96.8%; thus potential health effects of BP-3 could impact a large percentage of the population (Calafat et al. 2008). An analysis of data from the entire Breast Cancer and the Environment Research Program (BCERP) Puberty Cohort found that higher BP-3 urine concentration was associated with later age of thelarche (breast development) [HR = 0.95 (95% CI: 0.92-0.98, p = 0.001)] (Wolff et al. 2007, 2015). This was surprising because BP-3 has demonstrated estrogenic properties in vivo, and estrogens stimulate breast development (Schlumpf et al. 2001; Kerdivel et al. 2013; Kim and Choi 2014). The authors suggested anti-obesogenic properties of BP-3 as a potential mechanism for this delay (Wolff et al. 2015). Girls with lower BMIs have a greater surge in estradiol around the time of thelarche than girls with greater BMIs (Wolff et al. 2015); thus the effect of BP-3 may be less detectable. In this same cohort, higher BP-3 was associated with later menarche [HR = 0.95 (95% CI: 0.93-0.98, p-value = 0.002)], but associations were attenuated after adjustment for race/ethnicity and education [adjusted HR = 0.99 (0.95-1.02, p-value = 0.4)].
Previous studies have only used pubertal milestones as an outcome but examining sex hormones during this peri-pubertal period is necessary to better understand the relationship between BP-3 exposure and pubertal milestones.
We hypothesized that higher BP-3 exposure, measured by either urinary biomarker or sunscreen questionnaire, will be associated with lower estrone, estradiol, testosterone, and DHEA-S levels at time windows around thelarche (before, during and after). In this single site sub-cohort study which included hormone analyses, we further explored whether this delay could potentially be associated with hormone levels during puberty. We hypothesized that higher BP-3 biomarker would be associated with lower levels of sex hormones during puberty. We chose to state our hypothesis in this direction because of findings in the larger cohort.
The BP-3 urinary biomarker has been widely used in the literature to represent BP-3 exposure (Hayden et al. 2005; Gonzalez et al. 2006; Buttke et al. 2012; Kunisue et al. 2012; Harley et al. 2018). The half-life of BP-3 ranges from about 20 to 135 h in various studies when 4% oxybenzone lotion was applied (Gustavsson Gonzalez et al. 2002; Matta et al. 2019, 2020). Given the half-life of the biomarker, it likely only represents exposure over the past few days (Human Biomonitoring for Environmental Chemicals 2006). Due to the rapid excretion, the amount of BP-3 in urine samples obtained at various time points within 48 h after application changes dramatically and also varies widely from person to person (Gustavsson Gonzalez et al. 2002). Given this limitation, we sought another measurement for exposure to BP-3 that would better represent long-term exposure and be less susceptible to inter-person variability in metabolism. We therefore also tested our hypothesis using self-report of sunscreen use over the past year as a proxy for BP-3 exposure. Report of sunscreen use has not been validated as a proxy for the BP-3 biomarker. However, previous studies have found a positive association between report of sunscreen use and urinary BP-3 biomarker levels (Zamoiski et al. 2015; Ko et al. 2016; Berger et al. 2019).
Our second hypothesis was that higher BP-3 levels, measured by either urinary biomarker or sunscreen questionnaire, will be associated with a later age at pubertal milestones (thelarche, pubarche or menarche). Although the second hypothesis has previously been tested in the multisite cohort (Wolff et al. 2015), site-specific effect estimates on the same set of participants included in the reproductive hormone analysis are necessary in order to fully understand the interplay between BP-3 exposure, sex hormones and pubertal outcomes. The previous BCERP cohort study relied entirely on the BP-3 urinary biomarker to assess exposure to BP-3 (Wolff et al. 2015). Our study allows a comparison of the results of the BP-3 urinary biomarker and the sunscreen questionnaire.

Materials and Methods
This study utilizes the Cincinnati sub-cohort of the Breast Cancer and the Environment Research Program (BCERP) Puberty Cohort, as this is the only site that obtained serum hormone measurements. This study was approved by the University of Cincinnati and Cincinnati Children's Hospital Medical Center Institutional Review Boards, protocol numbers 2008-0170, 2010-1637, 2015-5937.
Recruitment and methods of data collection have been described previously (Biro et al. 2013; Wolff et al. 2015). The original study was a longitudinal cohort that included three sites: San Francisco Bay Area, CA; East Harlem, New York, NY; and Cincinnati, OH.
The Cincinnati site included girls ages 6-7 years at enrollment with no underlying endocrine medical conditions. Most participants (85%) were recruited through local public or parochial schools in the Cincinnati metropolitan area (Figure 1). Girls with a family history of breast cancer were oversampled by recruiting daughters or granddaughters of participants in the Breast Cancer Registry of Greater Cincinnati, who accounted for 15% of participants. At the first visit and every six months for follow-up, participants completed a clinical study exam and provided a blood sample, and their parents or guardians completed a questionnaire. Urine samples were obtained each year.

Clinical Study Exams
Methods for the clinical study exams have been previously described. Briefly, during each exam participants were evaluated for pubertal maturation using a standardized protocol. Breast development was based on Marshall and Tanner staging criteria and included palpation to distinguish between adipose and breast tissue (Marshall and Tanner 1969; Biro et al. 2010). Clinicians used an accessory light source to better examine presence and stage of pubic hair (pubarche) (Marshall and Tanner 1969; Biro et al. 2010). Examiners, who were female clinicians, were trained and certified by a master trainer physician at each site. To ensure inter-rater reliability, staging by raters was compared to that of a master rater physician. Estimated age of thelarche and age of pubarche were calculated previously and these methods have been published (Biro et al. 2013). Examiners also obtained height and weight with at least two measurements of each, used to calculate body mass index (BMI) and BMI z-score (BMI-z) using the Centers for Disease Control 2000 Growth Charts (Centers for Disease Control; Biro et al. 2013). Age of onset of puberty was operationally defined as age of thelarche.

Biospecimens
Early morning, fasting blood samples were collected at each visit, processed to sera, frozen, and stored at −80 °C. Thelarche occurs over months, but for the purpose of statistical analyses, an estimated date of thelarche was previously calculated (Biro et al. 2013). Three time windows were then constructed: T-6 window (−9 months before estimated date of thelarche, up to −3 months before estimated date of thelarche), T0 window (−3 months before estimated date of thelarche, up to +3 months after estimated date of thelarche), and T + 6 window (+3 months after estimated date of thelarche up to +9 months after estimated date of thelarche), as well as other time windows not used in the current analysis (Figure 2). Sera were selected for hormone analyses corresponding to these time windows. Study visits occurred every six months, with a ±4-week window for scheduling. Occasionally, two study visits and thus hormone measurements occurred in the same window (e.g., two visits may have occurred during T0, the first at −3 months and the second at +2 months). These duplicates were removed by choosing the hormone measurement that was collected closest to the center date of the window and removing the other, leaving only one hormone measurement during each window. The laboratory that performed the hormone assays was CLIA certified, and also certified by the CDC for testosterone and estradiol. Methods for hormone assays have been previously published. Estradiol, estrone and testosterone were measured using high performance liquid chromatography (HPLC) with tandem mass spectrometry (MS).
For the first batch, DHEA-S was assayed using radioimmunoassay after enzymolysis. In the other two batches, DHEA-S was assayed using HPLC-MS. The correlation between the two methods was 0.948. All hormone measurements were log-transformed for analyses. The limits of quantification (LOQs) for the hormones were: DHEA-S, 10 µg/dL; estradiol, 1 pg/mL; estrone, 2.5 pg/mL; testosterone, 3 ng/dL. Early morning void urine samples were collected at each visit, frozen and stored at −80 °C. Urine samples from all participants from the baseline visit, and from 98 participants at +1 year after baseline and +3 years after baseline, were sent to the Centers for Disease Control and Prevention (CDC) National Center for Environmental Health for analysis. Urine samples were analyzed for BP-3 as part of a phenol panel using HPLC with MS (Ye et al. 2005; Wolff et al. 2007; Mervish et al. 2014). Although the parent compound BP-3 is metabolized in humans, we only measured levels of BP-3 because the metabolites are not included as part of the CDC phenol panel. All BP-3 measurements were normalized by urine creatinine concentrations for analyses. Creatinine was measured in urine on a Roche Hitachi 912 chemistry analyzer (Roche Hitachi, Basel, Switzerland). The limit of detection for BP-3 was 0.3 ng/mL (Ye et al. 2005; Calafat et al. 2008).
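The window construction and duplicate-removal rule described in this section can be illustrated with a minimal sketch. It is not the study's code: the function names, the average month length, and the treatment of each window as half-open (lower bound included, upper bound excluded) are assumptions made for illustration.

```python
from datetime import date

# Window bounds in months relative to the estimated date of thelarche.
# Assumption: each window includes its lower bound and excludes its upper bound.
WINDOWS = {"T-6": (-9.0, -3.0), "T0": (-3.0, 3.0), "T+6": (3.0, 9.0)}
DAYS_PER_MONTH = 30.4375  # assumption: average month length in days


def months_from_thelarche(visit, thelarche):
    """Signed offset of a visit from the estimated thelarche date, in months."""
    return (visit - thelarche).days / DAYS_PER_MONTH


def assign_window(offset_months):
    """Return the name of the window containing the offset, or None."""
    for name, (lo, hi) in WINDOWS.items():
        if lo <= offset_months < hi:
            return name
    return None


def one_visit_per_window(visits, thelarche):
    """Keep, for each window, the visit closest to the window's center."""
    best = {}
    for visit in visits:
        offset = months_from_thelarche(visit, thelarche)
        name = assign_window(offset)
        if name is None:
            continue
        lo, hi = WINDOWS[name]
        distance = abs(offset - (lo + hi) / 2)
        if name not in best or distance < best[name][0]:
            best[name] = (distance, visit)
    return {name: visit for name, (_, visit) in best.items()}


# Hypothetical participant with two visits falling inside the T0 window.
thelarche_date = date(2008, 6, 1)
visit_dates = [date(2008, 3, 10), date(2008, 8, 1), date(2008, 12, 15)]
selected = one_visit_per_window(visit_dates, thelarche_date)
```

In this hypothetical example, the two visits in the T0 window are reduced to the one nearer the estimated thelarche date, and the December visit is kept for the T + 6 window.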

Quality Assurance and Quality Control
Quality assurance and quality control for the BP-3 assay has been previously published by the CDC National Center for Environmental Health (Ye et al. 2005;Calafat et al. 2008). They measured total (free plus conjugated species) concentrations of BP-3 in urine by online solid-phase extraction coupled to high-performance liquid chromatography-tandem mass spectrometry, described in detail elsewhere (Ye et al. 2005). Pooled human urine samples were used to create low (~ 20 µg/L) and high (~ 45 µg/L) concentration quality control materials, which were analyzed with standard and reagent blank (Ye et al. 2005;Calafat et al. 2008). In addition to CDC quality control methods, each batch also included investigator-supplied duplicates from a pooled sample for quality control (Mervish et al. 2014).

Questionnaires
Parents or guardians, mostly mothers, completed annual comprehensive written questionnaires on topics such as medical history, medications, physical activity, personal products used, and more. Relevant to the current study, the questionnaire included the participant's race, highest parental education, biological mother's age of menarche, and the amount of sunscreen used over the past year. The sunscreen question asked, 'Thinking back over the past 12 months, how often did your daughter usually use sunscreen?' The potential for recall bias regarding sunscreen use was reduced by collecting data each year. A metric was then derived from this questionnaire by calculating the number of days in the past year the participant had used sunscreen. The main goal of using report of sunscreen over the past year was to provide a better representation of long-term exposure to BP-3 than the urinary biomarker provides. Race was dichotomized into black versus all other races for all analyses because there were very few Cincinnati participants who reported other races such as Asian or Hispanic-white. Age of menarche was self-reported from the study participant's and/or mother's answers to questions regarding first menstrual cycle, asked each year, beginning when girls were 11 or 12 years old (Wolff et al. 2017).

Missing Data
If a BP-3 measurement was below the limit of detection (LOD), then a value of LOD/√2 was imputed, a method for imputation when the data are assumed to display a left-censored log normal distribution (Hornung and Reed 1990; Wolff et al. 2010). Likewise, if the hormone value was below the LOD, then a value of LOD/√2 was imputed. If either the BP-3 or the hormone value was missing because the participant did not provide a urine or serum sample, or it was 'Quantity Not Sufficient' (QNS), then the participant was excluded from the analyses that used those data.
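The substitution rule above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and it assumes that a below-LOD result arrives as a number smaller than the LOD and that a missing or QNS sample is represented as None.

```python
import math


def impute_below_lod(value, lod):
    """Return LOD / sqrt(2) for a value below the LOD, else the value itself.

    Assumption: a missing or QNS sample is passed as None and stays None,
    so that the record can be excluded downstream rather than imputed.
    """
    if value is None:
        return None
    if value < lod:
        return lod / math.sqrt(2)
    return value


BP3_LOD_NG_ML = 0.3  # limit of detection for BP-3 reported in the text
imputed = impute_below_lod(0.1, BP3_LOD_NG_ML)    # below the LOD, so replaced
unchanged = impute_below_lod(12.4, BP3_LOD_NG_ML)  # at or above the LOD, so kept
```

With the BP-3 LOD of 0.3 ng/mL, the imputed value is about 0.21 ng/mL.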

Statistical Analyses
We tested our first hypothesis with two approaches, first using the level of BP-3 urinary biomarker to represent BP-3 exposure and second using the sunscreen metric from the questionnaire. For easier interpretability of beta estimates, we divided the sunscreen metric by 12 such that the units of the sunscreen metric would be #days/month rather than #days/year, only for this portion of the analysis.

Analyses of Hormone Levels
We performed quantile regression using Proc QuantReg in SAS 9.4 to calculate the beta estimates and confidence intervals for the effect of BP-3 exposure on each hormone (estradiol, estrone, testosterone, DHEA-S), separately for each time point. A participant was included in analyses to test the association between BP-3 exposure and sex hormones if they had at least one hormone measurement (estradiol, estrone, testosterone or DHEA-S) and either one urinary BP-3 biomarker measurement that occurred before the hormone measurement or one report of sunscreen use that occurred before the hormone measurement. Some participants had up to three BP-3 measurements and completed up to three sunscreen questionnaires during the study period. We used the single most recent BP-3 urinary biomarker measurement, or the sunscreen questionnaire data obtained prior to the day of the hormone measurement. Given that we tested our hypothesis twice, first using a biomarker and second using a questionnaire to represent exposure, we adjusted for multiple comparisons using the Bonferroni post-hoc adjustment (Bland and Altman 1995). Our alpha level was set at 0.025 (0.05/2) and we calculated the 97.5% confidence intervals (CIs) for each estimate. We examined these hypotheses during the three time windows described above (Figure 2). All models were adjusted for the BMI z-score using measurements taken on the same day as the hormone measurement or the most recent previous exam BMI z-score. We did not include race as a covariate because race is so highly associated with sunscreen use.
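The Bonferroni step can be made concrete with a short sketch using only the Python standard library. The function names are hypothetical, and the normal-approximation critical value is shown purely for illustration; the interval method actually used by the SAS procedure may differ.

```python
from statistics import NormalDist


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha after a Bonferroni adjustment."""
    return family_alpha / n_tests


def two_sided_critical_z(alpha):
    """Standard normal critical value for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


alpha = bonferroni_alpha(0.05, 2)      # two exposure measures tested
ci_level = 1 - alpha                   # confidence level matching the adjusted alpha
z_crit = two_sided_critical_z(alpha)   # wider than the usual 1.96
```

Halving alpha from 0.05 to 0.025 raises the two-sided critical value from about 1.96 to about 2.24, which is why the 97.5% intervals reported here are wider than conventional 95% intervals.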
The mechanism of action of BP-3 impacting hormone levels is currently unknown but could be rapidly acting. In order to evaluate for a real time effect of BP-3 exposure on hormone levels, we included a sensitivity analysis in which we only included the BP-3 measurements that occurred on the same day as the hormone measurement, or for questionnaire analysis, the questionnaires completed within 30 days before the hormone measurement.
Recent literature has suggested that for environmental biomarkers such as BP-3 which have relatively short half-lives, repeated measurements over a period of interest may improve the validity of that biomarker. Thus, we completed another sensitivity analysis utilizing the repeated measures of BP-3, in which we averaged two or three repeat BP-3 measurements that occurred before or on the date of the hormone measurement.
Other environmental chemicals have been known to have a non-linear relationship between exposure and health effect. In order to look for a non-linear relationship between the BP-3 urinary biomarker and levels of sex hormones, we created BP-3 quartiles using baseline levels of the entire cohort and baseline sunscreen metric data of all participants that had a BP-3 measurement or completed a questionnaire. We used these quartiles to complete a sensitivity analysis using the quartile of BP-3 level rather than the BP-3 level as a continuous variable. Because the highest BP-3 exposure occurs during the summer, measurements of BP-3 taken during non-summer months could lead to exposure misclassification (CM et al. 2018). Including non-summer measurements of BP-3 in analyses could attenuate effect estimates, so we included a sensitivity analysis in which we only used BP-3 measurements that occurred during the summer months (June, July and August).
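The quartile categorization can be sketched as follows: cut points are computed once from baseline values and then applied to any later measurement. The names and toy values are hypothetical, and the percentile interpolation rule of `statistics.quantiles` is not necessarily the one used in the original SAS analysis.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cut_points(baseline_values):
    """Three cut points (25th, 50th, 75th percentiles) from baseline data."""
    return quantiles(baseline_values, n=4)


def assign_quartile(value, cuts):
    """Quartile 1 to 4 for a value; a value equal to a cut point
    is placed in the lower quartile (an arbitrary choice for this sketch)."""
    return bisect_left(cuts, value) + 1


# Toy baseline values standing in for creatinine-corrected BP-3 levels.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = quartile_cut_points(baseline)
quartiles_assigned = [assign_quartile(v, cuts) for v in baseline]
```

Fixing the cut points at baseline means that a participant's later measurement is classified on the same scale as everyone's first measurement, so quartile membership is comparable across visits.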

Analyses of Pubertal Milestones
To test our second hypothesis, we employed Cox proportional hazards models using Proc PHREG in SAS to estimate the hazard of attaining each pubertal milestone (thelarche, pubarche or menarche) associated with BP-3 exposure level. We examined BP-3 exposure using both the levels of BP-3 urinary biomarker and the amount of reported sunscreen use, and created models using each as a continuous variable and then as a categorical variable (quartiles) to look for non-linear relationships. We used age as the time scale and adjusted models for race, most recent BMI-z measurement before the pubertal milestone, mother's age of menarche, and highest parental educational attainment. If a covariate failed to meet the Cox proportional hazards model assumption of not varying with age, an interaction term with time was included in the model. Using the ASSESS statement in Proc PHREG, we defined cutoff points at which the term appeared to differ with age. We then tested whether the interaction term was significant before and after that cutoff point. Although race was not included as a covariate when testing our first hypothesis, it was included as a covariate here because a previous study found that race was a significant predictor of age of pubertal milestones (Biro et al. 2013). This raised concern that race could capture some of the variation attributable to the BP-3 biomarker, since use of sunscreen differs by race. Thus, a sensitivity analysis was conducted in which race was excluded from the model. We included a separate analysis to examine whether a family history of breast cancer was a significant covariate. Using questionnaire data, we identified those participants who had a first or second degree relative with breast cancer and categorized them, in a dichotomous variable, as having a family history of the disease. We created categories of BMI-z based on CDC definitions of childhood normal weight, overweight and obesity (Barlow and Committee 2007).
We examined whether there was an interaction between quartile of BP-3 exposure (biomarker or sunscreen metric) and BMI z-score category.

Results
The original BCERP Cincinnati cohort had 379 participants, and 353 of these participants had at least one baseline BP-3 measurement. The median value of the BP-3 baseline measurements was 25.0 μg/g-creatinine (N = 353), with detection in 98.9% of samples (Table 1). Some participants (N = 98) then had repeat measurements at +1 year and +3 years after baseline. The median number of days sunscreen was used in the previous year was 48 days (N = 302). The Spearman correlation coefficient between report of sunscreen use over the past year during year 1 of the study and BP-3 measurements was 0.28 (p = 0.0049).

BP-3 exposure assessed by the BP-3 urinary biomarker and levels of sex hormones
Results of the quantile regression did not suggest a significant linear relationship between the concentration of the BP-3 urinary biomarker and levels of any of the four hormones (estradiol, estrone, testosterone or DHEA-S) during the three time windows tested (T-6 before thelarche, T0 during thelarche or T + 6 after thelarche) (Supplementary Table 1). In many of these models, BMI-z was a significant covariate with a negative association with sex hormone levels. In these primary analyses, the number of days between the urine collection for the BP-3 biomarker measurement and the blood collection for the hormone measurement varied for each participant. This was because serum samples were selected for hormone measurement relative to estimated date of thelarche, and each participant progressed through breast development at her own time. The median number of days between the BP-3 biomarker measurement and the hormone measurement for the primary analyses was at T0: 224 days (25.3% had BP-3 and hormones measured on the same day); at T-6: 191 days (28.85% had BP-3 and hormones measured on the same day); at T + 6: 373 days (20.53% had BP-3 and hormones measured on the same day). Our sensitivity analyses only included the BP-3 urinary biomarker measurements that were measured on the same day as the hormones to look for a real time effect of BP-3 exposure on hormone levels. However, this sensitivity analysis did not suggest a significant linear relationship between the level of BP-3 urinary biomarker and the levels of any of the four hormones (Supplementary Table 1). To better represent BP-3 exposure, we used an average of two or three repeated BP-3 urinary biomarker measurements. 
However, we did not find evidence of a significant linear relationship between the average of repeated BP-3 biomarker measurements and the level of any of the hormones measured during the thelarche window (estradiol, estrone, testosterone or DHEA-S) (results were non-significant and are not provided). To look for a non-linear relationship between BP-3 exposure and the levels of sex hormones, we categorized the levels of the BP-3 biomarker in quartiles based on quartile levels for the entire cohort. Results of the sensitivity regression analysis did not suggest a non-linear relationship during any of the time windows (results were non-significant and are not provided). Sensitivity analyses that only used measurements of the BP-3 biomarker collected during summer months (June, July and August), which represent the season of highest BP-3 exposure (CM et al. 2018), did not find a linear relationship between the level of BP-3 urinary biomarker measured during summer months and levels of any of the hormones during any of the time windows (results were non-significant and are not provided).

BP-3 exposure assessed by sunscreen metric and levels of sex hormones
We found a significant negative linear relationship between reported sunscreen use and testosterone levels during the thelarche window (β = −0.0163, 97.5% CI: −0.0300, −0.0026, p = 0.0077) (Table 2). However, we did not find this same relationship between reported sunscreen use and testosterone levels during the other time windows examined (T-6, T + 6) (Supplementary Table 2). We also did not find evidence of a linear relationship between reported sunscreen use and estradiol, estrone or DHEA-S levels during any of the time windows examined (T-6, T0, T + 6) (Supplementary Table 2). The median number of days between completing the questionnaire and the hormone measurement for the primary analyses was: at T0: 186 days (34.19% ≤ 30 days); at T-6: 172 days (38.10% ≤ 30 days); at T + 6: 197 days (29.21% ≤ 30 days). When we only included questionnaires completed within 30 days before hormone measurement, we found a significant negative linear relationship between reported sunscreen use and estrone levels during thelarche (β = −0.0259, 97.5% CI: −0.0470, −0.0047, p = 0.0067, N = 53) (Supplementary Table 2).

BP-3 exposure assessed by the urinary biomarker and pubertal milestones
Cox proportional hazards models did not suggest that higher levels of the BP-3 urinary biomarker were associated with earlier pubertal milestones (age of menarche, age of thelarche or age of pubarche) after adjusting for BMI-z, race, mother's age of menarche and highest parental educational attainment (Supplementary Table 3, Supplementary Table 4). When the level of the BP-3 urinary biomarker was included as a categorical variable, the 2nd quartile of the BP-3 urinary biomarker had earlier thelarche compared to the 1st quartile of the BP-3 biomarker (HR = 1.584, 97.5% CI: 1.038-2.415, p = 0.015) (Table 3). Thelarche was not earlier for the 3rd or the 4th quartile of the BP-3 urinary biomarker. We found marginal significance for a higher risk of attaining pubarche with the 2nd quartile of the BP-3 biomarker as compared to the 1st quartile, but this association did not remain significant after we adjusted for multiple comparisons and lowered the alpha level to 0.025 (HR = 1.446, 97.5% CI: 0.973-2.151, p = 0.037). The interaction between BMI z-score category and BP-3 quartile was not significant in any of the models and was not retained.
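As a reading aid, the reported hazard ratios, 97.5% confidence intervals and p-values can be cross-checked against one another. The sketch below assumes a Wald-type interval that is symmetric on the log-hazard scale, which is a common default but is an assumption here rather than something stated in the text; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_p_from_hr_ci(hr, lower, upper, ci_level):
    """Approximate two-sided p-value implied by a hazard ratio and its CI.

    Assumption: the interval is symmetric on the log scale (Wald-type),
    so its width equals 2 * z_crit * SE of the log hazard ratio.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - ci_level) / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values as reported above for the 2nd versus 1st BP-3 quartile.
p_thelarche = wald_p_from_hr_ci(1.584, 1.038, 2.415, 0.975)
p_pubarche = wald_p_from_hr_ci(1.446, 0.973, 2.151, 0.975)
```

Under this assumption, both back-calculated values land close to the reported p-values (0.015 for thelarche and 0.037 for pubarche), and only the thelarche estimate falls below the adjusted alpha of 0.025.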

BP-3 exposure assessed by sunscreen metric and pubertal milestones
Results did not suggest that higher reported sunscreen use is associated with earlier pubertal milestones (age of menarche, age of thelarche or age of pubarche) after adjusting for BMI-z, race, mother's age of menarche and highest parental educational attainment (Supplementary Table 3, Supplementary Table 4). A family history of breast cancer had a significant positive association in all models with age of pubarche as the dependent variable, but not in any of the models for age of thelarche or age of menarche. However, inclusion of this covariate did not change the significance of the hazard ratios for the association between BP-3 exposure and age of pubarche, and it was not retained in final models.
[Table footnote] *All models are adjusted for BMI z-score. We set α = 0.025 (two-sided) due to multiple comparisons. Results were not significant for models using the BP-3 biomarker; these are displayed in the supplementary tables.
[Table footnote] All models are adjusted for BMI-z, race, maternal age of menarche, and highest parental educational attainment. We set α = 0.025 (two-sided) due to multiple comparisons; thus the 2nd quartile is borderline significant for age of pubarche.

Discussion
In this analysis of the Breast Cancer and the Environment Research Program Puberty Cohort, we examined whether higher BP-3 exposure was associated with lower levels of sex hormones including estradiol, estrone, testosterone and DHEA-S around the time of puberty (operationally defined as time of thelarche or breast development). We assessed BP-3 exposure using both the BP-3 urinary biomarker and a metric from a self-report of sunscreen use. Previous epidemiologic studies of BP-3 used the BP-3 urinary biomarker to assess BP-3 exposure (Wolff et al. 2015; Deierlein et al. 2017). However, because the biomarker has a half-life of about 20-135 hours, it likely represents exposure over a short period of time rather than long-term exposure (Gustavsson Gonzalez et al. 2002; National Research Council 2006; Matta et al. 2019, 2020). We felt that a questionnaire on use of sunscreen, which is the major source of BP-3 exposure, over the past year could better represent long-term exposure (Gustavsson Gonzalez et al. 2002; Gonzalez et al. 2006). The present study found a weak positive association between report of sunscreen use over the past year and the BP-3 urinary biomarker, as has been shown in other studies (Zamoiski et al. 2015; Ko et al. 2016; Berger et al. 2019).

Prevalence of BP-3 Exposure
The detection rate in our study was 98.9%, and the median value of the BP-3 baseline measurements was 25.0 μg/g-creatinine (N = 353). The most recent NHANES data provide a median of 32.8 μg-BP-3/g of creatinine (95% CI: 21.5-50.1; N = 409) for children ages 6-11 during the 2013-2014 study years (Centers for Disease Control and Prevention 2017). Additionally, our study reported much higher maximum values (maximum value 9101.96 μg/g of creatinine) as compared to NHANES (95th percentile for children ages 6-11 was 868 (95% CI: 432-1700)). Samples in the current study and NHANES data were analyzed in the same lab using the same quality control methods; thus it is extremely unlikely that differences were caused by analytical reasons. Exposure levels of BP-3 appear to be higher in the United States than in other nations. We considered how levels of urinary BP-3 compared to those in other studies. In children in India, the detection rate was 93%, with a geometric mean of 0.605 (standard deviation = 2.73) μg-BP-3/g-creatinine (Xue et al. 2015). In Chinese children ages 3-10 years, the geometric mean was 0.622 ng/mL and in Chinese adults 0.977 ng/mL, compared to a geometric mean of 9.97 ng/mL in U.S. children ages 3-10 years and 15.7 ng/mL in U.S. adults for samples collected in the year 2012 (Wang and Kannan 2013). In a study of Belgian adults, BP-3 was detected in 82% of urinary samples from women participants (N = 138), with a geometric mean of 1.4 μg-BP-3/g-creatinine (Dewalque et al. 2014).

BP-3 Exposure and Association with Sex Hormones
Results did not suggest that higher levels of the BP-3 urinary biomarker were associated with lower levels of sex hormones during any of the time windows evaluated (before, during, or after thelarche). An increase in the number of days of reported sunscreen use over the past year was associated with a decrease in testosterone levels during thelarche. However, because of the many comparisons performed during these analyses, we believe that this association is likely a chance finding. In addition, although the decrease in testosterone with increased number of days of sunscreen use was statistically significant, the finding is unlikely to have clinical significance. With a beta coefficient of −0.0163, an increase of 1 day per month of sunscreen use was associated with a 2% decrease in testosterone level (exp(−0.0163) = 0.98). For a testosterone level of 5 ng/dL (approximate mean during thelarche), this would be a change of about 0.08 ng/dL. We also found an inverse association between past 30-day sunscreen use and estrone levels during thelarche, a finding that is not consistent with the previously postulated androgen suppression activity of BP-3 (Ma et al. 2003).
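The back-transformation behind these figures can be reproduced in a few lines. The sketch assumes that the hormone values were transformed with the natural logarithm, which the use of exp() in the text implies; the function name is hypothetical.

```python
import math


def relative_change(beta, delta_exposure=1.0):
    """Proportional change in the outcome for a given change in exposure,
    when the outcome was modeled on the natural-log scale."""
    return math.exp(beta * delta_exposure) - 1.0


beta = -0.0163                    # reported coefficient per day/month of sunscreen use
ratio = math.exp(beta)            # multiplicative change per additional day/month
baseline_testosterone = 5.0       # ng/dL, approximate mean during thelarche
change_ng_dl = baseline_testosterone * relative_change(beta)
```

One additional day per month of reported sunscreen use corresponds to a ratio of about 0.98, or a change of roughly −0.08 ng/dL at a testosterone level of 5 ng/dL.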

BP-3 Exposure and Association with Pubertal Milestones
A previous study found that increased BP-3 exposure measured by BP-3 biomarker was associated with later age of thelarche (Wolff et al. 2015). The current study provided a site level analysis of this finding, since only the Cincinnati site of the BCERP Puberty Cohort conducted hormone analyses. Site-specific effect estimates on the same set of participants included in the reproductive hormone analysis are necessary to fully understand the interplay between BP-3 biomarker data, sex hormones and pubertal outcomes. Interestingly we found the second quartile of BP-3 urinary biomarker level to be associated with earlier age of thelarche compared to the first quartile. Differences in findings with previously published results from analyses of the entire cohort may be due to geographic differences in sunscreen use patterns and different racial admixtures, as the other two sites were East Harlem in NYC with Black and Hispanic participants, and the San Francisco Bay Area with more Latino and Asian participants (Wolff et al. 2015). Additionally, the current study has added a sunscreen use metric utilizing questionnaire data to examine the effect of BP-3 exposure and pubertal milestones.

Limitations & Strengths
One limitation of this study is that some of the analyses used only a one-time spot measurement of the urinary biomarker for BP-3 exposure. Although we have high confidence in the precision of the assay to measure the amount of BP-3 in urine, we have less confidence that it accurately and precisely represents long-term BP-3 exposure. Repeated measures of the urinary biomarker also identified no associations with sex hormones. The length of time between either BP-3 or sunscreen use measurement and the outcome was quite long for some participants, and changes in the pattern of exposure during this interval may have led to misclassification of exposure. To address this, we included an analysis examining BP-3 and hormones measured on the same day, which should have been able to detect a real-time effect of BP-3 on sex hormones; however, this sensitivity analysis had a reduced sample size and thus reduced power (Supplementary Table 1). Although sunscreen is very likely the greatest contributor to BP-3 exposure, another limitation is that this study did not account for other sources of BP-3 exposure, including other personal care products.
While environmental chemicals occur in mixtures, examining the effect of the mixture of chemicals on hormones was not within the scope of this study. Commercial sunscreens are typically composed of a mixture of sunscreen agents. It is possible that the effects we found are not actually due to BP-3, but to another chemical exposure with which BP-3 exposure is highly correlated, and that BP-3 is acting as a surrogate for that chemical in our analyses. In an analysis of the correlation between BP-3 and other chemicals [monoethylhexyl phthalic acid (MEP), the sum of di-(2-ethylhexyl) phthalate congeners (ΣDEHP), 2,5-dichlorophenol (25-DCP), triclosan (TCS), perfluorooctanoic acid (PFOA), enterolactone (ETL), hexabromodiphenyl ether (BDE-153), the sum of polybrominated diphenyl ether congeners (ΣPBDE)], BP-3 was found to be weakly inversely correlated only with PFOA (R = −0.094, p = 0.068) and not significantly correlated with the other chemicals (SM Pinney, unpublished data). We must also consider that there may be a reverse causal relationship between our exposure and our outcome. Girls who go through breast development earlier may start using personal care products that contain BP-3 at an earlier age (Harley et al. 2018). Finally, we cannot rule out that the association between higher reported sunscreen use and decreased testosterone levels was due to chance alone.
This study had several strengths. It was a longitudinal prospective study, which allows for true risk assessment. The pubertal stages were very well characterized. Staging for breast maturation is notoriously difficult because breast tissue is often confused with the underlying fat. Staging with palpation allowed study clinicians to distinguish between breast tissue and fat. Training and certification, along with regular quality assurance by the master trainer, ensured higher accuracy and consistency of staging. The BP-3 measurements were conducted at the CDC Environmental Health Lab, which is the premier lab for measuring environmental exposures, and included internal quality control methods as well as investigator-supplied duplicates. We therefore have high confidence in the accuracy of the measurements.

Conclusions
This study does not suggest that exposure to BP-3 is associated with a clinically significant change in sex hormones (estradiol, estrone, testosterone, DHEA-S) during puberty in young girls and thus does not suggest that BP-3 has endocrine effects in humans. This finding should be interpreted in the context of other studies that have examined hormone effects of BP-3 and the association of BP-3 with clinical outcomes.
This study allowed for comparison between BP-3 exposure assessed by urinary biomarker and by a sunscreen metric from questionnaire. The findings presented suggest that the two do not completely agree and may be capturing different exposure time periods. BP-3 has a short half-life in humans, thus the biomarker likely captures very recent exposure. In contrast, a questionnaire on reported sunscreen use may better represent typical or past-year exposure. Future studies should aim to improve exposure characterization of BP-3, potentially by using both the BP-3 biomarker and questionnaire.

Author Contributions
Courtney M. Giannini, PhD: Developed hypotheses and specific aims, designed the study, wrote all statistical code, conducted and interpreted all the statistical analyses, and prepared manuscript. Susan M. Pinney, PhD: Principal Investigator of the study, conception of the project, obtained funding for the study, collaborated on design of the study and interpretation of analyses, edited and contributed to full manuscript. Frank M. Biro, MD: Principal Investigator of the study and previous studies for which data was used in the current study, contributed to the design of dissertation, edited and contributed to full manuscript. Richard Schwartz, PhD: Provided input on basic science interpretation of findings, edited and contributed to full manuscript. Bin Huang: Collaborated on the design of Aim 3, advised on conducting and interpreting all statistical analyses, edited and contributed to full manuscript. Cecily Fassler, PhD: Contributed to data preparation, edited and contributed to full manuscript. Donald Chandler, PhD: Oversaw hormone measurements and contributed to the full manuscript.