Meta-Analysis of Clinical Dose–Response in a Large Drug Development Portfolio

This article reports the results of a meta-analysis based on dose–response studies conducted by a large pharmaceutical company between 1998–2009. Data collection targeted efficacy endpoints from all compounds with evidence of clinical efficacy during the time period. Safety data were not extracted. The goal of the meta-analysis was to identify consistent quantitative patterns in dose–response across different compounds and diseases. The article presents summaries of the study designs, including the number of studies conducted for each compound, dosing range, the number of doses evaluated, and the number of patients per dose. The  Emax  model, ubiquitous in pharmacology research, was fit for each compound. It described the data well, except for a single compound, which had nonmonotone dose–response. Compound-specific estimates and Bayesian hierarchical modeling showed that dose–response curves for most compounds can be approximated by  Emax  models with “Hill” parameters close to 1.0. Summaries of the potency estimates show pharmacometric predictions of potency made before the first dose ranging study within a (1/10, 10) multiple of the final estimates for 90% of compounds. The results of the meta-analysis, when combined with compound-specific information, provide an empirical basis for designing and analyzing new dose finding studies using parametric Emax models and Bayesian estimation with empirically derived prior distributions.


Introduction
A large research portfolio of drugs was analyzed to answer the questions: (1) "What do clinical dose-response curves look like?," (2) "How well were they determined by the studies conducted?," and (3) "Are there consistent patterns in dose-response across diseases and compounds with different pharmacological targets?." Data collection is described in Section 2.1. The study designs for each compound, and the data distributions within dose groups are described in Sections 2.2 and 2.3. The data collection and analyses were restricted to efficacy endpoints. The E max model is fit separately for each compound in Section 3 using maximum likelihood estimation (MLE). Analyses of the data from the only compound with a nonmonotone dose-response are in Section 3.4. The distribution of the parameters of the E max model are described in Section 4. The MLE's of the parameters are summarized and then the parameter estimation is refined by the use of hierarchical Bayesian models representing the distributions of the parameters across compounds. The posterior distribution of the hierarchical parameters are used to derive the posterior predictive distribution for a new compound. This distribution can be used to evaluate the likely utility of different dosing designs for a future compound, and it provides an empirically derived prior distribution for future dosing studies. Supplementary summaries are provided for each compound. The summaries include graphs, tables, and model fits. The supporting documentation also includes C American Statistical Association Statistics in Biopharmaceutical Research November 2014, Vol. 6, No. 4 DOI: 10.1080/19466315.2014 cross-compound summaries, abbreviations, and notes on special situations.
There is a large statistical literature on dose-response studies, including collected volumes such as Krishna (2006), and Ting (2006). Some researchers have focused on hypothesis testing approaches and corresponding confidence intervals as the basis for dose selection. Examples include methods in Bretz et al. (2003), and Hsu and Berger (1999), Tamhane et al. (1996), and Ruberg (1989), which build on the earlier work of Williams (1971) and Dunnett (1955). These methods invoke few assumptions regarding the nature of the dose-response curve beyond monotonicity. They tend to be sequential in nature and differ primarily in the family of hypotheses tested, and the extent of family-wide error control achieved. The MCP-Mod testing approach (Bretz, Pinheiro, and Branson 2005), in contrast, is derived using a set of parametric doseresponse curves and seeks to test the global hypothesis of no drug effect. Many dose-finding approaches have also been developed within a curve estimation framework. These approaches vary widely regarding the assumptions made about the dose-response curve. Much recent work has focused on methods that make few assumptions about the shape of the dose-response curve, for example, the normal dynamic linear model, Berry et al. (2002), and other flexible models like Bornkamp and Ickstadt (2009). Such approaches may not assume monotonicity, as was done with earlier methods like isotonic regression (Mc-Donald 2005). Methodological research based on specific parametric models has been less common because these methods are well developed. An exception is the estimation phase of the MCP-Mod approach (Bretz, Pinheiro, and Branson 2005), which emphasizes the selection or averaging of common parametric models. There has been recent work deriving optimal designs for parametric dose-finding methods (Bretz, Dette, and Pinheiro 2010;Dette et al. 2010Dette et al. , 2008. Bornkamp et al. (2011) is an example of adaptive designs for dose finding that incorporates many of the methods referenced here. It is not an objective of the current article to develop new methods, but rather to characterize the likely conditions (e.g., dose-response shapes, sampling designs) in clinical dose-response studies to guide the selection and development of dose-finding methods.

Data Collection
A review of the study repository of a large sponsor (Pfizer) identified all dose-response studies initiated between 1998-2009. The portfolio represented roughly 10% of global pharmaceutical industry research and de-velopment spending during the time period (PhRMA 2008;Pfizer, Inc 2009;EP Vantage, Inc 2010). The repository included at least one compound in 13 of the 17 therapeutic classes containing three or more patented compounds in 2003(Congressional Budget Office 2006. Two of the 13 classes included in the repository are not represented in our primary analyses because none of the compounds in these classes met the inclusion criteria. The sampling of compounds followed a prespecified plan with some modifications in response to unanticipated conditions. Because the portfolio was focused on small molecules, and complex biological molecules may have different dose-response characteristics, the meta analysis was restricted to small-molecule compounds (i.e., < 25 atoms). Oncology compounds were also excluded because they have different dosing objectives and methods to achieve them. Our analyses consider only efficacy data, and they are restricted to a single timepoint. Trends in dose-response as a function of time are not reported. No safety data were extracted.
Compounds with evidence of clinical efficacy, which was based on assessments in clinical study reports, were selected for follow-up data collection. Dose-response studies for the selected compounds initiated before 1998 were also included when they could be located. Most of the studies were from phase 2 of development, but phase 3 studies with three or more doses were also included. Because of the time required to locate, extract, and verify the data from each study, it was not feasible to collect data on the large number of compounds that failed to demonstrate clinical efficacy. The implications of the compound sampling design are discussed in Section 4.
The initial data screening yielded 87 studies from 34 compounds. Patient-level efficacy data were extracted from 85% of the studies, and summarized results were obtained from study reports for the remaining studies. Upon reviewing the study-level data, four compounds from the original screening were excluded for lack of evidence of dose-response (three of the four were antiinfectives without placebo data). Two cross-over studies were identified, but both used different endpoints and visit schedules that could not be combined with the larger parallel group studies, so these studies were excluded. These exclusions yielded a total of 76 studies on 29 compounds. Additionally, the monotherapy data from a combination product were retained and merged with data from the compound's monotherapy development, while the data from the combination product were excluded. The disposition of the compounds and studies are summarized in Figure 1. A list of the compounds is in Table 1, and a list of abbreviations is included in the supplemental materials. A numerical ID was assigned to each compound and used to reference compounds throughout. The missing numbers in the ID  sequence are the excluded compounds, which have already been described. Some compounds were evaluated in more than one distinct disease area (e.g., compound 25 for rheumatoid arthritis and psoriasis). The initial disease evaluated is denoted by appending "0.0" or "0.1" to the ID, and "0.2" or "0.3" for the second and third diseases studied. Except where noted, the same compound evaluated for more than one disease will be regarded as multiple distinct compounds. Another complication occurred when there were distinct subpopulations evaluated for the same disease so the dose-response curves may differ (e.g., treatment-naive and experienced patients in antiviral trials). When there was more than one subpopulation, the studies for the subpopulation with less data were identified by adding 100 to the compound ID. Separate model fits and other summaries were included in the supplemental materials for these secondary populations, but they were not included in the cross-compound summaries. Regarding different diseases treated by the same compound as separate "compounds" in cross-compound summaries increased the number of compounds from 29 to 33. The results of the hierarchical modeling in Section 4 were not substantially changed when the data from the additional diseases were excluded.
Data were collected on the primary efficacy endpoint(s) from each study at each planned study visit. When there were multiple endpoints and differing visit schedules, a visit and endpoint were selected that were available for all studies from a compound to permit pooling across studies. All of the analyses reported here are based on the single endpoint and timepoint. Our analyses are based on available cases without imputation; the missing data rates for most studies were between 5-20%, but one study had a missing data rate of 38%. As part of quality checking, dose groups summaries based on the available cases data were compared to corresponding primary analyses in the study reports. Exact agreement was not anticipated nor observed, but there were no changes to dose-response patterns apparent in graphical displays. Dosing data were also collected, including information on dosing regimen (e.g., once or twice daily). A small number of compounds had a mix of dosing regimens; preliminary examination of the response data from the different regimens did not display marked differences, so data from different regimens were combined using total daily dosing, but this assumption was reexamined during model fitting. The response data were masked by linear rescaling so that the overall mean for each compound without regard to dose group or study was zero, and the overall standard deviation (SD) was 1.0. The dosing data were rescaled so the maximum dose tested for each compound was 1.0.

Designs for Estimating Dose-Response
All of the studies included in the final database had parallel group designs. Multiple dose-response studies per compound were common: 16 compounds were evaluated in a single study, 9 in two studies, and 8 compounds in three−six studies. Multiple studies were required to better characterize the dose-response as the first study was often inadequate, as will be described. Only one study used a substantially adaptive dosing design. Development of some of the compounds with a single study was terminated without completing dose selection for a variety of reasons. From the 33 unique compounds/diseases, only 20 advanced to phase three.
There were 9 compounds evaluated on 4 doses (including placebo), 4 compounds on 5 doses, 10 compounds on 6 doses, 7 compounds on 7 doses, and 3 compounds on 8-10 doses. The ratio of the highest dose to the lowest (nonplacebo) dose was computed for each compound across all studies. The 25th, 50th, and 75th quantiles of the dosing ratio were 8, 16, and 30, respectively. The common hyperbolic E max model, which is described in the Section 3.1, predicts a ratio of 81 between the doses yielding 90% and 10% of the maximum effect, so most of the compounds were evaluated over limited dosing ranges, even if it is presumed the doses were well-targeted. Only four compounds had dosing ranges exceeding 81, with a maximum dosing ratio of 588.
The number of patients per dose group varied widely, in part because some dose-finding continued into phase 3 of development. The dosing designs, including the number of patients per dose in each study, are included in the compound-specific summaries in the supplementary materials.

Data Distributions Within Dose Groups
There were six compounds with binary primary endpoints, and 27 compounds with continuous primary endpoints. There were 20 endpoints derived from patient or investigator assessment, and 13 endpoints based on laboratory measurements or vital signs (e.g., blood pressure, weight). Many of the clinical endpoints were collected from ordinal rating scales with numerous categories, which were regarded as continuous in the protocol-specified analyses. Some count data with high frequencies were also regarded as continuous by the researchers; one count endpoint with low frequency had been converted to a binary response. There were some compounds in the repository that had time-to-event primary endpoints, but none of these compounds met the clinical efficacy inclusion criteria. In the remainder of this section, the distributions of the continuous endpoints within dose groups are described for the 25 compounds with continuous patient-specific data available.
Boxplots of the data within dose groups (pooled across studies within compound) are included with the supplemental summaries. Most of the boxplots are consistent with symmetric bell-shaped data distributions, reflecting in part transformations selected by the data analysts (e.g., log transformation of viral load), and the use of dichotomous responder variables when continuous and polychotomous variables had problematic distributions. Outliers are present in many of the boxplots. They were defined as any point exceeding the lower and upper quartiles by more than 1.5 times the interquartile range (Tukey 1977). The median (across compounds) proportion of outliers was 0.025 and its lower and upper quartiles were 0.014, and 0.047, respectively. To give context to these proportions, data were simulated in the dose groups of each compound from a homogeneous t-distribution with 7 degrees of freedom. One hundred portfolios were simulated, and the summarized proportions of outliers were averaged yielding a median proportion of outliers of 0.028, and averaged quartiles of 0.022 and 0.037, which were similar to those from the real data. In contrast, the median proportion of outliers from a normal distribution was 0.01. The rates of outliers in the dose-response data are consistent with broader experience reported from both laboratory (Posekany, Felsenstein, and Sykacek 2011), and less controlled data sources (Venables and Ripley 2002).
There was evidence of variance heterogeneity that increased with dose, but the increases were not as large as reported in some nonclinical dose-response settings. Because of the presence of outliers, robust measures of scale were computed from the interquartile ranges. To improve stability, the average of the interquartile ranges from the two highest dose groups were compared to the average from the two lowest dose groups (including placebo). Of the compounds with continuous endpoints, four had estimated ratios exceeding 2.0, with a maximum of 2.59. There were no ratios below 0.5. In the simple t-based simulations with homogeneous variability, no simulated sets of designs yielded more than two estimated ratios greater than 2.0.
The consistency of the placebo response when there were multiple studies for a compound was also evaluated. Likelihood ratio tests (binary data) or F-tests (continuous, normal model) were computed for each compound testing the homogeneity of the placebo response across studies. The proportion of compounds with placebo heterogeneity rejected at the 0.05 level was 0.19, and the p-values of the tests generally trended toward smaller values. The impact of the heterogeneity was apparent on some of the models fit in Section 3, so all models include a separate intercept (placebo) parameter for each study.

E max Models for Continuous and Binary Data
For continuous outcomes, denoted by Y , a normal model was used with homogeneous residual variance, σ 2 , and a mean function, called the sigmoid E max model, given by where D is dose, and (E 0 , ED 50 , E max , λ) are unknown parameters. The mean response is defined for D ≥ 0, and changes monotonically from E 0 at D = 0 to a maximum difference of E max . The parameter, λ, is called the Hill parameter, and the model in (3.1) is called a hyperbolic E max model when λ = 1 (MacDougall 2006). For the hyperbolic E max model, the ratio of the dose achieving 90% of the maximum difference with placebo, ED 90 , and the dose achieving 10% of the effect, ED 10 , is always 81. The model yields an increasingly steep sigmoidal shape as λ increases, and a more gradual approach to the maximum difference with λ < 1. When E 0 = 0 and E max = 1, the model for the conditional mean is the distribution func-tion for the log-logistic distribution (Johnson and Kotz 1995). The model can be derived from receptor occupancy models in pharmacology when the λ parameter is a positive integer (Iliadis and Macheras 2006). Common parameters were assumed across different studies for the same compound, except for different placebo parameters, E 0 , which were estimated separately for each study. The adequacy of the assumption of common treatment differences relative to placebo across studies, which is implied by the assumed common model parameters, was assessed through formal tests and graphical displays described in Section 3.3. Corresponding E max models were also used to fit data for compounds with binary outcomes, applied on the logit scale (e.g., Sheiner et al. 1997;Tan et al. 2011): dose-response curves based on this model were backtransformed to the response rate scale. Parametric collections of dose-response curves based on other distribution functions could also be used to model the dose-response data. Because most of the doseresponse curves were estimated with low precision, it is likely there are other parametric collections that would yield goodness-of-fit statistics similar to those produced by the E max model.

Automated Fitting Algorithm
The E max models for the continuous outcomes were fit using the nonlinear least squares R function, nls, and the binary responses with MLE using the R function nlm (R Development Core Team 2011). An automated algorithm tried Newton-type maximization methods initially, and a partial least squares method if the Newton method failed. Two different starting values were tried if needed to achieve convergence, which was judged using the default criteria. The log of the ED 50 parameter in (3.1) and (3.2) was estimated directly, and then backtransformed. The MLE's of the parameters are denoted by (Ê 0 , ED 50 , E max ,λ).
Models were fit to all compounds including those in secondary subpopulations, which were included in the supplemental materials. The summaries here are for the primary populations. Convergence was achieved for the hyperbolic E max models in 26 of the 27 compounds with continuous outcomes, while convergence was achieved for only 18 of the sigmoid models. The corresponding convergence for the 6 binary outcome models was 5 and 4, for the hyperbolic and sigmoid models, respectively. Summaries of the ML parameter estimates from converged fits are in Section 4. Model fits for the compounds where ML estimation failed were obtained using Bayesian methods, also described in Section 4.
Data from the compounds where the automated fitting failed were explored to understand the failed convergence. Most of these were settings with few doses (e.g., four doses including placebo), and the doses included were on the steeper linear portion of the E max curve, or the doses were only near the plateau. Because of the lack of identification, further exploration of starting values and parameter transformations (Ross 1990) did not produce additional converged estimates. Convergence was sometimes sensitive to the convergence criteria.

Assessing Model Fit
The fitted sigmoid E max curves for three compounds with well-studied pharmacological targets are displayed in Figures 2-4. The E max curve of a statin compound (ID 6) for reducing low-density cholesterol (LDL) haŝ λ = 0.35 ( Figure 2). Mandema et al. (2005) found similar curves for four other statin compounds, which could be adequately described with a common E max and a common λ parameter, while allowing for different potencies (ED 50 ) and placebo responses. An HIV C-C chemokine receptor 5 (CCR5) inhibitor (ID 22) for viral load reduction in Figure 3 had a common estimation challenge arising because doses near the plateau were not tested, so although convergence was achieved and the resulting E max curve described the data well, the individual parameter estimates (e.g.,λ = 0.51) were unstable. A compound (ID 18) with the same target was tested over a broader dosing range, and producedλ = 1.08. The dose-response for a phosphodiesterase 5 (PDE5) inhibitor for treatment of erectile dysfunction, which hasλ = 0.93 is displayed in Figure 4. An unpublished internal meta-analysis of two other marketed compounds, and another internal compound (ID 33) with the same target, were also well described with the sameλ(= 0.8) and E max . The shared E max and λ parameters of compounds with the same pharmacological target despite different potencies is predicted by common molecular binding models in pharmacology.
For compounds with more than one study, heterogeneity of the overlapping dose group differences from placebo was tested using a likelihood ratio (binary data) or F-test (continuous data) based on an unstructured model for the dose group differences. There was generally good consistency in the dose group differences despite heterogeneity in placebo response. Of the 17 compounds with multiple studies, only two had nominally significant heterogeneity at the 0.05 level. One compound (ID 15, p-value = 0.049) had moderately larger effects in the second study than the first, with no apparent explanation. The second compound (ID 17.1, p-value = 0.006), had different dosing regimens (once versus twice daily dosing) between its two studies, which may have contributed to this discrepancy. Neither compound had dose-response curves that displayed nonmonotonicity or deviation in appearance from the usual E max shapes. Goodness-of-fit F-tests (Bates and Watts 1988) and likelihood ratio goodness-of-fit tests were computed for continuous and binary endpoints comparing the hyperbolic E max model to an unstructured but homogeneous (across studies) model after adjustment for placebo amongst all compounds, except the two compounds with evidence of heterogeneous means. One compound (ID 34) displayed clear evidence of nonmonotone dose-response, and it is described in detail in Section 3.4. The compound in Figure 2 was well described by an E max model with λ < 1, but not λ = 1. Two other compounds had a nominally significant test. One of these compounds (ID 4) had multiple protocols combining once and twice daily dosing, which accounted for the model discrepancies when the different dosing regimens were fitted separately. The other compound (ID 9) was the only one evaluated using an adaptive dosing design. The response to one dose was better than predicted by the model (approximately 3 SE), and better than observed for the highest dose. Mandema, Boyd, and DiCarlo (2011) performed a meta-analysis that included this compound, and two other compounds with the same target, both of which completed phase 3 testing. The dose-response for the other compounds was well described by an E max model. Because of the lack of concurrent randomization, it is unclear whether the discrepant response was due to bias, or sampling variability consistent with the examination of numerous compounds.
The likelihood ratio test comparing the sigmoid and hyperbolic E max model was significant for only one compound (ID 6, Figure 2). The power to reject the hyperbolic E max model with λ = 1 was low for most individual compounds, and the quality of the usual asymptotic approximations is suspect (Thomas 2006). The distribution of λ across compounds is quantified in Section 4.2.

Example of Nonmonotone Dose-Response
The dose-response of a compound (ID 34) to raise high-density cholesterol displayed evidence of nonmonotone dose-response in Figure 5. The goodnessof-fit test for the hyperbolic E max model was 0.002; the sigmoid E max model did not converge for this data, which included only three dose groups and placebo. Prior to completion of the study, Nissen et al. (2007) published results from two studies that displayed a nonmonotone dose-response for a compound with the same target. A Bayesian E max analysis had been proposed in the original analysis plan, but the plan was altered to a simple unstructured mean model after publication of results for the other compound. As part of the Bayesian E max analysis, a posterior predictive check was planned to compare the best of the lower dose group sample means to the sample mean from the highest dose. The posterior predictive probability was 0.007, so even without the published information, it is likely that the analyses would have been appropriately amended. Both compounds were terminated, so the practical decision would have been unchanged if the incorrect assumptions about dose-response not been corrected.

Characterizing the Distribution of E max Model Parameters
Bayesian hierarchical models with distributions summarizing the parameters from different compounds were fit to assess the variability in the parameters adjusting for estimation error. The data from all compounds with continuous outcomes were included, even if the data from a compound were inadequate to produce stable MLE, except the one compound (ID 34) with nonmonotone dose-response was excluded. The parameters from the compounds with binary endpoints were similar to those for continuous endpoints, but they were not included to simplify the modeling.
Let j index the compounds with continuous outcomes included in the analysis, and k = 1, . . . , n j index the n j protocols for compound j. The model includes compound-specific E max j , ED 50 j , and λ j parameters, and protocol-specific E 0 jk parameters. The mean response for a patient receiving dose D of compound j is The residual variances for the (assumed) normally distributed errors are also compound-specific and denoted by σ 2 j . Distributions are specified for the λ j and ED 50 j parameters across the compounds. Cross-compound distributions for the E 0 jk parameters are not specified. When fitting hierarchical models for the λ j and ED 50 j , independent normal prior distributions with mean 0 and SD of 3 are used for the E 0 jk throughout. Recall that the data are normalized so the overall mean is 0 and the overall SD is 1 for each compound. Because the compounds were selected based on their differentiation from placebo, the distribution of effect sizes in the sample would not be an appropriate prior distribution for a new compound without demonstrated efficacy, so a cross-compound distribution for the E max j parameters was not estimated. For the hierarchical models, independent t 3 (t with 3 degrees of freedom) prior distributions with mean 0 and scale parameter 1 are specified for log |E max j | throughout, with the direction of the effect assumed to be known.
The magnitude of the ML estimates, E max j , are described in Section 4.1, and the prior distribution specified for each E max j is explained. Parameteric hierarchical distribution for the λ j are specified in Section 4.2, and preliminary estimates of these distributions are reported. Diffuse independent prior distributions for the ED 50 j are used for the models fit in Section 4.2. In Section 4.3, hierarchical parametric distributions are specified for the ED 50 j , and preliminary estimates of these distributions are reported assuming hyperbolic E max models (i.e., the λ j = 1). Section 4.4 summarizes the estimates of the hierarchical distributions from simultaneous estimation of different combinations of the hierarchical models for the λ j and ED 50 j . Model sensitivity and other fitting issues are also assessed. The distribution of the parameters across compounds requires special consideration because only compounds with demonstrated efficacy were included. When the E max parameter is 0, the ED 50 and λ parameters are not identified. Compounds with nonzero E max parameters so small their effects cannot be demonstrated in clinical trials are excluded from our sample, but their dose-response shapes are of little practical interest, even if they differ from those included in our sample. An anomalous situation can arise when diffuse prior distributions are assumed for the E max and ED 50 parameters. A compound with |E max | > 0 can fail to demonstrate effectiveness if the doses in the clinical trial design are much too low. The efficacy data from a null clinical trial cannot differentiate a small |E max | value from an ED 50 much larger than was anticipated when the trial was designed. The use of diffuse prior distributions assigns high probability to larger values of |E max | and ED 50 , which can produce unintended high posterior probabilities for very large values of the ED 50 when the trial results are null. Our analyses included only efficacious compounds because it was not feasible to extract data from the numerous failed compounds, but this selection criteria also reduces sensitivity to the ambiguous implications for the model parameters that arise with null efficacy data.
To reduce computation time, the data were summarized by dose group means and within-dose group sums of squares. All of the models were evaluated using Open-BUGS software (Lunn et al. 2009), and confirmed using STAN Hamiltonian Markov chain Monte Carlo software (Stan Development Team 2013). The usual convergence diagnostics were evaluated. A long burn-in (50,000) period was selected, and then 30,000 parameter sets were generated with thinning of 50. The validation methods proposed by Cook, Gelman, and Rubin (2006) were applied to several of the model fits. The model fits and prior sensitivity were also assessed with additional simulation methods described in Section 4.4.

The E max Parameter
The estimated magnitude of the efficacy of the sampled compounds is summarized in this section, but a model for the E max j across compounds is not developed. The 25th, 50th, and 75th percentiles of | E max j | estimated for the hyperbolic model from compounds with continuous endpoints were 0.86, 1.07, and 1.69, with minimum and maximum values of 0.2 and 3.76. The quantiles of log | E max j | were well approximated by the quantiles of a t 3 distribution with mean 0 and scale parameter 0.44. The prior distribution for the E max parameters used throughout Section 4 has a conservatively selected scale parameter of 1.0.
The E max j based on the sigmoid model were close to those of the hyperbolic model with two exceptions where the estimated maximum effects from the sigmoid model were approximately twice those from the hyperbolic model. Theλ j corresponding to the unusual E max j were the lowest estimated λ. They resulted from a near indeterminacy in the model fits that is described in Section 4.4.
The E max are theoretical maximum that are not typically achieved in practice, and their estimation can be unstable. A more interpretable measure of the effect magnitude for the continuous endpoints was computed from the estimated absolute difference in mean response between the highest dose and placebo divided by the residual SD, which was based on the MLE from the hyperbolic models. This summary was not sensitive to the model used. The 25th, 50th, and 75th percentiles were 0.53, 0.96, and 1.66, with minimum and maximum values of 0.19 and 6.84. The magnitude of the effect was larger for the 12 compounds whose endpoints were based on laboratory measurements compared to the 14 compounds with investigator/patient reported outcomes (IRO/PRO). The 25th, 50th, and 75th percentiles for the standardized effects on laboratory endpoints were 0.79, 1, and 3.77, while they were 0.45, 0.59, and 1.21 for the IRO/PRO endpoints. With the exception of therapeutic areas where laboratory biomarkers (e.g., viral load, LDL) were accepted in place of clinically assessed endpoints, the standardized treatment effects achieved were most often less than 1.0, and more commonly close to 0.5 amongst compounds with demonstrated clinical effect.

The λ Parameter
The 25th, 50th, and 75th quantiles of the converged λ amongst compounds with continuous endpoints were 0.85, 1.13, and 1.61, with minimum and maximum values of 0.35 and 4.6. Three different distributions were used to represent the variation of the underlying λ j across the portfolio of compounds, including those where the ML estimation failed.
The first distribution was a t 3 distribution for log(λ j ) conditional on the mean, μ λ , and scale parameter, σ λ . The mean was assigned a diffuse normal distribution with mean 0 and SD 10, and σ −2 λ was assigned a diffuse gamma distribution with shape and rate values of 0.001. The t 3 distribution was used for compound-specific parameters to reduce the shrinkage applied to potential outliers.
The second distribution for λ was a beta distribution scaled to the interval (0, 6). This range was specified because larger values of λ are practically indistinguishable, and a bounded distribution avoids potential numerical difficulties. Before scaling to (0, 6), the beta distribution was parameterized by the logit of the mean, denoted by α, and a size parameter, γ , as was done in Kahn and Raftery (1996) to support the assumption of prior independence of the parameters. The α parameter was assigned a normal prior distribution with mean 0, and an SD of 1.75. The prior distribution for γ is a distribution with shape and rate parameters 1.0 and 0.1. When the inverse logit transform is applied to α, followed by scaling, the prior distribution for the mean of λ is centered in (0, 6) with support throughout. The prior distributions for α and γ were selected to yield an approximately uniform distribution for λ, with some additional probability near 0 and 6.
The third prior distribution for λ was a discrete distribution with 12 possible values in (0.5, 1.0, 1.5, . . . , 6). The probabilities for the λ values were assigned a symmetric Dirichlet distribution with a concentration parameter of 1.0.
For the initial modeling of the λ j distribution, the log(ED 50 ) were assigned independent t 3 distributions centered about initial projections of the ED 50 with scale 2.0; the empirical basis for this prior distribution is explained in Section 4.3.
The Bayesian models fit the data well, as assessed by plotted dose-response curves (plots in the supplemental materials) and posterior predictive values for summaries of the MLE's. Simulation of the data from the portfolio of compounds based on the posterior distribution was performed for each multivariate parameter set created using the MCMC methods by (1) simulating response data for each dose group in each protocol using the E max model evaluated with the simulated parameters, (2) applying the same automated ML estimation procedure to the simulated data that was applied to the real data, (3) storing summaries of the MLE's for each compound. Note that the posterior predictive median (approximately 0.86) for λ based on the continuous prior distributions was lower than the median of theλ, (1.13). The posterior predictive distributions for the medianλ across compounds, however, were in good agreement with the observed median λ. The means of the posterior predictive distribution for the median ofλ were 1.07, 1.14, and 1.13, for the log t 3 , scaled beta, and Dirichlet prior distributions, respectively. Other percentiles of theλ were also reproduced by simulations from the posterior distributions. The relationship between the central tendency of the ML and Bayes estimates, and a more subtle multivariate sensitivity to the prior distribution are detailed in Section 4.4.

The ED 50 Parameter
The ED 50 from different compounds range from micrograms to decigrams, but much of this variation can be predicted from preclinical and early-stage clinical data before initiating clinical dose-finding studies. The variation in the ED 50 about these preliminary predictions is described in this section. The log transformation of the ED 50 is used throughout. The early ED 50 predictions were not consistently recorded, and confidence intervals for them were not typically computed. The predictions were approximated by the mid-point between the two lowest nonzero doses in the first dose-response study, because dose-response designs should have at least one dose below the ED 50 , and clinicians are reluctant to include many doses that are predicted to have inadequate efficacy. The approximated preliminary predictions of the ED 50 j are denoted by P 50 j . Figure 6 displays a t 3 quantile plot of the log( ED 50 j /P 50 j ), where ED 50 j are the MLE from the hyperbolic model. The P 50 j were well calibrated, as the line through the quantile plot has mid-point close to (0, 0). Because the λ j were concentrated near 1.0 in Section 4.2, Quantiles of a t3 distribution log( MLE ED50 / p50 ) Figure 6. Quantile plot of the log( ED 50 /P 50 ). The ED 50 are from the hyperbolic E max model for compounds with continuous outcomes. The number of compounds plotted is 25. Compounds 2 and 34 were excluded because of a lack of converged E max model and nonmonotonic response, respectively. the MLE and posterior distribution of the ED 50 j parameters are evaluated first with λ j = 1. The consequences of joint estimation of all of the parameters are described in Section 4.4. Despite estimation errors and potential differences between the P 50 and the actual early predictions, the log( ED 50 j ) were within a 2.3-fold multiple of the log(P 50 j ) with only three exceptions. The implied interval is equivalent to (0.1P 50 < ED 50 < 10P 50 ). The compound (ID 3) whose ED 50 j was far below its preliminary prediction was confirmed as a real error. It was not due to an unstable MLE or a poor approximation of the preliminary prediction by the P 50 . The prior distribution for the ED 50 in Section 4.2 was a t 3 distribution for log(ED 50 /P 50 ) with mean 0 and conservatively selected scale parameter, 2.0.
The distribution of the log(ED 50 j /P 50 j ) across the portfolio of compounds was modeled by a t 3 distribution with mean, μ 50 , and scale, σ 50 . A diffuse prior distribution was specified using a normal distribution for μ 50 with mean 0 and SD of 10, along with a Gamma distribution with shape and rate parameters of 0.001 for σ −2 50 . A second more informative prior distribution was specified with μ 50 assigned a mean of 0 and SD of log(2). A corresponding scaled chi prior distribution was specified for σ 50 with log(2)/σ 50 ∼ χ d f / √ d f , and d f = 2. The more informative prior distribution is based on an internal pharmacometric objective to predict the ED 50 j within a multiple of 2 using the preliminary data sources.

Prior Sensitivity and Posterior Predictive Values
The three hierarchical distributions for λ j in Section 4.2 were combined with the two for the ED 50 j in Section 4.3 to jointly model the variation in these parameters. The prior distributions for the E 0 jk and E max j parameters were unchanged. The posterior predictive distributions of λ and ED 50 from the combinations based on the log t 3 and scaled beta distributions for λ are summarized in Table 2. Trends for the posterior distributions based on the discrete prior distribution of λ j were consistent with those from the continuous prior distributions and are not reported. The percentiles in Table 2 varied depending on the differing prior distributions, and when compared to the posterior distributions in Sections 4.2 and 4.3. Two consistent trends were (1) the highest posterior density region had λ < 1.0, and 2) the probability that λ > 1.5 was low, with a maximum value of 0.09.
The other pronounced trend was the increased central tendencies of the ED 50 /P 50 when compared to results in Section 4.3 based on the hyperbolic model. The increased ED 50 were associated with decreased values of the λ. The central tendencies of the posterior distributions of the compound-specific |E max j |, which are not displayed, were also increased. With small treatment effects and limited dosing ranges, there was a multivariate indeterminacy in the likelihood/posterior distribution, with decreasing values of λ compensated by increased values of the ED 50 and |E max |. This indeterminacy is also common when fitting sigmoid E max models to data for a specific compound (Bornkamp 2012;Woodward 2012). All of the posterior distributions yielded similar dose-response curves over the observed dosing ranges, and each curve described the data well. Plots of the dose-response curves derived from the scaled beta prior distribution for λ and the more informative distribution for ED 50 are included in the supplemental summaries. Simulations from the posterior predictive distributions of the percentiles of theλ and ED 50 reproduced the MLE's as described at the end of Section 4.2, even though the central tendencies of the distributions in Table 2 were further from those of the MLE's. Increasing the prior variance of the |E max | combined with the more diffuse prior distribution for the ED 50 yielded yet more pronounced trends toward lower values of λ and higher values of ED 50 .
The relationship between the compound-specific MLE's and the Bayesian estimation was also assessed by simulation of the portfolio. The simulation model parameters, (E 0 jk , ED 50 j , E max j ), were set equal to (Ê 0 jk , ED 50 j , E max j ) from the hyperbolic E max models. When the data were simulated with λ j = 1 for each compound, the means over 5000 simulated portfolios of the 25th, 50th, and 75th percentiles of the MLE's of the λ j were 0.87, 1.16, 1.71, which were in close agreement with observedλ j percentiles, 0.85, 1.13, and 1.61, respectively. When the simulations were repeated with λ j = 1.5 in the simulation model, the means of the percentiles from the simulated portfolios were 1.23, 1.6, 2.17. The proportion of simulated portfolios yielding percentiles above those from the actualλ j was > 0.99 when λ j was fixed at 1.5. The converged ML estimates of λ j were biased upward.
The simulation of the portfolio was repeated and the posterior distributions of the parameters were computed based on the t 3 prior distribution for log(λ); computation time limited the number of replications to 200. When λ was set to 1 in the simulation model, the mean across the 200 simulation replications of the posterior predictive median for a new λ was 0.94 for the more informative prior distribution of the ED 50 , and 0.92 for the less informative prior distribution. When λ was set to 1.5, the means of the posterior median were 1.4 and 1.39. The portfolio of compounds was also simulated with the dosing designs for each compound unchanged, but with the sample sizes per dose increased proportionally. Consistent with asymptotic theory, the Bayes posterior means and medians for λ approached the fixed population values with increasing sample sizes, but the approach to the population λ was monotone from below.

Summary
The combination of low signal-to-noise, a small number of doses, and limited dosing range often obscured the dose-response shape in individual studies, but when aggregated across studies and compounds, a consistent pattern of dose-response was present, and it was well described by the hyperbolic E max model (Question 1, Section 1). In addition to displaying a consistent dose-response pattern, data from multiple protocols for the same compound were consistent, despite variation in placebo response. The consistent E max pattern may not generalize to other contexts. Gries et al. (1999), for example, described an example of a biologic compound with nonmonotone efficacy determined by its receptor binding. A meta-analysis of dose-response in toxicology studies by Calabrese and Baldwin (2001) suggested nonmonotone dose-response was more common.
The dosing range and the number of doses tested were inadequate to characterize the dose-response for most compounds (Question 2, Section 1). More than 1/3 of the compounds were tested on three or four nonplacebo doses, and more than 1/2 had a maximum dosing ratio less than 20. For some compounds, the limited dosing range was due to safety concerns, as noted for example, by Dutta, Matsumoto, and Ebling (1995), but even when there were safety concerns, investment of more resources in toxicology studies could sometimes expand the potential dosing range. Lower doses were also omitted from many studies. The proportion of compounds with the final ED 50 from the hyperbolic E max model below the lowest initial dose was 0.2. Although not captured in our data collection, the availability of different manufactured doses may be as important as safety and efficacy considerations when determining the dosing designs. Dosing ranges less than 20-fold are dubious given the uncertainty in the ED 50 at the initiation of dose-finding studies, and the nearly 100-fold range in doses between minimal and maximal response predicted by the hyperbolic E max model. The inadequacy of the designs is further supported by the frequent need to conduct multiple studies with additional doses.
Patterns in dose-response were summarized by the distributions of the E max model parameters (Question 3, Section 1). The modeling showed the Hill parameters were concentrated near 1.0. The assessment of this distribution between approximately (0.25, 1.5) is difficult because of the lack of identification within this range. Steeper dose-response curves represented by λ > 2 had low posterior predicted probability. This low probability depends on the assumption that our portfolio of compounds was a random sample from the collection of compounds that will be sampled by future drug developers. Dutta, Matsumoto, and Ebling (1995) reported without details on a meta-analysis yielding typical λ parameters closer to 2.0. Holford and Sheiner (1981) summarized an example from Winkle et al. (1976) and Meffin et al. (1977) that reported an estimated λ much greater than 2.0. All of these authors were evaluating concentration response rather than dose-response, so attenuation of the curves due to the use of dose rather than concentration might partially explain the steeper curves they observed.
These considerations suggest using the most diffuse of the posterior predictive distributions for λ. Alternatively, the hyperbolic model could be justified as the simplest model consistent with most past data, and it could be used unless the model is rejected by goodness of fit testing. The λ and ED 50 parameters for a new compound were approximately independent for each posterior predictive distribution in Table 2. The approximate independence was found even though the parameters for a compound with observed dosing data are sometimes highly dependent in their posterior distribution. The most diffuse posterior predictive distribution for λ (first column) can be approximated over a broad range of values by a beta distribution scaled to (0, 6) with parameters 3.03 and 18.15, computed by matching moments. The corresponding distribution for log(ED 50 /P 50 ) can be approximated by a t 3 distribution with mean 0.79 and scale parameter 0.6. In addition to their potential use in analyses, the distributions for (ED 50 , λ) also provide an empirical quantitative basis for model-based designs such as those in Dette et al. (2008).
The accuracy of the preliminary ED 50 estimates was conservatively assessed because the predicted ED 50 had to be inferred from the initial dosing design. The methods and data used for the preliminary estimates vary between compounds and therapeutic areas, so there could be important differences in methodology between sponsors. When a compound-specific estimate of precision is available, this information will dominate. Even in these situations, the results in Section 4 provide a useful check on the claimed precision, which can be difficult to assess.
The results in Section 4.4 displayed the multivariate sensitivity of the E max model parameter estimation to the specification of the prior distribution. The common practice of specifying diffuse prior distributions for the ED 50 and E max parameters when the λ parameter is also estimated favors model fits with smaller λ and larger ED 50 and |E max |. Our experience has been that the preliminary estimates of the ED 50 have utilized the hyperbolic E max model, which might explain why the preliminary ED 50 were best calibrated with the ED 50 estimated from the hyperbolic E max model.
There was one compound in the portfolio with nonmonotone dose-response. Despite demonstrated efficacy for the compound and another in same therapeutic class, both compounds were terminated. Replication of this and the other findings here using dose-response studies conducted by other large sponsors would be helpful. While graphical inspection and goodness of fit testing are warranted for all compounds, routine planning for nonmonotone dose-response was not indicated by the data in a large and diverse portfolio, or by the likely utility of compounds with nonmonotone efficacy.

Supplementary Materials
Summaries and additional information on the compounds discussed in the article. [Received October 2013. Revised March 2014