A pool-adjacent-violators-algorithm approach to detect infinite parameter estimates in one-regressor dose–response models with asymptotes

Binary response models are often applied in dose–response settings where the number of dose levels is limited. Commonly, one can find cases where the maximum likelihood estimation process for these models produces infinite values for at least one of the parameters, often corresponding to the ‘separated data’ issue. Algorithms for detecting such data have been proposed, but are usually incorporated directly into in the parameter estimation. Additionally, they do not consider the use of asymptotes in the model formulation. In order to study this phenomenon in greater detail, we define the class of specifiably degenerate functions where this can occur (including the popular logistic and Weibull models) that allows for asymptotes in the dose–response specification. We demonstrate for this class that the well-known pool-adjacent-violators algorithm can efficiently pre-screen for non-estimable data. A simulation study demonstrates the frequency with which this problem can occur for various response models and conditions.


Introduction
When fitting a quantal dose-response model to observed data, it is common to use maximum likelihood for estimating the model parameters. [1,Section 7.2] In many cases, however, anomalies can arise in obtaining the maximum likelihood estimator (mle) when the model is highly nonlinear. To illustrate, consider the following Monte Carlo-based scenario taken from Buckley and Piegorsch [2]: to study the small-sample behaviour of the mles under an Abbott-adjusted [i.e. positive lower asymptote; 3] Weibull dose-response model, ∼ Bin(n, π(x i )) with π(x i ) = γ 0 + (1 − γ 0 )(1 − exp(−e β 0 x β 1 )), we simulated 2000 data sets with n = 50. Four dose levels of x = 0, 0.25, 0.5, 1 were chosen, to match a common single-exposure design in cancer risk experimentation. [4] The true parameter values of γ 0 = 0.01, β 0 = −2.35, and β 1 = 1.63 were taken to correspond with a known *Corresponding author. Email: rcdeutsc@uncg.edu dose-response pattern having π(0) = 0.01, π( 1 2 ) = 0.04, and π(1) = 0.1. The mles of the three parameters were estimated using the function optim in R 2.15. [5] Investigating the fitted parameter values for all these 2000 replications, we found that a number of unusually large point estimates were evidenced for the parameter β 1 (ranging up to 3.219453 × 10 13 ) comprising about 17% of all the 2000 generated data sets. Clearly, when applied to real data any consequent inferences based on such inflated estimates could lead to improper or incorrect conclusions on the scientific phenomenon under study.
As discussed below, such anomalies in the model estimation process are affected by the underlying functional form of the dose-response function, the sample size, and the limited number of doses. Furthermore, the mle of the parameters in these cases actually contains infinite values that drive the estimated dose-response function to a degenerate structure: it is a step function and impossible to find using standard parameterizations.
The issue of infinite maximum likelihood estimates in quantal dose-response modelling is well known, and has received considerable attention in the literature. Wedderburn [6] and Haberman [7] provided a series of conditions for existence (i.e. finiteness) and uniqueness of the mle which apply to the class of generalized linear models. However, these authors gave no explicit means of detecting cases that could lead to infinite parameter estimates. Towards that end Silvapulle [8] demonstrated that finite parameter estimates in binomial response models require overlap to exist in the observed data, i.e. for any p regressor variables there does not exist an x 0 for which the entire data set is separated into responses only and non-responses only, with a possible 'quasi-separated' mixture of responses and non-responses at x 0 . This phenomenon is known as the separated data issue and was further generalized by Albert and Anderson, [9] who considered separated data effects for a wide class of logistic regression models. They also mentioned that their result applied in principle to many other quantal regression models, and they gave a linear programming algorithm for detecting separated data. Santner and Duffy [10] expanded on the Albert-Anderson approach, and discussed further the concept of quasiseparated data. Lesaffre and Albert [11] provided a non-existence theorem for the mle with partially separated data. Also, these authors looked at how to distinguish separated data from cases of multi-collinearity. Clarkson and Jennrich [12] considered new algorithms for detecting possible infinite parameter estimates, while Kolassa [13] presented an advanced algorithm to identify separated data. Numerous other works have since appeared, expanding on these earlier articles.
However, none of these results apply to an additional issue that can occur for binomial response models using asymptotes in the model formulation (such as anAbbott-adjustment).A further drawback of the references discussed above is that the detection of data leading to infinite parameter estimates is performed during the actual estimation procedure and/or requires a linear programming problem to be solved. When considering only one predictor variable this process may be inefficient and computationally expensive. As part of her unpublished dissertation, Buckley [14] provides preliminary work for this finding in the case of four design points and a lower asymptote -introducing the concept of specifiable degenerate functions (calling them specifiably separable), and proposing an ad hoc detection method.
Below, we generalize Buckley's results and relate this problem to the easily implemented pool-adjacent-violators (PAV) algorithm [15,Section 2.4] in order to detect both separated data and asymptote induced non-convergence with a single predictor for the dose, x i . This result applies to an arbitrary number of design points and a very broad class of non-decreasing doseresponse functions. We also allow for both lower and upper asymptotes in the dose-response model specification for π(x). Our results can inform sample size and model choice decisions, and may help explain seemingly odd parameter estimates that can occur in practice. They also allow for efficient pre-screening of such problematic data sets when conducting computer simulations, thus saving valuable computing time, especially in complex Monte Carlo scenarios.
In Section 2, we introduce the family of specifiably degenerate functions (sdfs) under which separated data may occur. In Section 3, we study their impact on fitting quantal dose-response models and also present necessary conditions to avoid degenerate fits. We further examine the impact of sdfs in a short simulation study in Section 4, and end with a discussion of these results.
The limit of the parameter sequence {β i : i ∈ N} can be a boundary value of the parameter space B and thus represents a non-permissible value of the parameter vector β. Also note that the limiting function, which we describe as degenerate or as a single-step function, will not be a cdf since it violates right-continuity. This definition applies to a wide range of cdfs which are commonly employed in the specification of response functions. In particular, it applies to all location-scale families: Let G((x − μ)/σ ) be a cdf and define β 1 = −μ/σ and β 2 = σ −1 . Then, for any x * ∈ R and g * ∈ (0, 1), let where G −1 (·) is the quantile function associated with G(·). In this case, the sequence of parameter vectors β i = (β 1,i , β 2,i ), i = 1, 2, . . . satisfies the conditions from Definition 2.1. This result includes the family of normal, logistic, and extreme value distributions, which are related to the popular probit, logit, and complementary log-log links in generalized linear modelling.
[1, Section 8.4.3] By substituting log(x) for x this also applies to the Weibull distribution.

Necessary conditions for a non-degenerate fit
To see how Definition 2.1 affects the process of fitting quantal dose-response models as in Equation (1), first note that our model specification orders the observed stimulus levels π 1 < π 2 < · · · < π q . Consider maximizing the log-likelihood function, which can be reduced to the quantity Where needed for simplicity, we write is required only to be non-decreasing, the non-parametric mle satisfying this restriction is given by the sequencẽ [16] This can be found using the well-known PAV algorithm: Weibull Logistic (2) Initializeπ i =π i (this will be modified to become the restricted mle).
Note that when no asymptotes are specified, a sequence of PAV-estimators containing no values distinct from 0 or 1 corresponds to the case of completely separated data, whereas a sequence with only a single estimate separate from 0 or 1 corresponds to the case of quasi-separated data. Theorem 3.1 can immediately be restated as:  (2), where G(x; β) is sdf and where the PAV-estimators satisfyπ 1 ≤π 2 ≤ · · · ≤π q . Then, a necessary condition for a non-degenerate mle is that at least two of the PAV-estimates of π i must be distinct fromγ 0 andγ 1 . In particular, if the conditions of the corollary are not met, any estimation attempt using the logistic, probit, complementary log-log or Weibull response functions in the parametric form (3) will fail by diverging. Also note that the above condition is necessary but need not be sufficient. There may be cases where estimation still fails for other reasons.

Simulation study
To examine how often this sort of degeneracy may result, we expanded upon the simulation mentioned in Section 1 and simulated data via Equations (1) and (2) β 1 ln x))). We studied a model's susceptibility in two aspects: number of specified asymptotes and number of dose levels used. For each base-model specification, we generated data using no asymptotes, only a lower asymptote, only an upper asymptote, and using both asymptotes. The dose levels  were studied over four designs: x = (0, 1 4 , 1 2 , 1), x = (0, 1 3 , 2 3 , 1), x = (0, 0.25, 0.5, 0.75, 1), and x = (0, 0.2, 0.4, 0.6, 0.8, 1). The parameterizations for β were then determined by fixing the response rates at x = 1 2 and at x = 1 for all models. The response rates represent variations on dose-response functions studied previously by Bailer and Smith [17] and Buckley and Piegorsch.
[2] All parameterizations chosen for the simulation models are summarized in Table 1.
For each of the 48 specified models and at each specified sample size n, N sim = 2000 data sets were generated. For every data set, the corresponding sequence of PAV-estimates was computed and it was determined whether the sequence violates the necessary conditions specified in Corollary 3.2. From the 2000 replications, the proportion of violating PAV-sequences (resulting in degenerative fits) was determined as an estimate of the probability of a degenerate fit given the current per-dose sample size and a specified model. (For full results, refer to the Supplementary Materials of this paper online.) Examining the results from the six model configurations, we observed that model configuration A (comprising models A1-A4; see Figures 1 and 2) was most prone to producing data sets resulting in degenerate fits. When no asymptote was used in the model, the proportion of degenerate fits could be as high as 0.75 at smaller sample sizes. This proportion dropped rapidly as n increased,  however, and at n ≥ 50 it became negligible for practical purposes over all four dose spacings. The proportions tended to be slightly higher for the Weibull model than for the logistic model. Increasing the number of dose levels also had a favourable impact, reducing the proportion of degenerate fits down to slightly above 40% at small n for q = 6 dose levels. The pattern was similar in the presence of a lower asymptote, but the decrease was a bit slower; n > 100 may be required for the Weibull model, with n > 300 required for the logistic model. Introducing an upper asymptote or two asymptotes had a much more severe impact, with more than 95% degeneracy at small sample sizes. The differences to no or one asymptote were most pronounced for n between 5 and 300. For small sample sizes, introducing two asymptotes had a similar effect as with the upper asymptote alone. However as n increased, the occurrence of degenerate fits consistently remained higher for two asymptotes. Depending on the number of dose levels, one could require as many as n = 3000 observations per dose to avoid degenerate fits altogether. The results for model configuration B were roughly similar to model configuration A; however, the proportions of degenerate fits tended to be slightly smaller. For model configuration C, the proportions of degenerate tended to be even smaller. For only q = 4 dose levels, however, introducing two asymptotes had a particularly severe effect: even at per-dose sample sizes between 50 and 1000 the incidence of degenerate fits was still 40% or higher (both spacings). Model configurations D, E and F were slightly less problematic (we display model configuration E in   In these cases having no or only one asymptote barely produced degenerate fits, especially when n ≥ 20. In these cases, the only real impact arose when two asymptotes were introduced. Also, an upper asymptote had more impact in the Weibull models than a lower asymptote. For the logistic models, the difference was a bit more ambiguous. Examining the differences between use of lower and upper asymptotes, it should be noted that (for the response probabilities in Table 1) introducing a lower asymptote tends to produce a doseresponse curve that is more similar to a dose-response curve without any asymptote. Conversely, introducing an upper asymptote produces a dose-response curve much more similar to a doseresponse curve with two asymptotes (see Figures 5 and 6 for an illustration with configuration A). We see this more as a particular characteristic of our simulation studies and not as a generalizable result.

Discussion
Herein, we have demonstrated that non-convergence issues apply to many popular non-decreasing quantal dose-response functions, including both separated data issues and asymptote induced degeneracy. We also provided a necessary condition that allows a researcher to quickly identify dose-response data resulting in degeneracy prior to starting any iterative parameter estimation routine and without solving linear programming problems. A simulation study was conducted to examine the susceptibility of a variety of dose-response models to producing degenerate fits. From these simulations, we can draw the following conclusions: • The incidence of degenerate fits shrinks as sample size increases.
• The incidence of degenerate fit also depends on the model; for flatter responses such as configurations A, B and C in Table 1) larger sample sizes may be required to avoid degenerate fits. • Increasing the number of dose levels has a greater impact on steeper models (such as configurations D, E and F) than on flatter ones. • Introducing asymptotes to a model greatly increases the occurrence of a degenerate fit (especially at lower sample sizes). Introducing two asymptotes has the most severe impact, especially on flat models. The impact of only one asymptote is more ambiguous and can range from a minor impact to an impact almost as severe as having two asymptotes.
For a researcher investigating new or existing statistical methodology based on dose-response functions, these results may prove quite valuable, especially when studying the operating characteristics of these methods via Monte Carlo simulation. Early filtering of data sets resulting in degenerate fits can save valuable simulation time and avoid unusual and undesired results. Our Corollary 3.2 proves particularly useful in this regard, since the calculation of the PAV-estimates is computationally inexpensive and quickly accomplished. To illustrate, our entire simulation study ran in less than 5 min on a laptop computer running Windows XP with a 2.5 GHz CPU and 3.5 GB of RAM. Due to the widespread use of the class of dose-response models we study here, we also expect that these results will be useful for applied researchers. Clear impacts on study design and model formulation are apparent: with the former, our simulation studies call for (not very surprisingly) either larger sample sizes or a greater number of dose levels, especially when a shallow dose-response curve is anticipated. The results also emphasize that any model formulation using response asymptotes should be considered carefully when only limited sample sizes are available.