Identification of the Minimum Effective Dose for Normally Distributed Endpoints Using a Model Selection Approach

When identifying the minimum effective dose (MED) or the lowest observed adverse event level (LOAEL), researchers usually employ multiple comparison procedures (MCPs). The ones preferred are the Dunnett-type difference-to-placebo and ratio-to-control test. In this article, we will use a model selection criterion, namely the generalized order-restricted information criterion (GORIC). The GORIC can evaluate a set of hypotheses regarding the response means directly and simultaneously, where in this case in each hypothesis a different dose is hypothesized to be the MED or LOAEL. It takes different patterns of increasing response with increasing dose into account without pooling response means, whereas MCPs with order restrictions do. The GORIC chooses the best hypothesis of the set leading to the identification of the MED and, depending on the set, to a specific pattern of response means. We will show, by simulation, that the GORIC has advantages in identifying the MED or LOAEL, besides theoretical ones. Supplementary materials for this article are available online.


Introduction
Selecting an appropriate dose for a new drug entity is a challenging problem. A specific aspect is the identification of the minimum effective dose (MED) for a primary efficacy endpoint in a randomized Phase II clinical trial (Committee for Proprietary Medicine Products 2002). A related problem is the identification of the lowest observed adverse event level (LOAEL) (Kodell 2009), which is important in toxicological risk assessment. The adverse event considered here is a normally distributed endpoint, such as body weight in a repeated toxicity study. Note that the MED and the LOAEL are statistically similar approaches: each determines the lowest dose for which the mean response differs significantly from the placebo or control and for which the consecutive doses have at least the same or increasing differences to the control. The two differ only in application: the first is often employed in efficacy measures and the latter in safety measures/adverse effects. In this article, we will refer to both as the MED, except in the application of a safety measure.
A pioneering article (Ruberg 1989) proposed different types of contrast tests to estimate the MED. However, Bauer (1997) mentioned the need for the strict monotonicity assumption for an unbiased MED estimate, and Hothorn and Hauschke (2000) demonstrated the possibility of false significant effects in smaller doses when pooling groups. Therefore, only the many-to-one comparison procedures for difference-to-placebo (Dunnett 1955) or for ratio-to-control (Dilba et al. 2004) yield unbiased estimates. Nevertheless, the evaluation of dose-finding trials by multiple pooling contrasts was recently recommended (Aras, Xue, and Liu 2011). Moreover, MED identification by means of statistical testing approaches can be misleading since it depends on the design of the study, particularly on sample size (Leisenring and Ryan 1992; Hung and Wang 2010).
© American Statistical Association. Statistics in Biopharmaceutical Research, February 2014, Vol. 6, No. 1. DOI: 10.1080/19466315.2013.847384
In this article, we introduce the model selection approach called the generalized order-restricted information criterion (GORIC; Kuiper, Hoijtink, and Silvapulle 2011) as an alternative to identify an MED. It evaluates hypotheses of interest, possibly including order restrictions, directly and simultaneously; and is oriented to select the best model. All kinds of models can be evaluated. But, for the identification of the MED, one should examine those where a specific dose is hypothesized to be the MED. Like other information criteria, it consists of a likelihood and a penalty part. The GORIC is a modification of the model selection approach of Yanagawa and Kikuchi (2001) in that it takes the monotonicity of the dose-response relationship into account (in determining the penalty) and not solely the number of parameters.
There are several differences between an MCP and the GORIC. First, the GORIC is able to take different patterns of increasing response with increasing dose into account (simultaneously), while an MCP rejects a null against an alternative in which two (possibly pooled) means differ. Second, the GORIC examines patterns of response means without the need to pool means, whereas an MCP with order restrictions requires pooling means. Third, where an MCP leads to a dichotomous decision of a specific dose being an MED or not, the GORIC (weights) can also render relative support. Fourth, an MCP controls the familywise error rate (FWER); that is, the probability of making one or more false discoveries. In contrast, the GORIC/model selection does not (and cannot), since it serves another goal. In model selection, all hypotheses are of equal importance, while the null hypothesis is of utmost importance in hypothesis testing. The latter controls the FWER to ensure that the probability of making one or more false discoveries does not exceed the nominal alpha level (often set to 0.05). In model selection, one evaluates hypotheses directly and chooses the best of a set; thus in that sense there is no need for FWER control. Notably, in model selection, the equivalent of the Type I error (i.e., incorrectly rejecting a true null hypothesis) does exist; namely, not selecting the true null hypothesis (when it is included in the set). This is also not controlled for, since the null hypothesis does not serve a special role here.
To facilitate the description of the two methods, we will illustrate them by means of two data examples, which are introduced in Section 2. A brief description of the Dunnett-type difference-to-placebo test, referred to here as the Dunnett test, and the GORIC will be given in Section 3 for identifying the MED. Subsequently, in Section 4, these methods will be applied to the two data examples in which the MED is searched for in the first and, due to the type of application, the LOAEL is obtained in the second. In Section 5, the Dunnett test, accompanied by the Williams test for order-restricted comparison against control (Williams 1971), and the GORIC will be examined by means of simulation. Here, it is shown that the Williams procedure is biased (as are other to-the-left pooling contrasts). Therefore, this method is not elaborated upon in the Methods and Examples sections (i.e., Sections 3 and 4, respectively). We end with alternatives for LOAEL and some extensions of (and software for) the GORIC in Section 6 and with a discussion in Section 7.

Motivating Examples
The first data example is part of a dose-response clinical trial regarding a drug to treat angina pectoris. The primary efficacy endpoint is the change in pain-free walking from pretreatment (measured in minutes); that is, the duration of pain-free walking after treatment relative to the values before treatment. The data were taken from Westfall et al. (1999, p. 164) and are available under the name "angina" in the R package mratios. Figure 1 displays the boxplots of change in pain-free walking for four dose groups and zero-dose placebo. Bear in mind that large values indicate positive effects on patients.
As a second data example, a chronic toxicity study on Mosapride Citrate (Fitzhugh, Nelson, and Quaife 1964) was selected, regarding the relative liver weight (relative to the body weight) of dogs. Due to the application, one searches for the LOAEL here (instead of the MED). These data were employed by Yanagawa and Kikuchi (2001) and West and Kodell (2005) and are available from Table 2 on page 320 of Yanagawa and Kikuchi (2001). Figure 2 displays the boxplots of relative liver weight of dogs for three Mosapride Citrate levels and the zero-dose control.

Definition of MED
Consider one-way layouts with the factor dose with levels $i = 0, 1, 2, \ldots, k$, where the placebo group or the negative control group is coded as a zero-dose control (i.e., $i = 0$). For instance, in the first example, $k = 4$ and, in the second, $k = 3$. Furthermore, let $Y_{ij}$ be a primary (efficacy or safety) endpoint which is normally distributed with mean $\mu_i$ for group $i$ and a homogeneous variance $\sigma^2$, that is, $Y_{ij} \sim N(\mu_i, \sigma^2)$. The definition of the MED (or LOAEL) is
$$\mathrm{MED} = \min\{\eta \in \{1, \ldots, k\} : \mu_0 = \cdots = \mu_{\eta-1} < \mu_\eta \le \cdots \le \mu_k\}.$$
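The definition can be read as a rule applied to a vector of population means. The following is a minimal sketch of that rule; the function name `med_from_means` and the tolerance `tol` are illustrative assumptions and not part of the original article. It operates on known (e.g., simulated) means, not on estimates from data.

```python
from typing import Optional, Sequence


def med_from_means(mu: Sequence[float], tol: float = 1e-12) -> Optional[int]:
    """Return the MED implied by the means mu[0..k], or None if no dose qualifies.

    Dose eta is the MED when mu[0] = ... = mu[eta-1] < mu[eta] <= ... <= mu[k].
    """
    k = len(mu) - 1
    for eta in range(1, k + 1):
        # All means before eta equal the control mean.
        flat_before = all(abs(mu[i] - mu[0]) <= tol for i in range(eta))
        # Strict increase at eta.
        strict_jump = mu[eta] - mu[eta - 1] > tol
        # Non-decreasing means from eta onward.
        monotone_after = all(mu[i + 1] - mu[i] >= -tol for i in range(eta, k))
        if flat_before and strict_jump and monotone_after:
            return eta
    return None


print(med_from_means([0.0, 0.0, 1.0, 2.0]))  # dose 2 is the MED
print(med_from_means([0.0, 0.0, 0.0, 0.0]))  # null model: no MED
```

A non-monotone profile such as `[0.0, 1.0, 0.5, 2.0]` returns `None`, because no dose satisfies the full ordering in the definition.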

Contrast Tests
In a testing framework, the global null hypothesis $H_0: \mu_0 = \mu_1 = \cdots = \mu_k$ can be compared to a global order-restricted alternative where $\eta$ is the unknown MED. For instance, for $k = 3$, the global alternative can be decomposed into seven elementary order-restricted alternatives (Bretz 1999):
$$
\begin{aligned}
H^a_{\eta=1} &: \mu_0 < \mu_1 = \mu_2 = \mu_3, \\
H^b_{\eta=1} &: \mu_0 < \mu_1 < \mu_2 = \mu_3, \\
H^c_{\eta=1} &: \mu_0 < \mu_1 = \mu_2 < \mu_3, \\
H^d_{\eta=1} &: \mu_0 < \mu_1 < \mu_2 < \mu_3, \\
H^a_{\eta=2} &: \mu_0 = \mu_1 < \mu_2 = \mu_3, \\
H^b_{\eta=2} &: \mu_0 = \mu_1 < \mu_2 < \mu_3, \\
H^a_{\eta=3} &: \mu_0 = \mu_1 = \mu_2 < \mu_3,
\end{aligned}
\tag{1}
$$
where $H^x_{\eta=e}$ denotes a specific pattern of response means, here labeled as $x$, when dose $e$ is the MED. A subset of these alternatives, namely $H^a_{\eta=1}$, $H^a_{\eta=2}$, and $H^a_{\eta=3}$, are change point alternatives (Hirotsu, Yamamoto, and Hothorn 2011), in which some response means are pooled in a testing framework (namely, doses 1, 2, and 3 in $H^a_{\eta=1}$ and doses 2 and 3 in $H^a_{\eta=2}$). As mentioned earlier, all testing approaches which pool means (e.g., contrasts) render biased MED estimations. For example, $H^a_{\eta=1}: \mu_0 < \mu_1 = \mu_2 = \mu_3$ can be false, while $\mu_0 < \mu_{123}$ is true, with $\mu_{123}$ the pooled mean (i.e., the [weighted] average) of $\mu_1$, $\mu_2$, and $\mu_3$. Therefore, these tests prefer low(er) doses as the MED and, thus, they can underestimate the MED.
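The pooling argument can be made concrete with a small numerical illustration. The means and group sizes below are hypothetical values chosen only for this sketch (they are not taken from either data example): the true MED is dose 3, yet the pooled mean of doses 1 to 3 exceeds the control mean, so a pooled contrast would point toward dose 1.

```python
from typing import Sequence


def pooled_mean(means: Sequence[float], sizes: Sequence[int]) -> float:
    """Weighted average of group means, with weights equal to the group sizes."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


# Hypothetical population means for doses 0..3; only dose 3 differs from control.
mu = [0.0, 0.0, 0.0, 3.0]
n = [10, 10, 10, 10]

pooled_123 = pooled_mean(mu[1:], n[1:])

# The change point pattern mu_0 < mu_1 = mu_2 = mu_3 does not hold ...
change_point_eta1_holds = mu[0] < mu[1] == mu[2] == mu[3]
# ... but the pooled comparison mu_0 < mu_123 does.
pooled_comparison_holds = mu[0] < pooled_123

print(pooled_123, change_point_eta1_holds, pooled_comparison_holds)
```

In this configuration the pooled comparison is true only because dose 3 pulls the average upward, which is the mechanism behind the underestimation of the MED described above.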

The Dunnett Test
The Dunnett test (Dunnett 1955) is an unbiased testing approach for the identification of the MED, which employs linear combinations of coefficients (here, means) of the $k + 1$ dose groups. In the first example, where $k = 4$, the one-sided many-to-one comparisons of Dunnett examine, for each dose $i = 1, \ldots, 4$, the alternative that the mean $\mu_i$ exceeds the placebo mean $\mu_0$; the corresponding hypotheses are tested simultaneously with a one-sided Dunnett test. This procedure is available in the R package multcomp (Bretz, Hothorn, and Westfall 2002), which renders so-called simultaneous p-values. The lowest dose for which the one-sided simultaneous p-value (adjusted for multiplicity, that is, controlling the FWER) is below the nominal alpha level (often, and also here, set to 0.05) is the MED. For the second example, the hypotheses tested to identify the LOAEL are equal to those used in the first example with the exception of the last hypothesis, since now $k = 3$.
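The decision rule in the last step (take the lowest dose whose adjusted one-sided p-value falls below alpha) can be sketched as follows. This is only the selection step; the adjusted p-values themselves are assumed to come from a Dunnett procedure such as the one in multcomp. The function name and the p-values in the usage lines are hypothetical and are not the values of Table 1.

```python
from typing import Optional, Sequence


def med_from_adjusted_pvalues(p_adj: Sequence[float], alpha: float = 0.05) -> Optional[int]:
    """Return the lowest dose whose adjusted p-value is below alpha, or None.

    p_adj[i - 1] is assumed to be the one-sided, multiplicity-adjusted
    p-value for the comparison of dose i with the zero-dose control.
    """
    for dose, p in enumerate(p_adj, start=1):
        if p < alpha:
            return dose
    return None


# Made-up adjusted p-values for doses 1..4, for illustration only.
example_p = [0.40, 0.09, 0.03, 0.001]
print(med_from_adjusted_pvalues(example_p))
```

With these illustrative numbers, doses 1 and 2 are not significant at 0.05, so dose 3 is returned.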

Model Selection for Order-Restricted Hypotheses
In contrast to testing various null hypotheses versus an alternative, model selection evaluates a set of hypotheses of interest directly and simultaneously. One such model selection technique is the generalized order-restricted information criterion (GORIC; Kuiper, Hoijtink, and Silvapulle 2011, 2012). The GORIC is, like the AIC (Akaike 1973) and ORIC (Anraku 1999), a trade-off between the fit of the hypothesis in the data (which is related to the likelihood) and the complexity of the hypothesis (which is related to the number of distinct parameter values). The calculation of the GORIC is given in Appendix 7.
The GORIC selects the best out of a set of hypotheses/models. One can best construct this set based on theory/expectations. For instance, when $k = 3$, one could examine the seven elementary order-restricted alternatives specified in Equation (1) if one is interested in which of those is the correct ordering. If one is just interested in the change point alternatives $H^a_{\eta=1}$, $H^a_{\eta=2}$, and $H^a_{\eta=3}$, then one should inspect only those (notably, without the necessity of pooling means). The hypothesis with the lowest GORIC value is the preferred one (of the set).
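To make the fit part of this trade-off concrete, the sketch below computes an order-restricted log-likelihood for the single hypothesis $\mu_0 \le \mu_1 \le \cdots \le \mu_k$ with a common variance, using the pool-adjacent-violators algorithm. Several points are assumptions of this sketch rather than statements from the article: the function names are hypothetical, hypotheses containing equality constraints would additionally require merging the equal groups first, the penalty is treated as a given input (its computation is not attempted here), and the criterion is assumed to have the usual information-criterion form of minus twice the log-likelihood plus twice the penalty, which agrees with the lowest value being preferred.

```python
import math
from typing import List, Sequence


def pava_increasing(means: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted least-squares fit of the means under mu_0 <= mu_1 <= ... <= mu_k."""
    blocks = []  # each block: [weighted mean, total weight, number of groups]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool adjacent blocks while they violate the increasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


def order_restricted_loglik(groups: Sequence[Sequence[float]]) -> float:
    """Normal log-likelihood maximized under mu_0 <= ... <= mu_k, common variance."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    mu_tilde = pava_increasing(means, sizes)
    n_total = sum(sizes)
    rss = sum((y - mu) ** 2 for g, mu in zip(groups, mu_tilde) for y in g)
    sigma2 = rss / n_total  # maximum likelihood estimate of the common variance
    return -0.5 * n_total * (math.log(2 * math.pi * sigma2) + 1)


def goric_value(orll: float, penalty: float) -> float:
    """Assumed form: -2 * (order-restricted log-likelihood - penalty)."""
    return -2.0 * (orll - penalty)


print(pava_increasing([1.0, 3.0, 2.0], [1, 1, 1]))
```

When the sample means already satisfy the ordering, the restricted fit coincides with the unrestricted one; when they violate it, adjacent groups are averaged, which lowers the log-likelihood.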
To protect against selecting the best out of a set of weak hypotheses (i.e., hypotheses not strongly supported by the data), one should always incorporate the unconstrained hypothesis (i.e., the hypothesis with no restrictions on the parameters, also known as the classical or omnibus alternative hypothesis). Then, the GORIC selects (i) the correct hypothesis (if it is included), (ii) a similar one (i.e., a hypothesis which resembles the true hypothesis, that is, only differs in a few constraints), or (iii) the unconstrained hypothesis. Note that, if the set does not contain the null model and the number of observations is high enough, the unconstrained model will safeguard against preferring a noneffective dose. One could also include the null in the set to protect against selecting a noneffective dose.
To improve interpretation, we will report the GORIC weights ($w_m$), which are elaborated upon in Appendix 7. These weights reflect the relative support of one hypothesis of interest in comparison to the whole set. Moreover, the ratio of two weights gives the relative support of these two hypotheses: $H_m$ is $w_m / w_{m'}$ times more likely than $H_{m'}$.

Hypotheses of Interest
We believe that there are often two types of research questions: One wants to identify the MED level or one wants additionally to obtain information about the pattern of the response means, which can be useful in clinical trials. In the latter case, one should evaluate all hypotheses based on the decomposition of the global order-restricted alternative $H_A^{\text{ordered}}: \mu_0 \le \mu_1 \le \cdots \le \mu_k$, with at least one strict inequality constraint, and the unconstrained hypothesis $H_u$. This will lead to a preferred pattern/shape of response means for a specific MED level. For instance, in Equation (1), $H^a_{\eta=1}$ to $H^d_{\eta=1}$ represent patterns where dose 1 is the MED and model selection might lead to preferring one of those four patterns. In the first case, when one is not interested in the specific pattern of the response means, but solely in the identification of the MED, one should examine only those with a monotonic increase of means after $\mu_\eta$ (e.g., $H^d_{\eta=1}$, $H^b_{\eta=2}$, and $H^a_{\eta=3}$ in Equation (1)). For those who are mathematically inclined, it holds true for the GORIC that "<" equals "≤." Hence, in Equation (1), $H^a_{\eta=1}$ is a subset/special case of $H^d_{\eta=1}$. Therefore, when $H^a_{\eta=1}$ is true, $H^d_{\eta=1}$ will be true as well; this is reflected by the same likelihood value. This also holds true for the other patterns (here: $H^b_{\eta=1}$ and $H^c_{\eta=1}$). Thus, the hypothesis with a monotonic increase of means after $\mu_\eta$ (e.g., $H^d_{\eta=1}$ in Equation (1)) covers all patterns of $\eta$ ($= 1$). Notably, when both $H^a_{\eta=1}$ and $H^d_{\eta=1}$ are true, the hypotheses are distinguished by their complexity; where here pattern $a$ has a lower penalty value than $d$ and is thus preferred. This is, besides a theoretical one, the reason why one should examine the patterns separately when it is of interest. Namely, it might be possible that pattern $H^a_{\eta=1}$ is the correct one, but that the support for $H^d_{\eta=1}$ is not convincing (compared to, say, $H^b_{\eta=2}$).
In both examples, we will first examine all hypotheses based on the decomposition of the global order-restricted alternative and the unconstrained hypothesis ($H_u$). For brevity, the sets of models are not displayed here, but can be found in Table 2 for the first example and in Table 4 for the second one. Subsequently, we will investigate solely the hypotheses with a monotonic increase of means after $\mu_\eta$. For the first example, the set then comprises $H^h_{\eta=1}$ and its counterparts for the higher doses (among them $H^d_{\eta=2}$ and $H^b_{\eta=3}$), together with $H_u$; the corresponding set in Table 4 will be examined in the second example.
To make sure that two means differ by at least a specific amount, say $\delta$, one could, for the second example, where $k = 3$, evaluate a set of hypotheses whose constraints include $\delta$ (Equation (2)). This type of set is not employed in the examples, but will be used in the simulation study (except for $H_u$).

Example 1: Change in Pain-Free Walking
Performing the one-sided Dunnett test renders the so-called one-sided simultaneous p-values. Table 1 shows that for $\alpha = 0.05$, when taking multiplicity into account (hence, based on adjusted p-values), dose $\eta = 3$ is the MED. Table 2 (for now, disregard the last column) shows, for each of the elementary order-restricted alternatives and $H_u$, the values for the order-restricted log-likelihood, the penalty, the GORIC, and the GORIC weights regarding change in pain-free walking. Since $H^h_{\eta=1}: \mu_0 < \mu_1 < \mu_2 < \mu_3 < \mu_4$ has the lowest GORIC value and, therefore, the highest GORIC weight, it is the preferred hypothesis of the set. Moreover, $H^h_{\eta=1}$ is about 0.275/0.018 ≈ 15.28-fold more likely than the unconstrained hypothesis. Hence, $H^h_{\eta=1}$ is not a weak hypothesis. Thus, according to the GORIC, based on these data and this set of hypotheses, dose $\eta = 1$ is the MED, and the best ordering of the response means is $\mu_0 < \mu_1 < \mu_2 < \mu_3 < \mu_4$. However, it has only 0.275/0.130 ≈ 2.12 times more support than $H^d_{\eta=2}: \mu_0 = \mu_1 < \mu_2 < \mu_3 < \mu_4$, the best hypothesis for $\eta = 2$. One could increase confidence in rejecting this pattern as the preferred one by investigating a larger dataset. Remarkably, $H^h_{\eta=1}$ is 0.275/0.034 ≈ 8.09 times more likely than the best hypothesis for $\eta = 3$, the MED chosen by the one-sided Dunnett test.
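The support ratios quoted in this paragraph follow directly from the reported GORIC weights. As a small arithmetic check, the sketch below recomputes them; the dictionary keys and the helper name `support_ratio` are labels chosen for this illustration, while the four weights are the ones stated in the text.

```python
def support_ratio(weight_a: float, weight_b: float) -> float:
    """Relative support of hypothesis a over hypothesis b (ratio of GORIC weights)."""
    return weight_a / weight_b


# GORIC weights as quoted in the text for the first example.
reported_weights = {
    "H^h_eta1": 0.275,   # preferred hypothesis, MED = 1
    "H^d_eta2": 0.130,   # best hypothesis for MED = 2
    "best_eta3": 0.034,  # best hypothesis for MED = 3
    "H_u": 0.018,        # unconstrained hypothesis
}

best = reported_weights["H^h_eta1"]
ratios = {
    name: round(support_ratio(best, w), 2)
    for name, w in reported_weights.items()
    if name != "H^h_eta1"
}
print(ratios)
```

Because the inputs are rounded weights, ratios recomputed this way can differ slightly from ratios based on unrounded values.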
The GORIC weights for the hypotheses in which all patterns leading to a specific dose are combined are reported in the last column of Table 2. Also now, $H^h_{\eta=1}$ is the preferred hypothesis of the set and, thus, dose $\eta = 1$ is the MED. Since the hypotheses in this smaller set are the ones with the highest support for each $\eta$ in the whole set, there are practically no differences in relative weights. That is, $H^h_{\eta=1}$ is here about 0.598/0.038 ≈ 15.18 times more likely than the unconstrained hypothesis (and is, therefore, not a weak hypothesis). Furthermore, it has only 0.598/0.282 ≈ 2.12 times more support than $H^d_{\eta=2}$, where the MED is 2. One could increase confidence in dismissing dose $\eta = 2$ as the MED by inspecting a larger dataset. Moreover, $\eta = 1$ (i.e., $H^h_{\eta=1}$) is 0.598/0.075 ≈ 7.97 times more likely than $\eta = 3$ (i.e., $H^b_{\eta=3}$), the MED chosen by the one-sided Dunnett test.
Note: ORLL = order-restricted log-likelihood; GORIC = generalized order-restricted information criterion = −2 (ORLL − penalty).

Example 2: Relative Liver Weight
The one-sided Dunnett test selects dose $\eta = 3$ as the LOAEL in the chronic toxicity study regarding the relative liver weight (relative to body weight) of dogs. Table 4 (for now, disregard the last column) shows, for each of the seven elementary order-restricted alternatives and $H_u$, the values for the order-restricted log-likelihood, the penalty, the GORIC, and the GORIC weights regarding the relative liver weight of dogs. Since $H^c_{\eta=1}: \mu_0 < \mu_1 = \mu_2 < \mu_3$ has the lowest GORIC value and, therefore, the highest GORIC weight, it is the preferred hypothesis of the set. Moreover, $H^c_{\eta=1}$ is about 0.267/0.037 ≈ 7.22 times more likely than the unconstrained hypothesis (i.e., any ordering). Hence, it is not a weak hypothesis. Consequently, according to the GORIC, dose $\eta = 1$ is the LOAEL and $\mu_0 < \mu_1 = \mu_2 < \mu_3$ is the preferred ordering of the response means. Notably, all hypotheses with $\eta = 1$ have at least 0.133/0.068 ≈ 1.96 times more support than those with $\eta = 2$ or 3; and $H^c_{\eta=1}$ itself is at least 0.267/0.068 ≈ 3.93-fold more likely than any hypothesis with $\eta = 2$ or 3.
Alternatively, one could also inspect the set consisting of just $H^d_{\eta=1}$, $H^b_{\eta=2}$, and $H^a_{\eta=3}$ (where all patterns leading to dose $\eta$ are combined), accompanied by $H_u$. The GORIC weights for this set are reported in the last column of Table 4. Now, $H^d_{\eta=1}$ is the favored hypothesis of the set. As in the set comprising all alternatives, dose $\eta = 1$ is the LOAEL. Furthermore, $H^d_{\eta=1}$ is not a weak hypothesis, since it is about 0.587/0.094 ≈ 6.24 times more likely than the unconstrained hypothesis. Moreover, $H^d_{\eta=1}$ (i.e., LOAEL of 1) has here about 0.587/0.144 ≈ 4.08 and 0.587/0.174 ≈ 3.37 times more support than $H^b_{\eta=2}$ (i.e., LOAEL of 2) and $H^a_{\eta=3}$ (i.e., LOAEL of 3, which was chosen by the one-sided Dunnett test), respectively.

Simulation Study
The two examples show that the results of the Dunnett test differ from those of the GORIC. Since it is not known in these examples what the true MED (or LOAEL) is, a simulation study will be performed to obtain more insight into which method performs the best (under which circumstances).
The performance of the GORIC is compared with that of the Dunnett- and Williams-type multiple contrast tests by simulating the MED selection frequency based on a simple, artificial dose-response experiment with 10 observations at each of four increasing, equidistant dose levels. A convex, linear, semiconcave, and concave dose-response profile is assumed, similar to the power calculation examples shown in Genz and Bretz (1999), to cover a wide spectrum of dose-response settings. To study the sensitivity of detecting the true model for each method, the transition from the null model with equal means at each dose level to a clearly detectable dose-response shape is implemented by increasing a noncentrality parameter, as described in Genz and Bretz (1999). In addition, we scaled this noncentrality parameter to obtain the same effect sizes (Cohen 1992) for each dose-response profile.
The set of linear constraints in Equation (2), except for $H_u$, will be examined by the GORIC. To maintain at least a related set of hypotheses to compare the GORIC with multiple testing procedures, $H_0$ is included in the set of hypotheses and $H_u$ is not. The four models, $H_0$ to $H_{\eta=3}$, correspond to an MED at dose levels 0 to 3, respectively. The parameter $\delta$ denotes a relevance margin, which increases the distance of $H_{\eta=1}$, $H_{\eta=2}$, and $H_{\eta=3}$ to $H_0$. The same relevance criterion is introduced to the right-hand side of the hypothesis definition of the multiple contrast tests to maintain a rough comparability of the three approaches.
For each of 10,000 simulation runs per setting, the frequency of selecting a specific MED is computed and accordingly the selection rate of the true MED is calculated. The selection rates for the linear true dose-response profile are shown in Figure 3. The results for the convex, semiconcave, and concave dose-response shapes are given in the supplementary material, available online.
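The bookkeeping of such a simulation (generate data, let a method pick an MED, tally the selection frequencies) can be sketched as below. Everything specific in this sketch is an assumption made for illustration: the function names, the true means, the common standard deviation, and in particular `naive_selector`, which is a placeholder rule and not the GORIC, the Dunnett-type test, or the Williams-type test. A real comparison would plug each of those methods in as the `selector`.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Selector = Callable[[List[List[float]]], int]


def simulate_selection_rates(
    true_means: Sequence[float],
    selector: Selector,
    n_per_group: int = 10,
    sigma: float = 1.0,
    n_runs: int = 10_000,
    seed: int = 1,
) -> Dict[int, float]:
    """Estimate how often each dose level (0 = null model) is selected as the MED."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_runs):
        # One simulated experiment: normal data with a common standard deviation.
        data = [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in true_means]
        counts[selector(data)] += 1
    return {dose: counts[dose] / n_runs for dose in range(len(true_means))}


def naive_selector(data: List[List[float]], delta: float = 0.5) -> int:
    """Placeholder rule (NOT one of the methods compared in the text): lowest dose
    whose sample mean exceeds the control mean by more than delta; 0 if none does."""
    means = [sum(g) / len(g) for g in data]
    for dose in range(1, len(means)):
        if means[dose] - means[0] > delta:
            return dose
    return 0


# Hypothetical true means for doses 0..3; the true MED is dose 2.
rates = simulate_selection_rates([0.0, 0.0, 1.0, 2.0], naive_selector, n_runs=500)
print(rates)
```

The selection rate of the true MED is then the entry of `rates` at the true dose level, here `rates[2]`.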
With an increasing noncentrality parameter, both the GORIC and the Dunnett-type MCP are able to detect the true underlying MED. In contrast, the Williams-type MCP has difficulty detecting the MED for a semiconcave dose-response shape, as there is no specific hypothesis/contrast available for this case. This can also be seen from the corresponding figure. Note that the four figures show that the Williams-type MCP indeed prefers low(er) doses as the MED, which can lead to the underestimation of the MED in some populations.
The GORIC and MCP approaches can be clearly distinguished by the rate of detecting the null model at a noncentrality parameter of zero. The MCP procedures are defined to select the true null model with a Type I error of α, whereas the model selection procedure does not give a special weight to the null model, but treats it like any other model in the set. Furthermore, when the nonnull model is true, the GORIC often has a higher true-hypothesis rate than the other two methods. It is also less dependent on the noncentrality parameter than the other two and almost independent of it after a certain value (especially for the convex, semiconcave, and concave dose-response shapes). This suggests favoring the GORIC when one does not expect the null model to be true beforehand. However, bear in mind that the three methods do not evaluate the same (set of) hypotheses. Thus, besides these properties, one should base the comparison on other characteristics of the three methods. One distinction between the methods is the hypothesis of interest. With MCPs, one rejects a null of two means being equal or not (a dichotomous decision), which leads to the MED. Note that often pooled means are required, which can lead to false significant effects in smaller doses (Hothorn and Hauschke 2000). The GORIC evaluates a set of theories/hypotheses/ordering of all the means directly and simultaneously. Thus, it can examine very specific orderings of means, which can also lead to the identification of the MED (and might give extra information regarding the pattern of the response means). Here, no pooled means are needed. In addition, via the GORIC weights, one obtains relative support for each pattern of means and/or the value of the MED. Another difference is that MCPs control for the Type I error or FWER, whereas the GORIC does not. But, since the GORIC evaluates orderings of means directly and simultaneously, there is no need for controlling the FWER. 
However, when "finding" the null, when it is true, is of most importance, one should employ an MCP. When the importance lies more in finding a nonnull hypothesis, the GORIC is preferred.

Alternatives for LOAEL in Toxicological Risk Assessment
According to Hothorn and Hasler (2008), the no-observable-adverse-effect level (NOAEL), controlling the more relevant false-negative error rate, is of more interest than the LOAEL. Additionally, the benchmark dose approach, especially the benchmark dose lower confidence limit approach, is preferred over the LOAEL identification approach (Kodell 2009). However, the definition of an appropriate benchmark response level is not clear: see the recently proposed background noise approach based on the difference between the upper and lower bounds of the two-sided 90% confidence interval curves (Sand, Portier, and Krewski 2011). Because of the lower influence of the design, particularly of the noncentrality parameter and thus also of the sample sizes, the identification of the LOAEL by means of the GORIC instead of an MCP is now an alternative to the benchmark dose concept in toxicological risk assessment. However, it should be stressed that, up to now, the GORIC is restricted to normally distributed endpoints.

Extensions of the GORIC
There is a version of the GORIC applicable to heterogeneous variances (Kuiper 2011a; Kuiper, Hoijtink, and Silvapulle 2012). For ANOVA models, the adjustment to the GORIC is straightforward. Although there is not really a biological reasoning behind assuming different variances per dose in the described examples, Table 5 displays the results of the GORIC apt for heterogeneous variances for the second example. In this case, $H^c_{\eta=1}: \mu_0 < \mu_1 = \mu_2 < \mu_3$ is still the preferred hypothesis/ordering of means, where the MED is dose 1. Hence, for this example, the GORIC versions for homogeneous and heterogeneous variances render the same conclusion. The small sample size of $n_i = 6$, in the second example, is particularly limiting for testing approaches. Fortunately, there exists a small-sample corrected GORIC (Kuiper 2011a, 2013), which especially performs better in regression models, when the total number of observations ($N$) is small (it is shown for $N = 10$ and 20). Notably, in ANOVA models, the total number of observations equals the sum of the group sizes (i.e., $N = \sum_{i=0}^{k} n_i$). In the example, $N = 4 \times 6 = 24$; therefore, the results of the GORIC resemble those of the small-sample corrected GORIC here.

Software
The GORIC is available in the R package goric. There, it can be applied to ANOVA models. There also exists a stand-alone, free-to-use software application of the GORIC for (multivariate) regression models and, thus, ANOVA models. This is downloadable from http://www.uu.nl/staff/RMKuiper.

Discussion
Both multiple comparison procedures (MCPs) and the generalized order-restricted information criterion (GORIC) can be used for the identification of the minimum effective dose (MED) or the lowest observed adverse event level (LOAEL), but are based on different principles with substantially different properties. It should be stressed first that the GORIC and MCP serve other goals. Namely, the GORIC selects the best out of a set of hypotheses, whereas the MCP rejects the null hypothesis (of two means being equal) or not (in favor of an alternative). Since the GORIC examines the ordering of all response means, there is no need to pool response means or to hypothesize only the relation between two means instead of all of them. Furthermore, MCPs control the familywise error rate (FWER), while the GORIC controls neither false-negative nor false-positive error rate at all, since the null hypothesis has no special interest. Shang (2010) stated about the posterior probabilities of the Schwarz information criterion (SIC), which are comparable to the GORIC weights, that "due to the simplicity of the utilized priors for the SIC variant, no adjustments in inferences are needed in conducting multiple comparisons." Besides that, we believe that no FWER control is required, because the ordering of the means representing a specific MED value is investigated directly and simultaneously. Since the null hypothesis plays no pivotal role in model selection, there is also no (need for a) Type I error control.
From the examples, it can be seen that by including hypotheses to the set, the weights for a specific pattern of means will not increase and generally decrease. Thus, only evaluate all alternatives when you are interested in the patterns (as well). However, at least in these examples, (i) the relative weights are not influenced considerably and (ii) the support regarding a specific MED (including all possible patterns) remains about the same in both sets. In that sense, there seems to be no need for adjusting for the number of hypotheses in the set.
Simulation shows, among other things, that the GORIC often has a higher true-hypothesis rate, when a nonnull hypothesis is true, and that it is particularly less dependent on the noncentrality parameter and thus on the sample sizes. But, bear in mind that the methods do not evaluate the same set of hypotheses. Nevertheless, the GORIC also has theoretical advantages, like evaluating a possible MED directly in one hypothesis and not requiring pooled means. However, when a researcher expects the null model to be true beforehand or pays special importance to the null, it is better not to use the GORIC, since MCPs are designed to choose the null model frequently when it is true. Since we believe that the null will usually not be the true hypothesis (and due to the other advantages), we prefer the use of the GORIC.
To control the Type I error, one could employ a twostep approach as well. In the first step, evidence for an overall effect of an increase in dose levels on the endpoint is identified by a hypothesis test (at a specific Type I error rate). If a significant effect is found, the GORIC can be used in a second step to identify the MED.
From the examples, it can be seen that different types of sets can be evaluated with the GORIC. Which set should be employed in practice depends on the research question. The use of the set where one hypothesis contains all possible orderings representing a specific dose to be the MED is sufficient for identifying the MED. When one can specify a minimum difference or wants to evaluate a specific minimum difference (say, δ), one should use the set used in the simulation study accompanied by the unconstrained model; that is, the set displayed in Equation (2). In case one is interested in which of all the possible patterns (resulting from the decomposition of the global order-restricted alternative) is the best theory (besides the identification of the MED level), one should apply the GORIC to the set containing all of these.
might be that Shang (2010) showed that the use of probabilities (like the GORIC weights) is a more effective procedure for "multiple comparisons" than that based on the information criterion itself.
The weight $w_m$ is calculated by
$$w_m = \frac{\exp\left(-\tfrac{1}{2}\,(\mathrm{GORIC}_m - \mathrm{GORIC}_{\min})\right)}{\sum_{m' \in \mathcal{M}} \exp\left(-\tfrac{1}{2}\,(\mathrm{GORIC}_{m'} - \mathrm{GORIC}_{\min})\right)},$$
where $\mathcal{M}$ is the set of (say, $M$) hypothesis indices and $\mathrm{GORIC}_{\min}$ is the lowest GORIC value, that is, the GORIC value of the preferred model/hypothesis. Because the GORIC can be seen as a likelihood for Hypothesis $H_m$, the GORIC weight represents the relative likelihood (or, stated otherwise, the weight of evidence) of Hypothesis $H_m$ given the data and the set of (say, $M$) hypotheses. Hence, the weights reflect the relative support of one hypothesis of interest in comparison to the whole set. Moreover, the ratio of two weights gives the relative support of these two hypotheses, leading to statements as "$H_m$ is $w_m / w_{m'}$-fold more likely than $H_{m'}$".
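The weight calculation is short enough to sketch directly. In the snippet below, the function name `goric_weights` and the three GORIC values in the usage lines are hypothetical and serve only to show the mechanics; they are not values from Table 2 or Table 4.

```python
import math
from typing import Dict


def goric_weights(goric_values: Dict[str, float]) -> Dict[str, float]:
    """Turn GORIC values into weights that sum to one over the set of hypotheses."""
    g_min = min(goric_values.values())
    # Relative likelihood of each hypothesis compared with the preferred one.
    relative = {name: math.exp(-0.5 * (g - g_min)) for name, g in goric_values.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Made-up GORIC values for a set of three hypotheses, for illustration only.
example = {"H_1": 100.0, "H_2": 102.0, "H_u": 105.0}
weights = goric_weights(example)
print(weights)
print(weights["H_1"] / weights["H_2"])  # relative support of H_1 over H_2
```

Subtracting the lowest GORIC value before exponentiating does not change the weights, but it keeps the exponentials in a numerically safe range. The ratio of two weights depends only on the difference between the two GORIC values, not on which other hypotheses are in the set.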

Supplementary Materials
The supplementary material contains four figures (like Figure 3), each depicting the simulation results for one of the four true dose-response shapes (i.e., linear, convex, semi-concave, and concave). That is, each figure displays the simulated MED selection rates for the three methods (i.e., the GORIC and the Dunnett- and Williams-type multiple contrast tests) for one of the true dose-response shapes.