Using a Monotonic Density Ratio Model to Find the Asymptotically Optimal Combination of Multiple Diagnostic Tests

ABSTRACT With the advent of new technology, new biomarker studies have become essential in cancer research. To achieve optimal sensitivity and specificity, one needs to combine different diagnostic tests. The celebrated Neyman–Pearson lemma enables us to use the density ratio to optimally combine different diagnostic tests. In this article, we propose a semiparametric model by directly modeling the density ratio between the diseased and nondiseased population as an unspecified monotonic nondecreasing function of a linear or nonlinear combination of multiple diagnostic tests. This method is appealing in that it is not necessary to assume separate models for the diseased and nondiseased populations. Further, the proposed method provides an asymptotically optimal way to combine multiple test results. We use a pool-adjacent-violation-algorithm to find the semiparametric maximum likelihood estimate of the receiver operating characteristic (ROC) curve. Using modern empirical process theory we show cubic root n consistency for the ROC curve and the underlying Euclidean parameter estimation. Extensive simulations show that the proposed method outperforms its competitors. We apply the method to two real-data applications. Supplementary materials for this article are available online.


Introduction
Disease classification and prediction based on diagnostic tests have long been investigated by medical and biostatistical researchers. In disease screening, multiple diagnostic tests are commonly used and, in general, no single diagnostic test has satisfactory levels of sensitivity and specificity. One approach to improve the performance of screening is to combine multiple diagnostic tests so as to obtain an optimal composite diagnostic test with higher discriminative accuracy that detects the presence of the disease more accurately.
The receiver operating characteristic (ROC) technique has been widely used for assessing diagnostic test accuracy in disease classification (e.g., Zhou, McClish, and Obuchowski 2002;Pepe 2003) when the sampling distribution of the diagnostic variable is continuous. The ROC curve is defined as a plot of the sensitivity versus one minus the specificity across all possible cutpoints of the test results. The closer the curve follows the left-hand border, the more accurate the test. The classification performance for a test can be measured by the area under the ROC curve (AUC). In the literature, the estimation of an ROC curve and its area is based on either a parametric model, a semiparametric model, or a fully nonparametric model. For comprehensive reviews of recent developments in ROC curves see, for example, Zhou, McClish, and Obuchowski (2002), Pepe (2003), Krzanowski and Hand (2009), and Zou et al. (2011).
Under a multivariate normal assumption for different test results, Su and Liu (1993) found the best linear combination of multiple tests. By using the Neyman-Pearson fundamental lemma, Eguchi and Copas (2002), Copas and Corbett (2002) Pepe (2002) found the best way to combine different diagnostic tests using the log density ratio statistic for diseased and nondiseased populations. Kay and Little (1987) discussed various versions of the density ratio model for some conventional distributions. Qin and Zhang (2010) considered an exponential tilting model, in which they assumed that the log density ratio between the diseased and nondiseased populations is a combination of multiple diagnostic tests. For comprehensive reviews of recent developments in the combination of multiple diagnostic tests, see, for example, McIntosh and Pepe (2002), Qin and Zhang (2010), Liu and Zhou (2013), Barreno, Cardenas, and Tygar (2008), and Kim et al. (2013).
According to the celebrated Neyman-Pearson lemma, the optimal combination of different diagnostic tests is the likelihood ratio. Hence, to obtain the optimal combination, it is desirable to specify the density ratio correctly. The conventional parametric approach discussed above needs the correct specification of the distribution for multiple diagnostic tests, which may not be easy in practice, especially when the dimension of the tests is high. Furthermore, parametric models are not sufficiently robust if the underlying distribution for either the diseased or nondiseased population is misspecified. To improve the robustness of the estimation, we propose directly modeling the density ratio as a nonparametric function of a combination of multiple diagnostic tests. Specifically, we assume that the density ratio between the diseased and nondiseased populations is a monotonic nonparametric functional form of a combination of multiple diagnostic tests. Through this method, we are able to find the optimal combination of the tests; here, optimality is over the monotone functions of all linear combinations of multiple diagnostic tests. Subsequently, the ROC curve constructed from the likelihood ratio statistic is optimal (Eguchi and Copas 2002;McIntosh and Pepe 2002). This method is appealing in that it is robust to the misspecification of the underlying distribution for the diseased or nondiseased population (we do not even need to specify these distributions). It is also robust to the misspecification of the functional form of the density ratio. Furthermore, the proposed semiparametric maximum likelihood estimation procedure can easily be implemented using the well-known pooladjacent-violation-algorithm (PAVA; Ayer et al. 1955), which is available in the R packages Iso and isotone (R Development Core Team 2011). Asymptotic theory and empirical study demonstrate that the proposed method yields a consistent estimator of the optimal ROC curve and the AUC.
This article is organized as follows. In Section 2, we introduce the optimal combination of multiple diagnostic tests. In Section 3, we propose our semiparametric estimators of the optimal ROC curve and the optimal AUC; we also present some asymptotic results for the proposed estimators. In Section 4, we evaluate the performance of our method via simulation studies. In Section 5, we apply our estimators to two real-data examples, and Section 6 provides concluding remarks. The proofs of the main theoretical results are relegated to the Appendix and supplementary material.

Optimal Combination of Diagnostic Tests Using Monotonic Density Ratio Model
For a single diagnostic test, if F 1 and G 1 denote the distribution functions for the test result in the diseased and nondiseased populations, the ROC curve has the following representation: Suppose that there are K tests and/or covariates. Let X 1 , . . . , X n 0 denote independent and identically distributed K-vector test results and/or covariates from a nondiseased population, and let Y 1 , . . . ,Y n 1 represent independent and identically distributed K-vector test results and/or covariates from a diseased population. Let G(x) = P(X ≤ x|D = 0) and F (x) = P(Y ≤ x|D = 1) represent, respectively, the distribution functions of X 1 and Y 1 , and let f (x) and g(x) be the density functions corresponding to F (x) and G(x), where D = 1 refers to the disease status and D = 0 refers to the nondisease status. Qin and Zhang (2010) considered an exponential tilting model for the density ratio, where α is a scalar parameter, β is a p × 1 vector parameter, and h(x) is a p × 1 smooth vector function of x. Let U = α + β τ h(X ), F C (u) = P(U ≤ u|D = 1), and G C (u) = P(U ≤ u|D = 0). According to Eguchi and Copas (2002) and McIntosh and Pepe (2002), the optimal ROC curve is a plot of ROC C (s) = 1 − F C (G −1 C (1 − s)) against s ∈ [0, 1], whereas the area under the optimal curve ROC C (s) is given by AUC C = 1 0 ROC C (s)ds.
If the exponential tilting model (1) is violated, the ROC curve based on the combination U derived from (1) may not be optimal since the exponential function form in (1) may not hold. To improve the robustness of (1), we consider a semiparametric monotonic density ratio model, where ψ (·) is an unknown monotonic nondecreasing function, and v (x; β) = β τ h(x). For identifiability purposes, we do not include the constant term in v (x; β) and further we assume that β 1 , the first element of β, is 1. Because of the invariant property of ROC, under model (2), the optimal ROC curve is based on the combination v (X; β) = β τ h(X ). Our focus in this article is semiparametric estimation of the optimal curve ROC C (s) and the area under the curve AUC C under the semiparametric density ratio model (2). Lehmann and Rojo (1992) discussed some basic properties of the monotonic density ratio model for a single biomarker. In the case of K = 1, the estimation of ROC under this type of ordering constraint was studied by Carolan and Tebbs (2005). Under a less restrictive stochastic ordering restriction, Davidov and Herman (2012) constructed the ROC estimate and studied its large-sample properties. However, to the best of our knowledge, combinations of multiple biomarkers under a monotonic density ratio ordering restriction have not been discussed in the statistical literature. Our goal in this article is to fill this gap. Model (2) is also related to the monotonic single-index model discussed by, for example, Cosslett (1983) and Auh and Sampson (2006). Based on stochastic ordering theory (e.g., Dykstra, Kochar, and Robertson 1995), (2) implies that combinations v (x; β) of multiple tests x for a diseased group are stochastically larger than those for the nondiseased group.

The likelihood function is
The maximum empirical likelihood estimators of p i and q i , i = 1, . . . , n are then defined to be {p 1 , . . . ,p n ,q 1 , . . . ,q n } = arg max p 1 ,...,p n ,q 1 ,...,q n L subject to constraint (3). We assume n 1 /(n 0 + n 1 ) → λ ∈ (0, 1) as n → ∞. For simplicity, hereafter we write λ = n 1 /(n 0 + n 1 ) and assume that it is constant, since it does not affect our technical development. We reparameterize p i and q i by setting It follows that The empirical likelihood function can be rewritten as L = L 1 L 2 with For any fixed β, as argued by Dykstra, Kochar, and Robertson (1995), the maximum empirical likelihood estimatorsφ i of φ i areφ i = (m i + r i )/n. Hence, we can maximize L 1 to obtain the maximum empirical likelihood estimators of θ (·) and β. Let . Thenp i andq i are the maximum empirical likelihood estimators of p i and q i , respectively. Now we give an estimation procedure forθ (·) andβ. Specifically, given an initial value of β, we obtain estimates of θ (·) and β through the following steps: Step 1. For a given β, we profile θ to obtain which is carried out through the following three substeps: Step Step 1.2. Minimize M n (θ, β) with respect to θ . Following Dykstra, Kochar, and Robertson (1995), we can achieve this by minimizing with respect to θ , which can be solved using PAVA (Ayer et al. 1955) and active set methods (de Leeuw, Hornik, and Patrick 2009).
Step 1.3. Letθ (v (i) Step 2. Minimize M * n (β) with respect to β to obtainβ. This step can be easily implemented using existing software such as the R function optim() (R Development Core Team 2011) with the given initial value of β. Letβ andθ (v (t (i) ;β)) be the estimates obtained from the above procedure. The maximum empirical likelihood estimatê θ (u) is a step-function and equalsθ (v (t (i) Let F C (u; β) and G C (u; β) be the cumulative distribution function (cdf) of v (Y ; β) and v (X; β), respectively, where Y and X have the cdfs F (x) and G(x), respectively. We can estimate F C (u; β) and G C (u; β) bŷ The optimal ROC curve is and the area under the optimal curve is given by
Theorem 1. Assume Conditions A and B in the Appendix. We have With the results in Theorem 1, we proceed to establish the convergence rates forF C (u) andĜ C (u), which in turn are used to prove the convergence rates of ROC C (s) and AUC C .
Theorem 2. Assume Conditions A and B in the Appendix. We have With weaker technical conditions (i.e., only Condition A in the Appendix), we can show all the convergence results in Theorems 1 and 2 by reducing O p (n −1/3 ) to o p (1) (see Theorems A.1 and A.2). For presentational continuity, we have relegated all the technical conditions and long proofs to the Appendix.
Remark 1. The monotonic nonparametric component in the semiparametric model creates a significant difficulty in the technical development. In Theorem 1, we have merely established an upper bound for the convergence rate of β. This is a nonstandard convergence rate for the estimates of parametric parameters, and it results from that of the nonparametric monotonic estimate θ (·). We conjecture that it is possible to improve this rate, but it may be very difficult to do so. This is because β essentially appears in the argument of θ (·), and with the monotonic constraint, θ (·) is not differentiable. The available theory from M-estimators is not applicable.
Huang and Wellner (1993) encountered a similar challenge. They studied current status data under the accelerated failure time model assumption, and they proved only the o p (1) convergence of their estimators. In other words, although they were able to show consistency for the underlying distribution function and the Euclidean parameter estimation, they did not establish a convergence rate. Moreover, they conjectured that the maximum profile likelihood estimate of the Euclidean parameter has root-n consistency and is efficient. However, the best convergence rate for the underlying distribution function estimation is cubic root consistency. Therefore, the O p (n −1/3 ) convergence shown in this article is an important contribution. In applications, we suggest using a bootstrap method to obtain the variability of the proposed estimators.

Simulation Studies
In this section, we study the finite-sample performance of the semiparametric monotonic density ratio estimator ROC C (s) of the optimal curve ROC C (s) via simulation studies. We compare the following methods: (1) that based on the proposed semiparametric monotonic density ratio model (2) (i.e., the monotone density ratio method); (2) the exponential tilting method by Qin and Zhang (2010) (i.e., the exponential tilting method); (3) that based on the nonparametric logistic regression type models by Ma andHuang (2005, 2007), in which, basically for all possible linear combinations of biomarkers, they maximize the area of the ROC (Huang, Qin, and Fang 2011) (i.e., the MH method); (4) the nonparametric method based on best linear combinations by Vexler et al. (2006) (i.e., the nonparametric method); (5) the min-max linear combination method by Liu, Liu, and Halabic (2011) (i.e., the min-max method); (6) the linear combination based on the decision tree by Breiman et al. (1984) (i.e., the tree method); and (7) the smoothed empirical likelihood estimator by Chen, Vexler, and Markatou (2015) (i.e., the smoothed EL method). We consider several combinations of sample sizes (n 0 , n 1 ), namely, (30, 30), (200, 100), (300, 200), (1000, 500), and (1000, 1000), and we consider a combination of two biomarkers, X 1 and X 2 . Let X = (X 1 , X 2 ) τ . For each model configuration, we perform 2000 replications. To study the robustness of our method, we consider two distributions: (i) X follows a bivariate normal distribution; and (ii) X follows a bivariate exponential distribution. We have also experimented with lognormal distributions; the observations are similar to those from the bivariate exponential distributions and are omitted for brevity. We note that our method can be applied to any distribution.
We have obtained the ROC curve and β estimates for all the methods and simulation setups in the above and subsequent sections. However, to avoid lengthy output, in the following, we present results only for the sample sizes n 0 = n 1 = 1000 and three methods: our monotonic density ratio method, the exponential tilting method, and the MH method. We give the results of the other simulation studies in the supplementary document.

Biomarkers Follow Normal Distribution
First, we consider the case where the exponential tilting model (1) is correctly specified. Specifically, we posit that X|D = 1 ∼ N(μ 1 , 1 ) and In this setup, f (x) and g(x) are the probability density functions of N(μ 1 , 1 ) and N(μ 0 , 0 ), respectively. It is easy to verify that Let U = log{ f (X )/g(X )}. Then U is the optimal combination of X 1 and X 2 (Eguchi and Copas 2002;McIntosh and Pepe 2002). The closed forms of F C (u) = P(U ≤ u|D = 1) and G C (u) = P(U ≤ u|D = 0) may be complex or difficult to find. Instead, we use a Monte Carlo method with 10,000 repetitions from N(μ 1 , 1 ) and N(μ 0 , 0 ), respectively, to numerically calculate F C (u) and G C (u), which are then used to obtain the true curve ROC C (s) for s ∈ [0, 1]. The left panel of Figure 1 displays the true ROC curve and the average after 2000 replications of the estimated ROC curves based on the monotone density ratio method, the exponential tilting method, and the MH method. It can be seen that the estimated ROC curves from the three methods are close to the true ROC curve, indicating that our method based on the semiparametric monotonic density ratio model performs as well as the other two methods when the exponential tilting model is correctly specified.
Next, we study the robustness of our method. Specifically, we compare the proposed estimator with the estimators based on the other methods when the "gold standard" is not perfect, so that the sample from the diseased population may contain nondiseased individuals, and the nondiseased sample may contain diseased individuals. To this end, we posit that the diseased population has the density and the nondiseased population has the density where 1 − λ 1 is the proportion of nondiseased individuals contaminating the diseased population, and 1 − λ 0 is the proportion of diseased individuals contaminating the nondiseased population.
In this study, we posit f 1 (x) and f 0 (x) to be the probability density functions of the bivariate normal distributions N(μ 1 , 1 ) and N(μ 0 , 0 ) specified above, respectively.
which is a monotonically increasing function of U if λ 1 + λ 0 ≥ 1. Hence, U is still the optimal combination of X 1 and X 2 when λ 1 + λ 0 ≥ 1 (Eguchi and Copas 2002;McIntosh and Pepe 2002). It can be verified that the semiparametric monotonic density ratio model (2) is correctly specified, but the exponential tilting model (1) is misspecified when 1 ≤ λ 1 + λ 0 < 2. The right panel of Figure 1 displays the true ROC curve and the average after 2000 replications of the estimated ROC curves based on the monotonic density ratio method, the exponential tilting method, and the MH method, with λ 1 = 0.8 and λ 0 = 0.8. The true ROC curve is obtained by using the Monte Carlo method as described above. It can be seen that the estimated ROC curves based on the monotonic density ratio and MH methods are close to the true ROC curve, but the estimated curve based on the exponential tilting method is biased.
For case (i), let f (x) and g(x) be the density functions for X|D = 1 and X|D = 0. It is easy to verify that (45), β 1 = 8, and β 2 = 8. For case (ii), we consider the scenario where the "gold standard" is not perfect, as in Section 4.1, but we posit that f 1 (x) and f 0 (x) are the probability density functions of the bivariate exponential distributions specified above for the diseased and nondiseased groups, respectively. In both cases, the semiparametric monotonic density ratio model (2) is correctly specified. However, the exponential tilting model is correctly specified in the first case but not in the second case when 1 ≤ λ 0 + λ 1 < 2. Let U = α + β 1 X 1 + β 2 X 2 . It can be verified that U is the optimal combination of X 1 and X 2 in both cases. The left and right panels of Figure 2 display the results for cases (i) and (ii), respectively. For case (ii), we again posit λ 1 = 0.8 and λ 0 = 0.8. We obtain conclusions similar to those for the normal distribution: the monotonic density ratio and MH methods yield consistent estimates in both cases, but the exponential tilting method yields biased estimates in case (ii).

Comparison of Coefficient Estimates
In this section, we compare the estimate of β from our method with that from other methods. Specifically, we consider the following methods: (1) the MH method of Ma andHuang (2005, 2007); (2) the exponential tilting method, which can be viewed as a parametric alternative to our method. When simulating the biomarkers, we consider two distributions: normal and exponential, where the model setup is the same as that in Sections 4.1 and 4.2, respectively; we refer to these as Case I and Case II. For both cases, we regularize β 2 = 1 for comparison purposes, where · 2 is the L 2 norm.
For Case I, we summarize the β estimates for different (λ 0 , λ 1 ) in Table 1. We observe that the average biases of the β estimates are ordered as follows: when λ 0 = λ 1 = 1, all methods produce very small and comparable biases; when λ 0 < 1 and λ 1 < 1, our model has the smallest biases, the MH method leads to medium biases, and the exponential tilting method yields the largest biases. Moreover, when λ 0 = λ 1 = 1, our method and the exponential tilting method lead to comparable mean square errors (MSEs), whereas those for the MH method are slightly larger; when λ 0 < 1 and λ 1 < 1, our method leads to much smaller MSEs for β than the other methods give.
For Case II, we consider ξ 1 = ξ 2 = ξ with ξ varied to simulate the magnitude difference between diseased and healthy patients. We summarize the β estimates for different (λ 1 , λ 2 ) and (ξ 1 , ξ 2 ) in Table 2. When λ 0 = λ 1 = 1, the proposed method yields MSEs comparable to those for the exponential tilting, but slightly smaller than those for the MH method; when λ 0 < 1 and λ 1 < 1, our method leads to smaller biases for the β estimates than the other methods give. Furthermore, when the magnitude difference between the diseased and healthy patients increases (i.e., ξ increases), the corresponding MSE decreases as expected.
In addition to the above, we have conducted the following simulation studies, for which the results and a detailed discussion are provided in the supplementary document: r We have obtained all the ROC curve and β estimates for all the simulation setups described at the beginning of Section 4; see Sections 4-7 in the supplementary document.
r We have studied the robustness of the proposed method to the misspecification of model (2); see Section 2 in the supplementary document.
r We have also conducted simulation studies for combining more biomarkers; see Section 8 in the supplementary document. In the following, we briefly summarize our observations above and in Sections S2 and S4-S8 in the supplementary document.
For the ROC estimation, we make the following observations. When the parametric model (1) is well satisfied, the proposed, exponential tilting, MH, nonparametric, and smoothed EL methods perform similarly, and they are better than the minmax and tree methods. When the model is misspecified but the monotonic density ratio model (2) is correctly specified, our method performs slightly better than the exponential tilting, MH, nonparametric, min-max, and smoothed EL methods, and it is much better than the tree method. When h(x) is misspecified, the proposed, MH, nonparametric, min-max, and smoothed EL methods have similar performance, and they are better than the tree and exponential tilting methods. When v (x, β) is misspecified, our method performs much better than all the other methods.
We have also compared our method with the exponential tilting, MH, nonparametric, and smoothed EL methods for the β estimation. Note that the min-max and tree methods do not lead to β estimates comparable to those of the other methods, so they are not included. When the parametric model (1) is well satisfied, our method and the exponential method perform similarly, and they are more efficient than the other methods. When the model is misspecified but the monotonic density ratio model (2) is correctly specified, the exponential tilting method Table . Comparison of coefficient estimation for different choices of λ 0 and λ 1 when biomarkers follow normal distribution; λ 0 = λ 1 = 1 indicates correctly specified monotonic density ratio and exponential tilting models; λ 0 < 1 or λ 1 < 1 indicates correctly specified monotonic density ratio model but misspecified exponential tilting model.

Proposed
Exponential tilting MH  Table . Comparison of coefficient estimation for different choices of λ 0 and λ 1 when biomarkers follow exponential distribution; λ 0 = λ 1 = 1 indicates correctly specified monotonic density ratio and exponential tilting models; λ 0 < 1 or λ 1 < 1 indicates correctly specified monotonic density ratio model but misspecified exponential tilting model.  yields biased estimates, and our method yields estimates that are much more efficient than those of the other methods. We conjecture that the main reason our estimator is more efficient than the others is that our method is based on the maximum likelihood principle for a semiparametric model. On the one hand, this principle has been demonstrated to be efficient in the establishment of methodology for complex statistical and practical problems. On the other hand, in our problem, this model efficiently incorporates both the case and control data in the estimation of the distributions; in contrast, the other methods (except the exponential tilting method) use the case and control data separately in the estimation of the corresponding distribution.

Applications
In this section, we apply our method to two well-known examples.
Example 1. As reported by Wieand et al. (1989), sera from n 0 = 51 control patients with pancreatitis and n 1 = 90 case patients with pancreatic cancer were studied at the Mayo Clinic with a cancer antigen (CA-125) and a carbohydrate antigen (CA19-9). A parametric ROC curve analysis in Qin and Zhang (2010) shows that the antigen CA19-9 is the better biomarker for pancreatic cancer. To combine the test results for CA-125 and CA19-9, we adopt the following semiparametric monotonic density ratio model: = ψ (log(CA-125) + β 2 log(CA19-9)). Qin and Zhang (2010) suggested the log-transformation on CA-125 and CA19-9. In Table 3, we report the coefficient estimates and their corresponding standard errors (SEs) based on 200 bootstrap estimates; we have regularized β 2 = 1 for comparison purposes. We observe that the estimates from the three methods are slightly different; the difference between our method and the MH method is smaller than that between our method and the exponential tilting method. Our method leads to smaller SEs than other methods give. In contrast to the simulation example, we are unable to obtain the biases and MSEs of the β estimates in this real-data example. However, based on our observations in Section 4.3 and Section S2 of the supplementary document, we conjecture that these differences are likely a result of the model misspecification. Our method may have comparable or smaller biases than the exponential tilting and MH methods, and may have the smallest MSEs.
Example 2. Etzioni et al. (1999) and Pepe (2003, p. 161) analyzed the serum levels of prostate-specific antigen (PSA) for n 1 = 71 cases of prostate cancer and n 0 = 70 age-matched controls. For Table . Estimation of the coefficients based on the monotonic density ratio, exponential tilting, and MH methods for the carbohydrate antigen (CA-) and cancer antigen (CA-).

Proposed
Exponential  these individuals, retrospective blood samples were available, so they were assayed for PSA. Etzioni et al. (1999) were interested in determining if longitudinal PSA profiles were capable of discriminating cases from controls. We wish to combine the test results. In the original data, each individual had multiple test results. To simplify the discussion, we focus on each individual's average results so that each has only one measurement for the biomarkers/covariates: total PSA (TPSA), free PSA (FPSA), time, and age. To optimally combine these test results and covariates, we adopt the following semiparametric monotonic density ratio model: Qin and Zhang (2010) suggested the log-transformation on TPSA and TPSA. We report the coefficient estimates (after regularizing β 2 = 1) and their corresponding standard errors (SEs) based on 200 bootstrap estimates in Table 4. The scenarios are similar to those of the first example, and therefore the discussion is similar; we omit it for brevity. Eguchi and Copas (2002), Copas and Corbett (2002), and McIntosh and Pepe (2002) showed that the best way to find the optimal combination of multiple diagnostic tests is to use the log density ratio statistic for the diseased and nondiseased populations. In most cases, the log density ratio is a monotonic nondecreasing or nonincreasing function of some combination (not necessarily linear) of the multiple diagnostic tests. To avoid the need to choose a specific model, it is reasonable to assume that the density ratio is a monotonic function without a specified form. Ordering-restricted inference has been popular because it can use the natural ordering in physical or biomedical problems without making artificial parametric assumptions about the underlying distributions. This model implies that the combination of multiple tests is stochastically larger for one group than for the other. We believe that this is the minimum requirement for a model assumption since useful biomarkers should be able to distinguish between disease and nondisease. We use the wellknown PAVA (Ayer et al. 1955) to find the maximum semiparametric likelihood estimate. Simulation studies show that our method yields consistent estimators of the optimal ROC curves and the AUC. Our method is applicable to both prospectively and retrospectively collected data. When the data are collected retrospectively, that is, the test information is collected by conditioning on the response status, then the two-sample density ratio model is equivalent to the monotonic response probability model (i.e., the probability of a response being in a given category is a monotonic nondecreasing or nonincreasing function of some combination of the multiple tests).

Appendix: Proofs of Theorems 1 and 2 Condition A and Theorems of Consistency
We first impose the regularity conditions that are needed to establish the consistency results given in Theorems A.1 and A.2.
Conditions A1 and A2 together ensure that the class is P-Glivenko-Cantelli for P = F 0 and G 0 , which is used to show thatθ (v (x;β)) is consistent. The concept of P-Glivenko-Cantelli will be discussed below. Condition A3 requires that the model be identifiable. Cosslett (1983) and Klein and Spady (1993) discussed regularity conditions for the identifiability of θ (v (x; β)); we refer to them for more details. Here, we simply assume in Condition A3 that θ (v (x; β)) is identifiable. With the consistency ofθ (v (x;β)), Condition A3 implies the consistency ofβ. Condition A4 is needed in the proof of the consistency of ROC C (s).
With Condition A, we have the following consistency results.
With the results in Theorem A.1, we are able to show the consistency ofF C (u) andĜ C (u), which in turn is used to prove the consistency of ROC C (s) and AUC C .
The proofs of Theorems A.1 and A.2 are given in the supplementary document.

Preliminary Preparation
Our proof relies heavily on some results in empirical processes. We first review some concepts. Throughout, we use " " (" ") to denote smaller (greater) than up to a universal constant. Let X 1 , . . . , X n be a sequence of iid random variables from the distribution P. We use P n to denote the empirical cumulative distribution function based on X 1 , . . . , X n . That is, Further, let G be a class of real functions. The class G is called P- We need the following definitions of entropy for function classes, which play key roles in modern empirical process theory. They are adapted from Definition 2.1 and Definition 2.2 in van de Geer (2000).
Definition A.1. For any δ > 0 and q > 0, let N q,B (δ, G, P) be the smallest value of N for which there exists a set of pairs of functions and (ii) for any g ∈ G, there exists a j = j(g) such that N q,B (δ, G, P) is called the δ-bracketing number of G and H q,B (δ, G, P) = log N q,B (δ, G, P) is called the δ-entropy with bracketing of G.
Definition A.2. For any δ > 0 and q > 0, let N q (δ, G, P) be the smallest value of N for which there exists a collection of functions g 1 , . . . , g N , such that for any g ∈ G, there is a j = j(g) ∈ {1, . . . , N}, such that g − g j q,P ≤ δ.
N q (δ, G, P) is called the δ-covering number of G and H q (δ, G, P) = log N q (δ, G, P) is called the δ-entropy of G.
To facilitate our proof for the convergence rate of our estimators, we need the following lemma, which is a direct application of Lemma 5.13 in van de Geer (2000).
for every δ > 0 and some 0 < α < 2 and some constant A. Then, for some constant c and n 0 depending on α and A, we have for all T ≥ c and n ≥ n 0 , P sup g∈G, g−g0 2,P ≤n −1/(2+α)

Two Useful Lemmas
We present two useful lemmas, which play very important roles in the proof of our theorems. The first lemma establishes the δ-entropy with bracketing of the single-index class The proof is given in the supplementary document.
Lemma A.2. Suppose Conditions A1 and A2 are satisfied. For P = F 0 , P = G 0 , and P = H 0 , any δ > 0, and any integer q > 0, we have (a) if δ ≤ 1, where " " is up to a universal constant depending only on p and q; The second lemma given below establishes the basic inequality, which supports the proof of Theorems A.1 and 1. The proof of this lemma is given in the supplementary document.
which are the empirical cumulative distribution functions of {Y 1 , . . . ,Y n1 } and {X 1 , . . . , X n0 }, respectively. Then we have Proof of Theorem 1. Part (a). Recall the basic inequality (A.7). In the following, we show We show only (A.8); the proof for (A.9) is similar. To this end, let By the definition of and Condition B1 that θ 0 (v (x; β 0 )) > c > 0, there exists a universal constant M > 0, such that G = M G satisfies sup g∈G |g − g 0 | ∞ ≤ 1, (A.10) where g 0 = 4Mλ is a constant function. Next, we calculate H 2,B (δ, G, F 0 ). It is easily checked that √ ⊂ , therefore, by Lemma A.2, we have where " " is up to a universal constant depending on λ and p. Combining (A.10) and (A.11) with Lemma A.1, we immediately have sup g∈G, g−g0 2,F 0 ≤n −1/3 In the above, replacing g with g = 4Mλ , and g 0 with g 0 = 4Mλ, we have  β)) and E 0 (·) is taking the expectation on X under H 0 (x). The following lemma is a direct application of Theorem 3.4.1 of van der Vaart and Wellner (1996).
Assume that there exists η > 0 such that for any 0 < δ ≤ η, where φ(δ) is a function such that φ(δ)/δ α is a decreasing function on (0, η) for some α < 2. Furthermore, let r n satisfy We now check conditions (A.14)-(A.17) in the lemma above. We start with (A.14). Since M(β 0 ) = 0, we need to consider only M(β). Define Applying a Taylor expansion and Conditions B2 and B3, we have which leads to when β − β 0 2 ≤ η 0 . Therefore, we can choose 0 < η ≤ η 0 such that for any 0 < δ ≤ η, we have Next, we check (A.15). For every β ∈ B and θ ∈ , Therefore, which immediately leads to with φ(δ)/δ α being a decreasing function on (0, η) for any α ∈ (1, 2). If we set r n = n 1/3 , then we can check (A.16): Last we check (A.17). With Part (a) in this theorem, we have where H n (x) = n i=1 mi+ri n I(t i ≤ x). We shall examine the function class (A.23) where G I = I v (x; β) ≤ u : β ∈ B, u ∈ R . According to Lemma 9.8 and Lemma 9.12 of Kosorok (2008), G I is a VC-subgraph class with bounded envelope 1 and index V (G I ) = (p − 1) + 2, where p − 1 is the dimension of B −1 . This together with Theorem 3.11 in van de Geer (2000) leads to = C(p + 1)(16e) p+1 1 δ pq , for all δ > 0, for a universal constant C > 0 and all q < ∞, probability measure Q. Specifically, where " " is up to a universal constant depending only on p. Furthermore, it is readily checked that when δ > 1, we have This together with Theorem 2.5 in Kosorok (2008) shows that (i) G I is H 0 -Donsker and uniformly bounded. Next, we show that (i) is H 0 -Donsker and uniformly bounded. From the definition of , we can immediately conclude that it is uniformly bounded; therefore, we need to show only that it is H 0 -Donsker. To this end, define the bracketing integral which together with Theorem 2.3 in Kosorok (2008) proves (ii). By combining (i) and (ii) and applying Corollary 9.32 (iii) in Kosorok (2008), we conclude that = G I · is H 0 -Donsker. This proves (A.22) and therefore completes our proof of Part (a).

Supplementary Materials
The supplementary materials contain more simulation studies and supplementary technical details for the Appendix.