Diagnostic evaluation of pharmacokinetic features of functional markers

ABSTRACT The dynamicity of functional (curve) markers from modern clinical studies offers deeper insights into complex disease physiology. A frequent clinical practice is to examine various ‘pharmacokinetic features’ of functional markers (definite integral, maximum value, time to maximum, etc.) that reflect important physiological underpinnings. For instance, the current diagnostic procedure for kidney obstruction is to examine several pharmacokinetic features of renogram curves characterizing renal function. Motivated by such clinical practices, we develop a statistical framework for evaluating diagnostic accuracy of pharmacokinetic features using area under the receiver operating characteristic curve (AUC). The major challenge is that functional markers are observed at discrete time points with measurement error. To address this challenge, we develop a two-stage non-parametric AUC estimator based on summary functionals providing unified representation of various pharmacokinetic features and study its asymptotic properties. We also propose a sensible adaptation of a semiparametric regression model that can describe heterogeneity of AUC across different subpopulations, while appropriately handling discreteness and noise in observed functional markers. Here, a novel data-driven approach that balances between bias and efficiency of the regression coefficient estimates is introduced. Finally, the framework is applied to rigorously evaluate pharmacokinetic features of renogram curves potentially useful for detecting kidney obstruction.


Introduction
Modern medical devices are increasingly producing complex marker data that could potentially offer deeper insights into physiological mechanisms underlying diseases and aid in improving diagnostic capabilities. One type of complex markers that arises frequently in medical imaging studies is a functional marker whose sampling unit is a smooth continuous curve defined over a time or spatial domain (or other continuum). The dynamic, flexible structure of functional markers is a rich source of clinical information, and many functional data analysis (FDA) tools have been developed to effectively translate this structural information into clinical useful applications (Kokoszka and Reimherr 2017;Morris 2015;Ramsay and Silverman 2005). To facilitate the practical use of functional markers for pure diagnostic purposes, a frequent procedure in clinical studies is to derive and examine their 'pharmacokinetic features'-e.g., definite integral, maximum value, time to reach maximum and so on -that characterize various dynamic, interpretable curve patterns and relate to important physiological underpinnings. Use of pharmacokinetic features of functional markers for disease diagnosis and monitoring can be found in many biomedical fields, including pulmonology (Stockley and Stockley 2016), Alzheimer's study (Weiner et al. 2012), cardiology (Zhou and Sedransk 2013), and obesity study (Going et al. 2003).
The wide-usage of pharmacokinetic features of functional markers for diagnostic purposes calls for the need to rigorously evaluate their diagnostic utility. For instance, consider the renal study that has motivated our work. Obstruction to urine drainage from a kidney (kidney obstruction) is a serious clinical problem that can lead to irreversible loss of renal function if not properly treated. In recent years, Technetium-99 m Mercaptoacetyltriglycine (MAG3) renography has been widely used as a high-tech, non-invasive procedure to diagnose kidney obstruction (Taylor et al. 2008). The procedure begins with an intravenous injection of a gamma emitting tracer, MAG3, that is removed from the blood by the kidneys and then travels down the ureters from the kidney to the bladder. While MAG3 passes through the kidney, the MAG3 photons are imaged and quantified on region of interest over time, producing a set of renogram curves (functional markers) that serve as a basis for diagnosing kidney obstruction (Bao et al. 2011). The first renogram curve (called baseline) represents the MAG3 photon counts detected at 59 time points during an initial period of 24 min. The second renogram curve (called post-furosemide) is obtained at 40 time points during an additional period of 20 min after an intravenous injection of furosemide, a potent diuretic. The top left and right columns of Figure 1 respectively depict baseline and post-furosemide renogram curves of 257 kidneys stored in Emory University's archived database.
The typical pattern of the non-obstructed kidney's baseline renogram curve is characterized by a quick uptake and immediate excretion of MAG3. On the other hand, the obstructed kidney's baseline curve often shows a prolonged period of MAG3 accumulation, followed by its poor excretion to the bladder throughout the post-furosemide renogram. The bottom left and right panels of Figure 1 respectively show the baseline and post-furosemide renogram curves of two kidneys that are typically diagnosed as non-obstructed (solid lines) and obstructed (dashed lines). However, in practice, there is a high kidney-to-kidney variability in renogram curves, and many show less distinctive patterns as depicted in the top panel of Figure 1. As such, accurate diagnosis of kidney obstruction using renogram curves generally requires substantial expertise in renal physiology and MAG3 pharmacokinetics (Taylor and Garcia 2014). Given the difficulty of interpreting renogram curves, the standard clinical practice to assist physicians arrive at prompt and accurate diagnosis of kidney obstruction is to examine their pharmacokinetic features (e.g., definite integral of the renogram curve, time to maximum MAG3, maximum MAG3, etc.) that adequately reflect the aforementioned patterns (Taylor et al. 2008). For example, time to maximum MAG3 for an obstructed kidney is typically large, given its lack of absorption capacity. Unfortunately, due to limited resources, a majority of MAG3 renography scans in the U.S. are interpreted by physicians who have less than 4 months of training in nuclear medicine (Taylor and Garcia 2014). The ensuing clinical problem is that these physicians tend to select and utilize pharmacokinetic features based on ad hoc blending of intuition and past practice without proper scientific guidance, frequently leading to erroneous diagnosis, inappropriate patient management and unnecessary renal surgery (Taylor et al. 2008). Thus, it is of interest to rigorously evaluate the diagnostic accuracy of various pharmacokinetic features of renogram curves and establish scientifically justified guidance regarding their selection and application.
In cases where a new technology is implemented and optimal thresholds are not verified, the area under the receiver operating characteristic (ROC) curve is often a preferred summary measure that aggregates diagnostic performance information for all thresholds (Dodd and Pepe 2003;Pepe 2003). It is thus sensible to compute the area under the ROC curve (AUC) to evaluate pharmacokinetic features of newly emerging functional markers. The goal of this article is to develop a novel statistical framework for evaluating pharmacokinetic features based on AUC. A key challenge is that each functional marker is not directly observable continuously in time; rather, it is collected at discrete time points with some possible measurement error. In such a situation, extra work in constructing the smoothed estimate of an underlying function is warranted to ensure desirable asymptotic properties of the corresponding AUC estimator. Our strategy is to first represent different types of pharmacokinetic features by summary functionals (Jang et al. 2019), defined as a set of mappings from a space of squareintegrable functions to a real line. We then develop a two-stage non-parametric AUC estimator, where a summary functional is first appropriately estimated from discrete, error-prone functional measurements, and then plugged into a Mann-Whitney statistic. We show that this estimator offers a unified mathematical framework and conceptual flexibility for studying AUC of important, widely used pharmacokinetic features, including integral, magnitude-specific and time-specific types, while being consistent and asymptotically normal.
Pathological mechanisms of many diseases are prone to a large population heterogeneity. As such, the use of covariate-specific AUC has been advocated to tailor the use of certain markers to specific high-benefit subpopulations (Janes and Pepe 2008;Pepe 1998). In our motivating renal study, normal ranges of several pharmacokinetic features of renogram curves were found to vary across different age and gender groups (Esteves et al. 2006), suggesting potential variations in the diagnostic accuracy of these features among different demographic groups. Therefore, we propose a sensible adaptation of a semi-parametric AUC regression model (Dodd and Pepe 2003) to systematically assess covariate effects on the diagnostic accuracy of various pharmacokinetic features. There are some important subtlety in formulating estimating equations for our regression model involving pharmacokinetic features. Firstly, outcomes are based on pairwise comparisons of pharmacokinetic features derived from discrete, error-prone functional measurements. This demands extra work to formulate the estimating equations and to establish asymptotic properties of the regression parameter estimates. Secondly, to properly estimate the covariate-specific AUC when the covariate is continuous, we should incorporate additional pairwise comparisons with covariate differences within a pre-specified neighborhood of zero, whose size should be carefully chosen to balance out the bias and efficiency (Dodd and Pepe 2003). In this work, we propose an automated data-driven approach for selecting the appropriate size of this neighborhood.
The remainder of the article is organized as follows: In Section 2, we propose a framework for evaluating the diagnostic accuracy of a wide class of pharmacokinetic features represented as summary functionals. We study the asymptotic properties of the proposed two-stage AUC estimator accounting for discrete, error-prone functional measurements and develop inferential procedures. In Section 3, we introduce a semi-parametric AUC regression framework for pharmacokinetic features. We propose estimating equations accounting for discrete, error-prone functional measurements and carefully study their asymptotic properties. We also introduce a novel data-driven method for trading off between bias and efficiency of the regression coefficient estimates when continuous covariates are considered. In Section 4, we report the results of a simulation study conducted to evaluate the performance of the proposed approaches. The application of our methods to a renal study is described in Section 5. Finally, we conclude with some remarks in Section 6.

Summary functional
Let X denote a functional marker defined over a compact time interval T � . While we do not impose any parametric structure on the distribution of X, we do assume that X has continuous mean function EfXðtÞg ¼ μðtÞ and covariance function Cðs; tÞ ¼ CovfXðsÞ; XðtÞg. Recently, a concept of summary functional that can flexibly capture various pharmacokinetic features of a functional marker was introduced (Jang et al. 2019). Assume that X is in F ω , a Hilbert space of functions f : T ! that are ωtimes (ω 2 þ Þ continuously differentiable at any t 2 T and square integrable with respect to Lebesgue measure dt on T , i.e., Ef ð T XðtÞdtg < 1. Then, summary functional is defined as the mapping, ϕ : F ω ! , which can take different forms, depending on the pharmacokinetic feature of interest. Using summary functionals offers a unified mathematical framework and conceptual flexibility for studying AUC of important, widely used pharmacokinetic features. To illustrate, we introduce three classes of summary functionals representing pharmacokinetic features that are widely used in various clinical studies. We denote X ðνÞ as ν th ð0 � ν � ω À 2Þ derivative of X, with X ð0Þ ¼ X.

Integral-type summary functional
Integral-type summary functional represents the definite integral of a functional marker over a period of time: Setting ν ¼ 0, ν ¼ 1 and ν ¼ 2 gives average value of a functional marker, average velocity and average acceleration, respectively, when scaled by the interval length.

Magnitude-specific summary functional
Another important pharmacokinetic feature of a functional marker its magnitude at a specific point in time. Given t � 2 T , a general form of the magnitude-specific summary functional is: A unique maximum or minimum magnitude of the functional marker often provides useful information and can be expressed as a summary functional as defined below:

Time-specific summary functional
Time to attain a certain threshold value η of the functional marker is often clinically significant and can be captured by a time-specific summary functional that maps the space of functional markers to the time domain, i.e., ϕ : F ω ! T . That is, ϕðXÞ :¼ ϕ ½ν� TIMEðηÞ ðXÞ ¼ infft 2 T : X ðνÞ ðtÞ ¼ ηg: The timing of a maximum/minimum value can be studied using the time-specific summary functional as follows In reality, we do not directly observe X continuously in time; rather, we often observe its noisy and discrete version W on N time points fðt 1 ; t 2 ; . . . ; t N Þ 2 T g: where k are identically and independently distributed (iid) square integrable random errors with mean zero and finite variance σ 2 . There are several ways to reconstruct the underlying functional marker X based on discrete noisy observations W. Several smoothing techniques, including smoothing splines and kernel smoothing, can be employed. For instance, smoothing splines (e.g., cubic B-splines) approximate X using the coefficients obtained via penalized least squares methods that aim to explicitly control the trade-off between fidelity to the data and roughness of the function estimate (Green and Silverman 1993;Ramsay and Silverman 2005). In this paper, we opt to a nonparametric smoothed estimate of the true undelying functional marker X using the Gasser-Müller (GM) kernel estimator, which is widely recognized for its computational efficiency and ability to provide accurate first or higher-order derivative estimates even with a number of observed time points as small as 15 (Gasser and Müller 1984;Gasser et al. 1991;Müller 1984Müller , 1985. Moreover, GM kernel estimator possesses desirable asymptotic properties that can be readily incorporated to derive the asymptotic properties of the two-stage AUC estimator, which we will introduce in Section 2.2.
Denote Ŵ as the reconstructed version of X obtained using the GM kernel estimator. Then, given a pharmacokinetic feature of interest and the corresponding summary functional ϕðXÞ, its non-parametric estimator ϕ N ðWÞ can be built based on Ŵ . For instance, ϕ INTðT Þ ðXÞ ¼ ð T XðtÞdt can be estimated by the Riemann sum of Ŵ on design points ft 1 ; . . . ; t N g: In fact, estimators of three widely used summary functionals (integral-type, magnitude-specific, and time-specific) can be constructed using the corresponding empirical versions based on Ŵ as suggested by Jang et al. (2019). It is shown that these summary functional estimators ϕ N ðWÞ, when built using the GM kernel estimator Ŵ , converge in probability to respective true values ϕðXÞ as N ! 1 (Altman 1990;Jang et al. 2019;Müller 1985).

Diagnostic evaluation of pharmacokinetic features
We aim to evaluate the diagnostic accuracy of pharmacokinetic features of a functional marker for classifying 'diseased' and 'non-diseased' states (groups), denoted as D and D, respectively. Let X D i and X D j (i ¼ 1; . . . ; n D ; j ¼ 1; . . . ; n D ; n ¼ n D þ n D ), respectively, denote independent realizations of functional markers from D and D states. Functional markers are assumed to be iid within groups and mutually independent between groups. Given a pharmacokinetic feature of interest and the corresponding summary functional that captures it, we assume that larger summary functional values indicate greater likelihood of disease (i.e., 'positive' if the summary functional value exceeds some given cutoff value c).
Let ϕðX D i Þ and ϕðX D j Þ denote summary functional values of ith and jth subjects from the diseased and non-diseased groups, respectively. The ROC curve plots false-positives (1-specificity) versus truepositives (sensitivity) for varying cutoff values c: The area under the ROC curve (AUC) summarizes performance information across all cutoff values and was shown to equal which represents a probability that a summary functional value of a randomly selected diseased subject is greater than that of a randomly selected non-diseased subject (Bamber 1975). Note that, without loss of generality, PrðϕðX D i Þ ¼ ϕðX D j ÞÞ ¼ 0 is assumed. AUCðϕÞ ¼ 1 corresponds to a pharmacokinetic feature (captured by ϕ) that perfectly classifies subjects into the two states, while AUCðϕÞ ¼ 0:5 denotes a feature that performs no better than chance. The closer AUCðϕÞ is to 1, the better the overall diagnostic performance of the pharmacokinetic feature of interest.

Two-stage estimation
If functional markers (X D i and X D j 's) were completely observable, a Mann-Whitney type statistic (Serfling 1980) that compares the true summary functional value of each diseased subject to that of every other non-diseased subject in pairs would estimate θðϕÞ: where indicator function Ið�Þ equals 1 if the bracketed expression is true and 0 otherwise. In practice, however, functional markers from both groups are observed at discrete time points with measurement error as fW D i ðt ik Þ; k ¼ 1; . . . ; N i g and fW D j ðt jk Þ; k ¼ 1; . . . ; N j g, so that estimator (2) cannot be directly used. We thus propose a two-stage estimator. In the first step, we use a GM kernel method or other techniques to obtain a smoothed estimate of each subject's functional marker, Ŵ D i or Ŵ D j , based upon which we build a corresponding summary functional estimator In the second step, we replace the true summary functionals with their estimated counterparts in (2) as following Ties can be handled by adding 0: Þg to the kernel of (3). There are two potential sources of sampling error in the estimation of θðϕÞ using θ ðϕ N Þ. The first source of error stems from using a two-sample Mann-Whitney statistic (2) to estimate the probability in (1). The second source of error comes from replacing the true summary functionals with their estimates and is specific to our situation involving functional markers. It is important that both sources of error are taken into consideration when studying the (asymptotic) properties of the two-stage estimator (3).

Asymptotic properties
In Theorem 2.1, we state that the proposed two-stage estimator θ ðϕ N Þ presented in (3) is a consistent estimator of θðϕÞ and asymptotically normal, given that we plug in a consistent estimator ϕ N for the true summary functional ϕ-e.g., three-widely used summary functionals (integral-type, magnitudespecific, time-specific) based on GM kernel estimators. The regularity conditions (A1)-(A5), explicit formula for σ 2 θðϕÞ and the proof of Theorem 2.1 are given in Section A of the supplementary material.
Note that P N differs depending on different types of pharmacokinetic feature considered (Jang et al. 2019).

Statistical inference
Given the rather complicated analytic form of σ 2 θðϕÞ , we recommend a bootstrap approach for its estimation. Note that here the bootstrap sample is drawn separately from each group (D and D) with replacement. Other non-parametric methods such as the jackknife, half-sampling or subsampling can also be used (Efron 1981).
One can use normal approximation to construct confidence intervals of AUCðϕÞ. Since the scale for the AUC is restricted to ð0; 1Þ, adopting a logit transformation may accelerate the convergence of θ ðϕ N Þ to asymptotic normality, especially when it is close to the boundary. Define lðxÞ ¼ lnfðx=ð1 À xÞg, l 0 ðxÞ ¼ dlðxÞ=dx and l À 1 ð�Þ as the inverse function of lð�Þ. Using the delta method, the 100ð1 À αÞ% confidence interval for AUCðϕÞ is constructed as We also develop a hypothesis testing procedure that can determine whether a particular feature, captured by the summary functional ϕ 1 , leads to a significantly better AUC than that of a competing feature captured by ϕ 2 . The null and alternative hypotheses are H 0 : AUCðϕ 1 Þ ¼ AUCðϕ 2 Þ vs. H 1 : AUCðϕ 1 Þ > AUCðϕ 2 Þ, respectively. Let ϕ N1 and ϕ N2 respectively denote the estimated versions of ϕ 1 and ϕ 2 . The hypothesis can be tested based on the Wald test statistic: 1Þ, where the denominator term V J represents the estimated asymptotic standard error of the numerator. V J can be obtained by bootstrapping the observations at the subject level.

Covariate-specific AUC analysis of pharmacokinetic features
The diagnostic accuracy of pharmacokinetic features may depend on patient characteristics. In this section, we present a framework for studying population heterogeneity in the diagnostic utility of pharmacokinetic features based on covariate-specific AUC analysis.

Model formulation
Suppose both functional marker and covariate data, (X D i ; Z D i ) and (X D j ; Z D j ), are available for study subjects from D and D. The covariate-specific AUC of a pharmacokinetic feature of interest, captured by ϕ, is written as AUC ij ðϕÞ and is defined as the probability that the ϕ value of a randomly selected diseased subject with covariate value Z D i ¼ z D i is greater than that of a randomly selected non-diseased subject with covariate value Given definition (4), a covariate-adjusted AUC analysis of pharmacokinetic features can be performed based on a sensible adaptation of the semiparametric regression model (Dodd and Pepe 2003): where gð�Þ is a monotone link function (e.g., logit and probit). β D and β D are p-dimensional vectors of unknown parameters that quantify the rates of change in g À transformed AUC given unit changes in Z D and Z D , respectively. In most cases, investigating the AUC between diseased and non-diseased subjects with a common covariate value Z D i ¼ Z D j ¼ z is of substantial scientific interest. Under this setting, the covariatespecific AUC (4) becomes and the semi-parametric regression model (5) becomes AUC Z ðϕÞ :¼ θ Z ðϕÞ ¼ g À 1 ðZ T βÞ: To illustrate this modelling approach, consider a model that uses a logit link function with a single covariate Z: logitfθ z ðϕÞg ¼ β 1 þ β 2 Z. Here, expðβ 1 þ β 2 zÞ ¼ AUC z ðϕÞ=f1 À AUC z ðϕÞg denotes the AUC odds in a subpopulation defined by covariate value Z ¼ z, and expðβ 1 Þ represents the AUC odds ratio (OR) between subpopulations corresponding to Z ¼ z þ 1 and Z ¼ z.

Estimating equations
where β D 0 , β D 0 and β 0 denote true parameter values. This suggests that if X i 's were available for all subjects, our model (7) would correspond to a generalized linear regression model for the binary variables U ij restricted to ði; jÞ pairs with Z D i ¼ Z D j ¼ Z. Accordingly, the estimating equations for the regression parameters β in (7) are constructed by Dodd and Pepe (2003) as where and Ω ij ¼ θ ij ð1 À θ ij Þ is a variance function. Since EfS n ðβ 0 Þg ¼ 0, the estimating equations (8) are the classic score equations for binary regression, except that U ij 's are cross-correlated. Under some moderate conditions, it was shown that the estimators β obtained from solving S n ðβÞ ¼ 0 are consistent and asymptotically normal (Dodd and Pepe 2003).
The direct use of the estimating equations (8) is impossible because we observe noisy and discrete W rather than its true counterpart X. Accordingly, we propose to replace the pairwise comparisons of true summary functionals in (8) by those of estimated summary functionals, Þg, and solve the following estimating equations: S Nij ðβÞ:

Estimation with continuous covariates
When the covariates are continuous or data are too sparse within covariate stata, the proposed estimating equations (9) may not be efficient or feasible as there may be few or no pairs from D and D with the identical covariate value. In such a case, the strategy is to temporarily consider the estimating equations (9) derived from the following model and replace Þg with covariates that are sufficiently close to each other (Dodd and Pepe 2003;Liu and Zhou 2013). Note that this model reduces to our target model (7): There is a trade-off between bias and efficiency as η varies (Dodd and Pepe 2003). One can choose the value of η manually or in a data-driven manner. With regard to the latter, we first note that conventional cross-validation approaches to select the optimal bandwidth for smoothing in usual regression models cannot be applied as pairwise comparison data U Nij with Z D i ¼ Z D j ¼ z are not always available. Under this circumstance, we propose to utilize the fact that θðϕÞ ¼ Efθ Z ðϕÞg to assess the performance of a given η value. Specifically, we use this fact to compare the estimated unadjusted AUC measure θ ðϕ N Þ to θ ðηÞ � ðϕ N Þ ¼ n À 1 P n k¼1 g À 1 ðβ 1 þβ 2 z k Þ for each η value, where β 1 and β 2 are obtained by fitting the temporary model (10) that incorporates U Nij 's with covariates up to η units apart, and z k 's denote covariates of all diseased and non-diseased subjects. Then, we choose η as In practice, a grid search can compute θ ðηÞ � ðϕ N Þ over candidate η values to select η opt .

Asymptotic properties
Denote β N as the solution to S Nn ðβÞ ¼ 0. Theorem 3.1 summarizes its asymptotic properties. The regularity conditions (B1)-(B8), proofs of Lemmas 1-4, the forms of Q and and the proof of Theorem 3.1 are given in Section B of the supplementary material.
Given the rather complicated analytic forms of � and Q, bootstrap or jackknife variance estimates of � and Q can be used for convenience. For instance, the jackknife estimator is given as (Shao 1992): is the estimate of β based on the data with kth observation data fϕ N k ðw k Þ; z k g removed from the pooled sample of diseased and nondiseased subjects.

Simulations
We conducted simulation studies to evaluate the finite sample performance of the proposed methods. Firstly, the performance of the two-stage AUC estimator (3) was assessed. We considered three widely used summary functionals (integral-type, magnitude-specific, time-specific) briefly introduced in Section 2.1. We first generated the disease status of each subject using Bernoullið0:5Þ distribution. The underlying functional markers X were generated over a time interval T ¼ ½0; 1� under four different scenarios. In Scenario 1, we generated XðtÞ by a Gaussian process with mean functions μ D ðtÞ ¼ 1 and μ D ðtÞ ¼ 2 for the non-diseased and diseased subjects, respectively. Likewise, in Scenario 2, X were generated as a Gaussian process, but this time with time-varying mean functions μ D ðtÞ ¼ t and μ D ðtÞ ¼ 2t. In Scenario 3, Gaussian processes with periodical mean functions μ D ðtÞ ¼ sinðπtÞ and μ D ðtÞ ¼ 3 sinðπtÞ were considered. Note that a covariance function Cðs; tÞ ¼ expfÀ ðs À tÞ 2 g, s; t 2 T , was used for the first three scenarios. In Scenario 4, we generated XðtÞ ¼ sinð2πtÞ with probability (w. p.) 2=3 and XðtÞ ¼ sinð 2 3 πtÞ w.p. 1/3 for the non-diseased, and generated XðtÞ ¼ sinðπtÞ w.p. 1 for the diseased. In all four scenarios, we obtained error-prone, discrete functional measurements W under model (2.1), by further contaminating the generated X with measurement errors 2 that were iid generated from Nð0; 0:4 2 Þ.
We evaluated ϕ INT under Scenarios 1 and 2; ϕ MAGð0:5Þ and ϕ MAX under Scenario 3; and ϕ tMAX under Scenario 4. For each scenario, we considered the following five study designs to assess the sensitivity of the proposed framework to varying density of observed time points: (20 U ) unbalanced design with N i (and N j ) following a Poisson distribution with mean 20; (40 U ) unbalanced design with N i following a Poisson distribution with mean 40; (20 B ) balanced design with N i ¼ 20; (40 B ) balanced design with N i ¼ 40; and (60 B ) balanced design with N i ¼ 60. Except for the two endpoints (0 and 1), the N i observation times in above study designs were randomly drawn from a uniform grid T grid ¼ fðu À 1Þ=59; u ¼ 2; . . . ; 59g separately for each subject. A GM kernel estimator (Gasser et al. 1991;Müller 1984) with a polynomial kernel of degree 2 and an automatically adapted global 'plug-in' bandwidth that is asymptotically optimal with respect to the mean integrated square error (MISE) was used to produce a smooth estimate of each functional marker. Standard errors were estimated via bootstrap with 1000 resamples, and 95% confidence intervals were computed based on the logit transformation. Table 1 presents the simulation results based on 1000 replicates for each scenario with sample size n ¼ 30 and 100. In all configurations, the biases generally decrease as the sample size (n) and the number of time points (n) increase, suggesting that our two-stage estimators are consistent. The biases are somewhat larger for the magnitude-specific and time-specific summary functionals than the integral-type summary functional, but they are all negligible in magnitude, rarely exceeding 0.01. For integral-type and magnitude-specific summary functionals, the bootstrap standard errors of the AUC estimates are close to the empirical standard deviations, and the empirical coverage probabilities hover around the nominal level of 95%, regardless of the study design. For ϕ tMAX , we see a slight over-coverage of the confidence intervals, but they rapidly approach the nominal level as the sample size increases. We also see that ϕ tMAX 's standard errors and empirical standard deviations are generally larger than those of other summary functionals. This is because the convergence rate of its GM kernelbased estimator is slower compared to those of other summary functional types (Jang et al. 2019).
Under the same data generation scenarios, we also evaluated the performance of the widely used parametric approach, which directly estimates the AUCs of the pharmacokinetic features based on the binormal model without smoothing (Hanley 1988). The results are summarized in Table 2. Although this approach shows a fairly good estimation performance for the integral-type summary functionals, it suffers from large bias and low coverage rates across most configurations for other summary functional types.
Simulation results based on 1000 replicates with sample size n ¼ 200 and 300 are presented in Table 3. Relative biases tend to be slightly larger when data are highly sparse (20 U and 20 B ), but rapidly diminish with increasing number of time points. Relative biases in other cases are all smaller than or close to 3%, suggesting that the regression slope estimates obtained from our proposed approach are reasonably unbiased in a finite sample setting. For ϕ INT and ϕ MAG ð0:5Þ, the estimated standard errors based on the jackknife method generally agree well with the empirical standard deviations. The 95% confidence intervals have coverage probabilities close to the nominal level. For ϕ tMAX , a coverage of 95% or 96% is generally achieved in the continuous covariate case. In the binary covariate case, coverage probabilities are close to 97% with N i � 40 and n ¼ 200, but they approach the nominal level as N i and n increase. In conclusion, our practical recommendation is to perform a semi-parametric regression analysis using functional markers that are, on average, collected on 20 or more time points (40 or more for time-specific summary functionals) to ensure accurate estimation and inference. Simulation results for the binormal parametric approach without smoothing: mean of 1000 biases (Bias), standard deviation of 1000 AUC estimates(empsd), mean of 1000 standard error estimates (EstSE) and proportion of 95% confidence intervals containing the true AUC value (Cov95). D denotes the five study designs for the observed time domain. Finally, we compared the finite-sample performance of the manual and our proposed data-driven approaches for selecting η when continuous covariates are involved. The data generation scheme and simulation results are presented in Section C of the supplementary material. In summary, our proposed data-driven approach is shown to be less sensitive to the underlying model specification compared to the manual approach.

Application to renal study
In this section, we apply the proposed method to the renal study described in Section 1. Analysis of pharmacokinetic features of renogram curves is a standard clinical procedure that facilitates accurate and prompt diagnosis of kidney obstruction (Bao et al. 2011). The scientific goal of the renal study is two-fold: 1) to evaluate and compare diagnostic utility of various pharmacokinetic features of baseline and post-furosemide renogram curves; and 2) identify subpopulation for whom certain pharmacokinetic features are more useful. To study these goals, we utilize the Emory University Hospital's archived database containing MAG3 renography data and expert consensus ratings of patients who were referred to the Emory University Hospital with suspected kidney obstruction during 1/5/1994 to 4/7/2015. After excluding patients with renal grafts, ileal conduits and neobladders, data of 257 kidneys from 133 patients (67 men [50%], 66 women [50%]; mean age, 58 years; SD, 16 years; Table 3. Simulation results for regression coefficient (slope) estimates β 1 from the semiparametric regression model AUC z ðϕÞ ¼ g À 1 ðβ 0 þ β 1 zÞ, obtained as the solution to (9). Φð�Þ and l À 1 ð�Þ respectively denote cumulative standard normal distribution function and inverse logit function. Mean of 1000 relative biases (RBias), standard deviation of 1000 AUC estimates (EmpSD), mean of 1000 standard error estimates (EstSE) and proportion of 95% confidence intervals containing the true AUC value (Cov95). D denotes the five study designs for the observed time domain. range, 18--87 years) were used for analysis. The obstruction status of each kidney was determined based on the consensus diagnosis provided by the three internationally renowned experts in nuclear medicine (Taylor et al. 2008). Based on the consensus diagnosis, 188 kidneys were non-obstructed (D), and 69 kidneys were obstructed (D). For baseline renogram curves, the speed of initial MAG3 uptake inside the kidney and the rate of MAG3 excretion into the bladder are important physiological mechanisms that reflect the severity of the kidney obstruction (Taylor et al. 2008). Accordingly, we evaluated three pharmacokinetic features (summary functionals) that are related to such mechanisms: time to reach half-maximum MAG3 (ϕ1 2 tMAX ), time to reach maximum MAG3 (ϕ tMAX ) and minimum velocity (ϕ ½1� MIN ). For post-furosemide renogram curves, two pharmacokinetic features that characterize the overall MAG3 intensity were considered: definite integral of the renogram curve (ϕ INT ) and maximum MAG3 (ϕ MAX ). Summary functionals corresponding to all these pharmacokinetic features were initially estimated based on the Gasser-Müller kernel estimates of the crude (ν ¼ 0) renogram curves and their first derivatives (ν ¼ 1) that used polynomial kernels of degree 2 and 3, respectively, and an automatically adapted global 'plug-in' bandwidth (Gasser et al. 1991;Müller 1984). Two-stage AUC estimators (3) were used for diagnostic evaluation. Standard errors were estimated via bootstrap with 1000 resamples, and 95% confidence intervals (CIs) were computed based on the logit transformation. Table 4 presents the AUC estimates of the selected pharmacokinetic features of the baseline and post-furosemide renogram curves. For baseline renogram curves, AUCs of ϕ tMAX and ϕ ½1� MIN , are 0.839 (95% CI: 0.774-0.888) and 0.846 (95% CI: 0.783-0.893), respectively, suggesting their good diagnostic accuracy. Especially, the AUC of ϕ tMAX is marginally statistically significantly higher than that of ϕ1 2 tMAX (P-value ¼ 0:07), which had moderate diagnostic accuracy (AUC ¼ 0.776; 95% CI: 0.696-0.839). For the post-furosemide renogram, both ϕ INT (AUC: 0.857; 95% CI: 0.786-0.907) and ϕ MAX (AUC: 0.816; 95% CI: 0.740-0.874) have good diagnostic utility for discrimination between obstructed cases and non-obstructed controls. Our findings have important clinical implications in the diagnosis of kidney obstruction, offering set pharmacokinetic features of renogram curves that practicing physicians can examine to improve their diagnostic accuracy.
Next, we conducted a semiparametric regression analysis (7) to identify age and gender groups for whom a given pharmacokinetic feature is more useful. The model of interest is logitfθðϕÞg ¼ β 1 þ β 2 age þ β 3 male, where age is a continuous variable (years), and male is a dummy variable that equals 1 if the kidney is from a male and 0 otherwise. For estimation, we considered the temporary model: logitfθðϕÞg ¼ β 1 þ β 2 age D þβ 2 ðage D À age D Þ þ β 3 male to accommodate comparisons between obstructed and non-obstructed kidney pairs whose age differences are within η years. η was chosen based on the proposed data-driven method described in Section 3.2. The regression parameters were estimated as the solution to the estimating equation (9) derived from the temporary model, and the corresponding standard errors were obtained by the jackknife method. Table 5 presents the AUC ORs of age and gender for each of the pharmacokinetic features considered in Table 4. For ϕ ½1� MIN of the baseline renogram, the AUC odds of males are 66% lower than those of females (AUC OR: 0.34; 95% CI: 0.12-0.93; p-value ¼ 0:03); that is, baseline ϕ ½1� MIN has superior diagnostic utility for detecting kidney obstruction in females compared to males. The AUCs of other pharmacokinetic features do not depend on patient characteristics.

Discussion
Various pharmacokinetic features of functional markers are frequently derived and studied as they represent important interpretable, physiological information, but are often naively relied upon without appropriate scientific justification. As such, we developed a much-needed framework that can rigorously evaluate pharmacokinetic features based on AUC, while appropriately handling the noise and discreteness in observed functional markers. Specifically, we proposed a two-stage nonparametric AUC estimator and sensible adaptation of a semi-parametric AUC regression model that account for unobserved true functional markers and studied their asymptotic properties. We further proposed a new data-driven procedure for selecting an optimal bandwidth that balances the bias and efficiency of the coefficient estimates of the semi-parametric AUC regression model given continuous covariates.
The proposed framework assumes an existence of already widely used pharmacokinetic features or those that are newly chosen based on a priori scientific information. But this is not always the case in some studies, especially for those that involve recently introduced devices. Future work thus includes extending the proposed framework to select candidate pharmacokinetic features in a purely datadriven manner using modern analytic methods, e.g., tree-based methods. Our long-term plan is to derive an empirical form of a summary functional that produces maximum AUC given any functional marker data.
Another important future direction will be to assess the overall diagnostic accuracy of functional markers without having to summarize them using a set of pharmacokinetic features. One possible approach is to perform functional principal component analysis (FPCA) (Ramsay and Silverman 2005) to obtain a low, finite-dimensional representation (FPCA scores) of each functional marker and assess its diagnostic utility via AUC. There are two major challenges to this FPCA-based approach. Firstly, FPCA scores are obtained by projecting the functional marker to a finite space determined by observed data, so there is no guarantee that these scores are related to important physiological mechanisms. Secondly, FPCA is based on a sophisticated, non-traditional statistical tool that requires a certain mathematical maturity for proper implementation. These challenges can complicate interpretation of the results and may lead to higher diagnostic error rates in generic clinical settings that lack statistical training. Thus, development of a new methodology in this direction should prioritize providing clear clinical interpretation and ensuring a smooth translation of the findings into clinical practice to improve diagnostic performance.