Hazard Estimation With Bivariate Survival Data and Copula Density Estimation

A bivariate survival function can be expressed as the composition of its marginal survival functions and a bivariate copula; consequently, one may estimate bivariate hazard functions via marginal hazard estimation and copula density estimation. Leveraging earlier developments in penalized likelihood density and hazard estimation, a nonparametric approach to bivariate hazard estimation is explored in this article. The new ingredient here is the nonparametric estimation of the copula density, a subject of interest in its own right; to accommodate survival data, one needs to allow for censoring and truncation in the setting. A simple copularization process is implemented to convert density estimates into copula densities, and a cross-validation scheme is devised for density estimation under censoring and truncation. The empirical performance of the techniques is investigated through simulation studies, and potential applications are illustrated using real-data examples and open-source software.


INTRODUCTION
Hazard estimation using censored lifetime data is among the routine tasks in survival analysis. For standard univariate right-censored lifetime data with possible left-truncation and covariates, one may employ, among others, the penalized likelihood approach to the task (Gu 1994, 1996). In this article, we explore an approach to hazard estimation using paired lifetime data. The approach is fully nonparametric, leveraging earlier developments in univariate hazard estimation and multivariate density estimation. For the estimation of a bivariate survival function, one has the Kaplan-Meier estimate, which does not even assume continuity (Dabrowska 1988), and one has the frailty models (see Hougaard 1986; Oakes 1989), which are largely parametric. The approach we explore here sits somewhere in between, in that we do assume the existence of a density with sufficient smoothness, but we do not assume any parametric form for the entities involved. A bivariate survival function can be expressed as the composition of univariate survival functions and a bivariate copula, and its estimation naturally decomposes into two phases: (i) univariate hazard estimation, a solved problem; and (ii) copula density estimation, a problem needing new treatment. The extra information carried in bivariate survival models beyond the marginal models is the association between the two time axes, which is characterized by the copula density in our setting.
Copula density estimation is by itself a subject of interest, and has seen treatments in the literature via kernel density estimation (Gijbels and Mielniczuk 1990), Bernstein polynomials (Sancetta and Satchell 2004), penalized hierarchical B-splines (Kauermann, Schellhase, and Ruppert 2013), etc. In the setting of bivariate hazard estimation, one needs to accommodate censored/truncated samples, for which none of the existing methods in the literature appears easily amenable. A routine adaptation of penalized likelihood density estimation (Gu and Qiu 1993; Gu and Wang 2003) can be used in the setting, but the resulting density estimates are not copula densities. To make things work, we devise a cross-validation scheme for smoothing parameter selection in density estimation with censored/truncated samples, and we implement a copularization process to convert the initial density estimates into copula densities. Absent censoring/truncation, the method is numerically feasible up to dimensions 4 or 5. In higher dimensions, one may explore conditional independence structures among variables using tools developed by Gu, Jeon, and Lin (2013), and if such structure exists, a copula density in high dimension can often be decomposed into a product of copula densities in lower dimensions.
The rest of the article is organized as follows. In Section 2, aspects of bivariate hazard estimation are laid out in some detail, which include likelihoods of censored/truncated lifetime data, the decomposition of likelihood for phased estimation, and the estimation of marginal hazards possibly with covariates. Copula density estimation is discussed in Section 3, concerning the general method on unit cubes [0,1]^d, the handling of censoring/truncation on the unit square [0,1]^2, and how one may enforce symmetry when the marginals are interchangeable; some of the treatments require technical details of tensor product cubic splines, which are summarized in an appendix. Simulation studies are presented in Section 4 to assess the empirical performance of the techniques being developed, such as the effectiveness of cross-validation, the effect of copularization on estimation precision, etc. Real-data examples are given in Section 5 to illustrate potential applications of the methods using open-source software embodied in an R package gsscopu. A few remarks in Section 6 conclude the article.

HAZARD ESTIMATION WITH BIVARIATE SURVIVAL DATA
In this section, we explore hazard estimation using bivariate survival data. Expressing a bivariate survival function as the composition of univariate hazards and a bivariate copula, the task can be decomposed into the estimation of marginal hazard functions and that of the copula density.
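Concretely, writing C for the (survival) copula and c for its density, the decomposition underlying this section can be recorded via the standard Sklar-type identities
```latex
S(t_1,t_2) = C\bigl(S_1(t_1),\,S_2(t_2)\bigr), \qquad
f(t_1,t_2) = c\bigl(S_1(t_1),\,S_2(t_2)\bigr)\, f_1(t_1)\, f_2(t_2),
```
so that the bivariate hazard is λ(t_1, t_2) = f(t_1, t_2)/S(t_1, t_2), with f_j = −dS_j/dt the marginal densities.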

MARGINAL HAZARD ESTIMATION
To estimate a univariate log hazard η(t), one may minimize a penalized likelihood functional, −n⁻¹ Σ_{i=1}^n {δ_i η(X_i) − ∫_{Z_i}^{X_i} e^{η(s)} ds} + (λ/2) J(η), (2.3) where J(η) is a roughness functional such as ∫ (η″(t))² dt, and λ is the smoothing parameter to be selected by a cross-validation score; see, for example, Gu (1994). The marginal survival function may also depend on some covariate, say V, and is then of the form S(t, v) = exp{−∫_0^t e^{η(s,v)} ds}. The hazard e^{η(t,v)} = −∂ log S(t, v)/∂t can then be estimated through a routine extension of (2.3), denoted (2.4); see Gu (1996). One may then use u_ij = S_j(X_ij, V_ij) and z_ij = S_j(Z_ij, V_ij) in the third term of (2.2) to estimate the copula density. With a continuous covariate V_i, hazard estimation via the minimization of (2.4) is numerically time-consuming due to the repeated evaluations of the integrals ∫_{Z_i}^{X_i} e^{η(s,V_i)} ds in iterations.
As an alternative, one may minimize a penalized pseudo-likelihood functional, denoted (2.5), where ρ(s, v) is a known positive function and the integrals ∫_{Z_i}^{X_i} η(s, V_i) ρ(s, V_i) ds can largely be precomputed; see Du and Gu (2009) and Gu (2013), Section 10.4. Among good choices of ρ(s, v) is ρ(s) = e^{η̃(s)}, where η̃ minimizes (2.3).
When it can be assumed that S_1(t) = S_2(t), the data on the two margins should be combined for the estimation of the common marginal hazard.

COPULA DENSITY ESTIMATION
As seen in Section 2, the bivariate hazard function λ(t_1, t_2) involves the copula density f(u_1, u_2), which is to be estimated using the third term in (2.2). Copula density estimation is a subject of interest by itself, and we now discuss a nonparametric approach to the task.
In general, copula density estimation can be done on unit cubes [0,1]^d for d ≥ 2; the forms of (3.1) and (3.2) for general [0,1]^d are straightforward. To evaluate the marginal density f_j(u_j) needed in copularization, one has to integrate out the rest of the coordinates from the joint density f(u_1, ..., u_d), a numerically costly undertaking, and the need for F_j^{−1}(p) further complicates the matter. An efficient solution to the problem is to compute and store the marginal densities on a grid at fitting time, to be used later in the evaluation of f_j(u) and F_j^{−1}(p) at arbitrary points. The grid {ũ_i} ⊂ [0,1] on which a marginal density f(u) is to be computed is taken as the nodes of a Gauss-Legendre quadrature with associated weights w_i. For a Gauss-Legendre quadrature {ũ_i} on [0,1], one can partition [0,1] = ∪_i I_i into consecutive intervals I_i such that ũ_i ∈ I_i, with ũ_i at the center of I_i for all I_i but the two at the ends. Spreading w_i evenly over I_i and taking f(u) = f(ũ_i), ∀u ∈ I_i, one obtains an approximate marginal distribution function that is piecewise linear, hence easy to invert on the fly. To evaluate f(u) at an arbitrary point u, one may take the four points {v_j}_{j=1}^4 ⊂ {ũ_i} from the grid that are closest to u, and interpolate by fitting a cubic polynomial to (v_j, f(v_j)). In practice, a Gauss-Legendre quadrature of size 200 appears sufficient for the purpose.
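The grid-based bookkeeping just described can be sketched as follows; this is an illustrative numerical sketch with our own function names, not code from the package:

```python
import numpy as np

def gl_grid(m=200):
    # Gauss-Legendre nodes and weights on [-1, 1], rescaled to [0, 1]
    x, w = np.polynomial.legendre.leggauss(m)
    return (x + 1) / 2, w / 2

def make_cdf_pair(fvals, nodes, weights):
    # Partition [0,1] into consecutive intervals I_i containing the
    # nodes; breakpoints sit midway between adjacent nodes, with the
    # two end intervals pinned at 0 and 1.
    b = np.concatenate(([0.0], (nodes[:-1] + nodes[1:]) / 2, [1.0]))
    mass = fvals * weights            # weight w_i spread evenly over I_i
    F = np.concatenate(([0.0], np.cumsum(mass)))
    F /= F[-1]                        # enforce F(1) = 1
    cdf = lambda u: np.interp(u, b, F)    # piecewise-linear F
    inv = lambda p: np.interp(p, F, b)    # inverted "on the fly"
    return cdf, inv
```

For a uniform density on the grid, the resulting piecewise-linear distribution function is close to the identity, and its inverse recovers quantiles directly by linear interpolation.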
The marginal densities f_j(u) resulting from (3.1) should be well behaved, as the samples are marginally U(0,1), and instead of repeating the costly numerical integration 200 times on the grid points {ũ_i}, one may employ Chebyshev interpolation to approximate the marginal densities. In our implementation, f_j(u) are evaluated using numerical integration at 10 interpolation points u_k = cos((2k + 1)π/20)/2 + 1/2, k = 0, ..., 9, and a polynomial interpolating these values is then used for evaluation at arbitrary points. The feasibility of the approach hinges on that of numerical integration on [0,1]^d, which may stretch up to d = 5. When conditional independence structures exist among the marginals, however, a high-dimensional copula density can often be decomposed into a product of lower-dimensional ones, as will be seen in the example of Section 5.1.
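A minimal sketch of the Chebyshev device, again with hypothetical names: the target function is evaluated only at the 10 interpolation points (standing in for the costly numerical integrals), and a degree-9 interpolant in the Chebyshev basis is used elsewhere.

```python
import numpy as np

def cheb_nodes01(m=10):
    # interpolation points u_k = cos((2k+1)pi/(2m))/2 + 1/2 on [0, 1]
    k = np.arange(m)
    return np.cos((2 * k + 1) * np.pi / (2 * m)) / 2 + 1 / 2

def cheb_interp01(f, m=10):
    # evaluate f only at the m Chebyshev points, then interpolate by
    # a degree m-1 polynomial in the Chebyshev basis on [-1, 1]
    u = cheb_nodes01(m)
    coef = np.polynomial.chebyshev.chebfit(2 * u - 1, f(u), m - 1)
    return lambda x: np.polynomial.chebyshev.chebval(2 * np.asarray(x) - 1, coef)
```

For a smooth density-like function, 10 Chebyshev points already deliver interpolation errors far below the statistical error of the density estimate.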

DENSITY ESTIMATION UNDER CENSORING AND TRUNCATION ON [0, 1] 2
Substituting the transformed data u_ij and z_ij in the third term of (2.2), and dropping entries that do not involve f(u_1, u_2), (3.1) becomes (3.3). Tensor product cubic splines will be used for η in (3.3), and we shall discuss key issues in its implementation. Pertinent technical details concerning tensor product cubic splines can be found in Appendix A, which we will quote where needed. The minimization of (3.1) or (3.3) is implicitly over η in a reproducing kernel Hilbert space H ⊆ {η : J(η) < ∞}. One has a tensor sum decomposition H = N_J ⊕ H_J, where N_J = {η : J(η) = 0} and H_J has J(η) as its square norm; the reproducing kernel R_J of H_J is discussed in Appendix A. The minimizer η̂ of (3.1) in H is numerically intractable. Instead, one may calculate the minimizer η̂* in the finite-dimensional space H* = N_J ⊕ span{R_J(v_j, ·), j = 1, ..., q}, (3.4) where {v_j} is a random subset of the data. It can be shown that, for q ≍ n^{2/9+ε}, ∀ε > 0, η̂* with tensor product cubic splines shares the same asymptotic convergence rates as η̂; see, for example, Gu and Qiu (1993) and Gu and Wang (2003). Convergence rates are yet to be established for the minimizer of (3.3), but it appears adequate to adopt the same procedure for the numerical implementation of (3.3). The computation of η̂* is of order O(nq²).

Newton Iteration. A function in H* can be written as η = c^T ξ, where ξ collects the basis functions of H* and c is the coefficient vector; substituting into (3.3) yields a function of c, to be denoted by (3.5).
Differentiating (3.5) with respect to c, one obtains the gradient g and the Hessian H given in (3.6). Fixing the smoothing parameters, consisting of λ and the θ's hidden in R_J (see (A.1)), one may employ a standard Newton iteration to minimize (3.5).
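The outer structure of such a Newton iteration can be sketched generically; in the actual problem the closures grad and hess would return the g and H of (3.6), but here they are placeholders supplied by the caller:

```python
import numpy as np

def newton_min(c0, grad, hess, tol=1e-9, maxit=50):
    # standard Newton iteration in the coefficient vector c: solve
    # H step = g and update until the correction is negligible
    c = np.asarray(c0, dtype=float)
    for _ in range(maxit):
        step = np.linalg.solve(hess(c), grad(c))
        c = c - step
        if np.max(np.abs(step)) < tol:
            break
    return c
```

On a quadratic objective, a single Newton step lands on the exact minimizer, which is why the quadratic approximation of (3.5) plays such a central role in the cross-validation derivation below.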

Cross-Validation.
With varying smoothing parameters, the minimizer η̂* = η_λ of (3.3), where λ represents both the λ in front of J(η) and the θ's hidden therein, provides a collection of estimates to choose from, and the proper selection of smoothing parameters is crucial in practical estimation. A cross-validation scheme was developed earlier for use with (3.1), which aimed to minimize the Kullback-Leibler discrepancy KL(η, η_λ), where e^η is proportional to the true density, f_λ = e^{η_λ}/∫_X e^{η_λ} is the estimate, and μ_η(g) = ∫_X g e^η/∫_X e^η; see, for example, Gu (1993) and Gu and Wang (2003). We shall now adapt the scheme for use with (3.3). As a crude approximation, one may substitute log ∫_X e^{η_λ} for log ∫_{X_i} e^{η_λ}. Putting things together, one may select smoothing parameters via the minimization of a cross-validation score involving delete-one estimates η_λ^{[i]}. To make this work, one needs η_λ^{[i]} that are analytically tractable, and to this end, consider the quadratic approximation (3.8) of (3.5) at η̃ = η_λ, where μ̃_i and Ṽ are the μ_i and V appearing in (3.6) evaluated at η̃ = c̃^T ξ. The minimizer of (3.8) is clearly c̃, which satisfies c̃ = H̃^{−1}(n^{−1} Σ_{i=1}^n μ̃_i + Ṽ c̃), where H̃ = Ṽ + λQ. Now consider a "delete-one" version of (3.8) and its minimizer. Summing up, one may select smoothing parameters via the minimization of the resulting score V(λ) in (3.10), with α = 1. V(λ) is to be evaluated at the convergence of the Newton iteration for η̃ = η_λ. Absent censoring and truncation, (3.10) reduces to the cross-validation score derived earlier for use with (3.1), to be found in, for example, Gu (1993) and Gu and Wang (2003). The empirical performance of (3.10) is assessed in Section 4.1.
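The outer loop of smoothing parameter selection can be sketched as a simple grid search; the actual implementation uses more sophisticated optimizers (see Section 3.4), so this is only a conceptual stand-in with a caller-supplied score V(λ):

```python
import numpy as np

def select_smoothing(score, log10_range=(-6.0, 0.0), npts=31):
    # evaluate the cross-validation score V(lambda) on a log-spaced
    # grid and return the minimizing lambda
    grid = 10.0 ** np.linspace(*log10_range, npts)
    vals = np.array([score(lam) for lam in grid])
    return grid[np.argmin(vals)]
```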

Numerical Integration.
As seen above, integrals of the form ∫_A h e^η need to be computed repeatedly for the minimization of (3.3). Absent censoring and truncation, all integrals are over A = [0,1]^d, and one may use methods such as Smolyak cubatures that are more efficient than product quadratures; see Gu and Wang (2003) and Section 3.4 for discussion. With the varying domains A under censoring and truncation on [0,1]^2, however, a product quadrature allows one to use the same set of nodes for all A, and hence is the more efficient choice.
For the 2-D integrals involved in (3.3) with varying A ⊆ [0,1]^2, one may use the product of a Gauss-Legendre quadrature {ũ_i} ⊂ [0,1] with weights w_i. The same set of nodes {(ũ_i, ũ_j)} ⊂ [0,1]^2 is used for all 2-D integrals, on which the basis functions ξ_j(u_1, u_2) are computed and stored, but the associated weights w_ij vary with A. Using the partition [0,1] = ∪_i I_i discussed in Section 3.1, one has [0,1]^2 = ∪_{i,j} (I_i × I_j), and the cubature weight for ∫_A h e^η at (ũ_i, ũ_j) depends on the amount of overlap A has with R_ij = I_i × I_j. 1-D integrals can be handled similarly.
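The overlap-based weighting can be sketched as follows, assuming A is an axis-parallel rectangle, as for the censoring regions on [0,1]^2; the names and the uniform-grid test case are ours:

```python
import numpy as np

def overlap_weights_1d(breaks, w, lo, hi):
    # weight attached to node u~_i for an integral over [lo, hi]: the
    # quadrature weight w_i scaled by the fraction of I_i covered
    a, b = breaks[:-1], breaks[1:]
    cover = np.clip(np.minimum(b, hi) - np.maximum(a, lo), 0.0, None)
    return w * cover / (b - a)

def overlap_weights_2d(breaks, w, rect):
    # product cubature weights for A = [l1, h1] x [l2, h2]: the 2-D
    # weight at (u~_i, u~_j) is the product of the 1-D overlap weights
    (l1, h1), (l2, h2) = rect
    w1 = overlap_weights_1d(breaks, w, l1, h1)
    w2 = overlap_weights_1d(breaks, w, l2, h2)
    return np.outer(w1, w2)
```

Summing the weights against h e^η evaluated on the fixed node set then approximates ∫_A h e^η for any rectangle A, with no re-evaluation of the integrand on new nodes.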

ESTIMATION OF SYMMETRIC DENSITY
When the marginals are exchangeable, the copula density should be symmetric, or invariant under permutations of the coordinates. The standard construction of tensor product cubic splines needs to be modified to enforce symmetry.

MISCELLANEOUS
Apart from the copularization afterward, copula density estimation via a general form of (3.1) is simply the method developed by Gu and Qiu (1993) and Gu and Wang (2003) applied on the domain X = [0,1]^d. Special features of copula densities warrant a customized implementation, however, which we discuss below.
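The copularization step itself is a simple transformation: substitute marginal quantiles into the fitted density and divide by the marginal densities, so that both marginals of the result are U(0,1). A sketch for d = 2, assuming the estimated marginal densities and quantile functions are available as callables (as provided by the grid scheme of Section 3.1):

```python
import numpy as np

def copularize(f, marg):
    # marg = ((f1, F1inv), (f2, F2inv)): estimated marginal densities
    # and quantile functions of the fitted joint density f on [0,1]^2
    (f1, F1inv), (f2, F2inv) = marg
    def c(u1, u2):
        x1, x2 = F1inv(u1), F2inv(u2)
        return f(x1, x2) / (f1(x1) * f2(x2))
    return c
```

When the fitted density factors into its marginals, the copularized density is identically 1, the independence copula, as it should be.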
As noted by Gu and Wang (2003), efficient numerical integration on [0,1]^d can be facilitated by Smolyak cubatures, which achieve efficiency by thinning out nodes from product quadratures. Distributional data are often dense in the middle and sparse on the edges of the domain, but the nodes of Smolyak cubatures are dense near the edges and sparse in the middle. To make things work in the stock implementation, univariate marginal density estimates were used to rescale the axes to spread out the data more evenly (Gu and Wang 2003). With samples from a copula density, the marginal distributions are known to be U(0,1), so axis scaling becomes unnecessary.
Using tensor product splines in the general form of (3.1), one has a functional ANOVA decomposition built into the log density η. Selective elimination of ANOVA terms may imply conditional independence structures, and the exclusion of higher-order terms may help to ease the curse of dimensionality. For copula densities, the marginals are more homogeneous, and instead of the model formulas used in the stock implementation, model complexity can be controlled globally by the highest order of interactions allowed; to enforce conditional independence when necessary, one simply excludes interactions involving selected pairs of marginals.
To minimize the cross-validation score with respect to the λ in front of J(η) and the θ's hidden therein, one may perform two passes of fixed-θ optimization to obtain initial values of the θ's, then use quasi-Newton iteration to update the θ's; see Gu (2013), Appendix A. The initial values of the θ's prove effective, often leaving only "20% of performance" to be picked up by the time-consuming quasi-Newton iteration. One typically has a large number of θ's to select when d > 2, and it is prudent to skip the quasi-Newton iteration by default in that situation.

SIMULATION STUDIES
We now present simulation studies of limited scale to assess the empirical performance of the techniques developed in earlier sections. Copula samples were generated from the Frank copula, which has a distribution function of the form C(u_1, ..., u_d) = −θ^{−1} log{1 + ∏_{j=1}^d (e^{−θ u_j} − 1)/(e^{−θ} − 1)^{d−1}}. Random number generation and the evaluation of the true copula density were performed using utilities supplied in the R package copula (Hofert et al. 2014).
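For d = 2, Frank copula samples can be drawn by conditional inversion, solving C(u_2 | u_1) = v in closed form; the copula package supplies this functionality, so the sketch below is purely illustrative:

```python
import numpy as np

def rfrank2(n, theta, rng=None):
    # bivariate Frank copula samples via conditional inversion:
    # draw u1 ~ U(0,1) and v ~ U(0,1), then solve C(u2 | u1) = v,
    # which for the Frank copula has a closed-form solution
    rng = rng or np.random.default_rng(0)
    u1, v = rng.uniform(size=n), rng.uniform(size=n)
    a = np.exp(-theta * u1)
    D = np.exp(-theta) - 1.0
    w = v * D / (v + a * (1.0 - v))
    u2 = -np.log1p(w) / theta
    return np.column_stack((u1, u2))
```

Both marginals are U(0,1) by construction, and θ = 4 yields the moderate positive association used throughout this section.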

CROSS-VALIDATION FOR DENSITY ESTIMATION UNDER CENSORING/TRUNCATION
To assess the effectiveness of V(λ) in (3.10) for use with (3.3), samples (x_i1, x_i2) were generated from the 2-D Frank copula with θ = 4. The samples are subject to left censoring at c_ij ~ Beta(1, 9) and right truncation at z_ij = 0.9, j = 1, 2, with u_ij = max(x_ij, c_ij) < z_ij and δ_ij = I_[x_ij ≥ c_ij] to be used in (3.3); note that u = S(t) turns right-censoring/left-truncation on the t axis into left-censoring/right-truncation on the u axis. Despite censoring and truncation, we shall still use the standard Kullback-Leibler loss L(λ) = KL(η, η_λ) to measure the performance of f_λ ∝ e^{η_λ} as an estimate of f, which is a function of the smoothing parameters given the data.
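The censoring/truncation mechanism of this simulation can be sketched as follows, with hypothetical function names:

```python
import numpy as np

def censor_truncate(x, z=0.9, rng=None):
    # left censoring at c_ij ~ Beta(1, 9), right truncation at z:
    # observe u_ij = max(x_ij, c_ij) along with the censoring
    # indicator, keeping only pairs with both u_ij < z
    rng = rng or np.random.default_rng(0)
    c = rng.beta(1.0, 9.0, size=x.shape)
    u = np.maximum(x, c)
    delta = (x >= c).astype(int)      # 1 = exact, 0 = left-censored
    keep = (u < z).all(axis=1)
    return u[keep], delta[keep]
```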
Samples of size n = 100 were used to estimate the density f(u_1, u_2) ∝ e^{η(u_1,u_2)}. Results from 100 replicates are summarized in Figure 1. In the left frame, L(λ_v) with α = 1 in (3.10) is compared with L(λ_v) with α = 1.4, and it is clear that α = 1.4 is the better choice; this is consistent with similar empirical results for density estimation without censoring and truncation, found in, for example, Gu and Wang (2003). The relative efficacy of cross-validation is depicted as box plots of L(λ_o)/L(λ_v) in the center frame. The test density is symmetric, and as seen in the right frame, enforcing symmetry does seem to deliver better performance in general.
In the 100 replicates represented in Figure 1, the number of exact data Σ_i γ_i0 = Σ_i δ_i1 δ_i2 was in the range [56, 80] with mean 67.49, the numbers of singly censored data Σ_i γ_i1 = Σ_i δ_i1(1 − δ_i2) and Σ_i γ_i2 = Σ_i (1 − δ_i1)δ_i2 were in the ranges [6, 21] and [4, 20], respectively, with means 12.31 and 12.58, and the number of doubly censored data Σ_i γ_i3 = Σ_i (1 − δ_i1)(1 − δ_i2) was in the range [1, 16] with mean 7.62. Note that the estimates involved in the above discussion are the "raw" minimizers of (3.3), not the copularized versions. Cross-validation is designed to assist the execution of (3.3), but is unable to foresee the effect of copularization.

COPULARIZATION
To see how copularization may affect the accuracy of density estimation, we now compare the performance of the "raw" minimizers of (3.1) or (3.3) with that of their copularized versions. Simulations were conducted in dimensions d = 2, 3, 4, with samples generated from the Frank copula with θ = 4, and with the default α = 1.4 in cross-validation.
For d = 2, results were obtained from the same set of replicates represented in Figure 1. For d = 4, shown in the right frame, the symmetric fit was not calculated; as noted in Section 3.3, the symmetric fit is computationally impractical for d > 3.
It is clear that copularization improves the accuracy of density estimation in our experiments, where the true density is a copula density. There are, however, a few points above the 45° line in Figure 2, albeit barely, so absolute improvement is not guaranteed.

MODEL COMPLEXITY
We now take a look at the effect of model complexity on estimation accuracy and execution time. Calculating estimates without the three-way interaction on [0,1]^3 using the replicates represented in the center frame of Figure 2, we can compare the performance of order-2 fits with that of full-order fits, as in the left frame of Figure 3; the fits being compared are copularized, so the horizontal axis in the left frame of Figure 3 reproduces the vertical axis in the center frame of Figure 2. Similarly, copularized fits with only two-way interactions were calculated on [0,1]^4, and their performance is compared in the center frame of Figure 3 with that of fits with full interactions, using the replicates represented in the right frame of Figure 2. The right frame of Figure 3 casts the results of the left frame from a different perspective, comparing the asymmetric fits with the same-data symmetric fits. The true copula density contains full interactions and is symmetric, and it is clear that full interactions and symmetry both yield better estimation accuracy.

Most of the numerical load is in the evaluation of the R_J(v_j, x)'s (see (3.4)) on quadrature nodes, in the fitting step and in the copularization step, so the execution time is largely determined by the quadrature sizes and the number of terms involved in R_J(v_j, x). For asymmetric fits, the number of terms involved in R_J(v_j, x) is simply the number of θ's; for symmetric fits, fewer θ's are attached to combined terms, but the work load for each evaluation remains the same. Listed in Table 1 are the quadrature sizes and the numbers of terms in R_J(v_j, x) for the fits represented in Figure 3, along with the respective total execution times for 100 replicates. Note that for the symmetric fits on [0,1]^3, the number of terms of R_J(v_j, x) is given in parentheses, as it differs from the number of θ's, and each term needs to be evaluated 3! = 6 times.
Also, the sizes of {v_j} used in the simulations are q = 33 on [0,1]^3 and q = 40 on [0,1]^4, the default values given by q = 10 n^{2/9}, which are in turn based on the simulation results of Gu and Wang (2003).
On (t_1, t_1 + dt_1) × (t_2, t_2 + dt_2), one may measure the estimation accuracy of the infinitesimal failure probability p = λ(t_1, t_2, v) dt_1 dt_2 = P(T_1 ∈ (t_1, t_1 + dt_1), T_2 ∈ (t_2, t_2 + dt_2) | T_1 > t_1, T_2 > t_2, V = v) by p̂ = λ̂(t_1, t_2, v) dt_1 dt_2 via the Kullback-Leibler discrepancy in a Bernoulli setting, KL(p, p̂) = p log(p/p̂) + (1 − p) log{(1 − p)/(1 − p̂)}. Accumulating over the empirical at-risk processes yields a loss KL(λ, λ̂), which will be used to measure the estimation accuracy of λ(t_1, t_2, v) by λ̂(t_1, t_2, v).

Samples of size n = 150 were generated, with which four cross-validated estimates were calculated. Two of the fits assumed S_1(t) = S_2(t) and C(u_1, u_2) = C(u_2, u_1), with the marginal hazard estimated via (2.4) or (2.5), respectively; the other two were parallel but did not assume symmetry. Due to the phased estimation of marginal hazards and copula density, the optimal bivariate hazard estimate is not available even in simulations, but one may locate the next best thing: minimizing the univariate version of KL(λ, λ̂) for smoothing parameter selection in marginal hazard estimation, then, based on the resulting (u_ij, z_ij), calculating the optimal copula density estimate as in Section 4.1. This yielded four pseudo-optimal estimates, against which one may assess the relative efficacy of the cross-validated fits.
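The per-rectangle discrepancy, taken here in the standard Bernoulli Kullback-Leibler form, can be computed as:

```python
import numpy as np

def kl_bernoulli(p, phat):
    # Kullback-Leibler discrepancy between the failure probabilities
    # p and phat over an infinitesimal rectangle, Bernoulli setting
    p, phat = np.asarray(p, float), np.asarray(phat, float)
    return p * np.log(p / phat) + (1 - p) * np.log((1 - p) / (1 - phat))
```

The discrepancy is zero if and only if p̂ = p, and positive otherwise, so accumulating it over the at-risk processes gives a proper loss for hazard estimation.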
Results from 100 replicates are summarized in Figure 4: the left frame compares the penalized likelihood of (2.4) with the penalized pseudo-likelihood of (2.5), the center frame compares symmetric fits with asymmetric ones, and the right frame depicts the relative efficacy KL(λ, λ̂_o)/KL(λ, λ̂_v) in box plots, where λ̂_v denotes the cross-validated hazard estimate and λ̂_o the pseudo-optimal one. In theory, (2.4) should deliver better performance in general, but in our particular simulation setting, the comparison is more of a toss-up or even slightly in favor of (2.5). The total execution times over the 100 replicates were 41,635 versus 33,753 CPU sec for symmetric fits and 47,272 versus 41,112 CPU sec for asymmetric fits, all in favor of (2.5) as expected; the relative saving of (2.5) over (2.4) is, however, much less dramatic than in univariate hazard estimation, due to the dominant numerical "overhead" of copula density estimation.

EXAMPLES
The techniques developed in Sections 2 and 3 have been implemented in an R package gsscopu, which we shall now employ to illustrate potential applications using real-data examples; as a niche extension of the gss package, gsscopu uses some gss functions as building blocks and inherits some utilities and syntax.
lab <- fit$terms$label[-(1:6)]
rat <- project(fit,lab,drop1=TRUE)$ratio
lab[order(-rat)]
Projecting the fit to a space containing only interactions involving FCHI,
project(fit,lab[order(-rat)[1:5]])$ratio
one loses only 1% of "entropy"; see Gu, Jeon, and Lin (2013) for further methodological details. We are led to the same graphical model as in Brechmann and Schepsmeier (2013): given FCHI, the other five indices are conditionally independent. For a copula density, the marginals are uniform, so f(u_1, u_2) = f(u_2|u_1) = f(u_1|u_2); hence, when U_2 ⊥ U_3 | U_1, f(u_1, u_2, u_3) = f(u_1, u_2) f(u_1, u_3). In our case here, the joint density of the indices is simply the product of bivariate copula densities with FCHI as one of the marginals, which can be estimated using facilities in the gsscopu package. For example, a copula density can be fitted to the pair GSPC-FCHI via
w15 <- as.matrix(w.idx[,c(1,5)])
fit15 <- sscopu(w15); fit15.2 <- sscopu2(w15,id.basis=fit15$id.basis)
Repeated calls to sscopu/sscopu2 generally yield different fits due to the random selection of {v_j} in (3.4), but one may ensure the same selection by passing along id.basis. Note that sscopu/sscopu2 takes as input a matrix instead of a model formula with variables in a data frame, and the marginal domain [0,1] needs no specification; sscopu2 implements the techniques of Section 3.2, using a product quadrature instead of the Smolyak cubatures used in sscopu. For bivariate fits, one may use the method summary to obtain Kendall's τ and Spearman's ρ.
summary(fit15); summary(fit15.2)
In fitting copula densities to the pairs GDAXI-FCHI and FTSE-FCHI, sscopu2 operates smoothly while sscopu does not. Negative weights are introduced in Smolyak cubatures as nodes from product quadratures are thinned out, and these can cause numerical problems in our setting when the true density has high peaks or deep valleys.
For d = 2, one may safely use sscopu2, but for d > 2, one may have to increase quadrature sizes via larger qdsz.depth values in the call to sscopu.

DIABETIC RETINOPATHY
For an example of bivariate hazard estimation, consider the diabetic retinopathy data analyzed by Huster, Brookmeyer, and Self (1989). The data involve 197 patients, of whom one eye was randomly selected for treatment and both eyes were followed up until blindness or censoring. The data are found in the diabetes data frame in the R package timereg (Scheike and Zhang 2011), and are reformatted in gsscopu as the data frame DiaRet. An initial model is fitted to the data,
data(DiaRet)
fit0 <- sshzd2d(Surv(time1,status1) ∼ time1*trt1*type,
                Surv(time2,status2) ∼ time2*trt2*type,
                data=DiaRet, symmetry=TRUE)
where time1/time2 and status1/status2 are the follow-up times and censoring statuses of the left/right eye, trt1/trt2 is the treatment indicator with levels 0 and 1, and type is a factor with levels "adult" and "juvenile" indicating the patient's age at the onset of diabetes. The eyes are interchangeable in this context, so symmetry is enforced; the two model formulas must match each other, though generally with different variable names. The fitted marginal hazards are in fit0$hzd1 and fit0$hzd2, which can be accessed through utility functions of the sshzd suite in the gss package. With symmetry=TRUE in the call to sshzd2d, the common marginal hazard is estimated using the combined marginal data, and our marginal log hazard here is of the ANOVA form η = η_∅ + η_t + η_u + η_v + η_uv + η_tu + η_tv + η_tuv, where u, v represent trt1/trt2 and type. Projecting into a space excluding η_tu, η_tv, and η_tuv,
project(fit0$hzd1, inc=c('time1','trt1','type','trt1:type'))
one loses only 4% of "entropy", so a proportional hazard model on the margins is in order,
fit <- sshzd2d(Surv(time1,status1) ∼ time1+trt1*type,
               Surv(time2,status2) ∼ time2+trt2*type,
               data=DiaRet, symmetry=TRUE, id.basis=fit0$id)
see Gu (2004) for further details concerning Kullback-Leibler projection.
To evaluate the estimated survival function S(t_1, t_2) and hazard function λ(t_1, t_2), which in our example also depend on the covariates (u_1, v_1) and (u_2, v_2), one may use something like
time <- cbind(c(50,70),c(70,70))
cova <- data.frame(trt1=as.factor(c(1,1)), trt2=as.factor(c(1,0)),
                   type=as.factor(c("juvenile","adult")))
survexp.sshzd2d(fit, time, cov=cova)
hzdrate.sshzd2d(fit, time, cov=cova)
The association between the two eyes, which is characterized via the copula estimate in fit$copu, appears to be moderate, with Kendall's τ at 0.25 and Spearman's ρ at 0.37.

summary(fit$copu)
To evaluate the fitted copula density on a grid, say (u_i, u_j), u_i = (i − 0.5)/10, i = 1, ..., 10, use
u <- ((1:10)-.5)/10; uu <- cbind(rep(u,10),rep(u,rep(10,10)))
dd <- matrix(dsscopu(fit$copu,uu),10,10)
Instead of the left and right eyes, one may alternatively take the treated and untreated eyes as the margins. In such a setting, treatment cannot be used as a separate covariate, and symmetry no longer holds between the two margins. Given the random selection of eyes for treatment, the association between the two margins should remain the same, which is indeed the case.
summary(fit1$copu)

DISCUSSION
Explored in this article is a nonparametric approach to bivariate hazard estimation and copula density estimation, which are technically related but serve different applications. The techniques are implemented in an R package ready for use by practitioners.
For the direct nonparametric estimation of a multivariate density, working with copula data does not seem to offer any benefit, yet the constraints a copula density is subject to could be a computational annoyance. Given conditional independence structures, however, a copula density can often be decomposed into a product of lower-dimensional copula densities, facilitating a divide-and-conquer solution not available on other scales.
For the handling of censoring/truncation, one must use a product quadrature in copula density estimation, so our approach to bivariate hazard estimation is computationally not extensible to trivariate settings or beyond. On the other hand, the covariates in the marginal hazards may include random effects (Du and Ma 2010; Gu 2013, sect. 8.3.3), so what we have here is more than just an alternative to frailty models for multivariate survival.

APPENDIX A: TENSOR PRODUCT CUBIC SPLINES
In this appendix, we review some technical facts concerning tensor product cubic splines. Further details can be found in, for example, Gu (2013, chap. 2).
The minimization of (3.1) or (3.3) is implicitly in a Hilbert space H ⊆ {η : J(η) < ∞} in which J(η) is a square semi-norm with a finite-dimensional null space N_J = {η : J(η) = 0}. A Hilbert space has a metric and a geometry that facilitate analysis and computation, and a finite-dimensional N_J prevents interpolation. Function evaluations appear in the log-likelihood, so one also needs the evaluation functional [x]η = η(x) to be continuous in η ∈ H, ∀x ∈ X; X = [0,1]^2 in (3.1) and (3.3).
When the evaluation functional is continuous in a Hilbert space, the space is known as a reproducing kernel Hilbert space possessing a reproducing kernel R(·, ·), a nonnegative definite bivariate function on X such that R(x, ·) = R(·, x) ∈ H, ∀x ∈ X, and ⟨R(x, ·), η(·)⟩ = η(x), ∀η ∈ H, for ⟨·, ·⟩ the inner product in H. A reproducing kernel Hilbert space can be perceived as the "column space" span{R(x, ·), x ∈ X} of its reproducing kernel R, for which any nonnegative definite function qualifies. A general theory of reproducing kernel Hilbert spaces can be found in Aronszajn (1950), and a tutorial on its use in penalized regression can be found in Nosedal-Sanchez et al. (2012).