Generalized Quasi-Likelihood Ratio Tests for Semiparametric Analysis of Covariance Models in Longitudinal Data

ABSTRACT We model generalized longitudinal data from multiple treatment groups by a class of semiparametric analysis of covariance models, which take into account the parametric effects of time-dependent covariates and the nonparametric effects of time. In these models, the treatment effects are represented by nonparametric functions of time, and we propose a generalized quasi-likelihood ratio test procedure to test whether these functions are identical. Our estimation procedure is based on profile estimating equations combined with local linear smoothers. We find that the much-celebrated Wilks phenomenon, well established for independent data, still holds for longitudinal data if a working independence correlation structure is assumed in the test statistic. However, this property does not hold in general, especially when the working variance function is misspecified. Our empirical study also shows that incorporating correlation into the test statistic does not necessarily improve the power of the test. The proposed methods are illustrated with simulation studies and a real application from opioid dependence treatments. Supplementary materials for this article are available online.


Introduction
In a longitudinal study comparing the efficacy of different treatments, suppose that there are q treatment groups. Let Y_{k,i}(t) be the primary variable of interest observed at time t for the ith subject receiving the kth treatment, and let μ_{k,i}(t) be the expected value of Y_{k,i}(t). We assume that

g{μ_{k,i}(t)} = X_{k,i}^T(t)β + θ_k(t),  k = 1, …, q, i = 1, …, n_k,  (1)

where g(·) is a known link function, X_{k,i}(·) is a p-dimensional covariate vector that may be time dependent, β is a p-dimensional regression coefficient, θ_k(·) is a nonparametric function, and n_k is the number of subjects in the kth treatment group. Denote n = Σ_{k=1}^q n_k as the total number of subjects in the study. Our primary interest is to compare the treatment effects, which are represented by the θ_k's. Specifically, we want to test the hypotheses

H_0: θ_1 = ··· = θ_q  versus  H_1: not all θ_k's are the same.  (2)

In the context of our application, detailed in Section 6, we consider two opioid agonist treatments: physician management and physician management plus cognitive behavioral therapy. The response variable Y_{k,i}(t) for the ith subject in the kth treatment group is a binary indicator of opioid use at day t in a follow-up period, and the associated covariate vector X_{k,i} contains baseline information (e.g., age, gender, and education) for the same subject. We are interested in studying how the two treatments affect one's opioid use behavior over time differently, after having accounted for effects due to the baseline covariates X_{k,i}.
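To fix ideas, the following is a minimal sketch (not the paper's code) of how data from a model of the form (1) with a logit link might be generated; the group sizes, coefficient β, and treatment-effect functions θ_k below are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_logit(eta):
    """Inverse of the logit link g."""
    return 1.0 / (1.0 + np.exp(-eta))

q, n_k, m = 2, 30, 5          # treatment groups, subjects per group, visits (illustrative)
beta = np.array([0.5, -0.3])  # p-dimensional regression coefficient (illustrative)
theta = [lambda t: np.sin(np.pi * t),            # hypothetical theta_1(t)
         lambda t: np.sin(np.pi * t) + 0.5 * t]  # hypothetical theta_2(t)

Y, T, X = [], [], []
for k in range(q):
    for i in range(n_k):
        t = np.sort(rng.uniform(0, 1, m))         # irregular, subject-specific visit times
        x = rng.normal(size=(m, 2))               # time-dependent covariates
        mu = inv_logit(x @ beta + theta[k](t))    # E{Y_{k,i}(t)} under model (1)
        Y.append(rng.binomial(1, mu))             # binary opioid-use indicator
        T.append(t)
        X.append(x)

Y = np.array(Y)
T = np.array(T)
```

Each row of `Y` is one subject's binary trajectory; the two groups share β but differ in θ_k, which is exactly the structure tested by (2).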
To answer this question, we will test hypotheses in the form of (2). We do not require any parametric form for the θ_k's, to maintain modeling flexibility. The generalized likelihood ratio (GLR) test proposed by Fan, Zhang, and Zhang (2001) and Fan and Jiang (2005) is a very general approach to testing various nonparametric hypotheses under varying coefficient and additive models. Recent developments and reviews on GLR tests include Fan and Jiang (2007), Li and Liang (2008), and González-Manteiga and Crujeiras (2013). All these articles studied independent Gaussian data. In our setting, however, the responses within the same subject are longitudinal outcomes and can, therefore, be strongly correlated. It remains unclear how the GLR test can be extended to dependent data. In this article, we fill this gap in the literature by developing a generalized quasi-likelihood ratio (GQLR) test for (2), based on longitudinal data obtained from multiple treatment groups.
There is also a vast volume of work on semiparametric modeling of longitudinal data. Our work is most related to the literature on marginal semiparametric regression methods, including Carroll (2001, 2006) and Wang, Carroll, and Lin (2005). However, most of these articles do not model data from multiple treatment groups and focus on estimation rather than on testing hypotheses about the nonparametric components. Since the treatment effects are represented by nonparametric functions, model (1) is also related to functional ANOVA models; see, for example, Brumback and Rice (1998), Morris and Carroll (2006), and more recently Zhou et al. (2011). A common approach in these articles is to first express each trajectory as a linear combination of basis functions and then treat the associated coefficients as random effects in a linear mixed model. Such a hierarchical modeling approach usually requires a relatively large number of repeated measures on each subject, as well as many random effects in the model, to fully capture the features of each longitudinal trajectory. In contrast, our method does not require many repeated measures and is applicable to so-called sparse functional data (Yao, Müller, and Wang 2005). More importantly, existing methods only produce pointwise confidence intervals for the θ_k's, which cannot replace a rigorous test of the kind we are about to propose.
The rest of the article is organized in the following way. In Section 2, we describe estimation procedures under both the null and alternative hypotheses, and study their asymptotic properties. In Section 3, we propose the new GQLR test and study its null distribution and local power. In Section 4, we discuss implementation issues such as variance and correlation estimation, approximating the null distribution using bootstrap and bandwidth selection. We illustrate the methodology through simulations in Section 5 and an application on opioid dependence treatment in Section 6. Some concluding remarks are provided in Section 7. Technical proofs and additional simulation results are contained in the supplementary material.

Estimation and Preliminary Results
Although model (1) is defined in continuous time, in practice we only observe the response variable at discrete time points within a study period T. Let T_{k,i} = (T_{k,i1}, …, T_{k,im_{k,i}})^T be the observation times for the ith subject in the kth treatment group, where m_{k,i} is a positive integer. These time points are usually irregular and subject specific. Write Y_{k,ij} = Y_{k,i}(T_{k,ij}) and X_{k,ij} = X_{k,i}(T_{k,ij}) for j = 1, …, m_{k,i}. Suppose that the conditional variance of the response is var(Y_{k,ij} | X_{k,ij}, T_{k,ij}) = σ²(μ_{k,ij}), where σ²(·) is a conditional variance function. All subjects are assumed to be independent, but the observations within a subject are correlated.

Estimation Under Both Null and Alternative Hypotheses
We refer to the model under the null hypothesis in (2) as the reduced model and that under the alternative hypothesis as the full model. Denote β̂_R and θ̂_R(t) as the estimators under the reduced model and β̂_F and θ̂_{F,k}(t), k = 1, …, q, as the estimators under the full model. Our estimation procedures under both models are based on profile-kernel estimating equations. We first consider estimation under the reduced model, where there is effectively only one treatment group. By a Taylor expansion, for any T_{k,ij} in a neighborhood of t of size h, θ(T_{k,ij}) can be approximated locally by a linear polynomial, θ(T_{k,ij}) ≈ α_0 + α_1(T_{k,ij} − t). Let K(·) be a symmetric probability density function and K_h(u) = K(u/h)/h. For a given β, θ(t) is estimated by solving a local linear estimating equation (3) in α = (α_0, α_1)^T, with kernel weights K_h(T_{k,ij} − t) and a weight matrix W_{k,i} to be specified below. The local linear estimator is given by θ̂_R(t; β) = α̂_0, where (α̂_0, α̂_1) is the solution of (3). Then β̂_R is obtained by solving the profile estimating equation (4). At convergence of the algorithm described above, the nonparametric estimator receives a final update, θ̂_R(t) = θ̂_R(t; β̂_R). As shown by Lin and Carroll (2001), the most efficient estimators within the class defined by (3) are obtained by taking W_{k,i} to be diagonal with entries inversely proportional to a working variance function ω(·). Similar kernel estimators are widely used in longitudinal and functional data analysis; see Fan and Li (2004), Yao, Müller, and Wang (2005), Hall, Müller, and Wang (2006), Hall, Müller, and Yao (2008), and Li and Hsing (2010). The working variance function ω(·) can be replaced by the nonparametric variance estimator described in Section 4.1.
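For the Gaussian case with the identity link and working independence, the local linear step for a given β reduces to a kernel-weighted least-squares fit of the partial residuals Y − X^Tβ; the sketch below is a simplified stand-in for solving (3), with an illustrative Gaussian kernel. A useful sanity check, used here, is that a local linear fit reproduces a linear function exactly.

```python
import numpy as np

def local_linear_theta(t0, T, R, h):
    """Local linear estimate of theta at t0 from pooled observation times T
    and partial residuals R = Y - X beta (identity link, working independence).
    Returns alpha_0, the intercept of the local weighted least-squares fit."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    w = K((T - t0) / h) / h                                 # kernel weights K_h(T - t0)
    U = np.column_stack([np.ones_like(T), T - t0])          # local linear design
    A = U.T @ (w[:, None] * U)
    b = U.T @ (w * R)
    return np.linalg.solve(A, b)[0]

# check: local linear regression reproduces a linear trend exactly
T = np.linspace(0, 1, 50)
R = 1.0 + 2.0 * T
est = local_linear_theta(0.5, T, R, h=0.2)   # should equal 1 + 2 * 0.5 = 2
```

For a general link g, the same kernel weights enter the estimating equation (3), which is then solved iteratively rather than in closed form.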
We now consider estimation under the full model, where θ_k(·) is estimated using data from the kth treatment group only. For a given β, the profile local linear estimator θ̂_{F,k}(t; β) is defined in the same way as under the reduced model, but with the estimating equation restricted to group k. To obtain β̂_F, we again solve a profile estimating equation, now pooling all treatment groups together. The nonparametric components are then updated as θ̂_{F,k}(t) = θ̂_{F,k}(t; β̂_F), for k = 1, …, q.

Asymptotic Properties of the Estimators
Before proposing a test procedure for the hypotheses in (2), we first investigate the asymptotic properties of the profile-kernel estimators of β and θ(t) under both the full and reduced models. Denote the true parameters as β_0 and θ_{k0}(t), k = 1, …, q; under the reduced model, θ_{10} = ··· = θ_{q0} ≡ θ_0. For ease of exposition, we assume in our theoretical derivations that the number of observations per subject is a constant, that is, m_{k,i} = m < ∞ for all k and i. For situations where the numbers of repeated measurements are unequal, a common practice is to model the m_{k,i} as independent realizations of a positive random variable m, and essentially the same results can be derived. We assume that the observation times T_{k,ij} are independent random variables on a compact interval T, with a density f(t) > 0 for all t ∈ T. We also assume that there exist constants 0 < ρ_1, …, ρ_q < 1 such that Σ_{k=1}^q ρ_k = 1 and n_k/n − ρ_k = o(n^{−1/2}). Our problem is fundamentally different from those in the GLR test literature because of the presence of within-subject correlation. Define the errors as ε_{k,ij} = Y_{k,ij} − g^{−1}{X_{k,ij}^T β_0 + θ_{k0}(T_{k,ij})} and the error vector within the (k, i)th subject as ε_{k,i} = (ε_{k,i1}, …, ε_{k,im})^T. We consider X_{k,ij} and ε_{k,ij} as discrete realizations of continuous longitudinal processes X(t) and ε(t), t ∈ T. Define the conditional variance and correlation functions of ε(t) as var{ε(t)} = σ²{μ(t)} and corr{ε(s), ε(t)} = R(s, t; τ), where τ is an unknown correlation parameter. Many authors model the variance function σ²(·) nonparametrically but the correlation function R(s, t; τ) as a member of a parametric family, such as the autoregressive (AR) or autoregressive moving-average (ARMA) correlations (see Fan, Huang, and Li 2007, and Fan and Wu 2008). The within-subject covariance matrix is Σ_{k,i} = S_{k,i} R_{k,i}(τ) S_{k,i}, where S_{k,i} = diag{σ(μ_{k,i})} and R_{k,i}(τ) = {R(T_{k,ij}, T_{k,ij′}; τ)}_{j,j′=1}^m.
We allow the covariance to depend on the mean and the parameters β and θ (·), since this is usually the case for generalized longitudinal data, for example binary data.
In addition to the asymptotic framework described above, we make the following assumptions for the asymptotic theory.
(C.1) The true mean functions θ_{k0}(·), k = 1, …, q, are smooth and twice continuously differentiable; with shorthand notation such as μ_{X,k}(t) = E(X_{k,ij} | T_{k,ij} = t), all functions defined above are assumed to be Lipschitz continuous. (C.2) The kernel K(·) is a symmetric continuous density function with mean 0 and unit variance. Let X̃_{k,i} be an m × p matrix with jth row X_{k,ij} − μ_{X,k}(T_{k,ij}), where μ_{X,k}(·) is defined in condition (C.1). For ease of exposition, we often suppress the index i and write Σ_k, Δ_k, W_k, and X̃_k as generic versions of these quantities for a typical subject receiving the kth treatment. When the null hypothesis in (2) is true, Σ_k ≡ Σ, Δ_k ≡ Δ, W_k ≡ W, and X̃_k ≡ X̃ for k = 1, …, q. The following proposition provides asymptotic expansions of the estimators under the reduced model. Proposition 1. Suppose the null hypothesis in (2) is true and that the asymptotic framework described above and conditions (C.1)-(C.3) hold. Then the parametric estimator β̂_R and the nonparametric estimator θ̂_R(t) admit first-order asymptotic expansions, with β̂_R root-n consistent and θ̂_R(t) attaining the usual local linear rate. When the null hypothesis is true, model (1) reduces to a generalized partially linear model, so the asymptotic results in Proposition 1 are standard (Lin and Carroll 2001; Fan and Li 2004; Wang, Carroll, and Lin 2005) and the proof is omitted. With similar arguments, one can show analogous expansions for the estimators under the full model (Proposition 2). The asymptotic results in Proposition 2 are derived in a broader setting that allows the θ_k(t) to be different. When the null hypothesis (2) holds, one can easily see that the first-order expansions of β̂_R and β̂_F are identical, and hence the two estimators are asymptotically equivalent.

Test Procedure and Null Distribution
We now direct our focus back to testing the hypotheses in (2). In this section, we introduce our test procedure based on a quasilikelihood function, and study the asymptotic distribution of the test statistic under the null hypothesis.
A quasi-likelihood function (McCullagh and Nelder 1989) Q satisfies ∂Q(μ; Y)/∂μ = V^{−1}(μ)(Y − μ), where Y is an m-vector of responses within a subject, μ = g^{−1}{Xβ + θ(T)} is the conditional mean vector, and V(μ) is a working covariance matrix, not necessarily the same as the true covariance Σ(μ). The quasi-likelihood for the data is the sum of the subject-level contributions, and the GQLR test statistic is defined as the difference of the quasi-likelihoods under the full and reduced models,

λ_n(H_0) = Q(β̂_F, θ̂_F) − Q(β̂_R, θ̂_R).  (15)

Under the null hypothesis, let Σ and V be generic copies of Σ_{k,i} and V_{k,i} for a typical subject. These matrices are random because the within-subject correlation may depend on the subject-specific time vector T and the conditional variance may depend on the conditional mean vector μ = g^{−1}{Xβ_0 + θ_0(T)}. Denote σ_{jℓ}, ν_{jℓ}, and δ_{jℓ} as the (j, ℓ)th entries of Σ, V^{−1}, and Δ, respectively; they are considered random as well. The functions B_1(t)-B_5(t) used below are defined in terms of these quantities in (16). Denote ν_K = ∫K²(t)dt and K∗K as the convolution of the kernel K with itself. The following theorem provides the asymptotic distribution of the GQLR test statistic under the null hypothesis; the proof is provided in Section W.1 of the supplementary material.
Theorem 1. Assume the asymptotic framework described in Section 2.2 and conditions (C.1)-(C.3), that B_2(t)-B_5(t) defined in (16) exist and are Lipschitz continuous in t, and that nh^5 → 0. Then, under the null hypothesis in (2), σ_n^{−1}{λ_n(H_0) − μ_n} converges in distribution to N(0, 1), where the asymptotic mean μ_n and variance σ_n² are expressed through B_1(t)-B_5(t). Remark 1. The asymptotic normal distribution in Theorem 1 is derived using the central limit theorem for generalized quadratic forms developed by de Jong (1987), the same theoretical tool used to develop the asymptotic distribution of the GLR test for independent Gaussian data. Compared with the GLR test, the main theoretical challenges in our problem are: (1) the data are correlated; (2) the covariance matrices Σ_{k,i} and the local expansions of the link function Δ_{k,i} are random and subject specific; (3) we allow the working covariances W_{k,i} and V_{k,i} to be misspecified. To be more specific about the second challenge, Σ_{k,i} and Δ_{k,i} depend on the mean vector μ_{k,i}, and the within-subject correlation may depend on the subject-specific time vector T_{k,i}. As a result, our proof is more involved when calculating the asymptotic mean and variance of the test statistic.
Remark 2. For nonparametric hypothesis testing in independent data, Fan, Zhang, and Zhang (2001) established a property called the Wilks phenomenon for the GLR test: the asymptotic distribution of the test statistic under the null hypothesis does not depend on the unknown true parameters. Indeed, when the likelihood function is used and correctly specified, this property holds for a wide range of problems. However, for generalized longitudinal data, working covariance matrices, generalized estimating equations, and quasi-likelihoods are commonly used, and in many situations these models need not be correctly specified. When the variance (covariance) of ε depends on the mean, and if V, W, and Σ are different, the functions B_1(t)-B_5(t), and thus the asymptotic distribution of λ_n(H_0) in Theorem 1, depend on the true parameters β_0 and θ_0(t). In this case, the Wilks phenomenon does not hold in general for the test statistic defined in (15).
Next we provide two corollaries of Theorem 1 in which the asymptotic distribution of λ_n(H_0) does not depend on β_0 and θ_0(t). Recall that the true covariance matrix has the structure in (8), and denote Σ_d = S² as the diagonal variance matrix. We first investigate the situation where the variance function is correctly specified for both the estimation and test procedures but the working correlation for the test is misspecified. In other words, we assume W = Σ_d and V = SC(τ)S, where C(τ) is a working correlation matrix that may be different from the true correlation matrix R(τ).
The asymptotic null distribution of the test statistic in this special case is given in Corollary 1: under the null hypothesis in (2), λ_n(H_0) is asymptotically normal, with mean and variance expressed through the functions B_j†(t), j = 2, …, 5, defined in (17).
From the expressions of B_j†(t), j = 2, …, 5, it is easy to see that they do not depend on β_0 and θ_0. It follows that the asymptotic distribution of λ_n(H_0) does not depend on the true values of these parameters when the variance is correctly specified but the correlation is misspecified. However, the asymptotic distribution of λ_n(H_0) still depends on the true correlation function R(τ), which is generally unknown. Thus, it is difficult to simulate the asymptotic distribution in Corollary 1.
We now consider another important special case, in which working independence is assumed for both estimation and hypothesis testing. Corollary 2 gives the asymptotic null distribution in this case; in its statement, |T| denotes the length of the time domain T.
Corollary 2 implies that, if a working independence covariance is used in both estimation and hypothesis testing and if the true variance function is used, the asymptotic distribution of λ_n(H_0) does not depend on β_0, θ_0(t), or the true correlation structure R(τ). This Wilks-type result makes it easy to assess the distribution of λ_n(H_0) using the bootstrap; see our discussion in Section 4.2.
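The statistic in (15) can be illustrated with a toy computation under working independence and the identity link: the Gaussian quasi-log-likelihood is evaluated at a full fit and a reduced fit, and their difference is the GQLR statistic. Here constant group means stand in for the semiparametric fits, so all names and the data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_quasi_loglik(resid, Vinv):
    """Gaussian quasi-log-likelihood: sum over subjects of -(1/2) r' V^{-1} r."""
    return -0.5 * sum(r @ Vi @ r for r, Vi in zip(resid, Vinv))

# toy data: two groups of n subjects with m visits each, identity link, V = I
m, n = 4, 25
Y1 = rng.normal(0.0, 1.0, (n, m))
Y2 = rng.normal(0.8, 1.0, (n, m))
Y = np.vstack([Y1, Y2])
Vinv = [np.eye(m)] * (2 * n)                       # working independence

mu_red = Y.mean()                                  # common mean under H0
mu_full = np.repeat([Y1.mean(), Y2.mean()], n)     # group-specific means under H1

Q_red = gaussian_quasi_loglik(Y - mu_red, Vinv)
Q_full = gaussian_quasi_loglik(Y - mu_full[:, None], Vinv)
lam = Q_full - Q_red                               # GQLR statistic as in (15)
```

Because the full fit can only reduce the within-group residual sum of squares, `lam` is nonnegative in this nested least-squares setting.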

Power of the Generalized Quasi-Likelihood Ratio Test
To study the local power of the GQLR test, we consider a contiguous alternative hypothesis

H_{1n}: θ_k(t) = θ_0(t) + G_{kn}(t),  k = 1, …, q,  (18)

where the G_{kn}(t) are twice continuously differentiable smooth functions with sup_{t∈T} |G_{kn}(t)| → 0 as n → ∞.
Consider the test statistic in (15), now denoted λ_n(H_{1n}). The following theorem gives the asymptotic distribution of the test statistic under the local alternative (18); the proof is given in the supplementary material.
Theorem 2. Suppose that conditions (C.1)-(C.3) and the local alternative (18) hold, nh^5 → 0, and the functions G_{kn}(t) are twice continuously differentiable. Let μ_{1n} and σ_{1n}² denote the additional mean and variance terms induced by the alternative, which are quadratic functionals of {Δ_{k,i}G_{kn}(T_{k,i})}. Then λ_n(H_{1n}) is asymptotically normal with mean μ_n + μ_{1n} and variance σ_n² + σ_{1n}², where μ_n and σ_n² are as defined in Theorem 1.
An approximate level-α test based on the GQLR test statistic is ϕ_n = I{λ_n(H_{1n}) − μ_n > z_α σ_n}, where z_α is the upper 100α percentile of N(0, 1); we reject the null hypothesis if ϕ_n = 1. Let Φ(·) be the cumulative distribution function of N(0, 1). The Type II error of the test is then a function of G_n = (G_{1n}, …, G_{qn})^T, given in (20). Define a class of smooth alternative functions indexed by a deviation size ρ, and let β(α, ρ) denote the maximum probability of a Type II error over this class. Following Ingster (1993a,b,c) and Fan, Zhang, and Zhang (2001), the minimax rate of a test ϕ with Type II error β(α, ρ) is defined as the sequence ρ_n such that (a) for any ρ > ρ_n, α > 0, and β > 0, there exists a constant c such that β(α, cρ) ≤ β + o(1); and (b) for any sequence ρ_n* = o(ρ_n), there exist α > 0 and β > 0 for which no test maintains Type II error below β for alternatives of size ρ_n*. The following theorem provides the minimax rate for our GQLR test procedure.
The proof of Theorem 3 is provided in the supplementary material. Theorem 3 shows that the GQLR test achieves the minimax optimal rate of Ingster (1993a,b,c).

Estimation of the Variance and Correlation Functions
The proposed estimation and test procedures involve a working variance function and also a working correlation function if working independence is not assumed. In practice, both the working variance and working correlation functions need to be estimated from the data.
When the error process ε(t) is heteroscedastic, one popular approach in the recent functional/longitudinal data analysis literature is to model the variance as a nonparametric function of the observation time or of the mean value (Yao, Müller, and Wang 2005; Fan, Huang, and Li 2007; Li 2011). In what follows, we assume that the variance, denoted σ²(μ), is a smooth function of the mean value, and estimate it with a local linear smoother applied to the squared residuals: in the local linear estimating equation, e = (1, 0)^T, U_{k,ij}(μ) = {1, (μ̂_{k,ij} − μ)/h}^T, the μ̂_{k,ij} are pilot estimators of μ_{k,ij} obtained from the full model with the W_{k,i} set to identity matrices, and the ε̂_{k,ij} are the residuals from the pilot fit. Li (2011) showed that the local linear variance estimator is uniformly consistent for the true variance function, and that using it as the working variance in the estimators defined in Section 2 is asymptotically as efficient as using the true variance function. By similar arguments, we can show that, if the nonparametric variance estimator σ̂²(μ) is used as the working variance in the GQLR test statistic, the asymptotic distribution of λ_n(H_0) is the same as if σ²(μ) were known.
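A minimal sketch of this variance step, assuming squared pilot residuals are smoothed against fitted means with a local linear fit; the kernel, bandwidth, and the true variance function σ²(μ) = 0.5 + μ² used for checking are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_linear_var(mu0, mu_hat, resid, h):
    """Local linear smoother of squared residuals against fitted means:
    estimates sigma^2(mu) at mu0 (cf. Section 4.1)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    w = K((mu_hat - mu0) / h) / h
    U = np.column_stack([np.ones_like(mu_hat), (mu_hat - mu0) / h])
    A = U.T @ (w[:, None] * U)
    b = U.T @ (w * resid**2)                                 # smooth squared residuals
    return np.linalg.solve(A, b)[0]                          # local intercept = variance estimate

# heteroscedastic toy data with true variance sigma^2(mu) = 0.5 + mu^2
mu_hat = rng.uniform(0, 1, 2000)
resid = rng.normal(0, np.sqrt(0.5 + mu_hat**2))
s2 = local_linear_var(0.5, mu_hat, resid, h=0.15)   # target value: 0.75
```

The local intercept of the fit of ε̂² on μ̂ estimates σ²(μ₀); its inverse can then serve as the working weight in the profile estimating equations.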
Assume that the working covariance matrices have the structure V_{k,i} = S_{k,i}C_{k,i}(τ)S_{k,i}, where C_{k,i}(τ) is the working correlation matrix for the (k, i)th subject with a correlation parameter τ. Then τ can be estimated by the quasi-maximum likelihood (QMLE) method of Fan, Huang, and Li (2007): the estimator τ̂ is the maximizer of a Gaussian quasi-likelihood in τ. When the working correlation model is correctly specified, Fan and Wu (2008) showed that τ̂ is root-n consistent for the true correlation parameter.
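For an exchangeable working correlation, the QMLE step can be sketched as a grid search over τ of a Gaussian quasi-log-likelihood of standardized residuals. This is a simplified stand-in for the estimator of Fan, Huang, and Li (2007); the grid, sample sizes, and true τ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def qmle_exchangeable_tau(E, grid):
    """Grid-search Gaussian quasi-likelihood estimate of the exchangeable
    correlation parameter tau from standardized residual vectors E (n x m)."""
    n, m = E.shape
    best_tau, best_ll = None, -np.inf
    for tau in grid:
        C = np.full((m, m), tau)
        np.fill_diagonal(C, 1.0)
        sign, logdet = np.linalg.slogdet(C)
        if sign <= 0:                      # skip values where C is not positive definite
            continue
        Cinv = np.linalg.inv(C)
        # -(1/2) * [ n log|C| + sum_i e_i' C^{-1} e_i ]
        ll = -0.5 * (n * logdet + np.einsum('ij,jk,ik->', E, Cinv, E))
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau

# simulate standardized errors with true exchangeable correlation 0.4
m, n, tau0 = 4, 400, 0.4
C0 = np.full((m, m), tau0)
np.fill_diagonal(C0, 1.0)
E = rng.multivariate_normal(np.zeros(m), C0, size=n)
tau_hat = qmle_exchangeable_tau(E, np.linspace(-0.2, 0.9, 56))
```

In practice the grid search would be replaced by a numerical optimizer, and the residuals would first be standardized by the estimated variance function.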

Evaluating the Null Distribution with Bootstrap
As indicated in Theorem 1, the null distribution of the GQLR test statistic is asymptotically normal. This seems to suggest that we can estimate the various terms in the functions B_1(t)-B_5(t) and use the normal distribution with the estimated asymptotic mean and variance to determine the rejection region of the test. However, the numerical studies of Härdle and Mammen (1993) show that such a "plug-in" method (i.e., plugging estimates into the mean and variance of λ_n(H_0)) does not work well even for independent data; see their Figures 1-4 for a comparison between the true density of the test statistic, the normal density with estimated mean and variance, and the bootstrap density. The weak performance of the plug-in method is due to (1) the distribution of a nonparametric test statistic converging at a rather slow rate, so that the normal approximation may not work well in small samples, and (2) the estimators of the asymptotic mean and variance of λ_n(H_0) also converging at a slow nonparametric rate. In our setting, estimating B_1(t)-B_5(t) consistently also requires estimating the covariance structure consistently, but in real data analysis the covariance structure is often misspecified. For these reasons, most authors advocate bootstrap procedures for nonparametric tests, for example, Härdle and Mammen (1993) and Fan and Jiang (2005).

[Figure 3. Power curves under local alternatives θ_1(t) = θ_0(t) − 2φG(t), θ_2(t) = θ_0(t) − φG(t), θ_3(t) = θ_0(t) + φG(t), and θ_4(t) = θ_0(t) + 2φG(t). The left panel contains the power curves for G(t) = sin(t) and the right panel for G(t) = sin(3πt). In each panel, the solid curve is the power under working independence and the dotted curve is the power when the true correlation is used.]
To estimate the null distribution using the bootstrap, we need to generate data that satisfy the null hypothesis and closely resemble the true data. Because the GQLR test in general lacks the Wilks property, a consistent bootstrap procedure requires the bootstrap samples to have the same correlation structure as the real data. Since the real correlation structure is unknown and often misspecified, such a bootstrap procedure can lead to an inconsistent estimator of the null distribution and, hence, an invalid test.
One alternative approach is to assume working independence for both estimation and testing, as described in Corollary 2, in which case the distribution of λ_n(H_0) enjoys the Wilks property. Since, in this situation, the asymptotic distribution of λ_n(H_0) is the same as for independent data, we can use the simple wild bootstrap procedure proposed by Mammen (1993) for independent data. The detailed procedure is given below.
(i) Estimate both the full and reduced models from the original data, and estimate the variance function by the nonparametric estimator σ̂²(μ) described in Section 4.1 using residuals from the full model. Evaluate the GQLR test statistic λ_n(H_0) under working independence. (ii) Generate bootstrap responses Y*_{k,ij} from the fitted reduced model with perturbed residuals ε*_{k,ij} = ξ_{k,ij}ε̂_{k,ij}, where the ε̂_{k,ij} are the residuals from the full model and the ξ_{k,ij} are independent random perturbation factors with mean 0 and variance 1. (iii) Calculate the GQLR test statistic λ*_n(H_0) from the bootstrap sample {Y*_{k,i}, X_{k,i}, T_{k,i}}, using the exact same procedure as for the original data. (iv) Repeat Steps (ii) and (iii) a large number of times to obtain the bootstrap replicates λ*_n(H_0). The estimated p-value is the percentage of the λ*_n(H_0) that are greater than λ_n(H_0). Davidson and Flachaire (2008) advocate generating the perturbation factor ξ_{k,ij} in Step (ii) from a simple Rademacher distribution, that is, P(ξ = 1) = P(ξ = −1) = 1/2, which is also what we use in our numerical studies. Put X = {(X_{k,i}, T_{k,i}), i = 1, …, n_k, k = 1, …, q}. The following theorem shows the consistency of the bootstrap procedure above: conditional on X, λ*_n(H_0) from the working-independence bootstrap has the same asymptotic normal distribution as λ_n(H_0) in Corollary 2. The proof of the theorem is provided in the supplementary material.
Theorem 4. Under the same assumptions as in Corollary 2, conditional on X, the standardized bootstrap statistic λ*_n(H_0) converges in distribution to the same normal limit as in Corollary 2, where σ_n*, μ_n*, and d_n* are as defined in Corollary 2.
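The wild bootstrap steps (i)-(iv) can be sketched as follows for a toy Gaussian setting with identity link and working independence; constant group-mean fits stand in for the semiparametric fits, and all names and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def gqlr_stat(Y, groups):
    """Toy GQLR statistic under working independence and identity link:
    0.5 * (RSS_reduced - RSS_full), with constant group means as the full fit."""
    rss_red = np.sum((Y - Y.mean())**2)
    rss_full = sum(np.sum((Y[groups == k] - Y[groups == k].mean())**2)
                   for k in np.unique(groups))
    return 0.5 * (rss_red - rss_full)

# observed data (the null is true here: both groups share the same mean)
n, m = 30, 4
Y = rng.normal(1.0, 1.0, (2 * n, m))
groups = np.repeat([0, 1], n)
lam_obs = gqlr_stat(Y, groups)

# wild bootstrap: reduced-model fit plus Rademacher-perturbed full-model residuals
fit_red = Y.mean()
resid_full = np.concatenate([Y[groups == k] - Y[groups == k].mean()
                             for k in np.unique(groups)])
B, lam_boot = 200, []
for _ in range(B):
    xi = rng.choice([-1.0, 1.0], size=resid_full.shape)   # Rademacher factors
    Ystar = fit_red + xi * resid_full                     # bootstrap sample under H0
    lam_boot.append(gqlr_stat(Ystar, groups))
pval = np.mean(np.array(lam_boot) >= lam_obs)
```

The bootstrap p-value is the fraction of replicates exceeding the observed statistic, mirroring Step (iv).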

Bandwidth Selection
For bandwidth selection, we adopt a leave-one-subject-out cross-validation (Rice and Silverman 1991) tailored to our data structure. Let ĥ_cv be the minimizer of the cross-validation criterion, in which the full-model estimators with bandwidth h are recomputed with the (k, i)th subject removed from the dataset and used to predict that subject's responses. It is well known that cross-validation estimates the optimal bandwidth for estimation, which is of order n^{−1/5} (Xia and Li 2002). To make the bandwidth follow the optimal order for hypothesis testing suggested in Theorem 3, we propose to use h = ĥ_cv × n^{−1/45}. As shown in the empirical studies of Fan and Jiang (2005), hypothesis testing results are quite robust to the choice of h as long as it is of the right order.
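A sketch of the leave-one-subject-out cross-validation, with all of a subject's visits held out together and the selected bandwidth deflated by n^{−1/45} as proposed above; the bandwidth grid, kernel, and data-generating model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def loo_cv_score(h, T, Y):
    """Leave-one-subject-out CV error for a local linear fit of Y on T.
    T, Y are (n_subjects, m); each subject's visits are held out together."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    err = 0.0
    n = T.shape[0]
    for i in range(n):
        keep = np.arange(n) != i
        Tt, Yt = T[keep].ravel(), Y[keep].ravel()
        for t0, y0 in zip(T[i], Y[i]):
            w = K((Tt - t0) / h)
            U = np.column_stack([np.ones_like(Tt), Tt - t0])
            A = U.T @ (w[:, None] * U)
            pred = np.linalg.solve(A, U.T @ (w * Yt))[0]   # local linear prediction
            err += (y0 - pred)**2
    return err

n, m = 20, 4
T = rng.uniform(0, 1, (n, m))
Y = np.sin(2 * np.pi * T) + rng.normal(0, 0.3, (n, m))
grid = np.array([0.05, 0.1, 0.2, 0.4])
h_cv = grid[np.argmin([loo_cv_score(h, T, Y) for h in grid])]
h_test = h_cv * n ** (-1.0 / 45)   # deflate toward the testing-optimal order
```

Holding out a whole subject at a time respects the within-subject correlation, which observation-wise cross-validation would ignore.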

Simulation Study
To illustrate the performance of the GQLR procedure and the Wilks phenomenon, we consider two simulation settings: Gaussian longitudinal data with heterogeneous variance and binary longitudinal data. To save space, we only present the simulation results for Gaussian data; results for the binary case can be found in the online supplementary material.
We generate Gaussian data from model (1) with the identity link, where the T_{k,ij} are independent random variables following a uniform distribution on [0, 1] and X_{k,ij} = T_{k,ij} + U_{k,ij}, with U_{k,ij} ∼ i.i.d. Uniform(−1, 1). Under this setting, the marginal density of (X_{k,ij}, T_{k,ij}) is the same for all (k, i, j) and E(X_{k,ij} | T_{k,ij}, T_{k,il}) = E(X_{k,ij} | T_{k,ij}) for j ≠ l. The variance of ε_{k,ij} depends on the conditional mean through a specified variance function, and we consider three scenarios for (β, θ_0(t)), listed in (21). We also assume an ARMA(1,1) correlation structure for the errors within the same subject, that is, corr{ε(s), ε(t)} = 1 for s = t and γ exp(−|s − t|/ν) for s ≠ t, and set the correlation parameters at γ = 0.6 and ν = 1.
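Errors with the stated correlation structure (1 on the diagonal, γ exp(−|s − t|/ν) off it) can be generated subject by subject via a Cholesky factor; a sketch with the stated parameter values γ = 0.6 and ν = 1 (the visit times are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)

gamma, nu, m = 0.6, 1.0, 5

def error_corr(times, gamma, nu):
    """Within-subject ARMA(1,1)-type correlation matrix:
    1 on the diagonal, gamma * exp(-|s - t| / nu) off the diagonal."""
    D = np.abs(times[:, None] - times[None, :])
    R = gamma * np.exp(-D / nu)
    np.fill_diagonal(R, 1.0)
    return R

times = np.sort(rng.uniform(0, 1, m))   # one subject's irregular visit times
R = error_corr(times, gamma, nu)
L = np.linalg.cholesky(R)               # fails unless R is positive definite
eps = L @ rng.normal(size=m)            # one subject's correlated error vector
```

Because R equals (1 − γ)I plus γ times an exponential correlation kernel, it is positive definite for 0 < γ < 1, so the Cholesky factorization always succeeds here.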
We simulate 200 datasets for each scenario and perform estimation and test under different working variance and correlation structures.

Wilks Phenomenon When the Variance Function is Consistently Estimated
For each simulated dataset, we first fit the full model in a pilot analysis with the W_{k,i} set to identity matrices, and then estimate the variance function by applying the local linear estimator described in Section 4.1 to the residuals. The inverse of the estimated variance function is used as the weight for fitting both the reduced and full models. To construct the GQLR test statistic, we use a Gaussian quasi-likelihood with working covariance V_{k,i} = S_{k,i}C(τ)S_{k,i}, where C(τ) is a working correlation matrix. For the working correlation C(τ), we consider both a misspecified exchangeable working correlation structure (i.e., corr{ε(s), ε(t)} = τ for some −(m − 1)^{−1} < τ < 1 if s ≠ t) and working independence. When the exchangeable correlation structure is assumed, the correlation parameter is estimated by the QMLE of Fan, Huang, and Li (2007). The bandwidth for the local linear estimators is chosen using the method described in Section 4.3 on a pilot dataset and is then fixed for both the reduced and full models in all simulations.
The empirical distributions of λ_n(H_0) under different settings are shown in Figure 1, where the curves are densities estimated with the "density" function in R from 200 replicates of the test statistic. The left panel of Figure 1 shows the distributions of λ_n(H_0) when a misspecified exchangeable correlation structure is used, and the right panel shows the same distributions under working independence. In each panel, the three estimated density functions correspond to the three scenarios in (21). The density functions within a panel are almost identical, confirming our theoretical result that, when the variance function is correctly specified, the asymptotic distribution of λ_n(H_0) does not depend on the values of β and θ_0(t). Comparing the curves between the two panels, we find that the distribution of λ_n(H_0) does depend on the working correlation structure C(τ). These results corroborate our theoretical findings in Corollary 1.
To confirm these findings numerically, we perform Kolmogorov-Smirnov (KS) tests of whether λ_n(H_0) has the same distribution under different scenarios. When a misspecified exchangeable correlation is assumed, the KS p-values are 0.63 for Scenario I versus II, 0.18 for Scenario I versus III, and 0.47 for Scenario II versus III. Similar results are obtained under working independence. We also test whether λ_n(H_0) has the same distribution under the two working correlation structures; those p-values are all on the order of 10^{−16}.
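The two-sample Kolmogorov-Smirnov comparison used above can be sketched with a hand-rolled statistic (in practice a library routine supplying p-values would be used); the chi-squared samples below are purely illustrative stand-ins for batches of simulated test statistics.

```python
import numpy as np

rng = np.random.default_rng(7)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a - F_b| evaluated
    over the pooled sample points (F_a, F_b are empirical CDFs)."""
    pooled = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), pooled, side='right') / len(a)
    Fb = np.searchsorted(np.sort(b), pooled, side='right') / len(b)
    return np.max(np.abs(Fa - Fb))

# two batches of "test statistics" from the same distribution, and a shifted batch
lam1 = rng.chisquare(5, 200)
lam2 = rng.chisquare(5, 200)
d_same = ks_statistic(lam1, lam2)          # small: same distribution
d_diff = ks_statistic(lam1, lam2 + 3.0)    # large: shifted distribution
```

A small KS statistic (large p-value) is consistent with the two batches sharing one null distribution, which is how the Wilks phenomenon is checked empirically above.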

Wilks Phenomenon Fails to Hold Under Variance Misspecification
To better understand the behavior of the GQLR test when the variance is misspecified, we consider the same simulation as above but misspecify the variance as a constant; that is, we assume σ²(μ) = σ² and estimate σ² by the mean squared residual from the full model. In Figure 2, we present the estimated densities of λ_n(H_0) under the misspecified variance structure and working independence. There is a visible difference among the distributions of the test statistic under the three scenarios, indicating the failure of the Wilks phenomenon. To confirm this numerically, we again perform KS tests of whether λ_n(H_0) has the same distribution under different scenarios; the KS p-values are 1.5 × 10^{−3} for Scenario I versus II, 1.3 × 10^{−4} for Scenario I versus III, and 0.39 for Scenario II versus III.

Power of the GQLR Test
We now study the power of the GQLR test and, in particular, the effect of the working correlation on the power. We focus on the simulation setting of Scenario I and consider local alternatives with θ_1(t) = θ_0(t) − 2φG(t), θ_2(t) = θ_0(t) − φG(t), θ_3(t) = θ_0(t) + φG(t), and θ_4(t) = θ_0(t) + 2φG(t). We consider two cases, G(t) = sin(t) and G(t) = sin(3πt), and set φ at different values. The null hypothesis is true when φ = 0, and as φ increases the model deviates further from H_0. The true within-subject correlation is ARMA(1,1) as described before.
For each value of φ, we generate 200 datasets from the model. We use the local linear variance estimator as the working variance for both estimation and testing, and we perform the GQLR test under two different working correlation structures: working independence and the true ARMA(1,1) correlation structure. The issue of estimating the critical value of the test using the bootstrap will be addressed in Section 5.4. For the current study, we assume the critical values are consistently estimated and focus on the effect of the correlation structure on the power of the test. Specifically, we set the significance level at α = 0.05 and use another independent set of simulations to determine the 95th percentile of λ_n(H_0) under either working independence or the true correlation structure.
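Determining the critical value from an independent set of null simulations amounts to taking an empirical quantile, as in this minimal sketch (the function name is illustrative):

```python
import numpy as np

def empirical_critical_value(null_stats, alpha=0.05):
    """Critical value as the (1 - alpha) empirical quantile of test
    statistics simulated under H0."""
    return float(np.quantile(np.asarray(null_stats, dtype=float), 1.0 - alpha))
```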
The power curves under the two settings of G(t) and the two correlation structures are depicted in the top two panels of Figure 3. We can see from the graphs that the power of the GQLR test increases with φ in all situations that we consider. Interestingly, the simulation results suggest that using the true correlation in the test does not always increase the power. In panel (a) of Figure 3, with G(t) = sin(t), the test based on working independence is more powerful than that using the true correlation, while the opposite is true when G(t) = sin(3πt), as shown in panel (b). The power of the test instead depends on a complicated interaction between G(t) and the true correlation, as suggested by Theorem 2. To validate this finding using our theoretical results, we also calculate the theoretical power of the test given in (20). The quantities σ_n^2, σ_{1n}^2, μ_n, and μ_{1n} are estimated by replacing expectations with sample means. These theoretical power curves also confirm that the working independence test is more powerful when G(t) = sin(t), and the opposite is true when G(t) = sin(3πt).
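The Monte Carlo power at each φ is simply the rejection frequency against the fixed critical value. A minimal sketch, assuming a user-supplied routine `simulate_stat(phi)` that generates one dataset at the given φ and returns the test statistic (both names are hypothetical):

```python
import numpy as np

def power_curve(simulate_stat, phis, critical_value, n_rep=200):
    """Monte Carlo power at each phi: the fraction of n_rep simulated
    test statistics that exceed the critical value."""
    return [float(np.mean([simulate_stat(phi) > critical_value
                           for _ in range(n_rep)]))
            for phi in phis]
```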

Estimating Null Distribution with Wild Bootstrap
Our study of the power of the test in the previous subsection suggests that, even if we consistently estimate the null distribution of the test statistic, using the true correlation structure may not increase the power of the GQLR test. In this subsection, we focus on estimating the null distribution of λ_n(H_0) under working independence using the wild bootstrap procedure described in Section 4.2.
We use the data generated for Scenario I and apply the wild bootstrap procedure with 500 bootstrap samples to each of the 200 datasets. We compare the bootstrap distributions with the true distribution of the test statistic in the left panel of Figure 4. The solid curve in the graph is the true distribution of λ_n(H_0), the dashed curve is the mean of the bootstrap densities from the 200 replicates, and the two dotted curves are the pointwise 1% and 99% percentiles of the bootstrap densities. For comparison, we also plot the asymptotic normal distribution in Corollary 2 as the dash-dot curve in the same plot. As we can see, for a moderate sample size of 50 subjects per group, there is still a large discrepancy between the true distribution of λ_n(H_0) and the asymptotic normal distribution, which indicates a slow convergence to the limiting distribution for this nonparametric test. This result is in agreement with the simulation studies in Härdle and Mammen (1993). The bootstrap distribution does a much better job of approximating the truth, even though it still tends to underestimate the variation of λ_n(H_0).
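The general shape of a wild bootstrap for a test statistic can be sketched as follows. This is not the exact procedure of Section 4.2 (which is outside this excerpt) but a generic version: residuals from the null fit are perturbed by mean-zero, unit-variance weights (Mammen's two-point distribution is one common choice), new responses are formed around the null fit, and the statistic is recomputed. The function and argument names are hypothetical.

```python
import numpy as np

def wild_bootstrap_stats(resid_null, fitted_null, compute_stat,
                         n_boot=500, rng=None):
    """Generic wild bootstrap for the null distribution of a test
    statistic, using Mammen's two-point weights (mean 0, variance 1)."""
    rng = np.random.default_rng() if rng is None else rng
    resid = np.asarray(resid_null, dtype=float)
    fitted = np.asarray(fitted_null, dtype=float)
    # Mammen's two-point distribution
    a = -(np.sqrt(5.0) - 1.0) / 2.0
    b = (np.sqrt(5.0) + 1.0) / 2.0
    p = (np.sqrt(5.0) + 1.0) / (2.0 * np.sqrt(5.0))
    out = np.empty(n_boot)
    for m in range(n_boot):
        w = np.where(rng.random(resid.size) < p, a, b)
        y_star = fitted + w * resid        # perturbed responses
        out[m] = compute_stat(y_star)      # recompute the statistic
    return out
```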
To use the bootstrap in tests, it is more relevant to check the performance of the bootstrap quantiles. In the right panel of Figure 4, the dark dots indicate the true quantiles Q_{1−α} of λ_n(H_0), and the vertical bars are the percentile bands of the bootstrap estimator of Q_{1−α}. For each 1 − α, the lower end of the vertical bar is the 2.5th percentile of the bootstrap estimate of Q_{1−α}, and the upper end is the 97.5th percentile. As we can see, the true quantiles are covered by these bands.
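The percentile bands described above can be computed as a short sketch: take the (1 − α) quantile within each replicate's bootstrap sample, then the 2.5th and 97.5th percentiles of these estimates across replicates. The helper name is illustrative.

```python
import numpy as np

def coverage_band(boot_stat_sets, alpha=0.05, band=(2.5, 97.5)):
    """For each replicate's bootstrap statistics, take the (1 - alpha)
    empirical quantile; return the percentile band of these estimates
    across replicates."""
    qs = [np.quantile(np.asarray(s, dtype=float), 1.0 - alpha)
          for s in boot_stat_sets]
    return (float(np.percentile(qs, band[0])),
            float(np.percentile(qs, band[1])))
```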

Analysis of the Opioid Dependence Treatment Data
We now illustrate the application of our proposed GQLR test with a dataset on opioid agonist treatment. The data were collected from 140 patients who received primary care-based buprenorphine, a commonly prescribed medication for treating opioid dependence, at the Primary Care Center of Yale-New Haven Hospital. Each patient first went through a two-week induction and stabilization period and was then prescribed daily buprenorphine for 24 weeks. Prior research has shown that adding counseling to buprenorphine can help increase the opioid abstinence rate (Amato et al. 2011). The main objective of the study was to investigate the impact of adding cognitive behavioral therapy, a counseling intervention for treating a variety of psychiatric conditions and substance use disorders (Crits-Christoph et al. 1999; Beck 2005; Butler et al. 2006; McHugh, Hearon, and Otto 2010), on the efficacy of primary care-based buprenorphine for treating opioid dependence. The patients were randomly assigned to receive one of two treatments: physician management (PM) or physician management and cognitive behavioral therapy (PMCBT). Physician management was provided in the form of 15- to 20-minute sessions by internal medicine physicians who had experience with providing buprenorphine but had no training in cognitive behavioral therapy. These sessions were given weekly for the first two weeks, every two weeks for the next four weeks, and then monthly afterward. Patients in the PMCBT group were offered the additional opportunity to participate in up to twelve 50-minute weekly cognitive behavioral therapy sessions during the first 12 weeks of treatment. All counseling sessions were given by well-trained masters- and doctoral-level clinicians. The main components of counseling focused on developing behavioral skills such as promoting behavioral activation and identifying and coping with opioid craving.
Illicit opioid use was measured weekly by both self-reported frequency of opioid use and urine toxicology testing. The latter was conducted with the use of a semiquantitative homogeneous enzyme immunoassay for opioids and other substances such as cocaine and oxycodone. Because the accuracy of self-reported opioid use can be questionable, we considered only the urine data. The time points at which the urine tests were performed were unbalanced and irregular, because most patients did not provide urine samples on a strict weekly basis. Some subjects also had follow-up measurements going up to 195 days. The number of observations per patient was between 1 and 27, with a median of 24. The covariates we used included age, gender (1 = female / 0 = male), and the highest level of education completed (1 = high school or higher / 0 = otherwise); the time variable was day, ranging from day 0 to day 195. The response variable was the urine toxicology testing result (1 = positive / 0 = negative).
Among the 140 patients, 69 were in the PM group and 71 in the PMCBT group. We now analyze the data using the proposed model (1), where the link function g(·) is set to be the logistic link. The bandwidth in the estimation is selected using the method described in Section 4.3. The estimated regression coefficients for the covariates are reported in Table 1, where the standard errors of these estimates are calculated using a sandwich formula similar to that on page 1056 of Lin and Carroll (2001). Among the three covariates, education level is significantly related to the response variable. The estimated mean curves for the two groups are presented in Figure 5. At the beginning of the treatment, the mean curves of the two groups were almost the same, while after about 70 days, patients in the PMCBT group had a lower probability of opioid use. Near the end of the treatment, the rate of opioid use increased in both groups, but patients receiving additional cognitive behavioral therapy seemed to have a lower overall rate of use.
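The sandwich standard errors mentioned above are, in spirit, cluster-robust variance estimates with clustering by subject. The sketch below is not the exact formula of Lin and Carroll (2001); it is a generic working-independence logistic version, with all names hypothetical, illustrating the bread-meat-bread construction.

```python
import numpy as np

def sandwich_se_logistic(X, y, beta, ids):
    """Cluster-robust (sandwich) standard errors for a working-independence
    logistic fit evaluated at coefficient vector beta, clustered by ids."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ids = np.asarray(ids)
    mu = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
    W = mu * (1.0 - mu)                        # logistic variance weights
    bread = np.linalg.inv(X.T @ (W[:, None] * X))
    meat = np.zeros((X.shape[1], X.shape[1]))
    for sid in np.unique(ids):                 # accumulate subject scores
        idx = ids == sid
        g = X[idx].T @ (y[idx] - mu[idx])
        meat += np.outer(g, g)
    V = bread @ meat @ bread
    return np.sqrt(np.diag(V))
```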
Our goal is to test whether the mean curves of the two groups are significantly different after taking into account the covariate effects. There has been some literature on testing the difference between two mean curves for independent Gaussian-type data, such as the methods proposed by Hall and Hart (1990). However, these classic methods are not applicable to our semiparametric setting with a nonlinear link function. Instead, we apply our proposed GQLR test to the data using a working independence binary quasi-likelihood. The p-value of the test is 0.044 based on 1000 bootstrap replicates. We therefore conclude that there is a significant difference between the two treatment groups. In particular, Figure 5 suggests that cognitive behavioral therapy improved opioid abstinence over time. Fiellin et al. (2013) analyzed the same data, using self-reported frequency of opioid use and the maximum number of consecutive weeks of abstinence from illicit opioids in the two 12-week periods as the primary outcome measures, but did not find any evidence supporting the benefit of adding cognitive behavioral therapy. Their outcome measures were aggregated and captured only certain features of the data. In contrast, our analysis is based on the entire longitudinal trajectory and hence is more powerful in detecting the difference between the two treatments.
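A bootstrap p-value like the one reported above is typically the fraction of bootstrap statistics at least as large as the observed one; a common finite-sample correction adds one to both numerator and denominator. This generic sketch uses that corrected form (an assumption, not necessarily the exact convention used in the paper):

```python
import numpy as np

def bootstrap_pvalue(observed_stat, boot_stats):
    """Bootstrap p-value: proportion of bootstrap statistics at least as
    large as the observed one, with the +1 finite-sample correction."""
    b = np.asarray(boot_stats, dtype=float)
    return float((1 + np.sum(b >= observed_stat)) / (1 + b.size))
```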

Summary
We investigate a class of semiparametric analysis of covariance models for generalized longitudinal data, where the treatment effects are represented as nonparametric functions over time and covariates are incorporated to account for the variability caused by confounders. We propose to test the treatment effects using a generalized quasi-likelihood ratio test. Our theoretical study reveals that when the variance structure is correctly specified, the asymptotic distribution of the GQLR test statistic does not depend on the mean parameters but still depends on the true and working correlation structures. When the variance is misspecified, the Wilks property might fail completely in the sense that the distribution of the test statistic depends on all nuisance parameters. In these situations, the bootstrap distribution of the test statistic might be inconsistent, which in turn results in invalid inferences.
However, if working independence is assumed in both estimation and test and if the variance structure is correctly specified, the Wilks phenomenon known to hold for independent data also holds for longitudinal data. Replacing the true variance with a consistent estimator, such as a nonparametric local linear estimator, only causes an asymptotically negligible error to the test. In this case, we propose a working independence wild bootstrap procedure that can consistently estimate the null distribution. We have also shown that the GQLR test yields the minimax optimal power rate.
Our test procedure is based on estimators similar to those in Lin and Carroll (2001). More sophisticated kernel estimators were proposed in Wang, Carroll, and Lin (2005) and Lin and Carroll (2006). It is unclear how much power one can gain by building tests on these more sophisticated estimators, since our proposed test already attains the minimax optimal power rate. Tests based on other smoothers, such as penalized splines (Crainiceanu et al. 2006), or on other principles, such as empirical likelihood (Chen and Van Keilegom 2009), can potentially be extended to our problem. These possibilities are open questions that call for future investigation.

Supplementary Materials
The online supplementary material contains technical proofs for the theoretical results and additional simulation results.