Simultaneous Inference for HIV Dynamic Models with Skew-t Distribution Incorporating Mismeasured Covariate and Multiple Treatment Factors

It is a common practice to analyze AIDS longitudinal data using nonlinear mixed-effects (NLME) models with normal distribution for HIV dynamics. Normality of model errors may unrealistically obscure important features of subject variations. To partially explain between- and within-subject variations, covariates are usually introduced in such models; some covariates, however, may be often measured with substantial errors. This article, motivated by an AIDS clinical study, discusses a Bayesian NLME joint modeling approach to viral dynamic models with skew-t distribution in the presence of covariate measurement error. In this model, we fully integrate viral load response, time-varying CD4 covariate with measurement error, and time-dependent drug efficacy, which is a function of multiple treatment factors, into the data analysis. Thus, the purpose of this article is to demonstrate our models and methods with application to an AIDS clinical trial study. The results suggest that modeling HIV dynamics and virologic responses with consideration of covariate measurement error and time-varying clinical factors may be important for HIV/AIDS studies in providing quantitative guidance to better understand the virologic responses to antiretroviral treatment and to help evaluation of clinical trial design in existing therapies.


Introduction
Studies of the HIV dynamics, using biomathematical models, have considerably improved the knowledge of the pathogenesis of HIV infection and guided for the treatment of AIDS patients and evaluation of antiretroviral (ARV) therapies. Viral dynamic studies have a common structure in that they use repeated measurements over the period of treatment to assess rates of changes in viral load. As a result, nonlinear mixed-effects (NLME) models based on biexponential models, derived from a dynamic compartmental analysis, are often used to model the viral load trajectory and to quantify individual heterogeneity among subjects (Ding and Wu 2000). Although long-term treatment in HIV infected patients with highly active ARV therapies (HAART) results in a decrease of plasma HIV-1 RNA (viral load), the decay in viral load occurs in the first few weeks after beginning treatment (Perelson et al. 1996). It may be sustained for a long period, but often is followed by resurgence of viral load within months as observed in clinical trial studies (Acosta et al. 2004). The resurgence of virus may be caused by  (50)).
multiple clinical factors such as drug resistance, poor patient adherence, pharmacokinetic (PK) variation during therapy, and (time-varying) covariates such as CD4 cell count. These general phenomenons on viral load responses observed from AIDS clinical trial studies are displayed in Figure 1(a) in Section 2 for the three representative patients; it motivates us to describe the viral decay (viral load response) associated with (time-varying CD4) covariate, and time-dependent treatment efficacy that is closely related to multiple treatment factors. HIV dynamic models can be formulated through a system of ordinary differential equations (ODE) to describe the interaction between CD4 cells and viruses (Huang, Liu, and Wu 2006;Labbé and Verttoa 2006;Lavielle et al. 2011). Although it is biologically important to directly work with models specified by an ODE system, statistical inference approaches based on such models will pay high computational costs and experience considerable problems with ODE numerical solutions and statistical algorithms. Thus, there is a need to develop flexible models that include the confounding interactions of drug susceptibility, medication adherence, PK variation, and time-varying CD4 covariate on virologic responses. With these considerations, we adopt a biphasic exponential model (Ding and Wu 2000) in this article, which is approximately obtained from an ODE system. This biphasic exponential model reduces not only computational costs significantly, but also captures long-term viral load with complex trajectories and integrates timevarying CD4 covariate as well as time-varying treatment efficacy (a function of drug susceptibility, medication adherence, PK variation) into the model for data analysis.
A large number of statistical modeling methods have been investigated for analyzing longitudinal data with various features. First, the commonly assumed distribution for model random error is normal, but this assumption may lack robustness against departures from normality and/or outliers and may obscure important features of between-and within-subject variations since collected data are often far from symmetric. Thus, statistical inference and analysis with normal assumption may lead to misleading results (Verbeke and Lesaffre 1996;Sahu, Dey, and Branco 2003). In addition, the validity of inferential methods relies on an important requirement that variables are "perfectly" measured. In practice, however, collected longitudinal data are often far from "perfect." Covariate measurement error is such a common feature of longitudinal data, and statistical inference that does not consider covariate measurement error may result in biased results. To the best of our knowledge, there is relatively little work done on simultaneously accounting for skewness and covariate measurement error, which are inherent features of longitudinal data as well as multiple treatment effects, which are determined by drug susceptibility, medication adherence, and PK variation.
It is not clear how asymmetry and covariate measurement error of data may interact with multiple treatment factors and simultaneously influence inferential procedures. This article investigates the effects on inference when both features exist in the longitudinal data and timedependent drug efficacy as a function of multiple treatment factors is incorporated in the models. To achieve our objective, we employ a Bayesian inferencial approach to jointly investigate the NLME model with a skew-t (ST) distribution (Arellano-Valle and Genton 2005; Azzalini and Capitanio 2003;Ho and Lin 2010;Sahu et al. 2003) for the viral load response process, and the nonparametric mixed-effects model with the ST distribution for CD4 covariate measurement error process. This article provides a unified approach to investigate ST Bayesian NLME models with covariate measurement errors and demonstrates the proposed modeling approach implemented in a real application. It is noted that the models and methods introduced in this article are extended from previous work with tailoring as follows: (i) the model errors are assumed to follow an ST distribution instead of a skew-normal distribution, which is a special case of the ST distribution when the degrees of freedom approach infinity; (ii) time-dependent drug efficacy covariate, which is a function of drug susceptibility, medication adherence, and PK, is incorporated into the model to investigate how treatment affects the change of viral load in HIV-infected patients; (iii) the more flexible unknown nonparametric mixed-effects model instead of a standard linear mixedeffects model is conducted to model measurement error for CD4 covariate; (iv) the model development is based on data availability from an AIDS study and, thus, we fully incorporate viral load response, time-varying CD4 covariate with measurement error, and time-dependent drug efficacy, which is a function of three treatment factors, into the data analysis and demonstrate the proposed models and methods implemented using the dataset of an AIDS clinical study.
We consider a multivariate ST distribution introduced by Sahu, Dey, and Branco (2003) which is suitable for a Bayesian inference and is briefly discussed in Appendix A of the online supplementary materials. The remainder of the article is organized as follows. In Section 2, we summarize the motivating dataset from an AIDS clinical trial study and introduce the HIV dynamic model as well as associated time-varying treatment efficacy function. In Section 3, we investigate the NLME joint model with ST distribution for HIV response incorporating CD4 covariate with measurement error and time-varying treatment efficacy, and also discuss associated Bayesian simultaneous inference approach. Section 4 presents modeling and analysis results, and finally we conclude the article with some discussion in Section 5.

Motivating Dataset
AIDS Clinical Trials Group (ACTG) Protocol A5055 was a Phase I/II, randomized, open-label, 24-week comparative study of two regimens of indinavir (IDV) and ritonavir (RTV), plus two nucleoside analogue reverse transcriptase inhibitors (NRTIs) on HIV-infected patients who failed protease inhibitor (PI)-containing ARV therapies (Acosta et al. 2004). Forty-four patients were randomized to one of two regimens: IDV 800 mg + RTV 200 mg twice daily and IDV 400 mg + RTV 400 mg twice daily. Patients were scheduled for follow-up visits at study days 0, 7, 14, 28, 56, 84, 112, 140 and 168. More detailed description of this study and data is given in Acosta et al. (2004). A summary of measurements of data from this study to be used in our analysis is briefly described below.
RNA viral load and CD4 cell count: RNA viral load was measured in copies/mL at designed study days. In this study, there are about 9% of the HIV-1 RNA measures below the detectable limit of 50 copies/mL which are not considered reliable, therefore we simply imputed such values as half of 50 copies/mL (Acosta et al. 2004;Huang, Liu, and Wu 2006). The exact day of viral load measurement (not predefined study day) was used to compute study day in our analysis. Covariates such as CD4 cell count were also measured throughout the study on similar scheme. It was seen that observed data in this study are often far from being "symmetric" displayed by histograms of viral load in natural log-transformation and CD4 cell count (not shown here, but see Figure 1 in Huang and Dagne (2011) where a similar plot was presented); asymmetric patterns of observations of viral load (in log scale) and CD4 cell count usually occur and measurement errors in CD4 cell count often arise. Thus, an asymmetric distribution (such as the ST distribution) should be more appropriate than a symmetric distribution, and statistical analysis must take these data features into account. Figure 1 shows observed viral load (in log scale) and CD4 cell count measurements after the initiation of an ARV treatment for three randomly selected patients. We see that the viral load trajectories in the initial period follow a clear pattern (a rapid initial decay, called firstphase viral decay). After the initial period, however, the viral load trajectories can be quite complicated (a slower decay and some may rebound, called second-phase viral decay).
Phenotypic drug susceptibility: Phenotypic drug susceptibilities were retrospectively determined from baseline samples. Phenotypic determination of ARV drug resistance was performed at baseline and/or at the time of virological failure (viral load rebounds). Some patients had virologic failure and phenotypic susceptibility testing done on samples at the time of failure. For analysis, we used the phenotype marker, the median inhibitory concentration (IC 50 ), which represents the drug concentration necessary to inhibit viral replication by 50%, to quantify agent-specific drug susceptibility (Molla et al. 1996). The baseline (•) and failure time (×) IC 50 from 44 individuals for IDV/RTV drugs are displayed in Figure 2 (upper panel) which were used to construct IC 50 (t). Note that for patients without virological failure, IC 50 (t) was approximately held by a constant with the baseline IC 50 over time. Pharmacokinetic variation: An intensive PK evaluation was performed on day 14. Plasma for intensive PK analysis was obtained at predose, and 0.5, 1, 2, 3, 4, 5, 6, 8, 10, and 12 hr following an observed IDV/RTV dose. PK parameters of IDV and RTV were determined using noncompartmental methods. Calculated PK parameters included maximum (C max ), minimum (C min ) drug concentration, and area under the curve (AUC). Wu et al. (2006) compared these PK parameters as predictors of virological responses and no significant differences were found. Thus, C min displayed in Figure 2 (middle panel) was used in our analysis because it is easily obtained in clinical studies.

Baseline characteristics:
The baseline viral load in ln scale, CD4 cell count, age, and weight were chosen for further correlation analysis based on modeling results. As an example, the baseline viral load and CD4 cell count of 44 individuals are displayed in Figure 2 (lower panel).
Medication adherence: Medication adherence was measured by the use of questionnaires. It was completed by the study participant and/or by a face-to-face interview with study personnel. As an example, the adherence rates over time based on questionnaire data for IDV (dotted stairstep line) and RTV (dashed stairstep line) drugs from the three randomly selected patients are presented in Figure 5 in Section 4.

Time-varying drug efficacy:
We briefly discuss the drug efficacy function with two or more agents. In clinical practice, genotypic or phenotypic tests can be performed to determine the sensitivity of HIV-1 to ARV agents before a treatment regimen is selected. Here, we use the phenotypic marker, IC 50 , to quantify agent-specific drug susceptibility. Because experimental data tracking development of resistance suggest (Molla et al. 1996) that the resistant fraction of the viral population grows exponentially, we propose a ln-linear function to model withinhost changes over time in IC 50 as follows: where S 0 and S r are, respectively, exponential values of IC 50 at baseline and time point t r at which the resistant mutations dominate. In our study, t r is the observed time of virologic failure from clinical studies. Given that IC 50 is measured only at baseline and at the time of treatment failure (Molla et al. 1996), IC 50 (t) remains practical although more complex models for IC 50 (t) can be considered. For patients without a failure time IC 50 , baseline IC 50 was held constant over time. In other words, if S r = S 0 , no new drug resistant mutation is developed during treatment. As an example, such function for two ARV drugs is plotted in Figure 3(a). Poor adherence to a treatment regimen is one of the major causes of treatment failure (Ickovics and Melisler 1997). Patients may occasionally miss doses, may misunderstand prescription instructions, or may miss multiple consecutive doses for various reasons. These deviations from prescribed dosing affect drug exposure in predictable ways (Ickovics and Melisler 1997;Gabrielsson and Weiner 2000). We use the following model to represent medication adherence, where 0 ≤ R < 1, with R indicating the adherence rate for a drug (in our study, we focus on the two PI drugs discussed previously). T k denotes the adherence evaluation time at the kth clinical visit. As an example, Figure 3(b) shows the effect of adherence over time for two ARV drugs.
HAART containing three or more reverse transcriptase inhibitors (RTIs) and protease inhibitors (PIs) has proved to be effective at reducing the amount of virus in the blood and tissues of HIV-infected patients. In the previous viral dynamic studies (Perelson et al. 1996;Ding and Wu 2000;Labbé and Verttoa 2006) investigators assumed that the drug efficacy was constant over treatment time. Drug efficacy may actually vary, however, because the concentrations of ARV drugs and other factors (i.e., emergence of drug-resistant mutations) vary during treatment. Also, patients' viral load may rebound because of drug resistance, nonadherence, and other factors. A simple pharmacodynamic (PD) sigmoidal Emax model for the dose-effect relation follows (Gabrielsson and Weiner 2000) where E max is the maximal effect that can be achieved, C is the drug concentration, and EC 50 is the drug concentration that induced an effect equivalent to 50% of the maximal effect. Many different variations of the E max model have been developed by pharmacologists to model PD effects. More detailed discussions on E max models can be found in Gabrielsson and Weiner (2000) and Huang, Rosenkranz, and Wu (2003). To model the relationship of multiple treatment factors with ARV drug efficacy, we follow Huang, Rosenkranz, and Wu (2003) and Huang, Liu, and Wu (2006) to adopt the following the time-varying drug efficacy for two ARV agents within a class: where γ (t) ranges from 0 to 1; A d (t), C d min , and IC d 50 (t) (d = 1, 2) are the adherence profile, the minimum drug concentration in plasma, and the time-course of median inhibitory concentrations for the two agents, respectively. Note that C min could be replaced by other PK parameters such as AUC and C max . The example of drug efficacy γ (t) for two ARV drugs is shown in Figure 3(c).

HIV Dynamic Models
As discussed previously, viral dynamic models can be formulated through a system of ODE (Huang, Liu, and Wu 2006). Based on biological and clinical arguments, Ding and Wu (2000) proposed the following biphasic exponential model with the constant first-and secondphase viral decay rates to approximately describe HIV viral dynamics: where V (t) is the plasma HIV-1 RNA level (viral load) at time t, the unknown constants λ 1 and λ 2 are called the first-and second-phase viral decay rates, respectively. The first-and second-phase viral decay rates may represent the minimum turnover rate of productively infected cells and that of latently or long-lived infected cells, respectively. It is of particular interest to estimate these two viral decay rates because they quantify the ARV effect, and hence, can be used to assess the efficacy of the ARV treatment.
In estimating these decay rates, only the early viral load data with decreasing patterns have been used due to the feature of model (4) (Ding and Wu 2000). Since the viral load trajectory may change to different shapes in the late stages, it may not be reasonable to assume that the secondphase decay rate λ 2 remains constant during long-term treatment. To model the long-term viral load responses, a biexponential model with a time-varying second-phase decay rate λ 2 (t) can be constructed as follows (Wu and Zhang 2002): In this study, the second-phase decay rate λ 2 (t) (to be discussed in detail below) is assumed to be a function of time-varying CD4 with measurement error and timedependent drug efficacy. Intuitively, model (5) is more reasonable because it assumes that the decay rate can vary with time as a result of drug resistance, pharmacokinetics, medication adherence, and other relevant clinical factors. Therefore, all data observed during treatment period can be used to fit model (5). This is a time-varying parametric model because of the mechanistic (two-exponential) structure with constant parameters (λ 1 , p 1 , p 2 ) and a time-varying parameter λ 2 (t) to capture various viral load trajectories over a long-term period. Actually, by including both viral load and other clinical/covariate data, the estimate of λ 1 can be more accurate and reasonable based on model (5) in comparison with that obtained in previous studies based on model (4) (Ding and Wu 2000) and among others where long-term viral load data are excluded based on some ad hoc rules for modeling and analysis. In the mean time, the estimate of λ 2 (t) provides not only an approximate turnover rate over time of long-lived/latently infected cells at the early stage of treatment as the standard parametric model does, but also more importantly describes how it may change over a long treatment period as driven by, presumably, drug exposure, drug resistance, and other clinical determinants. Most importantly, this model is capable of mod-eling long-term viral load data of which the trajectory may vary substantially among different patients.

Bayesian NLME Joint Model With ST Distribution
Because viral load is measured on each subject repeatedly over the study period, the measurements obtained from the same subject may be correlated, but they are assumed independent between patients. One powerful tool available to handle such longitudinal data is the NLME model, in which both within-subject and between-subject variations are considered. Thus, it is natural to consider the NLME model in conjunction with the HIV dynamic model (5) as follows: where y i j= y i j (t i j ) is the natural log-transformation of the viral load V (t i j ) with model error e i j for the ith subject at the jth time point t i j (i = 1, 2, . . . , n; j = 1, 2, . . . , n i ), z * i j (t i j ) to be discussed below indicates a summary of the true (but unobserved) CD4 values at time t i j , γ i j (t i j ) is drug efficacy specified by Equation (3) for the ith subject at time t i j , β i j = ( p i1 , p i2 , λ i1 , λ i j2 ) T , and β = (β 1 , β 2 , . . . , β 6 ) T are individual and population parameters, respectively, b i = (b 1i , . . . , b 4i ) T are individual random effects. The vector of random errors e i = (e i1 , . . . , e in i ) T follows a multivariate ST distribution, ST n i ,ν 1 (−J (ν 1 )δ e 1 n i , σ 2 1 I n i , δ e I n i ), with degrees of freedom ν 1 , precision parameter σ 2 1 and skewness parameter δ e , where J (ν 1 ) = (ν 1 /π ) 1/2 [ ((ν 1 − 1)/2)/ (ν 1 /2)]. In model (6), we assume that the individual-specific parameter λ i j2 (t i j ) depends on the drug efficacy γ i j (t i j ) and the true (but unobserved) CD4 covariate z * i j (t i j ) rather than the observed CD4 covariate z i j (t i j ) which may be measured with substantial error for capturing long-term viral load trajectories with different shapes including viral load rebound. Note that the log-transformation of the viral load is taken here to stabilize the variation of measurement errors and to speed up the estimation algorithm. Various covariate mixed-effects models were investigated in the literature (Wu 2002;Carroll et al. 2006). Since the CD4 covariate may be measured with substantial error and skewness as discussed previously, validation data are in general needed to address these data features. With CD4 measures collected over time, we may model the CD4 process to partially address the measurement error and skewness; see Huang and Dagne (2011) and Wu (2002) for an example of modeling the covariate process parametrically. However, the CD4 trajectories are often complicated, with no well-established model for the CD4 process. We, thus, adopt a flexible empirical nonparametric mixed-effects model with an ST distribution to address CD4 measurement error and skewness as follows: where w(t i j ) and h i (t i j ) are unknown nonparametric smooth fixed-effects and random-effects functions, respectively; may be viewed as the true (but unobserved) CD4 covariate values at time t i j ; i = ( i1 , . . . , in i ) T follows a multivariate ST distribution with ν 2 degrees of freedom, precision parameter σ 2 2 and skewness parameter δ . To fit model (7), we apply the regression spline method. The working principle is briefly described as follows and more details can be found in Wu and Zhang (2002). The main idea of regression splines is to approximate w(t) and h i (t) by using a linear combination of spline basis functions. For instance, w(t) and h i (t) can be approximated by a linear combination of basis functions p (t) = {ψ 0 (t), ψ 1 (t), . . . , ψ p−1 (t)} T and α = (α 0 , . . . , α p−1 ) T is a p × 1 vector of fixed-effects and a i = (a i0 , . . . , a i,q−1 ) T (q ≤ p) is a q × 1 vector of random effects with a i iid ∼ N q (0, a ). Based on the assumption on h i (t), we can regard a i as iid realizations of a zero-mean random vector. For our model, we consider natural cubic spline bases with percentile-based knots. To select an optimal degree of regression spline and numbers of knots, that is, optimal sizes of p and q, the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) are often applied (Wu and Zhang 2002). Substituting w(t) and h i (t) by their approximations w p (t) and h iq (t), we can approximate model (7) as follows.
In our data analysis, we set ψ 0 (t) = φ 0 (t) = 1 and take the same natural cubic splines in the approximations (8) with q ≤ p (to limit the dimension of random effects). The values of p and q are determined based on the AIC/BIC model selection criteria. The AIC/BIC values are evaluated for various models with ( p, q) = {(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)} which suggest the following ST nonparametric mixed-effects CD4 covariate model: where ψ 1 (·) and ψ 2 (·) are two basis functions given above, α = (α 0 , α 1 , α 2 ) T is a vector of population parameters (fixed-effects), a i = (a i0 , a i1 , a i2 ) T is a vector of random effects. In addition, to avoid too small or large estimates which may be unstable, we standardize the time-varying covariate CD4 cell count and rescale the original time (in days) so that the time scale is between 0 and 1. In a longitudinal study, such as the AIDS study described previously, the longitudinal response and covariate processes are usually connected physically or biologically. Although a simultaneous inference method based on a joint likelihood for the covariate with measurement error and response data may be favorable, the computation associated with the joint likelihood inference in such models for longitudinal data can be extremely intensive and, particularly, may lead to serious convergence problems (Wu 2002). Here we propose a simultaneous Bayesian inference method based on MCMC procedure for the response and covariate models (6) and (10) to estimate all the parameters of the joint models which offers advantages to avoid computational and convergence problems.
Let i1 , β i1 ), . . . , g(t in i , β in i )) T . Following the study by Sahu, Dey, and Branco (2003) and properties of the ST distribution, to specify models (6) and (10) for MCMC computation it can be shown by introducing two n i × 1 random vectors w e i = (w e i1 , . . . , w e in i ) T and w i = (w i1 , . . . , w in i ) T (i = 1, . . . , n) based on the stochastic representation for the ST distribution that y i = (y i1 , . . . , y in i ) T and z i = (z i1 , . . . , z in i ) T with respective random effects b i and a i can be hierarchically formulated as follows (see Appendix A of the online supplementary materials for more details to justify these derivations): where A) denote the n ivariate t distribution with parameters μ, A and degree of freedom ν, I(w > 0) is an indicator function. An important advantage of the above representations based on the hierarchical models is that they allow one to easily implement the method using the freely available WinBUGS software (Lunn et al. 2000) and the computational effort is almost equivalent to the one necessary to fit the models with a standard t-distribution. Our methodology can be widely applied to real problems for longitudinal studies as long as they meet the specifications proposed in this article.
Let the observed data D = {( y i , z i ), i = 1, . . . , n}. Let f (·), f (·|·) and π (·) be a generic density function, a conditional density function, and a prior density function, respectively. One usually assumes that α, β, σ 2 1 , σ 2 2 , a , b , ν 1 , ν 2 , δ e , and δ are independent of each other, that is, π (θ) = π (α)π (β)π (σ 2 1 )π (σ 2 2 ) π ( a )π ( b )π (ν 1 )π (ν 2 )π (δ e )π (δ ). After we specify the models for the observed data and the prior distributions for the unknown model parameters, we can make statistical inferences for the parameters based on their posterior distributions under a Bayesian framework. Thus, the joint posterior density of θ based on the observed data D is given by In general, the integrals in (13) are of high dimension and do not have closed form. Analytic approximations to the integrals may not be sufficiently accurate. Therefore, it is prohibitive to directly calculate the posterior distribution of θ based on the observed data. As an alternative, MCMC procedures can be adopted to sample based on (13) using the Gibbs sampler along with the Metropolis-Hastings (M-H) algorithm using WinBUGS software (Lunn et al. 2000). The program code for ST Model I is available in Appendix B of the online supplementary materials.
In particular, the MCMC scheme for drawing samples based on the posterior distributions for all parameters in the joint models is obtained by iterating between the following two steps: (i) Gibbs sampler is used to update α, β, σ 2 1 , σ 2 2 , a , b , ν 1 , ν 2 , δ , and δ e since the full conditional distributions for these parameters can be obtained explicitly; (ii) we update b i and a i (i = 1, 2, . . . , n) using the M-H algorithm since the full conditional distributions for random-effects parameters cannot be expressed explicitly, but they are proportional to exponential functions (see Huang, Liu, and Wu 2006 in detail). Note that it is not needed to specify the full conditional distributions explicitly or proportional functions of full conditional distributions for parameters to be estimated when the WinBUGS software is used (see the program code in Appendix B of the online supplementary materials for details). Although their derivations by working the complete joint posterior density (13) are straightforward, some cumbersome algebra will be involved. Thus, we omit those here to save space. After collecting the final MCMC samples, we are able to draw statistical inferences for the unknown parameters. Specifically, we are interested in the posterior means and quantiles. See Lunn et al. (2000) and Huang, Liu, and Wu (2006) for detailed discussion of the Bayesian modeling approach and the implementation of the MCMC procedures. Convergence of the generated MCMC samples is assessed using standard tools within the WinBUGS software such as trace plots and Gelman-Rubin (GR) diagnostics (Gelman and Rubin 1992;Ntzoufras 2009). Figure 4 shows the dynamic version of GR diagnostics based on two chains run for Model I as obtained from the WinBUGS software for the representative parameters where the three curves are given: the middle and bottom curves below the dashed horizontal line (indicated by the value one) represent the pooled posterior variance (V , green color) and average within-sample variance (W , blue color), respectively, and the top curve represents their ratio (R, red color). Note that in WinBUGS, these measures of posterior variability are estimated based on the widths of the 80% posterior credible intervals (see Ntzoufras 2009 in detail). It is seen thatR tends to 1, andV and W will stabilize as the number of iterations increases indicating that the algorithm has approached convergence (Gelman and Rubin 1992;Brooks and Gelman 1998;Ntzoufras 2009). With the GR convergence diagnostics observed, we propose that every 20th MCMC sample is retained from the next 200,000 after an initial number of 50,000 burn-in iterations of one chain of length 250,000. Thus, we obtain 10,000 samples of targeted posterior distributions of the unknown parameters for statistical inference. Along with this sampling procedure, we also check the k-lag serial correlation of the samples for each parameter to diagnose independence of the MCMC samples. We graphically checked the last 500 samples drawn from the MCMC sampling scheme for each parameter (plots not shown here) and found that the consecutive samples move randomly toward different directions, which indicates that the MCMC is not "sticky" and MCMC samples are independent for each parameter, suggesting convergence to the stationary distribution.

Results of Model Fitting and Parameter Estimation
Bayesian joint modeling approach is applied to fit the AIDS data and to estimate the parameters in the joint models for the viral load response and CD4 covariate processes. We note that Huang and Dagne (2011) investigated the nonlinear mixed-effects model assuming both model errors and random effects to have the SN distribution and the modeling results based on the SN distribution for random effects indicated no significant difference from those based on the normal distribution for random effects. We also considered the same procedure and found that the modeling results assuming random effects with the ST distribution provided a similar performance to those assuming random effects with the normal distribution. We, thus, focus on the normal distribution for random effects to compare results between the model with the ST distribution for model errors and that with t-distribution for model errors. Along with this consideration, we investigate the following two scenarios. First, since the t-distribution is a special case of the ST distribution when the skewness parameter is zero, we will investigate how the (asymmetric) ST distribution for model error (denoted by Model I) contributes to modeling results and parameter estimation in comparison with the (symmetric) t-distribution for model error (denoted by Model II). Second, we estimate the model parameters by using the "näive" method (NM), which does not separate the measurement errors from the true CD4 values. That is, the "näive" method only uses the observed CD4 values z i j rather than the true (unobserved) CD4 values z * i j in the response model (6). We use it as a comparison to the joint modeling (JM) approach proposed in Section 3. This comparison attempts to investigate how the measurement errors in the CD4 covariate influence modeling results.   (11) for details) based on the property of the ST distribution of "chasing the data" to a larger extent. From the model fitting results, we have seen that, in general, all the models provided a reasonably good fit to the observed data for most patients in our study, although the fitting for a few patients was not completely satisfactory due to unusual patterns of viral load fluctuation for these patients, particularly for Model II. To assess the goodness-of-fit of the proposed models, the diagnosis plots of the observed values versus the fitted values (top panel) and ST and t Q-Q plots (bottom panel) from Models I and II are presented in Figure 6. It is seen from Figure 6 (top panel) that the model where the random error is assumed to have the ST distribution provided better fit to observed data, compared with the model where the random error is assumed to have the t-distribution. This result can be also supported by examining the ST and t Q-Q plots of the residuals (bottom panel) that both plots show the existence of outliers, but it is clearly indicated that Model I only has few negative outliers and thus fits observed data better than Model II. Note that the residual is defined as the posterior mean of the targeted posterior distribution of the difference between the observed value and corresponding predictive value under the Bayesian model. In other words, the residual is calculated by posterior means based on MCMC samples of the targeted posterior distribution of residuals.
The population posterior mean (PM), the corresponding standard deviation (SD) and 95% CI for population parameters for the two methods (JM and NM) are presented in Tables 1 and 2. The following findings are observed for the estimated parameters. (i) In the response model (6), all the estimates of the parameters based on Model I are smaller than those based on Model II; these estimates are statistically significant for both Models since the 95% CIs do not contain zero. The results indicate that estimated parameters may be substantially overestimated if the model distribution ignores skewness. For the variance parameter σ 2 1 , the estimated value (0.03) based on Model I is much smaller than that (0.67) based on Model II. (ii) For parameter estimates of the CD4 covariate model (10), the estimates of parameters based on Models I and II are comparable. This may be due to less skewness exhibited in the CD4 longitudinal data since the estimate of skewness parameter δ is 0.53.
To compare estimated results with the JM approach, we also employed the "näive" method to estimate the model parameters presented in Table 1, ignoring the CD4 measurement error and using the observed CD4 instead of the true (but unobserved) CD4 in model (6). The difference of estimated parameters between these two methods, related to whether or not ignoring potential CD4 measurement error in conjunction with the viral dynamic model (6), indicates that CD4 measurement error cannot be ignored in the analysis. In particular, we find that the "naive" method may overestimate the effects of covariate CD4 and drug efficacy (i.e., β 5 and β 6 ).
To select a better model that fits the data adequately, a Bayesian selection criterion, known as deviance Table 1. Summary of the estimated posterior mean (PM) for population (fixed-effects) parameters and the corresponding standard deviation (SD), lower limit (L CI ), and upper limit (U CI ) of 95% equal-tail credible interval (CI) based on the joint modeling (JM) approach and the naive method (NM)  (2002) is used. As with other model selection criteria, we caution that DIC is not intended for identification of the "correct" model, but rather merely as a method of comparing a collection of alternative formulations. Guo and Carlin (2004) gave several advantages for choosing DIC as model selection criteria. The structure of DIC allows for automatic computation in the WinBUGS software. In addition, although hierarchical Bayesian methods implemented via the MCMC procedure enable the fitting of such models, a formal comparison of their fit is hampered by their large size and often

Residuals
Theoretical quantiles of t  improper specifications. By using a complexity measure for the effective number of parameters that is based on an information theoretical argument, DIC avoids some dilemmas. As one referee suggested, although Bayesian predictive information criterion (BPIC) investigated by Ando (2007), which is an extension to justify the DIC, could be an alternative as a more stable criterion for model comparison, an additional effort is needed to calculate BPIC value using the WinBUGS software. Instead, we evaluate expected predictive deviance (EPD) formulated by EPD=E i, j (y rep,i j − y obs,i j ) 2 for model comparison, where the predictive value y rep,i j is a replicate of the observed y obs,i j and the expectation is taken over the posterior distribution of the model parameter θ (see Gelman et al. 2003 for details). This criterion chooses the model where the discrepancy between predictive values and observed values is the lowest. We calculate estimated DIC values, which are 1046.3 and 1223.7 for Models I and II (Table 2), respectively, using the JM approach, and 1178.1 for Model I using the NM. As mentioned before, it is hard to tell which model is "correct" but which one fits the data better. Furthermore, the model which fits the data better may be more accurate to describe the mechanism of HIV infection and the CD4 changing process, and thus, needs more attention for patient treatment. Therefore, based on the DIC criterion, the results indicate that Model I provides the better fitting to the observed data. This finding is confirmed by the results of the EPD values (see Table  2). These results are consistent with those in the diagnosis of goodness-of-fit displayed in Figure 6, indicating that Model I performs better. In summary, our results suggest that it is very important to assume a skewed distribution for the viral load response and CD4 covariate models to achieve reliable results, in particular if the data exhibit skewness. Along with these observations, we further report our results in detail below based on Model I using the JM approach only.

Relationship of Covariate and Clinical Factors With Viral Decay Rate
The results presented in Table 1 based on the joint model with the ST distribution indicate that the estimates of the parameters in the CD4 covariate model (10) suggest a significant overall increase in CD4 after treatment. The estimates for the individual-specific first-phase viral decay rateλ 1i =β 2 + b i2 in model (6) ranged from 24.02 to 27.30 with standard deviation 0.72, indicating that the overall first-phase viral decay rate is significant with a substantial variation across subjects; the estimate of the population parameter β 2 (SD) is 26.9 (3.41), and the estimated variance of the random effects b i2 is 13.19. The estimate of the individual-specific second-phase viral decay rate (i.e.,λ i j2 (t) = −6.20 + 0.46z * i j (t) + 4.67γ i j (t) + b i4 with the estimated variance of random effects b i4 being 15.62) appears positively and significantly associated with both the true (unobserved) CD4 value and the drug efficacy value over time. It suggests that both covariate factors have a significantly positive effect on the second-phase viral decay rate; this finding confirms that they may be significant predictors for the second-phase viral decay rate during the treatment process. This may be explained by the fact that more rapid increase in CD4 cell count and more potent drug effect may be associated with faster viral decay in the late stage.
Recent research findings indicate that the decay rate of viral responses to a treatment is a potentially useful marker for ARV treatment (Perelson et al. 1996;Ding and Wu 2000). Individual-specific parameter estimates are very important and have implications for the tailoring of treatment for individual patients with AIDS. As mentioned previously, in the response model (6), the intercepts ( p i1 , p i2 ) are macroparameters which have no interpretable biological meanings, whereas the first-and second-phase viral decay rates λ i1 and λ i j2 represent the minimum turnover rate of productively infected cells and that of latently or long-lived infected cells, respectively. Note that λ i1 is dependent on subjects only, while λ i j2 is considered to be dependent on both subjects and time points. Thus, we have correlated the baseline factors such as baseline ln(RNA), CD4 cell count, age, and weight of patients with the estimated individual λ i1 using the Spearman rank correlation test. Baseline viral load and CD4 cell count are significantly correlated with λ i1 . These correlations are plotted in Figure 7. No significant correlation is observed between the age or weight of patients and λ i1 . The subject-specific estimates of λ i1 shows significantly negative correlation (r = −0.894 and p < 0.0001) with baseline ln(RNA) levels, where the correlation coefficient is understood at the "so called" population level. This can be explained by the fact that the slower viral decay rate may result in a higher viral load response. It is clearly shown from Figure 7 that baseline CD4 cell count had an opposite relationship with λ i1 as baseline ln(RNA) had. This is presumably due to a negative correlation between baseline CD4 cell count and baseline viral load (data not shown).
The results also indicate that the estimated population first-and second-phase decay rates for viral load responsê λ 1 = 26.9 andλ 2 (t) = −6.20 + 0.46z * (t) + 4.67γ (t), respectively, where z * (t) and γ (t) are the standardized true CD4 value and the drug efficacy, respectively, at time t. Thus, the population viral load process may be approximated byV (t) = exp[5.57 −λ 1 t] + exp[1.48 −λ 2 (t)t]. Since the second-phase viral decay rate is significantly associated with both the true CD4 and drug efficacy values (due to statistically significant estimates of β 5 and β 6 ), this suggests that the viral load change V (t) may be significantly associated with the true CD4 process and drug efficacy. Note that the true association described above may be complicated, but the simple approximation considered here may provide new scientific insights on further research. In addition, the estimate of within-subject precision parameter (σ 2 1 ) in Model I (0.03) is much smaller than that in Model II (0.67) which indicates that gain in significant efficiency for the model with the ST distribution relative to the model with tdistribution is observed for the precision parameter estimation. This is expected because high variability and skewness are interrelated, to a certain extent.
The estimated results based on the model with the ST distribution in Table 2 show that the skewness parameters in viral load (1.43) and CD4 cell count (0.53) with degrees of freedom being 5.57 and 5.31, respectively, are significantly positive which confirm the heavy right tail skewness of the viral load and fairly right tail skewness of the CD4 cell count. Thus, it may suggest that accounting for a model with the ST distribution provides a better fit to the data which exhibit skewness and, in turn, gives more accurate estimates of the parameters.

Concluding Discussion
In this article, we demonstrate the use of a tractable model which can be used to characterize long-term HIV dynamics during therapy. The model is an approximated version of a well-known physiologically based HIV dynamic ODE model, but can include treatment influences on HIV dynamics. This article establishes the relationship of virological response with drug susceptibility, medication adherence, PK variation, and time-varying CD4 covariate which quantifies the confounding effects of these clinically interested determinants on virological response. The Bayesian NLME joint modeling approach proposed here can not only combine all possible clinical data into the analysis in the presence of prior information (although noninformative priors were specified in this application), but also investigate the influence on inference when asymmetry and covariate measurement error exist in the longitudinal data. Although the basic principles of such joint modeling approach were well established, their application to our model with the ST distribution, incorporating covariate measurement error, and time-dependent multiple treatment effects are nonetheless innovative. Thus, the results of dynamic parameters based on this model should be more reliable and reasonable to interpret long-term HIV dynamics.
The analysis presented here used the model incorporating a time-dependent drug efficacy which appeared to perform well in capturing the observed patterns of viral load trajectories, and characterizing the biological mechanism of HIV infection under relatively complex clinical situations. It is important to find a way to incorporate subject-specific information with regard to drug exposure, drug susceptibility, and time-varying covariate with measurement error in predicting long-term virological response. Since each of these factors may only contribute a very small portion to virological response and they may be confounded through complicated interactions, the appropriate modeling of the combination effects of these factors is critical to efficiently use information in predicting virological response. The viral dynamic model and associated statistical approach discussed here provide a good avenue to fulfill this goal. However, it is appropriate to mention that some issues exist with the proposed drug efficacy Equation (3), since it is constructed entirely on data available in the A5055 study. First, we only considered PI drug effects on the drug efficacy since the information of NRTI drugs was not collected in the A5055 study and the effect of NRTI drugs was considered less important compared to the PI drugs (Ding and Wu 2000;Acosta et al. 2004). Second, as questionnaire measurements of adherence may not reflect actual adherence profiles for individual patients, the data quality would affect our results of estimated parameters. More accurate measurements for adherence such as electronic monitoring devices may improve data quality. Third, one may notice that we only have the IC 50 data at baseline and failure time. We extrapolated the IC 50 data log-linearly to the whole treatment period in our modeling. The log-linear extrapolation is the best approximation that we can get from the sparse IC 50 data. Last, in the proposed drug efficacy model, our interest was to explore how overall treatment effects in the A5055 study contribute to viral load response, but we did not separately model drug efficacy to discriminate between the impacts on viral load of the two treatment arms and/or the two agents. Although a more elaborate model with consideration of separated drug efficacy between treatment arms and/or agents may be of clinical interest in response to existing therapies and provide a more clinically meaningful description for underlying long-term HIV dynamics, it may cause an identification problem of the model parameters because of the complexity of the model used for statistical inference; thus, it may limit the usefulness of the more sophisticated models. The tradeoff between the complexity and utility of models should be carefully considered. Further studies on these issues are definitely needed. Nevertheless, these limitations should not offset the major findings from our modeling approach, although further improvement may be warranted.
A common concern with Bayesian methods is their dependence on various aspects of the modeling process. One of the possible sources of uncertainty is the choice of prior distributions. The basic tool for investigating model uncertainty is sensitivity analysis. That is, we make reasonable modifications to the assessments of parameters of the prior distributions, recompute the posterior quantities of interest, and see whether they have changed in a way that significantly affects the resulting interpretations or conclusions. If the results are robust against variations in the assumptions, we can report the results with confidence and the conclusions would be solid. However, if the results are sensitive to the assumptions, we may choose to communicate the sensitivity results and interpret the results with caution. To examine the dependence of parameter estimates on the prior distributions, we carried out a sensitivity analysis. In particular, we implemented the MCMC sampling scheme and monitor several independent MCMC runs, starting from different values of the hyperparameters. Those runs exhibited similar and stable behavior (data not shown here). That is, when different priors were used, the results were similar to those presented in this article.
Modeling skewness by modifying well-known distributions is a topic that has received much attention over the past years (Sahu, Dey, and Branco 2003;Ho and Lin 2010;Huang and Dagne 2011). In the presence of skewness and covariate measurement errors in the longitudinal data in combination with data of multiple treatment factors, we propose a robust Bayesian approach to the NLME joint model with ST distribution as a powerful tool to handle such longitudinal data. The proposed methods enhance the modeling flexibility and allow practitioners to analyze longitudinal and multiple treatment factor data in a wide variety of applications. In addition, the proposed joint modeling approach can be easily implemented using the publicly available WinBUGS software. This makes our approach quite powerful and accessible to practicing statisticians. A final issue to note is that, in our application, a further analysis to simply use the unreliable observations (actually observed values) below limit of quantification (BLQ), instead of such values imputed as half of BLQ, may be conducted for inference, but results of parameter estimates may be interpreted differently. In addition, it is noted that the simple "fill-in" with half of BLQ method should not lead to significant inferential bias due to the small percentage of BLQ values (9%) in this particular application. However, if there is a larger percentage of BLQ values in the data analysis, more advanced techniques such as likelihood approaches or multiple imputation to handle the BLQ values may be adopted. We are actively investigating these problems and simulation studies now, and hope that we could report these interesting results in the near future.

Supplementary Materials
Appendices A and B in the supplementary materials provided a brief discussion of multivariate skew-t distribution and WinBUGS program code of ST Model I for the analysis of AIDS data.