Extending the Mann–Whitney–Wilcoxon rank sum test to longitudinal regression analysis

Outliers are commonly observed in psychosocial research, generally resulting in biased estimates when comparing group differences using popular mean-based models such as the analysis of variance model. Rank-based methods such as the popular Mann–Whitney–Wilcoxon (MWW) rank sum test are more effective to address such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies under missing data. In this paper, we propose a generalized MWW test for comparing multiple groups with covariates within a longitudinal data setting, by utilizing the functional response models. Inference is based on a class of U-statistics-based weighted generalized estimating equations, providing consistent and asymptotically normal estimates not only under complete but missing data as well. The proposed approach is illustrated with both real and simulated study data.


Introduction
Popular models for longitudinal data analysis with continuous outcomes such as the generalized linear mixed-effects models (GLMM) and weighted generalized estimating equations (WGEE) lack robustness in the presence of outliers. For example, in a study to evaluate the efficacy of a sexual risk-reduction intervention for sexually active teenage girls in low-income urban settings, a group at elevated risk for HIV, some adolescent girls reported very large numbers such as 450 and even 1,000,000 for their unprotected vaginal sex over a three-month period [13]. Although answers like this are clearly not legitimate values of the outcome, they do indicate the extremely high level of sexual activity among these girls, when compared with the rest of the study sample. However, the mean-based GLMM and WGEE are not capable of dealing with this type of 'outliers', due to the sensitivity of the sample mean to large values. Rank-based methods such as the popular Mann-Whitney-Wilcoxon (MWW) rank sum test are more effective to address such outliers. However, the MWW test is limited to two groups. The Kruskal-Wallis extension to more than two groups does not have known asymptotic distributions for its estimates [9]. In addition, none of these approaches accommodate covariates. Finally, although attempts have been made to extend the MWW test to longitudinal data, available methods often do not provide valid inference because they ignore the sampling variability in the estimated model for missing data [8,19].
In this paper, we address the aforementioned limitations by framing the MWW test within the regression framework of the functional response models (FRM). In Section 2, we provide a brief review of FRM and discuss how to recast the MWW test within the context of FRM to address its various limitations. In Section 3, we discuss inference for both cross-sectional and longitudinal data. In Section 4, we illustrate the approach with both real and simulated data.

Extending MWW rank sum test to multi-group and longitudinal study setting
We start with a brief review of the MWW rank sum test for between-group difference. We then generalize the classic test for more than two treatment groups within a longitudinal data setting utilizing the FRM.

MWW rank sum test
Consider two independent samples of size n k and let y ki be some continuous outcome from the ith subject within the kth group (1 ≤ i ≤ n k , k = 1, 2). Let R ki denote the rank of y ki in the pooled sample. The Wilcoxon rank sum statistic has the following form [8,14]: Wilcoxon rank sum statistic: Note that the sum of the rank scores n 2 j=1 R 2j from the second group may also be used as a statistic. However, since the two sums add up to n(n + 1)/2 (n = n 1 + n 2 ), only one of the sums can be used as in the test statistic. An alternative form of this statistic is the following Mann-Whitney statistic [8,14]: Mann-Whitney statistic: where I {u≤0} is a set indicator with I {u≤0} = 1 if u ≤ 0 and 0 otherwise. Since W n = U n + n 1 (n 1 + 1)/2 (in the absence of ties), the two tests are equivalent. However, the Mann-Whitney version is readily generalized to longitudinal data in the presence of missing values, while the Wilcoxon form is not. Thus, we use the Mann-Whitney form of the test for the extension, which will be referred to as the MWW rank sum test in the discussion that follows unless otherwise stated. Let V n = ( n 2 ) −1 U n be the normalized MWW statistic in Equation (1) and let θ = E(V n ), where ( n 2 ) denotes combinations of two distinct elements (i, j) from the integer set {1, . . . , n}. If y ki have the same distribution, then θ = 1 2 . Although the reverse is generally not true, θ = 1 2 is often considered as the null hypothesis of no between-group difference in practical applications. This connotation is adopted in the following discussion unless otherwise stated.
Inference about the MWW test has been discussed in the literature (e.g. [7,8,14,18,19]). Below, we first provide a brief review of FRM and then discuss how to use the flexibility of FRM to extend this classic test to accommodate multiple groups, longitudinal data and even covariates.

Functional response models
Existing regression models are defined based on a single-subject response, severely limiting their applications in practice. For example, the classic linear regression has the form, E(y i | x i ) = x i β, where y i (x i ) denotes some response (a vector of predictors or covariates) and β is a vector of parameters. In this model, the response is a single-subject linear y i . Although modern regressions such as the generalized linear models (GLM) and even nonlinear models are more complex, the response remains the same as linear regression. For example, in the GLM E(y i | x i ) = h(x i β) , the linear predictor η i = x i β is extended to a function h(x i β) to accommodate non-continuous outcome y i , but the response function is still the same single-subject linear y i in the linear model.
Many quantities of interest require not only more complex response functions, but also multisubject responses. Within the current context, the statistic in Equation (1) is defined by pairs of subjects y 1i and y 2j . The FRM address the limitations of existing regression models by extending (1) the single-subject y i to multiple subject responses and (2) the linear response to an arbitrary function where f (·) is some function, h(·) some smooth function (with continuous second-order derivatives), y i 1 , . . . , y i q are q individual responses, and x i 1 , . . . , x i q denote vectors of predictors from q subjects. By generalizing the response in this fashion, an FRM has been successfully applied to address a range of methodological issues involving multi-subject responses and second-order moments [3,11,12,23,24].

Extending MWW statistic to multiple groups and longitudinal data with covariates
Adopting the previous notation, let where C n q denotes the set of n q combinations of q distinct elements (i 1 , . . . , i q ) from the integer set {1, . . . , n}. Then, the following is an FRM: In the FRM above, θ is the parameter of interest. For example, in most applications, we are interested in testing the null of no between-group difference, H 0 : θ = 1 2 . The FRM in Equation (4) works for continuous y ki . For discrete outcomes, ties are present. The null in this case is expressed as [20] H 0 : Thus, we may redefine the functional response in Equation (3) as f (y 1i , y 2j ) = I {y 1i <y 2j } + 1 2 I {y 1i =y 2j } . For notational brevity, we assume continuous y ki throughout the remaining discussion unless otherwise noted. By framing the MWW test under FRM, we can readily extend this test for comparing two groups to multi-group settings, even accommodating covariates. To this end, we adopt a popular model for interpersonal connections in social network analysis [21]. Since ties between individual subjects in social networks involve two subjects, standard statistical models for individual outcomes such as linear and logistic regression are not appropriate for modeling ties between two subjects. Holland and Leinhardt introduced a model for such between-subject defined outcomes [5,24], which is readily adapted to our setting for the extension.
Let x ki be a vector of covariates from the ith subject within the kth treatment (k = 1, 2). Consider the following FRM: where logit −1 (x) = exp(x)/(1 + exp(x)) and v(x 1i , x 2j ) is some function of differences between x 1i and x 2j . For example, we may simply set: v l (x 1i , x 2j ) = x 1il − x 2jl , where x kil denotes the lth component of x ki . Since Equation (5) is a function of two subjects, each subject's covariates no longer exert their effect individually, but rather jointly through the function v(x 1i , x 2j ). In addition to the logit link, we may also use probit and other link functions. In fact, the probit is a more natural link in some applications (see Section 4 for details). The above is essentially an extension of the popular logistic regression to our setting involving a response function defined by two subjects. If β = 0, it is readily seen that the null of no treatment difference under Equation (4) is equivalently expressed as H 0 : γ = 0. If β = 0, Equation (5) accounts for the effect of covariates on the probability that y 1i ≤ y 2j and H 0 : γ = 0 again represents the null of no treatment difference.
We can also readily extend Equation (5) to K groups by defining an FRM as follows: As in the two group case, β indicates the effect of covariates, while H 0 : γ = 0 is the primary null hypothesis of no treatment difference. We may also include I {q=k} by v l (x 1i , x kj ) interactions if v l (x 1i , x kj ) has differential effects across k.
We can further extend Equation (6) to longitudinal data analysis. For a longitudinal study with m assessments, let y kit (x kit ) denote a response (a vector of covariates) from the ith subject within the kth group at time t. The version of Equation (6) for longitudinal data is defined by Like the FRM in Equation (6) for cross-sectional data, β accounts for the effect of covariates, while γ t models the between-group difference at time t. For randomized trials, if t = 1 denotes baseline, then γ 1 = 0, since no difference at baseline is expected. If treatment difference is expected over the follow-up time, then γ t = 0 for some or all remaining t (2 ≤ t ≤ m). Thus, the null H 0 : γ = 0 indicates no between-group difference at any of the follow-up times. We may also model temporal patterns for each group using linear, piece-linear or polynomial functions as in standard regression analysis. In the absence of covariates, the FRM in Equation (7) is equivalent to where

Inference for the generalized MWW model
We start with the cross-sectional data and then extend the considerations to the longitudinal setting.

Cross-sectional data
Consider the K-group FRM in Equation (6) and let where f (y 1i , y kj ) and h k (x 1i , x kj ; θ) are defined in Equation (6), and C 1k = C n 1 1 ⊗ C n k 1 with ⊗ denoting the Cartesian product of C n 1 1 and C n k Define the U-statistics-based generalized estimating equations (UGEE) as follows: where S ki = f ki − h ki , D ki = (∂/∂θ)h ki and V ki is given by ( 1 1 ) For example, for the classic MWW test in Equation (3), the UGEE can be solved in closed form to obtainθ = (1/n 1 n 2 ) i∈C 12 f (y 1i , y 2j ) = (1/n 1 n 2 ) i∈C 12 I {y 1i ≤y 2j } . In general, numerical methods such as the Newton-Raphson may be used to solve the UGEE.
As in the standard generalized estimating equations (GEE) [10], the UGEE estimateθ obtained as solution to Equation (10) is consistent and asymptotically normal. We summarize the asymptotic properties below. As a special case of longitudinal data, Theorem 1 follows from asymptotic results that will be established for longitudinal data in the next section.
Then, under mild regularity conditions,θ is consistent and asymptotically normal The asymptotic variance θ generally is not expressed in closed form, except under some special cases. For example, for the classic MWW test in Equation (3), K = 2 and B = 1. It follows that where F k (y) denotes the cumulative distribution function of y ki . Further, under the null H 0 : It follows from Equation (13) and the fact that F(y ki ) is a uniform U between 0 and 1 that Thus in this special case, θ = 1 12 (ρ 2 1 + ρ 2 2 ) and a consistent estimate is given byˆ θ = 1 12 (n/n 1 + n/n 2 ) (n = n 1 + n 2 ). For general K and h(x 1i 1 , x ki k ; θ), a consistent estimate of θ is obtained by substituting respective consistent estimates in place of B and k . A consistent estimate of B iŝ whereÂ i denotes A i with θ replaced byθ . To find a consistent estimate of l , note that Thus, a consistent estimate of θ is given bŷ whereB andˆ l are given in Equations (15) and (16). Under the null H 0 : Lθ = b, where L denotes a known matrix and b a known vector of constants, we can use the Wald statistic W = n(θ − θ 0 ) ˆ −1 θ (θ − θ 0 ) for inference about H 0 , which has an asymptotic χ 2 df distribution with the degree of freedom df = L, the rank. For example, in the special case of the FRM for the classic MWW rank sum test in Equation (3)

Longitudinal data
We first consider the case of complete data and then generalize it to address missing values.

Complete data
Let Inference about θ can be made using the UGEE with the same form as in Equation (10), but with f ki and h ki (θ ) redefined in Equation (18) to account for multiple assessments. In addition, we need to redefine V ki to reflect the added time dimension in f ki and h ki as well as the correlations among the repeated responses y kit . For example, consider extending V ki in Equation (11) to account for the within-subject correlation as follows: where diag t (A t ) denotes a diagonal block matrix with A t on the tth diagonal and C m×m (ρ) a compound symmetry correlation matrix with ρ being the correlation between any two components of f ki . As in the standard GEE [10], R(ϕ) is a working correlation matrix, parameterized by ϕ to account for correlations among the repeated y kit . If ϕ is unknown, as in most applications, it must be estimated before the UGEE in Equation (10) can be solved for θ . Although the consistency ofθ is independent of how ϕ is estimated, the asymptotic normality ofθ is guaranteed only when √ n-consistent estimates of ϕ are used [8,10]. Thus, to use the results in Theorem 1 for the longitudinal data setting, we must substitute a √ n-consistent estimate of ϕ. In particular, under the working independence model, ϕ is known and the UGEE estimate is asymptotically normal. We focus on the working independence model in what follows unless otherwise stated.
It follows from Theorem 1 that the UGEE estimateθ in Equation (20) is consistent and asymptotically normal. The asymptotic variance is readily estimated and, following Theorem 1, is given byˆ Under the null of no between-group difference over all assessments, i.e. H 0 : θ = 1 2 1 m , a n × 1 column vector of 1's, the Wald statistic for testing this null is given by

Monotone missing data
Define a vector of missing (or rather observed) value indicators as follows: We assume no missing data at baseline t = 1 such that r li1 ≡ 1 for all i ∈ C n l 1 and 1 ≤ l ≤ K. Let Since r li1 ≡ 1, π li1 = 1 for all i ∈ C n l 1 and all 1 ≤ l ≤ K. In the presence of missing data, the UGEE in Equation (10) generally yields biased estimates. By integrating the inverse probability weighting technique into the UGEE to account for the selection bias due to missing data, akin to weighting selected households sampled from a targeted region of interest in survey research [6], we obtain a set of U-statistics-based weighted generalized estimating equations (UWGEE) for inference about θ : where S ki = f ki − h ki , and D ki , V ki and S ki are defined the same as in the complete data case. Unlike the UGEE, the above has an additional weight function ki defined by π lit . Given π lit , the UWGEE in Equation (25) is again readily solved using numerical methods such as the Newton-Raphson algorithm.
In most applications, π kit is unknown and must be estimated. Under missing completely at random (MCAR), π lit is independent of y li and thus π lit = E(r lit ) = π lt . In this case, π lt is a constant independent of y li and is readily estimated by the sample moment: Under missing at random (MAR), π lit becomes dependent on the observed y lit and x lit , which, within the current context, contain all y lis and x lis prior to time t (1 ≤ s ≤ t − 1). Denote such a 'history' by z lit = (y lis , x lis ; s = 1, . . . , t − 1) . Then under MAR Thus, under MAR r lit only depend on the observed data z lit , making it possible to model π lit in Equation (26) based on the observed data prior to time t. However, it is still difficult to model and estimate π lit without imposing the monotone missing data pattern (MMDP) assumption, because of the large number of potential missing data patterns [8,15,20]. Under MMDP, y lit and x lit are observed, only if all y lis and x lis prior to time t are all observed. The structured patterns reduce not only the number of missing data patterns, but also the complexity in modeling π lit .
Let p lit = E(r lit | r li(t−1t) = 1, z lit ) denote the one-step transition probability from observing the response at t − 1 to t. We can readily model p lit using a logistic regression model: where η lt = (η 0lt , η 1lt ) denotes the model parameters. More complex models may be considered such as including some or all interactions of the components of z lit . Under MMDP, it is readily checked that where η l = (η l2 , . . . , η lm ) (1 ≤ l ≤ K). Thus, we can estimate π lit by estimating η l of p lit . To estimate η l , we may use maximum likelihood or estimating equations. Suppose η l is estimated based on the following estimating questions: where where 0 J denotes a J × 1 column vector of 0's. Let η = (η 2 , . . . , η m ) . We may express Equation (29) in a compact form Upon estimating η, we can estimate ki and then estimate θ by substituting the estimated ki into the UWGEE in Equation (25). Since the UWGEE estimateθ is based upon estimated η, we must also account the variability of the estimate of η when estimating the variance ofθ . The theorem extends Theorem 1 to take this extra variability into consideration when constructing the variance ofθ. A proof is sketched in the appendix: where n and ρ 2 l (1 ≤ l ≤ K) are defined in Equation (12). Then, under mild regularity conditions, we have the following: (1)θ is consistent; (2) ifφ is √ n-consistent,θ is asymptotically normal: The asymptotic variance ofθ is identical to its counterpart in Equation (13) of Theorem 1, except for an added term l , which accounts for the additional variability due to estimation of η. A consistent estimate of θ is obtained by substituting consistent estimates in place of the respective quantities in Equation (33). For example, a consistent estimate of B isB = K k=2 (1/n 1 n k ) i∈CD kî by (θ ,η,φ). Note that Theorem 2 shows that the asymptotic variance ofθ obtained by ignoring the sampling variability ofη underestimates the true variability ofθ . Also, Theorem 2 applies to the most general case where R(ϕ) may be non-identity matrices. However, we again focus on the working independence model for both in this paper unless stated otherwise.

Applications
We demonstrate our considerations with both simulated and real data. We first investigate the performance of the proposed approach by simulation and then present an application to a real study on sexual health for a group of teenage girls in low-income urban settings who were at elevated risk for HIV, sexually transmitted infections (STIs) and unintended pregnancies. In all examples, we set the statistical significance at α = 0.05 and Monte Carlo (MC) sample size equal to 1000.

Cross-sectional data
Consider the following analysis of covariance model: The above models three group means, represented by β 0 , β 0 + γ 2 and β 0 + γ 3 , while adjusting for one covariate x li1 . It follows from Equation (34) that: Thus, we have where (·) is the cumulative distribution function of the standard normal. Let Then, except for the absence of the intercept β 0 , the FRM below contains the same parameters θ as the linear model in Equation (34) and thus can be used to provide inference about θ . If y li follows the normal distribution in Equation (34), maximum likelihood provides the most efficient estimate of θ . However, if there are outlying observations of y li , maximum likelihood, or even the estimating equations [10], generally yields biased inference. In contrast, the FRM in Equation (35) is not affected by such outliers and continues to provide valid inference, if the ranking order of the observations is preserved.
For our simulation study, we set β 0 = β = γ 2 = γ 3 = 1 and σ 2 x = 0.2. We considered three sample sizes 50, 100 and 300, representing small, moderate and 300 large sample sizes. Given each sample size, we simulated three group sizes n l from a trinomial with the probability 1 3 for each of the three cells. For each of the 1000 simulated data sets, we fit the FRM in Equation (35) and computed estimates of θ and asymptotic standard errors and tested the null that the value of parameter was equal to the true value for each of the three components of θ based on the Wald statistic. We also computed the empirical standard errors based on the 1000 MC estimates of θ and type I error rates based on the percent of times the null was rejected over the 1000 MC replications.
The UGEE estimates of θ are shown in Table 1, along with standard errors (asymptotic and empirical) and type I errors based on 1000 MC replications. The UGEE estimates ofθ were quite close to the true values, even for the small sample size n = 50. The asymptotic and empirical standard errors also agreed quite well with each other, even for the small sample size. Although the type I error rates showed a small upward bias, typical for small sample sizes [1,4,16], the bias quickly decreased as the sample size increased.

Longitudinal data
We considered a longitudinal study with two groups and three assessments and simulated the longitudinal responses y ki = (y ki1 , y ki2 , y ki3 ) for the two groups from a trivariate normal, N(0, C 3 (0.5)), with C 3 (0.5) denoting a 3 × 3 compound symmetry correlation matrix. We applied the FRM in Equation (8) to compare the two groups based on the data generated. In this particular case, K = 2, m = 3 and the vector of parameters of the FRM is θ = (θ 1 , θ 2 , θ 3 ) . We considered three sample sizes, 50, 100 and 300 per group, representing small, moderate and large sample sizes, respectively. We increased the sample size in this simulation study over the prior one to account for decreased sample sizes at the two follow-up assessments due to missing data.
To assess performance of the FRM under missing data, we assumed no missing value at baseline t = 1 and simulated the missing response r ki,t at the follow-up times under MAR according to Equation (27) with the one-step transition probability p kit modeled by the logistic regression under a one-step Markov condition below: In the above, we assumed a Markov condition so that p lit only depends on the most recent y li(t−1) prior to time t. We set η lt = 3 (t = 2, 3) and solved the following equations for η 0lt to create about 15% and 25% missing response y lit at time t = 2, 3, respectively: To ensure MMDP, we first simulated missing data indicator r li2 from the Bernoulli distribution, Bern(p li2 ) (l = 1, 2). If r li2 = 0, then set r li3 = 0. Otherwise, r li2 = 1 and r li3 is simulated from Bern(p li3 ).
In the absence of missing data, the UGEE estimate of θ is given by Equation (20), which is asymptotically normal with a consistent estimate of the asymptotic variance given by Equation (21). In the presence of missing data, the UWGEE estimate does not admit a closed-form expression, but is readily obtained by solving the UWGEE using the numerical methods. A consistent estimate of the asymptotic variance is also readily computed according to Theorem 2.
The UGEE and UWGEE estimates of θ are shown in Table 2, along with standard errors and type I errors (for testing the null of no time effect, i.e. H 0 : θ 1 = θ 2 = θ 3 ) under both complete and missing data cases based on 1000 MC replications. As seen, both the UGEE and UWGEE estimates ofθ were quite accurate, even for the small sample size n l = 50 per group. The standard errors showed a steady decrease as n l increased. Also, the corresponding standard errors were slightly larger for the UWGEE estimates because of missing data at the two follow-up times. As in the cross-sectional data case, the type I error based on the Wald statistic again showed a small upward bias for the small sample size, but the bias immediately disappeared once the sample size increased to 100 and 300. The results (not shown) from the logistic regression in Equation (36) for the missing data were also quite close to the true values.

Real study
Teenage girls in low-income urban settings are at elevated risk for HIV, STIs and unintended pregnancies. A randomized controlled trial was recently conducted to evaluate the efficacy of a sexual risk-reduction intervention, supplemented with post-intervention booster sessions, targeting low-income, urban, sexually active teenage girls [13]. The sexually active urban adolescent girls aged 15-19 (n = 738) recruited in the study in Rochester, NY, were randomized to a theorybased, sexual risk-reduction intervention or to a structurally equivalent health promotion control group. Assessments and behavioral data were collected at baseline, and then again at three and six months post-intervention. The primary interest of the study is to compare frequency of unprotected vaginal sex between the intervention and controlled condition. More details about the demographic characteristics of the study population, the treatment conditions and the assessment battery can be found in [13]. As mentioned in Section 1, a difficult problem with the data is the extremely large values some subjects reported with respect to their sexual activities. For example, 7 subjects reported over 100 episodes of unprotected vaginal sex over the past three months at the three-month follow-up, with the largest one being 1,000,000. A common approach to this issue in psychosocial research is to trim such outliers using some ad hoc rules [17]. However, these methods induce artifacts, because of their dependence on the specific rules used. Rank-based approaches such as the proposed FRM model more effectively address this issue.
Our analysis was based on 639 (n 1 = 310, 49% in the control and n 2 = 329, 51% in the intervention). We were interested in comparing the two treatment groups for the unprotected vaginal sex. Let y lit (r lit ) denote such an outcome (indicator for missing data) at time t (= 1, 2 and 3 for baseline, three and six months post-intervention) for the lth treatment condition (l = 1 for the control and 2 for the intervention). As in the simulation study, we applied the FRM in (25) by jointly modeling y lit and r lit . Since this is a randomized trial, we did not include any covariate and thus considered the following FRM for comparing the two treatment groups: We modeled the missing data under MAR using logistic regression as discussed in Section 3.2. However, since some of y lit were extremely large, we used the rankings R li(t−1) , rather than the actual values of y li(t−1) , as the predictor of the logistic model: Note that we used a Markov assumption in Equation (38) so that the missingness at time t only depends on the observed response at time (t − 1). Also, as ties are inevitable for the intrinsically discrete y lit in this application, we used as the functional response in the FRM to account for their presence. Under Equation (39) and the null of no treatment difference at time t is H 0 : θ t = 1 2 (1 ≤ t ≤ 3). The UWGEE estimates of the intercept (η 0t ) and slope (η 1t ) from Equation (38) for missingness at three and six months post-intervention are shown in Table 3. The occurrence of missing data did not depend on the value (ranking) of the outcome at the prior visit for either treatment condition, suggesting no evidence for rejecting MCAR. The estimated θ , standard errors and p-values for testing the null of between-group difference at each assessment time, along with the test statistics and p-values for testing the null of no temporal trend during post-treatment, are shown in Table 4. The estimated θ t showed a steady decrease for those in the intervention condition to engage in unprotected vaginal sex over time from baseline to three and six months post-intervention. The decline was significant at six months, but was a trend at three months (pvalue is close to 0.10). The continued decline from three to six months was confirmed by the near significant p-value for testing the null H 0 : θ 2 = θ 3 . The increased gain of the intervention effect at six months was likely due to two booster sessions the study subjects received at three months. The booster sessions, 90-min long (as opposed to four 120-min regular sessions delivered during the intervention), address behavioral patterns of girls that are expected to occur as they age and can promote maintenance of gains observed with health-behavior interventions [2].

Discussion
In this paper, we extended the classic MWW test to accommodate more than two groups, covariates and longitudinal data. We achieved this generalization by utilizing the FRM, which is uniquely positioned to model relationships involving between-subject interactions, rather than attributes of a single subject, such as the MWW rank sum test within our context. For valid inference, we also considered two approaches based on a set of U-statistics-based estimating equations. The first follows the traditional route, akin to WGEE, by estimating the primary and missing data models separately, but unlike other applications the inverse probability weighting technique offers valid inference by accounting for the effect of sampling variability of the estimated weight on the estimate of the primary model. The second avoids the complex adjustment procedure by making joint inference about both the primary and missing data models through the flexibility of FRM. We examined the performance of the proposed approach through both simulated and real study data. Results from the simulation study show that the proposed approach performed really well, with good parameter and type I estimates even for a sample as small as 50 per group. The proposed approach applies to both continuous and discrete outcomes. As demonstrated by the real study on sexual health, it handled ties well in the outcome of unprotected vaginal sex, an intrinsically count response. In addition to the MWW test, median regression may also be used to address outliers arising from the Sexual Health Study [22]. However, these methods may not work well, since they either do not address missing data in longitudinal outcomes or require a unique median. Given the discrete nature of the unprotected vaginal sex outcome in our context, ties are inevitable. Although still providing valid inference under MCAR, methods that do not address missing data may suffer power loss because of list-wise deletion.