A nonparametric test for the two-sample problem based on order statistics

Abstract We study a nonparametric test procedure based on order statistics for testing the null hypothesis of equality of two continuous distributions. The exact null distribution of the proposed test statistic is obtained using an enumeration method and a novel combinatorial argument. A recurrence relation for the probability generating function and a sequential approach for computing the mean and variance of the distribution are given. Critical values and characteristics of the distribution for selected small sample sizes are presented. For the Lehmann alternative family, the exact power function of the new test is derived, and its power performance is examined. We also study the power performance of the proposed test under the location-shift and scale-shift alternatives using Monte Carlo simulations and observe its superior performance when compared to commonly used nonparametric tests under various scenarios. A generalization of the proposed procedure for unequal sample sizes is discussed. An illustrative example and some concluding remarks are provided.


Introduction
For two absolutely continuous cumulative distribution functions (cdfs) $F$ and $G$, suppose we are interested in testing the null hypothesis

$$H_0: F = G \quad \text{against} \quad H_1: F \neq G \tag{1}$$

using two independent random samples from the two distributions. This is the classical two-sample problem. Many statistical tests exist for testing the hypotheses in (1) based on a random sample $X_1, X_2, \ldots, X_m$ from $F$ and an independent random sample $Y_1, Y_2, \ldots, Y_n$ from $G$. For instance, the two-sample t-test is a parametric procedure under the assumption that $F$ and $G$ are normally distributed. The Wilcoxon rank-sum test (Wilcoxon 1945) and the Kolmogorov-Smirnov (KS) test (Smirnov 1933) are nonparametric procedures that do not require any distributional assumption on $F$ and $G$.
The exact null distribution for the Wilcoxon rank-sum test statistic and the KS test statistic can be found in Wilcoxon, Katti, and Wilcox (1970) and Kim and Jennrich (1973), respectively.
In the past decades, there have been several attempts to develop new tests for the two-sample problem. For instance, Baumgartner, Weiß, and Schindler (1998) proposed a new test for a variety of alternative hypotheses and showed that the power of the proposed test is competitive with the KS, Wilcoxon, and Cramér-von Mises tests. Zhang (2006) developed a likelihood ratio-based approach to constructing nonparametric tests for the two-sample problem. Murakami (2006) modified the Baumgartner test for the two-sample problem and then generalized the test procedure to the multisample problem. Recently, Neuhäuser, Welz, and Ruxton (2017) compared the performance of the KS test, Zhang's test (Zhang 2006), the Baumgartner test, and the modified Baumgartner test (Murakami 2006) for different discrete and continuous distributions, and recommended the use of Zhang's test and the modified Baumgartner test instead of the KS test.
Many nonparametric tests perform reasonably well in particular situations. For example, the Wilcoxon rank-sum test performs well under the location-shift alternative, and the Ansari-Bradley test performs well under the scale-shift alternative. However, no single test performs well in all situations. Therefore, researchers continue to develop nonparametric procedures for the two-sample problem that have desirable properties, such as being easy to interpret, having an available null distribution, and allowing p-values to be obtained efficiently. In this paper, we propose a new test procedure based on the order statistics of the two samples and study its properties. The proposed test procedure has many desirable properties and power performance comparable to that of existing nonparametric tests under different kinds of alternatives. Our aim is to provide an alternative nonparametric procedure for the two-sample problem, not a procedure that is superior to all other procedures.
The rest of the paper is organized as follows. In Section 2, we introduce the test statistic, denoted as $T_n$ (and defined in (2) below). In Section 3, we derive its null distribution (that is, under the assumption $F = G$) using an enumeration method as well as a novel combinatorial argument. An extensive investigation of the probabilistic properties of the null distribution is presented in Section 4. The exact distribution of $T_n$ under Lehmann alternatives and the associated power performance of the proposed test procedure are discussed in Section 5. The case of unequal sample sizes is discussed in Section 6. In Section 7, a Monte Carlo simulation study is used to evaluate the power performance of the proposed test procedure under location-shift and scale-shift alternatives. A numerical example is presented in Section 8 to illustrate the proposed test procedure. Concluding remarks are given in Section 9.

Proposed test statistic $T_n$ for equal sample sizes
When the sizes of the two samples are equal (i.e., $m = n$), let $X_{1:n} < X_{2:n} < \cdots < X_{n:n}$ and $Y_{1:n} < Y_{2:n} < \cdots < Y_{n:n}$ be the order statistics of the random samples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, respectively. Under the null hypothesis $H_0$ in (1), the first order statistic of the X-sample, $X_{1:n}$, is expected to be close to the first order statistic of the Y-sample, $Y_{1:n}$, the second order statistic of the X-sample, $X_{2:n}$, is expected to be close to $Y_{2:n}$, and so on. Based on this observation, we consider the number of $Y_{i:n}$, $i = 1, \ldots, n$, that fall inside the interval $(X_{i-1:n}, X_{i+1:n}]$ as a test statistic for testing the hypotheses in (1), with $X_{0:n} = -\infty$ and $X_{n+1:n} = \infty$. Formally, the proposed test statistic $T_n$ is defined as

$$T_n = \sum_{i=1}^{n} I_{\{X_{i-1:n} < Y_{i:n} \le X_{i+1:n}\}}, \tag{2}$$

where $I_A$ is the indicator function of the event $A$, i.e., $I_A = 1$ if $A$ occurs and $I_A = 0$ otherwise.

Note that switching the X- and Y-samples will not change the value of the test statistic, i.e.,

$$\sum_{i=1}^{n} I_{\{X_{i-1:n} < Y_{i:n} \le X_{i+1:n}\}} = \sum_{i=1}^{n} I_{\{Y_{i-1:n} < X_{i:n} \le Y_{i+1:n}\}}.$$

Small values of the test statistic $T_n$ lead to the rejection of $H_0$ in favor of $H_1$ in (1). Thus, we reject $H_0$ if $T_n \le k(\alpha, n)$, where $k(\alpha, n)$ is the critical value that depends on the size of the test $\alpha$ and the sample size $n$. In other words, the critical value $k(\alpha, n)$ is the largest $k$ that satisfies $\Pr(T_n \le k \mid H_0) \le \alpha$. The critical value $k(\alpha, n)$ as well as the p-value associated with observed samples can be determined by using the null distribution of $T_n$ presented in the next section.

Note that the test statistic $T_n$ has been considered by Kaya and Kuş (2003). They obtained the probability mass function (pmf) of $T_n$ by exhaustive enumeration for small $n$. However, Kaya and Kuş (2003) did not obtain the exact null distribution of the test statistic, and they did not study the properties of the test statistic thoroughly.
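The definition of $T_n$ translates directly into a few lines of code. The following Python sketch (the function name `t_statistic` is ours, not the paper's) sorts both samples, pads the ordered X-sample with $-\infty$ and $+\infty$, and counts the indices $i$ for which $Y_{i:n}$ falls in $(X_{i-1:n}, X_{i+1:n}]$. It assumes equal sample sizes and no ties, as implied by the continuity of $F$ and $G$.

```python
import math

def t_statistic(x, y):
    """Count the i in 1..n with X_{i-1:n} < Y_{i:n} <= X_{i+1:n}."""
    if len(x) != len(y):
        raise ValueError("this sketch covers equal sample sizes only")
    n = len(x)
    # xs[i] holds X_{i:n} for i = 0, ..., n + 1, with the two infinite end points
    xs = [-math.inf] + sorted(x) + [math.inf]
    ys = sorted(y)  # ys[i - 1] holds Y_{i:n}
    return sum(1 for i in range(1, n + 1) if xs[i - 1] < ys[i - 1] <= xs[i + 1])

# Two Y's below all X's and two Y's above all X's (n = 4)
print(t_statistic([3, 4, 5, 6], [1, 2, 7, 8]))  # prints 2
# Swapping the roles of the samples leaves the value unchanged
print(t_statistic([1, 2, 7, 8], [3, 4, 5, 6]))  # prints 2
```

In the first call only $Y_{1:4}$ and $Y_{4:4}$ land in their target intervals, so the statistic equals 2; the second call illustrates the symmetry in the two samples on the same data.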

Based on precedence-type statistics
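Let $M_1$ be the number of Y-observations in $(X_{0:n}, X_{1:n}]$ and $M_i$ be the number of Y-observations in $(X_{i-1:n}, X_{i:n}]$, $i = 2, 3, \ldots, n$. Under the null hypothesis $H_0$ in (1), the joint pmf of $M_1, M_2, \ldots, M_n$ is given in Theorem 4.1 of Balakrishnan and Ng (2006) as

$$\Pr(M_1 = m_1, \ldots, M_n = m_n \mid H_0) = \binom{2n}{n}^{-1},$$

where the $m_i$ are nonnegative integers such that $\sum_{i=1}^{n} m_i \le n$. The proposed test statistic $T_n$ can be written in terms of the $M_i$'s based on the following relations: $Y_{i:n} > X_{i-1:n}$ if and only if $\sum_{j=1}^{i-1} M_j \le i - 1$, and $Y_{i:n} \le X_{i+1:n}$ if and only if $\sum_{j=1}^{i+1} M_j \ge i$, where $M_{n+1} = n - \sum_{j=1}^{n} M_j$. In other words, the test statistic $T_n$ can be written in terms of the $M_i$'s as

$$T_n = h(M_1, \ldots, M_n) = \sum_{i=1}^{n} I_{\{\sum_{j=1}^{i-1} M_j \le i - 1, \; \sum_{j=1}^{i+1} M_j \ge i\}}.$$

Then, the pmf of $T_n$ under $H_0: F = G$ can be obtained as

$$\Pr(T_n = k \mid H_0) = \binom{2n}{n}^{-1} \sum_{m_1, \ldots, m_n} I_{\{h(m_1, m_2, \ldots, m_n) = k\}}.$$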
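The enumeration approach can be sketched in Python as follows. The sketch relies on two facts that we state as assumptions of the illustration: under $H_0$ with equal sample sizes every admissible vector $(m_1, \ldots, m_n)$ has probability $1/\binom{2n}{n}$, and $Y_{i:n}$ lies in $(X_{i-1:n}, X_{i+1:n}]$ exactly when at most $i - 1$ Y-observations are at or below $X_{i-1:n}$ and at least $i$ are at or below $X_{i+1:n}$. The names `h` and `null_pmf_by_enumeration` are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb

def h(m):
    """Value of T_n implied by the precedence counts m = (m_1, ..., m_n)."""
    n = len(m)
    counts = list(m) + [n - sum(m)]  # append m_{n+1}, the Y's above X_{n:n}
    cum = [0]
    for c in counts:
        cum.append(cum[-1] + c)  # cum[j] = number of Y's <= X_{j:n}; cum[n + 1] = n
    return sum(1 for i in range(1, n + 1)
               if cum[i - 1] <= i - 1 and cum[i + 1] >= i)

def null_pmf_by_enumeration(n):
    """Exact pmf of T_n under H0, obtained by listing all C(2n, n) count vectors."""
    tally = {}
    for m in product(range(n + 1), repeat=n):
        if sum(m) <= n:
            k = h(m)
            tally[k] = tally.get(k, 0) + 1
    total = comb(2 * n, n)
    assert sum(tally.values()) == total  # every arrangement is counted once
    return {k: Fraction(c, total) for k, c in sorted(tally.items())}

print(null_pmf_by_enumeration(2))  # {1: Fraction(1, 3), 2: Fraction(2, 3)}
```

The loop visits $(n+1)^n$ candidate vectors, so this brute-force version is only practical for small $n$, which is the regime in which the enumeration method is intended to be used.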

Based on a combinatorial argument
We can derive a compact closed-form expression for the pmf of $T_n$ using a direct combinatorial argument. To obtain the main result, we first establish several lemmas. Suppose that we have $n$ balls numbered $1, 2, \ldots, n$ and $n + 1$ boxes numbered from 1 to $n + 1$. We place these $n$ balls, beginning with the first ball, into the boxes in such a way that every succeeding ball is placed either into the same box as the previous ball or into one of the subsequent boxes. After placing the $n$ balls in this manner, some boxes can be empty. Here, the balls represent the variables $Y_{i:n}$, $i = 1, 2, \ldots, n$, and the boxes represent the $n + 1$ intervals $(-\infty, X_{1:n}], (X_{1:n}, X_{2:n}], \ldots, (X_{n:n}, \infty)$. In the example presented in Table 1 with 4 balls and 5 boxes (i.e., $n = 4$), boxes 2, 3, and 4 are empty; the first box contains two balls, namely balls 1 and 2; and the fifth box contains the other two balls (balls 3 and 4).
Here, we call the $s$-th ball successful if the ball is placed in one of the boxes with the numbers $s$ or $s + 1$. In the example presented in Table 1, balls 1 and 4 are successful, and they are indicated by an asterisk. Let $S_n^k$ be the number of all possible placements of $n$ balls with exactly $k$ successes, and let $S_0^0 = 1$. In Table 2, we present all eight possible allocations of 3 balls into 4 boxes with exactly two successes. The successful balls are indicated by an asterisk. It shows that $S_3^2 = 8$ in this example. We can show that $S_n^n = 2^n$ and $S_n^0 = 0$ for any $n \ge 1$ (i.e., there is no placement without a success). The following lemmas can be used to obtain some formulas related to $S_n^k$.

Lemma 3.1. For any $n \ge 1$, $k = 1, \ldots, n$ and $d = 1, 2, \ldots, k - 1$, $S_n^k$ satisfies the following equations:

Remark 1. A special case of Eq. (4) for $d = 1$ is

Remark 2. Since $C_{2n}^{n} = \binom{2n}{n}$ is the total number of all placements and $S_n^0 = 0$, we can show that $\sum_{k=1}^{n} S_n^k = C_{2n}^{n}$.

Lemma 3.2. For any $n$ and any $k$ with $k \le n$, the sequence $S_n^k$ is uniquely determined through Eq. (5) and the equations $S_n^n = 2^n$ and $2 S_n^1 = S_n^2$.

Lemma 3.3. For any $n$ and any $k \le n$, we have

The proofs of Lemmas 3.1-3.3 are presented in the supplementary materials.
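The counts $S_n^k$ can be checked for small $n$ by listing the placements explicitly. In the Python sketch below, a placement is encoded as a non-decreasing sequence of box labels $b_1 \le \cdots \le b_n$ with values in $1, \ldots, n + 1$, and ball $s$ is successful when $b_s \in \{s, s + 1\}$. The function name `success_counts` is ours, and the brute-force listing is meant as a check on the stated facts rather than as the combinatorial argument behind the lemmas.

```python
from itertools import combinations_with_replacement
from math import comb

def success_counts(n):
    """Return [S_n^0, S_n^1, ..., S_n^n] by listing every admissible placement."""
    counts = [0] * (n + 1)
    # combinations_with_replacement yields exactly the non-decreasing label sequences
    for boxes in combinations_with_replacement(range(1, n + 2), n):
        k = sum(1 for s, b in enumerate(boxes, start=1) if b in (s, s + 1))
        counts[k] += 1
    return counts

print(success_counts(3))  # [0, 4, 8, 8], so S_3^2 = 8 as in Table 2
for n in range(2, 7):
    s = success_counts(n)
    assert s[0] == 0 and s[n] == 2 ** n   # no placement without success; S_n^n = 2^n
    assert 2 * s[1] == s[2]               # the relation 2 S_n^1 = S_n^2
    assert sum(s) == comb(2 * n, n)       # total number of placements
```

For the placement described for Table 1, `boxes = (1, 1, 5, 5)`, the inner sum evaluates to 2 because only balls 1 and 4 satisfy the success condition.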
Theorem 3.4. Let $F$ and $G$ be two continuous cdfs. Under the null hypothesis $H_0$ in (1), the pmf of $T_n$ defined in Eq. (2) is given by

Proof. Theorem 3.4 is a direct consequence of Lemma 3.3 and the identity

Probabilistic properties and limiting distribution of $T_n$
The result provided in Theorem 3.4 gives an easy way to compute the pmf and cdf of $T_n$ when $F = G$. In this section, we explore additional properties of the null distribution of $T_n$ and provide a numerical example to illustrate the application of the results.

Basic distributional properties
The pmf of $T_n$ is unimodal and skewed to the right even for moderate values of $n$. It follows from Eq. (8) that for $1 \le k \le n - 1$,

This expression is less than 1 if and only if $k(k + 1) < 2n$, equals 1 only if $k(k + 1) = 2n$, and exceeds 1 otherwise. Thus, $\Pr(T_n = k)$ is strictly increasing and then decreasing in $k$, and it has two adjacent integers $k_0$ and $k_0 + 1$ as modes only when $n = k_0(k_0 + 1)/2$ for some positive integer $k_0$, i.e., for $n = 3, 6, 10, 15$, etc. Otherwise, the unique mode is the smallest integer $k$ satisfying $k(k + 1) > 2n$. For $n < 6$, there is mild negative skewness, which turns into positive skewness for larger $n$; the skewness increases as $n$ increases.
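The mode structure can be explored numerically. The sketch below uses the expression $\Pr(T_n = k \mid H_0) = k\,2^k \binom{2n-k}{n} / \{(2n-k)\binom{2n}{n}\}$ as a candidate closed form for the null pmf. This expression is an assumption of the illustration rather than a quotation of Eq. (8): it was chosen because it reproduces the placement counts for small $n$ (for example $S_3^2 = 8$ and $S_n^n = 2^n$) and the value $\Pr(T_{30} \le 2) = 0.0508$ quoted from Table 3. The function names are ours.

```python
from fractions import Fraction
from math import comb

def null_pmf(n):
    """Candidate closed form for Pr(T_n = k | H0), k = 1..n (assumed, see text)."""
    total = comb(2 * n, n)
    return {k: Fraction(k * 2 ** k * comb(2 * n - k, n), (2 * n - k) * total)
            for k in range(1, n + 1)}

def modes(n):
    """All values of k at which the candidate pmf attains its maximum."""
    pmf = null_pmf(n)
    top = max(pmf.values())
    return [k for k, p in pmf.items() if p == top]

print(modes(3), modes(6), modes(10))   # [2, 3] [3, 4] [4, 5]
print(modes(5))                        # [3]
pmf30 = null_pmf(30)
print(round(float(pmf30[1] + pmf30[2]), 4))  # 0.0508
```

Exact rational arithmetic is used so that ties between adjacent modes are detected without rounding error; for the triangular numbers $n = 3, 6, 10$ the two modes satisfy $k_0(k_0 + 1) = 2n$.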
We now obtain a useful recurrence relation between the probability generating functions (pgfs) of the integer-valued random variables $T_n$ and $T_{n-1}$, where the pgf of $T_n$ is defined as $P_n(s) = E(s^{T_n})$. Note that $P_n(1) = 1$. As $T_n$ is at least 1, $P_n(0) = 0$, and since $T_n$ has finite support, $P_n(s)$ exists for all real $s$; we define $P_0(s) = 1$.

Theorem 4.1. The pgf $P_n(s)$ satisfies the recurrence relation

for all real $s$. The proof of Theorem 4.1 is provided in the supplementary materials.
For $E(T_n)$, upon differentiating both sides of Eq. (S6) in the supplementary file with respect to $s$ once and putting $s = 1$, we obtain the recurrence relation

which simplifies to

Since $T_1$ is degenerate at 1, $E(T_1) = 1$, and using the above recurrence relation, one can generate the mean of $T_n$ in a sequential manner. For example,

Upon differentiating both sides of Eq. (S6) in the supplementary file with respect to $s$ twice and putting $s = 1$, we obtain the following relation:

Using Eq. (11), the above equation can be simplified into the recurrence relation

for $n \ge 2$, with $P''_1(1) = 0$. Thus, we can generate $P''_n(1)$ sequentially, and using Eq. (10) and Eq. (11) we can obtain $\mathrm{Var}(T_n)$ as well. For example, $\mathrm{Var}(T_2) = 2/9 = 0.22$, $\mathrm{Var}(T_3) = 14/25 = 0.56$, and $\mathrm{Var}(T_4) = 1186/1225 = 0.97$.

Lemma 4.2. Under $H_0: F = G$, the first and second moments of $T_n$ are, respectively,

Proof. The first and second moments of $T_n$ can be obtained from Eq. (8) and the recurrence relation (11).
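The quoted variances can be reproduced independently of the recurrence relations. The Python sketch below computes the exact mean and variance of $T_n$ under $H_0$ by listing all $\binom{2n}{n}$ equally likely ball placements (non-decreasing box labels, with ball $s$ successful when it sits in box $s$ or $s + 1$). This brute-force route is a substitute for the sequential pgf-based computation, not an implementation of it, and the function name `exact_moments` is ours.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb

def exact_moments(n):
    """Exact mean and variance of T_n under H0 by full enumeration (small n only)."""
    values = [sum(1 for s, b in enumerate(boxes, start=1) if b in (s, s + 1))
              for boxes in combinations_with_replacement(range(1, n + 2), n)]
    assert len(values) == comb(2 * n, n)  # all arrangements, each equally likely
    mean = Fraction(sum(values), len(values))
    second = Fraction(sum(v * v for v in values), len(values))
    return mean, second - mean ** 2

for n in (2, 3, 4):
    print(n, *exact_moments(n))
# 2 5/3 2/9
# 3 11/5 14/25
# 4 93/35 1186/1225
```

For the sample sizes checked ($n = 1, \ldots, 6$), the enumerated means also coincide with $4^n / \binom{2n}{n} - 1$, which is in line with the approximation $\sqrt{\pi n} - 1$ given in Remark 4. This is an observation from the enumeration and should not be read as the statement of Lemma 4.2.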
For illustrative purposes, the cdf, the mode, the expected value, the standard deviation, and the median of $T_n$ are tabulated in Tables 3 and 4 for $n = 5(1)30$.
Theorem 4.3. Under the null hypothesis $H_0: F = G$, the limiting distribution of $T_n / \sqrt{n}$ is Weibull with scale parameter $\lambda = 2$ and shape parameter $k = 2$.
Remark 4. The mean and variance of the limiting distribution are, respectively, $\sqrt{\pi} = 1.7725$ and $4 - \pi = 0.8584$. Note that $E(T_n \mid H_0: F = G) \approx \sqrt{\pi n} - 1$, with a corresponding approximation for $\mathrm{Var}(T_n \mid H_0: F = G)$, and there is also convergence of the first and second moments. The median and the mode of the limiting distribution are $2\sqrt{\log(2)} = 1.6651$ and $\sqrt{2} = 1.4142$, respectively. We also note that the limiting distribution of $\sqrt{n}\,\bar{T}_n$ is Weibull, where $\bar{T}_n = T_n / n$ is the proportion of successful matches in $n$ dependent trials.
Remark 5. The limiting distribution can be used to approximate the critical values for a given level of significance $\alpha$. Specifically, $t_\alpha(n) = 2\{-n \log(1 - \alpha)\}^{1/2}$ can be used to approximate the critical value for a level-$\alpha$ test. For example, when $\alpha = 0.05$ and $n = 30$, the approximate critical value is 2.48. In other words, $H_0$ is rejected if $T_{30} < 2.48$ (i.e., $T_{30} = 1$ or 2). From Table 3, we can observe that $\Pr(T_{30} \le 2) = 0.0508$.

Remark 6. From our empirical study based on Monte Carlo simulations (results are not shown here for brevity), the limiting distribution approximates the distribution of $T_n$ well for $n \ge 500$.
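The approximation in Remark 5 follows from inverting the Weibull cdf with scale 2 and shape 2, $1 - \exp\{-(t/2)^2\}$, at $t = t_\alpha(n)/\sqrt{n}$. A short Python sketch of this calculation is given below; the function names are ours.

```python
import math

def weibull_cdf(t, scale=2.0, shape=2.0):
    """Cdf of the Weibull distribution; the defaults match the limit of T_n / sqrt(n)."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def approx_critical_value(alpha, n):
    """t_alpha(n) = 2 * sqrt(-n * log(1 - alpha)), the approximation in Remark 5."""
    return 2.0 * math.sqrt(-n * math.log(1.0 - alpha))

t = approx_critical_value(0.05, 30)
print(round(t, 2))                                 # 2.48
print(round(weibull_cdf(t / math.sqrt(30)), 4))    # 0.05
```

Because $T_n$ is integer valued, the approximate rule rejects for $T_{30} \in \{1, 2\}$, whose exact null probability is the value 0.0508 read from Table 3, slightly above the nominal 0.05.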

Exact distribution of $T_n$ under Lehmann alternatives
In this section, we consider the exact distribution of the proposed test statistic $T_n$ under the Lehmann alternative $H_1: F^\gamma = G$ for some $\gamma \ne 1$. The Lehmann alternative $H_1: F^\gamma = G$ is a subclass of the alternative $H_1: F \ne G$ when $\gamma \ne 1$. Then, based on the exact distribution of $T_n$ and the enumeration method, we obtain an explicit expression for the power function of the proposed test procedure under the Lehmann alternative.
From Theorem 4.2 of Balakrishnan and Ng (2006), under the Lehmann alternative

Then, the exact pmf of $T_n$ under the Lehmann alternative is given by

The exact power values under Lehmann alternatives with different $\gamma$ for sample sizes $n = 6(2)14$ are presented in Table 5. When $\gamma = 1$, the values are the exact significance levels. From Table 5, we observe that the power of the proposed test based on the statistic $T_n$ increases with the sample size. Moreover, the further $\gamma$ is from 1, the larger the power.
Remark 7. Similar results hold for the proportional hazard family of distributions, where the relationship $(1 - F(x)) = (1 - G(x))^\gamma$ is assumed.

Proposed test statistic $T_n$ for the unequal sample sizes case
The proposed test statistic presented in Eq. (2) can handle the case with equal sample sizes. In this section, we generalize the test procedure to the case with unequal sample sizes and propose a generalized version of the test statistic based on comparing the sample quantiles from the two samples. Without loss of generality, we consider $m \ge n$, where $m$ is the sample size of the X-sample and $n$ is the sample size of the Y-sample.
Let $X_{1:m} < X_{2:m} < \cdots < X_{m:m}$ and $Y_{1:n} < Y_{2:n} < \cdots < Y_{n:n}$ be the order statistics of the X- and Y-samples from the populations with continuous cdfs $F$ and $G$, respectively.
For $i = 1, \ldots, n$, we define the following indices:

where $\lfloor a \rfloor$ is the floor function of $a$ and $\lceil a \rceil$ is the ceiling function of $a$. Note that $j_1(1) = 0$ and $j_2(i) - j_1(i) \ge 2$ for all $i = 1, \ldots, n$. Using the idea of the test statistic in Eq. (2), we propose the test statistic

$$T(m, n) = \sum_{i=1}^{n} I_{\{X_{j_1(i):m} < Y_{i:n} \le X_{j_2(i):m}\}}, \tag{20}$$

with $X_{0:m} = -\infty$ and $X_{m+1:m} = \infty$. It is obvious that when $m = n$, $T(m, n)$ in Eq. (20) reduces to $T_n$ in Eq. (2).
In view of the probability integral transformation, under the null hypothesis

$$T(m, n) \stackrel{d}{=} \sum_{i=1}^{n} I_{\{U_{j_1(i):m} < V_{i:n} \le U_{j_2(i):m}\}},$$

where $U_{j:m}$ and $V_{i:n}$ are the $j$-th order statistic of a sample of size $m$ and the $i$-th order statistic of a sample of size $n$, respectively, from the uniform distribution on $(0, 1)$.
In the following theorem, we provide the expected value of the proposed statistic in Eq. (20) under the null hypothesis. Note that the expected value of the proposed statistic in Eq. (20) under the null hypothesis reduces to the expectation in Eq. (12) when $m = n$.

Theorem 6.1. For $m \ge n$, under the null hypothesis $H_0$:

where

for $1 \le i \le n$, and the indices $j_1(i)$ and $j_2(i)$ are given by Eq. (19).

Proof. Note that

Upon conditioning on $V_{i:n}$, we obtain

for $1 \le j \le m$. Integrating out $v^{i+k-1}(1 - v)^{m+n-k-i}$ in Eq. (23) over $(0, 1)$ results in a beta function, and when multiplied by the combinatorial coefficients, we can obtain

Thus, Eq. (22) holds and the proof is complete.

Remark 8. The form in Eq. (22) can be related to the negative hypergeometric probabilities, and this form is well known in the context of distribution-free prediction intervals for order statistics of a future sample (see Section 7.3 of David and Nagaraja 2003). Moreover, alternative forms of $p_i(m, n)$ can be obtained by conditioning on $U_{j:m}$. The performance of the proposed test procedure when the sample sizes of the two samples are unequal is evaluated by Monte Carlo simulation in the next section.
For a detailed discussion on various properties of these distributions, one may refer to Johnson, Kotz, and Balakrishnan (1994). For different choices of sample sizes, we generated 10,000 sets of data in order to obtain the estimated rejection rates. Note that when the location shift is $\theta = 0.0$, the simulated rejection rates for level 10% tests are the simulated significance levels. In addition to the location-shift alternative, we also consider the scale-shift alternative $H_1: F(x) = G(\delta x)$ for some $\delta \ne 1$, where $\delta$ is the scale shift.
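A simulation of this kind can be sketched in Python as follows. The sketch is illustrative only: it uses standard normal data, takes $Y = \theta + \delta Z$ with $Z$ standard normal as one way of realizing a location shift $\theta$ and a scale shift $\delta$, and treats the critical value $k$ as a user-supplied input. The function names, the seed, and the choice $k = 2$ for $n = 25$ are assumptions made for the example and are not taken from the simulation study.

```python
import math
import random

def t_statistic(x, y):
    """T_n for equal sample sizes: count i with X_{i-1:n} < Y_{i:n} <= X_{i+1:n}."""
    n = len(x)
    xs = [-math.inf] + sorted(x) + [math.inf]
    ys = sorted(y)
    return sum(1 for i in range(1, n + 1) if xs[i - 1] < ys[i - 1] <= xs[i + 1])

def rejection_rate(n, k, theta=0.0, delta=1.0, reps=10_000, seed=1):
    """Monte Carlo estimate of Pr(T_n <= k) for N(0, 1) data versus theta + delta * N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [theta + delta * rng.gauss(0.0, 1.0) for _ in range(n)]
        rejections += t_statistic(x, y) <= k  # small values of T_n reject H0
    return rejections / reps

null_rate = rejection_rate(25, 2)                # estimated size (theta = 0, delta = 1)
shift_rate = rejection_rate(25, 2, theta=1.0)    # estimated power under a location shift
scale_rate = rejection_rate(25, 2, delta=5.0)    # estimated power under a scale shift
print(null_rate, shift_rate, scale_rate)
```

With $\theta = 0$ and $\delta = 1$ the estimate approximates the actual size of the rule $T_n \le k$, which can fall below the nominal level because $T_n$ is discrete. Other parent distributions can be simulated by replacing the two `rng.gauss` calls.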
The power of the proposed test procedure ($T_n$), the classical Wilcoxon rank-sum test (WRS), the KS test (KS), the two-sample t-test (t-test), the Anderson-Darling test (AD), the Cramér-von Mises test (CVM), the Ansari-Bradley test (AB), the Lepage test (Lepage), Zhang's test (Zhang), and the modified Baumgartner test (MB) are estimated through Monte Carlo simulations when the scale shift is $\delta = 5.0, 1.0, 0.9, 0.75$, and 0.25, with equal sample sizes $n = 25, 50$, and 100. Note that when the scale shift is $\delta = 1.0$, the simulated rejection rates for level 10% tests are the simulated significance levels. The simulation results for the location-shift and scale-shift alternatives are presented in Tables 6-11. In addition to the location-shift and scale-shift alternatives, we consider $F$ and $G$ to be the standard normal distribution, the standard logistic distribution, and skew-normal distributions with $\alpha = -4, -1, 0, 1$, and 4. The simulated rejection rates of the different test procedures under these settings are presented in Table 12. Note that the skew-normal distribution with parameter $\alpha = 0$ is equivalent to the standard normal distribution. Therefore, the case $\alpha = 0$ for the standard normal distribution versus the skew-normal distribution corresponds to the null hypothesis; hence, the simulated rejection rates are the simulated type-I error rates. However, for the standard normal distribution versus the standard logistic distribution and for the standard logistic distribution versus the skew-normal distribution, all the simulated rejection rates are simulated power values. Furthermore, we consider the alternative with both a location shift and a scale shift based on the normal distribution. The simulated rejection rates for equal sample sizes $n = 25$ under Normal(0, 1) versus Normal($\theta_1$, $\theta_2$) with $\theta_1 = -2.0, -1.0, -0.4, 0, 0.4, 1.0$, and 2.0 and $\theta_2 = 0.25, 0.75, 0.90, 1.00$, and 5.00 are presented in Table 13.

From Tables 6-13, we observe that the proposed test procedure based on the test statistic in Eq. (2) is consistent in the sense that the larger the sample size $n$, the larger the power. Moreover, the proposed test procedure provides similar power values when the location shifts are $+\theta$ and $-\theta$, which indicates that, for equal sample sizes, the power of the test does not depend on the direction of the location shift. We observe that the power performance of Zhang's test and the modified Baumgartner test is superior compared to the other procedures considered in this paper. Moreover, the proposed test procedure does not give the worst power in most of the scenarios considered here. The proposed test $T_n$ has obvious advantages in terms of power compared to the Wilcoxon rank-sum test, the KS test, and the two-sample t-test for uniform and beta distributions under the location-shift alternatives, and for normal and t distributions under the scale-shift alternatives.
For the case with unequal sample sizes, the simulation results for different sample sizes $n$ and $m$ are given in Tables 14-16. Once again, the proposed test procedure based on $T(m, n)$ provides power performance comparable to that of the other existing two-sample procedures considered in this paper.

Table 5. Exact power under Lehmann alternatives with different $\gamma$ for sample sizes $n = 6(2)14$.

Table 6. Simulated rejection rates under location-shift alternatives with significance level 10% under equal sample sizes $n = 25$.

Table 7. Simulated rejection rates under scale-shift alternatives with significance level 10% under equal sample sizes $n = 25$.

Table 8. Simulated rejection rates under location-shift alternatives with significance level 10% under equal sample sizes $n = 50$.

Table 9. Simulated rejection rates under scale-shift alternatives with significance level 10% under equal sample sizes $n = 50$.

Table 10. Simulated rejection rates under location-shift alternatives with significance level 10% under equal sample sizes $n = 100$.

Table 11. Simulated rejection rates under scale-shift alternatives with significance level 10% under equal sample sizes $n = 100$.

Table 14. Simulated rejection rates under location-shift alternatives with significance level 10% when $n = 25$ and $m = 50$.

Table 15. Simulated rejection rates under scale-shift alternatives with significance level 10% when $m = 25$ and $n = 50$.

Table 18. The values of test statistics with corresponding p-values.