Maximum test and adaptive test for the general two-sample problem

An extension of the omnibus test statistic of Ebner et al. [A new omnibus test of fit based on a characterization of the uniform distribution. Statistics. 2022;56:1364–1384. doi: 10.1080/02331888.2022.2133121] is considered for the general two-sample alternative. In addition, using the extension this paper introduces a maximum test statistic and an adaptive test statistic for testing the equality of two distributions. The power performance in various situations is investigated for continuous and discrete distributions. Simulation studies based on Monte-Carlo show that the proposed test statistics are good competitors of the existing nonparametric test statistics. The proposed test statistic displays outstanding performance in certain situations, and is illustrated using real data. Finally, we offer some concluding remarks.


Introduction
One of the main goals in statistics is to make inferences, such as hypothesis tests and estimates, from data drawn from unknown populations. Hypothesis testing is widely used in many practical areas, including psychology, biomedical studies, and industry. A common problem faced by statisticians is determining whether two groups of samples come from the same population. For example, in the life sciences, practitioners rely on hypothesis testing to compare the effectiveness of a new drug with that of an existing drug or placebo. If the populations can be assumed to be normal, then a variety of statistical methods with optimal properties can be applied to the collected data. In this situation, the two-sample t-test is the best-known test and is uniformly most powerful unbiased for equality of means under homoscedasticity. However, the assumption of normality often cannot be verified because of limited sample sizes; in trials of a new drug, for instance, the sample is frequently too small to assess normality reliably. Additionally, the classical F-test for the equality of variances is sensitive to the assumption that the population distributions are normal and is therefore of limited use in practice. Hence, the underlying theoretical distribution is often unknown, and in several application areas we cannot assume normality or any other distribution. Therefore, nonparametric methods become more appropriate choices in hypothesis testing.
The nonparametric method is a classical subject and is sometimes called the distribution-free statistical method Conover [1]. Many nonparametric tests are based on ranking procedures applied directly to the sample data. The testing problem is usually solved by assuming that the populations have the same shape and differ only by a shift in location (such as the mean or median). In practice, however, special alternatives such as a pure location shift or a pure scale shift are rare. In fact, in the statistical process monitoring context, both location and scale have their respective importance, and an upward scale shift often makes a process volatile, see e.g. Mukherjee and Sen [2]. In this circumstance, the Cucconi statistic Cucconi [3] has been extensively studied in recent years in the context of location-scale tests. The Cucconi statistic is interpreted as the Mahalanobis distance between the sum of ranks and reverse ranks of one sample in the pooled sample. The Cucconi statistic has been applied to hydrology and parasitology, see Rutkowska and Banasik [4] and Marozzi and Reiczigel [5]. Marozzi [6] pioneered the study of the Cucconi statistic and found some advantages of this statistic. For example, comparison with the exact critical values shows that the convergence of the Cucconi statistic to its limiting distribution is excellent. However, Kössler and Mukherjee [7] recently noted that traditional two-sample simultaneous tests for location and scale parameters, e.g. the Lepage test [8], are silent about the shape of the distributions. Then, Kössler and Mukherjee [7] and Mukherjee et al. [9] proposed test statistics for the versatile alternative. However, in various experiments, a change in other parameters of the distribution along with location, scale and shape is also widespread. For instance, when the control group follows a heavy-tailed t distribution with 2.1 degrees of freedom, and the experimental group is from a normal distribution with mean 0 and variance 21, the means and variances of both groups are the same. The third moment does not exist for the control group. As both distributions are symmetric, skewness measures for both distributions will be zero. Only kurtosis can indicate the difference. Therefore, Suzuki [10] applied a nonparametric statistic for phase-I analysis in statistical process control to analyse waiting times. One may also overcome this problem by considering a more general alternative, namely the inequality of the two underlying distributions. In this case, the Kolmogorov-Smirnov test Gibbons and Chakraborti [11], denoted KS, is the standard omnibus test statistic for the equality of two distributions. Since KS is not powerful against certain alternatives, various other omnibus test statistics have been proposed and much discussed. Two-sample omnibus test statistics such as the two-sample Cramér-von Mises Anderson [12] and the two-sample Anderson-Darling Pettitt [13] test statistics are well known and widely used. In addition, Neuhäuser et al. [14] recommended using the two-sample test based on the likelihood ratio Zhang [15] and the modified Baumgartner Murakami [16] test statistics instead of KS.
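As a quick check of the heavy-tailed example above, recall the standard moments of Student's t distribution with $\nu$ degrees of freedom (a textbook fact, not a result of this paper):

\[
\operatorname{Var}(t_\nu) = \frac{\nu}{\nu - 2} \quad (\nu > 2), \qquad \operatorname{Var}(t_{2.1}) = \frac{2.1}{2.1 - 2} = 21 .
\]

Hence the $t_{2.1}$ control group and the $N(0, 21)$ experimental group share mean 0 and variance 21, while $E|t_\nu|^k$ is finite only for $k < \nu$, so the third and fourth moments of the control group do not exist.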
Since there are many different test statistics, it is difficult for practitioners to select an appropriate test statistic in a given situation. An efficient test exists for every distribution, but the underlying distribution is not known in practical analysis. Under this circumstance, Randles [17], Hogg [18] and Hogg et al. [19], among others, proposed adaptive tests in the nonparametric setup. In addition, Büning [20] proposed an adaptive test for the multisample location problem based on selectors suggested by Hogg [18]. Neuhäuser [21] proposed an adaptive test for the two-sample location problem by using the selectors of Hogg [18].
For the two-sample scale problem, Kössler [22] proposed an adaptive test based on new selectors for skewness and tail-weight. Büning and Thadewald [23] proposed a two-sample location-scale adaptive test and used a selector suggested by Büning [24]. For the one-sample location problem, Kitani and Murakami [25] recently proposed an adaptive test and new selectors. As an alternative to the adaptive test, one can choose the maximum test based on various nonparametric statistics. For example, Neuhäuser [21] compared the validity of the maximum test with that of the adaptive test for the two-sample location problem. Additionally, Neuhäuser and Hothorn [26] discussed that a maximum test is an adaptive permutation test.
Recently, Ebner et al. [27] proposed a new omnibus goodness-of-fit test statistic for a specified distribution. The power of the new test was compared with that of several existing tests such as the one-sample Kolmogorov-Smirnov D'Agostino and Stephens [28], the one-sample Cramér-von Mises D'Agostino and Stephens [28], the one-sample Anderson-Darling D'Agostino and Stephens [28], and the one-sample likelihood ratio test Zhang [29]. Ebner et al. [27] showed the validity of the proposed test statistic for different scenarios. Therefore, in this paper we focus on extending the test statistic of Ebner et al. [27] to the two-sample testing problem. The rest of the paper is organized as follows. In Section 2, we extend the test statistic of Ebner et al. [27] to a two-sample test statistic. In addition, we list some critical values of the proposed test statistic and investigate the convergence of the test statistic to its limiting distribution. In Section 3, we introduce a maximum-type test statistic and an adaptive-type test statistic based on a selector rule. In addition, we investigate the robustness of the adaptive test statistic through a simulation study for different distributions. The power performance of the proposed test statistics is investigated via Monte-Carlo simulation in Section 4. Section 5 illustrates the proposed test statistics with two real data sets. Finally, concluding remarks are presented in Section 6.

Revisiting the one-sample test statistic
Let $Y_1, Y_2, \ldots, Y_m$ be a random sample from a continuous distribution function $G$. Recently, Ebner et al. [27] proposed a new omnibus test statistic to test the hypothesis where $G_0$ is a specified continuous distribution, against Then, the null hypothesis is rewritten as where $U(0, 1)$ is the uniform distribution on the unit interval. Ebner et al. [27] showed that $U \sim U(0, 1)$ if and only if Replacing the expected value in (1) by its empirical counterpart, Ebner et al. [27] proposed a new test statistic as follows: $T_1$ is based on a characterization of the uniform distribution. Tests motivated by characterizations are known to have desirable properties Nikitin [30]. In fact, $T_1$ of Ebner et al. [27] is not based on the order statistic. Since the aim of this paper is to propose a two-sample statistic based on $T_1$, we use this representation in this paper. Moreover, Ebner et al. [27] listed the asymptotic critical values of $T_1$ for testing $H_K$ against $H_A$. Historically, replacing the weight function to raise the power of a test statistic has been an important topic. For example, hydrologists are interested in estimates of flood magnitudes for high return periods. In this case, we encounter right-skewed data. Sinclair et al. [31] proposed the modified Anderson-Darling test statistic, which emphasizes the upper tail of the distribution by using the weight function $(1 - t)^{-1}$, $t \in (0, 1)$. The test statistic of Sinclair et al. [31] is suitable for right-skewed distributions. Therefore, in this paper, we apply the idea of Sinclair et al. [31] to the test statistic of Ebner et al. [27]. Then, by replacing $U$ with $1 - U$, we can consider another type of test statistic as Note that the expected value, the variance and the critical values of $T_2$ are the same as those of $T_1$.

Two-sample test statistic based on Ebner et al. [27]
Let $X_1 = (X_{11}, \ldots, X_{1n_1})$ and $X_2 = (X_{21}, \ldots, X_{2n_2})$, with $N = n_1 + n_2$, be two random samples of sizes $n_1$ and $n_2$ from absolutely continuous populations with cumulative distribution functions $F_1(x)$ and $F_2(x)$, respectively. We are interested in testing the hypothesis $H_0: F_1(x) = F_2(x)$ for all $x$ against the general alternative that $F_1(x) \neq F_2(x)$ for some $x$. Let $R_i$, $i = 1, \ldots, n_1$, and $S_k$, $k = 1, \ldots, n_2$, denote the combined-sample ranks of the observations of $X_1$ and $X_2$ in increasing order of magnitude, respectively. In this section, we first consider the two-sample test statistics based on $T_1$ and $T_2$.
Let $\hat{F}_1(x)$, $\hat{F}_2(x)$ and $\hat{F}(x)$ denote the empirical distribution functions of $X_1$, $X_2$ and the combined sample, respectively, and let $X_{(p)}$ be the $p$th order statistic of the combined sample $(X_1, X_2)$. Then, the empirical distribution function of the combined sample is given by $\hat{F}(x) = \{n_1 \hat{F}_1(x) + n_2 \hat{F}_2(x)\}/N$. In addition, we have $\hat{F}(X_{1i}) = R_i/N$ at the $i$th observation of $X_1$, and $\hat{F}(X_{2k}) = S_k/N$ at the $k$th observation of $X_2$. By applying these facts, the $p$th order statistic from the uniform distribution on the unit interval is represented as $U_{(p)} = \frac{N}{N+1}\hat{F}(X_{(p)}) = \frac{p}{N+1}$, where the scaling factor $N/(N + 1)$ is used to avoid potential problems with blowing up at the boundary of $(0, 1)$. Let $U_{X_1}$ and $U_{X_2}$ be the random variables produced by the transformations $U_{X_1} = F_1(X_1)$ and $U_{X_2} = F_2(X_2)$, respectively, that is, $U_{X_1}, U_{X_2} \sim U(0, 1)$. Then, it is possible to replace $U$ with $U_{X_1}$ or $U_{X_2}$. Additionally, we denote the order statistic by $U_{X_1 (i)}$ if $X_{(i)}$ is from the first sample and by $U_{X_2 (k)}$ if $X_{(i)}$ is from the second sample. Herein, by (1), $U \sim U(0, 1)$ if and only if Then, we obtain Replacing the expected value in (2) by its empirical counterpart, we have a two-sample test statistic $V_1$ based on $T_1$ as follows: For a two-sample statistic based on $T_2$, we have where Note that, since Ebner et al. [27] derived the limiting distribution of $T_1$ when testing for the uniform distribution, the limiting distribution of $V_1$ ($V_2$) is that of $T_1$ ($T_2$).
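The rank-based transformation described above can be sketched in a few lines. This is a minimal illustration, assuming continuous data without ties and assuming that the scaled value is $N/(N+1)$ times the pooled empirical distribution function, that is, rank divided by $N + 1$. The function name `pooled_scores` is hypothetical, and the statistics $V_1$ and $V_2$ themselves are not reproduced here.

```python
def pooled_scores(x1, x2):
    """Return the scaled combined-sample ranks R_i/(N+1) and S_k/(N+1).

    Assumes no ties (continuous data), as in the text.
    """
    pooled = sorted(list(x1) + list(x2))
    n_total = len(pooled)
    # Rank of each distinct value in the pooled, increasingly ordered sample.
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    u1 = [rank[value] / (n_total + 1) for value in x1]
    u2 = [rank[value] / (n_total + 1) for value in x2]
    return u1, u2


# Pooled order: 0.3 < 0.9 < 1.2 < 1.7 < 2.5, so N = 5 and N + 1 = 6.
u1, u2 = pooled_scores([0.3, 1.2, 2.5], [0.9, 1.7])
print(u1)  # ranks 1, 3, 5 divided by 6
print(u2)  # ranks 2, 4 divided by 6
```

Dividing by $N + 1$ rather than $N$ keeps every score strictly inside $(0, 1)$, which is the purpose of the scaling factor mentioned in the text.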

Maximum and adaptive test procedures
Let us denote the inverse rank of $R_i$ and $S_k$ as Similarly, $V_2$ based on the inverse ranks is equal to $V_1$. This property indicates that $V_1$ and $V_2$ are not location-invariant test statistics. In practical analysis, we have to determine whether to use $V_1$ or $V_2$ before carrying out the hypothesis test. Hence, we propose the maximum- and adaptive-type test statistics.

Maximum test statistic
A first and simple way to solve this problem is to use the larger of the two statistics as the test statistic. Then, we propose a maximum test statistic as follows: $V_{\max} = \max(V_1, V_2)$. Herein, we list the critical values of $V_1$, $V_2$ and $V_{\max}$ for selected sample sizes in Table 1. By using the exact permutation method, we can derive the exact distribution of the test statistics for small sample sizes. However, when the sample sizes are moderate to large, deriving the exact distribution is difficult. We therefore estimate the critical values of $V_1$, $V_2$ and $V_{\max}$ for moderate to large sample sizes ($(n_1, n_2) \geq (15, 15)$) by 1,000,000 random permutations. Table 1 reveals that $V_1$ and $V_2$ converge to the limiting distribution for large sample sizes. For some selected sample sizes, however, we list the exact critical values. In practice, an approximate permutation test based on a random sample of permutations can be used to evaluate the p-value. For example, we may not evaluate the exact p-value of the test statistic for the case of

Adaptive test statistic
Another method is to choose between the test statistics $V_1$ and $V_2$ by means of a selector.
For a symmetric distribution with a pure location or pure scale shift, it does not matter whether $V_1$ or $V_2$ is chosen. However, the powers of $V_1$ and $V_2$ differ when the location and scale parameters shift simultaneously. For left-skewed distributions with a pure location shift, one of the two statistics is superior to the other. However, we encounter right-skewed data in many applications. For right-skewed distributions with a pure location shift, the other statistic is the more powerful one. In real data analysis, a pure location shift, a pure scale shift or a pure shift in another parameter of a symmetric distribution is rare. In addition, the shape parameter, for example, affects the expected value in asymmetric distributions. Therefore, we have to determine whether to use $V_1$ or $V_2$ from the structure of the data, such as its skewness. As a measure of the skewness of the distribution, Hogg et al. [19,32] gave $Q = (\hat{U}_{0.05} - \hat{M}_{0.5}) / (\hat{M}_{0.5} - \hat{L}_{0.05})$, where $\hat{L}_\gamma$, $\hat{M}_\gamma$ and $\hat{U}_\gamma$ denote the average of the smallest, middle and largest $\gamma N$ order statistics, respectively. This selector statistic is rather familiar in practice; for example, it has been used by Marozzi [33] when assessing scale differences. When 0.95N, 0.5N or 0.05N are not integers, the fractional items are used. If $Q > 1$, then the right tail of the distribution seems longer than the left tail; that is, there is an indication that the distribution is skewed to the right. On the other hand, if $Q < 1$, the sample indicates that the distribution may be skewed to the left. Let $Z = (X_1, X_2)$ be the combined sample of $X_1$ and $X_2$. Assume, for example, that the distribution of $X_1$ is right-skewed and the distribution of $X_2$ is left-skewed. In this case, the skewness of the distribution of the combined sample $Z$ disappears. By applying the idea of the modified Anderson-Darling test statistic (Sinclair et al. [31]), when the combined sample or both single samples come from strongly left-skewed distributions, we expect that $V_1$ is more useful than $V_2$. On the other hand, we expect $V_2$ to be superior to $V_1$ when the combined sample or both single samples come from strongly right-skewed distributions. Therefore, in this paper, we also suggest an adaptive test statistic $V_{\mathrm{adp}}$ based on a selector rule by where Hill [34] showed in a simulation study that adaptive procedures can be safely applied when the size of each sample is at least 20. Therefore, for small sample sizes we also investigate a test procedure where the selector is calculated for each permutation. The resulting adaptive permutation test, called $V_{\mathrm{perm}}$, controls the type I error rate even for small sample sizes and in the presence of ties Neuhäuser and Hothorn [26].
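The skewness selector can be sketched as follows. This is an illustration only: it assumes the commonly cited form of Hogg's measure, $Q = (\hat{U}_{0.05} - \hat{M}_{0.5})/(\hat{M}_{0.5} - \hat{L}_{0.05})$, and it assumes that "fractional items" means giving the next order statistic a fractional weight when $\gamma N$ is not an integer. The names `tail_mean` and `hogg_q` are hypothetical.

```python
import math


def tail_mean(sorted_values, count):
    """Mean of the first `count` items; `count` may be fractional."""
    whole = math.floor(count)
    total = sum(sorted_values[:whole])
    if count > whole:
        # Fractional item: the next order statistic gets partial weight.
        total += (count - whole) * sorted_values[whole]
    return total / count


def hogg_q(sample, gamma_tail=0.05, gamma_mid=0.5):
    """Skewness measure Q; returns NaN when the denominator is zero."""
    z = sorted(sample)
    n = len(z)
    lower = tail_mean(z, gamma_tail * n)        # smallest 5%
    upper = tail_mean(z[::-1], gamma_tail * n)  # largest 5%
    # Middle 50%: remove (1 - gamma_mid)/2 of the mass from each end.
    trim = (1 - gamma_mid) / 2 * n
    trimmed_low = trim * tail_mean(z, trim)
    trimmed_high = trim * tail_mean(z[::-1], trim)
    middle = (sum(z) - trimmed_low - trimmed_high) / (gamma_mid * n)
    denominator = middle - lower
    if denominator == 0:
        return float("nan")
    return (upper - middle) / denominator


print(hogg_q(range(1, 21)))                         # symmetric sample
print(hogg_q([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))     # long right tail
```

A value above 1 points to a longer right tail and a value below 1 to a longer left tail, matching the interpretation given in the text. How $Q$ is mapped to $V_1$ or $V_2$ is left out on purpose, since the selector rule itself is not reproduced here.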
The rank vector of the full sample is independent of the order statistic for continuous distributions Randles and Wolfe [35]. However, the suggested selector $I(V_1, V_2)$ uses the order statistics of the single samples. Therefore, the independence between the selector and the test statistics is not guaranteed for either continuous or discrete distributions. Nevertheless, our simulations show that the significance level is maintained in practice.
Since $V_{\mathrm{adp}}$ is designed for continuous data, we also use $V_{\mathrm{perm}}$ for discrete data. The latter statistic requires neither the independence between the selector and the test statistic nor the continuity of the underlying distribution, see e.g. Neuhäuser and Hothorn [26].
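The idea of recomputing the selector for each permutation can be sketched generically. This is only an illustration under stated assumptions: the two statistics and the selector below are hypothetical stand-ins, not the paper's $V_1$, $V_2$ or selector rule, and the `+1` correction in the Monte Carlo p-value is a common convention chosen here rather than something taken from the text.

```python
import random
from statistics import mean, median


def adaptive_permutation_pvalue(x1, x2, stat_functions, selector,
                                n_perm=1000, seed=0):
    """Approximate permutation p-value of an adaptive statistic.

    stat_functions: dict mapping a key to a function (a, b) -> float,
                    where large values speak against the null hypothesis.
    selector:       function (a, b) -> key of the statistic to use.
    """
    rng = random.Random(seed)

    def adaptive_statistic(a, b):
        # The selector sees the data it is given, so it is re-evaluated
        # on every permuted data set.
        return stat_functions[selector(a, b)](a, b)

    observed = adaptive_statistic(x1, x2)
    pooled = list(x1) + list(x2)
    n1 = len(x1)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if adaptive_statistic(pooled[:n1], pooled[n1:]) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical placeholders for illustration only.
placeholder_statistics = {
    "lower": lambda a, b: abs(min(a) - min(b)),
    "upper": lambda a, b: abs(max(a) - max(b)),
}


def placeholder_selector(a, b):
    both_right_skewed = mean(a) > median(a) and mean(b) > median(b)
    return "upper" if both_right_skewed else "lower"


p_value = adaptive_permutation_pvalue(
    list(range(1, 11)), list(range(101, 111)),
    placeholder_statistics, placeholder_selector,
)
print(p_value)
```

Because the same selection step is applied to the observed data and to every permuted data set, the permutation reference distribution already accounts for the selection, which is why independence between selector and statistic is not needed.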

Power comparison
This section presents the performance of the proposed test statistics based on Monte-Carlo simulation. Our simulation study is performed with R (R Core Team [36]). The proposed test statistics are compared with existing competitors: the Wilcoxon rank sum test statistic W Wilcoxon [37], the Lepage test statistic LP Lepage [8], the Kolmogorov-Smirnov test statistic KS Gibbons and Chakraborti [11], the Cramér-von Mises test statistic CvM Anderson [12], the Anderson-Darling test statistic AD Pettitt [13], the Zhang test statistic ZH Zhang [15], and the modified Baumgartner test statistic MB Murakami [16]. There are many other nonparametric tests such as the Fligner-Policello test statistic FP Fligner and Policello [38] and the Brunner-Munzel test statistic BM Brunner and Munzel [39]. However, FP and BM were developed for the Behrens-Fisher problem, which is not the testing problem considered in this paper. In addition, the powers of FP and BM are similar to that of W for our settings. Therefore, we do not include FP and BM in the simulation study. According to Neuhäuser et al. [14], in five journals between 2010 and 2013, $H_0: F_1(x) = F_2(x)$ was tested in 50 papers; in all cases, KS was used. Although KS is widely used to test $H_0$, it is not a very powerful test. This is not surprising because KS is conservative for small to moderate sample sizes. Therefore, Neuhäuser et al. [14] compared the power of KS with that of ZH and MB by using normal, $\chi^2$, exponential, Laplace, Poisson, binomial and negative binomial distributions. Then, Neuhäuser et al. [14] recommended using ZH or MB instead of KS from the point of view of enhancing power and controlling the type I error rate. There are other tests, like the Cucconi test, with no test being uniformly most powerful within the nonparametric framework. We discuss the performance for continuous and discrete distributions in Section 4.1 and Section 4.2, respectively. We focus on the case of equal sample sizes for power comparisons. For the results for unequal sample sizes, see the supplemental file.

Continuous distributions
In this section, the simulated powers are obtained by 1,000,000 Monte-Carlo simulations for the continuous distributions.
Algorithm 1 Calculate the simulated power of test statistic
1: Set the number of simulation repetitions $B$
2: $X_1$ ← generate $n_1$ random numbers according to $F_1$
3: $X_2$ ← generate $n_2$ random numbers according to $F_2$
4: Calculate the test statistic from datasets $(X_1, X_2)$
5: Independently, repeat step 2 to step 4 $B$ times
6: Count the number of test statistics greater than the critical value
7: Total count is divided by $B$
In this paper, we use the following distributions to investigate the performance of various statistics.
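The steps of Algorithm 1 can be sketched as below. This is a minimal illustration, not the authors' R code: the function name `simulated_power`, the sampler helpers, the stand-in statistic (absolute difference of sample means) and the critical value 0.88 are all assumptions made for the example.

```python
import random


def simulated_power(draw1, draw2, statistic, critical_value,
                    n_rep=10000, seed=1):
    """Monte Carlo estimate of power following Algorithm 1.

    draw1, draw2: functions rng -> list of observations (steps 2 and 3).
    statistic:    function (x1, x2) -> float (step 4).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):                      # step 5: repeat B times
        x1 = draw1(rng)                         # step 2
        x2 = draw2(rng)                         # step 3
        if statistic(x1, x2) > critical_value:  # steps 4 and 6
            rejections += 1
    return rejections / n_rep                   # step 7


def normal_sampler(mu, sigma, size):
    return lambda rng: [rng.gauss(mu, sigma) for _ in range(size)]


def abs_mean_difference(x1, x2):
    # Stand-in statistic for illustration; not one of the paper's tests.
    return abs(sum(x1) / len(x1) - sum(x2) / len(x2))


# Hypothetical critical value, roughly 1.96 * sqrt(2 / 10) for this stand-in.
null_rate = simulated_power(normal_sampler(0, 1, 10), normal_sampler(0, 1, 10),
                            abs_mean_difference, 0.88, n_rep=2000)
shift_power = simulated_power(normal_sampler(0, 1, 10), normal_sampler(2, 1, 10),
                              abs_mean_difference, 0.88, n_rep=2000)
print(null_rate, shift_power)
```

Running the same routine with identical distributions in both groups estimates the empirical type I error rate, which is how the robustness tables can be read.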

Robustness
The empirical type I errors of the adaptive test $V_{\mathrm{adp}}$ are listed in Table 2, together with those of its competitors. For the case of $(n_1, n_2) = (10, 10)$, we use the exact critical values for all statistics, that is, an exact permutation test is applied. For the case of $(n_1, n_2) = (30, 30)$, we use the estimated critical values to investigate the type-I error rate. As mentioned in Section 2.2, the critical values for $V_1$ and $V_2$ are the same. Therefore, the critical value of $V_1$ ($V_2$) can be used as the critical value of $V_{\mathrm{adp}}$. Hence, in large-sample situations, $V_{\mathrm{adp}}$ has the advantage that an asymptotic test can be performed. In addition, the adaptive permutation test $V_{\mathrm{perm}}$ is investigated based on 100,000 Monte-Carlo simulations with 1,000 permutations each for $(n_1, n_2) = (10, 10)$. This setting gives a precise estimate of the type I error rate and power.
Table 2 indicates that the type-I errors of $V_1$, $V_2$, $V_{\max}$ and $V_{\mathrm{adp}}$ are around the significance level, as expected, because exact permutation tests are performed for small sample sizes. In line with the assertion of Hill [34], the adaptive test can be used for larger sample sizes, that is, $(n_1, n_2) = (30, 30)$. However, the permutation version of the adaptive test, $V_{\mathrm{perm}}$, can be used more safely for small sample sizes. In this case the actual level is closer to the nominal one for $V_{\mathrm{perm}}$ than for $V_{\mathrm{adp}}$.
We compare the powers of $V_{\max}$, $V_{\mathrm{adp}}$ and $V_{\mathrm{perm}}$ with those of existing test statistics for various scenarios in the next section.

Symmetric distribution
Since the performance of the various statistics follows a similar pattern across symmetric distributions, we use the normal distribution and the logistic distribution as representatives. For the case of $(n_1, n_2) = (10, 10)$, we use the exact critical values for all statistics, that is, an exact permutation test is applied. First, we list the simulated power for the most common distribution, the normal distribution, in Table 3.
As an example of a non-normal symmetric distribution, we use the logistic distribution and list the simulated power in Table 4. From Tables 3 and 4, it is difficult to determine the winner among W, LP, KS, CvM, AD, ZH and MB. In addition, the power of $V_{\mathrm{perm}}$ is similar to that of $V_{\mathrm{adp}}$, as expected, because the difference in type-I error between these two tests in Table 2 is small. By comparing the powers of $V_1$, $V_2$, $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) with those of existing test statistics, we summarize the results as follows:
• The power of $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is similar to the power of $V_{\max}$ in many cases. However, $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) seems to be slightly better than $V_{\max}$.
• For example, the power of $V_1$ for the case $(\mu_2, \sigma_2^2) = (-0.50, 0.5^2)$ is similar to that of $V_2$ for the case $(\mu_2, \sigma_2^2) = (1.50, 0.5^2)$, which confirms the known result that $V_1$ and $V_2$ are not location invariant for the normal distribution. We have the same result for the logistic distribution. However, the power of $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is similar to the higher of the powers of $V_1$ and $V_2$.
• $V_1$ is better than $V_2$ if the sample with the smaller location has the smaller variance or the sample with the larger location has the larger variance. On the other hand, $V_2$ is better than $V_1$ if the sample with the smaller location has the larger variance or the sample with the larger location has the smaller variance. The power of $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is between that of $V_1$ and $V_2$ in most cases. Therefore, $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is more suitable than $V_1$ and $V_2$.
• The power of $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is similar to the maximum power of {W, LP, KS, CvM, AD, ZH, MB} in most cases.
Thus, we conclude that the powers of $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) are stable compared with those of various existing test statistics for normal and logistic distributions.

Asymmetric distribution
In this section, we investigate the power of various test statistics for asymmetric distributions. As examples of asymmetric distributions, we choose the $\chi^2$ distribution and the skew normal distribution. In Table 5, we show the results for the $\chi^2$ distribution.
From Table 5, $V_{\mathrm{adp}}$ has the highest power among $V_{\max}$, $V_{\mathrm{adp}}$ and $V_{\mathrm{perm}}$, though the differences are marginal. Additionally, the statistics AD, ZH, MB, $V_{\mathrm{adp}}$ and $V_{\mathrm{perm}}$ are more suitable than the other test statistics. For other cases, we summarize the results as follows:
• In more than 60% of the cases, the power of $V_{\mathrm{adp}}$ is greater than or similar to the maximum power of {AD, ZH, MB}.
• Even where the powers of {AD, ZH, MB} are higher than that of $V_{\mathrm{adp}}$, the power of $V_{\mathrm{adp}}$ is similar to the maximum power of {W, LP, KS, CvM}.
Thus, $V_{\mathrm{adp}}$ is more suitable than various existing test statistics for the $\chi^2$ distribution.
In addition, we use skew normal distributions as another example of an asymmetric distribution. The simulated powers are listed in Tables 6 and 7.
The power of $V_{\mathrm{perm}}$ is similar to that of $V_{\mathrm{adp}}$, as expected, because the differences in type-I error between these two tests in Table 2 are small. Tables 6 and 7 indicate that $V_{\max}$ is superior to $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) in two cases. On the other hand, $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is more suitable than $V_{\max}$ in two cases. For the other cases, the difference in power between $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is less than 5%. Therefore, the powers of these statistics are similar, but $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) is slightly better than $V_{\max}$. Comparing the powers of the proposed statistics with those of existing test statistics, $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) are more suitable than KS, CvM, AD, ZH and MB. In addition, the power of LP is higher than that of the other test statistics for the cases of $\mu_2 < 0$ in Table 6 and $\mu_2 > 0$ in Table 7. However, for the other cases, $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) are more efficient than LP. Therefore, $V_{\max}$ and $V_{\mathrm{perm}}$ ($V_{\mathrm{adp}}$) are more useful than various existing test statistics.
From Tables 6 and 7, interestingly, although the hypothesis concerns the general alternative, LP can be superior to the other statistics. Similar results have already been reported by Marozzi [40]. Finding the reason for this result remains future work.

Different distributions
In this section, we investigate the power performance of various test statistics when the two samples come from different distributions. We focus on the normal versus skew normal and the uniform versus beta cases. First, we use normal and skew normal distributions in Table 8.

Discrete distribution
In this section, we relax the assumption of a continuous distribution. The simulated powers are obtained by 10,000 Monte-Carlo simulations with 1,000 permutations each for the discrete distributions. In this paper, we use the following distributions to investigate the performance of various statistics.
• $RN(\mu, \sigma^2)$: rounded normal distribution with mean $\mu$ and variance $\sigma^2$. In this paper, we round to the second decimal place.
Note that $Q(Z)$ becomes not a number in very rare cases, for example if $Z = (1, 1, 1, 1, 1)$. In such cases, we randomly select $V_1$ or $V_2$ in this paper.
Thus, we use the selecting rule for discrete distributions as follows: otherwise ⇒ randomly select $V_1$ or $V_2$.
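The rounded normal data and the random fall-back can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the rule applied to a finite $Q$ (a plain threshold at 1, sending right-skewed data to $V_2$) is only a placeholder that follows the direction described in the text; the paper's actual thresholds are not reproduced here.

```python
import math
import random


def rounded_normal(rng, mu, sigma, size, digits=2):
    """Draw from RN(mu, sigma^2): normal values rounded to `digits` decimals."""
    return [round(rng.gauss(mu, sigma), digits) for _ in range(size)]


def choose_statistic(q_value, rng, rule):
    """Apply `rule` to a finite Q; pick V1 or V2 at random when Q is NaN."""
    if math.isnan(q_value):
        return rng.choice(["V1", "V2"])
    return rule(q_value)


def placeholder_rule(q_value):
    # Hypothetical threshold, for illustration only.
    return "V2" if q_value > 1 else "V1"


rng = random.Random(2024)
sample = rounded_normal(rng, mu=0.0, sigma=1.0, size=10)
print(sample)
print(choose_statistic(float("nan"), rng, placeholder_rule))
print(choose_statistic(2.0, rng, placeholder_rule))
```

Rounding introduces ties, which is why the permutation version of the adaptive test is the natural companion for data of this kind.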

Robustness
As mentioned above, the independence between the selector and the test statistics is not guaranteed for discrete distributions. This is a problem for adaptive tests in the case of discrete distributions. Therefore, we use $V_{\mathrm{perm}}$ instead of $V_{\mathrm{adp}}$ in this case. Then, neither the independence between the selector and the test statistic nor the continuity of the underlying distribution is required, see e.g. Neuhäuser and Hothorn [26]. We investigate the robustness of $V_{\mathrm{perm}}$ for discrete distributions and list the results in Table 10. Table 10 indicates that the type-I errors of $V_1$, $V_2$, $V_{\max}$ and $V_{\mathrm{perm}}$ are around the significance level. For the geometric and binomial distributions, the tests $V_1$, $V_2$, $V_{\max}$ and $V_{\mathrm{perm}}$ are slightly conservative, with type-I error rates of about 0.045. Therefore, we compare the powers of $V_{\max}$ and $V_{\mathrm{perm}}$ with those of existing test statistics for various scenarios in the next section.

Same type of distributions
For power comparisons with discrete distributions, we select the Poisson distribution and the negative binomial distribution. Since different statistics are the best for different sample sizes and alternative configurations, we only briefly report the results of the simulation studies for discrete distributions. The simulated powers for the Poisson and negative binomial distributions are listed in Tables 11 and 12, respectively. From Tables 11 and 12, AD, ZH and MB have high power compared with the other test statistics, but the powers of $V_{\max}$ and $V_{\mathrm{perm}}$ are similar to those of these test statistics in some cases. In addition, $V_{\max}$ and $V_{\mathrm{perm}}$ are more efficient than KS and LP, and have stable power for various alternative configurations.

Different distributions
Finally, we investigate the power of various test statistics when the two samples come from different discrete distributions. In this paper, we use Poisson and negative binomial distributions in Table 13.
From Table 13, a comparison among the existing test statistics shows that the power of AD is superior to that of the other existing test statistics in all cases. However, the power of $V_{\mathrm{perm}}$ is similar to that of AD for equal sample sizes. In our simulation settings, the results indicate that $V_{\mathrm{perm}}$ is more powerful than LP. Therefore, $V_{\mathrm{perm}}$ has stable, high power compared with various existing test statistics in many cases, except for AD.

Rank sums of test statistics
In this paper, the powers of various test statistics are compared for continuous and discrete distributions via simulation. We compute the ranks of the test statistics according to the simulated power. For each alternative configuration, the test statistic with the lowest power is assigned rank one and the one with the highest power is assigned rank eight. Thus, the test statistics with the largest rank sums are the best.
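The ranking scheme just described can be sketched as below. This is a minimal illustration: the function name `rank_sums` and the toy power values are made up for the example, and the use of mid-ranks for tied powers is an assumption, since the text does not say how ties are handled.

```python
def rank_sums(power_table):
    """Sum, over configurations, of each test's rank by simulated power.

    power_table: dict mapping a test name to a list of powers,
                 one entry per alternative configuration.
    Rank 1 is the lowest power; tied powers share the mid-rank.
    """
    names = list(power_table)
    n_config = len(next(iter(power_table.values())))
    totals = {name: 0.0 for name in names}
    for j in range(n_config):
        column = [power_table[name][j] for name in names]
        for name in names:
            power = power_table[name][j]
            below = sum(1 for other in column if other < power)
            equal = sum(1 for other in column if other == power)  # includes itself
            totals[name] += below + (equal + 1) / 2
    return totals


# Toy powers for three hypothetical tests over two configurations.
toy_table = {"A": [0.5, 0.7], "B": [0.6, 0.7], "C": [0.4, 0.9]}
print(rank_sums(toy_table))
```

With eight tests, each configuration contributes ranks from one to eight, so a larger rank sum indicates power that is consistently high across configurations.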
In Table 14, we list the rank sums of the test statistics over all considered alternatives. Table 14 indicates that the rank sum of the proposed test statistic $V_{\mathrm{adp}}$ is greater than 70 for all continuous distributions. In a few cases, the rank sums of other test statistics (ZH and AD in the case of N-SN and U-Beta) are higher than that of $V_{\mathrm{adp}}$. However, the rank sums of ZH and AD are less than 70 for the other distributions. For discrete distributions, the rank sum of $V_{\mathrm{perm}}$ is similar to that of AD. Therefore, we may state that $V_{\mathrm{adp}}$ ($V_{\mathrm{perm}}$) has stable, high power for various distributions compared with the other statistics.
the blood may be a sign of liver disease or damage to the bile ducts. We are interested in testing whether the GGT level of patients with hepatitis is the same as that of patients with cirrhosis. From Table 16, although $V_1$ cannot reject the null hypothesis at the 10% level, $V_2$ is significant, with the smallest p-value of all considered tests. The test $V_{\mathrm{perm}}$ has the second smallest p-value, which is only slightly larger. Considering the problem of choosing between $V_1$ and $V_2$, this example reveals that $V_{\mathrm{perm}}$ is useful for real data analysis.

Concluding remarks
In this study, we considered two-sample statistics based on the goodness-of-fit test statistic introduced by Ebner et al. [27]. Sinclair et al. [31] proposed a modified Anderson-Darling test statistic with a weight function to increase the power for right-skewed distributions. We applied the idea of Sinclair et al. [31] to the test statistic of Ebner et al. [27]. As expected, $V_2$ was suitable for right-skewed distributions in our simulation studies. Like the test statistic of Sinclair et al. [31], the weighting of $V_2$ emphasizes the upper tail of the distribution. We then proposed the maximum and adaptive test statistics. Simulation studies showed that both the maximum test and the adaptive test control the type I error rate. However, the permutation version of the adaptive test, $V_{\mathrm{perm}}$, can be used more safely for small sample sizes. We also showed that our proposed tests are more powerful than existing tests in various settings. The rank sums also indicated that the proposed test statistics were more suitable than other test statistics for various distributions. For the moderate sample sizes $n_1 = n_2 = 30$, we confirmed that the pattern of the simulation results was similar to that for small sample sizes for various distributions; to save space, we only remark on these results here. Two practical examples illustrated that $V_{\max}$, $V_{\mathrm{adp}}$ and $V_{\mathrm{perm}}$ are useful even when there is no valid information about whether $V_1$ or $V_2$ should be used. Therefore, we recommend using the $V_{\max}$, $V_{\mathrm{adp}}$ and $V_{\mathrm{perm}}$ test statistics to test the equality of two distributions. Recently, Mukherjee et al. [41] extended the test statistic of Kössler and Mukherjee [7] to multi-sample testing. Extending our proposed test statistic to multi-sample testing problems in the light of Mukherjee et al. [41] is therefore a possible direction for further research. Another promising direction is the extension to multivariate comparison studies, which are increasingly encountered in many fields such as genetics and metabolomics. Guidelines for this extension are given in Pesarin [42] and Marozzi et al. [43].

Table 2.
Type-I error of test statistics for various distributions with 5% significance level.

Table 10.
Type-I error of test statistics for various distributions with 5% significance level.

Table 15.
Cholinesterase level of patients of hepatitis and fibrosis and p-values.

Table 16.
Gamma-glutamyl transferase level of patient of hepatitis and fibrosis.