A simulation study of a class of nonparametric test statistics: a close look at empirical distribution function-based tests

Abstract The Kolmogorov–Smirnov (KS) statistic is a nonparametric statistic based on the empirical distribution function (EDF). For the one-sample case, it uses the supremum distance between an EDF and a pre-specified cumulative distribution function (CDF). For the two-sample case, it measures the maximum distance between two EDFs. The KS test, together with other EDF-based tests such as the Anderson-Darling (AD) test and the Cramer-von Mises (CvM) test, has been widely used in statistical analysis. To compare the performance of these test statistics, we conducted a simulation study of the type I error and power of the KS test, the CvM test, the AD test, and the Chi-squared test. The study covers both one-sample and two-sample tests, for both independent and correlated samples. Our results show that, without prior information about the tested distributions, the EDF-based tests are preferable. However, when prior information indicates that the tested distributions are bell-shaped and the expected differences are in variance/sparseness, the Chi-squared test may be preferable. When correlation exists between the tested samples, adjusting the informative sample size is important and required.


Introduction
The Kolmogorov-Smirnov (KS) test has been widely adopted to evaluate whether a sample comes from a specified theoretical population distribution, or whether two samples come from the same population distribution. Together with other goodness-of-fit tests, such as the Chi-squared test and the Cramer-von Mises (CvM) test, researchers have a considerable library of tests to choose from. Parametric tests, for example the t-test, are widely used across many areas, but a key assumption of the t-test is that the samples are drawn from normal distributions. In clinical trials, however, samples are sometimes not from normally distributed variables, e.g., triglycerides (Miller et al. 2011). The tests studied in our paper fall in the category of "distribution-free methods" and do not require normality of the tested samples. When the distribution of the tested samples is unknown or hard to assume, these distribution-free methods are useful tools for statistical inference. Although it is helpful to have a variety of methods available for different problems, it can be hard to decide which method to apply. To address these issues, we review the performance of the KS test, the CvM test, the Anderson-Darling (AD) test, and the Chi-squared test, covering both one-sample and two-sample settings. One purpose of the current simulation study was to select a proper test statistic for assessing properties of positron emission tomography (PET) scans from cardiac patients.
In 1933, Kolmogorov published a short but landmark paper in the Italian journal Giornale dell'Istituto Italiano degli Attuari, in which he formally defined the empirical distribution function (EDF) (Kolmogorov 1933).
To define the empirical distribution function, let $x_1, x_2, \dots, x_{n-1}, x_n$ be realizations of a random variable $X$ with CDF $F(x) = \Pr(X \le x)$. Let $I(x_i \le x)$ equal 1 if $x_i \le x$ and 0 otherwise. Then the EDF of $X$ is defined as
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x).$$
It is easily seen that the EDF $F_n(x)$ is the proportion of $x_1, x_2, \dots, x_{n-1}, x_n$ less than or equal to $x$. Kolmogorov showed that the EDF $F_n(x)$ converges to the cumulative distribution function (CDF) $F(x)$ (Kolmogorov 1933). This led to the definition of the Kolmogorov statistic (or Kolmogorov-Smirnov statistic) $D$. For a finite sample of size $n$, the statistic is defined as
$$D = \sup_x |F_n(x) - F(x)|.$$
Smirnov (1936) worked out the two-sample version of the KS statistic, defined as
$$D_{n,m} = \sup_x |F_n(x) - G_m(x)|,$$
where $F_n$ and $G_m$ are the EDFs of the two samples. Later, Smirnov studied the Cramer-von Mises statistic (CvM statistic) $\omega^2$, which can be viewed as an extension of the KS statistic derived from Cramer's work in 1928 and von Mises's work in 1931 (Mises 1928; Smirnov 1939; Simpson 1951). Smirnov also found the asymptotic distribution of $\omega^2$, in the form of a sum of weighted Chi-squared variables.
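As a quick illustration (not part of the original simulation code), the one-sample statistic $D$ can be computed directly from the order statistics and checked against R's built-in ks.test(); the N(0, 1) null below is chosen only for the example.

```r
# Minimal sketch: one-sample KS statistic D = sup_x |F_n(x) - F(x)|,
# evaluated at the jumps of the EDF and checked against ks.test().
set.seed(1)
n  <- 50
x  <- sort(rnorm(n))               # sample generated under the null F = N(0, 1)
F0 <- pnorm(x)                     # hypothesized CDF at the ordered sample
# The supremum is attained just after or just before an observation:
D  <- max(pmax((1:n) / n - F0, F0 - (0:(n - 1)) / n))
D
ks.test(x, "pnorm")$statistic      # agrees with the hand-computed D
```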
Choulakian, Lockhart, and Stephens (1994) extended the CvM statistic to discrete distributions and to continuous distributions that have been grouped. Consider $x^*_1, \dots, x^*_L$ as the $L$ ordered distinct values (categories) taken by the sample of $X$.
The one-sample discrete CvM statistic is then
$$W^2 = \frac{1}{N}\sum_{i=1}^{L} Z_i^2\, p_i, \qquad Z_i = \sum_{j=1}^{i}\left(o_j - N p_j\right),$$
where $N$ is the total number of observations, $p_i$ is the probability of falling into category $i$, and $o_i$ is the number of observations in category $i$.
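A small sketch of this computation, assuming a vector of observed counts o and null cell probabilities p (the values below are illustrative only, following the form in Choulakian, Lockhart, and Stephens 1994):

```r
# Sketch: one-sample discrete CvM statistic W^2 for grouped/discrete data.
discrete_cvm <- function(o, p) {
  N <- sum(o)                      # total number of observations
  Z <- cumsum(o - N * p)           # cumulative observed-minus-expected counts
  sum(Z^2 * p) / N                 # probability-weighted sum over categories
}
# Example: 5 categories with uniform null probabilities
o <- c(12, 18, 25, 22, 23)
p <- rep(0.2, 5)
discrete_cvm(o, p)
```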
Researchers further extended the two-sample discrete CvM to a k-sample CvM for discrete distributions or grouped continuous distributions (Brown 1982, 1994; Lockhart, Spinelli, and Stephens 2007). Consider the ordered observations $Z^*_1, \dots, Z^*_L$ as the $L$ distinct values of the pooled sample of $X$ and $Y$. Let $S_{1j}$ be the number of observations in $X$ not greater than $Z^*_j$, $S_{2j}$ the number of observations in $Y$ not greater than $Z^*_j$, and $(n+m)p_j$ the number of observations in the pooled sample coinciding with $Z^*_j$. The two-sample CvM statistic for discrete distributions is then
$$W^2_{n,m} = \frac{nm}{n+m}\sum_{j=1}^{L}\left(\frac{S_{1j}}{n}-\frac{S_{2j}}{m}\right)^2 p_j.$$
By modifying the weight factor of the CvM statistic, T. W. Anderson and D. A. Darling proposed the Anderson-Darling statistic (AD statistic) $A^2$ (Anderson and Darling 1952).
The AD statistic in the discrete setting is defined analogously, with $f_{1j}$ the number of observations in $X$ coinciding with $Z^*_j$ and $f_{2j}$ the number of observations in $Y$ coinciding with $Z^*_j$; the CvM weight is modified by the pooled EDF at $Z^*_j$, so that deviations in the tails of the pooled distribution receive more weight (Anderson and Darling 1952). The Chi-squared test statistic is defined as
$$\chi^2 = \sum_{i=1}^{k}\frac{(o_i - e_i)^2}{e_i},$$
where $k$ is the number of categories and $o_i$ and $e_i$ are the observed and expected counts in category $i$, respectively.
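For completeness, a minimal illustration of the Chi-squared statistic against a specified null (the counts are illustrative only); base R's chisq.test() returns the same value:

```r
# Sketch: Pearson Chi-squared statistic for observed counts o against
# expected counts e = N * p under the null cell probabilities p.
o <- c(30, 42, 55, 48, 25)
p <- rep(0.2, 5)
e <- sum(o) * p
chisq_stat <- sum((o - e)^2 / e)
chisq_stat
chisq.test(o, p = p)$statistic     # matches the hand-computed value
```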
Traditionally, these tests are based on the assumption that the samples are independent. In real life, however, there are many cases in which dependence exists. For example, spatial correlation is well documented in geology (Fotheringham 2009): rainfall measurements collected at nearby locations tend to be closer in value than those collected at locations farther apart. Similar spatial correlation also exists in PET scans. Our simulation evaluated how these tests are biased in the presence of such correlation.

Methods
The tests mentioned in the Introduction fall in the category of "distribution-free methods", meaning their statistical properties remain the same under a large class of distributions. To evaluate the performance of the KS test, the CvM test, the AD test, and the Chi-squared test, we used various hypothesis testing settings under different sample sizes at a nominal level of 0.05. To study the properties of these tests in the presence of dependence, we also generated correlated two-sample data with spatial autocorrelation.

Simulation
In our study, we used R to simulate several commonly encountered distribution scenarios. The Weibull distribution W(c, k), with shape parameter c and scale parameter k, is commonly seen in survival analysis, engineering, and geology. Depending on the value of the shape parameter c, the Weibull distribution can model failure-time data with either an increasing or a decreasing hazard rate and therefore offers tremendous flexibility to researchers. The shape parameter c and scale parameter k also allow us to control the skewness of the simulated distributions.
The normal distribution N(μ, σ²) is the most widely used distribution in most areas of statistics. Its hazard function increases with time without an upper bound. Another source of the normal distribution's versatility is the central limit theorem.
The multinomial distribution Mult(n, p) is commonly used to model categorical phenomena. For n independent trials, each of which results in exactly one of k categories with fixed probabilities p_1, ..., p_k, the multinomial distribution models the probability of any particular combination of counts (x_1, x_2, ..., x_k) across the categories.
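The samples themselves can be drawn with base R; the parameter values below are placeholders for illustration, not the exact settings of every simulation scenario.

```r
# Sketch: drawing samples from the three distribution families used in the study.
set.seed(2023)
n <- 100
w <- rweibull(n, shape = 1, scale = 2)            # Weibull W(c = 1, k = 2)
z <- rnorm(n, mean = 0, sd = 2)                   # Normal N(0, sigma^2 = 4)
m <- rmultinom(1, size = n, prob = rep(0.2, 5))   # Multinomial counts, 5 categories
```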
In the one-sample cases, we tested whether the distribution function F(x), from which the simulated realizations X were drawn, equals a pre-specified distribution function G(x). In the two-sample cases, we simulated realizations X and Y from distribution functions F(x) and G(x), respectively, and tested the null hypothesis that F(x) = G(x) for all x.
Meanwhile, σ controls the spread of the probability curve of normally distributed data, while the mean parameter μ shifts the entire curve without changing its shape. Changes in σ and μ therefore allow us to test the performance of the tests under shape differences, location differences, or both.
Lastly, the multinomial group allowed us to evaluate the performance of the KS, CvM, and AD tests when data are discrete. When certain parameters of the distribution are not available and must be estimated from the sample, results from the KS test will be conservative. Methods have been proposed to extend the EDF tests to discrete data (Mises 1931; Lilliefors 1967; Crutcher 1975). We therefore simulated data from multinomial distributions under various conditions.

Correlated realizations
An important and easily overlooked assumption of KS-type statistics is independence across random variables. The original form of the Kolmogorov statistic assumes independent, identically distributed random variables. In reality, however, the assumption of independence may not be met. Studies suggest, both theoretically and empirically, that using the KS statistic without adjusting for correlation between subjects leads to an inflated proportion of Type I errors in the presence of positive correlation, and a deflated proportion of Type I errors when subjects are negatively correlated (Clifford, Richardson, and Hémon 1989; Dutilleul et al. 1993). Our simulation studies the effects of correlation among random variables on the EDF-based tests.
To simulate correlated samples, we applied the copula method (Joe 1997). For ease of computation and estimation, we chose a Gaussian copula for its relatively high accuracy. The procedure for simulating bivariate correlated Weibull samples via the Gaussian copula is as follows.
(1) First, choose a covariance matrix Σ that reflects the correlation structure of the targeted samples. Based on this covariance structure, draw correlated samples $X_1 = (x_{1,1}, x_{1,2}, \dots, x_{1,n})$ and $X_2 = (x_{2,1}, x_{2,2}, \dots, x_{2,m})$ from the standard bivariate Gaussian distribution, so that $(X_1, X_2) \sim N(0, \Sigma)$. (2) Transform $X_1$ and $X_2$ to the uniform scale through the standard normal CDF, giving $\Phi(X_1)$ and $\Phi(X_2)$. (3) To obtain correlated samples $Z_1 = (z_{1,1}, \dots, z_{1,n})$ and $Z_2 = (z_{2,1}, \dots, z_{2,m})$ from the targeted distribution, find the targeted inverse CDF $F^{-1}$. (4) Compute $Z_1 = F^{-1}(\Phi(X_1))$ and $Z_2 = F^{-1}(\Phi(X_2))$ to obtain the correlated samples of interest; an R sketch of this procedure is given below. There are several choices for the correlation matrix used to simulate the bivariate Gaussian distribution. Rank correlation coefficients, such as Kendall's τ and Spearman's ρ, are usually preferred because they are invariant to strictly increasing transformations (Ding and Li 2013). The linear correlation coefficient, on the other hand, is not invariant to non-linear transformations, but it can be applied directly when simulating the normal distribution in the first step, and the direction of the correlation between samples is preserved. Dithinde et al. (2011) used a translation-based lognormal model with Pearson's r to capture the correlation structure between two hyperbolic curve-fitting parameters and obtained reasonably good results. Genest reported that simulation with Pearson's r measuring the correlation structure performs reasonably well when the simulated sample size n is 50 or larger (Genest and Rivest 1993). We used Pearson's r to simulate the bivariate normal distribution.
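The following is a minimal sketch of the copula procedure for bivariate correlated Weibull samples; the correlation value, the Weibull parameters, and the use of MASS::mvrnorm are illustrative choices, not necessarily the exact implementation used in the study.

```r
# Sketch: Gaussian-copula generation of correlated Weibull samples.
library(MASS)

set.seed(42)
n     <- 100
rho   <- 0.8                                  # target correlation on the Gaussian scale
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)  # correlation matrix of the bivariate Gaussian

# (1) Draw correlated standard bivariate Gaussian samples
X <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)

# (2) Map to the uniform scale with the standard normal CDF
U <- pnorm(X)

# (3)-(4) Apply the target inverse CDF (here Weibull with shape c = 1, scale k = 2)
Z1 <- qweibull(U[, 1], shape = 1, scale = 2)
Z2 <- qweibull(U[, 2], shape = 1, scale = 2)

cor(Z1, Z2)   # correlation is approximately preserved after the transformation
```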
The performance of the EDF-based tests and the Chi-squared test was evaluated through the simulated type I error and power. To evaluate the effect of sample size on type I error and power, we simulated samples of size n = (10, 20, 30, 100, 500). Type I error and power were estimated from 10,000 repeated iterations.
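As an illustration of how the empirical type I error is obtained, the sketch below repeats a two-sample KS test on independent samples drawn under the null; the sample size, distribution, and number of iterations are reduced placeholders, and only the KS test is shown.

```r
# Sketch: Monte Carlo estimate of the type I error of the two-sample KS test.
set.seed(7)
n_iter <- 2000           # reduced from 10,000 for illustration
n      <- 30
alpha  <- 0.05

reject <- replicate(n_iter, {
  x <- rweibull(n, shape = 1, scale = 2)   # both samples drawn under H0
  y <- rweibull(n, shape = 1, scale = 2)
  suppressWarnings(ks.test(x, y)$p.value) < alpha
})
mean(reject)             # empirical type I error, should be close to 0.05
```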

Comparison of one-sample tests
For continuous distributions, such as the normal and Weibull distributions, we found that the EDF-type tests achieved the nominal type I error; that is, the empirical values were reasonably close to the nominal level even when the sample size was relatively small (n = 10). When the sample size n ≥ 30, all tests had a type I error around the nominal level of 0.05 (results are in the supplemental material).
From Table 1, we can see that when the data are multinomially distributed, the KS test, as Conover mentioned in his paper, is more accurate when the sample size is less than 30 (Conover 1972). On the other hand, when the sample size n > 30, the modified KS test produced a conservative type I error. In addition, we found that Conover's KS test performs better when the discrete distribution is symmetric and has heavy tails, and is more conservative when the data are skewed. Moreover, the EDF-based tests are heavily influenced by the number of groups; they seem to perform better for a multinomial distribution with 5 groups than for one with 2 groups. As the Chi-squared test is designed for discrete samples, it is the most stable of the four tests, and it tends to be more accurate when the distribution is symmetric and has more groups. The influence of symmetry and of the number of groups was no longer observed once the sample size exceeded 100.
Comparison of two-sample tests
From Table 2, we can see that when data are normally distributed, the KS test and the Chi-squared test were conservative when the sample size was small, say n ≤ 100. When the sample size was large, n = 500, the KS, AD, and Chi-squared tests all had controlled type I error, whereas the CvM test remained slightly conservative.
When the simulated data were from the Weibull distribution, the results in Table 2 are similar to those for the normal distributions. It is noticeable, however, that the Chi-squared test was conservative when the Weibull shape parameter was 1 (heavily skewed), and it remained very conservative even when the sample size reached 500. In the multinomial case C4, the modified AD test had the best-controlled type I error.

Correlated samples
From the results in Table 3, we can see that, for the normal and Weibull distributions, when X and Y were sampled from correlated distributions, none of the tests achieved the nominal type I error. When the correlation between the tested samples is positive, the type I error is underestimated; when the correlation is negative, we are more likely to obtain a liberal type I error (Cribbie and Keselman 2003). When Pearson's r ≥ 0.5, the EDF-based tests had a type I error of almost 0, whereas the Chi-squared test had non-zero type I error rates. When Pearson's r = −0.8, the type I error could be double the nominal level for all tests except the Chi-squared test.

Comparison of one-sample tests
Results for normal distributions under the alternative of equal means and different variances are listed in Table 4. When the sample size is relatively small, n = 10, the Chi-squared test is the most powerful, with power substantially higher than that of the EDF-based tests. For larger sample sizes, 20 < n < 100, the Chi-squared test is still the most powerful when the change ratio in variance is below 50%, while the AD test becomes more powerful when the change ratio in variance is larger than 100%.
Empirical power for Weibull distributions is shown in Table 5. When the alternative is a scale difference, the EDF-based tests were more powerful than the Chi-squared test even at a small sample size, n = 10. Among the EDF tests, the CvM and AD tests have almost identical power under various alternatives, and the KS test has slightly lower but comparable power. When the sample size is relatively large, however, the gap between the AD, CvM, and KS tests widens, with the ordering AD test > CvM test > KS test. When the alternative is a shape difference, the AD test is again the most powerful in detecting the difference, similar to the scale-difference case. However, we found that the KS and CvM tests are not always better than the Chi-squared test.
From the simulation results for the multinomial cases in Table 6, we see that the EDF-based tests have higher power when the sample distribution is not symmetric. When the multinomial distribution has more than 5 categories, the EDF-based tests achieved comparable or higher power than the Chi-squared test. However, when the multinomial distribution is bell-shaped, the Chi-squared test is the most powerful.

Comparison of two-sample tests
From Figure 1, we find that when the alternative was a location (μ) shift, the EDF-based tests were more powerful than the Chi-squared test. Similar to the previous power analysis on the variance difference, when the assumption of independence among samples was violated, the power of the four tests was relatively lower under positive correlation and relatively higher when the samples were negatively correlated.
Figure 1 (caption): Power analysis for two-sample tests on normal distributions under a location shift, N(μ1 = 1, σ² = 4) versus N(μ2, σ² = 4) with μ2 = μ1(1 + Δ); panels show the independent, r = 0.8, and r = −0.8 cases, each at sample sizes N = 10 and N = 100.
The results in Figure 2 showed that, for the normal distribution, the two-sample tests have almost identical power to the one-sample conditions. When the alternative is a difference in dispersion (σ), the Chi-squared test is the most powerful, although under the two-sample condition the AD test still discriminates among alternatives at an acceptable rate. When the assumption of independence between samples is violated with r = 0.8, the four tests achieved relatively lower power than in the independent case; with r = −0.8, the four tests were relatively more powerful in discriminating among alternatives.
Figure 2 (caption): Power analysis for two-sample tests on normal distributions under a dispersion difference, N(0, σ1² = 4) versus N(0, σ2²) with σ2 = σ1(1 + Δ); panels show the independent, r = 0.8, and r = −0.8 cases, each at sample sizes N = 10 and N = 100.
Figure 3 showed that when the tested samples were from Weibull distributions, the EDF tests were more powerful than the Chi-squared test when the tested distributions were markedly different. Under the alternative that X and Y are sampled from Weibull distributions with identical scale parameter k but different shape parameters c1 and c2, the CvM, KS, and Chi-squared tests were almost equally powerful when the change ratio was less than 50%. When the change ratio in the shape parameter exceeded 50%, however, the EDF-based tests were much more powerful.
Figure 3 (caption): Power analysis for two-sample tests on Weibull distributions under a shape difference, W(c1 = 1, k = 2) versus W(c2, k = 2) with c2 = c1(1 + Δ); panels show the independent, r = 0.8, and r = −0.8 cases, each at sample sizes N = 10 and N = 100.
Figure 4 showed results for Weibull distributions with identical shape parameter c but different scale parameters k; in general, the EDF-based tests were more powerful than the Chi-squared test. It is worth noting that when the independence assumption was violated, positive correlation led to a conservative probability of rejecting the null hypothesis when the difference between the tested populations was small, while the rejection probability increased sharply as the difference grew.
Figure 4 (caption): Power analysis for two-sample tests on Weibull distributions under a scale difference, W(c = 1, k1 = 2) versus W(c = 1, k2) with k2 = k1(1 + Δ); panels show the independent, r = 0.8, and r = −0.8 cases, each at sample sizes N = 10 and N = 100.
Interesting results were found in the power plots for multinomial distributions in Figure 5. When the number of groups in the multinomial distribution is small or when the distributions are skewed, the EDF-based tests were more powerful than the Chi-squared test. When the multinomial distributions are symmetric and the sample size is larger than 30, the Chi-squared test has the highest power. As the number of groups in the multinomial distribution increases, the KS, CvM, AD, and Chi-squared tests all become more powerful. Interestingly, the more skewed the multinomial distributions are, the more powerful all four tests become.

Discussion and concluding remarks
Compared with the Chi-squared test, the EDF-based tests have a steeper discrimination curve; in other words, the EDF-based tests may not be as powerful against minor differences between the tested populations, but they are very powerful against larger differences. The simulation results also show that the AD test has the most satisfactorily controlled type I error and power across sample sizes ranging from small to large and across multiple distributions.
Symmetry and the bell shape of the distribution are critical for the Chi-squared test. We noticed a considerable decline in the accuracy of the Chi-squared test when the tested distributions came from an asymmetric distribution family. The EDF-based tests, on the other hand, were consistent across distributions.
When correlation exists between the tested samples, none of the tests is a suitable choice. The KS test in its original form, the CvM test, the AD test, and the Chi-squared test all have a conservative type I error when the correlation is positive and a liberal type I error when the correlation is negative, and the degree of conservativeness or liberality increases with the strength of the correlation. Notably, the Chi-squared test was less vulnerable than the EDF-based tests to violation of the independence assumption; in other words, the Chi-squared test loses less power when correlation exists among the tested samples. The informative sample size decreases when positive correlation exists; in the extreme case in which all samples are perfectly correlated, they carry the same information and the informative sample size is effectively one (Griffith 2005). Similarly, when negative correlation exists, the informative sample size may be larger than the actual sample size. For the above methods, the degrees of freedom play an important role in interpreting the test statistic and calculating the p-value. When the tested samples are positively correlated, as with pixels in PET scans, conservative results may be observed if the informative sample size is not adjusted.
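As one illustration of such an adjustment (not the specific correction used in this paper), common first-order approximations for exchangeable or AR(1)-type positive dependence shrink the nominal sample size toward an effective, or informative, sample size:

```r
# Sketch: effective ("informative") sample size under positive correlation.
# Two common first-order approximations; rho is the pairwise correlation.
ess_equicorrelated <- function(n, rho) n / (1 + (n - 1) * rho)    # exchangeable correlation
ess_ar1            <- function(n, rho) n * (1 - rho) / (1 + rho)  # AR(1)-type serial correlation

ess_equicorrelated(100, 0.8)   # ~1.25: nearly all information is shared
ess_ar1(100, 0.8)              # ~11.1: heavy but less extreme reduction
```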
We conclude that, when we have no prior information about the distributions to be tested, the EDF-based tests are the better choice. However, when prior information indicates that the tested distribution is bell-shaped and the expected differences are in variance/sparseness, the Chi-squared test may be preferable. When correlation exists between samples, the type I error and power of all the above tests are biased, and adjustment of the informative sample size is important and required in this situation.
Our simulation results for the one-sample KS test with discrete distributions are based on Conover's method (Conover 1972). Conover noted in his paper that his discrete KS test is inaccurate when the sample size n is larger than 30. In the two-sample KS simulations, we applied the standard KS test, which is known to be conservative when the tested distribution is discontinuous.
The Chi-squared test has relatively better power for continuous distributions when an optimal grouping algorithm is applied (D'Agostino 1986). However, our simulation results showed that the EDF-based tests, the KS, CvM, and AD, were generally more powerful and more robust than the Chi-squared test; only under specific conditions, when the difference lies solely in the variation and the distribution is bell-shaped, is the Chi-squared test preferred. Among the EDF-based tests, the CvM and AD tests outperformed the KS test in most cases because they accumulate the differences across the whole distribution, whereas the KS test uses only the supremum of the difference between the distribution functions as its test statistic. When the data are discrete, the EDF-based tests may still be applied because of their higher power. When the tested samples are correlated, the tests are inaccurate and adjustments accounting for this effect are necessary.

Funding
This work was partially supported by the Weatherhead Foundation.