Simultaneous tests for homogeneity of two zero-inflated (beta) populations

ABSTRACT Motivated by an example in marine science, we use Fisher’s method to combine independent likelihood ratio tests (LRTs) and asymptotic independent score tests to assess the equivalence of two zero-inflated Beta populations (mixture distributions with three parameters). For each test, test statistics for the three individual parameters are combined into a single statistic to address the overall difference between the two populations. We also develop non parametric and semiparametric permutation-based tests for simultaneously comparing two or three features of unknown populations. Simulations show that the likelihood-based tests perform well for large sample sizes and that the statistics based on combining LRT statistics outperforms the ones based on combining score test statistics. The permutation-based tests have overall better performance in terms of both power and type I error rate. Our methods are easy to implement and computationally efficient, and can be expanded to more than two populations and to other multiple parameter families. The permutation tests are entirely generic and can be useful in various applications dealing with zero (or other) inflation.


Introduction
Proportion data falling in the continuum (0, 1) are very common in practice. Examples include the proportion of conifer cover in a particular area, the proportion of household income spent on food and the volume of stroke lesion as a percentage of total brain volume. The family of Beta distributions provides broad flexibility for modeling proportions. Depending on its parameters, the Beta probability density function (pdf) can be right-skewed; left-skewed; "U-" or "J-" shaped; inverted "J-" shaped; or uniform (Ospina and Ferrari, 2012). It can happen, however, that an inflated number of zeros and/or ones in a sample of proportions can render the Beta distribution an unsuitable model, since the Beta distribution takes support on the open interval (0, 1). There are extensive studies of zero-inflated data in the literature (Barry and Welsh, 2002;Marin et al., 2005;Chiogna and Gaetan, 2007). Almost all of them, however, focus on zero-inflated count data, for example, zero-inflated Poisson counts. The proportion data that we are considering here are not based on counts, however, but rather on continuous proportions across space and/or time. Ospina and Ferrari (2010) propose a mixed continuous-discrete distribution for data observed on the intervals [0, 1), (0, 1], or [0,1]. The discrete component of this distribution is defined by a degenerate (point mass) distribution that assigns non zero probability to 0 and/or 1 depending on whether there is zero-and/or one-inflation. In particular, the zero-inflated Beta (BEZI) has a point mass on zero.
Suppose Y ∼ BEZI (p, μ, φ). Then the pdf of Y is where (·) is Gamma function, 0 < p < 1, 0 < μ < 1, and 0 < φ < ∞. The mean and variance of Y are where μ and φ are the mean and precision parameter of Beta component. As is the Beta family of distributions, the BEZI family is quite flexible in shape. In many studies, interest focuses on comparing the distributions of samples or experimental units obtained under different conditions. For example, in an investigation of the proportion of conifer cover in forests, it might be interesting to know whether forests at higher elevation have generally different proportions of conifer than do forests at lower elevations. A two-sample comparison of the means for data such as these may not be sufficient for testing the equality of the two populations. It might be possible, for example, that two samples of forests at different elevations have very similar sample means (i.e., average proportions of conifer cover), but one sample has a larger variance or one sample has a higher proportion of zeros. In cases like this, the samples may clearly come from two distinct populations, but if our focus is on the means only, these populations will be viewed as the same. In this article, we evaluate hypothesis tests aimed at distinguishing multiple features (i.e., parameters) of two BEZI populations, and we introduce several new tests for such situations.
Most existing work for testing the homogeneity of two populations with multiple parameters is focused on location and scale parameters. The strategy is first to test for the equality of scale parameters and once the equal scale assumption is found to be tenable, then to test for the equality of the location parameters. In Normal populations, for example, when comparing two means there is usually an assumption about the variances: they could be known or unknown, equal or not equal, and a z-test or t-test can be applied to compare the means depending on the situation. Or, when comparing two variances we could assume the means are known or unknown, equal or not equal, and then use an F-test with appropriate degrees of freedom. Surprisingly, there is not a well-known hypothesis test that considers the equality of means and variances at the same time. Although the usual multiple parameter likelihood ratio test (LRT) can be used, it appears to be rarely applied in practice. Beyond the LRT, there appear to be few existing tools that address the equality of location and scale parameters simultaneously, let alone tools for cases with more than just location and scale parameters. Singh (1986) develops a test statistic for testing the equality of means and variances of two Normal populations based on combining two independent LRTs: one for testing the equality of means given the variances are equal but unspecified and the other for testing the equality of variances when the means are unspecified and not necessarily equal. In this article, Singh claims that the test is asymptotically optimal in the sense of Bahadur efficiency based on facts derived by Folks (1971, 1973). The idea of combining tests originated with Fisher (1950), though the detailed steps for actually doing the combining were omitted there. Durairajan (1985) also applies Fisher's idea to test the parameters of the inverse Gaussian distribution based on combining two independent LRTs.
In an extension of Singh's work, Paul and Jiang (2005) expand the method from Normal distributions to several other two-parameter distributions including the Negative Binomial and Beta-Binomial distributions. They also consider combining asymptotic independent score test statistics. Their simulations provide evidence that Fisher's method performs well even where there is only asymptotic independence. The authors recommend statistics based on combining score tests because the approach appears to maintain the appropriate level of the test in all the situations they investigate. Thiagarajah (2012) considers a combined test procedure for testing the homogeneity of Weibull (or Extreme value) populations with censored data following ideas similar to Singh (1986) and Paul and Jiang (2005). Simulation shows that Fisher's method of combining statistics performs reasonably well under censoring.
In this article, we adopt Fisher's method to combine the p values of independent LRTs for the three parameters of the BEZI distribution. We also combine asymptotically independent score tests for these three parameters. We discuss the inconsistency of some versions of the score test and evaluate several non parametric tests often applied in situations of non Normality. In addition to these standard parametric tests, we develop two new non parametric and two new semiparametric permutation tests for testing multiple features of the populations simultaneously. One of the non parametric tests considers only location and scale, and the other considers the zero proportion in addition to location and scale. The two semiparametric tests can be viewed as hurdle (Ridout et al., 1998) models-we first model the presence/absence of zeros, and then we model the non zeros. Although motived by data from BEZI populations, the non parametric and semiparametric approaches are applicable to other inflated populations.
The organization of this article is as follows. In Section 2, we describe an example from marine science where in the original analysis, non parametric tests were applied to distinguish two populations, where these non parametric tests typically look at only one feature of the populations. In this example, the underlying data populations are well represented by the BEZI distribution. In Section 3, we introduce the parametric simultaneous tests that we developed for assessing the equivalence of the two populations. Then we discuss two non parametric and two semiparametric permutation tests as alternatives. In Section 4, we give simulation results for evaluating the performance of our new tests and compare them with standard parametric and non parametric alternatives. In Section 5, all tests are applied to the marine science data and compared. We discuss our findings and possible extension of this work in Section 6.

Settlement of onshore barnacle larvae
Tyburczy (2011) compares settlement distributions of onshore barnacle larvae with and without the occurrence of different oceanographic processes. Such processes as large-scale regional relaxation or upwelling and smaller-scale localized diurnal upwelling driven by afternoon sea breezes are thought to influence barnacle settlement. The data consist of observations on two types of barnacle larva (Balanus cf. glandula and Chthamalus spp.) daily or bi-daily settlement information, and covariate information involving the physical processes (regional relaxation, front passage, and diurnal upwelling) in four study areas (Sandhill Bluff (SHB), Terrace Point, Bonny Doon Beach, and Lighthouse Point) collected within northern Monterey Bay, CA, in 2007 (May-September). As described in Tyburczy (2011), the settlement data on intertidal plates was normalized for hours of immersion based on tidal height of the plates and combined with pump samples and larvae counts on larval traps deployed with different depths. The manipulation process compressed the settlement information for Balanus cf. glandula to the range [0, 1), with more than half of them being zeros. Examples of the barnacle larvae settlement data are shown in Fig. 1, where all panels are for data from SHB. Each panel contains overlapping histograms of settlement data with and without the occurrence of a particular oceanographic process. Notice that the two zero counts are also indicated in each panel.
Tyburczy studied the associations between the occurrence of the oceanographic processes and the larvae settlement at each study area. His analysis treated the settlement during and between occurrences of the oceanographic process as two samples which were then compared. Analysis of the association between the oceanographic processes and settlement is certainly complicated by non normality and autocorrelation in the data, as well as uneven sampling intervals. Tyburczy used Superposed Epoch Analysis (Chree, 1913(Chree, , 1914 on logtransformed settlement data (after adding a small constant) with a modified t-test statistic and the difference between sample means as permutation statistics. These test statistics are SEA = (Ē −B)/S and SEAD =Ē −B, whereĒ andB are the sample mean numbers of settlement during the periods when the process did and did not occur, respectively, and S is a modified pooled estimate of standard deviation. Tyburczy also used the Wilcoxon rank-sum test to compare settlement during intervals when each process occurred and those when it did not. After describing our new tests, we will compare them against the tests Tyburczy used by using simulations. We will also compare all tests as they apply to the settlement data.

Fisher's method based on combining tests
Fisher's method, also known as Fisher's combined probability tests, was initially suggested by Fisher (1950). It combines p values from several independent tests into one test statistic using the formula where p i is the p value for the ith hypothesis test. When the p values tend to be small, X 2 will be large, which suggests that the null hypotheses are not true for every test (Paul and Jiang, 2005). When all the null hypotheses are true and the p i 's (more essentially, their corresponding test statistics) are independent, X 2 will have a χ 2 2k distribution, where k is the number of tests being combined. This fact can be used to determine the p value corresponding to X 2 .
The key steps for deriving a simultaneous test via Fisher's method are the following: (1) Partition the general hypotheses into several subhypotheses.
(3) Show the test statistics in 2 are independent (or asymptotically independent).
(4) Use expression (1) to combine the multiple tests to obtain the test statistic for the general (or simultaneous) hypotheses.
Notice that the null parameter space (i.e., the parameter space under the null hypothesis) for (3) is the same as the null parameter space for (2); the null parameter space for (4) is the same as the entire parameter space (union of the parameter space under the null and the alternative hypotheses) for (3); the null parameter space for (5) is the same as the entire parameter space for (4); the entire parameter space for (5) is the same as that for (2). In other words, the parameter spaces have a nested structure. There are other ways to partition the general hypotheses. The ones given here, however, address equality of the different BEZI parameters. Specifically, (3) addresses the equality of the proportions of zero and keeps the Beta components equal. Then, the scale parameter (φ) of the Beta component is evaluated in (4) and finally, the location parameter (μ) in the Beta component in (5).
In another approach, we combine three independent LRTs using Fisher's method. If we let , and let LRT 1 obs , LRT 2 obs , and LRT 3 obs denote the observed value of LRT 1, LRT 2, and LRT 3, respectively, then

... Score tests
As another likelihood-based large sample tool, the score test is particularly appealing because it only requires the estimates of parameters under the null hypothesis (i.e., for the reduced model only). Since the BEZI is a full-rank exponential family (Ospina and Ferrari, 2010), it satisfies the regularity condition needed for the score test. For notational simplicity, we reparameterize p 1 , p 2 , μ 1 , μ 2 , φ 1 , and φ 2 taking The corresponding new general hypotheses are the following: H 0 : δ = 0; β = 0; γ = 0 vs. H 1 : At least one of δ, β and γ differs from 0.
Let 0 and f denote the log-likelihood under the reduced and full models in (6). Then the first and second partial derivatives of f with respect to (p, δ, μ, β, φ, γ ) can be obtained. (6). Using the formula for blockwise inversion of matrices, the score test statistics involving the nuisance parameters for testing (6) (6), S is asymptotically distributed as χ 2 3 (Pace and Salvan, 1997). Since A, C, D involve the expected information matrix, they may be hard to derive in explicit forms. Others (Mukhopadhyay, 2000) have used the observed (or empirical) information matrix to simplify the problem so that becomes the score test statistic we typically use. Asymptotically, its distribution is χ 2 3 (Mukhopadhyay, 2000). Because the observed information matrix is not always positive definite, it can happen that the observedŜ is negative, and so, not consistent for the expected information (Verbeke and Molenberghs, 2007). As a consequence, the p value for hypothesis testing turns out to be 1. Some argue that the negative score statistics implies a rejection of the null hypothesis as the data fit poorly under the reduced model (Morgan et al., 2007). Another way to deal with the negative score statistics, though, is to consider the MLE for (p, δ, μ, β, φ, γ ) from the full model, denoted as (p,δ,μ,β,φ,γ ), rather than the MLE of (p, μ, φ) from the reduced model. Then the corresponding estimators for A, C, D would beÃ,C,D. A new score test statis-ticS =ŝ T (Ã −CD −1CT ) −1ŝ can be constructed, and it is also asymptotically χ 2 3 distributed (Conniffe, 2001;Freedman, 2007).

Non parametric simultaneous tests
In this section, we develop two non parametric simultaneous tests based on permutations to address the question of homogeneity between two BEZI populations. Although we mainly discuss BEZI populations, the application of these new tests is not limited to the BEZI. Our idea is to transform two or three features of the data into one-dimensional tests using squared Mahalanobis distance (Mahalanobis, 1936). Compared with Euclidean distance, Mahalanobis distance takes into account covariance among variables and is scale invariant. We use squared Mahalanobis distance because the square transformation is monotone for non negative values, and in general, taking the square root is computationally inefficient.

... Test based on location and scale
We consider Y 11 , . . . , Y 1n 1 ∼ f 1 and Y 21 , . . . , Y 2n 2 ∼ f 2 , where f 1 and f 2 come from the same distributional family whose first and second moments exist and are finite. We assume that all the observations are mutually independent within and across samples.

Semiparametric simultaneous tests
In this section, we develop two semiparametric tests to compare two zero-inflated populations. Both tests can be viewed as two-stage approaches in which we first analyze the data for presence/absence of zeros, and we then analyze the non zeros (Liu and Chan, 2008). The proportion of zeros is controlled by a parameter p ∈ (0, 1), whereas the non zeros are characterized by a density function, h Y (y). The only requirement is that h Y (y) has finite second moment. We don't assume any particular functional form for h Y (y). This idea is similar to that of hurdle models (Ridout et al., 1998). For example, in the Poisson hurdle model, h Y (y) is zero-truncated Poisson density (Zeileis et al., 2008). h Y (y) is the non parametric component; p is the parametric component. Therefore, the tests we have in this section are semiparametric methods.

... Hybrid test
Now we consider a two-step procedure for testing (8). The first step is testing whether the zero proportions are the same or not. We use Fisher's exact test (Fisher, 1922) for evaluating whether p 1 = p 2 to account for potentially small and unbalanced sample sizes. When the sample size is large and balanced, standard z-test for comparing two proportions using the Normal approximation could be used. If the first step test rejects p 1 = p 2 , we reject the overall null hypothesis in (8) and conclude that the two samples are from different populations. Otherwise, in the second step, we use a permutation test based on sample means and variances of the non zero component (i.e., h Y ) to see whether we can reject the overall null hypothesis in (8). The second step of the procedure is the same as in Section 3.2.1 except we useD and R defined in Section 3.3.1 instead of D and R.
Because the two-step test (denoted as Hybrid in what follows) actually involves two separate tests, we need an adjustment to obtain the correct overall type I error rate. Here we use a Bonferroni correction. This correction assumes independence among test statistics and can reduce the power to detect the true effect (Gelman et al., 2012). For our situation, we prove that the type I error rate under this correction is less than or equal to the nominal level. In other words, the test is a level α test, but a more powerful test exists (see Theorem 2 in the Supplemental document). As with other multiple comparison procedures, one drawback of this hybrid test is that for a particular set of observations, there is no overall p value available. The only thing we can say is whether we reject or fail to reject the null hypothesis at a certain nominal level.

Simulation
To evaluate the performance of the tests we derived in Section 3-in terms of type I error rate and power-we conducted a simulation study. We compared the tests defined in Section 3 with several other tests that typically only focus on a single feature of the populations in a two-sample comparison. Table 1 gives a compilation of all the tests we compare in our simulations. Among them, RS, SEA, and SEAD are designed to compare location parameters only; KS assumes a continuous distribution 1 ; LRT , S, and S are standard likelihood-based tests; the others (MLR, Mscore, Mscore, T , V , U , and Hybrid) are our new procedures, designed for simultaneous comparison between two populations.
In our simulations, we considered several different settings for the BEZI parameters (p = 0.1, 0.3, 0.6, μ = 0.1, 0.5, φ = 10, 100), as well as several different sample sizes, since the likelihood-based tests should perform better for large samples. These sample sizes are n 1 = n 2 = 10, 30, 60, 100 in the equal sample size situations and n 1 = 20 vs. n 2 = 40, n 1 = 40 vs. n 2 = 80, and n 1 = 20 vs. n 2 = 60 in the unequal sample size situations. For each configuration of parameters and sample sizes, we ran 1,000 simulations. For the permutation-based tests, we used 1,999 permutations. The nominal significance level is α = 0.05. Portions of the results regarding type I error rate and power are summarized in Tables 2 and 3.
In terms of type I error rate (Table 2), for both equal and unequal sample size situations, all the tests perform as well as expected, except for the KS test. Because all the assumptions for the RS, SEA, and SEAD are met, they have reasonable type I error rates for all settings. T , V , and U have the desired type I error rate as well. The type I error rate of Hybrid is often less than the nominal level as expected. As sample size increases, the type I error rates of all the asymptotic tests get closer to the desired nominal level. LRT and MLR outperform the score tests. A possible reason is that the independence is exact for the likelihood ratio based tests but only asymptotic for the score based tests. In general, compared with the RS, SEA, and SEAD and our new permutation tests T , V , and U , the likelihood-based tests are slightly inferior, and Hybrid is perhaps too conservative (as expected).
In terms of power, a general result is that when the parameters of the Beta component remain the same and the proportion of zeros gets larger, the power of all tests goes down.  Table . Data are simulated from BEZI(p i , μ i , φ i ) with sample size n i , i = 1, 2; based on , replications, at level α = 0.05 (boldface for values between . and .).
Test n 1 = 10 n 1 = 30 n 1 = 60 n 1 = 100 n 1 = 20 n 1 = 40 n 1 = 20 statistics n 2 = 10 n 2 = 30 n 2 = 60 n 2 = 100 n 2 = 40 n 2 = 80 n 2 = 60 This is likely because the zeros are not as informative as the non zeros. The higher the proportion of zeros, the less information in the sample. For both equal and unequal sample size situations, the difference in locations are easily detected by all tests, as shown in the left panels in Table 3. When the zero proportion and mean of the Beta component are the same (so that the means of two BEZI populations are the same), but the precisions are different, the RS, SEA, and SEAD perform much worse than all of our new procedures (the center panels of Table 3). Because SEA and SEAD are designed for comparison of location parameters, when the real difference is only in the scale parameters, their power is very low for both equal and unequal sample sizes (the center panels of Table 3). The RS has slightly higher power than SEA and SEAD. T and V have marginally smaller power than our other new tests in both equal and unequal sample size cases when only the scales are different. The KS test has better power than the RS, SEA, and SEAD, but they all have inferior power relative to the new tests. When the difference exists only between the zero proportions (the right panels of Table 3), all tests have relatively low power. In general, the empirical power is less than 0.50 unless the sample sizes are large (n 1 = n 2 = 60 and n 1 = 40 vs. n 2 = 80). The new tests are in general more powerful than KS, RS, SEA, and SEAD. When at least two of the parameters are different, all tests have reasonable power. Due to space limitation, we do not include all results here. It is also interesting to notice that the power of the reduced model MLE-based score test is usually smaller than that of the full model MLE-based score test. The presence of negative score values is the likely reason for this. There is no clear distinction between the standard tests and those based on Fisher's method. Again, the LRTs perform better than the score tests in terms of power. When the total sample sizes are equal (i.e., comparing n 1 = n 2 = 30 and n 1 = 20 and n 2 = 40), all tests generally have larger power in the equal sample size cases.
On the one hand, the simulations show that LRT , MLR,Ŝ,S, MScore, and MScore all have similar power to U and Hybrid. T and V have slightly smaller power in a few settings. However, all of these tests are better than RS, SEA, and SEAD in terms of power. On the other hand, RS, SEA, and SEAD have type I error rates that are similar to those of T , V , and U . LRT , MLR,Ŝ,S, MScore, and MScore have larger type I error rates, unless the sample size  Table . Data are simulated from BEZI(p i , μ i , φ i ) with sample size n i , i = 1, 2; based on , replications using level at α = 0.05 (boldface for values ≥). p 1 = 0.3, μ 1 = 0.5, φ 1 = 10 p 1 = 0.3, μ 1 = 0.1, φ 1 = 10 p 1 = 0.1, μ 1 = 0.1, φ 1 = 10 p 2 = 0.3, μ 2 = 0.1, φ 2 = 10 p 2 = 0.3, μ 2 = 0.1, φ 2 = 100 p 2 = 0.3, μ 2 = 0.1, φ 2 = 10 Test n 1 = 10 n 1 = 30 n 1 = 60 n 1 = 10 n 1 = 30 n 1 = 60 n 1 = 10 n 1 = 30 n 1 = 60 statistics n 2 = 10 n 2 = 30 n 2 = 60 n 2 = 10 n 2 = 30 n 2 = 60 n 2 = 10 n 2 = 30 n 2 = 60 Test n 1 = 20 n 1 = 40 n 1 = 20 n 1 = 20 n 1 = 40 n 1 = 20 n 1 = 20 n 1 = 40 n 1 = 20 statistics n 2 = 40 n 2 = 80 n 2 = 60 n 2 = 40 n 2 = 80 n 2 = 60 n 2 = 40 n 2 = 80 n 2 = 60 is about 100 in each sample. Hybrid is conservative as expected using the Bonferroni correction. The KS test performs the worst in terms of both type I error rate and power. Overall, considering both power and type I error rate, U is the best test, followed by Hybrid, V , and T . When the sample sizes are large, the likelihood ratio tests, LRT and MLR, perform well. We also examined several settings with parameters near the boundary of the parameter space (p = 0.05, μ = 0.01 and 0.05, φ = 0.5 and 1 2 ). The patterns of type I error and power results are in general similar to those we report in Tables 2 and 3. However, the tests require larger sample sizes to achieve the desired nominal level in type I error rate or to have reasonably large power. We also found when parameters are close to their boundaries, although the two populations are different, because of the small magnitude of the difference, the power of the tests is also lower than in the cases reported in Table 3.  We choose these numbers because when p is close to  we almost always get zeros in samples, as a result we do not have much information to compare; naturally when μ is small, there is likely to be zero inflation; when φ is greater than , for fixed μ, φ's influence on the variance of BEZI random variable is small. We implemented all tests in R (R Core Team, 2013) (code available from authors upon request). The tests are all computationally efficient. The computing time for the four permutation-based non parametric and semiparametric tests range between 4 and 7 seconds to compute for a single data set with 1999 permutations (sample sizes vary from 20 to 200). All simulations were performed on an Intel C Core TM 2 Quad CPU Q6600 @ 2.40 GHz 2.40 GHz processor with 4.00 GB RAM.

Application to the barnacle settlement data
In this section, we apply our new tests to the barnacle settlement data described in Section 2, and we compare our results to those in Tyburczy (2011), namely, SEA and SEAD (refer to Section 2). Table 4 provides sample means and variances and MLE of all parameters under the BEZI distribution assumption for the settlement data at SHB. The sample zero proportions are the MLE of the population zero proportions. For these data, the sample sizes are nearly equal in some cases and twofold different in other cases. There are many zeros in all cases, with many samples having zero proportions greater than 50%. The sample means are less than 0.1 and the sample variances are all small.
As shown in Table 5, the KS and Hybrid tests conclude that there is no difference between samples. The likelihood-based tests conclude that the samples are different, and these tests have the smallest p values among all the tests. The p value that is greater than 0.999 under Fails to reject H 0 the regional relaxation (relax) condition forŜ is due to the presence of negative score statistic. The RS, T , V , and U indicate that the settlement for Balanus larvae is associated with regional relaxation only. We would conclude from SEA and SEAD that the settlement for Balanus larvae is associated with regional relaxation and front passage. The possible reason for the discrepancy between the likelihood-based tests and the non parametric and semiparametric tests is that these data are actually well represented by the BEZI distribution. When the parametric assumption is satisfied, the parametric tests are typically more powerful than the non parametric and semiparametric alternatives. Because of the large proportions of zeros, the sample means are pulled close to zero and the sample variances are also small. In other words, the data do not vary a lot. Therefore, the magnitude of difference between two samples are small compared to the potential range. The unbalanced sample sizes between two samples also provide some challenges for all the tests.

Discussion
We have extended Paul and Jiang's (2005) study to assess homogeneity across two threeparameter populations that are mixtures of discrete and continuous components (namely, the BEZI distribution). This distribution is particularly useful for modeling data taking the form of a proportional change in some continuous measurement, such as space, time, or volume, with the added complication of zero-inflation. We develop two non parametric and two semiparametric simultaneous tests to compare two or three features between the two populations simultaneously. Fisher's method based on LRT statistics outperforms the tests based on score test statistics. The non parametric and semiparametric simultaneous tests have even more desirable properties than the tests based on Fisher's method. Specifically, their type I error rates are desirable even in small sample situations and their power is comparable with the parametric tests. In addition, these non parametric and semiparametric tests can be applied to other zero-and/or one-inflated distributions directly. All methods can be expanded to compare more than two populations.
In Paul and Jiang's simulation for Normal and Negative Binomial distributions, the standard LRT has larger type I error rate than the desired level (0.05), and the standard score test has smaller type I error rate than the desired level. But the tests based on Fisher's method have the correct levels. These distinctions do not appear in our BEZI simulation. 3 The standard tests achieve similar type I error rate and power as the tests based on Fisher's method. More interestingly, the score test statistics should equal the LRT asymptotically, so it is not clear why they behaved differently in Paul and Jiang's simulations, even in the large sample size case. Furthermore, Paul and Jiang (2005) do not mention the negative score statistics in their paper, but we came across such occurrences with the barnacle settlement data and many times in our simulation study. Thiagarajah's (2012) simulations show similar patterns as ours: Fisher's method using LRTs has smaller type I error rate than Fisher's method using score tests. However, there is no clear distinction between the standard tests and combined tests. Thiagarajah (2012) recommends using combined score tests because of the simplicity in maximum likelihood estimation and inversion of a low-dimension matrix.
One benefit of considering combined tests, rather than a single LRT, is that it provides more information about what aspect or aspects of two populations might be different. The subtests deal with specific features of the data distributions, so they may be useful when examined individually. Based on our simulation results, within the parametric setting, we recommend combining the likelihood ratio tests using Fisher's method. The non parametric simultaneous tests we developed can be applied to other populations with and without inflation directly, as can the semiparametric tests. They can also be extended to cases of both zero and one inflation. This is particular appealing in ecological data, where zero inflation is common due to reasons such as species rarity or poor habitat conditions (Barry and Welsh, 2002;Chiogna and Gaetan, 2007). Current statistical methods for ecological analysis often involve count data (Marin et al., 2005). Zero-inflated continuous data are mostly based on the log-normal (Tian, 2005), Gamma (Feuerverger, 1979), and exponential (Zhang et al., 2010;Wu et al., 2012) model. Our study fills the gap in simultaneous comparison of zero-inflated continuous data with a constrained support.
Finally, when sample sizes are large and the data can be well represented by BEZI distributions, likelihood-based tests are superior to the non parametric and semiparametric alternatives. In small sample size situations, however, the non parametric and semiparametric tests are preferred.