Testing independence between discrete random variables

Abstract In this paper we develop a likelihood ratio test for independence between multivariate discrete random vectors whose components may be dependent or may have different marginal distributions. A special case that tests independence between two discrete random variables is also discussed. The model under the null hypothesis is constructed from the marginal distributions of the random variables, while the model under the alternative is built by adding a common random effect that is not restricted to follow the same distribution as the other variables. The test statistic is shown to admit an asymptotic distribution given by a weighted sum of chi-square random variables, each with one degree of freedom. To avoid the high computational cost of finding the parameters of this asymptotic distribution, a permutation-based procedure is introduced. Simulations show that the permutation-based procedure yields empirical levels close to the nominal level, while achieving power comparable to or higher than that of existing competitors in the literature. The application of the test is illustrated with a real data set.


Introduction
Multivariate discrete distributions are fundamental models for a wide range of applications (Booth et al. (2003), Warton (2005), Ma, Kockelman, and Damien (2008), Park and Lord (2007), Pascual and Akhundjanov (2019)). For accurate estimation and interpretation of model parameters, it is crucial to correctly specify the model form as well as the correlation structure between variables. Tests of independence for bivariate and multivariate discrete random vectors are important in a number of applications in agriculture, biology, nutrition, and medicine, among many other fields. For instance, researchers may be interested in testing the independence of the numbers of plants of two species counted in contiguous quadrats (Holgate (1966)), or the numbers of occurrences of two types of recurrent events such as infections (Hocine et al. (2005)), or the abundance of specific types of bacteria in certain locations (Jeon, Chun, and Kim (2013), Flores et al. (2013)), or the microbiota of family members or people living in the same household (Brito et al. (2019)). In this paper we propose a new hypothesis test for the independence of components in multivariate random vectors and for the independence of random vectors whose components may be dependent.
Given its practical applicability, it is not surprising that a vast body of statistical research has been devoted to inference for multivariate discrete random vectors. Most attention, however, has been given to the estimation problem when discrete variables have some correlation structure; see Holgate (1964), Subrahmaniam and Subrahmaniam (1973), Karlis and Ntzoufras (2003), Pedeli and Karlis (2011), Bulla, Chesneau, and Kachour (2015), Su et al. (2017) and references therein. One of the most conventional principles used to induce correlation between variables is to add a common random effect to each component. For instance, Holgate (1964), Paul and Ho (1989), Stern and Zacks (2002), Liang and Yuen (2016), and Genest, Mesfioui, and Schulz (2018) model the dependence between Poisson random variables with a common Poisson random effect. Also for the Poisson case, Johnson, Kotz, and Balakrishnan (1997) and Tsionas (2001) introduce a covariance term that is shared by each pair of count variables, while Karlis and Meligkotsidou (2005) extend this idea to a full covariance matrix among Poisson counts with a combination of common random effects. Bermúdez and Karlis (2011) study zero-inflated versions of Poisson regression models that allow a combination of random effects to yield a full covariance matrix. Shi and Valdez (2014) also use random effects to induce correlation; however, they study counts modeled with multivariate Negative Binomial variables. Despite the extensive interest in estimation and modeling, only a few authors have addressed the problem of testing the independence of components in bivariate or multivariate discrete distributions.
To the best of our knowledge, independence tests for bivariate or multivariate discrete random vectors are restricted to models where random effects that induce the correlation belong to the same family of distributions as the target variables, and furthermore, there is no test in the literature for the independence of random vectors whose components may be dependent.
In the literature on hypothesis testing for discrete random vectors, the most commonly used distribution is the Poisson, due to its practical importance and the ease of dealing with its probability mass function. Kocherlakota and Kocherlakota (1985) propose a test for the independence between non-Normal random vectors with Neyman's C(α) test statistic. Paul and Ho (1989) evaluate the findings in Kocherlakota and Kocherlakota (1985) for the bivariate Poisson case and examine a likelihood ratio test based on the correlation coefficient. Stern and Zacks (2002) use the dependence structure of Holgate's bivariate Poisson distribution and develop an independence test that yields higher statistical power than all methods presented in Paul and Ho (1989). More recently, but also for the bivariate Poisson case, Bower et al. (2018) introduce a test statistic for the independence of the components that is based on Normal copula functions, while Bower et al. (2019) use a score test for the dependence parameter in Lakshminarayana's bivariate Poisson probability mass function (Lakshminarayana, Pandit, and Srinivasa Rao (1999)). Novoa-Muñoz and Jiménez-Gamero (2014) use Cramér-von Mises type tests based on the empirical probability generating function for testing the goodness-of-fit of the bivariate Poisson distribution. Overall, these tests are based on the Poisson probability mass function and are not extendable to other distributions. From a different point of view, Genest et al. (2019), Heller, Heller, and Gorfine (2013), Roy (2020), and Székely, Rizzo, and Bakirov (2007) take a nonparametric approach to the problem. In this paper we propose a likelihood ratio test by assuming a parametric underlying data generating process that is general in the sense that any distribution satisfying a set of regularity conditions can be used, including the commonly used discrete distributions.
We also present asymptotic theory that establishes the approximate distribution of the test statistic, which can be used in finite-sample applications.
The remainder of the paper is organized as follows. Section 2 describes the model and the likelihood ratio hypothesis test, including the asymptotic distribution of the test statistic and an alternative way to compute p-values through a permutation-based procedure. The details of the proposed test in the case of a bivariate Negative Binomial model are discussed in Section 3, where the parameters involving the latent random-effect variable are estimated via an EM algorithm. In Section 4, the finite sample performance of the proposed test is investigated under the bivariate Poisson and Negative Binomial models. A real data example is presented in Section 5. In the Supplementary material that accompanies this paper, we derive the details of the proposed hypothesis test for the case of multivariate vectors with dependent Poisson components, the bivariate Poisson and Zero Inflated Poisson (ZIP) cases, and the special case of mixed distributions (NB/Poisson/ZIP), for which we present additional simulations.

Basic framework: Testing independence of multivariate discrete random vectors with dependent components
In this section we describe the basic framework for testing the independence of multivariate discrete random vectors with dependent components. Consider the following motivating practical scenario: a set of bacteria counts is measured in a number of individuals in the same household. For instance, let these counts be the numbers of $a$ types of bacteria (phyla) in each person's gut microbiota. Without loss of generality and for simplicity of notation, assume that the number of individuals in each household is fixed, denoted by $b$. It is expected that the bacteria counts of different types within a person are correlated. Furthermore, studies have shown that people living in the same household may have similar microbiota (Yatsunenko et al. (2012), Song et al. (2013), Brito et al. (2019)). It is of interest to test whether the sets of bacteria counts from persons in the same household are independent. This problem can be modeled with the following framework. Let $\{T_1, \ldots, T_n\}$, where $T_i = (T_{1i}, \ldots, T_{bi})$ and $T_{ki} = (T_{1ki}, \ldots, T_{aki})$, be an independent and identically distributed sample from the discrete random vector $T = (T_{11}, \ldots, T_{a1}, T_{12}, \ldots, T_{ab})$ with distribution $H$. Consider the model
$$T_{jk} = M_0 + M_{jk} + P_k, \quad j = 1, \ldots, a, \; k = 1, \ldots, b, \qquad (1)$$
where $M_0$ is the general effect, $M_{jk}$ $(j = 1, \ldots, a,\ k = 1, \ldots, b)$ is the effect of bacteria of phylum $j$ for person $k$, $P_k$ $(k = 1, \ldots, b)$ is the person effect, and $M_0$, $M_{jk}$, and $P_k$ are independent discrete random variables. The person effect $P_k$ induces the correlation between bacteria counts within each subject, while the common effect $M_0$ induces the correlation between persons within the same experimental unit (e.g., household). For a general formulation, let $M_0 \sim D_{\eta_0}$, $M_{jk} \sim D_{\eta_{jk}}$, and $P_k \sim D_{\eta_k}$, where $D_{\eta_0}$, $D_{\eta_{jk}}$, and $D_{\eta_k}$ are discrete families of distributions indexed by the parameter vectors $\eta_0$, $\eta_{jk}$, and $\eta_k$, respectively.
Denote this model by $\mathcal{F} := \{f(t \mid \theta) : \theta = (\eta_0, \eta_{11}, \ldots, \eta_{ab}, \eta_1, \ldots, \eta_b) \in \Theta \subset \mathbb{R}^{k_1}\}$, where $t = (t_{11}, \ldots, t_{ab})$ and $k_1$ is the dimension of the parameter space.
To test for independence between individuals in the same household, while allowing the counts of bacteria within an individual to be correlated, consider the following null model, which will be denoted by $\mathcal{G} := \{g(t \mid \gamma) : \gamma = (\eta_{11}, \ldots, \eta_{ab}, \eta_1, \ldots, \eta_b) \in \Gamma \subset \mathbb{R}^{k_0}\}$, where $k_0 < k_1$. In general, comparing two models in how closely they approximate the true underlying data generating mechanism can be done when both, only one, or neither of the two models is correctly specified (Vuong (1989)). Motivated by the work of Vuong (1989) and Lo, Mendell, and Rubin (2001), we consider comparing the two competing models $\mathcal{F}$ and $\mathcal{G}$ in relation to the unknown true data generating process $H$. It is natural to choose the model that is closer to the true distribution in some sense. Using the Kullback-Leibler Information Criterion (Kullback and Leibler (1951)), the distance from model $\mathcal{F}$ to the true density $h(t)$ is $E_H[\log h(t)] - E_H[\log f(t \mid \theta^*)]$, where $\theta^*$ is the pseudo-true value of $\theta$ (see Sawa (1978) and White (1982) for details). Hence, the hypotheses of interest can be formulated as $H_0$: $\mathcal{F}$ and $\mathcal{G}$ are equally close to $H$ versus $H_a$: $\mathcal{F}$ is closer to $H$ than $\mathcal{G}$, or equivalently
$$H_0: E_H[\log f(t \mid \theta^*)] = E_H[\log g(t \mid \gamma^*)] \quad \text{versus} \quad H_a: E_H[\log f(t \mid \theta^*)] > E_H[\log g(t \mid \gamma^*)]. \qquad (2)$$
Let $\hat{\theta}$ and $\hat{\gamma}$ be the maximum likelihood (ML) estimators of $\theta$ and $\gamma$ under their respective models. Although under the null hypothesis $E_H[\log f(t \mid \theta^*)]$ and $E_H[\log g(t \mid \gamma^*)]$ are unknown, their difference can be consistently estimated (Vuong (1989), Lo, Mendell, and Rubin (2001)) by $1/n$ times the log-likelihood ratio statistic, namely,
$$(1/n)\mathrm{LR} := L_f(\hat{\theta} \mid t_1, \ldots, t_n) - L_g(\hat{\gamma} \mid t_1, \ldots, t_n), \qquad (3)$$
where $L_f$ and $L_g$ denote the average log-likelihoods under $\mathcal{F}$ and $\mathcal{G}$, and $t_1, \ldots, t_n$ are sample observations of $T_1, \ldots, T_n$. A special case of this generalized framework is when $a = 1$, which establishes a test for the independence of components in a multivariate vector.
A practical example of such a test is when data are collected in biomedical research where the components in the random vector represent the counts of a certain bacterium/virus across hospital rooms, so that $T_j$ $(j = 1, \ldots, b)$ represents the bacteria count in room $j$. Testing the independence of these counts may give insight on contamination prevention and sanitation procedures to be adopted. The asymptotic distribution of the proposed test statistic in equation (3) is described in Subsection 2.2.
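The Kullback-Leibler comparison underlying the statistic in (3) can be illustrated with a minimal sketch. The example below compares two simple candidate models (Poisson versus geometric, both fitted by maximum likelihood) against data whose true law is Poisson, estimating $E_H[\log f] - E_H[\log g]$ by the average log-likelihood difference. The sampler, model choices, and sample size are illustrative assumptions only, not part of the paper's procedure.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method)."""
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def pois_logpmf(t, lam):
    return -lam + t * math.log(lam) - math.lgamma(t + 1)

def geom_logpmf(t, p):
    # geometric on {0, 1, ...}: P(T = t) = p * (1 - p)^t
    return math.log(p) + t * math.log1p(-p)

rng = random.Random(0)
data = [rpois(3.0, rng) for _ in range(500)]  # true H: Poisson(3)

lam_hat = sum(data) / len(data)   # Poisson MLE
p_hat = 1.0 / (1.0 + lam_hat)     # geometric MLE (matches the mean)

# (1/n) * log-likelihood ratio: positive values favor the Poisson model
lr_over_n = sum(pois_logpmf(t, lam_hat) - geom_logpmf(t, p_hat)
                for t in data) / len(data)
```

A positive `lr_over_n` indicates that the Poisson model is estimated to be closer to $H$ in Kullback-Leibler distance, mirroring the one-sided alternative $H_a$.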

Asymptotic distribution of the test statistic
Recall that the parameter $\theta$ in $\mathcal{F}$ is defined as $\theta = (\eta_0, \eta_1, \eta_2)$, and the parameter $\gamma$ in $\mathcal{G}$ is defined as $\gamma = (\eta_1, \eta_2)$. Consider the following assumptions.
Assumption 1. The random vectors $T_1, \ldots, T_n$ are independent and identically distributed with joint mass function $h(t)$ that is strictly positive for almost all $t \in \mathcal{T}$, where $\mathcal{T}$ is the nonnegative bivariate integer space.
Assumption 2. (a) For every $\theta \in \Theta$, $f(t \mid \theta)$ is strictly positive for almost all $t \in \mathcal{T}$. (b) The parameter space $\Theta$ is a compact subset of $\mathbb{R}^{k_1}$ and $f(t \mid \theta)$ is continuous in $\theta$ for almost all $t \in \mathcal{T}$. (c) For every $\gamma \in \Gamma$, $g(t \mid \gamma)$ is strictly positive for almost all $t \in \mathcal{T}$. (d) The parameter space $\Gamma$ is a compact subset of $\mathbb{R}^{k_0}$ and $g(t \mid \gamma)$ is continuous in $\gamma$ for almost all $t \in \mathcal{T}$.
Assumption 3. (a) For almost all $t$, $|\log f(t \mid \theta)|$ is bounded from above by a function of $t$ integrable with respect to $H$. (b) The function $E_H[\log f(t \mid \theta)]$ has a unique maximum at $\theta^*$ in $\Theta$. (c) For almost all $t$, $|\log g(t \mid \gamma)|$ is bounded from above by a function of $t$ integrable with respect to $H$. (d) The function $E_H[\log g(t \mid \gamma)]$ has a unique maximum at $\gamma^*$ in $\Gamma$.

Define the matrices
$$A_f(\theta^*) = E_H\!\left[\frac{\partial^2 \log f(t \mid \theta^*)}{\partial \theta \, \partial \theta'}\right], \qquad B_f(\theta^*) = E_H\!\left[\frac{\partial \log f(t \mid \theta^*)}{\partial \theta} \, \frac{\partial \log f(t \mid \theta^*)}{\partial \theta'}\right], \qquad (4)$$
with $A_g(\gamma^*)$ and $B_g(\gamma^*)$ defined analogously for $g(t \mid \gamma^*)$, together with the cross-product matrix $B_{fg}(\theta^*, \gamma^*) = E_H\!\left[\frac{\partial \log f(t \mid \theta^*)}{\partial \theta} \, \frac{\partial \log g(t \mid \gamma^*)}{\partial \gamma'}\right]$ and $B_{gf} = B_{fg}'$.
Under the null hypothesis that the two models are equally close to the true underlying data generating mechanism, the asymptotic distribution of the likelihood ratio test statistic is a weighted sum of chi-square distributions, as established in Theorem 2.1.
Theorem 2.1. Under Assumptions 1-3 and additional regularity conditions analogous to A4, A5, and A6 in Vuong (1989), the asymptotic distribution of $2\mathrm{LR}$ under the null hypothesis is a weighted sum of $k_0 + k_1$ independent $\chi^2_1$ random variables. In other words, as $n \to \infty$,
$$\Pr(2\mathrm{LR} \le x) \to M_{k_0 + k_1}(x; \lambda) \quad \text{for all } x,$$
where $k_1$ and $k_0$ are the dimensions of the parameter spaces of models $\mathcal{F}$ and $\mathcal{G}$ respectively, $M_{k_0+k_1}(\cdot\,; \lambda)$ is the distribution function of a weighted sum of chi-square random variables, each with 1 degree of freedom, and $\lambda = (\lambda_1, \ldots, \lambda_{k_0+k_1})$ is the vector of the $k_0 + k_1$ eigenvalues of the matrix
$$W = \begin{pmatrix} -B_f A_f^{-1} & -B_{fg} A_g^{-1} \\ B_{gf} A_f^{-1} & B_g A_g^{-1} \end{pmatrix}.$$
The proof of Theorem 2.1 follows similarly to that in Vuong (1989). In practice, the matrices defined in (4) can be consistently estimated by their sample versions $\hat{A}_f$, $\hat{B}_f$, $\hat{A}_g$, $\hat{B}_g$, and $\hat{B}_{fg}$, obtained by replacing the expectations with sample averages evaluated at $\hat{\theta}$ and $\hat{\gamma}$, the maximum likelihood estimators of $\theta$ and $\gamma$ respectively.
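Given eigenvalue estimates $\hat{\lambda}_1, \ldots, \hat{\lambda}_{k_0+k_1}$, a tail probability of the limiting law $M_{k_0+k_1}(\cdot\,; \lambda)$ can be approximated by simple Monte Carlo, as in the sketch below (exact numerical methods, such as Imhof's, also exist). The function name and the number of draws are our illustrative choices.

```python
import random

def weighted_chisq_pvalue(stat, lambdas, draws=100_000, seed=0):
    """Monte Carlo approximation of P(sum_i lambda_i * Z_i^2 > stat),
    where the Z_i are iid standard normal -- the null law of 2*LR in
    Theorem 2.1 with weight vector `lambdas`."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        s = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if s > stat:
            exceed += 1
    return exceed / draws
```

With a single unit weight the approximation recovers the ordinary $\chi^2_1$ tail; for instance, `weighted_chisq_pvalue(3.84, [1.0])` is close to 0.05.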

Special case: Testing independence of the components of bivariate discrete random vectors
In the statistics literature, authors have focused on testing independence for bivariate random vectors (Paul and Ho (1989), Stern and Zacks (2002), Bower et al. (2019)) due to its vast variety of real-world applications. For this reason, in this subsection we present the special case of the proposed method for testing independence of the components of bivariate random vectors. Let $\{T_1, \ldots, T_n\}$, where $T_i = (T_{1i}, T_{2i})$, be an independent and identically distributed sample from the bivariate discrete random vector $T = (T_1, T_2)$ with distribution $H$. Assume that each component of $T$ can be written as
$$T_1 = M_1 + M_0, \qquad T_2 = M_2 + M_0, \qquad (5)$$
where $M_0 \sim \mathcal{M}_{0,\eta_0}$, $M_1 \sim \mathcal{M}_{1,\eta_1}$, and $M_2 \sim \mathcal{M}_{2,\eta_2}$ are independent discrete random variables from the discrete families of distributions $\mathcal{M}_j$, with indexing parameter vectors $\eta_j$, $j = 0, 1, 2$, respectively. In this model, the possible dependency between $T_1$ and $T_2$ arises from the common random component $M_0$. Denote the model in (5) by $\mathcal{F} := \{f(t_1, t_2 \mid \theta) : \theta = (\eta_0, \eta_1, \eta_2) \in \Theta \subset \mathbb{R}^{k_1}\}$. Despite the fact that the correlation between $T_1$ and $T_2$ in (5) can only be non-negative, this model is widely used in the analysis of correlated count data. Nevertheless, it can be extended to accommodate the full spectrum of possible correlations using Genest, Mesfioui, and Schulz (2018)'s approach, writing
$$T_1 = M_1 + Z_1, \qquad T_2 = M_2 + Z_2,$$
where $Z_1$ and $Z_2$ are comonotonic variables independent of $M_1$ and $M_2$; that is, $Z_1$ and $Z_2$ are both increasing functions of the same standard uniform variable. In this paper we use the model in (5) for ease of notation. However, the theory and applications follow similarly for the general model in Genest, Mesfioui, and Schulz (2018).
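A minimal sketch of the comonotonic construction just described: both variables are obtained by applying Poisson quantile functions to the same uniform draw, which makes each an increasing function of that draw. The marginal rates and the sample size are arbitrary illustrative choices.

```python
import math
import random

def pois_quantile(u, lam):
    """Poisson(lam) quantile function: smallest k with CDF(k) >= u."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

rng = random.Random(1)
pairs = []
for _ in range(2000):
    u = rng.random()                       # one shared uniform per pair
    pairs.append((pois_quantile(u, 2.0),   # Z1: increasing in u
                  pois_quantile(u, 5.0)))  # Z2: increasing in the same u

# sample correlation of the comonotonic pair (strongly positive)
z1 = [a for a, _ in pairs]
z2 = [b for _, b in pairs]
n = len(pairs)
m1, m2 = sum(z1) / n, sum(z2) / n
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
sd1 = (sum((a - m1) ** 2 for a in z1) / n) ** 0.5
sd2 = (sum((b - m2) ** 2 for b in z2) / n) ** 0.5
corr = cov / (sd1 * sd2)
```

The resulting correlation is close to the maximal value attainable for these two Poisson margins, which is how the extended model covers the upper end of the dependence spectrum.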
Our goal is to test the independence of the two components $T_1$ and $T_2$ of $T$, which in view of (5) can be formulated as the null model
$$T_1 = M_1, \qquad T_2 = M_2. \qquad (6)$$
Denote the model in (6) by $\mathcal{G} := \{g(t_1, t_2 \mid \gamma) : \gamma = (\eta_1, \eta_2) \in \Gamma \subset \mathbb{R}^{k_0}\}$. The tests of independence studied in Paul and Ho (1989) and Stern and Zacks (2002) only consider the Holgate bivariate Poisson distribution, which is a special case of model $\mathcal{F}$ in (5) when $\mathcal{M}_j$, $j = 0, 1, 2$, are Poisson families of distributions. As described in the multivariate framework in Subsection 2.1, the two competing models $\mathcal{F}$ and $\mathcal{G}$ are compared with the unknown true data generating process $H$ using the Kullback-Leibler distance. Hence, the hypotheses of interest can be formulated as $H_0$: $\mathcal{F}$ and $\mathcal{G}$ are equally close to $H$ versus $H_a$: $\mathcal{F}$ is closer to $H$ than $\mathcal{G}$, or equivalently
$$H_0: E_H[\log f(t_1, t_2 \mid \theta^*)] = E_H[\log g(t_1, t_2 \mid \gamma^*)] \quad \text{versus} \quad H_a: E_H[\log f(t_1, t_2 \mid \theta^*)] > E_H[\log g(t_1, t_2 \mid \gamma^*)], \qquad (7)$$
where $\theta^*$ and $\gamma^*$ are the pseudo-true values of $\theta$ and $\gamma$ respectively. Letting $\hat{\theta}$ and $\hat{\gamma}$ be the maximum likelihood (ML) estimators of $\theta$ and $\gamma$ respectively, the log-likelihood ratio test statistic is computed as
$$(1/n)\mathrm{LR} := L_f(\hat{\theta} \mid t_1, \ldots, t_n) - L_g(\hat{\gamma} \mid t_1, \ldots, t_n). \qquad (8)$$

Null distribution approximation using a permutation-based procedure
The computation of the sample derivative matrices and their inverses ($\hat{A}_f^{-1}$, $\hat{A}_g^{-1}$), followed by the estimation of the matrix $W$ and of its eigenvalues for the asymptotic approximation of the distribution of the test statistic, incurs very high computational complexity. This is especially true if the number of parameters is large, which is evident in the generalization of the proposed method to the multivariate case (see Section 2.1). We propose a permutation-based procedure as an alternative to the computation of these matrices and their inverses for approximating the distribution of the test statistic under the null hypothesis. For ease of presentation, we first introduce the permutation-based procedure for the bivariate case. Under the null hypothesis of independence of the components $T_1$ and $T_2$, the pairing of the observations $(T_{1i}, T_{2i})$ for $i = 1, \ldots, n$ is exchangeable. In other words, under independence, permuting the observations of one component should yield the same joint distribution. Accordingly, consider the artificial data set $(T_{1i}, T^*_{2i})$, $i = 1, \ldots, n$, where $(T^*_{21}, \ldots, T^*_{2n})$ is a permutation of $(T_{21}, \ldots, T_{2n})$. The distribution of the test statistic under the null hypothesis is estimated by computing the values of the test statistic from the $n!$ possible artificial data sets generated by the $n!$ permutations. The p-value for the hypothesis test is computed as the percentage of the values of the test statistic computed from the artificial data sets that are larger than the test statistic computed from the original data. Note that this differs from the classical permutation test for independence in that the classical method computes the null distribution of the correlation coefficient.
The computational cost of this procedure increases rapidly with the sample size. Hence, a smaller number $B$ ($B < n!$) of artificial data sets generated from the permutations can be used, depending on computational time constraints, to approximate the distribution of the test statistic under the null. Note that the computational complexity for computing the test statistic is of order $O(n)$ (due to the $n$ terms in the LR), multiplied by the number of iterations of the EM algorithm and of the numerical likelihood maximization. When $B$ increases, this computational cost is multiplied by $B$, so that a choice of $B$ approaching $n!$ would be highly costly. In the simulations of Section 4 we use $B = 500$, which yields a reasonable approximation of the null distribution for the proposed test in the scenarios considered. We now extend this idea to the case of multivariate vectors with dependent components. Let $T = (T_{11}, \ldots, T_{a1}, T_{12}, \ldots, T_{ab})$ be a multivariate random vector with dependent components defined by model (1). Under the null hypothesis in (2), $(T_{1k}, \ldots, T_{ak})$ is independent of $(T_{1k'}, \ldots, T_{ak'})$ for all $k \neq k'$. Recall that $T_1, \ldots, T_n$, where $T_i = (T_{1i}, \ldots, T_{bi})$ and $T_{ki} = (T_{1ki}, \ldots, T_{aki})$, $k = 1, \ldots, b$, is a random sample from the discrete random vector $T = (T_{11}, \ldots, T_{a1}, T_{12}, \ldots, T_{ab})$, so that under the null hypothesis the pairing of $T_i = (T_{1i}, \ldots, T_{bi})$, $i = 1, \ldots, n$, is exchangeable. Consider the artificial data set $T^*_i = (T^*_{1i}, \ldots, T^*_{bi})$, $i = 1, \ldots, n$, where $(T^*_{11}, \ldots, T^*_{1n})$ is a random permutation of $(T_{11}, \ldots, T_{1n})$, while the components within each $T_{ki}$, $i = 1, \ldots, n$, $k = 1, \ldots, b$, are not permuted. Similarly to the bivariate case, the distribution of the test statistic under the null hypothesis is estimated by computing the values of the test statistic from a number of artificial data sets generated by such permutations.
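The bivariate permutation procedure can be sketched as follows. The likelihood ratio statistic of the paper is replaced here by the absolute sample covariance purely to keep the sketch self-contained; in the actual procedure `statistic` would compute $2\mathrm{LR}$ by refitting both models on each permuted data set.

```python
import random

def abs_cov(x, y):
    """Absolute sample covariance, a stand-in test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)) / n)

def permutation_pvalue(t1, t2, statistic, B=500, seed=0):
    """Approximate p-value: permute the second component B times and
    count how often the permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = statistic(t1, t2)
    t2_perm = list(t2)
    exceed = 0
    for _ in range(B):
        rng.shuffle(t2_perm)
        if statistic(t1, t2_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (B + 1)  # add-one correction avoids p = 0
```

For strongly dependent toy data such as `t = list(range(20))`, `permutation_pvalue(t, t, abs_cov)` returns a small p-value, while for a degenerate independent case (for example, a constant second component) the p-value is large.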

Notation and formulation
Among the distribution families used to model discrete random variables, the Poisson and Negative Binomial (NB) are the most commonly seen in applications because of their convenient properties (for instance, the sum of independent Poisson variables is also Poisson, and similarly for NB variables with the same probability parameter). Because the NB allows for over-dispersion, in this section we present the details of the proposed likelihood ratio test when $M_j$, $j = 0, 1, 2$, are $\mathrm{NB}(r_j, p)$ random variables, where $0 < p < 1$ and $r_j > 0$ are the parameters. Details of the proposed method for the case when the $M_j$, $j = 0, 1, 2$, have different marginal distributions can be found in the Supplementary material. Under the null model $\mathcal{G}$, the observed data can be written as $t_i = (t_{1i}, t_{2i}) = (m_{1i}, m_{2i})$ for $1 \le i \le n$. In this case, the probability mass function of model $\mathcal{G}$ can be expressed as
$$g(t_i \mid \gamma) = \binom{t_{1i} + r_1 - 1}{t_{1i}} \binom{t_{2i} + r_2 - 1}{t_{2i}} p^{t_{1i} + t_{2i}} (1 - p)^{r_1 + r_2},$$
so that the maximum likelihood estimator of $\gamma$ can be obtained from the likelihood $L_g(\gamma \mid t_1, \ldots, t_n) = \prod_{i=1}^n g(t_i \mid \gamma)$. The likelihood under the alternative hypothesis, that is, under model $\mathcal{F}$, is
$$L_f(\theta \mid t_1, \ldots, t_n) = \prod_{i=1}^n \sum_{m=0}^{\min(t_{1i}, t_{2i})} \binom{m + r_0 - 1}{m} \binom{t_{1i} - m + r_1 - 1}{t_{1i} - m} \binom{t_{2i} - m + r_2 - 1}{t_{2i} - m} p^{t_{1i} + t_{2i} - m} (1 - p)^{r_0 + r_1 + r_2}.$$
Note that in the general case where the probability distribution families of $M_j$, $j = 0, 1, 2$, are not equal, $L_f(\theta \mid t_1, \ldots, t_n)$ would be written in a similar way, except with a product composed of probability mass functions from different distributions.
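Under model $\mathcal{G}$ the log-likelihood factorizes into two univariate NB terms with a shared $p$; a direct transcription is below, using log-gamma so that non-integer $r_j$ is allowed. The function names are ours.

```python
import math

def nb_logpmf(t, r, p):
    """log of the NB(r, p) pmf C(t+r-1, t) p^t (1-p)^r at t = 0, 1, ...,
    written with lgamma so that r > 0 need not be an integer."""
    return (math.lgamma(t + r) - math.lgamma(r) - math.lgamma(t + 1)
            + t * math.log(p) + r * math.log1p(-p))

def null_loglik(data, r1, r2, p):
    """Log-likelihood of model G: independent NB(r1, p) and NB(r2, p)."""
    return sum(nb_logpmf(t1, r1, p) + nb_logpmf(t2, r2, p)
               for t1, t2 in data)
```

Maximizing `null_loglik` over $(r_1, r_2, p)$ with any numerical optimizer yields $\hat{\gamma}$.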
The likelihood function of model $\mathcal{F}$ is also a function of the parameter $r_0$ of the latent variable $M_0$, so the ML estimator of $\theta$ has to be found numerically. A convenient way to obtain the ML estimate is through the expectation-maximization (EM) algorithm. In what follows, we present the details of the EM algorithm for computing the estimate of the parameter $\theta$.

The EM algorithm
Let $\theta = (r_0, r_1, r_2, p)$. The EM algorithm consists of iterating between the following two steps until convergence.
E-step: find the conditional expectation of the complete log-likelihood with respect to the latent random variable $M_0$, given the observed variables and the parameters estimated in the previous iteration (denoted by $\theta^{(k)}$). The conditional expectation is written as $Q(\theta \mid \theta^{(k)}) = E_{M_0 \mid T_1, T_2, \theta^{(k)}}[\log L(\theta \mid M_0, T_1, T_2)]$.
M-step: maximize the conditional expectation to update the parameter estimates, that is, set $\theta^{(k+1)} = \arg\max_\theta Q(\theta \mid \theta^{(k)})$.
Given an independent and identically distributed sample of bivariate NB random vectors $(T_1, \ldots, T_n)$, where $T_i = (T_{1i}, T_{2i})$, the complete likelihood function is equal to
$$L(\theta \mid t_{1i}, t_{2i}, m_{0i}, \, i = 1, \ldots, n) = \prod_{i=1}^n \binom{m_{0i} + r_0 - 1}{m_{0i}} \binom{t_{1i} - m_{0i} + r_1 - 1}{t_{1i} - m_{0i}} \binom{t_{2i} - m_{0i} + r_2 - 1}{t_{2i} - m_{0i}} p^{t_{1i} + t_{2i} - m_{0i}} (1 - p)^{r_0 + r_1 + r_2}.$$
The EM conditional expectation function $Q(\theta \mid \theta^{(k)})$ is then obtained by averaging the logarithm of this complete likelihood over the conditional distribution of $m_{0i}$ given $(t_{1i}, t_{2i})$ and $\theta^{(k)}$, which has finite support $\{0, \ldots, \min(t_{1i}, t_{2i})\}$. With the estimator $\hat{\theta}$ obtained through the EM algorithm, and the estimator $\hat{\gamma}$ obtained by maximizing the likelihood under the null, we can compute the likelihood ratio test statistic in (8) and the p-value using either the asymptotic distribution or the permutation-based procedure described in Section 2.4.
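The conditional distribution of $M_{0i}$ needed in the E-step has finite support $\{0, \ldots, \min(t_{1i}, t_{2i})\}$ and is proportional to the product of three NB masses, so it can be computed by direct enumeration. A sketch follows (our function names; the M-step maximization of $Q$, typically done with a numerical optimizer, is omitted).

```python
import math

def nb_logpmf(t, r, p):
    return (math.lgamma(t + r) - math.lgamma(r) - math.lgamma(t + 1)
            + t * math.log(p) + r * math.log1p(-p))

def m0_posterior(t1, t2, r0, r1, r2, p):
    """E-step weights P(M0 = m | T1 = t1, T2 = t2; theta) for
    m = 0, ..., min(t1, t2), computed by log-space enumeration."""
    logw = [nb_logpmf(m, r0, p)
            + nb_logpmf(t1 - m, r1, p)
            + nb_logpmf(t2 - m, r2, p)
            for m in range(min(t1, t2) + 1)]
    mx = max(logw)                       # stabilize before exponentiating
    w = [math.exp(v - mx) for v in logw]
    total = sum(w)
    return [v / total for v in w]
```

These weights feed directly into $Q(\theta \mid \theta^{(k)})$ as the averaging distribution for each observation.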
Remark. The family of distributions in the proposed procedure can be chosen based on the knowledge of the researcher, data exploration, or goodness-of-fit metrics. In the real data analysis in Section 5 we look at the AIC, BIC, and log-likelihood of the data when using, for example, the Poisson, Negative Binomial and Zero-Inflated Poisson distributions, and then choose the family of distributions that best fits the data.

Numerical simulations
In this section we investigate the finite sample performance of the proposed method in detecting dependence between discrete random variables through two simulation scenarios. We consider random vectors with bivariate Poisson and Negative Binomial distributions, which are the most commonly used in practice. For comparison purposes, in the Poisson case we also evaluate the performance of the tests of independence in Paul and Ho (1989) (P&H), Stern and Zacks (2002) (S&Z), and Bower et al. (2019) (B. et al.), as well as the Pearson's test. In addition, we also included the classical Permutation test, computing the sample correlation coefficient over 500 permuted data sets. Adopting Paul and Ho (1989) and Stern and Zacks (2002)'s simulation setup, in each simulation run we generate $n = 10$, 20, or 50 samples from the independent Poisson random variables $M_0, M_1, M_2$ with rate parameters $\lambda_0, \lambda_1, \lambda_2$ respectively, and, following their notation, we write $\theta_0 := \lambda_0$ and generate $T = (T_1, T_2)$ as $T_1 = M_1 + M_0 \sim \mathrm{Poi}(\theta_1 := \lambda_1 + \lambda_0)$ and $T_2 = M_2 + M_0 \sim \mathrm{Poi}(\theta_2 := \lambda_2 + \lambda_0)$. The rejection rates, at significance level 0.05, of the proposed likelihood ratio test and the competing tests, based on 500 Monte Carlo simulation runs, are reported in Tables 1 and 2 for $\theta_1 = 1$, $\theta_2 = 0.5, 1, 1.5$, and 2, with increasing values of $\theta_0$ ($\theta_0 = 0$ yields the empirical level). For our method, the rejection rates are calculated with the permutation-based procedure (Section 2.4) using 500 random permutations for each simulated data set.
Table 1. Rejection rates of the tests of independence for bivariate Poisson out of 500 iterations when $\theta_1 = 1$ and $\theta_2 = 0.5$ or 1.
Table 2. Rejection rates of the tests of independence for bivariate Poisson out of 500 iterations when $\theta_1 = 1$ and $\theta_2 = 1.5$ or 2.
The numerical results in Tables 1 and 2 suggest that all methods under comparison achieve empirical levels close to nominal in all cases. In addition, the proposed method shows comparable or better power than (P&H), (S&Z), (B. et al.), the Pearson's test, and the classical Permutation test in all simulations. In particular, as the dependence between $T_1$ and $T_2$ increases with higher values of $\lambda_0$, the power of the proposed test improves at a rate that is significantly faster than those of Paul and Ho (1989)'s and Bower et al. (2019)'s tests, while remaining competitive with or better than the power of Stern and Zacks (2002)'s test.
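The Poisson simulation design above can be reproduced in a few lines. For illustration we evaluate $2\mathrm{LR}$ with simple moment plug-in estimates of $(\lambda_0, \lambda_1, \lambda_2)$ in the Holgate bivariate Poisson likelihood, rather than the full ML/EM fit used in the paper, so the numbers are only indicative; the seed, sample size, and clamping constants are arbitrary.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw (Knuth's method)."""
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def pois_logpmf(t, lam):
    return -lam + t * math.log(lam) - math.lgamma(t + 1)

def holgate_logpmf(t1, t2, l1, l2, l0):
    """Holgate bivariate Poisson pmf of (T1, T2) = (M1 + M0, M2 + M0)."""
    s = sum(l0 ** m / math.factorial(m)
            * l1 ** (t1 - m) / math.factorial(t1 - m)
            * l2 ** (t2 - m) / math.factorial(t2 - m)
            for m in range(min(t1, t2) + 1))
    return -(l0 + l1 + l2) + math.log(s)

# simulation design of Section 4 with lambda0 = lambda1 = lambda2 = 1
rng = random.Random(7)
data = []
for _ in range(200):
    m0 = rpois(1.0, rng)
    data.append((m0 + rpois(1.0, rng), m0 + rpois(1.0, rng)))

n = len(data)
mean1 = sum(a for a, _ in data) / n
mean2 = sum(b for _, b in data) / n
cov = sum((a - mean1) * (b - mean2) for a, b in data) / n
# moment plug-ins: lambda0 ~ covariance, clamped so all rates stay > 0
l0 = max(min(cov, mean1 - 1e-6, mean2 - 1e-6), 1e-6)
two_lr = 2 * sum(holgate_logpmf(a, b, mean1 - l0, mean2 - l0, l0)
                 - pois_logpmf(a, mean1) - pois_logpmf(b, mean2)
                 for a, b in data)
```

A clearly positive `two_lr` reflects the dependence injected through $M_0$; the permutation procedure of Section 2.4 would then be applied to calibrate it.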
Notice that Paul and Ho (1989)'s, Stern and Zacks (2002)'s, and Bower et al. (2019)'s methods are only applicable to the Poisson distribution, while the proposed method is applicable to any discrete distribution that satisfies the assumptions in Theorem 2.1. In practice, a key disadvantage of the Poisson distribution is that it may not be appropriate when the data exhibit over-dispersion. In such cases, a Negative Binomial distribution may be preferred. In what follows, we investigate the proposed likelihood ratio test procedure for testing independence in bivariate Negative Binomial data.
For comparison purposes, in this second simulation scenario we include the results of the Pearson's test and the classical Permutation test. Let $r_0, r_1, r_2$ be the size parameters of the independent Negative Binomial random variables $M_0, M_1, M_2$ respectively. Define $T_1 = M_1 + M_0 \sim \mathrm{NB}(r_0 + r_1, p)$ and $T_2 = M_2 + M_0 \sim \mathrm{NB}(r_0 + r_2, p)$. In each simulation, $n = 10$, 20, or 50 samples of $(T_1, T_2)$ were generated with $r_1 = 1$, $r_2 = 1$, 2, or 3, $p$ ranging from 0.2 to 0.5, and different values of $r_0$ for increased dependence between $T_1$ and $T_2$. Note that as $p$ increases from 0.2 to 0.5, the over-dispersion of the distribution increases. To compute the empirical levels of the tests, we generate $T_1 = M_1$ and $T_2 = M_2$ under the null hypothesis. Table 3 shows the empirical levels and Tables 4-6 exhibit the rejection rates based on 500 Monte Carlo simulation runs using the permutation-based procedure in Section 2.4.
The results displayed in Table 3 suggest that the proposed test achieves an empirical level close to the nominal level, while being occasionally slightly liberal (within a couple of percentage points). The exception is when the sample size is small ($n = 10$) and the probability of success $p$ is small ($p = 0.2$ or 0.3), for which the test is significantly liberal. This is because, for the choices of $r_1$ and $r_2$ in this simulation scenario, small values of $p$ yield very low simulated counts, many of which are equal to 0, so that it is challenging for the permutation procedure to find a good approximation of the null distribution.
Table 3. Empirical level of the tests of independence for bivariate Negative Binomial.
The rejection rates in Tables 4-6 suggest that, for $n = 20$, the power of the proposed test is about 10 percentage points higher than that of the Pearson's test for small to medium values of $r_0$ ($r_0 = 1, 3$), and slightly higher for large values of $r_0$ ($r_0 = 6$). For the larger sample size $n = 50$, this difference is maintained when $r_0 = 1$, but both tests have power near 1 when $r_0 = 3$ or 6. Moreover, the proposed test has higher power than the Permutation test in almost all cases.
Table 4. Rejection rates of the tests of independence for bivariate Negative Binomial ($n = 10$).
Table 5. Rejection rates of the tests of independence for bivariate Negative Binomial ($n = 20$).
Table 6. Rejection rates of the tests of independence for bivariate Negative Binomial ($n = 50$).

Real data application
To illustrate the performance of the proposed test in practice, we consider a real data set from the 1977-1978 Australian Health Survey, previously studied in Cameron and Trivedi (1998) and Bower et al. (2018). The data set consists of $n = 5190$ paired observations of two discrete variables: $T_1$ = the number of doctor consultations and $T_2$ = the number of prescribed medications. As shown in Table 7, the observed $(0, 0)$ cell has the highest frequency in the data and, moreover, the marginal 0 counts for $T_1$ and $T_2$ are also high, indicating that a simple Poisson model may not be adequate and that a zero-inflated Poisson (ZIP) or a Negative Binomial model, for instance, may yield a better fit. Therefore, the bivariate Poisson model in Bower et al. (2018) and other existing methods (Paul and Ho (1989); Stern and Zacks (2002)) are not appropriate in this case; indeed, Bower et al. (2018) indicate that their method is not appropriate here and that an extension to the ZIP case is necessary. The flexible framework of our proposal allows us to consider a ZIP distribution. We first investigate the goodness of fit of the Poisson, Negative Binomial, and ZIP models. The log-likelihood, AIC, and BIC for each model are shown in Table 8. The results corroborate the fact that the Poisson distribution is not adequate, while the Negative Binomial and ZIP distributions both fit the data well, with very similar goodness-of-fit metrics.
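The goodness-of-fit screening described above can be sketched for a single margin as follows. For brevity the NB parameters are set by moment matching rather than full ML, and the toy counts are invented to mimic over-dispersion; with real data one would fit each candidate (Poisson, NB, ZIP) by ML and compare $\mathrm{AIC} = 2k - 2\log L$ and BIC.

```python
import math

def pois_loglik(data, lam):
    return sum(-lam + t * math.log(lam) - math.lgamma(t + 1) for t in data)

def nb_loglik(data, r, p):
    return sum(math.lgamma(t + r) - math.lgamma(r) - math.lgamma(t + 1)
               + t * math.log(p) + r * math.log1p(-p) for t in data)

# toy over-dispersed counts (variance far above the mean), many zeros
data = [0] * 30 + [1] * 10 + [2] * 5 + [5] * 3 + [8] * 2
n = len(data)
mean = sum(data) / n
var = sum((t - mean) ** 2 for t in data) / n

# NB moment matching: mean = r p / (1 - p), var = r p / (1 - p)^2
p_hat = 1.0 - mean / var
r_hat = mean * (1.0 - p_hat) / p_hat

aic_pois = 2 * 1 - 2 * pois_loglik(data, mean)       # 1 parameter
aic_nb = 2 * 2 - 2 * nb_loglik(data, r_hat, p_hat)   # 2 parameters
```

For counts this over-dispersed the NB fit wins by a wide margin despite its extra parameter, mirroring the conclusion drawn from Table 8 for the health-survey counts.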
The proposed likelihood ratio test of independence, modeled with either a zero-inflated Poisson or a Negative Binomial distribution, yielded a p-value < 0.0001, computed through the permutation-based procedure with 500 permutations. This provides very strong evidence against the null hypothesis of independence: under either model, the number of prescribed medications and the number of doctor consultations are not independent count variables.