Wald-Type Testing and Estimation Methods for Asymmetric Comparisons of Poisson Rates

This article considers the problem of asymmetric comparisons of Poisson rates, that is, settings where one treatment serves as the compelling standard treatment for a certain disease or health condition. We propose asymptotic Wald-type tests and confidence intervals suitable for demonstrating superiority and noninferiority for a given margin. The testing methods for the difference and ratio are based on the unconstrained (MLE) and constrained (CMLE) maximum likelihood estimators. The resulting CMLE-based tests for asymmetric comparisons are equivalent to the standard (i.e., those for symmetric comparisons) CMLE-based tests. The asymmetric tests as well as the standard MLE-based tests are evaluated via simulations. The CMLE-based asymmetric tests are shown to adequately control the Type I error in most settings, while the MLE-based asymmetric tests are shown to have this property only in settings with relatively large sample sizes and means. In all settings, the MLE-based asymmetric tests are more powerful than both the asymmetric CMLE-based and the standard MLE-based tests. We also propose asymptotic confidence intervals that can be used to estimate the difference or ratio of the two rates (with part of this material presented in the online supplement), and discuss how power and sample size estimation can be done to aid in study planning. Supplementary materials for this article are available online.


Introduction
The Poisson distribution is commonly encountered in practice, for example, in transplantation studies (Rochon, Aprile, and Cardella 1992). Therefore, not surprisingly, many statistical methods concerning testing or estimation of the Poisson rate have been developed and widely discussed (Pearson and Hartley 1966; Hall 1982; Schwertman and Martinez 1994b; Brown, Cai, and Dasgupta 2001; Barker 2002; Byrne and Kabaila 2005; Johnson, Kemp, and Kotz 2005; Swift 2009; Khamkong 2012; Patil and Kulkarni 2012). In addition, methods for the difference of the Poisson rates (Miettinen and Nurminen 1985; Weissfeld, Laurent, and Moulton 1991; Schwertman and Martinez 1994a; Stamey and Hamilton 2006; Liu et al. 2006; Ng, Gu, and Tang 2007; Li et al. 2011; Krishnamoorthy and Lee 2013; Bright and Soulakova, in press) and ratio of the Poisson rates (Nelson 1970; Sahai and Khurshid 1993; Price and Bonett 2000; Liu 2004; Tang and Ng 2004; Ng and Tang 2005; Barker and Cadwell 2008; Gu et al. 2008; Krishnamoorthy and Lee 2010; Li, Tang, and Wong 2014) have also been discussed. Among the different methods for the difference and ratio of Poisson rates, the Wald-type hypothesis testing methods are among the most commonly addressed in the literature.
All available methods regarding the difference and ratio of the Poisson rates are proposed for symmetric comparisons, that is, when "neither group represents a compelling standard for the other" (Falk and Koch 1998, p. 1604). However, there are settings where the reference treatment does serve as the compelling standard, for example, a study presented by Anbar (1983). In the latter case, the statistical method should reflect this "asymmetry" of the comparison by giving more weight to information obtained from the reference group. If an asymmetric comparison is of interest, then the study should be planned accordingly; for example, the power and group sample sizes must be estimated for the asymmetric tests that will be used in the data analysis. Asymmetric comparisons have been discussed with respect to the difference of two proportions by Anbar (1983), Mee (1984), Falk and Koch (1998), Roy (2011), and Bright (2013). However, no such methods have been proposed for Poisson outcomes. In this article, we propose hypothesis testing and interval-estimation methods for asymmetric comparisons in terms of the difference and ratio of two Poisson rates.
The article is outlined as follows. The problem formulation and proposed tests for asymmetric comparisons based on the unconstrained maximum likelihood estimator (MLE), as well as those based on the constrained maximum likelihood estimator (CMLE), are presented in Section 2.1. Approximate power of the tests and sample size computation are addressed in Section 2.2. The performance of the proposed tests is evaluated and compared to that of the standard tests for symmetric comparisons via a simulation study presented in Section 2.3, and applications are illustrated via an example presented in Section 2.4. Confidence intervals for asymmetric comparisons, as well as an evaluation of the performance of these confidence intervals, are provided in Section 3. The article concludes with a discussion, given in Section 4.

Problem Formulation and Proposed Wald-Type Tests
Suppose that the study goal is to determine whether the experimental treatment is more beneficial than (or noninferior to) some reference treatment, when the parameter of interest is the rate of occurrences of some phenomenon, for example, incidence of a disease. This problem is commonly stated as a hypothesis testing problem for the difference or ratio of the Poisson rates (Ng, Gu, and Tang 2007; Gu et al. 2008). In particular, if the goal is to assess whether the benefits of the experimental treatment exceed the benefits of the reference treatment by some fixed margin $d$ ($d$ could be any real number), then the hypothesis testing problem can be stated as $H_{0,d}: \lambda_1 - \lambda_0 = d$ with the corresponding alternative hypothesis $H_{1,d}: \lambda_1 - \lambda_0 > d$, where $\lambda_1$ and $\lambda_0$ denote the rate, that is, the true mean number of occurrences of the event of interest per some unit of time (or space), for the experimental and reference treatments, respectively. Another formulation corresponds to demonstrating that the benefits of the experimental treatment exceed the benefits of the reference treatment by some specified percentage, that is, the null hypothesis $H_{0,r}: \lambda_1 = r\lambda_0$ is tested against the alternative hypothesis $H_{1,r}: \lambda_1 > r\lambda_0$, where $r$ is the specified percentage (in decimal notation), also termed margin, such that $r > 0$. Note that when $d = 0$ and $r = 1$, the hypotheses for the difference and ratio reduce to the same set of hypotheses.
We consider a study that has two groups. Let $X_i$ denote the total number of occurrences in the $i$th treatment group such that $X_i \sim \text{Poisson}(\lambda_i t_i)$, where $t_i$ is the total number of time units in the $i$th group. If a study deals with the number of occurrences per subject, and a subject is treated as one unit, then $t_i$ corresponds to the group sample size. For simplicity of wording, we will refer to $t_i$ as the $i$th group sample size.
To test the above hypotheses, we construct Wald-type tests that differentiate between the experimental and reference treatments in the sense that more weight is given to the information obtained from the reference treatment group than to that obtained from the experimental treatment group. Let $E(\cdot)$ and $V(\cdot)$ denote the expectation and variance, respectively. First, we consider the random variable $W_d = X_1/t_1 - X_0/t_0 - d$. Note that under the null hypothesis $H_{0,d}$, $E(W_d) = 0$ and $V(W_d) = \lambda_1/t_1 + \lambda_0/t_0$ can be written as $V_0(W_d) = (\lambda_0 + d)/t_1 + \lambda_0/t_0$. Second, we consider another random variable $W_r = X_1/t_1 - rX_0/t_0$. Under the null hypothesis $H_{0,r}$, $E(W_r) = 0$ and $V(W_r) = \lambda_1/t_1 + r^2\lambda_0/t_0$ can be expressed as $V_0(W_r) = r\lambda_0/t_1 + r^2\lambda_0/t_0$.
Third, to construct a test statistic, we estimate the true rate $\lambda_0$ in the variances $V_0(W_d)$ and $V_0(W_r)$. For this purpose, the (unconstrained) maximum likelihood estimator (MLE) can be used. The MLE of $\lambda_0$ is given as $\hat{\lambda}_0 = X_0/t_0$ (Gu et al. 2008). Using this estimate, we construct the following test statistics, termed A-statistics. In the case of testing $H_{0,d}$ versus $H_{1,d}$, the A-statistic is
$$A_{MLE,d} = \frac{X_1/t_1 - X_0/t_0 - d}{\sqrt{(\hat{\lambda}_0 + d)/t_1 + \hat{\lambda}_0/t_0}}.$$
Similarly, we construct the A-statistic that corresponds to testing $H_{0,r}$ against $H_{1,r}$ as
$$A_{MLE,r} = \frac{X_1/t_1 - rX_0/t_0}{\sqrt{r\hat{\lambda}_0/t_1 + r^2\hat{\lambda}_0/t_0}}.$$
A similar approach can be used to construct CMLE-based test statistics, that is, statistics in which the constrained (under the null hypothesis) maximum likelihood estimator (CMLE) for $\lambda_0$ is used to estimate the true rate $\lambda_0$ in the variances $V_0(W_d)$ and $V_0(W_r)$. In particular, the CMLE for $\lambda_0$ under $H_{0,d}$ is given by
$$\hat{\lambda}_d = \frac{X_1 + X_0 - (t_1 + t_0)d + \sqrt{[X_1 + X_0 - (t_1 + t_0)d]^2 + 4(t_1 + t_0)X_0 d}}{2(t_1 + t_0)}$$
(Ng, Gu, and Tang 2007), which simplifies to $\hat{\lambda}_p = (X_1 + X_0)/(t_1 + t_0)$ when $d = 0$. The CMLE for $\lambda_0$ under $H_{0,r}$ is given by $\hat{\lambda}_r = (X_1 + X_0)/(rt_1 + t_0)$ (Gu et al. 2008). Then the corresponding statistics are as follows.
In the case of testing $H_{0,d}$ versus $H_{1,d}$, the CMLE-based statistic is
$$A_{CMLE,d} = \frac{X_1/t_1 - X_0/t_0 - d}{\sqrt{(\hat{\lambda}_d + d)/t_1 + \hat{\lambda}_d/t_0}},$$
where $\hat{\lambda}_d$ denotes the CMLE of $\lambda_0$ under $H_{0,d}$. In the special case of $d = 0$, $A_{CMLE,d}$ simplifies to
$$A_{CMLE,d=0} = \frac{X_1/t_1 - X_0/t_0}{\sqrt{\hat{\lambda}_p\,(1/t_1 + 1/t_0)}}, \qquad \hat{\lambda}_p = \frac{X_1 + X_0}{t_1 + t_0}.$$
The CMLE-based statistic in the case of testing $H_{0,r}$ against $H_{1,r}$ is
$$A_{CMLE,r} = \frac{X_1/t_1 - rX_0/t_0}{\sqrt{r\hat{\lambda}_r/t_1 + r^2\hat{\lambda}_r/t_0}}, \qquad \hat{\lambda}_r = \frac{X_1 + X_0}{rt_1 + t_0}.$$
We note, though, that the above CMLE-based statistics for asymmetric comparisons reduce to the score statistics given by Ng, Gu, and Tang (2007) and Gu et al. (2008), and the same holds for the corresponding tests. Asymptotically, the null distribution of an A-statistic is standard normal. Let $P(Z > z_\alpha) = \alpha$, where $Z \sim \text{Normal}(0, 1)$. Then the asymptotic asymmetric MLE-based test for the difference (AMD) and CMLE-based test for the difference (ACD) reject $H_{0,d}$ in favor of $H_{1,d}$ at level $\alpha$ (provided that $\lambda_i t_i$ is sufficiently large for $i = 0, 1$) if $A_{MLE,d} > z_\alpha$ and $A_{CMLE,d} > z_\alpha$, respectively. Similarly, the asymptotic asymmetric MLE-based test for the ratio (AMR) and CMLE-based test for the ratio (ACR) reject $H_{0,r}$ in favor of $H_{1,r}$ at level $\alpha$ if $A_{MLE,r} > z_\alpha$ and $A_{CMLE,r} > z_\alpha$, respectively. Note that in the case of $d = 0$ and $r = 1$, the AMD and AMR are identical, as are the ACD and ACR. Similarly, asymptotic Wald-type normal tests based on the A-statistics can be constructed for asymmetric comparisons with other types of the alternative hypothesis.
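The four A-statistics can be sketched in a few lines of code. The block below is a minimal illustration rather than the authors' implementation: it assumes $W_d = X_1/t_1 - X_0/t_0 - d$ and $W_r = X_1/t_1 - rX_0/t_0$, plugs the MLE or CMLE of $\lambda_0$ into the null variance, and takes the CMLE under $H_{0,d}$ as the larger root of the constrained-likelihood score equation. The function names and the counts in the usage lines are hypothetical.

```python
import math


def cmle_difference(x1, x0, t1, t0, d):
    """CMLE of lambda_0 under H0: lambda_1 - lambda_0 = d.

    Assumption: the larger root of the score equation
    (t1 + t0) * l**2 + ((t1 + t0) * d - x1 - x0) * l - x0 * d = 0.
    """
    total_t = t1 + t0
    b = x1 + x0 - total_t * d
    return (b + math.sqrt(b * b + 4.0 * total_t * x0 * d)) / (2.0 * total_t)


def a_statistics_difference(x1, x0, t1, t0, d=0.0):
    """Return (A_MLE_d, A_CMLE_d) for H0: lambda_1 - lambda_0 = d."""
    w_d = x1 / t1 - x0 / t0 - d
    lam_mle = x0 / t0
    lam_cmle = cmle_difference(x1, x0, t1, t0, d)
    # Null variance (lambda_0 + d)/t1 + lambda_0/t0 with lambda_0 estimated.
    var_mle = (lam_mle + d) / t1 + lam_mle / t0
    var_cmle = (lam_cmle + d) / t1 + lam_cmle / t0
    if var_mle <= 0 or var_cmle <= 0:
        raise ValueError("estimated null variance is not positive")
    return w_d / math.sqrt(var_mle), w_d / math.sqrt(var_cmle)


def a_statistics_ratio(x1, x0, t1, t0, r=1.0):
    """Return (A_MLE_r, A_CMLE_r) for H0: lambda_1 = r * lambda_0."""
    w_r = x1 / t1 - r * x0 / t0
    lam_mle = x0 / t0
    lam_cmle = (x1 + x0) / (r * t1 + t0)
    # Null variance r*lambda_0/t1 + r^2*lambda_0/t0 = lambda_0 * scale.
    scale = r / t1 + r * r / t0
    if lam_mle <= 0:
        raise ValueError("the MLE-based statistic needs x0 > 0")
    return w_r / math.sqrt(lam_mle * scale), w_r / math.sqrt(lam_cmle * scale)


# Hypothetical counts: 130 events in 50 units versus 100 events in 50 units.
amd, acd = a_statistics_difference(x1=130, x0=100, t1=50, t0=50, d=0.0)
amr, acr = a_statistics_ratio(x1=130, x0=100, t1=50, t0=50, r=1.0)
print(f"AMD={amd:.3f} ACD={acd:.3f} AMR={amr:.3f} ACR={acr:.3f}")
```

With $d = 0$ and $r = 1$ the difference and ratio versions coincide, which gives a quick sanity check on any implementation.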
In addition to these asymmetric tests, we consider the standard Wald MLE-based symmetric tests for the difference (NMD), discussed by Ng, Gu, and Tang (2007), and for the ratio (GMR), discussed by Gu et al. (2008). In the case of $d = 0$ and $t_0 = t_1$, the NMD and ACD are equivalent. Similarly, in the case of $r = 1$ and $t_0 = t_1$, the GMR and ACR are equivalent.

On Power Approximation
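The approximate power function of the proposed tests for asymmetric comparisons is derived similarly to a method used to approximate the power of normal tests for the difference of binomial proportions (Falk and Koch 1998; Soulakova and Roy 2012). In the case of testing $H_{0,d}$ against $H_{1,d}$ via the tests given above, this method is as follows. First, consider $\lambda_0 = \lambda_0^a$ and $\lambda_1 = \lambda_1^a$ such that $d_a = \lambda_1^a - \lambda_0^a > d$. Under this specific alternative, $E_a(W_d) = d_a - d$ and $V_a(W_d) = \lambda_1^a/t_1 + \lambda_0^a/t_0$, while the null variance evaluated at $\lambda_0^a$ is $V_0^a(W_d) = (\lambda_0^a + d)/t_1 + \lambda_0^a/t_0$. Since the distribution of $[W_d - E_a(W_d)]/\sqrt{V_a(W_d)}$ under the specific alternative is asymptotically standard normal, the power function of the AMD or ACD can be approximated using
$$\Phi\left(\frac{d_a - d - z_\alpha\sqrt{(\lambda_0^a + d)/t_1 + \lambda_0^a/t_0}}{\sqrt{\lambda_1^a/t_1 + \lambda_0^a/t_0}}\right), \qquad (1)$$
where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal random variable. Similarly, for given values $\lambda_0 = \lambda_0^a$ and $\lambda_1 = \lambda_1^a$ such that $r_a = \lambda_1^a/\lambda_0^a > r$, the approximate power function of the AMR and ACR is given by
$$\Phi\left(\frac{(r_a - r)\lambda_0^a - z_\alpha\sqrt{r\lambda_0^a/t_1 + r^2\lambda_0^a/t_0}}{\sqrt{\lambda_1^a/t_1 + r^2\lambda_0^a/t_0}}\right). \qquad (2)$$
Note that the power functions do not take into account the specific type of estimator, MLE or CMLE, used in the test statistic. Table 1 presents the computed power values for some specific configurations. First, we note that when the total sample size, the margin, and the rates are fixed, balanced allocations result in the highest computed power when compared with designs with unbalanced allocations in all considered settings.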
Next we assess the accuracy of the proposed approximate power. For this purpose, the power is computed via (1) or (2) for each given set of the rates (under a specific alternative hypothesis), fixed value of the margin $d$ (or $r$), and group sample sizes $t_0$ and $t_1$. Additionally, for the same setting, the power is assessed via simulations (based on 100,000 simulation runs). The discrepancy between the approximate and simulated power is computed for each test as the absolute value of the difference between the approximate and simulated power. The accuracy is assessed using the maximum discrepancy and the average discrepancy (computed across all configurations). Table 1 presents the results. As is illustrated, the approximate power may overestimate or underestimate the (simulated) power for each test. For the tests for the difference, the largest discrepancy between the approximate and simulated power is 1.3% for the AMD and 4.3% for the ACD. On average, the approximate power provides a better estimate of the power of the AMD (the overall average discrepancy is about 0.6%) than of the power of the ACD (the overall average discrepancy is about 1.7%). When testing for the ratio, the largest observed discrepancy is 2.8% for the ACR and 2.2% for the AMR, and the approximate power appears to be reasonably close to the power of either test: the overall average discrepancy is 0.6% for the AMR and 1.1% for the ACR. Table 1 also illustrates that the approximate power is reasonably close to the simulated power in all settings with relatively large means, that is, in settings where the total sample size is 300. Thus, for relatively large values of the means, the power of any test can be reasonably approximated using (1) and (2).
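A normal-approximation power calculation of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact form of (1) and (2): the rejection threshold uses the null variance evaluated at the assumed reference rate, and the statistic is standardized by its variance under the alternative. The function names and the rates in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_difference(lam0_a, lam1_a, t0, t1, d=0.0, alpha=0.05):
    """Approximate power for H0: lambda_1 - lambda_0 = d vs. '>'."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = lam1_a - lam0_a - d                  # E(W_d) under the alternative
    var_null = (lam0_a + d) / t1 + lam0_a / t0   # V_0(W_d) at lam0_a
    var_alt = lam1_a / t1 + lam0_a / t0          # V(W_d) under the alternative
    return _STD_NORMAL.cdf(
        (shift - z_alpha * math.sqrt(var_null)) / math.sqrt(var_alt)
    )


def power_ratio(lam0_a, lam1_a, t0, t1, r=1.0, alpha=0.05):
    """Approximate power for H0: lambda_1 = r * lambda_0 vs. '>'."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = lam1_a - r * lam0_a                      # E(W_r) under the alternative
    var_null = r * lam0_a / t1 + r * r * lam0_a / t0
    var_alt = lam1_a / t1 + r * r * lam0_a / t0
    return _STD_NORMAL.cdf(
        (shift - z_alpha * math.sqrt(var_null)) / math.sqrt(var_alt)
    )


# Hypothetical design: total size 100 split three ways, d = -0.5.
balanced = power_difference(2.0, 2.0, t0=50, t1=50, d=-0.5)
more_experimental = power_difference(2.0, 2.0, t0=25, t1=75, d=-0.5)
more_reference = power_difference(2.0, 2.0, t0=75, t1=25, d=-0.5)
print(f"{balanced:.3f} {more_experimental:.3f} {more_reference:.3f}")
```

On the boundary of the null hypothesis (for instance, `lam1_a = lam0_a + d`) both variances coincide and the function returns `alpha`, which is a convenient consistency check.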

On Sample Size Estimation
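In this section, we discuss how the approximate power in (1) and (2) can be used to derive formulas suitable for estimating the minimum required sample size under some constraints. We consider the settings where $t_1 = kt_0$, where $k$ is the specified allocation constant ($k > 0$). For given values of the parameters $\lambda_0^a$ and $\lambda_1^a$, significance level $\alpha$, and desired power $100\% - \beta$, where $\beta$ denotes the probability (in %) of a Type II error (under $\lambda_0^a$ and $\lambda_1^a$), we solve the inequality $P \ge 100\% - \beta$ for $t_0$, where $P$ is the power given in the right-hand side in (1) or (2). As a result, we estimate the minimum required reference group size for the AMD (ACD) by the smallest integer exceeding $t_{0,d}$, where
$$t_{0,d} = \frac{\left[z_\alpha\sqrt{(\lambda_0^a + d)/k + \lambda_0^a} + z_\beta\sqrt{\lambda_1^a/k + \lambda_0^a}\right]^2}{(\lambda_1^a - \lambda_0^a - d)^2} \qquad (3)$$
and $z_\beta$ is the upper quantile of the standard normal distribution corresponding to the Type II error probability, that is, $P(Z > z_\beta) = \beta$ with $\beta$ expressed as a proportion. Similarly, the minimum required reference group size corresponding to the AMR (ACR) is the smallest integer exceeding $t_{0,r}$, where
$$t_{0,r} = \frac{\left[z_\alpha\sqrt{r\lambda_0^a/k + r^2\lambda_0^a} + z_\beta\sqrt{\lambda_1^a/k + r^2\lambda_0^a}\right]^2}{(\lambda_1^a - r\lambda_0^a)^2}. \qquad (4)$$
Following this calculation, the experimental group sample size, $t_{1,d}$ ($t_{1,r}$), is computed using $t_{1,d} = kt_{0,d}$ ($t_{1,r} = kt_{0,r}$). A similar approach can be used to derive formulas for other design settings, for example, when the total sample size is fixed at $n$ and $t_1 = n - t_0$. Table 2 presents the computed sample sizes using the proposed formulas for different allocation strategies and parameter values.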
The adequacy of the proposed sample sizes is assessed via simulations for $\alpha = 5\%$ and power of 80%, that is, $\beta = 20\%$. First, for each set of rates, fixed value of the margin $d$ (or $r$), and a specified allocation constant $k$, the group sample sizes are computed via the proposed formulas. Then these values are used in a simulation study to determine if the actual power of a test is close to the desired 80% level. Table 2 presents the results. As is expected from the power study discussed in the section above, balanced allocations result in the smallest total sample size (when the other parameters are fixed). Additionally, the proposed sample size formulas (3) and (4) are more suitable for estimating the sample size requirement for the AMD than for the ACD, while these sample size formulas are appropriate for both the AMR and ACR.
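The sample size calculation can be illustrated by solving a normal-approximation power inequality for $t_0$ with $t_1 = kt_0$. The sketch below is written under that assumption and may differ in form from formulas (3) and (4); here `beta` is a proportion rather than a percentage, and all function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def reference_size_difference(lam0_a, lam1_a, d, k, alpha=0.05, beta=0.20):
    """Reference-group size for the difference tests, with t1 = k * t0."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1.0 - beta)
    numerator = (z_alpha * math.sqrt((lam0_a + d) / k + lam0_a)
                 + z_beta * math.sqrt(lam1_a / k + lam0_a))
    t0_exact = (numerator / (lam1_a - lam0_a - d)) ** 2
    return math.floor(t0_exact) + 1  # smallest integer exceeding t0_exact


def reference_size_ratio(lam0_a, lam1_a, r, k, alpha=0.05, beta=0.20):
    """Reference-group size for the ratio tests, with t1 = k * t0."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1.0 - beta)
    numerator = (z_alpha * math.sqrt(r * lam0_a / k + r * r * lam0_a)
                 + z_beta * math.sqrt(lam1_a / k + r * r * lam0_a))
    t0_exact = (numerator / (lam1_a - r * lam0_a)) ** 2
    return math.floor(t0_exact) + 1


def approx_power_difference(lam0_a, lam1_a, d, t0, t1, alpha=0.05):
    """Normal-approximation power, used only to check the computed size."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    var_null = (lam0_a + d) / t1 + lam0_a / t0
    var_alt = lam1_a / t1 + lam0_a / t0
    shift = lam1_a - lam0_a - d
    return _STD_NORMAL.cdf(
        (shift - z_alpha * math.sqrt(var_null)) / math.sqrt(var_alt)
    )


# Hypothetical planning inputs with balanced allocation (k = 1).
t0_d = reference_size_difference(2.0, 2.0, d=-0.5, k=1.0)
t0_r = reference_size_ratio(3.0, 6.0, r=1.5, k=1.0)
check = approx_power_difference(2.0, 2.0, -0.5, t0_d, 1.0 * t0_d)
print(t0_d, t0_r, round(check, 3))
```

Feeding the computed size back into the power approximation, as in the last lines, confirms that the target power is reached and that one fewer unit per group would fall short.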

Type I Error Rate and Power of the Tests
A simulation study was conducted to investigate the control of the Type I error rate and power of the four asymmetric Wald-type tests (AMD, ACD, AMR, and ACR) in comparison to the Wald MLE-based symmetric tests (NMD and GMR). The specific goals were (1) among the considered settings with relatively small, moderate, and … (3) confirm that balanced designs result in higher power than unbalanced ones (for a fixed total size, margin, parameter values, and significance level). For each simulation configuration specified in Tables 3 and 4, the data were generated from two independent Poisson distributions, $\text{Poisson}(\lambda_i t_i)$, $i = 0, 1$. Then, the AMD, ACD, and NMD, or the AMR, ACR, and GMR were performed at a 5% significance level, and the corresponding decisions in terms of accepting or rejecting the null hypothesis were noted. The above steps were repeated 100,000 times. The percentage of replications where a true null hypothesis was rejected provided the estimated Type I error rate. We say that the simulations indicate inflation of the Type I error rate if the observed values are at least 5.14%, where $5.14\% = 5\% + 1.96\sqrt{5\% \times 95\%/100{,}000}$. The proportion of replications where a false null hypothesis was rejected provided the estimated power.
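The simulation procedure can be mimicked on a small scale. The sketch below is only an illustration: it covers the special case $d = 0$, uses far fewer replications than the 100,000 reported, and draws Poisson counts with Knuth's multiplication method, which is adequate only for modest means. The function names, the seed, and the rate and sample sizes in the usage lines are hypothetical choices.

```python
import math
import random


def inflation_threshold(alpha=0.05, runs=100_000, z=1.96):
    """Smallest observed rejection rate flagged as Type I error inflation."""
    return alpha + z * math.sqrt(alpha * (1.0 - alpha) / runs)


def draw_poisson(rng, mean):
    """Poisson draw via Knuth's multiplication method (modest means only)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_type1_d0(lam, t0, t1, runs=5_000, z_alpha=1.6449, seed=1):
    """Estimated Type I error of the AMD and ACD under lambda_1 = lambda_0."""
    rng = random.Random(seed)
    scale = 1.0 / t1 + 1.0 / t0
    reject_amd = reject_acd = 0
    for _ in range(runs):
        x0 = draw_poisson(rng, lam * t0)
        x1 = draw_poisson(rng, lam * t1)
        w = x1 / t1 - x0 / t0
        var_mle = (x0 / t0) * scale                 # MLE plugged into V_0
        var_cmle = ((x1 + x0) / (t1 + t0)) * scale  # pooled CMLE for d = 0
        # Assumption: a zero variance estimate is counted as a non-rejection.
        if var_mle > 0 and w / math.sqrt(var_mle) > z_alpha:
            reject_amd += 1
        if var_cmle > 0 and w / math.sqrt(var_cmle) > z_alpha:
            reject_acd += 1
    return reject_amd / runs, reject_acd / runs


threshold = inflation_threshold()
amd_rate, acd_rate = simulate_type1_d0(lam=2.0, t0=10, t1=10)
print(f"threshold={threshold:.4f} AMD={amd_rate:.4f} ACD={acd_rate:.4f}")
```

For $d = 0$, whenever the observed difference is positive the pooled rate exceeds the reference-group rate, so every ACD rejection is also an AMD rejection; the simulated AMD rejection rate therefore can never be below that of the ACD.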
Goal 1 Results. Table 3 presents the simulated Type I error rates. As is illustrated, all tests can potentially result in Type I error rate inflation. In addition, the CMLE-based tests (ACD and ACR) and Wald MLE-based symmetric tests (NMD and GMR) consistently result in a smaller Type I error rate than the corresponding MLE-based asymmetric tests (AMD and AMR). The CMLE-based tests and Wald symmetric MLE-based tests inflate the Type I error rate in a few settings. However, the Wald asymmetric MLE-based tests inflate the Type I error rate in several settings, even for relatively large sample sizes, and the largest Type …
Goal 2 Results. Table 4 presents the power results. As is expected, the power of any test highly depends on $\lambda_i^a t_i$, $i = 0, 1$, given that the other simulation parameters are fixed; for example, when $d = -0.5$, $\lambda_0^a = 2$, and $\lambda_1^a = 2$, the power of any of the three tests for the difference ranges from 19% to 82% depending on the values of the group sample sizes. The power of the tests for the difference (ratio) also depends on the relative difference $\lambda_1^a - \lambda_0^a$ (ratio $\lambda_1^a/\lambda_0^a$); for example, when $r = 1.5$ and $t_0 = t_1 = 10$, the power difference can be as large as 19% when settings with $\lambda_1^a/\lambda_0^a = 6/3$ and settings with $\lambda_1^a/\lambda_0^a = 125/80$ are compared.
When the three tests for the difference (ratio) are compared in terms of the power, it can be concluded that the MLE-based asymmetric tests outperform the CMLE-based tests and Wald MLE-based symmetric tests in all settings, where the maximum discrepancy in power is 6.8% when tests for the difference are compared (and 9.2% when tests for the ratio are compared). Whether the CMLE-based tests or the corresponding Wald MLE-based symmetric tests have higher power depends on the parameter settings.
Goal 3 Results. Among different sample size allocations for the designs with the same total sample size (when the other parameters are fixed), the highest power is achieved in the balanced design cases, which agrees with our conclusions stated in Section 2.2.

Example
Boice et al. (1991) investigated the relationship between repeated exposure to X-ray fluoroscopy as part of a treatment regimen and incidence of breast cancer in women. The study involved examining the hospital records of female patients who had been treated for pulmonary tuberculosis between 1925 and 1954 in Massachusetts and determining who received lung collapse treatment using repeated X-ray fluoroscopy (exposed group) and who were unexposed to this source of radiation (unexposed group). In the sample, there are 87 cases of breast cancer over a total of 48,919 person-years observed in the unexposed group and 147 cases of breast cancer observed over a total of 56,965 person-years in the exposed group. Thus, the breast cancer incidence per person-year is computed as 0.00178 and 0.00258 for the unexposed and exposed groups, respectively. The authors concluded that exposure to X-ray fluoroscopy (as part of a treatment regimen) results in excess risk of developing breast cancer. This example has been widely used to illustrate inference methods for the Poisson rates (Graham, Mengersen, and Morton 2003; Ng and Tang 2005; Gu et al. 2008; Krishnamoorthy and Lee 2010; Li et al. 2011; Li, Tang, and Wong 2014).
To illustrate the application of the proposed Wald-type tests, we assume that it is reasonable to treat the group treated without X-ray fluoroscopy as the reference one. In this case, $\lambda_1$ and $\lambda_0$ denote the rates of breast cancer corresponding to the treatment with and without repeated X-ray fluoroscopy, respectively. The significance level is 5%, so we use the critical value $z_{0.05} = 1.645$.
To demonstrate the application of the methods for the asymmetric testing of the difference, we suppose that the objective of the study is to demonstrate that treatment with repeated X-ray fluoroscopy is inferior (i.e., results in a higher rate of breast cancer) to treatment without X-ray fluoroscopy, that is, $H_{0,d}: \lambda_1 - \lambda_0 = 0$ and $H_{1,d}: \lambda_1 - \lambda_0 > 0$. We compute $A_{MLE,d=0} = 3.086 > 1.645$; therefore, the AMD rejects the null hypothesis. The ACD leads to the same conclusion ($A_{CMLE,d=0} = 2.768$).
To demonstrate the application of the methods for the asymmetric testing of the ratio, we suppose that the objective of the study is to demonstrate that the breast cancer rate associated with repeated X-ray fluoroscopy exceeds 115% of the rate of the reference treatment, that is, $\lambda_1 > 1.15\lambda_0$. Thus, we test $H_{0,r}: \lambda_1 = 1.15\lambda_0$ versus $H_{1,r}: \lambda_1 > 1.15\lambda_0$. Since $A_{MLE,r=1.15} = 1.847 > 1.645$, the AMR rejects the null hypothesis: the breast cancer incidence in the group treated with repeated X-ray fluoroscopy exceeds 115% of the incidence in the group treated without X-ray exposure. If we use the ACR, $A_{CMLE,r=1.15} = 1.723$, we reach the same conclusion.
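The four statistics reported for this example can be recomputed from the published counts. The sketch below assumes the statistics have the form $W_r/\sqrt{\hat{V}_0(W_r)}$ with $W_r = X_1/t_1 - rX_0/t_0$, so that $r = 1$ covers the difference test with $d = 0$; the helper name is hypothetical. Under that assumption the output should land close to the values quoted in the text.

```python
import math

# Counts and person-years from the example (unexposed = reference group).
x0, t0 = 87, 48_919
x1, t1 = 147, 56_965


def asymmetric_statistics(x1, x0, t1, t0, r=1.0):
    """Return (MLE-based, CMLE-based) A-statistics for H0: lambda_1 = r*lambda_0.

    With r = 1 this coincides with the difference test for d = 0.
    """
    w = x1 / t1 - r * x0 / t0
    scale = r / t1 + r * r / t0          # null variance is lambda_0 * scale
    lam_mle = x0 / t0                    # unconstrained estimate of lambda_0
    lam_cmle = (x1 + x0) / (r * t1 + t0) # estimate constrained by H0
    return w / math.sqrt(lam_mle * scale), w / math.sqrt(lam_cmle * scale)


a_mle_d0, a_cmle_d0 = asymmetric_statistics(x1, x0, t1, t0, r=1.0)
a_mle_r, a_cmle_r = asymmetric_statistics(x1, x0, t1, t0, r=1.15)

critical_value = 1.645
for label, value in [("AMD", a_mle_d0), ("ACD", a_cmle_d0),
                     ("AMR", a_mle_r), ("ACR", a_cmle_r)]:
    print(f"{label}: {value:.3f} reject={value > critical_value}")
```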

Wald-Type Confidence Intervals for the Difference and Ratio in Asymmetric Settings
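The proposed tests can be used to derive the corresponding asymptotic $100(1 - \alpha)\%$ one-sided confidence intervals for the parameters of interest, that is, the difference and ratio of the rates. The two-sided confidence intervals can be derived similarly. In this section, we provide the one-sided lower confidence intervals equivalent to the AMD, AMR, and ACR. The approach applied with respect to the ACD results in a very complex expression for the lower bound, and thus, it is not presented here. The Appendix illustrates the approach for constructing the asymptotic confidence interval equivalent to the AMD. Let $x_0$ and $x_1$ denote the observed values of $X_0$ and $X_1$, respectively. The resulting asymptotic $100(1 - \alpha)\%$ one-sided lower interval for the difference $\lambda_1 - \lambda_0$ is given by $(L_{MLE,d}, \infty)$, where
$$L_{MLE,d} = \frac{x_1}{t_1} - \frac{x_0}{t_0} + \frac{z_\alpha^2}{2t_1} - z_\alpha\sqrt{\frac{x_1}{t_1^2} + \frac{x_0}{t_0^2} + \frac{z_\alpha^2}{4t_1^2}}.$$
The asymptotic $100(1 - \alpha)\%$ one-sided lower intervals for the ratio $\lambda_1/\lambda_0$ are given by $(L_{MLE,r}, \infty)$ and $(L_{CMLE,r}, \infty)$, where
$$L_{MLE,r} = \frac{t_0\left[2x_1 + z_\alpha^2 - z_\alpha\sqrt{z_\alpha^2 + 4x_1 + 4x_1^2/x_0}\right]}{2t_1(x_0 - z_\alpha^2)},$$
provided that $x_0 > z_\alpha^2$, and
$$L_{CMLE,r} = \frac{t_0\left[2x_1x_0 + z_\alpha^2(x_1 + x_0) - z_\alpha\sqrt{(x_1 + x_0)\left[4x_1x_0 + z_\alpha^2(x_1 + x_0)\right]}\right]}{2t_1x_0^2}.$$
To check the proposed formulas, we also used Mathematica 9.0 (Wolfram Research Inc. 2012).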
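One way to obtain a lower bound equivalent to the AMD is to collect all margins $d$ that the test does not reject and solve for the boundary. The derivation below is a sketch of that inversion, assuming the AMD statistic has the form $(D - d)/\sqrt{(\hat{\lambda}_0 + d)/t_1 + \hat{\lambda}_0/t_0}$; the shorthand $D$ is introduced here only for convenience.

```latex
% Sketch: inverting the AMD to obtain a one-sided lower bound for lambda_1 - lambda_0.
% Shorthand (not from the article): D = x_1/t_1 - x_0/t_0, \hat\lambda_0 = x_0/t_0.
\begin{align*}
\text{$d$ is not rejected}
  &\iff D - d \le z_\alpha \sqrt{(\hat\lambda_0 + d)/t_1 + \hat\lambda_0/t_0}. \\
\intertext{On the boundary, with $d \le D$, squaring both sides gives a quadratic in $d$:}
0 &= d^2 - \Bigl(2D + \frac{z_\alpha^2}{t_1}\Bigr) d
     + D^2 - z_\alpha^2 \hat\lambda_0 \Bigl(\frac{1}{t_1} + \frac{1}{t_0}\Bigr). \\
\intertext{The smaller root is the lower confidence bound:}
d_{\text{low}} &= D + \frac{z_\alpha^2}{2 t_1}
     - z_\alpha \sqrt{\frac{z_\alpha^2}{4 t_1^2} + \frac{D}{t_1}
       + \hat\lambda_0 \Bigl(\frac{1}{t_1} + \frac{1}{t_0}\Bigr)} \\
  &= \frac{x_1}{t_1} - \frac{x_0}{t_0} + \frac{z_\alpha^2}{2 t_1}
     - z_\alpha \sqrt{\frac{x_1}{t_1^2} + \frac{x_0}{t_0^2} + \frac{z_\alpha^2}{4 t_1^2}},
\end{align*}
% since D/t_1 + \hat\lambda_0 (1/t_1 + 1/t_0) = x_1/t_1^2 + x_0/t_0^2.
```

The same steps applied to the ratio statistics lead to quadratics in $r$, whose smaller roots give the lower bounds for $\lambda_1/\lambda_0$.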
To illustrate the proposed confidence intervals, we use the set-up of the example presented in Section 2.4, and construct the asymptotic one-sided 95% confidence intervals. The lower confidence bound for the difference is computed as $L_{MLE,d} = 0.0004$, which is greater than zero. Thus, based on the confidence interval we reject the null hypothesis. The MLE-based lower confidence bound for the ratio is $L_{MLE,r} = 1.178$ and the CMLE-based bound is $L_{CMLE,r} = 1.162$, which both exceed 1.15. Therefore, each inference is consistent with the one based on the corresponding test.
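These bounds can be recomputed by inverting the corresponding test statistics in closed form. The sketch below assumes the bounds are the smaller roots of the quadratics obtained by setting each A-statistic equal to $z_\alpha$; the function name is hypothetical, and the printed values should be close to those quoted above.

```python
import math
from statistics import NormalDist


def lower_bounds(x1, x0, t1, t0, alpha=0.05):
    """One-sided lower bounds obtained by inverting the AMD, AMR, and ACR.

    Returns (bound for lambda_1 - lambda_0, MLE-based bound for
    lambda_1 / lambda_0, CMLE-based bound for lambda_1 / lambda_0).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    z2 = z * z
    if x0 <= z2:
        raise ValueError("the MLE-based ratio bound requires x0 > z_alpha**2")

    diff_mle = (x1 / t1 - x0 / t0 + z2 / (2.0 * t1)
                - z * math.sqrt(x1 / t1**2 + x0 / t0**2 + z2 / (4.0 * t1**2)))

    ratio_mle = (t0 * (2.0 * x1 + z2
                       - z * math.sqrt(z2 + 4.0 * x1 + 4.0 * x1 * x1 / x0))
                 / (2.0 * t1 * (x0 - z2)))

    n = x1 + x0
    ratio_cmle = (t0 * (2.0 * x1 * x0 + z2 * n
                        - z * math.sqrt(n * (4.0 * x1 * x0 + z2 * n)))
                  / (2.0 * t1 * x0 * x0))
    return diff_mle, ratio_mle, ratio_cmle


# Counts and person-years from the example (unexposed = reference group).
l_diff, l_ratio_mle, l_ratio_cmle = lower_bounds(x1=147, x0=87,
                                                 t1=56_965, t0=48_919)
print(f"{l_diff:.4f} {l_ratio_mle:.3f} {l_ratio_cmle:.3f}")
```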
A simulation study was conducted to evaluate the coverage probability of the asymptotic confidence intervals for the difference and ratio of two Poisson rates. These simulations were performed for several rate settings, which include settings with relatively small rates ($\lambda_0 \le \lambda_1$ and $\lambda_0 > \lambda_1$) and various group sample sizes, including balanced and unbalanced settings. The simulation configurations are specified in Tables 5 and 6. For each configuration, random data were generated similarly to the study discussed in Section 2.3. Then the proposed 95% lower confidence intervals were constructed. An estimate of the true coverage probability for each approach was calculated as the percentage of replications where the confidence interval contained the true parameter value (difference of rates, $\lambda_1 - \lambda_0$, or ratio of rates, $\lambda_1/\lambda_0$). Taking into account the simulation error, an interval-estimating method was said to result in under-coverage if the observed coverage probability was 94.87% or below.
Tables 5 and 6 present the coverage probability results for balanced and unbalanced designs, respectively. The simulations do not detect under-coverage for the CMLE-based interval for the ratio in almost all settings. The MLE-based intervals, however, result in under-coverage in settings where at least one group sample size is not sufficiently large, for example, less than 500. The extent of under-coverage does not seem to be more pronounced in balanced settings when compared to the unbalanced ones (or vice versa) with the same total sample size (and other simulation parameters). All methods tend to perform satisfactorily when the sample sizes and means are sufficiently large.

Discussion
This article proposes several asymptotic Wald-type tests and interval-estimation methods for asymmetric comparisons, that is, settings where one treatment (the reference one) serves as the standard treatment. The asymmetric tests reflect this: the variance of the point estimator for the difference or ratio of two Poisson rates (under the null hypothesis) incorporates only the rate corresponding to the reference treatment. The standard error is then computed using one of two commonly used estimators for the Poisson rate, that is, the MLE or CMLE. The CMLE-based asymmetric tests, the ACD and ACR, are shown to be equivalent to the standard (i.e., symmetric) tests proposed by Ng, Gu, and Tang (2007) and Gu et al. (2008). The asymmetric tests, that is, the AMD, ACD, AMR, and ACR, are assessed in terms of Type I error rate and power. The advantages of these tests over the standard MLE-based tests (Ng, Gu, and Tang 2007; Gu et al. 2008) are discussed. The formulas for estimating the approximate power and sample size are provided. In addition, the interval-estimating methods for the difference and ratio of the rates are proposed.
The proposed tests for asymmetric and symmetric comparisons are large-sample tests and are only expected to perform well when sample sizes are sufficiently large. In fact, we have detected several settings with relatively small values of the rates and/or sample sizes where the proposed MLE-based tests inflated the Type I error rate. For example, when testing for the difference with $d = -2$ and $t_0 = t_1 = 300$, settings with $\lambda_0 = 8$ and $\lambda_1 = 6$ resulted in an inflated Type I error rate, while similar settings with $\lambda_0 = 80$ and $\lambda_1 = 78$ did not result in any inflation. However, for sufficiently large values of the rates, all large-sample tests satisfactorily control the Type I error rate. In these cases, the proposed MLE-based tests are more powerful than the corresponding CMLE-based tests and the standard MLE-based tests. The proposed formulas for estimating the power and minimum sample size can be used in practice in the design stage to obtain additional information regarding the anticipated test performance. For a fixed total sample size, the tests are expected to perform with the highest power in balanced settings, although there could be some exceptions.
We also presented Wald-type confidence interval methods that can be used for asymmetric comparisons. Similar to the conclusions stated in terms of the Type I error rate control, the MLE-based confidence interval methods may result in under-coverage when estimating the difference or ratio of the rates. The CMLE-based confidence interval method performs reasonably well.

Appendix: Constructing the Asymptotic Confidence Interval Equivalent to the Wald-MLE Asymmetric Test for the Difference of Two Poisson Rates
The Appendix illustrates the approach for using the proposed Wald-MLE asymmetric test to derive the corresponding asymptotic $100(1 - \alpha)\%$ one-sided confidence intervals for the difference of two rates.
[Received May 2014. Revised September 2014.]