Parameter Estimation and Goodness-of-Fit in Log Binomial Regression

Summary An estimate of the risk, adjusted for confounders, can be obtained from a fitted logistic regression model, but it substantially over-estimates the risk when the outcome is not rare. The log binomial model (binomial errors and log link) is increasingly being used for this purpose. However, this model's performance, goodness-of-fit tests and case-wise diagnostics have not been studied. Extensive simulations are used to compare the performance of the log binomial model, a logistic regression based method proposed by Schouten et al. (1993) and a Poisson regression approach proposed by Zou (2004) and Carter, Lipsitz, and Tilley (2005). Log binomial regression resulted in "failure" rates (non-convergence, out-of-bounds predicted probabilities) as high as 59%. The method of Schouten et al. (1993) produced fitted log binomial probabilities greater than unity in up to 19% of samples to which a log binomial model had been successfully fit, and in up to 78% of samples when the log binomial model fit failed. Similar percentages were observed for the Poisson regression approach. Coefficient and standard error estimates from the three models were similar. Rejection rates of goodness-of-fit tests for log binomial fit were around 5%. Power of the goodness-of-fit tests was modest when an incorrect logistic regression model was fit. Examples demonstrate the use of the methods. Uncritical use of the log binomial regression model is not recommended.


Introduction
The capability of fitting a generalized linear model (McCullagh and Nelder, 1989) with a non-canonical link, using common statistical packages such as SAS (SAS Institute, 1999) and Stata (Stata Corp., 2003), has expanded the range of models that can be fit to data. One such model for binomial data uses a log link. This combination of response variable and link is not new (Wacholder, 1986). We refer to it as the log binomial model. The rationale for the log binomial regression model is that its coefficients may be used to directly estimate risk ratios in prospective studies and ratios of prevalence in cross-sectional data. The logistic regression model had previously been the principal method of obtaining estimates for continuous factors or estimates adjusted for continuous covariates. The logistic model uses the canonical logit link for binomial data to produce odds ratio estimates. As a measure of association in its own right, the usefulness of the odds ratio in epidemiological research has long been questioned (Miettinen, 1985), particularly for prospective (Miettinen and Cook, 1981; Greenland, 1987) and cross-sectional (Lee, 1994) data. As an estimate of the risk or prevalence ratio, the approximation is adequate when the outcome is rare in all exposure and confounder categories (Greenland,

The Log Binomial Regression Model
Suppose that we have $n$ independent observations of a binary outcome variable $Y$, coded $Y = 1$ or $Y = 0$, and $p$ non-constant covariates. We make no assumptions about the sampling beyond independence of the individual observations. Denote the observed data as $(y_i, \mathbf{x}_i)$, $i = 1, 2, \ldots, n$, where $\mathbf{x}_i' = (x_0, x_{1i}, x_{2i}, \ldots, x_{pi})$ with $x_0 = 1$. We assume that the number of covariate patterns in the data is $n$. The covariates can be any mix of continuous and dichotomous variables, and non-linear combinations or transformations of them.
Under the log binomial model, the conditional probability of the outcome given the covariates is:

$$\Pr(Y = 1 \mid \mathbf{x}) = q(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta}), \tag{1}$$

where $\mathbf{x}'\boldsymbol{\beta} = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$ is the model's linear predictor. It follows from (1) that the risk ratio comparing covariate values $\mathbf{x}_1$ and $\mathbf{x}_2$ is simply the exponential of the coefficients multiplied by the difference in covariate values:

$$RR = \frac{q(\mathbf{x}_1)}{q(\mathbf{x}_2)} = \exp[\boldsymbol{\beta}'(\mathbf{x}_1 - \mathbf{x}_2)]. \tag{2}$$

One property of the log binomial regression model is that it is not symmetric in its coefficients with respect to the risk ratio. For a model with a single dichotomous (0/1) covariate:

$$\frac{\Pr(Y = 1 \mid x = 1)}{\Pr(Y = 1 \mid x = 0)} = e^{\beta_1},$$

but:

$$\frac{\Pr(Y = 0 \mid x = 1)}{\Pr(Y = 0 \mid x = 0)} = \frac{1 - e^{\beta_0 + \beta_1}}{1 - e^{\beta_0}} \neq e^{-\beta_1}.$$

By contrast, a well-known property of the logistic regression model is that the odds ratio for the outcome $Y = 0$ is simply the inverse of the odds ratio for the outcome $Y = 1$. The logistic model is symmetric about a probability of 0.5, but the log binomial regression model is not. It is possible to estimate the risk ratio in (2) with the logistic regression model, but the resulting expression is a quite complicated non-linear function of coefficients and covariates. If $g(\mathbf{x}, \mathbf{a}) = \mathbf{x}'\mathbf{a} = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_p x_p$ denotes the log odds for the outcome $Y = 1$, and the first covariate is dichotomous and coded zero or one, then the risk ratio estimator for it from the logistic model, controlling for all other model covariates, is:

$$RR(x_2, \ldots, x_p) = e^{a_1}\, \frac{1 + e^{a_0 + a_2 x_2 + \ldots + a_p x_p}}{1 + e^{a_0 + a_1 + a_2 x_2 + \ldots + a_p x_p}}. \tag{3}$$

When the outcome is "rare", $\dfrac{1 + e^{a_0 + a_2 x_2 + \ldots + a_p x_p}}{1 + e^{a_0 + a_1 + a_2 x_2 + \ldots + a_p x_p}} \approx 1$ and $e^{a_1} \approx e^{\beta_1}$. The relative simplicity of the expression in (2) compared to that in (3) provides a rationale for using the log binomial regression model. For ease of notation, we denote the log binomial regression model in (1) evaluated at $\boldsymbol{\beta}$ and $\mathbf{x}_i$ as $q(\mathbf{x}_i)$. It follows that the conditional log-likelihood of the data is:

$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left\{ y_i \ln[q(\mathbf{x}_i)] + (1 - y_i) \ln[1 - q(\mathbf{x}_i)] \right\}. \tag{4}$$

The score equations from (4) are:

$$\frac{\partial l(\boldsymbol{\beta})}{\partial \beta_j} = \sum_{i=1}^{n} \frac{x_{ij}\,(y_i - q(\mathbf{x}_i))}{1 - q(\mathbf{x}_i)} = 0, \qquad j = 0, 1, 2, \ldots, p. \tag{5}$$

Denote the likelihood-maximising solution to (5) as $\hat{\boldsymbol{\beta}}$, and the fitted values from the model as $\hat{q}(\mathbf{x}_i) = \exp(\mathbf{x}_i'\hat{\boldsymbol{\beta}})$.
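As a concrete illustration, expression (3) can be evaluated directly. The sketch below uses hypothetical helper names and illustrative coefficient values (not taken from the paper) to contrast the simple log binomial risk ratio exp(beta_1) with the risk ratio implied by a fitted logistic model:

```python
import math

def rr_log_binomial(b1):
    # Under the log binomial model (1), the adjusted risk ratio for a 0/1
    # covariate is simply exp(beta_1), whatever the other covariate values.
    return math.exp(b1)

def rr_from_logistic(a, x_rest):
    # Risk ratio implied by a fitted logistic model, expression (3):
    # RR = exp(a1) * (1 + exp(a0 + a2*x2 + ...)) / (1 + exp(a0 + a1 + a2*x2 + ...))
    # `a` is (a0, a1, a2, ..., ap); `x_rest` holds (x2, ..., xp).
    a0, a1 = a[0], a[1]
    rest = sum(aj * xj for aj, xj in zip(a[2:], x_rest))
    return math.exp(a1) * (1 + math.exp(a0 + rest)) / (1 + math.exp(a0 + a1 + rest))

# With a rare outcome (large negative intercept) the logistic risk ratio is
# close to the odds ratio exp(a1); with a common outcome it is attenuated.
rare = rr_from_logistic((-6.0, 0.7, 0.3), (1.0,))
common = rr_from_logistic((-0.5, 0.7, 0.3), (1.0,))
print(round(rare, 3), round(common, 3), round(rr_log_binomial(0.7), 3))
```

Note how the logistic risk ratio depends on the intercept and the values of the other covariates, while the log binomial risk ratio does not.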
A consequence of the solution of the score equation for the intercept parameter is that the sum of the observed values is equal to the sum of the estimated odds over the subjects with the response absent:

$$\sum_{i=1}^{n} y_i = \sum_{i:\, y_i = 0} \frac{\hat{q}(\mathbf{x}_i)}{1 - \hat{q}(\mathbf{x}_i)}. \tag{6}$$

This is in contrast to that for a canonical link logistic regression model, for which the sums of observed and fitted values are equal. This may have implications for the goodness-of-fit tests presented in Section 3. Under standard theory for maximum likelihood, estimators of the covariance matrix of the estimators of the parameters are functions of the matrix of second partial derivatives of the log likelihood function. In the case of the log binomial regression model with log likelihood function (4) and first partial derivatives (5), the general term in the $(p+1) \times (p+1)$ matrix of second partial derivatives is:

$$\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \beta_j\, \partial \beta_k} = -\sum_{i=1}^{n} \frac{x_{ij}\, x_{ik}\, (1 - y_i)\, q(\mathbf{x}_i)}{[1 - q(\mathbf{x}_i)]^2}. \tag{7}$$

Its negative is the observed information matrix. Evaluated at $\hat{\boldsymbol{\beta}}$, it can be written as:

$$\hat{\mathbf{I}}_{\mathrm{obs}} = \mathbf{X}'\hat{\mathbf{V}}\mathbf{X}, \qquad \hat{\mathbf{V}} = \mathrm{diag}\!\left[\frac{(1 - y_i)\,\hat{q}(\mathbf{x}_i)}{(1 - \hat{q}(\mathbf{x}_i))^2}\right], \tag{8}$$

where $\mathbf{X}$ is the $n \times (p+1)$ matrix of data. Both (7) and (8) depend on the observed value of the outcome variable. This is a consequence of the log link, which is not the canonical link for Bernoulli/binomial data. Under the assumption that $E(Y_i \mid \mathbf{x}_i) = q(\mathbf{x}_i)$, the expected information matrix evaluated at $\hat{\boldsymbol{\beta}}$ is:

$$\hat{\mathbf{I}}_{\mathrm{exp}} = \mathbf{X}'\hat{\mathbf{W}}\mathbf{X}, \qquad \hat{\mathbf{W}} = \mathrm{diag}\!\left[\frac{\hat{q}(\mathbf{x}_i)}{1 - \hat{q}(\mathbf{x}_i)}\right]. \tag{9}$$

The observed and expected information are not the same. This leads to two different estimators of the covariance matrix of the parameter estimators. The observed covariance matrix is the inverse of the matrix in (8):

$$\widehat{\mathrm{Cov}}_{\mathrm{obs}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\hat{\mathbf{V}}\mathbf{X})^{-1}, \tag{10}$$

and the expected covariance matrix is the inverse of the matrix in (9):

$$\widehat{\mathrm{Cov}}_{\mathrm{exp}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}. \tag{11}$$

The situation is further complicated by the use of an "information sandwich" to provide robustness to model misspecification. In the log binomial setting, the "meat" of the sandwich is based on the outer product of the individual score contributions in (5), namely:

$$\hat{\mathbf{B}} = \mathbf{X}'\,\mathrm{diag}(\hat{r}_i^2)\,\mathbf{X}, \tag{12}$$

where $\hat{\mathbf{r}}$ is an $n \times 1$ vector with general element $\hat{r}_i = \dfrac{y_i - \hat{q}(\mathbf{x}_i)}{1 - \hat{q}(\mathbf{x}_i)}$.
Thus the robust version using the observed information matrix is:

$$\widehat{\mathrm{Cov}}_{\mathrm{rob}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\hat{\mathbf{V}}\mathbf{X})^{-1}\,\hat{\mathbf{B}}\,(\mathbf{X}'\hat{\mathbf{V}}\mathbf{X})^{-1}. \tag{13}$$

The robust version using the expected information is termed the "semi-robust" estimator. It is:

$$\widehat{\mathrm{Cov}}_{\mathrm{srob}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}\,\hat{\mathbf{B}}\,(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}. \tag{14}$$

Thus there are four different variance estimators to select from, depending on the method chosen to solve the likelihood equations and whether or not the robust option is requested. In simulations, we examined the sampling distributions of the four estimators in Eqs. (10), (11), (13) and (14), and found no important differences. The work of Efron and Hinkley (1978) shows that, at least asymptotically, the estimator (10) based on the observed information should have better properties than the estimator (11) based on the expected information. In the simulation results reported in Section 4, we chose to use the robust version (13) of estimator (10). Another issue in fitting the log binomial regression model is that the parameter space is constrained. Two of the adverse outcomes in attempting to solve (5) are that the iterative methods used by the software package may fail to converge, and that a converged solution may not be admissible.
Here we define an admissible solution as a value of $\hat{\boldsymbol{\beta}}$ such that $\mathbf{x}_i'\hat{\boldsymbol{\beta}} < 0$, $i = 1, 2, \ldots, n$, and hence $0 < \hat{q}(\mathbf{x}_i) < 1$. We discuss this more extensively in Section 4. Schouten et al. (1993) recognized the convergence problem and proposed a clever solution to it. To employ their method, a new outcome variable $D$ is defined with value $d = y$, and the data for subjects with $y = 1$ are replicated, with the additional observations assigned the value $d = 0$. Schouten et al. (1993) show that consistent and asymptotically normally distributed estimators of the log binomial slope parameters may be obtained by simply fitting a logistic regression model to the expanded data:

$$\Pr(D = 1 \mid \mathbf{x}) = \frac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})}.$$

They also derive a consistent estimator of the covariance matrix of the estimated coefficients. It is an "information sandwich" type of estimator that is described in the Appendix. While an estimate can be obtained from nearly any set of data, the problem is that the solution is not guaranteed to be admissible for the log binomial model. This is an issue we investigate in the simulations of Section 4.
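The data expansion itself is mechanical. A minimal sketch (hypothetical function name; records are (y, x) pairs):

```python
def expand_for_schouten(records):
    """Schouten et al. (1993) data expansion (sketch): each subject with
    y = 1 contributes two records, one with d = 1 and a duplicate with
    d = 0; subjects with y = 0 contribute a single record with d = 0.
    A logistic regression of d on x in the expanded data then estimates
    the log binomial coefficients."""
    expanded = []
    for y, x in records:
        expanded.append((y, x))      # d = y for the original record
        if y == 1:
            expanded.append((0, x))  # replicate with d = 0
    return expanded

data = [(1, 2.0), (0, -1.5), (1, 0.3), (0, 0.0)]
expanded = expand_for_schouten(data)
print(len(expanded))  # 4 original records + 2 replicates = 6
```

After this step, any standard logistic regression routine can be applied to the expanded records, with the covariance matrix corrected as described in the Appendix.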
Recently, McNutt et al. (2003) investigated the use of the Poisson regression model (log link, Poisson errors) to estimate relative risk for binomial data. This suggestion follows from the fact that the standard generalized linear models parameterization of the mean of the Poisson regression model:

$$E(Y \mid \mathbf{x}) = \mu(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta}),$$

is of the same form as the log binomial model in (1). McNutt et al. (2003) reported that the Poisson model produced confidence intervals that were too wide, because the Poisson errors over-estimate the binomial errors when the outcome is not rare. Zou (2004) proposed correcting for this by using the information sandwich estimator to obtain variance estimates that are robust to the error misspecification. Carter et al. (2005) noted that solving the score equations from a Poisson regression model yields a quasi-likelihood estimator for the log binomial regression model; hence these coefficient estimates consistently estimate the coefficients of the log binomial model in (1). They show that by using the information sandwich estimator of the covariance matrix of the estimated coefficients following the Poisson regression fit, one obtains a consistent estimator of the covariance matrix of estimated coefficients from a log binomial fit. The advantage of the Poisson regression approach is that it requires no data modification and can easily be performed using widely available software such as SAS and Stata. Because the mean of the Poisson regression model is not constrained to be less than one, however, the resulting estimated coefficients may yield inadmissible fitted values for the log binomial model probabilities. This is investigated in the simulations in Section 4. Deddens et al. (2003) propose an ad hoc method for solving the problem of non-convergence when fitting the log binomial model using SAS.
They suggest adding 998 copies of the original data and one additional copy of the data with the outcomes reversed, thus expanding the sample to 1000 times the original. The single copy with the outcomes reversed enhances the chance of a non-boundary solution, but does not add so much errant data that the solution is no longer consistent. They suggest correcting the estimates of the standard errors from the expanded data fit by multiplying them by $\sqrt{1000}$. We tried this method using the SAS macros provided by Deddens et al. (2003), and found it yielded a convergent solution in each example data set tested. However, this expanded data method did not solve convergence problems in Stata. Because the method does not work in all packages, it cannot be recommended for general use, and we did not include it in the simulations.
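A minimal sketch of this expansion and the accompanying standard error correction (hypothetical function names; records are (y, x) pairs):

```python
import math

def deddens_expand(records):
    # Deddens et al. (2003) expansion (sketch): the original data, 998
    # extra copies of it, and one copy with outcomes reversed, giving a
    # data set 1000 times the original size.
    expanded = list(records)
    for _ in range(998):
        expanded.extend(records)
    expanded.extend((1 - y, x) for y, x in records)
    return expanded

def correct_se(se_from_expanded):
    # Standard errors from the expanded-data fit are multiplied by sqrt(1000).
    return se_from_expanded * math.sqrt(1000)

data = [(1, 0.2), (0, 1.1), (0, -0.4)]
big = deddens_expand(data)
print(len(big))  # 3 * 1000 = 3000 records
```

The reversed copy is a deliberately small contamination: 1 part in 1000, enough to pull the solution off the boundary without materially biasing the estimates.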
In practice, when one wants to provide a regression model-adjusted estimate of the risk ratio, there are four choices: 1) fit the log binomial regression model; 2) fit the expanded data logistic model; 3) fit a Poisson regression model to the observed binary outcome data; 4) fit the usual logistic model, and either concede that the odds ratio $\exp(\hat{a}_1)$ will over-estimate the risk ratio or make use of expression (3) to calculate the risk ratio from the logistic model estimates. In Section 4, we compare methods 1), 2) and 3) in simulations. Before doing so, we discuss extensions to the log binomial model of the goodness-of-fit tests and case-wise diagnostic statistics used in logistic regression.

Goodness-of-Fit and Casewise Diagnostics
Suppose that we have obtained an admissible solution to the log binomial model. Usually the next step in a regression analysis is to assess goodness-of-fit and examine case-wise diagnostics for leverage, residual and influence. In this section we present extensions to the log binomial regression model of some of the goodness-of-fit tests for the logistic regression model. We also develop case-wise diagnostic statistics for use in evaluating the effect that individual subjects have on parameter estimates and model fit.

Goodness-of-fit tests
The simplest logistic regression goodness-of-fit test to consider for the log binomial regression model is the deciles-of-risk test. It is described in Chapter 5 of Hosmer and Lemeshow (2000). The basic idea of the test is to compare observed outcome frequencies to model based estimated frequencies within groups based on the rank-ordered fitted values. We could use any number of groups, but $g = 10$ equal-sized "deciles of risk" are used most frequently in practice. The quantities for the $k$-th decile are the observed frequency $o_{1k} = \sum_{i \in C_k} y_i$ and the estimated expected frequency $\hat{e}_{1k} = \sum_{i \in C_k} \hat{q}(\mathbf{x}_i)$, with $C_k$ containing the indices of the subjects in the $k$-th decile of risk. The proposed goodness-of-fit test is:

$$\hat{C} = \sum_{k=1}^{10} \frac{(o_{1k} - \hat{e}_{1k})^2}{\frac{n}{10}\, \bar{q}_k (1 - \bar{q}_k)}, \qquad \bar{q}_k = \frac{10}{n} \sum_{i \in C_k} \hat{q}(\mathbf{x}_i). \tag{15}$$

In the logistic regression setting, Hosmer and Lemeshow (1980) show that $\hat{C} \sim \chi^2(10 - 2 = 8)$ when the correct model is fitted. The degrees of freedom for this test are calculated as the number of groups minus two.
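The grouping and the statistic can be sketched as follows (hypothetical function name; equal-sized groups formed from the rank-ordered fitted values, with ties broken arbitrarily):

```python
def hosmer_lemeshow_C(y, q_hat, g=10):
    """Deciles-of-risk statistic (sketch): sort subjects by fitted value,
    split into g equal-sized groups, and compare observed and expected
    frequencies.  C_hat is referred to a chi-square with g - 2 df."""
    pairs = sorted(zip(q_hat, y))  # rank order by fitted value
    n = len(pairs)
    C_hat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        n_k = len(group)
        o_1k = sum(yi for _, yi in group)  # observed frequency
        e_1k = sum(qi for qi, _ in group)  # model-based expected frequency
        q_bar = e_1k / n_k                 # mean fitted value in the group
        C_hat += (o_1k - e_1k) ** 2 / (n_k * q_bar * (1 - q_bar))
    return C_hat

import random
random.seed(1)
q = [0.05 + 0.6 * i / 200 for i in range(200)]        # fitted values
y = [1 if random.random() < qi else 0 for qi in q]     # outcomes from the model
print(round(hosmer_lemeshow_C(y, q), 2))  # compare to chi-square(8); 5% critical value ~ 15.51
```

When the data are generated from the fitted model, as in the demonstration above, the statistic should typically fall below the 5 percent critical value.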
The problem with applying the "groups minus two" rule to a fitted log binomial regression model is that $\sum_{k=1}^{10} o_{1k} \neq \sum_{k=1}^{10} \hat{e}_{1k}$ for the log binomial model. As noted in the previous section, inclusion of an intercept term in a log binomial regression model places a constraint on the sum of the odds rather than on the sum of the fitted values. In nearly every data set we have fit, however, the sums of the observed and fitted values, while not exactly equal, have been extremely close. Simulations in the logistic regression setting have shown that the sampling distribution of $\hat{C}$ is well approximated by the $\chi^2(10 - 2 = 8)$ distribution when the correct model has been fit. We believe that these same results should hold for the log binomial model. In the simulations we examine the empirical rejection rate when the correct log binomial model has been fit, and the power of the test when the incorrect logistic regression model is fit.
One of the criticisms of the deciles-of-risk test is that, by grouping, subtle departures from model fit may be missed. The problem of the effect on significance levels of cells with small expected values in the 2 by 10 table is noted in Pigeon and Heyse (1999). They propose scaling the contribution to the test of the two cells in each column, $(o_{1k} - \hat{e}_{1k})^2$, by the sum of the individual variances, $\sum_{i \in C_k} \hat{q}(\mathbf{x}_i)(1 - \hat{q}(\mathbf{x}_i))$, rather than by $\frac{n}{10}\, \bar{q}_k (1 - \bar{q}_k)$, where $\bar{q}_k = \frac{10}{n} \sum_{i \in C_k} \hat{q}(\mathbf{x}_i)$ and $C_k$ indexes the subjects in the $k$-th decile of risk. Bertolini et al. (2000) conducted a study of 1393 subjects and demonstrated the sensitivity of the value of $\hat{C}$, and thus of its significance level, to the order of tied subjects and hence to the actual groups used. In nearly a million arrangements of their data, the p-values for $\hat{C}$ ranged from 0.011 to 0.950 and averaged 0.296. Hosmer et al. (1997) studied, via simulation, two tests for goodness of fit for the logistic regression model that do not group the data. Expressed in the context of the log binomial model, these are the Pearson chi-square:

$$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{q}(\mathbf{x}_i))^2}{\hat{q}(\mathbf{x}_i)(1 - \hat{q}(\mathbf{x}_i))}, \tag{16}$$

and the unweighted sum of squares:

$$\hat{S} = \sum_{i=1}^{n} (y_i - \hat{q}(\mathbf{x}_i))^2. \tag{17}$$

The actual test statistics are standardized statistics with significance levels calculated using the standard normal distribution. Moment estimators for the Pearson chi-square are given by Osius and Rojek (1992); the statistic is an example of their test with power 1 and log link. After fitting the log binomial model, new variables are created from the fitted values for each subject, including the binomial variances $\hat{v}_i = \hat{q}(\mathbf{x}_i)(1 - \hat{q}(\mathbf{x}_i))$ and the estimated odds $\hat{w}_i = \hat{q}(\mathbf{x}_i)/(1 - \hat{q}(\mathbf{x}_i))$. The estimator of the variance of $X^2$, denoted $\hat{\sigma}^2_{X^2}$, is the residual sum-of-squares from the linear regression of a constructed variable $c_i$ on the covariates using weights equal to $\hat{w}_i$. The standardized statistic is:

$$z_{X^2} = \frac{X^2 - n}{\hat{\sigma}_{X^2}}.$$

The unweighted sum of squares computed from binary data with the log link is not a member of the family of tests developed by Osius and Rojek (1992). Hosmer et al.
(1997) derive a general estimator of the variance of $\hat{S} - \sum \hat{v}_i$, denoted $\hat{\sigma}^2_S$, which is the residual sum-of-squares from the linear regression of a constructed variable $d_i$ on the covariates using weights equal to $\hat{v}_i$. The standardized test statistic is:

$$z_S = \frac{\hat{S} - \sum_{i=1}^{n} \hat{v}_i}{\hat{\sigma}_S}.$$

The empirical rejection rates of these three tests, and their power when a logistic regression model is fit to log binomially generated data, are studied in Section 5.
3.2 Casewise diagnostic statistics for the log binomial model

Pregibon (1981) derived case-wise diagnostic statistics for the logistic regression model by capitalizing on the similarity between linear regression and the iteratively re-weighted least squares procedure used to solve the score equations from a generalized linear model. McCullagh and Nelder (1989, pp. 405-407) generalize the results to any generalized linear model. In the case of log binomial regression, the values of the adjusted dependent variable used in the iterative re-weighted linear regression are:

$$z_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}} + \frac{y_i - \hat{q}(\mathbf{x}_i)}{\hat{q}(\mathbf{x}_i)}, \qquad i = 1, 2, \ldots, n,$$

and the weights are the estimated odds, $\hat{w}_i$, used as weights in the variance estimator for the Pearson chi-square. McCullagh and Nelder (1989) show that the leverage values are the diagonal elements of the hat matrix, $\mathbf{H}$, from this regression:

$$\mathbf{H} = \hat{\mathbf{W}}^{1/2}\mathbf{X}(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{W}}^{1/2}, \tag{18}$$

where $\hat{\mathbf{W}} = \mathrm{diag}(\hat{w}_i)$ and $\mathbf{X}$ is the $n \times (p+1)$ data matrix. The individual leverage values are:

$$h_i = \hat{w}_i\, \mathbf{x}_i'(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}\mathbf{x}_i. \tag{19}$$

The weight $\hat{w}_i$ increases monotonically from a value of zero at $\hat{q}(\mathbf{x}_i) = 0$ and is without upper bound. The second term on the right-hand side of (19) is a "quadratic-like" function that increases as $\mathbf{x}_i$ gets further from the vector of means, $\bar{\mathbf{x}}$. The net result is that $\hat{w}_i$ can have considerable effect on the value of $h_i$. Subjects with large fitted values, say $\hat{q}(\mathbf{x}_i) > 0.7$, will almost certainly have the highest leverage. Subjects with small fitted values, say $\hat{q}(\mathbf{x}_i) < 0.2$, will almost certainly have small leverage. In contrast, the leverage values in logistic regression go to zero when the probability goes to zero or one.
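For a model with an intercept and a single covariate, the leverage values in (19) can be computed directly. The sketch below (hypothetical function name, illustrative fitted values) also demonstrates the claim above: the subject with the largest fitted probability carries the highest leverage, and the leverages sum to the number of parameters:

```python
def leverages(x, q_hat):
    """Leverage values (sketch) for a log binomial model with an intercept
    and one covariate: h_i = w_i * x_i' (X'WX)^{-1} x_i with weights
    w_i = q_i / (1 - q_i), the estimated odds."""
    w = [qi / (1 - qi) for qi in q_hat]
    # Build the 2x2 matrix X'WX for design rows (1, x_i).
    s00 = sum(w)
    s01 = sum(wi * xi for wi, xi in zip(w, x))
    s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s00 * s11 - s01 * s01
    # h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)' via the explicit 2x2 inverse.
    return [wi * (s11 - 2 * s01 * xi + s00 * xi * xi) / det
            for wi, xi in zip(w, x)]

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
q = [0.05, 0.10, 0.20, 0.40, 0.80]  # fitted probabilities rising with x
h = leverages(x, q)
print([round(hi, 3) for hi in h])   # largest leverage at q_hat = 0.80
```

Because the trace of a hat matrix equals the number of fitted parameters, the leverages here sum to 2, which provides a convenient internal check.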
Once the correct leverage values have been obtained, it is a relatively straightforward task to obtain the values of the Pearson residual, standardized Pearson residual, change in chi-square and the Cook's distance measure of influence. The formulae are the same as for the logistic regression model in Hosmer and Lemeshow (2000, Chapter 5) and are as follows:

$$r_i = \frac{y_i - \hat{q}(\mathbf{x}_i)}{\sqrt{\hat{q}(\mathbf{x}_i)(1 - \hat{q}(\mathbf{x}_i))}}, \tag{20}$$

$$r_{si} = \frac{r_i}{\sqrt{1 - h_i}}, \tag{21}$$

$$\Delta X_i^2 = r_{si}^2, \tag{22}$$

$$\Delta\hat{\beta}_i = \frac{r_{si}^2\, h_i}{1 - h_i}, \tag{23}$$

for $i = 1, 2, \ldots, n$. We note that some packages scale Cook's distance (23) by $1/p$. We illustrate leverage and use these case-wise diagnostic statistics to evaluate model fit in Section 6.

Simulation Study
There are a number of difficulties that one may encounter when fitting a log binomial regression model. In this section, we describe and present the results of simulation studies conducted to examine those issues. All calculations were performed using Stata (Stata Corp., 2003). The first issue we examine is which factors affect convergence of the fitting algorithms, and the admissibility of the solution once convergence is attained. We hypothesized that the occurrence of fitting problems would be directly related to the size of the fitted probabilities from the log binomial regression model, and that when the fitted probabilities are high, the expanded data logit model proposed by Schouten et al. (1993) would more often provide inadmissible estimates as well. The rationale for the hypothesis is that the log binomial regression model has the exponential-like shape of the logistic model for probabilities less than 0.5, and the shapes of the two models only really begin to diverge at larger probabilities.

Simulations I
The basic design was a univariable model with a covariate distributed uniformly on the interval $[-6, a]$. The intercept $\beta_0$ and slope coefficient $\beta_1$ were chosen such that $\Pr(Y = 1 \mid x = -6) = 0.01$ and $\Pr(Y = 1 \mid \cdot\,)$ took the values 0.3, 0.5 and 0.7. Given those values of $\beta_0$ and $\beta_1$, the upper bound $a$ of the uniform distribution was chosen so that the marginal probability of response was equal to 0.1 or 0.2 (the actual values of $a$ were rounded). These constraints yielded the 8 settings described in Table 1. We expected convergence and admissibility problems in settings 1, 3, 5 and 7, because the maximum probability in those settings was high (in excess of 0.9), but not in the other settings, where the maximum probability was lower (around 0.5). We used 1000 replications of sample sizes 250 and 500, but report only the results for sample size 500 because the results for sample size 250 were similar. Full results are available on request.
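A sketch of one such design follows. The two anchoring probabilities are assumed for illustration (they are not taken from Table 1): fixing Pr(Y=1 | x=-6) = 0.01 and Pr(Y=1 | x=0) = 0.5 determines the log binomial intercept and slope, and the upper bound a is then set so the maximum probability is 0.9:

```python
import math
import random

# Assumed anchoring probabilities (illustrative, not the paper's Table 1):
# exp(b0) = 0.5 at x = 0, exp(b0 - 6*b1) = 0.01 at x = -6.
b0 = math.log(0.5)
b1 = (math.log(0.5) - math.log(0.01)) / 6
# Upper bound chosen so the maximum log binomial probability is 0.9,
# keeping every probability admissible by design.
a = (math.log(0.9) - b0) / b1

def simulate(n, seed=0):
    """Generate one sample: x ~ Uniform(-6, a), y ~ Bernoulli(exp(b0 + b1*x))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-6, a)
        p = math.exp(b0 + b1 * x)  # log binomial probability, at most 0.9 here
        data.append((1 if rng.random() < p else 0, x))
    return data

sample = simulate(500)
marginal = sum(y for y, _ in sample) / 500
print(round(a, 2), round(marginal, 3))
```

Raising the maximum probability toward 1 in such a design is exactly what pushes the fitted model toward the boundary of the parameter space, producing the convergence and admissibility failures reported above.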
The results of fitting the three models are shown in Table 2. As expected, fitting problems occurred in settings 1, 3, 5 and 7. The average frequency of $y = 1$ in the 1000 successful samples was around 21 percent in those settings, and around 12 percent in the settings without fitting problems (settings 2, 4, 6 and 8). In setting 1, for example, a total of 2450 samples were required to obtain 1000 successful samples without log binomial fitting problems. In the 1000 successful samples, inadmissible log binomial probabilities occurred on 18.9 percent of occasions when calculated using the parameter estimates from the expanded data logit model, and on 12.3 percent of occasions when calculated using the Poisson regression estimates. In the 1450 unsuccessful samples in setting 1 where log binomial fitting problems occurred, inadmissible log binomial probabilities occurred on 76.9 percent of occasions when calculated using the parameter estimates from the expanded data logit model, and on 75.6 percent of occasions when calculated using the Poisson regression estimates. The average maximum probability from a logit model fit to the unsuccessful samples was 0.77. Settings 3, 5 and 7 produced similar results. Results of estimating the coefficient for the uniformly distributed, continuous covariate are shown in Table 3. The summary statistics presented are the average percent relative bias,

$$\frac{100}{1000} \sum_{k=1}^{1000} \frac{\hat{\beta}_{1k} - \beta_1}{\beta_1};$$

100 times the average mean squared error,

$$\frac{100}{1000} \sum_{k=1}^{1000} \left[ (\hat{\beta}_{1k} - \beta_1)^2 + \widehat{\mathrm{Var}}(\hat{\beta}_{1k}) \right],$$

where $\hat{\beta}_{1k}$ is the respective estimated coefficient from the $k$-th replication and $\beta_1$ is the target value given in the second column of Table 1; and the observed 95 percent confidence interval coverage of the true value $\beta_1$, with confidence intervals computed as $\hat{\beta}_{1k} \pm 1.96 \times \widehat{\mathrm{SE}}(\hat{\beta}_{1k})$.
The robust version (13) of the observed information matrix was used to calculate $\widehat{\mathrm{SE}}(\hat{\beta}_{1k})$ for the log binomial model. We used the information sandwich estimator from the Poisson regression fit. Confidence interval coverage differs significantly, at the five percent level, from the target value of 95 percent if it is less than 93.6 percent or greater than 96.4 percent. The log binomial estimator had the smallest percent relative bias in all but one setting, and the smallest mean squared error in all settings. In six of the eight settings, the percent relative bias of the Poisson regression estimator was smaller than that of the expanded data logit model. In all settings, the mean squared error of the Poisson regression estimator was smaller than that of the expanded data logit model estimator. The numerical differences in these quantities between the three models are, in our opinion, quite small. This suggests that the expanded data logit and Poisson regression estimators provide a close approximation to the log binomial estimator, with the Poisson regression estimator having slightly better properties. Only in setting 4 was the confidence interval coverage for the log binomial estimator significantly different from 95 percent. In six of eight settings, the coverage for the expanded data logit and Poisson regression estimators was not significantly different from 95 percent. Coverage in the other two settings was slightly less than 95 percent.
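The significance bounds quoted for the coverage rates follow from the binomial standard error of a proportion over 1000 replications; a quick check:

```python
import math

# Observed coverage differs significantly (5% level) from a nominal 95
# percent over 1000 replications if it falls outside
# 95 +/- 1.96 * sqrt(0.95 * 0.05 / 1000) percent.
half_width = 1.96 * math.sqrt(0.95 * 0.05 / 1000)
lo, hi = 100 * (0.95 - half_width), 100 * (0.95 + half_width)
print(round(lo, 1), round(hi, 1))  # -> 93.6 96.4
```

The same calculation with a nominal 5 percent rejection rate gives the (3.6, 6.4) bounds used later for the goodness-of-fit rejection percentages.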

Simulations II
The first set of simulations (settings 1-8) was designed primarily to study the influence of the magnitude of the probabilities and the marginal response proportion on the ability to successfully fit the log binomial model. The simplicity of the model with a single continuous covariate allowed considerable control over those factors. The disadvantage is that univariable models are quite simple when compared to the typical model encountered in practice. A second set of simulations was performed with models containing a dichotomous covariate, $D \sim \mathrm{Bernoulli}(p)$, and a uniformly distributed continuous covariate, $U$, which was associated with $D$ by being generated from the conditional distribution $U \mid D = d \sim \mathrm{Uniform}(-6 + 2d,\ 2 + 2d)$. The parameters were chosen as follows: intercept coefficient $\beta_0 = \ln(0.3)$, slope coefficient for the dichotomous covariate $\beta_D = \ln(1.5)$ or $\beta_D = \ln(2.0)$, and Bernoulli probability for the dichotomous covariate $p = 0.2$ or $0.5$. The expression for the marginal response probability (24) and the maximum possible log binomial probability, $e^{\beta_0 + \beta_D + 4\beta_U}$, were evaluated for a range of values of $\beta_U$ between 0 and 1.0. The value of $\beta_U$ chosen was the one yielding the largest value of (24) and a maximum probability of at least 0.9. This ensured that the uniform covariate, $U$, would be a significant predictor and that the log binomial and logistic models would be as different as possible. These considerations resulted in the designs shown in Table 4. The principal aim of the second set of simulations was to compare the estimators of the coefficients for $D$ and $U$ from the log binomial, expanded data logistic and Poisson regression models. The results of the simulations are presented in Table 5.
The log binomial estimator again had the smallest percent relative bias in most settings and the smallest mean squared error in all settings. Its confidence interval coverage was not significantly different from 95 percent in seven of eight settings. The Poisson regression model estimator had smaller percent relative bias than the expanded data logit model estimator in three of the eight settings, and its mean squared error was smaller in all eight. The confidence interval coverage of these two estimators was not significantly different from 95 percent in seven of eight settings. While the Poisson regression model estimator has slightly better properties than the expanded data logit model estimator, the two are quantitatively similar and not greatly inferior to the log binomial model. Again, this suggests that these two estimation methods may provide a good approximation to the estimator from the log binomial model.
In simulation settings 9-12, the number of samples generated to obtain 1000 successful replications (to which the log binomial model could be fitted) ranged from 1040 to 1336. In the 1000 successful samples, inadmissible fitted values for the log binomial model occurred on 6 to 12 percent of occasions when calculated using the parameter estimates from the expanded data logistic model, and on 3 to 7 percent of occasions when calculated using the Poisson regression model estimates. In the unsuccessful samples, the two methods yielded inadmissible estimates in 57 to 70 percent of the samples.
In summary, the simulations showed that when the log binomial model could be fit, it provided a better estimator than either the expanded data logit model or the Poisson regression model. The Poisson regression estimator had slightly better properties than the expanded data logit model estimator, but the differences between these two estimators were not great.

Assessing Model Fit
We noted in Section 3.1 that commonly used goodness-of-fit tests for the logistic regression model should be adaptable for use with the log binomial model. In this section, we report the simulation results in settings 1-12 for the Hosmer-Lemeshow deciles-of-risk test (HL test) in (15), the normalized Pearson chi-square in (16) and the normalized unweighted sum of squares in (17) when these tests are used with the correctly fit log binomial model. We also noted in Section 3.1 that the degrees of freedom for the HL test may differ slightly from the "groups minus two" rule for applications in logistic regression, because the fitted values from the log binomial model are not constrained to have sum of observed equal to sum of expected. Additionally, we assess the power of each test to reject fit when a logistic model is fitted to log binomial generated data. The rejection percentages in Table 7 for the normalized Pearson and unweighted sum of squares tests are based on moment estimates for the logit link function given in Hosmer et al. (1997) and not on the log link moments in Section 3.1. The simulated type I error rates are presented in Table 6. The empirical rejection percentages for the HL test were obtained from a p-value computed using 8 degrees of freedom (the "groups minus two" rule). Rejection percentages are significantly different from 5 percent at the 5 percent level if less than 3.6 or greater than 6.4. Using this criterion, the observed percentages support use of the "groups minus two" rule in settings 1-5 and 7-9. The mean of the HL test statistic was less than 8 in settings 9-12, however, and the rejection rates were numerically less than 5 percent in nine of twelve settings.
This suggests that the "groups minus two" rule results in a test that rejects slightly less often than the specified level but we are reluctant to suggest a reduction in the degrees-of-freedom until additional simulations are performed, especially with models containing more than two covariates. Pending those results, we recommend use of the HL test with the "groups minus two" rule for assessing fit of a log binomial model.
The empirical rejection rates for the normalized Pearson chi-square and normalized unweighted sum of squares were mixed. The rejection rates of the Pearson chi-square were not significantly different from 5 percent in settings 1 to 8, but it rejected slightly more often than expected in settings 9 and 10. In settings 11 and 12, the rejection percentages were closer to 10 percent than to 5 percent. The observed significance level for the unweighted sum of squares test was significantly different from 5 percent in all but settings 2, 6, 8 and 9. The rejection percentages were too small in settings 1, 3-5 and 7, and too large in settings 2 and 10-12. In general, the Pearson chi-square has better sampling properties than the unweighted sum of squares statistic when testing for fit of a correctly specified model. The power of each of the tests to detect lack of fit when a logistic regression model was fit to the simulated data from a log binomial model is shown in Table 7. The logistic regression model and log binomial model are nearly indistinguishable when the probabilities from the models are less than 0.5, and we expect the tests to have power only when some fitted values from a logistic model fit to the data are large (greater than 0.7). This was the case in settings 1, 3, 5, 7, 10 and 12, and only in those settings was the power of the HL test much greater than the nominal 5 percent level. Power was highest in settings 1, 3, 5 and 7 because their design was such that probabilities larger than 0.9 were possible.
The power of the normalized Pearson chi-square was of the same order of magnitude as that of the HL test in settings 1, 3, 5 and 7. There were no appreciable differences in the power of the three tests in those settings (2, 4, 6 and 8) where there was little difference between the logistic and log binomial models. The normalized unweighted sum of squares had more power than the normalized Pearson chi-square in the first set of simulations (settings 1-8). This was reversed in the second set of simulations (settings 9-12), where the normalized Pearson chi-square demonstrated more power than the normalized unweighted sum of squares test. None of the tests had power greater than 70 percent in any of the settings.
In summary, the simulations support using the HL test with the "groups minus two" rule for the degrees of freedom. This test had about the correct Type I error rate, and low to modest power when there was an appreciable difference in the fitted models under logit and log binomial links. The performance of the normalized Pearson and unweighted sum of squares tests when testing for fit under the null were mixed, suggesting that the moment approximations require quite large sample sizes to be accurate, especially in multivariable models. This must be kept in mind when using these tests in practice. The difficulty here is that power is greatest in those settings where fitting a log binomial model is most likely to present problems. We may fit a logistic regression model to a given data set and find it does not fit but, when we attempt to fit a log binomial model to the same data, there is a high likelihood that the software will not converge regardless of the initial guess, or will yield inadmissible fitted values if it does converge. In these instances, one is essentially forced to use the logistic regression model, the expanded data logistic model or Poisson regression model with a check to be sure its solution is admissible.
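The admissibility check recommended above is simple to automate: whichever model supplied the coefficients, they are admissible for the log binomial model only if the linear predictor is negative for every subject in the sample. A minimal numpy sketch (the helper name `admissible` and the toy design matrix are ours):

```python
import numpy as np

def admissible(X, beta):
    """True if `beta` is admissible for the log binomial model, i.e.
    the linear predictor X @ beta is negative for every subject, so
    that every fitted probability exp(X @ beta) is below 1."""
    return bool(np.all(X @ np.asarray(beta, dtype=float) < 0.0))

# e.g. coefficients taken from an expanded data logistic or Poisson fit
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])   # intercept and one binary covariate
```

For example, a coefficient vector (-1.0, 0.5) gives linear predictors -1.0 and -0.5 and is admissible, whereas (-1.0, 1.5) yields a positive linear predictor (fitted probability above 1) for the second subject and is not.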

Example of Assessing Model Fit with Real Data
As an example we use data from a follow-up study (Blizzard et al., 2003) of infants in Tasmania, Australia, who participated in the Tasmanian Infant Health Study (TIHS). The subjects for this analysis were restricted to 528 infants in the TIHS who were born in southern Tasmania during 1994 or 1995 to mothers who smoked tobacco at 4 weeks post-partum. The binary (1/0) outcome (y) was whether or not the infant was hospitalised during the first 12 months of life with respiratory infection as a discharge diagnosis. A total of 46 (8.7 percent) of the 528 infants had one or more hospitalisations. The binary (1/0) study factor ("same_room") was whether or not the mother sometimes or usually smoked in the same room as the infant. The birth weight ("bwt") of the infant was included as a continuous covariate. As the authors did in the original study (Blizzard et al., 2003), we fitted a log binomial model to these prospective data to obtain risk ratio estimates adjusted for continuous covariates. No problems occurred in fitting the log binomial model because the maximum value (0.27) of the estimated logistic probabilities used as initial guesses in the log binomial fit was low.
The fit is presented in Table 8. The estimated risk ratio for the main exposure variable (same_room) was 1.93 [95 percent CI (1.05, 3.53)]. The odds ratio estimate of this risk ratio from a logistic regression model fitted to the data was 2.08, which represents a 16 percent greater elevation in risk than the log binomial model estimate. Before feeling confident about our inferences, we needed to assess the model fit and check the diagnostic statistics. The HL test had a value of 3.80 that, with 8 degrees of freedom, yielded a p-value of 0.92. The Pearson chi-square had a p-value of 0.92 and the unweighted sum of squares had a p-value of 0.90. We conclude that the tests support overall model fit.
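The 16 percent figure compares the excess risk implied by the odds ratio with the excess risk from the log binomial fit. The arithmetic can be checked in two lines (python here purely as a calculator):

```python
rr, odds_ratio = 1.93, 2.08                         # estimates from Table 8
excess_ratio = (odds_ratio - 1.0) / (rr - 1.0)      # ratio of elevations in risk
pct_greater = round(100.0 * (excess_ratio - 1.0))   # 16 percent greater elevation
```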
The leverage values from (19) for the fitted model in Table 8 are plotted versus the fitted values in Figure 1. Two subjects in the top right corner appear to have greater leverage than the other subjects; however, their actual leverage values were less than 0.08.
In Figure 2, we plot the deletion diagnostic (22) against the fitted values, with plotting symbol proportional to Cook's distance (23). In this plot there are 22 poorly fit subjects with ΔX² > 10. There are two subjects whose circles are larger than the others: one in the lower right corner with a predicted probability of 0.29, and the other with a predicted probability of 0.14. A box plot of Cook's distance (not shown) indicates that these two subjects have Δβ̂ > 0.16, whereas the next largest value is 0.13. The subject with the predicted probability of 0.29 had one or more admissions (y = 1), a mother who smoked in the same room, and the second lowest birth weight at 0.828 kg. This subject had the second largest leverage but was not too poorly fit (ΔX² = 2.7), and its influence was due mainly to leverage. The subject with the predicted probability of 0.14 had one or more admissions (y = 1), a mother who did not smoke in the same room, and the fourth lowest birth weight at 0.918 kg. This subject had a leverage value in the 95th percentile and was modestly poorly fit (ΔX² = 6.2). Thus its influence was a combination of leverage and lack of fit. Removing these two subjects from the data set and refitting the model left the coefficient for same_room essentially unchanged, but the coefficient for birth weight increased by 27 percent. Although the birth weights of these two subjects are low, they are clinically valid and we kept them in the analysis.
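For readers wishing to reproduce such plots, the case-wise quantities can be computed from the fitted probabilities and leverages. The sketch below assumes that (22) and (23) take the usual generalized linear model forms based on the Pearson residual, namely ΔX² = r²/(1 − h) and Δβ̂ = r²h/(1 − h)²; the function name is ours.

```python
import numpy as np

def casewise_diagnostics(y, p_hat, h):
    """Deletion diagnostic and Cook's-distance-type measure per subject,
    assuming the usual GLM forms: with Pearson residual
        r = (y - p) / sqrt(p * (1 - p)),
    the deletion diagnostic is r**2 / (1 - h) and the influence
    measure is r**2 * h / (1 - h)**2, where h is the leverage."""
    y, p_hat, h = (np.asarray(a, dtype=float) for a in (y, p_hat, h))
    r = (y - p_hat) / np.sqrt(p_hat * (1.0 - p_hat))
    dx2 = r ** 2 / (1.0 - h)
    cook = r ** 2 * h / (1.0 - h) ** 2
    return dx2, cook
```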
In summary, we conclude that the log binomial model fits the data and no subjects can be excluded from the analysis. In practice the next step would be to interpret the fitted model for the subject matter audience.

Summary and Conclusions
In this paper, we have explored four aspects of using the log binomial model: (1) factors determining the ability of currently available software to fit an admissible model to a set of data; (2) a simulation-based comparison of the estimators of coefficients and standard errors from the log binomial model, the expanded data logistic regression model of Schouten et al. (1993), and the Poisson regression model; (3) an evaluation by simulation of extensions to the log binomial model of the Hosmer-Lemeshow, normalized Pearson chi-square and normalized unweighted sum of squares goodness-of-fit tests for the logistic regression model; and (4) an illustration of the use of log binomial case-wise diagnostic statistics to assess leverage, residuals and influence.
The simulations demonstrated that the primary determinants of the ability of currently available software to fit the log binomial model, and yield admissible estimators of the model coefficients, are the marginal response proportion and the magnitude of the maximum probability from a logistic model fitted to the data. The latter factor was the more important of the two. In analyses not reported but available on request, we determined that there was a high probability (greater than 80 percent) of fitting problems if the maximum probability from the fitted logistic model was greater than 0.8.
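This screening rule is easy to apply before attempting a log binomial fit. The sketch below fits a logistic model by iteratively reweighted least squares in plain numpy and flags the data set if the maximum fitted probability exceeds 0.8; the function name `logistic_irls`, the toy data and the exact form of the flag are illustrative, with only the 0.8 cut-off taken from the text above.

```python
import numpy as np

def logistic_irls(X, y, iters=50, tol=1e-10):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)                        # IRLS working weights
        step = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# toy data: event proportion rises with the covariate
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 4)
y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
              1, 1, 0, 1, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta = logistic_irls(X, y)
p_max = np.max(1.0 / (1.0 + np.exp(-(X @ beta))))
log_binomial_risky = p_max > 0.8   # expect log binomial fitting problems if True
```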
The percent relative bias and mean squared error of the log binomial estimator were slightly superior to those of the expanded data logit model and the Poisson regression model. The Poisson regression estimator performed marginally better than the expanded data logit estimator in our simulations. One problem is that the estimators of coefficients obtained from the expanded data logistic model and the Poisson regression model produced fitted values greater than 1.0 for the log binomial model in as many as 19 percent of samples in some settings. The caution here is that even though one may always obtain an estimator by fitting the expanded data logistic model or the Poisson regression model, there is no guarantee that it will be admissible for the log binomial model. Thus if either of these models is used in difficult-to-fit settings, we believe it is essential to check the admissibility of the estimates before making inferences from the fit. The confidence interval coverage of all three estimators was in general close to the specified 95 percent level.
The Hosmer-Lemeshow (HL) decile-of-risk test is one of the most frequently used post-fit tests for model adequacy in logistic regression. Our simulations showed that the test may be employed to assess the fit of a log binomial model. Interestingly, the "groups minus two" rule for degrees of freedom yields a test in the log binomial setting that rejected slightly less often than the nominal level in some settings, and never more often. If one is concerned about the HL test's sensitivity to grouping, the normalized Pearson chi-square and/or the normalized unweighted sum of squares test may be used instead, but the simulations showed that the rejection percentages for those tests can differ from the nominal level. The Pearson chi-square had the better performance.
The simulations also showed that all three tests, when applied to a fitted logistic regression model, had low power to detect the incorrectly specified link function unless there was a considerable difference between the logistic and log binomial models.
It is always sound statistical practice to use case-wise diagnostic statistics to assess the effect that individual subjects have on model fit summary statistics and coefficient estimates. The paper demonstrates via an example that application of standard generalized linear model definitions of leverage, residual and influence yields easily computed and interpretable diagnostics. The only major difference between the diagnostics for the logistic and log binomial regression models is the leverage. In logistic regression the leverage goes to zero as the fitted value goes to zero or one. With the log binomial model, the leverage goes to zero as the fitted value goes to zero but grows without bound as the fitted value goes to one. The paper shows that the reason for this difference is the weight used in calculating the leverage: in logistic regression it is the model-based estimate of the variance, whereas in the log binomial model it is the model-based estimate of the odds. The standard graphical presentations of leverage, residual and influence used in logistic regression yield the same easily interpreted graphical presentations for the log binomial model.
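The leverage comparison can be sketched directly from the hat-matrix diagonals h = diag(W^(1/2) X (X'WX)^(-1) X' W^(1/2)), with the working weight chosen per link as described above. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def glm_leverage(X, p_hat, link="log"):
    """Hat-matrix diagonals for a fitted binomial GLM.

    The working weight depends on the link: for the logit link it is
    the variance p * (1 - p); for the log link (log binomial model) it
    is the odds p / (1 - p), which grows without bound as p -> 1."""
    p_hat = np.asarray(p_hat, dtype=float)
    if link == "logit":
        w = p_hat * (1.0 - p_hat)
    else:  # log binomial
        w = p_hat / (1.0 - p_hat)
    Xw = X * np.sqrt(w)[:, None]                  # W^(1/2) X
    H = Xw @ np.linalg.inv(Xw.T @ Xw) @ Xw.T      # hat matrix
    return np.diag(H)
```

Under either weighting the leverages sum to the number of model parameters, since the trace of the hat matrix equals its rank.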
There are several limitations of our research that are worth keeping in mind. Many of the results are based on simulations, and we did not simulate multivariable models as complicated as those typically encountered in practice. In the simulations examining the sampling distributions of estimators and fit tests, data were generated in each setting until 1000 successful log binomial fits were obtained, so the reported sampling distributions are conditional on a successful fit. We also used results based on maximum likelihood theory to compute Wald-type confidence intervals. The log binomial model places the constraint on estimators that the linear predictor must be negative for all subjects in the sample, so the assumptions about the parameter space required for maximum likelihood theory do not hold. We speculate that this is not likely to have an appreciable effect on asymptotic sampling distributions in settings where the log binomial model has no problems being fit (e.g. maximum probability less than 0.5 over the covariate space). We are less sure in the troublesome cases where the log binomial model may experience difficulties in being fit.
In summary, we feel that the log binomial model does offer a practical solution to the problem of obtaining multivariable adjusted estimates of the risk ratio. However, the model must be used with some care and attention to detail. In particular, we believe that the estimator obtained from any software package must be checked for admissibility before its coefficients are used for inferential purposes. We highly recommend the standard statistical practice of using tests of fit and case-wise diagnostics to assess model adequacy.