2023-02-06 Sessión 1

Step by Step - Selection of variables for modeling

Import data

Importing a CSV database under the name of “onch1302”, with “,” as separator:

onch1302 <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2/Logistic Regresion Exercise Database/Datasets/onch1302.csv", sep=",")


Starting the Exercise (MVA_LogReg_Ex1-2023):

Onchocerciasis (commonly known as River Blindness) is a chronic filarial disease found in sub-Saharan Africa and some parts of Central and South America. An onchocerciasis project was set up in 1982 in the Bo district of Sierra Leone. The aims of the project were to study epidemiological, clin-ical, immunological and entomological aspects of the disease. Prevalence surveys were undertaken in villages selected on the basis of potential high endemicity, being situated on or near rivers which are the breeding sites for the Simulium damnosum blackfly. Of the twelve villages included in the present dataset, five were situated in the south and east of the country in the forest' zone and the other seven were in thesavannah’ zone of the country. A census was taken of each village, and all villagers over the age of five years were asked to participate in the study. Coverage was over 90% in all but one of the selected villages. Diagnosis was made by taking a skin-snip, and clinical and an ocular examination were also performed. The file ONCH1302 contains data for all 1,302 subjects.

1. To make the naming a bit more logical, copy the ‘sex’ variable into a new variable called ‘female’ and the ‘area’ variable into one called ‘forest’. For ‘mf’ and ‘lesions’ you may use the variables as is. Next, recode variables with more than 2 levels (‘agegrp’ and ‘mfload’) to binary variables. For ‘agegrp’ use ‘adult’ and code as ‘TRUE’ for those aged 20 and older, ‘FALSE’ for those under 20 years of age. For ‘mfload’ you can recode to ‘highload’ taking values 0 and 1 together as ‘FALSE’ and values 2 and 3 together as ’TRUE

Creating the new variables (as a copy)

onch1302$female <- onch1302$sex
onch1302$forest <- onch1302$area

Recoding the variables with more than 2 levels into binary variables

#Adult variable (adult=1, non adult=0)
  onch1302$adult <- ifelse(onch1302$agegrp>1,1,0)

#Highload (False = 0, True=1)
  #table(onch1302$mfload, useNA = "always")
  onch1302$highload <- 0
  onch1302$highload[onch1302$mfload==2] <- 1
  onch1302$highload[onch1302$mfload==3] <- 1
2. Get an overview of the data by univariable analysis. Construct a table showing numbers and frequencies of each of the variables
table(onch1302$lesions, useNA = "always")     #frecuency

   0    1 <NA> 
1101  201    0 
prop.table(table(onch1302$lesions))           #proportion

        0         1 
0.8456221 0.1543779 
table(onch1302$female, useNA = "always")     #frecuency

   0    1 <NA> 
 616  686    0 
prop.table(table(onch1302$female))           #proportion

        0         1 
0.4731183 0.5268817 
#Forest
table(onch1302$forest, useNA = "always")     #frecuency

   0    1 <NA> 
 548  754    0 
prop.table(table(onch1302$forest))           #proportion

        0         1 
0.4208909 0.5791091 
#Highload
table(onch1302$highload, useNA = "always")     #frecuency

   0    1 <NA> 
 847  455    0 
prop.table(table(onch1302$highload))           #proportion

        0         1 
0.6505376 0.3494624 
#Adult
table(onch1302$adult, useNA = "always")     #frecuency

   0    1 <NA> 
 420  882    0 
prop.table(table(onch1302$adult))           #proportion

        0         1 
0.3225806 0.6774194 
#Microfilaria
table(onch1302$mf, useNA = "always")     #frecuency

   0    1 <NA> 
 480  822    0 
prop.table(table(onch1302$mf))           #proportion

        0         1 
0.3686636 0.6313364 
Note
Factor N
Female gender 686 (52.7)
Adult age 882 (67.7)
Living in Forest 754 (57.9)
Micro filaria infected 822 (63.1)
High micro-filarial load 455 (34.9)
Eyes affected (lesions) 201 (15.4)
3. Make three 2x2 tables to explore the association between mf infection and the three exposures, ‘female’, ‘adult’ and ‘forest’ and manually compute the odds ratios between exposed and unex-posed

We can do that directly with “cc” command from the EpiStat library. If EpiStat doesnt work you can do it manually with “table” command and a calculator

library(EpiStats)
cc(onch1302, mf,female)
$df1
                   Cases Controls Total
Exposed              396      290   686
Unexposed            426      190   616
Total                822      480  1302
Proportion exposed  0.48     0.60  0.53

$df2
                Point estimate 95%CI-ll 95%CI-ul
Odds ratio                0.61     0.48     0.77
Prev. frac. ex.           0.39     0.23     0.52
Prev. frac. pop           0.24       NA       NA
chi2(1)                  18.22       NA       NA
Pr>chi2                  0.000       NA       NA
Note
Micro filaria infected
Gender Yes No
Female 396 290
Male 426 190

OR (female) = 0.61 (0.48-0.77) p value = 0.0001

cc(onch1302, mf,adult)
$df1
                   Cases Controls Total
Exposed              677      205   882
Unexposed            145      275   420
Total                822      480  1302
Proportion exposed  0.82     0.43  0.68

$df2
                Point estimate 95%CI-ll 95%CI-ul
Odds ratio                6.26     4.82     8.15
Attr. frac. ex.           0.84     0.79     0.88
Attr. frac. pop           0.69       NA       NA
chi2(1)                 218.04       NA       NA
Pr>chi2                  0.000       NA       NA
Note
Micro filaria infected
Age Yes No
20+ 677 205
1-19 145 275

OR (female) = 6.26 (4.82-8.15) p value = 0.0001

cc(onch1302, mf,forest)
$df1
                   Cases Controls Total
Exposed              541      213   754
Unexposed            281      267   548
Total                822      480  1302
Proportion exposed  0.66     0.44  0.58

$df2
                Point estimate 95%CI-ll 95%CI-ul
Odds ratio                2.41     1.90     3.06
Attr. frac. ex.           0.59     0.47     0.67
Attr. frac. pop           0.39       NA       NA
chi2(1)                  57.15       NA       NA
Pr>chi2                  0.000       NA       NA
Note
Micro filaria infected
Residence Yes No
Forest 541 213
Savanna 281 261

OR (female) = 2.41 (1.90-3.06) p value = 0.0001

4. As a next step, use the ‘cc’ command from the ‘Epistats’ package to confirmm the associations between ‘mf’ and exposures ‘female’, ‘adult’ and ‘forest’ and add the 95% confidence intervals. Are these associations statistically significant? Who are more at risk, men or women?
Note

We did this in the previous step.

Number (%) OR (95% CI)
Female 686 (52.7) 0.6 (0.5-0.8)
Adult age 883 (67.7) 6.3 (4.8-8.2)
Forest 754 (57.9) 2.4 (1.9-3.1)

All bivariate associations are significant

5. Now compare results from table-based analyses with results from logistic regression. Are they consistent? For each of the three models note the OR and it’s 95% CI

Now we are going to calculate the OR again but with the logistic regression

5.1. mf and female

GLM1 <- glm(mf ~ female, family=binomial, data=onch1302)
  summary(GLM1)                   # To see the results of the logistic regression

Call:
glm(formula = mf ~ female, family = binomial, data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5338  -1.3122   0.8589   1.0483   1.0483  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.80742    0.08724   9.255  < 2e-16 ***
female      -0.49588    0.11655  -4.255 2.09e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1695.7  on 1300  degrees of freedom
AIC: 1699.7

Number of Fisher Scoring iterations: 4
  exp(coef(GLM1))                 # To calculate the OR
(Intercept)      female 
  2.2421053   0.6090335 
  exp(confint(GLM1))              # To calculate the confidence intervarl of the OR
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 1.8930554 2.6654704
female      0.4842344 0.7647899

5.2. mf and adult

GLM2 <- glm(mf ~ adult, family=binomial, data=onch1302)
  summary(GLM2)                   # To see the results of the logistic regression

Call:
glm(formula = mf ~ adult, family = binomial, data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7083  -0.9203   0.7274   0.7274   1.4584  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.6400     0.1026  -6.236 4.48e-10 ***
adult         1.8347     0.1300  14.118  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1497.8  on 1300  degrees of freedom
AIC: 1501.8

Number of Fisher Scoring iterations: 4
  exp(coef(GLM2))                 # To calculate the OR
(Intercept)       adult 
  0.5272727   6.2632464 
  exp(confint(GLM2))              # To calculate the confidence intervarl of the OR
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.4302207 0.6435193
adult       4.8647343 8.0981555

5.3. mf and forest

GLM3 <- glm(mf ~ forest, family=binomial, data=onch1302)
  summary(GLM3)                   # To see the results of the logistic regression

Call:
glm(formula = mf ~ forest, family = binomial, data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5900  -1.1992   0.8148   0.8148   1.1558  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.05111    0.08546   0.598     0.55    
forest       0.88102    0.11767   7.487 7.05e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1657.0  on 1300  degrees of freedom
AIC: 1661

Number of Fisher Scoring iterations: 4
  exp(coef(GLM3))                 # To calculate the OR
(Intercept)      forest 
   1.052434    2.413363 
  exp(confint(GLM3))              # To calculate the confidence intervarl of the OR
Waiting for profiling to be done...
                2.5 %   97.5 %
(Intercept) 0.8901384 1.244620
forest      1.9176644 3.042055
Note

Yes, the OR calculated with logistic regresion it is consistent with the OR calculated in step 4

Number (%) OR (95% CI)
Female 686 (52.7) 0.6 (0.5-0.8)
Adult age 883 (67.7) 6.3 (4.8-8.2)
Forest 754 (57.9) 2.4 (1.9-3.1)
6. The only exposure that we could do something about in an intervention is ‘living in the forest’, do you think age or gender could be confounders in the association between ‘living in the forest’ and being infected with micro filaria?
Note

Yes it could be confounders.

Clarification:

To being counfunders has to fulfill 3 conditions:
1. Has to be related to the VI, 2. TO be related to the VD, and 3. Not be in the causal pathway
            
            (Microfilaria Infection) ----- (living forest)
                                 \           /
                                  Confounders
                                 (Age or gender)
                                 
If it is permant residence, then probably age or gender cannot affect the residence
But in this case it doesn't say anything.

If adult men spend part of the year in the forest for professional reasons, 
both age and gender could be associated with living in forest and with having microfilaria infect.

Also age or gender do not seem to be in the causar pathway
7. Use ‘CCInter’ from the ‘Epistats’ package to test for confounding. Is there confounding or inter-action in the association between ‘mf’ and ‘forest’ by either ‘female’ or ‘adult’?

7.1 Confounding: being female

CCInter(onch1302, "mf", "forest", "female") 
$df1
   CCInter mf - forest by(female) Cases Controls          P.est. Stats 95%CI-ll
1                      female = 1  <NA>     <NA>      Odds ratio  2.87     2.07
2                         Exposed   265      120 Attrib.risk.exp  0.65     0.52
3                       Unexposed   131      170 Attrib.risk.pop  0.44     <NA>
4                           Total   396      290                  <NA>     <NA>
5                       Exposed % 66.9%    41.4%                  <NA>     <NA>
6                  ______________  <NA>     <NA>                  <NA>     <NA>
7                      female = 0  <NA>     <NA>      Odds ratio  1.92     1.34
8                         Exposed   276       93 Attrib.risk.exp  0.48     0.25
9                       Unexposed   150       97 Attrib.risk.pop  0.31     <NA>
10                          Total   426      190                  <NA>     <NA>
11                      Exposed % 64.8%    48.9%                  <NA>     <NA>
12                 ______________  <NA>     <NA>                  <NA>     <NA>
13                  Number of obs  1302     <NA>            <NA>  <NA>     <NA>
14                        Missing     0     <NA>            <NA>  <NA>     <NA>
   95%CI-ul
1      3.97
2      0.75
3      <NA>
4      <NA>
5      <NA>
6      <NA>
7      2.76
8      0.64
9      <NA>
10     <NA>
11     <NA>
12     <NA>
13     <NA>
14     <NA>

$df2
                        P.estimate Stats 95%CI-ll 95%CI-ul
1 MH test of Homogeneity (p-value)  0.09                  
2              Crude OR for forest  2.41     1.90     3.06
3 MH OR forest adjusted for female  2.40     1.90     3.02
4   Adjusted/crude relative change -0.75        _        _
Note

Steps for check for confounding

  1.Check for the wolf test (homogeneity test) if pvalue higher than 0.05,
  means no interaction (no effect modifier) and that could be a confounding
  
  2. If is not interaction, then check for the change between OR crude and OR adjusted, 
  if that relative change is higher than 10%, it is a confounding
  

So, with the results:

  1. The Homogeneity test had a p value of 0.09 (higher than 0.05)
  means no interaction and that could be a confounding
  
  2. Then, the OR crude is 2.41 and the OR adjusted 2.40. 
  Also the relative change is -0.75
  So sex  is not a confounding

7.2 Confounding: being adult

CCInter(onch1302, "mf", "forest", "adult") 
$df1
   CCInter mf - forest by(adult) Cases Controls          P.est. Stats 95%CI-ll
1                      adult = 1  <NA>     <NA>      Odds ratio  3.85     2.72
2                        Exposed   434       65 Attrib.risk.exp  0.74     0.63
3                      Unexposed   243      140 Attrib.risk.pop  0.47     <NA>
4                          Total   677      205                  <NA>     <NA>
5                      Exposed % 64.1%    31.7%                  <NA>     <NA>
6                 ______________  <NA>     <NA>                  <NA>     <NA>
7                      adult = 0  <NA>     <NA>      Odds ratio  2.42     1.53
8                        Exposed   107      148 Attrib.risk.exp  0.59     0.34
9                      Unexposed    38      127 Attrib.risk.pop  0.43     <NA>
10                         Total   145      275                  <NA>     <NA>
11                     Exposed % 73.8%    53.8%                  <NA>     <NA>
12                ______________  <NA>     <NA>                  <NA>     <NA>
13                 Number of obs  1302     <NA>            <NA>  <NA>     <NA>
14                       Missing     0     <NA>            <NA>  <NA>     <NA>
   95%CI-ul
1      5.46
2      0.82
3      <NA>
4      <NA>
5      <NA>
6      <NA>
7      3.86
8      0.74
9      <NA>
10     <NA>
11     <NA>
12     <NA>
13     <NA>
14     <NA>

$df2
                        P.estimate Stats 95%CI-ll 95%CI-ul
1 MH test of Homogeneity (p-value)  0.10                  
2              Crude OR for forest  2.41     1.90     3.06
3  MH OR forest adjusted for adult  3.23     2.48     4.22
4   Adjusted/crude relative change 34.04        _        _
Note

So, with the results:

  1. The Homogeneity test had a p value of 0.10 (higher than 0.05)
  means no interaction and that could be a confounding
  
  2. Then, the OR crude is 2.41 and the OR adjusted 3.23 
  Also the relative change is 34.04 (more than 10%)
  So being adult is a confounding
8. Does logistic regression confirm your findings on “adult” and “female” as potential confounders?

8.1 Logistic regresion only with: mf and forest (bivariate)

GLM4 <- glm(mf ~ forest, family=binomial, data=onch1302)
exp(coef(GLM4))
(Intercept)      forest 
   1.052434    2.413363 

8.2 Logistic regresion with: mf and forest and female (multivariate)

GLM5 <- glm(mf ~ forest + female, family=binomial, data=onch1302)
exp(coef(GLM5))
(Intercept)      forest      female 
  1.3739477   2.3967425   0.6165796 

8.3 Logistic regresion with: mf and forest and adult (multivariate)

GLM6 <- glm(mf ~ forest + adult, family=binomial, data=onch1302)
exp(coef(GLM6))
(Intercept)      forest       adult 
  0.2427931   3.2653865   7.6328965 

8.4 Logistic regresion with: mf and forest and female and adult (multivariate)

GLM7 <- glm(mf ~ forest + female + adult, family=binomial, data=onch1302)
exp(coef(GLM7))
(Intercept)      forest      female       adult 
  0.3313785   3.2577809   0.5342773   8.0289468 
Note

The OR of forest in model GLM4 (only with: mf and forest) is

  OR (crude) = 2.413363

  Interpretation: The odds of having a microfilarial infection are 2.41 times as high for those 
  who reside in the forest compared to those in the savanna.

The OR for forest in model GLM5 (with mf and forest and female) is 2.3967425 This ise quite the same with GLM4, so it confirms that adding female do not cause changes, so female it is not a confounder of (mf ~ forest association)

  Interpretation: The odds of having a microfilarial infection are 2.39 times as high for those 
  who reside in the forest compared to those in the savanna, if sex are held constant.

The OR for forest in model GLM6 (with mf and forest and adult) is 3.2653865 This OR is different from the GLM4, so it confirms that adding adult do cause changes, so being adult it is a confounder of (mf ~ forest association)

  Interpretation: The odds of having a microfilarial infection are 3.27 times as high for those 
  who reside in the forest compared to those in the savanna, if age(being adult) 
  are held constant.

The OR for forest in model GLM7 (with mf and forest and female and adult) is 3.2577809 This is different from the GLM4, so it still confirms that one of the variables added caused changes,

Interpretation: The odds of having a microfilarial infection are 3.26 times as high for those 
who reside in the forest compared to those in the savanna, if sex and age are held constant.
9. What happens if you add an additional term in the ‘CCInter’ command, e.g. if you type: ‘CCInter(Onch1302, “mf”,“forest”,“adult”,“female”)’?
CCInter(onch1302, "mf", "forest", "female", "adult") 
$df1
   CCInter mf - forest by(female) Cases Controls          P.est. Stats 95%CI-ll
1                      female = 1  <NA>     <NA>      Odds ratio  2.87     2.07
2                         Exposed   265      120 Attrib.risk.exp  0.65     0.52
3                       Unexposed   131      170 Attrib.risk.pop  0.44     <NA>
4                           Total   396      290                  <NA>     <NA>
5                       Exposed % 66.9%    41.4%                  <NA>     <NA>
6                  ______________  <NA>     <NA>                  <NA>     <NA>
7                      female = 0  <NA>     <NA>      Odds ratio  1.92     1.34
8                         Exposed   276       93 Attrib.risk.exp  0.48     0.25
9                       Unexposed   150       97 Attrib.risk.pop  0.31     <NA>
10                          Total   426      190                  <NA>     <NA>
11                      Exposed % 64.8%    48.9%                  <NA>     <NA>
12                 ______________  <NA>     <NA>                  <NA>     <NA>
13                  Number of obs  1302     <NA>            <NA>  <NA>     <NA>
14                        Missing     0     <NA>            <NA>  <NA>     <NA>
   95%CI-ul
1      3.97
2      0.75
3      <NA>
4      <NA>
5      <NA>
6      <NA>
7      2.76
8      0.64
9      <NA>
10     <NA>
11     <NA>
12     <NA>
13     <NA>
14     <NA>

$df2
                        P.estimate Stats 95%CI-ll 95%CI-ul
1 MH test of Homogeneity (p-value)  0.09                  
2              Crude OR for forest  2.41     1.90     3.06
3 MH OR forest adjusted for female  2.40     1.90     3.02
4   Adjusted/crude relative change -0.75        _        _
Note

What happened? Nothing. Only worked with female, doesn’t show anything about adults This command does not work for more than 1 independent variable

10.Now try to fit a logistic regression model with ‘mf’ as outcome variable and ‘forest’, ‘adult’ and ‘female’ as predictors. What happens now to the odds ratio of ‘forest’ in comparison to the model with only ‘forest’ and ‘adult’ as predictors?
Note

We already did this in step 8. They are asking for compare GLM7 with GLM6:

The OR for forest in model GLM6 (with mf and forest and adult) is 3.2653865

  Interpretation: The odds of having a microfilarial infection are 3.27 times as high for those 
  who reside in the forest compared to those in the savanna, if age(being adult) 
  are held constant.

The OR for forest in model GLM7 (with mf and forest and female and adult) is 3.2577809

  Interpretation: The odds of having a microfilarial infection are 3.26 times as high for those 
  who reside in the forest compared to those in the savanna, if sex and age are held constant.

There is a minimal change from 3.2577809 to 3.2653865. This is no surprise because we already knew that adding “female”, that is not a counfounder, so it was not going to cause change in the OR of forest.

Yet, the question still remains: for the final model will it be better to have sex or not? for this we have to compare both models with the likelihood ratio test. we will do it later

11. Please check that the null deviance (the -2LLR of the null model) is 1714.1 on 1301 degrees of freedom. For the model with ‘forest’ and ‘adult’ it is 1416.7 on 1299 degrees of freedom. For the model with ‘forest’, ‘adult’ and ‘female’ it is 1394.1 on 1298 degrees of freedom. So the differ-ence between the last two models is 1416.7-1394.1, which is equal to 22.6, on 1299-1298, i.e. 1 degree of freedom. Is this a significant difference, please check a chi square table or in Excel (formula CHIDIST(22.6,1)?
  1. Model ‘forest’ and ‘adult’ is GLM6 (Check step 8.3)
summary (GLM6)

Call:
glm(formula = mf ~ forest + adult, family = binomial, data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9765  -1.0805   0.5531   0.9290   1.8072  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.4155     0.1432  -9.884   <2e-16 ***
forest        1.1834     0.1359   8.709   <2e-16 ***
adult         2.0325     0.1402  14.499   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1416.7  on 1299  degrees of freedom
AIC: 1422.7

Number of Fisher Scoring iterations: 4
Note

For Model ‘forest’ and ‘adult’ (GLM6)

  null deviance: 1714.1
  Residual deviance: 1416.7   df: 1299
  1. Model ‘forest’, ‘adult’ and ‘female’ is GLM7 (Check step 8.4)
summary (GLM7)

Call:
glm(formula = mf ~ forest + female + adult, family = binomial, 
    data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1302  -0.9543   0.4673   0.7988   1.9465  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.1045     0.1564  -7.060 1.66e-12 ***
forest        1.1810     0.1370   8.619  < 2e-16 ***
female       -0.6268     0.1335  -4.696 2.65e-06 ***
adult         2.0831     0.1427  14.594  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1394.1  on 1298  degrees of freedom
AIC: 1402.1

Number of Fisher Scoring iterations: 4
Note

For Model ‘forest’, ‘adult’ and ‘female’ (GLM7)

  null deviance: 1714.1
  Residual deviance: 1394.1     df: 1298
Note

Note that the null deviance is the same for all the models (because the null deviance refers to a model without any factor)

So the differ-ence between the two models is

Residual deviance 1416.7-1394.1 = 22.6, 
Degrees of freedom 1299-1298 = 1 

Is this a significant difference, please check a chi square table or in Excel (formula CHIDIST(22.6,1)?

With the Chi square in excel we will obtain p=0.00002
So the difference between both models is highly significant.
12. So apparently the model with the three terms, ‘forest’, ‘adult’ and ‘female’ is significantly better than the model with only ‘forest’ and ‘adult’. The easier way to check this is to do a likelihood ra-tio test in R. First run the complex model with three terms, then run the simple model with two terms and next use the anova() function to make the comparison.

12.1 Fist the model with ‘forest’, ‘adult’ and ‘female’ (GLM7)

#We will run again the Model ‘forest’, ‘adult’ and ‘female’ (GLM7)
GLM7 <- glm(mf ~ forest + female + adult, family=binomial, data=onch1302)
summary (GLM7)

Call:
glm(formula = mf ~ forest + female + adult, family = binomial, 
    data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1302  -0.9543   0.4673   0.7988   1.9465  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.1045     0.1564  -7.060 1.66e-12 ***
forest        1.1810     0.1370   8.619  < 2e-16 ***
female       -0.6268     0.1335  -4.696 2.65e-06 ***
adult         2.0831     0.1427  14.594  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1394.1  on 1298  degrees of freedom
AIC: 1402.1

Number of Fisher Scoring iterations: 4

12.2 Then the model with ‘forest’ and ‘adult’ (GLM6)

#Model ‘forest’ and ‘adult’
GLM6 <- glm(mf ~ forest + adult, family=binomial, data=onch1302)
summary (GLM7)

Call:
glm(formula = mf ~ forest + female + adult, family = binomial, 
    data = onch1302)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1302  -0.9543   0.4673   0.7988   1.9465  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.1045     0.1564  -7.060 1.66e-12 ***
forest        1.1810     0.1370   8.619  < 2e-16 ***
female       -0.6268     0.1335  -4.696 2.65e-06 ***
adult         2.0831     0.1427  14.594  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1714.1  on 1301  degrees of freedom
Residual deviance: 1394.1  on 1298  degrees of freedom
AIC: 1402.1

Number of Fisher Scoring iterations: 4

12.3 Likelyhood ratio test: To check if adding female change the model significantly

anova(GLM6, GLM7, test="Chisq")
Analysis of Deviance Table

Model 1: mf ~ forest + adult
Model 2: mf ~ forest + female + adult
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      1299     1416.7                          
2      1298     1394.1  1   22.562 2.035e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note

So, according to the likelihood ratio test (anova) the p value is 2.035e-06 (less of 0.05) So adding sex(female) does generate a change in the model. Because of that we keep the more complex model.

The final model would be:

GLM7 <- glm(mf ~ forest + female + adult, family=binomial, data=onch1302)