2023-02-14 Session 2

Survival Analysis - Session 2

Writer
Affiliation
Javier Silva-Valencia

Instituut Voor Tropische Geneeskunde. Antwerp-Belgium

Published

2023-02-14

Abstract
The objective of the exercise is to practice Cox-regression. It starts doing a survival curve (K-M) for all the sample, then by group of treatment. Finally we need to create the Cox-regression model using the classical approach to select the variables that will be part of the model.


Starting the Exercise

Indications:

The data in ExerciseII.xls are results of an (hypothetical) observational cancer trial. The aim of the study was to compare two treatment strategies for a brain tumour: one of consists of local radiation (control regimen), one of a combination of local radiation and aggressive chemotherapy (test regimen). The endpoint of the study was survival duration after therapy.

Use R to explore the data. Describe the patient demographics and survival using Kaplan-Meier curves. Do you think that the test regimen prolongs survival? Analyze the data further using Cox-regression. Describe your conclusions.

Analysis variable: Time to death

Initial time: When the treatment begins

End time: time of dead

Independent variable: Type of treatment (local radiation vs local+chemotherapy)

Step 1

First we import the data from an excel file

library(readxl)
ExerciseII <- read_excel("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2/Survival Analysis Exercise Database/Exercises/ExerciseII.xls")

To start working with survival analysis we need an event variable (1 when the event happened, otherwise 0)

In this case we already have the “died” variable

Step 2

Doing the Kaplan-Meyer curve (survival curve) for all the sample

```{r}
#Code for K-M curve
library(survival)
mod0 <- survfit(Surv(time, died) ~ 1, data=ExerciseII)
```
#Code for Kaplan-Meyer Graph
plot(mod0, mark.time=TRUE)

Step 3

Kaplan-Meyer curve (survival curve) by group of treatment

```{r}
#Code for K-M curve
mod1 <- survfit(Surv(time, died) ~ trt, data=ExerciseII)

#Code for Kaplan-Meyer Graph
plot(mod1, mark.time=TRUE, lty=1:2)
legend ("top", legend=unique(ExerciseII$trt), lty=1:2)
```

We can also see the numbers of the graphic

mod1
Call: survfit(formula = Surv(time, died) ~ trt, data = ExerciseII)

        n events median 0.95LCL 0.95UCL
trt=0 300    225    512     450     600
trt=1 200    151    434     332     605
Interpretation

In Treatment 0 (local radiation): 50% of the population was alive at day 512

In Treatment 1 (local + chemoteraphy): 50% of the population was alive at day 434

So, at first glance, it seems that in treatment 1, people tend to die faster. Is this true?? We cannot say yet, we need to do the multivariate analysis with other variables that may be intervening

Step 4

Creating Cox Regression model (using the classical model)

4.1 Creating Cox regression with each of the (possible) predictors separately (bivariate analysis):

Tip

The Possible predictors could be: gender, trt and type

All those could be predictors or counfounders because this is not a clinical trial. (In a clinical trial the amount of persons in each group is fixed)

```{r}
#Creating the Cox models for each variable (bivariate)
library(survival)
Cox1<- coxph(Surv(time, died) ~ trt, data=ExerciseII)
Cox2<- coxph(Surv(time, died) ~ gender, data=ExerciseII)
Cox3<- coxph(Surv(time, died) ~ type, data=ExerciseII)

summary(Cox1)
summary(Cox2)
summary(Cox3)
```
Call:
coxph(formula = Surv(time, died) ~ trt, data = ExerciseII)

  n= 500, number of events= 376 

      coef exp(coef) se(coef)     z Pr(>|z|)  
trt 0.2342    1.2638   0.1055 2.221   0.0264 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
trt     1.264     0.7912     1.028     1.554

Concordance= 0.514  (se = 0.013 )
Likelihood ratio test= 4.85  on 1 df,   p=0.03
Wald test            = 4.93  on 1 df,   p=0.03
Score (logrank) test = 4.95  on 1 df,   p=0.03

Call:
coxph(formula = Surv(time, died) ~ gender, data = ExerciseII)

  n= 500, number of events= 376 

           coef exp(coef) se(coef)     z Pr(>|z|)
genderM 0.03332   1.03388  0.10319 0.323    0.747

        exp(coef) exp(-coef) lower .95 upper .95
genderM     1.034     0.9672    0.8446     1.266

Concordance= 0.502  (se = 0.014 )
Likelihood ratio test= 0.1  on 1 df,   p=0.7
Wald test            = 0.1  on 1 df,   p=0.7
Score (logrank) test = 0.1  on 1 df,   p=0.7

Call:
coxph(formula = Surv(time, died) ~ type, data = ExerciseII)

  n= 500, number of events= 376 

       coef exp(coef) se(coef)     z Pr(>|z|)    
type 1.2583    3.5193   0.1112 11.32   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
type     3.519     0.2841      2.83     4.376

Concordance= 0.644  (se = 0.012 )
Likelihood ratio test= 131.9  on 1 df,   p=<2e-16
Wald test            = 128.1  on 1 df,   p=<2e-16
Score (logrank) test = 141.8  on 1 df,   p=<2e-16
Results from above
Factor HR 95% IC p-value
trt 1.264 1.028 1.554 0.0264
gender 1.034 0.8446 1.266 0.747
type 3.519 2.83 4.376 < 2e-16

Now we have to select those with p < 0.200 (or 0.100 depending on our analysis plan) for further modeling

This means that all variables with p < 0.200 (type) are potential factors that could be in our final model.

We also keep “trt” because it is the main independent variable

4.2 Entering the selected variables in a multivariable model and then removing the non-significant factors one by one (always removing the least significant factor). “Backwards elimination”

```{r}
#Creating the Cox model for all selected variables (multivariate)
library(survival)
Cox4<- coxph(Surv(time, died) ~ trt + type , data=ExerciseII)
summary(Cox4)
```
Call:
coxph(formula = Surv(time, died) ~ trt + type, data = ExerciseII)

  n= 500, number of events= 376 

        coef exp(coef) se(coef)      z Pr(>|z|)    
trt  -0.5853    0.5569   0.1233 -4.747 2.06e-06 ***
type  1.5945    4.9257   0.1316 12.115  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
trt     0.5569      1.796    0.4374    0.7092
type    4.9257      0.203    3.8058    6.3751

Concordance= 0.67  (se = 0.013 )
Likelihood ratio test= 154.6  on 2 df,   p=<2e-16
Wald test            = 149.9  on 2 df,   p=<2e-16
Score (logrank) test = 163.7  on 2 df,   p=<2e-16
Reply

Both independant variable (trt and type) are significant (p value = 2.06e-06 and < 2e-16 )

So we need to keep both independent variables. But, are they ok like that, or there is an interaction?

4.3 Checking for interaction

```{r}
#Creating the Cox model for all selected variables considering interaction
Cox5<- coxph(Surv(time, died) ~ trt * type , data=ExerciseII)
summary(Cox5)
```
Call:
coxph(formula = Surv(time, died) ~ trt * type, data = ExerciseII)

  n= 500, number of events= 376 

            coef exp(coef) se(coef)      z Pr(>|z|)    
trt      -0.1068    0.8987   0.2191 -0.487   0.6260    
type      1.7610    5.8185   0.1466 12.014   <2e-16 ***
trt:type -0.6405    0.5270   0.2584 -2.479   0.0132 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
trt         0.8987     1.1127    0.5850    1.3808
type        5.8185     0.1719    4.3656    7.7550
trt:type    0.5270     1.8974    0.3176    0.8745

Concordance= 0.67  (se = 0.013 )
Likelihood ratio test= 160.2  on 3 df,   p=<2e-16
Wald test            = 163.4  on 3 df,   p=<2e-16
Score (logrank) test = 189.7  on 3 df,   p=<2e-16

It seems that with the treatment effect interaction it is no longer significant (HR: 0.8987, pvalue= 0.6260), but it is not true.

Because we’re introducing the interaction, when we see the HR = 0.8987 it’s the HR for the local cancer only.

Reply

The data from the model is:

Factor HR
trt 0.8987
type 5.8185
trt:type 0.5270

But to know the correct HR of trt we have to add the HR of the interaction:

. HR of trt = 0.899^trt * 0.5270^(trt*type)

HR trt in local cancer = 0.899^trt * 0.5270^(trt0) HR trt in local cancer = 0.899^trt 1 HR trt in local cancer = 0.899

HR trt for metastatic cancer = 0.899^trt * 0.5270^(trttype) HR trt for metastatic cancer = 0.899^trt 0.5270^(trt1) HR trt for metastatic cancer = 0.899 0.527 HR trt for metastatic cancer = 0.473

4.4 We need to compare both models to check if the model with interaction is significantly better

anova(Cox5, Cox4, test="Chisq")
Analysis of Deviance Table
 Cox model: response is  Surv(time, died)
 Model 1: ~ trt * type
 Model 2: ~ trt + type
   loglik  Chisq Df P(>|Chi|)  
1 -1995.9                      
2 -1998.8 5.6712  1   0.01725 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reply

There is a significant difference with and without interaction models (p = 0.01725). So we keep the interaction.

4.5 To present better the data with interaction is better to divide the dataset according each group of interaction and to present the model separately.

Creating subsets according each group of interaction:

```{r}
table(ExerciseII$type)
LocalCancer  <-  subset(ExerciseII, type==0)  #Creating a database for local cancer
Metastatic  <-  subset(ExerciseII, type==1)    #Creating a database for metastatic
```

  0   1 
240 260 

Checking the kaplan-meyer curve and cox regressión in each subset

4.5.1 IN LOCAL CANCER

```{r}
#Kaplan-Meyer curve
modLC <- survfit(Surv(time, died) ~ trt, data=LocalCancer)
plot(modLC, mark.time=TRUE, lty=1:2)
legend ("top", legend=unique(LocalCancer$trt), lty=1:2)
```

```{r}
#Cox Regression
CoxLC<- coxph(Surv(time, died) ~ trt, data=LocalCancer)
summary(CoxLC)
```
Call:
coxph(formula = Surv(time, died) ~ trt, data = LocalCancer)

  n= 240, number of events= 151 

       coef exp(coef) se(coef)      z Pr(>|z|)
trt -0.1020    0.9030   0.2191 -0.465    0.642

    exp(coef) exp(-coef) lower .95 upper .95
trt     0.903      1.107    0.5877     1.387

Concordance= 0.511  (se = 0.016 )
Likelihood ratio test= 0.22  on 1 df,   p=0.6
Wald test            = 0.22  on 1 df,   p=0.6
Score (logrank) test = 0.22  on 1 df,   p=0.6
Interpretation:

HR local cancer = 0.90 IC95%(0.5877 1.387)

Among those with local cancer, the ones that undergo the new treatment has 10% less hazard of die than the ones that went with the traditional treatment.But this is not significant.

That is the same to say:

Among those with local cancer, getting the new treatment reduces your hazard of dying by 10%. But this is not significant.

4.4.2 IN METASTATIC CANCER

```{r}
#Kaplan-Meyer curve
modMC <- survfit(Surv(time, died) ~ trt, data=Metastatic)
plot(modMC, mark.time=TRUE, lty=1:2)
legend ("top", legend=unique(Metastatic$trt), lty=1:2)
```

```{r}
#Cox regressión
CoxMC<- coxph(Surv(time, died) ~ trt, data=Metastatic)
summary(CoxMC)
```
Call:
coxph(formula = Surv(time, died) ~ trt, data = Metastatic)

  n= 260, number of events= 225 

       coef exp(coef) se(coef)      z Pr(>|z|)    
trt -0.7685    0.4637   0.1430 -5.375 7.67e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
trt    0.4637      2.156    0.3504    0.6137

Concordance= 0.592  (se = 0.018 )
Likelihood ratio test= 28.13  on 1 df,   p=1e-07
Wald test            = 28.89  on 1 df,   p=8e-08
Score (logrank) test = 30.19  on 1 df,   p=4e-08
Interpretation:

HR metastatic cancer = 0.46 IC95% (0.3504 0.6137)

Among those with metastasis cancer, the ones that undergo the new treatment has 54% less hazard of die than the ones that went with the traditional treatment.

That is the same to say:

Among those with metastasis cancer, getting the new treatment reduces your hazard of dying by 54%

HR local cancer = 0.90