2023-02-14 Session 2
Survival Analysis - Session 2
Starting the Exercise
Indications:
The data in ExerciseII.xls are results of an (hypothetical) observational cancer trial. The aim of the study was to compare two treatment strategies for a brain tumour: one of consists of local radiation (control regimen), one of a combination of local radiation and aggressive chemotherapy (test regimen). The endpoint of the study was survival duration after therapy.
Use R to explore the data. Describe the patient demographics and survival using Kaplan-Meier curves. Do you think that the test regimen prolongs survival? Analyze the data further using Cox-regression. Describe your conclusions.
Analysis variable: Time to death
Initial time: When the treatment begins
End time: time of dead
Independent variable: Type of treatment (local radiation vs local+chemotherapy)
Step 1
First we import the data from an excel file
To start working with survival analysis we need an event variable (1 when the event happened, otherwise 0)
In this case we already have the “died” variable
Step 2
Doing the Kaplan-Meyer curve (survival curve) for all the sample
Step 3
Kaplan-Meyer curve (survival curve) by group of treatment
We can also see the numbers of the graphic
Call: survfit(formula = Surv(time, died) ~ trt, data = ExerciseII)
n events median 0.95LCL 0.95UCL
trt=0 300 225 512 450 600
trt=1 200 151 434 332 605
In Treatment 0 (local radiation): 50% of the population was alive at day 512
In Treatment 1 (local + chemoteraphy): 50% of the population was alive at day 434
So, at first glance, it seems that in treatment 1, people tend to die faster. Is this true?? We cannot say yet, we need to do the multivariate analysis with other variables that may be intervening
Step 4
Creating Cox Regression model (using the classical model)
4.1 Creating Cox regression with each of the (possible) predictors separately (bivariate analysis):
The Possible predictors could be: gender, trt and type
All those could be predictors or counfounders because this is not a clinical trial. (In a clinical trial the amount of persons in each group is fixed)
```{r}
#Creating the Cox models for each variable (bivariate)
library(survival)
Cox1<- coxph(Surv(time, died) ~ trt, data=ExerciseII)
Cox2<- coxph(Surv(time, died) ~ gender, data=ExerciseII)
Cox3<- coxph(Surv(time, died) ~ type, data=ExerciseII)
summary(Cox1)
summary(Cox2)
summary(Cox3)
```
Call:
coxph(formula = Surv(time, died) ~ trt, data = ExerciseII)
n= 500, number of events= 376
coef exp(coef) se(coef) z Pr(>|z|)
trt 0.2342 1.2638 0.1055 2.221 0.0264 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
trt 1.264 0.7912 1.028 1.554
Concordance= 0.514 (se = 0.013 )
Likelihood ratio test= 4.85 on 1 df, p=0.03
Wald test = 4.93 on 1 df, p=0.03
Score (logrank) test = 4.95 on 1 df, p=0.03
Call:
coxph(formula = Surv(time, died) ~ gender, data = ExerciseII)
n= 500, number of events= 376
coef exp(coef) se(coef) z Pr(>|z|)
genderM 0.03332 1.03388 0.10319 0.323 0.747
exp(coef) exp(-coef) lower .95 upper .95
genderM 1.034 0.9672 0.8446 1.266
Concordance= 0.502 (se = 0.014 )
Likelihood ratio test= 0.1 on 1 df, p=0.7
Wald test = 0.1 on 1 df, p=0.7
Score (logrank) test = 0.1 on 1 df, p=0.7
Call:
coxph(formula = Surv(time, died) ~ type, data = ExerciseII)
n= 500, number of events= 376
coef exp(coef) se(coef) z Pr(>|z|)
type 1.2583 3.5193 0.1112 11.32 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
type 3.519 0.2841 2.83 4.376
Concordance= 0.644 (se = 0.012 )
Likelihood ratio test= 131.9 on 1 df, p=<2e-16
Wald test = 128.1 on 1 df, p=<2e-16
Score (logrank) test = 141.8 on 1 df, p=<2e-16
Factor | HR | 95% IC | p-value |
---|---|---|---|
trt | 1.264 | 1.028 1.554 | 0.0264 |
gender | 1.034 | 0.8446 1.266 | 0.747 |
type | 3.519 | 2.83 4.376 | < 2e-16 |
Now we have to select those with p < 0.200 (or 0.100 depending on our analysis plan) for further modeling
This means that all variables with p < 0.200 (type) are potential factors that could be in our final model.
We also keep “trt” because it is the main independent variable
4.2 Entering the selected variables in a multivariable model and then removing the non-significant factors one by one (always removing the least significant factor). “Backwards elimination”
```{r}
#Creating the Cox model for all selected variables (multivariate)
library(survival)
Cox4<- coxph(Surv(time, died) ~ trt + type , data=ExerciseII)
summary(Cox4)
```
Call:
coxph(formula = Surv(time, died) ~ trt + type, data = ExerciseII)
n= 500, number of events= 376
coef exp(coef) se(coef) z Pr(>|z|)
trt -0.5853 0.5569 0.1233 -4.747 2.06e-06 ***
type 1.5945 4.9257 0.1316 12.115 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
trt 0.5569 1.796 0.4374 0.7092
type 4.9257 0.203 3.8058 6.3751
Concordance= 0.67 (se = 0.013 )
Likelihood ratio test= 154.6 on 2 df, p=<2e-16
Wald test = 149.9 on 2 df, p=<2e-16
Score (logrank) test = 163.7 on 2 df, p=<2e-16
Both independant variable (trt and type) are significant (p value = 2.06e-06 and < 2e-16 )
So we need to keep both independent variables. But, are they ok like that, or there is an interaction?
4.3 Checking for interaction
```{r}
#Creating the Cox model for all selected variables considering interaction
Cox5<- coxph(Surv(time, died) ~ trt * type , data=ExerciseII)
summary(Cox5)
```
Call:
coxph(formula = Surv(time, died) ~ trt * type, data = ExerciseII)
n= 500, number of events= 376
coef exp(coef) se(coef) z Pr(>|z|)
trt -0.1068 0.8987 0.2191 -0.487 0.6260
type 1.7610 5.8185 0.1466 12.014 <2e-16 ***
trt:type -0.6405 0.5270 0.2584 -2.479 0.0132 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
trt 0.8987 1.1127 0.5850 1.3808
type 5.8185 0.1719 4.3656 7.7550
trt:type 0.5270 1.8974 0.3176 0.8745
Concordance= 0.67 (se = 0.013 )
Likelihood ratio test= 160.2 on 3 df, p=<2e-16
Wald test = 163.4 on 3 df, p=<2e-16
Score (logrank) test = 189.7 on 3 df, p=<2e-16
It seems that with the treatment effect interaction it is no longer significant (HR: 0.8987, pvalue= 0.6260), but it is not true.
Because we’re introducing the interaction, when we see the HR = 0.8987 it’s the HR for the local cancer only.
The data from the model is:
Factor | HR |
---|---|
trt | 0.8987 |
type | 5.8185 |
trt:type | 0.5270 |
But to know the correct HR of trt we have to add the HR of the interaction:
. HR of trt = 0.899^trt * 0.5270^(trt*type)
HR trt in local cancer = 0.899^trt * 0.5270^(trt0) HR trt in local cancer = 0.899^trt 1 HR trt in local cancer = 0.899
HR trt for metastatic cancer = 0.899^trt * 0.5270^(trttype) HR trt for metastatic cancer = 0.899^trt 0.5270^(trt1) HR trt for metastatic cancer = 0.899 0.527 HR trt for metastatic cancer = 0.473
4.4 We need to compare both models to check if the model with interaction is significantly better
Analysis of Deviance Table
Cox model: response is Surv(time, died)
Model 1: ~ trt * type
Model 2: ~ trt + type
loglik Chisq Df P(>|Chi|)
1 -1995.9
2 -1998.8 5.6712 1 0.01725 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant difference with and without interaction models (p = 0.01725). So we keep the interaction.
4.5 To present better the data with interaction is better to divide the dataset according each group of interaction and to present the model separately.
Creating subsets according each group of interaction:
```{r}
table(ExerciseII$type)
LocalCancer <- subset(ExerciseII, type==0) #Creating a database for local cancer
Metastatic <- subset(ExerciseII, type==1) #Creating a database for metastatic
```
0 1
240 260
Checking the kaplan-meyer curve and cox regressión in each subset
4.5.1 IN LOCAL CANCER
Call:
coxph(formula = Surv(time, died) ~ trt, data = LocalCancer)
n= 240, number of events= 151
coef exp(coef) se(coef) z Pr(>|z|)
trt -0.1020 0.9030 0.2191 -0.465 0.642
exp(coef) exp(-coef) lower .95 upper .95
trt 0.903 1.107 0.5877 1.387
Concordance= 0.511 (se = 0.016 )
Likelihood ratio test= 0.22 on 1 df, p=0.6
Wald test = 0.22 on 1 df, p=0.6
Score (logrank) test = 0.22 on 1 df, p=0.6
HR local cancer = 0.90 IC95%(0.5877 1.387)
Among those with local cancer, the ones that undergo the new treatment has 10% less hazard of die than the ones that went with the traditional treatment.But this is not significant.
That is the same to say:
Among those with local cancer, getting the new treatment reduces your hazard of dying by 10%. But this is not significant.
4.4.2 IN METASTATIC CANCER
Call:
coxph(formula = Surv(time, died) ~ trt, data = Metastatic)
n= 260, number of events= 225
coef exp(coef) se(coef) z Pr(>|z|)
trt -0.7685 0.4637 0.1430 -5.375 7.67e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
trt 0.4637 2.156 0.3504 0.6137
Concordance= 0.592 (se = 0.018 )
Likelihood ratio test= 28.13 on 1 df, p=1e-07
Wald test = 28.89 on 1 df, p=8e-08
Score (logrank) test = 30.19 on 1 df, p=4e-08
HR metastatic cancer = 0.46 IC95% (0.3504 0.6137)
Among those with metastasis cancer, the ones that undergo the new treatment has 54% less hazard of die than the ones that went with the traditional treatment.
That is the same to say:
Among those with metastasis cancer, getting the new treatment reduces your hazard of dying by 54%
HR local cancer = 0.90