2023-02-13 Session 1

Survival Analysis - Session 1

Writer
Javier Silva-Valencia

Affiliation
Instituut Voor Tropische Geneeskunde. Antwerp-Belgium

Published

2023-02-13

Abstract
The exercise begins by asking you to identify the cases in which survival analysis is needed. Specific patient examples are then tabulated using either an intention-to-treat or a per-protocol approach. Finally, we create Kaplan-Meier curves.


Starting the Exercise

Question 1

Which of the following types of data can/should be analyzed using survival analysis? What would be the analysis variable? What is the time of origin and the end point? Are the data censored, and if yes, which type of censoring occurs?

  1. Two groups of patients with skin cancer are treated: group A gets surgical removal of the cancer, group B gets surgical removal + chemotherapy. We believe that the chemotherapy will reduce the risk of recurrence of the cancer. The patients are asked to return for a check-up every 3 months for the following 5 years. At every check-up, the patients are examined for skin cancers.
Reply

a1. Should be analyzed using survival analysis?: Yes.

Why? Because we are interested in the time from one event to another, and because the data could be censored (for some individuals we do not know exactly when the end point occurs).

Why not linear regression? Because of the censoring. Some individuals may be lost to follow-up or may complete the study without having the event.

a2. Analysis variable: Time to cancer recurrence

a3. Time:

Time of origin: Date on which the intervention (surgery) is carried out

End point: Cancer recurrence

a4. Are the data censored?: Yes. Censoring in survival analysis refers to the situation in which the exact time to event is not known for some individuals in a study.

What type of censoring occurs?: Interval censoring. Recurrence is only detected at a check-up, so we only know that it appeared at some point between the last visit without recurrence and the visit at which it was found. The moment at which a cancer begins is also not well defined. In addition, patients who are lost to follow-up or who complete the 5 years without recurrence are right-censored.
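As a rough illustration of how interval-censored check-up data can be encoded in R, the sketch below uses `Surv()` from the survival package with `type = "interval2"`. The three patients and their visit times are hypothetical and are not part of the exercise.

```r
library(survival)

# Hypothetical check-up data (months since surgery); NOT from the exercise.
# left  = last check-up at which no recurrence was found
# right = first check-up at which recurrence was found
#         (NA = no recurrence seen before follow-up ended, i.e. right-censored)
left  <- c(3, 6, 12)
right <- c(6, 9, NA)

# type = "interval2" takes the lower and upper bound of each interval
s <- Surv(left, right, type = "interval2")
print(s)
```

The first two patients are interval-censored (recurrence happened somewhere between two visits), while the third is right-censored at the last visit.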

  1. Two groups of patients with malaria are entered in a clinical trial: 1 group receives a new treatment, the other group receives standard care. After 3 months they are asked to return for parasitological examination. We want to see if the new treatment cures more patients than standard therapy.
Reply

b1. Should be analyzed using survival analysis?: No. Our outcome does not involve time. It is a single measurement at 3 months.

For this we should use logistic regression, because the outcome is a binary variable (parasite-free or not).

b2. Analysis variable: % of patients parasite-free at month 3

  1. To assess risk behaviour for HIV and other STIs, a group of adolescents (12 to 18 years) are asked when they had sexual intercourse for the first time. We are interested in estimating the age of sexual debut.
Reply

c1. Should be analyzed using survival analysis?: Yes. Our outcome is the time of sexual debut, so we are interested in the time from one event to another. The data could also be censored.

c2. Analysis variable: Age of sexual debut

c3. Time:

Time of origin: Birth

End point: Sexual debut (first sexual intercourse)

c4. Are the data censored?: Yes. Some adolescents will not yet have had the event (first sexual intercourse) at the time of the survey.

This is a cross-sectional study, but it asks about a time. (It is a bit tricky.)

What type of censoring occurs?: Right censoring. For adolescents who have not yet had sex at the time of the survey, we only know that the event, if it occurs, will occur after their current age.

  1. To assess risk behaviors for childhood asthma, we perform a case-control study comparing mothers of children with asthma and children without asthma. We ask the mothers when they were first pregnant. We are interested in estimating the age of first pregnancy for women with a child with asthma vs women with a child without asthma
Reply

d1. Should be analyzed using survival analysis?: Yes. We are interested in the time from one event to another. But, since there is no censoring here, we can also choose to use linear regression.

d2. Analysis variable: Age at first pregnancy

Independent variable: Having a child with asthma (yes/no)

d3. Time:

Time of origin: Birth

End point: Age of first pregnancy

d4. Are the data censored? No. Censoring occurs when we do not know the end point for some individuals. In this case, all the women have had a first pregnancy, so they have all reached the outcome and we know when. There can also be no losses to follow-up, because the information is collected at a single point in time.

Question 2

We ran a clinical trial of a new less invasive surgical procedure (treatment A) for patients with myocardial infarction. This procedure is compared to the standard treatment (treatment B) which requires open-heart surgery. After surgery patients were required to return to the clinic every 3 months for a check-up. The trial was completed on 1-Jan-2006. The primary objective of the study is to compare the two treatment groups with respect to mortality, analyzed using an intention-to-treat approach. As a secondary analysis, time to cardiac event (defined as new MI or a new cardiac procedure or cardiac death) is analyzed, using a per-protocol approach.

In an intention-to-treat analysis, you analyze everyone according to the group assigned at randomization, even if, for whatever reason, some did not receive the planned treatment.

In a per-protocol analysis, you analyze only those who actually received the treatment to which they were randomized.

Intention to treat vs Per protocol

  1. Patient 001 was randomized to treatment A. The surgery was performed on 1-Jul-2004 and was successful according to the surgeon. The patient came in for his visits and was in good health at the end of the study.
Tip

Table according to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date  Time (months)
1   A                   0              1-Jul-04    1-Jan-06   End of study         18.30

Table according to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
1   A                   A                 0              1-Jul-04    1-Jan-06               End of study         18.30

The cardiac event is defined as new MI or a new cardiac procedure or cardiac death.
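The time column in these tables can be reproduced from the two dates. The sketch below assumes the convention that the tabulated values appear to follow, namely days of follow-up divided by 30; the helper name `months_between` is made up for this illustration.

```r
# Hypothetical helper: follow-up time in months, taking one month as 30 days
# (an assumption that matches the values shown in the tables)
months_between <- function(start, end) {
  as.numeric(end - start) / 30   # subtracting two Dates gives a difference in days
}

# Patient 001: surgery on 1-Jul-2004, alive at the end of the study on 1-Jan-2006
start <- as.Date("2004-07-01")
end   <- as.Date("2006-01-01")

days     <- as.numeric(end - start)
t_months <- months_between(start, end)
print(c(days = days, months = t_months))
```

The same helper applied to the other patients' start and end dates gives their tabulated times after rounding to two decimals.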

  1. Patient 002 was randomized to treatment A. The surgery was performed on 15-Jul-2004 and was successful according to the surgeon. On 15-Nov-2004, the patient came into the hospital with a new myocardial infarction. The patient however died 15 days later in hospital.
Tip

According to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date  Time (months)
2   A                   1              15-Jul-04   1-Dec-04   Died                 4.63

According to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
2   A                   A                 1              15-Jul-04   15-Nov-04              New MI               4.10
  1. Patient 003 was randomized to treatment B. The surgery was performed on 30-Jul-2004 and was successful. The patient came to the hospital for the first 2 visits. The second visit was on 1-Feb-2005, when the patient was in good health. However, the patient did not return to the hospital after this date.
Tip

According to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date  Time (months)
3   B                   0              30-Jul-04   1-Feb-05   Lost to follow-up    6.20

According to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
3   B                   B                 0              30-Jul-04   1-Feb-05               Lost to follow-up    6.20
  1. Patient 004 was randomized to standard treatment. When in the operating room (on 1-Aug-2004), the surgeon however decided that the standard surgery was too dangerous to perform, due to the bad condition of the patient. Instead, the new less invasive operation was performed. The patient died in hospital 15 days after the operation.
Tip

According to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date  Time (months)
4   B                   1              1-Aug-04    16-Aug-04  Died                 0.50

According to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
4   B                   A                 Excluded

Why is this patient excluded from the per-protocol analysis? Because the patient was randomized to B but did not receive the assigned procedure (A was performed instead).

  1. Patient 005 was randomized to treatment B. The surgery was performed on 1-Nov-2004 and was successful according to the surgeon. The patient returned for his first follow-up visit on 1-Feb-2005. The patient did not return to the hospital for any of the next follow-up visits. However, one of the study nurses read in the newspaper that the patient (whose name she remembered) had died on 1-Sep-2005 due to a fall when performing repairs to the roof of his house.
Tip

According to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date            Time (months)
5   B                   1              1-Nov-04    1-Sep-05   Died (fall; any death counts)  10.13

According to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
5   B                   B                 0              1-Nov-04    1-Feb-05               Lost to follow-up    3.07
  1. Patient 006 was randomized to treatment A. The surgery was performed on 1-Dec-2004 and was successful according to the surgeon. On 1-Mar-2005, the patient came into the hospital with a new MI. The surgeon decided to perform open-heart surgery, similar to treatment B. After this surgery, the patient did well and returned regularly for the check-ups. On 1-Jan-2006 the patient was still alive and doing well.
Tip

According to an intention-to-treat analysis

ID  Original treatment  Event (death)  Start date  End date   Reason for end date  Time (months)
6   A                   0              1-Dec-04    1-Jan-06   End of study         13.20

According to a per-protocol analysis

ID  Original treatment  Actual treatment  Cardiac event  Start date  End date (first date)  Reason for end date  Time to event (months)
6   A                   A                 1              1-Dec-04    1-Mar-05               New MI               3.00

Why is the treatment still A in the per-protocol analysis? Because this patient did receive the assigned (randomized) procedure. The open-heart surgery came later, after the new MI, which already counts as the cardiac event.

Summary of all the cases

So the final table for intention-to-treat analysis is:

ID Treatment Time (months) Event
1 A 18.30 0
2 A 4.63 1
3 B 6.20 0
4 B 0.50 1
5 B 10.13 1
6 A 13.20 0

And the final table for per-protocol analysis is:

ID Treatment Time (months) Event
1 A 18.30 0
2 A 4.10 1
3 B 6.20 0
4 B (received A) Excluded
5 B 3.07 0
6 A 3.00 1
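To make the difference between the two analyses concrete, the sketch below enters the six patients in R, using the values from the tables above, and builds both data sets. The data frame and column names (`pat`, `assigned`, `received`, and so on) are invented for this illustration.

```r
library(survival)

# The six patients, with values copied from the tables above (times in months)
pat <- data.frame(
  id        = 1:6,
  assigned  = c("A", "A", "B", "B", "B", "A"),   # arm at randomization
  received  = c("A", "A", "B", "A", "B", "A"),   # surgery actually performed
  t_death   = c(18.30, 4.63, 6.20, 0.50, 10.13, 13.20),
  death     = c(0, 1, 0, 1, 1, 0),
  t_cardiac = c(18.30, 4.10, 6.20, NA, 3.07, 3.00),
  cardiac   = c(0, 1, 0, NA, 0, 1)
)

# Intention-to-treat: everyone, grouped by the randomized arm
itt <- with(pat, data.frame(id, treatment = assigned,
                            time = t_death, event = death))

# Per-protocol: only patients who received the arm they were assigned
pp <- with(subset(pat, assigned == received),
           data.frame(id, treatment = assigned,
                      time = t_cardiac, event = cardiac))

# Surv() prints censored times with a trailing "+"
print(Surv(itt$time, itt$event))
print(pp)
```

Patient 004 drops out of `pp` because the assigned and received treatments differ, while `itt` keeps all six patients.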

Question 3

Patient 003 was lost to follow-up in the trial. Can you think of ways to recover some additional data for this patient?

Reply
  • Check with other hospitals
  • Contact the patient at home or through relatives
  • Check civil registration records (for example, the death registry)

Question 4

The data below are from a cervical cancer trial. Enter the data in Excel and analyze the data using R.

Give:

  • Kaplan-Meier plot for the two groups

  • Median survival time in each group

  • The proportion of patients alive after 2 years in each group

  • Test the difference between the two curves using the log-rank test

4.1. First we import the data from the Excel file

library(readxl)
dat <- read_excel("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2/Survival Analysis Exercise Database/Exercises/MPH2021_T2_MAV_SurvAnal_Day_I_Result_Data.xlsx")

4.2. To start working with survival analysis we have to create an event variable (1 when the event happened, in this case death, and 0 otherwise)

dat$event <- ifelse(dat$SurvivalStatus == "dead", 1, 0)

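4.3. We start with the Kaplan-Meier plot for the two groups

library(survival)
# Fit Kaplan-Meier curves, one per therapy group
mod2 <- survfit(Surv(SurvivalTime, event) ~ Therapy, data=dat)
# Plot the curves; mark.time=TRUE draws a tick at each censored observation
plot(mod2, mark.time=TRUE, col=c("blue", "red"), xlab="Time (days)", ylab="Proportion alive")
legend("bottomleft", legend=c("Therapy A", "Therapy B"), col=c("blue", "red"), lty=1)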

4.4. To see the median survival time for each group we ask for the numbers of the model:

mod2
Call: survfit(formula = Surv(SurvivalTime, event) ~ Therapy, data = dat)

           n events median 0.95LCL 0.95UCL
Therapy=A 16     11   1037     291      NA
Therapy=B 14      5   1307     827      NA
Reply

The median survival time is 1037 days for therapy A and 1307 days for therapy B.

Interpretation:

  • In therapy group A, 50% of the patients were still alive at 1037 days

  • In therapy group B, 50% of the patients were still alive at 1307 days

So, patients in group B (new treatment) tended to survive longer.
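If the medians are needed as numbers rather than read off the printed output, they can be pulled from the summary table. Since the exercise file is not available here, the sketch uses `aml`, a small data set shipped with the survival package, as a stand-in; with the exercise data the same call would be applied to `mod2`.

```r
library(survival)

# Stand-in data: 'aml' ships with the survival package (columns time, status, x).
# It is NOT the cervical cancer data from the exercise.
fit <- survfit(Surv(time, status) ~ x, data = aml)

# The summary table has one row per group; "median" is one of its columns
med <- summary(fit)$table[, "median"]
print(med)
```

A median is reported as `NA` when a group's curve never drops to 50%, which is worth checking before interpreting it.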

4.5. Proportion of patients alive after two years

For this we can see the graph:

plot(mod2, mark.time=TRUE)

Reply

Reading the proportion of patients alive after two years (730 days) from the graph:

About 60% in group A and about 80% in group B are alive after two years.

We can also get the proportion alive at a given time using summary:

summary(mod2)
Call: survfit(formula = Surv(SurvivalTime, event) ~ Therapy, data = dat)

                Therapy=A 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   90     16       1    0.938  0.0605       0.8261        1.000
  142     15       1    0.875  0.0827       0.7271        1.000
  150     14       1    0.812  0.0976       0.6421        1.000
  269     13       1    0.750  0.1083       0.5652        0.995
  291     12       1    0.688  0.1159       0.4941        0.957
  680     10       1    0.619  0.1230       0.4191        0.914
  837      9       1    0.550  0.1271       0.3497        0.865
 1037      7       1    0.471  0.1310       0.2735        0.813
 1153      4       1    0.354  0.1417       0.1612        0.775
 1297      3       1    0.236  0.1348       0.0768        0.723
 1429      2       1    0.118  0.1072       0.0198        0.701

                Therapy=B 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  272     14       1    0.929  0.0688        0.803            1
  362     13       1    0.857  0.0935        0.692            1
  373     12       1    0.786  0.1097        0.598            1
  827      7       1    0.673  0.1401        0.448            1
 1307      3       1    0.449  0.2057        0.183            1
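The `survival` column is the Kaplan-Meier (product-limit) estimate: at each event time the previous value is multiplied by the fraction of patients at risk who survive that event. As a check, the sketch below recomputes the column for therapy A from the `n.risk` and `n.event` columns printed above.

```r
# n.risk and n.event for therapy A, copied from the summary output
n_risk  <- c(16, 15, 14, 13, 12, 10, 9, 7, 4, 3, 2)
n_event <- rep(1, length(n_risk))

# Product-limit estimate: running product of (1 - deaths / number at risk)
km <- cumprod(1 - n_event / n_risk)

# Survival column as printed by summary(mod2) for therapy A
printed <- c(0.938, 0.875, 0.812, 0.750, 0.688, 0.619,
             0.550, 0.471, 0.354, 0.236, 0.118)

# Largest discrepancy between the two; it should be within rounding error
print(max(abs(km - printed)))
```

The number at risk sometimes drops by more than one between rows (for example from 12 to 10) because censored patients leave the risk set without counting as events.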
Reply

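In therapy A, the last event before day 730 is at day 680, so 61.9% were alive at two years

In therapy B, the last event before day 730 is at day 373, so 78.6% were alive at two years

(Read the survival value in the last row whose time does not exceed 730 days; the curve stays flat until the next event.)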
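Instead of scanning the rows by hand, `summary()` for a `survfit` object accepts a `times` argument and reports the estimate at exactly those times. The sketch uses the `aml` data set that ships with the survival package as a stand-in for the exercise file, with an arbitrary time of 24 (in that data set's own time units); for the exercise the corresponding call would be `summary(mod2, times = 730)`.

```r
library(survival)

# Stand-in data: 'aml' ships with the survival package; NOT the exercise data
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Ask for the estimate at one chosen time in every group
at24 <- summary(fit, times = 24)

# Collect group, time and estimated proportion alive in one small table
res <- data.frame(group = at24$strata, time = at24$time, alive = at24$surv)
print(res)
```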

4.6 Test differences between both curves

Are both curves different? (Curve from therapy A vs curve from therapy B.) We use the log-rank test.

# Log-rank test
survdiff(Surv(SurvivalTime, event) ~ Therapy, data=dat)
Call:
survdiff(formula = Surv(SurvivalTime, event) ~ Therapy, data = dat)

           N Observed Expected (O-E)^2/E (O-E)^2/V
Therapy=A 16       11     8.44     0.780      1.68
Therapy=B 14        5     7.56     0.869      1.68

 Chisq= 1.7  on 1 degrees of freedom, p= 0.2 
Reply

The p-value is 0.2, so there is no statistically significant difference between the two curves at the 5% level.
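The printed p-value is rounded. It can be recovered from the chi-square statistic in the output; the sketch below uses the rounded value 1.68 shown in the `(O-E)^2/V` column, so the result is itself approximate.

```r
# Chi-square statistic as printed by survdiff (rounded), with 1 degree of freedom
chisq <- 1.68
df    <- 1

# Upper-tail probability of the chi-square distribution
p <- pchisq(chisq, df = df, lower.tail = FALSE)
print(round(p, 3))
```

When the `survdiff()` result is stored in an object, the unrounded statistic is available as its `chisq` component and can be passed to `pchisq()` in the same way.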

4.7 Conclusions

Reply

Patients on the new treatment appeared to survive longer. The median survival was 1037 days (about 2.8 years) in therapy group A (standard treatment) and 1307 days (about 3.6 years) in therapy group B (new treatment).

However, this difference was not statistically significant (p-value = 0.2).

This could be because it was a small study. A new and larger clinical trial should be planned to reach a definitive conclusion.