Competing Risks

Learning objectives
  • Introduction to competing risks

  • Learn methods for competing risks

  • Learn how to perform competing risks in R

Introduction to competing risks

1. Competing risks can be viewed as a multi-state survival problem

  • Three example multi-state models.
    • The upper left is a simple survival diagram,
      • we can view death as an absorbing state
    • the lower left is an example of competing risks,
    • and the lower right panel is a multi-state illness-death model.

  • In application, there are 5 steps proposed by Staffa and Zurakowski (2019) to design and analyze a competing risks study

    1. determine whether competing risks analysis is needed (do you have informative censoring?)
    2. perform nonparametric analysis first
    3. perform a model-based analysis
    4. interpret results
    5. comparing to traditional survival analysis, this may not be necessary but will help you understand the impact of competing events

2. The “incorrect” Naive analysis

  • Understanding the issues of the naive approach can provide insight into the competing risks program and statistical methods to appropriately handle competing risks

  • Placebo data from the 4D study, Wanner et al (2005)

  • This study aimed at comparing atorvastatin to placebo for patients with type 2 diabetes and receiving hemodialysis in terms of cariovascular events.

  • The primary endpoint was a composite of death from cardiac causes, stroke and non-fatal myocardial infarction.

  • Competing event was death from other causes.

    • id: Patients’ id number
    • sex: Patients’ sex
    • age: Patients’ age
    • status: Status at the end of the follow-up.
      • 1 for the event of interest,
      • 2 for death from other causes
      • and 0 for censored observations
    • time: Survival time
    • treated: Numeric vector indicated whether patients are treated or not. Here always equal to zero
library(etm) #loading fourD data from R package etm;
library(DT)
library(gtsummary)

data(fourD)
datatable(fourD,
          rownames = FALSE,
          options = list(dom = 't'))
fourD %>%
  tbl_summary(
  by = status,
  missing_text = "(Missing)") %>%
  add_overall()
Characteristic Overall, N = 6361 0, N = 2641 1, N = 2431 2, N = 1291
id 5,927 (5,458, 6,417) 6,126 (5,606, 6,576) 5,844 (5,400, 6,272) 5,720 (5,383, 6,243)
sex
Female 292 (46%) 115 (44%) 119 (49%) 58 (45%)
Male 344 (54%) 149 (56%) 124 (51%) 71 (55%)
age 66 (61, 72) 64 (60, 70) 67 (62, 72) 69 (62, 75)
medication
Placebo 636 (100%) 264 (100%) 243 (100%) 129 (100%)
time 2.38 (1.50, 3.70) 3.37 (2.25, 4.47) 1.83 (0.82, 2.97) 1.76 (1.01, 2.98)
treated 0 (0%) 0 (0%) 0 (0%) 0 (0%)
1 Median (IQR); n (%)


  • Naive analysis: modelling death from other causes (status=2) as censoring, by folding the competing risk into a two-status outcome, where we no longer need to distinguish between event types

\[ \text{status} = \Big\{\begin{smallmatrix} 0 & \text{censored}\\ 1 & \text{death - primary cause} \\ 2 & \text{death - other causes} \end{smallmatrix} \Rightarrow \text{status1} = \Big\{\begin{smallmatrix} 0 & \text{censored or death - other causes}\\ 1 & \text{death - primary cause} \end{smallmatrix} \]

\[ \text{status} = \Big\{\begin{smallmatrix} 0 & \text{censored}\\ 1 & \text{death - primary cause} \\ 2 & \text{death - other causes} \end{smallmatrix} \Rightarrow \text{status2} = \Big\{\begin{smallmatrix} 0 & \text{censored or death - primary cause}\\ 1 & \text{death - other causes} \end{smallmatrix} \]

  • the Kaplan-Meier estimator takes the patients who are censored to still be at risk, i.e., they will go on to experience the event

  • However, of course those who experienced the competing risk are not ‘still at risk’, results in the estimated survivor function being under estimated; over-estimate event rates!

# incorrect KM estimates;
library(tidyverse)
library(survminer)
library(survival)
library(latticeExtra)

fit_incorrect1 <- survfit(Surv(time, status==1)~1, data=fourD)
fit_incorrect2 <- survfit(Surv(time, status==2)~1, data=fourD)

surv.km <- fit_incorrect1$surv
time.km <- fit_incorrect1$time
surv.other.km <- fit_incorrect2$surv
cumDist.km = 1-surv.km #cumulative incidence distribution;

km_primary <- xyplot(cumDist.km ~ time.km, type = "l",lwd=2,  ylim = c(0,1.1), xlim = c(0,6.2), ylab = "Probability of death from primary cause", xlab = "Years")

km_other <- xyplot(surv.other.km ~ time.km, type = "l" , lwd=2, ylim = c(0,1.1), ylab = "Probability of death from other causes")

doubleYScale(km_primary, km_other, add.ylab2 = TRUE, use.style=T)

  • Figure shows that the two curves cross.

  • At 5 years, the probability of dying of primary cause is 0.44, and of other causes it is 0.64. The sum of the two probabilities exceed 1 - over estimate event rates!

Nonparametric estimation using cumulative incidence functions (CIF)

1. Connection between CIF, suvival and hazard functions

  • In the absence of competing risks, the cumulative incidence function is equivalent to the cumulative distribution function
    • the probability or risk of an event occurring by time \(t\).

\[ CIF(t) = F(t) = Pr(T \leq t) = 1- S(t) \]

\[ \hat{CIF}_{km}(t) = 1 - \hat{S}_{km}(t) = 1 - \exp(-\int_0^t \lambda(u)du) \]

  • CIF and hazard function is related! Thus, if a covariate is associate with change in the rate of which events occur, we can also conclude that it will be associate with cahnges in the probability of the event occuring.

  • The direction of the association will be concordant between analysis based on CIF and the hazard function.

2. Cause-specific CIF

  • The probability of experiencing an event by time \(t\) and that the event experience is of type \(j\), \(j = 1, \ldots, J\) and \(D \in \{1, \ldots, J\}\) be the event type indicator.

\[ CIF_j(t) = Pr(T \leq t, D = j) \]

\[ \lim_{t \rightarrow \infty} CIF_j(t) < 1 \]

  • as time goes to infinity, in the presence of competing event, the probability of experiencing event type j will be less than 100%

    • “referred to as a subdistribution function”
  • Using KM estimator of \(CIF_j(t)\) will result in under estimated survival as we have shown before.

  • Several estimators do not exhibit this bias has been proposed

    • Alen-Johansen estimator for CIF
    • standard error of the AJ estimator can be computed using jackknife or bootstrap technique
    • in R we can use cuminc() function under the tidycomprsk package to calculate CIF.
  • we can use Gray’s test, a modified Chi-squared test, to compare CIF between 2 or more groups (similar to the log-rank test).

library(tidycmprsk)
library(ggsurvfit)

# change status to a factor;
fourD$status2 <- as.factor(fourD$status)
levels(fourD$status2) <- c("Censored", "Death from primary cause", "Death from other causes")

fit1 <- cuminc(Surv(time, status2) ~ 1, data = fourD)
fit1

time   n.risk   estimate   std.error   95% CI          
1.00   532      0.050      0.009       0.035, 0.069    
2.00   383      0.116      0.013       0.093, 0.143    
3.00   252      0.163      0.015       0.134, 0.194    
4.00   137      0.206      0.018       0.173, 0.242    
5.00   51       0.247      0.021       0.208, 0.288    
6.00   1        0.272      0.024       0.225, 0.321    

time   n.risk   estimate   std.error   95% CI          
1.00   532      0.112      0.013       0.089, 0.138    
2.00   383      0.222      0.017       0.190, 0.255    
3.00   252      0.305      0.019       0.268, 0.342    
4.00   137      0.390      0.021       0.348, 0.432    
5.00   51       0.467      0.025       0.419, 0.515    
6.00   1        0.519      0.039       0.440, 0.592    
fit1 %>% 
  ggcuminc(outcome = c("Death from primary cause", "Death from other causes")) +
  ylim(c(0, 1)) + 
  labs(x = "Years")+ 
  add_confidence_interval() +
  add_risktable()

  • Suppose we want to compare cause-specific CIF by sex, we can use Gray’s test.
fit2 <- cuminc(Surv(time, status2) ~ sex, data = fourD)
fit2

strata   time   n.risk   estimate   std.error   95% CI          
Female   1.00   240      0.058      0.014       0.035, 0.089    
Female   2.00   164      0.121      0.019       0.087, 0.162    
Female   3.00   103      0.177      0.024       0.134, 0.226    
Female   4.00   59       0.199      0.025       0.152, 0.251    
Female   5.00   21       0.241      0.030       0.184, 0.301    
Female   6.00   0        NA         NA          NA, NA          
Male     1.00   292      0.044      0.011       0.025, 0.069    
Male     2.00   219      0.112      0.017       0.081, 0.148    
Male     3.00   149      0.151      0.020       0.114, 0.192    
Male     4.00   78       0.212      0.024       0.167, 0.262    
Male     5.00   30       0.253      0.028       0.199, 0.310    
Male     6.00   1        0.288      0.036       0.219, 0.359    

strata   time   n.risk   estimate   std.error   95% CI          
Female   1.00   240      0.117      0.019       0.083, 0.157    
Female   2.00   164      0.248      0.026       0.200, 0.300    
Female   3.00   103      0.339      0.029       0.283, 0.396    
Female   4.00   59       0.426      0.032       0.362, 0.488    
Female   5.00   21       0.500      0.037       0.426, 0.569    
Female   6.00   0        NA         NA          NA, NA          
Male     1.00   292      0.108      0.017       0.078, 0.143    
Male     2.00   219      0.199      0.022       0.159, 0.244    
Male     3.00   149      0.276      0.025       0.228, 0.326    
Male     4.00   78       0.360      0.029       0.304, 0.416    
Male     5.00   30       0.441      0.033       0.375, 0.504    
Male     6.00   1        0.518      0.059       0.397, 0.625    

outcome                    statistic   df     p.value    
Death from primary cause   2.06        1.00   0.15       
Death from other causes    0.011       1.00   0.91       
fit2 %>% 
  ggcuminc(outcome = c("Death from primary cause", "Death from other causes")) +
  ylim(c(0, 1)) + 
  labs(x = "Years")+ 
  add_confidence_interval() +
  add_risktable()

fit2 %>% 
  tbl_cuminc(
    times =c(1,5), 
    outcome = c("Death from primary cause", "Death from other causes"),
    label_header = "**Year {time}**") %>% 
  add_p()
Characteristic Year 1 Year 5 p-value1
Death from other causes
sex >0.9
Female 5.8% (3.5%, 8.9%) 24% (18%, 30%)
Male 4.4% (2.5%, 6.9%) 25% (20%, 31%)
Death from primary cause
sex 0.2
Female 12% (8.3%, 16%) 50% (43%, 57%)
Male 11% (7.8%, 14%) 44% (37%, 50%)
1 Gray's Test

Model-based Competing risks analysis - Proportional hazard models

1. Competing risks regression
  1. Cause-specific hazards
    • instantaneous rate of occurrence of the given type of event in subjects who are currently event-free
    • redefining the failure indicator to be 1 if and only if the patient fails due specifically to cause \(j\)
    • estimated using Cox regression
  2. Subdistribution hazards
    • instantaneous rate of occurrence of the given type of event in subjects who have not yet experienced an event of that type
    • estimated using Fine-Gray regression

Assumptions

  • independent observations
    • in the presence of correlated data, we can fit a cluster or frailty term on the cause-specific hazard model and the Fine-Gray model
  • Non-informative censoring
  • Proportional hazard assumption
    • We have to check for PH assumption for both approaches!
    • use cox.zph() to test for association between hazard and time
2. Difference between the two approaches
  • Cause-specific Cox model
    • an individual is removed from the risk set when a failure due to other causes occurs
    • the interpretation of the regression coefficients relates to the cause-specific hazard function.
    • intuitive interpretation on both direct and magnitude of the effect on the hazard rate.
  • Fine-Gray model
    • the individual remains in the risk set for cause \(j\) when a failure due to other causes occurs
    • The interpretation of the regression coefficients relates to the subdistribution hazard function
    • be cautious of your interpretation! Comment on effect direction (e.g., variable x increases the risk of outcome), but not the magnitude. Austin and Fine (2017)

When to use which approach?

  • suggest to use the Fine-Gray subdistribution hazard model when the focus is on estimating incidence or predicting prognosis in the presence of competing risks.

  • suggest to use the cause-specific hazard model when the focus is on addressing etiologic questions

3. Cause-specific Cox PH Model

  • we can fit cause-specific Cox PH model easily using coxph() function - from lecture 3.

  • if your data contains two competing events, you will fit two cause-specific Cox model for each event.

  • same approaches to perform model diagnostics!

# cause 1 Cox model;
fit3_1 <- coxph(Surv(time, status == 1) ~ age + sex, data = fourD)

fit3_1 %>% 
  tbl_regression(exp = TRUE)
Characteristic HR1 95% CI1 p-value
age 1.02 1.00, 1.04 0.016
sex
Female
Male 0.88 0.68, 1.14 0.3
1 HR = Hazard Ratio, CI = Confidence Interval
# cause 2 cox model;
fit3_2 <- coxph(Surv(time, status == 2) ~ age + sex, data = fourD)

fit3_2 %>% 
  tbl_regression(exp = TRUE)
Characteristic HR1 95% CI1 p-value
age 1.07 1.04, 1.09 <0.001
sex
Female
Male 1.19 0.84, 1.69 0.3
1 HR = Hazard Ratio, CI = Confidence Interval
#  model estimated CIF;
#  it's possible to add in confidence interval using confidence interval of the estimated survival;
CIF.cox <- data.frame(
  CIF = c(1- survfit(fit3_1)$surv,1- survfit(fit3_2)$surv),
  time = c(survfit(fit3_1)$time,survfit(fit3_2)$time),
  event = c(rep("Death from primary cause", length(survfit(fit3_1)$time)), 
    rep("Death from other causes", length(survfit(fit3_2)$time))))

CIF.cox %>%
  ggplot( aes(x=time, y=CIF, group=event, color=event)) +
    geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Cause-specific Cox model estimate Cumulative Incidence")

# checking for PH assumption;
cox.zph(fit3_1)
        chisq df    p
age    0.9441  1 0.33
sex    0.0729  1 0.79
GLOBAL 0.9454  2 0.62
cox.zph(fit3_2)
       chisq df    p
age    1.871  1 0.17
sex    0.778  1 0.38
GLOBAL 3.342  2 0.19

4. Fine-Gray subdistribution hazards model

primary <- finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype = "1")
other <- finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype = "2")

#creating fine-gray weights;
head(primary)
  age    sex fgstart    fgstop fgstatus fgwt
1  60   Male       0 5.8480493        0    1
2  68 Female       0 5.2539357        0    1
3  70 Female       0 2.9541410        1    1
4  69   Male       0 0.9856263        1    1
5  58 Female       0 0.2902122        1    1
6  63   Male       0 3.9452430        1    1
head(other)
  age    sex  fgstart   fgstop fgstatus      fgwt
1  60   Male 0.000000 5.848049        0 1.0000000
2  68 Female 0.000000 5.253936        0 1.0000000
3  70 Female 0.000000 2.954141        0 1.0000000
4  70 Female 2.973306 3.044490        0 0.9921875
5  70 Female 3.101985 3.151266        0 0.9561795
6  70 Female 3.178645 3.197810        0 0.9397284
fgfit4_1 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex, data=primary, weight= fgwt)
summary(fgfit4_1)
Call:
coxph(formula = Surv(fgstart, fgstop, fgstatus) ~ age + sex, 
    data = primary, weights = fgwt)

  n= 7929, number of events= 243 

             coef exp(coef)  se(coef) robust se      z Pr(>|z|)
age      0.009938  1.009987  0.008203  0.007799  1.274    0.203
sexMale -0.146488  0.863736  0.131443  0.125210 -1.170    0.242

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0100     0.9901    0.9947     1.026
sexMale    0.8637     1.1578    0.6758     1.104

Concordance= 0.546  (se = 0.018 )
Likelihood ratio test= 3.49  on 2 df,   p=0.2
Wald test            = 4.14  on 2 df,   p=0.1
Score (logrank) test = 3.46  on 2 df,   p=0.2,   Robust = 4.13  p=0.1

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
fgfit4_2 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex, data=other, weight= fgwt)
summary(fgfit4_2)
Call:
coxph(formula = Surv(fgstart, fgstop, fgstatus) ~ age + sex, 
    data = other, weights = fgwt)

  n= 10154, number of events= 129 

           coef exp(coef) se(coef) robust se     z Pr(>|z|)    
age     0.05294   1.05437  0.01225   0.01174 4.509 6.51e-06 ***
sexMale 0.18354   1.20147  0.17986   0.17046 1.077    0.282    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age         1.054     0.9484    1.0304     1.079
sexMale     1.201     0.8323    0.8602     1.678

Concordance= 0.6  (se = 0.026 )
Likelihood ratio test= 20.16  on 2 df,   p=4e-05
Wald test            = 20.35  on 2 df,   p=4e-05
Score (logrank) test = 18.76  on 2 df,   p=8e-05,   Robust = 20.62  p=3e-05

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
# checking for PH assumption;
cox.zph(fgfit4_1)
        chisq df     p
age    4.0892  1 0.043
sex    0.0495  1 0.824
GLOBAL 4.1291  2 0.127
cox.zph(fgfit4_2)
       chisq df    p
age     0.42  1 0.52
sex     1.21  1 0.27
GLOBAL  1.95  2 0.38
# model predicted CIF;
CIF.fg <- data.frame(
  CIF = c(1- survfit(fgfit4_1)$surv,1- survfit(fgfit4_2)$surv),
  time = c(survfit(fgfit4_1)$time,survfit(fgfit4_2)$time),
  event = c(rep("Death from primary cause", length(survfit(fgfit4_1)$time)), 
    rep("Death from other causes", length(survfit(fgfit4_2)$time))))

CIF.fg %>%
  ggplot(aes(x=time, y=CIF, group=event, color=event)) +
    geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Fine-Gray model estimate Cumulative Incidence")

# combine cox and fg CIF;
CIF <- rbind(CIF.cox, CIF.fg)
CIF <- CIF %>% mutate(method = c(rep("Cox",nrow(CIF.cox)),rep("F-G",nrow(CIF.fg))))

ggplot(CIF, aes(x=time, y=CIF, colour=interaction(event, method),
                group=interaction(event, method))) +
  geom_point()+
  geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Model estimate Cumulative Incidence")

Summary

Recommendations for Analyzing Competing Risk Survival Data, Austin et al (2016)

  1. (Non-parametric/univariate) Cumulative incidence functions (CIFs) should be used to estimate the incidence of each of the different types of competing risks.

    • Do not use the Kaplan-Meier estimate of the survival function for this purpose.
  2. Researchers need to decide whether the research objective is on addressing etiologic questions or on estimating incidence or predicting prognosis.

    • etiologic, cause-specific cox PH model
    • incidence estimation/prediction, Fine-Gray subdistribution hazard model
  3. In some settings, both types of regression models should be estimated for each of the competing risks to permit a full understanding of the effect of covariates on the incidence and the rate of occurrence of each outcome.

  4. Be careful of how you interpret coefficients from the Fine-Gray model

    • the covariates can be concluded as having an effect on the CIF (the probabiltiy/risk of experiencing event \(j\)).
    • However, it is important to note that magnitude of the subdistribution hazard ratio does not
    • This interpretation error appears to frequently in the medical literature.
  5. Proportional hazard assumption is needed for both approaches! Model diagnostic and validation are key steps.