Competing Risks

Learning objectives
  • Introduction to competing risks

  • Learn methods for competing risks

  • Learn how to perform competing risks in R

Introduction to competing risks

1. Competing risks can be viewed as a multi-state survival problem

  • Three example multi-state models.
    • The upper left is a simple survival diagram,
      • we can view death as an absorbing state
    • the lower left is an example of competing risks,
    • and the lower right panel is a multi-state illness-death model.

  • In application, there are 5 steps proposed by Staffa and Zurakowski (2019) to design and analyze a competing risks study

    1. determine whether competing risks analysis is needed (do you have informative censoring?)
    2. perform nonparametric analysis first
    3. perform a model-based analysis
    4. interpret results
    5. comparing to traditional survival analysis, this may not be necessary but will help you understand the impact of competing events

2. The “incorrect” Naive analysis

  • Understanding the issues of the naive approach can provide insight into the competing risks program and statistical methods to appropriately handle competing risks

  • Placebo data from the 4D study, Wanner et al (2005)

  • This study aimed at comparing atorvastatin to placebo for patients with type 2 diabetes and receiving hemodialysis in terms of cariovascular events.

  • The primary endpoint was a composite of death from cardiac causes, stroke and non-fatal myocardial infarction.

  • Competing event was death from other causes.

    • id: Patients’ id number
    • sex: Patients’ sex
    • age: Patients’ age
    • status: Status at the end of the follow-up.
      • 1 for the event of interest,
      • 2 for death from other causes
      • and 0 for censored observations
    • time: Survival time
    • treated: Numeric vector indicated whether patients are treated or not. Here always equal to zero
library(etm) #loading fourD data from R package etm;
library(DT)
library(gtsummary)

data(fourD)
datatable(fourD,
          rownames = FALSE,
          options = list(dom = 't'))
fourD %>%
  tbl_summary(
  by = status,
  missing_text = "(Missing)") %>%
  add_overall()
Characteristic Overall, N = 6361 0, N = 2641 1, N = 2431 2, N = 1291
id 5,927 (5,458, 6,417) 6,126 (5,606, 6,576) 5,844 (5,401, 6,273) 5,720 (5,383, 6,243)
sex



    Female 292 (46%) 115 (44%) 119 (49%) 58 (45%)
    Male 344 (54%) 149 (56%) 124 (51%) 71 (55%)
age 66 (61, 72) 64 (60, 70) 67 (62, 72) 69 (62, 75)
medication



    Placebo 636 (100%) 264 (100%) 243 (100%) 129 (100%)
time 2.38 (1.50, 3.70) 3.37 (2.25, 4.47) 1.83 (0.82, 2.97) 1.76 (1.01, 2.98)
treated 0 (0%) 0 (0%) 0 (0%) 0 (0%)
1 Median (IQR); n (%)


  • Naive analysis: modelling death from other causes (status=2) as censoring, by folding the competing risk into a two-status outcome, where we no longer need to distinguish between event types

\[ \text{status} = \Big\{\begin{smallmatrix} 0 & \text{censored}\\ 1 & \text{death - primary cause} \\ 2 & \text{death - other causes} \end{smallmatrix} \Rightarrow \text{status1} = \Big\{\begin{smallmatrix} 0 & \text{censored or death - other causes}\\ 1 & \text{death - primary cause} \end{smallmatrix} \]

\[ \text{status} = \Big\{\begin{smallmatrix} 0 & \text{censored}\\ 1 & \text{death - primary cause} \\ 2 & \text{death - other causes} \end{smallmatrix} \Rightarrow \text{status2} = \Big\{\begin{smallmatrix} 0 & \text{censored or death - primary cause}\\ 1 & \text{death - other causes} \end{smallmatrix} \]

  • the Kaplan-Meier estimator takes the patients who are censored to still be at risk, i.e., they will go on to experience the event

  • However, of course those who experienced the competing risk are not ‘still at risk’, results in the estimated survivor function being under estimated; over-estimate event rates!

# incorrect KM estimates;
library(tidyverse)
library(survminer)
library(survival)
library(latticeExtra)

fit_incorrect1 <- survfit(Surv(time, status==1)~1, data=fourD)
fit_incorrect2 <- survfit(Surv(time, status==2)~1, data=fourD)

surv.km <- fit_incorrect1$surv
time.km <- fit_incorrect1$time
surv.other.km <- fit_incorrect2$surv
cumDist.km = 1-surv.km #cumulative incidence distribution;

km_primary <- xyplot(cumDist.km ~ time.km, type = "l",lwd=2,  ylim = c(0,1.1), xlim = c(0,6.2), ylab = "Probability of death from primary cause", xlab = "Years")

km_other <- xyplot(surv.other.km ~ time.km, type = "l" , lwd=2, ylim = c(0,1.1), ylab = "Probability of death from other causes")

doubleYScale(km_primary, km_other, add.ylab2 = TRUE, use.style=T)

  • Figure shows that the two curves cross.

  • At 5 years, the probability of dying of primary cause is 0.44, and of other causes it is 0.64. The sum of the two probabilities exceed 1 - over estimate event rates!

Nonparametric estimation using cumulative incidence functions (CIF)

1. Connection between CIF, suvival and hazard functions

  • In the absence of competing risks, the cumulative incidence function is equivalent to the cumulative distribution function
    • the probability or risk of an event occurring by time \(t\).

\[ CIF(t) = F(t) = Pr(T \leq t) = 1- S(t) \]

\[ \hat{CIF}_{km}(t) = 1 - \hat{S}_{km}(t) = 1 - \exp(-\int_0^t \lambda(u)du) \]

  • CIF and hazard function is related! Thus, if a covariate is associate with change in the rate of which events occur, we can also conclude that it will be associate with cahnges in the probability of the event occuring.

  • The direction of the association will be concordant between analysis based on CIF and the hazard function.

2. Cause-specific CIF

  • The probability of experiencing an event by time \(t\) and that the event experience is of type \(j\), \(j = 1, \ldots, J\) and \(D \in \{1, \ldots, J\}\) be the event type indicator.

\[ CIF_j(t) = Pr(T \leq t, D = j) \]

\[ \lim_{t \rightarrow \infty} CIF_j(t) < 1 \]

  • as time goes to infinity, in the presence of competing event, the probability of experiencing event type j will be less than 100%

    • “referred to as a subdistribution function”
  • Using KM estimator of \(CIF_j(t)\) will result in under estimated survival as we have shown before.

  • Several estimators do not exhibit this bias has been proposed

    • Alen-Johansen estimator for CIF
    • standard error of the AJ estimator can be computed using jackknife or bootstrap technique
    • in R we can use cuminc() function under the tidycomprsk package to calculate CIF.
  • we can use Gray’s test, a modified Chi-squared test, to compare CIF between 2 or more groups (similar to the log-rank test).

library(tidycmprsk)
library(ggsurvfit)

# change status to a factor;
fourD$status2 <- as.factor(fourD$status)
levels(fourD$status2) <- c("Censored", "Death from primary cause", "Death from other causes")

fit1 <- cuminc(Surv(time, status2) ~ 1, data = fourD)
fit1

time   n.risk   estimate   std.error   95% CI          
1.00   532      0.050      0.009       0.035, 0.069    
2.00   383      0.116      0.013       0.093, 0.143    
3.00   252      0.163      0.015       0.134, 0.194    
4.00   137      0.206      0.018       0.173, 0.242    
5.00   51       0.247      0.021       0.208, 0.288    
6.00   1        0.272      0.024       0.225, 0.321    

time   n.risk   estimate   std.error   95% CI          
1.00   532      0.112      0.013       0.089, 0.138    
2.00   383      0.222      0.017       0.190, 0.255    
3.00   252      0.305      0.019       0.268, 0.342    
4.00   137      0.390      0.021       0.348, 0.432    
5.00   51       0.467      0.025       0.419, 0.515    
6.00   1        0.519      0.039       0.440, 0.592    
fit1 %>% 
  ggcuminc(outcome = c("Death from primary cause", "Death from other causes")) +
  ylim(c(0, 1)) + 
  labs(x = "Years")+ 
  add_confidence_interval() +
  add_risktable()

  • Suppose we want to compare cause-specific CIF by sex, we can use Gray’s test.
fit2 <- cuminc(Surv(time, status2) ~ sex, data = fourD)
fit2

strata   time   n.risk   estimate   std.error   95% CI          
Female   1.00   240      0.058      0.014       0.035, 0.089    
Female   2.00   164      0.121      0.019       0.087, 0.162    
Female   3.00   103      0.177      0.024       0.134, 0.226    
Female   4.00   59       0.199      0.025       0.152, 0.251    
Female   5.00   21       0.241      0.030       0.184, 0.301    
Female   6.00   0        0.254      0.032       0.193, 0.318    
Male     1.00   292      0.044      0.011       0.025, 0.069    
Male     2.00   219      0.112      0.017       0.081, 0.148    
Male     3.00   149      0.151      0.020       0.114, 0.192    
Male     4.00   78       0.212      0.024       0.167, 0.262    
Male     5.00   30       0.253      0.028       0.199, 0.310    
Male     6.00   1        0.288      0.036       0.219, 0.359    

strata   time   n.risk   estimate   std.error   95% CI          
Female   1.00   240      0.117      0.019       0.083, 0.157    
Female   2.00   164      0.248      0.026       0.200, 0.300    
Female   3.00   103      0.339      0.029       0.283, 0.396    
Female   4.00   59       0.426      0.032       0.362, 0.488    
Female   5.00   21       0.500      0.037       0.426, 0.569    
Female   6.00   0        0.512      0.038       0.436, 0.584    
Male     1.00   292      0.108      0.017       0.078, 0.143    
Male     2.00   219      0.199      0.022       0.159, 0.244    
Male     3.00   149      0.276      0.025       0.228, 0.326    
Male     4.00   78       0.360      0.029       0.304, 0.416    
Male     5.00   30       0.441      0.033       0.375, 0.504    
Male     6.00   1        0.518      0.059       0.397, 0.625    

outcome                    statistic   df     p.value    
Death from primary cause   2.06        1.00   0.15       
Death from other causes    0.011       1.00   0.91       
fit2 %>% 
  ggcuminc(outcome = c("Death from primary cause", "Death from other causes")) +
  ylim(c(0, 1)) + 
  labs(x = "Years")+ 
  add_confidence_interval() +
  add_risktable()

fit2 %>% 
  tbl_cuminc(
    times =c(1,5), 
    outcome = c("Death from primary cause", "Death from other causes"),
    label_header = "**Year {time}**") %>% 
  add_p()
Characteristic Year 1 Year 5 p-value1
Death from other causes
sex

>0.9
    Female 5.8% (3.5%, 8.9%) 24% (18%, 30%)
    Male 4.4% (2.5%, 6.9%) 25% (20%, 31%)
Death from primary cause
sex

0.2
    Female 12% (8.3%, 16%) 50% (43%, 57%)
    Male 11% (7.8%, 14%) 44% (37%, 50%)
1 Gray’s Test

Model-based Competing risks analysis - Proportional hazard models

1. Competing risks regression
  1. Cause-specific hazards
    • instantaneous rate of occurrence of the given type of event in subjects who are currently event-free
    • redefining the failure indicator to be 1 if and only if the patient fails due specifically to cause \(j\)
    • estimated using Cox regression
  2. Subdistribution hazards
    • instantaneous rate of occurrence of the given type of event in subjects who have not yet experienced an event of that type
    • estimated using Fine-Gray regression

Assumptions

  • independent observations
    • in the presence of correlated data, we can fit a cluster or frailty term on the cause-specific hazard model and the Fine-Gray model
  • Non-informative censoring
  • Proportional hazard assumption
    • We have to check for PH assumption for both approaches!
    • use cox.zph() to test for association between hazard and time
2. Difference between the two approaches
  • Cause-specific Cox model
    • an individual is removed from the risk set when a failure due to other causes occurs
    • the interpretation of the regression coefficients relates to the cause-specific hazard function.
    • intuitive interpretation on both direct and magnitude of the effect on the hazard rate.
  • Fine-Gray model
    • the individual remains in the risk set for cause \(j\) when a failure due to other causes occurs
    • The interpretation of the regression coefficients relates to the subdistribution hazard function
    • be cautious of your interpretation! Comment on effect direction (e.g., variable x increases the risk of outcome), but not the magnitude. Austin and Fine (2017)

When to use which approach?

  • suggest to use the Fine-Gray subdistribution hazard model when the focus is on estimating incidence or predicting prognosis in the presence of competing risks.

  • suggest to use the cause-specific hazard model when the focus is on addressing etiologic questions

3. Cause-specific Cox PH Model

  • we can fit cause-specific Cox PH model easily using coxph() function - from lecture 3.

  • if your data contains two competing events, you will fit two cause-specific Cox model for each event.

  • same approaches to perform model diagnostics!

# cause 1 Cox model;
fit3_1 <- coxph(Surv(time, status == 1) ~ age + sex, data = fourD)

fit3_1 %>% 
  tbl_regression(exp = TRUE)
Characteristic HR1 95% CI1 p-value
age 1.02 1.00, 1.04 0.016
sex


    Female
    Male 0.88 0.68, 1.14 0.3
1 HR = Hazard Ratio, CI = Confidence Interval
# cause 2 cox model;
fit3_2 <- coxph(Surv(time, status == 2) ~ age + sex, data = fourD)

fit3_2 %>% 
  tbl_regression(exp = TRUE)
Characteristic HR1 95% CI1 p-value
age 1.07 1.04, 1.09 <0.001
sex


    Female
    Male 1.19 0.84, 1.69 0.3
1 HR = Hazard Ratio, CI = Confidence Interval
#  model estimated CIF;
#  it's possible to add in confidence interval using confidence interval of the estimated survival;
CIF.cox <- data.frame(
  CIF = c(1- survfit(fit3_1)$surv,1- survfit(fit3_2)$surv),
  time = c(survfit(fit3_1)$time,survfit(fit3_2)$time),
  event = c(rep("Death from primary cause", length(survfit(fit3_1)$time)), 
    rep("Death from other causes", length(survfit(fit3_2)$time))))

CIF.cox %>%
  ggplot( aes(x=time, y=CIF, group=event, color=event)) +
    geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Cause-specific Cox model estimate Cumulative Incidence")

# checking for PH assumption;
cox.zph(fit3_1)
        chisq df    p
age    0.9441  1 0.33
sex    0.0729  1 0.79
GLOBAL 0.9454  2 0.62
cox.zph(fit3_2)
       chisq df    p
age    1.871  1 0.17
sex    0.778  1 0.38
GLOBAL 3.342  2 0.19

4. Fine-Gray subdistribution hazards model

primary <- finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype = "1")
other <- finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype = "2")

#creating fine-gray weights;
head(primary)
  age    sex fgstart    fgstop fgstatus fgwt
1  60   Male       0 5.8480493        0    1
2  68 Female       0 5.2539357        0    1
3  70 Female       0 2.9541410        1    1
4  69   Male       0 0.9856263        1    1
5  58 Female       0 0.2902122        1    1
6  63   Male       0 3.9452430        1    1
head(other)
  age    sex  fgstart   fgstop fgstatus      fgwt
1  60   Male 0.000000 5.848049        0 1.0000000
2  68 Female 0.000000 5.253936        0 1.0000000
3  70 Female 0.000000 2.954141        0 1.0000000
4  70 Female 2.973306 3.044490        0 0.9921875
5  70 Female 3.101985 3.151266        0 0.9561795
6  70 Female 3.178645 3.197810        0 0.9397284
fgfit4_1 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex, data=primary, weight= fgwt)
summary(fgfit4_1)
Call:
coxph(formula = Surv(fgstart, fgstop, fgstatus) ~ age + sex, 
    data = primary, weights = fgwt)

  n= 7929, number of events= 243 

             coef exp(coef)  se(coef) robust se      z Pr(>|z|)
age      0.009938  1.009987  0.008203  0.007799  1.274    0.203
sexMale -0.146488  0.863736  0.131443  0.125210 -1.170    0.242

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0100     0.9901    0.9947     1.026
sexMale    0.8637     1.1578    0.6758     1.104

Concordance= 0.546  (se = 0.018 )
Likelihood ratio test= 3.49  on 2 df,   p=0.2
Wald test            = 4.14  on 2 df,   p=0.1
Score (logrank) test = 3.46  on 2 df,   p=0.2,   Robust = 4.13  p=0.1

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
fgfit4_2 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex, data=other, weight= fgwt)
summary(fgfit4_2)
Call:
coxph(formula = Surv(fgstart, fgstop, fgstatus) ~ age + sex, 
    data = other, weights = fgwt)

  n= 10154, number of events= 129 

           coef exp(coef) se(coef) robust se     z Pr(>|z|)    
age     0.05294   1.05437  0.01225   0.01174 4.509 6.51e-06 ***
sexMale 0.18354   1.20147  0.17986   0.17046 1.077    0.282    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age         1.054     0.9484    1.0304     1.079
sexMale     1.201     0.8323    0.8602     1.678

Concordance= 0.6  (se = 0.026 )
Likelihood ratio test= 20.16  on 2 df,   p=4e-05
Wald test            = 20.35  on 2 df,   p=4e-05
Score (logrank) test = 18.76  on 2 df,   p=8e-05,   Robust = 20.62  p=3e-05

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
# checking for PH assumption;
cox.zph(fgfit4_1)
        chisq df     p
age    4.0892  1 0.043
sex    0.0495  1 0.824
GLOBAL 4.1291  2 0.127
cox.zph(fgfit4_2)
       chisq df    p
age     0.42  1 0.52
sex     1.21  1 0.27
GLOBAL  1.95  2 0.38
# model predicted CIF;
CIF.fg <- data.frame(
  CIF = c(1- survfit(fgfit4_1)$surv,1- survfit(fgfit4_2)$surv),
  time = c(survfit(fgfit4_1)$time,survfit(fgfit4_2)$time),
  event = c(rep("Death from primary cause", length(survfit(fgfit4_1)$time)), 
    rep("Death from other causes", length(survfit(fgfit4_2)$time))))

CIF.fg %>%
  ggplot(aes(x=time, y=CIF, group=event, color=event)) +
    geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Fine-Gray model estimate Cumulative Incidence")

# combine cox and fg CIF;
CIF <- rbind(CIF.cox, CIF.fg)
CIF <- CIF %>% mutate(method = c(rep("Cox",nrow(CIF.cox)),rep("F-G",nrow(CIF.fg))))

ggplot(CIF, aes(x=time, y=CIF, colour=interaction(event, method),
                group=interaction(event, method))) +
  geom_point()+
  geom_line()+
  ylim(0,1)+
  xlab("Years")+
  ylab("Model estimate Cumulative Incidence")

Summary

More on difference between the two modelling approaches for completing risks

Difference on Hazard function specification

The cause-specific hazard function focuses on the risk of experiencing the event of interest, assuming the individual has not yet experienced either the event of interest or any competing event.

Mathematically, for cause \(k\), the cause-specific hazard at time \(t\) is defined as:

\[ \lambda_k(t \mid X) = \lim_{\Delta t \to 0} \frac{\Pr(t \leq T < t + \Delta t, \text{event is of type } k \mid T \geq t, X)}{\Delta t}\]

Where:

  • \(T\) is the time to the event.
  • \(X\) is a vector of covariates.
  • \(\lambda_k(t \mid X)\) is the instantaneous risk of experiencing event \(k\) at time \(t\), given that no event (neither the event of interest nor any competing event) has occurred before time \(t\).

The subdistribution hazard from Fine-Gray for cause \(k\) is given by:

\[ \lambda_k^{sub}(t \mid X) = \lim_{\Delta t \to 0} \frac{\Pr(t \leq T < t + \Delta t, \text{event is of type } k \mid T \geq t \text{ or competing event by time } t, X)}{\Delta t} \]

This differs from the cause-specific hazard because:

  • The risk set for the subdistribution hazard includes individuals who have already experienced competing events (but with modified risks), while the risk set for the cause-specific hazard excludes them once a competing event occurs.

  • The subdistribution hazard directly models the cumulative incidence function (CIF), which gives the probability of an event happening over time, considering the possibility of competing events.

Difference on Risk Set Defitniions

  • Cause-specific hazard: The risk set only includes individuals who are event-free for both the event of interest and competing events up to time \(t\).

\[ \mathcal{R}(t) = \{ i : T_i \geq t \} \]

  • Subdistribution hazard: The risk set includes individuals who are either event-free or have experienced a competing event by time \(t\). The Fine-Gray model retains individuals who experienced competing events in the risk set by adjusting their contribution.

\[ \mathcal{R}^{sub}(t) = \{ i : T_i \geq t \text{ or competing event by time } t \} \]

Difference on CIF

  • Cause-specific hazard: The cumulative incidence for cause \(k\), denoted as \(F_k(t)\), can be derived from the cause-specific hazard function using the following relationship:

\[ F_k(t) = \int_0^t \lambda_k(s) S(s) \, ds \]

where \(S(t)\) is the overall survival function (the probability of no event occurring by time \(t\)).

  • Subdistribution hazard: In the Fine-Gray model, the subdistribution hazard directly governs the cumulative incidence function for the event of interest. The cumulative incidence function can be written as:

\[ F_k(t) = 1 - \exp\left(-\int_0^t \lambda_k^{sub}(s) \, ds\right) \]

Leading to difference in HR interpretation

  • Cause-specific hazard ratios describe the effect of covariates on the instantaneous risk of the event of interest, conditional on being event-free up to time (t).

  • Subdistribution hazard ratios describe the effect of covariates on the cumulative incidence of the event of interest, accounting for competing risks.

Main points

  • Misinterpretation of Hazard Ratios:
    • The hazard ratio in the Fine-Gray model corresponds to the subdistribution hazard of an event, which is different from the cause-specific hazard (which reflects the instantaneous risk of an event given that no competing event has occurred).
    • Interpreting these hazard ratios as affecting the absolute risk of an event can be misleading because the Fine-Gray hazard ratios are conditional on survival up to time \(t\) without experiencing a competing event. Hence, they do not directly measure the effect of covariates on the probability of the event of interest occurring.
    • The Fine-Gray model links covariates to the cumulative incidence function, which gives the probability of an event over time. While this is the prepered approach for predicting absolute risks, the hazard ratios derived from this model do not correspond to the instantaneous risk but rather how covariates shift the entire cumulative risk.

Recommendations for Analyzing Competing Risk Survival Data, Austin et al (2016)

  1. (Non-parametric/univariate) Cumulative incidence functions (CIFs) should be used to estimate the incidence of each of the different types of competing risks.

    • Do not use the Kaplan-Meier estimate of the survival function for this purpose.
  2. Researchers need to decide whether the research objective is on addressing etiologic questions or on estimating incidence or predicting prognosis.

    • etiologic, cause-specific cox PH model
    • incidence estimation/prediction, Fine-Gray subdistribution hazard model
  3. In some settings, both types of regression models should be estimated for each of the competing risks to permit a full understanding of the effect of covariates on the incidence and the rate of occurrence of each outcome.

  4. Be careful of how you interpret coefficients from the Fine-Gray model

  • the covariates can be concluded as having an effect on the CIF (the probabiltiy/risk of experiencing event \(j\)).
  • However, it is important to note that magnitude of the subdistribution hazard ratio does not
  • This interpretation error appears to frequently in the medical literature.
  1. Proportional hazard assumption is needed for both approaches! Model diagnostic and validation are key steps.