A competing risk is an event whose occurrence precludes the occurrence of the primary event of interest.
violating the non-informative censoring assumption in conventional Cox PH modelling.
e.g., In a study in which the primary outcome was time to ICU death, ICU discharge in this case would be an competing event, precluding the occurrence of the primary event.
A competing risks analysis should appropriately incorporating the competing event into the analysis.
Patients are still at risk to experience other outcomes!
Analysis that treats other events as random censoring, or considers time to the first occurred outcome will lead to bias analysis.
1. Competing risks can be viewed as a multi-state survival problem
Three example multi-state models.
The upper left is a simple survival diagram,
we can view death as an absorbing state
the lower left is an example of competing risks,
and the lower right panel is a multi-state illness-death model.
In application, there are 5 steps proposed by Staffa and Zurakowski (2019) to design and analyze a competing risks study
determine whether competing risks analysis is needed (do you have informative censoring?)
perform nonparametric analysis first
perform a model-based analysis
interpret results
comparing to traditional survival analysis, this may not be necessary but will help you understand the impact of competing events
2. The “incorrect” Naive analysis
Understanding the issues of the naive approach can provide insight into the competing risks program and statistical methods to appropriately handle competing risks
Naive analysis: modelling death from other causes (status=2) as censoring, by folding the competing risk into a two-status outcome, where we no longer need to distinguish between event types
the Kaplan-Meier estimator takes the patients who are censored to still be at risk, i.e., they will go on to experience the event
However, of course those who experienced the competing risk are not ‘still at risk’, results in the estimated survivor function being under estimated; over-estimate event rates!
# incorrect KM estimates;library(tidyverse)library(survminer)library(survival)library(latticeExtra)fit_incorrect1 <-survfit(Surv(time, status==1)~1, data=fourD)fit_incorrect2 <-survfit(Surv(time, status==2)~1, data=fourD)surv.km <- fit_incorrect1$survtime.km <- fit_incorrect1$timesurv.other.km <- fit_incorrect2$survcumDist.km =1-surv.km #cumulative incidence distribution;km_primary <-xyplot(cumDist.km ~ time.km, type ="l",lwd=2, ylim =c(0,1.1), xlim =c(0,6.2), ylab ="Probability of death from primary cause", xlab ="Years")km_other <-xyplot(surv.other.km ~ time.km, type ="l" , lwd=2, ylim =c(0,1.1), ylab ="Probability of death from other causes")doubleYScale(km_primary, km_other, add.ylab2 =TRUE, use.style=T)
Figure shows that the two curves cross.
At 5 years, the probability of dying of primary cause is 0.44, and of other causes it is 0.64. The sum of the two probabilities exceed 1 - over estimate event rates!
Nonparametric estimation using cumulative incidence functions (CIF)
1. Connection between CIF, suvival and hazard functions
In the absence of competing risks, the cumulative incidence function is equivalent to the cumulative distribution function
the probability or risk of an event occurring by time \(t\).
CIF and hazard function is related! Thus, if a covariate is associate with change in the rate of which events occur, we can also conclude that it will be associate with cahnges in the probability of the event occuring.
The direction of the association will be concordant between analysis based on CIF and the hazard function.
2. Cause-specific CIF
The probability of experiencing an event by time \(t\) and that the event experience is of type \(j\), \(j = 1, \ldots, J\) and \(D \in \{1, \ldots, J\}\) be the event type indicator.
\[
CIF_j(t) = Pr(T \leq t, D = j)
\]
\[
\lim_{t \rightarrow \infty} CIF_j(t) < 1
\]
as time goes to infinity, in the presence of competing event, the probability of experiencing event type j will be less than 100%
“referred to as a subdistribution function”
Using KM estimator of \(CIF_j(t)\) will result in under estimated survival as we have shown before.
Several estimators do not exhibit this bias has been proposed
Alen-Johansen estimator for CIF
standard error of the AJ estimator can be computed using jackknife or bootstrap technique
in R we can use cuminc() function under the tidycomprsk package to calculate CIF.
we can use Gray’s test, a modified Chi-squared test, to compare CIF between 2 or more groups (similar to the log-rank test).
library(tidycmprsk)library(ggsurvfit)# change status to a factor;fourD$status2 <-as.factor(fourD$status)levels(fourD$status2) <-c("Censored", "Death from primary cause", "Death from other causes")fit1 <-cuminc(Surv(time, status2) ~1, data = fourD)fit1
fit1 %>%ggcuminc(outcome =c("Death from primary cause", "Death from other causes")) +ylim(c(0, 1)) +labs(x ="Years")+add_confidence_interval() +add_risktable()
Suppose we want to compare cause-specific CIF by sex, we can use Gray’s test.
fit2 <-cuminc(Surv(time, status2) ~ sex, data = fourD)fit2
strata time n.risk estimate std.error 95% CI
Female 1.00 240 0.058 0.014 0.035, 0.089
Female 2.00 164 0.121 0.019 0.087, 0.162
Female 3.00 103 0.177 0.024 0.134, 0.226
Female 4.00 59 0.199 0.025 0.152, 0.251
Female 5.00 21 0.241 0.030 0.184, 0.301
Female 6.00 0 NA NA NA, NA
Male 1.00 292 0.044 0.011 0.025, 0.069
Male 2.00 219 0.112 0.017 0.081, 0.148
Male 3.00 149 0.151 0.020 0.114, 0.192
Male 4.00 78 0.212 0.024 0.167, 0.262
Male 5.00 30 0.253 0.028 0.199, 0.310
Male 6.00 1 0.288 0.036 0.219, 0.359
strata time n.risk estimate std.error 95% CI
Female 1.00 240 0.117 0.019 0.083, 0.157
Female 2.00 164 0.248 0.026 0.200, 0.300
Female 3.00 103 0.339 0.029 0.283, 0.396
Female 4.00 59 0.426 0.032 0.362, 0.488
Female 5.00 21 0.500 0.037 0.426, 0.569
Female 6.00 0 NA NA NA, NA
Male 1.00 292 0.108 0.017 0.078, 0.143
Male 2.00 219 0.199 0.022 0.159, 0.244
Male 3.00 149 0.276 0.025 0.228, 0.326
Male 4.00 78 0.360 0.029 0.304, 0.416
Male 5.00 30 0.441 0.033 0.375, 0.504
Male 6.00 1 0.518 0.059 0.397, 0.625
outcome statistic df p.value
Death from primary cause 2.06 1.00 0.15
Death from other causes 0.011 1.00 0.91
fit2 %>%ggcuminc(outcome =c("Death from primary cause", "Death from other causes")) +ylim(c(0, 1)) +labs(x ="Years")+add_confidence_interval() +add_risktable()
fit2 %>%tbl_cuminc(times =c(1,5), outcome =c("Death from primary cause", "Death from other causes"),label_header ="**Year {time}**") %>%add_p()
instantaneous rate of occurrence of the given type of event in subjects who are currently event-free
redefining the failure indicator to be 1 if and only if the patient fails due specifically to cause \(j\)
estimated using Cox regression
Subdistribution hazards
instantaneous rate of occurrence of the given type of event in subjects who have not yet experienced an event of that type
estimated using Fine-Gray regression
Assumptions
independent observations
in the presence of correlated data, we can fit a cluster or frailty term on the cause-specific hazard model and the Fine-Gray model
Non-informative censoring
Proportional hazard assumption
We have to check for PH assumption for both approaches!
use cox.zph() to test for association between hazard and time
2. Difference between the two approaches
Cause-specific Cox model
an individual is removed from the risk set when a failure due to other causes occurs
the interpretation of the regression coefficients relates to the cause-specific hazard function.
intuitive interpretation on both direct and magnitude of the effect on the hazard rate.
Fine-Gray model
the individual remains in the risk set for cause \(j\) when a failure due to other causes occurs
The interpretation of the regression coefficients relates to the subdistribution hazard function
be cautious of your interpretation! Comment on effect direction (e.g., variable x increases the risk of outcome), but not the magnitude. Austin and Fine (2017)
When to use which approach?
suggest to use the Fine-Gray subdistribution hazard model when the focus is on estimating incidence or predicting prognosis in the presence of competing risks.
suggest to use the cause-specific hazard model when the focus is on addressing etiologic questions
3. Cause-specific Cox PH Model
we can fit cause-specific Cox PH model easily using coxph() function - from lecture 3.
if your data contains two competing events, you will fit two cause-specific Cox model for each event.
same approaches to perform model diagnostics!
# cause 1 Cox model;fit3_1 <-coxph(Surv(time, status ==1) ~ age + sex, data = fourD)fit3_1 %>%tbl_regression(exp =TRUE)
Characteristic
HR1
95% CI1
p-value
age
1.02
1.00, 1.04
0.016
sex
Female
—
—
Male
0.88
0.68, 1.14
0.3
1 HR = Hazard Ratio, CI = Confidence Interval
# cause 2 cox model;fit3_2 <-coxph(Surv(time, status ==2) ~ age + sex, data = fourD)fit3_2 %>%tbl_regression(exp =TRUE)
Characteristic
HR1
95% CI1
p-value
age
1.07
1.04, 1.09
<0.001
sex
Female
—
—
Male
1.19
0.84, 1.69
0.3
1 HR = Hazard Ratio, CI = Confidence Interval
# model estimated CIF;# it's possible to add in confidence interval using confidence interval of the estimated survival;CIF.cox <-data.frame(CIF =c(1-survfit(fit3_1)$surv,1-survfit(fit3_2)$surv),time =c(survfit(fit3_1)$time,survfit(fit3_2)$time),event =c(rep("Death from primary cause", length(survfit(fit3_1)$time)), rep("Death from other causes", length(survfit(fit3_2)$time))))CIF.cox %>%ggplot( aes(x=time, y=CIF, group=event, color=event)) +geom_line()+ylim(0,1)+xlab("Years")+ylab("Cause-specific Cox model estimate Cumulative Incidence")
# checking for PH assumption;cox.zph(fit3_1)
chisq df p
age 0.9441 1 0.33
sex 0.0729 1 0.79
GLOBAL 0.9454 2 0.62
cox.zph(fit3_2)
chisq df p
age 1.871 1 0.17
sex 0.778 1 0.38
GLOBAL 3.342 2 0.19
4. Fine-Gray subdistribution hazards model
primary <-finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype ="1")other <-finegray(Surv(time, as.factor(status))~ age + sex, data = fourD,etype ="2")#creating fine-gray weights;head(primary)
age sex fgstart fgstop fgstatus fgwt
1 60 Male 0 5.8480493 0 1
2 68 Female 0 5.2539357 0 1
3 70 Female 0 2.9541410 1 1
4 69 Male 0 0.9856263 1 1
5 58 Female 0 0.2902122 1 1
6 63 Male 0 3.9452430 1 1
Recommendations for Analyzing Competing Risk Survival Data, Austin et al (2016)
(Non-parametric/univariate) Cumulative incidence functions (CIFs) should be used to estimate the incidence of each of the different types of competing risks.
Do not use the Kaplan-Meier estimate of the survival function for this purpose.
Researchers need to decide whether the research objective is on addressing etiologic questions or on estimating incidence or predicting prognosis.
etiologic, cause-specific cox PH model
incidence estimation/prediction, Fine-Gray subdistribution hazard model
In some settings, both types of regression models should be estimated for each of the competing risks to permit a full understanding of the effect of covariates on the incidence and the rate of occurrence of each outcome.
Be careful of how you interpret coefficients from the Fine-Gray model
the covariates can be concluded as having an effect on the CIF (the probabiltiy/risk of experiencing event \(j\)).
However, it is important to note that magnitude of the subdistribution hazard ratio does not
This interpretation error appears to frequently in the medical literature.
Proportional hazard assumption is needed for both approaches! Model diagnostic and validation are key steps.