Additional Topics under Survival Analysis

Introduction to causal survival analysis

Overview

This tutorial shows how to estimate population‑level causal effect for time‑to‑event outcomes with propensity score weighting (PSW):

  • Common causal estimand of interest, the Average Treatment Effect (ATE) as
    • average survival probabilities difference between treated and untreated at a specific time using PS‑weighted Kaplan–Meier* estimator
    • causal Hazard ratios, obtained from weighted marginal Cox model.

Why the hazard ratio is not a causal estimand

The Hazards of Hazard Ratios (Hernán)

Important

The hazard ratio compares instantaneous risks among those who have already survived to time \(t\). Because treatment can change who survives up to \(t\), the sets being compared are post-treatment selected, so the hazard ratio generally lacks a simple causal interpretation even with no unmeasured confounding.

Let \(T^{(a)}\) denote the potential event time under treatment \(a\in\{0,1\}\). The (cause-specific) hazard under \(a\) is \[ \lambda_a(t) = \lim_{\Delta t \downarrow 0} \frac{\Pr\{t \le T^{(a)} < t+\Delta t \mid T^{(a)} \ge t\}}{\Delta t}, \] and the (time-specific) hazard ratio is \[ \mathrm{HR}(t) \;=\; \frac{\lambda_1(t)}{\lambda_0(t)}. \]

Two issues make \(\mathrm{HR}(t)\) non-causal:

  1. Conditioning on a post-treatment variable. The risk set at time \(t\) is those with \(T^{(a)}\!\ge t\). If treatment affects early survival, then \[ \{T^{(1)} \ge t\} \;\text{and}\; \{T^{(0)} \ge t\} \] include different compositions of individuals because of treatment itself. Thus, even if \((T^{(0)},T^{(1)}) \perp A \mid L\) (no unmeasured confounding given \(L\)), the event \(\{T^{(a)}\ge t\}\) is not independent of \(A\). Comparing \(\lambda_1(t)\) vs \(\lambda_0(t)\) is therefore a comparison of selected subpopulations at time \(t\), not a clean “effect of treatment on outcome.”

  2. Non-collapsibility. Even without confounding, marginal and conditional hazard ratios differ: \[ \frac{\lambda_1(t)}{\lambda_0(t)} \;\neq\; \sum_{\ell} w_\ell \,\frac{\lambda_1(t\mid L=\ell)}{\lambda_0(t\mid L=\ell)} , \] so the hazard ratio does not aggregate to a population-average causal contrast. One can observe \(\mathrm{HR}(t)\!\approx\!1\) while survival probabilities differ meaningfully across arms.

Survival Probability Causal Effect

A more transparent target is the survival probability causal effect (often called SPCE): \[ \Delta_S(t) \;=\; S_1(t) - S_0(t), \qquad S_a(t) \;=\; \Pr\{T^{(a)} > t\}. \]

\(\Delta_S(t)\) is a collapsible - the difference in the probability of being event-free at time \(t\) if everyone were treated vs if everyone were untreated. - We can compute \(\Delta_S(t)\) using propensity score weighted Kaplan–Meier estimation. - Extensions is possible to incorporate informative censoring and competing risks.

1. Hands on tutorial 1 on fitting PS-weighted marginal Cox and PS-weighted KM.

Mathematically, we will aim at estimating SPCE. - the survival probability causal effect (SPCE) at a chosen time \(t\), i.e. \[ \Delta_S(t) \equiv S_1(t) - S_0(t) \] where \(S_a(t)=Pr\{T^{(a)}>t\}\) is the counterfactual survival probability under treatment \(a, a\in \{0,1\}\).

  • Causal assumptions:
    • Stable unit treatmnet assumption (no multiple versions of treatment and no interference)
    • Consistency (i.e., observed outcome is consistent with their potential outcome under the treatment they actually received),
    • Positivity (i.e., non-zero probability of receiving any of the treatment levels),
    • Conditional exchangeability (also known as no unmeasured confounding).

1.1 Reading data

  • We will revisit the veteran dataset

  • Veterans’ Administration Lung Cancer study

    • trt: 1=standard 2=test
    • celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
    • time: survival time
    • status: censoring status
    • karno: Karnofsky performance score (100=good)
    • diagtime: months from diagnosis to randomization
    • age: in years
    • prior: prior therapy 0=no, 10=yes
  • Randomised trial of two treatment regimens for lung cancer. This is a standard survival analysis data set.

    • D Kalbfleisch and RL Prentice (1980), The Statistical Analysis of Failure Time Data. Wiley, New York.
library(survival)
library(dplyr)
library(ggplot2)
library(cobalt)
library(tableone)

data(veteran, package = "survival")
dat1 <- veteran %>%
  mutate(
    A     = as.integer(trt == 2),           # 1 = test regimen, 0 = standard
    time  = time,
    event = as.integer(status == 1),        # 1 = death, 0 = censored
    celltype = factor(celltype)
  ) %>%
  select(time, event, A, age, karno, diagtime, prior, celltype) %>%
  na.omit()


head(dat1)
  time event A age karno diagtime prior celltype
1   72     1 0  69    60        7     0 squamous
2  411     1 0  64    70        5    10 squamous
3  228     1 0  38    60        3     0 squamous
4  126     1 0  63    60        9    10 squamous
5  118     1 0  65    70       11    10 squamous
6   10     1 0  49    20        5     0 squamous

1.2 Checking baseline variable distribution by treatment

baselines <- c("age", "karno", "diagtime", "prior", "celltype")
tab0 <- CreateTableOne(vars = baselines,
                       data = dat1, 
                       strata = "A", 
                       test = FALSE, #mute P-value calculation;
                       smd = TRUE,
                       addOverall = TRUE)
print(tab0, smd = TRUE, showAllLevels = FALSE)
                      Stratified by A
                       Overall       0             1             SMD   
  n                      137            69            68               
  age (mean (SD))      58.31 (10.54) 57.51 (10.81) 59.12 (10.28)  0.153
  karno (mean (SD))    58.57 (20.04) 59.20 (18.74) 57.93 (21.40)  0.063
  diagtime (mean (SD))  8.77 (10.61)  8.65 (8.76)   8.90 (12.27)  0.023
  prior (mean (SD))     2.92 (4.56)   3.04 (4.64)   2.79 (4.52)   0.054
  celltype (%)                                                    0.465
     squamous             35 (25.5)     15 (21.7)     20 (29.4)        
     smallcell            48 (35.0)     30 (43.5)     18 (26.5)        
     adeno                27 (19.7)      9 (13.0)     18 (26.5)        
     large                27 (19.7)     15 (21.7)     12 (17.6)        

1.3 Propensity score weighting

library(WeightIt)
IPTW <- weightit(A ~ age + karno + diagtime + prior + celltype,
               data = dat1, 
               method = "glm", #using the default logistic regression;
               stabilize = TRUE,
               estimand = "ATE")
IPTW
A weightit object
 - method: "glm" (propensity score weighting with GLM)
 - number of obs.: 137
 - sampling weights: none
 - treatment: 2-category
 - estimand: ATE
 - covariates: age, karno, diagtime, prior, celltype
summary(IPTW)
                  Summary of weights

- Weight ranges:

           Min                                  Max
treated 0.6226 |---------------------------| 1.9439
control 0.6763  |---------------------|      1.7527

- Units with the 5 most extreme weights by group:
                                           
            101     96     95     98     91
 treated 1.4196 1.4639 1.4762 1.5009 1.9439
             11     52     47     46     48
 control 1.5396 1.5713 1.5865 1.7263 1.7527

- Weight statistics:

        Coef of Var   MAD Entropy # Zeros
treated       0.258 0.205   0.031       0
control       0.277 0.223   0.035       0

- Effective Sample Sizes:

           Control Treated
Unweighted   69.      68. 
Weighted     64.15    63.8
# Diagnostics: covariate balance and positivity

# Positivity: PS overlap
bal.plot(IPTW,
         which="both",
         type = "density",
         colors = c("red","blue"))

bal.tab(IPTW, un=TRUE, thresholds = c(m=0.1))
Balance Measures
                       Type Diff.Un Diff.Adj    M.Threshold
prop.score         Distance  0.5154   0.0053 Balanced, <0.1
age                 Contin.  0.1527   0.0023 Balanced, <0.1
karno               Contin. -0.0635  -0.0065 Balanced, <0.1
diagtime            Contin.  0.0230  -0.0060 Balanced, <0.1
prior_10             Binary -0.0249  -0.0019 Balanced, <0.1
celltype_squamous    Binary  0.0767  -0.0048 Balanced, <0.1
celltype_smallcell   Binary -0.1701  -0.0009 Balanced, <0.1
celltype_adeno       Binary  0.1343   0.0058 Balanced, <0.1
celltype_large       Binary -0.0409  -0.0001 Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1         9
Not Balanced, >0.1     0

Variable with the greatest mean difference
 Variable Diff.Adj    M.Threshold
    karno  -0.0065 Balanced, <0.1

Effective sample sizes
           Control Treated
Unadjusted   69.      68. 
Adjusted     64.15    63.8
love.plot(IPTW, 
          binary = "std", 
          grid = TRUE,
          thresholds = c(m = .1),
          colors = c("red","blue"))  

1.4 PS-weighted Kaplan Meier Estimator

dat1$w<-IPTW$weights # adding PS weights back to the data;
t_star <- 120
fitA1 <- survfit(Surv(time, event) ~ 1, data = subset(dat1, A==1), weights = dat1$w[dat1$A==1])
fitA0 <- survfit(Surv(time, event) ~ 1, data = subset(dat1, A==0), weights = dat1$w[dat1$A==0])

S1_t  <- as.numeric(summary(fitA1, times = t_star, extend = TRUE)$surv)
S0_t  <- as.numeric(summary(fitA0, times = t_star, extend = TRUE)$surv)
SPCE  <- S1_t - S0_t
SPCE
[1] -0.1170191
# Plot PS-weighted survival curves
plot(fitA0, mark.time = FALSE, xlab = "Days", ylab = "Survival",
     main = "PS-weighted KM (veteran)", lty = 2, conf.int = FALSE)
lines(fitA1, lty = 1, conf.int = FALSE)
abline(v = t_star, lty = 3)
legend("topright", c("A=0 (control)", "A=1 (treated)"), lty = c(2,1), bty = "n")

1.5 Bootstrap confidence interval for causal estimand

set.seed(123)
n <- nrow(dat1) #sample size;
B <- 1000 #bootstrap iterations;
boot_SPCE  <- numeric(B)

for (b in 1:B) {
  # Resample subjects with replacement
  idx <- sample.int(n, size = n, replace = TRUE)
  bdat <- dat1[idx, ]

  # fit PS and recompute stabilized IPTW (ATE)
  IPTWb <- weightit(A ~ age + karno + diagtime + prior + celltype,
               data = bdat, 
               method = "glm", #using the default logistic regression;
               stabilize = TRUE,
               estimand = "ATE")
  bdat$w <- IPTWb$weights
 
  fitA1_b <- survfit(Surv(time, event) ~ 1, data = subset(bdat, A==1),
                     weights = bdat$w[bdat$A==1])
  fitA0_b <- survfit(Surv(time, event) ~ 1, data = subset(bdat, A==0),
                     weights = bdat$w[bdat$A==0])

  S1_b <- as.numeric(summary(fitA1_b, times = t_star, extend = TRUE)$surv)
  S0_b <- as.numeric(summary(fitA0_b, times = t_star, extend = TRUE)$surv)
  boot_SPCE[b] <- S1_b - S0_b

}

# Percentile 95% CIs
quantile(boot_SPCE, c(0.025, 0.975))
       2.5%       97.5% 
-0.26620470  0.03855422 

1.6 “Causal” Hazard ratio (not causally interpretable!)

cox1 <- coxph(Surv(time, event) ~ A, data = dat1, weights = w, robust = TRUE)
summary(cox1)