Survival

Links

https://www.emilyzabor.com/tutorials/survival_analysis_in_r_tutorial.html

https://www.danieldsjoberg.com/ggsurvfit/

jmpost: combines survival analysis with mixed-effects models https://genentech.github.io/jmpost/main/

Basics

Observed time \(Y_i = \min(T_i, C_i)\)

  • \(T_i\) is event time
  • \(C_i\) is censoring time

Event indicator \(\delta_i = 1\) if the event is observed (\(T_i \le C_i\)), and 0 otherwise.
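As a minimal sketch of these definitions, the snippet below builds the observed time and event indicator from made-up event and censoring times. In real data only one of \(T_i\) and \(C_i\) is seen per subject; the values and the function name `observe` are illustrative assumptions.

```python
def observe(event_times, censor_times):
    """Return (Y, delta): Y_i = min(T_i, C_i), delta_i = 1 if T_i <= C_i else 0."""
    observed = [min(t, c) for t, c in zip(event_times, censor_times)]
    indicator = [1 if t <= c else 0 for t, c in zip(event_times, censor_times)]
    return observed, indicator

# Hypothetical latent times in months, for illustration only.
T = [5.0, 12.0, 8.0, 20.0]
C = [10.0, 9.0, 8.0, 15.0]
Y, delta = observe(T, C)
print(Y)      # [5.0, 9.0, 8.0, 15.0]
print(delta)  # [1, 0, 1, 0]
```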

Probability that a subject survives beyond a given time:

\[S(t) = \Pr(T > t) = 1 - F(t)\] where

  • \(S(t)\) is the survival function
  • \(F(t) = \Pr(T \le t)\) is the cumulative distribution function

The survival probability at a certain time t is estimated from conditional probabilities. At each event time, the conditional probability of surviving beyond that time, given survival up to just before it, is estimated as the number of patients still alive (and not lost to follow-up) at that time, divided by the number of patients at risk just prior to that time. The survival probability \(S(t)\) is then the product of these conditional probabilities over all event times up to t.
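In symbols, this estimate can be written as follows. The notation is assumed here and is not defined elsewhere in these notes: \(t_1 < t_2 < \dots\) are the distinct event times, \(d_j\) is the number of events at \(t_j\), and \(n_j\) is the number of subjects at risk just before \(t_j\).

\[\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)\]

Each factor \(1 - d_j/n_j\) is the estimated conditional probability of surviving past \(t_j\), given survival up to just before \(t_j\).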

Left and right censoring

Three time points: diagnosis, intervention or study time 0 (baseline), and recording of the outcome

  • Right censoring: the endpoint is not observed; the subject survived at least until the recording time
  • Left censoring: the event or time origin occurred before observation began, so its exact time is unknown. For example, the diagnosis time is unknown when we care about the time from diagnosis to outcome, rather than from baseline to outcome.

Kaplan-Meier curve

It is a non-parametric estimator of the survival function.
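A minimal sketch of the estimator in plain Python, assuming the usual convention that a subject censored at the same time as an event is still counted as at risk for that event. The function name `kaplan_meier` and the data are made up for illustration; in practice a library such as R's `survival` package would be used.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j)), ...] at each distinct event time.

    times  : observed times Y_i
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0   # events observed at time t
        leaving = 0  # everyone (events + censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve

# Hypothetical data: six subjects, two of them censored (event = 0).
km_times = [2, 3, 3, 5, 7, 8]
km_events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(km_times, km_events):
    print(t, round(s, 4))
```

The curve only drops at event times; censored subjects contribute by shrinking the risk set for later event times.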

Log-rank test

This is a non-parametric test that compares two survival distributions without assuming a parametric form for either. The null hypothesis is that the two survival functions are equal.
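A sketch of the two-group log-rank statistic, assuming the standard observed-minus-expected form with hypergeometric variance and a chi-square reference distribution with 1 degree of freedom. The function name `logrank_test` and the data are illustrative; the sketch assumes at least one event time at which both groups are at risk (otherwise the variance is zero).

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test; groups holds 0/1 labels.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for tj in event_times:
        # Numbers at risk just before tj (overall and in group 1).
        n = sum(1 for t in times if t >= tj)
        n1 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 1)
        # Events at tj (overall and in group 1).
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == tj and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data: group 1 fails early, group 0 fails late.
lr_times = [1, 2, 3, 4, 5, 6]
lr_events = [1, 1, 1, 1, 1, 1]
lr_groups = [1, 1, 1, 0, 0, 0]
chi2, p = logrank_test(lr_times, lr_events, lr_groups)
print(round(chi2, 3), round(p, 4))
```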

## Cox proportional hazards

The Lehmann alternative, \(S_1(t) = [S_0(t)]^\psi\)

Proportional hazards assumption: \(h_1(t) = \psi h_0(t)\). Under this assumption a single constant \(\psi\) quantifies the difference between the two hazard functions at every time t.
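The Lehmann alternative and the proportional hazards assumption are two ways of writing the same relationship. A short derivation, using the standard identity \(S(t) = e^{-H(t)}\), where \(H(t) = \int_0^t h(u)\,du\) is the cumulative hazard (notation assumed here, not defined above):

\[S_1(t) = [S_0(t)]^\psi \iff e^{-H_1(t)} = e^{-\psi H_0(t)} \iff H_1(t) = \psi H_0(t) \iff h_1(t) = \psi h_0(t)\]

The last step follows by differentiating both sides with respect to t.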

Hazard ratio, \(\psi = e^{x\beta}\)

Tests

H0: \(\beta = 0\)

  • Wald test
  • Score (logrank) test. The score function is the first derivative of the log partial likelihood
  • Likelihood ratio test
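For a single coefficient, the three statistics can be written as below. The notation is assumed here: \(\ell(\beta)\) is the log partial likelihood, \(U(\beta) = \ell'(\beta)\) is the score, \(I(\beta) = -\ell''(\beta)\) is the observed information, and \(\hat\beta\) is the maximum partial likelihood estimate.

\[Z_{\text{Wald}} = \frac{\hat\beta}{\text{se}(\hat\beta)}, \quad \text{se}(\hat\beta) = \frac{1}{\sqrt{I(\hat\beta)}}\]

\[X^2_{\text{score}} = \frac{U(0)^2}{I(0)}\]

\[X^2_{\text{LR}} = 2\left[\ell(\hat\beta) - \ell(0)\right]\]

Under H0, \(Z_{\text{Wald}}\) is approximately standard normal, and the score and likelihood ratio statistics are approximately chi-square with 1 degree of freedom. Only the score test can be computed without fitting the model, since it is evaluated at \(\beta = 0\).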

Cox regression

Hazard ratio (relative to baseline hazard) for subject i is \(\psi_i = e^{x_i\beta}\)

Semi-parametric model for survival outcome

\[h(t|X_i) = h_0(t) \exp(\beta_1 X_{i1} + \cdots + \beta_p X_{ip})\] where

  • \(h(t|X_i)\) is the hazard for subject i, the instantaneous rate at which events occur
  • \(h_0(t)\) is the underlying baseline hazard
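A minimal sketch of how \(\beta\) can be estimated without specifying \(h_0(t)\): Newton-Raphson on the partial likelihood for a single covariate, assuming Breslow-style handling of ties. The function name `cox_fit_one_covariate` and the data are hypothetical; real analyses would use an established implementation (for example `coxph` in R's `survival` package).

```python
import math

def cox_fit_one_covariate(times, events, x, n_iter=25, tol=1e-10):
    """Fit beta in h(t|x) = h0(t) * exp(beta * x) by Newton-Raphson.

    Returns (beta, std_error); std_error uses the information computed
    in the final iteration.
    """
    beta = 0.0
    info = 0.0
    for _iteration in range(n_iter):
        score = 0.0
        info = 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei != 1:
                continue  # censored subjects only appear in risk sets
            # Risk set: subjects still under observation at time ti.
            risk = [(xj, math.exp(beta * xj))
                    for tj, xj in zip(times, x) if tj >= ti]
            s0 = sum(w for _xj, w in risk)
            s1 = sum(xj * w for xj, w in risk)
            s2 = sum(xj * xj * w for xj, w in risk)
            score += xi - s1 / s0                 # first derivative
            info += s2 / s0 - (s1 / s0) ** 2      # minus second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical data: x = 1 subjects tend to fail earlier.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 0, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat, se_hat = cox_fit_one_covariate(times, events, x)
print(round(beta_hat, 3), round(math.exp(beta_hat), 3))
```

The baseline hazard cancels out of each risk-set ratio, which is why the model is called semi-parametric.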

Assumptions

  • non-informative censoring
  • proportional hazards

Hazard ratio (HR): the ratio of hazards between two groups at any particular point in time. For example, HR = 0.59 (sex female) means that at any given time the hazard of death for females is 0.59 times that for males, so females have a lower hazard of death than males.
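A small sketch of how a hazard ratio is obtained from a fitted coefficient, with a Wald-type 95% interval computed on the log scale and then exponentiated. The function name `hazard_ratio` is illustrative and the standard error of 0.17 is a made-up value; only the HR of 0.59 comes from the example above.

```python
import math

def hazard_ratio(beta, std_error, z=1.96):
    """Return (HR, lower, upper) = exp(beta) with a Wald-type interval."""
    return (math.exp(beta),
            math.exp(beta - z * std_error),
            math.exp(beta + z * std_error))

# beta is chosen so that HR = 0.59; std_error is hypothetical.
beta_female = math.log(0.59)
hr, lower, upper = hazard_ratio(beta_female, std_error=0.17)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to rejecting H0: \(\beta = 0\) with the Wald test at the 5% level.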

Landmark analysis

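Standard analyses assume that covariates are measured at baseline, before follow-up time for the event begins. Landmark analysis is one way to handle covariates that are measured later.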

Examples of covariates that are not measured at baseline: transplant failure, compliance, adverse events

Landmark approach

  • select a fixed time after baseline, this should be done based on clinical information
  • subset population for those followed at least until landmark time
  • calculate follow-up from landmark time, and apply log-rank tests or Cox regression

It might be necessary to reset the time (for example by subtracting the landmark time, say 90 days)
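The subsetting and clock-reset steps can be sketched as below. The function name `landmark_subset`, the 90-day landmark, and the data are illustrative assumptions. Subjects whose follow-up ends exactly at the landmark are kept here ("followed at least until landmark time"), which gives them zero remaining follow-up; some analyses may prefer to exclude them.

```python
def landmark_subset(times, events, landmark):
    """Keep subjects followed at least until the landmark and reset time.

    Returns a list of (original_index, time_since_landmark, event).
    """
    kept = []
    for idx, (t, e) in enumerate(zip(times, events)):
        if t >= landmark:
            kept.append((idx, t - landmark, e))
    return kept

# Hypothetical follow-up in days; the first subject had the event on day 30.
lm_times = [30, 90, 120, 400]
lm_events = [1, 0, 1, 0]
landmark_data = landmark_subset(lm_times, lm_events, landmark=90)
print(landmark_data)  # [(1, 0, 0), (2, 30, 1), (3, 310, 0)]
```

The reset times and event indicators can then be passed to a log-rank test or Cox regression, grouping by the covariate value known at the landmark.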

Time-dependent covariate

This is more appropriate than landmark analysis when

  • value of a covariate changes over time
  • there isn’t an obvious landmark time
  • use of landmark leads to too many exclusions
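One common way to encode a time-dependent covariate is the counting-process layout, where each subject contributes one row per interval (start, stop] over which the covariate is constant. The sketch below assumes a single binary covariate that switches from 0 to 1 at most once; the function name `split_at_change` and the data are hypothetical.

```python
def split_at_change(subject_id, follow_up, event, change_time=None):
    """Return rows (id, start, stop, event, covariate) in (start, stop] form.

    The covariate is 0 before change_time and 1 afterwards. The event
    indicator is attached only to the final interval.
    """
    if change_time is None or change_time >= follow_up:
        # Covariate never changed during follow-up.
        return [(subject_id, 0, follow_up, event, 0)]
    if change_time <= 0:
        # Covariate already switched at baseline.
        return [(subject_id, 0, follow_up, event, 1)]
    return [
        (subject_id, 0, change_time, 0, 0),
        (subject_id, change_time, follow_up, event, 1),
    ]

# Hypothetical subjects: A's covariate changes on day 40, B's never changes.
rows = (split_at_change("A", 100, 1, change_time=40)
        + split_at_change("B", 60, 0))
for row in rows:
    print(row)
```

Unlike the landmark approach, no subject is excluded: time before the change counts toward the covariate = 0 group and time after it toward covariate = 1.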

Interview questions

  • What is a Kaplan-Meier survival curve, and how do you interpret it?
  • How do you handle censored data in survival analysis?
  • Explain the Cox proportional hazards model and its assumptions. How would you test for proportionality of hazards?
  • How would you perform a log-rank test to compare two survival curves?
  • How is the hazard ratio used in survival analysis? What are its limitations?
  • How would you handle time-to-event data when progression-free survival (PFS) is the primary endpoint in an oncology trial?