Survival
Links
https://www.emilyzabor.com/tutorials/survival_analysis_in_r_tutorial.html
https://www.danieldsjoberg.com/ggsurvfit/
jmpost: combines survival analysis, mixed effect model https://genentech.github.io/jmpost/main/
Basics
Observed time \(Y_i = min(T_i, C_i)\)
- \(T_i\) is event time
- \(C_i\) is censoring time
Event indicator \(\delta_i = 1\) if event observed (\(T_i <= C_i\)), 0 else.
Probability that a subject surves beyond given specific time:
\[S(t) = Pr(T>t) = 1 - F(t)\] where
- \(S(t)\) is the survival function
- \(F(t) = Pr(T <= t)\) is the cumulative distribution function
Survival probability at a certain time t is a conditional probability of surviving beyond that time, given that an individual has survived just prior to that time. This can be estimated as the number of patients who are alive (without loss of follow-up), divided by the number of patients who were alive just prior to that time.
Left and right censor
3 Time points: diagnosis, intervention or study time 0 (baseline), recording of outcome
- Right censor: endpoint not observed, survived until at least recording time
- Left censor: diagnosis time unknown. We might care about the time from diagnosis to outcome, rather than baseline to outcome.
Kaplain Meier curve
It is non-parametric estimator of the survival function.
Log-rank test
This is a non-parametric test, comparing two survival distributions without assuming a parametric form for the survival distribution.
## Cox proportional hazard
The Lehmann alternative, \(S_1(t) = [S_0(t)]^\psi\)
Proportional hazard assumption: \(h_1(t) = \psi h_0(t)\). It is key to quantify the difference between two hazard functions.
Hazard ratio, \(\psi = e^{x\beta}\)
Tests
H0: \(\beta = 0\)
- Wald test
- Score (logrank) test. The score function is the first derivative of log-likelihood
- Likelihood ratio test
Cox regression
Hazard ratio (relative to baseline hazard) for subject i is \(\psi_i = e^{x_i\beta}\)
Semi-parametric model for survival outcome
\[h(t|X_i) = h_0(t) exp(\beta_1 X_{i1} + ... + \beta_p X_{ip})\] where
- \(h(t)\) is hazard, the instantaneous rate at which events occur
- \(h_0(t)\) is the underlying baseline hazard
Assumptions
- non-informative censoring
- proportional hazards
Hazard ratio HR: the ratio of hazards between two groups at any particular point in time. For example, HR = 0.59 (sex female) means 0.59 times as many females die as males at any given time - females have lower hazard of death than males.
Landmark analysis
Covariates are measured at baseline - before follow-up time for the event begins
Examples of covariates that are not measured at baseline: transplant failure, compliance, adverse events
Landmark approach
- select a fixed time after baseline, this should be done based on clinical information
- subset population for those followed at least until landmark time
- calculate follow-up from landmark time, and apply log-rank tests or cox regression
It might be necessary to reset the time (for example by substracting the landmark time, say 90 days)
Time-dependent covariate
This is more appropriate than landmark analysis when
- value of a covariate changes over time
- there isn’t an obvious landmark time
- use of landmark leads to too many exclusions
Interview questions
- What is a Kaplan-Meier survival curve, and how do you interpret it?
- How do you handle censored data in survival analysis?
- Explain the Cox proportional hazards model and its assumptions. How would you test for proportionality of hazards?
- How would you perform a log-rank test to compare two survival curves?
- How is hazard ratio used in survival analysis? what are the limitations?
- How to handle time-to-event data when progression-free survival (PFS) is the primary endpoint in an oncology trial?