Regression

Model

Linear, logistic, Cox proportional hazard

Author

Chi Zhang

Published

October 2, 2024

Summary

Aspect Linear Logistic Cox
Variable selection stepwise
regularisation
Model selection AIC
BIC
R2
AIC
BIC
adjusted R2
ROC/AUC
AIC
BIC
Concordance index
Hypothesis test for one or more coefficients t-test / Wald test
F-test (overall model)
LRT (less common)
Wald test
LRT
Wald test
Score (log rank) test
LRT
Diagnostics Residual plot
QQ plot
Influence (Cook’s distance)
Hosmer-Lemeshow test
Calibration plots
influence
Proportional hazard assumption (PH)
residuals (shoenfeld, martingale)
influence measures
When assumption does not hold PH: stratified cox model
time varying covariats
parametric models
competing risk

Model building workflow

  1. state hypothesis
  2. data exploration
  3. fit a regression model
  4. diagnostics
  • (linear reg:) residual vs fitted
  • normal QQ plot of residuals
  • added variable plots
  • influence plot (residual vs leverage)
  1. fix biggest problem, go back to 3
  2. compare alternative models with nested model tests
  3. interpret the coefficients

Hypothesis tests for regression

T-test in linear regression for individual coefficient

Hypothesis: \(H_0: \beta_1 = 0\)

Test statistic: \(T_0 = \frac{\hat{\beta_1} - 0}{se{\hat{\beta_1}}}\)

Reject H0 if \(t_0 > t_{\alpha, n-p-1}\). If one covariate, \(n-2\)

F-test in linear regression for joint significance

Evaluate the overall significance of model, and compare nested models.

Hypothesis: \(H_0: \beta_1 = \beta_2 = \beta_3 = ... = 0\). Joint non-significance of the model

Alternative hypothesis: \(H_1: \beta_i \neq = 0\), one of them is significant

Can be used for two nested linear regression models. Based on variance decomposition, not likelihood.

Compare unrestricted sum of square of residuals (SSR) with restricted SSR.

  • restricted: coefficients are restricted to be 0 (i.e intercept only, H0). Would have higher SSR as no variance is explained by covariates
  • unrestricted: not restricted to be 0, H1

If restricted is much larger than unrestricted, then reject the null.

\[F = \frac{(SSR_r - SSR_u)/p}{SSR_u/n-p-1} \sim f_{p, n-p-1}\]

Likelihood ratio test for nested models

Typically used for GLM and Cox regression. LRT can also be used for linear regression, but F-test is more common.

LRT tests goodness-of-fit between two nested models, tests whether removing one predictor improves the model fit. It is based on likelihood function.

For GLM we do not have SSR, so comparing nested models requires likelihood from restricted and unrestricted models.

Null hypothesis: reduced (fewer predictors) model is sufficient

Alternative: full (more predictors) model is better

Deviance: measures the difference in log-likelihood between fitted model and saturated model, which means it measures how far the current model is from the ideal model that fits the data perfectly.

  • null deviance: intercept only
  • residual deviance: deviance of the model with predictors included. Lower residual deviance, better fit

\[D = -2 \times (\frac{lik_{\text{fitted model}}}{lik_{\text{saturated}}}) \sim \chi^2_{p1 - p2}\]

Model 1 deviance - model 2 deviance

Wald test

Can be used to test about individual coefficients or set of coefficients in regression

\[W = \frac{(\hat{\theta} - \theta_0)^2}{I(\theta)^{-1}} \sim \chi^2_1\] For individual \(\beta\), it is equivalent to t-test (to the power of 2) under normality assumptions

\[W = \frac{(\hat{\beta} - \beta_0)^2}{var(\hat{\beta})}\] For multiple coefficients, testing whether several coefficients are simultaneously zero, the tests uses vectorized theta and covariance matrix.

Wald test is commonly seen in logistic and cox models to test the significance of covariates.

See an example here.

Log rank test for survival curves (semi-parametric)

Score test (log rank) for overall fit

Score test is used to test overall fit of the model without fitting the

Model selection and comparison

Forward selection: start with null, add one-at-a-time, refit

Backward selection: start with full, eliminate one-at-a-time (from the one with largest p-value), refit, repeat

No guarantee that they arrive at the same final model. Generally choose the one with large adjusted R2.

F-test and likelihood ratio tests for nested model

F-test (linear) and LRT can be used for comparing nested models. The commands are similar, anova(lm1, lm2) or anova(lr1, lr2, test = 'Chisq').

AIC, BIC

Trade-offs between goodness-of-fit and number of parameters. Lower AIC, BIC is better

Can be used for non-nested models.

BIC is stricter for model complexity, so favors simpler models in small datasets.

Assumptions and diagnostics

Aspects to consider
  • distribution (QQ)
  • residuals (pattern, outliers; deviance; Schoenfled)
  • fit (overall fit, goodness of fit, comparing nested models)
  • collinearity (variance inflation factor)

When assumptions do not hold

  • model mis-specification: add more or remove variables; interaction terms; transformation of variables
  • collinearity: remove after checking VIF
  • other models: GAM, mixed effects, parametric models
Related to my own work (TBF)

Linear regression

Assumptions:

  1. all relationships are linear
  2. independent observation
  3. no perfect collinearity, no zero variance of independent variance (e.g. only female gender in the data, no male)
  4. error term is normally distributed
  5. homoscedasticity: error term has expected value of zero, uncorrelated with independent var
  6. error term has equal variance

Use residual as estimate for error terms.

  • Residual vs fitted: should show no pattern. If it shows patterns (clusters, butterfly, U shape …) indicate either non-linearity or heteroscedasticity
    • heteroscedasticity: try robust standard errors
    • non-linearity: consider transformation
  • Q-Q plot
  • Residual vs leverage: identify outliers (influential observations)
    • leverage: distance from the mass center of the data
    • Cook’s distance: overall measure of influence of an observation

Less important: scale vs location

Other plots: car::avPlots

See case study: prestige for more examples.

When assumption does not hold

Logistic regression

Logit(p) = log(p/(1-p)) = b0 + bpxp

Assumptions:

  1. binary out ordinal outcome
  2. large samples
  3. independence
  4. linearity of indep variables and log odds (so that it’s linear addition)
  5. none or little multicollinearity between independent variables

When assumption does not hold

Poisson regression

And negative binomial

Key assumption for poisson regression: variance approximately equal to mean. If over dispersed (variance greater than mean), use NB.

How to choose

  • fit a poisson, compute dispersion parameter (SSR/df), where df is n-p. If much greater than 1, consider nb
  • fit a nb
  • compare goodness of fit measures (AIC, BIC, log-likelihood)
  • use likelihood ratio tests

Assumptions:

  1. count data
  2. response follows poisson or nb distribution.
  3. independent observations
  4. linearity of indep variables and the log link
  5. no excessive zeros

When assumption does not hold

  1. robust standard errors if over dispersion is mild
  2. excessive zero: zero-inflated poisson or nb; hurdle models that splits the zeros and non-zeros
  3. independence: use mixed model
  4. linearity: polynomial term, transformation

Cox regression

Proportional hazards assumption, tested with cox.zph().

  • p-value for each covariate
  • significant suggests proportional hazards is violated for this covariate

Concordance: model’s ability to predict the ordering of survival times, i.e. how well the model can rank individual subjects by risk. It ranges from 0.5 to 1, the higher the better.

Residual diagnostics, shouldn’t display patterns

  • martingale residual
  • deviance residual
Note

When assumption does not hold

  • Modify the model within Cox: stratified cox, time varying covariate (e.g.landmark analysis)
  • Parametric model: accelerated failure time AFT model, cure model, competing risk

General: VIF, Cook’s distance, GoF

Subgroup analysis, Sensitivity, interaction

Related to my own work (TBF)
  • requests for post-hoc power analysis
    • separate analysis stratefied by sex and smoking status (COVITA)
    • interaction: alcohol study

Subgroup analysis

E.g. analyse effect of new treatment on patient under 50 vs above 50 to see if treatment works differently in two age subgroups.

  • pre-specified (a priori): planned in SAP
  • post-hoc: conducted afterwards. useful for generating hypothesis, but high risk of false positives (type I) due to multiple comparisons

Steps: define subgroup -> conduct analysis and estimate treatment effect -> check for interaction -> interpretation

Risks of subgroup analysis, how to mitigate

Risk Solution Comment
Multiple comparison Increase the risk of FP (type I error), statistically significant differences occur by chance Bonferroni correction, FDR adjustments
Reduced power (even) Smaller sample
Over-interpretation Post-hoc analysis are not confirming the hypothesis made in SAP, interpretation need to be cautious Relate to other studies
p-hacking Search for significant subgroup without clear hypothesis
Loss of generalizability Obscure the overall treatment effect

ANOVA et al

  • Anova compares the mean difference between groups
  • Ancova adds one additional continuous covariates, e.g. adjusting for age, tumor size (not gender as it’s not continuous)
  • Mancova allows for more than one continuous covariate, on more than one dependent variable.

Interview questions

Concept explanation

Procedure

Model selection

Diagnostics