Logistic regression example: lung

Categories: Regression, Logistic regression

Author: Chi Zhang

Published: October 5, 2024

This analysis is in preparation for interviews related to logistic regression. The focus is on the procedure (and how to carry it out in R), as well as on interpreting the results.

mtcars |> head()
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mlr1 <- glm(vs ~ mpg + wt, family = 'binomial', data = mtcars)
summary(mlr1)

Call:
glm(formula = vs ~ mpg + wt, family = "binomial", data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -12.5412     8.4660  -1.481   0.1385  
mpg           0.5241     0.2604   2.012   0.0442 *
wt            0.5829     1.1845   0.492   0.6227  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.860  on 31  degrees of freedom
Residual deviance: 25.298  on 29  degrees of freedom
AIC: 31.298

Number of Fisher Scoring iterations: 6
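The coefficients above are on the log-odds scale; exponentiating puts them on the odds-ratio scale, which is usually easier to interpret. A minimal sketch, refitting the same model:

```r
# Refit the model from above
mlr1 <- glm(vs ~ mpg + wt, family = "binomial", data = mtcars)

# Odds ratios: exp() of the log-odds coefficients.
# e.g. each extra mpg multiplies the odds of vs = 1 by about exp(0.5241) ~ 1.69
exp(coef(mlr1))

# Profile-likelihood confidence intervals, also on the odds-ratio scale
exp(confint(mlr1))
```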

Deviance

The null deviance (43.860 on 31 df) measures the fit of an intercept-only model, while the residual deviance (25.298 on 29 df) measures the fit of the fitted model. The difference between the two is how much adding the two predictors has improved the model.
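This improvement can be computed directly and compared against a chi-square distribution with 2 degrees of freedom (one per added predictor), giving an overall test of the model. A sketch:

```r
mlr1 <- glm(vs ~ mpg + wt, family = "binomial", data = mtcars)

# Deviance improvement over the intercept-only model: 43.860 - 25.298 = 18.56
dev_improvement <- mlr1$null.deviance - mlr1$deviance

# Degrees of freedom used up by the predictors: 31 - 29 = 2
df_used <- mlr1$df.null - mlr1$df.residual

# Overall model chi-square p-value
pchisq(dev_improvement, df = df_used, lower.tail = FALSE)
```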

Hypothesis tests

Wald test: the z values reported by summary() are the Wald test statistics for the individual coefficients.
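The reported z values can be reproduced by dividing each estimate by its standard error, with two-sided p-values from the standard normal distribution. A sketch:

```r
mlr1 <- glm(vs ~ mpg + wt, family = "binomial", data = mtcars)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
est <- coef(summary(mlr1))

# Wald statistics, e.g. 0.5241 / 0.2604 ~ 2.01 for mpg
z <- est[, "Estimate"] / est[, "Std. Error"]

# Two-sided p-values, matching the summary() output
p <- 2 * pnorm(-abs(z))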

Likelihood ratio test: compares two nested logistic regression models.

Build a second model

mlr2 <- glm(vs ~ mpg, family = 'binomial', data = mtcars)
summary(mlr2)

Call:
glm(formula = vs ~ mpg, family = "binomial", data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -8.8331     3.1623  -2.793  0.00522 **
mpg           0.4304     0.1584   2.717  0.00659 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.860  on 31  degrees of freedom
Residual deviance: 25.533  on 30  degrees of freedom
AIC: 29.533

Number of Fisher Scoring iterations: 6
anova(mlr2, mlr1, test = 'Chisq') # analysis of deviance
Analysis of Deviance Table

Model 1: vs ~ mpg
Model 2: vs ~ mpg + wt
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        30     25.533                     
2        29     25.298  1  0.23546   0.6275
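The same analysis of deviance can be done by hand from the two residual deviances: their difference follows a chi-square distribution with df equal to the difference in parameters. A sketch:

```r
mlr1 <- glm(vs ~ mpg + wt, family = "binomial", data = mtcars)
mlr2 <- glm(vs ~ mpg, family = "binomial", data = mtcars)

# Likelihood ratio statistic: 25.533 - 25.298 = 0.235
lrt <- deviance(mlr2) - deviance(mlr1)

# p ~ 0.63: adding wt does not significantly improve on mpg alone
pchisq(lrt, df = 1, lower.tail = FALSE)
```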
mlr3 <- glm(vs ~ wt, family = 'binomial', data = mtcars)
summary(mlr3)

Call:
glm(formula = vs ~ wt, family = "binomial", data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   5.7147     2.3014   2.483  0.01302 * 
wt           -1.9105     0.7279  -2.625  0.00867 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.860  on 31  degrees of freedom
Residual deviance: 31.367  on 30  degrees of freedom
AIC: 35.367

Number of Fisher Scoring iterations: 5
anova(mlr3, mlr1, test = 'Chisq') # analysis of deviance
Analysis of Deviance Table

Model 1: vs ~ wt
Model 2: vs ~ mpg + wt
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1        30     31.367                       
2        29     25.298  1   6.0689  0.01376 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(mlr1, mlr2, mlr3)
     df      AIC
mlr1  3 31.29788
mlr2  2 29.53334
mlr3  2 35.36673
BIC(mlr1, mlr2, mlr3)
     df      BIC
mlr1  3 35.69508
mlr2  2 32.46481
mlr3  2 38.29820
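For a Bernoulli (0/1) response the deviance equals -2 times the log-likelihood, so the AIC and BIC values in the tables above follow directly from the residual deviance. A sketch verifying this for mlr1:

```r
mlr1 <- glm(vs ~ mpg + wt, family = "binomial", data = mtcars)

k <- attr(logLik(mlr1), "df")   # number of estimated parameters (3)
n <- nobs(mlr1)                 # number of observations (32)

deviance(mlr1) + 2 * k          # AIC: 25.298 + 2 * 3      = 31.298
deviance(mlr1) + k * log(n)     # BIC: 25.298 + 3 * log(32) = 35.695
```

Both AIC and BIC favour mlr2 (mpg only), consistent with the likelihood ratio test above.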

Diagnostics