Interview: statistics used in my work

Inference

Case studies

Author

Chi Zhang

Published

October 20, 2024

Interview preparation: statistics and study designs used in my own work. Divide by types of endpoint, design (and why), analysis method.

Selected list of my statistical advising work

Observational studies

Study Endpoint Design Analysis Comment
Hansen 2018 QoL on vocal surgical questionnaire, likert scale Case control, 3 groups: controls (no voice problem), test-retest group, surgical group Cohen’s kappa Reliability of questionnaire
Sarjomaa 2022 Tab 4
Sarjomaa 2024 Case control (with population)
Kabashi 2019 Cross-sectional, multi-site Tab 2 and 3
Fell 2024 CB prevalence
Diab 2023 ECG parameter. Pressure measures n=20 t-test or wilcoxon, correlation Choice of method

Clinical trials

Study Endpoint & Tx Design Analysis Comment
Reier-Nilsen 2019 QoL, Oral immunotherapy RCT, 2 year follow-up (T0, T1, T2). 2:1 block.
Opdal 2023 QT intervals. Methadone vs R-methadone n=9. SS based on one-sample t-test with effect size (QT in ms) 20/15 SS calculation
Simonsen 2023 QoL, stigma stress scale n=32, 1:1. Three endpoints (T0, T1, T2) Revise study protocol, SS
Callender 2023 Artery measures for blood flow n=26. Two legs, one tx one ct. Tx has 3 levels of stimulus. Within-subject design Anova; or mixed model Choice of method

Pay attention to how the sample size is calculated

For behavior questions check here

  • by field: Covid; nutrition;
  • by design:
  • by endpoint: most are QoL

Like:

  • like to help clinicians understand their data
  • like health data, sometimes you see very novel data types (e.g. measurements by the british doctor)
  • interesting ways people carry out their studies

Dislike:

  • communication (e.g. p-value interpretation)
  • sometimes feels the frustration that the importance of data quality, processing is undervalued, and we are just p-value producers
  • often need to re-work multiple rounds

Endpoints

Quality of life QoL

Typically measured on a scale, summed up

  • QoL on vocal surgical questionnaire, likert scale. Intervention: throat surgery (Hansen 2018)
  • QoL on likert scale for both children and parents. Intervention: oral immunotherapy (Reier-Nilsen 2019)
  • QoL, stigma stress scale (Simonsen 2023)

Biomarker

  • Artery blood flow measures. Intervention: local anaesthesia medicine (Callender 2023)
  • QT intervals. Intervention: methadone (Opdal 2023)
  • ECG parameters. Intervention: pressure measures (Diab 2023)

Observational: Covid / CB

Design

RCT

Parallel or cross-over? Choice of control - placebo, active control, historical control? Timing and frequency of measurements? Randomization method and blinding?

My works have an exploratory nature - the RCT works I’ve done all seems to be phase II (neither determining the safe dose, nor find deterministic evidence in the wider population) with small to medium sample size.

I have never followed an RCT from start to end, so I’ve never been involved in the design of the studies.

For the designs: mostly parallel designs with single measurement or longitudinal measurements, where subjects are measured multiple times. I’ve done one cross-over study (same patient, first tx then ct).

I have not done any factorial design (multiple treatments at the same time).

Selection criteria: clinicians have their own criteria for relevant subjects, usually age groups, sex and underlying conditions

Randomization: the ones I’ve done are mostly simple randomization, nothing with blocks or stratification

Masking (blinding): my works seem to have open labels, not blinded

Peanut allergy

(Reier-Nilsen 2019)

2 year follow up (T0, T1, T2)

  • Endpoint: 3 or 5-point likert scale for kids, 7 point likert scale for parents. Points are summed and reported
  • Selection: Enroll positive food challenge 5-15 yo, 96 in total. Exclude asthma and other severe chronic diseases
  • Randomization: Remove 19 ineligible, 77 2:1 block OIT vs controls (observation only). Possible reason for 2:1: many would drop out
  • Blinding: NA

Methadone

(Opdal 2023)

  • Selection: 18+, not pregnant, no serious illness, willingness to comply
  • Design: Cross over, 14 days washoff
  • Randomization: Cross over, subjects are their own controls
  • Blinding: Cardiologist is blinded to calculate QT measurement
  • Comparison: on-methadone QTc vs off
  • Sample size: n=9, clinically relevant difference 20, SD 15 (unit: microsecond), b = 0.9. Can be computed with pwr::pwr.t.test(d=20/15). See more in the next section.

Artery blood flow

(Callender 2023) n = 26; within-subject two legs; three levels of intervention (stimulus)

  • Selection: 13M, 13F, healthy, between age 18-50, free from medication influencing cardiovascular system
  • Design: Parallel
  • Randomization: two legs on the same patient, tx ct
  • Blinding: NA
  • Sample size: a=0.05, b=0.8; effect size based on previous works, computed by clinicians themselves

Power and sample size

clinicians provide information on the anticipated effect size, statistician calculate sample size needed to detect effect with adequate power while minimizing type I and II error.

Sample size computation:

  • Require:
    • test: t-test (one, two samples), tails
    • significance level
    • power
    • cohen’s d or f - effect size (e.g. standardized mean difference). If unavaiable, can use the Cohen’s guideline for small, medium and large effect sizes: (f = 0.1, 0.25, 0.4)

QoL stigma

(Simonsen 2023) n = 32; T0, T1, T2; 1:1; justify SS computation

If the study has a paired design with effect size (standardized after substracting the standard deviation) 0.5,

pwr::pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, type = 'one.sample', alternative = 'two.sided')

     One-sample t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Methadone

(Opdal 2023) n=9; effect size provided (as clinically relevant difference and standard deviation); compute SS

In this case study, the computed sample size is as follow,

pwr::pwr.t.test(d = 20/15, sig.level = 0.05, power = 0.9, type = 'one.sample', alternative = 'two.sided')

     One-sample t test power calculation 

              n = 8.07235
              d = 1.333333
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

The effect size is huge, hence the small sample size required.

Post-hoc power computation

It is not recommended, as the information is the same as what p-value provides: a small p-value that indicates a large effect size, that will give a high power.

Power analysis is a design tool, not a diagnostic tool. It is used to make sure that the study has enough samples to detect an effect of a given magnitude and the study has high probability of detecting a true effect if it exists. Once the study is done, analyzing power adds little beyond the p-value.

Observational

AHUS

Retrospective observational study, linked clinical data from EHR.

Selected as subjects have been in the hospital during a certain time (June 2018-19), above a certain age (18) and have been in certain wards (four selected).

Limitation

  • generalizability: four wards only, pre-covid data
  • error in data as EHR data goes
  • unmeasured confounders

COVITA

(Sarjomaa 2024) Case control with population

Aim is to assess risk factors for Covid infection:

  • compare PCR positive (400) vs negative (719) (case control)
  • them compare with population control (14509) (pre-pandemic)

Selection: adults who did PCR tests between 2020.2 to 12. PCR+- identified from test centers and hospitals, then contacted by telephone and invited to participate in research project.

Population controls are age matched. Pre-existing Telemark study data of 21-55 yo in 2018.

Results: sex (M) in TND (test negative)

Limitation (bias)

  • generalizability: location is limited, age group is limited
  • recall bias as it is questionnaire data; despite minimal missing
  • bias in healthcare seeking behavior
  • unmeasured confounders

Alcohol

(Kabashi 2019) Cross sectional, multi-site

Survey

Norkost

Analysis

Tests

t-test, wilcoxon

Multiple testing adjustment

Regression

Linear regression / regularization

HAI paper

Early warning system at a collective level, using predictors that are potentially interveneable.

This was a simulation study.

FHI Excess mortality

Bayesian linear regression

  • count data, but after doing EDA it can be approximated with linear with a single predictor: year
  • the task is not to explain (i.e. regression coefficient and intervals), but to predict
  • sometimes more convenient to use the per 100k number of death (or cases)

Logistic regression / ordinal

COVITA

Different results (in terms of risk factors) can come from

  • study design
  • different selection for case and control
  • geographical location (that affect the population)

Comparisons

  • Study I: 400 PCR+ vs 719 PCR-
  • Study II: 286 PCR+ (age 18-55) vs 14509 pop
  • Study III: 509 PCR- (age 18-55) vs 14509 pop -> other infections other than COVID

Missing data handling

  • low rate of missing (0-6.2% for most questions, roughly 15% for smoking) in questionnaire subjects.
  • no imputation was done
  • sensitivity analysis: excluded the missing, do the analysis again and see whether the conclusion differ

DTW paper

LR as a classifier

Poisson regression

Transfer paper

Two objectives:

  • optimize patient logistics, quantify resource utilisation
  • more detailed knowledge of highly connected hospital hubs and trajectories for infection prevention and control

Transfer chain analysis, 15258 patient stays that have more than one location. 1118 unique transfer chains (sequence of locations)

Skewed: 75% of patients followed one of the top 20 types: notably, ED to neurology; ED to gastrosurgery …. Many have the pattern of bed - ORblock - bed.

Transfer rate with Poisson regression to identify risk factors contributing to higher number of transfers

  • higher age, less transfers
  • higher mean first 48h NEWS, more transfers. (used categories 0-2, 3-4, 5-6 instead of numbers to capture the non-linear relationship)
  • no gender difference
  • after adjusting for HDU/ICU and OR, effect of age and NEWS becomes insignificant
  • interactions between department, surgery and antibiotics use are not significant

Missing data

Complete case only? Imputation? LOCF (last observation carried forward)? IIT vs per protocol

(not typical) DTW

(not typical) Norkost dietary survey

Keep zero as is

Subgroup analysis, sensitivity

Transfer paper

CB

Mixed model

Peanut allergy

Norkost

Survival analysis

Hospital LOS

Others

Cohen’s kappa