Causal Inference in Data Science

Observational data

Some topics that need to be reviewed

Author

Chi Zhang

Published

February 19, 2025

Categorise the problem quickly
  • level of aggregation: individual, or aggregated counts/ country level summary statistic
  • time component: temporal intervention of not
  • control: is control available, or do we need to construct it
Method Time series Control Statistics or ML R packages Python libraries
Difference-in-Difference no statistical did, plm statsmodels, econml, causalinference
Propensity score matching no statistical MatchIt, twang pymatch, causalml, sklearn
Regression discontinuity design no statistical rdd, rdrobust
Instrumental variable no statistical ivreg, AER econml
Causal forests no ML grf econml, causalml, sklearn
Bayesian structural time series yes statistical + ML bsts causalimpact, orbit, pymc3
Synthetic control yes statistical Synth, tidysynth synthcontrol, PySynth

These techniques have different use-cases. Focus on the aggregated time series ones first.

Non-TS usecases in business

IV, RD, PSM