Causal Inference in Data Science
Observational data
Some topics that need to be reviewed
Categorise the problem quickly
- level of aggregation: individual records, or aggregated counts / country-level summary statistics
- time component: temporal intervention or not
- control: is a control available, or do we need to construct one
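As a rough illustration of the categorisation step, the three questions above can be captured in a small record and used to shortlist methods. This is only a sketch: `ProblemProfile` and `candidate_methods` are hypothetical names, and the lookup encodes nothing beyond the "Time series" column of the methods table in these notes, so the aggregation and control answers are recorded but not yet used for filtering.

```python
from dataclasses import dataclass

# "Time series" column copied from the methods table in these notes.
# No other column is encoded here (assumption: this is enough for a first cut).
USES_TIME_SERIES = {
    "Difference-in-Difference": False,
    "Propensity score matching": False,
    "Regression discontinuity design": False,
    "Instrumental variable": False,
    "Causal forests": False,
    "Bayesian structural time series": True,
    "Synthetic control": True,
}


@dataclass(frozen=True)
class ProblemProfile:
    """Answers to the three categorisation questions (hypothetical helper)."""
    aggregated: bool              # aggregated counts/summaries vs individual records
    temporal_intervention: bool   # is there a time component?
    control_available: bool       # ready-made control, or must one be constructed?


def candidate_methods(profile):
    """Shortlist methods whose time-series flag matches the problem."""
    return [
        method
        for method, needs_time_series in USES_TIME_SERIES.items()
        if needs_time_series == profile.temporal_intervention
    ]


# Example: aggregated data, intervention at a point in time, no ready-made control.
profile = ProblemProfile(aggregated=True, temporal_intervention=True, control_available=False)
shortlist = candidate_methods(profile)
```

The shortlist is a starting point for reading, not a recommendation; the other two answers still need to be weighed by hand.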
Method | Time series | Control | Statistics or ML | R packages | Python libraries |
---|---|---|---|---|---|
Difference-in-Difference | no | | statistical | did, plm | statsmodels, econml, causalinference |
Propensity score matching | no | | statistical | MatchIt, twang | pymatch, causalml, sklearn |
Regression discontinuity design | no | | statistical | rdd, rdrobust | |
Instrumental variable | no | | statistical | ivreg, AER | econml |
Causal forests | no | | ML | grf | econml, causalml, sklearn |
Bayesian structural time series | yes | | statistical + ML | bsts | causalimpact, orbit, pymc3 |
Synthetic control | yes | | statistical | Synth, tidysynth | synthcontrol, PySynth |
These techniques have different use cases. Focus first on the ones for aggregated time series (Bayesian structural time series and synthetic control).
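The aggregated time series methods share one idea: learn how the treated series relates to a control series before the intervention, project that relationship forward as a counterfactual, and read the effect off the gap. The sketch below shows only that idea with a one-variable least-squares fit in plain Python. It is not BSTS or synthetic control themselves (no state-space model, no donor weights, no uncertainty intervals), and the function names and toy numbers are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def average_effect(treated, control, intervention_index):
    """Mean gap between observed and counterfactual values after the intervention."""
    # Fit the treated ~ control relationship on the pre-intervention period only.
    intercept, slope = fit_line(
        control[:intervention_index], treated[:intervention_index]
    )
    # Project the control series forward to get the counterfactual.
    counterfactual = [intercept + slope * c for c in control[intervention_index:]]
    observed = treated[intervention_index:]
    gaps = [obs - cf for obs, cf in zip(observed, counterfactual)]
    return sum(gaps) / len(gaps)


# Toy data (invented): treated = 5 + 2 * control, plus a lift of 10 from period 5 on.
control = [10.0, 12.0, 11.0, 13.0, 14.0, 12.0, 15.0, 13.0]
treated = [5.0 + 2.0 * c for c in control]
for t in range(5, len(treated)):
    treated[t] += 10.0

effect = average_effect(treated, control, intervention_index=5)
```

Because the toy data are noise-free, the estimated effect recovers the lift that was injected; on real data the packages in the table add what this sketch leaves out, notably uncertainty estimates and multiple control series.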
Non-time-series use cases in business
Instrumental variables (IV), regression discontinuity (RD), propensity score matching (PSM)
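To make the IV entry concrete, here is a minimal sketch on simulated data, assuming a single instrument and a single treatment variable. In that special case the two-stage least squares estimate reduces to a ratio of covariances. The data-generating process, coefficients, and function names are all invented for illustration; real work would use one of the packages listed in the table.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, true_effect, seed):
    """Simulate instrument z, treatment x, outcome y with an unobserved confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)   # instrument: affects y only through x
        ui = rng.gauss(0, 1)   # unobserved confounder: affects both x and y
        xi = 0.8 * zi + ui + rng.gauss(0, 1)
        yi = true_effect * xi + 3.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate(n=20000, true_effect=2.0, seed=0)

# Naive regression slope of y on x: biased upward by the confounder.
ols_estimate = covariance(x, y) / covariance(x, x)

# IV estimate with one instrument: cov(z, y) / cov(z, x).
iv_estimate = covariance(z, y) / covariance(z, x)
```

In this setup the naive slope overstates the effect because the confounder pushes treatment and outcome in the same direction, while the IV ratio lands near the true value of 2.0. The result depends on the instrument being unrelated to the confounder, which holds here by construction and has to be argued case by case in practice.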