Matching
Overview of matching techniques
This overview summary is based on the review paper Stuart 2010: Matching methods for causal inference: a review and a look forward
Goal of matching: choosing well-matched samples of the original groups to reduce confounding - acquire treatment and control groups with similar covariate distributions.
Alternatives to matching: adjust for covariates in a regression model, instrumental variables, structural equation modeling etc.
Benefits of matching:
- complementary to regression adjustment, can and should be used together
- have straightforward diagnostics, hence performance can be assessed
Two settings to use matching:
- when outcome values are NOT available, and matching is used to select subjects to follow up
- outcome values are available, matching is to reduce bias in treatment effect estimation
(note that the outcomes are usually not used even when they are available)
History of matching methods
- they’ve been used in 1940s, but a theoretical basis was not developed until 1970s (Rubin et al)
- with multiple covariates it is difficult to find exact matches on even small number of covariates (Chapin 1947). The introduction of propensity score (1983, Rosenbaum and Rubin) helped.
Steps to implement matching methods
- Define closeness: distance measure used to determine whether an individual is a good match for another
- implement a matching method
- assess the quality of a matching method. Might require iteration of step 1 and 2
- analyse the outcome given the matched data
Step 1: define closeness
Step 2: implement matching
nearest neighbor matching
Subclassification, full matching, weighting
Step 3: diagnose matches
Step 4: analysis of the outcome
Vignette: Estimating effects after matching by Noah Greifer
Matching (old notes)
Exact matching:
- perfect covariate balance; \(F(X_i|T_i = 1) = F(X_i|T_i=0)\)
- infeasible when covariate is continuous, and when there are many covariates.
Probability of receiving treatment, \(\pi(X_i) = P(T_i = 1 | X_i)\)
Matching based on distance measures
- Mahalanobis distance
- Estimated propensity score, \(D(X_i, X_j) = |P(T_i = 1|X_i) - P(T_j=1 | X_j)|\)
Check covariate balance
- ideally compare joint distribution of all covariates
- practically check lower-dimensional summaries (standardized mean difference, variance ratio, empirical CDF difference)
Balance test
Matching would reduce number of observations
Software
MatchIt
package
PS matching is ONE of the many matching techniques that uses PS as the difference.
Links
https://stats.stackexchange.com/questions/492218/should-the-choice-of-propensity-score-matching-versus-weighting-depend-on-the-de
https://stats.stackexchange.com/questions/553853/understanding-propensity-score-matching?rq=1
https://aetion.com/evidence-hub/understanding-propensity-score-weighting-methods-rwe/