Magdalena Bennett
McCombs School of Business, The University of Texas at Austin
AEFP 50th Conference, Washington DC
March 13th, 2025
We can potentially remove [part of] the bias by matching on $X^s_{it} = X_i$
Identify contexts when matching can recover causal estimates under certain violations of the parallel trend assumption.
Simulations:
Different DGP scenarios
Application:
School segregation & vouchers
Let's set up the problem
Let Yit(z) be the potential outcome for unit i in period t under treatment z.
Intervention implemented in $T_0$ → No units are treated in $t \le T_0$
Difference-in-Differences (DD) focuses on ATT for t>T0:
$ATT(t) = E[Y_{it}(1) - Y_{it}(0) \mid Z=1]$
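$\hat{\tau}^{DD} = \underbrace{E[Y_{i1} \mid Z=1] - E[Y_{i1} \mid Z=0]}_{\Delta_{post}} - \underbrace{(E[Y_{i0} \mid Z=1] - E[Y_{i0} \mid Z=0])}_{\Delta_{pre}}$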
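The two-period estimator above can be computed directly from group means. A minimal sketch, assuming one pre-period ($t=0$) and one post-period ($t=1$) outcome per unit; the function name `dd_estimate` and the toy numbers are illustrative, not taken from the study.

```python
from statistics import mean

def dd_estimate(y_pre, y_post, z):
    """2x2 difference-in-differences from unit-level outcomes.

    y_pre, y_post: outcome of each unit in the pre (t=0) and post (t=1) period
    z: treatment indicator per unit (1 = treated group, 0 = comparison group)
    """
    treated = [i for i, zi in enumerate(z) if zi == 1]
    control = [i for i, zi in enumerate(z) if zi == 0]
    # Delta_post: treated-minus-control gap after the intervention
    delta_post = mean(y_post[i] for i in treated) - mean(y_post[i] for i in control)
    # Delta_pre: the same gap before the intervention
    delta_pre = mean(y_pre[i] for i in treated) - mean(y_pre[i] for i in control)
    return delta_post - delta_pre

# Toy data: treated units start 2 points higher and gain 3 points more than controls
y_pre = [12.0, 14.0, 10.0, 12.0]
y_post = [16.0, 18.0, 11.0, 13.0]
z = [1, 1, 0, 0]
print(dd_estimate(y_pre, y_post, z))  # 3.0
```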
Bias can be introduced to DD in different ways:
1) Time-invariant covariates with time-varying effects: Obs. Bias
2) Differential time-varying effects: Obs. Diff. Bias
3) Observed or unobserved time-varying covariates: Unobs. Bias
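$\underbrace{(\bar{\gamma}_1(X_1,t') - \bar{\gamma}_1(X_0,t')) - (\bar{\gamma}_1(X_1,t) - \bar{\gamma}_1(X_0,t))}_{\text{Obs. Bias}} + \underbrace{(\bar{\gamma}_2(X_1,t') - \bar{\gamma}_2(X_1,t))}_{\text{Obs. Diff. Bias}} + \underbrace{(\lambda_{t'1} - \lambda_{t'0}) - (\lambda_{t1} - \lambda_{t0})}_{\text{Unobs. Bias}} = 0$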
At least one of the following two conditions needs to hold:
1) No effect or constant effect of X on Y over time: $E[\gamma_1(X,t)] = E[\gamma_1(X)]$
2) Equal distribution of observed covariates between groups: $X_i \mid Z=1 \overset{d}{=} X_i \mid Z=0$
in addition to:
3) No differential time effect of X on Y by treatment group: $E[\gamma_2(X,t)] = 0$
4) No differential unobserved time-varying effects between groups: $\lambda_{t1} = \lambda_{t0}$
Cond. 2 can be made to hold through matching
Conds. 3 and 4 can be assessed with sensitivity analysis
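To see why Cond. 2 matters, here is a small hypothetical simulation (not the paper's DGP): a binary, time-invariant covariate X whose effect on Y rises after the intervention, with X more common among treated units. Conds. 3 and 4 hold by construction (no group-specific X-by-time effect, a common time shock), so the naive DD carries only the Obs. Bias term, roughly $(\gamma_{post} - \gamma_{pre}) \times (\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)) = 1 \times 0.4 = 0.4$. Exact stratification on X, used here as a simple stand-in for matching, balances X and removes that term. All names and parameter values are assumptions.

```python
import random
from statistics import mean

def simulate(n=10_000, tau=1.0, gamma_pre=1.0, gamma_post=2.0, seed=1):
    """Hypothetical two-period panel: one (z, x, y_pre, y_post) tuple per unit."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Imbalance: X = 1 for about 70% of treated units vs 30% of controls
        x = 1 if rng.random() < (0.7 if z else 0.3) else 0
        alpha = rng.gauss(0, 1)  # unit fixed effect, differenced out by DD
        y_pre = alpha + gamma_pre * x + rng.gauss(0, 1)
        # Common time shock (0.5) plus treatment effect tau in the post period
        y_post = alpha + gamma_post * x + 0.5 + tau * z + rng.gauss(0, 1)
        units.append((z, x, y_pre, y_post))
    return units

def mean_gain(group):
    return mean(y_post - y_pre for _, _, y_pre, y_post in group)

def naive_dd(units):
    treated = [u for u in units if u[0] == 1]
    control = [u for u in units if u[0] == 0]
    return mean_gain(treated) - mean_gain(control)

def stratified_dd(units):
    """DD within each value of X, averaged with weights = number of treated units (ATT)."""
    total, n_treated = 0.0, 0
    for x in (0, 1):
        stratum = [u for u in units if u[1] == x]
        n_t = sum(1 for u in stratum if u[0] == 1)
        if 0 < n_t < len(stratum):  # stratum needs both treated and control units
            total += n_t * naive_dd(stratum)
            n_treated += n_t
    return total / n_treated

units = simulate()
print(round(naive_dd(units), 2))       # close to tau + 0.4
print(round(stratified_dd(units), 2))  # close to tau = 1.0
```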
One main issue with the previous test → Underpowered
Rambachan & Roth (2023) propose sensitivity bounds that allow for pre-trend violations:
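A stylized sketch of the relative-magnitudes idea, assuming the event-study coefficients are known without sampling error: the first post-period violation $\delta_1$ is bounded by $\bar{M}$ times the largest change between consecutive pre-period coefficients (the last pre-period is normalized to 0). Rambachan & Roth (2023) construct robust confidence sets rather than this population identified set, so a breakdown value from this sketch will not match one computed with their method; the function names and numbers are hypothetical.

```python
def _max_pre_change(pre_coefs):
    # Append the reference period, whose coefficient is normalized to 0
    path = list(pre_coefs) + [0.0]
    if len(path) < 2:
        raise ValueError("need at least one pre-period coefficient")
    return max(abs(b - a) for a, b in zip(path, path[1:]))

def rm_identified_set(pre_coefs, post_coef, m_bar):
    """Identified set for the first post-period effect when
    |delta_1| <= m_bar * (largest period-to-period change in pre-period coefficients)."""
    half_width = m_bar * _max_pre_change(pre_coefs)
    return post_coef - half_width, post_coef + half_width

def breakdown_m_bar(pre_coefs, post_coef):
    """Smallest m_bar at which the identified set includes zero."""
    return abs(post_coef) / _max_pre_change(pre_coefs)

pre = [0.1, -0.2, 0.1]  # toy event-study coefficients before the reference period
post = 0.5              # toy first post-period coefficient
print(rm_identified_set(pre, post, m_bar=1.0))  # approx (0.2, 0.8)
print(breakdown_m_bar(pre, post))               # approx 1.67
```

Larger pre-period wiggles widen the set for the same $\bar{M}$, which is why reducing pre-period bias (e.g., through matching) can raise the breakdown value.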

Simulations
For linear and quadratic functions:
S1: No interaction between X and t
S2: Equal interaction between X and t
S3: Differential interaction between X and t
S4: S3 + Bias cancellation
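A hypothetical linear DGP that mimics scenarios S1 to S3, reusing N, Pr(Z=1), T and $T_0$ from the simulation parameter table. The coefficients (a 0.5 shift in the mean of X for treated units, X-by-t slope 0.2, extra treated slope 0.1, common trend 0.1) are assumptions for illustration, not the paper's DGP, and S4 (bias cancellation) is not reproduced. Under S2 the naive DD bias is about $0.5 \times 0.2 \times (6.5 - 2.5) = 0.4$.

```python
import random
from statistics import mean

N, P_TREAT, T, T0 = 1000, 0.5, 8, 4  # values from the simulation parameter table

def simulate_panel(scenario, tau=1.0, seed=0):
    """Hypothetical linear DGP (not the paper's exact one).

    S1: effect of X constant over time (no X-by-t interaction)
    S2: effect of X grows with t, equally for both groups
    S3: effect of X grows with t faster for the treated group
    """
    rng = random.Random(seed)
    data = []  # (z, x, [y_1, ..., y_T])
    for _ in range(N):
        z = 1 if rng.random() < P_TREAT else 0
        x = rng.gauss(0.5 * z, 1.0)  # assumed: treated units have higher X on average
        slope = {"S1": 0.0, "S2": 0.2, "S3": 0.2 + 0.1 * z}[scenario]
        ys = [x * (1.0 + slope * t) + 0.1 * t + tau * z * (t > T0) + rng.gauss(0, 1)
              for t in range(1, T + 1)]
        data.append((z, x, ys))
    return data

def dd(data):
    """Treated-minus-control difference in (post-period mean - pre-period mean)."""
    def change(ys):
        return mean(ys[T0:]) - mean(ys[:T0])
    treated = [change(ys) for z, _, ys in data if z == 1]
    control = [change(ys) for z, _, ys in data if z == 0]
    return mean(treated) - mean(control)

for s in ("S1", "S2", "S3"):
    print(s, round(dd(simulate_panel(s)), 2))  # true effect tau = 1.0
```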
| Parameter | Value |
|---|---|
| Number of obs (N) | 1,000 |
| Pr(Z=1) | 0.5 |
| Time periods (T) | 8 |
| Last pre-intervention period (T_0) | 4 |
| Matching PS | Nearest neighbor (using calipers) |
| MIP Matching tolerance | 0.01 SD |
| Number of simulations | 1,000 |
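The "Nearest neighbor (using calipers)" row can be sketched as greedy 1:1 matching on the logit of an estimated propensity score, without replacement. The caliper width (0.2 SD of the logit score, a common rule of thumb) and the single-covariate logistic model fitted by Newton-Raphson are assumptions; the table does not state either, and all names are hypothetical.

```python
import math
import random
from statistics import stdev

def fit_logit(x, z, iters=25):
    """Logistic regression of z on one covariate x (intercept + slope) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += zi - p         # score, intercept
            g1 += (zi - p) * xi  # score, slope
            h00 += w             # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: info^{-1} @ score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def caliper_nn_match(x, z, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching on the logit propensity score,
    without replacement; treated units with no control inside the caliper are dropped."""
    b0, b1 = fit_logit(x, z)
    lps = [b0 + b1 * xi for xi in x]
    caliper = caliper_sd * stdev(lps)
    available = {i for i, zi in enumerate(z) if zi == 0}
    treated_idx = [i for i, zi in enumerate(z) if zi == 1]
    pairs = []
    for i in treated_idx:
        if not available:
            break
        j = min(available, key=lambda c: abs(lps[c] - lps[i]))
        if abs(lps[j] - lps[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs, lps, caliper

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(1000)]
z = [1 if rng.random() < 1 / (1 + math.exp(-0.8 * xi)) else 0 for xi in x]
pairs, lps, caliper = caliper_nn_match(x, z)
print(len(pairs), "matched pairs")
```

Greedy matching depends on the order in which treated units are processed; optimal matching avoids that at higher computational cost.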
Application
Universal flat voucher scheme $\xrightarrow{2008}$ Universal + preferential voucher scheme
Preferential voucher scheme:
Targeted to the bottom 40% of students by vulnerability
Additional 50% of the voucher amount per targeted student
Additional funding for schools with a high concentration of SEP (Subvención Escolar Preferencial) students.
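The funding rule above is simple arithmetic. A hypothetical sketch in arbitrary currency units: every student brings the flat voucher, each SEP student brings an extra 50% of it, and the concentration bonus is passed in by the caller because its formula is not given here. The function name and values are illustrative assumptions.

```python
def school_funding(n_students, n_sep, base_voucher, concentration_bonus=0.0):
    """Hypothetical per-period funding under the universal + preferential scheme.

    n_students: total enrolled students (all receive the flat voucher)
    n_sep: SEP students among them (each adds 50% of the voucher)
    concentration_bonus: extra amount supplied by the caller (formula not modeled)
    """
    if not 0 <= n_sep <= n_students:
        raise ValueError("n_sep must be between 0 and n_students")
    universal = n_students * base_voucher
    preferential = 0.5 * base_voucher * n_sep
    return universal + preferential + concentration_bonus

print(school_funding(200, 80, 100.0))  # 20000.0 + 4000.0 + 0.0 = 24000.0
```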
Students:
- Verify SEP status
- Attend a SEP school
Schools:
- Opt into the policy
- No selection, no fees
- Resources ~ performance
Prior to matching: no parallel pre-trends
Different types of schools:
Schools that charge high co-payment fees.
Schools with a low number of SEP students enrolled.
MIP Matching using constant or "sticky" covariates:
Mean balance (0.025 SD): Enrollment, average yearly subsidy, number of voucher schools in county, charges add-on fees
Exact balance: Geographic province
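The balancing constraints above can be written as a small optimization problem: pick the largest equal-sized sets of treated and control units whose covariate means differ by at most 0.025 SD and whose province counts match exactly. The sketch below solves that problem by brute-force enumeration purely for illustration (feasible only for a handful of units); in practice it is formulated as a mixed-integer program and handed to a solver. The covariate values, provinces and function name are hypothetical, and the enumeration stands in for, rather than reproduces, the MIP Matching implementation.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

def cardinality_match(treated, control, tol_sd=0.025):
    """Each unit is (covariates_tuple, province). Returns the largest equal-sized
    treated/control subsets that satisfy
      - mean balance: |mean_T - mean_C| <= tol_sd * pooled SD, for every covariate
      - exact balance: same number of treated and control units in each province.
    Exhaustive search, largest size first: toy sizes only."""
    n_cov = len(treated[0][0])
    pooled = treated + control
    sds = [pstdev(cov[k] for cov, _ in pooled) for k in range(n_cov)]

    def balanced(t_sub, c_sub):
        return all(
            abs(mean(cov[k] for cov, _ in t_sub) - mean(cov[k] for cov, _ in c_sub))
            <= tol_sd * sds[k]
            for k in range(n_cov)
        )

    for size in range(min(len(treated), len(control)), 0, -1):
        for t_sub in combinations(treated, size):
            t_provinces = Counter(prov for _, prov in t_sub)
            for c_sub in combinations(control, size):
                if Counter(prov for _, prov in c_sub) == t_provinces and balanced(t_sub, c_sub):
                    return list(t_sub), list(c_sub)
    return [], []

# Hypothetical schools: ((enrollment, subsidy), province), arbitrary units
treated = [((1.0, 2.0), "A"), ((2.0, 1.0), "A"), ((3.0, 3.0), "B"), ((5.0, 5.0), "B")]
control = [((1.0, 2.0), "A"), ((2.0, 1.0), "A"), ((1.5, 1.5), "A"),
           ((3.0, 3.0), "B"), ((0.0, 0.0), "B"), ((4.0, 4.0), "C")]
matched_t, matched_c = cardinality_match(treated, control)
print(len(matched_t), len(matched_c))  # 3 3
```

The outlying treated school in province B cannot be balanced by any available control, so the largest feasible matched sample drops it rather than relaxing the constraints.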

Matched schools:
9pp increase in the income gap between SEP and non-SEP schools in matched DD:
SEP schools attracted even more vulnerable students.
Non-SEP schools increased their average family income.
No evidence of increase in SIMCE score:
Findings in segregation are moderately robust to hidden bias (Keele et al., 2019):
$\Gamma_c = 1.76$ → An unobserved confounder would have to change the probability of assignment from 50% vs 50% to 32.7% vs 67.3%.
Allows post-intervention deviations from parallel trends of up to 70% of the maximum pre-intervention deviation (M = 0.7) vs 50% without matching (Rambachan & Roth, 2023)
Let's wrap it up

One drawback of the previous method is that it can overstate (or understate) the robustness of findings if the point estimate is biased.
Matching can reduce the overall bias of the point estimate
Match on covariates or outcomes? Levels or trends?
Propensity score matching? Optimal matching? etc.
This paper:
Match on time-invariant covariates that could make groups follow different trends over time.
Use of Mixed-Integer Programming (MIP) Matching (Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020):
Balance covariates directly
Yield largest matched sample under balancing constraints (cardinality matching)
Works fast with large samples
