
class: inverse, center, middle

<h1 class="title-own">Difference-in-Differences using Mixed-Integer Programming Matching Approach</h1>

.small[Magdalena Bennett&nbsp;&nbsp;&nbsp; *McCombs School of Business, The University of Texas at Austin*&nbsp;&nbsp;&nbsp;]

.small[AEFP 50th Conference, Washington DC March 13th, 2025]

---
# Diff-in-Diff as an identification strategy

.center[
![](images/dd-1.svg)
]

---
# Parallel trend assumption (PTA)

.center[
![](images/dd2-1.svg)
]

---
# Estimate Average Treatment Effect on the Treated (ATT)

.center[
![](images/dd3-1.svg)
]

---
# But what if the PTA doesn't hold?

.pull-left[
.center[
![](images/po1-1.svg)
]
]

---
# But what if the PTA doesn't hold?

.box-6trans[We can potentially remove [part of] the bias by matching on `$X^s_{it} = X_i$`]

.pull-left[
.center[
![](images/po1-1.svg)
]
]

.pull-right[
.center[
![](images/po3-1.svg)
]
]

---

# This paper

- Identify contexts when matching can recover causal estimates under **.darkorange[certain violations of the parallel trend assumption]**.

- Overall bias reduction and increase in robustness for sensitivity analysis.

- Use **.darkorange[mixed-integer programming matching (MIP)]** to balance covariates directly.

--

.pull-left[
.box-6trans[**Simulations:** Different DGP scenarios]
]

.pull-right[
.box-6trans[**Application:** School segregation & vouchers]
]

---

background-position: 50% 50%
class: left, bottom, inverse
.big[
Let's set up the problem
 
 
]

---

# DD Setup

- Let `$Y_{it}(z)$` be the potential outcome for unit `$i$` in period `$t$` under treatment `$z$`.

- Intervention implemented in `$T_0$` `$\rightarrow$` No units are treated in `$t\leq T_0$`
--

- Difference-in-Differences (DD) focuses on ATT for `$t>T_0$`:

`$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$`

---

# DD Setup

- Let `$Y_{it}(z)$` be the potential outcome for unit `$i$` in period `$t$` under treatment `$z$`.

- Intervention implemented in `$T_0$` `$\rightarrow$` No units are treated in `$t\leq T_0$`

- Difference-in-Differences (DD) focuses on ATT for `$t>T_0$`:

`$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$`

- Under the PTA:

`$$\begin{align}
\hat{\tau}^{DD} = &\color{#FFC857}{\overbrace{\color{black}{E[Y_{i1}|Z=1] - E[Y_{i1}|Z=0]}}^{\color{#FFC857}{\Delta_{post}}}} - \\
&\color{#CBB3BF}{\underbrace{\color{black}{(E[Y_{i0}|Z=1] - E[Y_{i0}|Z=0])}}_{\color{#CBB3BF}{\Delta_{pre}}}}
\end{align}$$`
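
---
# DD estimator: a numerical sketch

.small[
A minimal sketch of the estimator above, assuming we already have the four group means. The numbers are made up for illustration, and `dd_estimate` is a hypothetical helper name, not part of any package.
]

```python
def dd_estimate(y_treat_post, y_ctrl_post, y_treat_pre, y_ctrl_pre):
    """Return (delta_post, delta_pre, tau_dd) from four group means."""
    delta_post = y_treat_post - y_ctrl_post  # treated-control gap after T_0
    delta_pre = y_treat_pre - y_ctrl_pre     # treated-control gap before T_0
    return delta_post, delta_pre, delta_post - delta_pre


# Illustrative (made-up) group means
delta_post, delta_pre, tau_dd = dd_estimate(
    y_treat_post=14.0, y_ctrl_post=11.0, y_treat_pre=10.0, y_ctrl_pre=9.0
)
print(delta_post, delta_pre, tau_dd)
```

.small[
Under the PTA, the pre-period gap is what the post-period gap would have been without treatment, so subtracting it leaves the ATT.
]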

---
# Bias in a DD setting

Bias can be introduced to DD in different ways:

1) **.darkorange[Time-invariant covariates with time-varying effects]**: *Obs. Bias*
  - e.g. Effect of gender on salaries.
  
--

2) **.darkorange[Differential time-varying effects]**: *Obs. Diff. Bias*
  - e.g. Effect of race on salaries evolves differently over time by group.

3) **.darkorange[Observed or unobserved time-varying covariates]**: *Unobs. Bias*
  - e.g. Test scores

---
# If the PTA holds...

`$$\begin{array}{rcc}
\overbrace{(\bar{\gamma}_1(X^1,t') - \bar{\gamma}_1(X^0,t')) - (\bar{\gamma}_1(X^1,t) - \bar{\gamma}_1(X^0,t))}^{Obs. Bias} +& \\ \underbrace{(\bar{\gamma}_2(X^1,t') -  
\bar{\gamma}_2(X^1,t))}_{Obs. Diff. Bias} + \underbrace{(\lambda_{t'1}-\lambda_{t'0}) - (\lambda_{t1} - \lambda_{t0})}_{Unobs. Bias}&= 0 \\
\end{array}$$`

--
.small[
**.darkorange[One of the two]** conditions needs to hold:

1) No effect or constant effect of `$X$` on `$Y$` over time: `$\mathbb{E}[\gamma_1(X,t)] = \mathbb{E}[\gamma_1(X)]$`

2) Equal distribution of observed covariates between groups: `$X_i|Z=1 \overset{d}{=} X_i|Z=0$`
]
--

.small[
**.darkorange[in addition to]**:

3) No differential time effect of `$X$` on `$Y$` by treatment group: `$\mathbb{E}[\gamma_2(X,t)] = 0$`

4) No unobserved time-varying effects: `$\lambda_{t1} = \lambda_{t0}$`
]
--

.small[
.pull-left[
.box-6trans[**Cond. 2** can hold through **matching**]
]]
--
.small[
.pull-right[
.box-6trans[**Cond. 3 and 4** can be tested with **sensitivity analysis**]
]
]
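---
# Conditions 1 and 2: a numerical sketch

.small[
A minimal sketch of the *Obs. Bias* term, under the assumption (not taken from the paper) that the covariate effect is linear, `$\gamma_1(X,t) = \beta_t X$`, so only group means of `$X$` matter. `dd_bias` and all numbers are hypothetical.
]

```python
def dd_bias(beta_pre, beta_post, mean_x_treated, mean_x_control):
    """Obs. Bias when gamma_1(X, t) = beta_t * X (an assumed linear form)."""
    gap_post = beta_post * mean_x_treated - beta_post * mean_x_control
    gap_pre = beta_pre * mean_x_treated - beta_pre * mean_x_control
    return gap_post - gap_pre


# Time-varying effect of X and unbalanced groups: bias is non-zero
bias_unbalanced = dd_bias(1.0, 1.5, mean_x_treated=2.0, mean_x_control=1.0)

# Cond. 1: the effect of X is constant over time
bias_constant = dd_bias(1.0, 1.0, mean_x_treated=2.0, mean_x_control=1.0)

# Cond. 2: groups are balanced on X (what matching aims for)
bias_balanced = dd_bias(1.0, 1.5, mean_x_treated=2.0, mean_x_control=2.0)

print(bias_unbalanced, bias_constant, bias_balanced)
```

.small[
With a non-linear `$\gamma_1$`, equal means are not enough: Cond. 2 is stated in terms of the full distribution of `$X$`.
]
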
---
# Sensitivity analysis for Diff-in-Diff

- In an event study `$\rightarrow$` null effects prior to the intervention:

.center[
![](images/dd_placebo3-1.svg)
]

---
# Honest approach to test pretrends

- One main issue with the previous test `$\rightarrow$` **.darkorange[Underpowered]**

- Rambachan & Roth (2023) propose **.darkorange[sensitivity bounds]** that allow for pre-trend violations:

  - E.g. Violations in the post-intervention period can be _at most_ `$M$` times the max violation in the pre-intervention period.

.center[
![:scale 80%](images/rambachan_roth2023.png)]
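
---
# Relative-magnitude bounds: a simplified sketch

.small[
A rough illustration of the restriction stated in the example above, for a single post-intervention estimate. This is a simplification and not the Rambachan & Roth (2023) procedure: it ignores sampling uncertainty entirely. Function names and numbers are hypothetical.
]

```python
def relative_magnitude_bounds(estimate, pre_period_violations, m):
    """Bounds on the effect if post-period bias is at most m * max |pre violation|."""
    max_pre = max(abs(v) for v in pre_period_violations)
    return estimate - m * max_pre, estimate + m * max_pre


def breakdown_m(estimate, pre_period_violations):
    """Smallest m at which the bounds above include zero."""
    max_pre = max(abs(v) for v in pre_period_violations)
    return abs(estimate) / max_pre


# Made-up numbers for illustration only
estimate = 0.4
pre_violations = [0.1, -0.2, 0.05]

lower, upper = relative_magnitude_bounds(estimate, pre_violations, m=1.0)
m_star = breakdown_m(estimate, pre_violations)
print(lower, upper, m_star)
```

.small[
In this simplified version the breakdown value depends directly on the size of the point estimate, so a biased estimate shifts it.
]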

---
background-position: 50% 50%
class: left, bottom, inverse
.big[
Simulations
 
 
]

---
# Different scenarios

For linear and quadratic functions:

.box-1trans[S1: No interaction between X and t]

.box-2trans[S2: Equal interaction between X and t]

.box-3trans[S3: Differential interaction between X and t]

.box-4trans[S4: S3 + Bias cancellation]

--

- For all scenarios, differential distribution of covariates `$X$` between groups

---
# Parameters

.center[
Parameter                                  | Value
-------------------------------------------|----------------------------------
Number of observations (N)                 | 1,000
`Pr(Z=1)`                                  | 0.5
Time periods (T)                           | 8
Last pre-intervention period (`$T_0$`)     | 4
Propensity score (PS) matching             | Nearest neighbor (using calipers)
MIP matching tolerance                     | 0.01 SD
Number of simulations                      | 1,000
]

- Estimates are compared to the sample ATT (_which can differ for the matched sample_)
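
---
# One simulated dataset: a sketch

.small[
A minimal sketch of a single simulation draw using `N`, `T`, `T_0`, and `Pr(Z=1)` from the table. The outcome model, effect size, and covariate shift are assumptions for illustration (loosely in the spirit of S2), not the paper's DGPs.
]

```python
import random
from statistics import mean

N, T, T0 = 1000, 8, 4      # values from the parameter table
P_TREAT = 0.5              # Pr(Z=1) from the parameter table
TAU, BETA_XT = 1.0, 0.5    # assumed effect size and X-by-t interaction


def simulate(seed=0):
    """Draw one panel: a list of (z, x, [y_1, ..., y_T]) per unit."""
    rng = random.Random(seed)
    units = []
    for _ in range(N):
        z = 1 if rng.random() < P_TREAT else 0
        x = rng.gauss(1.0 if z else 0.0, 1.0)  # assumed covariate shift by group
        y = [
            x + BETA_XT * x * t / T + TAU * z * (t > T0) + rng.gauss(0.0, 1.0)
            for t in range(1, T + 1)
        ]
        units.append((z, x, y))
    return units


def event_study(units):
    """Treated-control gap in each period, relative to the gap in period T0."""
    def gap(t):
        treated = mean(y[t - 1] for z, _, y in units if z == 1)
        control = mean(y[t - 1] for z, _, y in units if z == 0)
        return treated - control

    base = gap(T0)
    return {t: gap(t) - base for t in range(1, T + 1)}


units = simulate()
estimates = event_study(units)
for t, est in estimates.items():
    print(t, round(est, 3))
```

.small[
Because `$X$` is unbalanced and its effect changes with `$t$`, the unmatched estimates drift away from `TAU` in this sketch.
]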

---
# S1 - No interaction between X and t

.center[
![Event study estimates by time period (wrt T=4) for no interaction between X and t](images/effect_beta_iter2_constant_effect1.svg)]

---
# S2 - Equal interaction between X and t by treatment

.center[
![Event study estimates by time period (wrt T=4) for equal interaction between X and t](images/effect_beta_iter6_constant_effect1.svg)]

---
# S3 - Differential interaction between X and t by treatment

.center[
![Event study estimates by time period (wrt T=4) for differential interaction between X and t](images/effect_beta_iter10_constant_effect1.svg)]

---
# Why is this bias reduction important?

- Example of S2 (Quadratic) with no true effect:

.center[
![Sensitivity bounds on relative magnitudes for Scenario 2 (quadratic) - No effect](images/effect_beta_iter6_quad1_constant_effect0.svg)
]

---
# Why is this bias reduction important?

- Even under modest bias, we would incorrectly reject the null 20% of the time.

.center[
![Rejection rate of null hypothesis for different values of `$\beta_{xt}$`](images/coverage_beta_quad1_original_PE_ppt.svg)]

---
# Why is this bias reduction important?

- Sensitivity analysis results are skewed by the magnitude of the bias.

.center[
![](images/coverage_beta_quad1_M_ppt.svg)]

---
# S4: Bias cancellation

.center[
![](images/effect_beta_iter2_constant_break.svg)
]

---
background-position: 50% 50%
class: left, bottom, inverse
.big[
Application
 
 
]
---
# Preferential Voucher Scheme in Chile

- Universal **.darkorange[flat voucher]** scheme `$\stackrel{\mathbf{2008}}{\mathbf{\longrightarrow}}$` Universal + **.darkorange[preferential voucher]** scheme

- Preferential voucher scheme:
  - Targeted to bottom 40% of vulnerable students

  - Additional 50% of voucher per student

  - Additional money for concentration of SEP students.

.pull-left[
.center[
.box-6trans[**Students:** - Verify SEP status - Attend a SEP school]
]
]

.pull-right[
.center[
.box-6trans[**Schools:** - Opt into the policy - No selection, no fees - Resources ~ performance]
]
]

---
# Before matching: Household income

.pull-left[
![](images/evolution_hh_income_all_ppt.svg)
]

.pull-right[
![](images/hh_income_event_all_ppt.svg)
]

---
# Before matching: Average SIMCE

.pull-left[
![](images/evolution_simce_all_ppt.svg)
]

.pull-right[
![](images/avg_simce_event_all_ppt.svg)
]

---
# Matching + DD

- **.darkorange[Prior to matching]**: No parallel pre-trend

- **.darkorange[Different types of schools]**:

  - Schools that charge high co-payment fees.
  
  - Schools with a low number of SEP students enrolled.

- **.darkorange[MIP Matching]** using constant or "sticky" covariates:

  - Mean balance (0.025 SD): Enrollment, average yearly subsidy, number of voucher schools in county, charges add-on fees
  
  - Exact balance: Geographic province
  
---
# Groups are balanced in specific characteristics

.center[
![:scale 65%](images/loveplot.svg)]

---
# Matching in 16 out of 53 provinces

.center[
![:scale 37%](images/map2b.png)]

---
# After matching: Household income

.pull-left[
![](images/evolution_hh_income_match_ppt.svg)
]

.pull-right[
![](images/hh_income_event_matched_ppt.svg)
]

---
# After matching: Average SIMCE

.pull-left[
![](images/evolution_simce_match_ppt.svg)
]

.pull-right[
![](images/avg_simce_event_matched_ppt.svg)
]

---
# Results

- **.darkorange[Matched schools]**:

  - More vulnerable and lower test scores than the population mean.
--

- **.darkorange[9pp increase in the income gap]** between SEP and non-SEP schools in matched DD:

  - SEP schools attracted even more vulnerable students.
  
  - Non-SEP schools increased their average family income.
--

- **.darkorange[No evidence of an increase in SIMCE scores]**:
  
  - Could be a longer-term outcome. 
--

- Findings on segregation are **.darkorange[moderately robust to hidden bias]** (Keele et al., 2019):
  
  - `$\Gamma_c = 1.76$` `$\rightarrow$` Unobserved confounder would have to change the probability of assignment from 50% vs 50% to 32.7% vs 67.3%.
  
  - Allows up to 70% of the maximum deviation in the pre-intervention period (*M = 0.7*) vs 50% without matching (Rambachan & Roth, 2023)
  
---
# Potential reasons?

- The probability of becoming SEP in 2009 **.darkorange[jumps discontinuously at 60%]** of SEP student concentration in 2008 (4.7 pp; SE = 0.024)

.center[
![:scale 65%](images/rd_sep_concentration_ppt.svg)]

---
background-position: 50% 50%
class: left, bottom, inverse
.big[
Let's wrap it up
 
 
]
---
# Conclusions and Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Bias reduction]** is very important for sensitivity analysis.

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.

- Next steps: Partial identification using time-varying covariates
]

.pull-right[
.center[
![](https://media.giphy.com/media/drwxYI2fxqQGqRZ9Pe/giphy.gif)]
]

---
# Honest approach to test pretrends

- One drawback of the previous method is that it can **.darkorange[overstate]** (or understate) the robustness of findings if the point estimate is biased.

- Honest CIs depend on the **.darkorange[magnitude of the point estimate]** as well as the **.darkorange[pre-trend violations]**.
 
--

- Matching can **.darkorange[reduce the overall bias]** of the point estimate

--
.center[
![](images/M_plot_beta6_ppt.svg)
]

---
# How do we match?

- Match on covariates or outcomes? Levels or trends?

- Propensity score matching? Optimal matching? etc.

This paper:

- **.darkorange[Match on time-invariant covariates]** that could make groups behave differently.
  
  - Use the distribution of covariates to match on a template.

- Use of **.darkorange[Mixed-Integer Programming (MIP) Matching]** .small[(Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020)]:

  - Balances covariates directly
  
  - Yields the largest matched sample under balancing constraints (cardinality matching)
  
  - Works fast with large samples
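
---
# Cardinality matching: a toy sketch

.small[
A toy illustration of the cardinality-matching idea: keep the largest equal-sized treated and control subsets whose covariate means differ by at most a tolerance in SD units. It uses exhaustive search instead of a MIP solver, so it only works for tiny inputs. The function name, data, and the 0.1 SD tolerance are hypothetical.
]

```python
from itertools import combinations
from statistics import mean, stdev


def cardinality_match(treated, controls, tol_sd):
    """Largest equal-sized treated/control subsets satisfying mean balance.

    Exhaustive search over subsets; a MIP solver would replace these loops
    in any realistic application.
    """
    max_diff = tol_sd * stdev(treated + controls)  # tolerance in pooled-SD units
    for m in range(min(len(treated), len(controls)), 0, -1):
        for t_sub in combinations(treated, m):
            for c_sub in combinations(controls, m):
                if abs(mean(t_sub) - mean(c_sub)) <= max_diff:
                    return t_sub, c_sub
    return (), ()


# Made-up covariate values; the treated unit with x = 9.0 has no comparable controls
treated_x = [2.0, 3.0, 4.0, 9.0]
control_x = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0]

matched_treated, matched_controls = cardinality_match(treated_x, control_x, tol_sd=0.1)
print(matched_treated, matched_controls)
```

.small[
The search starts from the largest possible sample and shrinks it only when balance cannot be met, which is why the outlying treated unit is dropped here.
]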

---
# Data Generating Processes

.center[
![:scale 100%](images/table_sim.png)
]

---
# SEP adoption over time

.center[
![](images/enrollment_sep_year_ppt.svg)
]