A Difference-in-Differences Approach
using Mixed-Integer Programming Matching
Magdalena Bennett
SDS Seminar Series, UT Austin
Oct 16, 2020

Diff-in-Diff as an identification strategy


Very popular for policy evaluation

Source: Google Scholar


What about parallel trends?

Can matching work to solve this?
- It's complicated (?) (Zeldow & Hatfield, 2019; Lindner & McConnell, 2018; Daw & Hatfield, 2018 (x2); Ryan, 2018; Ryan et al., 2018)
Most work has focused on matching outcomes


This paper

Identify contexts when matching can recover causal estimates under violations in the parallel trend assumption.
Use mixed-integer programming matching (MIP) to balance covariates directly.
Matching for panel and repeated cross-sectional data.


Simulations:
Different DGP scenarios

Application:
School segregation & vouchers


Let's get started


DD Setup

Let $Y_{i}^{z}(t)$ be the potential outcome for unit $i$ in period $t$ under treatment $z$.
Intervention implemented in $T_{0}$ $\to$ No units are treated in $t \leq T_{0}$
Difference-in-Differences (DD) focuses on the ATT for $t > T_{0}$: $ATT = E[Y_{i}^{1}(t) - Y_{i}^{0}(t) | Z = 1]$
Assumptions for DD:
- Parallel-trend assumption (PTA): $E[Y_{i}^{0}(1) - Y_{i}^{0}(0) | Z = 1] = E[Y_{i}^{0}(1) - Y_{i}^{0}(0) | Z = 0]$
- Common shocks


DD Setup (cont.)

Under these assumptions:
$$\hat{\tau}^{DD} = \overbrace{E[Y(1) | Z = 1] - E[Y(1) | Z = 0]}^{\Delta_{post}} - \underbrace{\left(E[Y(0) | Z = 1] - E[Y(0) | Z = 0]\right)}_{\Delta_{pre}}$$
- Where $t = 0$ and $t = 1$ are the pre- and post-intervention periods, respectively.
- $Y (t) = Y^{1} (t) \cdot Z + (1 - Z) \cdot Y^{0} (t)$ is the observed outcome.
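As a minimal sketch of the two-period estimator, the four group-by-period means can be combined directly. The function name `dd_estimate` and the toy numbers are illustrative assumptions, not part of the talk.

```python
from statistics import mean


def dd_estimate(y, z, t):
    """Two-period DD: (post gap between groups) minus (pre gap between groups).

    y: observed outcomes, z: treatment-group indicator (0/1),
    t: period indicator (0 = pre, 1 = post). All are equal-length lists.
    """
    def cell_mean(group, period):
        return mean(yi for yi, zi, ti in zip(y, z, t)
                    if zi == group and ti == period)

    delta_post = cell_mean(1, 1) - cell_mean(0, 1)
    delta_pre = cell_mean(1, 0) - cell_mean(0, 0)
    return delta_post - delta_pre


# Made-up data: treated units gain 3 on average, controls gain 1.
y = [2.0, 4.0, 5.0, 7.0, 1.0, 3.0, 2.0, 4.0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
t = [0, 0, 1, 1, 0, 0, 1, 1]
print(dd_estimate(y, z, t))  # prints 2.0
```

Here the post-period gap is 3 and the pre-period gap is 1, so the estimate is their difference.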


Violations to the PTA

Under PTA, $g_{1}(t) = g_{0}(t) + h(t)$, where:
- $g_{z}(t) = E[Y_{i}^{0}(t) | Z = z, T = t]$
- $h(t) = \alpha$
Bias in a DD setting depends on the structure of $h(t)$.
Confounding in DD affects trends, not levels.
Contextual knowledge is important!
- Do groups come from different populations?
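
To see why the structure of $h(t)$ matters, the two-period DD contrast can be rewritten in terms of $g_{z}(t)$. This is a sketch under two added assumptions: no unit is treated at $t = 0$, so $Y(0) = Y^{0}(0)$ for everyone, and $ATT$ denotes the effect in the post period $t = 1$.

$$
\begin{aligned}
\tau^{DD} &= \left(E[Y(1) | Z = 1] - E[Y(1) | Z = 0]\right) - \left(E[Y(0) | Z = 1] - E[Y(0) | Z = 0]\right) \\
&= \left(ATT + g_{1}(1) - g_{0}(1)\right) - \left(g_{1}(0) - g_{0}(0)\right) \\
&= ATT + h(1) - h(0)
\end{aligned}
$$

When $h(t) = \alpha$ the last two terms cancel and DD recovers the ATT; otherwise the bias equals $h(1) - h(0)$.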


How do we match?

Match covariates or outcomes? Levels or trends?
Use of MIP Matching (Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020):
- Balance covariates directly
- Yield largest matched sample under balancing constraints
- Use of template matching to match multiple groups
- Works with large samples
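
The idea of "largest matched sample under balancing constraints" can be illustrated on a toy problem. The sketch below is not the MIP formulation from the cited papers: it replaces the solver with brute-force enumeration, uses a single covariate, and takes an absolute tolerance `tol` rather than one in SD units. The function name `largest_balanced_match` and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def largest_balanced_match(x_treated, x_control, tol):
    """Brute-force stand-in for a MIP solver (tiny inputs only).

    Returns (treated_idx, control_idx): the largest equal-sized subsets whose
    covariate means differ by at most `tol`, or ((), ()) if none exists.
    """
    max_size = min(len(x_treated), len(x_control))
    for size in range(max_size, 0, -1):  # try the largest sample first
        for t_idx in combinations(range(len(x_treated)), size):
            t_mean = mean(x_treated[i] for i in t_idx)
            for c_idx in combinations(range(len(x_control)), size):
                c_mean = mean(x_control[j] for j in c_idx)
                if abs(t_mean - c_mean) <= tol:  # mean-balance constraint
                    return t_idx, c_idx
    return (), ()


x_treated = [1.0, 2.0, 3.0]
x_control = [0.0, 1.1, 2.9, 5.0]
t_idx, c_idx = largest_balanced_match(x_treated, x_control, tol=0.1)
print(t_idx, c_idx)  # prints (0, 1, 2) (0, 1, 3)
```

All three treated units are kept because some set of three controls has a mean within the tolerance, even though individual pairs are not close. That is the sense in which the constraint targets covariate balance directly rather than pairwise distance. Enumeration grows combinatorially, which is why a solver is needed beyond toy sizes.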


Panel or repeated cross-sections?

Panel data: Straightforward
Repeated cross-section data: Representative template matching

diagram


Simulations


Different scenarios

S1: Time-invariant covariate effect

S2: Time-varying covariate effect

S3: Treatment-independent covariate

S4: Parallel evolution

S5: Evolution differs by group

S6: Evolution diverges in post

Following Zeldow & Hatfield (2019)


Different ways to control

Model	Pseudo `R` code
Simple	`lm(y ~ a*p + t)`
Covariate Adjusted (CA)	`lm(y ~ a*p + t + x)`
Time-Varying Adjusted (TVA)	`lm(y ~ a*p + t*x)`
Match on pre-treat outcomes	`lm(y ~ a*p + t, data=out.match)`
Match on pre-treat 1st diff	`lm(y ~ a*p + t, data=out.lag.match)`
Match on pre-treat cov (PS)	`lm(y ~ a*p + t, data=cov.match)`
Match on pre-treat cov (MIP)	`Event study (data=cov.match.mip)`
Match on all cov (MIP)	`Event study (data=cov.match.mip.all)`

Following Zeldow & Hatfield (2019)
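
The MIP rows estimate effects with an event study rather than a single post-period coefficient. As a rough sketch of that idea, the treated-control gap in each period can be compared with the gap in a reference period. The helper `event_study`, the row layout, and the data are made up for illustration; the talk does not show its implementation.

```python
from statistics import mean


def event_study(panel, ref_period):
    """panel: list of (unit_id, z, t, y) rows. Returns {t: estimate}.

    Each estimate is the treated-control gap in mean outcomes at period t
    minus the same gap at the reference period.
    """
    periods = sorted({t for _, _, t, _ in panel})

    def gap(period):
        treated = [y for _, z, t, y in panel if z == 1 and t == period]
        control = [y for _, z, t, y in panel if z == 0 and t == period]
        return mean(treated) - mean(control)

    ref_gap = gap(ref_period)
    return {t: gap(t) - ref_gap for t in periods}


# Made-up balanced panel: 4 periods, intervention after period 2, effect of 2.
panel = []
for unit, z in [(1, 1), (2, 1), (3, 0), (4, 0)]:
    for t in range(1, 5):
        effect = 2.0 if (z == 1 and t > 2) else 0.0
        panel.append((unit, z, t, float(t) + z + effect))

estimates = event_study(panel, ref_period=2)
print(estimates)  # prints {1: 0.0, 2: 0.0, 3: 2.0, 4: 2.0}
```

Pre-period estimates near zero are what parallel pre-trends would look like; the post-period estimates trace the effect over time instead of averaging it into one number.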

Time-invariant Covariates

S1: Time-invariant covariate effect

$X_{i} \overset{\text{ind}}{\sim} N(m(z_{i}), v(z_{i}))$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t), 1)$

S2: Time-varying covariate effect

$X_{i} \overset{\text{ind}}{\sim} N(m(z_{i}), v(z_{i}))$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S3: Treatment-independent covariate

$X_{i} \overset{\text{ind}}{\sim} N(1, 1)$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$
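
The difference between S1 and S2 can be seen in a small two-period simulation. The slides leave $m$, $v$, $u_{i}$, $f$ and $g$ unspecified, so every functional form below is an assumption chosen for illustration: $m(z) = z$, $v(z) = 1$, $u_{i} \sim N(0, 1)$, $f(t) = t$, $g(x, t) = 0.5 \cdot x \cdot t$, and a true effect of 1.

```python
import random
from statistics import mean


def simulate_dd(n_per_group, time_varying, seed=0):
    """Simulate a two-period panel loosely following S1/S2; return the DD estimate.

    Assumed (not from the slides): m(z) = z, v(z) = 1, u_i ~ N(0, 1),
    f(t) = t, g(x, t) = 0.5 * x * t, and a true effect of 1 in the post period.
    """
    rng = random.Random(seed)
    change = {0: [], 1: []}  # per-unit post-minus-pre outcome, by group
    for z in (0, 1):
        for _ in range(n_per_group):
            x = rng.gauss(z, 1.0)    # covariate mean depends on the group
            u = rng.gauss(0.0, 1.0)  # unit effect, cancels in the difference
            y = []
            for t in (0, 1):
                treat = 1.0 if (z == 1 and t == 1) else 0.0
                mu = 1.0 + z + treat + u + x + t
                if time_varying:
                    mu += 0.5 * x * t  # S2: covariate effect changes over time
                y.append(rng.gauss(mu, 1.0))
            change[z].append(y[1] - y[0])
    # In a balanced panel, DD equals the difference in mean changes.
    return mean(change[1]) - mean(change[0])


s1 = simulate_dd(4000, time_varying=False)
s2 = simulate_dd(4000, time_varying=True)
print(round(s1, 2), round(s2, 2))
```

Under these assumptions the S1 estimate should land near the true effect of 1, while the S2 estimate should land near 1.5: the covariate differs by group and its effect grows over time, which adds $0.5 \cdot (E[X | Z = 1] - E[X | Z = 0]) = 0.5$ to the contrast.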

Time-varying Covariates

S4: Parallel evolution

$X_{it} = x_{(t-1)i} + m_{1}(t) \cdot z$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S5: Evolution differs by group

$X_{it} = x_{(t-1)i} + m_{2}(z_{i}, t) \cdot z$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S6: Evolution diverges in post

$X_{it} = x_{(t-1)i} + m_{1}(t) \cdot z - m_{3}(z_{i}, t)$
$Y_{i}(t) \overset{\text{ind}}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$


Covariate evolution: Time-invariant

diagram


Covariate evolution: Time-varying

diagram


Parameters:

Parameter	Value
Number of obs (N)	1,000
`Pr(Z=1)`	0.5
Time periods (T)	10
Last pre-intervention period (T_0)	5
Matching PS	Nearest neighbor
MIP Matching tolerance	0.05 SD
Number of simulations	1,000

Estimate compared to sample ATT (different for matching)
When matching with post-treat covariates $\to$ compared with direct effect $\tau$

Results: Time-constant effects


Results: Time-varying effects


Other simulations

Test regression to the mean under no effect:
- Vary autocorrelation of $X_{i} (t)$ (low vs. high)
- $X_{0} (t)$ and $X_{1} (t)$ come from the same or different distribution.
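
A stylized version of this test shows the mechanism. In the sketch below there is no treatment effect, the two groups come from distributions with different means, and the period-specific noise is independent over time (the low-autocorrelation extreme). Matching each treated unit to its nearest control on the pre-period outcome then produces a spurious effect. The setup, the function name `rtm_demo`, and all distributional choices are illustrative assumptions, not the talk's simulation design.

```python
import bisect
import random
from statistics import mean


def rtm_demo(n=2000, seed=1):
    """No true effect; groups differ in level; noise is independent over time.

    Returns (unmatched DD, DD after nearest-neighbor matching on the pre outcome).
    """
    rng = random.Random(seed)
    # (pre, post) outcomes: treated centered at 1, controls centered at 0.
    treated = [(1 + rng.gauss(0, 1), 1 + rng.gauss(0, 1)) for _ in range(n)]
    control = sorted((rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n))
    control_pre = [pre for pre, _ in control]

    def nearest_control(pre):
        # Only the two neighbors around the insertion point can be nearest.
        k = bisect.bisect_left(control_pre, pre)
        candidates = control[max(k - 1, 0):k + 1]
        return min(candidates, key=lambda c: abs(c[0] - pre))

    def dd(t_rows, c_rows):
        post_gap = mean(r[1] for r in t_rows) - mean(r[1] for r in c_rows)
        pre_gap = mean(r[0] for r in t_rows) - mean(r[0] for r in c_rows)
        return post_gap - pre_gap

    matched = [nearest_control(pre) for pre, _ in treated]  # with replacement
    return dd(treated, control), dd(treated, matched)


unmatched, matched = rtm_demo()
print(round(unmatched, 2), round(matched, 2))
```

Under these assumptions the unmatched DD should be close to zero, while the matched DD should be close to 1. Matched controls are selected for unusually high pre-period noise, so their post-period outcomes fall back toward the control mean while treated outcomes stay at theirs.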


Application

Preferential Voucher Scheme in Chile

Universal flat voucher scheme $\xrightarrow{2008}$ Universal + preferential voucher scheme
Preferential voucher scheme (SEP, Subvención Escolar Preferencial):
- Targeted to bottom 40% of vulnerable students
- Additional 50% of voucher per student
- Additional money for concentration of SEP students.

Students:
- Verify SEP status
- Attend a SEP school

Schools:
- Opt into the policy
- No selection, no fees
- Resources ~ performance


Impact of the SEP policy

Positive impact on test scores for lower-income students (Aguirre, 2019; Nielson, 2016)
Design could have increased socioeconomic segregation
- Incentives for concentration of SEP students
Key decision variables: Performance, current SEP students, competition, add-on fees.
Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools:
- Only for private-subsidized schools
- Matching between 2005-2007 $\to$ Effect estimated for 2008-2011
- Outcome: Average students' household income


Before Matching

diagram

No (pre) parallel trend
Covariates evolve differently in the pre-intervention period


[Pre] parallel trends

diagram


After Matching

diagram

MIP Matching:
- Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees
- Fine balance: Test scores, monthly average voucher.
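
A mean-balance constraint of 0.05 SD can be checked after matching with a standardized difference in means. The sketch below assumes the treated-group standard deviation as the denominator, which the slides do not specify; the helper names and the numbers are hypothetical and are not the application's data.

```python
from statistics import mean, stdev


def standardized_diff(x_treated, x_control):
    """Absolute difference in means, in units of the treated-group SD."""
    return abs(mean(x_treated) - mean(x_control)) / stdev(x_treated)


def check_balance(covariates, tol=0.05):
    """covariates: {name: (treated_values, control_values)} -> {name: bool}."""
    return {name: standardized_diff(t, c) <= tol
            for name, (t, c) in covariates.items()}


# Made-up values for two of the covariates named above.
covariates = {
    "enrollment": ([100, 200, 300], [110, 200, 296]),
    "rural": ([0, 1, 1, 0], [0, 0, 0, 1]),
}
balance = check_balance(covariates)
print(balance)  # prints {'enrollment': True, 'rural': False}
```

In this toy example enrollment differs by 0.02 SD and passes, while the share of rural schools differs by roughly 0.43 SD and fails.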
6% increase in the income gap between SEP and non-SEP schools


Let's wrap it up


Conclusions

Matching can be an important tool to address violations in PTA.
It is important to consider whether groups come from the same or different populations.
Serial correlation also plays an important role: Don't match on random noise.
Adopt flexibility when estimating effects (event study)

Match well and match smart!
