A Difference-in-Differences Approach
using Mixed-Integer Programming Matching

Magdalena Bennett
McCombs School of Business, UT Austin

ASA Austin Chapter Meeting
June 09, 2021

Diff-in-Diff as an identification strategy

Very popular for policy evaluation

Source: Google Scholar

What about parallel trends?

Bounds on treatment effects (Rambachan & Roth, 2020)
Find sub-groups that potentially follow the parallel-trend assumption (PTA), e.g. similar units in treatment and control
- Similar to synthetic control intuition.
Can matching help?
- It's complicated (?) (Zeldow & Hatfield, 2019; Lindner & McConnell, 2018; Daw & Hatfield, 2018 (x2); Ryan, 2018; Ryan et al., 2018)

This paper

Identify contexts where matching can recover causal estimates under violations of the parallel-trend assumption.
- Partial identification in some cases.
Use mixed-integer programming (MIP) matching to balance covariates directly.

Simulations:
Different DGP scenarios

Application:
School segregation & vouchers

Let's get started

DD Setup

Let $Y_{it}(z)$ be the potential outcome for unit $i$ in period $t$ under treatment $z$.
Intervention implemented in $T_{0}$ $\to$ No units are treated in $t \leq T_{0}$
Difference-in-Differences (DD) focuses on the ATT for $t > T_{0}$:

$ATT = E[Y_{it}(1) - Y_{it}(0) \mid Z = 1]$

Assumptions for DD:
- Parallel-trend assumption (PTA):
$E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 0]$
- Common shocks

DD Setup (cont.)

Under these assumptions:

$\hat{\tau}^{DD} = \overbrace{E[Y(1) \mid Z = 1] - E[Y(1) \mid Z = 0]}^{\Delta_{post}} - \underbrace{\left(E[Y(0) \mid Z = 1] - E[Y(0) \mid Z = 0]\right)}_{\Delta_{pre}}$

- Where $t = 0$ and $t = 1$ are the pre- and post-intervention periods, respectively.
- $Y(t) = Y^{1}(t) \cdot Z + (1 - Z) \cdot Y^{0}(t)$ is the observed outcome in period $t$, with the superscript indicating treatment status.
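
As a quick numeric illustration of the two-period estimator, the sketch below computes the post-period gap minus the pre-period gap from group means. The function name `dd_estimate` and the toy outcomes are hypothetical and only serve to show the arithmetic.

```python
from statistics import mean

def dd_estimate(y_pre_t, y_post_t, y_pre_c, y_post_c):
    """Difference-in-differences from observed outcomes by group and period."""
    delta_post = mean(y_post_t) - mean(y_post_c)  # treated-control gap at t = 1
    delta_pre = mean(y_pre_t) - mean(y_pre_c)     # treated-control gap at t = 0
    return delta_post - delta_pre

# Hypothetical toy outcomes (not from the talk).
treated_pre, treated_post = [4.0, 6.0], [8.0, 10.0]
control_pre, control_post = [1.0, 3.0], [3.0, 5.0]

tau_dd = dd_estimate(treated_pre, treated_post, control_pre, control_post)
print(tau_dd)  # 2.0
```

The treated group starts 3 units above the control group and ends 5 units above it, so the estimate is 2.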

Violations to the PTA

Under PTA, $g_{1}(t) = g_{0}(t) + h(t)$, where:
- $g_{z}(t) = E[Y_{it}(0) \mid Z = z, T = t]$
- $h(t) = \alpha$
Bias in a DD setting depends on the structure of $h(t)$.
Confounding in DD affects trends, not levels.
Contextual knowledge is important!
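
A short derivation makes the role of $h(t)$ explicit. This is a sketch for the two-period case ($t = 0$ pre, $t = 1$ post), assuming the observed post-period mean of the treated group equals $g_{1}(1) + ATT$:

$\begin{aligned} \hat{\tau}^{DD} &= \left(ATT + g_{1}(1) - g_{0}(1)\right) - \left(g_{1}(0) - g_{0}(0)\right) \\ &= ATT + h(1) - h(0) \end{aligned}$

When $h(t) = \alpha$ the last two terms cancel and DD recovers the ATT. Otherwise the bias is $h(1) - h(0)$, the change in the gap between the groups' untreated outcomes.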

Two distinct problems when combining matching + DD

Bias when matching on time-varying covariates:
- Depends on the structure of time variation
Regression to the mean:
- Both groups come from different populations
- Particularly salient when matching on previous outcomes with a small number of pre-periods.

diagram

How do we match?

Match covariates or outcomes? Levels or trends?
Propensity score matching? Optimal matching? etc.

This paper:

Match on covariates that could make groups behave differently.
Use Mixed-Integer Programming (MIP) Matching (Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020):
- Balances covariates directly
- Yields the largest matched sample under balancing constraints (cardinality matching)
- Works with large samples
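
To make the cardinality-matching idea concrete, here is a minimal sketch that brute-forces a tiny one-covariate problem instead of calling a MIP solver. The function name `cardinality_match`, the toy data, and the use of the pooled standard deviation to scale the tolerance are assumptions for illustration. A real application would hand the same objective and constraint to an integer-programming solver.

```python
from itertools import combinations
from statistics import mean, pstdev

def cardinality_match(x_treated, x_control, tol_sd=0.05):
    """Largest equal-sized subsets whose covariate means differ by at most tol_sd SDs.

    Brute-force stand-in for a MIP solver; only feasible for very small samples.
    """
    tol = tol_sd * pstdev(x_treated + x_control)  # assumed scaling: pooled SD
    max_m = min(len(x_treated), len(x_control))
    for m in range(max_m, 0, -1):                 # try the largest sample first
        for t_idx in combinations(range(len(x_treated)), m):
            mean_t = mean(x_treated[i] for i in t_idx)
            for c_idx in combinations(range(len(x_control)), m):
                mean_c = mean(x_control[j] for j in c_idx)
                if abs(mean_t - mean_c) <= tol:   # mean-balance constraint
                    return t_idx, c_idx
    return (), ()

# Hypothetical covariate values; the treated unit with x = 10 cannot be balanced.
x_t = [1.0, 2.0, 3.0, 10.0]
x_c = [1.0, 2.0, 3.0, 4.0]
t_idx, c_idx = cardinality_match(x_t, x_c)
print(t_idx, c_idx)  # (0, 1, 2) (0, 1, 2)
```

The search keeps as many units as the balance constraint allows and drops only the outlying treated unit, which is the trade-off cardinality matching is designed to make.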

Simulations

Different scenarios

Time-invariant covariates:

S1: Time-invariant covariate effect

S2: Time-varying covariate effect

S3: Treatment-independent covariate

Time-varying covariates:

S4: Parallel evolution

S5: Evolution differs by group

S6: Evolution diverges in post

Following Zeldow & Hatfield (2019)

Time-invariant covariates

$X_{i} \overset{ind}{\sim} N(m(z_{i}), v(z_{i}))$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S1) Time-invariant covariate effect: g(x_i,t) = 0

S2) Time-varying covariate effect: g(x_i,t) ≠ 0

S3) Treatment-independent covariate: m(z_i) = μ and v(z_i) = σ
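
The difference between S1 and S2 can be seen in a small simulation. The sketch below uses a two-period version of the data-generating process with assumed functional forms that are not from the talk: $m(z) = 1 + 0.5z$, $v(z) = 1$, $u_i \sim N(0, 1)$, $f(t) = 0.1t$, $g(x, t) = \gamma x t$, and a treatment term of $\tau \cdot z$ in the post period with $\tau = 1$.

```python
import random
from statistics import mean

def simulate_dd(gamma, n=4000, tau=1.0, seed=0):
    """Simulate a two-period version of the DGP and return the simple DD estimate.

    Assumed (hypothetical) choices: m(z) = 1 + 0.5 z, v(z) = 1, u_i ~ N(0, 1),
    f(t) = 0.1 t, g(x, t) = gamma * x * t, treat_it = tau * z * 1{t = 1}.
    """
    rng = random.Random(seed)
    change = {0: [], 1: []}  # post-minus-pre outcome change, by group
    for i in range(n):
        z = i % 2                        # half of the units are treated
        x = rng.gauss(1 + 0.5 * z, 1.0)  # time-invariant covariate
        u = rng.gauss(0.0, 1.0)          # unit effect, cancels in the change
        y = []
        for t in (0, 1):
            mu = 1 + z + tau * z * t + u + x + 0.1 * t + gamma * x * t
            y.append(rng.gauss(mu, 1.0))
        change[z].append(y[1] - y[0])
    # With a balanced panel, DD equals the difference in mean changes.
    return mean(change[1]) - mean(change[0])

dd_s1 = simulate_dd(gamma=0.0)  # S1: g(x, t) = 0
dd_s2 = simulate_dd(gamma=1.0)  # S2: g(x, t) = x * t
print(round(dd_s1, 2), round(dd_s2, 2))
```

Under these assumptions the S1 estimate should land near $\tau = 1$, while the S2 estimate should land near $1.5$: the covariate means differ by 0.5 across groups and the covariate's effect grows over time, so the unadjusted DD absorbs that difference.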

Time-varying covariates

$X_{it} = x_{(t-1)i} + h(z_{i}, t) \cdot r_{i} + m(z_{i}, t)$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S4) Parallel evolution: h(z_i,t) = h(t) and m(z_i,t) = 0

S5) Evolution differs by group: m(z_i,t) = 0

S6) Evolution diverges in post: h(z_i,t) = h(t) and m(z_i,t) = Post*m(z_i,t)

Different ways to control

Model	Pseudo `R` code
Simple	`lm(y ~ a*p + t)`
Covariate Adjusted (CA)	`lm(y ~ a*p + t + x)`
Time-Varying Adjusted (TVA)	`lm(y ~ a*p + t*x)`
Match on pre-treat outcomes	`lm(y ~ a*p + t, data=out.match)`
Match on pre-treat 1st diff	`lm(y ~ a*p + t, data=out.lag.match)`
Match on pre-treat cov (PS)	`lm(y ~ a*p + t, data=cov.match)`
Match on pre-treat cov (MIP)	`Event study (data=cov.match.mip)`
Match on all cov (MIP)	`Event study (data=cov.match.mip.all)`

Following Zeldow & Hatfield (2019)

Parameters:

Parameter	Value
Number of obs (N)	1,000
`Pr(Z=1)`	0.5
Time periods (T)	10
Last pre-intervention period (T_0)	5
Matching PS	Nearest neighbor
MIP Matching tolerance	0.05 SD
Number of simulations	1,000

Estimate compared to sample ATT (different for matching)
When matching with post-treat covariates $\to$ compared with direct effect $\tau$

Results: Time-constant covariates

Results: Time-varying covariates

In these simulations, for time-varying covariates:
- Matching on pre-treatment covariates returns an unbiased ATT estimate if covariates evolve differently over time and treatment does not affect them.
- Matching on pre-treatment covariates returns a biased ATT estimate if covariates evolve differently over time and are affected by treatment.

We don't know which scenario we are in.

Matching on pre- and post-intervention covariates returns the direct effect of the treatment on the outcome.
Depending on the context, this could be an upper or lower bound for the true effect.
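
One way to read the bound, as a sketch under an assumed linear outcome model (the symbols $\beta$ and $\delta$ are illustrative and not from the talk): suppose the covariate enters the outcome with coefficient $\beta$, and treatment shifts the post-period covariate of treated units by $\delta$ on average. Then:

$ATT = \underbrace{\tau}_{\text{direct effect}} + \underbrace{\beta \cdot \delta}_{\text{effect through } X}$

Matching on post-intervention covariates holds $X$ fixed across groups, so it recovers only $\tau$. If $\beta \cdot \delta > 0$, $\tau$ is a lower bound for the ATT; if $\beta \cdot \delta < 0$, it is an upper bound.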

Other simulations

Test regression to the mean under no effect:
- Vary autocorrelation of $X_{i}(t)$ (low vs. high)
- $X_{0}(t)$ and $X_{1}(t)$ come from the same or different distributions.
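
Regression to the mean can be reproduced in a few lines. The sketch below is a hypothetical setup, not the talk's simulation design: there is no treatment effect, the two groups come from populations whose outcome levels differ by `gap`, and outcomes have no serial correlation. Controls are matched to treated units by nearest neighbor (with replacement) on the pre-period outcome.

```python
import random
from statistics import mean

def rtm_demo(n=1000, gap=1.0, seed=1):
    """No true effect; groups differ in level by `gap`; outcomes are level + i.i.d. noise."""
    rng = random.Random(seed)
    # Each unit is a (pre-period outcome, post-period outcome) pair.
    treated = [(gap + rng.gauss(0, 1), gap + rng.gauss(0, 1)) for _ in range(n)]
    control = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

    def dd(t_units, c_units):
        post_gap = mean(u[1] for u in t_units) - mean(u[1] for u in c_units)
        pre_gap = mean(u[0] for u in t_units) - mean(u[0] for u in c_units)
        return post_gap - pre_gap

    # Nearest-neighbor match (with replacement) on the pre-period outcome.
    matched = [min(control, key=lambda c: abs(c[0] - t[0])) for t in treated]
    return dd(treated, control), dd(treated, matched)

dd_unmatched, dd_matched = rtm_demo()
print(round(dd_unmatched, 2), round(dd_matched, 2))
```

In this setup the unmatched DD should be close to zero, while the matched DD should be close to `gap`: matched controls are selected for unusually high pre-period noise and then fall back to their own population mean, which shows up as a spurious effect.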

Application

Preferential Voucher Scheme in Chile

Universal flat voucher scheme $\overset{2008}{\longrightarrow}$ Universal + preferential voucher scheme
Preferential voucher scheme (SEP, Subvención Escolar Preferencial):
- Targeted to the most vulnerable 40% of students
- Additional 50% of voucher per student
- Additional money for concentration of SEP students.

Students:
- Verify SEP status
- Attend a SEP school

Schools:
- Opt into the policy
- No selection, no fees
- Resources ~ performance

Impact of the SEP policy

Positive impact on test scores for lower-income students (Aguirre, 2019; Neilson, 2016)
Design could have increased socioeconomic segregation:
- Incentives for concentration of SEP students
Key decision variables for schools: performance, current SEP students, competition, add-on fees.
Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools:
- Only for private-subsidized schools
- Matching on 2005-2007 $\to$ effect estimated for 2008-2011
- Outcome: average household income of students
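
A diff-in-diff taken with respect to a reference year can be sketched as follows. The function name `event_study` and all yearly means are hypothetical numbers chosen for illustration; they are not the application's data.

```python
def event_study(mean_treated, mean_control, ref_year):
    """DD estimate per year, relative to the treated-control gap in `ref_year`."""
    ref_gap = mean_treated[ref_year] - mean_control[ref_year]
    return {
        year: (mean_treated[year] - mean_control[year]) - ref_gap
        for year in sorted(mean_treated)
        if year != ref_year
    }

# Hypothetical group means of the outcome by year (illustrative numbers only).
sep = {2005: 10.0, 2006: 10.5, 2007: 11.0, 2008: 11.0, 2009: 11.5}
non_sep = {2005: 14.0, 2006: 14.5, 2007: 15.0, 2008: 16.0, 2009: 17.0}

estimates = event_study(sep, non_sep, ref_year=2007)
print(estimates)  # {2005: 0.0, 2006: 0.0, 2008: -1.0, 2009: -1.5}
```

Estimates near zero in the pre-period years are what parallel pre-trends would look like. The post-period entries are the year-by-year DD estimates.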

Before Matching

diagram

Matching + DD

Prior to matching: No parallel pre-trend; covariates evolve differently for the two groups.
Different types of schools:
- Schools that charge high co-payment fees.
- Schools with a low number of SEP students enrolled.
MIP Matching using constant or "sticky" covariates:
- Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees
- Fine balance: Test scores, monthly average voucher.
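
The two balance criteria can be checked after matching with a few lines of code. This is a minimal sketch: the function names, the pooled-SD scaling of the tolerance, the binning of a continuous covariate into categories for fine balance, and the toy values are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def mean_balanced(x_treated, x_control, tol_sd=0.05):
    """True if the absolute mean difference is within tol_sd pooled standard deviations."""
    pooled_sd = pstdev(list(x_treated) + list(x_control))
    return abs(mean(x_treated) - mean(x_control)) <= tol_sd * pooled_sd

def finely_balanced(cat_treated, cat_control):
    """True if every category has the same count in both groups (marginals match)."""
    return Counter(cat_treated) == Counter(cat_control)

# Hypothetical matched samples.
enrollment_sep = [100.0, 200.0, 300.0]
enrollment_non_sep = [110.0, 190.0, 300.0]
score_bin_sep = ["low", "mid", "mid"]
score_bin_non_sep = ["mid", "low", "mid"]

print(mean_balanced(enrollment_sep, enrollment_non_sep))  # True
print(finely_balanced(score_bin_sep, score_bin_non_sep))  # True
```

Fine balance only requires the category counts to agree across groups. Individual matched pairs do not need to share the same category.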

After matching

diagram

Results

Matched schools:
- More vulnerable and lower test scores than the population mean.
6% increase in the income gap between SEP and non-SEP schools in matched DD:
- SEP schools attracted even more vulnerable students.
- Non-SEP schools increased their average family income.
There is a need to evaluate the policy as a whole.
- Unintended consequences also matter.

Let's wrap it up

Conclusions

Matching can be an important tool to address violations of the PTA.
It is relevant to consider whether groups come from the same or different populations.
Serial correlation also plays an important role: Don't match on random noise.

Match well and match smart!

Time-invariant Covariates

S1: Time-invariant covariate effect

$X_{i} \overset{ind}{\sim} N(m(z_{i}), v(z_{i}))$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t), 1)$

S2: Time-varying covariate effect

$X_{i} \overset{ind}{\sim} N(m(z_{i}), v(z_{i}))$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S3: Treatment-independent covariate

$X_{i} \overset{ind}{\sim} N(1, 1)$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

Time-varying Covariates

S4: Parallel evolution

$X_{it} = x_{(t-1)i} + m_{1}(t) \cdot z$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S5: Evolution differs by group

$X_{it} = x_{(t-1)i} + m_{2}(z_{i}, t) \cdot z$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

S6: Evolution diverges in post

$X_{it} = x_{(t-1)i} + m_{1}(t) \cdot z - m_{3}(z_{i}, t)$
$Y_{i}(t) \overset{ind}{\sim} N(1 + z_{i} + \text{treat}_{it} + u_{i} + x_{i} + f(t) + g(x_{i}, t), 1)$

Covariate evolution: Time-invariant

diagram

Covariate evolution: Time-varying

diagram
