A Difference-in-Differences Approach
using Mixed-Integer Programming Matching

Magdalena Bennett
McCombs School of Business, UT Austin

ASA Austin Chapter Meeting
June 09, 2021

1 / 44

Diff-in-Diff as an identification strategy

2 / 44

Very popular for policy evaluation

Source: Google Scholar

5 / 44

What about parallel trends?

  • Bounds on treatment effects (Rambachan & Roth, 2020)

  • Find sub-groups that potentially follow PTA (e.g., similar units in the treatment and control groups)

    • Similar to the synthetic control intuition.
  • Can matching help?

    • It's complicated (?) (Zeldow & Hatfield, 2019; Lindner & McConnell, 2018; Daw & Hatfield, 2018 (x2); Ryan, 2018; Ryan et al., 2018)
9 / 44

This paper

  • Identify contexts in which matching can recover causal estimates under violations of the parallel trends assumption.

    • Partial identification in some cases.
  • Use mixed-integer programming (MIP) matching to balance covariates directly.

Simulations:
Different DGP scenarios

Application:
School segregation & vouchers

11 / 44

Let's get started

12 / 44

DD Setup

  • Let $Y_{it}(z)$ be the potential outcome for unit $i$ in period $t$ under treatment $z$.

  • Intervention implemented at $T_0$: no units are treated for $t \leq T_0$.

  • Difference-in-Differences (DD) focuses on the ATT for $t > T_0$:

$$ATT = E[Y_{it}(1) - Y_{it}(0) \mid Z = 1]$$

  • Assumptions for DD:

    • Parallel-trend assumption (PTA):

    $$E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 0]$$

    • Common shocks

13 / 44

DD Setup (cont.)

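  • Under these assumptions: $$\tau^{DD} = \underbrace{E[Y(1) \mid Z=1] - E[Y(1) \mid Z=0]}_{\Delta_{post}} - \underbrace{(E[Y(0) \mid Z=1] - E[Y(0) \mid Z=0])}_{\Delta_{pre}}$$

    • Where $t = 0$ and $t = 1$ are the pre- and post-intervention periods, respectively.

    • $Y(t) = Y_1(t) Z + (1 - Z) Y_0(t)$ is the observed outcome.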
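Below is a minimal Python sketch of the 2×2 estimator, computed from the four observed cell means as Δ_post − Δ_pre. The function name `dd_estimate` and the toy numbers are hypothetical. In the toy data the treated group starts 2 units higher, both groups gain 1 unit over time, and treatment adds 3 units after the intervention. The level gap therefore cancels and the estimate recovers the effect.

```python
from statistics import mean


def dd_estimate(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """2x2 difference-in-differences from observed outcomes in the four cells."""
    delta_post = mean(post_treat) - mean(post_ctrl)  # E[Y(1)|Z=1] - E[Y(1)|Z=0]
    delta_pre = mean(pre_treat) - mean(pre_ctrl)     # E[Y(0)|Z=1] - E[Y(0)|Z=0]
    return delta_post - delta_pre


# Hypothetical toy data: the treated group starts 2 units higher (a level gap),
# both groups gain 1 unit over time, and treatment adds 3 units in the post period.
pre_t, post_t = [5.0, 7.0], [9.0, 11.0]
pre_c, post_c = [3.0, 5.0], [4.0, 6.0]
print(dd_estimate(pre_t, post_t, pre_c, post_c))  # 3.0
```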

14 / 44

Violations to the PTA

  • Under PTA, $g_1(t) = g_0(t) + h(t)$, where:

    • $g_z(t) = E[Y_{it}(0) \mid Z = z, T = t]$
    • $h(t) = \alpha$
  • Bias in a DD setting depends on the structure of $h(t)$.

  • Confounding in DD affects trends, not levels.

  • Contextual knowledge is important!
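Writing out the DD between a pre-period $t_{pre}$ and a post-period $t_{post}$ shows how the estimate depends on $h(t)$. With a constant effect $\tau$ in the post-period, the estimate is $\tau + h(t_{post}) - h(t_{pre})$. A constant $h(t) = \alpha$ (a pure level difference) therefore cancels, while a time-varying $h(t)$ does not. The sketch below checks this numerically. `g0`, `level_gap`, `trend_gap` and their coefficients are hypothetical choices, not the paper's DGP.

```python
def g0(t):
    """Hypothetical untreated mean path of the control group, g_0(t)."""
    return 2.0 + 0.5 * t


def dd_from_paths(h, tau, t_pre, t_post):
    """DD comparing t_pre and t_post when g_1(t) = g_0(t) + h(t).

    The treated group's post-period mean adds a constant effect tau, so the
    estimate equals tau + h(t_post) - h(t_pre).
    """
    treated_post = g0(t_post) + h(t_post) + tau
    treated_pre = g0(t_pre) + h(t_pre)
    return (treated_post - g0(t_post)) - (treated_pre - g0(t_pre))


def level_gap(t):
    return 1.5              # PTA holds: h(t) = alpha, a pure level difference


def trend_gap(t):
    return 1.5 + 0.25 * t   # PTA fails: the gap between groups grows over time


print(dd_from_paths(level_gap, tau=1.0, t_pre=5, t_post=6))  # 1.0 (no bias)
print(dd_from_paths(trend_gap, tau=1.0, t_pre=5, t_post=6))  # 1.25 (bias of 0.25)
```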

17 / 44

Two distinct problems when combining matching + DD

  • Bias when matching on time-varying covariates:

    • Depends on the structure of time variation
  • Regression to the mean:

    • Arises when the two groups come from different populations.

    • Particularly salient when matching on previous outcomes with a small number of pre-periods.

18 / 44

How do we match?

  • Match covariates or outcomes? Levels or trends?

  • Propensity score matching? Optimal matching? etc.

This paper:

  • Match on covariates that could make groups behave differently.

  • Use Mixed-Integer Programming (MIP) matching (Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020):

    • Balances covariates directly.

    • Yields the largest matched sample that satisfies the balancing constraints (cardinality matching).

    • Works with large samples.
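To make "largest matched sample under balancing constraints" concrete, here is a brute-force sketch of cardinality matching with mean-balance constraints, using the 0.05 SD tolerance from the simulation parameters. It is a stand-in built on several assumptions:

- Exhaustive enumeration replaces the MIP solver, so it only works for a handful of units.
- The SD used as the balance scale is that of the pooled, unmatched sample.
- The function name `cardinality_match` is hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def cardinality_match(x_treat, x_ctrl, tol_sd=0.05):
    """Largest equal-size treated/control subsets that satisfy mean balance.

    x_treat, x_ctrl: lists of covariate tuples (one tuple per unit).
    For every covariate, the treated and control subset means must differ by at
    most tol_sd times that covariate's SD in the pooled, unmatched sample.
    Exhaustive search stands in for a MIP solver, so inputs must be tiny.
    """
    n_cov = len(x_treat[0])
    pooled = x_treat + x_ctrl
    sds = [pstdev([row[j] for row in pooled]) for j in range(n_cov)]

    def balanced(t_idx, c_idx):
        for j in range(n_cov):
            gap = mean([x_treat[i][j] for i in t_idx]) - mean([x_ctrl[i][j] for i in c_idx])
            if abs(gap) > tol_sd * sds[j]:
                return False
        return True

    # Try the largest sample size first: the first balanced pair of subsets
    # found is therefore a maximum-cardinality solution.
    for k in range(min(len(x_treat), len(x_ctrl)), 0, -1):
        for t_idx in combinations(range(len(x_treat)), k):
            for c_idx in combinations(range(len(x_ctrl)), k):
                if balanced(t_idx, c_idx):
                    return list(t_idx), list(c_idx)
    return [], []


# Hypothetical toy data with one covariate: treated unit 3 is an outlier.
treated = [(1.0,), (2.0,), (3.0,), (10.0,)]
controls = [(1.0,), (2.0,), (3.0,), (4.0,)]
print(cardinality_match(treated, controls))  # ([0, 1, 2], [0, 1, 2])
```

The outlying treated unit is dropped because no four-unit control subset can match its mean. A MIP formulation searches the same space of subsets without enumerating it, which is what makes the approach usable with large samples.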

19 / 44

Simulations

20 / 44

Different scenarios

Time-invariant covariates:

S1: Time-invariant covariate effect

S2: Time-varying covariate effect

S3: Treatment-independent covariate

Time-varying covariates:

S4: Parallel evolution

S5: Evolution differs by group

S6: Evolution diverges in post



Following Zeldow & Hatfield (2019)

21 / 44

Time-invariant covariates

$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i)) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$

S1) Time-invariant covariate effect: $g(x_i, t) = 0$

S2) Time-varying covariate effect: $g(x_i, t) \neq 0$

S3) Treatment-independent covariate: $m(z_i) = \mu$ and $v(z_i) = \sigma$
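A rough simulation of the S1-S3 designs is sketched below. It assumes specific functional forms that the slide leaves open, all of them hypothetical:

- $m(z) = 1 + 0.5z$ and $v(z) = 1$
- $u_i \sim N(0, 1)$
- $f(t) = 0.1t$ and $g(x, t) = 0.2xt$

It reports the bias of an unadjusted DD that compares all post-periods with all pre-periods, using $T = 10$ and $T_0 = 5$ as in the simulation parameters. Under these choices the expected bias is 0 in S1 and S3. In S2 it is $0.2 \times 0.5 \times (8 - 3) = 0.5$, where 8 and 3 are the mean post- and pre-period indices.

```python
import random
from statistics import fmean


def naive_dd_bias(scenario, n=1000, periods=10, t0=5, tau=1.0, seed=1):
    """Bias of an unadjusted DD (post-period mean gap minus pre-period mean gap)
    under hypothetical versions of scenarios S1-S3."""
    rng = random.Random(seed)
    cells = {(z, post): [] for z in (0, 1) for post in (0, 1)}
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                 # Pr(Z=1) = 0.5
        if scenario == "S3":
            x = rng.gauss(1.0, 1.0)                        # same N(1, 1) for both groups
        else:
            x = rng.gauss(1.0 + 0.5 * z, 1.0)              # assumed m(z) = 1 + 0.5z, v(z) = 1
        u = rng.gauss(0.0, 1.0)                            # assumed unit effect u_i ~ N(0, 1)
        for t in range(1, periods + 1):
            post = int(t > t0)
            treat = z * post
            f = 0.1 * t                                    # assumed common trend f(t)
            g = 0.0 if scenario == "S1" else 0.2 * x * t   # assumed g(x, t) = 0.2xt
            y = rng.gauss(1 + z + tau * treat + u + x + f + g, 1.0)
            cells[(z, post)].append(y)
    post_gap = fmean(cells[(1, 1)]) - fmean(cells[(0, 1)])
    pre_gap = fmean(cells[(1, 0)]) - fmean(cells[(0, 0)])
    return (post_gap - pre_gap) - tau


for scenario in ("S1", "S2", "S3"):
    print(scenario, round(naive_dd_bias(scenario, n=4000), 3))
```

The level terms $z_i$, $u_i$ and $x_i$ cancel in the DD. Only the time-varying covariate effect, combined with a covariate distribution that differs by group, produces bias.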

22 / 44

Time-varying covariates

$$X_{it} = x_{(t-1)i} + h(z_i, t)\, r_i + m(z_i, t) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$

S4) Parallel evolution: $h(z_i, t) = h(t)$ and $m(z_i, t) = 0$

S5) Evolution differs by group: $m(z_i, t) = 0$

S6) Evolution diverges in post: $h(z_i, t) = h(t)$ and $m(z_i, t) = Post \cdot m(z_i, t)$
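The covariate processes can be sketched the same way. None of the values in this version come from the paper; it assumes:

- $h(z_i, t) = 0.1$, or $0.1 + 0.1 z_i$ in S5
- $r_i \sim U(0.5, 1.5)$
- a common starting value $X_{i0} \sim N(0, 1)$
- $m(z_i, t) = 0.3 z_i$ in post-periods for S6

The code tracks the gap in mean $X_{it}$ between groups, which should behave as follows:

- S4: roughly constant.
- S5: growing by about 0.1 in every period.
- S6: flat up to $T_0$, then growing by about 0.3 per period.

If the outcome depends on the covariate, the change in this gap between pre- and post-periods is what an unadjusted DD absorbs.

```python
import random


def covariate_gap_path(scenario, n=1000, periods=10, t0=5, seed=3):
    """Per-period gap E[X_t | Z=1] - E[X_t | Z=0] under hypothetical versions of
    S4 (parallel evolution), S5 (differs by group) and S6 (diverges in post)."""
    rng = random.Random(seed)
    totals = {0: [0.0] * periods, 1: [0.0] * periods}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        r = rng.uniform(0.5, 1.5)      # assumed unit-specific loading r_i (mean 1)
        x = rng.gauss(0.0, 1.0)        # assumed starting value, same for both groups
        counts[z] += 1
        for t in range(1, periods + 1):
            h = 0.1 + 0.1 * z if scenario == "S5" else 0.1        # h(z, t): group-specific only in S5
            m = 0.3 * z if scenario == "S6" and t > t0 else 0.0   # m(z, t): nonzero only post, in S6
            x = x + h * r + m          # X_it = x_(t-1)i + h(z_i, t) r_i + m(z_i, t)
            totals[z][t - 1] += x
    return [totals[1][k] / counts[1] - totals[0][k] / counts[0] for k in range(periods)]


for scenario in ("S4", "S5", "S6"):
    gaps = covariate_gap_path(scenario, n=2000)
    print(scenario, [round(g, 2) for g in gaps])
```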

23 / 44

Different ways to control

| Model                                     | Pseudo R code                          |
|-------------------------------------------|----------------------------------------|
| Simple                                    | lm(y ~ a*p + t)                        |
| Covariate Adjusted (CA)                   | lm(y ~ a*p + t + x)                    |
| Time-Varying Adjusted (TVA)               | lm(y ~ a*p + t*x)                      |
| Match on pre-treatment outcomes           | lm(y ~ a*p + t, data=out.match)        |
| Match on pre-treatment first differences  | lm(y ~ a*p + t, data=out.lag.match)    |
| Match on pre-treatment covariates (PS)    | lm(y ~ a*p + t, data=cov.match)        |
| Match on pre-treatment covariates (MIP)   | Event study (data=cov.match.mip)       |
| Match on all covariates (MIP)             | Event study (data=cov.match.mip.all)   |

Following Zeldow & Hatfield (2019)

24 / 44

Parameters:

| Parameter                            | Value            |
|--------------------------------------|------------------|
| Number of observations (N)           | 1,000            |
| Pr(Z = 1)                            | 0.5              |
| Time periods (T)                     | 10               |
| Last pre-intervention period ($T_0$) | 5                |
| PS matching                          | Nearest neighbor |
| MIP matching tolerance               | 0.05 SD          |
| Number of simulations                | 1,000            |

  • Estimates are compared to the sample ATT (which differs for matched samples).
  • When matching on post-treatment covariates, estimates are compared with the direct effect τ.

25 / 44

Results: Time-constant covariates

26 / 44

Results: Time-varying covariates

27 / 44

Results: Time-varying covariates

  • In these simulations, for time-varying covariates:

    • Matching on pre-treatment covariates returns an unbiased ATT estimate if covariates evolve differently over time and treatment does not affect them.

    • Matching on pre-treatment covariates returns a biased ATT estimate if covariates evolve differently over time and are affected by treatment.

We don't know which scenario we are in.

  • Matching on pre- and post-intervention covariates returns the direct effect of the treatment on the outcome.

  • Depending on the context, this could be an upper or lower bound for the true effect.

28 / 44

Other simulations

  • Test regression to the mean under no effect:

    • Vary the autocorrelation of $X_i(t)$ (low vs. high).
    • $X_0(t)$ and $X_1(t)$ come from the same or different distributions.
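Below is a small sketch of the regression-to-the-mean check under no treatment effect. It rests on three hypothetical modeling choices (all names are hypothetical too):

- Each unit's $X_i(t)$ is a stationary AR(1) around its group mean, with autocorrelation $\rho$.
- The outcome is $X_i(t)$ itself.
- Treated units are matched 1:1, with replacement, to the control with the nearest pre-period value.

Under these choices the matched DD is approximately $(1 - \rho)(\mu_1 - \mu_0)$. When both groups share a distribution it is near zero. When they do not, matching produces a spurious effect that shrinks as $\rho$ grows, because matched controls drawn from their own tail drift back toward their group mean.

```python
import bisect
import math
import random
from statistics import fmean


def matched_dd_no_effect(rho, mu_treat, mu_ctrl=0.0, n_treat=1000, n_ctrl=3000, seed=7):
    """DD after 1:1 nearest-neighbor matching on the pre-period value when the
    true treatment effect is zero. Each unit's X follows a stationary AR(1)
    around its group mean with autocorrelation rho; the outcome is X itself."""
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho ** 2)   # keeps Var[X(t)] = 1 in both periods

    def draw(mu, n):
        units = []
        for _ in range(n):
            pre = rng.gauss(mu, 1.0)
            post = mu + rho * (pre - mu) + rng.gauss(0.0, innov_sd)  # no treatment effect
            units.append((pre, post))
        return units

    treated = draw(mu_treat, n_treat)
    controls = sorted(draw(mu_ctrl, n_ctrl))   # sorted by pre-period value
    ctrl_pre = [pre for pre, _ in controls]

    matched = []
    for pre, _ in treated:
        k = bisect.bisect_left(ctrl_pre, pre)
        candidates = [j for j in (k - 1, k) if 0 <= j < n_ctrl]
        best = min(candidates, key=lambda j: abs(ctrl_pre[j] - pre))  # nearest, with replacement
        matched.append(controls[best])

    change_treat = fmean([post - pre for pre, post in treated])
    change_ctrl = fmean([post - pre for pre, post in matched])
    return change_treat - change_ctrl


for rho in (0.2, 0.9):
    diff_pop = matched_dd_no_effect(rho, mu_treat=1.0)
    same_pop = matched_dd_no_effect(rho, mu_treat=0.0)
    print(f"rho={rho}: different populations {diff_pop:.2f}, same population {same_pop:.2f}")
```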

29 / 44

Application

30 / 44

Preferential Voucher Scheme in Chile

  • Universal flat voucher scheme → 2008: Universal + preferential voucher scheme

  • Preferential voucher scheme:

    • Targeted to the 40% most vulnerable students.

    • Additional 50% of the voucher per student.

    • Additional funding for schools with a high concentration of SEP students.

Students:
- Verify SEP status
- Attend a SEP school

Schools:
- Opt into the policy
- No selection, no fees
- Resources ~ performance

31 / 44

Impact of the SEP policy

  • Positive impact on test scores for lower-income students (Aguirre, 2019; Neilson, 2016)

  • Design could have increased socioeconomic segregation

    • Incentives for concentration of SEP students
  • Key decision variables for schools: Performance, current SEP students, competition, add-on fees.

  • Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools:

    • Only for private-subsidized schools

    • Matching on 2005-2007 data → effect estimated for 2008-2011

    • Outcome: average household income of students

32 / 44

Before Matching

33 / 44

Matching + DD

  • Prior to matching: no parallel pre-trends, and covariates evolve differently in the two groups.

  • Different types of schools:

    • Schools that charge high co-payment fees.

    • Schools with a low number of enrolled SEP students.

  • MIP Matching using constant or "sticky" covariates:

    • Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees

    • Fine balance: Test scores, monthly average voucher.

34 / 44

After matching

35 / 44

Results

  • Matched schools:

    • Serve more vulnerable students and have lower test scores than the population mean.
  • 6% increase in the income gap between SEP and non-SEP schools in matched DD:

    • SEP schools attracted even more vulnerable students.

    • Non-SEP schools increased their average family income.

  • There is a need to evaluate the policy as a whole.

    • Unintended consequences also matter.
36 / 44

Let's wrap it up

37 / 44

Conclusions

  • Matching can be an important tool to address violations in PTA.

  • It is important to consider whether the groups come from the same or different populations.

  • Serial correlation also plays an important role: don't match on random noise.

Match well and match smart!

39 / 44

A Difference-in-Differences Approach
using Mixed-Integer Programming Matching

Magdalena Bennett

40 / 44

Time-invariant Covariates

S1: Time-invariant covariate effect

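$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i)) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t), 1)$$

S2: Time-varying covariate effect

$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i)) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$

S3: Treatment-independent covariate

$$X_i \overset{ind}{\sim} N(1, 1) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$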

41 / 44

Time-varying Covariates

S4: Parallel evolution

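$$X_{it} = x_{(t-1)i} + m_1(t) z \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$

S5: Evolution differs by group

$$X_{it} = x_{(t-1)i} + m_2(z_i, t) z \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$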

S6: Evolution diverges in post

$$X_{it} = x_{(t-1)i} + m_1(t) z - m_3(z_i, t) \qquad Y_i(t) \overset{ind}{\sim} N(1 + z_i + treat_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$

42 / 44

Covariate evolution: Time-invariant

43 / 44

Covariate evolution: Time-varying

44 / 44
