Source: Google Scholar
Bounds on treatment effects (Rambachan & Roth, 2020)
Find subgroups that plausibly satisfy the parallel-trend assumption (PTA), e.g. similar units in treatment and control
Can matching help?
Identify contexts in which matching can recover causal estimates under violations of the parallel-trend assumption.
Use mixed-integer programming (MIP) matching to balance covariates directly.
Simulations:
Different DGP scenarios
Application:
School segregation & vouchers
Let's get started
Let $Y_{it}(z)$ be the potential outcome for unit $i$ in period $t$ under treatment $z$.
Intervention implemented in $T_0$ → No units are treated in $t \leq T_0$
$$ATT = E[Y_{it}(1) - Y_{it}(0) \mid Z = 1]$$
Assumptions for DD:
Parallel-trend assumption (PTA)
Common shocks
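$$E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid Z = 0]$$
Under these assumptions:
$$\hat{\tau}^{DD} = \underbrace{E[Y(1) \mid Z = 1] - E[Y(1) \mid Z = 0]}_{\Delta_{post}} - \underbrace{\left(E[Y(0) \mid Z = 1] - E[Y(0) \mid Z = 0]\right)}_{\Delta_{pre}}$$
Where $t = 0$ and $t = 1$ are the pre- and post-intervention periods, respectively.
$Y(t) = Y_1(t) \cdot Z + (1 - Z) \cdot Y_0(t)$ is the observed outcome.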
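As a minimal sketch of the estimator above, the four cell means can be computed directly from observed records. The function name `dd_estimate` and the toy data are hypothetical and serve only as illustration.

```python
from statistics import mean

def dd_estimate(rows):
    """2x2 difference-in-differences from (z, t, y) records.

    z: 1 = treated group, 0 = control group
    t: 1 = post-intervention period, 0 = pre-intervention period
    y: observed outcome
    """
    def cell_mean(z, t):
        return mean(y for zi, ti, y in rows if zi == z and ti == t)

    delta_post = cell_mean(1, 1) - cell_mean(0, 1)  # treated - control, post
    delta_pre = cell_mean(1, 0) - cell_mean(0, 0)   # treated - control, pre
    return delta_post - delta_pre

# Hypothetical toy data, not taken from the application.
toy = [
    (1, 0, 5.0), (1, 0, 7.0),    # treated, pre  (mean 6)
    (1, 1, 10.0), (1, 1, 12.0),  # treated, post (mean 11)
    (0, 0, 3.0), (0, 0, 5.0),    # control, pre  (mean 4)
    (0, 1, 6.0), (0, 1, 8.0),    # control, post (mean 7)
]
tau_dd = dd_estimate(toy)
print(tau_dd)  # (11 - 7) - (6 - 4) = 2.0
```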
Under PTA, $g_1(t) = g_0(t) + h(t)$, where:
Bias in a DD setting depends on the structure of $h(t)$.
Confounding in DD affects trends, not levels.
Contextual knowledge is important!
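One way to see why the structure of $h(t)$ matters is a short worked step. It assumes, since the slides do not define it here, that $g_z(t)$ denotes the expected untreated outcome of group $z$ in period $t$, with $t = 0$ pre and $t = 1$ post:

$$
E[\hat{\tau}^{DD}] - ATT = \left[g_1(1) - g_1(0)\right] - \left[g_0(1) - g_0(0)\right] = h(1) - h(0)
$$

Under that reading, a constant $h(t)$ (a pure level difference) cancels out, while any change in $h$ between periods carries over one-for-one into the DD estimate.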
Bias when matching on time-varying covariates:
Regression to the mean:
Both groups come from different populations
Particularly salient when matching on previous outcomes with a small number of pre-periods.
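The regression-to-the-mean problem can be illustrated with a small simulation. This is a sketch under assumed values: two populations with different means, independent noise in each period, one pre-period, no treatment effect, and nearest-neighbor matching with replacement on the pre-period outcome. None of these choices come from the paper.

```python
import random
from statistics import mean

random.seed(1)
N = 500
MU_TREATED, MU_CONTROL = 1.0, 0.0  # assumed: groups come from different populations

def draw(mu):
    # (pre, post) outcomes: stable population mean plus independent noise.
    # There is NO treatment effect in this simulation.
    return (mu + random.gauss(0, 1), mu + random.gauss(0, 1))

treated = [draw(MU_TREATED) for _ in range(N)]
controls = [draw(MU_CONTROL) for _ in range(N)]

def dd(tr, co):
    # Mean post-minus-pre change, treated minus control.
    return mean(post - pre for pre, post in tr) - mean(post - pre for pre, post in co)

# Nearest-neighbor match (with replacement) on the pre-period outcome.
matched = [min(controls, key=lambda c: abs(c[0] - t[0])) for t in treated]

dd_raw = dd(treated, controls)
dd_matched = dd(treated, matched)
print(round(dd_raw, 2), round(dd_matched, 2))
```

In this setup the unmatched DD stays close to zero, while the matched DD is pushed away from zero: the matched controls were selected for unusually high pre-period noise and fall back toward their own population mean afterwards.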
Match covariates or outcomes? Levels or trends?
Propensity score matching? Optimal matching? etc.
This paper:
Match on covariates that could make groups behave differently.
Use of Mixed-Integer Programming (MIP) Matching (Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020):
Balance covariates directly
Yields the largest matched sample under the balancing constraints (cardinality matching)
Works with large samples
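To make the cardinality-matching idea concrete, here is a toy sketch that finds the largest equal-sized treated and control subsets whose covariate means differ by at most a tolerance. It uses brute-force enumeration as a stand-in for the MIP solver, so it is only feasible for a handful of units. The function name, the single covariate, and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def cardinality_match(x_treated, x_control, tol_sd=0.05):
    """Largest equal-sized treated/control subsets whose covariate means
    differ by at most tol_sd standard deviations of the combined sample.

    Brute force for illustration only; real applications solve this as a
    mixed-integer program.
    """
    tol = tol_sd * pstdev(x_treated + x_control)
    # Try the largest sample size first and shrink until balance is feasible.
    for m in range(min(len(x_treated), len(x_control)), 0, -1):
        for t_idx in combinations(range(len(x_treated)), m):
            t_mean = mean(x_treated[i] for i in t_idx)
            for c_idx in combinations(range(len(x_control)), m):
                c_mean = mean(x_control[j] for j in c_idx)
                if abs(t_mean - c_mean) <= tol:
                    return list(t_idx), list(c_idx)
    return [], []

# Hypothetical covariate values; the treated unit with x = 9 has no comparable controls.
x_t = [1.0, 2.0, 3.0, 9.0]
x_c = [1.0, 2.0, 3.0, 4.0, 5.0]
t_keep, c_keep = cardinality_match(x_t, x_c)
print(t_keep, c_keep)
```

With these values no balanced sample of four units per group exists, so the search settles on three units per group and drops the outlying treated unit.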
Simulations
Time-invariant covariates:
S1: Time-invariant covariate effect
S2: Time-varying covariate effect
S3: Treatment-independent covariate
Time-varying covariates:
S4: Parallel evolution
S5: Evolution differs by group
S6: Evolution diverges in post
Following Zeldow & Hatfield (2019)
$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i))$$
$$Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$
S1) Time-invariant covariate effect: $g(x_i, t) = 0$
S2) Time-varying covariate effect: $g(x_i, t) \neq 0$
S3) Treatment-independent covariate: $m(z_i) = \mu$ and $v(z_i) = \sigma$
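The three time-invariant scenarios can be sketched as a small data-generating function. The structure follows the equations above. The specific forms chosen for $m$, $v$, $u_i$, $f$ and $g$, and the definition of $\text{treat}_{it}$ as exposure of treated units after $T_0$, are illustrative assumptions, not the values used in the paper.

```python
import random
from statistics import mean

T, T0 = 10, 5  # time periods and last pre-intervention period

def simulate_unit(scenario, rng):
    """Covariate and outcome path for one unit under 'S1', 'S2' or 'S3'."""
    z = 1 if rng.random() < 0.5 else 0           # Pr(Z = 1) = 0.5
    if scenario == "S3":
        x = rng.gauss(1.0, 1.0)                  # same distribution in both groups
    else:
        x = rng.gauss(1.0 + 0.5 * z, 1.0)        # assumed m(z) = 1 + 0.5 z, v(z) = 1
    u = rng.gauss(0.0, 1.0)                      # assumed unit-level effect
    ys = []
    for t in range(1, T + 1):
        treat = z * (t > T0)                     # assumed: exposure only after T0
        f_t = 0.1 * t                            # assumed common time trend
        g_xt = 0.0 if scenario == "S1" else 0.1 * x * t  # assumed g(x, t)
        ys.append(rng.gauss(1 + z + treat + u + x + f_t + g_xt, 1.0))
    return z, x, ys

def simple_dd(units):
    """Post-minus-pre change in mean outcomes, treated minus control."""
    def change(ys):
        return mean(ys[T0:]) - mean(ys[:T0])
    treated = [change(ys) for z, _, ys in units if z == 1]
    control = [change(ys) for z, _, ys in units if z == 0]
    return mean(treated) - mean(control)

rng = random.Random(2019)
for s in ("S1", "S2", "S3"):
    sample = [simulate_unit(s, rng) for _ in range(1000)]
    print(s, round(simple_dd(sample), 2))
```

With these assumed forms the true effect is 1. The unadjusted DD is centered near 1 under S1 and S3, and is shifted upward by 0.25 under S2, where the covariate differs by group and its effect grows over time.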
$$X_{it} = x_{(t-1)i} + h(z_i, t) \cdot r_i + m(z_i, t)$$
$$Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$
S4) Parallel evolution: $h(z_i, t) = h(t)$ and $m(z_i, t) = 0$
S5) Evolution differs by group: $m(z_i, t) = 0$
S6) Evolution diverges in post: $h(z_i, t) = h(t)$ and $m(z_i, t) = \text{Post} \cdot m(z_i, t)$
Model | Pseudo R code |
---|---|
Simple | lm(y ~ a*p + t) |
Covariate Adjusted (CA) | lm(y ~ a*p + t + x) |
Time-Varying Adjusted (TVA) | lm(y ~ a*p + t*x) |
Match on pre-treat outcomes | lm(y ~ a*p + t, data=out.match) |
Match on pre-treat 1st diff | lm(y ~ a*p + t, data=out.lag.match) |
Match on pre-treat cov (PS) | lm(y ~ a*p + t, data=cov.match) |
Match on pre-treat cov (MIP) | Event study (data=cov.match.mip) |
Match on all cov (MIP) | Event study (data=cov.match.mip.all) |
Following Zeldow & Hatfield (2019)
Parameter | Value |
---|---|
Number of obs (N) | 1,000 |
Pr(Z=1) | 0.5 |
Time periods (T) | 10 |
Last pre-intervention period (T_0) | 5 |
Matching PS | Nearest neighbor |
MIP Matching tolerance | .05 SD |
Number of simulations | 1,000 |
In these simulations, for time-varying covariates:
Matching on pre-treatment covariates returns an unbiased ATT estimate if covariates evolve differently over time and treatment does not affect them.
Matching on pre-treatment covariates returns a biased ATT estimate if covariates evolve differently over time and are affected by treatment.
We don't know which scenario we are in.
Matching on pre- and post-intervention covariates returns the direct effect of the treatment on the outcome
Depending on the context, this could be an upper or lower bound for the true effect.
Test regression to the mean under no effect:
Application
2008: Universal flat voucher scheme ⟶ Universal + preferential voucher scheme
Preferential voucher scheme:
Targeted to bottom 40% of vulnerable students
Additional 50% of voucher per student
Additional money for concentration of SEP students.
Students:
- Verify SEP status
- Attend a SEP school
Schools:
- Opt-into the policy
- No selection, no fees
- Resources ~ performance
Positive impact on test scores for lower-income students (Aguirre, 2019; Neilson, 2016)
Design could have increased socioeconomic segregation
Key decision variables for schools: Performance, current SEP students, competition, add-on fees.
Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools:
Only for private-subsidized schools
Matching between 2005-2007 --> Effect estimated for 2008-2011
Outcome: Average students' household income
Prior to matching: No parallel pre-trend, covariates evolve differently for both groups.
Different types of schools:
Schools that charge high co-payment fees.
Schools with a low number of SEP students enrolled.
MIP Matching using constant or "sticky" covariates:
Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees
Fine balance: Test scores, monthly average voucher.
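The two balance notions above can be checked after matching with a few lines of code. This is a sketch with hypothetical function names and made-up values. It assumes the 0.05 SD tolerance is measured against the pooled standard deviation of the two groups, which the slides do not specify, and it treats fine balance as equality of the marginal distributions of a categorical (binned) covariate.

```python
from collections import Counter
from statistics import mean, pstdev

def std_mean_diff(x_treated, x_control):
    """Difference in means in units of the pooled standard deviation."""
    pooled_sd = ((pstdev(x_treated) ** 2 + pstdev(x_control) ** 2) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd

def mean_balanced(x_treated, x_control, tol=0.05):
    """Mean balance: standardized difference within the tolerance."""
    return abs(std_mean_diff(x_treated, x_control)) <= tol

def fine_balanced(cat_treated, cat_control):
    """Fine balance: identical category counts in both groups,
    without requiring unit-to-unit exact matches."""
    return Counter(cat_treated) == Counter(cat_control)

# Hypothetical matched samples (enrollment and binned test scores).
enroll_sep = [300, 420, 510, 610]
enroll_non = [310, 400, 520, 612]
score_sep = ["low", "mid", "mid", "high"]
score_non = ["mid", "high", "low", "mid"]

print(mean_balanced(enroll_sep, enroll_non))        # means 460 vs 460.5
print(fine_balanced(score_sep, score_non))          # same counts per bin
print(mean_balanced(enroll_sep, [500, 620, 710, 810]))
```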
Matched schools:
6% increase in the income gap between SEP and non-SEP schools in matched DD:
SEP schools attracted even more vulnerable students.
Non-SEP schools increased their average family income.
There is a need to evaluate the policy as a whole.
Let's wrap it up
Matching can be an important tool to address violations in PTA.
It is relevant to consider whether groups come from the same or different populations.
Serial correlation also plays an important role: Don't match on random noise.
Match well and match smart!
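S1: Time-invariant covariate effect
$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i)), \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t), 1)$$
S2: Time-varying covariate effect
$$X_i \overset{ind}{\sim} N(m(z_i), v(z_i)), \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$
S3: Treatment-independent covariate
$$X_i \overset{ind}{\sim} N(1, 1), \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$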
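S4: Parallel evolution
$$X_{it} = x_{(t-1)i} + m_1(t) \cdot z, \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$
S5: Evolution differs by group
$$X_{it} = x_{(t-1)i} + m_2(z_i, t) \cdot z, \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$
S6: Evolution diverges in post
$$X_{it} = x_{(t-1)i} + m_1(t) \cdot z - m_3(z_i, t), \quad Y_i(t) \overset{ind}{\sim} N(1 + z_i + \text{treat}_{it} + u_i + x_i + f(t) + g(x_i, t), 1)$$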