class: inverse, center, middle <br> <br> <br> <h1 class="title-own">A Difference-in-Differences Approach<br/>using Mixed-Integer Programming Matching</h1> <br> <br> .small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ] <br> .small[Universidad Diego Portales<br>June 23rd, 2022] --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd-1.svg" style="display: block; margin: auto;" /> --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd2-1.svg" style="display: block; margin: auto;" /> --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd3-1.svg" style="display: block; margin: auto;" /> --- # Very popular for policy evaluation <img src="mbennett_did_files/figure-html/gg-1.svg" style="display: block; margin: auto;" /> .source[Source: Web of Science] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]**] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]** - Find sub-groups that potentially **.darkorange[follow PTA]** - E.g. similar units in treatment and control - Similar to synthetic control intuition. ] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]** - Find sub-groups that potentially **.darkorange[follow PTA]** - E.g. similar units in treatment and control - Similar to synthetic control intuition. - Can matching help? - It's **.darkorange[complicated]** .small[(Ham & Miratrix, 2022; Zeldow & Hatfield, 2021; Basu & Small, 2020; Lindner & McConnell, 2018; Daw & Hatfield, 2018 (x2); Ryan, 2018; Ryan et al., 2018)] ] --- # This paper - Identify contexts when matching can recover causal estimates under **.darkorange[violations of the parallel trend assumption]**. - Partial identification in some cases. - Use **.darkorange[mixed-integer programming matching (MIP)]** to balance covariates directly. -- <br/> <br/> .pull-left[ .box-6trans[**Simulations:**<br/>Different DGP scenarios] ] .pull-right[ .box-6trans[**Application:**<br/>School segregation & vouchers] ] --- background-position: 50% 50% class: left, bottom, inverse .big[ Let's get started <br> <br> ] --- # DD Setup - Let `\(Y_{it}(z)\)` be the potential outcome for unit `\(i\)` in period `\(t\)` under treatment `\(z\)`. - Intervention implemented in `\(T_0\)` `\(\rightarrow\)` No units are treated in `\(t\leq T_0\)` -- - Difference-in-Differences (DD) focuses on ATT for `\(t>T_0\)`: `$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$` -- - **.darkorange[Assumptions for DD]**: - Parallel-trend assumption (PTA) - Common shocks `$$E[Y_{i1}(0) - Y_{i0}(0) | Z=1] = E[Y_{i1}(0) - Y_{i0}(0) | Z=0]$$` --- # DD Setup (cont.) 
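
Before the formal decomposition below, a minimal numerical sketch of the two-by-two estimator (simulated data; column names and parameter values are illustrative, not from the paper):

```r
# Two-by-two DD: the double difference of means equals the interaction coefficient
set.seed(1)
n    <- 1000
z    <- rbinom(n, 1, 0.5)                      # treatment group indicator
df   <- data.frame(id = rep(1:n, 2), z = rep(z, 2), post = rep(c(0, 1), each = n))
tau  <- 2                                      # assumed true ATT
df$y <- 1 + 0.5 * df$z + 0.3 * df$post + tau * df$z * df$post + rnorm(2 * n)

dd_means <- with(df,
  (mean(y[z == 1 & post == 1]) - mean(y[z == 0 & post == 1])) -
  (mean(y[z == 1 & post == 0]) - mean(y[z == 0 & post == 0])))

dd_reg <- coef(lm(y ~ z * post, data = df))["z:post"]
c(dd_means, dd_reg)                            # both approximately tau = 2
```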
- Under these assumptions:

$$
`\begin{align}
\hat{\tau}^{DD} = &\color{#820a94}{\overbrace{\color{black}{E[Y_{i1}|Z=1] - E[Y_{i1}|Z=0]}}^{\color{#820a94}{\Delta_{post}}}} - \\
&\color{#F89441}{\underbrace{\color{black}{(E[Y_{i0}|Z=1] - E[Y_{i0}|Z=0])}}_{\color{#F89441}{\Delta_{pre}}}}
\end{align}`
$$

- Where `\(t=0\)` and `\(t=1\)` are the pre- and post-intervention periods, respectively.

- `\(Y_{it} = Y_{it}(1)\cdot Z_i + (1-Z_i)\cdot Y_{it}(0)\)` is the observed outcome.

---

# But what if the PTA doesn't hold?

.pull-left[
Potential outcomes are a function of observed (X) and unobserved characteristics (u):

`$$Y_{it}(0) = g(X_{it}, u_{it}, t)$$`
`$$Y_{it}(1) = g(X_{it}, u_{it}, t) + \tau\mathrm{I}(t>T_0)$$`
]

.pull-right[
<img src="mbennett_did_files/figure-html/po1-1.svg" style="display: block; margin: auto;" />
]

---

# But what if the PTA doesn't hold?

.pull-left[
Potential outcomes are a function of observed (X) and unobserved characteristics (u):

`$$Y_{it}(0) = g(X_{it}, u_{it}, t)$$`
`$$Y_{it}(1) = g(X_{it}, u_{it}, t) + \tau\mathrm{I}(t>T_0)$$`

... but `\(X_{it}| Z_i = z \sim F_x(t,z)\)` and `\(u_{it}| Z_i = z \sim F_u(t,z)\)`
]

.pull-right[
<img src="mbennett_did_files/figure-html/po2-1.svg" style="display: block; margin: auto;" />
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`

- Bias in a DD setting depends on the structure of `\(h(t)\)`.

- Confounding in DD affects **.darkorange[trends]** and not **.darkorange[levels]**.
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`

- Bias in a DD setting depends on the structure of `\(h(t)\)`.

- Confounding in DD affects **.darkorange[trends]** and not **.darkorange[levels]**.

- Contextual knowledge is important!
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Two distinct problems when combining matching + DD

.pull-left[
- **.darkorange[Bias when matching on time-varying covariates]**:

  - Depends on the structure of time variation

- **.darkorange[Regression to the mean]**:

  - Both groups come from different populations
  - Particularly salient when matching on previous outcomes and a small number of pre-periods (sketched below).
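
.small[A sketch of the regression-to-the-mean problem (simulated data with zero true effect; values are illustrative, not the paper's DGP): matching on a single noisy pre-period outcome when the two groups come from different populations manufactures an "effect".]

```r
# Zero true effect, but treated and control units come from different populations
set.seed(2)
n      <- 2000
z      <- rep(c(1, 0), each = n)
y_pre  <- c(rnorm(n, mean = 1), rnorm(n, mean = 0))   # noisy pre-period outcome
y_post <- c(rnorm(n, mean = 1), rnorm(n, mean = 0))   # same means again: true effect = 0

# Match each treated unit to the control with the closest pre-period outcome
ctrl <- y_pre[z == 0]
idx  <- sapply(y_pre[z == 1], function(v) which.min(abs(ctrl - v)))

# Matched DD is far from zero: the selected controls regress back to their own mean
(mean(y_post[z == 1]) - mean(y_post[z == 0][idx])) -
  (mean(y_pre[z == 1]) - mean(y_pre[z == 0][idx]))
```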
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/reg_to_the_mean.svg" alt="diagram" width="500"/>
]

---

# Using pre-trends as robustness check

- Let's use two points in the pre-intervention period: `\((t, t') \in (-\infty, T_0]\)`

- Define `\(\Delta\)` as the bias in the pre-intervention period:

$$
`\begin{aligned}
E[Y_{it'}(1)| Z = 1] - E[Y_{it'}(0)| Z = 0] -(E[Y_{it}(1)|Z=1] - E[Y_{it}(0)|Z = 0]) &=\\
\color{#ffffff}{\overbrace{\color{black}{E[g(X_{it'},u_{it'}, t')|Z=1] - E[g(X_{it'},u_{it'}, t')|Z=0]}}^{\color{#ffffff}{\Delta_{post}}}} -& \\
(\color{#ffffff}{\underbrace{\color{black}{E[g(X_{it},u_{it}, t)|Z=1] - E[g(X_{it},u_{it}, t)|Z=0]}}_{\color{#ffffff}{\Delta_{pre}}}}) &= \Delta
\end{aligned}`
$$

---

# Using pre-trends as robustness check

- Let's use two points in the pre-intervention period: `\((t, t') \in (-\infty, T_0]\)`

- Define `\(\Delta\)` as the bias in the pre-intervention period:

$$
`\begin{aligned}
E[Y_{it'}(1)| Z = 1] - E[Y_{it'}(0)| Z = 0] -(E[Y_{it}(1)|Z=1] - E[Y_{it}(0)|Z = 0]) &=\\
\color{#820a94}{\overbrace{\color{black}{E[g(X_{it'},u_{it'}, t')|Z=1] - E[g(X_{it'},u_{it'}, t')|Z=0]}}^{\color{#820a94}{\Delta_{post}}}} -& \\
(\color{#F89441}{\underbrace{\color{black}{E[g(X_{it},u_{it}, t)|Z=1] - E[g(X_{it},u_{it}, t)|Z=0]}}_{\color{#F89441}{\Delta_{pre}}}}) &= \Delta
\end{aligned}`
$$

--

- `\(\Delta \neq 0\)` due to `\(X\)`, `\(u\)`, or both.

- Find `\(\mathbf{X} = \mathbf{X^T}\)`, so that `\(\Delta = 0\)` (i.e. assume no differential time-varying unobserved confounder)

---

# How does matching help?

- If **.darkorange[X is constant or evolves similarly for both groups]**:

  - Matching on pre-treatment covariates can **.darkorange[eliminate/significantly reduce bias]**

--

<br>
<br>

- If **.darkorange[X evolves differently over time for Z=0 and Z=1]**:

  - Matching on pre-treatment covariates **.darkorange[still returns a biased estimate]**

--

<br>
<br>

.box-6trans[Can we do something about it?]

---

# Matching on post-intervention covariates

- Adjusting for post-intervention covariates can **.darkorange[introduce bias]** (Rosenbaum, 1984)

  - E.g. collider bias

--

<br>
<br>

- If the pre-parallel trends test **.darkorange[fails]** (i.e. `\(\Delta \neq 0\)`) `\(\rightarrow\)` match on **.darkorange[post-intervention]** covariates for bounding the direct effect (DE) and indirect effect (IE) .small[(Hong, Yang, & Qin, 2021)]

--

<br>
<br>

- Decompose ATT as:

  - Direct effect: `\(E[Y(1, X(0)) - Y(0,X(0))| Z=1]\)`
  - Indirect effect: `\(E[Y(1,X(1)) - Y(1, X(0))| Z=1]\)`

--

<br>
<br>

- Using different values of the *potential* correlation between `\(X_{t'}(0)\)` and `\(X_{t'}(1)\)` `\(\rightarrow\)` conditional distribution of `\(X_{t'}(0)\)` as a function of `\(X_t\)` and `\(X_{t'}(1)\)`.

---

# How do we match?

- Match on covariates or outcomes? Levels or trends?

- Propensity score matching? Optimal matching? etc.

--

This paper:

- **.darkorange[Match on covariates]** that could make groups behave differently.

  - Use distribution of covariates to match on a template (for post-treat covariates).
- Use of **.darkorange[Mixed-Integer Programming (MIP) Matching]** .small[(Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020)]:

  - Balances covariates directly
  - Yields the largest matched sample under balancing constraints (cardinality matching)
  - Works with large samples

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Simulations
<br>
<br>
]

---

# Different scenarios

.pull-left[
**Time-invariant covariates:**

.box-1trans[S1: Time-invariant covariate effect]

.box-2trans[S2: Time-varying covariate effect]

.box-3trans[S3: Treatment-independent covariate]
]

--

.pull-right[
**Time-varying covariates:**

.box-5trans[S4: Parallel evolution]

.box-6trans[S5: Evolution differs by group]

.box-7trans[S6: Evolution diverges in post]
]

<br>
<br>

.source[Following Zeldow & Hatfield (2021)]

---

.center[
![:scale 90%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/sree_20210927/images/zeldow_hatfield_table.png)
]

.source[Table 1 from Zeldow & Hatfield (2021)]

---

# Different ways to control

.small[
<div class="center"><table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Pseudo <code class="remark-inline-code">R</code> code</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Simple</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t)</code> </td>
    </tr>
    <tr>
      <td>Covariate Adjusted (CA)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t + x)</code> </td>
    </tr>
    <tr>
      <td>Time-Varying Adjusted (TVA)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t*x)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat outcomes</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=out.match)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat 1st diff</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=out.lag.match)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat cov (PS)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=cov.match)</code> </td>
    </tr>
    <tr>
      <td id="highlight">Match on pre-treat cov (MIP)</td>
      <td id="highlight"><code class="remark-inline-code">Event study (data=cov.match.mip)</code></td>
    </tr>
    <tr>
      <td id="highlight">Match on all cov (MIP)</td>
      <td id="highlight"><code class="remark-inline-code">Event study (data=cov.match.mip.all)</code></td>
    </tr>
  </tbody>
</table>
</div>
]

.bottom[
.source[Following Zeldow & Hatfield (2021)]]

---

# Parameters:

.center[

Parameter                            | Value
-------------------------------------|----------------------------------------------
Number of obs (N)                    | 1,000
`Pr(Z=1)`                            | 0.5
Time periods (T)                     | 10
Last pre-intervention period (T_0)   | 5
Matching PS                          | Nearest neighbor
MIP Matching tolerance               | .05 SD
Number of simulations                | 1,000

]

- Estimate compared to sample ATT (_different for matching_)

- When matching with post-treat covariates `\(\rightarrow\)` compared with direct effect `\(\tau\)`

---

# Results: Time-constant covariates

<img src="mbennett_did_files/figure-html/res1-1.svg" style="display: block; margin: auto;" />

---

# Results: Time-varying covariates

<img src="mbennett_did_files/figure-html/res2-1.svg" style="display: block; margin: auto;" />

---

# Other simulations

- Test **.darkorange[regression to the mean]** under no effect:

  - Vary autocorrelation of `\(X_i(t)\)` (low vs. high)
  - `\(X_0(t)\)` and `\(X_1(t)\)` come from the same or different distribution (see the sketch below).

- Similar conclusions to Ham & Miratrix (2022) regarding the reliability with which X predicts Y.
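
A schematic of the covariate process used for this check (a sketch with illustrative parameter values, not the paper's exact DGP): an AR(1) covariate whose autocorrelation and group-specific mean can be varied.

```r
# AR(1) covariate around a group-specific mean; rho controls the autocorrelation
sim_x <- function(n, periods = 10, rho = 0.9, mu_z = c(0, 0)) {
  z  <- rep(c(0, 1), each = n / 2)
  mu <- mu_z[z + 1]                             # group-specific long-run mean
  x  <- matrix(NA_real_, n, periods)
  x[, 1] <- rnorm(n, mean = mu)
  for (t in 2:periods) {
    x[, t] <- mu + rho * (x[, t - 1] - mu) + rnorm(n, sd = sqrt(1 - rho^2))
  }
  list(z = z, x = x)
}

low_rho_same  <- sim_x(1000, rho = 0.2)                   # mostly noise, same distribution
high_rho_diff <- sim_x(1000, rho = 0.9, mu_z = c(0, 1))   # sticky, different distributions
```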
<img src="mbennett_did_files/figure-html/res3-1.svg" style="display: block; margin: auto;" />

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Application
<br>
<br>
]

---

# Preferential Voucher Scheme in Chile

- Universal **.darkorange[flat voucher]** scheme `\(\stackrel{\mathbf{2008}}{\mathbf{\longrightarrow}}\)` Universal + **.darkorange[preferential voucher]** scheme

- Preferential voucher scheme:

  - Targeted to the 40% most vulnerable students
  - Additional 50% of voucher per student
  - Additional money for concentration of SEP students.

--

<br/>

.pull-left[
.center[
.box-6trans[**Students:**<br/>- Verify SEP status<br/>- Attend a SEP school]
]
]

.pull-right[
.center[
.box-6trans[**Schools:**<br/>- Opt into the policy<br/>- No selection, no fees<br/>- Resources ~ performance]
]
]

---

# Impact of the SEP policy

- **.darkorange[Positive impact on test scores]** for lower-income students (Aguirre, 2019; Nielson, 2016)

- Design could have **.darkorange[increased]** socioeconomic segregation

  - Incentives for concentration of SEP students
  - Key decision variables for schools: Performance, current SEP students, competition, add-on fees.

- **.darkorange[Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools]**:

  - Only for **.darkorange[private-subsidized schools]**
  - Matching between 2005-2007 `\(\rightarrow\)` Effect estimated for 2008-2011
  - Outcome: Average students' household income

---

# Before Matching

.pull-left[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/pta_all.svg" alt="diagram" width="800"/>
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/dd_all.svg" alt="diagram" width="800"/>
]

---

# Matching + DD

- **.darkorange[Prior to matching]**: No parallel pre-trend

- **.darkorange[Different types of schools]**:

  - Schools that charge high co-payment fees.
  - Schools with a low number of SEP students enrolled.

- **.darkorange[MIP Matching]** using constant or "sticky" covariates (an R sketch is in the appendix):

  - Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees
  - Fine balance: Test scores, monthly average voucher.

---

# After matching

.pull-left[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/pta_match.svg" alt="diagram" width="800"/>
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/dd_match.svg" alt="diagram" width="800"/>
]

---

# Results

- **.darkorange[Matched schools]**:

  - More vulnerable and with lower test scores than the population mean.

- **.darkorange[6% increase in the income gap]** between SEP and non-SEP schools in matched DD:

  - SEP schools attracted even more vulnerable students.
  - Non-SEP schools increased their average family income.

--

- There is a need to **.darkorange[evaluate the policy as a whole]**.

  - Unintended consequences also matter.

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Let's wrap it up
<br>
<br>
]

---

# Conclusions & Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Partial identification]** can also be useful

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.
- Next steps: Sensitivity analysis to hidden bias
]

.pull-right[
![](https://media.giphy.com/media/drwxYI2fxqQGqRZ9Pe/giphy.gif)
]

---

# Conclusions & Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Partial identification]** can also be useful

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.

- Next steps: Sensitivity analysis to hidden bias

.box-6trans[Match well and match smart!]
]

.pull-right[
![](https://media.giphy.com/media/drwxYI2fxqQGqRZ9Pe/giphy.gif)
]

---
class: inverse, center, middle

<br>
<br>
<br>
<h1 class="title-own">A Difference-in-Differences Approach<br/>using Mixed-Integer Programming Matching</h1>
<br>
<br>
.small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ]
<br>
.small[Universidad Diego Portales<br>June 23rd, 2022]

---

# Time-invariant covariates

`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`

`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`

--

<br>
<br>

.box-1b[S1) Time-invariant covariate effect: g(x<sub>i</sub>,t) = 0]

.box-2b[S2) Time-varying covariate effect: g(x<sub>i</sub>,t) ≠ 0]

.box-3b[S3) Treatment-independent covariate: m(z<sub>i</sub>) = μ and v(z<sub>i</sub>) = σ]

---

# Time-varying covariates

`$$X_{it} = x_{(t-1)i} + h(z_i,t)\cdot r_i + m(z_i,t)$$`

`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`

--

<br>
<br>

.box-4b[S4) Parallel evolution: h(z<sub>i</sub>,t) = h(t) and m(z<sub>i</sub>,t) = 0]

.box-6b[S5) Evolution differs by group: m(z<sub>i</sub>,t) = 0]

.box-7b[S6) Evolution diverges in post: h(z<sub>i</sub>,t) = h(t) and m(z<sub>i</sub>,t) = Post*m(z<sub>i</sub>,t)]

---

# Time-invariant Covariates

.box-1a.medium.sp-after-half[S1: Time-invariant covariate effect]

.small[
`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t),1)$$`]

--

.box-2a.medium.sp-after-half[S2: Time-varying covariate effect]

.small[
`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-3a.medium.sp-after-half[S3: Treatment-independent covariate]

.small[
`$$X_i \stackrel{ind}{\sim} N(1,1)$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

---

# Time-varying Covariates

.box-4a.medium.sp-after-half[S4: Parallel evolution]

.small[
`$$X_{it} = x_{(t-1)i} + m_1(t)\cdot z$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-6a.medium.sp-after-half[S5: Evolution differs by group]

.small[
`$$X_{it} = x_{(t-1)i} + m_2(z_i,t)\cdot z$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-7a.medium.sp-after-half[S6: Evolution diverges in post]

.small[
`$$X_{it} = x_{(t-1)i} + m_1(t)\cdot z - m_3(z_i,t)$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

---

# Covariate evolution: Time-invariant

.center[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/cov_1.svg" alt="diagram" width="900"/>]

---

# Covariate evolution: Time-varying

.center[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/cov_2.svg" alt="diagram" width="900"/>]
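
---

# Appendix: MIP matching sketch in R

A minimal sketch of the cardinality-matching step, assuming the `cardmatch` interface of the `designmatch` R package .small[(Zubizarreta, 2015)]. Data, covariates, tolerances, and solver options are illustrative, not the paper's exact specification.

```r
library(designmatch)

# Hypothetical school-level data (one row per school)
set.seed(3)
n   <- 500
dat <- data.frame(
  sep              = rbinom(n, 1, 0.4),               # 1 = SEP school (treated)
  rural            = rbinom(n, 1, 0.3),
  enrollment       = rpois(n, 300),
  n_schools_county = rpois(n, 15),
  copay            = rbinom(n, 1, 0.5),               # charges add-on fees
  score_cat        = sample(1:4, n, replace = TRUE),  # coarsened test scores
  voucher_cat      = sample(1:3, n, replace = TRUE)   # coarsened avg. voucher
)

dat   <- dat[order(-dat$sep), ]                       # treated units must come first
t_ind <- dat$sep

# Mean balance at a 0.05 SD tolerance on "sticky" covariates
mom_covs <- as.matrix(dat[, c("rural", "enrollment", "n_schools_county", "copay")])
mom      <- list(covs = mom_covs, tols = 0.05 * apply(mom_covs, 2, sd))

# Fine balance on the coarsened covariates
fine <- list(covs = as.matrix(dat[, c("score_cat", "voucher_cat")]))

out <- cardmatch(t_ind, mom = mom, fine = fine,
                 solver = list(name = "glpk", approximate = 1, t_max = 60, trace = 0))

matched <- dat[c(out$t_id, out$c_id), ]               # largest sample meeting the constraints
```

The matched sample then feeds the DD / event-study estimation (e.g. `lm(y ~ sep*post + factor(year), data = matched)`, in the spirit of the comparison table above).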