class: inverse, center, middle <br> <br> <br> <h1 class="title-own">A Difference-in-Differences Approach<br/>using Mixed-Integer Programming Matching</h1> <br> <br> .small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ] <br> .small[Universidad Diego Portales<br>June 23rd, 2022] --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd-1.svg" style="display: block; margin: auto;" /> --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd2-1.svg" style="display: block; margin: auto;" /> --- # Diff-in-Diff as an identification strategy <img src="mbennett_did_files/figure-html/dd3-1.svg" style="display: block; margin: auto;" /> --- # Very popular for policy evaluation <img src="mbennett_did_files/figure-html/gg-1.svg" style="display: block; margin: auto;" /> .source[Source: Web of Science] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]**] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]** - Find sub-groups that potentially **.darkorange[follow PTA]** - E.g. similar units in treatment and control - Similar to synthetic control intuition. ] --- # What about parallel trends? .pull-left[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/website_github/master/exampleSite/content/images/data_comic.jpg)] ] .pull-right[ - Main identification assumption **.darkorange[fails]** - Find sub-groups that potentially **.darkorange[follow PTA]** - E.g. similar units in treatment and control - Similar to synthetic control intuition. - Can matching help? - It's **.darkorange[complicated]** .small[(Ham & Miratrix, 2022; Zeldow & Hatfield, 2021; Basu & Small, 2020; Lindner & McConnell, 2018; Daw & Hatfield, 2018 (x2); Ryan, 2018; Ryan et al., 2018)] ] --- # This paper - Identify contexts when matching can recover causal estimates under **.darkorange[violations of the parallel trend assumption]**. - Partial identification in some cases. - Use **.darkorange[mixed-integer programming matching (MIP)]** to balance covariates directly. -- <br/> <br/> .pull-left[ .box-6trans[**Simulations:**<br/>Different DGP scenarios] ] .pull-right[ .box-6trans[**Application:**<br/>School segregation & vouchers] ] --- background-position: 50% 50% class: left, bottom, inverse .big[ Let's get started <br> <br> ] --- # DD Setup - Let `\(Y_{it}(z)\)` be the potential outcome for unit `\(i\)` in period `\(t\)` under treatment `\(z\)`. - Intervention implemented in `\(T_0\)` `\(\rightarrow\)` No units are treated in `\(t\leq T_0\)` -- - Difference-in-Differences (DD) focuses on ATT for `\(t>T_0\)`: `$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$` -- - **.darkorange[Assumptions for DD]**: - Parallel-trend assumption (PTA) - Common shocks `$$E[Y_{i1}(0) - Y_{i0}(0) | Z=1] = E[Y_{i1}(0) - Y_{i0}(0) | Z=0]$$` --- # DD Setup (cont.) 
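
Before the formal decomposition below, a minimal numerical sketch of the two-by-two estimator (simulated data; column names and parameter values are illustrative, not from the paper):

```r
# Two-by-two DD: the double difference of means equals the interaction coefficient
set.seed(1)
n    <- 1000
z    <- rbinom(n, 1, 0.5)                      # treatment group indicator
df   <- data.frame(id = rep(1:n, 2), z = rep(z, 2), post = rep(c(0, 1), each = n))
tau  <- 2                                      # assumed true ATT
df$y <- 1 + 0.5 * df$z + 0.3 * df$post + tau * df$z * df$post + rnorm(2 * n)

dd_means <- with(df,
  (mean(y[z == 1 & post == 1]) - mean(y[z == 0 & post == 1])) -
  (mean(y[z == 1 & post == 0]) - mean(y[z == 0 & post == 0])))

dd_reg <- coef(lm(y ~ z * post, data = df))["z:post"]
c(dd_means, dd_reg)                            # both approximately tau = 2
```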
- Under these assumptions:

$$
`\begin{align}
\hat{\tau}^{DD} = &\color{#820a94}{\overbrace{\color{black}{E[Y_{i1}|Z=1] - E[Y_{i1}|Z=0]}}^{\color{#820a94}{\Delta_{post}}}} - \\
&\color{#F89441}{\underbrace{\color{black}{(E[Y_{i0}|Z=1] - E[Y_{i0}|Z=0])}}_{\color{#F89441}{\Delta_{pre}}}}
\end{align}`
$$

- Where `\(t=0\)` and `\(t=1\)` are the pre- and post-intervention periods, respectively.

- `\(Y_{it} = Y_{it}(1)\cdot Z_i + (1-Z_i)\cdot Y_{it}(0)\)` is the observed outcome.

---

# But what if the PTA doesn't hold?

.pull-left[
Potential outcomes are a function of observed (X) and unobserved characteristics (u):

`$$Y_{it}(0) = g(X_{it}, u_{it}, t)$$`
`$$Y_{it}(1) = g(X_{it}, u_{it}, t) + \tau\mathrm{I}(t>T_0)$$`
]

.pull-right[
<img src="mbennett_did_files/figure-html/po1-1.svg" style="display: block; margin: auto;" />
]

---

# But what if the PTA doesn't hold?

.pull-left[
Potential outcomes are a function of observed (X) and unobserved characteristics (u):

`$$Y_{it}(0) = g(X_{it}, u_{it}, t)$$`
`$$Y_{it}(1) = g(X_{it}, u_{it}, t) + \tau\mathrm{I}(t>T_0)$$`

... but `\(X_{it}| Z_i = z \sim F_x(t,z)\)` and `\(u_{it}| Z_i = z \sim F_u(t,z)\)`
]

.pull-right[
<img src="mbennett_did_files/figure-html/po2-1.svg" style="display: block; margin: auto;" />
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`

- Bias in a DD setting depends on the structure of `\(h(t)\)`.

- Confounding in DD affects **.darkorange[trends]** and not **.darkorange[levels]**.
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Violations to the PTA

.pull-left[
- Under PTA, `\(g_1(t) = g_0(t) + h(t) + \tau(t)\mathrm{I}(t>T_0)\)`, where:

  - `\(g_z(t) = E[Y_{it}(0) | Z=z, T=t]\)`
  - `\(h(t) = \alpha\)`

- Bias in a DD setting depends on the structure of `\(h(t)\)`.

- Confounding in DD affects **.darkorange[trends]** and not **.darkorange[levels]**.

- Contextual knowledge is important!
]

.pull-right[
![](https://media.giphy.com/media/L8yQ0RQBItqso/giphy.gif)
]

---

# Two distinct problems when combining matching + DD

.pull-left[
- **.darkorange[Bias when matching on time-varying covariates]**:

  - Depends on the structure of time variation

- **.darkorange[Regression to the mean]**:

  - Both groups come from different populations
  - Particularly salient when matching on previous outcomes and a small number of pre-periods (sketched below).
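
.small[A sketch of the regression-to-the-mean problem (simulated data with zero true effect; values are illustrative, not the paper's DGP): matching on a single noisy pre-period outcome when the two groups come from different populations manufactures an "effect".]

```r
# Zero true effect, but treated and control units come from different populations
set.seed(2)
n      <- 2000
z      <- rep(c(1, 0), each = n)
y_pre  <- c(rnorm(n, mean = 1), rnorm(n, mean = 0))   # noisy pre-period outcome
y_post <- c(rnorm(n, mean = 1), rnorm(n, mean = 0))   # same means again: true effect = 0

# Match each treated unit to the control with the closest pre-period outcome
ctrl <- y_pre[z == 0]
idx  <- sapply(y_pre[z == 1], function(v) which.min(abs(ctrl - v)))

# Matched DD is far from zero: the selected controls regress back to their own mean
(mean(y_post[z == 1]) - mean(y_post[z == 0][idx])) -
  (mean(y_pre[z == 1]) - mean(y_pre[z == 0][idx]))
```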
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/reg_to_the_mean.svg" alt="diagram" width="500"/>
]

---

# Using pre-trends as robustness check

- Let's use two points in the pre-intervention period: `\((t, t') \in (-\infty, T_0]\)`

- Define `\(\Delta\)` as the bias in the pre-intervention period:

$$
`\begin{aligned}
E[Y_{it'}(1)| Z = 1] - E[Y_{it'}(0)| Z = 0] -(E[Y_{it}(1)|Z=1] - E[Y_{it}(0)|Z = 0]) &=\\
\color{#ffffff}{\overbrace{\color{black}{E[g(X_{it'},u_{it'}, t')|Z=1] - E[g(X_{it'},u_{it'}, t')|Z=0]}}^{\color{#ffffff}{\Delta_{post}}}} -& \\
(\color{#ffffff}{\underbrace{\color{black}{E[g(X_{it},u_{it}, t)|Z=1] - E[g(X_{it},u_{it}, t)|Z=0]}}_{\color{#ffffff}{\Delta_{pre}}}}) &= \Delta
\end{aligned}`
$$

---

# Using pre-trends as robustness check

- Let's use two points in the pre-intervention period: `\((t, t') \in (-\infty, T_0]\)`

- Define `\(\Delta\)` as the bias in the pre-intervention period:

$$
`\begin{aligned}
E[Y_{it'}(1)| Z = 1] - E[Y_{it'}(0)| Z = 0] -(E[Y_{it}(1)|Z=1] - E[Y_{it}(0)|Z = 0]) &=\\
\color{#820a94}{\overbrace{\color{black}{E[g(X_{it'},u_{it'}, t')|Z=1] - E[g(X_{it'},u_{it'}, t')|Z=0]}}^{\color{#820a94}{\Delta_{post}}}} -& \\
(\color{#F89441}{\underbrace{\color{black}{E[g(X_{it},u_{it}, t)|Z=1] - E[g(X_{it},u_{it}, t)|Z=0]}}_{\color{#F89441}{\Delta_{pre}}}}) &= \Delta
\end{aligned}`
$$

--

- `\(\Delta \neq 0\)` due to `\(X\)`, `\(u\)`, or both.

- Find `\(\mathbf{X} = \mathbf{X^T}\)`, so that `\(\Delta = 0\)` (i.e. assume no differential time-varying unobserved confounder)

---

# How does matching help?

- If **.darkorange[X is constant or evolves similarly for both groups]**:

  - Matching on pre-treatment covariates can **.darkorange[eliminate/significantly reduce bias]**

--

<br>
<br>

- If **.darkorange[X evolves differently over time for Z=0 and Z=1]**:

  - Matching on pre-treatment covariates **.darkorange[still returns a biased estimate]**

--

<br>
<br>

.box-6trans[Can we do something about it?]

---

# Matching on post-intervention covariates

- Adjusting for post-intervention covariates can **.darkorange[introduce bias]** (Rosenbaum, 1984)

  - E.g. collider bias

--

<br>
<br>

- If the pre-parallel trends test **.darkorange[fails]** (i.e. `\(\Delta \neq 0\)`) `\(\rightarrow\)` match on **.darkorange[post-intervention]** covariates for bounding the direct effect (DE) and indirect effect (IE) .small[(Hong, Yang, & Qin, 2021)]

--

<br>
<br>

- Decompose ATT as:

  - Direct effect: `\(E[Y(1, X(0)) - Y(0,X(0))| Z=1]\)`
  - Indirect effect: `\(E[Y(1,X(1)) - Y(1, X(0))| Z=1]\)`

--

<br>
<br>

- Using different values of the *potential* correlation between `\(X_{t'}(0)\)` and `\(X_{t'}(1)\)` `\(\rightarrow\)` conditional distribution of `\(X_{t'}(0)\)` as a function of `\(X_t\)` and `\(X_{t'}(1)\)`.

---

# How do we match?

- Match on covariates or outcomes? Levels or trends?

- Propensity score matching? Optimal matching? etc.

--

This paper:

- **.darkorange[Match on covariates]** that could make groups behave differently.

  - Use distribution of covariates to match on a template (for post-treat covariates).
- Use of **.darkorange[Mixed-Integer Programming (MIP) Matching]** .small[(Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020)]:

  - Balances covariates directly
  - Yields the largest matched sample under balancing constraints (cardinality matching)
  - Works with large samples

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Simulations
<br>
<br>
]

---

# Different scenarios

.pull-left[
**Time-invariant covariates:**

.box-1trans[S1: Time-invariant covariate effect]

.box-2trans[S2: Time-varying covariate effect]

.box-3trans[S3: Treatment-independent covariate]
]

--

.pull-right[
**Time-varying covariates:**

.box-5trans[S4: Parallel evolution]

.box-6trans[S5: Evolution differs by group]

.box-7trans[S6: Evolution diverges in post]
]

<br>
<br>

.source[Following Zeldow & Hatfield (2021)]

---

.center[
![:scale 90%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/sree_20210927/images/zeldow_hatfield_table.png)
]

.source[Table 1 from Zeldow & Hatfield (2021)]

---

# Different ways to control

.small[
<div class="center"><table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Pseudo <code class="remark-inline-code">R</code> code</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Simple</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t)</code> </td>
    </tr>
    <tr>
      <td>Covariate Adjusted (CA)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t + x)</code> </td>
    </tr>
    <tr>
      <td>Time-Varying Adjusted (TVA)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t*x)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat outcomes</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=out.match)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat 1st diff</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=out.lag.match)</code> </td>
    </tr>
    <tr>
      <td>Match on pre-treat cov (PS)</td>
      <td><code class="remark-inline-code">lm(y ~ a*p + t, data=cov.match)</code> </td>
    </tr>
    <tr>
      <td id="highlight">Match on pre-treat cov (MIP)</td>
      <td id="highlight"><code class="remark-inline-code">Event study (data=cov.match.mip)</code></td>
    </tr>
    <tr>
      <td id="highlight">Match on all cov (MIP)</td>
      <td id="highlight"><code class="remark-inline-code">Event study (data=cov.match.mip.all)</code></td>
    </tr>
  </tbody>
</table>
</div>
]

.bottom[
.source[Following Zeldow & Hatfield (2021)]]

---

# Parameters:

.center[

Parameter                            | Value
-------------------------------------|----------------------------------------------
Number of obs (N)                    | 1,000
`Pr(Z=1)`                            | 0.5
Time periods (T)                     | 10
Last pre-intervention period (T_0)   | 5
Matching PS                          | Nearest neighbor
MIP Matching tolerance               | .05 SD
Number of simulations                | 1,000

]

- Estimate compared to sample ATT (_different for matching_)

- When matching with post-treat covariates `\(\rightarrow\)` compared with direct effect `\(\tau\)`

---

# Results: Time-constant covariates

<img src="mbennett_did_files/figure-html/res1-1.svg" style="display: block; margin: auto;" />

---

# Results: Time-varying covariates

<img src="mbennett_did_files/figure-html/res2-1.svg" style="display: block; margin: auto;" />

---

# Other simulations

- Test **.darkorange[regression to the mean]** under no effect:

  - Vary autocorrelation of `\(X_i(t)\)` (low vs. high)
  - `\(X_0(t)\)` and `\(X_1(t)\)` come from the same or different distribution (see the sketch below).

- Similar conclusions to Ham & Miratrix (2022) regarding the reliability with which X predicts Y.
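
A schematic of the covariate process used for this check (a sketch with illustrative parameter values, not the paper's exact DGP): an AR(1) covariate whose autocorrelation and group-specific mean can be varied.

```r
# AR(1) covariate around a group-specific mean; rho controls the autocorrelation
sim_x <- function(n, periods = 10, rho = 0.9, mu_z = c(0, 0)) {
  z  <- rep(c(0, 1), each = n / 2)
  mu <- mu_z[z + 1]                             # group-specific long-run mean
  x  <- matrix(NA_real_, n, periods)
  x[, 1] <- rnorm(n, mean = mu)
  for (t in 2:periods) {
    x[, t] <- mu + rho * (x[, t - 1] - mu) + rnorm(n, sd = sqrt(1 - rho^2))
  }
  list(z = z, x = x)
}

low_rho_same  <- sim_x(1000, rho = 0.2)                   # mostly noise, same distribution
high_rho_diff <- sim_x(1000, rho = 0.9, mu_z = c(0, 1))   # sticky, different distributions
```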
<img src="mbennett_did_files/figure-html/res3-1.svg" style="display: block; margin: auto;" />

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Application
<br>
<br>
]

---

# Preferential Voucher Scheme in Chile

- Universal **.darkorange[flat voucher]** scheme `\(\stackrel{\mathbf{2008}}{\mathbf{\longrightarrow}}\)` Universal + **.darkorange[preferential voucher]** scheme

- Preferential voucher scheme:

  - Targeted to the 40% most vulnerable students
  - Additional 50% of voucher per student
  - Additional money for concentration of SEP students.

--

<br/>

.pull-left[
.center[
.box-6trans[**Students:**<br/>- Verify SEP status<br/>- Attend a SEP school]
]
]

.pull-right[
.center[
.box-6trans[**Schools:**<br/>- Opt into the policy<br/>- No selection, no fees<br/>- Resources ~ performance]
]
]

---

# Impact of the SEP policy

- **.darkorange[Positive impact on test scores]** for lower-income students (Aguirre, 2019; Nielson, 2016)

- Design could have **.darkorange[increased]** socioeconomic segregation

  - Incentives for concentration of SEP students
  - Key decision variables for schools: Performance, current SEP students, competition, add-on fees.

- **.darkorange[Diff-in-diff (w.r.t. 2007) for SEP and non-SEP schools]**:

  - Only for **.darkorange[private-subsidized schools]**
  - Matching between 2005-2007 `\(\rightarrow\)` Effect estimated for 2008-2011
  - Outcome: Average students' household income

---

# Before Matching

.pull-left[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/pta_all.svg" alt="diagram" width="800"/>
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/dd_all.svg" alt="diagram" width="800"/>
]

---

# Matching + DD

- **.darkorange[Prior to matching]**: No parallel pre-trend

- **.darkorange[Different types of schools]**:

  - Schools that charge high co-payment fees.
  - Schools with a low number of SEP students enrolled.

- **.darkorange[MIP Matching]** using constant or "sticky" covariates (an R sketch is in the appendix):

  - Mean balance (0.05 SD): Rural, enrollment, number of schools in county, charges add-on fees
  - Fine balance: Test scores, monthly average voucher.

---

# After matching

.pull-left[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/pta_match.svg" alt="diagram" width="800"/>
]

.pull-right[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/udp_20220623/images/dd_match.svg" alt="diagram" width="800"/>
]

---

# Results

- **.darkorange[Matched schools]**:

  - More vulnerable and with lower test scores than the population mean.

- **.darkorange[6% increase in the income gap]** between SEP and non-SEP schools in matched DD:

  - SEP schools attracted even more vulnerable students.
  - Non-SEP schools increased their average family income.

--

- There is a need to **.darkorange[evaluate the policy as a whole]**.

  - Unintended consequences also matter.

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Let's wrap it up
<br>
<br>
]

---

# Conclusions & Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Partial identification]** can also be useful

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.
- Next steps: Sensitivity analysis to hidden bias
]

.pull-right[
![](https://media.giphy.com/media/drwxYI2fxqQGqRZ9Pe/giphy.gif)
]

---

# Conclusions & Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Partial identification]** can also be useful

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.

- Next steps: Sensitivity analysis to hidden bias

.box-6trans[Match well and match smart!]
]

.pull-right[
![](https://media.giphy.com/media/drwxYI2fxqQGqRZ9Pe/giphy.gif)
]

---
class: inverse, center, middle

<br>
<br>
<br>
<h1 class="title-own">A Difference-in-Differences Approach<br/>using Mixed-Integer Programming Matching</h1>
<br>
<br>
.small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ]
<br>
.small[Universidad Diego Portales<br>June 23rd, 2022]

---

# Time-invariant covariates

`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`

`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`

--

<br>
<br>

.box-1b[S1) Time-invariant covariate effect: g(x<sub>i</sub>,t) = 0]

.box-2b[S2) Time-varying covariate effect: g(x<sub>i</sub>,t) ≠ 0]

.box-3b[S3) Treatment-independent covariate: m(z<sub>i</sub>) = μ and v(z<sub>i</sub>) = σ]

---

# Time-varying covariates

`$$X_{it} = x_{(t-1)i} + h(z_i,t)\cdot r_i + m(z_i,t)$$`

`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`

--

<br>
<br>

.box-4b[S4) Parallel evolution: h(z<sub>i</sub>,t) = h(t) and m(z<sub>i</sub>,t) = 0]

.box-6b[S5) Evolution differs by group: m(z<sub>i</sub>,t) = 0]

.box-7b[S6) Evolution diverges in post: h(z<sub>i</sub>,t) = h(t) and m(z<sub>i</sub>,t) = Post*m(z<sub>i</sub>,t)]

---

# Time-invariant Covariates

.box-1a.medium.sp-after-half[S1: Time-invariant covariate effect]

.small[
`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t),1)$$`]

--

.box-2a.medium.sp-after-half[S2: Time-varying covariate effect]

.small[
`$$X_i \stackrel{ind}{\sim} N(m(z_i),v(z_i))$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-3a.medium.sp-after-half[S3: Treatment-independent covariate]

.small[
`$$X_i \stackrel{ind}{\sim} N(1,1)$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

---

# Time-varying Covariates

.box-4a.medium.sp-after-half[S4: Parallel evolution]

.small[
`$$X_{it} = x_{(t-1)i} + m_1(t)\cdot z$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-6a.medium.sp-after-half[S5: Evolution differs by group]

.small[
`$$X_{it} = x_{(t-1)i} + m_2(z_i,t)\cdot z$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

--

.box-7a.medium.sp-after-half[S6: Evolution diverges in post]

.small[
`$$X_{it} = x_{(t-1)i} + m_1(t)\cdot z - m_3(z_i,t)$$`
`$$Y_i(t) \stackrel{ind}{\sim} N(1+z_i+treat_{it}+u_i+x_i+f(t)+g(x_i,t),1)$$`]

---

# Covariate evolution: Time-invariant

.center[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/cov_1.svg" alt="diagram" width="900"/>]

---

# Covariate evolution: Time-varying

.center[
<img src="https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/DD/bb_20201202/images/cov_2.svg" alt="diagram" width="900"/>]
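
---

# Appendix: MIP matching sketch in R

A minimal sketch of the cardinality-matching step, assuming the `cardmatch` interface of the `designmatch` R package .small[(Zubizarreta, 2015)]. Data, covariates, tolerances, and solver options are illustrative, not the paper's exact specification.

```r
library(designmatch)

# Hypothetical school-level data (one row per school)
set.seed(3)
n   <- 500
dat <- data.frame(
  sep              = rbinom(n, 1, 0.4),               # 1 = SEP school (treated)
  rural            = rbinom(n, 1, 0.3),
  enrollment       = rpois(n, 300),
  n_schools_county = rpois(n, 15),
  copay            = rbinom(n, 1, 0.5),               # charges add-on fees
  score_cat        = sample(1:4, n, replace = TRUE),  # coarsened test scores
  voucher_cat      = sample(1:3, n, replace = TRUE)   # coarsened avg. voucher
)

dat   <- dat[order(-dat$sep), ]                       # treated units must come first
t_ind <- dat$sep

# Mean balance at a 0.05 SD tolerance on "sticky" covariates
mom_covs <- as.matrix(dat[, c("rural", "enrollment", "n_schools_county", "copay")])
mom      <- list(covs = mom_covs, tols = 0.05 * apply(mom_covs, 2, sd))

# Fine balance on the coarsened covariates
fine <- list(covs = as.matrix(dat[, c("score_cat", "voucher_cat")]))

out <- cardmatch(t_ind, mom = mom, fine = fine,
                 solver = list(name = "glpk", approximate = 1, t_max = 60, trace = 0))

matched <- dat[c(out$t_id, out$c_id), ]               # largest sample meeting the constraints
```

The matched sample then feeds the DD / event-study estimation (e.g. `lm(y ~ sep*post + factor(year), data = matched)`, in the spirit of the comparison table above).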