class: center, middle, inverse, title-slide

# How far is too far?<br>Generalization of a regression discontinuity design away from the cutoff

## Magdalena Bennett<br>The University of Texas at Austin

### UC San Diego Econometrics Seminar
### Apr 20, 2021

---

# Regression discontinuity design

<img src="mbennett_grd_files/figure-html/rd_intro1-1.svg" style="display: block; margin: auto;" />

---

# Very popular design for causal inference

<img src="mbennett_grd_files/figure-html/rd_publications-1.svg" style="display: block; margin: auto;" />

---

# Strong internal validity

<img src="mbennett_grd_files/figure-html/rd_intro2-1.svg" style="display: block; margin: auto;" />

---

# ... But limited external validity

<img src="mbennett_grd_files/figure-html/rd_intro3-1.svg" style="display: block; margin: auto;" />

---

# Missing data and no overlap

<img src="mbennett_grd_files/figure-html/rd_intro4-1.svg" style="display: block; margin: auto;" />

---

# Can we find a generalization interval?

<img src="mbennett_grd_files/figure-html/rd_intro5-1.svg" style="display: block; margin: auto;" />

---

# This paper

**Identification of a generalization interval and estimation of the ATT for the population within that interval:**

--

- The pre-intervention period informs the generalization bandwidth <br> .tiny[(Wing & Cook, 2013; Keele, Small, Hsu, & Fogarty, 2020)]

- Predictive covariates break the link between the running variable and the outcome <br> .tiny[(Angrist & Rokkanen, 2015; Rokkanen, 2015; Keele, Titiunik, & Zubizarreta, 2015)]

- Builds on the idea of local randomization near the cutoff <br> .tiny[(Lee, 2008; Cattaneo, Frandsen, & Titiunik, 2015)]

---

# This paper

**Main advantages:**

--

- Gradual approach
    - No need for "All or Nothing"
    - Interval informed by the data .tinylist[(Cattaneo et al., 2015)]

--

- No extrapolation of population characteristics
    - Compares like with like .tinylist[(Rosenbaum, 1987)]
    - Makes the overlap region explicit

--

- Generalization to a population of interest
    - Uses representative template matching .tinylist[(Silber et al., 2014; Bennett, Vielma, & Zubizarreta, 2020)]

--

- Sensitivity analysis to hidden bias .tiny[(Rosenbaum, 2010; Keele et al., 2020)]

---

# Outline

1. Motivation <br> <br>

2. Generalized Regression Discontinuity Design (GRD)<br>
    2.1 Framework<br>
    2.2 GRD in practice <br> <br>

3. Application: Free Higher Education in Chile <br> <br>

4. Conclusions

---
background-position: 50% 50%
class: left,middle, inverse, nonum

.big[
Generalized Regression Discontinuity Design
]

---

# Generalized Regression Discontinuity Design (GRD)

--

Two-part problem with pre- and post-intervention periods:

<br>

--

<br>

<span class="box-5">1) Identification of the generalization interval H<sup>\*</sup></span> .box-5t[(using the pre-intervention period)]

--

<br>

<span class="box-5">2) Estimation of the ATT for the population within H<sup>\*</sup></span> .box-5t[(using the post-intervention period)]

---

# The setup

- **Two periods**: pre- and post-intervention, `\(t=0\)` and `\(t=1\)`.

- The running variable `\(R\)` determines assignment `\(Z\)` in `\(t=1\)`. E.g.:

`$$Z_{it} = \mathrm{I}(R_{it}<c)$$`

- Potential outcomes under treatment status `\(z = 0,1\)`:

`$$Y^{(z)}_{it} = g_z(\mathbf{X}_{it},\mathbf{u}_{it},r_{it}) + z_{it}\cdot \underbrace{\tau(\mathbf{X}_{it},\mathbf{u}_{it},r_{it})}_{\color{#E16462}{\style{font-family:inherit}{\text{Treat. Effect}}}} + \underbrace{\alpha_t}_{\color{#E16462}{\style{font-family:inherit}{\text{Period FE}}}}$$`

    - `\(\mathbf{X}_{it}\)`: Predictive covariates
    - `\(\mathbf{u}_{it}\)`: Unobserved confounders
    - `\(\tau(\cdot)\)`: Causal effect

---

# Two periods for GRD

<img src="mbennett_grd_files/figure-html/grd_setup-1.svg" style="display: block; margin: auto;" />

---

# A gradual approach

- Conditional expectations of potential outcomes, `\(Y^{(z)}_t(R)\)`:

`$$Y^{(0)}_0(R) = \mathbb{E}[Y^{(0)}_{i0}|R] = \mu_0(R)$$`

`$$Y^{(1)}_0(R) = \mathbb{E}[Y^{(1)}_{i0}|R] = \underbrace{\mu_0(R)}_{\color{#E16462}{\style{font-family:inherit}{\text{Avg. Outcome by R}}}} + \underbrace{\tau_0(R)}_{\color{#E16462}{\style{font-family:inherit}{\text{Treat. Effect by R}}}}$$`

--

- Identify a generalization interval `\(H = [H_{-},H_{+}]\)` for `\(t=0\)` such that:

`$$R_{i} = h(\mathbf{X}_{i}) + \eta_i \ \ \forall \ R_i \in H$$`

--

- If the widest such interval, `\(H^* = \arg\max_{H} |H|\)`, exists, then for a set of covariates `\(\mathbf{X} = \mathbf{X}_T\)`:

`$$Y^{(0)}_0(R')|\mathbf{X}_T = Y^{(0)}_0(R'')|\mathbf{X}_T \ \ \ \ \ \ \style{font-family:inherit}{\text{for any}} \ R', R'' \in H^*$$`

---

# Conditional outcome within the generalization interval

<img src="mbennett_grd_files/figure-html/grd_setup_pre-1.svg" style="display: block; margin: auto;" />

---

# Main assumption for generalization to `\(t=1\)`

<br>
<br>

.box-0[<b>Assumption: Conditional time invariance under control</b> `$$Y^{(0)}_0(R|\mathbf{X}) = Y^{(0)}_1(R|\mathbf{X}) + \alpha, \ \ \ \forall R \in H^*$$`]

--

- No changes in unobserved confounders between `\(t=0\)` and `\(t=1\)` for units within `\(H^\ast\)`

- Partially testable for `\(Z=0\)` in `\(t=1\)`

---

# Estimating an effect away from the cutoff

<img src="mbennett_grd_files/figure-html/grd_setup_post-1.svg" style="display: block; margin: auto;" />

---
background-position: 50% 50%
class: left,middle, inverse, nonum

.big[
GRD in practice
]

---

# Context: Traditional matching

.center[
![:scale 60%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/UCSD_20210420/images/diagram1_v2.svg)
]

---

# Context: Representative template matching with two samples

.center[
![:scale 60%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/UCSD_20210420/images/diagram2_v3.svg)
]

---

# Context: Representative template matching

.center[
![:scale 60%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram3v2.svg)
]

---

# Diagram for GRD

.center[
![:scale 80%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd.png)
]

---

# Step 0: Identification of narrow interval
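The criterion behind the generalization interval, that inside it the predictive covariates absorb the correlation between `\(R\)` and the outcome, can be illustrated with a small simulation. This is a minimal Python sketch under assumed choices that do not come from the paper: a single covariate `x`, a linear `h(x) = 2x`, a control outcome that depends on `R` only through `x`, and a hypothetical candidate interval `(-1, 1)`. It is a regression illustration, not the GRD matching procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Assumed data-generating process (illustrative only, not from the paper)
x = rng.normal(size=n)                    # predictive covariate X
eta = rng.normal(scale=0.5, size=n)       # part of R unrelated to the outcome
r = 2.0 * x + eta                         # R = h(X) + eta, with h(x) = 2x
y0 = 1.0 + 3.0 * x + rng.normal(size=n)   # control outcome depends on R only via X


def slope_on_r(y, r, x=None):
    """OLS coefficient on r, optionally adjusting for the covariate x."""
    cols = [np.ones_like(r), r] if x is None else [np.ones_like(r), r, x]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]


in_h = (r > -1.0) & (r < 1.0)             # hypothetical candidate interval H
slope_raw = slope_on_r(y0[in_h], r[in_h])
slope_cond = slope_on_r(y0[in_h], r[in_h], x[in_h])
print(f"slope on R, ignoring X:  {slope_raw:.2f}")   # clearly positive, roughly 1.4
print(f"slope on R, adjusting X: {slope_cond:.2f}")  # close to zero
```

A flat slope after adjusting for `x` is the pattern that supports extending the interval; if the covariates did not absorb the correlation, the slope would stay away from zero, which is the second failure mode listed under Step 5b. The slides themselves use matched samples and a local polynomial (Steps 2 and 3) rather than a single regression.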
.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd1.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd1-1.svg" style="display: block; margin: auto;" />
]

---

# Step 1: Template selection

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd2.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2a-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2b-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2c-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2d-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2e-1.svg" style="display: block; margin: auto;" />
]

---

# Step 2: Matching units in pre-intervention period

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd3.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd2f-1.svg" style="display: block; margin: auto;" />
]

---

# Step 3: Fit a local polynomial

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd4a.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd3-1.svg" style="display: block; margin: auto;" />
]

---

# Step 4: Identify new generalization interval

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd4b.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd4-1.svg" style="display: block; margin: auto;" />
]

---

# Step 5: If generalization interval > template...

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd5.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd5-1.svg" style="display: block; margin: auto;" />
]

---

# Step 5a: ... expand template and do it again.

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd6.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd5a-1.svg" style="display: block; margin: auto;" />
]

---

# Step 5b: ... until template is the same as the interval

.pull-left[
.center[
![](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/diagram_grd7.png)
]
]

.pull-right[
<img src="mbennett_grd_files/figure-html/grd5b-1.svg" style="display: block; margin: auto;" />
]

---

# Step 5b: ... until template is the same as the interval

.pull-left[
This can break in two ways:

<br>
<br>

.box-0[1) No overlap of covariates]

<br>

.box-0[.center[2) Predictive covariates don't explain the correlation between R and Y]]
]

.pull-right[
]

---

# Step 6: ATT estimation

<img src="mbennett_grd_files/figure-html/grd6-1.svg" style="display: block; margin: auto;" />

---

# Step 6: ATT estimation

Straightforward estimation given the matched sample:

- E.g., a paired t-test on the within-group differences `\(d_k\)`:

`$$\hat{\tau}_{ATT}= \sum_{k=1}^N\frac{Y_{k(1)1} - Y_{k(0)1} - (Y_{k(1)0} - Y_{k(0)0})}{N} = \sum_{k=1}^N\frac{d_k}{N}$$`

`\(Y_{k(z)t}\)`: Outcome within matched group `\(k\)` with treatment `\(z\)` for period `\(t\)`.

---
background-position: 50% 50%
class: left, middle, inverse, nonum

.big[
Application: Free Higher Education <br>
]

---

# Free Higher Education (FHE) in Chile

**Context of higher education in Chile:**

- Centralized admission system (deferred acceptance mechanism)
- Admission score: PSU score + GPA score + ranking score
- Before 2016: Scholarships + government-backed loans

**FHE policy:**

- Introduced in December 2015 (unanticipated)
- Eligibility: Bottom 50% of the income distribution + admitted to an eligible program

---

# Research question

<br>

.box-5[What was the effect of being eligible for FHE on applying to and enrolling in university?]

--

<br>

- **Treatment:** SE eligibility for FHE

- **Two outcomes:** Application to university and enrollment
    - Lower-income students `\(\rightarrow\)` financial constraints
    - Salience of the policy

--

- **Larger effects for students away from the cutoff?**
    - Compare RD to GRD results

---

# What data do I have?

<br>
<br>

- **3 cohorts:** 2014, 2015, and 2016 (~200,000 students)

- **Rich baseline data:** Demographic and socioeconomic data at the student level, 10th (8th) grade standardized scores, school characteristics.

- **Application data:** Scores by subject, application, and enrollment.

---

# What does the RD look like?
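The RD estimate that GRD is compared against can be sketched as a local linear regression on each side of the cutoff. This minimal Python sketch assumes a triangular kernel, a fixed user-chosen bandwidth `h`, and simulated data; it is a textbook sharp-RD estimator, not necessarily the specification behind the RD results in this talk. Because `\(Z = \mathrm{I}(R < c)\)`, the effect is the limit from the left minus the limit from the right.

```python
import numpy as np


def rd_local_linear(y, r, c, h):
    """Sharp RD estimate at cutoff c for treatment Z = 1(R < c).

    Fits a weighted linear regression on each side of c with triangular
    kernel weights and returns (left intercept) - (right intercept).
    """
    def boundary_value(mask):
        d = r[mask] - c
        w = 1.0 - np.abs(d) / h                   # triangular kernel weights
        X = np.column_stack([np.ones_like(d), d])
        sw = np.sqrt(w)                           # WLS via sqrt-weighted rows
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return beta[0]                            # fitted value at R = c

    treated = (r < c) & (r > c - h)               # eligible side, within bandwidth
    control = (r >= c) & (r < c + h)
    return boundary_value(treated) - boundary_value(control)


# Simulated example with a true jump of 0.3 for units below the cutoff.
rng = np.random.default_rng(1)
r = rng.uniform(-3, 3, size=20_000)
y = 0.5 + 0.2 * r + 0.3 * (r < 0) + rng.normal(scale=0.1, size=r.size)
tau_rd = rd_local_linear(y, r, c=0.0, h=1.0)
print(f"RD estimate: {tau_rd:.3f}")
```

This estimate only describes units at the cutoff; the point of GRD is to recover effects for units further away within `\(H^*\)`.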
.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/rd_application-1.svg)
]

---

# GRD for Free Higher Education

**Steps for GRD:**

- Select template size: `\(N = 1{,}000\)`
- 20 bins for the grid
- Mixed-integer programming (MIP) matching:
    - Restricted mean balance (0.05 SD): Academic performance, school characteristics, demographic/socioeconomic variables.
    - Fine balance: Gender, mother's and father's education (8 cat.), PSU language score (deciles), PSU math score (deciles), HS GPA (quintiles).

--

**Generalization interval: [-M$500.3, M$300.9]**

---

# For what population are we generalizing?

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/rd_grd_application-1.svg)
]

---

# For what population are we generalizing?

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/rd_grd_application2-1.svg)
]

---

# Testing time invariance on the control side for `\(t=1\)`

.center[
![:scale 90%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/test_y1.png)]

---

# Balance for the entire sample

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/balance_all-1.svg)]

---

# Balance within generalization interval (before matching)

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/balance_before-1.svg)]

---

# Balance within generalization interval (after matching)

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/balance_after-1.svg)]

---

# Effects of introduction of FHE: RD and GRD

.small[
.source[Note: Generalization interval \[-M$500, M$301\]. 95% CI in brackets.]
]

---

# Effects of introduction of FHE: Application

.small[
.source[Note: Generalization interval \[-M$500, M$301\]. 95% CI in brackets.]
]

---

# Effects of introduction of FHE: Enrollment

.small[
.source[Note: Generalization interval \[-M$500, M$301\]. 95% CI in brackets.]
]

---

# How does the effect change with interval width?

.center[
![:scale 90%](https://raw.githubusercontent.com/maibennett/presentations/main/content/presentations/RD/IC_20210312/images/effect_by_h.png)]

---

# Sensitivity analysis to hidden bias

- How much would an unmeasured confounder have to shift the probabilities of assignment to change the qualitative results of the study?

- We follow the difference-in-differences (DD) adaptation of Rosenbaum bounds (Keele, Small, Hsu, & Fogarty, 2019).

.center[
.small[
]]

---

# Conclusions

- GRD as a **gradual approach** (not all or nothing)

--

<br>
<br>

- Uses data to inform the generalization interval

--

<br>
<br>

- Uses matching to avoid extrapolation

--

<br>
<br>

- Limitations:
    - Requires more data: two periods
    - Conditional time invariance assumption for `\(t=1\)`

--

<br>
<br>

- Multiple applications for DD-RD designs, e.g., geographic RDs.

---
background-position: 50% 50%
class: center, middle, inverse, nonum

.big[
How Far is too Far? Generalization of a Regression Discontinuity Design Away from the Cutoff
]

<br>
<br>

A new version of the paper is coming soon.

<a href="https://magdalenabennett.com">www.magdalenabennett.com</a>
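---

# Backup: Computing the paired ATT estimate

The Step 6 estimator averages the within-group difference-in-differences `\(d_k\)`, and a paired t-test asks whether their mean is zero. This minimal Python sketch assumes each matched group `\(k\)` is summarized by four outcomes, `\(Y_{k(1)1}, Y_{k(0)1}, Y_{k(1)0}, Y_{k(0)0}\)`, uses hypothetical toy values, and computes the usual one-sample t statistic with standard error `\(s_d/\sqrt{N}\)`. It is an illustration, not the paper's code.

```python
import math

import numpy as np


def paired_att(y_t_post, y_c_post, y_t_pre, y_c_pre):
    """Paired DiD estimate of the ATT over N matched groups.

    Arguments hold one outcome per matched group k:
    Y_{k(1)1}, Y_{k(0)1}, Y_{k(1)0}, Y_{k(0)0}.
    Returns (tau_hat, standard error, t statistic).
    """
    d = (np.asarray(y_t_post) - np.asarray(y_c_post)) - (
        np.asarray(y_t_pre) - np.asarray(y_c_pre)
    )                                      # d_k for each matched group
    n = d.size
    tau_hat = d.mean()                     # sum_k d_k / N
    se = d.std(ddof=1) / math.sqrt(n)      # standard error of the mean of d_k
    return tau_hat, se, tau_hat / se


# Hypothetical toy values for N = 4 matched groups.
tau_hat, se, t_stat = paired_att(
    y_t_post=[0.9, 0.8, 0.7, 0.6],
    y_c_post=[0.5, 0.5, 0.4, 0.4],
    y_t_pre=[0.6, 0.5, 0.5, 0.4],
    y_c_pre=[0.5, 0.4, 0.4, 0.3],
)
print(f"ATT = {tau_hat:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
# prints: ATT = 0.200, SE = 0.041, t = 4.90
```

With a real matched sample, p-values and confidence intervals follow from the t distribution with `\(N-1\)` degrees of freedom.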