class: inverse, center, middle <br> .left[ <h1 class="title-own">Beyond Exclusion:<br/>The Role of High-Stake Testing on Attendance the Day of the Test</h1>] .pull-left-little_l[ <br> <br> <br> <br> <br> <br> .left[.small[AEFP Conference<br>March 14th, 2024]]] .pull-right-little_l[ <br> .small[.right[Magdalena Bennett <br>*The University of Texas at Austin* ]] <br> .small[.right[Christopher Neilson <br>*Yale University* ]] <br> .small[.right[Nicolás Rojas <br>*Columbia University* ]] ] <br> <br> <br> --- # Motivation Results from **.coolblue[high-stakes tests]** are widely used in education policy - E.g. funding, promotions, school closures, school choice, etc. <br> <br> --- # Motivation Results from **.coolblue[high-stakes tests]** are widely used in education policy - E.g. funding, promotions, school closures, school choice, etc. <br> **.coolblue[Assumption]**: Standardized tests are used as a proxy for school quality <br> --- # Motivation Results from **.coolblue[high-stakes tests]** are widely used in education policy - E.g. funding, promotions, school closures, school choice, etc. <br> **.coolblue[Assumption]**: Standardized tests are used as a proxy for school quality <br> <br> **.coolblue[Multiple issues]** with the use of standardized tests: - Teaching to the test/explicit cheating. - Correlation with SES. - **.coolblue[Non-representative patterns of attendance.]** --- # Motivation Results from **.coolblue[high-stakes tests]** are widely used in education policy - E.g. funding, promotions, school closures, school choice, etc. <br> **.coolblue[Assumption]**: Standardized tests are used as a proxy for school quality <br> <br> **.coolblue[Multiple issues]** with the use of standardized tests: - Teaching to the test/explicit cheating. - Correlation with SES. - **.coolblue[Non-representative patterns of attendance.]** - Prior work: Re-classification of low-performers .small[(e.g. 
Figlio & Loeb, 2011; Figlio & Getzler, 2007; Cullen & Reback, 2006)]; Use of disciplinary measures .small[(Figlio, 2006)]; Distortions in quality signals .small[(Cuesta et al., 2020)] --- # This paper **.coolblue[Attendance Patterns]** - Event study approach: - *What do these exclusion patterns look like? Are they the same for every (type of) school and every grade?* - Focus beyond bottom performers - Robustness checks for alternative mechanisms -- **.coolblue[What can be done?]** - Machine learning prediction: - Identification of schools that are most likely gaming the system - Consequences of blanket policies in imputation of scores --- background-position: 50% 50% class: left, middle, inverse <br> <br> <br> <br> <br> <br> <br> .big[ Context of the Study and Data ] --- # The Chilean context: Standardized testing - Chile has a **.coolblue[universal voucher system]** (school choice) -- <br> <br> - Universal standardized testing since the 1980s (SIMCE) - For all 4th graders; then extended to other grades. 
-- <br> <br> - SIMCE as **.coolblue[high-stakes]** testing: - Results widely available in a universal voucher system - Tied to teachers' bonuses - Tied to budget restrictions and school closures --- # Data Available - **.coolblue[Standardized tests 2011-2018 (SIMCE)]** - Scores at student and school level for different subjects (Math, Language, History, and Science) - Students' socioeconomic characterization (parental questionnaire) -- <br> <br> - **.coolblue[Daily attendance data 2011-2018 (SIGE)]** - Used for voucher payments (each day has ~2.5 million records) -- <br> <br> - **.coolblue[GPA Performance 2011-2018 (Rendimiento)]** - Use GPA performance deciles within school-grade --- background-position: 50% 50% class: left, middle, inverse <br> <br> <br> <br> <br> <br> <br> .big[ Attendance Patterns for the Day of the Test ] --- # Empirical approach for difference in attendance - Event study centered around the day of the test (T=0): `$$Y_{ipsgt} = \sum_{P=1}^5\sum_{T=-4}^5 \tau^{PT}D^{PT}_{ipsgt} + \gamma_{pt} +\alpha_i + \epsilon_{ipsgt}$$` Where - `\(Y_{ipsgt}\)`: Binary attendance for student `\(i\)`, from GPA group `\(p\)`, in school `\(s\)` and grade `\(g\)`, for day `\(t\)`. - `\(D^{PT}_{ipsgt}\)`: Indicator variables (lags and leads) for students that belong to a tested grade. 
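A minimal sketch of the event-study logic above, under stated assumptions: we read `\(\gamma_{pt}\)` as GPA-group-by-day effects and `\(\alpha_i\)` as student effects (the slide does not define them), we treat days outside the window as the reference period, and we assume a balanced panel. Under those assumptions, each `\(\tau^{PT}\)` reduces to a double difference of attendance means (tested vs. untested grade, event day vs. reference days). The function names and the toy numbers are hypothetical, not the authors' code or data.

```python
from collections import defaultdict


def event_study_effects(rows, window=range(-4, 6)):
    """Estimate tau^{PT} as a double difference of attendance means.

    rows: iterable of (student_id, gpa_group, tested, day, attended), where
    `tested` marks students in a tested grade and `day` is relative to the
    test (T = 0). Days outside `window` act as the reference period, which
    is an assumption of this sketch. In a balanced panel the student effect
    drops out of the double difference, so student_id is not used.
    """
    window = set(window)
    cells = defaultdict(lambda: [0, 0])  # (group, tested, day) -> [present, n]
    for _student_id, group, tested, day, attended in rows:
        cell = cells[(group, bool(tested), day)]
        cell[0] += attended
        cell[1] += 1

    groups = sorted({g for g, _, _ in cells})
    days = sorted({d for _, _, d in cells})
    ref_days = [d for d in days if d not in window]
    event_days = [d for d in days if d in window]

    def mean(group, tested, which_days):
        present = sum(cells[(group, tested, d)][0] for d in which_days)
        n = sum(cells[(group, tested, d)][1] for d in which_days)
        return present / n

    tau = {}
    for g in groups:
        base_tested = mean(g, True, ref_days)
        base_untested = mean(g, False, ref_days)
        for t in event_days:
            tau[(g, t)] = (mean(g, True, [t]) - base_tested) - (
                mean(g, False, [t]) - base_untested
            )
    return tau


def toy_panel():
    """Deterministic, balanced toy data (hypothetical numbers)."""
    rows = []
    for group in (1, 5):  # 1 = bottom GPA group, 5 = top GPA group
        for tested in (True, False):
            for day in range(-6, 6):  # days -6 and -5 are the reference
                present = 9  # 9 of 10 students attend on a regular day
                if tested and day == 0:
                    present = 7 if group == 1 else 10
                for k in range(10):
                    sid = f"{group}-{int(tested)}-{k}"
                    rows.append((sid, group, tested, day, int(k < present)))
    return rows


tau = event_study_effects(toy_panel())
for key in [(1, 0), (5, 0), (1, -3)]:
    print(key, round(tau[key], 3))
```

The toy panel is built so that, on the test day only, the bottom group in the tested grade loses two of ten students and the top group gains one. The sketch returns point estimates only; standard errors would require a regression with an explicit clustering choice, which the slide does not specify.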
--- # Clear difference in attendance by performance for 2nd grade <img src="mbennett_attendance_files/figure-html/event_study_plot2nd-1.svg" style="display: block; margin: auto;" /> --- # No effect on lower performers for 10th grade <img src="mbennett_attendance_files/figure-html/event_study_plot10th-1.svg" style="display: block; margin: auto;" /> --- # Attendance patterns differ by grade <img src="mbennett_attendance_files/figure-html/event_study_plot-1.svg" style="display: block; margin: auto;" /> --- # Potential mechanisms that explain these patterns <br> - Students are **.coolblue[excluded due to other reasons]** (justified) - Low-performing students are **.coolblue[less aware of the test]** - Students experience a **.coolblue[disutility from testing]** - Schools directly **.coolblue[(dis)incentivize attendance of (lower) higher performers]** --- # Differences in communication and incentives between high and low performers - 2017 survey for students in test-taking grades. <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <caption>Results for 4th Grade</caption> <thead> <tr> <th style="text-align:left;"> GPA Decile </th> <th style="text-align:center;"> Told </th> <th style="text-align:center;"> Notification </th> <th style="text-align:center;"> Preparation </th> <th style="text-align:center;"> Grades </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> D1 </td> <td style="text-align:center;"> -0.06*** </td> <td style="text-align:center;"> -0.11*** </td> <td style="text-align:center;"> -0.08*** </td> <td style="text-align:center;"> 0.14*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> <tr> <td style="text-align:left;"> D10 </td> <td 
style="text-align:center;"> 0.06*** </td> <td style="text-align:center;"> 0.05*** </td> <td style="text-align:center;"> 0.05*** </td> <td style="text-align:center;"> -0.2*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> <tr> <td style="text-align:left;"> Baseline </td> <td style="text-align:center;"> 0.89*** </td> <td style="text-align:center;"> 0.87*** </td> <td style="text-align:center;"> 0.89*** </td> <td style="text-align:center;"> 0.39*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> </tbody> </table> --- # Differences in communication and incentives between high and low performers - 2017 survey for students in test-taking grades. 
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <caption>Results for 10th Grade</caption> <thead> <tr> <th style="text-align:left;"> GPA Decile </th> <th style="text-align:center;"> Told </th> <th style="text-align:center;"> Notification </th> <th style="text-align:center;"> Preparation </th> <th style="text-align:center;"> Grades </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> D1 </td> <td style="text-align:center;"> -0.02*** </td> <td style="text-align:center;"> -0.01*** </td> <td style="text-align:center;"> -0.02*** </td> <td style="text-align:center;"> 0.05*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> <tr> <td style="text-align:left;"> D10 </td> <td style="text-align:center;"> 0.01*** </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> -0.03*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> <tr> <td style="text-align:left;"> Baseline </td> <td style="text-align:center;"> 0.95*** </td> <td style="text-align:center;"> 0.78*** </td> <td style="text-align:center;"> 0.82*** </td> <td style="text-align:center;"> 0.33*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.00) </td> </tr> </tbody> </table> --- # No evidence of self-selection from students because of testing .pull-left[ - Students experience a 
**.coolblue[disutility from testing]**? - Use of **.coolblue[no-stake test]** applied to schools (~300 schools per year) `\(\rightarrow\)` No effect on attendance ] .pull-right[ .small[ <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <caption>Results for No-Stakes Test</caption> <thead> <tr> <th style="text-align:left;"> Grade - Year </th> <th style="text-align:center;"> D1 </th> <th style="text-align:center;"> D2 </th> <th style="text-align:center;"> D3D8 </th> <th style="text-align:center;"> D9 </th> <th style="text-align:center;"> D10 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 2nd 2011 </td> <td style="text-align:center;"> -0.01 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 0.01* </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> </tr> <tr> <td style="text-align:left;"> 5th 2012 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> -0.01 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 0.01 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> </tr> <tr> <td style="text-align:left;"> 6th 2011 </td> <td style="text-align:center;"> 0.02* </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 0.01** </td> <td style="text-align:center;"> 0.01 
</td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> </tr> <tr> <td style="text-align:left;"> 6th 2017 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.02) </td> <td style="text-align:center;"> (0.02) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.02) </td> <td style="text-align:center;"> (0.01) </td> </tr> <tr> <td style="text-align:left;"> 11th 2012 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> 0.00 </td> <td style="text-align:center;"> -0.02** </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.00) </td> <td style="text-align:center;"> (0.01) </td> <td style="text-align:center;"> (0.01) </td> </tr> </tbody> </table> ] ] --- background-position: 50% 50% class: left, middle, inverse <br> <br> <br> <br> <br> <br> <br> .big[ Predicting the Counterfactual ] --- # How do these results compare to predicted counterfactual? - Can we use **.coolblue[this existing rich panel data]** to predict attendance on the day of the test *as if it was a regular day*? 
-- <br> <br> - Use **.coolblue[Extreme Gradient Boosting (XGBoost)]** with panel data for **.coolblue[attendance prediction]** - Predictor variables include day of the week, grade, GPA group, and sibling's attendance, in addition to schools' characteristics. -- <br> <br> - Use data for 1st through 5th grade (2017) in the Metropolitan region - 4th grade is treated: - Use data from **.coolblue[before]** the test to **.coolblue[predict attendance on the day of the test]**. --- # Overall predictions over performance distribution <img src="mbennett_attendance_files/figure-html/prediction_all_boxplot1-1.svg" style="display: block; margin: auto;" /> --- # Overall predictions over performance distribution <img src="mbennett_attendance_files/figure-html/prediction_all_boxplot2-1.svg" style="display: block; margin: auto;" /> --- # Example: Comparisons between schools? .pull-left[ .center[ .box-7trans[**School1**<br> .small[Math: 258<br> Language: 260]] ]] .pull-right[ .center[ .box-7trans[**School2**<br> .small[Math: 259<br> Language: 256]] ]] -- <img src="mbennett_attendance_files/figure-html/prediction_example-1.svg" style="display: block; margin: auto;" /> --- # Example: Comparisons between schools? .pull-left[ .center[ .box-7trans[**School1**<br> .small[Math: 258<br> Language: 260]] ]] .pull-right[ .center[ .box-7trans[**School2**<br> .small[Math: 259<br> Language: 256]] ]] <img src="mbennett_attendance_files/figure-html/prediction_example2-1.svg" style="display: block; margin: auto;" /> --- # Can we characterize these schools? .small[ - **.coolblue[K-means clustering]**: Use differences between predicted and observed attendance. 
- 2 optimal clusters] -- <img src="mbennett_attendance_files/figure-html/cluster_plot-1.svg" style="display: block; margin: auto;" /> --- # Schools that appear to exclude lower-performing students are also more vulnerable .smaller[ <table style='NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;' class="table lightable-paper"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Cluster 1<br>Increase att (N=1094)</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Cluster 2<br>Lower att (bottom) (N=346)</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Mean </th> <th style="text-align:center;"> Std. Dev. </th> <th style="text-align:center;"> Mean </th> <th style="text-align:center;"> Std. Dev. </th> <th style="text-align:center;"> Diff. in Means </th> <th style="text-align:center;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Avg. SIMCE Lang </td> <td style="text-align:center;"> 258.84 </td> <td style="text-align:center;"> 22.38 </td> <td style="text-align:center;"> 252.62 </td> <td style="text-align:center;"> 23.84 </td> <td style="text-align:center;"> -6.22 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Avg. 
SIMCE Math </td> <td style="text-align:center;"> 254.42 </td> <td style="text-align:center;"> 25.70 </td> <td style="text-align:center;"> 247.80 </td> <td style="text-align:center;"> 25.15 </td> <td style="text-align:center;"> -6.62 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Public </td> <td style="text-align:center;"> 0.35 </td> <td style="text-align:center;"> 0.48 </td> <td style="text-align:center;"> 0.42 </td> <td style="text-align:center;"> 0.49 </td> <td style="text-align:center;"> 0.07 </td> <td style="text-align:center;"> 0.03 </td> </tr> <tr> <td style="text-align:left;"> SEP status </td> <td style="text-align:center;"> 0.84 </td> <td style="text-align:center;"> 0.37 </td> <td style="text-align:center;"> 0.88 </td> <td style="text-align:center;"> 0.33 </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> 0.11 </td> </tr> <tr> <td style="text-align:left;"> % Priority Students </td> <td style="text-align:center;"> 0.48 </td> <td style="text-align:center;"> 0.19 </td> <td style="text-align:center;"> 0.52 </td> <td style="text-align:center;"> 0.19 </td> <td style="text-align:center;"> 0.04 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Diff D1 GPA </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> 0.15 </td> <td style="text-align:center;"> -0.22 </td> <td style="text-align:center;"> 0.27 </td> <td style="text-align:center;"> -0.24 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Diff D2 GPA </td> <td style="text-align:center;"> 0.05 </td> <td style="text-align:center;"> 0.11 </td> <td style="text-align:center;"> -0.17 </td> <td style="text-align:center;"> 0.21 </td> <td style="text-align:center;"> -0.22 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Diff D9 GPA </td> <td style="text-align:center;"> 0.04 </td> <td 
style="text-align:center;"> 0.06 </td> <td style="text-align:center;"> -0.03 </td> <td style="text-align:center;"> 0.15 </td> <td style="text-align:center;"> -0.07 </td> <td style="text-align:center;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Diff D10 GPA </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> 0.07 </td> <td style="text-align:center;"> -0.01 </td> <td style="text-align:center;"> 0.12 </td> <td style="text-align:center;"> -0.04 </td> <td style="text-align:center;"> 0.00 </td> </tr> </tbody> <tfoot><tr><td style="padding: 0; " colspan="100%"> <sup></sup> Note: Diff DX GPA represents the difference between obs. attendance and predicted attendance for decile X</td></tr></tfoot> </table> ] --- # Implications for imputation policies - **.coolblue[How to handle this absenteeism problem?]** - E.g.: Observed attendance (no imputation), attendance as if the test hadn't happened (impute "typical day"), or full attendance (everybody is present). -- <br> <br> - Proposals to impute **.coolblue[lowest scores for absent students]** to disincentivize arbitrary exclusion - Most vulnerable schools have higher absenteeism rates `\(\rightarrow\)` Increase inequality and non-representativeness --- # Some imputation exercises How can we **.coolblue[impute missing scores]**? - **.coolblue[Scenario 1]**: No imputation. Show observed distributions. - **.coolblue[Scenario 2]**: Impute by decile only for the difference between predicted and observed attendance. - Imputed score: (a) overall min, (b) decile min, (c) min school, or (d) min decile by school. - **.coolblue[Scenario 3]**: Impute every missing student. - Imputed score: overall min -- **.coolblue[Some caveats]**: - Difference between predicted and obs. captures total incentives/disincentives in attendance. - Imputed score might be too optimistic (e.g. 
real score would be lower than observed distribution) --- # Scenario 1 vs Scenario 3: No imputation and Impute all <img src="mbennett_attendance_files/figure-html/imputation_diff1-1.svg" style="display: block; margin: auto;" /> --- # Imputing Predicted - Observed is less extreme <img src="mbennett_attendance_files/figure-html/imputation_diff2-1.svg" style="display: block; margin: auto;" /> --- background-position: 50% 50% class: left, middle, inverse <br> <br> <br> <br> <br> <br> <br> .big[ Let's Wrap Up... ] --- # Conclusions and next steps - Non-representative patterns of absenteeism **.coolblue[beyond exclusion of low-performers]** - High heterogeneity between schools - Identification of gaming schools (e.g. persistence) -- <br> <br> - **.coolblue[Communication strategies]** play an important role for **.coolblue[lower-performing students]** -- <br> <br> - Impact of **.coolblue[imputation policies]**? - Work in progress: How do non-representativeness and different imputation strategies impact policies and information provision? What score do we impute and for whom? --- class: inverse, center, middle <br> .left[ <h1 class="title-own">Beyond Exclusion:<br/>The Role of High-Stake Testing on Attendance the Day of the Test</h1>] .pull-left-little_l[ <br> <br> <br> <br> <br> <br> .left[.small[AEFP Conference<br>March 14th, 2024]]] .pull-right-little_l[ <br> .small[.right[Magdalena Bennett <br>*The University of Texas at Austin* ]] <br> .small[.right[Christopher Neilson <br>*Yale University* ]] <br> .small[.right[Nicolás Rojas <br>*Columbia University* ]] ] <br> <br> <br>