Beyond Exclusion:
The Role of High-Stake Testing on Attendance the Day of the Test







PRIISM Seminar
April 3rd, 2024


Magdalena Bennett   
The University of Texas at Austin   


Christopher Neilson   
Yale University   


Nicolás Rojas   
Columbia University   




Motivation

  • Results from high-stakes tests widely used in education policy

  • Assumption: Standardized tests used as a proxy for school quality

  • Multiple issues with the use of standardized tests:

    • Teaching to the test/explicit cheating.

    • Correlation with SES.

    • Non-representative patterns of attendance.

Motivation

  • Prior literature related to student exclusion:

    • Reclassification of low-performers as students with disabilities (e.g. Figlio & Loeb, 2011; Figlio & Getzler, 2007; Cullen & Reback, 2006)
    • Use of disciplinary measures to exclude low-performers (Figlio, 2006)

  • Some studies analyzing the effect of attendance manipulation in Chile

    • Distortion in quality signals (Cuesta et al., 2020)
    • Manipulation for specific vulnerable schools (SEP) to raise scores (Feigenberg et al., 2019; Quezada & Hippel, 2017)

  • Schools have incentives to game the system

    • Especially in high-accountability settings

This paper

Attendance Patterns

  • Event study approach:

    • What do these exclusion patterns look like? Are they the same for every (type of) school and every grade?

    • Focus beyond bottom performers

    • Robustness checks for alternative mechanisms

What can be done?

  • Machine learning prediction:

    • Identification of schools that are most likely gaming the system

    • Consequences of blanket policies in imputation of scores








The Chilean Educational Context

The Chilean context: Standardized testing

  • Chile has a universal voucher system (school choice)

  • Universal standardized testing since the 1980s (SIMCE)

    • For all 4th graders; then extended to other grades.

  • SIMCE as a high-stakes test:

    • Results widely available in a universal voucher system

    • Tied to teachers' bonuses

    • Tied to budget restrictions and school closures

SIMCE and absenteeism

  • Use of pre-filled communication for parents to be sent out by schools

    • Evidence that parents of lower-income students are less likely to receive information

  • No real consequences for low attendance:

    • Between 2005 and 2007, non-representative results were marked with symbols

    • No imputation strategy so far

  • Improved regulation for justifying student exclusion

    • E.g. specific disabilities (blindness) or non-Spanish speakers.








Attendance Patterns for the Day of the Test

Data Available

  • Standardized tests 2011-2018 (SIMCE)

    • Scores at student and school level for different subjects (Math, Language, History, and Science)

    • Student's socioeconomic characterization (parental questionnaire)

  • Daily attendance data 2011-2018 (SIGE)

    • Used for voucher payments (each day has ~ 2.5 million records)

  • GPA Performance 2011-2018 (Rendimiento)

    • Use GPA performance deciles within school-grade
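
A minimal sketch of how GPA deciles within each school-grade cell could be assigned. The record layout, the function name, and the tie-breaking rule are assumptions for illustration, not the authors' procedure.

```python
from collections import defaultdict

def gpa_deciles(records):
    """Assign each student a GPA decile (1-10) within their school-grade cell.

    `records` is a list of (student_id, school_id, grade, gpa) tuples
    (hypothetical layout). Returns {student_id: decile}.
    """
    cells = defaultdict(list)
    for student_id, school_id, grade, gpa in records:
        cells[(school_id, grade)].append((gpa, student_id))

    deciles = {}
    for members in cells.values():
        members.sort()  # ascending GPA; ties broken by student_id
        n = len(members)
        for rank, (_, student_id) in enumerate(members):
            # rank / n lies in [0, 1); scale it to the integers 1..10
            deciles[student_id] = rank * 10 // n + 1
    return deciles

# Hypothetical cell: 20 students in school "A", grade 4, with increasing GPA.
example = [(i, "A", 4, 4.0 + 0.1 * i) for i in range(20)]
example_deciles = gpa_deciles(example)
```

Ranking within the cell, rather than over all students, keeps decile 1 meaning "lowest GPA relative to classmates", which is the comparison the attendance analysis relies on.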

Observations from our data

Data description

| Grade | Years tested | Num. schools | Num. students |
|---|---|---|---|
| 2 | 2013, 2014, 2015 | 5,266 | 628,073 |
| 4 | 2011, 2013-2018 | 5,673 | 1,461,289 |
| 6 | 2013-2016, 2018 | 5,516 | 1,056,243 |
| 8 | 2011, 2013-2015, 2017 | 5,545 | 1,078,140 |
| 10 | 2013-2018 | 2,623 | 1,213,067 |

Empirical approach for difference in attendance

  • Event study centered around the day of the test (T=0):

$$Y_{ipsgt} = \sum_{P=1}^{5} \sum_{T=-4}^{5} \tau_{PT} D_{ipsgt}^{PT} + \gamma_{pt} + \alpha_i + \epsilon_{ipsgt}$$

Where

  • $Y_{ipsgt}$: Binary attendance for student $i$, from GPA group $p$, in school $s$ and grade $g$, for day $t$.

  • $D_{ipsgt}^{PT}$: Indicator variables (lags and leads) for students that belong to a tested grade.
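
As a rough descriptive analogue of the event-study specification (not the authors' estimation), the sketch below computes, for each GPA group and each day relative to the test, mean attendance in the tested grade minus mean attendance in untested grades. It leaves out the student effect and any regression machinery; the row layout and function name are hypothetical.

```python
from collections import defaultdict

def attendance_gaps(rows):
    """Raw tested-minus-untested attendance gap by GPA group and event day.

    `rows` is an iterable of (gpa_group, event_day, tested, present), where
    event_day counts days relative to the test (0 = test day), `tested` says
    whether the student's grade sits the test, and `present` is 0 or 1.
    """
    totals = defaultdict(lambda: [0, 0])  # (group, day, tested) -> [sum, count]
    for gpa_group, event_day, tested, present in rows:
        cell = totals[(gpa_group, event_day, bool(tested))]
        cell[0] += present
        cell[1] += 1

    gaps = {}
    for (gpa_group, event_day, tested), (present_sum, n) in totals.items():
        if not tested:
            continue
        control = totals.get((gpa_group, event_day, False))
        if control is None:
            continue  # no untested comparison group on that day
        gaps[(gpa_group, event_day)] = present_sum / n - control[0] / control[1]
    return gaps

# Hypothetical rows for the lowest GPA group on the test day and the day before.
rows = [
    (1, 0, True, 0), (1, 0, True, 0), (1, 0, True, 1), (1, 0, True, 1),
    (1, 0, False, 1), (1, 0, False, 1), (1, 0, False, 1), (1, 0, False, 0),
    (1, -1, True, 1), (1, -1, True, 1), (1, -1, False, 1), (1, -1, False, 0),
]
gaps = attendance_gaps(rows)
```

A negative gap at event day 0 for a low GPA group, with gaps near zero on surrounding days, is the pattern the event study is designed to detect.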

Clear difference in attendance by performance for 2nd grade

No effect on lower performers for 10th grade

Attendance patterns differ by grade

Potential mechanisms that explain these patterns


  • Students are excluded due to other reasons (justified)
  • Low-performing students are less aware of the test
  • Students experience a disutility from testing
  • Schools directly (dis)incentivize attendance of (lower) higher performers

Differences in communication and incentives between high and low performers

  • 2017 survey for students in test-taking grades.
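Results for 4th Grade (standard errors in parentheses)

| GPA Decile | Told | Notification | Preparation | Grades |
|---|---|---|---|---|
| D1 | -0.06*** | -0.11*** | -0.08*** | 0.14*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| D10 | 0.06*** | 0.05*** | 0.05*** | -0.2*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| Baseline | 0.89*** | 0.87*** | 0.89*** | 0.39*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |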

Differences in communication and incentives between high and low performers

  • 2017 survey for students in test-taking grades.
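Results for 10th Grade (standard errors in parentheses)

| GPA Decile | Told | Notification | Preparation | Grades |
|---|---|---|---|---|
| D1 | -0.02*** | -0.01*** | -0.02*** | 0.05*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| D10 | 0.01*** | 0.00 | 0.00 | -0.03*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| Baseline | 0.95*** | 0.78*** | 0.82*** | 0.33*** |
| | (0.00) | (0.00) | (0.00) | (0.00) |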

No evidence of self-selection from students because of testing

  • Students experience a disutility from testing?

    • Use of a no-stakes test applied to schools (~300 schools per year): no effect on attendance

Results for No-Stakes Test (standard errors in parentheses)

| Grade - Year | D1 | D2 | D3-D8 | D9 | D10 |
|---|---|---|---|---|---|
| 2nd 2011 | -0.01 | 0.01 | 0.01* | 0.02 | 0.00 |
| | (0.01) | (0.01) | (0.01) | (0.01) | (0.01) |
| 5th 2012 | 0.00 | -0.01 | 0.00 | 0.01 | 0.01 |
| | (0.01) | (0.01) | (0.01) | (0.01) | (0.01) |
| 6th 2011 | 0.02* | 0.01 | 0.01** | 0.01 | 0.00 |
| | (0.01) | (0.01) | (0.00) | (0.01) | (0.01) |
| 6th 2017 | 0.00 | 0.03 | 0.01 | 0.01 | 0.00 |
| | (0.02) | (0.02) | (0.01) | (0.02) | (0.01) |
| 11th 2012 | 0.00 | 0.00 | 0.00 | -0.02** | 0.00 |
| | (0.01) | (0.01) | (0.00) | (0.01) | (0.01) |








Predicting the Counterfactual

How do these results compare to predicted counterfactual?

  • Can we use this existing rich panel data to predict attendance on the day of the test as if it was a regular day?

  • Use Extreme Gradient Boosting (XGBoost) with panel data for attendance prediction

    • Predictor variables include lags for attendance, day of the week, grade, GPA group, and siblings' attendance, in addition to student and school characteristics.

  • Use data for 1st to 5th grade (2017) in the Metropolitan Region; 4th grade is the treated grade:

    • Data before the test to predict attendance on the day of the test.
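
Part of the predictor set can be illustrated with a small feature builder. This is a sketch under assumptions: the function name, the number of lags, the added `past_rate` summary, and the dates are hypothetical, and it covers only attendance lags and day of the week. Fitting the boosted model itself (for example with the xgboost library) is not shown.

```python
from datetime import date

def lag_features(attendance, target_day, n_lags=5):
    """Build predictors for one student's attendance on `target_day`.

    `attendance` maps datetime.date -> 0/1 for observed school days.
    Only days strictly before `target_day` are used, so the test day
    never leaks into its own prediction.
    """
    past_days = sorted(d for d in attendance if d < target_day)
    recent = past_days[-n_lags:]
    features = {"day_of_week": target_day.weekday()}  # Monday = 0
    # lag_1 is the most recent school day, lag_2 the one before, and so on.
    for k, day in enumerate(reversed(recent), start=1):
        features[f"lag_{k}"] = attendance[day]
    # Share of past days attended, as a simple long-run summary.
    if past_days:
        features["past_rate"] = sum(attendance[d] for d in past_days) / len(past_days)
    else:
        features["past_rate"] = None
    return features

# Hypothetical attendance record for the school days before a target day.
record = {
    date(2017, 10, 2): 1, date(2017, 10, 3): 1, date(2017, 10, 4): 0,
    date(2017, 10, 5): 1, date(2017, 10, 6): 1, date(2017, 10, 9): 0,
}
features = lag_features(record, date(2017, 10, 10), n_lags=3)
```

Restricting the inputs to pre-test days is what lets the prediction stand in for attendance "as if it was a regular day".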

Overall predictions over performance distribution

Example: Comparisons between schools?

School 1

Math: 258
Language: 260

School 2

Math: 259
Language: 256

Can we characterize these schools?

  • K-means clustering: use differences between predicted and observed attendance.
    • 2 optimal clusters
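
The clustering step can be sketched with a plain implementation of Lloyd's algorithm on per-school vectors of attendance differences. The seeding rule, the two-dimensional vectors, the sign convention (observed minus predicted), and the numbers are assumptions for illustration; how the number of clusters was chosen is not shown here.

```python
def kmeans(points, k=2, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Seed with the first point, then repeatedly add the point farthest
    # from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical schools: (difference for GPA decile 1, difference for decile 10).
schools = [
    (0.02, 0.03), (0.05, 0.04), (0.00, 0.02),
    (-0.22, -0.01), (-0.30, 0.00), (-0.18, -0.02),
]
labels, centroids = kmeans(schools, k=2)
```

In this toy input the second group has strongly negative differences for the lowest decile only, which is the kind of profile that would flag a school as possibly excluding low performers.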

Schools that appear to exclude lower-performing students are also more vulnerable

Cluster 1: Increase Att (N=1094). Cluster 2: Lower Att (bottom) (N=346).

| Variable | Cluster 1 Mean | Cluster 1 SD | Cluster 2 Mean | Cluster 2 SD | Diff. in Means | P-value |
|---|---|---|---|---|---|---|
| Avg. SIMCE Lang | 258.84 | 22.38 | 252.62 | 23.84 | -6.22 | 0.00 |
| Avg. SIMCE Math | 254.42 | 25.70 | 247.80 | 25.15 | -6.62 | 0.00 |
| Public | 0.35 | 0.48 | 0.42 | 0.49 | 0.07 | 0.03 |
| SEP status | 0.84 | 0.37 | 0.88 | 0.33 | 0.03 | 0.11 |
| % Priority Students | 0.48 | 0.19 | 0.52 | 0.19 | 0.04 | 0.00 |
| Diff D1 GPA | 0.02 | 0.15 | -0.22 | 0.27 | -0.24 | 0.00 |
| Diff D2 GPA | 0.05 | 0.11 | -0.17 | 0.21 | -0.22 | 0.00 |
| Diff D9 GPA | 0.04 | 0.06 | -0.03 | 0.15 | -0.07 | 0.00 |
| Diff D10 GPA | 0.03 | 0.07 | -0.01 | 0.12 | -0.04 | 0.00 |

Implications for imputation policies

  • How to handle this absenteeism problem?

    • E.g.: Observed attendance (no imputation), attendance as if the test hadn't happened (impute "typical day"), everybody is present.

  • Proposals to impute lowest scores for absent students to disincentivize arbitrary exclusion

    • Most vulnerable schools have higher absenteeism rates, so this would increase inequality and non-representativeness

Some imputation exercises

How can we impute missing scores?

  • Scenario 1: No imputation. Show observed distributions.

  • Scenario 2: Impute by decile only for the difference between predicted and observed attendance.

    • Imputed score: (a) overall min, (b) decile min, (c) min school, or (d) min decile by school.

  • Scenario 3: Impute every missing student.

    • Imputed score: overall min

Some caveats:

  • The difference between predicted and observed attendance captures the combined incentives and disincentives to attend.

  • The imputed score might be too optimistic (e.g. an absent student's real score could fall below the observed distribution)
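
To make the three scenarios concrete, here is a minimal sketch of a school mean under each one. It pools all deciles into a single count, whereas Scenario 2 as described imputes decile by decile; the function name, arguments, and scores are hypothetical.

```python
def school_mean(observed_scores, n_absent, n_excess_absent, fill_score, scenario):
    """Mean school score under one of three imputation scenarios.

    observed_scores: scores of students who sat the test.
    n_absent: all enrolled students without a score.
    n_excess_absent: absences beyond what is predicted for a regular day
        (predicted minus observed attendance, floored at zero).
    fill_score: value imputed for a missing score (e.g. the overall minimum).
    """
    if scenario == 1:        # no imputation
        n_imputed = 0
    elif scenario == 2:      # impute only the predicted-minus-observed gap
        n_imputed = min(n_excess_absent, n_absent)
    elif scenario == 3:      # impute every missing student
        n_imputed = n_absent
    else:
        raise ValueError("scenario must be 1, 2, or 3")
    total = sum(observed_scores) + n_imputed * fill_score
    return total / (len(observed_scores) + n_imputed)

# Hypothetical school: four observed scores, four absentees, one of them
# beyond the predicted regular-day absences, and a fill value of 200.
scores = [250, 260, 270, 280]
means = {s: school_mean(scores, 4, 1, 200, s) for s in (1, 2, 3)}
# means == {1: 265.0, 2: 252.0, 3: 232.5}
```

The spread between Scenario 2 and Scenario 3 in this toy case shows why a blanket rule penalizes schools with high regular-day absenteeism more than a rule tied to excess absences.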

Scenario 1 vs Scenario 3: No imputation and Impute all

Imputing Predicted - Observed is less extreme








Let's Wrap Up...

Conclusions and next steps

  • Non-representative patterns of absenteeism beyond exclusion of low-performers

    • High heterogeneity between schools
    • Identification of gaming schools (e.g. persistence)

  • Communication strategies play an important role for lower-performing students

  • Impact of imputation policies?

    • Work in progress: How do non-representativeness and different imputation strategies impact policies and information provision? What score do we impute and for whom?

