data-driven strategies for predicting on-time high school...



This work was done during the Eric & Wendy Schmidt Data Science for Social Good Fellowship at the University of Chicago.

5. Impact

Our work provides school district personnel with empirical, data-driven predictions of whether students will graduate on time. These predictions can be used to identify the key factors contributing to a student's struggles. In turn, those factors can guide focused interventions, which may have a profoundly positive effect on a student's academic trajectory and, ultimately, on his or her life.

[Figure: example student view. A list of students (Jayne Cobb, River Tam, Olivia Dunham, Maeby Funke, Inara Serra, Malcolm Reynolds, Kaylee Frye, Maggie Lizer, Nina Sharp, Peter Bishop); a Risk Trajectory plot of Risk Score (0 to 1) against Grade Level (6 to 11); and a Risk Factors panel (Attendance, Behavior, Coursework) showing the values 22%, 87%, and 37%.]

4. Results

Our data-driven modeling approach significantly improves upon existing baselines. Here we summarize the results produced for one of our partner districts, where we predict on-time high school graduation for students in 6th through 11th grade. The results indicate that we can identify off-track students with relatively high precision, which increases as the prediction time-frame decreases. The error bars illustrate statistical significance.

[Figure: Precision at Top 10% (0 to 1) by Grade Level (6 to 11), comparing Baseline and Our Approach.]

3. Methods

Model Validation

These predictive models were evaluated upon the precision with which they predict off-track students.

Precision:

\[
\text{Precision} = \frac{\text{true off-track predictions}}{\text{true off-track predictions} + \text{false off-track predictions}}
\]

Precision at k: Precision on the top k off-track predictions.

Risk Scoring

Name               Score   Actual
Jayne Cobb         .97     Yes
River Tam          .92     Yes
Olivia Dunham      .90     Yes
Maeby Funke        .87     No
Inara Serra        .82     Yes
Malcolm Reynolds   .79     No
Kaylee Frye        .79     No
Maggie Lizer       .62     Yes
Nina Sharp         .40     No
Peter Bishop       .21     No

Good: Off-track students have the highest predicted risk.

Bad: On-track student predicted higher risk than off-track student.
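As a rough illustration of precision at k, the sketch below ranks the ten students from the example table by score and computes the fraction of the top k who are actually off-track. It assumes that "Yes" in the Actual column means the student was off-track; the function name `precision_at_k` is ours, not the authors'.

```python
def precision_at_k(scores, is_off_track, k):
    """Fraction of the k highest-scored students who are actually off-track."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of students")
    # Rank students from highest to lowest predicted risk.
    ranked = sorted(zip(scores, is_off_track), key=lambda pair: pair[0], reverse=True)
    true_off_track = sum(1 for _, off in ranked[:k] if off)
    return true_off_track / k


# Scores and outcomes copied from the example table.
# Assumption: "Yes" in the Actual column is read as "actually off-track".
students = [
    ("Jayne Cobb", 0.97, True),
    ("River Tam", 0.92, True),
    ("Olivia Dunham", 0.90, True),
    ("Maeby Funke", 0.87, False),
    ("Inara Serra", 0.82, True),
    ("Malcolm Reynolds", 0.79, False),
    ("Kaylee Frye", 0.79, False),
    ("Maggie Lizer", 0.62, True),
    ("Nina Sharp", 0.40, False),
    ("Peter Bishop", 0.21, False),
]
scores = [score for _, score, _ in students]
actual = [off for _, _, off in students]

for k in (3, 5, 10):
    print(k, precision_at_k(scores, actual, k))
```

Under that reading of the table, the top 3 predictions are all correct, 4 of the top 5 are correct, and 5 of all 10 students are off-track, so precision falls as k grows.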

With the provided data, we applied machine learning models to deliver accurate, interpretable, and data-driven predictions of on-time high school graduation.

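[Figure: Example Decision Tree. Internal nodes split on feature thresholds (Feature A ≥ x vs. Feature A < x; Feature B ≤ y vs. Feature B > y; Feature C ≤ z vs. Feature C > z; Feature A < z vs. Feature A ≥ z; Feature D = y vs. Feature D ≠ y). Leaves are labeled Predicted On-Track or Predicted Off-Track.]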
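To show how a decision tree of this kind turns a student's features into a prediction, here is a minimal sketch of tree traversal. The tree shape, the feature names (`attendance_rate`, `gpa`), and the thresholds are hypothetical and are not taken from the poster; only the two leaf labels come from the figure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    feature: str          # name of the feature tested at this node
    threshold: float
    below: "Node"         # branch taken when value < threshold
    at_or_above: "Node"   # branch taken when value >= threshold


Node = Union[Leaf, Split]


def predict(node, student):
    """Walk from the root to a leaf, following one threshold test per level."""
    while isinstance(node, Split):
        if student[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.label


# Hypothetical tree: low attendance combined with a low GPA leads to "off-track".
tree = Split(
    feature="attendance_rate",
    threshold=0.90,
    below=Split(
        feature="gpa",
        threshold=2.0,
        below=Leaf("Predicted Off-Track"),
        at_or_above=Leaf("Predicted On-Track"),
    ),
    at_or_above=Leaf("Predicted On-Track"),
)

print(predict(tree, {"attendance_rate": 0.80, "gpa": 1.5}))
```

Because each prediction is just the path of threshold tests that led to a leaf, the path itself can be read as the explanation for a student's risk label, which is one reason tree models are often considered interpretable.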

Data-Driven Modeling

We then used these student-level features to predict on-time graduation via a suite of predictive models, generating a model for each applicable grade level.

[Figure: a separate set of features for each grade level, 6 through 12.]
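The idea of fitting one model per grade level can be sketched as follows. This is not the authors' suite of models: as a stand-in it uses a single-feature threshold rule (a decision stump) on a hypothetical absence count, and the toy rows are invented purely for illustration.

```python
def fit_stump(values, labels):
    """Choose the threshold that misclassifies the fewest students.

    The rule predicts off-track when value >= threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted(set(values)):
        errors = sum((v >= threshold) != y for v, y in zip(values, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_per_grade(rows):
    """rows: iterable of (grade_level, absences, off_track). Returns {grade: threshold}."""
    by_grade = {}
    for grade, absences, off_track in rows:
        values, labels = by_grade.setdefault(grade, ([], []))
        values.append(absences)
        labels.append(off_track)
    # One independently fitted model per grade level.
    return {grade: fit_stump(v, y) for grade, (v, y) in by_grade.items()}


# Invented toy data: (grade level, absences that year, actually off-track).
rows = [
    (6, 3, False), (6, 5, False), (6, 12, True), (6, 15, True),
    (9, 2, False), (9, 4, False), (9, 8, True), (9, 9, True),
]
models = fit_per_grade(rows)
print(models)
```

Keeping a separate model per grade means each one sees only the history available at that grade, so a 6th-grade prediction never relies on information from later years.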

Feature Generation

From these records, we generated student-level features potentially predictive of on-time graduation.

[Figure: raw records available at each grade level, 6 through 12 (Absences, Grades, Tardies, Tests), alongside static student attributes (Date of Birth, Gender, Race/Ethnicity, …).]
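A minimal sketch of this kind of feature generation, assuming the longitudinal records arrive as one row per event. The record layout (`student_id`, `grade_level`, `kind`, `value`) and the three output features are hypothetical; the poster does not list the actual features used.

```python
from collections import defaultdict


def generate_features(records):
    """Aggregate event-level records into one feature row per (student, grade level)."""
    totals = defaultdict(
        lambda: {"absences": 0, "tardies": 0, "grade_sum": 0.0, "grade_count": 0}
    )
    for record in records:
        row = totals[(record["student_id"], record["grade_level"])]
        if record["kind"] == "absence":
            row["absences"] += 1
        elif record["kind"] == "tardy":
            row["tardies"] += 1
        elif record["kind"] == "course_grade":
            row["grade_sum"] += record["value"]
            row["grade_count"] += 1

    features = {}
    for key, row in totals.items():
        # Leave the mean undefined when no course grades were recorded.
        mean_grade = row["grade_sum"] / row["grade_count"] if row["grade_count"] else None
        features[key] = {
            "absences": row["absences"],
            "tardies": row["tardies"],
            "mean_course_grade": mean_grade,
        }
    return features


# Invented toy records for a single student across two grade levels.
records = [
    {"student_id": 1, "grade_level": 6, "kind": "absence"},
    {"student_id": 1, "grade_level": 6, "kind": "absence"},
    {"student_id": 1, "grade_level": 6, "kind": "tardy"},
    {"student_id": 1, "grade_level": 6, "kind": "course_grade", "value": 3.0},
    {"student_id": 1, "grade_level": 6, "kind": "course_grade", "value": 4.0},
    {"student_id": 1, "grade_level": 7, "kind": "absence"},
]
features = generate_features(records)
print(features[(1, 6)])
```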

Raw Data

The data provided by our partnering school districts primarily consists of longitudinal student records.

2. Data

Data for this work comes from a partnership with three public school districts: Arlington Public Schools (APS), Vancouver Public Schools (VPS), and Wake County Public School System (WCPSS). Each district already recognizes the importance of identifying students at risk and applying interventions.

In addition to being geographically disparate, these districts represent markedly diverse student populations with different graduation outcomes.

[Figure: Percentage of Students (0% to 100%) in each district (APS, VPS, WCPSS) for LEP/ELL, FRL, SPED, Minority, and Grad. Rate.]

1. Introduction

About one in five high school students in the U.S. fails to graduate on time*, either not completing high school or graduating late. To best apply intervention programs for these off-track students, schools need to identify students at risk as early as possible and understand the factors that contribute to this risk.

* G. Kena, L. Musu-Gillette, J. Robinson, et al. The Condition of Education 2015 (NCES 2015-144). U.S. Department of Education, National Center for Education Statistics, Washington, D.C., 2015.

[Figure: students who do not graduate on time, divided into Graduated Late and Non-completer.]

Alan Fritzler, Northwestern University

Anushka Anand, Tableau Research

Reid A. Johnson, University of Notre Dame

Siobhan Greatorex-Voith, Harvard University

Ruobin Gong, Harvard University

Kerstin Frailey, Cornell University

Data-Driven Strategies for Predicting On-Time High School Graduation