Data Science Academy Student Demo Day: Richard Sheng, Kinvolved School Attendance

Post on 28-Nov-2014

DESCRIPTION

Data Science Academy, Student Demo day, Data science by R, Vivian S. Zhang, see www.nycdatascience.com for more details.

TRANSCRIPT

Increasing New York student attendance with Kinvolved and Data Science

Richard Sheng, @rcsheng

Data Science and Strategic Analytics, TE Connectivity

NYC Data Science Academy

Data Science & Strategic Analytics

Investment Banking Associate

NYU MBA

Principal Consultant, SAP Data Science

Application Engineer

Disclaimer: My views are my own

Kinvolved was co-founded by a former educator (Teach For America, NYC, 2008) and a parent advocate. Miriam and Alex began this journey while graduate students at the Robert F. Wagner School of Public Service at NYU in 2012. They completed an accelerator in August 2013 and are currently based at the Blue Ridge Foundation in Brooklyn, NY.

Stakeholders:

Kinvolved

School principals

External funders

Goals:

Help drive adoption of Kinvolved's product to improve attendance rates, an early predictor of dropouts

Impact [1]: Estimated lost lifetime revenue for male dropouts between the ages of 25 and 34 is approximately $944 billion, and costs associated with poor health and criminal activity have been estimated at $24 billion.

1. Source: http://www.attendanceworks.org/wordpress/wp-content/uploads/2010/04/Schoeneberger_2011.pdf

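# Assign the result so the later steps can use it as attnd09to14.
# check.names = FALSE is an assumption: it keeps headers such as "13-14" unchanged,
# which the later column selection relies on.
attnd09to14 <- read.delim("attendance-2009-2014.csv", as.is = TRUE, header = TRUE, stringsAsFactors = FALSE, fill = TRUE, fileEncoding = "UTF-16LE", check.names = FALSE)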

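Date conversion

# Keep rows where School repeats the District name (presumably district-level summary rows)
dist_attnd09to14 <- subset(attnd09to14, District == School)
# Drop rows where District repeats the City name (presumably city-wide totals)
dist_attnd09to14 <- subset(dist_attnd09to14, City != District)
# `districts` is not defined on the slide; assumed here to be the distinct District labels
districts <- unique(dist_attnd09to14$District)
districts2 <- districts[grep("^DISTRICT", districts)]
ds <- dist_attnd09to14[dist_attnd09to14$District %in% districts2, ]
school.years <- c("09-10", "10-11", "11-12", "12-13", "13-14")
# First two columns plus the school-year columns
coln <- c(1:2, which(colnames(ds) %in% school.years))
df <- ds[coln]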
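The row and column filtering above can be tried on a toy table. The sketch below is only an illustration: the table layout (a City, District and School column, with summary rows repeating the District name in School) is an assumption inferred from the filters, and every value is invented.

```r
# Toy stand-in for the attendance table; all values are invented.
attnd <- data.frame(
  City = c("NYC", "NYC", "NYC", "NYC"),
  District = c("NYC", "DISTRICT 01", "DISTRICT 01", "DISTRICT 02"),
  School = c("NYC", "DISTRICT 01", "P.S. 015", "DISTRICT 02"),
  `12-13` = c(90.1, 91.0, 92.3, 89.5),
  `13-14` = c(90.4, 91.2, 92.8, 89.9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep summary rows only: School repeats the District name on those rows.
summary_rows <- subset(attnd, District == School)

# Drop the city-wide total, where District repeats the City name.
district_rows <- subset(summary_rows, City != District)

# Keep rows whose District label starts with "DISTRICT".
district_rows <- district_rows[grepl("^DISTRICT", district_rows$District), ]

# Select the identifier columns plus whichever school-year columns are present.
school.years <- c("09-10", "10-11", "11-12", "12-13", "13-14")
keep <- c("City", "District", intersect(school.years, colnames(district_rows)))
df <- district_rows[, keep]
print(df)
```

In this toy table only the two district summary rows survive, with the two school-year columns that exist.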

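library(ggplot2)
library(ggmap)  # geocode(), get_googlemap(), ggmap(); needs network access to Google's services

# Geocode the 32 school districts by name
newyork_ds <- paste("new york school district", 1:32)
ds_code <- geocode(newyork_ds)

# Attach coordinates to the 13-14 attendance rate of each district
df3 <- df[order(df$District), c("District", "13-14")]
data <- cbind(df3, newyork_ds, ds_code)
colnames(data)[2] <- "attnd"

# Plot districts on a terrain map; lower attendance gives larger points
ds_map <- ggmap(get_googlemap(center = 'new york', zoom = 11, maptype = 'terrain'), extent = 'device') +
  geom_point(data = data, aes(x = lon, y = lat, colour = attnd, size = 1/attnd)) +
  scale_colour_gradientn(colours = c("red", "blue")) +
  scale_size_area() +
  labs(title = "New York School Attendance - '13 to '14 \n")

print(ds_map)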

Kinvolved found that the majority of student absenteeism was related to asthma

Level 4: Exceeding the proficiency standard

Level 3: Meeting the proficiency standard

Level 2: Meeting the basic standard

Level 1: Scoring below the learning standard

% Proficiency = (students at Level 3 or Level 4) / (all students)
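As a minimal sketch of the proficiency definition above, the R snippet below computes the percentage from per-level student counts. The data frame, its column names and the counts are hypothetical and invented for illustration; they are not from the talk's data.

```r
# Hypothetical counts of students per performance level for two districts.
# These numbers are invented for illustration only.
scores <- data.frame(
  District = c("DISTRICT 01", "DISTRICT 02"),
  level1 = c(30, 10),
  level2 = c(30, 20),
  level3 = c(25, 40),
  level4 = c(15, 30)
)

# Share of students at Level 3 or 4 among all tested students, in percent.
pct_proficient <- function(l1, l2, l3, l4) {
  100 * (l3 + l4) / (l1 + l2 + l3 + l4)
}

scores$proficiency <- with(scores, pct_proficient(level1, level2, level3, level4))
print(scores[, c("District", "proficiency")])
```

With these invented counts, the first district has 40 of 100 students at Level 3 or 4 and the second has 70 of 100.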

The exam proficiency map looks fairly similar to the attendance map in the problematic areas

Looking at districts only, attendance accounts for 67% of the variance in exam results
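The slide does not say how the 67% figure was obtained. One common way to get a "share of variance" number is the R-squared of a simple linear regression of proficiency on attendance, sketched below under that assumption. The data frame and its values are hypothetical and will not reproduce the 67% result.

```r
# Hypothetical district-level data, invented for illustration only:
# attendance rate (%) and exam proficiency (%) for six districts.
d <- data.frame(
  attnd = c(88, 90, 91, 93, 94, 96),
  proficiency = c(20, 28, 25, 40, 38, 52)
)

# Simple linear regression of proficiency on attendance.
fit <- lm(proficiency ~ attnd, data = d)

# R-squared: the share of variance in proficiency explained by attendance.
r2 <- summary(fit)$r.squared

# For a one-predictor model, R-squared equals the squared Pearson correlation.
r2_from_cor <- cor(d$attnd, d$proficiency)^2

print(c(r_squared = r2, cor_squared = r2_from_cor))
```

R-squared describes association at the district level; on its own it does not show that attendance causes the exam results.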

Q&A

Richard Sheng, @rcsheng, rcsheng@gmail.com

Data Science and Strategic Analytics, TE Connectivity

NYC Data Science Academy
