data science mis0855 | spring 2016 data cleansing david schuff

Post on 17-Jan-2018


DATA SCIENCE

MIS0855 | Spring 2016

Data Cleansing

David Schuff

David.Schuff@temple.edu

http://community.mis.temple.edu/dschuff

Discuss (5 minutes)

Have you fallen victim to any of Taber’s “stupid data corruption tricks”?

From the readings, what are the best tips for cleaning data?

Cleaning Data

Consider this Excel spreadsheet of sales in Pennsylvania, New Jersey, and Delaware for the years 2009 through 2013.

Identify two problems with this data set.

And the problems show up during analysis…

How do you find the “errors” and fix them?
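One common way to find errors is to count the distinct values in a column and look for labels that should be the same but are spelled differently. The sketch below is only an illustration: the rows, the inconsistent state spellings, and the `CANONICAL` mapping are invented assumptions, not the contents of the spreadsheet shown in class.

```python
from collections import Counter

# Hypothetical rows; the inconsistent state labels are invented for illustration.
rows = [
    {"state": "PA", "year": 2009, "sales": 1200},
    {"state": "Pennsylvania", "year": 2010, "sales": 1350},
    {"state": "NJ", "year": 2009, "sales": 900},
    {"state": "nj ", "year": 2010, "sales": 950},
    {"state": "DE", "year": 2009, "sales": 400},
]

# Hypothetical lookup from lowercase spellings to one canonical abbreviation.
CANONICAL = {
    "pa": "PA", "pennsylvania": "PA",
    "nj": "NJ", "new jersey": "NJ",
    "de": "DE", "delaware": "DE",
}


def label_counts(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)


def normalize_state(value):
    """Trim whitespace, ignore case, and map to the canonical label if known."""
    return CANONICAL.get(value.strip().lower(), value)


# Build a cleaned copy and leave the original rows untouched.
cleaned = [{**row, "state": normalize_state(row["state"])} for row in rows]

print(label_counts(rows, "state"))     # five distinct labels for three states
print(label_counts(cleaned, "state"))  # three labels after cleaning
```

Counting before and after makes the fix easy to check, and working on a copy keeps the original data available in case a "correction" turns out to be wrong.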

The problem of outliers

Do you correct this by…
• Removing the data point?
• Using the average of the other data points?
• Guessing at the right value?

And is this an error or just an anomaly?
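To make the first two options concrete, here is a minimal sketch that flags a suspicious value and then either removes it or replaces it with the average of the others. The sales figures are made up, and the 1.5 × IQR rule used for flagging is one common convention assumed here, not a method prescribed by the slides. The code cannot tell an error from a genuine anomaly; that still takes judgment about the data.

```python
from statistics import mean, quantiles

# Hypothetical yearly sales for one state; the 2013 value is invented to look off.
sales = {2009: 120, 2010: 128, 2011: 131, 2012: 135, 2013: 1290}


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed flagging rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


outliers = set(flag_outliers(list(sales.values())))

# Option 1: remove the flagged data point.
removed = {year: v for year, v in sales.items() if v not in outliers}

# Option 2: replace it with the average of the remaining data points.
fill = mean(removed.values())
averaged = {year: (fill if v in outliers else v) for year, v in sales.items()}

print(outliers)
print(removed)
print(averaged)
```

Removing the point shrinks the data set, while averaging keeps every year but manufactures a value that was never observed. Either way, it is worth recording which points were changed so the decision can be revisited.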
