Introduction to Data Science – INFO 480 – Drexel University’s iSchool Sean P. Goggins, PhD April 30, 2013 Week Five


Page 2

What is Data Science?

• Storytelling
• Database Theory – How you organize your data has a big influence on what you can do with it.
• Agile Manifesto – The key idea is iterative development; it is a technology value system.
• Spiral Dynamics – What we view as fact and what we desire emerge from the data presented to us.

Credit: http://www.datascientists.net/what-is-data-science

Page 3

Tonight

• Share software for transformation on GitHub
• Share how you approached the assignment with the class (individually)
• Ask questions
• Make sure you understand everyone’s approach
• Help each other – the result, not the language or technique used to transform the data, is what matters
• Use the network scripts from week one to transform your transformed data (that’s right!) into networks. Groups of 3.

Page 4

Week Five

Software Sharing #1 (share scripts produced in week 3 using an open source software configuration management tool).

• Students will refine and then share their scripts with other students.
• Included in the assignment is a 500-word explanation of how their script could be improved, optimized and adapted to other data of a similar type.
• The “read me” file distributed with the script will explain to another user how to apply the script to the data distributed in assignment one. This will include specific technical specifications.

Page 5

Using GitHub for Software Sharing

• Creating a GitHub Account
• Creating a GitHub Project
• Using the GitHub Desktop client
• Committing & Syncing
• The Pull Request
• Sharing Your Software!

For my repository:

• Create a directory with your name under “student Files”
• Put your assignment in there
• Create a “pull request”

Page 6

Discuss Homework

Analysis Questions. Write up a short essay, with tables or graphs if needed, to describe how you would build a network, using the scripts from week 1, against the mention connections in this sample data. How would you build one against the Reply-To connections?

• What transformations are required?
• How would you filter the data?

Use the actual data to ground your thinking. Feel free to write or modify the R code samples from the first two weeks to experiment. Some of you will be more comfortable doing this; some will be more comfortable addressing the question conceptually. This is OK.
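As one way to ground the homework questions, here is a minimal sketch in Python rather than the course's R samples (the result, not the language, is what matters). It assumes each record has an author, an optional reply-to user, and message text containing @mentions. The field names, the sample records, and the `mention_edges` and `reply_edges` helpers are all hypothetical, not taken from the assignment data.

```python
import re
from collections import Counter

# Hypothetical sample records; the real assignment data may use other fields.
posts = [
    {"author": "ana", "reply_to": None, "text": "Notes are up. @ben @cy please review"},
    {"author": "ben", "reply_to": "ana", "text": "Thanks @ana, reading now"},
    {"author": "cy", "reply_to": "ana", "text": "Looks good"},
    {"author": "ana", "reply_to": "ben", "text": "@ben any questions?"},
]

MENTION = re.compile(r"@(\w+)")


def mention_edges(records):
    """Transformation: text -> directed (author, mentioned user) edge counts."""
    edges = Counter()
    for rec in records:
        for target in MENTION.findall(rec["text"]):
            target = target.lower()
            if target != rec["author"]:  # filter: drop self-mentions
                edges[(rec["author"], target)] += 1
    return edges


def reply_edges(records):
    """Transformation: reply_to field -> directed (author, replied-to user) edge counts."""
    edges = Counter()
    for rec in records:
        target = rec["reply_to"]
        if target and target != rec["author"]:  # filter: skip top-level posts and self-replies
            edges[(rec["author"], target)] += 1
    return edges


# Print each network as a weighted edge list, one "source,target,weight" row per edge.
for name, edges in (("mention", mention_edges(posts)), ("reply", reply_edges(posts))):
    print(name)
    for (source, target), weight in sorted(edges.items()):
        print(f"{source},{target},{weight}")
```

The two functions differ only in where the connection comes from: mentions must be parsed out of the text, while reply-to connections are already a field. Both apply the same kind of filter before counting.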

Page 7

Individual Presentations

Informally by you!

Page 8

Remembering Networks

Page 9

Underpants Gnomes

With much discourtesy from the US TV Program “South Park”

Motivation

Page 10

Underpants Gnomes

Motivation

Page 11

Addressing The Underpants Gnome

Postulate

Page 12

Methodological Approach

Group Informatics Described

• Discussion Post: Read, Response
• Classification: Open Coding, Axial Coding
• Identification of Coordination Events: Time proximity, Topical proximity
• Aggregation of Posts by Topic
• Weighted Network
• Analysis of Interactions

Weight connections based on time distance, grouped by topic and informed by analysis of time distance between posts.

Identify key information brokers.
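The weighting step can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the method behind the slide: the sample posts, the 120-minute window, the linear decay in `time_weight`, and the use of summed edge weight as a stand-in for "information broker" are all hypothetical choices. A fuller analysis would more likely use a measure such as betweenness centrality.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical posts: (author, topic, minutes since the discussion started).
posts = [
    ("ana", "design", 0),
    ("ben", "design", 10),
    ("cy", "design", 130),
    ("ana", "testing", 5),
    ("cy", "testing", 15),
    ("ben", "testing", 500),
]

WINDOW = 120  # assumed cutoff in minutes; posts at least this far apart add no weight


def time_weight(gap, window=WINDOW):
    """Linear decay: 1.0 for simultaneous posts, 0.0 at or beyond the window."""
    return max(0.0, 1.0 - gap / window)


def weighted_network(records):
    """Group posts by topic, then weight each pair of authors by time proximity."""
    by_topic = defaultdict(list)
    for author, topic, minute in records:
        by_topic[topic].append((author, minute))

    weights = defaultdict(float)
    for topic_posts in by_topic.values():
        for (a, time_a), (b, time_b) in combinations(topic_posts, 2):
            if a == b:
                continue  # two posts by the same person do not form a connection
            w = time_weight(abs(time_a - time_b))
            if w > 0:
                weights[tuple(sorted((a, b)))] += w  # undirected edge
    return dict(weights)


def strength(weights):
    """Sum of edge weights per person; a crude proxy for brokerage."""
    totals = defaultdict(float)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


network = weighted_network(posts)
ranking = sorted(strength(network).items(), key=lambda item: -item[1])
print(network)
print(ranking)
```

Because weights accumulate per topic, two people who post close together in several topics end up with a heavier connection than two who overlap in only one.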

Page 13

Network Transformation

Activity

Page 14

Week Six

Week 6: Sharing Data Preparation Results and Tools

Readings and Assignments Due:

Presentation involves sharing data with other people in a way that is visually insightful. Students will be asked to bring an example of a visualization of data from a website or news organization, and make a short presentation about what makes the visualization insightful.

• Data Visualization Example Presentation
• Chapters 4-7 of “The Anarchist in the Library: How the Clash Between Freedom and Control is Hacking the Real World and Crashing the System”.