1 data 133 - introduction to data science icaora/ds133/materials/fall... · minor in data science...

52
DATA 133 - Introduction to Data Science I Instructor: Renzhi Cao Computer Science Department Pacific Lutheran University Fall 2019

Upload: others

Post on 25-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

DATA 133 - Introduction to Data Science I

Instructor: Renzhi Cao Computer Science Department

Pacific Lutheran University Fall 2019

1

Page 2: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

• Office: MCLT 248 • Office hours: In class website (cs.plu.edu/133) • Office Phone: 535-7409 • Email: [email protected]

About me

Renzhi Cao Prof. Cao ? Cow

• Data Science • Machine learning • Bioinformatics

Pictures from: https://www.google.com/search?q=cow&biw=1920&bih=911&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiOt5zlierOAhUE02MKHVbwDY8Q_AUIBigB#imgrc=0dSVh7Vlup1KqM%3A

2

Page 3: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)
Page 4: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)
Page 5: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

5

Page 6: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

About guest speaker

Assistant Professor : Xiaohong Fan

• Accounting

6

School of Business

Page 7: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

About guest speaker

Professor : Tom Edgar

7

Department of Math

Page 8: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

About you

• Names • Where are you from? • Your major? • Hobbies? Movie? Song? Sports? Book? TV show?

What you did over your break? …

8

Page 9: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

What is this course?

What have I gotten myself into?

9

Page 10: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

What is this course?

10

Mathematics?Statistics?

Computer Science?Programming?

Biology?Economics?

Page 11: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

What is this course?

11

It’s a little bit of all of those….

Page 12: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

What is this course?

• Computer programming • Write software using Python and R

• Understand Data Science.

12

Page 13: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

13

Minor in Data Science

Intro to Data Science I

or Intro to Computer Science

DATA 133 or CSCI 144(4 credits)

Introductory StatisticsMATH/STAT 145,

STAT 231, 232, 233 or MATH/STAT 242

(4 credits)

Domain-Specific Elective

(4 credits)

Prerequisite Suggested

Intro to Data Science II

DATA 233

(4 credits)

Statistical Computing & Consulting

MATH/STAT 348

(4 credits)

Requirements - 20 semester hours

Computational and Data Science Foundations 8 semester hours

Statistical Foundations 8 semester hours

Domain-Specific Elective 4 semester hours

Prerequisite: Math 140 Precalculus or equivalent

Page 14: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Why Data Science?• Data scientists are the no. 1 most promising job in

America for 2019 - report from LinkedIn

14https://www.techrepublic.com/article/why-data-scientist-is-the-most-promising-job-of-2019/

• Data scientists have a median base salary of $130,000

• What is the frequently mentioned skills in data science job?

• Python, R, and SQL

Page 15: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

15

JavaPython

Ruby

CC++ C#

Objective CSwift

Perl

Scheme

JavaScript

GoScala

Programming languages

R

Page 16: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

DO I HAVE WHAT IT TAKES TO SUCCEED AT PROGRAMMING?

Page 17: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Image source: http://www.saveursdunet.com/wp-content/uploads/2012/12/Mr-and-Mrs-geek.png

Page 18: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

WITH THE RIGHT ATTITUDE

AND A BIT OF HARD WORK

YOU CAN LEARN PROGRAM

… AND EVEN ENJOY IT!

Page 19: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)
Page 20: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

PROGRAMMING HAS A LOT IN COMMON WITH THINGS PEOPLE DO EVERY DAY

Page 21: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Musical notation captures a set of instructions that can be understood and ‘executed’

Page 22: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Cooking also has special instructions and notations

Page 23: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Similar to learning to drive, programming requires learning many new things at once

Page 24: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Much like reading maps, we often think of our programs abstractly and from different perspectives

Page 25: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

25

One big difference with computers though, is that you must be extremely precise

Page 26: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)
Page 27: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

MINDSETS

Based on the work of Stanford Psychologist Carol Dweck

Page 28: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Growth MindsetFixed Mindset

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 29: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

…avoid challenges

Growth MindsetFixed Mindset

CHALLENGES…embrace challenges

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 30: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

…avoid challenges

…give up easily

Growth MindsetFixed Mindset

CHALLENGES

OBSTACLES

…embrace challenges

…persist in the face of setbacks

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 31: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

…avoid challenges

…give up easily

…see effort as fruitless or worse

Growth MindsetFixed Mindset

CHALLENGES

OBSTACLES

EFFORT

…embrace challenges

…persist in the face of setbacks

…see effort as the path to mastery

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 32: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

…avoid challenges

…give up easily

…see effort as fruitless or worse

Growth MindsetFixed Mindset

…ignore useful negative feedback

CHALLENGES

OBSTACLES

EFFORT

CRITICISM

…embrace challenges

…persist in the face of setbacks

…see effort as the path to mastery

…learn from criticism

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 33: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

…avoid challenges

…give up easily

…see effort as fruitless or worse

Growth MindsetFixed Mindset

…ignore useful negative feedback

...feel threatened by the success of

others

CHALLENGES

OBSTACLES

EFFORT

CRITICISM

SUCCESS OF OTHERS

…embrace challenges

…persist in the face of setbacks

…see effort as the path to mastery

…learn from criticism

…find lessons and inspiration in the success of others

intelligence is static

intelligence can be developed

Based on a graphic by Nigel Holmes available from :http://www.pvusd.net/Departments/GATE/dweck/

Page 34: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

34

What mindset do you have?

Course break?

Page 35: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Syllabus

Page 36: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

36

Attendance

Attendance•Expected to attend every class•YOU are responsible for missed materialsClassroom Conduct•Come to class on time•Turn off electronic devices•Refrain from private conversations (voice or electronic)•Refrain from activities unrelated to current tasks in class•Treat others with respect and dignity

Page 37: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Data Science from Scratch: First Principles With Python. First Edition. Joel Grus. (On PLU Safari online book, also for DATA 233) R Programming for Data Science. Roger Peng

37

Section 1: Tuesday, Thursday 11:50-13:35, Morken #203 (Dr. Cao)

Text book and meeting times

Page 38: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Course website

https://www.cs.plu.edu/~caora/ds133/

38

Page 39: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Course Goals • Develop important problem solving skills by programming. • Explore the Python and R programming language, and write

code with ethical perspective. • Better understand Data Science and build ethical foundations. • Have fun writing computer programs and analyzing data!

39

Page 40: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

Course GradeInnovation project and class

participation – 10% • Use knowledge learned from the class,

propose ideas and analyze the data considering Ethics.

• Participate in each lecture and finish daily works.

Projects - 20% • Around two projects during the semester • Late assignments will be docked 10% per

day (every 24 hours). • Consist of a report, a program and/or a

presentation

Exams – 30% • One mid-term exam and one final Quizzes and homework - 40% • 5 to 10 quizzes or homework • drop lowest score • no makeup quizzes

Overall Score Grade100% -- 90% A / A-

90% -- 80% B+ / B / B-

80% -- 70% C+ / C / C-

70% -- 60% D+ / D / D-

60% -- 0% E

:

Page 41: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

41

I. Introduction to computation

▪Requires as many steps as people:▪ 4 people – 4 steps▪ 16 people – 16 steps▪ 32 people – 32 steps

▪Each person spends most of their time sitting idle:▪ 4 people – Each person idle 75% of the time▪ 16 people – Each person idle 94% of the time▪ 32 people – Each person idle 97% of the time

1. Write new birthday2. Eliminate later birthday

1. Write new birthday2. Eliminate later birthday

1. Write new birthday2. Eliminate later birthday

Stop

Start

1. Write new birthday2. Eliminate later birthday

Finding the earliest birthday - method 1

Page 42: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

42

I. Introduction to computation

Finding the earliest birthday - method 2▪Simultaneous events mean fewer steps:▪ 4 people – 2 steps▪ 16 people – 4 steps▪ 32 people – 5 steps

Stop

1. Compare birthdays2. Eliminate later birthday

Start Start

1. Compare birthdays2. Eliminate later birthday

Start Start

1. Compare birthdays2. Eliminate later birthday

▪ Fewer steps mean less idle time:▪ 4 people – idle ≤ 50% of time▪ 16 people – idle ≤ 75% of time▪ 32 people – idle ≤ 80% of time

Conclusion #1: Computers can’t see the “big picture” – only the immediate task at hand.Conclusion #2: Not all programs are equal – some are faster or more flexible than others.

Page 43: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

43

II. Problem-Solving

A. Understand the Problem▪ Do you understand all the words & terms that are being used? ▪ What are you being asked to find or show?▪ Is there enough information to solve the problem?▪ Can you draw a picture that might help?

B. Come Up With a Plan▪ Guess and check, make a list, or draw a picture.▪ Look for a pattern, or find a key equation.▪ Try solving a simplified version of the problem.▪ Work backwards.

C. Carry Out the Plan▪ Be aware that you may run into roadblocks or dead-ends!▪ Check to see if your results make sense.▪ Don’t be afraid to start over!

D. Make Your Solution Computer-Friendly▪ Imagine you are writing to a student not in this class.▪ Keep things brief… but make sure that you don’t leave anything out.▪ Write a step-by-step list of instructions… like writing a recipe.

Page 44: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

44

II. Problem-Solving

Some Practice Questions Here are a few problems to think about. Use the strategies from the previous slide, and write down at least three facts or observations that you think are important when it comes to solving the problem. We’ll discuss the pros and cons of each fact/observation before trying to solve the problems.

1. Same birthday. You and your classmates want to know if there are students sharing the same birthday. You have everyone’s birthday date (Month and Day), how do you quickly find it out?

2. Pizza Prices. You're trying to decide what size pizza to order, and have the choice of a 12" pizza for $13 or a 14" pizza for $16. Which one gives you the most pizza per dollar?

3. Finding the Day of the Week. What day of the week is 23 December 2017? What about 23 December 2087?

Page 45: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

45

III. Environment setup

Python

•Go to https://www.python.org/downloads/•Download Python 2.7.11•Select the option “add python.exe to Path”•Select “Will be installed in local hard drive”•Double click on the file and click next/yes for all questions

•Wait a moment……•Click Finish•Done!

Page 46: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

46

III. Environment setup

Atom

•Go to https://atom.io/•Click on Download installer•Open the file and follow the instructions

•You are ready to go!

Page 47: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

47

III. Environment setup

Anaconda

•I am aware that the book talks about Anaconda. Here is my opinion about it:

•It is really good! •If you are interested in installing Anaconda instead of Python and pip… Go for it!

•In class we will use regular Python and Atom, but I don’t want to stop you in any way from exploring other software

Page 48: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

48

III. Environment setup

R

•Go to https://www.r-project.org/•Click on Download CRAN•Select the area that it is closest to you (Fred Hutchinson Cancer Research Center, Seattle, WA)

•Select your operating system•Select “base”•Download file•Click on the file and click next on all the prompts. (leave the default values)

•Click Finish!

Page 49: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

49

Resources

Before you leave today… - apply for a curly account

https://www.cs.plu.edu/hub/accounts/requests/new

Finish survey about your background (available on course website):

https://cs.plu.edu/~caora/Survey//index.php/Search/ByID?ID=S_4_5d72bd3ddcc5c

For next time… - Check the book if you would like to buy one, read Page 1-12.- Check out class website and Sakai regularly: https://www.cs.plu.edu/~caora/ds133/

Page 50: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

50

Resources

Account for the first few days (active through Sep 21, 2019):

user name: firstday password: fall2019

Page 51: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

51

Resources

Create an Account…– Open the Firefox browser– Go to https://www.cs.plu.edu/hub/– Click on Request link– Review PLU Policies– Click on I agree link

Page 52: 1 DATA 133 - Introduction to Data Science Icaora/ds133/Materials/fall... · Minor in Data Science Intro to Data Science I or Intro to Computer Science DATA 133 or CSCI 144 (4 credits)

52

Summary

Create an Account…Do the course surveyCheck the programming environment