Page 1: Loops, dplyr, maps - CompanyName.com - Super Sloganhofroe.net/stat480/15-loops.pdf · 2014-03-06 · Loops, dplyr, maps stat 480 Heike Hofmann. Outline ... •dplyr alternative takes

Loops, dplyr, maps
stat 480

Heike Hofmann

Outline

• Loops

• review of dplyr

• Maps

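• Want to run the same block of code multiple times:

• Loop or iteration

for (i in 1:n) {
  season <- subset(baseball, id == players[i])  # all seasons of player i
  mba[i] <- with(season, mean(h/ab))            # mean batting average of player i
}

[Slide annotations label the parts of the loop: iterations, block of commands, output.]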

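mba <- rep(NA, n)  # preallocate one slot per player, all NA

for (i in 1:n) {
  seasons <- subset(baseball, id == players[i])  # all seasons of player i
  mba[i] <- with(seasons, mean(h/ab))            # fill slot i
}

Contents of mba as the loop runs:

start:  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
i = 1:  0.301 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
i = 2:  0.301 0.182 NA    NA    NA    NA    NA    NA    NA    NA    NA
... and so on ...
end:    0.301 0.182 0.236 0.210 0.238 0.275 0.089 0.152 0.112 0.249 0.158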
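To see the preallocate-and-fill pattern run end to end, here is a minimal sketch on a tiny made-up data set. The data frame `toy` and its numbers are hypothetical; it only mimics the `id`, `h` (hits) and `ab` (at bats) columns that the loop uses from `baseball`.

```r
# Hypothetical toy data standing in for the baseball data:
# one row per player and season, with hits (h) and at bats (ab)
toy <- data.frame(
  id = c("p1", "p1", "p2", "p2", "p2"),
  h  = c(30, 50, 10, 20, 30),
  ab = c(100, 200, 50, 100, 100)
)

players <- unique(toy$id)
n <- length(players)

mba <- rep(NA, n)  # preallocate the output vector
for (i in 1:n) {
  seasons <- subset(toy, id == players[i])  # all seasons of player i
  mba[i] <- with(seasons, mean(h / ab))     # average of the seasonal averages
}
mba
```

One caveat on the index: `1:n` evaluates to `c(1, 0)` when `n` is 0, so `seq_len(n)` or `seq_along(players)` is the safer choice when the player list could be empty.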

Your Turn

• Run the iteration to get (a) the lifetime batting average for each player and (b) the lifetime number of times each player was at bat.

• Make a dataset player.stats from mba, nab and players (use data.frame and cbind)

• Plot nab versus mba.
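A sketch of this exercise on the same kind of hypothetical toy data (`toy` is made up, not the real `baseball` data). It assumes "lifetime batting average" means total hits divided by total at bats over all seasons, which differs from the mean of seasonal averages in the loop on the previous slide.

```r
# Hypothetical toy data standing in for the baseball data
toy <- data.frame(
  id = c("p1", "p1", "p2", "p2", "p2"),
  h  = c(30, 50, 10, 20, 30),
  ab = c(100, 200, 50, 100, 100)
)

players <- unique(toy$id)
n <- length(players)

mba <- rep(NA, n)  # (a) lifetime batting average
nab <- rep(NA, n)  # (b) lifetime number of at bats
for (i in 1:n) {
  seasons <- subset(toy, id == players[i])
  # lifetime average: all hits over all at bats
  mba[i] <- with(seasons, sum(h) / sum(ab))
  nab[i] <- with(seasons, sum(ab))
}

# data.frame() keeps players as character and mba, nab as numeric;
# cbind() on its own would first turn everything into a character matrix
player.stats <- data.frame(players, mba, nab)
player.stats
```

`plot(player.stats$nab, player.stats$mba)` then draws the requested scatterplot.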

Other loops

• while (condition) {
    block of commands
  }

• repeat {
    block of commands
    if (cond) break
  }
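Two small sketches of these loop forms, with made-up stopping rules: the `while` loop adds 1, 2, 3, ... until the running total reaches 10, and the `repeat` loop halves a number until it drops below 1.

```r
# while: the condition is checked before each pass through the block
total <- 0
k <- 0
while (total < 10) {
  k <- k + 1
  total <- total + k
}
c(k = k, total = total)

# repeat: runs until a break is reached inside the block
x <- 100
steps <- 0
repeat {
  x <- x / 2
  steps <- steps + 1
  if (x < 1) break
}
c(steps = steps, x = x)
```

A `repeat` block without a reachable `break` never ends, so its exit condition deserves the same care as the body.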

Good Practice

• Indent (with tabs or spaces) to show the structure of blocks of statements

• Build complex blocks of code step by step, i.e. try the code with a single state first, then generalize

• # write comments!
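The advice to try a single case first can be sketched with the batting-average loop: fix the index by hand, check the body's result, and only then wrap the unchanged body in a loop. The data frame `toy` is hypothetical.

```r
# Hypothetical toy data standing in for the baseball data
toy <- data.frame(
  id = c("p1", "p1", "p2"),
  h  = c(30, 50, 10),
  ab = c(100, 200, 50)
)
players <- unique(toy$id)

# Step 1: set the index by hand and check the body on a single case
i <- 1
seasons <- subset(toy, id == players[i])
with(seasons, mean(h / ab))

# Step 2: once the single case looks right, wrap the same body in a loop
mba <- rep(NA, length(players))
for (i in seq_along(players)) {
  seasons <- subset(toy, id == players[i])
  mba[i] <- with(seasons, mean(h / ab))  # same line as in step 1
}
mba
```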

Why should we not use loops?

• Explicit loops often reveal a user's inexperience, because most loops can be handled better and faster with R's vectorized operations

• The dplyr alternative takes care of all the housekeeping chores (like preallocating the result vector beforehand and binding the vectors into a data frame afterwards)
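A minimal comparison, assuming the dplyr package is installed and using a hypothetical toy data frame: the loop needs a preallocated vector and a final `data.frame()` call, while `group_by()` and `summarise()` return the per-player table directly. Inside `summarise()`, `h / ab` is itself a vectorized operation over all rows of a group.

```r
library(dplyr)

# Hypothetical toy data standing in for the baseball data
toy <- data.frame(
  id = c("p1", "p1", "p2", "p2", "p2"),
  h  = c(30, 50, 10, 20, 30),
  ab = c(100, 200, 50, 100, 100)
)

# Loop version: preallocate, fill, then bind into a data frame by hand
players <- unique(toy$id)
mba <- rep(NA, length(players))
for (i in seq_along(players)) {
  mba[i] <- with(subset(toy, id == players[i]), mean(h / ab))
}
loop.stats <- data.frame(id = players, mba = mba)

# dplyr version: grouping, iterating and binding happen inside the verbs
dplyr.stats <- toy %>%
  group_by(id) %>%
  summarise(mba = mean(h / ab))
dplyr.stats
```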

Some Social Issues

• Do you know how many people admit to driving while intoxicated?

• How many people do not use their seat belts?

• How many people did not work out for a single minute in the last month?

• … the BRFSS (Behavioral Risk Factor Surveillance System) tries to answer these kinds of questions …

Data set: Behavioral Risk Factor Surveillance System (BRFSS)

• The largest telephone survey to track health risks: http://www.cdc.gov/brfss/

• For an overview, go to: http://apps.nccd.cdc.gov/brfss/

• Visit the above website and try to answer one of the previous questions.

• Report on this, or on another surprising finding.

What did you find?

• … the online tool is good, but we can do much better in R …

Report back

Using the Codebook

• Open the codebook in a text editor (any text editor will do: download the file from the website, then double-click it)

• Use the ‘Search’ function to navigate in the document …

• What does variable QLREST2 encode?

Review of data aggregation with dplyr

group_by, summarise

Recognize .variable

• Use dplyr to compute the mean QLREST2 values by state.

• Summarize each of the variables GENHLTH, AVEDRNK2, and DRNKDRI2 by gender (SEX).

• What is the average weight in the population by state, gender, and educational level? What is the standard deviation?
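A sketch of the first and last exercise on a hypothetical toy extract, assuming dplyr is installed. Only QLREST2 and SEX are names from the slides; `state`, `WEIGHT` and `EDUC` are made-up column names standing in for whatever the codebook uses, and all values are invented. Before averaging real BRFSS answers, check the codebook for special codes (for example for "don't know" or "refused") that should become NA.

```r
library(dplyr)

# Hypothetical toy extract; state, WEIGHT and EDUC are made-up names
brfss <- data.frame(
  state   = c("IA", "IA", "IA", "NE", "NE", "NE"),
  SEX     = c(1, 2, 2, 1, 1, 2),
  EDUC    = c(4, 4, 5, 4, 4, 5),
  QLREST2 = c(5, 10, NA, 0, 20, 3),
  WEIGHT  = c(180, 140, 150, 200, 170, 130)
)

# Mean QLREST2 by state (na.rm = TRUE drops missing answers)
qlrest.stats <- brfss %>%
  group_by(state) %>%
  summarise(mean.qlrest2 = mean(QLREST2, na.rm = TRUE))
qlrest.stats

# Mean and standard deviation of weight by state, gender and education
weight.stats <- brfss %>%
  group_by(state, SEX, EDUC) %>%
  summarise(mean.weight = mean(WEIGHT), sd.weight = sd(WEIGHT), n = n(),
            .groups = "drop")
weight.stats
```

`sd()` returns NA for groups with a single respondent, which is why reporting the group size `n()` alongside is useful. The SEX question follows the same pattern with `group_by(SEX)`.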

Maps

What is a map?

[Figure: two panels plotting lat against long. First panel: "Set of points specifying latitude and longitude". Second panel: "Polygon: connect dots in correct order".]
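Map data usually carries an explicit drawing order next to long and lat. A minimal sketch with a hypothetical four-corner outline whose rows are stored out of order shows why sorting on that column matters before the dots are connected.

```r
# Hypothetical outline of a square, stored like map data with an
# "order" column, but with its rows shuffled
outline <- data.frame(
  order = c(3, 1, 4, 2),
  long  = c(-92, -94, -94, -92),
  lat   = c(43, 41, 43, 41)
)

# Joining the points in the current row order would draw a crossed
# "bow tie"; sorting by order restores the walk around the boundary
outline <- outline[order(outline$order), ]
outline
```

Afterwards, `plot(outline$long, outline$lat)` followed by `polygon(outline$long, outline$lat)` draws the square; `polygon()` joins the vertices in row order and closes the shape back to the first point.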

What is a map?

[Figure: two panels plotting lat against long. Caption: "Polygon: connect only the correct dots".]

Grouping

• Use the parameter group to connect the “right” dots (sometimes the grouping variable has to be created first)
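A minimal sketch, assuming ggplot2 is installed, with hypothetical data holding two separate shapes. The `group` column tells `geom_polygon()` which points belong together; without it, all seven points would be joined into one shape.

```r
library(ggplot2)

# Hypothetical map data with two separate shapes: "group" marks which
# points belong to the same polygon, "order" gives the drawing order
shapes <- data.frame(
  long  = c(-94, -92, -92, -94, -90, -88, -89),
  lat   = c(41, 41, 43, 43, 41, 41, 43),
  group = c(1, 1, 1, 1, 2, 2, 2),
  order = 1:7
)

# group = group draws one polygon per group value
p <- ggplot(shapes, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black")
```

`print(p)` draws the square and the triangle as two separate polygons.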

[Figure: map panels of lat against long, produced by the commands below.]

qplot(long, lat, geom="point", data=states)  # points only

qplot(long, lat, geom="path", data=states, group=group)  # outlines, one path per group

qplot(long, lat, geom="polygon", data=states, group=group, fill=region)  # filled, coloured by region

qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)  # filled, coloured by latitude
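`qplot()` was deprecated in ggplot2 3.4.0, so the first three commands might be written with `ggplot()` as below. This assumes `states` comes from `map_data("state")`, which needs the maps package installed; the slides do not show where `states` or `states.map` come from, so the fourth command is left out.

```r
library(ggplot2)

# Assumption: states holds the US state outlines from the maps package,
# with columns long, lat, group, order, region and subregion
states <- map_data("state")

# Points only
p1 <- ggplot(states, aes(long, lat)) + geom_point()

# Paths: group = group keeps the line from jumping between polygons
p2 <- ggplot(states, aes(long, lat, group = group)) + geom_path()

# Filled polygons, coloured by state name
p3 <- ggplot(states, aes(long, lat, group = group, fill = region)) +
  geom_polygon()
```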

Merging Files

• merge(x, y, ...)

• help(merge)

• Specify the key variable(s) in x and y along which records are aligned (argument by, or by.x and by.y)
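A small sketch of `merge()` on hypothetical data: a map outline with a `region` key and a table of made-up percentages per region. One practical point for maps: `merge()` sorts its result on the key columns (`sort = TRUE` by default), so the drawing order of the polygon points has to be restored afterwards.

```r
# Hypothetical map outline (two regions) and a per-region summary
map.df <- data.frame(
  region = c("iowa", "iowa", "iowa", "nebraska", "nebraska", "nebraska"),
  order  = 1:6,
  long   = c(-96, -91, -91, -104, -96, -96),
  lat    = c(41, 41, 43, 40, 40, 43)
)
stats.df <- data.frame(
  region = c("nebraska", "iowa"),
  pct    = c(0.12, 0.08)
)

# Align the records on the key variable "region"
merged <- merge(map.df, stats.df, by = "region")

# Restore the drawing order of the polygon points before plotting
merged <- merged[order(merged$order), ]
merged
```

With `all.x = TRUE`, regions without a match in the summary table are kept with NA values instead of being dropped.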

Your Turn

• Draw a choropleth map of states showing percentage of households without healthcare coverage (HLTHPLAN == 2)

• Are the elderly more affected? Draw choropleth maps of states showing the percentage of households without healthcare coverage (HLTHPLAN) by age group (AGE10, defined earlier). What is the group size?
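A minimal sketch of the whole pipeline on hypothetical toy responses, assuming dplyr and ggplot2 are installed: summarise per state, merge with outlines, restore the drawing order, and fill by the percentage. The column `region` (lower-case state name), the outline coordinates and all values are made up; with real data, `map_data("state")` could supply the outlines, and HLTHPLAN codes other than 1 and 2 should be handled as the codebook describes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy responses; HLTHPLAN == 2 means "no coverage" as on the slide
brfss <- data.frame(
  region   = c("iowa", "iowa", "iowa", "iowa", "nebraska", "nebraska"),
  HLTHPLAN = c(1, 2, 1, 1, 2, 2)
)

# Percentage without coverage per state, plus the group size
coverage <- brfss %>%
  group_by(region) %>%
  summarise(pct.uninsured = 100 * mean(HLTHPLAN == 2), n = n())

# Hypothetical outlines standing in for real map data
outlines <- data.frame(
  region = c("iowa", "iowa", "iowa", "nebraska", "nebraska", "nebraska"),
  order  = 1:6,
  group  = c(1, 1, 1, 2, 2, 2),
  long   = c(-96, -91, -91, -104, -96, -96),
  lat    = c(41, 41, 43, 40, 40, 43)
)

choro <- merge(outlines, coverage, by = "region")
choro <- choro[order(choro$order), ]  # merge() reorders the rows

p <- ggplot(choro, aes(long, lat, group = group, fill = pct.uninsured)) +
  geom_polygon()
```

For the age question, `group_by(region, AGE10)` gives one percentage and one group size `n` per state and age group; adding `facet_wrap(~ AGE10)` to the plot would then draw one map per age group.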