![Page 1: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/1.jpg)
data_wrangling() &&("manipulation" %in% R)
馃惃馃悷馃尶 %>% %>% %>% 馃馃搳馃グ
> day[2]Ruan van Mazijk
![Page 2: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/2.jpg)
tinyurl.com/r-with-ruanNotes & slides will go up here:
(But I encourage you to make your own notes!)
![Page 3: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/3.jpg)
> workshop$outline[1:3]
DAY 1
Tidy data principles& tidyr
DAY 2
Manipulating data& an intro to dplyr
DAY 3
Extending your datawith mutate(),summarise()
& friends
![Page 4: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/4.jpg)
tidyr::
# Verbs to tidy your data
# Untidy observations?gather() # if > 1 observation per rowspread() # if observations live in > 1 row
# Untidy variables?separate() # if > 1 variable per columnunite() # if variables live in > 1 column
![Page 5: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/5.jpg)
> workshop$outline[2:3]
DAY 2
Manipulating data& an intro to dplyr
DAY 3
Extending your datawith mutate(),summarise()
& friends
![Page 6: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/6.jpg)
# base R
data[ , columns ]
data[ rows , ]
![Page 7: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/7.jpg)
# base R
data[, 4]data[, "plantheight"]
data[1:10, ]data[data$soil == "a", ]
![Page 8: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/8.jpg)
# base R
data[, "plantheight"]
data[data$soil == "a", ]
![Page 9: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/9.jpg)
# tidyverse R
data %>%select(plantheight)
data %>%filter(soil == "a")
![Page 10: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/10.jpg)
dplyr::
# Verbs to manipulate your data
![Page 11: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/11.jpg)
dplyr::
# Verbs to manipulate your data
select() # operates on columnsfilter() # operates on rows
![Page 12: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/12.jpg)
data %>%select(...)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
![Page 13: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/13.jpg)
data %>%select(plant_height, soil, lon, lat, veg_type)
![Page 14: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/14.jpg)
data %>%select(plant_height, soil, lon, lat, veg_type)
data %>%select(plant_height:veg_type)
# Think 1:10 but with words!
![Page 15: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/15.jpg)
data %>%select(plant_height, soil, lon, lat, veg_type)
data %>%select(plant_height:veg_type)
# Think 1:10 but with words!
data %>%select(-mean_annual_temp)
# Think data[, -10],# Or like gather(key, value, -foo)
![Page 16: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/16.jpg)
data %>%select(plant_height, plant_weight, plant_LAI)
![Page 17: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/17.jpg)
data %>%select(plant_height, plant_weight, plant_LAI)
data %>%select(starts_with("plant"))
# Also:# contains() ends_with() matches()# num_range() one_of() starts_with()
![Page 18: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/18.jpg)
data %>%select(plant_height, plant_weight, plant_LAI)
data %>%select(starts_with("plant"))
# Also:# contains() ends_with() matches()# num_range() one_of() starts_with()
data %>%select_if(is.numeric)
# Accepts base R functions (sans "()"):# is.logical is.character is.numeric# is.factor is.datetime
![Page 19: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/19.jpg)
data %>%filter(...)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
![Page 20: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/20.jpg)
data %>%filter(plant_height <= 10)
![Page 21: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/21.jpg)
data %>%filter(plant_height <= 10)
data %>%filter(plant_height <= 10, vegtype == "fynbos")
![Page 22: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/22.jpg)
data %>%filter(plant_height <= 10)
data %>%filter(plant_height <= 10, vegtype == "fynbos")
# Multiple conditions must all be satisfied# So it "&&"s them, so it would be the same as:data %>%
filter(plant_height <= 10 & vegtype == "fynbos")
![Page 23: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/23.jpg)
data %>%filter(plant_height <= 10)
data %>%filter(plant_height <= 10, vegtype == "fynbos")
# Multiple conditions must all be satisfied# So it "&"s them, so it would be the same as:data %>%
filter(plant_height <= 10 & vegtype == "fynbos")
data %>%filter(plant_height <= 10 | plant_weight >= 60)
# We can use "or": |
![Page 24: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/24.jpg)
# Intervals?
data %>%filter(plant_height <= 10 & plant_height >= 0.5)
# There is also a tidy way!data %>%
filter(plant_height %>% between(0.5, 10))
![Page 25: > day[2] - rvanmazijk.github.io filetidyr:: # Verbs to tidy your data # Untidy observations? gather() # if > 1 observation per row spread() # if observations live in > 1 row # Untidy](https://reader030.vdocuments.net/reader030/viewer/2022040714/5e1c0c3a21f5c44f01085a44/html5/thumbnails/25.jpg)
> demo()