learn to love the data frame
![Page 1: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/1.jpg)
@JennyBryan @jennybc
learn to love the data frame
@STAT545 http://stat545.com
![Page 2: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/2.jpg)
😭 decision fatigue, aggravation, cutting corners
😎 mastery, efficiency, safety
![Page 3: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/3.jpg)
😭 decision fatigue, aggravation, cutting corners
😎 mastery, efficiency, safety
I want this for you!
![Page 4: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/4.jpg)
Next 45 mins: I want to convince you that “when in doubt, stick it in a data frame” will increase your mastery, efficiency, safety and reduce misery.
![Page 5: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/5.jpg)
Three tricky bits:
- wide range of previous R experience
- existing BioC context re: classes
- tibble variant of data frame very new
![Page 6: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/6.jpg)
do first bit of pirates vs ninjas live coding
![Page 7: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/7.jpg)
R objects come in a few flavours
a simple view of simple R objects that will get you pretty far

Simple view vs. technically correct R view (mode, class, typeof):

| Simple view | mode | class | typeof |
|---|---|---|---|
| character | character | character | character |
| logical | logical | logical | logical |
| numeric | numeric | integer or numeric | integer or double |
| factor | numeric | factor | integer |
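
The rows of this slide's table can be checked directly in an R session. The following is a minimal sketch in base R; the object names (`objs`, `flavours`) and the example values are made up for illustration and are not from the talk.

```r
# One hypothetical example object per flavour (values are illustrative only).
objs <- list(
  character = c("pirate", "ninja"),
  logical   = c(TRUE, FALSE),
  double    = c(1.5, 2),
  integer   = 1:3,
  factor    = factor(c("pirate", "ninja", "pirate"))
)

# Ask R for its three "technically correct" descriptions of each object.
flavours <- data.frame(
  mode   = vapply(objs, mode, character(1)),
  class  = vapply(objs, function(x) class(x)[1], character(1)),
  typeof = vapply(objs, typeof, character(1)),
  row.names = names(objs),
  stringsAsFactors = FALSE
)

print(flavours)
```

Note the factor row: it looks like character data when printed, but it is stored as integer codes with a `levels` attribute, which is why its mode is `numeric` and its typeof is `integer`.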
![Page 10: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/10.jpg)
Collecting scalars of the same mode?
- yes: vector
- no: list

Collecting vectors of the same length?
- no: list
- yes: Collecting vectors of the same mode?
  - yes: matrix
  - no: data.frame
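
Each outcome of this decision chart corresponds to a base R constructor. A minimal sketch follows, with made-up values and hypothetical object names (`v`, `l`, `m`, `df`, `coerced`); it also shows what happens if mixed modes are forced into a plain vector.

```r
# Scalars of the same mode: an atomic vector.
v <- c(1.2, 3.4, 5.6)

# Scalars of different modes: a list keeps each element's own type.
l <- list("ninja", 2L, TRUE)

# Vectors of the same length AND the same mode: a matrix.
m <- cbind(a = 1:3, b = 4:6)

# Vectors of the same length but different modes: a data frame.
df <- data.frame(
  name = c("a", "b", "c"),
  n    = 1:3,
  ok   = c(TRUE, FALSE, TRUE)
)

# Forcing mixed modes into an atomic vector silently coerces
# everything to the most general type (character here).
coerced <- c("ninja", 2L, TRUE)

str(list(v = v, l = l, m = m, df = df, coerced = coerced))
```

The last case is the reason the data frame matters: it is the one structure on the chart that holds equal-length variables of different modes without coercion.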
![Page 11: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/11.jpg)
factor integer numeric
![Page 12: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/12.jpg)
😎
If you can put it in a data frame, DO THAT.
Operate on the data frame holistically.
Pass it to other functions, pref. intact and whole.
Learn how to limit computation to specific rows or columns. Don’t create copies or excerpts lightly.
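
One way to read this advice in base R: many functions accept the whole data frame through a `data =` argument and can restrict rows with `subset =`, so no excerpt needs to be stored. The sketch below uses an invented `fighters` data frame; the column names and values are assumptions for illustration only.

```r
# Hypothetical data: not from the talk.
fighters <- data.frame(
  group  = c("pirate", "pirate", "ninja", "ninja", "ninja"),
  height = c(170, 180, 160, 165, 175),
  weight = c(80, 90, 60, 62, 70)
)

# Pass the data frame whole; limit the rows with `subset =`.
ninja_fit <- lm(weight ~ height, data = fighters, subset = group == "ninja")

# with() evaluates an expression inside the data frame,
# so no named copy of the ninja rows is left lying around.
ninja_mean <- with(fighters, mean(height[group == "ninja"]))

# aggregate() computes per group while `fighters` stays intact.
by_group <- aggregate(height ~ group, data = fighters, FUN = mean)

print(by_group)
```

After this runs, `fighters` is still the only copy of the data in the workspace, which is the point of the slide.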
![Page 13: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/13.jpg)
minimize the creation of data excerpts and copies ...
... they will just confuse you later
![Page 14: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/14.jpg)
back to pirates vs ninjas live coding, but with the tidyverse
![Page 15: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/15.jpg)
😎
If you can put it in a data frame, DO THAT.
Develop your preferred workflow for df manipulation and use it maniacally.
Strong recommend: use the “tidyverse”
- tibble + dplyr + tidyr
- tbl_df or “tibble” is a variant of data.frame
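
As a rough illustration of what such a workflow can look like, here is a short dplyr pipeline on a tibble. It assumes the tibble and dplyr packages are installed; the `fighters` data and its columns are invented for this sketch and do not reproduce the live-coding session.

```r
library(tibble)
library(dplyr)

# Hypothetical data: not from the talk.
fighters <- tibble(
  group  = c("pirate", "pirate", "ninja", "ninja", "ninja"),
  height = c(170, 180, 160, 165, 175)
)

# Filter rows, then summarise per group; the input tibble is never modified.
summary_tbl <- fighters %>%
  filter(height > 160) %>%
  group_by(group) %>%
  summarise(n = n(), mean_height = mean(height))

print(summary_tbl)
```

Every step takes a data frame and returns a data frame, so the same few verbs cover most manipulation tasks.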
![Page 16: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/16.jpg)
Where do data frames come from?
![Page 17: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/17.jpg)
- Delimited file: read.table() and friends; readr package
- Coercion: as.data.frame(); as_tibble()
- Assembly from vector parts: data.frame(); tibble::tibble(), tibble::enframe()
- Growing existing object: transform(); dplyr::mutate()
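
The four origins can be tried side by side. This sketch uses only the base R functions named on the slide, with the tidyverse counterparts noted in comments; the file contents, column names, and object names are hypothetical.

```r
# 1. Delimited file: write a tiny CSV to a temporary file, then read it back.
#    (tidyverse counterpart: the readr package)
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), path)
from_file <- read.csv(path, stringsAsFactors = FALSE)

# 2. Coercion from another object, here a named list.
#    (tidyverse counterpart: as_tibble())
from_list <- as.data.frame(list(name = c("a", "b"), score = c(1, 2)))

# 3. Assembly from vector parts.
#    (tidyverse counterparts: tibble::tibble(), tibble::enframe())
from_parts <- data.frame(name = c("a", "b"), score = c(1, 2))

# 4. Growing an existing object by adding a derived column.
#    (tidyverse counterpart: dplyr::mutate())
grown <- transform(from_parts, double_score = score * 2)

print(grown)
```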
![Page 18: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/18.jpg)
Things you need to know about tibbles:
- no partial name matching with `$`
- stringsAsFactors = FALSE
- df[ , “X1”] will be a tibble, i.e. drop = FALSE
- you can print them with wild abandon
- no row names
- does not alter variable names
- will only recycle input of length 1
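
Several of these differences are easy to see by putting a base data frame next to a tibble. The sketch below assumes the tibble package is installed; the object names and values are invented for illustration.

```r
library(tibble)

# Hypothetical data: the same two columns as a data.frame and as a tibble.
df  <- data.frame(name = c("a", "b"), score = 1:2, stringsAsFactors = FALSE)
tbl <- tibble(name = c("a", "b"), score = 1:2)

# Partial name matching with `$`: data.frame matches "sc" to "score",
# while a tibble warns and returns NULL.
df_partial  <- df$sc
tbl_partial <- suppressWarnings(tbl$sc)

# Single-column `[`: data.frame drops to a vector, tibble stays a tibble.
df_col  <- df[, "score"]
tbl_col <- tbl[, "score"]

# Variable names: data.frame() rewrites non-syntactic names, tibble() keeps them.
df_names  <- names(data.frame(`my var` = 1))
tbl_names <- names(tibble(`my var` = 1))

# Recycling: length-1 input is fine, other mismatched lengths are an error.
ok  <- tibble(x = 1:4, y = 0)
bad <- try(tibble(x = 1:4, y = 1:2), silent = TRUE)

str(list(df_col = df_col, tbl_col = class(tbl_col),
         df_names = df_names, tbl_names = tbl_names))
```

In each case the tibble refuses to guess, so mistakes tend to surface earlier.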
![Page 19: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/19.jpg)
Preview of Friday “what should you do next?”
What if you need to divide the df into its rows or groups of rows and compute on them?
Putting complicated objects into list-columns, temporarily. How and why?
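
A small preview of both questions, sketched in base R under stated assumptions: the `fighters` data is invented, and `split()` plus `lapply()` stand in for whatever tools Friday's session actually uses (tidyverse users might reach for tidyr and purrr instead).

```r
# Hypothetical data: not from the talk.
fighters <- data.frame(
  group  = c("pirate", "pirate", "pirate", "ninja", "ninja", "ninja"),
  height = c(170, 180, 190, 160, 165, 175),
  weight = c(80, 90, 100, 60, 62, 70)
)

# Divide the data frame into groups of rows and compute on each piece.
pieces <- split(fighters, fighters$group)
fits   <- lapply(pieces, function(d) lm(weight ~ height, data = d))

# Park the fitted models in a list-column, next to the group they belong to.
models <- data.frame(group = names(fits), stringsAsFactors = FALSE)
models$fit <- fits

# Pull a simple number back out, so the list-column is only temporary.
models$slope <- vapply(models$fit, function(f) coef(f)[["height"]], numeric(1))

print(models[, c("group", "slope")])
```

Keeping each model in the same row as its group label means the two cannot drift out of sync, which is harder to guarantee with separate lists and vectors.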
![Page 20: learn to love the data frame](https://reader034.vdocuments.net/reader034/viewer/2022051522/58a1a99a1a28ab8e608bb3ea/html5/thumbnails/20.jpg)
😭 decision fatigue, aggravation, cutting corners
😎 mastery, efficiency, safety