Improving daily analysis with data.table
a [brief] tutorial
Alex Natekin
Deloitte Analytics Institute
3
Data.table
4
Legend says
“the R god of number crunching”
And many others…
5
Legend says (2)
With great poweR comes great Responsibility
of fasteR & richeR data crunching …
… to read the manual
6
Choose your side
dplyr, sqldf, data.table
“Hadleyverse” Way of the warrior…
…each one is way different from data.frame
7
Choose your side… wisely
from Matt Dowle’s recent meetup presentations
8
Choose your side… wisely (2)
from Matt Dowle’s recent meetup presentations
…just search for “data.table benchmarks”
9
data.table applicability
Solution
Data extraction & checks
Data processing
Feature engineering
Models
Stories
…trying to find your place under the sun
10
data.table applicability
Solution
Data extraction & checks
Data processing
Feature engineering
Models
Stories
Naïve functionality
Most awesome functionality
Is closest to production code
(if applicable to R)
11
Core functionality
1. Data reading & memory management
2. Data access & ordering
3. Grouping & aggregation
…feature engineering
More efficient:
12
Core functionality (2)
1. Data reading & memory management
2. Data access & ordering
3. Grouping & aggregation
…feature engineering
More efficient:
…as data.frame extension (~100% compatible)
1. Reduce machine time
2. Reduce human programming time
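The three core areas listed on this slide can be sketched in a few lines. This is only an illustration, not the speaker's code: the table contents and the column names (customer, region, spend) are made up, and the data is read from in-memory text rather than a real file.

```r
library(data.table)

# Hypothetical toy data; column names and values are illustrative only.
csv_lines <- c(
  "customer,region,spend",
  "a,north,10",
  "b,south,25",
  "c,north,5",
  "d,south,40"
)

# 1. Data reading: fread() parses delimited text
#    (here from in-memory lines instead of a file on disk).
DT <- fread(text = csv_lines)

# 2. Data access & ordering: keep rows with spend >= 10,
#    then sort them by spend in descending order.
big_spenders <- DT[spend >= 10][order(-spend)]

# 3. Grouping & aggregation: total spend and row count per region.
by_region <- DT[, .(total = sum(spend), n = .N), by = region]

# A data.table is also a data.frame, so data.frame-based code accepts it.
is.data.frame(DT)
```

Each step is a single bracket expression, which is where the reduction in human programming time is meant to come from.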
13
Core principle
DT[i, j, by]
1. Take DT
2. Subset rows by i
3. Calculate j
4. …grouped by by
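A minimal sketch of the DT[i, j, by] form, assuming a made-up table with hypothetical columns group and value:

```r
library(data.table)

# Hypothetical toy table; names and values are made up for illustration.
DT <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

# DT[i, j, by]:
#   i  : value > 1                    (subset rows)
#   j  : .(mean_value = mean(value))  (what to calculate)
#   by : group                        (grouped by)
res <- DT[value > 1, .(mean_value = mean(value)), by = group]
print(res)
```

With this toy data, the row with value 1 is dropped first, so group a averages to 2 and group b to 4.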
14
Core principle (2)
from data.table tutorial
15
Example: churn
Sorry
Laptop died last evening, no interactive tutorial
Screenshots from remaining files
16
Example
17
Example
18
Example
19
Example (manual injection)
setkey(DT, colA, colB)
From yet another of Matt Dowle’s recent meetup presentations
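The slide shows only the setkey(DT, colA, colB) call; the screenshot around it is not recoverable. As a rough sketch of what a key does, assuming a small made-up table with the same column names plus a hypothetical val column:

```r
library(data.table)

# Hypothetical toy table reusing the column names from the slide.
DT <- data.table(
  colA = c("y", "x", "y", "x"),
  colB = c(2L, 1L, 1L, 2L),
  val  = c(10, 20, 30, 40)
)

# setkey() sorts DT by reference (no copy is made)
# and marks colA, colB as the key columns.
setkey(DT, colA, colB)
key(DT)

# With a key set, rows can be looked up by key values.
hit <- DT[.("x", 2L)]

# The same lookup written explicitly, without relying on the key.
hit_on <- DT[.("x", 2L), on = c("colA", "colB")]
```

Because setkey() reorders the table in place, any code that depends on the original row order should run before the key is set.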
20
Example
21
Example
22
Example
23
Example
24
Example
25
Example
26
Example: churn
27
Example: churn
28
Example: churn
29
Example: churn
30
Example: churn
31
Functionality: more
1. fread
2. Column updates
3. Set functions (set, setnames, …)
4. Special symbols (.SD, .I, …)
5. Joins
… next time
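For orientation, items 2 to 5 can be sketched on a made-up table. This is not material from the talk: the tables DT and lookup and all column names are hypothetical, and fread is left out because it needs an input file.

```r
library(data.table)

# Hypothetical toy tables.
DT <- data.table(id = c(1L, 2L, 3L), x = c(10, 20, 30), y = c(1, 2, 3))
lookup <- data.table(id = c(1L, 3L), label = c("one", "three"))

# 2. Column update by reference with `:=` (DT is modified in place).
DT[, ratio := x / y]

# 3. Set functions also work in place:
#    rename a column, then overwrite a single cell.
setnames(DT, "ratio", "x_per_y")
set(DT, i = 1L, j = "y", value = 100)

# 4. Special symbols: .SD is the current subset of data
#    (restricted here by .SDcols), .I holds row numbers.
col_means <- DT[, lapply(.SD, mean), .SDcols = c("x", "y")]
rows_big  <- DT[, .I[x > 15]]

# 5. Join: for each row of lookup, fetch the matching rows of DT on id.
joined <- DT[lookup, on = "id"]
```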
32
More: resources
33
SummaRy
1. data.table is helpful & awesome
2. go forth and use it
3. RTFM
Thanks!
Alex Natekin
[email protected]
+7 915 070 45 74