the #rdatatable package · the #rdatatable package open analytics data analyst for fast, flexible...

27
THE #RDATATABLE PACKAGE OPEN ANALYTICS Data Analyst for fast, flexible and memory ecient data wrangling Arun Srinivasan UseR’15

Upload: others

Post on 23-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: THE #RDATATABLE PACKAGE · THE #RDATATABLE PACKAGE OPEN ANALYTICS Data Analyst for fast, flexible and memory efficient data wrangling Arun Srinivasan UseR’15 •Bioinformatician,

THE #RDATATABLE PACKAGE

OPEN ANALYTICS Data Analyst

for fast, flexible and memory efficient data wrangling

Arun SrinivasanUseR’15

Page 15: THE #RDATATABLE PACKAGE · THE #RDATATABLE PACKAGE OPEN ANALYTICS Data Analyst for fast, flexible and memory efficient data wrangling Arun Srinivasan UseR’15 •Bioinformatician,

year val2013 42014 22014 32015 12015 52015 6

year val2013 42014 22014 32015 12015 52015 6

year mul2014 202015 10

+

+

**

year val2014 1002015 120

year val2013 42014 22014 32015 12015 52015 6

year mul2014 202015 10

year val2014 1002015 120

year mul2014 202015 10

X[Y, .(val = sum(val) * mul), by = .EACHI]s

3. AGGREGATIONS ON JOINS

OPEN ANALYTICS

Page 23: THE #RDATATABLE PACKAGE · THE #RDATATABLE PACKAGE OPEN ANALYTICS Data Analyst for fast, flexible and memory efficient data wrangling Arun Srinivasan UseR’15 •Bioinformatician,

id start end1 0.5 1.11 1 22 0.1 0.42 2.5 5.22 5 6

Axid yid1 NA2 23 34 35 3

result

foverlaps(A, B, which=TRUE)id start end

1 0.1 0.41 1.5 1.82 0.1 5.5

B1:2:3:4:5:

1:2:3:

Not limited to integer ranges, but numeric, POSIXct, Date ranges work just the same.

5. OVERLAPPING RANGE JOINS

OPEN ANALYTICS