Why is R slow? How to run R programs faster? Tomas Kalibera


Post on 30-Apr-2020


Page 1: CMYK 0/100/100/20 66/54/42/17 34/21/10/0 Why is R slow ...hlavka/vyuka/past/fastr-presentation9.pdf · Currently working on: FastR A new, experimental virtual machine for (a subset

Why is R slow? How to run R programs faster?

Tomas Kalibera


My Background

Virtual machines, runtimes for programming languages

Real-time Java
Automatic memory management

Evaluating software performance

R User

Benchmarks
Using statistical methods

Currently working on: FastR

A new, experimental virtual machine for (a subset of) R language. Discovering optimizations that can speed-up R.

Core team

Jan Vitek, Tomas Kalibera, Petr Maj, Floreal Morandat

Wider team


Community: Dynamic Languages for Scalable Data Analytics

Use one dynamic, high level language for data analytics tasks running on platforms from a tablet to the cloud.

R, Matlab, Python, Julia

Large software companies interested in R

NSF Funded Workshop at SPLASH 2013
Software Infrastructure for Sustained Innovation

Virtual Machines, R & FastR

Source code

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "tm n\n");
            return 1;
        }
        int n = atoi(argv[1]);
        printf("n = %d\n", n);

[Figure: parsing turns the source code into a parse tree with nodes main, if, decl, call, !=, argc, 2, call, ret.]

(AST) Interpreter

[Figure: the same parse tree (main, if, decl, call, !=, argc, 2, call, ret).]

The parse tree is executed directly by the interpreter.

    class If {
        Node Condition, TrueBranch, FalseBranch;

        Result execute() {
            if (Condition.execute() == TRUE) {
                TrueBranch.execute();
            } else {
                FalseBranch.execute();
            }
            return NULL;
        }
    }

GNU R works like this.

Easy to develop, maintain.
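The If node above can be tried out directly in R, because R exposes its own parse trees through quote(). The following is only a minimal sketch of the tree-walking idea: the name toy_eval is hypothetical, it handles just constants, variables, if and plain function calls, and it is not how GNU R is actually implemented.

```r
# Toy AST interpreter over R's own parse trees (illustrative only; the
# name toy_eval is hypothetical and this is not GNU R's implementation).
toy_eval <- function(node, env) {
  # Constants evaluate to themselves.
  if (is.numeric(node) || is.logical(node) || is.character(node)) {
    return(node)
  }
  # A symbol is looked up in the given environment.
  if (is.name(node)) {
    return(get(as.character(node), envir = env))
  }
  op <- as.character(node[[1]])
  if (op == "if") {
    # Mirrors the If node: evaluate the condition, then one branch.
    if (isTRUE(toy_eval(node[[2]], env))) {
      return(toy_eval(node[[3]], env))
    }
    if (length(node) >= 4) {
      return(toy_eval(node[[4]], env))
    }
    return(NULL)
  }
  # Any other call: evaluate the arguments, then apply the function.
  args <- lapply(as.list(node)[-1], toy_eval, env = env)
  do.call(op, args)
}

env <- new.env()
assign("argc", 3, envir = env)
tree <- quote(if (argc != 2) "usage error" else "ok")
result <- toy_eval(tree, env)
```

Every node visit here is a function call plus a variable look-up, which is the kind of overhead the rest of the talk measures.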

Compiler

Ahead of time: C/C++/Fortran
Just-in-time: Java/C#

[Figure: compilation and linking turn the parse tree (main, if, decl, call, !=, argc, 2, call, ret) into machine code.]

Machine code

    0000000000400580 <main>:
      400580: 41 54                 push   %r12
      400582: 83 ff 02              cmp    $0x2,%edi
      400585: 55                    push   %rbp
      400586: 53                    push   %rbx
      400587: 74 25                 je     4005ae <main+0x2e>
      400589: 48 8b 0d c8 0a 20 00  mov    0x200ac8(%rip),%rcx
      400590: ba 05 00 00 00        mov    $0x5,%edx
      400595: be 01 00 00 00        mov    $0x1,%esi
      40059a: bf 04 08 40 00        mov    $0x400804,%edi
      40059f: e8 cc ff ff ff        callq  400570 <fwrite@plt>
      4005a4: b8 01 00 00 00        mov    $0x1,%eax
      4005a9: 5b                    pop    %rbx
      4005aa: 5d                    pop    %rbp
      4005ab: 41 5c                 pop    %r12
      4005ad: c3                    retq

Fast.

FastR

● Self-optimizing AST interpreter
– Aims to be still easy to develop, maintain
– But fast
● The AST (tree) rewrites as the program executes
– Speculative rewrites, recovery
● Runs on a JVM
– High-performance garbage collector
– Just-in-Time compilation improves speed


Understanding why GNU-R is slow
Speeding-up R programs

Toeplitz Matrix
In AT&T R Benchmarks 2.5 (Simon Urbanek)

Initializing a square matrix

$a_{i,j} = |i - j| + 1$

1 2 3 4 5

2 1 2 3 4

3 2 1 2 3

4 3 2 1 2

5 4 3 2 1

TM using For Loop
(as included in AT&T R Benchmarks 2.5)

    tmFor <- function(n) {
      b <- matrix(nrow = n, ncol = n)
      for (j in 1:n) {
        for (k in 1:n) {
          b[k,j] <- abs(j - k) + 1
        }
      }
      b
    }

$a_{i,j} = |i - j| + 1$


N = 500 650 ms

N = 1000 2610 ms

N = 1500 5910 ms

This is very slow!
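Timings of this kind can be collected with system.time(). The sketch below is only an illustration: the helper name time_ms is hypothetical, the sizes are deliberately small so it runs quickly, and the numbers it produces depend on the machine and R version, so they will not match the figures above.

```r
tmFor <- function(n) {
  b <- matrix(nrow = n, ncol = n)
  for (j in 1:n) {
    for (k in 1:n) {
      b[k, j] <- abs(j - k) + 1
    }
  }
  b
}

# Elapsed wall-clock time in milliseconds for one call (hypothetical helper).
time_ms <- function(f, n) {
  system.time(f(n))[["elapsed"]] * 1000
}

sizes <- c(100, 200)  # small sizes for a quick run, not the N used in the talk
timings <- vapply(sizes, function(n) time_ms(tmFor, n), numeric(1))
print(data.frame(N = sizes, ms = timings))
```

A single run is noisy; repeating the measurement several times and looking at the spread gives a more trustworthy number.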

TM in C

    int *b = (int *)malloc(n * n * sizeof(int));

    for (j = 1; j <= n; j++) {
        for (k = 1; k <= n; k++) {
            b[(k - 1) + (j - 1) * n] = abs(j - k) + 1;
        }
    }

                In R       In C
    N = 500     650 ms     0.2 ms
    N = 1000    2610 ms    0.9 ms
    N = 1500    5910 ms    2.1 ms

The R slowdown is in the thousands (roughly 3000x at each N).

Toeplitz Matrix
Understanding why GNU-R is slow

TM: Checking with a profiler

    > Rprof()
    > dummy <- tmFor(5000)
    > Rprof(NULL)
    > summaryRprof()

    $by.self
             self.time self.pct total.time total.pct
    "tmFor"      51.42    86.36      59.54    100.00
    "abs"         2.80     4.70       2.80      4.70
    "-"           2.76     4.64       2.76      4.64
    "+"           2.42     4.06       2.42      4.06
    "matrix"      0.12     0.20       0.12      0.20
    ":"           0.02     0.03       0.02      0.03

    $by.total
             total.time total.pct self.time self.pct
    "tmFor"       59.54    100.00     51.42    86.36
    "abs"          2.80      4.70      2.80     4.70
    "-"            2.76      4.64      2.76     4.64
    "+"            2.42      4.06      2.42     4.06
    "matrix"       0.12      0.20      0.12     0.20
    ":"            0.02      0.03      0.02     0.03

TM: R profiler does not help

Performance-critical part: the nested for loop in tmFor. The profile attributes 86% of self time to tmFor itself and does not say where inside the function that time goes.

TM: Checking with a system profiler

Build R with debug symbols:

    env CFLAGS=-g ./configure --with-blas --with-lapack --enable-R-static-lib --disable-BLAS-shlib
    make

runtm.r:

    source("tm.r")
    dummy <- tmFor(5000)

Record and report:

    perf record -g -- ~/work/R/R-3.0.2/R-3.0.2-dbg/bin/R --slave < runtm.r
    perf report -g

    +  1.08%  R  R  [.] real_binary
    +  0.75%  R  R  [.] integer_binary
    +  0.74%  R  R  [.] do_abs

    +  9.91%  R  R  [.] Rf_eval
    +  9.53%  R  R  [.] Rf_cons
    -  6.67%  R  R  [.] Rf_findVarInFrame3
       - Rf_findVarInFrame3
          + 29.17% Rf_findVar
          +  7.84% EnsureLocal
          +  2.21% Rf_eval

Variable look-up: the Rf_findVarInFrame3 entry (6.67%), with Rf_findVar, EnsureLocal and Rf_eval listed beneath it.

R built-in functions can be changed

    for (j in 1:n) {
      for (k in 1:n) {
        b[k,j] <- abs(j - k) + 1
      }
    }

abs is a built-in function.

abs can be changed at any time:

    > abs <- function(x) { x * x }
    > abs(-10)
    [1] 100

    > for(i in 11:13) { if (i==12) { abs <- sqrt } ; print(abs(i)) }
    [1] 11
    [1] 3.464102
    [1] 3.605551

R built-in functions can be changed

[Figure: environments searched when tmFor looks up abs. The tmFor call frame (n, b, j, k) is enclosed by GlobalEnv (tmFor), which leads to BaseNamespaceEnv, where abs is bound to .Primitive("abs").]

Variable look-up
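The chain of environments that a look-up has to walk can be inspected from R itself. The sketch below is an illustration under stated assumptions: lookup_path is a hypothetical helper, and it starts from the global environment rather than from inside a running tmFor call. In an ordinary session the attached packages sit between the global environment and base, so the chain is usually longer than the three boxes on the slide.

```r
# Walk the chain of enclosing environments until `name` is found and
# return the names of the environments visited (hypothetical helper).
lookup_path <- function(name, env) {
  visited <- character()
  repeat {
    visited <- c(visited, environmentName(env))
    if (exists(name, envir = env, inherits = FALSE)) {
      return(visited)
    }
    if (identical(env, emptyenv())) {
      return(NULL)  # reached the end of the chain without finding it
    }
    env <- parent.env(env)
  }
}

path <- lookup_path("abs", globalenv())
print(path)  # starts at the global environment and ends at base
```

Because abs could be redefined in any of these environments at any time, the interpreter cannot skip this walk without some form of speculation.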

R built-in functions can be changed

    for (j in 1:n) {
      for (k in 1:n) {
        b[k,j] <- abs(j - k) + 1
      }
    }

abs is a built-in function.

+ - ( [ { <- for : are all built-in functions.

    > `:` <- sum
    > 1:10
    [1] 11

    > `<-` <- function(x,val) { eval.parent( assign(deparse(substitute(x)), 100)) }
    > z <- 10
    [1] 100

Variables can be deleted

    for (j in 1:n) {
      for (k in 1:n) {
        b[k,j] <- abs(j - k) + 1
      }
    }

Variable look-up is needed:

    > x <- 10
    > rm(x)
    > x
    Error: object 'x' not found

Loop control variable can be deleted:

    > for(i in 1:3) { if (i==2) { rm(i) } else print(i) }
    [1] 1
    [1] 3

    > for(i in 1:3) { if (i==2) { rm(i) } ; print(i) }
    [1] 1
    Error in print(i) : object 'i' not found

TM: Checking with a system profiler

Linked-list allocation and use

    +  9.91%  R  R  [.] Rf_eval
    -  9.53%  R  R  [.] Rf_cons
       - Rf_cons
          + 29.87% Rf_allocList
          + 24.96% Rf_evalList
          + 14.35% Rf_evalListKeepMissing
          +  6.04% Rf_lcons
          +  5.90% Rf_DispatchOrEval
          +  5.29% Rf_list2
          +  3.85% evalseq
          +  3.26% Rf_defineVar
          +  3.04% Rf_list1
          +  1.18% Rf_eval
          +  0.75% replaceCall
          +  0.52% evalArgs
    +  6.67%  R  R  [.] Rf_findVarInFrame3

Arguments passed as linked-list

    for (j in 1:n) {
      for (k in 1:n) {
        b[k,j] <- abs(j - k) + 1

The assignment is converted to a general replacement call of the form F(X) <- Y.

The replacement call is then transformed:

    F(X) <- Y      becomes   TMP <- X; X <- "F<-"(TMP, value = Y)
    b[k,j] <- Y    becomes   TMP <- b; b <- "[<-"(TMP, k, j, value = Y)
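The transformation can be checked by hand, since the replacement function `[<-` is callable like any other function. This is a small sketch with made-up values for k, j and Y; it shows that the two forms give the same result, not how the evaluator performs the rewrite internally.

```r
b1 <- matrix(0, nrow = 3, ncol = 3)
b2 <- b1
k <- 2
j <- 3
Y <- 42

# Ordinary replacement syntax.
b1[k, j] <- Y

# The same update written out as an explicit call to the replacement function.
TMP <- b2
b2 <- `[<-`(TMP, k, j, value = Y)
rm(TMP)

print(identical(b1, b2))
```

In the loop of tmFor this expansion happens once per element, so its cost is paid n * n times.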

Replacement call is expensive

    b[k,j] <- Y
    TMP <- b; b <- "[<-"(TMP, k, j, value = Y)

[Figure: the transformed call drawn as linked-list cells holding [<-, TMP, k, j, Y and <-, b.]

This linked list is allocated in each iteration.

Toeplitz Matrix
Speeding-up R programs

R Byte-code compiler

    env R_ENABLE_JIT=3 R

                AST        Bytecode
    N = 500     650 ms     130 ms
    N = 1000    2610 ms    530 ms
    N = 1500    5910 ms    1150 ms

Always use the byte-code compiler!

    > require(compiler)
    Loading required package: compiler
    > help(cmpfun)
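A function can also be compiled explicitly with cmpfun() from the compiler package. The sketch below only checks that the compiled version returns the same matrix; the name tmForCompiled is an assumption for illustration. Note that since R 3.4.0 the JIT is enabled by default, so on a current R the plain tmFor is likely to be byte-compiled automatically and the gap shown above may not be reproducible.

```r
library(compiler)

tmFor <- function(n) {
  b <- matrix(nrow = n, ncol = n)
  for (j in 1:n) {
    for (k in 1:n) {
      b[k, j] <- abs(j - k) + 1
    }
  }
  b
}

# Explicit byte-code compilation of a single function.
tmForCompiled <- cmpfun(tmFor)

# The compiled function must compute exactly the same result.
same <- identical(tmFor(50), tmForCompiled(50))
print(same)
```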

TM: Sapply

    tmSapply <- function(n) {
      sapply(1:n, function(j) {
        sapply(1:n, function(k) {
          abs(j - k) + 1
        })
      })
    }

                For        Sapply
    N = 500     130 ms     320 ms
    N = 1000    530 ms     1300 ms
    N = 1500    1150 ms    2960 ms

Using sapply instead of for sometimes helps. Not in this case...

TM: Rows Algo

    tmRows <- function(n) {
      b <- matrix(nrow = n, ncol = n)
      b[1,] <- 1:n
      if (n >= 2) {
        for(r in 2:n) {
          b[r,] <- c(r, b[r-1,-n])
        }
      }
      b
    }

1 2 3 4 5

2 1 2 3 4

3 2 1 2 3

4 3 2 1 2

5 4 3 2 1

                For        Rows
    N = 500     130 ms     13 ms
    N = 1000    530 ms     59 ms
    N = 1500    1150 ms    169 ms

Much faster. Reduced calls, lookups.

TM: Cols Algo

    tmCols <- function(n) {
      b <- matrix(nrow = n, ncol = n)
      b[,1] <- 1:n
      if (n >= 2) {
        for(col in 2:n) {
          b[,col] <- c(col, b[-n, col-1])
        }
      }
      b
    }

1 2 3 4 5

2 1 2 3 4

3 2 1 2 3

4 3 2 1 2

5 4 3 2 1

TM: Cols2 Algo

    tmByCols <- function(n) {
      if (n >= 2) {
        sapply(1:n, function(col) {
          if (col < n) {
            c( col:1, 2:(n-col+1) )
          } else {
            n:1
          }
        })
      } else {
        1
      }
    }

1 2 3 4 5

2 1 2 3 4

3 2 1 2 3

4 3 2 1 2

5 4 3 2 1

                Rows       Cols2
    N = 500     13 ms      5 ms
    N = 1000    59 ms      39 ms
    N = 1500    169 ms     58 ms

Much faster. Reduced calls, lookups.

TM: Outer Algo

    tmOuter <- function(n) {
      outer(X = 1:n, Y = 1:n, FUN = function(j,k) {
        abs(j - k) + 1
      })
    }

1 2 3 4 5

2 1 2 3 4

3 2 1 2 3

4 3 2 1 2

5 4 3 2 1

                Cols2      Outer      C
    N = 500     5 ms       2 ms       0.2 ms
    N = 1000    39 ms      27 ms      0.9 ms
    N = 1500    58 ms      47 ms      2.1 ms

Yet faster. Vectorized.
Also easy to read.
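When swapping one implementation for another it is worth confirming that they still compute the same matrix. The sketch below restates three of the variants from the slides and compares them with all.equal(), which is used instead of identical() on the assumption that the variants may differ in storage type (tmRows produces integers, the others doubles).

```r
tmFor <- function(n) {
  b <- matrix(nrow = n, ncol = n)
  for (j in 1:n) {
    for (k in 1:n) {
      b[k, j] <- abs(j - k) + 1
    }
  }
  b
}

tmRows <- function(n) {
  b <- matrix(nrow = n, ncol = n)
  b[1, ] <- 1:n
  if (n >= 2) {
    for (r in 2:n) {
      b[r, ] <- c(r, b[r - 1, -n])
    }
  }
  b
}

tmOuter <- function(n) {
  outer(X = 1:n, Y = 1:n, FUN = function(j, k) abs(j - k) + 1)
}

n <- 20
reference <- tmFor(n)
rows_ok <- isTRUE(all.equal(reference, tmRows(n)))
outer_ok <- isTRUE(all.equal(reference, tmOuter(n)))
print(c(rows = rows_ok, outer = outer_ok))
```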

TM: Summary

                For        Outer      C          For-FastR
    N = 500     130 ms     2 ms       0.2 ms     13 ms
    N = 1000    530 ms     27 ms      0.9 ms     47 ms
    N = 1500    1150 ms    47 ms      2.1 ms     101 ms

Summary

● Use byte-code compiler
● Vectorize
● Use built-ins (sum, prod, cumsum, outer)
● Use simplest data structure possible
– Matrix instead of data.frame
– Avoid data.frame indexing
● Save and re-use intermediate results
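As a small illustration of the "vectorize" and "use built-ins" advice, the sketch below computes a running total twice: once with an explicit loop and once with cumsum. The data vector is made up for the example; it is not from the talk.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Explicit loop: one interpreted iteration per element.
running_loop <- numeric(length(x))
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
  running_loop[i] <- total
}

# Built-in: the same running total in a single vectorized call.
running_builtin <- cumsum(x)

print(isTRUE(all.equal(running_loop, running_builtin)))
```

The built-in version does its iteration in compiled code, so the per-element costs of look-up and argument passing discussed earlier are paid once rather than once per element.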

Please consider donating your code/data in the form of benchmarks.