Distributed R for big data

Shivaram Venkataraman*, Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+, Erik Bodzsar#, Kyungyong Lee^+

*UC Berkeley, +HP Labs, #U Chicago, ^UFL

Post on 24-Feb-2016

TRANSCRIPT

Page 1: Distributed R for big data

Distributed R for big data

Shivaram Venkataraman*, Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+, Erik Bodzsar#, Kyungyong Lee^+

*UC Berkeley, +HP Labs, #U Chicago, ^UFL

Page 2: Distributed R for big data

R

Single Threaded + Single Machine

Page 3: Distributed R for big data

RRR

R R

Page 4: Distributed R for big data

darray

Page 5: Distributed R for big data

foreach

f (x)

Page 6: Distributed R for big data

20x faster than in-memory Hadoop

Power method with 1B edges, Netflix ALS

Scale

Speed

Page 7: Distributed R for big data

demo

Page 8: Distributed R for big data
Page 9: Distributed R for big data

# n x n matrix, partitioned into row blocks of s rows
lj_matrix <- darray(dim=c(n,n), blocks=c(s,n))

in_vector <- darray(dim=c(n,1), blocks=c(s,1), data=1/n)

out_vector <- darray(dim=c(n,1), blocks=c(s,1))

# each task multiplies one row block by the whole input vector
foreach(i, 1:length(splits(lj_matrix)), function(g = splits(lj_matrix, i), x = splits(in_vector), o = splits(out_vector, i)) {
  o <- g %*% x
  update(o)
})
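
The partitioned multiply on this slide can be emulated on a single machine in plain R. What follows is a minimal sketch under stated assumptions, not the Distributed R implementation: it uses base R only, the helper names `block_rows` and `power_step` are hypothetical, a small random column-normalized matrix stands in for the graph matrix, and `lapply` stands in for the parallel `foreach` over splits.

```r
# Single-machine emulation of one blockwise multiply step.
# Hypothetical helper names; base R only, no Distributed R API is used.

# Split row indices 1..n into consecutive blocks of at most s rows.
block_rows <- function(n, s) {
  unname(split(seq_len(n), ceiling(seq_len(n) / s)))
}

# One step: each row block of `m` is multiplied by the full input vector,
# and the per-block results are stacked to form the output vector.
power_step <- function(m, in_vector, s) {
  blocks <- block_rows(nrow(m), s)
  parts <- lapply(blocks, function(rows) {
    m[rows, , drop = FALSE] %*% in_vector
  })
  do.call(rbind, parts)
}

set.seed(1)
n <- 6                                  # matrix dimension (assumed toy size)
s <- 2                                  # rows per block (assumed)
m <- matrix(runif(n * n), n, n)
m <- sweep(m, 2, colSums(m), "/")       # make each column sum to 1
in_vector <- matrix(1 / n, n, 1)

out_vector <- power_step(m, in_vector, s)
```

Each block needs only its own rows of the matrix plus the whole input vector, so the blocks are independent of one another and could run as separate tasks. Repeating the step with `out_vector` fed back in as `in_vector` gives the power method.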

Page 10: Distributed R for big data

Contact us - alpha version

[email protected]

hpl.hp.com/research/presto.htm

tinyurl.com/presto-project

Page 11: Distributed R for big data
Page 12: Distributed R for big data

R

R

RR