2014 july use_r

R In Production: the products. Yasmin Lucero, PhD, Senior Statistician, Gravity-AOL. useR! 2014

Upload: yasmin-lucero

Post on 25-Jun-2015


TRANSCRIPT

Page 1: 2014 july use_r

R In Production: the products

Yasmin Lucero, PhD

Senior Statistician, Gravity-AOL

useR! 2014

Page 2: 2014 july use_r

Outline
• Internal products
  • 1. one-off analysis
  • 2. automated reports
  • 3. internal R packages
  • 4. internal dashboards
• External products
  • 1. customer facing web-app
  • 2. analytical backend service
• Ops and the managing of an R environment

Page 3: 2014 july use_r

Internal Product 1: one-off analytical product

http://rpubs.com/nathanesau1/21383

Nathan Esau

Hilary Parker

Page 4: 2014 july use_r

Internal Product 2: Automated reports

Thursday morning: Automated Business Reporting with R (Zhengying (Doro) Lour)

• R + bash + email
• R + markdown + web server

Page 5: 2014 july use_r

Internal Product 3: The internal R package

• Data APIs
• Business specific metrics
• Custom plotting functions
• Custom data manipulation utilities

Thursday Morning: An R tools platform in Cosmetic Industry (Jean-Francois Collin)

Page 6: 2014 july use_r

Internal Product 4: The internal dashboard

Gravity-AOL

Page 7: 2014 july use_r

External Product 1: Customer facing web app

Wednesday afternoon: Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)

http://www.showmeshiny.com/

Page 8: 2014 july use_r

External Product 2: analytical back-end

Wednesday afternoon:
• Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
• Zillow’s Big Data and Real-time Services in R (Yeng Bun)

Page 9: 2014 july use_r

[Diagram: CARD.com platform. Labeled components: Artwork & Brands; Bank Partner; Transactions; CARD.COM Site / App; CARD.COM AdTech Platform; APIs; RTB Ad Xchgs; CARD.COM Analytics Platform; Members; Visitors. Numbered markers 1, 2, 3. Cycle labels: predict, deploy, learn.]

Details: card.com/useR-2014

CARD.com

Page 10: 2014 july use_r

More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html

Page 11: 2014 july use_r

Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
  • R library management
    • CRAN, non-CRAN and internal packages
    • Version management
    • Dependency management (pulling all dependencies)
  • Non-R dependencies (especially C++ and Java)
  • Hardware specifications: How much RAM is enough?
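
The version-management pain point can be made concrete by snapshotting which packages and versions an environment actually has. The following is a minimal sketch using only base R; the helper name `snapshot_packages` is hypothetical and not from the talk, and the minimum version checked is an arbitrary example.

```r
# Hypothetical helper (not from the talk): list installed packages and versions.
snapshot_packages <- function(lib = .libPaths()) {
  ip <- installed.packages(lib.loc = lib)
  data.frame(
    package = unname(ip[, "Package"]),
    version = unname(ip[, "Version"]),
    stringsAsFactors = FALSE
  )
}

snap <- snapshot_packages()

# Check one installed package against an (arbitrary, illustrative) minimum version.
stats_ok <- packageVersion("stats") >= "3.0.0"
```

Writing `snap` to a file alongside each deployed report or service gives a simple record to diff when two machines behave differently.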

Page 12: 2014 july use_r

Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end to end functionality from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy to support platform
• Rapid prototyping

Page 13: 2014 july use_r


Thanks.

Page 14: 2014 july use_r

Tools: plotting
• Major frameworks
  • Base graphics
  • lattice
  • ggplot2
• Useful utilities
  • grid/gridExtra/gtable
  • latticeExtra
  • Color: RColorBrewer/munsell/colorspace/dichromat
  • gplots (the ‘g’ school)
  • plotrix
• Custom plots
  • plot.ts
  • maps
  • igraph (network visualization)
  • ggmap
  • ggvis: interactive graphics
  • rCharts: interactive graphics, wraps JavaScript libraries, not on CRAN yet (look on GitHub)
  • rgl (3d)/scatterplot3d
  • vcd (categorical data)

Page 15: 2014 july use_r

Tools: data manipulation
• Base R features
  • Data structures: the data.frame
  • Vectorized data manipulation: apply, tapply, lapply…
  • Data structures: ts
  • Comprehensive, elegant missing data handling (NA)
• Packages
  • Wickham school: reshape2/plyr/dplyr/tidyr
  • data.table
  • Time series: zoo, xts, lubridate
  • Spatial data tools: sp/maptools
  • The ‘G’ school: gdata
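
To illustrate the base R features listed above (data.frame, vectorized operations, the apply family, and NA handling), here is a small sketch. The data is made up for illustration and does not come from the talk.

```r
# Toy data (made up for illustration)
df <- data.frame(
  site   = c("a", "a", "b", "b"),
  clicks = c(10, NA, 3, 5),
  stringsAsFactors = FALSE
)

# Vectorized arithmetic: the NA propagates instead of raising an error
doubled <- df$clicks * 2

# Grouped summary with tapply, dropping missing values explicitly
mean_clicks <- tapply(df$clicks, df$site, mean, na.rm = TRUE)

# sapply over columns: count missing values per column
n_missing <- sapply(df, function(col) sum(is.na(col)))
```

Missing values are carried through every step, and the decision to ignore them (`na.rm = TRUE`) is explicit rather than silent.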

Page 16: 2014 july use_r

Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RPostgreSQL; RMySQL; RSQLite; …
• RODBC; RJDBC: (vertica, redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; RHadoop; …
• yaml, XML, rjson, RJSONIO
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata…: foreign
• RCurl
• RProtoBuf: Efficient cross-language data serialization in R
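
As a sketch of the connections interface named in the first bullet, the example below reads delimited text through a connection object. The CSV text is made up; `textConnection()` stands in for a file or a `url()` connection so the example runs without network access.

```r
# Made-up CSV text standing in for a remote file
csv_text <- "id,value\n1,3.5\n2,NA\n3,7.25"

# read.table() accepts any connection; url("http://...") would be used the same way
con <- textConnection(csv_text)
dat <- read.table(con, header = TRUE, sep = ",")
close(con)
```

Because the reading function only sees a connection, the same parsing code can be pointed at a local file, a URL, or an in-memory string.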

Page 17: 2014 july use_r

Tools: Package development
• Package development:
  • package.skeleton(); tools (base package)
  • pkgKitten (CRAN): improvements to package.skeleton
  • devtools (CRAN): miscellaneous and very useful tools
  • gtools: various R programming tools
  • roxygen2 (CRAN): literate documentation
  • testthat/testR: unit testing
  • IDEs: RStudio, Eclipse (StatET), Tinn-R, Emacs ESS, …

Page 18: 2014 july use_r

Tools: Web development & reporting
• Shiny
• Interactive documents
  • knitr
  • Sweave

Page 19: 2014 july use_r

Tools: parallel computing
• parallel: functionality formerly spread across separate packages (such as multicore and snow) has been collected into this package, which ships with base R
• Revolution Analytics
• Map-Reduce: rmr/RHadoop
• H2O (0xdata)
• SparkR (not on CRAN yet, look on GitHub)
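
A minimal sketch of the `parallel` package mentioned in the first bullet, assuming a machine that can start two local worker processes. The bootstrap task and the function name `boot_mean` are made up for illustration.

```r
library(parallel)

# Toy task (made up for illustration): one bootstrap resample mean.
# The first argument is the iteration index supplied by parLapply and is unused.
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

set.seed(1)
x <- rnorm(1000)

cl <- makeCluster(2)          # two local worker processes
clusterSetRNGStream(cl, 42)   # reproducible random-number streams on the workers
res <- parLapply(cl, 1:20, boot_mean, x = x)
stopCluster(cl)

boot_means <- unlist(res)
```

`parLapply()` mirrors `lapply()`, so serial code written with the apply family can often be parallelized by swapping the call and adding cluster setup and teardown.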

Page 20: 2014 july use_r

Tools: big or out of memory computing

• dplyr: supports database backed data structures
• ff: supports file based data
• biglm/bigmemory: shared memory matrices
• HadoopStreaming

Page 21: 2014 july use_r

Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()
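
Of the tools above, `object.size()` is the quickest to try because it is part of base R. The sketch below compares two vectors of equal length; note that `object.size()` reports an estimate of an object's storage and is not a full memory profile.

```r
ints <- integer(1e6)   # one million 4-byte integers
dbls <- numeric(1e6)   # one million 8-byte doubles

size_ints <- object.size(ints)
size_dbls <- object.size(dbls)

# Human-readable reports
print(size_ints, units = "MB")
print(size_dbls, units = "MB")
```

The double vector takes roughly twice the space of the integer vector, which is the kind of check that helps answer the "how much RAM is enough?" question from the ops slide.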