r: the good and the bad

R: The Good and The Bad AnalyticsCamp NC, May 12, 2011 Ian Cook, Organizer, Raleigh-Durham-Chapel Hill R Users Group

Upload: ian-cook

Post on 10-May-2015


DESCRIPTION

An overview of the pros and cons of R, the free and open source language and environment for statistical computing and graphics.

TRANSCRIPT

Page 1: R: The Good and The Bad

R: The Good and The Bad

AnalyticsCamp NC, May 12, 2011

Ian Cook, Organizer, Raleigh-Durham-Chapel Hill R Users Group

Page 2: R: The Good and The Bad

The Good…

= ?

Page 3: R: The Good and The Bad

• Effectively the lingua franca of data analysis and statistical computing

• Free and open source

• As a statistical language, it’s generally considered to be very easy to code in (vs. SAS, JSL, SPSS, etc.)

The Good

Page 4: R: The Good and The Bad

• Native cross-platform and 64-bit support

• Typically easy to install and configure

• Community of millions of users; brilliant minds

• Rapidly growing number of packages (2800+ on CRAN, 950+ projects on R-Forge)
  – http://cran.r-project.org/web/packages/ and http://r-forge.r-project.org/

The Good

Page 5: R: The Good and The Bad

• Great free, open source IDEs and GUIs (e.g., StatET for Eclipse, RStudio just released in late February, Emacs Speaks Statistics, JGR, Tinn-R, lots more)
  – See the “Editors and IDEs” and “Graphical User Interfaces” sections of http://en.wikipedia.org/wiki/R_(programming_language). Also see http://sciviews.org/_rgui/ and http://stackoverflow.com/questions/1097367/what-ides-are-available-for-r-in-linux

The Good

Page 6: R: The Good and The Bad

• Active mailing lists, trolled by the gurus, very easy to get your questions answered
  – On a humorous note: http://yihui.name/en/2010/04/rules-of-thumb-to-meet-r-gurus-in-the-help-list/

• CRAN Task Views
  – http://cran.r-project.org/web/views/

The Good

Page 7: R: The Good and The Bad

• Growing coverage on Stack Overflow, and on Cross Validated, the statistical analysis Stack Exchange site
  – http://stackoverflow.com/questions/tagged/r and http://stats.stackexchange.com/

• #rstats hashtag on Twitter
  – http://twitter.com/search/%23rstats

• Blogger community dedicated to covering R
  – http://www.r-bloggers.com/

• Growing list of print books and ebooks

The Good

Page 8: R: The Good and The Bad

• Commercial and open source data analysis/mining/analytics/visualization software increasingly integrating with R (Spotfire, SPSS, Netezza, JMP, SAS/IML, RapidMiner)
  – http://decisionstats.com/2010/05/04/commercial-r-integration-in-software/

• Revolution Analytics (products, blog, community site)
  – http://www.revolutionanalytics.com/, http://blog.revolutionanalytics.com/, and http://www.inside-r.org/

The Good

Page 9: R: The Good and The Bad

The Bad…

= ?

Page 10: R: The Good and The Bad

• Command prompt, lack of GUI is intimidating

• Slow (especially looping)

• Poor parallelization

• Syntactical curiosities, annoyances, design flaws; little chance of them being remedied
  – E.g., http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-r-3-%E2%80%94-zero-subscripts/

• Indices start at 1!

The Bad
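To make the indexing and looping points on this slide concrete, here is a minimal R sketch. The vector `x` and the sum-of-squares task are made-up illustrations, not taken from the talk. It shows that the first element lives at index 1, that a zero subscript is silently ignored instead of raising an error (the kind of behavior the linked "zero subscripts" post is named after), and that an explicit loop and a vectorized expression compute the same value; the vectorized form is the one usually recommended for speed.

```r
# Hypothetical example vector (an assumption for illustration).
x <- c(10, 20, 30)

first   <- x[1]         # R indexes from 1, so this is the first element
empty   <- x[0]         # a zero subscript yields a zero-length vector, not an error
skipped <- x[c(0, 2)]   # zeros inside an index vector are silently dropped

print(first)            # 10
print(length(empty))    # 0
print(skipped)          # 20

# Same sum of squares, written as an explicit loop and as a vectorized call.
n <- 1000
total_loop <- 0
for (i in seq_len(n)) {
  total_loop <- total_loop + i^2
}
total_vec <- sum(seq_len(n)^2)

print(total_loop == total_vec)  # TRUE
```

Wrapping each version in `system.time()` is a simple way to compare their run time on a given machine.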

Page 11: R: The Good and The Bad

• Subtle problems with scoping
  – http://stackoverflow.com/questions/3840769/scoping-and-functions-in-r-2-11-1-whats-going-wrong

• Poor memory performance, difficulty handling big data

• Can be difficult to compile base R and R packages from source
  – Requires compilers for Fortran, Perl, C/C++, Tcl

The Bad
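As one illustration of how scoping in R can surprise newcomers, here is a minimal sketch. The names `x`, `f`, `counter`, `bump_local`, and `bump_global` are hypothetical, and this is not necessarily the issue raised in the linked question. It shows two behaviors: a free variable inside a function is looked up when the function is called, not when it is defined, and a plain `<-` inside a function creates a local copy, while `<<-` modifies the variable in the enclosing environment.

```r
# A free variable is resolved at call time, not at definition time.
x <- 1
f <- function() x
x <- 2
result_late <- f()
print(result_late)      # 2

# Plain assignment inside a function only changes a local copy.
counter <- 0
bump_local <- function() {
  counter <- counter + 1   # creates a local 'counter'
  counter
}
local_value <- bump_local()
print(local_value)      # 1
print(counter)          # 0, the global value is untouched

# Superassignment reaches into the enclosing (here: global) environment.
bump_global <- function() {
  counter <<- counter + 1
  counter
}
global_value <- bump_global()
print(counter)          # 1
```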

Page 12: R: The Good and The Bad

• Onerous terms of GPL

• Has been proposed that the R community start over and build something better from scratch
  – Estimated that a total rewrite could improve speed by 2 orders of magnitude
  – http://stackoverflow.com/questions/3706990/is-r-that-bad-that-it-should-be-rewritten-from-scratch

• Increasingly attractive alternatives (e.g. Python)

The Bad

Page 13: R: The Good and The Bad

The Verdict

?

Page 14: R: The Good and The Bad

Join the Raleigh-Durham-Chapel Hill R Users Group at: http://www.meetup.com/Triangle-useR/