Scaling Analysis Responsibly
Hilary Parker, @hspter
#rcatladies
Not So Standard Deviations
@keegsdur
“We just don’t have enough analysts!”
“Let’s scale by building the perfect BI tool!”
That sounds great!
We should automate some of the things that are slowing you down
PRODUCT TEAM
DATA
http://xkcd.com/
That seems perfectly reasonable!
Let’s just enlist some folks from engineering to help you with it
DATA, PRODUCT TEAM
DATA, ENG
Sure thing!
...and finally can it add this last graph?
several months pass…
ENG
Sure! File a ticket!
Can we add these 132 extra metrics to the testing?
PRODUCT TEAM
You can’t do that, your family-wise error rate will tend to 1!!
ENG, PRODUCT TEAM
DATA
ENG
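The family-wise error rate claim in this exchange can be checked with quick arithmetic. A minimal sketch in R, assuming the 132 tests are independent and each is run at a per-test significance level of 0.05 (both are assumptions; the slide states neither). The function name `fwer` is made up for illustration.

```r
# Family-wise error rate (FWER): probability of at least one false positive
# across m tests, assuming independent tests at per-test level alpha.
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

fwer(1)    # a single test: 0.05
fwer(132)  # 132 extra metrics: roughly 0.999

# A Bonferroni correction (test each metric at alpha / m) keeps the FWER
# at or below alpha, at the cost of power for each individual metric.
fwer(132, alpha = 0.05 / 132)
```

Under these assumptions, adding 132 uncorrected metrics makes at least one spurious "significant" result nearly certain.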
That’s a reasonable expectation for an internal product. I’m on it!
I’d really like this tool to be more stable.
PRODUCT TEAM
Our test violates a subtle statistical assumption for this new application, and we need to gut this stable product!
ENG, PRODUCT TEAM
DATA
Almost impossible to avoid 2-against-1 dysfunction as product teams become “self-service” with engineering support
Invariably becomes a race to the bottom as internal competition for the simplest tool emerges
Stability prioritized over flexibility
(In tech)
Building = Owning
Analysis Developer!
“Analysis Developer”
Someone on the analyst team who develops reproducible, flexible analyses in R and helps all analysts scale their work
I’ll work with the analysis developer on my team!
We should automate some of the things that are slowing you down
PRODUCT TEAM
DATA
Avoids common types of dysfunction
Allows for flexible, accurate analysis
Analysts acquire marketable skills!
Instead of creating dashboards or using static BI tools...
http://dilbert.com/strip/2007-05-16
Series of R packages highly specified for business case, “mix and match” elements to rapidly create common reports.
library("internal_package")
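One way to picture "mix and match" elements is as small functions that each perform one reporting step and can be chained together. The sketch below is illustrative only: `metric_by_period()`, `yoy_change()`, `build_report()`, and the `sales` data are hypothetical and do not come from any real internal package.

```r
# Hypothetical building blocks an internal package might export.
# All names and data here are illustrative.

# Element 1: total of a metric per period (result columns: period, x).
metric_by_period <- function(data, metric, period = "year") {
  stats::aggregate(data[[metric]], by = list(period = data[[period]]), FUN = sum)
}

# Element 2: year-over-year percent change of a per-period summary.
yoy_change <- function(summary) {
  n <- nrow(summary)
  prev <- c(NA, summary$x[-n])  # previous period's total; NA for the first
  summary$yoy_pct <- 100 * (summary$x - prev) / prev
  summary
}

# Compose elements into a report by applying them in order.
build_report <- function(data, ...) {
  Reduce(function(acc, step) step(acc), list(...), data)
}

# Made-up example data.
sales <- data.frame(
  year    = c(2014, 2014, 2015, 2015, 2016),
  revenue = c(100, 50, 120, 60, 225)
)

report <- build_report(
  sales,
  function(d) metric_by_period(d, "revenue"),
  yoy_change
)
report
```

Because each element takes a data frame and returns a data frame, an analyst can swap steps in or out without rewriting the rest of the report.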
Instead of “assembly line” data processing…
Close 2-way partnership with data engineers to optimize the creation of datasets for certain common analyses.
The assembly line handoff from scientist to engineer creates [an uncreative] environment. The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved. - Jeff Magnusson
http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
Instead of PM anxiously watching dashboards…
https://www.youtube.com/watch?v=CCbWyYr82BM
Analysts can create shorter-lived, reproducible reports
Manage expectations about the report’s shorter lifespan, but point out that the report will require less work from teams once created
Productionize in the short term with cron jobs
Can add in more stats this way! Y/Y turns into semiparametric models, etc.
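As a rough illustration of moving from Y/Y comparisons to a model, the sketch below fits a partially linear regression on simulated data. Everything here is an assumption for illustration: the data are made up, the `promo` flag is hypothetical, and a natural cubic spline basis from the `splines` package (which ships with R) stands in for a semiparametric smoother. A penalized smoother such as `mgcv::gam` is a common alternative.

```r
library(splines)  # ships with base R

set.seed(1)
# Simulated weekly metric over two years (made-up data for illustration).
week  <- 1:104
promo <- as.integer(week %% 13 == 0)  # hypothetical promotion weeks
trend <- 100 + 0.3 * week + 10 * sin(2 * pi * week / 52)
y     <- trend + 15 * promo + rnorm(104, sd = 3)

# Naive Y/Y: compare each week in year 2 to the same week in year 1.
yoy_pct <- 100 * (y[53:104] - y[1:52]) / y[1:52]

# Partially linear model: a linear promo effect plus a flexible
# spline in time that absorbs trend and seasonality.
fit <- lm(y ~ promo + ns(week, df = 8))
coef(summary(fit))["promo", c("Estimate", "Std. Error")]
```

The Y/Y series mixes trend, seasonality, and promotion effects into one number per week, while the model gives a separate estimate, with a standard error, for the promotion effect.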
“The Problem with Dashboards (And A Solution)” by Stephanie Evergreen
http://stephanieevergreen.com/problem-with-dashboards/
http://dilbert.com/strip/2004-04-05
Instead of promotion based on deliverables…
Consider skill acquisition for analyst promotion
Promote analysis developers based on whether they helped other analysts become more efficient
Support for skill acquisition!
Education support for learning better analysis development methods for all analysts
Internally created resources
Instead of PMs self-teaching analysis based on what’s presented in dashboarding tools...
https://xkcd.com/605/
PMs can use the analysts’ educational tools if they want to “ramp up” on analytical skills like R
This way you can bake in statistical education as well.
“Isn’t this just package development?”
No!
Ad-hoc spreadsheet work
+ scripting
R workflows
+ reproducibility, some functions, “analysis testing”
Reproducible R analyses
+ workplace-wide audience, documentation, testing; - problem-specific writeups and functions
Internal package development
+ industry-wide audience; - company-specific code and functions
External package development
Analysis Developer
Open-Source Developer
Analysis Developer
Stop trying to scale with static BI tools -- this will (almost) always lead to dysfunction
Instead, scale by increasing analyst efficiency using R and education!
Hire Analysis Developers to help with all this!
Thanks!
Hilary Parker
@hspter