Scaling Analysis Responsibly
Hilary Parker @hspter

Uploaded by work-bench on 14-Apr-2017


TRANSCRIPT

Page 1: Scaling Analysis Responsibly

Scaling Analysis Responsibly

Hilary Parker @hspter

Page 2

#rcatladies

Not So Standard Deviations

@keegsdur

Page 3

“We just don’t have enough analysts!”

Page 4

“Let’s scale by building the perfect BI tool!”

Page 5

That sounds great!

We should automate some of the things that are slowing you down

PRODUCT TEAM

DATA

http://xkcd.com/

Page 6

That seems perfectly reasonable!

Let’s just enlist some folks from engineering to help you with it

DATA

PRODUCT TEAM

Page 7

DATA ENG

Sure thing!

...and finally can it add this last graph?

Page 8

several months pass…

Page 9

ENG

Sure! File a ticket!

Can we add these 132 extra metrics to the testing?

PRODUCT TEAM

Page 10

You can’t do that; your family-wise error rate will tend to 1!!

ENG

PRODUCT TEAM

DATA
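Why "tend to 1"? The family-wise error rate (FWER) is the probability of at least one false positive across all the tests. A minimal sketch, assuming the 132 metrics are tested independently at alpha = 0.05 and that none of them has a real effect, puts a number on it. The simulated p-values are made up for illustration. `p.adjust()` with `method = "holm"` is base R's Holm correction, one standard way to keep the FWER at alpha; it is not necessarily what any particular testing tool uses.

```r
# FWER for m independent tests, each at level alpha, when every null is true:
# P(at least one false positive) = 1 - (1 - alpha)^m
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

fwer(1)    # 0.05 for a single test
fwer(132)  # about 0.9989 for 132 extra metrics

# Illustration only: 132 hypothetical p-values drawn with no real effects.
set.seed(2017)
p_values <- runif(132)

# Unadjusted, roughly 5% of the metrics are expected to look "significant" by chance.
sum(p_values < 0.05)

# Holm's method (base R stats package) controls the FWER at 0.05 instead.
sum(p.adjust(p_values, method = "holm") < 0.05)
```

The independence assumption is the simplest case; correlated metrics change the exact FWER, but adding many tests still inflates it.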

Page 11

ENG

That’s a reasonable expectation for an internal product. I’m on it!

I’d really like this tool to be more stable.

PRODUCT TEAM

Page 12

Our test violates a subtle statistical assumption for this new application, and we need to gut this stable product!

ENG

PRODUCT TEAM

DATA

Page 13

Almost impossible to avoid 2-against-1 dysfunction as product teams become “self-service” with engineering support

Invariably becomes a race to the bottom as internal competition for the simplest tool emerges

Stability prioritized over flexibility

Page 14

(In tech)

Building = Owning

Page 15

Analysis Developer!

Page 16

“Analysis Developer”

Someone on the analyst team who develops reproducible, flexible analyses in R and helps all analysts scale their work

Page 17

I’ll work with the analysis developer on my team!

We should automate some of the things that are slowing you down

PRODUCT TEAM

DATA

Page 18

Avoids common types of dysfunction

Allows for flexible, accurate analysis

Analysts acquire marketable skills!

Page 19

Instead of creating dashboards or using static BI tools...

http://dilbert.com/strip/2007-05-16

Page 20

A series of R packages highly specific to the business case, with “mix and match” elements to rapidly create common reports.

library("internal_package")
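A minimal sketch of what "mix and match" could look like, assuming each report element is a function that takes a data frame and returns a small summary table, so elements can be combined freely. The names `element_topline()`, `element_by_group()`, `build_report()` and the `sales` data are hypothetical. They do not come from the talk or from any real internal package.

```r
# Hypothetical report elements an internal package might export.
# Each takes a data frame and returns a small data frame.

element_topline <- function(data, metric) {
  data.frame(metric = metric,
             total = sum(data[[metric]]),
             mean = mean(data[[metric]]))
}

element_by_group <- function(data, metric, group) {
  out <- aggregate(data[[metric]], by = list(data[[group]]), FUN = sum)
  names(out) <- c(group, metric)
  out
}

# Hypothetical assembler: apply every chosen element to the same data.
build_report <- function(data, elements) {
  lapply(elements, function(element) element(data))
}

# Usage with made-up data
sales <- data.frame(region = c("east", "east", "west"),
                    revenue = c(10, 20, 5))

report <- build_report(sales, list(
  topline   = function(d) element_topline(d, "revenue"),
  by_region = function(d) element_by_group(d, "revenue", "region")
))

report$topline
report$by_region
```

Keeping every element to the same "data frame in, table out" shape is one design choice that makes new reports a matter of picking elements rather than writing new code.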

Page 21
Page 22

Instead of “assembly line” data processing…

Page 23

Close 2-way partnership with data engineers to optimize the creation of datasets for certain common analyses.

The assembly line handoff from scientist to engineer creates [an uncreative] environment. The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved. - Jeff Magnusson

http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/

Page 24

Instead of PMs anxiously watching dashboards…

https://www.youtube.com/watch?v=CCbWyYr82BM

Page 25

Analysts can create shorter-lived, reproducible reports

Page 26

Manage expectations about the report’s shorter lifespan, but point out that the report will require less work from teams once it is created

Productionize in the short term with cron jobs

You can add more statistics this way! Year-over-year (Y/Y) comparisons turn into semiparametric models, etc.
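A minimal sketch of a short-lived report script that cron could rerun on a schedule. The file name, path, schedule, and weekly totals are assumptions for illustration. The Y/Y calculation here is only the simple ratio that the slide describes as a starting point; the semiparametric models it mentions are not shown.

```r
# weekly_yoy_report.R (hypothetical file name)
# A crontab entry such as the following (path and schedule are assumptions)
# would run this script every Monday at 07:00:
#   0 7 * * 1 Rscript /path/to/weekly_yoy_report.R

# Year-over-year change: compare each period with the same period a year earlier.
yoy_change <- function(this_year, last_year) {
  stopifnot(length(this_year) == length(last_year), all(last_year != 0))
  (this_year - last_year) / last_year
}

# Made-up weekly totals for three matching weeks
last_year <- c(100, 120, 90)
this_year <- c(110, 114, 99)

yoy <- yoy_change(this_year, last_year)
print(round(yoy, 3))  # 0.10 -0.05 0.10
```

Under cron there is no interactive session, so a real version would also write its result somewhere durable, such as a file, a database table, or a rendered report.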

Page 28

http://dilbert.com/strip/2004-04-05

Instead of promotion based on deliverables…

Page 29

Consider skill acquisition for analyst promotion

Promote analysis developers based on whether they helped other analysts become more efficient

Support for skill acquisition!

Page 30

Education support for learning better analysis development methods for all analysts

Internally created resources

Page 31

Instead of PMs self-teaching analysis based on what’s presented in dashboarding tools…

https://xkcd.com/605/

Page 32

PMs can use the tools for educating analysts if they want to “ramp up” on analytical skills like R

This way you can bake in statistical education as well.

Page 33

“Isn’t this just package development?”

Page 34

“Isn’t this just package development?”

No!

Page 35

Ad-hoc spreadsheet work

Page 36

Ad-hoc spreadsheet work

+ scripting

Page 37

Ad-hoc spreadsheet work

R workflows

+ scripting

Page 38

Ad-hoc spreadsheet work

R workflows

+ scripting

+ reproducibility, some functions, “analysis testing”

Page 39

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

+ scripting

+ reproducibility, some functions, “analysis testing”

Page 40

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

+ scripting

+ reproducibility, some functions, “analysis testing”

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

Page 41

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

Internal package development

+ scripting

+ reproducibility, some functions, “analysis testing”

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

Page 42

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

Internal package development

+ scripting

+ reproducibility, some functions, “analysis testing”

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

+ industry-wide audience

- company-specific code and functions

Page 43

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

Internal package development

External package development

+ scripting

+ reproducibility, some functions, “analysis testing”

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

+ industry-wide audience

- company-specific code and functions

Page 44

Ad-hoc spreadsheet work

R workflows

Reproducible R analyses

Internal package development

External package development

+ scripting

+ reproducibility, some functions, “analysis testing”

+ workplace-wide audience, documentation, testing

- problem-specific writeups and functions

+ industry-wide audience

- company-specific code and functions

Analysis Developer

Open-Source Developer
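A minimal sketch of the step from a reproducible analysis to an internal-package function. Roxygen2-style comments document the inputs, and `stopifnot()` checks act as lightweight "analysis testing" that fails loudly when the data break the analysis' assumptions. The function name, column names, and data are hypothetical and only illustrate the idea.

```r
#' Conversion rate per row (hypothetical internal-package function)
#'
#' @param data A data frame with numeric columns `visits` and `orders`.
#' @return A numeric vector of conversion rates, one per row.
conversion_rate <- function(data) {
  # "Analysis tests": stop if the data violate the analysis' assumptions
  stopifnot(
    is.data.frame(data),
    all(c("visits", "orders") %in% names(data)),
    all(data$visits > 0),
    all(data$orders <= data$visits)
  )
  data$orders / data$visits
}

# Made-up weekly data
weekly <- data.frame(visits = c(200, 250), orders = c(10, 20))
conversion_rate(weekly)  # 0.05 0.08

# A bad extract is caught instead of silently producing a rate above 1
bad <- data.frame(visits = 100, orders = 150)
result <- tryCatch(conversion_rate(bad),
                   error = function(e) "assumption violated")
result
```

In an internal package, a function like this would typically sit in the package's `R/` directory, with its checks complemented by unit tests.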

Page 45

Analysis Developer

Stop trying to scale with static BI tools: this will (almost) always lead to dysfunction

Instead, scale by increasing analyst efficiency using R and education!

Hire Analysis Developers to help with all this!

Page 46

Thanks!

Hilary Parker

@hspter