Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: 7 Principles for Agile Analytics ENDA RIDGE, PHD Copyright Enda Ridge 2015 #GuerrillaAnalytics http://guerrilla-analytics.net

Upload: enda-ridge

Post on 17-Jan-2017


TRANSCRIPT

Page 1: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

#GuerrillaAnalytics http://guerrilla-analytics.net

Guerrilla Analytics: 7 Principles for Agile Analytics

ENDA RIDGE, PHD

Copyright Enda Ridge 2015

Page 2: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

What You Will Learn
• Why you must identify and mitigate disruptions in projects
• How the Guerrilla Analytics Principles help
• Case study on the Guerrilla Analytics Principles in action

How this will help you
• Data Scientists: you need a defensive Guerrilla Analytics mindset. Without it you will be overwhelmed by the highly iterative nature of predictive analytics
• Managers and Directors: you need a Guerrilla Analytics capability for a high performing team

Page 3: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

What I’ve Learned

Timeline (2004, 2008, 2010, 2012, 2015):
• PhD: ‘Design of Experiments for Tuning Algorithms’
• Boutique Consultancy
• Forensic Data Analytics
• Senior Manager, Professional Services
• Head of Algorithms

No matter the industry, teams are always plagued by the same problem …

Time is wasted in the confusion and chaos of highly iterative Data Science

Page 4: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Teams Need ‘Guerrilla Analytics’

• Data: Extraction, Receipt, Loading
• Analytics: Transform, Algorithms, Consolidate
• Insight: Reporting, Work Products

Disruption

Page 5: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

7 Guerrilla Analytics Principles

Principle 1: Space is cheap, confusion is expensive

Principle 2: Prefer simple, visual project structures and conventions

Principle 3: Prefer automation

Principle 4: Maintain Data Provenance

Principle 5: Version control changes

Principle 6: Consolidate team knowledge

Principle 7: Prefer code that runs from start to finish


Page 6: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Case Study

Page 7: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Case Study: Business Problem

Situation: A pharma organization’s programme to improve its Identity Access Management (IAM). IAM ensures that IT access privileges are granted according to one interpretation of policy

Objective: identify ‘permission roles’ that group up common IT permissions

Benefits:
• IT efficiency: assign roles instead of individual permissions
• Staff and systems are properly authenticated and audited
• Ensure company data is not at risk of being misused
• Avoid regulatory non-compliance

Page 8: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Case Study: Data Science Problem

System    User  Permission
System01  Chaz  Email
System01  Chaz  Network
System01  Dave  Email
System02  Chaz  Emailing
System02  Chaz  Sharepoint
System02  Dave  Sharepoint
System02  Meg   Email
System02  Meg   Sharepoint
System02  Meg   Network
…         …     …

• Find common subsets of permissions
• These are ‘permission roles’ for Identity Access Management
• 70 systems
• Thousands of permissions
• Users can access several systems
• All systems are different
• Team is mobilized and ready to review permissions
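
Finding common subsets of permissions can be sketched as a brute-force frequent-itemset count over the toy rows on this slide. This is only an illustration, not the algorithm used in the case study: the function names, the `min_support` and `min_size` thresholds, and the assumption that "Emailing" has already been normalised to "Email" are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy (system, user, permission) rows from the slide. "Emailing" is assumed
# to have been normalised to "Email" before this step.
ROWS = [
    ("System01", "Chaz", "Email"),
    ("System01", "Chaz", "Network"),
    ("System01", "Dave", "Email"),
    ("System02", "Chaz", "Email"),
    ("System02", "Chaz", "Sharepoint"),
    ("System02", "Dave", "Sharepoint"),
    ("System02", "Meg", "Email"),
    ("System02", "Meg", "Sharepoint"),
    ("System02", "Meg", "Network"),
]


def permissions_by_user(rows):
    """Collapse rows across systems into one permission set per user."""
    perms = defaultdict(set)
    for _system, user, permission in rows:
        perms[user].add(permission)
    return dict(perms)


def frequent_permission_sets(user_perms, min_support=2, min_size=2):
    """Count every permission subset and keep those held by enough users.

    Brute force: enumerates all subsets, so only suitable for tiny examples.
    """
    counts = Counter()
    for perms in user_perms.values():
        ordered = sorted(perms)
        for size in range(min_size, len(ordered) + 1):
            for subset in combinations(ordered, size):
                counts[subset] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


user_perms = permissions_by_user(ROWS)
candidate_roles = frequent_permission_sets(user_perms)
# Print candidates with the most widely shared subsets first.
for role, support in sorted(candidate_roles.items(), key=lambda kv: (-kv[1], kv[0])):
    print(role, support)
```

With thousands of permissions the number of subsets explodes, which is why a dedicated itemset mining algorithm would be needed in practice.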

Page 9: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Case Study: Approach

System    User  Permission
System01  Chaz  Email
System01  Chaz  Network
System01  Dave  Email
System02  Chaz  Emailing
System02  Chaz  Sharepoint
System02  Dave  Sharepoint
System02  Meg   Email
System02  Meg   Sharepoint
System02  Meg   Network
…         …     …

User  Permission
Chaz  Email
Chaz  Sharepoint
Chaz  Network
Dave  Email
Dave  Sharepoint
Meg   Email
Meg   Sharepoint
Meg   Network
…     …

Seems like a popular group

Page 10: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Case Study: Approach

System    User  Permission
System01  Chaz  Email
System01  Chaz  Network
System01  Dave  Email
System02  Chaz  Emailing
System02  Chaz  Sharepoint
System02  Dave  Sharepoint
System02  Meg   Email
System02  Meg   Sharepoint
System02  Meg   Network
…         …     …

User   Permission
Chaz   Email
Chaz   Sharepoint
Chaz   Network
Dave   Email
Dave   Sharepoint
Meg    Email
Meg    Sharepoint
Meg    Network
Sarah  Email
Sarah  Sharepoint
Sarah  Network
…      …

Or is it this bigger group?

Page 11: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Data

• Data: Extraction, Receipt, Loading
• Analytics: Transform, Algorithms, Consolidate
• Insight: Reporting, Work Products

Page 12: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Data Receipt: Situation

• 2015-10-01.log
• EMAIL_Server.csv
• EMAIL_Server.csv2
• IAM from Joe.log
• 2015-10-05.log
• Security logs.log
• 2015-10-07.log

• Multiple files from 70 different systems
• No consistency
• Delivered at different points in time
• Refreshed at irregular intervals

Disruption

Page 13: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Data Receipt: Guerrilla Analytics

Data
• D001: 2015-10-01.log
• D002: EMAIL_Server.csv
• D003: EMAIL_Server.csv2
• D004: IAM from Joe.log
• D005: 2015-10-05.log

Principle 1: Space is cheap, confusion is expensive

Principle 2: Prefer simple, visual project structures and conventions

Principle 4: Maintain Data Provenance

• Robust to multiple data deliveries
• Robust to random file names and customer inconsistencies
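
One way to apply this convention is a small receipt helper that gives every delivered file its own sequential data folder and copies the file in with its name unchanged. This is a minimal sketch under assumptions: the helper names `next_data_id` and `receive_file`, the three-digit padding, and the `inbox` and `data` directories are hypothetical, not part of the talk.

```python
import shutil
import tempfile
from pathlib import Path


def next_data_id(data_root: Path) -> str:
    """Return the next unused ID in the D001, D002, ... sequence."""
    existing = [
        int(p.name[1:])
        for p in data_root.iterdir()
        if p.is_dir() and p.name.startswith("D") and p.name[1:].isdigit()
    ]
    return f"D{max(existing, default=0) + 1:03d}"


def receive_file(delivered: Path, data_root: Path) -> Path:
    """Copy a delivered file, name unchanged, into its own new data folder."""
    data_root.mkdir(parents=True, exist_ok=True)
    target_dir = data_root / next_data_id(data_root)
    target_dir.mkdir()
    return Path(shutil.copy2(delivered, target_dir / delivered.name))


# Demonstration in a throwaway directory with two of the slide's file names.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    inbox.mkdir()
    data_root = Path(tmp) / "data"
    received = []
    for name in ["2015-10-01.log", "IAM from Joe.log"]:
        delivered = inbox / name
        delivered.write_text("raw contents\n")
        stored = receive_file(delivered, data_root)
        received.append(stored.relative_to(data_root).as_posix())
    print(received)
```

Because the folder ID rather than the file name identifies a delivery, two deliveries with the same or an awkward file name never collide.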

Page 14: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Data Loading: Situation

Raw Schema
• 2015-10-01.log
• EMAIL_Server.csv
• EMAIL_Server.csv2
• IAM from Joe.log
• 2015-10-05.log
• Security logs.log
• 2015-10-07.log

Files loaded all over the analytics environment
• Files renamed
• Files moved
• Files ‘archived’
• Raw files edited

Disruption

Page 15: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Data Loading: Guerrilla Analytics

Raw Schema
• D001 2015-10-01.log
• D002 EMAIL_Server.csv
• D003 EMAIL_Server.csv2
• D004 IAM from Joe.log
• D005 2015-10-05.log
• D006 Security logs.log
• D007 2015-10-07.log

Principle 1: Space is cheap, confusion is expensive
• Keep everything

Principle 2: Prefer simple, visual project structures and conventions
• One place for raw data

Principle 4: Maintain Data Provenance
• Don’t rename, move, modify in any way

• Robust to crazy inconsistent files
• Force code to explicitly use data IDs

Page 16: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Analytics

• Data: Extraction, Receipt, Loading
• Analytics: Transform, Algorithms, Consolidate
• Insight: Reporting, Work Products

Page 17: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Transformation: Situation

Lots of renaming

IDent  Usr      sys       PTY
3477   Charlie  Email4.5  Read
4598   Snoopy   Email4.5  Read; send
…      …        …         …

id    user     system    permission
3477  Charlie  Email4.5  Read
4598  Snoopy   Email4.5  Read; send
…     …        …         …

• 70 different systems
• Unhelpful field names
• Evolving understanding of correct fields

Disruption

Page 18: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Transformation: Guerrilla Analytics

Principles in Action
• Principle 3: Prefer automation
• Principle 4: Maintain Data Provenance
• Principle 5: Version control changes
• Principle 6: Consolidate team knowledge

• Robust to evolving names and inconsistencies
• Data provenance of field names

IDent  Usr      sys       PTY
3477   Charlie  Email4.5  Read
4598   Snoopy   Email4.5  Read; send
…      …        …         …

id    user     system    permission
3477  Charlie  Email4.5  Read
4598  Snoopy   Email4.5  Read; send
…     …        …         …

dataset  from   to
Sys1     IDent  id
Sys1     Usr    user
…        …      …
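
The mapping table above can drive the renaming automatically, so field names are never changed by hand. The following is a minimal sketch: the function `rename_fields` is hypothetical, and the `sys` and `PTY` mapping rows are inferred from the before and after tables rather than shown on the slide.

```python
# Hypothetical mapping rows in the slide's (dataset, from, to) layout. The
# "sys" and "PTY" rows are inferred from the before/after tables.
FIELD_MAPPING = [
    ("Sys1", "IDent", "id"),
    ("Sys1", "Usr", "user"),
    ("Sys1", "sys", "system"),
    ("Sys1", "PTY", "permission"),
]


def rename_fields(dataset, records, mapping):
    """Rename each record's fields using the mapping rows for `dataset`.

    Fields without a mapping row raise an error, so a new or changed source
    column is noticed rather than silently passed through.
    """
    lookup = {src: dst for ds, src, dst in mapping if ds == dataset}
    renamed = []
    for record in records:
        unmapped = set(record) - set(lookup)
        if unmapped:
            raise KeyError(f"{dataset}: no mapping for fields {sorted(unmapped)}")
        renamed.append({lookup[field]: value for field, value in record.items()})
    return renamed


raw = [
    {"IDent": 3477, "Usr": "Charlie", "sys": "Email4.5", "PTY": "Read"},
    {"IDent": 4598, "Usr": "Snoopy", "sys": "Email4.5", "PTY": "Read; send"},
]
clean = rename_fields("Sys1", raw, FIELD_MAPPING)
print(clean[0])
```

Keeping the mapping as data means it can be version controlled and reviewed by the team, and the renamed dataset can always be traced back to its original field names.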

Page 19: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Algorithm: Situation

1. Choose data, Apply mapping
2. Cast, Index
3. Reshape & Join, Apply Rules, Tidy
4. Apply Algorithm, Check Output

Disruption
• Where do my outputs go?
• How to iteratively develop code/rules etc?
  • Different algorithm parameters
  • Different algorithms
• How do I iterate with the broader team and customer?

Page 20: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Work Products: Guerrilla Analytics

Principles in action

Work Products
• WP001
  • 010_Reshape.sql
  • 020_Apply_Rules.sql
  • 030_Algorithm.py
  • 050_Reports.py
  • 050_Report.ppt
• WP002
• WP003

Principle 1: Space is cheap, confusion is expensive
• Keep everything

Principle 2: Prefer simple, visual project structures and conventions
• One place for each output

Principle 4: Maintain Data Provenance
• Code, plots, reports etc in one place

• Robust to multiple iterative work products
• Scalable to team of any size
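
The numeric prefixes in a work product folder give its scripts an obvious execution order, which fits the principle of preferring code that runs from start to finish. The sketch below only lists the steps in order. The function `ordered_steps`, the file-name pattern, and the choice to treat only `.sql` and `.py` files as executable steps are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention: three-digit prefix, underscore, name, and a .sql or .py
# extension. Other files (for example the .ppt report) are outputs, not steps.
STEP_PATTERN = re.compile(r"^(\d{3})_.+\.(sql|py)$")


def ordered_steps(work_product_dir: Path):
    """Return the executable steps of a work product in numeric-prefix order."""
    steps = []
    for path in work_product_dir.iterdir():
        match = STEP_PATTERN.match(path.name)
        if path.is_file() and match:
            steps.append((int(match.group(1)), path.name))
    return [name for _prefix, name in sorted(steps)]


# Demonstration with the slide's WP001 file names, created in arbitrary order.
with tempfile.TemporaryDirectory() as tmp:
    wp = Path(tmp) / "WP001"
    wp.mkdir()
    for name in ["030_Algorithm.py", "010_Reshape.sql", "050_Report.ppt",
                 "020_Apply_Rules.sql", "050_Reports.py"]:
        (wp / name).touch()
    run_order = ordered_steps(wp)
    print(run_order)
```

A real runner would then hand each step to the appropriate SQL client or Python interpreter; that part is omitted here because it depends on the team's environment.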

Page 21: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Early result

[Chart: permission groups ABC, AB, ABD, EF, E, A, BD, G, ZY, WZY on the x-axis; y-axis from 50 to 100]

• Taking too long to cover users
• Still too many permission groups
• Suspect data quality
• Could tweak the itemset mining algorithms

Need to iterate and improve

Page 22: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Iteration

Page 23: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Analysis: Situation

• Data: Latest data, Latest mapping
• Analysis: Tidy data format, Apply itemset mining
• Insight: ?

Disruption
• Wasted effort in repetition
• Risk of inconsistency in repetitions
• Need clear view of how understanding has evolved

Page 24: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: Consolidate

1. Choose data, Apply mapping
2. Cast, Index
3. Reshape & Join, Apply Rules, Tidy
4. Published Interface Datasets

• Analysis 1
• Analysis 2

Page 25: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: Consolidate

1. Choose data, Apply mapping
2. Cast, Index
3. Reshape & Join, Apply Rules, Tidy
4. Published Interface Datasets

• Analysis 1
• Analysis 2

• Build tool automation
• Version controlled code

Page 26: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: Consolidate

1. Choose data, Apply mapping
2. Cast, Index
3. Reshape & Join, Apply Rules, Tidy
4. Published Interface Datasets

• Analysis 1
• Analysis 2

• Build tool automation
• Version controlled code

Principle 3: Prefer automation

Principle 4: Maintain Data Provenance

Principle 5: Version control changes

Principle 6: Consolidate team knowledge

Page 27: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Reporting

• Data: Extraction, Receipt, Loading
• Analytics: Transform, Algorithms, Consolidate
• Insight: Reporting, Work Products

Page 28: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Iterative Analysis

[Chart: permission groups ABC, AB, ABD, EF, E, etc. on the x-axis; y-axis from 50 to 100]

• Data cleaning and algorithm tuning give better results
• Clear version of ‘consolidated knowledge’
• Clear work products for each iteration
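
Assuming the chart tracks the cumulative percentage of users covered as permission groups are added, the calculation behind it can be sketched as below. The definition of coverage (a user is covered once any chosen role is a subset of that user's permissions), the function names, and the sample users are all hypothetical and are not taken from the case study data.

```python
def cumulative_coverage(user_perms, roles):
    """Percentage of users covered after each role is added, in order.

    A user counts as covered once any chosen role is a subset of that
    user's permissions (an assumed definition of coverage).
    """
    covered = set()
    curve = []
    for role in roles:
        role = frozenset(role)
        covered |= {user for user, perms in user_perms.items() if role <= perms}
        curve.append(round(100 * len(covered) / len(user_perms), 1))
    return curve


def roles_needed(curve, target_pct):
    """Number of roles needed to reach the target, or None if never reached."""
    for count, pct in enumerate(curve, start=1):
        if pct >= target_pct:
            return count
    return None


# Hypothetical users and candidate roles, using single letters for
# permissions in the style of the chart's axis labels.
user_perms = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "B", "D"},
    "u4": {"E", "F"},
    "u5": {"G"},
}
curve = cumulative_coverage(user_perms, [{"A", "B"}, {"E", "F"}, {"Z", "Y"}])
print(curve, roles_needed(curve, 80))
```

Re-running the same calculation after each round of data cleaning or algorithm tuning gives one comparable curve per iteration.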

Page 29: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Reporting

Page 30: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Reporting: Situation

We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….

[Chart: permission groups ABC, AB, ABD, EF, E, etc. on the x-axis; y-axis from 50 to 100]

Page 31: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Reporting: Situation

We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….

• Which latest data? Which systems?
• Which algorithm? Which parameters?
• Which business rules?
• What recommendations?
• How is it different from last iteration?

[Chart: permission groups ABC, AB, ABD, EF, E, etc. on the x-axis; y-axis from 50 to 100]

Disruption

Page 32: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: Project Structure

Files
Project
  data
    D001
    D002
    D010
  work prod
    WP_001
    WP_002

Page 33: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics: Project Structure

Files
Project
  data
    D001
    D002
    D010
  work prod
    WP_001
    WP_002

Data Science environment
Project
  data
    D001
    D002
  build
    clean_data
    algo_input
  work prod
    WP_001
    WP_002

Page 34: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Reporting: Guerrilla Analytics

We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….

• Which latest data? Which rules? Which systems?
  Build version 2.2
• Which algorithm parameters? What recommendations?
  Work product 042
• How is it different from last iteration?
  Work product 031 versus 042

[Chart: permission groups ABC, AB, ABD, EF, E, etc. on the x-axis; y-axis from 50 to 100]
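
The answers on this slide amount to a small provenance record attached to every reported result. The sketch below shows one possible shape for it. The class `ReportProvenance`, its field names, and the footer wording are hypothetical; only the identifiers (build version 2.2, work products 042 and 031) come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportProvenance:
    """Identifiers that tie a reported result to its inputs and code."""

    # Answers: which algorithm parameters, what recommendations?
    work_product: str
    # Answers: which latest data, which rules, which systems?
    build_version: str
    # Answers: how is it different from the last iteration?
    compared_with: Optional[str] = None

    def footer(self) -> str:
        """Render a one-line footer to place under the report text."""
        line = f"Work product {self.work_product}, build version {self.build_version}"
        if self.compared_with:
            line += f", compared with work product {self.compared_with}"
        return line


provenance = ReportProvenance(work_product="042", build_version="2.2", compared_with="031")
print(provenance.footer())
```

Anyone reading the report can then go from the footer to the matching work product folder and build, instead of asking which "latest data" was meant.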

Page 35: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics Success

• Coped with multiple inconsistent data deliveries
• Robust to evolving business rules and moving target of live systems
• Quick turnaround of different algorithms while closing out permission roles in a live system
• Project delivered in weeks rather than months

Page 36: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics Capability

Agility

3. Guerrilla Analytics Mindset
2. Supporting Tools
1. Simple Conventions

Page 37: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Guerrilla Analytics Capability

Agility

3. Guerrilla Analytics Mindset
2. Supporting Tools
1. Simple Conventions

• 7 Guerrilla Analytics Principles
• 100+ practice tips, Data Science patterns
• Build Tools, Tracking, Version control
• Data receipt, Data load, Tidy Data format, …

Page 38: Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)

Summing up
• Agility means delivering despite disruptions
• High performing agile teams have capability to mitigate disruptions
• 7 Guerrilla Analytics Principles for defensive Data Science
• Guerrilla Analytics Principles in action across
  • Data receipt
  • Data load
  • Iterative work products
  • Consolidation
  • Reporting

@Enda_Ridge

http://guerrilla-analytics.net