
Page 1: The Data Analytic Lifecyclemit.spbau.ru/files/Steve Todd Data Analytics Lifecycle.pdf · 2015-03-11 · • Assess the resources supporting the project (people, technology, time,

1 EMC CONFIDENTIAL—INTERNAL USE ONLY

The Data Analytic Lifecycle

Steve Todd, EMC Fellow Vice President of Strategy and Innovation

Academic University St. Petersburg, Russia April 11, 2013


Goals

Introduce myself

My history managing Global Innovation since 2011

My decision to gather/analyze global innovation data

The mistakes I made when I started

My involvement with EMC’s Data Scientist curriculum

Starting over with the Data Analytics Lifecycle


My Career

B.S.C.S., M.S.C.S., University of New Hampshire

200+ patents filed

Author of Two Books on Innovation

Selected as Top 10 Innovation Blogger

Selected as EMC Distinguished Engineer in 2008

One of 5 active EMC Fellows (out of 60,000+ employees)

Corporate Vice President of Strategy and Innovation

Global Innovation consultant

Russia, China, Israel, Egypt, Europe, India, and Brazil


May 2011: Director of the EIN

The EMC Innovation Network (EIN) was created in 2007

The Director manages global innovation and research

Mission Statement:

You can’t manage what you can’t measure…

‘Expand knowledge locally, transfer it globally, and leverage it strategically.’


Gathering Innovation Data

Team formed June 7, 2011

Hopkinton / Steve Todd, Sudhir Vijendra, Mary Henderson, Sairam Iyer, Calvin Smith

Beijing / Jidong Chen

Shanghai / Roby Chen

Bangalore / Karthik Srinivasan

Tel Aviv / Yael Villa

St. Petersburg / Pavel Egorov, Inga Petryaevskaya, Ivan Gumenyuk

Cork / Padraig Murphy

Santa Clara / Mike Dutch

Cairo / Shareef Bassiouny


Data Collection

Track activities commonly associated with innovation:

University Engagements

Publications

Conferences

Customers/Partners

Knowledge Transfer Sessions

Ideas

Intellectual Property


Architectural Approach: Dashboard/Analytics for research/innovation activities

Components: Database; Dashboard (metrics/reports); Analytics
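The slide names three layers: a database of innovation activities, a dashboard of metrics and reports, and analytics on top. A minimal sketch of the first two layers follows, assuming a hypothetical single `activities` table whose categories mirror the Data Collection slide. The schema, column names, and sample rows are illustrative only, not EMC's actual design.

```python
import sqlite3

# Hypothetical schema: one row per tracked innovation activity.
SCHEMA = """
CREATE TABLE activities (
    id       INTEGER PRIMARY KEY,
    region   TEXT NOT NULL,   -- e.g. 'St. Petersburg', 'Beijing'
    category TEXT NOT NULL,   -- e.g. 'Publication', 'Idea', 'Intellectual Property'
    occurred TEXT NOT NULL    -- ISO-8601 date
)
"""

def build_db(rows):
    """Create an in-memory database and load (region, category, date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO activities (region, category, occurred) VALUES (?, ?, ?)", rows
    )
    return conn

def dashboard_counts(conn):
    """Dashboard metric: number of activities per (region, category)."""
    cur = conn.execute(
        "SELECT region, category, COUNT(*) FROM activities "
        "GROUP BY region, category ORDER BY region, category"
    )
    return cur.fetchall()

if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        ("Beijing", "Publication", "2011-09-01"),
        ("Beijing", "Idea", "2011-10-12"),
        ("St. Petersburg", "Idea", "2011-10-20"),
        ("St. Petersburg", "Idea", "2011-11-02"),
    ]
    for region, category, n in dashboard_counts(build_db(sample)):
        print(f"{region:15} {category:12} {n}")
```

Keeping raw activity rows and deriving every dashboard number with a query means a new metric is just a new `SELECT`, not a new data-collection step.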


Problem #1

Dirty Data
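The slide does not say what made the data dirty. Common problems in hand-entered activity logs are inconsistent spellings of the same site, stray whitespace, and duplicate submissions. A minimal cleaning pass is sketched below, assuming a hypothetical alias table (`SITE_ALIASES`) and records shaped as `(site, title)` pairs; both are illustrative.

```python
# Hypothetical alias table mapping variant spellings to one canonical site name.
SITE_ALIASES = {
    "st petersburg": "St. Petersburg",
    "st. petersburg": "St. Petersburg",
    "saint petersburg": "St. Petersburg",
    "spb": "St. Petersburg",
    "bangalore": "Bangalore",
    "bengaluru": "Bangalore",
}

def normalize_site(raw):
    """Collapse whitespace, then map known (case-insensitive) aliases to a canonical name."""
    collapsed = " ".join(raw.split())
    return SITE_ALIASES.get(collapsed.lower(), collapsed)

def clean(records):
    """Normalize site names and titles, then drop exact duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for site, title in records:
        row = (normalize_site(site), " ".join(title.split()))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned
```

Duplicates are only detectable after normalization: `"  SPB "` and `"St. Petersburg"` describe the same site, so deduplicating the raw strings would miss them.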


Problem #2

Selecting an Analytic Model


Problem #3

Too many visualizations!


Problem #4

No way to measure “lineage”

[Figure: the lineage of an idea across stages: Ideas 1–7, Finalists 1–3, POC Meetings 1–4, Product Specification, Sprints 1–3, Product Complete, Patents 1–2]
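One way to make lineage measurable is to record, for every artifact, which earlier artifacts it was derived from, and then walk those links backwards. The sketch below assumes a hypothetical `PARENTS` map loosely following the stages in the figure (idea, finalist, POC meeting, specification, sprint, product, patent). The specific links are invented for illustration.

```python
# Hypothetical lineage links: artifact -> list of artifacts it was derived from.
PARENTS = {
    "Finalist 1": ["Idea 2", "Idea 5"],
    "POC Mtg 1": ["Finalist 1"],
    "Product Specification": ["POC Mtg 1"],
    "Sprint 1": ["Product Specification"],
    "Product Complete": ["Sprint 1"],
    "Patent 1": ["Product Complete", "Idea 5"],
}

def ancestors(artifact, parents=PARENTS):
    """Return every artifact upstream of `artifact` (iterative depth-first walk)."""
    found, stack = set(), list(parents.get(artifact, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, []))
    return found

def originating_ideas(artifact, parents=PARENTS):
    """Roots of the lineage: ancestors that have no recorded parents of their own."""
    return sorted(a for a in ancestors(artifact, parents) if not parents.get(a))
```

For example, `originating_ideas("Patent 1")` traces the patent back through the product, sprint, specification, POC meeting, and finalist to the two ideas it grew from. The `found` set also makes the walk safe when an artifact is reachable by more than one path (here `Idea 5`).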


Problem #5

No recommendations to improve innovation at EMC


EMC To the Rescue

EMC is a Big Data Analytics company…

EMC has created a Data Scientist curriculum


IN 2000 THE WORLD GENERATED

TWO EXABYTES OF NEW INFORMATION

Sources: “How Much Information?”, Peter Lyman and Hal Varian, UC Berkeley; 2011 IDC Digital Universe Study.


Sources: “How Much Information?”, Peter Lyman and Hal Varian, UC Berkeley; 2011 IDC Digital Universe Study.

IN 2000 THE WORLD GENERATED

TWO EXABYTES OF NEW INFORMATION

EVERY DAY


I enrolled in the Data Scientist course…


… and discovered the Data Analytics Lifecycle


Phase 1: Discovery

• Frame the business problem as an analytic challenge that can be solved in phases.

• Understand what's been done in the past.

• Assess the resources supporting the project (people, technology, time, and data).

• Form initial hypotheses.

• Create an Analytic Plan


Phase 1: Hypotheses

• Statements that I will try to prove or disprove with analytics:

• IH1: Innovation activity in different geographic regions can be mapped to corporate strategic directions.

• IH2: Innovators that participate in global knowledge transfer deliver ideas more quickly than those that do not.

• IH3: An idea submission can be analyzed and evaluated for the likelihood of receiving funding.

• IH4: Knowledge discovery and growth for a particular topic can be measured and compared across geographic regions.

• IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.

• IH6: Strategic corporate themes can be mapped to geographic regions.

• IH7: Frequent knowledge expansion and transfer events reduce the amount of time it takes to generate a corporate asset from an idea.

• IH8: Emerging research topics can be classified and mapped to specific ideators, innovators, boundary spanners, and assets.

An increase in geographic knowledge transfer improves the speed of idea delivery.
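IH2, restated in the last line of the slide, is a comparative claim, so it can be checked with a simple two-group test. A minimal sketch follows. It assumes that, for each delivered idea, we know the number of days from submission to corporate asset and whether its innovator took part in global knowledge transfer. The one-sided permutation test is our choice of method, not necessarily what the project used, and any numbers passed to it here are made up.

```python
import random
from statistics import mean

def permutation_test(transfer_days, no_transfer_days, n_iter=10_000, seed=0):
    """One-sided permutation test of: mean(no_transfer) - mean(transfer) > 0.

    Returns (observed_difference, p_value). A small p-value is evidence that
    ideas from innovators who take part in knowledge transfer are delivered faster.
    """
    observed = mean(no_transfer_days) - mean(transfer_days)
    pooled = list(transfer_days) + list(no_transfer_days)
    k = len(transfer_days)
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel groups at random under the null hypothesis
        if mean(pooled[k:]) - mean(pooled[:k]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value estimate away from exactly zero.
    return observed, (hits + 1) / (n_iter + 1)
```

A permutation test avoids assuming that delivery times are normally distributed, which is unlikely for small, skewed samples of project durations.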


Phase 1: Discovery

• Create an Analytic Plan


Phase 2: Discovery

• Build an analytic sandbox

• Extract, Load, Transform (ELT, not ETL)

• Explore the Data

• Assess Data Quality

• Phase 2 is all about conditioning, or preparing, the data…

• … and I DISCOVERED that I did not have the right data to prove one of my hypotheses.
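ELT differs from ETL in where the transformation happens. Raw data is loaded into the sandbox untouched, and conditioning runs inside the sandbox afterwards, so the raw copy stays available for further exploration. The sketch below uses SQLite as a stand-in sandbox (an assumption; the talk does not say which platform was used), with a hypothetical CSV extract and one simple data-quality check.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract, exactly as it arrives from a source system.
RAW_CSV = """region,category,submitted
 Beijing ,idea,2011-10-12
st. petersburg,IDEA,
Cork,Publication,2011-11-03
"""

def load_raw(conn, text):
    """Extract + Load: copy the extract into the sandbox as-is, with no cleanup."""
    conn.execute("CREATE TABLE raw_activities (region TEXT, category TEXT, submitted TEXT)")
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    conn.executemany("INSERT INTO raw_activities VALUES (?, ?, ?)", rows)

def transform(conn):
    """Transform: condition the data inside the sandbox, leaving raw_activities intact."""
    conn.execute(
        """CREATE TABLE activities AS
           SELECT TRIM(region) AS region,
                  LOWER(TRIM(category)) AS category,
                  NULLIF(TRIM(submitted), '') AS submitted
           FROM raw_activities"""
    )

def missing_dates(conn):
    """Data-quality check: how many conditioned rows lack a submission date?"""
    return conn.execute(
        "SELECT COUNT(*) FROM activities WHERE submitted IS NULL"
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_CSV)
transform(conn)
print(missing_dates(conn))  # prints 1
```

Checks like `missing_dates` are where gaps surface: a hypothesis that depends on a column that is mostly empty cannot be tested, whatever model is chosen later.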


Phase 2: Discovery


Phase 3: Model Planning

• In Phase 2, the data exploration was mainly about conditioning the data, exploring it, validating its quality, and understanding it more fully.

• In Phase 3: look at every hypothesis and perform limited experiments with different analytic models.

IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
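A "limited experiment" for IH5 can start with a crude heuristic before any full model is built: flag people whose knowledge-transfer contacts reach several regions other than their own. A minimal sketch follows, assuming hypothetical `(presenter, attendee)` event pairs and a person-to-region map. It ignores the "research-specific" (topic) dimension of IH5 entirely.

```python
from collections import defaultdict

def boundary_spanners(events, region_of, min_foreign_regions=2):
    """Flag people whose knowledge-transfer contacts span several other regions.

    `events` is a list of (presenter, attendee) pairs; `region_of` maps a person
    to their home site. Both are hypothetical inputs for this sketch.
    """
    reached = defaultdict(set)
    for a, b in events:
        # Count the tie in both directions: presenting and attending both transfer knowledge.
        for person, other in ((a, b), (b, a)):
            if region_of[other] != region_of[person]:
                reached[person].add(region_of[other])
    return sorted(p for p, regions in reached.items() if len(regions) >= min_foreign_regions)

region_of = {"spb_1": "St. Petersburg", "spb_2": "St. Petersburg",
             "bj_1": "Beijing", "cork_1": "Cork", "tlv_1": "Tel Aviv"}
events = [("spb_1", "bj_1"), ("spb_1", "cork_1"), ("spb_2", "spb_1"), ("bj_1", "tlv_1")]
print(boundary_spanners(events, region_of))  # ['bj_1', 'spb_1']
```

If a cheap heuristic like this already separates a few candidates, that suggests the data can support a proper network model in Phase 4.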


Phase 4: Run the Models!

• Key analytic models chosen for our project:

  • Social Network Analysis

  • Topic Modeling (Stanford Toolkit)

  • Natural Language Processing
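For social network analysis, betweenness centrality is a common proxy for "boundary spanners". It scores each person by how many shortest paths between other people run through them. The sketch below is a from-scratch version of Brandes' algorithm for an unweighted, undirected graph. The edge list and region-prefixed names are hypothetical, and nothing here reflects the project's actual SNA tooling.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each person to the set of people they exchanged knowledge with.
    Returns unnormalized betweenness: shortest paths between other pairs that pass
    through each node (split fractionally when several shortest paths tie).
    """
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                       # nodes in non-decreasing distance from source
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from source
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):        # accumulate dependencies back toward source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

def undirected(edges):
    """Build an adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

g = undirected([("spb_1", "spb_2"), ("spb_1", "spb_3"), ("spb_2", "spb_3"),
                ("spb_3", "bj_1"), ("bj_1", "bj_2")])
print(betweenness(g))  # spb_3 scores highest (4.0): it is the only bridge to Beijing
```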


Phase 5: Communicate Results

• Many people who are great at the analytics do not enjoy telling their story or evangelizing the project…

• … but analytics are supposed to drive change!


Phase 6: Operationalize

• Employees can “improve” their ideas by using our analytic models

• Complex text matching algorithm

• Helps measure ancestry of ideas

• Helps identify subject matter experts around the world

• Convinces EMC executives that Big Data is powerful

• Great public relations for EMC

• Identifies “clusters” of innovators

• Creates a data-driven culture at EMC
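The slide does not describe the text matching algorithm itself. As a much simpler stand-in, not EMC's algorithm, TF-IDF weighting plus cosine similarity is a standard way to rank earlier idea submissions by textual overlap with a new one. That ranking is one way the “ancestry of ideas” could be approximated. The function names and sample texts below are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """One {term: weight} dict per document, using tf * smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def closest_prior_ideas(new_idea, prior_ideas, top_k=2):
    """Rank earlier idea texts by similarity to a new submission."""
    vecs = tfidf_vectors(prior_ideas + [new_idea])
    query = vecs[-1]
    scored = [(cosine(query, v), i) for i, v in enumerate(vecs[:-1])]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by input order
    return [(prior_ideas[i], round(score, 3)) for score, i in scored[:top_k]]

prior = ["deduplication for backup storage arrays",
         "flash caching for virtual machines",
         "energy efficient data center cooling"]
print(closest_prior_ideas("inline deduplication in backup storage", prior, top_k=1))
```

Shared vocabulary is only a rough signal of ancestry; a production matcher would likely add stemming, synonyms, or topic-level features.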


EMC Data Science Curriculum

Course lengths: 90 min, 1 day, 5 days

Audiences: Business Leaders, Heads of Data Science Teams, Aspiring Data Scientists

Courses (two marked New):

• Data Science and Big Data Analytics

• Data Science and Big Data Analytics for Business Transformation

• Introducing Data Science and Big Data Analytics for Business Transformation


Questions?

Additional Resources:

1. EMC Education Services curriculum on Data Science and Big Data Analytics for Business Transformation:

http://education.emc.com/guest/campaign/data_science.aspx

2. Blog on Data Science & Big Data Analytics (David Dietrich):

http://infocus.emc.com/author/david_dietrich/

3. Blog on applying Data Analytics Lifecycle to measuring innovation data:

http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/
