Asking the Right Questions of Your Data

Copyright © Think Big Analytics and Neustar Inc. Asking the Right Questions of your Data. Mike Peterson, VP of Platforms and Data Architecture, Neustar. Jun 26, 2013

Upload: hadoopsummit

Post on 05-Dec-2014


DESCRIPTION

Executives are still waiting on our “Big Data Deep Insights”. Many of us are down the path of collecting, extracting, and analyzing our ever-growing data in Hadoop environments. We are building our data science expertise and expanding data governance. Yet still we are not getting what we are waiting for.

This talk is about:
1. Getting to the right questions
2. Setting expectations with the executive team
3. The unintentional consequence of suddenly having lots of data
4. Framing the boundaries of our data science
5. Pragmatic data governance
6. Looking outside your data to 3rd party data

TRANSCRIPT

Page 1: Asking the Right Questions of Your Data

Copyright © Think Big Analytics and Neustar Inc.

Asking the Right Questions of your Data

Mike Peterson, VP of Platforms and Data Architecture, Neustar

Jun 26, 2013

Page 2: Asking the Right Questions of Your Data

2 Copyright © Neustar Inc.

Page 3: Asking the Right Questions of Your Data

We have come a long way!!!

But where/when is the GOLD?
Unintended Consequence of Big Data
We need to ask the right Questions
Oh, and let's remember religion
and not forget GOVERNANCE

Page 4: Asking the Right Questions of Your Data

Big Data Evolution Status

» New data platform is built – 3-tier
» Collected many PBs of data
» Hadoop infrastructure in place for 2 yrs
» Established Data Science teams
» Machine Learning is in place
» Increased technology skills
» Focused data teams
» Active in the community

Page 5: Asking the Right Questions of Your Data

Our Partners are still a part of our process

» Expertise in Technologies
» Trusted partner
» Collaborative Teams

» Open source leader
» Invested in client success
» Price/performance

Page 6: Asking the Right Questions of Your Data

Some Unintended Consequences

» More Customer Reporting Requests
  » Because we suddenly have lots of customer data available
  » Meaning more work for the DW team!!!

» DR Site is more required than ever
  » More data means more critical data to protect
  » Network stress to support DR and other additional access

» Data Governance is overwhelmed with requests
  » Retention Policies need to be re-thought

Page 7: Asking the Right Questions of Your Data

Questions

» Customer Driven Questions
  » Easy to understand

» Subject Questions
  » Discover the pivot and you have a good start

» Exploratory Questions
  » Thinking of the unformed questions
  » Working from the top down
  » Narrowing the answer before you test all the data

Page 8: Asking the Right Questions of Your Data

Questions - Approaches

• Understand which manual process you want to automate: identify what is currently predicted by hand that could be automated, and determine whether there is any way to get training data consisting of <input, output> pairs.

• Consider methods to augment existing data with a “pivot” column that can be used to join. For example, geo-locating an IP address could lead to joining with Census data based on ZIP+4.
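
To make the “pivot” column idea concrete, here is a minimal Python sketch. It assumes a geo-lookup table from IP address to ZIP+4 and a Census-style reference table keyed by ZIP+4. All names (`add_pivot`, `join_on_pivot`) and all sample values are made up for illustration and do not come from the talk.

```python
# Hypothetical sample data; none of these values come from the talk.
ip_to_zip4 = {
    "203.0.113.7": "20166-6503",
    "198.51.100.23": "94105-1804",
}
census_by_zip4 = {
    "20166-6503": {"median_income": 98000, "households": 412},
    "94105-1804": {"median_income": 121000, "households": 267},
}
events = [
    {"ip": "203.0.113.7", "action": "lookup"},
    {"ip": "198.51.100.23", "action": "purchase"},
    {"ip": "192.0.2.99", "action": "lookup"},  # no geo match for this IP
]


def add_pivot(event, geo):
    """Return a copy of the event with a zip4 pivot column (None if unknown)."""
    return {**event, "zip4": geo.get(event["ip"])}


def join_on_pivot(rows, reference, key="zip4"):
    """Left-join rows to a reference table keyed by the pivot column."""
    joined = []
    for row in rows:
        extra = reference.get(row[key], {})  # unmatched rows keep only their own columns
        joined.append({**row, **extra})
    return joined


enriched = join_on_pivot([add_pivot(e, ip_to_zip4) for e in events], census_by_zip4)
```

The left join keeps events whose IP cannot be geo-located, so missing pivots stay visible instead of silently dropping rows.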

 

Page 9: Asking the Right Questions of Your Data

Questions - Approaches

• Determine whether your problem is one of prediction or one of grouping (clustering). Grouping is more a task that leads to better understanding of the data than one that directly solves a business problem.
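
The difference between the two problem types can be sketched in a few lines of Python. This is an illustration only: the nearest-neighbour predictor and the one-dimensional k-means below are stand-ins chosen for brevity, and the spend/churn data is hypothetical.

```python
def predict_1nn(training_pairs, x):
    """Prediction: return the output of the closest labelled input."""
    _, nearest_output = min(training_pairs, key=lambda pair: abs(pair[0] - x))
    return nearest_output


def cluster_1d(values, centers, rounds=10):
    """Grouping: plain k-means on one numeric column, no labels needed."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Prediction needs <input, output> pairs (hypothetical: monthly spend -> outcome).
training_pairs = [(10, "churn"), (12, "churn"), (80, "stay"), (95, "stay")]
label = predict_1nn(training_pairs, 15)

# Grouping only needs the inputs; what each group means is left to the analyst.
spend = [10, 12, 15, 80, 95, 90]
centers, groups = cluster_1d(spend, centers=[0.0, 100.0])
```

Prediction returns an answer to a stated question, while clustering returns groups that still have to be interpreted.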

Page 10: Asking the Right Questions of Your Data

Questions - Approaches

• Determine if you are more interested in finding “interesting” relationships among data columns than in the columns themselves. This is a task I’d call more of “discovery” than prediction, but the idea is still to express one column, the output, in terms of the other columns as inputs.

• Doing this with each column in turn as the output can lead to “discovery” of the strongest correlations (e.g., every time a customer buys beer at 5 PM, he is likely to buy diapers). This is more of a fishing expedition, but it can lead to unusual insights.
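
One simple way to run such a fishing expedition is to score every pair of items by lift and rank the results. Note that this swaps in pairwise lift (a standard market-basket measure) for the column-by-column modelling described above; the function name `pair_lift` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(transactions):
    """Score every item pair by lift = P(a and b) / (P(a) * P(b)), highest first."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    scores = {
        (a, b): together * n / (item_counts[a] * item_counts[b])
        for (a, b), together in pair_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical baskets, invented for this sketch.
baskets = [
    ["beer", "diapers"],
    ["beer", "diapers", "chips"],
    ["milk", "bread"],
    ["milk", "chips"],
    ["beer", "diapers"],
    ["milk"],
    ["bread", "chips"],
    ["milk", "bread"],
]
ranked = pair_lift(baskets)
```

A lift above 1 means the two items appear together more often than independence would predict. On small data, rare items can produce high lift by chance, so any pair surfaced this way is a hypothesis to test, not a finding.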

Page 11: Asking the Right Questions of Your Data

Impetus Approach to Questioning Data

[Flow diagram. Box labels, in extraction order:]

» EXISTING DATA PROPERTY
» BUSINESS STRATEGY
» CUSTOMER PROBLEM STATEMENTS
» ANALYSIS OF DATA PROPERTY
» DISCUSSION WITH STAKEHOLDERS
» ANALYSIS OF PROBLEM STATEMENT
» DATA NEEDS STATEMENT
» REFINED PROBLEM STATEMENT
» DATA ANALYTICS PLAN

Page 12: Asking the Right Questions of Your Data

Who knew there was religion in Analytics

» Statistical Analysis vs. Machine Learning
  » Stats people think “truth”
  » Machine Learning people think “near truth”

» Truth is easy to bound
  » Cost models make sense to org

» Near Truth is hard to explain and bound
  » It is where the real exploration happens
  » But – it can consume the Data Scientist

» Both can net real returns – and they need to co-exist

Page 13: Asking the Right Questions of Your Data


Page 14: Asking the Right Questions of Your Data

GOVERNANCE

» Don’t forget about Governance
  » Contracts
  » PII
  » Brand

» CPO & CISO are your friends - honestly
  » Protect your CUSTOMER DATA

» It will slow you down in the beginning
  » But you want your results to be reputable

» We need to get to a policy framework at some point that is automated
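
As a rough illustration of what an automated policy framework could look like at its smallest, the Python sketch below declares the policy as data and applies it to each record. The policy fields, the `apply_policy` function, and the sample record are all assumptions made for this example; the talk does not describe an implementation.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical policy: which fields are PII and how long records may be kept.
POLICY = {
    "pii_fields": {"email", "phone"},
    "retention_days": 365,
}


def mask(value):
    """Replace a PII value with a short one-way hash token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def apply_policy(record, today, policy=POLICY):
    """Return a governed copy of the record, or None if it is past retention."""
    if today - record["collected_on"] > timedelta(days=policy["retention_days"]):
        return None
    return {
        key: mask(value) if key in policy["pii_fields"] else value
        for key, value in record.items()
    }


record = {"email": "user@example.com", "plan": "gold", "collected_on": date(2013, 1, 15)}
governed = apply_policy(record, today=date(2013, 6, 26))
expired = apply_policy(record, today=date(2015, 1, 1))
```

Keeping the policy in one declarative structure means the CPO and CISO can review it without reading pipeline code. An unsalted hash is used here only to keep the sketch short; it is not a sufficient anonymization technique on its own.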

Page 15: Asking the Right Questions of Your Data

About Impetus

» Accelerated consulting and services leader for Big Data; Headquartered in San Jose since 1996; 1400+; Presences in Silicon Valley, Atlanta, NYC; offices in India; Expertise through Architects

» Pioneers in distributed software engineering with vertical and functional expertise; Dedicated innovation labs; 200+ Big Data practitioners; 80+ dedicated to R&D

Page 16: Asking the Right Questions of Your Data

Data Science Approach

Drill
* Incoming Question
* Problem Landscape
* Underlying Constraints
* Specific Goals

Assess
* Goal Driven Hypotheses
* Data Requirement
* Resource Requirements
* Analysis Plan

Target
* Data Collection
* Quality Assessment
* Cross Validation
* Restructuring

Analyze
* Test Previous Hypotheses
* Explore New Hypotheses
* Test
* Quantify Results

Recommend
* Summary of Results
* Key Novel Insights
* Impact Analysis
* Action Items

Page 17: Asking the Right Questions of Your Data

Data Science Focus Areas

» Recommender Systems
» Sentiment Analysis
» Topic Identification
» Predictive Analytics
» Data Stream Analytics

Contact us at [email protected]

Page 18: Asking the Right Questions of Your Data

Thank you

Questions?