GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

GOT DATA? How Hadoop Market Analysis Helped the California Milk Processing Board Better Serve the California Latino Market

Presented by: Oscar E. Padilla, VP Strategy for Luminar; Justin Sears, Product Marketing Manager, Hortonworks

June 4, 2014

© 2014 Luminar is a fully owned Entravision business unit

Upload: hortonworks

Post on 23-Aug-2014


DESCRIPTION

Luminar is the first big data analytics provider focused specifically on U.S. Latino consumers. The company offers analysis based on empirical insights, rather than a sample-based approach. Apache Hadoop makes this empirical approach work at scale. In 2012, Luminar began collaborating with Hortonworks to deploy a fully-integrated modern data architecture with Hortonworks Data Platform (HDP). Luminar’s predictive modeling runs on HDP and empowers companies to make real-time decisions that connect their products with Latino consumers for measurable results. Luminar’s VP of Strategy, Oscar Padilla, and Justin Sears, Hortonworks Product Marketing Manager, delivered this presentation at the 2014 Hadoop Summit in San Jose, California. They showed Luminar’s analysis of milk consumption among Latino households, for the California Milk Processing Board.

TRANSCRIPT

Page 1: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

GOT DATA? How Hadoop Market Analysis Helped the California Milk Processing Board Better Serve the California Latino Market

Presented by: Oscar E. Padilla, VP Strategy for Luminar; Justin Sears, Product Marketing Manager, Hortonworks

June 4, 2014

Page 2: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Key topics we’re looking to cover today

●  How a marketing business adopts Hadoop to solve real challenges

●  Luminar’s Hadoop evolution from HDP 1.x to 2.x

●  Lessons from the successful Luminar/Hortonworks partnership

●  A bit about the data architecture and how it all works together

Page 3: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Hortonworks Mission: Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

●  Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process

●  Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind

●  Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills

Reseller Partners: (partner logos not captured in the transcript)

Headquartered in Palo Alto, CA; 300+ employees and growing

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Page 4: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

New data puts pressure on the architecture

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP

SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

New data:

●  Unstructured documents, emails

●  Clickstream

●  Server logs

●  Sentiment, Web Data

●  Sensor, Machine Data

●  Geolocation

Source: IDC

●  2.8 ZB in 2012

●  85% from New Data Types

●  15x Machine Data by 2020

●  40 ZB by 2020

Page 5: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Hadoop in a Modern Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DEV & DATA TOOLS: Build & Test

OPERATIONS TOOLS: Provision, Manage & Monitor

DATA SYSTEM:

─  REPOSITORIES: RDBMS, EDW, MPP

─  Hadoop: Governance & Integration, Security, Operations, Data Access, Data Management

SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 6: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

New analytic apps for new types of data

Financial Services: New Account Risk Screens; Fraud Prevention; Trading Risk; Maximize Deposit Spread; Insurance Underwriting; Accelerate Loan Processing

Retail: 360° View of the Customer; Analyze Brand Sentiment; Localized, Personalized Promotions; Website Optimization; Optimal Store Layout

Telecom: Call Detail Records (CDRs); Infrastructure Investment; Next Product to Buy (NPTB); Real-time Bandwidth Allocation; New Product Development

Manufacturing: Supplier Consolidation; Supply Chain and Logistics; Assembly Line Quality Assurance; Proactive Maintenance; Crowdsourced Quality Assurance

Healthcare: Genomic data for medical trials; Monitor patient vitals; Reduce re-admittance rates; Store medical research data; Recruit cohorts for pharmaceutical trials

Utilities, Oil & Gas: Smart meter stream analysis; Slow oil well decline curves; Optimize lease bidding; Compliance reporting; Proactive equipment repair; Seismic image processing

Page 7: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Companies typically start Hadoop for new analytic applications…

Axes: SCALE, SCOPE

New Analytic Apps: New types of data, LOB-driven

Page 8: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

… and incrementally grow to a ‘Data Lake’

Axes: SCALE, SCOPE

New Analytic Apps: New types of data, LOB-driven

Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale

A Modern Data Architecture/Data Lake: RDBMS, MPP, EDW; Governance & Integration, Security, Operations, Data Access, Data Management

Page 9: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

HDP delivers enterprise Hadoop

HDP 2.1 Hortonworks Data Platform

GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)

DATA ACCESS:

─  Batch: Map Reduce

─  Script: Pig

─  SQL: Hive/Tez, HCatalog

─  NoSQL: HBase, Accumulo

─  Stream: Storm

─  Search: Solr

─  Others: In-Memory Analytics, ISV engines

DATA MANAGEMENT: YARN (Data Operating System); HDFS (Hadoop Distributed File System), nodes 1 to N

SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)

OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

Deployment Choice: Linux, Windows, On-Premise, Cloud

Comprehensive enterprise Hadoop delivered completely in the open

Wholly integrated for deep ecosystem interoperability

Page 10: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk


Luminar is a Hortonworks pioneer

•  Early adopter: began using Hortonworks Data Platform in September 2012

•  First customer case study on Hortonworks.com

•  Featured on “Advertisers do Hadoop” industry solutions page

•  Quantum migration from HDP 1.1 to HDP 2.0

•  Numerous quotes, interviews and speaking events

•  Excellent results

Page 11: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

●  Luminar is an analytics and modeling company focused on helping clients achieve growth and gain greater efficiencies

●  We built the first cloud-based Big Data/Hadoop analytics environment in the US serving the Latino market

●  Key client segments include Retail, CPG, Financial Services, Media & Entertainment, Automotive, Publishing, and the services sector

●  Luminar is an Entravision Communications (NYSE: EVC) business unit

Locations: Santa Monica CA; Denver CO; Dallas TX; Washington DC; Miami FL; Chicago IL; Mexico City, Mexico (Data Scientist Resource); Buenos Aires, Argentina (Data Scientist Resource)

Page 12: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Why The U.S. May Be The New 'Emerging' Market: P&G, Coke, GM and Others Are Showing Renewed Interest in the U.S. as Its Growth Potential Rises

“The U.S. population is growing – 27 million more inhabitants in 10 years…The percentage of Hispanics, Asians and African-Americans keeps growing. They contribute to winning elections but, more importantly, they're over-consumers…”

Frederic Roze, President & CEO, L'Oreal USA (AdAge, Feb. 2013)

Page 13: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Today’s Challenge in Targeting Hispanics

Brands are making marketing investment decisions on limited information

Targeting assumptions are based mostly on survey or sampled methods (e.g. “Latinos over-index on mobile usage”)

Limited access to quantitative insights

Page 14: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Sampling is like a Digital Photo…Insights Become Less Precise the Closer You Examine Your Data

Business decisions are inherently weakened if you rely solely on sampling methods

Page 15: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

To Address this Challenge, Luminar Set Out to Build the Largest Empirical Data Set of its Kind…

Consumer Habits

| Traditional Sampled Approach | Luminar’s Latino Business Intelligence

Transactional Data (POS, CRM, loyalty, e-commerce)

Digital Media Interactions

Relevant Analytical Models

Cultural References

Page 16: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Ingesting a Large Data Set to Derive Value

150 Million Unique Records

15 Million US Adult Latinos, representing 68% of all US Adult Latinos over 18

Page 17: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

How Luminar Defines Latino Consumers

Consumer Interaction Points

Luminar Cultural Filter and Scoring

Consumption Behavior

Consumer Characteristics

Household-level Analysis

Cultural sub-groupings

Household characteristics

Consumption Patterns

Non-ethnic Comparison

Language dominance

Persona definitions

Consumer

Page 18: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Marketers Are Seeking Precise Answers to Fuel Growth and Increase Efficiencies

How much have we earned through diverse promotional channels?

How acculturated is my market? Do I target them in Spanish, English or both?

What is the efficiency of our media activities?

Which marketing drivers have had the greatest effects?

What’s the size of the prize in my trading area?

Are we optimally allocating our budget across all products?

What’s my market share?

Page 19: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

California Milk Processing Board

Page 20: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Background and Business Challenge

●  Milk consumption is declining. Business dynamics driving this decline include:

─  an aging population,

─  consumption of milk alternative products, as well as

─  consumption of milk substitutes (e.g. energy drinks, juices)

●  Without an empirical understanding of historical performance, CMPB was left with an incomplete and often inaccurate read on the Hispanic market

●  To identify potential areas of growth, CMPB needed a means to analyze consumption data over a 2-3 year period

●  This would require building a robust tool that could monitor milk consumption across multiple DMAs (designated market areas) and across a variety of “filter options”

Page 21: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Luminar Solution – Aggregate the Largest Transactional Data Set on Milk Products

●  Luminar set out to aggregate the largest transactional data set across the state of California

●  We took a “total market” approach including four major population sub-groupings

─  Hispanics

─  Asian Americans

─  African Americans

●  Data included transactional records for both Northern and Southern CA

11.5 million households in CA

Luminar captured transactional data for nearly 70% of total CA households

Page 22: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Building CMPB’s Foundational Data Asset

Luminar transactional datastore (150 million records):

─  Zip + household data

─  Transactional UPC-level data

─  Social/demographics

─  Population subsector

─  Language of dominance

California Milk Transactional Data:

─  Item/UPC codes

─  Product category

─  Dairy milk and milk alternatives

Luminar Analytics Process:

─  Data enrichment

─  Customer segmentation

Luminar / Client-Ready Data Asset:

─  Product Consumption

─  Segmentation Analysis

─  Trend Analysis

Analysis Report; Luminar BI Portal
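
The enrichment and segmentation steps on this slide can be pictured with a small sketch. It is illustrative only: the slides do not show Luminar's actual pipeline, and every field name, record and helper below (`households`, `transactions`, `enrich`, `gallons_per_household`) is a hypothetical stand-in. The idea is simply to attach household attributes to UPC-level milk transactions, then average gallons per purchasing household within each segment.

```python
from collections import defaultdict

# Hypothetical household records (ZIP + household attributes); not real data.
households = {
    "hh1": {"zip": "90012", "segment": "Hispanic", "language": "Spanish-only"},
    "hh2": {"zip": "94110", "segment": "Hispanic", "language": "Bilingual"},
    "hh3": {"zip": "92101", "segment": "Asian American", "language": "English-only"},
}

# Hypothetical UPC-level transactions: (household_id, upc, category, gallons).
transactions = [
    ("hh1", "0001", "dairy milk", 1.0),
    ("hh1", "0002", "milk alternative", 0.5),
    ("hh2", "0001", "dairy milk", 2.0),
    ("hh3", "0001", "dairy milk", 0.5),
    ("hh9", "0001", "dairy milk", 1.0),  # unknown household, dropped by the join
]


def enrich(transactions, households):
    """Attach household attributes to each transaction (inner join on household id)."""
    for hh_id, upc, category, gallons in transactions:
        hh = households.get(hh_id)
        if hh is None:
            continue  # no household record to enrich with
        yield {"household": hh_id, "upc": upc, "category": category,
               "gallons": gallons, **hh}


def gallons_per_household(rows, key):
    """Average gallons per purchasing household, grouped by the attribute `key`."""
    gallons = defaultdict(float)
    members = defaultdict(set)
    for row in rows:
        gallons[row[key]] += row["gallons"]
        members[row[key]].add(row["household"])
    return {group: gallons[group] / len(members[group]) for group in gallons}


enriched = list(enrich(transactions, households))
by_segment = gallons_per_household(enriched, "segment")
by_language = gallons_per_household(enriched, "language")
print(by_segment)
```

Grouping by a different attribute (for example `"language"` or `"zip"`) reuses the same enriched rows, which is roughly what a "client-ready data asset" feeding several analyses would need.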

Page 23: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Hortonworks Data Platform Powers Luminar’s Analytics Models

●  Varied data: credit card transactions, set-top box streams, voter records and social media

●  Easy integration: Amazon Cloud, R, Talend and Tableau

●  Better ingest:

─  300 to 2,000 data sources

─  2TB to 15TB, monthly data volume

●  Speed to insight: from 3 days to 3 hours processing time

“We are going to be improving our ability to listen for what U.S. Latino consumers want and to communicate that voice to more clients through innovative applications running on Hadoop.”

Franklin Rios, President – Luminar
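
The before/after figures on this slide can be turned into growth multiples with a few lines of arithmetic. The inputs are the numbers stated on the slide; the one assumption is that "3 days" means 72 hours of elapsed processing time.

```python
# Before/after figures as stated on the slide.
sources_before, sources_after = 300, 2_000
volume_tb_before, volume_tb_after = 2, 15
hours_before, hours_after = 3 * 24, 3  # assumes "3 days" = 72 elapsed hours

source_growth = sources_after / sources_before
volume_growth = volume_tb_after / volume_tb_before
speedup = hours_before / hours_after

print(f"data sources:    {source_growth:.1f}x")
print(f"monthly volume:  {volume_growth:.1f}x")
print(f"processing time: {speedup:.0f}x faster")
```

Under that assumption the processing time improves by a factor of 24, while the number of sources and the monthly volume each grow by roughly 7x.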

Page 24: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

HDP: Varied, Granular & Persistent Data for Analysis

APPLICATIONS: Luminar Insights, Data Onboarding

OPERATIONS TOOLS

DATA SYSTEM:

─  EXISTING REPOSITORY

─  AWS

─  Governance & Integration, Security, Operations, Data Access, Data Management

SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 25: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk


A Look Inside the HDP Technology Stack

Page 26: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Our Data Approach

●  We ingested transactional data into our Big Data environment for processing and for developing the analysis to support client objectives

●  We delivered a BI tool that provided access to custom-made, relevant KPIs over time (3 years of historical data)

●  The granularity of the data provided monthly, quarterly and annual reporting across all product segments

●  We then worked with the Agency of Record to provide greater insight and answer the “so what” questions that help identify market growth potential
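
Monthly, quarterly and annual reporting from one data set usually comes down to rolling monthly values up into coarser periods. The sketch below shows that rollup under stated assumptions: the `monthly` values are made up, and `rollup` is a hypothetical helper, not part of Luminar's BI tool.

```python
from collections import defaultdict

# Hypothetical monthly KPI values: (year, month) -> gallons sold.
monthly = {(2013, month): 100.0 + month for month in range(1, 13)}


def rollup(monthly, period):
    """Sum monthly values into 'quarter' or 'year' buckets."""
    totals = defaultdict(float)
    for (year, month), value in monthly.items():
        if period == "quarter":
            key = (year, (month - 1) // 3 + 1)  # months 1-3 -> Q1, 4-6 -> Q2, ...
        elif period == "year":
            key = year
        else:
            raise ValueError(f"unknown period: {period}")
        totals[key] += value
    return dict(totals)


quarterly = rollup(monthly, "quarter")
annual = rollup(monthly, "year")
print(quarterly)
print(annual)
```

Keeping the data at monthly grain and deriving the coarser views on demand means the quarterly and annual totals always reconcile with the monthly figures.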

Page 27: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Develop Custom BI Application Consisting of 12 Dashboards intersecting multiple data points

Page 28: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Data can be queried along a wide range of variables and on different intersecting points

Page 29: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Milk Consumption (Gallons per Household)

Page 30: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Milk Consumption by Ethnic Segments

Page 31: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

A “Single Source of Truth” based on Empirical Consumer Behavior Data

Business problem

─  How can I grow revenue?

─  How can I reduce risk and be more efficient?

─  What do I need to know?

─  What are my alternatives?

─  What are my constraints?

Data Assets

─  What data do I have that is relevant and available to make decisions?

─  What data do I need to gather or acquire?

Insights

─  What analytics techniques are most appropriate for the business problem and data available?

─  Analytic techniques might include: Classification, Product Consumption, Segmentation Analysis, Trend Analysis

Actions

─  How do I tie insights to operational decisions?

─  How do I close the feedback loop to test and learn?

Source: 2012 Gartner

Page 32: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Deriving Insights: Gallons/HH for HH “with kids” have declined significantly more than for HH “without kids” over the past 12 months
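
A comparison like this one rests on a percentage change in gallons per household between two 12-month windows, computed per segment. The sketch below shows that calculation; the numbers are invented for illustration and are not the values behind the slide's chart.

```python
def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    return (current - previous) * 100.0 / previous


# Hypothetical gallons per household for two consecutive 12-month windows.
gallons_per_hh = {
    "with kids": {"prior_12m": 40.0, "latest_12m": 34.0},
    "without kids": {"prior_12m": 25.0, "latest_12m": 24.0},
}

changes = {
    segment: pct_change(values["prior_12m"], values["latest_12m"])
    for segment, values in gallons_per_hh.items()
}

# The segment with the most negative change has declined the most.
steepest = min(changes, key=changes.get)
print(changes, steepest)
```

Using a relative change rather than a raw difference keeps segments with different baseline consumption comparable.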

Page 33: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Deriving Insights: Middle Income Level is the Most Price Sensitive

Page 34: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Deriving Insights: Bilingual/English and English-only Hispanic consumption is declining more rapidly, while Spanish-only is on the rise

Page 35: GOT DATA? How Luminar Used Hadoop Data to Analyze the California Latino Market for Milk

Three Key Closing Remarks…

The low-hanging fruit of Hispanic consumers has been picked…the combination of our Hadoop data environment and advanced analytics helps drive effective frontline actions

It's not just about focusing on what we know about Latinos; the true opportunities come from seeing something you have never seen before…understanding the unknowns

Reaching Hispanics is not just about language, acculturation or relevancy; it's about having precise measurability that can prove efficiencies and ROI