LEGO: Data Driven Growth Hacking Powered by Big Data


Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1: LEGO: Data Driven Growth Hacking Powered by Big Data

Salesforce Confidential

LEGO: Data Driven Growth Hacking Powered by Big Data

June 2016

Kamal Duggireddy Prashant Gokhale

Page 2: LEGO: Data Driven Growth Hacking Powered by Big Data

About Us

Kamal Duggireddy

Kamal Duggireddy currently leads the Data Engineering, Product Data Science team at Salesforce.com. Prior to this, he served as Director - Big Data Architecture at American Express. Combining deep technical skills with business knowledge and strong execution experience, Kamal developed reference architectures and new enterprise-level capabilities with the Hadoop stack.

Prashant Gokhale

Prashant is currently working on solving big data problems at Salesforce.com using Hadoop and its ecosystem components. Prior to this, he held several critical engineering positions at Yahoo, Cloudera & Lookout.

Page 3: LEGO: Data Driven Growth Hacking Powered by Big Data

The Use Case | Overview

Executives, Analysts, Product Managers

Page 4: LEGO: Data Driven Growth Hacking Powered by Big Data

The Use Case | Flow

[Flow diagram. Labels: Instrumentation; 150+ Loglines; Hadoop Data Processing; Traditional Data Warehouses Dimensions; Data Engineering & Curation; Smart Data Dashboards (Salesforce Wave); Advanced Analysis; Predictive Data Apps; Ad-Hoc Requests]

Page 5: LEGO: Data Driven Growth Hacking Powered by Big Data


The Journey | How it all started

Page 6: LEGO: Data Driven Growth Hacking Powered by Big Data

Milestones | Along the way

Reusability Declarative Data Lake Data Dictionary

Self Service Automation

Security Visualization Governance

Page 7: LEGO: Data Driven Growth Hacking Powered by Big Data

The Framework | Finally!

[Architecture diagram. Labels: Log Sources; Cloud Metrics; Kafka; Splunk; Files; Warehouse; Objects; Hadoop; Data Lake; Log Processing; Metadata; Flow Engine; Web App; Self Service; Data Profiler; Data Science; Datasets (Various grain); Cubes (Custom grain)]

Page 8: LEGO: Data Driven Growth Hacking Powered by Big Data

Goals

Scalable: Process hundreds of billions of log lines.

Flexible: Handle thousands of log schemas. Support variable grain and transformations using custom code.

Data Quality: Automated data profiling, monitoring and alerting.

Self Service: Enable ad-hoc analysis.

Page 9: LEGO: Data Driven Growth Hacking Powered by Big Data

Key Building Blocks

Log Processing Engine
• Declaratively define features and flows.
• Normalize data across multiple log lines.
• Custom code injection for data transformation.

Data Profiler
• Profile data at scale to detect anomalies.

Web App
• Interface to manage features and flows.

Job Automation Engine
• End-to-end automation from features/flows to curated data sets in Wave.
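As an illustration of the "Normalize data across multiple log lines" bullet, here is a minimal Python sketch. It assumes that normalization means renaming each log type's raw fields onto one shared schema. The slides do not show the real schemas, so `FIELD_MAPS`, the raw field names and `normalize` are all hypothetical.

```python
# Hypothetical mapping from each log type's raw field names to a shared schema.
# None of these raw names come from the slides; they only illustrate the idea.
FIELD_MAPS = {
    "X": {"usr": "user_id", "pg": "page", "evt": "event"},
    "ABC": {"userId": "user_id", "pageName": "page", "eventName": "event"},
}


def normalize(log_type, record):
    """Rename one raw record's fields to the shared schema.

    Fields without a mapping are dropped; mapped fields that are
    missing from the record are simply skipped.
    """
    mapping = FIELD_MAPS[log_type]
    return {common: record[raw] for raw, common in mapping.items() if raw in record}


rows = [
    normalize("X", {"usr": "u1", "pg": "Home Landing", "evt": "Create Event"}),
    normalize("ABC", {"userId": "u2", "pageName": "Home",
                      "eventName": "Create Event", "extra": 1}),
]
```

After this step both records expose the same `user_id`, `page` and `event` keys, which is what lets a single downstream table hold rows from different log types.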

Page 10: LEGO: Data Driven Growth Hacking Powered by Big Data

Log Processing Engine

logType=='X' and event=='Create Event' and page=='Home Landing',"Feat 1","eval_code(event.toUpperCase())",page,…..

logType=='ABC' and event=='Create Event' and page=='Home',"Feat 2","eval_code(event.substring(5))",event,…..

[Diagram. Labels: Usage Log Files + Feature Definitions; Data Normalization; Data Cleansing; Data Transformation; Hive Tables]
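The feature definitions on this slide pair a match condition with a feature name, a custom-code expression and the fields to carry forward. The Python sketch below shows one way such definitions could be evaluated against a log record. It is not the LEGO engine: `parse_condition`, `FeatureDef` and `match_feature` are hypothetical names, Python lambdas stand in for the slide's `eval_code(...)` expressions, and "first matching definition wins" is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# One clause of the form  field=='value'  (straight single quotes).
CLAUSE = re.compile(r"\s*(\w+)\s*==\s*'([^']*)'\s*")


def parse_condition(text: str) -> Dict[str, str]:
    """Parse "a=='x' and b=='y'" into {"a": "x", "b": "y"}.

    Sketch only: values that themselves contain " and " are not supported.
    """
    wanted = {}
    for clause in text.split(" and "):
        match = CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        wanted[match.group(1)] = match.group(2)
    return wanted


@dataclass
class FeatureDef:
    condition: Dict[str, str]         # field -> required value
    name: str                         # feature label, e.g. "Feat 1"
    transform: Callable[[dict], str]  # stand-in for eval_code(...)
    fields: List[str]                 # fields copied into the output row


def match_feature(defs: List[FeatureDef], record: dict) -> Optional[dict]:
    """Return an output row for the first definition whose condition matches."""
    for d in defs:
        if all(record.get(k) == v for k, v in d.condition.items()):
            row = {"feature": d.name, "value": d.transform(record)}
            row.update({f: record.get(f) for f in d.fields})
            return row
    return None


DEFS = [
    FeatureDef(
        parse_condition("logType=='X' and event=='Create Event' and page=='Home Landing'"),
        "Feat 1", lambda r: r["event"].upper(), ["page"]),
    FeatureDef(
        parse_condition("logType=='ABC' and event=='Create Event' and page=='Home'"),
        "Feat 2", lambda r: r["event"][5:], ["event"]),
]

record = {"logType": "ABC", "event": "Create Event", "page": "Home"}
result = match_feature(DEFS, record)
```

Keeping the condition as data (a dict of required values) rather than code is what makes the definition declarative: it can be stored, listed and edited from a web interface, while only the transform needs custom code.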

Page 11: LEGO: Data Driven Growth Hacking Powered by Big Data

Data Profiler

[Diagram. Labels: Datasets across platform; HCatalog; MapReduce; Datasets; Dataset Profile; Monitoring & alerting]

Dataset Profile: An Example

Dataset | Field | Type | Total | Min | Max | Avg | # Nulls | # Distinct | Median | 99th %tile | Top N
lego_feat | browser | STR | 2.3B | 7 | 63 | 25 | 1M | 50 | 34 | 38 | [.....]
lego_feat | url | STR | 2.3B | 20 | 223 | 50 | 0 | 5M | 70 | 90 | [.....]
. . .
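The profile columns in the example above can be computed for a small in-memory sample as sketched below in Python. This is only an illustration, not the MapReduce-based profiler the slide refers to. It assumes that Min, Max, Avg, Median and 99th %tile for a STR field refer to string lengths, which the slide does not state, and `profile_string_field` is a hypothetical name.

```python
import math
from collections import Counter
from statistics import median


def profile_string_field(values, top_n=3):
    """Profile one string field; None counts as a null.

    Assumption: min/max/avg/median/p99 are taken over string lengths.
    """
    present = [v for v in values if v is not None]
    lengths = sorted(len(v) for v in present)
    # Nearest-rank 99th percentile over the sorted lengths.
    p99 = lengths[max(0, math.ceil(0.99 * len(lengths)) - 1)] if lengths else None
    return {
        "total": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": lengths[0] if lengths else None,
        "max": lengths[-1] if lengths else None,
        "avg": sum(lengths) / len(lengths) if lengths else None,
        "median": median(lengths) if lengths else None,
        "p99": p99,
        "top_n": Counter(present).most_common(top_n),
    }


sample = ["Chrome", "Firefox", None, "Chrome", "Safari"]
profile = profile_string_field(sample, top_n=2)
```

Comparing such a profile against the previous run's profile (for example, a sudden jump in nulls or distinct values) is one plausible basis for the monitoring and alerting the slide mentions.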

Page 12: LEGO: Data Driven Growth Hacking Powered by Big Data

Everything put together

[Architecture diagram. Labels: Log Sources; Cloud Metrics; Kafka; Splunk; Files; Warehouse; Objects; Hadoop; Data Lake; Log Processing; Metadata; Flow Engine; Web App; Self Service; Data Profiler; Data Science; Datasets (Various grain); Cubes (Custom grain)]

Page 13: LEGO: Data Driven Growth Hacking Powered by Big Data

Data Volumetrics

Metric | Total
Avg. volume of app logs processed (compressed) | 100s of TB/month
Avg. number of jobs | 6,000+/month
Avg. log size volume growth rate | A lot!
Number of log record types | 1,000s
Number of fields | 10s of 1,000s

200+ B Events / Day

500+ Features

Page 14: LEGO: Data Driven Growth Hacking Powered by Big Data

Thank you

We are hiring! www.salesforce.com/company/careers