bringing the power of big data computation to salesforce

Bringing the Power of Big Data Computation to Salesforce Arun Bhat Chief Architect Model N Inc. [email protected] @parunbhat Krishna Shekhram Software Architect Model N Inc. [email protected] @kshekhram

Upload: salesforce-developers

Post on 20-Mar-2017


Category:

Technology



TRANSCRIPT

Page 1: Bringing the Power of Big Data Computation to Salesforce

Bringing the Power of Big Data

Computation to Salesforce

Arun Bhat

Chief Architect – Model N Inc.

[email protected]

@parunbhat

Krishna Shekhram

Software Architect – Model N Inc.

[email protected]

@kshekhram

Page 2: Bringing the Power of Big Data Computation to Salesforce

Speaker Introduction

A little bit about us

Page 3: Bringing the Power of Big Data Computation to Salesforce

Model N – The Pioneer in Revenue Management

Why do we care about big data

• Model N is the leading provider of Revenue Management solutions for the life sciences and technology industries.
• The company helps customers maximize revenues, drive growth and reduce compliance risk by transforming the revenue lifecycle from an inefficient, disjointed operation into a strategic end-to-end process.

Founded in 1999

• $120+B: Revenue under management
• 2+M: Sales lines processed daily
• 100+: Companies maximizing revenue with Model N
• 50,000+: Sales, Sales Ops, FAEs, Finance, Marketing, Manufacturing reps and Distributor users
• 100+: Countries where Model N Revenue Management is used
• 1,000+: Distributors in 50 countries

Page 4: Bringing the Power of Big Data Computation to Salesforce

Arun Bhat

Chief Architect, Revvy Products

• 15 years at Model N
• 19 years in the software industry
• Led the architecture of Model N products
• Responsible for the architecture of multi-tenant Revvy products on Salesforce
• Passionate about technology, but also likes to read comics

Krishna Shekhram

Architect, Revvy Products

• 6 years at Model N
• 14 years in the software industry
• Architected Model N Analytics products
• Lead for Revvy Big Data architecture
• Enjoys exploring new technologies and watching documentaries to learn more about the world

Model N – The Pioneer in Revenue Management

Page 5: Bringing the Power of Big Data Computation to Salesforce

Overview

What we will be discussing in this talk

Page 6: Bringing the Power of Big Data Computation to Salesforce

Agenda

• Leveraging Salesforce
• Computing using Big Data
• Metadata as a common fabric
• Integrating into a Cohesive Architecture
• Building a Data Driven Application
• Demo
• Data Pipeline and BigObjects
• Summary

Page 7: Bringing the Power of Big Data Computation to Salesforce

Leveraging Salesforce

To build flexible cloud applications

Page 8: Bringing the Power of Big Data Computation to Salesforce

Leveraging Salesforce Power

Cloud Computing

• Availability
• Deployment
• Elasticity
• Customization
• Security
• Upgradeability
• Integration
• Device Independence

Force.com Stack

• User Interface
• Logic
• Integration
• Database
• Infrastructure
• Developer Tools

Enabling Technology

• Multi Tenancy
• Metadata

Page 9: Bringing the Power of Big Data Computation to Salesforce

Computing using Big Data

Realize valuable insights, actions and faster decisions from your data at scale

Page 10: Bringing the Power of Big Data Computation to Salesforce

Why “Big Data” is a Big Deal

Competitive advantage for today, survival for tomorrow

Data Explosion

• Source: logs, social media, mobile, IoT, POS
• Format: structured, text, picture, video, binary, document
• Speed: real-time streams, transactions, batch upload

Technology Evolution

• Rapid Ingestion
• Bigger Storage
• Faster Processing
• Quicker Retrieval
• Better Visualization

Business Opportunities

• Hidden insights discovery
• Facts based decision making
• Business process automation
• Ecosystem engagement
• Growth & monetization of data

Page 11: Bringing the Power of Big Data Computation to Salesforce

Big Data Technology Landscape

Big data technology is going through an innovation spurt

Page 12: Bringing the Power of Big Data Computation to Salesforce

Hadoop and Spark Advantage

Data driven, flexible, multi-tenant applications at scale

Hadoop

Components
• HDFS, Map/Reduce, YARN
• Provides a fault-tolerant and scalable cluster

HDFS as storage
• Supports a variety of data formats
• Metadata driven schema evolution

YARN as cluster manager
• Supports security, resource isolation and multi-tenancy
• Highly available, with elastic scaling

Spark

Components
• Spark Core, SQL, MLlib, Streaming, GraphX
• Can run on a variety of clusters (YARN, Mesos, Standalone)

Data Access
• Data access from HDFS, S3, Cassandra, HBase, JDBC and streaming sources like Kafka
• Supports multiple formats like Parquet, JSON, CSV, etc.

Compute
• General purpose, low latency compute engine
• Batch, interactive, query, predictive, graph and stream processing

Page 13: Bringing the Power of Big Data Computation to Salesforce

Metadata

The common fabric

Page 14: Bringing the Power of Big Data Computation to Salesforce

Metadata Example

Metadata describes data

Sales Metadata

URL: /tx/sales/Sales.parquet

Columns:
• Sale ID: ID
• Customer: Relationship (Customer)
• Product: Relationship (Product)
• Invoice Date: Date
• Qty: Integer
• Price: Decimal

Sales Data

[Diagram: the Sales dataset (Sale ID, Customer, Product, Invoice Date, Qty, Price) linked to the Customer dataset (Customer ID, Name, Type) and the Product dataset (Product ID, Product #, BU).]
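
As a rough illustration of "metadata describes data", the Sales metadata on this slide could be written down as a small Python structure and used to check incoming records. This is only a sketch: the class names, the `PYTHON_TYPES` mapping and the `validate` helper are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical mapping from the slide's field types to Python types.
PYTHON_TYPES = {
    "ID": str,
    "Relationship": str,  # assumed to be stored as the ID of the related record
    "Date": date,
    "Integer": int,
    "Decimal": Decimal,
}


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    references: Optional[str] = None  # target dataset for Relationship columns


@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    url: str
    columns: tuple


# The Sales metadata exactly as listed on the slide.
SALES = DatasetMetadata(
    name="Sales",
    url="/tx/sales/Sales.parquet",
    columns=(
        Column("Sale ID", "ID"),
        Column("Customer", "Relationship", references="Customer"),
        Column("Product", "Relationship", references="Product"),
        Column("Invoice Date", "Date"),
        Column("Qty", "Integer"),
        Column("Price", "Decimal"),
    ),
)


def validate(record: dict, metadata: DatasetMetadata) -> list:
    """Return the problems found when checking one record against the metadata."""
    problems = []
    for column in metadata.columns:
        if column.name not in record:
            problems.append(f"missing column: {column.name}")
        elif not isinstance(record[column.name], PYTHON_TYPES[column.type]):
            problems.append(f"wrong type for column: {column.name}")
    return problems
```

Because the checks are driven entirely by `SALES`, adding a column to the metadata changes what `validate` accepts without touching the function itself.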

Page 15: Bringing the Power of Big Data Computation to Salesforce

Flexibility & Extensibility

Key for multi-tenant cloud applications

[Diagram: Calculation Unit. An input dataset feeds a calc op that produces an output dataset; metadata is defined for both the input and the output dataset.]

[Diagram: Calculation Model. Several input datasets feed a calculation model that produces several output datasets; the model is described by input metadata, output metadata and configuration.]
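
One way to picture a calculation unit and a calculation model is as named operations over named datasets. The sketch below assumes datasets are plain lists of row dictionaries; `CalculationUnit`, `run_model` and the two sample operations are hypothetical names chosen for illustration, not the speakers' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Dataset = List[dict]  # assumption: a dataset is a list of row dictionaries


@dataclass
class CalculationUnit:
    name: str
    inputs: List[str]          # names of the input datasets
    output: str                # name of the output dataset
    op: Callable[..., Dataset]


def run_model(units: List[CalculationUnit], datasets: Dict[str, Dataset]) -> Dict[str, Dataset]:
    """Run the units in order; each reads named inputs and publishes a named output."""
    results = dict(datasets)
    for unit in units:
        args = [results[name] for name in unit.inputs]
        results[unit.output] = unit.op(*args)
    return results


def attach_bu(sales: Dataset, products: Dataset) -> Dataset:
    """Add the product's business unit (BU) to every sales row."""
    bu_by_product = {p["product_id"]: p["bu"] for p in products}
    return [dict(row, bu=bu_by_product[row["product"]]) for row in sales]


def qty_by_bu(sales: Dataset) -> Dataset:
    """Total quantity per business unit, sorted by BU name."""
    totals: Dict[str, int] = {}
    for row in sales:
        totals[row["bu"]] = totals.get(row["bu"], 0) + row["qty"]
    return [{"bu": bu, "qty": qty} for bu, qty in sorted(totals.items())]


# A calculation model is an ordered chain of calculation units.
MODEL = [
    CalculationUnit("attach_bu", ["sales", "products"], "sales_with_bu", attach_bu),
    CalculationUnit("qty_by_bu", ["sales_with_bu"], "qty_by_bu", qty_by_bu),
]
```

Calling `run_model(MODEL, {"sales": ..., "products": ...})` returns the original datasets plus every intermediate and final output, so a tenant-specific model can be assembled from configuration by changing only the `MODEL` list.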

Page 16: Bringing the Power of Big Data Computation to Salesforce

Flexibility & Extensibility using metadata

Leverage Salesforce to capture metadata

• Metadata Capture & Synchronization
  • Define all datasets as objects in Salesforce to capture metadata. Example: Sales, Inventory, Order
  • Load the actual data in HDFS
  • Synchronize metadata on change
• Master Data Sync
  • Synchronize the master data from SFDC to HDFS. Example: Accounts, Catalog
• HDFS Schema using metadata
  • Use HDFS file formats that support schema evolution (e.g. Parquet, Avro)
  • Use the dataset metadata to read/write HDFS files
• Configure Calculation
  • Define variability in calculation as configuration using Salesforce custom objects
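
The idea of synchronizing metadata on change and reading older data with a newer schema can be mimicked in plain Python. This sketch does not use Parquet or Avro; it only imitates the effect of schema evolution. Metadata is assumed to be a simple mapping of column name to type, and the `Region` column and both function names are hypothetical.

```python
def added_columns(old: dict, new: dict) -> list:
    """Columns present in the new metadata but not in the old, in new-metadata order."""
    return [name for name in new if name not in old]


def read_with_schema(rows: list, schema: dict) -> list:
    """Project stored rows onto the current schema; columns absent from older rows become None."""
    return [{name: row.get(name) for name in schema} for row in rows]


# Metadata before and after a hypothetical change made in Salesforce.
OLD_SALES = {"Sale ID": "ID", "Qty": "Integer", "Price": "Decimal"}
NEW_SALES = dict(OLD_SALES, Region="Text")  # "Region" is an invented extra column
```

Rows written before the change keep working: `read_with_schema` returns them with `Region` set to `None`, which is the behavior a schema-evolving file format is expected to provide for newly added columns.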

Page 17: Bringing the Power of Big Data Computation to Salesforce

Integration

Building a cohesive architecture

Page 18: Bringing the Power of Big Data Computation to Salesforce

Web Service as Middleware

Exposes big data computation as a service

• Exposes all the REST APIs needed by the application
• Stores application and object metadata
• Provides support for multi-tenancy, error handling and recovery
• Provides a secure API for
  • Metadata synchronization
  • Data loads
  • Batch calculation
  • Querying the aggregated results
  • Real-time calculation/prediction

[Diagram: Cluster Web Service in front of the Compute Cluster]

Page 19: Bringing the Power of Big Data Computation to Salesforce

Web Service as Middleware

Acts as a client for the cluster

• Abstracts out the complexity of big data technology
• Translates business-specific service calls to calculation jobs
• Uses metadata to build the calculation model
• Handles the connection to the cluster
• Manages multi-tenancy context to submit jobs to the cluster
• Interacts with various cluster components
  • HDFS
  • YARN
  • Spark

[Diagram: Cluster Web Service in front of the Compute Cluster]
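
To make "translates business-specific service calls to calculation jobs" and "manages multi-tenancy context" a little more concrete, here is a minimal sketch of the translation step only. Everything in it is an assumption for illustration: `TenantContext`, `JobRequest`, `build_segmentation_job`, the per-tenant queue and the per-tenant storage root are hypothetical, and actual submission to YARN or Spark is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    queue: str       # hypothetical per-tenant cluster queue
    data_root: str   # hypothetical per-tenant storage root


@dataclass
class JobRequest:
    job_type: str
    queue: str
    input_paths: tuple
    output_path: str
    params: dict = field(default_factory=dict)


def build_segmentation_job(tenant: TenantContext, segment_definition: dict) -> JobRequest:
    """Translate a business-level 'segment customers' call into a cluster job description."""
    return JobRequest(
        job_type="segmentation",
        queue=tenant.queue,
        # Dataset path from the Sales metadata, placed under the tenant's own root.
        input_paths=(f"{tenant.data_root}/tx/sales/Sales.parquet",),
        output_path=f"{tenant.data_root}/out/segments",
        params=dict(segment_definition),
    )
```

Keeping the tenant's queue and storage root in one context object means the caller never builds cluster paths itself, which is one plausible way for a middleware layer to keep tenants isolated.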

Page 20: Bringing the Power of Big Data Computation to Salesforce

Building a Data Driven Application

Getting the best of both worlds to realize business value

Page 21: Bringing the Power of Big Data Computation to Salesforce

Key Concepts

What is a data driven application

• Unified transactional and analytics application
• Provides real-time insights from data in the business context
• Calculates KPIs and processes data for the business
• Evaluates performance against goals based on data
• Combines intelligence with action
• Facilitates business process automation
• Learns from data to support fast and accurate decisions

Page 22: Bringing the Power of Big Data Computation to Salesforce

Data Driven Application Examples

Contextual Discovery

• Interactive dashboards and analysis in the transactional application business context.
• Example: Account performance dashboard in a CRM application

Business Process Automation

• Measuring KPIs and triggering workflow actions, alerts or notifications based on KPIs.
• Examples: Claim processing, Fraud detection

Data Processing

• Processing large amounts of data and running business calculations on it to generate results critical for business operation.
• Examples: Tax report generation, Stock portfolio valuation

Decision Intelligence

• Intelligent decisions and actions based on learning from data. Prediction, Optimization, Anomaly detection, AI, Recommendation.
• Examples: Google Now, Price Optimization

Page 23: Bringing the Power of Big Data Computation to Salesforce

Reference Architecture

Guideline for building a data driven application

[Diagram: Reference Architecture]

• Application: Account, Catalog, Opportunity, Sales, Segment
• Common Library: Metadata Manager, Data Manager, Job Manager, Config Manager
• Web App Middleware: Metadata Service, Data Service, Application Service, Cluster Client
• Big Data Cluster: Data Storage, Calculation Runtime

Page 24: Bringing the Power of Big Data Computation to Salesforce

Demo

Seeing is believing

Page 25: Bringing the Power of Big Data Computation to Salesforce

Demo Overview

Segmenting customers based on revenue

• User enters segment definition
• See Sales metadata in Salesforce
• Show Sales lines loaded in Hadoop
• Trigger segmentation from Salesforce
• Show dashboards with segmented customers in Salesforce
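
The calculation behind the demo, segmenting customers by revenue, can be sketched on a small in-memory sample. The transcript does not show the actual segment definition or job code, so the tier names, thresholds and function names below are invented for illustration; on a cluster the same logic would run over the Sales lines stored in Hadoop.

```python
from collections import defaultdict


def revenue_by_customer(sales_lines):
    """Sum qty * price per customer."""
    totals = defaultdict(float)
    for line in sales_lines:
        totals[line["customer"]] += line["qty"] * line["price"]
    return dict(totals)


def assign_segments(revenue, segment_definition):
    """Assign each customer the highest tier whose minimum revenue it reaches.

    segment_definition is a list of (segment name, minimum revenue) pairs in any order.
    Customers below every minimum get None.
    """
    tiers = sorted(segment_definition, key=lambda tier: tier[1], reverse=True)
    segments = {}
    for customer, total in revenue.items():
        segments[customer] = next(
            (name for name, minimum in tiers if total >= minimum), None
        )
    return segments
```

The segment definition plays the role of the user-entered configuration in the demo: changing the thresholds changes the result without changing the calculation.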

Page 26: Bringing the Power of Big Data Computation to Salesforce

Data Pipelines

BigObjects

Collaborating with Salesforce on the big data roadmap

Page 27: Bringing the Power of Big Data Computation to Salesforce

Data Pipelines and BigObjects (Pilot)

Data Pipelines

• Brings batch processing using Hadoop to the Salesforce Platform
• Apache Pig for data flow control and evaluation

BigObjects

• Storage of large amounts of data

Page 28: Bringing the Power of Big Data Computation to Salesforce

Collaborate with Salesforce on big data roadmap

To get big data computation in Salesforce

Features that can be leveraged

• BigObjects to store POS, Order and line items
• Apache Pig Script and Hadoop through the Data Pipeline API

Features that need to be incorporated

• Support Data Pipeline API through Apex (instead of the Metadata API)
• Support for low latency jobs e.g. Spark (as compared to batch processing)

Page 29: Bringing the Power of Big Data Computation to Salesforce

Reference Architecture

[Diagram: Reference Architecture with Salesforce big data features]

• Application: Account, Catalog, Opportunity, Sales, Segment
• Common Library: Metadata Manager, Data Manager, Job Manager, Config Manager
• Web App Middleware: Metadata Service, Data Service, Application Service, Cluster Client
• Big Data Cluster: Data Storage, Calculation Runtime
• Salesforce access paths: Data Pipeline, Bulk, SOQL, Apex
• Salesforce storage: SObjects, BigObjects, Files

Page 30: Bringing the Power of Big Data Computation to Salesforce

Summary

Let’s recap

Page 31: Bringing the Power of Big Data Computation to Salesforce

Summary

What we learnt

• How to leverage Salesforce to build flexible cloud applications
• How to use big data computation to realize valuable insights, actions and faster decisions from your data at scale
• How to fuse Salesforce and Big Data technologies together using metadata and integrations
• How to unlock your business potential using data driven applications
• How Salesforce and Big Data technologies can coexist well

Page 32: Bringing the Power of Big Data Computation to Salesforce

Thank you