Bringing the Power of Big Data Computation to Salesforce
TRANSCRIPT
Bringing the Power of Big Data
Computation to Salesforce
Arun Bhat
Chief Architect – Model N Inc.
@parunbhat
Krishna Shekhram
Software Architect – Model N Inc.
@kshekhram
Speaker Introduction: A little bit about us
• Model N is the leading provider of Revenue Management solutions for the life sciences and technology industries.
• The company helps customers maximize revenues, drive growth, and reduce compliance risk by transforming the revenue lifecycle from an inefficient, disjointed operation into a strategic end-to-end process.
Why do we care about big data
Model N – The Pioneer in Revenue Management
Founded in 1999
$120+B Revenue under management
2+M Sales lines processed daily
100+ Companies maximizing revenue with Model N
50,000+ Sales, Sales Ops, FAE, Finance, Marketing, Manufacturing reps and Distributor users
100+ Countries where Model N Revenue Management is used
1,000+ Distributors in 50 Countries
Arun Bhat
Chief Architect, Revvy Products
15 years at Model N
19 years in the software industry
Led architecture of Model N products
Responsible for the architecture of multi-tenant Revvy products on Salesforce
Passionate about technology but likes to read comics

Krishna Shekhram
Architect, Revvy Products
6 years at Model N
14 years in the software industry
Architected Model N Analytics products
Lead for Revvy Big Data Architecture
Enjoys exploring new technologies. Loves to watch documentaries to learn more about the world.
Overview: What we will be discussing in this talk

Agenda
• Leveraging Salesforce
• Computing using Big Data
• Metadata as a common fabric
• Integrating into a Cohesive Architecture
• Building a Data-Driven Application
• Demo
• Data Pipeline and BigObjects
• Summary
Leveraging Salesforce: To build flexible cloud applications
Leveraging Salesforce Power

Cloud Computing
• Availability
• Deployment
• Elasticity
• Customization
• Security
• Upgradeability
• Integration
• Device Independence

Enabling Technology
• Multi Tenancy
• Metadata

Force.com Stack
• User Interface
• Logic
• Integration
• Database
• Infrastructure
• Developer Tools
Computing using Big Data: Realize valuable insights, actions and faster decisions from your data at scale
Data Explosion
• Source: logs, social media, mobile, IoT, POS
• Format: structured, text, picture, video, binary, document
• Speed: real-time streams, transactions, batch upload

Technology Evolution
• Rapid ingestion
• Bigger storage
• Faster processing
• Quicker retrieval
• Better visualization

Business Opportunities
• Hidden insights discovery
• Fact-based decision making
• Business process automation
• Ecosystem engagement
• Growth & monetization of data
Why “Big Data” is a Big Deal: Competitive advantage for today, survival for tomorrow
Big data technology is going through an innovation spurt
Big Data Technology Landscape
Hadoop

Components
• HDFS, Map/Reduce, YARN
• Provides a fault-tolerant and scalable cluster

HDFS as storage
• Supports a variety of data formats
• Metadata-driven schema evolution

YARN as cluster manager
• Supports security, resource isolation, and multi-tenancy
• Highly available and elastic scaling

Spark

Components
• Spark Core, SQL, MLlib, Streaming, GraphX
• Can run in a variety of clusters (YARN, Mesos, Standalone)

Data Access
• Data access from HDFS, S3, Cassandra, HBase, JDBC, and streaming sources like Kafka
• Supports multiple formats like Parquet, JSON, CSV, etc.

Compute
• General-purpose, low-latency compute engine
• Batch, interactive, query, predictive, graph, and stream processing

Hadoop and Spark Advantage: Data-driven, flexible, multi-tenant applications at scale
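To make the compute pattern concrete, here is a minimal pure-Python sketch of the kind of map/reduce aggregation Spark would distribute across a cluster. The sales lines and column names (Qty, Price, Customer) mirror the Sales metadata example later in this talk; the data itself is made up, and the PySpark comment at the end is only a rough equivalent, not production code.

```python
# Pure-Python sketch of the aggregation Spark would run at scale.
# Column names follow the Sales metadata example; the rows are made up.
sales_lines = [
    {"Customer": "Acme",   "Qty": 10, "Price": 2.5},
    {"Customer": "Acme",   "Qty": 4,  "Price": 2.5},
    {"Customer": "Globex", "Qty": 7,  "Price": 3.0},
]

def revenue_by_customer(lines):
    """Map each line to (customer, amount), then reduce by key."""
    totals = {}
    for line in lines:
        amount = line["Qty"] * line["Price"]
        totals[line["Customer"]] = totals.get(line["Customer"], 0.0) + amount
    return totals

print(revenue_by_customer(sales_lines))

# With PySpark the same logic would be roughly:
#   spark.read.parquet("hdfs:///tx/sales/Sales.parquet")
#        .withColumn("Amount", col("Qty") * col("Price"))
#        .groupBy("Customer").agg(sum("Amount"))
```

At scale the grouping happens across partitions on many nodes, but the shape of the computation is the same.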
Metadata: The common fabric
Metadata Example: Metadata describes data

Sales Metadata
• URL: /tx/sales/Sales.parquet
• Columns:
  • Sale ID: ID
  • Customer: Relationship (Customer)
  • Product: Relationship (Product)
  • Invoice Date: Date
  • Qty: Integer
  • Price: Decimal

Sales Data (diagram): the Sales dataset (Sale ID, Customer, Product, Invoice Date, Qty, Price) references the Customer master (ID, Name, Type) and the Product master (Product ID, Product #, BU).
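The metadata on the slide can be held as a plain structure and translated into a storage-level schema. The sketch below is illustrative: the field names and types follow the slide, but the mapping from metadata types to Parquet/Avro-style primitives is an assumption, not the actual Model N implementation.

```python
# Sketch: the Sales metadata from the slide, held as a plain structure.
SALES_METADATA = {
    "url": "/tx/sales/Sales.parquet",
    "columns": [
        {"name": "Sale ID",      "type": "ID"},
        {"name": "Customer",     "type": "Relationship", "target": "Customer"},
        {"name": "Product",      "type": "Relationship", "target": "Product"},
        {"name": "Invoice Date", "type": "Date"},
        {"name": "Qty",          "type": "Integer"},
        {"name": "Price",        "type": "Decimal"},
    ],
}

# Metadata type -> storage primitive (an assumed, illustrative mapping).
TYPE_MAP = {
    "ID": "string", "Relationship": "string",
    "Date": "date", "Integer": "long", "Decimal": "double",
}

def to_storage_schema(metadata):
    """Translate dataset metadata into storage-level (name, type) pairs."""
    return [(c["name"], TYPE_MAP[c["type"]]) for c in metadata["columns"]]

print(to_storage_schema(SALES_METADATA))
```

Because the schema is derived from metadata rather than hard-coded, adding a column to the metadata automatically flows through to the storage side, which is what enables schema evolution.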
Calculation Unit and Calculation Model: Flexibility & Extensibility, key for multi-tenant cloud applications

Calculation Unit (diagram): a Calc Op takes an Input Dataset and produces an Output Dataset, with metadata defined for both.

Calculation Model (diagram): chains multiple calculation units, with the output datasets of one stage feeding the input datasets of the next; metadata and configuration describe the whole model.
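The unit/model structure above can be sketched in a few lines. This is a simplified illustration of the chaining idea, not the Model N API; the class and operation names are made up.

```python
# Sketch of the Calculation Unit / Calculation Model idea: each unit is a
# Calc Op from an input dataset to an output dataset, and a model chains
# units so outputs feed later inputs.
class CalcUnit:
    def __init__(self, name, op):
        self.name = name
        self.op = op  # function: dataset -> dataset

    def run(self, dataset):
        return self.op(dataset)

class CalculationModel:
    """Runs calculation units in sequence, each output feeding the next input."""
    def __init__(self, units):
        self.units = units

    def run(self, dataset):
        for unit in self.units:
            dataset = unit.run(dataset)
        return dataset

# Example chain: derive line amounts, then keep only large sales lines.
amounts = CalcUnit("amount", lambda rows: [
    {**r, "Amount": r["Qty"] * r["Price"]} for r in rows])
large = CalcUnit("filter", lambda rows: [
    r for r in rows if r["Amount"] >= 20])

model = CalculationModel([amounts, large])
result = model.run([{"Qty": 10, "Price": 2.5}, {"Qty": 2, "Price": 3.0}])
print(result)
```

Because each unit only sees datasets described by metadata, tenants can vary the chain (add, remove, or reconfigure units) without code changes — which is the flexibility the slide is after.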
• Metadata Capture & Synchronization
  • Define all datasets as objects in Salesforce to capture metadata. Examples: Sales, Inventory, Order
  • Load the actual data in HDFS
  • Synchronize metadata on change
• Master Data Sync
  • Synchronize master data from SFDC to HDFS. Examples: Accounts, Catalog
• HDFS Schema using Metadata
  • Use HDFS file formats that support schema evolution (e.g., Parquet, Avro)
  • Use the dataset metadata to read/write HDFS files
• Configure Calculation
  • Define variability in the calculation as configuration using Salesforce custom objects

Leverage Salesforce to capture metadata
Flexibility & Extensibility using metadata
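The "synchronize metadata on change" step boils down to a diff between what Salesforce describes and what the HDFS side last saw. The sketch below assumes a simplified, hypothetical shape for both field lists; a real sync would work from the Salesforce describe API payload.

```python
# Sketch of metadata synchronization: compare the field list captured from
# a Salesforce object with the schema last written to HDFS, and report
# what a sync would need to add. Payload shapes are simplified assumptions.
def diff_schema(sfdc_fields, hdfs_fields):
    """Return fields present in Salesforce but missing on the HDFS side."""
    known = {f["name"] for f in hdfs_fields}
    return [f for f in sfdc_fields if f["name"] not in known]

sfdc = [
    {"name": "Qty",    "type": "Integer"},
    {"name": "Price",  "type": "Decimal"},
    {"name": "Region", "type": "Text"},   # newly added custom field
]
hdfs = [
    {"name": "Qty",   "type": "long"},
    {"name": "Price", "type": "double"},
]

missing = diff_schema(sfdc, hdfs)
print(missing)  # the sync job would evolve the Parquet/Avro schema with these
```

Formats like Parquet and Avro tolerate added columns, so applying this diff is a metadata-only operation — no rewrite of existing data files is needed.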
Integration: Building a cohesive architecture
• Exposes all the REST APIs needed for the application
• Stores application and object metadata
• Provides support for multi-tenancy, error handling and recovery
• Provides secure APIs for:
  • Metadata synchronization
  • Data loads
  • Batch calculation
  • Querying the aggregated results
  • Real-time calculation/prediction

Exposes big data computation as a service

Web Service as Middleware
• Abstracts away the complexity of big data technology
  • Translates business-specific service calls into calculation jobs
  • Uses metadata to build the calculation model
• Handles connections to the cluster
  • Manages the multi-tenancy context to submit jobs to the cluster
• Interacts with various cluster components
  • HDFS
  • YARN
  • Spark

Acts as a client for the cluster

Web Service as Middleware
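The translation step the middleware performs — business request in, tenant-aware cluster job out — can be sketched as below. All field names, paths, and queue conventions here are illustrative assumptions, not the actual Model N web service contract.

```python
# Sketch of the middleware's job: attach the multi-tenancy context and
# translate a business-specific REST request into a cluster job spec.
def build_job_spec(tenant_id, request):
    """Translate a REST request body into a cluster job submission."""
    return {
        "tenant": tenant_id,
        "job_type": request["operation"],           # e.g. "segmentation"
        # Per-tenant data isolation: each tenant gets its own HDFS prefix.
        "input_path": f"/tenants/{tenant_id}/tx/{request['dataset']}",
        "queue": f"tenant-{tenant_id}",             # YARN queue for isolation
        "params": request.get("params", {}),
    }

spec = build_job_spec("acme", {"operation": "segmentation",
                               "dataset": "sales",
                               "params": {"measure": "revenue"}})
print(spec)
```

Keeping the tenant prefix and YARN queue out of the caller's hands is what lets the same API serve many tenants while the cluster enforces isolation.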
Building a Data-Driven Application: Getting the best of both worlds to realize business value
• Unified transactional and analytics application
• Provides real-time insights from data in a business context
• Calculates KPIs and processes data for the business
• Evaluates performance against goals based on data
• Combines intelligence with action
• Facilitates business process automation
• Learns from data to support fast and accurate decisions

Key Concepts: What is a data-driven application?
Data Driven Application Examples

Contextual Discovery
Interactive dashboards and analysis in the transactional application's business context.
Example: Account performance dashboard in a CRM application

Business Process Automation
Measuring KPIs and triggering workflow actions, alerts or notifications based on KPIs.
Examples: Claim processing, fraud detection

Data Processing
Processing large amounts of data and running business calculations on it to generate results critical for business operations.
Examples: Tax report generation, stock portfolio valuation

Decision Intelligence
Intelligent decisions and actions based on learning from data: prediction, optimization, anomaly detection, AI, recommendation.
Examples: Google Now, price optimization
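The business-process-automation pattern — measure KPIs, trigger workflow actions when they cross thresholds — is simple to sketch. The KPI names and thresholds below are made-up examples.

```python
# Sketch of the "measure KPIs and trigger workflow actions" pattern: check
# each KPI against its threshold and emit the alerts a workflow engine
# would pick up downstream.
def evaluate_kpis(kpis, thresholds):
    """Return an alert for every KPI that crosses its threshold."""
    alerts = []
    for name, value in kpis.items():
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            alerts.append({"kpi": name, "value": value, "limit": limit,
                           "action": "notify"})
    return alerts

alerts = evaluate_kpis(
    kpis={"claim_backlog": 120, "fraud_score": 0.2},
    thresholds={"claim_backlog": 100, "fraud_score": 0.8},
)
print(alerts)
```

In the architecture described here, the KPI values would come from the cluster-side calculation, while the resulting alerts would drive Salesforce workflow rules or notifications.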
Guidelines for building a data-driven application
Reference Architecture
Reference Architecture (diagram):
• Application (Salesforce objects): Account, Catalog, Opportunity, Sales, Segment
• Web App Middleware: Metadata Service, Data Service, Application Service, and a Cluster Client, built on a Common Library (Metadata Manager, Data Manager, Job Manager, Config Manager)
• Big Data Cluster: Data Storage and Calculation Runtime
Demo: Seeing is believing
Demo Overview: Segmenting customers based on revenue
• User enters segment definition
• See Sales metadata in Salesforce
• Show Sales lines loaded in Hadoop
• Trigger segmentation from Salesforce
• Show dashboards with segmented customers in Salesforce
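The core of the demo's segmentation can be sketched as bucketing customers by total revenue against a user-entered segment definition. The segment names and revenue boundaries below are illustrative, not the values used in the live demo.

```python
# Sketch of the demo's segmentation logic: bucket customers by total
# revenue according to a segment definition (name, minimum revenue),
# checked highest floor first.
SEGMENTS = [
    ("Platinum", 1000.0),
    ("Gold", 100.0),
    ("Bronze", 0.0),
]

def segment(revenue_by_customer):
    """Assign each customer the first segment whose floor it meets."""
    result = {}
    for customer, revenue in revenue_by_customer.items():
        for name, floor in SEGMENTS:
            if revenue >= floor:
                result[customer] = name
                break
    return result

print(segment({"Acme": 1500.0, "Globex": 250.0, "Initech": 40.0}))
```

In the full pipeline, the revenue totals would be computed by a Spark job over the Sales lines in HDFS, and the resulting segment assignments written back to Salesforce for the dashboards.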
Data Pipelines and BigObjects: Collaborating with Salesforce on the big data roadmap
Data Pipelines and BigObjects (Pilot)

Data Pipelines
• Brings batch processing using Hadoop to the Salesforce Platform
• Apache Pig for data flow control and evaluation

BigObjects
• Storage of large amounts of data

Features that can be leveraged:
• BigObjects to store POS, Order and line items
• Apache Pig scripts and Hadoop through the Data Pipeline API

Features that need to be incorporated:
• Support for the Data Pipeline API through Apex (instead of the Metadata API)
• Support for low-latency jobs, e.g. Spark (as compared to batch processing)

To get big data computation in Salesforce, collaborate with Salesforce on the big data roadmap
Reference Architecture with Data Pipeline (diagram): the same reference architecture — Application objects (Account, Catalog, Opportunity, Sales, Segment), Web App Middleware (Metadata Service, Data Service, Application Service, Cluster Client, Common Library with Metadata/Data/Job/Config Managers), and the Big Data Cluster — with the cluster's Data Storage and Calculation Runtime able to read and write SObjects, BigObjects and Files through the Data Pipeline, Bulk, SOQL and Apex APIs.
Summary: Let’s recap
• How to leverage Salesforce to build flexible cloud applications
• How to use big data computation to realize valuable insights, actions and faster decisions from your data at scale
• How to fuse Salesforce and Big Data technologies together using metadata and integration
• How to unlock your business potential using data-driven applications
• How Salesforce and Big Data technologies can coexist well
What we learnt
Summary
Thank you