
Phoenix: A High Performance Open Source SQL Layer over HBase. James Taylor, Salesforce.com, Principal Member of the Technical Staff, @JamesPlusPlus

Upload: salesforce-developers

Posted on 14-Jun-2015


DESCRIPTION

Have a lot of data? Using or considering using Apache HBase (part of the Hadoop family) to store your data? Want to have your cake and eat it too? Phoenix is an open source project put out by Salesforce. Join us to learn how you can continue to use SQL, but get the raw speed of native HBase usage through Phoenix.

TRANSCRIPT

Page 1: Phoenix - A High Performance Open Source SQL Layer over HBase

Phoenix: A High Performance Open Source SQL Layer over HBase

James Taylor, Salesforce.com, Principal Member of the Technical Staff, @JamesPlusPlus

Page 2

Safe harbor

Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

Page 3

Adam Torman, Director, Product Management, Salesforce.com, @atorman

Page 4

Agenda
▪ What is Phoenix?
▪ What is HBase?
▪ Why use HBase?
▪ Why use Phoenix?
▪ Use Cases
▪ Under the hood
▪ Roadmap
▪ Q&A

Page 5

What is Phoenix?
▪ BSD-licensed open source project
   • https://github.com/forcedotcom/phoenix
▪ High performance SQL engine
   • Over HBase data
   • Not based on MapReduce
   • Targets low latency applications
▪ Turns your key/value store into a database
▪ Powers the big data use cases at Salesforce.com

Page 6

Phoenix Demo
▪ Phoenix Stock Analyzer
▪ Fortune 500 companies
▪ 10 years of historical stock prices
   • Over 2 billion rows
▪ Demonstrates support for low latency applications

Page 7

What is HBase?
▪ Part of Apache Hadoop ecosystem
▪ Runs on top of HDFS
▪ Key/value store
   • Sparse
   • Consistent
   • Distributed
   • Multidimensional
   • Sorted
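Three of the properties above (sparse, sorted, multidimensional) describe HBase's logical data model: a map from row key to column (family:qualifier) to timestamp to value. The following is a minimal in-memory sketch of that logical model only. It does not use the HBase client API, and it does not model distribution or consistency. The names `put`, `get_latest`, `scan`, and the `cf:...` columns are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model of HBase's logical data model (not the HBase client API):
# row key -> "family:qualifier" -> timestamp -> value
Table = Dict[bytes, Dict[str, Dict[int, bytes]]]


def put(table: Table, row: bytes, column: str, ts: int, value: bytes) -> None:
    """Store one version of one cell; cells never written take no space (sparse)."""
    table.setdefault(row, {}).setdefault(column, {})[ts] = value


def get_latest(table: Table, row: bytes, column: str) -> Optional[bytes]:
    """Return the newest version of a cell, or None if the cell does not exist."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def scan(table: Table, start: bytes, stop: bytes) -> List[Tuple[bytes, Dict[str, bytes]]]:
    """Return rows with start <= row key < stop, in byte order, newest version per column."""
    rows = []
    for row in sorted(table):  # row keys are ordered as raw bytes
        if start <= row < stop:
            rows.append((row, {col: vers[max(vers)] for col, vers in table[row].items()}))
    return rows


if __name__ == "__main__":
    t: Table = {}
    put(t, b"row2", "cf:a", 1, b"v1")
    put(t, b"row2", "cf:a", 2, b"v2")  # newer version of the same cell
    put(t, b"row1", "cf:b", 1, b"x")   # row1 has no cf:a cell at all (sparse)
    print(get_latest(t, b"row2", "cf:a"))  # b'v2'
    print(scan(t, b"row1", b"row3"))  # [(b'row1', {'cf:b': b'x'}), (b'row2', {'cf:a': b'v2'})]
```

Because rows are kept in byte order, a range scan only has to visit the keys between its start and stop rows. The rest of this deck relies on that property.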

Page 8

HBase distribution of data
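HBase distributes a table by cutting its sorted row-key space into contiguous ranges called regions. Each region is served by one region server. The sketch below shows how a row key maps to its region using a binary search over region start keys. The split points and server names are made up. A real client obtains this mapping from HBase's meta catalog table rather than from a hard-coded list.

```python
from bisect import bisect_right
from typing import List

# Hypothetical layout: region i covers [START_KEYS[i], START_KEYS[i + 1]).
# The first region starts at the empty key, so every row key falls in some region.
START_KEYS: List[bytes] = [b"", b"G", b"N", b"T"]
SERVERS: List[str] = ["rs1", "rs2", "rs3", "rs1"]  # made-up region server names


def region_index(row_key: bytes, start_keys: List[bytes] = START_KEYS) -> int:
    """Index of the region whose key range contains row_key."""
    # bisect_right returns the position of the first start key greater than row_key;
    # the region just before that position is the one holding row_key.
    return bisect_right(start_keys, row_key) - 1


def server_for(row_key: bytes) -> str:
    """Region server hosting row_key under the hypothetical layout above."""
    return SERVERS[region_index(row_key)]


if __name__ == "__main__":
    for key in [b"AAPL", b"GOOG", b"MSFT", b"ORCL", b"XOM"]:
        print(key, region_index(key), server_for(key))
```

Neighboring keys land in the same region, so a range scan touches only the regions that overlap it. The same property means a monotonically increasing leading key sends every new write to the last region.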

Page 9

Page 10

Phoenix

Page 11

Phoenix


Page 12

Why use HBase?
▪ If you have lots of data
   • Scales linearly
   • Shards automatically
▪ If you can live without transactions
   • But there’s work being done here…
▪ If your data changes
▪ If you need strict consistency
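"Shards automatically" refers to region splitting. When a region grows past a configured size, HBase splits it into two daughter regions at a middle row key, and the halves can then be served by different region servers. The sketch below models only the split decision. It assumes a hypothetical row-count threshold in place of HBase's size-based policy. The `Region` class and `insert_row` function are illustrative, not HBase APIs.

```python
from bisect import bisect_right, insort
from typing import List

MAX_ROWS_PER_REGION = 4  # hypothetical; real HBase splits on data size, not row count


class Region:
    """Toy region: a start key plus the sorted row keys it currently holds."""

    def __init__(self, start_key: bytes, rows: List[bytes]) -> None:
        self.start_key = start_key
        self.rows = rows


def insert_row(regions: List[Region], row_key: bytes) -> None:
    """Add row_key to the region that owns it, splitting that region if it gets too big."""
    starts = [r.start_key for r in regions]
    i = bisect_right(starts, row_key) - 1  # region whose range contains row_key
    region = regions[i]
    if row_key in region.rows:  # writing to an existing row adds no new row
        return
    insort(region.rows, row_key)
    if len(region.rows) > MAX_ROWS_PER_REGION:
        mid = len(region.rows) // 2
        # The upper half becomes a new region that starts at its first row key.
        daughter = Region(region.rows[mid], region.rows[mid:])
        region.rows = region.rows[:mid]
        regions.insert(i + 1, daughter)


if __name__ == "__main__":
    table = [Region(b"", [])]
    for key in [b"a", b"b", b"c", b"d", b"e", b"f", b"g"]:
        insert_row(table, key)
    print([(r.start_key, r.rows) for r in table])
```

No one has to choose shard boundaries up front. They follow the data as it grows, which is what lets capacity scale by adding region servers.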

Page 13

Why use Phoenix?
▪ Gives folks an API they already know
▪ Reduces the amount of code users need to write
▪ Performs optimizations transparent to the user
   • Aggregation
   • Skip scanning
   • Secondary indexing
   • Query optimization
▪ Leverages existing tooling
   • SQL client
   • OLAP engine
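Skip scanning is easiest to see on a composite key. Take a predicate such as `org_id IN ('b', 'd') AND day BETWEEN 3 AND 5`. A plain range scan would read every row from the first `b` key through the last `d` key. A skip scan instead seeks directly to the first candidate key for each `org_id` and stops once that sub-range is exhausted. The sketch below shows the idea on an in-memory sorted list using binary search. The `(org_id, day)` key shape and the `skip_scan` function are hypothetical. As we understand it, Phoenix performs this inside HBase with a filter that returns seek hints, which this sketch does not reproduce.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Key = Tuple[str, int]  # hypothetical composite row key: (org_id, day)


def skip_scan(keys: Sequence[Key], org_ids: Sequence[str], day_lo: int, day_hi: int) -> List[Key]:
    """Rows with org_id in org_ids and day_lo <= day <= day_hi, from a sorted key list.

    Rather than reading every key between the smallest and largest org_id,
    jump (binary search) to the first candidate of each org_id and stop at the
    end of its day range."""
    hits: List[Key] = []
    for org in sorted(set(org_ids)):
        i = bisect_left(keys, (org, day_lo))  # seek to first key >= (org, day_lo)
        while i < len(keys) and keys[i] <= (org, day_hi):
            hits.append(keys[i])
            i += 1
    return hits


if __name__ == "__main__":
    all_keys = sorted((org, day) for org in "abcd" for day in range(10))
    print(skip_scan(all_keys, ["d", "b"], 3, 5))
```

The work is one seek per `org_id` plus the matching rows, rather than every row between the first and last candidate.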

Page 14

Phoenix versus Hive performance

Page 15

Phoenix versus Impala performance

Page 16

Use cases
▪ Data Archival
   • Archive big data off Oracle and into HBase while keeping the data queryable
▪ Platform Monitoring
   • Enable customers to track performance metrics of their platform applications

Page 17

Why Phoenix is important

• Scalable, low latency app development starts with Phoenix
• Phoenix worries about physical scale and fast performance so app developers don’t have to
• Looks, tastes, and feels like SOQL to a Force.com developer
• Under the covers, turns HBase into a database that we understand
• Several key customer use cases:
   • Data Archive
   • Monitoring, Audit, and Compliance

Page 18

Archive Problem Set

• Field History Tracking grows unbounded

• Enterprise customers require long term storage of ‘cold’ data

• Data retention policies can require years of data to be kept around

Page 19

Archive Pilot Demonstration

Page 20

field history retention

Field history is the basis for data audit trail

Policy-driven data retention: 5, 7, 10… years

Increased limits to track history on many fields

data lifecycle management

Time-policy-driven data lifecycle, from live to archive state

Configurable behavior across custom schema, accessibility & archive data model

Maintain and assure operational efficiency

Retain access and visibility across data lifecycle

Winter ’14 & Spring ’14 - Pilots

Spring ’14 & Summer ’14 - Pilots

Archive Roadmap

Page 21

Monitoring Use Case

• Security, Compliance, and Audit
• Product Support and Limits Analysis
• Product Usage and Management

For EU data compliance, I need to know who, when, and from where someone outside of Europe accessed data

Before I can take an ex-employee to court for downloading the client list, I need to know when, where, and how they did it

I want to take action when I detect an intrusion, identity fraud, or data leakage

I need to analyze limits consumption to ensure mission-critical apps don’t run out of resources

What is the status of my batch import or sandbox copy?

Before I invest in more licenses, I want to know how many people are actually using it

Page 22

Identity Fraud

Page 23

Custom Event Schema

Page 24

Queries made possible by Phoenix

Query the number of logins over the span of a week

SELECT Count(LoginTime), UserId
FROM LoginEvent
WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
GROUP BY UserId

List all [custom] correlation ids for all users over the past week

SELECT Username, UserId, cCorrelationId
FROM LoginEvent
WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z

Query login counts per user and browser: a low count may indicate an anomaly, or just a change of browser

SELECT Count(Id), Browser, UserId, Username
FROM LoginEvent
WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
GROUP BY UserId, Username, Browser

Query login counts per user and status: a high count of failed or invalid-password attempts may indicate a brute-force attack

SELECT Count(Id), Status, UserId, Username
FROM LoginEvent
WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
GROUP BY UserId, Username, Status
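The sketch below makes the logic of the brute-force query concrete. It computes the same grouping on the client with Python's `collections.Counter`, then applies a threshold to flag possible brute-force attempts. It illustrates what the query returns, not how Phoenix executes it. The `LoginEvent` fields, the status string `"Invalid Password"`, and the threshold of 5 are all hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class LoginEvent(NamedTuple):
    """Hypothetical subset of LoginEvent fields."""
    user_id: str
    username: str
    status: str
    login_time: datetime


def count_by_user_and_status(events: Iterable[LoginEvent], start: datetime, end: datetime) -> Counter:
    """Count events per (UserId, Username, Status) with start < LoginTime <= end,
    i.e. the grouping the brute-force query asks for."""
    return Counter(
        (e.user_id, e.username, e.status)
        for e in events
        if start < e.login_time <= end  # same (start, end] window as the WHERE clause
    )


def flag_brute_force(counts: Counter, failure_status: str = "Invalid Password",
                     threshold: int = 5) -> List[Tuple[str, str]]:
    """(UserId, Username) pairs whose count for failure_status reaches threshold."""
    return sorted(
        (user_id, username)
        for (user_id, username, status), n in counts.items()
        if status == failure_status and n >= threshold
    )


if __name__ == "__main__":
    start, end = datetime(2013, 3, 4, 17, 38, 39), datetime(2013, 6, 4, 17, 38, 39)
    events = [LoginEvent("u1", "alice", "Invalid Password", datetime(2013, 3, 5, h)) for h in range(6)]
    events.append(LoginEvent("u2", "bob", "Success", datetime(2013, 3, 5, 9)))
    counts = count_by_user_and_status(events, start, end)
    print(flag_brute_force(counts))  # [('u1', 'alice')]
```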

Page 25

Custom Events

Collect, query, and report on canned and custom events defined by our customers

Define custom time series based metrics to discover anomalies and summarize user interactions

API first, but not only: declarative reporting user interface

Custom client side event publishing

Summer ’14 & Winter ’15 - Pilots

Platform Monitoring Roadmap

Page 26

Phoenix under the hood

Product Metrics HTable

Row Key: ORG_ID | DATE | FEATURE
Key Values: TXNS | IO_TIME | RESPONSE_TIME
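Phoenix stores a table's primary-key columns, concatenated, as the HBase row key and the remaining columns as key values. This sketch assumes the layout on this slide is ORG_ID, DATE, FEATURE in the row key and TXNS, IO_TIME, RESPONSE_TIME as key values. It shows one way such a key could be encoded so that byte order matches column order, and how the scan's start and stop keys fall out of the `org_id` and `date` predicates. The byte layout is an assumption for illustration: a fixed 15-character `ORG_ID`, 8-byte big-endian epoch milliseconds with the sign bit flipped for `DATE`, and UTF-8 `FEATURE` at the end. Phoenix's actual encoding may differ in detail. The function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Tuple

ORG_ID_LEN = 15  # assumption: ORG_ID is a fixed-width, 15-character id
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(d: datetime) -> bytes:
    """8-byte big-endian epoch millis with the sign bit flipped, so unsigned byte
    comparison matches time order (including dates before 1970).
    d must be timezone-aware."""
    millis = (d - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(">Q", millis + (1 << 63))


def row_key(org_id: str, date: datetime, feature: str) -> bytes:
    """Concatenate the primary-key columns in order: ORG_ID, DATE, FEATURE."""
    org = org_id.encode("ascii")
    if len(org) != ORG_ID_LEN:
        raise ValueError(f"org_id must be {ORG_ID_LEN} characters")
    return org + encode_date(date) + feature.encode("utf-8")


def scan_range(org_id: str, start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Start/stop row for WHERE org_id = :1 AND date >= :2 AND date <= :3.

    HBase stop rows are exclusive, so the stop key is the prefix for the
    millisecond just after :3; every FEATURE on the last date is still included."""
    prefix = org_id.encode("ascii")
    return prefix + encode_date(start), prefix + encode_date(end + timedelta(milliseconds=1))
```

Putting the equality-filtered column (ORG_ID) first and the range-filtered column (DATE) second is what lets the whole WHERE clause on those two columns become a single contiguous key range.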

● Scan
   ➢ Start key: ORG_ID (:1) + DATE (:2)
   ➢ End key: ORG_ID (:1) + DATE (:3)

● Filter
   ➢ Filter: IO_TIME > 100

● Aggregation
   ➢ Intercepts scan on region server
   ➢ Builds map of distinct FEATURE values
   ➢ Returns one row per distinct group
   ➢ Client does final merge

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
AND date >= :2 AND date <= :3
AND io_time > 100
GROUP BY feature
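The aggregation steps on this slide amount to a two-phase GROUP BY. Each region server sums TXNS per FEATURE for the rows it holds, and the client adds the per-region partial sums together. The sketch below mirrors that split in plain Python. `MetricRow`, `region_partial_sum`, and `client_merge` are hypothetical names, and days are plain integers. The key-range check stands in for the scan's start and stop keys. In Phoenix, the server-side phase runs inside HBase as a coprocessor.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class MetricRow(NamedTuple):
    """Hypothetical decoded PRODUCT_METRICS row (day as an integer for simplicity)."""
    org_id: str
    date: int
    feature: str
    txns: int
    io_time: int


def region_partial_sum(rows: Iterable[MetricRow], org_id: str, start: int, end: int) -> Dict[str, int]:
    """Server-side phase for one region: FEATURE -> SUM(TXNS) over matching rows."""
    partial: Dict[str, int] = defaultdict(int)
    for r in rows:
        in_key_range = r.org_id == org_id and start <= r.date <= end  # stands in for start/stop keys
        if in_key_range and r.io_time > 100:  # the IO_TIME filter
            partial[r.feature] += r.txns
    return dict(partial)


def client_merge(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Client-side phase: add up each group's partial sums from all regions."""
    total: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for feature, txns in partial.items():
            total[feature] += txns
    return dict(total)


if __name__ == "__main__":
    region_a = [MetricRow("o1", 1, "A", 10, 150), MetricRow("o1", 2, "B", 5, 50)]
    region_b = [MetricRow("o1", 3, "A", 7, 101), MetricRow("o1", 3, "B", 4, 300)]
    partials = [region_partial_sum(rows, "o1", 1, 5) for rows in (region_a, region_b)]
    print(client_merge(partials))  # {'A': 17, 'B': 4}
```

This split works because SUM is decomposable: partial sums can be added in any order. An aggregate such as AVG would need each region to return (sum, count) pairs instead.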

Page 27

Phoenix Roadmap
▪ Apache Incubator project
▪ Joins
▪ Multi-tenant tables
▪ Cost-based query optimizer
▪ OLAP extensions
   • WINDOW, PARTITION OVER, RANK
▪ Monitoring and management
▪ Transactions
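OLAP extensions are a roadmap item here, so the following only illustrates the standard SQL semantics such functions would bring. It does not reflect any Phoenix implementation. `RANK() OVER (PARTITION BY feature ORDER BY txns DESC)` ranks rows within each FEATURE group, giving tied rows the same rank and leaving a gap after a tie. The `rank_over_partition` function and the dictionary-shaped rows are hypothetical.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def rank_over_partition(rows: List[Row], partition_by: str, order_by: str) -> List[Tuple[Row, int]]:
    """Emulate RANK() OVER (PARTITION BY partition_by ORDER BY order_by DESC).

    Ties share a rank and the next rank skips ahead (1, 2, 2, 4), as in standard SQL.
    Assumes order_by holds numbers (negated to sort descending)."""
    ordered = sorted(rows, key=lambda r: (r[partition_by], -r[order_by]))
    ranked: List[Tuple[Row, int]] = []
    for _, group in groupby(ordered, key=lambda r: r[partition_by]):
        previous = None
        rank = 0
        for position, row in enumerate(group, start=1):
            if row[order_by] != previous:  # a new value takes its position as its rank
                rank = position
                previous = row[order_by]
            ranked.append((row, rank))
    return ranked
```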

Page 28

James Taylor

Principal Member of the Technical Staff, Salesforce.com

@JamesPlusPlus

Adam Torman

Director, Product Management, Salesforce.com

@atorman

Page 29