Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. Cosmin Lehene | Adobe #bigdataro - 30 January 2013 Real-time “OLAP” for Big Data (+ use cases)

Upload: cosmin-lehene

Post on 09-May-2015

TRANSCRIPT

Page 1: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

Cosmin Lehene | Adobe

#bigdataro - 30 January 2013

Real-time “OLAP” for Big Data (+ use cases)

Page 2: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

What we needed … and built

OLAP Semantics
Low Latency Ingestion
High Throughput
Real-time Query API

Page 3: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

“Physical” Building Blocks

Page 4: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Logical Building Blocks

Dimensions, Metrics
Aggregations
Roll-up, drill-down, slicing and dicing, sorting

Page 5: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

OLAP 101 – Queries example

Date        Country  City     OS       Browser  Sale
2012-05-21  USA      NY       Windows  FF        0.0
2012-05-21  USA      NY       Windows  FF       10.0
2012-05-22  USA      SF       OSX      Chrome   25.0
2012-05-22  Canada   Ontario  Linux    Chrome    0.0
2012-05-23  USA      Chicago  OSX      Safari   15.0

5 visits, 3 days
2 countries: USA: 4, Canada: 1
4 cities: NY: 2, SF: 1
3 OS-es: Win: 2, OSX: 2
3 browsers: FF: 2, Chrome: 2
50.0 in sales, 3 sales

Page 6: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

OLAP 101 – Queries example

Rolling up to country level:

SELECT COUNT(visits), SUM(sales)
GROUP BY country

Country  visits  sales
USA      4       $50
Canada   1       0

“Slice” by browser:

SELECT COUNT(visits), SUM(sales)
GROUP BY country
HAVING browser = “FF”

Country  visits  sales
USA      2       $10
Canada   0       0

Top browsers by sales:

SELECT SUM(sales), COUNT(visits)
GROUP BY browser
ORDER BY sales

Browser  sales  visits
Chrome   $25    2
Safari   $15    1
FF       $10    2
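
As a minimal sketch (not SaasBase code), the three example queries can be reproduced in plain Python over the five sample visits. The `aggregate` helper and its `where` parameter are hypothetical names introduced here. Groups whose visits are all filtered out are kept with zero counts, which is an assumption made so the output matches the slide's “Canada 0 0” row.

```python
# The five sample visits from the slide.
FIELDS = ("date", "country", "city", "os", "browser", "sale")
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]
ROWS = [dict(zip(FIELDS, v)) for v in VISITS]


def aggregate(rows, group_by, where=None):
    """Return {group value: (visit count, sales sum)}.

    Every group present in `rows` appears in the result, even when the
    filter removes all of its visits.
    """
    result = {row[group_by]: [0, 0.0] for row in rows}
    for row in rows:
        if where is None or where(row):
            result[row[group_by]][0] += 1
            result[row[group_by]][1] += row["sale"]
    return {key: (count, total) for key, (count, total) in result.items()}


# Roll-up to country level.
by_country = aggregate(ROWS, "country")

# "Slice" by browser: only FF visits, still grouped by country.
ff_by_country = aggregate(ROWS, "country", where=lambda r: r["browser"] == "FF")

# Top browsers by sales, highest first.
by_browser = aggregate(ROWS, "browser")
top_browsers = sorted(by_browser.items(), key=lambda kv: kv[1][1], reverse=True)
```

Each result maps a dimension value to a `(visits, sales)` pair, so `by_country["USA"]` corresponds to the “USA 4 $50” row.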

Page 7: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

OLAP – Runtime Aggregation vs. Pre-aggregation

Aggregate at runtime:
Most flexible
Fast – scatter gather
Space efficient

But:
I/O, CPU intensive
Slow for larger data
Low throughput

Pre-aggregate:
Fast
Efficient – O(1)
High throughput

But:
More effort to process (latency)
Combinatorial explosion (space)
No flexibility
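
The trade-off can be illustrated with a small sketch, assuming three dimensions taken from the sample data. `runtime_sum`, `build_cube`, and `cube_sum` are hypothetical names, not part of SaasBase. The runtime version scans every row per query; the pre-aggregated version stores one cell per combination of dimension subset and values, so lookups are a single dictionary access while the number of stored cells grows with the number of dimension subsets.

```python
from itertools import combinations

DIMENSIONS = ("country", "os", "browser")
ROWS = [
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 0.0},
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 10.0},
    {"country": "USA", "os": "OSX", "browser": "Chrome", "sale": 25.0},
    {"country": "Canada", "os": "Linux", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "os": "OSX", "browser": "Safari", "sale": 15.0},
]


def runtime_sum(rows, **filters):
    """Aggregate at query time: scan every row, no extra storage."""
    return sum(
        r["sale"] for r in rows if all(r[d] == v for d, v in filters.items())
    )


def build_cube(rows):
    """Pre-aggregate: one cell per (dimension subset, value tuple)."""
    cube = {}
    for n in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, n):
            for r in rows:
                key = (dims, tuple(r[d] for d in dims))
                cube[key] = cube.get(key, 0.0) + r["sale"]
    return cube


def cube_sum(cube, **filters):
    """Query time: a single dictionary lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube.get((dims, tuple(filters[d] for d in dims)), 0.0)


CUBE = build_cube(ROWS)
```

Both `runtime_sum(ROWS, country="USA")` and `cube_sum(CUBE, country="USA")` answer the same question; the cube already holds more cells than there are input rows, which hints at the combinatorial growth in space.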

Page 8: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase Map

Page 9: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase Domain Model Mapping

Page 10: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase - Domain Model Mapping

Page 11: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase - Ingestion, Processing, Indexing, Querying

Page 12: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase - Ingestion, Processing, Indexing, Querying

Page 13: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Ingestion

Page 14: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Ingestion (ETL): throughput vs. latency

Historical data (large batches): optimize for throughput
Increments (latest data, smaller): optimize for latency

Page 15: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Processing

Page 16: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Processing

Processing reads the input (files, tables, events), pre-aggregates it (reducing cardinality), and generates cubes that can be queried in real time.

“Super Processor” code runs in Storm, MapReduce, and HBase.
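
A rough sketch of such a processing step, under stated assumptions: the event layout, the `process` function, and the choice of truncating timestamps to a day are all hypothetical and only illustrate how pre-aggregation reduces cardinality. The function accepts an existing cube so it can be fed either one large batch or single-event increments.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw input events: (ISO timestamp, country, browser).
EVENTS = [
    ("2012-05-21T10:01:07", "USA", "FF"),
    ("2012-05-21T10:59:30", "USA", "FF"),
    ("2012-05-21T11:15:00", "USA", "Chrome"),
    ("2012-05-22T09:00:00", "Canada", "Chrome"),
]


def process(events, cube=None):
    """Pre-aggregate events into a cube keyed by (day, country, browser).

    Truncating the timestamp to a day collapses many distinct event times
    into a few keys, which is the cardinality reduction step.
    """
    cube = Counter() if cube is None else cube
    for ts, country, browser in events:
        day = datetime.fromisoformat(ts).date().isoformat()
        cube[(day, country, browser)] += 1
    return cube


# Batch style: process everything at once.
BATCH = process(EVENTS)

# Incremental style: fold in one event at a time.
INCREMENTAL = Counter()
for event in EVENTS:
    process([event], INCREMENTAL)
```

Both styles end with the same cube, which has fewer cells than there are input events.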

Page 17: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Processing for OLAP semantics

GROUP BY (process, query)

COUNT, SUM, AVG, etc. (process, query)

SORT (process, query)

HAVING (mostly query, can define pre-process constraints)
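
One way to read the “(process, query)” split is that an aggregate is partly computed during processing and finished at query time. The sketch below assumes this for AVG: processing stores a mergeable (sum, count) pair per cell, and the query merges cells and divides at the end. `AvgState`, `process`, and `query_avg` are hypothetical names, not the SaasBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Partial state for AVG: kept as sum and count so cells can be merged."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other):
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count if self.count else 0.0


# (day, country, sale) taken from the sample visits table.
SALES = [
    ("2012-05-21", "USA", 0.0),
    ("2012-05-21", "USA", 10.0),
    ("2012-05-22", "USA", 25.0),
    ("2012-05-22", "Canada", 0.0),
    ("2012-05-23", "USA", 15.0),
]


def process(rows):
    """Process time: GROUP BY (day, country), accumulating partial states."""
    cells = {}
    for day, country, sale in rows:
        key = (day, country)
        cells[key] = cells.get(key, AvgState()).add(sale)
    return cells


def query_avg(cells, country):
    """Query time: merge the matching cells, then finish the average."""
    state = AvgState()
    for (_, cell_country), cell in cells.items():
        if cell_country == country:
            state = state.merge(cell)
    return state.result()


CELLS = process(SALES)
```

Storing the pair rather than a finished average is what makes the per-day cells combinable into a per-country answer.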

Page 18: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

SaasBase vs. SQL Views Comparison

Page 19: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Query Engine

Always reads indexed, compact data

Query parsing

Scan strategy

Single vs. multiple scans

Start/stop rows (prefixes, index positions, etc.)

Index selection (volatile indexes with incremental processing)

Deserialization

Post-aggregation, sorting, fuzzy-sorting etc.

Paging

Custom dimension/metric class loading
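
The scan, post-aggregation, and paging steps can be sketched with an in-memory stand-in for a sorted key-value table. This is not the HBase client API: the key layout, `prefix_stop`, and `scan` are hypothetical and only show how a key prefix turns into start/stop rows over sorted keys.

```python
import bisect

# Hypothetical index rows: key -> (visits, sales), kept in sorted key order.
TABLE = {
    b"browser/Chrome": (2, 25.0),
    b"browser/FF": (2, 10.0),
    b"browser/Safari": (1, 15.0),
    b"country/Canada/2012-05-22": (1, 0.0),
    b"country/USA/2012-05-21": (2, 10.0),
    b"country/USA/2012-05-22": (1, 25.0),
    b"country/USA/2012-05-23": (1, 15.0),
}
KEYS = sorted(TABLE)


def prefix_stop(prefix):
    """Smallest key greater than every key that starts with `prefix`.

    Assumes the last byte of `prefix` is not 0xFF.
    """
    return prefix[:-1] + bytes([prefix[-1] + 1])


def scan(keys, table, start, stop, offset=0, limit=None):
    """Return rows with start <= key < stop, with optional paging."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    selected = keys[lo:hi][offset:]
    if limit is not None:
        selected = selected[:limit]
    return [(key, table[key]) for key in selected]


# Single scan over all USA rows, then post-aggregation at query time.
prefix = b"country/USA/"
usa_rows = scan(KEYS, TABLE, prefix, prefix_stop(prefix))
usa_visits = sum(visits for _, (visits, _) in usa_rows)
usa_sales = sum(sales for _, (_, sales) in usa_rows)

# Paging: second page with a page size of one row.
page_two = scan(KEYS, TABLE, prefix, prefix_stop(prefix), offset=1, limit=1)
```

Because the scan only touches the key range for the prefix, the cost depends on the number of matching index rows rather than on the size of the table.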

Page 20: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Business Catalyst

Online business presence: e-commerce, marketing, web analytics etc.

Use case: Web Analytics (visitors, channels, content, e-commerce, campaigns, etc.)

Page 21: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

BC - Workflow

Page 22: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Business Catalyst - Stats

3 active datacenters

Raw data ~6TB (from ~1TB 18 months ago)

Visits table: ~1TB each (compressed)

OLAP cubes (stats): 49GB – 64GB (compressed)

~30 minutes latency (from actual pageview/sale to chart in UI)

10s – 100s of milliseconds latency for queries

~3000/s max concurrent OLAP queries (actual traffic is much lower)

Page 23: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Pass for TV Everywhere

Authentication & Authorization

Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.) with cable operator credentials (e.g. Comcast, Dish, etc.)

Page 24: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Pass – Use Case

Analytics use case: Operational metrics (users, devices, latencies, etc.)

Real-time ingestion in HBase

High Frequency Map Reduce jobs (every 2 minutes)

Page 25: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Pass - Stats (London Olympics 2012)

67M streams ~ 5.3M hours

1.5M concurrent streams

> 7M unique users

1 Technical & Engineering Emmy Award ;)

Page 26: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Primetime – Real-time Video Analytics

Unified video platform (acquisition, transcoding, broadcast, ads, analytics)

Page 27: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Adobe Primetime – Use Case

Use cases:
Audience metrics – minutes latency ok
Ads metrics – seconds to minutes ok
Streaming QoS metrics – seconds must

Requirements:
Massive throughput (millions of streams, multiple heartbeats every 10 seconds)
Low latency (end-to-end)

Page 29: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Conclusions

OLAP semantics on a simple data model

Data as first class citizen

Domain Specific “Language” for Dimensions, Metrics, Aggregations

Framework for vertical analytics systems

Tunable performance, resource allocation

Page 30: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Thank you!

Cosmin Lehene @clehene

http://hstack.org

Page 31: Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Related

http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/

http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012
