speed layer : real time views in lambda architecture

LAMBDA ARCHITECTURE. Speed layer: Real-time views. BIS2013 Database Systems. Prof. Duc Khanh Tran, PhD. 1st Hai Vo, 2nd Huy Huynh, 3rd Tin Ho

Upload: tin-ho

Post on 12-Jul-2015

TRANSCRIPT

Page 1: Speed layer : Real time views in LAMBDA architecture

LAMBDA ARCHITECTURE

Speed layer: Real-time views

BIS2013

Database Systems

Prof. Duc Khanh Tran, PhD

1st Hai Vo

2nd Huy Huynh

3rd Tin Ho

Page 2: Speed layer : Real time views in LAMBDA architecture

BIS2013-RGB GROUP 2

Agenda

Hai Vo: The Speed Layer; Computing & Storing real-time views

Huy Huynh: Challenges of incremental algorithms; Asynchronous versus synchronous updates

Tin Ho: Expiring real-time views; Cassandra: an example speed layer view; Conclusion

Page 3: Speed layer : Real time views in LAMBDA architecture

Motivation

The update latency requirements vary a great deal between applications.

Some applications require updates to propagate immediately, while in other applications a latency of a few hours is fine.

Page 4: Speed layer : Real time views in LAMBDA architecture

Motivation

[Figure: Lambda architecture. New data feeds both the batch layer (master data) and the speed layer. The batch layer produces the batch views held in the serving layer; the speed layer produces the real-time views. Labeled properties: robust and fault-tolerant, low-latency reads, allows ad hoc queries, low-latency updates.]

Page 5: Speed layer : Real time views in LAMBDA architecture

Speed Layer vs Batch Layer

The speed layer is similar to the batch layer in that it produces views based on data it receives.

Incremental updates as opposed to recomputation updates.

Only produces views on recent data vs produces views on the entire dataset.

Page 6: Speed layer : Real time views in LAMBDA architecture

Speed Layer

Processing data on a smaller scale

Requires databases that support random reads and random writes

More complex and thus more prone to error.

The speed layer views are transient.

Page 7: Speed layer : Real time views in LAMBDA architecture

Two major facets of the speed layer

Storing the real-time views, and processing the incoming data stream so as to update those views

Page 8: Speed layer : Real time views in LAMBDA architecture

Computing Real-time Views

Real-time view = function(recent data)

[Figure: all data → batch views; recent data → real-time views]

Page 9: Speed layer : Real time views in LAMBDA architecture

Computing Real-time Views

Real-time view = function(new data, previous real-time view)

[Figure: all data → batch views; recent data → real-time views → real-time views (updated)]
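The difference between the two formulas can be illustrated with a pageview counter. This is a minimal sketch: the pageview tuples and the function names `recompute_view` and `update_realtime_view` are assumptions made up for illustration, not something defined in the slides.

```python
from collections import Counter

def recompute_view(all_pageviews):
    """Recomputation style: rebuild the whole view from every record."""
    return Counter(url for url, _timestamp in all_pageviews)

def update_realtime_view(previous_view, new_pageviews):
    """Incremental style: view = function(new data, previous view)."""
    view = Counter(previous_view)  # copy, so the previous view is left untouched
    for url, _timestamp in new_pageviews:
        view[url] += 1
    return view

# Hypothetical recent data: (url, timestamp) pairs.
recent = [("/home", 1), ("/about", 2), ("/home", 3)]

# Feed the data in two small increments instead of one full recomputation.
view = update_realtime_view(Counter(), recent[:2])
view = update_realtime_view(view, recent[2:])

# Both styles agree on the result; the incremental one only touched new data.
assert view == recompute_view(recent)
```

The incremental version only does work proportional to the new data, which is what makes low-latency updates feasible.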

Page 10: Speed layer : Real time views in LAMBDA architecture

Storing Real-time Views

Real-time views must support low-latency random reads, and using incremental algorithms necessitates low-latency random updates

Requirements: random reads, random writes, scalability, fault tolerance

NoSQL database, e.g. Cassandra

Page 11: Speed layer : Real time views in LAMBDA architecture

Challenges of incremental computation

Incremental algorithms are less general and less human-fault-tolerant than recomputation algorithms, but offer higher performance

A challenge specific to the real-time context: the interaction between incremental algorithms and the CAP theorem

Page 12: Speed layer : Real time views in LAMBDA architecture

The CAP Theorem

“You can have at most two of Consistency, Availability, and Partition tolerance”

“When a distributed data system is partitioned, it can be consistent or available, but not both”

Consistency: all nodes see the same data at the same time

Availability: every request receives a response

Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system

Page 13: Speed layer : Real time views in LAMBDA architecture

C vs A

When you choose consistency, a query will sometimes receive an error instead of an answer

When you choose availability, the best consistency guarantee you can have is eventual consistency

Page 14: Speed layer : Real time views in LAMBDA architecture

Replication in a distributed system

[Figure: the records Sally (New York), Joe (Paris), and Maria (Tokyo) replicated across multiple nodes]

Page 15: Speed layer : Real time views in LAMBDA architecture

Eventual consistency

Page 16: Speed layer : Real time views in LAMBDA architecture

Eventual consistency counting

Page 17: Speed layer : Real time views in LAMBDA architecture

Interaction between the CAP theorem and incremental algorithms

Incremental updates become complex in a real-time, eventually consistent context

Read-repair algorithms are needed to reconcile replicas that diverged during a partition
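To see why a repair step is needed for something as simple as a counter, consider two replicas that both accept increments during a partition. The following is a minimal sketch of one possible repair scheme, assuming each replica tracks the increments per originating replica; the names `increment`, `repair`, and `read` are hypothetical, and this is not the API of any particular database.

```python
def increment(state, replica_id, amount=1):
    """Record an increment accepted by the given replica."""
    new_state = dict(state)
    new_state[replica_id] = new_state.get(replica_id, 0) + amount
    return new_state

def repair(state_a, state_b):
    """Merge two diverged replicas: keep the highest count seen per origin."""
    return {
        rid: max(state_a.get(rid, 0), state_b.get(rid, 0))
        for rid in state_a.keys() | state_b.keys()
    }

def read(state):
    """The counter value is the sum over all originating replicas."""
    return sum(state.values())

# Both replicas start in sync with a count of 10.
a = {"a": 10}
b = {"a": 10}

# During a partition, replica a accepts 2 increments and replica b accepts 1.
a = increment(a, "a", 2)
b = increment(b, "b", 1)

# Keeping only the larger plain number would lose replica b's increment.
naive = max(read(a), read(b))

# Merging the per-replica counts recovers all three increments.
merged = repair(a, b)
assert naive == 12 and read(merged) == 13
```

A plain integer cannot be repaired this way, because after the partition there is no way to tell which increments the two replicas have in common.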

Page 18: Speed layer : Real time views in LAMBDA architecture

Keep calm with Lambda

Unfortunately, there is no escape from this complexity if you want eventual consistency in the speed layer

If a real-time view becomes corrupted, the batch and serving layers will later automatically correct the mistake in the serving layer views

Page 19: Speed layer : Real time views in LAMBDA architecture

Synchronous updates

The application issues a request directly to the database and blocks until the update is processed

Page 20: Speed layer : Real time views in LAMBDA architecture

Asynchronous updates

Asynchronous update requests are placed in a queue, with the updates occurring at a later time

Page 21: Speed layer : Real time views in LAMBDA architecture

Asynchronous versus synchronous updates

Synchronous | Asynchronous
Fast | Slower
Coordinates the update with other aspects of the application | Does not coordinate
May overload the database | Does not overload the database
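The two update styles can be contrasted with a small sketch. It assumes an in-memory stand-in for the speed layer database (`ViewDatabase`, a hypothetical class) and uses a standard-library queue with one worker thread in place of a real queueing and stream-processing system.

```python
import queue
import threading

class ViewDatabase:
    """Hypothetical in-memory stand-in for the real-time view store."""
    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key, amount=1):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)

def synchronous_update(db, key):
    # The caller blocks until the database has applied the update.
    db.increment(key)

def start_async_worker(db, updates):
    # The worker drains the queue at its own pace, so bursts of requests
    # pile up in the queue instead of hitting the database all at once.
    def drain():
        while True:
            key = updates.get()
            if key is not None:
                db.increment(key)
            updates.task_done()
            if key is None:  # sentinel: stop the worker
                break
    worker = threading.Thread(target=drain, daemon=True)
    worker.start()
    return worker

db = ViewDatabase()
synchronous_update(db, "/home")      # applied before this call returns

updates = queue.Queue()
worker = start_async_worker(db, updates)
for _ in range(3):
    updates.put("/home")             # returns immediately, applied later
updates.put(None)
updates.join()                       # wait only so the result can be checked
worker.join()
assert db.get("/home") == 4
```

In the synchronous case the caller knows the update is visible when the call returns, which is what allows it to coordinate with other parts of the application; in the asynchronous case that guarantee is traded for buffering.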

Page 22: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

Since the simpler batch and serving layers continuously override the speed layer, the speed layer views only need to represent data yet to be processed by the batch computation workflow
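Because the real-time view only covers data the batch workflow has not yet absorbed, a query combines both views. A minimal sketch for a pageview count, assuming both views are plain dictionaries keyed by URL (the names and numbers are made up for illustration):

```python
def query_pageviews(url, batch_view, realtime_view):
    """Answer a query by adding the batch count and the real-time count."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)

# Hypothetical views: the batch view covers everything up to the last
# batch run, the real-time view covers only what arrived since then.
batch_view = {"/home": 1000, "/about": 40}
realtime_view = {"/home": 7, "/pricing": 2}

total_home = query_pageviews("/home", batch_view, realtime_view)
total_pricing = query_pageviews("/pricing", batch_view, realtime_view)
```

This merge is only correct if the real-time view does not also contain data already covered by the batch view, which is why expiring the real-time views matters.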

Page 23: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

A generic approach for expiring speed layer views works regardless of the speed layer database being used. To understand this approach, first consider what exactly needs to be expired each time the serving layer is updated
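The slides do not spell the approach out in text. One scheme, taken here as an assumption based on the referenced book, is to keep two copies of the real-time view, apply every update to both, and clear them alternately each time a batch run finishes. The sketch below uses a hypothetical class `ExpiringRealtimeViews` and assumes the speed layer starts empty at the moment the first batch run starts.

```python
class ExpiringRealtimeViews:
    """Two real-time views, cleared alternately after each batch run."""

    def __init__(self):
        self._views = [{}, {}]
        self._next_to_clear = 0

    def update(self, key, amount=1):
        # Every incoming update is applied to both views.
        for view in self._views:
            view[key] = view.get(key, 0) + amount

    def on_batch_run_finished(self):
        # Clear one view, then alternate for the next run.
        self._views[self._next_to_clear].clear()
        self._next_to_clear = 1 - self._next_to_clear

    def read(self, key):
        # Read from the view that was not just cleared: it holds exactly
        # the data that arrived since the last finished run started.
        return self._views[self._next_to_clear].get(key, 0)

views = ExpiringRealtimeViews()

for _ in range(3):                 # 3 pageviews arrive during batch run 1
    views.update("/home")
views.on_batch_run_finished()      # run 1 did not see these 3 pageviews
after_first = views.read("/home")

for _ in range(2):                 # 2 more arrive during batch run 2
    views.update("/home")
views.on_batch_run_finished()      # run 2 absorbed the first 3 pageviews
after_second = views.read("/home")
```

After the second run the real-time view reports only the 2 pageviews the batch view is still missing, so adding the two views does not double count.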

Page 24: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

Page 25: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

Page 26: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

Page 27: Speed layer : Real time views in LAMBDA architecture

Expiring real-time views

Page 28: Speed layer : Real time views in LAMBDA architecture

Cassandra's data model

What have you heard about it?

Column-oriented (column-family) database

Key-value based database

NoSQL database

Real time operational data store

Distributed database

...

Page 29: Speed layer : Real time views in LAMBDA architecture

Cassandra's data model

Some definitions:

Keyspace: the outermost container for data in Cassandra.

Column family: a container for a collection of rows. Each row contains ordered columns.

Column: the most basic unit of data in Cassandra's data model. It consists of a <name, value, timestamp> triplet, where the column name acts as the key.
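These definitions can be pictured as nested maps. The following toy model is only an illustration of the structure, not Cassandra's API: the class `ColumnFamily`, its methods, and the sample row are all hypothetical. It resolves conflicting writes to the same column by keeping the one with the newest timestamp.

```python
import time

class ColumnFamily:
    """Toy model: row key -> {column name: (value, timestamp)}."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def insert(self, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self._rows.setdefault(row_key, {})
        current = row.get(column)
        # Keep the write with the newest timestamp; ignore stale writes.
        if current is None or ts >= current[1]:
            row[column] = (value, ts)

    def get_row(self, row_key):
        # Columns within a row are returned ordered by column name.
        row = self._rows.get(row_key, {})
        return [(column, row[column][0]) for column in sorted(row)]

# A keyspace is the outermost container: here, a dict of column families.
keyspace = {"pageviews": ColumnFamily("pageviews")}
cf = keyspace["pageviews"]

# Hypothetical row: one URL, with one column per hour bucket.
cf.insert("/home", "2015-07-12T10", 5, timestamp=1)
cf.insert("/home", "2015-07-12T09", 3, timestamp=1)
cf.insert("/home", "2015-07-12T10", 8, timestamp=2)   # newer write wins
cf.insert("/home", "2015-07-12T10", 1, timestamp=1)   # stale write ignored

row = cf.get_row("/home")
```

Because columns are kept in order within a row, a range of columns (here, a range of hours) can be read with a single row lookup.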

Page 30: Speed layer : Real time views in LAMBDA architecture

Cassandra's data model

Column family example:

Page 31: Speed layer : Real time views in LAMBDA architecture

Cassandra's data model

Cassandra features that make it suitable for real-time views:

Partitions keys among the nodes in a cluster: RandomPartitioner and OrderPreservingPartitioner

Composite columns.
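The difference between the two partitioning strategies can be sketched as follows. This is a deliberate simplification: real Cassandra assigns token ranges on a ring rather than using a modulo, and the node names, split points, and function names here are assumptions for illustration only.

```python
import hashlib
from bisect import bisect_right

NODES = ["node-a", "node-b", "node-c"]

def random_partition(key, nodes=NODES):
    """Hash-based placement: keys spread evenly, key order is lost."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]

def order_preserving_partition(key, split_points=("h", "q"), nodes=NODES):
    """Range-based placement: neighboring keys land on the same node."""
    return nodes[bisect_right(split_points, key)]

keys = ["apple", "apricot", "mango", "zebra"]
by_hash = {key: random_partition(key) for key in keys}
by_range = {key: order_preserving_partition(key) for key in keys}
```

Hash-based placement balances load but makes range scans over keys expensive; range-based placement supports range scans but can concentrate load on one node when keys cluster.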

Page 32: Speed layer : Real time views in LAMBDA architecture

Using Cassandra!

Page 33: Speed layer : Real time views in LAMBDA architecture

Conclusion

Theoretical model of the speed layer

CAP theorem

Incremental computation instead of batch computation

Storing speed layer (real-time) views

Page 34: Speed layer : Real time views in LAMBDA architecture

References

Nathan Marz. Big Data: Principles and best practices of scalable real-time data systems.