Data Modeling for Microservices with Cassandra and Spark

Upload: jeffrey-carpenter

Post on 22-Jan-2018


Page 1: Data Modeling for Microservices with Cassandra and Spark

Strata + Hadoop World NYC Sept 26-29, 2016

Jeff Carpenter, Choice Hotels International

Data modeling for microservices with Cassandra and Spark

Page 2: Data Modeling for Microservices with Cassandra and Spark

Agenda

1 IT Transformation – Distribution and Analytics
2 Creating a Data Architecture
3 Data Modeling for Microservices
4 Using Metadata for Diagnostics and Analytics
5 Challenges

Page 3: Data Modeling for Microservices with Cassandra and Spark

IT Capabilities

• Corporate IT
• Guest
• Franchise Relations
• Hotel Management
• Business Intelligence
• Distribution

Diagram callout: This talk

Page 4: Data Modeling for Microservices with Cassandra and Spark

Distribution - Central Reservation System

Systems shown in the diagram:
• CRS
• Web and Mobile
• External Channels
• Customer & Loyalty
• Billing
• Property Systems
• Reporting & Analytics

Domains shown in the diagram:
• Distribution Domain
• Guest Domain
• Franchisee Domain
• Hotel Management Domain
• Business Intelligence Domain

Page 5: Data Modeling for Microservices with Cassandra and Spark

Current Reservation System – By The Numbers

• 25 years
• 6,000 hotels
• 50 transactions / second
• 4,000 distribution channels
• 1 instance

Page 6: Data Modeling for Microservices with Cassandra and Spark

New Systems: Distribution and Data Platforms

• Distribution Platform
• Data Platform
• Diagram labels: History, Realtime data

See: Choice Hotels's journey to better understand its customers through self-service analytics

This Talk: how we model data and use the self-service platform

Page 7: Data Modeling for Microservices with Cassandra and Spark

Distribution Platform - Architecture Tenets

• Cloud-native
• Microservices
• Open Source Infrastructure
• Extensibility
• Stable, Scalable, Secure

Page 8: Data Modeling for Microservices with Cassandra and Spark

What is a Microservice? (one definition)

Diagram labels:
• Client
• REST API
• Composing Service
• Entity Service
• Message Driven Service
• AMQ
• Events
• DB
• Persistence
• Data Ownership

Page 9: Data Modeling for Microservices with Cassandra and Spark

How can we design our data architecture & models to be…

• Scalable?
• Extensible?
• Maintainable?
• Analytics-ready?

Page 10: Data Modeling for Microservices with Cassandra and Spark

Our Data Stack

• Non-relational storage
• Long Term Storage
• Logging
• Reporting & Analytics
• Metrics

Page 11: Data Modeling for Microservices with Cassandra and Spark

Data Modeling – Then and Now

• Isolated Systems
• Data Dictionary
• SOA and Canonical Data Model
• Services own data

Conceptual Data Model
• Identifying domains and relationships

Logical Data Model
• Identifying data types and relationships

Physical Models
• Java APIs
• RESTful APIs (JSON)
• Events (JSON)
• Cassandra Schemas

Page 12: Data Modeling for Microservices with Cassandra and Spark

Conceptual Data Model - Domains

• hotels
• rates
• inventory
• offers
• reservations

Page 13: Data Modeling for Microservices with Cassandra and Spark

Conceptual Data Model – Domain Relationships

Domains: Hotel Management Domain, Guest Domain, Distribution Domain

Entities shown in the diagram:
• hotels
• guest
• stay
• loyalty
• rates
• inventory
• offers
• reservations

Page 14: Data Modeling for Microservices with Cassandra and Spark

Logical Data Model – Identifying Types

Rates Domain services: Composite Rate Service, Rate Plan Service, Rate Service

Rate Plan
• id
• code
• hotelId
• effectiveDates
• Conditions

Rate
• id
• ratePlanId
• productId
• hotelId
• dateSpan

Price
• condition
• amount

Product
• id
• code
• hotelId
• features
• …

Page 15: Data Modeling for Microservices with Cassandra and Spark

Standardizing Common Data Types

• Instead of a Canonical Data Model, we standardize basic building blocks

– Feature, Category, Brand

– Geospatial

– Financial

– Time

– Contact information

Address

• lines[]

• city

• subdivision

• country

• postalCode
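As an illustration of a standardized building block, the sketch below defines the Address type with the field names from the slide and embeds it in a hypothetical Hotel type. The Python types, the Hotel fields, and the sample values are assumptions made for the example; the talk itself describes Java APIs and JSON, not this code.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Address:
    """Shared building block; field names follow the slide."""
    lines: List[str] = field(default_factory=list)
    city: str = ""
    subdivision: str = ""  # assumed to hold e.g. a state or province
    country: str = ""
    postalCode: str = ""


@dataclass
class Hotel:
    """Hypothetical owning type that embeds the shared Address."""
    id: str
    name: str
    address: Address


def to_json(entity) -> str:
    # asdict() recurses into nested dataclasses, so an embedded Address
    # serializes with the same shape wherever it is reused.
    return json.dumps(asdict(entity), sort_keys=True)


# Sample values are made up for the example.
hotel = Hotel(
    id="AZ123",
    name="Example Inn",
    address=Address(lines=["1 Main St"], city="Phoenix",
                    subdivision="AZ", country="US", postalCode="85001"),
)
hotel_json = to_json(hotel)
```

Any service that needs contact information can reuse the same Address definition without sharing a full canonical model.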

Page 16: Data Modeling for Microservices with Cassandra and Spark

Data Types → Microservice Identification

Domains and their services:
• Hotel Domain: Hotel Service
• Rates Domain: Rates Service
• Inventory Domain: Inventory Service
• Offer Domain: Offer Service
• Reservation Domain: Reservation Service

Clients shown in the diagram:
• Data Maintenance Apps
• Internal / External Client Apps

Page 17: Data Modeling for Microservices with Cassandra and Spark

Physical Data Models

Physical Models
• Java APIs
• RESTful APIs (JSON)
• Events (JSON)
• Cassandra Schemas

JSON = primary definition of the data type owned by each service

Page 18: Data Modeling for Microservices with Cassandra and Spark

Key Data Types → RESTful Resource Paths

• Hotel Service: /hotels
• Rates Service: /rates
• Inventory Service: /inventory
• Offer Service: /offers
• Reservation Service: /reservations

Page 19: Data Modeling for Microservices with Cassandra and Spark

Java and RESTful APIs – common pattern

REST                             Java API
GET /types/<id>                  Type getTypeById()
GET /types?<query parameters>    Type[] searchType(TypeSearchCriteria)
POST /types/ (JSON body)         createType(Type)
PUT /types/ (JSON body)          updateType(Type)
DELETE /types/<id>               deleteType(TypeId)
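The common pattern above pairs each REST call with one service method. A minimal sketch of that pairing follows, written in Python rather than the Java the talk refers to. The TypeService class, the dispatch helper, the dict-based entities, and the sample values are all hypothetical stand-ins, not the speaker's implementation.

```python
from typing import Dict, List, Optional


class TypeService:
    """In-memory stand-in for a service that owns one data type."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get_type_by_id(self, type_id: str) -> Optional[dict]:
        return self._items.get(type_id)

    def search_type(self, criteria: Dict[str, str]) -> List[dict]:
        # Return items whose fields equal every search criterion.
        return [item for item in self._items.values()
                if all(item.get(k) == v for k, v in criteria.items())]

    def create_type(self, item: dict) -> None:
        self._items[item["id"]] = dict(item)

    def update_type(self, item: dict) -> None:
        if item["id"] not in self._items:
            raise KeyError(item["id"])
        self._items[item["id"]] = dict(item)

    def delete_type(self, type_id: str) -> None:
        self._items.pop(type_id, None)


def dispatch(service, method, path, query=None, body=None):
    """Route a REST-style call to the matching service method."""
    parts = [p for p in path.split("/") if p]  # "/types/42" -> ["types", "42"]
    type_id = parts[1] if len(parts) > 1 else None
    if method == "GET" and type_id:
        return service.get_type_by_id(type_id)
    if method == "GET":
        return service.search_type(query or {})
    if method == "POST":
        return service.create_type(body)
    if method == "PUT":
        return service.update_type(body)
    if method == "DELETE" and type_id:
        return service.delete_type(type_id)
    raise ValueError(f"unsupported call: {method} {path}")


svc = TypeService()
dispatch(svc, "POST", "/types/", body={"id": "42", "code": "NSK", "hotelId": "AZ123"})
dispatch(svc, "PUT", "/types/", body={"id": "42", "code": "NSK2", "hotelId": "AZ123"})
found = dispatch(svc, "GET", "/types", query={"hotelId": "AZ123"})
```

Keeping the same five operations for every owned type means a client that knows one service can guess the shape of the others.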

Page 20: Data Modeling for Microservices with Cassandra and Spark

Cassandra Data Modeling (an idealized view)

Page 21: Data Modeling for Microservices with Cassandra and Spark

Cassandra Data Modeling – Access Patterns

• Q1: View hotels near POI
• Q2: View hotel Info
• Q3: Show POIs near hotel
• Q4: Shop for rooms at hotel
• Q5: View room details
• Book a room
• Q6: View reservation by confirmation number
• Q7: View hotel reservations for a date
• Q8: Find reservation by guest name
• Q9: View guest details (Q9 labels three transitions in the diagram)

Page 22: Data Modeling for Microservices with Cassandra and Spark

Cassandra Data Modeling – Chebotko Diagrams

K = partition key column, C↑ = clustering column (ascending)

hotels_by_poi (Q1)
• poi_name  K
• hotel_id  C↑
• name
• phone
• address

hotels (Q2)
• hotel_id  K
• name
• phone
• address

pois_by_hotel (Q3)
• hotel_id  K
• poi_name  C↑
• description

available_rooms_by_hotel_date (Q4)
• hotel_id  K
• date  C↑
• room_number  C↑
• is_available

amenities_by_room (Q5)
• hotel_id  K
• room_id  K
• amenity_name  C↑
• description

Page 23: Data Modeling for Microservices with Cassandra and Spark

Cassandra Data Modeling - Physical

hotel keyspace

hotels_by_poi
• poi_name  text  K
• hotel_id  text  C↑
• name  text
• phone  text
• address  address

hotels
• hotel_id  text  K
• name  text
• phone  text
• address  address

pois_by_hotel
• hotel_id  text  K
• poi_name  text  C↑
• description  text

available_rooms_by_hotel_date
• hotel_id  text  K
• date  date  C↑
• room_number  smallint  C↑
• is_available  boolean

amenities_by_room
• hotel_id  text  K
• room_number  smallint  K
• amenity_name  text  C↑
• description  text

*address* (user-defined type)
• street  text
• city  text
• state_or_province  text
• postal_code  text
• country  text

Page 24: Data Modeling for Microservices with Cassandra and Spark

Cassandra Data Modeling - Schemas

CREATE KEYSPACE hotel
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TYPE hotel.address (
  street text,
  city text,
  state_or_province text,
  postal_code text,
  country text
);

CREATE TABLE hotel.hotels_by_poi (
  poi_name text,
  hotel_id text,
  name text,
  phone text,
  address frozen<address>,
  PRIMARY KEY ((poi_name), hotel_id)
) WITH CLUSTERING ORDER BY (hotel_id ASC);
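To make the role of the primary key concrete, here is a small in-memory model of hotels_by_poi. It is only a sketch: a Python dict stands in for partitions keyed by poi_name, and a sorted list stands in for rows ordered by the clustering column hotel_id. The function names and sample rows are invented for the example, and nothing here talks to Cassandra.

```python
from bisect import insort
from collections import defaultdict

# partition key (poi_name) -> rows kept sorted by clustering column (hotel_id)
hotels_by_poi = defaultdict(list)


def insert_hotel(poi_name, hotel_id, name, phone):
    rows = hotels_by_poi[poi_name]
    # Writing the same primary key again replaces the row (an upsert).
    rows[:] = [r for r in rows if r[0] != hotel_id]
    # Tuples sort by their first element, so rows stay ordered by hotel_id.
    insort(rows, (hotel_id, name, phone))


def select_by_poi(poi_name):
    """Read one partition; rows come back in clustering order."""
    return list(hotels_by_poi.get(poi_name, []))


# Sample rows are made up for the example.
insert_hotel("Statue of Liberty", "NY456", "Harbor Hotel", "555-0102")
insert_hotel("Statue of Liberty", "NY123", "Battery Inn", "555-0101")
insert_hotel("Statue of Liberty", "NY456", "Harbor Hotel", "555-0199")
rows = select_by_poi("Statue of Liberty")
```

A query by poi_name touches a single partition and returns hotels already sorted by hotel_id, which is the reason the table is keyed this way.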

Page 25: Data Modeling for Microservices with Cassandra and Spark

And now…

Back to reality

Page 26: Data Modeling for Microservices with Cassandra and Spark

Access Patterns and Denormalization

Access patterns:
• Locate hotel by identifier
• Find hotels within X miles of point Y
• Find hotels by city, state, country
• Find hotels by postal code
• Hotels by amenity
• Find hotels by brand

Keyspace hotel tables:
• hotels_by_id
• hotels_by_brand
• hotels_by_postal_code
• Hotels by this
• Hotels by that
• Hotels by something else
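Each access pattern gets its own table, so one logical write has to reach every copy. The sketch below models that fan-out with Python dicts named after three of the tables on the slide. The save_hotel function and the brand and postal_code fields are assumptions for the example, not the schema used in the talk.

```python
from collections import defaultdict

hotels_by_id = {}
hotels_by_brand = defaultdict(dict)        # brand -> {hotel_id: hotel}
hotels_by_postal_code = defaultdict(dict)  # postal_code -> {hotel_id: hotel}


def save_hotel(hotel):
    """Write one hotel to every lookup table that serves a query."""
    record = dict(hotel)
    hotel_id = record["hotel_id"]
    previous = hotels_by_id.get(hotel_id)
    if previous is not None:
        # Drop stale copies in case a lookup attribute changed.
        hotels_by_brand[previous["brand"]].pop(hotel_id, None)
        hotels_by_postal_code[previous["postal_code"]].pop(hotel_id, None)
    hotels_by_id[hotel_id] = record
    hotels_by_brand[record["brand"]][hotel_id] = record
    hotels_by_postal_code[record["postal_code"]][hotel_id] = record


# Sample values are made up for the example.
save_hotel({"hotel_id": "AZ123", "brand": "BrandA", "postal_code": "85001"})
save_hotel({"hotel_id": "AZ123", "brand": "BrandB", "postal_code": "85001"})
```

The cleanup step is the cost of denormalization: every additional "hotels by ..." table adds another copy to keep in sync on each write.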

Page 27: Data Modeling for Microservices with Cassandra and Spark

Metadata

Request Context
• Requestor
• Tracking ID
• Token
• Locale

Diagram labels: Incoming Request, Service, AMQ, Events, Logs, ELK Stack

Page 28: Data Modeling for Microservices with Cassandra and Spark

Asynchronous events

Event
• Type
  – Create
  – Update
  – Delete
• Request Context
• Old entity
• New entity

Request Context
• Requestor
• Tracking ID
• Token
• Locale

Entity (old/new)
• Id
• …

Sample Inventory Event

{
  "type": "UPDATE",
  "trackingId": "0da7b794-f2c3-…",
  "requestor": "Legacy CRS",
  "newEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "22",
    "totalCount": "25"
  },
  "oldEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "20",
    "totalCount": "25"
  }
}
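The event structure above can be assembled from the request context plus the old and new versions of the entity. The following sketch shows one way to do that, along with a helper that a subscriber might use to see what changed. The function names and the generated tracking ID are hypothetical; the field names follow the sample inventory event.

```python
import json
import uuid

EVENT_TYPES = {"CREATE", "UPDATE", "DELETE"}


def build_event(event_type, request_context, old_entity=None, new_entity=None):
    """Wrap an entity change in an event that carries request metadata."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    event = {
        "type": event_type,
        "trackingId": request_context["trackingId"],
        "requestor": request_context["requestor"],
    }
    if new_entity is not None:
        event["newEntity"] = new_entity
    if old_entity is not None:
        event["oldEntity"] = old_entity
    return event


def changed_fields(event):
    """List the entity fields whose values differ between old and new."""
    old = event.get("oldEntity", {})
    new = event.get("newEntity", {})
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))


# The tracking ID is generated here purely for illustration.
context = {"trackingId": str(uuid.uuid4()), "requestor": "Legacy CRS"}
old = {"hotelId": "AZ123", "productId": "NSK", "date": "2016-05-20",
       "consumedCount": "20", "totalCount": "25"}
new = dict(old, consumedCount="22")
event = build_event("UPDATE", context, old_entity=old, new_entity=new)
payload = json.dumps(event)
```

Because the event carries both versions of the entity and the tracking ID, a consumer can reconstruct history and tie the change back to the originating request.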

Page 29: Data Modeling for Microservices with Cassandra and Spark

Putting It Together – Diagnostics

Diagram labels:
• Incoming Request
• Service
• C* (four nodes)
• Data History
• Logs
• Metrics
• Data Platform
• ELK Stack
• Metrics Store

Page 30: Data Modeling for Microservices with Cassandra and Spark

Putting It Together – Long Term Storage

Diagram labels:
• C* (four nodes)
• Data Platform
• ELK Stack
• Metrics Store
• Long Term Storage

Page 31: Data Modeling for Microservices with Cassandra and Spark

Separating Active and History Data

Yesterday’s data is ancient history

Diagram labels: Rate + Inventory Data, Time, Now

Page 32: Data Modeling for Microservices with Cassandra and Spark

History architecture

Diagram labels:
• History capture
• History retrieval
• Service
• AMQ
• Kafka
• S3
• Other subscribers
• Customer Service Apps
• History Service
• Data Platform - Cloudera
• Spark
• Impala*
• node (×4)

Page 33: Data Modeling for Microservices with Cassandra and Spark

Microservice Data Challenges

• No Joins?
• Data Maintenance
• Data Integrity
• Cascading Deletes
• Transactions

Page 34: Data Modeling for Microservices with Cassandra and Spark

Distributed Transactions, Anyone?

Diagram labels:
• Booking Client
• Data Maintenance Apps
• Reservation Service
• Inventory Service
• reservations
• inventory
• Commit the contract
• Reserve the inventory
• Data synchronization
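The booking flow on this slide touches two services: reserve the inventory, then commit the contract. One way to handle a failure between the two steps without a distributed transaction is to retry transient errors and, if the commit still fails, undo the reservation with a compensating action. The sketch below illustrates that idea only; the InventoryService stand-in, with_retries, book, and the choice of ConnectionError as the transient failure are all assumptions.

```python
import time


class InventoryService:
    """Hypothetical in-memory stand-in for the Inventory Service."""

    def __init__(self, total_count):
        self.total_count = total_count
        self.consumed_count = 0

    def reserve(self):
        if self.consumed_count >= self.total_count:
            raise RuntimeError("sold out")
        self.consumed_count += 1

    def release(self):
        # Compensating action: undo an earlier reserve().
        self.consumed_count -= 1


def with_retries(call, attempts=3, delay_seconds=0.0):
    """Retry a call that fails transiently; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def book(inventory, commit_contract):
    """Reserve inventory, then commit; release inventory if commit fails."""
    inventory.reserve()
    try:
        return with_retries(commit_contract)
    except Exception:
        inventory.release()
        raise


inventory = InventoryService(total_count=25)
calls = {"count": 0}


def flaky_commit():
    # Fails twice with a transient error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("timeout")
    return "CONF-1"


confirmation = book(inventory, flaky_commit)
```

The result is eventually consistent: for a short time the inventory is reserved without a committed reservation, and the compensation restores it if the commit never succeeds.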

Page 35: Data Modeling for Microservices with Cassandra and Spark

Alternatives to Distributed Transactions

Approach                     Example                                          Scope
C* Lightweight Transaction   Updating inventory counts                        Data Tier
C* Logged Batch              Writing to multiple denormalized hotel tables    Data Tier
Retrying failed calls        Data synchronization, reservation processing     Service
Compensating transactions    Verifying reservation processing                 System

Axis labels beside the table: Eventual consistency, Strong consistency
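A Cassandra lightweight transaction is a conditional write: the update is applied only if an IF condition on the current row holds. The sketch below imitates that compare-and-set behavior in plain Python for the inventory-count example. It does not use a Cassandra driver; the InventoryRow class, update_if, and consume_one are hypothetical names, and a lock stands in for the coordination Cassandra performs.

```python
import threading


class InventoryRow:
    """In-memory stand-in for one inventory row."""

    def __init__(self, consumed_count, total_count):
        self.consumed_count = consumed_count
        self.total_count = total_count
        self._lock = threading.Lock()

    def update_if(self, expected_consumed, new_consumed):
        """Apply the write only if the current value matches the expectation."""
        with self._lock:
            if self.consumed_count != expected_consumed:
                return False  # another writer got there first
            self.consumed_count = new_consumed
            return True


def consume_one(row, max_attempts=5):
    """Read, then conditionally write; re-read and retry on a conflict."""
    for _ in range(max_attempts):
        current = row.consumed_count
        if current >= row.total_count:
            return False  # nothing left to sell
        if row.update_if(current, current + 1):
            return True
    return False


row = InventoryRow(consumed_count=24, total_count=25)
first = consume_one(row)           # takes the last unit
second = consume_one(row)          # refused: sold out
stale = row.update_if(20, 21)      # refused: expectation is out of date
```

The conditional write prevents two bookings from consuming the same unit, at the price of an extra read and possible retries.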

Page 36: Data Modeling for Microservices with Cassandra and Spark

Final Thoughts

• Data Models > Microservices
• Events = Streams
• Use Metadata Everywhere

Page 37: Data Modeling for Microservices with Cassandra and Spark

Now Available!

Cassandra: The Definitive Guide, 2nd Edition

Completely reworked for Cassandra 3.X:

• Data modeling in CQL

• SASI indexes

• Materialized views

• Lightweight transactions

• DataStax drivers

• New chapters on security, deployment, and integration

Page 38: Data Modeling for Microservices with Cassandra and Spark

Contact Info

@choicehotels

careers.choicehotels.com

@jscarp

jeffreyscarpenter
