SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

From SQL to NoSQL: Best Practices for Migrating from RDBMS to DynamoDB. Rick Houlihan, Principal TPM – DBS NoSQL. 26 July 2016. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Upload: amazon-web-services

Post on 16-Apr-2017

TRANSCRIPT

Page 1: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rick Houlihan, Principal TPM – DBS NoSQL

26 July 2016

From SQL to NoSQL

Best Practices for Migrating from RDBMS to DynamoDB

Page 2: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Agenda

• Evolution of Data Processing
• Why NoSQL
• Key DynamoDB concepts
• SQL to NoSQL Data Modeling

Page 3: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Timeline of Database Technology

(Chart; axis: Data Pressure)

Page 4: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Data Volume Since 2010

(Chart; axis: Data Volume; regions: Historical, Current)

90% of stored data generated in last 2 years

1 Terabyte of data in 2010 equals 6.5 Petabytes today

Linear correlation between data pressure and technical innovation

SQL is not built for this

Page 5: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Technology Adoption and the Hype Curve

Page 6: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Why NoSQL?

SQL                       NoSQL
Optimized for storage     Optimized for compute
Normalized/relational     Denormalized/hierarchical
Ad hoc queries            Instantiated views
Scale vertically          Scale horizontally
Good for OLAP             Built for OLTP at scale

Page 7: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Amazon DynamoDB

• Fully Managed NoSQL
• Document or Key-Value
• Scales to Any Workload
• Fast and Consistent
• Access Control
• Event Driven Programming

Page 8: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Table

Table > Items > Attributes

Partition Key
• Mandatory
• Key-value access pattern
• Determines data distribution

Sort Key
• Optional
• Model 1:N relationships
• Enables rich query capabilities

All items for key
• ==, <, >, >=, <=
• “begins with”
• “between”
• “contains”
• “in”
• sorted results
• counts
• top/bottom N values

Page 9: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Partition Keys

Key Space: 00 to 54 | 55 to A9 | AA to FF

Id = 1, Name = Jim: Hash (1) = 7B
Id = 2, Name = Andy, Dept = Eng: Hash (2) = 48
Id = 3, Name = Kim, Dept = Ops: Hash (3) = CD

• Partition Key uniquely identifies an item
• Partition Key is used for building an unordered hash index
• Allows table to be partitioned for scale
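The mapping from a hashed key to a partition can be sketched in a few lines of Python. This is only an illustration: DynamoDB's real hash function and partition boundaries are internal to the service, so the MD5 stand-in and the three ranges below (copied from the key space on the slide) are assumptions.

```python
import hashlib

# Assumed partition boundaries over a one-byte key space (00..FF),
# mirroring the three ranges drawn on the slide.
PARTITION_RANGES = [(0x00, 0x54), (0x55, 0xA9), (0xAA, 0xFF)]


def key_hash(partition_key: str) -> int:
    """Map a partition key to one byte (0..255). MD5 is a stand-in hash."""
    return hashlib.md5(partition_key.encode("utf-8")).digest()[0]


def partition_for(hash_value: int) -> int:
    """Return the 1-based partition number whose range contains hash_value."""
    for number, (low, high) in enumerate(PARTITION_RANGES, start=1):
        if low <= hash_value <= high:
            return number
    raise ValueError(f"hash {hash_value:#04x} is outside the key space")


if __name__ == "__main__":
    # Hash values quoted on the slide, not values computed by key_hash().
    for item_id, slide_hash in [(1, 0x7B), (2, 0x48), (3, 0xCD)]:
        print(f"Id = {item_id}: hash {slide_hash:02X} -> partition {partition_for(slide_hash)}")
```

Because neighbouring key values hash to unrelated positions, items spread across partitions regardless of insertion order, which is what lets the table scale out.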

Page 10: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Partition:Sort Key

• Partition:Sort Key uses two attributes together to uniquely identify an Item
• Within unordered hash index, data is arranged by the sort key
• No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes

Key Space: 00:0 to 54:∞ (Partition 1) | 55 to A9:∞ (Partition 2) | AA to FF:∞ (Partition 3)

Partition 1, Hash (2) = 48
Customer# = 2, Order# = 10, Item = Pen
Customer# = 2, Order# = 11, Item = Shoes

Partition 2, Hash (1) = 7B
Customer# = 1, Order# = 10, Item = Toy
Customer# = 1, Order# = 11, Item = Boots

Partition 3, Hash (3) = CD
Customer# = 3, Order# = 10, Item = Book
Customer# = 3, Order# = 11, Item = Paper

Page 11: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Partitions are three-way replicated

Replica 1, Replica 2, Replica 3: each holds a copy of Partition 1, Partition 2, ... Partition N

Items stored in every replica:
Id = 1, Name = Jim
Id = 2, Name = Andy, Dept = Eng
Id = 3, Name = Kim, Dept = Ops

Page 12: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Indexes

Page 13: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Local secondary index (LSI)

• Alternate sort key attribute
• Index is local to a partition key

Table: A1 (partition), A2 (sort), A3, A4, A5

LSIs
KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)

10 GB max per partition key, i.e. LSIs limit the # of range keys!

Page 14: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Global secondary index (GSI)

• Alternate partition and/or sort key
• Index is across all partition keys
• Use composite sort keys for compound indexes

Table: A1 (partition), A2, A3, A4, A5

GSIs
INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
KEYS_ONLY: A2 (partition), A1 (item key)

RCUs/WCUs provisioned separately for GSIs

Online indexing
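As a rough illustration, the first GSI on this slide (A5 as partition key, A4 as sort key, INCLUDE A3) could be declared with parameters shaped like the ones below, which follow the boto3 `create_table` request format. The table name, index name, capacity figures, and the choice of string types for every attribute are all assumptions made for the example.

```python
def build_create_table_params() -> dict:
    """Parameters for a table keyed on A1 with one GSI keyed on A5 (partition) and A4 (sort)."""
    return {
        "TableName": "ExampleTable",  # hypothetical name
        # Only attributes that appear in some key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "A1", "AttributeType": "S"},
            {"AttributeName": "A4", "AttributeType": "S"},
            {"AttributeName": "A5", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "A1", "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "A5-A4-index",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "A5", "KeyType": "HASH"},
                    {"AttributeName": "A4", "KeyType": "RANGE"},
                ],
                # INCLUDE projects the listed non-key attributes next to the keys.
                "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["A3"]},
                # The index carries its own capacity, separate from the table's.
                "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
            }
        ],
    }
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").create_table(**build_create_table_params())`. The separate `ProvisionedThroughput` block on the index reflects the slide's point that GSI capacity is provisioned independently of the table.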

Page 15: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Migrating to NoSQL

Planning > Data Analysis > Data Modeling > Testing > Migration

Page 16: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Data Analysis Phase

Analysis of both the RDBMS schema and application access patterns is key to understanding the performance of workloads on a NoSQL database platform.

Page 17: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Data Analysis Phase

RDBMS Source Data Analysis
Key data attributes:
• Read/Write Velocity
• Data Partitioning
• Key Cardinality

Access Pattern of the Application
Examples:
• Write/Read Workloads
• Relational Structures
• Aggregation Dimensions

Page 18: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Data Modeling

Page 19: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series
Page 20: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series
Page 21: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

1:1 relationships or key-values

• Use a table or GSI with an alternate partition key
• Use GetItem or BatchGetItem API

Example: Given an SSN or license number, get attributes

Users Table
Partition key            Attributes
SSN = 123-45-6789        Email = [email protected], License = TDL25478134
SSN = 987-65-4321        Email = [email protected], License = TDL78309234

Users-License-GSI
Partition key            Attributes
License = TDL78309234    Email = [email protected], SSN = 987-65-4321
License = TDL25478134    Email = [email protected], SSN = 123-45-6789
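The two lookups might be expressed as request parameters in the shape boto3's DynamoDB client accepts. One detail worth keeping in mind: GetItem and BatchGetItem read the base table by primary key, while a GSI is read with Query, so the license lookup below uses Query with `IndexName`. The helper names are hypothetical; the table and index names come from the slide.

```python
def get_user_by_ssn(ssn: str) -> dict:
    """Parameters for a GetItem call against the base table (key = SSN)."""
    return {"TableName": "Users", "Key": {"SSN": {"S": ssn}}}


def get_user_by_license(license_number: str) -> dict:
    """Parameters for a Query against the GSI (key = License)."""
    return {
        "TableName": "Users",
        "IndexName": "Users-License-GSI",
        # A name placeholder sidesteps any clash with reserved words.
        "KeyConditionExpression": "#lic = :lic",
        "ExpressionAttributeNames": {"#lic": "License"},
        "ExpressionAttributeValues": {":lic": {"S": license_number}},
    }
```

Given a client created with `boto3.client("dynamodb")`, these would be used as `client.get_item(**get_user_by_ssn("123-45-6789"))` and `client.query(**get_user_by_license("TDL25478134"))`.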

Page 22: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

1:N relationships or parent-children

• Use a table or GSI with partition and sort key
• Use Query API

Example: Given a device, find all readings between epoch X, Y

Device-measurements
Partition Key    Sort key            Attributes
DeviceId = 1     epoch = 5513A97C    Temperature = 30, pressure = 90
DeviceId = 1     epoch = 5513A9DB    Temperature = 30, pressure = 90
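A small in-memory model shows why this query is cheap: within one partition key the items are already ordered by sort key, so a range lookup is two binary searches and a slice. The sketch below is not DynamoDB code; it parses the slide's hex epoch values as integers and uses Python's `bisect` module to stand in for a key condition along the lines of `DeviceId = :id AND epoch BETWEEN :x AND :y`.

```python
from bisect import bisect_left, bisect_right

# Item collection per partition key, kept sorted by the sort key (epoch).
# Epoch values are the hex strings from the slide, parsed as integers.
READINGS = {
    1: [
        (int("5513A97C", 16), {"Temperature": 30, "pressure": 90}),
        (int("5513A9DB", 16), {"Temperature": 30, "pressure": 90}),
    ],
}


def query_between(device_id: int, start: int, end: int) -> list:
    """Return readings with start <= epoch <= end for one device."""
    collection = READINGS.get(device_id, [])
    epochs = [epoch for epoch, _ in collection]
    low = bisect_left(epochs, start)   # first position with epoch >= start
    high = bisect_right(epochs, end)   # first position with epoch > end
    return collection[low:high]
```

Both bounds are inclusive, which matches the behaviour of BETWEEN in a key condition.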

Page 23: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

N:M relationships

• Use a table and GSI with partition and sort key elements switched
• Use Query API

Example: Given a user, find all games. Or given a game, find all users.

User-Games-Table
Partition Key    Sort key
UserId = bob     GameId = Game1
UserId = fred    GameId = Game2
UserId = bob     GameId = Game3

Game-Users-GSI
Partition Key     Sort key
GameId = Game1    UserId = bob
GameId = Game2    UserId = fred
GameId = Game3    UserId = bob
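The switched-key idea can be modelled with two dictionaries built from the same rows. This is an in-memory sketch rather than DynamoDB code; `build_index` is a hypothetical helper, and the rows are the three items shown on the slide.

```python
from collections import defaultdict

# Base table items as (UserId, GameId) pairs, taken from the slide.
USER_GAMES = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]


def build_index(items, partition_pos, sort_pos):
    """Group items by one attribute and sort each group by the other."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_pos]].append(item[sort_pos])
    return {key: sorted(values) for key, values in index.items()}


# The table is keyed UserId -> GameId; the GSI swaps the two key elements.
table = build_index(USER_GAMES, partition_pos=0, sort_pos=1)
gsi = build_index(USER_GAMES, partition_pos=1, sort_pos=0)
```

Here `table["bob"]` answers "given a user, find all games" and `gsi["Game1"]` answers "given a game, find all users", each as a single keyed lookup.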

Page 24: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Hierarchical Data

Tiered relational data structures

Page 25: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

It’s all about aggregations…

• Document Management
• Process Control
• Social Network
• Data Trees
• IT Monitoring

Page 26: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

How OLTP Apps Use Data

Mostly Hierarchical Structures

• Entity Driven Workflows
• Data Spread Across Tables
• Requires Complex Queries
• Primary driver for ACID

Page 27: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Hierarchical Structures as Item Collections…

• Use composite sort key to define a Hierarchy
• Highly selective result sets with sort queries
• Index anything, scales to any size

Items (Primary Key: ProductID, type)

ProductID = 1
type = bookID: title = Ringworld, author = Larry Niven, genre = Science Fiction, publisher = Ballantine, datePublished = Oct-70, ISBN = 0-345-02046-4

ProductID = 2
type = albumID: title = Dark Side of the Moon, artist = Pink Floyd, genre = Progressive Rock, label = Harvest, studio = Abbey Road, released = 3/1/73, producer = Pink Floyd
type = albumID:trackID: title = Speak to Me, length = 1:30, music = Mason, vocals = Instrumental
type = albumID:trackID: title = Breathe, length = 2:43, music = Waters, Gilmour, Wright, vocals = Gilmour
type = albumID:trackID: title = On the Run, length = 3:30, music = Gilmour, Waters, vocals = Instrumental

ProductID = 3
type = movieID: title = Idiocracy, genre = Scifi Comedy, writer = Mike Judge, producer = 20th Century Fox
type = movieID:actorID: name = Luke Wilson, character = Joe Bowers, image = img2.jpg
type = movieID:actorID: name = Maya Rudolph, character = Rita, image = img3.jpg
type = movieID:actorID: name = Dax Shepard, character = Frito Pendejo, image = img1.jpg
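A composite sort key turns "fetch one level of the hierarchy" into a prefix match on the sort key. The sketch below models that in memory. The concrete key values (`album1`, `album1:track1`, and so on) are hypothetical, since the slide shows only the key shapes (albumID, albumID:trackID), and `startswith` stands in for a "begins with" condition on the sort key.

```python
# Hypothetical concrete keys; the slide shows the key shapes, not values.
ITEMS = [
    {"ProductID": 2, "type": "album1", "title": "Dark Side of the Moon"},
    {"ProductID": 2, "type": "album1:track1", "title": "Speak to Me"},
    {"ProductID": 2, "type": "album1:track2", "title": "Breathe"},
    {"ProductID": 2, "type": "album1:track3", "title": "On the Run"},
    {"ProductID": 3, "type": "movie1", "title": "Idiocracy"},
    {"ProductID": 3, "type": "movie1:actor1", "name": "Luke Wilson"},
]


def query(product_id: int, sort_key_prefix: str = "") -> list:
    """Return one product's items whose sort key starts with the prefix, in sort-key order."""
    matches = [
        item for item in ITEMS
        if item["ProductID"] == product_id and item["type"].startswith(sort_key_prefix)
    ]
    return sorted(matches, key=lambda item: item["type"])
```

Calling `query(2)` returns the whole item collection for the album, while `query(2, "album1:")` narrows the result to its tracks only.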

Page 28: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

… or as Documents (JSON)

• JSON data types (M, L, BOOL, NULL)
• Index root attributes
• Fully Atomic Updates

Items (Primary Key: ProductID)

ProductID = 1
id = bookID, title = Ringworld, author = Larry Niven, genre = Science Fiction, publisher = Ballantine, datePublished = Oct-70, ISBN = 0-345-02046-4

ProductID = 2
id = albumID, title = Dark Side of the Moon, artist = Pink Floyd, genre = Progressive Rock
Attributes = { label: "Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd", tracks: [{ title: "Speak to Me", length: "1:30", music: "Mason", vocals: "Instrumental" }, { title: "Breathe", length: "2:43", music: "Waters, Gilmour, Wright", vocals: "Gilmour" }, { title: "On the Run", length: "3:30", music: "Gilmour, Waters", vocals: "Instrumental" }] }

ProductID = 3
id = movieID, title = Idiocracy, genre = Scifi Comedy, writer = Mike Judge
Attributes = { producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg" }, { name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg" }, { name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg" }] }

Page 29: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Scaling bottlenecks

Voters write to the Votes Table (Provision 200,000 WCUs)

Partition keys: Candidate A, Candidate B
Vote traffic: 50,000/sec and 70,000/sec

Partition 1: 1000 WCUs
Partition K: 1000 WCUs
Partition M: 1000 WCUs
Partition N: 1000 WCUs

Page 30: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Write sharding

Voter writes to the Votes Table

Sharded partition keys:
Candidate A_1 through Candidate A_8
Candidate B_1 through Candidate B_8

Page 31: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Write sharding

Voter writes to the Votes Table

UpdateItem: “CandidateA_” + rand(0, 10)
ADD 1 to Votes

Sharded partition keys:
Candidate A_1 through Candidate A_8
Candidate B_1 through Candidate B_8
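The write path can be sketched with an in-memory counter standing in for the Votes table. Two things here are assumptions rather than anything stated on the slides: each vote is taken to cost one WCU, and shards are numbered from 1 to `shard_count` (the slide's pseudocode uses `rand(0, 10)`, while its diagram shows suffixes 1 to 8).

```python
import math
import random
from collections import Counter

votes = Counter()  # stand-in for the Votes table: partition key -> Votes attribute


def min_shards(writes_per_sec: int, per_partition_wcu: int) -> int:
    """Smallest shard count that keeps each key within one partition's write budget.

    Assumes one vote consumes one WCU and that shards are hit uniformly.
    """
    return math.ceil(writes_per_sec / per_partition_wcu)


def sharded_key(candidate: str, shard_count: int) -> str:
    """Pick a random shard suffix, e.g. 'CandidateA_3'."""
    return f"{candidate}_{random.randint(1, shard_count)}"


def cast_vote(candidate: str, shard_count: int) -> str:
    """Model of UpdateItem with 'ADD 1 to Votes' on a randomly chosen shard."""
    key = sharded_key(candidate, shard_count)
    votes[key] += 1
    return key
```

Under those assumptions, a key receiving 70,000 writes per second against partitions that each sustain 1000 WCUs needs `min_shards(70000, 1000)`, that is 70 shards, before any single key stays within one partition's budget.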

Page 32: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Shard aggregation

Votes Table shards:
Candidate A_1 through Candidate A_8
Candidate B_1 through Candidate B_8

Periodic Process
1. Sum
2. Store

Candidate A Total: 2.5M

Voter
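The periodic process amounts to a sum over one candidate's shards followed by a single write of the total. The sketch below models both steps on a plain dictionary; the shard counts are made-up illustration values, and storing the total under the unsharded candidate name is an assumption about where the result lives.

```python
# Hypothetical per-shard counters, standing in for items in the Votes table.
votes_table = {
    "CandidateA_1": 120, "CandidateA_2": 95, "CandidateA_3": 101,
    "CandidateB_1": 80, "CandidateB_2": 77,
}


def aggregate(table: dict, candidate: str) -> int:
    """Step 1 (Sum): add up every shard whose key belongs to the candidate."""
    prefix = candidate + "_"
    return sum(count for key, count in table.items() if key.startswith(prefix))


def store_total(table: dict, candidate: str) -> int:
    """Step 2 (Store): write the sum under the unsharded candidate key."""
    total = aggregate(table, candidate)
    table[candidate] = total
    return total
```

Readers then fetch one item per candidate instead of fanning out across every shard, at the cost of totals that lag by up to one aggregation interval.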

Page 33: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Write Sharded Indexing

• Use for GSIs with high volume aggregations
• Common when low cardinality attributes must be indexed
• Scales to any size workload

Items (Primary Key: ProductID, type)

ProductID = 1
type = bookID: title = Ringworld, author = Larry Niven, genre = Science Fiction:1, publisher = Ballantine:50, datePublished = Oct-70, ISBN = 0-345-02046-4

ProductID = 2
type = albumID: title = Dark Side of the Moon, artist = Pink Floyd, genre = Progressive Rock:99, label = Harvest, studio = Abbey Road, released = 3/1/73, producer = Pink Floyd:2
type = albumID:trackID: title = Speak to Me, length = 1:30, music = Mason, vocals = Instrumental
type = albumID:trackID: title = Breathe, length = 2:43, music = Waters, Gilmour, Wright, vocals = Gilmour
type = albumID:trackID: title = On the Run, length = 3:30, music = Gilmour, Waters, vocals = Instrumental

ProductID = 3
type = movieID: title = Idiocracy, genre = Scifi Comedy:42, writer = Mike Judge, producer = 20th Century Fox:17
type = movieID:actorID: name = Luke Wilson, character = Joe Bowers, image = img2.jpg
type = movieID:actorID: name = Maya Rudolph, character = Rita, image = img3.jpg
type = movieID:actorID: name = Dax Shepard, character = Frito Pendejo, image = img1.jpg
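The suffixes on the indexed attributes (for example "Science Fiction:1") can be read as shard numbers appended to a low-cardinality value so that one popular value spreads across many index partition keys. The sketch below models that in memory; the shard count of 100 and the random suffix selection are assumptions, as the slide does not say how the suffixes are chosen.

```python
import random
from collections import defaultdict

SHARD_COUNT = 100  # assumed; the slide shows suffixes such as :1, :42, :99


def sharded_value(value: str, rng: random.Random) -> str:
    """Append a random shard suffix so one popular value spreads over many index keys."""
    return f"{value}:{rng.randrange(SHARD_COUNT)}"


def build_genre_index(items, rng):
    """Model a GSI partitioned on the sharded genre attribute."""
    index = defaultdict(list)
    for item in items:
        index[sharded_value(item["genre"], rng)].append(item["title"])
    return index


def titles_for_genre(index, genre):
    """Read path: query every shard of the genre and merge the results."""
    titles = []
    for shard in range(SHARD_COUNT):
        titles.extend(index.get(f"{genre}:{shard}", []))
    return sorted(titles)
```

The trade-off is on the read side: writes spread evenly, while a reader has to issue one query per shard and merge the results.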

Page 34: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Conclusion

Keys to Success

• Select a workload that is a good fit for NoSQL
• Understand the source data and access patterns
• Test thoroughly and often
• Plan on an iterative migration process

Page 35: SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series

Thank you!