introduction to azure documentdb

Introduction to Azure DocumentDB Denny Lee, Principal Program Manager, Azure DocumentDB

Upload: denny-lee

Post on 11-Feb-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Introduction to Azure DocumentDB

Introduction to Azure DocumentDB

Denny Lee, Principal Program Manager, Azure DocumentDB

Page 2: Introduction to Azure DocumentDB

Denny Lee
• Principal Program Manager for Azure DocumentDB
• 20+ years of experience in databases, distributed systems, data sciences, and software development at Microsoft, Concur, and Databricks
• Notable Projects:
  • Project Isotope: Incubation team for HDInsight
  • Yahoo! 24TB cube: Largest SSAS cube in production

@dennylee

Page 3: Introduction to Azure DocumentDB

A Brief Overview...

Page 4: Introduction to Azure DocumentDB

{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [
    { "name": "SmugMug", "permalink": "smugmug" }
  ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}

Perfect for these

Documents

Schema-agnostic JSON store for hierarchical and de-normalized data at scale
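A minimal Python sketch of what "schema-agnostic" and "de-normalized" mean for a document like the one on this slide: the nested offices travel with the parent record, and a document with a different shape can sit next to it. This is plain Python, not the DocumentDB SDK; the second document and the helper `office_cities` are hypothetical and exist only for illustration.

```python
import json

# The company document from the slide (abbreviated to the fields used here).
smugmug = json.loads("""
{
  "name": "SmugMug",
  "category_code": "photo_video",
  "products": [{"name": "SmugMug", "permalink": "smugmug"}],
  "offices": [{"city": "Mountain View", "state_code": "CA", "country_code": "USA"}]
}
""")

# Hypothetical second document with a different shape: no "offices" key at all.
other = {"name": "ExampleCo", "category_code": "web"}


def office_cities(doc):
    """Return the cities of all offices, tolerating documents without offices."""
    return [office["city"] for office in doc.get("offices", []) if "city" in office]


for doc in (smugmug, other):
    print(doc["name"], office_cities(doc))
```

Because the offices are embedded, reading them needs no join; because the helper uses `doc.get`, documents that lack the field are handled without a schema change.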

Page 5: Introduction to Azure DocumentDB

Not these documents

Page 6: Introduction to Azure DocumentDB

{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [
    { "name": "SmugMug", "permalink": "smugmug" }
  ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}

Perfect for these

Documents

Schema-agnostic JSON store for hierarchical and de-normalized data at scale

Page 7: Introduction to Azure DocumentDB

Elastically Scalable Throughput + Storage

Page 8: Introduction to Azure DocumentDB

Guaranteed low latency

Reads <10ms @ P99
Writes <15ms @ P99

Page 9: Introduction to Azure DocumentDB

Globally Distributed

Page 10: Introduction to Azure DocumentDB

Speaks your language

Page 11: Introduction to Azure DocumentDB

DocumentDB Query Playground

Demo

Code: https://www.documentdb.com/sql/demo

Page 12: Introduction to Azure DocumentDB

A Primer on Scale...

Page 13: Introduction to Azure DocumentDB

The 4 Vs of Big Data

• Volume: Exceeds physical limits of vertical scalability
• Variety: Many different formats making integration expensive
• Velocity: Small decision window compared to data change rate
• Variability: Many options or variables confounding analysis

Page 14: Introduction to Azure DocumentDB

The 4 Vs of Big Data

Volume, Variety, Velocity, Variability

Mobile Apps, Retail, Learning, Telematics, IoT, Gaming

Page 15: Introduction to Azure DocumentDB

Let’s talk about scale.

Volume and Velocity

Page 16: Introduction to Azure DocumentDB
Page 17: Introduction to Azure DocumentDB

More users, more problems

Ability to Scale from Day 1
• Bursty
• Unpredictable traffic

Gaming + Social Experience
• Lag-free
• Responsive experiences

Move fast without breaking things
• Iterative development needs

Page 18: Introduction to Azure DocumentDB

The Walking Dead, results

• Game scores, guilds and social membership
• Leaderboards by country and social
• Guild management and messaging
• #1 in Apple app store for free apps

<10ms P99 query latency
>1M game downloads
~1B requests / day

Page 19: Introduction to Azure DocumentDB

The right tool for the job?

Caches
• Scores are continuously updated
• Write heavy without locality

RDBMS
• Scale-out requires partitioning
• Schema and index management

Other NoSQL Stores
• Longer tail on latencies
• Need to specify secondary indexes for lookups

Page 20: Introduction to Azure DocumentDB

The answer for low latency @ massive scale

+ Azure DocumentDB
• Fully managed NoSQL database
• Horizontal scaling for TB and RPS
• High performance, write optimized
• Schema agnostic indexing

Page 21: Introduction to Azure DocumentDB

Fact: Managing shards is really painful.

Managing shards or partitions

Good news: DocumentDB has done all the heavy lifting.

Page 22: Introduction to Azure DocumentDB

Elastic scale

Page 23: Introduction to Azure DocumentDB

Measuring Throughput (Request Units)

• Request Unit/sec (RU) is the normalized currency (% IOPS, % CPU, % Memory)
• Replica gets a fixed budget of request units
• Operations consume request units (RUs):
  • READ: GET Document
  • INSERT: POST Documents
  • REPLACE: PUT Document
  • Query: POST Documents
• Requests get rate limited if they exceed the SLA
• Customers pay for reserved request units by the hour

[Figure: incoming requests plotted against a replica's budget between Min RU/sec and Max RU/sec, with regions labeled "Replica Quiescent", "No throttling" and "Rate limit"]
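The budget idea on this slide can be sketched in a few lines of Python. This is an illustration of a fixed per-second RU budget only, not how the service is implemented: the class `RequestUnitBudget`, the one-second window, and the charges in `CHARGES` are all assumptions made for the example (real charges depend on document size and the query).

```python
class RequestUnitBudget:
    """Per-second request-unit budget for one replica (illustrative only)."""

    def __init__(self, reserved_ru_per_sec):
        self.reserved = reserved_ru_per_sec
        self.window = None   # the one-second window currently being counted
        self.used = 0.0      # RUs consumed so far in that window

    def try_consume(self, charge, now):
        """Return True if the request fits in this second's budget, else False."""
        second = int(now)
        if second != self.window:   # a new second starts with a fresh budget
            self.window = second
            self.used = 0.0
        if self.used + charge > self.reserved:
            return False            # rate limited: this second's budget is spent
        self.used += charge
        return True


# Hypothetical RU charges per operation type.
CHARGES = {"read": 1.0, "insert": 10.0, "replace": 10.0, "query": 5.0}

budget = RequestUnitBudget(reserved_ru_per_sec=25)
operations = ("insert", "insert", "query", "read")
results = [budget.try_consume(CHARGES[op], now=0.2) for op in operations]
print(results)                                          # last request is rejected
print(budget.try_consume(CHARGES["insert"], now=1.1))   # next second: accepted again
```

The first three operations add up to exactly the reserved 25 RUs, so the fourth is rate limited until the next second begins.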

Page 24: Introduction to Azure DocumentDB

Elastic Scale

Demo

Code: https://aka.ms/docdb-benchmark

Page 25: Introduction to Azure DocumentDB

Configured @10,100 RUs

~940 writes / s
~9800 RUs

Page 26: Introduction to Azure DocumentDB

Configured @250,000 RUs

~12,100 writes / s
~128,800 RUs
VM @ 99% CPU
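A quick back-of-the-envelope check of the two benchmark runs, using only the figures shown on the slides. Both runs come out at roughly 10 RUs per write. The second run uses only about half of its configured budget; since the slide notes the VM at 99% CPU, one plausible reading (an inference, not something the slide states) is that the benchmark client rather than the collection was the limiting factor.

```python
# Figures read off the two benchmark slides:
# (configured RU/s, observed writes/s, observed consumed RU/s).
runs = {
    "10,100 RU/s": (10_100, 940, 9_800),
    "250,000 RU/s": (250_000, 12_100, 128_800),
}

summary = {}
for label, (configured, writes_per_sec, consumed) in runs.items():
    ru_per_write = consumed / writes_per_sec   # average cost of one write
    utilization = consumed / configured        # share of the reserved budget in use
    summary[label] = (ru_per_write, utilization)
    print(f"{label}: {ru_per_write:.1f} RU per write, {utilization:.0%} of budget used")
```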

Page 27: Introduction to Azure DocumentDB

A Global Distribution Primer…

Page 28: Introduction to Azure DocumentDB

Globally Distributed

Azure DocumentDB gives you the ability to circumvent the speed of light!

High Availability and Disaster Recovery

Replicate to any number of regions

Global low latency access

Dynamically configure write and read regions

Page 29: Introduction to Azure DocumentDB

… with well-defined consistency models!

Consistency Level | Strong | Bounded Staleness | Session | Eventual
Total global order | Yes | Yes (outside of the “staleness window”) | No, partial “session” order | No
Consistent prefix guarantee | Yes | Yes | Yes | Yes
Monotonic reads | Yes | Yes (within region and across regions outside of the staleness window) | Yes (for the given session) | No
Monotonic writes | Yes | Yes | Yes | Yes
Read your writes | Yes | Yes (in the write region) | Yes | No

(Left: stronger consistency. Right: faster performance.)
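The consistency table can be read as a lookup: pick the weakest (and therefore fastest) level that still offers the guarantees an application needs. The Python sketch below encodes the slide's table under a simplifying assumption: qualified entries such as "Yes (outside of the staleness window)" are recorded as plain `True`. The names `GUARANTEES` and `weakest_level` are made up for this example.

```python
# Consistency levels from the slide, strongest first.
LEVELS = ["Strong", "Bounded Staleness", "Session", "Eventual"]


def row(strong, bounded, session, eventual):
    """Build one table row keyed by consistency level."""
    return dict(zip(LEVELS, (strong, bounded, session, eventual)))


# Guarantees per level, transcribed from the slide (qualifiers dropped).
GUARANTEES = {
    "total_global_order": row(True, True, False, False),
    "consistent_prefix": row(True, True, True, True),
    "monotonic_reads": row(True, True, True, False),
    "monotonic_writes": row(True, True, True, True),
    "read_your_writes": row(True, True, True, False),
}


def weakest_level(required):
    """Return the weakest level that still offers every required guarantee."""
    for level in reversed(LEVELS):  # try the fastest level first
        if all(GUARANTEES[name][level] for name in required):
            return level
    return LEVELS[0]  # Strong offers every guarantee in the table


print(weakest_level(["read_your_writes"]))
print(weakest_level(["total_global_order", "monotonic_reads"]))
```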

Page 30: Introduction to Azure DocumentDB

Global Distribution

Demo

Code: https://aka.ms/docdb-latency-script-nodejs

Page 31: Introduction to Azure DocumentDB
Page 32: Introduction to Azure DocumentDB
Page 33: Introduction to Azure DocumentDB
Page 34: Introduction to Azure DocumentDB

Common Scenarios

Page 35: Introduction to Azure DocumentDB

Common scenarios

• Retail: Product Catalog, Recommendations, Personalization
• Gaming: User Store, Recommendations, Personalization
• IoT: Event Store, Device Registry, Telemetry Store
• Social: User Behavior, Telemetry, Personalization

Page 36: Introduction to Azure DocumentDB

Common scenarios

IoT: Event Store, Device Registry, Telemetry Store

IoT / Sensor Data Challenges:

• Hardware is relatively hard to update
• Different generations of devices => different schemas (variety)
• Many sensors emitting telemetry => high rate of ingestion (volume + variety)

Page 37: Introduction to Azure DocumentDB

IoT : Vehicle Telematics

Top 5 Automotive Manufacturer in the World

Telematics services include:
• Safety service
• Diagnostic service
• Remote service

Ingest and query 100+ TB of semi-structured data

Page 38: Introduction to Azure DocumentDB

IoT : Vehicle Telematics

Ingress API

• Inbound Interface (Web API)
• Raw Event Store (Hot) (DocumentDB)
• Aggregated Event Store (Warm) (DocumentDB)
• Aggregated Event Store (Cold) (Blob Storage)
• Outbound Interface (Web API)
• Message Queue (Event Hubs)
• Stream Processor (Stream Analytics)

Page 39: Introduction to Azure DocumentDB

Common scenarios

Social: User Behavior, Telemetry, Personalization

Social + AdTech Challenges:

• Ingest + Analyze Third Party Data
  => Who dictates schema? (variety)
  => How do you index?
• A lot of social and user data
  => high rate of ingestion (volume + variety)

Page 40: Introduction to Azure DocumentDB

Social Analytics + Ad Technology

• Startup - Advanced Marketing Intelligence Platform
• Utilizes deep learning to analyze billions of relational network connections to build a social fingerprint for each user
• Extracts knowledge and cultural insights by analyzing what people choose to follow

>1B Social Media Profiles
>50M Tweets per Day

Page 41: Introduction to Azure DocumentDB

Social Analytics + Ad Technology

• Store tweets, geo-location data, and ML results in DocumentDB
• Data from each social media producer has its own schema that evolves independently
• Need to iterate rapidly… no time for managing VMs

>1B Social Media Profiles
>50M Tweets per Day

Page 42: Introduction to Azure DocumentDB

Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them.

Stephen Hankinson, CTO, Affinio


Page 43: Introduction to Azure DocumentDB

Geospatial Support, including polygons

Demo

Want to try? Go to DocumentDB Query Playground
https://www.documentdb.com/sql/demo

Page 44: Introduction to Azure DocumentDB

Polygon Query Example
https://www.keene.edu/campus/maps/tool/

Polygon of coordinates
-124.630000, 48.360000
-123.870000, 46.140000
-122.230000, 45.540000
-119.170000, 45.950000
-116.920000, 45.960000
-116.990000, 49.000000
-123.050000, 49.020000
-123.150000, 48.310000
-124.630000, 48.360000
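To make the polygon concrete, here is a small Python sketch that wraps the slide's coordinates in a GeoJSON-style polygon (positions are longitude, latitude) and tests whether a point falls inside it. In DocumentDB itself this kind of test would be expressed with a spatial query function such as ST_WITHIN; the ray-casting routine below is a plain-Python stand-in, not the service's implementation. The Seattle and Portland coordinates are approximate values supplied for the example.

```python
# Outer ring from the slide as (longitude, latitude) pairs; first and last match.
ring = [
    (-124.63, 48.36), (-123.87, 46.14), (-122.23, 45.54),
    (-119.17, 45.95), (-116.92, 45.96), (-116.99, 49.00),
    (-123.05, 49.02), (-123.15, 48.31), (-124.63, 48.36),
]

# GeoJSON-style polygon built from that ring.
polygon = {"type": "Polygon", "coordinates": [[list(p) for p in ring]]}


def point_in_ring(lon, lat, ring):
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:         # crossing lies to the east of the point
                inside = not inside
    return inside


seattle = (-122.33, 47.61)    # approximate, assumed for illustration
portland = (-122.68, 45.52)   # approximate, assumed for illustration

print("Seattle inside:", point_in_ring(*seattle, ring))
print("Portland inside:", point_in_ring(*portland, ring))
```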

Page 45: Introduction to Azure DocumentDB

Finding Volcanos with DocumentDB

https://www.documentdb.com/sql/demo

Page 46: Introduction to Azure DocumentDB

Data Sciences: Apache Spark + DocumentDB

Page 47: Introduction to Azure DocumentDB

Example: Graph Structures

Page 48: Introduction to Azure DocumentDB

Example: Graph Structures

Page 49: Introduction to Azure DocumentDB

Classic Graph Scenario: Flights

vertices = airports

edges = flights

Page 50: Introduction to Azure DocumentDB

Data Sciences: Apache Spark + DocumentDB

Demo

Notebook View: https://aka.ms/docdb-spark-graphpyView: https://aka.ms/pydocdb-spark-graph
Code: https://aka.ms/docdb-spark-graph-code

Page 51: Introduction to Azure DocumentDB

Graph Calculations: Degrees, PageRank

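What is the most important airport (most flights in / out)?

from pyspark.sql.functions import desc

# tripGraph: GraphFrame with airports as vertices and flights as edges
tripGraph.inDegrees\
    .sort(desc("inDegree"))\
    .limit(10)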
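For readers without a Spark cluster at hand, the same in-degree ranking can be sketched in plain Python. This is not GraphFrames, only the same idea on a tiny, made-up edge list: count how many flights arrive at each airport and sort descending. The `flights` data is hypothetical.

```python
from collections import Counter

# Hypothetical flight edges as (origin, destination) airport codes.
flights = [
    ("SEA", "SFO"), ("SEA", "JFK"), ("SFO", "SEA"),
    ("JFK", "SEA"), ("PDX", "SEA"), ("SFO", "JFK"),
]

# In-degree = flights arriving at an airport; out-degree = flights leaving it.
in_degree = Counter(dst for _, dst in flights)
out_degree = Counter(src for src, _ in flights)

# Top 10 airports by inbound flights, ties broken alphabetically.
top_inbound = sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:10]
print(top_inbound)
```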

Page 52: Introduction to Azure DocumentDB

Advantages: Data Science Scenarios

• Blazing Fast IoT Scenarios

• Updateable columns

• Push-down predicate filtering

Page 53: Introduction to Azure DocumentDB

Advantages: Blazing Fast IoT Scenarios

Flight information

Global safety alerts

weather

Data Science Scenarios

Device Notifications

Web / REST API

Page 54: Introduction to Azure DocumentDB

Advantages: Updateable Columns

Flight information

Data Science Scenarios

Device Notifications

Web / REST API

{ tripid: “100100”, delay: -5, time: “01:00:01”}

{ tripid: “100100”, delay: -30, time: “01:00:01”}

{delay:-30}

{delay:-30}

{delay:-30}
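The slide shows trip 100100 changing its delay from -5 to -30, with the change `{delay:-30}` flowing on to consumers. A minimal Python sketch of that idea follows, using a toy in-memory dictionary rather than the DocumentDB SDK; the names `collection` and `update_fields` are hypothetical.

```python
# A toy in-memory "collection" keyed by tripid; not the DocumentDB SDK.
collection = {
    "100100": {"tripid": "100100", "delay": -5, "time": "01:00:01"},
}


def update_fields(store, doc_id, changes):
    """Merge the changed fields into an existing document and return the result."""
    updated = {**store[doc_id], **changes}  # untouched fields are carried over
    store[doc_id] = updated
    return updated


update_fields(collection, "100100", {"delay": -30})
print(collection["100100"])
```

Only the `delay` field changes; `tripid` and `time` are carried over unchanged, which is the point of an updateable record compared with appending a whole new row.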

Page 55: Introduction to Azure DocumentDB

Advantages: Pushdown Predicate Filtering (Data Science Scenarios)

[Figure: a document tree with nodes locations, headquarter and exports (countries Germany, France, Belgium; cities Seattle, Paris, Moscow, Athens). The filter {city:SEA} is pushed down to the store, which returns only the matching documents.]

{city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ...
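Pushdown predicate filtering means the filter is evaluated where the data lives, so only matching documents cross the wire. The Python sketch below contrasts the two approaches on a made-up list of flight documents; `SOURCE`, `scan` and `is_sea` are hypothetical names, and the code simulates the behavior rather than calling Spark or DocumentDB.

```python
# Hypothetical flight documents held by the source store.
SOURCE = [
    {"city": "SEA", "dst": "POR"},
    {"city": "SEA", "dst": "JFK"},
    {"city": "PDX", "dst": "SFO"},
    {"city": "SEA", "dst": "SFO"},
    {"city": "JFK", "dst": "SEA"},
]


def scan(predicate=None):
    """Simulate the source: apply the predicate before anything leaves the store."""
    return [doc for doc in SOURCE if predicate is None or predicate(doc)]


def is_sea(doc):
    """The filter from the slide: flights whose city is SEA."""
    return doc["city"] == "SEA"


# Without pushdown: every document is transferred, then filtered by the client.
transferred_all = scan()
client_filtered = [doc for doc in transferred_all if is_sea(doc)]

# With pushdown: the predicate travels to the source; only matches are transferred.
pushed_down = scan(is_sea)

print(len(transferred_all), "documents moved without pushdown")
print(len(pushed_down), "documents moved with pushdown")
```

Both paths produce the same result set; the difference is how many documents had to be moved to get it.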

Page 57: Introduction to Azure DocumentDB

More Resources

AskDocDB@microsoft

Follow @DocumentDB
Use #DocumentDB

documentdb.com

#azure-documentDB