[pass summit 2016] blazing fast, planet-scale customer scenarios with azure documentdb
TRANSCRIPT
Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB
Denny Lee
Program Manager
Azure DocumentDB
@dennylee
Andrew Liu
Program Manager
Azure DocumentDB
@aliuy8
{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [
    {
      "name": "SmugMug",
      "permalink": "smugmug"
    }
  ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}
Perfect for these documents
Schema-agnostic JSON store for hierarchical and de-normalized data at scale
Choose the right tools for the right job
Microsoft Data Platform:
SQL Server 2016
SQL Database
Azure DocumentDB
Azure Search
Azure HDInsight
Azure Data Lake
Azure DW APS
Azure Stream Analytics
Azure Data Factory
Azure ML
Azure Data Catalog
Power BI
Fact: Managing shards is really painful.
Managing shards or partitions
Good news: DocumentDB has done all the heavy lifting.
Request Unit (RU) is the normalized currency
% Memory, % IOPS, % CPU
Replica gets a fixed budget of Request Units
[Diagram: resources and resource sets; documents, SQL, sprocs, args]
Predictable Performance
Request units
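The budgeting idea above can be illustrated with a little arithmetic. This is only a sketch: the per-operation RU costs and the provisioned budgets below are made-up numbers, not published DocumentDB charges, and `ru_per_second` and `fits_budget` are hypothetical helpers.

```python
# Hypothetical RU charges per operation type (illustrative values only).
RU_COST = {"read": 1.0, "write": 5.0, "query": 10.0}


def ru_per_second(workload):
    """Total RUs consumed per second by a workload.

    `workload` maps an operation type to how many of them run per second.
    """
    return sum(RU_COST[op] * rate for op, rate in workload.items())


def fits_budget(workload, provisioned_ru_per_sec):
    """True if the workload stays within the fixed per-second RU budget."""
    return ru_per_second(workload) <= provisioned_ru_per_sec


workload = {"read": 500, "write": 50, "query": 20}
print(ru_per_second(workload))      # 500*1 + 50*5 + 20*10 = 950.0
print(fits_budget(workload, 1000))  # True
print(fits_budget(workload, 900))   # False
```

Because every operation is priced in the same unit, a mixed workload can be checked against one number instead of separate CPU, memory, and IOPS limits.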
… with well-defined consistency models!
Strong, Bounded Staleness, Session, Eventual
LEFT TO RIGHT: Relaxed consistency => better performance and availability
Consistency Level | Strong | Bounded Staleness | Session | Eventual
Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No
Consistent prefix guarantee | Yes | Yes | Yes | Yes
Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No
Monotonic writes | Yes | Yes | Yes | Yes
Read your writes | Yes | Yes (in the write region) | Yes | No
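The table can be read as a lookup: pick the most relaxed level that still gives the guarantees an application needs. The sketch below encodes it that way. It is a simplification, not an API: qualified answers such as "Yes, outside of the staleness window" are counted as yes, "No, partial session order" is counted as no, and `weakest_level` is a hypothetical helper.

```python
# Consistency levels, strongest first, as listed in the table.
LEVELS = ["Strong", "Bounded Staleness", "Session", "Eventual"]

ALL = {
    "total_global_order",
    "consistent_prefix",
    "monotonic_reads",
    "monotonic_writes",
    "read_your_writes",
}

# Simplified transcription of the table: qualified "Yes, ..." cells are
# treated as True, so the caveats in the table still apply in practice.
GUARANTEES = {
    "Strong": set(ALL),
    "Bounded Staleness": set(ALL),
    "Session": ALL - {"total_global_order"},
    "Eventual": {"consistent_prefix", "monotonic_writes"},
}


def weakest_level(required):
    """Most relaxed level whose guarantees cover everything in `required`."""
    for level in reversed(LEVELS):  # try Eventual first, Strong last
        if set(required) <= GUARANTEES[level]:
            return level
    raise ValueError("no level provides: %s" % sorted(required))


print(weakest_level({"monotonic_writes"}))    # Eventual
print(weakest_level({"read_your_writes"}))    # Session
print(weakest_level({"total_global_order"}))  # Bounded Staleness
```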
Observed Distribution (chart)
Consistency levels: Bounded Staleness, Eventual, Session, Strong
Segment values: 27%, 3%, 54%, 16%
Global Distribution Demo
Code: https://aka.ms/docdb-latency-script-nodejs
Item | Color | Microwave Safe | Liquid Capacity
Geek Mug | Graphite | Yes | 16oz
Coffee Bean Mug | Tan | No | 12oz
Problem 2: Variety
Item | Color | Microwave Safe | Liquid Capacity
Geek Mug | Graphite | Yes | 16oz
Coffee Bean Mug | Tan | No | 12oz
Surface Book | Gray | ??? | ???
Variety: Different attributes
Item | Color | Microwave Safe | Liquid Capacity | CPU | Memory | Storage
Geek Mug | Graphite | Yes | 16oz | ??? | ??? | ???
Coffee Bean Mug | Tan | No | 12oz | ??? | ??? | ???
Surface Book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
Variety: More columns?
Item | Color | Microwave Safe | Liquid Capacity
Geek Mug | Graphite | Yes | 16oz
Coffee Bean Mug | Tan | No | 12oz
Variety: More tables?
Item | CPU | Memory | Storage
Surface Book | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
ProductId | Name
1 | Geek Mug
2 | Coffee Bean Mug
3 | Surface Book
Variety: Master data?
ProductId | Attribute | Value
1 | Microwave Safe | Yes
1 | Liquid Capacity | 16oz
… | … | …
2 | Microwave Safe | No
2 | Liquid Capacity | 12oz
… | … | …
3 | CPU | 3.4 GHz Intel Skylake Core i7-6600U
3 | Memory | 16GB
… | … | …
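For contrast with the relational options above, the same products can be held as JSON documents in which each item carries only its own attributes. The sketch below pivots attribute rows like the ones in the master-data table into such documents. It uses plain Python only: `to_documents` and the `id` field are illustrative assumptions, and the rows are limited to values that appear in the slides.

```python
import json

products = {1: "Geek Mug", 2: "Coffee Bean Mug", 3: "Surface Book"}

# (ProductId, Attribute, Value) rows, using only values shown in the slides.
attribute_rows = [
    (1, "Color", "Graphite"),
    (1, "Microwave Safe", "Yes"),
    (1, "Liquid Capacity", "16oz"),
    (2, "Color", "Tan"),
    (2, "Microwave Safe", "No"),
    (2, "Liquid Capacity", "12oz"),
    (3, "Color", "Gray"),
    (3, "CPU", "3.4 GHz Intel Skylake Core i7-6600U"),
    (3, "Memory", "16GB"),
    (3, "Storage", "1 TB SSD"),
]


def to_documents(products, rows):
    """Build one document per product, holding only that product's attributes."""
    docs = {pid: {"id": str(pid), "name": name} for pid, name in products.items()}
    for pid, attribute, value in rows:
        docs[pid][attribute] = value
    return list(docs.values())


documents = to_documents(products, attribute_rows)
for doc in documents:
    print(json.dumps(doc))
```

No document needs a placeholder for an attribute that does not apply to it, so the "???" cells disappear without adding tables or joins.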
Retail
• Product Catalog
• Product Recommendations + Personalization
Gaming
• Multiplayer + Social Gameplay
IoT / Sensor Data
• Telemetry + Event Store
• Device Registry
Social Analytics + Ad Technology
• User behavior telemetry
• 3rd-Party Data from Web Crawlers
Common scenarios
IoT / Sensor Data Challenges:
• Hardware is relatively hard to update
• Different generations of devices => different schema (Variety)
• Lots of sensors emitting telemetry => high rate of ingestion (Volume + Velocity)
Common Scenarios
Social Analytics + Ad Technology:
• Ingest + Analyze 3rd-Party Data => Who dictates schema? How do you index? (Variety)
• Lots of social / user profiles => high rate of ingestion (Volume + Velocity)
Social Analytics + Ad Technology
>1B Social Media Profiles
>50M Tweets per Day
Before moving to DocumentDB, my developers would
need to come to me to confirm that our Elasticsearch
deployment would support their data or if I would need
to scale things to handle it. DocumentDB removed me
as a bottleneck, which has been great for me and them.
-Stephen Hankinson, CTO, Affinio
Flight Graph with
Spark and DocumentDB
Notebook
View: https://aka.ms/docdb-spark-graph
Code: https://aka.ms/docdb-spark-graph-code
Demo
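Understanding the most important airport (most flights in / out)
# display() is assumed (notebook helper), implied by the unmatched closing parenthesis
from pyspark.sql.functions import desc
display(tripGraph.inDegrees\
.sort(desc("inDegree"))\
.limit(10))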
Graph Calculations: Degrees, PageRank
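To make the two calculations concrete without a Spark cluster, here is a minimal pure-Python sketch of in-degree counting and a basic power-iteration PageRank. It is not the notebook's code: the flight list is hypothetical, and `in_degrees` and `pagerank` are illustrative helpers rather than GraphFrames calls.

```python
from collections import Counter

# Hypothetical (origin, destination) flights; not data from the demo notebook.
flights = [
    ("SEA", "SFO"), ("SEA", "JFK"), ("SFO", "SEA"),
    ("JFK", "SEA"), ("SFO", "JFK"), ("JFK", "SFO"), ("PDX", "SEA"),
]


def in_degrees(edges):
    """Number of inbound flights per airport."""
    return Counter(dst for _, dst in edges)


def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_edges = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outbound edges spreads its rank evenly.
            targets = out_edges[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


degrees = in_degrees(flights)
ranks = pagerank(flights)
print(degrees.most_common(1))        # [('SEA', 3)]
print(max(ranks, key=ranks.get))     # SEA
```

In-degree only counts inbound flights, while PageRank also weighs how important the origin airports are, which is why the two can rank airports differently on real data.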
• Blazing Fast IoT Scenarios
• Updateable columns
• Push-down predicate filtering
Advantages of DocumentDB in Data Science Scenarios
Advantages: Blazing Fast IoT Scenarios
Flight information
Global safety alerts
Weather
Data Science Scenarios
Device Notifications
Web / REST API
Advantages: Updateable Columns
Flight information
Data Science Scenarios
Device Notifications
Web / REST API
{ tripid: "100100", delay: -5, time: "01:00:01" }
{ tripid: "100100", delay: -30, time: "01:00:01" }
{delay:-30}
{delay:-30}
{delay:-30}
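The update shown above, where the delay for trip 100100 changes from -5 to -30 in place, can be sketched with an in-memory stand-in for a collection. This is an illustration only: `collection` and `upsert` are hypothetical names, not the DocumentDB SDK.

```python
# In-memory stand-in for a document collection keyed by tripid (hypothetical).
collection = {}


def upsert(collection, doc):
    """Insert the document, or merge its fields into the existing one."""
    existing = collection.setdefault(doc["tripid"], {})
    existing.update(doc)
    return existing


# Initial flight record, then a later correction to the delay only.
upsert(collection, {"tripid": "100100", "delay": -5, "time": "01:00:01"})
upsert(collection, {"tripid": "100100", "delay": -30})

print(collection["100100"])
# {'tripid': '100100', 'delay': -30, 'time': '01:00:01'}
```

Every consumer that reads the document afterwards sees the new delay, with no need to append a second record and reconcile the two later.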
Advantages: Pushdown Predicate Filtering
Data Science Scenarios
{city:SEA}
[Diagram: document index tree. locations: [0] country Germany, city Seattle; [1] country France, city Paris. headquarter: Belgium. exports: [0] city Moscow; [1] city Athens.]
{city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ...
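The benefit of pushing the {city:SEA} filter down to the store is that only matching documents travel to the analytics side. The sketch below simulates that with a list standing in for the source. The documents, `fetch`, and `is_sea` are hypothetical, and a real store would answer the filtered request from its index rather than by scanning.

```python
# Hypothetical flight documents held by the "source" store.
SOURCE = [
    {"city": "SEA", "dst": "POR"},
    {"city": "SEA", "dst": "JFK"},
    {"city": "SFO", "dst": "SEA"},
    {"city": "JFK", "dst": "SFO"},
    {"city": "SEA", "dst": "YVR"},
]


def fetch(predicate=None):
    """Return (documents, number of documents transferred) from the source.

    With a predicate, filtering happens at the source before transfer.
    """
    docs = [d for d in SOURCE if predicate is None or predicate(d)]
    return docs, len(docs)


def is_sea(doc):
    return doc["city"] == "SEA"


# Without pushdown: transfer everything, then filter on the client.
all_docs, transferred_all = fetch()
client_filtered = [d for d in all_docs if is_sea(d)]

# With pushdown: the source applies the filter, so less data moves.
pushed, transferred_pushed = fetch(is_sea)

print(transferred_all, transferred_pushed)  # 5 3
print(client_filtered == pushed)            # True
```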
More Resources / Coming Soon
Want to know more about the Spark-to-DocumentDB Connector?
Have any other questions?