An Overview of Cloud Computing @ Yahoo!
Raghu Ramakrishnan
Chief Scientist, Audience and Cloud Computing
Research Fellow, Yahoo! Research
Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata
and joint work with the Sherpa team, in particular:
Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research
Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI
Questions
• What is cloud computing?
– Horizontal and functional services
• What’s it going to change?
– Software business models, science, life
• How many clouds will there be?
– 1, 2, 3, infinity
• What’s new in cloud computing?
– HPC grids, ASPs, hosted services, Multics (!)
– Emerging “cloud stack” to support a broad class of programs, including data intensive applications
SCENARIOS
Pie-in-the-sky
Living in the Clouds
• We want to start a new website, FredsList.com
• Our site will provide listings of items for sale, jobs, etc.
• As time goes on, we’ll add more features
– And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed
• List of capabilities/components is illustrative, not exhaustive
• Our cloud provides a “dataset” abstraction
– FredsList doesn’t worry about the underlying components
Step 1: Listings Scenario
FredsList wants to store listings as (key, category, description)
1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book
DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text )
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS)
Step 2: System Evolution
Fred belatedly realizes prices are useful information!
ALTER DATASET Listings ADD (Price Float)
Schemas are flexible, and evolve
1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book
vs.
32138, camera, Nikon D40, USD 300
Not every record in a dataset has values defined for all fields declared for the dataset
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS)
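A minimal sketch of the flexible-schema idea, assuming records are held as plain Python dictionaries. The helpers `declare_dataset`, `alter_dataset_add` and `insert` are hypothetical and are not the PNUTS API; they only illustrate that adding a field changes the schema while existing records simply have no value for it.

```python
# Hypothetical in-memory stand-in for a "dataset"; not the PNUTS API.

def declare_dataset(fields):
    """Create a dataset with a declared schema and no records."""
    return {"fields": dict(fields), "records": {}}

def alter_dataset_add(dataset, name, type_name):
    """Add a field to the schema; existing records are left untouched."""
    dataset["fields"][name] = type_name

def insert(dataset, key, **values):
    """Store a record; it may omit any declared non-key field."""
    unknown = set(values) - set(dataset["fields"])
    if unknown:
        raise KeyError(f"undeclared fields: {sorted(unknown)}")
    dataset["records"][key] = values

listings = declare_dataset(
    {"ID": "String", "Category": "String", "Description": "Text"})
insert(listings, "1234323", Category="transportation",
       Description="For sale: one bicycle, barely used")

alter_dataset_add(listings, "Price", "Float")
insert(listings, "32138", Category="camera",
       Description="Nikon D40", Price=300.0)

# Old records have no Price; readers must treat the field as optional.
old_price = listings["records"]["1234323"].get("Price")
new_price = listings["records"]["32138"].get("Price")
```

Treating a missing field as "no value" rather than rewriting every stored record is what keeps the schema change cheap in this sketch.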
Step 3: Search
FredsList’s customers quickly ask for keyword search
Example queries: “bicycle”, “dvd’s”, “nanny”
ALTER Listings SET Description SEARCHABLE
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS); Search (Vespa); Messaging (Tribble)
Federation of systems offering different capabilities
Step 4: Photos
FredsList decides to add photos/videos to listings
ALTER Listings ADD Photo BLOB
Foreign key: photo → listing
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS); Search (Vespa); Storage (MObStor); Messaging (Tribble)
Federation of systems offering different performance points
Step 5: Data Analysis
FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.
ALTER Listings MAKE ANALYZABLE
Batch export
Compute (Grid):
• Pig query to analyze categories
• Hadoop program to geocode data
• Hadoop program to generate fancy pages for listings
Foreign key: photo → listing
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS); Search (Vespa); Storage (MObStor); Messaging (Tribble); Compute (Grid)
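As a rough illustration of the "analyze categories" job, the sketch below counts listings per category in the map/group/reduce style that a Pig query or Hadoop program would use. It runs in a single Python process; the function names and the extra sample row are assumptions made for the example.

```python
from collections import defaultdict

listings = [
    ("1234323", "transportation", "For sale: one bicycle, barely used"),
    ("5523442", "childcare", "Nanny available in San Jose"),
    ("215534", "wanted", "Looking for issue 1 of Superman comic book"),
    ("32138", "camera", "Nikon D40"),
    ("99001", "wanted", "Looking for a used couch"),  # made-up extra row
]

def map_phase(record):
    """Emit (category, 1) for each listing."""
    _key, category, _description = record
    yield category, 1

def reduce_phase(category, counts):
    """Sum the ones emitted for a single category."""
    return category, sum(counts)

# Group step: collect the mapped values by category.
groups = defaultdict(list)
for record in listings:
    for category, one in map_phase(record):
        groups[category].append(one)

category_counts = dict(reduce_phase(c, ones) for c, ones in groups.items())
```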
Step 6: Performance
FredsList wants to reduce its data access latency
ALTER Listings MAKE CACHEABLE
Foreign key: photo → listing
Batch export
Components: FredsList.com application; Simple Web Service APIs; Database (PNUTS); Search (Vespa); Storage (MObStor); Messaging (Tribble); Compute (Grid); Caching (memcached)
And by now, Fred is global, and wants geo-replication!
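One common way a cache reduces read latency is the cache-aside pattern, sketched below. The slide does not say how the cloud wires memcached to the database, so this is an assumption; a plain dictionary stands in for memcached and another for the serving store.

```python
# Cache-aside sketch; `cache` stands in for memcached and `database`
# for the serving store. Both are assumptions for illustration.

database = {"1234323": "For sale: one bicycle, barely used"}
cache = {}
db_reads = 0

def get_listing(key):
    """Return a listing, consulting the cache before the database."""
    global db_reads
    if key in cache:
        return cache[key]
    db_reads += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value  # populate on miss
    return value

def put_listing(key, value):
    """Write to the database and invalidate any cached copy."""
    database[key] = value
    cache.pop(key, None)

first = get_listing("1234323")   # miss: goes to the database
second = get_listing("1234323")  # hit: served from the cache
put_listing("1234323", "SOLD")
third = get_listing("1234323")   # miss again after invalidation
```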
Data Serving vs. Analysis
• Very different workloads, requirements
• Data from serving system is one of many kinds of data (click streams are another common kind, as are syndicated feeds) to be analyzed and integrated
• The result of analysis often goes right back into serving system
EYES TO THE SKIES
Motherhood-and-Apple-Pie
Why Clouds?
• On-demand infrastructure to create a fundamental shift in the OE curve:
– Do things we can’t do
– Build more robustly, more efficiently, more globally, more completely, more quickly, for a given budget
• Cloud services should do the heavy lifting of scaling & high-availability
– Today, this is done at the app-level, which is not productive
Requirements for Cloud Services
• Multitenant. A cloud service must support multiple, organizationally distant customers.
• Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand.
• Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient, e.g., due to spikes.
• Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service.
• Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.
• Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.
• Availability. A cloud service should be highly available.
• Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service.
Types of Cloud Services
• Two kinds of cloud services:
– Horizontal (“Platform”) Cloud Services
• Functionality enabling tenants to build applications or new services on top of the cloud
– Functional Cloud Services
• Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping
• Could be built on top of horizontal cloud services or from scratch
• Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)
Opening Up Yahoo! Search
Phase 1: Giving site owners and developers control over the appearance of Yahoo! Search results.
Phase 2: BOSS takes Yahoo!’s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences.
BOSS Offerings
BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search.
API
A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences.
CUSTOM
Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners.
ACADEMIC
Working with the following universities to allow for wide-scale research in the search field:
• University of Illinois Urbana Champaign
• Carnegie Mellon University
• Stanford University
• Purdue University
• MIT
• Indian Institute of Technology Bombay
• University of Massachusetts
(Slide courtesy Prabhakar Raghavan)
Partner Examples
Horizontal Cloud Services
• Horizontal cloud services are foundations on which tenants build applications or new services. They should be:
– Semantics-free. Must be “generic infrastructure,” and not tied to specific app-logic.
• May provide the ability to inject application logic through well-defined APIs
– Broadly applicable. Must be broadly applicable (i.e., it can't be intended for just one or two properties).
– Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures.
• While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications.
Yahoo! Cloud Stack
[Diagram: layers of horizontal cloud services]
EDGE: YCS, YCPI, Brooklyn, …
WEB: VM/OS, yApache
APP: VM/OS, …
STORAGE: Sherpa, MObStor, …
BATCH: Hadoop, …
Side labels: Provisioning (Self-serve); Monitoring/Metering/Security; Data Highway
Other labels: Serving Grid; PHP App Engine
Yahoo! CCDI Thrust Areas
• Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services
– Multiple hosts may be multiplexed onto the same physical machine.
• Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities
• Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval
• Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP
Rest of today’s talk
Web Data Management
Large data analysis (Hadoop)
• Scan oriented workloads
• Focus on sequential disk I/O
• $ per cpu cycle
Structured record storage (PNUTS/Sherpa)
• CRUD
• Point lookups and short scans
• Index organized table and random I/Os
• $ per latency
Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
Hadoop: Batch Storage/Analysis
Why is batch processing important?
• Whether it’s
– response-prediction for advertising,
– machine-learned relevance for Search, or
– content optimization for audience,
data-intensive computing is increasingly central to everything Yahoo! does
– Hadoop is central to addressing this need
• Hadoop is a case-study in our cloud vision
– Processes enormous amounts of data
– Provides horizontal scaling and fault-tolerance for our users
– Allows those users to focus on their app logic
Stack diagram: [Workflow]; High-level query layer (Pig); Map-Reduce; HDFS
The World Has Changed
• Web serving applications need:
– Scalability!
• Preferably elastic
– Flexible schemas
– Geographic distribution
– High availability
– Reliable storage
• Web serving applications can do without:
– Complicated queries
– Strong transactions
MObStor
• Yahoo!’s next-generation globally replicated, virtualized media object storage service
• Better provisioning, easy migration, replication, better BCP, and performance
• New features (Evergreen URLs, CDN integration, REST API, …)
• The object metadata problem is addressed using Sherpa; MObStor itself is focused on blob storage.
Storage & Delivery Stack
PNUTS / SHERPA
To Help You Scale Your Mountains of Data
CCDI—Research Collaboration
Yahoo! Research
• Raghu Ramakrishnan
• Brian Cooper
• Utkarsh Srivastava
• Adam Silberstein
• Rodrigo Fonseca
CCDI
• Chuck Neerdaels
• P.P.S. Narayan
• Kevin Athey
• Toby Negrin
• Plus Dev/QA teams
Yahoo! Serving Storage Problem
– Small records – 100KB or less
– Structured records – lots of fields, evolving
– Extreme data scale - Tens of TB
– Extreme request scale - Tens of thousands of requests/sec
– Low latency globally - 20+ datacenters worldwide
– High Availability - outages cost $millions
– Variable usage patterns - as applications and users change
What is PNUTS/Sherpa?
• Parallel database
• Geographic replication
• Structured, flexible schema
• Hosted, managed infrastructure
CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … )
Example table (shown in several copies on the slide):
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
What Will It Become?
[Figure: three copies of the example table (rows A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E)]
Indexes and views
Design Goals
Scalability
• Thousands of machines
• Easy to add capacity
• Restrict query language to avoid costly queries
Geographic replication
• Asynchronous replication around the globe
• Low-latency local access
High availability and fault tolerance
• Automatically recover from failures
• Serve reads and writes despite failures
Consistency
• Per-record guarantees
• Timeline model
• Option to relax if needed
Multiple access paths
• Hash table, ordered table
• Primary, secondary access
Hosted service
• Applications plug and play
• Share operational cost
Technology Elements
Applications
PNUTS API; Tabular API
PNUTS
• Query planning and execution
• Index maintenance
Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication
YDOT FS
• Ordered tables
YDHT FS
• Hash tables
Tribble
• Pub/sub messaging
Zookeeper
• Consistency service
YCA: Authorization
Data Manipulation
• Per-record operations
– Get
– Set
– Delete
• Multi-record operations
– Multiget
– Scan
– Getrange
• Web service (RESTful) API
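To make the operation list concrete, here is a small in-memory sketch that mirrors the operation names on the slide. The class, the method signatures and the half-open range convention are assumptions for illustration; the real service is reached through a RESTful web API whose details the slide does not give.

```python
# Hypothetical in-memory table mirroring the operation names on the slide.

class Table:
    def __init__(self):
        self._rows = {}

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, record):
        self._rows[key] = record

    def delete(self, key):
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        """Return the records that exist among the requested keys."""
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self):
        """Yield every (key, record) pair in key order."""
        for k in sorted(self._rows):
            yield k, self._rows[k]

    def getrange(self, start, end):
        """Return records with start <= key < end, in key order."""
        return [(k, v) for k, v in self.scan() if start <= k < end]

t = Table()
for name, price in [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Kiwi", 8)]:
    t.set(name, {"Price": price})
t.delete("Kiwi")
```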
Tablets—Hash Table
Name | Description | Price
Grape | Grapes are good to eat | $12
Lime | Limes are green | $9
Apple | Apple is wisdom | $1
Strawberry | Strawberry shortcake | $900
Orange | Arrgh! Don’t get scurvy! | $2
Avocado | But at what price? | $3
Lemon | How much did you pay for this lemon? | $1
Tomato | Is this a vegetable? | $14
Banana | The perfect fruit | $2
Kiwi | New Zealand | $8
Tablet boundaries (hash of key): 0x0000, 0x2AF3, 0x911F, 0xFFFF
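The hash-table layout can be sketched as a lookup from a key's hash to the tablet whose interval contains it. The boundary values below are the ones on the slide; the 16-bit CRC-based hash function is an arbitrary stand-in, since the slide does not say which hash is used.

```python
import bisect
import zlib

# Upper bounds of three tablets over a 16-bit hash space, using the
# boundary values shown on the slide.
TABLET_UPPER_BOUNDS = [0x2AF3, 0x911F, 0xFFFF]

def key_hash(key):
    """Map a key to 0x0000..0xFFFF (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFF

def tablet_for_hash(h):
    """Index of the tablet whose hash interval contains h."""
    return bisect.bisect_left(TABLET_UPPER_BOUNDS, h)

def tablet_for(key):
    """Index of the tablet responsible for a key."""
    return tablet_for_hash(key_hash(key))

placement = {name: tablet_for(name) for name in ["Apple", "Lemon", "Grape"]}
```

Because placement depends only on the hash, neighbouring keys such as Grape and Grapefruit can land on different tablets, which is why this layout suits point lookups better than range scans.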
Tablets—Ordered Table
Name | Description | Price
Apple | Apple is wisdom | $1
Avocado | But at what price? | $3
Banana | The perfect fruit | $2
Grape | Grapes are good to eat | $12
Kiwi | New Zealand | $8
Lemon | How much did you pay for this lemon? | $1
Lime | Limes are green | $9
Orange | Arrgh! Don’t get scurvy! | $2
Strawberry | Strawberry shortcake | $900
Tomato | Is this a vegetable? | $14
Tablet boundaries (by key): A, H, Q, Z
Flexible Schema
Posted date | Listing id | Item | Price
6/1/07 | 424252 | Couch | $570
6/1/07 | 763245 | Bike | $86
6/3/07 | 211242 | Car | $1123
6/5/07 | 421133 | Lamp | $15
Additional columns, filled in for only some records: Color (Red); Condition (Good, Fair)
Detailed Architecture
[Diagram components: Clients; REST API; Routers; Tablet Controller; Storage units; Tribble; Local region; Remote regions]
Tablet Splitting and Balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
Storage unit may become a hotspot
Shed load by moving tablets to other servers
[Diagram labels: Storage unit; Tablet]
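The two mechanisms on this slide, splitting an overfull tablet and moving tablets off a hot storage unit, can be sketched as follows. The size limit, the midpoint split and the "move to the least loaded unit" policy are assumptions chosen for brevity, not the policies the system actually uses.

```python
# Sketch only: limits and policies here are illustrative assumptions.

MAX_RECORDS = 4

def split_if_overfull(tablet):
    """Split a sorted list of keys into two halves once it exceeds the limit."""
    if len(tablet) <= MAX_RECORDS:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]

def shed_load(storage_units):
    """Move one tablet from the busiest storage unit to the least busy one."""
    def load(name):
        return sum(len(t) for t in storage_units[name])
    busiest = max(storage_units, key=load)
    idlest = min(storage_units, key=load)
    if busiest != idlest and len(storage_units[busiest]) > 1:
        storage_units[idlest].append(storage_units[busiest].pop())
    return storage_units

halves = split_if_overfull(
    ["Apple", "Avocado", "Banana", "Grape", "Kiwi", "Lemon"])
units = shed_load({"su1": halves, "su2": []})
```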
QUERY PROCESSING
Accessing Data
[Diagram: three storage units (SU)]
1. Get key k
2. Get key k
3. Record for key k
4. Record for key k
Bulk Read
[Diagram: scatter/gather server and three storage units (SU)]
1. {k1, k2, … kn}
2. Get k1, Get k2, Get k3
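A scatter/gather read can be sketched as: group the requested keys by the storage unit that holds them, issue one batch per unit, then merge the answers back into request order. The key placement and the `locate` helper below are hypothetical stand-ins for real routing.

```python
from collections import defaultdict

# Hypothetical placement of keys on three storage units (SU).
STORAGE_UNITS = {
    "su1": {"k1": "r1", "k4": "r4"},
    "su2": {"k2": "r2"},
    "su3": {"k3": "r3"},
}

def locate(key):
    """Return the storage unit holding a key, or None (stand-in for routing)."""
    for name, rows in STORAGE_UNITS.items():
        if key in rows:
            return name
    return None

def bulk_read(keys):
    """Scatter keys by storage unit, then gather results in request order."""
    per_unit = defaultdict(list)
    for key in keys:
        unit = locate(key)
        if unit is not None:
            per_unit[unit].append(key)
    found = {}
    for unit, unit_keys in per_unit.items():  # one batched request per unit
        for key in unit_keys:
            found[key] = STORAGE_UNITS[unit][key]
    return [(k, found[k]) for k in keys if k in found]

result = bulk_read(["k3", "k1", "missing", "k4"])
```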
Range Queries in YDOT
• Clustered, ordered retrieval of records
Router interval map:
(start) to Cantaloupe: Storage unit 1
Cantaloupe to Lime: Storage unit 3
Lime to Strawberry: Storage unit 2
Strawberry to (end): Storage unit 1
Tablets:
Apple, Avocado, Banana, Blueberry
Cantaloupe, Grape, Kiwi, Lemon
Lime, Mango, Orange
Strawberry, Tomato, Watermelon
Example: the query Grapefruit…Pear? is split into Grapefruit…Lime? and Lime…Pear?
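The router's job in the example above can be sketched with a sorted list of tablet boundaries and a binary search. The boundaries and storage-unit assignments are those on the slide; the function name and the use of the empty string as the lowest key are assumptions.

```python
import bisect

# Interval map from the slide: each tablet starts at a boundary key and
# runs up to the next one. "" stands for the lowest possible key.
BOUNDARIES = ["", "Cantaloupe", "Lime", "Strawberry"]
UNITS = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]

def route_range(start, end):
    """Split the query start…end into one sub-range per tablet it overlaps."""
    first = bisect.bisect_right(BOUNDARIES, start) - 1
    pieces = []
    for i in range(first, len(BOUNDARIES)):
        lo = max(start, BOUNDARIES[i])
        is_last = i + 1 == len(BOUNDARIES)
        hi = end if is_last else min(end, BOUNDARIES[i + 1])
        if lo >= hi:
            break
        pieces.append((UNITS[i], lo, hi))
    return pieces

pieces = route_range("Grapefruit", "Pear")
```

Here `pieces` holds the two sub-queries from the slide, Grapefruit…Lime on storage unit 3 and Lime…Pear on storage unit 2, so only the tablets that overlap the range are contacted.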
Updates
[Diagram: Routers, Message brokers, and three storage units (SU)]
1. Write key k
2. Write key k
3. Write key k
4.
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k
ASYNCHRONOUS REPLICATION AND CONSISTENCY
Asynchronous Replication
Consistency Model
• Goal: Make it easier for applications to reason about updates and cope with asynchrony
• What happens to a record with primary key “Alice”?
[Timeline figure: the record is inserted, updated several times, and deleted; versions v. 1 through v. 8 lie along a time axis within Generation 1]
As the record is updated, copies may get out of sync.
Example: Social Alice
[Figure: West and East regions each hold a (User, Status) record for Alice; the copies show Busy, Free, and ??? at different moments. A Record Timeline panel lists the status values Busy, Free, Free.]
Consistency Model
[Timeline figure: v. 1 through v. 8 within Generation 1; older versions marked stale, one marked current]
Read
In general, reads are served using a local copy
Consistency Model
[Timeline figure: v. 1 through v. 8 within Generation 1; older versions marked stale, one marked current]
Read up-to-date
But application can request and get current version
Consistency Model
[Timeline figure: v. 1 through v. 8 within Generation 1; older versions marked stale, one marked current]
Read ≥ v.6
Or variations such as “read forward”: while copies may lag the master record, every copy goes through the same sequence of changes
Consistency Model
[Timeline figure: v. 1 through v. 8 within Generation 1; older versions marked stale, one marked current]
Write
Achieved via per-record primary copy protocol
(To maximize availability, record masterships are automatically transferred if a site fails)
Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors)
Consistency Model
[Timeline figure: v. 1 through v. 8 within Generation 1; older versions marked stale, one marked current]
Write if = v.7
ERROR
Test-and-set writes facilitate per-record transactions
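The read and write variants walked through on the Consistency Model slides can be sketched for a single record as below. The operation names follow the PNUTS paper listed under Further Reading; the `Record` class, its two-copy layout and the explicit `replicate` call are hypothetical simplifications.

```python
# Sketch of per-record timeline consistency with one master copy and one
# possibly stale local copy. The class itself is hypothetical.

class VersionMismatch(Exception):
    pass

class Record:
    def __init__(self, value):
        self.master = (1, value)   # (version, value) at the master region
        self.local = (1, value)    # copy in the reader's region

    def replicate(self):
        """Asynchronous propagation, invoked explicitly in this sketch."""
        self.local = self.master

    def read_any(self):
        return self.local          # fast, may be stale

    def read_critical(self, min_version):
        """Return a copy at least as new as min_version."""
        if self.local[0] >= min_version:
            return self.local
        return self.master

    def read_latest(self):
        return self.master         # pays the trip to the master

    def write(self, value):
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def test_and_set_write(self, required_version, value):
        """Write only if the current version equals required_version."""
        if self.master[0] != required_version:
            raise VersionMismatch(f"current version is {self.master[0]}")
        return self.write(value)

alice = Record("Busy")
v2 = alice.write("Free")
stale = alice.read_any()
fresh = alice.read_critical(v2)
try:
    alice.test_and_set_write(1, "Away")
    outcome = "written"
except VersionMismatch:
    outcome = "error"
alice.replicate()
```

Because every write goes through the single master copy, each replica sees the same sequence of versions, only later; the reader chooses how much staleness to accept per call.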
Consistency Techniques
• Per-record mastering
– Each record is assigned a “master region”
• May differ between records
– Updates to the record forwarded to the master region
– Ensures consistent ordering of updates
• Tablet-level mastering
– Each tablet is assigned a “master region”
– Inserts and deletes of records forwarded to the master region
– Master region decides tablet splits
• These details are hidden from the application
– Except for the latency impact!
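Per-record mastering, and the latency impact it carries, can be illustrated with a short sketch: a write that arrives outside the record's master region is forwarded there first. The region letters, the sample values and the `forwarded` log are made up for the example.

```python
# Sketch: each record names its master region; a write arriving anywhere
# else is forwarded there. Region names and the log format are made up.

records = {
    "A": {"master": "E", "value": 42342},
    "B": {"master": "W", "value": 42521},
}
forwarded = []  # (key, from_region, to_region): makes the extra hop visible

def write(key, value, arriving_region):
    """Apply a write at the record's master region, forwarding if needed."""
    master = records[key]["master"]
    if arriving_region != master:
        forwarded.append((key, arriving_region, master))
    records[key]["value"] = value
    return master

write("A", 1, arriving_region="E")  # local to the master: no forwarding
write("B", 2, arriving_region="E")  # forwarded from E to W
```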
Mastering
[Figure: three copies of the example table (rows A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E); one copy is labeled Tablet master]
Bulk Insert/Update/Replace
[Diagram: Client; Source Data; Bulk manager]
1. Client feeds records to bulk manager
2. Bulk loader transfers records to SUs in batches
• Bypass routers and message brokers
• Efficient import into storage unit
Bulk Load in YDOT
• YDOT bulk inserts can cause performance hotspots
• Solution: preallocate tablets
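One way to preallocate tablets is to pick boundaries from a sample of the keys about to be loaded, so that the load spreads across several tablets instead of piling onto one. The even-quantile rule below is an illustrative assumption, not the planning algorithm from the SIGMOD 2008 paper.

```python
# Sketch: choose tablet boundaries from a sample of incoming keys so that
# each preallocated tablet receives a similar share of the load.

def choose_split_points(sample_keys, num_tablets):
    """Return num_tablets - 1 boundary keys taken at even quantiles."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_tablets
    return [ordered[int(i * step)] for i in range(1, num_tablets)]

sample = ["Apple", "Avocado", "Banana", "Blueberry", "Cantaloupe", "Grape",
          "Kiwi", "Lemon", "Lime", "Mango", "Orange", "Strawberry"]
split_points = choose_split_points(sample, 3)
```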
Index Maintenance
• How to have lots of interesting indexes and views, without killing performance?
• Solution: Asynchrony!
– Indexes/views updated asynchronously when base table updated
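Asynchronous index maintenance can be sketched as a write path that only enqueues index work, plus a separate step that applies it later. The in-process queue stands in for a messaging layer, and all names here are hypothetical.

```python
from collections import deque

# Sketch: base-table writes enqueue index work instead of doing it inline.

base_table = {}
price_index = {}          # price -> set of keys
pending = deque()

def set_record(key, price):
    """Update the base table and return immediately; index work is queued."""
    old = base_table.get(key)
    base_table[key] = price
    pending.append((key, old, price))

def apply_index_updates():
    """Drain the queue; run later, off the write path."""
    while pending:
        key, old, new = pending.popleft()
        if old is not None:
            price_index.get(old, set()).discard(key)
        price_index.setdefault(new, set()).add(key)

set_record("Apple", 1)
set_record("Lemon", 1)
set_record("Apple", 3)
index_before = dict(price_index)   # still empty: the index lags the table
apply_index_updates()
```

The cost of keeping writes fast is that a reader of the index may briefly see stale entries, as `index_before` shows.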
SHERPA IN CONTEXT
Types of Record Stores
• Query expressiveness
Spectrum, from Simple to Feature rich: Object retrieval; Retrieval from single table of objects/records; SQL
Systems, in the same order: S3; PNUTS; Oracle
Types of Record Stores
• Consistency model
Spectrum, from Best effort to Strong guarantees: Eventual consistency; Timeline consistency; ACID
Systems, in the same order: S3; PNUTS; Oracle
Labels: Object-centric consistency; Program centric consistency
Types of Record Stores
• Data model
Spectrum, from Flexibility, Schema evolution to Optimized for Fixed schemas
Systems: CouchDB; PNUTS; Oracle
Labels: Object-centric consistency; Consistency spans objects
Types of Record Stores
• Elasticity (ability to add resources on demand)
Spectrum, from Inelastic to Elastic: Limited (via data distribution); VLSD (Very Large Scale Distribution/Replication)
Systems: Oracle; PNUTS; S3
Data Stores Comparison
• User-partitioned SQL stores
– Microsoft Azure SDS
– Amazon SimpleDB
Versus PNUTS:
• More expressive queries
• Users must control partitioning
• Limited elasticity
• Multi-tenant application databases
– Salesforce.com
– Oracle on Demand
Versus PNUTS:
• Highly optimized for complex workloads
• Limited flexibility to evolving applications
• Inherit limitations of underlying data management system
• Mutable object stores
– Amazon S3
Versus PNUTS:
• Object storage versus record management
Application Design Space
Axes: Records vs. Files; Get a few things vs. Scan everything
Systems placed on the chart: Sherpa; MObStor; Everest; Hadoop; YMDB; MySQL; Filer; Oracle; BigTable
Alternatives Matrix
[Matrix figure; cell ratings not preserved]
Columns: Elastic; Operability; Global low latency; Availability; Structured access; Updates; Consistency model; SQL/ACID
Rows: Sherpa; Y! UDB; MySQL; Oracle; HDFS; BigTable; Dynamo; Cassandra
Further Reading
Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008). Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan
PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008). Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni
Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009, to appear). Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan
Cloud Storage Design in a PNUTShell (Beautiful Data, O’Reilly Media, 2009, to appear). Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
QUESTIONS?