RAMCloud Design Review
Indexing
Ryan Stutsman
April 1, 2010

Introduction
• Should RAMCloud provide indexing?
  o Leave indexes to client-side using transactions?
• Many apps have similar indexing needs
  o Hash indexes, B+Trees, etc.
  o Can reduce app-visible latency for indexes by optimizing server-side

Implementation Issues
• Indexing on “opaque” data
• Splitting Indexes
• Consistency
• Recovery/Availability of Indexes

Explicit Search Keys
• Problem: RAMCloud treats objects as opaque
  o Server-side indexing without understanding the data?
put(tableId, person.objectId, person.pickle())
Stored object (opaque blob): Max Power (650) 555-5555

Explicit Search Keys
• Problem: RAMCloud treats objects as opaque
  o Server-side indexing without understanding the data?
• Idea: Apps provide search keys explicitly
  o Apps understand the data
put(tableId, person.objectId, {'first': person.first, 'last': person.last}, person.pickle())
Stored object: search keys (first field ID: Max, last field ID: Power) + blob (Max Power (650) 555-5555)

Explicit Search Keys
• Problem: RAMCloud treats objects as opaque
  o Server-side indexing without understanding the data?
• Idea: Apps provide search keys explicitly
  o Apps understand the data
• Can eliminate redundancy
  o Search keys need not be repeated in object
  o Search keys + Blob are returned to app on get/lookup
put(tableId, person.objectId, {'first': person.first, 'last': person.last}, person.pickle())
Stored object: search keys (first field ID: Max, last field ID: Power) + blob ((650) 555-5555)

Explicit Search Keys
• Put atomically updates indexes and object
  o Details to follow
put(tableId, objectId, searchKeys, blob)
get(tableId, objectId) -> (searchKeys, blob)
lookup(tableId, indexName, searchValue) -> (searchKeys, blob)
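
The three calls above can be sketched as a toy, single-process store. This only illustrates the call shapes from the slide and is not RAMCloud's implementation: the class name `ToyStore`, the in-memory dictionaries, and the choice to return a list from `lookup` (several objects could share a search value) are all assumptions made for the example.

```python
import pickle


class ToyStore:
    """Hypothetical single-process stand-in for the put/get/lookup calls."""

    def __init__(self):
        self.tables = {}   # tableId -> {objectId: (searchKeys, blob)}
        self.indexes = {}  # (tableId, indexName) -> {searchValue: set(objectId)}

    def put(self, table_id, object_id, search_keys, blob):
        table = self.tables.setdefault(table_id, {})
        # Drop index entries left by a previous version of this object.
        if object_id in table:
            for name, value in table[object_id][0].items():
                self.indexes[(table_id, name)][value].discard(object_id)
        table[object_id] = (dict(search_keys), blob)
        for name, value in search_keys.items():
            index = self.indexes.setdefault((table_id, name), {})
            index.setdefault(value, set()).add(object_id)

    def get(self, table_id, object_id):
        return self.tables[table_id][object_id]

    def lookup(self, table_id, index_name, search_value):
        index = self.indexes.get((table_id, index_name), {})
        table = self.tables.get(table_id, {})
        return [table[oid] for oid in sorted(index.get(search_value, ()))]


store = ToyStore()
# The blob holds only what the search keys do not already carry.
store.put(0, 301, {"first": "Max", "last": "Power"},
          pickle.dumps({"phone": "(650) 555-5555"}))
keys, blob = store.get(0, 301)
matches = store.lookup(0, "last", "Power")
```

Because everything lives in one process here, the atomicity of `put` comes for free; the distributed case is what the Consistency slides address.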

Splitting Indexes
• Co-locate index and data
• Large tables?
• Large indexes?
  o Can’t avoid multi-machine operations
[Figure: placement of Index A-Z, Data 0-99, and Data 0-299 on Masters A, B, and C]

Splitting Indexes
• Split indexes on search key
  o One extra access per lookup and put
• Split indexes on object ID
  o Lookups go to all index fragments
  o Puts are always local
• Our decision (for now): On search key
  o Don’t want weakest-link lookup performance
[Figure, split on object ID: Index 0-99 with Data 0-99, Index 100-199 with Data 100-199, Index 200-299 with Data 200-299]
[Figure, split on search key: Index A-R, Index S-Z, Data 0-99, Data 100-299]
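
The trade-off between the two split strategies can be made concrete by counting which masters an operation touches. The sketch below is illustrative only: the range tables, master names, and helper functions are hypothetical, and it assumes a single index keyed on the first letter of the search value.

```python
from bisect import bisect_right

# Hypothetical layout: each list holds (lowest key in range, master name).
DATA_SPLITS = [(0, "A"), (100, "B"), (200, "C")]   # data split on object ID
KEY_SPLITS = [("A", "A"), ("S", "B")]              # index split on search key: A-R, S-Z


def owner(splits, key):
    """Return the master whose range contains key."""
    lows = [low for low, _ in splits]
    return splits[bisect_right(lows, key) - 1][1]


def hosts_split_on_search_key(search_value, object_id):
    """Hosts touched by a lookup or a put: one index fragment plus the data master."""
    index_master = owner(KEY_SPLITS, search_value[0].upper())
    data_master = owner(DATA_SPLITS, object_id)
    return {index_master, data_master}


def lookup_hosts_split_on_object_id():
    """The matching object ID is unknown, so every index fragment is asked."""
    return {master for _, master in DATA_SPLITS}


def put_hosts_split_on_object_id(object_id):
    """The index fragment lives with the object, so the put stays local."""
    return {owner(DATA_SPLITS, object_id)}
```

With a split on search key, a lookup touches at most two masters regardless of how many fragments exist; with a split on object ID, it waits for the slowest of all fragments, which is the weakest-link behavior the slide wants to avoid.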

Consistency
• Problem: Index/Object inconsistency on puts
  o Object and index may reside on different hosts
  o Apps can get objects that aren’t in the index yet
  o Apps may see index entries for objects not in table yet
• Avoid commit protocol
• Idea: Index entries “commit” on object put
  o Write index entries
  o Then write object to table
  o Index entries considered invalid until object written
• Turns atomic puts into atomic index updates

Consistency
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Lookup
lookup(0, 'last', 'Power')
• Request goes directly to correct index
  o “Not found” returns immediately
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Lookup
lookup(0, 'last', 'Powell')
• Consistency is checked on hit
  o If table and index agree, then return the object
  o Else “not found”
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell
Check: index hit Powell -> 300; object 300 has last name 'Powell'; 'Powell' == 'Powell' ok

Consistency: Create
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
• Insert index entries before writing object
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Create
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
• Insert index entries before writing object
  o What if a lookup happens in the meantime?
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Concurrent Lookup
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
lookup(0, 'last', 'Power')
• Concurrent ops ignore inconsistent entries
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Concurrent Lookup
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
lookup(0, 'last', 'Power')
• Concurrent ops ignore inconsistent entries
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell
Result: Not Found (index hit Power -> 301, but the data table has no object 301 yet)

Consistency: Create (continued)
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
• Insert index entries before writing object
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Max -> 301, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Create
put(0, 301, {'first': 'Max', 'last': 'Power'}, person.pickle())
• Put completes; index entries now valid
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Max -> 301, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell, 301 = Max Power
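
The create sequence and the concurrent lookup can be replayed with plain dictionaries. This is a minimal sketch under stated assumptions, not the RAMCloud code: the function names, the dictionary layout, and the placeholder blobs are hypothetical, and index and table sit in one process here rather than on separate masters.

```python
def write_index_entries(indexes, object_id, search_keys):
    """Step 1 of a put: add index entries. Lookups do not honor them yet."""
    for name, value in search_keys.items():
        indexes.setdefault(name, {}).setdefault(value, set()).add(object_id)


def write_object(table, object_id, search_keys, blob):
    """Step 2 of a put: writing the object is the commit point."""
    table[object_id] = (dict(search_keys), blob)


def lookup(indexes, table, index_name, search_value):
    """Return only objects whose stored search keys agree with the index entry."""
    hits = []
    for object_id in sorted(indexes.get(index_name, {}).get(search_value, ())):
        stored = table.get(object_id)
        if stored is not None and stored[0].get(index_name) == search_value:
            hits.append(stored)
    return hits


# State from the slides before the put.
table = {299: ({"first": "Mary", "last": "Powers"}, b"blob-299"),
         300: ({"first": "Mel", "last": "Powell"}, b"blob-300")}
indexes = {"last": {"Powers": {299}, "Powell": {300}},
           "first": {"Mary": {299}, "Mel": {300}}}

new_keys = {"first": "Max", "last": "Power"}
write_index_entries(indexes, 301, new_keys)
during_put = lookup(indexes, table, "last", "Power")   # object 301 not written yet
write_object(table, 301, new_keys, b"blob-301")
after_put = lookup(indexes, table, "last", "Power")
```

The lookup issued between the two steps finds the index entry but no matching object, so it reports nothing; once the object is written, the same lookup succeeds without any further change to the index.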

Consistency: Delete
delete(0, 301)
• Delete object first, then clean up index entries
  o Index entries are invalid with no corresponding object
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Max -> 301, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell, 301 = Max Power

Consistency: Delete
delete(0, 301)
• Delete object first, then clean up index entries
  o Index entries are invalid with no corresponding object
lastName Index: Powell -> 300, Power -> 301, Powers -> 299
firstName Index: Mary -> 299, Max -> 301, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Delete
delete(0, 301)
• Delete object first, then clean up index entries
  o Index entries are invalid with no corresponding object
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell
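
The delete ordering can be checked the same way. In this sketch (hypothetical helper names, blobs omitted, everything in one process) the object is removed first, which leaves dangling index entries that no longer pass the validity check, and the entries are cleaned up afterwards.

```python
def entry_is_valid(table, index_name, search_value, object_id):
    """An index entry counts only if the object exists and still has that key."""
    stored_keys = table.get(object_id)
    return stored_keys is not None and stored_keys.get(index_name) == search_value


def clean_up_index_entries(indexes, object_id, search_keys):
    """Remove the entries that pointed at a deleted object."""
    for name, value in search_keys.items():
        indexes[name][value].discard(object_id)
        if not indexes[name][value]:
            del indexes[name][value]


# State from the slides before delete(0, 301); the table maps ID -> search keys.
table = {299: {"first": "Mary", "last": "Powers"},
         300: {"first": "Mel", "last": "Powell"},
         301: {"first": "Max", "last": "Power"}}
indexes = {"last": {"Powell": {300}, "Power": {301}, "Powers": {299}},
           "first": {"Mary": {299}, "Max": {301}, "Mel": {300}}}

removed_keys = table.pop(301)                        # Step 1: delete the object
dangling = 301 in indexes["last"]["Power"]           # entry still present...
visible = entry_is_valid(table, "last", "Power", 301)  # ...but no longer valid
clean_up_index_entries(indexes, 301, removed_keys)   # Step 2: clean up
```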

Consistency: Update
put(0, 299, {'first': 'Mary', 'last': 'Miller'}, person.pickle())
lastName Index: Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Update
put(0, 299, {'first': 'Mary', 'last': 'Miller'}, person.pickle())
• Compare previous index entries
  o Insert new value if updated
lastName Index: Miller -> 299, Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Powers, 300 = Mel Powell

Consistency: Update
put(0, 299, {'first': 'Mary', 'last': 'Miller'}, person.pickle())
• Commit by writing the new value
  o Old index entries ignored by lookup since inconsistent
lastName Index: Miller -> 299, Powell -> 300, Powers -> 299
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Miller, 300 = Mel Powell

Consistency: Update
put(0, 299, {'first': 'Mary', 'last': 'Miller'}, person.pickle())
• Clean up old, inconsistent entries
lastName Index: Miller -> 299, Powell -> 300
firstName Index: Mary -> 299, Mel -> 300
Data Table: 299 = Mary Miller, 300 = Mel Powell
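
The three update steps (insert new entries, commit by writing the object, clean up stale entries) can be traced with a short sketch. The helper names and dictionary layout are assumptions for illustration, and blobs are left out; the point is which index entries a lookup honors at each stage.

```python
def lookup_ids(indexes, table, index_name, search_value):
    """Object IDs whose stored keys agree with the index entry."""
    candidates = indexes.get(index_name, {}).get(search_value, set())
    return sorted(oid for oid in candidates
                  if oid in table and table[oid].get(index_name) == search_value)


def insert_new_entries(indexes, table, object_id, new_keys):
    """Step 1: add entries for changed keys; return the stale (name, value) pairs."""
    old_keys = table[object_id]
    stale = []
    for name, value in new_keys.items():
        if old_keys.get(name) != value:
            indexes.setdefault(name, {}).setdefault(value, set()).add(object_id)
            if name in old_keys:
                stale.append((name, old_keys[name]))
    return stale


def remove_stale_entries(indexes, object_id, stale):
    """Step 3: clean up entries that the committed object no longer matches."""
    for name, value in stale:
        indexes[name][value].discard(object_id)
        if not indexes[name][value]:
            del indexes[name][value]


# State from the slides before the update; the table maps ID -> search keys.
table = {299: {"first": "Mary", "last": "Powers"},
         300: {"first": "Mel", "last": "Powell"}}
indexes = {"last": {"Powers": {299}, "Powell": {300}},
           "first": {"Mary": {299}, "Mel": {300}}}

new_keys = {"first": "Mary", "last": "Miller"}
stale = insert_new_entries(indexes, table, 299, new_keys)
before_commit = (lookup_ids(indexes, table, "last", "Miller"),
                 lookup_ids(indexes, table, "last", "Powers"))
table[299] = dict(new_keys)   # Step 2: writing the object commits the update
after_commit = (lookup_ids(indexes, table, "last", "Miller"),
                lookup_ids(indexes, table, "last", "Powers"))
remove_stale_entries(indexes, 299, stale)
```

Before the commit only the old entry resolves; after it only the new one does, so at no point does a lookup see both or neither for an object that exists throughout.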

Consistency: Thoughts
• Atomic puts give index updates atomicity
• Low latency gives simplified consistency
  o Can afford to have a single writer per object
  o Provides us with atomic put primitive for free

Index Recovery
• Problem: Unavailable until indexes recover
  o Many requests will be lookups
  o These will block until indexes are recovered
• Rebuild versus Store?
  o Storing comes at a cost to write bandwidth
  o With scale, rebuilding may be faster than storing

Index Recovery: Partitioning
• How far does partitioning + rebuilding get us?
• Worst case: Entire partition of index data only
  o At most 640 MB
  o Larger indexes are recovered in parallel, one partition per host

Index Recovery: Partitioning
Recover a single index partition on a new master:
1. Data partitions scan, extract index entries (0.6s)
  o Hashtable: 10 million lookups/sec
  o 640 MB / 100 bytes/object = 6.4 million objects
2. Transmit entries to new index partition (0.6s)
  o At most 640 MB @ 10 Gbit/s
3. New index master reinserts entries (0.6s)
  o Similar time to master hashtable scan
• All operations are pipelined
  o 0.6s to scan, extract, transmit, rebuild total
• If data partitions for index are in recovery, add 0.6s
  o 1.2s upper bound for conservative 100-byte object size
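
The estimates above can be reproduced as back-of-the-envelope arithmetic. The inputs are the figures on the slide; treating 1 MB as 10^6 bytes and taking the slowest stage as the pipelined total are simplifying assumptions of this sketch. The raw results (about 0.64 s and 0.51 s) are in line with the roughly 0.6 s the slide quotes per stage.

```python
# Figures taken from the slide.
PARTITION_BYTES = 640e6          # at most 640 MB per partition (1 MB = 10**6 bytes assumed)
OBJECT_BYTES = 100               # conservative object size
HASHTABLE_OPS_PER_SEC = 10e6     # 10 million hashtable lookups/sec
NETWORK_BITS_PER_SEC = 10e9      # 10 Gbit/s

objects = PARTITION_BYTES / OBJECT_BYTES                  # 6.4 million objects
scan_s = objects / HASHTABLE_OPS_PER_SEC                  # about 0.64 s
transmit_s = PARTITION_BYTES * 8 / NETWORK_BITS_PER_SEC   # about 0.51 s
reinsert_s = scan_s   # slide: similar time to master hashtable scan

# Assumption: pipelined stages overlap fully, so the slowest stage bounds the total.
pipelined_s = max(scan_s, transmit_s, reinsert_s)
# If the data partitions must themselves be recovered first, the slide adds 0.6 s.
worst_case_s = pipelined_s + 0.6
```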

Summary
• Explicit search keys both flexible and efficient
• Split indexes on search key for fast lookup
• Atomic puts simplify atomic indexes
• Scale drives index recovery for availability
Discussion