Boxwood: Abstractions as the Foundation for Storage Infrastructure
Lidong Zhou, Microsoft Research Silicon Valley
Joint work with Chandu Thekkath, John MacCormick, Nick Murphy, and Marc Najork
12/06/2004 Boxwood 2
Distributed Storage Applications are Hard to Build
- Distributed storage: low hardware cost, but high development/deployment cost
  - Application logic on low-level storage interface
  - Hardware parallelism and concurrency control
  - Fault tolerance a necessity
  - Incremental expansion and dynamic reconfiguration vs. system consistency
Our goal: Distributed storage applications made easy to design, build, and deploy
Target Application and Setting
[Diagram: three machines, each with CPU + Memory and Locally Attached Disks, connected by a Local Area Network]
Enterprise storage applications and back-end storage for data-intensive Internet services
Roadmap
- Boxwood Vision
- Boxwood Architecture
- Building Applications on Boxwood
- Performance
- Related Work and Conclusion
Boxwood Vision
Incorporate rich virtualized abstractions into low levels of the storage infrastructure
An evolution path for distributed storage (shown as three successive diagram builds):
- Storage Applications
- Storage Applications on a Virtual Disk
- Storage Applications on Tree, Table, List, …
Why High-Level Abstractions
- Reduce the complexity of distributed storage applications
  - Natural continuum of storage virtualization
  - “High-level programming language” for building distributed storage applications
- Potential built-in performance optimization by exploiting structural information
  - Caching
  - Prefetching
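Boxwood Architecture
[Diagram: layered stack with a services column; labels as they appear on the slide]
- Storage Application
- High-level Storage Abstractions: B-Tree
- Chunk Store
- Reliable “Media”: Replicated Logical Device
- Magnetic Media
- Services: Locking, Logging, Consensus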
Chunk Store
- Persistent storage with “malloc”-like interface
- Virtualization layer that hides the distributed nature
- Manage address space or free space for higher layers
- Reliable storage through replicated logical device
[Diagram: Chunk Store exposes Allocate, De-allocate, Read, Write; it sits on the Replicated Logical Device]
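The four operations named on this slide can be illustrated with a small in-memory sketch. This is not Boxwood's API: the class name `ChunkStore`, the method signatures, and the integer handles are assumptions made for illustration, and a Python dictionary stands in for the replicated logical device.

```python
import itertools


class ChunkStore:
    """In-memory stand-in for a chunk store with a malloc-like interface.

    Hypothetical sketch: a dict replaces the replicated logical device.
    """

    def __init__(self):
        self._chunks = {}                       # handle -> bytearray
        self._next_handle = itertools.count(1)  # opaque handle generator

    def allocate(self, size):
        """Reserve a zero-filled chunk and return an opaque handle."""
        handle = next(self._next_handle)
        self._chunks[handle] = bytearray(size)
        return handle

    def deallocate(self, handle):
        """Release the chunk; later use of the handle raises KeyError."""
        del self._chunks[handle]

    def write(self, handle, offset, data):
        """Overwrite part of a chunk, refusing writes past its end."""
        chunk = self._chunks[handle]
        if offset < 0 or offset + len(data) > len(chunk):
            raise ValueError("write outside chunk bounds")
        chunk[offset:offset + len(data)] = data

    def read(self, handle, offset, length):
        """Return an immutable copy of part of a chunk."""
        chunk = self._chunks[handle]
        if offset < 0 or offset + length > len(chunk):
            raise ValueError("read outside chunk bounds")
        return bytes(chunk[offset:offset + length])


store = ChunkStore()
h = store.allocate(16)
store.write(h, 4, b"data")
print(store.read(h, 0, 8))
store.deallocate(h)
```

Callers only ever see handles, which is what lets a layer like this hide where the bytes physically live.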
B-Tree Abstraction
- B-Tree: a proven useful data structure for storage applications
- Distributed/reliable B-Link trees in Boxwood
  - B-Link trees: high concurrency with simple locking
  - Distributed reliable storage from chunk store
  - Caching for performance
  - Distributed lock service for consistency
  - Logging for recovery
[Diagram: B-Link Tree exposes Insert, Delete, Lookup, Enumerate, Create; it sits on Chunk Store, Locking, and Logging]
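The "high concurrency with simple locking" property of B-Link trees comes from each node carrying a high key and a link to its right sibling, so a reader that lands on a node that has just been split can simply move right. The following is a minimal sketch of that lookup rule in the style of Lehman and Yao, not Boxwood's implementation; the `Node` layout and the `lookup` function are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # internal nodes only
    values: List[Any] = field(default_factory=list)       # leaves only
    high_key: Optional[int] = None  # upper bound on keys here; None means +infinity
    right: Optional["Node"] = None  # link to the right sibling on the same level

    @property
    def is_leaf(self):
        return not self.children


def lookup(root, key):
    """Find `key` without locks, following right links past stale pointers."""
    node = root
    while True:
        # Move right: a split may have shifted our key range to a sibling.
        while node.high_key is not None and key > node.high_key:
            node = node.right
        if node.is_leaf:
            break
        # Child i covers keys <= keys[i]; the last child covers the rest.
        idx = sum(1 for k in node.keys if key > k)
        node = node.children[idx]
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None


# A leaf holding 1..4 was split into `left` and `mid`, but the parent
# has not yet learned about `mid`: it still has the single separator 10.
far = Node(keys=[11, 12], values=["x", "y"])
mid = Node(keys=[3, 4], values=["c", "d"], high_key=10, right=far)
left = Node(keys=[1, 2], values=["a", "b"], high_key=2, right=mid)
root = Node(keys=[10], children=[left, far])

print(lookup(root, 4))   # d  (reached through left's right link)
print(lookup(root, 12))  # y
```

Because readers tolerate a parent that lags behind a split, a writer needs to lock only a small number of nodes at a time.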
Boxwood Services
- Distributed lock service for coordinating concurrent access to shared data
- Logging and recovery service for atomicity in face of transient failures
- Consensus service for system consistency
Clean design of these services is crucial for scalability and for managing complexity
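To make "logging for atomicity" concrete, here is a minimal redo-log sketch: an operation's updates count only if its commit marker reached the log, so recovery applies either all of an operation's updates or none of them. The names `RedoLog` and `recover` and the record format are assumptions for illustration; the slides do not describe Boxwood's log format.

```python
class RedoLog:
    """Append-only log; a Python list stands in for durable storage."""

    def __init__(self):
        self.records = []

    def append(self, txn_id, updates):
        """Log every update of one operation, then its commit marker."""
        for key, value in updates.items():
            self.records.append(("update", txn_id, key, value))
        self.records.append(("commit", txn_id))


def recover(log_records, state):
    """Replay updates of committed operations only, in log order."""
    committed = {rec[1] for rec in log_records if rec[0] == "commit"}
    for rec in log_records:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


log = RedoLog()
log.append(1, {"a": 1, "b": 2})
# Simulate a failure in the middle of operation 2: one update was
# logged, but the commit marker never made it to the log.
log.records.append(("update", 2, "a", 99))

print(recover(log.records, {}))  # {'a': 1, 'b': 2}
```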
Distributed Storage Applications on Boxwood: A Recipe
1. Design applications for local storage
   - Map application logic to storage abstractions
2. Adapt the design for a distributed storage infrastructure
   - Boxwood abstractions are virtualized
   - Boxwood offers facilitating distributed services
Separating algorithmic design from distributed system concerns is attractive.
From B-Link Tree Algorithm to Distributed Reliable B-Link Trees
[Diagram 1: B-Link trees on a single machine. The B-Link Tree Algorithm uses Local Locks and Logging on Local Disks]
[Diagram 2: Distributed and reliable B-Link trees. The same B-Link Tree Algorithm uses the Global Lock Service and Reliable Logging on the Chunk Store, which sits on the Replicated Logical Device]
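The step from the single-machine picture to the distributed one can be sketched as one algorithmic routine that accepts its lock provider and storage as parameters. Everything named here is hypothetical: `LocalLocks`, `FakeGlobalLockService`, and `update_node` are illustrative stand-ins, and the "global" service only records ownership in memory where a real one would involve network calls and failure handling.

```python
import threading
from contextlib import contextmanager


class LocalLocks:
    """Single-machine locks: one threading.Lock per resource name."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    @contextmanager
    def held(self, name):
        with self._guard:
            lock = self._locks.setdefault(name, threading.Lock())
        with lock:
            yield


class FakeGlobalLockService:
    """Hypothetical stand-in for a remote lock service (no networking)."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.owners = {}    # lock name -> owning client
        self.history = []   # acquire/release events, for inspection

    @contextmanager
    def held(self, name):
        if name in self.owners:
            raise RuntimeError(f"{name} already held by {self.owners[name]}")
        self.owners[name] = self.client_id
        self.history.append(("acquire", name))
        try:
            yield
        finally:
            del self.owners[name]
            self.history.append(("release", name))


def update_node(locks, storage, node_id, key, value):
    """Algorithmic step written once, independent of where locks live."""
    with locks.held(node_id):
        node = dict(storage.get(node_id, {}))
        node[key] = value
        storage[node_id] = node


local_disk = {}       # stands in for local disks
update_node(LocalLocks(), local_disk, "leaf-7", "k", "v")

shared_store = {}     # stands in for the chunk store
service = FakeGlobalLockService("server-1")
update_node(service, shared_store, "leaf-7", "k", "v")
print(local_disk == shared_store, service.history)
```

`update_node` is identical in both calls; only the objects passed in change, which is the separation of algorithmic design from distributed system concerns that the recipe argues for.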
BoxFS: Multi-Node File Server on Boxwood
[Diagram: BoxFS sits on B-Link Tree, Chunk Store, and Services]
- Exported via NFS v2
- Directory/File → B-Tree
  - Directory: maps names to NFS file handle with embedded B-tree handle
  - File: maps block number to chunk handle
- File blocks → chunks
- Locking/caching at file system level
- ~2500 lines of C# code
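The two mappings on this slide (names to file handles, block numbers to chunk handles) can be sketched with dictionaries standing in for the B-trees and the chunk store. `TinyFS`, its method names, and the string handles are assumptions for illustration; this is not BoxFS code, and it omits NFS, locking, and caching entirely.

```python
class TinyFS:
    """Toy file layer: dicts stand in for B-trees and the chunk store."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.chunks = {}      # chunk handle -> bytes (chunk store stand-in)
        self.directory = {}   # name -> file handle (directory B-tree stand-in)
        self.files = {}       # file handle -> {block number -> chunk handle}
        self._next = 0

    def _new_handle(self, prefix):
        self._next += 1
        return f"{prefix}{self._next}"

    def create(self, name):
        """Add a directory entry that points at an empty block map."""
        fh = self._new_handle("file")
        self.directory[name] = fh
        self.files[fh] = {}
        return fh

    def write(self, name, data):
        """Replace the file's contents, storing one chunk per block."""
        block_map = self.files[self.directory[name]]
        for old in block_map.values():   # free the previous chunks
            del self.chunks[old]
        block_map.clear()
        for offset in range(0, len(data), self.block_size):
            ch = self._new_handle("chunk")
            self.chunks[ch] = data[offset:offset + self.block_size]
            block_map[offset // self.block_size] = ch

    def read(self, name):
        """Concatenate the file's chunks in block-number order."""
        block_map = self.files[self.directory[name]]
        return b"".join(self.chunks[block_map[b]] for b in sorted(block_map))


fs = TinyFS(block_size=4)
fs.create("notes.txt")
fs.write("notes.txt", b"hello world")
print(fs.read("notes.txt"))
print(sorted(fs.files[fs.directory["notes.txt"]]))
```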
Prototype Deployment and Performance Evaluation
- System setup
  - Eight Dell PowerEdge 2650 servers with a single 2.4 GHz Xeon processor, 1 GB of RAM
  - Gigabit Ethernet switch
  - Adaptec AIC-7899 dual SCSI adapter, and 5 SCSI drives
- Performance evaluation
  - Single-machine non-replicated performance (BoxFS vs. NFS)
  - B-tree operation scalability
  - BoxFS operation scalability
BoxFS vs. NFS over NTFS: Connectathon Benchmarks
[Bar chart: BoxFS and NFS compared on create, remove, getwd+stat, chmod+stat, write, read, readdir, rename, symlink, statfs; y-axis runs from 0 to 12, unit not shown in the transcript]
B-Tree Scaling (Private Tree)
[Chart: Throughput (Ops/sec) vs. Number of Servers (2, 4, 6, 8); left axis: Insert & Delete, 0 to 900; right axis: Lookup, 0 to 10000; series: Inserts, Deletes, Lookups]
BoxFS Scaling (Read)
[Chart: Read Throughput (MB/sec), 0 to 3, vs. Number of BoxFS servers (2 to 8)]
B-Tree Scaling (Shared Tree)
[Chart: Throughput (Ops/sec) vs. Number of Servers (2, 4, 6, 8); left axis: Insert & Delete, 0 to 600; right axis: Lookup, 0 to 4000; series: Inserts, Deletes, Lookups]
BoxFS Scaling (Write/MkDirEnt)
[Chart: Write Throughput (MB/sec) and MkDirEnt Latency (sec) vs. Number of BoxFS servers (2 to 8); left axis: Write, 0 to 1; right axis: MkDirEnt, 1 to 4; series: WriteFile, MkDirEnt]
Related Work
- Distributed Storage/Operating Systems
  - Virtual/Logical disks
  - File systems
  - Database systems
- Scalable Distributed Data Structures
  - Linear Hash Table (LH) and its variants (Litwin, 1980 to present)
  - Scalable distributed hash table (Gribble et al., 2000)
  - Highly concurrent B-trees (Lehman and Yao, 1981; Sagiv, 1986)
Conclusion and Future Directions
- A storage infrastructure offering virtualized high-level abstractions is promising
- Future Work:
  - Explore more abstractions and applications; expose flexible interfaces (e.g., through hints)
  - Leverage high-level abstractions for better load balancing, prefetching, and caching
  - Graceful degradation during massive failures