Building Distributed, Wide-Area Applications with WheelFS
Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris
MIT CSAIL and NYU
Grid Computations Share Data
Nodes in a distributed computation share:
– Program binaries
– Initial input data
– Processed output from one node, used as intermediate input to another node
So Do Users and Distributed Apps
• Shared home directory for testbeds (e.g., PlanetLab, RON)
• Distributed apps reinvent the wheel:
– Distributed digital research library
– Wide-area measurement experiments
– Cooperative web cache
• Can we invent a shared data layer once?
Our Goal
• Distributed file system for testbeds/Grids
• Apps can share data between nodes
• Users can easily access data
• Distributed apps are simple to build
[Figure: nodes in a testbed/Grid, each with access to the same file foo]
Current Solutions
Usual drawbacks:
– All data flows through one node
– File systems are too transparent:
• Mask failures
• Incur long delays
[Figure: nodes in a testbed/Grid, each copying file foo from a central file server]
Our Proposal: WheelFS
• A decentralized, wide-area FS
• Main contributions:
1) Provide good performance by following the principle "Read Globally, Write Locally"
2) Give apps control with semantic cues
Talk Outline
1. How to decentralize your file system
2. How to control your files
What Does a File System Buy You?
• A familiar interface
• Language-independent usage model
• Hierarchical namespace useful for apps
• Quick prototyping for apps
File Systems 101
• File system (FS) API:
– Open <filename> → <file_id>
– {Close/Read/Write} <file_id>
• Directories translate file names to IDs
[Figure: App 1 and App 2 make API calls to the operating system's file system, which stores data on the node's local hard disk]
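To make the Open/Read/Write/Close cycle and the name-to-ID translation concrete on a single node, here is a small POSIX shell sketch. The throwaway temporary directory is an assumption for the demo, and the inode number reported by `ls -i` stands in for the file ID that the directory hands back.

```sh
#!/bin/sh
# Sketch of the local file system API from a shell, using a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Open "foo" for writing on descriptor 3, write through it, then close it.
exec 3> "$dir/foo"
printf 'hello\n' >&3
exec 3>&-

# The directory translates the name "foo" into an ID: its inode number.
file_id=$(ls -i "$dir/foo" | awk '{print $1}')
echo "foo -> inode $file_id"

# Open "foo" for reading on descriptor 4, read one line, then close it.
exec 4< "$dir/foo"
IFS= read -r line <&4
exec 4<&-
echo "read back: $line"
```

Every later operation goes through the numeric ID, not the name. A distributed file system keeps this interface and changes only where the directories and file IDs live.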
Distributed File Systems
[Figure: the same app/OS/file system stack, now backed by many nodes in a testbed/Grid; directory 500 maps "foo" to file 135, stored on another node]
Basic Design of WheelFS
[Figure: nodes 076, 150, 257, 402, 554, and 653, which also act as consistency servers; a node requests file 135, whose versions 135, 135v2, and 135v3 are tracked and copied between nodes]
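The slide's diagram places file 135 with node 150, the first consistency server whose ID is at or above 135. Below is a minimal sketch of that successor rule. It assumes Chord-style consistent hashing on a ring and uses the six node IDs from the figure; the talk itself does not spell out the exact mapping, and `responsible_node` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical successor rule: a file ID is handled by the first node ID >= it,
# wrapping around to the smallest node ID once we pass the top of the ring.
NODES="076 150 257 402 554 653"   # consistency servers from the diagram, sorted ascending

responsible_node() {
  for n in $NODES; do
    # test(1) compares integers in decimal, so "076" is treated as 76.
    if [ "$n" -ge "$1" ]; then
      echo "$n"
      return
    fi
  done
  # Past the largest node ID: wrap around to the first (smallest) node.
  set -- $NODES
  echo "$1"
}

responsible_node 135   # 150
responsible_node 550   # 554
responsible_node 700   # 076 (wraps around)
```

Because every node can compute this mapping locally, finding who is responsible for a file needs no central lookup server.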
Read Globally, Write Locally
• Perform writes at local disk speeds
• Efficient bulk data transfer
• Avoid overloading nodes w/ popular files
Write Locally
Create foo/bar:
1. Choose an ID
2. Create dir entry
3. Write local file
[Figure: the creating node picks ID 550, adds the entry bar = 550 to Dir 209 (foo), and writes File 550 (bar) to its own disk; a later read of foo/bar follows that entry]
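The three creation steps can be sketched in shell. Everything here is a hypothetical stand-in: each node's disk is a local directory, directory 209 is a text file of `name id` lines, and `create_file` is an invented helper. The diagram's new file ID, 550, falls just below node 554. The sketch therefore assumes that the creating node (554) picks an ID from the range it is itself responsible for (after its predecessor, 402), which keeps step 3 a purely local write. The talk does not state the exact rule.

```sh
#!/bin/sh
# Hypothetical model: each node's local disk is a directory under $ROOT.
ROOT=$(mktemp -d)
trap 'rm -rf "$ROOT"' EXIT
SELF=554   # the creating node
PRED=402   # its predecessor on the ring: SELF handles IDs PRED+1 .. SELF
mkdir -p "$ROOT/node$SELF" "$ROOT/dirs"

create_file() {  # usage: create_file <dir_id> <name> <data>
  # 1. Choose an ID that this node is responsible for, so the data stays local.
  r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
  id=$((PRED + 1 + r % (SELF - PRED)))
  # 2. Create the directory entry "name id" (in WheelFS, at the directory's node).
  echo "$2 $id" >> "$ROOT/dirs/$1"
  # 3. Write the file contents to the local disk at local speed.
  printf '%s\n' "$3" > "$ROOT/node$SELF/$id"
  echo "$id"
}

id=$(create_file 209 bar "hello from node $SELF")
echo "foo/bar -> $id"
```

Only step 2 touches another node. The bulk data write in step 3 never leaves the writer, which is the "Write Locally" half of the design.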
Read Globally
Read file 135:
1. Contact node
2. Receive list
3. Get chunks
[Figure: a reader contacts the node responsible for file 135, receives the list of nodes holding copies, and fetches chunks from several of them at once; the list grows from 076, 653 to 076, 554, 653 as readers cache the file]
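The read path (ask the responsible node, receive a list of nodes holding copies, then pull chunks from several of them) can be mimicked locally. This sketch is illustrative only. Each "node" is a directory that already caches every chunk of file 135. The holder list 076 554 653 is hard-coded in place of step 2's reply. Chunks are fetched round-robin as parallel background jobs and then reassembled in order.

```sh
#!/bin/sh
# Hypothetical setup: three "nodes" (directories) each cache every chunk of file 135.
W=$(mktemp -d)
trap 'rm -rf "$W"' EXIT
printf 'ACGTACGTACGTACGTACGT' > "$W/file135"
mkdir "$W/chunks" "$W/fetched"
(cd "$W/chunks" && split -b 4 ../file135 c.)   # 4-byte chunks: c.aa, c.ab, ...

HOLDERS="076 554 653"   # stand-in for the node list received in step 2
for n in $HOLDERS; do
  mkdir "$W/node$n" && cp "$W/chunks"/c.* "$W/node$n/"
done

# Step 3: fetch chunk i from holder (i mod count), all fetches in parallel.
set -- $HOLDERS
count=$#
i=0
for path in "$W/chunks"/c.*; do          # chunk names stand in for the file's chunk list
  c=$(basename "$path")
  node=$(echo $HOLDERS | cut -d' ' -f$((i % count + 1)))
  cp "$W/node$node/$c" "$W/fetched/$c" &
  i=$((i + 1))
done
wait                                      # all chunk transfers have finished
result=$(cat "$W/fetched"/c.*)           # glob order is chunk order
echo "$result"                            # ACGTACGTACGTACGTACGT
```

Spreading the chunks over several holders is what lets a popular file avoid overloading any single node.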
Example: BLAST
• DNA alignment tool run on Grids
• Copy separate DB portions and queries to many nodes
• Run separate computations
• Later fetch and combine results
Example: BLAST
• With WheelFS, however:
– No explicit DB copying necessary
– Efficient initial DB transfers
– Automatic caching for reused DBs and queries
• Could do even better, since the data is never updated
Example: Cooperative Web Cache
Collection of nodes that:
– Serve redirected web requests
– Fetch web content from original web servers
– Cache web content and serve it directly
– Find cached content on other CWC nodes
Example: Cooperative Web Cache
• Avoid hotspots
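```sh
# Serve $URL from the shared WheelFS cache if a fresh copy exists; otherwise fetch it.
URL="$1"
# notexpired is not defined on the slide; hypothetical stand-in: modified in the last 60 minutes.
notexpired() { [ -n "$(find "$1" -mmin -60)" ]; }
if [ -f "/wfs/cwc/$URL" ]; then
  if notexpired "/wfs/cwc/$URL"; then
    cat "/wfs/cwc/$URL"
    exit
  fi
fi
wget "$URL" -O - | tee "/wfs/cwc/$URL"
```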
Example: Cooperative Web Cache
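[Figure: a client requests $URL. The serving node looks up "$URL" in Dir 070 (/wfs/cwc), finds file ID 135, asks the node responsible for 135 about the current version, and fetches chunks from nodes that cache it. When the lookup fails ("No!"), the node fetches $URL itself, stores it as File 550, and records "$URL" = 550 in the directory.]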
Talk Outline
1. How to decentralize your file system
2. How to control your files
Example: Cooperative Web Cache
• Would rather fail and refetch than wait
• Perfect consistency isn’t crucial
Explicit Semantic Cues
• Allow direct control over system behavior
• Metadata attached to files, directories, or references
• Apply recursively down the directory tree
• Possible implementation: an intra-path component, e.g.
– /wfs/cwc/.cue/foo/bar
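One way to read the intra-path idea: a path component that starts with `.` and consists of known cue names is stripped from the path, and its cues apply to everything below it. The helper below is a hypothetical sketch of such parsing in shell and awk. The cue list comes from these slides. Treating unknown dot-components as ordinary names (so `.bashrc` still works) is an assumption. `parse_cues` simply collects every cue on the path.

```sh
#!/bin/sh
# Hypothetical helper: split a WheelFS-style path into its cues and its plain path.
# A component counts as a cue only if it starts with "." and every comma-separated
# item names a cue from these slides (optionally "=value", as in maxtime=250).
parse_cues() {
  printf '%s\n' "$1" | awk -F/ '
    function iscue(s,    m, parts, j, name) {
      m = split(s, parts, ",")
      if (m == 0) return 0
      for (j = 1; j <= m; j++) {
        name = tolower(parts[j])
        sub(/=.*/, "", name)               # drop a "=value" argument
        if (!(name in known)) return 0
      }
      return 1
    }
    BEGIN {
      n = split("writemany writeonce latestversion anyversion bestversion strict lax maxtime", k, " ")
      for (i = 1; i <= n; i++) known[k[i]] = 1
    }
    {
      cues = ""; path = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "") continue
        if (substr($i, 1, 1) == "." && iscue(substr($i, 2))) {
          # Cue component: collect it; it applies to everything below this point.
          sep = (cues == "") ? "" : ","
          cues = cues sep substr($i, 2)
        } else {
          path = path "/" $i
        }
      }
      printf "cues=%s path=%s\n", cues, path
    }'
}

parse_cues /wfs/cwc/.maxtime=250,bestversion/foo   # cues=maxtime=250,bestversion path=/wfs/cwc/foo
parse_cues /wfs/cwc/.lax,writemany/foo             # cues=lax,writemany path=/wfs/cwc/foo
```

Encoding cues in the path keeps the standard file API unchanged, so existing tools such as `cat`, `tee`, and `wget` can pass them through without modification.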
Semantic Cues: Writability
• Applies to files
• WriteMany (default)
• WriteOnce
[Figure: nodes request file 135, which exists as versions 135, 135v2, and 135v3, while other nodes hold cached copies of 135 and 135v3]
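Here is a toy model of the two writability cues. It assumes that WriteMany updates create new versions of the same file ID (like 135, 135v2, and 135v3 in the diagram) and that WriteOnce forbids any write after the first. Storing versions as `<id>.v<n>` files in a temp directory, and the helper names, are purely illustrative. A cached WriteOnce file can never go stale, which is presumably where its caching benefit comes from.

```sh
#!/bin/sh
# Hypothetical versioned store: "$STORE/<id>.v<n>" holds version n of file <id>.
STORE=$(mktemp -d)
trap 'rm -rf "$STORE"' EXIT

latest_version() {  # prints the newest version number of file $1 (0 if none)
  n=0
  while [ -e "$STORE/$1.v$((n + 1))" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

write_file() {  # usage: write_file <id> <writemany|writeonce> <data>
  v=$(latest_version "$1")
  if [ "$2" = writeonce ] && [ "$v" -gt 0 ]; then
    echo "write to $1 refused: it is WriteOnce and already exists" >&2
    return 1
  fi
  printf '%s\n' "$3" > "$STORE/$1.v$((v + 1))"
  echo "$1 is now at v$((v + 1))"
}

write_file 135 writemany "first"     # 135 is now at v1
write_file 135 writemany "second"    # 135 is now at v2
write_file 135 writemany "third"     # 135 is now at v3
write_file 550 writeonce "result"    # 550 is now at v1
write_file 550 writeonce "again" || echo "second write to 550 refused"
```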
Semantic Cues: Freshness
• Applies to file references
• LatestVersion (default)
• AnyVersion
• BestVersion
[Figure: a node requests file 135, which is stored on one node and cached on another]
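Our reading of the three freshness cues can be expressed as a decision function. The inputs are hypothetical: the version number in the local cache (or `-` if none) and the latest version the responsible node reports (or `-` if it cannot be reached in time). The semantics below are an assumption based on the cue names, not a specification: AnyVersion takes whatever is at hand, BestVersion prefers the newest version it can get, and LatestVersion insists on the newest.

```sh
#!/bin/sh
# usage: choose_version <latestversion|anyversion|bestversion> <cached|-> <latest|->
# Prints the version a read would return, or "fail" if the read cannot proceed.
choose_version() {
  cue=$1 cached=$2 latest=$3
  case $cue in
    anyversion)   # any copy will do: prefer the nearby cached one
      if [ "$cached" != - ]; then echo "$cached"
      elif [ "$latest" != - ]; then echo "$latest"
      else echo fail; fi ;;
    bestversion)  # newest version obtainable; fall back to the cache
      if [ "$latest" != - ]; then echo "$latest"
      elif [ "$cached" != - ]; then echo "$cached"
      else echo fail; fi ;;
    *)            # latestversion (default): must learn the newest version
      if [ "$latest" != - ]; then echo "$latest"
      else echo fail; fi ;;
  esac
}

choose_version latestversion 2 3   # 3
choose_version latestversion 2 -   # fail
choose_version anyversion 2 3      # 2
choose_version bestversion 2 -     # 2
```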
Semantic Cues: Write Consistency
• Applies to files or directories
• Strict (default)
• Lax
[Figure: two nodes each write file 135, producing File 135 and File 135v2]
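Here is a sketch of how the two write-consistency cues might differ when the file's responsible node is unreachable. It assumes that Strict routes every write through that node, so versions stay serialized. Lax, by contrast, lets the writer proceed elsewhere and accepts that concurrent writers may create conflicting versions, as in the figure. The function name and its outputs are hypothetical.

```sh
#!/bin/sh
# usage: place_write <strict|lax> <responsible node reachable: yes|no>
# Prints where a write would go under this hypothetical model.
place_write() {
  if [ "$2" = yes ]; then
    echo primary        # both cues: serialize the write at the responsible node
  elif [ "$1" = lax ]; then
    echo "other node (versions may conflict)"
  else
    echo wait           # strict: hold the write until the responsible node returns
  fi
}

place_write strict yes   # primary
place_write strict no    # wait
place_write lax no       # other node (versions may conflict)
```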
Example: BLAST
• WriteOnce for all:
– DB files
– Query files
– Result files
• Improves cacheability of these files
Example: Cooperative Web Cache
• Reading an older version is ok:
– cat /wfs/cwc/.maxtime=250,bestversion/foo
• Writing conflicting versions is ok:
– wget -O - http://foo > /wfs/cwc/.lax,writemany/foo
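```sh
URL="$1"
READ=/wfs/cwc/.maxtime=250,bestversion   # reads: an older cached version is acceptable
WRITE=/wfs/cwc/.lax,writemany            # writes: conflicting versions are acceptable
# notexpired: the slides' freshness check; it is not defined on the slides.
if [ -f "$READ/$URL" ]; then
  if notexpired "$READ/$URL"; then
    cat "$READ/$URL"
    exit
  fi
fi
wget "$URL" -O - | tee "$WRITE/$URL"
```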
Discussion
• Must break data up into files small enough to fit on one disk
• Stuff we swept under the rug:
– Security
– Atomic renames across dirs
– Unreferenced files
Related Work
• Every FS paper ever written
• Specifically:
– Cluster FS: Farsite, GFS, xFS, Ceph
– Wide-area FS: JetFile, CFS, Shark
– Grid: LegionFS, GridFTP, IBP
– POSIX I/O High Performance Computing Extensions
Conclusion
• WheelFS: distributed storage layer for newly written applications
• Performance by reading globally and writing locally
• Control through explicit semantic cues