Speculative Execution in Distributed File System and External Synchrony
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter Chen, Jason Flinn. Presented by Han Wang. Slides based on the SOSP and OSDI presentations.
![Page 1: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/1.jpg)
Speculative Execution In Distributed File System
and
External Synchrony
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter Chen, Jason Flinn
Presented by Han Wang
Slides based on the SOSP and OSDI presentations
![Page 2: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/2.jpg)
Consistency
Availability
Partition Tolerance
![Page 3: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/3.jpg)
“… consistency, availability, and partition tolerance. It is impossible to achieve all three.”
-- Gilbert and Lynch, MIT
“So in reality, there are only two types of systems: CP/CA and AP” -- Daniel Abadi, Yale
“There is no ‘free lunch’ with distributed data.” -- Anonymous, HP
![Page 4: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/4.jpg)
AP: Lack Consistency
CP: Lack Availability CA: Lack Partition Tolerance
![Page 5: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/5.jpg)
Synchrony
Asynchrony
![Page 6: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/6.jpg)
Synchrony Asynchrony
![Page 7: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/7.jpg)
Synchronous abstractions: strong reliability guarantees, but slow
Asynchronous counterparts: relaxed reliability guarantees, reasonable performance
![Page 8: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/8.jpg)
External Synchrony
![Page 9: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/9.jpg)
• provide the reliability and simplicity of a synchronous abstraction
• approximate the performance of an asynchronous abstraction.
![Page 10: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/10.jpg)
Speculative Execution in a Distributed File System
Edmund B. Nightingale, Peter M. Chen, and Jason Flinn
Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn
![Page 11: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/11.jpg)
Authors
• Edmund B. Nightingale
– PhD from UMich (Jason Flinn)
– Microsoft Research
– Best Paper Award (OSDI 2006)
• Kaushik Veeraraghavan
– PhD student at UMich (Jason Flinn)
– Best Paper Award (FAST 2010, ASPLOS 2011)
• Peter M. Chen
– PhD from Berkeley (David Patterson)
– Faculty at UMich
• Jason Flinn
– PhD from CMU (Mahadev Satyanarayanan)
– Faculty at UMich
![Page 12: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/12.jpg)
Speculative Execution in a Distributed File System
Edmund B. Nightingale, Peter M. Chen, and Jason Flinn
Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn
![Page 13: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/13.jpg)
Idea
Example
Design
Evaluation
![Page 14: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/14.jpg)
External Synchrony
• Question
– How to improve both durability and performance for a local file system?
• Two extremes
– Synchronous IO
  • Easy to use
  • Guarantees ordering
– Asynchronous IO
  • Fast
![Page 15: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/15.jpg)
When a sync() is really async
• On sync(), data is written only to the volatile cache
– 10x performance penalty, and data is NOT safe
• 100x slower than asynchronous I/O if the cache is disabled
[Figure: Operating System writing to a Disk that contains a Volatile Cache and Cylinders]
From Nightingale’s presentation
![Page 16: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/16.jpg)
To whom are guarantees provided?
• Synchronous I/O definition:
– Caller blocked until operation completes
• Guarantee provided to application
[Figure: three Apps on top of the OS Kernel, which connects to Disk, Screen, and Network]
From Nightingale’s presentation
![Page 17: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/17.jpg)
To whom are guarantees provided?
• Guarantee really provided to the user
[Figure: three Apps on top of the OS Kernel, which connects to Disk, Screen, and Network]
From Nightingale’s presentation
![Page 18: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/18.jpg)
Example: Synchronous I/O
write(buf_1);
write(buf_2);
print(“work done”);
foo();
[Figure: timeline across Process, OS Kernel, and Disk; the application blocks twice, then “work done” appears on the terminal]
From Nightingale’s presentation
![Page 19: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/19.jpg)
Observing synchronous I/O
write(buf_1);
write(buf_2);
print(“work done”);
foo();
• Sync I/O externalizes output based on causal ordering
– Enforces causal ordering by blocking the application
• External sync: same causal ordering without blocking applications
[Figure annotations on the code: “Depends on 1st write”; “Depends on 1st & 2nd write”]
From Nightingale’s presentation
![Page 20: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/20.jpg)
Example: External synchrony
write(buf_1);
write(buf_2);
print(“work done”);
foo();
[Figure: timeline across Process, OS Kernel, and Disk; terminal output “work done”]
From Nightingale’s presentation
![Page 21: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/21.jpg)
External Synchrony Design Overview
• Synchrony is defined by externally observable behavior.
– I/O is externally synchronous if its output cannot be distinguished from output that could be produced by synchronous I/O.
– The file system does all the same processing as for synchronous I/O.
• Two optimizations improve performance.
– Group commit is used (commits are atomic).
– External output is buffered and processes continue execution.
• Output is guaranteed to be committed every 5 seconds.
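The buffering idea on this slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not xsyncfs itself: the class `ExternallySyncFS` and its methods are hypothetical names, the "disk" and "screen" are plain Python lists, and the commit is triggered by hand rather than by a timer or by pending output.

```python
class ExternallySyncFS:
    """Toy model (hypothetical): writes return at once; external output is
    held back until every write it causally follows has been committed."""

    def __init__(self):
        self.pending_writes = []  # modifications not yet durable
        self.durable = []         # modifications that reached the "disk"
        self.held_output = []     # external output waiting for a commit
        self.screen = []          # what an outside observer can see

    def write(self, data):
        # The caller is not blocked; the write joins the open transaction.
        self.pending_writes.append(data)

    def print(self, text):
        if self.pending_writes or self.held_output:
            # Output causally follows uncommitted state, so buffer it.
            self.held_output.append(text)
        else:
            self.screen.append(text)

    def commit(self):
        # Group commit: all pending writes become durable together,
        # then buffered output is released in its original order.
        self.durable.extend(self.pending_writes)
        self.pending_writes.clear()
        self.screen.extend(self.held_output)
        self.held_output.clear()


fs = ExternallySyncFS()
fs.write("buf_1")
fs.write("buf_2")
fs.print("work done")   # buffered: nothing is visible yet
fs.commit()             # both writes durable, then the output appears
```

An observer of `fs.screen` never sees "work done" before both buffers are durable, which is the same ordering synchronous I/O would give, yet neither `write` call blocked.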
![Page 22: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/22.jpg)
External Synchrony Implementation
• Xsyncfs leverages Speculator infrastructure for output buffering and dependency tracking for uncommitted state.
• Speculator tracks commit dependencies between processes and uncommitted file system transactions.
• ext3 operates in journaled mode.
![Page 23: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/23.jpg)
Evaluation
• Durability
• Performance
– I/O-intensive application (Postmark)
– Application that synchronizes explicitly (MySQL)
– Network-intensive, read-heavy application (SPECweb)
– Output-triggered commit on delay
![Page 24: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/24.jpg)
Postmark benchmark
Xsyncfs within 7% of ext3 mounted asynchronously
[Figure: bar chart of Time (seconds), log scale from 1 to 10000; bars: ext3-async, xsyncfs, ext3-sync, ext3-barrier]
From Nightingale’s presentation
![Page 25: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/25.jpg)
The MySQL benchmark
Xsyncfs can group commit from a single client
[Figure: New Order Transactions Per Minute (0 to 5000) versus Number of db clients (0 to 20); series: xsyncfs, ext3-barrier]
From Nightingale’s presentation
![Page 26: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/26.jpg)
Specweb99 throughput
Xsyncfs within 8% of ext3 mounted asynchronously
[Figure: bar chart of Throughput (Kb/s), 0 to 400; bars: ext3-async, xsyncfs, ext3-sync, ext3-barrier]
From Nightingale’s presentation
![Page 27: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/27.jpg)
Specweb99 latency

| Request size | ext3-async | xsyncfs |
| --- | --- | --- |
| 0-1 KB | 0.064 seconds | 0.097 seconds |
| 1-10 KB | 0.150 seconds | 0.180 seconds |
| 10-100 KB | 1.084 seconds | 1.094 seconds |
| 100-1000 KB | 10.253 seconds | 10.072 seconds |

Xsyncfs adds no more than 33 ms of delay
From Nightingale’s presentation
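The 33 ms figure can be checked directly from the latencies listed on this slide. The short calculation below is just that arithmetic, with the values converted to integer milliseconds to avoid floating-point noise; the variable names are illustrative.

```python
# Latencies from the slide, in milliseconds: (ext3-async, xsyncfs).
latency_ms = {
    "0-1 KB": (64, 97),
    "1-10 KB": (150, 180),
    "10-100 KB": (1084, 1094),
    "100-1000 KB": (10253, 10072),
}

# Extra delay xsyncfs adds for each request-size bucket.
added_ms = {size: xsync - ext3 for size, (ext3, xsync) in latency_ms.items()}
worst_case_ms = max(added_ms.values())

print(added_ms)
print(worst_case_ms)
```

The largest difference is in the smallest bucket (97 ms versus 64 ms); for the largest requests xsyncfs is actually slightly faster in these measurements.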
![Page 28: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/28.jpg)
Discussions
• Is the idea sound?
– Nice idea, new idea.
• Flaws?
– Are the experiments realistic?
• What are your take-aways from this paper?
![Page 29: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/29.jpg)
Speculative Execution in a Distributed File System
Edmund B. Nightingale, Peter M. Chen, and Jason Flinn
Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn
![Page 30: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/30.jpg)
Idea
Example
Design
Evaluation
![Page 31: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/31.jpg)
Speculative Execution
• Question
– How to improve distributed file system performance?
• Characteristics of DFS
– Single, coherent namespace
• Existing approach
– Trade off consistency for performance
![Page 32: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/32.jpg)
The Idea
• Speculative execution
– Hide IO latency
  • Issue multiple IO operations concurrently
– Also improve IO throughput
  • Group commit
• For it to succeed
– Correct
– Efficient
– Easy to use
![Page 33: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/33.jpg)
Conditions for Success of Speculations
• Results of speculation are highly predictable
– Concurrent updates on cached files are rare
• Checkpointing is faster than remote I/O
– 50 µs ~ 6 ms (amortizable) vs. network RTT
• Modern computers have spare resources
– CPUs are idle for significant portions of time
– Extra memory is available for checkpoints
![Page 34: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/34.jpg)
Speculator Interface
• Speculator provides a lightweight checkpoint and rollback mechanism
• Interface to encapsulate implementation details:
– create_speculation
– commit_speculation
– fail_speculation
• Separation of policy and mechanism
– Speculator remains ignorant of why clients speculate
– The DFS is not concerned with how speculation is done
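To make the three calls concrete, here is a minimal user-level sketch of a checkpoint-and-rollback object. Only the names `create_speculation`, `commit_speculation`, and `fail_speculation` come from the slide; the signatures, the `Speculator` class, and the use of `copy.deepcopy` as a stand-in for a real process checkpoint are assumptions made for illustration.

```python
import copy


class Speculator:
    """Hypothetical sketch: checkpoints are deep copies of a state dict."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = {}  # speculation id -> state saved at creation
        self._next_id = 0

    def create_speculation(self):
        # Save a checkpoint, then let the caller continue on a guess.
        spec_id = self._next_id
        self._next_id += 1
        self._checkpoints[spec_id] = copy.deepcopy(self.state)
        return spec_id

    def commit_speculation(self, spec_id):
        # The guess was right: the checkpoint is no longer needed.
        del self._checkpoints[spec_id]

    def fail_speculation(self, spec_id):
        # The guess was wrong: restore the state saved at creation time.
        self.state = self._checkpoints.pop(spec_id)
        # Later speculations were built on the failed one; drop them too.
        for later in [s for s in self._checkpoints if s > spec_id]:
            del self._checkpoints[later]


spec = Speculator({"bar": "cached contents"})
sid = spec.create_speculation()
spec.state["bar"] = "contents assuming the cache was fresh"
spec.fail_speculation(sid)   # the server reported a stale cache
```

The caller (the policy) decides when to speculate and whether the guess held; the sketch (the mechanism) only knows how to save and restore state, which mirrors the separation the slide describes.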
![Page 35: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/35.jpg)
Implementing Speculation
1) System call 2) Create speculation
• Checkpoint: copy-on-write fork()
• Undo log: ordered list of speculative operations
• Spec: tracks kernel objects that depend on it
[Figure: process timeline with Checkpoint, Spec, and Undo log]
From Nightingale’s presentation
![Page 36: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/36.jpg)
Speculation Success
1) System call 2) Create speculation 3) Commit speculation
• Undo log: ordered list of speculative operations
• Spec: tracks kernel objects that depend on it
[Figure: process timeline with Checkpoint, Spec, and Undo log]
From Nightingale’s presentation
![Page 37: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/37.jpg)
Speculation Failure
1) System call 2) Create speculation 3) Fail speculation
• Undo log: ordered list of speculative operations
• Spec: tracks kernel objects that depend on it
[Figure: process timeline with Checkpoint, Spec, and Undo log; the process appears twice]
From Nightingale’s presentation
![Page 38: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/38.jpg)
Ensuring correctness
• Two invariants
– Speculative state should never be visible to the user or any external device
– A process should never view speculative state unless it speculatively depends on that state
  • A non-speculative process must block or become speculative when viewing speculative state
• Three ways to ensure correct execution:
– Block
– Buffer
– Propagate speculations (dependencies)
![Page 39: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/39.jpg)
Output Commits
1) sys_stat 2) sys_mkdir 3) Commit speculation
[Figure: process timeline with a Checkpoint for each of Spec(stat) and Spec(mkdir), an Undo log, and the outputs “stat worked” and “mkdir worked”]
From Nightingale’s presentation
![Page 40: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/40.jpg)
Multi-Process Speculation
• Processes often cooperate
– Example: “make” forks children to compile, link, etc.
– Would block if speculation were limited to one task
• Allow kernel objects to have speculative state
– Examples: inodes, signals, pipes, Unix sockets, etc.
– Propagate dependencies among objects
– Objects rolled back to prior states when specs fail
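Dependency propagation can be sketched as sets of speculation ids attached to processes and kernel objects. Everything below is a hypothetical user-level model: `KernelObject`, `Process`, the string-valued object state, and the signature of `fail_speculation` are assumptions, and rolling the processes themselves back to their checkpoints is not modeled.

```python
class KernelObject:
    """Hypothetical stand-in for an inode, pipe, socket, and so on."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.deps = set()  # speculations this object's state depends on
        self.undo = []     # stack of (deps_before, value_before)

    def modified_by(self, proc, new_value):
        if proc.deps - self.deps:
            # The object is about to gain new dependencies: save its
            # current state so it can be restored if they fail.
            self.undo.append((set(self.deps), self.value))
            self.deps |= proc.deps
        self.value = new_value


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.deps = set()

    def read(self, obj):
        # Viewing speculative state makes the reader speculative too.
        self.deps |= obj.deps
        return obj.value


def fail_speculation(spec, objects):
    # Roll each affected object back to the state it had before it
    # first came to depend on the failed speculation.
    for obj in objects:
        while spec in obj.deps:
            obj.deps, obj.value = obj.undo.pop()


writer = Process(8000)
writer.deps.add("spec 1")                # already speculating
inode = KernelObject("inode 3456", "owner=alice")
inode.modified_by(writer, "owner=bob")   # inode now depends on spec 1
reader = Process(8001)
seen = reader.read(inode)                # reader inherits spec 1
fail_speculation("spec 1", [inode])      # inode returns to owner=alice
```

Because the reader inherited the dependency instead of blocking, cooperating processes such as the children of "make" can keep running while the speculation is unresolved.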
![Page 41: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/41.jpg)
Multi-Process Speculation
[Figure: pid 8000 and pid 8001, each with checkpoints; inode 3456 with undo entries Chown-1 and Write-1; speculations Spec 1 and Spec 2]
From Nightingale’s presentation
![Page 42: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/42.jpg)
Multi-Process Speculation
• Supports
– Objects in distributed file system
– Objects in local memory file system -- RAMFS
– Modified local ext3 file system
– IPCs:
  • Pipes and fifos, Unix sockets, signals, fork and exits
• Does not support
– System V IPC, Futex, shared memory
![Page 43: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/43.jpg)
Using Speculation
Client 1: 1. cat foo > bar
Client 2: 2. cat bar
(Time runs from step 1 to step 2.)
Question: What does client 2 view in ‘bar’?
Reproduced from Nightingale’s Presentation
Handling Mutating Operations
• The server permits other processes to see a speculatively changed file only if the cached version matches the server version
• The server must process messages in the same order as clients see them
• The server never stores speculative data
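One way to picture the version check in the first bullet is an optimistic update that carries the version the client's cache was based on. This is an illustrative sketch, not the protocol from the paper: the `Server` class, the `mutate` method, and integer version numbers are all assumptions.

```python
class Server:
    """Hypothetical server that applies a mutation only when the client's
    cached version matches the version the server currently holds."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def mutate(self, name, based_on_version, data):
        version, _ = self.files.get(name, (0, None))
        if based_on_version != version:
            # The client speculated on a stale cache: reject the update
            # so the client can fail its speculation and roll back.
            return False, version
        # The server only ever stores the result of a validated update.
        self.files[name] = (version + 1, data)
        return True, version + 1


server = Server()
ok_first, v1 = server.mutate("bar", 0, "contents of foo")
ok_stale, v2 = server.mutate("bar", 0, "update from a stale cache")
```

Here the first update is accepted and the second, which was based on the same old version, is rejected; a client receiving the rejection would fail its speculation rather than expose the stale result.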
![Page 44: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/44.jpg)
Using Speculation
• Speculator makes group commit possible
[Figure: two Client/Server message timelines showing write and commit messages]
Reproduced from Nightingale’s Presentation
![Page 45: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/45.jpg)
Evaluation: Speculative Execution
• To answer the following questions
– Performance gain from propagating dependencies
– Impact on performance when speculation fails
– Impact on performance of group commit and sharing state
![Page 46: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/46.jpg)
Apache Build
• With delays, SpecNFS is up to 14 times faster
[Figure: two bar charts of Time (seconds), “No delay” (0 to 300) and “30 ms delay” (0 to 4500); bars: NFS, SpecNFS, BlueFS, ext3]
From Nightingale’s presentation
![Page 47: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/47.jpg)
The Cost of Rollback
• With all files out of date, SpecNFS is up to 11x faster
[Figure: two bar charts of Time (seconds) for NFS, SpecNFS, and ext3, “No delay” (0 to 140) and “30ms delay” (0 to 2000); series: No files invalid, 10% files invalid, 50% files invalid, 100% files invalid]
From Nightingale’s presentation
![Page 48: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/48.jpg)
Group Commit & Sharing State
[Figure: two bar charts of Time (seconds) for NFS, SpecNFS, and BlueFS, “0 ms delay” (0 to 500) and “30ms delay” (0 to 4500); series: Default, No prop, No grp commit, No grp commit & no prop]
From Nightingale’s presentation
![Page 49: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/49.jpg)
Discussions
• Is speculation in the OS the right level of abstraction?
– Similar ideas:
  • Transaction and rollback in relational databases
  • Transactional memory
  • Speculative execution in the OS
• What if the conditions for success do not hold?
• Portability of code
– Code performs worse if the OS does not speculate
– What about transforming source code to perform speculation?
• Why isn’t this used nowadays?
![Page 50: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/50.jpg)
Conclusions
• Performance need not be sacrificed for durability
• The transaction and rollback infrastructure in the OS is very useful. Two good papers!
• Ideas are not new, but are generic.
![Page 51: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/51.jpg)
Thanks!
![Page 52: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/52.jpg)
Things they did not do
• A mechanism to prevent disk corruption when a crash occurs. They used the default journaled mode.
![Page 53: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/53.jpg)
Comparison
Synchronous IO -> Asynchronous IO

| Speculative Execution | Rethink the Sync |
| --- | --- |
| Distributed File System | Local File System |
| Checkpointing | -- |
| Pipelining Sequential IO | -- |
| Propagate Dependencies | Propagate Dependencies |
| Group Commit | Group Commit |
| -- | Output-triggered commit |
![Page 54: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/54.jpg)
System Calls
• Modify the system call jump table
• Block calls that externalize state
– Allow read-only calls (e.g. getpid)
– Allow calls that modify only task state (e.g. dup2)
• File system calls -- need to dig deeper
– Mark file systems that support Speculator

| System call | Action |
| --- | --- |
| getpid | Call sys_getpid() |
| reboot | Block until specs resolved |
| mkdir | Allow only if fs supports Speculator |
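The per-call policy on this slide amounts to a lookup table consulted when a speculative process makes a system call. The sketch below models that in user space; the `Action` enum, the `POLICY` table, the `may_proceed` function, and the choice to block any call not listed are assumptions for illustration, not the kernel's actual jump-table code.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block until speculations are resolved"
    ASK_FS = "allow only if the file system supports Speculator"


# Entries taken from the slide's examples; the table itself is hypothetical.
POLICY = {
    "getpid": Action.ALLOW,   # read-only call
    "dup2": Action.ALLOW,     # modifies only task state
    "reboot": Action.BLOCK,   # externalizes state
    "mkdir": Action.ASK_FS,   # file system call
}


def may_proceed(syscall, is_speculative, fs_supports_speculator=False):
    """Return True if the call may run now, False if the caller must block."""
    if not is_speculative:
        return True  # non-speculative processes are unaffected
    # Assumption: calls missing from the table are treated conservatively.
    action = POLICY.get(syscall, Action.BLOCK)
    if action is Action.ALLOW:
        return True
    if action is Action.ASK_FS:
        return fs_supports_speculator
    return False


print(may_proceed("getpid", is_speculative=True))
print(may_proceed("reboot", is_speculative=True))
print(may_proceed("mkdir", is_speculative=True, fs_supports_speculator=True))
```

Blocking by default keeps the first invariant safe (no speculative state escapes), at the cost of losing some overlap for calls that were never classified.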
![Page 55: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/55.jpg)
• Scenario 1:
write();
print();
write();
print();
Source: OSDI official blog
Question: Does xsyncfs perform similarly to synchronous IO?
![Page 56: Speculative Execution In Distributed File System and External Synchrony](https://reader035.vdocuments.net/reader035/viewer/2022081517/56815c0c550346895dc9ed88/html5/thumbnails/56.jpg)
• Scenario 2:

| Step | Process A | Process B |
| --- | --- | --- |
| 1 | acquire_mutex(x) | |
| 2 | write(val) | acquire_mutex(x) |
| 3 | release_mutex(x) | |
| 4 | | read(val) |
| 5 | | release_mutex(x) |
| 6 | | print(val) |

Time runs downward.
Question: Will process B fail to read (Step 4) the update by process A? Will the print come before the write in process A has committed?
Source: OSDI official blog