A New Approach to File System Cache Writeback of Application Data
Sorin Faibish – EMC Distinguished Engineer
P. Bixby, J. Forecast, P. Armangau and S. Pawar
EMC USD Advanced Development
SYSTOR 2010, May 24-26, 2010, Haifa, Israel
Outline
Motivation: changes in server technology
Cache writeback problem statement
Monitoring behavior of application data flush
Cache writeback as a closed loop system
Current cache writeback methods are obsolete
I/O “slow down” problem
New algorithms for cache writeback
Simulation results of new algorithms
Experimental results of a real NFS server
Summary and conclusions
Future work and extension to Linux FS
Motivation: changes in server technology
Large numbers of cores in CPUs – more computing power
Larger, cheaper memory caches – very large amounts of cached data
Very large disk drives – but only a modest increase in disk throughput
Application data I/O has grown much faster – but requires constant flushing to disk
Cache writeback is used to smooth bursty I/O traffic to disk
Conclusion: cache writeback of large amounts of application data is slower
Cache writeback problem statement
Increasing I/O speeds force servers to cache large amounts of dirty pages to hide disk latency
Large numbers of clients accessing the servers increase the burstiness of disk I/O and the need for cache
Large FS and server caches allow longer retention of dirty pages
Cache writeback flush is based on cache-fullness metrics
When the cache is full, flushing to disk runs at maximum speed, leaving no room for additional I/Os
As long as the cache is full, I/Os must wait for empty cache pages to become available – I/O “stoppage”
Result: application performance is lower than disk performance
Monitoring behavior of application data flush
Understanding the problem:
• Instrument the kernel to measure the dynamics of cache Dirty Pages (DPs)
• Monitor the behavior of DPs in the Buffer Cache
• Run a multi-client benchmark application
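The authors instrumented their own server kernel, and that instrumentation is not shown here. As a rough analogue on Linux (an assumption, not the authors' tooling), the `Dirty:` field of `/proc/meminfo` reports dirty page-cache memory in kB. Sampling it at a fixed interval gives a DP curve and its rate of change. The helper names below are hypothetical.

```python
import re
import time


def parse_dirty_kb(meminfo_text: str) -> int:
    """Return the 'Dirty:' value (kB) from /proc/meminfo-formatted text."""
    match = re.search(r"^Dirty:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Dirty:' line found")
    return int(match.group(1))


def rate_of_change(samples: list[int], interval_s: float) -> list[float]:
    """First difference of consecutive DP samples, in kB per second."""
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]


def sample_dirty(n: int, interval_s: float, path: str = "/proc/meminfo") -> list[int]:
    """Read the dirty-page level n times, interval_s apart (Linux only)."""
    samples = []
    for _ in range(n):
        with open(path) as f:
            samples.append(parse_dirty_kb(f.read()))
        time.sleep(interval_s)
    return samples
```

On a Linux host, `rate_of_change(sample_dirty(50, 0.1), 0.1)` yields the kind of DP rate-of-change series plotted later in the deck. Note that this is system-wide and much coarser than per-file-system kernel instrumentation.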
Cache writeback as a closed loop system
Application controls the flush using I/O commits based on the application cache state
– DPs in cache are the difference between incoming I/Os and DPs flushed to disk
– The goal is to keep the difference (error) at zero
– The error loop is closed because the application sends a commit after each I/O
– Cache writeback is controlled by the application
Flush to disk is based on the fullness of the Buffer Cache
– A cache control mechanism ensures cache availability for new I/Os
– DPs in cache behave like water in a tank
– The cache manager controls the water level to prevent overflow
– There is no relation between when an application I/O arrives and when it is flushed to disk
– This results in large delays between I/O creation and I/O on disk – open loop
– Cache writeback is controlled by the algorithm
[Figure: two block diagrams built from a "Dirty Pages & Buffer Cache Dynamics" block and a "Cache Writeback Algorithm" block. Application-commit loop signals: User I/Os (+), I/Os Flushed (−), I/Os In Cache, Application Commits, Dirty Pages. Watermark loop signals: User I/Os (+), Watermark Flushes (−), I/Os in Cache, Dirty Pages, Sample Dirty Pages, Delay (sec).]
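The "water in a tank" view reduces to one balance equation per sample: DP[k+1] = DP[k] + incoming[k] − flushed[k]. The minimal sketch below compares the two flush styles on this slide under simplifying assumptions: integer page counts, one step per sample, and a hypothetical disk limit `disk_max` in pages per step. The commit policy flushes whatever has arrived. The watermark policy stays idle until a high watermark, then flushes at full speed down to a low watermark. All names are illustrative.

```python
from typing import Callable

# A flush policy maps (current DP level, pages arriving this step) -> pages to flush.
Policy = Callable[[int, int], int]


def simulate(incoming: list[int], policy: Policy) -> list[int]:
    """Tank model: DP[k+1] = DP[k] + in[k] - flushed[k], never below zero."""
    dp, trace = 0, []
    for arriving in incoming:
        flushed = min(policy(dp, arriving), dp + arriving)
        dp = dp + arriving - flushed
        trace.append(dp)
    return trace


def commit_policy(disk_max: int) -> Policy:
    """Closed loop: each I/O is committed, so flush what is in cache (up to disk_max)."""
    return lambda dp, arriving: min(dp + arriving, disk_max)


def watermark_policy(high: int, low: int, disk_max: int) -> Policy:
    """Open loop: idle until DP reaches `high`, then flush at full speed down to `low`."""
    flushing = False

    def policy(dp: int, arriving: int) -> int:
        nonlocal flushing
        level = dp + arriving
        if level >= high:
            flushing = True
        if not flushing:
            return 0
        flushed = max(0, min(disk_max, level - low))
        if level - flushed <= low:
            flushing = False
        return flushed

    return policy


steady = [10] * 20
print(simulate(steady, commit_policy(disk_max=50)))   # all zeros: error held at zero
print(simulate(steady, watermark_policy(high=60, low=20, disk_max=50)))
# sawtooth: [10, 20, 30, 40, 50, 20, 30, 40, 50, 20, ...]
```

Even with perfectly steady input, the watermark policy holds a standing DP level and cycles between the two thresholds. The commit-driven loop keeps the error at zero as long as the disk keeps up.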
Current cache writeback methods
Trickle flush of DPs
– Flush based on a proportion of incoming application I/Os (rate based)
– Uses low priority to reduce CPU consumption
– Background task with low efficiency
– Used only to reduce memory pressure
– Cannot address high bursts of I/O
Watermark-based flush of DPs
– Inspired by database and transactional applications
– Cache writeback triggered by the number/proportion of DPs in the cache
– No prediction of high I/O bursts – a disadvantage with many clients
– Flush is done at maximum disk speed to reduce latency
– Close to the incoming I/O rate for small caches – flushes often
– Inefficient for very large caches
– Interferes with metadata and read operations
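[Figure: the cache holds File System user Dirty Pages and Other Dirty Pages; DPs arrive at N dirty pages/sec and are flushed at n flushes/sec, so over time t the DP level rises toward the watermark by (N − n)·t]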
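The figure's annotation gives the arithmetic behind watermark triggering. If DPs arrive at N pages/sec and are flushed at n pages/sec, the level rises by (N − n)·t in t seconds. It therefore reaches a high watermark after (high − level)/(N − n) seconds whenever N > n. A rate-based trickle flush instead flushes a fixed proportion of the incoming rate. The sketch below encodes just that arithmetic. The function names and the `proportion` parameter are illustrative assumptions.

```python
import math


def watermark_increase(N: float, n: float, t: float) -> float:
    """DP growth over t seconds with arrival rate N and flush rate n (pages/sec)."""
    return (N - n) * t


def seconds_to_high_watermark(level: float, high: float, N: float, n: float) -> float:
    """Time until the DP level reaches `high`; infinite if flushing keeps up."""
    if level >= high:
        return 0.0
    if N <= n:
        return math.inf
    return (high - level) / (N - n)


def trickle_flush_rate(N: float, proportion: float) -> float:
    """Rate-based trickle flush: a fixed proportion of the incoming DP rate."""
    return proportion * N


# 5000 DPs/sec arriving, 3000 flushed/sec, 50,000 DPs cached, watermark at 200,000
print(seconds_to_high_watermark(50_000, 200_000, 5000, 3000))  # 75.0
```

With a proportion below 1, N − n = (1 − proportion)·N stays positive during a sustained burst. The level therefore always climbs toward the watermark, which matches the slide's point that trickle flush cannot absorb high bursts.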
Current cache writeback deficiency
Watermark-based flushing of DPs acts like a non-linear saturation in the cache closed loop
It introduces oscillations in the DP behavior due to the saturation
The oscillations add I/O latency on top of the disk latency
It makes disk I/O bursty – reducing aggregate performance
[Figure: Dirty pages (blue) and their rate of change (green) vs. Time [sec], roughly 280–360 sec; y-axis Memory [MB] / Rate [MB/sec]]
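A small numeric illustration of the last bullet, under assumed parameters: a perfectly steady stream is fed into an on/off (high/low watermark) flush, and the burstiness of the resulting disk writes is measured. The on/off switch is the saturation non-linearity. Peak-to-mean ratio is used here as a simple, assumed burstiness metric. The function names are hypothetical.

```python
def watermark_flushes(incoming: list[int], high: int, low: int, disk_max: int) -> list[int]:
    """Pages written to disk per step by an on/off (high/low watermark) policy."""
    dp, flushing, writes = 0, False, []
    for arriving in incoming:
        dp += arriving
        if dp >= high:
            flushing = True            # saturation: switch to full-speed flushing
        flushed = min(disk_max, dp - low) if flushing else 0
        dp -= flushed
        if dp <= low:
            flushing = False           # switch off again at the low watermark
        writes.append(flushed)
    return writes


def burstiness(writes: list[int]) -> float:
    """Peak-to-mean ratio of disk writes; 1.0 means perfectly smooth traffic."""
    avg = sum(writes) / len(writes)
    return max(writes) / avg if avg else 0.0


steady = [10] * 20
print(burstiness(steady))                                  # 1.0: flushing exactly what arrives
print(burstiness(watermark_flushes(steady, 60, 20, 50)))  # 5.0: idle, then 40-page bursts
```

The same total amount of data reaches the disk in both cases. Under the watermark policy, however, it arrives in bursts five times the average rate, separated by idle periods.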
I/O “slow down” problem
Flushing application data requires FS metadata (MD) updates to the same disks
Flush is triggered when the high-watermark threshold is crossed
Watermark-based flushes cannot throttle the I/O rate, as they are the last resort before a kernel crash from starvation
Additional I/Os are slowed down until the MD for newly arriving I/Os is flushed
Even if NVRAM is used, DPs must be removed from the cache to make room for additional I/Os
Application I/O latency increases until the cache is freed – “slow down”
In the worst case, the latency is so high that it resembles an I/O stoppage
If an additional burst of I/Os arrives from other new clients, there is no room for those I/Os, and new I/Os wait until the DP level drops below the low watermark – stoppage
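The stoppage in the last bullet can be reproduced with a toy model. The assumptions are as follows. I/Os that do not fit in the cache wait in a backlog. Once the cache fills, admission is blocked until flushing brings the DP level back down to the low watermark. Flushing runs at a fixed `disk_max` pages per step between the high and low watermarks. This is an illustrative sketch, not the server's actual admission logic.

```python
def waiting_ios(incoming: list[int], capacity: int, high: int, low: int,
                disk_max: int) -> list[int]:
    """Per-step count of I/Os waiting for cache pages (toy model, see lead-in)."""
    dp, backlog, flushing, blocked, waiting = 0, 0, False, False, []
    for arriving in incoming:
        backlog += arriving
        if not blocked:
            admitted = min(backlog, capacity - dp)
            dp += admitted
            backlog -= admitted
            if dp >= capacity:
                blocked = True          # cache full: stop admitting new I/Os
        if dp >= high:
            flushing = True
        if flushing:
            dp -= min(disk_max, dp - low)
            if dp <= low:
                flushing = False
                blocked = False         # admission resumes at the low watermark
        waiting.append(backlog)
    return waiting


burst = [60, 60, 60, 0, 0, 0, 0, 0]
print(waiting_ios(burst, capacity=100, high=80, low=20, disk_max=30))
# [0, 20, 80, 80, 0, 0, 0, 0]
```

During the third and fourth steps, no I/O is admitted even though flushing has already freed cache pages. Clients see a stoppage, not merely a slowdown, until the low watermark is reached.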
New algorithms for cache writeback
Address the deficiencies of current cache writeback methods
Inspired by control systems and signal processing theory
Use adaptive control and machine learning methods
Make better use of modern HW characteristics
The goals of the solution are to:
– Reduce the I/O slowdown so that it is limited only by maximum disk I/O throughput
– Minimize disk I/O burstiness
– Maximize the aggregate I/O performance of the system (benchmark)
The same algorithms apply to network as well as local FSs
All the algorithms can be used to flush both application DPs and MD DPs
New algorithms for cache writeback (cont.)
We present and simulate only 5 algorithms (more were considered):
– Modified Trickle Flush – an improved version of trickle that changes the flush priority and uses more CPU
– Fixed Interval Algorithm – uses a target number of DPs as the goal, similar to watermark methods, but compensates better for I/O bursts (semi-throttling) by pacing the flush to disk
– Variable Interval Algorithm – uses an adaptive control scheme that adjusts the time interval based on the change in DPs during the previous interval; similar to trickle but adapts faster to I/O bursts
– Quantum Flush – uses the idea of lowest DP retention in cache, similar to watermark-based methods, but adapts the flush speed in proportion to the number of new I/Os in the previous sample time
– Rate of Change Proportional Algorithm – flushes DPs in proportion to the first derivative of the number of DPs, using a fixed interval and a forgetting factor proportional to the difference between the I/O rate and maximum disk throughput:
$$c = R\,(t - t_i) + W\,\mu, \qquad \mu = \alpha\,\frac{B - R}{B}$$
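The Rate of Change Proportional formula can be evaluated directly once its symbols are pinned down. The slide does not define them, so the minimal sketch below assumes the following: c is the number of DPs to flush for the interval [t_i, t], R is the measured rate in pages/sec, W is the current number of DPs, B is the maximum disk throughput in pages/sec, and α is a gain (0.16 is the value in the simulation legend). Clamping c to [0, W] is also an added assumption.

```python
def forgetting_factor(R: float, B: float, alpha: float) -> float:
    """mu = alpha * (B - R) / B: shrinks toward 0 as the rate R nears disk throughput B."""
    return alpha * (B - R) / B


def rate_proportional_flush(R: float, t: float, t_i: float, W: float,
                            B: float, alpha: float = 0.16) -> float:
    """Pages to flush for the interval [t_i, t]: c = R*(t - t_i) + W*mu.

    Assumed meanings: R = measured rate (pages/sec), W = current DP count,
    B = max disk throughput (pages/sec). The clamp to [0, W] is also assumed.
    """
    c = R * (t - t_i) + W * forgetting_factor(R, B, alpha)
    return max(0.0, min(c, W))


# 100 pages/sec over a 1 sec interval, 1000 DPs cached, disk limit 500 pages/sec
print(rate_proportional_flush(R=100, t=1.0, t_i=0.0, W=1000, B=500))  # about 228
```

Under this reading, the first term tracks the incoming rate and the W·μ term drains the standing backlog. That second term fades as R approaches B, so the flush does not pile extra work onto an already saturated disk.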
Simulation results of new algorithms
Selection of the best algorithm by:
– Optimal response to unexpected bursts of I/Os
– Flush that best matches the rate of change of DPs in the cache (minimum DP level)
– Minimal I/O slowdown for clients (lower average I/O latency)
The rate-of-change algorithm with forgetting factor was the best
[Figure: Dirty Pages in the Buffer Cache for all Algorithms – best version; # of Dirty Pages vs. Time [sec], 0–100 sec; series: Trickle 1 sec, FIA, FVA, Quantum, Rate alpha=0.16]
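The three selection criteria can be turned into numbers from any simulation that records per-step DP levels and per-step waiting I/Os. The slides do not define the metrics. The definitions and the ranking priority below are therefore assumptions: peak DP level approximates the headroom left for bursts, mean DP level measures how tightly flushing tracks the DP rate of change, and mean waiting I/Os serves as a proxy for client slowdown.

```python
from statistics import mean


def score(dp_trace: list[float], waiting_trace: list[float]) -> dict[str, float]:
    """Turn a simulated run into numbers for the three selection criteria."""
    return {
        "peak_dp": max(dp_trace),             # headroom left for unexpected bursts
        "mean_dp": mean(dp_trace),            # how closely flushing tracks DP change
        "mean_waiting": mean(waiting_trace),  # proxy for client I/O slowdown
    }


def rank(results: dict[str, dict[str, float]]) -> list[str]:
    """Order algorithms by slowdown first, then mean DP level (assumed priority)."""
    return sorted(results, key=lambda name: (results[name]["mean_waiting"],
                                             results[name]["mean_dp"]))
```

Ranking by slowdown first reflects the deck's emphasis on client-visible latency. A different weighting could reasonably order the candidates differently.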
Experimental results of a real NFS server
We implemented the Modified Trickle and Rate Proportional algorithms on the Celerra NAS server
Used the SPEC sfs2008 benchmark and measured the number of DPs in cache at 4 msec resolution
Experimental results show some I/O slowdown with the Modified Trickle (MT) algorithm, resulting in 92K NFS IOPS (both diagrams sampled at the same 55K NFS IOPS load level)
The Rate Proportional algorithm shows a much shorter I/O slowdown time, resulting in 110.6K NFS IOPS
[Figure, two panels: # Dirty Pages and User I/Os [1000 IO/sec] vs. Time [sec], 0–100 sec; Dirty Pages in BC = green, User I/Os = red. First panel: Trickle Algorithm. Second panel: Rate Proportional Algorithm.]
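Expressed as a relative gain, the two measured benchmark results for Rate Proportional over Modified Trickle work out as follows (simple arithmetic on the reported figures):

$$\frac{110.6\,\text{K} - 92\,\text{K}}{92\,\text{K}} = \frac{18.6}{92} \approx 0.202$$

That is roughly 20% more NFS IOPS from the change in flush algorithm alone in this run.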
Summary and conclusions
Discussed new algorithms and a new paradigm to address cache writeback in modern FSs and servers
Discussed how the new algorithms can reduce the impact of application I/O bursts on aggregate I/O performance, which is otherwise bounded by maximum disk speed
We showed how current cache writeback algorithms create I/O slowdown at I/O rates that are below disk speed but change rapidly
We presented a small number of the algorithms described in the literature and explained their deficiencies
We discussed several new algorithms and showed simulation results that allowed us to select the best algorithm for experimentation
We presented experimental results for 2 algorithms and showed that Rate Proportional is the best algorithm under the given success criteria
Finally, we discussed how these algorithms can be used for MD and DP flushing on any file system, network or local
Future work and extension to Linux FS
Investigation of additional algorithms, inspired by signal processing of non-linear signals, that address the oscillatory behavior
Address similar cache writeback behavior in Linux local file systems, including ext3, ReiserFS and ext4 (to be discussed at the next Linux workshop)
Linux FS developers are aware of this behavior and are currently working to instrument the Linux kernel with the same measurement tools we used
We are also looking at machine learning to compensate for very fast I/O rate changes, which would allow optimizing application performance for very large numbers of clients
Additional work is needed to find algorithms that let maximum application performance equal maximum aggregate disk performance
We are also looking to instrument the NFS clients’ kernels so we can evaluate the I/O slowdown and tune the flush algorithm to reduce the slowdown effect to zero
More work is needed to extend this study to MD and find new MD-specific flushing methods