enlightening the i/o path: a holistic approach for application performance (usenix fast'17)

59
Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim 13 , Hwanju Kim 2 , Joonwon Lee 3 , and Jinkyu Jeong 3 Apposha 1 Dell EMC 2 Sungkyunkwan University 3

Upload: sangwook-kim

Post on 19-Mar-2017

103 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Enlightening the I/O Path:A Holistic Approach for Application Performance

Sangwook Kim13, Hwanju Kim2, Joonwon Lee3, and Jinkyu Jeong3

Apposha1

Dell EMC2

Sungkyunkwan University3

Page 2: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Data-Intensive Applications

2

Relational

Key-value

SearchColumn

Document

Page 3: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Data-Intensive Applications

• Common structure

3

Storage Device

Operating System

T1

Client

T2

I/O

T3 T4

Request Response

I/O I/O I/O

Application

Application

performance

* Example: MongoDB

- Client (foreground)

- Checkpointer

- Log writer

- Eviction worker

- …

Page 4: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Data-Intensive Applications

• Common structure

4

Storage Device

Operating System

T1

Client

T2

I/O

T3 T4

Request Response

I/O I/O I/O

Application

Application

performance

* Example: MongoDB

- Server (client)

- Checkpointer

- Log writer

- Evict worker

- …

Background tasks are problematic

for application performance

Page 5: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Impact

5

• Illustrative experiment

• YCSB update-heavy workload against MongoDB

Page 6: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Impact

6

• Illustrative experiment

• YCSB update-heavy workload against MongoDB

0

5000

10000

15000

20000

25000

30000

0 200 400 600 800 1000 1200 1400 1600 1800

Opera

tion thro

ughput

(ops/

sec)

Elapsed time (sec)

CFQ

Regular checkpoint task

30 seconds latency at 99.99th percentile

Page 7: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

0

5000

10000

15000

20000

25000

30000

0 200 400 600 800 1000 1200 1400 1600 1800

Opera

tion thro

ughput

(ops/

sec)

Elapsed time (sec)

CFQ CFQ-IDLE

Application Impact

7

• Illustrative experiment

• YCSB update-heavy workload against MongoDB

I/O priority does not help

Page 8: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

0

5000

10000

15000

20000

25000

30000

0 200 400 600 800 1000 1200 1400 1600 1800

Opera

tion thro

ughput

(ops/

sec)

Elapsed time (sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO

Application Impact

8

• Illustrative experiment

• YCSB update-heavy workload against MongoDB

State-of-the-art schedulers do not help much

Page 9: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

What’s the Problem?

• Independent policies in multiple layers

• Each layer processes I/Os w/ limited information

• I/O priority inversion

• Background I/Os can arbitrarily delay foreground tasks

9

Page 10: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

What’s the Problem?

• Independent policies in multiple layers

• Each layer processes I/Os w/ limited information

• I/O priority inversion

• Background I/Os can arbitrarily delay foreground tasks

10

Page 11: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

• Independent I/O processing

11

Storage Device

Caching Layer

Application

File System Layer

Block LayerAb

str

actio

n

Page 12: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

12

Storage Device

Caching Layer

Application

File System Layer

Block Layer

Buffer Cache

read() write()admission

control

Ab

str

actio

n

• Independent I/O processing

Page 13: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

• Independent I/O processing

13

Storage Device

Caching Layer

Application

File System Layer

Block Layer

Buffer Cache

read() write()

Block-level Q

admission control

Ab

str

actio

n

Page 14: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

• Independent I/O processing

14

Storage Device

Caching Layer

Application

File System Layer

Block LayerAb

str

actio

n

Buffer Cache

read() write()

reorder

FG FG BGBG

Page 15: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

• Independent I/O processing

15

Storage Device

Caching Layer

Application

File System Layer

Block LayerAb

str

actio

n

Buffer Cache

read() write()

FG FGBG

Device-internal Q

admission control

Page 16: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Multiple Independent Layers

• Independent I/O processing

16

Storage Device

Caching Layer

Application

File System Layer

Block LayerAb

str

actio

n

Buffer Cache

read() write()

FG FGBG

BG FG BGBG

reorder

Page 17: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

What’s the Problem?

• Independent policies in multiple layers

• Each layer processes I/Os w/ limited information

• I/O priority inversion

• Background I/Os can arbitrarily delay foreground tasks

17

Page 18: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• Task dependency

18

Storage Device

Caching Layer

Application

File System Layer

Block Layer

Locks

Condition variables

Page 19: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• Task dependency

19

Storage Device

Caching Layer

Application

File System Layer

Block Layer

Condition variables

I/OFGlock

BGwait

Page 20: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• Task dependency

20

Storage Device

Caching Layer

Application

File System Layer

Block Layer

I/OFGlock

BGwait

FGwait

wait

BGvarwake

Page 21: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• Task dependency

21

Storage Device

Caching Layer

Application

File System Layer

Block Layer

I/O

FGwait

wait

BGuser var

wake

FGwait

Page 22: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• I/O dependency

22

Storage Device

Caching Layer

Application

File System Layer

Block Layer

Outstanding I/Os

Page 23: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inversion

• I/O dependency

23

Storage Device

Caching Layer

Application

File System Layer

Block Layer

I/OFGwait

For ensuring consistency and/or durability

Page 24: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

24

100 ms latency at 99.99th percentile

0

5000

10000

15000

20000

25000

30000

35000

0 200 400 600 800 1000 1200 1400 1600 1800

Opera

tion thro

ughput

(ops/

sec)

Elapsed time (sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP

• Request-centric I/O prioritization (RCP)

• Critical I/O: I/O in the critical path of request handling

• Policy: holistically prioritizes critical I/Os along the I/O path

Our Approach

Page 25: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Challenges

• How to accurately identify I/O criticality

• How to effectively enforce I/O criticality

25

Page 26: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Critical I/O Detection

• Enlightenment API

• Interface for tagging foreground tasks

• I/O priority inheritance

• Handling task dependency

• Handling I/O dependency

26

Page 27: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling task dependency

• Locks

• Condition variables

27

FGlock

BG I/OFG

inherit

BG

submit

complete

FG BG

unlock

FGwait

BG

register

BG

inherit

FG BGI/O

submit

complete

wake

CV CV CV

Page 28: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

28

Page 29: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

29

Block Layer

I/OI/O

Page 30: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

30

Block Layer

Q admission stage

I/OI/O

Sched queueing stage

Page 31: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

31

Block Layer

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Q admission stage

I/OI/O

Sched queueing stage

Non-critical I/O tracking

Descriptor

Location

Resolver

Sector #

Page 32: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

32

Block Layer

Descriptor Location Resolver Sector #

allocate

Q admission stage

I/O

I/O

I/O

Sched queueing stage

Non-critical I/O tracking

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Page 33: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

33

Block Layer

Q admission stage

I/O

I/O

I/O

Sched queueing stage

Non-critical I/O tracking

update

Descriptor Location Resolver Sector #

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Page 34: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

34

Block Layer

Q admission stage

I/O

I/O

Sched queueing stage

I/O

Non-critical I/O tracking

update Descriptor Location Resolver Sector #

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Page 35: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

35

Block Layer

Q admission stage

I/O

I/O

Sched queueing stage

I/O

Non-critical I/O tracking

updateDescriptor Location Resolver Sector #

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Page 36: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

36

Block Layer

Q admission stage

I/O

I/O

Sched queueing stage

I/O

Non-critical I/O tracking

update

Descriptor Location Resolver Sector #

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

Page 37: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

I/O Priority Inheritance

• Handling I/O dependency

37

Block Layer

Q admission stage

I/O

I/O

Sched queueing stage

I/O

Non-critical I/O tracking

Descriptor Location Resolver Sector #

PER-DEV ROOT

NCIO NCIO

NCIO NCIO

delete on completion

Page 38: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Handling Transitive Dependency

• Possible states of dependent task

38

FG

inherit

BG BG

Blocked on task

I/OFG

inherit

BG

wait wait

Blocked on I/O

FG

inherit

BG

wait

Blocked at admission stage

Page 39: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Handling Transitive Dependency

• Recording blocking status

39

FG

inherit

BG BG

Blocked on task

I/OFG

inherit

BG

wait wait

Blocked on I/O

FG

inherit

BG

wait

Blocked at admission stage

Page 40: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Handling Transitive Dependency

• Recording blocking status

40

FG

inherit

BG BG

Task is recorded

I/OFG

inherit

BG

wait

Blocked on I/O

FG

inherit

BG

wait

Blocked at admission stage

inherit

Page 41: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Handling Transitive Dependency

• Recording blocking status

41

FG

inherit

BG BG I/OFG

inherit

BG

reprio

I/O is recorded

FG

inherit

BG

wait

Blocked at admission stage

Task is recorded

inherit

Page 42: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Handling Transitive Dependency

• Recording blocking status

42

FG

inherit

BG BG I/OFG

inherit

BG FG

inherit

BG

retryreprio

I/O is recorded

Task is recorded

inherit

Page 43: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Challenges

• How to accurately identify I/O criticality

• Enlightenment API

• I/O priority inheritance

• Recording blocking status

• How to effectively enforce I/O criticality

43

Page 44: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Criticality-Aware I/O Prioritization

• Caching layer

• Apply low dirty ratio for non-critical writes (1% by default)

• Block layer

• Isolate allocation of block queue slots

• Maintain 2 FIFO queues

• Schedule critical I/O first

• Limit # of outstanding non-critical I/Os (1 by default)

• Support queue promotion to resolve I/O dependency

44

Page 45: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Evaluation

• Implementation on Linux 3.13 w/ ext4

• Application studies

• PostgreSQL relational database

• Backend processes as foreground tasks

• I/O priority inheritance on LWLocks (semop)

• MongoDB document store

• Client threads as foreground tasks

• I/O priority inheritance on Pthread mutex and condition vars (futex)

• Redis key-value store

• Master process as foreground task

45

Page 46: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Evaluation

• Experimental setup

• 2 Dell PowerEdge R530 (server & client)

• 1TB Micron MX200 SSD

• I/O prioritization schemes

• CFQ (default), CFQ-IDLE

• SPLIT-A (priority), SPLIT-D (deadline) [SOSP’15]

• QASIO [FAST’15]

• RCP

46

Page 47: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• PostgreSQL w/ TPC-C workload

47

0

1000

2000

3000

4000

5000

6000

7000

10GB dataset 60GB dataset 200GB dataset

Transa

ctio

n thro

ughput

(trx

/sec)

CFQ CFQ-IDLE

Page 48: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• PostgreSQL w/ TPC-C workload

48

0

1000

2000

3000

4000

5000

6000

7000

10GB dataset 60GB dataset 200GB dataset

Transa

ctio

n thro

ughput

(trx

/sec)

CFQ CFQ-IDLE SPLIT-A

Page 49: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• PostgreSQL w/ TPC-C workload

49

0

1000

2000

3000

4000

5000

6000

7000

10GB dataset 60GB dataset 200GB dataset

Transa

ctio

n thro

ughput

(trx

/sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D

Page 50: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• PostgreSQL w/ TPC-C workload

50

0

1000

2000

3000

4000

5000

6000

7000

10GB dataset 60GB dataset 200GB dataset

Transa

ctio

n thro

ughput

(trx

/sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO

Page 51: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• PostgreSQL w/ TPC-C workload

51

0

1000

2000

3000

4000

5000

6000

7000

10GB dataset 60GB dataset 200GB dataset

Transa

ctio

n thro

ughput

(trx

/sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP

37%

31%28%

Page 52: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• Impact on background task

52

0

5

10

15

20

25

30

35

0 200 400 600 800 1000 1200 1400 1600 1800

Transa

ctio

n log s

ize (GB)

Elapsed time (sec)

CFQ CFQ-IDLE QASIO

Page 53: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• Impact on background task

53

0

5

10

15

20

25

30

35

0 200 400 600 800 1000 1200 1400 1600 1800

Transa

ctio

n log s

ize (GB)

Elapsed time (sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO

Page 54: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Throughput

• Impact on background task

54

Our scheme improvesapplication throughput w/o penalizing background tasks

0

5

10

15

20

25

30

35

0 200 400 600 800 1000 1200 1400 1600 1800

Transa

ctio

n log s

ize (GB)

Elapsed time (sec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP

Page 55: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Latency

• PostgreSQL w/ TPC-C workload

55

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

0 1000 2000 3000 4000 5000 6000

CCD

F P[X

>=x]

Transaction latency (msec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO

100

10-1

10-2

10-3

10-4

10-5

Over 2 sec at 99.9th

0th

90th

99th

99.9th

99.99th

99.999th

Page 56: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Application Latency

• PostgreSQL w/ TPC-C workload

56

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

0 1000 2000 3000 4000 5000 6000

CCD

F P[X

>=x]

Transaction latency (msec)

CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP

100

10-1

10-2

10-3

10-4

10-5

300 msecat 99.999th

Our scheme is effective for improving tail latency

Over 2 sec at 99.9th

100

10-1

10-2

10-3

10-4

10-5

0th

90th

99th

99.9th

99.99th

99.999th

Page 57: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Summary of Other Results

• Performance results

• MongoDB: 12%-201% throughput, 5x-20x latency at 99.9th

• Redis: 7%-49% throughput, 2x-20x latency at 99.9th

• Analysis results

• System latency analysis using LatencyTOP

• System throughput vs. Application latency

• Need for holistic approach

57

Page 58: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Conclusions

• Key observation• All the layers in the I/O path should be considered as a whole with I/O

priority inversion in mind for effective I/O prioritization

• Request-centric I/O prioritization• Enlightens the I/O path solely for application performance

• Improves throughput and latency of real applications

• Ongoing work• Practicalizing implementation

• Applying RCP to database cluster with multiple replicas

58

Page 59: Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Thank You!

• Questions and comments

• Contact

[email protected]

59