redundancy does not imply fault tolerance - usenix · redundancy does not imply fault tolerance:...

249
Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya Ganesan, Ramnatthan Alagappan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

Upload: others

Post on 08-Oct-2019

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does Not Imply Fault Tolerance:Analysis of Distributed Storage Reactions

to Single Errors and Corruptions

Aishwarya Ganesan, Ramnatthan Alagappan,

Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

Page 2: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy for Fault Tolerance

2

Page 3: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy for Fault Tolerance

2

Replication can mask failures

Page 4: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy for Fault Tolerance

2

Replication can mask failures

- System crashes

Page 5: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy for Fault Tolerance

2

Replication can mask failures

- System crashes

- Machine reboots

Page 6: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy for Fault Tolerance

2

Replication can mask failures

- Network failures

- System crashes

- Machine reboots

Page 7: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

Page 8: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

Depend on local file systems to store data

Page 9: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

How about partial storage faults?

Depend on local file systems to store data

Page 10: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

How about partial storage faults?

Depend on local file systems to store data

Page 11: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

How about partial storage faults?

Depend on local file systems to store data

Page 12: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

- disk block corruption

How about partial storage faults?

Depend on local file systems to store data

Page 13: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

- disk block corruption

File system may return I/O error on a read

How about partial storage faults?

Depend on local file systems to store data

Page 14: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

- disk block corruption

File system may return I/O error on a read

- latent sector error

How about partial storage faults?

Depend on local file systems to store data

Page 15: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

- disk block corruption

File system may return I/O error on a read

- latent sector error

We call these file-system faults

How about partial storage faults?

Depend on local file systems to store data

Page 16: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy in Distributed Storage

3

File system may return corrupted data on reads

- disk block corruption

File system may return I/O error on a read

- latent sector error

We call these file-system faults

How about partial storage faults?

Depend on local file systems to store data

Do distributed storage systems use redundancy to recover from local file-system faults?

Page 17: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our Study

4

Page 18: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our StudyBehavior of eight distributed systems in response to file-system faults

4

Page 19: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our StudyBehavior of eight distributed systems in response to file-system faults

Broad spectrum of replication and consensus protocols

4

Page 20: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our StudyBehavior of eight distributed systems in response to file-system faults

Broad spectrum of replication and consensus protocols

Replicated state machines- ZooKeeper (uses ZAB for consensus)

- LogCabin, CockroachDB, and RethinkDB (uses RAFT for consensus)

4

Page 21: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our StudyBehavior of eight distributed systems in response to file-system faults

Broad spectrum of replication and consensus protocols

Replicated state machines- ZooKeeper (uses ZAB for consensus)

- LogCabin, CockroachDB, and RethinkDB (uses RAFT for consensus)

Primary backup replication- MongoDB

- Redis

- Kafka (in-sync replicas for leader election)

4

Page 22: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Our StudyBehavior of eight distributed systems in response to file-system faults

Broad spectrum of replication and consensus protocols

Replicated state machines- ZooKeeper (uses ZAB for consensus)

- LogCabin, CockroachDB, and RethinkDB (uses RAFT for consensus)

Primary backup replication- MongoDB

- Redis

- Kafka (in-sync replicas for leader election)

Dynamo-style quorum- Cassandra (decentralized, no leader/follower)

4

Page 23: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault model

5

Page 24: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault model

A single fault to a single file-system block in a single node

5

Page 25: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault model

A single fault to a single file-system block in a single node

Type of faults:

- corruptions

- read errors

- write errors

5

Page 26: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Common Expectation

6

Page 27: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Common Expectation

6

Page 28: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Common Expectation

6

Fault Model: A single fault to one block in only one replica

Page 29: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Common Expectation

6

Fault Model: A single fault to one block in only one replica

Redundancy would enable recovery from local file-system faults

Page 30: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

7

Redundancy Does Not Imply Fault Tolerance

Page 31: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

7

Redundancy Does Not Imply Fault ToleranceA single fault in one node can cause catastrophic outcomes

Page 32: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

7

Redundancy Does Not Imply Fault ToleranceA single fault in one node can cause catastrophic outcomes

- data loss, corruption, unavailability, and spread of corruption to other intact replicas

Page 33: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

7

Redundancy Does Not Imply Fault ToleranceA single fault in one node can cause catastrophic outcomes

- data loss, corruption, unavailability, and spread of corruption to other intact replicas

Silent corruption

Unavailability Data lossReduced

redundancyQuery

failures

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 34: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

7

Redundancy Does Not Imply Fault ToleranceA single fault in one node can cause catastrophic outcomes

- data loss, corruption, unavailability, and spread of corruption to other intact replicas

Silent corruption

Unavailability Data lossReduced

redundancyQuery

failures

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 35: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

8

Page 36: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

Some fundamental problems across multiple systems – not just implementation bugs!

8

Page 37: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

Some fundamental problems across multiple systems – not just implementation bugs!

Faults are often undetected locally – leads to harmful global effects

8

Page 38: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

Some fundamental problems across multiple systems – not just implementation bugs!

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

8

Page 39: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

Some fundamental problems across multiple systems – not just implementation bugs!

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

Crash and corruption handling are entangled – loss of committed data

8

Page 40: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

Some fundamental problems across multiple systems – not just implementation bugs!

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

Crash and corruption handling are entangled – loss of committed data

Unsafe interaction between local behavior and global distributed protocols can spread corruption or data loss

8

Page 41: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Fundamental Problems: Summary

Page 42: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 43: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 44: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 45: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 46: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 47: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

9

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Fundamental Problems: Summary

Page 48: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Outline

Introduction

Fault Injection

System Behavior Analysis

Major Results

Observations Across Systems

Conclusion

10

Page 49: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault Model

11

Page 50: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault Model

11

Server 1 Server 2 Server 3

Page 51: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

Page 52: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

Page 53: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

File System

read

/w

rite

File System

read

/w

rite

File System

read

/w

rite

Page 54: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

File System

read

/w

rite

A single fault to a single file-system block in a single node

File System

read

/w

rite

File System

read

/w

rite

Page 55: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

File System

read

/w

rite

A single fault to a single file-system block in a single node

Fault for current run:server 1, block B1 read corruption

File System

read

/w

rite

File System

read

/w

rite

Page 56: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

File System

read

/w

rite

A single fault to a single file-system block in a single node

Fault for current run:server 1, block B1 read corruption

Fault for next run:server 1, block B1read error

File System

read

/w

rite

File System

read

/w

rite

Page 57: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model

11

Server 1 Server 2 Server 3Client

File System

read

/w

rite

A single fault to a single file-system block in a single node

Faults injected only to user data not filesystem metadata

Fault for current run:server 1, block B1 read corruption

Fault for next run:server 1, block B1read error

File System

read

/w

rite

File System

read

/w

rite

Page 58: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

File System File System File System

Page 59: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

File System File System File Systemext4 ext4ext4

Page 60: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

File System File System File Systemext4 ext4ext4

Page 61: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

a

File System File System File Systemext4 ext4ext4

Page 62: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

aC

orr

up

t d

ata

File System File System File Systemext4 ext4ext4

Page 63: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

aC

orr

up

t d

ata

Ext4: disk corruption → corrupted data to applications

File System File System File Systemext4 ext4ext4

Page 64: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Ext4: disk corruption → corrupted data to applications

File System File System File Systemext4 ext4ext4 btrfsbtrfs btrfs

Page 65: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

a

Ext4: disk corruption → corrupted data to applications

File System File System File Systemext4 ext4ext4 btrfsbtrfs btrfs

Page 66: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

a

Ext4: disk corruption → corrupted data to applications

File System File System File Systemext4 ext4ext4 btrfsbtrfs btrfs

I/O

Er

ror

Page 67: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Request

Fault Model: ext4 and btrfs

12

Server 1 Server 2 Server 3Client

Co

rru

pt

dat

a

Ext4: disk corruption → corrupted data to applications

Btrfs: disk corruption → I/O error to applications

File System File System File Systemext4 ext4ext4 btrfsbtrfs btrfs

I/O

Er

ror

Page 68: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault Injection Methodology

13

Server 1 Server 2 Server 3

File System File SystemFile System

Page 69: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault Injection Methodology

13

Server 1 Server 2 Server 3

File System File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

errfs - a FUSE file system to inject file-system faults

Page 70: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Fault Injection Methodology

13

Server 1 Server 2 Server 3

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

errfs - a FUSE file system to inject file-system faults

Page 71: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

errfs - a FUSE file system to inject file-system faults

Page 72: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

errfs - a FUSE file system to inject file-system faults

Page 73: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

errfs - a FUSE file system to inject file-system faults

Page 74: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

return B1’-B4

errfs - a FUSE file system to inject file-system faults

Page 75: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

return B1’-B4

errfs - a FUSE file system to inject file-system faults

Local Behavior

Page 76: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

return B1’-B4

errfs - a FUSE file system to inject file-system faults

Local BehaviorCrash RetryIgnore faulty dataNo detection/recovery

Page 77: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

return B1’-B4

errfs - a FUSE file system to inject file-system faults

Local BehaviorCrash RetryIgnore faulty dataNo detection/recovery Global Effect

Page 78: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read

Fault Injection Methodology

13

Server 1 Server 2 Server 3Client

File System

Fault for current run:server 1, block B1 read corruption File SystemFile Systemerrfs (FUSE FS)errfs (FUSE FS) errfs (FUSE FS)

read B1-B4

read B1-B4

return B1-B4

return B1’-B4

errfs - a FUSE file system to inject file-system faults

Local BehaviorCrash RetryIgnore faulty dataNo detection/recovery Global Effect

CorruptionData lossUnavailability

Page 79: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Outline

Introduction

Fault Injection

System Behavior Analysis

Major Results

Observations Across Systems

Conclusion

14

Page 80: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

System Behavior AnalysisBehavior of eight distributed systems in response to file-system faults

Broad spectrum of replication and consensus protocols

Replicated state machines- ZooKeeper (uses ZAB for consensus)

- LogCabin, CockroachDB, and RethinkDB (uses RAFT for consensus)

Primary backup replication- MongoDB

- Redis

- Kafka (in-sync replicas for leader election)

Dynamo-style quorum- Cassandra (decentralized, no leader/follower)

15

Page 81: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

An Example: Redis

16

Page 82: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

An Example: Redis

16

Redis is a popular data structure store

Page 83: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Leader

An Example: Redis

16

Redis is a popular data structure store

Page 84: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

Redis is a popular data structure store

Page 85: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 86: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

ClientWrite

Redis is a popular data structure store

Page 87: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

ClientWrite

Redis is a popular data structure store

Page 88: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

ClientWrite

Redis is a popular data structure store

Page 89: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

ClientWrite

Redis is a popular data structure store

Page 90: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 91: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

redis_database

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 92: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

redis_database

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 93: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

redis_database

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 94: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

redis_database

redis_database

redis_database

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 95: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Follower

Follower

Leader

An Example: Redis

16

redis_database

redis_database

redis_database

appendonlyfile

appendonlyfile

appendonlyfile

Redis is a popular data structure store

Page 96: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

Page 97: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

Read Workload

Page 98: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

Local Behavior

Read Workload

Page 99: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

Local Behavior

Read Workload

Page 100: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

Page 101: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderL

Page 102: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF

Page 103: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Page 104: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Page 105: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Page 106: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Local Behavior

Page 107: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Page 108: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Page 109: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Page 110: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

Page 111: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

UnavailabilityNo checksums to detect corruptionLeader crashes due to failed deserializationNo automatic failover - cluster unavailable

Page 112: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

Page 113: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

Page 114: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

Page 115: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

Page 116: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

Page 117: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

Page 118: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

Page 119: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

Page 120: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Page 121: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Page 122: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Page 123: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

CorruptionNo checksums to detect corruptionLeader returns corrupted data on queriesCorruption propagation to followers

Page 124: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Page 125: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Page 126: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Page 127: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Correct

Page 128: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Correct

Page 129: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redis: Behavior Analysis

17

CorruptRead

I/O Error

L LF F

Local Behavior

Read Workload

LeaderLFollowerF Crash

On-disk Structures

appendonlyfile.metadata

appendonlyfile.data

redis_database.block_0

redis_database.metadata

redis_database.userdata

Global Effect

CorruptRead

I/O Error

L LF F

Local Behavior

Global Effect

Unavailability

ReducedRedundancy

No Detection/ No Recovery

Corruption

Retry

WriteUnavailability

Correct

Page 130: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Other Systems

18

Page 131: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Other Systems

Metadata stores: ZooKeeper, LogCabin

Wide column store: Cassandra

Document stores: MongoDB

Distributed databases: RethinkDB, CockroachDB

Message Queues: Kafka

18

Page 132: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Outline

Introduction

Fault Injection

System Behavior Analysis

Major Results

Redundancy Does not Provide Fault Tolerance

Observations Across Systems

Conclusion

19

Page 133: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Page 134: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

ZooKeeper Write

Kafka Read

RethinkDB Read

Cassandra ReadKafka Write

Page 135: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

ZooKeeper WriteWrite Error

Kafka Read

RethinkDB ReadCorrupt

Cassandra ReadKafka Write

CorruptReadError Corrupt

ReadError

CorruptReadError

Page 136: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

Corrupt

Cassandra ReadKafka Write

checkpoint

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 137: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corrupt

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 138: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

CorruptionCorrupt

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 139: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Data Loss

Corrupt

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 140: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

Corrupt

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 141: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 142: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Query Failure

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Page 143: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Query Failure

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Reduced Redundancy

Page 144: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Query Failure

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Reduced Redundancy

Harmful global effects despite redundancy

Page 145: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

20

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Query Failure

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Reduced Redundancy

Harmful global effects despite redundancyNot simple implementation bugs - fundamental problems across multiple systems!

Page 146: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

OutlineIntroduction

Fault Injection

System Behavior Analysis

Major Results

Observations Across Systems

Faults are Often Undetected Locally

Crashing: Common Local Reaction

Crash and Corruption Handling are Entangled

Unsafe interaction between local and global protocols

Conclusion

21

Page 147: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Faults are Often Undetected Locally

22

Page 148: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

22

Page 149: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

Page 150: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

Cassandra: Locally Undetected Fault

Page 151: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

Client Replica 1 Other Replicas

Cassandra: Locally Undetected Fault

Page 152: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

Cassandra: Locally Undetected Fault

sstable

Page 153: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

Cassandra: Locally Undetected Fault

sstable

Page 154: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

Cassandra: Locally Undetected Fault

sstable

Page 155: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

Cassandra: Locally Undetected Fault

sstable

READ (R=1)

Page 156: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

Cassandra: Locally Undetected Fault

sstable

CORRUPT

READ (R=1)

Page 157: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read Repair

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

Cassandra: Locally Undetected Fault

sstable

CORRUPT

READ (R=1)

Page 158: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read Repair

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

key | value

Cassandra: Locally Undetected Fault

sstable

CORRUPT

READ (R=1)

Page 159: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read Repair

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

key | valueREAD

CORRUPT

Cassandra: Locally Undetected Fault

sstable

CORRUPT

READ (R=1)

Page 160: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Read Repair

sstable compression = offNo checksums to detect corruption

sstable

Faults are Often Undetected LocallyUndetected faults may lead to harmful global effect

- Locally undetected corruption → global silent corruption

22

key | value

Client Replica 1 Other Replicas

key | value

sstable.userdata corrupted

key | valueREAD

CORRUPT

Cassandra: Locally Undetected Fault

sstable

CORRUPT

READ (R=1)

Need for end-to-end integrity and error handling

Page 161: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crashing - Common Local Reaction

23

Page 162: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

Page 163: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

Crash

Leader

Follower

L

F

Page 164: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

Crash

Leader

Follower

L

F

Page 165: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

epochepoch_tmpmyidlog.transaction_headlog.transaction_bodylog.transaction_taillog.remaininglog.tail

ZooKeeper

L F

Crash

Leader

Follower

L

F

Page 166: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

epochepoch_tmpmyidlog.transaction_headlog.transaction_bodylog.transaction_taillog.remaininglog.tail

ZooKeeper

L F

Crash

Leader

Follower

L

F

Page 167: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

epochepoch_tmpmyidlog.transaction_headlog.transaction_bodylog.transaction_taillog.remaininglog.tail

ZooKeeper

L F

Crash

Leader

Follower

L

F

Crashing leads to reduced redundancy and imminent unavailability

Page 168: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

epochepoch_tmpmyidlog.transaction_headlog.transaction_bodylog.transaction_taillog.remaininglog.tail

ZooKeeper

L F

Crash

Leader

Follower

L

F

Crashing leads to reduced redundancy and imminent unavailabilityPersistent fault -- Requires manual intervention

Page 169: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

collections.headercollections.metadatacollections.dataindexjournal.headerjournal.otherstorage_bsonwiredtiger_wt

Crashing - Common Local Reaction

23

Many systems that reliably detect fault simply crash on encountering faults

MongoDBBlock Corruption during Read Workloads

L F

epochepoch_tmpmyidlog.transaction_headlog.transaction_bodylog.transaction_taillog.remaininglog.tail

ZooKeeper

L F

Crash

Leader

Follower

L

F

Crashing leads to reduced redundancy and imminent unavailabilityPersistent fault -- Requires manual intervention

Redundancy underutilized!

Page 170: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are Entangled

24

Page 171: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

24

Page 172: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

24

Page 173: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

data

24

Page 174: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

24

Page 175: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1

24

Page 176: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1

Append(log, entry 2)

24

Page 177: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1

Append(log, entry 2)

24

Page 178: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

24

Page 179: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

24

Page 180: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

24

Page 181: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

24

Page 182: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

Lose uncommitted data

24

Page 183: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

Lose uncommitted data

24

Page 184: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

Lose uncommitted data

0 1 2

24

Page 185: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

Lose uncommitted data

0 1 2

24

Page 186: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Lose uncommitted data

0 1 2

24

Page 187: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Lose uncommitted data

0 1 2

24

Page 188: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Action: Truncate log at 0

Lose uncommitted data

0 1 2

24

Page 189: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Action: Truncate log at 0

Lose uncommitted data

0 1 2

24

Page 190: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Action: Truncate log at 0

Lose uncommitted data

Lose committed data!

0 1 2

24

Page 191: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Action: Truncate log at 0

Lose uncommitted data

Lose committed data!

0 1 2

Developers of LogCabin and RethinkDB agree entanglement is the problem

24

Page 192: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Crash and Corruption Handling are EntangledKafka Message Log

0

checksum data

1 2

Append(log, entry 2)

Checksum mismatch

Checksum mismatch

Action: Truncate log at 1

Disk corruption

Action: Truncate log at 0

Lose uncommitted data

Lose committed data!

0 1 2

Developers of LogCabin and RethinkDB agree entanglement is the problem

24Need for discerning corruptions due to crashes from other type of corruptions

Page 193: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global Protocols

25

Page 194: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global Protocols

0 1 2

25

Kafka: Message log at Node 1

Page 195: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatch

0 1 2

25

Kafka: Message log at Node 1

Page 196: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0

0 1 2

25

Kafka: Message log at Node 1

Page 197: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Page 198: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

Page 199: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

Page 200: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Node1 Other Nodes

0 1 2

Page 201: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Node1 Other Nodes

0 1 2

Set of in-sync replicas

Page 202: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Node1 Other Nodes

0 1 2

Set of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Page 203: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Node1 Other Nodes

0 1 2

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 204: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other NodesREAD

0 1 2

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 205: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

0 1 2

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 206: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 207: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Assertion failure

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 208: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Assertion failureWRITE (W=2)

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 209: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Assertion failure

Failure

WRITE (W=2)

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Page 210: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Assertion failure

Failure

WRITE (W=2)

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Unsafe interaction between local behavior and leader election protocol leads to data loss and write unavailability

Page 211: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Unsafe Interaction between Local & Global ProtocolsDisk corruptionChecksum mismatchAction: Truncate log at 0Lose committed data!0 1 2

25

Kafka: Message log at Node 1

Local Behavior

0 1 2

Client Node1 Other Nodes

message:0

[Silent data loss]

READ

Truncate upto message 0

0 1 2

Assertion failure

Failure

WRITE (W=2)

Leader FollowersSet of in-sync replicas

Node1 with truncated log not removed from in-sync replicas

Node 1 elected as leader

Need for synergy between local behavior and global protocol

Unsafe interaction between local behavior and leader election protocol leads to data loss and write unavailability

Page 212: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Redundancy Does not Provide Fault Tolerance

26

Redis Read

CorruptReadError

txn_headlog.tail

ZooKeeper WriteWrite Error

log.headerlog.otherreplication

L F L F

L F

L F L F

Kafka Read

aof.metadataaof.datardb.metadatardb.userdata

RethinkDB Read

db.txn_headdb.txn_bodydb.txn_taildb.metablock

L F

Corruption

Write Unavailability

Data Loss

UnavailabilityCorrupt

Query Failure

Cassandra ReadKafka Write

checkpointL F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

sstable.block0sstable.metadatasstable.userdatasstable.index

Reduced Redundancy

Page 213: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Page 214: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Page 215: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Page 216: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Page 217: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Page 218: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Page 219: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Page 220: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Page 221: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Unsafe interaction between local and global protocols

Page 222: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Unsafe interaction between local and global protocols

Page 223: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Not simple implementation bugs - fundamental problems across multiple systems!

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Unsafe interaction between local and global protocols

Page 224: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Why does Redundancy Not Imply Fault Tolerance?

27

Redis Read

CorruptReadError

ZooKeeper Write

Write Error

L F L F

L F

L F L F

Kafka Read

RethinkDBRead

L F

Corrupt

Cassandra Read

Kafka Write

L F L F

CorruptReadError Corrupt

ReadError

CorruptReadError

Faults are often locally undetected

Not simple implementation bugs - fundamental problems across multiple systems!Redundancy underutilized as a source of recovery

Crashing on detecting faults is the common reaction

Crash and corruption handling are entangled

Unsafe interaction between local and global protocols

Page 225: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

28

Fundamental Problems: Summary

Page 226: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

28

Fundamental Problems: Summary

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 227: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

28

Fundamental Problems: Summary

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 228: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

28

Fundamental Problems: Summary

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 229: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

28

Fundamental Problems: Summary

More observations, results, and discussions in the paper …

Locally undetected

faults?

Crashing -common local

action?

Crash & corruption handling

entangled?

Unsafe interaction of local & global

protocols?

Redundancy underutilized for recovery?

Redis

ZooKeeper

Cassandra

Kafka

RethinkDB

MongoDB

LogCabin

CockroachDB

Page 230: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

OutlineIntroduction

Fault Injection

System Behavior Analysis

Major Results

Observations Across Systems

Faults are Often Undetected Locally

Crashing: Common Local Reaction

Crash and Corruption Handling are Entangled

Unsafe interaction between local and global protocols

Conclusion

29

Page 231: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Summary

30

Page 232: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

30

Page 233: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

30

Page 234: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

30

Page 235: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

30

Page 236: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

30

Page 237: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

Some fundamental problems across multiple systems:

30

Page 238: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

Some fundamental problems across multiple systems:

Faults are often undetected locally – leads to harmful global effects

30

Page 239: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

Some fundamental problems across multiple systems:

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

30

Page 240: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

Some fundamental problems across multiple systems:

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

Crash and corruption handling are entangled – loss of committed data

30

Page 241: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

SummaryWe analyzed distributed storage reactions to single file-system faults

Redis, ZooKeeper, Cassandra, Kafka, MongoDB, LogCabin, RethinkDB, and CockroachDB

Redundancy does not provide fault tolerance

A single fault in one node can cause catastrophic outcomes

data loss, corruption, unavailability, and spread of corruption to other intact replicas

Some fundamental problems across multiple systems:

Faults are often undetected locally – leads to harmful global effects

On detection, crashing is the common action – redundancy underutilized

Crash and corruption handling are entangled – loss of committed data

Unsafe interaction between local behavior and global distributed protocols can spread corruption or data loss

30

Page 242: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

31

Page 243: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

31

Page 244: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

31

Page 245: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

Detecting faults and not using redundancy to recover is undesirable

31

Page 246: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

Detecting faults and not using redundancy to recover is undesirable

Cannot always assume corruption to be caused by a crash

31

Page 247: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

Detecting faults and not using redundancy to recover is undesirable

Cannot always assume corruption to be caused by a crash

Local behavior has implications for distributed systems

31

Page 248: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

Detecting faults and not using redundancy to recover is undesirable

Cannot always assume corruption to be caused by a crash

Local behavior has implications for distributed systems

Our study provides directions for more robust distributed storage design

31

Page 249: Redundancy Does Not Imply Fault Tolerance - USENIX · Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions Aishwarya

Conclusion

Most distributed systems not yet resilient

Always detect faults – important in layered stacks on commodity hardware

Detecting faults and not using redundancy to recover is undesirable

Cannot always assume corruption to be caused by a crash

Local behavior has implications for distributed systems

Our study provides directions for more robust distributed storage design

Our fault injection framework available online: http://research.cs.wisc.edu/adsl/Software/cords/

Thank you!31