
Page 1

Carnegie Mellon

Increasing Intrusion Tolerance Via Scalable Redundancy

Greg Ganger (greg.ganger@cmu.edu)

Natassa Ailamaki, Mike Reiter, Priya Narasimhan, Chuck Cranor

Page 2

Technical Objective
To design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better. Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grow.

Targeting three types of services:
- Read-write data objects
- Custom "flat" object types for particular applications, notably directories for implementing an intrusion-tolerant file system
- Arbitrary objects that support object nesting

Page 3

Expected Impact
Significant efficiency and scalability benefits over today's protocols for intrusion tolerance.

For example, for data services, we anticipate:
- At least a twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures), with the improvement growing as the system scales up
- A twofold improvement in throughput, again growing with system size

Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas.

Page 4

The Problem Space
Distributed services manage redundant state across servers to tolerate faults. We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client: a faulty server or client may behave arbitrarily. We also make no timing assumptions in this work (an "asynchronous" system).

Primary existing practice: replicated state machines. This approach offers no load dispersion, requires data replication, and degrades as the system scales, with O(N²) messages.

Page 5

Our Approach
Combine techniques to eliminate work in common cases:
- Server-side versioning allows optimism with read-time repair, if necessary, and allows work to be off-loaded to clients in lieu of server agreement
- Quorum systems (and erasure coding) allow load dispersion (and more efficient redundancy for bulk data)
- Several other techniques are applied to defend against Byzantine actions

Major risk? Could be complex for arbitrary objects.

Page 6

Evaluation
Scenario I: "centralized server setting"

Baseline: the BFT library
- Popular, publicly available implementation of Byzantine fault-tolerant state machine replication (by Castro & Liskov)
- Reported to be an efficient implementation of that approach

Two measures:
- Average latency of operations, from the client's perspective
- Peak sustainable throughput of operations

Our consistency definition: linearizability of invocations

Page 7

Outline
- Overview
- Read-write storage protocol
- Some results
- Continuing work

Page 8

Read-Write Block Storage
- Clients erasure-code/replicate blocks into fragments
- Storage-nodes version fragments on every write

[Figure: a client erasure-codes a data block into fragments F1-F5, which are distributed across five storage-nodes]
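The fragment idea can be illustrated with a toy code (this is not the PASIS implementation: PASIS uses a general m-of-N erasure code, while this sketch stripes the block into m data fragments plus a single XOR parity fragment, so it tolerates only one lost fragment):

```python
def encode(block, m):
    """Split a block into m equal data fragments plus one XOR parity
    fragment (N = m + 1); any m of the N fragments recover the block."""
    frag_len = -(-len(block) // m)                   # ceiling division
    padded = block.ljust(m * frag_len, b"\x00")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]
    parity = frags[0]
    for f in frags[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, f))
    return frags + [parity]

def decode(frags, m, size):
    """Rebuild the block from fragments; at most one entry may be None."""
    frag_len = next(len(f) for f in frags if f is not None)
    missing = [i for i, f in enumerate(frags) if f is None]
    assert len(missing) <= 1, "single-parity sketch tolerates one erasure"
    if missing:
        rebuilt = bytearray(frag_len)                # XOR of survivors
        for f in frags:
            if f is not None:
                for k in range(frag_len):
                    rebuilt[k] ^= f[k]
        frags = list(frags)
        frags[missing[0]] = bytes(rebuilt)
    return b"".join(frags[:m])[:size]
```

With a real m-of-N code (e.g., Reed-Solomon), the same read/write structure holds, but any m of the N fragments suffice even when N - m > 1 are lost.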

Page 9

Challenges: Concurrency
Concurrent updates can violate linearizability.

[Figure: two clients write their data blocks concurrently, interleaving fragments 1-5 across the five servers]

Page 10

Challenges: Server Failures
Faulty servers can attempt to mislead clients. Typically addressed by "voting."

[Figure: one of five servers returns a corrupted fragment (4') to a reading client]

Page 11

Challenges: Client Failures
Byzantine client failures can also mislead clients. Typically addressed by submitting each request via an agreement protocol.

[Figure: a Byzantine client writes inconsistent fragments (2', 4') to some servers, so readers cannot decode a consistent data-item]

Page 12

Consistency via Versioning
Leverage versioning storage-nodes for consistency.

Allow writes to proceed with versioning:
- All writes create new data versions
- Partial writes and concurrency won't destroy data

Reader detects and resolves update conflicts:
- Concurrency is rare in FS workloads (typically < 1%)
- Offloads work to clients, resulting in greater scalability

Only perform extra work when needed:
- Optimistically assume fault-free, concurrency-free operation
- Single round-trip for reads and writes in the common case

Page 13

Our System Model

Crash-recovery storage-node fault model:
- Up to t total bad storage-nodes (crashed/Byzantine)
- Up to b ≤ t Byzantine (arbitrary faults)
- So, t - b faults are crash-recovery faults

Client fault model: any number of crash or Byzantine clients

Asynchronous timing model; point-to-point authenticated channels

Page 14

Read/Write Protocol
Unit of update: a block
- Complete blocks are read and written
- Erasure-coding may be used for space-efficiency

Update semantics: read-write
- No guarantee of contents between read and write
- Sufficient for block-based storage

Consistency: linearizability. Liveness: wait-freedom.

Page 15

R/W Protocol: Write

1. Client erasure-codes the data-item into N data-fragments
2. Client tags write requests with a logical timestamp (a round-trip is required to read the logical time)
3. Client issues requests to at least W storage-nodes
4. Storage-nodes validate the integrity of the request
5. Storage-nodes insert the request into their version history
6. Write completes after W requests have completed
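The steps above can be sketched as an in-memory simulation (names and structure invented for illustration; the real protocol sends requests over authenticated channels and its timestamps carry more than a counter):

```python
from collections import defaultdict

class StorageNode:
    """Versioning storage-node: writes append new versions, never overwrite."""
    def __init__(self):
        self.versions = defaultdict(list)      # block_id -> [(ts, fragment)]

    def logical_time(self, block_id):
        hist = self.versions[block_id]
        return hist[-1][0] if hist else 0

    def write(self, block_id, ts, fragment):
        self.versions[block_id].append((ts, fragment))

def client_write(block_id, fragments, nodes, W):
    # Steps 1-2: the caller has erasure-coded the block into fragments;
    # one round-trip reads the current logical time, and the write is
    # tagged with a strictly later timestamp.
    ts = max(n.logical_time(block_id) for n in nodes) + 1
    # Steps 3-6: issue requests; the write completes after W have completed
    # (in this local sketch, every request "completes" immediately).
    acks = 0
    for node, frag in zip(nodes, fragments):
        node.write(block_id, ts, frag)
        acks += 1
        if acks >= W:
            break
    return ts
```

Because versions only accumulate, a partial or concurrent write cannot destroy the previous complete version; the reader sorts it out.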

Page 16

R/W Protocol: Read

1. Client reads the latest version from a storage-node subset (the read set is guaranteed to intersect with the latest complete write)
2. Client determines the latest candidate write (the set of responses containing the latest timestamp)
3. Client classifies the candidate as one of: complete, incomplete, repairable

For consistency, only complete writes can be returned.

Page 17

R/W Protocol: Read Classification
Based on the client's (limited) system knowledge; failures and asynchrony lead to imperfect information.

Candidate classification rules:
- Complete: the candidate exists on W nodes; it is decoded and returned
- Incomplete: the candidate cannot exist on W nodes; read the previous version to determine a new candidate and iterate, performing classification on the new candidate
- Repairable: the candidate may exist on W nodes; repair and return the data-item
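The three-way rule can be phrased as a small function over what the client has observed (a hedged sketch, not the PASIS code; `count` is how many responding nodes hold the candidate, and nodes that have not yet responded might hold it too):

```python
def classify(count, responses, W, N):
    """Classify a candidate write from a client's partial view:
    count     - responding nodes observed to hold the candidate
    responses - total nodes that responded
    W, N      - write-completion threshold and total storage-nodes."""
    possible = count + (N - responses)  # best case: all silent nodes hold it
    if count >= W:
        return "complete"               # decode and return
    if possible >= W:
        return "repairable"             # repair up to W nodes, then return
    return "incomplete"                 # classify the previous version
```

For example, with N = 5 and W = 3: seeing the candidate on 3 of 4 responders makes it complete; on 2 of 4, repairable (the silent node might hold it); on 1 of 4, incomplete.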

Page 18

Example: Successful Read (N = 5, W = 3, t = 1, b = 0)

[Figure: timeline across five storage-nodes. At T0, write D0 reaches nodes 1-3 (complete); at T1, write D1 reaches only node 1 (incomplete). A client read after T1 finds D1 as the latest candidate, classifies it incomplete, takes D0 as the new candidate, determines D0 complete, and returns D0.]

Page 19

Example: Repairable Read (N = 5, W = 3, t = 1, b = 0)

[Figure: timeline across five storage-nodes. After complete write D0 and incomplete write D1, write D2 at T2 reaches only two nodes. A client read after T2 finds D2 as the latest candidate, classifies it repairable, repairs D2 by writing its fragments to additional nodes, and returns D2.]

Page 20

Protecting Against Byzantine Storage-Nodes
Must defend against servers that modify data in their possession.

Solution: cross checksums [Gong 89]
- Hash each data-fragment
- Concatenate all N hashes
- Append the cross checksum to each fragment
- Clients verify hashes against fragments and use cross checksums as "votes"

[Figure: a data-item is split into data-fragments; their hashes are concatenated to form the cross checksum]
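The construction is straightforward to sketch (SHA-256 is an assumption here; the slide does not name the hash function):

```python
import hashlib

DIGEST = 32                                   # SHA-256 digest size in bytes

def make_cross_checksum(fragments):
    """Cross checksum [Gong 89]: the concatenation of each fragment's
    hash; the writer appends the same cross checksum to every fragment."""
    return b"".join(hashlib.sha256(f).digest() for f in fragments)

def verify_fragment(index, fragment, cross_checksum):
    """Client-side 'vote' check: does this fragment match its slot in
    the cross checksum a storage-node returned?"""
    slot = cross_checksum[index * DIGEST:(index + 1) * DIGEST]
    return hashlib.sha256(fragment).digest() == slot
```

A Byzantine storage-node that alters its fragment fails the slot check, and because every node stores the full cross checksum, honest copies outvote tampered ones.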

Page 21

Protecting Against Byzantine Clients
Must ensure all fragment sets decode to the same value.

Solution: validating timestamps
- Write: place the hash of the cross checksum in the timestamp (this also prevents multiple values being written at the same timestamp); storage-nodes validate their fragment against the corresponding hash
- Read: regenerate the fragments and cross checksum

[Figure: a Byzantine encoding with a "poisonous" fragment, where different subsets of fragments F1-F5 decode to different data-items]
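A sketch of how a validating timestamp binds the written value (the tuple structure here is invented for illustration; the real timestamp format differs):

```python
import hashlib

DIGEST = 32                                    # SHA-256 digest size in bytes

def make_timestamp(logical_time, client_id, cross_checksum):
    """Embed the hash of the cross checksum in the timestamp: two
    different fragment sets cannot share one timestamp."""
    return (logical_time, client_id, hashlib.sha256(cross_checksum).digest())

def node_accepts(index, fragment, cross_checksum, ts):
    """Storage-node check on write: the cross checksum must match the
    hash inside the timestamp, and the node's own fragment must match
    its slot in the cross checksum."""
    _, _, cc_hash = ts
    if hashlib.sha256(cross_checksum).digest() != cc_hash:
        return False
    slot = cross_checksum[index * DIGEST:(index + 1) * DIGEST]
    return hashlib.sha256(fragment).digest() == slot
```

A Byzantine client that hands different nodes fragments from a "poisonous" encoding must either produce mismatched slots (rejected at write time) or a consistent cross checksum that readers can re-verify by regenerating the fragments.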

Page 22

Experimental Setup
Prototype system: PASIS, on a 20-node cluster
- Dual 1 GHz Pentium III storage-nodes
- Single 2 GHz Pentium 4 clients
- 100 Mb switched Ethernet
- 16 KB data-item size (before encoding)

Blowup of N/m over the data-item size: each fragment is 1/m of the data-item size

Page 23

PASIS Response Time

[Figure: mean response time (ms, 0-20) vs. total failures tolerated (t = 1-4), for reads and writes under two fault models, b = t and b = 1, with a 1-way 16 KB ping shown for reference. N = 2t + 2b + 1, so at t = 4 the configurations are N = 17 (b = t) and N = 11 (b = 1). Decode computation and network delay for redundant fragments account for the added latency.]
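The node-count relation used in the response-time experiment is easy to check directly (a one-line helper; the formula is as stated on the slide):

```python
def min_nodes(t, b):
    """Storage-nodes required by the R/W protocol: N = 2t + 2b + 1,
    where t bounds total faulty nodes and b <= t bounds how many of
    those may be Byzantine."""
    assert 0 <= b <= t, "Byzantine faults are a subset of total faults"
    return 2 * t + 2 * b + 1
```

For t = 4, min_nodes(4, 4) gives 17 and min_nodes(4, 1) gives 11, matching the N = 17 and N = 11 configurations in the figure; weakening the fault model (smaller b) directly shrinks the system.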

Page 24

Throughput Experiment
Same system set-up as the response-time experiment:
- Clients issue read or write requests
- Increase the number of clients to increase load

Demonstrate the value of erasure-codes: increase m to reduce per storage-node load

Compare with Byzantine atomic broadcast, the BFT library [Castro & Liskov 99]:
- Supports arbitrary operations
- Replication (with multicast) limits write throughput
- O(N²) messages limit performance scalability

Page 25

PASIS vs. BFT: Write Throughput

[Figure: write throughput (req/s, 0-3500) vs. number of clients (0-8), for b = t = 1. PASIS configurations (m = 2, N = 5 and m = 3, N = 6) are compared with BFT (N = 4). PASIS has roughly 60% higher write throughput than BFT: erasure-codes reduce per storage-node load, whereas BFT's replication increases it.]

Page 26

PASIS vs. BFT: Read Throughput

[Figure: read throughput (req/s, 0-3500) vs. number of clients (0-8), for b = t = 1; PASIS (m = 2, N = 5 and m = 3, N = 6) vs. BFT (N = 4).]

Page 27

Continuing Work
New testbed: 70 servers connected with switched Gbit/s Ethernet
- Experiments can then explore higher scalability points
- The baseline and our results will come from this testbed

Protocol for arbitrary deterministic functions on objects, built from the same basic primitives

Protocol for objects with nested objects, which adds the requirement of replicated invocations

Page 28

Summary
Goal: to design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better. Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grow.

Started with a protocol for read-write storage based on versioning and quorums:
- Scales efficiently (and much better than BFT)
- Also flexible (assumptions can be added to reduce costs)

Going forward (in progress): generalize the types of objects and operations that can be supported

Page 29

Questions?

Page 30

Garbage Collection
Pruning old versions is necessary to reclaim space. Versions prior to the latest complete write can be pruned.

Storage-nodes need to know the latest complete write:
- In isolation they do not have this information
- Perform a read operation to classify the latest complete write

Many possible policies exist for when to clean what:
- Best to clean during idle time (if possible)
- Rank blocks in order of greatest potential gains
- Work remains in this area
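Once the timestamp of the latest complete write is known (via a read operation), pruning itself is simple; a sketch over a per-block version history kept as a timestamp-sorted list of (timestamp, fragment) pairs:

```python
def prune(version_history, latest_complete_ts):
    """Drop versions older than the latest complete write. The latest
    complete version itself (and anything newer) must be kept so that
    future reads can still find a complete candidate."""
    return [(ts, frag) for ts, frag in version_history
            if ts >= latest_complete_ts]
```

The safety argument mirrors the read protocol: a reader never needs a version older than the latest complete write, so only those versions are reclaimable.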