Increasing Intrusion Tolerance Via Scalable Redundancy

Carnegie Mellon

Approved for Public Release, Distribution Unlimited

Michael Reiter, [email protected]

Anastasia Ailamaki, Greg Ganger, Priya Narasimhan, Chuck Cranor

The Problem Space

Distributed services manage redundant state across servers to tolerate faults.

We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client: a faulty server or client may behave arbitrarily.

We also make no timing assumptions in this work, i.e., we assume an “asynchronous” system.

Our Goals

To design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better. Here, “scale” refers to efficiency as the number of servers and the number of failures tolerated grow.

We target three types of services: read-write data objects; custom “flat” object types for particular applications, notably directories for implementing an intrusion-tolerant file system; and arbitrary objects that support object nesting.

Expected Impact

Significant efficiency and scalability benefits over today’s approaches to intrusion tolerance.

For example, for data services, we anticipate an at-least-twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures), with the improvement growing as the system scales up. We also anticipate a twofold improvement in throughput, again growing with system size.

Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas.

Outline: Concepts, Challenges, Techniques, Systems, Technology transfer

Concepts: Distributed Services

Figure: the service, or object, abstraction (methods such as push, pop, and sort, each accessed through an invocation and a response) alongside its implementation.

Concepts: Linearizability [Herlihy & Wing 1991]

A strong and accepted semantics for shared objects. It mimics the semantics of a centralized object implementation: each method appears to be executed at a distinct point between its invocation and response.

Figure: object invocations by clients c1 and c2 over time, and their apparent execution.
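
To make the definition concrete, here is a minimal brute-force linearizability checker for a single read-write register, offered as an illustration rather than as part of the described work. It assumes each operation is recorded with hypothetical invocation and response times, and it searches for an ordering that respects real time (an operation that responded before another was invoked comes first) and in which every read returns the most recent preceding write. It is exponential in the history length and only suitable for tiny examples.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional, Sequence

@dataclass(frozen=True)
class Op:
    client: str
    kind: str              # "write" or "read"
    value: Optional[str]   # value written, or value the read returned
    inv: float             # invocation time (hypothetical clock)
    resp: float            # response time

def respects_real_time(order: Sequence[Op]) -> bool:
    """An operation that responded before another was invoked must come first."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.resp < earlier.inv:
                return False
    return True

def legal_register(order: Sequence[Op], initial: Optional[str] = None) -> bool:
    """Each read returns the value of the most recent preceding write."""
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True

def is_linearizable(history: Sequence[Op]) -> bool:
    # Try every total order; accept if one is both real-time consistent and legal.
    return any(respects_real_time(p) and legal_register(p)
               for p in permutations(history))

# c2's read overlaps c1's write, so it may return the new value ...
ok = [Op("c1", "write", "x", 0, 2), Op("c2", "read", "x", 1, 3)]
# ... but a read invoked after the write responded must not return the old value.
stale = [Op("c1", "write", "x", 0, 2), Op("c2", "read", None, 3, 4)]
print(is_linearizable(ok), is_linearizable(stale))  # True False
```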

Concepts: State Machine Replication

Offers no load dispersion, and degrades as the system scales.

Figure: every invocation (inv) is processed by every server.

Concepts: Wait-Freedom [Herlihy 1990]

A liveness property for object invocations. Informally, an implementation is wait-free if any client’s operation is guaranteed to complete, assuming a limit on the number of faulty servers [Jayanti et al.] but not assuming a limit on the number of faulty clients.

Intuitively, wait-freedom precludes synchronization mechanisms that must be “unlocked” by a client.

Only read-write objects can be implemented in a wait-free way; virtually any other object cannot (in an asynchronous system).

Challenges: Concurrency

Concurrent updates can violate linearizability.

Figure: two concurrent updates of Data sent to servers 1-5.
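
To illustrate why concurrency is a challenge even when every two read quorums overlap, the hypothetical Python simulation below delivers two concurrent writes, A and B, to five servers in different orders. Each server simply keeps the last value it receives (no concurrency control), and a reader takes the majority value among three servers. The delivery orders and helper names are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

def apply_writes(delivery_orders: Dict[int, List[str]]) -> Dict[int, str]:
    """Each server keeps the last value delivered to it (no concurrency control)."""
    return {server: order[-1] for server, order in delivery_orders.items()}

def read_from(state: Dict[int, str], quorum: List[int]) -> str:
    """Return the most common value among the queried servers."""
    return Counter(state[s] for s in quorum).most_common(1)[0][0]

# Hypothetical delivery orders for two concurrent writes "A" and "B",
# both of which complete before any read starts.
orders = {1: ["A", "B"], 2: ["A", "B"], 3: ["B", "A"], 4: ["B", "A"], 5: ["B", "A"]}
state = apply_writes(orders)
first = read_from(state, [1, 2, 3])   # servers 1 and 2 hold "B", server 3 holds "A"
second = read_from(state, [3, 4, 5])  # all three hold "A"
print(first, second)                  # B A
```

If the first read completes before the second begins, no single order of the two writes explains both results: reading B means B was last, and then reading A means A was last. Attaching ordered timestamps to writes is one way to rule this out.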

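Challenges: Server Failures

Faulty servers can attempt to mislead clients. This is typically addressed by “voting”.

Figure: a client reads from servers 1-5, one of which (4’) is faulty, and cannot tell which reply to trust.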
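
A minimal sketch of voting, assuming at most b servers are faulty: a client accepts a value only if at least b + 1 servers report it, since any b + 1 matching replies must include at least one correct server. The function name and reply format are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional

def vote(responses: Dict[int, Hashable], b: int) -> Optional[Hashable]:
    """Return the value reported by the most servers if at least b + 1 servers
    report it; otherwise return None (no value is vouched for by a correct server)."""
    counts = Counter(responses.values())
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= b + 1 else None

replies = {1: "D", 2: "D", 3: "D", 4: "D'", 5: "D"}   # server 4 is faulty
print(vote(replies, b=1))                                # D
```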

Challenges: Client Failures

Byzantine client failures can also mislead correct clients. This is typically addressed by submitting each request via an agreement protocol.

Figure: Data stored at the servers, with its value in doubt after a faulty client’s update.

Challenges: Object Nesting

Distributed objects have stubs and replicas.

Figure: an object’s stubs and its replicas on servers.

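Techniques: Versioning

Figure: five servers (1-5) over time. Data D0 is written at time T0 and D1 at time T1; a write must reach 3 servers to be complete (3 writes required). A client read operation after T1 first takes D1 as the latest candidate and finds it incomplete; it then takes D0 as the latest candidate, determines that D0 is complete, and returns it.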

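Techniques: Repair

Figure: five servers (1-5) over time, with writes of D0, D1, and D2 at times T0, T1, and T2; one server is unreachable. A client read operation after T2 takes D2 as the latest candidate but finds it unclassifiable, so it repairs D2 by writing it to further servers and then returns D2.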
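
The read procedure in the versioning and repair figures can be sketched as follows. This is a simplified Python illustration under our own assumptions: each reachable server returns the set of versions it holds as (timestamp, value) pairs; a candidate is complete if at least a threshold number of servers hold it, incomplete if it could not reach that threshold even counting unreachable servers, and unclassifiable otherwise; and repair is modelled as writing the candidate to every reachable server.

```python
from typing import Dict, Optional, Set, Tuple

Candidate = Tuple[int, str]  # (logical timestamp, data value), e.g. (1, "D1")

def classify(count: int, unreachable: int, threshold: int) -> str:
    """Simplified classification rule (an assumption, not a protocol specification)."""
    if count >= threshold:
        return "complete"
    if count + unreachable < threshold:
        return "incomplete"
    return "unclassifiable"

def read(servers: Dict[int, Optional[Set[Candidate]]], threshold: int) -> Optional[Candidate]:
    """servers maps server id -> set of versions it holds, or None if unreachable.
    Mutates reachable servers' version sets when it repairs a candidate."""
    unreachable = sum(1 for v in servers.values() if v is None)
    reachable = {s: v for s, v in servers.items() if v is not None}
    # Consider candidates from the latest timestamp downwards.
    for cand in sorted(set().union(*reachable.values()), reverse=True):
        holders = sum(1 for v in reachable.values() if cand in v)
        status = classify(holders, unreachable, threshold)
        if status == "incomplete":
            continue  # discard and fall back to the next-latest candidate
        if status == "unclassifiable":
            if len(reachable) < threshold:
                raise RuntimeError("too few reachable servers to repair")
            for versions in reachable.values():  # repair: write the candidate back
                versions.add(cand)
        return cand
    return None  # no candidate survives: only the initial value exists

# Versioning: D1 reached one server, D0 reached three.
v_servers = {1: {(0, "D0")}, 2: {(0, "D0")}, 3: {(0, "D0"), (1, "D1")}, 4: set(), 5: set()}
print(read(v_servers, threshold=3))   # (0, 'D0')

# Repair: D2 is on two servers and server 5 is unreachable.
r_servers = {1: {(2, "D2")}, 2: {(2, "D2")}, 3: {(0, "D0")}, 4: {(0, "D0")}, 5: None}
print(read(r_servers, threshold=3))   # (2, 'D2'), returned after repair
```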

Techniques: Quorum Systems

A quorum system is a data redundancy technique that supports load dispersion among servers: only a subset of the servers (a quorum) is accessed in each operation. In the table, n is the number of servers and b is the number of Byzantine server failures tolerated.

Example (figure): a grid with n = 49, b = 3.

Construction                 Resilience   Quorum size
Threshold [DC 1998]          b < n/4      3n/4
M-Grid [SIAM JoC 2000]       b < √n/2     O(√(bn))
Boost FPP [SIAM JoC 2000]    b < √n/4     O(√(bn))
Probabilistic [I&C]          b < n/2      O(max{√(bn), b})
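
Masking quorum systems, used to tolerate Byzantine servers, require any two quorums to intersect in at least 2b + 1 servers, so that correct servers in the intersection outnumber faulty ones. The Python sketch below builds a grid construction for the n = 49, b = 3 example, assuming (in the spirit of M-Grid) that a quorum consists of √(b+1) full rows plus √(b+1) full columns, and checks the intersection property exhaustively. The function name is hypothetical.

```python
from itertools import combinations
from math import isqrt

def grid_quorums(side: int, b: int):
    """Yield every quorum of a side x side grid of servers: k full rows plus k full
    columns, where k is the smallest integer with k * k >= b + 1 (assumption)."""
    k = isqrt(b + 1)
    if k * k < b + 1:
        k += 1
    for rows in combinations(range(side), k):
        for cols in combinations(range(side), k):
            cells = {(r, c) for r in rows for c in range(side)}
            cells |= {(r, c) for r in range(side) for c in cols}
            yield frozenset(cells)

n, b = 49, 3
quorums = list(grid_quorums(isqrt(n), b))
size = len(quorums[0])
# Smallest overlap between any two distinct quorums (441 quorums, ~97k pairs).
min_overlap = min(len(q1 & q2) for q1, q2 in combinations(quorums, 2))
print(f"quorum size {size} of {n}; smallest intersection {min_overlap}; needed {2 * b + 1}")
# quorum size 24 of 49; smallest intersection 8; needed 7
```

Each operation touches 24 of the 49 servers rather than all of them, which is the load dispersion the slide refers to.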

Techniques: Cross Checksums [Gong 1989]

A mechanism for defending against Byzantine servers that attempt to alter data in their possession. Each data fragment is appended with the hashes of all data fragments, together called the cross checksum. When fragments are retrieved, the hashes are used as “votes” to determine the correct data fragments.

Figure: a data-item is encoded into data-fragments; the hashes of all fragments form the cross checksum.
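
A minimal Python sketch of cross checksums follows. For brevity it assumes plain striping of the data-item into k fragments, one per server, instead of the encoding schemes a real system would use, and it uses SHA-256 as the hash; the function names are hypothetical. A reader first votes on the cross checksum (b + 1 matching copies) and then checks each fragment against its hash.

```python
import hashlib
from collections import Counter
from typing import List, Tuple

Stored = Tuple[bytes, Tuple[str, ...]]  # (fragment, cross checksum)

def fragment(data: bytes, k: int) -> List[bytes]:
    """Split data into k stripes (plain striping stands in for a real encoding)."""
    size = -(-len(data) // k)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(k)]

def store(data: bytes, k: int) -> List[Stored]:
    """Server i stores fragment i together with the hashes of all fragments."""
    frags = fragment(data, k)
    cross_checksum = tuple(hashlib.sha256(f).hexdigest() for f in frags)
    return [(f, cross_checksum) for f in frags]

def verify(stored: List[Stored], b: int) -> List[int]:
    """Vote on the cross checksum, then return indices of fragments matching it."""
    cc, votes = Counter(c for _, c in stored).most_common(1)[0]
    if votes < b + 1:
        raise ValueError("no cross checksum is vouched for by b + 1 servers")
    return [i for i, (frag, _) in enumerate(stored)
            if hashlib.sha256(frag).hexdigest() == cc[i]]

stored = store(b"intrusion tolerance!", k=4)
stored[2] = (b"XXXXX", stored[2][1])   # a Byzantine server alters its fragment
print(verify(stored, b=1))             # [0, 1, 3]
```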

Techniques: Validating Timestamps

A technique for defending against Byzantine clients that attempt to write different data values at the same timestamp. The cross checksum of the write value is recorded, as a hash, in its timestamp. Read results are used to regenerate all data fragments, which are then compared to the timestamp.

Figure: read results are decoded into the data-item, which is re-encoded into all data-fragments; their hashes form the cross checksum, whose hash is compared with the hash in the timestamp.
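
The sketch below illustrates validating timestamps under stated assumptions: a toy erasure code (k data stripes plus one XOR parity fragment, so any k fragments decode), SHA-256 hashes, and a timestamp modelled as a (logical time, hash of cross checksum) pair. None of these names or encodings come from the slides. It shows that a reader who decodes from any k fragments, regenerates all fragments, and rehashes them rejects a write whose fragments are not mutually consistent.

```python
import hashlib
from functools import reduce
from typing import Dict, List, Tuple

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Toy erasure code (assumption): k equal stripes plus one XOR parity fragment."""
    size = -(-len(data) // k)
    data = data.ljust(size * k, b"\0")
    stripes = [data[i * size:(i + 1) * size] for i in range(k)]
    return stripes + [reduce(xor, stripes)]

def decode(frags: Dict[int, bytes], k: int) -> bytes:
    """Rebuild the (padded) data-item from k fragments keyed by index 0..k."""
    frags = dict(frags)
    for m in [i for i in range(k) if i not in frags]:  # at most one stripe missing
        frags[m] = reduce(xor, [frags[i] for i in range(k + 1) if i != m])
    return b"".join(frags[i] for i in range(k))

def ts_hash(frags: List[bytes]) -> str:
    """Hash of the cross checksum (the concatenated hashes of all fragments)."""
    cross_checksum = b"".join(hashlib.sha256(f).digest() for f in frags)
    return hashlib.sha256(cross_checksum).hexdigest()

def make_timestamp(time: int, frags: List[bytes]) -> Tuple[int, str]:
    return (time, ts_hash(frags))

def validate(read_results: Dict[int, bytes], k: int, timestamp: Tuple[int, str]) -> bool:
    """Regenerate all fragments from the read results and compare with the timestamp."""
    regenerated = encode(decode(read_results, k), k)
    return ts_hash(regenerated) == timestamp[1]

k = 3
good = encode(b"directory entry", k)
ts = make_timestamp(7, good)
print(validate({0: good[0], 2: good[2], 3: good[3]}, k, ts))      # True

# A Byzantine client sends an inconsistent parity fragment but timestamps what it sent.
bad = good[:k] + [bytes(len(good[k]))]
bad_ts = make_timestamp(7, bad)
print(validate({0: bad[0], 1: bad[1], 2: bad[2]}, k, bad_ts))      # False
print(validate({0: bad[0], 1: bad[1], 3: bad[3]}, k, bad_ts))      # False
```

Whichever k fragments a reader happens to use, the inconsistent write is rejected, so no two readers can be led to accept different values for the same timestamp.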

Techniques: Replicated Invocation

b stub replicas alone cannot invoke an operation on another object; more than b stub replicas can.
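
As a hedged sketch of this rule, the hypothetical class below sits on the callee side of a nested invocation and executes a request only after more than b distinct stub replicas have issued the identical invocation; since at most b replicas are faulty, the faulty ones alone can never trigger it. The class name, the tuple encoding of invocations, and the assumption that each invocation carries a unique sequence number are illustrative choices, not the project's design.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set

class InvocationGate:
    """Hypothetical callee-side filter for nested invocations made by stub replicas."""

    def __init__(self, b: int):
        self.b = b  # maximum number of faulty stub replicas
        self.requests: DefaultDict[Hashable, Set[int]] = defaultdict(set)
        self.executed: Set[Hashable] = set()

    def request(self, replica_id: int, invocation: Hashable) -> bool:
        """Record one stub replica's request; return True exactly once, when more
        than b distinct replicas have asked for this same invocation."""
        if invocation in self.executed:
            return False
        self.requests[invocation].add(replica_id)
        if len(self.requests[invocation]) > self.b:
            self.executed.add(invocation)
            del self.requests[invocation]
            return True
        return False

gate = InvocationGate(b=1)
call = ("append", "x", 17)                      # (method, argument, sequence number)
print(gate.request(1, call))                    # False: only one replica so far
print(gate.request(2, ("delete", "*", 17)))     # False: a lone (faulty) replica
print(gate.request(3, call))                    # True: b + 1 = 2 matching requests
print(gate.request(4, call))                    # False: already executed
```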

Our Research

To summarize, we will explore the use of these techniques for implementing: read-write block storage (linearizable, wait-free); specialized metadata objects (e.g., directories) necessary to construct a fully functional file system (linearizable); and a general framework for arbitrary deterministic objects (linearizable).

Not all techniques will be appropriate for all cases. “Flat” objects as found in file systems will generally not utilize replicated clients, and nested objects may not benefit from versioning (TBD).

Systems: PASIS

PASIS is a survivable storage system developed in a DARPA IPTO project; funding ended in December 2003.

It examined the use of encoding schemes for efficiently distributing data storage while protecting confidentiality and integrity.

It did not address concurrency control; clients would have to handle it explicitly, e.g., using locking.

It explored the use of versioning for other purposes (recovery from user mistakes, system failures, and penetrations) and showed the viability of comprehensive versioning.

Systems: Fleet

Fleet is a Java-based distributed object architecture developed in previous DARPA ATO projects; funding ended in June 2004.

It focused on the use of quorum systems for efficient object replication.

Fleet does not support nested objects and nested method invocations, nor does it support potentially faulty clients.

Technology Transition

The two primary channels are the industry consortia of two research centers at Carnegie Mellon: CyLab and the Parallel Data Lab.

CyLab is a center focused on trustworthy and measurable computing, founded in 2003 through the merger of the Center for Computer and Communications Security and the Sustainable Computing Consortium. Its corporate affiliate program includes over fifty companies, including defense suppliers, tech companies, and IT-based critical infrastructures.

The Parallel Data Lab is a ten-year-old center focused on storage infrastructures. Its corporate affiliates include most major storage vendors.

Both have a track record of technology transfer.

Questions?