Scalability, Accountability and Instant Information Access for Network Centric Warfare
DESCRIPTION
Scalability, Accountability and Instant Information Access for Network Centric Warfare. Yair Amir, Claudiu Danilov, Danny Dolev, Jon Kirsch, John Lane, Jonathan Shapiro (Department of Computer Science, Johns Hopkins University); Chi-Bun Chan, Cristina Nita-Rotaru, Josh Olsen, David Zage (Department of Computer Science, Purdue University).
TRANSCRIPT
Johns Hopkins & Purdue, 15 Dec 05
Scalability, Accountability and Instant Information Access for
Network Centric Warfare
Department of Computer Science, Johns Hopkins University
Yair Amir, Claudiu Danilov, Danny Dolev, Jon Kirsch, John Lane, Jonathan Shapiro
Chi-Bun Chan, Cristina Nita-Rotaru, Josh Olsen, David Zage
Department of Computer Science, Purdue University
http://www.cnds.jhu.edu
Dealing with Insider Threats
Project Goals:
• Scaling survivable replication to wide area networks.
  – Overcome 5 malicious replicas.
  – SRS goal: Improve latency by a factor of 3.
  – Self-imposed goal: Improve throughput by a factor of 3.
  – Self-imposed goal: Improve availability of the system.
• Dealing with malicious clients.
  – Compromised clients can inject authenticated but incorrect data, which is hard to detect on the fly.
  – Malicious or just an honest error? Accountability is useful in both cases.
• Exploiting application update semantics for replication speedup in malicious environments.
  – Weaker update semantics allow for immediate response.
State Machine Replication
• Main challenge: ensuring coordination between servers.
  – Requires agreement on the request to be processed and a consistent order of requests.
• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
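As a rough illustration of the quorum arithmetic above (an editorial sketch, not part of the original slides), the snippet below computes, for a given number of tolerated faults f, how many servers each baseline protocol needs in total and how many it must contact to make progress.

```python
# Quorum arithmetic for the two baseline protocols mentioned above.
# Illustrative sketch of the f / 2f+1 / 3f+1 relationships only.

def bft_requirements(f: int) -> dict:
    """BFT [CL99]: tolerates f Byzantine faults with 3f+1 servers,
    contacts a quorum of 2f+1, and uses 3 rounds of agreement."""
    return {"servers": 3 * f + 1, "quorum": 2 * f + 1, "rounds": 3}

def paxos_requirements(f: int) -> dict:
    """Paxos [Lam98, Lam01]: tolerates f benign faults with 2f+1 servers,
    contacts a quorum of f+1, and uses 2 rounds of agreement."""
    return {"servers": 2 * f + 1, "quorum": f + 1, "rounds": 2}

if __name__ == "__main__":
    f = 5  # the number of malicious replicas the project aims to overcome
    print("BFT:  ", bft_requirements(f))    # {'servers': 16, 'quorum': 11, 'rounds': 3}
    print("Paxos:", paxos_requirements(f))  # {'servers': 11, 'quorum': 6,  'rounds': 2}
```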
State of the Art in Byzantine Replication: BFT [CL99]
Baseline technology
[Diagram: normal-case BFT message flow. Client C sends a request to replica 0 (the primary); the protocol proceeds through pre-prepare, prepare, and commit phases among replicas 0–3, followed by replies to the client.]
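For readers who prefer code to a timeline diagram, here is a minimal, single-process walk-through of the normal-case phases shown above (an illustrative sketch only: the real protocol is distributed, authenticates every message, and handles view changes).

```python
# Single-process walk-through of BFT's normal-case phases
# (request, pre-prepare, prepare, commit, reply) for f = 1, i.e. 4 replicas.
# Purely illustrative: no networking, signatures, or view changes.

F = 1
N = 3 * F + 1          # replicas 0..3, replica 0 is the primary
QUORUM = 2 * F + 1     # messages needed to advance a phase

def order_request(request: str) -> str:
    seq = 1                                     # primary assigns a sequence number (PRE-PREPARE)
    prepares = set(range(1, N))                 # non-primary replicas send PREPARE to all
    assert 1 + len(prepares) >= QUORUM          # prepared: PRE-PREPARE plus 2f matching PREPAREs
    commits = set(range(N))                     # all replicas send COMMIT to all
    assert len(commits) >= QUORUM               # committed: 2f+1 matching COMMITs
    replies = [f"executed '{request}' at seq {seq}"] * N
    assert replies.count(replies[0]) >= F + 1   # the client accepts after f+1 matching replies
    return replies[0]

print(order_request("update account 42"))
```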
The Paxos Protocol: Normal Case, After Leader Election
[Lam98]
Key: A simple end-to-end algorithm
[Diagram: normal-case Paxos message flow. Client C sends a request to replica 0 (the leader); the protocol proceeds through proposal and accept phases among replicas 0–2, followed by a reply to the client.]
Steward: Survivable Technology for Wide Area Replication
• Each site acts as a trusted logical unit that can crash or partition.
• Effects of malicious faults are confined to the local site.
  – Threshold signatures prove agreement to other sites (see the sketch after this slide).
• Between sites: a fault-tolerant protocol runs between the sites.
• There is no free lunch – we pay with more hardware…
[Diagram: a single site, containing clients and server replicas 1, 2, 3, …, 3f+1.]
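The following toy sketch illustrates the idea of a site acting as one trusted logical unit: a site-level message is emitted only once 2f+1 local replicas endorse the same content. It is not Steward's actual implementation; Steward uses threshold cryptography, whereas here the "proof" is just a set of endorsing replica ids, and all names are invented for illustration.

```python
# Toy illustration of "a site acts as one trusted logical unit".
# Real Steward combines threshold signature shares; this sketch only
# counts endorsements, to show where the 2f+1 requirement appears.

from dataclasses import dataclass

F = 5                       # tolerated malicious replicas per site
SITE_SIZE = 3 * F + 1       # 16 replicas per site
SITE_QUORUM = 2 * F + 1     # 11 endorsements needed to speak for the site

@dataclass
class SiteMessage:
    content: str
    endorsers: frozenset     # replica ids that endorsed the content

def emit_site_message(content: str, endorsements: dict) -> SiteMessage:
    """Combine local endorsements into a single site-level message."""
    agreeing = frozenset(r for r, c in endorsements.items() if c == content)
    if len(agreeing) < SITE_QUORUM:
        raise ValueError("not enough agreeing replicas to speak for the site")
    return SiteMessage(content, agreeing)

def accept_at_remote_site(msg: SiteMessage) -> bool:
    """A remote site checks the proof before trusting the message."""
    return len(msg.endorsers) >= SITE_QUORUM

# Example: 11 correct replicas endorse the update, 5 malicious ones do not.
endorsements = {r: "assign seq 7 to update X" for r in range(11)}
endorsements.update({r: "garbage" for r in range(11, 16)})
msg = emit_site_message("assign seq 7 to update X", endorsements)
print(accept_at_remote_site(msg))   # True
```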
Challenges (I)
• Each site has a representative that:
  – Coordinates the Byzantine protocol inside the site.
  – Forwards packets in and out of the site.
• One of the sites acts as the leader in the wide-area protocol.
  – The representative of the leading site is the one assigning sequence numbers to updates.
• How do we select and change the representatives and the leader site, in agreement?
• How do we transition safely when we need to change them?
Challenges (II)
• Messages coming out of a site during leader election are based on communication among 2f+1 (out of 3f+1) servers inside the site.
  – There can be multiple sets of 2f+1 servers (illustrated in the sketch below).
  – In some instances, multiple correct but different site messages can be issued by a malicious representative.
  – It is sometimes impossible to completely isolate a malicious server's behavior inside its own site.
• This behavior can happen in two instances:
  – The servers inside a site propose a new leading site.
  – The servers inside a site report their individual status with respect to the global site progress.
• We developed a detailed proof of correctness of the protocol.
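To make the "multiple sets of 2f+1 servers" point concrete, here is an illustrative calculation (ours, not from the slides): with f = 5 and 3f+1 = 16 replicas per site, there are thousands of distinct quorums of 2f+1 = 11 replicas, so a malicious representative can gather two different, individually well-formed quorums behind two different site messages.

```python
# Why a site can emit multiple "correct" messages: with 3f+1 replicas
# there are many distinct quorums of size 2f+1, and a malicious
# representative may back different site messages with different quorums.

from math import comb

f = 5
n = 3 * f + 1        # 16 replicas in the site
q = 2 * f + 1        # quorum size 11

print(comb(n, q))    # 4368 distinct quorums of 11 out of 16

# Two valid quorums that differ, each excluding only f = 5 replicas:
quorum_a = set(range(0, 11))     # replicas 0..10
quorum_b = set(range(5, 16))     # replicas 5..15
assert len(quorum_a) == len(quorum_b) == q

# Any two quorums of size 2f+1 out of 3f+1 still intersect in at least
# f+1 replicas, so at least one correct replica is in both; this kind of
# intersection argument is what the correctness proof relies on.
print(len(quorum_a & quorum_b))  # 6 >= f + 1
```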
Main Idea
• Sites change their local representatives based on timeouts.
• The leader site's representative has a larger timeout.
  – This allows for communication with at least one correct representative at the other sites.
• After changing f+1 leader site representatives, servers at all sites stop participating in the protocol and elect a different leading site.
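A minimal sketch of this timeout-driven rotation follows. It is illustrative only: the timeout values and class/field names are invented, and the real protocol also exchanges signed, threshold-endorsed messages when representatives or the leading site change.

```python
# Sketch of the timeout-driven rotation described above (illustrative only).
# Each site rotates its local representative when its timeout expires; the
# leader site's representative has a larger timeout, and after f+1
# leader-site representative changes the sites elect a new leading site.

F = 5
SITE_SIZE = 3 * F + 1

LOCAL_TIMEOUT = 5.0                  # seconds (made-up value)
LEADER_TIMEOUT = 3 * LOCAL_TIMEOUT   # larger, so non-leader sites rotate first

class Site:
    def __init__(self, site_id: int, is_leader_site: bool):
        self.site_id = site_id
        self.is_leader_site = is_leader_site
        self.representative = 0      # current representative replica id
        self.leader_rotations = 0    # representative changes seen at the leader site

    def timeout(self) -> float:
        return LEADER_TIMEOUT if self.is_leader_site else LOCAL_TIMEOUT

    def on_timeout(self) -> str:
        """No global progress before the timeout expired: rotate the representative."""
        self.representative = (self.representative + 1) % SITE_SIZE
        if self.is_leader_site:
            self.leader_rotations += 1
            if self.leader_rotations > F:
                return "ELECT_NEW_LEADER_SITE"   # give up on this leading site
        return "NEW_REPRESENTATIVE"

leader = Site(0, is_leader_site=True)
for _ in range(F + 1):
    action = leader.on_timeout()
print(action)   # ELECT_NEW_LEADER_SITE after f+1 leader-site rotations
```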
Steward: First Byzantine Replication Scalable to Wide Area Networks
• A second-iteration implementation:
  – Based on the complete theoretical design.
  – Closely follows the pseudocode proven to be correct.
• We benchmarked the new implementation against the program metrics.
• The code successfully passed the red-team experiment.
• We believe it is theoretically unbreakable.
Testing Environment
Platform: dual Intel Xeon 3.2 GHz (64-bit), 1 GByte RAM, Linux Fedora Core 3.
The library relies on OpenSSL: OpenSSL 0.9.7a (19 Feb 2003) was used.
Baseline operations:
- RSA 1024-bit sign: 1.3 ms; verify: 0.07 ms.
- 1024-bit modular exponentiation: ~1 ms.
- Generating a 1024-bit RSA key: ~55 ms.
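Baseline numbers of this kind can be reproduced on other hardware with a rough micro-benchmark along the following lines. This is a sketch using the Python `cryptography` package rather than the OpenSSL C API the project used, so the absolute figures will differ.

```python
# Rough micro-benchmark of RSA-1024 key generation, signing, and verification,
# analogous to the baseline measurements above (sketch only; timings from the
# Python "cryptography" package will not match the OpenSSL 0.9.7a figures).

import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def bench(label, fn, iterations=100):
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    ms = (time.perf_counter() - start) * 1000 / iterations
    print(f"{label}: {ms:.2f} ms per operation")

message = b"ordered update #42"

bench("RSA-1024 keygen", lambda: rsa.generate_private_key(65537, 1024), iterations=10)

key = rsa.generate_private_key(public_exponent=65537, key_size=1024)
signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())

bench("RSA-1024 sign", lambda: key.sign(message, padding.PKCS1v15(), hashes.SHA256()))
bench("RSA-1024 verify",
      lambda: key.public_key().verify(signature, message,
                                      padding.PKCS1v15(), hashes.SHA256()))
```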
Evaluation Network 1: Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites by links of equal bandwidth and latency.
• One fully deployed site of 16 replicas; the other sites are emulated by one computer each.
• Total: 80 replicas in the system, emulated on 20 computers.
• 50 ms wide area links between sites.
• Varied wide area bandwidth and the number of clients.
Write Update Performance
• Symmetric network, 5 sites.
• Steward: 16 replicas per site, 80 replicas in total (four sites are emulated); 20 actual computers.
• BFT: 16 replicas in total; 4 replicas in one site, 3 replicas in each other site.
• Update-only performance (no disk writes).
[Chart: Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT at 10, 5, and 2.5 Mbps.]
[Chart: Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT at 10, 5, and 2.5 Mbps.]
Read-only Query Performance
• 10 Mbps on wide area links.
• 10 clients inject mixes of read-only queries and write updates.
• None of the systems was limited by bandwidth.
• Performance improves by anywhere from a factor of two to more than an order of magnitude.
• Availability: Queries can be answered locally, within each site.
[Chart: Query Mix Throughput, in actions/sec, vs. update ratio (0–100%), for Steward and BFT.]
[Chart: Query Mix Latency, in ms, vs. update ratio (0–100%), for Steward and BFT.]
Evaluation Network 2: Practical Wide-Area Network
• Based on a real experimental network (CAIRN).
• Modeled on our cluster, emulating bandwidth and latency constraints, both for Steward and BFT.
[Diagram: emulated CAIRN topology spanning San Jose, Los Angeles, Virginia, Delaware, and Boston (nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC). Wide-area links: 38.8 ms / 1.86 Mbps, 4.9 ms / 9.81 Mbps, 3.6 ms / 1.42 Mbps, 1.4 ms / 1.47 Mbps; local links: 100 Mbps, under 1 ms.]
CAIRN Emulation Performance
• The 1.86 Mbps link between the East and West coasts is the bottleneck.
• Steward is limited by bandwidth at 51 updates per second.
• 1.86 Mbps can barely accommodate 2 updates per second for BFT.
• Earlier experimentation with benign fault 2-phase commit protocols achieved up to 76 updates per second.
[Chart: CAIRN Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT.]
[Chart: CAIRN Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT.]
Wide-Area Scalability (3)
• Selected 5 PlanetLab sites on 5 different continents: US, Brazil, Sweden, Korea, and Australia.
• Measured bandwidth and latency between every pair of sites.
• Emulated the network on our cluster, both for Steward and BFT.
• 3-fold latency improvement even when bandwidth is not limited.
[Chart: PlanetLab Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT.]
[Chart: PlanetLab Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT.]
Performance Metrics
• The system can withstand f (5) faults in each site.
• It performs better than a flat solution that withstands f (5) faults in total.
• Quantitative improvements: performance
  – Between twice and over 30 times lower latency, depending on network topology and update/query mix.
  – The program metric was met and exceeded in most types of wide-area networks, even when only write updates are considered.
• Qualitative improvements: availability
  – Read-only queries can be answered locally, even in case of partitions.
  – Write updates can be done when only a majority of sites are connected (as opposed to 2f+1 out of 3f+1 connected servers).
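As a back-of-the-envelope illustration of that availability condition (our arithmetic, not from the slides, using f = 5 as in the project goal):

```python
# Comparison of the connectivity needed to order write updates
# (illustrative arithmetic only, with f = 5 as in the project goal).

f = 5
sites = 5

# A flat Byzantine replication system tolerating f faults has 3f+1 servers
# and needs 2f+1 of them connected to make progress.
flat_servers = 3 * f + 1            # 16
flat_quorum = 2 * f + 1             # 11 connected servers required

# Steward orders write updates as long as a majority of sites are connected,
# regardless of how the wide-area network partitions the remaining sites.
steward_sites_needed = sites // 2 + 1   # 3 of 5 sites

print(f"Flat BFT: needs {flat_quorum} of {flat_servers} servers connected")
print(f"Steward:  needs {steward_sites_needed} of {sites} sites connected")
```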
Red Team Experiment
• Excellent interaction with both the red team and the white team.
• Performance evaluation (symmetric network):
  – Several points on the performance graphs presented were re-evaluated; the results were almost identical.
  – Thorough discussions regarding the measurement methodology and the presentation of the latency results validated our experiments.
  – Five crash faults were induced in the leading site; performance slightly improved!
Red Team Experiment (2)
• Steward under attack:
  – Five sites, 4 replicas each.
  – The red team had full control (sudo) over five replicas, one in each site.
– Compromised replicas were injecting:
• Loss (up to 20% each)
• Delay (up to 200ms)
• Packet reordering
• Fragmentation (up to 100 bytes)
• Replay attacks
– Compromised replicas were running modified servers that contained malicious code.
Red Team Experiment (3)
• The system was NOT compromised!
  – Safety and liveness guarantees were preserved.
  – The system continued to run correctly under all attacks.
  – All logs from all experiments are available.
• Most of the attacks did not affect performance.
• The system slowed down when the representative of the leading site was attacked.
  – The speed of update ordering dropped to a fifth of normal.
  – The slowdown was not severe enough to trigger the defense mechanisms.
  – Crashing the corrupt representative caused the system to do a view change and regain performance.
Red Team Experiment (4)
Lessons learned:
• We rebuilt the entire system with the red-team attack in mind; we learned a lot even before the experiment.
• The overall performance of the system could be improved by not validating messages that are no longer needed (once 2f+1 messages have been received).
• Performance under attack could be improved substantially with further research.
Next Steps: Throughput Comparison (CAIRN)
[Chart: update transactions per second vs. number of clients (0–140) on Evaluation Network 2, comparing the Congruity engine with the two-phase commit (2PC) upper bound [ADMST02]. Note: these protocols are not Byzantine fault-tolerant.]
Next Steps:
• Performance during common operation:
  – We believe that wide-area throughput can be improved by at least a factor of 5 by using a more elaborate replication algorithm between wide-area sites.
• Performance under attack:
  – So far, we have focused only on optimizing performance in the common case, while guaranteeing safety and liveness at all times. Performance under attack is extremely important, but not trivial to achieve.
• System availability and safety guarantees:
  – A Byzantine-tolerant protocol between wide-area sites would guarantee system availability and safety even when some of the sites are completely compromised.
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
http://www.cnds.jhu.edu/funding/srs/

New ideas:
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.

Impact:
• Resulting systems with at least 3 times higher throughput, lower latency, and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems such as the Army Future Combat System.

Schedule (milestones at June 04, Dec 04, June 05, Dec 05):
• C3I model, baseline, and demo
• Component analysis & design
• Component implementation
• Component evaluation
• System integration & evaluation
• Final C3I demo and baseline evaluation