Scalability, Accountability and Instant Information Access for Network Centric Warfare
DESCRIPTION
Scalability, Accountability and Instant Information Access for Network Centric Warfare. Yair Amir, Claudiu Danilov, Danny Dolev, Jon Kirsch, John Lane, Jonathan Shapiro (Department of Computer Science, Johns Hopkins University); Chi-Bun Chan, Cristina Nita-Rotaru, Josh Olsen, David Zage (Department of Computer Science, Purdue University).
TRANSCRIPT
Johns Hopkins & Purdue, 15 Dec 05
Scalability, Accountability and Instant Information Access for
Network Centric Warfare
Department of Computer Science, Johns Hopkins University
Yair Amir, Claudiu Danilov, Danny Dolev, Jon Kirsch, John Lane, Jonathan Shapiro
Chi-Bun Chan, Cristina Nita-Rotaru, Josh Olsen, David Zage
Department of Computer Science, Purdue University
http://www.cnds.jhu.edu
Dealing with Insider Threats
Project Goals:
• Scaling survivable replication to wide area networks.
  – Overcome 5 malicious replicas.
  – SRS goal: Improve latency by a factor of 3.
  – Self-imposed goal: Improve throughput by a factor of 3.
  – Self-imposed goal: Improve availability of the system.
• Dealing with malicious clients.
  – Compromised clients can inject authenticated but incorrect data, which is hard to detect on the fly.
  – Malicious or just an honest error? Accountability is useful in both cases.
• Exploiting application update semantics for replication speedup in malicious environments.
  – Weaker update semantics allow for immediate response.
State Machine Replication
• Main challenge: ensuring coordination between servers.
  – Requires agreement on the request to be processed and a consistent order of requests.
• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
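As a rough illustration of the quorum arithmetic above (an editorial sketch, not part of the original slides), the snippet below computes, for a given number of tolerated faults f, how many servers each baseline protocol needs in total and how many it must contact to make progress.

```python
# Quorum arithmetic for the two baseline protocols mentioned above.
# Illustrative sketch of the f / 2f+1 / 3f+1 relationships only.

def bft_requirements(f: int) -> dict:
    """BFT [CL99]: tolerates f Byzantine faults with 3f+1 servers,
    contacts a quorum of 2f+1, and uses 3 rounds of agreement."""
    return {"servers": 3 * f + 1, "quorum": 2 * f + 1, "rounds": 3}

def paxos_requirements(f: int) -> dict:
    """Paxos [Lam98, Lam01]: tolerates f benign faults with 2f+1 servers,
    contacts a quorum of f+1, and uses 2 rounds of agreement."""
    return {"servers": 2 * f + 1, "quorum": f + 1, "rounds": 2}

if __name__ == "__main__":
    f = 5  # the number of malicious replicas the project aims to overcome
    print("BFT:  ", bft_requirements(f))    # {'servers': 16, 'quorum': 11, 'rounds': 3}
    print("Paxos:", paxos_requirements(f))  # {'servers': 11, 'quorum': 6,  'rounds': 2}
```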
State of the Art in Byzantine Replication: BFT [CL99]
Baseline technology
[Diagram: normal-case BFT message flow. Client C sends a request to replica 0 (the primary); the protocol proceeds through pre-prepare, prepare, and commit phases among replicas 0–3, followed by replies to the client.]
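For readers who prefer code to a timeline diagram, here is a minimal, single-process walk-through of the normal-case phases shown above (an illustrative sketch only: the real protocol is distributed, authenticates every message, and handles view changes).

```python
# Single-process walk-through of BFT's normal-case phases
# (request, pre-prepare, prepare, commit, reply) for f = 1, i.e. 4 replicas.
# Purely illustrative: no networking, signatures, or view changes.

F = 1
N = 3 * F + 1          # replicas 0..3, replica 0 is the primary
QUORUM = 2 * F + 1     # messages needed to advance a phase

def order_request(request: str) -> str:
    seq = 1                                     # primary assigns a sequence number (PRE-PREPARE)
    prepares = set(range(1, N))                 # non-primary replicas send PREPARE to all
    assert 1 + len(prepares) >= QUORUM          # prepared: PRE-PREPARE plus 2f matching PREPAREs
    commits = set(range(N))                     # all replicas send COMMIT to all
    assert len(commits) >= QUORUM               # committed: 2f+1 matching COMMITs
    replies = [f"executed '{request}' at seq {seq}"] * N
    assert replies.count(replies[0]) >= F + 1   # the client accepts after f+1 matching replies
    return replies[0]

print(order_request("update account 42"))
```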
The Paxos Protocol: Normal Case, After Leader Election
[Lam98]
Key: A simple end-to-end algorithm
[Diagram: normal-case Paxos message flow. Client C sends a request to replica 0 (the leader); the protocol proceeds through proposal and accept phases among replicas 0–2, followed by a reply to the client.]
Steward: Survivable Technology for Wide Area Replication
• Each site acts as a trusted logical unit that can crash or partition.
• Effects of malicious faults are confined to the local site.
  – Threshold signatures prove agreement to other sites (see the sketch after this slide).
• Between sites: a fault-tolerant protocol runs between the sites.
• There is no free lunch – we pay with more hardware…
[Diagram: a single site, containing clients and server replicas 1, 2, 3, …, 3f+1.]
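The following toy sketch illustrates the idea of a site acting as one trusted logical unit: a site-level message is emitted only once 2f+1 local replicas endorse the same content. It is not Steward's actual implementation; Steward uses threshold cryptography, whereas here the "proof" is just a set of endorsing replica ids, and all names are invented for illustration.

```python
# Toy illustration of "a site acts as one trusted logical unit".
# Real Steward combines threshold signature shares; this sketch only
# counts endorsements, to show where the 2f+1 requirement appears.

from dataclasses import dataclass

F = 5                       # tolerated malicious replicas per site
SITE_SIZE = 3 * F + 1       # 16 replicas per site
SITE_QUORUM = 2 * F + 1     # 11 endorsements needed to speak for the site

@dataclass
class SiteMessage:
    content: str
    endorsers: frozenset     # replica ids that endorsed the content

def emit_site_message(content: str, endorsements: dict) -> SiteMessage:
    """Combine local endorsements into a single site-level message."""
    agreeing = frozenset(r for r, c in endorsements.items() if c == content)
    if len(agreeing) < SITE_QUORUM:
        raise ValueError("not enough agreeing replicas to speak for the site")
    return SiteMessage(content, agreeing)

def accept_at_remote_site(msg: SiteMessage) -> bool:
    """A remote site checks the proof before trusting the message."""
    return len(msg.endorsers) >= SITE_QUORUM

# Example: 11 correct replicas endorse the update, 5 malicious ones do not.
endorsements = {r: "assign seq 7 to update X" for r in range(11)}
endorsements.update({r: "garbage" for r in range(11, 16)})
msg = emit_site_message("assign seq 7 to update X", endorsements)
print(accept_at_remote_site(msg))   # True
```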
Challenges (I)
• Each site has a representative that:
  – Coordinates the Byzantine protocol inside the site.
  – Forwards packets in and out of the site.
• One of the sites acts as the leader in the wide-area protocol.
  – The representative of the leading site is the one assigning sequence numbers to updates.
• How do we select and change the representatives and the leader site, in agreement?
• How do we transition safely when we need to change them?
Challenges (II)
• Messages coming out of a site during leader election are based on communication among 2f+1 (out of 3f+1) servers inside the site.
  – There can be multiple sets of 2f+1 servers (illustrated in the sketch below).
  – In some instances, multiple correct but different site messages can be issued by a malicious representative.
  – It is sometimes impossible to completely isolate a malicious server's behavior inside its own site.
• This behavior can happen in two instances:
  – The servers inside a site propose a new leading site.
  – The servers inside a site report their individual status with respect to the global site progress.
• We developed a detailed proof of correctness of the protocol.
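To make the "multiple sets of 2f+1 servers" point concrete, here is an illustrative calculation (ours, not from the slides): with f = 5 and 3f+1 = 16 replicas per site, there are thousands of distinct quorums of 2f+1 = 11 replicas, so a malicious representative can gather two different, individually well-formed quorums behind two different site messages.

```python
# Why a site can emit multiple "correct" messages: with 3f+1 replicas
# there are many distinct quorums of size 2f+1, and a malicious
# representative may back different site messages with different quorums.

from math import comb

f = 5
n = 3 * f + 1        # 16 replicas in the site
q = 2 * f + 1        # quorum size 11

print(comb(n, q))    # 4368 distinct quorums of 11 out of 16

# Two valid quorums that differ, each excluding only f = 5 replicas:
quorum_a = set(range(0, 11))     # replicas 0..10
quorum_b = set(range(5, 16))     # replicas 5..15
assert len(quorum_a) == len(quorum_b) == q

# Any two quorums of size 2f+1 out of 3f+1 still intersect in at least
# f+1 replicas, so at least one correct replica is in both; this kind of
# intersection argument is what the correctness proof relies on.
print(len(quorum_a & quorum_b))  # 6 >= f + 1
```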
Main Idea
• Sites change their local representatives based on timeouts.
• The leader site's representative has a larger timeout.
  – This allows for communication with at least one correct representative at the other sites.
• After changing f+1 leader site representatives, servers at all sites stop participating in the protocol and elect a different leading site.
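A minimal sketch of this timeout-driven rotation follows. It is illustrative only: the timeout values and class/field names are invented, and the real protocol also exchanges signed, threshold-endorsed messages when representatives or the leading site change.

```python
# Sketch of the timeout-driven rotation described above (illustrative only).
# Each site rotates its local representative when its timeout expires; the
# leader site's representative has a larger timeout, and after f+1
# leader-site representative changes the sites elect a new leading site.

F = 5
SITE_SIZE = 3 * F + 1

LOCAL_TIMEOUT = 5.0                  # seconds (made-up value)
LEADER_TIMEOUT = 3 * LOCAL_TIMEOUT   # larger, so non-leader sites rotate first

class Site:
    def __init__(self, site_id: int, is_leader_site: bool):
        self.site_id = site_id
        self.is_leader_site = is_leader_site
        self.representative = 0      # current representative replica id
        self.leader_rotations = 0    # representative changes seen at the leader site

    def timeout(self) -> float:
        return LEADER_TIMEOUT if self.is_leader_site else LOCAL_TIMEOUT

    def on_timeout(self) -> str:
        """No global progress before the timeout expired: rotate the representative."""
        self.representative = (self.representative + 1) % SITE_SIZE
        if self.is_leader_site:
            self.leader_rotations += 1
            if self.leader_rotations > F:
                return "ELECT_NEW_LEADER_SITE"   # give up on this leading site
        return "NEW_REPRESENTATIVE"

leader = Site(0, is_leader_site=True)
for _ in range(F + 1):
    action = leader.on_timeout()
print(action)   # ELECT_NEW_LEADER_SITE after f+1 leader-site rotations
```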
Steward: First Byzantine Replication Scalable to Wide Area Networks
• A second-iteration implementation:
  – Based on the complete theoretical design.
  – Closely follows the pseudocode proven to be correct.
• We benchmarked the new implementation against the program metrics.
• The code successfully passed the red-team experiment.
• We believe it is theoretically unbreakable.
Testing Environment
Platform: dual Intel Xeon 3.2 GHz (64-bit), 1 GByte RAM, Linux Fedora Core 3.
The library relies on OpenSSL: OpenSSL 0.9.7a (19 Feb 2003) was used.
Baseline operations:
- RSA 1024-bit sign: 1.3 ms; verify: 0.07 ms.
- 1024-bit modular exponentiation: ~1 ms.
- Generating a 1024-bit RSA key: ~55 ms.
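Baseline numbers of this kind can be reproduced on other hardware with a rough micro-benchmark along the following lines. This is a sketch using the Python `cryptography` package rather than the OpenSSL C API the project used, so the absolute figures will differ.

```python
# Rough micro-benchmark of RSA-1024 key generation, signing, and verification,
# analogous to the baseline measurements above (sketch only; timings from the
# Python "cryptography" package will not match the OpenSSL 0.9.7a figures).

import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def bench(label, fn, iterations=100):
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    ms = (time.perf_counter() - start) * 1000 / iterations
    print(f"{label}: {ms:.2f} ms per operation")

message = b"ordered update #42"

bench("RSA-1024 keygen", lambda: rsa.generate_private_key(65537, 1024), iterations=10)

key = rsa.generate_private_key(public_exponent=65537, key_size=1024)
signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())

bench("RSA-1024 sign", lambda: key.sign(message, padding.PKCS1v15(), hashes.SHA256()))
bench("RSA-1024 verify",
      lambda: key.public_key().verify(signature, message,
                                      padding.PKCS1v15(), hashes.SHA256()))
```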
Evaluation Network 1: Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites by links of equal bandwidth and latency.
• One fully deployed site of 16 replicas; the other sites are emulated by one computer each.
• Total: 80 replicas in the system, emulated on 20 computers.
• 50 ms wide area links between sites.
• Varied wide area bandwidth and the number of clients.
Write Update Performance
• Symmetric network, 5 sites.
• Steward: 16 replicas per site, 80 replicas in total (four sites are emulated); 20 actual computers.
• BFT: 16 replicas in total; 4 replicas in one site, 3 replicas in each other site.
• Update-only performance (no disk writes).
[Chart: Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT at 10, 5, and 2.5 Mbps.]
[Chart: Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT at 10, 5, and 2.5 Mbps.]
Read-only Query Performance
• 10 Mbps on wide area links.
• 10 clients inject mixes of read-only queries and write updates.
• None of the systems was limited by bandwidth.
• Performance improves by anywhere from a factor of two to more than an order of magnitude.
• Availability: Queries can be answered locally, within each site.
[Chart: Query Mix Throughput, in actions/sec, vs. update ratio (0–100%), for Steward and BFT.]
[Chart: Query Mix Latency, in ms, vs. update ratio (0–100%), for Steward and BFT.]
Evaluation Network 2: Practical Wide-Area Network
• Based on a real experimental network (CAIRN).
• Modeled on our cluster, emulating bandwidth and latency constraints, both for Steward and BFT.
[Diagram: emulated CAIRN topology spanning San Jose, Los Angeles, Virginia, Delaware, and Boston (nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC). Wide-area links: 38.8 ms / 1.86 Mbps, 4.9 ms / 9.81 Mbps, 3.6 ms / 1.42 Mbps, 1.4 ms / 1.47 Mbps; local links: 100 Mbps, under 1 ms.]
CAIRN Emulation Performance
• The 1.86 Mbps link between the East and West coasts is the bottleneck.
• Steward is limited by bandwidth at 51 updates per second.
• 1.86 Mbps can barely accommodate 2 updates per second for BFT.
• Earlier experimentation with benign fault 2-phase commit protocols achieved up to 76 updates per second.
[Chart: CAIRN Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT.]
[Chart: CAIRN Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT.]
Wide-Area Scalability (3)
• Selected 5 PlanetLab sites on 5 different continents: US, Brazil, Sweden, Korea, and Australia.
• Measured bandwidth and latency between every pair of sites.
• Emulated the network on our cluster, both for Steward and BFT.
• 3-fold latency improvement even when bandwidth is not limited.
[Chart: PlanetLab Update Throughput, in updates/sec, vs. number of clients (0–30), for Steward and BFT.]
[Chart: PlanetLab Update Latency, in ms, vs. number of clients (0–30), for Steward and BFT.]
Performance Metrics
• The system can withstand f (5) faults in each site.
• It performs better than a flat solution that withstands f (5) faults in total.
• Quantitative improvements: performance
  – Between twice and over 30 times lower latency, depending on network topology and update/query mix.
  – The program metric was met and exceeded in most types of wide-area networks, even when only write updates are considered.
• Qualitative improvements: availability
  – Read-only queries can be answered locally, even in case of partitions.
  – Write updates can be done when only a majority of sites are connected (as opposed to 2f+1 out of 3f+1 connected servers).
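As a back-of-the-envelope illustration of that availability condition (our arithmetic, not from the slides, using f = 5 as in the project goal):

```python
# Comparison of the connectivity needed to order write updates
# (illustrative arithmetic only, with f = 5 as in the project goal).

f = 5
sites = 5

# A flat Byzantine replication system tolerating f faults has 3f+1 servers
# and needs 2f+1 of them connected to make progress.
flat_servers = 3 * f + 1            # 16
flat_quorum = 2 * f + 1             # 11 connected servers required

# Steward orders write updates as long as a majority of sites are connected,
# regardless of how the wide-area network partitions the remaining sites.
steward_sites_needed = sites // 2 + 1   # 3 of 5 sites

print(f"Flat BFT: needs {flat_quorum} of {flat_servers} servers connected")
print(f"Steward:  needs {steward_sites_needed} of {sites} sites connected")
```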
Red Team Experiment
• Excellent interaction with both the red team and the white team.
• Performance evaluation (symmetric network):
  – Several points on the performance graphs presented were re-evaluated; the results were almost identical.
  – Thorough discussions regarding the measurement methodology and the presentation of the latency results validated our experiments.
  – Five crash faults were induced in the leading site; performance slightly improved!
Red Team Experiment (2)
• Steward under attack:
  – Five sites, 4 replicas each.
  – The red team had full control (sudo) over five replicas, one in each site.
– Compromised replicas were injecting:
• Loss (up to 20% each)
• Delay (up to 200ms)
• Packet reordering
• Fragmentation (up to 100 bytes)
• Replay attacks
– Compromised replicas were running modified servers that contained malicious code.
Red Team Experiment (3)
• The system was NOT compromised!
  – Safety and liveness guarantees were preserved.
  – The system continued to run correctly under all attacks.
  – All logs from all experiments are available.
• Most of the attacks did not affect performance.
• The system slowed down when the representative of the leading site was attacked.
  – The speed of update ordering dropped to a fifth of normal.
  – The slowdown was not severe enough to trigger the defense mechanisms.
  – Crashing the corrupt representative caused the system to do a view change and regain performance.
Red Team Experiment (4)
Lessons learned:
• We rebuilt the entire system with the red-team attack in mind; we learned a lot even before the experiment.
• The overall performance of the system could be improved by not validating messages that are no longer needed (once 2f+1 messages have been received).
• Performance under attack could be improved substantially with further research.
Next Steps: Throughput Comparison (CAIRN)
[Chart: update transactions per second vs. number of clients (0–140) on Evaluation Network 2, comparing the Congruity engine with the two-phase commit (2PC) upper bound [ADMST02]. Note: these protocols are not Byzantine fault-tolerant.]
Next Steps:
• Performance during common operation:
  – We believe that wide-area throughput can be improved by at least a factor of 5 by using a more elaborate replication algorithm between wide-area sites.
• Performance under attack:
  – So far, we have focused only on optimizing performance in the common case, while guaranteeing safety and liveness at all times. Performance under attack is extremely important, but not trivial to achieve.
• System availability and safety guarantees:
  – A Byzantine-tolerant protocol between wide-area sites would guarantee system availability and safety even when some of the sites are completely compromised.
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
http://www.cnds.jhu.edu/funding/srs/

New ideas:
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.

Impact:
• Resulting systems with at least 3 times higher throughput, lower latency, and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems such as the Army Future Combat System.

Schedule (milestones at June 04, Dec 04, June 05, Dec 05):
• C3I model, baseline, and demo
• Component analysis & design
• Component implementation
• Component evaluation
• System integration & evaluation
• Final C3I demo and baseline evaluation