Post on 21-Dec-2015
Reliability and Relay Selection in Peer-to-Peer Communication Systems
Salman A. Baset and Henning Schulzrinne
Internet Real-time Laboratory
Department of Computer Science
Columbia University
August 3rd, 2010
Peer-to-peer communication system
[Figure: nodes A to E form a P2P overlay. Some nodes sit behind a NAT / firewall, and a P2P / PSTN gateway connects the overlay to PSTN / mobile phones. A caller first asks the overlay for the network address of the callee (node B or node E); signaling and then media follow, either directly or through a media relay (or relay).]
node = user agent
• nodes form an overlay
• share responsibilities for message routing, signaling, media relaying
• super nodes, ordinary nodes
Reliability of P2P communication systems?
Relay selection techniques?
Outline: Reliability and Relay Selection
• Motivation
• Reliability framework: Sources of unreliability in P2P communication systems?
• Model for relayed calls: How many relays per call to achieve a 99.9% success rate?
• Improving reliability of relayed calls: How to improve the reliability of relayed calls?
• User annoyance: How to quantify the interference of relayed calls with other applications?
• Relay selection: How to find a relay in O(1) hops that minimizes latency and user annoyance?
Reliability framework
• Reliability = proportion of completed calls (target: 99.9%)
• Goal
  – understand the reasons for call failure
  – devise techniques to reduce call failures
• Reasons for call failure
  – (1) distributed search fails to find an online callee
    • DHT lookup
  – (2) distributed search fails to find a suitable relay
    • DHT lookup or any appropriate relay selection scheme
  – (3) relay fails during a voice/video session
• understand and improve reliability for relayed calls
• devise techniques for finding a relay
Outline
• Model for relayed calls: How many relays per call to achieve a 99.9% success rate?
Understanding reliability of relayed calls
• Percentage of VoIP calls that need relaying
  – the provider knows
  – 15-20% of calls for a commercial client-server IM / VoIP application
  – 341 relays in 20 days for Skype [Suh05Infocom]
    • 17 per day for a super node (~50K super nodes)
  – Some client-server providers relay all calls
  – NAT studies
Understanding reliability of relayed calls
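• For a desired reliability, what is the minimum number of relays per call?
  – let $X_i$ and $R_i$ be the lifetime and residual lifetime of relay candidate $i$ (i.i.d.)
  – let $D$ denote the call duration
  – when the $i$th relay fails, the call is switched to the $(i+1)$st relay, which is instantly selected from the global pool of all relays

$$\text{Desired rel} \le P\left(\sum_{i=1}^{k} R_i \ge D\right)$$

• Find the smallest $k$ such that the call completion probability is greater than or equal to the desired reliability (99.9%)
• $k$ depends on the relationship between node lifetime and call duration

[Figure: timeline of a call of duration $D$ covered by the consecutive residual lifetimes $R_1, \ldots, R_{k-1}, R_k$ of relays 1, 2, ..., k-1, k]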
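The model above can be checked numerically. The following is a minimal simulation sketch, not the authors' code: it assumes exponential relay lifetimes and exponential call durations (so a residual lifetime has the same distribution as a full lifetime), and the function name, parameter names, and default trial count are hypothetical.

```python
import random


def call_completion_prob(k, mean_lifetime, mean_call, trials=100_000, seed=1):
    """Estimate P(R_1 + ... + R_k >= D) by Monte Carlo simulation.

    Assumption (illustrative only): residual lifetimes R_i and the call
    duration D are exponential with the given means.
    """
    rng = random.Random(seed)
    completed = 0
    for _ in range(trials):
        call = rng.expovariate(1.0 / mean_call)
        # Total time covered by k relays used one after another.
        covered = sum(rng.expovariate(1.0 / mean_lifetime) for _ in range(k))
        if covered >= call:
            completed += 1
    return completed / trials


if __name__ == "__main__":
    # Mean relay lifetime of 3 time units, mean call duration of 1 time unit.
    for k in (1, 2, 3):
        print(k, call_completion_prob(k, 3.0, 1.0))
```

Swapping the two `expovariate` calls for other samplers would let the same loop explore heavier-tailed lifetime distributions.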
Understanding reliability of relayed calls
Exponential node lifetimes

$$99.9\% \le 1 - \left(\frac{\lambda}{\lambda + \nu}\right)^{k}$$

where $1/\lambda$ is the mean node lifetime and $1/\nu$ is the mean call duration.

Mean node lifetime / Mean call duration    Min # of relays k
6                                          4
3                                          5
1                                          10

Skype node lifetimes (lifetimes approximated as Pareto)

Node lifetime                              Min # of relays k
12 hours (mean), 4 hours (median)          3
(mean call holding time = one hour)

• 95% of Skype relayed calls last less than 60 minutes
• For 95% of Skype relayed call durations, a minimum of 3 relays maintains a 99.9% success rate
• What if the system does not have enough relays?
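The exponential-lifetime rows of the table can be reproduced with a few lines of arithmetic. This sketch assumes exponential node lifetimes and call durations, so that a single relay fails before the call ends with probability 1 / (1 + r), where r is the ratio of mean node lifetime to mean call duration. The function name `min_relays` is hypothetical.

```python
def min_relays(lifetime_to_call_ratio, target=0.999):
    """Smallest k with 1 - (1 / (1 + r))**k >= target.

    r = mean node lifetime / mean call duration (exponential assumption).
    """
    fail_one = 1.0 / (1.0 + lifetime_to_call_ratio)
    k = 1
    while 1.0 - fail_one ** k < target:
        k += 1
    return k


if __name__ == "__main__":
    for ratio in (6, 3, 1):
        print(ratio, min_relays(ratio))
```

The loop is used instead of a logarithm so that the comparison is made on the same quantity as the inequality itself.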
Outline
• Improving reliability of relayed calls: How to improve the reliability of relayed calls?
Improving reliability of relayed calls
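• Approach 1 -- no-replacement
  – select k relays at the beginning of a call
  – do not replace failed relays
  – pure death process

$$P(R_{NR} > D) = 1 - P(\max(R_1, \ldots, R_k) \le D)$$

• Approach 2 -- with-replacement
  – select k relays at the beginning of a call
  – replace failed relays, with repair rate $\mu$
  – no failure during switch over
  – Skype uses a 2-relay with-replacement scheme

[Figure: Markov chain over the number of live relays (states 2, 1, 0): 2 → 1 with $2\lambda$, 1 → 0 with $\lambda$, 1 → 2 with $\mu$; self-loops $1 - 2\lambda$ on state 2 and $1 - (\lambda + \mu)$ on state 1] [Bir04]

$$\frac{1}{\mathrm{MTTF}} = \frac{2\lambda^2}{3\lambda + \mu}$$

$$P(R_{WR} > t) \approx e^{-t/\mathrm{MTTF}} \quad \text{for } \lambda \ll \mu$$

$$P(R_{WR} > D) = \frac{\nu}{\nu + 1/\mathrm{MTTF}}$$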
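As a rough numerical illustration of the 2-relay with-replacement scheme, the sketch below uses the standard mean time to failure of a two-unit repairable system, MTTF = (3λ + μ) / (2λ²), and treats the time to failure as exponential with that mean. The function names are hypothetical, and the rates in the example (mean lifetime 12 hours, 60 s relay search, one-hour calls) are plugged into an exponential model purely for illustration; the slides approximate Skype lifetimes as Pareto.

```python
def mttf_two_relay_wr(failure_rate, repair_rate):
    """MTTF of a 2-relay system where a failed relay is replaced at repair_rate."""
    return (3.0 * failure_rate + repair_rate) / (2.0 * failure_rate ** 2)


def wr_success_prob(failure_rate, repair_rate, call_end_rate):
    """Approximate P(R_WR > D) = nu / (nu + 1/MTTF) for an exponential call."""
    mttf = mttf_two_relay_wr(failure_rate, repair_rate)
    return call_end_rate / (call_end_rate + 1.0 / mttf)


if __name__ == "__main__":
    lam = 1.0 / 12.0   # relay failures per hour (mean lifetime 12 hours)
    mu = 60.0          # replacements per hour (60 s search time)
    nu = 1.0           # call completions per hour (mean call of one hour)
    print(mttf_two_relay_wr(lam, mu))    # MTTF in hours
    print(wr_success_prob(lam, mu, nu))  # approximate call success probability
```

With these illustrative rates the MTTF is 4338 hours and the approximate success probability is about 0.99977. Setting `repair_rate` to 0 gives 3 / (2λ), the no-replacement MTTF for two relays.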
Improving reliability of relayed calls
• No-replacement – add more relays?
  – diminishing returns
    • 1 vs. 2 vs. 3 vs. 4 relays
    • MTTF increases by 50%, 22%, 13% (exponential lifetimes)
• No-replacement (NR) vs. with-replacement (WR)
  – depends on mean lifetime, call duration, repair time

[Figure: NR vs. WR comparison for Skype node lifetimes (mean = 12 hours, median = 4 hours), 2-relay with-replacement, search time = 60 s]
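The diminishing returns of adding relays without replacement can be sketched under an exponential-lifetime assumption. The lifetime of the relay set is then the maximum of k i.i.d. exponentials, whose mean is (1 + 1/2 + ... + 1/k) / λ. The second function walks the pure death process against an exponential call duration. Both function names are hypothetical.

```python
def nr_mttf(k, failure_rate):
    """Mean of the maximum of k i.i.d. exponential lifetimes: H_k / lambda."""
    return sum(1.0 / j for j in range(1, k + 1)) / failure_rate


def nr_success_prob(k, failure_rate, call_end_rate):
    """P(max(R_1..R_k) > D) for exponential lifetimes and call duration.

    Pure death process: with j relays alive, the next event is a relay
    failure (rather than the call ending) with probability
    j*lambda / (j*lambda + nu).
    """
    all_fail = 1.0
    for j in range(1, k + 1):
        all_fail *= j * failure_rate / (j * failure_rate + call_end_rate)
    return 1.0 - all_fail


if __name__ == "__main__":
    # Relative MTTF gain when going from k to k + 1 relays.
    gains = [nr_mttf(k + 1, 1.0) / nr_mttf(k, 1.0) - 1.0 for k in (1, 2, 3)]
    print([round(100 * g, 1) for g in gains])
    print(nr_success_prob(2, 1.0 / 12.0, 1.0))
```

The gains come out at 50%, about 22%, and about 13.6%, in line with the 50% / 22% / 13% figures on the slide. The relative gains do not depend on λ, so any failure rate can be used to compute them.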
Outline
• User annoyance: How to quantify the interference of the relayed call with other applications?
User annoyance
• Interference of relayed call with other applications running on the relay machine
• File sharing = mutually beneficial (tit-for-tat)
• Relaying = altruistic
• Provide incentives or minimize user annoyance
• How to quantify user annoyance?
  – automatically?
  – spare network capacity
• Issues in measuring spare capacity?
  – bandwidth tests, ALTO
Outline
• Relay selection: How to find a relay in O(1) hops that minimizes latency and user annoyance?
Distributed relay selection
• Goal: O(1) hop
• 2-level hierarchical network
• local-random scheme

[Figure: a node behind a NAT asks a close-by node "Give me a relay" and receives "Here is a randomly selected relay" (1-relay). Candidate relays are kept in tables with the columns IP address, RTT, Bandwidth. Plots: search performance, dropped calls.]
Distributed relay selection
• Delay
• User annoyance
  – interference with user applications
  – file sharing (draft idle peers)
  – spare capacity
• random
• mindelay
  – select relay with minimum delay
• netmax
  – select relay with maximum spare bw
• threshold
  – select relays with delay < 150 ms and maximum spare capacity
• Results
  – strategies perform similarly near the system collapse point
  – minimizing latency increases annoyance and the number of jobs per relay, and vice versa
  – threshold approach performs reasonably well
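The four strategies above can be sketched as simple selection functions over a candidate table. This is an illustrative sketch, not the authors' implementation: the `RelayCandidate` type, its field names, the units, and the choice to return `None` when no candidate qualifies are all assumptions; only the strategy names and the 150 ms threshold come from the slide.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelayCandidate:
    ip: str               # hypothetical fields mirroring "IP address, RTT, Bandwidth"
    rtt_ms: float
    spare_bw_kbps: float


def select_random(cands: List[RelayCandidate],
                  rng: random.Random) -> Optional[RelayCandidate]:
    """random: pick any candidate uniformly."""
    return rng.choice(cands) if cands else None


def select_mindelay(cands: List[RelayCandidate]) -> Optional[RelayCandidate]:
    """mindelay: pick the candidate with the minimum delay."""
    return min(cands, key=lambda c: c.rtt_ms, default=None)


def select_netmax(cands: List[RelayCandidate]) -> Optional[RelayCandidate]:
    """netmax: pick the candidate with the maximum spare bandwidth."""
    return max(cands, key=lambda c: c.spare_bw_kbps, default=None)


def select_threshold(cands: List[RelayCandidate],
                     max_rtt_ms: float = 150.0) -> Optional[RelayCandidate]:
    """threshold: among candidates with delay < max_rtt_ms, maximize spare capacity."""
    eligible = [c for c in cands if c.rtt_ms < max_rtt_ms]
    return select_netmax(eligible)


if __name__ == "__main__":
    table = [
        RelayCandidate("192.0.2.1", rtt_ms=40.0, spare_bw_kbps=200.0),
        RelayCandidate("192.0.2.2", rtt_ms=120.0, spare_bw_kbps=900.0),
        RelayCandidate("192.0.2.3", rtt_ms=300.0, spare_bw_kbps=5000.0),
    ]
    print(select_mindelay(table).ip, select_netmax(table).ip,
          select_threshold(table).ip)
```

In this example table, mindelay and netmax pick opposite ends of the list, while threshold settles on the middle candidate, which mirrors the trade-off between latency and annoyance noted in the results.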
Related work
• Modeling
  – On lifetime-based node failure and stochastic resilience of decentralized peer-to-peer networks [Leonard09ToN]
• Minimizing churn
  – Minimizing churn in distributed systems [Godfrey06Sigcom]
• Relay selection
  – ASAP: an AS-aware peer relay protocol for high quality VoIP [Ren06ICDCS]

$ diff this related_work
  – focus on node isolation
  – minimizing churn is not sufficient
  – reliability, relay selection, user annoyance