Reliability and Relay Selection in Peer-to-Peer Communication Systems

Salman A. Baset and Henning Schulzrinne
Internet Real-time Laboratory
Department of Computer Science
Columbia University

August 3rd, 2010



2

Background

3

Peer-to-peer communication system

[Figure: P2P overlay of nodes A to E, some behind NATs / firewalls, with a media relay (or relay) and a P2P / PSTN gateway to the PSTN / mobile network. Labels: "network address of node B?", "network address of node E?", and numbered signaling and media steps.]

node = user agent

• nodes form an overlay

• share responsibilities for message routing, signaling, media relaying

• super nodes, ordinary nodes


Reliability of P2P communication systems?

Relay selection techniques?

4

Outline

Reliability and Relay Selection

• Motivation
• Reliability framework: sources of unreliability in P2P communication systems?
• Model for relayed calls: how many relays per call to achieve a 99.9% success rate?
• Improving reliability of relayed calls: how to improve the reliability of relayed calls?
• User annoyance: how to quantify the interference of relayed calls with other applications?
• Relay selection: how to find a relay in O(1) hops that minimizes latency and user annoyance?

5

Reliability framework

• Reliability = proportion of completed calls (99.9%)
• Goal
– understand reasons for call failure
– devise techniques to address them
• Reasons for call failure
– (1) distributed search fails to find online callee
    • DHT lookup
– (2) distributed search fails to find a suitable relay
    • DHT lookup or any appropriate relay selection scheme
– (3) relay fails during voice/video session
    • understand and improve reliability for relayed calls
    • devise techniques for finding a relay

6

Outline

Reliability and Relay Selection

• Motivation
• Reliability framework
• Model for relayed calls: how many relays per call to achieve a 99.9% success rate?

7

Understanding reliability of relayed calls

• Percentage of VoIP calls that need relaying
– the provider knows
– 15-20% of calls for a commercial client-server IM / VoIP application
– 341 relays in 20 days for Skype [Suh05Infocom]
    • 17 per day for a super node (~50K super nodes)
– some client-server providers relay all calls
– NAT studies

8

Understanding reliability of relayed calls

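• For a desired reliability, what is the minimum number of relays per call?
– let X_i and R_i denote the lifetime and residual lifetime of a relay candidate (i.i.d.)
– let D denote the call duration
– when the ith relay fails, the call is switched to the (i+1)st relay, which is instantly selected from the global pool of all relays

$$\text{desired reliability} \le P\left(\sum_{i=1}^{k} R_i > D\right)$$

• Find the smallest k such that the call completion probability is greater than or equal to the desired reliability (99.9%)
• k depends on the relationship between node lifetime and call duration

[Figure: timeline of the call duration D covered by the consecutive residual lifetimes R_1, ..., R_{k-1}, R_k of relays 1, 2, ..., k-1, k]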
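The model above can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' method: it estimates P(R_1 + ... + R_k > D) for any pair of samplers and searches for the smallest k that meets a target. The function names, the trial count, and the exponential distributions used in the example (mean residual lifetime 6 hours, mean call duration 1 hour) are illustrative assumptions.

```python
import random


def completion_probability(k, sample_residual, sample_duration,
                           trials=100_000, seed=1):
    """Estimate P(R_1 + ... + R_k > D) by simulation."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(trials):
        call_duration = sample_duration(rng)
        # Relays are used one after another; switching is assumed instant.
        covered = sum(sample_residual(rng) for _ in range(k))
        if covered > call_duration:
            completed += 1
    return completed / trials


def min_relays(target, sample_residual, sample_duration, max_k=50):
    """Smallest k whose estimated completion probability reaches target."""
    for k in range(1, max_k + 1):
        if completion_probability(k, sample_residual, sample_duration) >= target:
            return k
    return None  # target not reached within max_k relays


# Assumed example distributions (hours); not taken from measurements.
def residual(rng):
    return rng.expovariate(1 / 6)   # mean residual lifetime: 6 hours


def duration(rng):
    return rng.expovariate(1.0)     # mean call duration: 1 hour


if __name__ == "__main__":
    print(min_relays(0.999, residual, duration))
```

Passing the samplers as functions keeps the estimator independent of the lifetime distribution, so a heavy-tailed sampler could be substituted for the exponential one.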

9

Understanding reliability of relayed calls

Exponential node lifetimes

$$99.9\% \le 1 - \left(\frac{\lambda}{\lambda + v}\right)^{k}$$

where λ = 1 / (mean node lifetime) and v = 1 / (mean call duration).

Mean node lifetime / mean call duration    Min # of relays k
6                                          4
3                                          5
1                                          10

Skype node lifetimes (lifetimes approximated as Pareto)

Mean node lifetime    Median node lifetime    Min # of relays k
12 hours              4 hours                 3

(mean call holding time = one hour)

• 95% of Skype relayed calls last less than 60 minutes
• For 95% of Skype relayed call durations, a minimum of 3 relays maintains a 99.9% success rate
• What if the system does not have enough relays?
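The exponential case can be checked with a few lines of arithmetic. This is a sketch under the assumption that both node lifetimes and call durations are exponential, so that the completion probability with k relays is 1 − (λ / (λ + v))^k; the function name is hypothetical. Only the ratio of mean node lifetime to mean call duration matters, because λ / (λ + v) = 1 / (1 + ratio).

```python
def min_relays_exponential(lifetime_to_call_ratio, target=0.999):
    """Smallest k with 1 - (lam / (lam + v))**k >= target.

    lifetime_to_call_ratio = mean node lifetime / mean call duration,
    so lam / (lam + v) = 1 / (1 + ratio).
    """
    q = 1.0 / (1.0 + lifetime_to_call_ratio)  # P(one relay fails before the call ends)
    k = 1
    while 1.0 - q ** k < target:
        k += 1
    return k


if __name__ == "__main__":
    for ratio in (6, 3, 1):
        print(ratio, min_relays_exponential(ratio))
```

For ratios 6, 3, and 1 this yields 4, 5, and 10 relays for a 99.9% target.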

10

Outline

Reliability and Relay Selection

• Motivation
• Reliability framework
• Model for relayed calls
• Improving reliability of relayed calls: how to improve the reliability of relayed calls?

11

Improving reliability of relayed calls

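• Approach 1: no-replacement (pure death process)
– select k relays at the beginning of a call
– do not replace failed relays

$$P(R_{NR} > D) = 1 - P\left(\max(R_1, \ldots, R_k) \le D\right)$$

• Approach 2: with-replacement
– select k relays at the beginning of a call
– replace failed relays at repair rate μ
– no failure during switch-over
– Skype uses a 2-relay with-replacement scheme

[Figure: Markov chain for the 2-relay with-replacement scheme with states 2, 1, 0 (working relays); transitions 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate μ; self-loops 1 − 2λ at state 2 and 1 − (λ + μ) at state 1]

$$\mathrm{MTTF} = \frac{1}{2\lambda^{2} / (3\lambda + \mu)} = \frac{3\lambda + \mu}{2\lambda^{2}} \quad \text{[Bir04]}$$

$$\bar{F}_{R_{WR}}(t) = P(R_{WR} > t) \approx e^{-t/\mathrm{MTTF}} \quad \text{for } \mu \gg \lambda$$

$$P(R_{WR} > D) \approx \frac{v}{v + 1/\mathrm{MTTF}}$$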
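The two approaches can be compared numerically. The following is a sketch under stated assumptions: relay lifetimes are exponential with failure rate `lam`, call durations are exponential with rate `v`, and failed relays are replaced at rate `mu`. The no-replacement closed form is derived here from the pure death process (with j relays alive, the next failure precedes the end of the call with probability jλ / (jλ + v)); it does not appear on the slide. The example values reuse the 12-hour mean lifetime, one-hour mean call, and 60-second search time mentioned in the deck, but applying them to an exponential model is an assumption.

```python
def p_no_replacement(k, lam, v):
    """P(max(R_1..R_k) > D) for exponential lifetimes and call durations."""
    p_all_fail = 1.0
    for j in range(1, k + 1):
        # With j relays alive, a failure occurs before the call ends
        # with probability j*lam / (j*lam + v).
        p_all_fail *= (j * lam) / (j * lam + v)
    return 1.0 - p_all_fail


def mttf_two_relays_with_replacement(lam, mu):
    """Mean time until both relays are down: (3*lam + mu) / (2*lam**2)."""
    return (3 * lam + mu) / (2 * lam ** 2)


def p_with_replacement(lam, mu, v):
    """Approximate P(R_WR > D), treating time to failure as exponential."""
    mttf = mttf_two_relays_with_replacement(lam, mu)
    return v / (v + 1.0 / mttf)


if __name__ == "__main__":
    lam = 1 / 12   # failures per hour (mean lifetime 12 hours)
    v = 1.0        # calls end at rate 1 per hour (mean call 1 hour)
    mu = 60.0      # replacements per hour (search time 60 seconds)
    print(p_no_replacement(2, lam, v), p_with_replacement(lam, mu, v))
```

With these rates the with-replacement scheme gives a higher completion probability than two relays without replacement, because the repair rate is much larger than the failure rate.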

12

Improving reliability of relayed calls

• No-replacement: add more relays?
– diminishing returns
    • 1 vs. 2 vs. 3 vs. 4 relays
    • MTTF increases by 50%, 22%, 13% (exponential lifetimes)
• No-replacement (NR) vs. with-replacement (WR)
– depends on mean lifetime, call duration, repair time

[Figure: NR vs. WR for Skype node lifetimes (mean = 12 hours, median = 4 hours); 2-relay with-replacement, search time = 60 s]
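The diminishing returns can be reproduced with exact arithmetic. This sketch assumes i.i.d. exponential relay lifetimes, for which the mean time until the last of k relays fails is the mean lifetime times the harmonic number H_k = 1 + 1/2 + ... + 1/k. The function names are hypothetical.

```python
from fractions import Fraction


def mttf_no_replacement(k, mean_lifetime=1):
    """Mean time until all k relays have failed (exponential lifetimes)."""
    return mean_lifetime * sum(Fraction(1, j) for j in range(1, k + 1))


def relative_gains(max_k):
    """Relative MTTF increase when going from k-1 to k relays, k = 2..max_k."""
    gains = []
    for k in range(2, max_k + 1):
        gains.append(mttf_no_replacement(k) / mttf_no_replacement(k - 1) - 1)
    return gains


if __name__ == "__main__":
    for k, gain in enumerate(relative_gains(4), start=2):
        print(k, float(gain))
```

The gains for the second, third, and fourth relay are 1/2, 2/9, and 3/22, which correspond to the 50%, 22%, and 13% figures when truncated to whole percentages.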

13

Outline

Reliability and Relay Selection

• Motivation
• Reliability framework
• Model for relayed calls
• Improving reliability of relayed calls
• User annoyance: how to quantify the interference of the relayed call with other applications?

14

User annoyance

• Interference of relayed call with other applications running on the relay machine

• File sharing = mutually beneficial (tit-for-tat)
• Relaying = altruistic
• Provide incentives or minimize user annoyance
• How to quantify user annoyance?
– automatically?
– spare network capacity
• Issues in measuring spare capacity?
– bandwidth tests, ALTO

15

Outline

Reliability and Relay Selection

• Motivation
• Reliability framework
• Model for relayed calls
• Improving reliability of relayed calls
• User annoyance
• Relay selection: how to find a relay in O(1) hops that minimizes latency and user annoyance?

16

Distributed relay selection

• Goal: O(1) hop
• 2-level hierarchical network

[Figure: nodes behind NATs report IP address, RTT, and bandwidth. A node asks a close-by node "Give me a relay" and receives "Here is a randomly selected relay" (local-random scheme, 1-relay). Plots: search performance, dropped calls.]

17

Distributed relay selection

• Delay
• User annoyance
– interference with user applications
– file sharing (draft idle peers)
– spare capacity
• random
• mindelay
– select relay with minimum delay
• netmax
– select relay with maximum spare bandwidth
• threshold
– select relays with delay < 150 ms and maximum spare capacity
• Results
– strategies perform similarly near the system collapse point
– minimizing latency increases annoyance and the number of jobs per relay, and vice versa
– threshold approach performs reasonably well
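The four strategies can be sketched as selection functions over a candidate list. This is an illustrative sketch only: the `Relay` record, its field names, the units, and the sample addresses are assumptions, and returning `None` when no candidate is under the delay threshold is a choice made here, not something the slides specify.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relay:
    ip: str               # candidate address (hypothetical field)
    rtt_ms: float         # measured round-trip time
    spare_bw_kbps: float  # estimated spare capacity


def select_random(candidates: List[Relay], rng: random.Random) -> Relay:
    """random: pick any candidate uniformly."""
    return rng.choice(candidates)


def select_mindelay(candidates: List[Relay]) -> Relay:
    """mindelay: pick the candidate with minimum delay."""
    return min(candidates, key=lambda r: r.rtt_ms)


def select_netmax(candidates: List[Relay]) -> Relay:
    """netmax: pick the candidate with maximum spare bandwidth."""
    return max(candidates, key=lambda r: r.spare_bw_kbps)


def select_threshold(candidates: List[Relay],
                     max_delay_ms: float = 150.0) -> Optional[Relay]:
    """threshold: among candidates under the delay bound, maximize spare capacity."""
    eligible = [r for r in candidates if r.rtt_ms < max_delay_ms]
    if not eligible:
        return None  # assumption: the caller decides how to fall back
    return max(eligible, key=lambda r: r.spare_bw_kbps)


# Made-up candidates using documentation-range addresses.
CANDIDATES = [
    Relay("192.0.2.1", rtt_ms=40.0, spare_bw_kbps=200.0),
    Relay("192.0.2.2", rtt_ms=120.0, spare_bw_kbps=900.0),
    Relay("192.0.2.3", rtt_ms=300.0, spare_bw_kbps=5000.0),
]

if __name__ == "__main__":
    print(select_mindelay(CANDIDATES).ip)
    print(select_netmax(CANDIDATES).ip)
    print(select_threshold(CANDIDATES).ip)
```

In this example mindelay and netmax pick opposite ends of the list, while threshold picks the middle candidate, which illustrates the trade-off between latency and spare capacity.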

18

Related work

• Modeling
– On lifetime-based node failure and stochastic resilience of decentralized peer-to-peer networks [Leonard09ToN]
• Minimizing churn
– Minimizing churn in distributed systems [Godfrey06Sigcom]
• Relay selection
– ASAP: an AS-aware peer relay protocol for high quality VoIP [Ren06ICDCS]

$ diff this related_work
– focus on node isolation
– minimizing churn is not sufficient
– reliability, relay selection, user annoyance

19

Conclusion

• Framework for analyzing reliability in p2p communication systems

• A model for reliability of relayed calls

• Reliability improvement schemes

• User annoyance

• Distributed relay selection