PEER-TO-PEER SYSTEMS
Joon Yoo
Dept. of Software Design & Management, Gachon University
Distributed Systems - Fall 2013
Outline
o Peer-to-peer systems
o Distributed Hash Table (DHT)
o BitTorrent
Peer-to-Peer Systems
o Server-client system
n Hot spots (Google, Naver, etc.) just keep getting hotter while cold pipes remain unused
o Peer-to-peer system
n Shares the information, bandwidth, and computing resources of individual users
o Main problem
n Searching for the location of the content
Classification
o How is the content searched for?
o Server-based search
n Not actually peer-to-peer; single point of failure
n Soribada, Napster, eDonkey
o Flooding-based search
n Simple but doesn't scale: worst case O(N) messages per search
n Gnutella
o Distributed Hash Table (DHT)
n Scalable but complicated
n CAN, Chord…
Centralized lookup
[Figure: nodes N1–N9 and a central DB. The publisher at N4 holds Key=“title”, Value=file data… and registers it with SetLoc(“title”, N4); the client sends Lookup(“title”) to the DB.]
Simple, but O(N) state and a single point of failure
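The centralized scheme reduces to one index held by one server. A hypothetical sketch: the class and the `set_loc`/`lookup` method names only mirror the figure's SetLoc and Lookup labels and are not the API of Napster or any other real system.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical central index server: title -> nodes that store the file."""

    def __init__(self) -> None:
        # All location state lives on this one server (O(N) state).
        self._index: dict[str, set[str]] = defaultdict(set)

    def set_loc(self, title: str, node: str) -> None:
        # The publisher registers where its copy lives.
        self._index[title].add(node)

    def lookup(self, title: str) -> set[str]:
        # One round trip for the client, but if this server fails, nobody can search.
        return set(self._index.get(title, ()))


directory = CentralDirectory()
directory.set_loc("title", "N4")
print(directory.lookup("title"))  # {'N4'}
```

Lookups are cheap, but every publish and every search goes through the same machine, which is exactly the hot spot and single point of failure named above.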
Flooded queries
[Figure: the client floods Lookup(“title”) to its neighbors among N1–N9, which forward it onward until it reaches the publisher at N4, which holds Key=“title”, Value=file data…]
Robust, but worst case O(N) messages per lookup
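To see why flooding costs O(N) messages, here is a minimal sketch of TTL-limited flooding in the spirit of Gnutella. The adjacency-list overlay, the TTL, and the duplicate suppression are assumptions for illustration, not details from the slide.

```python
def flood(graph: dict[str, list[str]], start: str, holders: set[str], ttl: int):
    """TTL-limited flooding. Returns (nodes that hold the key, messages sent)."""
    hits: set[str] = set()
    messages = 0
    seen = {start}
    frontier = [(start, None)]  # (node, neighbor it heard the query from)
    for _ in range(ttl):
        next_frontier = []
        for node, came_from in frontier:
            for nb in graph[node]:
                if nb == came_from:
                    continue  # don't echo the query back to the sender
                messages += 1  # every forwarded copy costs a message, even duplicates
                if nb in seen:
                    continue  # duplicate: dropped on arrival, not forwarded again
                seen.add(nb)
                if nb in holders:
                    hits.add(nb)
                next_frontier.append((nb, node))
        frontier = next_frontier
    return hits, messages


# A ring N1-N2-...-N9-N1 (hypothetical overlay).
ring = {f"N{i}": [f"N{i % 9 + 1}", f"N{(i - 2) % 9 + 1}"] for i in range(1, 10)}
print(flood(ring, "N1", {"N4"}, ttl=4))
```

Every node reached by the flood sends copies to its neighbors whether or not it holds the key, so the message count grows with the part of the network the TTL covers; in the worst case that is the whole network.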
Routed queries (Freenet, Chord, etc.)
[Figure: client, publisher (N4, holding Key=“title”, Value=file data…), and nodes N1–N9; the client's Lookup(“title”) is forwarded along a route of nodes toward the publisher.]
Outline
o Peer-to-peer systems
o Distributed Hash Table (DHT)
n Chapter 2.2.2 Decentralized Architectures
o BitTorrent
The lookup problem
• At the heart of all DHTs
[Figure: nodes N1–N6 connected over the Internet. A publisher calls Put(Key=“title”, Value=file data…) and a client calls Get(key=“title”): which node should the client ask?]
DHT algorithms
o Chord (2001)
n I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan (MIT), “Chord: A scalable peer-to-peer lookup service for internet applications,” ACM SIGCOMM ’01
n 11,032+ citations as of Dec. 2013 (Google Scholar)
o CAN (2001)
n S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker (UC Berkeley), “A scalable content-addressable network,” ACM SIGCOMM ’01
n 8,323+ citations as of Dec. 2013 (Google Scholar)
DHT Basics
o Intuition
n In the previous P2P examples, data was searched for either through a centralized server or by a fully distributed, unstructured search without a server.
n In a DHT, the nodes are organized in a structure (the DHT itself), so we need a mechanism that maps data item keys to the distributed nodes.
o Data item mapping
n Data item: assigned a (pseudo)random key identifier, e.g., a hash of its name, from a large identifier space
n DHT node: assigned a (pseudo)random node identifier, e.g., a hash of its IP address, from the same identifier space
n Each data item key is mapped to exactly one DHT node according to some distance metric.
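The slide leaves the distance metric open. One common choice, used by Chord, places keys and node IDs on a ring and assigns each key to its successor: the first node ID at or after the key, wrapping around. A minimal sketch under that assumption; the 16-bit space, the function names, and the sample addresses are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 16  # hypothetical ID length in bits; Chord itself uses 160-bit SHA-1 IDs
SPACE = 2 ** M


def ident(name: str) -> int:
    """Hash a node address or a data key into the shared M-bit identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % SPACE


def successor(node_ids: list[int], key_id: int) -> int:
    """Map a key to the first node ID >= key_id on the ring, wrapping past the top."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


nodes = {ident(f"10.0.0.{i}"): f"10.0.0.{i}" for i in range(1, 6)}
owner = nodes[successor(list(nodes), ident("title"))]
print(f"'title' is stored at {owner}")
```

Because any node can compute `ident` locally, every node agrees on which node is responsible for a key, and adding or removing a node only moves the keys between it and its neighbor on the ring.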
CAN (1/6)
o Node n1 joins
n Function(n1) = (x1, y1), i.e., n1 is mapped to a point in the 2-D space
n n1 takes the whole space
o Node n2 joins
n Function(n2) = (x2, y2)
n The identifier space is split into rectangular zones
o Each node keeps information about its neighbors
n n1 knows n2
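The join steps can be sketched as zone splitting. Assumptions, not from the slide: the space is 8 × 8 with half-open zones, a joining node lands at a given point, and the owner of that point halves its zone, alternating the split axis (x, then y, then x again) and handing the half that contains the point to the newcomer. The `Zone` class and `join` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Half-open rectangle [x0, x1) x [y0, y1) owned by one CAN node."""
    x0: int
    y0: int
    x1: int
    y1: int
    splits: int = 0  # how many times this region has been halved; picks the next axis

    def contains(self, p: tuple[int, int]) -> bool:
        x, y = p
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def halves(self) -> tuple["Zone", "Zone"]:
        """Split in half, alternating the axis: x first, then y, then x again."""
        s = self.splits + 1
        if self.splits % 2 == 0:
            mid = (self.x0 + self.x1) // 2
            return Zone(self.x0, self.y0, mid, self.y1, s), Zone(mid, self.y0, self.x1, self.y1, s)
        mid = (self.y0 + self.y1) // 2
        return Zone(self.x0, self.y0, self.x1, mid, s), Zone(self.x0, mid, self.x1, self.y1, s)


def join(zones: dict[str, Zone], new_node: str, point: tuple[int, int]) -> None:
    """The owner of `point` halves its zone and hands the half containing `point` to new_node."""
    owner = next(n for n, z in zones.items() if z.contains(point))
    low, high = zones[owner].halves()
    zones[new_node], zones[owner] = (low, high) if low.contains(point) else (high, low)


zones = {"n1": Zone(0, 0, 8, 8)}  # n1 takes the whole (assumed 8 x 8) space
for name, pt in [("n2", (4, 2)), ("n3", (3, 5)), ("n4", (5, 5)), ("n5", (6, 6))]:
    join(zones, name, pt)
for name, z in zones.items():
    print(name, z)
```

Using the node coordinates listed on the CAN (6/6) slide, this produces three 4 × 4 squares and two 2 × 4 rectangles, consistent with the "square or 1:2 / 2:1" property stated there, and f4 at (7, 5) falls in n5's zone.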
[Figure: 2-D coordinate space with axis ticks 0–7; n1 at (x1, y1) owns the entire space.]
CAN (2/6)
o n2 joins
[Figure: the space is split between n1 (x1, y1) and n2 (x2, y2).]
o n3 joins
[Figure: the space is now divided among n1, n2, and n3.]
CAN (3/6)
o n4 joins
[Figure: the space is divided among n1, n2, n3, and n4.]
o n5 joins
[Figure: the space is divided among n1, n2, n3, n4, and n5.]
CAN (4/6)
o Files f1–f4 are added.
n Each file is assigned to the node responsible for the zone that contains the file's point
[Figure: zones of n1–n5, with files f1–f4 placed in them.]
CAN (5/6)
o A client, Bob, wants to find file f4, but he only knows n1, so his query must be routed.
[Figure: zones of n1–n5, with files f1–f4 placed in them.]
CAN (6/6)
§ Each node knows its neighbors in the d-dimensional space
§ A node forwards the query to the neighbor closest to the query's point
§ Example: assume n1 queries f4
[Figure: zones of n1–n5 and files f1–f4 on the 0–7 grid.]
§ The space is divided among the nodes
§ Together, the nodes cover the entire space
§ Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
§ Nodes: n1: (1, 2); n2: (4, 2); n3: (3, 5); n4: (5, 5); n5: (6, 6)
§ Items: f1: (2, 3); f2: (5, 1); f3: (2, 1); f4: (7, 5)
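Greedy forwarding can be sketched as follows. The zones are a reconstruction, not the slide's figure: they assume an 8 × 8 space in which n1–n5 joined in order at the coordinates above, each join halving the owner's zone along alternating axes. Using each neighbor's zone center as its position is also an assumption; `ZONES`, `are_neighbors`, and `route` are hypothetical names.

```python
import math

# Hypothetical zones (x0, y0, x1, y1), half-open, in an assumed 8 x 8 space.
ZONES = {
    "n1": (0, 0, 4, 4),
    "n2": (4, 0, 8, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 4, 6, 8),
    "n5": (6, 4, 8, 8),
}


def contains(zone, p):
    x0, y0, x1, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def are_neighbors(a, b):
    """2-D CAN neighbors: the zones abut along one axis and overlap along the other."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    x_overlap = min(ax1, bx1) - max(ax0, bx0)
    y_overlap = min(ay1, by1) - max(ay0, by0)
    x_abut = ax1 == bx0 or bx1 == ax0
    y_abut = ay1 == by0 or by1 == ay0
    return (x_abut and y_overlap > 0) or (y_abut and x_overlap > 0)


def center(zone):
    x0, y0, x1, y1 = zone
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def route(zones, start, target):
    """Greedy CAN routing: forward to the neighbor whose zone center is closest to target."""
    path = [start]
    while not contains(zones[path[-1]], target):
        if len(path) > len(zones):
            raise RuntimeError("greedy routing made no progress")
        here = zones[path[-1]]
        nbrs = [n for n, z in zones.items() if n != path[-1] and are_neighbors(here, z)]
        path.append(min(nbrs, key=lambda n: math.dist(center(zones[n]), target)))
    return path


print(route(ZONES, "n1", (7, 5)))  # ['n1', 'n2', 'n5']
```

With these zones, n1's query for f4 at (7, 5) goes n1 → n2 → n5. Note that n1 and n4 touch only at a corner, so they are not neighbors and the query cannot jump diagonally.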
Distributed Hash Table (DHT) usages
o Apache Cassandra: open-source distributed database management system (Facebook, 2010)
o Mainline DHT: used by most BitTorrent search engines
o Most distributed data stores employ some form of DHT for lookup.
o Memcached
o And many more…
Outline
o Peer-to-peer systems
o Distributed Hash Table (DHT)
o BitTorrent
n Chapter 2.2.3 Hybrid Architectures
What is BitTorrent?
o An efficient content distribution system based on file swarming. It usually does not perform all the functions of a typical P2P system, such as searching.
o BitTorrent is the most widely used P2P program in the world
n Used by 150 million active users as of Jan. 2012
n On average, BitTorrent has more active users than YouTube and Facebook combined.
n Since 2010, more than 20,000 BitTorrent users have been sued by copyright trolls.
File sharing
o To share a file or group of files, a peer first creates a .torrent file, a small file that contains
n metadata about the files to be shared, and
n information about the tracker, the computer that coordinates the file distribution.
o Peers first obtain the .torrent file and then connect to the specified tracker, which tells them which other peers to download pieces of the file from.
Basic Idea
o The initial seeder chops the file into many pieces.
o A leecher first locates the .torrent file, which directs it to a tracker.
o The tracker tells the leecher which other peers are downloading that file.
o As leechers download pieces of the file, replicas of those pieces are created: more downloads mean more replicas available.
o As soon as a leecher has a complete piece, it can share it with other downloaders.
o Eventually each leecher obtains all the pieces, assembles the file, and verifies the checksums; it then becomes a seeder.
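The chop, verify, and assemble steps can be sketched with per-piece SHA-1 hashes, which is how BitTorrent's .torrent metadata lets a leecher check each piece. The tiny piece length and the function names are hypothetical; real clients also handle disk I/O and re-requesting failed pieces.

```python
import hashlib

PIECE_LENGTH = 4  # bytes; tiny for illustration (real torrents use much larger pieces)


def make_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Seeder side: chop the file into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One 20-byte SHA-1 digest per piece, as published in the .torrent metadata."""
    return [hashlib.sha1(p).digest() for p in pieces]


def verify_and_assemble(received: dict[int, bytes], hashes: list[bytes]) -> bytes:
    """Leecher side: pieces may arrive in any order; check each, then join them in index order."""
    if set(received) != set(range(len(hashes))):
        raise ValueError("some pieces are still missing")
    for index, expected in enumerate(hashes):
        if hashlib.sha1(received[index]).digest() != expected:
            raise ValueError(f"piece {index} failed its hash check")
    return b"".join(received[i] for i in range(len(hashes)))


data = b"hello bittorrent"
hashes = piece_hashes(make_pieces(data))
# Simulate pieces arriving out of order from different peers.
received = {i: p for i, p in reversed(list(enumerate(make_pieces(data))))}
print(verify_and_assemble(received, hashes))
```

Hashing per piece rather than only over the whole file lets a leecher reject one bad piece from one peer without discarding everything else it has downloaded.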
Piece Selection
o The order in which peers select pieces is critical for good performance.
o With an inefficient policy, peers may end up each holding the same set of easily available pieces and none of the missing ones.
o If the original seed is then taken down prematurely, the missing pieces exist nowhere else and the file cannot be completely downloaded! What are “good” policies?
Piece Selection: Rarest Piece First
o General rule
n Determine which pieces are rarest among your peers, and download those first.
n This leaves the most commonly available pieces until the end, and spreads rare pieces quickly so they are less likely to disappear when their holders leave.
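Rarest-first selection comes down to counting how many peers advertise each piece. A minimal sketch, assuming each peer's set of pieces is already known (e.g., from the bitfields peers exchange); the function name and the tie-breaking rule are illustrative choices.

```python
from collections import Counter


def rarest_first(my_pieces: set[int], peer_pieces: dict[str, set[int]]) -> list[int]:
    """Return the pieces we still need, rarest (fewest peers holding it) first."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)  # +1 for each peer that announces the piece
    wanted = [p for p in availability if p not in my_pieces]
    # Ties are broken by piece index for determinism; a real client might randomize.
    return sorted(wanted, key=lambda p: (availability[p], p))


peers = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 3}}
print(rarest_first({1}, peers))  # [2, 3, 0]
```

Pieces 2 and 3 are held by only one peer each, so they are fetched before piece 0, which all three peers can supply.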
Internal mechanism
o Built-in incentive mechanism (where all the magic happens):
n Choking Algorithm
n Optimistic Unchoking
Choking
o Choking is a temporary refusal to upload. It is one of BitTorrent's most powerful ideas for dealing with free riders (peers that only download and never upload).
o The tit-for-tat strategy is based on game-theoretic concepts: a peer uploads to the peers that upload to it.
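A tit-for-tat unchoke decision might look like the sketch below. Assumptions: four regular upload slots (the number often quoted for the original client), download rates measured by the caller over a recent window, and one optimistic unchoke drawn at random from the choked peers, standing in for the Optimistic Unchoking mechanism listed above. Real clients re-run this periodically and rotate the optimistic slot, which the sketch does not model.

```python
import random


def choose_unchoked(download_rates: dict[str, float], slots: int = 4, rng=None) -> set[str]:
    """Tit-for-tat: unchoke the `slots` peers that upload to us fastest,
    plus one optimistic unchoke chosen at random among the rest."""
    rng = rng or random.Random()
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:slots])  # reciprocate with our best uploaders
    choked = ranked[slots:]  # everyone else, free riders included, is choked...
    if choked:
        # ...except one random peer, which gives newcomers with no history a chance.
        unchoked.add(rng.choice(choked))
    return unchoked


rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 20.0, "e": 10.0, "f": 0.0}
print(choose_unchoked(rates, slots=2, rng=random.Random(1)))
```

A free rider with rate 0 never earns a regular slot; at best it is occasionally picked as the optimistic unchoke.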
Upload-Only mode
o Once its download is complete, a peer has no download rates to compare (and no need for them). Which peers should it upload to?
o Policy: upload to the peers with the best upload rates. This ensures that pieces are replicated faster and new seeders are created quickly.
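In seed mode only the ranking key changes. A minimal sketch, assuming the caller measures the rate at which this peer manages to upload to each neighbor; the function name and slot count are hypothetical.

```python
def choose_unchoked_as_seed(upload_rates: dict[str, float], slots: int = 4) -> set[str]:
    """Seed (upload-only) mode: rank peers by the rate at which we manage to upload to them."""
    # Peers that absorb data fastest help replicate pieces and create new seeders soonest.
    return set(sorted(upload_rates, key=upload_rates.get, reverse=True)[:slots])


print(choose_unchoked_as_seed({"p1": 120.0, "p2": 300.0, "p3": 80.0}, slots=2))
```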
Pipelining
o When transferring data over TCP, always keep several requests pending at once to avoid a delay between pieces being sent. At any point in time, some number of requests, typically 5, are outstanding simultaneously.
o Every time a piece arrives, a new request is sent out.
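A small simulation of the pipelining rule: fill the pipeline up front, then issue one new request per arrival. Requests are abstracted as piece indices and the connection as an in-order queue (both assumptions; real clients pipeline smaller block requests within pieces). The function name is hypothetical.

```python
import itertools
from collections import deque

PIPELINE_DEPTH = 5  # "typically 5" outstanding requests, per the slide


def pipelined_requests(num_pieces: int, depth: int = PIPELINE_DEPTH):
    """Yield ("request", i) and ("arrive", i) events, never exceeding `depth` pending requests."""
    to_request = iter(range(num_pieces))
    pending = deque()
    # Fill the pipeline up front so the sender always has the next request queued.
    for piece in itertools.islice(to_request, depth):
        pending.append(piece)
        yield ("request", piece)
    while pending:
        piece = pending.popleft()  # assume arrivals come in request order, as on one TCP stream
        yield ("arrive", piece)
        nxt = next(to_request, None)  # every arrival triggers exactly one new request
        if nxt is not None:
            pending.append(nxt)
            yield ("request", nxt)


for event in pipelined_requests(7, depth=3):
    print(event)
```

Without the pipeline, each piece would cost a full round trip of idle link time between the request and the data; with several requests in flight, the sender already knows what to send next.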
Distributed tracking: Trackerless torrents
o BitTorrent also supports "trackerless" torrents.
o Trackerless torrents use a DHT implementation that lets the client download torrents that were created without a BitTorrent tracker.
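Mainline DHT, the DHT commonly used for trackerless torrents, is based on Kademlia: node IDs and torrent infohashes share a 160-bit space, and the distance between two IDs is their bitwise XOR. The sketch shows only that metric and the choice of the k closest known nodes, not the iterative lookup or the wire protocol. Here k = 8 follows the bucket size in the Mainline DHT specification (BEP 5), and deriving IDs by hashing a name is a stand-in; the function names are hypothetical.

```python
import hashlib

ID_BITS = 160  # Mainline DHT node IDs and infohashes are 160-bit values


def make_id(name: str) -> int:
    """Stand-in ID derivation for the example: SHA-1 of a name, as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's metric: bitwise XOR, read as an unsigned integer."""
    return a ^ b


def closest_nodes(known: list[int], target: int, k: int = 8) -> list[int]:
    """The k known node IDs closest to target; a lookup would query these next."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]


infohash = make_id("example torrent")  # hypothetical torrent key
peers = [make_id(f"node-{i}") for i in range(20)]
print([hex(n)[:10] for n in closest_nodes(peers, infohash, k=3)])
```

Each step of a real lookup asks the closest nodes found so far for nodes even closer to the infohash, so the search converges in a logarithmic number of rounds instead of consulting a tracker.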