1 slides are from richard yang from yale minor modifications are made network applications and...
Post on 16-Jan-2016
213 Views
Preview:
TRANSCRIPT
1
Slides are from Richard Yang from Yale Minor modifications are made
Network Applications and Network Programming:
Web and P2P
Recap: FTP, HTTP
FTP: file transfer ASCII (human-readable
format) requests and responses stateful server one data channel and one control channel
HTTP Extensibility: ASCII requests, header lines,
entity body, and responses line Scalability/robustness
• stateless server (each request should contain the full information); DNS load balancing
• Client caching Web caches one data channel
2
Recap: WebServer Flow
TCP socket space
state: listeningaddress: {*.6789, *.*}completed connection queue: sendbuf:recvbuf:
128.36.232.5128.36.230.2
state: listeningaddress: {*.25, *.*}completed connection queue:sendbuf:recvbuf:
state: establishedaddress: {128.36.232.5:6789, 198.69.10.10.1500}sendbuf:recvbuf:
connSocket = accept()
Create ServerSocket(6789)
read request from connSocket
read local file
write file to connSocket
close connSocket
Recap: Writing High Performance Servers:
Major Issues: Many socket/IO operations can cause processing to block, e.g., accept: waiting for new connection; read a socket waiting for data or close; write a socket waiting for buffer space; I/O read/write for disk to finish
Thus a crucial perspective of network server design is the concurrency design (non-blocking) for high performance to avoid denial of service
A technique to avoidblocking: Thread
Multi-Threaded Web Server
5
connSocket = accept()
Create ServerSocket(6789)
Create thread for connSocket
read request from connSocket
read local file
write file to connSocket
close connSocket
read request from connSocket
read local file
write file to connSocket
close connSocket
Recap: Writing High Performance Servers Problems of multiple
threads Too many threads
throughput meltdown, response time explosion
Event-Driven Programming
Event-driven programming, also called asynchronous i/o Tell the OS to not block when accepting/reading/writing on sockets Java: asynchronous i/o
for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/
Yields efficient and scalable concurrency Many examples: Click router, Flash web server, TP Monitors, etc.
Web Server
8
connSocket = accept()
Create ServerSocket(6789)
Create thread for connSocket
read request from connSocket
read local file
write file to connSocket
close connSocket
read request from connSocket
read local file
write file to connSocket
close connSocket
If the OS will not block on sockets, how may the program structure look
like?
Typical Structure of Async i/o
Typically, async i/o programs use Finite State Machines (FSM) to monitor the progress of requests The state info keeps track of the execution
stage of processing each request, e.g., reading request, writing reply, …
The program has a loop to check potential events at each state
9
Async I/O in Java
An important class is the class Selector, to support event loop
A Selector is a multiplexer of selectable channel objects example channels: DatagramChannel, ServerSocketChannel, SocketChannel
use configureBlocking(false) to make a channel non-blocking
A selector may be created by invoking the open method of this class
Async I/O in Java
A selectable channel registers events (called a SelectionKey) with a selector with the register method
A SelectionKey object contains two operation sets interest Set ready Set
A SelectionKey object has an attachment which can store data often the attachment is a
buffer
Selector
Selection Key
Selectable Channel
register
Async I/O in Java
Call select (or selectNow(), or select(int timeout)) to check for ready events, called the selected key set
Iterate over the set to process all ready events
Problems of Event-Driven Server
Difficult to engineer, modularize, and tune
No performance/failure isolation between Finite-State-Machines (FSMs)
FSM code can never block (but page faults, i/o, garbage collection may still force a block) thus still need multiple threads
Summary of Traditional C-S Web Servers
Is the application extensible, scalable, robust, secure?
14
app. server
C0
client 1
client 2
client 3
client n
DNS
Content Distribution History...
“With 25 years of Internet experience, we’ve learned exactly one way to deal with the exponential growth: Caching”.
(1997, Van Jacobson)
15
16
Web Caches (Proxy)
Web caches/proxy placed at entrance of an ISP
Client sends all http requests to web cache if object at web
cache, web cache immediately returns object in http response
else requests object from origin server, then returns http response to client
client
Proxyserver
client
http request
http re
quest
http response
http re
sponse
http re
quest
http re
sponse
http requesthttp response
origin server
origin server
Web Proxy/Cache
Web caches give good performance because very often a single client
repeatedly accesses the same document
a nearby client also accesses the same document
Cache Hit ratio increases logarithmically with number of users
17
app. server
C0
client 1
client 2 client
3
ISP cache
client 4
client 5
client 6
ISP cache
18
Benefits of Web Caching
Assume: cache is “close” to client (e.g., in same network)
smaller response time: cache “closer” to client
decrease traffic to distant servers link out of
institutional/local ISP network often bottleneck
originservers
public Internet
institutionalnetwork 10 Mbps LAN
1.5 Mbps access link
institutionalcache
What went wrong with Web Caches? Web protocols evolved extensively to
accommodate caching, e.g. HTTP 1.1 However, Web caching was developed with a
strong ISP perspective, leaving content providers out of the picture It is the ISP who places a cache and controls it ISPs only interest to use Web caches is to reduce
bandwidth
In the USA: Bandwidth relative cheap In Europe, there were many more Web caches
However, ISPs can arbitrarily tune Web caches to deliver stale content
19
Content Provider Perspective
Content providers care about User experience latency Content freshness Accurate access statistics Avoid flash crowds Minimize bandwidth usage in their access
link
20
Content Distribution Networks Content Distribution Networks (CDNs) build an
overlay networks of caches to provide fast, cost effective, and reliable content delivery, while working tightly with content providers.
Example: Akamai – original and largest commercial CDN
operates over 25,000 servers in over 1,000 networks
Akamai (AH kuh my) is Hawaiian for intelligent, clever and informally “cool”. Founded Apr 99, Boston MA by MIT students
21
Basic of Akamai Operation Content provider server
provides the base HTML document Akamai caches embedded objects at a set
of its cache servers (called edge servers) Akamaization of embedded content: e.g.,
<IMG SRC= http://www.provider.com/image.gif > changed to
<IMGSRC = http://a661. g.akamai.net/hash/image.gif>
Akamai customizes DNS to select serving edge servers based on closeness to client browser server load
22
More Akamai information
URL akamaization is becoming obsolete and only supported for legacy reasons Currently most content providers prefer to
use DNS CNAME techniques to get all their content served from the Akamai servers
still content providers need to run their origin servers
Akamai Evolution: Files/streaming Secure pages and whole pages Dynamic page assembly at the edge (ESI) Distributed applications
23
Lab: Problems of Traditional Content Distribution
24
app. server
C0
client 1
client 2
client 3
client n
DNS
25
Objectives of P2P
Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
Bypass DNS to find clients with resources! examples: instant
messaging, skype
Internet
P2P
But P2P is not new
Original Internet was a p2p system: The original ARPANET connected UCLA,
Stanford Research Institute, UCSB, and Univ. of Utah
No DNS or routing infrastructure, just connected by phone lines
Computers also served as routers
P2P is simply an iteration of scalable distributed systems
P2P Systems
File Sharing: BitTorrent, LimeWireStreaming: PPLive, PPStream, Zatto,
…Research systems
Collaborative computing: SETI@Home project
• Human genome mapping• Intel NetBatch: 10,000 computers in 25
worldwide sites for simulations, saved about 500million
Peer-to-Peer Computing- 40-70% of total traffic in many networks- upset the music industry, drawn college
students, web developers, recording artists and universities into court
Source: ipoque Internet study 2008/2009
29
Recap: P2P Objectives
Bypass DNS to locate clients with resources!examples: instant
messaging, skype
Share the storage and bandwidth of individual clients to improve scalability/robustness
Internet
The Lookup Problem
Internet
N1
N2 N3
N6N5
N4
Publisher
Key=“title”Value=MP3 data… Client
Lookup(“title”)
?
find where a particular file is storedpay particular attention to see its equivalence of
DNS
31
Outline
RecapP2P
the lookup problem Napster
32
Centralized Database: Napster Program for sharing music over the Internet History:
5/99: Shawn Fanning (freshman, Northeasten U.) founded Napster Online music service, wrote the program in 60 hours
12/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/01: US Circuit Court of
Appeals: Napster knew users violating copyright laws
7/01: # simultaneous online users:Napster 160K
9/02: bankruptcy
We are referring to the Napster before closure.03/2000
33
Napster: How Does it Work?
Application-level, client-server protocol over TCP
A centralized index system that maps files (songs) to machines that are alive and with files
Steps: Connect to Napster server Upload your list of files (push) to server Give server keywords to search the full list Select “best” of hosts with answers
34
Napster Architecture
Napster: Publish
I have X, Y, and Z!
Publish
insert(X, 123.2.21.23)...
123.2.21.23
Napster: Search
Where is file A?
Query Reply
search(A)-->123.2.0.18124.1.0.1
123.2.0.18
124.1.0.1
Napster: Ping
ping
123.2.0.18
124.1.0.1ping
Napster: Fetch
123.2.0.18
124.1.0.1fetch
39
Napster MessagesGeneral Packet Format
[chunksize] [chunkinfo] [data...]
CHUNKSIZE: Intel-endian 16-bit integer size of [data...] in bytes
CHUNKINFO: (hex) Intel-endian 16-bit integer.
00 - login rejected 02 - login requested 03 - login accepted 0D - challenge? (nuprin1715) 2D - added to hotlist 2E - browse error (user isn't online!) 2F - user offline
5B - whois query 5C - whois result 5D - whois: user is offline! 69 - list all channels 6A - channel info 90 - join channel 91 - leave channel …..
40
Centralized Database: Napster Summary of features: a hybrid design
control: client-server (aka special DNS) for files data: peer to peer
Advantages simplicity, easy to implement sophisticated
search engines on top of the index system
Disadvantages application specific (compared with DNS) lack of robustness, scalability: central search
server single point of bottleneck/failure easy to sue !
41
Variation: BitTorrent
A global central index server is replaced by one tracker per file (called a swarm) reduces centralization; but needs other
means to locate trackers
The bandwidth scalability management technique is more interesting more later
42
Outline
Recap P2P
the lookup problem Napster (central query server; distributed
data servers) Gnutella
Gnutella
On March 14th 2000, J. Frankel and T. Pepper from AOL’s Nullsoft division (also the developers of the popular Winamp mp3 player) released Gnutella
Within hours, AOL pulled the plug on it
Quickly reverse-engineered and soon many other clients became available: Bearshare, Morpheus, LimeWire, etc.
43
44
Decentralized Flooding: Gnutella
On startup, client contacts other servents (server + client) in network to form interconnection/peering relationships servent interconnection used to forward control (queries,
hits, etc) How to find a resource record: decentralized flooding
send requests to neighbors neighbors recursively forward the requests
45
Decentralized Flooding
B
A
C E
F
H
J
S
D
G
IK
M
N
L
46
Decentralized Flooding
B
A
C E
F
H
J
S
D
G
IK
send query to neighbors
M
N
L
Each node forwards the query to its neighbors other than the onewho forwards it the query
47
Background: Decentralized Flooding
B
A
C E
F
H
J
S
D
G
IK
M
N
L
Each node should keep track of forwarded queries to avoid loop ! nodes keep state (which will time out---soft state) carry the state in the query, i.e. carry a list of visited nodes
48
Decentralized Flooding: Gnutella
Basic message header Unique ID, TTL, Hops
Message types Ping – probes network for other servents Pong – response to ping, contains IP addr, # of files, etc.
Query – search criteria + speed requirement of servent QueryHit – successful response to Query, contains addr
+ port to transfer from, speed of servent, etc.
Ping, Queries are flooded QueryHit, Pong: reverse path of previous message
49
Advantages and Disadvantages of Gnutella
Advantages: totally decentralized, highly robust
Disadvantages: not scalable; the entire network can be swamped
with flood requests• especially hard on slow clients; at some point broadcast
traffic on Gnutella exceeded 56 kbps to alleviate this problem, each request has a TTL to
limit the scope• each query has an initial TTL, and each node forwarding
it reduces it by one; if TTL reaches 0, the query is dropped (consequence?)
Flooding: FastTrack (aka Kazaa) Modifies the Gnutella protocol into two-level hierarchy Supernodes
Nodes that have better connection to Internet Act as temporary indexing servers for other nodes Help improve the stability of the network
Standard nodes Connect to supernodes and report list of files
Search Broadcast (Gnutella-style) search across
supernodes Disadvantages
Kept a centralized registration prone to law suits
Optional Slides
51
Optional Slides
52
Aside: Search Time?
Aside: All Peers Equal?
56kbps Modem
10Mbps LAN
1.5Mbps DSL
56kbps Modem56kbps Modem
1.5Mbps DSL
1.5Mbps DSL
1.5Mbps DSL
Aside: Network Resilience
Partial Topology Random 30% die Targeted 4% die
from Saroiu et al., MMCN 2002
56
Asynchronous Network Programming
(C/C++)
57
A Relay TCP Client: telnet-like Program
TCP client
TCP server
writen
readn
fgets
fputs
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpclient
58
Method 1: Process and Thread process
fork() waitpid()
Thread: light weight process pthread_create() pthread_exit()
59
pthread
Void main() { char recvline[MAXLINE + 1]; ss = new socketstream(sockfd);
pthread_t tid; if (pthread_create(&tid, NULL, copy_to, NULL)) { err_quit("pthread_creat()"); }
while (ss->read_line(recvline, MAXLINE) > 0) { fprintf(stdout, "%s\n", recvline); }}
void *copy_to(void *arg) { char sendline[MAXLINE];
if (debug) cout << "Thread create()!" << endl; while (fgets(sendline, sizeof(sendline), stdin)) ss->writen_socket(sendline, strlen(sendline));
shutdown(sockfd, SHUT_WR); if (debug) cout << "Thread done!" << endl;
pthread_exit(0);}
60
Method 2: Asynchronous I/O (Select)
select: deal with blocking system callint select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
FD_CLR(int fd, fd_set *set);FD_ZERO(fd_set *set);FD_ISSET(int fd, fd_set *set);FD_SET(int fd, fd_set *set);
61
Method 3: Signal and Select
signal: events such as timeout
62
Examples of Network Programming
Library to make life easier Four design examples
TCP Client TCP server using select TCP server using process and thread Reliable UDP
Warning: It will be hard to listen to me reading through the code. Read the code.
63
Example 2: A Concurrent TCP Server Using Process or Thread
Get a line, and echo it back Use select() For how to use process or thread, see
later Check the code at:
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpserver
Are there potential denial of service problems with the code?
64
Example 3: A Concurrent HTTP TCP Server Using Process/Thread
Use process-per-request or thread-per- request
Check the code at:http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/simple_httpd
top related