Post on 16-Jan-2016

Slides are from Richard Yang from Yale. Minor modifications are made.

Network Applications and Network Programming: Web and P2P

Recap: FTP, HTTP

FTP: file transfer
• ASCII (human-readable format) requests and responses
• stateful server
• one data channel and one control channel

HTTP
• Extensibility: ASCII requests, header lines, entity body, and response status line
• Scalability/robustness
  • stateless server (each request should contain the full information); DNS load balancing
  • client caching; Web caches
• one data channel

Recap: WebServer Flow

Create ServerSocket(6789)
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket

TCP socket space (128.36.232.5, 128.36.230.2)

state: listening
address: {*.6789, *.*}
completed connection queue:
sendbuf:
recvbuf:

state: listening
address: {*.25, *.*}
completed connection queue:
sendbuf:
recvbuf:

state: established
address: {128.36.232.5:6789, 198.69.10.10:1500}
sendbuf:
recvbuf:

Recap: Writing High Performance Servers

Major issue: many socket/IO operations can cause processing to block, e.g.,
• accept: waiting for a new connection
• read on a socket: waiting for data or close
• write on a socket: waiting for buffer space
• disk I/O read/write: waiting for the operation to finish

Thus a crucial aspect of network server design is the concurrency design (non-blocking), both for high performance and to avoid denial of service

A technique to avoid blocking: threads

Multi-Threaded Web Server

Create ServerSocket(6789)
connSocket = accept()
Create thread for connSocket

Each thread then runs, for its own connSocket:
read request from connSocket
read local file
write file to connSocket
close connSocket
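The thread-per-connection flow above can be sketched in Java roughly as follows. This is a minimal illustration, not the course's server: the class name `ThreadedWebServer`, the `docRoot` parameter, and the bare-bones HTTP/1.0 replies are assumptions made for the example; only port 6789 and the accept/read/write/close steps come from the slide.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ThreadedWebServer {
    // Runs in its own thread: read request, read local file, write file, close.
    static void handle(Socket connSocket, Path docRoot) {
        try (Socket s = connSocket;
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
             OutputStream out = s.getOutputStream()) {
            String requestLine = in.readLine();   // e.g. "GET /index.html HTTP/1.0"
            if (requestLine == null) return;
            String[] parts = requestLine.split(" ");
            if (parts.length < 2 || !parts[1].startsWith("/")) return;
            Path file = docRoot.resolve(parts[1].substring(1)).normalize();
            // Refuse paths that escape the document root or are not regular files.
            if (!file.startsWith(docRoot) || !Files.isRegularFile(file)) {
                out.write("HTTP/1.0 404 Not Found\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                return;
            }
            byte[] body = Files.readAllBytes(file);
            String header = "HTTP/1.0 200 OK\r\nContent-Length: " + body.length + "\r\n\r\n";
            out.write(header.getBytes(StandardCharsets.US_ASCII));
            out.write(body);
        } catch (IOException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }

    // Accept loop: the main thread only ever blocks in accept().
    static void serve(ServerSocket listener, Path docRoot) throws IOException {
        while (true) {
            Socket connSocket = listener.accept();
            new Thread(() -> handle(connSocket, docRoot)).start();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the sketch: serve files from the current directory.
        Path docRoot = Paths.get(".").toAbsolutePath().normalize();
        serve(new ServerSocket(6789), docRoot);
    }
}
```

Because every connection gets a fresh thread with no upper bound, this sketch inherits the "too many threads" problem discussed next; a real server would likely cap concurrency with a thread pool.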

Recap: Writing High Performance Servers

Problems of multiple threads
• Too many threads: throughput meltdown, response time explosion

Event-Driven Programming

Event-driven programming, also called asynchronous i/o
• Tell the OS not to block when accepting/reading/writing on sockets
• Java: asynchronous i/o; for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/

Yields efficient and scalable concurrency
• Many examples: Click router, Flash web server, TP Monitors, etc.

Web Server

[Figure: the multi-threaded web server flow, repeated from the earlier slide]

If the OS will not block on sockets, what might the program structure look like?

Typical Structure of Async i/o

Typically, async i/o programs use Finite State Machines (FSMs) to monitor the progress of requests
• The state info keeps track of the execution stage of processing each request, e.g., reading request, writing reply, …

The program has a loop to check potential events at each state

Async I/O in Java

An important class is Selector, which supports the event loop
• A Selector is a multiplexer of selectable channel objects
• example channels: DatagramChannel, ServerSocketChannel, SocketChannel
• use configureBlocking(false) to make a channel non-blocking

A selector may be created by invoking the open method of this class

Async I/O in Java

A selectable channel registers its events of interest with a selector using the register method; the registration is represented by a SelectionKey

A SelectionKey object contains two operation sets
• interest set
• ready set

A SelectionKey object has an attachment which can store data
• often the attachment is a buffer

[Figure: a Selectable Channel registers with a Selector; the registration is a Selection Key]

Async I/O in Java

Call select() (or selectNow(), or select(long timeout)) to check for ready events; the keys whose channels are ready form the selected-key set

Iterate over the selected-key set to process all ready events
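To make the Selector workflow concrete, here is a minimal sketch of a single-threaded echo server built on it. The echo behaviour, the class name `SelectorEchoServer`, the 4096-byte buffer, and the two-state read/write machine are assumptions chosen for illustration; the `java.nio` calls themselves (`Selector.open`, `register`, `select`, `selectedKeys`, `interestOps`, `attachment`) are the standard ones named on the slides.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class SelectorEchoServer {
    // Event loop over an already-bound server channel.
    static void run(ServerSocketChannel server) throws IOException {
        Selector selector = Selector.open();
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            selector.select();                    // wait until at least one key is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();                      // handled keys must be removed by the caller
                if (!key.isValid()) continue;
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client == null) continue;
                    client.configureBlocking(false);
                    // The attached buffer is the per-connection state of the FSM.
                    client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                } else {
                    try {
                        step(key);
                    } catch (IOException e) {     // drop only the failing connection
                        key.cancel();
                        key.channel().close();
                    }
                }
            }
        }
    }

    // One FSM transition: READING (OP_READ) <-> WRITING (OP_WRITE).
    static void step(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (key.isReadable()) {
            int n = ch.read(buf);
            if (n < 0) { key.cancel(); ch.close(); return; }   // peer closed
            if (n > 0) { buf.flip(); key.interestOps(SelectionKey.OP_WRITE); }
        } else if (key.isWritable()) {
            ch.write(buf);                        // may write only part of the buffer
            if (!buf.hasRemaining()) { buf.clear(); key.interestOps(SelectionKey.OP_READ); }
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6789));
        run(server);
    }
}
```

Switching the interest set between OP_READ and OP_WRITE is what encodes the state of each connection here; a richer protocol would keep an explicit state field in the attachment instead of a bare buffer.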

Problems of Event-Driven Server

Difficult to engineer, modularize, and tune

No performance/failure isolation between Finite-State-Machines (FSMs)

FSM code must never block (but page faults, disk i/o, and garbage collection may still force a block), so multiple threads are still needed

Summary of Traditional C-S Web Servers

Is the application extensible, scalable, robust, secure?

[Figure: an app. server (C0), a DNS server, and clients 1 through n]

Content Distribution History...

“With 25 years of Internet experience, we’ve learned exactly one way to deal with the exponential growth: Caching”.

(1997, Van Jacobson)

Web Caches (Proxy)

Web caches/proxy placed at entrance of an ISP

Client sends all http requests to web cache
• if object at web cache, web cache immediately returns object in http response
• else requests object from origin server, then returns http response to client

[Figure: two clients exchange http requests/responses with a proxy server, which in turn exchanges http requests/responses with two origin servers]
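The hit/miss logic of a web cache can be sketched as below. This is a toy in-memory model, not a real proxy: `WebCache`, the `origin` function (a hypothetical stand-in for an HTTP fetch from the origin server), and the hit/miss counters are all assumptions for illustration, and it ignores expiry and freshness entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class WebCache {
    private final Map<String, String> store = new HashMap<>();
    // Hypothetical stand-in for "request object from origin server".
    private final Function<String, String> origin;
    int hits = 0;
    int misses = 0;

    WebCache(Function<String, String> origin) {
        this.origin = origin;
    }

    String get(String url) {
        String cached = store.get(url);
        if (cached != null) {          // object at web cache: return it immediately
            hits++;
            return cached;
        }
        misses++;                      // else fetch from origin, keep a copy, then return
        String body = origin.apply(url);
        store.put(url, body);
        return body;
    }
}
```

For instance, two requests for the same URL followed by one for a different URL would contact the origin twice and count one hit and two misses.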

Web Proxy/Cache

Web caches give good performance because very often
• a single client repeatedly accesses the same document
• a nearby client also accesses the same document

Cache hit ratio increases logarithmically with the number of users

[Figure: app. server (C0), two ISP caches, and clients 1 through 6]

Benefits of Web Caching

Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache “closer” to client
• decreased traffic to distant servers: the link out of the institutional/local ISP network is often the bottleneck

[Figure: origin servers on the public Internet; a 1.5 Mbps access link; an institutional network (10 Mbps LAN) with an institutional cache]

What went wrong with Web Caches?

Web protocols evolved extensively to accommodate caching, e.g. HTTP 1.1

However, Web caching was developed with a strong ISP perspective, leaving content providers out of the picture
• It is the ISP who places a cache and controls it
• An ISP's only interest in using Web caches is to reduce bandwidth
  • In the USA: bandwidth is relatively cheap
  • In Europe, there were many more Web caches
• However, ISPs can arbitrarily tune Web caches to deliver stale content

Content Provider Perspective

Content providers care about
• User experience latency
• Content freshness
• Accurate access statistics
• Avoiding flash crowds
• Minimizing bandwidth usage in their access link

Content Distribution Networks

Content Distribution Networks (CDNs) build an overlay network of caches to provide fast, cost-effective, and reliable content delivery, while working tightly with content providers.

Example: Akamai
• original and largest commercial CDN; operates over 25,000 servers in over 1,000 networks
• Akamai (AH kuh my) is Hawaiian for intelligent, clever and informally “cool”. Founded Apr 99, Boston MA by MIT students

Basics of Akamai Operation

Content provider server provides the base HTML document

Akamai caches embedded objects at a set of its cache servers (called edge servers)

Akamaization of embedded content: e.g.,
<IMG SRC=http://www.provider.com/image.gif>
changed to
<IMG SRC=http://a661.g.akamai.net/hash/image.gif>

Akamai customizes DNS to select serving edge servers based on
• closeness to client browser
• server load

More Akamai information

URL akamaization is becoming obsolete and only supported for legacy reasons
• Currently most content providers prefer to use DNS CNAME techniques to get all their content served from the Akamai servers
• content providers still need to run their origin servers

Akamai Evolution:
• Files/streaming
• Secure pages and whole pages
• Dynamic page assembly at the edge (ESI)
• Distributed applications

Lab: Problems of Traditional Content Distribution

[Figure: an app. server (C0), a DNS server, and clients 1 through n]

Objectives of P2P

Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness

Bypass DNS to find clients with resources!
• examples: instant messaging, Skype

[Figure: P2P peers connected across the Internet]

But P2P is not new

Original Internet was a p2p system:
• The original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
• No DNS or routing infrastructure, just connected by phone lines
• Computers also served as routers

P2P is simply an iteration of scalable distributed systems

P2P Systems

File Sharing: BitTorrent, LimeWire
Streaming: PPLive, PPStream, Zattoo, …
Research systems

Collaborative computing:
• SETI@Home project
• Human genome mapping
• Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations, saved about 500 million

Peer-to-Peer Computing
- 40-70% of total traffic in many networks
- upset the music industry, drawn college students, web developers, recording artists and universities into court

Source: ipoque Internet study 2008/2009

Recap: P2P Objectives

Bypass DNS to locate clients with resources!
• examples: instant messaging, Skype

Share the storage and bandwidth of individual clients to improve scalability/robustness

The Lookup Problem

[Figure: nodes N1 through N6 connected across the Internet; a Publisher holds Key=“title”, Value=MP3 data…; a Client issues Lookup(“title”) and does not know which node to ask]

Find where a particular file is stored; pay particular attention to how this problem parallels DNS

Outline

• Recap
• P2P
  • the lookup problem
  • Napster

Centralized Database: Napster

Program for sharing music over the Internet

History:
• 5/99: Shawn Fanning (freshman, Northeastern U.) founded Napster Online music service, wrote the program in 60 hours
• 12/99: first lawsuit
• 3/00: 25% of UWisc traffic is Napster
• 2000: est. 60M users
• 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
• 7/01: # simultaneous online users: Napster 160K
• 9/02: bankruptcy

We are referring to the Napster before closure. 03/2000

Napster: How Does it Work?

Application-level, client-server protocol over TCP

A centralized index system that maps files (songs) to machines that are alive and have the files

Steps:
• Connect to Napster server
• Upload your list of files (push) to server
• Give server keywords to search the full list
• Select “best” of hosts with answers
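The publish and search steps can be sketched as a central index that maps file names to the hosts currently offering them. This is only an illustrative model: the class `NapsterIndex`, its method names, and the exact-match search are assumptions for the example (the real service matched keywords), and no networking is shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NapsterIndex {
    // file name -> hosts that are alive and hold the file, in publish order
    private final Map<String, Set<String>> filesToHosts = new HashMap<>();

    // Publish: a peer uploads (pushes) its list of files to the server.
    void publish(String host, List<String> files) {
        for (String file : files) {
            filesToHosts.computeIfAbsent(file, k -> new LinkedHashSet<>()).add(host);
        }
    }

    // A peer went offline: drop it so searches only return live hosts.
    void remove(String host) {
        for (Set<String> hosts : filesToHosts.values()) {
            hosts.remove(host);
        }
    }

    // Search: exact-name lookup here; returns candidate hosts for the client to ping.
    List<String> search(String file) {
        Set<String> hosts = filesToHosts.get(file);
        return hosts == null ? new ArrayList<>() : new ArrayList<>(hosts);
    }
}
```

The client would then pick the “best” of the returned hosts, for example by pinging each one, and fetch the file from that peer directly rather than through the server.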

Napster Architecture

Napster: Publish
[Figure: peer 123.2.21.23 announces “I have X, Y, and Z!”; the server records insert(X, 123.2.21.23), ...]

Napster: Search
[Figure: a client asks “Where is file A?”; the server replies search(A) --> 123.2.0.18, 124.1.0.1]

Napster: Ping
[Figure: the client pings 123.2.0.18 and 124.1.0.1]

Napster: Fetch
[Figure: the client fetches the file directly from one of the peers 123.2.0.18 and 124.1.0.1]

Napster Messages

General Packet Format

[chunksize] [chunkinfo] [data...]

CHUNKSIZE: Intel-endian 16-bit integer, size of [data...] in bytes

CHUNKINFO: (hex) Intel-endian 16-bit integer.

00 - login rejected
02 - login requested
03 - login accepted
0D - challenge? (nuprin1715)
2D - added to hotlist
2E - browse error (user isn't online!)
2F - user offline
5B - whois query
5C - whois result
5D - whois: user is offline!
69 - list all channels
6A - channel info
90 - join channel
91 - leave channel
…..
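As an illustration of the framing above, the following sketch encodes and decodes a packet with two little-endian ("Intel-endian") 16-bit fields followed by the data. The class name `NapsterPacket` and its fields are assumptions for the example, and the payload used in any call is hypothetical since the slides do not specify message bodies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class NapsterPacket {
    final int type;      // CHUNKINFO, e.g. 0x5B for a whois query
    final byte[] data;   // [data...]

    NapsterPacket(int type, byte[] data) {
        if (type < 0 || type > 0xFFFF || data.length > 0xFFFF) {
            throw new IllegalArgumentException("type and size must fit in 16 bits");
        }
        this.type = type;
        this.data = data;
    }

    // Wire layout: [chunksize: u16 LE] [chunkinfo: u16 LE] [data...]
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) data.length);
        buf.putShort((short) type);
        buf.put(data);
        return buf.array();
    }

    static NapsterPacket decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN);
        int size = buf.getShort() & 0xFFFF;   // mask: Java shorts are signed
        int type = buf.getShort() & 0xFFFF;
        byte[] data = new byte[size];
        buf.get(data);                        // throws BufferUnderflowException if truncated
        return new NapsterPacket(type, data);
    }
}
```

Setting the byte order explicitly matters because `ByteBuffer` defaults to big-endian, which would swap the two bytes of each field.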

Centralized Database: Napster

Summary of features: a hybrid design
• control: client-server (aka special DNS) for files
• data: peer to peer

Advantages
• simplicity; easy to implement sophisticated search engines on top of the index system

Disadvantages
• application specific (compared with DNS)
• lack of robustness, scalability: the central search server is a single point of bottleneck/failure
• easy to sue!

Variation: BitTorrent

A global central index server is replaced by one tracker per file (called a swarm)
• reduces centralization, but needs other means to locate trackers

The bandwidth scalability management technique is more interesting
• more later

Outline

• Recap
• P2P
  • the lookup problem
  • Napster (central query server; distributed data servers)
  • Gnutella

Gnutella

On March 14th 2000, J. Frankel and T. Pepper from AOL’s Nullsoft division (also the developers of the popular Winamp mp3 player) released Gnutella

Within hours, AOL pulled the plug on it

Quickly reverse-engineered and soon many other clients became available: Bearshare, Morpheus, LimeWire, etc.

Decentralized Flooding: Gnutella

On startup, client contacts other servents (server + client) in network to form interconnection/peering relationships
• servent interconnection used to forward control (queries, hits, etc)

How to find a resource record: decentralized flooding
• send requests to neighbors
• neighbors recursively forward the requests

Decentralized Flooding

[Figure: overlay graph of servents S and A through N]

Decentralized Flooding

[Figure: overlay graph of servents S and A through N, annotated “send query to neighbors”]

Each node forwards the query to its neighbors other than the one that forwarded it the query

Background: Decentralized Flooding

[Figure: overlay graph of servents S and A through N]

Each node should keep track of forwarded queries to avoid loops! Two options:
• nodes keep state (which will time out: soft state)
• carry the state in the query, i.e. carry a list of visited nodes

Decentralized Flooding: Gnutella

Basic message header
• Unique ID, TTL, Hops

Message types
• Ping – probes network for other servents
• Pong – response to ping, contains IP addr, # of files, etc.
• Query – search criteria + speed requirement of servent
• QueryHit – successful response to Query, contains addr + port to transfer from, speed of servent, etc.

Ping and Query messages are flooded
QueryHit, Pong: follow the reverse path of the previous message

Advantages and Disadvantages of Gnutella

Advantages:
• totally decentralized, highly robust

Disadvantages:
• not scalable; the entire network can be swamped with flood requests
  • especially hard on slow clients; at some point broadcast traffic on Gnutella exceeded 56 kbps
• to alleviate this problem, each request has a TTL to limit the scope
  • each query has an initial TTL, and each node forwarding it reduces it by one; if TTL reaches 0, the query is dropped (consequence?)
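The effect of TTL-limited flooding with duplicate suppression can be explored with a small simulation. This is a sketch under simplifying assumptions, not Gnutella itself: it walks the overlay level by level instead of exchanging messages, a single shared `seen` set stands in for the per-node soft state keyed by query ID, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FloodSearch {
    // Returns every node that sees a query sent by `source` with the given initial TTL.
    static Set<String> flood(Map<String, List<String>> neighbors, String source, int ttl) {
        Set<String> seen = new LinkedHashSet<>();   // nodes that already handled this query
        seen.add(source);
        List<String> frontier = new ArrayList<>();
        frontier.add(source);
        // Each round is one hop; the TTL drops by one per hop and the query dies at 0.
        for (int remaining = ttl; remaining > 0 && !frontier.isEmpty(); remaining--) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String nb : neighbors.getOrDefault(node, Collections.emptyList())) {
                    if (seen.add(nb)) {             // false means a duplicate: do not re-forward
                        next.add(nb);
                    }
                }
            }
            frontier = next;
        }
        return seen;
    }
}
```

One way to read the "(consequence?)" prompt with this model: a node holding the file but more than TTL hops from the source never appears in the returned set, so the search can fail even though the file exists in the network.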

Flooding: FastTrack (aka Kazaa)

Modifies the Gnutella protocol into a two-level hierarchy

Supernodes
• Nodes that have better connection to Internet
• Act as temporary indexing servers for other nodes
• Help improve the stability of the network

Standard nodes
• Connect to supernodes and report list of files

Search
• Broadcast (Gnutella-style) search across supernodes

Disadvantages
• Kept a centralized registration, prone to lawsuits

Optional Slides

Aside: Search Time?

Aside: All Peers Equal?

[Figure: peers with unequal access links: 56 kbps modems, 1.5 Mbps DSL, and a 10 Mbps LAN]

Aside: Network Resilience

[Figure: three panels: Partial Topology, Random 30% die, Targeted 4% die; from Saroiu et al., MMCN 2002]

Asynchronous Network Programming (C/C++)

A Relay TCP Client: telnet-like Program

[Figure: a TCP client between the terminal and a TCP server, labelled with fgets, writen, readn, and fputs]

http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpclient

Method 1: Process and Thread

Process
• fork()
• waitpid()

Thread: lightweight process
• pthread_create()
• pthread_exit()

pthread

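// ss, sockfd, debug, MAXLINE, and err_quit are assumed to be defined
// elsewhere in the course example; they are not shown on the slide.
void *copy_to(void *arg);

int main() {
    char recvline[MAXLINE + 1];
    ss = new socketstream(sockfd);

    // A second thread copies stdin to the socket while main copies socket to stdout.
    pthread_t tid;
    if (pthread_create(&tid, NULL, copy_to, NULL)) {
        err_quit("pthread_create()");
    }

    while (ss->read_line(recvline, MAXLINE) > 0) {
        fprintf(stdout, "%s\n", recvline);
    }
    return 0;
}

void *copy_to(void *arg) {
    char sendline[MAXLINE];

    if (debug) cout << "Thread create()!" << endl;
    while (fgets(sendline, sizeof(sendline), stdin))
        ss->writen_socket(sendline, strlen(sendline));

    shutdown(sockfd, SHUT_WR);   // half-close: no more data to send, but keep reading
    if (debug) cout << "Thread done!" << endl;

    pthread_exit(0);
}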

Method 2: Asynchronous I/O (Select)

select: deal with blocking system calls

int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

FD_CLR(int fd, fd_set *set);
FD_ZERO(fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);

Method 3: Signal and Select

signal: events such as timeout

Examples of Network Programming

Library to make life easier

Four design examples
• TCP Client
• TCP server using select
• TCP server using process and thread
• Reliable UDP

Warning: It will be hard to listen to me reading through the code. Read the code.

Example 2: A Concurrent TCP Server Using Process or Thread

• Get a line, and echo it back
• Use select()
• For how to use process or thread, see later
• Check the code at: http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpserver
• Are there potential denial of service problems with the code?

Example 3: A Concurrent HTTP TCP Server Using Process/Thread

• Use process-per-request or thread-per-request
• Check the code at: http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/simple_httpd