Introduction to Content-aware Switch
Presented by Li Zhao
Content-aware Switch (CS)
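[Figure: a request for www.yahoo.com arrives from the Internet at the switch, which fronts an image server, an application server, and an HTML server. The request packet carries IP, TCP, and APP. DATA layers; the application data is "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…"]
• Front-end of a web cluster
• Route packets based on layer 5/7 (content) information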
Why use CS
• Servers can be specialized for certain types of request
– Content segregation
• Exploit locality
– Affinity-based routing
– Increase the performance because of the improved hit rate
• Partial replication of server file set
– Partition the server’s file set over different nodes
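The two routing ideas above, content segregation and affinity-based routing, can be illustrated with a minimal sketch. The pool names, the classification rules, and the CRC-based hash are assumptions made for the example; the slides do not specify how a real switch classifies requests or picks a node.

```python
import zlib

# Hypothetical back-end pools, named after the server types in the diagram.
POOLS = {
    "image": ["img-1", "img-2"],
    "application": ["app-1", "app-2"],
    "html": ["html-1", "html-2"],
}


def parse_request_line(request: bytes) -> tuple[str, str]:
    """Return (method, path) from the first line of an HTTP request."""
    first_line = request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = first_line.split(" ", 2)
    return method, path


def classify(path: str) -> str:
    """Content segregation: map a URL path to a server type (illustrative rules)."""
    if path.startswith("/cgi-bin/"):
        return "application"
    if path.lower().endswith((".gif", ".jpg", ".png")):
        return "image"
    return "html"


def route(request: bytes) -> str:
    """Pick a back-end: segregate by content type, then keep affinity by URL."""
    _method, path = parse_request_line(request)
    pool = POOLS[classify(path)]
    # Affinity: the same URL always hashes to the same node in its pool,
    # so repeated requests are likely to hit that node's cache.
    return pool[zlib.crc32(path.encode("ascii")) % len(pool)]


# Example request modeled on the one in the diagram (other headers omitted).
request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
print(route(request))
```

Hashing the URL also gives partial replication for free: each document is served by one node of its pool, so the file set is partitioned rather than copied everywhere.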
Content-aware Switch Architecture
• Two-way architecture: the server returns the response to the switch
• One-way architecture: the server returns the response to the client
[Figure: client, switch, server]
[Valeria01]
Layer-7 Two-way Mechanisms
• TCP gateway: an application-level proxy running on the web switch mediates the communication between the client and the server
• TCP splicing: reduces the overhead of the TCP gateway. Packet forwarding occurs at the network level, between the network interface driver and the TCP/IP stack, and is carried out directly by the OS kernel
[Figure labels: kernel, user]
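TCP Splicing
[Sequence diagram: client, content switch, server]
step1  client -> switch: SYN(CSEQ)
step2  switch -> client: SYN(DSEQ) ACK(CSEQ+1)
step3  client -> switch: DATA(CSEQ+1) ACK(DSEQ+1)
step4  switch -> server: SYN(CSEQ)
step5  server -> switch: SYN(SSEQ) ACK(CSEQ+1)
step6  switch -> server: DATA(CSEQ+1) ACK(SSEQ+1)
step7  server -> switch: DATA(SSEQ+1) ACK(CSEQ+lenR+1), relayed switch -> client as DATA(DSEQ+1) ACK(CSEQ+lenR+1)
step8  client -> switch: ACK(DSEQ+lenD+1), relayed switch -> server as ACK(SSEQ+lenD+1)
lenR: size of HTTP request. lenD: size of returned document.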
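The sequence-number rewriting implied by the splicing diagram can be sketched as follows. This is a minimal illustration, assuming the switch keeps one fixed offset per spliced connection; the function names and the sample numbers are hypothetical. Because the switch opens the server-side connection with the client's own CSEQ, only the server's numbering (SSEQ) has to be mapped onto the numbering the client saw from the switch (DSEQ).

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def make_splice(dseq: int, sseq: int):
    """Return the two rewrite functions for one spliced connection.

    dseq: initial sequence number the switch sent to the client (DSEQ)
    sseq: initial sequence number the server sent to the switch (SSEQ)
    """
    offset = (dseq - sseq) % MOD

    def server_to_client(seq: int) -> int:
        # Server data numbered from SSEQ must look as if numbered from DSEQ.
        return (seq + offset) % MOD

    def client_to_server(ack: int) -> int:
        # Client ACKs refer to DSEQ numbering; map them back to SSEQ.
        return (ack - offset) % MOD

    return server_to_client, client_to_server


# Hypothetical values, chosen so that the arithmetic wraps past 2**32.
DSEQ, SSEQ, len_d = 4_294_967_000, 1_000, 500
to_client, to_server = make_splice(DSEQ, SSEQ)

# DATA(SSEQ+1) from the server leaves the switch as DATA(DSEQ+1).
assert to_client(SSEQ + 1) == (DSEQ + 1) % MOD
# ACK(DSEQ+lenD+1) from the client reaches the server as ACK(SSEQ+lenD+1).
assert to_server((DSEQ + len_d + 1) % MOD) == SSEQ + len_d + 1
```

A real implementation would also have to update the TCP checksum after rewriting the header, which this sketch leaves out.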
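TCP Splicing w/ Pre-forked Connections
[Sequence diagram: client, switch, server; steps labeled step1 to step9]
Pre-forked connection between switch and server:
  switch -> server: SYN(PSEQ)
  server -> switch: SYN(SSEQ) ACK(PSEQ+1)
  switch -> server: ACK(SSEQ+1)
Client connection:
  client -> switch: SYN(CSEQ)
  switch -> client: SYN(DSEQ) ACK(CSEQ+1)
  client -> switch: DATA(CSEQ+1) ACK(DSEQ+1)
Request and response over the pre-forked connection:
  switch -> server: DATA(PSEQ+1) ACK(SSEQ+1)
  server -> switch: DATA(SSEQ+1) ACK(PSEQ+lenR+1)
  switch -> client: DATA(DSEQ+1) ACK(CSEQ+lenR+1)
  client -> switch: ACK(DSEQ+lenD+1)
  switch -> server: ACK(SSEQ+lenD+1)
lenR: size of HTTP request. lenD: size of returned document.
Ref [Yang99]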
Pre-Allocate Server Scheme
[Sequence diagram: client, content switch, pre-allocated server]
step1  client -> switch -> server: SYN(CSEQ)
step2  server -> switch -> client: SYN(SSEQ) ACK(CSEQ+1)
step3  client -> switch -> server: DATA(CSEQ+1) ACK(SSEQ+1)
step4  server -> switch -> client: DATA(SSEQ+1) ACK(CSEQ+lenR+1)
step5  client -> switch -> server: ACK(SSEQ+lenD+1)
• Use a guess routing decision based on IP/Port#/History
• Advantage:
• Faster than TCP splicing.
• Reduce session processing overhead: no need to convert server sequence #
Ref [Edward]
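The guess-then-verify idea can be sketched in a few lines. This is only an illustration under stated assumptions: the history table keyed by client IP, the default server, and the `choose_by_content` callback are all hypothetical, since the slides say only that the guess uses IP, port number, and history.

```python
def guess_server(client_ip: str, history: dict[str, str], default: str) -> str:
    """Guess a back-end before the request is seen, from per-client history."""
    return history.get(client_ip, default)


def handle(client_ip, path, history, choose_by_content, default="html-1"):
    """Return (server, mode): 'pass-through' if the guess held, else 'splice'."""
    guessed = guess_server(client_ip, history, default)
    actual = choose_by_content(path)  # known only once the request arrives
    history[client_ip] = actual       # remember for the next connection
    mode = "pass-through" if guessed == actual else "splice"
    return actual, mode


# Hypothetical content rule: CGI requests go to an application server.
def by_content(path: str) -> str:
    return "app-1" if path.startswith("/cgi-bin/") else "html-1"


history: dict[str, str] = {}
first = handle("10.0.0.1", "/cgi-bin/form", history, by_content)
second = handle("10.0.0.1", "/cgi-bin/form", history, by_content)
print(first, second)
```

When the guess holds, packets pass through with the server's own sequence numbers; when it fails, the switch has to fall back to splicing toward the right server.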
Degenerated to TCP Splicing If Guess Wrong
[Sequence diagram: client, content switch, pre-allocated server, right server]
Client and pre-allocated server, through the switch:
  step1  SYN(CSEQ)
  step2  SYN(SSEQ) ACK(CSEQ+1)
  step3  DATA(CSEQ+1) ACK(SSEQ+1)
Switch to pre-allocated server: FIN(CSEQ+1)
Switch and right server:
  step4  SYN(CSEQ)
  step5  SYN(RSEQ) ACK(CSEQ+1)
  step6  DATA(CSEQ+1) ACK(SSEQ+1)
Response:
  right server -> switch: DATA(RSEQ+1) ACK(CSEQ+lenR+1)
  switch -> client: DATA(SSEQ+1) ACK(CSEQ+lenR+1)
  ACK(DSEQ+lenD+1) ACK(SSEQ+lenD+1)
Sequence # conversion needed
Results
• Overhead of the switch
• 89 usec reduced with pre-forked connections
• CS vs. Layer 4 switch
• Affinity-based routing vs. WRR
• Content-segregation vs. WRR
• CGI: 27%
• Static: 36%
IBM Switch Architecture
• Switch core
• Port controller:
– Identify packets (layer 5) and send them to CPU
– Process all other packets
• CPU: PowerPC 603e
– Parse HTTP request
– URL-based routing
Ref [Pradhan00]
Results
• CS vs. Layer 4 switch
– Entire set of files is replicated
– Some servers share files by NFS
– Partitioned file set
Layer-7 one-way mechanisms
• TCP handoff: the switch hands off the TCP connection endpoint to the server
• TCP connection hop
– Software-based proprietary solution
– Encapsulates the IP packet in an RPX packet and sends it to the server
TCP Handoff
[Sequence diagram: client, content switch, server]
step1  client -> switch: SYN(CSEQ)
step2  switch -> client: SYN(DSEQ) ACK(CSEQ+1)
step3  client -> switch: DATA(CSEQ+1) ACK(DSEQ+1)
step4  switch -> server: Migrate(Data, CSEQ, DSEQ)
step5  server -> client: DATA(DSEQ+1) ACK(CSEQ+lenR+1)
step6  client -> switch -> server: ACK(DSEQ+lenD+1)
• Migrate the created TCP connection from the switch to the back-end server
– Create a TCP connection at the back-end without going through the TCP three-way handshake
– Retrieve the state of an established connection and destroy the connection without going through the normal message handshake required to close a TCP connection
• Once the connection is handed off to the back-end server, the switch must forward packets from the client to the appropriate back-end server [Pai98]
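What the back-end does with a Migrate(Data, CSEQ, DSEQ) message can be sketched as below. The class and field names are illustrative assumptions, not the handoff protocol of [Pai98]; the point is only that the server can derive an established connection's counters from the migrated values, without a three-way handshake.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class HandoffState:
    """State carried by Migrate(Data, CSEQ, DSEQ); field names are illustrative."""
    data: bytes  # the HTTP request already received by the switch
    cseq: int    # client's initial sequence number
    dseq: int    # initial sequence number the switch chose


@dataclass
class Connection:
    snd_nxt: int    # next sequence number the server will send
    rcv_nxt: int    # next sequence number expected from the client
    pending: bytes  # request bytes waiting for the server application


def accept_handoff(state: HandoffState) -> Connection:
    """Build an established connection at the back-end without a handshake."""
    return Connection(
        snd_nxt=(state.dseq + 1) % MOD,
        rcv_nxt=(state.cseq + 1 + len(state.data)) % MOD,
        pending=state.data,
    )


# Hypothetical sequence numbers for the example.
state = HandoffState(b"GET /cgi-bin/form HTTP/1.1\r\n\r\n", cseq=100, dseq=5000)
conn = accept_handoff(state)
print(conn.snd_nxt, conn.rcv_nxt)
```

With these counters the server's first segment is DATA(DSEQ+1) ACK(CSEQ+lenR+1), where lenR is the length of the migrated request, so the client cannot tell that the endpoint has moved.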
TCP Handoff
(1) a client connects to the front-end
(2) the dispatcher at the front-end accepts the connection and hands it off to a back-end server using the handoff protocol
(3) the back-end takes over the established connection received by the handoff protocol
(4) the server at the back-end accepts the created connection
(5) the server at the back-end sends replies directly to the client
Scalable Cluster Design
[Figure label: Switch]
• Dispatcher component
– Implement the request distribution: decide which server should handle the request
– 0.8usec
• Distributor component
– Distribute the client requests to the server (handoff or splicing)
– 300usec for handoff, >750usec for splicing
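A quick calculation shows why the design can keep a single dispatcher while replicating distributors. It assumes, and this is an interpretation rather than something the slide states, that each figure is the CPU time one component spends per request and that requests are processed one at a time.

```python
def max_rate(cost_usec: float) -> float:
    """Upper bound on requests per second for a given per-request cost."""
    return 1_000_000 / cost_usec


dispatcher = max_rate(0.8)  # decision only
handoff = max_rate(300)     # distributor using TCP handoff
splicing = max_rate(750)    # distributor using splicing (cost is >750usec, so a ceiling)

# How many handoff distributors one dispatcher could keep busy.
distributors_per_dispatcher = dispatcher / handoff

print(round(dispatcher), round(handoff), round(splicing), round(distributors_per_dispatcher))
```

Under these assumptions the distributor is the bottleneck by more than two orders of magnitude, which is consistent with spreading the distributor work over several nodes.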
Cluster Operation
(1) the layer 4 switch receives a SYN packet and chooses the least loaded distributor
(2) the distributor accepts the TCP connection and parses the client request
(3) the distributor contacts the dispatcher for the assignment of the request to a server
(4) the distributor hands off the connection to the server using the TCP handoff protocol
(5) the server takes over the connection using its handoff protocol
(6) the server application at the server node accepts the connection
(7) the server sends the response directly to the client
(8) (not shown) the switch forwards TCP acknowledgments to the corresponding server
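The control flow of steps (1) to (4) can be sketched as a small simulation. The class names, the connection-count load metric, and the dispatcher's assignment policy are assumptions for illustration; [Aron00] describes the real policies.

```python
from dataclasses import dataclass, field


@dataclass
class Distributor:
    name: str
    load: int = 0  # open connections; a stand-in for the real load metric


@dataclass
class Dispatcher:
    """Central component that assigns each request to a server."""
    servers: list[str]
    assigned: dict[str, str] = field(default_factory=dict)

    def assign(self, path: str) -> str:
        # Illustrative policy: remember the server first given to each URL,
        # spreading new URLs round-robin over the servers.
        if path not in self.assigned:
            self.assigned[path] = self.servers[len(self.assigned) % len(self.servers)]
        return self.assigned[path]


def handle_connection(request: bytes, distributors, dispatcher):
    # (1) the layer 4 switch picks the least loaded distributor
    distributor = min(distributors, key=lambda d: d.load)
    distributor.load += 1
    # (2) the distributor accepts the connection and parses the request
    path = request.split(b" ", 2)[1].decode("ascii")
    # (3) the distributor asks the dispatcher which server should handle it
    server = dispatcher.assign(path)
    # (4) the connection would now be handed off to that server
    return distributor.name, server


distributors = [Distributor("d1"), Distributor("d2")]
dispatcher = Dispatcher(["s1", "s2"])
first = handle_connection(b"GET /a HTTP/1.1\r\n\r\n", distributors, dispatcher)
second = handle_connection(b"GET /b HTTP/1.1\r\n\r\n", distributors, dispatcher)
third = handle_connection(b"GET /a HTTP/1.1\r\n\r\n", distributors, dispatcher)
print(first, second, third)
```

Connections are spread over distributors by load alone, while the choice of server depends only on the dispatcher, so a repeated URL reaches the same server regardless of which distributor parsed it.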
Results
• The proposed cluster architecture scales far better than the one with a single front-end node.
Our Current Research on CS
[Figure: block diagram with Host CPU, MAC, IX Bus, PCI Bus, StrongARM, and six microengines (ME)]
• IXP 1200
• StrongARM @ 233MHz
• Microengines (6)
• IXP 2400
• XScale @ 700MHz
• Microengines (8)
References
• [Pradhan00] G. Apostolopoulos, et al., Design, Implementation and Performance of a Content-Based Switch, Proceedings of IEEE INFOCOM 2000
• [Pai98] V.S. Pai, et al., Locality-Aware Request Distribution in Cluster-based Network Servers, Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, Oct. 1998
• [Aron00] Mohit Aron, et al., Scalable Content-aware Request Distribution in Cluster-based Network Servers, Proc. of the 2000 Annual USENIX Technical Conference, June 2000
• [Edward] C. Edward Chow, Introduction to content switch
• [Valeria01] Valeria Cardellini, et al., The State of the Art in Locally Distributed Web-server Systems, IBM research report
• [Yang99] Chu-Sing Yang, et al., Efficient support for content-based routing in web server clusters, Proc. of USITS '99