Cheetah-UTK collaboration
Outline
• Potential topics for collaboration
• Cheetah goals with implications on network design
• CDN applications
• Collaboration with HOPI
• Three-domain network: UTK, Cheetah, HOPI
  • Data plane
  • Control plane
Tao Li & Malathi Veeraraghavan, University of Virginia
May 3, 2007
Potential topics for UTK-Cheetah collaboration
• Applications
  • CDN
  • DVTS
• Control-plane
  • Internetworking with Cheetah control-plane solution
• Virtualizer
  • For Force10 switches: needed to enable HOPI to serve as a testbed for simultaneous networking experiments
Proposed Three-Domain Network (UTK, Cheetah, HOPI)
[Figure: three-domain network diagram (UTK, Cheetah, HOPI). Labels recoverable from the slide: SN16k switches; Force10 switches labeled SLR, UTK ORNL, UTK Humanities, UTK SERF, Gloriad, and HOPI Washington, NYC, Chicago, Seattle and LA; site labels ATL, SLR and ORNL; hosts Zelda1/2/3, Zelda4/5, Wukong/Wuneng, UTK servers, CHEETAH PCs and PC3s. Link labels: GbE, NxGbE, 10GbE, OC192, GMPLS UNI.]
Cheetah goals
• Original goals:
  • Support eScience applications, primarily the Terascale Supernova Initiative (TSI)
  • Connect scientist's cluster at NCSU to ORNL Cray
  • TSI project is now complete
• 2007 refocus:
  • Design and demonstrate use of an internetworking architecture consisting of a core circuit-switched network, with control-plane support for dynamic call-by-call bandwidth sharing, and connectionless packet-switched regional/enterprise networks
  • Quantifiable benefits to applications
Motivation
• Bottom-up:
  • Circuit switches, being position-based, are cheaper for higher-rate interfaces and larger switching capacities than packet switches
  • Core network switches need higher link rates and switching capacities
  • Therefore, use circuit switches in the core
Motivation contd.
• Today’s Internet & Internet2 do use circuit switches in the core
  • e.g., OC192 links between Abilene routers traverse a SONET circuit-switched network
• However, circuits are provisioned: leased lines held for long durations (years); PVCs in ATM lingo
• For a network of circuit switches to qualify as a "network" as opposed to a set of "wires," bandwidth sharing must be implemented:
  • Control-plane call-by-call sharing
  • SVCs in ATM lingo
Top-down motivation
• Are there any applications for a "core" circuit-switched network with SVC capability?
• Without these, it is a technology-driven solution without a problem
  • Technology: control-plane protocols
  • Data-plane aspect of circuit switches already in use in the form of PVCs
Applications for circuit-switched networks with SVC capability
• eScience applications:
  • focus is on providing high bandwidth per call
  • circuit/virtual circuit (VC) capability required enterprise-to-enterprise, not just in the core network
• General file-transfer applications:
  • need to focus on mode of bandwidth sharing due to scale
  • request and obtain dedicated bandwidth and use it temporarily
  • in contrast with TCP bandwidth sharing on connectionless packet-switched networks, where the amount of bandwidth a flow receives can vary within a flow's duration
  • has value even if the circuit/VC spans just the core network
Internetworking architecture for connectionless packet-switched enterprise/regional networks and core circuit-switched network with SVC capability
[Figure: Host, packet switches (enterprise/regional network, "e.g. roadways network"), Gateway ("e.g. airport"), circuit switches (core network, "e.g. airlines network"), Gateway ("e.g. airport"), packet switches (enterprise/regional network, "e.g. roadways network"), Host. Annotation: call ahead before sending a flow that needs its own circuit.]
• Key point: gateways need to be connection-oriented packet switches
• Analogy: an airline passenger calls ahead and makes a reservation for a flight before driving on the roadways network to reach the airport (gateway)
Explanation
• If gateways are IP routers, and they are operated in connectionless mode (which means no "connection setup" phase prior to data transfer), there is no simple solution to trigger SVC setup and teardown
  • Hence PVC usage of core circuit-switched network
• IP-over-ATM efforts suggested automatic flow classification: if X packets for a flow are detected in Y sec, assume it is a long-lived flow and initiate SVC setup
• Didn't work. Why?
  • Technology problem: guessing game to identify long-lived flows
  • More importantly: no value. End-to-end bandwidth sharing is still TCP based
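The "X packets in Y sec" rule above can be made concrete with a short sliding-window sketch. This is only an illustration of the classification heuristic the slide describes, not code from any IP-over-ATM product; the class name `FlowClassifier` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class FlowClassifier:
    """Flag a flow as long-lived once X packets are seen within Y seconds.

    Hypothetical helper: flow_id can be any hashable key, e.g. a 5-tuple.
    """

    def __init__(self, x_packets, y_seconds):
        self.x_packets = x_packets
        self.y_seconds = y_seconds
        self._arrivals = defaultdict(deque)  # flow_id -> recent timestamps

    def observe(self, flow_id, timestamp):
        """Record one packet; return True if SVC setup would be triggered."""
        window = self._arrivals[flow_id]
        window.append(timestamp)
        # Discard arrivals that fell out of the Y-second window.
        while window and timestamp - window[0] > self.y_seconds:
            window.popleft()
        return len(window) >= self.x_packets


classifier = FlowClassifier(x_packets=3, y_seconds=1.0)
for t in (0.0, 0.2, 0.4):
    triggered = classifier.observe("flow-a", t)
print(triggered)
```

The sketch also shows why this is a guessing game: a short burst of X packets triggers setup just as readily as the start of a genuinely long transfer.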
What is a "connection-oriented" gateway?
• Example: squid web proxy (caching) server
• When an http request arrives, think of it as a data-plane packet with an implicit signaling call setup request
• If the web proxy's secondary NIC (into the circuit-switched network) is tied up in circuits to other web proxy servers distinct from the one identified as best parent for this request, the call is effectively rejected
• Falls back to primary NIC path and uses connectionless IP path between squid servers
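The accept-or-fall-back behavior described above can be sketched as a small decision function. This is a simplified illustration, assuming the secondary NIC holds at most one circuit at a time; `SecondaryNic`, `choose_path` and the proxy names are hypothetical and not part of squid.

```python
class SecondaryNic:
    """Tracks which parent proxy, if any, the circuit NIC is connected to."""

    def __init__(self):
        self.connected_parent = None  # None means the NIC is free

    def choose_path(self, best_parent):
        """Return "circuit" if the request can use a circuit, else "ip"."""
        if self.connected_parent is None:
            # The http request acts as an implicit call setup request.
            self.connected_parent = best_parent
            return "circuit"
        if self.connected_parent == best_parent:
            # A circuit to this parent already exists; reuse it.
            return "circuit"
        # NIC is tied up with a different parent: call effectively rejected,
        # so fall back to the connectionless IP path on the primary NIC.
        return "ip"

    def release(self):
        self.connected_parent = None


nic = SecondaryNic()
print(nic.choose_path("proxy-a"), nic.choose_path("proxy-b"))
```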
Is there any value?
• Clear value to the service provider
  • Cheaper circuit switches at higher rates imply cheaper core-network connectivity services
  • Pass some of the savings to users
• Value to the user:
  • Different mode of bandwidth sharing
  • Allows the user to pay for and obtain differentiated services on a per-flow basis
  • Pricing model can capture temporal fairness
Implications of goals on network design
• Data-plane:
  • Circuit granularity should be moderate/small, e.g., OC1
• Control-plane:
  • Low call setup delay: if call setup delay is 1 sec, holding time should be at least n times this number, e.g., n=10
  • Call handling throughput should be high
    • e.g., 160Gbps switching capacity; per-call rate: 300Mbps; number of concurrent calls = 533
    • if call holding time = 10 sec, switch controller should handle 53 calls/sec
  • Distributed call controller
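The arithmetic behind the 533 concurrent calls and 53 calls/sec figures can be reproduced with a short sketch. The function names are illustrative only; the calculation assumes a fully loaded switch in steady state, where the call arrival rate equals the number of concurrent calls divided by the holding time.

```python
def call_handling_rate(switch_capacity_bps, per_call_rate_bps, holding_time_s):
    """Return (concurrent_calls, calls_per_second) for a fully loaded switch."""
    concurrent_calls = int(switch_capacity_bps // per_call_rate_bps)
    # In steady state, each call slot turns over once per holding time.
    calls_per_second = concurrent_calls / holding_time_s
    return concurrent_calls, calls_per_second


def min_holding_time(setup_delay_s, n=10):
    """Holding time should be at least n times the call setup delay."""
    return n * setup_delay_s


concurrent, rate = call_handling_rate(160e9, 300e6, 10)
print(concurrent, rate, min_holding_time(1.0))
```

With the slide's numbers this gives 533 concurrent calls and about 53 calls/sec for the switch controller, and a minimum holding time of 10 sec for a 1-sec setup delay.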
Control-plane implications of goals
• General-purpose file transfer applications
  • Moderate bandwidth per call
  • Short call holding times
  • Control-plane solution only needs to support immediate-request mode of bandwidth sharing
• eScience applications
  • High bandwidth per call
  • Long (hours) holding time
  • Control-plane solution should support book-ahead mode of bandwidth sharing
Book-ahead (BA) vs. immediate request (IR) modes of bandwidth sharing
• Call blocking probability is low when m is high; IR mode works
• Mean waiting time is proportional to mean call holding time
  • Can afford to use IR mode when m is small if calls are short
  • Implement some form of call queueing
• But if circuit rates are high and holding times are large, need to support book-ahead mode
• m is the link capacity expressed in channels, e.g., if 1Gbps circuits are assigned on a 10Gbps link, m = 10

                                 | Short calls (bank teller) | Long calls (doctor's office)
Large m (moderate-rate circuits) | immediate-request         | immediate-request
Small m (high-rate circuits)     | immediate-request         | book-ahead
Bandwidth sharing modes
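To illustrate why blocking is low when m is high, the sketch below evaluates the Erlang B formula at the same 50% utilization for a small and a large m. Using Erlang B is an assumption made here for illustration (Poisson arrivals, blocked calls cleared); the slides do not state which blocking model was used. The function names are hypothetical.

```python
def channels(link_rate_bps, circuit_rate_bps):
    """m: link capacity expressed in channels."""
    return int(link_rate_bps // circuit_rate_bps)


def erlang_b(offered_load_erlangs, m):
    """Blocking probability for m channels (Erlang B, stable recursion)."""
    b = 1.0  # blocking with zero channels
    for k in range(1, m + 1):
        b = (offered_load_erlangs * b) / (k + offered_load_erlangs * b)
    return b


m_small = channels(10e9, 1e9)    # 1Gbps circuits on a 10Gbps link
m_large = channels(10e9, 100e6)  # 100Mbps circuits on the same link
# Same 50% utilization in both cases: offered load = m / 2 erlangs.
print(m_small, erlang_b(m_small / 2, m_small))
print(m_large, erlang_b(m_large / 2, m_large))
```

At equal utilization, blocking with m = 10 is on the order of a couple of percent, while with m = 100 it is negligible, which is consistent with the claim that IR mode works when m is high.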
Examples representative of the two classes of applications
Metric                      | Video telephony (IR application) | Video conferencing (BA application)
Call arrival rate           | High (many endpoints)            | Low (one or two endpoints per enterprise)
Call holding time           | Short (~3 mins)                  | Long (~1 hr)
Bandwidth required per call | Low                              | Higher
Mode of request             | Immediate usage                  | Can plan ahead and reserve bandwidth
Outline check
• Potential topics for collaboration
• Cheetah goals with implications on network design
• CDN applications
• Collaboration with HOPI
• Three-domain network
  • Data plane
  • Control plane
Open-source CDN software
• OpenCDN
• Globule CDN
• CoralCDN
• CoDeeN
OpenCDN
• OpenCDN is an application-level content delivery network (CDN), suitable for live and recorded multimedia content distribution
• Creates an application-level multicast-relay tree because IP multicast routing is not widely enabled in routers
• Architecture:
  • Data-plane: a tree of streaming servers forming a chain from origin server to client; new clients join the tree at the closest server
  • Control-plane: Request Routing and Distribution Manager (RRDM)
    • origin and relay nodes register their streaming protocol capabilities
    • relay nodes register the IP address space they are willing to serve
• Currently supported streaming servers:
  • Darwin Streaming Server by Apple
  • Helix Universal Server by Real
Architecture
• Portal: Web site that shows metadata for the content that is stored on origin servers and available for streaming
• Nodes: relay servers; each runs streaming clients on one side and servers on the other side
Fit of OpenCDN with Cheetah
• Is Cheetah appropriate to carry concatenated streams of data with relay nodes located at the Cheetah PoPs?
  • Circuit granularity: OC1 too high for one stream
    • Could set up one OC1 between two relay nodes and carry multiple streams across this OC1
  • Holding time: could be large; for popular TV stations, if held "all-the-time," leased circuits are better
• Alternative solution:
  • Use file transfers to copy video files between origin nodes and CDN servers at Cheetah PoPs
  • Add streaming servers to the CDN servers for serving local clients
  • Keep the RRDM registration and software for identifying ideal relay node for client
[Figure labels: Relay, CDN server]
Globule CDN
• Globule is a collaborative content delivery network, where content providers (origins) share each other’s server resources (an enterprise uses another enterprise's servers as replicas), in contrast to using a commercial CDN service
• Each origin server maintains some backup servers and some replica servers
• Clients are redirected to their closest replica servers using a redirector (http or DNS)
• Need to upgrade Apache Web Server with a Globule module
Fit of Globule CDN with Cheetah
• Deploy replica servers at Cheetah PoPs
• Use Cheetah for copying files
  • between replica servers, if the origin server is on an enterprise/regional network, after the initial copy from origin server to closest replica server
  • between origin server and replicas, if the origin server is itself located at a PoP
• Are the updates to replicas automatic?
• If pre-configured, what criteria are used to determine which replica servers to use?
• Paper mentions that only partial copies are maintained in replicas
  • See IEEE Comm. Mag. Aug. 2006 paper
Akamai model
Steps in looking up a URL when a CDN is used.
Courtesy: Tanenbaum's Fourth Edition slides from Prentice Hall
Content Delivery Networks
(a) Original Web page. (b) Same page after transformation.
Courtesy: Tanenbaum's Fourth Edition slides from Prentice Hall
DNS or http redirection
• HTTP supports a Location header, which can be included in the response with the URL of the CDN server to which the http request is redirected
• DNS redirection: it appears that the DNS server serving the origin server needs to be modified to provide the IP address of an appropriate CDN server based on the client location
• Which is appropriate for our deployment?
  • See our goals for deploying applications
    • Understand scalability of call-by-call dynamic bandwidth sharing
  • Need to sign on web servers for this application, unlike in squid where we need to sign on web clients
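The HTTP redirection option can be sketched as follows: pick a CDN server from the client's address and answer with a 302 response carrying a Location header. This is a minimal illustration, not Globule's or Akamai's redirector; the prefix-to-server map, the hostnames and both function names are hypothetical.

```python
import ipaddress


def pick_cdn_server(client_ip, server_map, default_server):
    """Return the CDN host whose client prefix contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for prefix, server in server_map.items():
        if addr in ipaddress.ip_network(prefix):
            return server
    return default_server


def redirect_response(cdn_host, path):
    """Build a minimal HTTP 302 response pointing at the CDN server."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{cdn_host}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


# Hypothetical mapping of client prefixes to CDN servers at PoPs.
SERVER_MAP = {
    "192.0.2.0/24": "cdn-pop1.example.org",
    "198.51.100.0/24": "cdn-pop2.example.org",
}

host = pick_cdn_server("198.51.100.7", SERVER_MAP, "origin.example.org")
print(redirect_response(host, "/video/clip.mpg"))
```

A DNS-based redirector would apply the same selection step but return the chosen server's IP address in the DNS answer instead of rewriting the URL.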
Outline check
• Potential topics for collaboration
• Cheetah goals with implications on network design
• CDN applications
• Collaboration with HOPI
  • Applications
  • Control-plane
  • Virtualizer
• Three-domain network
  • Data plane
  • Control plane
Application testing
• Applications
  • Selected to showcase advantages of high-speed dedicated virtual circuits (VCs) between PCs located at HOPI PoPs
  • Mostly file transfer applications
• Examples:
  • Web proxy (caching) servers: allows users not directly connected to HOPI to nevertheless use it (use HOPI VCs for inter-proxy file transfers)
  • CDN and web mirroring: locate these servers at HOPI PoPs, and use VCs for file movement between CDN servers/mirrors
  • IPTV: move video files between IPTV servers located at PoPs that serve local audiences
  • Email servers: SMTP-to-SMTP server file transfers
  • Storage and disaster recovery
CDN: Content Delivery Network; SMTP: Simple Mail Transfer Protocol
Process for application testing
• End hosts on which to run applications:
  • Use existing "support" PCs at HOPI PoPs, or
  • Collocate UVa-provided PCs at HOPI PoPs
• Obtain virtual circuits from HOPI TSC as required for the experiments, and run tests
• Goal of deploying applications:
  • Actively solicit and sign on users
  • Need to generate sufficient traffic to understand bandwidth sharing aspects of circuit-switched network
  • Quite different from running bbcp on two servers in an experiment to obtain throughput of one flow
Control-plane testing
• Cheetah Control-Plane Module (CCPM)
  • Implements distributed bandwidth management
  • One CCPM per HOPI Force10 switch to manage the bandwidth for all the interfaces on that particular switch
• Dynamic virtual-circuit service for calls with
  • high call arrival rates
  • short durations
  • moderate bandwidth
  • immediate-request type
Virtualizing HOPI PCs and Force10 switches
• Virtualizing PCs:
  • Invite contributions of PCs from researchers, to locate at HOPI PoPs
  • Slice these PCs and offer usage to the whole community
• Virtual Force10 switch
  • To support multiple control-plane and management-plane projects
Outline check
• Potential topics for collaboration
• Cheetah goals with implications on network design
• CDN applications
• Collaboration with HOPI
  • Applications
  • Control-plane
  • Virtualizer
• Three-domain network
  • Data plane
  • Control plane
Proposed HOPI-CHEETAH-UTK Three-Domain Network: Data plane
[Figure: data-plane diagram of the three domains (UTK, Cheetah, HOPI). Labels recoverable from the slide: SN16k switches; Force10 switches labeled SLR, UTK ORNL, UTK Humanities, UTK SERF, Gloriad, and HOPI Washington, NYC, Chicago, Seattle and LA; site labels ATL, SLR and ORNL; hosts Zelda1/2/3, Zelda4/5, Wukong/Wuneng, UTK servers, CHEETAH PCs and PC3s. Link labels: GbE, NxGbE, 10GbE, OC192, GMPLS UNI.]
Cheetah control-plane solution
• Developed a software program called circuit-requestor for CHEETAH end hosts
• Use built-in GMPLS control-plane software in Sycamore switch controllers
• Current solution: supports only port-mapped GbE-STS3-7v-GbE circuits
• Planned upgrades: support VLAN mapped to sub-Gbps SONET circuits
• Based on the RSVP-TE code from KOM/DRAGON
  • About 40K lines of C++ code
• What we changed:
  • Modified the code to inter-operate with the Sycamore SN16000
  • Added admission control, session management, user interface, etc.
  • Integrated code for DNS lookup from our partner CUNY
  • Designed and implemented APIs for general applications
  • About 4K lines of new code
CHEETAH architecture
[Figure: two end hosts, each running an Application and the CHEETAH software (DNS client, RSVP-TE module, TCP/IP, C-TCP/IP) with two NICs (NIC 1, NIC 2). The hosts are connected through the Internet and, via circuit gateways, through a SONET circuit-switched network.]
CHEETAH end-host software – includes circuit-requestor + daemons
[Figure: end host. User space: Application, Circuit-requestor, CHEETAH Daemon (CD) with CD API, RSVP-TE Daemon (RSVPD) with RSVPD API, DNS client; socket connections between components. Kernel space: C-TCP with C-TCP API. External: DNS server (DNS lookup) and RSVP-TE messages.]
Steps:
• DNS lookup (to support our scalability goal)
• Circuit setup signaling procedure (RSVP-TE)
Cheetah control-plane solution
• Usage: user logs in to a CHEETAH end host
• Option 1:
  • Uses circuit-requestor program to request the setup of a dedicated 1Gb/s Ethernet-SONET-Ethernet circuit to another CHEETAH end host
  • Runs application, such as file transfer
  • Uses circuit-requestor program to release circuit
• Option 2:
  • User starts file-transfer application and, unbeknownst to the user, the software decides whether it is appropriate to use a circuit and, if so, sets it up, transfers the file and releases the circuit
Circuit-requestor usage
• To set up a new circuit:
    circuit-requestor setup domain-name-of-called-host bandwidth [holding-time]
  • Default holding-time: 10 mins
  • Max holding time: 1 hour
  • Limit call holding time for fair bandwidth sharing
• To renew an existing circuit:
    circuit-requestor renew session-id [new-holding-time]
  • Release unused circuits if there is no renewal
• To release an existing circuit:
    circuit-requestor release session-id
• To check the status of the CHEETAH trunk:
    circuit-requestor status
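The holding-time rules above (10-minute default, 1-hour maximum, release when not renewed) can be sketched as simple session bookkeeping. This is not the circuit-requestor implementation; `SessionTable` and `effective_holding_time` are hypothetical names, and capping an over-long request at the maximum (rather than rejecting it) is an assumption.

```python
DEFAULT_HOLDING_S = 10 * 60  # default holding time: 10 mins
MAX_HOLDING_S = 60 * 60      # maximum holding time: 1 hour


def effective_holding_time(requested_s=None):
    """Holding time granted for a setup or renew request.

    Assumption: requests above the maximum are capped, not rejected.
    """
    if requested_s is None:
        return DEFAULT_HOLDING_S
    if requested_s <= 0:
        raise ValueError("holding time must be positive")
    return min(requested_s, MAX_HOLDING_S)


class SessionTable:
    """Hypothetical table of circuit expiry times keyed by session id."""

    def __init__(self):
        self._expiry = {}
        self._next_id = 1

    def setup(self, now, requested_s=None):
        session_id = self._next_id
        self._next_id += 1
        self._expiry[session_id] = now + effective_holding_time(requested_s)
        return session_id

    def renew(self, session_id, now, requested_s=None):
        if session_id not in self._expiry:
            raise KeyError(session_id)
        self._expiry[session_id] = now + effective_holding_time(requested_s)

    def release_expired(self, now):
        """Release every circuit whose holding time ran out without renewal."""
        expired = [sid for sid, t in self._expiry.items() if t <= now]
        for sid in expired:
            del self._expiry[sid]
        return expired


table = SessionTable()
sid = table.setup(now=0)
print(sid, table.release_expired(now=600))
```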
Option 2
• Modified squid software to have the squid server automatically initiate circuit setup when it receives an http request that requires it to obtain the file from another squid server (parent) located at a Cheetah/HOPI PoP
• Release circuit if fewer than 10 packets are seen on the secondary NIC within 60 sec (ICP packets do appear between squid servers)
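The idle-release rule above can be sketched as a check over recent packet arrival times on the secondary NIC. This is an illustration of the stated threshold only, not the modified squid code; `should_release` and its parameter names are hypothetical.

```python
def should_release(packet_timestamps, now, window_s=60.0, min_packets=10):
    """Return True if fewer than min_packets arrived in the last window_s.

    The threshold is above zero because ICP packets keep flowing between
    squid servers even when no file transfer is in progress.
    """
    recent = [t for t in packet_timestamps if now - window_s < t <= now]
    return len(recent) < min_packets


# Only a few keep-alive style packets in the last minute: release.
print(should_release([5.0, 20.0, 41.0], now=60.0))
# A burst of transfer traffic in the last minute: keep the circuit.
print(should_release([50.0 + 0.1 * i for i in range(50)], now=60.0))
```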
Measurements
Circuit type | End-to-end circuit setup delay (s) | Processing delay for Path message (s) at sn16k-nc | Processing delay for Resv message (s) at sn16k-nc
OC-1         | 0.166103                           | 0.091119                                          | 0.008689
OC-3         | 0.165450                           | 0.090852                                          | 0.008650
1Gb/s EoS    | 1.645673                           | 1.566932                                          | 0.008697
Round-trip signaling message propagation plus emission delay between sn16k-atl and sn16k-nc: 0.025 s
Table: Signaling delays incurred in setting up a circuit between zelda1 and wuneng across the CHEETAH network.
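A short calculation on the measured values shows how much of each end-to-end setup delay is accounted for by the listed components. The numbers come from the table; the "residual" is simply what remains after subtracting the two processing delays at sn16k-nc and the round-trip delay, and the slides do not attribute it to any specific component. The function name `breakdown` is illustrative.

```python
RTT_S = 0.025  # round-trip propagation plus emission delay, sn16k-atl to sn16k-nc

# circuit type -> (end-to-end setup, Path processing, Resv processing), seconds
MEASUREMENTS = {
    "OC-1": (0.166103, 0.091119, 0.008689),
    "OC-3": (0.165450, 0.090852, 0.008650),
    "1Gb/s EoS": (1.645673, 1.566932, 0.008697),
}


def breakdown(end_to_end, path_proc, resv_proc, rtt=RTT_S):
    """Return (residual delay, share of total spent on Path processing)."""
    residual = end_to_end - path_proc - resv_proc - rtt
    path_share = path_proc / end_to_end
    return residual, path_share


for name, values in MEASUREMENTS.items():
    residual, share = breakdown(*values)
    print(f"{name}: residual {residual:.6f} s, Path processing {share:.1%}")
```

The Path message processing at sn16k-nc dominates in every case, and for the 1Gb/s EoS circuit it accounts for almost all of the roughly 1.6 s setup delay.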
Why the initial DNS lookup?
• Verify that the called end host is on the CHEETAH network and obtain the MAC address of its second (CHEETAH) NIC
• Why is the MAC address necessary?
  • Need to program ARP table to avoid wide-area ARP lookups
IP and MAC addressing issues
• Our completed Control-Plane Network Design document describes:
  • Why we chose static public IP addresses for the second (CHEETAH) NICs at end hosts
    • Choose addresses based on CHEETAH host’s location, i.e., allocate address from enterprise’s public IP address space allocation (e.g., UVA hosts: 128.143.x.x)
    • Reason: scalability
• Impact:
  • After dedicated circuit is set up, the far-end NIC has an IP address from a different subnet
  • Default setting of IP routing table entries will indicate that such an address is only reachable through the default gateway
Our solution: automatically update IP routing and ARP tables
• Update IP routing and ARP tables at both end hosts as last step of circuit setup
  • Analogous to switch fabric configuration
• Routing table update
  • Add an entry indicating the remote host is directly reachable through the second (CHEETAH) NIC
• ARP table update
  • Add an entry for the MAC address of the remote CHEETAH NIC
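The two table updates can be sketched as the commands a Linux end host could run after circuit setup. This is an illustration using the iproute2 `ip route` and `ip neigh` commands; the slides do not say how the CHEETAH software performs the updates, and the addresses, MAC and interface name below are hypothetical. The sketch only builds the command lines and does not execute them.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def host_table_updates(remote_ip, remote_mac, cheetah_iface):
    """Build route and ARP (neighbor) commands for the remote CHEETAH NIC."""
    ipaddress.ip_address(remote_ip)  # raises ValueError if malformed
    if not MAC_RE.match(remote_mac):
        raise ValueError(f"bad MAC address: {remote_mac}")
    # Host route: remote NIC is directly reachable via the second NIC,
    # instead of via the default gateway.
    route_cmd = ["ip", "route", "add", f"{remote_ip}/32", "dev", cheetah_iface]
    # Static neighbor entry: avoids an ARP lookup across the wide-area circuit.
    arp_cmd = ["ip", "neigh", "replace", remote_ip, "lladdr", remote_mac,
               "dev", cheetah_iface, "nud", "permanent"]
    return route_cmd, arp_cmd


route, arp = host_table_updates("192.0.2.10", "02:00:00:00:00:01", "eth1")
print(" ".join(route))
print(" ".join(arp))
```

Both hosts would apply the mirror-image update, and the entries would be removed again when the circuit is released.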
Addressing in the CHEETAH network
• Can private and/or dynamic IP addresses be assigned to the data-plane and control-plane interfaces in GMPLS networks?
• Data-plane addresses
  • Static
    • Need to be “called” by other clients
  • Public
    • Globally unique
    • Scalable to multiple autonomous systems
  • Private IP addresses sufficient if goal for CHEETAH is to create a small eScience network
• Control-plane addresses
  • Static
    • Configured in Traffic-Engineered (TE) link
  • Public
    • Same scalability reason
Control-plane security
• IPsec based
  • Use Juniper NS-5 devices on SN16000 control ports
  • Use Openswan IPsec software on Linux end hosts
• IPsec tunnels created between primary NICs of hosts and NS-5 devices of switches
• IPsec tunnels created between switch NS-5 devices for switch-to-switch control-plane exchanges
CCPM for HOPI Force10
• Apply same approach as in Cheetah network
• Since Force10 does not have a built-in GMPLS control-plane, we implemented CCPM
• Run one CCPM per host