![Page 1: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/1.jpg)
iWARP: Its Not Just For The LAN Anymore
Dennis Dalessandro
Commodity Cluster Symposium
July 25, 2006
![Page 2: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/2.jpg)
Agenda
Introduction to iWARP
iWARP Hardware History
Current iWARP Research
iWARP Road Map
![Page 3: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/3.jpg)
What is the problem with TCP/IP anyway?
Network processing = lots of CPU
  Costly at 1 Gbps, imagine 10 Gbps
Why?
  Complex protocol stack processed by CPU
  Movement of data from memory to NIC by CPU
![Page 4: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/4.jpg)
Why TCP/IP is so costly
[Diagram labels: Application, Memory, OS Kernel, TCP, IP, NIC; steps: Read Data, Send Data]
![Page 5: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/5.jpg)
Possible solutions
TCP Offload Engine (TOE)
  Offloads processing of TCP/IP stack
  Good but not enough
Remote Direct Memory Access (RDMA)
  Offloads processing of TCP/IP stack
  Also has Zero-Copy
![Page 6: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/6.jpg)
TOE Card
[Diagram labels: Application, Memory, OS Kernel, TCP, IP, TOE; steps: Read Data, Send Data]
Protocol work offloaded, but CPU still moves the data through OS
![Page 7: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/7.jpg)
RDMA
[Diagram labels: Application, Memory, RDMA, OS Kernel, Transport, RNIC/HCA]
Protocol offloaded, and CPU does not move data.
![Page 8: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/8.jpg)
Examples of RDMA
InfiniBand, Myrinet, Quadrics
  Require special infrastructure
  Do not work in the WAN
Great performance
  Latency can't be beat
Tried and true
  IB very common
![Page 9: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/9.jpg)
iWARP - The new kid on the block
iWARP = RDMA over Ethernet (TCP/IP)
  Runs over existing network infrastructure
  WAN Capable!
IETF RFC specifications
  RDMAP, DDP, MPA
Downside
  Switch cost for 10 Gigabit
  New technology
![Page 10: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/10.jpg)
Hardware History
Ammasso Inc
  First commercially available
  Only 1 Gigabit
  Blazed the trail
  Allowed researchers to experiment with iWARP
  Ceased operations late 2005
  Allowed researchers to continue iWARP work
  Everything learned is still applicable
Ammasso presence still felt
  OpenIB - now OpenFabrics driver
![Page 11: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/11.jpg)
New players on the scene
NetEffect
  10 Gigabit iWARP adapter
  Outperforms IB in terms of throughput
  Boards are selling now
  OSC leading the way
    Paper to appear at RAIT'06 (IEEE Cluster 2006)
    September 28 in Barcelona, Spain
Chelsio
  Has an adapter as well
  Driver in OpenFabrics source tree
Broadcom
  ??????
![Page 12: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/12.jpg)
NetEffect performance
Throughput
![Page 13: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/13.jpg)
NetEffect performance cont...
Latency
![Page 14: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/14.jpg)
Switch overhead
*Fulcrum Micro Test Platform
![Page 15: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/15.jpg)
10 Gig iWARP
Comparable (better?) in performance to IB
  Higher throughput than standard 4X IB
  Switch latency is comparable
  A bit higher latency at small message sizes
  Appropriate for cluster interconnect
  Appropriate for high-end servers
  Appropriate for storage (iSCSI)
Just getting started with it
  WAN tests
  Interoperability with other iWARP HW
![Page 16: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/16.jpg)
Current iWARP work
iWARP in the WAN
  Main point of this talk
Interoperability of iWARP devices
  Ammasso, NetEffect, Software iWARP
RDMA enabled web server
  Apache mod_rdma and proxy server
RDMA enabled FTP client/server
Real applications with NetEffect device
![Page 17: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/17.jpg)
OSC iWARP resources
NetRes Cluster
  Up to 41 Ammasso
  On TFN
P4 Cluster
  17 Ammasso
  On TFN
NetEffect
  2 Servers
  On TFN
![Page 18: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/18.jpg)
Basic performance
At 1 Gbps TCP about same as iWARP
  Today's processors capable of 1 Gbps
    At high CPU utilization
10 Gbps will be a different story
Things do not work the same in WAN
  Tunable network parameters a must
    Window Size
    MTU?
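
Why window size matters so much in the WAN can be illustrated with the bandwidth-delay product: a sender limited to one window per round trip can fill the link only if the window is at least bandwidth times round-trip time. The sketch below is only an illustration; the 4 ms round-trip time is a hypothetical value, not a measurement from this talk, and the function names are made up for the example.

```python
def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.004  # hypothetical 4 ms WAN round trip (assumption, not measured)
    for gbps in (1, 10):
        window = required_window_bytes(gbps * 1e9, rtt)
        print(f"{gbps} Gbps link, {rtt * 1e3:.0f} ms RTT: "
              f"window of about {window / 1024:.0f} KiB needed")
    # Ceiling imposed by a fixed 64 KiB window on the same hypothetical path
    ceiling = max_throughput_bps(64 * 1024, rtt)
    print(f"64 KiB window: at most about {ceiling / 1e6:.0f} Mbps")
```

The same arithmetic applies whether the window belongs to a host TCP stack or to the TCP engine on an iWARP adapter, which is why the window has to be tunable on both.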
![Page 19: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/19.jpg)
Window size effect in WAN
iWARP TCP
*Note 113.1 KB not full BW for iWARP
![Page 20: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/20.jpg)
iWARP FTP
Demo at SC 2005
Work in progress to create production version
Written in OpenFabrics verbs API
  Will work on iWARP or InfiniBand
Intended use: Move large data sets in WAN
![Page 21: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/21.jpg)
Basic FTP Performance
| Size | iWARP  | TCP/IP |
|------|--------|--------|
| 200K | .010 s | .015 s |
| 1M   | .021 s | .031 s |
| 10M  | .117 s | .247 s |
| 100M | 1.05 s | 2.59 s |

Server: Springfield
Client: Columbus
Link: 10Gbps (TFN)
About same perf
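
The transfer times on this slide can be compared as ratios, which avoids any assumption about what "K" and "M" mean in bytes. The snippet below is a small sketch that copies the eight timings from the slide and computes how much longer the TCP/IP transfer took than the iWARP one at each size; the names `TIMES` and `tcp_to_iwarp_ratio` are made up for the example.

```python
# Transfer times in seconds, copied from the slide: size -> (iWARP, TCP/IP)
TIMES = {
    "200K": (0.010, 0.015),
    "1M": (0.021, 0.031),
    "10M": (0.117, 0.247),
    "100M": (1.05, 2.59),
}


def tcp_to_iwarp_ratio(size: str) -> float:
    """How many times longer the TCP/IP transfer took than the iWARP one."""
    iwarp_s, tcp_s = TIMES[size]
    return tcp_s / iwarp_s


if __name__ == "__main__":
    for size in TIMES:
        print(f"{size:>5}: TCP/IP took {tcp_to_iwarp_ratio(size):.2f}x the iWARP time")
```

The ratio grows with file size, from roughly 1.5 at 200K to roughly 2.5 at 100M.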
![Page 22: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/22.jpg)
The iWARP Benefit
*16 clients each time
![Page 23: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/23.jpg)
iWARP in the WWW
[Diagram labels: Apache (mod_rdma), iWARP Proxy, TCP Client, TCP Client, RDMA Client]
RDMA module for Apache
  Add RDMA info in header
  Request/Response in TCP
  Data transfer with RDMA
Downside
  RDMA connection build up
Why not full iWARP port?
  Extensive changes to Apache code base
  New feature not new web server
![Page 24: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/24.jpg)
RDMA enabled Apache web server
mod_rdma
  Apache module to handle RDMA transfers
  "Grab" outgoing data and ship it with RDMA
  Manipulate headers for minor changes
Simple changes, nothing fundamental
  All the benefits of Apache
No rewrite of Apache code needed
  Utilizes Apache hooks
![Page 25: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/25.jpg)
mod_rdma cont.
GET /index.html HTTP/1.1
Host: www.osc.edu
User-Agent: Mozilla/5.0
Connection: Keep-Alive
RDMA: server-writes, ip=10.0.0.15, port=3242, stag=642, to=0, maxlen=1048576
Server Writes
  Client has to guess size of file
  RDMA connect takes time
    Same as TCP connection (it is)
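
The `RDMA:` header on this slide is a mode name followed by `key=value` fields. A minimal sketch of how such a value could be parsed is shown below. This is not the mod_rdma implementation: the function name `parse_rdma_header` is hypothetical, and reading `stag` and `to` as the DDP steering tag and tagged offset is an interpretation based on the iWARP specifications rather than something stated on the slide.

```python
def parse_rdma_header(value: str):
    """Split an RDMA header value into (mode, fields).

    Illustrative only: the field grammar is inferred from the example
    header on the slide, not taken from the mod_rdma source.
    """
    mode, _, rest = value.partition(",")
    fields = {}
    # Accept commas or plain whitespace between fields.
    for item in rest.replace(",", " ").split():
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {item!r}")
        # Keep dotted values such as IP addresses as strings.
        fields[key] = int(val) if val.isdigit() else val
    return mode.strip(), fields


if __name__ == "__main__":
    mode, fields = parse_rdma_header(
        "server-writes, ip=10.0.0.15, port=3242, stag=642, to=0, maxlen=1048576"
    )
    print(mode, fields)
```

In server-writes mode the client has to advertise `maxlen` up front, which is why it "has to guess size of file" before the server tells it anything.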
![Page 26: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/26.jpg)
mod_rdma cont...
GET /index.html HTTP/1.1
Host: www.osc.edu
User-Agent: Mozilla/5.0
Connection: Keep-Alive
RDMA: client-reads, ip=10.0.0.14, port=3242, maxlen=1048576

HTTP/1.1 200 OK
Host: www.osc.edu
Content-Length: 1327
Connection: Keep-Alive
RDMA: client-reads, stag=642, to=0
Client Reads
  Still have RDMA connect
  Server replies with RDMA info
  Client has to send an extra ACK to tell server RDMA read done
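
The two header lines exchanged in client-reads mode can be sketched as follows. The helper `rdma_header` is hypothetical and only reproduces the header text shown on the slide; it says nothing about how mod_rdma builds these lines internally.

```python
def rdma_header(mode: str, **fields) -> str:
    """Format an RDMA header line in the style shown on the slide."""
    parts = [mode] + [f"{key}={value}" for key, value in fields.items()]
    return "RDMA: " + ", ".join(parts)


# Request: the client names the mode and passes ip, port, and maxlen.
request_line = rdma_header("client-reads", ip="10.0.0.14", port=3242, maxlen=1048576)

# Response: the server replies with the stag and offset to read from.
response_line = rdma_header("client-reads", stag=642, to=0)

if __name__ == "__main__":
    print(request_line)
    print(response_line)
```

Because the response also carries `Content-Length`, the client knows how much to read, at the cost of the extra ACK that tells the server the RDMA read is done.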
![Page 27: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/27.jpg)
RDMA enabled Apache performance
1 page with 20 images
  Stock wget
  RDMA enabled wget
CPU usage for 2, 4, 6 clients
  RDMA starts out low, stays low
  TCP starts out in middle, goes and stays high
[Plot panels: iWARP, TCP]
![Page 28: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/28.jpg)
mod_rdma perf cont..
![Page 29: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/29.jpg)
Example web based app
Database of all US cities
  Includes zip code, latitude, longitude, etc.
  One fake person from each city
  A little over 42,000 entries
User: "give me all people within X miles of Zip"
Server: responds with a variable number of results w/pictures per page
  lots of trig for PHP to crunch on
  lots of querying for MySQL database
  pictures ensure lots of data to transfer
Developed by Manu Mukerji
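
The "lots of trig" in a query like "all people within X miles of Zip" typically comes from computing great-circle distances from latitude and longitude. The slides do not say which formula the PHP application used; the sketch below assumes the haversine formula, and the function names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(origin, cities, miles):
    """Names of cities whose distance from origin (lat, lon) is at most miles."""
    return [
        name
        for name, lat, lon in cities
        if haversine_miles(origin[0], origin[1], lat, lon) <= miles
    ]


if __name__ == "__main__":
    # Hypothetical entries; coordinates are illustrative, not from the real database.
    cities = [("city_a", 39.92, -83.81), ("city_b", 41.50, -81.69)]
    print(within_radius((39.96, -83.00), cities, 50))
```

Evaluating this for every row of a 42,000-entry table on each request is what keeps the server CPU busy, independent of how the result pages are then shipped to the client.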
![Page 30: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/30.jpg)
Two scenarios
[Diagram labels: Server, iWARP, TCP, wget, iWARP/TCP]
1. Back end RDMA clients
2. Back end TCP clients
![Page 31: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/31.jpg)
Sample app performance
[Plot panels: back end iWARP, back end TCP; series: iWARP wget, TCP wget]
![Page 32: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/32.jpg)
Server performance
[Plot panels: back end iWARP, back end TCP; series: TCP wget, iWARP wget]
![Page 33: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/33.jpg)
Upcoming work...
NetEffect interoperability
  with Ammasso cards
  with Software iWARP
OpenFabrics port of mod_rdma
  Including SSL support
OpenFabrics port of wget
Many 1Gig clients to single 10Gig server
  http
  ftp
![Page 34: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/34.jpg)
iWARP road to adoption
Beginning
  Hardware iWARP in most high end of servers
  Software iWARP in clients
After time....
  HW iWARP clients will begin to appear
  SW iWARP will become common
In parallel.....
  Specialty clusters of iWARP
Eventually
  World will move beyond 1 Gig
  iWARP is one of the best answers for Ethernet
![Page 35: iWARP: Its Not Just For The LAN AnymoreRDMA: client-reads, stag=642, to=0 Client Reads Still have RDMA connect Server replies with RDMA info Client has to send an extra ACK to tell](https://reader033.vdocuments.net/reader033/viewer/2022053016/5f172dd5902cf72c6e74319b/html5/thumbnails/35.jpg)
Conclusion
iWARP is WAN capable
iWARP is a viable cluster interconnect
HW is now available
Will make a difference in servers today
Benefit all computing not just HPC