Impact of High Performance Sockets on Data Intensive Applications
Pavan Balaji, Jiesheng Wu, D. K. Panda, CIS Department, The Ohio State University. Tahsin Kurc, Umit Catalyurek, Joel Saltz, BMI Department, The Ohio State University.
![Page 1: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/1.jpg)
Impact of High Performance Sockets on Data Intensive Applications
Pavan Balaji, Jiesheng Wu, D. K. Panda
CIS Department, The Ohio State University
Tahsin Kurc, Umit Catalyurek, Joel Saltz
BMI Department, The Ohio State University
![Page 2: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/2.jpg)
Presentation Layout
• Motivation and Background
• Sockets Implementations
• DataCutter Library
• Experimental Results
• Conclusions and Future Work
![Page 3: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/3.jpg)
Background
• Data Intensive Applications
– Communication Intensive; I/O Intensive
– Require Guarantees in Performance
– Scalability with guarantees
– Adaptability to Heterogeneous Networks
– Several of them are built over TCP/IP
• Times have changed
– Faster networks available (cLAN, InfiniBand)
– Faster protocols available (VIA, EMP)
![Page 4: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/4.jpg)
Motivation
• High Performance Sockets Layers
+ Take advantage of faster networks [balaji02, shah99]
+ No changes to the applications
- Bottleneck: Design of Applications based on TCP/IP Communication
• Questions
– Can a high performance substrate allow the implementation of a scalable interactive data-intensive application with performance guarantees to the end user?
– Can a high performance substrate improve the adaptability of data-intensive applications to heterogeneous environments?
• [balaji02] “High Performance User-Level Sockets over Gigabit Ethernet”, Pavan Balaji, Piyush Shivam, Pete Wyckoff and D. K. Panda, Cluster 2002, Chicago
• [shah99] “High Performance Sockets and RPC over Virtual Interface (VI) Architecture”, H. V. Shah, C. Pu and R. S. M., CANPC workshop 1999.
![Page 5: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/5.jpg)
Latency with Bandwidth Constraint
[Figure: Bandwidth vs. Message Size and Latency vs. Message Size curves for TCP and VIA, with the required bandwidth (Reqd BW) marked]
• Latency Vs Message Size is studied
• Latency Vs Bandwidth is relevant for performance guarantees
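One way to read the two plots together is to ask, for a required bandwidth, what the smallest message size is that sustains it and what latency that message size costs. The sketch below assumes a simple linear cost model (time = startup latency + size / peak bandwidth), which is an assumption and not something stated on the slides. The function name and the 400 Mbps requirement are illustrative; the startup latencies and peak bandwidths are the TCP and SocketVIA figures reported in the "Performance of SocketVIA Vs TCP" slide of this talk.

```python
def latency_at_required_bw(startup_us, peak_mbps, required_mbps):
    """Return (min_message_bytes, latency_us) needed to sustain required_mbps.

    Assumed model (not from the slides): time(s) = startup_us + s / peak,
    so the achieved bandwidth s / time(s) only approaches the peak for
    large messages.
    """
    if required_mbps >= peak_mbps:
        raise ValueError("required bandwidth exceeds the peak bandwidth")
    peak = peak_mbps / 8.0          # Mbps -> bytes per microsecond
    required = required_mbps / 8.0
    # Solve s / (startup_us + s / peak) = required for the message size s.
    size = required * startup_us / (1.0 - required / peak)
    return size, startup_us + size / peak


if __name__ == "__main__":
    for name, l0, bw in [("TCP", 45.0, 510.0), ("SocketVIA", 9.5, 763.0)]:
        size, lat = latency_at_required_bw(l0, bw, required_mbps=400.0)
        print(f"{name}: {size:.0f} bytes, {lat:.1f} us")
```

Under this model, a lower startup latency lets the required bandwidth be reached at a much smaller message size, which in turn gives a much lower latency at that bandwidth.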
![Page 6: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/6.jpg)
An Example…
[Figure: TCP and VIA curves with the required bandwidth (Reqd BW) marked]
• Image rendering should be interactive
• Response times should be small
![Page 7: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/7.jpg)
Pipelining: Computation/Communication Overlap
[Figure: Latency vs. Message Size (log scale) for TCP and VIA, together with a Computation curve (Linear Computation with Message Size); a Root Node feeds the Compute Nodes]
![Page 8: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/8.jpg)
An Example…
[Figure: Root Node and Compute Nodes; Linear Computation with Message Size]
• Consider the message size required for perfect pipelining
– TCP requires 16KB message size
– VIA requires 2KB message size
• Say the computation function takes 1 sec/KB
• Each computation step takes
– 16 secs for TCP
– 2 secs for VIA
• Say, a node becomes slower by a factor of 2
• Time taken by compute node
– (16 * 2) = 32 secs for TCP (increases by 16 seconds)
– (2 * 2) = 4 secs for VIA (increases by 2 seconds)
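The arithmetic of this example can be written out as a small sketch. The function name is illustrative; the block sizes, the 1 sec/KB cost and the slowdown factor of 2 are the values used on the slide.

```python
def step_time_secs(block_kb, secs_per_kb=1.0, slowdown=1.0):
    """Time for one computation step on a block, linear in the block size."""
    return block_kb * secs_per_kb * slowdown


if __name__ == "__main__":
    for name, block_kb in [("TCP", 16), ("VIA", 2)]:
        normal = step_time_secs(block_kb)
        slow = step_time_secs(block_kb, slowdown=2.0)
        print(f"{name}: {normal:.0f} s normally, {slow:.0f} s when slowed "
              f"(+{slow - normal:.0f} s)")
```

The penalty of handing a block to a slowed node scales with the block size, so the smaller block that VIA permits keeps the absolute delay small.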
![Page 9: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/9.jpg)
Presentation Layout
• Motivation and Background
• Sockets Implementations
• DataCutter Library
• Experimental Results
• Conclusions and Future Work
![Page 10: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/10.jpg)
Sockets Implementations
• Traditional Berkeley Sockets (stack, top to bottom): Application, Sockets, TCP, IP, NIC
• GigaNet Sockets (LANE) (stack, top to bottom): Application, Sockets, TCP, IP, IP-to-VI layer, “VI aware” NIC
• SocketVIA (stack, top to bottom): Application or Library, Sockets over VIA, OS Agent and VIPL, GigaNet cLAN NIC
Pros
• High Compatibility
Cons
• Kernel Context Switches
• Multiple Copies
• CPU Resources
• Kernel Context Switches
• Multiple Copies
• CPU Resources
• High Performance
![Page 11: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/11.jpg)
Experimental Setup
• 16 Dell Precision 420 Nodes
– Dual 1 GHz PIII Processors
– 32bit 33MHz PCI bus
– 512MB SDRAM and 256K L2-level cache
– Linux kernel version 2.2.17
• GigaNet cLAN NICs
– cLAN 1000 Host Adapters
– cLAN 5300 Cluster switches
![Page 12: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/12.jpg)
Performance of SocketVIA Vs TCP
[Figure: Latency (usecs) vs. Message Size (bytes), 4 bytes to 4K, for VIA, SocketVIA and TCP]
[Figure: Bandwidth (Mbps) vs. Message Size (bytes), 4 bytes to 64K, for VIA, SocketVIA and TCP]
Latency: TCP (45us), VIA (9us), SocketVIA (9.5us)
Bandwidth: TCP (510Mbps), VIA (790Mbps), SocketVIA (763Mbps)
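Latency figures like these are commonly obtained with a ping-pong test written against the plain sockets API, which is also what lets the same program run unchanged over different sockets layers. The slides do not describe the benchmark that was actually used, so the following is only a minimal sketch of the general technique. The helper names are hypothetical, and `socket.socketpair()` is used as a local stand-in for a real connection between two nodes.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from a stream socket."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, msg_size, iters):
    """Server side: send every received message straight back."""
    for _ in range(iters):
        sock.sendall(recv_exact(sock, msg_size))


def ping_pong_latency_us(client, server, msg_size, iters=100):
    """One-way latency estimate: half the mean round-trip time, in usecs."""
    t = threading.Thread(target=echo, args=(server, msg_size, iters))
    t.start()
    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(iters):
        client.sendall(payload)
        recv_exact(client, msg_size)
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / iters / 2 * 1e6


if __name__ == "__main__":
    a, b = socket.socketpair()  # local stand-in, not a network link
    with a, b:
        for size in (4, 64, 1024, 4096):
            print(size, "bytes:", round(ping_pong_latency_us(a, b, size), 1), "us")
```

The numbers printed by a local socket pair say nothing about TCP or SocketVIA; the point is only that the measurement code touches nothing below the sockets interface.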
![Page 13: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/13.jpg)
Presentation Layout
• Motivation and Background
• Sockets Implementations
• DataCutter Library
• Experimental Results
• Conclusions and Future Work
![Page 14: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/14.jpg)
DataCutter Software Support for Data Driven Applications
• Component Framework for Combined Task/Data Parallelism
• User defines sequence of pipelined components (filters and filter groups)
– Stream based communication
• User directive tells preprocessor/runtime system to generate and instantiate copies of filters
• Flow control between transparent filter copies
– Replicated individual filters
– Transparent: single stream illusion
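The idea of pipelined filters connected by streams, with transparent copies of a filter sharing one logical stream, can be sketched with threads and queues. This is not DataCutter's API: `run_filter`, `run_pipeline` and the `END` marker are hypothetical names chosen for the illustration, and a `queue.Queue` stands in for a stream.

```python
import queue
import threading

END = object()  # end-of-stream marker (a convention made up for this sketch)


def run_filter(func, inbox, outbox, copies=1):
    """Start `copies` transparent copies of one filter on shared streams."""
    def worker():
        while True:
            item = inbox.get()
            if item is END:
                inbox.put(END)  # pass the marker on to sibling copies
                break
            outbox.put(func(item))

    threads = [threading.Thread(target=worker) for _ in range(copies)]
    for t in threads:
        t.start()

    def closer():
        # Close the output stream only after every copy has finished.
        for t in threads:
            t.join()
        outbox.put(END)

    c = threading.Thread(target=closer)
    c.start()
    return c


def run_pipeline(items, stages):
    """stages: list of (function, copies). Output order is not guaranteed."""
    first = queue.Queue()
    stream = first
    closers = []
    for func, copies in stages:
        nxt = queue.Queue()
        closers.append(run_filter(func, stream, nxt, copies))
        stream = nxt
    for item in items:
        first.put(item)
    first.put(END)
    results = []
    while True:
        out = stream.get()
        if out is END:
            break
        results.append(out)
    for c in closers:
        c.join()
    return results


if __name__ == "__main__":
    stages = [(lambda x: x * 2, 3), (lambda x: x + 1, 1)]
    print(sorted(run_pipeline(range(10), stages)))
```

Downstream filters see a single queue regardless of how many copies feed it, which is the "single stream illusion" in miniature. Because the copies run concurrently, results may arrive out of order.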
Combined Data/Task Parallelism
[Figure: filter copies R0 to R2, Ra0 to Ra2, E0 to EN and M placed on hosts across Cluster 1, Cluster 2 and Cluster 3]
http://www.datacutter.org
![Page 15: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/15.jpg)
DataCutter Library
[Figure: two protocol stacks, top to bottom. Left: Applications, DataCutter Library, Sockets, TCP, IP, NIC. Right: Applications, DataCutter Library, Sockets over VIA, OS Agent and VIPL, GigaNet cLAN NIC]
![Page 16: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/16.jpg)
Virtual Microscope Server
[Figure: filter chain read, decompress, clip, subsample, View]
• Pipelining of various stages: data reading, decompression, clipping and sub-sampling operations can be realized as a chain of filters.
• Replication of filters to obtain parallelism
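The chain of filters can be illustrated with four small functions applied in sequence. This is a sketch under loud assumptions: the image is a toy grid of byte values, compression is plain `zlib`, and all function names and signatures are hypothetical and not taken from the Virtual Microscope code.

```python
import zlib


def read_block(store, key):
    """Data reading: fetch a compressed block from an in-memory store."""
    return store[key]


def decompress(blob, width):
    """Decompress a block into rows of `width` pixel values."""
    raw = zlib.decompress(blob)
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]


def clip(img, x0, y0, x1, y1):
    """Keep only the region [x0, x1) by [y0, y1) requested by the view."""
    return [row[x0:x1] for row in img[y0:y1]]


def subsample(img, factor):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in img[::factor]]


def view(store, key, width, region, factor):
    """Chain the four stages in the order shown on the slide."""
    img = decompress(read_block(store, key), width)
    return subsample(clip(img, *region), factor)


if __name__ == "__main__":
    # Toy 8x8 image whose pixel value is y * 8 + x.
    pixels = bytes(y * 8 + x for y in range(8) for x in range(8))
    store = {"tile0": zlib.compress(pixels)}
    print(view(store, "tile0", 8, (2, 2, 6, 6), 2))
```

Each stage consumes only the output of the previous one, which is what makes the chain easy to split into separately placed, replicated filters.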
![Page 17: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/17.jpg)
Software Load Balancing
[Figure: Data Reading, Load Balancer, Slower Node]
![Page 18: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/18.jpg)
Presentation Layout
• Motivation and Background
• Sockets Implementations
• DataCutter Library
• Experimental Results
• Conclusions and Future Work
![Page 19: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/19.jpg)
Experiments Conducted
• Optimal Block Size
• Guarantee on Updates per Second (Complete Image)
• Guarantee on Latency of Partial Update (Moving the Image)
• Round Robin Load Balancing
• Demand Driven Load Balancing
![Page 20: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/20.jpg)
Effects of Guarantees on Updates per Second (Complete Images)
[Figure: Latency of Partial Updates (usecs) vs. Complete Updates per Second (4, 3.5, 3, 2.5, 2) for TCP, SocketVIA and SocketVIA (with DR)]
• SocketVIA performs better
• TCP can’t give guarantees > 3.25
• but…
– Limited improvement
– Design Decisions are bottlenecks
• Re-sizing of data blocks
– Try to alleviate the bottlenecks
– Only concern is Updates per Second
– Achievable at low block sizes
– No application changes (in this case)
– Significant performance improvement
![Page 21: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/21.jpg)
Effects of Guarantees on Latency of Partial Updates (Moving the Image)
[Figure: Complete Updates per Second vs. Latency of Partial Updates (usecs) (1000, 800, 600, 400, 200) for TCP, SocketVIA and SocketVIA (with DR)]
• For High latency guarantees…
– Blindly using SocketVIA is good enough
– Bandwidth Saturation
– Pre-tuned applications
• For Low latency guarantees…
– TCP is no longer in the picture
– Blindly using SocketVIA is not OK
– Resizing of blocks can help
![Page 22: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/22.jpg)
Effect of Heterogeneous Clusters on Round Robin (RR) Scheduling
[Figure: Reaction time of Load Balancer (usecs) vs. Factor of Heterogeneity (2, 4, 8) for SocketVIA and TCP]
• Dynamic Heterogeneity
– Shared Processors
– Process Swapping
• Perfect Pipelining
– Complete overlap of comm. with comp.
– Occurs at 16KB for TCP
– At 2KB for VIA
• Scope of Error
– A larger chunk to a slower node
– More time to complete the chunk
– More time for the load balancer to react
![Page 23: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/23.jpg)
Effect of Heterogeneous Clusters on Demand Driven (DD) Scheduling
[Figure: Execution Time (usecs) vs. Probability of being Slow (%) (10 to 90) for SocketVIA(2), SocketVIA(4), SocketVIA(8), TCP(2), TCP(4) and TCP(8)]
• Demand Driven Scheduling
– Additional Latency Cost
– SocketVIA should perform better (?)
– Natural overlap of comm. with comp.
– Use of SocketVIA or TCP makes no difference
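The difference between the two scheduling policies can be seen in a toy model. The sketch below is an illustration only: it ignores communication cost entirely (including the additional latency cost of demand-driven requests noted above), treats every block as costing one time unit on a normal node, and uses hypothetical function names and slowdown values.

```python
import heapq


def round_robin_makespan(n_blocks, slowdowns, block_cost=1.0):
    """Blocks are dealt to nodes in fixed rotation, regardless of speed."""
    busy = [0.0] * len(slowdowns)
    for i in range(n_blocks):
        node = i % len(slowdowns)
        busy[node] += block_cost * slowdowns[node]
    return max(busy)


def demand_driven_makespan(n_blocks, slowdowns, block_cost=1.0):
    """Each block goes to whichever node becomes free first."""
    free_at = [(0.0, node) for node in range(len(slowdowns))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_blocks):
        t, node = heapq.heappop(free_at)
        done = t + block_cost * slowdowns[node]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, node))
    return finish


if __name__ == "__main__":
    slowdowns = [1.0, 1.0, 1.0, 4.0]  # one node is 4x slower (made-up value)
    print("round robin  :", round_robin_makespan(16, slowdowns))
    print("demand driven:", demand_driven_makespan(16, slowdowns))
```

In this model the slow node receives far fewer blocks under demand-driven scheduling, so the total time is set mostly by the fast nodes, whatever transport carries the blocks.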
![Page 24: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/24.jpg)
Presentation Layout
• Motivation and Background
• Sockets Implementations
• DataCutter Library
• Experimental Results
• Conclusions and Future Work
![Page 25: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/25.jpg)
Conclusions and Future Work
• High Performance Sockets are good!
– They’re your friend
– But, use them wisely
• Minor changes can make a major impact
– Order of magnitude performance improvement
– Sustained Performance Guarantees
– Fine grained Load-Balancing
• Higher Adaptability to Heterogeneous Networks
• Benefits of Parallelization over Pipelining with SocketVIA for large clusters
• High Performance Sockets Implementations
– TCP Termination (for the DataCenter environment)
– Use in DSM, DataCenter and Storage Server environments
![Page 26: Impact of High Performance Sockets on Data Intensive Applications](https://reader036.vdocuments.net/reader036/viewer/2022062814/568166cc550346895ddad92d/html5/thumbnails/26.jpg)
Thank You!
For more information, please visit the NBC Home Page:
http://nowlab.cis.ohio-state.edu
Network Based Computing Group, The Ohio State University