RDMA vs TCP experiment
• Goal
• Environment
• Test tool - iperf
• Test Suites
• Conclusion
Goal
• Test maximum and average bandwidth usage in 40 Gbps (InfiniBand) and 10 Gbps (iWARP) network environments
• Compare CPU usage between TCP and RDMA data transfer modes
• Compare CPU usage between RDMA READ and RDMA WRITE modes
Environment
• 40 Gbps InfiniBand
• 10 Gbps iWARP
• Hosts: Netqos03 (client), Netqos04 (server)
Tool - iperf
• Migrated iperf 2.0.5 to the RDMA environment with OFED (librdmacm and libibverbs); a connection-setup sketch follows below
• 2000+ source lines of code added (from 8382 to 10562)
• iperf usage extended
– -H: RDMA transfer mode instead of TCP/UDP
– -G: pr (passive read) or pw (passive write)
  – pr: data is read from the server
  – pw: the server writes into the client
– -O: output data file, for both the TCP server and the RDMA server
• Only one stream is transferred
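To give a sense of what the added RDMA code looks like, below is a minimal, illustrative sketch of a client-side connection setup with librdmacm. It is not the iperf port itself: error handling is abbreviated, queue sizes are arbitrary, and names such as dst_addr are placeholders.

/* Sketch: client-side RDMA connection setup with librdmacm.
 * Illustrative only; the actual iperf port may differ. */
#include <rdma/rdma_cma.h>
#include <infiniband/verbs.h>
#include <stdio.h>

int rdma_client_connect(struct sockaddr *dst_addr)
{
    struct rdma_event_channel *ec = rdma_create_event_channel();
    struct rdma_cm_id *id = NULL;
    struct rdma_cm_event *ev = NULL;
    struct rdma_conn_param param = { .retry_count = 7 };
    struct ibv_qp_init_attr qp_attr = {
        .qp_type = IBV_QPT_RC,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };

    if (!ec || rdma_create_id(ec, &id, NULL, RDMA_PS_TCP))
        return -1;

    /* Resolve the destination address and route (2 s timeouts each). */
    if (rdma_resolve_addr(id, NULL, dst_addr, 2000))
        return -1;
    rdma_get_cm_event(ec, &ev);   /* RDMA_CM_EVENT_ADDR_RESOLVED expected */
    rdma_ack_cm_event(ev);
    if (rdma_resolve_route(id, 2000))
        return -1;
    rdma_get_cm_event(ec, &ev);   /* RDMA_CM_EVENT_ROUTE_RESOLVED expected */
    rdma_ack_cm_event(ev);

    /* Create the queue pair on the resolved device, then connect. */
    struct ibv_pd *pd = ibv_alloc_pd(id->verbs);
    struct ibv_cq *cq = ibv_create_cq(id->verbs, 16, NULL, NULL, 0);
    qp_attr.send_cq = cq;
    qp_attr.recv_cq = cq;
    if (!pd || !cq || rdma_create_qp(id, pd, &qp_attr))
        return -1;

    if (rdma_connect(id, &param))
        return -1;
    rdma_get_cm_event(ec, &ev);   /* RDMA_CM_EVENT_ESTABLISHED expected */
    rdma_ack_cm_event(ev);

    printf("RDMA connection established\n");
    return 0;
}

The connection-management calls come from librdmacm, while the protection domain, completion queue and queue pair belong to libibverbs, the two OFED libraries named on this slide.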
Test Suites
• Test suite 1: memory -> memory
• Test suite 2: file -> memory -> memory
– Test case 2.1: file (regular file) -> memory -> memory
– Test case 2.2: file (/dev/zero) -> memory -> memory
– Test case 2.3: file (Lustre) -> memory -> memory
• Test suite 3: memory -> memory -> file
– Test case 3.1: memory -> memory -> file (regular file)
– Test case 3.2: memory -> memory -> file (/dev/null)
– Test case 3.3: memory -> memory -> file (Lustre)
• Test suite 4: file -> memory -> memory -> file
– Test case 4.1: file (regular file) -> memory -> memory -> file (regular file)
– Test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null)
– Test case 4.3: file (Lustre) -> memory -> memory -> file (Lustre)
File choice
• File operations use the standard I/O library (fread, fwrite), so data is cached by the OS (see the sketch below)
• Reading from /dev/zero measures the maximum application transfer rate including the read-side file operation, since the disk is not the bottleneck
• Writing to /dev/null measures the maximum application transfer rate including the write-side file operation, since the disk is not the bottleneck
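As a rough illustration of the read-side file path, the loop below streams a file into the transfer path in fixed-size blocks through the standard I/O library. send_block() is a hypothetical stand-in for the actual TCP/RDMA send routine in the modified iperf.

/* Sketch: stream a file into the transfer path with standard I/O.
 * send_block() is hypothetical; it stands in for the real TCP/RDMA
 * send routine.  For /dev/zero the run is stopped by a time limit
 * rather than by end-of-file. */
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (10 * 1024 * 1024)   /* matches the 10 MB RDMA block */

extern int send_block(const void *buf, size_t len);   /* hypothetical */

int send_file(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* regular file, /dev/zero, or a Lustre file */
    char *buf = malloc(BLOCK_SIZE);
    size_t n;

    if (!fp || !buf)
        return -1;

    /* fread() goes through the OS page cache, so repeated runs may be
     * served from memory unless the cache is dropped first
     * (see "Memory Cache Cleanup"). */
    while ((n = fread(buf, 1, BLOCK_SIZE, fp)) > 0) {
        if (send_block(buf, n) != 0)
            break;
    }

    free(buf);
    fclose(fp);
    return 0;
}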
Buffer choice
• RDMA operation block size is 10 MB, i.e. one RDMA READ/WRITE per block (see the sketch after this list)
– A previous experiment showed that, in this environment, block sizes above 5 MB have little effect on transfer speed
• TCP read/write buffer size is the default
• TCP window size: 85.3 KByte (default)
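To illustrate how one 10 MB block moves in a single RDMA operation, the sketch below registers a buffer with libibverbs and posts one RDMA WRITE for it. It assumes a connected queue pair and that the peer's remote address and rkey were exchanged beforehand; the variable names are illustrative, not taken from the iperf port.

/* Sketch: register a 10 MB buffer and post a single RDMA WRITE for it.
 * qp, pd, remote_addr and rkey are assumed to come from an already
 * established connection.  In a real transfer the buffer would be
 * registered once up front and reused; deregistration is omitted. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE (10 * 1024 * 1024)

int write_block(struct ibv_pd *pd, struct ibv_qp *qp,
                void *buf, uint64_t remote_addr, uint32_t rkey)
{
    /* Pin the buffer so the adapter can DMA directly from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BLOCK_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = BLOCK_SIZE,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* IBV_WR_RDMA_READ for read mode */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* One work request moves the whole block; completion is later
     * collected by polling the send completion queue (ibv_poll_cq). */
    return ibv_post_send(qp, &wr, &bad_wr);
}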
Test case 1: memory -> memory – CPU
Test case 1: memory -> memory – Bandwidth
Test case 2.1 (fread): file (regular file) -> memory -> memory – CPU
Test case 2.1 (fread): file (regular file) -> memory -> memory – Bandwidth
Test case 2.2 (five minutes): file (/dev/zero) -> memory -> memory – CPU
Test case 2.2 (five minutes): file (/dev/zero) -> memory -> memory – Bandwidth
Test case 3.1 (a 200 GB file is generated): memory -> memory -> file (regular file) – CPU
Test case 3.1 (a 200 GB file is generated): memory -> memory -> file (regular file) – Bandwidth
Test case 3.2: memory -> memory -> file (/dev/null) – CPU
Test case 3.2: memory -> memory -> file (/dev/null) – Bandwidth
Test case 4.1: file (regular file) -> memory -> memory -> file (regular file) – CPU
Test case 4.1: file (regular file) -> memory -> memory -> file (regular file) – Bandwidth
Test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null) – CPU
Test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null) – Bandwidth
Conclusion
I. For a single data-transfer stream with no disk operations, the RDMA transport is twice as fast as TCP while consuming only 10% of the CPU load measured under TCP.
II. FTP consists of two components: networking and file operations. Compared with the RDMA operations, the file operations (limited by disk performance) account for most of the CPU usage, so a well-designed file-buffering scheme is critical.
Future work
• Set up a Lustre environment and configure Lustre with RDMA support
• Start the FTP migration
o Source control
o Bug database
o Documentation
o etc. (refer to The Joel Test)
Memory Cache Cleanup
• Drops the OS page cache so that file data is read from disk rather than from memory:
# sync
# echo 3 > /proc/sys/vm/drop_caches