An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
DESCRIPTION
Michihiro Koibuchi (NII, Japan), Tomohiro Otsuka (Keio U, Japan), Hiroki Matsutani (U of Tokyo, Japan), Hideharu Amano (Keio U / NII, Japan). Interconnect share in the TOP500 (Nov 2008): Gigabit Ethernet (GbE), 56%.
TRANSCRIPT
-
An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
Michihiro Koibuchi (NII, Japan)
Tomohiro Otsuka (Keio U, Japan)
Hiroki Matsutani (U of Tokyo, Japan)
Hideharu Amano (Keio U / NII, Japan)
-
HPC PC Clusters with Ethernet
Host/CPU: various low-power techniques are used
- DVFS
- Power gating
Ethernet switch: always active, prepared for packet injection
We propose and evaluate a low-power technique for Ethernet switches in PC clusters
(Figure: PCs connected to an Ethernet switch)
-
Outline
Ethernet for HPC
- Link aggregation (channel group) + multi-paths
On/Off link activation method
Evaluations
- Overhead of On/Off link operation
- Performance and power consumption of PC clusters
-
Ethernet on HPC systems
Increasing the number of ports of GbE switches
- 24/48-port switches provide the lowest cost per port
Improving the computation power of hosts (> 10 GFlops)
Link aggregation [IEEE 802.3ad] and multi-path topologies [Kudoh, IEEE Cluster, 2004] [Viking, Infocom2004]
- drastically increasing the number of links
(Figure: switches and hosts; link aggregation using 3 links)
-
Power consumption of GbE switches
Power consumption is almost constant regardless of traffic load
The number of activated ports dominates the power consumption of a switch
The power consumption of a port is reduced to zero by the port-shutdown operation
Unit: W
Product   Port   Other (Xbar)   Total   (ratio of ports)
PC5324    1.2    14.9           42.9    (65%)
PC6224    2.0    42.5           91.1    (53%)
PC6248    2.1    56.8           155.2   (63%)
SF-420    1.0    32.6           55.4    (41%)
C-3750    1.8    84.5           127.7   (34%)
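The "ratio of ports" column can be checked from the other two power columns. The sketch below assumes the values are in watts and that total power is simply port power plus "Other (Xbar)" power; the dictionary and function names are illustrative only.

```python
# Values transcribed from the switch power table (assumed to be watts).
# Each entry: (other_power, total_power, port ratio in percent as shown on the slide)
SWITCHES = {
    "PC5324": (14.9, 42.9, 65),
    "PC6224": (42.5, 91.1, 53),
    "PC6248": (56.8, 155.2, 63),
    "SF-420": (32.6, 55.4, 41),
    "C-3750": (84.5, 127.7, 34),
}


def port_share(other_w: float, total_w: float) -> float:
    """Fraction of switch power attributed to ports, assuming total = ports + other."""
    return (total_w - other_w) / total_w


def port_shares() -> dict:
    """Port share per product, rounded to a whole percent."""
    return {name: round(100 * port_share(o, t)) for name, (o, t, _) in SWITCHES.items()}


if __name__ == "__main__":
    for name, pct in port_shares().items():
        print(f"{name}: {pct}% (slide: {SWITCHES[name][2]}%)")
```

Under that assumption the computed shares agree with the percentages on the slide, which suggests the flattened table columns were split correctly.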
-
Overview of the on/off link method
Network load is not always high (e.g. during computation time)
Switch ports consume 40-60% of the total power
When the traffic load becomes low, a part of the links is turned off
(Figure: switches and nodes; TREE 1, TREE 2, TREE 3, TREE 4)
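To make the potential saving concrete, here is a rough estimate of switch power as ports are shut down. It is only a sketch: it assumes each shut-down port saves its full per-port power and that the rest of the switch is unaffected. The function name and the choice of 12 ports are hypothetical; the 155.2 W and 2.1 W figures are the PC6248 values from the measurement table.

```python
def switch_power_w(total_w: float, port_w: float, ports_off: int) -> float:
    """Estimated switch power after shutting down `ports_off` ports.

    Assumption: a shut-down port consumes nothing and the remaining power
    does not depend on traffic load.
    """
    if ports_off < 0:
        raise ValueError("ports_off must be non-negative")
    return total_w - port_w * ports_off


# PC6248: 155.2 W with all ports active, 2.1 W per port.
full = switch_power_w(155.2, 2.1, 0)
reduced = switch_power_w(155.2, 2.1, 12)  # 12 ports off is a hypothetical choice
saving = 1 - reduced / full               # fraction of switch power saved

if __name__ == "__main__":
    print(f"{reduced:.1f} W, saving {saving:.1%}")
```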
-
A framework of the on/off link method
Low traffic load is detected (e.g. port monitor, IPTraf, pilot execution)
Paths: before, and after the "before" path is deactivated
How is it implemented on Ethernet?
-
Requirements for the on/off link method
To achieve a practical on/off link activation method:
No update of the MPI communication library
Using existing functions of commercial switches
Hiding the overhead to activate the link
Stabilizing the MAC address tables while paths are updated
- Avoiding broadcast storms and communication interruption
(Figure: switches and hosts before the change; TREE 1, TREE 2, TREE 3, TREE 4)
-
Changing the paths for the on/off link operation
Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
Specifying the path by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
Each host sends and receives usual (untagged) frames
When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
When the frame leaves for a host, the switch removes the VLAN tag
(Figure: nodes 0-7; the path of PVID#v0 on VLAN v0 and the path of PVID#v1 on VLAN v1; VLAN tag v0 is attached)
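The tagging rule on this slide can be modelled in a few lines. This is a toy model and not a switch API: `Frame` and `EdgePort` are hypothetical names, and real switches do this in hardware according to their port VLAN configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[str] = None  # None means an untagged frame


class EdgePort:
    """A switch port that connects a host (hypothetical model)."""

    def __init__(self, pvid: str):
        self.pvid = pvid

    def ingress(self, frame: Frame) -> Frame:
        # Frame arriving from the host: attach the port's PVID as the VLAN tag.
        return Frame(frame.src, frame.dst, vlan=self.pvid)

    def egress(self, frame: Frame) -> Frame:
        # Frame leaving toward the host: strip the VLAN tag.
        return Frame(frame.src, frame.dst, vlan=None)


port = EdgePort(pvid="v0")
sent = Frame(src="host0", dst="host7")
tagged = port.ingress(sent)      # travels along the path set of VLAN v0
port.pvid = "v1"                 # changing the PVID selects another path set
retagged = port.ingress(sent)    # later frames travel along VLAN v1
delivered = EdgePort(pvid="v1").egress(retagged)
```

Because the hosts only ever see untagged frames, the path can be changed by editing switch port settings alone, which is consistent with the requirement of not updating the MPI library.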
-
When a deactivated link is activated
(1) Activate the target link, using the no-shutdown command of the switch
(2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
(3) Update the PVIDs of the ports that connect hosts to v0
(Figure: nodes 0-7, before the update; the PVID is updated to v0 when the traffic increases)
-
When an activated link is deactivated
(1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
(2) Update the PVIDs of the ports that connect hosts to v1
(3) Deactivate the link
(Figure: nodes 0-7; before: the path of PVID v0; steps 1 and 2 as the traffic decreases: the path of PVID v1; then the link is deactivated)
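The ordering of the steps in both procedures matters: the PVID should only ever point at a VLAN whose paths use links that are up. The sketch below is a toy state model for checking that property after each step. `Fabric`, `activate`, and `deactivate` are hypothetical names, and the link labels "A" and "B" are made up for illustration.

```python
class Fabric:
    """Toy model of the switch state touched by the on/off procedure."""

    def __init__(self, links_up, vlans, pvid):
        self.links_up = set(links_up)  # links currently enabled
        self.vlans = {name: set(links) for name, links in vlans.items()}
        self.pvid = pvid               # PVID applied at the host-facing ports

    def consistent(self) -> bool:
        # Safe if every link used by the VLAN currently in use is up.
        return self.vlans[self.pvid] <= self.links_up


def activate(fabric, link, vlan, vlan_links):
    """Activation order: link up, create VLAN, then switch the PVID."""
    checks = []
    fabric.links_up.add(link)
    checks.append(fabric.consistent())
    fabric.vlans[vlan] = set(vlan_links)
    checks.append(fabric.consistent())
    fabric.pvid = vlan
    checks.append(fabric.consistent())
    return checks


def deactivate(fabric, link, vlan, vlan_links):
    """Deactivation order: create VLAN, switch the PVID, then link down."""
    checks = []
    fabric.vlans[vlan] = set(vlan_links)
    checks.append(fabric.consistent())
    fabric.pvid = vlan
    checks.append(fabric.consistent())
    fabric.links_up.discard(link)
    checks.append(fabric.consistent())
    return checks


fabric = Fabric(links_up={"A", "B"}, vlans={"v0": {"A", "B"}}, pvid="v0")
off_checks = deactivate(fabric, "B", "v1", {"A"})
on_checks = activate(fabric, "B", "v0", {"A", "B"})
```

In this model every intermediate state is consistent for both orderings, whereas shutting the link down first would leave the active VLAN pointing at a dead link.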
-
Outline
Ethernet for HPC
- Link aggregation (channel group) + multi-paths
On/Off link activation method
Evaluations
- Overhead of On/Off link operation: on/off link operation, overhead to modify the path set
- Performance and power consumption of PC clusters
Switches: Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports)
We can buy them at $1,000-3,000
-
Fundamental evaluation (1): on/off overhead
A link is operated continuously: on, off, on
With STP enabled, the overhead becomes some dozens of seconds to 1 min
To hide this overhead, paths should be updated after the on/off operation completes
On/off link operation (sec)
PC5324    4.0
PC6224    3.4
PC6248    2.2
SF-420    12.0
-
Fundamental evaluation (2): overhead to update paths
Measuring the overhead to change paths using VLANs
Communication is not interrupted!!
This enables runtime on/off link activation
(Figure: before and after; the PVID is updated to v1; VLAN v0, VLAN v1)
Path update (sec)
PC5324    0
PC6224    0
PC6248    0
SF-420    0
-
Performance evaluation on a PC cluster
PC cluster: 128 hosts, Dual Opteron 1.8 GHz x2, MPICH 1.2.7p1
GbE switch: Dell PowerConnect 6248, 28 hosts per switch, 48 ports @ 8
Application: NPB 3.2
-
Topology of the cluster
Peak: 4x2 torus, 6 links between switches
Enabling link aggregation (IEEE 802.3ad)
Pre-executing the applications to estimate the traffic amount
Setting up the on/off link set before execution
Two on/off link selection algorithms
- Conservative: maintain the maximum amount of traffic on a link
- Aggressive: further power reduction
Details are in the proceedings
(Figure: torus)
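The slides defer the details of the two selection algorithms to the proceedings, so the following is not the authors' algorithm. It is one possible reading, sketched under stated assumptions: traffic per switch pair is known from the pre-execution, the load is spread evenly over an aggregated link group, and at least one link per pair stays on. The `slack` parameter and all names are hypothetical.

```python
import math


def links_needed(traffic: float, per_link_cap: float, max_links: int) -> int:
    """Smallest number of active links keeping per-link traffic <= per_link_cap."""
    if traffic <= 0:
        return 1  # keep one link so the switch pair stays connected (assumption)
    return min(max_links, max(1, math.ceil(traffic / per_link_cap)))


def select_links(traffic_by_pair: dict, max_links: int, slack: float = 1.0) -> dict:
    """Choose the number of active links for each switch pair.

    The cap is the heaviest per-link load seen with all links on, scaled by
    `slack`. slack == 1.0 stands in for a conservative policy and slack > 1.0
    for a more aggressive one; both are illustrative assumptions.
    """
    peak_per_link = max(traffic_by_pair.values()) / max_links
    cap = peak_per_link * slack
    return {pair: links_needed(t, cap, max_links) for pair, t in traffic_by_pair.items()}


# Hypothetical traffic estimates (arbitrary units) for three switch pairs.
traffic = {"s0-s1": 600.0, "s1-s2": 150.0, "s2-s3": 0.0}
conservative = select_links(traffic, max_links=6)
aggressive = select_links(traffic, max_links=6, slack=2.0)
```

With these made-up numbers the conservative setting keeps 6, 2, and 1 links, and the aggressive setting keeps 3, 1, and 1, trading a higher per-link load for fewer active ports.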
-
Results of NPB (64 procs, PC6248 switches)
Fig 1: Performance
Fig 2: Power cons of NWs, PC6248s (y-axis: relative power cons (W), 0.6 to 1.1; x-axis: EP, IS, LU, SP; series: peak (all links), conservative, aggressive)
26% of NW power cons is reduced without performance degradation
The conservative policy maintained almost the peak performance
Chart data for Fig 1: relative Mop/s (64 procs, PC6248)
      peak (all links)   conservative   aggressive
BT    1                  0.9704
CG    1                  0.9928
MG    1                  1.109
EP    1                  0.9902         1.006
IS    1                  0.9809         0.7193
LU    1                  0.9865         0.9453
SP    1                  0.999          0.9698
Chart data for Fig 2: relative power cons (64 procs, PC6248)
      peak (all links)   conservative   aggressive
BT    1                  0.935
CG    1                  0.93
MG    1                  0.87
EP    1                  0.825          0.735
IS    1                  0.93           0.88
LU    1                  0.95           0.8
SP    1                  0.945          0.8
Chart data: relative power cons (64 procs, SF-420)
      peak (all links)   conservative   aggressive
BT    1                  0.942
CG    1                  0.937
MG    1                  0.883
EP    1                  0.843          0.762
IS    1                  0.937          0.892
LU    1                  0.955          0.82
SP    1                  0.951          0.82
Chart data: relative power cons (64 procs, PC5324)
      peak (all links)   conservative   aggressive
BT    1                  0.908
CG    1                  0.901
MG    1                  0.817
EP    1                  0.754          0.627
IS    1                  0.901          0.831
LU    1                  0.93           0.718
SP    1                  0.923          0.718
Chart data: relative Mop/s (128 nodes)
      peak (all links)   conservative   aggressive
CG    1                  0.9808         0.5166
LU    1                  0.9948         1.0426
MG    1                  1.081          0.9324
Chart data: relative power cons (128 nodes)
      peak (all links)   conservative   aggressive
CG    1                  0.902          0.853
LU    1                  0.951          0.869
MG    1                  0.955          0.921
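Power alone does not show whether a slower run costs more energy overall. As a rough check, relative energy can be estimated as relative power divided by relative performance, assuming run time scales inversely with Mop/s. The values below are the PC6248, 64-process figures from the chart data; the names are illustrative.

```python
# Relative values for 64 processes on PC6248, normalized to the all-links case.
# benchmark: (perf_conservative, perf_aggressive, power_conservative, power_aggressive)
RESULTS = {
    "EP": (0.9902, 1.006, 0.825, 0.735),
    "IS": (0.9809, 0.7193, 0.93, 0.88),
    "LU": (0.9865, 0.9453, 0.95, 0.80),
    "SP": (0.999, 0.9698, 0.945, 0.80),
}


def relative_energy(rel_power: float, rel_perf: float) -> float:
    """Energy relative to peak, assuming run time scales as 1 / relative Mop/s."""
    return rel_power / rel_perf


def energy_table() -> dict:
    """Relative energy per benchmark as (conservative, aggressive), 2 decimals."""
    return {
        name: (round(relative_energy(pc, fc), 2), round(relative_energy(pa, fa), 2))
        for name, (fc, fa, pc, pa) in RESULTS.items()
    }


if __name__ == "__main__":
    for name, (cons, aggr) in energy_table().items():
        print(f"{name}: conservative {cons:.2f}, aggressive {aggr:.2f}")
```

Under this estimate the aggressive setting for IS uses more network energy than the all-links case, because its performance loss outweighs the power saving, while EP, LU, and SP all come out below 1.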
-
Results of NPB (64 procs, other switches)
Fig 3: Power cons, SF-420s (y-axis: relative power cons (W), 0.6 to 1.1; x-axis: EP, IS, LU, SP; series: peak (all links), conservative, aggressive)
Fig 4: Power cons, PC5324 (same axes and series)
The L2 switches achieve a larger relative reduction in power cons
Fewer services are always running in the L2 switch (PC5324) than in the L3 switch (PC6248)
-
Related Work
On/Off interconnection networks
- M. Alonso [IPDPS05], V. Soteriou [TPDS07]
- Cannot be directly applied to Ethernet
- Our on/off link method makes it possible to support some of them in Ethernet
DVFS for interconnection networks
- L. Shang [HPCA03], J.M. Stine [CAL04]
- Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
- Dell switch PC6248, per port: 10M: 1.1 W, 100M: 1.3 W, GbE: 2.1 W
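For comparison with port shutdown, the per-port figures quoted for the PC6248 give a sense of what lowering the link speed could save. This is a small sketch; the "off" entry of 0 W follows the earlier statement that port shutdown reduces port power to zero, and the names are illustrative.

```python
# Per-port power on the PC6248 (watts), as quoted on the slide; "off" assumes
# a shut-down port consumes nothing.
PORT_POWER_W = {"GbE": 2.1, "100M": 1.3, "10M": 1.1, "off": 0.0}


def saving_vs_gbe(mode: str) -> float:
    """Fraction of per-port power saved relative to a port running at GbE."""
    return 1 - PORT_POWER_W[mode] / PORT_POWER_W["GbE"]


if __name__ == "__main__":
    for mode in ("100M", "10M", "off"):
        print(f"{mode}: {saving_vs_gbe(mode):.0%} saved per port")
```

By these figures, dropping a port to 10M saves a little under half of its power, while shutting it down saves all of it.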
-
Conclusions
We propose the on/off link method on Ethernet
- Using the port-shutdown command to reduce power cons
- Switch ports consume up to 60% of the power cons of a GbE switch
- Stabilizing the update of the MAC address table
Evaluations on the PC cluster with GbE switches
- No overhead to update paths
- NW power cons is reduced by up to 37%
We will provide a total Ethernet solution for low-power PC clusters
- Link aggregation + multi-path topology + on/off links
-
Tree (1 link)     558.8
Tree (3 links)    947.7
Tree (6 links)    1034
Compl (1 link)    1069
Compl (2 links)   1081
Torus (1 link)    901.7
Torus (3 links)   1074
Torus (6 links)   1077
Mesh (1 link)     793.3
Mesh (3 links)    1022
Mesh (6 links)    1055
Ring (8 links)    1066