An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters

Michihiro Koibuchi (NII, Japan), Tomohiro Otsuka (Keio U, Japan), Hiroki Matsutani (U of Tokyo, Japan), Hideharu Amano (Keio U / NII, Japan)


Post on 19-Mar-2016


DESCRIPTION

Interconnects share @ TOP500 (Nov 2008): Gigabit Ethernet (GbE), 56%.

TRANSCRIPT


  • HPC PC Clusters with Ethernet
    - Host/CPU: various low-power techniques are used (DVFS, power gating)
    - Ethernet switch: always active, prepared for packet injection
    - We propose and evaluate a low-power technique for the Ethernet switches of PC clusters

  • Outline
    - Ethernet for HPC: link aggregation (channel group) + multi-paths
    - On/Off link activation method
    - Evaluations
      - Overhead of On/Off link operation
      - Performance and power consumption of PC clusters

  • Ethernet on HPC systems
    - Increasing the number of ports of GbE switches: 24/48-port switches provide the lowest cost per port
    - Improving the computation power of hosts (> 10 GFlops)
    - Link aggregation [IEEE 802.3ad] and multi-path topology [Kudoh, IEEE Cluster, 2004] [Viking, Infocom 2004] drastically increase the number of links
    - Figure: switches and hosts, with link aggregation using 3 links

  • Power consumption of GbE switches
    - Power consumption is almost constant regardless of traffic load
    - The number of activated ports dominates the power consumption of switches
    - The power consumption of a port is reduced to zero by the port-shutdown operation

    Unit: W
    Product   Port   Other (Xbar)   Total   Ratio of ports
    PC5324    1.2    14.9            42.9   65%
    PC6224    2.0    42.5            91.1   53%
    PC6248    2.1    56.8           155.2   63%
    SF-420    1.0    32.6            55.4   41%
    C-3750    1.8    84.5           127.7   34%
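The "ratio of ports" column can be checked against the other columns. The following is a minimal Python sketch, assuming the ratio is computed as (Total - Other) / Total and rounded to a whole percent. That formula is an inference from the numbers, not something the slide states, and the names `SWITCH_POWER` and `port_share_percent` are illustrative.

```python
# (total, other/xbar) power per switch, taken from the table above.
# Assumption: the "ratio of ports" column equals (total - other) / total.
SWITCH_POWER = {
    "PC5324": (42.9, 14.9),
    "PC6224": (91.1, 42.5),
    "PC6248": (155.2, 56.8),
    "SF-420": (55.4, 32.6),
    "C-3750": (127.7, 84.5),
}


def port_share_percent(total, other):
    """Share of the total switch power attributed to ports, in whole percent."""
    return round(100 * (total - other) / total)


shares = {name: port_share_percent(t, o) for name, (t, o) in SWITCH_POWER.items()}
print(shares)
```

Under that assumption the computed shares agree with the percentages listed on the slide, which is what motivates shutting ports down rather than tuning the rest of the switch.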

  • Overview of the on/off link method
    - Network load is not always high (e.g. during computation time)
    - Switch ports consume 40-60% of the total power
    - When the traffic load becomes low, a part of the links is turned off
    - Figure: switches and nodes connected by TREE 1 to TREE 4


  • A framework of the on/off link method
    - Low traffic load is detected (e.g. port monitor, IPTraf, pilot execution)
    - Paths: before and after; the before path is deactivated
    - How is it implemented on Ethernet?

  • Requirements for the on/off link method
    To achieve a practical on/off link activation method:
    - No update of the MPI communication library
    - Using existing functions of commercial switches
    - Hiding the overhead to activate the link
    - Stabilizing the MAC address tables while paths are updated, avoiding broadcast storms and communication interruption
    - Figure: switches and hosts with TREE 1 to TREE 4 (before)

  • Changing the paths for the on/off link operation
    - Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
    - The path is specified by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
    - Each host sends and receives usual (untagged) frames
    - When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
    - When the frame leaves for a host, the switch removes the VLAN tag
    - Figure: nodes 0 to 7 with the path of PVID v0 (VLAN v0) and the path of PVID v1 (VLAN v1); VLAN tag v0 is attached
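The tagging rule on this slide can be illustrated with a small Python model. This is only a sketch of the behaviour described above: the `Frame` class and the two functions are hypothetical names, and a real switch applies this in hardware according to its port VLAN configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an untagged frame


def ingress_from_host(frame, pvid):
    """Edge port: tag an untagged host frame with the port's PVID."""
    if frame.vlan is not None:
        raise ValueError("hosts are expected to send untagged frames")
    return Frame(frame.src, frame.dst, vlan=pvid)


def egress_to_host(frame):
    """Edge port: strip the VLAN tag before the frame reaches the host."""
    return Frame(frame.src, frame.dst, vlan=None)


sent = Frame("host0", "host7")
in_network = ingress_from_host(sent, pvid=0)  # forwarded along VLAN v0's paths
delivered = egress_to_host(in_network)        # the host sees an untagged frame
rerouted = ingress_from_host(sent, pvid=1)    # same host, now on VLAN v1's paths
```

Because the hosts only ever see untagged frames, changing the PVID on the switch side moves traffic to another path set without touching the hosts or the MPI library.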

  • When a deactivated link is activated (when the traffic increases)
    (1) Activate the target link, using the no-shutdown command of the switch
    (2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
    (3) Update the PVIDs of the ports connecting hosts to v0
    - Figure: nodes 0 to 7, before and after updating the PVID to v0

  • When an activated link is deactivated (when the traffic decreases)
    (1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
    (2) Update the PVIDs of the ports connecting hosts to v1
    (3) Deactivate the link
    - Figure: nodes 0 to 7; the path of PVID v0 (before), steps 1 and 2, and the path of PVID v1 after deactivating the link
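The ordering of the activation and deactivation steps matters: traffic should never be tagged into a VLAN whose paths cross a link that is down. The sketch below is a toy Python model of that invariant, not switch configuration code. The `Network` class, its method names, and the link names are all hypothetical.

```python
class Network:
    """Toy model of link state, VLAN path sets, and the PVID used by host ports."""

    def __init__(self, up_links, vlan_links, pvid):
        self.up_links = set(up_links)
        self.vlan_links = {vlan: set(links) for vlan, links in vlan_links.items()}
        self.pvid = pvid
        self.log = []

    def _record(self, op, arg):
        self.log.append((op, arg))
        # Invariant: the VLAN that host frames are tagged into uses only up links.
        if not self.vlan_links[self.pvid] <= self.up_links:
            raise RuntimeError(f"{op} {arg}: active paths cross a down link")

    def no_shutdown(self, link):
        self.up_links.add(link)
        self._record("no shutdown", link)

    def shutdown(self, link):
        self.up_links.discard(link)
        self._record("shutdown", link)

    def create_vlan(self, vlan, links):
        self.vlan_links[vlan] = set(links)
        self._record("create vlan", vlan)

    def set_pvid(self, vlan):
        self.pvid = vlan
        self._record("set pvid", vlan)


def activate(net, link, new_vlan, new_links):
    net.no_shutdown(link)                  # (1) bring the link up first
    net.create_vlan(new_vlan, new_links)   # (2) path set that includes the link
    net.set_pvid(new_vlan)                 # (3) move host traffic onto it


def deactivate(net, link, new_vlan, new_links):
    net.create_vlan(new_vlan, new_links)   # (1) path set that avoids the link
    net.set_pvid(new_vlan)                 # (2) move host traffic off the link
    net.shutdown(link)                     # (3) only then take the link down


net = Network({"a", "b", "c"}, {0: {"a", "b", "c"}}, pvid=0)
deactivate(net, "c", new_vlan=1, new_links={"a", "b"})
activate(net, "c", new_vlan=0, new_links={"a", "b", "c"})
print([op for op, _ in net.log])
```

Shutting a link down before the PVID has been moved away from it violates the invariant in this model, which mirrors why the slides place the link operation last when deactivating and first when activating.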

  • Outline
    - Ethernet for HPC: link aggregation (channel group) + multi-paths
    - On/Off link activation method
    - Evaluations
      - Overhead of On/Off link operation: on/off link operation, overhead to modify the path set
      - Performance and power consumption of PC clusters
    - Switches: Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports); they can be bought at $1,000-3,000

  • Fundamental evaluation (1): On/Off overhead
    - A link is operated continuously: on, off, on
    - When STP is enabled, the overhead becomes some dozens of seconds to 1 min
    - To hide this overhead, paths should be updated after the on/off operation has completed

    Switch    On/Off link op. (sec)
    PC5324     4.0
    PC6224     3.4
    PC6248     2.2
    SF-420    12.0

  • Fundamental evaluation (2): overhead to update paths
    - Measure the overhead to change paths using VLANs (before: VLAN v0; after: VLAN v1, by updating the PVID to v1)
    - Communication is not interrupted
    - This enables runtime on/off link activation

    Switch    Path update (sec)
    PC5324    0
    PC6224    0
    PC6248    0
    SF-420    0

  • Performance evaluation on a PC cluster
    - PC cluster: 128 hosts, Dual Opteron 1.8GHz x2
    - MPI: MPICH 1.2.7p1
    - GbE switch: Dell PowerConnect 6248, 28host per switch, 48port@8
    - Application: NPB 3.2

  • Topology of the cluster
    - Peak: 4x2 torus, 6 links between switches
    - Enabling link aggregation (IEEE 802.3ad)
    - Pre-executing the applications to estimate the traffic amount
    - Setting up the on/off link set before execution
    - Two on/off link selection algorithms (details are in the proceedings)
      - Conservative: maintain the maximum amount of traffic on a link
      - Aggressive: further power reduction
    - Figure: torus
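The slides do not spell out the two selection algorithms, so the following Python sketch is only one plausible reading of them and should not be taken as the authors' method. It assumes that a pilot run yields an estimated traffic volume per switch pair, that the conservative policy keeps the load per link at or below the busiest link's load with all links on, and that the aggressive policy tolerates a higher load per link. The function name, the `slack` parameter, and the traffic numbers are all hypothetical.

```python
import math


def select_active_links(pair_traffic, links_per_pair, slack=1.0):
    """Return how many aggregated links to keep on for each switch pair.

    pair_traffic: estimated traffic per switch pair from a pilot run (any unit).
    slack: 1.0 keeps the per-link load at or below that of the busiest link
           with all links on; values above 1.0 allow more load per link and
           therefore switch off more links.
    """
    busiest = max(pair_traffic.values())
    if busiest == 0:
        return {pair: 1 for pair in pair_traffic}
    per_link_cap = slack * busiest / links_per_pair
    active = {}
    for pair, traffic in pair_traffic.items():
        needed = math.ceil(traffic / per_link_cap)
        # Keep at least one link so every switch pair stays connected.
        active[pair] = min(links_per_pair, max(1, needed))
    return active


# Hypothetical pilot-run estimates for three switch pairs, 6 links per pair.
traffic = {("sw0", "sw1"): 600, ("sw1", "sw2"): 150, ("sw2", "sw3"): 0}
conservative = select_active_links(traffic, links_per_pair=6, slack=1.0)
aggressive = select_active_links(traffic, links_per_pair=6, slack=2.0)
print(conservative)
print(aggressive)
```

Since the link set is fixed before execution, a function like this would run once per application after the pilot run rather than during the job.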

  • Results of NPB (64 procs, PC6248 switches)
    - Fig 1: Performance
    - Fig 2: Power consumption of the networks (PC6248s)
    - The conservative policy maintained almost the peak performance
    - 26% of the network power consumption is reduced without performance degradation

    Fig 1: Relative Mop/s (64 procs, PC6248)
    Benchmark   peak (all links)   conservative   aggressive
    EP          1                  0.9902         1.006
    IS          1                  0.9809         0.7193
    LU          1                  0.9865         0.9453
    SP          1                  0.999          0.9698

    Fig 2: Relative power consumption (64 procs, PC6248)
    Benchmark   peak (all links)   conservative   aggressive
    EP          1                  0.825          0.735
    IS          1                  0.93           0.88
    LU          1                  0.95           0.8
    SP          1                  0.945          0.8

    Fig 3: Relative power consumption (64 procs, SF-420)
    Benchmark   peak (all links)   conservative   aggressive
    EP          1                  0.843          0.762
    IS          1                  0.937          0.892
    LU          1                  0.955          0.82
    SP          1                  0.951          0.82

    Fig 4: Relative power consumption (64 procs, PC5324)
    Benchmark   peak (all links)   conservative   aggressive
    EP          1                  0.754          0.627
    IS          1                  0.901          0.831
    LU          1                  0.93           0.718
    SP          1                  0.923          0.718

  • Results of NPB (64 procs, other switches)
    - Fig 3: Power consumption, SF-420s
    - Fig 4: Power consumption, PC5324
    - Fewer services are always running in an L2 switch (PC5324) than in an L3 switch (PC6248)
    - The L2 switches achieve a larger ratio of power reduction
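The headline reductions follow directly from the relative power values in the chart data. Here is a short Python sketch of that arithmetic, using the relative network power of the EP benchmark under the aggressive policy for each switch model. The dictionary and function names are illustrative.

```python
# Relative network power for EP, aggressive policy (all links on = 1.0),
# as given in the chart data for the 64-process runs.
RELATIVE_POWER_EP_AGGRESSIVE = {"PC6248": 0.735, "SF-420": 0.762, "PC5324": 0.627}


def reduction_percent(relative_power):
    """Power saved relative to the all-links-on configuration, in percent."""
    return (1.0 - relative_power) * 100.0


reductions = {
    switch: round(reduction_percent(rel), 1)
    for switch, rel in RELATIVE_POWER_EP_AGGRESSIVE.items()
}
print(reductions)
```

The PC6248 value corresponds to the roughly 26% reduction quoted for Fig 2, and the PC5324 value to the figure of up to 37% in the conclusions.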

  • Related Work
    - On/Off interconnection networks: M.Alonso [IPDPS05], V.Soteriou [TPDS07]
      - Cannot be directly applied to Ethernet
      - Our on/off link method makes it possible to support some of them in Ethernet
    - DVFS for interconnection networks: L.Shang [HPCA03], J.M.Stine [CAL04]
      - Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
      - Dell switch PC6248: 10M: 1.1W, 100M: 1.3W, GbE: 2.1W
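To compare the multi-speed approach with port shutdown, the PC6248 figures can be turned into savings relative to a GbE port. This Python sketch assumes the quoted wattages are per port (the GbE value matches the per-port figure in the power table) and takes a shut-down port as 0 W, as stated on the power consumption slide. The names are illustrative.

```python
# Power per port of a PC6248 at each link speed, in watts.
# Assumption: the values quoted for the multi-speed case are per port.
PORT_POWER_W = {"off": 0.0, "10M": 1.1, "100M": 1.3, "GbE": 2.1}


def saving_percent(mode, baseline="GbE"):
    """Port power saved by switching from the baseline speed to the given mode."""
    base = PORT_POWER_W[baseline]
    return 100.0 * (base - PORT_POWER_W[mode]) / base


for mode in ("100M", "10M", "off"):
    print(mode, round(saving_percent(mode)))
```

Under these assumptions, dropping a port to a lower speed saves less than half of its power, while shutting it down saves all of it.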

  • Conclusions
    - We propose the on/off link method on Ethernet
      - Using the port-shutdown command to reduce power consumption: switch ports consume up to 60% of the power of a GbE switch
      - Stabilizing the update of the MAC address table
    - Evaluations on the PC cluster with GbE switches
      - No overhead to update paths
      - Network power consumption reduced by up to 37%
    - We will provide a total Ethernet solution for low-power PC clusters: link aggregation + multi-path topology + on/off links

    Tree (1 link)     558.8
    Tree (3 links)    947.7
    Tree (6 links)    1034
    Compl (1 link)    1069
    Compl (2 links)   1081
    Torus (1 link)    901.7
    Torus (3 links)   1074
    Torus (6 links)   1077
    Mesh (1 link)     793.3
    Mesh (3 links)    1022
    Mesh (6 links)    1055
    Ring (8 links)    1066