A View from the Facility Operations Side on the Water/Air Cooling System of the K Computer
Jorji Nonaka, Keiji Yamamoto, Akiyoshi Kuroda, Toshiyuki Tsukamoto (RIKEN R-CCS)
Kazuki Koiso, Naohisa Sakamoto (Kobe University)
Abstract
The Operations and Computer Technologies Division at the RIKEN R-CCS is responsible for the operations of the entire HPC facility, which includes the supercomputer itself and its auxiliary subsystems, such as the power supply and water/air cooling subsystems. Part of these subsystems will be reused for the next supercomputer, Fugaku, so a better understanding of their operational behavior and of their potential impacts, especially on hardware failure and power consumption, would be greatly beneficial. In this poster, we present some preliminary impressions of the impact of the water/air cooling system on the K computer, focusing on the potential benefits of the low water and air temperatures used for the CPU (15°C) and DRAM (17°C), respectively, produced by the chilled water cooling system. We expect that the obtained knowledge will be helpful for decision support and/or operation planning for the next supercomputer, Fugaku.
Contact: Jorji Nonaka <[email protected]>
HPC Usability Development Unit (HUD Unit)
Operations and Computer Technologies Division
RIKEN Center for Computational Science
Acknowledgements
Part of the results was obtained by using the K computer at the RIKEN R-CCS. We are grateful to the colleagues at the RIKEN R-CCS who directly or indirectly collaborated in this work. We especially thank Fumiyoshi Shoji (Director of the Operations and Computer Technologies Division), Atsuya Uno (Unit Leader of the System Operations and Development Unit), and Shun Ito (currently at Fujitsu) for their helpful collaboration during the experiments, as well as the local Fujitsu staff for their supportive assistance.
CPU cooling water
Chilled water at 10°C is used to control the CPU cooling water temperature (set to 15°C). This graph shows the input and output water temperatures and the water flow inside a heat exchanger over one day.
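The three quantities plotted here (input temperature, output temperature, and water flow) are enough to estimate how much heat the water carries away, using the standard relation Q = m_dot * c_p * (T_out - T_in). The following is a minimal sketch of that estimate, not the facility's monitoring method. The function name, the units, and the example values are assumptions made for illustration and are not taken from the poster.

```python
# Minimal sketch: heat carried away by a water loop, Q = m_dot * c_p * dT.
# The constants are approximate properties of water near 15 °C.
WATER_DENSITY_KG_PER_L = 0.999
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0


def heat_removed_kw(flow_l_per_min, t_in_c, t_out_c):
    """Return the heat removed by the water in kW.

    flow_l_per_min: volumetric water flow in litres per minute
    t_in_c:         water temperature entering the load, in °C
    t_out_c:        water temperature leaving the load, in °C
    """
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    delta_t_k = t_out_c - t_in_c
    watts = mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k
    return watts / 1000.0


if __name__ == "__main__":
    # Hypothetical readings, chosen only to exercise the function.
    print(f"{heat_removed_kw(60.0, 15.0, 20.0):.1f} kW")
```

For a fixed flow, the result scales linearly with the temperature difference, so a one-day log of the three plotted series can be turned into a heat-load curve sample by sample.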
Idle mode
This graph shows the impact of the cooling water temperature on the power consumption of an entire compute rack (T45) during an idle period of the K computer. Relative to the 15°C setting, we observed an increase in energy consumption of around 1.75% at 20°C and 3.5% at 25°C.
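A percentage such as the ones quoted above can be derived from logged power readings in two steps: integrate power over time to obtain energy, then compare the result against the baseline run. The sketch below illustrates that arithmetic under stated assumptions. The sample format of (time in seconds, power in kW), the function names, and the numbers are hypothetical; the poster does not describe the logging device's output format.

```python
# Minimal sketch: energy from a power log and its relative increase.
# Assumes samples are (time_s, power_kw) tuples sorted by time.


def energy_kwh(samples):
    """Integrate power over time with the trapezoidal rule; returns kWh."""
    total_kj = 0.0  # kW * s = kJ
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total_kj += 0.5 * (p0 + p1) * (t1 - t0)
    return total_kj / 3600.0  # 1 kWh = 3600 kJ


def relative_increase_percent(baseline, measured):
    """Relative increase of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical one-hour idle logs at two water temperature settings.
    log_baseline = [(0, 10.0), (1800, 10.0), (3600, 10.0)]
    log_warmer = [(0, 10.2), (1800, 10.2), (3600, 10.2)]
    e0 = energy_kwh(log_baseline)
    e1 = energy_kwh(log_warmer)
    print(f"increase: {relative_increase_percent(e0, e1):.2f}%")
```

Comparing integrated energy rather than single power readings keeps short spikes in the log from dominating the result.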
Benchmark applications
We used five benchmark applications with well-known behavior to evaluate the power consumption of an entire compute rack (T45). We observed a power consumption increase of less than 4% when raising the CPU cooling water temperature by 10°C (to 25°C).
Conclusions
We observed in practice some of the theoretical benefits (for energy consumption and hardware failure) of using a low cooling water temperature (15±1°C) when running the K computer. We also observed that, even with the CPU cooling water temperature raised by 10°C, the hardware may still operate within specification, with limited impact on energy consumption and the hardware failure rate. We expect that the obtained knowledge will be helpful for decision support and operation planning for the next supercomputer, Fugaku.
Temperature variation inside a compute rack
Variation of the CPU and cooling air temperatures inside a compute rack (T45) during the execution of several benchmark applications: SLEEP (do nothing); PEK99 (CPU intensive); MEM72 (memory intensive); SUB09 (balanced CPU/memory use); and ADVMV (kernel from a production-grade application).
[Figure labels: CPU / ICC; SB / DRAM; cooling water (around 15°C); cooling air (around 17°C); energy consumption; hardware failure]
CPU and DRAM failures
Spatiotemporal distribution of the compute racks in which CPUs and DRAM were replaced due to hardware failure (from Feb. 2012 to May 2019). The accumulated number of failures per rack did not exceed three (CPU) and five (DRAM), and the racks with higher DRAM failure counts were concentrated in the neighborhood of rack T45.
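The accumulated per-rack counts described above amount to grouping replacement records by rack and component type. The sketch below shows one way to do that tally. The record layout of (rack ID, component) and the sample entries are hypothetical and do not reproduce the actual K computer failure data.

```python
# Minimal sketch: accumulate component replacements per rack.
# Assumes each record is a (rack_id, component) tuple.
from collections import Counter


def failures_per_rack(replacements, component):
    """Count replacements of `component` ("CPU" or "DRAM") for each rack."""
    return Counter(rack for rack, comp in replacements if comp == component)


if __name__ == "__main__":
    # Hypothetical replacement log, for illustration only.
    log = [
        ("T45", "DRAM"),
        ("T45", "DRAM"),
        ("T44", "CPU"),
        ("T45", "CPU"),
        ("T46", "DRAM"),
    ]
    dram = failures_per_rack(log, "DRAM")
    print(dram.most_common())   # racks ordered by DRAM replacement count
    print(max(dram.values()))   # highest accumulated DRAM count in any rack
```

Keeping a timestamp in each record would allow the same tally to be broken down by period, which is what a spatiotemporal view requires.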
[Figure labels: chilled water (around 10°C); compute rack; system board with SPARC64 VIIIfx CPU, InterConnect Controller (ICC), and DDR3 memory (DRAM); water-cooling module]
Evaluations
We used a single compute rack (T45) with an attached power monitoring and logging device, together with low-priority “Micro” class jobs, to verify the temperature variation behavior and the energy consumption.