Update on Farm Monitor and Control
Domenico Galli, Bologna
RTTC meeting, Genève, 14 April 2004

Outline

- Test of Sub-Farm Monitor & Control software on Linux SLC3.
- PVSS Boot Manager.
- Changes in monitor PVSS Panels.
- IPMI-DIM power manager.

Test of Sub-Farm Monitor & Control Software on Linux SLC3

- The Sub-Farm Monitor & Control software (SFM-0.2) has been tested on Linux SLC3.
- The lm_sensors package had to be recompiled and installed in order to monitor temperatures and fans.
- The SFM-0.2 package works without recompiling.
- No incompatibilities detected.

Boot Manager

- A PVSS panel has been developed to configure the boot of the sub-farm nodes controlled by a control PC.
- The panel allows the user to add, remove and configure the nodes of a sub-farm by specifying hostname, MAC address and IP address.
- At present the panel writes a text file containing the configuration of the nodes.
- The target is to write the DHCP table for the control PC directly.
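As an illustration of the stated target, the sketch below renders a list of nodes (hostname, MAC address, IP address) as DHCP host declarations. The slides do not say which DHCP server or file format is used, so the ISC dhcpd `host` syntax, the `Node` type, and all hostnames and addresses here are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One farm node as configured in the panel (hypothetical type)."""
    hostname: str
    mac: str
    ip: str


def dhcp_host_entry(node: Node) -> str:
    """Render one node as an ISC-dhcpd-style host declaration (assumed format)."""
    return (
        f"host {node.hostname} {{\n"
        f"  hardware ethernet {node.mac};\n"
        f"  fixed-address {node.ip};\n"
        f"}}\n"
    )


def dhcp_table(nodes: list[Node]) -> str:
    """Render all nodes, sorted by hostname so the output is stable."""
    ordered = sorted(nodes, key=lambda n: n.hostname)
    return "\n".join(dhcp_host_entry(n) for n in ordered)


# Hypothetical sub-farm with two nodes.
nodes = [
    Node("sfn-001-02", "00:11:22:33:44:02", "192.168.1.12"),
    Node("sfn-001-01", "00:11:22:33:44:01", "192.168.1.11"),
]
print(dhcp_table(nodes))
```

Sorting by hostname keeps the generated table identical from one run to the next, which makes changes easy to review.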

PVSS Monitor Panels

- A button has been added to all the monitor panels to configure the thresholds for the warning and error states of the state machine.
- A PVSS script compares the monitored value with the threshold and, if it is exceeded, a state machine transition is triggered.
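The threshold check can be pictured with the following sketch. It is written in Python rather than as a PVSS script, and it assumes three states (ok, warning, error), upper thresholds only, and that "exceeded" means strictly greater than the threshold; the class and item names are hypothetical.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify(value: float, warning: float, error: float) -> State:
    """Map a monitored value to a state (assumes warning <= error)."""
    if value > error:
        return State.ERROR
    if value > warning:
        return State.WARNING
    return State.OK


class MonitoredItem:
    """Tracks the state of one monitored quantity (hypothetical helper)."""

    def __init__(self, name: str, warning: float, error: float) -> None:
        self.name = name
        self.warning = warning
        self.error = error
        self.state = State.OK

    def update(self, value: float) -> bool:
        """Store the new state; return True if a transition was triggered."""
        new_state = classify(value, self.warning, self.error)
        changed = new_state is not self.state
        self.state = new_state
        return changed


# Hypothetical CPU temperature with thresholds in degrees Celsius.
cpu = MonitoredItem("cpu_temp", warning=60.0, error=75.0)
for reading in (55.0, 61.0, 80.0):
    transition = cpu.update(reading)
    print(reading, cpu.state.value, "transition" if transition else "no change")
```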

PVSS Monitor Panels (II)

- If the button is pressed, a new panel opens, in which an "expert user" can set the alarm thresholds.

IPMI (Intelligent Platform Management Interface)

What can IPMI be useful for?

- Switching the power supply of the farm nodes on and off without using expensive network-controlled power distributors.
- Monitoring the power status of the farm nodes (on/off).
- Monitoring temperatures, fan speeds, power supply voltages, etc. in an OS-independent way.
- Accessing the on-board event log.

IPMI Interfaces

- KCS (Keyboard Controller Style) interface (AKA open interface):
  - Local interface (interface to the host OS), unauthenticated.
  - Can be accessed through the OpenIPMI Linux software.
  - Cannot be used to switch on a PC or to power-cycle a hung PC.
- LAN interface:
  - Network interface, session-based, authenticated.
  - Designed to be always available (even when the system is powered down or when the OS is hung or inactive).
  - Hardware implementation.
  - OS independent.

IPMI LAN Interface

- Server side (farm node):
  - Hardware implementation.
  - The NIC hardware redirects to the BMC the Ethernet frames containing datagrams destined to UDP port 623.
  - Configured by means of the PC startup configuration utility.
  - May use DHCP to set up network parameters.
  - No need for additional software.
- Client side (control PC):
  - Client software, e.g. IPMItool, FreeIPMI, IPMIsh (Linux software).

[Diagram: Control PC (IPMI client), LAN, Farm node, Network Controller, Baseboard Management Controller (BMC). Traffic for UDP port 623 goes to the BMC, separately from the other Ethernet frames.]
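To make the role of UDP port 623 concrete, here is a small sketch of the simplest datagram a client can send to that port: an RMCP/ASF "presence ping", which a BMC with its LAN interface enabled is expected to answer with a "pong". The slides do not show this exchange; the byte layout below follows my understanding of the RMCP/ASF framing used by IPMI-over-LAN, and the helper names are hypothetical.

```python
import socket
import struct

RMCP_PORT = 623            # UDP port named on the slide
RMCP_VERSION = 0x06        # RMCP version 1.0
RMCP_NO_ACK_SEQ = 0xFF     # sequence number meaning "no RMCP ACK requested"
RMCP_CLASS_ASF = 0x06      # message class: ASF
ASF_IANA = 4542            # IANA enterprise number of ASF (0x11BE)
ASF_PRESENCE_PING = 0x80   # ASF message type: presence ping
ASF_PRESENCE_PONG = 0x40   # ASF message type: presence pong


def build_presence_ping(tag: int = 0) -> bytes:
    """Build the 12-byte RMCP/ASF presence ping datagram."""
    rmcp_header = struct.pack(
        "!BBBB", RMCP_VERSION, 0x00, RMCP_NO_ACK_SEQ, RMCP_CLASS_ASF
    )
    # IANA number, message type, message tag, reserved byte, data length (0).
    asf_message = struct.pack("!IBBBB", ASF_IANA, ASF_PRESENCE_PING, tag, 0x00, 0x00)
    return rmcp_header + asf_message


def ping_bmc(host: str, timeout: float = 1.0) -> bool:
    """Send one presence ping; return True if a pong comes back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_presence_ping(), (host, RMCP_PORT))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    # Byte 8 is the ASF message type (4-byte RMCP header + 4-byte IANA number).
    return len(reply) >= 9 and reply[8] == ASF_PRESENCE_PONG


print(build_presence_ping().hex())
```

A ping of this kind only shows that something answers on port 623; the authenticated, session-based commands are the job of client software such as the tools listed on the slide.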

IPMI Power Commands

- on: power up the chassis.
- off: power down the chassis (without a clean shutdown of the OS).
- cycle: power down, wait 1 second, and power up again.
- soft_off: initiate a soft shutdown of the OS via ACPI by emulating a fatal over-temperature condition.
- hard_reset: pulse the system reset signal.
- pulse_diag: pulse a version of a diagnostic interrupt that goes directly to the processor(s). This is typically used to cause the operating system to do a diagnostic dump (OS dependent).
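For readers who want to try these commands by hand, the sketch below builds the corresponding `ipmitool` command line over the LAN interface. The mapping from the names used on the slide to the `ipmitool chassis power` subcommands (`soft`, `reset`, `diag`) is my assumption, and the host and user names are placeholders; the function only builds the argument list and does not run anything.

```python
from __future__ import annotations

# Assumed mapping: slide command name -> `ipmitool chassis power` subcommand.
SLIDE_TO_IPMITOOL = {
    "on": "on",
    "off": "off",
    "cycle": "cycle",
    "soft_off": "soft",
    "hard_reset": "reset",
    "pulse_diag": "diag",
}


def power_command(host: str, user: str, action: str) -> list[str]:
    """Build the ipmitool argument list for one power action on one node."""
    if action not in SLIDE_TO_IPMITOOL:
        raise ValueError(f"unknown power action: {action!r}")
    return [
        "ipmitool", "-I", "lan", "-H", host, "-U", user,
        "chassis", "power", SLIDE_TO_IPMITOOL[action],
    ]


# Placeholder host and user; pass the list to subprocess.run() to execute it.
print(" ".join(power_command("sfn-001-01-bmc", "admin", "soft_off")))
```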

DIM-IPMI Power Manager

- A Power Manager (based on IPMI and DIM) to switch the power to the Farm Nodes on and off is under development.
- Each Control PC runs a DIM server interfaced to IPMI and publishes, for each node, a command and a service.

[Diagram: the IPMI-DIM server on the Control PC talks IPMI to the BMCs of the Farm Nodes (SFN-001-01 to SFN-001-05) and talks DIM to the PVSS-DIM client (PVSS GUI) and to the command-line client.]

DIM Services:
/SFN-001-01/power_status
/SFN-001-02/power_status
/SFN-001-03/power_status

DIM Commands:
/SFN-001-01/power_switch on|off|soft_off|cycle
/SFN-001-02/power_switch on|off|soft_off|cycle
/SFN-001-03/power_switch on|off|soft_off|cycle
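The naming scheme above (one service and one command per node) can be generated mechanically, as in the sketch below. It does not use the DIM library; it only builds and validates the names. Reading `SFN-001-01` as "sub-farm 001, node 01" is my assumption, and the helper names are hypothetical.

```python
from __future__ import annotations

# Arguments accepted by the power_switch command, as listed on the slide.
POWER_ACTIONS = ("on", "off", "soft_off", "cycle")


def node_names(subfarm: int, count: int) -> list[str]:
    """Node names such as SFN-001-01 (assumed: sub-farm number, node number)."""
    return [f"SFN-{subfarm:03d}-{i:02d}" for i in range(1, count + 1)]


def service_name(node: str) -> str:
    """Name of the DIM service publishing the power status of a node."""
    return f"/{node}/power_status"


def command_name(node: str) -> str:
    """Name of the DIM command switching the power of a node."""
    return f"/{node}/power_switch"


def check_action(action: str) -> str:
    """Reject arguments that the power_switch command does not accept."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power_switch argument: {action!r}")
    return action


for node in node_names(subfarm=1, count=3):
    print(service_name(node), command_name(node), "|".join(POWER_ACTIONS))
```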

Status of DIM-IPMI Power Manager

- We started by using IPMItool's libintf_lan.so library.
- Problems:
  - An IPMI response takes at least 0.7 s.
  - For a disconnected node, the timeout takes about 16 s.
  - A complete cycle over 200 nodes, to update the farm power status, therefore takes 140-3200 s (200 × 0.7 s if all nodes answer, 200 × 16 s if all time out).
- Solution:
  - Use one thread for each node to be contacted, in order to parallelize the IPMI connections.
- But:
  - The libintf_lan.so library is not thread-safe (global variables, timeouts using signals + longjmp, etc.).
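The timing argument and the proposed fix can be reproduced with a short sketch. The bounds are plain multiplication of the figures quoted above; the polling function is a stand-in that only sleeps, not a real IPMI call, so the example shows the effect of one thread per node without touching the thread-safety problem of the library. All names are hypothetical.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def sequential_bounds(n_nodes: int, best_s: float, worst_s: float) -> tuple[float, float]:
    """Total time for one sequential cycle: all nodes answer / all time out."""
    return n_nodes * best_s, n_nodes * worst_s


def poll_node(node: str, delay_s: float) -> tuple[str, str]:
    """Stand-in for an IPMI power-status query: waits, then reports 'on'."""
    time.sleep(delay_s)
    return node, "on"


def poll_all(nodes: list[str], delay_s: float) -> dict[str, str]:
    """Poll every node in its own thread (the list must not be empty)."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(pool.map(lambda n: poll_node(n, delay_s), nodes))


best, worst = sequential_bounds(200, 0.7, 16.0)
print(f"sequential cycle: {best:.0f} s to {worst:.0f} s")

nodes = [f"node-{i:03d}" for i in range(20)]
start = time.perf_counter()
status = poll_all(nodes, delay_s=0.05)
elapsed = time.perf_counter() - start
print(f"{len(status)} nodes polled in parallel in {elapsed:.2f} s")
```

With one thread per node, a full cycle costs roughly as much as the slowest single node, so a few disconnected nodes no longer stretch the cycle by 16 s each.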

Status of DIM-IPMI Power Manager (II)

- DONE:
  - IPMItool's libintf_lan.so has been deeply hacked to make it "more" thread-safe (no more global variables, no more signals and longjmps for timeouts).
  - A power manager DIM server and a command-line DIM client are ready and working (tested on a Dell PowerEdge SC 1425 without OS).
- TODO:
  - Conflicts between commands and the status monitor on the same node must be arbitrated by the DIM-IPMI server (if the NIC BMC is processing a command, it cannot receive other commands).
  - Add a mutex to the library to protect non-thread-safe system/library calls (e.g. malloc, free, etc.).
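One possible shape for the arbitration in the first TODO item is a lock per node, so that a power command and a status query addressed to the same BMC never overlap while different nodes still proceed in parallel. This is only a sketch of the idea in Python, not the server's actual design; the class and method names are hypothetical.

```python
import threading
from collections import defaultdict


class NodeArbiter:
    """Serializes operations per node; different nodes run concurrently."""

    def __init__(self) -> None:
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dictionary of locks

    def _lock_for(self, node: str) -> threading.Lock:
        with self._guard:
            return self._locks[node]

    def run(self, node: str, operation):
        """Run operation() while holding the lock of the given node."""
        with self._lock_for(node):
            return operation()


arbiter = NodeArbiter()
# A command and a status query for the same node are executed one at a time.
print(arbiter.run("SFN-001-01", lambda: "power_switch cycle sent"))
print(arbiter.run("SFN-001-01", lambda: "power_status read"))
```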

Power Manager Command-Line Client

pwSwitch [-m hostname] on|off|(cycle|soft_off)

[Screenshot of the command-line client. N.B.: node lhcbcn2 is disconnected! Annotations on the screenshot: "command time out", "service time out".]

Power Manager PVSS Client

- Work in progress.
- Basically one PVSS panel showing:
  - A list of the controlled nodes with their power status (on, off).
  - Buttons for power on / off / soft_off / cycle / power_reset / pulse_diag.