green computing observatory


Upload: germainrenaud

Post on 10-May-2015


DESCRIPTION

Talk at CGC 2011

TRANSCRIPT

Page 1: Green Computing Observatory

The Green Computing Observatory

Cécile Germain-Renaud (1), Frédéric Fürst (2), Gilles Kassel (2), Julien Nauroy (1), Michel Jouvin (3), Guillaume Philippon (3)

1: Laboratoire de Recherche en Informatique, U. Paris Sud, CNRS, INRIA

2: Université Picardie Jules Verne

3: Laboratoire de l’Accélérateur Linéaire, CNRS-IN2P3

Page 2: Green Computing Observatory

Outline

• Contexts
• Sensors
• Information Model
• Scientific issues
• Conclusion

GCG 2011 The Green Computing Observatory

Page 3: Green Computing Observatory

Motivation and Goals

• Energy usage forms a complex system
  ◦ Sophisticated HW/SW mechanisms, e.g. ACPI, dynamic over-clocking of active cores, and other optimisations based on on-line statistical monitoring
  ◦ Interaction with local cooling provisioning (e.g. fan speed) and global cooling
• Validating generative or predictive models and policies requires behavioral models based on real data
• The first barrier to improved energy efficiency is the difficulty of collecting data on the energy usage of individual components, and the lack of overall data collection.

GCO monitors energy usage at a large computing center and publishes the data through the Grid Observatory.

• A second barrier is making the collected data usable, consistent and complete.

GCO adopts an ontological approach in order to rigorously define the semantics of the data and the context of their production.


Page 4: Green Computing Observatory

The GRIF-LAL computing room


• 13 racks hosting 1U systems and 4 lower-density racks (network, storage), for a total of ≈240 machines, 2200+ cores and 500 TB of storage
• Mainly a Tier 2 in the EGI grid, but also includes local services and the StratusLab Cloud testbed
• High-throughput, worldwide workload; analysis-oriented production facility; an accessible approximation of a data center

Page 5: Green Computing Observatory

Sensors


1 GByte/day at a 5-minute sampling period
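As a rough sanity check on this figure, the arithmetic below splits the daily volume across the 5-minute sampling rounds and across the ≈240 machines quoted for the GRIF-LAL room. It assumes that 1 GByte means 10^9 bytes and that the volume is spread evenly over machines; the slides state neither.

```python
# Back-of-the-envelope volume estimate. The byte convention and the even
# spread across machines are assumptions, not measured values.
BYTES_PER_DAY = 1_000_000_000   # "1 GByte/day", taken as 10**9 bytes
SAMPLING_PERIOD_MIN = 5         # sampling period quoted on the slide
MACHINES = 240                  # "≈240 machines" in the computing room

samples_per_day = 24 * 60 // SAMPLING_PERIOD_MIN
bytes_per_round = BYTES_PER_DAY / samples_per_day
bytes_per_machine_round = bytes_per_round / MACHINES

print(f"{samples_per_day} sampling rounds per day")
print(f"{bytes_per_round / 1e6:.2f} MB per sampling round")
print(f"{bytes_per_machine_round / 1e3:.1f} kB per machine per round")
```

Under these assumptions each machine contributes on the order of ten kilobytes per sampling round.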

Page 6: Green Computing Observatory

IPMI

• IPMI = Intelligent Platform Management Interface
  ◦ Based on a specialized processor card (BMC)
  ◦ 1998: IPMI v1.0, 2001: IPMI v1.5, originally by Intel, HP, NEC, Dell
  ◦ 2004: IPMI v2.0 (matured version of IPMI)
  ◦ De facto standard implemented by all motherboard vendors
• Fine-grained monitoring of individual system parts (temperatures, fans, voltages, etc.) and much more: Recovery Control (power on/off/reset a server), Logging (System Event Log), Inventory (FRU information)
• Why? To contribute to a global approach, e.g. cooling inefficiency leads to increased fan speed, which leads to +20% in power consumption, versus the “hot servers” trend.
• http://www.intel.com/design/servers/ipmi


Page 7: Green Computing Observatory

Source: http://www.netways.de/uploads/media/Werner_Fischer_-The-Power-Of-IPMI.pdf


Page 8: Green Computing Observatory

IPMI

• IPMI = Intelligent Platform Management Interface
• The exchange protocols are defined and heavily documented
• But NOT the sensors (neither defined nor documented)
• At LAL, we have DELL PowerEdge and IBM motherboards
  ◦ Very different sensors: e.g. AVGPower (Watts) vs PSCurrent (Amps)
  ◦ Many inactive (NA), may depend on the BIOS version
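The sensor heterogeneity described on this slide can be handled with a small normalisation step. The sketch below is illustrative only: the sensor names AVGPower and PSCurrent come from the slide, but the reading format, the `to_watts` helper, the sample values, and the 230 V supply voltage used to turn amps into an apparent power are all assumptions.

```python
from typing import Optional

ASSUMED_VOLTAGE = 230.0  # volts; an assumption, not stated in the slides


def to_watts(sensor: str, raw: str) -> Optional[float]:
    """Convert one raw reading to watts, or None if it is unusable."""
    if raw.strip().lower() in ("na", "n/a", ""):
        return None                  # inactive sensor
    value = float(raw)
    if sensor == "AVGPower":         # already reported in watts
        return value
    if sensor == "PSCurrent":        # amps -> volt-amperes (approx. watts)
        return value * ASSUMED_VOLTAGE
    return None                      # sensor not known to this sketch


# Invented readings, one per motherboard family plus an inactive sensor.
readings = [("AVGPower", "210"), ("PSCurrent", "1.2"), ("PSCurrent", "na")]
for name, raw in readings:
    print(name, raw, to_watts(name, raw))
```

Returning None for inactive sensors keeps missing data explicit instead of silently recording a zero.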


Page 9: Green Computing Observatory

Smart PDU

• PGEP PULTI
  ◦ 16 outlets
• Each PDU outlet managed separately
• Query protocol: SNMP
• Embedded Web server
• Issue: the latest systems are Twin2
  ◦ 4 systems in 2U, 8-16 processors
• Useful for calibration
• Not all racks will be equipped
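One way to use the PDU for calibration is to fit the power reported by a motherboard against the power measured at its outlet. The following is a minimal sketch with made-up numbers; the linear model and the `fit_line` helper are assumptions and not the GCO procedure.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical paired samples (watts): IPMI reading vs PDU outlet reading.
ipmi_watts = [100.0, 150.0, 200.0, 250.0]
pdu_watts = [115.0, 170.0, 225.0, 280.0]

a, b = fit_line(ipmi_watts, pdu_watts)
print(f"pdu ≈ {a:.2f} * ipmi + {b:.2f}")
```

For Twin2 chassis, where several systems presumably share one outlet, the x values would have to be the sum of the per-system readings.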


Page 10: Green Computing Observatory

Ganglia

• De facto standard
• Sensors associated with an OS
• CPU load (average number of processes over a given duration: 1, 5 and 15 minutes), memory (buffered, cached, free, shared, swap, total) and network usage
• Applies to Virtual Machines as well
• Periodic acquisition of system monitoring information (/proc), transfer and storage protocols. RRD storage is inadequate for the GCO.
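As an illustration of the kind of OS-level sensor involved, the sketch below parses the load averages from a line in the format of Linux's /proc/loadavg. The sample line is invented and `parse_loadavg` is a hypothetical helper; Ganglia's own collection code is not shown in the slides.

```python
def parse_loadavg(line: str) -> dict:
    """Parse a /proc/loadavg-style line into named fields."""
    fields = line.split()
    running, total = fields[3].split("/")
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
        "runnable_entities": int(running),
        "total_entities": int(total),
    }


sample = "0.42 0.36 0.30 2/345 12345"  # invented sample line
print(parse_loadavg(sample))
```

On a Linux host the same function could be fed the contents of /proc/loadavg read at each sampling period.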


Page 11: Green Computing Observatory


Page 12: Green Computing Observatory

Information model

• There is no standard for
  ◦ The output of the physical sensors
  ◦ The integration of computational usage and physical sensors’ output
• There are standards for
  ◦ OS information: Ganglia
  ◦ Virtual Machine definition: OVF
  ◦ Centralized statistics publication: SDMX (Statistical Data and Metadata Exchange). Successful experience of porting to a Linked Data model.


Page 13: Green Computing Observatory

Extension of DOLCE


Page 14: Green Computing Observatory

Ontology – measurement concepts

• Define the semantics of the data (what is measured?) and the context of their production (how are they acquired and/or calculated?)
  ◦ Observables are individual qualities of endurants (e.g. temperature of a component) and/or perdurants (e.g. rotation speed of a fan).
  ◦ Observations make use of sensors and data acquisition chains, which are physical and non-physical (software) artifacts.
  ◦ Observation values are boolean/numeric/scalar qualia.
• Extensions/adaptations of DOLCE, FOOM (Functional Ontology of Observation and Measurement) and OGC-O&M (OGC’s Observations and Measurements standard)
• Work in progress
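The measurement concepts on this slide can be pictured as a small data model. The sketch below is only an illustration: the class names, field names and sample values are hypothetical and do not come from the GCO ontology, which the slide describes as work in progress.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Observable:
    """A quality of an endurant (component) or perdurant (process)."""
    bearer: str        # e.g. "cpu0" or "fan3 rotation"
    quality: str       # e.g. "temperature", "speed"
    bearer_kind: str   # "endurant" or "perdurant"


@dataclass(frozen=True)
class Sensor:
    """A physical or software artifact in the acquisition chain."""
    name: str
    physical: bool


@dataclass(frozen=True)
class Observation:
    """One observed value, with what was measured and how."""
    observable: Observable
    sensor: Sensor
    timestamp: str                   # ISO 8601 string, an assumption
    value: Union[bool, int, float]
    unit: str


fan_speed = Observable("fan3 rotation", "speed", "perdurant")
bmc = Sensor("BMC fan tachometer", physical=True)
obs = Observation(fan_speed, bmc, "2011-12-01T00:05:00Z", 4200.0, "RPM")
print(obs)
```

Keeping the observable and the sensor as separate objects mirrors the slide's split between what is measured and how it is acquired.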


Page 15: Green Computing Observatory

Publication: XML files

• At a 5-minute sampling period, 1 GB/day
• Scalable querying with XPath
• Integration capability with XInclude
• Easy conversion to analysis-focused formats, e.g. MATLAB, with XSL
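To make the XPath point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports a subset of XPath. The element and attribute names, and the values, are invented for illustration; the actual GCO schema is not reproduced in this transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; NOT the real GCO schema.
doc = """
<observations>
  <machine id="node001">
    <sample time="2011-12-01T00:00:00Z" sensor="AVGPower" unit="W">210</sample>
    <sample time="2011-12-01T00:05:00Z" sensor="AVGPower" unit="W">225</sample>
    <sample time="2011-12-01T00:05:00Z" sensor="FanSpeed" unit="RPM">4200</sample>
  </machine>
</observations>
"""

root = ET.fromstring(doc)
# Select every AVGPower sample of node001 with XPath attribute predicates.
path = "./machine[@id='node001']/sample[@sensor='AVGPower']"
watts = [float(s.text) for s in root.findall(path)]
print(watts, sum(watts) / len(watts))
```

The same path expression would work unchanged on a much larger file, which is the appeal of querying the published XML directly.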


Page 16: Green Computing Observatory

Preliminary XML schema


Page 17: Green Computing Observatory

How to

Get an account

Download files


www.grid-observatory.org

Page 18: Green Computing Observatory

Status and Roadmap

• Acquisition of timeseries and metadata for IPMI, Ganglia, PDU and temperature is in production
• Examples of raw timeseries for IPMI, PDU and Ganglia released
• Metadata integration and temperature timeseries, stable XML schema V1: Q1 2012
• Global energy consumption: Q2 2012
• Ontology-consistent XML schema (V2): Q4 2012
• Also: rack monitoring


Page 19: Green Computing Observatory

Non-stationarity


The “physical” process is not stationary

• Trends: Rogers’s curve of adoption
• Technology innovations
• Real-world events
  ◦ Experimental discoveries
  ◦ Slashdotted accesses

NON-STATIONARITY IS A REASONABLE HYPOTHESIS BUT PRECLUDES NAÏVE STATISTICS
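A toy example shows why naive statistics mislead on a non-stationary series. The numbers are synthetic: a power trace whose level shifts halfway through. The global mean then describes neither regime.

```python
from statistics import mean

# Synthetic trace (watts): the level shifts after the 6th sample.
trace = [100, 102, 98, 101, 99, 100, 180, 182, 178, 181, 179, 180]

global_mean = mean(trace)
before = mean(trace[:6])
after = mean(trace[6:])
closest = min(abs(x - global_mean) for x in trace)

print(f"global mean: {global_mean}")
print(f"mean before shift: {before}, after shift: {after}")
print(f"distance from the global mean to the nearest sample: {closest}")
```

Summarising each segment separately recovers the two regimes, which is why segmentation or changepoint detection has to come before any averaging.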


Page 20: Green Computing Observatory

How to build knowledge?

• Supervised learning? No reference, and experts are too rare
• Let’s build it on-line! Model-free policies, e.g. Reinforcement Learning!
• Unfortunately, tabula rasa policies and vanilla ML methods are too often defeated [Rish & Tesauro 2006].


Intelligibility

Exploration/exploitation tradeoff


Page 21: Green Computing Observatory

Methods

Intelligibility: Uncovering hidden causes

• Semantic inference [Y. Kim et al. Characterizing E-Science File Access Behavior via Latent Dirichlet Allocation, UCC 2011]
• Collaborative Prediction, Rank approximation [D. Feng et al. Distributed Monitoring with Collaborative Prediction]

Dealing with non-stationarity

• Segmentation [T. Elteto et al. Towards non stationary Grid Models, JoGC Dec. 2011]
• Adaptive clustering with changepoint detection [X. Zhang et al. Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming. SIGKDD'2009]
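As a concrete illustration of changepoint detection on a stream, the sketch below implements the Page-Hinkley test, a standard sequential detector for an upward shift in the mean. It is offered as a generic example: the parameter values are arbitrary, the data are synthetic, and no claim is made that this matches the exact procedure of the works cited above.

```python
def page_hinkley(series, delta=1.0, threshold=50.0):
    """Return the zero-based index of the first detected upward shift
    in the mean of `series`, or None if no shift is detected."""
    running_mean = 0.0
    cumulative = 0.0   # m_t: cumulated deviation from the running mean
    minimum = 0.0      # M_t: smallest value of m_t seen so far
    for t, x in enumerate(series, start=1):
        running_mean += (x - running_mean) / t
        cumulative += x - running_mean - delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return t - 1
    return None


# Synthetic trace (watts) with a level shift at index 6.
trace = [100, 102, 98, 101, 99, 100, 180, 182, 178, 181, 179, 180]
print(page_hinkley(trace))
print(page_hinkley([100] * 20))
```

The tolerance `delta` absorbs small fluctuations, while `threshold` trades detection delay against false alarms; both would need tuning on real traces.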


Page 22: Green Computing Observatory

With the support of

• France Grilles – French NGI member of EGI
• EGI-Inspire (FP7 project supporting EGI)
• INRIA – Saclay (ADT programme)
• CNRS (PEPS programme)
• University Paris Sud (MRM programme)


Page 23: Green Computing Observatory

Conclusion: Digital Curation

• Establishing long-term repositories of digital assets for current and future reference
  ◦ Continuously monitoring a large computing facility
• Providing digital asset search and retrieval facilities to scientific communities through a gateway
  ◦ Data published through the Grid Observatory portal
• Tackling the issues of good data creation and management, most prominently interoperability
  ◦ Formal mainstream ontology, standard-aware
• Adding value to data by generating new sources of information and knowledge
  ◦ Semantic and Machine Learning based inference
