grid observatory @ ccgrid 2011
The Grid Observatory
Cécile Germain-Renaud, Alain Cady, Philippe Gauron, Michel Jouvin, Charles Loomis, Janusz Martyniak, Julien Nauroy, Guillaume Philippon, Michèle Sebag
Goals (I): Digital curation
• For the behavioral data of the EGEE/EGI grid
  • Collection, preservation, indexing, querying
• Continuous and exhaustive datasets
• For scientific and engineering usage
CCGrid 2011 24-27 May 2011 2
Goals (II): model and optimize
Complex systems description
Statistical and Machine Learning models and optimization
Applications to dimensioning and Autonomics
Outline
• What is the GO?
• Epistemological thoughts
• How the GO helps, with scientific issues
• Ongoing work
Who are we?
• Born in the flagship EU grid project EGEE
• Presently a collaborative effort of
  • CNRS/UPS Laboratoire de Recherche en Informatique
  • CNRS/UPS Laboratoire de l'Accélérateur Linéaire
  • Imperial College London
Who are we?
• With the support of
  • France Grilles – French NGI of EGI
  • EGI-Inspire
  • Ile de France council (Software and Complex Systems programme)
  • INRIA – Saclay (ADT programme)
  • CNRS (PEPS programme)
  • University Paris Sud (MRM programme)
• Scientific Collaborations
  • NSF Center for Autonomic Computing
  • European Middleware Initiative
  • Institut des Systèmes Complexes
  • Cardiff University
The digital data
JOBS LIFECYCLE: SYNTHETIC
JOBS LIFECYCLE: DETAILED
JOBS: TORQUE VIEW
GRID STATUS – SELF AWARENESS
MIDDLEWARE INTERNALS
FILE TRAFFIC
Architecture
[Figure: architecture diagram. Labels: Grid Services (Torque, WMS, CE, Logging & Bookkeeping, BDII, IC RTM); Grid-Observatory Scripts (Incoming, Anonymisation, Upload); SFTP, SQL, LDAP, HTTP; Storage Elements (DPM) via HTTPS; Grid Observatory Portal]
• Native data
• Often as detailed as on-line
• On top of the mainstream monitoring tools
• Consistent anonymization
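The slide lists consistent anonymization as a property of the traces without saying how it is achieved. One common way to get consistency across sources is a keyed hash, so that the same identifier always maps to the same pseudonym. The sketch below is only an illustration under that assumption: the key, the record fields and the `pseudonymize` helper are hypothetical and do not describe the Grid Observatory scripts.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption); a real deployment would load it from protected storage.
SECRET_KEY = b"example-key-not-from-the-source"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shortened for readability; truncation raises the collision risk.
    return digest[:16]


# Two hypothetical records describing the same user, coming from different sources.
torque_record = {"user": "/DC=org/CN=Alice Example", "queue": "long"}
wms_record = {"user": "/DC=org/CN=Alice Example", "status": "Done"}

# Replace the identifying field and keep everything else untouched.
anon_torque = {**torque_record, "user": pseudonymize(torque_record["user"])}
anon_wms = {**wms_record, "user": pseudonymize(wms_record["user"])}
```

Because the mapping is deterministic for a given key, records about the same user can still be joined across traces after anonymization.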
Production since October 2008
Traces available through the portal: no grid certificate required
www.grid-observatory.org
Portal Usage
Use and users both increasing steadily
Global Impact
Why digital curation?
• How much of your research (and mine) went to the real world?
• We need to show that the research has a verifiable and positive impact on production systems, "beyond any reasonable doubt"
How do we configure our grids? (Courtesy of James Casey's talk at EGEE09)
The MAPE-K loop
[Figure: MAPE-K loop. An Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) and a Managed Element, each with E and S interfaces]
State-Space and Data Abstraction
• Streaming: on-line data mining, clustering, ...
• Dimensionality reduction
• Active learning
• Ontological inference
High-dimensional, high-volume ‘raw’ data
Compressed, ‘informative’ data
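To make the loop concrete, here is a minimal sketch of one MAPE-K iteration in Python. It reads S and E as sensor and effector, as in the usual presentation of the loop. The queue-draining element, the target value and all class and method names are assumptions chosen for illustration; nothing here is taken from the Grid Observatory code.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Toy managed element: a job queue drained by a configurable number of workers."""
    queue_length: int = 50
    workers: int = 1

    def sense(self) -> int:
        # Sensor (S): expose the current queue length.
        return self.queue_length

    def effect(self, workers: int) -> None:
        # Effector (E): apply the new worker count.
        self.workers = workers

    def step(self) -> None:
        # Hypothetical dynamics: 8 jobs arrive, each worker drains 5 jobs per step.
        self.queue_length = max(0, self.queue_length + 8 - 5 * self.workers)


@dataclass
class AutonomicManager:
    # Knowledge (K): the target and the observation history shared by all phases.
    knowledge: dict = field(default_factory=lambda: {"target": 10, "history": []})

    def monitor(self, element: ManagedElement) -> int:
        observation = element.sense()
        self.knowledge["history"].append(observation)
        return observation

    def analyze(self, observation: int) -> bool:
        return observation > self.knowledge["target"]

    def plan(self, element: ManagedElement, overloaded: bool) -> int:
        return element.workers + 1 if overloaded else max(1, element.workers - 1)

    def execute(self, element: ManagedElement, workers: int) -> None:
        element.effect(workers)

    def run_once(self, element: ManagedElement) -> None:
        observation = self.monitor(element)
        overloaded = self.analyze(observation)
        self.execute(element, self.plan(element, overloaded))


element = ManagedElement()
manager = AutonomicManager()
for _ in range(20):
    manager.run_once(element)
    element.step()
```

In this toy setting the Monitor phase stores raw observations; the data-abstraction techniques listed on the slide would sit between Monitor and Analyze to compress them.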
The acquisition/analysis feedback loop
• Analysis informs acquisition: a priori feature definition may be seriously misleading.
• Example of
  • A priori redundant features
  • Quasi-linear complexity data streaming demonstrated
• Also: on the sampling frequency of acquisition for the IS, e.g. [Laurence Field and Rizos Sakellariou. How Dynamic is the Grid? Towards a Quality Metric for Grid Information Systems. Grid’2010]
• ...
[Figure: two bar charts. x-axis: Clusters (1 to 5 in the first chart, 1 to 8 in the second) plus Reservoir; y-axis: Percentage of jobs assigned (%); each exemplar shown as a job vector]
LogMonitor is getting clogged
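The charts report the share of jobs assigned to each cluster exemplar, with a separate Reservoir category. A minimal sketch of how such percentages could be computed follows. It assumes that a job joins its nearest exemplar when it lies within a distance threshold and goes to the reservoir otherwise; that rule, the Euclidean distance, and the sample vectors are assumptions for illustration, not details given on the slide.

```python
import math
from collections import Counter


def assignment_percentages(jobs, exemplars, threshold):
    """Return the percentage of jobs per cluster label, plus 'Reservoir' for unassigned jobs."""
    counts = Counter()
    for job in jobs:
        distances = [math.dist(job, exemplar) for exemplar in exemplars]
        nearest = min(range(len(exemplars)), key=distances.__getitem__)
        if distances[nearest] <= threshold:
            counts[nearest + 1] += 1  # clusters are numbered from 1, as on the chart axis
        else:
            counts["Reservoir"] += 1  # too far from every exemplar
    return {label: 100.0 * n / len(jobs) for label, n in counts.items()}


# Hypothetical two-dimensional job vectors and exemplars.
exemplars = [(0, 0), (10, 10)]
jobs = [(0, 1), (1, 0), (10, 11), (50, 50)]
percentages = assignment_percentages(jobs, exemplars, threshold=2.0)
```

A growing Reservoir share is the kind of indicator that can signal that the current exemplars no longer describe the job flow.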
Issue I: Uncertainty (1/2)
• As a dynamic(al) system
  • Entities change behavior as an effect of unexpected feedbacks, emergent behavior
  • Self-organized criticality, minority games, ...
Complexity???
Symbolic Dynamics for Discrete Adaptive Games
Cosma Rohilla Shalizi, David J. Albers
http://www.santafe.edu/media/workingpapers/02-07-031.pdf
We use symbolic dynamics to study discrete adaptive games, such as the minority game and the El Farol Bar problem. We show that no such game can have deterministic chaos. We put upper bounds on the statistical complexity and period of these games; the former is at most linear in the number of agents and the size of their memories. We extend our results to cases where the players have infinite-duration memory (they are still non-chaotic) and to cases where there is "noise" in the play (leaving the complexity unchanged or even reduced). We conclude with a mechanism that can reconcile our findings with the phenomenology, and reflections on the merits of simple models of mutual adaptation.
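For readers unfamiliar with the minority game mentioned in the abstract, the sketch below simulates a common textbook formulation: an odd number of agents repeatedly choose 0 or 1, the minority side wins, and each agent follows whichever of its fixed lookup-table strategies has scored best so far. Parameter values, names and the deterministic tie-breaking rule are assumptions for illustration; this is not the construction used in the cited paper.

```python
import random


def play_minority_game(n_agents=101, memory=3, n_strategies=2, rounds=200, seed=0):
    """Simulate a basic minority game; return the number of agents choosing 1 in each round."""
    rng = random.Random(seed)
    n_histories = 2 ** memory
    # Each strategy is a lookup table: index of the recent outcome history -> choice (0 or 1).
    strategies = [
        [[rng.randint(0, 1) for _ in range(n_histories)] for _ in range(n_strategies)]
        for _ in range(n_agents)
    ]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = rng.randrange(n_histories)
    attendance = []
    for _ in range(rounds):
        choices = []
        for agent in range(n_agents):
            # Follow the best-scoring strategy so far (ties go to the lowest index).
            best = max(range(n_strategies), key=lambda s: scores[agent][s])
            choices.append(strategies[agent][best][history])
        ones = sum(choices)
        # n_agents is odd, so one side is always strictly smaller.
        minority = 1 if ones < n_agents - ones else 0
        # Reward every strategy that would have picked the minority side.
        for agent in range(n_agents):
            for s in range(n_strategies):
                if strategies[agent][s][history] == minority:
                    scores[agent][s] += 1
        # Shift the winning side into the shared history window.
        history = ((history << 1) | minority) % n_histories
        attendance.append(ones)
    return attendance


attendance = play_minority_game()
```

Plotting `attendance` over time shows the fluctuations around half of the population that make the game a popular toy model of adaptive competition for a shared resource.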
Issue I: Uncertainty (2/2)
• As a dynamic(al) system
  • Entities change behavior as an effect of unexpected feedbacks, emergent behavior
  • Self-organized criticality, minority games, ...
• Lack of complete and common knowledge – information uncertainty
  • Monitoring is distributed too
  • Resolution and calibration
  • Semantics and ontologies
Resolution and calibration
Semantics and ontologies
Issue II: Fundamentals in statistics
• Statistical significance
• Is prediction possible?
• Which metrics (mathematical sense)?
• And more
Statistical significance
Extreme values may dominate the statistics
• Can we predict anything?
  • Maybe, but difficult: same as earthquakes and finance
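A small numerical illustration of how extreme values can dominate: the sketch below compares the share of the total contributed by the largest 1% of samples for a heavy-tailed (Pareto) and a light-tailed (exponential) distribution. The distributions, the shape parameter 1.2 and the sample size are assumptions chosen for illustration; they are not fitted to Grid Observatory traces.

```python
import random


def top_share(samples, fraction=0.01):
    """Fraction of the total sum contributed by the largest `fraction` of the samples."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)
n = 100_000
heavy = [rng.paretovariate(1.2) for _ in range(n)]  # heavy tail: infinite variance
light = [rng.expovariate(1.0) for _ in range(n)]    # light tail, for comparison

heavy_share = top_share(heavy)
light_share = top_share(light)
print(f"top 1% share, Pareto(1.2): {heavy_share:.2f}")
print(f"top 1% share, exponential: {light_share:.2f}")
```

When a handful of samples carries a large part of the total, means and variances estimated from one period say little about the next, which is why prediction is compared to earthquakes and finance.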
Metrics
Root Mean Squared Error is inadequate
The ROC metric: à la BQP
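The slides do not spell out how the ROC metric is applied. As a hedged illustration of why RMSE can mislead on heavy-tailed targets, the sketch below scores the same hypothetical runtime predictions in two ways: RMSE on the raw values, and the area under the ROC curve for the binary question "is this job long?". The data, the 100-unit threshold and the helper names are all assumptions.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def roc_auc(scores, labels):
    """Area under the ROC curve: probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    # Ties between a positive and a negative count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical runtimes: two very long jobs among short ones.
actual = [10, 12, 11, 500, 9, 700]
predicted = [11, 10, 12, 60, 8, 90]
is_long = [a > 100 for a in actual]

error = rmse(predicted, actual)
auc = roc_auc(predicted, is_long)
print(f"RMSE = {error:.1f}, AUC = {auc:.2f}")
```

Here the predictor ranks every long job above every short one, so the AUC is perfect, while the RMSE is driven almost entirely by the two extreme jobs.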
Fundamentals in statistics
• Statistical significance
• Is prediction possible?
• Which metrics (mathematical sense)?
• Are our systems stationary?
Descriptive and generative models
• The “physical” process is not stationary
  • Trends: Rogers’s curve
  • Technology innovations
  • Real-world events
  • Experimental discoveries
  • Slashdotted accesses
• Non-stationarity and heavy-tailedness can easily be confused
• Non-stationarity is a reasonable alternative
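The possible confusion between non-stationarity and heavy tails can be seen on synthetic data. In the sketch below, each segment is drawn from a light-tailed exponential distribution (coefficient of variation close to 1), but the mean jumps between segments; pooling the segments yields a much larger dispersion that could be mistaken for a heavy tail. Segment means and sizes are assumptions chosen for illustration.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; equal to 1 for an exponential distribution."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(7)
segment_means = [1.0, 10.0, 100.0]  # hypothetical regime changes
segments = [
    [rng.expovariate(1.0 / mean) for _ in range(20_000)]
    for mean in segment_means
]
pooled = [x for segment in segments for x in segment]

per_segment_cv = [coefficient_of_variation(segment) for segment in segments]
pooled_cv = coefficient_of_variation(pooled)
print("per-segment CV:", [round(cv, 2) for cv in per_segment_cv])
print("pooled CV:", round(pooled_cv, 2))
```

Analyzed segment by segment, the data look well behaved; analyzed as one stationary sample, they look far more dispersed than any single regime.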
Dealing with non-stationarity
• Statistical testing: jump in…
  • Theoretical guarantees for known distributions
• Segmentation
  • AIC, MDL, … based
  • Mostly off-line and computationally expensive
  • A priori hypotheses on the segment models
• Adaptive clustering
  • The exemplars are the model
  • On-line rupture detection: back to statistical testing, but on the indicators, not on the model
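As an illustration of on-line rupture detection applied to an indicator rather than to the model, the sketch below runs a Page-Hinkley test, a classic sequential change-detection test, on a synthetic indicator stream. The slides do not name the test that is used, so the choice of Page-Hinkley, the parameters `delta` and `threshold`, and the stream itself are assumptions.

```python
def page_hinkley(stream, delta=0.05, threshold=5.0):
    """Return the index at which an upward shift of the mean is detected, or None."""
    mean = 0.0
    cumulative = 0.0
    minimum = 0.0
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t         # running mean of the indicator
        cumulative += x - mean - delta  # cumulative deviation, with tolerance delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return t - 1               # zero-based index of the alarm
    return None


# Hypothetical indicator: the fraction of recent items that fit no exemplar.
indicator = [0.1] * 200 + [0.9] * 100
alarm_index = page_hinkley(indicator)
print("change detected at index:", alarm_index)
```

The test only needs a running mean and two accumulators, so it fits the on-line setting; `delta` sets how large a drift is tolerated and `threshold` trades detection delay against false alarms.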
Lessons learned
• Sociology
  • Running a production system for usage by computer science is nearly uncharted territory – we are a few explorers
  • Verified that 80% of the cost of Data Mining is in pre-processing
• Technique
  • Build on existing monitoring tools
  • No fancy technology: the goal is usage, not the tool
Ongoing work
• Energy monitoring
  • Unique facility reporting detailed data at the motherboard level
  • Method and roadmap to be announced at the GreenDays next week
• Grid Observatory v2.0: "services make the repository"
  • Semantic data organization
  • On-line visualization
• Keep on with the monitoring standardisation effort at EMI
More information
• Collaboration with Autonomic Computing
  • S. Jha’s talk http://www.youtube.com/watch?v=DI62pG_HBcs
  • GMAC Panel published by IEEE Internet Computing and www.computer.org/portal/web/computingnow/panel
• Autonomics research
  • Adaptive clustering with application to fault diagnosis: “Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming”, SIGKDD'2009
  • MDL segmentation applied to workload: “Discovering Piecewise Linear Models of Grid Workload”, CCGRID 2010
  • Fault models: “Optimization of jobs submission on the EGEE production grid: modeling faults using workload”, Journal of Grid Computing 8(2)
  • Cloud management: “Energy-efficient application-aware online provisioning for virtualized clouds and data centers”, International Conference on Green Computing 2010
• And much more on the GO portal, Documents section
www.grid-observatory.org