TRANSCRIPT
Monitoring of a distributed computing system:
the Grid AliEn@CERN
Master's Degree – 19/12/2005
Marco MEONI
Contents
I. Grid Concepts and Grid Monitoring
II. MonALISA Adaptations and Extensions
III. PDC’04 Monitoring and Results
IV. Conclusions and Outlook
http://cern.ch/mmeoni/thesis/eng.pdf
Section I
Grid Concepts and Grid Monitoring
ALICE experiment at CERN LHC
1) Heavy nuclei and protons are collided (Pb–Pb and p–p collisions)
2) Secondary particles are produced in the collision
3) These particles are recorded by the ALICE detector
4) Particle properties (trajectories, momentum, type) are reconstructed by the AliRoot software
5) ALICE physicists analyse the data and search for physics signals of interest
Grid Computing
• Grid Computing definition – “coordinated use of large sets of heterogeneous, geographically distributed resources to allow high-performance computation”
• The AliEn system:
- pull rather than push architecture: the scheduling service does not need to know the status of all resources in the Grid – the resources advertise themselves (see the sketch after this list);
- robust and fault tolerant: resources can come and go at any point in time;
- interfaces to other Grid flavours allow rapid expansion of the size of the computing resources, transparently for the end user.
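As a minimal illustration of the pull model (a sketch with invented names, not AliEn code): the broker keeps no picture of resource status; idle computing elements advertise themselves by polling a central task queue for work they can run.

    import java.util.ArrayDeque;
    import java.util.Iterator;
    import java.util.Queue;

    // Sketch of pull scheduling: the broker never tracks resource state;
    // each CE polls and takes a job it can run, or comes back later.
    class TaskQueue {
        private final Queue<String> waitingJobs = new ArrayDeque<>();

        synchronized void submit(String jdl) { waitingJobs.add(jdl); }

        // Called by a CE advertising its capabilities; returns a job or null.
        synchronized String pull(String ceCapabilities) {
            for (Iterator<String> it = waitingJobs.iterator(); it.hasNext(); ) {
                String jdl = it.next();
                if (matches(jdl, ceCapabilities)) { it.remove(); return jdl; }
            }
            return null; // nothing suitable now; the CE simply polls again later
        }

        // Placeholder: real match-making compares JDL requirements with CE capabilities.
        private boolean matches(String jdl, String ceCapabilities) { return true; }
    }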
Grid Monitoring
• GMA architecture: a Producer stores its location in the Registry, a Consumer looks the location up, and the monitoring data is then transferred directly from Producer to Consumer
• R-GMA: an example of implementation
• Jini (Sun): provides the technical basis
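A minimal sketch of the GMA pattern (illustrative names, not the R-GMA API): the registry brokers only locations; the data itself flows directly from producer to consumer.

    import java.util.HashMap;
    import java.util.Map;

    // GMA in miniature: the Registry never sees the monitoring data itself.
    class Registry {
        private final Map<String, String> locations = new HashMap<>();
        void store(String metric, String producerUrl) { locations.put(metric, producerUrl); }
        String lookup(String metric)                  { return locations.get(metric); }
    }

    class Consumer {
        String fetch(Registry registry, String metric) {
            String producer = registry.lookup(metric); // 1. look up the producer's location
            return transferFrom(producer, metric);     // 2. transfer data directly from it
        }
        // Stand-in for the real producer-to-consumer transfer.
        private String transferFrom(String producerUrl, String metric) {
            return metric + "@" + producerUrl;
        }
    }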
MonALISA framework
• Distributed monitoring service system using JINI/JAVA and WSDL/SOAP technologies
• Each MonALISA server acts as a dynamic service system and provides functionality that can be discovered and used by any other services or clients that require such information
Section II
MonALISA Adaptations and Extensions
MonALISA Adaptations
• Farms monitoring
• User Java class to interface MonALISA and a bash script to monitor the site (a sketch of the interface class follows below)
• A Web Repository as a front-end for production monitoring
• Stores a history view of the monitored data
• Displays the data in a variety of predefined histograms and other visualisation formats
• Simple interfaces to user code: custom consumers, configuration modules, user-defined charts, distributions
[Diagram: a monitoring script on the CE collects data from the WNs; a Java interface class passes the monitored data from the user code to the MonALISA agent, within the MonALISA framework / Grid resources]
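A sketch of that interface class, under the assumption (hypothetical here) that the bash script prints one param=value pair per line; the resulting map is what gets handed to the MonALISA agent.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical interface class: runs the site's bash monitoring script
    // and parses its "param=value" output into monitored data.
    class SiteMonitor {
        Map<String, Double> poll() throws Exception {
            Map<String, Double> data = new HashMap<>();
            Process p = new ProcessBuilder("/bin/bash", "monitor_site.sh").start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = out.readLine()) != null) {
                    String[] kv = line.split("=", 2);
                    if (kv.length == 2) {
                        try { data.put(kv[0].trim(), Double.parseDouble(kv[1].trim())); }
                        catch (NumberFormatException ignored) { /* skip malformed lines */ }
                    }
                }
            }
            p.waitFor();
            return data; // passed on to the MonALISA agent
        }
    }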
Repository Setup
• A Web Repository as a front-end for monitoring
• Keeps the full history of monitored data
• Shows data in a multitude of histograms
• Added new presentation formats to provide a full set (gauges, distributions)
• Simple interfaces to user code: custom consumers, custom tasks
• Installation and Maintenance
1. Packages installation (Tomcat, MySQL)
2. Configuration of the main servlets for the ALICE VO
3. Setup of scripts for startup/shutdown/backup
• Every produced plot is built and customized from its own configuration file (a configuration sketch follows below):
- SQL query, parameters, colors, chart type
- cumulative or averaged behaviour
- smoothing of fluctuations
- user-defined time intervals
- …many others
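To make the per-plot configuration concrete, a sketch with hypothetical keys (the real Repository configuration files may use a different format), parsed with java.util.Properties:

    import java.io.StringReader;
    import java.util.Properties;

    class ChartConfigDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical per-plot configuration mirroring the options above:
            // SQL query, chart type, colors, averaged/cumulative mode, smoothing, interval.
            String cfg = String.join("\n",
                    "chart.sql=SELECT rectime, value FROM job_status",
                    "chart.type=stacked_bars",
                    "chart.color=0x3366CC",
                    "chart.mode=averaged",
                    "chart.smooth=true",
                    "chart.interval=24h");
            Properties plot = new Properties();
            plot.load(new StringReader(cfg));
            // chart.mode could also be 'cumulative'; chart.interval is user-selectable.
            System.out.println("query: " + plot.getProperty("chart.sql"));
        }
    }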
Repository
• Added a Java thread (DirectInsert) to feed the jobs information directly into the Repository (Tomcat JSP/servlets), without passing through the MonALISA agents (sketched below)
AliEn Jobs Monitoring
• Centralized or distributed?
• AliEn native APIs to retrieve job-status snapshots
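A sketch of the DirectInsert idea (types are stand-ins; the real thread talks to the AliEn API and a MySQL repository): a daemon thread periodically grabs a job-status snapshot and writes it straight to the database.

    import java.util.function.Consumer;
    import java.util.function.Supplier;

    // Sketch of DirectInsert: snapshots go straight to the repository DB,
    // bypassing the MonALISA agents entirely.
    class DirectInsert extends Thread {
        private final Supplier<String> alienSnapshot; // stand-in for the AliEn API call
        private final Consumer<String> dbInsert;      // stand-in for a JDBC INSERT

        DirectInsert(Supplier<String> alienSnapshot, Consumer<String> dbInsert) {
            this.alienSnapshot = alienSnapshot;
            this.dbInsert = dbInsert;
            setDaemon(true);
        }

        @Override public void run() {
            while (!isInterrupted()) {
                dbInsert.accept(alienSnapshot.get()); // one snapshot per cycle
                try { Thread.sleep(2 * 60 * 1000); }  // every ~2-3 minutes, as during PDC'04
                catch (InterruptedException e) { return; }
            }
        }
    }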
[Diagram: AliEn job state flow. A submitted job moves through its lifecycle states towards completion; each stage can fail with a dedicated error state (Error_I, Error_A, Error_S, Error_E, Error_R, Error_V/VT/VN, Error_SV), and jobs stuck in a state for too long (>1h, >3h) are flagged.]
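Read as a data structure, the flow maps each error code to the stage it aborts; a sketch (the stage names are inferred from the code suffixes, not spelled out on the slide):

    // Error states from the AliEn job flow; stage names are educated guesses
    // from the suffixes (I=insertion, A=assignment, S=submission, E=execution,
    // R=running, V/VT/VN=validation, SV=saving).
    enum JobError {
        ERROR_I("insertion"),
        ERROR_A("assignment"),
        ERROR_S("submission"),
        ERROR_E("execution"),
        ERROR_R("running"),
        ERROR_V("validation"), // with variants Error_VT, Error_VN
        ERROR_SV("saving");

        final String failedStage;
        JobError(String failedStage) { this.failedStage = failedStage; }
    }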
Data Collecting and Grid Monitoring
• 7+ GB of performance information, 24.5M records
• During the Data Challenge, data from ~2K monitored parameters arrive every 2–3 minutes
• Data collecting sources: MonALISA Agents, Repository Web Services, AliEn API, LCG interface, WNs monitoring (UDP), Web Repository
• Grid Analysis: ROOT, CARROT
• Data replication: online replication from the MASTER DB (alimonitor.cern.ch) to the REPLICA DB (aliweb01.cern.ch)
• Averaging process: each basic parameter is stored in the Repository database(s) as 1-min, 10-min and 100-min series of 60 bins each, filled FIFO (sketched below)
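A sketch of that averaging chain (structure assumed from the description: 60-bin FIFO series at 1, 10 and 100-minute granularity): when a finer series has accumulated enough values, their average is pushed one level up.

    import java.util.ArrayDeque;

    // 60-bin FIFO series; every `factor` values, their average feeds the coarser series.
    class BinSeries {
        private final ArrayDeque<Double> bins = new ArrayDeque<>();
        private final BinSeries coarser; // next granularity, or null for the last level
        private final int factor;        // finer bins per coarser bin (here: 10)
        private double sum; private int count;

        BinSeries(int factor, BinSeries coarser) { this.factor = factor; this.coarser = coarser; }

        void push(double value) {
            if (bins.size() == 60) bins.removeFirst(); // FIFO: the oldest bin drops out
            bins.addLast(value);
            if (coarser == null) return;
            sum += value; count++;
            if (count == factor) {          // e.g. ten 1-min bins ...
                coarser.push(sum / factor); // ... average into one 10-min bin
                sum = 0; count = 0;
            }
        }
    }

    // Wiring: BinSeries m100 = new BinSeries(0, null);
    //         BinSeries m10  = new BinSeries(10, m100);
    //         BinSeries m1   = new BinSeries(10, m10); // raw 1-min values enter here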
Web Repository
• Storage and monitoring tools for the Data Challenge running parameters, task completion and resource status
Visualisation Formats
[Screenshots: menu; stacked bars; statistics and real-time tabulated views; CE load factors and task completion; snapshots and pie charts; running history]
Monitored Parameters
• ~2K parameters and 24.5M records with 1-minute granularity
• Analysis of the collected data allows for improvement of the Grid performance

Source             Category          Number  Examples
AliEn API          CE load factors   63      Run load, queue load
AliEn API          SE occupancy      62      Used space, free space, number of files
AliEn API          Job information   557     Job status: running, saving, done, failures
SOAP (CERN)        Network traffic   29      Size of traffic, number of files
LCG                CPU – Jobs        48      Free CPUs, jobs running and waiting
ML services on MQ  Job summary       34      Job status: running, saving, done, failures
ML services on MQ  AliEn parameters  15      DB load, Perl processes
ML services        Sites info        1060    Paging, threads, I/O, processes
Total                                1868

• Derived classes:
Job execution efficiency = successfully done jobs / all submitted jobs
System efficiency        = error (CE) free jobs / all submitted jobs
AliRoot efficiency       = error (AliRoot) free jobs / all submitted jobs
Resource efficiency      = running (queued) jobs / max running (queued) jobs
MonALISA Extensions
• Job monitoring of Grid users
• Application Monitoring (ApMon) at WNs
• Repository Web Services
• Uses AliEn commands (ps -a, jobinfo #jobid, ps -X -st) plus output parsing (a parsing sketch follows below); scans each job's JDL; results are presented in the same web front-end
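A sketch of the output-parsing step (the command invocation and column layout here are placeholders, not the real AliEn output format): run the status command and keep a jobId-to-status map for the web front-end.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    // Placeholder parsing of a job-status listing; real AliEn output differs.
    class JobStatusScraper {
        Map<String, String> scrape() throws Exception {
            Map<String, String> statusById = new HashMap<>();
            Process p = new ProcessBuilder("alien", "ps", "-a").start(); // invocation is illustrative
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = out.readLine()) != null) {
                    String[] cols = line.trim().split("\\s+"); // assumed: jobId ... status
                    if (cols.length >= 2) statusById.put(cols[0], cols[cols.length - 1]);
                }
            }
            p.waitFor();
            return statusById;
        }
    }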
• An alternative to ApMon for Web Repository purposes: no MonALISA agents are needed, and the data is stored directly into the repository DB; used to monitor the network traffic through the FTP servers of ALICE at CERN
• ApMon is a set of flexible APIs that can be used by any application to send monitoring information to MonALISA services via UDP datagrams (the push pattern is sketched below)
• Allows for data aggregation and scaling of the monitoring system
• A light monitoring C++ class was developed for inclusion in the Process Monitor payload
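The push pattern in a minimal sketch. Real ApMon encodes its parameters in a binary (XDR) payload; here a plain-text payload stands in, keeping only the idea of one UDP datagram per measurement.

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    // ApMon-style push: fire-and-forget UDP, so monitoring never blocks the job.
    class UdpMonitor {
        void send(String mlHost, int mlPort, String cluster, String node,
                  String param, double value) throws Exception {
            String payload = cluster + "/" + node + " " + param + "=" + value; // plain text, not XDR
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.send(new DatagramPacket(bytes, bytes.length,
                        InetAddress.getByName(mlHost), mlPort));
            }
        }
    }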
MonALISA Extensions
• Distributions as a principle of analysis
• First attempt at Grid performance tuning, based on real monitored data
• Use of ROOT and Carrot features
• Cache system to optimize the requests (the cache logic is sketched below)
[Diagram: ROOT/Carrot histogram clients talk over HTTP (Apache) to a ROOT histogram server process acting as a central cache in front of the MonALISA Repository: 1. client asks for a histogram; 2. server queries only NEW data; 3. repository sends the NEW data; 4. server sends back the resulting object/file]
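The cache logic behind steps 1-4, sketched with hypothetical types: the server remembers the newest timestamp it has per histogram and asks the repository only for rows beyond it before answering the client.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Central-cache sketch: between two requests, only NEW rows are fetched (steps 2-3).
    class HistogramCache {
        interface Repository { List<double[]> rowsAfter(String hist, long since); } // row = {time, value}

        private static class Entry { final List<double[]> points = new ArrayList<>(); long newest; }

        private final Map<String, Entry> cache = new HashMap<>();
        private final Repository repo;

        HistogramCache(Repository repo) { this.repo = repo; }

        synchronized List<double[]> get(String hist) {            // step 1: client asks
            Entry e = cache.computeIfAbsent(hist, k -> new Entry());
            for (double[] row : repo.rowsAfter(hist, e.newest)) { // steps 2-3: NEW data only
                e.points.add(row);
                e.newest = Math.max(e.newest, (long) row[0]);
            }
            return e.points;                                      // step 4: resulting object
        }
    }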
Section III
PDC’04 Monitoring and Results
PDC’04
• Purpose: test and validate the ALICE Offline computing model:
– Produce and analyse ~10% of the data sample collected in a standard data-taking year
– Use the complete set of off-line software: AliEn, AliROOT, LCG, PROOF and, in Phase 3, the ARDA user analysis prototype
• Structure: logically divided in three phases:
1. Phase 1 – Production of underlying Pb+Pb events with different centralities (impact parameters) + production of p+p events
2. Phase 2 – Mixing of signal events with different physics content into the underlying Pb+Pb events
3. Phase 3 – Distributed analysis
PDC’04 Phase 1
• Task – simulate the data flow in reverse: events are produced at remote centres and stored in the CERN MSS
[Diagram: central servers handle master job submission, Job Optimizer, RB, File catalogue, processes control, SE…; sub-jobs are processed on AliEn CEs and, through the AliEn-LCG interface and the LCG RB, on LCG CEs (LCG is one AliEn CE); output files go to storage in CERN CASTOR: disk servers, tape]
• Start 10/03, end 29/05 (58 days active)
• Maximum number of jobs running in parallel: 1450; average during the active period: 430
• 18 computing centres participating
• Aiming for continuous running, not always possible due to resource constraints
[Charts: total number of jobs running in parallel; total CPU profile]
Efficiency
• Calculation principle: jobs are submitted only once
• Job execution efficiency = successfully done jobs / all submitted jobs
• System efficiency = error (CE) free jobs / all submitted jobs
• AliRoot efficiency = error (AliROOT) free jobs / all submitted jobs
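Written out compactly, directly from the definitions above:

    \[
      \varepsilon_{\mathrm{exec}} = \frac{\text{successfully done jobs}}{\text{all submitted jobs}}, \qquad
      \varepsilon_{\mathrm{sys}} = \frac{\text{CE-error-free jobs}}{\text{all submitted jobs}}, \qquad
      \varepsilon_{\mathrm{AliRoot}} = \frac{\text{AliRoot-error-free jobs}}{\text{all submitted jobs}}
    \]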
Phase 1 of PDC’04 – Statistics
Number of jobs                 56,000
Job duration                   8h (cent 1), 5h (peripheral 1), 2.5h (peripheral 2-5)
Files per job                  36
Number of entries in AliEn FC  3.8M
Number of files in CERN MSS    1.3M
File size                      26 TB
Total CPU work                 285 MSI2k hours
LCG CPU work                   67 MSI2k hours
PDC’04 Phase 2
• Task – simulate the event reconstruction and remote event storage
[Diagram: central servers handle master job submission, Job Optimizer (N sub-jobs), RB, File catalogue, processes monitoring and control, SE…; underlying-event input files are read from CERN CASTOR; sub-jobs are processed on AliEn CEs and, through the AliEn-LCG interface and the LCG RB, on LCG CEs; output files are stored as primary copies on the local SEs, with a zip archive of the output files kept in CERN CASTOR as a backup copy; files are registered in the AliEn File Catalogue (for LCG SEs: LCG LFN = AliEn PFN, via edg(lcg)-copy&register)]
• Start 01/07, end 26/09 (88 days active)
• As in the 1st phase, general equilibrium in the CPU contribution
• AliEn direct control: 17 CEs, each with an SE; CERN-LCG encompasses the LCG resources worldwide (also with local/close SEs)
[Chart: individual sites’ CPU contribution]
Sites occupancy
• Outside CERN, sites such as Bari, Catania and JINR have generally run at maximum capacity
Phase 2: Statistics and Failures
Number of jobs               400,000
Job duration                 6h/job
Conditions                   62
Number of events             15.2M
Number of files in AliEn FC  9M
Number of files in storage   4.5M, distributed at 20 CEs world-wide
Storage at CERN MSS          30 TB
Storage at remote CEs        10 TB
Network transfer             200 TB from CERN to remote CEs
Total CPU work               750 MSI2k hours

Failures:
Submission          CE local scheduler not responding                           1%
Loading input data  Remote SE not responding                                    3%
During execution    Job aborted (insufficient WN memory or AliRoot problems);
                    job cannot start (missing application software directory);
                    job killed by CE local scheduler (too long);
                    WN or global CE malfunction (all jobs on a given site lost)  10%
Saving output data  Local SE not responding                                     2%
PDC’04 Phase 3
• Task – user data analysis (the splitting step is sketched below)
[Diagram: a user job (many events) queries the File Catalogue for its data set (ESDs, other); the Job Optimizer splits it into sub-jobs 1…n, grouped by the SE location of the files; the Job Broker submits each sub-job to the CE with the closest SE; after CE and SE processing, output files 1…n are merged by a file merging job into the final job output]
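The splitting step lends itself to a sketch (types invented for illustration): group the data set by the SE holding each file and emit one sub-job per group, so every sub-job lands on the CE closest to its data.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of the Job Optimizer's splitting: one sub-job per SE file location.
    class JobOptimizer {
        static class InputFile {
            final String lfn, se; // logical file name and the SE hosting it
            InputFile(String lfn, String se) { this.lfn = lfn; this.se = se; }
        }

        // Each returned entry (SE -> its files) becomes one sub-job,
        // submitted by the Job Broker to the CE with the closest SE.
        Map<String, List<InputFile>> split(List<InputFile> dataSet) {
            Map<String, List<InputFile>> bySite = new HashMap<>();
            for (InputFile f : dataSet) {
                bySite.computeIfAbsent(f.se, se -> new ArrayList<>()).add(f);
            }
            return bySite;
        }
    }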
Analysis
• Start September 2004, end January 2005; distribution charts built on top of the ROOT environment using the Carrot web interface
• Distribution of the number of running jobs: depends mainly on the number of waiting jobs in the TQ and on the availability of free CPUs at the remote CEs
• Occupancy versus the number of queued jobs: occupancy increases as more jobs wait in the local batch queue, reaching saturation at around 60 queued jobs
Section IV
Conclusions and Outlook
Lessons from PDC’04
• User jobs ran for 9 months using AliEn
• MonALISA provided a flexible and complete monitoring framework, successfully adapted to the needs of the Data Challenge
• MonALISA gave the expected results for performance tuning and workload balancing
• Step-by-step approach: from resource tuning to resource optimization
• MonALISA was able to gather, store, plot, sort and group a large variety of monitored parameters, both basic and derived, in a rich set of presentation formats
• The Repository has been the only source of historical information, and its modular architecture made possible the development of a variety of custom modules (~800 lines of fundamental source code and ~3K lines for service tasks)
• PDC’04 was a real example of successful Grid interoperability, interfacing AliEn and LCG and proving the scalability of the AliEn design
• The usage of MonALISA in ALICE has been documented in an article for the Computing in High Energy and Nuclear Physics (CHEP) ’04 conference, Interlaken, Switzerland
• Unprecedented experience in developing and improving a monitoring framework on top of a real, functioning Grid, massively testing the software technologies involved
• The framework is easy to extend, and components can be replaced with equivalent ones following technical needs or strategic choices
Credits
• Dott. F. Carminati, L. Betev, P. Buncic and all colleagues in ALICE, for the enthusiasm they transmitted during this work
• The MonALISA team, collaborative whenever I needed help