
Page 1

Monitoring of a distributed computing system: the Grid AliEn@CERN

Master Degree – 19/12/2005

Marco MEONI

Page 2

Content

I. Grid Concepts and Grid Monitoring

II. MonALISA Adaptations and Extensions

III. PDC'04 Monitoring and Results

IV. Conclusions and Outlook

http://cern.ch/mmeoni/thesis/eng.pdf

Page 3

Section I

Grid Concepts and Grid Monitoring

Page 4

ALICE experiment at CERN LHC

1) Heavy-nuclei and proton-proton collisions take place
2) Secondary particles are produced in the collision
3) These particles are recorded by the ALICE detector
4) Particle properties (trajectories, momentum, type) are reconstructed by the AliRoot software
5) ALICE physicists analyse the data and search for physics signals of interest

Page 5

Grid Computing

• Grid Computing definition – "coordinated use of large sets of heterogeneous, geographically distributed resources to allow high-performance computation"

• The AliEn system:
  - pull rather than push architecture: the scheduling service does not need to know the status of all resources in the Grid – the resources advertise themselves (a sketch of the pull model follows below);
  - robust and fault tolerant: resources can come and go at any point in time;
  - interfaces to other Grid flavours allow rapid expansion of the available computing resources, transparently for the end user.
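
The pull model can be made concrete with a minimal Java sketch. This is illustrative only (all names are invented, it is not AliEn code): a central task queue stores submitted jobs, and each CE asks for work that fits its own, self-advertised capacity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of a pull-style task queue: computing elements (CEs)
// advertise themselves and request work, instead of a central broker
// pushing jobs to resources whose state it must track.
public class PullQueue {
    record Job(int id, int minMemoryMB) {}

    private final BlockingQueue<Job> waiting = new LinkedBlockingQueue<>();

    public void submit(Job job) {
        waiting.add(job); // the broker only stores jobs; it knows nothing about CEs
    }

    // A CE calls this when it has a free slot, describing its own capacity.
    // The broker hands back a matching job, or null if none fits.
    public synchronized Job pull(int freeMemoryMB) {
        for (Job job : waiting) {
            if (job.minMemoryMB() <= freeMemoryMB) {
                waiting.remove(job);
                return job;
            }
        }
        return null; // the CE simply retries later
    }

    public static void main(String[] args) {
        PullQueue q = new PullQueue();
        q.submit(new Job(1, 1024));
        q.submit(new Job(2, 4096));
        System.out.println(q.pull(2048)); // a small CE picks up job 1 only
    }
}
```

Because the broker never tracks CE state, resources can join or leave at any time without any bookkeeping on the central side.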

Page 6

Grid Monitoring

• GMA architecture: a Producer stores its location in the Registry; a Consumer looks the location up in the Registry and then transfers data directly from the Producer (a toy illustration follows below)
• R-GMA: an example implementation
• Jini (Sun): provides the technical basis
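
To make the GMA roles concrete, here is a toy Java illustration (an assumed simplification, not the R-GMA API): the registry records where a metric can be read, and the consumer then contacts the producer directly.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the GMA pattern: producers register where they
// publish, consumers look that up in the registry, then fetch data
// directly from the producer.
public class GmaDemo {
    interface Producer { String read(String metric); }

    // The registry maps a metric name to the producer that serves it.
    static class Registry {
        private final Map<String, Producer> where = new HashMap<>();
        void register(String metric, Producer p) { where.put(metric, p); }
        Producer lookup(String metric) { return where.get(metric); }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // The producer stores its "location" (here: itself) in the registry.
        registry.register("cpu.load", metric -> "0.42");
        // The consumer looks up the location, then transfers data directly.
        Producer p = registry.lookup("cpu.load");
        System.out.println("cpu.load = " + p.read("cpu.load"));
    }
}
```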

Page 7

MonALISA framework

• Distributed monitoring service system using Jini/Java and WSDL/SOAP technologies
• Each MonALISA server acts as a dynamic service system and provides the functionality to be discovered and used by any other services or clients that require such information

Page 8

Section II

MonALISA Adaptations and Extensions

Page 9

MonALISA Adaptations

• Farms monitoring
• A user Java class interfaces MonALISA with a bash script that monitors the site (a sketch follows below)
• A Web Repository as a front-end for production monitoring
  - stores a history view of the monitored data
  - displays the data in a variety of predefined histograms and other visualisation formats
  - simple interfaces to user code: custom consumers, configuration modules, user-defined charts, distributions

[Diagram: on the Grid resources, a monitoring script runs on the CE/WNs; a Java interface class passes its output to the MonALISA agent, which forwards the monitored data to user code in the MonALISA framework]
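
A hedged sketch of such an adaptation in Java (the script name and "name value" output format are assumptions, not the thesis' actual code): it runs a site-local bash monitoring script and parses parameter/value pairs from its output, ready to be handed to a MonALISA agent.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical site monitor: run a bash script and parse "name value"
// pairs from its standard output.
public class SiteMonitor {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("bash", "monitor_site.sh").start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                String[] kv = line.trim().split("\\s+", 2);
                if (kv.length == 2) {
                    // In the real setup the value would be pushed to the
                    // MonALISA agent instead of printed.
                    System.out.println("param=" + kv[0] + " value=" + kv[1]);
                }
            }
        }
        p.waitFor();
    }
}
```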

Page 10

Repository Setup

• A Web Repository as a front-end for monitoring
  - keeps the full history of monitored data
  - shows data in a multitude of histograms
  - added new presentation formats to provide a full set (gauges, distributions)
  - simple interfaces to user code: custom consumers, custom tasks

• Installation and maintenance
  1. Packages installation (Tomcat, MySQL)
  2. Configuration of the main servlets for the ALICE VO
  3. Setup of scripts for startup/shutdown/backup

• All the produced plots are built and customised from corresponding configuration files:
  - SQL, parameters, colours, type
  - cumulative or averaged behaviour
  - smoothing, fluctuations
  - user time intervals
  - ...many others

Page 11

Repository

• Added a Java thread (DirectInsert) to feed the Repository directly, without passing through the MonALISA agents: an ad hoc Java thread pushes job information straight to the Tomcat JSP/servlets (a sketch follows below)

AliEn Jobs Monitoring

• Centralised or distributed?
• AliEn native APIs to retrieve job status snapshots

[Diagram: the job state machine from submission onwards, with timeout transitions (>1h, >3h) and error states Error_I, Error_A, Error_S, Error_E, Error_R, Error_V/VT/VN, Error_SV]
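
A minimal sketch of a DirectInsert-style feeder thread, assuming a JDBC connection to the repository database (table and column names are invented for illustration; the real thread would call the AliEn API for its snapshots):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

// Background thread that periodically writes job-status snapshots
// straight into the repository database, bypassing the MonALISA agents.
public class DirectInsert extends Thread {
    private final String url; // e.g. "jdbc:mysql://localhost/repository"

    public DirectInsert(String url) { this.url = url; setDaemon(true); }

    @Override public void run() {
        try (Connection db = DriverManager.getConnection(url);
             PreparedStatement ins = db.prepareStatement(
                 "INSERT INTO job_status (ts, site, status, njobs) VALUES (?,?,?,?)")) {
            while (!isInterrupted()) {
                // A real snapshot would come from the AliEn API; this is a
                // single fake sample for illustration.
                ins.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
                ins.setString(2, "CERN");
                ins.setString(3, "RUNNING");
                ins.setInt(4, 42);
                ins.executeUpdate();
                Thread.sleep(120_000); // a new snapshot every ~2 minutes
            }
        } catch (SQLException | InterruptedException e) {
            // Exit quietly; a supervisor would restart the thread.
        }
    }
}
```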

Page 12

Data Collecting and Grid Monitoring

• Data collecting: MonALISA agents, Repository Web Services, AliEn API, LCG interface, WNs monitoring (UDP), Web Repository
• Grid analysis: ROOT, CARROT
• 7+ GB of performance information, 24.5M records
• During the Data Challenge, data from ~2K monitored parameters arrive every 2-3 minutes
• Averaging process: each basic parameter is kept in 60-bin FIFOs at 1-min, 10-min and 100-min granularity in the Repository database(s); a sketch of this rollup follows below
• Data replication: the MASTER DB (alimonitor.cern.ch) is replicated online to a REPLICA DB (aliweb01.cern.ch)
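
One possible reading of this rollup scheme, sketched in Java (assumed, since the slide only names the ingredients): 60-bin FIFOs per granularity, where a filled group of fine-grained bins is averaged into one bin of the next, coarser FIFO.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Chained fixed-size FIFOs: 1-min bins roll up into 10-min bins,
// which roll up into 100-min bins. Only the last 60 bins are kept
// at each granularity.
public class AveragingFifo {
    private final Deque<Double> bins = new ArrayDeque<>();
    private final int capacity = 60;   // bins kept per granularity
    private final int rollup;          // samples averaged into one coarser bin
    private final AveragingFifo next;  // coarser level, or null
    private double sum = 0;
    private int count = 0;

    public AveragingFifo(int rollup, AveragingFifo next) {
        this.rollup = rollup;
        this.next = next;
    }

    public void add(double sample) {
        bins.addLast(sample);
        if (bins.size() > capacity) bins.removeFirst(); // keep newest 60 bins
        sum += sample;
        count++;
        if (next != null && count == rollup) { // e.g. 10 one-minute bins...
            next.add(sum / count);             // ...become one ten-minute bin
            sum = 0;
            count = 0;
        }
    }

    public static void main(String[] args) {
        AveragingFifo tenMin = new AveragingFifo(10, null);
        AveragingFifo oneMin = new AveragingFifo(10, tenMin);
        for (int t = 0; t < 30; t++) oneMin.add(Math.random());
    }
}
```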

Page 13

Web Repository

• Storage and monitoring tools for the Data Challenge running parameters, task completion and resource status

Page 14

Visualisation Formats

[Screenshots: menu; stacked bars; statistics and real-time tabulated views; CE load factors and task completion; snapshots and pie charts; running history]

Page 15

Monitored Parameters

Source             Category               Number   Examples
AliEn API          CE load factors        63       run load, queue load
AliEn API          SE occupancy           62       used space, free space, number of files
AliEn API          Job information        557      job status: running, saving, done, failures
SOAP               CERN network traffic   29       size of traffic, number of files
LCG                CPU – jobs             48       free CPUs, jobs running and waiting
ML services on MQ  Job summary            34       job status: running, saving, done, failures
ML services on MQ  AliEn parameters       15       DB load, Perl processes
ML services        Sites info             1060     paging, threads, I/O, processes

• 1868 monitored parameters in total, plus derived classes:
  - Job execution efficiency = successfully done jobs / all submitted jobs
  - System efficiency = error (CE) free jobs / all submitted jobs
  - AliRoot efficiency = error (AliRoot) free jobs / all submitted jobs
  - Resource efficiency = running (queued) jobs / max running (queued) jobs

• ~2K parameters and 24.5M records with 1-minute granularity
• Analysis of the collected data allows for improvement of the Grid performance

Page 16

MonALISA Extensions

• Job monitoring of Grid users
  - uses AliEn commands (ps -a, jobinfo #jobid, ps -X -st) plus output parsing
  - scans the job's JDL
  - results presented in the same web front-end

• Repository Web Services
  - an alternative to ApMon for Web Repository purposes: no MonALISA agents needed, data are stored directly into the repository DB
  - used to monitor network traffic through the ftp servers of ALICE at CERN

• Application Monitoring (ApMon) at the WNs
  - ApMon is a set of flexible APIs that can be used by any application to send monitoring information to MonALISA services via UDP datagrams (see the sketch below)
  - allows for data aggregation and scaling of the monitoring system
  - developed a light monitoring C++ class to include within the Process Monitor payload
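
An illustrative Java stand-in for ApMon's core idea (the wire format and port below are invented for the sketch; the real ApMon library uses its own encoding): fire-and-forget UDP datagrams carrying monitoring tuples to a MonALISA service.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Send one "cluster/node/parameter=value" monitoring tuple over UDP.
public class UdpMonitor {
    public static void send(String host, int port, String cluster,
                            String node, String param, double value) throws Exception {
        String msg = cluster + "/" + node + "/" + param + "=" + value;
        byte[] data = msg.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(host), port));
        } // UDP: no ack, no retry - losing a sample is acceptable for monitoring
    }

    public static void main(String[] args) throws Exception {
        // Host and port are placeholders, not a real MonALISA endpoint.
        send("monalisa.example.org", 8884, "ALICE_WNs", "wn042", "cpu_load", 0.73);
    }
}
```

UDP is a natural fit here: the worker node never blocks on the monitoring system, which is what allows the scheme to scale to many WNs.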

Page 17

MonALISA Extensions

• Distributions as a basis for analysis
• First attempt at Grid performance tuning based on real monitored data
• Uses ROOT and Carrot features
• Cache system to optimise the requests (a sketch follows below)

[Diagram: ROOT/Carrot histogram clients talk over HTTP (Apache) to a central ROOT histogram server process acting as a cache in front of the MonALISA Repository:
1. the client asks for a histogram
2. the server queries only NEW data from the Repository
3. the Repository sends the NEW data
4. the server sends the resulting object/file back to the client]
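
A minimal sketch of such an incremental cache (assumed behaviour; all names invented): per query, the server remembers the data fetched so far and only asks the Repository for rows newer than the last fetch.

```java
import java.util.HashMap;
import java.util.Map;

// Central cache: each request pulls only NEW rows from the repository
// and merges them into the data accumulated for that query.
public class HistogramCache {
    interface Repository { double[] fetchSince(String query, long since); }

    private static class Entry {
        long lastFetch = 0;
        double[] merged = new double[0];
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Repository repo;

    public HistogramCache(Repository repo) { this.repo = repo; }

    public synchronized double[] get(String query) {
        Entry e = cache.computeIfAbsent(query, q -> new Entry());
        double[] fresh = repo.fetchSince(query, e.lastFetch); // only NEW data
        e.lastFetch = System.currentTimeMillis();
        double[] merged = new double[e.merged.length + fresh.length];
        System.arraycopy(e.merged, 0, merged, 0, e.merged.length);
        System.arraycopy(fresh, 0, merged, e.merged.length, fresh.length);
        e.merged = merged;
        return merged; // what a ROOT/Carrot client would turn into a histogram
    }
}
```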

Page 18

Section III

PDC’04 Monitoring and Results

Page 19

PDC'04

• Purpose: test and validate the ALICE Offline computing model:
  - produce and analyse ~10% of the data sample collected in a standard data-taking year
  - use the complete set of offline software: AliEn, AliRoot, LCG, PROOF and, in Phase 3, the ARDA user analysis prototype

• Structure: logically divided into three phases:
  1. Phase 1 – production of underlying Pb+Pb events with different centralities (impact parameters) + production of p+p events
  2. Phase 2 – mixing of signal events with different physics content into the underlying Pb+Pb events
  3. Phase 3 – distributed analysis

Page 20

PDC'04 Phase 1

• Task – simulate the data flow in reverse: events are produced at remote centres and stored in the CERN MSS

[Diagram: central servers (master job submission, Job Optimizer, RB, File Catalogue, process control, SE) send sub-jobs for processing both to AliEn CEs directly and, through the AliEn-LCG interface and the LCG RB, to LCG CEs; LCG acts as one AliEn CE. Output files are stored in CERN CASTOR (disk servers, tape).]

Page 21

• Start 10/03, end 29/05 (58 days active)
• Maximum jobs running in parallel: 1450; average during the active period: 430
• 18 computing centres participating
• Aiming for continuous running, not always possible due to resource constraints

[Plots: total number of jobs running in parallel; total CPU profile]

Page 22

Efficiency

• Job execution efficiency = successfully done jobs / all submitted jobs
• System efficiency = error (CE) free jobs / all submitted jobs
• AliRoot efficiency = error (AliRoot) free jobs / all submitted jobs

• Calculation principle: jobs are submitted only once (a worked example follows below)
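
A small worked example of these three figures, with made-up counters for one day of the Data Challenge:

```java
// Compute the three efficiency figures defined above from job counters.
public class Efficiency {
    public static void main(String[] args) {
        int submitted = 1000;        // all submitted jobs (each submitted once)
        int done = 820;              // successfully done jobs
        int ceErrorFree = 930;       // jobs with no CE-related error
        int aliRootErrorFree = 880;  // jobs with no AliRoot-related error

        System.out.printf("Job execution efficiency: %.1f%%%n", 100.0 * done / submitted);
        System.out.printf("System efficiency:        %.1f%%%n", 100.0 * ceErrorFree / submitted);
        System.out.printf("AliRoot efficiency:       %.1f%%%n", 100.0 * aliRootErrorFree / submitted);
    }
}
```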

Page 23

Phase 1 of PDC'04 – Statistics

  Number of jobs:                 56,000
  Job duration:                   8h (cent 1), 5h (peripheral 1), 2.5h (peripheral 2-5)
  Files per job:                  36
  Number of entries in AliEn FC:  3.8M
  Number of files in CERN MSS:    1.3M
  File size:                      26 TB
  Total CPU work:                 285 MSI-2k hours
  LCG CPU work:                   67 MSI-2k hours

Page 24

PDC'04 Phase 2

• Task – simulate the event reconstruction and remote event storage

[Diagram: central servers (master job submission, Job Optimizer producing N sub-jobs, RB, File Catalogue, process monitoring and control, SE) send sub-jobs to AliEn CEs and, via the AliEn-LCG interface and the LCG RB, to LCG CEs. Underlying event input files are read from CERN CASTOR. Each job's output files are zipped into an archive: the primary copy goes to the local SE, a backup copy to CERN CASTOR, and the archive is registered in the AliEn File Catalogue; on LCG SEs the files are stored with edg(lcg) copy&register, with LCG LFN = AliEn PFN.]

Page 25

• Start 01/07, end 26/09 (88 days active)
• As in the 1st phase, general equilibrium in the CPU contribution
• AliEn direct control: 17 CEs, each with an SE
• CERN-LCG encompasses the LCG resources worldwide (also with local/close SEs)

[Plot: individual sites' CPU contribution]

Page 26

Sites Occupancy

• Outside CERN, sites such as Bari, Catania and JINR have generally run at their maximum capacity

Page 27

Phase 2: Statistics and Failures

Statistics:
  Number of jobs:                400,000
  Job duration:                  6h per job
  Conditions:                    62
  Number of events:              15.2M
  Number of files in AliEn FC:   9M
  Number of files in storage:    4.5M, distributed at 20 CEs world-wide
  Storage at CERN MSS:           30 TB
  Storage at remote CEs:         10 TB
  Network transfer:              200 TB from CERN to remote CEs
  Total CPU work:                750 MSI-2k hours

Failures:
  Submission:          CE local scheduler not responding                    1%
  Loading input data:  remote SE not responding                             3%
  During execution:    job aborted (insufficient WN memory or AliRoot      10%
                       problems); job cannot start (missing application
                       software directory); job killed by the CE local
                       scheduler (too long); WN or global CE malfunction
                       (all jobs on a given site are lost)
  Saving output data:  local SE not responding                              2%

(Author's note: conditions reported by the ClusterMonitor.)
Page 28

PDC'04 Phase 3

• Task – user data analysis

[Diagram: a user job spanning many events queries the File Catalogue for its data set (ESDs, other); the Job Optimizer splits it into sub-jobs 1..n, grouped by the SE location of their input files; the Job Broker submits each sub-job to the CE with the closest SE; the sub-jobs run as CE-and-SE processing and produce output files 1..n; a file-merging job combines them into the final job output. A sketch of the grouping step follows below.]
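
A sketch of the grouping step from the diagram (names invented for illustration): input files are grouped by the storage element holding them, so that one sub-job per SE can be submitted to the CE closest to that SE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Group a data set's input files by SE location; each group becomes
// one sub-job, submitted to the CE with the closest SE.
public class SplitBySE {
    record InputFile(String lfn, String se) {}

    static Map<String, List<InputFile>> split(List<InputFile> dataSet) {
        Map<String, List<InputFile>> subJobs = new HashMap<>();
        for (InputFile f : dataSet) {
            subJobs.computeIfAbsent(f.se(), se -> new ArrayList<>()).add(f);
        }
        return subJobs; // key: SE -> files the corresponding sub-job reads
    }

    public static void main(String[] args) {
        List<InputFile> ds = List.of(
            new InputFile("/alice/esd/001.root", "CERN::Castor"),
            new InputFile("/alice/esd/002.root", "Bari::SE"),
            new InputFile("/alice/esd/003.root", "CERN::Castor"));
        split(ds).forEach((se, files) ->
            System.out.println("sub-job at " + se + " -> " + files.size() + " files"));
    }
}
```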

Page 29

Analysis

• Start September 2004, end January 2005
• Distribution charts built on top of the ROOT environment using the Carrot web interface
• Distribution of the number of running jobs (Fig. 1) – depends mainly on the number of waiting jobs in the Task Queue (TQ) and on the availability of free CPUs at the remote CEs
• Occupancy versus the number of queued jobs (Fig. 2) – occupancy increases as more jobs wait in the local batch queue, with saturation reached at around 60 queued jobs

(Author's notes: in that period the TQ always had sufficient waiting jobs for execution; under such circumstances the running-jobs distribution is a direct measurement of the availability of CPUs at the CEs. Fig. 2 shows the occupancy versus the number of queued jobs in the local batch system, expressed as the ratio between the number of running jobs and the maximum number allowed. Different batch schedulers (PBS, LSF, BQS) have different latencies in scheduling jobs for execution, so optimising the number of jobs in the local queues is necessary to achieve maximum occupancy with running jobs.)
Page 30

Section IV

Conclusions and Outlook

Page 31

Lessons from PDC'04

• User jobs have been running for 9 months using AliEn
• MonALISA has provided a flexible and complete monitoring framework, successfully adapted to the needs of the Data Challenge
• MonALISA has given the expected results for performance tuning and workload balancing
• Step-by-step approach: from resource tuning to resource optimisation
• MonALISA has been able to gather, store, plot, sort and group a large variety of monitored parameters, both basic and derived, in a rich set of presentation formats
• The Repository has been the only source of historical information, and its modular architecture has made possible the development of a variety of custom modules (~800 lines of fundamental source code and ~3K lines for service tasks)
• PDC'04 has been a real example of successful Grid interoperability, interfacing AliEn and LCG and proving the scalability of the AliEn design
• The usage of MonALISA in ALICE has been documented in an article for the Computing in High Energy and Nuclear Physics (CHEP) '04 conference in Interlaken, Switzerland
• Unprecedented experience in developing and improving a monitoring framework on top of a real, functioning Grid, massively testing the software technologies involved
• The framework is easy to extend, and components can be replaced with equivalent ones following technical needs or strategic choices

Page 32

Credits

• Dott. F. Carminati, L. Betev, P. Buncic and all colleagues in ALICE, for the enthusiasm they transmitted during this work
• The MonALISA team, always collaborative whenever I needed help