gridview - a monitoring & visualization tool for lcg

Post on 06-Jan-2016

GridView - A Monitoring & Visualization tool for LCG

Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav

B.A.R.C.

BARC-CERN/LCG Meeting 15.09.2006

GridView: New Developments (27th April to 15th September)

• Enhancements to GridFTP file transfer monitoring

• Development of summarization and presentation modules for

– Job Monitoring

– Service Availability Monitoring

• Deployment of all the new developments to the production system

File Transfer Monitoring

• Enhanced GridFTP summarization and presentation modules for

– VO-wise distribution of overall data transfers

– VO-wise distribution of data transfers per Site

– Site-wise distribution of data transfers per VO

• Developed graphs and reports for data transfers from all sites to a given site (Hourly and Daily reports)
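The VO-wise and site-wise breakdowns above are, in essence, group-by aggregations over transfer records. A minimal Python sketch follows, assuming a hypothetical `Transfer` record (source site, destination site, VO, byte count, hour bucket) and attributing "per site" to the source site; GridView's actual schema and database-side summarization are not described in these slides.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, NamedTuple


class Transfer(NamedTuple):
    """One GridFTP transfer record (hypothetical field layout)."""
    src_site: str
    dst_site: str
    vo: str
    nbytes: int
    hour: int  # hour bucket in which the transfer finished


def distribution(
    transfers: Iterable[Transfer],
    key: Callable[[Transfer], Hashable],
    where: Callable[[Transfer], bool] = lambda t: True,
) -> Dict[Hashable, int]:
    """Sum transferred bytes per key, over the records accepted by `where`."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for t in transfers:
        if where(t):
            totals[key(t)] += t.nbytes
    return dict(totals)


def vo_overall(transfers):
    """VO-wise distribution of all data transfers."""
    return distribution(transfers, key=lambda t: t.vo)


def vo_for_site(transfers, site):
    """VO-wise distribution of transfers leaving one (source) site."""
    return distribution(transfers, key=lambda t: t.vo,
                        where=lambda t: t.src_site == site)


def site_for_vo(transfers, vo):
    """Site-wise (by source site) distribution of one VO's transfers."""
    return distribution(transfers, key=lambda t: t.src_site,
                        where=lambda t: t.vo == vo)


def hourly_into_site(transfers, site):
    """All sites to a given site: bytes received per hour bucket, sorted."""
    per_hour = distribution(transfers, key=lambda t: t.hour,
                            where=lambda t: t.dst_site == site)
    return dict(sorted(per_hour.items()))
```

All four reports share one `distribution` helper and differ only in the grouping key and the filter, so a new breakdown needs no new aggregation code.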

File Transfer Monitoring : Overall VO-wise Details

File Transfer Monitoring : Site-wise details for a particular VO

Job Monitoring

• Developed summarization module for computation of job statistics

• Developed presentation module to display periodic Graphs and Reports for

– Job Status (Total Number of Jobs in various States)

– Job Success Rate

– Job Resource Utilization (Elapsed time, CPU, Memory)

– Average Job Turnaround time (RB Waiting, Site Waiting, Execution Time)

– Site, VO and RB-wise distribution

– Hourly, Daily, Weekly and Monthly reports
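To illustrate the job statistics listed above, the sketch below computes a success rate and splits average turnaround time into RB waiting, site waiting and execution time. It assumes a hypothetical `Job` record with four timestamps per job; how GridView actually derives these phases from the job-monitoring data is not stated in the slides.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Job:
    """Timestamps in seconds for one finished job (hypothetical schema)."""
    submitted: float   # accepted by the Resource Broker (RB)
    dispatched: float  # handed by the RB to a site
    started: float     # began executing at the site
    finished: float    # reached a terminal state
    succeeded: bool


def success_rate(jobs: List[Job]) -> float:
    """Fraction of the given finished jobs that succeeded (0.0 if none)."""
    if not jobs:
        return 0.0
    return sum(j.succeeded for j in jobs) / len(jobs)


def average_turnaround(jobs: List[Job]) -> Dict[str, float]:
    """Mean RB waiting, site waiting and execution time (needs >= 1 job)."""
    return {
        "rb_waiting": mean(j.dispatched - j.submitted for j in jobs),
        "site_waiting": mean(j.started - j.dispatched for j in jobs),
        "execution": mean(j.finished - j.started for j in jobs),
    }
```

Keeping the three phases separate, rather than reporting only total turnaround, shows whether delays come from the RB queue, the site queue or the job itself.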

Job Monitoring (Cont…)

• Developed periodic Graphs and Reports for

– Overall Summary

  • Sites with high/low job execution rate

  • Sites with high/low job success rate

  • VOs running more/fewer jobs, etc.

– Possible to view job statistics for any user-selected combination of VO, Site and RB
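The overall summary and the free combination of VO, Site and RB can be sketched as a filter followed by a ranking. The `JobRecord` type and the `select_jobs` and `rank_sites_by_success` helpers are hypothetical; breaking ties in success rate alphabetically by site name is an arbitrary choice made here.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional, Tuple


class JobRecord(NamedTuple):
    """Minimal per-job record (hypothetical)."""
    vo: str
    site: str
    rb: str
    succeeded: bool


def select_jobs(jobs: List[JobRecord], vo: Optional[str] = None,
                site: Optional[str] = None,
                rb: Optional[str] = None) -> List[JobRecord]:
    """Keep jobs matching every criterion the user set; None means 'any'."""
    return [
        j for j in jobs
        if (vo is None or j.vo == vo)
        and (site is None or j.site == site)
        and (rb is None or j.rb == rb)
    ]


def rank_sites_by_success(jobs: List[JobRecord]) -> List[Tuple[str, float, int]]:
    """(site, success rate, job count), highest success rate first."""
    counts = defaultdict(lambda: [0, 0])  # site -> [succeeded, total]
    for j in jobs:
        counts[j.site][0] += int(j.succeeded)
        counts[j.site][1] += 1
    rows = [(site, ok / total, total) for site, (ok, total) in counts.items()]
    return sorted(rows, key=lambda r: (-r[1], r[0]))
```

Because the ranking runs on whatever `select_jobs` returns, the same summary works for any VO/Site/RB combination, and the head and tail of the list give the high and low success-rate sites.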

Job Status : State-wise Distribution

Job Status : VO-wise Distribution

Job Status : RB-wise Distribution

Job Status : Site-wise Distribution

Job Monitoring : Job Success Rate

Job Monitoring : Average Job Turnaround time

Service Availability Monitoring

• Developed summarization module for computation of Service Availability

– Based on SAM Test Results

– AND (critical services) of OR (redundant services)

• Developed presentation module to display periodic Graphs and Reports for

– Central Service Availability (FTS, LFC, RB)

– Aggregate tier-1 site Availability

– Site-wise availability for individual tier-1 sites

– Site-wise service availability of tier-2 sites (grouped by associated VOs)

– Detailed availability of various services (CE, SE, SRM) and their individual instances running at a particular site
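One reading of the "AND (critical services) of OR (redundant services)" rule: a service counts as available if at least one of its redundant instances is available, and a site counts as available only if every critical service is. The Python sketch below assumes, hypothetically, that an instance is up when all of the SAM tests it ran passed; the actual test selection and status values GridView uses are not given in the slides.

```python
from typing import Dict, List

SamResults = Dict[str, bool]  # SAM test name -> passed? (hypothetical shape)


def instance_up(results: SamResults) -> bool:
    """Assumed rule: an instance is up only if it ran tests and all passed."""
    return bool(results) and all(results.values())


def service_available(instances: List[SamResults]) -> bool:
    """OR over redundant instances: up if at least one instance is up."""
    return any(instance_up(r) for r in instances)


def site_available(services: Dict[str, List[SamResults]],
                   critical: List[str]) -> bool:
    """AND over critical services; a missing critical service counts as down."""
    return all(service_available(services.get(name, [])) for name in critical)
```

Services left out of `critical` do not affect the site result, which is what lets a redundant or optional service fail without marking the whole site down.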

Service Availability Monitoring (Cont…)

• Reports on an Hourly, Daily, Weekly and Monthly basis

• Traceability from Aggregate Availability to Individual Service Instance Availability

• Provision for saving user preferences based on user certificates
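Hourly, daily, weekly and monthly reports can be derived from one hourly series by rolling it up. The sketch below assumes hourly availability values between 0.0 and 1.0 and averages them with equal weight per hour; that weighting, the treatment of missing hours, and the `roll_up` helper itself are assumptions rather than GridView's documented method.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Callable, Dict, Hashable, List


def roll_up(
    hourly: Dict[datetime, float],
    period: Callable[[datetime], Hashable],
) -> Dict[Hashable, float]:
    """Average hourly availability (0.0-1.0) over each coarser period.

    Every hour present in `hourly` gets equal weight; hours missing from
    the input are skipped rather than counted as unavailable.
    """
    buckets: Dict[Hashable, List[float]] = defaultdict(list)
    for hour, value in hourly.items():
        buckets[period(hour)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


def by_day(ts: datetime) -> date:
    return ts.date()


def by_week(ts: datetime) -> tuple:
    year, week, _ = ts.isocalendar()  # ISO year and ISO week number
    return (year, week)


def by_month(ts: datetime) -> tuple:
    return (ts.year, ts.month)
```

Deriving every report from the same hourly values keeps the coarser reports consistent with the finer ones, which also helps when tracing an aggregate figure back to the hours behind it.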

Service Availability Monitoring : Central Service Availability

Service Availability Monitoring : FTS Instance Availability

Service Availability Monitoring : Aggregate T1 Site Availability

Service Availability Monitoring : Tier-1 Site Availability

Service Availability Monitoring : Site Detail Availability

On-going Work

• Presentation of detailed SAM test results, for traceability from Availability Graphs to the corresponding tests

• Development of Weekly and Monthly reports for data transfers from all sites to a given site

• Modification of the GridFTP file transfer GUI and Reports to enable multiple-site selection (new request)

Future Work

• Visualization of FTS Statistics

• Archival of job data for jobs submitted directly to the CE

• Interfacing GridView with the Information System (top-level BDII) for Resource Availability

– Compute nodes (WNs), Storage, etc.

Future Work : Visualization of FTS Statistics

• Currently, GridView visualizes GridFTP data transfer rates across sites.

• FTS statistics to be visualized include

– Successful transfers

– Failure rates

– VO-wise, FTS server-wise and Channel-wise details of data transfers
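The planned FTS statistics (successful transfers and failure rates, broken down by VO, FTS server or channel) reduce to counting outcomes per grouping key. A minimal sketch, assuming the input is a stream of hypothetical `(key, succeeded)` pairs; how such pairs would be extracted from FTS is not covered in the slides.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def outcome_stats(
    transfers: Iterable[Tuple[Hashable, bool]],
) -> Dict[Hashable, Tuple[int, float]]:
    """Per grouping key: (number of successful transfers, failure rate).

    Each input item is (key, succeeded); the key may be a VO name,
    an FTS server or a channel name.
    """
    total: Counter = Counter()
    failed: Counter = Counter()
    for key, succeeded in transfers:
        total[key] += 1
        if not succeeded:
            failed[key] += 1
    return {key: (total[key] - failed[key], failed[key] / total[key])
            for key in total}
```

Feeding `(channel, ok)`, `(vo, ok)` or `(server, ok)` pairs yields the three breakdowns without changing the function.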

Problems

• No data has been published to the R-GMA table JobMonitor for the past 2 months (despite repeated reminders)

• GridView availability depends on

– R-GMA service

– Oracle database service

– SAM/SFT tests

• Instabilities in the GridView service caused by

– R-GMA instabilities

  • Registry failures, Monbox failures, data loss, etc.

– Occasional Oracle downtime

– Unannounced software upgrades on production machines, leading to broken code

• Subsequently, the GridView address was added to the cern-quattor-announce mailing list, and upgrades are done manually by the GridView team

Thank You

Your comments and suggestions, please
