CCRC’08 Tools for measuring our progress


Post on 15-Jan-2016






CCRC’08 Tools for measuring our progress. CCRC’08 F2F, 5th February 2008. James Casey, IT-GS-MND. Overview: Tracking the challenge (Observations elog); Measuring MoU response times (Logbook elog); Reconciling the experiment and infrastructure views (CCRC’08 ServiceMap). PowerPoint PPT Presentation


<p>CCRC’08 - Measuring our progress</p>
<p>CCRC’08 Tools for measuring our progress. CCRC’08 F2F, 5th February 2008. James Casey, IT-GS-MND. CERN IT Department, CH-1211 Geneva</p>
<p>Overview</p>
<p>Tracking the challenge: Observations elog. Measuring MoU response times: Logbook elog. Reconciling the experiment and infrastructure views: CCRC’08 ServiceMap. Things to come: reporting MoU to the sites.</p>
<p>SC4 Twiki</p>
<p>Problems with the twiki</p>
<p>Hard to generate reports from a twiki. Statistics extraction is manual (messages/incidents per day, per site, ...). Everyone has to poll: no feeds. No categorization. No threading. Want it to be write-once, read-many: no changing history!</p>
<p>Solution</p>
<p>We believe elog gives us these features. Let's use CCRC’08 to test it; the fallback solution could be a standard blog. I'd encourage everyone to use it. The secretary of the CCRC’08 daily meeting will also add items of interest that arise.</p>
<p>Demo: '08+Observations/</p>
<p>RSS feed: '08+Observations/elog.rdf</p>
<p>MoU response times</p>
<p>We've agreed to try to measure MoU metrics during CCRC’08, to evaluate whether we can actually do it!</p>
<table>
<tr><th rowspan="2">Service</th><th colspan="3">Maximum delay in responding to operational problems</th><th colspan="2">Average availability measured on an annual basis</th></tr>
<tr><th>Service interruption</th><th>Degradation of the capacity of the service by more than 50%</th><th>Degradation of the capacity of the service by more than 20%</th><th>During accelerator operation</th><th>At all other times</th></tr>
<tr><td>Acceptance of data from the Tier-0 Centre</td><td>12 hours</td><td>12 hours</td><td>24 hours</td><td>99%</td><td>n/a</td></tr>
<tr><td>Networking service to the Tier-0 Centre during accelerator operation</td><td>12 hours</td><td>24 hours</td><td>48 hours</td><td>98%</td><td>n/a</td></tr>
<tr><td>Data-intensive analysis services, including networking to Tier-0, Tier-1 Centres</td><td>24 hours</td><td>48 hours</td><td>48 hours</td><td>98%</td><td>98%</td></tr>
<tr><td>All other services, prime service hours</td><td>2 hours</td><td>2 hours</td><td>4 hours</td><td>98%</td><td>98%</td></tr>
<tr><td>All other services, other times</td><td>24 hours</td><td>48 hours</td><td>48 hours</td><td>97%</td><td>97%</td></tr>
</table>
<p>Logbook: '08+Logbook/</p>
<p>Response time reporting workflow</p>
<p>Workflow states: New Problem, Site Acknowledged, Site Fixed, VO Confirmed, Problem Solved.</p>
<p>2008-02-01 10:30 New Problem. VO: ATLAS, MoU Area: Distribution of data to Tier-1 centres, Site: CERN-PROD. SRM not working</p>
<p>2008-02-01 11:30 Site Acknowledged. Working on it!</p>
<p>2008-02-01 11:49 Site Fixed. We've found the problem in the endpoint, restarted</p>
<p>2008-02-01 12:43 VO Confirmed. All working again, thanks!</p>
<p>Problem Report: Issue ID #42, opened 2008-02-01 10:30. MoU Area: CERN-PROD / Distribution of data to Tier-1 Centres. Time to first response: 1:00. Time to problem resolved: 1:19. Time to VO confirmation: 2:13.</p>
<p>Measuring MoU availability</p>
<p>Three views: the experiment framework/dashboard view, the operational testing (SAM/SLS) view, and the "human" view, which is the control.</p>
<p>Mapping to MoU Services</p>
<p>Tier-1 grid services: ArcCE, BDII, CE, FTS, LFC, MYPX, OSGCE, RB, RGMA, SE, SRM, SRMv2, VOBOX, gCE, gRB, sBDII</p>
<p>MoU categories: Acceptance of data from Tier-0 *; Networking Services to Tier-0 *; Data-intensive analysis service, including networking to Tier-0; All Other Services</p>
<p>Map grid services status (from SAM) to MoU categories. These are custom service availability calculations. Use the CMS SAM portal framework as the basis for implementing this, and send results directly to Tier-1 Nagios.</p>
<p>CMS SAM Portal</p>
<p>ServiceMap</p>
<p>What's a ServiceMap? It's a gridmap with many different maps, showing different aspects of the WLCG infrastructure.</p>
<p>What's the CCRC’08 ServiceMap? Service readiness; service availability for VO critical services; experiment metrics. A single place to see both the VO and the infrastructure view of the grid.</p>
<p>CCRC’08 ServiceMap Demo</p>
<p>Service Readiness</p>
<p>A measure of how production-ready a service is, in terms of software, service and deployment. Manually edited (under SVN control) by those responsible: EIS team, service managers, deployment team.</p>
<p>Experiment metrics</p>
<p>Show the VO view of the infrastructure. Two extra maps: Reliability (e.g. successful data transfers, jobs, ...) and Metrics (MB/s, events/s, ...). Need interaction with experiments to create these two views. Note that this is a very similar structure to the MoU view; perhaps we merge the two, and report to sites on this structure?</p>
<p>Summary</p>
<p>CCRC’08 is a good opportunity to try some new operational tools and evaluate them in a real-world mode. The CCRC’08 ServiceMap seems to give a useful view of the grid. Need to iterate on what is useful to show, and fill in the white spaces.</p>
<p>Next Steps</p>
<p>MoU calculation and reporting to sites. Feedback on all the tools welcome!</p>
<p>Links to tools</p>
<p>CCRC’08 ServiceMap</p>
<p>Observations logbook: '08+Observations/ (RSS feed: '08+Observations/elog.rdf)</p>
<p>Response tracking logbook: '08+Logbook/ (RSS feed: '08+Logbook/elog.rdf)</p>
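<p>The response time reporting workflow shown earlier can be sketched in a few lines of Python. This is only an illustration, not the elog implementation: the entry list is copied from the worked example on the workflow slide, the state names follow that slide, and the helpers <code>response_times</code> and <code>fmt</code> are hypothetical names introduced here. All three intervals are assumed to be measured from the New Problem entry.</p>

```python
from datetime import datetime, timedelta

# Log entries copied from the workflow slide's worked example.
# The (timestamp, state) tuple layout is an assumption made for this sketch.
ENTRIES = [
    ("2008-02-01 10:30", "New Problem"),
    ("2008-02-01 11:30", "Site Acknowledged"),
    ("2008-02-01 11:49", "Site Fixed"),
    ("2008-02-01 12:43", "VO Confirmed"),
]


def parse(timestamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp as used in the example entries."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M")


def response_times(entries) -> dict:
    """Return the three reported intervals, each measured from 'New Problem'."""
    times = {state: parse(timestamp) for timestamp, state in entries}
    opened = times["New Problem"]
    return {
        "first_response": times["Site Acknowledged"] - opened,
        "resolved": times["Site Fixed"] - opened,
        "vo_confirmation": times["VO Confirmed"] - opened,
    }


def fmt(delta: timedelta) -> str:
    """Format an interval as H:MM, the style used in the problem report."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


if __name__ == "__main__":
    for name, delta in response_times(ENTRIES).items():
        print(f"{name}: {fmt(delta)}")
```

<p>Keeping the raw timestamps and deriving the intervals on demand means a report can be regenerated if an entry is corrected, and each interval can be compared against the maximum delay given for the relevant service in the MoU table.</p>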