EGEE is a project funded by the European Union under contract IST-2003-508833
Roles & Responsibilities. Ian Bird, SA1 Manager. Cork Meeting, 18-22 April 2004



TRANSCRIPT

EGEE is a project funded by the European Union under contract IST-2003-508833
Roles & Responsibilities
Ian Bird, SA1 Manager
Cork Meeting, 18-22 April 2004

Contents
  Review objectives
  How these map to ROC, CIC, OMC
  Overview of organisation
  ROC mandates
  CIC mandates
  OMC role
  OAG or not?
  Responsibility for deliverables

SA1 Objectives
  Core Infrastructure services:
    Operate essential grid services
  Grid monitoring and control:
    Proactively monitor the operational state and performance
    Initiate corrective action
  Middleware deployment and resource induction:
    Validate and deploy middleware releases
    Set up operational procedures for new resources
  Resource provider and user support:
    Coordinate the resolution of problems from both Resource Centres and users
    Filter and aggregate problems, providing or obtaining solutions
  Grid management:
    Coordinate Regional Operations Centres (ROC) and Core Infrastructure Centres (CIC)
    Manage the relationships with resource providers via service-level agreements
  International collaboration:
    Drive collaboration with peer organisations in the U.S. and in Asia-Pacific
    Ensure interoperability of grid infrastructures and services for cross-domain VOs
    Participate in liaison and standards bodies in wider grid community

SA1 Objectives
  Core Infrastructure services: (CIC, ROC)
    Operate essential grid services
  Grid monitoring and control: (CIC, ROC)
    Proactively monitor the operational state and performance
    Initiate corrective action
  Middleware deployment and resource induction: (OMC, ROC)
    Validate and deploy middleware releases
    Set up operational procedures for new resources
  Resource provider and user support: (ROC, CIC)
    Coordinate the resolution of problems from both Resource Centres and users
    Filter and aggregate problems, providing or obtaining solutions
  Grid management: (OMC, ROC)
    Coordinate Regional Operations Centres (ROC) and Core Infrastructure Centres (CIC)
    Manage the relationships with resource providers via service-level agreements
  International collaboration: (OMC)
    Drive collaboration with peer organisations in the U.S. and in Asia-Pacific
    Ensure interoperability of grid infrastructures and services for cross-domain VOs
    Participate in liaison and standards bodies in wider grid community

Operations Infrastructure
  CERN (OMC, CIC)
  UK+Ireland (CIC, ROC)
  France (CIC, ROC)
  Italy (CIC, ROC)
  Germany+Switzerland (ROC)
  Northern Europe (ROC)
  South West Europe (ROC)
  South East Europe (ROC)
  Central Europe (ROC)
  Russia (CIC M12, ROC)
  48 partners involved in SA1
  ROCs in several regions are distributed across many sites

Grid Operations Management Structure

ROC responsibilities
  Coordinate and support deployment
    Bring new resources into the infrastructure and support their operation within it. This includes coordination of existing national grid infrastructures and how they integrate into EGEE (OMC will help)
    Act as source of expert advice and technical support in deploying and operating the middleware in the RCs
    Participate in validating new middleware releases: organise and operate certification testbeds within the region, provide resources (from regional RCs) for pre-production service
    Port middleware to specific regional needs (e.g. local supercomputers etc.)
  Coordinate and support operations
    Act as front-line user support
    Act as front-line support for resource centres
    Refer problems to CIC, OMC, or as appropriate
    Operate (or organise) regional grid services as necessary: BDII, RB, UI?

ROC - management
  Coordinate RC management
  Negotiate and monitor SLAs within the region
  Negotiate app access to resources within region
  Organise CAs within region
  Negotiate SLAs within region and of region to project
  Coordinate reporting of SA1 partners within region
  Coordinate planning for the regional activities
  Teams:
    Deployment team
    24-hour support team (answers user and RC problems)
  Operations training at RCs
  Organise tutorials for users
  Organise formal deliverables, reviewers etc where region has responsibility
  Contribute to release notes, planning guides, execution plans, operations guides, etc

CIC mandate
  Operate infrastructure services
    VO services: VO servers, VO registration service
    RBs, UIs
    RLS and other database services
    BDIIs
    Ensure recovery procedures and fail-over (between CICs)
  Act as Grid Operations Centre
    Monitoring, proactive troubleshooting
    Performance monitoring
    Control sites' participation in production service
    Support to ROCs for operational problems
    Operational configuration management and change control
    Accounting and resource usage/availability monitoring
  Contribute to release notes, guides, etc
  Feedback for infrastructure improvement

Support
  ROCs, CICs, OMC etc: all have support role
  Require good problem tracking system
    Distributed: exchange trouble tickets (long term)
    Central: accessible through web, etc (short term)
    Responsibility of a CIC?
  Single system for all, or user support separated from operational support?
    Central operational support
    ROC-managed user support with central 2nd-line database

OMC roles
  Edit release notes
  Edit planning guide (cookbooks)
  Edit execution and implementation plans
  Coordinate reporting
  Coordinate operations
    Via ROC managers, CIC managers, policy body
  Provide security oversight and coordination
  Coordinate SLAs between regions
  Coordinate with international grid projects
    Negotiate interoperation policies and frameworks
    Set up joint projects to address common issues

OAG
  In TA, proposed OAG to:
    Advise operations management of policy issues
    Negotiate agreements between RCs needed to operate grid
      Security, CA policies, access policies etc
  Makeup:
    Managers of ROCs, CICs, OMC
    Reps of apps (NA4)

OMC Roles
  SA1 manager
  Deputy manager
  Production service manager
  Pre-production service manager (could be external)
  Planning officer
  Security officer
  CIC coordinator
  ROC coordinator (INFN)
  Teams:
    Certification
    Deployment
    CIC

Coordination bodies
  ROC Managers
    Coordinator: Cristina Vistoli (INFN)
  CIC Managers
    Need coordinator; need to agree how they work together
  Operations Management
    OMC, ROC managers, CIC managers, SA2, reps from NA4 (=OAG)
    Resource allocation policy body as a subgroup?
  Security group
    Clarify relationship with JRA3
  Forum for RC system admins/managers?
  Need CIC managers
  Set up management group (CIC managers, ROC managers, OMC) to finish execution plan

Milestones & Deliverables
  Month | Deliverable / Milestone | Item | Lead
  ------+-------------------------+------+-----
  M03 | DSA1.1 | Detailed execution plan for first 15 months of infrastructure operation | CERN
  M06 | MSA1.1 | Initial pilot production grid operational | -
  M06 | DSA1.2 | Release notes corresponding to the initial pilot Grid infrastructure operational | INFN
  M09 | DSA1.3 | Accounting and reporting web site publicly available | CCLRC
  M09 | MSA1.2 | First review | -
  M12 | DSA1.4 | Assessment of initial infrastructure operation and plan for next 12 months | IN2P3
  M14 | DSA1.5 | First release of EGEE Infrastructure Planning Guide (cook-book) | CERN
  M14 | MSA1.3 | Full production grid infrastructure operational | -
  M14 | DSA1.6 | Release notes corresponding to the full production Grid infrastructure operational | CCLRC
  M18 | MSA1.4 | Second review | -
  M22 | DSA1.7 | Updated EGEE Infrastructure Planning Guide | CERN
  M24 | DSA1.8 | Assessment of production infrastructure operation and outline of how sustained operation of EGEE might be addressed | IN2P3
  M24 | MSA1.5 | Third review and expanded production grid operational | -
  M24 | DSA1.9 | Release notes corresponding to expanded production Grid infrastructure operational | INFN

Deliverables and responsibility
  Accounting web site: UK
  Release notes
    Should be existing; the deliverable is a snapshot. Responsibility is to take the existing release notes and create a formal document.
    Should be a summary of
  Planning guides (cook-book)
    Should be a living, evolving document that is a major outcome of this phase of the project: the sum of all that we learn
    Deliverables are snapshots
  Other needed documents:
    Evolving plan laying out what we intend to do; starts with execution plan (appendices)
    Release notes and cookbook are a summary of what we did and what we learned