Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/ CF Alarming with GNI [email protected] VOC WG meeting 12th September 2013

Computing Facilities

CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it

CF

Alarming with GNI

[email protected]

VOC WG meeting
12th September 2013

Agenda

• GNI Overview
• Metrics Manager
  – Metric Registration
  – Metric Workflow
  – Quattor Legacy
• Lemon Producer
• GNI Consumers
  – Service Now Integration
  – GNI Dashboard
  – No Contact Processor
• Current Status and Next Steps

GNI Overview

Architecture

Metrics Manager

Metric Registration

• Lemon Metric Manager: https://metricmgr.cern.ch
• Single entry point for Quattor & Puppet metric configuration
• Keeps default parameter settings and assigns responsibility
  – Metric parameters can be overridden via Puppet
• Lemon metrics concept:
  – A sensor implements multiple metric class definitions
  – A metric class can be used for multiple metric definitions

[Diagram labels: Puppet, Hiera, node, Lemon Agent, Lemon Forwarder, configuration files, Metric Manager]
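The sensor / metric class / metric relationship, and the way node-level settings take precedence over Metric Manager defaults, can be sketched roughly as follows. This is only an illustration: the class names, the sample sensor, the metric IDs and the parameter names (`interval`, `threshold`) are all hypothetical and are not taken from the Metric Manager schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricClass:
    """A metric class defined by a sensor (name is illustrative)."""
    name: str


@dataclass
class Sensor:
    """A sensor implements one or more metric classes."""
    name: str
    metric_classes: List[MetricClass] = field(default_factory=list)


@dataclass
class Metric:
    """A concrete metric: one use of a metric class with default parameters."""
    metric_id: int
    metric_class: MetricClass
    defaults: Dict[str, object] = field(default_factory=dict)

    def effective_params(self, node_overrides: Dict[str, object]) -> Dict[str, object]:
        # Node-level values (e.g. set through Puppet) win over the defaults.
        return {**self.defaults, **node_overrides}


usage_class = MetricClass("example.Usage")
sensor = Sensor("example_sensor", [usage_class, MetricClass("example.Count")])

# One metric class backing two different metrics (hypothetical IDs).
m1 = Metric(90001, usage_class, {"interval": 300, "threshold": 90})
m2 = Metric(90002, usage_class, {"interval": 60, "threshold": 75})

print(m1.effective_params({"threshold": 95}))
```

The defaults stored on the metric are left untouched; the override only affects the merged view computed for a given node.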

Metric Workflow

• Supports Puppet-only and Puppet + Quattor metrics
• New metrics:
  – Draft: user defines metric
  – Pending: user submits metric for approval, itmon team verifies
  – Production: itmon team propagates new metric to agent definitions
• Metrics already in Quattor:
  – Legacy: metric was imported from Quattor but is not enabled in Puppet
  – Production: itmon team propagates metric to lemon agent definitions
• Changes to production metrics:
  – Production: user changes metric definition
  – Production: itmon team propagates metric to lemon agent definitions
• Further details: https://metricmgr.cern.ch/help/
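The workflow states above can be read as a small state machine. The sketch below is one possible encoding, not the Metric Manager implementation; the action names (`submit`, `propagate`, `change`) are assumptions introduced only for illustration.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PENDING = "pending"
    LEGACY = "legacy"
    PRODUCTION = "production"


# Allowed transitions: (current state, action) -> next state.
# Action names are hypothetical labels for the steps described on the slide.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.PENDING,             # user submits for approval
    (State.PENDING, "propagate"): State.PRODUCTION,     # itmon verifies and propagates
    (State.LEGACY, "propagate"): State.PRODUCTION,      # imported Quattor metric enabled
    (State.PRODUCTION, "change"): State.PRODUCTION,     # user edits the definition
    (State.PRODUCTION, "propagate"): State.PRODUCTION,  # itmon re-propagates
}


def advance(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.value}") from None


state = State.DRAFT
for action in ("submit", "propagate"):
    state = advance(state, action)
print(state)
```

Keeping the transitions in a table makes it easy to see that a draft metric cannot reach production without passing through the pending (approval) step.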

Quattor Legacy

• Metric definition must still be added to Quattor
  – Copy the generated Quattor code into a CDB template
  – e.g. under prod/pro_monitoring_*.tpl

Lemon Producer

Lemon Producer

• Main components:
  – Lemon agent and sensors: no changes
  – Lemon forwarder: wraps Lemon data in JSON format
  – Lemon tools: no changes to lemon-host-check and lemon-cli
• Notifications are sent based on Lemon exceptions (alarms)
• Notifications can be customized in the node:
  – Can be configured via Puppet (How-to)
  – Overrides defaults in Metrics Manager
• Users can create other notifications

[Diagram labels: Puppet, Hiera, node, Lemon Agent, Lemon Forwarder, configuration files, Metric Manager]
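To give a feel for what "wrapping Lemon data in JSON" might look like, here is a minimal sketch. The slides do not show the Lemon sample layout or the GNI message schema, so the function name `wrap_sample` and every field name in the envelope are assumptions.

```python
import json
import time


def wrap_sample(node: str, metric_id: int, timestamp: int, values: list) -> str:
    """Wrap one monitoring sample in a JSON envelope (hypothetical schema)."""
    envelope = {
        "producer": "lemon",
        "node": node,
        "metric_id": metric_id,
        "timestamp": timestamp,
        "values": values,
    }
    # sort_keys gives a stable field order, which makes messages easy to diff.
    return json.dumps(envelope, sort_keys=True)


message = wrap_sample("node01.example.org", 90001, int(time.time()), [42.0, "ok"])
decoded = json.loads(message)
print(decoded["node"], decoded["values"])
```

A self-describing envelope like this lets consumers on the message broker parse notifications without knowing anything about the Lemon wire format.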

GNI Consumers

Service Now Integration

• Takes notifications marked for incident creation
• Checks if notification should be masked
• Opens incidents in SNOW
• Re-submits notification with incident ID
• Supports masking of ticket creation
• Today takes alarmed flag defined in Foreman
  – Requires successful Puppet run
• In the future it will be integrated with Roger
  – Developed by config team
  – Prototyping phase
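The four processing steps listed above can be sketched as a single function. This is not the GNI consumer code: the `Notification` fields, the callback signatures and the incident ID are hypothetical, and two behaviours are assumptions added for the sketch: a node is treated as masked when its alarmed flag is off, and a notification that already carries an incident ID is skipped so that the re-submitted message does not open a second ticket.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Notification:
    node: str
    summary: str
    create_incident: bool = False        # "marked for incident creation"
    incident_id: Optional[str] = None


def process(
    note: Notification,
    is_masked: Callable[[str], bool],
    open_incident: Callable[[Notification], str],
    resubmit: Callable[[Notification], None],
) -> Optional[str]:
    """Open an incident for an unmasked, marked notification and re-submit it."""
    if not note.create_incident or note.incident_id is not None:
        return None                      # not marked, or already has a ticket
    if is_masked(note.node):
        return None                      # ticket creation is masked for this node
    note.incident_id = open_incident(note)
    resubmit(note)                       # publish again, now carrying the ID
    return note.incident_id


# Stand-ins for the real services (all hypothetical).
alarmed = {"node01": True, "node02": False}   # e.g. an "alarmed" flag per node
resubmitted = []
result = process(
    Notification("node01", "disk failure", create_incident=True),
    is_masked=lambda node: not alarmed.get(node, False),
    open_incident=lambda n: "INC0000001",
    resubmit=resubmitted.append,
)
print(result, len(resubmitted))
```

Passing the masking check in as a callback means the source of the masking state (a Foreman flag today, Roger later) can change without touching the processing logic.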

Integration with Roger

• Masking in Roger
  – Service providing information about host state and masking state
  – Set masking for no contact notifications and 3 notification types:
    • Hardware, OS, Application
• All exceptions must be classified under a notification type:
  – Hardware, OS, Application
• FE responsibles will be asked to classify their exceptions
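A rough sketch of type-based masking follows. It does not use the Roger API, which the slides do not describe; the exception names, the in-memory tables and the function names are hypothetical stand-ins for a classification list and a per-host masking state.

```python
from enum import Enum


class NotificationType(Enum):
    HARDWARE = "hardware"
    OS = "os"
    APPLICATION = "application"
    NO_CONTACT = "no_contact"


# Hypothetical classification of exceptions by notification type.
EXCEPTION_TYPES = {
    "example_disk_error": NotificationType.HARDWARE,
    "example_kernel_panic": NotificationType.OS,
    "example_daemon_down": NotificationType.APPLICATION,
}

# Hypothetical per-host masking state, as a masking service might report it.
MASKED = {
    "node01": {NotificationType.APPLICATION, NotificationType.NO_CONTACT},
}


def classify(exception_name: str) -> NotificationType:
    """Every exception must have a type; an unclassified one is an error."""
    try:
        return EXCEPTION_TYPES[exception_name]
    except KeyError:
        raise LookupError(f"unclassified exception: {exception_name}") from None


def is_masked(host: str, exception_name: str) -> bool:
    """True if the host masks the notification type of this exception."""
    return classify(exception_name) in MASKED.get(host, set())


print(is_masked("node01", "example_daemon_down"))
print(is_masked("node01", "example_disk_error"))
```

Raising on an unclassified exception, rather than defaulting to a type, reflects the requirement that every exception be classified before masking can be applied.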

GNI Dashboard

No Contact Processor

• Heartbeat from Lemon metric updates
• Processor looks at heartbeat timeout
• Raises GNI notification
  – Creates SNOW incident for CC Operator
• If node comes back
  – Closes GNI notification
• Possible to mask with Roger
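The heartbeat logic described above can be sketched as a small class. This is an illustration under assumptions, not the GNI processor: the class and method names and the 600-second timeout are hypothetical, time is passed in explicitly to keep the example deterministic, and incident creation and Roger masking are left out.

```python
class NoContactProcessor:
    """Track last-seen times and raise/close notifications on timeout."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}               # node -> time of last metric update
        self.open_notifications = set()   # nodes with an open notification

    def heartbeat(self, node: str, now: float) -> None:
        # Any metric update from the node counts as a heartbeat.
        self.last_seen[node] = now
        # A node that comes back closes its open notification.
        self.open_notifications.discard(node)

    def check(self, now: float) -> list:
        """Return nodes whose notification was newly raised at time `now`."""
        raised = []
        for node, seen in self.last_seen.items():
            if now - seen > self.timeout and node not in self.open_notifications:
                self.open_notifications.add(node)
                raised.append(node)
        return raised


proc = NoContactProcessor(timeout=600)   # timeout value is an assumption
proc.heartbeat("node01", now=0)
proc.heartbeat("node02", now=0)
proc.heartbeat("node02", now=500)
print(proc.check(now=700))               # node01 silent for 700 s, node02 for 200 s
proc.heartbeat("node01", now=800)
print(proc.open_notifications)
```

Tracking open notifications separately from last-seen times keeps a silent node from raising a new notification on every check.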

Current Status & Next Steps

• Current status
  – Deployed dev and prod instances of GNI, including Metric Manager
  – Migrated from Apollo to ActiveMQ
  – Integrated with training instance of Service Now
• Next steps
  – Integrate Roger service for run-time notification type masking
  – Review default exception configuration
  – Start opening SNOW incidents for hardware notifications
  – Redirect production GNI instance to production Service Now

Questions?

[email protected]

http://cern.ch/itmon
