Jeremy Nowell, EPCC, University of Edinburgh
http://www.npm-alarms.org/
A Standards Based Alarms Service for Monitoring Federated Networks
Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew
ICNS 2009, Valencia, 24 April 2009
24 April 2009 Jeremy Nowell - A Standards Based Alarms Service 2
Project Background
• EPCC is the supercomputing centre at the University of Edinburgh
– Hosts the UK national academic HPC service
– Academic and industrial consultancy
– http://www.epcc.ed.ac.uk/
• EPCC has been working in the area of network monitoring for Grids for 5 years
– First within the EGEE project, now more widely
Overview
• Challenges of monitoring federated networks
• Standards-based network monitoring
• Why an Alarms Service?
• Architecture
• Examples
• Future Work
Federated Networks
Network Monitoring Challenges
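[Figure: the dimensions of the monitoring challenge (Network Types, Monitoring Tools, User Groups, Data Formats, Administrative Domains), illustrated with labels such as NOC, GOC, end user, project, NREN, MAN, backbone, end-to-end, iperf, ping, netflow, perfSONAR, RRD, SQL and flat file.]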
Federated Networks for Grids
[Figure: federated network hierarchy with GÉANT2 at the top and NRENs, MANs and campus networks beneath it on each side.]
• For Grids we need
– a unified view
– end-to-end performance
• real, achievable application performance
Federated Network Monitoring Strategy
• Use existing tools and data
– Do not try to force adoption of a single tool across large, multi-administrative domains
– Instead, provide a framework for accessing distributed data
• Use standards-based solutions where possible
– Access a wide range of data
– Allow interoperability between Grids, projects and networks
Standards-Based Network Monitoring
• Data federation through use of the schema provided by the Open Grid Forum (OGF) Network Measurements Working Group (NM-WG)
[Figure: end users of network data (resource-brokering middleware, NOC/GOC, users) reach several monitoring frameworks (an NREN using perfSONAR, an end-site using perfSONAR, an end-site using e2emonit, and a home-grown framework) through NM-WG clients and services.]
NM-WG schema allows interoperability between clients and measurement frameworks
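To make the idea of a shared schema concrete, the sketch below builds a minimal NM-WG-style `SetupDataRequest` asking a Measurement Archive for interface utilisation data. The namespace URIs and element names follow the pattern of perfSONAR's NM-WG messages as far as we know, but treat them as assumptions to check against the NM-WG schema and the target archive's documentation; real requests usually also carry time-range selection parameters, omitted here. The helper `q`, the host name and the interface name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs in the style used by perfSONAR NM-WG messages (assumed; check
# them against the schema version the target Measurement Archive expects).
NMWG = "http://ggf.org/ns/nmwg/base/2.0/"
NMWGT = "http://ggf.org/ns/nmwg/topology/2.0/"
NETUTIL = "http://ggf.org/ns/nmwg/characteristic/utilization/2.0/"
UTILIZATION_EVENT = "http://ggf.org/ns/nmwg/characteristic/utilization/2.0"

# Serialise with the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in (("nmwg", NMWG), ("nmwgt", NMWGT), ("netutil", NETUTIL)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name such as '{uri}tag'."""
    return f"{{{ns}}}{tag}"


def build_utilization_request(host_name: str, if_name: str, direction: str) -> str:
    """Return an NM-WG-style SetupDataRequest for one router interface."""
    message = ET.Element(q(NMWG, "message"), {"type": "SetupDataRequest", "id": "msg1"})
    metadata = ET.SubElement(message, q(NMWG, "metadata"), {"id": "meta1"})
    subject = ET.SubElement(metadata, q(NETUTIL, "subject"), {"id": "subj1"})
    interface = ET.SubElement(subject, q(NMWGT, "interface"))
    ET.SubElement(interface, q(NMWGT, "hostName")).text = host_name
    ET.SubElement(interface, q(NMWGT, "ifName")).text = if_name
    ET.SubElement(interface, q(NMWGT, "direction")).text = direction
    ET.SubElement(metadata, q(NMWG, "eventType")).text = UTILIZATION_EVENT
    # The (empty) data element points back at the metadata block it asks about.
    ET.SubElement(message, q(NMWG, "data"), {"id": "data1", "metadataIdRef": "meta1"})
    return ET.tostring(message, encoding="unicode")


print(build_utilization_request("router1.example.net", "xe-0/0/0", "in"))
```

Because any framework that accepts NM-WG messages can in principle answer the same request, the client does not need to know whether perfSONAR or a home-grown framework sits behind the archive.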
Standards-Based Network Monitoring
• EPCC has developed tools for accessing historical network performance data from multiple measurement frameworks
• e2emonit
– End-to-end metrics (TCP/UDP achievable bandwidth, RTT, packet loss, OWDV)
– Active measurement tools (iperf, ping, udpmon)
• perfSONAR
– Developed by a collaboration including GÉANT2, ESnet and Internet2
– Passive data for router interfaces
• Utilisation, input errors, output drops
– Traceroute information
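The end-to-end metrics listed for e2emonit can be derived from simple probe results. The sketch below uses common definitions, assumed for illustration: packet loss as the percentage of probes not returned, and delay variation summarised as the mean absolute difference between consecutive one-way delays (in the spirit of IPDV). It is not necessarily how e2emonit computes its figures, and the function names and sample values are hypothetical.

```python
from statistics import mean
from typing import List


def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of probe packets that were sent but not received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def delay_variation_ms(one_way_delays_ms: List[float]) -> float:
    """Mean absolute difference between consecutive one-way delays, in ms."""
    if len(one_way_delays_ms) < 2:
        return 0.0  # no consecutive pair to compare; report zero variation
    diffs = [abs(b - a) for a, b in zip(one_way_delays_ms, one_way_delays_ms[1:])]
    return mean(diffs)


# Example with made-up probe results.
print(packet_loss_percent(100, 97))  # 3.0
print(delay_variation_ms([10.0, 12.0, 11.0, 15.0]))
```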
But…
• Historical data is only useful for diagnosing problems once you already know something is wrong
• What users really need are…
ALARMS
Requirements
• A network Alarms Service
– Allows the timely detection of problems
– Notifies users
– Gives an “at a glance” view of network status
Specific Requirements
• Motivated by the LHCOPN
– 10 Gb/s private network for moving data generated by the LHC
– perfSONAR-based monitoring solution deployed and operated by DANTE
• Need the following alarms as a minimum:
– Unexpected path changes
– Routing out of the private network
– Router interface congestion
• Packets lost
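Both path-related alarms can be derived from traceroute data. A minimal sketch, assuming each traceroute is already available as an ordered list of hop addresses (with `*` for a hop that did not reply) and that the private network is described by a hypothetical list of allowed prefixes; the function names and addresses (documentation ranges) are illustrative, not the Alarms Service's own.

```python
import ipaddress
from itertools import zip_longest
from typing import List, Optional, Tuple


def path_differences(reference: List[str], current: List[str]
                     ) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """List (hop index, expected hop, observed hop) wherever the paths disagree.

    An empty list means the current traceroute matches the reference path.
    A hop missing from the end of the shorter path is reported as None.
    """
    return [
        (i, expected, observed)
        for i, (expected, observed) in enumerate(zip_longest(reference, current))
        if expected != observed
    ]


def hops_outside_network(hops: List[str], allowed_prefixes: List[str]) -> List[str]:
    """Return the responding hops that fall outside every allowed prefix."""
    networks = [ipaddress.ip_network(p) for p in allowed_prefixes]
    outside = []
    for hop in hops:
        if hop == "*":  # no reply for this hop: it cannot be classified, so skip it
            continue
        address = ipaddress.ip_address(hop)
        if not any(address in net for net in networks):
            outside.append(hop)
    return outside


# Hypothetical example using documentation address ranges.
PRIVATE_PREFIXES = ["192.0.2.0/24"]
reference = ["192.0.2.1", "192.0.2.9", "192.0.2.20"]
current = ["192.0.2.1", "198.51.100.7", "192.0.2.20"]
print(path_differences(reference, current))  # [(1, '192.0.2.9', '198.51.100.7')]
print(hops_outside_network(current, PRIVATE_PREFIXES))  # ['198.51.100.7']
```

As written, a non-responding `*` hop counts as a path difference; a practical rule would probably tolerate occasional unanswered hops before raising an unexpected-path-change alarm.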
Strategy
• Query
• Detect
• Notify
Architecture
[Figure: Alarms Service architecture, showing the MA Query Interface, MA Notification Interface, Configuration Parser, Current Status Analyser and Status Notifiers, together with the Measurement Archives, Alarms Archive and configuration files they use.]
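The component boundaries in the architecture diagram can be expressed as small interfaces. The sketch below is a hypothetical Python rendering, not the service's actual code: `MAQueryInterface`, `StatusAnalyser` and `StatusNotifier` mirror boxes in the diagram, while `Measurement`, `Status`, `run_cycle` and the stub classes are illustrative names. One call to `run_cycle` performs a single query, detect and notify pass.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Measurement:
    archive: str   # which Measurement Archive supplied the value
    metric: str    # e.g. "output_drops" (hypothetical metric name)
    subject: str   # interface or path the value describes
    value: float


@dataclass
class Status:
    subject: str
    state: str     # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    message: str


class MAQueryInterface(Protocol):
    def query(self) -> Iterable[Measurement]: ...


class StatusAnalyser(Protocol):
    def analyse(self, measurements: List[Measurement]) -> List[Status]: ...


class StatusNotifier(Protocol):
    def notify(self, statuses: List[Status]) -> None: ...


def run_cycle(archives: Iterable[MAQueryInterface],
              analyser: StatusAnalyser,
              notifiers: Iterable[StatusNotifier]) -> List[Status]:
    """Query every archive, analyse the combined data, then notify every consumer."""
    measurements = [m for archive in archives for m in archive.query()]
    statuses = analyser.analyse(measurements)
    for notifier in notifiers:
        notifier.notify(statuses)
    return statuses


# Minimal stand-ins so the cycle can be exercised without a real archive.
class StaticArchive:
    def __init__(self, measurements: Iterable[Measurement]):
        self._measurements = list(measurements)

    def query(self) -> List[Measurement]:
        return list(self._measurements)


class ThresholdAnalyser:
    def __init__(self, limit: float):
        self.limit = limit

    def analyse(self, measurements: List[Measurement]) -> List[Status]:
        return [
            Status(m.subject,
                   "CRITICAL" if m.value > self.limit else "OK",
                   f"{m.metric}={m.value}")
            for m in measurements
        ]


class CollectingNotifier:
    def __init__(self):
        self.received: List[Status] = []

    def notify(self, statuses: List[Status]) -> None:
        self.received.extend(statuses)
```

Because `run_cycle` depends only on the three interfaces, adding another Measurement Archive or another kind of notifier means adding one class rather than changing the loop.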
Details
• Query
– NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives
• Passive Router Data – interface errors, drops, utilisation
• Traceroute Information
• Detect
– Rules-based mechanism that processes data against rules defined in configuration files
• DROOLS library
• Notify
– Output status in a form usable by Nagios
• Status display, notifications, history
– Further status notifiers are easy to implement
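DROOLS is a Java rules engine; the sketch below swaps it for a plain Python threshold table simply to show the shape of "rules from configuration in, Nagios-usable status out". The configuration entry format, metric name, host and service names are hypothetical. The output line follows Nagios's external command `PROCESS_SERVICE_CHECK_RESULT` and the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); this is one possible way to make status usable by Nagios, not necessarily the one the Alarms Service uses.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional

# Standard Nagios plugin return codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


@dataclass
class ThresholdRule:
    metric: str
    warning: float
    critical: float

    def evaluate(self, value: Optional[float]) -> str:
        """Map one data point to a status; missing data is reported as UNKNOWN."""
        if value is None:
            return "UNKNOWN"
        if value >= self.critical:
            return "CRITICAL"
        if value >= self.warning:
            return "WARNING"
        return "OK"


def parse_rules(entries: Iterable[Mapping]) -> Dict[str, ThresholdRule]:
    """Build one rule per metric from already-loaded configuration entries."""
    return {
        e["metric"]: ThresholdRule(e["metric"], float(e["warning"]), float(e["critical"]))
        for e in entries
    }


def nagios_passive_result(host: str, service: str, state: str, output: str,
                          timestamp: Optional[int] = None) -> str:
    """Format a passive service check result for the Nagios external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{NAGIOS_CODES[state]};{output}"


# Hypothetical configuration: warn on any output drops, go critical at 100.
rules = parse_rules([{"metric": "output_drops", "warning": 1, "critical": 100}])
state = rules["output_drops"].evaluate(250.0)
print(nagios_passive_result("router1.example.net", "output_drops xe-0/0/0",
                            state, "output_drops=250", timestamp=1240560000))
# [1240560000] PROCESS_SERVICE_CHECK_RESULT;router1.example.net;output_drops xe-0/0/0;2;output_drops=250
```

A Nagios notifier built this way would append such lines to the Nagios external command file, after which Nagios's own status display, notifications and history take over.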
Examples
Current Status
• A prototype is currently being used by DANTE to monitor some LHCOPN paths and interfaces for the required alarm conditions, in order to
– Test functionality
– Gather feedback from users
• It will be developed further and deployed to monitor the whole of the LHCOPN during this year
• Actively looking for other users
Further Work
• Implement more alarm conditions
• Send status information to other consumers, e.g. a network weather map
• Consider data processing
– e.g. “cleaning” data to remove bad data points
– Statistical processing, etc.
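As one possible starting point for "cleaning" data, the sketch below drops missing values and then removes outliers using the modified z-score of Iglewicz and Hoaglin (median and median absolute deviation, MAD, with the conventional 3.5 cut-off). This is a generic, swapped-in technique rather than one the project has chosen, and `clean_series` is a hypothetical name.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def clean_series(values: Sequence[Optional[float]], threshold: float = 3.5) -> List[float]:
    """Drop missing/NaN points, then points whose modified z-score exceeds threshold."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if len(present) < 3:
        return present  # too few points to judge what is an outlier
    med = median(present)
    mad = median([abs(v - med) for v in present])
    if mad == 0:
        return present  # most points identical: no spread to scale by, keep everything
    # 0.6745 rescales MAD so the score is comparable to a standard z-score for normal data.
    return [v for v in present if 0.6745 * abs(v - med) / mad <= threshold]


print(clean_series([10.0, None, 11.0, 10.5, 9.8, 500.0]))  # [10.0, 11.0, 10.5, 9.8]
```

Median-based statistics are used because a single extreme spike barely moves them, unlike a mean and standard deviation. A filter like this cannot tell a bad measurement from a genuine network event, so it would need care before being placed in front of alarm rules.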
Summary
• Monitoring of federated networks is a challenge
• An Alarms Service is critical for problem discovery
• The LHCOPN is being monitored using an initial version
– which will be developed further and deployed to monitor the whole network
• Acknowledgements
– Funding
• UK Joint Information Systems Committee (JISC)
• EGEE-II (INFSO-RI-031688)
• DEISA2 (RI-222919)
– Collaboration
• DANTE
• DFN WiN-Labor Erlangen
• LHC-OPN