IRNC NOC Luke Fowler 2016 Internet2 Technology Exchange Miami

Page 1: IRNC NOC - Internet2 · IRNC NOC ๏The International Research Network Connections Network Operations Center (IRNC NOC) serves as a cooperative point of contact and communications

IRNC NOC

Luke Fowler

2016 Internet2 Technology Exchange

Miami

Topics

๏IRNC NOC: Overview

๏Performance Engagement Team (PET)

IRNC Program

๏International Research Network Connections

๏National Science Foundation program

๏Funds network infrastructure and other supporting activities, such as measurement and a NOC, for international science, research, and education.

๏Indiana University GlobalNOC received the award to establish an IRNC NOC

https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503382

IRNC NOC

๏The International Research Network Connections Network Operations Center (IRNC NOC) serves as a cooperative point of contact and communications for IRNC network management, providing consolidated network monitoring, reporting, and operational visibility for the IRNC program. The IRNC NOC facilitates a single set of operational expectations for all IRNC-funded infrastructure programs; this enables greater availability of IRNC infrastructure and improves results in troubleshooting multi-domain network issues. A central data repository created by the IRNC NOC provides critical operational information, monitoring data, and performance metrics in support of NSF-funded science and research.

IRNC NOC

๏24x7x365 NOC support for IRNC infrastructure projects

๏1-855-IRNC-NOC

๏[email protected] // [email protected]

๏http://irncnoc.globalnoc.iu.edu/

IRNC NOC

๏Service Desk function serves as entry point and single point of contact for issues reported by users or detected via proactive monitoring

๏Creates, maintains, and shepherds trouble tickets for various types of events:

• Unscheduled outage

• Scheduled maintenance

• Problem report

• Service Request

IRNC NOC

๏The IRNC NOC service desk develops processes/procedures with each IRNC infrastructure project for:

• Event notification (outage/maintenance)

• Problem assignment and escalation

• Operational/availability reporting

• Etc.

Proactive Monitoring

๏IRNC NOC is working with each IRNC infrastructure project to establish a monitoring plan tailored to the project

๏Leveraging existing GlobalNOC tools, including:

• GlobalNOC Database

• GlobalNOC Alertmon / Auto-monitoring

• SNAPP

๏Some open questions remain for projects that are more experimentally focused

Operational Reporting

๏Provide reports on a regular basis for IRNC infrastructure, detailing:

• Unscheduled outages

• Scheduled maintenance

• Events of note

• Infrastructure availability
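
The slides do not say how infrastructure availability is calculated. As a minimal sketch, assuming availability is the share of the reporting period not covered by unscheduled outages (and that scheduled maintenance is excluded from downtime, which is an assumption rather than stated IRNC NOC policy), it could be computed like this. The function name and the sample outage times are hypothetical.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Percent of the period not covered by the given outage windows.

    outages: list of (start, end) datetime pairs. Overlapping windows are
    merged so that shared downtime is counted only once.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")

    # Clip each outage to the reporting period and drop windows outside it.
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in outages
        if e > period_start and s < period_end
    )

    down = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Previous merged window is finished; add it to the total.
            if cur_end is not None:
                down += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Window overlaps the current one; extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        down += cur_end - cur_start

    return 100.0 * (1 - down.total_seconds() / total)


# Hypothetical 30-day reporting period with three unscheduled outages,
# two of which overlap.
sample_outages = [
    (datetime(2016, 9, 10, 0, 0), datetime(2016, 9, 10, 2, 0)),
    (datetime(2016, 9, 10, 1, 0), datetime(2016, 9, 10, 3, 0)),
    (datetime(2016, 9, 20, 12, 0), datetime(2016, 9, 20, 13, 0)),
]
september = availability_percent(
    datetime(2016, 9, 1), datetime(2016, 10, 1), sample_outages
)
print(f"Availability: {september:.3f}%")
```

Merging overlapping windows matters when one event is recorded in more than one ticket; without it the same downtime would be subtracted twice.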

What We Don’t Do?

๏IRNC NOC helps find and verify problems, and tracks their status from start to finish.

๏We don’t (usually) “actually fix” the problem

๏Each IRNC infrastructure project does its own network engineering work

Data Archive

๏Collaborating with the IRNC NetSage project (Dr. Jennifer Schopf, IU) to establish a shared data archive of network telemetry data.

๏Used by the NOC for problem detection, reporting, etc.

๏Used by the NetSage project for analysis & visualization

๏Collects data using a variety of formats/protocols, including SNMP, NetFlow, packet trace, etc.

๏Working with IRNC participants on data privacy issues to provide appropriate data as a publicly available resource while ensuring sensitive data is only available for internal NOC use or summarized reporting.

Leveraging Route Views

๏Beginning work to integrate Route Views data into IRNC NOC activity.

๏ Idea: detect and report on ‘interesting’ / ‘important’ routing changes related to IRNC infrastructure

๏Use data from NetSage to identify ‘routes of interest’

๏Use data from Route Views to detect/observe changes in these routes

๏Build operational reports and, potentially, proactive alarming later based on this data

๏New staff member beginning to work on this project over the fall.
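
The idea above can be sketched as a comparison of two routing snapshots restricted to the routes of interest. This is only an illustration under stated assumptions: the snapshots are taken to be already parsed into a mapping from prefix to AS path (how Route Views data is fetched and parsed is not shown), and the function name, change categories, prefixes, and AS numbers are hypothetical (documentation ranges are used).

```python
def diff_routes(before, after, routes_of_interest):
    """Compare two {prefix: as_path} snapshots for the selected prefixes.

    as_path is a tuple of AS numbers with the origin AS last.
    Returns a list of (prefix, kind, old_path, new_path) tuples, where kind
    is 'withdrawn', 'announced', 'origin_change', or 'path_change'.
    """
    changes = []
    for prefix in sorted(routes_of_interest):
        old = before.get(prefix)
        new = after.get(prefix)
        if old == new:
            continue  # unchanged, or absent from both snapshots
        if new is None:
            kind = "withdrawn"
        elif old is None:
            kind = "announced"
        elif old[-1] != new[-1]:
            kind = "origin_change"  # a different AS now originates the prefix
        else:
            kind = "path_change"    # same origin, different transit path
        changes.append((prefix, kind, old, new))
    return changes


# Hypothetical snapshots using documentation prefixes and AS numbers.
snapshot_before = {
    "192.0.2.0/24": (64500, 64501, 64510),
    "198.51.100.0/24": (64500, 64502, 64511),
    "203.0.113.0/24": (64500, 64503),
}
snapshot_after = {
    "192.0.2.0/24": (64500, 64504, 64510),
    "198.51.100.0/24": (64500, 64502, 64509),
}
interesting = {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}

for prefix, kind, old, new in diff_routes(snapshot_before, snapshot_after, interesting):
    print(prefix, kind, old, new)
```

Separating origin changes from transit-path changes is one possible way to decide which events belong in a periodic report and which might later justify an alarm.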

IRNC NOC

Performance Engagement Team

IRNC NOC is supported by the National Science Foundation award 1450934

Background

๏As network technology becomes more complex and opaque, troubleshooting performance issues becomes more difficult for the layperson.

๏Trends

• Increased Layer2 infrastructure obscures network path

• Heightened security removes public data metrics

• Increased use of network firewalls at the campus level

• Automated data transfer requires 24x7x365 support

๏As infrastructure complexity increases, the researcher is left to determine how to solve performance issues

IRNC PET: Three Charges

1. Drive quick resolution of international inter-domain performance issues

2. Build a common performance troubleshooting playbook

3. Evolve perfSONAR as a tool for performance incident management

Drive resolution

๏Centralized POC to request network troubleshooting assistance

๏PET will:

• Identify path

• Investigate with network contacts

• Test with available measurement points

• Resolve problems that are resolvable (and acknowledge problems that aren’t)

๏Researchers and network engineers can involve the PET

๏Issues are tracked in a ticketing system, which creates accountability, metrics, and centralized contact tracking

Performance Playbook

๏IRNC NOC PET will collaborate on, design, and maintain a centralized troubleshooting process with major partner networks

๏ PET will maintain a website with network troubleshooting resources and references

• https://irncnoc.globalnoc.iu.edu/

• Have worked 10+ performance issues to refine our internal process and understanding of where external collaboration is necessary

• Performance process on next slide

• Will be working with similar performance-focused efforts (eduPERT, ESnet, GÉANT, etc.) to help define standards for collaboration, shared troubleshooting, and knowledge capture

[Flowchart: IRNC PET performance troubleshooting process, with steps numbered 1 through 20 (including sub-steps 6.1 to 6.3). The connections between boxes are not recoverable from the transcript.

Box labels: Issue Identified?; PET Issue Submitted; Assign PET Case Manager + Systems Engineer; Initial Questions and issue validation; Determine Network or Systems primary actor; Retrieve or Draw Relevant Maps/Diagrams; Investigate with Publicly Available Tools; Open tickets w/ relevant networks; Seek Updates every 3 days; Weekly Customer updates; Resolvable?; Take Resolution Action; Set state to inactive; Set date for review; Monthly Customer updates; Management Review; Write After Action Report; Update Diagram; Notify Customer; Close Successful; Close Unsuccessful; Halt.

Branch labels: yes; no; not yet; future fix identified; date passed; date not passed; A month has passed; Can’t Reproduce; taking too long; discuss; Additional Information; Investigate; Continue.]

IRNC PET Performance Troubleshooting Principles

• Investigate as much as you can using publicly available monitoring systems and data

• Provide centralized store of troubleshooting information (maps, ticket documentation, findings, etc.)

• More frequent updates to interested parties

• Likely lots of external collaboration required
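
The follow-up intervals named in the process diagram (seek updates every 3 days, weekly customer updates, monthly customer updates) can be sketched as a small due-date check. The diagram's connections are not recoverable here, so the mapping below is an assumption: active cases get network follow-ups every 3 days and customer updates weekly, while inactive cases get only monthly customer updates (taken as 30 days). All names are hypothetical and do not describe the PET's ticketing system.

```python
from datetime import date, timedelta

# Intervals taken from the diagram labels; the 30-day month is an assumption.
NETWORK_UPDATE_INTERVAL = timedelta(days=3)      # "Seek Updates every 3 days"
ACTIVE_CUSTOMER_INTERVAL = timedelta(days=7)     # "Weekly Customer updates"
INACTIVE_CUSTOMER_INTERVAL = timedelta(days=30)  # "Monthly Customer updates"


def due_followups(today, last_network_update, last_customer_update, inactive=False):
    """Return the follow-up actions that are due for one case today."""
    due = []
    # Assumption: network owners are only chased while the case is active.
    if not inactive and today - last_network_update >= NETWORK_UPDATE_INTERVAL:
        due.append("seek update from networks")
    customer_interval = (
        INACTIVE_CUSTOMER_INTERVAL if inactive else ACTIVE_CUSTOMER_INTERVAL
    )
    if today - last_customer_update >= customer_interval:
        due.append("send customer update")
    return due


# Hypothetical case: networks contacted 2 days ago, customer updated 7 days ago.
print(due_followups(date(2016, 9, 28), date(2016, 9, 26), date(2016, 9, 21)))
```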

Evolve perfSONAR

๏perfSONAR is used as the measurement tool of choice for the IRNC NOC.

๏The more perfSONAR-enabled test points there are, the more successful the IRNC PET will be in assisting researchers without involving the individual network owners

๏ IRNC PET will use experience gained in working cases to provide feedback and enhancements to the perfSONAR project

Year 1 Findings

๏Early involvement in the performance troubleshooting process: we’re more effective the earlier we’re brought in

• This largely comes down to awareness of the IRNC NOC PET and its charge

๏perfSONAR deployments into the campus

• Issues tend to be local, and the closer the monitoring deployments are to the user, the more troubleshooting work the IRNC NOC PET can do without involving regional and campus resources

• Visibility into network topology, traffic monitors and other data is sometimes restricted for security reasons

๏Cooperation from peer and campus network engineers, who may not see external user performance issues as a priority over their daily workload

• We attempt to get around this by being squeaky wheels on behalf of the researchers, but still….

Year 1 Findings (cont.)

๏Identifying “invisible” infrastructure (Layer2 switches and Firewalls)

๏Collaboration within the community is hugely important

• Documentation of findings will support that

• We need a shared database of performance-focused contacts for large (and small) networks

๏Understanding what network performance should be

• When performance has been bad for a long time, it’s difficult to know what the researcher should be getting

• Researchers sometimes lack the vocabulary or understanding to explain what they expect (e.g. “It just feels wrong”, “The graph looks off”)
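
One way to put a rough number on what a researcher "should be getting" is to compute simple upper bounds for a single TCP flow. The sketch below is not from the slides: it uses the standard window/RTT limit and the well-known Mathis et al. approximation, throughput ≈ C · MSS / (RTT · √p) with C ≈ √(3/2). Function names and the sample path values (150 ms RTT, 64 KiB window, 0.01% loss) are hypothetical.

```python
import math


def window_limited_mbps(window_bytes, rtt_ms):
    """Upper bound from the TCP window: at most one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def mathis_limit_mbps(mss_bytes, rtt_ms, loss_rate):
    """Approximate loss-limited throughput (Mathis et al. model).

    loss_rate is the packet loss probability, e.g. 0.0001 for 0.01%.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)
    return c * mss_bytes * 8 / ((rtt_ms / 1000) * math.sqrt(loss_rate)) / 1e6


# Hypothetical intercontinental path: 150 ms RTT.
by_window = window_limited_mbps(window_bytes=65536, rtt_ms=150)
by_loss = mathis_limit_mbps(mss_bytes=1460, rtt_ms=150, loss_rate=0.0001)
print(f"Window-limited: {by_window:.2f} Mbit/s")
print(f"Loss-limited:   {by_loss:.2f} Mbit/s")
```

Both figures are estimates for a single flow, not guarantees, but they give a baseline to compare against when a user can only say that a transfer "feels wrong".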

Next Steps

๏Create performance-focused contact database

• Question: should we publish that? How open?

๏Outreach to science communities and R&E networks to make them aware the IRNC NOC PET exists as a resource

๏Continue to gather more experience

• Assisting the NSF-funded NetSage project in isolating problems on their perfSONAR mesh

• May do more generalized perfSONAR mesh monitoring beyond those in the IRNC project

Questions?

Chris Robb – [email protected]

Luke Fowler – [email protected]

IRNC NOC: [email protected]

[...] a Performance Issue: [email protected]

IRNC NOC is supported by the National Science Foundation award 1450934