GNA – Global Network Architecture
Web: http://gna-re.net/

Work in Progress document

Document name: Operations – including Operational Security
Author(s): GNA Technical Group
Contributor(s): GNA Technical Group
Date: 26 October 2015
Version: 0.9P

1. Operations - including Operational Security

This document describes the principles, systems and procedures relating to the operation of the GIRE, including security considerations.

1.1. Introduction

There will not be a single ‘GIRE NOC’ in the sense of one organization operating a global network. NOCs will collaborate and together provide the quality that is required.

Due to this heterogeneous, open and inter-organizational character of the GNA the majority of traditional operations functions will remain distributed.

Some of these distributed functions are essentially local functions, that need to be performed, but that have no global visibility or dependency, and that will remain the responsibility of the individual operators.

Some of the distributed functions, however, are global functions that must be defined for the operations and management of GIRE facilities and services to provide the high-quality end-to-end operational service expected by users of the GIRE. These functions are executed in a distributed manner but are dependent on global collaboration. This means these functions need to be understood and agreed, and jointly developed processes must be put in place to ensure that the entities managing the networks co-operate to provide the services and operational responsiveness that are fundamental to differentiating the GIRE from commercial offerings.

In addition to the distributed functions, a number of cooperative functions are required. These mostly relate to ensuring that the distributed global functions are correctly implemented and easy to execute for the different NOCs. This entails responsibility for a common data repository and a distribution mechanism that supports the sharing of data. Alongside the technical functions where a centralized entity makes sense, more managerial functions can be envisioned here, such as monitoring, liaising and facilitating decision-making.

Figure 1

1.2. Distributed Global Operational Functions

Distributed Global Operational Functions are those functions that are organized and executed locally, but that have a global dependency.

The following distributed global functions are required for operation of the GIRE:

Function: Description

Availability Management: Monitoring of the status and performance of links, devices and services, and systems to exchange this information.

Incident Management: Ticketing systems, troubleshooting procedures, and communication and escalation protocols required to manage planned and unplanned incidents with impact on the performance of the GIRE.

Change Management: Management of adds, moves and changes to the network.

Capacity Management and Planning: Systems and processes enabling identification and anticipation of capacity or congestion issues, and procedures to deal with these.

Security Management: Procedures and systems to identify and deal with security-related issues relating to GIRE equipment, facilities and services. Needs to take into account national or regional legal requirements.

Reporting Management: General process to ensure reporting of all functions via appropriate channels at COs.

Provisioning Management: Procedures and systems, either manual or automated, required to provision or deprovision services and infrastructure on the GIRE.

Vendor Management: Management of hardware and software purchases and support/maintenance contracts. Includes assessment and validation that a vendor’s equipment is suitable for use in the GIRE.

Some of these operational functions require the setting of a global standard to create the basis for a high-quality service. Other functions are provided locally by Controlling Organisations (COs), as they do not have a global impact.

Each function consists of three areas that (may) require global definition and potentially support, as opposed to those managed completely locally by the CO:

● Standards: The function needs global standardization within the GNA context, i.e. all COs must collaborate to create one set of procedures or processes and adhere to their implementation. These standards will be documented. (Example: SLAs; technical standards; service provisioning procedures.)

● Real time data: The function requires data to be made available globally, in real time, to all other entities in the operational setup, to allow them to manage the service. (Example: up/down status of circuits.)

● Reporting: Some functions require standardized global reporting of performance, since it has a direct relationship to the quality of the end-to-end service. (Example: latency of a path.)

Function                   Standards   Real time data   Reporting

Availability Management    Global      Global           Global
Incident Management        Global      Global           Global
Change Management          Global      Local            Local
Capacity Management        Global      Local            Local
Security Management        Global      Global           Local
Reporting Management       Global      Global           Global
Provisioning Management    Global      Global           Local
Vendor Management          Global      Local            Local
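
As an illustration only, the scope table above can be held as a small lookup structure so that a NOC tool can ask which functions need globally shared data. The variable and function names below are hypothetical; only the Global/Local values are taken from the table.

```python
from typing import Dict, List, Tuple

# Column order of the table: standards, real time data, reporting.
AREAS: Tuple[str, str, str] = ("standards", "real_time_data", "reporting")

# Values transcribed from the table; the dict layout itself is an assumption.
SCOPE: Dict[str, Tuple[str, str, str]] = {
    "Availability Management": ("Global", "Global", "Global"),
    "Incident Management": ("Global", "Global", "Global"),
    "Change Management": ("Global", "Local", "Local"),
    "Capacity Management": ("Global", "Local", "Local"),
    "Security Management": ("Global", "Global", "Local"),
    "Reporting Management": ("Global", "Global", "Global"),
    "Provisioning Management": ("Global", "Global", "Local"),
    "Vendor Management": ("Global", "Local", "Local"),
}


def functions_with_global(area: str) -> List[str]:
    """Return the functions for which the given area is handled globally."""
    column = AREAS.index(area)  # raises ValueError for an unknown area
    return [name for name, row in SCOPE.items() if row[column] == "Global"]


if __name__ == "__main__":
    for area in AREAS:
        print(area, "->", functions_with_global(area))
```

A query such as functions_with_global("real_time_data") then lists the functions whose live data would have to be exchanged between all operators.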

1.3. Global Network Ops Management and Federated Information Environment

The management functions together comprise the Global Network Ops Management Environment (GNO-ME). The exact details and combination of the management functions will vary by facility or service. However, these management functions are applied to both facilities and services.

All operations management functions create data for each facility or service being managed. The integration of this data across the Global Network represents the Global Network Federated Information Environment (GN-FIE). The GN-FIE represents the current operational status of the GN and, via its data archive, the operational history of the GN. For reference, see Figure 1 above.

When an organization brings a facility or a service to the GNA for inclusion in the GN, that organization must take responsibility for implementing the GNO-ME for the contributed facility/service. The organization is also responsible for supplying the GNO-ME data to the GN-FIE.

1.4. Centralized Components: Catalyzer, Support and Verification

Each participant in the GNA will have its own service level, and perhaps several different SLAs, and will publish them. The GNA will determine minimum levels and thresholds that are agreed among the participants.

To ensure that the operational setup is successful, a limited set of meta-functions must be organized centrally rather than relying on a distributed and potentially sparse effort. These meta-functions are not involved in the immediate operational activity, but are to facilitate and ensure that the global standards are implemented, adhered to and monitored.

Centralized functions include:

● Initial assessment and ongoing verification that COs are adequately supporting the required functions.
● Providing expertise and advice to COs.
● Building and maintaining the global data exchange (GN-FIE).
● Centralizing and combining reporting.
● Escalation of issues.
● Facilitating the interaction between the different operators.

1.5. Localized Functions

Local functions are those functions that need no information exchange and have no direct operational impact.

Operational Security

As stated previously, the GNA is a framework that network participants agree to and participate in, in order to operate the GIRE. As such, the most basic GNA security requirements are to keep any GN-owned equipment physically and logically secure and to keep its software at appropriate security and OS levels.

In addition to this, COs are obliged to maintain the security of the portion of the GNO-ME and the GN-FIE which they are hosting, since the databases and processes provide information that is critical to the operation of the GIRE and in some cases confidential. This may include elements of each operational function that are globally as well as locally significant.

In addition to the security expectations defined in the Operational Management function and agreed between COs, any local, national or international laws remain applicable. Legal advice will be sought during the detailed development of the operational structure of the GNA, in order to identify potential issues around confidentiality of data, legal responsibility and liability.

1.6. Global Functions Requirements

Note: many of these requirements have been adapted from the ANA operational documents and are placeholders subject to additional refinement.

Open Exchange Point NOC services

For detailed information regarding the Open Exchange Point Services please see the separate Open Exchange Point Requirements document.

Procedures

The following procedures are identified as the first to be defined on a global scale:

● Change Management: Link Initialization.
● Incident Management: Service disruption reported to a NOC: an end user reports a problem related to a service that crosses the GNA.
● Incident Management: Integrity check of the physical link: in the event of an unexpected issue.
● Change Management: Maintenance by an Open Exchange: when one of the exchanges is going to carry out maintenance on its infrastructure.
● Change Management: Maintenance by a Supplier: in case the carrier must perform maintenance.
● Escalation: if a circuit stays in a faulty condition after the procedures above and appropriate action is not taken, this procedure describes the escalation procedure.
● Information Sharing: Provisioning/Deprovisioning: What are the procedures for making requests?
● Information Sharing: Topology Sharing: What are the expectations or requirements around sharing information about internal structure?
● Information Sharing: Ticketing: How is information shared between the various organizations?
● Incident Management: Security Incidents: How are these handled? What are notification needs? What are the differences between data plane and control plane issues?

Change Management: Link Initialization

Typically a link is procured by a contracting party (for example ESnet, or the ANA consortium), but operationally provided by a commercial supplier.

When a link is connected between two GXPs the supplier will need to be provided with the contact information of the NOCs at the Exchange points. This will be a responsibility of the contracting party for the circuit. When the contract is put in place the supplier must be provided with, and agree to use, all the contacts that the contractual owner provides.

There is a variety of information that needs to be kept by every GXP for each circuit. This includes:

a) Circuit IDs from the carrier.
b) Contractual owner contact information.
c) NOC contacts at both ends of the circuit.
d) Port information on the connecting ends.
e) Opt-in contact lists for notifications.

This information will be shared between both connecting NOCs. Any time this information is updated all participating parties will need to be notified.
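
A minimal sketch of how a GXP might hold this per-circuit information follows. The class name, field names and example addresses are hypothetical, and the reading of "all participating parties" as the contractual owner, both NOCs and the opt-in contacts is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CircuitRecord:
    """Per-circuit information kept by every GXP (items a to e in the text)."""
    carrier_circuit_ids: List[str]            # a) circuit IDs from the carrier
    contractual_owner_contact: str            # b) contractual owner contact
    noc_contacts: Tuple[str, str]             # c) NOC contacts at both ends
    ports: Tuple[str, str]                    # d) port information at both ends
    notification_opt_in: List[str] = field(default_factory=list)  # e) opt-in list

    def parties_to_notify(self) -> List[str]:
        """Contacts to inform when the record changes (assumed interpretation)."""
        ordered: List[str] = []
        candidates = [self.contractual_owner_contact,
                      *self.noc_contacts,
                      *self.notification_opt_in]
        for contact in candidates:
            if contact not in ordered:  # keep order, drop duplicates
                ordered.append(contact)
        return ordered


if __name__ == "__main__":
    record = CircuitRecord(
        carrier_circuit_ids=["CKT-001"],
        contractual_owner_contact="owner@example.org",
        noc_contacts=("noc-a@example.org", "noc-b@example.org"),
        ports=("port-a", "port-b"),
        notification_opt_in=["ops-list@example.org", "noc-a@example.org"],
    )
    print(record.parties_to_notify())
```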

For all of the links and resources (such as the GXP) there will be SLAs. It is important that the contracting party understands the global GNA requirements and ensures that the SLA it concludes with the provider falls well within the SLA provided by the GNA.

Each GXP NOC should be able to request the state, as noted at both ends, of any circuit landing on that Exchange Point.

Incident Management: Service disruption reported to a NOC:

An end user reports a problem with a service that uses the GNA links for connectivity. This might for example be a user reporting perceived substandard end-to-end performance of a data transfer. In any multi-domain system there are multiple places where a problem can occur, and not every NOC will have access to every location for testing. For troubleshooting to succeed, both test points and information sharing must be available to every NOC.

Such a service disruption report is generally directed to the NOC of the organization that provides services to that user. An example is a reported problem between a site connected to Internet2 and a site connected to an NREN in Europe.

When this occurs the following process will be followed:

1) Originating NOC opens a ticket.
   a. If the initial investigation shows it is a local problem, fix it and close the ticket.
2) A Global ID and alias are assigned.
3) Originating NOC contacts their upstream or parallel NOC with the ticket.
   a. That upstream NOC also opens a ticket for the issue.
   b. This process will continue as needed in order to involve all of the relevant NOCs.
4) The NOCs will jointly develop a test plan to isolate the problem; the Originating NOC coordinates.
5) Once the problem is understood and resolved the tickets will be closed.
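
The numbered process above can be sketched roughly as follows. This is not a specification: the class and function names, the local ticket ID scheme and the example Global ID are all hypothetical, and steps 4 and 5 (joint test plan, closing after resolution) are left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ticket:
    noc: str                          # NOC that owns this local ticket
    local_id: str                     # ID in that NOC's own ticketing system
    global_id: Optional[str] = None   # shared across all domains once assigned
    alias: Optional[str] = None
    is_open: bool = True


@dataclass
class Incident:
    tickets: List[Ticket] = field(default_factory=list)

    def open_ticket(self, noc: str, global_id: Optional[str] = None,
                    alias: Optional[str] = None) -> Ticket:
        # Hypothetical local numbering: "<noc>-1".
        ticket = Ticket(noc, f"{noc}-1", global_id, alias)
        self.tickets.append(ticket)
        return ticket

    def close_all(self) -> None:
        for ticket in self.tickets:
            ticket.is_open = False


def handle_report(origin_noc: str, is_local_problem: bool, global_id: str,
                  alias: str, other_nocs: List[str]) -> Incident:
    incident = Incident()
    origin = incident.open_ticket(origin_noc)          # step 1
    if is_local_problem:                               # step 1a: fix and close
        incident.close_all()
        return incident
    origin.global_id, origin.alias = global_id, alias  # step 2
    for noc in other_nocs:                             # step 3: each NOC opens its own ticket
        incident.open_ticket(noc, global_id, alias)
    # Steps 4 and 5 happen outside this sketch; the caller invokes close_all()
    # once the problem is understood and resolved.
    return incident


if __name__ == "__main__":
    inc = handle_report("NOC-A", False, "urn:ogf:network:example.net:svc-1",
                        "svc-1", ["NOC-B", "NOC-C"])
    print([(t.noc, t.local_id, t.global_id) for t in inc.tickets])
```

Each NOC keeps its own local ticket, while the shared Global ID is what ties the tickets together across domains.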

In these cases the NOCs make sure that issues around False Isolation do not unduly prevent solving the problems. The most efficient way around this is to have sufficient test points in place to minimize the danger of incorrect diagnosis.

It is possible that a service level report turns out to be an Integrity issue on the circuit itself in which case that procedure will be initiated. Tickets related to the service level should remain open until the circuit issues are resolved and there is verification that the service is operating normally once again.

As a part of the monitoring service, performance measurement tests should occur at regular intervals. Where possible these should be done at the service level. These measurements will be made available for use as a baseline for determining if a reported issue is within the normal service parameters or not.

Incident Management: Integrity check of the physical link

Figure 1: Integrity check of the physical link

The NOC that first notices a performance-impacting issue will take ownership of the incident and contact the NOC on the other side.¹ If one of them is able to find the problem in its own domain (by checking alarms, port status and link load), it is resolved by that entity and the fix, or the maintenance needed to resolve it, is communicated back to the other NOC.

If neither NOC can find a problem in its own domain, the problem-owning NOC contacts the carrier. The supplier checks its link and reports back. If the carrier is able to resolve the problem immediately, it may do so and report back to the requesting NOC.

¹ Note that the supplier NOC is not immediately informed by the originating NOC in step 2, since requesting information from only the other OE NOC avoids unnecessary or time-consuming interaction with the supplier. If NOC A and NOC B expect that the fault resides within the carrier’s domain, the carrier will be contacted in step 5.

The NOCs at either end of a circuit will need to share information on the equipment types where the circuit lands. Tools for testing circuit integrity should be shared between the participating NOCs.

When each domain reports no issues are seen, then a more detailed investigation is started, or the problem is escalated: see Escalation procedure.
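
The decision order described in this procedure can be summarized in a short sketch. The enum and function names are hypothetical, and the three boolean inputs stand in for whatever checks (alarms, port status, link load, carrier tests) each party actually runs.

```python
from enum import Enum


class NextStep(Enum):
    OWNER_FIXES = "problem-owning NOC resolves and informs the other NOC"
    PEER_FIXES = "other NOC resolves and informs the problem-owning NOC"
    CARRIER_FIXES = "carrier resolves and reports back to the requesting NOC"
    INVESTIGATE_OR_ESCALATE = "start detailed investigation or escalate"


def integrity_check(owner_found_fault: bool, peer_found_fault: bool,
                    carrier_found_fault: bool) -> NextStep:
    """Walk the checks in the order given in the text."""
    if owner_found_fault:
        return NextStep.OWNER_FIXES
    if peer_found_fault:
        return NextStep.PEER_FIXES
    # Only when neither NOC finds a fault is the carrier contacted.
    if carrier_found_fault:
        return NextStep.CARRIER_FIXES
    # Every domain reports no issue.
    return NextStep.INVESTIGATE_OR_ESCALATE


if __name__ == "__main__":
    print(integrity_check(False, False, False).value)
```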

Incident Management: Escalation

Figure 2: Escalation procedure

Example: if a link between two Open Exchanges is in a faulty condition and the carrier seems non-responsive, the escalation procedure is started. The problem-owning NOC escalates to the Contractual Owner of the link. The Contractual Owner escalates to the Supplier. The Supplier takes appropriate action and reports a fix to the Contractual Owner and both NOCs.

There may also be situations where escalation internal to the GNA needs to occur. The RACI matrix should contain sufficient information to allow this to happen. Whether any NOC chooses to extend the scope of this to their participant organizations is a local decision.

Change Management: Maintenance by OXP

For all maintenance issues the participating NOCs will need to agree on requirements. These would include:

- Notification time for scheduled maintenance.
  o 5 working days is suggested.
- General agreement on preferred windows for maintenance needs to be reached.
  o This is difficult due to the many time zones serviced by this infrastructure.
- Impact assessments for any planned maintenance need to be prepared and made available.

There will also be times when emergency maintenance will be required. There needs to be a clear understanding of what situations justify emergency maintenance.

- Thresholds for soft failures need to be determined.
- This may in part depend on the user requirements in any given case.
- Some minimum levels should be set, with the understanding that some lesser levels may trigger maintenance.

In all cases, planned or emergency, notification will be sent to all NOCs so they can proceed to notify their user base.
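
To make the suggested notification time concrete, the sketch below computes the earliest date on which scheduled maintenance could start after a notice is sent. It assumes a Monday to Friday working week and ignores public holidays, neither of which the document specifies; the function and constant names are hypothetical.

```python
from datetime import date, timedelta

NOTICE_WORKING_DAYS = 5  # the value suggested in the text


def earliest_maintenance_date(notified_on: date,
                              notice_days: int = NOTICE_WORKING_DAYS) -> date:
    """Return the date that lies `notice_days` working days after notification.

    Assumption: working days are Monday to Friday; holidays are not considered.
    """
    current = notified_on
    remaining = notice_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


if __name__ == "__main__":
    # Notice sent on Monday 26 October 2015.
    print(earliest_maintenance_date(date(2015, 10, 26)))
```

Because the participating NOCs sit in many time zones, a real check would also need an agreed reference time zone for the notification date.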

Figure 3: Maintenance by Open Exchange

Figure 3 shows the maintenance procedure in case an Open Exchange is going to perform maintenance. The NOC schedules the maintenance and notifies the other NOC and the supplier, and optionally the end users of one or more services over the physical link.² The Contractual Owners of the links connected to that exchange should also be notified.

Maintenance is carried out and after completion all parties receive a notification that the action has finished.

² Of course the NOC that has scheduled maintenance also notifies its other OE-connected networks involved. For readability, this is not drawn in Figure 3.

Change Management: Maintenance by a Supplier

Figure 4: Maintenance by a Supplier

If the supplier of the link is to perform maintenance, it will schedule the work and notify both NOCs connected to its link. The NOCs notify their users with a local ticket ID. After maintenance has been completed, the carrier notifies the NOCs. Information on standard provider windows for maintenance events should be made available to each NOC.

Information Sharing: Provisioning/Deprovisioning documentation:

Documentation on procedures for provisioning services will be made available to all GNA participants. This does not need to be different from standard provisioning methods; it just needs to be made available. Useful information would include:

- Standard provision request addresses
- Standard provisioning times
- Information required to provision a service

Information Sharing: Naming conventions

Figure 5: Naming conventions

To identify links and services, several IDs are proposed:

Link ID: assigned by the link supplier

Global ID: assigned by the originating Open Exchange (per service, same in all domains). The Global ID consists of the prefix ‘urn:ogf:network:’, followed by the DNS-like identifier of the originating domain, followed by a local ID that is chosen by the originating Open Exchange. See Figure 5 for examples.

Local ID: optional local ID that domains have (per service, may be different in each domain). The local ID is free format.

While there is general agreement that using the Global ID will be valuable, these IDs can become somewhat cumbersome. One suggestion to simplify naming would be to assign an alias to the Global ID. That alias could be an embedded part of the Global ID or simply a string that is associated with that Global ID.

Alias ID: Simple name associated with the Global ID used to simplify discussion about an incident.
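
A small sketch of building and reading a Global ID under the structure described above. The prefix comes from the text; the ":" separator between the domain and the local ID, the function names, and the example domain and alias are assumptions, since the document does not fix them.

```python
from typing import Dict

PREFIX = "urn:ogf:network:"  # prefix given in the text
SEPARATOR = ":"              # assumed separator between domain and local ID


def make_global_id(domain: str, local_id: str) -> str:
    """Compose prefix + originating domain + local ID chosen by that domain."""
    if not domain or not local_id:
        raise ValueError("domain and local_id must be non-empty")
    return f"{PREFIX}{domain}{SEPARATOR}{local_id}"


def originating_domain(global_id: str) -> str:
    """Extract the DNS-like domain identifier from a Global ID."""
    if not global_id.startswith(PREFIX):
        raise ValueError("not a Global ID: missing prefix")
    return global_id[len(PREFIX):].split(SEPARATOR, 1)[0]


# An alias kept as a plain string associated with the Global ID,
# one of the two options mentioned in the text.
ALIASES: Dict[str, str] = {}

if __name__ == "__main__":
    gid = make_global_id("example.net", "link-42")
    ALIASES["blue-link"] = gid
    print(gid, originating_domain(gid), ALIASES)
```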

Information Sharing: Topology Sharing

What are the expectations or requirements around sharing information about internal structure?

There are really two separate issues here. One is the topology of the Exchange points: for this to work, is it necessary for them to make public who is connected to the exchange and what the port structure is? The other is a similar question regarding the attached networks.

Certainly, if any automated provisioning is occurring, sufficient topology information will need to be included to allow it to succeed. The question is how much needs to be available for diagnostic and other provisioning needs.

Information Sharing: Ticketing

The consensus is that each organization should utilize its own ticketing system; there is no intent to establish a single GNA ticket system. However, ticketing has to be designed such that tickets can be handled in the different systems with an understanding that the end-to-end service has a global context.

Global service parameters such as resolution times for end-to-end services might be very difficult to report with this kind of setup, so eventually more tooling might be considered.
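
One possible shape for such tooling is sketched below: local tickets from different systems are joined on their shared Global ID to derive an end-to-end resolution time. The record layout and the definition used here (earliest opening to latest closing across all related tickets) are assumptions for illustration, not an agreed GNA metric.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical export format: one dict per local ticket.
TicketRow = Dict[str, object]


def end_to_end_resolution(tickets: List[TicketRow],
                          global_id: str) -> Optional[timedelta]:
    """Time from the first related ticket opening to the last one closing.

    Returns None if no ticket carries the Global ID or any is still open.
    """
    related = [t for t in tickets if t["global_id"] == global_id]
    if not related or any(t["closed"] is None for t in related):
        return None
    first_opened = min(t["opened"] for t in related)
    last_closed = max(t["closed"] for t in related)
    return last_closed - first_opened


if __name__ == "__main__":
    gid = "urn:ogf:network:example.net:svc-1"
    rows = [
        {"noc": "NOC-A", "global_id": gid,
         "opened": datetime(2015, 10, 26, 9, 0), "closed": datetime(2015, 10, 26, 15, 0)},
        {"noc": "NOC-B", "global_id": gid,
         "opened": datetime(2015, 10, 26, 9, 30), "closed": datetime(2015, 10, 26, 14, 0)},
    ]
    print(end_to_end_resolution(rows, gid))
```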

Security Management: Security Incidents:

How are these handled? What are notification needs? What are the differences between data plane and control plane issues?

This needs more information. I am not sure what to include here.

1.7. Next steps

Definition of GNO-ME components

The individual functions of the GNO-ME must be defined for each facility or service comprising the GIRE. It seems best to begin with the ANA Collaboration as a test case for a facility, focusing initially on operation of the links and GXPs concerned, then adding services and expanding to other pathfinders.

Definition of the GN-FIE

Data and archive structures for the GN-FIE must be defined, along with a mechanism for moving data from the GNO-ME to the GN-FIE.

A commitment to support

A commitment from the CEOs of NRENs participating in the GNA is necessary for the distributed operations model proposed here to succeed.

First, a CO must be willing to work with GNA Ops-Sec to develop and implement the GNO-ME for the particular organizational contribution (service or facility). This will require a commitment of time and resources on the part of the CO.

Second, the GNA must provide staff dedicated to:

1. Defining the GNO-ME for the facility/service being contributed.
2. Working with the CO to implement the GNO-ME for the facility/service.
3. Working with the CO to implement the necessary data movement to enable the GN-FIE component for that facility/service.