nexus operations and maintenance best...

112

Upload: docong

Post on 05-Sep-2018

238 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,
Page 2: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Nexus Operations and Maintenance Best Practices

Arvind Durai, Director, Solutions Integration

Nick Garner, Solutions Integration Architect

BRKDCT-2458

Page 3: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Arvind Durai Nick Garner

– 17 years with Cisco Advanced Services

– Has worked with more than 100 customers in enterprise architecture, technology designs and operational simplification

– 9 years of Active Cisco live presenter

– Co-authored three Cisco Press Books – Cisco Firewall Services Module, Virtual Routing in the Cloud and TCL Scripting for Cisco IOS

– CCIE R/S and Security #7016

– 9 years with Cisco Advanced Services

Enterprise Campus and Datacenter

– Prior to Cisco, 10 years in financial and military verticals.

– Architecture, design and security support for the largest

ACI environment in the world.

– Cisco Press Technical Editor

– CCIE R/S and Security #17871

Contributors: Satish Kondalam, Junmei Zhang, Tarunesh Ahuja,

and many others from the Nexus TME team.

BRKDCT-2458 3

Page 4: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

• vPC Refresher

• Operational Best Practices: Software

• Operational Best Practices: Hardware

• Node Isolation

• NX-OS Graceful Insertion and Removal

• ACI Operational Best Practices

• Change Window Best Practices

Agenda

Page 5: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Course Objective & Goal

• To help Data Center operations and engineering staff understand the operational best practices when maintaining a Cisco Nexus data center network deployment.

• Attendees should leave the session with a firm understanding of

• Operational Best Practices

• Nexus Graceful Insertion and Removal

• Change Window Best Practices

BRKDCT-2458 5

Page 6: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

High-Level Technology Review

Page 7: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Data Center Technology Evolution

FabricPath with vPC+

2010

2009

VPC

< 2008

STP2013-2014

MPLS, OTV,

LISP

Standalone Fab.

VXLAN

MPLS, OTV,

LISP

2014-2015

ACI

2010

FEX with vPC

MPLS, OTV,

LISP

vPC

BRKDCT-2458 7

Page 8: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

vPC Feature OverviewvPC Terminology

Layer 3 Cloud

vPC Member Port

vPC

Orphan Device

Orphan Port

vPC Peer

CFS

vPC Domain

Peer Link

vPC PeerKeepalive Link

S1

S3

S2

Failure Scenario

• If both peers are active, then Secondary vPC peer will disable all vPCs to avoid Dual-Active.

• Data will automatically forward down remaining active port channel ports.

• Loss of in-flight packets will depend on deployment of vPC best practice.

For YourReference

Layer 3 pt-pt link

BRKDCT-2458 8

Page 9: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

vPC best practice

• vPC Domain ID’s

Use a unique vPC domain ID within a contiguous L2 domain to avoid MAC overlap.

• vPC Peer Link

Should be point-to-point connection & dedicated links.

• vPC Peer Keepalive Link

Dedicate a control plane in a dual-supervisor environment. Use a management switch.

• vPC peer-gateway

Acts as active gateway for frames addressed to peer switch. Avoid Peer Link forwarding.

• Use vPC peer-switch

Optimizes BPDU processing, single logical L2 entity

• Distribute port-channel member interfaces across line cards within the same chassis.

• Create a map for oversubscription aligned to current and future demand.

Deployment practice – 20:1 at access and 2:1 at Core.

vPC general deployment best practice

HIGHIMPACT / HARD TO

IMPLEMENT

LOWIMPACT / HARD TO

IMPLEMENT

HIGHIMPACT / EASY TO

IMPLEMENT

LOWIMPACT / EASY TO

IMPLEMENT

QUICK WINS!

BRKDCT-2458 9

Page 10: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

1. vPC peer-link down : S2 - secondary shuts all its vPC member ports

2. S1 down : vPC peer-keepalive link down : S2 receives no keepalives

3. After 3 keepalive timeouts, S2 changes role and brings up its vPC

vPC Primary

vPC Secondary

PS

P

Operational

Primary

Auto-recovery addresses two cases of single switch behavior

• Peer-link fails and after a while primary switch (or keepalive link) fails

• Both VPC peers are reloaded and only one comes back up

S2S1

S3

P S

S1 S2

S3

P S

S1 S2

S3

vPC Configuration Best PracticesvPC Auto-recovery

BRKDCT-2458 10

Page 11: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

VPC Self-isolation Automatically triggered isolation

Example Presented: All Line Cards Fail

Current Impact Self-isolation Behavior (7.2)

• When this failure

happens on primary,

peer-link is brought down.

• This causes the

secondary brings down

all legs.

• Traffic is completely

blocked.

When this failure happens:

•Physically bring down peer-link

•Physically bring down all vPC legs

•Send self-isolation through peer-keep-alive

Peer switch:

•Receive self-isolation from the peer through

peer-keep-alive

•Change role to Primary

•Bring up all down vPC legs

BU Testing Results:

Sub-second Recovery (N>S) (S>N) (E>W)

PKA

Vlan 1-100

Vlan 1-100Vlan 1-100

Primary Secondary

Current

PKA

Vlan 1-100

PrimarySecondary

Self-isolation

NOTE: Available in NX-OS 7.2, 5k/6k/7k

BRKDCT-2458 11

Page 12: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

VPC Self-isolation Automatically triggered isolation

No up VLANs on peer-link: This case address the issue that no VLANs are up on the peer-link while the port channel is physically up (i.e., peer-link is up with no VLANs). For example: VLAN misconfiguration, hardware programming failure.

Current Behavior Self-isolation Behavior

• The up VLAN on vPC legs

are the results of configured

VLAN intersected with the

UP VLAN on peer-link (i.e.,

three-way intersection).

• All VLAN down on vPC

domain.

• This completely blocks the

traffic

Isolated switch:

• Physically bring down peer-link

• Physically bring down all vPC legs

• Send “self-isolation” through peer-keep-

alive

Peer switch:

• Receive self-isolation from the peer

through peer-keep-alive

• Change role to Primary

• Bring up all down vPC legs

PKA

No vlan

Vlan 1-100 Vlan 1-100

Current

PKA

Vlan 1-100

Self-isolation

NOTE: Available in NX-OS 7.2, 5k/6k/7k

BRKDCT-2458 12

Page 13: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Object-tracking

vPC Configuration Best Practices

S1 S2

• Object Tracking triggered when the track object goes down

• vPC object tracking, tracks both peer-link and uplinks in a list of Boolean OR

• Traffic forwarded over the remaining vPC peer.! Track the vpc peer link

track 1 interface port-channel11 line-protocol

! Track the uplinks

track 2 interface Ethernet1/1 line-protocol

track 3 interface Ethernet1/2 line-protocol

! Combine all tracked objects into one.

! “OR” means if ALL objects are down, this object will go down

track 10 list boolean OR

object 1

object 2

object 3

! If object 10 goes down on the primary vPC peer,

! system will switch over to other vPC peer and disable all local vPCs

vpc domain 1

track 10

• Suspends the vPCs on the impaired device.

S4 S5

S3

BRKDCT-2458 13

Page 14: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

SSO-Aware Applications

Forwarding Information Base

IEEE 802.1x

PAgP / LACP

…and more

SSO-Aware Applications

Forwarding Information Base

IEEE 802.1x

PAgP / LACP

…and more

Checkpointing

Facility

Checkpointing

Facility

Redundancy

Facility

Redundancy

Facility

SSO-Compliant Applications

Routing Protocols

NetFlow

Cisco Discovery Protocol

…and more

SSO-Compliant Applications

Routing Protocols

NetFlow

Cisco Discovery Protocol

…and more

Stateful Switchover ModeSSO-Aware and SSO-Compliant Applications

Active Supervisor

Standby Hot Supervisor

BRKDCT-2458 14

Page 15: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Routing Protocol Redundancy With NSF

EIGRP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Active Supervisor Engine Slot 1EIGRP RIB

Prefix Next Hop

- -

- -

- -

OSPF RIB

Prefix Next Hop

- -

- -

- -

ARP Table

IP MAC

- -

- -

- -

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Standby Supervisor Engine Slot 2

SSO

Redundancy Facility

Checkpoint Facility

BRKDCT-2458 15

Page 16: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Routing Protocol Redundancy With NSF

EIGRP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

EIGRP RIB

Prefix Next Hop

- -

- -

- -

OSPF RIB

Prefix Next Hop

- -

- -

- -

ARP Table

IP MAC

- -

- -

- -

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

SSO

Redundancy Facility

Checkpoint Facility

Active Supervisor Engine Slot 1 Standby Supervisor Engine Slot 2

BRKDCT-2458 16

Page 17: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Routing Protocol Redundancy With NSF

EIGRP RIB

Prefix Next Hop

- -

- -

- -

OSPF RIB

Prefix Next Hop

- -

- -

- -

ARP Table

IP MAC

- -

- -

- -

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Standby Supervisor Engine Slot 2

GR/NSF Signaling per protocol

Synchronization per protocol

EIGRP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

BRKDCT-2458 17

Page 18: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Routing Protocol Redundancy With NSR

BGP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Active Supervisor Engine Slot 1

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Standby Supervisor Engine Slot 2

SSO

Redundancy Facility

Checkpoint Facility

BGP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

BRKDCT-2458 18

Page 19: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Routing Protocol Redundancy With NSR

FIB Table

Prefix Next HOP

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

192.168.0.0 aa25cc:ddeee8

Standby Supervisor Engine Slot 2BGP RIB

Prefix Next Hop

10.0.0.0 10.1.1.1

10.1.0.0 10.1.1.1

10.20.0.0 10.1.1.1

OSPF RIB

Prefix Next Hop

192.168.0 192.168.0.1

192.168.55.0 192.168.55.1

192.168.32.0 192.168.32.1

ARP Table

IP MAC

10.1.1.1 aabbcc:ddee32

10.1.1.2 adbb32:d34e43

10.20.1.1 aa25cc:ddeee8

No additional signaling required to maintain topology

BRKDCT-2458 19

Page 20: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Comparing NSF and NSRMetric Non-Stop Forwarding / Graceful Restart Non-Stop Routing / Stateful Restart

Configuration required Yes, per protocol instance on NSF

capable device, No configuration

required for NSF –aware devices for

Interior Gateway Protocols. BGP

requires GR configuration on both

NSF-Capable and NSF Helper device

Yes per protocol instance.

BGP also requires per peer

configurations

Which routing protocols are supported EIGRP, OSPFv2, OSPFv3, ISIS, BGP,

LDP

ISIS, OSPFv2

Synchronizes routing protocol state

and RIB information across redundant

control planes

No Yes

Requires specific feature support on

peer routers

Yes No

BRKDCT-2458 20

Page 21: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Standalone Chassis Redundant Core

• Redundant topologies with equal cost

paths provide sub-second convergence.

• NSF/SSO provides superior availability in

environments with non-redundant paths.

Se

co

nd

s o

f L

ost

Vo

ice

*

X

Failure or change at the Core

• Enable BFD for all OSPF neighbor links

• Adjust OSPF spf-throttling timers with:timers throttle spf

timers throttle lsa

timers lsa arrival

0

1

2

3

4

5

6

RP Convergence

Is Dependent

on IGP and Tuning

Link

Failure

Node

Failure

NSF

SSO

SPF

Throttle

OSPF

Convergence

* Route scale dependent.BRKDCT-2458 21

Page 22: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Traditional vPC Environment Change Change Best Practice & Window

vPC

Primary Secondary

Fex

Core Node / Link Failure or Isolation

Access Node / Link Failure or Isolation

RP BP

vPCBest Practice

Hardware HA Best Practice

BRKDCT-2458 22

Page 23: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Layer 3 Spine/Leaf Environment Change Best Practice & Window

Leaf Node / Link Failure or Isolation

VPC Best Practice

Spine Node / Link Failure or Isolation

RP BP

Hardware HA Best Practice

n1 n2

BRKDCT-2458 23

Page 24: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Layer 2 Spine/Leaf EnvironmentChange Best Practice & Window

Leaf Node Failure / Link Failure

VPC Best Pactice

RP BP

Layer 3

Spine Node Failure / Link Failure

Hardware HA Best Practice

n1 n2

BRKDCT-2458 24

Page 25: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Operational Best PracticesSoftware

Page 26: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Independent memory-protected restart-able processes

• Service Restart-ability

• Stateful Restart with

Persistent Storage Service (PSS)

• Checkpoints states to PSS

• Recover states from PSSupon restart.

• Stateful Restart with Graceful Restart

• Recover states based oninformation from other services and/or network.

• Mainly Routing Protocols

• Stateless Restart

• Fresh start, no trace of former instantiation.

Layer-2 Protocols Storage ProtocolsLayer-3 Protocols

Interface Management

Chassis Management

Kernel

Sysm

gr,

PS

S &

MT

S

SN

MP

, X

ML,

CLI

Managem

ent

Chip/Driver Infrastructure

VLAN mgr

STP

OSPF

BGP

EIGRP

GLBP

HSRP

VRRP

VSANs

Zoning

FCIP

FSPF

IVR

UDLD

CDP

802.1XIGMP snp

LACP PIMCTS SNMP

Other Services

Future Services

Possibilities

……

Protocol Stack (IPv4 / IPv6 / L2)

NX-OS High AvailabilityProcess Modularity

BRKDCT-2458 26

Page 27: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public 27BRKDCT-2458

Hardware FIB

Software RIB

Linux Kernel

BG

P

OS

PF

PIM

TC

P/U

DP

IPv6

ST

P

HS

RP

LA

CP

etc

HA Manager

Restart process!

Graceful restartGraceful restart

Routing updatesRouting updatesTable

Update

Nexus Data Plane

Stateful Fault Recovery

If a fault occurs in a process…

• HA manager determines best recovery action (restart process, switchover to redundant supervisor)

• Process restarts with no impact on data plane• State checkpointing (PSS) allows instant, stateful process recovery

• Software utilizes Graceful Restart where appropriate

Page 28: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Active Sup

Backup Sup

LC - NSF

• Stateful Switchover (SSO)

• Active-backup supervisors synchronized at all times

• Routing Protocols: PSS Stateful Restart

NSF Graceful Restart failover

• Other components: PSS Stateful Restart

• Triggers:

• HA Policy Initiated – e.g. 3 component crashes SSO

• User Initiated – system switchover

• ISSU initiated SSO

NX-OS High AvailabilitySupervisor Switchover

BRKDCT-2458 28

Page 29: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

An In Service Software Upgrade (ISSU) provides the capability to perform transparent software upgrades on platforms with redundant supervisors. Therefore, new features and fixes can be installed without any downtime or packet loss.

Recommended for highly critical, no packet loss tolerated, environments.

Only 1 command requiredn7000# install all kickstart bootflash:n7000-s1-kickstart.6.2.12.bin

system bootflash:n7000-s1-dk9.6.2.12.bin

Note: The CLI command wrapped due to the length

1. Copy the NX-OS images to flash2. Verify MD5 – switch# show file <image> md5sum

3. Use the install all command to start the upgrade

Upgrade Procedure:

Software UpgradeISSU

29BRKDCT-2458

Page 30: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Dual-supervisor failover only

• ISSU is user initiated:

• Through CLIinstall all kickstart <kickstart image> system

<system image> cmp <cmp image>

• Components upgraded:

• Supervisor: BIOS, Kickstart and System image

• Linecard: BIOS and Linecard image

• System wide upgrade – Not VDC specific

• Single-supervisor ISSU is not possible on the 7k. Service disruption might occur.

Upgrade BIOS on Active, Standby Sup and Linecards

Bring up old Active Sup with new image

Switchover (Standby takes over as Active)

Upgrade CMP

Perform HITLESS LC Upgrade (one at a time)

Upgrade Done

Bring up Standby Sup with new image

NX-OS High AvailabilityISSU

30BRKDCT-2458

Page 31: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Release

5.2(9a)Release

6.2(8b)

Linux KernelO

SP

F

BG

P

PIM

etc

.

HA Manager

Nexus Data Plane

Linux Kernel

HA Manager

Active

I/O Module Images

Upgrade and reboot

OS

PF

BG

P

PIM

etc

.

Standby

Initiate stateful failoverUpgrade and rebootUpgrade I/O modules

Release

5.2(9a)

Nexus# install all kickstart bootdisk:6.2-kickstart system …

Release

5.2(9a)

Release

6.2(8b)

Release

6.2(8b)

In-Service Software Upgrade

Needed for animation,

Best Practice:

VPCs should be distributed.

BRKDCT-2458 31

Page 32: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

In-Service Software UpgradeCheck Upgrade Path in Release Notes

Current

Release

Release

Train

Releases That Support ISSU to the

Current Release

Releases That Support ISSD from the

Current Release

NX-OS Release 6.2(12) 6.2 6.2(8a), 6.2(8b), 6.2(10) No support.

NX-OS Release 6.2(10) 6.2 6.2(8), 6.2(8a), 6.2(8b) No support.

NX-OS Release 6.2(8b) 6.2 6.2(8a) No support.

6.1 6.1(5a) No support.

5.2 5.2(9a) No support.

NX-OS Release 6.2(8a) 6.2 6.2(2), 6.2(2a), 6.2(6), 6.2(6a), 6.2(6b), 6.2(8) 6.2(2), 6.2(2a), 6.2(6), 6.2(6a), 6.2(6b), 6.2(8)

6.1 6.1(3), 6.1(4), 6.1(4a), 6.1(5) No support.

5.2 5.2(7), 5.2(9) No support.

BRKDCT-2458 32

Page 33: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Compatibility check is done:

Module bootable Impact Install-type Reason

------ -------- -------------- ------------ ------

1 yes disruptive reset Incompatible image

3 yes disruptive reset Incompatible image

4 yes disruptive reset Incompatible image

5 yes disruptive reset Incompatible image

6 yes disruptive reset Incompatible image

10 yes disruptive reset Incompatible image

Compatibility Check:

Start Installer:switch# install all kickstart n7000-s1-kickstart.6.0.1.bin system n7000-s1-dk9.6.0.1.bin

Switch will be reloaded for disruptive upgrade.

Do you want to continue with the installation (y/n)? [n] y

Continue?

You are prompted to continue

Software Upgrade: Disruptive Upgrade with Installer

NOTE: Isolate

the node first!

Disruptive!

NOTE: Must do ISSU or this for Nexus 5k.BRKDCT-2458 33

Page 34: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

n7000# show boot

sup-1

kickstart variable = bootflash:/n7000-s1-kickstart.4.0.1.bin

system variable = bootflash:/n7000-s1-dk9.4.0.1.bin

sup-2

kickstart variable = bootflash:/n7000-s1-kickstart.4.0.1.bin

system variable = bootflash:/n7000-s1-dk9.4.0.1.bin

No module boot variable set

n7000# dir

49152 Mar 29 00:07:42 2005 lost+found/

80850712 May 23 18:14:46 2005 n7000-s1-dk9.4.0.1.bin

9791207 May 23 21:32:52 2005 n7000-s1-epld.4.0.1.img

22593024 May 23 18:13:11 2005 n7000-s1-kickstart.4.0.1.bin

4096 Jan 01 00:14:52 2005 routing-sw/

Usage for bootflash://

553099264 bytes used

1365549056 bytes free

1918648320 bytes total

Sup1 & Sup2 boot variables

Verifying Boot Variables:

Configuring Boot Variables:

Defaults to bootflash:

Kickstart & System

images present

Verifying Bootflash:

Configured

Sup1 & Sup2

n7000(config)# boot kickstart bootflash:n7000-s1-kickstart.4.0.1.bin sup1 sup2

n7000(config)# boot system bootflash:n7000-s1-kickstart.4.0.1.bin sup1 sup2

n7000# copy running-config startup-config

Available free space

n7000# reload

Manual Reload:

You are prompted to continue

Software Upgrade: Cold Start NOTE: Isolate

the node first!

BRKDCT-2458 34

Page 35: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

TAC: You’ve encountered defect CSCxy12345.

It’s operationally impacting and, I’m sorry to say,

there’s no workaround. You’ll need to upgrade.

You: Fine. Let’s just get it fixed.

Bill, start up a war room.

John, get our AS NCE on the phone.

Sally, schedule testers in two hours.

Where’s my $#@! coffee?

Sally: You know how Richard gets when we call him at 2 AM...

Defect Impact

Belay my last.

We have a SMU

for that.

What?

Gesundheit.

!

35BRKDCT-2458

Page 36: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Software Patching in NX-OS

Overview

• Software Patching is Platform Independent

• Available on Nexus 9000 (6.1(2)I2)

• FCS NX-OS 7.2 (5/6/7k)

• Fully supported with ISSU

Benefits

• Reduce time to resolution in your network.

• SMUs in NX-OS build upon years of

experience in IOS XR.

• Simplify customer operations for defect

resolution and code qualification.

• Better utilize the software HA capabilities

of NX-OS.

• Provide a common cross-platform

experience (N9K/N7K/N6K/N5K).

Who’s familiar with Software Maintenance Updates (SMU)?

36BRKDCT-2458

Page 37: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Any Nexus Chassis

Supporting Patching

SMU Repository

Copy to Device

SMU Removed

SMU Committed

SMU Committed

Switch# install add …

Switch# install activate …

Switch# install commit …Switch# install deactivate …

Switch# install commit …

Switch# install remove …

SMU

Memory: Process:

Memory: Process:

SMU AppliedMemory: Process:

Memory: Process:

Memory: Process:

SMU

show install active

show install committed

show install inactive

show install packages

show install pkg-info …

SMU Lifecycle – CLI

BRKDCT-2458 37

Page 38: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Patching Highlights

SMU Types

• Restart: Restarts affected process

• Process restarted in all VDCs where running.

• ISSU SMU: Dual Sup -> ISSU

Single Sup -> Reload

BRKDCT-2458 38

Page 39: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Patching is for operationally impacting bugs without a workaround.

• Cannot patch to next release.

• Patching is done in default/admin VDC and applies to all VDCs.

• Patching is not available per-VDC.

• ISSU will work with all, or a subset of patches applied.

• You don’t need to apply all patches.

• Some SMUs may only have a single fix, others may have multiple

packaged.

Patching Highlights

BRKDCT-2458 39

Page 40: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Patching Highlights

• SMUs are TAC supported.

• SMUs are synched to standby supervisor.

• On Sup replacement, patch will be synchronized.

• SMUs are not for feature implementation. A SMU cannot change the configuration.

BRKDCT-2458 40

Page 41: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Operational Best PracticesHardware Maintenance

Page 42: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

The Nexus platforms contain several EPLDs that provide hardware functionalities in all modules.

5 things to know about the EPLD:

1. EPLD Advantage: Enhancement and fixes through software upgrade.

2. EPLD upgrades are a separate and independent process from ISSU.

3. The EPLD upgrade is disruptive to the traffic; however they are performed on a module by module basis, so at any given time only one module is affected.

4. The upgrade is not always required. Please always check the EPLD Release Notes page when a new NX-OS release is posted to www.cisco.com.

Hardware MaintenanceElectronic Programmable Logic Device (EPLD) Upgrade

BRKDCT-2458 42

Page 43: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

n7000# install module 1 epld bootflash:n7000-s1-epld.4.0.1.img

EPLD image file , built on Mon Mar 31 10:31:48 2008

EPLD Curr Ver New Ver

-------------------------------------------------------

Power Manager 4.1 5.3

IO 2.6 2.10

Forwarding Engine 1.4 1.6

WARNING: Upgrade process could take up to 30 minutes.

Module could be powered down and up.

Module 1 will be powered down now!!

Do you want to continue (y/n) ? [n] y

Module 1 EPLD upgrade is successful.

The following example upgrades the EPLD image for module 1. The EPLD

image should be local when the upgrade is performed.

The “install” command highlights the

EPLD version differences

EPLD upgrades are intrusive and

may take up to 30 minutes per

module!

The user is prompted to continue

This procedure is typically not required during an NX-OS upgrade.

Hardware MaintenanceEPLD Upgrade Example

NX-OS >= 6.1: Parallel EPLD Upgrades!

BRKDCT-2458 43

Page 44: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Hardware Maintenance

• Power down line card prior to removal.

Nexus# out-of-service module <module-number>

• Online Insertion and Removal (OIR) of like hardware is supported, configuration maintained. Hitless with VPC provided sufficient bandwidth and port-channel distribution.

• Mixed line card deployment between VPC peers is not supported.

NOTE: Evaluate the VDC interface assignments to verify whichVDCs will experience a service impact.

Line card support matrix:

http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/nexus7000/sw/matrix/technical/reference/Module_Comparison_Matrix.pdf

Scenario: Line Card Hardware Upgrade or Replacement

However:

#conf t

#vpc domain <id>

#bypass module-check

Not BP, only corner

case, change window.

BRKDCT-2458 44

Page 45: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Hardware Maintenance

1. Disconnect all of the cables attached to the front of the module to be removed.

2. Loosen the two captive screws.

3. Press the ejector release buttons on the top and bottom ends of the module to push out the ejector levers and to disconnect the module.

4. Simultaneously rotate the two ejector levers outward to unseat the module from the midplane connector.

5. With a hand on each ejector, pull the module part way out of its slot in the chassis.

6. Grasp the front edge of the module with your left hand and place your right hand under the lower side of the module to support its weight. Pull the module out of its slot.

Line Card Removal

Did you remember to put on your ESD strap?

BRKDCT-2458 45

Page 46: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Hardware Maintenance

• You can do mixed supervisor (Sup1 and/or Sup2) upgrades in a VPC environment but don’t run it in production, only during maintenance or migration.

• NOTE: When migrating from N7k Sup1 to Sup2 or Sup2E, there is a specific process for installing the configuration and copying the licenses.

• Pre-load same NX-OS version prior to replacing the supervisor to avoid delays.

• If replacing the active supervisor, “system switchover” to the other supervisor before running “out-of-service slot-number”.

Scenario: Supervisor Hardware Upgrade (Dual Supervisor System)

Scenario: Supervisor Hardware Upgrade (Single Supervisor System)

• New supervisor should have same NX-OS version as VPC peer before starting to avoid VPC Type-2 inconsistency.

• Isolate for graceful withdrawal from the VPC domain

• Power down chassis

BRKDCT-2458 46

Page 47: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Hardware Maintenance

• Gas up your fork lift.

• Bring switch being replaced into Graceful Insertion and Removal mode or manually isolate prior to power down.

Scenario: Chassis Hardware Upgrade

• Don’t oversubscribe the fabric when replacing fabric modules.

Scenario: Fabric Module Hardware Upgrade

n7000# show hardware fabric-utilization

------------------------------------------------

Slot Total Fabric Utilization

Bandwidth Ingress % Egress %

------------------------------------------------

1 138 Gbps 0.0 0.0

<…cut…>

BRKDCT-2458 47

Page 48: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Hardware Maintenance

• Online Insertion and Removal (OIR) is supported.

• Be mindful of power budget.

Scenario: Power Supply Hardware Upgrade

n7000# show environment power

<..cut..>

Power Usage Summary:

--------------------

Power Supply redundancy mode (configured) Redundant

Power Supply redundancy mode (operational) Redundant

Total Power Capacity (based on configured mode) 6000 W

Total Power of all Inputs (cumulative) 12000 W

Total Power Output (actual draw) 1616 W

BRKDCT-2458 48

Page 49: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Node Isolation

Page 50: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Unplanned Impact During Maintenance Window

DESIRE: Manipulate IGP to only advertise a default route as this device is the only path out of the network.

PLAN: Static default to NULL0, originate default in OSPF, northbound forwarding decisions based on BGP learned prefixes.

PROBLEM: OSPF converges before BGP, traffic black-holed.

SOLUTIONS: Isolate devices before making changes.

• Manipulate IGP/EGP, route around.

• Gracefully* shutdown IGP/EGP while preserving configuration.

Scenario

iBGP

OSPF

Waste of a good RIB.

BRKDCT-2458 50

Page 51: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocol Isolation in Nexus

Option 1: Advertise as Stub Router, LSInfinity

max-metric router-lsa

[ on-startup [ seconds | wait-for bgp tag ]]

Option 2: Shutdown OSPF (Process), Preserve Configuration

router ospf 1

shutdown

NOTE: Currently, shutdown is not graceful.

Option 3: Shutdown OSPF (Interfaces)

interface e1/1

ip ospf shutdown

OSPF

Recommended

BRKDCT-2458 51

Page 52: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocol Isolation in Nexus

Option 1: Advertise as with LSP Database Overload Bit set

set-overload-bit {always | on-startup

{seconds | wait-for bgp as-number}}

[suppress [interlevel | external]]

Option 2: Shutdown IS-IS (Process), Preserve Configuration

router isis 1

shutdown

Option 3: Disable IS-IS (Interfaces)

interface e1/1

isis shutdown

IS-IS

Recommended

BRKDCT-2458 52

Page 53: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocol Isolation in Nexus

Option 1: Manipulate Metrics

interface e1/1

ip delay eigrp instance-tag seconds

Option 2: Shutdown EIGRP (Process) , Preserve Configuration

router eigrp 1

shutdown

NOTE: Currently, shutdown is not graceful.

Option 3: Disable EIGRP (Interfaces)

interface e1/1

ip eigrp 1 shutdown

EIGRP

Recommended

BRKDCT-2458 53

Page 54: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocol Isolation in Nexus

Option 1: Advertise prefixes with longer AS path / higher local-preference

Option 2: Shutdown BGP (Process), Preserve Configuration

router bgp 65010

shutdown

NOTE: This is a not a graceful shutdown such as you would achieve with GSHUT / RFC 6198.

BGP

switch(config)# route-map prepend

switch(config-route-map)# match as-path 1

switch(config-route-map)# set as-path prepend last-as 3

switch(config)# router bgp 65000

switch(config-router)# neighbor 192.168.10.2 remote-as 20

switch(config-router-neighbor)# address-family ipv4 unicast

switch(config-router-neighbor-af)# route-map prepend out

Recommended

BRKDCT-2458 54

Page 55: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocol Isolation in Nexus

Option 4: System Interface Shutdown

system interface shutdown

All Protocols

For many, this is good enough.

And, easy!

BRKDCT-2458 55

Page 56: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

VPC Shutdown Feature

This feature allows customer to manually “isolate” a switch from vPC domain. This is a vPC configuration option.

Pre-7.2 Behavior

• No “shutdown”

command.

• Manual Shutdown

Required

• Down vPCs

• Down Peer Link

• vPC Members

• Etc.

PKA

Vlan 1-100

Vlan 1-100Vlan 1-100

Pri

ma

ry

Se

co

nd

ary

Configure

PKA

Vlan 1-100

Prim

ary

Se

co

nd

ary

De-configure

Vlan 1-100

Vlan 1-100

VPC Shutdown Behavior

• Local switch isolated from remote.

• Cannot exit shutdown without

manual intervention.

• When exiting, PKA, PL, and vPCs

will be re-initialized; vPC domain

brought to normal state.

Availability 5k/6k/7k: NX-OS 7.256BRKDCT-2458

Page 57: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

NX-OS Graceful Insertion and Removal

Page 58: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Graceful Insertion and Removal

feature ospf

feature vpc

Isolate for

Change Window

OSPF:

max-metric router-lsa

VPC:

shutdown

Scripting takes time.

It’d be nice to automate this…

58BRKDCT-2458

Page 59: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Graceful Insertion and Removal

vPC vPC

system mode maintenance

One command!Pre-change System Snapshot

Change window begins.

59BRKDCT-2458

Page 60: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Graceful Insertion and Removal

vPC vPC

Change window complete.

no system mode maintenance

One command!Post-change System Snapshot

60BRKDCT-2458

Page 61: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Flexible framework providing a comprehensive, systemic method to isolate a node.

Configuration profile foundation in NX-OS

Initial support for:– vPC/vPC+– ISIS– OSPF– EIGRP– BGP– Interface

Per VDC on Nexus 7x00

Platform Release

Nexus 5x00/6000 NX-OS 7.1

Nexus 7x00 NX-OS 7.2

Nexus 9000 NX-OS 7.0

Graceful Insertion and Removal

61BRKDCT-2458

Page 62: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Configuration Profiles

Automatic Profiles Manual Profiles

• Generated by default

• Parses configuration to determine

changes going into and out of GIR

• Changes based on base protocol

configuration settings.

• Use: Maintenance Windows

• User created profile for maintenance-

mode and normal-mode

• Flexible selection of protocols for

isolation

• Use: maintenance windows and

isolation during troubleshooting using

preconfigured scripts.

• Maintenance-mode profile is applied when entering GIR mode,

• Normal-mode profile is applied when GIR mode is exited.

BRKDCT-2458 62

Page 63: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

N7K-1-Core# show system mode

System Mode : Normal

N7K-1-Core# config

Enter configuration commands, one per line. End with

CNTL/Z.

N7K-1-Core(config)# system mode maintenance

BGP is not enabled, nothing to be done

EIGRP is not enabled, nothing to be done

OSPF is up..... will be shutdown

OSPF TAG = 100, VRF = default

config terminal

router ospf 100

shutdown

end

OSPFv3 is not enabled, nothing to be done

ISIS is not enabled, nothing to be done

vPC is not enabled, nothing to be done

Interfaces will be shutdown

Do you want to continue (y/n)? [n] y

Generating maintenance-mode profile

Progressing...................Done.

System mode operation completed successfully

N7K-1-Core# show system mode

System Mode : Maintenance

N7K-1-Core#

Enabling Graceful Insertion and RemovalAutomatic Profile Generation

NOTE: Custom profile generation requires “dont-generate-profile”.

63BRKDCT-2458

Page 64: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

config-profile normal-mode type admin

router ospf 100

no shutdown

router eigrp 101

no shutdown

router isis 102

no set-overload-bit always

router bgp 103

no shutdown

vpc domain 20

no shutdown

no system interface shutdown

config-profile maintenance-mode type admin

router ospf 100

shutdown

router eigrp 101

shutdown

router isis 102

set-overload-bit always

router bgp 103

shutdown

vpc domain 20

shutdown

system interface shutdown exclude fex-fabric

Enabling Graceful Insertion and RemovalCustom Profile Generation

• By default, GIR Mode will automatically generate profiles.• CLI to disable automatic profile generation: dont-generate-profile

• If you enter GIR mode with automatic profile, it will overwrite your custom profile.

BRKDCT-2458 64

Page 65: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Protocols still matter – need support for signaling state to neighbors to minimize disruption

– LACP, STP, IGMP, PIM, etc. don’t have a mechanism

– BGP, IGPs have rich set of capabilities

• OSPF: Max-metric router-lsa → graceful shutdown

• IS-IS: Overload bit → graceful shutdown

• EIGRP: Graceful shutdown with Goodbye (All K-values set to 255)

• BGP: Graceful shutdown Note: 7k supports AS-Prepend and local-preference in manual configuration mode.

Graceful Insertion and Removal Mode and Protocols

“system interface shutdown” will isolate those protocols.

65BRKDCT-2458

Page 66: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

1. Advertise costly metrics:

OSPF max-metric

2. Prepend AS-path

(MP-eBGP)

3. Node taken out of

forwarding path.

Graceful Insertion and Removal ModeVxLAN EVPN GIR Example

Control plane is still active, forwarding path altered.

iBGP

RR/VTEP

VTEP

N

VTEP

3VTEP

1

VTEP

2

iBGP

RR/VTEP

BRKDCT-2458 66

Page 67: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Graceful Insertion and Removal ModeL2 & L3 GIR Example

L2

L3

BGP

OSPF

VPC

VPC

Shutdown

as-path

max-metric

system mode maintenance dont-generate-profile

Sys. Int.

Shutdown

BRKDCT-2458 67

Page 68: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Make Nexus undesirable for all transit traffic

• Maintain Protocol Adjacencies

• Send route withdrawals/worse metrics

• Local route states are maintained.

• New CLI 'isolate' in all Unicast Protocols

• Multicast follows Unicast for RPF

Nexus 9k/7k/6k/5kGraceful Removal

N9372(config)# system

mode maintenance

Following configuration

will be applied:

router bgp 33

isolate

router eigrp 1

isolate

router ospf 1

isolate

router isis 1

isolate

5k/6k/7k: NX-OS 7.3

9k: 7.0

BRKDCT-2458 68

Page 69: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

router bgp 33

isolate

router eigrp 1

isolate

router ospf 1

isolate

router isis 1

isolate

Nexus 9k/7k/6kGraceful Removal

Discontinue advertisement of all prefixes.

Advertises maximum metrics for all K-values.

max-metric router-lsa

set-overload-bit

6k/7k: NX-OS 7.3

9k: 7.0

BRKDCT-2458 69

Page 70: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Move the switch from Maintenance mode to Normal mode.

• Control plane maintained throughout isolation of the switch.

• Protocols advertise routes only after it is installed in hardware.

Nexus 9k/7k/6k/5kGraceful Insertion

N9372(config)# no system

mode maintenance

Following configuration

will be applied:

router bgp 33

no isolate

router eigrp 1

no isolate

router ospf 1

no isolate

router isis 1

no isolate

5k/6k/7k: NX-OS 7.3

9k: 7.0

BRKDCT-2458 70

Page 71: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus GIR Snapshots

• Used before and after a GIR mode to compare pre/post change operation.

• Snapshots are automatically generated when entering GIR mode.

switch# snapshot create snap1 For testing

Executing show interface... Done

Executing show bgp sessions vrf all... Done

Executing show ip eigrp topology summary... Done

Executing show vpc... Done

Executing show ip ospf vrf all... Done

Feature 'ospfv3' not enabled, skipping...

Snapshot 'snap1' created

Switch#

BRKDCT-2458 71

Page 72: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus GIR Snapshots Comparison

Nexus# sh snapshots compare before_maintenance after_maintenance

================================================================================

Feature Tag before_maintenance after_maintenance

================================================================================

[bgp]

--------------------------------------------------------------------------------

[neighbor-id:100.120.1.221]

connectionsdropped 2 **3**

lastflap P1DT21H5M12S **P1DT21H25M47S**

lastread P1DT21H25M12S **PT0S**

lastwrite P1DT21H25M14S **PT0S**

state Established **Idle**

localport 52737 **0**

remoteport 179 **0**

notificationssent 2 **3**

<...>

{ }+-BRKDCT-2458 72

Page 73: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

vPC

Primary Secondary

Po10

Po1

Po20

Po2

1 42 3

FE

X 1

01

FE

X 1

02

Overview

• Highly Redundant Design

• Dual-attached FEX

• Dual-attached Hosts

How do we upgrade this environment with minimal disruption?

Nexus 5k Scenario: Dual-homed FEX w/ VPCSoftware Upgrade

V1 V1

V1 V1

BRKDCT-2458 73

Page 74: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus 5k Scenario: Dual-homed FEX w/ VPCSoftware Upgrade

vPC

Primary Secondary

Po10

Po1

Po20

Po2

1 42 3

FE

X 1

01

FE

X 1

02

• Enter GIR Mode on N5k1

Traffic flow through N5k2V1 V1

V1 V1

• Upgrade N5k1

V2

• Exit GIR on N5k1

Image Version

Mismatch with

both FEXs

IF Down

IF Up, No Forwarding BRKDCT-2458 74

Page 75: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus 5k Scenario: Dual-homed FEX w/ VPCSoftware Upgrade

vPC

Primary Secondary

Po10

Po1

Po20

Po2

1 42 3

FE

X 1

01

FE

X 1

02

• Manually shut down IF3 on N5k2

V1 V1

V1 V1

V2

IF Down

IF Up, No Forwarding

FEX 101 goes offline.

FEX 101 HIFs go down.

FEX 101 starts pairing process with N5k1.

V2

FEX 101 upgrades to V2.

BRKDCT-2458 75

Page 76: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus 5k Scenario: Dual-homed FEX w/ VPCSoftware Upgrade

vPC

Primary Secondary

Po10

Po1

Po20

Po2

1 42 3

FE

X 1

01

FE

X 1

02

• Manually shut down IF4 on N5k2

V1 V1

V1 V1

V2

IF Down

IF Up, No Forwarding

FEX 102 goes offline.

FEX 102 HIFs go down.

FEX 102 starts pairing process with N5k1.

V2

FEX 102 upgrades to V2.

V2

BRKDCT-2458 76

Page 77: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus 5k Scenario: Dual-homed FEX w/ VPCSoftware Upgrade

vPC

Primary Secondary

Po10

Po1

Po20

Po2

1 42 3

FE

X 1

01

FE

X 1

02

V1 V1

V1 V1

V2

IF Down

IF Up, No Forwarding

V2 V2

• Enter GIR Mode on N5k2

IF 3 & 4 Still Admin Down

• Upgrade N5k2

• Exit GIR on N5k2

V2

• Manual Up of IF 3 & 4

Environment upgrade completed with minimal traffic disruption.

BRKDCT-2458 77

Page 78: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Nexus Graceful Insertion and Removal ModeNXOS Software Support

Platform Min. Version

Nexus 5600/6000 7.1

Nexus 7000 7.2

Nexus 9000 (Stand-alone) 7.0

Nexus 3000 No Support

BRKDCT-2458 78

Page 79: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

ACI Operational Practices

Page 80: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Application Centric Infrastructure (ACI)

APPLICATION CENTRIC

POLICY CONTROLLERNEXUS 9500 AND 9300

ACI

Application Network Profile

Three Tier Application

Web App DB

Nexus 9k App Centric Policy APIC

Rapid Deployment of Applications onto

Networks with Scale, Security and Full Visibility

BRKDCT-2458 80

Page 81: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

EPG “Users”EPG “Files”

Leaf Nodes

AVS

EPG “Internet” Service Producers

Service Consumers

ACI Architecture

Spine Nodes

Key features:

Software Defined & API driven DC fabric

Elastic & highly available Fabric based on CLOS architecture

Next generation features for security design, containment, for Operations81

Page 82: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco ACI Fabric

Fabric View Controller Connectivity

BRKDCT-2458 82

Page 83: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Health Score

Aggregation of system-wide health, including pod health

scores, tenant health scores, system fault counts domain

and type and the APIC cluster health state.

Aggregated View

Fabric Topology View

BRKDCT-2458 83

Page 84: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicBRKDCT-2458 84

Troubleshoot a flow Use ACI inbuilt Visibility engine

Faults

Drops

Eligible Path

Page 85: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Troubleshoot a flow Use ACI inbuilt Visibility engine

Fabric Security Policies

Real Time Traffic

Capture

BRKDCT-2458 85

Page 86: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Upgrade #1Download the release on the APIC

BRKDCT-2458 86

Page 87: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Upgrade #2Upgrade APIC

BRKDCT-2458 87

Page 88: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Upgrade #3Create Groups

BRKDCT-2458 88

Page 89: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Upgrade #4Upgrade the Maintenance Groups

BRKDCT-2458 89

Page 90: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco PublicBRKDCT-2458 90

Capacity DashboardView the capacity of Data center Fabric

Page 91: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Optimizer tool DC forecasting

Simulator to verify the addition of :

• 10 tenants

• 200EPGs

• 500 contracts per EPG

On the fabric

Over capacity determined, you can tweak the attribute to check the limit of your capacity.

Create additional Node and it scale in ACI (Simulator)

BRKDCT-2458 91

Page 92: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Monitor Manage Troubleshoot

• Faults

• Events

• Health Score

• Atomic Counter

• Contract deny logs

• Statistics

• Capacity Dashboard

• Image Management

• Config Export / Import

• Fabric Inventory

• Show Usage

• Configuration Rollback

• Audit Logs

• iPing

• iTraceroute

• Endpoint Tracker

• ERSPAN

• Traffic Map

• On Demand Counter View

• CLI option

Cisco ACI Deployment Lifecycle

ReactiveProactive Preemptive

Recommended Live sessions for ACI :

BRKACI-2602, LTRACI-3010 92BRKDCT-2458

Page 93: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

NX-OS Operational & Change Management Best Practice

Page 94: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Windows – Golden Rules

• Change Review Board

• Schedule when environment will be least impacted.

• Software Staging

• Verify out of band.

• Test! After and before.

BRKDCT-2458 94

Page 95: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Windows Methodology Overview

Plan

• Imp.

• Back-out

• Test

Plan/Testing

(if applicable)

Peer Review Approve?Bus. Owner

Review

Approve?

BRKDCT-2458 95

Page 96: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Maintenance Windows

Implement

Success?

Stage

OoB

Etc.Test Plan

Back-out

Notify

&

Document

Schedule

Change

Window

&

App Owners

BRKDCT-2458 96

Page 97: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Traditional vPC Environment Change Change Best Practice & Window

vPC

Primary Secondary

Core Isolation

1. Graceful L3 Protocol Isolation

2. Layer 2 Isolation

• VPC

3. Interface Isolation

Using GIR Mode Steps 1-3 could be achieved prescriptively.

Access Isolation

1. Layer 2 Isolation

• VPC

2. Interface Isolation

1. Fex-fabric (include/exclude)

2. Dual-attached FEX Procedure * Recommended

Using GIR Mode Steps 1-2 could be achieved prescriptively.

NOTE: Maintenance mode consideration should be based on Fex-fabric connectivity.

Fex

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

BRKDCT-2458 97

Page 98: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

L3 Environment Change Best Practice & Window

Core Isolation

1. Graceful L3 Protocol Isolation

2. Interface Isolation

Using GIR Mode Steps 1-2 could be achieved prescriptively.

Access Isolation

1. L3 Protocol isolation

2. Layer 2 Isolation

• vPC

3. Interface Isolation

1. Fex-fabric (include/exclude)

2. Dual-attached FEX Procedure * Recommended

Using GIR Mode, prescriptive isolation is possible.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

Layer 3

BRKDCT-2458 98

Page 99: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

FabricPath Environment Change Best Practice & Window

Spine Isolation

1. Use FabricPath IS-IS Overload Bit

Using GIR Mode with isolate configuration, Step 1 could be achieved prescriptively.

Leaf Isolation

1. Use FabricPath IS-IS Overload Bit

2. Shutdown the VPC+ domain.

Using GIR Mode with manual profile, step 1 could be achieved prescriptively.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

FabricPath

BRKDCT-2458 99

Page 100: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

VxLAN EVPN Environment Change Best Practice & Window Spine Isolation

1. L3 Protocol isolation

• If iBGP EVPN, consider IGP isolation

• If eBGP EVPN, consider BGP isolation

2. Interface Isolation

Using GIR Mode Steps 1-2 could be achieved prescriptively.

Leaf Isolation

1. L3 Protocol isolation

• If iBGP EVPN, consider IGP isolation

• If eBGP EVPN, consider BGP isolation

2. Layer 2 Isolation

• vPC

3. Interface Isolation

1. Fex-fabric (include/exclude)

2. Dual-attached FEX Procedure * Recommended

Using GIR Mode, prescriptive isolation is possible.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

VTEPs

VxLAN

iBGP RR

BRKDCT-2458 100

Page 101: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

NX-OS 6.x -> 7.xUse Case

Page 102: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

7k Upgrade

• Prerequisites

• Code Staging

• VPC Best Practices

• Manual Isolation of Secondary

• Protocol Isolation• Max-metric LSA, etc. -> No service impact

(0-20ms)

• VPC Isolation• Down vPCs-> No service impact (0-20ms)

• Down Peer Link-> No service impact

• Reload Upgrade->No service impact

• Peer link is brought UP-No Service impact

• South links UP – No Service impact

• North protocol Max-metric LSA removal – UP –No Service impact

NX-OS 6.x -> 7.x Use Case - Secondary7k Single Supervisor

Manual Effort

7k

2k

5k

• Peer Switch

• Peer Gateway

• Auto-recovery

• L3 Link between vPC pairs

• BFD

• Routing Protocol Convergence Tuning

BRKDCT-2458 102

Page 103: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

7k Upgrade

• Manual Isolation of Secondary

• Protocol Isolation• Max-metric LSA, etc. -> No service impact (0-20ms)

• VPC Isolation• Down vPCs-> No service impact (0-20ms)

• VPC peer priority changes The secondary should have a lower priority to become the primary incase of flapping.

• Down Peer Link-> No service impact

• Peer link & PKA is brought down & Reload initiated for Upgrade->No service impact to 0-50ms impact in traffic based on traffic pattern (this switch comes as secondary)

• Peer link is brought UP-> No Service impact

• South links UP ->No Service impact

• North protocol UP ->No Service impact

Note: The System did not have firewall or LB connected directly to it.

NX-OS 6.x -> 7.x Use Case Primary7k Single Supervisor

Manual Effort

7k

2k

5k

Prerequisites

Code Staging

VPC Best

Practices

BRKDCT-2458 103

Page 104: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Summary

Page 105: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

What to use? GIR Mode? Patching? ISSU? All of them?

Critical Bug Fix &

PSIRT

Hardware

Upgrade

New Features Risk

Adverse

ISSU ✓ X ✓ ✓

GIR + Cold Boot ✓ X ✓ ✓

GIR + Disruptive

Installer✓ X ✓ ✓

SMU Restart ✓ X X ✓

GIR + SMU ISSU ✓ X X ✓

GIR X ✓ X X

Putting It All Together

OptionSituation

105BRKDCT-2458

Page 106: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Summary

• Verify environment conforms to data center networking best practices.

• Follow the your documented change management process.

• Isolate nodes during maintenance to minimize disruption.

Use GIR Mode where possible to ease isolation configuration.

BRKDCT-2458 106

Page 107: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Complete Your Online Session Evaluation

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.

• Complete your session surveys through the Cisco Live mobile app or from the Session Catalog on CiscoLive.com/us.

BRKDCT-2458 107

Page 108: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Continue Your Education

• Demos in the Cisco campus

• Walk-in Self-Paced Labs

• Lunch & Learn

• Meet the Engineer 1:1 meetings

• Related sessions

BRKDCT-2458 108

Page 109: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Please join us for the Service Provider Innovation Talk featuring:

Yvette Kanouff | Senior Vice President and General Manager, SP Business

Joe Cozzolino | Senior Vice President, Cisco Services

Thursday, July 14th, 2016

11:30 am - 12:30pm, In the Oceanside A room

What to expect from this innovation talk

• Insights on market trends and forecasts

• Preview of key technologies and capabilities

• Innovative demonstrations of the latest and greatest products

• Better understanding of how Cisco can help you succeed

Register to attend the session live now or

watch the broadcast on cisco.com

Page 110: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

Thank you

Page 111: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,
Page 112: Nexus Operations and Maintenance Best Practicesd2zmdbbm9feqrf.cloudfront.net/2016/usa/pdf/BRKDCT-2458.pdf · Nexus Operations and Maintenance Best Practices Arvind Durai, Director,

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Data Center / Virtualization Cisco Education OfferingsCourse Description Cisco Certification

Introducing Cisco Data Center Networking (DCICN);

Introducing Cisco Data Center Technologies (DCICT)

Learn basic data center technologies and skills to build a

data center infrastructure.

CCNA® Data Center

Implementing Cisco Data Center Unified Fabric (DCUFI);

Implementing Cisco Data Center Unified Computing (DCUCI)

Designing Cisco Data Center Unified Computing (DCUDC)

Designing Cisco Data Center Unified Fabric (DCUFD)

Troubleshooting Cisco Data Center Unified Computing

(DCUCT)

Troubleshooting Cisco Data Center Unified Fabric (DCUFT)

Obtain professional level skills to design, configure,

implement, troubleshoot data center network infrastructure.

CCNP® Data Center

Product Training Portfolio: DCNMM, DCAC9K, DCINX9K,

DCMDS, DCUCS, DCNX1K, DCNX5K, DCNX7K

Gain hands-on skills using Cisco solutions to configure,

deploy, manage and troubleshoot unified computing, policy-

driven and virtualized data center network infrastructure.

Designing the FlexPod® Solution (FPDESIGN);

Implementing and Administering the FlexPod® Solution

(FPIMPADM)

Learn how to design, implement and administer FlexPod

solutions

Cisco and NetApp Certified

FlexPod® Specialist

For more details, please visit: http://learningnetwork.cisco.com

Questions? Visit the Learning@Cisco Booth or contact [email protected]

BRKDCT-2458 112