Nexus Operations and Maintenance Best Practices
Arvind Durai, Director, Solutions Integration
Nick Garner, Solutions Integration Architect
BRKDCT-2458
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Arvind Durai
– 17 years with Cisco Advanced Services
– Has worked with more than 100 customers in enterprise architecture, technology designs and operational simplification
– Active Cisco Live presenter for 9 years
– Co-authored three Cisco Press books: Cisco Firewall Services Module, Virtual Routing in the Cloud and TCL Scripting for Cisco IOS
– CCIE R/S and Security #7016
Nick Garner
– 9 years with Cisco Advanced Services, Enterprise Campus and Datacenter
– Prior to Cisco, 10 years in financial and military verticals
– Architecture, design and security support for the largest ACI environment in the world
– Cisco Press Technical Editor
– CCIE R/S and Security #17871
Contributors: Satish Kondalam, Junmei Zhang, Tarunesh Ahuja,
and many others from the Nexus TME team.
Agenda
• vPC Refresher
• Operational Best Practices: Software
• Operational Best Practices: Hardware
• Node Isolation
• NX-OS Graceful Insertion and Removal
• ACI Operational Best Practices
• Change Window Best Practices
Course Objective & Goal
• To help Data Center operations and engineering staff understand the operational best practices when maintaining a Cisco Nexus data center network deployment.
• Attendees should leave the session with a firm understanding of
• Operational Best Practices
• Nexus Graceful Insertion and Removal
• Change Window Best Practices
High-Level Technology Review
Data Center Technology Evolution
[Figure: timeline of data center technologies. Before 2008: STP. 2009: vPC. 2010: FEX with vPC; FabricPath with vPC+. 2013-2014: standalone fabric with VXLAN. 2014-2015: ACI. MPLS, OTV, and LISP are shown alongside several stages.]
vPC Feature Overview: vPC Terminology
Layer 3 Cloud
vPC Member Port
vPC
Orphan Device
Orphan Port
vPC Peer
CFS
vPC Domain
Peer Link
vPC PeerKeepalive Link
S1
S3
S2
Failure Scenario
• If both peers are active, then Secondary vPC peer will disable all vPCs to avoid Dual-Active.
• Data will automatically forward down remaining active port channel ports.
• Loss of in-flight packets will depend on deployment of vPC best practice.
For Your Reference
Layer 3 pt-pt link
vPC best practice
• vPC Domain IDs
Use a unique vPC domain ID within a contiguous L2 domain to avoid MAC overlap.
• vPC Peer Link
Should be point-to-point connection & dedicated links.
• vPC Peer Keepalive Link
Dedicate a control plane in a dual-supervisor environment. Use a management switch.
• vPC peer-gateway
Acts as active gateway for frames addressed to peer switch. Avoid Peer Link forwarding.
• Use vPC peer-switch
Optimizes BPDU processing, single logical L2 entity
• Distribute port-channel member interfaces across line cards within the same chassis.
• Create a map for oversubscription aligned to current and future demand.
Deployment practice – 20:1 at access and 2:1 at Core.
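The oversubscription guidance above can be checked with simple arithmetic. The sketch below is illustrative only: the port counts and speeds are hypothetical, and the function name is not from the session. It divides host-facing bandwidth by uplink bandwidth and compares the result with the 20:1 access target quoted on the slide.

```python
def oversubscription_ratio(downlink_gbps, uplink_gbps):
    """Return total downlink bandwidth divided by total uplink bandwidth."""
    if uplink_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink_gbps / uplink_gbps


# Hypothetical access switch: 48 x 10G host ports and 2 x 40G uplinks.
access_ratio = oversubscription_ratio(48 * 10, 2 * 40)

# 20:1 is the access-layer figure quoted on the slide.
within_access_target = access_ratio <= 20.0
```

Running the same calculation against projected port counts is one way to build the oversubscription map for future demand.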
vPC general deployment best practice
[Figure: impact versus effort quadrant (HIGH IMPACT / HARD TO IMPLEMENT, LOW IMPACT / HARD TO IMPLEMENT, HIGH IMPACT / EASY TO IMPLEMENT, LOW IMPACT / EASY TO IMPLEMENT) with the callout "QUICK WINS!"]
vPC Configuration Best Practices: vPC Auto-recovery
Auto-recovery addresses two cases of single-switch behavior:
• The peer-link fails and, after a while, the primary switch (or keepalive link) fails.
• Both vPC peers are reloaded and only one comes back up.
Sequence shown (S1 is vPC primary, S2 is vPC secondary, S3 is the downstream device):
1. vPC peer-link down: S2 (secondary) shuts all its vPC member ports.
2. S1 down: vPC peer-keepalive link down; S2 receives no keepalives.
3. After 3 keepalive timeouts, S2 changes role to operational primary and brings up its vPCs.
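The auto-recovery sequence can be read as a small decision function on the secondary peer. This is a simplified model for illustration, not NX-OS logic: the function name and return strings are hypothetical, and only the threshold of three keepalive timeouts comes from the slide.

```python
# From the slide: "After 3 keepalive timeouts, S2 changes role".
KEEPALIVE_TIMEOUTS_BEFORE_TAKEOVER = 3


def secondary_action(peer_link_up, missed_keepalives):
    """Return what the vPC secondary does in this simplified model."""
    if peer_link_up:
        # Normal operation: both peers forward on their vPC member ports.
        return "forward"
    if missed_keepalives >= KEEPALIVE_TIMEOUTS_BEFORE_TAKEOVER:
        # The primary is presumed dead: take over and bring the vPCs back up.
        return "become-operational-primary"
    # Peer-link is down but the primary is still alive: avoid dual-active.
    return "suspend-vpcs"
```

For example, `secondary_action(False, 0)` models step 1 of the sequence and `secondary_action(False, 3)` models step 3.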
vPC Self-isolation: Automatically Triggered Isolation
Example Presented: All Line Cards Fail
Current impact:
• When this failure happens on the primary, the peer-link is brought down.
• This causes the secondary to bring down all vPC legs.
• Traffic is completely blocked.
Self-isolation behavior (7.2):
Isolated switch, when this failure happens:
• Physically brings down the peer-link
• Physically brings down all vPC legs
• Sends self-isolation through the peer-keepalive
Peer switch:
• Receives self-isolation from the peer through the peer-keepalive
• Changes role to primary
• Brings up all down vPC legs
BU Testing Results: Sub-second Recovery (N>S) (S>N) (E>W)
[Figure: primary and secondary vPC peers with PKA and VLANs 1-100, comparing "Current" with "Self-isolation".]
NOTE: Available in NX-OS 7.2, 5k/6k/7k
vPC Self-isolation: Automatically Triggered Isolation
No up VLANs on peer-link: This case addresses the issue that no VLANs are up on the peer-link while the port channel is physically up (i.e., the peer-link is up with no VLANs). Examples: VLAN misconfiguration, hardware programming failure.
Current behavior:
• The up VLANs on vPC legs are the result of the configured VLANs intersected with the up VLANs on the peer-link (i.e., a three-way intersection).
• All VLANs are down in the vPC domain.
• This completely blocks the traffic.
Self-isolation behavior:
Isolated switch:
• Physically brings down the peer-link
• Physically brings down all vPC legs
• Sends “self-isolation” through the peer-keepalive
Peer switch:
• Receives self-isolation from the peer through the peer-keepalive
• Changes role to primary
• Brings up all down vPC legs
[Figure: vPC peers with PKA; "Current" shows no VLANs on the peer-link and VLANs 1-100 on the vPC legs, "Self-isolation" shows VLANs 1-100.]
NOTE: Available in NX-OS 7.2, 5k/6k/7k
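The three-way intersection explains why an empty peer-link VLAN list blocks everything. The sketch below assumes the three inputs are the VLANs configured on the local vPC leg, the VLANs configured on the peer's leg, and the VLANs up on the peer-link. That breakdown and the function name are assumptions for illustration.

```python
def up_vlans_on_leg(local_leg_vlans, peer_leg_vlans, peer_link_up_vlans):
    """VLANs that come up on a vPC leg: the intersection of all three sets."""
    return set(local_leg_vlans) & set(peer_leg_vlans) & set(peer_link_up_vlans)


vlans_1_to_100 = set(range(1, 101))

# Healthy case: VLANs 1-100 everywhere, as in the slide diagram.
healthy = up_vlans_on_leg(vlans_1_to_100, vlans_1_to_100, vlans_1_to_100)

# Failure case: the peer-link is physically up but carries no VLANs.
broken = up_vlans_on_leg(vlans_1_to_100, vlans_1_to_100, set())
```

In the failure case the result is the empty set, which matches the "all VLANs down in the vPC domain" behavior that self-isolation is meant to avoid.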
vPC Configuration Best Practices: Object Tracking
• Object tracking is triggered when the tracked object goes down.
• vPC object tracking tracks both the peer-link and the uplinks in a Boolean OR list.
• Suspends the vPCs on the impaired device.
• Traffic is forwarded over the remaining vPC peer.
! Track the vPC peer link
track 1 interface port-channel11 line-protocol
! Track the uplinks
track 2 interface Ethernet1/1 line-protocol
track 3 interface Ethernet1/2 line-protocol
! Combine all tracked objects into one.
! “OR” means if ALL objects are down, this object will go down
track 10 list boolean OR
  object 1
  object 2
  object 3
! If object 10 goes down on the primary vPC peer,
! system will switch over to other vPC peer and disable all local vPCs
vpc domain 1
  track 10
[Figure labels: S1, S2, S3, S4, S5]
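The Boolean OR semantics are easy to get backwards, so a short model may help. The sketch below is illustrative: the function name is hypothetical and the dictionary keys mirror track objects 1 to 3 from the configuration. The list object stays up while any member is up and goes down only when all members are down.

```python
def track_list_boolean_or(member_states):
    """A Boolean OR track list is up if at least one member object is up."""
    return any(member_states.values())


# Object 1 is the peer-link, objects 2 and 3 are the uplinks.
peer_link_lost = {1: False, 2: True, 3: True}
fully_impaired = {1: False, 2: False, 3: False}

list_up_after_peer_link_loss = track_list_boolean_or(peer_link_lost)
list_up_when_fully_impaired = track_list_boolean_or(fully_impaired)
```

Only the second case takes track object 10 down, which is the condition under which the vPCs on the impaired device are suspended.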
Stateful Switchover Mode: SSO-Aware and SSO-Compliant Applications
[Figure: the Active Supervisor and the Standby Hot Supervisor each show the same stack, with a Checkpointing Facility and a Redundancy Facility on each.]
SSO-Aware Applications
Forwarding Information Base
IEEE 802.1x
PAgP / LACP
…and more
SSO-Compliant Applications
Routing Protocols
NetFlow
Cisco Discovery Protocol
…and more
Routing Protocol Redundancy With NSF
EIGRP RIB
Prefix Next Hop
10.0.0.0 10.1.1.1
10.1.0.0 10.1.1.1
10.20.0.0 10.1.1.1
OSPF RIB
Prefix Next Hop
192.168.0 192.168.0.1
192.168.55.0 192.168.55.1
192.168.32.0 192.168.32.1
ARP Table
IP MAC
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
10.20.1.1 aa25cc:ddeee8
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Active Supervisor Engine Slot 1
EIGRP RIB
Prefix Next Hop
- -
- -
- -
OSPF RIB
Prefix Next Hop
- -
- -
- -
ARP Table
IP MAC
- -
- -
- -
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Standby Supervisor Engine Slot 2
SSO
Redundancy Facility
Checkpoint Facility
Routing Protocol Redundancy With NSF
EIGRP RIB
Prefix Next Hop
- -
- -
- -
OSPF RIB
Prefix Next Hop
- -
- -
- -
ARP Table
IP MAC
- -
- -
- -
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Standby Supervisor Engine Slot 2
GR/NSF Signaling per protocol
Synchronization per protocol
EIGRP RIB
Prefix Next Hop
10.0.0.0 10.1.1.1
10.1.0.0 10.1.1.1
10.20.0.0 10.1.1.1
OSPF RIB
Prefix Next Hop
192.168.0 192.168.0.1
192.168.55.0 192.168.55.1
192.168.32.0 192.168.32.1
ARP Table
IP MAC
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
10.20.1.1 aa25cc:ddeee8
Routing Protocol Redundancy With NSR
BGP RIB
Prefix Next Hop
10.0.0.0 10.1.1.1
10.1.0.0 10.1.1.1
10.20.0.0 10.1.1.1
OSPF RIB
Prefix Next Hop
192.168.0 192.168.0.1
192.168.55.0 192.168.55.1
192.168.32.0 192.168.32.1
ARP Table
IP MAC
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
10.20.1.1 aa25cc:ddeee8
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Active Supervisor Engine Slot 1
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Standby Supervisor Engine Slot 2
SSO
Redundancy Facility
Checkpoint Facility
BGP RIB
Prefix Next Hop
10.0.0.0 10.1.1.1
10.1.0.0 10.1.1.1
10.20.0.0 10.1.1.1
OSPF RIB
Prefix Next Hop
192.168.0 192.168.0.1
192.168.55.0 192.168.55.1
192.168.32.0 192.168.32.1
ARP Table
IP MAC
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
10.20.1.1 aa25cc:ddeee8
Routing Protocol Redundancy With NSR
FIB Table
Prefix Next HOP
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
192.168.0.0 aa25cc:ddeee8
Standby Supervisor Engine Slot 2
BGP RIB
Prefix Next Hop
10.0.0.0 10.1.1.1
10.1.0.0 10.1.1.1
10.20.0.0 10.1.1.1
OSPF RIB
Prefix Next Hop
192.168.0 192.168.0.1
192.168.55.0 192.168.55.1
192.168.32.0 192.168.32.1
ARP Table
IP MAC
10.1.1.1 aabbcc:ddee32
10.1.1.2 adbb32:d34e43
10.20.1.1 aa25cc:ddeee8
No additional signaling required to maintain topology
Comparing NSF and NSR
Metric | Non-Stop Forwarding / Graceful Restart | Non-Stop Routing / Stateful Restart
Configuration required | Yes, per protocol instance on the NSF-capable device. No configuration required on NSF-aware devices for Interior Gateway Protocols. BGP requires GR configuration on both the NSF-capable and the NSF helper device. | Yes, per protocol instance. BGP also requires per-peer configuration.
Which routing protocols are supported | EIGRP, OSPFv2, OSPFv3, ISIS, BGP, LDP | ISIS, OSPFv2
Synchronizes routing protocol state and RIB information across redundant control planes | No | Yes
Requires specific feature support on peer routers | Yes | No
Standalone Chassis Redundant Core
• Redundant topologies with equal-cost paths provide sub-second convergence.
• NSF/SSO provides superior availability in environments with non-redundant paths.
Failure or change at the Core
• Enable BFD for all OSPF neighbor links
• Adjust OSPF SPF throttling timers with:
  timers throttle spf
  timers throttle lsa
  timers lsa arrival
[Chart: seconds of lost voice* (scale 0 to 6) for link failure and node failure, with the labels NSF/SSO, SPF Throttle, and OSPF Convergence. Callout: RP convergence is dependent on IGP and tuning.]
* Route scale dependent.
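To see why SPF throttle tuning affects convergence, it helps to model the back-off. The sketch below is a simplified model under stated assumptions, not the NX-OS implementation: the first SPF run waits for the start interval, each later run during continued churn waits for a hold interval that doubles, and the wait is capped at a maximum. The function name and the millisecond values are hypothetical.

```python
def spf_wait_sequence(start_ms, hold_ms, max_ms, events):
    """Return the wait before each consecutive SPF run in this simple model."""
    waits = []
    hold = hold_ms
    for index in range(events):
        if index == 0:
            # First topology change: wait only the initial start interval.
            waits.append(start_ms)
        else:
            # Continued churn: wait the current hold interval, capped at max.
            waits.append(min(hold, max_ms))
            hold *= 2
    return waits


# Hypothetical timer values, in milliseconds.
sequence = spf_wait_sequence(start_ms=200, hold_ms=1000, max_ms=5000, events=5)
```

With these example values the sequence is 200, 1000, 2000, 4000, 5000 ms, so a lower start interval shortens reaction to the first failure while the hold and maximum protect the CPU during instability.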
Traditional vPC Environment: Change Best Practice & Window
vPC
Primary Secondary
Fex
Core Node / Link Failure or Isolation
Access Node / Link Failure or Isolation
RP BP
vPC Best Practice
Hardware HA Best Practice
Layer 3 Spine/Leaf Environment: Change Best Practice & Window
Leaf Node / Link Failure or Isolation
VPC Best Practice
Spine Node / Link Failure or Isolation
RP BP
Hardware HA Best Practice
n1 n2
Layer 2 Spine/Leaf Environment: Change Best Practice & Window
Leaf Node Failure / Link Failure
vPC Best Practice
RP BP
Layer 3
Spine Node Failure / Link Failure
Hardware HA Best Practice
n1 n2
Operational Best Practices: Software
NX-OS High Availability: Process Modularity
• Independent, memory-protected, restartable processes
• Service restartability
  • Stateful restart with Persistent Storage Service (PSS)
    • Checkpoints state to PSS
    • Recovers state from PSS upon restart
  • Stateful restart with graceful restart
    • Recovers state based on information from other services and/or the network
    • Mainly routing protocols
  • Stateless restart
    • Fresh start, no trace of former instantiation
[Figure: NX-OS software architecture. Blocks shown: Layer-2 Protocols, Layer-3 Protocols, Storage Protocols, Other Services, Future Services Possibilities; Interface Management; Chassis Management; Sysmgr, PSS & MTS; Management (SNMP, XML, CLI); Protocol Stack (IPv4 / IPv6 / L2); Chip/Driver Infrastructure; Kernel. Services shown: VLAN mgr, STP, OSPF, BGP, EIGRP, GLBP, HSRP, VRRP, VSANs, Zoning, FCIP, FSPF, IVR, UDLD, CDP, 802.1X, IGMP snp, LACP, PIM, CTS, SNMP.]
Stateful Fault Recovery
If a fault occurs in a process…
• HA manager determines best recovery action (restart process, switchover to redundant supervisor)
• Process restarts with no impact on data plane
• State checkpointing (PSS) allows instant, stateful process recovery
• Software utilizes Graceful Restart where appropriate
[Figure labels: processes (BGP, OSPF, PIM, TCP/UDP, IPv6, STP, HSRP, LACP, etc.), HA Manager, "Restart process!", Graceful restart, Routing updates, Table Update, Linux Kernel, Software RIB, Hardware FIB, Nexus Data Plane.]
NX-OS High Availability: Supervisor Switchover
• Stateful Switchover (SSO)
• Active and backup supervisors synchronized at all times
• Routing protocols: PSS stateful restart, NSF graceful restart failover
• Other components: PSS stateful restart
• Triggers:
  • HA policy initiated, e.g. 3 component crashes lead to SSO
  • User initiated: system switchover
  • ISSU-initiated SSO
[Figure labels: Active Sup, Backup Sup, LC - NSF]
Software Upgrade: ISSU
An In-Service Software Upgrade (ISSU) provides the capability to perform transparent software upgrades on platforms with redundant supervisors. New features and fixes can therefore be installed without downtime or packet loss.
Recommended for highly critical environments where no packet loss is tolerated.
Upgrade Procedure:
1. Copy the NX-OS images to flash
2. Verify MD5: switch# show file <image> md5sum
3. Use the install all command to start the upgrade
Only 1 command required:
n7000# install all kickstart bootflash:n7000-s1-kickstart.6.2.12.bin
  system bootflash:n7000-s1-dk9.6.2.12.bin
Note: The CLI command wrapped due to the length
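The MD5 verification step can also be done on the staging server before the image is copied to the switch. The sketch below is one possible off-box check, assuming the published checksum has been obtained from the software download page. The function names are hypothetical and only the Python standard library is used.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path, published_md5):
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return file_md5(path) == published_md5.strip().lower()
```

The digest printed by `show file <image> md5sum` on the switch should match the same published value once the copy completes.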
NX-OS High Availability: ISSU
• Dual-supervisor failover only
• ISSU is user initiated:
  • Through CLI:
    install all kickstart <kickstart image> system <system image> cmp <cmp image>
• Components upgraded:
  • Supervisor: BIOS, Kickstart and System image
  • Linecard: BIOS and Linecard image
• System-wide upgrade, not VDC specific
• Single-supervisor ISSU is not possible on the 7k. Service disruption might occur.
Upgrade sequence:
1. Upgrade BIOS on Active, Standby Sup and Linecards
2. Bring up Standby Sup with new image
3. Switchover (Standby takes over as Active)
4. Bring up old Active Sup with new image
5. Perform HITLESS LC Upgrade (one at a time)
6. Upgrade CMP
7. Upgrade Done
In-Service Software Upgrade
Nexus# install all kickstart bootdisk:6.2-kickstart system …
[Figure: ISSU animation from Release 5.2(9a) to Release 6.2(8b). Active and standby supervisors each run processes (OSPF, BGP, PIM, etc.) on the Linux kernel under the HA Manager, above the Nexus data plane and I/O module images. Steps shown: upgrade and reboot, initiate stateful failover, upgrade and reboot, upgrade I/O modules.]
Best Practice: vPCs should be distributed.
In-Service Software Upgrade: Check Upgrade Path in Release Notes
Current Release | Release Train | Releases That Support ISSU to the Current Release | Releases That Support ISSD from the Current Release
NX-OS Release 6.2(12) | 6.2 | 6.2(8a), 6.2(8b), 6.2(10) | No support.
NX-OS Release 6.2(10) | 6.2 | 6.2(8), 6.2(8a), 6.2(8b) | No support.
NX-OS Release 6.2(8b) | 6.2 | 6.2(8a) | No support.
 | 6.1 | 6.1(5a) | No support.
 | 5.2 | 5.2(9a) | No support.
NX-OS Release 6.2(8a) | 6.2 | 6.2(2), 6.2(2a), 6.2(6), 6.2(6a), 6.2(6b), 6.2(8) | 6.2(2), 6.2(2a), 6.2(6), 6.2(6a), 6.2(6b), 6.2(8)
 | 6.1 | 6.1(3), 6.1(4), 6.1(4a), 6.1(5) | No support.
 | 5.2 | 5.2(7), 5.2(9) | No support.
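The upgrade-path table lends itself to a lookup, including multi-hop paths. The sketch below encodes only the ISSU rows shown on this slide, so it is a snapshot for illustration and no substitute for the current release notes. The function and variable names are hypothetical. It uses a breadth-first search to find the shortest chain of supported ISSU hops.

```python
from collections import deque

# Target release -> releases that support ISSU to it (rows from the slide only).
ISSU_SOURCES = {
    "6.2(12)": {"6.2(8a)", "6.2(8b)", "6.2(10)"},
    "6.2(10)": {"6.2(8)", "6.2(8a)", "6.2(8b)"},
    "6.2(8b)": {"6.2(8a)", "6.1(5a)", "5.2(9a)"},
    "6.2(8a)": {"6.2(2)", "6.2(2a)", "6.2(6)", "6.2(6a)", "6.2(6b)", "6.2(8)",
                "6.1(3)", "6.1(4)", "6.1(4a)", "6.1(5)", "5.2(7)", "5.2(9)"},
}


def issu_path(current, target):
    """Return the shortest list of releases from current to target, or None."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for release, sources in ISSU_SOURCES.items():
            if path[-1] in sources and release not in seen:
                seen.add(release)
                queue.append(path + [release])
    return None


two_hop = issu_path("5.2(9a)", "6.2(12)")
```

For the 5.2(9a) example used in the session, the search returns the two-hop path through 6.2(8b), and it returns `None` when no supported ISSU chain exists in the table.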
Software Upgrade: Disruptive Upgrade with Installer
NOTE: Isolate the node first! Disruptive!
NOTE: Must do ISSU or this for Nexus 5k.
Start Installer:
switch# install all kickstart n7000-s1-kickstart.6.0.1.bin system n7000-s1-dk9.6.0.1.bin
Compatibility Check:
Compatibility check is done:
Module  bootable  Impact      Install-type  Reason
------  --------  ----------  ------------  ------
     1       yes  disruptive         reset  Incompatible image
     3       yes  disruptive         reset  Incompatible image
     4       yes  disruptive         reset  Incompatible image
     5       yes  disruptive         reset  Incompatible image
     6       yes  disruptive         reset  Incompatible image
    10       yes  disruptive         reset  Incompatible image
Continue? (You are prompted to continue)
Switch will be reloaded for disruptive upgrade.
Do you want to continue with the installation (y/n)? [n] y
Software Upgrade: Cold Start
NOTE: Isolate the node first!
Verifying Bootflash (Kickstart & System images present, available free space):
n7000# dir
      49152  Mar 29 00:07:42 2005  lost+found/
   80850712  May 23 18:14:46 2005  n7000-s1-dk9.4.0.1.bin
    9791207  May 23 21:32:52 2005  n7000-s1-epld.4.0.1.img
   22593024  May 23 18:13:11 2005  n7000-s1-kickstart.4.0.1.bin
       4096  Jan 01 00:14:52 2005  routing-sw/
Usage for bootflash://
  553099264 bytes used
 1365549056 bytes free
 1918648320 bytes total
Configuring Boot Variables (defaults to bootflash:; configured for Sup1 & Sup2):
n7000(config)# boot kickstart bootflash:n7000-s1-kickstart.4.0.1.bin sup1 sup2
n7000(config)# boot system bootflash:n7000-s1-dk9.4.0.1.bin sup1 sup2
n7000# copy running-config startup-config
Verifying Boot Variables (Sup1 & Sup2 boot variables):
n7000# show boot
sup-1
kickstart variable = bootflash:/n7000-s1-kickstart.4.0.1.bin
system variable = bootflash:/n7000-s1-dk9.4.0.1.bin
sup-2
kickstart variable = bootflash:/n7000-s1-kickstart.4.0.1.bin
system variable = bootflash:/n7000-s1-dk9.4.0.1.bin
No module boot variable set
Manual Reload (you are prompted to continue):
n7000# reload
TAC: You’ve encountered defect CSCxy12345.
It’s operationally impacting and, I’m sorry to say,
there’s no workaround. You’ll need to upgrade.
You: Fine. Let’s just get it fixed.
Bill, start up a war room.
John, get our AS NCE on the phone.
Sally, schedule testers in two hours.
Where’s my $#@! coffee?
Sally: You know how Richard gets when we call him at 2 AM...
Defect Impact
Belay my last.
We have a SMU
for that.
What?
Gesundheit.
!
Software Patching in NX-OS
Overview
• Software Patching is Platform Independent
• Available on Nexus 9000 (6.1(2)I2)
• FCS NX-OS 7.2 (5/6/7k)
• Fully supported with ISSU
Benefits
• Reduce time to resolution in your network.
• SMUs in NX-OS build upon years of
experience in IOS XR.
• Simplify customer operations for defect
resolution and code qualification.
• Better utilize the software HA capabilities
of NX-OS.
• Provide a common cross-platform
experience (N9K/N7K/N6K/N5K).
Who’s familiar with Software Maintenance Updates (SMU)?
SMU Lifecycle – CLI
[Figure: an SMU is copied from the SMU repository to any Nexus chassis supporting patching; process and memory state are shown at each stage.]
Apply:
Switch# install add …
Switch# install activate …    (SMU applied)
Switch# install commit …      (SMU committed)
Remove:
Switch# install deactivate …
Switch# install commit …      (SMU committed)
Switch# install remove …      (SMU removed)
Verification:
show install active
show install committed
show install inactive
show install packages
show install pkg-info …
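The SMU lifecycle is essentially a small state machine driven by the `install` subcommands. The sketch below models the command order shown on the slide. The state names are hypothetical labels chosen for illustration, and the model is deliberately stricter than the real CLI, which may allow other orderings.

```python
# (current state, install subcommand) -> next state. State names are illustrative.
SMU_TRANSITIONS = {
    ("absent", "add"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "commit"): "committed",
    ("committed", "deactivate"): "deactivated",
    ("deactivated", "commit"): "inactive",
    ("inactive", "remove"): "absent",
}


def run_smu_commands(commands, state="absent"):
    """Apply install subcommands in order and return the final state."""
    for command in commands:
        key = (state, command)
        if key not in SMU_TRANSITIONS:
            raise ValueError(f"'install {command}' is not valid in state '{state}'")
        state = SMU_TRANSITIONS[key]
    return state


applied = run_smu_commands(["add", "activate", "commit"])
removed = run_smu_commands(["deactivate", "commit", "remove"], state=applied)
```

A runbook check built this way could refuse, for example, an `install remove` issued before the SMU has been deactivated and committed.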
Patching Highlights
SMU Types
• Restart: Restarts affected process
• Process restarted in all VDCs where running.
• ISSU SMU: Dual Sup -> ISSU
Single Sup -> Reload
Patching Highlights
• Patching is for operationally impacting bugs without a workaround.
• Cannot patch to next release.
• Patching is done in default/admin VDC and applies to all VDCs.
• Patching is not available per-VDC.
• ISSU will work with all, or a subset of patches applied.
• You don’t need to apply all patches.
• Some SMUs may only have a single fix, others may have multiple packaged.
Patching Highlights
• SMUs are TAC supported.
• SMUs are synched to standby supervisor.
• On Sup replacement, patch will be synchronized.
• SMUs are not for feature implementation. A SMU cannot change the configuration.
Operational Best Practices: Hardware Maintenance
Hardware Maintenance: Electronic Programmable Logic Device (EPLD) Upgrade
The Nexus platforms contain several EPLDs that provide hardware functionality in all modules.
5 things to know about the EPLD:
1. EPLD advantage: enhancements and fixes through software upgrade.
2. EPLD upgrades are a separate and independent process from ISSU.
3. EPLD upgrades are disruptive to traffic; however, they are performed module by module, so at any given time only one module is affected.
4. The upgrade is not always required. Always check the EPLD Release Notes page when a new NX-OS release is posted to www.cisco.com.
Hardware Maintenance: EPLD Upgrade Example
The following example upgrades the EPLD image for module 1. The EPLD image should be local when the upgrade is performed.
n7000# install module 1 epld bootflash:n7000-s1-epld.4.0.1.img
EPLD image file , built on Mon Mar 31 10:31:48 2008
EPLD                 Curr Ver   New Ver
-------------------------------------------------------
Power Manager        4.1        5.3
IO                   2.6        2.10
Forwarding Engine    1.4        1.6
WARNING: Upgrade process could take up to 30 minutes.
Module could be powered down and up.
Module 1 will be powered down now!!
Do you want to continue (y/n) ? [n] y
Module 1 EPLD upgrade is successful.
Callouts:
• The “install” command highlights the EPLD version differences.
• EPLD upgrades are intrusive and may take up to 30 minutes per module!
• The user is prompted to continue.
This procedure is typically not required during an NX-OS upgrade.
NX-OS >= 6.1: Parallel EPLD Upgrades!
Hardware Maintenance
Scenario: Line Card Hardware Upgrade or Replacement
• Power down the line card prior to removal.
Nexus# out-of-service module <module-number>
• Online Insertion and Removal (OIR) of like hardware is supported, and the configuration is maintained. It is hitless with vPC, provided there is sufficient bandwidth and port-channel distribution.
• Mixed line card deployment between vPC peers is not supported. However, the module check can be bypassed (not best practice; corner cases only, during a change window):
#conf t
#vpc domain <id>
#bypass module-check
NOTE: Evaluate the VDC interface assignments to verify which VDCs will experience a service impact.
Line card support matrix:
http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/nexus7000/sw/matrix/technical/reference/Module_Comparison_Matrix.pdf
Hardware Maintenance: Line Card Removal
Did you remember to put on your ESD strap?
1. Disconnect all of the cables attached to the front of the module to be removed.
2. Loosen the two captive screws.
3. Press the ejector release buttons on the top and bottom ends of the module to push out the ejector levers and to disconnect the module.
4. Simultaneously rotate the two ejector levers outward to unseat the module from the midplane connector.
5. With a hand on each ejector, pull the module part way out of its slot in the chassis.
6. Grasp the front edge of the module with your left hand and place your right hand under the lower side of the module to support its weight. Pull the module out of its slot.
Hardware Maintenance
Scenario: Supervisor Hardware Upgrade (Dual Supervisor System)
• Mixed supervisors (Sup1 and/or Sup2) can be used for upgrades in a vPC environment, but only during maintenance or migration; don’t run a mixed configuration in production.
• NOTE: When migrating from N7k Sup1 to Sup2 or Sup2E, there is a specific process for installing the configuration and copying the licenses.
• Pre-load the same NX-OS version prior to replacing the supervisor to avoid delays.
• If replacing the active supervisor, run “system switchover” to the other supervisor before running “out-of-service slot-number”.
Scenario: Supervisor Hardware Upgrade (Single Supervisor System)
• The new supervisor should have the same NX-OS version as the vPC peer before starting, to avoid a vPC Type-2 inconsistency.
• Isolate for graceful withdrawal from the vPC domain.
• Power down the chassis.
Hardware Maintenance
Scenario: Chassis Hardware Upgrade
• Gas up your fork lift.
• Bring the switch being replaced into Graceful Insertion and Removal mode, or manually isolate it, prior to power down.
Scenario: Fabric Module Hardware Upgrade
• Don’t oversubscribe the fabric when replacing fabric modules.
n7000# show hardware fabric-utilization
------------------------------------------------
Slot    Total Fabric    Utilization
        Bandwidth       Ingress %   Egress %
------------------------------------------------
1       138 Gbps        0.0         0.0
<…cut…>
Hardware Maintenance
Scenario: Power Supply Hardware Upgrade
• Online Insertion and Removal (OIR) is supported.
• Be mindful of power budget.
n7000# show environment power
<..cut..>
Power Usage Summary:
--------------------
Power Supply redundancy mode (configured)          Redundant
Power Supply redundancy mode (operational)         Redundant
Total Power Capacity (based on configured mode)    6000 W
Total Power of all Inputs (cumulative)             12000 W
Total Power Output (actual draw)                   1616 W
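Being mindful of the power budget comes down to comparing capacity in the configured redundancy mode with what the chassis needs. The sketch below uses the capacity and actual-draw figures from the output above. The 400 W module requirement is a hypothetical value, and the function names are illustrative. Note the assumption: a real switch typically budgets on allocated power per module, not instantaneous draw, so treat this as arithmetic only.

```python
def power_headroom_watts(capacity_w, draw_w):
    """Remaining capacity in the configured redundancy mode."""
    return capacity_w - draw_w


def can_add_module(capacity_w, draw_w, module_w):
    """True if the extra module fits within the remaining capacity."""
    return draw_w + module_w <= capacity_w


# Figures from the 'show environment power' output above.
CAPACITY_W = 6000
ACTUAL_DRAW_W = 1616

headroom = power_headroom_watts(CAPACITY_W, ACTUAL_DRAW_W)

# Hypothetical 400 W line card.
fits = can_add_module(CAPACITY_W, ACTUAL_DRAW_W, 400)
```

The 12000 W cumulative input figure is not used here because, in redundant mode, only the 6000 W capacity is available to budget against.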
Node Isolation
Unplanned Impact During Maintenance Window
DESIRE: Manipulate IGP to only advertise a default route as this device is the only path out of the network.
PLAN: Static default to NULL0, originate default in OSPF, northbound forwarding decisions based on BGP learned prefixes.
PROBLEM: OSPF converges before BGP, traffic black-holed.
SOLUTIONS: Isolate devices before making changes.
• Manipulate IGP/EGP, route around.
• Gracefully* shutdown IGP/EGP while preserving configuration.
Scenario
iBGP
OSPF
Waste of a good RIB.
Protocol Isolation in Nexus: OSPF
Option 1: Advertise as Stub Router, LSInfinity
max-metric router-lsa [ on-startup [ seconds | wait-for bgp tag ]]
Option 2: Shutdown OSPF (Process), Preserve Configuration
router ospf 1
  shutdown
NOTE: Currently, shutdown is not graceful.
Option 3: Shutdown OSPF (Interfaces)
interface e1/1
  ip ospf shutdown
Recommended
Protocol Isolation in Nexus: IS-IS
Option 1: Advertise with the LSP Database Overload Bit set
set-overload-bit {always | on-startup {seconds | wait-for bgp as-number}} [suppress [interlevel | external]]
Option 2: Shutdown IS-IS (Process), Preserve Configuration
router isis 1
  shutdown
Option 3: Disable IS-IS (Interfaces)
interface e1/1
  isis shutdown
Recommended
Protocol Isolation in Nexus: EIGRP
Option 1: Manipulate Metrics
interface e1/1
  ip delay eigrp instance-tag seconds
Option 2: Shutdown EIGRP (Process), Preserve Configuration
router eigrp 1
  shutdown
NOTE: Currently, shutdown is not graceful.
Option 3: Disable EIGRP (Interfaces)
interface e1/1
  ip eigrp 1 shutdown
Recommended
Protocol Isolation in Nexus: BGP
Option 1: Advertise prefixes with longer AS path / higher local-preference
switch(config)# route-map prepend
switch(config-route-map)# match as-path 1
switch(config-route-map)# set as-path prepend last-as 3
switch(config)# router bgp 65000
switch(config-router)# neighbor 192.168.10.2 remote-as 20
switch(config-router-neighbor)# address-family ipv4 unicast
switch(config-router-neighbor-af)# route-map prepend out
Option 2: Shutdown BGP (Process), Preserve Configuration
router bgp 65010
  shutdown
NOTE: This is not a graceful shutdown such as you would achieve with GSHUT / RFC 6198.
Recommended
Protocol Isolation in Nexus
Option 4: System Interface Shutdown
system interface shutdown
All Protocols
For many, this is good enough.
And, easy!
vPC Shutdown Feature
This feature allows the customer to manually “isolate” a switch from the vPC domain. This is a vPC configuration option.
Pre-7.2 behavior:
• No “shutdown” command.
• Manual shutdown required:
  • Down vPCs
  • Down peer link
  • vPC members
  • Etc.
vPC shutdown behavior:
• Local switch isolated from remote.
• Cannot exit shutdown without manual intervention.
• When exiting, PKA, PL, and vPCs will be re-initialized; vPC domain brought to normal state.
[Figure: primary and secondary vPC peers with PKA and VLANs 1-100, shown in the "Configure" and "De-configure" states.]
Availability 5k/6k/7k: NX-OS 7.2
NX-OS Graceful Insertion and Removal
Graceful Insertion and Removal
feature ospf
feature vpc
Isolate for Change Window
OSPF: max-metric router-lsa
VPC: shutdown
Scripting takes time.
It’d be nice to automate this…
Graceful Insertion and Removal
vPC vPC
system mode maintenance
One command!
Pre-change System Snapshot
Change window begins.
Graceful Insertion and Removal
vPC vPC
Change window complete.
no system mode maintenance
One command!
Post-change System Snapshot
Flexible framework providing a comprehensive, systemic method to isolate a node.
Configuration profile foundation in NX-OS
Initial support for:
– vPC/vPC+
– ISIS
– OSPF
– EIGRP
– BGP
– Interface
Per VDC on Nexus 7x00
Platform Release
Nexus 5x00/6000 NX-OS 7.1
Nexus 7x00 NX-OS 7.2
Nexus 9000 NX-OS 7.0
Graceful Insertion and Removal
Configuration Profiles
Automatic Profiles
• Generated by default
• Parses configuration to determine changes going into and out of GIR
• Changes based on base protocol configuration settings.
• Use: Maintenance Windows
Manual Profiles
• User created profile for maintenance-mode and normal-mode
• Flexible selection of protocols for isolation
• Use: maintenance windows and isolation during troubleshooting using preconfigured scripts.
• Maintenance-mode profile is applied when entering GIR mode.
• Normal-mode profile is applied when exiting GIR mode.
N7K-1-Core# show system mode
System Mode : Normal
N7K-1-Core# config
Enter configuration commands, one per line. End with CNTL/Z.
N7K-1-Core(config)# system mode maintenance
BGP is not enabled, nothing to be done
EIGRP is not enabled, nothing to be done
OSPF is up..... will be shutdown
OSPF TAG = 100, VRF = default
config terminal
router ospf 100
shutdown
end
OSPFv3 is not enabled, nothing to be done
ISIS is not enabled, nothing to be done
vPC is not enabled, nothing to be done
Interfaces will be shutdown
Do you want to continue (y/n)? [n] y
Generating maintenance-mode profile
Progressing...................Done.
System mode operation completed successfully
N7K-1-Core# show system mode
System Mode : Maintenance
N7K-1-Core#
Enabling Graceful Insertion and Removal
Automatic Profile Generation
NOTE: Custom profile generation requires “dont-generate-profile”.
config-profile normal-mode type admin
  router ospf 100
    no shutdown
  router eigrp 101
    no shutdown
  router isis 102
    no set-overload-bit always
  router bgp 103
    no shutdown
  vpc domain 20
    no shutdown
  no system interface shutdown
config-profile maintenance-mode type admin
  router ospf 100
    shutdown
  router eigrp 101
    shutdown
  router isis 102
    set-overload-bit always
  router bgp 103
    shutdown
  vpc domain 20
    shutdown
  system interface shutdown exclude fex-fabric
Enabling Graceful Insertion and Removal
Custom Profile Generation
• By default, GIR Mode will automatically generate profiles.
• CLI to disable automatic profile generation: dont-generate-profile
• If you enter GIR mode with automatic profile generation, it will overwrite your custom profile.
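Because the maintenance-mode and normal-mode profiles are mirror images, they can be generated from one list of protocol instances. The Python sketch below only builds text; it does not talk to a switch. The helper name `build_profiles` and its inputs are hypothetical, and the command strings are copied from the custom profile shown on this slide rather than from a command reference.

```python
# Isolation command per protocol, as used in the custom profile on this slide.
ISOLATE = {
    "ospf": "shutdown",
    "eigrp": "shutdown",
    "isis": "set-overload-bit always",
    "bgp": "shutdown",
}


def build_profiles(routers, vpc_domain=None):
    """Return (maintenance_lines, normal_lines).

    routers: list of (protocol, tag) tuples, e.g. [("ospf", "100")].
    vpc_domain: optional vPC domain ID (hypothetical input).
    """
    maint = ["config-profile maintenance-mode type admin"]
    normal = ["config-profile normal-mode type admin"]
    for proto, tag in routers:
        cmd = ISOLATE[proto]
        maint += [f"router {proto} {tag}", f"  {cmd}"]
        normal += [f"router {proto} {tag}", f"  no {cmd}"]
    if vpc_domain is not None:
        maint += [f"vpc domain {vpc_domain}", "  shutdown"]
        normal += [f"vpc domain {vpc_domain}", "  no shutdown"]
    maint.append("system interface shutdown exclude fex-fabric")
    normal.append("no system interface shutdown")
    return maint, normal


maint, normal = build_profiles([("ospf", "100"), ("isis", "102")], vpc_domain=20)
print("\n".join(maint))
print("\n".join(normal))
```

Generating both profiles from the same input keeps them symmetric, so every protocol isolated on entry is restored on exit.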
Protocols still matter – need support for signaling state to neighbors to minimize disruption
– LACP, STP, IGMP, PIM, etc. don’t have a mechanism
– BGP, IGPs have rich set of capabilities
• OSPF: Max-metric router-lsa → graceful shutdown
• IS-IS: Overload bit → graceful shutdown
• EIGRP: Graceful shutdown with Goodbye (All K-values set to 255)
• BGP: Graceful shutdown
Note: 7k supports AS-Prepend and local-preference in manual configuration mode.
Graceful Insertion and Removal Mode and Protocols
“system interface shutdown” will isolate the protocols that have no such mechanism (LACP, STP, IGMP, PIM, etc.).
Graceful Insertion and Removal Mode
VxLAN EVPN GIR Example
1. Advertise costly metrics: OSPF max-metric
2. Prepend AS-path (MP-eBGP)
3. Node taken out of forwarding path.
Control plane is still active, forwarding path altered.
[Figure: VxLAN EVPN fabric with two iBGP RR/VTEP nodes and VTEP 1, VTEP 2, VTEP 3 through VTEP N]
Graceful Insertion and Removal Mode
L2 & L3 GIR Example
system mode maintenance dont-generate-profile
[Figure: L3 isolation: BGP (as-path), OSPF (max-metric); L2 isolation: VPC (shutdown); Sys. Int. Shutdown]
• Make Nexus undesirable for all transit traffic
• Maintain Protocol Adjacencies
• Send route withdrawals/worse metrics
• Local route states are maintained.
• New CLI 'isolate' in all Unicast Protocols
• Multicast follows Unicast for RPF
Nexus 9k/7k/6k/5k
Graceful Removal
N9372(config)# system mode maintenance
Following configuration will be applied:
router bgp 33
  isolate
router eigrp 1
  isolate
router ospf 1
  isolate
router isis 1
  isolate
5k/6k/7k: NX-OS 7.3
9k: 7.0
Nexus 9k/7k/6k
Graceful Removal
router bgp 33
  isolate
router eigrp 1
  isolate
router ospf 1
  isolate
router isis 1
  isolate
Effect of isolate per protocol:
• BGP: Discontinue advertisement of all prefixes.
• EIGRP: Advertises maximum metrics for all K-values.
• OSPF: max-metric router-lsa
• IS-IS: set-overload-bit
6k/7k: NX-OS 7.3
9k: 7.0
• Move the switch from Maintenance mode to Normal mode.
• Control plane maintained throughout isolation of the switch.
• Protocols advertise routes only after they are installed in hardware.
Nexus 9k/7k/6k/5k
Graceful Insertion
N9372(config)# no system mode maintenance
Following configuration will be applied:
router bgp 33
  no isolate
router eigrp 1
  no isolate
router ospf 1
  no isolate
router isis 1
  no isolate
5k/6k/7k: NX-OS 7.3
9k: 7.0
Nexus GIR Snapshots
• Used before and after GIR mode to compare pre- and post-change operation.
• Snapshots are automatically generated when entering GIR mode.
switch# snapshot create snap1 For testing
Executing show interface... Done
Executing show bgp sessions vrf all... Done
Executing show ip eigrp topology summary... Done
Executing show vpc... Done
Executing show ip ospf vrf all... Done
Feature 'ospfv3' not enabled, skipping...
Snapshot 'snap1' created
switch#
Nexus GIR Snapshots Comparison
Nexus# sh snapshots compare before_maintenance after_maintenance
================================================================================
Feature    Tag                 before_maintenance    after_maintenance
================================================================================
[bgp]
--------------------------------------------------------------------------------
[neighbor-id:100.120.1.221]
           connectionsdropped  2                     **3**
           lastflap            P1DT21H5M12S          **P1DT21H25M47S**
           lastread            P1DT21H25M12S         **PT0S**
           lastwrite           P1DT21H25M14S         **PT0S**
           state               Established           **Idle**
           localport           52737                 **0**
           remoteport          179                   **0**
           notificationssent   2                     **3**
<...>
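For long comparisons it can help to pull out only the changed rows. The Python sketch below parses text shaped like the excerpt above. It assumes, based only on this excerpt, that changed values are wrapped in `**` and that bracketed lines name the current section; the sample text and the function name `changed_tags` are illustrative.

```python
import re

# Rows copied from the comparison excerpt on this slide.
SAMPLE = """\
[bgp]
[neighbor-id:100.120.1.221]
connectionsdropped 2 **3**
state Established **Idle**
localport 52737 **0**
"""

# tag, value before, value after (wrapped in ** ** when it changed)
ROW = re.compile(r"^(\S+)\s+(\S+)\s+\*\*(\S+)\*\*$")


def changed_tags(text):
    """Return {(section, tag): (before, after)} for rows flagged with ** **."""
    changes = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # most recent bracketed heading
            continue
        match = ROW.match(line)
        if match and section is not None:
            tag, before, after = match.groups()
            changes[(section, tag)] = (before, after)
    return changes


changes = changed_tags(SAMPLE)
for (section, tag), (before, after) in changes.items():
    print(f"{section} {tag}: {before} -> {after}")
```

A post-change check could then fail the window if, for example, any neighbor's `state` moved away from `Established`.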
[Figure: vPC pair (Primary, Secondary) with port channels Po1, Po2, Po10, Po20; interfaces 1-4; FEX 101 and FEX 102]
Overview
• Highly Redundant Design
• Dual-attached FEX
• Dual-attached Hosts
How do we upgrade this environment with minimal disruption?
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
[Figure labels: all devices at image V1]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Enter GIR Mode on N5k1
– Traffic flows through N5k2
• Upgrade N5k1 to V2
• Exit GIR on N5k1
– Image version mismatch with both FEXs
[Figure: same topology; N5k1 at V2, N5k2 and both FEXs at V1; legend: IF Down; IF Up, No Forwarding]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Manually shut down IF3 on N5k2
– FEX 101 goes offline.
– FEX 101 HIFs go down.
– FEX 101 starts pairing process with N5k1.
– FEX 101 upgrades to V2.
[Figure: same topology; legend: IF Down; IF Up, No Forwarding]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Manually shut down IF4 on N5k2
– FEX 102 goes offline.
– FEX 102 HIFs go down.
– FEX 102 starts pairing process with N5k1.
– FEX 102 upgrades to V2.
[Figure: same topology; legend: IF Down; IF Up, No Forwarding]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Enter GIR Mode on N5k2
– IF 3 & 4 Still Admin Down
• Upgrade N5k2
• Exit GIR on N5k2
• Manual Up of IF 3 & 4
[Figure: same topology; all devices at V2; legend: IF Down; IF Up, No Forwarding]
Environment upgrade completed with minimal traffic disruption.
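The ordering of the dual-homed FEX upgrade can be walked through with a small Python model. It is an illustration of the sequence on these slides, not an automation script: the device names and the V1/V2 labels come from the slides, while the functions and the rule that a FEX takes the image of its only remaining parent are simplifying assumptions.

```python
# Image version per device at the start of the change window.
versions = {"N5k1": "V1", "N5k2": "V1", "FEX101": "V1", "FEX102": "V1"}
# FEX uplinks on N5k2, per the slides: IF3 -> FEX 101, IF4 -> FEX 102.
n5k2_uplinks = {"IF3": "FEX101", "IF4": "FEX102"}
admin_down = set()
log = []


def upgrade_switch(name):
    """Isolate the switch with GIR, upgrade it, then bring it back."""
    log.append(f"enter GIR on {name}")
    versions[name] = "V2"
    log.append(f"upgrade {name} to V2")
    log.append(f"exit GIR on {name}")


def shut_uplink(interface):
    """Shut one N5k2 uplink so the FEX re-pairs with N5k1 and takes its image."""
    admin_down.add(interface)
    fex = n5k2_uplinks[interface]
    versions[fex] = versions["N5k1"]
    log.append(f"shut {interface} on N5k2; {fex} pairs with N5k1 at {versions[fex]}")


upgrade_switch("N5k1")
for interface in ("IF3", "IF4"):  # one FEX at a time
    shut_uplink(interface)
upgrade_switch("N5k2")            # IF3 and IF4 stay admin down meanwhile
admin_down.clear()
log.append("no shut IF3 and IF4 on N5k2")

print("\n".join(log))
print(versions)
```

Moving one FEX at a time keeps the other FEX forwarding through N5k2, so dual-attached hosts always have one active path.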
Nexus Graceful Insertion and Removal Mode
NX-OS Software Support
Platform                    Min. Version
Nexus 5600/6000             7.1
Nexus 7000                  7.2
Nexus 9000 (Stand-alone)    7.0
Nexus 3000                  No Support
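The support table can be turned into a quick pre-check before planning a GIR-based window. The Python sketch below transcribes the table above; the function name `supports_gir` is hypothetical, and it only understands simple 'major.minor' strings, not full NX-OS release strings.

```python
# Minimum NX-OS release for GIR per platform, transcribed from the table above.
MIN_GIR_RELEASE = {
    "Nexus 5600/6000": (7, 1),
    "Nexus 7000": (7, 2),
    "Nexus 9000 (Stand-alone)": (7, 0),
    "Nexus 3000": None,  # listed as "No Support"
}


def supports_gir(platform, release):
    """Return True if `release` ('major.minor', e.g. '7.2') meets the minimum."""
    minimum = MIN_GIR_RELEASE[platform]
    if minimum is None:
        return False
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= minimum


print(supports_gir("Nexus 7000", "7.1"))
print(supports_gir("Nexus 7000", "7.2"))
print(supports_gir("Nexus 3000", "7.3"))
```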
ACI Operational Practices
Application Centric Infrastructure (ACI)
APPLICATION CENTRIC
POLICY CONTROLLER
NEXUS 9500 AND 9300
ACI
Application Network Profile
Three Tier Application
Web App DB
Nexus 9k App Centric Policy APIC
Rapid Deployment of Applications onto Networks with Scale, Security and Full Visibility
ACI Architecture
[Figure: Spine Nodes and Leaf Nodes; AVS; EPG “Users”, EPG “Files”, EPG “Internet”; Service Producers and Service Consumers]
Key features:
Software Defined & API driven DC fabric
Elastic & highly available Fabric based on CLOS architecture
Next generation features for security design, containment, and operations
Cisco ACI Fabric
[Figure panels: Fabric View; Controller Connectivity]
Health Score
Aggregation of system-wide health, including pod health scores, tenant health scores, system fault counts by domain and type, and the APIC cluster health state.
Aggregated View
Fabric Topology View
Troubleshoot a flow
Use ACI inbuilt Visibility engine
Faults
Drops
Eligible Path
Troubleshoot a flow
Use ACI inbuilt Visibility engine
Fabric Security Policies
Real Time Traffic Capture
Maintenance Upgrade
#1 Download the release on the APIC
#2 Upgrade APIC
#3 Create Groups
#4 Upgrade the Maintenance Groups
Capacity Dashboard
View the capacity of Data center Fabric
Optimizer tool
DC forecasting
Simulator to verify the addition of:
• 10 tenants
• 200 EPGs
• 500 contracts per EPG
on the fabric.
Once over-capacity is determined, you can tweak the attributes to check the limit of your capacity.
Create an additional node and see how it scales in ACI (Simulator).
Monitor Manage Troubleshoot
• Faults
• Events
• Health Score
• Atomic Counter
• Contract deny logs
• Statistics
• Capacity Dashboard
• Image Management
• Config Export / Import
• Fabric Inventory
• Show Usage
• Configuration Rollback
• Audit Logs
• iPing
• iTraceroute
• Endpoint Tracker
• ERSPAN
• Traffic Map
• On Demand Counter View
• CLI option
Cisco ACI Deployment Lifecycle
Reactive Proactive Preemptive
Recommended Live sessions for ACI: BRKACI-2602, LTRACI-3010
NX-OS Operational & Change Management Best Practice
Maintenance Windows – Golden Rules
• Change Review Board
• Schedule when environment will be least impacted.
• Software Staging
• Verify out of band.
• Test! Before and after.
Maintenance Windows Methodology Overview
[Flowchart: Plan (Imp., Back-out, Test) → Plan/Testing (if applicable) → Peer Review → Approve? → Bus. Owner Review → Approve?]
Maintenance Windows
[Flowchart nodes: Schedule Change Window & App Owners; Stage (OoB, etc.); Implement; Test Plan; Success?; Back-out; Notify & Document]
Traditional vPC Environment Change Best Practice & Window
[Figure: vPC Primary and Secondary with attached FEX]
Core Isolation
1. Graceful L3 Protocol Isolation
2. Layer 2 Isolation
   • VPC
3. Interface Isolation
Using GIR Mode, Steps 1-3 could be achieved prescriptively.
Access Isolation
1. Layer 2 Isolation
   • VPC
2. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, Steps 1-2 could be achieved prescriptively.
NOTE: Maintenance mode consideration should be based on Fex-fabric connectivity.
If the change window is for a software upgrade or spot fix, consider ISSU or SMU feasibility.
L3 Environment Change Best Practice & Window
[Figure: Layer 3]
Core Isolation
1. Graceful L3 Protocol Isolation
2. Interface Isolation
Using GIR Mode, Steps 1-2 could be achieved prescriptively.
Access Isolation
1. L3 Protocol isolation
2. Layer 2 Isolation
   • vPC
3. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, prescriptive isolation is possible.
If the change window is for a software upgrade or spot fix, consider ISSU or SMU feasibility.
FabricPath Environment Change Best Practice & Window
[Figure: FabricPath]
Spine Isolation
1. Use FabricPath IS-IS Overload Bit
Using GIR Mode with isolate configuration, Step 1 could be achieved prescriptively.
Leaf Isolation
1. Use FabricPath IS-IS Overload Bit
2. Shut down the vPC+ domain.
Using GIR Mode with manual profile, Step 1 could be achieved prescriptively.
If the change window is for a software upgrade or spot fix, consider ISSU or SMU feasibility.
VxLAN EVPN Environment Change Best Practice & Window
[Figure: VxLAN fabric with iBGP RR and VTEPs]
Spine Isolation
1. L3 Protocol isolation
   • If iBGP EVPN, consider IGP isolation
   • If eBGP EVPN, consider BGP isolation
2. Interface Isolation
Using GIR Mode, Steps 1-2 could be achieved prescriptively.
Leaf Isolation
1. L3 Protocol isolation
   • If iBGP EVPN, consider IGP isolation
   • If eBGP EVPN, consider BGP isolation
2. Layer 2 Isolation
   • vPC
3. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, prescriptive isolation is possible.
If the change window is for a software upgrade or spot fix, consider ISSU or SMU feasibility.
NX-OS 6.x -> 7.x Use Case
NX-OS 6.x -> 7.x Use Case - Secondary
7k Single Supervisor
Manual Effort
[Figure: 7k, 5k, 2k]
7k Upgrade
• Prerequisites
• Code Staging
• VPC Best Practices
• Manual Isolation of Secondary
• Protocol Isolation
– Max-metric LSA, etc. -> No service impact (0-20ms)
• VPC Isolation
– Down vPCs -> No service impact (0-20ms)
• Down Peer Link -> No service impact
• Reload Upgrade -> No service impact
• Peer link is brought UP -> No service impact
• South links UP -> No service impact
• North protocol Max-metric LSA removal, UP -> No service impact
• Peer Switch
• Peer Gateway
• Auto-recovery
• L3 Link between vPC pairs
• BFD
• Routing Protocol Convergence Tuning
NX-OS 6.x -> 7.x Use Case - Primary
7k Single Supervisor
Manual Effort
[Figure: 7k, 5k, 2k; boxes: Prerequisites, Code Staging, VPC Best Practices]
7k Upgrade
• Manual Isolation of Secondary
• Protocol Isolation
– Max-metric LSA, etc. -> No service impact (0-20ms)
• VPC Isolation
– Down vPCs -> No service impact (0-20ms)
• VPC peer priority changes: the secondary should have a lower priority so that it becomes the primary in case of flapping.
• Down Peer Link -> No service impact
• Peer link & PKA are brought down and reload is initiated for upgrade -> from no service impact to 0-50ms impact, depending on traffic pattern (this switch comes back as secondary)
• Peer link is brought UP -> No service impact
• South links UP -> No service impact
• North protocol UP -> No service impact
Note: The system did not have a firewall or LB connected directly to it.
Summary
What to use? GIR Mode? Patching? ISSU? All of them?
Putting It All Together
Option \ Situation            Critical Bug Fix & PSIRT   Hardware Upgrade   New Features   Risk Averse
ISSU                          ✓                          X                  ✓              ✓
GIR + Cold Boot               ✓                          X                  ✓              ✓
GIR + Disruptive Installer    ✓                          X                  ✓              ✓
SMU Restart                   ✓                          X                  X              ✓
GIR + SMU ISSU                ✓                          X                  X              ✓
GIR                           X                          ✓                  X              X
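The option matrix reads naturally as a lookup: given a situation, which options carry a check mark? The Python sketch below transcribes the table, assuming the four mark columns are in the order the situations are listed on the slide. The function name `options_for` is illustrative.

```python
SITUATIONS = (
    "Critical Bug Fix & PSIRT",
    "Hardware Upgrade",
    "New Features",
    "Risk Averse",
)

# Rows transcribed from the slide: True = check mark, False = X.
MATRIX = {
    "ISSU": (True, False, True, True),
    "GIR + Cold Boot": (True, False, True, True),
    "GIR + Disruptive Installer": (True, False, True, True),
    "SMU Restart": (True, False, False, True),
    "GIR + SMU ISSU": (True, False, False, True),
    "GIR": (False, True, False, False),
}


def options_for(situation):
    """Return the options marked as suitable for the given situation."""
    column = SITUATIONS.index(situation)
    return [option for option, marks in MATRIX.items() if marks[column]]


print(options_for("Hardware Upgrade"))
print(options_for("New Features"))
```

Read this way, GIR alone is the only option marked for a hardware upgrade, while new features require an image change through ISSU or GIR combined with a reboot or disruptive install.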
Summary
• Verify environment conforms to data center networking best practices.
• Follow your documented change management process.
• Isolate nodes during maintenance to minimize disruption.
Use GIR Mode where possible to ease isolation configuration.
Thank you