#clmel
ACI Troubleshooting Tools and Best Practices
BRKACI-3001
Gerard Chami – Datacenter Solutions Engineer
© 2015 Cisco and/or its affiliates. All rights reserved. BRKACI-3001 Cisco Public
Agenda
• Setting the stage for troubleshooting
• Troubleshooting Tools
• Best Practices
– Things to do
– Things to watch out for
• Conclusion
Overview of ACI Fabric Policy Mechanisms: Working with ACI
[Diagram labels: Admin, GUI, CLI, Web, API, Tools, Object Browser, Python SDK, REST]
Overview of ACI Fabric Policy Mechanisms: Logical Model
fvTenant
  fvAp
    fvAEPg
      fvRsBd
    fvAEPg
      fvRsBd
  fvBD
    fvRsCtx
  fvCtx
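The containment in the logical model shows up in each object's distinguished name (DN). The following is a minimal Python sketch, assuming the `tn-`, `ap-` and `epg-` relative-name prefixes that appear in DNs elsewhere in this deck (for example uni/tn-testTenant2/ap-testAP/epg-testEPG). The helper names are hypothetical and are not part of any Cisco SDK.

```python
def epg_dn(tenant: str, app_profile: str, epg: str) -> str:
    """Join relative names (RNs) into the DN of an fvAEPg inside an fvAp inside an fvTenant."""
    rns = ["uni", f"tn-{tenant}", f"ap-{app_profile}", f"epg-{epg}"]
    return "/".join(rns)


def parent_dn(dn: str) -> str:
    """Return the DN of the containing object by dropping the last RN.

    Only valid for simple DNs; RNs that embed another DN in brackets
    contain extra slashes and would need a real parser.
    """
    return dn.rsplit("/", 1)[0]


dn = epg_dn("testTenant2", "testAP", "testEPG")
print(dn)             # DN of the EPG
print(parent_dn(dn))  # DN of its application profile
```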
Overview of ACI Fabric Policy Mechanisms: Resolved Model
[Diagram: the logical model (fvTenant, fvAp, fvAEPg, fvRsBd, fvBD, fvRsCtx, fvCtx) and the resolved objects (fvEpPCont, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn), with the resolved objects drawn for both the Policy Manager and the Policy Element]
Overview of ACI Fabric Policy Mechanisms: Concrete Model
[Diagram, Policy Element: the resolved objects (fvEpPCont, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn) alongside the concrete objects under sys (l3Ctx, l2BD, vlanCktEp, l2RsPathDomAtt, vxlanCktEp)]
Overview of ACI Fabric Policy Mechanisms: Hardware Programming – Forwarding Plane
[Diagram: the concrete objects under sys (l3Ctx, l2BD, vlanCktEp, l2RsPathDomAtt, vxlanCktEp) alongside the iNXOS state: vlan, vxlan, vrf, BGP, ospf, isis, vrf overlay-1, interfaces]
Setting The Stage For Troubleshooting
Troubleshooting Checklist
• Check health scores to narrow down the affected scope
• Check for faults in the system. If anything fails to deploy, a fault is raised
• Check that the resolved object model is present on both the APIC and the relevant leaves
• Check that the concrete objects are present on the relevant leaves
• Verify iNXOS using the iNXOS shell commands
Troubleshooting Tools
Graphical User Interface (GUI): GUI Tools
• Faults, Health, Audits, Events
• Statistics, Call-home, Syslogs, SNMP
APIC Command Line (iShell): APIC SSH Access
admin@fab1_apic1:~> ls
aci debug mit
admin@fab1_apic1:~> cd aci
admin@fab1_apic1:aci> ls
admin fabric l4-l7-services system tenants vm-networking
admin@fab1_apic1:~> cd debug
admin@fab1_apic1:debug> ls
apic1 apic2 apic3 leaf1 leaf2 spine1 spine2
admin@fab1_apic1:~> cd mit
admin@fab1_apic1:mit> ls
comp dbgs expcont fwrepo topology uni
OR
SSH to the APIC
CLI Available at the Switch
CLI Shell
leaf101# ls
aci bootflash data dev isan lib mit proc sys usb var bin controller debug etc lc logflash mnt sbin tmp usr volatile
Enter NXOS shell
leaf101# vsh
Cisco NX-OS Software
Enter NXOS hardware internals
leaf101# vsh_lc
module-1#
Entering the broadcom shell
leaf101# bcm-shell-hw
bcm-shell.0>
It is also possible to execute VSH/bcm-shell-hw commands directly from iShell, using the syntax:
vsh -c "<command>"
vsh_lc -c "<command>"
bcm-shell-hw "<command>"
Visore – It’s Italian for Viewer!
Moquery - Command Line Cousin to Visore!
admin@apic1:~> moquery -d uni/tn-gchami-tn
Total Objects shown: 1
# fv.Tenant
name : gchami-tn
childAction :
descr :
dn : uni/tn-gchami-tn
lcOwn : local
modTs : 2015-02-04T23:27:05.622+00:00
monPolDn : uni/tn-common/monepg-default
ownerKey :
ownerTag :
rn : tn-gchami-tn
status :
uid : 15374
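Because moquery prints one `property : value` pair per line, its text output is easy to post-process. The following is a minimal Python sketch that turns output shaped like the listing above into dictionaries. The function name is hypothetical, and the parser assumes that every object starts with a `# class` line, as in the example.

```python
def parse_moquery(text: str) -> list[dict[str, str]]:
    """Parse `name : value` lines; a line starting with '#' opens a new object."""
    objects: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # "# fv.Tenant" -> {"class": "fv.Tenant"}
            current = {"class": line.lstrip("# ").strip()}
            objects.append(current)
        elif " :" in line and current is not None:
            # Split on the first colon only, so timestamps keep their colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return objects


sample = """Total Objects shown: 1
# fv.Tenant
name : gchami-tn
childAction :
dn : uni/tn-gchami-tn
modTs : 2015-02-04T23:27:05.622+00:00
rn : tn-gchami-tn
"""

tenant = parse_moquery(sample)[0]
print(tenant["class"], tenant["dn"], tenant["modTs"])
```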
Scripts: Cobra SDK, ACI toolkit
API Inspector – Built into the GUI
REST API - Postman
Tools in Action
Health, Faults, Events, Audits & Objects
Health Score Degraded - Identification
• Navigating to the System Health Dashboard will identify the switch that has a diminished health score
• Double clicking on that leaf will allow navigation into the faults raised on that device. Here we click on rtp_leaf1
Drilling Down
Double Click on Degraded Health Score or Highlight the Health Tab
Health Score
Getting to Object Fault
Interface 1/35 on this leaf has issues.
The interface has a fault because it is used by an EPG but is missing an SFP transceiver.
Faults in ACI
• A fault is a Managed Object (MO) contained in the M.I.T.
• It is a child of the affected MO
• It has the following properties:
– code
– severity
– lifecycle
– description
– timestamps
• The fault's RN is “fault-<code>”, for example, fault-F123
• Can be queried by DN and class (fault:Inst)
chassis-1
card-1 card-2
port-1
fault-F123
fault-F456
fault-F789
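Querying faults by DN or by class maps naturally onto REST URLs. The following is a minimal Python sketch that only builds the URLs. It assumes the `/api/class/<class>.json` and `/api/mo/<dn>.json` forms of the APIC REST API and its `query-target-filter` parameter. The host name apic.example.com is a placeholder, the function names are hypothetical, and authentication and the HTTP request itself are left out.

```python
def class_query_url(apic: str, cls: str, **equals: str) -> str:
    """Build a class-level query URL, optionally filtered on property == value."""
    url = f"https://{apic}/api/class/{cls}.json"
    if equals:
        terms = [f'eq({cls}.{prop},"{value}")' for prop, value in equals.items()]
        # A single term is used as-is; several terms are combined with and(...).
        expr = terms[0] if len(terms) == 1 else "and(" + ",".join(terms) + ")"
        url += "?query-target-filter=" + expr
    return url


def dn_query_url(apic: str, dn: str) -> str:
    """Build an object-level query URL for one DN."""
    return f"https://{apic}/api/mo/{dn}.json"


# All fault instances with a given code (class query).
print(class_query_url("apic.example.com", "faultInst", code="F0467"))
# One specific fault object (DN query).
print(dn_query_url("apic.example.com", "sys/phys-[eth1/11]/fault-F1186"))
```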
Fault Lifecycle
Timer and severity values can be customized using monitoring policies
Faults in GUI
• Look for the “Faults” tab on the right
• Keep an eye out for fault indications
Faults Using Moquery
Getting all faults in txt to analyze later:
leaf1# moquery -c faultInst > /tmp/fault-20141112.txt
leaf1# ls -l /tmp/fault-20141112.txt
-rw------- 1 admin admin 40113 Nov 13 13:37 /tmp/fault-20141112.txt
Want to get all configuration-failed faults?
leaf1# moquery -c faultInst -f 'fault.Inst.code == "F0467"' | egrep "cause|dn"
cause : configuration-failed
dn : uni/epp/fv-[uni/tn-testTenant2/ap-testAP/epg-testEPG]/nwissues/fault-F0467
Want that in json?
leaf1# moquery -c faultInst -o json
{"imdata": [
{"faultInst": {
"attributes": {"dn": "sys/phys-[eth1/11]/fault-F1186","domain": "infra","code": "F1186","occur": "1","subject": "failure-to-deploy","severity": "warning","descr": "Port configuration failure.
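Once the faults are available as JSON, filtering them offline is straightforward. The following is a minimal Python sketch, assuming the `imdata` / `faultInst` / `attributes` nesting shown in the output above. The sample payload reuses the DNs and codes from this deck, but the severity of the second entry is made up for illustration, and the function name is hypothetical.

```python
import json


def faults_by_severity(payload: str, severity: str) -> list[dict]:
    """Return the attribute dicts of all faultInst entries with the given severity."""
    data = json.loads(payload)
    attrs = [
        item["faultInst"]["attributes"]
        for item in data["imdata"]
        if "faultInst" in item
    ]
    return [a for a in attrs if a.get("severity") == severity]


sample = json.dumps({"imdata": [
    {"faultInst": {"attributes": {
        "dn": "sys/phys-[eth1/11]/fault-F1186",
        "code": "F1186",
        "subject": "failure-to-deploy",
        "severity": "warning",
    }}},
    {"faultInst": {"attributes": {
        "dn": "uni/epp/fv-[uni/tn-testTenant2/ap-testAP/epg-testEPG]/nwissues/fault-F0467",
        "code": "F0467",
        "cause": "configuration-failed",
        "severity": "minor",  # illustrative value, not taken from the slides
    }}},
]})

warnings = faults_by_severity(sample, "warning")
for fault in warnings:
    print(fault["code"], fault["dn"])
```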
Events in GUI
• Much like other navigation / HISTORY / EVENTS
Accounting - Audit Log
• A mechanism to track user-initiated configuration changes
• When a user creates/modifies/deletes an MO, we create an “audit record” containing the affected MO DN, user name, timestamp and change details
• System also creates logs for log-in/log-out to controllers and nodes
• Similar to an entry in a log file: once created, they are never modified
• Configuration change logs are MOs of class aaaModLR
• Login/logout logs are MOs of class aaaSessionLR
• Accounting logs get deleted only when a maximum number specified in a retention policy is hit
Audit Log
Who created that?
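Answering that question from exported audit records amounts to filtering on the affected DN and the kind of change. The following is a minimal Python sketch over plain records. The field names mirror the contents listed for an audit record (affected MO DN, user name, timestamp, change details) but are hypothetical and are not the actual aaaModLR property names. The users and timestamps are invented sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditRecord:
    affected_dn: str
    user: str
    timestamp: str  # ISO 8601, same offset for all records so strings sort by time
    change: str     # "created", "modified" or "deleted" (illustrative values)


def who_created(records: list[AuditRecord], dn: str) -> Optional[str]:
    """Return the user of the earliest 'created' record for the given DN."""
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.affected_dn == dn and rec.change == "created":
            return rec.user
    return None


records = [
    AuditRecord("uni/tn-gchami-tn", "bob", "2015-02-05T10:00:00+00:00", "modified"),
    AuditRecord("uni/tn-gchami-tn", "alice", "2015-02-04T23:27:05+00:00", "created"),
]
print(who_created(records, "uni/tn-gchami-tn"))
```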
Use Moquery or Visore to Check the Model
fvTenant, fvAp, fvAEPg, fvRsBd, fvBD, fvRsCtx, fvCtx, fvSubnet…
fvCtxDef, fvBDDef, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn
fvCtxDef, fvBDDef, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn
l3Ctx, l2BD, vlanCktEp, vxlanCktEp, l2RtDomIfConn, vlanRsPathDomAtt, vlanRsVlanEppAtt, vxlanRsVxlanEppAtt
Show vlans
Show system internal epm …
Vsh_lc:
Show system internal eltmc info…
Show system internal epmc info
For Your Reference
Datapath Troubleshooting Tools
iPing CLI
iping [options] <target ip address>
options:
-V vrf name (tenant:context)
-c count
-i wait
-p pattern
-s packet size
-t timeout
-S source ip address or source interface
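When iping is driven from a script, the command line can be assembled from the options listed above. The following is a minimal Python sketch that only builds the command string and does not run it. The function and parameter names are hypothetical; the flags are the ones from the option list.

```python
def build_iping(target, vrf=None, count=None, wait=None, pattern=None,
                size=None, timeout=None, source=None) -> str:
    """Assemble an iping command line, emitting only the options that were given."""
    flags = [
        ("-V", vrf),      # vrf name as tenant:context
        ("-c", count),
        ("-i", wait),
        ("-p", pattern),
        ("-s", size),     # packet size
        ("-t", timeout),
        ("-S", source),   # source ip address or source interface
    ]
    args = ["iping"]
    for flag, value in flags:
        if value is not None:
            args += [flag, str(value)]
    args.append(target)
    return " ".join(args)


print(build_iping("100.0.1.1", vrf="gchami:vrf01", source="100.0.1.254"))
```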
iPing Internal
Tenant: gchami
Context: vrf01
Subnet: 100.0.1.254/24, 100.0.2.254/24
iping -V gchami:vrf01 -S 100.0.1.254 100.0.1.1
[Diagram: Spine1, Leaf1, Leaf2 and EP1 (100.0.1.1); TEP:10.0.96.95, TEP:10.0.96.92; iping from leaf1, iping from leaf2, snoop]
• Set the source IP address to make clear which gateway address is used
• The ICMP echo reply packet to the remote leaf node is relayed by the local leaf node
iTraceroute CLI
Node traceroute:
itraceroute <dst-ip> [<pld-size>]
Tenant traceroute:
For VLAN encapsulated source EP
itraceroute <dst-ip> vrf <vrf-name> [ encap vlan [<vlan-encap>] ] [ payload <pld-size> ]
For VxLAN encapsulated source EP
itraceroute <dst-ip> vrf <vrf-name> encap vxlan [<vxlan-encap>] dst-mac <dst-mac> [ { payload <pld-size> } ]
Tenant iTraceroute - Example
pod2-leaf1# itraceroute 10.11.1.11 vrf RD
Tenant traceroute to 10.11.1.11, tenant VRF RD, source encap vlan-2101, from [10.0.40.66], payload 56 bytes
Path 1
1: TEP 10.0.64.65 intf eth1/33 0.746 ms
2: TEP 10.0.40.65 intf eth1/97 0.490 ms
Path 2
1: TEP 10.0.64.64 intf eth1/33 0.812 ms
2: TEP 10.0.40.65 intf eth1/98 0.526 ms
Spine1
Pod2-Leaf4 Pod2-Leaf1
TEP: 10.0.40.66 TEP: 10.0.96.92
Spine2
TEP: 10.0.64.64 TEP: 10.0.64.65
10.11.1.11
Node iTraceroute - Example
pod2-leaf1# itraceroute 10.0.40.95
Node traceroute to 10.0.40.95, infra VRF overlay-1, from [10.0.40.66], payload 56 bytes
Path 1
1: TEP 10.0.64.64 intf eth1/33 0.611 ms
2: TEP 10.0.40.95 intf eth1/98 0.608 ms
Path 2
1: TEP 10.0.64.65 intf eth1/33 0.473 ms
2: TEP 10.0.40.95 intf eth1/97 0.540 ms
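itraceroute reports one hop list per equal-cost path, so comparing paths is easier once the output is structured. The following is a minimal Python sketch that parses output shaped like the node traceroute above. The function name is hypothetical, and the regular expression assumes the `N: TEP <ip> intf <interface> <time> ms` hop format shown on the slide.

```python
import re

# One hop line, e.g. "1: TEP 10.0.64.64 intf eth1/33 0.611 ms"
HOP = re.compile(r"^(\d+):\s+TEP\s+(\S+)\s+intf\s+(\S+)\s+([\d.]+)\s+ms")


def parse_itraceroute(output: str) -> dict[int, list[dict]]:
    """Group hop lines under the 'Path N' header that precedes them."""
    paths: dict[int, list[dict]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Path "):
            current = int(line.split()[1])
            paths[current] = []
            continue
        match = HOP.match(line)
        if match and current is not None:
            paths[current].append({
                "hop": int(match[1]),
                "tep": match[2],
                "intf": match[3],
                "ms": float(match[4]),
            })
    return paths


sample = """Node traceroute to 10.0.40.95, infra VRF overlay-1, from [10.0.40.66], payload 56 bytes
Path 1
1: TEP 10.0.64.64 intf eth1/33 0.611 ms
2: TEP 10.0.40.95 intf eth1/98 0.608 ms
Path 2
1: TEP 10.0.64.65 intf eth1/33 0.473 ms
2: TEP 10.0.40.95 intf eth1/97 0.540 ms
"""

paths = parse_itraceroute(sample)
for number, hops in paths.items():
    print(number, [hop["tep"] for hop in hops])
```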
Spine1
Pod2-Leaf4 Pod2-Leaf1
TEP: 10.0.40.66 TEP: 10.0.96.92
Spine2
TEP: 10.0.64.64 TEP: 10.0.64.65
ACI Span
Infrastructure SPAN:
• Meant for traffic to/from access ports.
• Infra SPAN supports both local and remote (ERSPAN) destinations.
• Infra SPAN can also be filtered by an EPG.
• Configured in Fabric > Access Policies > Troubleshoot Policies > SPAN (source and dest)
Fabric SPAN:
• Meant for traffic to/from fabric ports (Leaves and Spines).
• Fabric SPAN supports remote destinations only, and can be filtered with a Bridge Domain or Network Context.
• Configured in Fabric > Fabric Policies > Troubleshoot Policies > SPAN
Tenant SPAN:
• Meant for traffic to/from EPGs; supports only remote destinations.
• Configured in the concerned Tenant > Troubleshoot Policies > SPAN (remote always)
Atomic Counters
• Troubleshooting tool to count packets and bytes between a source and a destination
• Only packets that traverse the fabric are counted
• Locally switched packets are not counted
• Packets switched in the hypervisors are not counted
• There are two types of counters: “ongoing” and “on demand” counters
• NTP must be properly set up and operational on each node and APIC (check on a node with show ntp peer-status)
Ongoing Atomic Counters
• Ongoing Atomic Counters are not user-configurable
• They count packets at the infrastructure level: the source and destination of the flow are Tunnel End Points (TEPs)
• Example: all packets sent from L1 to L3
• Paths are unidirectional
– L1-to-L3 ≠ L3-to-L1
L1 L2 L3 L4
S1 S2
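The unidirectional nature of these counters can be pictured as a table keyed by the ordered (source TEP, destination TEP) pair. The following is a toy Python sketch of that idea only. It does not model how the fabric implements atomic counters, and the names and packet numbers are invented for illustration.

```python
from collections import Counter

# Key is the ordered pair (source, destination), so each direction is separate.
counters: Counter = Counter()


def count_packets(src_tep: str, dst_tep: str, packets: int = 1) -> None:
    """Add packets to the counter for the src -> dst direction only."""
    counters[(src_tep, dst_tep)] += packets


count_packets("L1", "L3", 10)
count_packets("L3", "L1", 4)

print(counters[("L1", "L3")])  # packets sent from L1 to L3
print(counters[("L3", "L1")])  # packets sent from L3 to L1, counted separately
```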
On Demand Atomic Counters
• On Demand counters are configured by the tenant to troubleshoot issues at the level of individual applications
• The source and destination can be EPs, EPGs, IP addresses or “Any”
• For example, packets from EP1 to EPG2
EP1 EP2 EP3 EPG2
S1 S2
Gathering Logs
Tech Support Features
• One interface to collect tech-support from any subset of fabric components and features
• Save to fabric, or export to remote server
• On-demand or periodic
• Configurable data collection
• Downloadable via http from the fabric
• Tech-support files are HUGE!!! (multiple gigabytes of tar data)
• They mostly contain logs useful for development. For the postmortem of an event, collect tech-support from the APICs and the impacted leaves as soon as possible, because some logs roll over quickly.
Create Tech Support Policy
In Admin > Import/Export > Export Policies > Techsupport …
For Your Reference
ACI Core Files
APIC Cluster Troubleshooting
APIC Debug Commands: acidiag fnvread
Show Fabric node vector
admin@ifav1-ifc1:~> acidiag fnvread
ID Name Serial Number IP Address Role State LastUpdMsgId
---------------------------------------------------------------------------------------------------------------------------------------------
1017 ifav1-leaf1 SAL17267Z9S 10.0.63.127/32 leaf active 0
1018 ifav1-leaf2 SAL1739D5WU 10.0.63.125/32 leaf active 0
1200 ifav1-spine1 SAL1748H575 10.0.63.126/32 spine active 0
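A quick check on the fabric node vector is to list every node whose state is not active. The following is a minimal Python sketch that parses output shaped like the listing above. It assumes seven whitespace-separated fields per node row, as shown on the slide, and the function names are hypothetical.

```python
def parse_fnvread(output: str) -> list[dict[str, str]]:
    """Keep only rows that start with a numeric node ID and have seven fields."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].isdigit():
            node_id, name, serial, ip, role, state, last_msg = fields
            nodes.append({
                "id": node_id, "name": name, "serial": serial, "ip": ip,
                "role": role, "state": state, "last_upd_msg_id": last_msg,
            })
    return nodes


def inactive_nodes(nodes: list[dict[str, str]]) -> list[str]:
    """Names of nodes whose state is anything other than 'active'."""
    return [n["name"] for n in nodes if n["state"] != "active"]


sample = """ID Name Serial Number IP Address Role State LastUpdMsgId
---------------------------------------------------------------
1017 ifav1-leaf1 SAL17267Z9S 10.0.63.127/32 leaf active 0
1018 ifav1-leaf2 SAL1739D5WU 10.0.63.125/32 leaf active 0
1200 ifav1-spine1 SAL1748H575 10.0.63.126/32 spine active 0
"""

nodes = parse_fnvread(sample)
print([n["name"] for n in nodes])
print(inactive_nodes(nodes))
```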
APIC Debug Commands: acidiag rvread
[Screenshot callouts: Replica id; State (6 = UP); Leadership state; APIC where it is running]
“acidiag rvread” shows replicas that are not healthy
“acidiag rvread <svc> <shard> <replica>” shows the state of one replica
APIC Debug Commands: acidiag avread
Show APIC controller application vector
[Screenshot callouts: Cluster size; Chassis ID; Active; Summary of replica health]
ACI Best Practices
Things to do
Plan Your Naming Wisely!
Every Object In ACI Must Be Named.
Tenant
App Profile
Bridge Domain
Private Network
Contract
Filter
Subnet
EPG
Attachable Entity Profile
Interface Profile
Switch Profile
Interface Selector
VMM Domain
VLAN Pool
Physical Domain
L3 Outside
L2 Outside
A Good Naming Scheme Is Essential.
VLAN Pools
VL-DVS
VL-AVS
VL-L2-Out
VL-L3-Out
Filters
flt-http
flt-https
flt-sql
Interface Profiles
iprof-ucs
iprof-fex
iprof-ext-switch
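A naming scheme is easiest to keep when it is checked automatically. The following is a minimal Python sketch that validates names against the prefixes used in the examples above (VL-, flt-, iprof-). The object-kind keys, the function names, and the misnamed sample entry are hypothetical.

```python
# Prefix per object kind, taken from the example names on this slide.
PREFIXES = {
    "vlan_pool": "VL-",
    "filter": "flt-",
    "interface_profile": "iprof-",
}


def follows_scheme(kind: str, name: str) -> bool:
    """True if the name starts with the prefix for its kind and has a suffix."""
    prefix = PREFIXES[kind]
    return name.startswith(prefix) and len(name) > len(prefix)


def misnamed(objects: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (kind, name) pairs that break the scheme."""
    return [(kind, name) for kind, name in objects if not follows_scheme(kind, name)]


objects = [
    ("vlan_pool", "VL-DVS"),
    ("filter", "flt-sql"),
    ("filter", "http-filter"),        # invented example of a bad name
    ("interface_profile", "iprof-ucs"),
]
print(misnamed(objects))
```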
Setup NTP!
Symmetry is Beautiful!
Choose Matching Interface Numbers
1/10 1/10
Makes For Simpler Policy!
Regular Config Exports
Plan Your Maintenance!
Upgrade Maintenance Groups
Maintenance group 1 Maintenance group 2
Things To Watch Out For!
The ACI Fabric Cannot Currently Be Used as a ‘Transit’ Network.
No Transit Routing Through Fabric.
ACI Fabric
Context / VRF
L3 out
In this example, 192.168.1.0/24 will not be advertised to Router B.
Router A Router B
192.168.1.0/24
Use a Unique ACI ‘infra’ IP Range when Provisioning the APIC.
VTEP addresses are allocated to nodes in the fabric automatically based on the pool configured at fabric initialisation.
Future services may cause overlapping address space issues.
Changing the infra IP range is difficult, so choose a unique range.
Leaf 1: 10.0.104.95
Leaf 2: 10.0.104.96
Leaf 3: 10.0.104.97
Spine 1: 10.0.104.92
Spine 2: 10.0.104.93
Infra range: 10.0.0.0/16
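Whether a candidate infra range collides with address space already in use can be checked before the fabric is provisioned. The following is a minimal Python sketch using the standard `ipaddress` module. The infra range is the one from this slide, the other subnets are illustrative values, and the function name is hypothetical.

```python
import ipaddress


def overlapping(infra: str, others: list[str]) -> list[str]:
    """Return the subnets from `others` that overlap the infra range."""
    infra_net = ipaddress.ip_network(infra)
    return [o for o in others if infra_net.overlaps(ipaddress.ip_network(o))]


in_use = ["10.0.104.0/24", "192.168.1.0/24", "100.0.1.0/24"]
print(overlapping("10.0.0.0/16", in_use))
```

An empty result means that none of the listed subnets collide with the proposed infra range.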
Try to Choose a Unique ‘infra’ VLAN within the Fabric.
With AVS, the ‘infra’ VLAN gets extended out of the fabric.
[Diagram: Leaf 1, Leaf 2, Leaf 3; Spine 1, Spine 2; ESXi with AVS; Nexus 7K; VLAN 4093 on both links]
Infra VLAN: 4093
If the default infra VLAN of 4093 is used, this becomes an issue if AVS is deployed ‘behind’ a Nexus 7K, etc., due to reserved VLAN ranges on that platform.
The infra VLAN should be chosen carefully if this scenario is required.
Remember, Adjacent OSPF Devices Must Be Configured for NSSA!
External Router    External Router
Leaf Leaf Leaf Leaf
Spine Spine
router ospf 1
  vrf Blue
    area 0.0.0.1 nssa
  vrf Red
    area 0.0.0.2 nssa
Q & A
Complete Your Online Session Evaluation
Give us your feedback and receive a Cisco Live 2015 T-Shirt!
Complete your Overall Event Survey and 5 Session Evaluations.
• Directly from your mobile device on the Cisco Live Mobile App
• By visiting the Cisco Live Mobile Site http://showcase.genie-connect.com/clmelbourne2015
• Visit any Cisco Live Internet Station located throughout the venue
T-Shirts can be collected in the World of Solutions on Friday 20 March 12:00pm - 2:00pm
Learn online with Cisco Live! Visit us online after the conference for full access to session videos and presentations. www.CiscoLiveAPAC.com
Thank you.