Operating an ASR 1000
BRKARC-2019
Jason Yang – CCIE #10467
Technical Marketing Engineer
© 2014 Cisco and/or its affiliates. All rights reserved. BRKARC-2019 Cisco Public
Session Goals
• With more than 100,000 ASR 1000 chassis deployed in the field, customers want recommendations and best practices for operating this platform effectively in their networks.
• This session will share
1. What and how to monitor on the ASR 1000 during daily operation, including best practices and recommendations.
2. How to detect DoS attacks on the ASR 1000, with mitigation best practices.
3. The most common problems seen in the field, with workarounds and solutions to prevent them from happening.
ASR 1000 ISSU with MDR Demo!
Agenda
• Platform Introduction
• What and How to Monitor System in Daily Operation
• DoS Attack Detection and Mitigation Best Practices
• Troubleshooting Common Problems
• In Service Software Upgrade with Minimal Disruptive Restart (Demo)
• Summary and Take Away
Platform Introduction
ASR1000 Building Blocks
[Figure: block diagram of the midplane connecting the RPs (CPU, GE switch, interconnect), the ESPs (FECP, QFP with PPE and BQS, crypto assist, interconnect) and the SIPs (IOCP, SPA aggregation, SPAs, interconnect).]
• Route Processor (RP)
– Handles control plane traffic
– Manages system
• Embedded Service Processor (ESP)
– Handles forwarding plane traffic
• SPA Interface Processor (SIP)
– Houses SPAs; Shared Port Adapters provide interface connectivity
– Buffers packets in and out
• Centralized Forwarding Architecture
– All traffic flows through the active ESP; the standby is synchronized with all flow state over a dedicated 10-Gbps link
• Distributed Control Architecture
– All major system components have a powerful control processor dedicated to the control and management planes
ASR 1000 Software (IOS XE) Architecture
[Figure: software stack per module, linked by control messaging. RP: active and standby IOS, Platform Adaptation Layer (PAL), forwarding manager, chassis manager, Linux kernel. ESP: forwarding manager, chassis manager, QFP client / driver, QFP code, Linux kernel. SIP: chassis manager, SPA drivers, Linux kernel.]
IOS (RP)
• Runs Control Plane
• Generates configurations
• Maintains routing tables (RIB, FIB…)
Chassis manager (RP)
• Initialization of RP processes
• Initialization of installed cards
• Detects and manages OIR of cards
• Manages system status, environmentals, power, EOBC
Forwarding manager (RP)
• Provides abstraction layer between hardware & IOS
• Manages ESP redundancy
• Maintains copy of FIB and interface list
• Communicates FIB status to active & standby ESP
QFP client / driver (ESP)
• Programs QFP forwarding plane and QFP DRAM
• Statistics collection & RP communication
Forwarding manager (ESP)
• Communicates with forwarding manager on RP
• Maintains copy of FIBs
• Provides interface to QFP client & driver
SPA driver (SIP)
• Driver software for SPA interface cards is loaded independently
• Failure or upgrade of a driver does not affect other SPAs in the chassis
ASR1000 Chassis
                  ASR 1001        ASR 1002        ASR 1002-X      ASR 1004        ASR 1006        ASR 1013
Expansion slots   1 SPA slot      3 SPA slots     3 SPA slots     8 SPA slots     12 SPA slots    24 SPA slots
                                                                  (2 SIP cards)   (3 SIP cards)   (6 SIP cards)
RP Slots          Integrated      Integrated      Integrated      1               2               2
ESP Slots         Integrated      1               Integrated      1               2               2
SIP Slots         Integrated      Integrated      Integrated      2               3               6
IOS Redundancy    Software        Software        Software        Software        Hardware        Hardware
Built-In Ethernet 4 GE            4 GE            6 GE            N/A             N/A             N/A
Height            1.75” (1RU)     3.5” (2RU)      3.5” (2RU)      7” (4RU)        10.5” (6RU)     22.7” (13RU)
Bandwidth         2.5 to 5 Gbps   5 to 10 Gbps    5 to 36 Gbps    10 to 40 Gbps   10 to 100 Gbps  40 to 200 Gbps
Max Output Pwr    400W            470W            470W            765W            1275W           3200W
Airflow           Front to back   Front to back   Front to back   Front to back   Front to back   Front to back
What and How to Monitor - System Bootup
Where an image can be booted from
• An ASR 1000 image can be booted from
1. Bootflash (best practice; supported on all chassis/RPs)
2. Harddisk (intended for storage purposes)
3. External USB
– The only officially supported USB device is MEMUSB-1024FT; a non-Cisco USB device can result in a kernel crash
– Once an image has booted from USB, the device cannot be removed; removing it can result in a kernel crash
– Best practice is to use USB to copy the image to bootflash and then boot from bootflash
4. TFTP
                         ASR1001        ASR1002        ASR1002-X          RP1            RP2
Built-in eUSB Bootflash  8GB            8GB            8GB                1GB            2GB
Harddisk                 N/A            N/A            160GB (optional)   40GB           80GB
External USB             MEMUSB-1024FT  MEMUSB-1024FT  MEMUSB-1024FT      MEMUSB-1024FT  MEMUSB-1024FT
• Mastership determines which RP becomes RPact (and which RP becomes RPsby)
• For R0/R1 and F0/F1, whichever boots up first becomes the master; if both boot up simultaneously, R0/F0 is preferred over R1/F1 as master.
• The status of each ASR 1000 hardware component is kept in the RP's chassis manager process (CMRP)
ASR 1000 Initialization Sequence
RP
1. POST
2. HW Initialization
3. Initialize EOBC
4. Boot Kernel
5. Start IOS
6. CMRP detects cards via CPLD
7. CMRP determines Master RP and ESP
8. CMRP informs SIPs & ESP about Master via I2C
9. CMRP downloads SIP & ESP software packages to SIP / ESP
10. CMRP sends ESI config to CMSIP and CMESP
ESP
1. POST
2. HW Initialization
3. Initialize EOBC
4. Wait for RP Master
5. Detect RPact via ROMMON
6. Upload inventory via CPLD
7. ROMMON downloads software package
8. Boot Kernel
9. CMESP registers with CMRP
10. CMESP starts QFP
11. CMESP signals ready to RP
12. CMESP sends ESI link status
SIP
1. POST
2. HW Initialization
3. Initialize EOBC
4. Wait for RP Master
5. Detect RPact via ROMMON
6. Upload inventory via CPLD
7. ROMMON downloads software package
8. Boot Kernel
9. CMSIP registers with CMRP
10. CMSIP starts IOS-XE for SPAs
11. CMSIP sends ESI link status
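The mastership rule on this slide (first to boot wins, slot 0 wins a tie) can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the `Card` type, its `boot_time` field and the `elect_master` function are hypothetical names, not part of IOS XE.

```python
from dataclasses import dataclass


@dataclass
class Card:
    slot: str         # "R0"/"R1" for RPs, "F0"/"F1" for ESPs
    boot_time: float  # hypothetical: seconds the card needed to finish booting


def elect_master(first: Card, second: Card) -> Card:
    """Pick the master per the slide: earliest boot wins, slot 0 wins a tie."""
    if first.boot_time != second.boot_time:
        return first if first.boot_time < second.boot_time else second
    # Simultaneous boot: R0/F0 is preferred over R1/F1.
    return first if first.slot.endswith("0") else second


print(elect_master(Card("R0", 42.0), Card("R1", 40.0)).slot)
print(elect_master(Card("F1", 40.0), Card("F0", 40.0)).slot)
```

The first call models R1 finishing its boot before R0, the second models a simultaneous boot of both ESPs.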
Display module status
• To check the status of each module, use the show platform command.
• This command reports the current state of every module
• A syslog message is also generated when a module's status changes
ASR1000# show platform
Chassis type: ASR1006
Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
1         ASR1000-SIP10       ok                    6d17h
 1/0      SPA-1X10GE-L-V2     ok                    6d17h
 1/1      SPA-8X1GE-V2        ok                    6d17h
2         ASR1000-SIP10       ok                    6d17h
 2/0      SPA-1X10GE-L-V2     ok                    6d17h
 2/1      SPA-8X1GE-V2        ok                    6d17h
R0        ASR1000-RP1         ok, active            6d17h
R1        ASR1000-RP1         ok, standby           6d17h
F0        ASR1000-ESP10       ok, active            6d17h
F1        ASR1000-ESP10       ok, standby           6d17h
P0        ASR1006-PWR-DC      ok                    6d17h
P1        ASR1006-PWR-DC      ps, fail              6d17h
Jun 26 07:35:09.169 UTC: %IOSXE_PEM-1-PEMFAIL: The PEM in slot 1 is switched off or encountering a failure condition
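For automated checks, the show platform output can be scanned for any module whose state does not start with "ok". The Python below is a minimal sketch, assuming the output has already been collected as text and that columns are separated by whitespace as in the sample; `failed_modules` and `SAMPLE` are illustrative names.

```python
import re

# Trimmed copy of the output above; exact column spacing is an assumption.
SAMPLE = """\
Chassis type: ASR1006
Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
1         ASR1000-SIP10       ok                    6d17h
R0        ASR1000-RP1         ok, active            6d17h
P1        ASR1006-PWR-DC      ps, fail              6d17h
"""

# slot, card type, state (may contain spaces), insert time
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s+(\S+)$")


def failed_modules(show_platform):
    """Return (slot, type, state) for every module not in an 'ok' state."""
    problems = []
    for line in show_platform.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        slot, card, state, _inserted = match.groups()
        if slot == "Slot" or slot.startswith("-"):
            continue  # header and separator rows
        if not state.startswith("ok"):
            problems.append((slot, card, state))
    return problems


print(failed_modules(SAMPLE))
```

States such as "ok, active" and "ok, standby" pass the check; "ps, fail" or "out of service" would be reported.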
IOS XE 'show version' Display Improvement
Before XE3.10:
ASR1000# show version
Cisco IOS Software, IOS-XE Software (X86_64_LINUX_IOSD-ADVENTERPRISE-M), Version 15.1(1)S, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Mon 22-Nov-10 12:19 by mcpre
After XE3.10:
ASR1000# show version
Cisco IOS XE Software, Version 03.10.00.S - Extended Support Release
Cisco IOS Software, ASR1000 Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.3(3)S, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Thu 25-Jul-13 18:03 by mcpre
ROMmon Upgrade
• The ASR1k image has grown to more than 500MB in XE3.8; customers must upgrade to the 15.2(1r)S ROMMON release in order to boot this image.
• It is critical to meet the ROMMON release requirement so that the system and FRUs boot up successfully
– Read ROMmon Release Requirements
– Follow ROMmon upgrade procedure
ASR1000# copy ftp://asr:<password>@223.255.254.234/asr1000-rommon.152-1r.S.pkg bootflash:
Accessing ftp://*****:*****@223.255.254.234/asr1000-rommon.152-1r.S.pkg...
Loading asr1000-rommon.152-1r.S.pkg !!!!!
[OK - 1253680/4096 bytes]
1253680 bytes copied in 0.716 secs (1750950 bytes/sec)
ASR1000# upgrade rom-monitor filename bootflash:asr1000-rommon.152-1r.S.pkg all
Chassis model ASR1001 has a single rom-monitor.
Upgrade rom-monitor
Target copying rom-monitor image file
File /tmp/rommon_upgrade/latest.bin is a FIPS ROMMON image
65536+0 records in 65536+0 records out
65536+0 records in 65536+0 records out
Checking upgrade image...
1114112+0 records in 2176+0 records out
Upgrade image MD5 signature is fe18056d332dced800d0632a0f629675
65536+0 records in 65536+0 records out
65536+0 records in 65536+0 records out
65536+0 records in 65536+0 records out
Burning upgrade partition...
1114112+0 records in 1114112+0 records out
Checking upgrade partition...
1114112+0 records in 1114112+0 records out
Upgrade flash partition MD5 signature is fe18056d332dced800d0632a0f629675
ROMMON upgrade complete.
To make the new ROMMON permanent, you must restart the RP
ASR1000# reload
SPA Field-Programmable Devices (FPD)
• Problem: when a SPA is reused from another Cisco platform, its existing FPD image may be incompatible with the ASR1k image, leaving the SPA “out of service”.
*Sep 10 03:30:47.921: %SPA_OIR-3-SPA_POWERED_OFF: subslot 0/0: SPA 1xOC3 ATM SPA powered off after 5 failures within 1200 seconds
*Sep 10 03:30:47.921: %SPA_OIR-6-OFFLINECARD: SPA (SPA-1XOC3-ATM-V2) offline in subslot 0/0
*Sep 10 03:30:47.913: %ATMSPA-3-HW_ERROR: SIP0/0: SPA-1XOC3-ATM-V2[0/0] Error 0x1C53 SPI4 initialization failed
ASR1000#show platform
Chassis type: ASR1006
Slot Type State Insert time (ago)
--------- ------------------- --------------------- -----------------
0 ASR1000-SIP40 ok 00:03:31
0/1 SPA-1XOC3-ATM-V2 out of service 00:00:55
R0 ASR1000-RP2 ok, active 00:03:31
F0 ASR1000-ESP40 ok, active 00:03:31
P0 ASR1006-PWR-AC ok 00:03:15
P1 ASR1006-PWR-AC ps, fail 00:03:15
ASR1000#show hw-module subslot all fpd
==== ====================== ====== =============================================
H/W Field Programmable Current Min. Required
Slot Card Type Ver. Device: "ID-Name" Version Version
==== ====================== ====== ================== =========== ==============
0/1 SPA-1XOC3-AT<DISABLED> 1.80 ???????????? ?.? ?.?
==== ====================== ====== =============================================
FPD – Cont’d
• How to fix it: follow the FPD upgrade procedure
ASR1000# upgrade hw-module subslot 0/1 fpd bundled
% Cannot get FPD version information from SPA-1XOC3-ATM-V2 in subslot 0/1.
If a previous upgrade attempt on the target card was interrupted, then the corruption of FPD image might have prevented the card from coming online. If this is the case, then a recovery upgrade would be required to fix the failure
(Hit ENTER to proceed with recovery upgrade operation) [confirm] Y
% The following FPD will be upgraded for SPA-1XOC3-ATM-V2 (H/W ver = 1.80) in subslot 0/1:
================== =========== =========== ============
Field Programmable Current Upgrade Estimated
Device: "ID-Name" Version Version Upgrade Time
================== =========== =========== ============
1-I/O FPGA ?.? 2.2 00:07:00
================== =========== =========== ============
% NOTES:
- Use 'show upgrade fpd progress' command to view the progress of the FPD upgrade.
- Since the target card is currently in disabled state, it will be automatically reloaded after the upgrade operation for the changes to take effect.
% Do you want to perform the recovery upgrade operation? [no]: yes
% Starting recovery upgrade operation in the background ...
(Use "show upgrade fpd progress" command to see upgrade progress)
ASR1000#
*Sep 9 22:44:10.604: %FPD_MGMT-6-UPGRADE_TIME: Estimated total FPD image upgrade time for SPA-1XOC3-ATM-V2 card in subslot 0/1 = 00:07:00.
*Sep 9 22:44:10.873: %FPD_MGMT-6-UPGRADE_START: I/O FPGA (FPD ID=1) image upgrade in progress for SPA-1XOC3-ATM-V2 card in subslot 0/1. Updating to version 2.2. PLEASE DO NOT INTERRUPT DURING THE UPGRADE PROCESS (estimated upgrade completion time = 00:07:00) ...
ASR1000# show upgrade fpd progress
FPD Image Upgrade Progress Table:
==== =================== ==================================================== Approx.
Field Programmable Time Elapsed
Slot Card Type Device : "ID-Name" Needed Time State
==== =================== ================== ========== ========== ===========
0/1 SPA-1XOC3-ATM-V2 1-I/O FPGA 00:07:00 00:02:52 Updating...
==== =================== ====================================================
What and How to Monitor - Management Interface & Features
Mgmt Interface
• The ASR 1000 has an out-of-band management GE interface attached to the RP
• This interface sits in a default VRF, Mgmt-intf, which cannot be removed or changed
• Many management features need to be configured with the vrf option or with Gig0 as the source interface: tftp, ntp, snmp, syslog, tacacs/radius
!!!! ntp
ntp server vrf Mgmt-intf 10.1.1.1
!!!! logging
logging host 10.1.1.1 vrf Mgmt-intf
!!!! domain name assignment
ip domain-name vrf Mgmt-intf ntt.com
!!!! DNS service
ip name-server vrf Mgmt-intf 5.20.1.2
!!!! tftp
ip tftp source-interface GigabitEthernet0
!!!! radius server
aaa group server radius foo
 ip vrf forwarding Mgmt-intf
!!!! tacacs+ server
aaa group server tacacs+ bar
 ip vrf forwarding Mgmt-intf
!!!! snmp
snmp-server source-interface traps gigabitEthernet 0
!!!! FTP service
ip ftp source-interface gigabitEthernet 0
Mgmt Interface – cont’d
• There are a few exceptions: Flexible NetFlow export and NAT/FW High Speed Logging (HSL).
• These are exported directly by the QFP.
• HSL: the ASR 1000 exports NetFlow v9-like records to an external collector for session creation/deletion events with 5-tuples.
• HSL export rate: ~78k events/sec
• Supported HSL collectors: Isarflow, Lancope, ActionPacked.
CDP, NTP
• CDP
1. On other Cisco routing platforms, CDP is enabled by default.
2. On ASR1k, CDP is disabled by default at both the global and interface level; it must be turned on explicitly.
• NTP: when migrating from the 7200, the ntp update-calendar command is not needed on ASR1k
1. On ASR 1000, NTP runs within the IOS daemon (IOSd), which updates the time in the Linux kernel.
2. Because the Linux kernel updates the hardware clock every 11 minutes, NTP does not interact with the hardware clock directly.
What and How to Monitor - Facility & Environment
ASR1000 PEM (Power Entry Module) = P/S + Integrated FANs
• P/S Failure:
• The power supplies are redundant. If the P/S in one PEM fails, the system continues to function on the redundant P/S in the second PEM
• Failure of a P/S does not affect the fans; the fans draw 12 VDC from the backplane, just like the SIPs, and continue to function
• FAN Failure:
• A single fan failure has no impact on the other fans in the PEM
• On a multi-fan failure, a critical alarm is generated. The system continues to run, and its behavior depends on where the fan failures occurred.
• For example, if a single fan failed in each of PEM1 and PEM2, the system would run without issues. But if two fans failed in PEM2 (or PEM1), insufficient cooling could eventually result in unpredictable system behavior (most likely a card would stop working).
Facility & Environment Monitoring
• Facilities & Environment can be monitored via
1. SNMP: CISCO-ENTITY-FRU-CONTROL-MIB to monitor FRU status, CISCO-ENTITY-ALARM-MIB to monitor power supplies and fans, CISCO-ENTITY-SENSOR-MIB to monitor sensors
2. Show commands
• Configure the following CLIs to generate the traps
snmp-server enable traps fru-ctrl
snmp-server enable traps alarms
• Recommended traps to monitor
cefcModuleStatusChange
cefcPowerStatusChange
cefcFRUInserted
cefcFRURemoved
entConfigChange
entSensorThresholdNotification
ASR1000# show facility-alarm status
System Totals Critical: 1 Major: 1 Minor: 0
Source Severity Description [Index]
------ -------- -------------------
Cisco ASR1004 AC Power Sup Critical Power Supply Failure [0]
SPA subslot 0/1 MAJOR Unknown state [0]
ASR1000# show environment all | inc R0
V1: VMA R0 Normal 1201 mV
V1: VMB R0 Normal 2495 mV
V1: VMC R0 Normal 3295 mV
V1: VMD R0 Normal 2495 mV
V1: VME R0 Normal 1796 mV
V1: VMF R0 Normal 1528 mV
…
Temp: Outlet R0 Normal 28 Celsius
Temp: CPU AIR R0 Normal 30 Celsius
Temp: Inlet R0 Normal 21 Celsius
Temp: SCBY AIR R0 Normal 41 Celsius
Temp: MCH DIE R0 Normal 48 Celsius
Temp: MCH AIR R0 Normal 36 Celsius
Temp: C2D C0 R0 Normal 32 Celsius
Temp: C2D C1 R0 Normal 32 Celsius
Facility & Environment Monitoring – cont’d
• Before using CISCO-ENTITY-SENSOR-MIB to monitor the environment, first use ENTITY-MIB to find the entPhysicalDescr index of the sensor
• Then search CISCO-ENTITY-SENSOR-MIB for the required data under that index. For example, to poll the RP CPU temperature, look for ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR
[root@shmcp-lnx-1 ~]# snmpwalk -v 2c -c public 5.28.28.10 1.3.6.1.2.1.47.1.1.1.1.2 | more
ENTITY-MIB::entPhysicalDescr.1 = STRING: Cisco ASR1013 Chassis
ENTITY-MIB::entPhysicalDescr.2 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.3 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.4 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.5 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.6 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.7 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.8 = STRING: RP Slot
ENTITY-MIB::entPhysicalDescr.9 = STRING: RP Slot
ENTITY-MIB::entPhysicalDescr.10 = STRING: FP Slot
ENTITY-MIB::entPhysicalDescr.11 = STRING: FP Slot
ENTITY-MIB::entPhysicalDescr.12 = STRING: Power Supply Bay
ENTITY-MIB::entPhysicalDescr.13 = STRING: Cisco ASR1013 AC Power Supply
ENTITY-MIB::entPhysicalDescr.14 = STRING: PEM Iout
ENTITY-MIB::entPhysicalDescr.15 = STRING: PEM Vout
ENTITY-MIB::entPhysicalDescr.16 = STRING: PEM Vin
ENTITY-MIB::entPhysicalDescr.17 = STRING: Temp: PEM
ENTITY-MIB::entPhysicalDescr.18 = STRING: Temp: FC
ENTITY-MIB::entPhysicalDescr.23 = STRING: Power Supply
ENTITY-MIB::entPhysicalDescr.24 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.25 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.26 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.32 = STRING: Power Supply Bay
ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR
[root@shmcp-lnx-1 ~]# snmpwalk -v 2c -c public 5.28.28.10 1.3.6.1.4.1.9.9.91 | grep 8022
CISCO-ENTITY-SENSOR-MIB::entSensorValue.8022 = INTEGER: 30
CISCO-ENTITY-SENSOR-MIB::entSensorStatus.8022 = INTEGER: ok(1)
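The two-step lookup (description to index, then index to sensor value) can be scripted against saved snmpwalk output. This is a minimal Python sketch, assuming the walks were captured as text in the format shown above; `find_indexes` and `sensor_value` are illustrative helper names, not part of any SNMP library.

```python
import re

# Excerpts of the two walks shown above.
ENTITY_WALK = """\
ENTITY-MIB::entPhysicalDescr.8 = STRING: RP Slot
ENTITY-MIB::entPhysicalDescr.17 = STRING: Temp: PEM
ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR
"""
SENSOR_WALK = """\
CISCO-ENTITY-SENSOR-MIB::entSensorValue.8022 = INTEGER: 30
CISCO-ENTITY-SENSOR-MIB::entSensorStatus.8022 = INTEGER: ok(1)
"""


def find_indexes(entity_walk, descr):
    """Step 1: return every entPhysical index whose description equals descr."""
    pattern = re.compile(r"entPhysicalDescr\.(\d+) = STRING: (.*)$")
    found = []
    for line in entity_walk.splitlines():
        match = pattern.search(line)
        if match and match.group(2).strip() == descr:
            found.append(int(match.group(1)))
    return found


def sensor_value(sensor_walk, index):
    """Step 2: return entSensorValue for that index, or None if absent."""
    match = re.search(rf"entSensorValue\.{index} = INTEGER: (-?\d+)", sensor_walk)
    return int(match.group(1)) if match else None


for idx in find_indexes(ENTITY_WALK, "Temp: CPU AIR"):
    print(idx, sensor_value(SENSOR_WALK, idx))
```

A chassis with two RPs may expose the same description under more than one index, which is why step 1 returns a list.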
What and How to Monitor - System Resources used by Features
Feature <> ESP Resources Dependency
[Figure: ESP block diagram. The QFP complex (Packet Processor Engines PPE1 to PPE40, BQS, dispatcher and packet buffer) is shown with TCAM, resource DRAM (512MB), packet buffer DRAM (128MB) and Part Len / BW SRAM. Also shown are the crypto engine (Nitrox-II CN2430) with its SA table DRAM, the FECP with DDRAM, boot flash (OBFL,…), EEPROM, temperature sensor, JTAG and reset / power control, and the links to the RPs, the other ESP and the SIPs: GE at 1Gbps, I2C, SPA control, SPA bus, ESI at 11.2Gbps, SPA-SPI at 11.2Gbps and Hypertransport at 10Gbps.]
Feature callouts on the figure, showing what consumes each resource:
• QoS Mark/Police
• NAT sessions
• IPSec SA
• Netflow Cache
• FW hash tables
• Memory for FECP
• QFP client / driver
• QoS Class maps
• FM FP
• Statistics
• ACL ACEs copy
• NAT config objects
• IPSec/IKE SA
• NF config data
• ZB-FW config objects
• QoS Queuing
• NAT VFR re-assembly
• IPSec headers
• Class/Policy Maps: QoS, DPI, FW
• ACL/ACE, Route-map
• IPSec Security Association class groups, classes, rules
Key System Resources to Monitor - Summary
[Figure: software stack (IOS, forwarding manager on RP and ESP, QFP client, driver, datapath, SIP) annotated with the resource to watch at each layer and the command that reports it.]
• RP CPU: show proc cpu sort
• RP memory: show mem stat
• RP, ESP (FECP CPU, ESP memory) and SIP control processors: show plat software status control-processor brief
• QFP resource DRAM and pkt memory: show plat hardware qfp active infra exmem statistics
• Crypto assist: show platform hardware crypto-device utilization
• QFP: show plat hardware qfp active datapath util summary
• TCAM: show plat hardware qfp active tcam resource-manager usage
• Threshold labels on the figure: 85%, 75%, 75%, 75%
• Monitoring of each system resource is explained in detail in the following slides
Mitigation Plan When Running Out of Resources
• Before upgrading the RP, memory or ESP, customers can immediately take the following actions to reduce system utilization:
IOS/RP Memory
1. Reduce prefixes received from a peer
neighbor {ip-address} maximum-prefix <number of prefixes>
2. Turn off Software Redundancy
redundancy \ mode none
QFP Resource DRAM
1. Reduce NAT max-entries:
ip nat translation max-entries <number of entries>
nat64 translation max-entries <number of entries>
2. Reduce FW session limit:
parameter-map type inspect global \ session total <count>
3. Reduce FNF cache limit:
flow monitor M1 \ cache entries <number of entries>
Key System Resources to Monitor - IOSd CPU & Memory Utilization
• CPU load in the IOSd process
show processes cpu
• To find out which process is occupying memory in IOSd, use the traditional commands:
show memory
show memory allocating-process totals
Key System Resources to Monitor - Control CPU & Memory Utilization (1)
Load Average represents the process queue, or process contention for CPU resources.
1. On a single-core processor, an instantaneous load of “7” means that seven processes are “ready to run”, one of which is currently running.
2. On a dual-core processor, a load of “7” means that seven processes are ready to run, two of which are currently running.
• For an overview of each module's CPU load on the ASR 1000, use the following command:
ASR1000# show platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
RP0 Healthy 0.06 0.06 0.01
RP1 Healthy 0.06 0.04 0.01
ESP0 Healthy 0.01 0.00 0.00
ESP1 Healthy 0.00 0.00 0.00
SIP1 Healthy 0.04 0.03 0.01
SIP2 Healthy 0.00 0.00 0.00
Note: a sample EEM script to trigger load monitoring is on the reference slide at the end of this section.
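The load-average reading described on this slide reduces to simple arithmetic: processes ready to run minus the cores that can run them. The Python sketch below illustrates that, and parses the Load Average rows; `runnable_backlog` and `parse_load` are hypothetical helper names, and the core count per module is an input the reader must supply.

```python
# Rows copied from the Load Average section above.
SAMPLE = """\
Slot Status 1-Min 5-Min 15-Min
RP0 Healthy 0.06 0.06 0.01
ESP0 Healthy 0.01 0.00 0.00
SIP1 Healthy 0.04 0.03 0.01
"""


def runnable_backlog(load, cores):
    """Processes that are ready to run but have no core available."""
    return max(load - cores, 0.0)


def parse_load(text):
    """Map slot -> (1-min, 5-min, 15-min) load averages."""
    loads = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] != "Slot":
            slot, _status, one, five, fifteen = parts
            loads[slot] = (float(one), float(five), float(fifteen))
    return loads


print(runnable_backlog(7, 1))   # single core: six processes waiting
print(runnable_backlog(7, 2))   # dual core: five processes waiting
print(parse_load(SAMPLE)["RP0"])
```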
Key System Resources to Monitor - Control CPU & Memory Utilization (2)
• Memory utilization is represented by the following:
– Total – Total card memory
– Used – Consumed memory
– Free – Available memory
– Committed – Virtual memory committed to processes
<continued from last show command output>
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Critical 3919788 3891940 (94%) 27848 (0%) 2005100 (48%)
RP1 Healthy 3919788 1164924 (28%) 2754864 (66%) 1994212 (48%)
ESP0 Healthy 2030288 520744 (24%) 1509544 (71%) 2816620 (134%)
ESP1 Healthy 2030288 514972 (24%) 1515316 (72%) 2816356 (134%)
SIP1 Healthy 484332 311868 (59%) 172464 (32%) 262472 (50%)
SIP2 Healthy 484332 332252 (63%) 152080 (29%) 317648 (60%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
RP0 0 1.28 1.15 0.00 97.25 0.01 0.10 0.20
RP1 0 0.94 1.23 0.00 97.48 0.00 0.02 0.30
ESP0 0 0.56 0.66 0.00 98.76 0.00 0.00 0.00
ESP1 0 0.52 0.64 0.00 98.82 0.00 0.00 0.00
SIP1 0 0.47 0.45 0.00 99.04 0.00 0.01 0.00
SIP2 0 0.58 0.53 0.00 98.85 0.00 0.01 0.00
Status: Critical, Warning, Healthy. Definitions are on the reference slide at the end of this section.
Key System Resources to Monitor - Control CPU & Memory Utilization (3)
• CPU utilization is a two-second relative percentage average of the number of processes requesting CPU resources at a given time. It is represented by the following fields:
– CPU – The allocated processor
– User – Non-Linux kernel processes
– System – Linux kernel process
– Nice – Low priority processes
– Idle – Percentage of time the CPU was inactive
– IRQ – Interrupts
– SIRQ – System Interrupts
– IOwait – Percentage of time CPU was waiting for IO
• To read real-time utilization:
*The first set of values is invalid. Only the second and later cycles report valid CPU values.
ASR1000# show platform software process slot RP active monitor cycles 2 | inc Cpu|Mem
Cpu(s): 1.1%us, 1.0%sy, 0.0%ni, 97.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988416k used, 12354828k free, 202964k buffers
Swap: 0k total, 0k used, 0k free, 1414668k cached
Cpu(s): 3.8%us, 0.3%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988788k used, 12354456k free, 202964k buffers
Swap: 0k total, 0k used, 0k free, 1414796k cached
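Because the first cycle is invalid, any script that reads this output should discard it before computing utilization. A minimal Python sketch, assuming the output was captured as text in the format above (`valid_cpu_busy` is an illustrative name):

```python
import re

# The two cycles shown above.
SAMPLE = """\
Cpu(s): 1.1%us, 1.0%sy, 0.0%ni, 97.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988416k used, 12354828k free, 202964k buffers
Cpu(s): 3.8%us, 0.3%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988788k used, 12354456k free, 202964k buffers
"""


def valid_cpu_busy(output):
    """Return busy % (100 - idle) per cycle, skipping the invalid first cycle."""
    idle = [float(v) for v in re.findall(r"([\d.]+)%\s*id", output)]
    return [round(100.0 - v, 1) for v in idle[1:]]


print(valid_cpu_busy(SAMPLE))
```

With `cycles 2` only one valid sample remains; a larger cycle count gives more samples to average.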
Key System Resources to Monitor - Control CPU & Memory Utilization (4)
• To check the processes in each module, use the following command from a VTY session
• Enter “m” to sort by memory usage
*The "monitor" command does not work on the console; it works on a VTY by default.
*Do not capture the first output; let the display cycle a few times.
ASR1000# monitor platform software process fp active
Tasks: 80 total, 4 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0% us, 0.3% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2030288k total, 525260k used, 1505028k free, 21228k buffers
Swap: 0k total, 0k used, 0k free, 192024k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4750 root 20 0 645m 92m 31m S 0.7 4.6 26:36.97 cpp_cp_svr
5597 root 20 0 502m 45m 24m S 0.3 2.3 6:00.44 fman_fp_image
5737 root 20 0 16108 5732 4104 R 0.3 0.3 12:39.08 hman
7321 root 20 0 8876 2200 1712 R 0.3 0.1 0:00.03 in.telnetd
7392 binos 20 0 2496 1212 976 R 0.3 0.1 0:00.10 top
1 root 20 0 2132 632 544 S 0.0 0.0 0:10.63 init
Key System Resources to Monitor - Control CPU & Memory Utilization (5)
• CISCO-PROCESS-MIB supports the 64-bit architecture that runs on RP2/ASR1001/ASR1002-X
• CISCO-PROCESS-MIB can monitor the CPUs on the RP, ESP and SIP. Only the active RP/ESP can be monitored, not the standby.
• Here is an example:
1) Find out the index for the RP’s cpmCPUTotal1min
sw-mrrbu-nms-1-76-> getmany -v2c 9.0.0.52 cpmCPUTotalPhysicalIndex
cpmCPUTotalPhysicalIndex.2 = 7031 -> 7031 is the RP CPU physical index in the entity MIB, so use 2 as the index for the RP's cpmCPUTotal1min
2) The OID used to retrieve instance for the RP’s cpmCPUTotal1min
sw-mrrbu-nms-1-77-> getone -v2c 9.0.0.52 cpmCPUTotal1min.2
cpmCPUTotal1min.2 = 58
Please note that “cpmCPUTotal1min.2” is same as OID “1.3.6.1.4.1.9.9.109.1.1.1.1.4.2”
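The two steps above amount to a reverse lookup followed by an OID concatenation. A small Python sketch of that mapping, using the values from the example; the function names and the dictionary form of the cpmCPUTotalPhysicalIndex table are assumptions for illustration:

```python
# Base OID of cpmCPUTotal1min, as given in the note above.
CPM_CPU_TOTAL_1MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.4"


def cpm_index_for(physical_index, physical_index_table):
    """Step 1: find the cpmCPUTotal index whose physical index matches."""
    for cpm_index, phys in physical_index_table.items():
        if phys == physical_index:
            return cpm_index
    return None


def cpu_1min_oid(cpm_index):
    """Step 2: build the instance OID to poll."""
    return f"{CPM_CPU_TOTAL_1MIN}.{cpm_index}"


# cpmCPUTotalPhysicalIndex.2 = 7031, from the getmany output above
table = {2: 7031}
index = cpm_index_for(7031, table)
print(index, cpu_1min_oid(index))
```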
Key System Resources to Monitor - QFP & Resource DRAM Utilization (1)
• To display the QFP utilization, use the following command:
ASR1000# show platform hardware qfp active datapath utilization summary
CPP 0: 5 secs 1 min 5 min 60 min
Input: Total (pps) 1625349 1625340 1625345 1625345
(bps) 1708810504 1708399184 1708085344 1708039368
Output: Total (pps) 1625333 1625338 1625344 1625344
(bps) 1786828168 1786418448 1786105008 1786059008
Processing: Load (pct) 2 2 2 2
>=99% indicates crypto chip is reaching perf limit
>=97% indicates QFP chip is reaching perf limit
Key System Resources to Monitor - QFP & Resource DRAM Utilization (2)
• QFP DRAM usage can be found with the following command
ASR1000# show platform hardware qfp active infrastructure exmem statistics
QFP exmem statistics
Type: Name: DRAM, QFP: 0
Total: 1073741824
InUse: 124180480
Free: 949561344
Lowest free water mark: 949561344
Type: Name: IRAM, QFP: 0
Total: 134217728
InUse: 8134656
Free: 126083072
Lowest free water mark: 126083072
Type: Name: SRAM, QFP: 0
Total: 32768
InUse: 15088
Free: 17680
Lowest free water mark: 17680
%util = InUse/Total
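Applying %util = InUse/Total to the three pools in the output above gives the figures to compare against a threshold. A short Python sketch (the `POOLS` table simply restates the values shown; `exmem_util` is an illustrative name):

```python
# (InUse, Total) in bytes, copied from the exmem statistics above.
POOLS = {
    "DRAM": (124180480, 1073741824),
    "IRAM": (8134656, 134217728),
    "SRAM": (15088, 32768),
}


def exmem_util(in_use, total):
    """Utilization in percent: InUse / Total."""
    return 100.0 * in_use / total


for name, (in_use, total) in POOLS.items():
    print(f"{name}: {exmem_util(in_use, total):.1f}% used")
```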
Key System Resources to Monitor - QFP & Resource DRAM Utilization (3)
• Use CISCO-ENTITY-QFP-MIB to monitor QFP processing and memory utilization
bash-2.05b$ getmany 9.76 ciscoEntityQfpMIB
ceqfpUtilInputTotalPktRate.9072.1 = 0x00018cd05
ceqfpUtilInputTotalPktRate.9072.2 = 0x00018ccfc
ceqfpUtilInputTotalPktRate.9072.3 = 0x00018cd01
ceqfpUtilInputTotalPktRate.9072.4 = 0x00018cd01
ceqfpUtilInputTotalBitRate.9072.1 = 0x065da6108
ceqfpUtilInputTotalBitRate.9072.2 = 0x065d41a50
ceqfpUtilInputTotalBitRate.9072.3 = 0x065cf5060
ceqfpUtilInputTotalBitRate.9072.4 = 0x065ce9cc8
ceqfpUtilOutputTotalPktRate.9072.1 = 0x00018ccf5
ceqfpUtilOutputTotalPktRate.9072.2 = 0x00018ccfa
ceqfpUtilOutputTotalPktRate.9072.3 = 0x00018cd00
ceqfpUtilOutputTotalPktRate.9072.4 = 0x00018cd00
ceqfpUtilOutputTotalBitRate.9072.1 = 0x06a80d588
ceqfpUtilOutputTotalBitRate.9072.2 = 0x06a7a9510
ceqfpUtilOutputTotalBitRate.9072.3 = 0x06a75ccb0
ceqfpUtilOutputTotalBitRate.9072.4 = 0x06a751900
ceqfpUtilProcessingLoad.9072.1 = 2
ceqfpUtilProcessingLoad.9072.2 = 2
ceqfpUtilProcessingLoad.9072.3 = 2
ceqfpUtilProcessingLoad.9072.4 = 2
ceqfpMemoryResTotal.9072.1 = 1073741824
ceqfpMemoryResInUse.9072.1 = 124180480
ceqfpMemoryResFree.9072.1 = 949561344
ceqfpMemoryResLowFreeWatermark.9072.1 = 949561344
ceqfpMemoryResRisingThreshold.9072.1 = 97
ceqfpMemoryResFallingThreshold.9072.1 = 93
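The rate counters in this getmany output are printed in hexadecimal. Decoding them shows that they carry the same numbers as the CLI datapath utilization summary earlier in this section. A minimal Python sketch (`decode_counter` is an illustrative name):

```python
def decode_counter(value):
    """Convert a hex string such as '0x00018cd05' to an integer."""
    return int(value, 16)


# 5-second samples (index .1) from the getmany output
input_pps = decode_counter("0x00018cd05")
input_bps = decode_counter("0x065da6108")
output_pps = decode_counter("0x00018ccf5")

print(input_pps, input_bps, output_pps)
```

The decoded values, 1625349 pps and 1708810504 bps in and 1625333 pps out, equal the 5 secs column of the show platform hardware qfp active datapath utilization summary output.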
Key System Resources to Monitor - QFP & Resource DRAM Utilization (4)
• Syslog when throughput exceeds BW license (ASR1001, ASR1002-X)
Exceeding 95% threshold:
*Sep 24 10:15:14.249: %BW_LICENSE-5-THROUGHPUT_THRESHOLD_LEVEL: F0: cpp_ha: Average throughput rate had exceeded 95 percent of licensed bandwidth 10000000000 bps 1 times, sample period 300 seconds, in last 24 hours
Exceeding total bw:
Sep 24 10:42:28.450: %BW_LICENSE-4-THROUGHPUT_MAX_LEVEL: F0: cpp_ha: Average throughput rate had exceeded the total licensed bandwidth 10000000000 bps and dropped 1 times, sample period 300 seconds, in last 24 hours.
Key System Resources to Monitor - QFP & Resource DRAM Utilization (5)
• Use CISCO-LICENSE-MGMT-MIB to see which throughput license is in use
bash-2.05b$ getmany -v2c 2.0.40.25 clmgmtLicenseFeatureName
clmgmtLicenseFeatureName.7000.2.1 = adventerprise
clmgmtLicenseFeatureName.7000.2.2 = advipservices
clmgmtLicenseFeatureName.7000.2.3 = fwnat_red
clmgmtLicenseFeatureName.7000.2.4 = ipsec
clmgmtLicenseFeatureName.7000.2.5 = lawful_intr
clmgmtLicenseFeatureName.7000.2.6 = sw_redundancy
clmgmtLicenseFeatureName.7000.2.7 = throughput_10g
clmgmtLicenseFeatureName.7000.2.8 = throughput_20g
clmgmtLicenseFeatureName.7000.2.9 = throughput_36g
bash-2.05b$ getmany -v2c 2.0.40.25 clmgmtLicenseStatus
clmgmtLicenseStatus.7000.2.1 = notInUse(2)
clmgmtLicenseStatus.7000.2.2 = notInUse(2)
clmgmtLicenseStatus.7000.2.3 = notInUse(2)
clmgmtLicenseStatus.7000.2.4 = notInUse(2)
clmgmtLicenseStatus.7000.2.5 = notInUse(2)
clmgmtLicenseStatus.7000.2.6 = notInUse(2)
clmgmtLicenseStatus.7000.2.7 = inUse(3)
clmgmtLicenseStatus.7000.2.8 = notInUse(2)
clmgmtLicenseStatus.7000.2.9 = notInUse(2)
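The two getmany outputs share the same OID index, so they can be joined to find which license is in use. A minimal Python sketch, assuming the text format shown above; the function name licenses_in_use is hypothetical.

```python
def licenses_in_use(name_lines, status_lines):
    """Join feature names and statuses on the OID index; return in-use names."""

    def to_map(lines):
        table = {}
        for line in lines:
            if "=" not in line:
                continue  # skip the shell prompt / command line
            oid, value = (part.strip() for part in line.split("=", 1))
            if "." not in oid:
                continue
            # Drop the object name and keep the index, e.g. "7000.2.7".
            table[oid.split(".", 1)[1]] = value
        return table

    names = to_map(name_lines)
    statuses = to_map(status_lines)
    return [
        names[index]
        for index, status in statuses.items()
        if status.startswith("inUse") and index in names
    ]
```

With the sample output above, index 7000.2.7 is the only inUse(3) row, which corresponds to throughput_10g.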
39
ASR1000# show platform hardware qfp active tcam resource-manager usage
QFP TCAM Usage Information
80 Bit Region Information
--------------------------
Name : Leaf Region #0
Number of cells per entry : 1
Current 80 bit entries used : 0
Current used cell entries : 0
Current free cell entries : 0
160 Bit Region Information
--------------------------
Name : Leaf Region #1
Number of cells per entry : 2
Current 160 bits entries used : 6
Current used cell entries : 12
Current free cell entries : 4084
320 Bit Region Information
--------------------------
Name : Leaf Region #2
Number of cells per entry : 4
Current 320 bits entries used : 0
Current used cell entries : 0
Current free cell entries : 0
Total TCAM Cell Usage Information
----------------------------------
Name : TCAM #0 on CPP #0
Total number of regions : 3
Total tcam used cell entries : 12
Total tcam free cell entries : 524276
Threshold status : below critical limit
• QFP TCAM usage can be found with the following command:
Key System Resources to Monitor - TCAM (1)
40
Key System Resources to Monitor - TCAM (2)
• For TCAM monitoring, keep an eye on syslog:
%QFPTCAMRM-6-TCAM_RSRC_ERR: F0: QFP_sp: Allocation failed because of insufficient TCAM resources in the system
• Recommendations
1. Test out TCAM utilization before making changes
2. Always keep enough unused TCAM entries to hold at least the largest ACL on the router.
• Be aware of the TCAM deny-jump issue, often seen in NAT/FW/IPsec deployments (see the workaround and solution under TCAM Deny-Jump Issue).
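The headroom recommendation can be checked from the usage output. This is a rough Python sketch, not a definitive sizing method: it assumes each ACL entry occupies one 160-bit TCAM entry (2 cells, as in the Leaf Region #1 output), which ignores deny-jump expansion. The names tcam_free_cells and has_headroom are hypothetical.

```python
import re


def tcam_free_cells(show_output):
    """Extract 'Total tcam free cell entries' from the usage output."""
    match = re.search(r"Total tcam free cell entries\s*:\s*(\d+)", show_output)
    if match is None:
        raise ValueError("free-cell counter not found in output")
    return int(match.group(1))


def has_headroom(free_cells, largest_acl_entries, cells_per_entry=2):
    """True if the largest ACL would still fit in the free cells.

    cells_per_entry=2 is an assumption (160-bit entries); real usage
    depends on the feature and on how deny entries are expanded.
    """
    return free_cells >= largest_acl_entries * cells_per_entry
```

Running the check in the lab before a change gives an early warning that the TCAM_RSRC_ERR syslog above may follow.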
41
• Show platform hardware crypto-device utilization
Key System Resources to Monitor - Crypto Chip Utilization (1)
ASR1000# show platform hardware crypto-device utilization
Past crypto device utilization:
1 min (percentage) : 0%
(decrypt pkt): 220997
(encrypt pkt): 173747
5 min (percentage) : 0%
(decrypt pkt): 115381
(encrypt pkt): 897157
15 min (percentage) : 0%
(decrypt pkt): 3320368
(encrypt pkt): 2614638
Note: a value >=97% indicates the crypto chip is reaching its performance limit.
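The 97% guideline can be applied automatically to the command output. A minimal Python sketch, assuming the output layout shown above; the function names are hypothetical.

```python
import re

CRYPTO_LIMIT_PCT = 97  # threshold quoted on this slide


def crypto_utilization(show_output):
    """Map each averaging window ('1 min', '5 min', '15 min') to its percentage."""
    pattern = re.compile(r"(\d+ min) \(percentage\)\s*:\s*(\d+)%")
    return {window: int(pct) for window, pct in pattern.findall(show_output)}


def windows_at_limit(show_output, limit=CRYPTO_LIMIT_PCT):
    """Return the windows whose utilization is at or above the limit."""
    utilization = crypto_utilization(show_output)
    return [window for window, pct in utilization.items() if pct >= limit]
```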
42
bash-2.05b$ getmany 9.76 ciscoEntityQfpMIB
cepStatsMeasurement.9028.1.1 = Counter64: 0
cepStatsMeasurement.9028.1.5 = Counter64: 221029
cepStatsMeasurement.9028.1.6 = Counter64: 173838
cepStatsMeasurement.9028.2.1 = Counter64: 0
cepStatsMeasurement.9028.2.5 = Counter64: 1153432
cepStatsMeasurement.9028.2.6 = Counter64: 896529
cepStatsMeasurement.9028.3.1 = Counter64: 0
cepStatsMeasurement.9028.3.5 = Counter64: 3321126
cepStatsMeasurement.9028.3.6 = Counter64: 2614265
• CISCO-ENTITY-PERFORMANCE-MIB is able to monitor Crypto Chip Util
Key System Resources to Monitor - Crypto Chip Utilization (2)
43
Control-Process Health Definition (1)
Board FIELD WARNING CRITICAL FIELD WARNING CRITICAL FIELD WARNING CRITICAL
SIP10 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
SIP40 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP5 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP10 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP20 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP40 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP100 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ESP200 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
RP1 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
RP2 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ASR1001 1-MIN 8 12 5-MIN 8 12 15-MIN 10 15
ASR1002 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8
ASR1002-X 1-MIN 8 12 5-MIN 8 12 15-MIN 10 15
• In the “show platform software status control-processor brief” output on slide 30, the Load Average status can be Healthy, Warning, or Critical. This table provides the Warning and Critical thresholds for each field.
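The table can be turned into a small lookup that classifies a load average. The Python sketch below transcribes the thresholds from the table; treating the thresholds as inclusive (>=) is an assumption, since the slide does not say whether the boundary value itself counts. The name load_status is hypothetical.

```python
# (warning, critical) per averaging window, transcribed from the table above.
DEFAULT = {"1-MIN": (5, 8), "5-MIN": (5, 8), "15-MIN": (5, 8)}
FIXED_CHASSIS = {"1-MIN": (8, 12), "5-MIN": (8, 12), "15-MIN": (10, 15)}

LOAD_THRESHOLDS = {
    board: DEFAULT
    for board in (
        "SIP10", "SIP40", "ESP5", "ESP10", "ESP20", "ESP40",
        "ESP100", "ESP200", "RP1", "RP2", "ASR1002",
    )
}
LOAD_THRESHOLDS.update({"ASR1001": FIXED_CHASSIS, "ASR1002-X": FIXED_CHASSIS})


def load_status(board, window, load_avg):
    """Classify a load average as Healthy, Warning, or Critical."""
    warning, critical = LOAD_THRESHOLDS[board][window]
    if load_avg >= critical:  # assumption: threshold is inclusive
        return "Critical"
    if load_avg >= warning:
        return "Warning"
    return "Healthy"
```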
44
Control-Process Health Definition (2)
Board FIELD WARNING CRITICAL FIELD WARNING CRITICAL FIELD WARNING CRITICAL
SIP10 Committed 95% 100% MemFree 10% 5% MEMUSED 90% 95%
SIP40 Committed 95% 100% MemFree 10% 5% MEMUSED 90% 95%
ESP5 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ESP10 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ESP20 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ESP40 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ESP100 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ESP200 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
RP1 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
RP2 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ASR1001 Committed 300% 310% MemFree 10% 5% MEMUSED 90% 95%
ASR1002 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%
ASR1002-X Committed 300% 310% MemFree 10% 5% MEMUSED 90% 95%
• In the “show platform software status control-processor brief” output on slide 31, the Memory status can be Healthy, Warning, or Critical. This table provides the Warning and Critical thresholds for each field.
45
Triggered EEM Script to monitor system load
• This is a sample EEM script that monitors the RP0 one-minute load.
– A load of .70 triggers actions 1 through 5.
– Action 1 generates a log message when the script triggers.
– Action 2 enters enable mode; actions 3 through 5 run CLI commands and append their output to the bootflash:cpuinfo file.
event manager applet capture_cpu_spike
event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.24.2 get-type exact entry-op ge entry-val 69 exit-time 180 poll-interval 2
action 1.0 syslog msg "Load is high. Check bootflash:cpuinfo for details."
action 2.0 cli command "en"
action 3.0 cli command "show clock | append bootflash:cpuinfo"
action 4.0 cli command "show platform software status control-processor br | append bootflash:cpuinfo"
action 5.0 cli command "show platform software process slot rp active monitor | append bootflash:cpuinfo"
46
DoS Attack Detection and Mitigation Best Practices
• A DoS attack is an attempt to make a resource unavailable to its intended users.
1. Consumption of computational resources, such as bandwidth, or CPU cycles.
2. Disruption of configuration information, such as routing information.
3. Disruption of state information, such as unsolicited resetting of TCP sessions.
4. Obstructing the communication between the intended users and the router
• Additional targets of DoS attacks.
1. Trigger errors in packet forwarding.
2. Trigger errors in the sequencing of instructions, to force instability or lock-up.
3. Buffer starvation and/or system thrashing.
4. Crash the operating system itself
DoS Introduction
48
• Example Attack Types
1. ICMP
– SMURF
– PING
2. TCP/SYN
3. Teardrop
– Mangling packets structure/content
4. Nuke
– Rapid packet generation
DoS Introduction – cont’d
49
• Buffer Overflow messages
• Packet Memory Out of resources messages
• CPUHOG Messages
• For example:
ASR1000#show logging
Syslog logging: enabled (0 messages dropped, 18 messages rate-limited, 58 flushes, 0 overruns, xml disabled, filtering disabled)
……
Apr 9 22:12:21.399 JST: %IOSXE-2-PLATFORM: F1: cpp_cp: QFP:00 Thread:077 TS:00022029349683022400 %HAL_PKTMEM-2-OUT_OF_RESOURCES:
Check Buffer Utilization
ASR1000#show buffers
Public buffer pools:
Small buffers, 104 bytes (total 4000, permanent 4000, peak 6010 @ 3w4d):
DoS Detection (1)
50
DoS Detection (2)
Check CPU Utilization Check Process Resources
ASR1000# show processes cpu extended
Global Statistics
-----------------
5 sec CPU util 72%/26% Timestamp 3w5d
Queue Statistics
----------------
Common Process Information
-------------------------------
PID Name Prio Style
-------------------------------
443 PPPoE Discovery M New
118 ATM Periodic H New
172 Ethernet Timer C H New
173 Ethernet Msec Ti H New
CPU Intensive processes
-----------------------------------------------------------------
PID Total Exec Quant Burst Burst size Schedcall Schedcall
CPUms Count avg/max Count avg/max(ms) Count Per avg/max
-----------------------------------------------------------------
443 2523 34016 0/9 16997 0/9 17008 0/10
ASR1000# show processes 443
Process ID 443 [PPPoE Discovery Daemon], TTY 0
Memory usage [in bytes]
Holding: 822944, Maximum: 0, Allocated: 2941696, Freed: 1176616
Getbufs: 0, Retbufs: 0, Stack: 43288/48000
CPU usage
PC: 2D9A89F, Invoked: 63684786, Giveups: 31842392, uSec: 55
5Sec: 71.43%, 1Min: 78.51%, 5Min: 65.63%, Average: 0.00%
Age: 2273107279 msec, Runtime: 3564164 msec
State: Waiting for Event, Priority: Normal
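When scripting this check, the "72%/26%" figure needs to be split. The Python sketch below assumes the usual IOS convention for show processes cpu, where the first number is total CPU and the second is time spent at interrupt level; the slide does not state this for the extended output, so treat it as an assumption. The name parse_cpu_util is hypothetical.

```python
import re


def parse_cpu_util(line):
    """Split e.g. '5 sec CPU util 72%/26%' into (total, interrupt, process)."""
    match = re.search(r"CPU util\s+(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("no CPU utilization figure found")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Assumption: process-level CPU is the total minus the interrupt share.
    return total, interrupt, total - interrupt
```

In the sample above, the process-level share is then compared against the per-process figures (PID 443, PPPoE Discovery) to find the consumer.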
51
DoS Detection (3)
Check FP punt activity Check FP punt policer
ASR1000# show platform software infrastructure packet
Statistics for Punt Path activities:
19858208 total packets processed
0 minimum packet received, 2048 maximum packet received
0 minimum packet process switched, 7 maximum packet process switched
0 msec minimum clock runtime, 30 msec maximum clock runtime
0 msec minimum cpu runtime, 2 msec maximum cpu runtime
6797817 puntpath invocation, 6797817 with message invocation
FP - Punt Policer:
ASR1000# show platform hardware qfp active infrastructure punt statistics type global-drop
Global Drop Statistics
Number of global drop counters = 21
Counter ID Drop Counter Name Packets
-------------------------------------------------------------
016 PUNT_CAUSE_GLOBAL_POLICER 27117
ASR1000# show platform software punt-policer
Per Punt-Cause Policer Configuration and Packet Counters
Punt Configured (pps) Conform Packets Dropped Packets
Cause Description Normal High Normal High Normal High
---------------------------------------------------------------------------------------------
2 IPv4 Options 4000 3000 0 0 0 0
3 Layer2 control and legacy 40000 10000 1203060 2146805 0 0
4 PPP Control 2000 1000 0 0 0 0
5 CLNS IS-IS Control 2000 1000 0 0 0 0
6 HDLC keepalives 2000 1000 0 0 0 0
7 ARP request or response 2000 1000 0 68540 0 0
8 Reverse ARP request or re... 2000 1000 0 0 0 0
9 Frame-relay LMI Control 2000 1000 0 0 0 0
10 Incomplete adjacency 2000 1000 0 5 0 0
11 For-us data 40000 5000 803926 0 0 0
12 Mcast Directly Connected ... 2000 1000 0 0 0 0
13 Mcast IPv4 Options data p... 2000 1000 0 0 0 0
14 MPLS TTL expired 5120 2000 0 0 0 0
19 Mcast Internal Copy 2000 1000 0 0 0 0
20 Mcast IGMP Unroutable 2000 1000 0 0 0 0
24 Glean adjacency 2000 5000 0 35052 0 0
25 Mcast PIM signaling 2000 1000 0 0 0 0
27 ESS session control 10000 40000 0 30507493 0 288003062
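To spot the punt cause under attack, the punt-policer table can be filtered for rows with non-zero drop counters. A minimal Python sketch, assuming the eight-column row layout shown above; the name causes_with_drops is hypothetical.

```python
import re

# cause, description, then six counters: configured normal/high,
# conform normal/high, dropped normal/high.
ROW_RE = re.compile(
    r"^\s*(?P<cause>\d+)\s+(?P<desc>.+?)\s+"
    r"(?P<cfg_n>\d+)\s+(?P<cfg_h>\d+)\s+"
    r"(?P<ok_n>\d+)\s+(?P<ok_h>\d+)\s+"
    r"(?P<drop_n>\d+)\s+(?P<drop_h>\d+)\s*$"
)


def causes_with_drops(show_output):
    """Return (cause, description, dropped_normal, dropped_high) for rows with drops."""
    hits = []
    for line in show_output.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # header, separator, or command line
        dropped_normal = int(match["drop_n"])
        dropped_high = int(match["drop_h"])
        if dropped_normal or dropped_high:
            hits.append(
                (int(match["cause"]), match["desc"], dropped_normal, dropped_high)
            )
    return hits
```

In the sample output, only cause 27 (ESS session control) shows drops, which matches the PPPoE discovery activity seen in the other DoS Detection slides.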
52
DoS Detection (4)
Check FP per-cause punt
ASR1000# show platform hardware qfp active infrastructure punt statistics type per-cause clear
Global Per Cause Statistics
Per Inject Cause Statistics
Packets Packets
Counter ID Inject Cause Name Received Transmitted
--------------------------------------------------------------------------------------
000 RESERVED 0 0
001 L2 control/legacy 0 0
002 QFP destination lookup 0 0
003 QFP IPv4/v6 nexthop lookup 0 0
004 QFP generated packet 0 0
005 QFP <->RP keepalive 2 0
006 QFP Fwall generated packet 0 0
007 QFP adjacency-id lookup 0 0
008 Mcast specific inject packet 0 0
009 QFP ICMP generated packet 0 0
010 QFP/RP->QFP ESS data packet 0 0
011 SBC DTMF 0 0
012 ARP request or response 0 0
013 Ethernet OAM loopback packet 0 0
014 Ingress redirect packet 0 0
015 PPPoE discovery packet 48764 48741
016 PPPoE session packet 0 0
53
DoS Mitigation (1)
The ASR 1000 implements a global policer that rate-limits punt packets at 146484 pps/2.5Gbps. In addition, it implements a per-cause punt policer that classifies punt packets into high and normal queues based on common feature punt causes, and sets a policing threshold for each.
The per-cause policer can be seen via show platform software punt-policer
Control-Plane Policing (CoPP) is a security feature designed to protect the control plane.
[Figure: punt and inject path between ESP and RP. Labels: Linux Kernel, IOSd, CoPP, Per-punt Policer, Global Policer, Inject Logic]
The following classification criteria are supported in CoPP:
– match access-group
– match dscp
– match ip dscp
– match ip precedence
– match precedence
– match protocol arp
– match protocol ipv6
– match protocol pppoe
– match protocol pppoe-discovery
– match qos-group
– match ipv6 ACL HBH
54
DoS Mitigation (2)
Global Config
no ip source-route
ip arp gratuitous none
no ip gratuitous-arps
no ip bootp server
Interface Config
interface GigabitEthernet <num>
 description UNI facing interface
 no ip directed-broadcast
 ip verify unicast reverse-path
 ip access-group <SECURITY> in
 ip access-group <SECURITY> out
55
DoS Mitigation (3)
Control-Plane Policing - ARP
class-map match-all ARP
 match protocol arp
!
policy-map CONTROL-PLANE-POLICY
 class ARP
  police rate 1 pps burst 50 packets
   conform-action transmit
   exceed-action drop
Control-Plane Policing - TRUSTED HOSTS
class-map match-all TRUSTED-HOSTS
 match access-group name TRUSTED-HOSTS
!
ip access-list extended TRUSTED-HOSTS
 remark allow traffic for trusted monitoring and maintenance
 permit udp any eq 1701 any eq 1701
 permit icmp any any echo-reply
 permit icmp any any unreachable
 permit icmp any any time-exceeded
 ……
!
policy-map CONTROL-PLANE-POLICY
 class TRUSTED-HOSTS
  police rate 1 pps burst 100 packets
   conform-action transmit
   exceed-action transmit
Control-Plane Policing - SNMP
class-map match-all SNMP
 match access-group name SNMP
 match access-group name TRUSTED-HOSTS
!
ip access-list extended SNMP
 remark SNMP Servers
 permit udp <SNMP Server> any eq snmp
!
policy-map CONTROL-PLANE-POLICY
 class SNMP
  police rate 100 pps burst 100 packets
   conform-action transmit
   exceed-action drop
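The effect of "police rate <pps> burst <packets>" can be pictured with a generic token bucket. The Python sketch below is an illustration under that assumption only; it is not the QFP's actual policer implementation, and the class name PacketPolicer is hypothetical.

```python
class PacketPolicer:
    """Single-rate token-bucket policer counted in packets (illustrative model)."""

    def __init__(self, rate_pps, burst_packets):
        self.rate_pps = rate_pps
        self.burst = burst_packets
        self.tokens = float(burst_packets)  # the bucket starts full
        self.last = 0.0

    def offer(self, now):
        """Return 'conform' or 'exceed' for one packet arriving at time now (seconds)."""
        elapsed = max(0.0, now - self.last)
        # Refill at rate_pps, never beyond the configured burst.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return "conform"
        return "exceed"
```

With rate 100 pps and burst 100 packets (the SNMP class above), a sudden burst of 150 packets conforms for the first 100 and exceeds for the rest; with exceed-action drop, those 50 are discarded.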
56
DoS Mitigation (4)
Control-Plane Policing – IPv6 ICMP
class-map match-all IPv6-ECHO-REQUEST-REPLY
 match access-group name IPv6-ECHO-REQUEST-REPLY
!
ipv6 access-list IPv6-ECHO-REQUEST-REPLY
 permit icmp any any echo-request
 permit icmp any any echo-reply
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-ECHO-REQUEST-REPLY
  police rate 20 pps burst 20 packets
   conform-action transmit
   exceed-action drop
Control-Plane Policing – IPv6 TRUSTED HOSTS
class-map match-all IPv6-TRUSTED-HOSTS
 match access-group name IPv6-TRUSTED-HOSTS
!
ipv6 access-list IPv6-TRUSTED-HOSTS
 <IPv6 Trusted Host Addresses>
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-TRUSTED-HOSTS
  police rate 500 pps burst 1000 packets
   conform-action transmit
   exceed-action transmit
Control-Plane Policing – IPv6 Control
class-map match-all IPv6-CONTROL
 match access-group name IPv6-CONTROL
!
ipv6 access-list IPv6-CONTROL
 remark Permit NDP RA Type packets
 permit icmp any any nd-ns
 permit icmp any any nd-na
 permit icmp any any router-advertisement
 permit icmp any any router-solicitation
 ……
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-CONTROL
  police rate 200 pps burst 1000 packets
   conform-action transmit
   exceed-action drop
57
Troubleshooting Common Problems
Type of Crashes and Impact
Types of Crash | The impact | Crashinfo File Name | Core Dump File Name
IOSD Crash | The box is reloaded (single RP); IOSd switchover (dual IOSd); RP switchover (dual RPs) | crashinfo_RP_SlotNumber_00_Date-Time-Zone | hostname_RP_SlotNumber_ppc_linux_iosd-_ProcessID.core.gz
SPA Driver Crash | SPA reloaded | crashinfo_SIP_SlotNumber_00_Date-Time-Zone | hostname_SIP_SlotNumber_mcpcc-lc-ms_ProcessID.core.gz
QFP ucode Crash | ESP reload | n/a | hostname_ESP_SlotNumber_cpp-mcplo-ucode_ID.core.gz
IOS XE Process Crash | The module reloaded | n/a | hostname_FRU_SlotNumber_ProcessName_ProcessID.core.gz
Linux Kernel Crash | The module reloaded | n/a | hostname_FRU_SlotNumber_kernel.core
File Location:
Bootflash: ASR1001, 1002, 1002-X
Harddisk: ASR1004, 1006, 1013
Bootflash:core/ ASR1001, 1002, 1002-X
Harddisk:core/ ASR1004, 1006, 1013
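The naming templates in the table make it possible to sort collected files by crash type. The Python sketch below is a rough classifier built only from those templates; it assumes numeric slot numbers and process IDs, and the function name classify_crash_file and the sample file names in the usage note are hypothetical.

```python
import re

# Ordered from most to least specific; the generic process pattern is last.
CRASH_FILE_PATTERNS = [
    ("IOSd crash (crashinfo)", re.compile(r"^crashinfo_RP_\d+_00_")),
    ("SPA driver crash (crashinfo)", re.compile(r"^crashinfo_SIP_\d+_00_")),
    ("IOSd crash (core)", re.compile(r"_RP_\d+_ppc_linux_iosd-_\d+\.core\.gz$")),
    ("SPA driver crash (core)", re.compile(r"_SIP_\d+_mcpcc-lc-ms_\d+\.core\.gz$")),
    ("QFP ucode crash (core)", re.compile(r"_ESP_\d+_cpp-mcplo-ucode_\w+\.core\.gz$")),
    ("Linux kernel crash (core)", re.compile(r"_kernel\.core$")),
    ("IOS XE process crash (core)", re.compile(r"_\w+_\d+_[\w-]+_\d+\.core\.gz$")),
]


def classify_crash_file(filename):
    """Return the crash type suggested by a file name, or 'unknown'."""
    for label, pattern in CRASH_FILE_PATTERNS:
        if pattern.search(filename):
            return label
    return "unknown"
```

For example, a file named rtr_RP_0_ppc_linux_iosd-_17407.core.gz would be reported as an IOSd core dump, and rtr_RP_0_kernel.core as a Linux kernel crash.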
59
ASR 1000 Route Scale vs. Memory Allocation
RP and Physical Memory | Memory Allocated to IOSd (w/o IOSd redundancy enabled) | Memory Allocated to Kernel and other processes | IPv4 Route/FIB Scale
ASR 1001/1002-X (4GB) | 1.2GB | 2.8GB | 500K/500K
ASR 1001/1002-X (8GB) | 4GB | 4GB | 1M/1M
ASR 1001/1002-X (16GB) | 7GB | 9GB | 1M/3.5M
RP1 (4GB) | 1.7GB | 2.3GB | 1M
RP2 (8GB) | 4.2GB | 3.8GB | 1M
RP2 (16GB) | 10GB | 6GB | 4M
• Memory allocation is fixed by design and is not configurable.
• ASR 1001/1002-X memory is shared among the RP, ESP, and SIP. 8GB is recommended for Internet gateway deployments; do not turn on dual IOSd.
• Each additional ISP peering adds BGP multipaths, and each additional path results in ~20% BGP memory consumption overhead. Live deployments with 2-5 peerings have been seen on ASR1001/1002-X (8GB).
• If IOSd redundancy is turned on, the memory allocated to each IOSd is further reduced by more than half. Dual IOSd requires a minimum of 8GB.
• ASR1001 & ASR1002-X are the most common Internet gateway platforms. Customers may have mistakenly deployed them with 4GB of memory, which causes memory allocation failures (often seen in 7200 migrations).
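The ~20% per-path figure can be used for a back-of-the-envelope estimate. The Python sketch below assumes the overhead grows linearly with each additional path, which is one reading of the bullet above rather than a documented formula; the function name bgp_memory_estimate is hypothetical.

```python
def bgp_memory_estimate(single_path_mb, paths, overhead_per_path=0.20):
    """Estimate BGP memory for a given number of paths per prefix.

    Assumption: every path beyond the first adds ~20% of the
    single-path BGP memory (linear growth).
    """
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return single_path_mb * (1 + overhead_per_path * (paths - 1))
```

Comparing the result against the IOSd allocation in the table (for example 4GB on an 8GB ASR 1001/1002-X) shows how much room is left for additional peerings.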
60
Memory Upgrade on ASR1001, ASR1002-X & RP2
• Cisco ASR1001 Router Memory DIMMs
Memory PID Option | Slot 0 (U101D) | Slot 1 (U103D) | Slot 2 (U100D) | Slot 3 (U102D)
M-ASR1K-1001-4GB | 2 GB module | 2 GB module | ___ | ___
M-ASR1K-1001-8GB | 2 GB module | 2 GB module | 2 GB module | 2 GB module
M-ASR1K-1001-16GB | 4 GB module | 4 GB module | 4 GB module | 4 GB module
• Cisco ASR1002-X Router Memory DIMMs
Memory PID Option | Slot 0 (U2D0) | Slot 1 (U2D1) | Slot 2 (U1D0) | Slot 3 (U1D1)
M-ASR1002X-4GB | 2 GB module | ___ | 2 GB module | ___
M-ASR1002X-8GB | 2 GB module | 2 GB module | 2 GB module | 2 GB module
M-ASR1002X-16GB | 4 GB module | 4 GB module | 4 GB module | 4 GB module
• Cisco ASR1000-RP2 Memory DIMMs
Memory PID Option | Slot 0 (U2D0) | Slot 1 (U2D1) | Slot 2 (U1D0) | Slot 3 (U1D1)
M-ASR1K-RP2-8GB= | 2 GB module | 2 GB module | 2 GB module | 2 GB module
M-ASR1K-RP2-16GB= | 4 GB module | 4 GB module | 4 GB module | 4 GB module
4GB to 8GB upgrade: replace all DIMMs
61
TCAM Deny-Jump Issue
• Problem Description:
In ASR 1000 IPsec/FW/NAT deployments, the user may see the following message:
“%CPP_FM-3-CPP_FM_TCAM_ERROR: F0: cpp_sp: TCAM limit exceeded…”
• Error Message Explanation:
This is a protection mechanism that prevents the system from crashing with a WATCH-DOG timeout error or malloc failure.
• Root Cause Analysis:
1. The classification engine in the TCAM can only represent permit.
2. The system converts the DENY entries into PERMIT ones using a cross product.
3. The recursive nature of this conversion causes the required number of entries to “explode”.
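A toy model shows why a single deny can multiply entries. The Python sketch below rewrites "deny ip any <prefix>" followed by "permit ip <src> any" as permit-only entries by taking the complement of the denied prefix with the standard ipaddress module. It illustrates the growth only; the actual QFP cross-product algorithm is not described in this deck, and the function name permit_only_equivalent is hypothetical.

```python
import ipaddress


def permit_only_equivalent(deny_dst, permit_src):
    """Express 'deny ip any deny_dst' + 'permit ip permit_src any' with permits only.

    Returns a list of (source_network, destination_network) permit pairs.
    """
    everything = ipaddress.ip_network("0.0.0.0/0")
    denied = ipaddress.ip_network(deny_dst)
    source = ipaddress.ip_network(permit_src)
    # The complement of one prefix of length L is L disjoint prefixes.
    allowed_destinations = sorted(everything.address_exclude(denied))
    return [(source, destination) for destination in allowed_destinations]
```

Denying a single /16 destination turns one permit line into 16 permit entries in this model, and each further deny multiplies the count again, which is the "explosion" described above.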
62
TCAM Deny-Jump Issue – cont’d
• Workaround:
1. Before deploying the platform in production, apply the configuration in the lab.
2. Modify the ACLs to use multiple specific permit statements, and try to reduce or eliminate the explicit use of deny statements.
3. Use PBR to bypass NAT.
4. Use static NAT.
• Solutions:
1. IOS XE 3.10 introduced a software classification engine to handle deny-jump-like classification.
2. The system still uses the TCAM as long as it has room; if the classification does not fit in the TCAM, it switches to the software classification engine.
Original NAT Config
ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 deny ip any 129.25.0.0 0.0.255.255
 permit ip 172.19.0.0 0.0.0.255 any
VASI & PBR to bypass NAT
ip nat inside source list NAT-ACL pool NAT-POOL overload
!
interface GigabitEthernet0/0/1
 description nat inside interface
 ip address 6.1.1.1 255.255.255.0
 ip nat inside
 ip policy route-map no-NAT-rmap
!
interface vasileft1
 ip address 13.1.1.1
!
interface vasiright1
 ip address 13.1.2.1 255.255.255.0
!
ip access-list extended NAT-ACL
 permit ip 172.19.0.0 0.0.0.255 any
ip access-list extended bypass-NAT
 permit ip any 129.25.0.0 0.0.255.255
!
route-map no-NAT-rmap permit 10
 match ip address bypass-NAT
 set interface vasileft1
Original NAT Config
ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 deny ip host 172.19.1.1 any
 permit ip 172.19.0.0 0.0.0.255 any
Identity NAT
ip nat inside source static 172.19.1.1 172.19.1.1 no-alias
ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 permit ip 172.19.0.0 0.0.0.255 any
63
NAT ADDR ALLOC FAILURE
• Problem Description:
In an ASR 1000 PAT/overload configuration, the system logs the error message:
"%NAT-6-ADDR_ALLOC_FAILURE: Address allocation failed; pool 1 may be exhausted"
• Debug information that should be gathered:
show platform hardware qfp active feature nat data pool
show platform hardware qfp active feature nat data port
show platform hardware qfp active feature nat data stat
show platform hardware qfp active feature nat data base
show ip nat translation | inc <global address of interest>
• Common Reason for Failure:
The customer has a small pool that is being consumed by non-PATable binds.
A non-PATable bind shows in 'sh ip nat trans' as a single local address associated with a single global IP address.
It consumes an entire address in the pool.
--- 213.252.7.132 172.16.254.242 ---
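Lines of this shape can be picked out of the translation table with a short script. The Python sketch below treats a row as a candidate non-PATable bind when the protocol column is "---", both inside addresses have no ":port" suffix, and the remaining columns are "---". Static NAT entries look the same, so results need a manual check against the configuration. The function name non_patable_binds is hypothetical.

```python
def non_patable_binds(show_output):
    """Return (inside_global, inside_local) pairs that hold a whole pool address."""
    binds = []
    for line in show_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "---":
            continue  # header line or a protocol-specific (PAT) entry
        inside_global, inside_local = fields[1], fields[2]
        if ":" in inside_global or ":" in inside_local:
            continue  # a port is present, so this entry is PAT-translated
        if any(field != "---" for field in fields[3:]):
            continue  # outside addresses present, not a simple bind
        binds.append((inside_global, inside_local))
    return binds
```

Counting the returned pairs against the pool size shows quickly whether such binds are what exhausts the pool.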
64
NAT ADDR ALLOC FAILURE – cont’d
• Solution 1
1. A non-PATable bind can be created by a packet with a non-PATable protocol.
2. The best way to prevent this is to tighten the ACL to exclude non-PATable protocols.
access-list 100 permit udp 13.1.0.0 0.0.255.255 any
access-list 100 permit tcp 13.1.0.0 0.0.255.255 any
access-list 100 permit icmp 13.1.0.0 0.0.255.255 any
• Solution 2
1. A non-PATable bind can also be created when an ALG such as DNS, which does not have ports in its L7 header, requests a global NAT address.
2. Customers often do not need the DNS ALG, so the solution is to turn it off.
3. The following turns off the most common ALGs that produce non-PATable binds.
no ip nat service dns udp
no ip nat service dns tcp
no ip nat service netbios-ns tcp
no ip nat service netbios-ns udp
no ip nat service netbios-ssn
no ip nat service netbios-dgm
no ip nat service ldap
65
In Service Software Upgrade with Minimal Disruptive Restart (MDR)
• In Service Software Upgrade (ISSU) is a procedure backed by Cisco IOS infrastructure to accomplish an upgrade/downgrade while packet forwarding continues
• This procedure takes advantage of redundant processors, Cisco Graceful Restart, Non Stop Routing, SSO/NSF
• Minimal Disruptive Restart (MDR) keeps interfaces up and minimizes traffic disruption during an ASR1k SIP/SPA upgrade by not resetting the hardware or reprogramming the data paths
ISSU and MDR
67
1. RPBase: RP Linux operating system
– Upgrading the OS requires a reload of the RP; minimal changes are expected
2. RPIOS: IOS executable
– Facilitates the Software Redundancy feature
3. RPAccess (K9 & non-K9): software required for router access
– Two versions available (with and without open SSH & SSL)
– Facilitates software packaging for export-restricted countries
4. RPControl: control plane processes for the IOS / hardware interface
– IOS XE middleware
5. ESPBase: all ESP code
– Any software upgrade of the ESP requires a reload of the ESP
6. SIPBase/ELCBase: SIP/ELC OS & control processes
– OS upgrade requires a reload of the SIP
7. SIPSPA/ELCSPA: SPA drivers and SPA FPD
– Facilitates SPA driver upgrade of specific SPA slots
IOS XE Modularity made ISSU/MDR Possible
[Figure: IOS XE software architecture across the RP, ESP, and SIP/ELC, annotated with sub-package numbers 1-7. Labels: IOS (active), Platform Adaptation Layer (PAL), SSH/SSL, Forwarding manager, Chassis manager, QFP client, QFP driver, SPA driver, Linux Kernel, Control messaging]
68
ASR 1000 Super-Package ISSU
[Figure: super-package ISSU sequence showing the active/standby RP, active/standby ESP, and SIP moving from Version X to Version Y, with the SIP upgraded using MDR. Step labels, in order of appearance: issu loadversion; issu runversion (switchover); issu acceptversion (stop rollback timer); issu commitversion (finalizes new file version); issu abortversion; automatic rollback or issu abortversion; hw-module slot <STBY_RP> reload]
The entire procedure can be automated with a one-shot ISSU command:
request platform software package install node file <filename> mdr
69
MDR Operation Summary
SIP/SPA/ELC Configurations | MDR Operation
SIP40 w/ GE/10GE/POS SPAs, or ASR1000-2T+20x1GE ELC, or ASR1000-6TGE ELC | MDR ISSU
SIP40 w/ mixed GE/10GE/POS SPAs and other SPAs (without force keyword in ISSU CLI) | ISSU halt with warning message
SIP40 w/ mixed GE/10GE/POS SPAs and other SPAs (with force keyword in ISSU CLI) | MDR performed for SIP40 and GE/10GE/POS SPAs; non-MDR ISSU performed for non-MDR SPAs
SIP40 without any SPAs | Non-MDR ISSU
SIP10 w/ any type of SPA | Non-MDR ISSU
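The table reads as a small decision procedure, sketched below in Python for pre-upgrade checks. The case of a SIP40 holding only non-MDR SPAs is not covered by the table; the sketch treats it like the mixed case, which is an assumption. The function name mdr_operation and the SPA type label "ATM" used in the usage note are hypothetical.

```python
MDR_SPA_TYPES = {"GE", "10GE", "POS"}
MDR_ELCS = {"ASR1000-2T+20x1GE", "ASR1000-6TGE"}


def mdr_operation(card, spa_types=(), force=False):
    """Return the ISSU behaviour the table lists for a card and its SPAs."""
    if card in MDR_ELCS:
        return "MDR ISSU"
    if card == "SIP10":
        return "Non-MDR ISSU"
    if card != "SIP40":
        raise ValueError("card type not covered by the table: " + card)
    installed = set(spa_types)
    if not installed:
        return "Non-MDR ISSU"
    if installed <= MDR_SPA_TYPES:
        return "MDR ISSU"
    # At least one non-MDR SPA is present (assumed to behave as 'mixed').
    if not force:
        return "ISSU halt with warning message"
    return "MDR for SIP40 and GE/10GE/POS SPAs; non-MDR ISSU for other SPAs"
```

For example, mdr_operation("SIP40", ["GE", "ATM"]) reports that ISSU halts with a warning unless the force keyword is used.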
70
MDR Traffic Loss Observations
Features | Traffic Loss | Configuration
IPv4 w/ OSPF | 0.02msec | 1 OSPF neighbor; traffic rate @1mpps
IPv6 w/ OSPFv3 | 0.14msec | OSPFv3 PE-CE; traffic rate @142kpps
ISISv4 | 0.014msec | 1 IS-IS neighbor; traffic rate @1.4mpps
IPv4 Multicast | 20msec | 1k IGMP group join; 3 OIFs per group
GRE Tunnel | 0.14msec | 1 GRE tunnel
eBGP with MPLS VPN | 0.0176msec | 4k eBGP sessions; 1M VPNv4 prefix
EVC | 0.019msec | 1k EFPs; traffic rate @4.2mpps
EoMPLS | 0.0168msec | 1 EoMPLS VC
VPLS | 0.026msec | 4k EFPs; traffic rate @1.08mpps
PTA | 0.285msec | 90 PPPoE session; traffic rate @710kpps
LNS | 0.042msec | 10 PPP sessions
71
ISSU MDR Demo
Summary and Take Away
Operating an ASR 1000 Summary and Take Away …
• To improve network reliability and operation simplicity
1. Follow the proper system operation procedure
2. Know features and resources dependency
3. Proactively monitor key system resources
4. Adopt best practices and implement recommendations
Proactive Monitoring: Crypto, QFP, IOS CPU, DRAM, TCAM, RP CPU, RP Mem, IOS Mem, FECP Mem
74
Relevant Sessions at Cisco Live 2014
Breakout Sessions
• BRKARC-2001 Cisco ASR1000 Series Routers: System & Solution Architectures
• BRKARC-2021 - IOS XE Advanced Troubleshooting (NAT, VPN, FW packet forwarding)
• BRKCRS-3147 Advanced troubleshooting of the ASR1K and ISR 4451-X made easy
75
Thank you.