Page 1: Operating an ASR 1000 - d2zmdbbm9feqrf.cloudfront.netd2zmdbbm9feqrf.cloudfront.net/2014/usa/pdf/BRKARC-2019.pdf · Operating an ASR 1000 BRKARC-2019 Jason Yang – CCIE #10467

Operating an ASR 1000

BRKARC-2019

Jason Yang – CCIE #10467

Technical Marketing Engineer


© 2014 Cisco and/or its affiliates. All rights reserved. BRKARC-2019 Cisco Public

Session Goals

• With more than 100,000 ASR 1000 chassis deployed in the field, customers are asking for recommendations and best practices on how to operate this platform effectively in their networks.

• This session will share

1. What to monitor on the ASR 1000 during daily operation and how to monitor it, including best practices and recommendations.

2. How to detect DoS attacks on the ASR 1000, and mitigation best practices.

3. The most common problems seen in the field, with the workarounds and solutions that prevent them from happening.

ASR 1000 ISSU with MDR Demo!


Agenda

• Platform Introduction

• What and How to Monitor System in Daily Operation

• DoS Attack Detection and Mitigation Best Practices

• Troubleshooting Common Problems

• In Service Software Upgrade with Minimal Disruptive Restart (Demo)

• Summary and Take Away


Platform Introduction


ASR1000 Building Blocks

[Figure: block diagram of the ASR 1000 building blocks connected through the midplane. ESP blocks: FECP, QFP (PPE, BQS), crypto assist, interconnect. RP blocks: CPU, GE switch, interconnect. SIP blocks: IOCP, SPA aggregation, interconnect, SPAs. Callouts: the Route Processor handles the control plane and manages the system; the Embedded Service Processor handles forwarding plane traffic; the SPA Interface Processor houses SPAs and buffers packets in and out.]

• Route Processor (RP)
  – Handles control plane traffic
  – Manages system

• Embedded Service Processor (ESP)
  – Handles forwarding plane traffic

• SPA Interface Processor (SIP)
  – Shared Port Adapters provide interface connectivity

• Centralized Forwarding Architecture
  – All traffic flows through the active ESP; the standby ESP is synchronized with all flow state over a dedicated 10-Gbps link

• Distributed Control Architecture
  – All major system components have a powerful control processor dedicated to the control and management planes


ASR 1000 Software (IOS XE) Architecture

[Figure: software architecture diagram. RP labels: IOS active, IOS standby, Platform Adaptation Layer (PAL), forwarding manager, chassis manager, Linux kernel. ESP labels: forwarding manager, chassis manager, QFP client / driver, QFP code, Linux kernel. SIP labels: chassis manager, SPA drivers, Linux kernel. Control messaging connects the components.]

• Runs Control Plane

• Generates configurations

• Maintains routing tables (RIB, FIB…)

• Initialization of RP processes

• Initialization of installed cards

• Detects and manages OIR of cards

• Manages system status, environmentals, power, EOBC

• Provides abstraction layer between hardware & IOS

• Manages ESP redundancy

• Maintains copy of FIB and interface list

• Communicates FIB status to active & standby ESP

• Programs QFP forwarding plane and QFP DRAM

• Statistics collection & RP communication

• Communicates with forwarding manager on RP

• Maintains copy of FIBs

• Provides interface to QFP client & driver

• Driver Software for SPA interface cards is loaded independently

• Failure or upgrade of driver does not affect other SPAs in the chassis


ASR1000 Chassis

                      ASR 1001          ASR 1002          ASR 1002-X        ASR 1004          ASR 1006          ASR 1013
Expansion (SPA) slots 1                 3                 3                 8, using 2 SIPs   12, using 3 SIPs  24, using 6 SIPs
RP Slots              Integrated        Integrated        Integrated        1                 2                 2
ESP Slots             Integrated        1                 Integrated        1                 2                 2
SIP Slots             Integrated        Integrated        Integrated        2                 3                 6
IOS Redundancy        Software          Software          Software          Software          Hardware          Hardware
Built-In Ethernet     4 GE              4 GE              6 GE              N/A               N/A               N/A
Height                1.75” (1RU)       3.5” (2RU)        3.5” (2RU)        7” (4RU)          10.5” (6RU)       22.7” (13RU)
Bandwidth (Gbps)      2.5 to 5          5 to 10           5 to 36           10 to 40          10 to 100         40 to 200
Max Output Pwr        400W              470W              470W              765W              1275W             3200W
Airflow               Front to back     Front to back     Front to back     Front to back     Front to back     Front to back


What and How to Monitor - System Bootup


Where an image can be booted from

• An ASR 1000 image can be booted from:

1. Bootflash (best practice; supported on all chassis and RPs)

2. Harddisk (intended for storage)

3. External USB
  – The only officially supported USB device is MEMUSB-1024FT; a non-Cisco USB device can result in a kernel crash.
  – Once an image has been booted from USB, the USB device cannot be removed; removing it can result in a kernel crash.
  – The best practice is to use USB to copy the image to bootflash and then boot from bootflash.

4. TFTP

                         ASR1001           ASR1002           ASR1002-X         RP1               RP2
Built-in eUSB Bootflash  8GB               8GB               8GB               1GB               2GB
Harddisk                 N/A               N/A               160GB (optional)  40GB              80GB
External USB             MEMUSB-1024FT     MEMUSB-1024FT     MEMUSB-1024FT     MEMUSB-1024FT     MEMUSB-1024FT


• Mastership determines which RP becomes active (RPact) and which RP becomes standby (RPsby).

• For R0/R1 and F0/F1, whichever boots up first becomes the Master; if both boot up simultaneously, R0/F0 is preferred over R1/F1 as Master.

• The status of each ASR 1000 hardware component is kept in the RP's chassis management process (CMRP).
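The mastership rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule (first to boot wins, slot 0 wins a tie); the function name `elect_master` and the idea of passing boot-completion times as numbers are assumptions made for the example, not how CMRP is implemented.

```python
from typing import Optional


def elect_master(boot_time_slot0: Optional[float],
                 boot_time_slot1: Optional[float]) -> Optional[int]:
    """Return the slot number (0 or 1) that becomes Master.

    Each argument is the time at which that slot finished booting,
    or None if the slot is empty or never came up (hypothetical inputs).
    """
    if boot_time_slot0 is None and boot_time_slot1 is None:
        return None  # nothing booted, so there is no Master
    if boot_time_slot1 is None:
        return 0
    if boot_time_slot0 is None:
        return 1
    # Whichever booted first wins; on a tie, slot 0 (R0/F0) is preferred.
    return 0 if boot_time_slot0 <= boot_time_slot1 else 1


if __name__ == "__main__":
    print(elect_master(12.0, 15.5))  # slot 0 booted first
    print(elect_master(20.0, 15.5))  # slot 1 booted first
    print(elect_master(15.5, 15.5))  # simultaneous boot, slot 0 preferred
```

The same function applies to the R0/R1 pair and to the F0/F1 pair, since the slide states the same rule for both.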

ASR 1000 Initialization Sequence

RP

1. POST
2. HW Initialization
3. Initialize EOBC
4. Boot Kernel
5. Start IOS
6. CMRP detects cards via CPLD
7. CMRP determines Master RP and ESP
8. CMRP informs SIPs & ESP about Master via I2C
9. CMRP downloads SIP & ESP software packages to SIP / ESP
10. CMRP sends ESI config to CMSIP and CMESP

ESP

1. POST
2. HW Initialization
3. Initialize EOBC
4. Wait for RP Master
5. Detect RPact via ROMMON
6. Upload inventory via CPLD
7. ROMMON download software package
8. Boot Kernel
9. CMESP registers with CMRP
10. CMESP starts QFP
11. CMESP signals ready to RP
12. CMESP sends ESI link status

SIP

1. POST
2. HW Initialization
3. Initialize EOBC
4. Wait for RP Master
5. Detect RPact via ROMMON
6. Upload inventory via CPLD
7. ROMMON download software package
8. Boot Kernel
9. CMSIP registers with CMRP
10. CMSIP starts IOS-XE for SPAs
11. CMSIP sends ESI link status


Display module status

• To check the status of each module, use the show platform command.

• This command reports the current state of each module.

• A syslog message is also generated when a module's status changes.

ASR1000# show platform

Chassis type: ASR1006

Slot Type State Insert time (ago)

--------- ------------------- --------------------- -----------------

1 ASR1000-SIP10 ok 6d17h

1/0 SPA-1X10GE-L-V2 ok 6d17h

1/1 SPA-8X1GE-V2 ok 6d17h

2 ASR1000-SIP10 ok 6d17h

2/0 SPA-1X10GE-L-V2 ok 6d17h

2/1 SPA-8X1GE-V2 ok 6d17h

R0 ASR1000-RP1 ok, active 6d17h

R1 ASR1000-RP1 ok, standby 6d17h

F0 ASR1000-ESP10 ok, active 6d17h

F1 ASR1000-ESP10 ok, standby 6d17h

P0 ASR1006-PWR-DC ok 6d17h

P1 ASR1006-PWR-DC ps, fail 6d17h

Jun 26 07:35:09.169 UTC: %IOSXE_PEM-1-PEMFAIL: The PEM in slot 1 is switched off or encountering a failure condition
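For scripted health checks, the slot table printed by show platform can be parsed and filtered for modules whose state is not "ok". The sketch below is a minimal example that assumes the column layout shown above (slot, type, state, insert time, with a dashed rule before the rows); the function names and the sample text are illustrative, and collecting the output from the router is left out.

```python
from typing import Dict, List

# Sample rows copied from the show platform output on this slide.
SAMPLE_OUTPUT = """\
Chassis type: ASR1006
Slot Type State Insert time (ago)
--------- ------------------- --------------------- -----------------
1 ASR1000-SIP10 ok 6d17h
1/0 SPA-1X10GE-L-V2 ok 6d17h
R0 ASR1000-RP1 ok, active 6d17h
R1 ASR1000-RP1 ok, standby 6d17h
F0 ASR1000-ESP10 ok, active 6d17h
P0 ASR1006-PWR-DC ok 6d17h
P1 ASR1006-PWR-DC ps, fail 6d17h
"""


def parse_show_platform(output: str) -> List[Dict[str, str]]:
    """Parse the first slot table of 'show platform' into a list of rows."""
    rows: List[Dict[str, str]] = []
    in_table = False
    for raw in output.splitlines():
        line = raw.strip()
        if not in_table:
            # The dashed rule marks the start of the table body.
            if line.startswith("---"):
                in_table = True
            continue
        if not line:
            break  # a blank line ends the first table
        fields = line.split()
        if len(fields) < 4:
            continue
        rows.append({
            "slot": fields[0],
            "type": fields[1],
            # The state may contain spaces ("ok, active", "out of service").
            "state": " ".join(fields[2:-1]),
            "insert_time": fields[-1],
        })
    return rows


def unhealthy(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows whose state does not start with 'ok'."""
    return [row for row in rows if not row["state"].startswith("ok")]


if __name__ == "__main__":
    for row in unhealthy(parse_show_platform(SAMPLE_OUTPUT)):
        print(row["slot"], row["type"], row["state"])
```

Treating every state that does not start with "ok" as unhealthy is a simplification chosen for the example; states such as "out of service" or "ps, fail" are caught by it.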


IOS XE 'show version' Display Improvement

Before XE3.10:

ASR1000# show version
Cisco IOS Software, IOS-XE Software (X86_64_LINUX_IOSD-ADVENTERPRISE-M), Version 15.1(1)S, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Mon 22-Nov-10 12:19 by mcpre

After XE3.10:

ASR1000# show version
Cisco IOS XE Software, Version 03.10.00.S - Extended Support Release
Cisco IOS Software, ASR1000 Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.3(3)S, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Thu 25-Jul-13 18:03 by mcpre


ROMmon Upgrade

• The ASR1k image has grown to more than 500MB in XE3.8; customers must upgrade to the 15.2(1r)S ROMMON release in order to boot this image.

• It is critical to meet the ROMMON release requirement so that the system and FRUs boot up successfully:
  – Read the ROMmon Release Requirements
  – Follow the ROMmon upgrade procedure

ASR1000# copy ftp://asr:<password>@223.255.254.234/asr1000-rommon.152-1r.S.pkg bootflash:
Accessing ftp://*****:*****@223.255.254.234/asr1000-rommon.152-1r.S.pkg...
Loading asr1000-rommon.152-1r.S.pkg !!!!!
[OK - 1253680/4096 bytes]
1253680 bytes copied in 0.716 secs (1750950 bytes/sec)

ASR1000# upgrade rom-monitor filename bootflash:asr1000-rommon.152-1r.S.pkg all
Chassis model ASR1001 has a single rom-monitor.
Upgrade rom-monitor
Target copying rom-monitor image file
File /tmp/rommon_upgrade/latest.bin is a FIPS ROMMON image
65536+0 records in
65536+0 records out
65536+0 records in
65536+0 records out
Checking upgrade image...
1114112+0 records in
2176+0 records out
Upgrade image MD5 signature is fe18056d332dced800d0632a0f629675
65536+0 records in
65536+0 records out
65536+0 records in
65536+0 records out
65536+0 records in
65536+0 records out
Burning upgrade partition...
1114112+0 records in
1114112+0 records out
Checking upgrade partition...
1114112+0 records in
1114112+0 records out
Upgrade flash partition MD5 signature is fe18056d332dced800d0632a0f629675
ROMMON upgrade complete.
To make the new ROMMON permanent, you must restart the RP

ASR1000# reload


SPA Field-Programmable Devices (FPD)

• Problem: When a SPA is reused from another Cisco platform, its existing FPD image may be incompatible with the ASR1k image, which leaves the SPA “out of service”.

*Sep 10 03:30:47.921: %SPA_OIR-3-SPA_POWERED_OFF: subslot 0/0: SPA 1xOC3 ATM SPA powered off after 5 failures within 1200 seconds
*Sep 10 03:30:47.921: %SPA_OIR-6-OFFLINECARD: SPA (SPA-1XOC3-ATM-V2) offline in subslot 0/0
*Sep 10 03:30:47.913: %ATMSPA-3-HW_ERROR: SIP0/0: SPA-1XOC3-ATM-V2[0/0] Error 0x1C53 SPI4 initialization failed

ASR1000#show platform

Chassis type: ASR1006

Slot Type State Insert time (ago)

--------- ------------------- --------------------- -----------------

0 ASR1000-SIP40 ok 00:03:31

0/1 SPA-1XOC3-ATM-V2 out of service 00:00:55

R0 ASR1000-RP2 ok, active 00:03:31

F0 ASR1000-ESP40 ok, active 00:03:31

P0 ASR1006-PWR-AC ok 00:03:15

P1 ASR1006-PWR-AC ps, fail 00:03:15

ASR1000#show hw-module subslot all fpd

==== ====================== ====== =============================================

H/W Field Programmable Current Min. Required

Slot Card Type Ver. Device: "ID-Name" Version Version

==== ====================== ====== ================== =========== ==============

0/1 SPA-1XOC3-AT<DISABLED> 1.80 ???????????? ?.? ?.?

==== ====================== ====== =============================================


FPD – Cont’d

• How to fix it: follow the FPD upgrade procedure

ASR1000# upgrade hw-module subslot 0/1 fpd bundled
% Cannot get FPD version information from SPA-1XOC3-ATM-V2 in subslot 0/1.
If a previous upgrade attempt on the target card was interrupted, then the corruption of FPD image might have prevented the card from coming online. If this is the case, then a recovery upgrade would be required to fix the failure
(Hit ENTER to proceed with recovery upgrade operation) [confirm] Y

% The following FPD will be upgraded for SPA-1XOC3-ATM-V2 (H/W ver = 1.80) in subslot 0/1:

================== =========== =========== ============

Field Programmable Current Upgrade Estimated

Device: "ID-Name" Version Version Upgrade Time

================== =========== =========== ============

1-I/O FPGA ?.? 2.2 00:07:00

================== =========== =========== ============

% NOTES:

- Use 'show upgrade fpd progress' command to view the progress of the FPD

upgrade.

- Since the target card is currently in disabled state, it will be automatically reloaded after the upgrade operation for the changes to take effect.

% Do you want to perform the recovery upgrade operation? [no]: yes

% Starting recovery upgrade operation in the background ...

(Use "show upgrade fpd progress" command to see upgrade progress)

ASR1000#

*Sep 9 22:44:10.604: %FPD_MGMT-6-UPGRADE_TIME: Estimated total FPD image upgrade time for SPA-1XOC3-ATM-V2 card in subslot 0/1 = 00:07:00.
*Sep 9 22:44:10.873: %FPD_MGMT-6-UPGRADE_START: I/O FPGA (FPD ID=1) image upgrade in progress for SPA-1XOC3-ATM-V2 card in subslot 0/1. Updating to version 2.2. PLEASE DO NOT INTERRUPT DURING THE UPGRADE PROCESS (estimated upgrade completion time = 00:07:00) ...

ASR1000# show upgrade fpd progress
FPD Image Upgrade Progress Table:
==== =================== ====================================================
                                            Approx.
                         Field Programmable Time       Elapsed
Slot Card Type           Device : "ID-Name" Needed     Time       State
==== =================== ================== ========== ========== ===========
0/1  SPA-1XOC3-ATM-V2    1-I/O FPGA         00:07:00   00:02:52   Updating...
==== =================== ====================================================


What and How to Monitor - Management Interface & Features


Mgmt Interface

• The ASR 1000 has an out-of-band management GE interface attached to the RP.

• This interface sits in a default management VRF (Mgmt-intf), which cannot be removed or changed.

• Many management features need to be configured with VRF options or with Gig0 as the source interface: tftp, ntp, snmp, syslog, tacacs/radius

!!!! ntp
ntp server vrf Mgmt-intf 10.1.1.1
!!!! logging
logging host 10.1.1.1 vrf Mgmt-intf
!!!! domain name assignment
ip domain-name vrf Mgmt-intf ntt.com
!!!! DNS service
ip name-server vrf Mgmt-intf 5.20.1.2
!!!! tftp
ip tftp source-interface GigabitEthernet0
!!!! radius server
aaa group server radius foo
 ip vrf forwarding Mgmt-intf
!!!! tacacs+ server
aaa group server tacacs+ bar
 ip vrf forwarding Mgmt-intf
!!!! snmp
snmp-server source-interface traps gigabitEthernet 0
!!!! FTP service
ip ftp source-interface gigabitEthernet 0


Mgmt Interface – cont’d

• There are a few exceptions: Flexible NetFlow export and NAT/FW High Speed Logging (HSL).

• These are exported directly by the QFP.

• HSL: the ASR 1000 exports NetFlow v9-like records to an external collector for session creation/deletion events with 5-tuples.

• HSL export rate: ~78k events/sec

• HSL supported collectors: Isarflow, Lancope, ActionPacked.


CDP, NTP

• CDP

1. On other Cisco routing platforms, CDP is enabled by default.

2. On ASR1k, CDP is disabled by default at both the global and the interface level; it must be turned on explicitly.

• NTP: for 7200 migration, the ntp update-calendar command is not needed on ASR1k

1. On ASR 1000, NTP runs within the IOS daemon (IOSd), which updates the time on the Linux kernel.

2. The Linux kernel updates the hardware clock every 11 minutes, so NTP does not interact with the hardware clock directly.


What and How to Monitor - Facility & Environment


ASR1000 PEM (Power Entry Module) = P/S + Integrated FANs

• P/S Failure:

• The power supplies are redundant. If one of the P/S in a PEM fails, then the system will continue to function with the redundant P/S in the second PEM

• Failure of a P/S does not affect the FANs; The fans source 12 VDC from the backplane, just like the SIPs and will continue to function

• FAN Failure:

• A single fan failure has no impact on the other fans in the PEM

• On multi fan failure a critical alarm will be generated. The system will continue to run and the behavior would be based on where the fan failure occurred.

• For example, if a single fan failed in PEM1 and PEM2 the system would run without issues. But if 2 fans failed in PEM2 (or PEM1) it’s possible that insufficient cooling would eventually result in unpredictable system behavior (most likely a card would stop working).


Facility & Environment Monitoring

• Facilities & environment can be monitored via

1. SNMP: CISCO-ENTITY-FRU-CONTROL-MIB to monitor FRU status, CISCO-ENTITY-ALARM-MIB to monitor power supplies and fans, CISCO-ENTITY-SENSOR-MIB to monitor sensors

2. Show commands

• Configure the following CLIs to generate the traps:

snmp-server enable traps fru-ctrl
snmp-server enable traps alarms

• Recommended traps to monitor:

cefcModuleStatusChange
cefcPowerStatusChange
cefcFRUInserted
cefcFRURemoved
entConfigChange
entSensorThresholdNotification

ASR1000# show facility-alarm status

System Totals Critical: 1 Major: 1 Minor: 0

Source Severity Description [Index]

------ -------- -------------------

Cisco ASR1004 AC Power Sup Critical Power Supply Failure [0]

SPA subslot 0/1 MAJOR Unknown state [0]

ASR1000# show environment all | inc R0

V1: VMA R0 Normal 1201 mV

V1: VMB R0 Normal 2495 mV

V1: VMC R0 Normal 3295 mV

V1: VMD R0 Normal 2495 mV

V1: VME R0 Normal 1796 mV

V1: VMF R0 Normal 1528 mV

Temp: Outlet R0 Normal 28 Celsius

Temp: CPU AIR R0 Normal 30 Celsius

Temp: Inlet R0 Normal 21 Celsius

Temp: SCBY AIR R0 Normal 41 Celsius

Temp: MCH DIE R0 Normal 48 Celsius

Temp: MCH AIR R0 Normal 36 Celsius

Temp: C2D C0 R0 Normal 32 Celsius

Temp: C2D C1 R0 Normal 32 Celsius
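When show environment all is collected by a script, each sensor line can be split from the right into sensor name, location, state, reading, and unit, which copes with sensor names that contain spaces. The sketch below assumes the line layout shown above; the function names are illustrative, and filtering on the state "Normal" is an assumption based on the sample output.

```python
from typing import Dict, List, Union

# Sample lines copied from the show environment output on this slide.
SAMPLE_OUTPUT = """\
V1: VMA R0 Normal 1201 mV
Temp: Inlet R0 Normal 21 Celsius
Temp: CPU AIR R0 Normal 30 Celsius
Temp: C2D C0 R0 Normal 32 Celsius
"""

Reading = Dict[str, Union[str, int]]


def parse_environment(output: str) -> List[Reading]:
    """Parse sensor lines of the form '<sensor> <location> <state> <value> <unit>'."""
    readings: List[Reading] = []
    for raw in output.splitlines():
        # Split from the right so multi-word sensor names stay in one piece.
        parts = raw.strip().rsplit(None, 4)
        if len(parts) != 5:
            continue  # headers and other lines are skipped
        sensor, location, state, value, unit = parts
        if not value.lstrip("-").isdigit():
            continue
        readings.append({
            "sensor": sensor,
            "location": location,
            "state": state,
            "value": int(value),
            "unit": unit,
        })
    return readings


def abnormal(readings: List[Reading]) -> List[Reading]:
    """Return the readings whose state is anything other than 'Normal'."""
    return [r for r in readings if r["state"] != "Normal"]


if __name__ == "__main__":
    for reading in parse_environment(SAMPLE_OUTPUT):
        print(reading["sensor"], reading["value"], reading["unit"])
```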


Facility & Environment Monitoring – cont’d

• Before using CISCO-ENTITY-SENSOR-MIB to monitor the environment, first use ENTITY-MIB to find the entPhysicalDescr index of the sensor:

[root@shmcp-lnx-1 ~]# snmpwalk -v 2c -c public 5.28.28.10 1.3.6.1.2.1.47.1.1.1.1.2 | more
ENTITY-MIB::entPhysicalDescr.1 = STRING: Cisco ASR1013 Chassis
ENTITY-MIB::entPhysicalDescr.2 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.3 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.4 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.5 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.6 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.7 = STRING: CC Slot
ENTITY-MIB::entPhysicalDescr.8 = STRING: RP Slot
ENTITY-MIB::entPhysicalDescr.9 = STRING: RP Slot
ENTITY-MIB::entPhysicalDescr.10 = STRING: FP Slot
ENTITY-MIB::entPhysicalDescr.11 = STRING: FP Slot
ENTITY-MIB::entPhysicalDescr.12 = STRING: Power Supply Bay
ENTITY-MIB::entPhysicalDescr.13 = STRING: Cisco ASR1013 AC Power Supply
ENTITY-MIB::entPhysicalDescr.14 = STRING: PEM Iout
ENTITY-MIB::entPhysicalDescr.15 = STRING: PEM Vout
ENTITY-MIB::entPhysicalDescr.16 = STRING: PEM Vin
ENTITY-MIB::entPhysicalDescr.17 = STRING: Temp: PEM
ENTITY-MIB::entPhysicalDescr.18 = STRING: Temp: FC
ENTITY-MIB::entPhysicalDescr.23 = STRING: Power Supply
ENTITY-MIB::entPhysicalDescr.24 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.25 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.26 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.32 = STRING: Power Supply Bay
ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR

• Then search CISCO-ENTITY-SENSOR-MIB for the required data, for example to poll the RP CPU temperature (ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR):

[root@shmcp-lnx-1 ~]# snmpwalk -v 2c -c public 5.28.28.10 1.3.6.1.4.1.9.9.91 | grep 8022
CISCO-ENTITY-SENSOR-MIB::entSensorValue.8022 = INTEGER: 30
CISCO-ENTITY-SENSOR-MIB::entSensorStatus.8022 = INTEGER: ok(1)
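The two-step lookup above (find the index by description, then read the sensor value at that index) can be automated over saved snmpwalk text. The following is a minimal sketch that assumes the net-snmp text format shown on this slide; the function names are hypothetical, and it works on captured output rather than querying the router itself. Because descriptions such as "Fan" repeat, the lookup returns every matching index.

```python
import re
from typing import Dict, List

# Excerpts of the two snmpwalk outputs shown on this slide.
ENTITY_WALK = """\
ENTITY-MIB::entPhysicalDescr.24 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.25 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.26 = STRING: Fan
ENTITY-MIB::entPhysicalDescr.8022 = STRING: Temp: CPU AIR
"""
SENSOR_WALK = """\
CISCO-ENTITY-SENSOR-MIB::entSensorValue.8022 = INTEGER: 30
CISCO-ENTITY-SENSOR-MIB::entSensorStatus.8022 = INTEGER: ok(1)
"""

DESCR_RE = re.compile(r"entPhysicalDescr\.(\d+) = STRING: (.*)$")
VALUE_RE = re.compile(r"entSensorValue\.(\d+) = INTEGER: (-?\d+)")


def find_indexes(entity_walk: str, descr: str) -> List[int]:
    """Return every entPhysicalDescr index whose description equals descr."""
    indexes = []
    for line in entity_walk.splitlines():
        match = DESCR_RE.search(line.strip())
        # Some snmpwalk builds quote strings, so strip optional quotes.
        if match and match.group(2).strip().strip('"') == descr:
            indexes.append(int(match.group(1)))
    return indexes


def sensor_values(sensor_walk: str) -> Dict[int, int]:
    """Map each sensor index to its entSensorValue reading."""
    values = {}
    for line in sensor_walk.splitlines():
        match = VALUE_RE.search(line)
        if match:
            values[int(match.group(1))] = int(match.group(2))
    return values


def read_sensor(entity_walk: str, sensor_walk: str, descr: str) -> Dict[int, int]:
    """Join the two walks: index -> value for sensors matching descr."""
    values = sensor_values(sensor_walk)
    return {i: values[i] for i in find_indexes(entity_walk, descr) if i in values}


if __name__ == "__main__":
    print(read_sensor(ENTITY_WALK, SENSOR_WALK, "Temp: CPU AIR"))
```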


What and How to Monitor - System Resources used by Features


Feature <> ESP Resources Dependency

[Figure: ESP block diagram. Labeled components: QFP complex with Packet Processor Engines (PPE1 to PPE40), BQS, dispatcher, and packet buffer; TCAM; resource DRAM (512MB); packet buffer DRAM (128MB); Part Len / BW SRAM; crypto (Nitrox-II CN2430) with SA table DRAM; FECP with DDRAM, boot flash (OBFL,…), JTAG control, reset / power control, EEPROM, and temperature sensor; interconnects and SPI mux. Labeled links: GE, 1Gbps; I2C; SPA control; SPA bus; ESI, 11.2Gbps; SPA-SPI, 11.2Gbps; Hypertransport, 10Gbps; E-RP*, PCI*, E-CSR; connections to the RPs, the other ESP, and the SIPs.]

Feature callouts from the figure:

• QoS Mark/Police

• NAT sessions

• IPSec SA

• Netflow Cache

• FW hash tables

• Memory for FECP

• QFP client / driver

• QoS Class maps

• FM FP

• Statistics

• ACL ACEs copy

• NAT config objects

• IPSec/IKE SA

• NF config data

• ZB-FW config objects

• QoS Queuing

• NAT VFR re-assembly

• IPSec headers

• Class/Policy Maps: QoS, DPI, FW

• ACL/ACE, Route-map

• IPSec Security Association class groups, classes, rules


Key System Resources to Monitor - Summary

[Figure: monitoring commands mapped onto the RP (IOS, forwarding manager), the ESP (forwarding manager, QFP client, driver, datapath), and the SIP. Resources shown: RP CPU, RP memory, FECP CPU, ESP memory, QFP, crypto assist, TCAM, resource DRAM, pkt memory. Threshold markers shown: 85%, 75%, 75%, 75%.]

• RP CPU (IOS): show proc cpu sort

• RP memory (IOS): show mem stat

• RP, ESP (FECP CPU, ESP memory), and SIP control processors: show plat software status control-processor brief

• QFP resource DRAM: show plat hardware qfp active infra exmem statistics

• Crypto assist: show platform hardware crypto-device utilization

• QFP datapath: show plat hardware qfp active datapath util summary

• TCAM: show plat hardware qfp active tcam resource-manager usage

• Monitoring of each system resource is explained in detail in the following slides


Mitigation Plan When Running Out of Resources

• Before upgrading the RP, memory, or ESP, customers can immediately take the following actions to reduce system utilization:

IOS/RP Memory

1. Reduce the prefixes received from a peer:
  neighbor {ip-address} maximum-prefix <number of prefixes>

2. Turn off software redundancy:
  redundancy \ mode none

QFP Resources DRAM

1. Reduce NAT max-entries:
  ip nat translation max-entries <number of entries>
  nat64 translation max-entries <number of entries>

2. Reduce FW session limit:
  parameter-map type inspect global \ session total <count>

3. Reduce FNF cache limit:
  flow monitor M1 \ cache entries <number of entries>


Key System Resources to Monitor - IOSd CPU & Memory Utilization

• To check the CPU load in the IOSd process:

show processes cpu

• To find out which process is occupying memory in IOSd, use the traditional commands:

show memory

show memory allocating-process totals


Key System Resources to Monitor - Control CPU & Memory Utilization (1)

Load Average represents the process queue, that is, process contention for CPU resources.

1. On a single-core processor, an instantaneous load of “7” means that seven processes are “ready to run”, one of which is currently running.

2. On a dual-core processor, a load of “7” means that seven processes are ready to run, two of which are currently running.

• For an overview of the CPU load of each module on the ASR 1000, use the following command:

ASR1000# show platform software status control-processor brief

Load Average

Slot Status 1-Min 5-Min 15-Min

RP0 Healthy 0.06 0.06 0.01

RP1 Healthy 0.06 0.04 0.01

ESP0 Healthy 0.01 0.00 0.00

ESP1 Healthy 0.00 0.00 0.00

SIP1 Healthy 0.04 0.03 0.01

SIP2 Healthy 0.00 0.00 0.00

A sample EEM script to trigger load monitoring is on the reference slide at the end of the section.
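The load-average interpretation on this slide reduces to simple arithmetic: of the processes that are ready to run, at most one per core is running and the rest are waiting. The sketch below illustrates that reading; the function name and the returned keys are assumptions made for the example.

```python
from typing import Dict


def load_breakdown(load: float, cores: int) -> Dict[str, float]:
    """Split a load figure into running and waiting processes.

    load  -- number of processes ready to run (the load value)
    cores -- number of CPU cores on the control processor
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    running = min(load, cores)      # at most one running process per core
    waiting = max(load - cores, 0)  # the remainder is queued
    return {"running": running, "waiting": waiting, "per_core": load / cores}


if __name__ == "__main__":
    # The two cases from the slide: a load of 7 on one core and on two cores.
    print(load_breakdown(7, 1))
    print(load_breakdown(7, 2))
```

A per-core value persistently above 1 indicates that processes are queuing for CPU time, which is the contention the load average is meant to expose.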


Key System Resources to Monitor - Control CPU & Memory Utilization (2)

• Memory utilization is represented by the following:
  – Total – Total card memory
  – Used – Consumed memory
  – Free – Available memory
  – Committed – Virtual memory committed to processes

<continued from last show command output>

Memory (kB)

Slot Status Total Used (Pct) Free (Pct) Committed (Pct)

RP0 Critical 3919788 3891940 (94%) 27848 (0%) 2005100 (48%)

RP1 Healthy 3919788 1164924 (28%) 2754864 (66%) 1994212 (48%)

ESP0 Healthy 2030288 520744 (24%) 1509544 (71%) 2816620 (134%)

ESP1 Healthy 2030288 514972 (24%) 1515316 (72%) 2816356 (134%)

SIP1 Healthy 484332 311868 (59%) 172464 (32%) 262472 (50%)

SIP2 Healthy 484332 332252 (63%) 152080 (29%) 317648 (60%)

CPU Utilization

Slot CPU User System Nice Idle IRQ SIRQ IOwait

RP0 0 1.28 1.15 0.00 97.25 0.01 0.10 0.20

RP1 0 0.94 1.23 0.00 97.48 0.00 0.02 0.30

ESP0 0 0.56 0.66 0.00 98.76 0.00 0.00 0.00

ESP1 0 0.52 0.64 0.00 98.82 0.00 0.00 0.00

SIP1 0 0.47 0.45 0.00 99.04 0.00 0.01 0.00

SIP2 0 0.58 0.53 0.00 98.85 0.00 0.01 0.00

Status: Critical, Warning, Healthy. The definitions are on the reference slide at the end of the section.
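The memory rows of show platform software status control-processor brief follow a fixed pattern, so a script can pick out slots that are running short of memory or that are overcommitted. The sketch below assumes the row layout shown above and uses the percentages as printed by the router; the function name and the thresholds in the usage lines are illustrative choices, not platform definitions.

```python
import re
from typing import Dict, List, Union

# Memory rows copied from the output on this slide.
SAMPLE_OUTPUT = """\
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Critical 3919788 3891940 (94%) 27848 (0%) 2005100 (48%)
RP1 Healthy 3919788 1164924 (28%) 2754864 (66%) 1994212 (48%)
ESP0 Healthy 2030288 520744 (24%) 1509544 (71%) 2816620 (134%)
SIP1 Healthy 484332 311868 (59%) 172464 (32%) 262472 (50%)
"""

MEM_ROW = re.compile(
    r"^(?P<slot>\S+)\s+(?P<status>\S+)\s+(?P<total>\d+)\s+"
    r"(?P<used>\d+)\s+\((?P<used_pct>\d+)%\)\s+"
    r"(?P<free>\d+)\s+\((?P<free_pct>\d+)%\)\s+"
    r"(?P<committed>\d+)\s+\((?P<committed_pct>\d+)%\)$"
)

Row = Dict[str, Union[str, int]]


def parse_memory_rows(output: str) -> List[Row]:
    """Return one dict per memory row; other lines are ignored."""
    rows: List[Row] = []
    for raw in output.splitlines():
        match = MEM_ROW.match(raw.strip())
        if not match:
            continue
        row: Row = {}
        for key, value in match.groupdict().items():
            # Slot and status stay as text; every other column is numeric.
            row[key] = value if key in ("slot", "status") else int(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = parse_memory_rows(SAMPLE_OUTPUT)
    # Example thresholds only (90% used, more than 100% committed).
    print([r["slot"] for r in rows if r["used_pct"] >= 90])
    print([r["slot"] for r in rows if r["committed_pct"] > 100])
```

The load-average and CPU rows of the same command do not match the pattern, so the whole command output can be passed in unchanged.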


Key System Resources to Monitor - Control CPU & Memory Utilization (3)

• CPU utilization is a two-second relative percentage average of the number of processes requesting CPU resources at a given time, and is represented by the following fields:
  – CPU – The allocated processor
  – User – Non-Linux kernel processes
  – System – Linux kernel process
  – Nice – Low priority processes
  – Idle – Percentage of time the CPU was inactive
  – IRQ – Interrupts
  – SIRQ – System Interrupts
  – IOwait – Percentage of time CPU was waiting for IO

• To read real-time utilization:

*The first set of values is invalid. Only the second and later cycles report valid CPU values.

ASR1000# show platform software process slot RP active monitor cycles 2 | inc Cpu|Mem

Cpu(s): 1.1%us, 1.0%sy, 0.0%ni, 97.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 16343244k total, 3988416k used, 12354828k free, 202964k buffers

Swap: 0k total, 0k used, 0k free, 1414668k cached

Cpu(s): 3.8%us, 0.3%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 16343244k total, 3988788k used, 12354456k free, 202964k buffers

Swap: 0k total, 0k used, 0k free, 1414796k cached
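The note about discarding the first cycle can be applied mechanically when the output is collected by a script. The following is a minimal Python sketch, assuming the command output has already been captured as text; the function name `valid_cpu_cycles` and the `SAMPLE` constant are illustrative and not part of any Cisco tool.

```python
import re

# Output copied from the slide: two monitor cycles.
SAMPLE = """\
Cpu(s): 1.1%us, 1.0%sy, 0.0%ni, 97.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988416k used, 12354828k free, 202964k buffers
Swap: 0k total, 0k used, 0k free, 1414668k cached
Cpu(s): 3.8%us, 0.3%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16343244k total, 3988788k used, 12354456k free, 202964k buffers
Swap: 0k total, 0k used, 0k free, 1414796k cached
"""


def valid_cpu_cycles(text):
    """Return one {field: percent} dict per Cpu(s) line, minus the first cycle."""
    cycles = []
    for line in text.splitlines():
        if line.startswith("Cpu(s):"):
            pairs = re.findall(r"([\d.]+)%\s*(\w+)", line)
            cycles.append({name: float(value) for value, name in pairs})
    # The slide states that the first set of values is invalid.
    return cycles[1:]


cycles = valid_cpu_cycles(SAMPLE)
# Busy percentage is derived here as 100 minus idle (an assumption of this sketch).
busy = round(100.0 - cycles[0]["id"], 1)
print(cycles[0], busy)
```

With the two cycles shown on the slide, only the second one is kept.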


Key System Resources to Monitor - Control CPU & Memory Utilization (4)

• To check the processes on each module, run the following command from a VTY session.

• Enter “m” to sort by memory usage.

*The "monitor" command does not work on the console; it works over VTY by default.

*Don’t capture the first output; let the cycle run a few times.

ASR1000# monitor platform software process fp active

Tasks: 80 total, 4 running, 76 sleeping, 0 stopped, 0 zombie

Cpu(s): 1.0% us, 0.3% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.0% si

Mem: 2030288k total, 525260k used, 1505028k free, 21228k buffers

Swap: 0k total, 0k used, 0k free, 192024k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4750 root 20 0 645m 92m 31m S 0.7 4.6 26:36.97 cpp_cp_svr

5597 root 20 0 502m 45m 24m S 0.3 2.3 6:00.44 fman_fp_image

5737 root 20 0 16108 5732 4104 R 0.3 0.3 12:39.08 hman

7321 root 20 0 8876 2200 1712 R 0.3 0.1 0:00.03 in.telnetd

7392 binos 20 0 2496 1212 976 R 0.3 0.1 0:00.10 top

1 root 20 0 2132 632 544 S 0.0 0.0 0:10.63 init


Key System Resources to Monitor - Control CPU & Memory Utilization (5)

• CISCO-PROCESS-MIB supports the 64-bit architecture that runs on RP2/ASR1001/ASR1002-X.

• CISCO-PROCESS-MIB can monitor the CPUs on the RP, ESP and SIP. Only the active RP/ESP can be monitored, not the standby.

• Here is an example:

1) Find out the index for the RP’s cpmCPUTotal1min

sw-mrrbu-nms-1-76-> getmany -v2c 9.0.0.52 cpmCPUTotalPhysicalIndex

cpmCPUTotalPhysicalIndex.2 = 7031 -> 7031 is the RP CPU physical index in the entity MIB, so use 2 as the index for the RP's cpmCPUTotal1min

2) Retrieve the RP’s cpmCPUTotal1min instance using that index

sw-mrrbu-nms-1-77-> getone -v2c 9.0.0.52 cpmCPUTotal1min.2

cpmCPUTotal1min.2 = 58

Note that “cpmCPUTotal1min.2” is the same as OID “1.3.6.1.4.1.9.9.109.1.1.1.1.4.2”.
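The two steps amount to a lookup followed by string concatenation. A small Python sketch of that logic follows, assuming the walk result has already been collected into a dictionary; the helper names are hypothetical and no SNMP library is involved.

```python
# Column OID for cpmCPUTotal1min, taken from the slide.
CPM_CPU_TOTAL_1MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.4"


def index_for_physical(walk, physical_index):
    """Find the table row whose cpmCPUTotalPhysicalIndex equals physical_index.

    `walk` maps row index -> entPhysicalIndex, as returned by step 1.
    """
    for row, phys in walk.items():
        if phys == physical_index:
            return row
    raise KeyError(physical_index)


def instance_oid(column_oid, row):
    """Append the row index to the column OID to address one instance."""
    return f"{column_oid}.{row}"


# Step 1 result from the slide: row 2 points at entity 7031 (the RP CPU).
walk = {2: 7031}
rp_row = index_for_physical(walk, 7031)
oid = instance_oid(CPM_CPU_TOTAL_1MIN, rp_row)
print(oid)
```

The resulting string is what would be handed to the NMS poller in step 2.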


Key System Resources to Monitor - QFP & Resource DRAM Utilization (1)

• To display the QFP utilization, use the following command:

ASR1000# show platform hardware qfp active datapath utilization summary

CPP 0: 5 secs 1 min 5 min 60 min

Input: Total (pps) 1625349 1625340 1625345 1625345

(bps) 1708810504 1708399184 1708085344 1708039368

Output: Total (pps) 1625333 1625338 1625344 1625344

(bps) 1786828168 1786418448 1786105008 1786059008

Processing: Load (pct) 2 2 2 2

>=99% indicates crypto chip is reaching perf limit

>=97% indicates QFP chip is reaching perf limit


Key System Resources to Monitor - QFP & Resource DRAM Utilization (2)

• QFP DRAM usage can be displayed with the following command:

ASR1000# show platform hardware qfp active infrastructure exmem statistics

QFP exmem statistics

Type: Name: DRAM, QFP: 0

Total: 1073741824

InUse: 124180480

Free: 949561344

Lowest free water mark: 949561344

Type: Name: IRAM, QFP: 0

Total: 134217728

InUse: 8134656

Free: 126083072

Lowest free water mark: 126083072

Type: Name: SRAM, QFP: 0

Total: 32768

InUse: 15088

Free: 17680

Lowest free water mark: 17680

%util = InUse/Total
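The formula %util = InUse/Total can be applied to each memory type in the output. The sketch below is one way to do that in Python, assuming the CLI output is available as text; `exmem_utilization` is an illustrative name, not an existing tool.

```python
import re

# Total and InUse values copied from the slide output.
SAMPLE = """\
Type: Name: DRAM, QFP: 0
Total: 1073741824
InUse: 124180480
Free: 949561344
Type: Name: IRAM, QFP: 0
Total: 134217728
InUse: 8134656
Free: 126083072
Type: Name: SRAM, QFP: 0
Total: 32768
InUse: 15088
Free: 17680
"""


def exmem_utilization(text):
    """Return {memory type: percent in use} using %util = InUse/Total."""
    util = {}
    name, values = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Type: Name: (\w+), QFP: \d+", line)
        if header:
            name, values = header.group(1), {}
            continue
        field = re.match(r"(Total|InUse): (\d+)", line)
        if field and name:
            values[field.group(1)] = int(field.group(2))
            if "Total" in values and "InUse" in values:
                util[name] = 100.0 * values["InUse"] / values["Total"]
    return util


util = exmem_utilization(SAMPLE)
for mem_type, pct in util.items():
    print(f"{mem_type}: {pct:.1f}%")
```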


ceqfpUtilProcessingLoad.9072.1 = 2

ceqfpUtilProcessingLoad.9072.2 = 2

ceqfpUtilProcessingLoad.9072.3 = 2

ceqfpUtilProcessingLoad.9072.4 = 2

ceqfpMemoryResTotal.9072.1 = 1073741824

ceqfpMemoryResInUse.9072.1 = 124180480

ceqfpMemoryResFree.9072.1 = 949561344

ceqfpMemoryResLowFreeWatermark.9072.1 = 949561344

ceqfpMemoryResRisingThreshold.9072.1 = 97

ceqfpMemoryResFallingThreshold.9072.1 = 93

bash-2.05b$ getmany 9.76 ciscoEntityQfpMIB

ceqfpUtilInputTotalPktRate.9072.1 = 0x00018cd05

ceqfpUtilInputTotalPktRate.9072.2 = 0x00018ccfc

ceqfpUtilInputTotalPktRate.9072.3 = 0x00018cd01

ceqfpUtilInputTotalPktRate.9072.4 = 0x00018cd01

ceqfpUtilInputTotalBitRate.9072.1 = 0x065da6108

ceqfpUtilInputTotalBitRate.9072.2 = 0x065d41a50

ceqfpUtilInputTotalBitRate.9072.3 = 0x065cf5060

ceqfpUtilInputTotalBitRate.9072.4 = 0x065ce9cc8

ceqfpUtilOutputTotalPktRate.9072.1 = 0x00018ccf5

ceqfpUtilOutputTotalPktRate.9072.2 = 0x00018ccfa

ceqfpUtilOutputTotalPktRate.9072.3 = 0x00018cd00

ceqfpUtilOutputTotalPktRate.9072.4 = 0x00018cd00

ceqfpUtilOutputTotalBitRate.9072.1 = 0x06a80d588

ceqfpUtilOutputTotalBitRate.9072.2 = 0x06a7a9510

ceqfpUtilOutputTotalBitRate.9072.3 = 0x06a75ccb0

ceqfpUtilOutputTotalBitRate.9072.4 = 0x06a751900

• CISCO-ENTITY-QFP-MIB can be used to monitor QFP processing load and memory utilization

Key System Resources to Monitor - QFP & Resource DRAM Utilization (3)
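The packet and bit rates in this walk are printed in hexadecimal, which makes them hard to compare with the CLI output. The Python sketch below converts them; the parsing helper is illustrative and assumes the `name.entity.interval = value` layout shown on the slide.

```python
# A few lines copied from the walk output on the slide.
SAMPLE = """\
ceqfpUtilInputTotalPktRate.9072.1 = 0x00018cd05
ceqfpUtilInputTotalPktRate.9072.2 = 0x00018ccfc
ceqfpUtilInputTotalBitRate.9072.1 = 0x065da6108
ceqfpUtilOutputTotalPktRate.9072.1 = 0x00018ccf5
"""


def decode_rates(text):
    """Return {(object name, interval index): integer rate}."""
    rates = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        name, value = line.strip().split(" = ", 1)
        obj, _entity, interval = name.split(".")
        # int() with base 16 accepts the leading "0x" prefix.
        rates[(obj, int(interval))] = int(value, 16)
    return rates


rates = decode_rates(SAMPLE)
print(rates[("ceqfpUtilInputTotalPktRate", 1)])
print(rates[("ceqfpUtilInputTotalBitRate", 1)])
```

The decoded values for interval indexes 1 and 2 match the 5 secs and 1 min input columns of the `show platform hardware qfp active datapath utilization summary` output shown earlier, which suggests the last index selects the averaging interval.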


Key System Resources to Monitor - QFP & Resource DRAM Utilization (4)

• Syslog when throughput exceeds BW license (ASR1001, ASR1002-X)

Exceeding 95% threshold:

*Sep 24 10:15:14.249: %BW_LICENSE-5-THROUGHPUT_THRESHOLD_LEVEL: F0: cpp_ha: Average throughput rate had exceeded 95 percent of licensed bandwidth 10000000000 bps 1 times, sample period 300 seconds, in last 24 hours

Exceeding total bw:

Sep 24 10:42:28.450: %BW_LICENSE-4-THROUGHPUT_MAX_LEVEL: F0: cpp_ha: Average throughput rate had exceeded the total licensed bandwidth 10000000000 bps and dropped 1 times, sample period 300 seconds, in last 24 hours.


bash-2.05b$ getmany -v2c 2.0.40.25 clmgmtLicenseFeatureName

clmgmtLicenseFeatureName.7000.2.1 = adventerprise

clmgmtLicenseFeatureName.7000.2.2 = advipservices

clmgmtLicenseFeatureName.7000.2.3 = fwnat_red

clmgmtLicenseFeatureName.7000.2.4 = ipsec

clmgmtLicenseFeatureName.7000.2.5 = lawful_intr

clmgmtLicenseFeatureName.7000.2.6 = sw_redundancy

clmgmtLicenseFeatureName.7000.2.7 = throughput_10g

clmgmtLicenseFeatureName.7000.2.8 = throughput_20g

clmgmtLicenseFeatureName.7000.2.9 = throughput_36g

• Use CISCO-LICENSE-MGMT-MIB to check which throughput license is in use

Key System Resources to Monitor - QFP & Resource DRAM Utilization (5)

bash-2.05b$ getmany -v2c 2.0.40.25 clmgmtLicenseStatus

clmgmtLicenseStatus.7000.2.1 = notInUse(2)

clmgmtLicenseStatus.7000.2.2 = notInUse(2)

clmgmtLicenseStatus.7000.2.3 = notInUse(2)

clmgmtLicenseStatus.7000.2.4 = notInUse(2)

clmgmtLicenseStatus.7000.2.5 = notInUse(2)

clmgmtLicenseStatus.7000.2.6 = notInUse(2)

clmgmtLicenseStatus.7000.2.7 = inUse(3)

clmgmtLicenseStatus.7000.2.8 = notInUse(2)

clmgmtLicenseStatus.7000.2.9 = notInUse(2)
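The two walks share the same index suffix, so the license in use can be found by joining them on that suffix. A minimal Python sketch of the join follows, assuming both outputs were saved as text; the helper name `parse_walk` is hypothetical.

```python
# Abbreviated walk outputs copied from the slide.
NAMES = """\
clmgmtLicenseFeatureName.7000.2.6 = sw_redundancy
clmgmtLicenseFeatureName.7000.2.7 = throughput_10g
clmgmtLicenseFeatureName.7000.2.8 = throughput_20g
"""

STATUS = """\
clmgmtLicenseStatus.7000.2.6 = notInUse(2)
clmgmtLicenseStatus.7000.2.7 = inUse(3)
clmgmtLicenseStatus.7000.2.8 = notInUse(2)
"""


def parse_walk(text):
    """Return {index suffix: value} for lines shaped like 'object.index = value'."""
    rows = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        name, value = line.strip().split(" = ", 1)
        _obj, _, index = name.partition(".")
        rows[index] = value
    return rows


names = parse_walk(NAMES)
status = parse_walk(STATUS)
# Keep the feature names whose status row reads inUse(3).
in_use = [names[i] for i, s in status.items() if s.startswith("inUse")]
print(in_use)
```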


ASR1000# show platform hardware qfp active tcam resource-manager usage

QFP TCAM Usage Information

80 Bit Region Information

--------------------------

Name : Leaf Region #0

Number of cells per entry : 1

Current 80 bit entries used : 0

Current used cell entries : 0

Current free cell entries : 0

160 Bit Region Information

--------------------------

Name : Leaf Region #1

Number of cells per entry : 2

Current 160 bits entries used : 6

Current used cell entries : 12

Current free cell entries : 4084

320 Bit Region Information

--------------------------

Name : Leaf Region #2

Number of cells per entry : 4

Current 320 bits entries used : 0

Current used cell entries : 0

Current free cell entries : 0

Total TCAM Cell Usage Information

----------------------------------

Name : TCAM #0 on CPP #0

Total number of regions : 3

Total tcam used cell entries : 12

Total tcam free cell entries : 524276

Threshold status : below critical limit

• QFP TCAM usage can be displayed with the show platform hardware qfp active tcam resource-manager usage command

Key System Resources to Monitor - TCAM (1)


Key System Resources to Monitor - TCAM (2)

• For TCAM monitoring, keep an eye on syslog:

%QFPTCAMRM-6-TCAM_RSRC_ERR: F0: QFP_sp: Allocation failed because of insufficient TCAM resources in the system

• Recommendations

1. Test out TCAM utilization before making changes

2. Always keep the number of unused TCAM entries greater than or equal to the size of the largest ACL on the router.

• Be aware of the TCAM deny-jump issue, often seen in NAT/FW/IPsec deployments. The workaround and solution are covered under Troubleshooting Common Problems.
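The second recommendation can be checked from the totals in the TCAM usage output. The Python sketch below is a hedged illustration: `largest_acl_cells` is a hypothetical input that the operator must supply, since the slide does not say how many cells a given ACL consumes.

```python
import re

# Totals copied from the TCAM usage output on the earlier slide.
SAMPLE = """\
Total tcam used cell entries : 12
Total tcam free cell entries : 524276
"""


def tcam_cells(text):
    """Extract (used, free) cell counts from the usage summary."""
    used = int(re.search(r"Total tcam used cell entries\s*:\s*(\d+)", text).group(1))
    free = int(re.search(r"Total tcam free cell entries\s*:\s*(\d+)", text).group(1))
    return used, free


def has_headroom(free_cells, largest_acl_cells):
    """True when the free cells could hold the largest ACL once more."""
    return free_cells >= largest_acl_cells


used, free = tcam_cells(SAMPLE)
utilization_pct = 100.0 * used / (used + free)
# 4000 cells is a made-up figure for illustration only.
print(f"{utilization_pct:.4f}% used, headroom ok: {has_headroom(free, 4000)}")
```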


Key System Resources to Monitor - Crypto Chip Utilization (1)

• To display crypto chip utilization, use the following command:

ASR1000# show platform hardware crypto-device utilization

Past crypto device utilization:

1 min (percentage) : 0%

(decrypt pkt): 220997

(encrypt pkt): 173747

5 min (percentage) : 0%

(decrypt pkt): 115381

(encrypt pkt): 897157

15 min (percentage) : 0%

(decrypt pkt): 3320368

(encrypt pkt): 2614638

>=97% indicates the crypto chip is reaching its performance limit


bash-2.05b$ getmany 9.76 ciscoEntityQfpMIB

cepStatsMeasurement.9028.1.1 = Counter64: 0

cepStatsMeasurement.9028.1.5 = Counter64: 221029

cepStatsMeasurement.9028.1.6 = Counter64: 173838

cepStatsMeasurement.9028.2.1 = Counter64: 0

cepStatsMeasurement.9028.2.5 = Counter64: 1153432

cepStatsMeasurement.9028.2.6 = Counter64: 896529

cepStatsMeasurement.9028.3.1 = Counter64: 0

cepStatsMeasurement.9028.3.5 = Counter64: 3321126

cepStatsMeasurement.9028.3.6 = Counter64: 2614265

• CISCO-ENTITY-PERFORMANCE-MIB can be used to monitor crypto chip utilization

Key System Resources to Monitor - Crypto Chip Utilization (2)


Control-Process Health Definition (1)

Board FIELD WARNING CRITICAL FIELD WARNING CRITICAL FIELD WARNING CRITICAL

SIP10 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

SIP40 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP5 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP10 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP20 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP40 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP100 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ESP200 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

RP1 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

RP2 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ASR1001 1-MIN 8 12 5-MIN 8 12 15-MIN 10 15

ASR1002 1-MIN 5 8 5-MIN 5 8 15-MIN 5 8

ASR1002-X 1-MIN 8 12 5-MIN 8 12 15-MIN 10 15

• In the “show platform software status control-processor brief” output on slide 30, the Load Average status can be Healthy, Warning or Critical. This table provides the Warning and Critical thresholds for each field.


Control-Process Health Definition (2)

Board FIELD WARNING CRITICAL FIELD WARNING CRITICAL FIELD WARNING CRITICAL

SIP10 Committed 95% 100% MemFree 10% 5% MEMUSED 90% 95%

SIP40 Committed 95% 100% MemFree 10% 5% MEMUSED 90% 95%

ESP5 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ESP10 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ESP20 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ESP40 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ESP100 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ESP200 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

RP1 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

RP2 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ASR1001 Committed 300% 310% MemFree 10% 5% MEMUSED 90% 95%

ASR1002 Committed 90% 95% MemFree 10% 5% MEMUSED 90% 95%

ASR1002-X Committed 300% 310% MemFree 10% 5% MEMUSED 90% 95%

• In the “show platform software status control-processor brief” output on slide 31, the Memory status can be Healthy, Warning or Critical. This table provides the Warning and Critical thresholds for each field.
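To make the table concrete, the following Python sketch applies the RP1/RP2 row to the RP0 and RP1 memory lines from the earlier output. Two things are assumptions of this sketch rather than statements from the slides: that the overall status is the worst of the three per-field results, and that a value exactly at a threshold counts as reaching it.

```python
# RP1/RP2 thresholds from the table, in percent: (warning, critical).
COMMITTED = (90, 95)
MEM_USED = (90, 95)
MEM_FREE = (10, 5)  # lower is worse for this field

SEVERITY = ["Healthy", "Warning", "Critical"]


def rising(value, warning, critical):
    """Status for a field that gets worse as it grows."""
    if value >= critical:
        return "Critical"
    if value >= warning:
        return "Warning"
    return "Healthy"


def falling(value, warning, critical):
    """Status for a field that gets worse as it shrinks."""
    if value <= critical:
        return "Critical"
    if value <= warning:
        return "Warning"
    return "Healthy"


def memory_status(used_pct, free_pct, committed_pct):
    """Assumed rule: report the worst status across the three fields."""
    states = [
        rising(used_pct, *MEM_USED),
        falling(free_pct, *MEM_FREE),
        rising(committed_pct, *COMMITTED),
    ]
    return max(states, key=SEVERITY.index)


rp0 = memory_status(used_pct=94, free_pct=0, committed_pct=48)
rp1 = memory_status(used_pct=28, free_pct=66, committed_pct=48)
print(rp0, rp1)
```

Under these assumptions RP0 is Critical because of its 0% free memory even though 94% used only reaches the Warning level, which agrees with the status column in the sample output.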


Triggered EEM Script to monitor system load

• This is a sample EEM script that monitors RP0 one minute load.

– A load of .70 triggers actions 1 through 5.

– Action 1 generates a log message when the script triggers.

– Actions 2 through 5 run CLI commands and append their output to the bootflash:cpuinfo file.

event manager applet capture_cpu_spike
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.24.2 get-type exact entry-op ge entry-val 69 exit-time 180 poll-interval 2
 action 1.0 syslog msg "Load is high. Check bootflash:cpuinfo for details."
 action 2.0 cli command "en"
 action 3.0 cli command "show clock | append bootflash:cpuinfo"
 action 4.0 cli command "show platform software status control-processor br | append bootflash:cpuinfo"
 action 5.0 cli command "show platform software process slot rp active monitor | append bootflash:cpuinfo"


DoS Attack Detection and Mitigation Best Practices


• A DoS attack is basically an attempt to make a resource unavailable to its intended users.

1. Consumption of computational resources, such as bandwidth, or CPU cycles.

2. Disruption of configuration information, such as routing information.

3. Disruption of state information, such as unsolicited resetting of TCP sessions.

4. Obstructing the communication between the intended users and the router

• Additional targets of DoS attacks.

1. Trigger errors in packet forwarding.

2. Trigger errors in the sequencing of instructions, to force instability or lock-up.

3. Buffer starvation and/or system thrashing.

4. Crash the operating system itself

DoS Introduction


• Example Attack Type

1. ICMP

SMURF

PING

2. TCP/SYN

3. Teardrop

Mangling packets structure/content

4. Nuke

Rapid packet generation

DoS Introduction – cont’d


• Buffer Overflow messages

• Packet Memory Out of resources messages

• CPUHOG Messages

• For example:

ASR1000#show logging

Syslog logging: enabled (0 messages dropped, 18 messages rate-limited, 58 flushes, 0 overruns, xml disabled, filtering disabled)

……

Apr 9 22:12:21.399 JST: %IOSXE-2-PLATFORM: F1: cpp_cp: QFP:00 Thread:077 TS:00022029349683022400 %HAL_PKTMEM-2-OUT_OF_RESOURCES:

Check Buffer Utilization

ASR1000#show buffers

Public buffer pools:

Small buffers, 104 bytes (total 4000, permanent 4000, peak 6010 @ 3w4d):

DoS Detection (1)


DoS Detection (2)

Check CPU Utilization Check Process Resources

ASR1000# show processes cpu extended

Global Statistics

-----------------

5 sec CPU util 72%/26% Timestamp 3w5d

Queue Statistics

----------------

Common Process Information

-------------------------------

PID Name Prio Style

-------------------------------

443 PPPoE Discovery M New

118 ATM Periodic H New

172 Ethernet Timer C H New

173 Ethernet Msec Ti H New

CPU Intensive processes

-----------------------------------------------------------------

PID Total Exec Quant Burst Burst size Schedcall Schedcall

CPUms Count avg/max Count avg/max(ms) Count Per avg/max

-----------------------------------------------------------------

443 2523 34016 0/9 16997 0/9 17008 0/10

ASR1000# show processes 443

Process ID 443 [PPPoE Discovery Daemon], TTY 0

Memory usage [in bytes]

Holding: 822944, Maximum: 0, Allocated: 2941696, Freed: 1176616

Getbufs: 0, Retbufs: 0, Stack: 43288/48000

CPU usage

PC: 2D9A89F, Invoked: 63684786, Giveups: 31842392, uSec: 55

5Sec: 71.43%, 1Min: 78.51%, 5Min: 65.63%, Average: 0.00%

Age: 2273107279 msec, Runtime: 3564164 msec

State: Waiting for Event, Priority: Normal


DoS Detection (3)

Check FP punt activity Check FP punt policer

ASR1000# show platform software infrastructure packet

Statistics for Punt Path activities:

19858208 total packets processed

0 minimum packet received, 2048 maximum packet received

0 minimum packet process switched, 7 maximum packet process switched

0 msec minimum clock runtime, 30 msec maximum clock runtime

0 msec minimum cpu runtime, 2 msec maximum cpu runtime

6797817 puntpath invocation, 6797817 with message invocation

FP - Punt Policer:

ASR1000# show platform hardware qfp active infrastructure punt statistics type global-drop

Global Drop Statistics

Number of global drop counters = 21

Counter ID Drop Counter Name Packets

-------------------------------------------------------------

016 PUNT_CAUSE_GLOBAL_POLICER 27117

ASR1000# show platform software punt-policer

Per Punt-Cause Policer Configuration and Packet Counters

Punt Configured (pps) Conform Packets Dropped Packets

Cause Description Normal High Normal High Normal High

---------------------------------------------------------------------------------------------

2 IPv4 Options 4000 3000 0 0 0 0

3 Layer2 control and legacy 40000 10000 1203060 2146805 0 0

4 PPP Control 2000 1000 0 0 0 0

5 CLNS IS-IS Control 2000 1000 0 0 0 0

6 HDLC keepalives 2000 1000 0 0 0 0

7 ARP request or response 2000 1000 0 68540 0 0

8 Reverse ARP request or re... 2000 1000 0 0 0 0

9 Frame-relay LMI Control 2000 1000 0 0 0 0

10 Incomplete adjacency 2000 1000 0 5 0 0

11 For-us data 40000 5000 803926 0 0 0

12 Mcast Directly Connected ... 2000 1000 0 0 0 0

13 Mcast IPv4 Options data p... 2000 1000 0 0 0 0

14 MPLS TTL expired 5120 2000 0 0 0 0

19 Mcast Internal Copy 2000 1000 0 0 0 0

20 Mcast IGMP Unroutable 2000 1000 0 0 0 0

24 Glean adjacency 2000 5000 0 35052 0 0

25 Mcast PIM signaling 2000 1000 0 0 0 0

27 ESS session control 10000 40000 0 30507493 0 288003062
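When looking for an attack, the rows of interest are the punt causes with non-zero dropped counters. The Python sketch below filters the table for those rows. It assumes each row ends with six integers (configured normal/high, conform normal/high, dropped normal/high), as in the output shown; the function name is illustrative.

```python
import re

# Rows copied from the slide output.
SAMPLE = """\
3 Layer2 control and legacy 40000 10000 1203060 2146805 0 0
7 ARP request or response 2000 1000 0 68540 0 0
11 For-us data 40000 5000 803926 0 0 0
24 Glean adjacency 2000 5000 0 35052 0 0
27 ESS session control 10000 40000 0 30507493 0 288003062
"""

ROW = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def causes_with_drops(text):
    """Return (cause id, description, total dropped) for rows with drops."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        cause, description = int(match.group(1)), match.group(2)
        dropped = int(match.group(7)) + int(match.group(8))
        if dropped > 0:
            hits.append((cause, description, dropped))
    return hits


dropped = causes_with_drops(SAMPLE)
print(dropped)
```

In the sample, only punt cause 27 is being policed, which is consistent with the PPPoE-related CPU load shown on the previous detection slide.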


DoS Detection (4)

Check FP per-cause punt

ASR1000# show platform hardware qfp active infrastructure punt statistics type per-cause clear

Global Per Cause Statistics

Per Inject Cause Statistics

Packets Packets

Counter ID Inject Cause Name Received Transmitted

--------------------------------------------------------------------------------------

000 RESERVED 0 0

001 L2 control/legacy 0 0

002 QFP destination lookup 0 0

003 QFP IPv4/v6 nexthop lookup 0 0

004 QFP generated packet 0 0

005 QFP <->RP keepalive 2 0

006 QFP Fwall generated packet 0 0

007 QFP adjacency-id lookup 0 0

008 Mcast specific inject packet 0 0

009 QFP ICMP generated packet 0 0

010 QFP/RP->QFP ESS data packet 0 0

011 SBC DTMF 0 0

012 ARP request or response 0 0

013 Ethernet OAM loopback packet 0 0

014 Ingress redirect packet 0 0

015 PPPoE discovery packet 48764 48741

016 PPPoE session packet 0 0


DoS Mitigation (1)

The ASR 1000 implements a global policer that rate-limits punt packets at 146484 pps/2.5 Gbps. In addition, it implements per-cause punt policers, based on common feature punt causes, that classify punt packets into high and normal queues and set a policing threshold for each.

The per-cause policers can be seen via show platform software punt-policer.

Control-Plane Policing (CoPP) is a security feature designed to protect the control plane.

[Figure: punt path diagram; labels: ESP, RP, Global Policer, Per-punt Policer, Inject Logic, CoPP, Linux Kernel, IOSd]

The following classification criteria are supported in CoPP:

– match access-group

– match dscp

– match ip dscp

– match ip precedence

– match precedence

– match protocol arp

– match protocol ipv6

– match protocol pppoe

– match protocol pppoe-discovery

– match qos-group

– match ipv6 ACL HBH


DoS Mitigation (2)

Global Config

no ip source-route
ip arp gratuitous none
no ip gratuitous-arps
no ip bootp server

Interface Config

interface GigabitEthernet <num>
 description UNI facing interface
 no ip directed-broadcast
 ip verify unicast reverse-path
 ip access-group <SECURITY> in
 ip access-group <SECURITY> out


DoS Mitigation (3)

Control-Plane Policing - ARP

class-map match-all ARP
 match protocol arp
!
policy-map CONTROL-PLANE-POLICY
 class ARP
  police rate 1 pps burst 50 packets
   conform-action transmit
   exceed-action drop

Control-Plane Policing - TRUSTED HOSTS

class-map match-all TRUSTED-HOSTS
 match access-group name TRUSTED-HOSTS
!
ip access-list extended TRUSTED-HOSTS
 remark allow traffic for trusted monitoring and maintenance
 permit udp any eq 1701 any eq 1701
 permit icmp any any echo-reply
 permit icmp any any unreachable
 permit icmp any any time-exceeded
 ……
!
policy-map CONTROL-PLANE-POLICY
 class TRUSTED-HOSTS
  police rate 1 pps burst 100 packets
   conform-action transmit
   exceed-action transmit

Control-Plane Policing - SNMP

class-map match-all SNMP
 match access-group name SNMP
 match access-group name TRUSTED-HOSTS
!
ip access-list extended SNMP
 remark SNMP Servers
 permit udp <SNMP Server> any eq snmp
!
policy-map CONTROL-PLANE-POLICY
 class SNMP
  police rate 100 pps burst 100 packets
   conform-action transmit
   exceed-action drop


DoS Mitigation (4)

Control-Plane Policing – IPv6 ICMP

class-map match-all IPv6-ECHO-REQUEST-REPLY
 match access-group name IPv6-ECHO-REQUEST-REPLY
!
ipv6 access-list IPv6-ECHO-REQUEST-REPLY
 permit icmp any any echo-request
 permit icmp any any echo-reply
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-ECHO-REQUEST-REPLY
  police rate 20 pps burst 20 packets
   conform-action transmit
   exceed-action drop

Control-Plane Policing – IPv6 TRUSTED HOSTS

class-map match-all IPv6-TRUSTED-HOSTS
 match access-group name IPv6-TRUSTED-HOSTS
!
ipv6 access-list IPv6-TRUSTED-HOSTS
 <IPv6 Trusted Host Addresses>
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-TRUSTED-HOSTS
  police rate 500 pps burst 1000 packets
   conform-action transmit
   exceed-action transmit

Control-Plane Policing – IPv6 Control

class-map match-all IPv6-CONTROL
 match access-group name IPv6-CONTROL
!
ipv6 access-list IPv6-CONTROL
 remark Permit NDP RA Type packets
 permit icmp any any nd-ns
 permit icmp any any nd-na
 permit icmp any any router-advertisement
 permit icmp any any router-solicitation
 ……
!
policy-map CONTROL-PLANE-POLICY
 class IPv6-CONTROL
  police rate 200 pps burst 1000 packets
   conform-action transmit
   exceed-action drop


Troubleshooting Common Problems


Type of Crashes and Impact

IOSd Crash

– Impact: the box is reloaded (single RP); IOSd switchover (dual IOSd); RP switchover (dual RPs)

– Crashinfo file name: crashinfo_RP_SlotNumber_00_Date-Time-Zone

– Core dump file name: hostname_RP_SlotNumber_ppc_linux_iosd-_ProcessID.core.gz

SPA Driver Crash

– Impact: SPA reloaded

– Crashinfo file name: crashinfo_SIP_SlotNumber_00_Date-Time-Zone

– Core dump file name: hostname_SIP_SlotNumber_mcpcc-lc-ms_ProcessID.core.gz

QFP ucode Crash

– Impact: ESP reload

– Crashinfo file name: n/a

– Core dump file name: hostname_ESP_SlotNumber_cpp-mcplo-ucode_ID.core.gz

IOS XE Process Crash

– Impact: the module reloaded

– Crashinfo file name: n/a

– Core dump file name: hostname_FRU_SlotNumber_ProcessName_ProcessID.core.gz

Linux Kernel Crash

– Impact: the module reloaded

– Crashinfo file name: n/a

– Core dump file name: hostname_FRU_SlotNumber_kernel.core

File Location

– Crashinfo files: Bootflash: (ASR1001, 1002, 1002-X); Harddisk: (ASR1004, 1006, 1013)

– Core dump files: Bootflash:core/ (ASR1001, 1002, 1002-X); Harddisk:core/ (ASR1004, 1006, 1013)


ASR 1000 Route Scale vs. Memory Allocation

RP and Physical Memory    Memory Allocated to IOSd         Memory Allocated to Kernel    IPv4 Route/FIB Scale
                          (w/o IOSd redundancy enabled)    and other processes

ASR 1001/1002-X (4GB)     1.2GB                            2.8GB                         500K/500K

ASR 1001/1002-X (8GB)     4GB                              4GB                           1M/1M

ASR 1001/1002-X (16GB)    7GB                              9GB                           1M/3.5M

RP1 (4GB)                 1.7GB                            2.3GB                         1M

RP2 (8GB)                 4.2GB                            3.8GB                         1M

RP2 (16GB)                10GB                             6GB                           4M

• Memory allocation is fixed by design and is not configurable.

• ASR 1001/1002-X memory is shared among RP, ESP and SIP. 8GB is recommended for Internet gateway deployments; do not turn on dual IOSd.

• Each additional ISP peering adds BGP multipaths, and the additional paths result in ~20% BGP memory consumption overhead. Live deployments with 2-5 peerings have been seen on ASR1001/1002-X (8GB).

• If IOSd redundancy is turned on, the memory allocated to each IOSd is further reduced by more than half. Dual IOSd requires a minimum of 8GB.

• ASR1001 and ASR1002-X are the most common Internet gateway platforms. Customers may mistakenly deploy them with 4GB of memory, which causes memory allocation failures (often seen in 7200 migrations).


Memory Upgrade on ASR1001, ASR1002-X & RP2

• Cisco ASR1001 Router Memory DIMMs

Memory PID Option Slot 0 (U101D) Slot 1 (U103D) Slot 2 (U100D) Slot 3 (U102D)

M-ASR1K-1001-4GB 2 GB module 2 GB module ___ ___

M-ASR1K-1001-8GB 2 GB module 2 GB module 2 GB module 2 GB module

M-ASR1K-1001-16GB 4 GB module 4 GB module 4 GB module 4 GB module

• Cisco ASR1002-X Router Memory DIMMs

Memory PID Option Slot 0 (U2D0) Slot 1 (U2D1) Slot 2 (U1D0) Slot 3 (U1D1)

M-ASR1002X-4GB 2 GB module ___ 2 GB module ___

M-ASR1002X-8GB 2 GB module 2 GB module 2 GB module 2 GB module

M-ASR1002X-16GB 4 GB module 4 GB module 4 GB module 4 GB module

• Cisco ASR1000-RP2 Memory DIMMs

Memory PID Option Slot 0 (U2D0) Slot 1 (U2D1) Slot 2 (U1D0) Slot 3 (U1D1)

M-ASR1K-RP2-8GB= 2 GB module 2 GB module 2 GB module 2 GB module

M-ASR1K-RP2-16GB= 4 GB module 4 GB module 4 GB module 4 GB module

4GB to 8GB upgrade: replace all DIMMs


TCAM Deny-Jump Issue

• Problem Description:

In ASR 1000 IPsec/FW/NAT deployments, users may see the following message:

“%CPP_FM-3-CPP_FM_TCAM_ERROR: F0: cpp_sp: TCAM limit exceeded…”

• Error Message Explanation:

This is a protection mechanism that prevents the system from crashing with a watchdog timeout error or malloc failure.

• Root Cause Analysis:

1. The classification engine in the TCAM can only represent permit entries.

2. The system converts DENY entries into PERMIT entries using a cross product.

3. The recursive nature of this conversion causes the required number of entries to “explode”.
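The growth described in the root cause can be illustrated with a small sketch. It is a simplified model, not the actual QFP conversion (the slides do not describe that algorithm): each ACL entry is reduced to a source and destination prefix, and every permit is rewritten so that it no longer overlaps any earlier deny. The function names `intersect`, `subtract_deny`, and `permit_only` are hypothetical and exist only for this illustration.

```python
from ipaddress import ip_network

ANY = ip_network("0.0.0.0/0")


def intersect(a, b):
    """Two prefixes either nest or are disjoint; return the overlap or None."""
    if a.subnet_of(b):
        return a
    if b.subnet_of(a):
        return b
    return None


def subtract_deny(permits, deny):
    """Rewrite (src, dst) permit entries so that none of them matches `deny`."""
    deny_src, deny_dst = deny
    result = []
    for src, dst in permits:
        # Sources outside the denied source range stay permitted as-is.
        for piece in ANY.address_exclude(deny_src):
            s = intersect(src, piece)
            if s is not None:
                result.append((s, dst))
        # Sources inside the denied range are permitted only toward
        # destinations outside the denied destination range.
        s = intersect(src, deny_src)
        if s is not None:
            for piece in ANY.address_exclude(deny_dst):
                d = intersect(dst, piece)
                if d is not None:
                    result.append((s, d))
    return result


def permit_only(acl):
    """Expand an ordered ACL of (action, src, dst) into permit-only entries."""
    denies, expanded = [], []
    for action, src, dst in acl:
        if action == "deny":
            denies.append((src, dst))
            continue
        entries = [(src, dst)]
        for deny in denies:  # every earlier deny multiplies the entries
            entries = subtract_deny(entries, deny)
        expanded.extend(entries)
    return expanded


# The two-line NAT-ACL from the slides: one deny, one permit.
one_deny = [
    ("deny", ANY, ip_network("129.25.0.0/16")),
    ("permit", ip_network("172.19.0.0/24"), ANY),
]
# Same ACL with one additional (hypothetical) deny in front of the permit.
two_denies = [("deny", ANY, ip_network("10.0.0.0/8"))] + one_deny

print(len(permit_only(one_deny)), len(permit_only(two_denies)))
```

In this model a single /16 deny already turns one permit into 16 permit-only entries, and each further deny grows the list again, which is why replacing deny statements with specific permits keeps the entry count down.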


TCAM Deny-Jump Issue – cont’d

• Workaround:

1. Before deploying the platform in production, apply the configuration in a lab.

2. Modify the ACLs to use multiple specific permit statements, and reduce or eliminate explicit deny statements.

3. Use PBR to bypass NAT.

4. Use static (identity) NAT.

• Solutions:

1. IOS XE 3.10 introduced a software classification engine to handle deny-jump-like classification.

2. The system still uses the TCAM as long as it has room; if the classification does not fit in the TCAM, it switches to the software classification engine.

Original NAT config:

ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 deny ip any 129.25.0.0 0.0.255.255
 permit ip 172.19.0.0 0.0.0.255 any

VASI & PBR to bypass NAT:

ip nat inside source list NAT-ACL pool NAT-POOL overload
!
interface GigabitEthernet0/0/1
 description nat inside interface
 ip address 6.1.1.1 255.255.255.0
 ip nat inside
 ip policy route-map no-NAT-rmap
!
interface vasileft1
 ip address 13.1.1.1 255.255.255.0
!
interface vasiright1
 ip address 13.1.2.1 255.255.255.0
!
ip access-list extended NAT-ACL
 permit ip 172.19.0.0 0.0.0.255 any
ip access-list extended bypass-NAT
 permit ip any 129.25.0.0 0.0.255.255
!
route-map no-NAT-rmap permit 10
 match ip address bypass-NAT
 set interface vasileft1

Original NAT config:

ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 deny ip host 172.19.1.1 any
 permit ip 172.19.0.0 0.0.0.255 any

Identity NAT:

ip nat inside source static 172.19.1.1 172.19.1.1 no-alias
ip nat inside source list NAT-ACL pool NAT-POOL overload
!
ip access-list extended NAT-ACL
 permit ip 172.19.0.0 0.0.0.255 any


NAT ADDR ALLOC FAILURE

• Problem Description:

In an ASR 1000 PAT/overload configuration, the system logs the error message:

“%NAT-6-ADDR_ALLOC_FAILURE: Address allocation failed; pool 1 may be exhausted”

• Debug Information that should be gathered:

show platform hardware qfp active feature nat data pool
show platform hardware qfp active feature nat data port
show platform hardware qfp active feature nat data stat
show platform hardware qfp active feature nat data base
show ip nat translation | inc <global address of interest>

• Common Reason for Failure:

The customer has a small pool that is being consumed by non-PATtable binds.

A non-PATtable bind shows in 'sh ip nat trans' as a single local address associated with a single global IP address. It consumes an entire address in the pool.

--- 213.252.7.132 172.16.254.242 ---
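Spotting these binds in a long translation table can be scripted. The sketch below is a minimal example that assumes the usual column order of the translation output (protocol, inside global, inside local, then the outside columns) and treats any row whose protocol column is `---` and whose addresses carry no `:port` suffix as a non-PATtable bind. The function name `non_pattable_binds` and the `tcp` row in the sample are hypothetical; only the `---` row comes from the slide.

```python
def non_pattable_binds(output):
    """Return (inside_global, inside_local) pairs for rows without protocol/ports."""
    binds = []
    for line in output.splitlines():
        fields = line.split()
        # Non-PATtable binds have '---' in the protocol column.
        if len(fields) < 3 or fields[0] != "---":
            continue
        inside_global, inside_local = fields[1], fields[2]
        # A ':port' suffix means the entry is a normal PAT translation.
        if ":" in inside_global or ":" in inside_local:
            continue
        binds.append((inside_global, inside_local))
    return binds


# Sample text: the header and the tcp row are illustrative assumptions.
SAMPLE = """\
Pro  Inside global        Inside local         Outside local   Outside global
---  213.252.7.132        172.16.254.242       ---             ---
tcp  213.252.7.133:1024   172.16.254.10:51000  192.0.2.10:80   192.0.2.10:80
"""

for global_ip, local_ip in non_pattable_binds(SAMPLE):
    print(f"pool address {global_ip} fully consumed by {local_ip}")
```

Counting the returned pairs against the pool size gives a quick indication of how much of the pool is lost to such binds.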


NAT ADDR ALLOC FAILURE – cont’d

• Solution 1

1. A non-PATtable bind can be created by a packet that uses a non-PATtable protocol.

2. The best way to prevent this is to tighten the ACL to exclude non-PATtable protocols.

access-list 100 permit udp 13.1.0.0 0.0.255.255 any
access-list 100 permit tcp 13.1.0.0 0.0.255.255 any
access-list 100 permit icmp 13.1.0.0 0.0.255.255 any

• Solution 2

1. A non-PATtable bind can also be created when an ALG such as DNS, which does not have ports in its L7 header, requests a global NAT address.

2. Customers often do not need the DNS ALG, so the solution is to turn it off.

3. The configuration below turns off the most common ALGs that produce non-PATtable binds.

no ip nat service dns udp
no ip nat service dns tcp
no ip nat service netbios-ns tcp
no ip nat service netbios-ns udp
no ip nat service netbios-ssn
no ip nat service netbios-dgm
no ip nat service ldap


In Service Software Upgrade with Minimal Disruptive Restart (MDR)


ISSU and MDR

• In Service Software Upgrade (ISSU) is a procedure, backed by the Cisco IOS infrastructure, that accomplishes an upgrade or downgrade while packet forwarding continues.

• This procedure takes advantage of redundant processors, Cisco Graceful Restart, Non-Stop Routing, and SSO/NSF.

• Minimal Disruptive Restart (MDR) keeps interfaces up and minimizes traffic disruption during an ASR1k SIP/SPA upgrade by not resetting the hardware or reprogramming the data paths.


IOS XE Modularity Made ISSU/MDR Possible

1. RPBase: RP Linux operating system
   Upgrading the OS requires a reload of the RP; minimal changes are expected.

2. RPIOS: IOS executable
   Facilitates the Software Redundancy feature.

3. RPAccess (K9 & non-K9): software required for router access
   Two versions are available (with and without OpenSSH & SSL).
   Facilitates software packaging for export-restricted countries.

4. RPControl: control plane processes for the IOS / hardware interface
   IOS XE middleware.

5. ESPBase: all ESP code
   Any software upgrade of the ESP requires a reload of the ESP.

6. SIPBase/ELCBase: SIP/ELC OS & control processes
   An OS upgrade requires a reload of the SIP.

7. SIPSPA/ELCSPA: SPA drivers and SPA FPD
   Facilitates SPA driver upgrades of specific SPA slots.
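The restart scope stated for each sub-package can be written down as a small lookup, which is handy when planning which components an upgrade will touch. This is only a sketch of what the slide states: entries are `None` where the slide names no reload, the `"SPA"` value for SIPSPA/ELCSPA is an interpretation of "specific SPA slots", and the names `RESTART_SCOPE` and `components_reloaded` are hypothetical.

```python
# Component that must reload when a sub-package is upgraded, per the slide.
# None means the slide does not state a reload requirement.
RESTART_SCOPE = {
    "RPBase": "RP",
    "RPIOS": None,
    "RPAccess": None,
    "RPControl": None,
    "ESPBase": "ESP",
    "SIPBase": "SIP",
    "SIPSPA": "SPA",  # assumption: driver upgrade is scoped to the SPA slot
}


def components_reloaded(packages):
    """Return the set of components that reload for the given sub-packages."""
    return {RESTART_SCOPE[p] for p in packages if RESTART_SCOPE[p] is not None}


print(sorted(components_reloaded(["RPIOS", "ESPBase", "SIPSPA"])))
```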

[Figure: IOS XE software architecture across the RP, ESP, and SIP/ELC. Components shown: IOS (active), Platform Adaptation Layer (PAL), SSH/SSL, forwarding manager, chassis manager, QFP client, QFP driver, SPA drivers, Linux kernel, and control messaging. Callouts 1 to 7 mark the seven sub-packages.]


ASR 1000 Super-Package ISSU

[Figure: Super-package ISSU workflow on a chassis with active and standby RPs and ESPs, moving from Version X to Version Y. Steps shown, in order:

1. issu loadversion: the standby RP is loaded with Version Y.
2. issu runversion (switchover)
3. issu acceptversion (stop rollback timer)
4. issu commitversion (finalizes new file version)

Rollback is automatic, or triggered with issu abortversion. The diagram also shows hw-module slot <STBY_RP> reload and the SIP upgrade performed with MDR.]

The entire procedure can be automated with the one-shot ISSU command:

request platform software package install node file <filename> mdr
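The ordering of the manual issu commands can be modelled as a small state machine, which is useful when scripting pre-checks around an upgrade. The sketch below is a toy model, not IOS XE behavior: it assumes the rollback timer runs from runversion until acceptversion, and that abortversion is allowed at any point before commitversion. The class name `IssuWorkflow` is hypothetical.

```python
class IssuWorkflow:
    """Toy model of the manual ISSU command order shown in the slides."""

    STEPS = ("loadversion", "runversion", "acceptversion", "commitversion")

    def __init__(self):
        self.completed = []
        self.aborted = False

    @property
    def rollback_timer_running(self):
        # Assumption: the timer starts at runversion and stops at acceptversion.
        return (
            not self.aborted
            and "runversion" in self.completed
            and "acceptversion" not in self.completed
        )

    def issue(self, command):
        """Apply one 'issu <command>' and enforce the expected order."""
        if self.aborted or len(self.completed) == len(self.STEPS):
            raise RuntimeError("workflow already finished")
        if command == "abortversion":
            self.aborted = True
            self.completed.clear()
            return
        expected = self.STEPS[len(self.completed)]
        if command != expected:
            raise ValueError(f"expected issu {expected}, got issu {command}")
        self.completed.append(command)


wf = IssuWorkflow()
for cmd in ("loadversion", "runversion"):
    wf.issue(cmd)
print("rollback timer running:", wf.rollback_timer_running)
wf.issue("acceptversion")
wf.issue("commitversion")
```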


MDR Operation Summary

SIP/SPA/ELC Configuration | MDR Operation
SIP40 w/ GE/10GE/POS SPAs, or ASR1000-2T+20x1GE ELC, or ASR1000-6TGE ELC | MDR ISSU
SIP40 w/ mixed GE/10GE/POS SPAs and other SPAs (without force keyword in ISSU CLI) | ISSU halt with warning message
SIP40 w/ mixed GE/10GE/POS SPAs and other SPAs (with force keyword in ISSU CLI) | MDR performed for SIP40 and GE/10GE/POS SPAs; non-MDR ISSU performed for non-MDR SPAs
SIP40 without any SPAs | Non-MDR ISSU
SIP10 w/ any type of SPA | Non-MDR ISSU
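The table above can be expressed as a decision function, for example to check a planned upgrade against an inventory before running it. This is a sketch that encodes only the rows in the table; the function name `mdr_operation`, the short SPA type labels, and the `"ATM"` example type are hypothetical, and combinations the table does not list raise an error rather than guess.

```python
MDR_CAPABLE_SPAS = {"GE", "10GE", "POS"}
MDR_CAPABLE_ELCS = {"ASR1000-2T+20x1GE", "ASR1000-6TGE"}


def mdr_operation(carrier, spa_types=(), force=False):
    """Return the ISSU behavior from the table for one SIP or ELC."""
    spas = set(spa_types)
    if carrier in MDR_CAPABLE_ELCS:
        return "MDR ISSU"
    if carrier == "SIP10":
        return "Non-MDR ISSU"
    if carrier != "SIP40":
        raise ValueError(f"carrier not covered by the table: {carrier}")
    if not spas:
        return "Non-MDR ISSU"
    if spas <= MDR_CAPABLE_SPAS:
        return "MDR ISSU"
    if not spas & MDR_CAPABLE_SPAS:
        # Only non-MDR SPAs: this case is not listed in the table.
        raise ValueError("SIP40 with only non-MDR SPAs is not covered by the table")
    if not force:
        return "ISSU halt with warning message"
    return "MDR for SIP40 and GE/10GE/POS SPAs; non-MDR ISSU for non-MDR SPAs"


print(mdr_operation("SIP40", ["GE", "POS"]))
print(mdr_operation("SIP40", ["GE", "ATM"]))
print(mdr_operation("SIP40", ["GE", "ATM"], force=True))
```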


MDR Traffic Loss Observations

Features | Traffic Loss | Configuration
IPv4 w/ OSPF | 0.02 msec | 1 OSPF neighbor; traffic rate @ 1 Mpps
IPv6 w/ OSPFv3 | 0.14 msec | OSPFv3 PE-CE; traffic rate @ 142 kpps
ISISv4 | 0.014 msec | 1 IS-IS neighbor; traffic rate @ 1.4 Mpps
IPv4 Multicast | 20 msec | 1k IGMP group joins; 3 OIFs per group
GRE Tunnel | 0.14 msec | 1 GRE tunnel
eBGP with MPLS VPN | 0.0176 msec | 4k eBGP sessions; 1M VPNv4 prefixes
EVC | 0.019 msec | 1k EFPs; traffic rate @ 4.2 Mpps
EoMPLS | 0.0168 msec | 1 EoMPLS VC
VPLS | 0.026 msec | 4k EFPs; traffic rate @ 1.08 Mpps
PTA | 0.285 msec | 90 PPPoE sessions; traffic rate @ 710 kpps
LNS | 0.042 msec | 10 PPP sessions
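To put these loss durations in perspective, they can be converted into an approximate packet count for the rows that list a traffic rate. The sketch assumes a constant offered rate and that the loss figure is a contiguous outage, neither of which the slide states; the function name `estimated_lost_packets` is hypothetical.

```python
def estimated_lost_packets(loss_msec, rate_pps):
    """Approximate packets dropped during an outage of loss_msec at rate_pps."""
    return loss_msec / 1000.0 * rate_pps


# (feature, loss in msec, rate in packets per second), taken from the table.
OBSERVATIONS = [
    ("IPv4 w/ OSPF", 0.02, 1_000_000),
    ("IPv6 w/ OSPFv3", 0.14, 142_000),
    ("ISISv4", 0.014, 1_400_000),
    ("EVC", 0.019, 4_200_000),
    ("VPLS", 0.026, 1_080_000),
    ("PTA", 0.285, 710_000),
]

for feature, loss_msec, rate_pps in OBSERVATIONS:
    packets = estimated_lost_packets(loss_msec, rate_pps)
    print(f"{feature}: about {packets:.0f} packets")
```

Under these assumptions most of the unicast rows work out to tens of packets per MDR event.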


ISSU MDR Demo


Summary and Take Away


Operating an ASR 1000 Summary and Take Away …

• To improve network reliability and operational simplicity:

1. Follow the proper system operation procedures

2. Know feature and resource dependencies

3. Proactively monitor key system resources

4. Adopt best practices and implement recommendations

[Figure: Proactive Monitoring. Resources labeled: Crypto, QFP, IOS CPU, DRAM, TCAM, RP CPU, RP Mem, IOS Mem, FECP Mem.]


Relevant Sessions at Cisco Live 2014

Breakout Sessions

• BRKARC-2001 Cisco ASR1000 Series Routers: System & Solution Architectures

• BRKARC-2021 IOS XE Advanced Troubleshooting (NAT, VPN, FW packet forwarding)

• BRKCRS-3147 Advanced troubleshooting of the ASR1K and ISR 4451-X made easy


Complete Your Online Session Evaluation

• Give us your feedback and you could win fabulous prizes. Winners announced daily.

• Complete your session evaluation through the Cisco Live mobile app or visit one of the interactive kiosks located throughout the convention center.

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online


Continue Your Education

• Demos in the Cisco Campus (ASR1001-X Live Demo)

• Walk-in Self-Paced Labs

• Table Topics

• Meet the Engineer 1:1 meetings


Thank you.
