Service Assurance - DPDK Summit




Service Assurance

Carlos Gonçalves, NEC

Maryam Tahhan, Intel


Legal Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Axxia, Intel, the Intel logo, Intel Atom, Intel Core, Intel. Experience What’s Inside, the Intel. Experience What’s Inside logo and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

© 2016 Intel Corporation. All rights reserved. *Other names and brands may be claimed as property of others.


Introduction

“Data Centres are powering our everyday lives. Organizations lose an average of $138,000 for one hour of downtime.” [1]

Telco and Enterprise alike are asking how they can get and provide Service Assurance, QoS and SLAs on the platform and services when deploying NFV.

It is vital to monitor systems for malfunctions or misbehaviours that could lead to service disruption, and to react promptly to these faults/events to minimize service disruption/downtime.


Use case Example

Compute node DPDK interface monitoring for host link status, switching from active to standby service when the link goes down.

[Diagram: Controller, Compute Node 1 and Compute Node 2. Components shown: collectd, OVS with DPDK, OVS, ceilometer, aodh, and two VNFs (VNF = Active, VNF = Standby).]

[Slides 5 to 9 repeat the use case diagram as animation steps. The metric is first shown as localhost-port.0-link_status != 0. An X then marks a failure at the active VNF, and the metric changes to localhost-port.0-link_status == 0. In the last step Compute Node 1 is shown as out of service.]
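The failover in this use case can be sketched as a small state machine. This is only an illustration, assuming a link status of 0 means "down" (as the `== 0` condition on the slide suggests). The `Sample` and `FailoverController` names and the node names are hypothetical, not part of collectd, Ceilometer or Aodh.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sample:
    """One metric sample, e.g. as relayed by a collector (hypothetical shape)."""
    metric: str   # e.g. "localhost-port.0-link_status"
    value: int    # assumed: 0 = link down, non-zero = link up


class FailoverController:
    """Tracks which node hosts the active VNF and reacts to link-down samples."""

    def __init__(self, active: str, standby: str, watched: Dict[str, str]):
        self.active = active
        self.standby: Optional[str] = standby
        self.watched = watched            # metric name -> node it belongs to
        self.out_of_service = set()

    def on_sample(self, sample: Sample) -> bool:
        """Return True if this sample triggered a switch to the standby."""
        node = self.watched.get(sample.metric)
        if node is None or sample.value != 0:
            return False                  # unknown metric, or link still up
        self.out_of_service.add(node)
        if node == self.active and self.standby is not None:
            # Promote the standby; the failed node has no role until repaired.
            self.active, self.standby = self.standby, None
            return True
        return False


ctl = FailoverController(
    active="compute-node-1",              # hypothetical node names
    standby="compute-node-2",
    watched={"localhost-port.0-link_status": "compute-node-1"},
)
ctl.on_sample(Sample("localhost-port.0-link_status", 1))   # link up: no action
switched = ctl.on_sample(Sample("localhost-port.0-link_status", 0))
```

After the second sample, `switched` is true, the former standby node is the active one, and the failed node is recorded as out of service.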


Doctor

Project in OPNFV working on building an open-source NFVI fault management and maintenance framework to ensure Telco VNFs availability in fault and maintenance events

1. Identify requirements

2. Gap analysis

3. Implementation work in upstream

4. Integration and testing

Consistent Resource State Awareness

Immediate Notification

Fault Correlation

Extensible Monitoring


Doctor: fault management use case


Doctor: mapping to the OpenStack ecosystem


Doctor: focus of initial contributions

Consistent Resource State Awareness

Immediate Notification



Immediate event alarming



Doctor: extending contribution focus

Consistent Resource State Awareness

Immediate Notification

Fault Correlation

Extensible Monitoring


Doctor blueprints in OpenStack

Project   Blueprint                                    Spec Drafter               Developer                  Status
Aodh      Event Alarm Evaluator                        Ryota Mibu (NEC)           Ryota Mibu (NEC)           Completed (Liberty)
Nova      New nova API call to mark nova-compute down  Tomi Juvonen (Nokia)       Roman Dobosz (Intel)       Completed (Liberty)
          Support forcing service down                 Tomi Juvonen (Nokia)       Carlos Goncalves (NEC)     Completed (Liberty)
          Get valid server state                       Tomi Juvonen (Nokia)       Tomi Juvonen (Nokia)       Completed (Mitaka)
          Add notification for service status change   Balazs Gibizer (Ericsson)  Balazs Gibizer (Ericsson)  Completed (Mitaka)
Congress  Push Type Datasource Driver                  Masahito Muroi (NTT)       Masahito Muroi (NTT)       Completed (Newton)
          Adds Doctor Driver                           Masahito Muroi (NTT)       Masahito Muroi (NTT)       Completed (Newton)


SFQM Overview

Enable provisioning, monitoring and service impacting fault detection infrastructure on the Platform

• Develop the utilities and libraries in DPDK to support:
  • Measuring Telco Traffic and Performance KPIs, including:
    • Packet Delay Variation.
    • Packet loss.
  • Monitoring the performance and status of the DPDK interfaces.
  • Detecting and reporting violations that can be consumed by VNFs and higher level fault management systems.
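The two KPIs named above can be illustrated with a short calculation. This sketch assumes one common definition of packet delay variation: the difference in one-way delay between consecutive delivered packets. The slides do not say which definition SFQM uses, and the sample data and thresholds below are made up.

```python
from typing import List, Optional, Sequence


def packet_loss_ratio(sent: int, received: int) -> float:
    """Fraction of sent packets that never arrived (0.0 when nothing was sent)."""
    if sent <= 0:
        return 0.0
    return max(sent - received, 0) / sent


def delay_variation(delays_us: Sequence[Optional[float]]) -> List[float]:
    """Delay difference between consecutive delivered packets, in microseconds.

    `delays_us[i]` is the one-way delay of packet i, or None if it was lost.
    Lost packets are skipped, so each difference spans two delivered packets.
    """
    delivered = [d for d in delays_us if d is not None]
    return [b - a for a, b in zip(delivered, delivered[1:])]


delays = [100.0, 120.0, None, 110.0, 150.0]   # made-up sample data
pdv = delay_variation(delays)
loss = packet_loss_ratio(sent=len(delays), received=4)

# Hypothetical thresholds: report a violation above 30 us variation or 10% loss.
violation = max(abs(v) for v in pdv) > 30.0 or loss > 0.1
```

A monitoring plugin could report `violation` as an event to a higher level fault management system rather than forwarding every raw sample.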


Contributions to Open Source Projects

[Column layout lost in extraction. Column headings: DPDK 2.0, DPDK 2.1, DPDK 2.2, DPDK 16.04, Collectd. Items are listed in extraction order.]

• Callback API
• RX/TX timestamping sample app.
• Extended NIC stats for ixgbe.
• proc_info
• Extended NIC stats for igb, i40e and VFs.
• Extended NIC API alignment across drivers.
• DPDK KeepAlive
• Primary process liveliness check.
• dpdkstat plugin (pull request)
• Ceilometer plugin.

• Measure packet latency through DPDK
• Retrieve statistics directly from NIC registers.
• Detect DPDK application thread failure.
• Detect DPDK primary process liveliness.
• Relay DPDK interface stats and status back to ceilometer.


Contributions to Open Source Projects

[Column layout lost in extraction. Column headings: DPDK 16.07, DPDK 16.11, Collectd, Collectd, Open vSwitch. Items are listed in extraction order.]

• Extended Stats API Implementation for vhost PMD. *
• Improved primary/secondary DPDK process interaction. *
• Extend the Extended NIC stats API to retrieve last cleared time for stats as well as average and peak bit rate. *
• Extend the Extended NIC stats API to retrieve statistical latency for packets. *
• OVS extended NIC stats API integration. *
• OVS Keep Alive integration.
• OVS plugin to retrieve interface and flow stats.
• CMT plugin to report Cache Occupancy on a per Resource Monitoring ID (RMID) basis.
• Integration with collectd SNMP Plugin.
• ID based Extended NIC stats API
• DPDK Keep Alive exposing core status through POSIX shared Memory Object
• DPDK Link status/KA plugin
• Extended/Improved Extended NIC stats & DPDK KA APIs.

* Still in planning
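The keep-alive idea mentioned on this slide (detecting a stalled core from missing heartbeats) can be sketched as follows. This is not the DPDK keepalive API. `KeepAliveMonitor` is a hypothetical name, and an in-process dictionary stands in for the POSIX shared memory object through which core status would be exposed.

```python
import time
from typing import Dict, List, Optional


class KeepAliveMonitor:
    """Flags cores whose last heartbeat is older than `timeout_s` seconds."""

    def __init__(self, core_ids: List[int], timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Dict[int, Optional[float]] = {c: None for c in core_ids}

    def mark_alive(self, core_id: int, now: Optional[float] = None) -> None:
        """Called by (or on behalf of) a worker core to record a heartbeat."""
        self.last_seen[core_id] = time.monotonic() if now is None else now

    def dead_cores(self, now: Optional[float] = None) -> List[int]:
        """Cores that never reported, or reported longer ago than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            core for core, seen in self.last_seen.items()
            if seen is None or now - seen > self.timeout_s
        )


monitor = KeepAliveMonitor(core_ids=[1, 2, 3], timeout_s=0.5)
for core in (1, 2, 3):
    monitor.mark_alive(core, now=10.0)
monitor.mark_alive(1, now=10.4)          # cores 1 and 3 keep reporting...
monitor.mark_alive(3, now=10.4)
stalled = monitor.dead_cores(now=10.6)   # ...core 2 has been silent for 0.6 s
```

Explicit `now` values keep the example deterministic. A real monitor would poll on a timer and publish the result to a collector plugin.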


Intel® Resource Director Technology

Key Technologies for Improved Visibility and Performance Determinism

Cache Monitoring Technology (CMT)

• Identify misbehaving application and reschedule according to priority
• Cache Occupancy reported on a per Resource Monitoring ID (RMID) basis

Cache Allocation Technology (CAT)

• Last Level Cache partitioning mechanism enabling the separation of an application
• Misbehaving threads can be isolated to increase determinism

Memory Bandwidth Monitoring (MBM)

• Monitors Memory Bandwidth consumption on per thread/core/app basis
• Shares common RMID architecture
• Provides insight into second order of shared resource contention

[Each panel shows a diagram of cores running apps that share a Last Level Cache and DRAM.]
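As an illustration of how per-RMID cache occupancy could be used to identify a misbehaving application, the sketch below flags RMIDs that hold more than a chosen share of the last level cache. The readings, the cache size and the 50% limit are made-up values, and `noisy_rmids` is a hypothetical helper, not part of any Intel or collectd interface.

```python
from typing import Dict, List


def noisy_rmids(occupancy_bytes: Dict[int, int], llc_bytes: int,
                max_share: float) -> List[int]:
    """Return RMIDs whose cache occupancy exceeds `max_share` of the LLC."""
    limit = llc_bytes * max_share
    return sorted(r for r, used in occupancy_bytes.items() if used > limit)


# Made-up readings: RMID -> bytes of last level cache currently occupied.
readings = {1: 2 * 1024**2, 2: 14 * 1024**2, 3: 3 * 1024**2}
offenders = noisy_rmids(readings, llc_bytes=20 * 1024**2, max_share=0.5)
```

The flagged RMIDs could then be rescheduled, or confined to a smaller cache partition with CAT.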


OPNFV Software Stack based on Brahmaputra

[Layered stack diagram, top to bottom:]

• Orchestration and Management
• Virtual Network Functions
• Compute Virtualization Control, Storage Virtualization Control, Network Virtualization Control: OpenStack
• KVM, Ceph, OpenDaylight, OpenContrail, ONOS, OVS
• Data Plane Acceleration: DPDK, ODP
• Infrastructure: Pharos Community Labs, OPNFV Bare Metal Labs


OPNFV SW Stack based on Brahmaputra + SA Ingredients

[Same layered stack diagram as the previous slide (Orchestration and Management; Virtual Network Functions; OpenStack control layers; KVM, Ceph, OpenDaylight, OpenContrail, ONOS, OVS; DPDK, ODP; Pharos Community Labs, OPNFV Bare Metal Labs), with these Service Assurance additions:]

• Monitoring/Analytics Systems
• Doctor
• Monitoring, Resource State Awareness, Fault Correlation & Immediate Fault Notification
• Collectd
• Events, Stats
• Base Platform Telemetry/Monitoring
• SFQM
• Telemetry/Events Extensions
• SNMP, Netflow, Sflow


SFQM + Doctor

[Data flow diagram. Components shown: OVS with DPDK (RX, TX), VM (RX, TX), collectd, collectd dpdkstat plugin, collectd Ceilometer plugin, Ceilometer.]

1. Read

2. Get stats

3. Dispatch Values

4. Pass Values

5. Post Values
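The five steps in the diagram can be mirrored in a toy pipeline. This is not collectd's plugin API. `Collector`, `fake_dpdk_stats` and `post_to_telemetry` are hypothetical stand-ins for the daemon, the dpdkstat read plugin and the Ceilometer write plugin, and the counter values are made up.

```python
from typing import Callable, Dict, List

Values = Dict[str, int]


class Collector:
    """Tiny stand-in for a collector daemon with read and write plugins."""

    def __init__(self) -> None:
        self.read_plugins: List[Callable[[], Values]] = []
        self.write_plugins: List[Callable[[Values], None]] = []

    def run_once(self) -> None:
        for read in self.read_plugins:        # 1. Read
            values = read()                   # 2. Get stats (inside the plugin)
            for write in self.write_plugins:  # 3. Dispatch values
                write(values)                 # 4. Pass values to the writer


def fake_dpdk_stats() -> Values:
    """Hypothetical read plugin; real counters would come from the data plane."""
    return {"port0.rx_packets": 1000, "port0.tx_packets": 980}


posted: List[Values] = []                     # stands in for the telemetry service


def post_to_telemetry(values: Values) -> None:
    posted.append(dict(values))               # 5. Post values


collector = Collector()
collector.read_plugins.append(fake_dpdk_stats)
collector.write_plugins.append(post_to_telemetry)
collector.run_once()
```

Keeping read and write plugins separate means a new statistics source or a new telemetry backend can be added without changing the other side.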


Summary

We would like to hear from you if you have any additional requirements or if you are interested in consuming this work.

Call to Action: Come join us in Doctor and SFQM to help pave the Fault management path in OPNFV.


Cloud Ready NFV Platform

[Architecture diagram. Labels recovered from the figure, grouped for readability:]

• MANO
• NFVI + VIM; VIM
• Compute, Network, Storage; Hypervisor (incl. Hard Real-time support); Virtualised Compute, Virtualised Network, Virtualised Storage
• Projects shown: Doctor, Aodh, Vitrage, Congress, Nova, Neutron, Cinder, Ceilometer
• Infrastructure SA; OpenStack Infrastructure SA
• Collectd OpenStack plugins, sFlow/NetFlow, Syslog, Collectd
• SNMP API; Supporting MIBs [IF-MIB etc., Virtualization MIB]
• PMU counters, NIC counters, vSwitch counters, CPU/Mem/Utilization Counters
• HW Resource Health Detection (Watchdog/Heartbeat); VM Stall Detection/RT Stall Detection
• CAT/CMT/PQoS; Intel RAS Features
• Fast Path: Triggers on events or counters; Fast Path Metrics; Open Fault Interface
• Slow Path: Periodic Pull, 5 mins-15 mins
• Corrective Action; Local Corrective Action; HA Solution Integration; Reliability Agent(s)
• Monitoring/Analytics Systems
• Standard Open APIs; Intel Components; Hardware Resource Monitoring
• Standard Open APIs for Base Platform Resource Provisioning and Monitoring. Includes SNMP/sflow/NetFlow/IPFIX/collectd/ceilometer/EPA