hw monitoring and data center - open compute …...agenda infrastructure revolution infrastructure...

26

Upload: others

Post on 22-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management
Page 2: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

H W M o n i t o r i n g a n d M a n a g e m e n t S y s t e m f o r T e l c o D a t a C e n t e r

Jungsoo Kim/R&D Manager/SK Telecom

Page 3: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Agenda

Infrastructure Revolution

Infrastructure Management in VIM & MEC

HMS Overview

-Objectives

-Automated HW Lifecycle Management

-Architecture

-Open Source in HMS

-Centralized vs. Distributed

-Demo

Future Work

Page 4: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Infrastructure Revolution

Mobile network and data center are converging towards open HW & SW based infrastructure

● Software-Defined Infrastructure ● Open Hardware and Software ● Universal Platform for Diverse Applications

Open Hardware

(Physical Infrastructure)

Open Software

(Virtual Infrastructure)

Unbundled

Server Accelerator Network Storage

Virtual Resource Management

SW-Defined Compute

SW-Defined Networking

ML Job Scheduler

Virtualization Layer

Data Center Operation

Automation/ Intelligence

High-Speed NW Analytics GPU

SW-Defined Storage

Open Source Based In-House Dev.

OTT (SNS, Media, etc)

Telco (LTE, 5G, LoRa, etc)

Enterprise IT (BSS, ERP, etc)

Physical Resource Management

Commodity HW Telco Optimized HW

Page 5: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Infrastructure Management

Management of NFV and MEC’s physical & virtual resources is integral part of 5G building blocks,

Needs another layer of hardware management system

Source: ETSI Mobile-Edge Computing Technical White Paper, 2014

: Infrastructure Management System

• Infrastructure Management System Requirements - Support physical & virtual resources - Vendor neutral system - Automation & Intelligence

SKT VIM Ref. Architecture Multi-access Edge Computing Architecture

Vendor A

Vendor B Vendor B

Vendor A

Page 6: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Hardware Management System

•OS Level Info - hostname - serial number - OS version - CPU info - Network Info - etc.

Collected Data

Mgmt. Network • In-Band

【 Existing System 】

Monitoring • Separated

•HW Level Info - Chassis/Board Info - BMC info - Inlet/Outlet Temperature - Power/Fan - PCIe device/Memory module - etc.

• In-Band & Out-Of-Band

【 HMS 】

•Unified

Expands Infra

Mgmt. Scope

Change Mgmt. •Manual + Automatic • Full Automatic

HW Level Info: Collecting low level H/W info like BMC F/W, Temp, Power, Fan, GPU, Memory module etc.

Standardized Data Collection: standardize HW information of heterogeneous systems and support next-gen protocols

Automated Change Management: Periodical H/W information collection enables detection of H/W changes

OOB Management: Out-of-Band system management enables off-line system management

Page 7: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Objectives

Have an deep knowledge of data center hardware components

– Asset Data: Memory, GPU, NIC, Raid Controller (location/form factor/asset tag/serial/manufacture ...) etc.

– Sensor data : Inlet/outlet Temperature, Power Usage etc.

Automated change management and asset register

– Automatically detects any hardware and configuration changes

– Updates asset DB only with the machine generated code to reduce human error

• ex. Dell Power Edge R720, PE R720, Dell 720 … , Serial Number, Mac Address

Vendor neutral management

– Supports both off-the-shelf system and open source hardware

– Mapping vendor specific keys to the standardized item

Flexible system

– Adding new devices or components easily

– ex) GPU, Accelerator card etc.

Page 8: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Automated HW Management

Procurement Install Configure Monitoring System

Change

Use

Change Disposal

Vendor

Operation

HMS

Submit

Document

CMDB/Asset DB/Operation DB

Tech. Support

Check Changed Item &

Update DB

Change Tracking Management

(Compare w/ CMDB)

Delivery &

Install

Collect HW

Data

OS & App

Install

Confirm &

CMDB Update

Register

Sys Info

Disposal

Process

Update DB

Minimizes human interferences, and detects system installation and changes automatically

Page 9: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Architecture

Telemetry Framework

Compute Asset Network Asset Storage Asset Rack Asset IT Asset

Protocol IPMI RedFish SNMP Open

Config SNMP PCIe Stg PMBUS

IPMI/

Serial

Compute

Module

Network

Module

Storage

Module

Rack

Module

Processing

Module

Common

API

Web Application Server

Application

Relational

Time Series

Agent

Manager

Cluster

Manger

Resource

Analytics

Change

Manager

: undecided

REST API

UI

Page 10: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Open Source SW in HMS

Telemetry Framework

Compute Asset Network Asset Storage Asset Rack Asset IT Asset

Protocol IPMI RedFish Open

Config SNMP PCIe Stg PMBUS

IPMI/

Serial

Compute

Module

Network

Module

Storage

Module

Rack

Module

Processing

Module

Common

API

Web Application Server

Application

Relational

Time Series

Agent

Manager

Cluster

Manger

Resource

Analytics

Change

Manager

: undecided

REST API

UI

SNMP

Dashboard

Page 11: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

SNAP Introduction

Snap is an open telemetry framework designed to simplify

the collection, processing and publishing of system data

through a single API

• REST & CLI

• Flexible Scheduling

• Plugin Lifecycle Management

• Tribe (Clustering)

Source: https://www.slideshare.net/MatthewBrender/intro-to-open-source-telemetry-linux-con-2016

Page 12: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Centralized vs. Distributed

Node

Node

Node

︙ Aggregator

MQ

Cluster

Control

Hbase

Node

Node

Node

Node

DB

Cluster

Control

DB

MQ

Legacy SNAP

Node

• A controller only sends each groups configurations and plugins to

one of the nodes, and the SNAP takes care of rest of the tasks

• The node’s agent can collect, process and publish the data

• No central group manger needed

• A controller manages each node’s configuration and plugins

• A node agent sends raw data, and the aggregator process (if

necessary) and publish the data to the database

• The cluster controller manages the groups

Page 13: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Plugins

Collector

Publisher

• Developed three collector plugins to collect unsupported hardware data and a processor plugin to process raw data

• Collecting, post-processing, and publishing all done at the agent node

Processor

Facter

System Information

~200 metrics

HW Component

Dmidecode/lspci/smartctl

~510 metrics

IPMI BMC, Fan, Temp, Power …

~30 metrics

NVIDIA SMI Temp, Fan, Power, Util. …

~30 metrics

Tag

Tagging custom data

InfluxDB

For time series data

: Plugin developed : Open source plugin

Page 14: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

HMS – Main Page

Detailed Node Info

Node List

Cluster Stats

Group List

Page 15: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

HMS – Group Management

Task Management

Plugin Management

Node Management

Page 16: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

HMS – Plugin Management

Plugin List

Page 17: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

HMS – Task Management

Task Editor File List

Page 18: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

HMS – Change Management

Change History

Page 19: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Dashboard - Grafana

Page 20: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Dashboard - Detailed Node View

Page 21: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Dashboard – Detailed Node View (GPU Info)

Page 22: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

3DV/HMS Integration Demo

Page 23: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Future Work

• Support for Multiple Rooms (Data Centers)

• NFV & MEC Integration

• Complete Next Generation Hardware Management Protocol support (RedFish, OpenConfig etc.)

• Expand Data Coverage (ex. Switch, Enterprise Storage)

• Integration with Facility Management System (FMS)

• Open sourcing developed plugins

Page 24: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management

Thank you

email: [email protected]

SK Telecom Upcoming Talk:

Date: Wednesday March 21, 2018 11:00am - 11:30am

Venue: 210 C (EW: Storage)

Subject: SK Telecom: Shareable DAS Pool with All NVMe Array (Eric H. Chang)

Page 25: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management
Page 26: HW Monitoring and Data Center - Open Compute …...Agenda Infrastructure Revolution Infrastructure Management in VIM & MEC HMS Overview -Objectives -Automated HW Lifecycle Management