monitoring system targeting openstack, baremetal, and network fabric

Unified Monitoring System targeting OpenStack, Baremetal, and Network Fabric. Jaesuk Ahn (안재석), SDI Tech. Lab., NIC기술원 (NIC Technology R&D Center), SK Telecom

Upload: jaesuk-ahn

Post on 16-Feb-2017


TRANSCRIPT

Page 1: Monitoring System Targeting OpenStack, Baremetal, and Network Fabric

Unified Monitoring System targeting OpenStack, Baremetal, and Network Fabric

Jaesuk Ahn (안재석)

SDI Tech. Lab., NIC기술원 (NIC Technology R&D Center), SK Telecom

Page 2

The story goes like this

R&D lab (implementing a Software Defined Data Center)

- Business/operations department requirements vs. advanced R&D (Product)

- Navigating between the ideal and reality

- Step by step

Page 3

Baremetal Monitoring

Datacenter Operation Automation

- Business/operations requirement: Asset Management & Baremetal Provisioning

- What we want: Software Defined DC based on Cloud Platform

Baremetal Monitoring

- Business/operations status: an existing solution is in place (requirement: cut costs by moving to open source)

- What we had: Zabbix

Page 4

Baremetal Monitoring

What is important here…

- Cut costs by replacing the existing solution used by the business/operations departments

. Metrics/Logs: at least as good as the existing solution

. Alarm: at least as good as the existing solution (accurate alarms!)

. Dashboard & UI: at least as good as the existing solution

. Operations/incident response: at least as good as the existing solution

Page 5

Physical Resource Management (racks, servers, switches, etc.)

Baremetal Provisioning Automation (Razor + Chef)

T-ROS (Datacenter Operation Platform)

Page 6

VM Monitoring

Private Cloud 2.0

- Business/operations requirement: Open Source (OpenStack) based Cloud Service

- What we want: Yeah~

Private Cloud Monitoring

- Business/operations status: "Please tell us what we should use."

- What we decided: Thinking…

Page 7

VM (Cloud) Monitoring

What is important here…

- Figure out what needs to be monitored in order to operate OpenStack

. OpenStack Controller, Service, Compute Node, Storage, etc.

- It must fit the OpenStack distribution currently in use and, beyond that, be easy to apply to any OpenStack distribution

Page 8

Monasca Project (OpenStack) https://wiki.openstack.org/wiki/Monasca

Monitoring-as-a-Service solution based on a first-class REST API

• Multi-tenancy based on Keystone authentication. Supports self-service.

Metrics storage/retrieval/statistics and alarm/thresholding engine

Notification system

Open-source and built-on open-source technologies such as:

• Kafka: Performant, scalable, fault-tolerant, durable message queue. Used by LinkedIn, Twitter, …

• Apache Storm:

• Time-series databases: InfluxDB supported today.
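As a rough illustration of the REST API described on this slide, the sketch below builds (but does not send) a request that posts one measurement to the Monasca API's `/v2.0/metrics` resource. The endpoint URL, token, metric name, and dimensions are placeholder assumptions, not values from the talk.

```python
import json
import time
import urllib.request

# Hypothetical endpoint and token; replace with values from your own deployment.
MONASCA_URL = "http://monasca.example.com:8070/v2.0"
KEYSTONE_TOKEN = "replace-with-a-real-token"


def build_metric(name, value, dimensions, timestamp_ms=None):
    """Return a metric body in the shape accepted by POST /v2.0/metrics."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # the API takes milliseconds
    return {
        "name": name,
        "dimensions": dimensions,
        "timestamp": timestamp_ms,
        "value": float(value),
    }


def build_request(metric, url=MONASCA_URL, token=KEYSTONE_TOKEN):
    """Build the HTTP request; the Keystone token identifies the tenant."""
    return urllib.request.Request(
        url + "/metrics",
        data=json.dumps(metric).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


metric = build_metric("cpu.user_perc", 42.5, {"hostname": "compute-01"}, 1487203200000)
request = build_request(metric)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Because the token travels with every request, each tenant can post and query its own metrics, which is what makes the self-service model possible.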

Page 9

Monasca Architecture

Out-of-the-Box

Data collection through the API

Data retrieval through the API

Data collection and alarms tailored to OpenStack (link)

Various services can be added on top of Kafka - with future extensibility in mind
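To make the alarm side of this slide more concrete, here is a minimal sketch that composes a threshold expression and wraps it in a body shaped for the Monasca API's `/v2.0/alarm-definitions` resource. The helper names, the metric, the dimension, and the threshold are illustrative assumptions; the talk does not show its actual alarm definitions.

```python
import json


def threshold_expression(metric, threshold, function="avg", dimensions=None):
    """Compose an expression such as avg(cpu.user_perc{service=compute}) > 90."""
    dims = ""
    if dimensions:
        pairs = ",".join(f"{key}={value}" for key, value in sorted(dimensions.items()))
        dims = "{" + pairs + "}"
    return f"{function}({metric}{dims}) > {threshold}"


def build_alarm_definition(name, expression, severity="HIGH", match_by=None):
    """Return a body in the shape accepted by POST /v2.0/alarm-definitions."""
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        "match_by": match_by or [],
    }


# Hypothetical example: one alarm per compute host when average user CPU exceeds 90%.
expression = threshold_expression("cpu.user_perc", 90, dimensions={"service": "compute"})
definition = build_alarm_definition(
    "High CPU on compute nodes", expression, match_by=["hostname"]
)
print(json.dumps(definition, indent=2))
```

Using `match_by` on `hostname` lets a single definition produce a separate alarm for each host instead of requiring one definition per server.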

Page 10

P.CL User Portal

Page 11

P.CL User Monitoring

Page 12

Analytics

Datacenter Operation Analytics

- Business/operations requirement: "Give us accurate alarms."

- What we want: "Don't you need an analytics tool?"

Private Cloud Monitoring

- Business/operations status: even if they wanted to, they have no capacity to spare for analysis.

- What we decided: analytics tool + knowledge (do it ourselves)

Page 13

?

Analytics (T-ROI)

Page 14

Analytics (T-ROI)

Page 15

Analytics (T-ROI)

Page 16

Analytics (T-ROI)

Page 17

Container Monitoring

PaaS (DevOps Platform)

- Business/operations requirement: … (Do we need PaaS?)

- What we want: build a PaaS service based on a DevOps platform

Container Monitoring

- Business/operations status: What?

- What we decided: Just do it

Page 18

T-Fabric (PaaS) Monitoring Architecture

[Architecture diagram; component labels recovered from the slide]

- T-Fabric Portal System: T-Fabric Portal Web, T-Fabric Portal Collector

- Pinpoint System: Pinpoint DB (HBase), Pinpoint Collector Server, Pinpoint Web Server

- T-Fabric Metering DB Server: T-Fabric DB (MongoDB)

- Zabbix System: Zabbix Server, Zabbix DB (MySQL)

- T-Fabric Monitoring, T-Fabric Metering

- VM (two shown): Zabbix Agent, cAdvisor, and three containers, each with a Pinpoint Agent

- Interfaces: Pinpoint Web API, Zabbix Web API

- Data paths: APM Monitoring, System Monitoring, System Metering, Metering/Monitoring

Page 19

T-Fabric (PaaS)

Page 20

T-Fabric (PaaS)

Page 21

Houston, we have a problem

Page 22

Changes That Began with Private Cloud

P.CL

- SKT's self-provisioning cloud service (built on open-source technology)

. Changed the internal IT resource usage process and revised security policies to fit the cloud

. Actively attempted to adopt open source

- Unified monitoring is needed across Baremetal (Host), OpenStack Services, and VMs

- Laid the groundwork for "new" attempts in a new environment that differs from the existing legacy environment

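Page 23

Fragmentation of Solutions/Monitoring

Baremetal, VM, Container, and even analytics: it is all too complicated.

- "So, what am I supposed to use? Which screen should I look at?"

- "So, where exactly do the alarms come from?"

The data requirements for monitoring and analytics are similar, so where should the data come from?

- The requested data is similar, but the collection agents differ, the monitoring data paths differ, and it is all complicated.

- We each hold information the other needs, but how do we exchange it?

- We have information processed from raw data; how do we make it usable by other solutions?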

Page 24

Let’s unify all

Page 25

Step-by-Step

Expand the scope of Monasca Agent and Beaver - extend to baremetal and VMs as well

Apply the existing datacenter Metrics/Logs/Alarms - develop additional Monasca Agent plugins (link)

Unify infra resource (baremetal, VM) metadata

Integrate the Datacenter Operation Platform and the Analytics Platform
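The additional Monasca Agent plugins mentioned on this slide are not shown in the talk. As a hedged sketch of what a custom check can look like, the example below subclasses `AgentCheck` and reports one gauge per configured instance. The plugin name, the metric name, and the `path`/`name` configuration keys are hypothetical, and a small stand-in base class is defined so the sketch also runs where monasca-agent is not installed.

```python
import os
import tempfile

try:
    # Real base class, available where monasca-agent is installed.
    from monasca_agent.collector.checks import AgentCheck
except ImportError:
    # Minimal stand-in for illustration only (an assumption, not the real
    # class): it just records what gauge() receives.
    class AgentCheck(object):
        def __init__(self, name, init_config, agent_config, instances=None):
            self.name = name
            self.reported = []

        def gauge(self, metric, value, dimensions=None):
            self.reported.append((metric, value, dimensions))


class SensorFileCheck(AgentCheck):
    """Hypothetical plugin: reports a number read from a text file as a gauge."""

    def check(self, instance):
        path = instance["path"]  # hypothetical per-instance config key
        with open(path) as handle:
            value = float(handle.read().strip())
        dimensions = {"sensor": instance.get("name", os.path.basename(path))}
        self.gauge("datacenter.sensor_value", value, dimensions=dimensions)


# Exercise the check once against a temporary file standing in for a sensor.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as sensor:
    sensor.write("23.5\n")
check = SensorFileCheck("sensor_file", {}, {})
check.check({"path": sensor.name, "name": "rack-12-inlet"})
os.remove(sensor.name)
```

Keeping each legacy datacenter metric behind a small check like this means the agent, transport, and alarm path stay the same for baremetal and VMs.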

Page 26

Step-by-Step

Page 27

Renewed Operation - 1

Page 28

Renewed Operation - 2

Page 29

Renewed Operation - 3

Page 30

Renewed Operation - 4

Page 31

Renewed Operation - 5

Page 32

Renewed Operation - 6

Chain of Actions

• Delete the ghost VMs

• Identify why ghost VMs are created, then fix it

• How to monitor for ghost VM creation (monasca collector log?)

• …
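The slide does not define "ghost VM" or say how they were found. Assuming it means an instance that is still running on a hypervisor but is no longer known to OpenStack, a detection step could be sketched as a set difference between the two inventories. The function name and the sample data below are hypothetical.

```python
def find_ghost_vms(hypervisor_domains, nova_instances):
    """Return, per host, the UUIDs running on the hypervisor but unknown to Nova.

    hypervisor_domains: dict of host name -> iterable of instance UUIDs seen locally
    nova_instances: iterable of instance UUIDs that OpenStack knows about
    """
    known = set(nova_instances)
    ghosts = {}
    for host, uuids in hypervisor_domains.items():
        unknown = sorted(set(uuids) - known)
        if unknown:
            ghosts[host] = unknown
    return ghosts


# Hypothetical inventories; in practice these would come from the hypervisors
# and from the OpenStack API respectively.
on_hypervisors = {
    "compute-01": ["vm-a", "vm-b"],
    "compute-02": ["vm-c", "vm-x"],
}
known_to_nova = ["vm-a", "vm-b", "vm-c"]

ghosts = find_ghost_vms(on_hypervisors, known_to_nova)
print(ghosts)
```

Emitting the per-host ghost count as a metric would let the same alarm pipeline cover the "how to monitor" item in the list above.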

Page 33

We still have a long journey to go

• Integrating Container Monitoring

• Integrating Network Monitoring

• Better analytics - More Data

• Metrics

• Logs

• Event

• Network Packet Flow

• SNMP

• …

• Better analytics - Anomaly Detection

Page 34

Step-by-Step

Page 35

Better Analytics

• 2016: Develop algorithm-based anomaly detection and validate it against real data

• 2017: Validate algorithms (deep learning, etc.) and apply them in production (OpenStack-focused)
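The talk does not say which anomaly detection algorithms were used. Purely as an illustration of the general idea, the sketch below flags points that deviate strongly from a trailing window using a rolling z-score; this is a simple stand-in technique, not the authors' method, and the window size, threshold, and sample series are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Yield (index, value) for points far from the trailing window's mean."""
    history = deque(maxlen=window)
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield index, value
        history.append(value)


# Hypothetical CPU utilisation samples with one obvious spike.
cpu_percent = [20, 21, 19, 20, 22, 21, 95, 20, 21]
anomalies = list(detect_anomalies(cpu_percent))
print(anomalies)
```

A fixed threshold alarm would need a hand-picked limit per metric, whereas a relative test like this adapts to each series, which is one reason anomaly detection is attractive for reducing false alarms.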

Page 36

Network Monitoring

Virtual Network Management / Fabric Management

- Business/operations requirement: It will be good to have.

- What we want: SDN-based virtual network management & fabric management

Container Monitoring

- Business/operations status: Network (existing solution in place) / IT (vendor dependent)

- What we decided: Software Defined Network Visibility

Page 37

OpenStack Needs Network Monitoring

Page 38

Network Monitoring

Develop a monitoring platform that collects monitoring information through SNMP, sFlow, NetFlow, mirroring, etc. from SDN networks or legacy L2/L3 networks, converts it into the desired flow information, and transmits statistical data

Page 39

Network Monitoring

NPB (Network Packet Broker) - a packet filter that can classify and forward packets mirrored from a TAP based on the 5-tuple

NPM (Network Packet Monitoring) - tracks flows in packets mirrored from a TAP and monitors network performance per flow

Flow Analyzer - extracts and analyzes flow information using sFlow/NetFlow
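The two packet-level roles on this slide, classification by 5-tuple and per-flow tracking, can be sketched in a few lines. This is a toy model under stated assumptions: packets are already parsed into a 5-tuple and a length, rule fields that are omitted act as wildcards, and all names (`FiveTuple`, `FlowTable`, `matches`) are hypothetical rather than taken from the NPB/NPM implementations.

```python
from collections import defaultdict, namedtuple

# The classic 5-tuple that identifies a flow.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port protocol")


def matches(key, rule):
    """NPB-style filter: every field named in the rule must match the packet."""
    return all(getattr(key, field) == value for field, value in rule.items())


class FlowTable:
    """NPM-style tracking: accumulate packet and byte counts per flow."""

    def __init__(self):
        self.flows = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def observe(self, key, length):
        stats = self.flows[key]
        stats["packets"] += 1
        stats["bytes"] += length


# Hypothetical mirrored packets: (5-tuple, length in bytes).
web_flow = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
dns_flow = FiveTuple("10.0.0.3", "10.0.0.2", 53000, 53, "udp")
packets = [(web_flow, 1500), (web_flow, 500), (dns_flow, 120)]

web_rule = {"dst_port": 80, "protocol": "tcp"}  # forward only web traffic
table = FlowTable()
forwarded = []
for key, length in packets:
    table.observe(key, length)
    if matches(key, web_rule):
        forwarded.append(key)

print(dict(table.flows))
```

Per-flow counters like these are also the kind of statistic that a flow analyzer derives from sFlow or NetFlow records, so the same table shape can serve both data sources.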

Page 40

Network Monitoring

T-CAP: server switch* (news article)

Page 41

3D Network Administration

Page 42

Our Goal (Conceptually)

Page 43

Let me put on my community hat.

Page 44

Help Together in the Community

OpenStack Operator Group =>

• What should we monitor, and how, in order to operate OpenStack?

• What approaches are there for unified monitoring of Baremetal, VM, and Container?

• Has anyone out there already tested any of this?

• Operations… @#$!#@#$%$# (venting)

We said we would do this, and we still have not even started. ^^