Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Power-based Converged Infrastructure with Nagios


Post on 21-Mar-2017


Monitoring a Power-based Converged Infrastructure

with Nagios

Marcelo Perazolo


About Me

• Software Architect with IBM
– Worked in different IBM divisions: Tivoli, WebSphere, Systems
– 25 years of experience with management of “anything under the Sun” (Systems, Network, Storage, Middleware, Applications, Cloud, etc.)
– Emphasis on open source software and Power Systems / OpenPOWER
– Limited previous exposure to the open source community. Trying to convert to the “light” side of the Force: be a good user, contributor and open community citizen!

Why Power?

MariaDB on POWER8 S822L delivers 1.87X performance per core and up to 40% better price-performance than Intel Xeon E5-2660 v3 Haswell

Reduce operating costs with fewer systems at a lower acquisition cost

[Figure: two bar charts comparing POWER8 and E5-2660 v3 on the READ and 90/10 Rd/Wr workloads. Left panel: Transactions per Minute (axis 0 to 16000; data labels 13293, 10267, 5467, 6406). Right panel: TpM / $ (axis 0 to 1; data labels 0.48, 0.38, 0.63, 0.45).]

• Results are based on IBM internal testing of single system image systems running Sysbench OLTP version.05 @ 32M and are current as of May 29, 2015. Performance improvement figures are based on multiple G2 processes running a 32 million record workload. Individual results will vary depending on individual workloads, configurations and conditions.

• IBM Power System S822L: 20 cores / 160 threads, POWER8, 3.4 GHz, 128 GB memory, MariaDB 10.1, RHEL 7.1, RHEV

• Competitive stack: Dell R730, 20 cores / 40 threads, Intel E5-2660 v3, 2.6 GHz, 128 GB memory, MariaDB 10.1, RHEL 7.1, RHEV

Continuous data load

Massive IO bandwidth

Flash for extreme performance

Parallel processing

Large-scale memory processing

Innovation

IBM POWER server is the first server that has made its systems, processor, and chip design and architecture fully available to an open development alliance for comprehensive licensing and collaborative design, allowing third parties to co-innovate.

Why Power?

Porting Linux applications to Power Systems is quick and easy

Most applications port with a simple recompile and test

• 95% of Linux on x86 applications written in C/C++ port to Linux on Power with no source code change, just a simple recompile and test (1)
– Canonical reported an average of 250 open source applications ported per day on Ubuntu. 95% of the Ubuntu 14.04 LTS compiled software ported with a simple recompile and test

• 100% of hardware-agnostic Linux on x86 applications written in scripting (Java) or interpreted languages will run as is with no changes (2)

• IBM is committed to further simplifying porting and development on Linux on Power
– Embrace open standards and partner with open communities such as OpenPOWER, OpenStack, Ubuntu, and Cloud Foundry
– New tooling and function such as Bluemix
– Provide easier means to build apps leveraging existing code in the open communities

1. Includes C/C++ and other compiled languages. Assumes 16 hours of dedicated time and prior experience with the application code and its dependencies (e.g. language, libraries, web application, database) and that dependencies are already ported and installed. Assumes no platform or device specific dependencies.

2. Interpreted languages include PHP, Python, Perl, Ruby, Java, etc. Assumes 8 hours of dedicated time and prior experience with the application code and its dependencies (e.g. language, libraries, web application, database) and that dependencies are already ported and installed. Assumes no platform or device specific dependencies.

Converged Infrastructure?

“Data Center in-a-Box”

Award Winning Hardware Design

Open Linux Environment
• 3000+ Applications
• Little Endian Support

Consolidated Support

Unified Console, Single Point of Access

Competitive Pricing & Financing

Management Stack
• PowerVM
• PowerVC
• Nagios

• Shareable Compute resources
• Intra-rack Networking
• Storage Fabric / Distributed Elastic Storage
• Upward Integration
• Virtualization / Cloud-capable

Decreased maintenance

Increased flexibility & control

Management Node(s) Overview

[Diagram: management node layout. The labels recovered from the figure are listed below.]

Primary
• Host: thin RHEL/KVM, vanilla
• Hypervisor: KVM (rhel7.1)
• PureMgr (rhel7.1), PurePower Integrated Manager: UI integration, Inventory, Configuration, Monitoring (Nagios)
• PowerVC (rhel7.1): Virtualization Management
• Service (rhel7.1): Service, Troubleshooting, Bring-Up
• Future components: future function backed by OSS to be added here

Secondary / HMC
• V 1.0 only, virtualized later
• Future failover function

Direction to move to Power-based nodes

Increased OpenStack adoption later:
- ICM with OpenStack
- BlueBox
- Etc.

Deliverables to Manufacturing
• Virtual appliance image: Nagios Core, Nagios UI, PureMgr UI, PureMgr Core
• Activation and configuration scripts

PureMgr Architecture

Integrated Management: PureMgr

Virtual Appliance
• Automated virtual image deployment and configuration
• Capability to preserve customizations on image update

Hardware Inventory & Monitoring
• Automated configuration of all rack devices
• Automated configuration of Nagios & SNMP monitoring infrastructure
• Capability to discover and auto-configure new devices

UI/CLI support
• UI to serve as integration point to all rack element managers
• CLI for management and integration operations

PureMgr (detail)

[Diagram: PureMgr components and data flows. Components: Nagios Core, Nagios UI, Nagios Plugins, SNMPTT, SNMP Proxy, Integrated Manager UI, Integrated Manager Core, Integration Plugins, Inventory Data. Managed elements: Hardware Devices (SNMP Agents, UI), PowerVC, HMC, VIOSes. Labeled flows: snmp traps, snmp traffic, snmp operations, device-specific operations, subscription + configuration, MIBs, trap map, config.]

Guest Monitoring (detail)

[Diagram: guest monitoring flow, numbered 1 to 5. Components: Nagios Core, Nagios UI, Nagios Plugins, PureMgr Core, PureMgr UI, PureMgr Plugins, PowerVC, Inventory Data, Virtual Guest OS with NRPE Agent. Labeled flows: query guests, deploy, NRPE, remote monitoring reports.]

1. PureMgr queries PowerVC for deployed Guest OSs
2. Admin uses PureMgr UI to select which endpoints to deploy agent and monitor
3. Automated NRPE agent deployment to all monitored endpoints
4. Deployed NRPE instances send monitored data back to Nagios
5. Nagios UI displays monitored data & alerts

Future: consolidate NRPE reports for scalability/performance purposes (e.g. Check_MK)
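The guest monitoring steps above can be sketched roughly as follows. This is a minimal Python sketch, not PureMgr's actual code: the `Guest` shape and the hard-coded guest list stand in for the PowerVC query, the `generic-service` template and a `check_nrpe` command definition are assumed to exist in the Nagios configuration, and `check_load` / `check_disk` are assumed to be defined in each guest's NRPE configuration. Agent deployment (step 3) and the matching host definitions are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """A guest OS as the virtualization manager might report it (hypothetical shape)."""
    name: str
    address: str


# NRPE commands assumed to be defined on every monitored guest (illustrative choice).
NRPE_CHECKS = {"CPU Load": "check_load", "Disk Usage": "check_disk"}


def select_guests(guests, selected_names):
    """Step 2: keep only the guests the admin picked for monitoring."""
    wanted = set(selected_names)
    return [g for g in guests if g.name in wanted]


def render_services(guest):
    """Steps 4-5: service definitions through which Nagios queries the guest's NRPE agent."""
    blocks = []
    for description, nrpe_command in NRPE_CHECKS.items():
        blocks.append(
            "define service {\n"
            "    use                  generic-service\n"
            f"    host_name            {guest.name}\n"
            f"    service_description  {description}\n"
            f"    check_command        check_nrpe!{nrpe_command}\n"
            "}\n"
        )
    return "\n".join(blocks)


# Step 1 stand-in: in a real setup this list would come from a PowerVC query.
discovered = [Guest("guest-a", "10.0.0.11"), Guest("guest-b", "10.0.0.12")]
monitored = select_guests(discovered, ["guest-b"])
config_text = "\n".join(render_services(g) for g in monitored)
print(config_text)
```

Generating plain object definitions keeps the sketch independent of how the agent is deployed; the resulting text would be written to a file that Nagios includes and then reloaded.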

Integrated Console

• Familiar Look & Feel
• 3 integration areas:
– Integrated Applications (e.g. Nagios)
– Virtualization Mgmt (e.g. PowerVC)
– Hardware Mgmt (e.g. HMC, V7000)
• Clickable links for contextual launch
• Rack location for easy physical access

Hardware Inventory

• Machine Type Model
• Serial Number
• IP Address
• Type of resource/device
• Label for easy identification
• Rack number and EIA location
• Administration user id
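One way such an inventory record could drive automated Nagios configuration is sketched below. The field names mirror the inventory columns listed on this slide, but the `InventoryRecord` class, the sample values, and the choice of the `generic-host` template are assumptions made for illustration, not PureMgr's actual data model. Serial number and rack position are carried as Nagios custom variables, which must start with an underscore.

```python
from dataclasses import dataclass


@dataclass
class InventoryRecord:
    """One rack device, using the fields shown in the inventory view (hypothetical class)."""
    machine_type_model: str
    serial_number: str
    ip_address: str
    device_type: str
    label: str
    rack_number: int
    eia_location: int
    admin_user: str


def to_nagios_host(rec, template="generic-host"):
    """Render a Nagios host definition; custom variables carry the inventory extras."""
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {rec.label}",
        f"    alias       {rec.device_type} {rec.machine_type_model}",
        f"    address     {rec.ip_address}",
        f"    _SERIAL     {rec.serial_number}",
        f"    _RACK       {rec.rack_number}",
        f"    _EIA        {rec.eia_location}",
        "}",
    ]
    return "\n".join(lines)


# Placeholder values only; none of these identify a real device.
example = InventoryRecord(
    machine_type_model="MTM-EXAMPLE",
    serial_number="SN000000",
    ip_address="192.0.2.10",
    device_type="switch",
    label="mgmt-switch-1",
    rack_number=1,
    eia_location=20,
    admin_user="admin",
)
host_text = to_nagios_host(example)
print(host_text)
```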

Network Integration

• Drives reconfiguration of all resources in the management network
• Subnet mask and gateway address (for routing)
• IP addresses can be individually changed (validated with mask)
• Changing subnet automatically changes all device IPs
• Supports devices with multiple management interfaces

Immediate on-premises network integration! (Time-To-Value)
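The subnet behavior described on this slide can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions, not the product's logic: it assumes the old and new subnets use the same mask, that each device keeps its host offset when the subnet changes, and that devices are held in a plain dictionary mapping a name to a list of management addresses. All function names are hypothetical.

```python
import ipaddress


def validate_ip(ip, subnet):
    """True if ip is a usable host address inside subnet (validated with the mask)."""
    net = ipaddress.ip_network(subnet)
    addr = ipaddress.ip_address(ip)
    return addr in net and addr not in (net.network_address, net.broadcast_address)


def remap_ip(ip, old_subnet, new_subnet):
    """Move one address to the new subnet, keeping its offset from the network address."""
    old_net = ipaddress.ip_network(old_subnet)
    new_net = ipaddress.ip_network(new_subnet)
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("this sketch only handles subnets with the same mask")
    addr = ipaddress.ip_address(ip)
    if addr not in old_net:
        raise ValueError(f"{ip} is not in {old_subnet}")
    host_offset = int(addr) - int(old_net.network_address)
    return str(new_net.network_address + host_offset)


def remap_devices(devices, old_subnet, new_subnet):
    """Remap every management interface of every device (devices may have several)."""
    return {
        name: [remap_ip(ip, old_subnet, new_subnet) for ip in ips]
        for name, ips in devices.items()
    }


rack = {"hmc": ["192.168.10.5", "192.168.10.6"], "switch": ["192.168.10.20"]}
print(remap_devices(rack, "192.168.10.0/24", "10.1.2.0/24"))
```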

Monitoring of Power nodes

The example shows monitoring of LED status for Power compute nodes and VIOS RMC connection availability

VIOS
• Warning event shows the 1st Power compute node can be used for LPAR deployment, but has no failover capabilities
• Critical events show 3 Power compute nodes without LPAR deployment capabilities

LED
• All Power compute node LEDs are in attention state
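The warning/critical split for VIOS could map onto Nagios plugin states along these lines. This is a hedged sketch: it assumes each compute node runs two VIOSes and that the check receives the number of VIOS RMC connections currently active, neither of which is stated in the slides. The exit codes 0 to 3 are the standard Nagios plugin return codes; the function names are hypothetical.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def vios_rmc_state(active_connections, expected=2):
    """Map the number of active VIOS RMC connections to a plugin state.

    Assumption: all expected connections means full capability, some but not
    all means deployment works without failover, none means no deployment.
    """
    if active_connections < 0 or active_connections > expected:
        return UNKNOWN
    if active_connections == 0:
        return CRITICAL
    if active_connections < expected:
        return WARNING
    return OK


def plugin_line(node, active_connections, expected=2):
    """Return (exit_code, status_text) in the usual one-line plugin output style."""
    state = vios_rmc_state(active_connections, expected)
    text = (
        f"{STATE_NAMES[state]} - {node}: "
        f"{active_connections}/{expected} VIOS RMC connections active"
    )
    return state, text


for count in (2, 1, 0):
    print(plugin_line("node1", count))
```

A real plugin would print the status text and exit with the returned code so that Nagios can pick up the state.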

Monitoring of HMC / Storwize

HMC services
• Metrics for CPU/Memory/Swap
• Availability of SSH daemon (for HMC CLI usage)
• Number of RMC processes (connections)
• Trap service shows any SNMP traps sent by the HMC (or the Power compute nodes it monitors)

V7000 services
• Status of Pools, Nodes, Pool Capacity, FC Ports
• Availability of HTTPS server (for V7000 UI usage)
• Trap service shows any SNMP traps sent by the V7000 (or the storage expansions it manages)
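The availability checks for the SSH daemon and the HTTPS server boil down to testing whether a TCP port accepts connections. The sketch below shows that idea with Python's standard `socket` module; it is an illustration only, since the standard Nagios Plugins package already ships `check_ssh` and `check_http` for this purpose. The function name is hypothetical, and the ports in the usage comment (22 and 443) assume the devices use the protocol defaults.

```python
import socket

# Standard Nagios plugin return codes used by this sketch.
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, status_text) depending on whether host:port accepts a connection."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


# Example calls (hypothetical host names):
#   check_tcp("hmc.example", 22)     # SSH daemon for HMC CLI usage
#   check_tcp("v7000.example", 443)  # HTTPS server for the V7000 UI
```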

Monitoring of Switches

• Management Switch
• Data Switch
• Storage Switch

Monitoring Energy Consumption

PurePower racks are equipped with Smart PDUs. Nagios leverages the SNMP monitoring capabilities of the Smart PDUs and shows a full rack view of energy/power consumed.

Data shown per PDU:
• Cumulative kWh
• Total Power in W
• Total VA utilization (for more accurate capacity planning)
• Total Energy consumed in Wh

+ Whole rack energy consumption summary
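The whole-rack summary amounts to summing the per-PDU readings. The sketch below shows that aggregation and formats it as a Nagios status line with performance data after the `|` separator. It assumes the readings have already been collected (for example over SNMP) and are passed in as plain numbers; the `PduReading` class, the function names, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PduReading:
    """Readings for one Smart PDU (hypothetical shape; values come from elsewhere)."""
    name: str
    power_w: float
    apparent_power_va: float
    energy_wh: float


def rack_summary(readings):
    """Sum per-PDU readings into whole-rack totals."""
    total_wh = sum(r.energy_wh for r in readings)
    return {
        "power_w": sum(r.power_w for r in readings),
        "apparent_power_va": sum(r.apparent_power_va for r in readings),
        "energy_wh": total_wh,
        "energy_kwh": total_wh / 1000.0,
    }


def plugin_output(readings):
    """Format the rack totals as 'status text | perfdata' for Nagios."""
    s = rack_summary(readings)
    text = (
        f"OK - rack draws {s['power_w']:.0f} W "
        f"({s['apparent_power_va']:.0f} VA), {s['energy_kwh']:.1f} kWh consumed"
    )
    perfdata = (
        f"power_w={s['power_w']:.0f} "
        f"apparent_power_va={s['apparent_power_va']:.0f} "
        f"energy_wh={s['energy_wh']:.0f}"
    )
    return f"{text} | {perfdata}"


# Made-up sample readings for two PDUs.
sample = [
    PduReading("pdu1", 1200, 1300, 450000),
    PduReading("pdu2", 800, 900, 350000),
]
print(plugin_output(sample))
```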

Possible Future Directions

Integrated Management: PureMgr

Add more management functions (e.g. Updates/Compliance)

Leverage more open source (e.g. Ganglia)

Support Analytics space (e.g. Hadoop)

Support additional OpenStack certified applications

Hybrid Cloud, On/Off-Premises, Cloud Management

Support PowerKVM

Lower-cost distributed Storage (e.g. Ceph, GPFS)

New OSs (Ubuntu, SLES, IBM i)

Application patterns, ready for deployment & monitoring

Integrate additional element managers

OpenPOWER, OpenStack drivers, Calamari, etc.

New Hardware, mainstream / lower cost

OpenPOWER, iSCSI Storage, etc.

References

Converged Infrastructure offering:
http://www-03.ibm.com/systems/power/hardware/purepower/

Nagios Plugins contributed to open source:
https://exchange.nagios.org/directory/Plugins/Hardware/Others/SX1710-monitoring-plugin/details

https://exchange.nagios.org/directory/Plugins/Hardware/Network-Gear/Others/G8052-2FG8264-monitoring-plugin/details

https://exchange.nagios.org/directory/Plugins/Hardware/Storage-Systems/SAN-and-NAS/IBM-Brocade/Check-IBM-2498-2DF48-status/details

(others under submission, more to come in the near future)

Please use and vote !!!

Any Questions?