Predicting Service Outage Using Machine Learning Techniques HPE Innovation Center

Upload: hadung

Post on 27-Jul-2018


Page 1: Predicting Service Outage Using Machine Learning Techniques · images.nvidia.com/content/APAC/events/ai-conference/resource/ai...

Predicting Service Outage Using Machine Learning Techniques
HPE Innovation Center

Page 2:

HPE Innovation Center - Our AI Expertise

Sense Learn Comprehend Act

Computer Vision

Audio/Speech Processing

Knowledge Representation

Natural Language Processing

Machine Learning

Expert System

HPE Confidential

Page 3:

IT Operations Analytics (ITOA)

Page 4:

IT operations analytics (ITOA)

Optimize IT operational service in near real time for production application and infrastructure computing environments.

Page 5:

IT operations analytics (ITOA)

Analyze high-volume structured and unstructured log data and business data

− Proactively avoid service interruptions, slowdowns and outages
− Have faster root-cause analysis and problem recovery times
− Enhance system and application performance
− Improve end-user experience
− Increase operational efficiency
− Improve computing resource utilization

Page 6:

Worldwide IT Operations Analytics Market, 2016

Hewlett Packard Enterprise

+17.0% y/y $97.8M

Note: 2016 growth (%) and revenue ($M). Source: IDC, 2017

32.9%

$1.9 Billion

Page 7:

Worldwide IT Operations Analytics Business Revenue Share

2016, by region

Source: IDC, 2017

Page 8:

Worldwide IT Operations Analytics Business Revenue Share

2016, by deployment type

Source: IDC, 2017

Page 9:

Service Outage Prediction: a case study

Page 10:

25 TB log data

Page 11:

Probability of occurrence of service outage

Page 12:

Service Outage Prediction

Data Source: Sys Log, DB Log, MW Log, App Log

Data Pre-processing: Data cleaning

Log Analysis

Anomaly Detection

Prediction Model

Probability of Outage Occurrence (0 to 1)
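The stages named on this slide can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the sample log lines, the "fraction of ERROR/WARN lines" feature, and the hand-picked weights are all hypothetical, since the slides do not give log formats, features, or model parameters.

```python
import math
import re

# Hypothetical sample lines per data source; real log formats are not in the slides.
RAW_LOGS = {
    "sys": ["ERROR disk latency high", "INFO heartbeat ok", ""],
    "db": ["WARN slow query 2300ms", "INFO commit ok"],
    "mw": ["ERROR connection pool exhausted"],
    "app": ["INFO request served", "ERROR timeout calling db"],
}


def clean(lines):
    """Data cleaning: drop empty lines, collapse whitespace, lowercase."""
    return [re.sub(r"\s+", " ", ln).strip().lower() for ln in lines if ln.strip()]


def extract_features(logs):
    """Log analysis: fraction of ERROR/WARN lines per source (an assumed feature)."""
    feats = {}
    for source, lines in logs.items():
        cleaned = clean(lines)
        bad = sum(1 for ln in cleaned if ln.startswith(("error", "warn")))
        feats[source] = bad / len(cleaned) if cleaned else 0.0
    return feats


def predict_outage_probability(feats, weights, bias):
    """Prediction model: a logistic function maps a weighted score into [0, 1]."""
    score = bias + sum(weights[s] * v for s, v in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


# Illustrative, hand-picked weights; a real model would learn them from data.
WEIGHTS = {"sys": 1.5, "db": 1.0, "mw": 2.0, "app": 1.0}

features = extract_features(RAW_LOGS)
probability = predict_outage_probability(features, WEIGHTS, bias=-3.0)
print(features, round(probability, 3))
```

The point of the sketch is the shape of the flow (sources, cleaning, features, a score between 0 and 1), not the particular feature or weights.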

Page 13:

Log Analysis

Numeric Metrics (e.g. DB log): Anomaly Detection of Time Series Data

Unstructured Text log (e.g. Sys/MW log): Text Analysis of Logs
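The two branches of log analysis can each be illustrated with a small sketch. The slides do not say which anomaly detector or text representation was used, so the rolling z-score rule and the bag-of-words counts below are assumed stand-ins, and the sample values are hypothetical.

```python
import re
import statistics
from collections import Counter


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag index i when series[i] deviates from the mean of the previous
    `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mean:
                anomalies.append(i)
        elif abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


def token_counts(lines):
    """Bag-of-words counts for text logs; digits are masked so that
    variable values (durations, ids) collapse into a single token."""
    counts = Counter()
    for line in lines:
        masked = re.sub(r"\d+", "<num>", line.lower())
        counts.update(re.findall(r"<num>|[a-z]+", masked))
    return counts


# Hypothetical numeric metric, e.g. query latency in ms taken from a DB log.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
anomalies = rolling_zscore_anomalies(latency_ms)

# Hypothetical unstructured lines, e.g. from a system or middleware log.
counts = token_counts([
    "ERROR timeout after 3000 ms",
    "ERROR timeout after 45 ms",
    "INFO ok",
])
print(anomalies, counts.most_common(3))
```

Both outputs (anomaly indices and token counts) are the kind of per-window features that a downstream prediction model could consume.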

Page 14:

Service Outage Prediction - Three Models

Naïve Bayes

Logistic Regression

Neural Network (Deep Learning)
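As a rough illustration of how the two simpler models turn features into an outage probability, here is a minimal sketch of Bernoulli Naïve Bayes and logistic regression written from scratch. The binary features, labels, and hyperparameters are hypothetical, and the slides do not describe the actual implementations; the neural network is not shown here.

```python
import math


def train_naive_bayes(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing on binary features."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        model[label] = (prior, probs)
    return model


def naive_bayes_probability(model, x):
    """Posterior probability of class 1 (outage) for one feature vector."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


def logistic_probability(params, x):
    """Sigmoid of the linear score w.x + b."""
    w, b = params
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = logistic_probability((w, b), x) - t
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical binary features per time window:
# [metric anomaly flagged, error keyword seen]; label 1 means an outage followed.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
y = [1, 0, 0, 0, 0, 1]

nb = train_naive_bayes(X, y)
lr_params = train_logistic_regression(X, y)
for window in ([1, 1], [0, 0]):
    print(window,
          round(naive_bayes_probability(nb, window), 3),
          round(logistic_probability(lr_params, window), 3))
```

Both models return a value between 0 and 1, which matches the "probability of outage occurrence" output described earlier in the deck.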

Page 15:

Performance Comparison of Different Models

Baseline benchmark

Naïve Bayes

Logistic Regression

CNN
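The transcript does not preserve the chart or say which metric was compared. As an assumed illustration of why a baseline matters when outages are rare, the sketch below scores a hypothetical majority-class baseline and a hypothetical model on the same made-up labels using accuracy, precision, recall and F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (outage) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: 1 = outage, 0 = normal. Outages are the rare class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
baseline_pred = [0] * len(y_true)             # always predicts "normal"
model_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # hypothetical model output

for name, pred in (("baseline", baseline_pred), ("model", model_pred)):
    print(name, accuracy(y_true, pred), precision_recall_f1(y_true, pred))
```

In this toy data both predictors reach the same accuracy, yet only the model catches any outage, which is why accuracy alone can mislead on imbalanced data.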

Page 16:

Building Deep Learning Model with Imbalanced Data

Dimension Reduction: PCA

Augmentation: Autoencoders

Over-sampling: SMOTE

Deep Learning: CNN

Inference: Outage

Autoencoder diagram labels: Input vector, Encoder, Compressed, Decoder, Data, Synthetic data
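Of the steps listed on this slide, SMOTE is the easiest to show compactly. The sketch below is a simplified, from-scratch version of the idea (interpolating between a minority sample and one of its nearest minority neighbours); it is not the presenters' code, and the sample points, `k`, and the seed are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for the rare "outage" class.
outage_samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.3]]
new_samples = smote(outage_samples, n_synthetic=6)
print(len(new_samples), new_samples[0])
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the outage class.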

Page 17:

Ensembles of CNN

Dimension Reduction: PCA

Augmentation: Autoencoders

Over-sampling: SMOTE

Deep Learning: CNN

Inference (diagram labels): Outage, Outage, Outage, Normal, Outage, Normal, Outage
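The slide does not state how the individual CNN outputs are combined. Two common options, shown here purely as an assumed sketch with hypothetical member outputs, are a majority vote over labels and a thresholded average of predicted probabilities.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most ensemble members.
    On a tie, the label that appeared first wins."""
    return Counter(labels).most_common(1)[0][0]


def average_probability(probabilities, threshold=0.5):
    """Average the members' outage probabilities and threshold the mean."""
    mean = sum(probabilities) / len(probabilities)
    return ("Outage" if mean >= threshold else "Normal"), mean


# Hypothetical outputs from three ensemble members for one time window.
member_labels = ["Outage", "Normal", "Outage"]
member_probs = [0.9, 0.4, 0.7]

voted = majority_vote(member_labels)
averaged_label, averaged_prob = average_probability(member_probs)
print(voted, averaged_label, round(averaged_prob, 3))
```

Voting uses only the hard labels, while averaging keeps each member's confidence; which one suits a given deployment depends on how well calibrated the members are.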

Page 18:

More accurate prediction of the service outage

Higher customer satisfaction

Higher market penetration rate

Page 19:

HPE Portfolio for Deep Learning

Page 20:

HPE has a comprehensive, purpose-built portfolio for Deep Learning

Compute ideal for training models in data center

Edge analytics and inference engine

Compute for both training models and inference at edge

HPE Apollo 6500

HPC Storage Choice of Fabrics

HPE SGI 8600

Government, academia and industries

Financial services

Life Sciences, Health

Government and academia

Autonomous vehicles / Mfg.

AI Software Framework

HPE Apollo 4520

Arista Networking

– Intel® Omni-Path Architecture

– Mellanox InfiniBand

– HPE FlexFabric Network

HPC Data Management Framework Software

Large-scale, storage virtualization & tiered data management platform

Petaflop scale for deep learning and HPC

The enterprise bridge to accelerated computing

HPE Apollo 2000: The bridge to enterprise scale-out architecture

HPE Edgeline EL4000: Unprecedented deep edge compute and high capacity storage; open standards

Advisory, professional and operational services, HPE Flexible Capacity, HPE Datacenter Care for Hyperscale

HPE Apollo sx40: Maximize GPU capacity and performance with lower TCO

Easy Setup and Flexible OS: Using Bright Computing’s distribution of deep learning software development components and workload management tool integration

Page 21:

HPE AI customers gaining competitive advantage

Solving complex AI challenges with hybrid cluster

Upping the ante on Artificial Intelligence

Libratus AI program defeats world’s best poker players

“Through our partnership with SGI, and now HPE, the Tokyo Institute of Technology has worked successfully to deliver a converged world-leading HPC and Deep Learning platform….”

“The best AI’s ability to do strategic reasoning with imperfect information has now surpassed that of the best humans.”

Dr. Tuomas Sandholm, Professor, Computer Science Department, Carnegie Mellon University

Dr. Nick Nystrom, Senior Director of Research, Pittsburgh Supercomputing Center

“The discoveries and insights our researchers are now uncovering will have direct effects on human lives by way of advancing precision medicine, increasing energy efficiency, and improving policymaking for the economy.”

Satoshi Matsuoka, Professor and TSUBAME Leader, Tokyo Institute of Technology

Page 22:

HPE demystifies deep learning for faster intelligence

New AI expertise, blueprints and technologies to get started, scale, integrate and optimize

Get started rapidly: Develop deep learning models

AI expertise and solutions to “get started” with deep learning models

Expertise
− Rapid technology selection guides
− State of the art training

Solutions
− Integrated purpose-built solutions
− Out of the box solutions

Scale and Integrate: Deliver attractive returns

Proven blueprints and services for “scalable” production deployments

Proven Blueprints
− Reference Architectures
− Innovation labs for best practices

Services
− Deploy, integrate and support
− Flexible, on-demand capacity

Optimize Environment: Enhance competitive advantage

Technology integration capabilities to maximize performance

Integration capabilities
− Enhanced global Centers of Excellence
− Next gen technology integration

Page 23:

AI expertise and solutions to “get started” with deep learning models

New foundation to “get started” with deep learning models

Enhance employee productivity

Accelerate app development with New deep learning integrated solution

Pre-configured, proven hardware & software solution
− Purpose-built platform
− Easy to use and install
− Simple management
− Automated framework updates

Train your teams

Gain organizational competencies with Enhanced Deep Learning Institute

State of the art deep learning training
− Latest techniques
− Software frameworks
− Infrastructure requirements
− Hands on, instructor led

Leverage “out of the box” solutions

Increase security of e-commerce with Enhanced HPE Fraud Detection solution

HPE Fraud Detection Solution with Kinetica
− Uses deep learning techniques
− Qualified with Kinetica in-memory GPU database
− NVIDIA GPU accelerators

Select ideal technologies & systems

Make informed technology decisions with New HPE Deep Learning Cookbook

Comprehensive technology selection tool
− Estimates & refines performance
− Characterizes frameworks
− Recommends ideal hardware and software stacks

IT Expertise Solutions

Page 24:

Implement and integrate your production environment

Powering artificial intelligence research

Over 75 AI experts in global AI Innovation Labs

Advisory services for app and data integration

New AI Innovation Labs

Established AI testbed combining both HPE hardware and state of the art AI software to:
− Collaborate with leading academia on AI research projects
− Support internal HPE AI research
− Support select customers and partners in research and POC

Key benefits
− Stable AI environment to accelerate time-to-value
− Purpose-built platforms to handle the most extreme performance needs
− Early access to latest technologies and new, cutting edge hardware and software, often pre-production
− Faster provisioning and setup

Expertise for quick deployments
− Implement and integrate with a trusted partner, HPE Pointnext services
− Leverage advisory, professional and operational services

Page 25:

HPE Early Access program for 8-way SXM2 GPU Volta!
Accelerating Deep Learning adoption

First Tier One OEM to provide early access to 8-way Volta SXM2 server

Timing is Q1 2018 for select customers

Limited engagement for select customers through your Sales contacts

Configuration, benchmarking and technology selection guidance

Advice, planning, design, benchmarking

Page 26:

Thank You
