infrastructure choices for machine learning and deep...

47
Infrastructure choices for Machine Learning and Deep Learning Vikas Ratna, Ravi Mishra Cisco Product Management Session ID: E8356

Upload: others

Post on 20-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Infrastructure choices for Machine Learning and Deep Learning

Vikas Ratna, Ravi Mishra

Cisco Product Management

Session ID: E8356

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

How Important is AI & ML?

By 2035,

AI technologies are

projected to increase

business productivity

by up to

40%

By 2020, insights-driven

businesses will steal

$1.2T

per annum from their

less-informed peers

8 out of 10

businesses have already

implemented or are

planning to adopt AI as

a customer service

solution by 2020

• $1.2T https://go.forrester.com/wp-content/uploads/Forrester_Predictions_2017_-Artificial_Intelligence_Will_Drive_The_Insights_Revolution.pdf

• 8 out of 10 - Oracle - https://www.oracle.com/webfolder/s/delivery_production/docs/FY16h1/doc35/CXResearchVirtualExperiences.pdf

• Accenture https://www.accenture.com/us-en/insight-artificial-intelligence-future-growth

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public• 277X data created by IoE devices vs. end users – source: 2014 Cisco® Global Cloud Index

• By 2020, there will be 5200 GB of data for every person on earth – source: 2012 Digital Universe Study conducted by IDC and sponsored by EMC

(see: http://www.computerworld.com/article/2493701/data-center/by-2020--there-will-be-5-200-gb-of-data-for-every-person-on-earth.html)

• 180 billion mobile app downloads by 2015 – source: 2011 IDC Study: https://www.smaato.com/blog-180billiondownloads/

277XData created by IoT devices vs.

end users

30MNew devices connected every week

180BMobile apps downloaded

in 2015

40%Of all data will

come from sensor

data by 2020

5TB+Of data

per person by 2020

4.2BWeb

filtering blocks per

day

Why Now?First…The Data Deluge - Volume, Velocity, Heterogeneity

Unrealistic to leverage traditional analytics to mine meaningful information at speed.

AI activates the potential of raw data into powerful competitive advantage.

Why Now?

Second –Perfectly timed Infra, Compute and ML Framework Innovation

Improved

Data Collection Infrastructure

Increased

Computing Power

Advancement in

ML Frameworks

Hadoop w/Apache Spark Faster GPUs and CPUs TensorFlow, PyTorch, Caffe, Theano

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Quick Recap: What is AI

Act of Artificial

Intelligence

Machine

Learning

Deep

Learning

Machines Making

Decisions

Learn From

Traditional Data and

Make Decisions

without Explicit

Programming

Use Artificial Neural

Networks to Learn

From Complex Data

and Make Decisions

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Business Drives AI

RetailSecurity and

DefenseMedia and

EntertainmentHealthcareFinanceAI Task

Business

QuestionManufacturing

Is "it" present

or not?Detection

Identify

Access

Anomalies

Indication of

Anomaly in

Scan

Content

Based Search

Identify

Security

Breaches

Events in

Store

Surveillance

Detect

Manufacturing

Flaws

What type of

thing is "it"?Classification

Fraud

detection

Diagnostics –

Tumor?

Content

Labeling

Facial

Recognition

Returning vs

New

Shoppers

Robots to

Track Objects

To what

extent is "it"

present?

SegmentationSentiment

Analysis

Condition

Analysis

Improved

Product

Placement

Crowd

Analytics

Segment by

Customers

Actions

Sort

Components

by Quality

What is the

interpretation?

Natural

Language

Processing

Chatbot

Advisors

Expert

Diagnosis

from Notes

Video

Captioning

Real Time

Language

Translation

In Store

Personal

Assistants

Assembly Build

Instruction

Translation

What is the

likely

outcome?

PredictionCredit

Profiling

Length of Stay

Forecasting

Targeted

Content

Generation

Equipment

Health

Assessment

Customer

Churn and

Retention

Proactive

Machine

Maintenance

What will

satisfy the

objective?

Recommenda

tions

Algorithmic

Trading

Treatment

Recommenda

tions

Content

Recommendati

ons

Risk

Management"Magic Mirror"

Assembly

Process

Improvements

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Built-In AI vs AIaaS vs Purpose Built

Tetration Analytics

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

An Example Of Purpose Built Infrastructure

Big Data DL Framework

TrainingInference

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Popular DL Frameworks To Choose From

• Python Based

• Popular for Production

• Static Graph

• More Control and Work

• Apache 2.0 License

• Great Documentation

• Community Supported

• Python Based

• Popular for Research

• Dynamic Graph

• Less Control and Work

• BSD License

• Easy to Debug and

Prototype

• Community Supported

• JVM Based

• Popular in Big Data

• Static Graph

• Integrates well with Kafka,

Spark and Hadoop

• Apache 2.0 License

• High Performance

• Commercially Supported

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Data Science Tools To Choose From

Environment Libraries Algorithms Frameworks PlatformUser

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

High Level View Of Overall Workflow

Data

InfrastructurePrepare Data Train a Model

Evaluate

the Model

Deploy &

Improve

More Data Trumps

Better Algorithms

Data and Business

Questions will

Determine ML

Algorithm(s)

Data Can Come

From Anywhere and

is Not Usually in a

State Where it is

Ready to Use for

Training ML Models

Build a Model and

Feed the Model with

Prepared Training

Data so that it Can

Learn to Make

Inferences

Test the Trained

Model Performance

and Accuracy by

Analyzing Inference

Feedback

Improve Performance

by Either Selecting

Different Algorithms

and/or Retraining

Model with More Data

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

• Hadoop native support for GPU

YARN 2.0 /Resource Manager Support for GPU and CPU

• First class support for Docker on YARN

Docker based application/assemblies support on Hadoop

• AI/ML Frameworks on Hadoop

Tensorflow/Caffe/etc docker containers on YARN accessing data on Hadoop, scheduling resource (GPU/CPU)

• HDFS Federation (Federated Namenodes)

• HDFS with Erasure encoding

(1.7 copies instead of 3) Primarily for Cold data

• cblock (volume driver) for kubernetes

Technology Trends with HadoopHadoop 3.0 with GPU

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

An Example: Edge-To-Core AI/ML With Big Data

IOT Data

Sources

Web Data Sources

Streaming Data Sources

Gateway with Kafka

HDFS

No SQL

ML Training & Inference

Se

rvin

g L

aye

r

Speed Layer Batch Layer

Spark Streaming

Compute Nodes

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Customer AI Challenges

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

59%

25%

6% 6%

Knowledge gathering/developing strategy

Piloting Implementing Deployed/in use today

Investigating Using

The majority of organizations are still gathering

information to inform their AI adoption strategy

It’s Still Early Days….

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Customer AI Challenges

• C-Suite wants an AI strategy

• Marketing wants to include AI in everything

• Worried about falling behind and competition passing

• Not sure what problems to solve, Not enough expertise

• Everyone is pitching them and they are overwhelmed

Looking For Guidance, Simplification

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Key Challenges for ITMassive & Active

Data Sets

Volume, Velocity, and Variability of

AI Workloads at Scale Demand

New Data Center Architectures

Uncharted

Territory

Rapidly Evolving AI/ML

Ecosystem and Requirements;

Skill Shortages in Data

Science and IT

Infrastructure

Challenge

Distributed Data Sources and

Technologies Risk Operational

Silos and Complexity

Looking For DC Infrastructure Which is: Silo-less, Simple to manage at scale, Validated for new

workload

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco Strategy Product Portfolio

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Recap: Cisco Unified Computing System (UCS) Momentum

Source: 1 IDC, 2017 Q4, Sep 2017, Vendor Revenue Share

Source: As of Cisco Q4FY17 earnings results. Data Center Revenue is defined as Cisco UCS and Nexus 1000V

Integrated Infrastructure

(Cisco UCS, Nexus, FlexPod)

#1

HyperCoverged vendor

#3

Global Revenue

Market Share in x86 Blades

#2

World Record Performance

Benchmark

150+

Exabyte Total Storage

Deployed

1.4

Big Data Revenue

Growth in 4 Years

18x

Enterprise customers

64,000+

of Fortune 500 Have

Invested in UCS

>85%

HyperFlex Customers

3000+

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Component

Cisco UCS Portfolio: Any Workload. Any Scale. One System.

HyperconvergedInfrastructure

Converged InfrastructureROBO

Fifth Generation UCS

HyperFlex Systems

UCS MiniE-Series

Data Intensive

S-SeriesStorage Optimized

C-Series Rack Servers

UCS Integrated Infrastructure

Solutions

CoreEdge Cloud

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Eliminating Operational Silos Demystifying AI/ML/DL

StacksCurating top-to-bottom SW and HW

stacks with leading ecosystem partners to

ensure a faster and more predictable

deployment

Full array of accelerated

computing options for test/dev,

training and inference, all unified

by cloud-based management

Integrating changing data sources as part of a

dynamic data pipeline

Powering the Full AI Data Lifecycle

Introducing: Cisco Computing Solutions for AI/ML

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Inference in Regional and Micro Data Centers

Cisco AI/ML - Computing Platforms, Partners, and Solutions

UCS C220 Servers

HyperFlex C220 Servers

UCS C240 Servers

HyperFlex GPU Nodes

Test/Dev and ML in Private Cloud

Deep Learning in DC Core

Accelerated Computing ISV Partners Solution Partners

NEW!

UCS C480 ML

UCS C480 ML

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco Portfolio Alignment

Use Case

Conception

Feasibility

Study

Model

Design

Model

Deployment

Model

Maintenance

Optimized Big Data

Solutions for Data

Collection and

Preparation

Distribute and

Scale AI Inference

from the Data

Center to the Edge

Offload and

Accelerate AI

Training at Scale

with Performance

Optimized Systems

V100 P4

NVLink

C240 M5 C480 ML M5 HX220c M5

Quick Installation

of Hardware and

Software for AI

Exploration and

Experimentation

V100

HX240c M5

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco AI/ML Compute Portfolio

Test & Dev and Model Training

C240

2 x P4

6 x P4

HyperFlex

240

Deep Learning/ Training

C480

Inferencing

C/HX 220

C/HX 240

Unified Management

Simplified Management, Customer Choice, Cisco Validated Design

Option of GPU Only Nodes

2 x P100/ V100 2 x P100/ V100 Per Node

6 x PCIe P100/ V100 8x V100 with NVLink

C480 ML

Cisco IMC XML API

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

DC Infrastructure AI/ML/DL Platforms

Data / BigData

Libraries

FrameworkAI-ML

Server / Appliances

(CPU or GPU)

Simplified DC Infra To Support AI/ML Workload & Cisco

AI/ML

Apps/

APIs

Services

Manageability

Network

DC Servers and

Storage

Model Deploy/Mgmt

Existing UCS Business

Several CVDs exist

Strategic Partnerships / Cisco AS

UCSM, Intersight

Exists, Build New Partner

Build CVD

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco UCSCisco HyperFlex Cisco ACICisco Multicloud Portfolio

Cisco AI/ML- A Holistic Approach: Edge to Core

Accelerated Computing

for Inference at the Edge

Accelerated Computing

for AI/ML in the Core

Big Data/Analytics

and ERP

AI/ML Stack

Partnerships

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Introducing UCS C480ML

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Specification• 8 x SMX2 V100 32G Module with NVLink interconnect

• Support for Dual Intel Xeon Scalable Processor Family

• Support for 2666 MHz DDR4 DIMMS with capacity point of 128GB

• Supports up to 24 SAS/SATA HDD/SSD

• Supports up to 6 NVMe drives

• Supports up to 4 x100G VIC

• Modular Internal Flex Storage Option: M.2 SATA

• 4 x 1600W PSUs in 3+1 Redundancy

• 1 x 10/1000 dedicated management Ethernet Port

• Two 10 Base-T Gbps Ethernet Ports

Accelerate AI/ML Projects

Validated with Popular Machine

Learning Software

Industry leading Spec

Fully Integrated Platform Designed

to Accelerate Deep Learning

Simplified

Management

Cisco Intersight or

UCSM/CIMC managed

Prevents Operation Silos

Extends Existing UCS

Environments with Consistent,

Cloud-Based Management

Cisco UCS C480ML Rack Server for Deep LearningA No-Compromise Purpose Built Server for Deep Learning

40,000+ CUDA Cores

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

System Overview

CPU Module

CPU Blank

Module

KVM Access

Storage

2 x PCIe Slots(part of GPU

module)

For 10/25 or

40/100G VIC or

3rd party adapter

2 x PCIe Slots (part of GPU

module)

For 10/25 or

40/100G VIC or

3rd party adapter

4x1600W PSU Slots

Hot swappable and redundant

3+1 Redundancy

IO ModuleTwo 10 Base-T Gbps shared LoM

Display VGA port, Serial port

Dedicated CIMC mgmt. port

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

UCS C480ML - Overview

Raid Controller

NetworkChoice of 10/25 or 40/100G Four PCIe SlotsTwo 10G Base-T shared LOM on IO Module

GPUs8 X V100 32GB: 1st GPU to break 100 teraflops NVLink Interconnect: >300GB/s bandwidth

Redundant Fans

StorageUp to 24 SAS/SATA SSD/HDDUp to 6 NVMe DrivesM.2 SATA

CPUs2 * Intel® Xeon® Processor Scalable Family(Up to 28 cores per socket)24 DDR4 DIMMS—up to 3 TB Memory

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Storage Options

• Most powerful and flexible storage option in the Industry for Deep Learning workloads

• 24 Front Panel Drive Slots in 3 cages

Cage 1 Cage 2 Cage 3

NVME

slots

Cage 1 Cage 2 Cage 3

SAS/SATA drives 4 7 7

NVME/SAS/SATA 4 1 1

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Networking Options

VIC 1455 VIC 1495

Speed 10/25G 40/100G

Ports 4 2

Form Factor PCIe PCIe

FI 6200/6332/6454 6332/6454

UCS FI

Direct Connect

Single cable for both management

traffic and data traffic from VIC and

connected directly to FI

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Network Connectivity Options

Dual 10G

LoM

10G Bandwidth

Dual 10G for UCS Management and

Data

Dual 10G LoM

for MGMT and

VIC for Data

10/25/40/100G Bandwidth

Data

(10/25/40/100G)

MGMT (10G)

Dual 10G for UCS management and

multiple10/25/40/100G VIC for data

10/40G Bandwidth

FEX – 2232,

2348

Data

MGMT

UCS FI

VIC and

Dual 10G LoM

Dual 10G for UCS management and

multiple10/40G VIC for data

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Centralized Management

Across Domains and Geographies:

Up to Thousands of Servers

Global Policies

Centralized Templates

Service Profiles

Policy-based Infrastructure

No Silos

Racks, Blades, and

Hyperconverged

Embedded Management

Integrated ’Agent-less’ out of Band

Remote Management

Scriptable Interface

Large scale Management with XML

API, Python SDK, Redfish

UCS C480ML Management Options

UCS Manager Cisco Intersight (SaaS) Cisco IMC

Data Center 1 Data Center 2UCS Manager UCS Manager

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

GPU Inventory and Management

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

• PCI switch info from UCS management

• GPU and PCI adapter and link information connected to that PCI switch

GPU Module Info and Management

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

6542

37783208 2830

22035

12161

6423

3735 32182793

21078

11438

Tensorflow Training: 8GPU (V100)

Synthetic Data Real Data

C480 ML Performance DataIm

ag

es

/se

c

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Solutions and CVD

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Data Centric Approach: Expanding Big Data to AI/ML SolutionsCisco Validated Designs

Cloudera Data Science Work Bench Kubeflow

Hortonworks Hadoop 3.1 Data Lake

Hadoop Coupled with GPU Nodes for Deep Learning with

Jupyter Notebook

Portable, Scalable ML Stack Enabling Rapid Development

and Deployment

Integrate Hadoop and AI/ML: YARN Scheduling CPU and GPU with Docker Application Support

YARN Scheduler

HDFS Hot Tier

HDFS Cold TierHDFS

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Big Data CVD with Hortonworks and ClouderaUCS C480ML as Datanode and GPU node on Hadoop

Nvidia CUDA containers

orchestrated with YARN for

Deep Learning

ML/DL with data stored on Hadoop

GPU acceleration for DL workloads

All popular ML/DL

frameworks

01

05

10

15

20

25

30

35

40

02

03

04

06

07

08

09

11

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

31

32

33

34

36

37

38

39

41

42

01

05

10

15

20

25

30

35

40

02

03

04

06

07

08

09

11

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

31

32

33

34

36

37

38

39

41

42

01

05

10

15

20

25

30

35

40

02

03

04

06

07

08

09

11

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

31

32

33

34

36

37

38

39

41

42

01

05

10

15

20

25

30

35

40

02

03

04

06

07

08

09

11

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

31

32

33

34

36

37

38

39

41

42

CISCO UCS-FI-6332

ENV

LS

STS

BCN

1

2

3

4

L1 L2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 31 3229 3027 28

CISCO UCS-FI-6332

ENV

LS

STS

BCN

1

2

3

4

L1 L2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 31 3229 3027 28

UCS

C480 M5

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

ReW ritabl eRECORDER

M U L T IDVD+ReWritable

S

X

X

UCS

C480 M5

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

ReW ritabl eRECORDER

M U L T IDVD+ReWritable

S

X

X

UCS

C480 M5

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

ReW ritabl eRECORDER

M U L T IDVD+ReWritable

S

X

X

UCS

C480 M5

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

XX

NVME SSD

800 GBNVMEHWH800

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X

2 TBHD2T7KL6GN

SATA HDD

X X

NVME SSD

800 GBNVMEHWH800

X

NVME SSD

800 GBNVMEHWH800

ReW ritabl eRECORDER

M U L T IDVD+ReWritable

S

X

X

2 x Cisco UCS 6332

Fabric Interconnect

16 x C240 M5

datanode

8 x C240 M5

datanode

4 x Cisco UCS Modoc each withCompute:

8x V100 Nvidia GPU

2x6132 CPU

Storage:

24 x 2.4 TB SFF drives

Networking:

2 x40G

GPU node + datanode

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Joint Solutions

Cisco Container

Platform on

HyperFlex

KubeFlow

ML/AI

Framework

References: https://cloud.google.com/blog/big-data/2018/03/simplifying-machine-learning-on-open-hybrid-clouds-with-kubeflow

https://blogs.cisco.com/datacenter/cisco-ucs-and-hyperflex-for-ai-ml-workloads-in-the-data-center

Cisco UCS and Cisco HyperFlex

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco/Google Open Hybrid Cloud Solution

Governance

Cisco UCS

or

HyperFlex

CiscoCloudCenter

Google

Cloud Platform

Kubernetes

Google

Kubernetes

Engine

SecurityCisco

CSR1000vCisco

SteathWatch

Cluster Management

Cisco Container Platform

Networking

ContivCisco ACI

Storage

Cisco HyperFlex FlexVolume Driver

API / Mesh ManagementOpen

Service Broker

GA

Now

Kubeflow

Kubeflow

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Orchestrate AI Workloads with Kubernetes and Cisco HyperFlex

• Support for NVIDIA Tesla V100, the most advanced data center GPU ever built

• Leverage Kubernetes on NVIDIA GPUs to automate and orchestrate AI workloads on Cisco HyperFlex

• Maximize utilization across hyperconverged infrastructure

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Analytical Solutions with GPU Acceleration

GPU Accelerated Databases

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

UCS: One System for All Workloads

VDI, VSI, Distributed Databases, CI, Enterprise

Applications: Oracle and SAP HANA, Big Data and

Analytics, AI/ML with GPUs

Mainstream Workloads B200 M5

C220 M5 and C240 M5 HyperFlex

Cloud Computing, Microprocessor Design and Simulation,

Computational Fluid Dynamics (CFD),

High Frequency Trading (HFT,) Fraud Detection,

Online Gaming

Scale Out and Compute Intensive

Applications C4200 and C125

6400 Series Fabric Interconnects

AI/ML With Dense GPUs, and Memory-Intensive Mission-

Critical Enterprise Applications: In-Memory Databases

AI/ML and Scale up Workloads

C480 ML M5 and B480 M5

Data Protection, SDS, Scale-Out Unstructured Data

Repositories, Media Streaming, and Content Distribution

Data Intensive Workloads

S3260 M5

NEW

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Visit Cisco Booth

• UCS C480 ML

• Machine learning solutions

on Cisco UCS

• GPU accelerated virtual

workspaces

• Quadro® Virtual DC

Workstations

For more info

http://cisco.com/go/ai-compute