e8528 1530 ai hpc infrastructure on oracle cloud ...€¦ · title: e8528_1530 ai_hpc...

36

Upload: others

Post on 12-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04
Page 2: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

AI & HPC Infrastructure on Oracle Cloud Infrastructure

Marcin ZablockiSolutions Architect

https://www.linkedin.com/in/marcinzablocki/@mz_oracle

Rajan PanchapakesanPM, HPC and Big Compute

Page 3: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 4: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

WHO ARE WE?WHO ARE WE?

Page 5: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

CLOUD(Y) CITYCLOUD(Y) CITY

Page 6: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Our Journey

BARE METAL CLOUDOracle Open World 2016

Page 7: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Our Journey

ORACLE CLOUD INFRASTRUCTURERebranding & Launch October 2017

Page 8: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

NVIDIA TESLA P100

INSTANCES

Our Journey

GENERAL AVAILABILITYOracle Open World 2017

Page 9: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Bare Metal V100 Volta instances

Our Journey

GENERAL AVAILABILITYMarch 2018

Page 10: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

P100 and V100 GPU Virtual Machines.

Our Journey

TOKYOSeptember 2018

GTC

Page 11: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Our JourneyGTC

#GTC18 MUNICHOctober 2018

Page 12: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Our Journey

#GTC18 MUNICHOctober 2018

FIRST CLOUD PROVIDERWITH HGX-2 ARCHITECTURE

Page 13: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

THE METAL IN BARE METAL

Page 14: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Oracle Cloud Infrastructure BM and VM

Bare Metal Standard 52 Cores, 768 GB RAM, up to 1PB Block Storage

2x 25Gbe Network Interfaces

$ 0.0638 / core/h

Bare Metal DenseIO51.2 TB of local NVMe SSD

up to 1PB Block Storage2x 25Gbe Network Interfaces

$ 0.1275 / core/h

Virtual Machine Standard 1-16 and 1-24 Cores

7 – 320 GB RAMup to 1PB Block StorageUp to 25Gb/s Ethernet

$0.0638 / core/h

Virtual Machine Dense4 – 24 Cores

60 – 320 GB RAM3.2 TB – 25.6 TB local NVMe SSD

up to 1PB Block Storage

$0.1275 / core/h

Page 15: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Virtual Machine GPU1 P100 GPU

12 OCPU | 104 GB RAMup to 1PB Block Storage

$1.275 GPU/h

Virtual Machine GPU v21 – 4 V100 GPU

6 – 24 OCPU 104 to 360 GB RAM

up to 1PB Block StorageNVLINK

$2.25 GPU/h

Bare Metal GPU28 Cores, 192 GB RAM,

2x Tesla P100 GPUs $1.275 GPU/h

Bare Metal GPU V252 Cores, 768 GB RAM,

8x Tesla V100 GPUsNVLINK

$2.25 GPU/h

Oracle Cloud Infrastructure GPU

Page 16: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Virtual Machine GPU v35 - 22 OCPU – 3.5GHZ all core

turbo90 - 360 GB RAM

HGX-21 - 4 * V100 32 GB SXM3 GPU

NVLINKUp to 1PB Block Storage

Bare Metal GPU v348 OCPU – 3.5GHZ all core turbo

768 GB RAMHGX-2

8 * V100 32 GB SXM3 GPU NVLINK

Up to 1PB Block Storage

COMING SOON

Page 17: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Oracle Cloud Infrastructure Storage

Block Storage50 GB-16 TB volumes

Up to 25K IOPS per volume400K IOPS per host

320MB/S per volumePay for Storage not IOPS

File Storage ServiceUp to 8 Exabytes of storage

Managed distributed file serviceNFSv3 mount point

Pay for what you use

Local NVME51.2 TB of local NVMe SSD

6.4-25.6 TB for VM

Object StorageUp to 10TB per object

Highly durable

$0.0425 GB/month $0.0425 GB/month

$0.0255 GB/month

$0.0034 10,000 req

Page 18: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Oracle Cloud Infrastructure Services

PaaS Services

Autonomous Analytics Cloud

Autonomous API Platform Cloud Service

Autonomous Data Warehouse Cloud

Autonomous Integration Cloud

Autonomous Mobile Cloud Enterprise

Autonomous Visual Builder Cloud Service

Big Data Cloud

Content and Experience Cloud

Data Hub Cloud Service

Data Integration Platform Cloud

Database Cloud Service

Developer Cloud Service

Event Hub Cloud Service

Java Cloud Service

MySQL Cloud Service

Oracle SOA Cloud Service

Page 19: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

LONDONLONDON

FRANKFURTFRANKFURT

ASHBURNASHBURN

PHOENIXPHOENIX

Page 20: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

LONDONLONDON

FRANKFURTFRANKFURT

ASHBURNASHBURN

PHOENIXPHOENIX

SOONSOONCANADACANADA

SOONSOON

SOONSOON

SOONSOON

SOONSOON

SOONSOON

++

SOONSOON

SOONSOON

GOVGOV

Page 21: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Our ArchitectureAD1

AD2AD3

AD1

AD2AD3

AD1

AD2AD3

AD1

AD2AD3

Multiple fault-domains per

availability domain

Multiple availability domains

per region

Global Hyper-Scale Regions

with Backbone Network

Encrypted interconnect

between Availability Domains

Encrypted backbone between

regions

LONDONLONDON

FRANKFURTFRANKFURTASHBURNASHBURN

PHOENIXPHOENIX

Page 22: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Global Hyper-Scale Regions with Backbone Network

Truly Non-Over Subscribed Flat Networking meaning Flat, Fast & Predictable

Predictable and low (10s of micro-second) latency within Datacenter

100s of micro-seconds across Datacenters within Regions

Millisecond latency across Regions

Confidential – Oracle Internal/Restricted/Highly Restricted 22

HPC is at our CoreDatacenters & Networking

Page 23: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Multi-Tenant Bare-Metal Cloud Instances

Highly configurable private overlay networks – moves management and IO out of the hypervisor and enables lower overhead and bare metal instances

Each host providing a total of 50Gbps of Bandwidth & low latency within a Datacenter (2x25Gb/s)

Best in Class Storage Performance with up to 512TB of NVMe Block Storage per Host

Managed Distributed File System Service

Confidential – Oracle Internal/Restricted/Highly Restricted 23

HPC is at our CoreCompute & Storage

Page 24: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Myth: Cloud means I HAVE to run virtualized

0.86

0.90.88

0.93

0.89

0.82

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

LS-DYNA ANSYS

Fluent

MILC WRF HPL Stream

Top HPC Applicationsrelative performance

BM VM

• Truth OCI Bare Metal runs HPC workloads up to 18% faster than a VM

• Faster turn-around time for a simulation model leads to increased license ROI

• 15% can mean hours and days when running large simulations or AI workloads

Page 25: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

• Truth: OCI Bare Metal instances perform inline with on-premises HPC clusters

• Bare Metal instances scale workload performance linearly, similar to on-premises

• Oracle helps maximize utilization of ISV licenses while providing linear scaleup on performance – ”Best of both worlds”

Myth: HPC in the cloud is slower than on-premises

Page 26: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

• Truth: Cost of an OCI GPU instance is cheaper on a per job basis

• Hardware refresh is faster and consistent on OCI. Jobs complete faster with each generation and the price performance increases

• Cost of full GPU server on-premises is over $150,000 plus power, space, maintenance and support

Myth: GPUs are too expensive on the Cloud

0

5000

10000

15000

20000

25000

30000

35000

40000

1x GPU 2x GPU 4x GPU

Billion Word Language Model Benchmark

(words per second)

White Box

V100-PCIe

Oracle OCI

V100-DXM2

DGX1

V100-SXM2

0

5000

10000

15000

20000

25000

30000

35000

40000

1x GPU 2x GPU 4x GPU

Billion Word Language Model Benchmark

(words per second)

White Box

V100-PCIe

Oracle OCI

V100-DXM2

DGX1

V100-SXM2

Page 27: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

• Truth: FSS on removes the burden for managing file servers while costing same as block storage ($0.04)

• ANSYS Fluent runs some models faster on OCI with the model stored in FSS, even though the R/W takes slightly longer

• Read/Write happens 1 time, solve happens 5000 times

Myth: Managing storage on the cloud is difficult & expensive

0

100

200

300

Read Solve WriteS

ing

le O

pe

rati

on

Tim

e

FSS Performance Compared to Gluster

FSS GFS

0

100

200

300

Read Solve WriteS

ing

le O

pe

rati

on

Tim

e

FSS Performance Compared to Gluster

FSS GFS

Page 28: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Up to 52% less for virtual machine instances

Up to 79x less for high performance block storage

Up to 13x less for internet data egress

Page 29: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 29

Openness in Oracle

Chef

Ansible

Fn Project

GraphPipe

Kubernetes

Docker

Terraform

GraalVM

Oracle Linux

Page 30: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

An enterprise data science platform that brings everything together.

Page 31: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

What is it?GraphPipe is a protocol and collection of software

designed to simplify machine learning model deployment

and decouple it from framework-specific model

implementations.

Why did we make it?We found existing solutions for model serving to be

inconsistent and/or inefficient. Without a consistent

protocol for communicating with different model servers,

it is often necessary to build custom clients for each

workload.

GraphPipe solves these problems by standardizing on an

efficient communication protocol and providing simple

model servers for the major ML frameworks.

We hope that open sourcing GraphPipe makes the model

serving landscape a friendlier place.

Features

•A minimalist machine learning transport specification

based on flatbuffers

•Simple, efficient reference model servers for Tensorflow,

Caffe2, and ONNX.

•Efficient client implementations in Go, Python, and Java.

Page 32: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Page 33: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Page 34: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

NGC Deployment

• General Availability

• Pascal & Volta Bare Metal and Virtual Machine Instance Shapes supported

• Deploy Deep Learning Frameworks, HPC Applications and Visualization Applications seamlessly

• Pre-Configured NGC Image available to deploy on OCI

• Flexibility to run dev/test on Virtual Machine GPU Instances and run production workloads on Bare-Metal Instances

https://cloud.oracle.com/iaas/gpu

Page 35: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Page 36: E8528 1530 AI HPC Infrastructure on Oracle Cloud ...€¦ · Title: E8528_1530 AI_HPC Infrastructure_on_Oracle Cloud Infrastructure Author: NVIDIA Created Date: 10/11/2018 5:50:04

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |Copyright © 2018, Oracle and/or its affiliates. All rights reserved.

How can I get started with Oracle Cloud?

Test Drive Oracle Cloud:

Cloud.Oracle.com/TryIt

• Get started with up to 235 GPU

hours or 4700 OCPU hours.

• Start with storage,

dev / test, or analytics

• More than 30 services

available via trial