High-density Multi-tenant Bare-metal Cloud with Memory Expansion SoC and Power Management Authors: [email protected] [email protected] [email protected] HotChips


Post on 22-Feb-2022



Page 2: High-density Multi-tenant Bare-metal Cloud with Memory

Why Bare-metal Cloud and What is X-Dragon?

Alibaba Cloud

1. For multi-tenancy and cost efficiency
2. For security and isolation
3. For single-thread performance
4. For interoperability and manageability

Page 3: High-density Multi-tenant Bare-metal Cloud with Memory

Problems

The datacenter hosts VM-based cloud, single-tenant bare-metal cloud, and BM-Hive (multi-tenant bare-metal cloud).

Problem 1 (VM-based cloud): VM-based cloud has non-negligible virtualization overhead, isolation/security concerns, and limited single-thread performance, but good manageability.

Problem 2 (legacy bare-metal cloud): Existing bare-metal cloud is designed for a single tenant, lacks manageability, and is costly.

X-Dragon: designed for the cloud to be multi-tenant, secure, high-performance, and easy to manage.

Page 4: High-density Multi-tenant Bare-metal Cloud with Memory

X-Dragon High Level View in Cloud

KVM vs X-Dragon
• Same cloud infrastructure
• Same tools to manage
• Both multi-tenant
• More secure, with selectable bare-metal performance

Page 5: High-density Multi-tenant Bare-metal Cloud with Memory

X-Dragon System Architecture

1. Compute boards + base server
2. Hardware implementation of virtio devices
3. Custom backend: BM-Hypervisor

Page 6: High-density Multi-tenant Bare-metal Cloud with Memory

X-Dragon: IO Bond and Backend

Shadow ring buffer design
• Transfers data between the compute board and the backend base server

BM-Hypervisor design
• Emulates virtio devices and connects into the existing cloud infrastructure
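
The slide names a shadow ring buffer that moves data between the compute board and the base server, but gives no layout. As a rough illustration only, the Python sketch below models one plausible reading: the guest fills a ring on the compute board, and a bridge copies newly posted entries into a shadow ring that the backend drains. The class names (`Ring`, `ShadowBridge`), the ring size, and the copy policy are assumptions made for this example, not details of the X-Dragon design.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Fixed-size ring with free-running producer and consumer indices."""
    size: int
    slots: list = field(default_factory=list)
    head: int = 0  # next index the producer writes
    tail: int = 0  # next index the consumer reads

    def __post_init__(self):
        self.slots = [None] * self.size

    def push(self, item) -> bool:
        if self.head - self.tail == self.size:  # ring is full
            return False
        self.slots[self.head % self.size] = item
        self.head += 1
        return True

    def pop(self):
        if self.head == self.tail:  # ring is empty
            return None
        item = self.slots[self.tail % self.size]
        self.tail += 1
        return item


class ShadowBridge:
    """Hypothetical bridge: mirrors guest-ring entries into a shadow ring."""

    def __init__(self, guest: Ring, shadow: Ring):
        self.guest = guest
        self.shadow = shadow

    def sync(self) -> int:
        """Copy entries until the guest ring is empty or the shadow is full."""
        copied = 0
        while self.guest.head != self.guest.tail:
            if self.shadow.head - self.shadow.tail == self.shadow.size:
                break  # shadow full; leave the rest in the guest ring
            self.shadow.push(self.guest.pop())
            copied += 1
        return copied


guest, shadow = Ring(4), Ring(4)
for packet in ("pkt0", "pkt1", "pkt2"):
    guest.push(packet)                  # guest driver posts requests
bridge = ShadowBridge(guest, shadow)
copied = bridge.sync()                  # bridge mirrors them to the base server
received = [shadow.pop() for _ in range(copied)]  # backend drains the shadow
```

In this toy model the backend only ever touches the shadow ring, which is the property that lets an existing virtio backend stay unaware of the compute board's memory.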

Page 7: High-density Multi-tenant Bare-metal Cloud with Memory

Evaluation: CPU/Mem/IO performance

• X-Dragon BM-Guest vs Native vs VM: BM-Guests perform slightly better than VMs
• Memory bandwidth: BM-Guests match native; VMs reach 98% of BM-Guest bandwidth under load
• Network PPS: same PPS rate, but with more variability
• Latency: same at the application level; longer path than in DPDK kernel-bypass testing
• Storage: substantially better than VM in latency and long-tail latency

Page 8: High-density Multi-tenant Bare-metal Cloud with Memory

Evaluation: Real business

• Nginx
• MariaDB
• Redis

X-Dragon BM guest performs substantially better than the virtualization-based cloud service for the popular applications used in the cloud

Page 9: High-density Multi-tenant Bare-metal Cloud with Memory

X-Dragon based Infrastructure Enhancement

Alibaba Cloud

1. Cloud App Aware Power Management
2. Memory Pool

Page 10: High-density Multi-tenant Bare-metal Cloud with Memory

Memory Pool

[Figure: memory pool architecture diagram. Labels recoverable from the slide (grouping approximate):
• Compute: CPU 0, CPU 1, DDR, xNIC, MemX, CC controller/bridge, Cache Line Manager (node), Page manager, Retimer, PCIe, Databus
• Rack / Local Pool Switch Fabric: Ether/Ether-CC switch, PCIe/PCIe-CC switch, PCIe Cache Coherence, CC controller/bridge, Cache Line Manager (rack), Page manager, MEM, PMEM, NVMe, DDR4, SSD
• Remote Pool: CPU/FPGA nodes with DDR4, MEM, or PMEM, xNIC, BMC, links to CC switch and to non-CC switch
• Ether Switch (AIOps), RMC]

Page 11: High-density Multi-tenant Bare-metal Cloud with Memory

ROI Analysis

Page 12: High-density Multi-tenant Bare-metal Cloud with Memory

On Compute & Rack

[Figure: block diagram for the compute node and rack. Labels recoverable from the slide: PCIe; Cache Line converter; MEM Controller; DDR; PMEM; NVMe; SSD; Page Manager; Buffer / Queue; ARM; Prediction & Prefetch; ACCL; Lookup & order mgmt; Warm-up; Local processing; Ethernet/PCIe CC bridge; NVMe-oF; Reliable Xfer; Verbs; CC Controller; xNIC; Instance; Host; OS Kernel Mem Mgmt; Hypervisor Memory Mgmt. Legend: Alibaba, Vendor.]

• Memory allocation
• Page fault handling
• Performance optimization
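
The diagram lists a Page Manager with "Prediction & Prefetch" in front of DDR, PMEM, and NVMe tiers, without describing the policy. The Python sketch below is a toy two-tier model of that idea, offered only as an illustration: a small fast tier backed by a larger slow tier, with least-recently-used demotion and a naive next-page prefetch. The class name `TieredPageManager`, the LRU policy, and the sequential prefetch heuristic are all assumptions, not the mechanism used in the talk.

```python
from collections import OrderedDict


class TieredPageManager:
    """Toy page manager: small fast tier backed by a larger slow tier.

    Assumptions (not from the slides): LRU demotion and next-page prefetch.
    """

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # page number -> data, kept in LRU order
        self.slow = {}             # page number -> data (e.g. PMEM or SSD)
        self.fast_hits = 0
        self.slow_fetches = 0

    def _install(self, page: int, data) -> None:
        """Place a page in the fast tier, demoting LRU pages if needed."""
        self.fast[page] = data
        self.fast.move_to_end(page)
        while len(self.fast) > self.fast_capacity:
            victim, victim_data = self.fast.popitem(last=False)
            self.slow[victim] = victim_data

    def write(self, page: int, data) -> None:
        self.slow.pop(page, None)  # drop any stale copy in the slow tier
        self._install(page, data)

    def read(self, page: int):
        if page in self.fast:
            self.fast_hits += 1
            self.fast.move_to_end(page)
            return self.fast[page]
        data = self.slow.pop(page)  # KeyError if the page was never written
        self.slow_fetches += 1
        self._install(page, data)
        # Prefetch: guess that the next sequential page is needed soon.
        nxt = page + 1
        if nxt in self.slow:
            self._install(nxt, self.slow.pop(nxt))
        return data


pm = TieredPageManager(fast_capacity=2)
for n in range(4):
    pm.write(n, f"page-{n}")
first = pm.read(0)   # fetched from the slow tier; page 1 is prefetched
second = pm.read(1)  # served from the fast tier thanks to the prefetch
```

A real page manager would track access history rather than assume sequential access; the sketch only shows where a prediction step sits relative to the fault path.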

Page 13: High-density Multi-tenant Bare-metal Cloud with Memory

Workloads & Potential Benefits

Type | Test | Potential Benefits
Traditional Compute | Mid to high utilization | Lower performance, higher density
Middleware | E-Commerce | Lower performance, higher density
Micro Services | E-Commerce | Lower performance, higher density
AI | Ali Native training & inference | Unacceptable for training
Encryption & Compression | Standard payload pre-/post-processing | Easier to scale out
Placement & Migration | Large instances | Faster; saving network b/w
Checkpointing & Mirroring | Cloud based HPC | High performance checkpointing enabled
NFV | Host gateway | Depends; easier to provision
Database | In-memory DB | Cost down significantly
Graph | Large social apps | Cost down significantly; minor programming model change
Upgrade & Deployment | Patching & initialization | Faster upgrade & composing

Page 14: High-density Multi-tenant Bare-metal Cloud with Memory

Power Management Platform

Page 15: High-density Multi-tenant Bare-metal Cloud with Memory

Highly Available Management

[Figure: power management agent on a compute node. Labels recoverable from the slide: PnP Agent; API; Coordinator; IB PnP Sub-Agent; OOB PnP Sub-Agent; Interface to PnP Master; Compute Node; CPU MSR/MMCFG; Bus master / proxy; BMC; CPU; DDR; Chipsets; Accelerators (GPU, FPGA); RoP; Storage; Network; Fans.]

Alibaba Power Agent
• In-Band Power Management
• Out-of-Band Power Management

Server Platform
• Fine-granularity power and performance telemetry & control knobs
• In-Band and Out-of-Band Control Channels

Page 16: High-density Multi-tenant Bare-metal Cloud with Memory

Capping & Budgeting

• Rack/Node Power Capping
• App Driven Power Budgeting
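
The slide names rack/node capping and app-driven budgeting without giving an allocation rule. As a minimal sketch under stated assumptions, the Python function below splits a rack power cap across nodes: every node first gets a guaranteed floor, and the remaining budget is shared in proportion to each node's demand above that floor. The function name `allocate_node_caps`, the floor parameter, the proportional policy, and the wattages are all hypothetical.

```python
def allocate_node_caps(rack_cap_w, demands_w, floor_w):
    """Split a rack power cap across nodes.

    Each node first receives min(demand, floor_w); leftover budget is shared
    in proportion to each node's demand above that floor. This policy is an
    illustrative assumption, not the one used in the talk.
    """
    base = {n: min(d, floor_w) for n, d in demands_w.items()}
    if sum(base.values()) > rack_cap_w:
        raise ValueError("rack cap cannot cover the per-node floors")
    spare = rack_cap_w - sum(base.values())
    extra_demand = {n: demands_w[n] - base[n] for n in demands_w}
    total_extra = sum(extra_demand.values())
    if total_extra <= spare:
        return dict(demands_w)  # every node can run uncapped
    return {n: base[n] + spare * extra_demand[n] / total_extra
            for n in demands_w}


# Made-up demand figures in watts, for illustration only.
demands = {"node-a": 400.0, "node-b": 300.0, "node-c": 100.0}
caps = allocate_node_caps(rack_cap_w=600.0, demands_w=demands, floor_w=100.0)
```

With these example inputs the caps come out to 280 W, 220 W, and 100 W, which sum to the 600 W rack cap. An app-driven variant could replace raw demand with a per-application priority weight.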

Page 17: High-density Multi-tenant Bare-metal Cloud with Memory

Performance Awareness

Identify the IDC, server, and app control knobs with the least performance impact
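
One way to read "least performance impact" is as a ranking problem: apply the knobs that cost the least performance per watt saved until a power target is met. The Python sketch below shows that greedy ranking as an illustration only. The `Knob` fields, the knob names, and every number are invented for the example; the slides do not state how knobs are scored or chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knob:
    name: str             # hypothetical knob label
    level: str            # "IDC", "server", or "app"
    watts_saved: float    # estimated power reduction if applied
    perf_loss_pct: float  # estimated performance impact if applied


def pick_knobs(knobs, target_watts):
    """Greedy choice: lowest performance loss per watt saved first."""
    ranked = sorted(knobs, key=lambda k: k.perf_loss_pct / k.watts_saved)
    chosen, saved = [], 0.0
    for knob in ranked:
        if saved >= target_watts:
            break
        chosen.append(knob)
        saved += knob.watts_saved
    return chosen, saved


# Made-up example values, for illustration only.
candidates = [
    Knob("cpu-frequency-limit", "server", watts_saved=40.0, perf_loss_pct=8.0),
    Knob("cooling-setpoint", "IDC", watts_saved=10.0, perf_loss_pct=0.5),
    Knob("batch-job-throttle", "app", watts_saved=30.0, perf_loss_pct=3.0),
]
chosen, saved = pick_knobs(candidates, target_watts=35.0)
```

The greedy order is simple to explain but not guaranteed optimal; a production system would also need measured, per-application impact data rather than fixed estimates.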
