s7281: device lending: dynamic sharing of gpus in a...

Post on 22-Jul-2018

221 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Jonas MarkussenPhD studentSimula Research Laboratory

• Motivation

• PCIe Overview

• Non-Transparent Bridges

• Device Lending

Outline

Distributed applications may need to access and use IO resources that are physically located inside remote hosts

Front-end

Interconnect

... ......

...Control+Signaling+

Data

ComputenodeComputenodeComputenode

…… …

• rCUDA

• CUDA-awareOpenMPI

• CustomGPUDirect RDMAimplementation

• ...

Software abstractions simplify the use and allocation of resources in a cluster and facilitate development of distributed applications

Front-end

Logicalviewofresources

...Control+Signaling+

Data

Handled in software

Local

Remote

Application Application

Localresource Remoteresourceusingmiddleware

CUDAlibrary+driver CUDA– middlewareintegration

PCIe IObus

CUDAdriver

Interconnecttransport(RDMA)

Interconnecttransport(RDMA)

Middlewareservice/daemon

Interconnect

Middlewareservice

PCIe IObus

In PCIe clusters, the same fabric is used both as local IO bus within a single node and as the interconnect between separate nodes

ExternalPCIecable

PCIe interconnectswitch RAM Memorybus

PCIe bus

PCIe interconnecthostadapter

PCIe IOdevice

CPUandchipset

Interconnectswitch

Local

Remote

Application

Localresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe-basedinterconnect

Remoteresourceovernativefabric

Application

CUDAlibrary+driver

PCIe IObus

PCIe Overview

PCIe is the dominant IO bus technology in computers today, and can also be used as a high-bandwidth low-latency interconnect

PCI-SIG.PCIExpress3.1BaseSpecification,2010.http://www.eetimes.com/document.asp?doc_id=1259778

0

5

10

15

20

25

30

35

Gen2 Gen3 Gen4

Gigabytesp

erse

cond

(GB/s)

PCIex4PCIex8PCIex16

Memory reads and writes are handled by PCIe as transactions that are packet-switched through the fabric depending on the address

RAM

PCIe device PCIe device

PCIe device

CPUandchipset

• Upstream• Downstream• Peer-to-peer(shortestpath)

IO devices and the CPU share the same physical address space, allowing devices to access system memory and other devices

RAM

PCIe device PCIe device

PCIe device

CPUandchipset

Addressspace0x00000…

0xFFFFF…

IOdevice

IOdevice

IOdevice

Interruptvecs

• Memory-mappedIO(MMIO/PIO)• DirectMemoryAccess(DMA)• Message-SignaledInterrupts(MSI-X)

RAM

0xfee00xxx

Non-Transparent Bridges

RAM

Remote address space can be mapped into local address space by using PCIe Non-Transparent Bridges (NTBs)

PCIe NTBadapter

CPUandchipset

PCIe NTBadapter

CPUandchipset

Addressspace

NTB

LocalhostRemotehost

LocalRAM

Local Remote0xf000 0x9000

. . . . . .

NTBaddr mapping

RAM

Using NTBs, each node in the cluster take part in a shared address space and have their own “window” into the global address space

NTB-basedinterconnect

Globaladdr space

A B C

Addr spaceinB

Addr spaceinC

LocalRAM

LocalIOdevices

Globaladdr space

A’saddr space

LocalRAM

LocalIOdevices

C’saddr space

Exportedaddressrange

Addr spaceinA

Device Lending

A remote IO device can be “borrowed” by mapping it into local address space, making it appear locally installed in the system

RAM

NTBadapter0xe000

CPUandchipset

NTBadapter0x1000

CPUandchipset

RAM

Physicaldevice0xb000

Inserteddevice0x2000

Devicedriver

Remote Local0xb000 0x2000

. . . . . .

NTBaddr mapping

Owner Borrower

PCIe hot-plug

By intercepting DMA API calls to set up IOMMU mappings and inject reverse NTB mappings, physical location is completely transparent

RAM

NTBadapter0xe000

CPUandchipset

NTBadapter0x1000

CPUandchipset

RAM

Physicaldevice0xb000

Inserteddevice0x2000

Devicedriver

Local Remote0xf000 0x5000

. . . . . .

NTBaddr mapping

Owner Borrowerdma_addr = dma_map_page(0x9000);

IOMMU

IOV Phys0x5000 0x9000

. . . . . .

Use addr0xf000

Local

Remote

Application

Borrowed remoteresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

Unmodifiedlocaldriver(withhot-plugsupport)

ResourceappearslocaltoOS,driver,andapp

Hardwaremappingsensurefastdatapath

WorkswithanyPCIe device(evenindividualSR-IOVfunctions)

Local

Remote

Application Application

Borrowed remoteresource Remoteresourceusingmiddleware

CUDAlibrary+driver CUDA– middlewareintegration

CUDAdriver

Interconnecttransport(RDMA)

Interconnecttransport(RDMA)

Middlewareservice/daemon

Interconnect

Middlewareservice

PCIe IObus

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

Local

Remote

Application Application

Borrowed remoteresource Localresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

CUDAlibrary+driver

PCIe IObus

0

2

4

6

8

10

12

14

4 KB 8 KB 16 KB 32 KB 64 KB 128 KB 256 KB 512 KB 1 MB 2 MB 4 MB 8 MB 16 MB

Gigabytesp

erse

cond

(GB/s)

Transfersize

bandwidthTest(Local) bandwidthTest(Borrowed) PXH830DMA(GPUDirectRDMA)

1. Nvidia CUDA8.0SamplesbandwidthTest2. GPUDirect RDMAbenchmarkusingDolphinNTBDMA

https://github.com/Dolphinics/cuda-rdma-bench

Device-to-host memory transfer

GPU: Quadro P400 Nvidia driver: Version 375.26(Centos7)

CPU: XeonE5-1630 3.7 GHz Memory: DDR42133MHz

Devicepool

Using Device Lending, nodes in a PCIe cluster can share resources through a process of borrowing and giving back devices

TaskA TaskB TaskC

GPU

SSD SSD SSD

NIC FPGA GPU

GPU GPU GPU

CPU+chipsetRAM

CPU+chipsetRAM

CPU+chipsetRAM

NTB

NTB

NTB

SSDSSDSSD

NIC

FPGA

TaskB

TaskA

GPU GPU GPU SSD

FPGA NIC

GPU SSD

SSD

GPUGPUGPUTaskC

EIR – Efficient computer aided diagnosis framework for gastrointestinal examination

Examinationroom Examinationroom

Serverroomhttp://mlab.no/blog/2016/12/eir/

Moving forward

• Strategy-based management

• Fail-over mechanisms

• VFIO and other API integration (“SmartIO”)

• Borrowing vGPU functions

Thank you!“Device Lending in PCI Express Networks”ACM NOSSDAV 2016

“Efficient Processing of Video in a Multi Auditory Environment using Device Lending of GPUs”ACM Multimedia Systems 2016 (MMSys’16)

“PCIe Device Lending”University of Oslo 2015

DeviceLendingdemoandmoreVisit Dolphin in exhibition area (booth 625)

jonassm@simula.no

MyemailaddressSelected

publications

top related