linux - fosdem · 2021. 1. 17. · empty slot hot add capable cxl host bridge 2 cxl host bridge 1...

39
1 Links https://gitlab.com/bwidawsk/qemu cxl-2.0v<latest> branch https://gitlab.com/bwidawsk/linux cxl-2.0v<latest> branch https://github.com/pmem/ndctl cxl-2.0v<latest> branch https://www.computeexpresslink.org/download-the-specification

Upload: others

Post on 19-Jul-2021

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

1

Links

• https://gitlab.com/bwidawsk/qemu• cxl-2.0v<latest> branch

• https://gitlab.com/bwidawsk/linux• cxl-2.0v<latest> branch

• https://github.com/pmem/ndctl• cxl-2.0v<latest> branch

• https://www.computeexpresslink.org/download-the-specification

Page 2: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

FOSDEM21 – Emulator devroom

Compute Express Link in QEMUBen Widawsky – Software Engineer

Page 3: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

3

Introduction

Page 4: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

4

Problem Statement

It's difficult.

Page 5: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

5

End Goal

T minus 6 months

• Linux CXL 2.0 memory device drivers within 0 days of spec release• Validate the spec

• Enable hardware vendors and validation

• Have some reusable infra for regression testing

• Potential use for guests

• Masters of our own destiny.

Page 6: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

6

Pre-silicon state of the art

• Hardware• No 2.0 FPGAs available

• No 1.1 hardware available

• Prior art• NVDIMM faked it all in Linux

• QEMU CCIX patches

This Photo by Unknown Author is licensed under CC BY-SA

Page 7: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

7

CXL 2.0

Page 8: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

8

CXL 2.0 High Level

• Open industry standard for high bandwidth, low-latency interconnect

• Connectivity between host processor and accelerators/ memory device/ smart NIC

• Addresses high-performance computational workloads across AI, ML, HPC, and Comms segments

• Heterogeneous processing: scalar, vector, matrix, spatial architectures spanning CPU, GPU, FPGA

• Memory device connectivity• PCIe PHY completely leveraged with additional latency optimization• Dynamic multiplexing of 3 protocols

• Based on PCIe® 5.0 PHY infrastructure• Leverages channel, retimers, PHY, Logical, Protocols• CXL.io – I/O semantics, like PCIe - mandatory• CXL.cache – Caching Semantics – optional• CXL.mem – Memory semantics - optional

Page 9: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

9

CXL 2.0 Usages

C X L

AcceleratorNIC

Cache

DD

RD

DR

Processor

Caching Devices / Accelerators

Usages: • PGAS NIC• NIC atomicsProtocols:• CXL.io• CXL.cache

C X L

DD

RD

DR

Processor

HB

MH

BM

Accelerator

Accelerators with Memory

Usages:• GPU• FPGA• Dense ComputationProtocols:• CXL.io• CXL.cache• CXL.memory

Memory BuffersUsages: • Memory BW expansion• Memory capacity expansionProtocols:• CXL.io• CXL.mem

Me

mo

ry

Me

mo

ry

Me

mo

ry

Me

mo

ry

C X L

DD

RD

DR

Processor

MemoryBufferCache

(Type 1 Device) (Type 2 Device) (Type 3 Device)

Page 10: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

10

CXL 2.0 Architecture

• CXL 2.0 hierarchy appears like PCIe hierarchy

• Legacy PCI SW and CXL SW sees a RP or DSP with Endpoints below

• CXL link/interface errors are signaled to RP, not RCEC

• Port Control Override registers prevent legacy PCIe software from unintentionally resetting the device and the link

CXL 1.1 Device

CPU CXL1.1 DP

RCiEPD0 F0

CX

L

CXL DVSEC

RCiEPOr RP

CXL 2.0Switch

CX

L

CXL Upstream Switch Port, Appears as PCIe USP

CXL DSPAppears as

PCIe DSPPCIe DSP

CXL 2.0 RP appears as PCIe RP

PC

Ie

CX

L 2

.0D

evice

EPD0 F0CXL

DVSEC

CXL 2.0

hierarchies

CXL 1.1 hierarchy

PC

Ie

de

vice

CXL 2.0 RP

CX

L

Empty SlotHot add capable

CXL Host Bridge 2

CXL Host Bridge

1

Root ComplexEvent CollectorPCIe RP

PC

Ie

PC

Ie

de

vic

e

Page 11: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

11

Page 12: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

12

QEMUImplementing CXL

Page 13: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

13

PCIe in QEMU

What we all know and love

• Single root complex• Endpoints

• Root ports

• Switches

• All traffic is funneled to the single host bridge

• QPI/UPI (not modeled)

Q35 Host bridge(0:0.0)

Endpoint(10:0.0)

Root Port(0:7.0)

Root Port(0:1c.0)

Endpoint(20:0.0)

Endpoint(30:0.0)

down

upstream

down

RC

iEP

(0:3

1.0) R

CiE

P(0

:31.1

)

Page 14: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

14

PCIe ~2014

Host bridge(n:0.0)

RCiEP RCiEP

Host bridge(z:0.0)

Root Port RCiEP

Host bridge(0:0.0)

Root Port Root Port

Just like CXL

CXL CXL CXL

Page 15: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

15

Options

• Use a single host bridge• Fine for basic driver bring-up

• Limited for interleave scenarios

• Risk?

• Replace Q35 with something new• A lot of work for not much gain (for CXL)

• Support Burden

• PXB• Good compromise

Page 16: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

16

PCI eXpander Bridge (PXB)

Q35 Host bridge(0:0.0)

Endpoint(10:0.0)

Root Port(0:7.0)

Root Port(0:1c.0)

Endpoint(20:0.0)

Endpoint(30:0.0)

down

upstream

down

RC

iEP

(0:3

1.0) PCI eXpander Bridge

Root Port(50:0.0)

Root Port(50:1.0)

Endpoint(60:0.0)

Endpoint(90:0.0)

Page 17: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

17

CXL Components

• CXL Type 3 device• hw/mem/cxl_type3.c

• CXL Root Port• hw/pci-bridge/cxl_root_port.c

• CXL PXB• hw/pci-

bridge/pci_expander_bridge.c

CXL eXpander Bridge

CXL RP(50:0.0)

CXL RP(50:1.0)

type3(60:0.0)

type3(90:0.0)

Page 18: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

18

CXL utilities

hw/cxl/*

• cxl-component-utils.c• DVSEC helpers• .cache/mem MMIO handler

• cxl-device-utils.c• Device MMIO• Memory device MMIO• Mailbox MMIO

• cxl-mailbox-utils.c• Mailbox command

implementations

• pci_expander_bridge.c• New CXL PXBisms

• MMIO for host bridge

• Windows (*not spec’d)

• cxl_root_port.c• PCIe root port setup

• DVSEC initialization

Page 19: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

19

CXL Type 3 Device (Memory Device)

"PCI and NVDIMM had a coherent byte addressable baby..."

- Ben Widawsky

NVDIMM

• Byte addressable

• Direct mappable

NVME

• PCIeconfigurable

• Hot swappable

Page 20: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

20

CXL Type 3 Device

• NVDIMM & PCI had a baby…

• Inherits from both interfaces

• Mailbox handling

Page 21: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

21

ACPI

• ACPI0016• Per host bridge

• _OSC

• More in the future?

• ACPI0017• singleton

• CEDT• CHBS

• More in the future?

• hw/acpi/cxl.c

Page 22: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

22

LinuxDriver Design

Page 23: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

23

CXL Type-3 vs PCI Enumeration

ACPI0017

CEDT

Host

ACPI0016

Port DVSEC

CXL Root Port

Port DVSEC

CXL DevDevice DVSEC

Root Complex

RCiEP

PCI Bridge

PCI Dev

• A driver named “cxl_acpi” attaches to the ACPI0017 instance and is thus informed of the presence of the CEDT.

• Typical ACPI PCI enumeration runs and attaches ACPI0016 as a “companion device” to the corresponding root complex

• PCI Discovery finds CXL Dev and binds a driver named “cxl_mem” by Class Code.

• HDM decoder validation and programming is coordinated via the cxl_acpi_driver that is aware of the ACPI0016 objects that are participating in an interleave set.

Page 24: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

24

CXL.mem Objects

ACPI

cxl_acpi

ACPI0017

• CEDT

• CHBS

ACPI0016

Linux

cxl_root

cxl_memdev

cxl_mem(PCI binding)

PCI/CXL.io

pci_dev: root

pci_dev: bridge

pci_dev: endpoint

Light blue: sysfs objectYellow: driverwhite: driver object

Page 25: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

25

Key

CXL.mem to libnvdimm objects (future)

1. One NVDIMM bus for all CXL.mempersistent memory

• cxl_acpi is the root driver for x86 and ACPI ARM, other archs / non-ACPI ARM may provide a different root description for CXL.mem resources

2. A mapping is a “window + port + dev” tuple

3. cxl_acpi attaches to the root of the HDM decoder hierarchy and interfaces with the CXL.mem core for HDM topology reconfiguration.

4. libnvdimm and ndctl translate existing NVDIMM provisioning flows to CXL operations

cxl_mem cxl_memdev

cxl_acpi [1,3] cxl_memroot NVDIMM bus

cxl_acpi_port

NVDIMM

cxl_memmapping[2]

NVDIMM region

WIP Not yet startedDONE

Page 26: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

26

Interfacing with CXL

• Sysfs• /sys/bus/cxl/

• IOCTL• QUERY

• SEND• Managed command set

• RAW escape command

• Future• Error reporting

Page 27: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

27

Putting it all together

• QEMU Emulation• Memory region

• MMIO

• Mailboxes

• Kernel driver

• Documentation!

• Tools

• 2.0 spec is available

Page 28: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

28

libcxl

• IOCTLndctl/cxl

cxl_mem.ko

• Mailbox

MMIO

Linux

cxl-mailbox-

utils

• Mailbox

MMIO

QEMU

Page 29: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

29

Outcome

Page 30: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

30

• November 10th 2020

Spec Released

Linux Patches submitted

QEMU patches submitted

Page 31: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

31

Results

• QEMU v2 patches sent• Minimal review

• Linux driver v3 sent• Moderate review

• Nothing upstream

• Still no hardware to test on

• Still no other viable pre-silicon platform to test on.

• Made our deadline!

• This talk!!!

Page 32: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

32

Call to Action

Page 33: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

33

Missing Pieces

• Linux• DOE

• CDAT

• DPA mapping (WIP)• Interleave

• Provisioning• Recognition

• Hotplug• Hot add• Managed remove

• Asynchronous mailbox

• Userspace• Testing

• QEMU• Better tests• DOE

• CDAT• Upstream/Downstream Ports• Interleave Support

• Host bridge• switch

• More firmware commands• Hotplug support• Error testing• Interrupt support• -----------------------• Make Q35 CXL capable• CXL type 1 and 2 devices• CXL 1.1

Page 34: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

34

Making it useful

• qemu-system-x86_64• -machine q35,<cxl>

• -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M

• -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-base[0]=0x4c0000000,memdev[0]=cxl-mem1

• -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0

• -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M

• CXL utilities part of ndctl being developed• https://github.com/pmem/ndctl/tree/cxl-2.0v1

Page 35: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

35

Thanks

• Mahesh Natu – CXL architectural diagrams and high level

• Dr. Debendra Das Sharma – CXL usages

• Dan Williams – Linux diagrams

Page 36: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

36

Page 37: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

3737

Backup

Page 38: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

38

Asymmetric Protocol => Reduced Complexity

Accelerator CPUA ccelerator Engine

A ccelerator Caching Agent

A ccelerator Home Agent

Mem ory Agent

CCI Ca ching Agent

CCI H ome Agent

Mem ory Agent

CCI

Core Core Core Core

Accelerator CPU

A ccelerator Engine

Mem ory Agent

BiasedCoherenceBypass

CXL/ CCI Caching Agent

CXL/ CCI Home Agent

Mem ory Agent

Core Core Core Core

CXL

CXLCache

CCI* Model – Symmetric CCI Protocol

CXL Model – Asymmetric Protocol

CXL Key Advantages:• Avoid protocol interoperability hurdles/roadblocks • Enable devices across multiple segments (e.g. client / server)• Enable Memory buffer with no coherency burden• Simpler, processor independent device development

38

Compute Express Link™ and CXL Consortium™ are trademarks of the Compute Express Link Consortium.

*Cache Coherent Interface

Page 39: linux - FOSDEM · 2021. 1. 17. · Empty Slot Hot add capable CXL Host Bridge 2 CXL Host Bridge 1 Root Complex PCIe RP Event Collector I e e. 11. 12 ... •Mahesh Natu –CXL architectural

39

CXL 2.0 Scope: Hot-Plug, Persistence, Switching, and Dis-aggregationCXL 1.1 -> CXL 2.0

Feature Description

CXL PCIe End-Point CXL device to be discovered as PCIe EndpointSupport of CXL 1.1 devices directly connected to Root-Port or Downstream Switch Port

Switching Single level of switching with multiple Virtual Hierarchies (cascaded possible in a single hierarchy)CXL Memory Fan-Out & Pooling with InterleavingCXL.Cache is direct routed between CPU and device with a single caching device within a hierarchy.Downstream port must be capable of being PCIe.

Resource Pooling Memory Pooling for Type3 device – Multiple Logical Device (MLD), a single device to be pooled across 16 VirtualHierarchies.

CXL.cacheCXL.memenhancements

Persistence (Global Persistence Flush), Managed Hot-Plug, Function Level Reset Scope Clarification, Enhanced FLRfor CXL Cache/Mem, Memory Error Reporting and QoS Telemetry

Security Authentication and Encryption – CXL.IO uses PCIe IDE, CXL defines similar capability for CXL.Mem

SoftwareInfrastructure/ API

ACPI & UEFI ECNs to cover notification and management of CXL Ports and devicesCXL Switch API for a multi-host or memory pooled CXL switch configuration and management