a reconfigurable computing system based on a cache-coherent fabric

15
A Reconfigurable Computing System Based on a Cache-Coherent Fabric Presenter: Neal Oliver Intel Corporation Authors- Neal Oliver, Rahul R Sharma, Stephen Chang, Bhushan Chitlur, Elkin Garcia, Joseph Grecco, Aaron Grier, Nelson Ijih, Yaping Liu, Pratik Marolia, Henry Mitchel, Suchit Subhaschandra, Arthur Sheiman, Tim Whisonant, and Prabhat Gupta June 10, 2012 Presented at CARL 2012

Upload: others

Post on 03-Feb-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

A Reconfigurable Computing System Based on a Cache-Coherent Fabric

Presenter: Neal Oliver Intel Corporation

Authors- Neal Oliver, Rahul R Sharma, Stephen Chang, Bhushan Chitlur, Elkin Garcia, Joseph Grecco, Aaron Grier, Nelson Ijih, Yaping Liu, Pratik Marolia, Henry Mitchel, Suchit Subhaschandra, Arthur Sheiman, Tim Whisonant, and Prabhat Gupta

June 10, 2012

Presented at CARL 2012

2

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copyright © 2012, Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others

3

Topics

• Motivations for this work

• Usage Models

• Overview of Intel QuickPath Interconnect (QPI)

• Platform Architecture

• Hardware Architecture

• Programming Model

• Simulation

• Future Work

4

Motivation for this work

• Keep innovation on Intel platforms

• High-throughput, low-latency attachment to server

• Cache-coherent memory

• Enlarge memory space for FPGA platform

• Enable additional programming paradigms

• Drive requirements for future fabrics

• Platform for simulation/emulation

5

Usage Models

Intel QuickAssist Accelerator

Platform (QAP)

Accelerators •Algorithmic accelerators •e.g. Seismic imaging, genomics, computational finance

ASIC Prototyping •Significantly reduces risk for ASICs connecting to QPI •e.g. QPI attached ASICs for telecom, node controllers

CPU Bridge

FPGA QPI

FPGA Farm

OR

Big Box Emulator

CPU Accel

FPGA QPI

QPI

QPI

RTL

ASIC

Logic

ASIC

HW/SW Co-Emulation Platforms • Enables development of high performance emulation platforms • Pre-Si SW development

6

Intel QuickPath Interconnect (QPI)

QPI: A low latency, point-to-point coherent system interconnect.

QPI Caching Agent (CA) – cache devices

QPI Home Agent (HA) – DDR memory controller

Latest Intel platforms have IO integrated into CPU (not shown)

Four Socket Platform

P0 M P1

CS

I/O I/O

QPI Links

Two Socket Platform

M Memory

P Processor

CS Chipset

QPI Links

P0 M

P2 M

P1 M

P3 M

CS

I/O I/O

CS

I/O I/O

QPI Links

QPI Links

7

QPI-attached Accelerator Hardware Module (AHM)

AHM installed in Server

Substitution of AHM for CPU

QPI Links

P M

P M

M

P M

CS

I/O I/O

CS

I/O I/O

QPI Links

AHM

Altera Stratix IV Module

Xilinx Virtex 6 Module

Intel® Xeon processor 7000 series

Accelerator Hardware Module (AHM)

FPGA

PCIe attached FPGA

8

Platform Architecture

Link and Protocol

System Protocol Layer

Accelerator Function Unit(s)

(AFUs)

PHY

CPU FPGA

Intel QPI

SPL Interface

CCI - Core Cache Interface

AAL Interface

Direct Driver Interface

Accelerator Abstraction Layer

Link and Protocol

PHY

SW

ele

ments

Host Application

9

FPGA

QPI HW stack

•FPGA high speed IO operating at 4.8 to 6.4 GT/sec •Full width : 20 data lanes per direction

QPI Link and Protocol (QLP)

System Protocol Layer

(virtual memory subsystem)

Accelerator Function Units

(AFU)

QPI PHY (QPH)

•QPI Caching (cache) and Home Agent (memory controller) implemented •64KB to 256KB set-associative cache

•Manages coherency of FPGA attached DDR memory

•Implements virtual to physical address translation •Handles all cache line reordering in blocks of data •DMA engine •Memory management and protection

•Application-specific hardware module, user algorithm to be accelerated. •Can connect to SPL or CCI interface

10

QPI Home+Cache Agent Architecture

•Modular: CA, HA, CA+HA

• Interfaces: CCI- hides complexity of underlying coherent fabric. MC (optional) – provides interface to memory controller.

•QPH electrical uses existing FPGA IOs

•QLP designed using soft logic, tightly integrated, highly parallel.

•Programmable cache/snoop tables

SPL

QPI Link + Protocol

Control

QPI PHY

(QPH)Rx

AlignTx

Align

8 flits (640 bits)

Rx Control

Tx Control

8 flits (640 bits)

Core Cache

Module

Cache

Data

RAM

CCI

Cache tagCache

table

To AFU

QLP

phits

Memory

Queue

Snoop tag

Snoop

table

Memory Controller

DDR

To

DIMM

QPI link to CPU

MC interface

11

Cache Hit/Miss protocol flow

• Cache hits are an order of magnitude faster.

• Prefetch

striding patterns to improve cache hit rate

12

Programming Model

AAL functions: • Allocate shared workspaces WSi and pin in system memory

• Support both physical and virtual memory access

• Allocate and manage AFUs for application (via “proxy AFU” (PAFUi) abstraction)

• Provide remote procedure call (RFP) abstraction of AFU to application

QPI Physical (QPH) + Link / Protocol layer

System Protocol Layer ( SPL )

AFU 0 AFU 1 AFU n … .

QuickAssist FPGA

Hardware Module

System

Memory

QPI

WS 0 WS 1 WS n … .

CPU (

SOFTWARE )

Accelerator Abstraction Layer ( AAL )

Kernel Space

User

Space

Application User Library

QPI

PAFU 0 PAFU 1 PAFU n … .

Accelerator Interface Adaptor ( AIA )

( QLP )

Proxy AFU: • Provide abstraction of AFU to application

• Enables staged development of AFU algorithms

13

AFU Simulation Environment (ASE)

1 – Intel and Xeon are registered trademarks of Intel Corporation. Other trademarks are the property of their respective owners.

•AFU HW-SW co-design environment. •Models platform behavior. •Design and validate the SW against the AFU RTL in the simulator environment. •Faster simulation. •Ease of debug. System

Memory

AFU Simulator

(Transaction Level Model)

SW APP AFU RTL

Driver/MonitorAccelerator Interface

Adapter (AIA)

SPL or

CCI

Modelsim1/VCS

1 runtime

14

Future Work

• Research and pathfinding on other accelerator architectures

• Programming language/compiler development

• Benchmarking and performance characterization

Thank You

Contact – [email protected] [email protected]