
Post on 24-Jan-2020


Getting to Work with OpenPiton

Princeton University

http://openpiton.org


Extension Using NoCs


P-Mesh NoC Connected I/O and Accelerators

[Figure: a tile connected over the P-Mesh NoC to DRAM, a serializer/deserializer, DMA, a buffer, an accelerator, and 10 Gigabit Ethernet]


P-Mesh NoC: packet format

CHIPID: The highest bits indicate whether the destination is on-chip or off-chip; the remaining bits indicate the chip ID

XPOS: The position of the destination tile in the X dimension

YPOS: The position of the destination tile in the Y dimension

FBITS: The router output port to the destination

PAYLOAD LENGTH: The number of payload flits in the packet

RESERVED: Reserved bits used by higher-level protocols
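The field list above can be illustrated with a small pack/unpack sketch. The field widths used here (14, 8, 8, 4, and 8 bits) are assumptions chosen so that the five named fields fill 42 bits, matching the bits 63-22 mentioned for the header flit elsewhere in this deck; the names `pack_header`, `unpack_header`, and `low` are hypothetical. It is a model for experimenting with field positions, not the OpenPiton RTL definition.

```python
# Assumed widths, MSB first, for bits 63..22 of the header flit.
FIELDS = (("chipid", 14), ("xpos", 8), ("ypos", 8), ("fbits", 4), ("payload_length", 8))
LOW_BITS = 22  # bits 21..0 are used by higher-level protocols; kept opaque here


def pack_header(low=0, **values):
    """Pack named fields into one 64-bit header flit (hypothetical helper)."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    flit = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        flit = (flit << width) | value
    if not 0 <= low < (1 << LOW_BITS):
        raise ValueError(f"low bits do not fit in {LOW_BITS} bits")
    return (flit << LOW_BITS) | low


def unpack_header(flit):
    """Split a 64-bit header flit back into its named fields."""
    fields = {}
    shift = 64
    for name, width in FIELDS:
        shift -= width
        fields[name] = (flit >> shift) & ((1 << width) - 1)
    fields["low"] = flit & ((1 << LOW_BITS) - 1)
    return fields


# Destination tile (x=1, y=2) on chip 0, followed by two payload flits.
header = pack_header(xpos=1, ypos=2, payload_length=2)
print(f"{header:#018x}")
print(unpack_header(header))
```

Keeping the layout in one table (`FIELDS`) means a different width assumption only needs a change in one place.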


P-Mesh NoC: .h files

piton/design/include/network_define.h: defines header flit bits 63-22 (all fields except message ID, tag, and options 1)

piton/design/include/define.vh: defines the rest


Cache Coherence Protocol

Directory-based MESI coherence protocol
- Four-hop message communication (no direct communication between private L1.5 caches)
- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock
- The directory and L2 are co-located, but their state information is maintained separately
- Silent eviction in E and S states
- No acknowledgement is needed on write-back of dirty lines from L1.5 to L2, but a writeback guard is needed in some cases


Memory Hierarchy Datapath

[Figure: private L1.5, distributed shared L2, and off-chip chipset, linked by NoC1, NoC2, and NoC3]


NoC Messages

NoC1 (L1.5 → L2): Load, Store, Ifill, …
NoC2 (L2 → L1.5/Memory): Downgrade, Inv, Mem Req, …
NoC3 (L1.5/Memory → L2): DG ack, Inv ack, Mem Reply, …
NoC2 (L2 → L1.5): Load Ack, Store Ack

To avoid deadlock, NoC3 messages are never blocked.
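As a quick reference, the message-to-NoC assignment on this slide can be written as a lookup table. The assignment below is read off the slide's figure, and the names `NOC_OF`, `noc_for`, and `may_block` are hypothetical; the elided entries ("…") on the slide are not filled in.

```python
# NoC assignment as read from the slide's figure; an illustrative lookup only.
NOC_OF = {
    "Load": 1, "Store": 1, "Ifill": 1,                 # L1.5 to L2 requests
    "Downgrade": 2, "Inv": 2, "Mem Req": 2,            # L2 to L1.5 or memory
    "Load Ack": 2, "Store Ack": 2,                     # L2 replies to L1.5
    "DG ack": 3, "Inv ack": 3, "Mem Reply": 3,         # responses back to L2
}


def noc_for(message):
    """Return the NoC number (1, 2, or 3) that carries the given message."""
    return NOC_OF[message]


def may_block(message):
    """Per the slide, only NoC3 messages are guaranteed never to be blocked."""
    return noc_for(message) != 3


for name in ("Load", "Downgrade", "DG ack"):
    print(name, "-> NoC", noc_for(name), "| may block:", may_block(name))
```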


Backup Slides


Coherence Transaction Example

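[Figure: Core 1 issues a load (Ld). State changes: Core 1 L1.5 I → E, Core 2 L1.5 I, L2 I → E. Messages: ❶ Load (L1.5 to L2), ❷ Mem Req (L2 to memory), ❸ Mem Reply (memory to L2), ❹ Data Ack (L2 to L1.5).]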


Coherence Transaction Example (2)

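[Figure: after Core 1's load (Ld), Core 2 issues a store (St). State changes: Core 1 L1.5 E → I, Core 2 L1.5 I → M, L2 E → M. Messages: ❶ Store (Core 2 L1.5 to L2), ❷ Downgrade (L2 to Core 1 L1.5), ❸ DG Ack (Core 1 L1.5 to L2), ❹ Data Ack (L2 to Core 2 L1.5).]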


Coherence Transaction Example (3)

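[Figure: after the load (Ld) and store (St), Core 2 writes the line back (Wb). State changes: Core 1 L1.5 I, Core 2 L1.5 M → I, L2 M → I. Messages: ❶ WbGuard, ❷ Writeback (Core 2 L1.5 to L2).]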
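The three transaction examples can be replayed with a toy single-line model. This sketch assumes the state shown next to L2 on the slides tracks the line at the directory, and it models only the exact cases drawn (load to an uncached line, store to a line held in E by the other core, writeback of an M line). The class `ToyDirectory` and its method names are hypothetical, not part of OpenPiton.

```python
class ToyDirectory:
    """Single-line toy model of the three slide examples; not the OpenPiton RTL."""

    def __init__(self):
        self.l2 = "I"                 # state drawn next to L2 on the slides
        self.l15 = {1: "I", 2: "I"}   # private L1.5 state per core
        self.trace = []               # messages in the order they are sent

    def load(self, core):
        # Example 1: the line is uncached, so L2 fetches it from memory and grants E.
        if self.l2 != "I":
            raise NotImplementedError("only the load-to-uncached-line case is modeled")
        self.trace += ["Load", "Mem Req", "Mem Reply", "Data Ack"]
        self.l2 = self.l15[core] = "E"

    def store(self, core):
        # Example 2: the other L1.5 holds the line in E, so L2 downgrades it first.
        owners = [c for c, s in self.l15.items() if c != core and s == "E"]
        if len(owners) != 1:
            raise NotImplementedError("only the store-to-remote-E case is modeled")
        self.trace += ["Store", "Downgrade", "DG Ack", "Data Ack"]
        self.l15[owners[0]] = "I"
        self.l2 = self.l15[core] = "M"

    def writeback(self, core):
        # Example 3: evicting a dirty line sends a guard, then the data; no ack follows.
        if self.l15[core] != "M":
            raise ValueError("only M lines are written back")
        self.trace += ["WbGuard", "Writeback"]
        self.l2 = self.l15[core] = "I"


d = ToyDirectory()
d.load(1)        # Core 1: I → E
d.store(2)       # Core 1: E → I, Core 2: I → M
d.writeback(2)   # Core 2: M → I
print(d.l15, d.l2)
print(d.trace)
```

Unmodeled cases raise an error on purpose, so the sketch never invents protocol behaviour the slides do not show.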


Adding to OpenPiton

• AXI-Lite
• Wishbone
• Interfacing with the Network on Chip


Hooking up an AXI-Lite device


Interfacing with the Networks-on-Chip

1. Packet format: highlighting key packet fields
2. Definition files: .h files
3. Instantiations in the Verilog design


NoC: packet format

64-bit flits
1 packet header flit (64b) + X packet payload flits (64b * X)

Ex: Cache request from L1.5 to L2: header flit + request address flit + metadata flit
Ex: Cache response from L2 to L1.5: header flit + 2x data flits (16B cache line)
Ex: Instruction cache response: header flit + 4x data flits (32B cache line)
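The flit counts in these examples follow directly from the 64-bit flit size. A minimal sketch of the arithmetic, with hypothetical helper names `payload_flits` and `packet_flits`:

```python
FLIT_BITS = 64  # every flit, header or payload, is 64 bits wide


def payload_flits(num_bytes):
    """Number of 64-bit flits needed to carry num_bytes of payload."""
    bits = num_bytes * 8
    return -(-bits // FLIT_BITS)  # ceiling division


def packet_flits(num_payload_flits):
    """Total packet length: one header flit plus the payload flits."""
    return 1 + num_payload_flits


print(packet_flits(2))                  # L1.5 request: address flit + metadata flit
print(packet_flits(payload_flits(16)))  # L2 response carrying a 16B cache line
print(packet_flits(payload_flits(32)))  # instruction cache response, 32B line
```

For the three examples on the slide this gives packets of 3, 3, and 5 flits respectively.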


NoC: instantiations

piton/design/chip/rtl/chip.v.pyv
Chip-wide connections between tiles
Auto-generated using PyHP


NoC: instantiations

piton/design/chip/tile/rtl/tile.v.pyv
Instantiation of NoC1/2/3

piton/design/chip/tile/rtl/tile.v.pyv
Selectable between router and crossbar design


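Cache Coherence Protocol

Directory-based MESI coherence protocol
- Four-hop message communication (no direct communication between private L1.5 caches)
- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock

[Figure: L1.5, L2, L1.5. The requester (Req, I->S) sends ReqRd to the directory (Dir, M->S), which sends FwdRd to the owner (Owner, M->S); the owner replies with FwdRdAck and the directory returns AckDt to the requester.]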
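The four-hop pattern in this figure can be sketched as a list of (source, destination, message) hops. The message names come from the slide; the function names and endpoint labels are hypothetical. The check at the end expresses the "no direct communication between private L1.5 caches" property: every hop has the L2/directory at one end.

```python
L2 = "L2/directory"


def forwarded_read(requester, owner):
    """Hops for a read to a line that another L1.5 holds in M (as in the figure)."""
    return [
        (requester, L2, "ReqRd"),     # requester asks the directory
        (L2, owner, "FwdRd"),         # directory forwards the read to the owner
        (owner, L2, "FwdRdAck"),      # owner answers the directory
        (L2, requester, "AckDt"),     # directory returns data to the requester
    ]


def is_four_hop_via_directory(hops):
    """True if there are four hops and none goes directly between two L1.5s."""
    return len(hops) == 4 and all(L2 in (src, dst) for src, dst, _ in hops)


hops = forwarded_read("L1.5 (requester)", "L1.5 (owner)")
for src, dst, msg in hops:
    print(f"{msg}: {src} -> {dst}")
print(is_four_hop_via_directory(hops))
```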


Cache Coherence Protocol (2)

Directory-based MESI coherence protocol
- The directory and L2 are co-located, but their state information is maintained separately

[Figure: fields of an L2 entry: L2 State, Dir State, Tag, Data, Sharer List]


Cache Coherence Protocol (3)

Directory-based MESI coherence protocol
- Silent eviction in E and S states
- No acknowledgement is needed on write-back of dirty lines from L1.5 to L2

[Figure: requester evictions S->I, E->I, and M->I; the M->I eviction sends WbGuard and Wb to the directory; directory transitions labeled M->I and E->I]


Example: Add an on-chip accelerator

1. Implement the NoC interface for the accelerator
2. Design and implement the control flow for the accelerator
   - Use interrupt packets to init and stop the accelerator
   - Use special loads and stores to configure the accelerator
   - Follow the coherence protocol if a coherent cache is maintained
3. Connect the accelerator to the NoCs and assign it a new tile ID
4. Modify the OS code to init the accelerator if needed
5. Write tests to test the accelerator
