
Architecture and Performance of the TILE-Gx72 Manycore Processor

Matthew Mattina
CTO, Tilera

Hot Interconnects 2013

Agenda

• Overview

• iMesh Architecture

• Performance Results

Hot Interconnects, August 2013. © 2013 Tilera Corporation.

TILE-Gx72™: At-a-Glance

• Tile array: 72x 64-bit cores
  – Tile = core + 256KB L2 cache + iMesh interface

• 1.2GHz, TSMC 40nm HPM
  – 75W TDP

• mPIPE™: wire-speed programmable packet processing and load-balancing engine
  – Dynamic flow affinity

• 8x 10GbE XAUI ports
  – Feed packets into the mPIPE at 120MPPS

• MiCA™: cryptographic and RSA engines
  – 44K keys per second, zero core resources

• 6x PCIe ports
  – 96 Gbps of dedicated PCIe and SR-IOV support

• 4x DDR3 controllers @ 1600MT/s
  – > 50GB/s main memory BW

• Standard SMP Linux, C/C++, Java, gdb, …

• World's highest single-chip CoreMark™ score

[Chip floorplan: the 72-tile array surrounded by 4x DDR3 controllers, 2x MiCA engines, the mPIPE, TRIO (PCIe 2.0: 2x 8-lane + 2x 4-lane, 24 SerDes lanes), Flexible I/O (UART x2, USB x2, JTAG, I2C, SPI), and 8x 10GbE XAUI / 4x GbE SGMII network SerDes (32 lanes). 45 x 45mm BGA package.]
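The headline numbers in the At-a-Glance list can be sanity-checked with back-of-envelope arithmetic. This sketch assumes a 64-bit data bus per DDR3 controller and minimum-size 64B Ethernet frames with 20B of preamble and inter-frame gap on the wire; neither assumption is stated on the slide.

```python
# Peak DRAM bandwidth: 4 controllers x 1600 MT/s x 8 bytes per transfer.
controllers, mts, bus_bytes = 4, 1600e6, 8
mem_bw_gb = controllers * mts * bus_bytes / 1e9
print(f"peak DRAM BW: {mem_bw_gb:.1f} GB/s")   # 51.2 GB/s -> "> 50GB/s"

# Packet rate of 8x 10GbE at minimum frame size (64B + 20B wire overhead).
ports, rate_bps, wire_bytes = 8, 10e9, 64 + 20
mpps = ports * rate_bps / (wire_bytes * 8) / 1e6
print(f"min-frame rate: {mpps:.0f} MPPS")      # ~119 MPPS -> "120MPPS"
```

Both results line up with the slide's "> 50GB/s" and "120MPPS" figures.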

Markets and Solutions

• Heterogeneous devices: TILE-Gx72 data plane + x86 control plane
  – Application delivery controllers
  – Firewalls
  – Server adaptor cards
  – Emerging SDN/NFV

• Homogeneous devices: TILE-Gx72 data and control plane
  – Single-chip VPN/router/firewall
  – IDS/IPS appliances
  – Video transcoding bridge

[Diagram: in the heterogeneous configuration, an x86 processor running Linux pairs with a TILE-Gx processor as a programmable I/O offload; in the homogeneous configuration, a TILE-Gx processor with its own DRAM handles both planes.]


iMesh: Scaling to 72 Cores and Beyond

• iMesh: multiple mesh networks and a cache coherence protocol interconnecting all on-die components

• Single shared global physical address space; direct cache access; full HW cache coherence

• Scalable: more tiles = more interconnect bandwidth and more shared cache BW

• Three protocol classes, three physical meshes, three different widths: keeps the iMesh router simple and fast

[Plot: throughput vs. number of cores, growing as 4 × √n.]

iMesh: Physical and Link Layer

• Each iMesh interface is a 5x5 crossbar connecting the 4 neighbors + the local core over point-to-point wires; low power

• Fast arbiter: single-cycle hop, same clock as the core

• Source-based wormhole routing; route headers coded for high speed

• Link-level FIFOs and per-hop flow control

• Hand-placed M7/M8 wiring routed over the L2 cache minimizes wiring-channel area

• Regular design accelerates time to market: build/verify one tile, then replicate

[Diagram: two adjacent tiles (Tile 0,0 and Tile 0,1), each core connected through its iMesh interface.]

Network name | Link width | Bisection BW
SDN          | 128b       | 346 GB/s
RDN          | 112b       | 302 GB/s
QDN          | 64b        | 173 GB/s
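The bisection numbers are consistent with one transfer per cycle per link direction at the 1.2GHz core clock, with the bisection cut severing 9 channels counted in both directions (i.e. a 9x8 layout of the 72 tiles; the slide does not state the grid aspect ratio, so that part is an assumption).

```python
# Reconstruct the bisection-BW column: 2 directions x 9 severed channels,
# one flit per cycle per direction at the 1.2 GHz mesh clock.
clock_hz = 1.2e9
bisection_links = 2 * 9                           # assumes a 9x8 tile grid
networks = {"SDN": 128, "RDN": 112, "QDN": 64}    # link widths in bits
bw = {n: bisection_links * w / 8 * clock_hz / 1e9 for n, w in networks.items()}
for n, gb in bw.items():
    print(f"{n}: {gb:.0f} GB/s")   # SDN: 346, RDN: 302, QDN: 173
```

All three reconstructed values round to the figures in the table above.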


iMesh: Caching and Coherence

• Every tile contains private 32KB L1 I and D caches and a 256KB L2 cache

• L2 caches private L2 lines and global "L3" lines

• Distributed coherence directory tracks sharers, invalidates shared copies on writes

• Flexible cache-line distribution
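The "flexible cache-line distribution" can be pictured as mapping each physical line address to a home tile. The function below is purely illustrative: the name, the round-robin hash, and the parameters are invented for this sketch; the slides only say the distribution is flexible, not how it is configured.

```python
def home_tile(paddr, n_tiles=72, line_bytes=64):
    """Map a physical address to a home tile (illustrative round-robin hash)."""
    return (paddr // line_bytes) % n_tiles

# Consecutive 64B lines are spread across all 72 tiles before wrapping,
# so shared-"L3" capacity and directory traffic are distributed.
homes = [home_tile(0x1000 + i * 64) for i in range(4)]
print(homes)   # [64, 65, 66, 67]
```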

iMesh Protocol: Remote Write Hit with a Sharer

[Animated walkthrough: Tile 0 executes "Store V, 0x1". The line is homed in Tile 2's cache, and Tile 1 holds a shared copy. Each tile's directory holds TAG and DATA entries plus a sharer list.]

1. Tile 0's TLB translates V (a virtual address) to P (a physical address) plus the home-tile coordinates.
2. The local copy in Tile 0's cache is updated (DATA for TAG(P) becomes 0x1).
3. The cache controller writes through to the home tile on the SDN.
4. The write is processed by the home cache and the sharer list is inspected.
5. An invalidate is sent to the sharer (Tile 1) on the QDN.
6. Tile 1's copy of the data is invalidated.
7. An invalidate ack is sent over the RDN.
8. Tile 1 is removed from the sharer list and the new data is written to the home cache.
9. A write ack is sent back to Tile 0 via the RDN.
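The steps above can be sketched as a toy model. The class names, dict-based "directory", and tile IDs are invented for illustration; real hardware tracks sharers per cache line in the distributed directory and carries these messages on the SDN/QDN/RDN meshes.

```python
class HomeTile:
    """Home cache + directory entry for the lines it owns (toy model)."""
    def __init__(self):
        self.data = {}
        self.sharers = {}          # addr -> set of tile ids holding a copy

    def write(self, addr, value, writer, tiles):
        # Steps 4-8: process the write-through, invalidate each sharer,
        # count their acks, then update sharer list and cached data.
        acks = 0
        for t in self.sharers.get(addr, set()) - {writer}:
            tiles[t].invalidate(addr)            # step 5: inval on QDN
            acks += 1                            # step 7: inval ack on RDN
        self.sharers[addr] = {writer}
        self.data[addr] = value                  # step 8: new data at home
        return acks                              # step 9: write ack on RDN

class Tile:
    def __init__(self):
        self.l2 = {}
    def invalidate(self, addr):
        self.l2.pop(addr, None)                  # step 6: drop the copy

home = HomeTile()
tiles = {0: Tile(), 1: Tile()}
home.data["P"], home.sharers["P"] = 0x0, {1}     # Tile 1 shares P
tiles[1].l2["P"] = 0x0

tiles[0].l2["P"] = 0x1                           # step 2: local copy updated
acks = home.write("P", 0x1, writer=0, tiles=tiles)   # step 3: write-through
print(acks, home.data["P"], "P" in tiles[1].l2)  # 1 1 False
```

After the sequence, only the writer remains on the sharer list and the stale copy is gone, which is exactly the invariant the animation demonstrates.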

iMesh at the Movies

[Router animation showing the labeled stages: RR Arb with XY Route, and NP Arb.]
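Dimension-ordered XY routing, as named in the animation, can be computed at the source, matching the source-based routing described earlier. The function name and coordinate convention here are assumptions for illustration.

```python
def xy_route(src, dst):
    """List the hops from src to dst on the mesh: all X hops first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    hops = []
    hops += ["E" if dx >= sx else "W"] * abs(dx - sx)   # resolve X dimension
    hops += ["N" if dy >= sy else "S"] * abs(dy - sy)   # then Y dimension
    return hops

print(xy_route((0, 0), (2, 3)))   # ['E', 'E', 'N', 'N', 'N']
```

Because every packet resolves X before Y, routes are deadlock-free on the mesh, and the precomputed hop list is what gets encoded into the route header.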


Performance Results

[Plot: TCP Throughput¹, Gbps vs. # cores, scaling to ~80 Gbps at 30 cores: a 30x performance improvement.]

[Plot: H.264 Video Encode², fps vs. # cores, scaling across all 72 cores: a 65x performance improvement.]

¹ 512 connections, 1500B packets, "echo" server, full-duplex performance
² "Toys_and_calendar" video sequence, 3Mbps, baseline profile
³ 677 rules, typical traffic

Application Performance

Application                          | Result
TCP throughput                       | 80 Gbps, 30 cores
H.264 1080p encode                   | 22 channels, 72 cores
Suricata IDS, policy-based rule set³ | 13 Gbps, 72 cores
Packet forwarding with netfilter     | 80 Gbps, 40 cores

Summary

Software-compatible across the family:

• Gx-9, Gx-16, Gx-36: pin compatible, 1265-ball BGA, 37.5 x 37.5mm (in production)

• Gx-72: 1847-ball BGA, 45 x 45mm (general sampling)

The iMesh interconnect and cache coherence protocol enable Tilera's Gx family of manycore processors to scale performance efficiently from the TILE-Gx9 to the TILE-Gx72.

Thank You!