architecture and performance of the tile-gx72 manycore
TRANSCRIPT
Architecture and Performance of the TILE-Gx72 Manycore Processor
Matthew Mattina
CTO Tilera Hot Interconnects 2013
• Overview
• iMesh Architecture
• Performance Results
Agenda
Hot Interconnects August 2013 2 © 2013 Tilera Corporation.
• Tile Array: 72x 64-bit cores – Tile = core + 256KB L2 cache + iMesh interface
• 1.2GHz, TSMC 40nm HPM
– 75W TDP
• mPIPE™: Wirespeed programmable packet
processing and load balancing engine – Dynamic flow affinity
• 8x 10GbE XAUI ports
– Feeds packets into the mPIPE, 120MPPS
• MiCA™: Cryptographic and RSA Engines
– 44K keys per second, zero core resources
• 6x PCIe ports
– 96 Gbps of dedicated PCIe and SR-IOV support
• 4x DDR3 controllers @ 1600MT/sec – > 50GB/s main memory BW
• Standard SMP Linux, C/C++, Java, gdb,…
• World’s Highest Single Chip CoreMark™ score
TILE-Gx72™: At-a-Glance
Network I/O DDR3 Controller
DDR3 Controller
MiCA
DDR3 Controller
DDR3 Controller
MiCA
mP
IPE
PCIe 2.0
8 Lanes
Flexible
I/O
UART x2,
USB x2,
JTAG,
I2C, SPI
PCIe 2.0
8 Lanes
TR
IO
PCIe 2.0
4 Lanes
32
-Lan
es
24
-Lan
es
PCIe 2.0
4 Lanes
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
10 GbE XAUI S
erD
es
4x GbE SGMII
45 x 45mm BGA package
Hot Interconnects August 2013 3 © 2013 Tilera Corporation.
Heterogeneous Devices
Homogeneous Devices
• TILE-Gx72 data plane + x86 control plane – Application Delivery Controllers
– Firewalls
– Server adaptor cards
– Emerging SDN/NFV
• TILE-Gx72 data and control plane – Single-chip VPN/Router/Firewall
– IDS/IPS Appliances
– Video Transcoding Bridge
Markets and Solutions
X86 Processor
Linux Programmable
I/O Offload
TILE-Gx Processor
DRAM
DRAM
TILE-Gx Processor
Hot Interconnects August 2013 4 © 2013 Tilera Corporation.
• Overview
• iMesh Architecture
• Performance Results
Agenda
Hot Interconnects August 2013 5 © 2013 Tilera Corporation.
• iMesh: Multiple mesh networks and cache coherence protocol interconnecting all on-die components
• Single shared global physical address space; direct cache access; full HW cache coherence
• Scalable: more tiles = more interconnect bandwidth and more shared cache BW
• Three protocol classes, three physical meshes, three different widths: keeps iMesh router simple and fast
iMesh: Scaling to 72 cores and beyond
Number of Cores
Thro
ugh
pu
t 4 × √𝑛
Hot Interconnects August 2013 6 © 2013 Tilera Corporation.
Tile
iMesh: Physical and Link Layer
• Each iMesh interface is 5x5 xbar, connects 4 neighbors + core, point-to-point wires, low power
• Fast arbiter: single cycle hop, same clock as core
• Source-based, wormhole routing, route headers coded for high-speed
• Link level FIFOs and per-hop flow control
• Handplaced M7/M8 routed over L2 cache minimizes wiring channel area
• Regular design accelerates time to market: build/verify one Tile, then replicate
Tile 0,0
Tile 0,1
Hot Interconnects August 2013 7 © 2013 Tilera Corporation.
core
core
Network Name Link Width, Bisection BW
SDN 128b, 346 GB/s
RDN 112b, 302 GB/s
QDN 64b, 173 GB/s
Network I/O
MiCA
MiCA
mP
IPE
PCIe 2.0
8 Lanes
Flexible
I/O
UART
PCIe 2.0
8 Lanes
TR
IO
PCIe 2.0
4 Lanes
32
-Lan
es
24
-Lan
es
PCIe 2.0
4 Lanes
10 GbE XAUI Se
rDe
s 4x GbE SGMII
10 GbE XAUI Se
rDe
s 4x GbE SGMII
10 GbE XAUI Se
rDes
4x GbE SGMII
10 GbE XAUI Se
rDes
4x GbE SGMII
10 GbE XAUI Se
rDes
4x GbE SGMII
10 GbE XAUI Se
rDes
4x GbE SGMII
10 GbE XAUI Se
rDes
4x GbE SGMII
10 GbE XAUI Se
rDe
s 4x GbE SGMII
© 2013 Tilera Corporation. Hot Interconnects August 2013 8
iMesh: Caching and Coherence
• Every Tile contains private 32KB L1 I and
D caches and 256KB L2 cache
• L2 caches private L2 lines and global
“L3” lines
• Distributed coherence directory tracks sharers, invalidates shared copies on writes
• Flexible cache line distribution
iMesh Protocol - Remote Write Hit with Sharer
Tile 1 Tile 0
Tile 2 Store V, 0x1
DDR3
V (a virtual address) TLB P (a physical address), Home Tile coordinates
1
Directory TAG DATA
TAG(P) 0x0
Directory TAG DATA
TAG(P) 0x0
Directory TAG DATA
0, 1 TAG(P) 0x0
Hot Interconnects August 2013 9 © 2013 Tilera Corporation.
iMesh Protocol - Remote Write Hit with Sharer
Tile 1 Tile 0
Tile 2 Store V, 0x1
DDR3
V (a virtual address) TLB P (a physical address), Home Tile coordinates
1
Directory TAG DATA
TAG(P) 0x42
Directory TAG DATA
TAG(P) 0x0
Directory TAG DATA
0, 1 TAG(P) 0x0
3
Cache controller writes-through to home on SDN
2 Local copy is updated TAG(P) 0x1
4 Write is processed by home cache and sharer list is inspected
5
Inval sent to sharer on QDN
6 Data invalidated!
Hot Interconnects August 2013 10 © 2013 Tilera Corporation.
iMesh Protocol - Remote Write Hit with Sharer
Tile 1 Tile 0
Tile 2 Store V, 0x1
DDR3
V (a virtual address) TLB P (a physical address), Home Tile coordinates
1
Directory TAG DATA
TAG(P) 0x42
Directory TAG DATA
TAG(P) 0x42
Directory TAG DATA
0, 1 TAG(P) 0x0
3
Cache controller writes-through to home on SDN
2 Local copy is updated TAG(P) 0x1
4 Write is processed by home cache and sharer list is inspected
5
Inval sent to sharer on QDN
6 Data invalidated!
7
Inval ack sent over RDN
Hot Interconnects August 2013 11 © 2013 Tilera Corporation.
iMesh Protocol - Remote Write Hit with Sharer
Tile 1 Tile 0
Tile 2 Store V, 0x1
DDR3
V (a virtual address) TLB P (a physical address), Home Tile coordinates
1
Directory TAG DATA
TAG(P) 0x42
Directory TAG DATA
TAG(P) 0x42
Directory TAG DATA
0, 1 TAG(P) 0x42
3
Cache controller writes-through to home on SDN
2 Local copy is updated TAG(P) 0x1
4 Write is processed by home cache and sharer list is inspected
5
Inval sent to sharer on QDN
6 Data invalidated!
7
Inval ack sent over RDN
8
Tile 1 is removed from sharers and new data is written to cache!
0 TAG(P) 0x1
9
Write ack sent back to Tile 0 via RDN
Hot Interconnects August 2013 12 © 2013 Tilera Corporation.
iMesh at the Movies
RR Arb, XY Route
NP Arb
NP Arb
Hot Interconnects August 2013 13 © 2013 Tilera Corporation.
NP Arb
• Overview
• iMesh Architecture
• Performance Results and Summary
Agenda
Hot Interconnects August 2013 14 © 2013 Tilera Corporation.
0
100
200
300
400
500
600
700
0 20 40 60 80
Fps
# cores
H.264 Video Encode2
© 2013 Tilera Corporation. Hot Interconnects August 2013 15
Performance Results
0
10
20
30
40
50
60
70
80
90
0 5 10 15 20 25 30
Gb
ps
# cores
TCP Throughput1
65x performance improvement
30x performance improvement
1 512 connections, 1500B packets, “echo” server, full-duplex performance 2 “Toys_and_calendar” video sequence, 3Mbps, baseline profile 3 677 rules, typical traffic
Application Performance
TCP Throughput 80Gbps, 30 cores
h.264 1080p encode 22 channels, 72 cores
Suricata IDS, policy based rule set3 13Gbps, 72 cores
Packet forwarding with netfilter 80Gbps, 40 cores
Summary
SOFTWARE COMPATIBLE
Gx-36 Gx-16
Gx-9
pin compatible, 1265-Ball BGA, 37.5 x 37.5
Gx-72
1847-Ball BGA, 45 x 45
In production
General Sampling
Hot Interconnects August 2013 16 © 2013 Tilera Corporation.
The iMesh interconnect and cache coherence protocol enables Tilera’s Gx family of Manycore processors to efficiently scale performance from the TILE-Gx9 to the TILE-Gx72