
Memcon 2015

Serial Memories Fill a Need

Agenda

v  Michael Sporer – Director of Marketing

§  The future of parallel versus serial interface for memory

v  Mark Baumann – Director of Applications Engineering

§  Based on MoSys's experience developing and introducing the GigaChip Interface and the 1st, 2nd and 3rd generations of Bandwidth Engine ICs, we will describe several options for future memory interface solutions.

Copyright ©MoSys, Inc. 2015. All rights reserved. 2 MemCon 2015 - October 12th

Discrete DRAM doesn’t do Serial… yet

v  Memory is the last holdout that still hasn’t gone serial


Challenges of Implementing DDR

Source: Agilent

DRAM bus trace length matching requirements

Design, Development & Qualification

Tradeoffs: Serial vs. Parallel

v  On the Chip
  §  SerDes adds costs on chip
    •  MUX deMUX
    •  2.5GHz chip with 25 Gbps IO
v  IO Bandwidth / Chip Area
  §  Roughly the same on chip
  §  Depends on the range
v  IO Bandwidth / Power
  §  It depends on reach
v  On the Board
  §  Fewer lanes
    •  25GHz is more challenging, but is solvable
  §  Longer reach than parallel
    •  Easier board floor planning
    •  Distributed thermal loads
  §  Greater noise immunity

v  Is it a balanced tradeoff?


HMC gives them the bandwidth they need

v  “DDR has run out of pins on the package”

Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014

TSV Based DRAM Stacks

v  The performance potential of TSV based DRAM stacks can be realized with two very different interface and packaging solutions.

v  High Bandwidth Memory (HBM)
  §  Evolutionary
  §  Wide, parallel interface
v  Hybrid Memory Cube (HMC)
  §  High performance serial interface

v  Both solutions have their place in new systems design and there are advancements in both options on the horizon.


and HBM is coming …

v  Just look at what AMD and NVIDIA have planned


HBM Gen1 shipping now

HBM Gen2 coming soon

Interposer based MCM

v  Xilinx highlighted that the technology wasn’t the critical element; the supply chain was.

Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014

Economics of Direct Attach HBM

v  @Customer: Can the customer afford Direct Attach HBM?
  §  Interposer development costs
  §  Fixed memory footprint
  §  Special supply chain
  §  What is the volume required to recoup incremental costs?

v  @Manufacturer: Can DA-HBM exist in a low volume, high mix manufacturing environment?


Serial HBM: High Performance, Low Pin count

Serial HBM Solution

v  Serial HBM Reduces Risk at the Customer
  §  Lower Technology Risk
    •  Pin count advantage for host device
    •  Ease of routing a serial interface
    •  Standard CEI interface
    •  Scalable and versatile
  §  Component type Supply Chain
    •  Inventories
    •  Test and Burn-In
  §  Cost Advantages
    •  Standard board assembly
v  Serial HBM Markets
  §  Networking
    •  Packet buffering and high capacity tables
  §  Embedded
    •  Supports a range of capacity and speeds with long product lifecycles
    •  Protects customers from changing HBM memory interface on host
v  All the bandwidth but none of the headaches of DA-HBM


Serial Interface HBM

shim GCI


Flexible Capacity Expansion : Serial

v  One host port of 16 lanes can connect to 1, 2 or 4 devices

v  No additional bus loading or pin count

v  No throughput degradation

§  Expansion example shows MoSys Bandwidth Engine

[Figure: one 16-lane host port in three configurations: 1x (one device, 16 lanes), 2x (two devices, 8 lanes each), 4x (four devices, 4 lanes each)]
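The expansion options on this slide amount to dividing one host port's lanes evenly among the attached devices. A minimal Python sketch of that arithmetic follows; the function name `lanes_per_device` is hypothetical and only illustrates the 16 / 8 / 4 split shown on the slide.

```python
def lanes_per_device(host_lanes: int, devices: int) -> int:
    """Return the lane count each device gets when a host port is split evenly."""
    if devices <= 0 or host_lanes % devices != 0:
        raise ValueError("host lanes must divide evenly among the devices")
    return host_lanes // devices


# The slide's three configurations for a 16-lane host port.
for n in (1, 2, 4):
    print(f"{n}x: {lanes_per_device(16, n)} lanes per device")
```

The host's total lane count stays at 16 in every case, which is the slide's point about no added pin count.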

HBM MCM Yield Analysis

HBM Memory Solutions

v  Direct Attach HBM – 4 HBM
  §  MCM Yield
  §  Single Sourced
  §  Interface support longevity
  §  Memory controller complexity and power added to ASIC
v  Serial HBM Package on Package
  §  Tested and optional burn in of component HBM before MCM assembly
  §  Shim features optimized for application
  §  Incremental power for additional shim ASIC
  §  USR SerDes for MCM
v  Serial HBM On Motherboard:
  §  VSR SerDes for Motherboard
  §  Lowest cost, highest yield solution
  §  30% board area increase
  §  Easiest thermal solution

[Figure: package diagrams for the three options. Labels: ASIC with four HBM stacks (55 um); ASIC with HBM + shim devices (180 um), shown for each of the two Serial HBM options]

Serial vs. Direct Attach Value Comparison

Attribute | Serial HBM | Direct Attach HBM
Technical Risk | + + Smaller interposer; discrete component BI & test | - - MCM yield; HBM repair
Cost | + + Lower yielded cost; supply chain inventory | - - MCM development cost; MCM yield
Power | - Incremental power / BW | + Lower power
Thermal | + Distributed sources | - Higher thermal density
Time to Market | + + Proven standard SerDes; discrete component design | - - HBM interface IP availability; MCM complexity
Flexibility | + + + On or off substrate; memory expansion; fungible SerDes | - - Depopulate or not; single purpose HBM IO block
Reliability | + + Burn-in option; field repair managed in Serial HBM | - JEDEC field repair in host ASIC
Supply Chain Ownership | + + + Single point; discrete component; multi-sourced | - - - Multiple or single points; MCM model; single sourced
Board Area | - 0% to 30% larger | + Baseline

Normalized Yielded Cost of HBM


Assembly yield expected to be 95%
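To make the effect of assembly yield concrete, here is a minimal Python sketch of a yielded-cost calculation. It assumes the simplest possible model (every failed assembly is scrapped whole), which the slide does not state; the function name and the normalized cost of 1.0 are illustrative assumptions, and only the 95% figure comes from the slide.

```python
def yielded_cost(unit_cost: float, assembly_yield: float) -> float:
    """Cost per good unit when only a fraction `assembly_yield` of assemblies pass."""
    if not 0.0 < assembly_yield <= 1.0:
        raise ValueError("yield must be in (0, 1]")
    return unit_cost / assembly_yield


# Hypothetical normalized cost of 1.0 for the assembled parts, 95% assembly yield.
print(round(yielded_cost(1.0, 0.95), 3))  # 1.053
```

Under this model, the more expensive the parts committed before assembly (for example an ASIC plus several HBM stacks), the more each point of yield loss costs.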

HMC – Hybrid Memory Cube

v  Breakthrough in power due to TSV based construction
  §  5 pJ/b DRAM only
v  Combined with logic die resulting in 24.5W per 1Tbps
  §  3 links @ 12.5G
  §  24.5 pJ/b total (vs. 39 for DDR4)
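The "24.5W per 1Tbps" figure follows directly from the energy per bit: 24.5 pJ/b × 10^12 b/s = 24.5 W. A small Python sketch of that unit conversion, with a hypothetical function name:

```python
def power_watts(energy_pj_per_bit: float, bandwidth_tbps: float) -> float:
    """Power in watts for a given energy per bit and bandwidth.

    1 pJ = 1e-12 J and 1 Tbps = 1e12 bit/s, so the exponents cancel and
    pJ/bit multiplied by Tbps gives watts directly.
    """
    return energy_pj_per_bit * bandwidth_tbps


print(power_watts(24.5, 1.0))  # HMC total from the slide: 24.5 W at 1 Tbps
print(power_watts(39.0, 1.0))  # DDR4 figure from the slide: 39.0 W at 1 Tbps
print(power_watts(5.0, 1.0))   # DRAM-only portion: 5.0 W at 1 Tbps
```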


Serial vs. Parallel Memory Comparison

Attribute | Bandwidth Engine BE-2 | Bandwidth Engine BE-3 | Hybrid Memory Cube (HMC) | High Bandwidth Memory (JEDEC) | DDR4 (JEDEC)
Physical Interface | Serial CEI Standard | Serial CEI Standard | Serial CEI Std | JEDEC HBM IO | JEDEC DDR4 IO
Protocol | GigaChip™ Interface | GigaChip™ Interface | HMC Consortium | RAS/CAS | RAS/CAS
Source of Supply | Dual-Sourced | Dual-Sourced | Single Sourced | Multi-Sourced | Multi-Sourced
Access | TDM Scheduler | TDM Scheduler | Sched./Switch | Banked RAM | Banked RAM
Capacity | 576 Mb | 1152 Mb | 16~32 Gb | 32-64 Gb | 4-8 Gb
Buffer Bandwidth | 400 Gbps | 800 Gbps | 1280 Gbps | 2048 Gbps | 38 Gbps
Transaction Rate | >4.5 Bt/s | >10 Bt/s | 2.6~2.9 Bt/s | TBD | 0.2 Bt/s
Signal Pins | 66 | 66 | 272 | ~1600 | 42
Package | BGA 19x19 | BGA 25x25 | BGA 31x31 | KGSD | BGA 8x12
Power | 7-11W | TBA | ~28W | 8W estimated | 0.7W

[Figure: access-path block diagrams. Recoverable labels: "DDR4 ~ 16+20"; Switch with Serial IO (16, 16, 16, 16); TDM / Scheduler with Serial IO (8, 8); Channel 0, Channel 1; HBM – 8 channels & 128 banks, ~1600 pins, Si Interposer]

Future TSV DRAM Comparison

Attribute | Direct Attach HBM | Serial HBM concept | HMC
Bandwidth | equal | equal | equal
Interposer / Yield cost | CPU | Memory | Memory
Power | 1x | <2x | >3x
Latency | Lowest | Low | ?
Deterministic | Yes | Yes | No
Longevity of Interface | 5 years | indefinitely | indefinitely
Field Repair | Host based | Serial HBM based | HMC based
Host IO (PHY & pins) | Single Purpose | General Purpose and LP SerDes | General Purpose and LP SerDes
Test or Burn-In | Not possible | Possible | Possible
Supply Chain | MCM-type | Component | Component
Application Performance | none | Optimized for application | Generic HMC Specification
Source | Multi-sourced | Multi-sourced | Single Source

What to build with? It depends…

The Ultimate Network Processor’s Memory Implementation

v  At MemCon 2014, MoSys presented on extreme memories for networking and showed the relative position and value of different memories for a 1.2 Tbps network processor.
v  HBM for buffering
v  Serial memories for header processing and search

v  Off chip PHY to optimize datapath

v  This is a great point solution for 1.2 Tbps datapath

v  What about less extreme systems?


Example 400G Line Card w/ EZchip NPS Z30 Adds 50% System Memory Bandwidth

[Figure: line card block diagram between Front Panel and Backplane. Labels: MoSys Framer/Gear Box; Packet Forwarding Engine with 16 uP cores, Hardware Accelerators and Embedded Memory (Flexibility + Performance: "C" Programmable Processors + L2-L7 Accelerators); Packet Buffer of 24 x DDR4 devices; Memory I/O (memory bandwidth for packet buffering, cores and HW accelerators); MoSys MSRZ30 on 8-16 serial lanes (Intelligent Offload: flexible feature & performance expansion); FIC]

800GE Using Serial HBM & BE3

[Figure: two 400G PFEs (ASIC/FPGA), each with 4 x 100G links through a LineSpeed Gearbox, Retimer (GB/RT) to an Optics Module; a Bandwidth Engine Gen 3 and a Serial HBM (shim + GCI) attach to the PFEs. Shared: FIB tables, statistics, metering, semaphores, packet buffers]

Conclusion

v  Serial memory offers advantages over Direct Attach HBM
  §  Economics driven by supply chain
  §  Flexible and adaptable
  §  Scalable performance
  §  Quality and reliability
  §  Simplifying board design and cooling
v  Pick your memory for your application
  §  Memory core performance and capacity (DRAM vs. others)
  §  Architecture (point to point versus chainable)
  §  IO: serial vs. parallel
v  DDR DRAM is the de facto standard based on decades of evolution and optimization.
  §  If DDR doesn’t meet your needs, there are other options available.


Mark Baumann, Director of Applications Engineering

Bandwidth Engine Serial Interface (GCI)

Topics

v  Parallel interface evolution – faster, wider → how long can this last?
v  Serial interface evolution – NRZ → PAM4 → emerging
v  Interface efficiency – HMC vs. GCI vs. ILA
v  Standards based solutions vs. proprietary
v  Interface for offload (abstracted)
  §  Serial is better (variable size transfers)
  §  Splitting transaction layer from transport layer
v  Purpose built vs. fungible IO


NPU Interface Options Today

[Figure: NPU with two interface styles. Network & Backplane Interfaces (Serial Style, SerDes): XAUI, 10G KR, Interlaken, PCIe. Memory & CoProcessor (DDR Style, SSTL/HSTL, with SerDes on some ports): DDR-3 SDRAM, RLDRAM, QDR SRAM, KBP/TCAM]

NPU Interfaces Using Serial

[Figure: NPU with Serial Style on both sides. Network & Backplane Interfaces (SerDes): XAUI, 10G KR, Interlaken, PCIe. Memory & CoProcessor (SerDes): BE and "Serial SRAM?" over GCI, KBP/TCAM over Interlaken, DDR-3 SDRAM behind a DDR-3 Bridge (SSTL/HSTL). Annotations: enabled by 10G KR / GCI enabled SerDes; 3x to 4x bandwidth density per mm2]

NPU Interfaces Using Serial

[Figure: same NPU diagram with the DDR-3 Bridge and DDR-3 SDRAM replaced by "HMC or Ser. HBM". Network & Backplane Interfaces (SerDes): XAUI, 10G KR, Interlaken, PCIe. Memory & CoProcessor (SerDes): BE and "Serial SRAM?" over GCI, KBP/TCAM over Interlaken. Annotations: enabled by 10G KR / GCI enabled SerDes; 3x to 4x bandwidth density per mm2]

Parallel vs Serial


GigaChip Interface Layers & Frame Format

GigaChip Interface Protocol layers (between the host and BE, QDR, TCAM… devices, over a PC board trace):

Transaction: application specific
Data Link: reliable transport of frames via CRC & positive ack
Physical Coding Sublayer (PCS): link initialization, lane deskew, scrambling
Physical Media Access: electrical, CEI compatible SerDes

Data Link Layer Frame Format

Payload (72b) | DLL (1b) | Rx Ack (1b) | CRC (6b)

v  Frame striped across SerDes lanes (1, 2, 4, 8, 16)
  §  Modulo 10 UI, fixed size
  §  Sized to meet needs of application
  §  >90% bandwidth efficiency at 80b
v  Data Link Layer operations
  §  DLL indicates if payload is Transaction Link Layer operation or Data Payload
  §  Data Link Layer operations: Replay, Pause (no-op)
v  Data Payload format up to application
  §  Op codes, address, data… formatting left to higher level
  §  For memory transactions: 1 frame = transaction
  §  For packets: variable number of frames can be used
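The field widths on this slide (72b payload, 1b DLL, 1b Rx Ack, 6b CRC) add up to an 80-bit frame, so the payload takes 72/80 = 90% of the frame. The Python sketch below packs and unpacks such a frame to show the layout. It is illustrative only: the bit ordering, the function names, and the CRC-6 polynomial (x^6 + x + 1) are assumptions made for this example and are not taken from the GCI specification.

```python
PAYLOAD_BITS, DLL_BITS, ACK_BITS, CRC_BITS = 72, 1, 1, 6
FRAME_BITS = PAYLOAD_BITS + DLL_BITS + ACK_BITS + CRC_BITS  # 80 bits
HEADER_BITS = FRAME_BITS - CRC_BITS                         # 74 bits covered by the CRC


def crc6(value: int, nbits: int, poly: int = 0x03) -> int:
    """MSB-first bitwise CRC-6 over `nbits` of `value`.

    The polynomial x^6 + x + 1 (low bits 0x03) is an illustrative choice,
    NOT the polynomial defined by GCI.
    """
    crc = 0
    for i in reversed(range(nbits)):
        bit = (value >> i) & 1
        top = (crc >> (CRC_BITS - 1)) & 1
        crc = (crc << 1) & 0x3F
        if top ^ bit:
            crc ^= poly
    return crc


def pack_frame(payload: int, dll: int, rx_ack: int) -> int:
    """Pack the fields into one 80-bit integer: payload | DLL | Rx Ack | CRC."""
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload must fit in 72 bits")
    header = (payload << 2) | ((dll & 1) << 1) | (rx_ack & 1)
    return (header << CRC_BITS) | crc6(header, HEADER_BITS)


def unpack_frame(frame: int):
    """Return (payload, dll, rx_ack, crc_ok) for an 80-bit frame."""
    header, crc = frame >> CRC_BITS, frame & 0x3F
    crc_ok = crc6(header, HEADER_BITS) == crc
    return header >> 2, (header >> 1) & 1, header & 1, crc_ok


frame = pack_frame(0xABCDEF, dll=0, rx_ack=1)
print(unpack_frame(frame))          # fields recovered, CRC matches
print(PAYLOAD_BITS / FRAME_BITS)    # payload share of the 80-bit frame
```

A receiver that sees `crc_ok` come back false would discard the frame, which is what drives the replay mechanism in the data link layer.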


CRC Error Handling w/Positive Ack

[Figure: Device A (CSI Tx) and Device B (CSI Rx) connected through Tx/Rx SerDes (PISO/SIPO). Device A: Tx Request Transactor Queue, CRC Gen, Tx Replay Queue, Prev Rx Ack Count. Device B: CRC Error Check, Rx Ack Counter, Rx Target Transactor Queue. 72-bit payloads travel as 72 + 6 bit frames with a 1-bit Ack.]

Receiver: post if CRC OK, freeze if not OK, resume posting on Replay frame. Freeze Ack if CRC error, resume on Replay frame.

Transmitter: compare Ack count with the previous Rx Ack count, set Tx replay if “stuck”.
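The freeze-and-replay behavior in this diagram can be sketched as a small simulation. The Python model below is a deliberate simplification and not the GCI implementation: it sends frames in lock-step bursts rather than as a continuous stream, treats the first frame of each burst as the Replay frame, and models corruption as a set of transmission indices. The name `simulate_replay` and the `window` parameter are hypothetical.

```python
def simulate_replay(payloads, corrupt_attempts, window=4):
    """Deliver payloads in order over a link that corrupts some transmissions.

    corrupt_attempts: set of 0-based transmission indices whose CRC check fails.
    Returns (delivered_payloads, total_transmissions).
    """
    delivered = []   # payloads posted by the receiver, in order
    acked = 0        # receiver's count of good frames (the positive ack)
    attempt = 0      # every frame put on the wire, including replays
    while acked < len(payloads):
        # Each burst restarts at the first unacknowledged frame. If the ack
        # count was "stuck", this is a replay, and it unfreezes the receiver.
        frozen = False
        for payload in payloads[acked:acked + window]:
            crc_error = attempt in corrupt_attempts
            attempt += 1
            if frozen:
                continue        # receiver discards frames until a replay
            if crc_error:
                frozen = True   # receiver freezes its ack count
                continue
            delivered.append(payload)
            acked += 1
    return delivered, attempt


# Corrupt the third transmission: frames 2 and 3 are replayed.
print(simulate_replay(list(range(6)), corrupt_attempts={2}))
```

In this run the receiver posts frames 0 and 1, freezes on the corrupted frame 2, discards frame 3, and the sender then replays from frame 2, so six payloads take eight transmissions and still arrive in order.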


Multi Core => Multi-Partition & Multi-bank

[Figure: packet processor cores 0, 1, … n-1, n, each connected by a Serial Link to the Bandwidth Engine (Multi-cycle Scheduler, 10 GA, 800 Gb/s, ALU, BIST Self-repair), with ingress and egress paths]

Multi-bank, multi-partitions allow for high access availability
Multi-threaded multi-cores allow for high processing throughput
Multi-linked: allows for concurrent transport operations
ALU for functional acceleration: local processing minimizes intra-chip traffic
BIST self-repair: allows extended carrier class & in-package repair

Protocol Transfer Efficiency Comparison: Range of Payload Sizes and Applications

[Chart: Read-Only Data Efficiency (0% to 100%) vs. Payload Size (0 to 40 B) for BE, ILA and HMC]

Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)
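The formula above translates directly into code. In the Python sketch below, the function name and the example bit counts are hypothetical and chosen only to exercise the formula; they are not the actual overheads of GCI, Interlaken or HMC.

```python
def transfer_efficiency(data_bits: int, cmd_bits: int,
                        addr_bits: int, transport_bits: int) -> float:
    """Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)."""
    total = cmd_bits + addr_bits + data_bits + transport_bits
    if total <= 0:
        raise ValueError("total transferred bits must be positive")
    return data_bits / total


# Hypothetical numbers: the same fixed overhead weighs far more on a small payload.
small = transfer_efficiency(data_bits=64, cmd_bits=4, addr_bits=4, transport_bits=8)
large = transfer_efficiency(data_bits=1024, cmd_bits=4, addr_bits=4, transport_bits=8)
print(f"{small:.1%} for a 64-bit payload, {large:.1%} for a 1024-bit payload")
```

This is why protocol overhead dominates for header-processing traffic (small payloads) and matters much less for packet buffering (large payloads).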

[Chart: Read/Write Data Transfer Efficiency (0% to 100%) vs. Payload Size (0 to 180 B) for BE 50:50 and HMC 50:50, with HMC block sizes of 32B, 64B and 128B marked. Small payloads are labeled Packet Header Processing Application; large payloads are labeled Packet Buffering Applications]

Efficiency includes Transaction & Transport protocol.

Note GCI: GCI + TL 2.0

Protocol Transport Efficiency Comparison: GCI Optimized For Smaller Transfers

[Chart: transport efficiency (0% to 100%) vs. Frame Size (0 to 80 Bytes) for ILA, Interlaken, GCI 2.0 and GCI + TL 2.0. Annotations: Header Processing region, GCI ~ 2x Interlaken; Packet Transfers region, GCI ≈ Interlaken]

Serial Link Rate Road Map

v  Xilinx UltraScale+ 2016 33G GTY SerDes

v  BE3 2016 Q1 31G SerDes

v  56G PAM4 is being demonstrated now


CEI-56G Will Address Chip to Chip, Module, +


Summary

v  GCI is a proven chip to chip reliable transport protocol
  §  Multiple designs in FPGA, ASIC and ASSP in production systems
v  GCI Specification is freely available without restriction on use
  §  Same as Interlaken model

v  GCI protocol is designed to evolve as the CEI standard evolves

v  The inherent performance efficiency of GCI naturally equates to improved energy efficiency


Thank You


CMOS Memory Core Technologies

[Figure: memory core technologies arranged by #BitCells per SenseAmp, with Transaction Rate, Power, mm2/bit and Cost as the tradeoffs. Labels: Logic Fab; DRAM Fab (limited metal); SRAM, TCAM, eDRAM, LL/RL DRAM, DDR, Mobile DRAM, HMC, HBM]