
Memory in SystemVerilog

Prof. Stephen A. Edwards

Columbia University

Spring 2015

Implementing Memory

Memory = Storage Element Array + Addressing

Bits are expensive: they should be dumb, cheap, small, and tightly packed

Bits are numerous: can’t just connect a long wire to each one

Williams Tube

CRT-based random access memory, 1946. Used on the Manchester Mark I. 2048 bits.

Mercury acoustic delay line

Used in the EDSAC, 1947.

32 × 17 bits

Selectron Tube

RCA, 1948.

2 × 128 bits

Four-dimensional addressing

A four-input AND gate at each bit for selection

Magnetic Core

IBM, 1952.

Magnetic Drum Memory

1950s & 60s. Secondary storage.

Modern Memory Choices

Family     Programmed       Persistence
Mask ROM   at fabrication   ∞
PROM       once             ∞
EPROM      1000s, UV        10 years
FLASH      1000s, block     10 years
EEPROM     1000s, byte      10 years
NVRAM      ∞                5 years
SRAM       ∞                while powered
DRAM       ∞                64 ms

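Implementing ROMs

Add.  Data
00    011
01    110
10    100
11    010

[Figure: a 2-to-4 decoder on address bits A1 and A0 drives wordlines 0 to 3; each wordline selects one stored row (011, 110, 100, 010); bitlines 0 to 2 feed outputs D0 to D2. Inset: a tri-state buffer outputs Z (“not connected”) when its enable is 0 and passes its input (0 or 1) when its enable is 1.]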

Implementing ROMs

[Same figure, annotated for a read of address A1 A0 = 10: the decoder outputs are 0, 0, 1, 0, so only wordline 2 is selected, and the data outputs read 100.]

Implementing ROMs

[Figure, shown twice, the second time annotated with example signal values: the same ROM redrawn, with the 2-to-4 decoder on A1 and A0 selecting rows 0 to 3 and data outputs D0, D1, D2.]
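The decoder-plus-rows structure above can be sketched in SystemVerilog. This is only an illustration of the Add./Data table, not code from the slides; the module name `rom4x3` and the signal names are assumptions.

```systemverilog
// Sketch of the 4-word, 3-bit ROM from the Add./Data table.
// Module and signal names are illustrative, not from the slides.
module rom4x3(input  logic [1:0] addr,   // {A1, A0}
              output logic [2:0] data);

   logic [3:0] wordline;  // one-hot output of the 2-to-4 decoder

   // 2-to-4 decoder: exactly one wordline is high for each address
   assign wordline = 4'b0001 << addr;

   // Each wordline selects one stored row
   always_comb
      unique case (wordline)
        4'b0001: data = 3'b011;  // address 00
        4'b0010: data = 3'b110;  // address 01
        4'b0100: data = 3'b100;  // address 10
        4'b1000: data = 3'b010;  // address 11
        default: data = 'x;      // not reached: the decoder output is one-hot
      endcase

endmodule
```

A synthesis tool would normally collapse the explicit decoder into a lookup; it is written out here only to mirror the figure.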

Mask ROM Die Photo

A Floating Gate MOSFET

Cross section of a NOR FLASH transistor. Kawai et al., ISSCC 2008 (Renesas)

Floating Gate n-channel MOSFET

[Figure, four states: cross section showing the channel between drain and source, with the floating gate and the control gate above it, separated by SiO2. Charge distributions are drawn for each state.]

Floating gate uncharged; Control gate at 0V: Off
Floating gate uncharged; Control gate positive: On
Floating gate negative; Control gate at 0V: Off
Floating gate negative; Control gate positive: Off

EPROMs and FLASH use Floating-Gate MOSFETs
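The four states listed for the floating-gate transistor reduce to a two-input truth table. A purely logical sketch follows; it ignores all analog behavior, and the module and signal names are assumptions made for illustration.

```systemverilog
// Logic-level abstraction of the four floating-gate states on the slides.
// Names are hypothetical; real devices are analog and depend on thresholds.
module fg_mosfet(input  logic control_gate_positive,   // 1: positive, 0: at 0V
                 input  logic floating_gate_negative,  // 1: charged negative
                 output logic conducts);                // 1: On, 0: Off

   // On only when the control gate is positive and the floating gate is uncharged
   assign conducts = control_gate_positive & ~floating_gate_negative;

endmodule
```

Read this way, the charge on the floating gate is the stored bit: with the control gate driven positive, the transistor conducts only if the floating gate is uncharged.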

Static Random-Access Memory Cell

[Figure: SRAM cell schematic with one word line and two bit lines.]

Layout of a 6T SRAM Cell

[Figure: cell layout]

Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010.

Intel’s 2102 SRAM, 1024 × 1 bit, 1972

2102 Block Diagram

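SRAM Timing

[Figure: 6264 8K × 8 SRAM with address inputs A12 to A0, data pins D7 to D0, and control inputs CS1, CS2, WE, OE.]

[Timing diagram: signals CS1, CS2, WE, OE, Addr, Data. Addr carries address 1 and then address 2; Data shows “write 1” and then “read 2”.]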

6264 SRAM Block Diagram

[Figure (CY6264-1): address inputs A1 to A8; a 256 x 32 x 8 array; input buffer; column decoder; power-down logic; data pins I/O0 to I/O7; control inputs CE1, CE2, WE, OE.]

Toshiba TC55V16256J 256K × 16

[Figure: 256K × 16 SRAM with address inputs A17 to A0, data pins D15 to D0, and control inputs UB, LB, WE, OE, CE.]

Dynamic RAM Cell

[Figure: cell schematic with a Row line and a Column line.]

Ancient (c. 1982) DRAM: 4164 64K × 1

[Figure: 4164 64K × 1 DRAM with address inputs A7 to A0, Din, Dout, and control inputs WE, CAS, RAS.]

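Basic DRAM read and write cycles

[Timing diagram: signals RAS, CAS, Addr, WE, Din, Dout. In each of the two cycles Addr carries the row address and then the column address (Row Col Row Col). Din is labeled “to write”; Dout is labeled “read”.]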

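Page Mode DRAM read cycle

[Timing diagram: signals RAS, CAS, Addr, WE, Din, Dout. One row address is followed by three column addresses (Row Col Col Col), and Dout shows three reads.]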
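The 4164 addresses 64K locations through only eight pins (A7 to A0), so a controller presents the address in two halves, first with RAS and then with CAS. A minimal sketch of that multiplexing follows. Which half of the address serves as the row is a design choice; using the upper byte is an assumption here, as are the module and signal names.

```systemverilog
// Hypothetical address multiplexer for a 64K x 1 DRAM with 8 address pins.
// Assumption: upper address byte is the row, lower byte is the column.
module dram_addr_mux(input  logic [15:0] addr,     // full 64K address
                     input  logic        sel_col,  // 0: row phase, 1: column phase
                     output logic [7:0]  a);       // drives pins A7..A0

   assign a = sel_col ? addr[7:0] : addr[15:8];

endmodule
```

With this split, consecutive accesses that share `addr[15:8]` fall in the same row, which is the case page mode exploits: the row is sent once and only the column changes.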

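Samsung 8M × 16 SDRAM

[Figure: 8M × 16 SDRAM with bank select inputs BA1, BA0; address inputs A11 to A0; data pins DQ15 to DQ0; mask inputs UDQM, LDQM; clock inputs CKE, CLK; and control inputs WE, CAS, RAS, CS.]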

[Block diagram: bank select; data input register; four arrays labeled 8M x 4 / 4M x 8 / 2M x 16; sense amp; output buffer; I/O control; column decoder; latency & burst length; programming register; address register; row buffer; refresh counter; row decoder; column buffer; timing register. Inputs: CLK, CKE, CS, RAS, CAS, WE, L(U)DQM, ADD. Data: DQi. Internal signals: LRAS, LCBR, LCKE, LWE, LDQM, LCAS, LWCBR.]

SDRAM: Control Signals

RAS  CAS  WE   Action
1    1    1    NOP
0    0    0    Load mode register
0    1    1    Active (select row)
1    0    1    Read (select column, start burst)
1    0    0    Write (select column, start burst)
1    1    0    Terminate Burst
0    1    0    Precharge (deselect row)
0    0    1    Auto Refresh

Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write
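In a controller, the command table maps naturally onto an enumerated type whose encoding is the three pin values. The sketch below assumes the bit order {RAS, CAS, WE} with the levels exactly as listed in the table; the type, constant, and module names are hypothetical.

```systemverilog
// SDRAM commands encoded as {RAS, CAS, WE}, using the levels from the table.
// All names are illustrative, not from the slides or a vendor library.
typedef enum logic [2:0] {
   CMD_LOAD_MODE    = 3'b000,  // Load mode register
   CMD_AUTO_REFRESH = 3'b001,  // Auto Refresh
   CMD_PRECHARGE    = 3'b010,  // Precharge (deselect row)
   CMD_ACTIVE       = 3'b011,  // Active (select row)
   CMD_WRITE        = 3'b100,  // Write (select column, start burst)
   CMD_READ         = 3'b101,  // Read (select column, start burst)
   CMD_BURST_TERM   = 3'b110,  // Terminate Burst
   CMD_NOP          = 3'b111   // NOP
} sdram_cmd_t;

// Drives the three command pins from a symbolic command
module sdram_cmd_pins(input  sdram_cmd_t cmd,
                      output logic       ras, cas, we);

   assign {ras, cas, we} = cmd;

endmodule
```

A controller state machine can then issue `CMD_ACTIVE`, `CMD_READ`, and so on by name. The chip select, address, and bank pins are outside this sketch.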

SDRAM: Timing with 2-word bursts

[Timing diagram: signals Clk, RAS, CAS, WE, Addr (Op, R, C, C), BA (B, B, B), DQ (W W, R R). Command sequence: Load, Active, Write, Read, Refresh.]

Using Memory in SystemVerilog

Basic Memory Model

[Figure: a Memory block with inputs Address, Data In, Write, and Clock, and output Data Out.]

[Timing diagram, shown three times with one operation highlighted each time:]
Address:   A0, A1, A1
Data In:   D1
Write:     asserted for the write to A1
Data Out:  D0, old D1, D1

Highlighted in turn: Read A0, Write A1, Read A1

Memory Is Fundamentally a Bottleneck

Plenty of bits, but

You can only see a small window each clock cycle

Using memory = scheduling memory accesses

Software hides this from you: sequential programs naturally schedule accesses

You must schedule memory accesses in a hardware design
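To make "scheduling memory accesses" concrete, here is a sketch of hardware that sums 16 bytes from a memory whose read data arrives one clock after the address. The module `sum16`, its ports, and the one-cycle read latency are assumptions chosen for illustration; the memory is assumed to have its write input held low throughout.

```systemverilog
// Hypothetical example: add up 16 bytes, one memory read per clock cycle.
// Assumes a memory that registers its output: data_out <= mem[address].
module sum16(input  logic        clk,
             input  logic        start,     // pulse high for one cycle to begin
             input  logic [7:0]  data_out,  // read data coming from the memory
             output logic [3:0]  address,   // read address going to the memory
             output logic [11:0] sum,       // 16 * 255 fits in 12 bits
             output logic        done);

   logic       busy;
   logic [4:0] req;   // address being presented this cycle; 16 = no more requests

   assign address = req[3:0];

   always_ff @(posedge clk) begin
      if (start) begin
         req  <= 5'd0;
         sum  <= 12'd0;
         busy <= 1'b1;
         done <= 1'b0;
      end else if (busy) begin
         if (req != 5'd0)
            sum <= sum + data_out;  // data_out holds the word at address req - 1
         if (req == 5'd16) begin
            busy <= 1'b0;           // last word has just been added
            done <= 1'b1;
         end else
            req <= req + 5'd1;      // request the next word
      end
   end

endmodule
```

The sum needs 17 clock edges after `start`, not one: each word has to pass through the single read port in its own cycle, and the result lags the request by a cycle.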

Modeling Synchronous Memory in SystemVerilog

module memory(input  logic       clk,
              input  logic       write,     // Write enable
              input  logic [3:0] address,   // 4-bit address
              input  logic [7:0] data_in,   // 8-bit input bus
              output logic [7:0] data_out); // 8-bit output bus

   logic [7:0] mem [15:0];  // The memory array: 16 8-bit bytes

   always_ff @(posedge clk) begin  // Clocked
      if (write)
         mem[address] <= data_in;  // Write to array when asked
      data_out <= mem[address];    // Always read (old) value from array
   end

endmodule
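A small self-checking testbench can confirm the "always read (old) value" behavior in simulation. This is a sketch: the testbench name, the `cycle` task, and the data values are made up, and the memory model is repeated only so the example compiles on its own.

```systemverilog
// Copy of the slide's memory model, repeated so this sketch compiles on its own
module memory(input  logic       clk,
              input  logic       write,
              input  logic [3:0] address,
              input  logic [7:0] data_in,
              output logic [7:0] data_out);
   logic [7:0] mem [15:0];
   always_ff @(posedge clk) begin
      if (write) mem[address] <= data_in;
      data_out <= mem[address];
   end
endmodule

// Hypothetical testbench; names and values are illustrative
module memory_tb;
   logic       clk = 0, write = 0;
   logic [3:0] address = 0;
   logic [7:0] data_in = 0, data_out;

   memory dut(.*);

   always #5 clk = ~clk;

   // Apply one operation, wait for the clock edge, then let data_out settle
   task automatic cycle(input logic w, input logic [3:0] a, input logic [7:0] d);
      write = w; address = a; data_in = d;
      @(posedge clk);
      #1;
   endtask

   initial begin
      cycle(1, 4'd1, 8'hAA);       // write AA to address 1
      cycle(1, 4'd1, 8'hBB);       // overwrite address 1 with BB
      assert (data_out == 8'hAA);  // the read during the write saw the old value
      cycle(0, 4'd1, 8'h00);       // plain read of address 1
      assert (data_out == 8'hBB);  // the new value appears one cycle later
      $finish;
   end
endmodule
```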

M10K Blocks in the Cyclone V

10 kilobits (10240 bits) per block

Dual ported: two addresses, write enable signals

Data busses can be 1–20 bits wide

Our Cyclone V 5CSXFC6 has 557 of these blocks (557 × 10240 bits = 712,960 bytes, about 696 KB)

Memory in Quartus: the Megafunction Wizard

Memory: Single- or Dual-Ported

Memory: Select Port Widths

Memory: One or Two Clocks

Memory: Output Ports Need Not Be Registered

Memory: Wizard-Generated Verilog Module

This generates the following SystemVerilog module:

module memory ( // Port A:
   input  logic [12:0] address_a, // 8192 1-bit words
   input  logic        clock_a,
   input  logic [0:0]  data_a,
   input  logic        wren_a,    // Write enable
   output logic [0:0]  q_a,

   // Port B:
   input  logic [8:0]  address_b, // 512 16-bit words
   input  logic        clock_b,
   input  logic [15:0] data_b,
   input  logic        wren_b,    // Write enable
   output logic [15:0] q_b);

Instantiate like any module; Quartus treats it specially
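An instantiation might look like the following sketch. It assumes the wizard-generated `memory` module with the port list above is part of the project. The wrapper module `memory_user` and its signal names are hypothetical, and sharing one clock between the ports and using port B read-only are choices made only for this example.

```systemverilog
// Hypothetical wrapper around the wizard-generated `memory` module.
// Port A: 8192 x 1-bit read/write; port B: 512 x 16-bit, read-only here.
module memory_user(input  logic        clk,
                   input  logic [12:0] bit_addr,
                   input  logic        bit_in,
                   input  logic        bit_we,
                   output logic        bit_out,
                   input  logic [8:0]  word_addr,
                   output logic [15:0] word_out);

   memory m(.address_a(bit_addr),
            .clock_a  (clk),
            .data_a   (bit_in),
            .wren_a   (bit_we),
            .q_a      (bit_out),
            .address_b(word_addr),
            .clock_b  (clk),       // same clock on both ports in this sketch
            .data_b   (16'd0),     // unused: port B never writes
            .wren_b   (1'b0),
            .q_b      (word_out));

endmodule
```

Named port connections keep the instance readable and guard against reordering if the module is regenerated with different options.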

Two Ways to Ask for Memory

1. Use the Megafunction Wizard
   + Warns you in advance about resource usage
   − Awkward to change

2. Let Quartus infer memory from your code
   + Better integrated with your code
   − Easy to inadvertently ask for garbage

The Perils of Memory Inference

module twoport(input  logic        clk,
               input  logic [8:0]  aa, ab,
               input  logic [19:0] da, db,
               input  logic        wa, wb,
               output logic [19:0] qa, qb);

   logic [19:0] mem [511:0];

   always_ff @(posedge clk) begin
      if (wa) mem[aa] <= da;
      qa <= mem[aa];
      if (wb) mem[ab] <= db;
      qb <= mem[ab];
   end

endmodule

Failure: Exploded! Synthesized to an 854-page schematic with 10280 registers (no M10K blocks). Page 1 looked like this:

[Figure: page 1 of the synthesized schematic]

The Perils of Memory Inference

module twoport2(input  logic        clk,
                input  logic [8:0]  aa, ab,
                input  logic [19:0] da, db,
                input  logic        wa, wb,
                output logic [19:0] qa, qb);

   logic [19:0] mem [511:0];

   always_ff @(posedge clk) begin
      if (wa) mem[aa] <= da;
      qa <= mem[aa];
   end

   always_ff @(posedge clk) begin
      if (wb) mem[ab] <= db;
      qb <= mem[ab];
   end

endmodule

Failure

Still didn’t work:

RAM logic “mem” is uninferred due to unsupported read-during-write behavior

The Perils of Memory Inference

module twoport3(input  logic        clk,
                input  logic [8:0]  aa, ab,
                input  logic [19:0] da, db,
                input  logic        wa, wb,
                output logic [19:0] qa, qb);

   logic [19:0] mem [511:0];

   always_ff @(posedge clk) begin
      if (wa) begin
         mem[aa] <= da;
         qa <= da;
      end else qa <= mem[aa];
   end

   always_ff @(posedge clk) begin
      if (wb) begin
         mem[ab] <= db;
         qb <= db;
      end else qb <= mem[ab];
   end

endmodule

Finally!

Took this structure from a template: Edit→Insert Template→Verilog HDL→Full Designs→RAMs and ROMs→True Dual-Port RAM (single clock)

[RTL schematic: a single SYNC_RAM block “mem” with first-port pins WE, CLK0, DATAIN[19..0], WADDR[8..0], RADDR[8..0], DATAOUT[19..0] and second-port pins PORTBWE, PORTBCLK0, PORTBDATAIN[19..0], PORTBWADDR[8..0], PORTBRADDR[8..0], PORTBDATAOUT[0] to PORTBDATAOUT[19], plus registers qa[0]~reg[19..0] and qb[0]~reg[19..0]. Inputs: clk, da[19..0], db[19..0], aa[8..0], ab[8..0], wa, wb. Outputs: qa[19..0], qb[19..0].]

The Perils of Memory Inference

module twoport4(input  logic        clk,
                input  logic [8:0]  ra, wa,
                input  logic        write,
                input  logic [19:0] d,
                output logic [19:0] q);

   logic [19:0] mem [511:0];

   always_ff @(posedge clk) begin
      if (write) mem[wa] <= d;
      q <= mem[ra];
   end

endmodule

Also works: separate read and write addresses

[RTL schematic: a single SYNC_RAM block “mem” with pins WE, CLK0, DATAIN[19..0], WADDR[8..0], RADDR[8..0], DATAOUT[19..0], plus register q[0]~reg[19..0]. Inputs: clk, d[19..0], ra[8..0], wa[8..0], write. Output: q[19..0].]

Conclusion:

Inference is fine for single port or one read and one write port.

Use the Megafunction Wizard for anything else.
