
ECEG-3202:Computer Architecture and Organization, Dept of ECE, AAU

1

Memory System Design


2

Characteristics of a Memory System

• Location
  – Processor
  – Internal (Main)
  – External (Secondary)
• Capacity
  – Word Size
  – Number of Words
• Unit of Transfer
  – Word
  – Block


3


• Access Method
  – Sequential (Tape)
    • Start at the beginning and read through in order
    • Access time depends on location of data and previous location
  – Direct (Disk)
    • Individual blocks have unique address
    • Access is by jumping to vicinity plus sequential search
    • Access time depends on location of data and previous location
  – Random (RAM/ROM)
    • Individual addresses identify locations exactly
    • Access time is independent of location or previous access


4


• Access Method (Contd.)
  – Associative (Cache)
    • Based on content
    • Data is located by a comparison with contents of a portion of the store
    • Access time is independent of location or previous access


5


• Performance
  – Access Time
    • Time between presenting the address and getting the valid data
  – Cycle Time
    • Time may be required for the memory to “recover” before next access
    • Cycle time is access + recovery
  – Transfer Rate
    • Rate at which data can be moved
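The relation between these three figures can be sketched with a small calculation. The timings below (60 ns access, 40 ns recovery) are illustrative values, not taken from the slides, and the sketch assumes a random-access memory that delivers one word per cycle.

```python
def cycle_time_ns(access_ns, recovery_ns):
    # Cycle time is the access time plus the recovery time before the next access.
    return access_ns + recovery_ns

def transfer_rate_words_per_s(cycle_ns):
    # Assumption: one word is transferred per memory cycle.
    return 1e9 / cycle_ns

cycle = cycle_time_ns(60, 40)            # hypothetical timings
rate = transfer_rate_words_per_s(cycle)  # words per second
print(cycle, rate)
```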


6


• Physical Type
  – Semiconductor
    • RAM / ROM
  – Magnetic
    • Disk & Tape
  – Optical
    • CD & DVD (including rewritable CD-RW, which uses phase-change recording)
  – Magneto-Optical
    • MO disk


7


• Physical Characteristics
  – Charge Decay
  – Volatile / Non-Volatile
  – Erasable / Non-Erasable
  – Power Consumption
• Organization
  – Physical arrangement of bits into words
    • Not always obvious


8

Memory Hierarchy

• Memory design is governed by three questions:
  – How large?
  – How fast?
  – How expensive?
• Three rules:
  – Faster access time, greater cost per bit.
  – Greater capacity, smaller cost per bit.
  – Greater capacity, slower access time.

To solve this dilemma, designers use a hierarchy of memory systems.


9

Memory Hierarchy

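[Figure: memory hierarchy pyramid]
• Inboard Memory: Registers, Cache, Main Memory
• Outboard Storage: Magnetic Disk, CD-ROM, CD-RW, DVD, DVD-RW
• Off-Line Storage: Magnetic Tape, WORM
• Going down the hierarchy: cost per bit decreases, capacity increases, access time increases, frequency of access decreases.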


10

Locality of Reference

• The memory hierarchy presented works because of a natural phenomenon known as “locality of reference”.

• During the execution of a program, memory references for instructions and data tend to cluster.

• Keeping the current cluster in the faster memory level allows faster memory access.


11

Main Memory

• Relatively large and fast.

• Used to store programs and data during the computer operation.

• The principal technology is based on semiconductor ICs.

• Usually referred to as Random Access Memory (RAM).
  – The more accurate name would be Read / Write Memory (R / WM)


12

RAM

• Allows both read and write operations.
  – Both operations are performed electrically.
• Volatile.
  – Used for temporary storage only.
  – If the power is disconnected, the contents become invalid.
• Two main varieties.
  – Static.
  – Dynamic.


13

Dynamic RAM (DRAM)

• Usually used for Main Memory in most computer systems.
  – Inexpensive.
• Uses only one transistor per bit.
  – Data is stored as charge in capacitors.
  – Destructive read.
    • Charge on capacitor is drained during a read.
    • Data must be re-written after a read.


14

DRAM – (Contd.)

• Charge on a capacitor decays naturally.
  – Therefore, DRAM needs refreshing even when powered to maintain the data.
  – Refreshing is done by reading and re-writing each word every few milliseconds.
• Refresh Rate.
  – During “suspended” operation, notebook computers use power mainly for DRAM refresh.


15

Static RAM (SRAM)

• Consists of internal flip-flop-like structures that store the binary information.
  – No charges to leak.
    • No refreshing is needed.
  – Non-destructive read.
  – More complex construction.
    • Larger cell, less dense.
  – More expensive.
  – Faster.
• Usually used for Cache Memory.


16

SRAM vs. DRAM

• Storage cells in DRAM are simpler and smaller.
  + DRAM is more dense.
    • More bits per unit area.
  + DRAM is less expensive.
  + DRAM uses less power.
  – DRAM requires extra circuitry to implement refresh mechanism.
  – DRAM is slower.


17

SRAM Chip Organization


18

Bi-directional Data in and Out Pins


19

Read Only Memory (ROM)

• Read but cannot write.

• Non volatile.

• Used for:
  – Microprogramming.
  – System programs.
  – Whole programs in embedded systems.
  – Library subroutines and function tables.
  – Constants.
• Manufactured with the data wired into the chip.
  – No room for mistakes.


20

ROM Structure


21

Programmable ROM (PROM)

• Non volatile.

• Can be programmed - written into - only once.

• Programming is done electrically and can be done after manufacturing.

• Special equipment is needed for the programming process.
  – Uses fuses instead of diodes.
    • Fuses that need to be removed are “vaporized” during the programming process using a high voltage pulse (10 – 30 V).

• CAN NOT BE ERASED.


22

Erasable PROM (EPROM)

• Uses floating-gate MOS transistors with insulating material that changes behavior when exposed to ultraviolet light.
  – Programmed electrically and erased optically.
  – Erasing can be repeated a relatively large but limited number of times (~100,000 times).
  – Erasing time ~20 minutes.
• Electrically read and written.
  – Before writing, ALL cells must be erased by exposure to ultraviolet light.

• Non volatile.

• More expensive than PROM.


23

Electrically Erasable PROM (EEPROM)

• Uses the same floating-gate transistors, except that the insulating material is much thinner.
  – Its operation can be inverted using voltage.
• Can be written to any time without erasing the previous contents.
  – Only the bytes addressed are modified.
  – Write takes a relatively long time (~100 µsec/byte).
  – Can be erased only about 10,000 times.

• Non volatile.

• Updatable in place.

• More expensive and less dense than EPROM.


24

Flash Memory

• Called flash due to the speed of re-programming.

• Uses electrical erasure technology.
  – An entire chip can be erased in 1-2 sec.
• Possible to erase only blocks of data.
  – Does not provide byte level erasure.
• Uses one transistor per bit.
  – Very high density.

• Cost is between EPROM and EEPROM.

• Non Volatile.


25

Organization of a Memory Chip

• The basic element of a semiconductor memory is the memory cell.
  – There are different types, but they all share some common properties:
    • Two states, 1 and 0.
    • It is possible to write into the cell (at least once).
    • They can be read to sense the state.


26

Organization of a Memory Chip

• How to organize a 16-Mbit chip?
  – 1 Mega words of 16 bits each.
    • Tall and narrow organization.
  – Chips like to be square.
  – Typical organization is:
    • 2048 x 2048 x 4-bit array.
    • Organized internally as a square structure with decoders for row and column.
      – Simplifies decoding logic.
      – Reduces number of address pins.
        » Row and column address bits are multiplexed.
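The pin saving from multiplexing can be checked with a short calculation. This is a minimal sketch for the 2048 x 2048 x 4-bit organization above; the helper name `address_bits` is introduced here for illustration only.

```python
def address_bits(n):
    # Bits needed to select one of n items (n is a power of two here).
    return (n - 1).bit_length()

ROWS, COLS, BITS_PER_LOCATION = 2048, 2048, 4

capacity_bits = ROWS * COLS * BITS_PER_LOCATION  # total storage on the chip
row_bits = address_bits(ROWS)
col_bits = address_bits(COLS)

# Without multiplexing, every row and column bit needs its own pin.
pins_unmultiplexed = row_bits + col_bits
# With multiplexing, the row address and then the column address share the same pins.
pins_multiplexed = max(row_bits, col_bits)

print(capacity_bits, pins_unmultiplexed, pins_multiplexed)
```

The capacity works out to 16 Mbit, and multiplexing halves the address pins from 22 to 11.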


27

Organization of the Memory Chip


28

Memory Module Organization

• Most high capacity RAM chips contain only a single bit per location.
  – To build a multi-bit per location module, we will need multiple chips.
• Design a 256K Byte memory system using eight 256K X 1 chips.
  – 256K requires 18 address wires
    • We will apply 9 wires to the row selectors and 9 to the column selectors
  – The outputs of the chips are combined together to form the 8 bit output of the system.


29

Organization of the 256 K Byte System

• Each chip receives all 18 bits of the address.

• Each chip produces/receives a single bit of the data.


30

Memory Module Organization

• What if the size of the system is not the same as the chips?

• Design a 1 MByte system using 256K X 1 chips.
  – We will have to arrange the chips themselves into columns and rows.
    • There will be 4 columns of chips.
      – Number of columns = system’s address space / chip’s address space.
    • There will be 8 rows of chips.
      – Number of rows = system’s word size / chip’s word size.
  – Some of the address wires (the upper 2 of the 20) will have to be used for selecting which column of chips is active.
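The two formulas above can be turned into a small calculator. This is a sketch only; `module_layout` and its dictionary keys are names made up for this example.

```python
def module_layout(system_words, system_word_bits, chip_words, chip_word_bits):
    # Columns cover the address space, rows cover the word width.
    columns = system_words // chip_words
    rows = system_word_bits // chip_word_bits
    chip_address_bits = (chip_words - 1).bit_length()
    # Extra address bits needed to pick one column (group) of chips.
    select_bits = (columns - 1).bit_length()
    return {
        "columns": columns,
        "rows": rows,
        "chips": columns * rows,
        "chip_address_bits": chip_address_bits,
        "select_bits": select_bits,
    }

K = 1024
one_mbyte = module_layout(1024 * K, 8, 256 * K, 1)   # 1 MByte from 256K X 1 chips
small = module_layout(256 * K, 8, 256 * K, 1)        # the earlier 256K Byte system
print(one_mbyte)
print(small)
```

For the 1 MByte system this gives 32 chips, 18 address bits into every chip and 2 further bits for chip selection. The 256K Byte system needs no selection bits at all.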


31

Organization of the 1 M Byte System


32

Associative Memory

• Many applications require the search for the location of a particular item in a table in memory.

• Find the name of the student whose ID is 97xxxxx.

• The easiest way would be to search through all records sequentially to find the matching record.
  – Response varies tremendously based on size of table and location of item in the table.

• The solution is to find a way to check all entries at the same time and identify the matching one.


33

Associative Memory

• Associative memory consists of four main items:
  – The memory array.
  – The input argument register.
  – A mask register to select specific bits from the argument for matching (if needed).
  – A match register.

[Figure: associative memory block diagram showing the Argument register, Mask register, Memory Array and Match register]


34

Associative Memory

• The argument is masked using the contents of the mask register.

• The argument is then sent to the memory array for comparison.
  – Each entry in the memory array contains a comparator that compares the entry’s contents with the argument.
  – If they match, a bit in the match register is set.
• The match register contains a bit that corresponds to each location in the memory array.
  – Once the matching is done, the match register will contain an indication of which locations matched the argument.

• If the request was for all entries containing a certain field, the user gets back the contents of all memory locations containing that field.
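The masked comparison described above can be modelled in software as follows. This is only a behavioural sketch: real associative memory compares every entry in parallel in hardware, whereas the loop here visits entries one by one. The 16-bit sample words and the function name `associative_search` are assumptions for illustration.

```python
def associative_search(memory, argument, mask):
    # Only the bits selected by the mask take part in the comparison.
    key = argument & mask
    # One match bit per memory location, as in the match register.
    return [1 if (word & mask) == key else 0 for word in memory]

memory = [0x9701, 0x9802, 0x9703, 0x1234]   # hypothetical stored words
match = associative_search(memory, argument=0x9700, mask=0xFF00)
matched = [hex(word) for word, bit in zip(memory, match) if bit]
print(match, matched)
```

Here the mask selects the upper byte, so every entry whose upper byte is 0x97 sets its bit in the match register.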


35

Cache Memory

• Cache Memory is intended to give:
  – Memory speed approaching that of the fastest memories available.
  – Large memory size at the price of less expensive types of semiconductor memories.

• Small amount of fast memory.

• Sits between normal main memory and CPU.

• May be located on CPU chip or module.


36

Conceptual Operation

• Relatively large and slow main memory together with faster, smaller cache.

• Cache contains a copy of portions of main memory.

• When processor attempts to read a word from memory, a check is made to determine if the word exists in cache.
  – If it is, the word is delivered to the processor.
  – If not, a block of main memory is read into the cache, then the word is delivered to the processor.

[Figure: CPU, Cache Memory and Main Memory; word transfer between CPU and cache, block transfer between cache and main memory]


37

Hit Ratio

• A measure of the efficiency of the cache structure.
  – When the CPU refers to memory and the word is found in the cache, this is called a hit.
  – When the word is not found in cache, this is called a miss.
• Hit ratio is the total number of hits divided by the total number of access attempts (hits + misses).
  – It has been shown practically that hit ratios higher than 0.9 are possible.
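A short calculation shows why the hit ratio matters. The counts and timings below are made-up values, and the average access time formula assumes that a miss costs the cache check plus one main memory access; the slides do not give a timing model.

```python
def hit_ratio(hits, misses):
    # Hits divided by all access attempts.
    return hits / (hits + misses)

def average_access_time(h, cache_ns, memory_ns):
    # Assumption: a miss pays for the cache lookup and then the memory access.
    return h * cache_ns + (1 - h) * (cache_ns + memory_ns)

h = hit_ratio(hits=950, misses=50)                       # hypothetical counts
t = average_access_time(h, cache_ns=10, memory_ns=100)   # hypothetical timings
print(h, t)
```

With these numbers the average access time stays close to the cache speed even though main memory is ten times slower.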


38

Cache vs. Main Memory Structure

[Figure: main memory of 2^n words (addresses 0 to 2^n - 1, one word length wide) divided into blocks of K words; cache of C lines (0 to C-1), each holding a tag and a block of K words]


39

Main Memory and Cache Memory

• Main Memory consists of 2^n addressable words.
  – Each word has a unique n-bit address.
• We can consider that main memory is made up of blocks of K words each.
  – Usually, K is about 16
• Cache consists of C lines of K words each.
• A block of main memory is copied into a line of Cache.
  – The “tag” field of the line identifies which main memory block each cache line represents


40

Elements of Cache Design

• Size

• Mapping function

• Replacement algorithm

• Write policy

• Line size

• Number of caches


41

Mapping Function

• There are fewer cache lines than memory blocks.– How do we map a memory block to a cache line?

• Assume the following:
  – Cache can hold 64 Kbytes.
  – Data is transferred in blocks of 4 bytes.
    • Cache is 16K lines of 4 bytes each.
  – Main memory is 16 Mbytes.
    • Memory is 4M blocks of 4 bytes each.

• How do we map the 4M blocks into the 16K lines?


42

Direct Mapping

• Map each block of memory into only one possible cache line.
  – A block of main memory can only be brought into the same line of cache every time.

Cache Line    Main memory blocks assigned
0             0, C, 2C, 3C, …
1             1, C+1, 2C+1, 3C+1, …
…             …
C – 1         C-1, 2C-1, 3C-1, 4C-1, …
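The table follows from a single rule: cache line = block number modulo C. A minimal sketch with a tiny cache of C = 4 lines (chosen only to keep the output short) reproduces its rows.

```python
def cache_line_for_block(block, num_lines):
    # Direct mapping: each block has exactly one possible line.
    return block % num_lines

C = 4  # small illustrative cache, not the 16K-line cache used in these slides
table = {
    line: [b for b in range(4 * C) if cache_line_for_block(b, C) == line]
    for line in range(C)
}
for line, blocks in table.items():
    print(line, blocks)
```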


43

Direct Mapping

• A main memory address is considered to be made up of two pieces:
  – Block address
    • Upper bits of the address
  – Word address within a block
    • Lower bits of the address
• The block address section is further considered to be made of two items:
  – Cache line number
    • Lower bits
  – Tag
    • Upper bits


44

Direct Mapping Address Structure

• 16 Mbytes of memory.
  – 24 bits in address.
• 4 byte blocks.
  – 2 bits.
• 16 K lines in cache.
  – 14 bits.
• Rest is used to identify the block mapped to the line.

Field:   Tag   Line or Slot   Word
Bits:    8     14             2
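The 8/14/2 split can be applied to a concrete address with shifts and masks. This is a sketch; the sample address 0x16339C is an arbitrary 24-bit value chosen for illustration.

```python
WORD_BITS, LINE_BITS, TAG_BITS = 2, 14, 8   # field widths from the slide

def split_direct(address):
    # Lowest bits select the byte within the 4-byte block.
    word = address & ((1 << WORD_BITS) - 1)
    # Middle bits select the cache line.
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    # Remaining upper bits form the tag.
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), word)
```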


45

Reading From a Direct Mapped System

• The processor produces a 24 bit address.

• The cache uses the middle 14 bits to identify one of its 16 K lines.

• The upper 8 bits of the address are matched to the tag field of the cache entry.
  – If they match, then the lowest order two bits of the address are used to access the word in the cache line.
  – If not, address is used to fetch the block containing the specified word from main memory to the cache.
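The read procedure above can be simulated with a few lines. This sketch tracks only tags, not data, and the class name `DirectMappedCache` and the address trace are assumptions made for the example.

```python
class DirectMappedCache:
    def __init__(self, line_bits=14, word_bits=2):
        self.line_bits = line_bits
        self.word_bits = word_bits
        self.tags = {}      # line number -> tag of the block currently held
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line = (address >> self.word_bits) & ((1 << self.line_bits) - 1)
        tag = address >> (self.word_bits + self.line_bits)
        if self.tags.get(line) == tag:
            self.hits += 1          # tag matches: word comes from the cache
            return True
        self.tags[line] = tag       # miss: the block is fetched into this line
        self.misses += 1
        return False

cache = DirectMappedCache()
trace = [0x000000, 0x000001, 0x010000, 0x000000]   # hypothetical addresses
results = [cache.read(a) for a in trace]
print(results, cache.hits, cache.misses)
```

Addresses 0x000000 and 0x010000 share line 0 but carry different tags, so the third and fourth reads each evict the other block.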


46

Direct Mapping Cache Organization


47

Direct Mapping

• Advantages.
  – Simple.
  – Inexpensive to implement.
• Disadvantages.
  – There is a fixed location for each block in the cache.
    • If a program addresses words from two blocks mapped to the same line, the blocks have to be swapped in and out of cache repeatedly.


48

Associative Mapping

• To improve the hit ratio of the cache, another mapping technique is often utilized, “associative mapping”.
• A block of main memory may be mapped into ANY line of the cache.
  – A block of memory is no longer restricted to a single line of cache.


49

Associative Mapping

• A main memory address is considered to be made up of two pieces:
  – Tag
    • Upper bits of the address
  – Word address within a block
    • Lower 2 bits of the address


50

Associative Mapping Address Structure


• 4 byte blocks.
  – 2 bits.

Field:   Tag   Word
Bits:    22    2


51

Reading From an Associative Mapped System


• The upper 22 bits of the address are matched to the tag field of EACH cache entry.
  – This matching must be done simultaneously to each of the entries.
  – i.e. Associative memory.


52

Associative Mapping Cache Organization


53

Associative Mapping

• Advantages.
  – Improves hit ratio for certain situations.
• Disadvantages.
  – Requires very complicated matching hardware for matching the tag and the entries for each line.
    • Expensive.


54

Set Associative Mapping

• Set Associative Mapping helps reduce the complexity of the matching hardware for an associative mapped cache.

• Cache is divided into a number of sets.
  – Each set contains a number of lines.
• A 2-way set associative cache has 2 lines per set.
• A block of memory is restricted to a SPECIFIC set of lines.
  – A block of main memory may map to ANY line in the given set.


55


• A main memory address is considered to be made up of three pieces:
  – Tag.
    • Upper bits of the address.
  – Set number.
    • Middle bits of the address.
  – Word address within a block.
    • Lower 2 bits of the address.


56

Set Associative Mapping Address Structure


• 4 byte blocks.
  – Lowest order 2 bits.
• 8K sets in a 2-way associative cache.
  – 13 bits.

Field:   Tag   Set   Word
Bits:    9     13    2
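A minimal sketch of the 9/13/2 split, using the same shift-and-mask approach on a sample 24-bit address (0x16339C, chosen arbitrarily). The function name and its default parameters are introduced here for illustration.

```python
def split_set_associative(address, set_bits=13, word_bits=2):
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)
    return tag, set_index, word

tag, set_index, word = split_set_associative(0x16339C)
print(hex(tag), hex(set_index), word)

# With no set field at all, the split reduces to the fully associative
# case: a 22-bit tag followed by the 2-bit word field.
assoc_tag, _, assoc_word = split_set_associative(0x16339C, set_bits=0)
print(hex(assoc_tag), assoc_word)
```

Passing `set_bits=14` instead reproduces the direct-mapped fields, which shows that the three mapping schemes differ only in how many address bits are spent on choosing a location versus identifying a block.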


57

Reading From a Set Associative Mapped System


• The cache uses the middle 13 bits to identify one of its 8 K sets.

• The upper 9 bits of the address are matched to the tag field of the cache entries that make up the set.
  – The number of lines to match to is very limited.
  – Therefore, the matching hardware is much simpler.


58

Set Associative Mapping Cache Organization


59


• Advantages.
  – Combines advantages of direct and associative mapping techniques.
• Disadvantages.
  – Increasing the size of the set does not always improve the hit ratio.
    • 2-way set associative has a much higher hit ratio than direct mapping.
    • Increasing it to 4-way improves the hit ratio slightly more.
    • Beyond that no significant improvement has been seen.


60

Replacement Algorithms

• What happens if there is a “miss” and the cache is already full?
  – One of the items in the cache needs to be “replaced” with the new item.
  – Which one??
  – Depends on the mapping technique used.
• Direct mapping.
  – No choice.
  – Memory blocks map into certain cache lines.
    • The entry occupying that line must be swapped out.


61

Replacement Algorithms

• Associative & Set Associative:
  – Random.
  – First-in First-out (FIFO).
  – Least Recently Used (LRU).
  – Least Frequently Used (LFU).
• The last three require additional bits for each entry to keep track of order, time or number of times used.
  – Usually, these algorithms are implemented in hardware for speed.
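As one example of these policies, LRU for a single cache set can be sketched in software. Real caches keep a few usage bits per line in hardware; the `OrderedDict` here only stands in for that bookkeeping, and the class name `LRUSet` and the tag names are made up for the example.

```python
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tags ordered from least to most recently used

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)      # hit: mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # set full: evict least recently used
        self.lines[tag] = None
        return False

s = LRUSet(ways=2)
results = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(results, list(s.lines))
```

When C arrives, B is evicted rather than A because A was touched more recently. FIFO would have evicted A instead.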


62

Writing Into Cache

• Cache entries are supposed to be exact “copies” of what is in main memory.
  – What happens when the CPU wants to write into memory?
  – Which memory does it write to?
• Two techniques are possible.
  – Write-through.
  – Write-back.


63

Write-Through

• The simplest and most commonly used technique is to update both the cache and main memory at the same time.
• Advantage.
  – Memory and cache are always in sync.
• Disadvantage.
  – Memory write becomes slow.


64

Write-Back

• The update is done ONLY to the word in the cache and the block containing the word is marked.
• When the block is to be swapped out of cache, the word is written back to main memory.
• Advantage.
  – Reduces memory traffic because a word may be updated several times while in cache.
• Disadvantage.
  – Cache and memory will be out of sync for a while.
  – What about DMA??
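The difference in memory traffic between the two policies can be sketched by counting main memory writes. The "mark" on the block is modelled as a dirty flag; all names, the single cached word and the address 0x10 are assumptions for illustration.

```python
class CacheLine:
    def __init__(self, value=0):
        self.value = value
        self.dirty = False   # the mark set by a write-back update

def write(line, value, memory, address, policy):
    line.value = value
    if policy == "write-through":
        memory[address] = value   # memory is updated on every write
        return 1
    line.dirty = True             # write-back: only mark the line
    return 0

def evict(line, memory, address):
    if line.dirty:
        memory[address] = line.value   # write back once, on replacement
        line.dirty = False
        return 1
    return 0

def memory_writes(policy, values):
    memory = {0x10: 0}
    line = CacheLine()
    count = sum(write(line, v, memory, 0x10, policy) for v in values)
    count += evict(line, memory, 0x10)
    return count, memory[0x10]

print(memory_writes("write-through", [1, 2, 3]))
print(memory_writes("write-back", [1, 2, 3]))
```

Both policies leave the same final value in memory, but write-back touches main memory once instead of three times. Until the eviction, memory holds a stale value, which is the out-of-sync window that matters for DMA.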


65

Number of Caches

• When a cache miss occurs, the system suffers through a large delay while the block is read from main memory into the cache.
  – Two possible solutions.
    • Speed up the transfer of information.
      – The transfer rate is limited by issues that may not be under our control.
    • Speed up the source of the information.
      – Main memory is between 7X and 10X slower than cache.
      – We can insert an intermediate level of memory between cache and main memory.


66

Cache Levels

• In most of today’s designs, cache sits on the same chip as the CPU. “On-chip cache”
  – Data travels a very short distance
  – No need to use the very slow bus
  – This is known as L1 cache
    • Intel calls this level L0
• To reduce the penalty of a cache miss, a second level of cache is inserted between main memory and the on-chip cache.
  – L2 cache


67

Cache Levels

[Figure: cache levels. Labels: CPU, On-Chip Cache, MPU Chip, Off-Chip Cache, Memory Bus, Data Bus, System Bus, Main Memory; variants shown for Pentium and Pentium Pro]


68

“L2” Cache

• A very fast, SRAM based, cache is placed off-chip.
  – Slower than the on-chip cache.
  – Larger than the on-chip cache.
  – On-Module Cache.
    • CPU uses a dedicated, internal, fast, memory bus to access cache.
  – On-Mother-Board Cache.
    • The CPU has to use the system bus to get to it.
    • Still much faster than DRAM based main memory.


69

Cache Strategy

• On-Chip Cache is optimized to increase “hit rate”.
  – Block size about 4 words
  – Many blocks
• Off-Chip Cache is optimized to reduce “miss penalty”.
  – Larger block size
  – Smaller number of blocks.


70

Advanced DRAM Organization

• One of the most critical bottlenecks in the system is the interface to main memory.

• The design of main memory has mostly not changed in the last 30 years.
  – Still based on slow DRAM design.

• One possibility of improvement has been the insertion of high speed SRAM caches.

• Recently, attempts have been made at improving the performance of the basic cell of the DRAM chip itself.


71


• Enhanced DRAM
  – Integrate a small SRAM cache on the DRAM chip.
    • The cache holds the value of the last row read.
    • If the next access is to the same row, the value is accessed from the SRAM
  – Refresh can be done in parallel with a read.
  – Allows a read to partially overlap a previous write operation.
  – Performs similarly to a DRAM and external SRAM cache combination.


72


• Cache DRAM
  – Integrate a larger SRAM cache on the DRAM chip.
  – Allow the on-chip cache to be used as a true cache.
  – Allow a series of locations to be pre-fetched into the SRAM cache for later quick access.


73


• Synchronous DRAM (SDRAM).
  – DRAM is asynchronous.
    • Data access is independent of the clock.
    • The CPU must wait and continuously check for data.
  – In SDRAM, data access is synchronized to an external clock running at full bus speed.
    • The CPU knows exactly when the data will be ready.
    • It can do something else while the memory chip is preparing the data.
  – SDRAM allows burst mode.
    • A series of locations can be clocked out very quickly after the first location is accessed.
  – Control Register.


74


• Rambus DRAM (RDRAM).
  – RDRAM chips exchange information with the microprocessor on a special 28-wire bus.
    • The bus can deliver up to 500-Mbps as compared to the normal 33-Mbps for DRAM.
  – The CPU sends all requests to RDRAM over this special bus.
    • The request contains the desired address, the type of operation, and the number of bytes.
  – Using Rambus DRAM requires special consideration during the design of the CPU.
    • Currently only Pentium IV uses Rambus DRAM.
