Memory System
Memory System
Since 1946 all computers have had 5 components: control, datapath, memory, input, and output (the control and datapath together form the processor).
The Big Picture
Memory types
Hierarchical memory organization
Levels in Memory Hierarchy
Microprocessor performance improved 35% per year until 1986 and 55% per year since 1987
Memory technology improvements aim primarily at increasing DRAM capacity, not DRAM speed
Processor-DRAM Gap
[Chart: relative processor/memory speed vs. year, showing the widening gap.]
MOS memories:
- RAMs (volatile; power off: contents lost): DRAM, SRAM
- ROMs (non-volatile; power off: contents kept): ROM, EPROM, EEPROM, Flash
Volatile vs. Non-Volatile Types of Memories
Percentage of Usage
Typical Applications of DRAM
Trends in DRAM main memory.
[Chart: calendar year (1980-2010) vs. number of memory chips (1 to 1000) and memory size (1 MB, 4 MB, 16 MB, 64 MB, 256 MB, 1 GB, 4 GB, 16 GB, 64 GB, 256 GB, 1 TB), for computer classes from small PCs through large PCs, workstations, and servers to supercomputers.]
DRAM Evolution
MRAM can replace many other types of memory, including SRAM, DRAM, ROM, EEPROM, Flash EEPROM, and ferroelectric RAM (FRAM). Predicted future memories are crystalline structures that users grow on silicon.
Magnetic RAM as Universal Memory
Big endian versus little endian
Converting BE to LE
Representation in BE and LE
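The BE-to-LE conversion above amounts to reversing the byte order of a fixed-width value; a minimal Python sketch (the helper name `be_to_le` is illustrative, not from the slides):

```python
def be_to_le(value: int, width: int = 4) -> int:
    """Reinterpret a big-endian integer's bytes as little-endian (width in bytes)."""
    # Serialize in big-endian order, then read the same bytes back little-endian.
    return int.from_bytes(value.to_bytes(width, "big"), "little")

# 0x12345678 stored big-endian reads as 0x78563412 on a little-endian machine;
# applying the conversion twice restores the original value.
```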
Types of semiconductor memories
Memory chip structure
[Diagram: pipeline stages: address translation, row decoding & readout, column decoding & selection, tag comparison & validation.]
Pipelined and Interleaved Memory
Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once.
[Diagram: four-way interleaved memory. Incoming addresses and data-in are dispatched (based on the 2 LSBs of the address) to four modules handling addresses that are 0, 1, 2, and 3 mod 4; return data is merged onto the data-out path. The timing chart shows modules 0-3 accessed in successive bus cycles, each memory cycle spanning several bus cycles, so accesses overlap in time.]
Memory Interleaving
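The dispatch rule from the figure (the 2 LSBs select the module) can be sketched as a minimal illustration, assuming four modules and word addressing:

```python
def dispatch(address: int, num_modules: int = 4) -> tuple[int, int]:
    """Split a word address into (module, offset within module).

    With num_modules = 4, the module number is just the 2 LSBs
    of the address, as in the four-way interleaving diagram.
    """
    return address % num_modules, address // num_modules

# Consecutive addresses land in different modules, so their memory
# cycles can overlap even though each module stays busy for several
# bus cycles: this is what makes interleaving hide memory latency.
```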
Three ways to organize a 96-bit memory
Connecting memory to 32-bit processors
Memory address and memory map
Example 1: Connecting memory to a 32-bit processor
Example 1: Memory map
Example 1: Address decoder implementation
Example 2: Memory map and decoder when M1 and M2 are 1 MB each
Example 3: Selection based on the most significant bits
Mapping of memory spaces
A typical address decoder
Implementation of an addressing scheme
Address decoding using comparators
Address decoder implementation using a PROM
PLD structure
ROM model with a programmable OR array
PAL model with a fixed OR array
PAL model with programmable AND and OR arrays
Macrocell layout
FPGA circuit layout
CPLD cell layout
Typical memory connection
Memory timing
Memory write cycle timing
Memory read cycle timing
Dynamic memory structure
Dynamic memory: logic diagram
Connecting DRAMs
Connecting dynamic memories
Dynamic memory timing
Connecting dynamic memories
Virtual vs. physical address
Position of the cache in the memory hierarchy
Memory modules
Memory modules
Dual-port memory
Circular buffer structure
Circular buffer operation
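The circular-buffer principle can be sketched as a fixed-capacity FIFO whose head index wraps around modulo the capacity (class and method names are illustrative, not from the slides):

```python
class CircularBuffer:
    """Fixed-capacity FIFO backed by an array; indices wrap modulo capacity."""

    def __init__(self, capacity: int):
        self.data = [None] * capacity
        self.capacity = capacity
        self.head = 0    # index of the next item to read
        self.count = 0   # number of items currently stored

    def put(self, item) -> bool:
        """Append an item; return False when the buffer is full."""
        if self.count == self.capacity:
            return False
        self.data[(self.head + self.count) % self.capacity] = item
        self.count += 1
        return True

    def get(self):
        """Remove and return the oldest item, or None when empty."""
        if self.count == 0:
            return None
        item = self.data[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item
```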
Associative memory
Associative memory
Memory in the system
Memory in a multiprocessor system
The cache operating principle is based on locality, which can be:
- temporal locality
- spatial locality
[Diagram: the CPU exchanges addresses and data with the cache controller; the controller passes addresses and data between the cache and main memory.]
The cache controller mediates between the CPU and the memory system, which consists of the cache and main memory.
Cache memory
Cache memories act as intermediaries between the superfast processor and the much slower main memory.
[Diagram: (a) level-2 cache between level-1 cache and main memory; (b) level-2 cache connected to a "backside" bus. In both, the path runs CPU and CPU registers, level-1 cache, level-2 cache, main memory.]
One level of cache with hit rate h: Ceff = h·Cfast + (1 – h)(Cslow + Cfast) = Cfast + (1 – h)Cslow
The Need for a Cache
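The single-level formula is easy to check numerically; this is a direct transcription of Ceff = Cfast + (1 – h)Cslow, with sample cycle counts chosen purely for illustration:

```python
def effective_access_time(h: float, c_fast: float, c_slow: float) -> float:
    """Ceff = Cfast + (1 - h) * Cslow.

    Every access pays the fast (cache) time; only the fraction (1 - h)
    of misses additionally pays the slow (main memory) time.
    """
    return c_fast + (1 - h) * c_slow

# With a 95% hit rate, a 1-cycle cache, and a 50-cycle main memory,
# the effective access time is 1 + 0.05 * 50 = 3.5 cycles.
```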
Example
A system with L1 and L2 caches has a CPI of 1.2 with no cache miss. There are 1.1 memory accesses on average per instruction. What is the effective CPI with cache misses factored in? What are the effective hit rate and miss penalty overall if the L1 and L2 caches are modeled as a single cache?

Level   Local hit rate   Miss penalty
L1      95%              8 cycles
L2      80%              60 cycles

(Access distribution: 95% hit in L1, 4% hit in L2, 1% go to main memory.)
Solution
Ceff = Cfast + (1 – h1)[Cmedium + (1 – h2)Cslow]
Because Cfast is included in the CPI of 1.2, we must account for the rest:
CPI = 1.2 + 1.1(1 – 0.95)[8 + (1 – 0.8)·60] = 1.2 + 1.1 · 0.05 · 20 = 2.3
Overall: hit rate 99% (95% + 80% of 5%), miss penalty 60 cycles
Performance of a Two-Level Cache System
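The worked example can be verified by transcribing the two-level formula directly (the function name is illustrative):

```python
def effective_cpi(base_cpi, accesses_per_instr, h1, c_medium, h2, c_slow):
    """CPI = base + accesses * (1 - h1) * (Cmedium + (1 - h2) * Cslow).

    base_cpi already includes Cfast, so only the miss penalties
    beyond the L1 access are added, as in the example's solution.
    """
    return base_cpi + accesses_per_instr * (1 - h1) * (c_medium + (1 - h2) * c_slow)

cpi = effective_cpi(1.2, 1.1, 0.95, 8, 0.80, 60)   # 2.3, matching the example
overall_hit = 0.95 + 0.80 * (1 - 0.95)             # 0.99, i.e., 95% + 80% of 5%
```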
Cache size (in bytes or words). A larger cache can hold more of the program’s useful data but is more costly and likely to be slower.
Block or cache-line size (unit of data transfer between cache and main memory). With a larger cache line, more data is brought into the cache on each miss. This can improve the hit rate but may also bring in low-utility data.
Placement policy. Determining where an incoming cache line is stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).
Replacement policy. Determining which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random or the least recently used block.
Write policy. Determining if updates to cache words are immediately forwarded to main (write-through) or modified blocks are copied back to main if and when they must be replaced (write-back or copy-back).
Cache Memory Design Parameters
Temporal- and spatial-locality of reference
When exhibiting spatial locality, a program accesses neighboring memory locations. When exhibiting temporal locality of reference, a program repeatedly accesses the same memory location during a short time period. Both forms of locality occur in the following Pascal code segment:
Temporal locality: the CPU accesses i at three points in a short time period. Spatial locality: assuming that Pascal stores the elements of A in consecutive memory locations, each loop iteration accesses adjacent memory locations.
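The Pascal segment itself did not survive the transcript; the following Python sketch reproduces the access pattern the text describes (the names A and i come from the description, the loop bounds are assumed):

```python
# Temporal locality: the variable i is read and updated on every iteration.
# Spatial locality: A[i] and A[i + 1] are adjacent memory locations
# (assuming contiguous array storage, as Pascal uses for arrays).
A = list(range(10))
total = 0
i = 0
while i < len(A) - 1:
    total += A[i] + A[i + 1]   # neighboring elements: spatial locality
    i += 1                     # the same variable i, over and over: temporal locality
```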
Assuming no conflict in address mapping, the cache will hold a small program loop in its entirety, leading to fast execution.
[Diagram: a 9-instruction program loop in main memory mapped (many-to-one) onto cache lines/blocks, the unit of transfer between main and cache memories, illustrating temporal and spatial locality.]
What Makes a Cache Work?
Temporal: accesses to the same address are typically clustered in time.
Spatial: when a location is accessed, nearby locations tend to be accessed also.
[Plot: addresses vs. time; clustered accesses form a working set. From Peter Denning's CACM paper, July 2005 (Vol. 48, No. 7, pp. 19-24).]
Temporal and Spatial Localities
[Diagram: Harvard architecture: the processor has a separate instruction cache (instruction fetch) and data cache (data fetch), both backed by main memory.]
Harvard architecture
[Diagram: two L1/L2 arrangements: (1) microprocessor with L1 cache, then L2 cache and main memory on the frontside bus; (2) L2 cache on a dedicated backside bus, with main memory on the frontside bus.]
L1 and L2 cache
[Diagram: microprocessor with cache connected through a bridge to main memory, with a processor/memory bus on one side and a system (I/O) bus on the other. A variant splits the connection into separate processor and memory buses around the bridge.]
Position of the cache in the system
[Diagram: cache in a PC system. The microprocessor (host) with its L1 cache and the L2 cache (SRAM) sit on the processor bus (host bus). A Host/PCI bridge connects the graphics controller (AGP) and main memory (DRAM). On the PCI bus: a SCSI host adapter, LAN controller, IDE (CD-ROM controller), USB (keyboard, mouse), and PCI cards. A PCI/ISA bridge leads to the ISA bus with ISA cards (audio), the floppy-disk controller, and expansion slots.]
Position of the cache in a PC system
Difference in access times
Building blocks of cache memory
Cache memory operating principle
Cache memory: an example organization
Cache memory: an example two-level implementation
[Diagram: fully associative cache with 2048 lines. The 32-bit address supplies a tag that is compared in parallel (comparators k0 through k2047) against stored tags Tag 0 through Tag 2047; each entry also holds valid (V) and dirty (D) bits and a cache line. Select signals and a MUX/DEMUX choose the 32-bit word and the byte within the word to/from the data bus. Cache hit = tag hit AND valid hit.]
Associative cache memory
Associative cache memory with a FIFO replacement policy
Direct-mapped cache memory
[Diagram: direct-mapped cache with 2048 lines. The 32-bit address splits into a 17-bit tag, an 11-bit index, and byte/word-select bits. An address decoder selects one of the 2048 lines by index; a single comparator (=) checks the stored tag against the address tag. Each entry holds valid (V) and dirty (D) bits; a MUX/DEMUX selects the 32-bit word and the byte within the word to/from the data bus. Cache hit = tag hit AND valid hit.]
An example direct-mapped cache
Address mapping in a direct-mapped cache
Direct-mapped cache holding 32 words within eight 4-word lines. Each line is associated with a tag and a valid bit.
[Diagram: the word address splits into a tag, a 3-bit line index in cache, and a 2-bit word offset in line. Main memory locations map many-to-one onto the eight lines: e.g., 0-3, 4-7, 8-11; 32-35, 36-39, 40-43; 64-67, 68-71, 72-75; 96-99, 100-103, 104-107. The cache reads the tag and specified word, compares the stored tag with the address tag (1 if equal) together with the valid bit, and signals a cache miss on mismatch.]
Direct-Mapped Cache
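The many-to-one mapping for this 32-word cache (eight 4-word lines) reduces to simple address arithmetic; a sketch assuming word addressing (the function name is illustrative):

```python
LINE_WORDS, NUM_LINES = 4, 8   # the 32-word cache from the figure

def map_address(word_addr: int):
    """Split a word address into (tag, line index, word offset in line)."""
    offset = word_addr % LINE_WORDS                  # 2-bit word offset in line
    index = (word_addr // LINE_WORDS) % NUM_LINES    # 3-bit line index in cache
    tag = word_addr // (LINE_WORDS * NUM_LINES)      # remaining high-order bits
    return tag, index, offset

# Locations 32-35 and 64-67 both map to line 0 but carry different tags,
# so they conflict: loading one evicts the other in a direct-mapped cache.
```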
Set-associative cache memory
[Diagram: two-way set-associative cache with 1024 lines per way. The 32-bit address splits into an 18-bit tag, a 10-bit index, and byte/word-select bits. Each way holds tags Tag 0 through Tag 1023, valid (V) and dirty (D) bits, and lines 0 through 1023; the index selects one set and a comparator (=) per way checks its stored tag. Enabled (en) MUX/DEMUX blocks route the selected 32-bit word and the byte within the word to/from the data bus. Cache hit = hit in either way.]
An example two-way set-associative cache
Two-way set-associative cache holding 32 words of data within 4-word lines and 2-line sets.
[Diagram: the word address splits into a tag, a 2-bit set index in cache, and a 2-bit word offset in line. Main memory locations 0-3, 16-19, 32-35, 48-51, 64-67, 80-83, 96-99, and 112-115 map onto the four sets, each set offering two line options (option 0 and option 1) with their own valid bits and tags. The cache reads the tag and specified word from each option, compares both stored tags with the address tag (1 if equal), routes the matching option to data out, and signals a cache miss if neither matches.]
Set-Associative Cache
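A minimal behavioral sketch of the two-way lookup from the figure, working on line addresses and using FIFO replacement within a set (the deck's associative-cache slide mentions a FIFO policy; the class name is illustrative):

```python
NUM_SETS, WAYS = 4, 2   # the cache from the figure: four 2-line sets

class SetAssocCache:
    """Toy two-way set-associative cache tracking only tags, not data."""

    def __init__(self):
        # Each set holds up to WAYS stored tags; presence implies valid.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, line_addr: int) -> bool:
        """Return True on a hit; on a miss, install the line (FIFO in the set)."""
        index, tag = line_addr % NUM_SETS, line_addr // NUM_SETS
        stored = self.sets[index]
        if tag in stored:            # compare the address tag with each option
            return True
        if len(stored) == WAYS:      # both options occupied: evict the oldest
            stored.pop(0)
        stored.append(tag)
        return False
```

With word addresses divided by the 4-word line size, locations 0-3 and 16-19 become lines 0 and 4, which share set 0 but coexist thanks to the two ways.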
Position of the cache in the system
Position of the cache in the system (continued)
Virtual and physical cache
Internal and external cache
The cache relative to the I/O subsystem
Miss rate as a function of cache size
Miss rate as a function of cache size
Cache replacement policies
Virtual versus physical memory
Program segments in main memory and on disk.
[Diagram: program and data spread over several disk tracks; main memory holds the system area, stack, active pieces of program and data, and unused space.]
The Need for Virtual Memory
Data movement in a memory hierarchy.
[Diagram: virtual memory, main memory, cache, registers. Pages are transferred automatically upon page fault; lines are transferred automatically upon cache miss; words are transferred explicitly via load/store.]
Memory Hierarchy: The Big Picture
Virtual-to-physical address translation parameters.
[Diagram: a virtual address consists of a virtual page number (V - P bits) and an offset in page (P bits); address translation maps it to a physical address consisting of a physical page number (M - P bits) and the same P-bit offset in page.]
Example: Determine the parameters for 32-bit virtual addresses, 4 KB pages, and 128 MB of byte-addressable main memory.
Solution: Physical addresses are 27 b and the byte offset in page is 12 b; thus virtual page numbers are 32 – 12 = 20 b and physical page numbers are 27 – 12 = 15 b.
Address Translation in Virtual Memory
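The example's parameters follow from simple bit-width arithmetic; a sketch (the function name is illustrative):

```python
def vm_parameters(va_bits: int, page_bytes: int, mem_bytes: int):
    """Return (physical address bits, page offset bits, VPN bits, PPN bits)."""
    pa_bits = (mem_bytes - 1).bit_length()   # 128 MB -> 27-bit physical address
    p = (page_bytes - 1).bit_length()        # 4 KB page -> 12-bit offset in page
    return pa_bits, p, va_bits - p, pa_bits - p

# vm_parameters(32, 4096, 128 * 2**20) gives (27, 12, 20, 15), as in the example.
```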
The role of page table in the virtual-to-physical address translation process.
[Diagram: the page table register points to the page table; the virtual page number indexes the table, whose entries carry valid bits, other flags, and the physical page number locating the page in main memory.]
Page Tables and Address Translation
Virtual memory as a facilitator of sharing and memory protection.
[Diagram: page tables for process 1 and process 2; each entry holds a pointer, flags, and permission bits. Entries may point to pages in main memory (some allowing only read accesses, others read & write accesses) or out to disk memory, and two processes' tables may point to the same shared page.]
Protection and Sharing in Virtual Memory
[Diagram: translating a virtual address takes memory access 1: the page table register and virtual page number index the page table (valid bits, other flags) to produce the physical address; fetching the data itself then requires memory access 2.]
The Latency Penalty of Virtual Memory
Virtual-to-physical address translation by a TLB and how the resulting physical address is used to access the cache memory.
[Diagram: the virtual address = virtual page number + byte offset. Translation: TLB tags and valid bits are checked ("tags match and entry is valid") to yield the physical page number and other flags; the resulting physical address splits into a physical address tag, cache index, and byte offset in word for the cache access.]
Translation Lookaside Buffer
Example
An address translation process converts a 32-bit virtual address to a 32-bit physical address. Memory is byte-addressable with 4 KB pages. A 16-entry, direct-mapped TLB is used. Specify the components of the virtual and physical addresses and the width of the various TLB fields.
Solution
[Diagram: the 32-bit virtual address = a 20-bit virtual page number + a 12-bit byte offset; the physical address likewise = a 20-bit physical page number + the 12-bit byte offset. The 16-entry TLB splits the virtual page number into a 16-bit tag and a 4-bit TLB index.]
TLB word width = 16-bit tag + 20-bit physical page number + 1 valid bit + other flags = 37 bits plus flags
Address Translation via TLB
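The field widths in the solution can be derived mechanically (the function name and default arguments are illustrative, mirroring the example's numbers):

```python
def tlb_fields(va_bits=32, page_bytes=4096, tlb_entries=16, other_flags=0):
    """Return (offset, VPN, TLB index, TLB tag, TLB word width) in bits
    for a direct-mapped TLB, as in the worked example."""
    offset = (page_bytes - 1).bit_length()   # 4 KB page -> 12-bit byte offset
    vpn = va_bits - offset                   # 20-bit virtual page number
    index = (tlb_entries - 1).bit_length()   # 16 entries -> 4-bit TLB index
    tag = vpn - index                        # 16-bit TLB tag
    ppn = vpn                                # 32-bit PA -> 20-bit phys page number
    word = tag + ppn + 1 + other_flags       # +1 valid bit = 37 bits plus flags
    return offset, vpn, index, tag, word
```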
Options for where virtual-to-physical address translation occurs.
[Diagram: three options: (a) a virtual-address cache accessed before the TLB and main memory; (b) a physical-address cache accessed after the TLB; (c) a hybrid-address cache accessed in parallel with the TLB.]
Virtual- or Physical-Address Cache?
A scheme for the approximate implementation of LRU .
[Diagram: use bits arranged around the clock, (a) before replacement and (b) after replacement.]
Least recently used policy: effective, but hard to implement.
Approximate versions of LRU are more easily implemented.
Clock policy: the diagram shows the reason for the name; the use bit is set to 1 whenever a page is accessed.
Page Replacement Policies
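A minimal sketch of the clock policy described above: one use bit per frame and a sweeping hand that clears 1-bits (second chance) until it finds a 0 (the class name is illustrative):

```python
class ClockReplacer:
    """Approximate LRU: a use bit per frame; the clock hand sweeps for a 0."""

    def __init__(self, num_frames: int):
        self.use = [0] * num_frames
        self.hand = 0

    def access(self, frame: int):
        """Set the frame's use bit to 1 on every access."""
        self.use[frame] = 1

    def victim(self) -> int:
        """Pick the frame to replace: clear 1-bits along the way (second
        chance), stop at the first frame whose use bit is already 0."""
        while self.use[self.hand] == 1:
            self.use[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.use)
        v = self.hand
        self.hand = (self.hand + 1) % len(self.use)
        return v
```

A frame that was accessed since the last sweep survives one extra pass of the hand, which is what makes the policy an approximation of LRU at a fraction of the hardware cost.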
Memory hierarchy parameters and their effects on performance
Parameter variation                    | Potential advantages                                    | Possible disadvantages
Larger main or cache size              | Fewer capacity misses                                   | Longer access time
Longer pages or lines                  | Fewer compulsory misses (prefetching effect)            | Greater miss penalty
Greater associativity (for cache only) | Fewer conflict misses                                   | Longer access time
More sophisticated replacement policy  | Fewer conflict misses                                   | Longer decision time, more hardware
Write-through policy (for cache only)  | No write-back time penalty, easier write-miss handling  | Wasted memory bandwidth, longer access time
Improving Virtual Memory Performance
Data movement in a memory hierarchy.
[Diagram: virtual memory, main memory, cache, registers. Pages are transferred automatically upon page fault; lines are transferred automatically upon cache miss; words are transferred explicitly via load/store.]
Cache memory: provides illusion of very high speed
Virtual memory: provides illusion of very large size
Main memory: reasonable cost, but slow & small
Locality makes the illusions work
Summary of Memory Hierarchy