Computer Systems Architecture: A Networking Approach
Chapter 12 Introduction: The Memory Hierarchy
CS 147
Nathaniel Gilbert
Levels of Performance – You Get What You Pay For
Recall: Dynamic Random Access Memory (DRAM)
- Uses capacitors to store each bit of state (0 or 1)
- Must be refreshed periodically
- Relatively cheap

Static Random Access Memory (SRAM)
- Uses transistors to store state
- Needs no refresh; faster and uses less power than DRAM
- More expensive than DRAM
Levels of Performance cont.
(This slide shows a cost-comparison table, not reproduced here. Notes to the table: at the time of writing, one pound is about 2 US dollars; R = removable media.)
Levels of Performance cont.
Storage hierarchy: fastest (CPU registers) at the top, slowest (tape drives) at the bottom.
Pre-fetching: transfers between layers usually move more data than was actually requested, anticipating that the neighbouring blocks will be used soon.
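The pre-fetching idea can be sketched as follows. This is an illustrative model, not hardware code: the block size, backing store, and function names are all assumptions for the example.

```python
# Minimal model of block-granularity transfer between two memory layers:
# a request for one byte pulls its entire block into the faster layer,
# so later accesses to neighbouring addresses hit without a new transfer.

BLOCK_SIZE = 64                          # bytes per transfer; real sizes vary

backing_store = bytes(range(256)) * 16   # stand-in for the slower layer
cache = {}                               # block number -> cached block bytes
transfers = 0                            # count of slow-layer transfers

def read(addr):
    """Return one byte, pre-fetching its whole block on a miss."""
    global transfers
    block_no = addr // BLOCK_SIZE
    if block_no not in cache:            # miss: fetch the entire block
        start = block_no * BLOCK_SIZE
        cache[block_no] = backing_store[start:start + BLOCK_SIZE]
        transfers += 1
    return cache[block_no][addr % BLOCK_SIZE]

# Reading 128 consecutive bytes costs only 2 block transfers.
for a in range(128):
    read(a)
print(transfers)   # -> 2
```

Sequential access patterns reward this scheme: most reads land in a block that an earlier read already paid to fetch.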
Localization of Access – exploiting repetition
Programs tend to access memory locations that are close to one another.
This is partly because programmers organize data in clusters, and compilers attempt to lay out code efficiently.
The memory hierarchy can exploit this locality.
Localization of Access cont.
Exploiting locality of memory access:
- Keep related data in smaller groups (try not to funnel all input and output through a single large array when reading from or writing to disk)
- Load only the portion of data the CPU is currently using into faster memory
Localization of Access cont.
The author used the following code to demonstrate cache action by exploiting locality of memory access (the listing is not reproduced on this slide).
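Since the original listing is missing from the slide, here is a hedged reconstruction of an experiment of the same shape, in Python rather than the C the author presumably used: walk arrays of doubling size and time each walk. (In CPython the cache effect is muted by interpreter overhead, so no specific timings are claimed.)

```python
# Time repeated traversals of arrays of doubling size. Once an array no
# longer fits in the cache, time per element tends to jump, because blocks
# must be shuttled between cache and main memory on every pass.

import time

def walk(arr, repeats):
    """Touch every element `repeats` times; return (elapsed seconds, sum)."""
    t0 = time.perf_counter()
    total = 0
    for _ in range(repeats):
        for x in arr:
            total += x
    return time.perf_counter() - t0, total

for size in (1 << 14, 1 << 16, 1 << 18, 1 << 20):   # 16K ... 1M elements
    arr = list(range(size))
    repeats = max(1, (1 << 20) // size)             # equal total work per row
    elapsed, _ = walk(arr, repeats)
    print(f"{size:>8} elements: {elapsed:.4f} s")
```

Keeping the total number of element touches constant across rows isolates the effect of array size on access time.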
Localization of Access cont.
On a Sun workstation (200 MHz CPU, 256 Mbyte main memory, 256 kbyte cache, 4 Gbyte local hard drive), the program produced the output shown on this slide (times in system clock ticks; the output table is not reproduced here).
Localization of Access cont.
The doubling of time is caused by data moving up and down the memory hierarchy: the 256 kbytes of cache cannot hold the whole object, so the array must be transferred between cache and main memory in blocks.
Instruction and Data Caches – Matching Memory to CPU Speed
A 2 GHz Pentium CPU accesses program memory on average every 0.5 ns just for fetching instructions.
DDR DRAM responds within about 10 ns, so a CPU that used only DRAM would suffer roughly a 20x loss in speed (10 ns / 0.5 ns).
This is where SRAM (cache) comes into play. Drawbacks of caching:
- Misses (when the desired code or data is not in the cached memory segment) can take longer than a plain access, because the block must be reloaded from slower memory
- Negative cache: (depending on the architecture) a cache in which negative results (lookup failures) are stored
Instruction and Data Caches cont.
Cache is built from SRAM chips, and is ideally fast enough to match the system clock speed of the CPU.
The Cache Controller Unit (CCU) and cache memory are inserted between the CPU and main memory.
Level 1 and Level 2 cache differ in placement: Level 1 is on the CPU chip, while Level 2 was generally located off-chip and slowed down by the system bus. Intel successfully integrated a 128 kbyte L2 cache onto the CPU and continues to offer chips with integrated L2.
Instruction and Data Caches cont.
Generic system architecture:
Level 1 is the microprocessor, with three forms of cache:
- D-cache (Data): fast buffer containing application data
- I-cache (Instruction): speeds up fetching of executable instructions
- TLB (Translation Lookaside Buffer): stores a map of translated virtual page addresses
Level 2 is a unified cache.
Memory is DRAM.
The CPU and register file reside in Level 1; the register file is the small amount of memory closest to the CPU, where data is manipulated.
Thank You