
SINGLE CHIP MULTIPROCESSORS

Computer Architecture Term Paper

(11.12.2003)

Esra KIRBAŞ 2002701357

Evaluation of Design Alternatives for a Multiprocessor Microprocessor

By Basem A. Nayfeh, Lance Hammond and Kunle Olukotun.

ISCA 23, 1996, pp. 67-77.

•With advanced integrated-circuit technology, several options for the design of high-performance microprocessors are available.

•In the multiprocessor design option, a small number of processors are interconnected on a single chip or on a multi-chip-module (MCM) substrate.

•We concentrate on single-chip multiprocessors.

Our goal is to study two proposed cache-sharing mechanisms for single chip multiprocessors:

I. Shared Level-1 (L1) Cache Architecture

II. Shared Level-2 (L2) Cache Architecture

(The performance of these two architectures will be compared with that of a single-bus-based shared-memory multiprocessor.)

•A multiprocessor architecture whose interconnect is closer to the CPUs in the memory hierarchy will be able to exploit fine-grained parallelism more efficiently than a multiprocessor architecture whose interconnect is further away from the CPUs in the memory hierarchy.

•The aim is to achieve good performance on fine-grained parallel applications without sacrificing the performance of independent parallel jobs.

CPU CHARACTERISTICS

•We use the same CPU in all three architectures.

•2-way issue processor

•Dynamic scheduling

•Speculative execution

•Non-blocking caches

Instruction Pipeline Functional Units

•16 KB, 2-way set-associative instruction and data caches

•32-entry centralized instruction window

•32-entry reorder buffer.

Shared L1-Cache Multiprocessor

Advantages of this Architecture:

• It provides the lowest latency for interprocessor communication by using a shared-memory address space.

• Low latency for interprocessor communication helps to achieve high performance when executing fine-grained parallel applications.

• Processors may fetch shared data into the cache for each other.

• It eliminates the cache coherence logic and implicitly provides a sequentially consistent memory without sacrificing performance.

Disadvantages of this Architecture:

• The crossbar switch increases the access time of the L1 cache. (We assume an average access time of three cycles.)

• All memory references go to the shared L1 cache, so there may be extra delays due to bank conflicts.

• If the processors are not executing fine-grained parallel applications, the miss rate will increase.
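The bank-conflict concern above can be illustrated with a small probability sketch. It assumes, purely for illustration, that every CPU issues one access in the same cycle and that each access goes to a uniformly random bank; the bank count of 4 and the function name `conflict_free_probability` are hypothetical and are not taken from the paper.

```python
def conflict_free_probability(num_cpus: int, num_banks: int) -> float:
    """Probability that num_cpus simultaneous accesses all hit distinct banks.

    Assumes each access picks a bank uniformly at random (an illustrative
    assumption, not a measured access pattern).
    """
    if num_cpus > num_banks:
        return 0.0  # more accesses than banks: a conflict is unavoidable
    probability = 1.0
    for already_taken in range(num_cpus):
        # The next access must land in one of the banks still free.
        probability *= (num_banks - already_taken) / num_banks
    return probability


if __name__ == "__main__":
    for cpus in (1, 2, 3, 4):
        p = conflict_free_probability(cpus, num_banks=4)
        print(f"{cpus} CPU(s), 4 banks: P(no conflict) = {p:.3f}")
```

Under these assumptions the chance of a conflict-free cycle falls quickly as more CPUs access the cache at once, which is why the extra delay matters for a shared L1 cache.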

The secondary cache and main memory are uniprocessor-like systems.

L2 (2 MB, 10-cycle latency + 2-cycle occupancy)

Main Memory (50-cycle latency, 6-cycle occupancy)

Shared L2-Cache Multiprocessor

• The write-through primary caches have an access time of 1 cycle.

• The latency of the L2 cache increases to 14 cycles due to the crossbar overhead.

• The L2 cache has four independent banks to increase its bandwidth and enable it to support four independent access streams.

• The data path is 64 bits wide.

• Occupancy is 4 cycles (for the transfer of a 32-byte cache line).

• Only memory accesses that miss in the L1 cache are affected by the slower L2 cache.

• MCM (multi-chip module) technology can be used (as of 1996).
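The data-path width and occupancy figures above are related by simple arithmetic, sketched below. The sketch assumes one data-path word is transferred in each cycle of occupancy and that all banks can be busy at once with no conflicts; both function names are hypothetical.

```python
def bytes_per_transfer(datapath_bits: int, occupancy_cycles: int) -> int:
    """Bytes moved while a bank is occupied, at one data-path word per cycle."""
    return (datapath_bits // 8) * occupancy_cycles


def peak_bytes_per_cycle(datapath_bits: int, occupancy_cycles: int,
                         num_banks: int) -> float:
    """Ideal aggregate bandwidth when every bank is busy with its own stream."""
    per_bank = bytes_per_transfer(datapath_bits, occupancy_cycles) / occupancy_cycles
    return num_banks * per_bank


if __name__ == "__main__":
    line_bytes = bytes_per_transfer(datapath_bits=64, occupancy_cycles=4)
    peak = peak_bytes_per_cycle(datapath_bits=64, occupancy_cycles=4, num_banks=4)
    print(f"bytes per 4-cycle transfer: {line_bytes}")
    print(f"ideal peak with 4 banks: {peak} bytes/cycle")
```

With a 64-bit data path, 4 cycles of occupancy move 8 bytes per cycle, that is, 32 bytes in total, which corresponds to one cache line per bank access under these assumptions.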

Main Memory (50-cycle latency, 6-cycle occupancy)

• To keep the primary caches coherent, we need a coherency protocol.

• For simplicity, we assume that each primary cache uses a write-through policy for shared data.

• Additional hardware is required to support this.
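How write-through primary caches can stay coherent through a shared L2 cache may be sketched with a toy model. The slides state only that a write-through policy and additional hardware are assumed; the invalidation of other CPUs' copies on every write, and all names below (`SharedL2System`, `read`, `write`), are illustrative assumptions rather than the paper's exact protocol.

```python
class SharedL2System:
    """Toy model: private write-through L1 caches in front of one shared L2."""

    def __init__(self, num_cpus: int) -> None:
        self.l1 = [dict() for _ in range(num_cpus)]  # per-CPU: address -> value
        self.l2 = {}                                 # shared: address -> value

    def read(self, cpu: int, addr: int) -> int:
        if addr not in self.l1[cpu]:
            # L1 miss: fill from the shared L2 (unwritten addresses read as 0).
            self.l1[cpu][addr] = self.l2.get(addr, 0)
        return self.l1[cpu][addr]

    def write(self, cpu: int, addr: int, value: int) -> None:
        self.l1[cpu][addr] = value   # update the writer's own L1
        self.l2[addr] = value        # write through to the shared L2
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(addr, None)  # assumed: invalidate stale copies


if __name__ == "__main__":
    system = SharedL2System(num_cpus=2)
    system.write(0, 0x40, 7)
    print(system.read(1, 0x40))   # CPU 1 misses in L1 and reads from L2
    system.write(1, 0x40, 9)
    print(system.read(0, 0x40))   # CPU 0's copy was invalidated, so it refetches
```

Because every write reaches the shared L2 immediately, the L2 always holds the current value, and the extra hardware only has to remove stale L1 copies.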

Shared Main Memory Multiprocessor

• Primary cache access time is 1 cycle.

• Secondary cache access time is 12 cycles.

• All CPUs must access main memory to communicate.

Ideal Memory Latencies of Three Architectures in CPU Clock Cycles
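The latencies quoted on the preceding slides can be collected and compared with a rough average-memory-access-time estimate. This is a sketch under stated assumptions: the miss rates are hypothetical placeholders, the 50-cycle main-memory latency for the shared-memory architecture is assumed (the slides give it only for the other two), and occupancy and contention are ignored.

```python
# Ideal latencies in CPU clock cycles, as quoted in the slides.
LATENCIES = {
    "shared-L1":     {"l1": 3, "l2": 10, "mem": 50},
    "shared-L2":     {"l1": 1, "l2": 14, "mem": 50},
    "shared-memory": {"l1": 1, "l2": 12, "mem": 50},  # "mem" is an assumption
}


def average_access_time(lat: dict, l1_miss_rate: float, l2_miss_rate: float) -> float:
    """Two-level estimate: L1 latency plus the miss penalties weighted by miss rate."""
    return lat["l1"] + l1_miss_rate * (lat["l2"] + l2_miss_rate * lat["mem"])


if __name__ == "__main__":
    # Hypothetical miss rates, identical for all architectures for simplicity.
    for name, lat in LATENCIES.items():
        amat = average_access_time(lat, l1_miss_rate=0.05, l2_miss_rate=0.2)
        print(f"{name:14s} estimated average access time: {amat:.2f} cycles")
```

Using the same miss rates for all three architectures is a simplification: sharing a cache changes its miss rate, so the estimate shows only how the latency figures trade off, not which design performs best.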

SIMULATION ENVIRONMENT

• SimOS simulation environment is used

• IRIX 5.3 operating system is simulated

• Workloads:

• Hand-Parallelized Scientific and Engineering Applications

• Compiler-Parallelized Scientific and Engineering Applications

• Multiprogramming Workload

Two kinds of simulation are performed:

I. Simple Simulation (no speculative execution, dynamic scheduling, or non-blocking memory references)

II. Dynamic Superscalar Simulation

SIMPLE SIMULATION RESULTS

(for high degree of interprocessor communication)

EQNTOTT

(for moderate degree of interprocessor communication)

VOLPACK

FFT Kernel

(for low degree of interprocessor communication)

MULTIPROGRAMMING WORKLOAD

DYNAMIC SUPERSCALAR SIMULATION RESULTS

In the dynamic superscalar simulation, the performance of the shared-L1 cache architecture can diminish substantially, whereas the shared-L2 and shared-memory architectures retain much of the relative performance predicted by the simple simulation results.

Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing

By Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese.

ISCA 27, 2000, pp. 282-293

• Designed for online transaction processing (OLTP) systems

• Standard ASIC design technology is used

• The centerpiece of the Piranha architecture is a highly integrated processing node with eight simple Alpha processor cores, separate instruction and data caches for each core, a shared second-level cache, eight memory controllers, two coherence protocol engines, and a network router, all on a single chip.

SIMULATION
