Single Chip Multiprocessors: Computer Architecture Term Paper (11.12.2003), Esra KIRBAŞ, 2002701357


SINGLE CHIP MULTIPROCESSORS

Computer Architecture Term Paper

(11.12.2003)

Esra KIRBAŞ, 2002701357

1/36

Evaluation of Design Alternatives for a Multiprocessor Microprocessor

By Basem A. Nayfeh, Lance Hammond and Kunle Olukotun.

ISCA 23, 1996, pp. 67-77.

2/36

• With advanced integrated-circuit technology, several options for designing high-performance microprocessors are available.

• In the multiprocessor design option, a small number of processors are interconnected on a single chip or on a multi-chip-module (MCM) substrate.

• We concentrate on single-chip multiprocessors.

3/36

Our goal is to study two proposed cache-sharing mechanisms for single chip multiprocessors:

I. Shared Level-1 (L1) Cache Architecture

II. Shared Level-2 (L2) Cache Architecture

(The performance of these two architectures is compared with that of a single-bus, shared-memory multiprocessor.)

4/36

•A multiprocessor architecture whose interconnect is closer to the CPUs in the memory hierarchy will be able to exploit fine-grained parallelism more efficiently than a multiprocessor architecture whose interconnect is further away from the CPUs in the memory hierarchy.

• The aim is to achieve good performance on fine-grained parallel applications without sacrificing the performance of independent jobs running in parallel.

5/36

CPU CHARACTERISTICS

• The same CPU is used in all three architectures.

• 2-way issue processor

• Dynamic scheduling

• Speculative execution

• Non-blocking caches

6/36

7/36

Instruction Pipeline Functional Units

• 16 KB, 2-way set-associative instruction and data caches

• 32-entry centralized instruction window

• 32-entry reorder buffer

8/36

Shared L1-Cache Multiprocessor

9/36

Advantages of this Architecture:

• It provides the lowest latency for interprocessor communication by using a shared-memory address space.

• Low latency for interprocessor communication helps to achieve high performance on fine-grained parallel applications.

• Processors may fetch shared data into the cache for each other.

• It eliminates the cache coherence logic and implicitly provides sequentially consistent memory without sacrificing performance.

10/36

Disadvantages of this Architecture:

• The crossbar switch increases the access time of the L1 cache. (We assume an average access time of three cycles.)

• All memory references go to the shared L1 cache, so bank conflicts may cause extra delays.

• If the processors are not executing fine-grained parallel applications, the miss rate will increase.
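The bank-conflict concern can be illustrated with a rough probability sketch. It assumes, purely for illustration, that every CPU issues one access in the same cycle and that each access selects a bank uniformly at random; the bank counts tried below are hypothetical and are not taken from the slides.

```python
from math import perm

def conflict_probability(num_cpus: int, num_banks: int) -> float:
    """Probability that at least two simultaneous accesses target the same bank.

    Assumption (not from the slides): each access picks a bank uniformly
    and independently of the others.
    """
    if num_cpus > num_banks:
        return 1.0  # pigeonhole: at least one bank must be shared
    # Ways to give every CPU a distinct bank, over all possible assignments.
    no_conflict = perm(num_banks, num_cpus) / num_banks ** num_cpus
    return 1.0 - no_conflict

if __name__ == "__main__":
    for banks in (4, 8, 16):  # hypothetical bank counts
        print(banks, "banks:", conflict_probability(4, banks))
```

Under these assumptions, four CPUs sharing four banks collide in roughly nine cycles out of ten when all of them access the cache at once, which is why the extra delay is listed as a disadvantage. Real access streams are neither uniform nor issued every cycle, so this is an upper-bound style illustration only.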

11/36

The secondary cache and main memory are organized as in a uniprocessor system.

L2 cache: 2 MB, 10-cycle latency, 2-cycle occupancy

Main memory: 50-cycle latency, 6-cycle occupancy

12/36

Shared L2-Cache Multiprocessor

13/36

• The write-through primary caches have an access time of 1 cycle.

• The latency of the L2 cache increases to 14 cycles due to the crossbar overhead.

14/36

• The L2 cache has four independent banks to increase its bandwidth and enable it to support four independent access streams.

• The data path is 64 bits wide.

• Occupancy is 4 cycles (for the transfer of a 32-byte cache line).
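The 4-cycle occupancy follows from simple arithmetic if the cache line is taken to be 32 bytes (an assumption here): a 64-bit data path moves 8 bytes per cycle, so one line needs 32 / 8 = 4 cycles. The sketch below checks that figure and also shows one hypothetical way addresses could be spread over the four banks; the slides do not say how banks are selected, so `bank_of` is illustrative only.

```python
LINE_BYTES = 32      # assumed cache-line size in bytes
DATAPATH_BITS = 64   # data-path width stated on the slide
NUM_BANKS = 4        # number of L2 banks stated on the slide

def transfer_cycles(line_bytes: int = LINE_BYTES,
                    datapath_bits: int = DATAPATH_BITS) -> int:
    """Cycles needed to move one cache line across the data path."""
    bytes_per_cycle = datapath_bits // 8
    return -(-line_bytes // bytes_per_cycle)  # ceiling division

def bank_of(address: int,
            line_bytes: int = LINE_BYTES,
            num_banks: int = NUM_BANKS) -> int:
    """Hypothetical line-interleaved mapping: consecutive lines go to consecutive banks."""
    return (address // line_bytes) % num_banks

if __name__ == "__main__":
    print("cycles per line:", transfer_cycles())
    print("banks for four consecutive lines:",
          [bank_of(a) for a in range(0, 4 * LINE_BYTES, LINE_BYTES)])
```

With line interleaving, sequential lines land in different banks, which is one way four independent access streams could proceed without always colliding.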

15/36

• Only memory accesses that miss in the L1 cache pay the penalty of the slower L2 cache.

• Multi-chip module (MCM) technology can be used (as of 1996).

Main memory: 50-cycle latency, 6-cycle occupancy

16/36

• To keep the primary caches coherent, we need a coherency protocol.

• For simplicity, we assume that each primary cache uses a write-through policy for shared data.

• Additional hardware is required to support this.
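A toy model may help show why write-through alone is not enough and why extra hardware is needed. The sketch below assumes an invalidate-on-write scheme, which the slides do not specify; the class and method names are hypothetical, and timing is not modeled at all.

```python
class SharedL2System:
    """Toy model: private write-through L1 caches in front of one shared L2.

    Assumption (not stated on the slides): a write invalidates copies of the
    same address held in the other L1 caches.
    """

    def __init__(self, num_cpus: int = 4):
        self.l2 = {}                                   # address -> value
        self.l1 = [dict() for _ in range(num_cpus)]    # one private L1 per CPU

    def read(self, cpu: int, addr: int) -> int:
        cache = self.l1[cpu]
        if addr not in cache:                  # L1 miss: fill from the shared L2
            cache[addr] = self.l2.get(addr, 0)
        return cache[addr]

    def write(self, cpu: int, addr: int, value: int) -> None:
        self.l1[cpu][addr] = value
        self.l2[addr] = value                  # write-through: L2 is always current
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(addr, None)          # invalidate stale copies elsewhere

if __name__ == "__main__":
    system = SharedL2System()
    system.write(0, 0x40, 7)
    print(system.read(1, 0x40))   # CPU 1 fills from L2 and sees CPU 0's write
    system.write(1, 0x40, 9)
    print(system.read(0, 0x40))   # CPU 0's copy was invalidated, so it refetches
```

Without the invalidation loop, CPU 0 would keep returning its stale value after CPU 1's write; tracking which caches hold a line is the kind of additional hardware the slide refers to.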

17/36

Shared Main Memory Multiprocessor

18/36

• Primary cache access time is 1 cycle.

• Secondary cache access time is 12 cycles.

• All CPUs must access main memory to communicate.

19/36

Ideal Memory Latencies of Three Architectures in CPU Clock Cycles
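The latencies quoted on the preceding slides can be combined into a rough average memory access time (AMAT) estimate. This is only a sketch: the miss rates are hypothetical placeholders, the 50-cycle main-memory latency for the shared-memory design is assumed to match the other two, and contention, occupancy, and communication costs are ignored.

```python
# (L1, L2, main memory) latencies in CPU cycles, as quoted on the slides.
# The 50-cycle figure for "shared-memory" is an assumption.
LATENCIES = {
    "shared-L1": (3, 10, 50),
    "shared-L2": (1, 14, 50),
    "shared-memory": (1, 12, 50),
}

def amat(l1: float, l2: float, mem: float,
         l1_miss_rate: float, l2_miss_rate: float) -> float:
    """Average memory access time for a two-level cache hierarchy."""
    return l1 + l1_miss_rate * (l2 + l2_miss_rate * mem)

if __name__ == "__main__":
    # Hypothetical miss rates, identical for all three designs.
    for name, (l1, l2, mem) in LATENCIES.items():
        print(f"{name}: {amat(l1, l2, mem, 0.05, 0.20):.2f} cycles")
```

With identical miss rates the shared-L1 design looks slowest because every access pays the 3-cycle hit time. The comparison on real workloads depends on how much sharing lowers the miss rate and the communication cost, which this formula does not capture.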

20/36

SIMULATION ENVIRONMENT

• The SimOS simulation environment is used.

• The IRIX 5.3 operating system is simulated.

• Workloads: hand-parallelized scientific and engineering applications, compiler-parallelized scientific and engineering applications, and a multiprogramming workload.

21/36

Two kinds of simulation are performed:

I. Simple Simulation (no speculative execution, dynamic scheduling, or non-blocking memory references)

II. Dynamic Superscalar Simulation

22/36

SIMPLE SIMULATION RESULTS

(for a high degree of interprocessor communication)

EAR

23/36

EQNTOTT

24/36

(for a moderate degree of interprocessor communication)

VOLPACK

25/36

FFT Kernel

26/36

(for a low degree of interprocessor communication)

MULTIPROGRAMMING WORKLOAD

27/36

OCEAN

28/36

DYNAMIC SUPERSCALAR SIMULATION RESULTS

29/36

In the dynamic superscalar simulation, the performance of the shared-L1 cache architecture can diminish substantially, whereas the shared-L2 and shared-memory architectures retain much of the relative performance predicted by the simple simulation results.

30/36

Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing

By Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese.

ISCA 27, 2000, pp. 282-293

31/36

• Designed for online transaction processing (OLTP) systems

• Standard ASIC design technology is used

• The centerpiece of the Piranha architecture is a highly integrated processing node with eight simple Alpha processor cores, separate instruction and data caches for each core, a shared second-level cache, eight memory controllers, two coherence protocol engines, and a network router, all on a single chip.

32/36

33/36

34/36

35/36

36/36

SIMULATION
