SINGLE CHIP MULTIPROCESSORS
Computer Architecture Term Paper (11.12.2003)
Esra KIRBAŞ 2002701357

Evaluation of Design Alternatives for a Multiprocessor Microprocessor

By Basem A. Nayfeh, Lance Hammond and Kunle Olukotun.

ISCA 23, 1996, pp. 67-77.

• With advances in integrated-circuit technology, several design options for high-performance microprocessors are available.

• In the multiprocessor design option, a small number of processors are interconnected on a single chip or on a multi-chip module (MCM) substrate.

• We concentrate on single-chip multiprocessors.

Our goal is to study two proposed cache-sharing mechanisms for single chip multiprocessors:

I. Shared Level-1 (L1) Cache Architecture
II. Shared Level-2 (L2) Cache Architecture

(The performance of these two architectures will be compared with that of a single-bus-based shared-memory multiprocessor.)

•A multiprocessor architecture whose interconnect is closer to the CPUs in the memory hierarchy will be able to exploit fine-grained parallelism more efficiently than a multiprocessor architecture whose interconnect is further away from the CPUs in the memory hierarchy.

• The goal is to achieve good performance on fine-grained parallel applications without sacrificing the performance of independent jobs running in parallel.

CPU CHARACTERISTICS

• The same CPU is used in all three architectures.

• 2-way issue processor

• Dynamic scheduling

• Speculative execution

• Non-blocking caches

[Figure: instruction pipeline and functional units]

• 2-way set-associative 16 KB instruction and data caches

• 32-entry centralized instruction window

• 32-entry reorder buffer
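
As a rough illustration of this cache geometry, the Python sketch below splits an address into tag, set index and byte offset for a 16 KB, 2-way set-associative cache. The 32-byte line size is an assumption, since the slide does not give it. The helper name `split_address` is hypothetical.

```python
CACHE_BYTES = 16 * 1024   # 16 KB, from the slide
ASSOCIATIVITY = 2         # 2-way set-associative, from the slide
LINE_BYTES = 32           # ASSUMPTION: the line size is not given on the slide

# Each set holds ASSOCIATIVITY lines, so the number of sets is:
NUM_SETS = CACHE_BYTES // (ASSOCIATIVITY * LINE_BYTES)   # 256
OFFSET_BITS = LINE_BYTES.bit_length() - 1                # log2(32) = 5
INDEX_BITS = NUM_SETS.bit_length() - 1                   # log2(256) = 8


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for an address (hypothetical helper)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    tag, index, offset = split_address(0x12345678)
    print(hex(tag), index, offset)
```

Under these assumptions the example prints `0x91a2 179 24`. Two addresses that share a set index compete for that set's two ways.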

Shared L1-Cache Multiprocessor

Advantages of this Architecture:

• It provides the lowest interprocessor communication latency, because processors communicate directly through the shared L1 cache.

• Low-latency interprocessor communication helps achieve high performance on fine-grained parallel applications.

• Processors may fetch shared data into the cache for each other.

• It eliminates the need for L1 cache-coherence logic and implicitly provides sequentially consistent memory without sacrificing performance.

Disadvantages of this Architecture:

• The crossbar between the processors and the L1 cache banks increases the L1 access time (we assume an average access time of three cycles).

• All memory references go to the shared L1 cache, so bank conflicts may cause extra delays.

• If the processors are not executing fine-grained parallel applications (for example, when they run independent jobs), the miss rate may increase, because the processors compete for the same cache capacity.
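
To make the bank-conflict point concrete, here is a minimal Python sketch of a banked shared L1 in which each bank serves one request per cycle. Several parameters are assumptions for illustration, not values given in the slides: the bank count (four), the 32-byte line size, and the low-order interleaving. The names `bank_of` and `delayed_requests` are hypothetical.

```python
from collections import Counter

NUM_BANKS = 4    # ASSUMPTION: the slides do not give the shared-L1 bank count
LINE_BYTES = 32  # ASSUMPTION: line size used for bank interleaving


def bank_of(addr: int) -> int:
    """Low-order interleaving: cache line i is held in bank i % NUM_BANKS."""
    return (addr // LINE_BYTES) % NUM_BANKS


def delayed_requests(addrs: list[int]) -> int:
    """Given the references issued by all CPUs in one cycle, count how many
    must wait because another request already uses their bank this cycle."""
    per_bank = Counter(bank_of(a) for a in addrs)
    return sum(count - 1 for count in per_bank.values())


if __name__ == "__main__":
    # Four CPUs touching consecutive lines: every request hits a different bank.
    print(delayed_requests([0x000, 0x020, 0x040, 0x060]))
    # Lines 0, 4 and 8 all map to bank 0; line 1 maps to bank 1.
    print(delayed_requests([0x000, 0x080, 0x100, 0x020]))
```

The first call reports 0 delayed requests and the second reports 2. In the second call, three of the four references land in bank 0, so only one of them can be served in that cycle.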

The secondary cache and main memory are organized as in a uniprocessor system:

L2 cache (2 MB, 10-cycle latency + 2-cycle occupancy)

Main memory (50-cycle latency, 6-cycle occupancy)

Shared L2-Cache Multiprocessor

• The write-through primary caches have a 1-cycle access time.

• The L2-cache latency increases to 14 cycles because of the crossbar overhead.

• The L2 cache has four independent banks to increase its bandwidth and support four independent access streams.

• The data path is 64 bits wide.

• Occupancy is 4 cycles (for the transfer of a 32-byte cache line).
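
The 4-cycle occupancy follows from the data-path width. A 64-bit path moves 8 bytes per cycle, so a 32-byte line needs 32 / 8 = 4 transfers. The Python check below assumes one path-width transfer per cycle with no extra overhead. The helper name `transfer_cycles` is hypothetical.

```python
import math

DATA_PATH_BITS = 64  # L2 data-path width, from the slide


def transfer_cycles(line_bytes: int, path_bits: int = DATA_PATH_BITS) -> int:
    """Cycles to move one cache line if one path-width beat transfers per cycle."""
    path_bytes = path_bits // 8
    return math.ceil(line_bytes / path_bytes)


if __name__ == "__main__":
    print(transfer_cycles(32))  # 32-byte line over a 64-bit path
    print(transfer_cycles(4))   # a 32-bit (4-byte) line fits in a single beat
```

This prints 4 and then 1. A 32-bit line would occupy the path for only one cycle, so the 4-cycle figure is consistent with a 32-byte line.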

• Only memory accesses that miss in the L1 cache pay the higher latency of the shared L2 cache.

• With 1996 technology, this design could be implemented using multi-chip module (MCM) packaging.

Main memory (50-cycle latency, 6-cycle occupancy)

• To keep the primary caches coherent, a coherence protocol is needed.

• For simplicity, we assume that each primary cache uses a write-through policy for shared data.

• This requires additional hardware, for example to invalidate stale copies in the other primary caches when shared data is written.
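
One common way to build this hardware is a write-through invalidate scheme. Every store goes through to the shared L2, and any copies of the written address in the other primary caches are invalidated. The Python sketch below models this at word granularity. The class and method names are hypothetical, and the slides do not say whether invalidation or update is used.

```python
class WriteThroughL1s:
    """Toy model of private write-through L1 caches over one shared L2.

    Coherence is kept by invalidating other L1 copies on every store
    (an assumed write-through invalidate scheme, modeled per word).
    """

    def __init__(self, num_cpus: int) -> None:
        self.l2: dict[int, int] = {}
        self.l1: list[dict[int, int]] = [{} for _ in range(num_cpus)]

    def load(self, cpu: int, addr: int) -> int:
        cache = self.l1[cpu]
        if addr not in cache:  # L1 miss: fill from the shared L2 (0 if never written)
            cache[addr] = self.l2.get(addr, 0)
        return cache[addr]

    def store(self, cpu: int, addr: int, value: int) -> None:
        self.l2[addr] = value  # write through to the shared L2
        for other, cache in enumerate(self.l1):
            if other == cpu:
                if addr in cache:  # no write-allocate: update only if present
                    cache[addr] = value
            else:
                cache.pop(addr, None)  # invalidate stale copies in other L1s


if __name__ == "__main__":
    system = WriteThroughL1s(num_cpus=2)
    print(system.load(1, 0x40))   # CPU 1 caches the initial value
    system.store(0, 0x40, 7)      # CPU 0 writes; CPU 1's copy is invalidated
    print(system.load(1, 0x40))   # CPU 1 misses and re-reads the new value
```

The example prints 0 and then 7. Without the invalidation in `store`, the second load on CPU 1 would return the stale value 0 from its own L1.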

Shared Main Memory Multiprocessor

• Primary cache access time is 1 cycle.

• Secondary cache access time is 12 cycles.

• All CPUs must access main memory to communicate.

Ideal Memory Latencies of Three Architectures in CPU Clock Cycles
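
One rough way to compare these latencies is average memory access time: AMAT = L1 time + L1 miss rate × (L2 time + L2 miss rate × memory time). The Python sketch below uses the latencies quoted on the earlier slides:

- shared L1: 3/10/50 cycles
- shared L2: 1/14/50 cycles
- shared memory: 1/12 cycles, with a 50-cycle main memory assumed

The miss rates are hypothetical and the same for all three designs. Contention, occupancy and communication misses are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    name: str
    l1_cycles: float   # L1 hit time
    l2_cycles: float   # L2 hit time
    mem_cycles: float  # main-memory latency

    def amat(self, l1_miss_rate: float, l2_miss_rate: float) -> float:
        """Average memory access time in CPU cycles; contention is ignored."""
        return self.l1_cycles + l1_miss_rate * (
            self.l2_cycles + l2_miss_rate * self.mem_cycles
        )


ARCHITECTURES = [
    Hierarchy("shared L1", 3, 10, 50),
    Hierarchy("shared L2", 1, 14, 50),
    Hierarchy("shared memory", 1, 12, 50),  # ASSUMPTION: 50-cycle memory
]

if __name__ == "__main__":
    # Hypothetical miss rates, the same for every design.
    for arch in ARCHITECTURES:
        print(f"{arch.name:<14} {arch.amat(0.05, 0.20):.2f} cycles")
```

With equal miss rates, this gives 4.00, 2.20 and 2.10 cycles, so the shared-L1 design looks worst: its 3-cycle hit time applies to every reference. Any advantage it has must come from fewer or cheaper communication misses on fine-grained parallel applications, which this simple model leaves out.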

SIMULATION ENVIRONMENT

• SimOS simulation environment is used

• IRIX 5.3 operating system is simulated

• Workloads: hand-parallelized scientific and engineering applications, compiler-parallelized scientific and engineering applications, and a multiprogramming workload

Two kinds of simulation were performed:

I. Simple Simulation (no speculative execution, dynamic scheduling, or non-blocking memory references)

II. Dynamic Superscalar Simulation

SIMPLE SIMULATION RESULTS (for a high degree of interprocessor communication)

EAR

EQNTOTT

(for a moderate degree of interprocessor communication)

VOLPACK

FFT Kernel

(for a low degree of interprocessor communication)

MULTIPROGRAMMING WORKLOAD

OCEAN

DYNAMIC SUPERSCALAR SIMULATION RESULTS

In the dynamic superscalar simulation, the performance of the shared-L1 cache architecture can diminish substantially, whereas the shared-L2 and shared-memory architectures retain much of the relative performance predicted by the simple simulation results.

Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing

By Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese.

ISCA 27, 2000, pp. 282-293.

• Designed for online transaction processing (OLTP) systems

• Built with standard ASIC design technology

• The centerpiece of the Piranha architecture is a highly integrated processing node with eight simple Alpha processor cores, separate instruction and data caches for each core, a shared second-level cache, eight memory controllers, two coherence protocol engines, and a network router, all on a single chip.
