
1b.1

Types of Parallel Computers

Two principal types:

• Shared memory multiprocessor

• Distributed memory multicomputer

ITCS 4/5145 Cluster Computing, UNC-Charlotte, B. Wilkinson, 2006.

1b.2

Shared Memory Multiprocessor

1b.3

Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

Each main memory location is located by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.
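The address range can be checked with a few lines of Python. This is a minimal sketch; the function name `address_range` is made up for illustration and does not come from the slides.

```python
def address_range(b):
    """Return (lowest, highest) address for a memory addressed with b bits."""
    if b < 1:
        raise ValueError("need at least one address bit")
    # b bits give 2**b distinct addresses, numbered from 0
    return 0, 2**b - 1


low, high = address_range(16)
print(f"16 address bits: addresses {low} to {high}")
```

For example, 16 address bits give 65536 locations, the last one at address 65535.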

[Figure: processor connected to main memory; instructions go to the processor, data goes to or from the processor]

1b.4

Shared Memory Multiprocessor System

Natural way to extend the single processor model - have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

[Figure: processors connected through processor-memory interconnections to memory modules that form one address space]

1b.5

Simplistic view of a small shared memory multiprocessor

Examples:

• Dual Pentiums

• Quad Pentiums

[Figure: processors connected to a shared memory by a bus]

1b.6

Real computer systems have cache memory between the main memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.

Example Quad Shared Memory Multiprocessor

[Figure: four processors, each with an L1 cache, an L2 cache, and a bus interface, on a processor/memory bus, together with a memory controller, the shared memory, an I/O interface, and an I/O bus]

1b.7

“Recent” innovation

• Dual-core and multi-core processors

• Two or more independent processors in one package

• Actually an old idea but not put into wide practice until recently.

• Since L1 cache is usually inside package and L2 cache outside package, dual-/multi-core processors usually share L2 cache.

1b.8

Example

• Dual core Pentiums (Intel Core™2 Duo processors) -- Two processors in one package sharing a common L2 cache. Introduced April 2005. (Also hyper-threaded)

• Xbox 360 game console -- triple core PowerPC microprocessor.

• PlayStation 3 Cell processor -- 9 core design.

References and more information:

http://www.intel.com/products/processor/core2duo/index.htm

http://en.wikipedia.org/wiki/Dual_core

1b.9

Programming Shared Memory Multiprocessors

Several possible ways

1. Use threads - programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.

Example: Pthreads

2. Use library functions and preprocessor compiler directives with a sequential programming language to declare shared variables and specify parallelism.

Example: OpenMP - industry standard. Consists of library functions, compiler directives, and environment variables - needs an OpenMP compiler.

1b.10

3. Use a modified sequential programming language -- added syntax to declare shared variables and specify parallelism.

Example UPC (Unified Parallel C) - needs a UPC compiler.

4. Use a specially designed parallel programming language -- with syntax to express parallelism. Compiler automatically creates executable code for each processor (not now common).

5. Use a regular sequential programming language such as C and ask a parallelizing compiler to convert it into parallel executable code. Also not now common.
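The first approach in the list above (threads sharing variables declared outside them) can be sketched as follows. This is an illustration only: it uses Python's standard `threading` module as a stand-in for Pthreads, the names `total`, `partial_sum`, and `parallel_sum` are made up, and in CPython the threads show the programming model rather than a real speedup.

```python
import threading

total = 0                      # shared variable, declared outside the threads
total_lock = threading.Lock()  # protects updates to the shared variable


def partial_sum(values):
    """Thread body: add up one slice, then update the shared total."""
    global total
    local = sum(values)        # private to this thread
    with total_lock:           # only one thread at a time updates shared data
        total += local


def parallel_sum(data, num_threads=4):
    """Decompose the summation into up to num_threads threads."""
    global total
    total = 0
    # Slice size, rounded up; at least 1 so that an empty list works
    chunk = max(1, (len(data) + num_threads - 1) // num_threads)
    threads = [
        threading.Thread(target=partial_sum, args=(data[i:i + chunk],))
        for i in range(0, len(data), chunk)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for every thread to finish
    return total


print(parallel_sum(list(range(1, 101))))
```

The lock is the important design point: without it, two threads could read and write `total` at the same time and lose an update.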

1b.11

Message-Passing Multicomputer

Complete computers connected through an interconnection network:

[Figure: computers, each with a processor and local memory, exchanging messages through an interconnection network]

1b.12

Interconnection Networks

• Limited and exhaustive interconnections

• 2- and 3-dimensional meshes

• Hypercube (not now common)

• Using switches:

– Crossbar

– Trees

– Multistage interconnection networks

1b.13

Two-dimensional array (mesh)

Also three-dimensional - used in some large high performance systems.

[Figure: two-dimensional mesh of computers/processors joined by links]
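The links of a two-dimensional mesh can be listed with a short sketch. It assumes a mesh without wraparound links at the edges (the slide does not say either way), and the function name `mesh_neighbors` is made up for illustration.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the (row, col) positions directly linked to a node
    in a rows x cols mesh with no wraparound links."""
    candidates = [
        (row - 1, col),  # up
        (row + 1, col),  # down
        (row, col - 1),  # left
        (row, col + 1),  # right
    ]
    # Keep only positions that fall inside the mesh
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


print(mesh_neighbors(0, 0, 4, 4))  # corner node: two links
print(mesh_neighbors(1, 1, 4, 4))  # interior node: four links
```

A three-dimensional mesh would add two more candidates along the third axis.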

1b.14

Three-dimensional hypercube

[Figure: three-dimensional hypercube with nodes labeled 000 to 111]

1b.15

Four-dimensional hypercube

Hypercubes were popular in the 1980s but are not now.

[Figure: four-dimensional hypercube with nodes labeled 0000 to 1111]
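In a hypercube with binary node labels, two nodes are linked when their labels differ in exactly one bit. The sketch below uses that property to list neighbors and to build a path by correcting one differing bit per hop. The function names and the low-bit-first order are choices made for illustration, not taken from the slides.

```python
def hypercube_neighbors(node, d):
    """Return the d neighbors of a node in a d-dimensional hypercube."""
    # Flipping bit i gives the neighbor along dimension i
    return [node ^ (1 << bit) for bit in range(d)]


def hypercube_route(src, dst, d):
    """Return one path from src to dst, correcting differing bits
    from the lowest bit to the highest."""
    path = [src]
    current = src
    for bit in range(d):
        mask = 1 << bit
        if (current ^ dst) & mask:   # labels differ in this bit
            current ^= mask          # move along this dimension
            path.append(current)
    return path


print([format(n, "03b") for n in hypercube_neighbors(0b000, 3)])
print([format(n, "03b") for n in hypercube_route(0b000, 0b111, 3)])
```

The number of hops equals the number of bits in which source and destination differ, so it is never more than d.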

1b.16

Crossbar switch

[Figure: crossbar switch, with switches connecting processors to memories]

1b.17

Tree

[Figure: tree network showing the root, switch elements, links, and processors]

1b.18

Multistage Interconnection Network

Example: Omega network

[Figure: Omega network connecting inputs 000 to 111 to outputs 000 to 111 through 2 × 2 switch elements (straight-through or crossover connections)]
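One common way to set the switches of an Omega network is destination-tag routing. The slide does not describe a routing scheme, so the sketch below is an assumption: the lines are perfect-shuffled before each stage, and each 2 × 2 switch sends the message to its upper or lower output according to one bit of the destination address, most significant bit first. The function name `omega_route` is made up.

```python
def omega_route(src, dst, n_bits=3):
    """Route from input src to output dst in a 2**n_bits Omega network.

    Returns (settings, final_position), where settings holds the
    switch setting ("straight" or "crossover") used at each stage.
    """
    size = 1 << n_bits
    position = src
    settings = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line number left by one bit
        position = ((position << 1) | (position >> (n_bits - 1))) & (size - 1)
        # Destination bit for this stage, most significant first
        dest_bit = (dst >> (n_bits - 1 - stage)) & 1
        # Low bit = which switch input we arrive on; dest_bit = output wanted
        settings.append("straight" if (position & 1) == dest_bit else "crossover")
        position = (position & ~1) | dest_bit
    return settings, position


settings, output = omega_route(0b001, 0b110)
print(settings, format(output, "03b"))
```

After the last stage the line number equals the destination, because each stage shifts in one more destination bit.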

1b.19

Networked Computers as a Computing Platform

• A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

• Several early projects. Notable:

– Berkeley NOW (network of workstations) project.

– NASA Beowulf project.

1b.20

Key advantages:

• Very high performance workstations and PCs readily available at low cost.

• The latest processors can easily be incorporated into the system as they become available.

• Existing software can be used or modified.

1b.21

Beowulf Clusters*

• A group of interconnected “commodity” computers achieving high performance with low cost.

• Typically using commodity interconnects - high speed Ethernet, and Linux OS.

* The name Beowulf comes from the NASA Goddard Space Flight Center cluster project.

1b.22

Cluster Interconnects

• Originally Fast Ethernet on low cost clusters

• Gigabit Ethernet - easy upgrade path

More Specialized/Higher Performance

• Myrinet - 2.4 Gbits/sec - disadvantage: single vendor

• cLan

• SCI (Scalable Coherent Interface)

• QNet

• Infiniband - may be important as Infiniband interfaces may be integrated on next generation PCs

1b.23

Dedicated cluster with a master node and compute nodes

[Figure: dedicated cluster showing a user on the external network, a master node with an Ethernet interface, and a switch on the local network linking the compute nodes (computers)]

1b.24

Software Tools for Clusters

• Based upon Message Passing Parallel Programming:

• Parallel Virtual Machine (PVM) - developed in the late 1980s. Became very popular.

• Message-Passing Interface (MPI) - standard defined in 1990s.

• Both provide a set of user-level libraries for message passing. Use with regular programming languages (C, C++, ...).
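The message-passing style can be illustrated with a small master/worker sketch. It does not use PVM or MPI: as a stand-in, threads play the role of the compute nodes and communicate only through queues (send = `put`, receive = `get`). On a real cluster each node would be a separate process with its own memory. All names here are made up for illustration.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Compute node: receive a message, compute, send the result back."""
    numbers = inbox.get()                 # blocking receive from the master
    outbox.put((rank, sum(numbers)))      # send the partial result


def master(data, num_workers=3):
    """Master node: send one slice to each worker, then collect replies."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    workers = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(num_workers)
    ]
    for w in workers:
        w.start()
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::num_workers])  # send every num_workers-th item
    # Receive one (rank, partial sum) message per worker, in any order
    partials = dict(results.get() for _ in range(num_workers))
    for w in workers:
        w.join()
    return sum(partials.values())


print(master(list(range(1, 11))))
```

No worker touches another worker's data; everything it needs arrives in a message, which is the property that lets the same structure run on separate computers.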
