Parallel Architectures: Topologies Heiko Schröder, 2003

Parallel Architectures: Topologies

Heiko Schröder, 2003

Types of sequential processors (SISD)

[Figure: block diagrams of processor and memory arrangements; labels: processor, memory, cache, Von Neumann bottleneck]

SIMD / MIMD

[Figure: SIMD: processing elements (PE) under a global control unit, joined by an interconnection network. MIMD: each PE has its own control unit (PE + control unit), joined by an interconnection network. Further labels: SPMD, SIMD]

Message passing / shared address space

[Figure: two block diagrams. One shows nodes labelled "PE + M, control unit" joined by an interconnection network; the other shows processors (P) and memories (M) joined by an interconnection network. Further label: P/M]

• Various communication networks
• State of the art technology
• Important aspects of routing schemes
• Known results (theory)

The internet

Desirable features of a network

1. Algorithmic
• Low diameter (1 for the complete graph)
• High bisection width (complete graph)
Complete graph: n(n-1)/2 edges, degree n-1

2. Technical
• Low degree (pin limitations, constant, modular: mesh)
• Short wires (mesh)
• Small area (mesh)
• Regular structure (mesh)
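The figures quoted for the complete graph can be checked on a small instance. The sketch below is only an illustration; the helper names (`complete_graph_edges`, `degrees`, `cut_size`) and the choice n = 8 are assumptions, not part of the slides.

```python
from itertools import combinations

def complete_graph_edges(n):
    """All edges of the complete graph on nodes 0..n-1."""
    return list(combinations(range(n), 2))

def degrees(n, edges):
    """Degree of every node, given an edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def cut_size(edges, half):
    """Number of edges with exactly one endpoint in the node set `half`."""
    return sum((u in half) != (v in half) for u, v in edges)

n = 8
edges = complete_graph_edges(n)
print(len(edges))                            # n(n-1)/2 = 28
print(set(degrees(n, edges)))                # {7}: every node has degree n-1
print(cut_size(edges, set(range(n // 2))))   # (n/2)^2 = 16 edges cross an even split
```

The price of diameter 1 and the large bisection width is the degree n-1, which is exactly what the technical criteria rule out.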

Connection networks I

1-D mesh (linear array)
Diameter: n-1
Bisection width: 1

Tree
Diameter: 2 (log n)
Bisection width: 1
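A small sketch of both tree figures, assuming a complete binary tree numbered in heap order (node v has parent v // 2); the function names are illustrative only. The diameter is measured by breadth-first search, and the bisection width of 1 shows up as a single edge below the root whose removal leaves two nearly equal parts.

```python
from collections import deque

def binary_tree(height):
    """Adjacency lists of a complete binary tree, nodes 1..2^(height+1)-1 in heap order."""
    n = 2 ** (height + 1) - 1
    adj = {v: [] for v in range(1, n + 1)}
    for v in range(2, n + 1):
        adj[v].append(v // 2)
        adj[v // 2].append(v)
    return adj

def eccentricity(adj, start):
    """Largest BFS distance from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)

def subtree_size(adj, root, parent):
    """Number of nodes reachable from `root` without passing through `parent`."""
    return 1 + sum(subtree_size(adj, w, root) for w in adj[root] if w != parent)

tree = binary_tree(4)                 # 31 nodes
print(diameter(tree))                 # 8 = 2 * height, leaf to leaf via the root
left = subtree_size(tree, 2, 1)       # nodes below the edge (1, 2)
print(left, len(tree) - left)         # 15 16: one cut edge gives a balanced split
```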

H-tree

Area: O(n)
Longest wire: O(√n)

Clock distribution

2-D Mesh

Diameter: Θ(√n) (2(√n - 1) hops from corner to corner)
Bisection width: Θ(√n)

Torus

[Figure: a ring of nodes 1 2 3 4 5 6 7 8 laid out in the interleaved order 1 8 2 7 3 6 4 5]

• Reduced diameter
• Increased bisection width
• All nodes equivalent
• Long wires?
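The interleaved order 1 8 2 7 3 6 4 5 in the figure answers the "long wires?" question for one ring of the torus. A minimal sketch, assuming nodes sit in equally spaced slots along a line and wire length is the slot distance (`folded_order` and `longest_wire` are hypothetical names):

```python
def folded_order(n):
    """Interleave the ring 1..n from both ends: 1, n, 2, n-1, ..."""
    order = []
    lo, hi = 1, n
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo += 1
        hi -= 1
    return order

def longest_wire(order):
    """Longest ring edge (v, v+1 mod n), measured in layout slots."""
    n = len(order)
    slot = {node: i for i, node in enumerate(order)}
    return max(abs(slot[v] - slot[v % n + 1]) for v in range(1, n + 1))

print(folded_order(8))                  # [1, 8, 2, 7, 3, 6, 4, 5]
print(longest_wire(list(range(1, 9))))  # 7: the wrap-around wire in the natural order
print(longest_wire(folded_order(8)))    # 2: no wire spans more than two slots
```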

3-D Mesh

Diameter: Θ(∛n)
Bisection: Θ(n^(2/3))
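For a mesh with `side` nodes along each of `dims` dimensions (n = side^dims), the diameter is dims * (side - 1) and the cut through the middle crosses side^(dims-1) edges, which gives the √n and n^(2/3) orders for 2-D and 3-D. The sketch below counts this directly; the function names and the example sizes are assumptions made for illustration.

```python
from itertools import product

def mesh_diameter(side, dims):
    """Corner-to-corner distance: (side - 1) hops in each dimension."""
    return dims * (side - 1)

def middle_cut(side, dims):
    """Edges crossing the cut that halves the first coordinate (side must be even)."""
    half = side // 2
    # every node with first coordinate half-1 has exactly one edge across the cut
    return sum(1 for node in product(range(side), repeat=dims) if node[0] == half - 1)

for dims, side in ((2, 8), (3, 4)):   # both meshes have n = 64 nodes
    print(f"{dims}-D mesh, n = {side ** dims}: "
          f"diameter {mesh_diameter(side, dims)}, middle cut {middle_cut(side, dims)} edges")
```

With the same 64 nodes, the 3-D mesh has the smaller diameter (9 versus 14) and the larger cut (16 versus 8).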

Hypercube

[Figure: hypercubes of dimension 0 to 4 (0-D, 1-D, 2-D, 3-D, 4-D) with binary node labels: 0, 1; 00, 01, 10, 11; 000 to 111; labels 0 and 1 on the 4-D cube]

Diameter: log n
Bisection width: n/2
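Both hypercube figures follow from the bit-flip view of the edges: two nodes are adjacent when their labels differ in one bit, so distance is Hamming distance. A minimal sketch with node labels as integers (the helper names are assumptions):

```python
def neighbours(node, dims):
    """Flip each of the dims address bits in turn."""
    return [node ^ (1 << d) for d in range(dims)]

def distance(u, v):
    """Shortest-path length in the hypercube = Hamming distance of the labels."""
    return bin(u ^ v).count("1")

def hypercube_diameter(dims):
    # all nodes are equivalent, so it suffices to measure from node 0
    return max(distance(0, v) for v in range(1 << dims))

def top_dimension_cut(dims):
    """Edges joining the two (dims-1)-dimensional sub-cubes."""
    top = 1 << (dims - 1)
    return sum(1 for u in range(1 << dims)
               for v in neighbours(u, dims) if u < v and (u ^ v) == top)

dims = 4                               # n = 16 nodes
print(hypercube_diameter(dims))        # 4 = log n
print(top_dimension_cut(dims))         # 8 = n/2
```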

Cube Connected Cycles

# nodes: k * 2^k
k = 4: 4 * 2^4 = 64 nodes
k = 8: 8 * 2^8 = 2048 nodes

Diameter > 2k
Bisection: 2^(k-1)
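A sketch of the usual construction, assuming each hypercube node is replaced by a cycle of k nodes and the node at cycle position p keeps the cube edge of dimension p. The node naming `(cube_address, position)` and the function names are illustrative choices.

```python
def ccc(k):
    """Adjacency sets of the cube-connected cycles; node = (cube_address, position)."""
    adj = {}
    for x in range(1 << k):
        for p in range(k):
            adj[(x, p)] = {
                (x, (p + 1) % k),     # cycle edge
                (x, (p - 1) % k),     # cycle edge
                (x ^ (1 << p), p),    # cube edge along dimension p
            }
    return adj

def top_cut(k):
    """Edges between the halves with top address bit 0 and 1."""
    top = 1 << (k - 1)
    return sum(1 for (x, p), nbs in ccc(k).items() if not x & top
               for (y, q) in nbs if y & top)

for k in (4, 8):
    g = ccc(k)
    print(k, len(g), {len(nbs) for nbs in g.values()})   # node count k * 2^k, all degrees 3
print(top_cut(4))                                         # 2^(k-1) = 8 cube edges cross
```

The degree stays at 3 for every k >= 3, which is the point of the construction compared with the hypercube's degree log n.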

8-node shuffle-exchange graph

Edges: exchange (flip the lsb) and shuffle (rotate left or right)

[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111]

Degree: 3
Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
Bisection width: Θ(n / log n)
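The degree and diameter can be checked by building the graph and running a breadth-first search from every node. This is a sketch under the assumption that nodes are k-bit integers and that the self-loops at 00..0 and 11..1 are ignored; the function names are not from the slides.

```python
from collections import deque

def shuffle_exchange(k):
    """Adjacency sets of the 2^k-node shuffle-exchange graph."""
    n = 1 << k
    mask = n - 1
    adj = {}
    for v in range(n):
        left = ((v << 1) | (v >> (k - 1))) & mask   # shuffle: rotate left
        right = (v >> 1) | ((v & 1) << (k - 1))     # shuffle: rotate right
        exchange = v ^ 1                            # exchange: flip the lsb
        adj[v] = {left, right, exchange} - {v}      # drop self-loops
    return adj

def diameter(adj):
    """Largest BFS distance over all start nodes."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst

for k in (3, 4):
    g = shuffle_exchange(k)
    print(1 << k, "nodes, max degree", max(len(nbs) for nbs in g.values()),
          "diameter", diameter(g), "bound 2 log n - 1 =", 2 * k - 1)
```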

16-node shuffle-exchange graph

Edges: exchange (flip the lsb) and shuffle (rotate left or right)

[Figure: nodes 0000 to 1111]

Routing from u1u2…uk to v1v2…vk (ex = exchange, ls = left shuffle):
u1u2…uk-1uk
→ ex → u1u2…uk-1v1
→ ls+ex → u2…uk-1v1v2
→ …
→ ls+ex → v1v2…vk

Degree: 3
Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
Bisection width: Θ(n / log n)
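The diameter bound comes from a simple routing scheme: write the target bits into the least significant position one at a time, rotating left in between. A minimal sketch on bit strings follows; `route` is a hypothetical name, and the path it returns is within the bound but not necessarily the shortest.

```python
def rotate_left(bits):
    return bits[1:] + bits[:1]

def route(source, target):
    """Nodes visited from source to target (equal-length bit strings)."""
    k = len(source)
    path = [source]
    current = source
    for i in range(k):
        if i > 0:
            rotated = rotate_left(current)            # shuffle
            if rotated != current:                    # skip the self-loops
                path.append(rotated)
            current = rotated
        if current[-1] != target[i]:
            current = current[:-1] + target[i]        # exchange: set the lsb
            path.append(current)
    return path

print(route("000", "111"))   # ['000', '001', '010', '011', '110', '111']
```

The example uses log n = 3 exchanges and log n - 1 = 2 shuffles, i.e. 2 log n - 1 = 5 hops.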

3-dimensional de Bruijn graph

Edges:
u1u2…uk-1uk → u2u3…uk-1uk0 (edge label 0)
u1u2…uk-1uk → u2u3…uk-1uk1 (edge label 1)

In-degree = out-degree = 2
Diameter: log n
Bisection width: Θ(n / log n)

Each Eulerian tour = De Bruijn sequence = contains each possible sub-string of length 4 exactly once
De Bruijn sequence: 1111001011010000

[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111 with edges labelled 0 or 1]
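The link between Eulerian tours and de Bruijn sequences can be tried out directly. The sketch below checks the sequence quoted on the slide and builds another one with Hierholzer's algorithm, which is one standard way to find an Eulerian tour (the slides do not say which method is meant). Function names are assumptions.

```python
def is_de_bruijn(seq, length):
    """True if every bit string of the given length occurs exactly once, cyclically."""
    doubled = seq + seq
    windows = {doubled[i:i + length] for i in range(len(seq))}
    return len(seq) == 2 ** length and len(windows) == 2 ** length

def de_bruijn_sequence(k):
    """Edge labels along an Eulerian tour of the k-dimensional de Bruijn graph."""
    n = 1 << k
    mask = n - 1
    unused = {v: [0, 1] for v in range(n)}    # outgoing edge labels not yet taken
    stack, tour = [0], []
    while stack:                               # Hierholzer's algorithm, iterative
        v = stack[-1]
        if unused[v]:
            bit = unused[v].pop()
            stack.append(((v << 1) | bit) & mask)   # edge u1..uk -> u2..uk bit
        else:
            tour.append(stack.pop())
    tour.reverse()                             # closed walk that starts and ends at node 0
    return "".join(str(w & 1) for w in tour[1:])

print(is_de_bruijn("1111001011010000", 4))    # True: the sequence from the slide
seq = de_bruijn_sequence(3)
print(seq, is_de_bruijn(seq, 4))
```

Each edge of the 3-dimensional graph stands for one 4-bit string (the 3 bits of its start node followed by its label), so a tour that uses every edge once lists every 4-bit string once.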

Butterfly network

• Unique path
• FFT
• routing
• sorting
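The unique-path property can be sketched as follows, assuming one common convention for the k-dimensional butterfly: nodes are (level, row) pairs, and between level i and level i+1 there is a straight edge (same row) and a cross edge (row with bit i flipped). The slides do not fix the bit order, so this numbering is an assumption.

```python
def butterfly_path(source_row, target_row, k):
    """Path from (level 0, source_row) to (level k, target_row)."""
    path = [(0, source_row)]
    row = source_row
    for level in range(k):
        bit = 1 << level
        if (row ^ target_row) & bit:
            row ^= bit                    # take the cross edge to fix this bit
        path.append((level + 1, row))     # otherwise take the straight edge
    return path

print(butterfly_path(0b000, 0b101, 3))    # [(0, 0), (1, 1), (2, 1), (3, 5)]
```

Bit i of the row can only change between levels i and i+1, so the choice at every level is forced and the path is unique.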

Benes network

Mesh of trees

Diameter: Θ(log n)
Bisection width: Θ(√n)

The Power of Hypercubes

[Figure: 4-D hypercube]

• Hamiltonian cycle
• Gray codes
• k-D meshes (tori), N-nodes
• simulates mesh of trees
• simulates hypercubic networks
• contains complete binary tree, almost
• normal algorithms

Hamiltonian Cycle

A hypercube contains a Hamiltonian cycle -- proof by induction.

Each Hamiltonian cycle corresponds to a Gray code (only one bit is changed per link).

Gray code

0 1
00 01 11 10
000 001 011 010 110 111 101 100

reflection
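The reflection step mirrors the induction proof for the Hamiltonian cycle: take the (k-1)-bit code with a leading 0, then the same code in reverse order with a leading 1. A minimal sketch (function names are illustrative):

```python
def gray_code(k):
    """Reflected Gray code on k bits, as a list of bit strings."""
    if k == 0:
        return [""]
    prev = gray_code(k - 1)
    return ["0" + w for w in prev] + ["1" + w for w in reversed(prev)]

def is_hamiltonian_cycle(words):
    """Every k-bit label occurs once and cyclic neighbours differ in exactly one bit."""
    k = len(words[0])
    if len(set(words)) != 2 ** k or len(words) != 2 ** k:
        return False
    return all(sum(a != b for a, b in zip(w, words[(i + 1) % len(words)])) == 1
               for i, w in enumerate(words))

print(gray_code(3))                        # 000 001 011 010 110 111 101 100
print(is_hamiltonian_cycle(gray_code(4)))  # True
```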

Hypercube contains meshes/tori

[Figure: 16 nodes labelled with coordinate pairs 00 to 33; wrap-around edges marked]

Theorem: Any n1 x n2 x … x nk mesh (with or without wrap-arounds) is a sub-graph of an n-D hypercube if n1 * n2 * … * nk = 2^n.
Proof: see Leighton (each sub-cube has a Hamiltonian cycle).
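For the 2-D case the embedding can be written down explicitly: Gray-code each coordinate and concatenate the bits. The sketch below checks that every torus edge, including the wrap-arounds, lands on a hypercube edge; it assumes both side lengths are powers of two with at least one bit each, and the function names are not from the slides.

```python
def gray(i):
    """i-th word of the reflected Gray code, as an integer."""
    return i ^ (i >> 1)

def embed(row, col, col_bits):
    """Hypercube address of torus node (row, col)."""
    return (gray(row) << col_bits) | gray(col)

def torus_is_subgraph(row_bits, col_bits):
    """Check a 2^row_bits x 2^col_bits torus against the (row_bits + col_bits)-D hypercube."""
    rows, cols = 1 << row_bits, 1 << col_bits
    addresses = {embed(r, c, col_bits) for r in range(rows) for c in range(cols)}
    if len(addresses) != rows * cols:          # the mapping must be one-to-one
        return False
    for r in range(rows):
        for c in range(cols):
            here = embed(r, c, col_bits)
            for nr, nc in (((r + 1) % rows, c), (r, (c + 1) % cols)):
                if bin(here ^ embed(nr, nc, col_bits)).count("1") != 1:
                    return False               # not a hypercube edge
    return True

print([gray(i) for i in range(4)])   # [0, 1, 3, 2]
print(torus_is_subgraph(2, 2))       # True: 4 x 4 torus inside the 4-D hypercube
```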

Hypercube contains double-rooted trees

HC can implement all tree algorithms and also all mesh-of-tree-algorithms (possibly with minor delay).

double-roots (different dimension)

Normal algorithms

A hypercube algorithm is said to be normal if
• only one dimension of hypercube edges is used at any step and
• consecutive dimensions are used in consecutive steps.

• Most hypercube algorithms are normal.
• Normal algorithms can be embedded efficiently on hypercubic networks.
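A global sum is a simple example of a normal algorithm: step d uses only the edges of dimension d, and the dimensions are visited in the order 0, 1, …, k-1. The sketch simulates all nodes in one list indexed by node address; it is an illustration under those assumptions, not a message-passing program.

```python
def hypercube_sum(values):
    """All-to-all sum on a simulated hypercube; values[v] is the value held by node v."""
    n = len(values)
    k = n.bit_length() - 1
    if n != 1 << k:
        raise ValueError("need 2^k values, one per hypercube node")
    data = list(values)
    for d in range(k):                        # step d: only dimension-d edges are used
        data = [data[v] + data[v ^ (1 << d)] for v in range(n)]
    return data

print(hypercube_sum([0, 1, 2, 3, 4, 5, 6, 7]))   # every node ends with 28 after log n steps
```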

[Figure: 32-node graph with nodes numbered 0 to 31 and edges labelled 1 or 2]

Josephus graph: every even node k is connected to k+2i-3
Diameter: about (log n) / 2

Star graph

[Figure: star graph on the 24 permutations of 1234]

Set of nodes: k! nodes of degree k-1 (permutations of k elements).

Set of edges: exchange of the first element with one other.

Small degree, diameter about 2 log n.

Open problems: e.g. are there (k-1)/2 edge-disjoint Hamiltonian cycles?

Number of nodes versus degree (Star / HC):
Star (k!): 24, 120, 720, 5040, 40320, 362880
HC (2^k): 16, 32, 64, 128, 256, 512
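The node and edge definitions translate directly into code. The sketch below builds the star graph implicitly, measures its diameter by breadth-first search from one node (all nodes are equivalent), and prints k! next to 2^k for k = 4 … 9. Function names are assumptions for illustration.

```python
from collections import deque
from math import factorial

def star_neighbours(perm):
    """Exchange the first element with each of the other k-1 positions."""
    result = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        result.append(tuple(p))
    return result

def star_diameter(k):
    """BFS from the identity permutation; by symmetry its eccentricity is the diameter."""
    start = tuple(range(1, k + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in star_neighbours(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    assert len(dist) == factorial(k)      # all k! permutations are reachable
    return max(dist.values())

for k in range(4, 10):
    print(k, "star:", factorial(k), "nodes, degree", k - 1,
          "| hypercube:", 2 ** k, "nodes, degree", k)
print(star_diameter(4))                   # 4 for the 24-node graph in the figure
```

The measured values match the known closed form ⌊3(k-1)/2⌋ for the star-graph diameter.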

pin limitations

[Figure labels: 14-D, 12, 192, 16, 256, 16]

wiring limitations

[Figure labels: 4-D, 12, 1]

2^16 nodes
bisection width: 256 (25 cm), 32 K (32 m)
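The two bisection widths match a 2-D mesh (√n = 256) and a hypercube (n/2 = 32 K) on 2^16 nodes. The arithmetic below reproduces them and, under the assumption of roughly 1 mm of space per wire (a guess that happens to fit the 25 cm and 32 m figures, not a value stated on the slide), the width of the wire bundle crossing the bisection.

```python
from math import isqrt

WIRE_PITCH_MM = 1.0   # assumed space per wire; not given on the slide

def mesh_bisection(nodes):
    """Bisection width of a square 2-D mesh: sqrt(n)."""
    return isqrt(nodes)

def hypercube_bisection(nodes):
    """Bisection width of a hypercube: n/2."""
    return nodes // 2

nodes = 2 ** 16
for name, width in (("2-D mesh", mesh_bisection(nodes)),
                    ("hypercube", hypercube_bisection(nodes))):
    print(name, width, "wires, about", width * WIRE_PITCH_MM / 1000, "m across")
```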

Improve the topology?

The internet

Against parallelism

• cost(large) < cost(2 small)

• all the FORTRAN / C software

• let’s stick to pipelining

• let’s wait for faster machines

• Amdahl’s Law
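For reference, Amdahl's Law in its usual form: if a fraction s of the work is inherently serial, p processors give a speedup of 1 / (s + (1 - s) / p). A short sketch with illustrative numbers (the 10 % serial fraction is an example, not a figure from the slides):

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for p in (10, 100, 1000):
    print(p, "processors:", round(amdahl_speedup(0.1, p), 2))
```

With a 10 % serial part the speedup stays below 10 no matter how many processors are added, which is the argument this bullet alludes to.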