Computer Architecture
Distributed Memory MIMD Architectures
Ola Flygt
Växjö University
http://w3.msi.vxu.se/users/ofl/
[email protected]
+46 470 70 86 49
Outline
Definition and Design Space
Computational Model
Granularity
Node organization
Interconnection network
  Topology
  Switching
  Routing
CH01
Multicomputers
Distributed Memory MIMD systems are often called Multicomputers
They may or may not have a virtually shared address space
They are typically more loosely coupled than Shared Memory MIMDs
Design space of Multicomputers
Computational Model
In theory any computational model may be used
In practice some have been implemented
  Conventional + communication
  CSP (Communicating Sequential Processes)
  Dataflow or actor based object oriented
Today almost only the conventional model is used
Granularity
As before, granularity is a parameter that applies both to node size and to how finely the problem is partitioned
The two are of course interrelated, and as before we have some options
  Fine grained
  Medium grained
  Coarse grained
Generic Node Architecture
Generic Organization Model Of The Message-Passing Multicomputers
1st generation
Decentralized 2nd generation
Centralized 2nd generation
3rd generation
Classification Of Multicomputers
Main Network Topologies
Static Network parameters (N = number of nodes, log = logarithm to base 2)

Topology              Node degree    Diameter         Bisection width  Arc connectivity  Cost (no. of links)
Linear array          1 or 2         N-1              1                1                 N-1
Ring                  2              N/2              2                2                 N
Star                  1 or N-1       2                1                1                 N-1
Binary tree           1, 2 or 3      2 log((N+1)/2)   1                1                 N-1
2-D mesh              2, 3 or 4      2(N^(1/2)-1)     N^(1/2)          2                 2(N-N^(1/2))
2-D wraparound mesh   4              N^(1/2)          2 N^(1/2)        4                 2N
3-D cube              3, 4, 5 or 6   3(N^(1/3)-1)     N^(2/3)          3                 3(N-N^(2/3))
Hypercube             log N          log N            N/2              log N             (N log N)/2
Completely connected  N-1            1                N^2/4            N-1               N(N-1)/2
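The closed-form entries for diameter and cost can be checked on small instances. The following is a minimal sketch, not part of the original slides: the helper names (ring, mesh2d, hypercube, diameter, link_count) are made up for illustration, and the networks are represented as plain adjacency dictionaries.

```python
from collections import deque
from itertools import product

def ring(n):
    """Ring of n nodes (n >= 3), node i linked to i-1 and i+1 modulo n."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def mesh2d(k):
    """k x k mesh without wraparound; nodes are (x, y) tuples."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        adj[(x, y)] = {(a, b) for a, b in nbrs if 0 <= a < k and 0 <= b < k}
    return adj

def hypercube(d):
    """d-dimensional hypercube; neighbours differ in exactly one address bit."""
    return {v: {v ^ (1 << b) for b in range(d)} for v in range(2 ** d)}

def eccentricity(adj, src):
    """Largest hop distance from src to any node, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path in the network."""
    return max(eccentricity(adj, s) for s in adj)

def link_count(adj):
    """Cost in the sense of the table: number of bidirectional links."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

# All three examples have N = 16 nodes.
for name, net in (("ring", ring(16)), ("2-D mesh", mesh2d(4)), ("hypercube", hypercube(4))):
    print(name, "diameter:", diameter(net), "links:", link_count(net))
```

For N = 16 the measured values agree with the formulas N/2 and N for the ring, 2(N^(1/2)-1) and 2(N-N^(1/2)) for the 2-D mesh, and log N and (N log N)/2 for the hypercube.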
Design Space of Switching techniques
Packet Switching: arrangement
Packet Switching: latency
Packet Switching
All the messages are divided into packets which are sent independently via the communication network between the source and destination nodes
The messages are transmitted in a store-and-forward fashion (each byte contained in a message has to be stored at each node along the route before it is forwarded to the next hop)
A packet consists of a header and the data. The header contains the routing information on which the switching unit bases its decision about where to forward the packet
When a packet arrives at an intermediate node, the whole packet is stored in a packet buffer
Main drawback: latency is proportional to the length of the message path. (This is why the diameter was the most important parameter in first-generation multicomputers, and why the hypercube was so popular.)
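To make the proportionality concrete, here is a small sketch under a simplified model that is an assumption of this example, not a formula taken from the slides: every link has the same bandwidth, each hop retransmits the whole packet, and queueing and routing delays are ignored. The packet size, bandwidth and node count are arbitrary illustration values.

```python
import math

def store_and_forward_latency(packet_bits, bandwidth_bps, hops):
    """Assumed model: the whole packet is sent once per link, one link after another."""
    return hops * packet_bits / bandwidth_bps

def hypercube_diameter(n_nodes):
    """Diameter log N of a hypercube (n_nodes must be a power of two)."""
    return int(math.log2(n_nodes))

def mesh2d_diameter(n_nodes):
    """Diameter 2(sqrt(N) - 1) of a square 2-D mesh (n_nodes must be a perfect square)."""
    return 2 * (math.isqrt(n_nodes) - 1)

N = 1024              # hypothetical machine size
PACKET_BITS = 4096    # hypothetical packet size
BANDWIDTH = 10e6      # hypothetical link bandwidth in bits per second

for name, hops in (("hypercube", hypercube_diameter(N)), ("2-D mesh", mesh2d_diameter(N))):
    worst = store_and_forward_latency(PACKET_BITS, BANDWIDTH, hops)
    print(f"{name}: diameter {hops}, worst-case latency {worst * 1e3:.3f} ms")
```

With 1024 nodes the worst-case path is 10 hops in the hypercube and 62 hops in the mesh, so under this model the mesh is 6.2 times slower in the worst case.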
Circuit Switching: arrangement
Circuit Switching: latency
Circuit Switching
In the first phase of the communication a path (a circuit) is built up between the source and destination by sending a special short message (called a probe)
The probe has a function similar to that of the packet header in packet switching
The circuit is held until the entire message is transmitted
During the communication the channels constituting the circuit are reserved exclusively; no other messages can be transmitted over them
In the last phase the circuit is torn down, either by the tail of the transmitted message or by an acknowledgement message returned by the destination node
If a desired channel is used by another circuit during the circuit establishment phase, the partially built circuit may be torn down
Circuit switching does not need packetizing, no matter what the message size is
There is no need for buffering
The most important benefit: if the length of the probe, P, is much smaller than the length of the message, M, then the latency becomes nearly independent of the communication distance
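The effect of P being much smaller than M can be illustrated numerically. This sketch assumes a simple model that is not stated on the slides: the probe is forwarded hop by hop, after which the message streams over the established circuit at link bandwidth; acknowledgement and tear-down times are ignored. All numbers are illustration values.

```python
def circuit_switching_latency(probe_bits, message_bits, bandwidth_bps, hops):
    """Assumed model: probe pays once per hop, the message pays only once."""
    setup = hops * probe_bits / bandwidth_bps
    transfer = message_bits / bandwidth_bps
    return setup + transfer

PROBE_BITS = 32        # hypothetical probe length P
MESSAGE_BITS = 32768   # hypothetical message length M
BANDWIDTH = 1e6        # hypothetical link bandwidth in bits per second

near = circuit_switching_latency(PROBE_BITS, MESSAGE_BITS, BANDWIDTH, hops=1)
far = circuit_switching_latency(PROBE_BITS, MESSAGE_BITS, BANDWIDTH, hops=62)
print(f"1 hop: {near * 1e3:.3f} ms, 62 hops: {far * 1e3:.3f} ms, ratio {far / near:.3f}")
```

Going from 1 hop to 62 hops raises the latency by only about 6 percent in this example, because the distance-dependent term involves P rather than M.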
Virtual Cut-Through Switching: arrangement
Virtual Cut-Through Switching: latency
Virtual Cut-Through Switching
Attempt to combine the benefits of packet switching and circuit switching
The message is divided into small units called flow control digits, or flits
As long as the required channels are free, the message is forwarded flit by flit from node to node in a pipelined fashion
If the required channel is busy, flits are buffered at intermediate nodes
If the buffers are large enough, the entire message is buffered at the blocked intermediate node, resulting in a behavior similar to packet switching
If the buffers are not large enough, the message will be buffered across several nodes, holding the links between them
Main benefit: if HF (the length of the header flit) << M (the length of the message), the latency becomes nearly independent of the distance, D
Worm-Hole Switching: arrangement
Worm-Hole Switching: latency
Worm-Hole Switching
No circuit is created between sender and receiver. Instead, an initial control message at the start of the message establishes a path through the network, and all subsequent data for that message are forwarded along that path
The message is broken into very small pieces (flits) and the network is pipelined. This is referred to as worm-hole routing due to the way that a message worms its way through the system
A special case of virtual cut-through, where the buffers at the intermediate nodes have the size of a flit
There is no start-up overhead related to distance, and thanks to the pipelining the entire message is not penalized. If P (packet size) is small relative to N (message length), the latency T will be similar to that of circuit switching, in that T is not very dependent on D (distance)
The primary advantage of such a network is that links need not be blocked for the entire message duration, and (after introducing the virtual channel concept) it is possible to multiplex messages along individual links.
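The pipelining and the one-flit buffers can be illustrated with a small cycle-by-cycle simulation. This is a sketch under assumptions made for the example only: one message on a straight path, each channel carries at most one flit per cycle, each intermediate node buffers exactly one flit, and the function name and parameters are hypothetical.

```python
def simulate_wormhole(num_flits, hops, blocked_at=None, blocked_until=0):
    """Move a message of num_flits flits over a path of `hops` channels.

    Position 0 is the source and position `hops` is the destination.
    The channel leaving node `blocked_at` is busy during cycles 1..blocked_until.
    Returns (total_cycles, history), where history[c - 1] lists the flit
    positions after cycle c (index 0 is the header flit).
    """
    pos = [0] * num_flits
    history = []
    cycle = 0
    while pos[-1] < hops:                      # until the tail flit is delivered
        cycle += 1
        used_channels = set()
        for i, p in enumerate(pos):            # header first, tail last
            if p == hops:
                continue                       # already delivered
            if p == blocked_at and cycle <= blocked_until:
                continue                       # outgoing channel held by other traffic
            if p in used_channels:
                continue                       # channel already carried a flit this cycle
            if p + 1 < hops and i > 0 and pos[i - 1] == p + 1:
                continue                       # one-flit buffer ahead is still occupied
            pos[i] = p + 1
            used_channels.add(p)
        history.append(list(pos))
    return cycle, history

free_cycles, _ = simulate_wormhole(num_flits=3, hops=4)
blocked_cycles, trace = simulate_wormhole(num_flits=3, hops=4, blocked_at=2, blocked_until=5)
print("unblocked:", free_cycles, "cycles; blocked:", blocked_cycles, "cycles")
print("flit positions while blocked:", trace[3])
```

Without blocking the message needs hops + num_flits - 1 cycles, so distance adds only one cycle per hop. While the header is blocked, the flits stay spread over consecutive nodes and keep the channels between them occupied, which is the behavior described for buffers that cannot hold the whole message.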
Routing protocols
Location of routing "intelligence"
  Source-based routing
    Routers "eat" the head of a packet
    Larger packets
    No fault tolerance
  Distributed (local) routing
    More complex routers
    Smaller packets
Classification of Routing protocols
Classification of Adaptive Routing protocols
Routing protocols: Terminology
Minimal = only paths as short as the shortest path are selected
Profitable = only channels known to move the packet closer to the destination are selected
Misrouting = all channels may be used
Progressive = never backtracks, even if blocked
Partially adaptive = not all channels may be selected as the next step
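As an illustration of the term "profitable", the sketch below lists the profitable channels of a node in a 2-D mesh. It assumes (x, y) coordinates with x growing to the east and y growing to the north; the names DIRECTIONS, manhattan and profitable_channels are hypothetical.

```python
DIRECTIONS = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}

def manhattan(a, b):
    """Hop distance between two nodes of a 2-D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def profitable_channels(current, dest):
    """Directions whose neighbour is strictly closer to the destination."""
    result = []
    for name, (dx, dy) in DIRECTIONS.items():
        neighbour = (current[0] + dx, current[1] + dy)
        if manhattan(neighbour, dest) < manhattan(current, dest):
            result.append(name)
    return result

print(profitable_channels((1, 1), (3, 0)))
```

A minimal protocol would choose only among these directions, while a protocol that allows misrouting could also pick one of the others.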
Routing protocols
Routing may cause:
  Deadlocks
    Buffer deadlock (store-and-forward switching)
    Channel deadlock (wormhole routing)
  Livelocks
    Packets are forwarded in a loop in the network
Routing deadlocks
Routing protocols: Deterministic routing
X-Y Routing
Walk one dimension at a time
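A minimal sketch of X-Y routing on a 2-D mesh, assuming nodes are addressed by (x, y) coordinates; the function name xy_route is chosen for this example.

```python
def xy_route(src, dst):
    """Return the list of nodes visited: first correct x, then correct y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # walk the X dimension to completion
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # only then walk the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
```

The route depends only on source and destination, which is what makes the scheme deterministic: every packet between the same pair of nodes takes the same path.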
Routing protocols: Deterministic routing
Interval labeling
Distributed routing with simple routing tables in the nodes
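One way such a table can look is sketched below for a tree-shaped network. The setup is an assumption of this example rather than something shown on the slides: nodes are numbered in depth-first preorder, so each subtree covers a contiguous interval of addresses, and each node stores one interval per child link plus a default link to its parent.

```python
def build_tables(children, root=0):
    """children maps a node to its child list; labels must be DFS preorder numbers.

    Returns {node: (intervals, parent)} with intervals = [(low, high, next_hop), ...].
    """
    tables = {}

    def visit(node, parent):
        size = 1
        intervals = []
        for child in children.get(node, []):
            child_size = visit(child, node)
            # With preorder labels the subtree of `child` is exactly this address range.
            intervals.append((child, child + child_size - 1, child))
            size += child_size
        tables[node] = (intervals, parent)
        return size

    visit(root, None)
    return tables

def next_hop(tables, node, dest):
    """Pick the link whose interval contains dest; otherwise go towards the root."""
    intervals, parent = tables[node]
    for low, high, neighbour in intervals:
        if low <= dest <= high:
            return neighbour
    return parent

def route(tables, src, dst):
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(tables, path[-1], dst))
    return path

# Hypothetical 6-node tree: 0 has children 1 and 4, 1 has children 2 and 3, 4 has child 5.
tree = {0: [1, 4], 1: [2, 3], 4: [5]}
tables = build_tables(tree)
print(tables[0][0])
print(route(tables, 2, 5))
```

Each table has one entry per link rather than one entry per destination, which keeps the routers simple.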
Routing protocols: Adaptive routing
Decision on next channel based on the current blocking situation
Can potentially give better utilization
Deadlock avoidance
Deterministic routing (e.g. X-Y)
Partially adaptive routing
For example, west-first routing for 2D meshes: route a packet first to the west (if required), then route the packet adaptively to north, south or east
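The rule can be expressed as a function that returns the directions a router may choose from. This sketch covers only the minimal variant (profitable directions) and assumes (x, y) coordinates with x growing to the east and y growing to the north; the function name is hypothetical.

```python
def west_first_choices(current, dest):
    """Allowed output directions under minimal west-first routing."""
    cx, cy = current
    dx, dy = dest
    if dx < cx:
        return ["west"]          # all westward hops come first, with no choice
    choices = []                 # after that, adapt among the remaining directions
    if dx > cx:
        choices.append("east")
    if dy > cy:
        choices.append("north")
    if dy < cy:
        choices.append("south")
    return choices

print(west_first_choices((3, 1), (1, 3)))
print(west_first_choices((1, 1), (3, 3)))
```

Because a packet never turns to the west after it has started moving in another direction, some turns are never taken, and that restriction is what prevents cyclic channel dependencies.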
West-first routing example
Deadlock avoidance, cont.
Virtual channels
Virtual channels are logical links between two nodes using their own buffers and multiplexed over a single physical channel
Virtual channels “break” dependency cycles
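How a dependency cycle is broken can be shown on a unidirectional ring. The scheme used here, two virtual channels per link with a switch from channel 0 to channel 1 at the wraparound link (often called a dateline), is one well-known construction and is an assumption of this sketch, not something taken from the slides. The code builds the channel dependency graph (an edge from the channel a packet holds to the channel it requests next) and checks it for cycles.

```python
def ring_route(src, dst, n, use_dateline):
    """Channels (node, vc) used from src to dst on a unidirectional ring of n nodes."""
    channels, node, vc = [], src, 0
    while node != dst:
        if use_dateline and node == n - 1:
            vc = 1                        # crossing the wraparound link: move to VC 1
        channels.append((node, vc))       # channel leaving `node` on virtual channel vc
        node = (node + 1) % n
    return channels

def dependency_graph(n, use_dateline):
    """Edges held_channel -> requested_channel over all source/destination pairs."""
    graph = {}
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            route = ring_route(src, dst, n, use_dateline)
            for held, wanted in zip(route, route[1:]):
                graph.setdefault(held, set()).add(wanted)
                graph.setdefault(wanted, set())
    return graph

def has_cycle(graph):
    """Depth-first search with three colours; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def visit(v):
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY or (colour[w] == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)

print("one channel per link, cycle:", has_cycle(dependency_graph(4, use_dateline=False)))
print("two virtual channels, cycle:", has_cycle(dependency_graph(4, use_dateline=True)))
```

With a single channel per link the dependencies close into a ring, so a channel deadlock is possible; with the second virtual channel the dependency graph becomes a chain and no cycle remains.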
Virtual channels
Advantages
  Increased network throughput
  Deadlock avoidance
  Virtual topologies
  Dedicated channels (e.g. debugging, monitoring)
Disadvantages
  Hardware cost
  Higher latency
  Incoming packets may be out-of-order
Complex communication support
Common communication patterns
  Partner communication (unicast)
  Multicast (one-to-many)
  Broadcast (one-to-all)
  Exchange (many-to-many or all-to-all)
Routers may feature hardware support for these communication patterns
An example: multicast support
Multicast by software unicasts is often implemented using a tree communication structure
“Replication” and “Routing support” require the destinations in the packet-header
Replication often found in wormhole networks
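The tree structure for software multicast can be sketched as a schedule of unicast rounds. The scheme below is an assumed example, not the specific method of any machine mentioned on the slides: in every round each node that already holds the message forwards it to one node that does not, so the number of informed nodes doubles per round.

```python
def tree_multicast_schedule(source, destinations):
    """Return a list of rounds; each round is a list of (sender, receiver) unicasts."""
    have = [source]
    pending = list(destinations)
    rounds = []
    while pending:
        sends = []
        for sender in list(have):        # snapshot: new receivers send from the next round on
            if not pending:
                break
            receiver = pending.pop(0)
            sends.append((sender, receiver))
            have.append(receiver)
        rounds.append(sends)
    return rounds

schedule = tree_multicast_schedule(0, [1, 2, 3, 4, 5, 6, 7])
for number, sends in enumerate(schedule, start=1):
    print("round", number, sends)
```

Seven destinations are reached in three rounds, whereas the source alone would need seven consecutive unicasts.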
Multicomputers today
The idea of Distributed Memory MIMD is the basis for most parallel (supercomputer) systems today
The idea has evolved into
  Cluster computing, using a LAN as interconnection network
  Grid computing, using more loosely connected nodes (WAN, different owners)