
Page 1: Compiling Fortran D For MIMD Distributed Machines Authors: Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng Published: 1992 Presented by: Sunjeev Sikand

Compiling Fortran D
For MIMD Distributed Machines

Authors: Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng
Published: 1992
Presented by: Sunjeev Sikand

Page 2:

Problem

• Parallel computers represent the only plausible way to continue to increase the computational power available to scientists and engineers

• However, they are difficult to program

• In particular, MIMD machines require message passing between separate address spaces and explicit synchronization among processors

Page 3:

Problem cont.

• Because parallel programs are machine-specific, scientists are discouraged from writing them: the investment is lost when the program changes or a new architecture arrives

• Vectorizable programs, by contrast, are easily maintained, debugged, and ported, because the compiler does all the work

Page 4:

Solution

• Previous Fortran dialects lack a means of specifying a data decomposition

• The authors believe that if a program is written in a data-parallel programming style with reasonable data decompositions, it can be implemented efficiently

• Thus they propose to develop compiler technology to establish such a machine-independent programming model

• The goal is to reduce both communication and load imbalance

Page 5:

Data Decomposition

• A decomposition is an abstract problem or index domain; it does not require any storage

• Each element of a decomposition represents a unit of computation

• The DECOMPOSITION statement declares the name, dimensionality, and size of a decomposition for later use

• There are two levels of parallelism in data parallel applications

Page 6:

Decomposition Statement

DECOMPOSITION D(N,N)

Page 7:

Data Decomposition - Alignment

• The first level of parallelism is array alignment (problem mapping): how arrays are aligned with respect to one another

• Represents the minimal requirements for reducing data movement in the program, given an unlimited number of processors

• Machine-independent; depends on the fine-grained parallelism defined by the individual members of the data arrays

Page 8:

Alignment cont.

• Corresponding elements in aligned arrays are always mapped to the same processor

• Array operations between aligned arrays are usually more efficient than array operations between arrays that are not known to be aligned.

Page 9:

Alignment Example

   REAL A(N,N)
   DECOMPOSITION D(N,N)
   ALIGN A(I,J) with D(J-2,I+3)
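As an aside, the alignment above can be read as a mapping from array elements to decomposition elements: A(i,j) lives on decomposition element D(j-2, i+3), i.e. a transpose combined with constant offsets. A minimal illustrative sketch (ours, not the paper's notation):

```python
# Illustrative sketch of ALIGN A(I,J) with D(J-2,I+3): the alignment
# function returns the decomposition element that owns A(i, j).
# (The function name `align` is ours, for illustration only.)
def align(i, j):
    """Decomposition element (1-based) aligned with array element A(i, j)."""
    return (j - 2, i + 3)

# A(1,3) is aligned with D(1,4); A(5,5) with D(3,8).
```

Because corresponding elements of aligned arrays land on the same decomposition element, they are guaranteed to end up on the same processor after distribution.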

Page 10:

Data Decomposition - Distribution

• The other level of parallelism is distribution (machine mapping): how arrays are distributed on the actual parallel machine

• Represents the translation of the problem onto the finite resources of the machine

• Affected by the topology, communication mechanisms, size of local memory, and number of processors on the underlying machine

Page 11:

Distribution cont.

• Specified by assigning an independent attribute to each dimension.

• Predefined attributes include BLOCK, CYCLIC, and BLOCK_CYCLIC

• The symbol : marks dimensions that are not distributed
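As an illustration (not part of Fortran D itself, and the function names are ours), the ownership rules implied by BLOCK and CYCLIC can be sketched in Python:

```python
# Sketch: which processor (1-based) owns global index i of an
# N-element dimension distributed over P processors.
def block_owner(i, n, p):
    """BLOCK: contiguous chunks of ceil(n/p) elements per processor."""
    size = -(-n // p)          # ceiling division
    return (i - 1) // size + 1

def cyclic_owner(i, n, p):
    """CYCLIC: elements dealt round-robin, one at a time."""
    return (i - 1) % p + 1

# With N = 100 and P = 4: BLOCK gives processor 1 indices 1..25,
# processor 2 indices 26..50, and so on; CYCLIC gives processor 1
# indices 1, 5, 9, ..., processor 2 indices 2, 6, 10, ...
```

BLOCK_CYCLIC generalizes both: it deals out contiguous blocks of a chosen size round-robin, reducing to BLOCK when the block size is ceil(n/p) and to CYCLIC when it is 1.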

Page 12:

Distribution Example 1

DISTRIBUTE D(:,BLOCK)

Page 13:

Distribution Example 2

DISTRIBUTE D(:,CYCLIC)

Page 14:

Fortran D Compiler

• The two major steps in writing a data-parallel program are selecting a data decomposition and using it to derive node programs with explicit data movement

• The former is left to the user

• The latter is generated automatically by the compiler, given a data decomposition

• The program is translated to an SPMD program with explicit message passing that executes directly on the nodes of the distributed-memory machine

Page 15:

Fortran D Compiler Structure

1. Program Analysis
   a. Dependence analysis
   b. Data decomposition analysis
   c. Partitioning analysis
   d. Communication analysis

2. Program Optimization
   a. Message vectorization
   b. Collective communications
   c. Run-time processing
   d. Pipelined computations

3. Code Generation
   a. Program partitioning
   b. Message generation
   c. Storage management

Page 16:

Partition Analysis

Original program:

   REAL A(100)
   do i = 1, 100
      A(i) = 0.0
   enddo

SPMD node program:

   REAL A(25)
   do i = 1, 25
      A(i) = 0.0
   enddo

• Converting global to local indices
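The global-to-local index translation for a BLOCK distribution can be sketched as follows (illustrative Python with hypothetical helper names, assuming a 1-based block distribution like A(100) over 4 processors):

```python
# Sketch of the compiler's global <-> local index translation for a
# BLOCK-distributed dimension (helper names are ours, for illustration).
def to_local(global_i, block_size):
    """Map a 1-based global index to (processor, 1-based local index)."""
    proc = (global_i - 1) // block_size + 1
    local = (global_i - 1) % block_size + 1
    return proc, local

def to_global(proc, local_i, block_size):
    """Inverse mapping: recover the global index."""
    return (proc - 1) * block_size + local_i

# For A(100) over 4 processors the block size is 25, so global
# element A(60) becomes local element A(10) on processor 3.
```

This is why the node program above declares REAL A(25) and loops over 1..25: each processor computes only on its own block, addressed in local coordinates.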

Page 17:

Jacobi Relaxation

• In the grid approximation that discretizes the physical problem, the heat flow into any given point at a given moment is the sum of the four temperature differences between that point and each of the four points surrounding it.

• Translating this into an iterative method, the correct solution can be found if the temperature of a given grid point at a given iteration is taken to be the average of the temperatures of the four surrounding grid points at the previous iteration.
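The averaging update described above can be sketched sequentially (a minimal Python illustration of the method, not the Fortran D code):

```python
# One Jacobi sweep: each interior point becomes the average of its
# four neighbours in the previous iteration's grid b. Boundary rows
# and columns are left unchanged.
def jacobi_sweep(b):
    n = len(b)
    a = [row[:] for row in b]              # copy; keep boundaries
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            a[i][j] = (b[i][j-1] + b[i-1][j] + b[i+1][j] + b[i][j+1]) / 4
    return a
```

Because every new value depends only on the previous iteration, all interior points can be updated in parallel, which is what makes Jacobi a natural fit for the data-parallel model.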

Page 18:

Jacobi Relaxation Code

   REAL A(100,100), B(100,100)
   DECOMPOSITION D(100,100)
   ALIGN A, B with D
   DISTRIBUTE D(:,BLOCK)

   do k = 1, time
      do j = 2, 99
         do i = 2, 99
S1          A(i,j) = (B(i,j-1)+B(i-1,j)+B(i+1,j)+B(i,j+1))/4
         enddo
      enddo
      do j = 2, 99
         do i = 2, 99
S2          B(i,j) = A(i,j)
         enddo
      enddo
   enddo

Page 19:

Jacobi Relaxation Processor Layout

• Compiling for a four-processor machine.

• Both arrays A and B are aligned identically with decomposition D, so they have the same distribution as D.

• Because the first dimension of D is local and the second dimension is block-distributed, the local index set for both A and B on each processor (in local indices) is [1:100,1:25].

Page 20:

Jacobi Relaxation cont.

[Figure: the 100x100 grid block-distributed by columns over four processors; processor boundaries fall between columns 25/26, 50/51, and 75/76, and the interior points updated are indices 2 through 99 in each dimension]

Page 21:

Jacobi Relaxation cont.

• The iteration set of the loop nest (in global indices) is [1:time, 2:99, 2:99].

• Local iteration sets for each processor (in local indices):

• Proc(1) = [1:time, 2:25, 2:99]

• Proc(2:3) = [1:time, 1:25, 2:99]

• Proc(4) = [1:time, 1:24, 2:99]
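These local iteration sets come from intersecting the global j range 2..99 with each processor's owned columns and shifting into local coordinates. An illustrative Python sketch (assuming the 4-processor, block-size-25 layout of this example; the function name is ours):

```python
# Local (1-based) j-loop bounds on processor p, for a dimension of
# 100 columns block-distributed in chunks of 25, with the global
# loop running over columns 2..99.
def local_j_range(p, block=25, lo=2, hi=99):
    g_lo = max(lo, (p - 1) * block + 1)   # clip loop to owned columns
    g_hi = min(hi, p * block)
    off = (p - 1) * block                 # global -> local shift
    return g_lo - off, g_hi - off

# local_j_range(1) -> (2, 25); local_j_range(2) -> (1, 25);
# local_j_range(4) -> (1, 24) -- matching the sets listed above.
```

The boundary processors get clipped ranges because global columns 1 and 100 hold boundary values that are never recomputed.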

Page 22:

Generated Jacobi

   REAL A(100,25), B(100,0:26)

   if (Plocal = 1) lb1 = 2 else lb1 = 1
   if (Plocal = 4) ub1 = 24 else ub1 = 25

   do k = 1, time
      if (Plocal > 1) send(B(2:99,1), Pleft)
      if (Plocal < 4) send(B(2:99,25), Pright)
      if (Plocal < 4) recv(B(2:99,26), Pright)
      if (Plocal > 1) recv(B(2:99,0), Pleft)
      do j = lb1, ub1
         do i = 2, 99
            A(i,j) = (B(i,j-1)+B(i-1,j)+B(i+1,j)+B(i,j+1))/4
         enddo
      enddo

Page 23:

Generated Jacobi cont.

      do j = lb1, ub1
         do i = 2, 99
            B(i,j) = A(i,j)
         enddo
      enddo
   enddo

• The only true cross-processor dependences are carried by the k loop, so the compiler is able to vectorize the messages: each processor sends one message of boundary values per time step rather than one per element

Page 24:

Pipelined Computation

• In loosely synchronous computations, all processors execute in loose lockstep, alternating between phases of local computation and global communication, e.g. Red-Black SOR and Jacobi

• However, some computations, such as SOR, contain loop-carried dependences

• They present an opportunity to exploit parallelism through pipelining.

Page 25:

Pipelined Computation cont.

• The observation is that for some pipelined computations, the program order must be changed

• Fine-grained pipelining interchanges cross-processor loops as deeply as possible, exposing the most parallelism but incurring the most communication overhead

• Coarse-grained pipelining uses strip mining and loop interchange to adjust the granularity of the pipelining, decreasing communication overhead at the expense of some parallelism
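Strip mining, the transformation behind coarse-grained pipelining, can be illustrated with a small sketch (the function name is ours; the strip size is the tunable granularity knob):

```python
# Strip mining: split the inclusive iteration range [lo, hi] into
# strips of at most `strip` iterations. In coarse-grained pipelining
# a processor computes a whole strip, then communicates once per
# strip instead of once per iteration.
def strips(lo, hi, strip):
    out = []
    i = lo
    while i <= hi:
        out.append((i, min(i + strip - 1, hi)))
        i += strip
    return out

# strips(2, 99, 32) -> [(2, 33), (34, 65), (66, 97), (98, 99)]
```

A larger strip means fewer, bigger messages but a longer pipeline fill delay before downstream processors can start, which is exactly the communication-versus-parallelism tradeoff described above.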

Page 26:

Conclusions

• A usable and efficient machine-independent parallel programming model is needed to make large-scale parallel machines useful to scientific programmers

• The Fortran D compiler, driven by the data decomposition model, performs message vectorization, collective communication, fine-grained pipelining, and several other optimizations for block-distributed arrays

• The Fortran D compiler will generate efficient code for a large class of data-parallel programs with minimal effort

Page 27:

Discussion

• Q: How is this applicable to sensor networks?

• A: There is no explicit reference to sensor networks, as this paper was written over a decade ago. But the authors provide a unified programming methodology for distributing data and communicating among processors. Replace the processors with motes and you'll see this is indeed relevant.

Page 28:

Discussion cont.

• Q: What about issues such as fault tolerance?

• A: Point well taken. If a message is lost, it doesn't seem as though the infrastructure is there to deal with it. The model could be extended with redundant computation, or perhaps even checkpointing, but as someone mentioned, the limited memory of motes may be an issue here.

• Q: They provide a means for load balancing; is this even applicable to sensor networks?

• A: Yes, it is: in sensor networks we want to balance the load so that energy isn't completely spent on any one mote.