
Parallel Genetic Algorithms

Timothy H. Kaiser, Ph.D.

Friday, August 12, 11


Introduction

Project Purpose

What is a Genetic Algorithm?

Serial Algorithm

Modes of Parallelization

Parallel Sort

My All to All

Example Problems

Results

Future Direction


Project Purpose

Learn more about Genetic Algorithm (GA) construction

Develop a Generic GA for use with various optimization problems

Study various methods of Parallelizing GAs

Develop a GA system which runs well even with expensive fitness function evaluations


What is a Genetic Algorithm?

An “optimization” system

Find good, but maybe not optimal, solutions to difficult problems

Often used on NP-Hard problems

Combinatorial Optimization


What is a Genetic Algorithm?

Requirements

Solution(s) to the problem represented as a string

A fitness function

Takes as input the solution string

Outputs the desirability of the solution

A method of combining solution strings to generate new solutions


More Details on Genetic Algorithms

Find solutions to problems by Darwinian Evolution

Potential solutions are thought of as living entities in a population

The strings are the genetic codes of the individuals

Individuals are evaluated for their fitness

The fittest individuals are allowed to live and “sexually” reproduce

There may be some mutation

Parents die and kids start the next generation


The Genetic Algorithm

Generate a set of potential solutions (genes)

Repeat until done

Find the fitness of the various members of the population

Sort the population

Allow the bottom half to die

Allow the top half to reproduce, replacing old population

Possibly have mutations
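
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the talk: the function name `run_ga`, its parameters, and the toy fitness function (count the 1s in a bit string) are all assumptions made for the example.

```python
import random

def run_ga(fitness, gene_len, alphabet, pop_size=40, generations=60,
           mut_rate=0.05, seed=0):
    """Sort the population, drop the bottom half, breed the top half."""
    rng = random.Random(seed)
    pop = [[rng.choice(alphabet) for _ in range(gene_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Find the fitness of every member and sort, best first.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # the bottom half dies
        children = []
        while len(children) < pop_size:          # the top half reproduces
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, gene_len)     # single-point crossover
            child = a[:cut] + b[cut:]
            # Possibly mutate some positions.
            child = [rng.choice(alphabet) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = children                           # children replace the parents
    return max(pop, key=fitness)

# Toy problem (an assumption for illustration): maximize the number of 1s.
best = run_ga(fitness=sum, gene_len=20, alphabet=[0, 1])
print(sum(best), best)
```

Only `fitness` is problem specific, which is what makes a generic GA possible.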


How do solutions

One method. . .

Given 2 solutions:

abcdefg
1234567

Split the two at some arbitrary location:

abc defg
123 4567

Recombine the pieces:

abc4567
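
The split-and-recombine step above is a single-point crossover. A minimal sketch follows; the function name `crossover` is hypothetical, and the second child (the complementary recombination) is an addition that the slide does not show.

```python
def crossover(a, b, cut):
    """Split both parents at `cut` and swap the tails."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

child1, child2 = crossover("abcdefg", "1234567", 3)
print(child1, child2)  # abc4567 123defg
```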


Primary Modes of Parallelization

Manager keeps all of the population

Population is distributed across all processors


Manager keeps all of the Population

Pass out genes with request for fitness function evaluation

Workers (and manager) do fitness function evaluations

Keep fitness value along with index into array of solutions

Discard genes

Do a parallel sort with results returned to manager

Manager reproduces top half
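
A rough sketch of this manager/worker pattern, with a Python thread pool standing in for the MPI processes. The names `evaluate` and `manager_generation` and the placeholder fitness function are assumptions, and the sort here is an ordinary serial sort on the manager rather than the parallel sort the talk describes.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(task):
    """Worker side: compute the fitness, return (fitness, index), discard the gene."""
    index, gene = task
    return (sum(gene), index)          # placeholder fitness: count of 1s

def manager_generation(population, workers=4):
    """Manager side: hand out genes, collect results, keep the top half."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, enumerate(population)))
    results.sort(reverse=True)         # best fitness first
    top = [population[i] for _, i in results[:len(population) // 2]]
    return results, top

population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1]]
results, top = manager_generation(population)
print(results[0], top)  # (4, 1) [[1, 1, 1, 1], [1, 0, 1, 1]]
```

Returning only the fitness and the index keeps the reply small; the manager already holds every gene.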


Comments on this strategy

Good load balance if fitness function is expensive

Could use multiple processors for a single fitness function evaluation

Works well if time for fitness function is unpredictable

Easy to implement

Bad memory balance

Large amounts of communication

Boring


Population is distributed across all Processors

Each processor does evaluation on its sub-population

Each processor does own reproduction

Many interesting variations on this theme


Variation 1

Allow processors to work independently for I generations

Do a global parallel sort of all of the fitness values

The global top half of all genes are redistributed
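
The global step of Variation 1 can be simulated in one process. In this sketch each inner list plays the role of one processor's sub-population; the function name `global_exchange` and the round-robin dealing rule are assumptions, chosen so that every processor receives the same number of genes.

```python
def global_exchange(islands, fitness):
    """Keep the global top half and deal it out evenly across the islands."""
    everyone = sorted((g for isl in islands for g in isl),
                      key=fitness, reverse=True)
    survivors = everyone[:len(everyone) // 2]
    n = len(islands)
    # Round-robin dealing: island k receives survivors k, k+n, k+2n, ...
    return [survivors[k::n] for k in range(n)]

# Genes are plain numbers here and the fitness is the number itself.
islands = [[9, 1, 4, 3], [8, 7, 2, 0], [6, 5, 11, 10]]
print(global_exchange(islands, fitness=lambda g: g))
# [[11, 8], [10, 7], [9, 6]]
```

Each island would then reproduce from the genes it received to refill its sub-population.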


Comments on Variation 1

Allows anything from an uncoupled to a tightly coupled algorithm

See effect of various mutation rates

Allow separate evolutions

Simulate sequential algorithm

Requires All to All personalized communication which is difficult to set up

Send half of the population, less communication

Each processor receives same number of genes

Each processor sends a different number to various processors

Good load balance


Variation 2

Allow processors to work independently for J generations

Exchange sub-population left-right, up-down

Comments

Allows the migration of solutions across topology

Easy to implement

See the effect of various mutation rates

Somewhat artificial
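
The left-right, up-down exchange of Variation 2 can be pictured on a grid of processors. The sketch below assumes a periodic (wrap-around) grid with row-major rank numbering, neither of which is stated in the slides, and the names `grid_neighbors` and `shift_right` are hypothetical.

```python
def grid_neighbors(rank, rows, cols):
    """Left, right, up and down neighbors of `rank` on a periodic rows x cols grid."""
    r, c = divmod(rank, cols)
    return {
        "left":  r * cols + (c - 1) % cols,
        "right": r * cols + (c + 1) % cols,
        "up":    ((r - 1) % rows) * cols + c,
        "down":  ((r + 1) % rows) * cols + c,
    }

def shift_right(islands, rows, cols, num):
    """Every island passes its first `num` genes to its right-hand neighbor."""
    outgoing = [isl[:num] for isl in islands]
    new = [isl[num:] for isl in islands]
    for rank, genes in enumerate(outgoing):
        new[grid_neighbors(rank, rows, cols)["right"]].extend(genes)
    return new

print(grid_neighbors(5, 4, 4))
# {'left': 4, 'right': 6, 'up': 1, 'down': 9}
```

Shifts in the other three directions follow the same pattern with a different neighbor key.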


Variation 3

Assign an aggression factor to each processor

Allow processors to work independently for K generations

Aggressive processors force a portion of their population onto other processors

Comments

Requires two All to All personalized communications

Tell how many are coming

Send the Genes


Variation 4

Variation 1 + Variation 2 + Variation 3

Amount of each variation is controlled by input

A goal of the project is to study the effect of the combinations of these variations


Parallel Sort

Used to sort values of the fitness function

Results are on the manager

Each processor sorts its subset of the fitness function values

Uses a parallel gather routine to merge the sorted lists


The Sort/Merge

Input is sorted list from each processor

active = 1
while (2*active < P) active = 2 * active

if (myid >= active) then
    send(data, myid-active)
    return
endif
if (myid + active < P) then
    recv(new_data, myid+active)
    data = merge(data, new_data)
endif
while (active > 1)
    active = active / 2
    if (myid >= active) then
        send(data, myid-active)
        return
    else
        recv(new_data, myid+active)
        data = merge(data, new_data)
    endif
endwhile
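
The same merge tree can be simulated in a single Python process, which makes it easy to check for processor counts that are not powers of two. This is a sketch only: the function name `tree_merge` is hypothetical, the dictionary `data` stands in for the per-processor lists, and `heapq.merge` plays the role of `merge`.

```python
from heapq import merge

def tree_merge(sorted_lists):
    """Merge one sorted list per rank onto rank 0, halving the active set each phase."""
    data = {rank: list(lst) for rank, lst in enumerate(sorted_lists)}
    p = len(data)
    active = 1
    while 2 * active < p:              # largest power of two below p
        active *= 2
    while active >= 1:
        for rank in range(active, min(2 * active, p)):
            # Rank `rank` sends its list to rank - active, then drops out.
            partner = rank - active
            data[partner] = list(merge(data[partner], data.pop(rank)))
        active //= 2
    return data[0]

lists = [[1, 9], [2, 3], [0, 8], [5], [4, 6, 7]]   # 5 "processors"
print(tree_merge(lists))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```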

The Sort/Merge

[Figure: the sort/merge on 8 processors, numbered 0 to 7, in three phases (Phase 1, Phase 2, Phase 3)]

The source for this algorithm is in the “Hybrid” examples

My All to All Personalized Communication

Used primarily to redistribute genes

Different processors send/recv different amounts of data

At the time, MPI_Alltoallv did not work correctly

My algorithm is based on a Hypercube algorithm

Does not require power of 2 processors

Iterate over (power of 2) - 1 stages

Check to see if you are sending to a valid processor

Uses a simple trick to avoid nonblocking send/receive

If Myid < partner send first

If Myid > partner recv first


My ALLtoALLv

find n2, the power of two >= numnodes
do i=1,n2-1
    ! do xor to find the processor xchng
    xchng=xor(i,myid)
    if(xchng <= (numnodes-1))then
        if(myid < xchng)then
            send from myid to xchng
            recv from xchng to myid
        else
            recv from xchng to myid
            send from myid to xchng
        endif
    else
        skip this stage
    endif
enddo
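
The pairing rule can be checked without any message passing. The sketch below only computes who exchanges with whom at each stage; the function name `exchange_schedule` is hypothetical, and no data is actually sent.

```python
def exchange_schedule(numnodes):
    """Return, for each stage, the (low, high) pairs of nodes that exchange data."""
    n2 = 1
    while n2 < numnodes:               # the power of two >= numnodes
        n2 *= 2
    stages = []
    for i in range(1, n2):
        pairs = set()
        for myid in range(numnodes):
            xchng = i ^ myid           # partner for this stage
            if xchng <= numnodes - 1:  # otherwise this node skips the stage
                pairs.add((min(myid, xchng), max(myid, xchng)))
        stages.append(sorted(pairs))
    return stages

for stage, pairs in enumerate(exchange_schedule(5), start=1):
    print(stage, pairs)
```

Because `i ^ myid` is its own inverse, both partners pick each other in the same stage, and every pair of nodes meets exactly once over the `n2 - 1` stages. The lower-numbered node sends first, so the blocking calls cannot deadlock.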


Algorithm with 5

Stage   Node 0   Node 1   Node 2   Node 3   Node 4
1a      0 to 1   0 to 1   2 to 3   2 to 3   skip
1b      1 to 0   1 to 0   3 to 2   3 to 2   skip
2a      0 to 2   1 to 3   0 to 2   1 to 3   skip
2b      2 to 0   3 to 1   2 to 0   3 to 1   skip
3a      0 to 3   1 to 2   1 to 2   0 to 3   skip
3b      3 to 0   2 to 1   2 to 1   3 to 0   skip
4a      0 to 4   skip     skip     skip     0 to 4
4b      4 to 0   skip     skip     skip     4 to 0
5a      skip     1 to 4   skip     skip     1 to 4
5b      skip     4 to 1   skip     skip     4 to 1
6a      skip     skip     2 to 4   skip     2 to 4
6b      skip     skip     4 to 2   skip     4 to 2
7a      skip     skip     skip     3 to 4   3 to 4
7b      skip     skip     skip     4 to 3   4 to 3


An Example Problem

Problem

Given an input signal put on a transmission line (Amplitude Only)

Given the distorted output signal on the other end of the line (Amplitude Only)

Assume the distortion is from change of phase of Fourier Components

Find the change in phase of each of the Fourier Components

Use the changes found to “correct” the signal

1d version of an atmospheric propagation problem


Fitness function

Find FFT of reference signal

The gene is an integer vector of phase variations

Add the phase variations to FFT of reference signal

Inverse FFT

Compare measured signal to generated signal
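
The steps above can be sketched in pure Python with a naive DFT in place of an FFT library. Several details are assumptions made for illustration: the mapping from a gene integer `g` to a phase shift of `2*pi*g/levels`, the squared-error comparison, the sign convention (larger is better), and the synthetic test signal.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 gives the unscaled inverse)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def apply_phases(signal, gene, levels=16):
    """Shift the phase of each Fourier component by 2*pi*gene[k]/levels."""
    spectrum = dft(signal)
    shifted = [c * cmath.exp(2j * math.pi * g / levels)
               for c, g in zip(spectrum, gene)]
    n = len(signal)
    return [abs(v) / n for v in dft(shifted, sign=+1)]   # amplitude only

def fitness(gene, reference, measured):
    """Negative squared error between generated and measured amplitudes."""
    generated = apply_phases(reference, gene)
    return -sum((g - m) ** 2 for g, m in zip(generated, measured))

reference = [math.exp(-((t - 8) / 2.0) ** 2) for t in range(16)]
true_gene = [0, 3, 1, 0, 2, 0, 0, 1, 0, 0, 5, 0, 0, 2, 0, 1]
measured = apply_phases(reference, true_gene)   # synthetic "distorted" signal
print(fitness(true_gene, reference, measured), fitness([0] * 16, reference, measured))
```

The gene that produced the distortion scores zero error, and any other gene scores at or below it, which is the property the GA exploits.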


Input for an Example Run

16 processors

Population = 2000 genes

1000 generations

global exchange every 50 generations

Force genes onto other processors every 5 generations

Shift every 12 generations


Graph of Ideal and Measured Signals

[Figure: “Ideal And Measured Signals”, Amplitude (-0.1 to 1.1) versus Time (0 to 64); curves: Measured, Ideal]

Graph of Ideal and Corrected Signals

[Figure: “Ideal and Corrected Signals”, Amplitude (-0.1 to 1.1) versus Time (0 to 64); curves: Corrected, Ideal]

Timings

node   genes sent   fitness    sort     total
       in sort      time       time     time
   0      1217      39.919     1.328    50.688
   1      1176      39.868     1.258    50.862
   2      1127      39.979     1.033    50.863
   3      1202      39.844     0.952    50.861
   4      1279      39.834     0.903    50.825
   5      1202      39.937     1.062    50.841
   6      1188      39.981     0.966    50.861
   7      1151      39.755     0.905    50.827
   8      1206      40.002     0.849    50.866
   9      1158      40.054     0.855    50.860
  10      1218      39.814     0.880    50.825
  11      1185      39.721     0.868    50.828
  12      1218      39.864     0.881    50.862
  13      1190      39.993     0.867    50.827
  14      1192      39.963     0.859    50.827
  15      1243      39.925     0.850    50.825

Another application

Given a Dielectric (carbon fiber) cylinder of fixed shape (a wing) and an incident electrical field (radar wave)

Find the material properties that minimize (or maximize) the returned signal in a particular direction


Why of interest

Design of low radar cross section bodies (Stealth Technology)

Similar to the design of high-tech car headlights

Represents a class of problems


Find Returned Signal

Create a complex matrix (set of linear equations)

Geometrical Properties (expensive but only done one time)

Material Properties (many times)

Solve linear equations (many times)

Use solution to obtain returned signal in a particular direction (minimize)


Fitness Function Steps

Create a complex matrix (set of linear equations)

Solve linear equations, find X

C × X = Y

The returned signal, R, is a function of X, the solution of the set of equations

F = f(R(X))


Our MxM Matrix

[Figure: definition of the matrix entries, given separately for n=m and n≠m. Annotations on the figure: “Constant, only depends on geometry”; “Changes for every fitness function evaluation”; “Symmetric”; “Only depends on n”]

ε is a vector of length M that describes the material properties

Solve the Linear Equations

We use the LAPACK/MKL routine cgesv

[tkaiser@mio darwin]$ man cgesv
SYNOPSIS
    SUBROUTINE CGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO )

        INTEGER INFO, LDA, LDB, N, NRHS
        INTEGER IPIV( * )
        COMPLEX A( LDA, * ), B( LDB, * )

PURPOSE
    CGESV computes the solution to a complex system of linear equations A * X = B, where A is an N-by-N matrix and X and B are N-by-NRHS matrices.

    The LU decomposition with partial pivoting and row interchanges is used to factor A as A = P * L * U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then used to solve the system of equations A * X = B.
.......

LAPACK version 3.0          15 June 2000          CGESV(l)
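
To show what such a solve involves, here is a small pure-Python elimination with partial pivoting for a complex system. It is an illustration only, not LAPACK or MKL and not the code used in the talk; the function name `solve_complex` and the 2x2 test system are made up for the example, and a single right-hand side is assumed.

```python
def solve_complex(a, b):
    """Solve a*x = b for a square complex system by elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]    # augmented copy
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

a = [[2 + 1j, 1], [1, 3 - 1j]]
b = [3 + 1j, 4 - 1j]
print(solve_complex(a, b))   # both components are close to 1
```

The work grows as the cube of the matrix size, which is why a tuned library routine matters when the matrix is 799x799 and the solve is repeated for every fitness evaluation.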


The Link Command

mpif90 charles.o darwin.o ga_list_mod.o global.o init.o \
mods.o more_mpi.o mpi.o numz.o unique.o wtime.o \
laser_new.o \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_intel_lp64.a \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_core.a \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_sequential.a \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_core.a \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_sequential.a \
/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_core.a \
-lpthread \
-o darwin

Our Wing

799 Elements


Final Problem

Given our 4 materials

material(0)=cmplx(3,-1)
material(1)=cmplx(2,-2)
material(2)=cmplx(4,-3)
material(3)=cmplx(5,-4)

Minimize the average return over 0° to 30°

Small Signals (< -20) don’t count


GA statistics

Ran on 64 Mio processors
Population size 5120

Terminated after 1313 Generations because of stagnation
Total Run time 8008 seconds

Matrix size 799x799
Matrix Inversions 6,722,560

111156041192735111550042035163793410789548086280344598298545557734384207674422880597461956440434923543714883802852623459362767641138209948290962541754032025298906462696551039941826764245862717597589826903187709568163513990163545434131712589472245284979067605823117125785074692904056218529694448947230242574226153830004788278417156172355372230514994709570664303225740550623006686851674536764546411912524900750974104

60830484002104118443786127982311697061639884742739324040156591161344

Sample Space = 4^799
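
The two headline numbers can be reproduced with integer arithmetic. The sketch assumes one linear solve per gene per generation, which is consistent with the count reported above, and 4 possible materials for each of the 799 elements.

```python
pop_size, generations = 5120, 1313
solves = pop_size * generations        # one linear solve per gene per generation
sample_space = 4 ** 799                # 4 materials, 799 elements

print(solves)                          # 6722560
print(len(str(sample_space)))          # 482 digits
```

The run therefore examined a vanishingly small fraction of the possible designs.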


Input

[tkaiser@mio darwin]$ cat darwin.in
&DARWIN_DAT
 POP_SIZE = 5120,
 GENERATIONS = 2000,
 STAGNATE = 100,
 GENE_SIZE = 799,
 SEED = -1650,
 INVERT_RATE = 0.01
 MUT_RATE = 0.075,
 QUIT_VALUE = 20.0,
 SHIFT_RATE = 5,
 SHIFT_NUM = 10,
 GLOBAL_RATE = 12,
 PRINTING = 0,
 HAWK_RATE = 0,
 HAND_OUT = 0,
 MAXTIME=17100.0,
 THE_TOP = T,
 DO_ONE = T /

What Next?

Would like to rewrite so the Matrix inversion is done on the GPU.


Another classic example

4-coloring a map

Given a map of a country divided into states

Given 4 colors

Find a coloring of the map so that no neighboring states have the same color

In our example we use the 22 western US states

General graph coloring is NP-hard; here there are 4**22 ≈ 17.6 trillion potential solutions

This turned out to be too easy but still interesting

Can’t be done with 3 colors (Consider Colorado)


Input and Results

&DARWIN_DAT
 POP_SIZE=200, GENERATIONS=100, STAGNATE=100,
 GENE_SIZE=22, SEED=-12345, INVERT_RATE=0.1E-01, MUT_RATE=0.75E-01,
 QUIT_VALUE=1.0, SHIFT_RATE=5, SHIFT_NUM=20, GLOBAL_RATE=10, PRINTING=0, HAWK_RATE=0,
 HAND_OUT=0, THE_TOP=T, DO_ONE=T /

node   genes sent   fitness    sort       total
       in sort      cpu time   cpu time   cpu time
   0       41       0.025      0.008      0.126
   1      116       0.025      0.007      0.122
   2       46       0.026      0.006      0.124
   3       97       0.025      0.006      0.124

the global best fitness is 1.0 for generations= 30


Our Coloring

results
  1 ar yellow
  2 az blue
  3 ca red
  4 co green
  5 ia red
  6 id green
  7 ks red
  8 la red
  9 mn blue
 10 mo green
 11 mt yellow
 12 nd red
 13 ne yellow
 14 nm yellow
 15 nv yellow
 16 ok blue
 17 or blue
 18 sd green
 19 tx green
 20 ut red
 21 wa red
 22 wy blue
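
The coloring listed above can be checked by counting borders whose two states share a color. The border list `EDGES` below is not from the slides: it is an assumed list of shared land borders among the 22 states, with states that meet only at a corner (Four Corners) treated as non-neighbors. The function `conflicts` is a hypothetical stand-in for a fitness measure; the talk does not say how its fitness was defined.

```python
EDGES = """wa-or wa-id or-id or-ca or-nv ca-nv ca-az nv-id nv-ut nv-az
id-mt id-wy id-ut mt-wy mt-nd mt-sd wy-sd wy-ne wy-co wy-ut ut-co ut-az
az-nm co-nm co-ne co-ks co-ok nm-ok nm-tx nd-sd nd-mn sd-mn sd-ia sd-ne
ne-ia ne-mo ne-ks ks-mo ks-ok ok-mo ok-ar ok-tx tx-ar tx-la mn-ia ia-mo
mo-ar ar-la""".split()

COLORING = dict(pair.split(":") for pair in """ar:yellow az:blue ca:red co:green
ia:red id:green ks:red la:red mn:blue mo:green mt:yellow nd:red ne:yellow
nm:yellow nv:yellow ok:blue or:blue sd:green tx:green ut:red wa:red
wy:blue""".split())

def conflicts(coloring):
    """Number of shared borders whose two states have the same color."""
    return sum(coloring[a] == coloring[b]
               for a, b in (edge.split("-") for edge in EDGES))

print(len(COLORING), conflicts(COLORING))  # 22 0
```

A GA could minimize `conflicts` directly, with a gene of 22 integers in the range 0 to 3.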


Summary

GAs are useful optimization routines

Easy to parallelize

Test the various parallel strategies

Program showed good speed up
