innovative methodologies in evolution strategies

Upload: gerhard-herres

Post on 06-Jul-2018


  • 8/17/2019 Innovative Methodologies in Evolution Strategies

    1/62

    ICD  Center for Applied Systems Analysis

    Innovative Methodologies in Evolution Strategies

    INGENET Project Report D 2.2

    June 1998

    Thomas Bäck, Boris Naujoks

    Center for Applied Systems Analysis (CASA)

    Informatik Centrum Dortmund

    Joseph-von-Fraunhofer-Str. 20

    D-44227 Dortmund


    Abstract

    This INGENET report describes the state-of-the-art in research and application of evolution strategies with the goals of making this knowledge accessible to the INGENET members in a compact form and outlining the technological and economical perspectives of evolution strategies on the European level.

    Evolution strategies are one of the main paradigms in the field of evolutionary computation, focusing on algorithms for adaptation and optimization which are gleaned from the model of organic evolution.

    The report puts its emphasis on algorithmic and application-oriented aspects of evolution strategies. The algorithmic aspects include an overview of all components of a modern $(\mu,\lambda)$-strategy and a detailed explanation of the concept of strategy parameter self-adaptation, which is considered to be the main distinguishing feature between evolution strategies and genetic algorithms. The self-adaptation process implements an evolutionary optimization process also on the level of strategy parameters such as mutational step sizes and therefore offers an elegant solution to the parameter tuning problem of evolutionary algorithms. The working principles of self-adaptation are explained in detail in section 3 of this report.

    A number of recent variations of the basic evolution strategy, including alternatives for the self-adaptation method, the introduction of hierarchies of evolution strategies, and the principle of individual aging in the $(\mu,\kappa,\lambda,\rho)$-strategy, are presented in section 4.

    Further aspects which are of strong interest from an application-oriented point of view include noisy and dynamic objective functions as well as multiple criteria decision making problems and constraint handling. These are discussed in section 5, which shows that evolution strategies offer effective techniques for handling all of these additional difficulties of practical applications.

    Section 6 gives a brief overview of the parallelization possibilities of evolution strategies, which are suitable for fine-grained as well as coarse-grained parallelization.

    An overview of practical applications of evolution strategies is given in section 7, where case studies are grouped into disciplines and the corresponding literature references are given. Due to the strong increase in the number of publications in the field of evolutionary computation in the 1990s, the collection of case studies stops with the most recent examples from 1994; it nevertheless contains more than 150 examples up to that time.

    The report concludes with an outline of the perspectives of evolution strategies, discussing their technological future with a focus on the economic potential of industrial applications of these algorithms. This outline might serve as a technological roadmap for the exploitation of these techniques within a ten year timeframe.

    Thomas Bäck and Boris Naujoks Dortmund, June 1998

    Contact information:

    Center for Applied Systems Analysis

    Informatik Centrum Dortmund

    Joseph-von-Fraunhofer-Str. 20

    D-44227 Dortmund, Germany

    Phone: +49 231 9700 366

    Fax: +49 231 9700 959

    Email: [email protected]


    Contents

    1 A Brief History

    2 The Algorithm
        2.1 Working Principle
        2.2 The Structure of Individuals
        2.3 Mutation
        2.4 Recombination
        2.5 Selection
        2.6 Termination Criterion

    3 Self-Adaptation

    4 Variations
        4.1 Mutative Step-Size Control
        4.2 Derandomized Step-Size Adaptation
        4.3 Hierarchical Evolution Strategies
        4.4 The $(\mu,\kappa,\lambda,\rho)$-Strategy: Aging of Individuals

    5 Application-Oriented Extensions
        5.1 Noisy Objective Functions
        5.2 Robust Design
        5.3 Dynamic Environments
        5.4 Multiple Criteria Decision Making
        5.5 Constraint Handling

    6 Parallel Evolution Strategies
        6.1 The Master-Slave Approach
        6.2 Coarse Grained Parallelism: The Migration Model
        6.3 Fine Grained Parallelism: The Diffusion Model
        6.4 A Hybrid Approach

    7 Applications
        7.1 Artificial Intelligence
        7.2 Biotechnology
        7.3 Technical Design Applications
        7.4 Chemical Engineering
        7.5 Telecommunications
        7.6 Dynamic Processes, Modeling, Simulation
        7.7 Medicine
        7.8 Microelectronics
        7.9 Military
        7.10 Physics
        7.11 Pattern Recognition
        7.12 Production Planning
        7.13 Robotics
        7.14 Supply- and Disposal Systems
        7.15 Miscellaneous

    8 Perspectives

    References


    1 A Brief History

    Evolution Strategies are a joint development of Bienert, Rechenberg and Schwefel, who did preliminary work in this area in the 1960s at the Technical University of Berlin (TUB) in Germany. First applications were experimental and dealt with hydrodynamical problems like shape optimization of a bent pipe [119], drag minimization of a joint plate [164], and structure optimization of a two-phase flashing nozzle [210] (see footnote 1). Because such optimization problems could not be described and solved analytically or by traditional methods, a simple algorithmic method based on random changes of experimental setups was developed. In these experiments, adjustments were possible in discrete steps only, in the first two cases (pipe and plate) by changing certain joint positions and in the latter case (nozzle) by exchanging, adding or deleting nozzle segments. Following observations from nature that smaller mutations occur more often than larger ones, the discrete changes were sampled from a binomial distribution with prefixed variance. The basic working mechanism of the experiments was to create a mutation, adjust the joints or nozzle segments accordingly, perform the experiment and measure the quality criterion of the adjusted construction. If the new construction happened to be better than its predecessor, it served as basis for the next trial. Otherwise, it was discarded and the predecessor was retained. No information about the amount of improvement or deterioration was necessary. This experimental strategy led to unexpectedly good results both for the bent pipe and the nozzle.

    Schwefel was the first to simulate different versions of the strategy on the first available computer at TUB, a Zuse Z23 [200], later followed by several others who applied the simple Evolution Strategy to solve numerical optimization problems. Due to the theoretical results of Schwefel's diploma thesis, the discrete mutation mechanism was replaced by normally distributed mutations with expectation zero and given variance [200]. The resulting two-membered ES works by creating one $n$-dimensional real-valued vector of object variables from its parent by applying mutation with identical standard deviations to each object variable. The resulting individual is evaluated and compared to its parent, and the better of the two individuals survives to become parent of the next generation, while the other one is discarded. This simple selection mechanism is fully characterized by the term (1+1)-selection.

    For this algorithm, Rechenberg developed a convergence rate theory for $n \gg 1$ for two characteristic model functions, and he proposed a theoretically confirmed rule for changing the standard deviation of mutations (the 1/5-success rule) [166].

    Obviously, the (1+1)-ES did not incorporate the principle of a population. A first multi-membered Evolution Strategy or $(\mu+1)$-ES having $\mu > 1$ was also designed by Rechenberg to introduce a population concept. In a $(\mu+1)$-ES, $\mu$ parent individuals recombine to form one offspring, which after being mutated eventually replaces the worst parent individual if it is better (extinction of the worst). Mutation and adjustment of the standard deviation were realized as in a (1+1)-ES, and a recombination mechanism as explained in section 2.4 was used. This strategy, discussed in more detail in [12], was never widely used but provided the basis for the transition to the $(\mu+\lambda)$-ES and $(\mu,\lambda)$-ES as introduced by Schwefel (see footnote 2) [201, 202, 203].

    1 This experiment is one of the first known examples of using operators like gene deletion and gene duplication, i.e. the number of segments the nozzle consisted of was allowed to vary during optimization.

    2 The material presented here is based on [203] and a number of research articles, but in the meantime an


    Again the notation characterizes the selection mechanism, in the first case indicating that the best $\mu$ individuals out of the union of parents and offspring survive while in the latter case only the best $\mu$ offspring individuals form the next parent generation (consequently, $\lambda > \mu$ is necessary). Currently, the $(\mu,\lambda)$-strategy characterizes the state-of-the-art in Evolution Strategy research and is therefore the strategy of our main interest to be explained in the following. As an introductory remark it should be noted that the major quality of this strategy is seen in its ability to incorporate the most important parameters of the strategy (standard deviations and correlation coefficients of normally distributed mutations) into the search process, such that optimization not only takes place on object variables, but also on strategy parameters according to the actual local topology of the objective function. This capability is termed self-adaptation by Schwefel [204] and will be a major point of interest in discussing the Evolution Strategy.

    2 The Algorithm

    2.1 Working Principle

    In general, evolutionary algorithms mimic the process of natural evolution, the driving process for the emergence of complex and well adapted organic structures, by applying variation and selection operators to a set of candidate solutions for a given optimization problem. The following structure of a general evolutionary algorithm reflects all essential components of an evolution strategy as well (see e.g. [10]):

    Algorithm 1:

        t := 0;
        initialize P(t);
        evaluate P(t);
        while not terminate do
            P'(t) := variation(P(t));
            evaluate(P'(t));
            P(t+1) := select(P'(t) ∪ Q);
            t := t + 1;
        od

    In case of a $(\mu,\lambda)$-evolution strategy, the following statements regarding the components of algorithm 1 can be made:

      • $P(t)$ denotes a population (multiset) of $\mu$ individuals (candidate solutions to the given problem) at generation (iteration) $t$ of the algorithm.

      • The initialization at $t = 0$ can be done randomly, or with known starting points obtained by any method.

      • The evaluation of a population involves calculation of its members' quality according to the given objective function (quality criterion).

    updated and extended edition of Schwefel’s book was published (i.e., [207]).


      • The variation operators include the exchange of partial information between solutions (recombination) and their subsequent modification by adding normally distributed variations (mutation) of adaptable step sizes. These step sizes are themselves optimized during the search according to a process called self-adaptation.

      • By means of recombination and mutation, an offspring population $P'(t)$ of $\lambda$ candidate solutions is generated.

      • The selection operator chooses the $\mu$ best solutions from $P'(t)$ (i.e., $Q = \emptyset$) as starting points for the next iteration of the loop. Alternatively, a $(\mu+\lambda)$-evolution strategy would select the $\mu$ best solutions from the union of $P'(t)$ and $P(t)$ (i.e., $Q = P(t)$).

      • The algorithm terminates if no more improvements are achieved over a number of subsequent iterations or if a given amount of time is exceeded.

      • The algorithm returns the best candidate solution ever found during its execution.
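The loop of algorithm 1 and the statements above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's implementation: the function name `evolution_strategy`, the default values for `mu`, `lam` and `generations`, the initial search box [-5, 5], the initial step size of 1.0, the omission of recombination, and the use of a single self-adapted step size per individual are all choices made only for this example.

```python
import math
import random

def evolution_strategy(f, n, mu=5, lam=35, plus=False, generations=200, seed=0):
    """Minimise f over R^n with a (mu,lam)- or (mu+lam)-ES skeleton (hypothetical helper)."""
    rng = random.Random(seed)
    tau0 = 1.0 / math.sqrt(n)  # assumed learning rate, proportionality constant 1

    def evaluate(x, sigma):
        # An individual is stored as (objective value, object variables, step size).
        return (f(x), x, sigma)

    # initialize P(0) randomly and evaluate it
    parents = [evaluate([rng.uniform(-5.0, 5.0) for _ in range(n)], 1.0)
               for _ in range(mu)]
    best = min(parents, key=lambda ind: ind[0])

    for _ in range(generations):          # termination: fixed generation budget
        offspring = []
        for _ in range(lam):              # variation: mutation only in this sketch
            parent = rng.choice(parents)
            sigma_new = parent[2] * math.exp(tau0 * rng.gauss(0.0, 1.0))
            x_new = [xi + sigma_new * rng.gauss(0.0, 1.0) for xi in parent[1]]
            offspring.append(evaluate(x_new, sigma_new))
        # selection: Q = P(t) for the plus strategy, Q = empty set for the comma strategy
        pool = offspring + parents if plus else offspring
        parents = sorted(pool, key=lambda ind: ind[0])[:mu]
        best = min(best, parents[0], key=lambda ind: ind[0])  # best solution ever found
    return best
```

Calling `evolution_strategy(lambda x: sum(v * v for v in x), 5)` runs the comma variant on a five-dimensional sphere function; passing `plus=True` switches to the elitist variant without changing anything else.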

    In the following, these basic components of an evolution strategy are explained in some

    more detail. For extensive information about evolution strategies, refer to [5, 169, 207].

    Using a more formal notation following the outline given in [209, 208], one iteration of the strategy, that is a step from a population $P^{(T)}$ towards the next reproduction cycle with $P^{(T+1)}$, can be modeled as follows:

    $$P^{(T+1)} := \mathrm{opt}_{ES}(P^{(T)}) \tag{1}$$

    where $\mathrm{opt}_{ES} : I^{\mu} \to I^{\mu}$ is defined by

    $$\mathrm{opt}_{ES} := \mathrm{sel} \circ (\mathrm{mut} \circ \mathrm{rec})^{\lambda} \tag{2}$$

    operating on an input population $P^{(T)}$ according to

    $$\mathrm{opt}_{ES}(P^{(T)}) = \mathrm{sel}\Big(P^{(T)} \sqcup \bigsqcup_{i=1}^{\lambda} \{\mathrm{mut}(\mathrm{rec}(P^{(T)}))\}\Big) \tag{3}$$

    (here, $\sqcup$ denotes the union operation on multisets). Equation (3) clarifies that the population at generation $T+1$ is obtained from $P^{(T)}$ by first applying a $\lambda$-fold repetition of recombination and mutation, which results in an intermediate population $P'$ of size $\lambda$, and then applying the selection operator to the union of $P^{(T)}$ and $P'$. Recall that the recombination operator generates only one individual per application, which can then be mutated directly.

    In the following, both the formal as well as the informal way of describing the algorithmic

    components will be used as it seems appropriate.

    2.2 The Structure of Individuals

    For a given optimization problem

    $$f : M \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad f(\vec{x}) \to \min$$

    an individual of the evolution strategy contains the candidate solution $\vec{x} \in \mathbb{R}^n$ as one part of its representation. Furthermore, there exists a variable amount (depending on the type of strategy used) of additional information, so-called strategy parameters, in the representation of individuals. These strategy parameters essentially encode the $n$-dimensional normal distribution which is to be used for the variation of the solution.

    More formally, an individual $\vec{a} = (\vec{x}, \vec{\sigma}, \vec{\alpha})$ consists of up to three components $\vec{x} \in \mathbb{R}^n$ (the solution), $\vec{\sigma} \in \mathbb{R}^{n_\sigma}$ (a set of standard deviations of the normal distribution), and $\vec{\alpha} \in [-\pi, \pi]^{n_\alpha}$ (a set of rotation angles representing the covariances of the $n$-dimensional normal distribution), where $n_\sigma \in \{1, \ldots, n\}$ and $n_\alpha \in \{0, (2n - n_\sigma)(n_\sigma - 1)/2\}$. The exact meaning of these components is described in more detail in section 2.3.

    2.3 Mutation

    The mutation in evolution strategies works by adding a normally distributed random vector $\vec{z} \sim N(\vec{0}, C)$ with expectation vector $\vec{0}$ and covariance matrix $C^{-1}$, where the covariance matrix is described by the mutated strategy parameters of the individual. Depending on the amount of strategy parameters incorporated into the representation of an individual, the following main variants of mutation and self-adaptation can be distinguished:

      • $n_\sigma = 1$, $n_\alpha = 0$: The standard deviation for all object variables is identical ($\sigma$), and all object variables are mutated by adding normally distributed random numbers with

        $$\sigma' = \sigma \exp(\tau_0 N(0,1)) \tag{4}$$

        $$x_i' = x_i + \sigma' N_i(0,1) \tag{5}$$

        where $\tau_0 \propto (\sqrt{n})^{-1}$. Here, $N(0,1)$ denotes a value sampled from a normally distributed random variable with expectation zero and variance one. The notation $N_i(0,1)$ indicates the random variable to be sampled anew for each setting of the index $i$.

      • $n_\sigma = n$, $n_\alpha = 0$: All object variables have their own, individual standard deviation $\sigma_i$, which determines the corresponding modification according to

        $$\sigma_i' = \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1)) \tag{6}$$

        $$x_i' = x_i + \sigma_i' N_i(0,1) \tag{7}$$

        where $\tau' \propto (\sqrt{2n})^{-1}$ and $\tau \propto (\sqrt{2\sqrt{n}})^{-1}$.

      • $n_\sigma = n$, $n_\alpha = n(n-1)/2$: The vectors $\vec{\sigma}$ and $\vec{\alpha}$ represent the complete covariance matrix of the $n$-dimensional normal distribution, where the covariances are given by rotation angles $\alpha_j$ describing the coordinate rotations necessary to transform an uncorrelated mutation vector into a correlated one. The details of this mechanism can be found in [5] (pp. 68–71) or [180]. The mutation is performed according to

        $$\sigma_i' = \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1)) \tag{8}$$

        $$\alpha_j' = \alpha_j + \beta N_j(0,1) \tag{9}$$

        $$\vec{x}' = \vec{x} + N(\vec{0}, C(\vec{\sigma}', \vec{\alpha}')) \tag{10}$$

        where $N(\vec{0}, C(\vec{\sigma}', \vec{\alpha}'))$ denotes the correlated mutation vector and $\beta \approx 0.0873$.
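The first two mutation variants, equations (4)–(5) and (6)–(7), can be sketched as follows. This is an illustration only: the function names are hypothetical, the proportionality constants of the learning rates are assumed to be one, and the correlated variant of equations (8)–(10) is left out because the rotation mechanism is only referenced, not spelled out, in the text.

```python
import math
import random

def mutate_single_sigma(x, sigma, rng):
    """Variant n_sigma = 1: one step size shared by all object variables (eqs. 4-5)."""
    n = len(x)
    tau0 = 1.0 / math.sqrt(n)  # assumed constant of proportionality: 1
    sigma_new = sigma * math.exp(tau0 * rng.gauss(0.0, 1.0))
    # A fresh normal sample N_i(0,1) is drawn for every coordinate i.
    x_new = [xi + sigma_new * rng.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new

def mutate_individual_sigmas(x, sigmas, rng):
    """Variant n_sigma = n: one step size per object variable (eqs. 6-7)."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)           # global learning rate (assumed constant 1)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))      # coordinate-wise learning rate
    common = tau_prime * rng.gauss(0.0, 1.0)       # N(0,1): drawn once per individual
    sigmas_new = [s * math.exp(common + tau * rng.gauss(0.0, 1.0)) for s in sigmas]
    x_new = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, sigmas_new)]
    return x_new, sigmas_new
```

In both functions the step sizes are mutated first and the new step sizes are then used to mutate the object variables, which is the order expressed by the primed quantities in the equations. The multiplicative log-normal update keeps every step size positive.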


    The amount of information included into the individuals by means of the self-adaptation principle increases from the simple case of one standard deviation up to the order of $n^2$ additional parameters in case of correlated mutations, which reflects an enormous degree of freedom for the internal models of the individuals. This growing degree of freedom often enhances the global search capabilities of the algorithm at the expense of computation time, and it also reflects a shift from the precise adaptation of a few strategy parameters (as in case of $n_\sigma = 1$) to the exploitation of a large diversity of strategy parameters.

    One of the main design parameters to be fixed for the practical application of the evolution strategy concerns the choice of $n_\sigma$ and $n_\alpha$, i.e., the amount of self-adaptable strategy parameters required for the problem.

    2.4 Recombination

    In evolution strategies recombination is incorporated into the main loop of the algorithm as the first variation operator and generates a new intermediate population of $\lambda$ individuals by $\lambda$-fold application to the parent population, creating one individual per application from $\rho$ ($1 \le \rho \le \mu$) individuals. Normally, $\rho = 2$ or $\rho = \mu$ (so-called global recombination) are chosen (but see also section 4.4 for a generalization). The recombination types for object variables and strategy parameters in evolution strategies often differ from each other, and typical examples are discrete recombination (random choices of single variables from parents, comparable to uniform crossover in genetic algorithms) and intermediary recombination (arithmetic averaging). A typical setting of the recombination consists in using discrete recombination for object variables and global intermediary recombination for strategy parameters. For further details on these operators, see [5].

    The recombination operator also needs to be specified for a $(\mu,\lambda)$-evolution strategy when $\mu > 1$ is chosen.
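The two recombination types named above can be sketched as follows. The function names are hypothetical, and the code assumes that the caller has already chosen the $\rho$ parents taking part (for example two random parents, or the whole parent population for global recombination); how the operators in [5] handle further details is not covered here.

```python
import random

def discrete_recombination(parents, rng):
    """Each component is copied from a randomly chosen parent."""
    n = len(parents[0])
    return [rng.choice(parents)[i] for i in range(n)]

def intermediary_recombination(parents):
    """Each component is the arithmetic mean over all given parents."""
    n = len(parents[0])
    return [sum(p[i] for p in parents) / len(parents) for i in range(n)]
```

With the typical setting described in the text, `discrete_recombination` would be applied to the object variables and `intermediary_recombination` to the standard deviations of the whole parent population.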

    2.5 Selection

    Essentially, the evolution strategy offers two different variants for selecting candidate solutions for the next iteration of the main loop of the algorithm: $(\mu,\lambda)$-selection and $(\mu+\lambda)$-selection.

    The notation $(\mu,\lambda)$ indicates that $\mu$ parents create $\lambda > \mu$ offspring by means of recombination and mutation, and the best $\mu$ offspring individuals are deterministically selected to replace the parents (in this case, $Q = \emptyset$ in algorithm 1). Notice that this mechanism allows that the best member of the population at generation $t+1$ might perform worse than the best individual at generation $t$, i.e., the method is not elitist, thus allowing the strategy to accept temporary deteriorations that might help to leave the region of attraction of a local optimum and reach a better optimum. Moreover, in combination with the self-adaptation of strategy parameters, $(\mu,\lambda)$-selection has demonstrated clear advantages over its competitor, the $(\mu+\lambda)$ method.

    In contrast, the $(\mu+\lambda)$-strategy selects the $\mu$ survivors from the union of parents and offspring, such that a monotonic course of evolution is guaranteed ($Q = P(t)$ in algorithm 1). For reasons related to the self-adaptation of strategy parameters, the $(\mu,\lambda)$-evolution strategy is typically preferred.


    2.6 Termination Criterion

    There are several options for the choice of the termination criterion, including the measurement

    of some absolute or relative measure of the population diversity (see e.g. [5], pp. 80–81), a

    predefined number of iterations of the main loop of the algorithm, or a predefined amount of 

    CPU time or real time for execution of the algorithm.

    3 Self-Adaptation

    The settings for the learning rates $\tau$, $\tau'$ and $\tau_0$ are recommended by Schwefel as reasonable heuristic settings (see [202], pp. 167–168), but one should keep in mind that, depending on the particular topological characteristics of the objective function, the optimal setting of these parameters might differ from the values proposed. For $n_\sigma = 1$, however, [26] has recently theoretically shown that, for the sphere model

    $$f(\vec{x}) = \sum_{i=1}^{n} (x_i - x_i^*)^2 \tag{11}$$

    the setting $\tau_0 \propto 1/\sqrt{n}$ is the optimal choice, maximizing the convergence velocity of the evolution strategy. Moreover, for a $(1,\lambda)$-evolution strategy Beyer derived the result that $\tau_0 \approx c_{1,\lambda}/\sqrt{n}$ (for $\lambda \ge 10$), where $c_{1,\lambda}$ denotes the progress coefficient of the $(1,\lambda)$-strategy.

    For an empirical investigation of the self-adaptation mechanism defined by the mutation operator variants (4)–(8), [204, 205, 206] used the following three objective functions which are specifically tailored to the number of learnable strategy parameters in these cases:

    1. Function

       $$f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2 \tag{12}$$

       requires learning of one common standard deviation $\sigma$, i.e., $n_\sigma = 1$.

    2. Function

       $$f_2(\vec{x}) = \sum_{i=1}^{n} i \, x_i^2 \tag{13}$$

       requires learning of a suitable scaling of the variables, i.e., $n_\sigma = n$.

    3. Function

       $$f_3(\vec{x}) = \sum_{i=1}^{n} \Big( \sum_{j=1}^{i} x_j \Big)^2 \tag{14}$$

       requires learning of a positive definite metric, i.e., individual $\sigma_i$ and $n_\alpha = n(n-1)/2$ different covariances.
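The three objective functions of equations (12)–(14) translate directly into code. The sketch below uses plain Python lists; the names `f1`, `f2` and `f3` simply mirror the equation labels.

```python
def f1(x):
    """Sphere model, equation (12): sum of squares."""
    return sum(xi * xi for xi in x)

def f2(x):
    """Scaled sphere, equation (13): the i-th variable is weighted by i (1-based)."""
    return sum(i * xi * xi for i, xi in enumerate(x, start=1))

def f3(x):
    """Equation (14): sum of squared partial sums x_1 + ... + x_i."""
    total = 0.0
    partial = 0.0
    for xi in x:
        partial += xi              # running sum over j = 1..i
        total += partial * partial
    return total
```

For the point (1, 2, 3) the partial sums in `f3` are 1, 3 and 6, which makes the coupling between variables visible: changing the first variable affects every term, which is why correlated mutations are relevant for this function.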

    As a first experiment, Schwefel compared the convergence velocity of a (1,10)- and a (1+10)-evolution strategy with $n_\sigma = 1$ on the sphere model $f_1$ with $n = 30$. The results of a comparable experiment performed for this study (averaged over ten independent runs, with the standard deviations initialized with a value of 0.3) are shown in figure 1, where the convergence velocity or progress is measured by $\log(f_{\min}(0)/f_{\min}(g))$ with $f_{\min}(g)$ denoting the objective function value in generation $g$. It is somewhat counterintuitive to observe that the non-elitist (1,10)-strategy, where all offspring individuals might be worse than the single parent, performs better than the elitist (1+10)-strategy. This can be explained, however, by taking into account that the self-adaptation of standard deviations might generate an individual with a good objective function value but an inappropriate value of $\sigma$ for the next generation. In case of a plus-strategy, this inappropriate standard deviation might survive for a number of generations, thus hindering the combined process of search and adaptation. The resulting periods of stagnation can be prevented by allowing the strategy to forget the good search point, together with its inappropriate step size.

    From this experiment, Schwefel concluded that the non-elitist $(\mu,\lambda)$-selection mechanism is an important condition for a successful self-adaptation of strategy parameters. Recent experimental findings by Gehlhaar and Fogel [56] on more complicated objective functions than the sphere model give some evidence, however, that the elitist strategy performs as well as or even better than the $(\mu,\lambda)$-strategy in many practical cases.

    Figure 1: Comparison of the convergence velocity of a (1,10)-strategy and a (1+10)-strategy in case of the sphere model $f_1$ with $n = 30$ and $n_\sigma = 1$.

    For a further illustration of the self-adaptation principle in case of the sphere model $f_1$, we use a time-varying version where the optimum location $\vec{x}^* = (x_1^*, \ldots, x_n^*)$ is changed every 150 generations. Ten independent experiments for $n = 30$ and 1000 generations per experiment are performed with a (15,100)-evolution strategy (without recombination). The average best objective function value (solid curve) and the minimum, average, and maximum standard deviations $\sigma_{\min}$, $\sigma_{\mathrm{avg}}$, and $\sigma_{\max}$ are reported in figure 2. The curve of the objective function value clearly illustrates the linear convergence of the algorithm during the first search interval of 150 generations. After shifting the optimum location at generation 150, the search stagnates for a while at the bad new position before the linear convergence is observed again.

    Figure 2: Best objective function value and minimum, average, and maximum standard deviation in the population plotted over the generation number for the time-varying sphere model. The results were obtained by using a (15,100)-evolution strategy with $n_\sigma = 1$, $n = 30$, without recombination.

    Figure 3: Convergence velocity on $f_2$ for a $(\mu,100)$-strategy with $\mu \in \{1, \ldots, 30\}$ for the self-adaptive evolution strategy (dashed curve) and the strategy using optimum prefixed values of the standard deviations $\sigma_i$.

    Figure 4: Comparison of the convergence velocity of a (15,100)-strategy with correlated mutations (solid curve) and with self-adaptation of standard deviations only (dashed curve) in case of the function $f_3$ with $n = n_\sigma = 10$, $n_\alpha = 45$.

    The behavior of the standard deviations, which are also plotted in figure 2, clarifies the reason for the periods of stagnation of the objective function values: Self-adaptation of standard deviations works both by decreasing them during the periods of linear convergence and by increasing them during the periods of stagnation, back to a magnitude such that they have an impact on the objective function value. This process of standard deviation increase, which occurs at the beginning of each interval, needs some time during which no progress is made with respect to the objective function value. According to [25], the number of generations needed for this adaptation is inversely proportional to $\tau_0^2$ (that is, proportional to $n$) in case of a $(1,\lambda)$-evolution strategy.
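The step from "inversely proportional to $\tau_0^2$" to "proportional to $n$" can be written out as a short worked equation. The symbol $G$ for the number of adaptation generations is introduced here only for illustration, and the proportionality constant of $\tau_0$ is assumed to be one; the numerical example uses $n = 30$ as in the experiments.

```latex
G \;\propto\; \frac{1}{\tau_0^{2}}
  \;=\; \frac{1}{\left(1/\sqrt{n}\right)^{2}}
  \;=\; n ,
\qquad \text{e.g. } n = 30 \;\Rightarrow\; \tau_0 \approx 0.183, \quad \frac{1}{\tau_0^{2}} = 30 .
```

Under this assumption, doubling the problem dimension would roughly double the time the strategy needs to re-adapt its step size after the optimum has moved.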

    In case of the objective function $f_2$, each variable $x_i$ is differently scaled by a factor $\sqrt{i}$, such that self-adaptation requires learning the scaling of $n$ different $\sigma_i$. The optimal settings of standard deviations $\sigma_i \propto 1/\sqrt{i}$ are also known in advance for this function, such that self-adaptation can be compared to an evolution strategy using optimally adjusted $\sigma_i$ for mutation. The result of this comparison is shown in figure 3, where the convergence velocity is plotted for $(\mu,100)$-evolution strategies as a function of $\mu$, the number of parents, both for the self-adaptive strategy and the strategy using the optimal setting of $\sigma_i$.

    It is not surprising to see that, for the strategy using optimal standard deviations $\sigma_i$,
    the convergence rate is maximized for $\mu = 1$, because this setting exploits the perfect
    knowledge in an optimal sense. In case of the self-adaptive strategy, however, a clear maximum
    of the progress rate is reached for a value of $\mu = 12$, and both larger and smaller values of
    $\mu$ cause a strong loss of convergence speed. The collective performance of about 12 imperfect
    parents, achieved

    by means of self-adaptation, almost equals the performance of the perfect (1,100)-strategy and

    outperforms the collection of 12 perfect individuals by far. This experiment indicates that self-

    adaptation is a mechanism that requires the existence of a knowledge diversity (or diversity of 

    internal models), i.e., a number of parents larger than one, and benefits from the phenomenon


    of collective (rather than individual) intelligence.

    Concerning the objective function $f_3$, figure 4 shows a comparison of the progress for a
    (15,100)-evolution strategy with $n_\sigma = n = 10$, $n_\alpha = 0$ (that is, no correlated
    mutations) and $n_\alpha = n(n-1)/2 = 45$ (that is, full correlations). In both cases,
    intermediary recombination of object variables, global intermediary recombination of standard
    deviations, and no recombination of the rotation angles is chosen. The results demonstrate
    that, by introducing the covariances, it is possible to increase the effectiveness of the
    collective learning process in case of arbitrarily rotated coordinate systems. Recently, [180]
    has shown that an approximation of the Hessian matrix could be computed by correlated mutations
    with an upper bound of $\mu + \lambda = (n^2 + 3n + 4)/2$ on the population size, but the
    typical settings ($\mu = 15$, $\lambda = 100$)

    are often not sufficient to achieve this (an experimental investigation of the scaling behavior of 

    correlated mutations with increasing population sizes and problem dimension has not yet been

    performed).

    The choice of a logarithmic normal distribution for the modification of the standard deviations
    $\sigma_i$ in connection with a multiplicative scheme in equations (6), (4) and (8) is motivated
    by the following heuristic arguments (see [202], p. 168):

    1. A multiplicative process preserves positive values.

    2. The median should equal one to guarantee that, on average, a multiplication by a certain

    value occurs with the same probability as a multiplication by the reciprocal value (i.e.,

    the process would be neutral under absence of selection).

    3. Small modifications should occur more often than large ones.
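The three arguments can be checked numerically. The following is a minimal sketch, assuming a single multiplicative factor of the form exp(tau * N(0,1)); the value tau = 0.3 and the thresholds used to separate "small" from "large" modifications are illustrative choices, not taken from the report.

```python
import math
import random
import statistics


def lognormal_factors(tau, count, seed=0):
    """Draw multiplicative step-size factors exp(tau * N(0,1))."""
    rng = random.Random(seed)
    return [math.exp(tau * rng.gauss(0.0, 1.0)) for _ in range(count)]


factors = lognormal_factors(tau=0.3, count=100_000)

# Argument 1: every factor is positive, so sigma * factor stays positive.
print("all positive:", min(factors) > 0.0)

# Argument 2: the median is (close to) one, so multiplying by s and by 1/s
# are equally likely.
print("median:", statistics.median(factors))

# Argument 3: small changes (within 10 percent) are far more frequent than
# large ones (doubling or halving).
small = sum(1 for s in factors if 0.9 <= s <= 1.0 / 0.9)
large = sum(1 for s in factors if s >= 2.0 or s <= 0.5)
print("small vs. large modifications:", small, large)
```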

    The effectiveness of this multiplicative logarithmic normal modification is presently also

    acknowledged in evolutionary programming, since extensive empirical investigations indicate

    some advantage of this scheme over the original additive self-adaptation mechanism used in

    evolutionary programming [185, 184, 186], where

        $\sigma_i' = \sigma_i \, (1 + \alpha \, N(0,1))$    (15)

    (with a setting of $\alpha \approx 0.2$ [186]). Recent investigations indicate, however, that this becomes

    reversed when noisy objective functions are considered, where the additive mechanism seems

    to outperform multiplicative modifications [4].

    The study by Gehlhaar and Fogel [56] also indicates that the order of the modifications of
    $x_i$ and $\sigma_i$ has a strong impact on the effectiveness of self-adaptation: It is important
    to mutate the standard deviations first and to use the mutated standard deviations for the
    modification of object variables. As the authors point out in that study, the reversed mechanism
    might suffer from generating offspring that have useful object variable vectors but bad strategy
    parameter vectors, because these have not been used to determine the position of the offspring
    itself.
    Concerning the sphere model $f_1$ and a $(1,\lambda)$-strategy, Beyer has recently indicated that
    equation (15) is obtained from equation (6) by Taylor expansion breaking off after the linear
    term, such that both mutation mechanisms should behave identically for small settings of the
    learning rates $\tau_0$ and $\alpha$, when $\tau_0 = \alpha$ [25]. This was recently confirmed
    also with some experiments for

    the time-varying sphere model [15].
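The truncation argument can be written out in one line. The sketch below assumes that, for a single step size, the log-normal rule has the form $\sigma' = \sigma \exp(\tau_0 z)$ with $z \sim N(0,1)$; the exact form of equation (6) is not reproduced in this section, so this form is an assumption.

```latex
\sigma' = \sigma \exp(\tau_0 z)
        = \sigma \left( 1 + \tau_0 z + \tfrac{1}{2} \tau_0^2 z^2 + \dots \right)
        \approx \sigma \, (1 + \tau_0 z), \qquad z \sim N(0,1)
```

Dropping all terms of order $\tau_0^2$ and higher leaves the additive rule of equation (15) with $\alpha = \tau_0$, which is why the two mechanisms can only be expected to agree for small learning rates.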


    4 Variations

    4.1 Mutative Step-Size Control

    For a $(1,\lambda)$-strategy and $n_\sigma = 1$, the self-adaptation of strategy parameters can
    also be facilitated by using the so-called mutational step size control by Rechenberg, which
    modifies the standard deviations $\sigma$ according to the following rule ([169], p. 47):

        $\sigma' = \begin{cases} \sigma \alpha & \text{if } u \sim U(0,1) \le 1/2 \\ \sigma / \alpha & \text{if } u \sim U(0,1) > 1/2 \end{cases}$    (16)

    A value of $\alpha = 1.3$ of the learning rate is proposed by Rechenberg.
    As shown in [25], this self-adaptation rule also provides a reasonable choice with a
    convergence velocity comparable to that achieved by equation (4) for the convex case. This result

    confirms that the self-adaptation principle works for a variety of different probability density

    functions for the modification of step sizes, i.e., it is a very robust technique.
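A minimal sketch of the two-point rule inside a (1,lambda)-strategy may make the mechanism concrete. It assumes the rule multiplies or divides the step size by a factor of 1.3 with equal probability; the function names, the sphere test function and all parameter values are illustrative and not taken from [169].

```python
import random


def sphere(x):
    """Sphere model: sum of squares."""
    return sum(v * v for v in x)


def mutate_step_size(sigma, rng, alpha=1.3):
    """Two-point rule: enlarge or shrink sigma with probability 1/2 each."""
    return sigma * alpha if rng.random() <= 0.5 else sigma / alpha


def one_comma_lambda_es(f, x, sigma, lam=10, generations=200, seed=0):
    """(1,lambda)-ES with a single step size per individual."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Mutate the step size first, then use it for the object variables.
            s = mutate_step_size(sigma, rng)
            y = [xi + s * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(y), y, s))
        # Comma selection: the best offspring replaces the parent.
        _, x, sigma = min(offspring, key=lambda item: item[0])
    return x, sigma


best_x, best_sigma = one_comma_lambda_es(sphere, [10.0] * 10, sigma=1.0)
print(sphere(best_x), best_sigma)
```

The offspring inherits the step size that actually produced its position, so selection on the objective value implicitly selects step sizes as well.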

    4.2 Derandomized Step-Size Adaptation

    In contrast to the techniques discussed so far, the derandomized mutational step size control
    proposed in [146] accumulates information about the selected individual’s mutation vector
    $\vec{z}$ over the course of evolution by adding up the successful mutations. The authors claim
    that the method enables a reliable adaptation of individual step sizes (i.e., $n$ different
    standard deviations $\sigma_i$) even in small populations, namely, in $(1,\lambda)$-strategies
    with $\lambda = 10$ in the experiments reported. The proposed method utilizes a vector
    $\vec{z}^{\,g}$ of accumulated mutations as well as individual step sizes $\sigma_i$ and a global
    step size $\sigma$ according to [146]:

        $\vec{z}^{\,g} = (1 - c) \, \vec{z}^{\,g-1} + c \, \vec{z}$    (17)

        $\sigma' = \sigma \cdot \left( \exp\left( \frac{\|\vec{z}^{\,g}\|}{\sqrt{n} \sqrt{c/(2-c)}} - 1 + \frac{1}{5n} \right) \right)^{\beta}$    (18)

        $\sigma_i' = \sigma_i \cdot \left( \frac{|z_i^g|}{\sqrt{c/(2-c)}} + 0.35 \right)^{\beta'}$    (19)

        $x_i' = x_i + \sigma' \, \sigma_i' \, N_i(0,1)$    (20)

    Essentially, equation (17) captures the history of successful mutations by a weighted sum
    of the mutations selected in preceding generations (i.e., $\vec{z}^{\,g-1}$) and the mutation
    vector $\vec{z}$ of the selected parent individual (notice that the method applies to
    $(1,\lambda)$-strategies, i.e., $\vec{z}$ is the mutation vector of the single best offspring
    individual produced in generation $g - 1$). The vector $\vec{z}^{\,g}$ is then used to update
    both a global step size $\sigma$ and individual step sizes $\sigma_i$ according to equations (18)
    and (19), where $\|\vec{z}^{\,g}\|$ in equation (18) denotes the absolute value of
    $\vec{z}^{\,g}$, while $|z_i^g|$ in equation (19) indicates the absolute value of its $i$-th
    component.
    Equation (20) then denotes the generation of offspring individuals from the single parent
    (with components $x_i$) in a way similar to equation (6), but now using $\sigma'$ and
    $\sigma_i'$. Concerning the choice of the new learning rates $c$, $\beta$, and $\beta'$, both
    theoretical and empirical arguments are given in [146] for the settings $c = 1/\sqrt{n}$,
    $\beta = 1/\sqrt{n}$, $\beta' = 1/n$.

    The experimental results presented in [146] demonstrate a clear convergence velocity im-

    provement of the derandomized mutational step size control when compared to an (8,50)-

    evolution strategy using the update rule given in equation (6), but the investigations focus on

    unimodal objective functions.
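The update cycle can be sketched as follows. This is not the implementation of [146]: it assumes the accumulated vector is normalized by sqrt(c/(2-c)) (and additionally by sqrt(n) for the global step size), uses the learning rates c = beta = 1/sqrt(n) and beta' = 1/n quoted above, and all names, the sphere test function and the starting values are illustrative.

```python
import math
import random


def sphere(x):
    """Sphere model: sum of squares."""
    return sum(v * v for v in x)


def derandomized_es(f, x, sigma=1.0, lam=10, generations=200, seed=0):
    """(1,lambda)-ES with accumulated (derandomized) step-size adaptation."""
    rng = random.Random(seed)
    n = len(x)
    c = beta = 1.0 / math.sqrt(n)      # accumulation rate and global damping
    beta_ind = 1.0 / n                 # damping of the individual step sizes
    norm = math.sqrt(c / (2.0 - c))    # assumed normalization of z_acc
    sigma_i = [1.0] * n
    z_acc = [0.0] * n
    for _ in range(generations):
        best = None
        for _ in range(lam):
            # Offspring: x + sigma * sigma_i * N(0,1), component-wise.
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            y = [xi + sigma * si * zi for xi, si, zi in zip(x, sigma_i, z)]
            fy = f(y)
            if best is None or fy < best[0]:
                best = (fy, y, z)
        _, x, z_sel = best
        # Accumulate the selected mutation vector.
        z_acc = [(1.0 - c) * za + c * zs for za, zs in zip(z_acc, z_sel)]
        length = math.sqrt(sum(v * v for v in z_acc))
        # Deterministic updates of the global and the individual step sizes.
        sigma *= math.exp(length / (math.sqrt(n) * norm) - 1.0 + 1.0 / (5.0 * n)) ** beta
        sigma_i = [si * (abs(za) / norm + 0.35) ** beta_ind
                   for si, za in zip(sigma_i, z_acc)]
    return x, sigma, sigma_i


x_best, sigma_best, sigma_i_best = derandomized_es(sphere, [10.0] * 10)
print(sphere(x_best), sigma_best)
```

Note that the step sizes themselves are never mutated; the only random quantities are the offspring mutation vectors, which is the point made in the following paragraph about the method sitting between adaptive and self-adaptive control.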

    The general idea of utilizing information from past generations as well is very convincing

    and should motivate further research on the derandomized self-adaptation scheme. It should be
    noted, however, that the method has to be classified at the border between adaptive and
    self-adaptive control methods, because equations (18) and (19) do not define a mutational
    variation of step sizes involving a random variation in the sense of those defined previously.
    Randomness is introduced only by means of the vector $\vec{z}^{\,g}$, which takes the mutation
    vector of the parent

    individual into account, not an actually generated random variation.

    4.3 Hierarchical Evolution Strategies

    This kind of evolution strategy abstracts from the individual and takes genetic operators even
    on the level of populations into account. It was introduced by Rechenberg [169] and denoted as

        $[\mu'/\rho' \stackrel{+}{,} \lambda' (\mu/\rho \stackrel{+}{,} \lambda)^{\gamma}]^{\gamma'}$-ES.

    Here the inner brackets denote a normal $(\mu/\rho \stackrel{+}{,} \lambda)$-ES (the notation
    $/\rho$ indicates a $\rho$-ary recombination operator) which runs $\lambda'$ times for $\gamma$
    generations each. After that there are $\lambda'$ populations, and $\mu'$ populations are
    selected for the next generation on the population level. These $\mu'$ populations run through
    a recombination and mutation cycle $(\mu'/\rho')$ on the level of populations to generate
    $\lambda'$ new populations and then run the inner $(\mu/\rho \stackrel{+}{,} \lambda)$-ES again
    for $\gamma$ cycles. This reproduction cycle on the population level is done $\gamma'$ times.

    The problem that arises is how to perform recombination and mutation on the level of
    populations. Recombination of populations can be done by simply taking single individuals from
    all $\mu'$ populations into the succeeding population. Mutation can then be invoked by mutating
    each of the single individuals or by moving the centres of gravity of the populations [169].
    The latter of course needs more computational effort.

    One can recognize that there are two levels of hierarchy in the approach shown here:

    1. The level of individuals, and

    2. the level of populations.

    The concept however can be applied to more than one level and the nesting can increase to

    higher levels like sorts and families in natural evolution [77].

    The benefit of these hierarchical or nested evolution strategies is the isolation of populations.
    These populations can run in parallel and explore different parts of the search space. Because
    this is done several times, it leads to a better exploration of the search space. Rechenberg
    indicates that this kind of strategy is qualified for multimodal optimization [169].
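The two-level loop can be sketched in a few lines. This is a simplified illustration, not Rechenberg's implementation: the inner strategy is a plain plus-strategy with a fixed step size and no recombination, populations are ranked by their best member, and population-level recombination draws individuals from all selected populations and mutates them. All names and parameter values are assumptions made for the example.

```python
import random


def sphere(x):
    """Sphere model: sum of squares."""
    return sum(v * v for v in x)


def inner_es(pop, f, mu, lam, gamma, sigma, rng):
    """Isolation phase: a (mu+lam)-ES with fixed step size for gamma generations."""
    for _ in range(gamma):
        offspring = [[v + sigma * rng.gauss(0.0, 1.0) for v in rng.choice(pop)]
                     for _ in range(lam)]
        pop = sorted(pop + offspring, key=f)[:mu]
    return pop


def nested_es(f, n, mu_o=2, lam_o=4, mu=3, lam=10, gamma=5, gamma_o=10,
              sigma=0.3, seed=1):
    """Nested ES: mu_o selected populations, lam_o new populations per outer cycle."""
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(mu)]
            for _ in range(mu_o)]
    for _ in range(gamma_o):
        # Population-level recombination and mutation: each new population
        # takes mutated individuals from all currently selected populations.
        pool = [ind for pop in pops for ind in pop]
        new_pops = [[[v + sigma * rng.gauss(0.0, 1.0) for v in rng.choice(pool)]
                     for _ in range(mu)] for _ in range(lam_o)]
        # Each new population evolves in isolation for gamma generations.
        new_pops = [inner_es(pop, f, mu, lam, gamma, sigma, rng) for pop in new_pops]
        # Population-level comma selection, ranked by the best member.
        pops = sorted(new_pops, key=lambda pop: min(f(ind) for ind in pop))[:mu_o]
    return min((ind for pop in pops for ind in pop), key=f)


best = nested_es(sphere, n=5)
print(sphere(best))
```

Because the isolated runs do not exchange individuals, the list comprehension over `new_pops` is the natural place to parallelize.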

    This ES can also be used for multicriteria optimization (see also section 5.4) because the
    objectives to select for can be different on every step of the hierarchy. This only works with
    independent objectives, however, because e.g. the objective selected for on the level of
    populations is not working on the level of individuals. This will destroy every good
    information regarding one objective in the case of contradicting ones.

    A detailed description of the implementation is given in [169] but one should have in mind

    that this approach again increases the number of parameters for an evolution strategy. This does

    not only need more effort in programming but also requires knowledge and experience in the

    tuning of the parameters to achieve good results.

    4.4 The ($\mu$, $\kappa$, $\lambda$, $\rho$)-Strategy: Aging of Individuals

    In the $(\mu+\lambda)$-ES the $\lambda$ offspring and their $\mu$ parents are united before,
    according to a given criterion, the $\mu$ fittest individuals are selected from this set of size
    $\mu + \lambda$. Both $\mu$ and $\lambda$ can be as small as 1 in this case, in principle.
    Indeed, the first experiments were all performed on the basis of a $(1+1)$-ES. In the
    $(\mu,\lambda)$-ES, with $\lambda > \mu \ge 1$, the $\mu$ new parents are selected from the
    $\lambda$ offspring only, no matter whether they surpass their parents or not. The latter
    version is in danger to diverge (especially in connection with self-adapting variances – see
    below) if the so far best position is not stored externally or even preserved within the
    generation cycle (so-called elitist strategy). So far, only empirical results have shown that
    the comma version has to be preferred when internal strategy parameters have to be learned
    on-line collectively. For that to work, $\mu > 1$ and intermediary recombination of the mutation
    variances seem to be essential preconditions. It is not true that ESs consider recombination as
    a subsidiary operator.

    The $(\mu,\lambda)$-ES implies that each parent can have children only once (duration of life:
    one generation = one reproduction cycle), whereas in the plus version individuals may live
    eternally – if no child achieves a better or at least the same quality. The new
    $(\mu,\kappa,\lambda,\rho)$-ES as defined in [209, 208] introduces a maximal life span of
    $\kappa \ge 1$ reproduction cycles (iterations). Now, both original strategies are special cases
    of the more general strategy, with $\kappa = 1$ resembling the comma- and with $\kappa = \infty$
    resembling the plus-strategy, respectively. Thus, the advantages and disadvantages of both
    extremal cases can be scaled arbitrarily. Other new options include:

    • Free number of parents involved in reproduction (not only 1, 2, or all).
    • Tournament selection as alternative to the standard $(\mu,\lambda)$-selection.
    • Free probabilities of applying recombination and mutation.
    • Further recombination types including crossover.

    In a $(\mu,\kappa,\lambda,\rho)$-ES, the representation of individuals is extended by a positive
    integer value $\kappa \in \mathbb{N}_0$, the remaining life span of the individual in iterations
    (reproduction cycles). Whenever a new individual is created by mutation and recombination, its
    remaining life span is initialized to the maximal life span $\kappa$. The remaining life span is
    decremented by the selection operator for all individuals which survive selection.
    The remaining life span is then used to modify the traditional deterministic ES selection
    operator, which can be defined formally as:

        $\mathrm{sel}: I^{\mu+\lambda} \to I^{\mu}$    (21)


    Let $P^{(T)}$ denote some parent population in reproduction cycle $T$, $\tilde{P}^{(T)}$ their
    offspring produced by recombination and mutation, and
    $Q^{(T)} = P^{(T)} \sqcup \tilde{P}^{(T)} \in I^{\mu+\lambda}$ where the operator $\sqcup$
    denotes the union operation on multisets. Then

        $P^{(T+1)} := \mathrm{sel}(Q^{(T)})$    (22)

    The next reproduction cycle contains the $\mu$ best individuals still having a positive
    remaining duration of life, i.e., the following relation is valid:

        $\forall \vec{a} \in P^{(T+1)} : \kappa_a > 0 \wedge \not\exists \, \vec{b} \in Q^{(T)} \setminus P^{(T+1)} : \vec{b} \stackrel{\kappa}{>} \vec{a}$    (23)

    where the relation $\stackrel{\kappa}{>}$ (read: better than) introduces a maximum duration of
    life, $\kappa$, that defines an individual to be better than another one if its remaining
    duration of life $\kappa_k$ is still positive and its fitness (measured by the objective
    function) is better.

    The definition of the $\stackrel{\kappa}{>}$-relation is given by:

        $\vec{a}_k \stackrel{\kappa}{>} \vec{a}_\ell \; :\Leftrightarrow \; \kappa_k > 0 \wedge f(\vec{x}_k) \le f(\vec{x}_\ell)$    (24)

    At the end of the selection process, the remaining maximum life durations have to be
    decremented by one for each survivor:

        $\kappa_k^{(T+1)} := \kappa_k^{(T)} - 1 \quad \forall k \in \{1, \dots, \mu\}$    (25)

    It should be noted again that, according to the definition (24) of the “better than” relation, a
    setting of $\kappa = 1$ results in discarding the parents regardless of their quality (i.e., the
    $(\mu,\lambda)$-selection as in traditional evolution strategies) while $\kappa = \infty$
    guarantees parents to be discarded only if they are outperformed by offspring individuals (i.e.,
    the $(\mu+\lambda)$-selection as in traditional evolution strategies).
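A small sketch shows how the remaining life span turns one selection routine into either the comma or the plus scheme. It assumes minimization and represents an infinite life span by `math.inf`; the `Individual` class and the function names are hypothetical and only serve the illustration.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Individual:
    x: tuple          # object variables
    fitness: float    # objective function value (smaller is better)
    life: float       # remaining life span; math.inf models kappa = infinity


def new_offspring(x, fitness, kappa):
    """A newly created individual starts with the maximal life span kappa."""
    return Individual(tuple(x), fitness, kappa)


def select(parents, offspring, mu):
    """Keep the mu best individuals that still have a positive life span."""
    pool = list(parents) + list(offspring)
    alive = [a for a in pool if a.life > 0]
    survivors = sorted(alive, key=lambda a: a.fitness)[:mu]
    # Every survivor ages by one reproduction cycle.
    return [replace(a, life=a.life - 1) for a in survivors]


# kappa = 1: a parent that already survived once has life 0 and is discarded,
# even though its fitness (0.1) is better than that of every offspring.
old_parent = Individual((0.0,), 0.1, 0)
children = [new_offspring((float(i),), float(i), 1) for i in (1, 2, 3)]
print([a.fitness for a in select([old_parent], children, mu=2)])

# kappa = infinity: the same parent survives as long as it is not outperformed.
eternal_parent = Individual((0.0,), 0.1, math.inf)
eternal_children = [new_offspring((float(i),), float(i), math.inf) for i in (1, 2, 3)]
print([a.fitness for a in select([eternal_parent], eternal_children, mu=2)])
```

Intermediate values of the life span let a good parent survive a bounded number of cycles, which is the scaling between both extremes mentioned above.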

    As an alternative to this variant of selection, the tournament selection is well suited for
    parallelization of the selection process. This method selects $\mu$ times the best individual
    from a random subset $B_k$ of size $|B_k| = \xi$, $2 \le \xi \le \mu + \lambda$,
    $\forall k \in \{1, \dots, \mu\}$, and transfers it to the next reproduction cycle (note that
    duplicates may appear!). The best individual within each subset $B_k$ is selected according to
    the $\stackrel{\kappa}{>}$ relation which was introduced in (24). A formal definition of the
    $(\mu,\kappa,\lambda,\rho)$ tournament selection follows: Let

        $B_k \subseteq Q^{(T)} \quad \forall k \in \{1, \dots, \mu\}$    (26)

    be random subsets of $Q^{(T)}$, each of size $|B_k| = \xi$. For each $k \in \{1, \dots, \mu\}$
    choose $\vec{a}_k \in B_k$ such that

        $\forall \vec{b} \in B_k : \vec{a}_k \stackrel{\kappa}{>} \vec{b}$    (27)

    Finally,

        $P^{(T+1)} := \bigsqcup_{k=1}^{\mu} \{ \vec{a}_k^{\,(T+1)} \}$    (28)
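The tournament variant can be sketched in the same spirit. The sketch assumes minimization and a tournament size passed as `size`; a tournament whose random subset contains no individual with a positive life span is simply redrawn, which is an assumption of this example because the formal definition does not cover that case. The `Individual` class is hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Individual:
    fitness: float    # objective function value (smaller is better)
    life: float       # remaining life span in reproduction cycles


def tournament_select(pool, mu, size, rng):
    """Select mu times the best living individual of a random subset of the pool."""
    if not any(a.life > 0 for a in pool):
        raise ValueError("no individual with positive remaining life span")
    chosen = []
    while len(chosen) < mu:
        subset = rng.sample(pool, size)            # random subset without repetition
        alive = [b for b in subset if b.life > 0]
        if not alive:
            continue                               # assumption: redraw the subset
        best = min(alive, key=lambda b: b.fitness)
        chosen.append(replace(best, life=best.life - 1))  # survivor ages by one cycle
    return chosen


rng = random.Random(0)
# The individual with the best fitness (0.0) has no life span left.
pool = [Individual(float(i), 0 if i == 0 else 1) for i in range(10)]
print([a.fitness for a in tournament_select(pool, mu=5, size=3, rng=rng)])
```

Since each of the tournaments only looks at its own subset, they can be evaluated independently, and the same individual may win more than once.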

    As an extension to the traditional recombination operator, the generalized recombination
    operator $\mathrm{rec}: I^{\mu} \to I$ is defined as follows:

        $\mathrm{rec} := \mathrm{re} \circ \mathrm{co}$    (29)

    where $\mathrm{co}: I^{\mu} \to I^{\rho}$ chooses $1 \le \rho \le \mu$ parent vectors from
    $I^{\mu}$ with uniform probability, and $\mathrm{re}: I^{\rho} \to I$ creates one offspring
    vector by mixing characters from $\rho$ parents.
    Let $A \subseteq P^{(T)}$ of size $|A| = \rho$ be a subset of arbitrary parents chosen by the
    operator $\mathrm{co}$, and let $\hat{\vec{a}} \in I$ be the offspring to be generated. If
    $A = \{\vec{a}_1, \vec{a}_2\}$ holds, $\vec{a}_1$ and $\vec{a}_2$ being two out of $\mu$ parents,
    recombination is called bisexual. If $A = \{\vec{a}_1, \dots, \vec{a}_\rho\}$ and $\rho > 2$,
    recombination is called multisexual. While recombination in evolution strategies was originally
    proposed for the two cases of $\rho = 2$ and $\rho = \mu$ (global recombination), and was
    restricted to $\rho = 2$ in genetic algorithms, Eiben generalized the idea for an arbitrary
    number of parents $2 \le \rho \le \mu$ involved in the creation of either one (e.g., in case of
    scanning crossover) or $\rho$ (e.g., in case of diagonal crossover) offspring individuals
    [39, 41, 40]. This generalization is adapted here for extending discrete and intermediary
    recombination in evolution strategies to an arbitrary number of parents, but still generating
    one offspring only per application of the recombination operator. First experimental results in
    parameter optimization indicate that the optimum value of $\rho$ is problem-dependent, but in
    many cases $\rho = \mu$ is the most efficient setting for recombination of the object variables
    [38].

    In contrast to traditional evolution strategies which always apply recombination for the
    creation of offspring, we also propose here to introduce recombination probabilities
    $p_r \in [0,1]^3$ as a further generalization of the algorithm. A recombination probability
    $p_r$ for one of the three components of individuals that might undergo recombination is
    algorithmically realized by sampling a uniform random variable $u \sim U(0,1)$ and applying no
    recombination, if $u > p_r$, or the corresponding recombination operator, if $u \le p_r$.
    Finally, an offspring individual created by recombination is equipped with a remaining life
    time equal to the maximal life span $\kappa$.
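The composition of parent choice and mixing can be sketched for a single component (for example the object variables). The sketch assumes that "applying no recombination" means copying one randomly chosen parent, which the text does not specify; the function names mirror the operators above but are otherwise hypothetical.

```python
import random


def co(parents, rho, rng):
    """Choose rho distinct parents with uniform probability."""
    return rng.sample(parents, rho)


def re_discrete(chosen, rng):
    """Discrete recombination: each component is copied from a random chosen parent."""
    return [rng.choice(chosen)[i] for i in range(len(chosen[0]))]


def re_intermediate(chosen):
    """Intermediary recombination: component-wise mean of the chosen parents."""
    rho = len(chosen)
    return [sum(p[i] for p in chosen) / rho for i in range(len(chosen[0]))]


def rec(parents, rho, p_r, rng, kind="intermediate"):
    """Generalized recombination rec = re o co, applied with probability p_r."""
    if rng.random() > p_r:
        # Assumption: without recombination the offspring copies one parent.
        return list(rng.choice(parents))
    chosen = co(parents, rho, rng)
    if kind == "discrete":
        return re_discrete(chosen, rng)
    return re_intermediate(chosen)


rng = random.Random(0)
parents = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
print(rec(parents, rho=3, p_r=1.0, rng=rng))                   # mean of all parents
print(rec(parents, rho=2, p_r=1.0, rng=rng, kind="discrete"))  # mix of two parents
```

Setting `rho` to the number of parents reproduces global recombination, and `rho = 2` the bisexual case.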

    5 Application-Oriented Extensions

    5.1 Noisy Objective Functions

    Originally designed for experimental optimization [166, 203], Evolution Strategies are claimed
    to be of general applicability as well as robust in the presence of noise. Whereas the
    universality of these algorithms has been validated through many applications [13], little is
    known about the robustness in case of perturbations. But the ability to deal with noisy
    functions is a prerequisite not only for experimental optimization, e.g. because of limited
    precision of observations, but also in the context of numerical optimization, as in the field
    of computer simulation.
    Despite their simple structure, Evolution Strategies show a complex dynamic behavior.
    Theoretical investigations up to now were successful only for simplified strategy variants and
    convex objective functions like the sphere model $f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2$.

    Here we cite a result from Beyer [24], which describes the dynamics of the $(1,\lambda)$-ES on
    the noisy objective function $f_1(\vec{x}) + N(0, \sigma_\epsilon)$:

        $\frac{\Delta R}{\Delta g} = \sigma^2 \left( \frac{n}{2R} - \frac{2 R \, c_{1,\lambda}}{\sqrt{\sigma_\epsilon^2 + (2 R \sigma)^2}} \right)$    (30)

    $R$ and $g$ denote the remaining distance to the true optimum point ($\vec{0}$) and the current
    generation number, respectively. The standard deviations for the mutation and the perturbation
    are given by $\sigma$ and $\sigma_\epsilon$. The model is of dimensionality $n$ and
    $c_{1,\lambda}$ denotes the so called progress coefficient, which is a slowly increasing
    function in $\lambda$ [24]:

        $c_{1,\lambda} \approx \sqrt{2 \ln \lambda}$    (31)

    Expressions (30) and (31) hold for large $n$ and $\lambda$-values, respectively.
    Table 1 lists some values of $c_{1,\lambda}$ which are analytically derived for $\lambda \le 5$
    and numerically approximated for $\lambda > 5$ from Scheel [187].

        $\lambda$          2        3        5        10       50       100
        $c_{1,\lambda}$    0.5642   0.8463   1.1630   1.539    2.249    2.508

    Table 1: Some values for $c_{1,\lambda}$.
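The tabulated values can be reproduced numerically. The sketch below relies on the standard characterization of the progress coefficient of a (1,lambda)-strategy as the expected value of the largest of lambda independent standard normal samples; this characterization is not stated in this section and is used here as an assumption. The integration bounds and step count are illustrative.

```python
import math


def progress_coefficient(lam, lo=-8.0, hi=8.0, steps=16000):
    """Expected maximum of lam standard normal samples (midpoint rule).

    Integrates x * lam * pdf(x) * cdf(x)**(lam - 1) over [lo, hi].
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        total += x * lam * pdf * cdf ** (lam - 1)
    return total * dx


for lam in (2, 3, 5, 10, 50, 100):
    print(lam, round(progress_coefficient(lam), 4))
```

The printed values can be compared directly against the entries of Table 1.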

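    We will make use of equation (30) to investigate the steady state, i.e. $R_\infty$:
    $\Delta R / \Delta g \to 0$.
    Assuming $\lim_{g \to \infty} \sigma = 0$ we get

        $R_\infty = \sqrt{\frac{\sigma_\epsilon \, n}{4 \, c_{1,\lambda}}}$    and    (32)

        $f_1(R_\infty) = \frac{\sigma_\epsilon \, n}{4 \, c_{1,\lambda}}$    (33)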

    Equation (33) can be used to validate experimental results for the sphere model.
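As a worked example, the steady-state value can be evaluated for the experimental setting described below. The sketch assumes that equation (33) reads f(R_inf) = sigma_eps * n / (4 * c), which reproduces the theory column of Table 2 for n = 30 and the (1,100) progress coefficient 2.508 from Table 1; the function and variable names are illustrative.

```python
def steady_state_fitness(sigma_eps, n, c):
    """Steady-state objective value f(R_inf) = sigma_eps * n / (4 * c)."""
    return sigma_eps * n / (4.0 * c)


N_DIM = 30        # dimension of the sphere model in the experiments
C_1_100 = 2.508   # progress coefficient for lambda = 100 (Table 1)

for sigma_eps in (1.0, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001):
    print(sigma_eps, round(steady_state_fitness(sigma_eps, N_DIM, C_1_100), 3))
```

For a perturbation of 1.0 this gives about 2.99, against an observed value of 3.8 in Table 2, i.e. a ratio of roughly 1.3.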

    For the experiments, standard deviations
    $\sigma_\epsilon \in \{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ are utilized to perturb the
    function values and the evolution strategies’ behavior is compared to the unperturbed case
    ($\sigma_\epsilon = 0$). The experiments are performed by running a (1,100)-ES as well as a
    (15,100)-ES with $n_\sigma = 1$ for the convergence velocity test. Each experiment is repeated
    for a total of $N = 100$ independent runs in order to obtain statistically significant results.
    The standard method assesses the quality of an optimization run by concentrating on the
    individual of best (in our case, minimal) objective function value; this is not reasonable in
    case of perturbed evaluations because the populations’ extreme values represent outliers.
    Instead, the evaluations are based on the average objective function value of the offspring
    population, which provides a more robust measure of the true (unperturbed) quality of the
    individuals.
    The experiments are performed on the sphere model $f_1$ with $n = 30$. The initial population
    consists of object variables chosen uniformly at random from the interval $[-30, 30]$. All
    initial standard deviations are set to a value of 25.0, and $n_\sigma = 1$ is used for all runs.
    Each of the $N = 100$ runs is terminated after 20 000 function evaluations (200 generations),
    and the objective function data of all runs is averaged to obtain a result of statistical
    significance. (Indeed, the data from 100 runs passes a Kolmogorov-Smirnov test for the
    hypothesis of normally distributed data for a significance level of 0.01 and a confidence
    interval of 1% around the average.)

    Figure 5 shows the behavior of a (1,100)-ES for the set of different perturbation magnitudes

    as well as the unperturbed case. The average objective function value is plotted against the

    number of generations.


    Figure 5: Courses of evolution for (1,100)-ES on the sphere model and standard deviations
    $\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.

    The courses of evolution clearly demonstrate the capability of an evolution strategy to proceed
    as fast as in the unperturbed case as long as the magnitude of $\sigma_\epsilon$ is small in
    comparison to $f$. If $f$ decreases beyond a certain level, the selection is based on the
    perturbation only and the search process becomes a random walk, thus limiting the convergence
    precision.
    Table 2 shows a remarkable accordance between theoretical and experimental results comparing
    the (1,100)-ES steady states. The difference of a factor of approximately 1.3 can be explained
    through the fact that equation (33) is valid for $n \to \infty$ and $\sigma \to 0$ only.

        $\sigma_\epsilon$    (1,100)-ES    (1,100)-ES     (15,100)-ES
                             theory        observation    observation
        1.0                  2.990         3.8            0.975
        0.5                  1.495         1.9            0.469
        0.1                  0.299         0.4            0.091
        0.05                 0.150         0.2            0.047
        0.01                 0.030         0.038          0.009
        0.005                0.015         0.02           0.005
        0.001                0.003         0.004          0.001

    Table 2: $f(R_\infty)$ for the (1,100)-ES theory, (1,100)-ES experiment and the (15,100)-ES
    experiment.

    Increasing the parent population size to a more practical value of 15, we observe a similar
    behavior (figure 6). A closer look not only shows a moderate speed up due to the influence
    of recombination, but also a much better localisation of the optimum point in the steady state
    by approximately a factor of 4. This effect is caused by the reduction of selection pressure,
    which prevents the outliers from taking over the whole population. A first analysis lets us
    assume an optimal parameter value for $\mu$ between 10 and 15 in this configuration.

    5.2 Robust Design

    Robustness is an important requirement for almost all kinds of products, i.e. they should keep
    a good performance under varying conditions (temperature or humidity). Furthermore, the impact
    of wear, as well as manufacturing tolerances, should be limited as much as possible.
    Consequently, the production process itself as well as the environmental influences after the
    product is put to use have to be regarded during the product design. We have shown for
    multilayer optical coatings (MOCs) how robust designs can be achieved by using evolutionary
    algorithms. MOCs are used to guarantee specific transmission and/or reflection characteristics
    of optical devices. The objective of MOC designs is to find sequences of layers of particular
    materials with specific thicknesses showing the desired characteristics as closely as possible.
    The MOC design problem is not analytically solvable.
    Let $\vec{x} = (x_1, \dots, x_n)$ be a vector of parameters of a given design problem, e.g., the
    refraction indices and thickness of the optical layers. Given a function $f(\vec{x})$ describing
    the merit of a design feature, e.g. the color perception of the reflected light, and $\tau$
    being a target value for $f(\vec{x})$, then if disturbances are neglected the task is to find
    such an $\vec{x}$ that the difference between $f(\vec{x})$ and $\tau$ is minimized.

    On the other hand the usability of two products although manufactured under almost iden-

    tical conditions might differ significantly, due to external conditions such as temperature and

    humidity, or internal factors such as wear as well as manufacturing tolerances. Some of these

    factors are not controllable at all. Others can only be reduced with unjustifiable effort. Thus

    they are regarded as disturbances, and it is desired to reduce their influence as much as possible.

    Here we focused on manufacturing tolerances, but the approach could easily be extended.

    The disturbances are represented by a vector of random numbers
    $\vec{\delta} = (\delta_1, \dots, \delta_n)$. If the probability distribution of the $\delta_i$
    are known as well as their influence on $f$ we might rewrite $f(\vec{x})$ as
    $\tilde{f}(\vec{x}, \vec{\delta})$. In our example the disturbances are assumed to be normally
    distributed with zero mean and will have an additive influence on the parameter values. Thus,
    we define

        $\tilde{f}(\vec{x}, \vec{\delta}) = f(x_1 + \delta_1, \dots, x_n + \delta_n)$    (34)

    The task is now to minimize the deviations of $\tilde{f}(\vec{x}, \vec{\delta})$ from $\tau$.

    This leads to the question of how to assess these deviations. The traditional approach regards
    all products with $|\tilde{f}(\vec{x}, \vec{\delta}) - \tau| \le \epsilon$ as equally good for
    some predefined $\epsilon$ and all others as off-cuts. But this approach is somewhat
    unrealistic, since if such products are assembled to larger units such as devices on electronic
    boards, malfunctions might occur due to aggregations of deviations of single elements.
    The method of parameter design after Taguchi [218, 93, 179] takes these effects into account
    by considering every deviation from the objective $\tau$ as a loss. In practical applications
    quadratic loss functions of the form

        $(\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2$    (35)

    have proven to be well suited if no better alternative is known. The expected loss then becomes

        $L = k \, E\left( (\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2 \right)$    (36)

    where $k$ is some constant and $E$ denotes the expectation value of the quadratic deviation.

    Figure 6: Courses of evolution for (15,100)-ES on the sphere model and standard deviations
    $\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.

    In our work we follow the approach of Greiner [61, 62] who defines the objective functionas

    ( ; 

    ~

    f )

    2

    ( ~x  ) =  k 

    (   ; 

    f  ( ~x

      ) )

    2

    P  (

      ) d 

    (37)

    where P  ( ~    )  denotes the the joint probability distribution of the distrubances. Since in most

    applications the expectation value E    cannot be calculated analytically it must be approximated.

    Here we use

    $$\frac{1}{t} \sum_{i=1}^{t} (\tau - f(\vec{x}, \vec{\delta}_i))^2 \qquad (38)$$

    as an estimate, where $\vec{\delta}_i$, $i = 1, \ldots, t$, are vectors of normally distributed random numbers with mean zero and standard deviation $\sigma$. The estimation error scales proportionally to $1/\sqrt{t}$, and since in most applications the possible number of evaluations is very limited, this approach yields a stochastic optimization problem. As evolutionary algorithms have proven their robustness in case of noisy objective functions [46, 24, 9, 64], they are promising candidates here.
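The sampling estimate (38) can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes that the disturbance is simply added to the design vector, it uses the sphere function as a stand-in merit function, and the names `expected_loss`, `target` and `rng` are hypothetical.

```python
import random


def expected_loss(f, x, target, sigma, t, rng):
    """Monte Carlo estimate of the mean squared deviation from the target."""
    total = 0.0
    for _ in range(t):
        # Assumption: disturbances are additive, one N(0, sigma) draw per variable.
        disturbed = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += (target - f(disturbed)) ** 2
    return total / t


def sphere(x):
    # Stand-in merit function; the report's application uses optical coatings.
    return sum(xi * xi for xi in x)


rng = random.Random(1)
x = [1.0, -2.0, 0.5]
nominal = sphere(x)

# Without disturbances the design hits its own nominal value exactly.
loss_exact = expected_loss(sphere, x, nominal, 0.0, 10, rng)
# With disturbances the estimate is a noisy, strictly positive number.
loss_noisy = expected_loss(sphere, x, nominal, 0.1, 1000, rng)
```

Because every call draws fresh samples, two evaluations of the same design generally return different values, which is why the resulting problem is a stochastic optimization problem.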

    In order to clarify the relationship between the original merit function $f$ and the expected loss $L$ we investigated a rectangular function. We could show that optimal points of $L$ do not necessarily correspond to optimal points of $f(\vec{x}) - \tau$. As already mentioned, we considered as a practical example the design of multilayer optical coatings (MOC), most frequently used for optical filters. During the production process the layer thickness cannot be controlled with arbitrary precision. Additionally, the refraction indices vary slightly due to pollution of the optical materials. Thus, we might observe significant variances in the quality of single filters.

    Basically, we applied two modified evolution strategies (ES): an extended (25+50)-ES for mixed-integer optimization after [14] and a parallel diffusion model after [199], where the individuals are located on a regular grid. We used 15 subpopulations with a size of 20x25, a neighborhood size of 7x7 and an isolation time of 30 generations. The MOC designs found by the evolutionary algorithms are substantially more robust to parameter variations than a reference design and therefore perform much better in the average case, although for the undisturbed case the reference design is significantly better. This observation was expected, since sensitivity analysis shows that many local optima are not robust under parameter variations. For more details see [230].

    5.3 Dynamic Environments

    The principle of self-adaptation promises to be useful not only in case of static optimization problems, but also for dynamic optimization problems where the objective function changes

    over the course of optimization. The dynamic environment requires the evolutionary algorithm

    to maintain sufficient diversity for continuous adaptation to the changes of the landscape, which

    should be possible by means of self-adaptation of strategy parameters. Recently, it was demon-

    strated that indeed the self-adaptation principle in evolution strategies provides an effective way

    of tracking moving optima in case of dynamic objective functions [6].

    In the general case of a dynamic environment, the goal is not only to acquire an optimal

    solution but also to track its progression through the search space as closely as possible. In

    contrast to the static optimization problem $f(\vec{x}) \to \min$ ($\vec{x} \in M$), the dynamic optimization problem

    $$f(\vec{x}, t) \to \min, \quad \vec{x} \in M, \; t \in T$$

    depends on an additional parameter $t \in T$ (the time) as well, i.e., the objective function changes with $t$. Generally, this implies that, for $t_i \neq t_j$, $f(\vec{x}, t_i) \neq f(\vec{x}, t_j)$, i.e., the objective function might be different after each function evaluation, in contrast to a simplified form of dynamic behavior where the objective function remains constant within specific time intervals $[t_k, t_k + \Delta t_k)$, such that

    $$t_i, t_j \in [t_k, t_k + \Delta t_k) \;\Rightarrow\; f(\vec{x}, t_i) = f(\vec{x}, t_j)$$

    For the investigations reported in [6], it was assumed that the dynamics of the objective function and the dynamics of the evolutionary algorithm are synchronized by identifying $t$ with the generation index of the algorithm and by keeping $f$ constant within one generation, such that $\Delta t_k \ge 1$ and $t_i, t_j, t_k \in \{0, 1, 2, \ldots, t_{max}\}$. Moreover, $\Delta t_k =: \Delta g$ is also assumed to be constant, such that the objective function changes every $\Delta g$ generations after completing the evaluation of the whole population in case of a generational evolutionary algorithm such as the evolution strategy.


    Figure 7: Evolution strategy results for the linear dynamics with update frequency $\Delta g = 1$ (left), $\Delta g = 5$ (middle), $\Delta g = 10$ (right).

    Three dynamical environments derived from the sphere model

    $$f(\vec{x}) = \sum_{i=1}^{n} x_i^2 \qquad (39)$$

    are used for the experiments. The dynamical environments are generated by translating the base function along a linear trajectory according to

    $$f(\vec{x}, t) = \sum_{i=1}^{n} (x_i + \delta_i(t))^2 \qquad (40)$$

    where $t \in \mathbb{N}_0$ denotes the time counter (equivalent to the generation number in an evolutionary algorithm).

    The trajectory is defined by setting $\delta_i(0) = 0 \;\; \forall i \in \{1, \ldots, n\}$, and

    $$\delta_i(t+1) = \begin{cases} \delta_i(t) + s, & (t+1) \bmod \Delta g = 0 \\ \delta_i(t), & \text{else} \end{cases} \qquad (41)$$
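The recursion (41) has a simple closed form: the offset grows by the severity once every update interval, so after t generations it equals the severity times the number of completed intervals. The following Python sketch shows this together with the translated sphere (40). The function names `offset` and `dynamic_sphere` and the parameter names `severity` and `delta_g` are chosen here for illustration only.

```python
def offset(t, severity, delta_g):
    """Closed form of eq. (41): add `severity` whenever t reaches a multiple of delta_g."""
    return severity * (t // delta_g)


def dynamic_sphere(x, t, severity, delta_g):
    """Eq. (40): sphere model translated by the same offset in every coordinate."""
    d = offset(t, severity, delta_g)
    return sum((xi + d) ** 2 for xi in x)


# With delta_g = 5 the environment is constant for generations 0..4,
# then jumps once at generation 5 and again at generation 10.
jumps = [offset(t, 0.5, 5) for t in (0, 4, 5, 9, 10)]

# The optimum at time t sits at x_i = -offset(t): here at -0.5 in each coordinate.
value_at_optimum = dynamic_sphere([-0.5, -0.5], 5, 0.5, 5)
```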

    The algorithm used here is a standard (15,100)-evolution strategy with local discrete recombination on the object variables $x_i$ and global intermediary recombination on the strategy parameters $\sigma_i$. 100 offspring individuals are generated per generation, $n_\sigma = n$ variances are used for self-adaptation (although it is well known that one variance is optimal for the sphere model), all object variables are uniformly initialized within the range $[-50, 50]$, and 50 independent runs are performed over 500 generations each. The experiments for the linear dynamics, with update frequencies $\Delta g \in \{1, 5, 10\}$ and severity $s \in \{0.01, 0.1, 0.5\}$, are shown in figure 7. In this figure, the left, middle, and right subfigures correspond to an update frequency of 1, 5, and 10 generations, respectively, and each of the subfigures contains the three curves for the different levels of the severity parameter.
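The experimental setup can be approximated by the following Python sketch of a (15,100)-evolution strategy with lognormal self-adaptation on the moving sphere. It is a simplified illustration, not the code behind figure 7. Several details are assumptions that this section does not state: the dimension `n`, the initial step size of 1.0, the learning rates `tau_global` and `tau_local`, the shortened run length, and the exact way the second recombination partner is drawn.

```python
import math
import random


def dynamic_sphere(x, t, severity, delta_g):
    # Eqs. (40)/(41): the optimum moves by `severity` every delta_g generations.
    d = severity * (t // delta_g)
    return sum((xi + d) ** 2 for xi in x)


def run_es(n=5, mu=15, lam=100, generations=60, severity=0.1, delta_g=5, seed=0):
    rng = random.Random(seed)
    tau_global = 1.0 / math.sqrt(2.0 * n)             # assumed learning rates
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    # An individual is a pair (object variables, step sizes).
    parents = [([rng.uniform(-50.0, 50.0) for _ in range(n)], [1.0] * n)
               for _ in range(mu)]
    history = []
    for t in range(generations):
        offspring = []
        for _ in range(lam):
            p1, p2 = rng.sample(parents, 2)
            # Local discrete recombination: each x_i comes from one of two parents.
            x = [rng.choice((a, b)) for a, b in zip(p1[0], p2[0])]
            # Global intermediary recombination: the partner is redrawn per component.
            sigma = [0.5 * (p1[1][i] + rng.choice(parents)[1][i]) for i in range(n)]
            # Lognormal self-adaptation of the step sizes, then mutation of x.
            common = tau_global * rng.gauss(0.0, 1.0)
            sigma = [s * math.exp(common + tau_local * rng.gauss(0.0, 1.0))
                     for s in sigma]
            x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, sigma)]
            offspring.append((dynamic_sphere(x, t, severity, delta_g), x, sigma))
        # Comma selection: old parents are discarded, the mu best offspring survive.
        offspring.sort(key=lambda ind: ind[0])
        parents = [(x, sigma) for _, x, sigma in offspring[:mu]]
        history.append(offspring[0][0])
    return history


history = run_es()
```

`history` holds the best objective value per generation, which is the quantity one would plot to compare different update frequencies and severities.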

    All results reported here give a clear impression that the self-adaptation of variances as utilized in a ($\mu$,$\lambda$)-evolution strategy is an effective method for tracking dynamic environments. In all cases, the optimization proceeds with a linear rate of convergence as predicted by the theory of evolution strategy behavior on the sphere model, until the objective function value reaches an order of magnitude corresponding to the squared value of the severity parameter $s$. With an update frequency of $\Delta g = 1$, the algorithm constantly follows the dynamic environment without any deteriorations. With larger update frequencies $\Delta g \in \{5, 10\}$, the objective function values oscillate with a period of $\Delta g$ generations between the objective function value achieved by a continuous update at every generation (left figures) and the further improvement that can be achieved by holding the environment constant for $\Delta g$ generations. This results in a larger amplitude of the oscillation when $\Delta g$ increases.

    The direct conclusion from the three sets of experiments reported here is that the lognormal self-adaptation rule as used in ($\mu$,$\lambda$)-evolution strategies is perfectly able to track the dynamic optima.

    5.4 Multiple Criteria Decision Making

    It has become increasingly obvious that the optimization under a single scalar–valued criterion

    — often a monetary one — fails to reflect the variety of aspects in a world getting more and more

    complex. Often, there are several conflicting optimization criteria (e.g., costs vs. reliability),

    such that the objective function is characterized best by a multiple-criteria approach with $k > 1$ objectives, i.e.:

    $$\vec{f}: M \to \mathbb{R}^k, \quad \vec{f}(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x})) \qquad (42)$$

    Under such circumstances, the goal of the search is to identify solutions which cannot be improved in any combination of the objectives without degradation in the remaining, i.e., a solution $\vec{x}_i$ is called Pareto-optimal (nondominated) $:\Leftrightarrow \nexists\, \vec{x}_j : \vec{f}(\vec{x}_j)$

    • Pareto-based approaches, using a population ranking according to Pareto dominance.
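Pareto dominance, on which such a ranking is based, can be made concrete with a small Python sketch. It uses the common textbook definition under the assumption that all objectives are minimized: one objective vector dominates another if it is no worse in every objective and strictly better in at least one. The function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = nondominated(points)
```

In this example (3, 3) is removed because (2, 2) is better in both objectives, while the remaining three points represent different tradeoffs.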

    While all of these approaches can be used in combination with an evolution strategy, we

    focus here on a study which falls in the second of the above mentioned categories and uti-

    lizes the concept of polyploidy to deal with different objectives. More precisely, the following

    modifications to a ($\mu$,$\lambda$)-evolution strategy are made [112, 113]:

    • Since the environment now consists of $k$ objectives, the selection step is provided with a fixed user-definable vector that determines the probability of each objective to become the sorting criterion in the $k$ iterations of the selection loop. Alternatively, this vector may be allowed to change randomly over time.

    • Furthermore, the extension of an individual’s genes by recessive information turned out to be necessary in order to maintain the population’s capability of coping with a changing environment. The recessive genes enable a fast reaction after a sudden variation of the probability vector. One can also observe this behaviour in nature: the younger the environment, the higher the portion of polyploid organisms.
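The first modification can be sketched as follows in Python. The text does not say how many individuals survive each of the k iterations, so this sketch makes an assumption: every iteration draws one objective according to the probability vector, sorts the remaining candidates by it, and moves a roughly equal share of the best ones into the parent set. The function `select_parents` and its parameters are hypothetical, and recessive genes are left out.

```python
import random


def select_parents(offspring, objectives, probs, mu, rng):
    """Pick mu parents in k rounds, each sorted by one randomly drawn objective."""
    k = len(objectives)
    pool = list(offspring)
    parents = []
    for round_idx in range(k):
        # Draw this round's sorting criterion from the user-defined vector.
        f = rng.choices(objectives, weights=probs, k=1)[0]
        pool.sort(key=f)
        # Assumed split: the rounds contribute roughly equal shares of the parents.
        share = (mu - len(parents)) // (k - round_idx)
        parents.extend(pool[:share])
        pool = pool[share:]
    return parents


rng = random.Random(0)
candidates = list(range(-10, 11))
objectives = [lambda x: x * x, lambda x: (x - 2) ** 2]

# With all probability mass on the first objective, selection is purely by x*x.
chosen = select_parents(candidates, objectives, [1.0, 0.0], 6, rng)
# With a mixed vector, both objectives influence which candidates survive.
mixed = select_parents(candidates, objectives, [0.5, 0.5], 6, rng)
```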

    Using these principles, the algorithm is able to generate solutions covering the Pareto front, such that the user is provided with an idea of the tradeoffs between the objectives. It should be noted that efficient solutions in one generation may become dominated by individuals emerging in a later generation. This explains the non-efficient points in figure 8 (left) for the two objectives

    $$f_1(\vec{x}) = \sum_{i=1}^{n-1} \left( -10 \exp\left( -0.2 \sqrt{x_i^2 + x_{i+1}^2} \right) \right) \qquad (44)$$

    $$f_2(\vec{x}) = \sum_{i=1}^{n} \left( |x_i|^{0.8} + 5 \sin(x_i)^3 \right) \qquad (45)$$
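For reference, the two objectives can be written down directly in Python. The extracted formulas are damaged, so this sketch assumes the usual form of this two-objective test problem: a square root over the sum of squares in (44), and an absolute value under the exponent 0.8 with the cube applied to the sine in (45).

```python
import math


def f1(x):
    """Eq. (44): sum over neighbouring pairs of variables."""
    return sum(-10.0 * math.exp(-0.2 * math.sqrt(x[i] ** 2 + x[i + 1] ** 2))
               for i in range(len(x) - 1))


def f2(x):
    """Eq. (45): sum over all variables."""
    return sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi) ** 3 for xi in x)


origin = [0.0, 0.0, 0.0]
values_at_origin = (f1(origin), f2(origin))
```

At the origin each of the two neighbouring pairs contributes -10 to the first objective, and the second objective vanishes.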

    For efficiency reasons the ‘parents’ of the next generation are stored provisionally in an array that is cleaned out if there is not enough space left for further individuals. If this operation does not result in enough free space, solutions ‘close’ to one another are deleted. As an important side effect, the elements of the Pareto set are forced apart, thus allowing a good survey with only a finite number of solutions. Figure 8 (right) displays the situation after tidying up.

    When working with diploid individuals the inclusion of the recessive genes in the selection

    step turns out to be vital. Otherwise, undisturbed by the outside world they lead such a life

    of their own that an individual whose dominant genes have been freshened up with recessive

    material has no chance of surviving the next selection step. The best results were achieved with

    a probability of about 1/3 for exchanging dominant and recessive genes. This value also serves as a factor when putting together the overall fitness vector. Only in this way can the additional recessive material serve as a stock of variants. From further test runs one can also conclude that diploid or, in general, polyploid individuals are not worth the additional computing time in a static environment consisting only of one objective function.

    Since the algorithm tries to cover the Pareto set as well as possible, a probability distribution forcing certain minimum changes during the mutation step ought to yield better results. Indeed, the (symmetric) Weibull distribution turned out to be better than the Gaussian distribution.


    Figure 8: Graphical visualization of the output of the algorithm.

    The stochastic approach towards vector optimization problems via evolution strategies leads

    to one major advantage: In contrast to other methods no subjective decisions are required during

    the course of the iterations. Instead of narrowing the control variables space or the objective

    space by deciding about the future direction of the search from an ‘information vacuum’ the

    decision maker can collect as much information as needed before making a choice which of the

    alternatives should be realized. Moreover, using a population while looking for a set of efficient

    solutions seems to be more appropriate than just trying to improve one ‘current best’ solution.

    One might exploit the algorithm’s capability of self–adapting its parameters even further:

    The exchange rate between dominant and recessive genetic material can be adjusted on–line

    thus providing the user with a measure of convergence. The self–adaptation property largely

    depends on a selection scheme that forces the algorithm to ‘forget’ the good solutions (‘parents’) of one generation. When accepting a possible recession from one generation to the next on the phenotype level, individuals with a better ‘model’ of their environment, i.e. better step sizes $\sigma_i$,

    are likely to emerge in later generations. This kind of selection seems to be lavish at first sight

    but it favours better adapted settings, thus speeding up the search in the long run.

    5.5 Constraint Handling

    In practical application problems, the feasible region $F$ usually is only a subspace of the whole search space $S$, and it is defined by a set of $m$ additional constraints:

    $$g_j(\vec{x}) \ge 0 \quad \text{for } j = 1, \ldots, q \qquad (46)$$

    $$h_j(\vec{x}) = 0 \quad \text{for } j = q+1, \ldots, m. \qquad (47)$$

    During the optimum seeking process of ESs, inequality constraints so far have been handled as barriers, i.e., offspring that violate at least one of the restrictions are lethal mutations. Before the selection operator can be activated, exactly $\lambda$ non-lethal offspring must have been generated.
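Treating infeasible offspring as lethal mutations amounts to repeating the mutation until enough feasible children exist. The following Python sketch shows the idea for a single parent with one common step size. The names `feasible` and `generate_offspring` are illustrative, and the `max_tries` guard is an addition of this sketch, not part of the described method.

```python
import random


def feasible(x, constraints):
    """A point is feasible if every inequality constraint g_j(x) >= 0 holds."""
    return all(g(x) >= 0.0 for g in constraints)


def generate_offspring(parent, sigma, lam, constraints, rng, max_tries=100000):
    """Mutate until exactly lam non-lethal (feasible) offspring are available."""
    offspring = []
    tries = 0
    while len(offspring) < lam:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not generate enough feasible offspring")
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        if feasible(child, constraints):
            offspring.append(child)   # infeasible children are simply dropped
    return offspring


rng = random.Random(3)
constraints = [lambda x: x[0]]        # example constraint: x_0 >= 0
children = generate_offspring([1.0, 0.0], 1.0, 20, constraints, rng)
```

The cost of this scheme grows when the parent sits close to a constraint boundary, since a larger fraction of the mutations is then rejected.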


    In case of a non-feasible start position $\vec{x}^{(0)}$, a feasible solution must be found at first. This can be achieved by means of an auxiliary objective function

    $$\tilde{f}(\vec{x}) = \sum_{j=1}^{q} g_j(\vec{x}) \, \delta_j(\vec{x}) \qquad (48)$$

    with $\delta_j(\vec{x}) = -1$ if $g_j(\vec{x}) < 0$ and $\delta_j(\vec{x}) = 0$ otherwise.
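The auxiliary objective (48) sums up the magnitudes of all violated inequality constraints, so minimizing it drives the search into the feasible region. The Python sketch below assumes that the indicator is zero for satisfied constraints, which is what makes the function vanish on feasible points. The function name and the example constraints are hypothetical.

```python
def auxiliary_objective(x, constraints):
    """Sum of constraint violations: zero when feasible, positive otherwise."""
    total = 0.0
    for g in constraints:
        value = g(x)
        delta = -1.0 if value < 0.0 else 0.0   # indicator from eq. (48)
        total += value * delta
    return total


# Two example constraints: x_0 >= 1 and x_1 <= 4, both written as g(x) >= 0.
constraints = [lambda x: x[0] - 1.0, lambda x: 4.0 - x[1]]

violation_outside = auxiliary_objective([0.0, 5.0], constraints)
violation_inside = auxiliary_objective([2.0, 3.0], constraints)
```

Once the auxiliary objective reaches zero, the search can switch to the original objective function and treat further violations as lethal.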

    This kind of handling bounds can be used with all optimum seeking methods, provided that

    they are started within the feasible region. Some may have trouble with the sine-term due to the

    periodicity introduced, however.

    6 Parallel Evolution Strategies

    Because all individuals of a population act simultaneously in nature, one can speak of an inherent parallelism in evolution. Although this was already known when the principles of evolutionary algorithms were designed, no one could at that time imagine the power of the parallel computers which are now available. Consequently, evolutionary algorithms have usually been implemented sequentially.

    Nowadays we are used to parallel computers, and so in recent years many suggestions to parallelize evolutionary algorithms have been made. The goals of parallelism are simple:


    • Speed: Get the same results as a sequential algorithm in less time.

    • Robustness: Get more robust results regarding errors or noisy information.

    • Quality: Get better results in the same time as a sequential algorithm.

    There are at least two different approaches to parallel evolutionary algorithms [57, 5], which are described here next to a mixed-model approach which tries to put the best of both models together. Before that, a very simple but effective way to use parallel hardware is presented, which does not fit into the models presented afterwards.

    6.1 The Master-Slave Approach

    This approach is very effective if the calculation of the fitness function is time intensive, e.g. when optimizing simulation models where the simulation software runs for a long time, as in [7].

    In this case the evolutionary algorithm can be divided into a master-process, where the

    individuals are generated and the genetic operators are applied, and a number of slave-processes,

    where the fitness function is evaluated.

    Now the different processes can run on different machines and the fitness calculation for a whole population can be done in parallel. A special kind of steady-state selection [228] with a ($\mu$+1)-selection scheme was presented in [7, 11, 97] which nearly avoids any idle times on the processors, because every time a fitness value is calculated a new individual is sent to the idle processor without waiting for any other results from the slaves.
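The asynchronous master-slave scheme can be sketched with Python's standard `concurrent.futures` module. This is an illustration under simplifying assumptions, not the implementation from the cited work: worker threads stand in for slave processes on separate machines, `slow_fitness` is a placeholder for an expensive simulation, a fixed mutation step size is used, and the initial population is evaluated serially.

```python
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def slow_fitness(x):
    # Placeholder for a time-intensive simulation run (hypothetical).
    return sum(xi * xi for xi in x)


def steady_state_es(mu=5, workers=4, evaluations=200, n=3, sigma=0.3, seed=0):
    rng = random.Random(seed)
    population = []
    for _ in range(mu):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        population.append((slow_fitness(x), x))
    pending = {}      # maps a running evaluation to the individual it evaluates
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit_one():
            nonlocal submitted
            parent = rng.choice(population)[1]
            child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
            pending[pool.submit(slow_fitness, child)] = child
            submitted += 1

        for _ in range(workers):
            submit_one()
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                child = pending.pop(future)
                # (mu+1) selection: the newcomer joins, the worst individual leaves.
                population.append((future.result(), child))
                population.sort(key=lambda ind: ind[0])
                population.pop()
                if submitted < evaluations:
                    submit_one()   # refill the idle worker right away
    return population


final_population = steady_state_es()
```

The master never waits for a whole generation: each finished evaluation immediately triggers selection and the creation of the next individual, which keeps all workers busy when evaluation times differ.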

    6.2 Coarse Grained Parallelism: The Migration Model

    In the migration model a population is divided into a number of subpopulations, so-called demes

    [5]. These subpopulations are still panmictic but exchange genetic information by the migration

    of individuals. Two concepts are known [57, 215, 5]:

    1. In the Island Model  there is a random exchange of information between the subpopula-

    tions, and

    2. in the Stepping Stone Model  this exchange is limited to migration paths which connect

    the subpopulations that are placed in a topology (e.g. a ring, or a torus etc.).

    These algorithms can be scaled to a balanced usage of processing and communication resources

    by tuning the local population size and the migration frequencies.

    Different ways to choose the individuals to leave the local population are known. To choose one randomly seems to be a good compromise between the danger of premature stagnation when choosing the best individual and small chances to survive in the new subpopulation when choosing the worst one to leave.
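A single migration step on a ring topology, with randomly chosen emigrants as suggested above, might look like the following Python sketch. It makes two assumptions that the text leaves open: the emigrant is copied rather than removed from its home deme, and the immigrant replaces a randomly chosen resident. The function name `migrate_ring` is illustrative.

```python
import random


def migrate_ring(demes, rng):
    """One migration step on a ring: deme i sends a random emigrant to deme i+1."""
    # Choose all emigrants first so that new arrivals are not passed on at once.
    emigrants = [rng.choice(deme) for deme in demes]
    for i, emigrant in enumerate(emigrants):
        target = demes[(i + 1) % len(demes)]
        # Assumption: the immigrant overwrites a randomly chosen resident.
        target[rng.randrange(len(target))] = emigrant
    return demes


rng = random.Random(0)
demes = [[1, 1], [2, 2], [3, 3]]   # toy individuals labelled by their home deme
migrate_ring(demes, rng)
```

Replacing the fixed neighbour `(i + 1) % len(demes)` by a randomly drawn target deme would turn this stepping stone variant into the island model.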

    Another problem is the way to insert immigrants into the new population. A solution which

    c