
  • 8/9/2019 data clustering approach reliability.pdf

    1/20

    Quality Technology & Quantitative Management, Vol. 4, No. 2, pp. 191-210, 2007

    QTQM © ICAQM 2007

    Data Clustering of Solutions for Multiple Objective System Reliability Optimization Problems

    Heidi A. Taboada and David W. Coit

    Department of Industrial and Systems Engineering, Rutgers University, Piscataway, USA (Received August 2005, accepted May 2006)

    ______________________________________________________________________ 

    Abstract: This paper proposes a practical methodology for the solution of multi-objective system reliability optimization problems. The new method is based on the sequential combination of multi-objective evolutionary algorithms and data clustering on the prospective solutions to yield a smaller, more manageable set of prospective solutions. Existing methods for multiple objective problems involve either the consolidation of all objectives into a single objective, or the determination of a Pareto-optimal set. In this paper, a new approach, involving post-Pareto clustering, is proposed, offering a compromise between the two traditional approaches. In many real-life multi-objective optimization problems, the Pareto-optimal set can be extremely large or even contain an infinite number of solutions. Broad and detailed knowledge of the system is required during the decision-making process to discriminate among the solutions contained in the Pareto-optimal set, to eliminate the less satisfactory trade-offs, and to select the most promising solution(s) for system implementation. The well-known reliability optimization problem, the redundancy allocation problem (RAP), was formulated as a multi-objective problem with the system reliability to be maximized, and the cost and weight of the system to be minimized. A multiple-stage process was performed to identify promising solutions. A Pareto-optimal set was initially obtained using the fast elitist nondominated sorting genetic algorithm (NSGA-II). The decision-making stage was then performed with the aid of data clustering techniques to prune the size of the Pareto-optimal set and obtain a smaller representation of the multi-objective design space, thereby making it easier for the decision-maker to find satisfactory and meaningful trade-offs, and to select a preferred final design solution.

    Keywords: Data mining, multiple objective genetic algorithms (MOGA), system reliability optimization. ______________________________________________________________________ 

    1. Introduction

    System design and reliability optimization problems are inherently multi-objective and conflicting in nature. In multi-objective optimization problems, there are two primary approaches to identify solution(s). The first involves quantifying the relative importance of the attributes and aggregating them into some kind of overall objective function, e.g., a utility or value function. Solving the optimization problem with this approach generates an "optimal" solution, but only for a specified set of quantified weights or a specific utility function. Unfortunately, the precise value of the objective function weights used, or the form of a selected utility function, dictates the final solution, and thus broad and detailed knowledge of the system is demanded.

    The second approach involves populating a number of feasible solutions along a Pareto frontier, and the final solution is a set of non-dominated solutions. In this case, the Pareto set can contain a large number (in some cases, thousands) of solutions. From the decision-maker's perspective, consideration of all the nondominated solutions can be prohibitive and inefficient.



    192 Taboada and Coit

    This paper takes the view that, for many multi-objective engineering design optimization problems, a balance between single solutions and Pareto-optimal sets can be advantageous. A practical approach to intelligently reduce the size of the Pareto set is presented. This creates a reduced set of prospective solutions, thereby making selection of the final solution easier.

    The redundancy allocation problem (RAP) was formulated as a multi-objective problem, as the system reliability is to be maximized and the cost and weight of the system are to be minimized. Although the RAP has been studied in great detail, one area that has not been sufficiently studied is multiple objective optimization in general, and more specifically, the post-Pareto analysis stage. The solution to the multi-objective RAP is a set of nondominated solutions. This set is often too large for a decision-maker to make an intelligent identification of the most suitable compromise solution. In this paper, a practical approach is described to determine a reasonably sized set of promising solutions. The resulting pruned Pareto set is simply a nondominated subset of a more manageable size that offers meaningful contrasting design options.

    2. Redundancy Allocation Problem

    The RAP is a system design optimization problem. The system has a total of s subsystems arranged in series. For each subsystem i, there are n_i functionally equivalent components arranged in parallel. Each component has potentially different levels of cost, weight, reliability and other characteristics. The n_i components are to be selected from m_i available component types, where multiple copies of each type can be selected. An example of a series-parallel system is depicted in Figure 1.

    Figure 1. General series-parallel redundancy system.

    The use of redundancy improves system reliability but also adds cost, weight, etc., to the system. There are generally system-level constraints, and the problem is to select the design configuration that maximizes some stated objective functions.

    2.1. Previous Research

    Solving the redundancy allocation problem has been shown to be NP-hard by Chern [5]. Different optimization approaches have previously been used to determine optimal or good solutions to this problem. The RAP has been solved using dynamic programming by Bellman [1] and Bellman & Dreyfus [2] to maximize reliability for a system given a single cost constraint. For each subsystem, there was only one component choice, so the problem was to identify the optimal levels of redundancy. Fyffe et al. [14] also used a dynamic programming approach and solved a more difficult design problem. They considered a system with 14 subsystems and constraints on both cost and weight. For each subsystem, there were three or four different component choices, each with different reliability, cost and weight. Bulfin & Liu [3] used integer programming; they formulated the problem as a knapsack problem using surrogate constraints.

    These previous mathematical programming approaches are only applicable to a limited or restricted problem domain and require simplifying assumptions, which limit the search to an artificially restricted search space. In these formulations, once a component selection is made, only the same component type can be used to provide redundancy. This restriction is required so that the selected mathematical programming tool can be applied, but it is not an actual imposition of the engineering design problem. Thus, the resulting solution is only "optimal" for a restricted solution space, and better solutions can be found if the restriction is no longer imposed.

    Genetic Algorithms (GAs) offer some distinct advantages compared to alternative methods used to solve the RAP. Coit & Smith [6, 7] used GAs to obtain solutions to the RAP. In their research, they solved 33 variations of the Fyffe problem using a GA. Similarly, Tabu Search (Kulturel-Konak et al. [21]) has been used to obtain solutions to this problem.

    Several reliability optimization formulations considering multiple criteria have also been presented in the literature. A multi-objective formulation to maximize system reliability and minimize the system cost was considered by Sakawa [27] using the surrogate worth trade-off method. Inagaki et al. [17] used interactive optimization to design a system with minimum cost and weight and maximum reliability.

    Dhingra [11] used goal programming and goal attainment to generate Pareto-optimal solutions to a special case of a multi-objective RAP. Busacca et al. [4] proposed a multi-objective GA approach that was applied to a design problem with the aim of identifying the optimal system configuration and components with respect to reliability and cost objectives.

    Recently, Kulturel-Konak et al. [22] solved this problem using a Tabu Search method considering three objective functions: maximization of system reliability and minimization of the cost and weight of the system.

    The following notation will be used throughout the remainder of the paper:

    Notation:

    R, C, W = system-level reliability, cost and weight, or their constraint limits

    s = number of subsystems

    x_ij = quantity of the jth available component type used in subsystem i

    x_i = (x_i,1, x_i,2, ..., x_i,m_i)

    m_i = total number of available component types for subsystem i

    n_max,i = user-defined maximum number of components in parallel used in subsystem i

    R_i(x_i) = reliability of subsystem i

    c_ij, w_ij, r_ij = cost, weight and reliability of the jth available component for subsystem i

    ω_i = weight used for objective i in the weighted sum method


    NOTE: an objective function weight, ω_i, and a component weight, w_ij, are conceptually very different.

    2.2. Problem Formulation

    The use of redundant elements increases system reliability, but also increases the procurement cost and system weight. A complex question then arises regarding how to optimally allocate redundant elements. The answer depends on the structure of the designed system, the decision-maker's priorities and preferences, and ultimately, some selected optimality criterion.

    Different problem formulations of the RAP have been presented in the literature. For instance, Problem P1 maximizes the system reliability given restrictions on the system cost, C, and the system weight, W. Alternatively, Problem P2 is formulated as a multi-objective optimization problem by using the weighted sum approach. Problem P3 is a multi-criteria formulation of the RAP, in which a Pareto-optimal set of solutions is obtained. The formulation of the three problems is summarized below:

    Problem P1: Reliability maximization 

    max R(x) = ∏_{i=1}^{s} R_i(x_i)

    Subject to:

    ∑_{i=1}^{s} ∑_{j=1}^{m_i} c_ij x_ij ≤ C,

    ∑_{i=1}^{s} ∑_{j=1}^{m_i} w_ij x_ij ≤ W,

    1 ≤ ∑_{j=1}^{m_i} x_ij ≤ n_max,i   for i = 1, 2, ..., s,

    x_ij ∈ {0, 1, 2, ...}.

    Given the overall restrictions on system cost of C and weight of W, the problem is to determine which design alternative to select, with the specified level of component reliability, and how many redundant components to use in order to achieve the maximum reliability.
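The Problem P1 objective can be made concrete in code. The following Python fragment is an illustrative sketch, not the authors' implementation; it assumes statistically independent component failures, so a subsystem fails only if all of its parallel components fail, and the series system reliability is the product of the subsystem reliabilities. The example values of r and x are hypothetical.

```python
from math import prod

def subsystem_reliability(r, x):
    """R_i(x_i) = 1 - prod_j (1 - r_ij)^x_ij: the subsystem fails only
    if every parallel component fails (independence assumed)."""
    return 1.0 - prod((1.0 - rj) ** xj for rj, xj in zip(r, x))

def system_reliability(r, x):
    """Series system of parallel subsystems: R = prod_i R_i(x_i)."""
    return prod(subsystem_reliability(ri, xi) for ri, xi in zip(r, x))

# Hypothetical data: two subsystems, two component types each;
# x[i][j] is the number of copies of type j used in subsystem i.
r = [[0.90, 0.80], [0.95, 0.85]]
x = [[2, 0], [1, 1]]
R = system_reliability(r, x)
```

With these numbers the first subsystem has reliability 1 - 0.1^2 = 0.99 and the second 1 - 0.05*0.15 = 0.9925, so R is their product.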

    Problem P2: Weighted sum method formulation

    max ω_1 R(x) − ω_2 C(x) − ω_3 W(x) = ω_1 ∏_{i=1}^{s} R_i(x_i) − ω_2 ∑_{i=1}^{s} ∑_{j=1}^{m_i} c_ij x_ij − ω_3 ∑_{i=1}^{s} ∑_{j=1}^{m_i} w_ij x_ij

    Subject to:

    1 ≤ ∑_{j=1}^{m_i} x_ij ≤ n_max,i   for i = 1, 2, ..., s,

    ∑_i ω_i = 1,


    x_ij ∈ {0, 1, 2, ...}.

    The solution for this multi-objective formulation is determined by combining the three objectives into a single-objective problem. This requires the user to specify, a priori, objective function weights to represent the relative importance of the individual objective functions, and a single set of weighting coefficients yields only one "optimal" solution. Therefore, selection of the correct set of weights is critical, and it will dictate the final solution. Often, decision-makers lack the training or detailed knowledge to precisely select the weights. This is undesirable because even small alterations to the weights can lead to very different final solutions.

    Problem P3: Multi-objective formulation

    max R(x) = ∏_{i=1}^{s} R_i(x_i),   min C(x) = ∑_{i=1}^{s} ∑_{j=1}^{m_i} c_ij x_ij,   min W(x) = ∑_{i=1}^{s} ∑_{j=1}^{m_i} w_ij x_ij

    Subject to:

    1 ≤ ∑_{j=1}^{m_i} x_ij ≤ n_max,i   for i = 1, 2, ..., s,

    x_ij ∈ {0, 1, 2, ...}.

    For the multi-objective RAP, the objectives are to determine the optimal design configuration that will maximize system reliability, minimize the total cost and minimize the system weight. A Pareto-optimal set of solutions can be obtained by using any multi-objective evolutionary algorithm (MOEA) available. However, there may be too many prospective solutions for the decision-maker to fully consider before ultimately selecting a unique design configuration to implement. The Problem P3 formulation is the one addressed in this paper.

    3. Multi-Objective Optimization Solution Methods

    Most actual systems engineering optimization problems are multi-objective in nature. This means that these problems involve several criteria or design objectives. Moreover, if the objectives are conflicting, then the problem involves finding the design that provides the best combination of the conflicting objectives. Thus, a design problem needs to be solved with multiple objectives and constraints taken into consideration.

    Although there are several approaches to solve multi-objective problems, the two most common are: (1) combine all objective functions into a single objective function using methods such as the weighted sum method or utility functions, or (2) obtain a set of non-dominated Pareto-optimal solutions. For the first approach, a single "optimal" solution is generally found, whereas in the second approach, a potentially large Pareto-optimal set is identified. In this paper, a new approach is presented which represents a compromise between the two traditional approaches.

    A general formulation of a multi-objective optimization problem consists of a number of objectives with a number of inequality and equality constraints. Mathematically, the problem can be written as follows (Rao [24]):


    minimize / maximize f_i(x),   i = 1, 2, ..., n

    Subject to:

    g_j(x) ≤ 0,   j = 1, 2, ..., J,

    h_q(x) = 0,   q = 1, 2, ..., Q.

    In the vector function, f_i(x), some of the objectives are often in conflict with others, and some have to be minimized while others are maximized. The constraints define the feasible region X, and any point x ∈ X is a feasible solution. There is rarely a situation in which all the f_i(x) values have an optimum in X at a common point x. Therefore, it is necessary to establish criteria to determine what is considered an optimal solution, and this criterion is nondominance. Thus, solutions to a multi-objective optimization problem are mathematically expressed in terms of nondominance.

    Without loss of generality, in a minimization problem for all objectives, a solution x1 dominates a solution x2 if and only if the two following conditions are true:

    • x1 is no worse than x2 in all objectives, i.e., f_i(x1) ≤ f_i(x2) for all i ∈ {1, 2, ..., n}

    • x1 is strictly better than x2 in at least one objective, i.e., f_i(x1) < f_i(x2) for at least one i.

    Then, the optimal solutions to a multi-objective optimization problem are the set of nondominated solutions, usually known as the Pareto-optimal set (Zeleny [32]).
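The two dominance conditions above translate directly into code. The following Python sketch (an illustration, not part of the paper's implementation) checks dominance under the minimization convention and filters a list of objective vectors down to its nondominated subset; the sample points are hypothetical.

```python
def dominates(f1, f2):
    """f1 dominates f2 (minimization): f1 is no worse in every objective
    and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def pareto_set(points):
    """Return the nondominated subset (the Pareto-optimal set) of a
    list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical two-objective vectors, both to be minimized.
pts = [(1, 5), (2, 3), (4, 1), (3, 3), (5, 5)]
front = pareto_set(pts)
```

Here (3, 3) and (5, 5) are removed because (2, 3) dominates both; the remaining three points are mutually nondominated.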

    3.1. Single-Objective Approaches

    The presence of several conflicting objectives is typical of engineering design problems. The most common approach to conducting multi-objective optimization is to aggregate the different objectives into one single objective function. Optimization is then conducted with one optimal design as the result. The weighted sum method, goal programming, utility theory, etc., are examples of this approach.

    The weighted sum method consists of adding all the objective functions together using different weighting coefficients for each one. This method is the simplest possible approach to solve the multi-objective problem, but the challenge is determining the appropriate set of weights when the user does not have enough information about the problem or has only intuition to evaluate the relative importance of one objective over another. In practice, it is difficult to establish a relationship between these weights.

    Goal programming pertains to the achievement of prescribed objective function goals or targets. In this method, the user has to assign targets or goals that he/she wishes to achieve for each objective. However, the decision-maker must also devise appropriate weights for the objectives. This can be a difficult task in many cases, unless there is prior knowledge about the shape of the search space, the relative importance of the objectives, and meaningful goals.

    For modeling the designer's preference structure, one of the commonly used methods is based on utility theory (Keeney and Raiffa [20]). Although utility functions offer an ideal way to solve a multiple objective problem (Steuer [30]), one difficulty associated with the utility function approach is that, in practice, no precise approach exists to obtain a mathematical representation of the decision-maker's true preference or utility function in a multi-objective setting. This can be problematic for non-specialists, i.e., untrained practitioners.

    3.2. Multiple Objective Genetic Algorithms

    Genetic Algorithms (GAs), developed by Holland [15], are stochastic search/optimization methods that simulate the process of natural evolution to solve problems with a complex solution space. GAs are computer-based algorithms that mimic some of the known mechanisms of evolution.

    GAs work as follows: an initial population of individuals is generated at random orheuristically. At every generation, the individuals in the current population are decoded andevaluated according to some predefined quality criterion, referred to as the fitness function.

    Creation of new members is done by crossover and mutation operations. The effectiveness of the crossover operator dictates the rate of convergence, while the mutation operator prevents the algorithm from prematurely converging to a local optimum.

    During the selection procedure, individuals are chosen according to their fitness value.Individuals with high-fitness values have better chances of reproducing, while low-fitnessones are more likely to disappear. The procedure is terminated either when the searchprocess stagnates or when a predefined number of generations is reached.

    GAs are advanced search mechanisms that are ideal for exploring large and complex problem spaces. However, it is important to remember that GAs are stochastic iterative processes and are not guaranteed to converge to an optimal solution. Hence, the termination condition may be specified as some fixed maximal number of generations or as the attainment of an acceptable fitness level.
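The generational loop described above (selection, crossover, mutation, fixed-generation termination) can be sketched in a few lines. This is a minimal single-objective toy, not the paper's RAP encoding or NSGA-II; the fitness function, bit-string representation, and all parameter values are hypothetical choices for illustration.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=60,
                      p_cross=0.9, p_mut=0.02, seed=0):
    """Bare-bones GA: binary tournament selection, one-point crossover,
    per-bit mutation, termination after a fixed number of generations."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:          # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1 = p1[:cut] + p2[cut:]
            # bit-flip mutation guards against premature convergence
            nxt.append([1 - g if rng.random() < p_mut else g for g in p1])
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness: maximize the number of 1-bits ("OneMax").
best = genetic_algorithm(lambda ind: sum(ind))
```

Because the process is stochastic, the returned individual is a good, but not guaranteed optimal, solution; increasing the generation budget or adding elitism (as NSGA-II does) improves reliability of convergence.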

    The performance of GAs in the solution of multi-objective problems has provided good results, mostly because they are population-based search methods. GAs evaluate many solutions in each run, as opposed to other search methods, where only one individual at a time progresses toward the non-dominated frontier. This makes GAs very attractive for solving multiple objective problems.

    Several versions of multi-objective GAs, most often referred to as multiple-objectiveevolutionary algorithms (MOEAs), have been developed, including:

    • VEGA (vector evaluated genetic algorithm) by Schaffer [28]

    • MOGA (multi-objective genetic algorithm) by Fonseca and Fleming [13]

    • NPGA (niched Pareto genetic algorithm) by Horn et al. [16]

    •  NSGA (nondominated sorting genetic algorithm) developed by Srinivas and Deb [29]

    •  SPEA (strength Pareto evolutionary algorithm) by Zitzler and Thiele [33]

    • NSGA-II by Deb, Agrawal, Pratap and Meyarivan ([9], [10])

    •  NPGA 2 developed by Erickson, Mayer and Horn [12]

    In general, MOEAs are suited to multi-objective optimization because they are able to capture multiple Pareto-optimal solutions in a single run and may exploit similarities of solutions by recombination. However, MOEAs produce many different Pareto-optimal solutions, and the final Pareto set (or subset) may be very large, since it grows with the number of objectives. The decision-maker then has to consider this potentially huge Pareto set and make his/her choice.


    3.2.1. NSGA-II

    The fast elitist non-dominated sorting genetic algorithm II (NSGA-II) was proposed by Deb et al. ([9], [10]). This is the algorithm used for this research. It is an improved version of the original nondominated sorting genetic algorithm (NSGA) that reduces the computational complexity and maintains the best Pareto front found by including its solutions in the next generation, thereby ensuring elitism.

    A crowding distance parameter is incorporated in this algorithm that serves as an estimate of the population density; it is calculated as the perimeter of the largest cuboid enclosing an individual without including any other member of the population. Between any two solutions being considered, the solution with the better rank is preferable, and among solutions of equal rank, a solution located in a less crowded region is preferable to one that lies in a high solution density region.
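The crowding distance computation can be sketched as follows. This is an illustrative Python fragment (not the authors' code): for each objective the front is sorted, boundary solutions receive infinite distance so they are always retained, and each interior solution accumulates the normalized gap between its two neighbors, which approximates the cuboid perimeter described above. The sample front is hypothetical.

```python
def crowding_distance(front):
    """NSGA-II-style crowding distance for a list of objective vectors:
    per objective, boundary points get infinity and interior points add
    the normalized distance between their two sorted neighbors."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # degenerate objective: all values equal
        for pos in range(1, n - 1):
            i = order[pos]
            dist[i] += (front[order[pos + 1]][k]
                        - front[order[pos - 1]][k]) / (hi - lo)
    return dist

front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]  # hypothetical front
d = crowding_distance(front)
```

For this front the two extreme points get infinite distance and the middle point gets 2.0, so the extremes would survive any crowding-based truncation.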

    NSGA-II uses the concept of controlled elitism to tune the mutation rate and the elitism rate to attain equilibrium between the two. Controlled elitism limits the maximum number of individuals belonging to each front by employing a geometrically decreasing function governed by the reduction rate, thereby preventing the algorithm from prematurely converging to a local optimum.

     This algorithm is highly efficient in obtaining good Pareto-optimal fronts for anynumber of objectives and can accommodate any number of constraints. These features ofNSGA-II are attractive for the purpose of this paper.

    4. Data Clustering Approach to Analyze Multi-Objective RAP 

    A new approach is presented based on the concepts of data clustering after determination of a Pareto-optimal set. This new approach offers benefits compared to previous approaches because it provides practical support to the decision-maker during the selection step. The main idea of this paper is to systematically assist the decision-maker during the post-Pareto analysis stage to make his/her choice without precise quantified knowledge of the relative importance of the objective functions. In this paper, no objective function preferences are known or required. If objective function rankings or priorities are available, then a similar clustering approach can be applied after filtering of undesirable solutions, as described by Taboada et al. [31].

    In multicriteria optimization, data clustering can be a useful exploratory technique in knowledge discovery. Since it groups similar solutions together, it allows the decision-maker to identify potentially meaningful trade-offs among the solutions contained in the Pareto-optimal set without requiring the decision-maker to explicitly define objective function weights or utility functions.

    4.1. Data Clustering Background

    Cluster analysis is a multivariate analysis technique defined as the process of organizing objects in a database into clusters/groups such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity (Kaufman and Rousseeuw [19]). The most popular nonhierarchical partitioning method is probably the k-means clustering algorithm. The general algorithm was introduced by Cox [8], and first named "k-means" by MacQueen [23]. Since then, it has become widely popular and is classified as a partitional or non-hierarchical clustering method (Jain and Dubes [18]).

    The k-means algorithm is well known for its efficiency in clustering data sets. The grouping is done by calculating the centroid of each group and assigning each observation to the group with the closest centroid. For the membership function, each data point belongs to its nearest center, forming a partition of the data. The objective function that the k-means algorithm optimizes is:

    KM(X, C) = ∑_{i=1}^{n} min_{j ∈ {1,...,k}} ||v_i − c_j||²,

    where:

    v_i = ith data vector

    c_j = jth cluster centroid

    X = set of data vectors

    C = set of centroids.

    This objective function is used within an algorithm that minimizes the within-cluster variance (the squared distance between each center and its assigned data points). The membership function for k-means is:

    m_KM(c_l | v_i) = 1 if l = arg min_j ||v_i − c_j||², and 0 otherwise.
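The KM(X, C) objective and hard membership function above are minimized by Lloyd's iteration: assign each point to its nearest centroid, recompute centroids as cluster means, and repeat until assignments stabilize. The following pure-Python sketch (an illustration, not the authors' MATLAB code) also includes the random restarts discussed later in the paper, keeping the replicate with the smallest total within-cluster squared distance; the sample data are hypothetical.

```python
import random

def kmeans(points, k, replicates=5, iters=100, seed=0):
    """Lloyd's algorithm with random restarts; returns the replicate
    minimizing KM(X, C) = sum_i min_j ||v_i - c_j||^2."""
    rng = random.Random(seed)
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    best = None
    for _ in range(replicates):
        cents = [list(p) for p in rng.sample(points, k)]  # random init
        for _ in range(iters):
            # membership: each point belongs to its nearest centroid
            labels = [min(range(k), key=lambda j: d2(p, cents[j]))
                      for p in points]
            new = []
            for j in range(k):  # recompute centroids as cluster means
                members = [p for p, l in zip(points, labels) if l == j]
                new.append([sum(c) / len(members) for c in zip(*members)]
                           if members else cents[j])
            if new == cents:
                break
            cents = new
        km = sum(d2(p, cents[l]) for p, l in zip(points, labels))
        if best is None or km < best[0]:
            best = (km, cents, labels)
    return best  # (objective value KM(X, C), centroids, labels)

X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]  # two obvious groups
obj, cents, labels = kmeans(X, k=2)
```

On this toy data every replicate converges to the same two-cluster partition with KM(X, C) = 0.01, but on harder data different restarts can land in different local optima, which is exactly why the paper recommends several replicates.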

    The performance of the k-means clustering algorithm is improved by estimating the ideal number of clusters represented in the data. Thus, different cluster validity indices have been suggested to address this problem. A cluster validity index indicates the quality of a resulting clustering process. The clustering partition that optimizes the validity index under consideration is then chosen as the best partition. The silhouette plot method is one of these cluster validity techniques.

    Rousseeuw [25] and Rousseeuw et al. [26] suggested a graphical display, the silhouette plot, to evaluate the quality of a clustering allocation, independently of the clustering technique used. The silhouette value for each data point is a measure of how similar that data point is to points in its own cluster compared to data points in other clusters. s(i) is known as the silhouette width. This value is a confidence indicator on the membership of the ith sample in cluster X_j, and it is defined as:

    s(i) = (b(i) − a(i)) / max{a(i), b(i)},

    where a(i) is the average distance from the ith data point to all the other data points in its cluster, and b(i) is the average distance from the ith data point to all the points in the nearest neighbor cluster.

    The value of s(i) ranges from +1 to −1. A value of +1 indicates points that are very distant from neighboring clusters; a value of 0 indicates points that are not distinctly in one cluster or another; and a value of −1 indicates points that are probably assigned to the wrong cluster.

    For a given cluster, X_j, a cluster silhouette, S_j, can be determined. S_j characterizes the heterogeneity and isolation properties of such a cluster. It is calculated as the average of the silhouette widths of all samples in X_j. Moreover, for any partition, a global silhouette value or silhouette index, GS_u, can be used as an effective validity index for a partition U:

    GS_u = (1/c) ∑_{j=1}^{c} S_j.

    It has been demonstrated that this equation can be applied to estimate the optimal number of clusters for a partition U (Rousseeuw [25]). In this case, the partition with the maximum silhouette index value is taken as the optimal partition.

    4.2. Description of the New Approach

     The developed approach is based on the following steps:

    1. Obtain the Pareto-optimal set or a sub-set of Pareto solutions by using a MOEA such as NSGA-II.

    2. Apply the k-means algorithm to form clusters of the solutions contained in the Pareto set. The solution vectors are defined by the specific objective function values, f_i(x), for each prospective solution. Normalization of the objective function values is recommended to have comparable units. Several replicates are needed to avoid local optima. The solution to consider is the one with the lowest total sum of distances over all replicates.

    3. To determine the "optimal" number of clusters, k, in this set, silhouette plots are used. A value of the silhouette width, s(i), is obtained for several values of k. The clustering with the highest average silhouette width, GS_u, is selected as the "optimal" number of clusters in the Pareto-optimal set.

    4. For each cluster, select a representative solution. To do this, the solution that is closest to its respective cluster centroid is chosen as a good representative solution. This results in a dramatic reduction in the number of solutions that the decision-maker must consider.

    5. Analyze the representative solutions based on the priorities and preferences of the decision-maker. At this stage, the decision-maker can either select one solution from among the k representative solutions, or decide to perform further investigation of the cluster that he/she is most interested in. An unbiased suggestion is to focus on the cluster containing the solutions that form the "knee" region. The "knee" is formed by those solutions of the Pareto-optimal front where a small improvement in one objective would lead to a large deterioration in at least one other objective.

    6. Steps 2, 3 and 4 are then applied again on this reduced space formed by the solutions in the selected "knee" cluster.

    By following this approach, one systematically contracts the subspace in the direction of the most relevant solutions for the decision-maker until a unique selection can be made.
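Step 4 of the approach above, selecting the solution nearest to each cluster centroid as the cluster representative, can be sketched as follows. The fragment is illustrative (not the authors' MATLAB code); the Pareto points, centroids, and labels are hypothetical stand-ins for the output of steps 1-3, and the objective values would normally be normalized first.

```python
def representatives(points, centroids, labels):
    """Within each cluster, keep only the solution whose objective
    vector is closest (squared Euclidean) to the cluster centroid."""
    best = {}
    for point, label in zip(points, labels):
        d = sum((a - b) ** 2 for a, b in zip(point, centroids[label]))
        if label not in best or d < best[label][0]:
            best[label] = (d, point)
    return [point for _, point in best.values()]

# Hypothetical clustered Pareto solutions as (reliability, cost) vectors.
pareto = [(0.90, 10.0), (0.92, 11.2), (0.99, 30.0), (0.98, 28.5)]
cents = [(0.91, 10.5), (0.985, 29.0)]   # from a prior k-means run
labels = [0, 0, 1, 1]
reps = representatives(pareto, cents, labels)
```

The four-solution set is pruned to one representative per cluster; the decision-maker then compares only these k solutions, or reapplies the clustering inside the most interesting ("knee") cluster as in steps 5 and 6.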

    4.2.1. MATLAB Implementation

    After obtaining the Pareto set from a particular MOEA, in this case NSGA-II, MATLAB® code was developed to perform the steps of the proposed technique. From normalized data, the code runs the k-means algorithm from two up to a specified number of means; it calculates average silhouette values, and the clustering with the highest average silhouette width, GS_u, is selected as the "optimal" number of clusters in the Pareto-optimal set. The pseudocode of the algorithmic implementation is shown in Figure 2.


    Notice that the k-means algorithm can converge to a local optimum; in this case, a partition of points in which moving any single point to a different cluster increases the total sum of distances. This problem can be overcome by performing several replicates, each with a new set of initial cluster centroid positions. That is, each of the replicates begins from a different randomly selected set of initial centroids. The final solution is the one with the lowest total sum of distances over all replicates.

    It is important to mention that, for purposes of this paper, the multi-objective formulation of the RAP considered only three objective functions. However, the proposed algorithm can be applied to multi-objective problems with more objective functions, such as maximizing reliability while minimizing cost, weight, volume, etc.

    For C = 2 to MC                    *maximum number of centroids*
        For Z = 1 to R                 *number of replicates*
            Randomly select initial values for C
            Repeat
                For each v_i ∈ X, assign v_i to c_j ∈ C according to nearest c_j
                Recompute c_j
            Until no change in c_j
            Return C, KM(X, C) and membership
            Store values for C, KM(X, C) and membership
            Z = Z + 1
        end
        Select the minimum KM(X, C) obtained over all replicates
    end
    Obtain silhouette values, s(i)
    Choose the clustering with the maximum average silhouette width, GS_u, of all centroids considered


    Figure 2. Pseudocode of algorithmic implementation.
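The selection loop of Figure 2 — cluster for each candidate k, then keep the k with the highest average silhouette width — can be sketched in Python (an illustrative NumPy version with the replicate loop omitted for brevity; the paper's code was in MATLAB, and all names here are assumptions):

```python
import numpy as np

def simple_kmeans(X, k, rng, max_iter=100):
    # one k-means run from randomly selected initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def mean_silhouette(X, labels):
    # average silhouette width: s(i) = (b_i - a_i) / max(a_i, b_i)
    n = len(X)
    if len(set(labels.tolist())) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            s[i] = 0.0                    # singleton-cluster convention
            continue
        a = D[i, same & (np.arange(n) != i)].mean()
        b = min(D[i, labels == c].mean() for c in set(labels.tolist())
                if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()

def choose_k(X, max_k=6, seed=0):
    # evaluate k = 2..max_k and keep the k with the highest average silhouette
    rng = np.random.default_rng(seed)
    scores = {k: mean_silhouette(X, simple_kmeans(X, k, rng))
              for k in range(2, max_k + 1)}
    return max(scores, key=scores.get)
```

On data with clearly separated groups, the average silhouette width peaks at the natural number of clusters, which is exactly how the "optimal" k is chosen in the paper.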

    5. Multi-Objective Redundancy Allocation Problem Example 

    A RAP example was solved to illustrate how data clustering can be of great aid to the decision-maker. The example system configuration consists of 3 subsystems, with 5, 4 and 5 types of available components for each subsystem, respectively. The optimization involves selection from among these component types. The maximum number of components is 8 for each subsystem. Table 1 defines the component choices for each subsystem.

    Table 1. Component choices for each subsystem.

                      Subsystem 1       Subsystem 2       Subsystem 3
    Design
    alternative j     R     C    W      R     C    W      R     C    W
    1                 0.94  9    9      0.97  12   5      0.96  10   6
    2                 0.91  6    6      0.86  3    7      0.89  6    8
    3                 0.89  6    4      0.70  2    3      0.72  4    2
    4                 0.75  3    7      0.66  2    4      0.71  3    4
    5                 0.72  2    8      -     -    -      0.67  2    4

    The multi-objective RAP with three objectives was formulated as shown in Problem P3, to maximize system reliability, minimize total cost and minimize system weight. The


    NSGA-II algorithm was used to solve the problem, and 75 solutions were found in the Pareto set. Figure 3 shows the Pareto-optimal set of solutions found by NSGA-II. The maximum and minimum values of reliability, cost and weight found among the solutions contained in the Pareto-optimal set are presented in Table 2.

    Figure 3. Pareto-optimal set for the multi-objective RAP.

    Table 2. Maximum and minimum values found in the Pareto-optimal set.

           Reliability   Cost   Weight
    max    0.999999      143    121
    min    0.682048      13     19

    Once the Pareto-optimal set was obtained, the k-means algorithm was used to cluster the solutions in the Pareto set. Normalization of the objective function values was performed to obtain comparable units. Thus, the three objective functions were normalized using the following linear normalization equation, although other types of normalizing equations (e.g., logarithmic) can also be used:

    f_i'(x) = ( f_i(x) − f_i^min(x) ) / ( f_i^max(x) − f_i^min(x) ),    i = 1, 2, …, n,

    where

    f_i^min(x) = minimum value of f_i(x) found in the Pareto-optimal set,
    f_i^max(x) = maximum value of f_i(x) found in the Pareto-optimal set.

    To use the above equation, all the objective functions were considered to be minimized; thus, reliability was multiplied by −1. It is important to remark that a different choice of normalization function may change the clustering outcome.

    To determine the optimal number of clusters, silhouette plots were used as suggested by Rousseeuw [25], and several runs were performed for different values of k, with several replicates for each value. For this particular data set, 5 was found to be the optimal number of clusters. The five clusters, obtained from normalized data, are shown in Figure 4. Cluster 1 contains 4 solutions; there were 15 solutions in cluster 2, 5 in cluster 3, 30 in cluster 4 and 21 in cluster 5. The decision-maker now considers only five clusters instead of 75


    unique solutions. This demonstrates that this new approach, combining MOEA and data clustering, is of practical benefit.

    Figure 4. Clustered Pareto-optimal set.

    One way to select a final solution from among those contained in each particular cluster is to identify the solution that is closest to its corresponding centroid. In this way, the decision-maker is given only k particular solutions for further consideration. Table 3 shows the representative solution for each cluster with its corresponding reliability, cost and weight.
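Selecting each cluster's representative as the member nearest its centroid can be written compactly (a hypothetical NumPy helper; the variable names are assumptions):

```python
import numpy as np

def representatives(X, labels, centroids):
    """For each cluster, return the index of the solution closest to its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)        # solutions in cluster j
        if members.size == 0:
            continue
        d = np.linalg.norm(X[members] - c, axis=1)   # distance to the centroid
        reps[j] = int(members[d.argmin()])
    return reps
```

Applied to a clustered Pareto set, a helper like this would produce the representative-solution column of a table such as Table 3.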

    With the information from Table 3, the decision-maker now has a smaller set of nondominated solutions. It becomes easier to comprehend, analyze and evaluate the trade-offs and to make a final choice regarding the relative importance of the different objectives as well as budget constraints. The algorithm, coded in MATLAB 7.0, was run on a Sony VAIO computer with an Intel Pentium processor operating at 1.86 GHz and 1 GB of RAM. The computation time was 1.4219 seconds.

    Table 3. Clustering results of the entire Pareto-optimal set.

               # of        Representative
               solutions   solution         Reliability   Cost   Weight
    Cluster 1  4           2                0.720365      16     24
    Cluster 2  15          61               0.999565      81     77
    Cluster 3  5           73               0.999994      121    113
    Cluster 4  30          43               0.986017      54     43
    Cluster 5  21          17               0.915556      33     28

    Another way to take advantage of this method is to consider the cluster(s) that contain(s) the most interesting solutions of the Pareto-optimal set, i.e., those where a small improvement in one objective would lead to a large deterioration in at least one other objective. These solutions are often referred to as "knees." In this case, as can be seen from Figure 4, the solutions in cluster 4 are likely to be the most relevant to the decision-maker. The maximum and minimum values of reliability, cost and weight in cluster 4 are shown in Table 4.


    Table 4. Maximum and minimum values found in cluster 4.

           Reliability   Cost   Weight
    max    0.999036      77     55
    min    0.961883      38     32

    At this point, the decision-maker has two choices: either choose solution #43 from cluster 4 as a good representative of this "knee" region, or focus the search more intensely on the knee region itself. In the latter case, the starting 75 solutions have been reduced to only the 30 solutions found in the "knee" region, as shown in Figure 5.

    Figure 5. “Knee” region of the Pareto-optimal set.

    For instance, if the decision-maker decides to further investigate this "knee" region, then the 30 solutions contained in cluster 4 are examined more closely. Clustering is again used to find groups within this reduced space, and with the use of silhouette plots, 11 was found to be the optimal number of clusters. In this way, the subspace is systematically contracted in the direction of the solutions most relevant to the decision-maker. Figure 6 shows the clusters found within the original cluster 4, from normalized data.

    Figure 6. Clusters found within original cluster 4.


    Since the original cluster 4 already contained promising trade-offs, plotting the solutions in two dimensions can be a graphical aid for the decision-maker. Figures 7, 8 and 9 plot reliability vs. cost, reliability vs. weight, and cost vs. weight, respectively, from normalized (0 to 1) objective function data.

    Figure 7. Reliability vs. Cost.

    Figure 8. Reliability vs. Weight.

    From Figures 7 and 8, clusters 3, 10 and 11 can be considered undesirable because they do not have high reliability compared with the other clusters; however, Figure 9 shows that these three clusters are the ones that provide the minimum values for cost and weight. Conversely, clusters 1 and 6 in Figures 7 and 8 have high reliability, but it is achieved at relatively high cost and weight.

    The analysis of these trade-offs continues until a solution or a small portion of the nondominated set is located. This solution or subset will then contain the preferred solutions of the overall problem. It is important to note that, even when the space has been reduced to the "knee" region or to the region containing the most promising solutions, in the absence of further information none of the corresponding trade-offs can be said to be better than the others. Thus, the choice of one solution over another rests on the judgment of the decision-maker and on his/her knowledge of the system's intended usage and the priorities and preferences of the system's intended user/owner.


    Figure 9. Cost vs. Weight.

    The representative solutions of these 11 clusters found in the "knee" region are shown in Table 5 with their corresponding values of reliability, cost and weight.

    Table 5. Clustering results of the "knee" region.

                # of        Representative
                solutions   solution         Reliability   Cost   Weight
    Cluster 1   3           51               0.998043      68     45
    Cluster 2   3           47               0.992115      59     37
    Cluster 3   3           28               0.963644      39     35
    Cluster 4   4           44               0.986416      55     41
    Cluster 5   1           34               0.979653      44     38
    Cluster 6   2           56               0.999036      77     51
    Cluster 7   4           40               0.983483      46     46
    Cluster 8   2           36               0.982178      52     35
    Cluster 9   3           50               0.994940      60     49
    Cluster 10  2           31               0.973035      42     34
    Cluster 11  3           30               0.970198      41     39

    For this particular multi-objective RAP, clusters 2, 4 and 7 appear to contain the most desirable solutions. For ease of interpretation and analysis, the 11 representative solutions of this "knee" region are plotted in two dimensions (reliability vs. cost) in Figure 10. From this figure, one can easily see that solutions #47, #44 and #40, belonging to clusters 2, 4 and 7 respectively, are the ones presented to the decision-maker as good trade-offs if no objective function preferences have been specified in advance.

    For example, solution #44, shown in Figure 11, achieves a reliability of 0.986416 at a cost of 55 and a weight of 41. For system implementation, the configuration is composed of one component each of types 1, 2 and 5 for subsystem 1; two components of type 1 for subsystem 2; and one component each of types 1 and 3 for subsystem 3.
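The objective values quoted for solution #44 can be reproduced directly from the component data in Table 1, assuming the usual series-parallel model (redundant components within a subsystem in parallel, subsystems in series):

```python
# (reliability, cost, weight) for the selected components, from Table 1
sub1 = [(0.94, 9, 9), (0.91, 6, 6), (0.72, 2, 8)]   # subsystem 1: types 1, 2, 5
sub2 = [(0.97, 12, 5), (0.97, 12, 5)]               # subsystem 2: two of type 1
sub3 = [(0.96, 10, 6), (0.72, 4, 2)]                # subsystem 3: types 1, 3

def subsystem_reliability(parts):
    # parallel redundancy: the subsystem fails only if every component fails
    q = 1.0
    for r, _, _ in parts:
        q *= 1.0 - r
    return 1.0 - q

R = 1.0
for parts in (sub1, sub2, sub3):
    R *= subsystem_reliability(parts)   # subsystems in series

cost = sum(c for s in (sub1, sub2, sub3) for _, c, _ in s)
weight = sum(w for s in (sub1, sub2, sub3) for _, _, w in s)

print(round(R, 6), cost, weight)   # -> 0.986416 55 41
```

The computed values agree with the Cluster 4 row of Table 5, confirming the configuration read from Figure 11.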


    Figure 10. Representative solutions of the "knee" region in a two-dimensional space.

    Figure 11. Illustration of the system configuration for solution #44.

    6. Conclusion

    The RAP was formulated as a multiple objective system reliability optimization problem, with system reliability to be maximized and system cost and weight to be minimized. A Pareto-optimal set of solutions was obtained using the fast elitist NSGA-II algorithm. The developed algorithm was then used to find groups of similar solutions within the Pareto front.

    In our approach, a clustering validation technique has been integrated with the k-means clustering algorithm to give a relatively automatic clustering process. The only parameters defined by the user are the maximum number of clusters to be analyzed and the desired number of replicates. The replicates avoid the bias that can otherwise arise from the selection of the initial centroids: different initial centroids were selected and the results compared until a minimum was found. To determine the "optimal" number of clusters, k, in the set, the silhouette method


    was applied. A value of the silhouette width, s(i), was obtained for each value of k investigated. The clustering with the highest average silhouette width, GS_u, was selected as defining the "optimal" number of clusters in the Pareto-optimal set.

    With this approach, the decision-maker obtained a pruned Pareto subset of just k particular solutions. Moreover, clustering analysis was useful in focusing the search on the "knee" region of the Pareto front. The "knee" region is characterized by those solutions of the Pareto-optimal set where a small improvement in one objective would lead to a large deterioration in at least one other objective. The clusters formed in this region contain the solutions that are likely to be most relevant to the decision-maker. The analysis of these trade-offs continues until a solution or a small portion of the nondominated set is located. In this paper, no objective function preferences are known or required. If objective function rankings or priorities are available, then a similar clustering approach can be applied after filtering of undesirable solutions, as described by Taboada et al. [31].

    References

    1. Bellman, R. E. (1957). Dynamic Programming. Princeton University Press.
    2. Bellman, R. E. and Dreyfus, E. (1958). Dynamic programming and reliability of multicomponent devices. Operations Research, 6, 200-206.
    3. Bulfin, R. L. and Liu, C. Y. (1985). Optimal allocation of redundant components for large systems. IEEE Transactions on Reliability, 34, 241-247.
    4. Busacca, P. G., Marseguerra, M. and Zio, E. (2001). Multiobjective optimization by genetic algorithms: application to safety systems. Reliability Engineering and System Safety, 72, 59-74.
    5. Chern, M. S. (1992). On the computational complexity of reliability redundancy allocation in a series system. Operations Research Letters, 11, 309-315.
    6. Coit, D. W. and Smith, A. (1996a). Reliability optimization of series-parallel systems using a genetic algorithm. IEEE Transactions on Reliability, 45(2), 254-260.
    7. Coit, D. W. and Smith, A. (1996b). Penalty guided genetic search for reliability design optimization. Computers and Industrial Engineering, 30(4), 895-904.
    8. Cox, D. (1957). Note on grouping. Journal of the American Statistical Association, 52, 543-547.
    9. Deb, K., Agarwal, S., Pratap, A. and Meyarivan, T. (2000a). A Fast Elitist Non-Dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGA-II. KanGAL Report Number 200001, Indian Institute of Technology, Kanpur, India.
    10. Deb, K., Agarwal, S., Pratap, A. and Meyarivan, T. (2000b). A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. Proceedings of the Parallel Problem Solving from Nature VI Conference, 849-858. Paris, France.
    11. Dhingra, A. K. (1992). Optimal apportionment of reliability and redundancy in series systems under multiple objectives. IEEE Transactions on Reliability, 41(4), 576-582.
    12. Erickson, M., Mayer, A. and Horn, J. (2001). The niched Pareto genetic algorithm 2 applied to the design of groundwater remediation systems. First International Conference on Evolutionary Multi-Criterion Optimization, 681-695.
    13. Fonseca, C. M. and Fleming, P. J. (1993). Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. Proceedings of the Fifth International Conference on Genetic Algorithms, 416-423. San Mateo, California.
    14. Fyffe, D. E., Hines, W. W. and Lee, N. K. (1968). System reliability allocation and a computational algorithm. IEEE Transactions on Reliability, 17, 64-69.
    15. Holland, J. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, MI.
    16. Horn, J., Nafpliotis, N. and Goldberg, D. E. (1994). A niched Pareto genetic algorithm for multiobjective optimization. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, 1, 82-87. Piscataway, New Jersey: IEEE Service Center.
    17. Inagaki, T., Inoue, K. and Akashi, H. (1978). Interactive optimization of system reliability under multiple objectives. IEEE Transactions on Reliability, 27, 264-267.
    18. Jain, A. K. and Dubes, R. C. (1988). Algorithms for Clustering Data. Englewood Cliffs: Prentice Hall.
    19. Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience.
    20. Keeney, R. L. and Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Tradeoffs. John Wiley & Sons.
    21. Kulturel-Konak, S., Smith, A. and Coit, D. W. (2003). Efficiently solving the redundancy allocation problem using Tabu search. IIE Transactions, 35(6), 515-526.
    22. Kulturel-Konak, S., Baheranwala, F. and Coit, D. W. (2005). Pruned Pareto-optimal sets for the system redundancy allocation problem based on multiple prioritized objectives. Journal of Heuristics (in press).
    23. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. LeCam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.
    24. Rao, S. S. (1991). Optimization Theory and Application. New Delhi: Wiley Eastern Limited.
    25. Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
    26. Rousseeuw, P., Trauwaert, E. and Kaufman, L. (1989). Some silhouette-based graphics for clustering interpretation. Belgian Journal of Operations Research, Statistics and Computer Science, 29(3).
    27. Sakawa, M. (1978). Multiobjective optimization by the surrogate worth trade-off method. IEEE Transactions on Reliability, 27, 311-314.
    28. Schaffer, J. D. (1985). Multiple objective optimization with vector evaluated genetic algorithms. Proceedings of the First International Conference on Genetic Algorithms and Their Applications, 93-100. Hillsdale, New Jersey.
    29. Srinivas, N. and Deb, K. (1994). Multiobjective optimization using nondominated sorting in genetic algorithms. Journal of Evolutionary Computation, 2(3), 221-248.
    30. Steuer, R. E. (1989). Multiple Criteria Optimization: Theory, Computation, and Application. Reprint edition, Krieger Publishing Company, Malabar, Florida.
    31. Taboada, H., Baheranwala, F., Coit, D. W. and Wattanapongsakorn, N. (2006). Practical solutions for multi-objective optimization: an application to system reliability design problems. Reliability Engineering & System Safety, 92(3), 314-322.
    32. Zeleny, M. (1982). Multiple Criteria Decision Making. McGraw-Hill Series in Quantitative Methods for Management.
    33. Zitzler, E. and Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4), 257-271.

    Authors' Biographies:

    Heidi A. Taboada is a PhD candidate in the Department of Industrial & Systems Engineering at Rutgers University, Piscataway, NJ. She has an MS in Industrial & Systems Engineering from Rutgers University (2005), an MS in Industrial Engineering (minor in Quality) from Instituto Tecnológico de Celaya (2002), and a BS in Biochemical Engineering from Instituto Tecnológico de Zacatepec (2000). Her research interests lie in the areas of applied operations research, multiple objective optimization, biologically inspired methods and algorithms (including evolutionary computation), reliability modeling and optimization, and data mining.

    David W. Coit is an Associate Professor in the Department of Industrial & Systems Engineering at Rutgers University. He received a BS degree in Mechanical Engineering from Cornell University, an MBA from Rensselaer Polytechnic Institute, and an MS & PhD in Industrial Engineering from the University of Pittsburgh. In 1999, he was awarded a CAREER grant from NSF to study reliability optimization. He also has over ten years of experience working for IIT Research Institute (IITRI), Rome, NY (IITRI is now called Alion Science & Technology), where he was a reliability analyst, project manager, and an engineering group manager. His current research involves reliability prediction & optimization, risk analysis, and multi-criteria optimization considering uncertainty. He is a member of IIE and INFORMS.