


Data reduction via clustering and averaging for contingency and reliability analysis

Håkon Kile, Kjetil Uhlen

Department of Electric Power Engineering, Norwegian University of Science and Technology, N-7491 Trondheim, Norway


Article history: Received 27 June 2011; Received in revised form 18 June 2012; Accepted 5 July 2012; Available online 9 August 2012.

Keywords: Data reduction; Clustering; Contingency analysis; Reliability analysis; Power market model.


Corresponding author: Håkon Kile. E-mail addresses: [email protected] (H. Kile), [email protected] (K. Uhlen).

Abstract: Data reduction is necessary when the choice of analysis method cannot deal with large data sets. This is for instance the case when a power market model generates future generation and load scenarios, and one wants to use these scenarios as a basis for contingency and reliability analysis. A framework that forms a reduced data set, which keeps the important information from the full data set intact, is presented. The framework uses statistical methods to find patterns in the data, and uses averaging to reduce the data set. A case study where the framework is used in a reliability analysis is included.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

A realistic picture of the future utilization of the power system is important both for operation planning and reliability analysis. This is obtained by generating a set of operational states (OPs), where each state represents a certain future generation and load scenario [1].

The OPs are used as input in a contingency analysis, where the consequences of a set of outages/contingencies are determined. For operation planning, the main interest is whether or not the current power system can handle the future load on the system, i.e., if the operational criteria are fulfilled or not, while a reliability analysis calculates different probabilistic reliability indices [2], e.g., energy not supplied (ENS). The validity of the analysis depends on how well the generated set of OPs actually represents the future load and generation patterns.

The "classical" approach is to model a few future worst-case scenarios, e.g., heavy load situations, and use these as a representation of the future utilization. It is evident that this is not a complete description of the future utilization, and is not an optimal basis for operation planning and reliability analysis.

To get a more complete picture of the future load and generation patterns, a power market model can be used to generate future scenarios. Power market models, e.g., the EMPS model described in [3], can for instance take into account how the weather affects the generation and load patterns, and thus generate OPs which describe different operation scenarios for a given power network. The EMPS model can for instance generate four different OPs per week for 75 years of inflow data. The OPs for one week will represent different load scenarios within the week, e.g. the heavy load OP for a given week will be the average of the heavy load periods within that week. Other power market models can generate OPs on a different time scale.

Even though the output from the power market model (hopefully) is a good representation of the future utilization, it can be very computationally intensive to analyse all the generated OPs. For instance, by considering 4 OPs per week for 75 years and 100 contingencies to be tested for each OP, one has to solve 1.56 million power flow scenarios. The number of power flow solutions grows rapidly as the number of parameters is increased, and quickly turns into an infeasible problem with today's technology.

A middle ground between the two is to pick some of the OPs generated by the power market model, and hope these represent a sufficiently realistic picture of the future load on the power system. This is the motivation behind the framework presented in this paper: it finds a subset of the operational states generated by a power market model that is sufficient to describe the future load on the power system. In the rest of this paper, such a subset will be denoted a representative set.

The framework outlined in this paper will search for natural groups in the full set, by using clustering (unsupervised learning) algorithms. The operational states within one cluster will be assumed to be approximately equal, and represented by the centroid of the group. This will reduce the number of operational states needed to be analyzed in the contingency analysis substantially, while retaining much of the details/accuracy generated by the power market model. This framework will typically be useful in a security of electricity supply analysis; see [1] for a complete description.

Fig. 1. Illustration of a data analysis setting where the framework outlined here can be useful. The initial n observations are placed in k groups. Each group is represented by one $x'_i$, which is analyzed. The data estimation step can be done to, e.g., bring back the original order of the observations.

Fig. 2. Example of natural grouping of data. The P and Q represent power production for a generator. The two groups can be easily separated, but only the P direction is useful for separating the two groups.

If the probabilities of the different OPs are of interest, as in risk analysis, this framework can keep the empirical probability distribution of the OPs approximately intact. If all OPs initially have equal probability (1/n), and if group k in the representative set has $n_k$ members, the OP representing group k will be assigned a probability of $n_k/n$.
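
As an illustration of this bookkeeping, the following minimal sketch (hypothetical helper name, assuming hard cluster labels are already available) computes the empirical probability $n_k/n$ assigned to each representative OP:

```python
import numpy as np

def cluster_probabilities(labels):
    """Assign each cluster the probability n_k / n, where n_k is the
    number of OPs in cluster k and n is the total number of OPs."""
    labels = np.asarray(labels)
    n = labels.size
    clusters, counts = np.unique(labels, return_counts=True)
    return dict(zip(clusters.tolist(), (counts / n).tolist()))

# Example: 6 OPs grouped into 3 clusters -> probabilities 3/6, 2/6, 1/6.
print(cluster_probabilities([0, 0, 0, 1, 1, 2]))
```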

The general applicability of the framework is illustrated in Fig. 1. The n initial observations are grouped into k groups, where one observation represents each group. The further part of the analysis is a suggestion of what can be done based upon the representative set.

Within power system analysis, clustering is used for data reduction in [4–7], while supervised learning is used for classification in, e.g., [8–11] and forecasting/regression in, e.g., [12,13]. The approach taken in this paper bears a lot of similarities with scenario reduction [14]. A different approach to data reduction is adaptive sampling [15], but that approach is mainly useful for very large data sets. Although motivated by reducing the number of operational states, the techniques discussed in this paper are quite general, and similar data reduction approaches are found within other fields.

The next section is a discussion of the qualitative requirements for a representative set. Section 3 deals with mathematical details and actual implementation of the framework. The last part of the paper is a case study, where the framework is used in a reliability analysis.

2. Qualitative assessment of a representative set

We require the representative set to be an adequate basis for further analysis, i.e., it should capture the main/important characteristics of the full set. Furthermore, the representative set should be substantially smaller than the full set. These two requirements are contradictory, and a trade-off between them is needed. There are some important issues to consider before the appropriate trade-off can be determined, e.g., how much loss of accuracy one can accept in exchange for less computational effort required to complete the overall analysis. The following discussion deals with some of these issues.

Clustering algorithms essentially search for natural groups within the data, meaning that observations in one group should be more similar to each other compared to those in other groups. For instance, it is evident that the power generator data plotted in Fig. 2 can be divided into two groups by drawing a vertical line at P = 2. The existence of natural groups within the data, and the number of such groups, is generally unknown. If a natural grouping exists, clustering algorithms can be successful in finding these groups, and some possible methods are presented in the next section. If there is no such natural grouping within the data, one should interpret the whole dataset as one cluster. If this is the case, the problem of finding a representative set is quite different, as it turns into a segmentation problem.

From another perspective, if one has a predetermined number of operational states one can afford to include in the representative set, one has a segmentation problem. This can for instance be the case if one is willing to analyse, say, only 50 operational states.

There is also a problem related to how to represent the observations within one cluster. If the cluster is compact, it should be sufficient to represent all observations by the centroid. On the other hand, if the cluster is quite large (where large means high volume), this might not be sufficient. The amount of averaging done depends on how much accuracy one is prepared to lose in exchange for more data reduction.

The type of power system will probably influence the number of groups needed in a representative set. In a power system dominated by thermal energy production, the future generation patterns are quite predictable, and possibly a few groups can be adequate, while in systems where generation is dominated by intermittent generation, i.e., wind and hydro power, the future generation patterns will probably vary a lot, and more groups are probably needed in a representative set.

To actually have a representative picture of the future load on the power system, the extreme states/outliers are important, as they might represent special cases where the system is under considerable strain. An outage in this case might have severe consequences. Most clustering algorithms cannot handle outliers. If an outlier is included in a cluster, the characteristics of the cluster, e.g., the mean, will be severely affected, and this will distort the clustering procedure. Thus it is important to take care of outliers in a suitable manner.

Much of the success of the framework presented depends on how much it can reduce the computational effort for the following analysis. At the same time, the framework should find the representative set in an efficient manner. That is, the representative set should substantially reduce the number of operational states to be analyzed, and at the same time the framework should quickly find the representative set. If the framework uses much time to find the representative set, little is gained by doing the data reduction. Of course, the time one can afford to spend finding the representative set depends on how computationally intensive the following analysis is, e.g., whether one analyses first, second, or higher order contingencies.

The additional gain of this type of framework is that it can help find patterns and structure in the data. This might be useful for other analyses.

3. The framework

In this section, an OP will be represented as a p-dimensional vector $\vec{x}_i = [x_{i,1}, \ldots, x_{i,p}]$, where each $x_{\cdot,j}$ is denoted a feature. A feature can for instance be power generation or load at a bus in a power network. Often, $\vec{x}$ will be referred to as an object.

3.1. Similarity and dissimilarity

All clustering algorithms need a measure of similarity/dissimilarity between objects. All features in this paper are quantitative features, and absolute difference will be used to define dissimilarity between features,

$$d(x_{i,j}, x_{i',j}) = |x_{i,j} - x_{i',j}|^{l}. \qquad (1)$$

The total dissimilarity between two objects is the weighted sum of all the pairwise dissimilarities between the features of the two objects,

$$D(\vec{x}_i, \vec{x}_{i'}) = \sum_{j=1}^{p} w_j \, d(x_{i,j}, x_{i',j}), \qquad \sum_{j=1}^{p} w_j = 1. \qquad (2)$$

As l increases, more weight will be put on large distances between the features. A more thorough discussion of distance measures can be found in [16]. Other alternatives exist, e.g., probabilistic measures, but these are not considered here.
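
A minimal sketch of the dissimilarity in (1) and (2), assuming quantitative feature vectors and user-supplied weights that sum to one (the function name and the default choice l = 1 are illustrative, not from the paper):

```python
import numpy as np

def dissimilarity(x_i, x_ip, weights, l=1):
    """Weighted dissimilarity D(x_i, x_i') from Eqs. (1)-(2):
    d(x_ij, x_i'j) = |x_ij - x_i'j|**l and D = sum_j w_j * d, with sum_j w_j = 1."""
    x_i, x_ip, weights = map(np.asarray, (x_i, x_ip, weights))
    assert np.isclose(weights.sum(), 1.0), "feature weights must sum to one"
    return float(np.sum(weights * np.abs(x_i - x_ip) ** l))

# Two OPs described by three features (e.g., bus injections in MW).
print(dissimilarity([100.0, 50.0, 10.0], [90.0, 55.0, 10.0], [0.5, 0.3, 0.2]))
```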

3.2. Feature selection and transformation

The definition of dissimilarity in (2) requires features (x) as input. Which features one chooses to include in the dissimilarity calculation is extremely important to the results.

First, some features are useful for separation among objects, while others are not. The hypothetical example in Fig. 2 illustrates this. The points in the figure are simulated data for a power generator, where P and Q are in pu. It is clear that there are two natural groups here – high and low production. But only the P value is useful for separating the two production categories.

A common problem in data analysis relates to the dimension of $\vec{x}$. When the dimension of the feature vector $\vec{x}$ is high, e.g., 10 and above, distance measures break down and one gets a sparsely populated sample space. This is known as the curse of dimensionality [16], and is a serious problem which too often is neglected.

In general, there are two different solutions to this problem – feature selection and feature transformation. Feature selection aims at finding a subset of the $x_{i,j}$'s, which are used as features of the OPs in the clustering process. Feature transformation essentially projects $\vec{x}$ down to a lower dimensional space. Principal component analysis is a popular method, while random projection and independent component analysis are other alternatives.

In power system analysis, a hybrid between these two could be considered. The similarity measure above does not require the actual power injections and loads as input, only some quantitative features. For the power network in Fig. 4, which is used in the case study in Section 4, one could look at the power transmissions between the areas, which can be approximated quickly by a DC power flow, and use these as features.
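
As one example of the feature transformation route, the sketch below standardizes the OP feature matrix and projects it onto a few principal components before clustering. It is only a sketch: scikit-learn is assumed to be available, and the number of components is an arbitrary illustration, not a value from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def transform_features(X, n_components=5):
    """Project the (n_OPs x p_features) matrix X onto a lower-dimensional
    space, as one way of mitigating the curse of dimensionality."""
    X_std = StandardScaler().fit_transform(X)   # put features on a common scale
    return PCA(n_components=n_components).fit_transform(X_std)

# 10,400 OPs with 17 bus-injection features, as in the case study.
X = np.random.default_rng(0).normal(size=(10_400, 17))
X_reduced = transform_features(X)
print(X_reduced.shape)  # (10400, 5)
```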

In fact, the feature representation is much more important in this context than the choice of clustering algorithm. However, this topic is case/domain specific, and in that sense the results are not as generally applicable, which is probably why it is not as popular a research topic as the clustering algorithms themselves.

3.3. Clustering algorithms

Two criteria for classifying clustering algorithms are useful in this context: hierarchical methods vs. partitioning methods, and hard vs. fuzzy clustering. Partitioning methods aim at finding one partition of the data, while hierarchical methods aim at fitting the data into a hierarchical structure. Hard clustering assigns each object to one cluster, while fuzzy clustering can let one object belong to more than one cluster; the former is more suitable for this framework.

Below, k-means and agglomerative clustering are briefly summarized. For more thorough discussions of the vast amount of different clustering techniques, see [16,17].

3.3.1. K-means

This method requires the number of clusters k as input. It then assigns objects to k partitions, such that the "between cluster" variation is maximized with respect to the "within cluster" variation. This algorithm uses the Euclidean distance as dissimilarity measure.

The different steps of the process are as follows (a sketch in code is given after the list):

• Step 1: Choose k and partition the data into k clusters.
• Step 2: Calculate the cluster centers.
• Step 3: Reassign objects to the cluster with the nearest cluster center.
• Step 4: Repeat steps 2 and 3 until no change occurs between iterations.

It is also possible to add a fifth step, which takes action when empty clusters appear and/or takes care of outliers.
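
A compact sketch of these steps using scikit-learn's KMeans, where the cluster centroids are taken as the representative OPs. Variable names and the toy data are illustrative; restarting from several initial partitions, as discussed below, corresponds to the n_init parameter.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_representative_set(X, k, n_init=10, random_state=0):
    """Partition the OPs in X into k clusters and return the centroids
    (the representative OPs), the cluster labels, and the cluster sizes."""
    km = KMeans(n_clusters=k, n_init=n_init, random_state=random_state).fit(X)
    sizes = np.bincount(km.labels_, minlength=k)
    return km.cluster_centers_, km.labels_, sizes

# Example: reduce 1000 synthetic OPs with 17 features to 150 representatives.
X = np.random.default_rng(1).normal(size=(1000, 17))
centroids, labels, sizes = kmeans_representative_set(X, k=150)
print(centroids.shape, sizes.sum())  # (150, 17) 1000
```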

Essentially, this algorithm partitions the data into hyperspherical clusters. It is known to be very successful if well separated and compact clusters exist within the data [18]. The algorithm is also quite efficient since it works directly on the data.

The major drawback of this method is that it depends solely on the user to choose k, and it is sensitive to the initial partition. That is to say, if a bad initial partition is chosen, there is no guarantee that the algorithm will converge to the global optimum. However, convergence is guaranteed. One way to partly solve this problem is to start the algorithm with different initial partitions, and see if it converges to the same optimum. If no global optimum is reached, the solution with maximized "between cluster" variation is chosen. The problem of choosing k is addressed later.

3.3.2. Agglomerative clustering

Hierarchical clustering is divided into agglomerative and divisive clustering. Divisive clustering starts with all objects in one cluster, and at each level it chooses an optimal split of the clusters. This approach is far too computationally intensive to be useful in this framework, except if one searches for a very small number of clusters. Only agglomerative clustering will be considered here.

The agglomerative clustering algorithm requires a dissimilarity matrix as input. The dissimilarity matrix is a symmetric matrix containing all the pairwise dissimilarities between objects. Agglomerative clustering starts with all objects in separate clusters, and merges clusters based upon the dissimilarity matrix and the linkage criterion. The most common linkage types are:

• Single linkage: The two clusters with the minimum minimal distance between them are merged.
• Complete linkage: The two clusters with the minimum maximal distance between them are merged.
• Average linkage: The two clusters with the minimum average pairwise distance between their members are merged.

The algorithm yields a dendrogram, and cutting the dendrogram at a certain height forms clusters. The dendrogram in Fig. 3 represents the agglomerative clustering of the data from Fig. 2. To get the two natural clusters, one cuts the dendrogram at the height represented by the horizontal line "Cut 1". "Cut 2" will yield four clusters.

Single linkage often results in chaining, i.e. one large cluster is formed. Complete linkage usually produces small and tight clusters, and is the most used in practice. The average linkage requires more computations. See [16] for more details.
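
A sketch of the agglomerative route with SciPy, building the linkage tree once and cutting it to obtain k clusters. Complete linkage is used here in line with the discussion above; the helper name and data are illustrative. Because the tree Z is built only once, cutting it for many different values of k repeats only the cheap fcluster step, which is the efficiency point made in the comparison below.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def agglomerative_labels(X, k, method="complete"):
    """Build the dendrogram from the pairwise dissimilarities and cut it
    so that (at most) k clusters are formed."""
    D = pdist(X)                      # condensed pairwise dissimilarity matrix
    Z = linkage(D, method=method)     # the dendrogram/tree
    return fcluster(Z, t=k, criterion="maxclust")

X = np.random.default_rng(2).normal(size=(500, 17))
labels = agglomerative_labels(X, k=50)
print(np.unique(labels).size)
```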

3.3.3. Comparison

The k-means is quite efficient as it works directly on the data, while building the dendrogram is quite time consuming. On the other hand, the agglomerative algorithm is not dependent on the initial partition, and can handle different dissimilarity measures, as long as they can form a dissimilarity matrix. Cutting the tree/dendrogram at different heights, and thereby forming clusters, can be done quite efficiently. If many values of k are to be tested, as is the case when estimating k, this can be more efficient than k-means.

Both k-means and hierarchical clustering implicitly put some structure on the data. The k-means forms hyperspherical clusters, which might not be a suitable structure. K-means also tends to produce clusters of approximately equal size. This is not ideal for this framework, since the goal is to search for large clusters, and let the potential outliers form their own small clusters.

The agglomerative approach often produces quite different trees for different linkage criteria, which suggests that this type of structure/hierarchy is not ideal for the data set at hand. If the different linkage criteria produce approximately the same tree, it can indicate that this is a suitable approach.

3.4. Finding k

The clustering algorithms above require k as input. If the goal is to find, say, 10 groups, the k-means algorithm seems like a good choice. One could consider running the k-means with some different initial partitions to verify global convergence. On the other hand, if one wants to estimate k from the data, the problem is more complicated.

Fig. 3. The dendrogram built by an agglomerative clustering of the data in Fig. 2. Cutting the dendrogram at the height of the horizontal line "Cut 1" will result in a partition of the data into the two natural groups seen in Fig. 2.

A common way of estimating the number of clusters k is through the "within cluster dispersion" $W_k$. The following definition is the same as in [19]:

$$D_r = \sum_{i, i' \in C_r} d_{ii'}, \qquad (3)$$

where $C_r$ is the index set of the data points assigned to cluster r and $d_{ii'}$ is the squared Euclidean distance. The "within cluster dispersion" is then

$$W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r, \qquad (4)$$

where $n_r = |C_r|$ is the number of observations in cluster r.

As the number of clusters is increased, $W_k$ will decrease. If we denote K as the "true" number of clusters, then $W_k$ will decrease quickly as k approaches K from below, because clusters containing "very" dissimilar objects are split, i.e., objects that do not belong to the same cluster are assigned to different clusters. As k reaches K the decrease in $W_k$ will decline, since all data points at this stage are associated with only "natural" neighbours, and the curve of $W_k$ will markedly flatten out. As is said in [19], "Statistical folklore has it that the location of such an 'elbow' indicates the appropriate number of clusters". The presence of such an "elbow", however, can often be hard (or impossible) to locate.
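
A sketch of how $W_k$ in (3) and (4) can be computed for a range of k, here using k-means labels only to produce a partition; the helper name is illustrative, and squared Euclidean distances are used as in the definition above.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist

def within_cluster_dispersion(X, labels):
    """W_k from Eqs. (3)-(4): sum over clusters of D_r / (2 n_r), where D_r
    sums the squared Euclidean distances over all ordered pairs in cluster r."""
    W = 0.0
    for r in np.unique(labels):
        members = X[labels == r]
        if len(members) > 1:
            D_r = 2.0 * np.sum(pdist(members, metric="sqeuclidean"))  # both (i,i') and (i',i)
            W += D_r / (2.0 * len(members))
    return W

X = np.random.default_rng(3).normal(size=(300, 5))
for k in (2, 5, 10, 20):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(within_cluster_dispersion(X, labels), 1))
```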

If one expects the number of clusters to be high, this is even more complicated, as one has to form clusters for each value of k and then calculate $W_k$, which gets very computationally intensive. This is not in accordance with the efficiency criterion for the current framework.

There are two major problems with the $W_k$ approach of searching for k. As mentioned, the location of the elbow is difficult to find, and the approach fails in the situation where there is no natural clustering, i.e., k = 1. This is some of the motivation behind the gap statistic [19], which is an interesting approach to estimating k.

3.5. Cluster verification

Clustering, or unsupervised learning, is essentially learning without a teacher. The goal is to form clusters with similar objects in them, but a general verification of the clustering is not available. This is not the case in supervised learning, e.g., regression models, where one for instance can use cross-validation or bootstrapping to estimate the prediction error.

This type of assessment is not available for clustering, and one relies solely on heuristic arguments like $W_k$ and the "tip of the elbow", or the maximized between cluster variation of the k-means. These are logical arguments, partly verified by experience, but without the possibility of any objective assessment.

These heuristic arguments, being the basis for clustering, are one of the reasons there exists such a vast amount of clustering algorithms. As the measure of success is a subjective matter, the optimal clustering will depend on the user. Additionally, clustering algorithms implicitly put some structure on the data. The true data structure usually deviates from this theoretical structure, and therefore some clustering algorithms will be more efficient for some data sets than others. There is no general optimal clustering procedure.



4. Case study

The test network can be seen in Fig. 4, where arrows indicate delivery points. The EMPS-NC model [3,20] is used to generate 4 OPs per week for 50 years, resulting in a total of 10,400 OPs. EMPS-NC is an extension of the EMPS model, which includes network constraints.

To illustrate and measure the accuracy of the presented data reduction framework, a security of electricity supply analysis is done based on a full, and a representative, set of OPs. The security of supply is measured through the annual energy not supplied (ENS), which is calculated as explained below. Other reliability indices are omitted for simplicity. For a complete overall description of a security of electricity supply analysis, see [1].

4.1. Contingency and reliability analysis

To limit the problem size, only single contingencies (29 in total) are analyzed. For each OP, the consequences of the 29 contingencies are determined through a power flow analysis. A set of load shedding rules and corrective actions are defined, which are used to eliminate potential overload during outages. If these corrective actions fail to bring the system back into a valid (no overload) state, all loads are assumed lost. In this way, the system available capacity (SAC) can be determined for each delivery point, contingency, and OP.

Fig. 4. The network used in the case study. The arrows indicate delivery points. Generation in areas 1–3 is dominated by hydro power, while area 4 represents a mixture of thermal and wind power. The line between area 3 and 4 is an HVDC line.

A very brief summary of how to calculate the ENS is given here; a more thorough explanation is given in [21]. Let operational states within a year be indexed by i ∈ [1, 208], delivery points by d ∈ [1, 10], and contingencies by j ∈ [1, 29].

The interrupted power $P^{inter}_{i,d,j}$, for fixed (i, d, j), is the difference between the original demand $P_{i,d}$, from the power market model, and $SAC_{i,d,j}$:

$$P^{inter}_{i,d,j} = P_{i,d} - SAC_{i,d,j} \quad [\mathrm{MW}] \qquad (5)$$

Assuming one OP to last for a whole year, the $ENS_{i,d,j}$ is

$$ENS_{i,d,j} = \lambda_j r_j \cdot P^{inter}_{i,d,j} \quad [\mathrm{MWh}], \qquad (6)$$

where $\lambda_j$ is the fault rate, and $r_j$ the outage time, for contingency j. The annual energy not supplied is then

$$ENS^{a} = \sum_{i} \sum_{d} \sum_{j} ENS_{i,d,j} \cdot \frac{h_i}{8736} \quad [\mathrm{MWh}], \qquad (7)$$

where $h_i$ is the lifetime (in hours) of operational state i, and 8736 (24 × 7 × 52) is the total number of hours in an "EMPS-year".
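
A sketch of the ENS bookkeeping in (5)–(7), assuming demand, SAC, fault rates, outage times, and state lifetimes are available as arrays; all names, shapes, and the toy data are illustrative, not from the case study.

```python
import numpy as np

def annual_ens(P, SAC, fault_rate, outage_time, lifetime_h, hours_per_year=8736):
    """Annual energy not supplied, Eqs. (5)-(7).
    P, SAC:       arrays of shape (n_states, n_delivery_points, n_contingencies) [MW]
    fault_rate:   faults per year for each contingency j      (lambda_j)
    outage_time:  outage duration in hours for contingency j  (r_j)
    lifetime_h:   lifetime in hours of each operational state (h_i)"""
    P_inter = P - SAC                                      # Eq. (5), interrupted power [MW]
    ens_idj = fault_rate * outage_time * P_inter           # Eq. (6), per (i, d, j) [MWh]
    weights = lifetime_h / hours_per_year                  # share of the year per state
    return float(np.sum(ens_idj * weights[:, None, None])) # Eq. (7)

# Toy example: 208 states, 10 delivery points, 29 contingencies.
rng = np.random.default_rng(4)
P = rng.uniform(10, 100, size=(208, 10, 29))
SAC = P - rng.uniform(0, 5, size=P.shape)
print(annual_ens(P, SAC, rng.uniform(0.1, 1.0, 29), rng.uniform(1, 10, 29),
                 np.full(208, 8736 / 208)))
```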



4.2. Feature representation and outliers

There are 17 buses in the test network where there are loads and/or generation. The real power inputs at these buses are the features used for the clustering process.

The load on the test network is quite light, and no OPs have SAC values which are considered gross outliers. Using the results of the analysis to detect outliers is of course impossible in practice, but it simplifies the case study, as the extent of the analysis is limited. Some OPs cause all loads to be lost, which makes the assumption about no outliers quite crude.

Fig. 6. $W_k$ as a function of the number of clusters. The lower line is for k-means clustering, while the upper line is $W_k$ for agglomerative clustering with complete linkage.

4.3. Clustering and worst case modelling

K-means and agglomerative clustering are used, and the worst case modelling mentioned in the introduction is included. Worst case modelling amounts to picking the high load OPs for weeks 4, 16 and 40, and letting these represent a year. This is illustrated in Fig. 5a. A grey pixel at (i, j) means that the observations indexed by i and j belong to the same group. One can see from the figure that week four represents weeks 1–14 and 42–52. A k-means clustering (with a total of 150 clusters) is shown in Fig. 5b. While the worst case model forces OPs to be grouped with OPs within the same year, the k-means has more freedom, and lets OPs group together with OPs from different years. Such an illustration can be done for all 10,400 OPs, but would be impossible to read in this format. Note that while there are 3 clusters in the worst case model, there are 63 different clusters present in year 1 with the k-means approach.

The $W_k$ is plotted in Fig. 6, where the lower curve is the k-means clustering solution, and the upper curve represents the result of agglomerative clustering with complete linkage. There is no indication of the "elbow" phenomenon in this plot. It is interesting to note that the k-means produces the best clustering of the two with respect to the "within cluster dispersion". The scale on the $W_k$ axis is quite large, but a standardisation of the data did not cause any change in the shape of the curves, and did not lead to any other conclusion.

As there is no evidence of the size of k from the data, representative sets of size 150, 500 and 1000 are tested in this case study. This will lead to a substantial reduction of the full set, while a representative set of size 1000 is believed to capture the main characteristics of the full set fairly well.

Fig. 5. Plot of the confusion matrices for the worst case modelling and the k-means clustering, for year 1, with a total of 150 clusters spread out over all the 10,400 OPs. A grey pixel at (i, j) means that observations indexed by i and j belong to the same cluster. Note that while the worst case modelling forces neighbouring states into the same cluster, k-means has more freedom and can cluster OPs with those that are further away in time, but possibly more alike. There are 63 different clusters within year 1 in the k-means model.

4.4. Results

The system available capacity calculated from the full data set is denoted $SAC_{f,i,d,j}$, while the system available capacity from the representative set is denoted $SAC_{r,i,d,j}$. Note that i now ranges from 1 to 10,400.

For a given contingency and a given delivery point, the system available capacities for the full set and the representative set can be plotted as time series. An error measure can then be the squared error between the two time series, which is defined by

$$\mathrm{err}(d, j) = \sum_{i=1}^{10{,}400} \left[ SAC_{f,i,d,j} - SAC_{r,i,d,j} \right]^2. \qquad (8)$$

The total error for the system is then

$$\mathrm{err} = \sum_{d} \sum_{j} \mathrm{err}(d, j). \qquad (9)$$
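
A sketch of the error measure in (8) and (9), comparing the SAC time series of the full set and the representative set; the array names and the synthetic data are illustrative.

```python
import numpy as np

def total_sac_error(SAC_full, SAC_repr):
    """Eqs. (8)-(9): squared error between the full-set and representative-set
    SAC time series, summed over all delivery points and contingencies.
    Both arrays have shape (n_states, n_delivery_points, n_contingencies)."""
    err_dj = np.sum((SAC_full - SAC_repr) ** 2, axis=0)  # Eq. (8), one value per (d, j)
    return float(np.sum(err_dj))                         # Eq. (9)

rng = np.random.default_rng(5)
SAC_full = rng.uniform(0, 100, size=(10_400, 10, 29))
SAC_repr = SAC_full + rng.normal(0, 1, size=SAC_full.shape)
print(total_sac_error(SAC_full, SAC_repr))
```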

The err for the different clustering algorithms is plotted in Fig. 7, as a function of the number of clusters. Agglomerative clustering with single linkage possesses the typical chaining effect for the given data set, and produces a much larger error than the other methods; it is therefore omitted from the figure.


Fig. 7. The total error for the system as defined in (9), for different clustering algorithms, plotted as a function of the number of groups in the representative set.

Table 1. $ENS^a$ (MWh) – k-means clustering. ENS and percentage deviation from the full-set ENS for representative sets of size 150, 500 and 1000, and for the worst case model.

Year | ENS (full set) | 150: ENS | 150: % | 500: ENS | 500: % | 1000: ENS | 1000: % | Worst case: ENS | Worst case: %
16   | 0.567          | 0.995    | 77.2   | 0.808    | 42.5   | 0.602     | 7.2     | 1.491           | 162.9
32   | 1.692          | 1.689    | -0.2   | 1.891    | 11.7   | 1.464     | -13.5   | 1.424           | -15.8
30   | 3.167          | 3.180    | 0.4    | 2.849    | -10.1  | 3.139     | -0.9    | 6.786           | 114.3

Table 2. $ENS^a$ (MWh) – agglomerative clustering. ENS and percentage deviation from the full-set ENS for representative sets of size 150, 500 and 1000, and for the worst case model.

Year | ENS (full set) | 150: ENS | 150: % | 500: ENS | 500: % | 1000: ENS | 1000: % | Worst case: ENS | Worst case: %
16   | 0.567          | 0.596    | 6.0    | 0.437    | -22.3  | 0.511     | -9.1    | 1.491           | 162.9
32   | 1.692          | 1.536    | -9.2   | 1.566    | -7.4   | 1.666     | -1.5    | 1.424           | -15.8
30   | 3.167          | 2.860    | -9.7   | 2.808    | -11.3  | 3.006     | -5.1    | 6.786           | 114.3

Fig. 8. $ENS^a_r - ENS^a_f$ per operational state for year 16. The two graphs represent the result based on agglomerative clustering, with 150 (solid line) and 500 (dotted line) clusters in the representative set.


The worst case modelling produces 150 clusters, but its error is much higher than the errors at k = 150 in Fig. 7, and it is therefore not plotted. The k-means is the overall best method, which is in agreement with the $W_k$ in Fig. 6. The reason why the error for k-means sometimes increases as k is increased is the instability of the k-means algorithm; only a limited number of initial partitions were tested for each fixed k.

ENS is calculated for each of the 50 years. In terms of ENS, year 16 is the best, year 30 is the worst, and year 32 is the median, based on an analysis of the full data set. It is impractical to report results for different clustering algorithms for all 50 years, and only these three years are considered. In Table 1, the ENS for years 16, 30 and 32 is reported for the representative sets formed by k-means clustering. The result of the worst case approach is included for comparison. The percentage deviation from the ENS of the full analysis is also included. In Table 2, the same is done for agglomerative clustering with complete linkage.

For year 16 in Table 2, the ENS error increases as the number of groups is increased from 150 to 500. This seems counter-intuitive, as more data should generally lead to better estimates. This behaviour is a result of how the ENS is calculated, and can be explained by Fig. 8. In Fig. 8, the difference between $ENS^a_{r,i}$ and $ENS^a_{f,i}$ is plotted for year 16. In general, the estimates based upon 500 clusters are better, but if one sums all the errors, the total error associated with 150 clusters is less, since the error per operational state is more evenly distributed around zero. If one sums up the absolute errors, the larger representative set will be better.

4.5. Comments

As there was no indication of a natural choice of k from the data, cluster sizes of 150, 500, and 1000 are used to illustrate the framework. Perhaps the most interesting observation is that the framework's approach of picking 150 representative OPs is considerably better than the worst case modelling approach.

For 1000 clusters, the errors in Tables 1 and 2 are roughly within a ±10% margin. This is also the case for the ENS for the other years not reported here (with two exceptions, with errors of about 20%). Considering the quite crude procedure used here, i.e., not considering feature representation or outliers, and not estimating k, the results of the case study are promising.

5. Concluding remarks

A data reduction framework is outlined, and some issues and problems are discussed. The results in the case study are quite good considering the crude averaging done, but the framework needs to be developed further.

For the purpose of data reduction, the goal is to find a representative set and use this in the further analysis, i.e., the full set will never be analyzed. In the development of this framework it is necessary to compare our representative set with the full set in order to assess the success of this approach. A major pitfall related to case studies, such as the one presented, is overfitting the framework to a given test case. For instance, one could use k-means, find a representative set, and compare the results with the full analysis. If the result is not satisfactory, increase k by one, and repeat until a satisfactory result is reached. This will provide a good result for the given test case, but will most likely perform quite badly on other cases due to overfitting.

For further development of the framework, the general approach should be: develop a set of rules/criteria for determining the representative set, find the representative set, and do a case study. If the result is not satisfactory, change the criteria and try again. With this approach, one can possibly generate a set of criteria of more generality, and thus better applicability for different cases.

Clustering based upon different features was not considered in the case study. For instance, running a DC power flow to find the transmission between areas, and using these as features, will decrease the dimension of the problem, and possibly lead to better clustering results. Other features can also be included. Power system operators often have a set of measurements used to describe the current state of the system, and these can possibly be used as features in this framework. A good feature selection/transformation procedure will be essential to increase the accuracy of the framework.

A screening technique for detecting outliers among the observations also needs to be integrated in the framework, since the clustering algorithms presented here are very sensitive to outliers. An interesting approach to the clustering problem is to use hierarchical Dirichlet processes (HDPs) [22]. The HDP model takes care of the model selection problem (choosing k), outlier detection, and can fit different structures to the data, at the cost of a more complex model. This approach is further discussed in [7].

Another problem not considered in the case study is cluster representation. The choice of representing each cluster only by its centroid can be questioned, especially for large clusters, where large means high volume. Alternative representations should be considered.

Acknowledgement

H. Kile's work was funded by the research project "Integration of methods and tools for security of electricity supply analysis" at SINTEF Energy AS, Trondheim, Norway.

References

[1] Kjølle G, Gjerde O. Integrated approach for security of electricity supply analysis. Int J Syst Assur Eng Manage 2010;1(2):163–9.

[2] Billinton R, Allan RN. Reliability evaluation of power systems. 2nd ed. New York: Plenum; 1996.

[3] Wolfgang O, Haugstad A, Mo B, Gjelsvik A, Wangensteen I, Doorman G. Hydro reservoir handling in Norway before and after deregulation. Energy 2009;34(10):1642–51.

[4] Singh C, Luo X, Kim H. Power system adequacy and security calculations using Monte Carlo simulation incorporating intelligent system methodology. In: Proc 9th int conf on probabilistic methods applied to power systems (PMAPS), Stockholm, Sweden; 2006. p. 1–9.

[5] Luo X, Singh C, Patton A. Power system reliability evaluation using self organizing map. In: Proc IEEE power eng soc winter meeting, vol. 2; 2000. p. 1103–8.

[6] Jota PR, Silva VR, Jota FG. Building load management using cluster and statistical analyses. Int J Electr Power Energy Syst 2011;33(8):1498–505.

[7] Kile H, Uhlen K. Averaging operating states with infinite mixtures in reliability analysis of transmission networks. In: Proc 12th int conf on probabilistic methods applied to power systems (PMAPS), Istanbul, Turkey; 2012. p. 670–5.

[8] Verma K, Niazi K. Supervised learning approach to online contingency screening and ranking in power systems. Int J Electr Power Energy Syst 2012;38(1):97–104.

[9] da Silva AML, de Resende LC, da Fonseca Manso LA, Miranda V. Composite reliability assessment based on Monte Carlo simulation and artificial neural networks. IEEE Trans Power Syst 2007;22(3):1202–9.

[10] Kim H, Singh C. Power system probabilistic security assessment using Bayes classifier. Electr Power Syst Res 2005;74(1):157–65.

[11] Pindoriya NM, Jirutitijaroen P, Singh C. Composite reliability evaluation using Monte Carlo simulation and least squares support vector classifier. IEEE Trans Power Syst 2011;26(4):2483–90.

[12] Xia C, Wang J, McMenemy K. Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks. Int J Electr Power Energy Syst 2010;32(7):743–50.

[13] Karami A, Mohammadi M. Radial basis function neural network for power system load-flow. Int J Electr Power Energy Syst 2008;30(1):60–6.

[14] Gröwe-Kuska N, Heitsch H, Römisch W. Scenario reduction and scenario tree construction for power management problems. In: Proc IEEE Power Tech conf, Bologna, Italy; 2003. p. 1–7.

[15] Li XB. Data reduction via adaptive sampling. Commun Inform Syst 2002;2(1):53–68.

[16] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. 2nd ed. Springer-Verlag; 2008.

[17] Xu R, Wunsch D. Clustering. Oxford: Wiley; 2009.

[18] Jain A, Duin R, Mao J. Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 2000;22(1):4–37.

[19] Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set with the gap statistic. J Roy Statist Soc Ser B 2001;63(2):411–23.

[20] Helseth A, Warland G, Mo B. Long-term hydro-thermal scheduling including network constraints. In: Proc 7th int conf on the European energy market (EEM), Madrid, Spain; 2010. p. 1–6.

[21] Samdal K, Kjølle G, Gjerde O, Heggset J, Holen A. Requirement specification for reliability analysis in meshed power networks. Tech rep A-6429. SINTEF Energy AS, Trondheim, Norway; 2006.

[22] Teh YW, Jordan MI, Beal MJ, Blei DM. Hierarchical Dirichlet processes. J Am Statist Assoc 2006;101(476):1566–81.