
Meteorological Data Analysis Using Self-Organizing Maps

Tatiana Tambouratzis,¹,∗ George Tambouratzis²,†

¹ Department of Industrial Management & Technology, University of Piraeus, 107 Deligiorgi St., Piraeus 185 34, Greece
² Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, Paradissos Amaroussiou 151 25, Athens, Greece

A data analysis task is described, which is focused on the clustering of high-dimensional meteorological data collected long term (more than 43 years) at 128 weather stations in Greece. The proposed hybrid method combines (a) the assignment of the stations to two-dimensional grids of nodes via self-organizing maps (SOMs) of various sizes and (b) statistical clustering of the SOM nodes. The areas resulting from clustering have well-defined meteorological profiles; they are also described by distinct combinations of morphological and geographical characteristics, indicating that morphology and geographical location largely affect the meteorological measurements. The most salient data parameters per area as well as over the entire map are determined, whereby the parameters and parameter ranges that shape the various meteorological profiles are exposed. The classification of stations with missing and noise-contaminated meteorological measurements into their expected areas demonstrates the prediction capability and robustness of the proposed hybrid method. © 2008 Wiley Periodicals, Inc.

1. INTRODUCTION

The analysis and clustering of high-dimensional data (i.e., data characterized by a large number of parameters) aim both at exposing the natural groups that exist in the dataset and at extracting the salient information that is inherent in the data. Analysis and clustering are affected by such factors as

• Data parameter selection. Different parameter sets may generate distinct classification results. On the one hand, the repetition of parameters or the occurrence of highly dependent parameters in the parameter set may increase their saliency disproportionally over that of the other parameters in the parameter set. On the other hand, the elimination of repeated or dependent parameters may distort the original parameter set and produce counterintuitive clustering and classification results.

• Uniform normalization. Depending on the distribution of the parameter values, uniform normalization of all parameters may place more emphasis on parameters with smaller ranges over those with larger ranges. In such cases, the distance relations between unscaled and normalized data may be significantly modified,ᵃ whereby considerably different analysis and clustering results are obtained when working with the unscaled and normalized datasets.

This paper is dedicated to the memory of our beloved father Dr. Professor Demetrius G. Tambouratzis, who lost the fight against Amyotrophic Lateral Sclerosis (Lou Gehrig’s Disease) (ALS) on June 14, 2004.

∗ Author to whom all correspondence should be addressed: [email protected]; [email protected].

† e-mail: giorg [email protected].

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 23, 735–759 (2008). © 2008 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/int.20294

The task studied in this research involves the analysis of (a) the original 28-dimensional and (b) the reduced 20-dimensionalᵇ meteorological data collected long term (over a period of 43 years) at 130 weather stations in Greece. A hybrid method has been employed: following the assignment of the stations to two-dimensional self-organizing maps (SOMs)¹ of various sizes and the selection of the maps most capable of preserving the topology of the dataset, statistical clustering²,³ has been employed for partitioning the SOM nodes into clusters.⁴,⁵ The hybrid method has been found effective at clustering the stations and partitioning Greece into areas such that stations in the same area have similar meteorological profiles, whereas stations classified in different areas have distinct meteorological profiles. Especially when working with the original 28-dimensional data, the areas are described by distinct morphological and geographical characteristics, thus indicating that morphology and geographical location largely affect the meteorological measurements.

The most salient data parameters for classification have been uncovered by determining the parameters whose values vary in accordance with the emergent order of the SOM; concurrently, the most salient parameters per area have been established by their ability, in terms of parameter values, to distinguish the area of interest from the other areas. The effects of parameter selection, uniform normalization, and map size have been investigated and evaluated. The successful classification of stations with missing and noise-contaminated meteorological measurements into their expected areas demonstrates the prediction capability and robustness of the proposed hybrid method.

This paper is organized as follows: the principles and properties of the SOM are presented in Section 2; the data employed for the analysis task are described in Section 3; the results generated during the analysis of the dataset are given in Section 4; Section 5 concludes the paper.

2. THE SELF-ORGANIZING MAP

2.1. SOM Structure

The processes of competition, self-organization, and emerging order abound in biological neural networks: competitive (winner-take-all) groups of neurons become organized during the presentation of stimuli in such a manner that neighboring neurons are sensitized to similar stimuli while increasingly remote neurons respond to increasingly different stimuli. Two of the most well-known examples of emerging order and self-organization via neuron competition are the motor and sensory homunculi.⁶ These constitute topographically mapped human bodies whose head, torso, limbs, and fingers are projected to the tiniest detail on the motor and sensory cortex; projection is not proportional to the actual size of the body part but to the precision with which it must be controlled. Stimulation of a given body part is transferred as activation of the corresponding part of the sensory homunculus, whereas activation of a given part of the motor homunculus is transformed into motion of the corresponding body part.

ᵃ Distance modification becomes especially apparent for high-dimensional data, where it is unlikely that the ranges of all parameters are comparable.

ᵇ The reduced dataset has been created after the elimination of the highly dependent parameters.

Inspired by such biological neural networks, the SOM¹ consists of a set of nodes organized into a regular (one- or, most frequently, two-dimensional) structure. The SOM self-organization process is based on an unsupervised adaptation law that generates a global ordering of the input patterns through competition between the SOM nodes. Prior to training, the codebook vectors (weights) of the nodes are initialized either randomly or with predefined values that introduce a partial order in the map. During training, each input pattern is normalized and, subsequently, presented to the SOM. The winner node (the node whose codebook vector best matches the input pattern) together with its neighboring nodes are subjected to the following adaptation rule:

\[
m_i(t+1) =
\begin{cases}
m_i(t) + a(t)\,[x(t) - m_i(t)], & \forall i \in N_c(t) \\
m_i(t), & \forall i \notin N_c(t)
\end{cases}
\tag{1}
\]

where m_i(t) is the codebook vector of node i at time t, a(t) the gain factor at time t, x(t) the input pattern at time t, and N_c(t) a function denoting the nodes that belong to the neighborhood of the winner node at time t. Both N_c(t) and a(t) decrease as training progresses, in effect dividing the SOM training into two phases:

• Rough training. A gradual ordering of the nodes is achieved, with each neighborhood comprising several nodes.

• Fine-tuning. The codebook vectors become fine-tuned to their optimal values, with each neighborhood comprising a very small number of nodes.

The shrinking neighborhoods promote emergent order, with neighboring nodes learning to respond alike to each input pattern and increasingly distant nodes learning to respond to progressively dissimilar input patterns. In fact, following a sufficient amount of training, the codebook vectors of neighboring nodes are similar, whereas those of increasingly distant nodes are gradually more dissimilar.
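As a minimal sketch of a single application of adaptation rule (1), the following Python fragment moves the winner node and its neighbors toward one input pattern. The function names, the tiny 1 × 3 map, and the use of a Euclidean grid distance with a fixed radius to define N_c(t) are assumptions made for illustration only; they are not taken from the paper, which employs hexagonal neighborhoods.

```python
import math

def winner_node(codebook, x):
    """Return the index of the node whose codebook vector is closest to x."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))

def update_step(codebook, positions, x, gain, radius):
    """Apply rule (1) once: move the winner and its neighbors toward x."""
    c = winner_node(codebook, x)
    for i, m in enumerate(codebook):
        # Assumed neighborhood N_c(t): nodes within `radius` of the winner on the grid.
        if math.dist(positions[i], positions[c]) <= radius:
            codebook[i] = [mj + gain * (xj - mj) for mj, xj in zip(m, x)]
        # Nodes outside the neighborhood keep their codebook vector unchanged.
    return c

# Hypothetical 1 x 3 map with two-dimensional codebook vectors.
positions = [(0, 0), (0, 1), (0, 2)]
codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
winner = update_step(codebook, positions, x=[0.0, 0.2], gain=0.5, radius=1.0)
```

In this toy run the first node wins, the first two nodes are pulled halfway toward the input pattern, and the third node stays where it was; shrinking `gain` and `radius` over successive calls would mimic the transition from rough training to fine-tuning.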

2.2. SOM Properties

An important property of the SOM is its topology preservation capability, which is derived from the ordering of the codebook vectors during training (Section 2.1). Topology preservation is responsible for the pattern classification property of the SOM: the map can be partitioned into classes of nodes, where each class is characterized by specific combinations of parameter values/ranges and defines a distinct subset of the dataset. Following class creation, novel input patterns are classified according to the corresponding winner node: the class to which the winner node is assigned constitutes the class to which the input pattern belongs.
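The classification of a novel input pattern through its winner node can be sketched as follows. The codebook vectors, the class labels, and the function name are hypothetical and serve only to illustrate the lookup; the pattern is assumed to be normalized in the same way as the training data.

```python
import math

def classify(codebook, node_class, x):
    """Return the class of the winner node (closest codebook vector) for pattern x."""
    winner = min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))
    return node_class[winner]

# Hypothetical trained map: four nodes already partitioned into two classes.
codebook = [[0.1, 0.1], [0.2, 0.3], [0.8, 0.7], [0.9, 0.9]]
node_class = ["white", "white", "black", "black"]
label = classify(codebook, node_class, [0.75, 0.8])
```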

Extensive studies of the SOM have resulted in various metrics and similarity criteria that express the successful formation of topology-preserving mappings.⁷,⁸ The unified distance matrix (U-matrix) constitutes the standard way of visualizing the distances between neighboring nodes and partitioning the map into classes:¹,⁹ a small value in the U-matrix denotes a small distance between neighboring codebook vectors and thus supports the placement of the corresponding nodes in the same class; by contrast, a large value in the U-matrix denotes a large distance between neighboring codebook vectors and thus suggests a class border between the corresponding nodes. Existing methodologies for class creation include

• Observation. Each class comprises nodes with sufficiently similar codebook vectors (sufficiently small U-matrix values).

• Hybrid statistical clustering.⁵ Following inspection of the U-matrix, the nodes are grouped into classes via hierarchical k-means clustering.³

• Three-step hybrid clustering.⁴ Initially, an oversized SOM is trained with the data and the resulting codebook vectors are clustered into n classes via the k-nearest neighbor method.³ Subsequently, a SOM with n nodes is trained with the data; each of the resulting n classes comprises all the patterns of the dataset that are assigned to the same node.
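The U-matrix idea described above (small values inside a class, large values at class borders) can be illustrated with a simplified sketch. It assumes a rectangular grid with four neighbors per node and stores, for each node, the mean distance to its neighbors; this is one common simplification and not necessarily the variant used in the paper, whose maps have hexagonal neighborhoods.

```python
import math

def u_matrix(grid):
    """grid[r][c] is a codebook vector; return each node's mean distance to its grid neighbors."""
    rows, cols = len(grid), len(grid[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(grid[r][c], grid[rr][cc])
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # A single-node map has no neighbors; report 0.0 in that case.
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

# Hypothetical 1 x 4 map: two similar nodes, a gap, then two similar nodes.
grid = [[[0.0], [0.1], [0.9], [1.0]]]
u = u_matrix(grid)
```

The two middle nodes obtain the largest values, which is where a class border would be drawn by observation.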

Another appealing (though largely unexplored) property of the SOM is its ability to indicate the important parameters of the dataset. This property has been investigated here, resulting in

a. The most salient parameters for shaping the SOM. A parameter that is ordered in the trained SOM is assumed to influence the emergent order and self-organization of the map, a fact that is indicative of its saliency.¹ By contrast, a parameter that is not ordered in the trained SOM points toward a lack of significance in shaping the map.

b. The parameters that characterize the different clusters. Parameters whose values, for a given class, are distinct from the corresponding values for other classes are assumed to be important for classification, especially for pinpointing the differences between classes.

3. THE METEOROLOGICAL DATA

The National Meteorological Service (EMY) of Greece maintains a network of 130 stations covering the Greek territory (shown in Figure 1). The objective of this network is to study the weather patterns of different locations and provide the foundation for weather forecasts. The released data,¹⁰ which has been employed for the present analysis and clustering task, includes the 28 parameters described in Table I; each parameter comprises a single value equaling the numeric average over 43 years of collection (from 1955 to 1997). Owing to extensive averaging (over the measuring interval to either daily, monthly, or yearly averages and, subsequently, to the pooled averages over the 43 years), the dataset is assumed to be practically noise-free. Two of the 130 stations (stations 679 and 739 in Figure 1) have some (two and four, respectively) of their parameter values missing. Together with the novel data that have been generated with missing and noise-contaminated parameter values, these stations have been retained for testing the SOM; the data from the other 128 stations has been employed exclusively for training.

Figure 1. Geographical map of Greece with the locations of the 130 EMY stations; the circles demarcate the two stations with missing parameter values.

Parameter selection has been performed in such a manner as to

• eliminate highly correlated (i.e., repeated or dependent) parameters of the original parameter set that collectively describe the same natural phenomenon. Elimination reverses the apparent increase in saliency of such natural phenomena over others that are described by a single parameter;

• unless highly correlated with other parameters, retain at least one parameter from each natural phenomenon. Meteorological parameter preservation minimizes the distortion of the original dataset;

• preserve the distance relations of the original dataset.

Table I. The 28 data parameters measured at the 130 EMY stations, accompanied by their ranges prior to normalization. The rows highlighted in the original (marked here with an asterisk) denote the repeated/dependent parameters that have been eliminated in the creation of the reduced dataset.

| No. | Meteorological parameter | Range |
|---|---|---|
| 1 | Yearly mean(ambient temperature) (°C) | 8.6 |
| 2* | Yearly average of daily max(ambient temperature) (°C) | 7.4 |
| 3* | Yearly average of daily min(ambient temperature) (°C) | 13.1 |
| 4 | Yearly max(ambient temperature) (°C) | 13 |
| 5* | Yearly min(ambient temperature) (°C) | 24.6 |
| 6 | Yearly average of monthly max(ambient temperature) (°C) | 7 |
| 7* | Yearly average of monthly min(ambient temperature) (°C) | 14.7 |
| 8 | Yearly mean(relative humidity) (%) | 21 |
| 9 | Yearly mean(cloud cover) (1/8th) | 2.6 |
| 10 | Yearly mean(precipitation) (mm) | 1520.2 |
| 11 | Yearly max(precipitation) (mm) | 251.8 |
| 12 | Yearly mean(wind speed) (km/h) | 13.2 |
| 13* | Yearly number of days with cloud cover in [0,1.5]/8ths | 179.3 |
| 14 | Yearly number of days with cloud cover in [1.6,6.4]/8ths | 139.1 |
| 15 | Yearly number of days with cloud cover in [6.5,8]/8ths | 102.5 |
| 16 | Yearly number of days with showers | 107 |
| 17* | Yearly number of days with rain | 90.4 |
| 18 | Yearly number of days with snow | 27.1 |
| 19 | Yearly number of days with thunderstorms | 56.7 |
| 20 | Yearly number of days with hail | 8.4 |
| 21 | Yearly number of days with snow-covered ground | 36.7 |
| 22 | Yearly number of days with fog | 50.1 |
| 23 | Yearly number of days with dew | 137.8 |
| 24 | Yearly number of days with rime | 84.7 |
| 25 | Yearly number of days with partial ground frost (min(ambient temperature) ≤ 0) | 117.4 |
| 26* | Yearly number of days with total ground frost (max(ambient temperature) ≤ 0) | 12.2 |
| 27* | Yearly number of days with max(wind speed) ≥ 6Bf | 124.7 |
| 28 | Yearly number of days with max(wind speed) ≥ 8Bf | 26.1 |

Table II depicts the grouping of the 28 data parameters into the 12 natural phenomena measured in the original dataset. For a parameter A to be eliminated, the following must be satisfied:

a. At least one additional parameter B from the same group must have a high (exceeding 0.8) correlation coefficient (CC) with parameter A, i.e., CC(A,B) > 0.8. This criterion indicates that A and B are likely to contain the same information, i.e., that either of them may be redundant.

b. The correlation coefficients of parameters A and B with every remaining parameter C of the dataset (from the same as well as from the other groups) must be similar, i.e., CC(A,C) ≈ CC(B,C). This criterion indicates that the relation of A with the other parameters in the dataset is highly similar to that of B with the same parameters, further supporting that either of them is redundant.

c. Parameter A must describe a less general (e.g., min or max rather than mean in Table I) measurement than parameter B. This criterion ensures minimal loss of information.

Table II. Elimination of parameters carrying redundant information concerning the same natural phenomenon.

| Natural phenomenon | Parameters | Correlation coefficient exceeding 0.8 | Elimination |
|---|---|---|---|
| Temperature | 1, 2, 3, 4, 5, 6, 7 | CC(3,7) = +0.987 | 7 |
| | | CC(1,3) = +0.899 | 3 |
| | | CC(1,2) = +0.859 | 2 |
| | | CC(1,5) = +0.808 | 5 |
| Humidity | 8 | CC(8,∗) < ±0.8 | – |
| Cloud | 9, 13, 14, 15 | CC(9,13) = −0.941 | 13 |
| Precipitation | 10, 11, 16, 17 | CC(16,17) = +0.979 | 17 |
| Wind | 12, 27, 28 | CC(12,27) = +0.881 | 27 |
| Snow | 18 | CC(18,∗) < ±0.8 | – |
| Thunderstorms | 19 | CC(19,∗) < ±0.8 | – |
| Hail | 20 | CC(20,∗) < ±0.8 | – |
| Fog | 22 | CC(22,∗) < ±0.8 | – |
| Ground frost/snow | 21, 25, 26 | CC(21,26) = +0.829 | 26 |
| Dew | 23 | CC(23,∗) < ±0.8 | – |
| Rime | 24 | CC(24,∗) < ±0.8 | – |
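Criterion (a) above can be sketched as a search for pairs of parameters within one group whose correlation coefficient exceeds 0.8 in magnitude (Table II lists both positive and negative coefficients). The parameter names and station values below are made up for illustration, and criteria (b) and (c) are deliberately left out of this sketch.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equally long value lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(columns, group, threshold=0.8):
    """Return (A, B, CC) for pairs of one group whose |CC| exceeds the threshold."""
    pairs = []
    for a, b in combinations(group, 2):
        cc = pearson(columns[a], columns[b])
        if abs(cc) > threshold:
            pairs.append((a, b, cc))
    return pairs

# Made-up values for three parameters of one group at five stations.
columns = {
    "p1": [10.0, 12.0, 14.0, 16.0, 18.0],
    "p2": [11.0, 13.5, 15.0, 17.5, 19.0],  # rises almost in step with p1
    "p3": [3.0, 9.0, 1.0, 7.0, 5.0],       # only weakly related to p1
}
pairs = correlated_pairs(columns, ["p1", "p2", "p3"])
```

Only the pair (p1, p2) is reported here; which member of such a pair is actually dropped would still depend on criteria (b) and (c).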

Following this elimination procedure, a total of eight parameters have been removed (highlighted in Table I, rightmost column of Table II). As none of the parameters uniquely describing a natural phenomenon has been found to be highly correlated with other parameters, all of them have been retained in the reduced (20-dimensional) dataset.

The amount of data distortion owing to parameter selection (i.e., the extent of the modification of the distance relations between original and reduced datasets) has been examined by comparing the order of similarity of the stations in the two datasets. Two lists have been created per station registering the 128 stations sorted in increasing order according to their original and reduced distances, respectively, from the given station. These pairs of lists have been compared by calculating the number of moves necessary for shifting from the position of each station in one list (of the reduced distances) to its position in the other list (of the original distances); Table III presents the total (summed over the 128 pairs of lists, i.e., over all stations) and average number of moves (per pair of lists, i.e., per station) resulting from this comparison. Taking into account that the worst case involves around 8000 moves per list, the performed parameter reduction does not significantly alter the distance relations of the original dataset. To investigate the effect of the performed parameter selection and elimination procedure on the performance of the hybrid method, the following analysis has been carried out on both original (28-dimensional) and reduced (20-dimensional) datasets.

Table III. The total and average number of moves necessary for transforming the 128 lists ordered according to the original distances to those ordered according to the reduced distances between stations.

| Required moves | Original-reduced data |
|---|---|
| Total | 52,863 |
| Per list (average) | 413 |
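The list comparison procedure can be sketched under one explicit assumption: the "number of moves" for a station is taken to be the absolute difference between its positions in the two lists, summed over all stations in the list. The text does not spell out the exact counting rule, so this reading, the function names, and the made-up data are illustrative only. Under this reading, two completely reversed lists of 128 stations require 8192 moves, which is of the same order as the worst case of around 8000 moves per list quoted above.

```python
import math

def neighbor_order(data, station):
    """Indices of all stations sorted by increasing distance from `station`."""
    return sorted(range(len(data)), key=lambda j: math.dist(data[station], data[j]))

def moves_between(list_a, list_b):
    """Sum of position shifts needed to take every entry of list_a to its place in list_b."""
    position_in_b = {s: k for k, s in enumerate(list_b)}
    return sum(abs(k - position_in_b[s]) for k, s in enumerate(list_a))

# Made-up data: four stations, three parameters; the "reduced" data drop the last one.
original = [[0.0, 0.0, 0.0], [1.0, 0.0, 5.0], [2.0, 0.0, 0.0], [3.0, 0.0, 1.0]]
reduced = [row[:2] for row in original]
total = sum(
    moves_between(neighbor_order(reduced, s), neighbor_order(original, s))
    for s in range(len(original))
)
# Worst case under the assumed counting rule: a fully reversed list of 128 stations.
worst_case = moves_between(list(range(128)), list(reversed(range(128))))
```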

4. EXPERIMENTAL RESULTS

4.1. Uniform Normalization

Two kinds of normalization have been applied to each data parameter:

• Range normalization. The parameter is scaled linearly within the range [0,1], according to

\[
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
\tag{2}
\]

where x is the unscaled observation, max(x) the maximum of all observations, min(x) the minimum of all observations, and x′ the normalized (scaled) value of the observation.

• Variance normalization. The parameter is scaled such that its standard deviation (estimated over all observations) $\hat{\sigma}_x$ is normalized to unity and its mean (estimated over all observations) $\hat{\bar{x}}$ to zero, according to

\[
x' = \frac{x - \hat{\bar{x}}}{\hat{\sigma}_x}
\tag{3}
\]

where x is the unscaled observation and x′ the normalized (scaled) value of the observation.
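The two normalizations of Equations (2) and (3) can be sketched for a single parameter as follows. The observations are made up, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption of this sketch, since the text only states that the standard deviation is estimated over all observations.

```python
import math

def range_normalize(values):
    """Eq. (2): scale the observations linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def variance_normalize(values):
    """Eq. (3): subtract the estimated mean and divide by the estimated standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample standard deviation with n - 1 in the denominator.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / std for x in values]

# Made-up observations of one parameter at five stations.
temps = [8.0, 10.0, 12.0, 14.0, 16.0]
r = range_normalize(temps)
v = variance_normalize(temps)
```

Applying either function to every parameter independently is what makes the normalization uniform: a parameter with a range of 2.6 and one with a range of 1520.2 (Table I) end up on the same scale.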

The amount of data distortion due to normalization (i.e., the extent of the modification of the distance relations between unscaled and normalized datasets) has been examined, separately for the original and reduced datasets, via the list comparison procedure described in Section 3. For each normalization, the 128 pairs of lists (registering the 128 stations sorted in increasing order according to their unscaled and normalized distances, respectively, from each station) have been compared. Table IV shows the total and average numbers of moves for the original data, separately for each normalization; the same is shown in Table IV for the reduced data. The significant number of moves observed for both normalizations as well as for both original and reduced data is caused by the application of uniform normalization to the notably dissimilar ranges of the data parameters (rightmost column of Table I). The two normalizations are roughly equivalent in terms of data distortion, with variance being slightly superior to range normalization for the original data and range being slightly superior to variance normalization for the reduced data.

Another point of interest is how proximal the two normalizations are, separately for the original and reduced datasets. To this end, the 128 pairs of lists corresponding to the two normalizations of each dataset have been compared. Table V tabulates the total and average numbers of moves resulting from this comparison. Although the two normalizations for the original data are proximal, they are quite distant for the reduced data. The increased discrepancy observed for the reduced data is problem specific and attributed to the elimination of five, out of the total eight, parameters (parameters 2, 3, 5, 7, and 26) with small ranges and relatively smooth distribution of their parameter values.

Table IV. The total and average number of moves necessary for transforming the 128 lists ordered according to the normalized distances to those ordered according to the unscaled distances between stations; original and reduced data.

| Dataset | Required moves | Range normalization | Variance normalization |
|---|---|---|---|
| Original data | Total | 599,706 | 588,212 |
| Original data | Per list (average) | 4,685 | 4,595 |
| Reduced data | Total | 597,609 | 601,422 |
| Reduced data | Per list (average) | 4,669 | 4,699 |

4.2. Hybrid Method

4.2.1. SOM Training: Topology Preservation

Two-dimensional SOMs with hexagonal neighborhoods have been employed. The 16 tested map configurations result from the combination of

a. original/reduced datasets,
b. variance-/range-normalized data, and
c. four map sizes, namely 2 × 5, 3 × 5, 6 × 11, and 12 × 22, which generate a variety of dense and sparse projections of the data on the maps.

Random initialization of the codebook vectors of the SOM has been performed, and a bubble-type activation function has been employed during training.¹¹ The training procedure is relatively lengthy; it involves 1000 and 10,000 iterations of the entire training set for the rough-training and fine-tuning phases, respectively, promoting the ordering of the patterns on the SOM.

Table V. The total and average number of moves necessary for transforming the 128 lists ordered according to the range normalized distances to those ordered according to the variance normalized distances between stations.

| Required moves | Original data | Reduced data |
|---|---|---|
| Total | 58,098 | 239,522 |
| Per list (average) | 454 | 1,871 |



The topology preservation property of a map is expressed by the extent to which the distance relations between the normalized input patterns are preserved on the map, so that similar/dissimilar patterns of the normalized dataset are projected to nearby/distant nodes of the SOM. For a map configuration generated by a particular normalization, the 128 lists corresponding to this normalization have been employed: each station S in the list of a given station GS (registering the stations sorted in increasing order according to their normalized distances from GS) has been assigned the hexagonal nodal distance between the nodes of the SOM to which S and GS have been assigned. Ideally, the nodal distances in a list must be nondecreasing, expressing a step diagonal that begins from 0 (for the first station(s) in the list) and ends at the greatest hexagonal nodal distance between the node to which GS has been assigned and all nonempty SOM nodes. Figure 2 illustrates the actual and idealᶜ step diagonals of station 600 for two different map configurations: a configuration where actual and ideal diagonals for station 600 agree to a large extent (Figure 2a), and a map configuration where the rugged actual diagonal expresses frequent and significant discrepancies from the ideal diagonal for station 600 (Figure 2b).

A map configuration with close-to-ideal step diagonals for many stations entails topology preservation, whereas far-from-ideal diagonals for many stations reveal inferior topology preservation. The sum of absolute deviations between actual and corresponding ideal step diagonals for the 128 stations (lists) of each map configuration has been calculated. To render the results comparable for

• different map sizes (smaller maps have smaller maximum nodal distances than larger ones), and

• different nodes in the same map (nodes closer to the edges of the maps have larger maximum nodal distances than nodes near the center of the map),

the absolute deviations of each list have been scaled by the maximum nodal distance in the list. Table VI shows the results of the topology preservation evaluation for the original data. It can be seen that the 3 × 5 maps are best at preserving the topology of the dataset, with variance being slightly superior to range normalization; these are closely followed by the 12 × 22 maps, with range being slightly superior to variance normalization. The 6 × 11 maps have been found inferior in terms of topology preservation, whereas dimensionality reduction in the 2 × 5 maps has been found too intense to allow an accurate representation of the global ordering of the dataset. Table VI also shows the same evaluation for the reduced data. The effect of map size is pronounced, with larger maps demonstrating enhanced topology preservation; for all map sizes, range has been found superior to variance normalization.
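The per-list topology preservation measure described above can be sketched as follows: the ideal step diagonal is obtained by sorting the actual nodal distances in increasing order, and the summed absolute deviations are scaled by the maximum nodal distance in the list. The function name and the two example lists are hypothetical; computing the hexagonal nodal distances themselves is outside the scope of this sketch.

```python
def step_diagonal_deviation(nodal_distances):
    """Scaled sum of absolute deviations between actual and ideal step diagonals.

    nodal_distances[k] is the map distance between the node of the given
    station and the node of its k-th nearest station in the data space.
    """
    ideal = sorted(nodal_distances)  # ideal diagonal: the actual one, sorted
    deviation = sum(abs(a - b) for a, b in zip(nodal_distances, ideal))
    max_distance = max(nodal_distances)
    # Guard against a list in which every station shares the same node.
    return deviation / max_distance if max_distance > 0 else 0.0

# Hypothetical lists for one station on two different maps.
well_ordered = [0, 0, 1, 1, 2, 2, 3]
rugged = [0, 2, 1, 3, 0, 2, 1]
good = step_diagonal_deviation(well_ordered)
bad = step_diagonal_deviation(rugged)
```

Summing this quantity over the lists of all stations gives one figure per map configuration, with smaller totals indicating better topology preservation.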

Taken over the 16 map configurations, the SOMs most capable of preserving the topology of the normalized datasets are the 3 × 5 map with variance normalization for the original data, the 3 × 5 map with range normalization for the original data, the 12 × 22 map with range normalization for the reduced data, the 12 × 22 map with range normalization for the original data, and the 12 × 22 map with variance normalization for the original data.

ᶜ The ideal step diagonal has been implemented by the actual diagonal sorted in increasing order.

Figure 2. Actual diagonal (light thin line) superimposed on the ideal step diagonal (dark thick line) for station 600; (a) 3 × 5 map with variance normalization for the original data; (b) 2 × 5 map with range normalization for the original data. [Axes in both panels: horizontal, stations in ordered list (0 to 130); vertical, nodal distance (0 to 5).]

Table VI. Comparison of the topology preservation capability of the 16 map configurations; original and reduced data.

| Dataset | Normalization | 2 × 5 | 3 × 5 | 6 × 11 | 12 × 22 |
|---|---|---|---|---|---|
| Original data | Range | 3358.1 | 2602.5 | 3231.6 | 2735.6 |
| Original data | Variance | 3202.3 | 2545.9 | 3125.6 | 2802.4 |
| Reduced data | Range | 3598.1 | 3054.1 | 2832.1 | 2680.5 |
| Reduced data | Variance | 3857.0 | 3713.9 | 3321.8 | 3095.8 |

4.2.2. Statistical Clustering of the SOM Codebook Vectors

Ward’s statistical method of agglomerative hierarchical clustering (Ref. 2) has been employed for grouping the SOM nodes and exposing the classes of the dataset; the squared Euclidean distance metric has been utilized. For the 1st clustering step, each node forms a distinct cluster; at subsequent kth clustering steps (k = 2, 3, ...), the clusters situated at the smallest distance d_k from each other are grouped together, provided that

\[
\frac{d_k - d_{k-1}}{d_{k-1}} > T
\tag{4}
\]

where T is a predefined positive threshold cutoff. Since the desired number of clusters is not known a priori, (4), and by extension the value of T, provide the termination criterion of clustering. T = 0.5 has been selected, invariably resulting in three classes for the 16 map configurations.

Figure 3 shows the clustering dendrogram, the distances d_k (k = 1, 2, ..., 13), the threshold cutoff, as well as the three classes obtained when clustering the nodes of the 3 × 5 map with variance normalization for the original data. Figure 4 illustrates the four map configurations most capable of topology preservation, together with the classification of the SOM nodes into three classes (white, black, and gray areas). Figure 5 depicts the classification of the stations of Figure 1 for the four map configurations of Figure 4; each station has been annotated with a square marker inheriting its color (white, black, and gray) from the color of the area to which the corresponding SOM node has been classified.

Figure 3. Clustering dendrogram corresponding to the 3 × 5 map with variance normalization for the original data. [Leaves as listed in the figure: nodes 5, 7, 6, 2, 4, 3, 1, 8, 9, 12, 10, 13, 11, 14, 15. Distances shown: 0.510, 1.025, 1.555, 2.101, 2.838, 3.675, 4.737, 6.459, 8.576, 11.700, 15.408, 21.051, 33.578. Labels: threshold cutoff; gray area, white area, black area.]

A comparison of the classification results for the 16 map configurations reveals the effects of

• Parameter selection. Comparing homologous map configurations (i.e., SOMs with the same normalization and map size), significant differences in clustering are observed when working with the original and the reduced data. The reduced dataset produces maps that are on average less capable of topology preservation than those for the original data (Table VI). Furthermore, the clustering and classification results are not always in accordance with intuition for the reduced data: the generated areas have rugged and interpenetrating borders (e.g., Figure 5c, especially when contrasted with the smoother areas of the corresponding map configuration for the original data in Figure 5d), whose discontinuities cannot be readily explained in terms of the 12 natural phenomena (Table II).

• Uniform normalization. For the original dataset and for the same map size, minor differences are observed between range and variance normalizations: the stations are placed in the same/nearby nodes (e.g., Figures 4a and 4b) and are generally clustered into the same classes (e.g., Figures 5a and 5b). For the reduced dataset and as a result of the bigger discrepancy between the two normalizations (Table V), the differences are more extensive for maps of the same size.

• Map size. For both original and reduced data, the failure of the smaller (2 × 5) maps to accurately represent the topology of the high-dimensional datasets is due to excessive dimensionality reduction (Table VI); for larger maps, size affects both the relative position of the stations on the SOM and the relative distance of the stations allocated to neighboring nodes (U-matrix). For the original data and both normalizations, this is expressed mainly by differences in the clustering dendrograms, thus in the borders between the three (especially the black and gray) areas. For example, the area encompassing Evia, the islands of the Central Aegean Sea and Southern Kriti is allocated to the black area for the 3 × 5 map with range normalization and the gray area for the 12 × 22 map with range normalization (Figures 5b and 5d, respectively); an observation of the clustering dendrogram of the 12 × 22 map with range normalization one step prior to termination demonstrates that this area constitutes an individual cluster that is grouped with the gray, rather than the black, area. For the reduced data, progressively larger maps allow more satisfactory ordering (Table VI). However, more crucial differences are encountered in the allocation of the stations on the SOM for maps of different sizes, frequently resulting in diverse classification results.



Figure 4. Clustering of the SOM nodes into three classes/areas: (a) 3 × 5 map with variance normalization for the original data; (b) 3 × 5 map with range normalization for the original data; (c) 12 × 22 map with range normalization for the reduced data; (d) 12 × 22 map with range normalization for the original data.

The findings concerning

• the increased discrepancy between the two normalizations,
• inferior topology preservation,
• less consistent classification results for varying map sizes, and
• the generation of less continuous areas

for the reduced data draw attention to the risks involved with parameter elimination and support the prevalent decision against parameter selectionᵈ (e.g., Refs. 12–14). For the present task, the application of parameter selection combined with uniform normalization distorts the original unscaled data in such a manner as to hinder the creation of meaningful areas, a fact that is not obvious from the preservation of the distance relations between data patterns alone (Table III). By contrast, retaining the original dataset allows the formation of continuous as well as meaningful classes/areas, despite the distortion effected by normalization.

[d] Especially in SOM applications, where dimensionality reduction is inherent to the mapping.

Figure 5. Classification results: (a) 3 × 5 map with variance normalization for the original data; (b) 3 × 5 map with range normalization for the original data; (c) 12 × 22 map with range normalization for the reduced data; (d) 12 × 22 map with range normalization for the original data. The classification of stations 679 and 739 has been marked with dots.

For the remainder of the paper, the results derived from the two 3 × 5 maps for the original data (i.e., the two map configurations that have been found most capable of topology preservation) are reported. For both maps, evidence of local grouping is apparent. For instance, all stations located in Thraki have been assigned to the top left node of the two SOMs, whereas all stations situated in the islands of the Ionian sea have been assigned to the top right node of the two SOMs (Figures 4a and 4b). Meaningful grouping, in terms of geographical location, extends to larger localities, with stations spreading over two or three neighboring nodes from the same class, as is the case for the stations located in the islands of the Southern Aegean Sea. More globally, geographical location together with morphology corroborates the grouping in the SOM. An illustrative example is provided by Kriti, which is characterized by a large total area and a diverse relief comprising high mountains as well as extensive coasts. Solely according to geographical location, all stations within Kriti should have been assigned to the black area. However, stations 752 and 763 (Figure 1), which are located on high mountains exceeding 1000 m in altitude, have been assigned to the gray and white areas, respectively; in other words, they have been classified according to their morphology rather than their geographical location. Consequently, the classification performed by the SOM is based on both morphology-related and geographical criteria (Figures 5a and 5b):

a. The white area comprises the stations in Northern Greece as well as the stations in the very mountainous locations of the entire mainland and islands; Thraki, Makedonia, Ipiros, Northern Thessalia, and the mountains of Sterea, Peloponissos and Kriti have been assigned to the four neighboring nodes occupying the leftmost part of the two SOMs.

b. The black area contains the stations located in the islands (Ionian, Aegean sea) as well as most of the stations situated at coastal locations and/or locations of low altitude in the mainland of Central and Southern Greece; these stations have been assigned to the eight neighboring nodes occupying the rightmost part of the two SOMs.

c. The gray area contains the stations on the boundaries between the black and white areas, i.e., the stations with intermediate geographical location and altitude.

The observation that morphology and geographical location are consistent with both the ordering and the clustering in the two SOMs implies that these factors considerably influence the measurements of the collected meteorological parameters. Notably, the nodes corresponding to each area are immediate neighbors, with the white area occupying one end of the two SOMs, the gray area occupying their central portion and the black area occupying their other end.

4.3. Meteorological Profiles

The extraction of the most salient parameters for classification aims at enriching the morphological and geographical characteristics of the three areas of Section 4.2 with their meteorological profiles. As in the previous section, only the results concerning the two 3 × 5 maps for the original data are reported.

4.3.1. Parameters Determining Classification

The parameters that dominate the ordering in the SOM are those whose values vary in accordance with the emergent order of the SOM, i.e., those parameters for which

a. stations located in the same/neighboring nodes have identical or very similar normalized parameter values, and

b. stations located in progressively more distant nodes have gradually more dissimilar normalized parameter values.

Two criteria of parameter importance for classification (one map-dependent, the other purely distance-related) have been employed:

a. Parameter saliency. This is expressed by the extent to which the distance relations between the normalized values of a given parameter are preserved on the SOM. For a specific map configuration and a given parameter scaled by a particular normalization, 128 lists have been created, where the list of any station GS registers the 128 stations sorted in increasing order according to their distance from GS in terms of the normalized parameter only. Subsequently, the 128 actual diagonals have been created in the manner described in Section 4.2.1. The sum of absolute deviations between actual and corresponding ideal diagonals has been employed as a measure of the saliency of a given parameter in ordering the specific map configuration.

b. Distance preservation. This is expressed by the extent to which the normalized parameter alone preserves the distance relations of the normalized dataset; it has been evaluated via list comparison (Section 3), i.e., by comparing the order of similarity of the stations in terms of the normalized parameter only with the order of similarity of the stations in the normalized dataset.
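The distance-preservation criterion can be illustrated with a minimal Python sketch. The exact list-comparison measure of Section 3 is not reproduced here; as an assumption, the sketch scores a parameter by the sum of absolute rank deviations between the single-parameter ordering and the full-dataset ordering, with Euclidean distance on the full patterns. The function names `ranks_from` and `distance_preservation` and the toy data are hypothetical.

```python
import math

def ranks_from(i, distance, n):
    """Rank (0 = closest) of every station by its distance from station i."""
    # Ties are broken by station index so that the ordering is deterministic.
    order = sorted(range(n), key=lambda j: (distance(i, j), j))
    ranks = [0] * n
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks

def distance_preservation(data, p):
    """Sum of absolute rank deviations between the ordering induced by
    parameter p alone and the ordering induced by all parameters.
    Lower values mean that p preserves the distance relations better.
    (Assumed stand-in for the list comparison of Section 3.)"""
    n = len(data)

    def full(i, j):
        return math.dist(data[i], data[j])

    def single(i, j):
        return abs(data[i][p] - data[j][p])

    total = 0
    for i in range(n):
        r_full = ranks_from(i, full, n)
        r_single = ranks_from(i, single, n)
        total += sum(abs(a - b) for a, b in zip(r_full, r_single))
    return total

# Toy normalized data: 4 stations, 2 parameters (illustrative values only).
# Parameter 0 carries all the variation; parameter 1 is constant.
toy = [[0.0, 0.5], [1.0, 0.5], [3.0, 0.5], [7.0, 0.5]]
scores = [distance_preservation(toy, p) for p in range(2)]
```

In this toy case the first parameter reproduces the full ordering exactly, so its score is 0, whereas the constant second parameter receives a larger (worse) score.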

Figure 6a illustrates the saliency estimates overlaid with the distance preservation estimates for the range-normalized original data mapped on the 3 × 5 map; for ease of comparison, the maxima of the two estimates have been normalized to 1. Figure 6b shows the same results for variance-normalized original data. In addition, Table VII presents the 28 parameters sorted in decreasing order of saliency and distance preservation, separately for each normalization.

Providing additional support to the finding that the choice of normalization does not significantly affect the distance relations of the original data (Table V), the order of distance preservation has been found almost identical for the two normalizations (second and fourth columns of Table VII): a single minimal shift (4↔28) has been observed. Furthermore, the order of parameter saliency for the two 3 × 5 map configurations is highly similar (first and third columns of Table VII): two minimal shifts (12↔13 and 17↔19) have been observed concerning less salient parameters. These two findings explain the almost identical clustering and classification results obtained for the two normalizations (e.g., Figures 4a and 4b and 5a and 5b).

For both normalizations, a satisfactory agreement between saliency and distance preservation has been observed for all parameters but one. As distance preservation is purely distance related (i.e., independent of the mapping of the input patterns on the SOM), this agreement emphasizes the topology preservation of the 3 × 5 maps, which has been accomplished despite the effect of normalization. The single, but significant, discrepancy between saliency and distance preservation occurs for parameter 10. This parameter is the most capable of distance preservation, but one of the least salient for both SOMs. The discrepancy is due to the uniform normalization of the 28 parameters: the range (rightmost column of Table I) of parameter 10 is one to two orders of magnitude larger than that of the other 27 parameters, while its distribution is such that only 16 meteorological stations (12.5%) have values no less than its median. After normalization, the majority of highly compressed values occupy a very small portion of the lower half of the normalized range, whereby the difference between the normalized parameter values shrinks considerably. Hence, parameter 10 becomes insignificant for ordering the SOM, especially in relation to other parameters with smaller ranges and/or more evenly distributed values.
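The compression effect described for parameter 10 can be illustrated with a small Python sketch. It assumes the usual definitions of the two normalizations (range: rescaling to [0, 1]; variance: zero mean and unit standard deviation); the sample values are hypothetical and are not taken from Table I.

```python
import statistics

def range_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def variance_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def share_below(values, threshold):
    """Fraction of values that fall below the given threshold."""
    return sum(v < threshold for v in values) / len(values)

# Hypothetical parameters: one with a few very large values, one evenly spread.
skewed = [1, 2, 2, 3, 3, 4, 5, 400, 900, 1000]
even = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Share of stations squeezed into the lowest 1% of the normalized range.
skewed_share = share_below(range_normalize(skewed), 0.01)
even_share = share_below(range_normalize(even), 0.01)

# Spread of the seven small values after variance normalization.
vn = variance_normalize(skewed)
bulk_span_var = max(vn[:7]) - min(vn[:7])
```

For the skewed parameter, most normalized values become nearly indistinguishable under either normalization, which is consistent with such a parameter contributing little to the ordering of the SOM.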

Table VIII shows the parameters deemed as most salient for shaping the SOM.

4.3.2. Parameters Determining Class Characteristics

[Figure 6 plots. x-axis: data parameters; y-axis: sum of deviations (scaled to 1); panel (a): range normalization; panel (b): variance normalization.]

Figure 6. Parameter saliency of the 28 parameters (line with circles) overlaid with distance preservation (line with dots): (a) 3 × 5 map with range normalization; (b) 3 × 5 map with variance normalization. The more salient/distance-preserving a parameter, the lower it appears on the Y-axis.

Table VII. Parameters sorted in decreasing order of saliency and distance preservation; 3 × 5 maps for the original data.

Range normalization                  Variance normalization
Saliency   Distance preservation     Saliency   Distance preservation
26         10                        26         10
9          9                         9          9
21         26                        21         26
20         20                        20         20
24         21                        24         21
1          1                         1          1
2          2                         2          2
6          28                        6          4
18         4                         18         28
4          6                         4          6
5          18                        5          18
25         24                        25         24
3          5                         3          5
7          3                         7          3
28         12                        28         12
22         7                         22         7
13         22                        12         22
12         8                         13         8
15         25                        15         25
16         23                        16         23
27         19                        27         19
17         11                        19         11
19         17                        17         17
14         16                        14         16
11         27                        11         27
10         15                        10         15
8          14                        8          14
23         13                        23         13

Subranges of some parameters have been found to be distinct in a given area. Such subranges distinguish the given area from the other areas, whereby they can be utilized for characterizing the area and further refining its meteorological profile.

The characteristic parameters and subranges that distinguish the three areas for the two 3 × 5 maps are shown in Table VIII. Combined with the morphological and geographical characteristics of the three areas ((a–c) of Section 4.2.2), the following meteorological profiles have been generated:

a. The white area encompasses Northern Greece as well as the stations in the very mountainous locations of the entire mainland and islands (Thraki, Makedonia, Ipiros, Northern Thessalia, and the mountains of Sterea, Peloponissos and Kriti) and is characterized by low temperatures (low values of average and maximum ambient temperature), a high occurrence of windy spells, overcast skies, and a high occurrence of days with rain, snow, hail, thunderstorms, fog as well as snow-covered ground, partial and total ground frost.

b. The black area encompasses the islands (Ionian, Aegean sea) as well as most of the coastal locations and/or locations of low altitude in the mainland of Central and Southern Greece and is characterized by high temperatures (large values of average and maximum ambient temperature), strong winds and patchy/cloudy skies.

c. The gray area extends at the boundaries between black and white areas, i.e., encompasses intermediate geographical locations and altitude, and is distinguished from its two neighboring areas via high humidity and clear skies.

Table VIII. Important meteorological parameters (a) for classification; (b) for area characterization.

(a) For classification

Parameter   Description
26          Yearly number of days with total ground frost
9           Yearly mean(cloud cover)
21          Yearly number of days with snow-covered ground
20          Yearly number of days with hail
24          Yearly number of days with dew
1           Yearly mean(ambient temperature)
2           Yearly average of daily max(ambient temperature)
6           Yearly average of monthly max(ambient temperature)
18          Yearly number of days with snow
4           Yearly max(ambient temperature)

(b) For area characterization

Parameter   White area     Gray area      Black area     Characterization
1           Small values   –              Large values
2           Small values   –              Large values   Temperature
4           Small values   –              Large values
6           Small values   –              Large values
8           –              Large values   –              Humidity
12          –              –              Large values
28          Large values   –              –              Wind
9           –              Small values   Large values
13          Small values   –              –              Cloud
14          Large values   –              –
11          Large values   –              –
18          Large values   –              –              Precipitation
19          Large values   –              –              Fog
20          Large values   –              –              Hail
21          Large values   –              –              Thunderstorm
22          Large values   –              –              Snow
25          Large values   –              –              Frost
26          Large values   –              –

4.4. SOM Testing

The prediction capability of the proposed hybrid method has been investigated using novel data. Both incomplete and noise-contaminated data patterns have been generated (a) from the 128 stations employed for training the SOM, and (b) from stations 679 and 739, whose missing parameter values have been completed with the mean values of the corresponding parameter ranges (the classification of these two completed stations has been marked in Figure 5 with dots).

More specifically, the incomplete data have been created by removing N parameter values from each of the 130 stations. Twenty-eight incomplete patterns have been created for each combination of

• Station,

• N = 1, 7, 14, and 21. The N = 7, 14, and 21 missing parameters have been selected at random for the creation of each incomplete pattern; for N = 1, each of the 28 parameters has been removed once for the creation of an incomplete pattern.
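The generation of incomplete patterns described above can be sketched as follows. This is a minimal Python illustration rather than the procedure actually implemented; the function name `incomplete_patterns`, the use of `None` to mark a removed value and the seeded random generator are assumptions.

```python
import random

def incomplete_patterns(pattern, n_missing, rng):
    """Return one incomplete copy of `pattern` per parameter (28 copies for
    a 28-parameter station), each with n_missing values replaced by None."""
    n_params = len(pattern)
    patterns = []
    for k in range(n_params):
        if n_missing == 1:
            removed = {k}  # each parameter is removed exactly once
        else:
            removed = set(rng.sample(range(n_params), n_missing))  # random choice
        patterns.append([None if i in removed else v
                         for i, v in enumerate(pattern)])
    return patterns

# Hypothetical station with 28 parameter values (illustrative numbers only).
rng = random.Random(0)
station = [float(i) for i in range(28)]
test_sets = {n: incomplete_patterns(station, n, rng) for n in (1, 7, 14, 21)}
```

With 130 stations, four values of N and 28 patterns per combination, this scheme yields 130 × 4 × 28 = 14,560 incomplete test patterns.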

Furthermore, the noise-contaminated data have been created by injecting white noise of a given signal-to-noise ratio (SNR) into N parameter values of each of the 130 stations. Twenty-eight noise-contaminated patterns have been created for each combination of

• Station,

• SNR = 2, 5, 10, and 20,

• N = 1, 7, 14, 21, and 28. The N = 7, 14, and 21 parameters have been selected at random for the creation of each noise-contaminated pattern. For N = 1 and a given SNR level, each of the 28 parameters has been subjected to noise contamination once for the creation of a noise-contaminated pattern; for N = 28 and a given SNR level, the 28 patterns have been created by injecting white noise simultaneously into all parameters.
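A possible way of generating such noise-contaminated patterns is sketched below in Python. The definition of the SNR is not given in this section; purely as an assumption, the sketch treats the SNR as a linear power ratio applied per value, so that Gaussian noise with standard deviation |x|/√SNR is added to a value x. The function names are hypothetical.

```python
import math
import random

def contaminate(pattern, indices, snr, rng):
    """Add zero-mean Gaussian noise to the selected parameter values.
    Assumption: SNR = (signal power) / (noise power), taken per value."""
    noisy = list(pattern)
    for i in indices:
        sigma = abs(pattern[i]) / math.sqrt(snr)
        noisy[i] = pattern[i] + rng.gauss(0.0, sigma)
    return noisy

def noisy_patterns(pattern, n_noisy, snr, rng):
    """Return one noisy copy of `pattern` per parameter (28 copies for a
    28-parameter station), each with n_noisy contaminated values."""
    n_params = len(pattern)
    patterns = []
    for k in range(n_params):
        if n_noisy == 1:
            indices = [k]                      # each parameter once
        elif n_noisy == n_params:
            indices = list(range(n_params))    # all parameters at once
        else:
            indices = rng.sample(range(n_params), n_noisy)  # random choice
        patterns.append(contaminate(pattern, indices, snr, rng))
    return patterns

# Hypothetical station with 28 non-zero parameter values.
rng = random.Random(0)
station = [1.0 + i for i in range(28)]
noisy_sets = {(snr, n): noisy_patterns(station, n, snr, rng)
              for snr in (2, 5, 10, 20) for n in (1, 7, 14, 21, 28)}
```

Under this assumption, lower SNR levels correspond to larger perturbations, in line with the degradation reported for decreasing SNR.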

The prediction accuracy of the hybrid method has been expressed via

• Node accuracy, i.e., the proportion of responses for which the winner node of the original station coincides with the winner node of the corresponding novel (incomplete/noise-contaminated) test pattern.

• Class accuracy, i.e., the proportion of responses for which the winner node of the original station and the winner node of the corresponding novel test pattern are assigned to the same area.

As in the previous sections, the results derived from the two 3 × 5 maps for the original data are reported.

The results concerning test patterns with missing parameter values are illustrated in Figure 7. The gradual degradation due to increasing values of N is obvious and comparable for the two normalizations. As expected, class accuracy is more robust than node accuracy throughout: class accuracy remains satisfactory (exceeding 80%) for all tested N values, whereas node accuracy remains satisfactory for up to seven missing parameter values.
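The two accuracy measures can be computed as in the following Python sketch. It assumes that the winner node is the node with the smallest squared Euclidean distance to the pattern and that missing values (marked `None`) are simply skipped in the distance; the codebook, the node-to-area assignment and the patterns are toy values, and the function names are hypothetical.

```python
def winner(codebook, pattern):
    """Index of the node whose weight vector is closest to `pattern`.
    Assumption: components marked None are ignored in the distance."""
    def squared_distance(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, pattern)
                   if x is not None)
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))

def accuracies(codebook, node_class, originals, tests):
    """Node and class accuracy; tests[i] holds the test patterns derived
    from originals[i]."""
    node_hits = class_hits = total = 0
    for original, derived in zip(originals, tests):
        reference = winner(codebook, original)
        for pattern in derived:
            w = winner(codebook, pattern)
            node_hits += (w == reference)
            class_hits += (node_class[w] == node_class[reference])
            total += 1
    return node_hits / total, class_hits / total

# Toy map with three nodes and two areas (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
node_class = ["white", "white", "black"]
originals = [[0.1, 0.1]]
tests = [[[0.2, None], [0.9, 0.8], [4.0, 4.0]]]
node_acc, class_acc = accuracies(codebook, node_class, originals, tests)
```

Since a test pattern mapped to the same node as its original station is necessarily assigned to the same area, class accuracy can never fall below node accuracy under these definitions.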

[Figure 7 plot. x-axis: removed parameters; y-axis: accuracy (%); legend: Node (range norm.), Node (variance norm.), Class (range norm.), Class (variance norm.).]

Figure 7. Node and class prediction accuracy of the two 3 × 5 map configurations for 1, 7, 14 and 21 missing parameter values.

Figure 8 shows the results concerning test patterns with noise-contaminated parameter values. Again, the two normalizations generate comparable results for both node (Figure 8a) and class (Figure 8b) prediction accuracy, and class accuracy is superior to node accuracy throughout. The gradual degradation due to decreasing SNR levels as well as to increasing N values is obvious. Concerning the amount of white noise injected into the patterns, for SNR levels down to 10 both node and class accuracy are satisfactory for all tested N values; node accuracy remains satisfactory when up to seven parameters are injected with noise for SNR = 5 and when no more than one parameter is injected with noise for SNR = 2; by contrast, class accuracy remains satisfactory when up to 21 parameters are injected with noise for SNR = 5 and when up to 14 parameters are injected with noise for SNR = 2. Concerning the effect of N, a single noise-contaminated parameter allows satisfactory node accuracy for all tested SNR levels; the same is true of up to 14 noise-contaminated parameters for class accuracy; for larger N values, performance degrades proportionally. Figure 9 highlights the hardest test concerning the robustness of the proposed classification approach, namely the injection of noise of varying SNR levels simultaneously into all 28 parameters. Although the hybrid method performs in a satisfactory manner, in terms of class prediction accuracy, for SNR levels down to 5, node prediction accuracy degrades for SNR levels below 10. However, even for the lowest SNR level of 2, the class accuracy is no less than 74% for range normalization and 67% for variance normalization.

Parameter removal has been found to be an easier classification task than noise contamination. Both class and node prediction accuracy support the validity of the mapping of the dataset on the SOMs as well as of the generated classification, indicating that the hybrid method constitutes a robust clustering/classification methodology.



[Figure 8 plots. x-axis: noise-contaminated parameters; y-axis: (a) node accuracy (%), (b) class accuracy (%); curves for SNR = 20, 10, 5, and 2.]

Figure 8. Prediction accuracy of the two 3 × 5 map configurations for noise of varying levels injected into 1, 7, 14, 21, and 28 parameters: (a) node accuracy; (b) class accuracy.



[Figure 9 plot. x-axis: SNR level; y-axis: accuracy (%); curves for node and class accuracy.]

Figure 9. Node and class prediction accuracy of the two 3 × 5 map configurations for noise injected into all 28 parameters; varying SNR levels.

5. CONCLUSIONS

A hybrid method has been proposed for the analysis and clustering of high-dimensional meteorological data collected long-term at 128 weather stations in Greece. Initially, uniform normalization has been applied to the data. Following the assignment of the data to two-dimensional SOMs of various sizes and the selection of the maps most capable of preserving the topology of the dataset, statistical clustering has been employed for partitioning the SOM nodes. The generated classes partition Greece into areas such that stations in the same area have similar meteorological profiles, whereas stations classified in different areas have distinct meteorological profiles. These areas are also described by distinct morphological and geographical characteristics, thus indicating that morphology and geographical location largely affect the meteorological measurements. In summary:

a. The area encompassing Northern Greece as well as the very mountainous locations of the entire mainland and islands is characterized by low temperatures, a high occurrence of windy spells, overcast skies, and a high occurrence of days with rain, snow, hail, thunderstorms, fog as well as snow-covered ground, partial and total ground frost.

b. The area encompassing the islands as well as most of the coastal locations and/or locations of low altitude in the mainland of Central and Southern Greece is characterized by high temperatures, strong winds, and patchy/cloudy skies.

c. The area extending at the boundaries between the aforementioned two classes encompasses intermediate geographical locations and altitude, and is characterized by high humidity and clear skies.

The most salient data parameters for classification have been uncovered by determining the parameters whose values vary in accordance with the emergent order of the SOM; concurrently, the most salient parameters per class have been established by their ability, in terms of parameter values, to distinguish the class of interest from the other classes. The accurate classification of novel stations into their expected areas and the gradual degradation in performance observed for increasing noise levels as well as for larger numbers of missing/noise-contaminated parameters demonstrate the prediction capability and robustness of the proposed hybrid method.

It is worth mentioning that, for the present application, the combination of uniform normalization and parameter selection (i.e., the elimination of highly correlated parameters collectively describing the same natural phenomenon) has been found to distort the original dataset in such a manner as to hinder the creation of meaningful classes/areas. This finding draws attention to the risks involved with parameter elimination and supports the prevalent decision against parameter selection, especially in SOM applications where dimensionality reduction is inherent to the mapping.

References

1. Kohonen T. Self-organising maps, 2nd edition. Berlin: Springer-Verlag; 1997.

2. Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc 1963;58:236–244.

3. Duda RO, Hart PE, Stork DG. Pattern classification, 2nd edition. New York: Wiley-Interscience; 2000.

4. Waldemark J. An automated procedure for cluster analysis of multivariate satellite data. Int J Neural Syst 1997;8:3–15.

5. Vesanto J, Alhoniemi E. Clustering of the self-organising map. IEEE Trans Neural Netw 2000;11:586–600.

6. Geschwind N. Specializations of the human brain. In: The Brain: A Scientific American Book. New York: WH Freeman; 1979. pp 108–119.

7. Bauer H-U, Pawelzik KR. Quantifying the neighborhood preservation of self-organizing feature maps. IEEE Trans Neural Netw 1992;3:570–579.

8. Bauer H-U, Herrmann M, Villmann T. Neural maps and topographic vector quantization. Neural Netw 1999;12:659–676.

9. Ultsch A. Self-organized feature maps for monitoring and knowledge acquisition of a chemical process. In: Gielen S, Kappen B, editors. Proceedings of the ICANN-93 Conference, Amsterdam, September 13–16, 1993. Berlin: Springer-Verlag; 1993. pp 864–867.

10. Kornaros G. Climatic data of the stations of the Hellenic National Meteorological Service: Period 1955–1997, Vols 1 and 2. Athens, Greece: National Meteorological Service (EMY); 1999 (in Greek).

11. Vesanto J, Himberg J, Alhoniemi E, Parhankangas J. SOM Toolbox for Matlab 5. Report A57. SOM Toolbox Team, Helsinki University of Technology, Finland; 2000. Available at http://www.cis.hut.fi/projects/somtoolbox

12. Kaski S, Kohonen T. Exploratory data analysis by the self-organising map: Structures of welfare and poverty in the world. In: Proceedings of the Third International Conference on Neural Networks in the Capital Markets, London, October 11–13, 1996. pp 498–507.

13. Kohonen T, Kaski S, Lagus K, Salojarvi J, Honkela J, Paatero V, Saarela A. Self-organisation of a massive document collection. IEEE Trans Neural Netw 2000;11:574–585.

14. Nikkila J, Toronen P, Kaski S, Venna J, Castren E, Wong G. Analysis and visualization of gene expression data using self-organizing maps. Neural Netw 2002;15:953–966.

