spatial data mining cs 697

Spatial Data Mining, CS 697. Assignment 1, February 16, 2010. Pradnya Khutafale, Peter Lucas, and Chris Maio. Advisor: Dr. Wei Ding, Computer Science Department, UMass Boston.

Upload: wade-mcfarland

Post on 02-Jan-2016

TRANSCRIPT

Page 1: Spatial Data Mining CS 697

Spatial Data Mining, CS 697

Assignment 1, February 16, 2010

Pradnya Khutafale, Peter Lucas, and Chris Maio

Advisor: Dr. Wei Ding, Computer Science Department, UMass Boston

Page 2: Spatial Data Mining CS 697

Discovery of Climate Indices using Clustering

Principal Investigators: Vipin Kumar (University of Minnesota), Michael Steinbach (University of Minnesota)

Collaborators: Steven Klooster (Cal. State Univ., Monterey Bay), Christopher Potter (NASA Ames Research Center), Pang-Ning Tan (Michigan State University)

Page 3: Spatial Data Mining CS 697

Researchers

Department of Computer Science and Engineering: Michael Steinbach, Pang-Ning Tan, Vipin Kumar

Leading educators in the field of spatial data mining

Investigating the use of data mining techniques to find interesting spatio-temporal patterns in Earth Science data

Regarded as leaders in the field of climate index identification and data mining research

Page 4: Spatial Data Mining CS 697

Researchers

NASA Ames Research Center team members: Chris Potter, Steven Klooster

Working on cutting-edge computer science methods and technologies to be used for finding solutions to complex environmental problems

Page 5: Spatial Data Mining CS 697

Presentation Outline

Background (Chris): Climate Change; Earth System Linkages

Earth Science Data and Climate Indices (Chris)

Existing Eigenvalue Techniques and Limitations (Pete)

New Clustering Based Methodology (Pete)

Results and Comparisons (Pradnya)

Conclusions and Future Research (Pradnya and Pete)

Page 6: Spatial Data Mining CS 697

Presentation Outline

Background: Climate Change; Earth System Linkages

Earth Science Data and Climate Indices

Existing Eigenvalue Techniques and Limitations

New Clustering Based Methodology

Results and Comparisons

Conclusions and Future Research

Page 7: Spatial Data Mining CS 697

Background

Climate Change

IPCC Predictions

Rise in global temperatures

Extinctions of plants and animals

Sea-level rise

Page 8: Spatial Data Mining CS 697

Background

Climate Change Impacts

Climate change leads to significant changes in rainfall and soil moisture (drought and flood)

Agricultural activities (crop growth cycle) and world food supplies are greatly affected by climatic factors (desertification)

Climate change increases the frequency, intensity, and distribution of natural hazards, such as hurricanes and other storms

Page 9: Spatial Data Mining CS 697

Background

Earth System Linkages

Ocean, atmosphere, and land processes are highly coupled

Climate phenomena in one location can affect the climate at a faraway location; these links are known as climate teleconnections

Understanding climate “teleconnections” is key to knowing and predicting ecosystem response to climate change

Page 10: Spatial Data Mining CS 697

Presentation Outline

Background: Climate Change; Earth System Linkages

Earth Science Data and Climate Indices

Existing Eigenvalue Techniques and Limitations

New Clustering Based Methodology

Results and Comparisons

Conclusions and Future Research

Page 11: Spatial Data Mining CS 697

Earth Science Data

Time Series Data

Sea Surface Temperature (SST)

Sea Level Pressure (SLP)

Page 12: Spatial Data Mining CS 697

Earth Science Data

Data Acquisition

Thousands of floats, buoys, and other remote sensing devices throughout the oceans collect enormous amounts of oceanographic data, which are periodically transmitted to shore via satellite (Naval Research Laboratory).

Page 13: Spatial Data Mining CS 697

Earth Science Data

Preprocessing Required

The spatial and temporal nature of the data poses a number of challenges:

Noisy

Cycles of varying lengths and regularity

Strong seasonal component

Displays long-term trends

Displays temporal and spatial autocorrelation
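The slide does not say how the seasonal component is removed. One common choice, assumed here purely for illustration, is a monthly z-score: each value is standardized against all values from the same calendar month. The function name `monthly_zscore` and the synthetic series are hypothetical.

```python
import numpy as np

def monthly_zscore(series, period=12):
    """Remove the seasonal cycle from a monthly series (illustrative sketch).

    Each value is standardized against the mean and standard deviation
    of all values that fall in the same calendar month.
    """
    x = np.asarray(series, dtype=float)
    out = np.empty_like(x)
    for m in range(period):
        vals = x[m::period]
        std = vals.std()
        # Guard against a constant month (std == 0) to avoid division by zero.
        out[m::period] = (vals - vals.mean()) / std if std > 0 else 0.0
    return out

# Synthetic stand-in for an SST series: 10 years of a seasonal cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(120)
raw = 10.0 * np.sin(2 * np.pi * months / 12) + rng.normal(0.0, 1.0, 120)
anomalies = monthly_zscore(raw)
```

After this step every calendar month has zero mean and unit variance, so the annual cycle no longer dominates correlations between locations. Long-term trends and autocorrelation would need separate treatment.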

Page 14: Spatial Data Mining CS 697

Climate Indices

Climate indices are time series that summarize the physical behavior of different regions of the ocean and atmosphere

They distill climate variability at a regional or global scale into a single, manageable time series

Usually based on sea level pressure and sea surface temperature

Past methods of identification were painstakingly slow and tedious

Page 15: Spatial Data Mining CS 697

Climate Indices

Climate Index: Nino 1+2

Page 16: Spatial Data Mining CS 697


Page 17: Spatial Data Mining CS 697

Climate Indices

El Nino Correlations

SST of El Nino correlated indices

Page 18: Spatial Data Mining CS 697

Climate Indices

Detection of Climate Indices

Earth scientists have devoted a significant amount of time to discovering climate indices

Traditional approaches include direct observation of climate phenomena (El Nino)

Use of linear algebra techniques, including eigenvalue analysis

Page 19: Spatial Data Mining CS 697

Climate Indices

Eigenvalue Analysis

Driven by the massive amount of data obtained from satellites and remote sensing devices

Provides a way to quickly and automatically detect patterns in large amounts of data

Figure: Jason-2 IR satellite image

Page 20: Spatial Data Mining CS 697

Climate Indices

Eigenvalue Analysis

Eigenvalue techniques include:

Principal Component Analysis (PCA)

Singular Value Decomposition (SVD)

Limitations of eigenvalue analysis:

Weaker signals may be masked by stronger signals

All discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them

Page 21: Spatial Data Mining CS 697

Climate Indices

Alternative Clustering Methodology

Uses data mining techniques and the enormous amount of remote sensing data to find climate indices

The analysis yields clusters that represent ocean regions with relatively homogeneous behavior

The centroids of these clusters summarize the behavior of each region

Finding “meaningful” clusters will enable Earth scientists to better predict changes in the climate system
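As a minimal sketch of how a centroid summarizes a region, the following assumes each ocean grid point has a time series and a cluster label, and takes the centroid to be the mean time series of the cluster's members. The function name `cluster_centroids` and the convention that label -1 marks unclustered points are assumptions, not taken from the slides.

```python
import numpy as np

def cluster_centroids(series, labels):
    """Return {cluster label: mean time series of its members}.

    series: array of shape (n_points, n_times), one row per ocean grid point.
    labels: array of shape (n_points,); -1 is assumed to mean "not in any cluster".
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    return {
        int(lab): series[labels == lab].mean(axis=0)
        for lab in np.unique(labels)
        if lab >= 0
    }

# Tiny hypothetical example: four grid points, three time steps.
series = [[1, 2, 3], [3, 4, 5], [10, 10, 10], [0, 0, 0]]
labels = [0, 0, 1, -1]
centroids = cluster_centroids(series, labels)
```

Each centroid is itself a time series, so it can be compared directly with a known climate index.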

Page 22: Spatial Data Mining CS 697

Climate Indices

Benefits of Clustering

Discovered signals do not need to be orthogonal or statistically independent of one another

Signals are more easily interpreted

Weaker signals are more readily detected

It provides an efficient way to determine the influence of one large set of points (all ocean points) on another large set of points (all land points)

Page 23: Spatial Data Mining CS 697

Climate Indices

Results of Clustering Methodology

Candidate indices that are highly correlated with known indices represent a rediscovery of well-known indices and a validation of the method

Variants of well-known indices may be better predictors of land behavior for some regions of land

Cluster centroids that have medium or low correlation with known indices may represent new Earth science phenomena

Page 24: Spatial Data Mining CS 697

Presentation Outline

Background: Climate Change; Earth System Linkages

Earth Science Data and Climate Indices

Existing Eigenvalue Techniques and Limitations

New Clustering Based Methodology

Results and Comparisons

Conclusions and Future Research

Page 25: Spatial Data Mining CS 697

Eigenvalue Techniques

Finding Spatial or Temporal Patterns using SVD Analysis

SVD: Singular Value Decomposition

Earth scientists have typically used SVD analysis to identify climate indices

Goal: to find a new set of attributes that better describes the variability in the data, through dimensionality reduction

Its operation can be thought of as revealing the internal structure of the data in the way that best explains the variance in the data

Figure: Karl Pearson, statistician, 1857-1936

Page 26: Spatial Data Mining CS 697

Eigenvalue Techniques

Overview of SVD Analysis

These techniques are applied to a data set in the form of a data matrix (m by n)

m rows (objects)

n columns (attributes)

Data matrix: a variation of record data in which all attributes are numeric

Figure: example of a data matrix

Page 27: Spatial Data Mining CS 697

Eigenvalue Techniques

Overview of SVD Analysis

Assume the data objects in a matrix all have the same fixed set of attributes

Each data object can be thought of as a point, or vector, in multidimensional space

Each spatial dimension represents a distinct attribute describing the object

Page 28: Spatial Data Mining CS 697

Simple Example of SVD Analysis

Using the web alone, it is hard to find an intuitive explanation of SVD

Again, SVD is a way to expose the underlying structure of a matrix

Simple example using golf: 3 golfers play 9 holes and par every hole

How can we predict the score for a player on a given hole?

Assume two vectors, Player Ability and Hole Difficulty

Predicted score = Player Ability * Hole Difficulty

Hole Difficulty is the left singular vector

Player Ability is the right singular vector
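The golf example can be sketched numerically. The par values below are hypothetical, and the matrix is laid out with holes as rows and players as columns so that hole difficulty appears as the left singular vector, as on the slide. Because every golfer pars every hole, the score matrix has rank 1 and a single pair of singular vectors reproduces it exactly.

```python
import numpy as np

# Hypothetical pars for 9 holes; every one of the 3 golfers scores par on each hole.
par = np.array([4, 5, 3, 4, 4, 4, 5, 4, 3], dtype=float)
scores = np.outer(par, np.ones(3))          # shape (9 holes, 3 players)

U, s, Vt = np.linalg.svd(scores, full_matrices=False)

# Split the single non-zero singular value evenly between the two vectors.
# abs() only fixes the arbitrary sign returned by the SVD routine; it is safe
# here because all scores are positive.
hole_difficulty = np.abs(U[:, 0]) * np.sqrt(s[0])   # left singular vector, scaled
player_ability = np.abs(Vt[0]) * np.sqrt(s[0])      # right singular vector, scaled

predicted = np.outer(hole_difficulty, player_ability)
```

Here `predicted` matches `scores`, and the remaining singular values are numerically zero. With real scores the matrix would not be exactly rank 1, and further singular vector pairs would capture the residual variation.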

Page 29: Spatial Data Mining CS 697

Eigenvalue Techniques

Finding Spatial or Temporal Patterns using SVD Analysis

Given a data matrix whose rows are time series from various points on the globe, the objective is to discover the strong temporal or spatial patterns in the data

SVD decomposes the matrix into two sets of patterns: a set of spatial patterns (left singular vectors) and a set of temporal patterns (right singular vectors)

We can plot the temporal patterns as regular line plots and the spatial patterns on a spatial grid to visualize them
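A minimal sketch of this decomposition on synthetic data follows. The grid size, the 48-month oscillation, and the noise level are all hypothetical stand-ins for real SST data. Rows are locations and columns are months, matching the slide's layout.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, n_months = 50, 240

# Hypothetical data: one dominant oscillation with a fixed spatial footprint, plus noise.
t = np.arange(n_months)
signal = np.sin(2 * np.pi * t / 48)              # temporal pattern to recover
footprint = np.linspace(-1.0, 1.0, n_points)     # spatial pattern to recover
data = np.outer(footprint, signal) + 0.1 * rng.normal(size=(n_points, n_months))

# Center each location's time series before decomposing.
data = data - data.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(data, full_matrices=False)
spatial_pattern = U[:, 0]        # first left singular vector: one value per location
temporal_pattern = Vt[0]         # first right singular vector: one value per month
explained = s**2 / np.sum(s**2)  # fraction of variance carried by each component

# The sign of a singular vector is arbitrary, so compare with abs().
match = abs(np.corrcoef(temporal_pattern, signal)[0, 1])
```

The rows of `Vt` are mutually orthogonal by construction. That is the constraint the presentation later lists as a limitation, since physical climate signals need not be orthogonal.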

Page 30: Spatial Data Mining CS 697

Eigenvalue Techniques

Example: Plotting SST (Sea Surface Temperature)

Figure: temporal pattern of SST (blue) plotted against the NINO4 index (green)

Figure: strongest spatial pattern of SST

Page 31: Spatial Data Mining CS 697

Eigenvalue Techniques

Limitations of SVD Analysis

Only useful for finding a few of the strongest signals

Smaller patterns in the data may be obscured

Signals must be orthogonal to each other (uncorrelated)

May not identify all patterns in the data

Efficiency can be a concern

Page 32: Spatial Data Mining CS 697

Presentation Outline

Background: Climate Change; Earth System Linkages

Earth Science Data and Climate Indices

Existing Eigenvalue Techniques and Limitations

New Clustering Based Methodology

Results and Comparisons

Conclusions and Future Research

Page 33: Spatial Data Mining CS 697

Clustering Methods

Clustering Based Methodology for the Discovery of Climate Indices

Two key steps for finding climate indices:

1. Find candidate indices using clustering

2. Evaluate these candidate indices for Earth Science significance

Clustering method used for this study: the SNN (“Shared Nearest Neighbor”) clustering algorithm

Page 34: Spatial Data Mining CS 697

Clustering Methods

Finding Candidate Indices Using Clustering

SNN Clustering Algorithm

First, finds the nearest neighbors of each data point

Next, redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share

Using this definition of similarity, the algorithm identifies core points

These core points are used to build clusters

SNN algorithms have time complexity O(n log n) when nearest neighbors can be found with a spatial index; computing all pairwise similarities is O(n^2) in general

Figure: graph of the functions n log n and n
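The first three steps above can be sketched as follows. This is a brute-force illustration that computes all pairwise distances, not the implementation used in the study. The function names, the Euclidean distance, the mutual-neighbor requirement, and the parameters `k`, `eps`, and `min_pts` are assumptions. For climate data each row of `X` would be the time series of one grid point.

```python
import numpy as np

def snn_similarity(X, k):
    """Shared-nearest-neighbor similarity (illustrative, brute force).

    sim[i, j] = number of k-nearest neighbors that i and j have in common,
    kept only when i and j appear in each other's neighbor lists.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)             # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbors
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1  # member[i, j] = 1 if j is in kNN(i)
    shared = member @ member.T              # size of kNN(i) intersected with kNN(j)
    mutual = (member & member.T).astype(bool)
    return np.where(mutual, shared, 0)

def core_points(sim, eps, min_pts):
    """A point is core if at least min_pts points have SNN similarity >= eps with it."""
    density = (sim >= eps).sum(axis=1)
    return density >= min_pts

# Two well-separated groups of four points each (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
sim = snn_similarity(X, k=3)
core = core_points(sim, eps=2, min_pts=3)
```

In this toy example points within a group share two neighbors, points in different groups share none, and every point is a core point. Building clusters from the core points is omitted here.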

Page 35: Spatial Data Mining CS 697

Clustering Methods

Evaluation of Candidate Indices

Indices must be evaluated in terms of Earth Science significance (meaning the strength of the association between the behavior of a candidate index and land climate)

The goal is to find a numerical measure of the strength of the association between the behavior of an index and land climate

To evaluate the influence of climate indices on land, the researchers use area-weighted correlation

Definition: the weighted average of the correlation of the candidate index with all land points, where the weight is based on the area of the land grid point

Page 36: Spatial Data Mining CS 697

Clustering Methods

Calculating Area-Weighted Correlation

Step 1: Compute the correlation of the time series of the candidate index with the time series associated with each land point

Step 2: Compute the weighted average of the correlations, where the weight associated with each land point is its area

The resulting area-weighted correlation can be at most 1 and at least 0, which implies that the correlations are taken in absolute value

General formula for a weighted average: (sum over c of Wc * Mc) / (sum over c of Wc)

Wc = weight of each value Mc

Mc = some value to average

General correlation index: 1 is the strongest
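The two steps can be sketched as below. Two things are assumptions rather than statements from the slides: the area of a grid cell is approximated by the cosine of its latitude, and correlations are taken in absolute value (consistent with the stated range of 0 to 1). The function name and the sample data are hypothetical.

```python
import numpy as np

def area_weighted_correlation(index, land_series, latitudes_deg):
    """Weighted average of |correlation| between one index and every land point.

    index:         shape (n_times,), the candidate index.
    land_series:   shape (n_land_points, n_times), one row per land grid point.
    latitudes_deg: shape (n_land_points,); cos(latitude) stands in for cell area.
    """
    index = np.asarray(index, dtype=float)
    land = np.asarray(land_series, dtype=float)
    zi = (index - index.mean()) / index.std()
    zl = (land - land.mean(axis=1, keepdims=True)) / land.std(axis=1, keepdims=True)
    corr = (zl @ zi) / index.size                  # step 1: Pearson correlation per point
    weights = np.cos(np.deg2rad(latitudes_deg))    # assumed area weights
    return np.sum(weights * np.abs(corr)) / np.sum(weights)   # step 2: weighted average

# Hypothetical example with three land points.
t = np.arange(120)
index = np.sin(2 * np.pi * t / 12)
land = np.vstack([
    2 * index + 1,               # perfectly correlated with the index
    -index,                      # perfectly anti-correlated
    np.cos(2 * np.pi * t / 12),  # uncorrelated over whole cycles
])
awc = area_weighted_correlation(index, land, latitudes_deg=[0, 60, 0])
```

With weights 1, 0.5, and 1 and absolute correlations 1, 1, and 0, the result is (1 + 0.5 + 0) / 2.5 = 0.6.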

Page 37: Spatial Data Mining CS 697

Clustering Methods

Comparison of Area-Weighted Correlations

Development of a baseline against which to compare the area-weighted correlations of candidate indices

Figure: histogram of the area-weighted correlation of 1000 random time series

No random time series has an area-weighted correlation above 0.1. This value is the baseline: a good candidate index should score above it
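The baseline experiment can be imitated on synthetic data. In this sketch the land time series, the latitudes, and all array sizes are hypothetical stand-ins, and the land points are independent of each other, unlike real land temperature, which is spatially correlated. It shows only the mechanics of scoring 1000 random series.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical sizes; 492 months corresponds to 41 years of monthly data.
n_land, n_months, n_random = 200, 492, 1000

def zscore_rows(a):
    """Standardize each row to zero mean and unit variance."""
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

land = zscore_rows(rng.normal(size=(n_land, n_months)))       # stand-in land series
weights = np.cos(np.deg2rad(rng.uniform(-60, 70, n_land)))    # stand-in cell areas
random_series = zscore_rows(rng.normal(size=(n_random, n_months)))

# Correlation of every random series with every land point, in one matrix product.
corr = random_series @ land.T / n_months                      # shape (1000, 200)
awc = (np.abs(corr) * weights).sum(axis=1) / weights.sum()    # one score per series
baseline = awc.max()
```

A histogram of `awc` plays the role of the figure on this slide. The largest value gives an empirical threshold that a candidate index would need to exceed.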

Page 38: Spatial Data Mining CS 697

Clustering Methods

Validation of Comparison Baseline

Shown below are the area-weighted correlations of 11 known indices

Note that 10 of the 11 indices have an area-weighted correlation above 0.1

If a candidate index shows an area-weighted correlation above 0.1, investigate it

Figure: graph of the area-weighted correlation of well-known climate indices

Page 39: Spatial Data Mining CS 697

Presentation Outline

Background: Climate Change; Earth System Linkages

Earth Science Data and Climate Indices

Existing Eigenvalue Techniques and Limitations

New Clustering Based Methodology

Results and Comparisons

Conclusions and Future Research

Page 40: Spatial Data Mining CS 697

Results

SST Based Candidate Indices

Used SST data for the period from 1958 to 1998 and applied SNN clustering

Obtained 107 clusters

Cluster centroids were used to categorize the clusters into groups G0, G1, G2, and G3 according to their correlation with known indices

Page 41: Spatial Data Mining CS 697

Results

107 Sea Surface Temperature (SST) Clusters

Find the correlation with known indices such as SOI, NINO1+2, etc.

Find the area-weighted correlation with land

Page 42: Spatial Data Mining CS 697

Results

SST Cluster Correlation

Correlation of known indices with SST cluster centroids and SVD components

Page 43: Spatial Data Mining CS 697

Results

G0: Clusters with correlation to known indices >= 0.8

Very highly correlated

Rediscovered well-known indices

Serve to validate the approach

NINO 1+2

NINO 3

NINO 3.4

NINO 4

Page 44: Spatial Data Mining CS 697

Results

G0: SST Cluster Correlation

Correlation of known indices with SST cluster centroids and SVD components

Page 45: Spatial Data Mining CS 697

Results

G1: Clusters with correlation to known indices from 0.4 to 0.8

Page 46: Spatial Data Mining CS 697

Results

G1: Cluster 29 vs. El Nino Indices

Figure: Cluster 29

Page 47: Spatial Data Mining CS 697

Results

G2: Clusters with correlation to known indices from 0.25 to 0.4

Less correlated

May represent new Earth science phenomena

May be a new index

Page 48: Spatial Data Mining CS 697

Results

Cluster 62 vs. El Nino Indices

Figure: Cluster 62

Page 49: Spatial Data Mining CS 697

Results

G3: Clusters with correlation to known indices <= 0.25

Less correlated

May represent new Earth science phenomena or weaker versions of known phenomena

Possible new index
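The G0 to G3 grouping can be sketched from the thresholds given on the slides. Three things are assumptions: each centroid is scored by its best absolute correlation with any known index, a value of exactly 0.8 falls in G0, and a value of exactly 0.25 falls in G3. The function names are hypothetical.

```python
import numpy as np

def best_correlation(centroid, known_indices):
    """Largest absolute Pearson correlation between a centroid and any known index."""
    return max(abs(np.corrcoef(centroid, k)[0, 1]) for k in known_indices)

def assign_group(best_corr):
    """Map a correlation value to the groups used on the slides."""
    if best_corr >= 0.8:
        return "G0"   # rediscovered well-known index
    if best_corr >= 0.4:
        return "G1"   # from 0.4 to 0.8
    if best_corr > 0.25:
        return "G2"   # from 0.25 to 0.4
    return "G3"       # <= 0.25

# Hypothetical check: a centroid identical to a known index lands in G0.
t = np.arange(120)
known = [np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 40)]
group = assign_group(best_correlation(known[0], known))
```

Applied to all 107 SST cluster centroids, this would give the partition that the preceding slides walk through group by group.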

Page 50: Spatial Data Mining CS 697

SLP based Candidate Indices

SLP data over the time period from 1958 to 1998

Correlation measured as difference of all pairs of cluster centroids

Negative correlations are interesting candidates

25 clusters found

Results

25 Sea Level Pressure Based Clusters
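
One possible reading of this slide is: correlate every pair of SLP cluster centroids, keep the negatively correlated pairs, and treat the difference of each such pair as a candidate index. The sketch below follows that reading only; it is an assumption about the procedure, not a confirmed description of it. The names `negative_pairs` and `difference_series` and the toy centroids are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def negative_pairs(centroids):
    """Return (i, j, r) for each centroid pair with r < 0, most negative first."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        r = pearson(a, b)
        if r < 0:
            pairs.append((i, j, r))
    return sorted(pairs, key=lambda p: p[2])


def difference_series(a, b):
    """Element-wise difference of two centroids, used here as a candidate index."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical toy centroids (a real run would have 25 SLP centroids).
toy_centroids = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 5.0],
]
for i, j, r in negative_pairs(toy_centroids):
    candidate = difference_series(toy_centroids[i], toy_centroids[j])
    print(i, j, round(r, 3), candidate)
```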

Page 51: Spatial Data Mining CS 697

SLP Clusters Pairwise Correlation

Note: Only negative correlation values shown

Results

Page 52: Spatial Data Mining CS 697

Comparison with SVD based Indices

Correlation of cluster centroids with land temperature

Correlation of first 30 SVD components with land temperature

Comparisons

Page 53: Spatial Data Mining CS 697

SST Clusters: Performance Comparison

Correlation of known indices with SST cluster centroids and SVD components

Comparisons

Page 54: Spatial Data Mining CS 697

SLP Clusters: Performance Comparison

Comparisons

Page 55: Spatial Data Mining CS 697

Area-weighted correlation of known indices with SLP cluster centroids and SVD components

SLP Clusters: Performance Comparison

Comparisons
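
The slide does not define "area-weighted correlation". A common way to area-weight values on a latitude/longitude grid is to weight each grid point by the cosine of its latitude, since cells shrink toward the poles. The sketch below applies that idea to a set of per-grid-point correlations. Both the cosine weighting and the use of absolute values are assumptions, and the name `area_weighted_mean_abs_corr` and the sample numbers are hypothetical.

```python
import math


def area_weights(latitudes_deg):
    """Relative grid-cell area on a regular lat/lon grid: proportional to cos(latitude)."""
    return [math.cos(math.radians(lat)) for lat in latitudes_deg]


def area_weighted_mean_abs_corr(correlations, latitudes_deg):
    """Weighted mean of |r| over grid points, with weights from area_weights."""
    if len(correlations) != len(latitudes_deg):
        raise ValueError("need one latitude per correlation value")
    weights = area_weights(latitudes_deg)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * abs(r) for w, r in zip(weights, correlations)) / total


# Hypothetical correlations of one index with temperature at three grid points.
corrs = [0.5, -0.2, 0.3]
lats = [0.0, 60.0, 30.0]
print(round(area_weighted_mean_abs_corr(corrs, lats), 3))
```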

Page 56: Spatial Data Mining CS 697

Conclusions

Demonstrated that clustering is a viable alternative to eigenvalue-based approaches for the discovery of climate indices

Can replicate many well-known climate indices

Have also discovered variants of known indices that may be “better” for some regions

Some indices may represent new Earth Science phenomena

No need for discovered indices to be orthogonal

No need to pre-select the area to analyze

Page 57: Spatial Data Mining CS 697

Future Work

Investigation of candidate indices by Earth Scientists

Investigate whether there are climate indices that cannot be represented by clusters

Noise elimination and other preprocessing improvements

Aggregation

Page 58: Spatial Data Mining CS 697

QUESTIONS ???