seminar final1

Amod Aggarwal

(13535005)

Guided by:

Dr. Padam Kumar

Dr. Dhaval Patel

Introduction

Literature Survey

Conclusion

References

What is Meteorology and Oceanography?

◦ Study of spatial and temporal variations of atmospheric, oceanographic and land parameters over long time periods

◦ Helps in predicting disasters, which helps prevent loss of life and property

What is data mining?

◦ Process of extracting implicit, previously unknown and potentially useful information from huge amounts of data

Data mining techniques and their applications

◦ Anomaly detection: detection of land cover change, outlier values of precipitation

◦ Association rule mining: finding associations between oceanographic parameters and cyclone intensification

◦ Pattern mining: understanding of natural events. For example, eddies sustain energy for weeks or months and can therefore be manifested as a connected group of gradually increasing or decreasing time series

◦ Classification: detection of water fraction per flood pixel

◦ Regression: detection of forest cover per pixel

◦ Ocean eddies are swirls of ocean currents

◦ They play a significant role in the transport of water, heat, salt, and nutrients

[Figure: an ocean eddy, shown as the green swirl]

[Figure: gradually decreasing segments of the time series, enclosed between the red and green lines, are signatures of an eddy]

This is challenging for the following reasons:

◦ Not concrete objects: Spatio-temporal phenomena are not concrete objects but patterns that evolve over space and time, whereas in traditional data mining objects are concrete, i.e. they are either present or absent.

   Transactions: an item is either present or absent (0 or 1)

   Hurricanes: continuous, gradual evolution; a hurricane does not simply appear and disappear

◦ Uncertainty: Measurements may be biased, and some values may be missing due to the presence of cloud cover.

◦ Diversity: Data is heterogeneous in space and time, as it may be available from different sources at different spatial and temporal resolutions.

◦ Variability: Values captured for the same location a small time interval apart may vary due to local climatic variations.

The data retrieved from remote sensing satellites is in the form of data products with different data formats.

The standard data format for most data products is HDF. Other formats include NetCDF and KML.

One data product contains data related to one parameter.

Authenticated users can download Indian satellite data from the mosdac.gov.in website of ISRO.

MOSDAC disseminates data for around 20 parameters. Some of these are:

◦ Normalized Difference Vegetation Index (NDVI)

◦ Land surface temperature (LST)

◦ Aerosol Optical Depth (AOD)

◦ Cloud Liquid Water (CLW)

◦ Mean Sea Surface (MSS)

HDF (Hierarchical Data Format)

Container for storing a variety of scientific data

Composed of two primary types of objects:

◦ Groups: a grouping structure containing one or more HDF objects together with supporting metadata

◦ Datasets: a multidimensional array of data elements together with supporting metadata

Literature Survey

Anomaly detection

◦ Land cover change detection

◦ Outlier precipitation detection

◦ Outlier time interval detection

Detection of water fraction per flood pixel

Detection of forest cover per pixel

Aim:

◦ To find the locations that undergo significant and sudden change during a particular time period.

◦ The time at which the change occurs is also determined.

Importance:

◦ Helps in mapping damage following a natural disaster such as fire, drought or flood.

Land cover change detection methods

◦ Bitemporal methods

   Image differencing

   Image ratioing

   Principal component analysis

   Change vector analysis

◦ Time series data mining techniques

   Predictive model based: Yearly Delta algorithm, Variability Distribution algorithm, Vegetation Independent Variability Distribution algorithm

   Segmentation based: top-down approach, bottom-up approach, recursive merging algorithm
Bitemporal methods compared with time series data mining methods

Bitemporal methods

◦ Two time instants are compared

◦ Do not provide information about the time of change

◦ Lower computational complexity

Time series data mining methods

◦ The vegetation time series is analyzed at each location and changes in the time series are identified

◦ Provide information about the time of change

◦ Computational complexity is high, as a long time series has to be analyzed

Segmentation based approach compared with predictive model based approach

Segmentation based approach

◦ The time series is partitioned into homogeneous segments; the boundaries between segments may be change points

Predictive model based approach

◦ A model is constructed for a portion of the time series and is used to predict future time points

◦ Time points whose observed values differ sufficiently from the predicted values are considered change points

Segmentation based: Recursive merging algorithm

Input: Monthly composited EVI (Enhanced Vegetation Index) dataset for the state of California for the years 2000-2006.

Output: Detection of land cover changes

◦ Forest fires

◦ Conversion to farming

◦ Construction or logging

Algorithm: The time series of each pixel is analyzed as follows:

1. Let {b1, b2, ..., bn} be the list of annual EVI sums, where each annual sum is the sum of the vegetation index values of all months in that year.

2. The two consecutive segments with the most similar annual EVI sums are merged.

• Suppose b1 and b2 are the most similar EVI sums; then at the end of this step the list will be {(b1+b2)/2, b3, ..., bn}, which has one element fewer.

• Merge cost s1 = dist(b1, b2)

3. Step 2 is applied recursively until the list contains one element.

4. The list of merge costs will be s1, s2, ..., sn-1.

5. The change score for a location (pixel) is the ratio of the largest merge cost to the smallest:

$$\text{change score} = \frac{\max_{1 \le i \le n-1} s_i}{\min_{1 \le i \le n-1} s_i}$$

6. Pixels are ranked by change score, and the top-ranked pixels are considered changes.

[Figure: time series for one pixel, divided into annual segments b1, b2, b3]
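The merging procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `recursive_merge_change_score` is hypothetical, `dist` is assumed to be the absolute difference between two annual EVI sums, and the handling of a zero minimum merge cost is a choice made here.

```python
def recursive_merge_change_score(annual_sums):
    """Change score for one pixel, computed from its list of annual EVI sums."""
    segments = list(annual_sums)
    if len(segments) < 2:
        raise ValueError("need at least two annual sums")
    merge_costs = []
    while len(segments) > 1:
        # Index of the most similar pair of consecutive segments.
        i = min(range(len(segments) - 1),
                key=lambda j: abs(segments[j] - segments[j + 1]))
        # Assumption: dist is the absolute difference of the two sums.
        merge_costs.append(abs(segments[i] - segments[i + 1]))
        # Replace the pair by its average, as in step 2.
        segments[i:i + 2] = [(segments[i] + segments[i + 1]) / 2]
    lowest, highest = min(merge_costs), max(merge_costs)
    if lowest == 0:
        # Assumption: a perfectly flat series scores 0, otherwise infinity.
        return float("inf") if highest > 0 else 0.0
    return highest / lowest
```

For the sums [10.0, 11.0, 4.0] the merge costs are 1.0 and 6.5, so the ratio is 6.5. Dividing by the smallest merge cost is what lets a small drop in a stable forest outrank a larger swing in a naturally variable grassland.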

The change score is calculated so as to take the type of vegetation into account:

◦ A very small change can be considered a change point for stable forests

◦ A large change may not be a change point for high-variability regions such as grasslands

This helps in reducing the number of false positives.

Limitations:

◦ The minimum merge cost is taken as the variability due to local climatic changes.

◦ However, the minimum merge cost may correspond to a rare event that was captured by chance.


Predictive model based: Yearly Delta algorithm





Input: MODIS EVI data for California and Yukon

◦ Data for California is at 250 m spatial resolution for the years 2006-2008.

◦ Data for Yukon is at 1 km spatial resolution for the years 2004-2008.

◦ The time series for each pixel is analyzed independently

Output:

◦ Land cover change locations (pixels)

◦ Time at which the change occurred

Validation: High quality fire data generated from an independent source is used for validation

Algorithm:

◦ The previous year is considered as a model

◦ A change score is assigned to each time step as the difference between the mean annual EVI of the current year and that of the previous year:

$$\text{change score}_{\text{current year}} = \text{annual EVI}_{\text{current year}} - \text{annual EVI}_{\text{previous year}}$$

◦ The maximum change score across all time steps is considered the YD score for a location:

$$\text{YD score} = \max_{1 \le i \le n-1} (\text{change score}_i)$$

◦ The top-ranked pixels according to YD score are called change points.

Limitations:

◦ Does not make use of information about natural variation in EVI.

◦ Only the single largest change of a time series is considered, although a time series may undergo multiple changes during a given period.
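A minimal Python sketch of the Yearly Delta score described above. The function name `yd_score` is hypothetical, and taking the magnitude of the year-to-year difference is an assumption made here so that a drop in EVI (for example after a fire) scores as highly as a rise; the slide only says "difference".

```python
def yd_score(annual_evi):
    """Return (YD score, index of the year in which the largest change occurs)."""
    if len(annual_evi) < 2:
        raise ValueError("need at least two years of annual EVI")
    # Assumption: the change score is the magnitude of the yearly difference.
    scores = [abs(cur - prev) for prev, cur in zip(annual_evi, annual_evi[1:])]
    best = max(range(len(scores)), key=scores.__getitem__)
    # scores[k] compares year k+1 with year k, so the change year is best + 1.
    return scores[best], best + 1
```

Returning the index of the changed year alongside the score reflects the stated output of the method: both the location and the time of change.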

[Figure: annual EVI time series illustrating the YD score]

◦ Actual change occurs in year 2008: the difference in annual EVI is high.

◦ The change in year 2005 is due to natural variations: although the difference in annual EVI is high, it is not very high compared with the mean variability score.


Predictive model based: Variability Distribution algorithm





Algorithm:

Each annual segment in the first k years is considered a model, and the remaining k-1 values are considered the observed values.

The mean Manhattan distance is computed for the k-1 years of the model to give the distribution of variability scores for that location.

A modified score called the VD score is used:

$$\text{VD score} = \text{YD score} - \mu$$

where µ is the mean of the distribution.

The mean is estimated using the Maximum Likelihood Estimation method.

Special features:

Makes use of information about natural variation in EVI.

Any year for which the annual EVI deviates significantly from the mean annual EVI for the k years should be discarded.

Limitation:

Some vegetation types, such as open shrubs, have large variations in the spread of annual variability.

[Figure: scatter plot of mean variability against YD score for forest cover, with lines of constant YD score and constant VD score and the change points marked (Courtesy: Mithal et al. [19])]

As only one vegetation type (forest) is considered, YD also performs well.

[Figure: scatter plot of mean variability against YD score for savannas, with lines of constant VD score and constant YD score (Courtesy: Mithal et al. [19])]

Savannas consist of trees, shrubs, grasses etc. Different vegetation types have different threshold change scores for a change to be considered an actual change. Therefore, VD performs better than the YD algorithm.

[Figure: scatter plot of mean variability against YD score for shrublands, with lines of constant VD score and constant YD score (Courtesy: Mithal et al. [19])]

Open shrublands show a different spread of variability at different locations even though the vegetation type is the same; therefore both YD and VD produce many false positives.


Predictive model based: Vegetation Independent Variability Distribution algorithm

Algorithm:

The mean and standard deviation of the variability score distribution are estimated as maximum likelihood estimates.

A new score called the VID score is used and calculated as follows:

$$\text{VID score} = \frac{\text{YD score} - \mu}{\sigma}$$

Salient features:

Takes into account the spread of the variability score distribution and therefore reduces false alarm rates.

A high VID score implies a lower false positive rate and vice versa.
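The VD and VID scores can be sketched together in Python. This is an illustration under assumptions rather than the published implementation: `vd_vid_scores` and its parameters are hypothetical names, and `variability_scores` stands for the variability scores already computed for one location from its model years. The maximum likelihood estimates of a normal distribution are the sample mean and the population (divide by n) standard deviation, which is what `statistics.fmean` and `statistics.pstdev` return.

```python
from statistics import fmean, pstdev


def vd_vid_scores(yd, variability_scores):
    """Return (VD score, VID score) for one location."""
    mu = fmean(variability_scores)          # MLE of the mean
    sigma = pstdev(variability_scores, mu)  # MLE of the standard deviation
    if sigma == 0:
        raise ValueError("variability scores have zero spread; VID is undefined")
    vd = yd - mu
    vid = vd / sigma
    return vd, vid
```

Two locations with the same YD score and the same mean variability get the same VD score, but the one with the wider spread of variability gets the lower VID score, which is how VID suppresses false alarms in shrublands.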

[Figure: variability score curves for pixel 1 and pixel 2, plotted against mean annual EVI, with the range of variability scores that indicates a change marked separately for each pixel]

Both pixels correspond to the shrub vegetation type, whose spread of variability score varies from location to location and from time to time.

Maximum likelihood estimation (MLE)

Every model is specified by its parameters.

MLE is a parameter estimation method that finds the parameter values of a model that best fit the data.

Fluctuations in the variability score for a particular vegetation type at a location are assumed to be normally distributed; therefore the parameters are calculated for the normal distribution.

The mean and standard deviation are the parameters of the normal distribution.

Calculation of mean and standard deviation using MLE

◦ Let f(y|w) denote the probability density function (PDF) that specifies the probability of observing the data vector y given the parameter vector w.

◦ If the individual observations yi are independent of each other, then the PDF for the data y = (y1, ..., yn) given w can be expressed as the product of the individual PDFs:

f(y = (y1, ..., yn) | w) = f1(y1|w) f2(y2|w) ... fn(yn|w)

The PDF for one observation xi from a normal distribution with mean µ and standard deviation σ is

$$P(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

The PDF for multiple independent observations is

$$f(x_1,\ldots,x_n \mid \mu,\sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}$$

Taking the log on both sides

$$\ln(f) = -\frac{1}{2}\, n \ln(2\pi) - n \ln(\sigma) - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}$$

For the model to best fit the data, the value of the parameter vector should maximize the PDF, or equivalently its logarithm.

The partial derivative of ln(f) with respect to each component of the parameter vector should therefore be zero:

$$\frac{\partial \ln(f)}{\partial \mu} = \frac{\sum_{i=1}^{n}(x_i-\mu)}{\sigma^2} = 0 \;\Rightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\frac{\partial \ln(f)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^3} = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$$

Comparison of the three predictive model based algorithms

Yearly Delta (YD) algorithm

◦ Does not consider the type of vegetation. The same YD score may be an actual change for forests but not for savannas or shrublands.

◦ Does not consider the average change score value (µ) or the degree of variability in the value (σ)

◦ YD score = max over i = 1 to n-1 of (annual EVI of current year - annual EVI of previous year)

Variability Distribution (VD) algorithm

◦ Considers the type of vegetation. The same VD score may be an actual change for regions such as savannas (which have less variation in variability value) but not for shrublands (which have high variation in variability value).

◦ Considers only the average change score value (µ)

◦ VD score = YD score - µ

Vegetation Independent Variability Distribution (VID) algorithm

◦ Considers the type of vegetation. The VID score works for all vegetation types.

◦ Considers both the average change score value (µ) and the degree of variability in the value (σ)

◦ VID score = (YD score - µ) / σ

Where TPn = true positives,

FPn = false positives,

M = total no of pixels considered

[Figure: performance of the three scores on forest pixels. Green line: YD score; red line: VD score; black line: VID score]

VD and VID give better results than YD. Reason: the graph corresponds to the forest region only, but the MODIS forest map used to identify forest cover pixels is inaccurate and includes some shrubs and agricultural land labeled as forests.

[Figure: performance comparison in which VID performs slightly worse than VD]

Reason: the initial few years selected to model variability may contain some noise. The mean variability for that location is then modeled as high, and changes in later years go undetected.

[Figure: performance comparison for California]

The performance of VID is best. Reason: shrubs form the dominant land cover type in California, and they show high variability in the spread of the variability score due to their higher sensitivity to climatic variations.

[Figure: performance comparison for shrub pixels]

The performance of YD is exceptionally poor and that of VID is exceptionally good. Reason: the spread of the variability score varies strongly between locations whose vegetation type is shrubs.




Anomaly detection: Outlier precipitation detection






Input:

◦ South American precipitation dataset in NetCDF, a geoscience data format

Output:

◦ The top k = 5 outliers are found for every year

◦ A total of 155 outlier sequences were found over a period of 10 years

The running time of the algorithm is 229 s.

Variable            Value
Num Year Periods    10
Year Range          1995-2004
Grid Size           2.5º × 2.5º
Num Latitudes       31
Num Longitudes      23
Total Grids         713

Aim:

◦ To find and track the position of outliers with time

Method description:

◦ Top k outliers are found for every year using Exact-Grid Top-k algorithm

◦ Outliers are tracked using the OutStretch algorithm

◦ The outlier sequences generated are analyzed

How to find the outlier (Exact-Grid Top-k algorithm)

◦ Concept of discrepancy is used

◦ Discrepancy value is assigned to each rectangular region using

Kulldorff’s scan statistic.

◦ Top-k outliers are selected for further processing as it is necessary in

order to track the outliers

How to calculate the discrepancy?

◦ Two parameters are required:

   a measurement m (number of incidences of an event)

   a baseline b (total population at risk)

◦ The measurement M and baseline B values for the whole dataset (U) are calculated as

$$M = \sum_{p \in U} m(p), \qquad B = \sum_{p \in U} b(p)$$

◦ The measurement mR and baseline bR values for the region (R) are calculated as

$$m_R = \frac{\sum_{p \in R} m(p)}{M}, \qquad b_R = \frac{\sum_{p \in R} b(p)}{B}$$

◦ The discrepancy score of the region is calculated using the formula:

$$d(m_R, b_R) = m_R \log\frac{m_R}{b_R} + (1 - m_R)\log\frac{1 - m_R}{1 - b_R}$$

◦ For the shaded region in the example figure, M = 6, B = 16, mR = 4/6, and bR = 4/16
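The discrepancy calculation can be sketched in Python as below. The function name `discrepancy` and its parameters are hypothetical, the natural logarithm is assumed (the slide does not state the base), and the convention that a term with a zero fraction contributes zero is a choice made here.

```python
import math


def discrepancy(m_region, b_region, m_total, b_total):
    """Kulldorff-style discrepancy of a region from raw region and dataset totals."""
    m_r = m_region / m_total  # fraction of all measurements inside the region
    b_r = b_region / b_total  # fraction of the baseline inside the region
    if not 0 < b_r < 1:
        raise ValueError("baseline fraction must lie strictly between 0 and 1")

    def term(x, y):
        # Assumed convention: 0 * log(0 / y) is taken as 0.
        return 0.0 if x == 0 else x * math.log(x / y)

    return term(m_r, b_r) + term(1 - m_r, 1 - b_r)
```

For the example values above, the call is `discrepancy(4, 4, 6, 16)`. A region whose share of measurements equals its share of the baseline scores zero, so only regions with disproportionately many (or few) events rank as outliers.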

OutStretch algorithm

◦ The outlier region of the previous year is stretched on each side

◦ Each outlier in the current year is examined to see whether it lies within the region formed by the stretched region and the outlier region of the previous year

◦ If it does, it is added to the child list of the previous year's outlier

RecurseNode algorithm:

◦ All sequences starting at the root node of a tree and ending at a leaf node are fetched.

[Figure: the outlier region of the previous year surrounded by the stretched region]

[Figure: forest built by applying the OutStretch algorithm recursively. (1,1), (2,2) and (3,2) correspond to one sequence followed by an outlier]
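A simplified Python sketch of the stretching and sequence extraction steps described above. All names (`stretch`, `contains`, `outlier_sequences`) are hypothetical, regions are assumed to be axis-aligned rectangles `(x1, y1, x2, y2)` in grid coordinates, "lies in the region" is assumed to mean full containment, and any outlier without a parent is assumed to start a new tree.

```python
def stretch(rect, r):
    """Grow a rectangle (x1, y1, x2, y2) by r cells on every side."""
    x1, y1, x2, y2 = rect
    return (x1 - r, y1 - r, x2 + r, y2 + r)


def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def outlier_sequences(yearly_outliers, r):
    """List every root-to-leaf sequence; yearly_outliers[t] holds year t's outliers."""
    def children(year, rect):
        if year + 1 >= len(yearly_outliers):
            return []
        region = stretch(rect, r)
        return [c for c in yearly_outliers[year + 1] if contains(region, c)]

    def recurse(year, rect, path):
        path = path + [(year, rect)]
        kids = children(year, rect)
        if not kids:
            return [path]  # leaf reached: one complete sequence
        found = []
        for kid in kids:
            found.extend(recurse(year + 1, kid, path))
        return found

    sequences = []
    for year, rects in enumerate(yearly_outliers):
        for rect in rects:
            # Assumed rule: a tree is rooted at any outlier with no parent.
            is_child = year > 0 and any(
                contains(stretch(p, r), rect) for p in yearly_outliers[year - 1])
            if not is_child:
                sequences.extend(recurse(year, rect, []))
    return sequences
```

Each returned sequence is a list of `(year, rectangle)` pairs, which corresponds to one path through the forest from a root to a leaf.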

Anomaly detection: Outlier time interval detection






Input:

◦ Sea surface temperature (SST) data of Equatorial Pacific Ocean.

◦ The data consisted of measurements of sea surface temperature

for 44 sensors in Pacific Ocean

◦ Each sensor had a time series of 1440 data points.

Output:

◦ Time intervals where spatial neighborhood has shown abnormal

behavior.

Terms:

Spatial distance (sd): Distance between 2 locations based on the distance between their spatial coordinates

$$sd = \sqrt{(p_{sx} - q_{sx})^2 + (p_{sy} - q_{sy})^2}$$

Measurement distance (md): Distance between 2 points based on the difference between the features of the 2 points

$$md = \sqrt{\sum_{a=1}^{m} (p_{s,a} - q_{s,a})^2}$$

where p and q are 2 locations.

Spatial neighborhood: Cluster of locations such that the spatial distance (sd) and measurement distance (md) between every 2 locations is less than the respective threshold values.

Sum of squared error (SSE): Measure of the degree of abnormality of the interval

$$SSE = \sum_{bn=1}^{BN} dist(val_{bn}, \mu_{int})^2$$

where val_bn is each temporal reading in the base interval and µ is the mean of the temporal readings.

Aim:

◦ To find time intervals where spatial neighborhoods are likely to show

abnormal behavior.

Algorithm:

◦ The time series is first divided into a set of equal-size base temporal intervals

◦ Spatial neighborhoods are found for every base interval

◦ Each spatial node in every base interval is analyzed and given a binary classification: 1 if it shows abnormal behavior, 0 otherwise

◦ The number of spatial nodes classified as 1 is counted for every base interval; this count is called the vote count

◦ A threshold mv is then applied: intervals for which votes > mv are classified as 1 and the others as 0

◦ Consecutive base intervals that have the same binary classification are merged to form larger intervals

◦ The mean value of each edge is calculated for every interval

◦ Spatial neighborhoods are calculated for each interval using the mean value of each edge
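The thresholding and merging steps of the algorithm above can be sketched in Python. The function name `merge_base_intervals` is hypothetical, and the per-interval vote counts are taken as given; how each node is judged abnormal is outside this sketch.

```python
def merge_base_intervals(vote_counts, mv):
    """Merge consecutive base intervals that share a binary classification.

    Returns a list of (first_index, last_index, label) tuples.
    """
    labels = [1 if votes > mv else 0 for votes in vote_counts]
    merged = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            merged.append((start, i - 1, labels[start]))
            start = i
    return merged
```

With vote counts [0, 1, 5, 6, 2, 7] and mv = 3, the base intervals are labeled 0, 0, 1, 1, 0, 1 and merge into four larger intervals.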

Location : 0ºN latitude and 110ºW longitude

Time period : 10 day period from 01/01/2004 to 01/10/2004

No of measurements: approx.1400

[Figure: agglomerative temporal intervals for SST data]

[Figure: spatial neighborhoods found by edge clustering]

◦ Neighborhood (a) represents cooler water

◦ Neighborhood (b) represents warmer water

◦ Neighborhoods (c) and (d) represent moderate water

◦ Edge clustering is validated by a satellite image of SST, in which light regions represent cooler temperatures and dark regions represent warmer temperatures.

[Figure: neighborhood quality for each interval]

◦ The SSE of neighborhood (a) shows an interesting pattern between intervals 16 and 19: it goes from high to low and then back from low to high.

◦ Neighborhood (a) has more spread during the 16th interval than during the 17th interval.

Detection of water fraction per flood pixel

Input:

◦ Land cover type

◦ 8-day composite surface reflectance for the VIS band (CH1) and NIR band (CH2)

◦ CH2-CH1

◦ CH2/CH1

◦ NDVI dataset

◦ Data before the flooding in the Mississippi basin is used as the training dataset

◦ Data after the flooding in the Mississippi basin, which occurred on June 17-19, 2008, is used for testing

Output:

◦ The best attribute (R) for classification, i.e. CH2-CH1, is found.

◦ The threshold values of the best attribute (R) for pure water and pure land are found.

Validation data:

◦ Landsat TM imagery at 30 m spatial resolution

Aim:

◦ To find the fraction of water in flood pixels, which in the coarse-resolution MODIS dataset are usually a mix of water and land cover features

Method description:

◦ A decision tree approach is used to find

   the best parameter (predictor) for differentiating between land and water

   the threshold values of the predictor R for pure water (Rwater) and pure land (Rland)

◦ The water fraction per pixel can be found by comparing the actual value of the predictor with its value for pure water or pure land:

$$R_{mix} = WF \cdot R_{water} + (1 - WF) \cdot R_{land}$$

$$WF = \frac{R_{land} - R_{mix}}{R_{land} - R_{water}} \times 100$$
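A short Python sketch of the linear mixing formula above. The function name `water_fraction` is hypothetical, and clipping the result to the range 0 to 100 is an assumption added here so that pixels beyond the pure-land or pure-water values do not produce impossible percentages.

```python
def water_fraction(r_mix, r_land, r_water):
    """Water fraction of a mixed pixel, in percent."""
    if r_land == r_water:
        raise ValueError("pure land and pure water values must differ")
    wf = (r_land - r_mix) / (r_land - r_water) * 100.0
    # Assumption: clip to the physically meaningful range.
    return min(100.0, max(0.0, wf))
```

As an illustration only, taking 9.17 and 2.91 as the pure land and pure water values of CH2-CH1, a pixel with a value halfway between them (6.04) comes out at about 50 percent water.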

Experimental Results:

Some of the rules used for deciding threshold values are:

◦ (CH2-CH1) > 9.17 -> class Land

◦ (CH2-CH1) <= 2.91 -> class Water

The correlation between TM and MODIS water fractions is 0.97, with a bias of 4.47% and a standard deviation of 4.4%.

[Figure: decision tree created using the C4.5 algorithm]

Detection of forest cover per pixel






Input :

◦ Land surface temperature from monthly composited MOD11C3 product

◦ NDVI and EVI from monthly composited MOD13C2 product

◦ Land cover type from MCD12C1 yearly product

Output:

◦ Fraction of forest cover per pixel

Validation:

◦ Forest cover information from PRODES data at 90 m resolution available in GeoTiff format is used for validation purposes

Aim:

◦ To find the forest cover per pixel for the MODIS dataset, which has coarse resolution

◦ The data values for parameters like NDVI, EVI and land surface temperature are available per pixel.

◦ Therefore, each value is affected by the vegetation cover of every point covered by that pixel.

◦ The same parameter value may correspond to a different fraction of forest cover depending on the vegetation type across the whole area of the pixel.

Algorithm:

◦ Modification of the Leeuwen et al. approach.

◦ The Leeuwen et al. approach gives a single logistic regression model for all vegetation types.

◦ The improved algorithm considers the vegetation type and gives an independent logistic regression model for each vegetation type.

Leeuwen et al. approach

Terms:

pit : Fraction of forest cover for pixel i in year t (generated from the

analysis of high-resolution LandSat TM) images

Xit : Vector of MODIS observations for pixel i in year t

β: Vector of model parameters (which are estimated from a set of

training data) for pixel i in year t

The vectors Xit and β each have three components:

◦ the first corresponding to a constant intercept term

◦ the second to a NDVI measurement,

◦ and the third to a LST measurement.

Model:

$$\ln\left(\frac{p_{it}}{1 - p_{it}}\right) = \beta^{T} X_{it}$$
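Inverting the logistic model above gives the predicted forest fraction directly. The sketch below is illustrative: `forest_fraction` is a hypothetical name, and any coefficient values passed in are placeholders rather than fitted parameters from the cited work.

```python
import math


def forest_fraction(x, beta):
    """Predicted forest cover fraction p for observation vector x.

    x and beta each hold three components: intercept term (1.0), NDVI, LST.
    Solving ln(p / (1 - p)) = beta . x for p gives the logistic function.
    """
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))
```

In the vegetation-specific variant, one coefficient vector would be kept per vegetation type (for instance in a dictionary keyed by type) and the vector matching the pixel's type would be passed as `beta`.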

Learning independent regression models requires segmenting the observation space into multiple categories.

Segmentation is done by partitioning the feature space, an n-dimensional space in which each feature corresponds to one axis.

Features are selected based on their ability to differentiate between vegetation types.

For example: forests show a high inter-annual NDVI and EVI mean and a low inter-annual LST mean, while the intra-annual variance of NDVI, EVI and LST is low.

Therefore, mean (µ) and variance (σ2) are selected as features.

[Figure: vegetation type distribution in the feature space (µ, σ2) of NDVI]

◦ Forests show high inter-annual mean and low intra-annual variance

◦ Farms show high intra-annual variance due to crop cycles

◦ Grasslands show high intra-annual variance and high inter-annual mean

◦ Water locations show high intra-annual variance and low inter-annual mean

[Figure: analysis of the partition corresponding to the forest vegetation type; scatter plot of the residual of the baseline approach against the residual of the vegetation-specific approach]

The residual of the vegetation-specific approach has lower magnitude than that of the baseline model. Therefore, the vegetation-specific approach is better than the baseline model.

[Figure: analysis of the partition corresponding to the cropland vegetation type]

The residual of the vegetation-specific model is lower in magnitude than that of the baseline model.

Conclusion

Various research works related to anomaly detection and to the detection of water fraction or forest cover per pixel have been discussed.

Most of these works are pixel-based and do not consider the spatial neighborhood of a pixel.

Domain knowledge is also required along with data mining techniques.

Future work should address these limitations.

References

[1] Jonathan T. Overpeck, Gerald A. Meehl, Sandrine Bony, David R. Easterling, "Climate data challenges in the 21st century," in Science, 2011.

[2] James H. Faghmous and Vipin Kumar," Spatio-temporal data mining for climate data : Advances,

Challenges and Opportunities," in Data Mining and Knowledge Discovery for Big Data, 2014.

[3] Donglian Sun, Yunyue Yu, Mitchell D. Goldberg ,"Deriving Water Fraction and Flood Maps From

MODIS Images Using Decision Tree Approach," in IEEE Journal of Selected Topics In Applied Earth

Observations And Remote Sensing , 2011.

[4] Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven Klooster," Land

Cover Change Detection : A Case Study," in Knowledge Discovery in Databases Proceedings, 2008.

[5] Elizabeth Wu, Wei Liu, Sanjay Chawla ,"Spatio-Temporal Outlier Detection in Precipitation Data,"

in Knowledge Discovery in Databases Proceedings, 2008.

[6] Hong Yeon Cho, Ji Hee Oh, Kyeong Ok Kim, and Jae Seol Shim, "Outlier Detection and missing

data filling methods for coastal water temperature data," in Journal of Coastal Research, 2013.


[7] C.T.Dhanya and D.Nagesh Kumar, " Data mining for evolution of association rules for droughts

and floods in India using climate inputs," in Journal of Geophysical Research, 2009.

[8] Ruixin Yang, Jiang Tang, and Donglian Sun, " Association Rule Data Mining Applications for

Atlantic Tropical Cyclone Intensity Changes," in Journal of American Meteorological Society, 2011.

[9] James H.Faghmous, Yashu Chamber, Shyam Boriah, Stefan Liess, Vipin Kumar, "A novel and

scalable spatio-temporal technique for ocean eddy monitoring," in Association for Advancement of

Artificial Intelligence, 2012.

[10] Imran Maqsood, Muhammad Riaz Khan, and Ajith Abraham, "An ensemble of neural networks for weather forecasting," in Neural Computing & Applications, 2004.

[11] Agboola A.H., Gabriel A.J., Aliyu E.O., Alese B.K., "Development of a Fuzzy Logic Based

Rainfall Prediction Model," in International Journal of Engineering and Technology, 2013.

[12] Christopher G.Healey, "On the Use of Perceptual Cues and Data Mining for Effective

Visualization of Scientific Datasets," in Proceedings Graphics Interface, 1998.


[13] Wenwen Li, Chaowei Yang, Donglian Sun, "Mining geophysical Parameters through Decision Tree

Analysis to Determine Correlation with Tropical Cyclone Development," in Computers & Geosciences,

2008.

[14]Pinky Saikia Dutta, and Hitesh Tahbilder, "Prediction of Rainfall using Data mining Technique over

Assam," in Indian Journal of Computer Science and Engineering (IJCSE),2014.

[15]Anuj Karpatne, Mace Blank, Michael Lau, Shyam Boriah, Karsten Steinhaeuser, Michael Steinbach

and Vipin Kumar," Importance of Vegetation Type in Forest Cover," in Intelligent Data Understanding,

2012.

[16] James H.Faghmous, Mathew Le, Muhammed Uluyol, Vipin Kumar and Snigdhansu Chatterjee, "A

parameter-free spatio-temporal pattern mining model to catalog ocean dynamics," in IEEE 13th

International Conference on Data Mining, 2013.

[17] Rie Honda and Osamu Konishi, "Temporal rule discovery for Time-Series Satellite Images and

Integration with RDB," in Principles of Data Mining and Knowledge Discovery, Lecture Notes in

Computer Science ,2001.


[18] Pol R. Coppin and Marvin E. Bauer, "Change Detection in Forest Ecosystems with Remote Sensing

Digital Imagery ," in Remote Sensing Reviews, 1996.

[19] Varun Mithal, Ashish Garg, Ivan Brugere, Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven Klooster, "Incorporating Natural Variation Into Time Series-Based Land Cover Change Identification," in Proceedings of the NASA Conference on Intelligent Data Understanding, 2011.

[20] Varun Mithal, Shyam Boriah, Ashish Garg, Michael Steinbach, Vipin Kumar, "Monitoring Global

Forest Cover Using Data Mining," in Journal of Association for Computing Machinery, Volume V, 2010.

[21] D. Agarwal, A. McGregor, J.M.Phillips, S.Venkatsubramanian, and Z.Zhu, "Spatial Scan Statistics:

Approximations and Performance Study," in Knowledge Discovery in Databases Proceedings, 2006.

[22] Michael P. McGuire, Vandana P. Janeja, Aryya Gangopadhyay, "Spatiotemporal Neighborhood

Discovery for Sensor Data" in Knowledge Discovery in Databases Proceedings, 2008.

[23]Thijs T. van Leeuwen, Andrew J. Frank, Yufang Jin, Padhraic Smyth, Michael L. Goulden, Guido R.

van der Werf and James T. Randerson, "Optimal use of land surface temperature data to detect changes

in tropical forest cover," in Journal of Geophysical Research: Biogeosciences, 2011.