seminar final1
Amod Aggarwal
(13535005)
Guided by:
Dr. Padam Kumar
Dr. Dhaval Patel
Introduction
Literature Survey
Conclusion
References
What is Meteorology and Oceanography?
◦ Study of spatial and temporal variations of atmospheric, oceanographic and land parameters over long time periods
◦ Helps in the prediction of disasters, which prevents loss of life and property
What is data mining?
◦ Process of extracting implicit, previously unknown and potentially useful information from huge amounts of data
Technique: Application
◦ Anomaly detection: Detection of land cover change, outlier values of precipitation
◦ Association rule mining: Finding associations between oceanographic parameters and cyclone intensification
◦ Pattern mining: Understanding of natural events. For example, eddies sustain energy for weeks or months and can therefore be manifested as a connected group of gradually increasing or decreasing time series
◦ Classification: Detection of water fraction per flood pixel
◦ Regression: Detection of forest cover per pixel
Gradually decreasing segments of time series enclosed between red and
green lines are signatures of an eddy
Swirls of ocean currents
Play significant role in transport
of water, heat, salt, and nutrients
[Figure: the green swirl is an ocean eddy]
Mining spatio-temporal climate data is challenging for the following reasons:
◦ Not concrete objects: Spatio-temporal phenomena are not concrete objects but patterns that evolve over space and time, whereas in traditional data mining objects are concrete, i.e. they are either present or absent.
   Transactions: an item is either present or absent (0 or 1)
   Hurricanes: continuous gradual evolution; a hurricane does not simply appear and disappear
◦ Uncertainty: Measurements are biased, and some values may be missing due to the presence of cloud cover.
◦ Diversity: Data is heterogeneous in space and time, as it may come from different sources at different spatial and temporal resolutions.
◦ Variability: Values captured for the same location a short interval apart may vary due to local climatic variations.
The data retrieved from the remote sensing satellites is in the form
of data products having different data formats.
The standard data format for most of the data products is HDF
format. Some other formats are NetCDF, KML etc.
One data product contains data related to one parameter.
The authenticated users can download the Indian satellite data from
mosdac.gov.in website of ISRO.
MOSDAC disseminates data for around 20 parameters. Some of these are:
◦ Normalized Difference Vegetation Index (NDVI)
◦ Land surface temperature (LST)
◦ Aerosol Optical Depth (AOD)
◦ Cloud Liquid Water (CLW)
◦ Mean Sea Surface (MSS)
Container for storing a variety of scientific data
Composed of two primary types of objects :
◦ Groups :
grouping structure containing 1 or more HDF objects together
with supporting metadata
◦ Datasets :
Multidimensional array of data elements together with
supporting metadata
Literature Survey
Anomaly detection
◦ Land cover change detection
◦ Outlier precipitation detection
◦ Outlier time interval detection
Detection of water fraction per flood pixel
Detection of forest cover per pixel
Aim:
◦ To find those locations which undergo significant and sudden change during a particular time period.
◦ The time at which the change occurs is also determined.
Importance:
◦ Helps in mapping damage following a natural disaster such as fire, drought or flood.
Land cover change detection methods
◦ Bitemporal methods
   Image differencing
   Image ratioing
   Principal component analysis
   Change vector analysis
◦ Time series data mining techniques
   Predictive model based: Yearly Delta algorithm, Variability distribution algorithm, Vegetation independent variability distribution algorithm
   Segmentation based: Top down approach, Bottom up approach, Recursive merging algorithm
Bitemporal methods vs. time series data mining methods
◦ Bitemporal methods
   Two time instants are compared
   Do not provide information about the time of change
   Lower computational complexity
◦ Time series data mining methods
   The vegetation time series is analyzed at each location and changes in the time series are identified
   Provide information about the time of change
   Computational complexity is high, as a long time series has to be analyzed
Segmentation based approach vs. predictive model based approach
◦ Segmentation based approach: the time series is partitioned into homogeneous segments, and boundaries between segments may be change points
◦ Predictive model based approach: a model is constructed for a portion of the time series and used to predict future time points; time points that differ sufficiently from the prediction are considered change points
Time series data mining methods: Recursive merging algorithm (segmentation based)
Input : Monthly composited EVI (Enhanced Vegetation Index) dataset for
the state of California for years 2000-2006.
Output : Detection of land cover changes
◦ Forest fires
◦ Conversion to farming
◦ Construction or logging
Algorithm : The pixel time series is analyzed as follows:
1. Let {b1,b2,…., bn} be the list of annual EVI sums, where each annual sum is the sum of the vegetation index values of all the months of that year.
2. Two consecutive segments with most similar annual EVI sums are merged
• Suppose b1 and b2 are most similar EVI sums, then at the end of this
step, list will be {(b1+b2)/2,b3,…., bn} having one less element
• Merge cost s1= dist {b1,b2}
3. Step 2 is applied recursively until list contains one element
4. List of merge costs will be s1,s2,.......,sn-1.
5. Change score for a location or pixel will be
\[ \text{change score} = \frac{\max_{1 \le i \le n-1} s_i}{\min_{1 \le i \le n-1} s_i} \]
6. Pixels are ranked on the basis of change score value and some top ranked pixels are considered as changes.
[Figure: time series for one pixel, with annual segments b1, b2, b3]
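The merging steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, dist is assumed to be the absolute difference between two annual sums, and a zero minimum merge cost is assumed to give an infinite score.

```python
def recursive_merge_change_score(annual_sums):
    """Return (change_score, merge_costs) for one pixel's annual EVI sums."""
    if len(annual_sums) < 2:
        raise ValueError("need at least two annual sums")
    segments = [float(b) for b in annual_sums]
    merge_costs = []
    while len(segments) > 1:
        # Index of the adjacent pair with the smallest distance (most similar).
        i = min(range(len(segments) - 1),
                key=lambda j: abs(segments[j] - segments[j + 1]))
        merge_costs.append(abs(segments[i] - segments[i + 1]))
        # Replace the pair by its average, as in step 2.
        segments[i:i + 2] = [(segments[i] + segments[i + 1]) / 2.0]
    lowest = min(merge_costs)
    # Assumption: a zero minimum cost is reported as an infinite score.
    score = float("inf") if lowest == 0 else max(merge_costs) / lowest
    return score, merge_costs


# Made-up annual sums: stable for three years, then a sharp drop.
score, costs = recursive_merge_change_score([10, 11, 10, 4, 4.5])
print(score, costs)
```

Dividing by the smallest merge cost is what makes the score relative to the pixel's own variability: the same drop yields a lower score for a pixel whose annual sums already fluctuate strongly.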
Change score is calculated in such a way so as to take into account
the type of vegetation
◦ very small change can be considered as change point for stable
forests
◦ large change may not be change point for high variability regions
such as grasslands
Helps in reducing the detection of false positives
Limitations:
◦ The minimum merge cost is taken as the variability value due to local climatic changes.
◦ However, the minimum cost may be a rare value that was captured by chance.
Time series data mining methods: Yearly Delta algorithm (predictive model based)
Input: MODIS EVI data for California and Yukon
◦ Data for California is at 250m spatial resolution for years 2006-
2008.
◦ Data for Yukon is at 1km spatial resolution for years 2004-2008.
◦ Time series for each pixel is analyzed independently
Output:
◦ Land cover change locations (pixels)
◦ Time at which change occurred
Validation: High quality data for fires generated from independent
source is used for validation
Algorithm:
◦ The previous year is considered as a model
◦ A change score is assigned to each time step as the difference between the mean annual EVI of the current year and that of the previous year
\[ \text{change score}_{\text{current year}} = \text{annual EVI}_{\text{current year}} - \text{annual EVI}_{\text{previous year}} \]
◦ The maximum change score across all the time steps is considered the YD score for a location
\[ \text{YD score} = \max_{1 \le i \le n-1} (\text{change score}_i) \]
◦ Top ranked pixels according to YD score are called change points.
Limitation:
◦ Does not make use of information about natural variation in EVI.
◦ Only the top change of a time series is considered, although one time series may undergo multiple changes during a given period.
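As a rough illustration of the Yearly Delta steps above, the sketch below computes the YD score of one pixel from its annual EVI values. The function name and the sample values are hypothetical, and the sign convention (current year minus previous year) follows the slide; a study interested in vegetation loss might use the opposite sign.

```python
def yearly_delta(annual_evi):
    """Return (yd_score, year_index) for one pixel.

    year_index is the 0-based position of the year whose change score
    (current year minus previous year) is the largest.
    """
    if len(annual_evi) < 2:
        raise ValueError("need at least two years")
    deltas = [cur - prev for prev, cur in zip(annual_evi, annual_evi[1:])]
    yd_score = max(deltas)
    return yd_score, deltas.index(yd_score) + 1


# Hypothetical annual EVI values (scaled by 100) with a jump in the 4th year.
print(yearly_delta([20, 21, 20, 50, 49]))  # (30, 3)
```

Because only the maximum is kept, a second change in the same series is invisible to the score, which is the limitation noted above.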
[Figure: EVI time series in which the actual change occurs in year 2008; the difference in annual EVI is high]
[Figure: EVI time series in which a change occurs in year 2005 due to natural variations; the difference in annual EVI is high, but not very high compared with the mean variability score]
Time series data mining methods: Variability distribution algorithm (predictive model based)
Algorithm:
◦ Each annual segment in the first k years is considered a model and the remaining k-1 values are considered as the observed values.
◦ The mean Manhattan distance is computed over the k-1 years for each model to give the distribution of variability scores for that location.
◦ A modified score called the VD score is used:
\[ \text{VD score} = \text{YD score} - \mu \]
where µ is the mean of the distribution.
◦ The mean is estimated using the Maximum Likelihood Estimation method.
Special features:
◦ Makes use of information about natural variation in EVI.
◦ Any year for which the annual EVI deviates significantly from the mean annual EVI for the k years should be discarded.
Limitation:
◦ Some vegetation types, such as open shrubs, have large variations in the spread of annual variability.
[Figure: scatter plot of mean variability against YD score for forest cover, with lines of constant YD score and constant VD score and the change point marked (Courtesy: Mithal et al. [19])]
As only one vegetation type, i.e. forests, is considered, YD also performs well.
[Figure: scatter plot of mean variability against YD score for savannas, with lines of constant VD score and constant YD score (Courtesy: Mithal et al. [19])]
Savannas consist of trees, shrubs, grasses etc. The different vegetation types have different threshold change scores for a change to be considered actual. Therefore, VD performs better than the YD algorithm.
[Figure: scatter plot of mean variability against YD score for shrublands, with lines of constant VD score and constant YD score (Courtesy: Mithal et al. [19])]
As open shrublands show a different spread of variability at different locations even though the vegetation type is the same, both YD and VD show a lot of false positives.
Time series data mining methods: Vegetation independent variability distribution algorithm (predictive model based)
Algorithm:
◦ The mean (µ) and standard deviation (σ) of the variability score distribution are estimated as maximum likelihood estimates.
◦ A new score called the VID score is used and calculated as follows:
\[ \text{VID score} = \frac{\text{YD score} - \mu}{\sigma} \]
Salient features:
◦ Takes into account the spread of the variability score distribution and therefore reduces false alarm rates.
◦ A high VID score implies a lower false positive rate and vice versa.
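To make the difference between the three scores concrete, the following sketch computes VD and VID scores from a YD score and a list of variability scores for one location. The function name and all numbers are made up for illustration. The population standard deviation from Python's statistics module is used because it coincides with the maximum likelihood estimate for a normal distribution.

```python
from statistics import fmean, pstdev


def vd_and_vid(yd_score, variability_scores):
    """Return (vd_score, vid_score) for one location.

    mu and sigma are the maximum likelihood estimates (mean and
    population standard deviation) of the variability scores.
    """
    mu = fmean(variability_scores)
    sigma = pstdev(variability_scores)
    if sigma == 0:
        raise ValueError("variability scores have zero spread")
    return yd_score - mu, (yd_score - mu) / sigma


# Same YD score, two hypothetical locations with different natural variability.
stable_pixel = [2, 4, 4, 4, 5, 5, 7, 9]       # narrow spread
variable_pixel = [1, 9, 2, 12, 0, 10, 3, 3]   # wide spread, same mean
print(vd_and_vid(30, stable_pixel))
print(vd_and_vid(30, variable_pixel))
```

Both locations receive the same VD score because their means agree, while the VID score is lower for the location with the wider spread, which is how VID avoids flagging highly variable pixels.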
[Figure: variability score curves for pixel 1 and pixel 2 (axis label: mean annual EVI); for each pixel, the region of variability scores that indicates change is marked]
Both pixels correspond to the shrub vegetation type, whose spread of variability score varies from location to location and from time to time.
Maximum likelihood estimation (MLE)
Every model is specified by the parameters.
MLE is a parameter estimation method which finds the parameter values of a
model that best fits the data.
The fluctuations in variability score for a particular vegetation type are assumed to be normally distributed for a location, so the parameters are estimated for a normal distribution.
The mean and standard deviation are the parameters of the normal distribution.
Calculation of mean and standard deviation using MLE
◦ Let f(y|w) denote the probability density function (PDF) that specifies the probability of observing the data vector y given the parameter w.
◦ If the individual observations yi are independent of each other, then the PDF for the data y=(y1,.......,yn) given the vector w can be expressed as the product of the individual PDFs.
f(y=(y1,…..yn)|w) = f1(y1|w) f2(y2|w)…..fn(yn|w)
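The PDF for one observation is
\[ P(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \]
The PDF for multiple independent observations is
\[ f(x_1,\ldots,x_n \mid \mu,\sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}} \]
Taking log on both sides
\[ \ln(f) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2} \]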
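For the model to best fit the data, the value of the parameter vector should maximize the PDF, or equivalently its logarithm.
The partial derivative of ln(f) with respect to each component parameter of the vector should therefore be zero:
\[ \frac{\partial \ln(f)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \;\Rightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i \]
\[ \frac{\partial \ln(f)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \]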
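The closed-form estimates that follow from setting the partial derivatives to zero can be checked numerically. The sketch below is only an illustration with made-up data and hypothetical function names: it evaluates the normal log-likelihood and confirms that moving µ or σ away from the estimates lowers it.

```python
import math


def log_likelihood(xs, mu, sigma):
    """Log of the joint normal PDF of independent observations xs."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - n * math.log(sigma)
            - squared_error / (2 * sigma ** 2))


def normal_mle(xs):
    """Maximum likelihood estimates (mu, sigma) for a normal distribution."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # divides by n, not n - 1
    return mu, sigma


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical variability scores
mu, sigma = normal_mle(scores)
best = log_likelihood(scores, mu, sigma)
for d_mu, d_sigma in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    # Every perturbed parameter pair gives a lower log-likelihood.
    print(log_likelihood(scores, mu + d_mu, sigma + d_sigma) < best)
```

Note that the MLE of σ divides by n, so it is slightly smaller than the usual sample standard deviation, which divides by n - 1.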
Comparison of the three predictive model based algorithms
◦ Yearly Delta algorithm
   Does not consider the type of vegetation. The same YD score may be an actual change for forests but not for savannas or shrublands.
   Does not consider the average change score value (µ) or the degree of variability in the value (σ)
   YD score = max over i = 1 to n-1 of (annual EVI of current year - annual EVI of previous year)
◦ Variability distribution method
   Considers the type of vegetation. The same VD score may be an actual change for regions such as savannas (having less variation in variability value) but not for shrublands (having high variation in variability value).
   Considers only the average change score value (µ)
   VD score = YD score - µ
◦ Vegetation independent variability distribution algorithm
   Considers the type of vegetation. The VID score works for all the vegetation types.
   Considers both the average change score value (µ) and the degree of variability in the value (σ)
   VID score = (YD score - µ) / σ
Where TPn = true positives,
FPn = false positives,
M = total no of pixels considered
[Figure: performance of the YD, VD and VID scores for the forest region. Green line -> YD score, red line -> VD score, black line -> VID score]
VD and VID give better results than YD.
Reason: The graph corresponds to the forest region only. The MODIS forest map used to detect forest cover pixels is inaccurate and includes some shrubs and agricultural land labeled as forests.
[Figure: performance of the YD, VD and VID scores, same legend]
VID performs slightly worse than VD.
Reason: The initial few years selected to model variability may contain some noise. The mean variability for that location is then modeled as high, and changes in later years go undetected.
[Figure: performance of the YD, VD and VID scores, same legend]
Performance of VID is best.
Reason: Shrubs form the dominant land cover type for California, and they show high variability in the spread of the variability score due to higher sensitivity to climatic variations.
[Figure: performance of the YD, VD and VID scores, same legend]
Performance of YD is exceptionally poor and that of VID is exceptionally good.
Reason: The spread of the variability score varies strongly between locations whose vegetation type is shrubs.
Anomaly detection: Outlier precipitation detection
Input:
◦ South American precipitation dataset in the geoscience format known as NetCDF
Output:
◦ The top k=5 outliers are found for every year
◦ A total of 155 outlier sequences were found over a period of 10 years
Running time of the algorithm is 229 s.
Dataset parameters (Variable: Value)
◦ Num Year Periods: 10
◦ Year Range: 1995-2004
◦ Grid Size: 2.5º×2.5º
◦ Num Latitudes: 31
◦ Num Longitudes: 23
◦ Total Grids: 713
Aim:
◦ To find and track the position of outliers with time
Method description:
◦ Top k outliers are found for every year using Exact-Grid Top-k algorithm
◦ Outliers are tracked using the OutStretch algorithm
◦ The outlier sequences generated are analyzed
How to find the outlier (Exact-Grid Top-k algorithm)
◦ Concept of discrepancy is used
◦ Discrepancy value is assigned to each rectangular region using
Kulldorff’s scan statistic.
◦ Top-k outliers are selected for further processing as it is necessary in
order to track the outliers
How to calculate the discrepancy?
◦ Two parameters are required:
   a measurement m (number of incidences of an event)
   a baseline b (total population at risk)
◦ The measurement M and baseline B values for the whole dataset (U) are calculated as
\[ M = \sum_{p \in U} m(p), \qquad B = \sum_{p \in U} b(p) \]
◦ The measurement and baseline values for the region (R) are calculated as
\[ m_R = \frac{\sum_{p \in R} m(p)}{M}, \qquad b_R = \frac{\sum_{p \in R} b(p)}{B} \]
◦ The discrepancy score of the shaded area is calculated by using the given formula:
\[ d(m_R, b_R) = m_R \log\frac{m_R}{b_R} + (1 - m_R)\log\frac{1 - m_R}{1 - b_R} \]
◦ For the example figure, M=6, B=16, mR = 4/6, and bR = 4/16
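The discrepancy calculation can be reproduced for the example values M=6, B=16, mR=4/6 and bR=4/16. The sketch below is an illustration only: the function names and the grid layout are hypothetical, the natural logarithm is assumed, and a term with a zero numerator is taken as zero.

```python
import math


def _term(x, y):
    # Convention assumed here: 0 * log(0 / y) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x / y)


def discrepancy(m_r, b_r):
    """Kulldorff scan statistic d(m_R, b_R) for one region."""
    if not 0 < b_r < 1:
        raise ValueError("b_r must lie strictly between 0 and 1")
    return _term(m_r, b_r) + _term(1 - m_r, 1 - b_r)


def region_discrepancy(cells, region):
    """cells maps a cell id to (measurement, baseline); region is a set of ids."""
    total_m = sum(m for m, _ in cells.values())
    total_b = sum(b for _, b in cells.values())
    m_r = sum(cells[c][0] for c in region) / total_m
    b_r = sum(cells[c][1] for c in region) / total_b
    return discrepancy(m_r, b_r)


# Hypothetical 4x4 grid: baseline 1 per cell, 6 events, 4 of them in a 2x2 block.
events = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 3), (3, 3)}
cells = {(r, c): (1 if (r, c) in events else 0, 1)
         for r in range(4) for c in range(4)}
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(round(region_discrepancy(cells, block), 4))  # 0.3836
```

A region whose share of events equals its share of the baseline scores zero, so high scores mark regions where events are concentrated or unusually sparse.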
Outstretch algorithm
◦ The region is stretched around each side of the outlier region of the
previous year
◦ Each outlier in the current year is examined to see whether it lies in the region formed by the outlier region of the previous year and its stretched region
◦ If it does, it is added to the child list of the previous year's outlier
RecurseNode algorithm:
◦ All the sequences starting at root node of trees and ending at leaf node
are fetched.
[Figure: outlier region of the previous year surrounded by its stretched region]
[Figure: forest built by applying the Outstretch algorithm recursively; nodes (1,1), (2,2) and (3,2) correspond to one sequence followed by an outlier]
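The tracking step can be sketched as follows. This is a simplified illustration rather than the published Outstretch and RecurseNode code: outliers are assumed to be axis-aligned rectangles (x1, y1, x2, y2), an outlier is assumed to "lie in" the stretched region only when it is fully contained in it, and all names and coordinates are hypothetical.

```python
def stretch(rect, r):
    """Grow a rectangle (x1, y1, x2, y2) by r on every side."""
    x1, y1, x2, y2 = rect
    return (x1 - r, y1 - r, x2 + r, y2 + r)


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def outstretch(outliers_by_year, r):
    """Map each node (year, index) to its children in the following year."""
    children = {}
    for year, rects in enumerate(outliers_by_year, start=1):
        # With 1-based years, list index `year` holds the following year.
        following = outliers_by_year[year] if year < len(outliers_by_year) else []
        for i, rect in enumerate(rects, start=1):
            grown = stretch(rect, r)
            children[(year, i)] = [(year + 1, j)
                                   for j, cand in enumerate(following, start=1)
                                   if contains(grown, cand)]
    return children


def sequences(children):
    """Collect every root-to-leaf path of the forest (the RecurseNode step)."""
    has_parent = {c for kids in children.values() for c in kids}
    paths = []

    def walk(node, path):
        path = path + [node]
        if not children[node]:
            paths.append(path)
        for child in children[node]:
            walk(child, path)

    for root in sorted(n for n in children if n not in has_parent):
        walk(root, [])
    return paths


years = [[(4, 4, 6, 6)],
         [(0, 0, 1, 1), (5, 5, 7, 7)],
         [(0, 5, 2, 7), (6, 6, 8, 8)]]
print(sequences(outstretch(years, r=1)))
```

With these made-up rectangles, the first path returned is (1,1), (2,2), (3,2), mirroring the sequence in the figure; outliers that are never linked form single-node trees.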
Anomaly detection: Outlier time interval detection
Input:
◦ Sea surface temperature (SST) data of Equatorial Pacific Ocean.
◦ The data consisted of measurements of sea surface temperature
for 44 sensors in Pacific Ocean
◦ Each sensor had a time series of 1440 data points.
Output:
◦ Time intervals where spatial neighborhood has shown abnormal
behavior.
Terms:
Spatial distance (sd): Distance between 2 locations based on the distance between their spatial coordinates
\[ sd(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \]
Measurement distance (md): Distance between 2 points based on the difference between the features (measurement attributes) of the 2 points
\[ md(p, q) = \sqrt{\sum_{a=1}^{m} (p_a - q_a)^2} \]
where p and q are 2 locations and m is the number of measurement attributes.
Spatial neighborhood: Cluster of locations such that the spatial distance (sd) and measurement distance (md) between every 2 locations are less than the respective threshold values.
Sum of squared error (SSE): Measure of the degree of abnormality of the interval
\[ SSE = \sum_{bn=1}^{BN} dist(val_{bn}, \mu_{int})^2 \]
where val_bn is each temporal reading in the base interval, BN is the number of readings and µ_int is the mean of the temporal readings in the interval.
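The three quantities defined above are simple to compute. The sketch below assumes Euclidean distances for both sd and md and takes dist in the SSE as the absolute difference between a reading and the interval mean. The helper names and thresholds are hypothetical.

```python
import math


def spatial_distance(p, q):
    """Euclidean distance between the (x, y) coordinates of two locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def measurement_distance(p_features, q_features):
    """Euclidean distance between the measurement attributes of two locations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_features, q_features)))


def sse(readings):
    """Sum of squared deviations of the readings from the interval mean."""
    mean = sum(readings) / len(readings)
    return sum((v - mean) ** 2 for v in readings)


def are_neighbors(p, q, p_features, q_features, sd_max, md_max):
    """Two locations can share a neighborhood if both distances are below threshold."""
    return (spatial_distance(p, q) < sd_max
            and measurement_distance(p_features, q_features) < md_max)


print(spatial_distance((0, 0), (3, 4)))        # 5.0
print(measurement_distance([1, 2], [4, 6]))    # 5.0
print(sse([1, 2, 3]))                          # 2.0
```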
Aim:
◦ To find time intervals where spatial neighborhoods are likely to show
abnormal behavior.
Algorithm:
◦ Time series is first divided into a set of base equal size temporal
intervals
◦ Spatial neighborhoods are found for every base interval
◦ Each spatial node in every base interval is analyzed and binary classified as 1 if it shows abnormal behavior, or 0 otherwise
◦ Count of spatial nodes having a binary error classification of 1 is
found for every base interval and this count is called vote count.
◦ A threshold mv is then applied and those intervals for which votes >
mv are binary classified as 1 and others as 0.
◦ Consecutive base intervals which have same binary classification are
merged to form the larger intervals.
◦ Mean value for each edge is calculated for every interval.
◦ Spatial neighborhoods are calculated for each interval using the
mean value of edge.
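The voting and merging steps of the algorithm above can be illustrated as follows. The sketch covers only the thresholding of vote counts and the merging of consecutive base intervals; the function name, vote counts and threshold are made up.

```python
from itertools import groupby


def merge_base_intervals(vote_counts, mv):
    """Classify base intervals by vote count and merge equal neighbours.

    Returns a list of (first_interval, last_interval, label) tuples,
    where label is 1 when votes > mv and 0 otherwise.
    """
    labels = [1 if votes > mv else 0 for votes in vote_counts]
    merged = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        merged.append((start, start + length - 1, label))
        start += length
    return merged


# Hypothetical vote counts for six base intervals, threshold mv = 3.
print(merge_base_intervals([0, 1, 5, 6, 2, 7], mv=3))
# [(0, 1, 0), (2, 3, 1), (4, 4, 0), (5, 5, 1)]
```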
Location : 0ºN latitude and 110ºW longitude
Time period : 10 day period from 01/01/2004 to 01/10/2004
No of measurements: approx.1400
Agglomerative temporal intervals for SST data
Neighborhood (a) represents cooler water
Neighborhood (b) represents warmer water
Neighborhoods (c) and (d) represent moderate water
• Edge clustering is validated by a satellite image of SST.
• Light regions represent cooler temperatures.
• Dark regions represent warmer temperatures.
[Figure: neighborhood quality (SSE) for each interval]
◦ The SSE of neighborhood (a) shows an interesting pattern between intervals 16 and 19: it goes from high to low and then back from low to high
◦ Neighborhood (a) has more spread during the 16th interval than during the 17th interval
Input:
◦ Land cover type
◦ 8-day composite surface reflectance for the VIS band (CH1) and NIR band (CH2)
◦ CH2-CH1
◦ CH2/CH1
◦ NDVI dataset
◦ Data before the flooding in the Mississippi basin is used as the training dataset
◦ Data after the flooding in the Mississippi basin, which occurred on June 17-19, 2008, is used for testing
Output:
◦ The best attribute (R) for classification, i.e. CH2-CH1, is found.
◦ The threshold values of the best attribute (R) for pure water and pure land are found.
Validation data:
◦ 30 m spatial resolution Landsat TM imagery
Aim:
◦ To find the fraction of water in flood pixels which are usually water
mixed with land cover features for MODIS dataset which has
coarse resolution
Method description:
◦ A decision tree approach is used to find
   the best parameter (predictor) to differentiate between land and water
   the threshold values of the predictor R for pure water (Rwater) and pure land (Rland)
◦ The water fraction (WF) per pixel can be found by comparing the actual value of the predictor (Rmix) with its values for pure water and pure land, with WF as a fraction between 0 and 1 in the mixing equation and as a percentage in the second equation:
\[ R_{mix} = WF \cdot R_{water} + (1 - WF) \cdot R_{land} \]
\[ WF = \frac{R_{land} - R_{mix}}{R_{land} - R_{water}} \times 100 \]
Experimental Results:
Some of the rules used for deciding threshold values are:
◦ (CH2-CH1) > 9.17 -> class Land
◦ (CH2-CH1) <= 2.91 -> class Water
Correlation between TM and MODIS water fractions is 0.97 with
bias of 4.47% and standard deviation of 4.4%.
Decision tree created
using C4.5 algorithm
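The linear mixing relation for the water fraction can be turned into a small function. In this sketch the two rule thresholds quoted above (2.91 and 9.17) are assumed to stand in for the pure-water and pure-land values of the predictor CH2-CH1, and the clamping of the result to the range 0 to 100 is an added assumption. The function name is hypothetical.

```python
def water_fraction(r_mix, r_water=2.91, r_land=9.17):
    """Water fraction in percent for a mixed pixel with predictor value r_mix."""
    if r_land == r_water:
        raise ValueError("pure-land and pure-water values must differ")
    fraction = (r_land - r_mix) / (r_land - r_water) * 100.0
    # Assumption: values outside the pure-water/pure-land range are clamped.
    return max(0.0, min(100.0, fraction))


print(water_fraction(9.17))   # 0.0 (pure land)
print(water_fraction(2.91))   # 100.0 (pure water)
print(water_fraction(6.04))   # approximately 50, halfway between the two
```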
Detection of forest cover per pixel
Input :
◦ Land surface temperature from the monthly composited MOD11C3 product
◦ NDVI and EVI from monthly composited MOD13C2 product
◦ Land cover type from MCD12C1 yearly product
Output:
◦ Fraction of forest cover per pixel
Validation:
◦ Forest cover information from PRODES data at 90 m resolution, available in GeoTiff format, is used for validation purposes
Aim:
◦ To find the forest cover per pixel for the MODIS dataset, which has coarse resolution
◦ The data values for parameters like NDVI, EVI, land surface temperature etc. are available per pixel.
◦ Therefore, the value is affected by the vegetation cover of every point covered in that pixel.
◦ The same parameter value may correspond to a different fraction of forest cover depending on the vegetation type for the whole area of the pixel.
Algorithm:
◦ Modification of the Leeuwen et al. approach.
◦ The Leeuwen et al. approach gives a single logistic regression model for all vegetation types.
◦ The improved algorithm considers the vegetation type and gives an independent logistic regression model for each vegetation type.
Leeuwen et al. approach
Terms:
pit : Fraction of forest cover for pixel i in year t (generated from the analysis of high-resolution Landsat TM images)
Xit : Vector of MODIS observations for pixel i in year t
β: Vector of model parameters (which are estimated from a set of training data)
The vectors Xit and β each have three components:
◦ the first corresponding to a constant intercept term,
◦ the second to an NDVI measurement,
◦ and the third to an LST measurement.
Model:
\[ \ln\left(\frac{p_{it}}{1 - p_{it}}\right) = X_{it}^{T}\beta \]
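Inverting the logistic model gives the predicted forest fraction for a pixel. The sketch below is illustrative only: the coefficient values are invented, not fitted, and the dictionary of per-vegetation-type coefficients merely mimics the idea of one model per vegetation type.

```python
import math


def forest_fraction(x, beta):
    """Invert ln(p / (1 - p)) = x . beta to obtain the forest fraction p."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (intercept, NDVI, LST), one set per vegetation type.
betas = {
    "forest": (-2.0, 6.0, -0.01),
    "cropland": (-4.0, 5.0, -0.005),
}
x = (1.0, 0.7, 300.0)  # constant term, NDVI, LST in kelvin (made-up pixel)
for vegetation, beta in betas.items():
    print(vegetation, round(forest_fraction(x, beta), 3))
```

The same observation vector yields a different forest fraction under each set of coefficients, which is the motivation for learning vegetation-specific models.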
Learning independent regression models requires segmentation of the observation space into multiple categories.
Segmentation is done by partitioning the feature space, which is an n-dimensional space with one feature corresponding to each axis.
Features are selected based on their ability to differentiate between vegetation types.
For example: Forests show a high inter-annual mean of NDVI and EVI and a low inter-annual mean of LST, while their intra-annual variance of NDVI, EVI and LST is low.
Therefore, mean (µ) and variance (σ2) are selected as features.
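One plausible way to compute the two features for a pixel is sketched below. The exact definitions are assumptions made for illustration: the inter-annual mean is taken as the mean of the annual means, and the intra-annual variance as the average of the within-year population variances. The function name and the NDVI values are made up.

```python
from statistics import fmean, pvariance


def vegetation_features(monthly_by_year):
    """Return (inter_annual_mean, intra_annual_variance) for one pixel.

    monthly_by_year is a list of years, each a list of monthly values.
    """
    annual_means = [fmean(year) for year in monthly_by_year]
    inter_annual_mean = fmean(annual_means)
    intra_annual_variance = fmean(pvariance(year) for year in monthly_by_year)
    return inter_annual_mean, intra_annual_variance


# Hypothetical NDVI series: a stable forest pixel and a cropland pixel
# that alternates between bare and vegetated months.
forest = [[0.8] * 12, [0.8] * 12]
cropland = [[0.2, 0.8] * 6, [0.2, 0.8] * 6]
print(vegetation_features(forest))
print(vegetation_features(cropland))
```

The forest pixel lands at a high mean with near-zero variance, while the cropland pixel has a much larger variance, so the two fall into different partitions of the (µ, σ2) feature space.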
[Figure: vegetation type distribution in the feature space (µ, σ2) of NDVI]
◦ Forests show high inter-annual mean and low intra-annual variance
◦ Farms show high intra-annual variance due to crop cycles
◦ Grasslands show high intra-annual variance and high inter-annual mean
◦ Water locations show high intra-annual variance and low inter-annual mean
Analysis of the partition corresponding to the forest vegetation type
[Figure: scatter plot of the residual of the baseline approach against the residual of the vegetation specific approach]
◦ The residual of the vegetation specific approach has lower magnitude than that of the baseline model
◦ Therefore, the vegetation specific approach is better than the baseline model
Analysis of the partition corresponding to the cropland vegetation type
◦ The residual of the vegetation specific model is lower in magnitude than that of the baseline model
Conclusion
Various research works related to anomaly detection and
detection of water fraction or forest cover per pixel have been
discussed.
Most of the research works are pixel-based and do not
consider the spatial neighborhood of a pixel.
Domain knowledge is also required along with data mining
techniques
Future work should address these limitations.
References
[1] Jonathan T. Overpeck, Gerald A. Meehl, Sandrine Bony, David R. Easterling, "Climate data challenges in the 21st century," in Science, 2011.
[2] James H. Faghmous and Vipin Kumar," Spatio-temporal data mining for climate data : Advances,
Challenges and Opportunities," in Data Mining and Knowledge Discovery for Big Data, 2014.
[3] Donglian Sun, Yunyue Yu, Mitchell D. Goldberg ,"Deriving Water Fraction and Flood Maps From
MODIS Images Using Decision Tree Approach," in IEEE Journal of Selected Topics In Applied Earth
Observations And Remote Sensing , 2011.
[4] Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven Klooster," Land
Cover Change Detection : A Case Study," in Knowledge Discovery in Databases Proceedings, 2008.
[5] Elizabeth Wu, Wei Liu, Sanjay Chawla ,"Spatio-Temporal Outlier Detection in Precipitation Data,"
in Knowledge Discovery in Databases Proceedings, 2008.
[6] Hong Yeon Cho, Ji Hee Oh, Kyeong Ok Kim, and Jae Seol Shim, "Outlier Detection and missing
data filling methods for coastal water temperature data," in Journal of Coastal Research, 2013.
[7] C.T.Dhanya and D.Nagesh Kumar, " Data mining for evolution of association rules for droughts
and floods in India using climate inputs," in Journal of Geophysical Research, 2009.
[8] Ruixin Yang, Jiang Tang, and Donglian Sun, " Association Rule Data Mining Applications for
Atlantic Tropical Cyclone Intensity Changes," in Journal of American Meteorological Society, 2011.
[9] James H.Faghmous, Yashu Chamber, Shyam Boriah, Stefan Liess, Vipin Kumar, "A novel and
scalable spatio-temporal technique for ocean eddy monitoring," in Association for Advancement of
Artificial Intelligence, 2012.
[10] Imran Maqsood, Muhammad Riaz Khan, and Ajith Abraham, "An ensemble of neural networks for weather forecasting," in Neural Computing & Applications, 2004.
[11] Agboola A.H., Gabriel A.J., Aliyu E.O., Alese B.K., "Development of a Fuzzy Logic Based
Rainfall Prediction Model," in International Journal of Engineering and Technology, 2013.
[12] Christopher G.Healey, "On the Use of Perceptual Cues and Data Mining for Effective
Visualization of Scientific Datasets," in Proceedings Graphics Interface, 1998.
[13] Wenwen Li, Chaowei Yang, Donglian Sun, "Mining geophysical Parameters through Decision Tree
Analysis to Determine Correlation with Tropical Cyclone Development," in Computers & Geosciences,
2008.
[14]Pinky Saikia Dutta, and Hitesh Tahbilder, "Prediction of Rainfall using Data mining Technique over
Assam," in Indian Journal of Computer Science and Engineering (IJCSE),2014.
[15]Anuj Karpatne, Mace Blank, Michael Lau, Shyam Boriah, Karsten Steinhaeuser, Michael Steinbach
and Vipin Kumar," Importance of Vegetation Type in Forest Cover," in Intelligent Data Understanding,
2012.
[16] James H.Faghmous, Mathew Le, Muhammed Uluyol, Vipin Kumar and Snigdhansu Chatterjee, "A
parameter-free spatio-temporal pattern mining model to catalog ocean dynamics," in IEEE 13th
International Conference on Data Mining, 2013.
[17] Rie Honda and Osamu Konishi, "Temporal rule discovery for Time-Series Satellite Images and
Integration with RDB," in Principles of Data Mining and Knowledge Discovery, Lecture Notes in
Computer Science ,2001.
[18] Pol R. Coppin and Marvin E. Bauer, "Change Detection in Forest Ecosystems with Remote Sensing
Digital Imagery ," in Remote Sensing Reviews, 1996.
[19] Varun Mithal, Ashish Garg, Ivan Brugere, Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven Klooster, "Incorporating Natural Variation Into Time Series-Based Land Cover Change Identification," in Proceedings of the NASA Conference on Intelligent Data Understanding, 2011.
[20] Varun Mithal, Shyam Boriah, Ashish Garg, Michael Steinbach, Vipin Kumar, "Monitoring Global
Forest Cover Using Data Mining," in Journal of Association for Computing Machinery, Volume V, 2010.
[21] D. Agarwal, A. McGregor, J. M. Phillips, S. Venkatasubramanian, and Z. Zhu, "Spatial Scan Statistics: Approximations and Performance Study," in Knowledge Discovery in Databases Proceedings, 2006.
[22] Michael P. McGuire, Vandana P. Janeja, Aryya Gangopadhyay, "Spatiotemporal Neighborhood
Discovery for Sensor Data" in Knowledge Discovery in Databases Proceedings, 2008.
[23]Thijs T. van Leeuwen, Andrew J. Frank, Yufang Jin, Padhraic Smyth, Michael L. Goulden, Guido R.
van der Werf and James T. Randerson, "Optimal use of land surface temperature data to detect changes
in tropical forest cover," in Journal of Geophysical Research: Biogeosciences, 2011.