1

Modeling Evolution in Spatial Datasets

Paul Amalaman
2/17/2012

Dr Eick Christoph
Nouhad Rizk
Zechun Cao
Sujing Wang

Data Mining and Machine Learning Lab Team Members

Anirup Dutta
Swati Goyal
Tarikul Islam
Paul Amalaman

2

I- Background
II- Research Goals
III- Case Study
IV- Summary

3

I-Background

Machine Learning Techniques are mostly used where
• modeling implicit trends is possible (Regression)
• stable patterns exist in the dataset (Classification)

Simulation Systems are used when
• a model is hard to establish
• there is a great degree of randomness in the attribute values
• there are a lot of interactions between objects
• attributes have to be predicted recursively over many steps

Example Applications of Simulation Systems: Traffic Modeling, Weather Forecasting, Social Networks, Urban Modeling

4

I-Background continued(3)

Spatial Simulation Systems

Cellular Automata (CA) (cell-centered approach)

Continuous Agent Space or Multi Agent System (MAS) (agent-centered approach)

ABM

5

I-Background continued(3) Modeling with Cellular Automata

• Concept of neighborhood
• Moore neighborhood
• Von Neumann neighborhood

Moore Neighborhood
http://en.wikipedia.org/wiki/Moore_neighborhood

D(x-1,y-1)  D(x,y-1)  D(x+1,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
D(x-1,y+1)  D(x,y+1)  D(x+1,y+1)

Von Neumann Neighborhood
http://en.wikipedia.org/wiki/Von_Neumann_neighborhood

            D(x,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
            D(x,y+1)
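The two neighborhoods can be written as coordinate offsets around the center cell P(x, y). The following is a minimal Python sketch, not part of the original slides: the names `MOORE_OFFSETS`, `VON_NEUMANN_OFFSETS` and `neighbors` are hypothetical, and dropping cells that fall outside the grid is an assumed boundary rule (wrapping around is another common choice).

```python
# Offsets are (dx, dy) relative to the center cell P(x, y).
# Moore neighborhood: the 8 surrounding cells, diagonals included.
MOORE_OFFSETS = [
    (dx, dy)
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
    if (dx, dy) != (0, 0)
]

# Von Neumann neighborhood: the 4 cells sharing an edge with P(x, y).
VON_NEUMANN_OFFSETS = [(0, -1), (-1, 0), (1, 0), (0, 1)]


def neighbors(x, y, width, height, offsets):
    """Return the in-bounds neighbor coordinates of cell (x, y).

    Assumption: cells outside the grid are simply dropped.
    """
    result = []
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            result.append((nx, ny))
    return result
```

For an interior cell the Moore neighborhood yields 8 coordinates and the Von Neumann neighborhood yields 4; corner and edge cells get fewer under the assumed boundary rule.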

6

I-Background continued(4) Modeling with Cellular Automata

Cellular Automata
• provide the programmer with a cell-centered programming style in which the set of cells represents regularly organized computing units
• run efficiently on parallel architectures
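The cell-centered style can be sketched as a single synchronous update step in which every cell computes its next value from the current grid only. This is an illustrative Python sketch, not code from the slides: the function names are hypothetical, the Moore neighborhood with dropped out-of-grid cells is an assumption, and Conway's Game of Life is used purely as a well-known example rule.

```python
def step(grid, rule):
    """Apply `rule` to every cell at once and return a new grid.

    Each new value depends only on the old grid, so the cells could be
    updated independently (and therefore in parallel).
    """
    height, width = len(grid), len(grid[0])
    new_grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Moore neighborhood; cells outside the grid are ignored.
            neighbor_values = [
                grid[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0)
                and 0 <= x + dx < width
                and 0 <= y + dy < height
            ]
            new_grid[y][x] = rule(grid[y][x], neighbor_values)
    return new_grid


def life_rule(cell, neighbor_values):
    """Example rule (Conway's Game of Life), chosen only for illustration."""
    alive = sum(neighbor_values)
    return 1 if alive == 3 or (cell == 1 and alive == 2) else 0


# A horizontal "blinker" flips to vertical after one step and back after two.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one_step = step(blinker, life_rule)
```

Writing results into a separate `new_grid` rather than updating in place is what keeps the update synchronous: no cell ever reads a value that was already advanced to the next time step.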

7

II-Research Goals

Using Data Mining and Machine Learning Techniques to Enhance Simulation Systems

New approach = Machine Learning Techniques + Spatial Simulation Systems

Goal 1: Grid-based Models for Progression in Spatial Datasets

Goal 2: Development of Cluster-based Bias Removal Methods

8

II-Research Goals continued (1)
Goal 1: Grid-based Models for Progression in Spatial Datasets

\[ y_{i,j,t+1} = f_{ij}(x_{1,1,1,t}, \ldots, x_{1,n,n,t}, \ldots, x_{m,1,1,t}, \ldots, x_{m,n,n,t}, y_{1,1,t}, \ldots, y_{n,n,t}) \]

At t: X1(t), X2(t), ..., Xn(t), Y(t)
At t+Δt: X1(t+Δt)=?, X2(t+Δt)=?, ..., Xn(t+Δt)=?, Y(t+Δt)=?

Given that at t we know all the attribute values, including the output variable Y, can we predict all attribute values at t+1?

Challenges:
1. Many target variables to predict; different variables have to be predicted at different locations
2. Target variables are not independent of each other (e.g. some are auto-correlated)
3. Models have to be used over multiple steps
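The multi-step use of the models (challenge 3) can be sketched as a recursive rollout: every attribute grid at t+1 is predicted from the full state at t, and the prediction is then fed back in as the next input. The Python below is a hedged sketch under assumptions of my own, not the authors' method: the state layout (a dict of n×n grids), the function names, and the toy per-cell models are all hypothetical.

```python
def step(state, models):
    """Predict every attribute grid at t+1 from the full state at t.

    `state` maps an attribute name to an n x n grid (list of lists).
    `models` maps the same names to a function (i, j, state) -> value,
    playing the role of f_ij: it may read any cell of any attribute.
    """
    n = len(next(iter(state.values())))
    return {
        name: [[models[name](i, j, state) for j in range(n)] for i in range(n)]
        for name in state
    }


def rollout(state, models, steps):
    """Apply `step` recursively; predictions become the next inputs."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state, models)
        trajectory.append(state)
    return trajectory


# Toy models (assumptions for illustration only):
# x1 persists unchanged, y moves halfway toward x1 in the same cell.
toy_models = {
    "x1": lambda i, j, s: s["x1"][i][j],
    "y": lambda i, j, s: 0.5 * s["y"][i][j] + 0.5 * s["x1"][i][j],
}
initial = {
    "x1": [[2.0, 2.0], [2.0, 2.0]],
    "y": [[0.0, 0.0], [0.0, 0.0]],
}
trajectory = rollout(initial, toy_models, steps=2)
```

Because each step consumes the previous step's output, any error made by a model at one step is carried into all later steps, which is why models that are accurate for a single step may still degrade over a long horizon.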

9

II-Research Goals continued (2)
Goal 2: Development of Cluster-based Bias Removal Methods

EPA prediction models are meteorological and chemical transport models. These models are derived from solving differential equations. Over time, the model bias grows larger.

http://www.epa.gov/AMD/CMAQ/ch06.pdf

Diagram 1: Input x → Model → Output + bias b(x)

Diagram 2: Input x → Model → Output Correction (bias removal), fed by b(x) and by group(x) from weather pattern recognition → Output h(b(x), group(x))

Bias removal based on weather pattern recognition: our model h learns group(x) and b(x) and makes better predictions.
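One simple way to read the correction step is as a per-group bias estimate that is subtracted from the model output. The Python sketch below is a deliberate simplification and an assumption, not the authors' model h: it estimates b(x) as the mean historical error within each group, and `wind_group` is a hypothetical stand-in for a learned clustering of weather patterns. All numbers are made up for illustration.

```python
from collections import defaultdict


def fit_group_bias(history, group):
    """Estimate the mean bias (model output minus observation) per group.

    `history` is an iterable of (x, model_output, observed) triples.
    `group` maps an input x to a group label.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, model_output, observed in history:
        label = group(x)
        sums[label] += model_output - observed
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def correct(x, model_output, group, group_bias):
    """Subtract the bias of x's group; unseen groups are left uncorrected."""
    return model_output - group_bias.get(group(x), 0.0)


def wind_group(x):
    """Hypothetical grouping rule standing in for weather pattern clusters."""
    return "calm" if x["wind"] < 5.0 else "windy"


# Made-up history: (input, model output, observed value).
history = [
    ({"wind": 1.0}, 60.0, 50.0),
    ({"wind": 2.0}, 70.0, 62.0),
    ({"wind": 8.0}, 40.0, 42.0),
]
group_bias = fit_group_bias(history, wind_group)
corrected = correct({"wind": 3.0}, 65.0, wind_group, group_bias)
```

Here the model overpredicts by 9.0 on average in the "calm" group and underpredicts by 2.0 in the "windy" group, so a new calm-day output of 65.0 is corrected to 56.0. A single global correction would blur these two opposite errors together, which is the motivation for grouping first.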

10

III-Case Study

Improving Ozone Forecasting for Houston-Galveston Area
Goal 1: Development of a Grid-based Prediction Framework
Goal 2: Development of Cluster-based Bias Removal Methods

In collaboration with UH-IMAQS, Institute for Multidimensional Air Quality Studies (UH Department of Earth and Atmospheric Science)
- Dr Rappenglueck, Bernhard
- Dr Li, Xiangshang

11

III-Case Study Continued(1)

Ozone Prediction
Goal 1: Improving Prediction for Spatial Progression
Given what happened at t, can we predict what happens at t+Δ, t+2Δ, ...?

12

III-Case Study Continued(2)
Ozone Prediction

Goal 2: Improving Forecast Accuracy

13

III-Case Study Continued(2)

Status of Dissertation

• Methods to collect ozone data and to capture it in a relational database have been developed.

• The necessary knowledge for simulation-based prediction systems in general, and ozone prediction in particular, has been obtained.

• Work has started on different modeling approaches for grid-based prediction.

14

IV-SUMMARY

15

Thank you!