Spatial Dependency Modeling Using Spatial Auto-Regression


Mete Celik 1,3, Baris M. Kazar 4, Shashi Shekhar 1,3, Daniel Boley 1, David J. Lilja 1,2

1 CSE Department @ University of Minnesota, Twin Cities
2 ECE Department @ University of Minnesota, Twin Cities
3 Army High Performance Computing Research Center
4 Oracle USA


07/08/2006 Spatial Dependency Modeling Using SAR 2

Outline of Today’s Talk

• Motivation & Background

• Problem Definition

• Related Work & Contributions

• Proposed Approach

• Experimental Evaluation

• Conclusion & Future Work


Motivation

• Widespread use of spatial databases
  – Mining spatial patterns
  – The 1855 Asiatic Cholera in London [Griffith]
• Fair Lending [NYT, R. Nader]
  – Correlation of bank locations with loan activity in poor neighborhoods
• Retail Outlets [NYT, Walmart, McDonald's, etc.]
  – Determining locations of stores by relating neighborhood maps with customer databases
• Crime Hot Spot Analysis [NYT, NIJ CML]
  – Explaining clusters of sexual assaults by locating addresses of sex offenders
• Ecology [Uygar]
  – Explaining location of bird nests based on structural environmental variables


Spatial Auto-correlation (SA)

• Random Distributed Data (no SA): spatial distribution satisfying the assumptions of classical data
• Cluster Distributed Data: spatial distribution NOT satisfying the assumptions of classical data

[Figure: "Random Nest Locations" (pixel property with independent identical distribution) vs. "Cluster Nest Locations" (pixel property with spatial auto-correlation)]


Execution Trace

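For a p-by-q grid, the 4-neighborhood of cell (i, j) is

$$\text{neighbors}(i,j) = \begin{cases} (i-1,\,j) & \text{NORTH}, \quad 2 \le i \le p,\; 1 \le j \le q \\ (i,\,j+1) & \text{EAST}, \quad 1 \le i \le p,\; 1 \le j \le q-1 \\ (i+1,\,j) & \text{SOUTH}, \quad 1 \le i \le p-1,\; 1 \le j \le q \\ (i,\,j-1) & \text{WEST}, \quad 1 \le i \le p,\; 2 \le j \le q \end{cases}$$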

W allows other neighborhood definitions:
• distance based
• 8-neighbors

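Space + 4-neighborhood (cells numbered row by row):

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16

Given:
• Spatial framework
• Attributes

Binary W (16-by-16; the 6th row, for cell 6, has 1s at its neighbors 2, 5, 7, 10):

0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0
0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0

Row-normalized W (each row of binary W divided by its row sum; the 6th row has 1/4 at columns 2, 5, 7, 10):

0 1/2 0 0 1/2 0 0 0 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0 0
0 1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0
0 0 1/2 0 0 0 0 1/2 0 0 0 0 0 0 0 0
1/3 0 0 0 0 1/3 0 0 1/3 0 0 0 0 0 0 0
0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0 0
0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0
0 0 0 1/3 0 0 1/3 0 0 0 0 1/3 0 0 0 0
0 0 0 0 1/3 0 0 0 0 1/3 0 0 1/3 0 0 0
0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0
0 0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0
0 0 0 0 0 0 0 1/3 0 0 1/3 0 0 0 0 1/3
0 0 0 0 0 0 0 0 1/2 0 0 0 0 1/2 0 0
0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3 0
0 0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3
0 0 0 0 0 0 0 0 0 0 0 1/2 0 0 1/2 0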
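Both matrices above can be generated mechanically from the NORTH/EAST/SOUTH/WEST rule. The sketch below is illustrative, not the authors' Matlab code: the function names are hypothetical, cells are assumed to be numbered row by row as in the 4-by-4 example, and the `eight=True` switch (adding the four diagonal cells) is an assumed way to get the 8-neighborhood mentioned above. Exact fractions are used so the row-normalized entries print as 1/4 rather than 0.25.

```python
from fractions import Fraction


def grid_neighbors(p, q, eight=False):
    """Neighbor lists for a p-by-q grid, cells numbered 1..p*q row by row.

    The 4-neighborhood follows the NORTH/EAST/SOUTH/WEST cases; eight=True
    also adds the four diagonal cells (an 8-neighborhood).
    """
    offsets = [(-1, 0), (0, 1), (1, 0), (0, -1)]          # N, E, S, W
    if eight:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # diagonals
    nbrs = {}
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cells = [(i + di - 1) * q + (j + dj)
                     for di, dj in offsets
                     if 1 <= i + di <= p and 1 <= j + dj <= q]
            nbrs[(i - 1) * q + j] = sorted(cells)
    return nbrs


def binary_w(p, q, eight=False):
    """n-by-n 0/1 matrix with W[s][t] = 1 iff cell t+1 is a neighbor of cell s+1."""
    n = p * q
    w = [[0] * n for _ in range(n)]
    for cell, adj in grid_neighbors(p, q, eight).items():
        for other in adj:
            w[cell - 1][other - 1] = 1
    return w


def row_normalized_w(p, q, eight=False):
    """Divide each row of the binary W by its row sum (exact fractions)."""
    return [[Fraction(v, sum(row)) for v in row] for row in binary_w(p, q, eight)]


if __name__ == "__main__":
    b = binary_w(4, 4)
    print("".join(map(str, b[5])))                           # 6th row of binary W
    print([str(v) for v in row_normalized_w(4, 4)[5] if v])  # nonzeros of 6th row
```

Run as a script, this prints the 6th row of the binary W as `0100101001000000` and the nonzeros of the row-normalized 6th row as four `1/4` entries.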


• Linear Regression → SAR
• The spatial auto-regression (SAR) model has higher accuracy and removes the IID assumption of linear regression

Linear regression: $y = x\beta + \varepsilon$
SAR: $y = \rho W y + x\beta + \varepsilon$

SDM Provides a Better Model!
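The term ρWy is what makes SAR data spatially dependent: each y value leans toward the average of its neighbors. The following sketch draws synthetic data from the SAR equation on a small grid. Assumptions (not from the slides): a dense row-normalized 4-neighbor W, Gaussian noise with σ = 1, hypothetical function names, and fixed-point iteration in place of a direct solve of (I − ρW)y = xβ + ε. The iteration converges because, for a row-normalized W and 0 ≤ ρ < 1, the map y ↦ ρWy + c shrinks max-norm distances by a factor ρ.

```python
import random


def row_normalized_w(p, q):
    """Dense row-normalized 4-neighbor W for a p-by-q grid (cells row by row)."""
    n = p * q
    w = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(q):
            nbrs = [(a, b) for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1))
                    if 0 <= a < p and 0 <= b < q]
            for a, b in nbrs:
                w[i * q + j][a * q + b] = 1.0 / len(nbrs)
    return w


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def simulate_sar(w, x, beta, rho, sigma=1.0, seed=0, tol=1e-12, max_iter=100_000):
    """Draw y satisfying y = rho*W*y + x*beta + eps, eps ~ N(0, sigma^2 I).

    For a row-normalized W and 0 <= rho < 1 the map y -> rho*W*y + c is a
    contraction in the max-norm, so fixed-point iteration converges to
    y = (I - rho*W)^{-1} c.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(len(w))]
    c = [sum(xik * bk for xik, bk in zip(xi, beta)) + e for xi, e in zip(x, eps)]
    y = c[:]
    for _ in range(max_iter):
        y_new = [rho * wy + ci for wy, ci in zip(matvec(w, y), c)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new, eps
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    w = row_normalized_w(5, 5)
    x = [[1.0, float(s % 5)] for s in range(25)]     # intercept + one covariate
    y, _ = simulate_sar(w, x, beta=[2.0, 0.5], rho=0.7, seed=1)
    print([round(v, 3) for v in y[:5]])
```

Setting `rho=0.0` in the same call reduces the model to ordinary linear regression with IID errors, which is the contrast drawn on this slide.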


Data Structures in SAR Model

• Vectors: y, β, ε
• Matrices: W, x
• W is a large matrix

$$y = \rho W y + x\beta + \varepsilon$$

Dimensions: y is n-by-1, ρ is 1-by-1, W is n-by-n, x is n-by-k, β is k-by-1, ε is n-by-1.
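How large is W, and how sparse? For a 4-neighborhood, each horizontal or vertical adjacency between two cells puts two nonzeros into W, so a p-by-q grid gives 2(p(q−1) + q(p−1)) nonzeros out of n² entries. The sketch below assumes square grids for the problem sizes n = 400, 1600, 2500 used in the experiments (the slides do not state the grid shape); the function name is hypothetical.

```python
def grid_w_nonzeros(p, q):
    """Nonzeros of the 4-neighbor W on a p-by-q grid.

    Each horizontal or vertical adjacency between two cells contributes two
    entries, W[s][t] and W[t][s].
    """
    horizontal = p * (q - 1)
    vertical = q * (p - 1)
    return 2 * (horizontal + vertical)


if __name__ == "__main__":
    for side in (20, 40, 50):          # assumed square grids: n = 400, 1600, 2500
        n = side * side
        nnz = grid_w_nonzeros(side, side)
        print(f"n={n}: dense entries={n * n}, nonzeros={nnz}, fraction={nnz / (n * n):.4%}")
```

For n = 2500 this gives 9,800 nonzeros out of 6,250,000 entries (about 0.16%), which is why storing W densely is wasteful.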


Computational Challenge

• Maximum-likelihood (ML) estimation = minimizing the negative log-likelihood function

$$\min_{|\rho|<1} \; \underbrace{-\frac{2}{n}\ln|A|}_{\text{log-det term}} + \underbrace{\ln\left(B^{T}B\right)}_{\text{SSE term}} \qquad \text{(Theorem 1)}$$

$$A = I - \rho W, \qquad B = \left[I - x\left(x^{T}x\right)^{-1}x^{T}\right](I - \rho W)\,y$$

where ρ is the spatial auto-regression (auto-correlation) parameter and W is the n-by-n neighborhood matrix over the spatial framework.

• Solving the SAR model
  – ρ = 0 → least-squares problem
  – β = 0, ε = 0 → eigenvalue problem
  – General case → computationally expensive due to the log-det term in the ML function
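The ML objective above can be evaluated and minimized directly on a small example. This is a minimal sketch under stated assumptions: dense pure-Python matrices, a 6-by-6 grid, an exact LU-based log-det (the expensive term the approximations aim to replace), and a golden section search over ρ ∈ [0, 0.99]. The golden section search mirrors the one named in the Taylor and Chebyshev pipelines. Once ρ̂ is found, β̂ = (xᵀx)⁻¹xᵀ(I − ρ̂W)y. All function names are hypothetical.

```python
import math
import random


def row_normalized_w(p, q):
    """Dense row-normalized 4-neighbor W for a p-by-q grid (cells row by row)."""
    n = p * q
    w = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(q):
            nbrs = [(a, b) for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1))
                    if 0 <= a < p and 0 <= b < q]
            for a, b in nbrs:
                w[i * q + j][a * q + b] = 1.0 / len(nbrs)
    return w


def gauss(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting; also return ln|det a|."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z, logdet


def i_minus_rho_w(w, rho):
    n = len(w)
    return [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]


def ols(x, v):
    """(x^T x)^{-1} x^T v via the normal equations."""
    k = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(k)] for a in range(k)]
    xtv = [sum(row[a] * vi for row, vi in zip(x, v)) for a in range(k)]
    return gauss(xtx, xtv)[0]


def objective(rho, w, x, y):
    """-(2/n) ln|A| + ln(B^T B), with A = I - rho W and B = [I - x(x^T x)^{-1} x^T] A y."""
    a = i_minus_rho_w(w, rho)
    ay = [sum(aij * yj for aij, yj in zip(row, y)) for row in a]
    beta = ols(x, ay)
    b = [v - sum(xk * bk for xk, bk in zip(row, beta)) for v, row in zip(ay, x)]
    _, logdet = gauss(a, [0.0] * len(y))          # exact log-det (LU), O(n^3)
    return -2.0 / len(y) * logdet + math.log(sum(e * e for e in b))


def golden_section(f, lo, hi, tol=1e-5):
    """Minimize f on [lo, hi], assuming f is unimodal there."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                                  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:                                        # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def fit_sar(w, x, y):
    """rho_hat by golden section search on [0, 0.99]; beta_hat = (x^T x)^{-1} x^T (I - rho_hat W) y."""
    rho = golden_section(lambda r: objective(r, w, x, y), 0.0, 0.99)
    a = i_minus_rho_w(w, rho)
    return rho, ols(x, [sum(aij * yj for aij, yj in zip(row, y)) for row in a])


if __name__ == "__main__":
    rng = random.Random(7)
    w = row_normalized_w(6, 6)
    x = [[1.0, rng.uniform(0.0, 10.0)] for _ in range(36)]
    c = [1.0 + 0.5 * row[1] + rng.gauss(0.0, 1.0) for row in x]   # x beta + eps
    y, _ = gauss(i_minus_rho_w(w, 0.6), c)                          # y = (I - 0.6 W)^{-1} c
    print(fit_sar(w, x, y))
```

Every objective evaluation here pays an O(n³) factorization just for the log-det term. That cost is the motivation for approximating ln|I − ρW| instead.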


Outline

• Motivation & Background

• Problem Definition

• Related Work & Contributions

• Proposed Approach

• Experimental Evaluation

• Conclusion & Future Work


Problem Statement

Given:
• A spatial framework S consisting of sites {s1, …, sq} for an underlying geographic space G
• A collection of explanatory functions f_{x_k}: S → R_k, k = 1, …, K, where R_k is the range of possible values for the k-th explanatory function
• A dependent function f_y: S → R_y
• A family F (the SAR equation) of learning model functions mapping R_1 × … × R_K → R_y
• A neighborhood relationship (4- and 8-neighbor) on the spatial framework

Find:
• The SAR parameter ρ and the regression coefficient vector β with a desired precision, while saving log-det computations


Problem Statement – Cont’d

Objective: Algebraic error ranking of approximate SAR model solutions.

Constraints:
• S is a multi-dimensional Euclidean space.
• The values of the explanatory variables x and the dependent function (observed variable) y may not be independent with respect to those of nearby spatial sites, i.e., spatial autocorrelation exists.
• The domains of x and y are the real numbers.
• The SAR parameter ρ varies in the range [0,1).
• The error ε is IID and normally distributed with zero mean and unit standard deviation, i.e., ε ~ N(0, σ²I) with σ = 1.
• The neighborhood matrix W exhibits sparsity.


Related Work

SAR solution methods, grouped by estimation framework (Maximum Likelihood vs. Bayesian) and by exact vs. approximate estimate:

Maximum Likelihood
• Eigen-value based 1-D Surface Partitioning [Li96, Kazar03-04] (exact estimate)
• Direct Sparse Matrix Algorithms [Pace97, Kazar05] (exact estimate)
• Characteristic Poly. [Smirnov01]
• Matrix Exponential Specification [Pace00]
• Graph Theory [Pace00]
• Taylor Series [Martin93, Kazar04, Shekhar04]
• Chebyshev Poly. [Pace02, Kazar04, Shekhar04]
• NORTHSTAR [Kazar05-06]
• Semiparametric Estimates [Pace02]
• Double Bounded Likelihood Estimator [Pace04]
• Upper & Lower Bounds via Div&Conq [Pace03]
• SAR Local Estimation [Pace03]
• Gauss-Lanczos [Bai, Golub98, Kazar05-06]

Bayesian
• Exact estimate: None
• Matrix Exponential Specification [LeSage00]
• MCMC [Barry99, LeSage00]


Contributions

• A new approximate SAR model solution: the Gauss-Lanczos approximation method
  – Key idea: do not find all of the eigenvalues of W
• Error ranking of approximate SAR model solutions

[Equation: error-ranking expression in terms of f(ρ | y) and its derivative d f(ρ | y)/dρ]


Outline

• Motivation & Background

• Problem Definition

• Related Work & Contributions

• Proposed Approach

• Experimental Evaluation

• Conclusion & Future Work


Gauss-Lanczos Approximation

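$$\ln|I - \rho W| = \operatorname{tr}\left(\ln(I - \rho W)\right) \approx \frac{n}{m}\sum_{i=1}^{m} I_r^{(i)}$$

where I_r^{(i)} is the r-step Gauss quadrature estimate obtained for the i-th of m random starting vectors.

• The log-det is approximated by transforming the eigenvalue problem into a quadratic form.
• Finally, Gauss-type quadrature rules are applied using the Lanczos procedure.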
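The step from the log-det to a quadratic form, and from the quadratic form to quadrature, can be written out as below. This follows the standard Golub–Meurant / Bai–Golub argument behind the [Bai, Golub98] citation rather than transcribing the paper's Algorithm 3.2. The symbols z, u, μ_k, θ_j, τ_j and the choice of random ±1 vectors are assumptions introduced for illustration. The symmetric Lanczos process can be used because the symmetrized W̃ = D^{-1/2} B D^{-1/2} (B the binary W, D its diagonal of row sums) is similar to the row-normalized W = D^{-1} B and so has the same eigenvalues.

$$
\begin{aligned}
\ln|I-\rho W| &= \ln|I-\rho\tilde{W}| = \operatorname{tr}\left(\ln(I-\rho\tilde{W})\right) = \mathbb{E}\left[z^{T}\ln(I-\rho\tilde{W})\,z\right], \quad z \in \{-1,+1\}^{n} \\
z^{T}\ln(A)\,z &= n\,u^{T}\ln(A)\,u = n\sum_{k=1}^{n}\mu_k^{2}\ln\lambda_k, \quad A = I-\rho\tilde{W} = Q\Lambda Q^{T},\; u = z/\sqrt{n},\; \mu = Q^{T}u \\
u^{T}\ln(A)\,u &\approx I_r = \sum_{j=1}^{r}\tau_j^{2}\ln\theta_j, \quad \theta_j = \text{eigenvalues of } T_r,\; \tau_j = \text{first component of the } j\text{-th eigenvector of } T_r \\
\ln|I-\rho W| &\approx \frac{n}{m}\sum_{i=1}^{m} I_r^{(i)}
\end{aligned}
$$

The first line is the Hutchinson-type identity E[zᵀFz] = tr(F) for independent ±1 entries. The third line replaces the sum over all n eigenvalues of A with an r-point Gauss rule whose nodes and weights come from the r-by-r Lanczos matrix T_r, so no full eigendecomposition of W is needed.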


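How Does the GL Method Work?

r Lanczos steps produce the r-by-r symmetric tridiagonal matrix

$$T_r = \begin{bmatrix} \alpha_1 & \beta_1 & 0 & \cdots & 0 \\ \beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \beta_{r-2} & \alpha_{r-1} & \beta_{r-1} \\ 0 & \cdots & 0 & \beta_{r-1} & \alpha_r \end{bmatrix}$$

• GL (Algorithm 3.2) is repeated m times (m = 400 in our experiments).
• The parameter r varies between 5 and 8 in our experiments.
• For large problem sizes, the choice of m and r has little effect on solution quality.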
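The GL loop (m random starting vectors, r Lanczos steps each, a small eigenproblem on T_r) can be sketched in pure Python as below. Assumptions not taken from the slides: random ±1 starting vectors from Python's `random` module, a dense 6-by-6 grid, the symmetrized W̃ = D^{-1/2} B D^{-1/2} (which has the same log-det as the row-normalized W), no reorthogonalization in Lanczos, and a small cyclic Jacobi solver for the eigenpairs of T_r. All function names are hypothetical. Treat it as a sketch of the technique, not the authors' implementation.

```python
import math
import random


def symmetrized_w(p, q):
    """W~ = D^{-1/2} B D^{-1/2} for the binary 4-neighbor B of a p-by-q grid (similar to D^{-1} B)."""
    n = p * q
    nbrs = [[] for _ in range(n)]
    for i in range(p):
        for j in range(q):
            for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
                if 0 <= a < p and 0 <= b < q:
                    nbrs[i * q + j].append(a * q + b)
    w = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for t in nbrs[s]:
            w[s][t] = 1.0 / math.sqrt(len(nbrs[s]) * len(nbrs[t]))
    return w


def jacobi_eigen(t, sweeps=60):
    """Cyclic Jacobi: eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    k = len(t)
    a = [row[:] for row in t]
    v = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j) < 1e-24:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                tn = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(tn * tn + 1.0)
                s = tn * c
                for r in range(k):                       # A <- A J
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):                       # A <- J^T A and V <- V J
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p], v[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    return [a[i][i] for i in range(k)], v


def lanczos(a, u, r):
    """r Lanczos steps on symmetric a from unit vector u; returns the diagonals of T_r."""
    n = len(u)
    alphas, betas = [], []
    v_prev, v, beta = [0.0] * n, u[:], 0.0
    for step in range(r):
        w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        alphas.append(alpha)
        if step == r - 1:
            break
        w = [wi - alpha * vi - beta * pv for wi, vi, pv in zip(w, v, v_prev)]
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < 1e-12:                 # Krylov space exhausted: T is already exact
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas


def slq_logdet(w_sym, rho, m=400, r=6, seed=0):
    """ln|I - rho W| ~= (n/m) * sum_i I_r^(i), with I_r = sum_j tau_j^2 ln(theta_j)."""
    rng = random.Random(seed)
    n = len(w_sym)
    a = [[(1.0 if i == j else 0.0) - rho * w_sym[i][j] for j in range(n)] for i in range(n)]
    total = 0.0
    for _ in range(m):
        u = [rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]   # z / ||z||
        alphas, betas = lanczos(a, u, r)
        k = len(alphas)
        t = [[0.0] * k for _ in range(k)]
        for i in range(k):
            t[i][i] = alphas[i]
            if i < k - 1:
                t[i][i + 1] = t[i + 1][i] = betas[i]
        thetas, vecs = jacobi_eigen(t)
        total += sum(vecs[0][j] ** 2 * math.log(thetas[j]) for j in range(k))
    return n * total / m


if __name__ == "__main__":
    print(slq_logdet(symmetrized_w(6, 6), 0.5, m=400, r=6, seed=1))
```

Only r-by-r eigenproblems are solved (here 6-by-6), never the full n-by-n one. The estimate is random, so its quality depends on m, r, and the random number generator, as the conclusions note.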


Taylor’s Series Approximation

• Log-det term in terms of Taylor's series
  – Trace is the sum of eigenvalues, and W̃ is the symmetrized neighborhood matrix

$$\ln|I - \rho W| \approx -\sum_{k=1}^{q} \frac{\rho^{k}\,\operatorname{tr}\left(W^{k}\right)}{k}$$

[Figure: computation pipeline. Inputs W̃, W, ρ, x, y. Taylor's series expansion applied to ln|I − ρW|; calculate ML function; golden section search for the best-fit ML function value; SSE stage (Stage C): one dense (n-by-n) matrix and (n-by-1) vector multiplication, 2 dense (n-by-k) matrix and (n-by-1) vector multiplications, 3 (n-by-1) vector dot products, and a scalar operation; a step labelled "Similar to Stages A & B"; outputs ρ̂, β̂, σ̂².]
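The Taylor's series formula only needs the traces tr(W^k). The sketch below forms the powers of W explicitly, so it uses q − 1 dense products, and compares the result with an exact LU-based log-det on a 4-by-4 grid. Assumptions: dense pure-Python matrices, a row-normalized 4-neighbor W, and hypothetical function names. The explicit powering is chosen for clarity and is not the authors' Matlab code. The truncation error shrinks like ρ^(q+1), so high spatial autocorrelation (ρ near 1) needs many more terms.

```python
import math


def row_normalized_w(p, q):
    """Dense row-normalized 4-neighbor W for a p-by-q grid (cells row by row)."""
    n = p * q
    w = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(q):
            nbrs = [(a, b) for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1))
                    if 0 <= a < p and 0 <= b < q]
            for a, b in nbrs:
                w[i * q + j][a * q + b] = 1.0 / len(nbrs)
    return w


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def taylor_logdet(w, rho, q):
    """ln|I - rho W| ~= -sum_{k=1}^{q} rho^k tr(W^k) / k, using q - 1 dense products."""
    n = len(w)
    power = [row[:] for row in w]            # W^1
    total = 0.0
    for k in range(1, q + 1):
        total -= rho ** k * sum(power[i][i] for i in range(n)) / k
        if k < q:
            power = matmul(power, w)         # W^(k+1)
    return total


def exact_logdet(a):
    """Reference ln|det a| via LU with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


if __name__ == "__main__":
    w = row_normalized_w(4, 4)
    for rho in (0.3, 0.6, 0.9):
        a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(16)] for i in range(16)]
        print(rho, exact_logdet(a), taylor_logdet(w, rho, q=10))
```

On a grid (a bipartite graph) tr(W^k) is zero for odd k and positive for even k. So every retained term is negative, and the truncated sum approaches the exact value from above.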


Chebyshev Polynomial Approximation

• Log-det term in terms of Chebyshev polynomials
  – Trace is the sum of eigenvalues, the T's are matrix polynomials, and the c's are Chebyshev polynomial coefficients

$$\ln|I - \rho W| \approx \sum_{k=1}^{q+1} c_k(\rho)\,\operatorname{tr}\left(T_{k-1}(\tilde{W})\right) - \frac{n}{2}\,c_1(\rho)$$

[Figure: computation pipeline. Inputs W, W̃, ρ, x, y. Chebyshev polynomial approximation applied to ln|I − ρW|: Chebyshev coefficients c_j(ρ), q−1 dense n-by-n matrix-matrix multiplications, and the trace of an n-by-n dense matrix; calculate ML function; golden section search for the best-fit ML function value; SSE stage (Stage C): one dense (n-by-n) matrix and (n-by-1) vector multiplication, 2 dense (n-by-k) matrix and (n-by-1) vector multiplications, 3 (n-by-1) vector dot products, and a scalar operation; a step labelled "Similar to Stages A & B"; outputs ρ̂, β̂, σ̂².]
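The Chebyshev formula can be sketched as follows. The coefficients c_k(ρ) of f(x) = ln(1 − ρx) on [−1, 1] are computed at Chebyshev nodes, using the Numerical Recipes convention in which the constant term is halved. The matrix polynomials T_j(W̃) are built with the three-term recurrence T_j = 2W̃T_{j−1} − T_{j−2}, which uses exactly q − 1 dense matrix-matrix products. Assumptions: dense pure-Python matrices, the symmetrized W̃ = D^{-1/2} B D^{-1/2} (eigenvalues in [−1, 1], same log-det as the row-normalized W), a 4-by-4 grid, and hypothetical function names.

```python
import math


def symmetrized_w(p, q):
    """W~ = D^{-1/2} B D^{-1/2} for the binary 4-neighbor B of a p-by-q grid (similar to D^{-1} B)."""
    n = p * q
    nbrs = [[] for _ in range(n)]
    for i in range(p):
        for j in range(q):
            for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
                if 0 <= a < p and 0 <= b < q:
                    nbrs[i * q + j].append(a * q + b)
    w = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for t in nbrs[s]:
            w[s][t] = 1.0 / math.sqrt(len(nbrs[s]) * len(nbrs[t]))
    return w


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def chebyshev_coeffs(rho, q):
    """c_1..c_{q+1} (list index 0..q) of f(x) = ln(1 - rho*x) on [-1, 1], via Chebyshev nodes."""
    nn = q + 1
    nodes = [math.cos(math.pi * (k + 0.5) / nn) for k in range(nn)]
    return [2.0 / nn * sum(math.log(1.0 - rho * x) * math.cos(j * math.acos(x)) for x in nodes)
            for j in range(nn)]


def chebyshev_logdet(w_sym, rho, q):
    """ln|I - rho W| ~= sum_k c_k tr(T_{k-1}(W~)) - (n/2) c_1, using q - 1 dense products."""
    n = len(w_sym)
    c = chebyshev_coeffs(rho, q)
    t_prev = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # T_0 = I
    t_cur = [row[:] for row in w_sym]                                     # T_1 = W~
    total = c[0] * n - 0.5 * c[0] * n                                     # tr(T_0) = n
    if q >= 1:
        total += c[1] * sum(t_cur[i][i] for i in range(n))
    for j in range(2, q + 1):
        prod = matmul(w_sym, t_cur)                      # T_j = 2 W~ T_{j-1} - T_{j-2}
        t_prev, t_cur = t_cur, [[2.0 * prod[i][k] - t_prev[i][k] for k in range(n)]
                                for i in range(n)]
        total += c[j] * sum(t_cur[i][i] for i in range(n))
    return total


def exact_logdet(a):
    """Reference ln|det a| via LU with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


if __name__ == "__main__":
    w = symmetrized_w(4, 4)
    for rho in (0.3, 0.6, 0.9):
        a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(16)] for i in range(16)]
        print(rho, exact_logdet(a), chebyshev_logdet(w, rho, q=10))
```

Symmetrizing W is what makes the Chebyshev expansion safe here: the recurrence is only well behaved when the eigenvalues lie in [−1, 1], which holds for W̃.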


Outline

• Motivation & Background

• Problem Definition

• Related Work & Contributions

• Proposed Approach

• Experimental Evaluation

• Conclusion & Future Work


Experiment Design

Factor Name: Parameter Domain
• Problem Size (n): 400, 1600, 2500 observation points
• Neighborhood Structure: 2-D with 4-neighbors
• Candidates: Exact Approach (Eigenvalue Based); Taylor's Series Approximation; Chebyshev Polynomial Approximation; Gauss-Lanczos Approximation
• Dataset: Synthetic Dataset for ρ = 0.1, 0.2, …, 0.9
• SAR Parameter ρ: [0,1)
• Programming Language: Matlab


Exact and Approximate Values of Log-det

• GL gives a better approximation as spatial autocorrelation increases


Absolute Relative Error of Approximations

• The absolute relative error of the approximation goes down as spatial autocorrelation increases (GL mean error 0.9%, GL max error 1.78%)
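The slides do not spell out the error metric. The sketch below assumes the usual definition |approx − exact| / |exact|, applied to the log-det for each ρ and then summarized by mean and maximum over ρ = 0.1, …, 0.9. Both the definition and the aggregation are assumptions, and the function names are hypothetical.

```python
def absolute_relative_error(approx, exact):
    """Absolute relative error |approx - exact| / |exact|, in percent."""
    return 100.0 * abs(approx - exact) / abs(exact)


def summarize_errors(pairs):
    """Mean and max absolute relative error (%) over (approx, exact) log-det pairs,
    e.g. one pair per rho = 0.1, ..., 0.9."""
    errors = [absolute_relative_error(a, e) for a, e in pairs]
    return sum(errors) / len(errors), max(errors)
```

Because the exact log-det grows in magnitude as ρ increases, a fixed absolute error becomes a smaller relative error at high spatial autocorrelation.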


Conclusions

• GL is slightly more expensive than Taylor series and Chebyshev polynomials.

• GL gives better approximations when spatial autocorrelation is high and the problem size is large.

• GL quality depends on the number of iterations, the initial Lanczos vector, and the random number generator.

• No need to compute all eigenvalues.


Acknowledgments

• AHPCRC
• Minnesota Supercomputing Institute (MSI)
• Spatial Database Group Members
• ARCTiC Labs Group Members
• Dr. Dan Boley
• Dr. Sanjay Chawla
• Dr. Vipin Kumar
• Dr. James LeSage
• Dr. Kelley Pace
• Dr. Pen-Chung Yew

THANK YOU VERY MUCH

Q/A