Post on 14-Aug-2020


Exploring Spatial Datasets Using Discriminative Pattern Mining and Pattern Similarity Measure

Tomasz F. Stepinski, Lunar and Planetary Institute (tstepinski@lpi.usra.edu)

Wei Ding, Dept. of Computer Science, Univ. of Massachusetts Boston (Wei.Ding@umb.edu)

Josue Salazar, Lunar and Planetary Institute (salazar@lpi.usra.edu)

Motivation

Complex multi-attributed spatial datasets hide knowledge that needs to be discovered by exploring their structure. We propose an association analysis-based strategy for exploring spatial datasets that possess a prior binary classification.

Example: Analysis of 2008 presidential election

Innovation

Input: a multi-attribute spatial dataset with a prior binary classification (class 1, class 2). Each spatial element is a transaction containing values of exploratory attributes.

Step 1: mining for discriminative patterns.

Step 2: agglomerative clustering of patterns.

Result: segmentation of class 1 into clusters (cluster 1, cluster 2, cluster 3) of similar patterns of exploratory attributes.

Algorithm

[Figure: spatial objects labeled 1 or 2, with the footprint of pattern X (4 objects) and the footprint of pattern Y (2 objects). Patterns X and Y are each defined over attributes A, B, C, D; an underscore ( _ ) marks an attribute that has no value in the pattern.]
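The idea of a pattern footprint shown in the figure can be illustrated with a minimal Python sketch. It assumes that a footprint is the set of spatial objects whose transactions contain every attribute value of the pattern; the toy transactions, the object ids, and the function names `footprint` and `value_distribution` are hypothetical and are not taken from the poster.

```python
from collections import Counter

# Hypothetical toy dataset: object id -> transaction (attribute -> ordinal value).
transactions = {
    "o1": {"A": 1, "B": 2, "C": 1, "D": 1},
    "o2": {"A": 1, "B": 2, "C": 2, "D": 1},
    "o3": {"A": 1, "B": 2, "C": 1, "D": 2},
    "o4": {"A": 1, "B": 2, "C": 2, "D": 2},
    "o5": {"A": 2, "B": 1, "C": 2, "D": 1},
    "o6": {"A": 2, "B": 2, "C": 2, "D": 2},
}


def footprint(pattern, transactions):
    """Return ids of objects whose transaction matches every item of the pattern.

    Assumption: attributes missing from `pattern` are unconstrained.
    """
    return {
        oid
        for oid, t in transactions.items()
        if all(t.get(attr) == val for attr, val in pattern.items())
    }


def value_distribution(attr, object_ids, transactions):
    """Relative frequency of each value of `attr` among the given objects."""
    counts = Counter(transactions[oid][attr] for oid in object_ids)
    total = sum(counts.values())
    return {val: n / total for val, n in counts.items()}


# Pattern X fixes attributes A and B; pattern Y fixes attributes A and C.
pattern_x = {"A": 1, "B": 2}
pattern_y = {"A": 2, "C": 2}
fp_x = footprint(pattern_x, transactions)
fp_y = footprint(pattern_y, transactions)
```

In this toy setup the footprint of pattern X has four objects and the footprint of pattern Y has two, mirroring the sizes in the figure. `value_distribution` gives the within-footprint frequencies of an attribute that a pattern leaves unspecified.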

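Pattern similarity

The similarity of patterns X and Y is a weighted sum of per-attribute similarities over the four attributes A, B, C, D:

$$S(X, Y) = \sum_{i=1}^{4} w_i \, S_i(X_i, Y_i)$$

Attribute A (has a value in both X and Y):

$$S_A(X_A, Y_A) = s(x_A, y_A)$$

Attribute B (has a value in X only):

$$S_B(X_B, -) = \sum_{k=1}^{2} P_Y(y_k) \, s(y_k, x_B)$$

Attribute C (has a value in Y only):

$$S_C(-, Y_C) = \sum_{k=1}^{2} P_X(x_k) \, s(x_k, y_C)$$

Attribute D (has no value in either pattern):

$$S_D(-, -) = \sum_{l=1}^{2} \sum_{k=1}^{2} P_X(x_l) \, P_Y(y_k) \, s(x_l, y_k)$$

Here $P_X$ and $P_Y$ are the probabilities of the attribute's values among the objects in the footprints of X and Y, respectively.

$z_1, z_2, \ldots, z_k$ are ordinal values such that $z_1 = x_i + 1$ and $z_k = y_i - 1$.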

Input data :> 2008 election results plus 13 socio-economic indicators from the US Census Bureau for 3108 counties.

Example 1 :> McCain voting block (red) and Obama voting block (blue) that are dissimilar in the socio-economic sense and geographically apart.

Example 2 :> McCain voting block (red) and Obama voting block (green) that are dissimilar in the socio-economic sense but geographically collocated.

Visual analytics :> Discriminative patterns are calculated for four groups (A, B, E, and F) of counties.

In each group, patterns are ordered using agglomerative clustering.

The clustering heat map is a distance matrix with rows ordered according to the clustering.
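The ordering behind a clustering heat map can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poster does not name the linkage criterion, so average linkage is assumed here, and the function names `agglomerative_order` and `reorder`, as well as the toy distance matrix, are hypothetical.

```python
def agglomerative_order(dist):
    """Return a leaf order produced by agglomerative clustering.

    dist: symmetric n x n list of lists of pairwise pattern distances.
    Assumption: average linkage (mean pairwise distance between clusters).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0]


def reorder(dist, order):
    """Permute rows and columns of the distance matrix into the given order."""
    return [[dist[i][j] for j in order] for i in order]


# Toy distance matrix: patterns 0 and 2 are close, patterns 1 and 3 are close.
toy_dist = [
    [0.0, 0.9, 0.1, 0.9],
    [0.9, 0.0, 0.9, 0.1],
    [0.1, 0.9, 0.0, 0.9],
    [0.9, 0.1, 0.9, 0.0],
]
order = agglomerative_order(toy_dist)
heat_map = reorder(toy_dist, order)
```

After reordering, similar patterns occupy adjacent rows, so clusters show up as low-distance blocks along the diagonal of `heat_map`. For real pattern sets a library routine would be preferable to this quadratic-per-merge loop.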

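Similarity of two ordinal values $x_i$ and $y_i$ of attribute $i$:

$$s(x_i, y_i) = \frac{2 \times \log P(x_i \vee z_1 \vee z_2 \vee \ldots \vee z_k \vee y_i)}{\log P(x_i) + \log P(y_i)}$$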
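The similarity formulas can be tried out with a short Python sketch. Several details are assumptions rather than statements from the poster: the weights $w_i$ are taken to be equal, the sums run over whatever values occur in a footprint rather than a fixed upper limit of 2, every value that occurs has non-zero probability, and all names (`s`, `attribute_similarity`, `pattern_similarity`, the toy distributions) are hypothetical.

```python
import math


def s(x, y, prob):
    """Similarity of two ordinal values of one attribute.

    prob: value -> probability of that value in the whole dataset
    (assumed to be positive for x and y).
    """
    if x == y:
        return 1.0
    lo, hi = min(x, y), max(x, y)
    # Probability of the whole ordinal range lo, lo + 1, ..., hi.
    p_range = min(1.0, sum(prob.get(v, 0.0) for v in range(lo, hi + 1)))
    return 2.0 * math.log(p_range) / (math.log(prob[lo]) + math.log(prob[hi]))


def attribute_similarity(x, y, dist_x, dist_y, prob):
    """Per-attribute similarity; x or y is None when the pattern has no value.

    dist_x, dist_y: value -> probability within the footprint of X / Y.
    """
    if x is not None and y is not None:
        return s(x, y, prob)
    if x is None and y is not None:
        return sum(p * s(xv, y, prob) for xv, p in dist_x.items())
    if x is not None and y is None:
        return sum(p * s(yv, x, prob) for yv, p in dist_y.items())
    return sum(
        px * py * s(xv, yv, prob)
        for xv, px in dist_x.items()
        for yv, py in dist_y.items()
    )


def pattern_similarity(X, Y, dists_x, dists_y, probs, weights=None):
    """Weighted sum of per-attribute similarities (equal weights by default)."""
    attrs = list(probs)
    if weights is None:
        weights = {a: 1.0 / len(attrs) for a in attrs}
    return sum(
        weights[a]
        * attribute_similarity(X.get(a), Y.get(a), dists_x[a], dists_y[a], probs[a])
        for a in attrs
    )


# Toy setup: five equally likely ordinal values for each of four attributes.
uniform = {v: 0.2 for v in range(1, 6)}
probs = {a: uniform for a in "ABCD"}
X = {"A": 1, "B": 2}  # C and D have no value in X
Y = {"A": 1, "C": 2}  # B and D have no value in Y
dists_x = {"A": {1: 1.0}, "B": {2: 1.0}, "C": {1: 0.5, 2: 0.5}, "D": {1: 0.5, 2: 0.5}}
dists_y = {"A": {1: 1.0}, "B": {1: 0.5, 2: 0.5}, "C": {2: 1.0}, "D": {1: 0.5, 2: 0.5}}
similarity = pattern_similarity(X, Y, dists_x, dists_y, probs)
```

Two identical values give a similarity of 1, while two values whose ordinal range covers the whole distribution give 0, so `similarity` stays between 0 and 1. A pattern dissimilarity for clustering could then be taken as `1 - similarity`, which is again an assumption of this sketch.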

[Figure: clustering heat maps for pattern sets A and B (Obama) and pattern sets E and F (McCain), with legends for pattern size, pattern length, pattern overlap, and pattern dissimilarity. Figure regions are labeled A through H.]

[Figure: patterns displayed over the 13 socio-economic indicators, with color-coded attribute values.]

Socio-economic indicators: pop. dens., urban pop. %, female pop. %, foreign born %, per capita income, household income, HS edu., bachelor edu., white pop. %, poverty %, owned house %, soc. sec. recipient %, soc. sec. income.

Attribute values: lowest (1), low (2), average (3), high (4), highest (5), no value ( _ ).

Row groups:

Obama block 1 (1 - 872)

Obama block 2 (928 - 3364)

Voted for Obama but not in the support of discriminative patterns (3365 - 3610)

McCain block (3611 - 6680)

Voted for McCain but not in the support of discriminative patterns (6681 - 6970)

set | description | # of counties | population | # voted | winning %

won by Obama

A | won by Obama, IN footprint of Obama and NOT in footprint of McCain | 495 | 153,611,411 | 67,040,847 | 62.14
B | won by Obama, NOT in footprint of Obama and NOT in footprint of McCain | 361 | 16,696,346 | 9,568,427 | 56.24
C | won by Obama, NOT in footprint of Obama but IN footprint of McCain | 9 | 199,478 | 88,945 | 51.07
D | won by Obama, IN footprint of Obama and IN footprint of McCain | 1 | 210,554 | 61,494 | 52.90

won by McCain

E | won by McCain, IN footprint of McCain and NOT in footprint of Obama | 1688 | 51,289,510 | 23,224,203 | 62.11
F | won by McCain, NOT in footprint of McCain and NOT in footprint of Obama | 472 | 31,269,880 | 15,772,301 | 59.01
G | won by McCain, NOT in footprint of McCain but IN footprint of Obama | 62 | 23,518,016 | 8,941,422 | 55.91
H | won by McCain, IN footprint of McCain and IN footprint of Obama | 20 | 2,255,368 | 1,024,861 | 60.83
