Spatial Data Analysis Areas I: Rate Smoothing and the MAUP Gilberto Câmara INPE, Brazil Ifgi, Muenster, Fall School 2005


Page 1: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Spatial Data Analysis Areas I: Rate Smoothing and the MAUP

Gilberto Câmara, INPE, Brazil

Ifgi, Muenster, Fall School 2005

Page 2: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Areal data

The study region is partitioned into disjoint areas.

The region is the union of the areas.

Each map has one or more associated measures, which are treated as random variables.

Example: a map of Germany divided into municipalities. For each area, we measure the unemployment rate and the literacy rate.

Is unemployment correlated with years of schooling? What about Brazil?

Page 3: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Violence in Minas Gerais

Page 4: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Violence in Minas Gerais

Page 5: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Violence in Minas Gerais

Page 6: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Attributes in areal data

As a general rule, each measure is a sum, a count or a similar aggregate function over the whole area.

Each value is associated with the whole of the corresponding area.

If we need to choose a single location, we usually take the polygon centroid.

There are no intermediate values

Page 7: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

What is mapped in areal data?

Typical values are rates or proportions

Numerator = events

Denominator = population at risk

Log maps?
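As a minimal sketch of the numerator/denominator idea, the snippet below computes a rate per 100,000 and its logarithm. The function names, the sample counts and the choice of the natural logarithm are assumptions made for illustration; areas with zero events are returned as `None` so that they can be mapped as a separate class.

```python
import math

def rate_per(events, population, per=100_000):
    """Events per `per` people at risk."""
    return events * per / population

def log_rate(events, population, per=100_000):
    """Natural log of the rate; None when there are no events,
    because log(0) is undefined (assumption: such areas get their own class)."""
    if events == 0:
        return None
    return math.log(rate_per(events, population, per))

# Hypothetical areas: (events, population at risk)
areas = {"A": (12, 48_000), "B": (0, 3_500), "C": (150, 900_000)}
for name, (events, population) in areas.items():
    print(name, rate_per(events, population), log_rate(events, population))
```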

Page 8: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Log rate of motor vehicle accident deaths per 100,000 residents, 1990-92

[Figure: map of municipalities in São Paulo, Minas Gerais, Espírito Santo and Rio de Janeiro, with state capitals marked and a 0 to 200 km scale bar. Legend classes (number of municipalities): 4.214 to 5.28 (35); 3.148 to 4.214 (287); 2.082 to 3.148 (536); 1.016 to 2.082 (253); -0.05 to 1.016 (23); 0 deaths (298).]

Page 9: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Log rate of homicide deaths among males aged 15-49 per 100,000 residents of the same age group, 1990-92

[Figure: map of municipalities in São Paulo, Minas Gerais, Espírito Santo and Rio de Janeiro, with state capitals marked and a 0 to 200 km scale bar. Legend classes (number of municipalities): 0.95 to 1.906 (28); 1.906 to 2.862 (209); 2.862 to 3.818 (460); 3.818 to 4.774 (223); 4.774 to 5.73 (64); 0 deaths (448).]

Page 10: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Models of Discrete Spatial Variation

[Figure: map of visceral leishmaniasis rates (1997/1998), in cases per 100,000 inhabitants. Legend classes, with counts in parentheses: 200 to 250 (1); 150 to 200 (2); 100 to 150 (1); 50 to 100 (4); 10 to 50 (29); 5 to 10 (16); 1 to 5 (43); < 1 (19).]

Random variable in area $i$: $Y_i$, $Z_i$

• number of ill people

• number of newborn babies

• per capita income

Source: Renato Assunção (UFMG/Brasil)

Page 11: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Dealing with rates and proportions

When the study variable is a rate or a proportion, mapping those rates is the first obvious step in any analysis. However, the use of raw observed rates might be misleading, since the variability of those rates will be a function of the population counts, which differ widely between the areas. (Bailey, 1995)
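To see why raw rates are unstable in small areas, the sketch below assumes a simple binomial model in which every area shares the same underlying rate. The underlying rate of 20 per 1,000 and the two population sizes are hypothetical values chosen for illustration.

```python
import math

def rate_std_per_1000(true_rate, population):
    """Standard deviation of the observed rate (per 1,000) under a
    binomial model: sqrt(p * (1 - p) / n)."""
    return 1000 * math.sqrt(true_rate * (1 - true_rate) / population)

def one_event_shift_per_1000(population):
    """Change in the observed rate (per 1,000) caused by one extra event."""
    return 1000 / population

true_rate = 0.02  # hypothetical underlying rate: 20 per 1,000
for population in (50, 20_000):
    print(population,
          round(rate_std_per_1000(true_rate, population), 2),
          one_event_shift_per_1000(population))
```

With 50 people at risk, a single additional event moves the observed rate by 20 per 1,000, which is as large as the assumed underlying rate itself. With 20,000 people at risk, the same event moves it by only 0.05 per 1,000.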

Page 12: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

São Paulo Metropolitan Region

[Figure: scatter plot of infant mortality rate (0 to 60) against population aged less than 1 year (0 to 25,000).]

Source: Fred Ramos (CEDEST/Brasil)

Page 13: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Model-Driven Approaches

Model of discrete spatial variation

Each subregion is described by a statistical distribution $Z_i$, e.g., homicide counts follow a Poisson distribution.

The main objective of the analysis is to estimate the joint distribution of the random variables $Z = \{Z_1, \ldots, Z_n\}$.

We use a model-driven approach to correct the missing data.

It is called the “Empirical Bayes” method. We could also use the “Full Bayes” method (but that is another story...).

Page 14: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

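In Bayesian statistics, the best estimate of the true and unknown rate $\theta_i$ is

$$\hat{\theta}_i = w_i r_i + (1 - w_i)\,\mu_i$$

where

$$w_i = \frac{\sigma_i^2}{\sigma_i^2 + \mu_i / n_i}, \qquad r_i = \frac{y_i}{n_i} \quad \text{(measured rate)}$$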

Source: Fred Ramos (CEDEST/Brasil)
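The behaviour of the weight can be checked with a short worked limit. This assumes the estimator has the form $\hat{\theta}_i = w_i r_i + (1 - w_i)\mu_i$ with $w_i = \sigma_i^2 / (\sigma_i^2 + \mu_i / n_i)$, where $r_i = y_i / n_i$ is the measured rate, $n_i$ the population at risk, and $\mu_i$, $\sigma_i^2$ the prior mean and variance of the rate:

$$
\begin{aligned}
n_i \to \infty &: \quad \frac{\mu_i}{n_i} \to 0, \quad w_i \to 1, \quad \hat{\theta}_i \to r_i \\
n_i \to 0 &: \quad \frac{\mu_i}{n_i} \to \infty, \quad w_i \to 0, \quad \hat{\theta}_i \to \mu_i
\end{aligned}
$$

Under these assumptions, large populations keep their measured rate almost unchanged, while small populations are pulled towards the prior mean.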

Page 15: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

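Empirical Bayes

Simplifying assumptions for estimating means and variances for all random variables of all areas (Marshall, 1991):

$$\hat{\mu} = \frac{\sum_i y_i}{\sum_i n_i}$$

$$\hat{\sigma}^2 = \frac{\sum_i n_i (r_i - \hat{\mu})^2}{\sum_i n_i} - \frac{\hat{\mu}}{\bar{n}}$$

$$\hat{\theta}_i = \hat{\mu} + \frac{\hat{\sigma}^2 (r_i - \hat{\mu})}{\hat{\sigma}^2 + \hat{\mu} / n_i}$$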

Source: Fred Ramos (CEDEST/Brasil)
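A minimal sketch of global empirical Bayes smoothing in the style of Marshall (1991) follows. It assumes a single prior mean and variance shared by all areas, takes $\bar{n}$ to be the mean population at risk per area, and truncates a negative variance estimate at zero. The function name and the sample counts are hypothetical.

```python
def empirical_bayes_rates(events, pops):
    """Shrink each raw rate y_i / n_i towards the global mean rate."""
    rates = [y / n for y, n in zip(events, pops)]
    total_pop = sum(pops)
    mu = sum(events) / total_pop          # global mean rate
    n_bar = total_pop / len(pops)         # assumption: mean population per area
    s2 = sum(n * (r - mu) ** 2 for n, r in zip(pops, rates)) / total_pop
    sigma2 = max(s2 - mu / n_bar, 0.0)    # assumption: truncate at zero
    smoothed = []
    for n, r in zip(pops, rates):
        # weight is near 1 for large populations, near 0 for small ones
        w = sigma2 / (sigma2 + mu / n) if sigma2 > 0 else 0.0
        smoothed.append(mu + w * (r - mu))
    return mu, sigma2, smoothed

# Hypothetical counts: events and population at risk for five areas
events = [0, 3, 20, 100, 600]
pops = [50, 100, 2_000, 5_000, 20_000]
mu, sigma2, smoothed = empirical_bayes_rates(events, pops)
for y, n, s in zip(events, pops, smoothed):
    print(n, round(1000 * y / n, 2), round(1000 * s, 2))
```

In this example the area with 50 people and no events is moved most of the way towards the global rate, while the area with 20,000 people stays close to its raw rate.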

Page 16: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Municipalities of the RMSP and districts of the MSP

[Figure: two scatter plots against population less than 1 year old (0 to 25,000). First panel: infant mortality rate (0 to 60). Second panel: estimated infant mortality rate (0 to 60).]

Source: Fred Ramos (CEDEST/Brasil)

Page 17: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Infant Mortality Rate – São Paulo (Raw)

Source: Fred Ramos (CEDEST/Brasil)

Page 18: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Infant Mortality Rate – São Paulo (Corrected)

Source: Fred Ramos (CEDEST/Brasil)

Page 19: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Some Important Questions

How does scale matter?

How do the spatial partitions matter?

How does proximity matter?

What can we learn by studying how multiple data vary in space?

How many prior assumptions can we impose on our spatial data?

Page 20: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

The Modifiable Areal Unit Problem (MAUP): A Question of Scale

A basic problem with areal data: the spatial definition of the frontiers of the areas affects the results.

Different results can be obtained just by changing the frontiers of these zones.

This problem is known as the “modifiable areal unit problem”.
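The effect of moving frontiers can be illustrated with a small sketch. The twelve fine units, their event counts and the two alternative zonings below are hypothetical, constructed only to show that the same underlying data yield different zone rates.

```python
def zone_rates(events, pops, zones, per=1000):
    """Rate per `per` people for each zone, where a zone is a
    half-open range (start, stop) of contiguous fine units."""
    rates = []
    for start, stop in zones:
        zone_events = sum(events[start:stop])
        zone_pop = sum(pops[start:stop])
        rates.append(per * zone_events / zone_pop)
    return rates

# Hypothetical fine units along a line: a cluster of events in units 4 to 7
events = [0, 0, 0, 0, 8, 8, 8, 8, 0, 0, 0, 0]
pops = [100] * 12

zoning_a = [(0, 4), (4, 8), (8, 12)]    # frontiers match the cluster
zoning_b = [(0, 6), (6, 10), (10, 12)]  # frontiers cut through the cluster

rates_a = zone_rates(events, pops, zoning_a)
rates_b = zone_rates(events, pops, zoning_b)
print(rates_a)
print(rates_b)
```

Both zonings use three zones and the same events, yet the first isolates the cluster in a single high-rate zone and the second spreads it over two zones with lower rates.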

Page 21: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Scale Effects

[Figure: maps of per capita income, jobs / population, and illiterate / population.]

Source: Fred Ramos (CEDEST/Brasil)

Page 22: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Scale Effects

[Figure: maps of per capita income, jobs / population, and illiterate / population.]

Source: Fred Ramos (CEDEST/Brasil)

Page 23: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Scale Effects: Fighting the MAUP

270 ZONES OD97

[Figure: maps of population > 60 years, illiterates, and per capita income.]

Source: Fred Ramos (CEDEST/Brasil)

Page 24: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Scale Effects: Fighting the MAUP

96 DISTRICTS OF SÃO PAULO

[Figure: maps of population > 60 years, illiterates, and per capita income.]

Source: Fred Ramos (CEDEST/Brasil)

Page 25: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Scale Effects: Fighting the MAUP

96 INCOME-HOMOGENEOUS ZONES IN SÃO PAULO

[Figure: maps of population > 60 years, illiterates, and per capita income.]

Source: Fred Ramos (CEDEST/Brasil)

Page 26: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Correlation matrices

VARIABLES

A) Percentage of population 60 years old or more

B) Percentage of illiterate population

C) Per capita individual income

[Figure: correlation matrices of variables A, B and C for three zonings: 270 ZONES OD97, 96 DISTRICTS, and 96 INCOME-AGGREGATED.]

Source: Fred Ramos (CEDEST/Brasil)
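How a correlation can change with the level of aggregation is sketched below on synthetic data. Everything here is an assumption made for illustration: two variables share a smooth spatial trend, each has its own fine-scale fluctuation, all units have equal population, and zones are built from blocks of contiguous units. None of the numbers come from the São Paulo data.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def aggregate(values, block):
    """Average contiguous blocks of fine units (equal populations assumed)."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values), block)]

n = 240  # hypothetical number of fine units
trend = [-1 + 2 * i / (n - 1) for i in range(n)]
# Fine-scale fluctuations with different periods (2 and 4 units)
x = [t + 2 * (1 if i % 2 == 0 else -1) for i, t in enumerate(trend)]
y = [t + 2 * (1 if i % 4 < 2 else -1) for i, t in enumerate(trend)]

r_fine = pearson(x, y)                           # 240 units
r_60 = pearson(aggregate(x, 4), aggregate(y, 4))    # 60 zones
r_20 = pearson(aggregate(x, 12), aggregate(y, 12))  # 20 zones
print(round(r_fine, 3), round(r_60, 3), round(r_20, 3))
```

At the fine scale the local fluctuations dominate and the correlation is weak. After aggregation they average out and only the shared trend remains, so the correlation becomes very strong even though the underlying data are unchanged.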

Page 27: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

The Question of Scale

Get census data

Identify inter-tract variation

Adaptation

Minimize the outlier effect

Reduce data variability

Page 28: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Regionalization

Reaggregate N small areas (the finest scale available) into M bigger regions to reduce scale effects.

A possible solution: constrained clustering

Page 29: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Regionalization: Maps as graphs

Page 30: Spatial Data Analysis  Areas I: Rate Smoothing and the MAUP

Regionalization: Maps as graphs

Simple aggregation Population-constrained aggregation
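One possible form of constrained clustering on a map treated as a graph is sketched below. It is a greedy, contiguity-constrained merge, not necessarily the method used in the slides. Areas are nodes, shared borders are edges, and at each step the two adjacent regions with the closest population-weighted mean value are merged until M regions remain. The function name, the data and the merge criterion are assumptions.

```python
def regionalize(values, pops, edges, m):
    """Merge n areas into m contiguous regions (greedy sketch)."""
    regions = {i: {i} for i in range(len(values))}
    neighbours = {i: set() for i in regions}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def mean(region_id):
        # population-weighted mean of the attribute in a region
        members = regions[region_id]
        total_pop = sum(pops[k] for k in members)
        return sum(values[k] * pops[k] for k in members) / total_pop

    while len(regions) > m:
        candidates = [(abs(mean(a) - mean(b)), a, b)
                      for a in regions for b in neighbours[a] if a < b]
        if not candidates:
            break  # disconnected graph: no adjacent pair left to merge
        _, a, b = min(candidates)
        regions[a] |= regions.pop(b)      # merge region b into region a
        for c in neighbours.pop(b):
            neighbours[c].discard(b)
            if c != a:
                neighbours[c].add(a)
                neighbours[a].add(c)
    return sorted(sorted(r) for r in regions.values())

# Hypothetical chain of six areas: 0-1-2-3-4-5
values = [10, 11, 12, 30, 31, 50]
pops = [100] * 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
result = regionalize(values, pops, edges, 3)
print(result)
```

Because only adjacent regions can merge, every output region is contiguous. A population constraint, as in the population-constrained aggregation named on the slide, could be added as an extra check on candidate merges; its exact form is not given in the source.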