gis in public health research: understanding spatial analysis and interpreting outcomes 1-31-14

GIS in Public Health Research: Understanding Spatial Analysis & Interpreting Outcomes Kristin Osiecki PhD

Upload: hpaocec

Post on 27-Jan-2015


DESCRIPTION

Geographic information systems (GIS) allow us to visualize data to better understand public health issues in our communities. Maps help recognize patterns for hypothesis generation; however, spatial analysis is necessary to substantiate relationships and produce meaningful outcomes. In this presentation we will discuss a few of the basic questions related to spatial analysis:

TRANSCRIPT

Page 1: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GIS in Public Health Research: Understanding Spatial Analysis & Interpreting Outcomes

Kristin Osiecki, PhD

Page 2: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 3: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Houston Aerosol Characterization & Health Experiment (HACHE)

Page 4: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

• UT Health Science Center School of Biomedical Informatics

• University of Houston Department of Earth and Atmospheric Sciences

• Rice University Department of Sociology and Department of Civil & Environmental Engineering

Page 5: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 6: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 7: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Applications in Public Health Research

• Space matters – communities, census tracts, counties, states

• Multidisciplinary and Interdisciplinary

• Collaborative

• Simple and Complex Models

Page 8: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

What research questions are we trying to answer?

• Do we need visualizations or maps?

OR

• Are we interested in investigating possible spatial relationships within the data?

Page 9: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

ArcGIS Toolbox

Handyman’s dream or do-it-yourself nightmare?

Page 10: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Objectives

• Traditional Statistics & Spatial Analysis

• Permutations

• Spatial Weights

• EDA & ESDA

Page 11: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

"Spatial Statistics" does not mean applying traditional (non-spatial) statistical methods to data that just happens to be spatial (has X and Y coordinates).

Source: ESRI

http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_tools/how_generate_spatial_weights_matrix_spatial_statistics_works.htm

Page 12: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Traditional Statistical Methodology

Spatial Methodology

Spatial Analysis

Page 13: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 14: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

[Diagram labels: Global Model; Local Model; EDA; ESDA; Global & Local; global autocorrelation; local autocorrelation]

Page 15: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

The most crucial step in the process

Page 16: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Exploring the Data: EDA & ESDA

Page 17: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 18: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 19: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Scatter Plot Matrix

[Figure: scatter plot matrix of p_blck, p_FHH, and pct_pov; the p_blck x p_FHH panel has axis ticks from 0 to 1]

Page 20: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 21: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Page 22: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Exploratory Spatial Data Analysis

• Interactively visualize and explore data where space matters

• Detect patterns

• Hypothesis generation

– Spatial modeling is needed to test hypotheses

• Works on point features and polygon features (e.g., census, epidemiology, demographic layers)

Page 23: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

What is Spatial Randomness?

• The observed spatial pattern of values is as likely as any other spatial pattern

• The value at one location does not depend on values at neighboring locations

• Under spatial randomness, the locations of values may be altered without affecting the information content of the data

• Random permutation or reshuffling of values

Source: Dr. Luc Anselin, 2012

Page 24: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Randomness

• Spatial Randomness Null Hypothesis

– Spatial randomness is the absence of any pattern

– If rejected, there is evidence of spatial structure

Source: Dr. Luc Anselin, 2012

Page 25: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

ArcGIS Spatial Autocorrelation

• The Randomization Null Hypothesis: Where appropriate, the tools in the Spatial Statistics toolbox use the randomization null hypothesis as the basis for statistical significance testing. The randomization null hypothesis postulates that the observed spatial pattern of your data represents one of many (n!) possible spatial arrangements. If you could pick up your data values and throw them down onto the features in your study area, you would have one possible spatial arrangement of those values. (Note that picking up your data values and throwing them down arbitrarily is an example of a random spatial process). The randomization null hypothesis states that if you could do this exercise (pick them up, throw them down) infinite times, most of the time you would produce a pattern that would not be markedly different from the observed pattern (your real data). Once in a while you might accidentally throw all the highest values into the same corner of your study area, but the probability of doing that is small. The randomization null hypothesis states that your data is one of many, many, many possible versions of complete spatial randomness. The data values are fixed; only their spatial arrangement could vary.

http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000006000000

Page 26: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Permutations

• A numerical approach to testing for statistical significance (in contrast to analytical approaches)

• It is data-driven and makes no assumptions (such as normality) about the data

Page 27: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Permutations in GeoDa

• Permutation inference shuffles the values among locations and recomputes the statistic each time, using a different random reshuffle, to construct a reference distribution.

• Permutations are used to determine how likely it would be to observe the Moran’s I value of an actual distribution under conditions of spatial randomness.

• P-values depend on the number of permutations, so they are “pseudo p-values”
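
The permutation idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not GeoDa's implementation: the function names (`morans_i`, `permutation_test`), the six locations arranged in a line, and the data values are all hypothetical.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

def permutation_test(values, weights, permutations=999, seed=0):
    """Return the observed I and a one-sided pseudo p-value."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)                        # reassign values to locations
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)

# Hypothetical data: six locations in a line, each neighboring the next one.
n = 6
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = [1, 2, 3, 7, 8, 9]      # low values at one end, high values at the other

obs_i, pseudo_p = permutation_test(x, w, permutations=999)
print(round(obs_i, 3), pseudo_p)
```

The data values never change between runs of the loop; only their assignment to locations does, which is the sense in which the reference distribution represents spatial randomness.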

Page 28: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Permutations

Page 29: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Weights

The first step in the analysis of spatial autocorrelation is to construct a spatial weights file that contains information on the “neighborhood” structure for each location (Luc Anselin).

Page 30: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Generation of Spatial Weights (ESRI)

• For binary strategies (fixed distance, K nearest neighbors, or contiguity) a feature is either a neighbor (1) or it is not (0).

• For weighted strategies (inverse distance or zone of indifference) neighboring features have a varying amount of impact (or influence) and weights are computed to reflect that variation.

Page 31: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Row Standardization

• Adjusts the weights in a spatial weights matrix

• Each weight is divided by its row sum

• The row sum is the sum of weights for a feature’s neighbors.

• A weights matrix is row-standardized when the values of each of its rows sum to one.

Page 32: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Binary vs. row-standardized

• A binary weights matrix looks like:

0 1 0 0
0 0 1 1
1 1 0 0
0 1 1 1

• A row-standardized matrix looks like:

0   1   0   0
0   0   .5  .5
.5  .5  0   0
0   .33 .33 .33
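
As a small check of the arithmetic, the row standardization of the binary matrix from the slide above can be reproduced in plain Python. The function name `row_standardize` is hypothetical, and leaving rows with no neighbors as all zeros is an assumption made for this sketch.

```python
def row_standardize(weights):
    """Divide each weight by its row sum; rows with no neighbors stay all zero."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized

# Binary matrix copied from the slide above.
binary = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]

for row in row_standardize(binary):
    print([round(w, 2) for w in row])
```

Each printed row sums to one (up to rounding), so a feature with three neighbors gives each of them a weight of about 0.33, while a feature with a single neighbor gives it a weight of 1.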

Page 33: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Weights

• Formal expression of locational similarity

Page 34: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Distance Models

• Inverse distance – all features influence all other features, but the closer something is, the more influence it has

• Distance band – features outside a specified distance do not influence the features within the area

• Zone of indifference – combines inverse distance and distance band

Page 35: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Inverse Distance (impedance) (ArcGIS)

• Features impact/influence all other features

– The farther away something is, the smaller the impact

• Specify a Distance Band/Threshold Distance value to reduce the number of required computations, especially with large datasets

– If not specified, a default threshold value is computed for you

• Choosing an appropriate distance is important

– Some spatial statistics require each feature to have at least one neighbor for the analysis to be reliable.

Page 36: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Distance band (sphere of influence)

• Imposes a sphere of influence, or moving window conceptual model of spatial interactions, onto the data

• Neighbors within the specified distance are weighted equally. Features outside have no influence (weight = 0)

• Evaluates the statistical properties of your data at a particular (fixed) spatial scale

• Every feature should have at least one neighbor, or results will not be valid

• If the input data is skewed, make sure that your distance band is neither too small (only one or two neighbors) nor too large (includes all other features as neighbors); otherwise the resultant z-scores are less reliable.
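
The two distance models can be compared with a short sketch in plain Python. It is not the ArcGIS implementation: the function names, the point coordinates, the `1/d` weighting, and the choice to treat a feature exactly at the threshold as a neighbor are all assumptions made for illustration, and the points are assumed to be distinct.

```python
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def distance_band_weights(points, threshold):
    """Binary weights: 1 for features within the threshold distance, else 0."""
    n = len(points)
    return [[1 if i != j and distance(points[i], points[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]

def inverse_distance_weights(points, threshold=None):
    """Weights of 1/d; features beyond an optional threshold get weight 0."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = distance(points[i], points[j])
            if threshold is None or d <= threshold:
                weights[i][j] = 1.0 / d
    return weights

def features_without_neighbors(weights):
    """Indices of features whose row contains no nonzero weight."""
    return [i for i, row in enumerate(weights) if not any(row)]

points = [(0, 0), (1, 0), (3, 0), (10, 0)]    # hypothetical coordinates
band = distance_band_weights(points, threshold=2.5)
print(band)
print(features_without_neighbors(band))        # the far-away feature is isolated
print(inverse_distance_weights(points)[0])     # influence fades with distance
```

Checking for features without neighbors before running a statistic is one simple way to catch a distance band that is too small for the data.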

Page 37: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Adjacency Models

• K Nearest Neighbors – a specified number of neighboring features are included in calculations

• Polygon Contiguity – polygons that share an edge or node influence each other

Page 38: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

K-nearest neighbors

• Each feature is assessed in the spatial context of a specified number of its closest neighbors. If K (the number of neighbors) is 8, then the eight closest neighbors to the target feature will be included.

• If feature density is high, the spatial context of the analysis will be smaller.

• If feature density is sparse, the spatial context for the analysis will be larger.

• This method is available using the Generate Spatial Weights Matrix tool
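
A minimal sketch of K-nearest-neighbor weights in plain Python shows how the spatial context grows where features are sparse. This is not the Generate Spatial Weights Matrix tool; the function names and coordinates are hypothetical, and ties are broken by index order, which is an arbitrary choice.

```python
import math

def knn_weights(points, k):
    """Binary weights matrix: each feature's k closest features get weight 1."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Stable sort by distance; equal distances keep index order.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            weights[i][j] = 1
    return weights

def farthest_neighbor_distance(points, weights):
    """Distance from each feature to its most distant neighbor."""
    return [max(math.dist(points[i], points[j])
                for j, w in enumerate(row) if w)
            for i, row in enumerate(weights)]

# Hypothetical points: three close together, two far apart.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (20, 0)]
w = knn_weights(points, k=2)
print(w)
print(farthest_neighbor_distance(points, w))
```

Every feature has exactly two neighbors, but the isolated features reach much farther to find them. The matrix is also not symmetric: the last point counts the third point as a neighbor, but not the other way around.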

Page 39: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Polygon contiguity (first order)

• Polygons that share an edge (that have coincident boundaries) are included in computations for the target polygon

• Useful when you are modeling some type of contagious process or are dealing with continuous data represented as polygons.
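
To make the edge-versus-node distinction concrete, here is a sketch that builds contiguity weights for the cells of a regular grid. The grid stands in for a polygon layer purely as an assumption of this example (real polygons would need a GIS library to test for shared boundaries). "Queen" contiguity counts shared edges or corners; "rook", a term not used in the slides, counts shared edges only.

```python
def grid_contiguity(rows, cols, queen=True):
    """Binary contiguity weights for the cells of a rows x cols grid.

    queen=True: cells sharing an edge or a corner (node) are neighbors.
    queen=False: only cells sharing an edge are neighbors (rook contiguity).
    """
    n = rows * cols
    weights = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue              # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue              # skip corner-only contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        weights[r * cols + c][rr * cols + cc] = 1
    return weights

queen = grid_contiguity(3, 3, queen=True)
rook = grid_contiguity(3, 3, queen=False)
print(sum(queen[4]), sum(rook[4]))   # neighbors of the center cell
print(sum(queen[0]), sum(rook[0]))   # neighbors of a corner cell
```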

Page 40: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Binary Contiguity Weights

• Contiguity = common border

• If i and j share a border, then wij = 1

• If i and j are not neighbors, then wij = 0

• Weights are 0 or 1, hence binary

Distance-Based Weights

• Distance between points

• Distance between polygon centroids or central points

• Distance-band weights: wij is nonzero for dij < d, where d is a critical distance

• K-nearest neighbor weights: same number of neighbors for all observations

– Potential problems with ties

Page 41: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Global vs. Local Statistics

• Global statistics (Clustering)

– Identify and measure the pattern of the entire study area

– Do not indicate where specific patterns occur

• Local statistics (Clusters)

– Identify variation across the study area, focusing on individual features and their relationships to nearby features (i.e., specific areas of clustering)

Page 42: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Moran’s I)

• Global statistic

• Measures whether the pattern of feature values is clustered, dispersed, or random.

• Compares the difference between the mean of the target feature and the mean for all features to the difference between the mean for each neighbor and the mean for all features.

[Figure labels: mean of target feature; mean of all features; mean of each neighbor]
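
One standard way to write this comparison is as a weighted cross-product of deviations from the overall mean, scaled by the total variation. The sketch below uses that form in plain Python; the function name, the line of six locations, and the two value patterns are hypothetical and chosen only to show a clustered and a dispersed case.

```python
def global_morans_i(values, weights):
    """Moran's I: weighted cross-products of deviations from the overall mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Six locations in a line; adjacent locations are neighbors.
n = 6
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

clustered_i = global_morans_i([1, 1, 1, 9, 9, 9], w)     # similar values adjacent
alternating_i = global_morans_i([1, 9, 1, 9, 1, 9], w)   # dissimilar values adjacent
expected_i = -1 / (n - 1)                                # expected I under randomness

print(round(clustered_i, 3), round(alternating_i, 3), round(expected_i, 3))
```

A value well above the expected value points toward clustering, and a value well below it points toward dispersion; whether the difference is significant still requires a z-score or a permutation test.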

Page 43: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Z-Score & P-value (ArcGIS)

• Very high or very low (negative) z-scores, associated with very small p-values, are found in the tails of the normal distribution

• When this occurs, it is unlikely that the observed spatial pattern reflects the theoretical random pattern represented by your null hypothesis (CSR)

• The null hypothesis for the pattern analysis tools is Complete Spatial Randomness (CSR), either of the features themselves or of the values associated with those features.

http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000006000000
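
The link between a z-score and its p-value under the normal distribution can be checked with the standard library. This is a generic two-sided calculation, offered as an assumption about how such z-scores are typically read rather than a description of ArcGIS internals; the function name is hypothetical.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (0.5, 1.96, 2.58, -3.3):
    print(z, round(two_sided_p_value(z), 4))
```

Z-scores near zero give large p-values, while z-scores beyond about 1.96 in either direction fall in the tails and give p-values below 0.05.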

Page 44: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Pseudo P-Value

• significance levels are dependent on the number of permutations

• One-sided significance test

• For instance, if an observed Moran's I value is higher than any of the randomly generated Moran's I values, the pseudo p-value would be 1/100 = 0.01 for 99 permutations or 1/1,000 = 0.001 for 999 permutations
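
The arithmetic in this example follows the usual (M + 1) / (R + 1) rule, where R is the number of permutations and M is the number of permuted statistics at least as large as the observed one. A minimal sketch, with a hypothetical function name and made-up reference values:

```python
def pseudo_p_value(observed, simulated):
    """One-sided pseudo p-value: (M + 1) / (R + 1).

    M is the number of simulated statistics at least as large as the observed
    one, and R is the number of permutations.
    """
    m = sum(1 for s in simulated if s >= observed)
    return (m + 1) / (len(simulated) + 1)

# Hypothetical reference distributions in which no simulated value reaches 0.9.
print(pseudo_p_value(0.9, [0.1] * 99))     # 0.01
print(pseudo_p_value(0.9, [0.1] * 999))    # 0.001
```

The smallest attainable pseudo p-value is therefore set by the number of permutations, which is why results at 99 and 999 permutations are not directly comparable.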

Page 45: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Moran’s I) Polygon Contiguity (first order)

Page 46: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Moran’s I) Polygon Contiguity (first order)

Percent Black Population, Cook County, IL

Page 47: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Generate Spatial Weights Matrix K-Nearest Neighbor

Page 48: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Moran’s I) K-Nearest Neighbor

Percent Black Population, Cook County, IL

Page 49: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Moran’s I) K-Nearest Neighbor

Percent Black Population, Cook County, IL

Page 50: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Spatial Autocorrelation (Getis-Ord General G, High/Low Clustering), Polygon Contiguity

Percent Black Population, Cook County, IL

If the z-score value is positive, the observed General G index is larger than the expected General G index, indicating that high values for the attribute are clustered in the study area.
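
To illustrate what "observed versus expected General G" means, the sketch below computes both quantities in plain Python. It does not compute the z-score, which also needs the variance of G, and it is not the ArcGIS tool: the function name, the line of six locations, and the values are hypothetical, and the values are assumed to be non-negative.

```python
def general_g(values, weights):
    """Observed and expected Getis-Ord General G (pairs with i == j excluded)."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weighted = sum(weights[i][j] * values[i] * values[j] for i, j in pairs)
    total = sum(values[i] * values[j] for i, j in pairs)
    total_weight = sum(weights[i][j] for i, j in pairs)
    return weighted / total, total_weight / (n * (n - 1))

# Six locations in a line; adjacent locations are neighbors.
n = 6
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

g_together, expected = general_g([1, 1, 9, 9, 1, 1], w)   # high values adjacent
g_apart, _ = general_g([9, 1, 1, 1, 1, 9], w)             # high values separated

print(round(g_together, 3), round(g_apart, 3), round(expected, 3))
```

The same six values give an observed G above the expected value when the two high values sit next to each other, and below it when they are at opposite ends.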

Page 51: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Spatial Autocorrelation (Moran’s I), Percent Black Population, Cook County, IL

Page 52: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Spatial Autocorrelation (Moran’s I), Queen Contiguity Weight (1st order)

Percent Black Population, Cook County, IL

Page 53: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Spatial Autocorrelation (Moran’s I), K-Nearest Neighbor (eight)

Percent Black Population, Cook County, IL

Page 54: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Spatial Autocorrelation (Moran’s I), K-Nearest Neighbor (four)

Percent Black Population, Cook County, IL

Page 55: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Anselin Local Moran’s I

• Local statistic

• Measures the strength of patterns for each specific feature.

• Compares the value of each feature in a pair to the mean value for all features in the study area.

Page 56: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Anselin Local Moran’s I

• Positive I value:

– Feature is surrounded by features with similar values, either high or low.

– Feature is part of a cluster.

– Statistically significant clusters can consist of high values (HH) or low values (LL)

• Negative I value:

– Feature is surrounded by features with dissimilar values.

– Feature is an outlier.

– Statistically significant outliers can be a feature with a high value surrounded by features with low values (HL) or a feature with a low value surrounded by features with high values (LH).
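
The sign of the local statistic and the HH/LL/HL/LH labels can be illustrated with a short sketch. It uses one common form of the statistic (the feature's deviation times the average deviation of its neighbors, scaled by the variance), which is an assumption and may differ in scaling from what ArcGIS or GeoDa report. It also leaves out significance testing, so the labels here are quadrants only. The function name and data are hypothetical.

```python
def local_morans_i(values, weights):
    """Local Moran's I and quadrant label for each feature.

    weights is assumed to be row-standardized, so the spatial lag is the
    average deviation of a feature's neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    results = []
    for i in range(n):
        lag = sum(weights[i][j] * dev[j] for j in range(n))
        local_i = dev[i] * lag / m2
        # First letter: the feature itself; second letter: its neighbors.
        quadrant = ("H" if dev[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((round(local_i, 3), quadrant))
    return results

# Six locations in a line; adjacent locations are neighbors (row-standardized).
n = 6
binary = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
w = [[v / sum(row) for v in row] for row in binary]

results = local_morans_i([10, 9, 2, 9, 2, 1], w)
for index, (local_i, quadrant) in enumerate(results):
    print(index, local_i, quadrant)
```

Features labeled HH or LL have a positive local I, and features labeled HL or LH have a negative one, matching the interpretation in the bullets above.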

Page 57: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Anselin Local Moran’s I

• The z-scores and p-values are measures of statistical significance that tell you whether or not to reject the null hypothesis, feature by feature.

• They indicate whether the apparent similarity (or dissimilarity) in values for a feature and its neighbors is greater than one would expect in a random distribution.

http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_tools/cluster_and_outlier_analysis_colon_anselin_local_moran_s_i_spatial_statistics_.htm

Page 58: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Anselin’s Local Moran’s I, Polygon Contiguity Weight

Percent Black Population, Cook County, IL

[Map panels: p-value, z-score, index; legend: HH, LH]

Page 59: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Univariate LISA, Queen Contiguity Weight

Percent Black Population, Cook County, IL

[Map panels: p-values with 499 permutations; p-values with 999 permutations]

Page 60: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

GeoDa Univariate LISA, Queen Contiguity Weight

Percent Black Population, Cook County, IL

[Cluster map legend: HH, HL; 999 permutations]

Page 61: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Comparison of ArcGIS & GeoDa Results, Queen Contiguity Weight

Percent Black Population, Cook County, IL

[Map panels: p-values]

Page 62: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Comparison of ArcGIS & GeoDa Univariate LISA, Queen Contiguity Weight

Percent Black Population, Cook County, IL

[Map legends: HH, HL (999 permutations); HH, HL]

Page 63: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

Bivariate LISA Scatterplot

[Figure: bivariate LISA scatterplot; axis labels: Percent Poverty and Non-point Source Cancer Risk; quadrant labels: High-High, High-Low, Low-High, Low-Low]

                      INTERCEPT                                    SLOPE
# of Obs.  R^2     Constant  Std Error  t-statistic  p-value    Slope  Std Error  t-statistic  p-value
1343       0.209   0.00442   0.0176     0.251        0.802      0.332  0.0176     18.8         0
80         0.1116  1.58      0.0797     19.8         0          0.045  0.0475     0.957        0.342
1263       0.118   -0.0794   0.0161     -4.92        0          0.223  0.0172     13           0

Chow test for selected/unselected regression subsets: distribution F(2,1339), ratio = 214.6, p-value = 0
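
For readers unfamiliar with the Chow test, the sketch below shows the textbook form of the statistic for two subsets, each fitted with a simple regression (an intercept and a slope, so k = 2). It is not GeoDa's code, and the function names and data are hypothetical; the slide's underlying data are not available here.

```python
def ols_rss(x, y):
    """Residual sum of squares of a simple linear regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic and its degrees of freedom for two subsets."""
    rss_pooled = ols_rss(x1 + x2, y1 + y2)            # one line for all points
    rss_split = ols_rss(x1, y1) + ols_rss(x2, y2)     # a separate line per subset
    df2 = len(x1) + len(x2) - 2 * k
    f = ((rss_pooled - rss_split) / k) / (rss_split / df2)
    return f, (k, df2)

# Hypothetical subsets: one with a clear slope, one that is nearly flat.
x1, y1 = [0, 1, 2, 3, 4], [0.1, 0.9, 2.1, 2.9, 4.1]
x2, y2 = [0, 1, 2, 3, 4], [5.1, 4.9, 5.0, 5.2, 4.8]

f, df = chow_f(x1, y1, x2, y2)
print(round(f, 1), df)
```

With the subset sizes in the table above, the denominator degrees of freedom are 80 + 1263 - 2 × 2 = 1339, which matches the reported F(2,1339). A large F ratio indicates that separate regression lines fit the two subsets much better than a single pooled line.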

Page 64: GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14

[Diagram labels: Global Model; Local Model; EDA; ESDA]