Kalev Leetaru, Eric Shook, and Shaowen Wang

CyberInfrastructure and Geospatial Information Laboratory (CIGI)
Department of Geography and Geographic Information Science
School of Earth, Society, and Environment
National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign

CyberGIS '12, Urbana IL, August 8, 2012

A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia's History of the World

http://www.sgi.com/go/wikipedia

Workflow

CyberGIS

Sentiment Mining

Fulltext Geocoding

Inside the CyberGIS “black box”

[Diagram components: GISolve Middleware; Open Service API; Workflow Management Services; Security; Domain Decomposition; Resource Selection; Task Scheduling; Data & Viz; CI (XSEDE, OSG, Clouds); Emotional Heatmap]

Data Input for a Topic

A set of locations (latitude, longitude point locations), each with 3 attributes:
1. Number of articles mentioning this location
2. Number of articles mentioning both this location and the topic
3. Average tone of articles mentioning both this location and the topic
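As a minimal sketch, the per-topic input could be held in a record like the one below. The class and field names (`LocationRecord`, `n_articles`, `n_topic_articles`, `avg_tone`) are hypothetical, not taken from the authors' system. The derived count of articles that mention the location but not the topic is the "control" quantity the topic-intensity ratio needs.

```python
from dataclasses import dataclass


@dataclass
class LocationRecord:
    """One geocoded location in the input for a single topic (hypothetical layout)."""
    lat: float
    lon: float
    n_articles: int        # attribute 1: articles mentioning this location
    n_topic_articles: int  # attribute 2: articles mentioning this location and the topic
    avg_tone: float        # attribute 3: average tone of those topic articles

    def __post_init__(self):
        # Topic articles are a subset of all articles that mention the location
        if not 0 <= self.n_topic_articles <= self.n_articles:
            raise ValueError("n_topic_articles must be between 0 and n_articles")

    @property
    def n_non_topic_articles(self) -> int:
        """Articles mentioning the location but not the topic (control count)."""
        return self.n_articles - self.n_topic_articles


# Made-up values for illustration only
example = LocationRecord(lat=0.0, lon=0.0, n_articles=10, n_topic_articles=4, avg_tone=-3.5)
```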

Spatializing Emotion

3 important elements:

1. Importance of location
2. Prevalence of topic
3. Emotion toward topic

Goal: capture all 3 elements on a single map

1) Importance of Location

Every mention of a location increases its importance

Generate a density map of the number of times each location is mentioned in text, using kernel density estimation (KDE) based on k-nearest-neighbor search
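The slides do not specify the kernel or exactly how the k-nearest-neighbor search sets the bandwidth. One common reading, assumed here, is an adaptive "balloon" estimator: at each evaluation point the bandwidth is the distance to the k-th nearest geocoded location, and each location is weighted by its mention count. Coordinates are treated as planar, which is a simplification for latitude/longitude, and the function names are hypothetical.

```python
import math


def knn_bandwidth(x, y, points, k):
    """Distance from (x, y) to the k-th nearest location (adaptive bandwidth)."""
    dists = sorted(math.hypot(x - px, y - py) for px, py, _ in points)
    return dists[min(k, len(dists)) - 1]


def knn_kde(x, y, points, k=3, min_h=1e-9):
    """Mention-weighted 2-D Gaussian KDE at (x, y) with a kNN 'balloon' bandwidth.

    points: list of (px, py, mentions) in planar coordinates (assumption).
    """
    total = sum(w for _, _, w in points)
    if not points or total <= 0:
        raise ValueError("need at least one location with mentions")
    # Bandwidth adapts to local sampling: small where locations are dense
    h = max(knn_bandwidth(x, y, points, k), min_h)
    s = 0.0
    for px, py, w in points:
        u2 = ((x - px) ** 2 + (y - py) ** 2) / (h * h)
        s += w * math.exp(-0.5 * u2)
    # 2-D Gaussian kernel normalizing constant is 1 / (2 * pi * h^2)
    return s / (total * 2.0 * math.pi * h * h)


def density_grid(points, xs, ys, k=3):
    """Evaluate the density on a regular grid: one row per y, one column per x."""
    return [[knn_kde(x, y, points, k) for x in xs] for y in ys]


# Hypothetical example: three nearby locations and one remote one
pts = [(0.0, 0.0, 10), (1.0, 0.0, 1), (0.0, 1.0, 1), (5.0, 5.0, 1)]
grid = density_grid(pts, xs=[0.0, 2.5, 5.0], ys=[0.0, 2.5, 5.0])
```

Note that the kNN search here counts distinct locations, not individual mentions; weighting by mentions happens only in the kernel sum.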

1) Importance of Location

2) Prevalence of Topic

We use the term topic intensity for the prevalence of a topic relative to other topics, and estimate it with a method commonly used in epidemiological studies: relative risk

Relative risk is the ratio of the KDE of disease case locations to the KDE of control locations

Topic Intensity

$$\text{Topic intensity} = \frac{\mathrm{KDE}(\text{articles that mention the topic})}{\mathrm{KDE}(\text{articles that do not mention the topic})}$$

Relative Risk

$$\text{Relative risk} = \frac{\mathrm{KDE}(\text{points with disease})}{\mathrm{KDE}(\text{points without disease})}$$
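A sketch of the topic-intensity ratio, computed the way relative risk is: the KDE of topic-article counts divided by the KDE of non-topic-article counts at the same point. For brevity it uses a fixed Gaussian bandwidth `h` rather than a kNN bandwidth, and the `eps` guard against division by zero is an assumption. The slides do not say how the ratio is rescaled to the 0 to 100 range listed in the overview, so no rescaling is applied.

```python
import math


def gaussian_kde(x, y, points, h):
    """Count-weighted 2-D Gaussian KDE with a fixed bandwidth h (simplification)."""
    total = sum(w for _, _, w in points)
    if total <= 0:
        return 0.0
    s = sum(w * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
            for px, py, w in points)
    return s / (total * 2.0 * math.pi * h * h)


def topic_intensity(x, y, records, h=1.0, eps=1e-12):
    """KDE(topic articles) / KDE(non-topic articles), analogous to relative risk.

    records: list of (x, y, n_articles, n_topic_articles), one per location.
    """
    topic = [(rx, ry, nt) for rx, ry, n, nt in records]        # "cases"
    control = [(rx, ry, n - nt) for rx, ry, n, nt in records]  # "controls"
    return gaussian_kde(x, y, topic, h) / max(gaussian_kde(x, y, control, h), eps)


# Hypothetical example: the topic dominates coverage at the first location only
recs = [(0.0, 0.0, 10, 8), (10.0, 0.0, 10, 1)]
```

Because each KDE is normalized by its own total count, a topic spread in the same proportion everywhere yields an intensity of 1 at every point, which is the usual relative-risk baseline.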

Topic Intensity

3) Emotion Toward a Topic

Challenging question: is the emotional measure, tone, discrete or continuous?
– Is tone "countable" like trees, or does it exist as a continuum like air temperature?

Tone is a continuum:
– One cannot have a "number of tones"

3) Emotion Toward a Topic

A different method is used because tone is continuous, not discrete

Inverse distance weighted (IDW) interpolation is used to estimate tone across space, creating a tone map

The tone map captures positive and negative tone toward a particular topic across space
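Inverse distance weighting estimates the tone at an unsampled point as a weighted average of the observed tones, with weights 1/d^p. The power p = 2 below is a common default, not a value given in the slides; distances are planar and the function names are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate of tone at (x, y).

    samples: list of (sx, sy, tone) observations; planar distances (assumption).
    """
    if not samples:
        raise ValueError("need at least one tone observation")
    num = 0.0
    den = 0.0
    for sx, sy, tone in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return float(tone)  # query sits on an observation: return it exactly
        w = 1.0 / d ** power    # nearer observations get larger weights
        num += w * tone
        den += w
    return num / den


def tone_map(samples, xs, ys, power=2.0):
    """Tone surface on a regular grid: one row per y, one column per x."""
    return [[idw(x, y, samples, power) for x in xs] for y in ys]


# Hypothetical observations: negative tone at the origin, positive tone to the east
obs = [(0.0, 0.0, -50.0), (2.0, 0.0, 50.0)]
```

Unlike KDE, IDW never produces values outside the range of the observed tones, which suits a continuous quantity bounded between -100 and 100.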

3) Emotion Toward a Topic

Overview – 3 layers

1) Article density - Proxy: importance of location (value range: 0 to 1)

2) Topic intensity - Proxy: prevalence of topic relative to other topics (value range: 0 to 100)

3) Tone - Proxy: emotion toward a topic (value range: -100 to 100)

The first two layers represent scaling factors for tone

Emotional Heatmap

$$\text{Emotional heatmap} = \text{Article density} \times \text{Topic intensity} \times \text{Tone}$$
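Reading the equation as a cell-by-cell product over rasters on the same grid, a minimal sketch follows. It assumes the three layers have already been computed and scaled to the ranges in the overview (0 to 1, 0 to 100, and -100 to 100); any further normalization the authors apply is not stated, and `emotional_heatmap` is a hypothetical name.

```python
def emotional_heatmap(density, intensity, tone):
    """Cell-wise product of three equally sized rasters (lists of rows)."""
    if not (len(density) == len(intensity) == len(tone)):
        raise ValueError("rasters must have the same shape")
    out = []
    for d_row, i_row, t_row in zip(density, intensity, tone):
        if not (len(d_row) == len(i_row) == len(t_row)):
            raise ValueError("rasters must have the same shape")
        # density and intensity scale the magnitude; tone supplies the sign
        out.append([d * i * t for d, i, t in zip(d_row, i_row, t_row)])
    return out


# Hypothetical 1 x 2 rasters
heat = emotional_heatmap([[1.0, 0.5]], [[10.0, 100.0]], [[-20.0, 40.0]])
```

Because density and intensity are non-negative, they only scale the magnitude of tone; whether a cell reads as positive or negative emotion comes from the tone layer alone.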

Emotional Heatmap of Armed Conflict in 2003 (Wikipedia)

Summary

First steps, but they have started the dialogue

Balance:
– Managing the complexity of cyberinfrastructure access
– Simplifying the workflow for chaining spatial analytics
– Making sense of what's involved

Scientific rigor

Ongoing Work

Translate spatial knowledge to domain knowledge by answering a basic question: why is this here and not there?

Tackle spatial aggregation issues:
– Represent locations as areas, not points
– Areal interpolation

Acknowledgments

Guofeng Cao, Anand Padmanabhan

National Science Foundation
– BCS-0846655
– OCI-1047916
– Open Science Grid
– XSEDE SES070004N

Thanks!