

Page 1: Shashi Shekhar - Harvard University · Discovering personally meaningful places: An interactive clustering approach, ACM Trans. on Info. Systems (TOIS) 25 (3), 2007. (with C. Zhou

Transdisciplinary Foundations of Spatial Data Science
April 27th, 2018

Workshop on Illuminating Space and Time in Data Science
Center for Geographical Information Systems, Harvard University.

Shashi Shekhar
McKnight Distinguished University Professor
Dept. of Computer Sc. and Eng., University of Minnesota
www.cs.umn.edu/~shekhar : [email protected]

Page 2

NSF 1737633: Connecting the Smart-City Paradigm with a Sustainable Urban Infrastructure Systems Framework to Advance Equity in Communities (2017-2020)
S. Shekhar, A. Ramaswami, R. Feiock, V. Merwade, J. Marshall

Major Research Innovations
• Comprehensive fine intra-urban scale data (SEIU-EHW parameters in Figure 1)
• Spatial Data Science to understand relationships (Figure 2)
• Model & visualize multi-infrastructure spatial smart city futures
• Knowledge co-production theories, science and practice

Figure 1. Complex Interactions among SEIU and EHW parameters
Figure 2. Spatial Patterns

Page 3

• Co-Visioning via meetings
• Plan infrastructure for a driverless, post-carbon future with climate change
• Advance Environment, Health, Wellbeing & Equity via infrastructure refinement
• Co-select Questions
– Understand spatial equity in infrastructure & outcomes (wellbeing, health, environment)?
– How does an equity-first approach differ from average-outcome-based approaches?
• Problem Co-Definition: How to measure spatial equity? Well-being?
• Co-Discovery
• Co-Evaluation
• Details: University of Minnesota secures $2.5 million grant to improve quality of life in cities, October 20, 2017 (https://www.cs.umn.edu/news/filter/highlights/professor-shekhar-leads-u-m-team-granted-25-million-nsf-grant)

[Figure labels: Social Equity; Research; Education; Community Partners & Outreach; Diversity]


Page 4

History of Spatial Data Science in S&CC

1854: What causes Cholera?
• Collect & Curate Data
• Discover Patterns, Generate Hypothesis: ? water pump
• Test Hypothesis (Controlled Experiments): Remove pump handle
• Develop Theory: Germ Theory

Impact on cities: Health & well-being, parks, sewage system, drinking water supply, …

Q? What are the Choleras of today?
Q? How may spatial data science help?

Page 5

Today’s Transdisciplinary Spatial Data Science


• Spatial Statistics: Test to reduce spurious patterns
• Computer Sc.: Algorithms for large (e.g., national) data
• Mathematics: Reduce missed patterns
• SaTScan enumerates only 2-point circles
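To make the role of the statistical test concrete, here is a minimal Python sketch of a circular scan with Monte Carlo significance testing. It is not SaTScan's implementation and not the method from the talk. It assumes a case/control (Bernoulli) log-likelihood ratio, candidate circles that are centered on one point and pass through a second point, and random label shuffling as the null model. All function names are hypothetical.

```python
import math
import random


def _k_log_ratio(k, n):
    """Return k * log(k / n), using the convention 0 * log(0) = 0."""
    return k * math.log(k / n) if k > 0 else 0.0


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio of a zone with c cases among n points,
    given C cases among N points overall (Bernoulli model).
    Zones whose case rate is not elevated score 0."""
    if n == 0 or n == N:
        return 0.0
    if c / n <= (C - c) / (N - n):
        return 0.0
    alt = (_k_log_ratio(c, n) + _k_log_ratio(n - c, n)
           + _k_log_ratio(C - c, N - n)
           + _k_log_ratio((N - n) - (C - c), N - n))
    null = _k_log_ratio(C, N) + _k_log_ratio(N - C, N)
    return alt - null


def best_circle(points, labels):
    """Scan circles centered on a point and passing through another point.
    Returns (best LLR, center, radius)."""
    N, C = len(points), sum(labels)
    best = (0.0, None, 0.0)
    for center in points:
        # Sorting by distance makes every prefix one candidate circle.
        order = sorted(range(N), key=lambda j: math.dist(center, points[j]))
        cases_inside = 0
        for n, j in enumerate(order, start=1):
            cases_inside += labels[j]
            llr = bernoulli_llr(cases_inside, n, C, N)
            if llr > best[0]:
                best = (llr, center, math.dist(center, points[j]))
    return best


def monte_carlo_p_value(points, labels, trials=99, seed=0):
    """Share of label shufflings whose best circle scores at least as
    high as the observed best circle (plus-one corrected)."""
    rng = random.Random(seed)
    observed = best_circle(points, labels)[0]
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if best_circle(points, shuffled)[0] >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (trials + 1)
```

A candidate circle is reported as a hotspot only if its p-value falls below a chosen significance level, which is how the test filters out circles that look dense by chance. Because only center-plus-one-point circles are enumerated, a true hotspot whose center is not a data point can be missed, which is the limitation the last bullet points at.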

Page 6

Theme 2: Spatial Data Analysis of SEIU-EHW Parameters

● Task 2A: Develop algorithms to discover statistically significant linear and buffer hotspots, e.g., of income-poverty, consumption, pollution exposure, and low wellbeing

● Task 2B: Discover co-location and teleconnection patterns: Develop scalable algorithms for identifying correlations in SEIU-EHW parameters, e.g., hotspots and deprived areas

● Task 2C: Data-Driven and Discipline-inspired hypotheses

Page 7

Task 2A: Discovering Linear and Buffer Hotspots

● Hotspots often lie along a spatial network (e.g., air pollution hotspots along roads)
● Preliminary results: Linear hotspot detection, which models the linear semantics
● However, it searches only along shortest paths between end-points
● It does not include the information surrounding the network
● Proposed approach:
– Novel notion: non-shortest-path simple paths, buffer hotspots
– Potential solution: graph-partitioning-based divide and conquer

(a) Circular hotspots for pedestrian fatalities

(b) Linear hotspots for pedestrian fatalities

(c) Example of non-shortest path
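The shortest-path restriction mentioned above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the algorithm from the talk: each road segment carries a length and an activity count, candidates are the shortest paths between every pair of nodes, and a candidate is scored by a density ratio (activity per unit length on the path divided by activity per unit length on the rest of the network). The scoring formula, the data layout, and all function names are assumptions, and significance testing is omitted.

```python
import heapq
from itertools import combinations


def build_graph(edges):
    """edges: list of (u, v, length, activity_count), treated as undirected."""
    graph = {}
    for u, v, length, activity in edges:
        graph.setdefault(u, {})[v] = (length, activity)
        graph.setdefault(v, {})[u] = (length, activity)
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, (length, _) in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def density_ratio(graph, path, total_length, total_activity):
    """Activity density on the path relative to the rest of the network."""
    length = sum(graph[u][v][0] for u, v in zip(path, path[1:]))
    activity = sum(graph[u][v][1] for u, v in zip(path, path[1:]))
    rest_length = total_length - length
    rest_activity = total_activity - activity
    if rest_length <= 0:
        return 0.0  # path covers the whole network, nothing to compare with
    if rest_activity == 0:
        return float("inf")
    return (activity / length) / (rest_activity / rest_length)


def best_linear_hotspot(edges):
    """Score the shortest path of every node pair; return (ratio, path)."""
    graph = build_graph(edges)
    total_length = sum(e[2] for e in edges)
    total_activity = sum(e[3] for e in edges)
    best_ratio, best_path = 0.0, None
    for s, t in combinations(sorted(graph), 2):
        path = shortest_path(graph, s, t)
        if path is None:
            continue
        ratio = density_ratio(graph, path, total_length, total_activity)
        if ratio > best_ratio:
            best_ratio, best_path = ratio, path
    return best_ratio, best_path
```

For example, with edges `[("A", "B", 1, 1), ("B", "C", 1, 8), ("C", "D", 1, 1), ("A", "C", 5, 2)]` the long detour edge A-C is never evaluated as a route between A and C, because the shortest path goes through B. A dense route that is not a shortest path would be missed in the same way, which motivates the non-shortest-path notion in the proposed approach.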

Page 8

Task 2B: Discover co-location and teleconnection patterns

● Challenge: Spatial partitioning distorts (& misses) spatial interactions!
● Spatial Statistical Methods are computationally expensive
● Prelim. Results: Fast algorithms for mining Co-location (& Teleconnection)
● Proposed: address data with multiple levels of aggregation, e.g., areal summary

(a) a map of 3 features (b) Spatial Partitions (c) Neighbor graph

     Pearson’s Correlation   Ripley’s cross-K   Participation Index
-    -0.90                   0.33               0.5
-    1                       0.5                1
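As a minimal sketch of the last measure in the table, the participation index of a pair of feature types can be computed as follows. It assumes the usual definition from the co-location literature: the participation ratio of a feature is the fraction of its instances that have at least one instance of the other feature as a neighbor, and the index is the smaller of the two ratios. The neighbor relation (Euclidean distance of at most `radius`) and the function names are assumptions for illustration, and the sample coordinates are made up rather than taken from the slide's map.

```python
import math


def participation_ratio(instances, others, radius):
    """Fraction of `instances` with at least one neighbor in `others`."""
    participating = sum(
        1 for p in instances
        if any(math.dist(p, q) <= radius for q in others)
    )
    return participating / len(instances)


def participation_index(points_a, points_b, radius):
    """Minimum of the two participation ratios for the pair {A, B}."""
    return min(
        participation_ratio(points_a, points_b, radius),
        participation_ratio(points_b, points_a, radius),
    )


# Made-up example: one of two A instances is near a B instance.
feature_a = [(0.0, 0.0), (10.0, 0.0)]
feature_b = [(0.0, 1.0)]
print(participation_index(feature_a, feature_b, radius=2.0))
```

Unlike a partition-based correlation, this measure works directly on the neighbor graph, so pairs that straddle a partition boundary are still counted.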

Page 9

Data-Intensive Science of S&CC in 21st Century

Collect & Curate Big Data

Spatial Patterns, Hypothesis Generation

Test Hypothesis (Policy Intervention)

S&CC Theory

Role of policies & urban forms

Hotspots of infrastructure deprivation, consumption, pollution, investment, disease & well-being.

Correlates? SEIU, EHW

Equity-first policies

Data-driven and Discipline-inspired hypothesis generation

Volume, Variety

Page 10

Challenges Ahead

• Non-stationarity
• Change, e.g., climate, Web, …
• Feedback Loops, e.g., Social
• Fairness
• Accountability
• Transparency

Page 11

References: Surveys, Overviews
• Spatial Computing, Communications of the ACM, 59(1):72-81, January 2016.
• Transdisciplinary Foundations of Geospatial Data Science, ISPRS Intl. Jr. of Geo-Information, 6(12):395-429, 2017 (DOI: 10.3390/ijgi6120395).
• Spatiotemporal Data Mining: A Computational Perspective, ISPRS Intl. Jr. of Geo-Information, 4(4):2306-2338, 2015 (DOI: 10.3390/ijgi4042306).
• Identifying patterns in spatial information: a survey of methods, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):193-214, May/June 2011 (DOI: 10.1002/widm.25).
• Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, 29(10):2318-2331, June 2017 (DOI: 10.1109/TKDE.2017.2720168).
• Parallel Processing over Spatial-Temporal Datasets from Geo, Bio, Climate and Social Science Communities: A Research Roadmap, IEEE BigData Congress 2017: 232-250.
• Spatial Databases: Accomplishments and Research Needs, IEEE Transactions on Knowledge and Data Engineering, 11(1):45-55, 1999.

Page 12

References: Details

Colocations

• Discovering colocation patterns from spatial data sets: a general approach, IEEE Trans. on Know. and Data Eng., 16(12), 2004 (w/ Y. Huang et al.).
• A join-less approach for mining spatial colocation patterns, IEEE Trans. on Know. and Data Eng., 18(10), 2006 (w/ J. Yoo).
• Cascading Spatio-Temporal Pattern Discovery, IEEE Trans. Knowl. Data Eng. 24(11): 1977-1992, 2012 (w/ P. Mohan et al.).

Spatial Outliers

• Detecting graph-based spatial outliers: algorithms and applications (a summary of results), Proc.: ACM Intl. Conf. on Knowledge Discovery & Data Mining, 2001 (with Q. Lu et al.)

• A unified approach to detecting spatial outliers, Springer GeoInformatica, 7 (2), 2003. (w/ C. Lu, et al.)

• Discovering Flow Anomalies: A SWEET Approach, IEEE Intl. Conf. on Data Mining, 2008 (w/ J. Kang).

Hot Spots

• Discovering personally meaningful places: An interactive clustering approach, ACM Trans. on Info. Systems (TOIS) 25 (3), 2007. (with C. Zhou et al.)

• A K-Main Routes Approach to Spatial Network Activity Summarization, IEEE Trans on Know. & Data Eng., 26(6), 2014. (with D. Oliver et al.)

• Significant Linear Hotspot Discovery, IEEE Trans. Big Data 3(2): 140-153, 2017 (w/ X. Tang et al.)

Location Prediction

• Spatial contextual classification and prediction models for mining geospatial data, IEEE Transactions on Multimedia, 4 (2), 2002. (with P. Schrater et al.)

• Focal-Test-Based Spatial Decision Tree Learning. IEEE Trans. Knowl. Data Eng. 27(6): 1547-1559, 2015 (summary in Proc. IEEE Intl. Conf. on Data Mining, 2013) (w/ Z. Jiang et al.).

Change Detection

• Spatiotemporal change footprint pattern discovery: an inter-disciplinary survey. Wiley Interdisc. Rev.: Data Mining and Know. Discovery 4(1), 2014. (with X. Zhou et al.)

Page 13

Knowledge Co-Production: NSF Smart & Connected Communities Grant 1737633 (2017-2020)

• Co-Visioning via meetings
• Plan infrastructure for a driverless, post-carbon future with climate change
• Advance Environment, Health, Wellbeing & Equity via infrastructure refinement
• Co-select Questions
– Understand spatial equity in infrastructure & outcomes (wellbeing, health, environment)?
– How does an equity-first approach differ from average-outcome-based approaches?
• Problem Co-Definition: How to measure spatial equity? Well-being?
• Co-Discovery
• Co-Evaluation
• Details: University of Minnesota secures $2.5 million grant to improve quality of life in cities, October 20, 2017 (https://www.cs.umn.edu/news/filter/highlights/professor-shekhar-leads-u-m-team-granted-25-million-nsf-grant)

[Figure labels: Social Equity; Research; Education; Community Partners & Outreach; Diversity]