geographic knowledge discovery (phd theme) by roberto zagal

Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City

Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN

Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN

Christophe Claramunt, Naval Academy Research Institute

Introduction (1)

• Traditionally, pollution data has been produced by institutions, government and vendors

• But now pollution data is produced by individuals, too

Information about pollution is expressed in different ways by:

• Government

• News media

• People in social networks

Introduction (2)

Introduction (3)

But… what about the certainty of this information?

Introduction (4)

What about ... inconsistency?

Id   Type    Description
1    Tweet   newspaper1: The index of IMECAS is 135 #CDMX
2    Tweet   Newspaper2: @ the #contamination of air is 127 IMECAS #CDMX #bad #new

Related work

• The social data problem has been addressed through:

1. KDD and social mining

2. Formal publications (news media) that guide the classification of the interests of social media users [1]

3. Opinion mining and topic modeling [2]

• But not using GKD with an approach of crossing data layers

Goal

Know how to discover the certainty level of information by crossing geographic and social information

Solution proposed:

GKD Framework for Air Pollution Data

[Figure: framework diagram with panels labeled Phase 1, Phase 2 and Phase 3]

Data extraction: Sample tweet (Phase 1)

1    Tweet   newspaper1: The index of IMECAS is 135 #CDMX
2    Tweet   Newspaper2: @ the #contamination of air is 127 IMECAS #CDMX #bad #news

We consider tweets from accounts that periodically report air pollution data
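The slides do not show how the reported value is read from a tweet. As a minimal sketch, assuming the reading always appears as a number next to the word IMECA or IMECAS, the two sample tweets could be parsed as follows. The function name `parse_tweet` and both regular expressions are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed pattern: a 1-3 digit number shortly after, or right before, "IMECA(S)".
IMECA_PATTERN = re.compile(
    r"IMECAS?\D{0,20}?(\d{1,3})|(\d{1,3})\s*IMECAS?", re.IGNORECASE
)
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def parse_tweet(text):
    """Return the reported IMECA value (or None) and the lowercased hashtags."""
    match = IMECA_PATTERN.search(text)
    value = int(match.group(1) or match.group(2)) if match else None
    hashtags = [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]
    return {"imeca": value, "hashtags": hashtags}


tweet_1 = parse_tweet("The index of IMECAS is 135 #CDMX")
tweet_2 = parse_tweet("@ the #contamination of air is 127 IMECAS #CDMX #bad #news")
print(tweet_1)
print(tweet_2)
```

A tweet without a recognizable reading yields `imeca = None`, so it can be set aside before the later phases.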

Data extraction: Domain Detection (Phase 1)

Newspaper2: @ #contamination air is 127 IMECAS #CDMX #bad #new

The post is related to a pollution topic
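How the domain is detected is not detailed in the slides. One simple possibility, shown here only as a hedged sketch, is keyword matching against a small pollution vocabulary. The vocabulary `POLLUTION_TERMS` and the function `is_pollution_post` are hypothetical.

```python
import re

# Hypothetical vocabulary; a real one would be built from the domain sources.
POLLUTION_TERMS = {"imeca", "imecas", "contamination", "pollution", "ozone", "smog"}


def is_pollution_post(text):
    """Return True when any token of the post belongs to the vocabulary."""
    tokens = re.findall(r"[a-záéíóúñ]+", text.lower())
    return any(token in POLLUTION_TERMS for token in tokens)


sample = "@ #contamination air is 127 IMECAS #CDMX #bad #new"
print(is_pollution_post(sample))
```

Posts that fail this check would simply be left out of the later phases.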

Preprocessing (Phase 2)

• Emotion detection [3]

• Location extraction

Newspaper2: @ #contamination air is 127 IMECAS #CDMX #bad #new

• If we detect which category each set of data belongs to:

• Health and Pollution, Transport and Pollution

Then we can select which data sources should be crossed with the tweet in order to discover knowledge

Classification C5 algorithm (Phase 3)

Id   Description                                           Category
2    @ #contamination air is 127 IMECAS #CDMX #bad #new    Health and pollution
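Decision-tree learners in the C4.5/C5.0 family choose their splits with entropy-based criteria such as information gain. The sketch below computes that gain on four made-up labeled posts. The features `mentions_imeca` and `mentions_traffic` and the training rows are assumptions for illustration only; this is not the C5 implementation used in the work.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, label="category"):
    """Entropy reduction obtained by splitting the rows on one feature."""
    base = entropy([row[label] for row in rows])
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row[label] for row in rows if row[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training rows, not data from the study.
rows = [
    {"mentions_imeca": True, "mentions_traffic": False, "category": "Health and pollution"},
    {"mentions_imeca": True, "mentions_traffic": False, "category": "Health and pollution"},
    {"mentions_imeca": False, "mentions_traffic": True, "category": "Transport and pollution"},
    {"mentions_imeca": True, "mentions_traffic": True, "category": "Transport and pollution"},
]

gains = {f: information_gain(rows, f) for f in ("mentions_imeca", "mentions_traffic")}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

On this toy data `mentions_traffic` separates the two categories perfectly, so a tree would split on it first.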

Crossing data (Phase 4)

• Example 1: Are tweets 1 and 2 inconsistent?

1    Tweet   Newspaper1: The index of IMECAS is 135 #CDMX
2    Tweet   Newspaper2: @ the #contamination of air is 127 IMECAS #CDMX

Which one is correct?

How can we know which tweet is correct? Answer:

The tweet was classified in the domain Health and pollution (in Phase 3). The official data from health reports and pollution reports are then selected to be crossed with the tweet (in Phase 4).

28/10/16

• Data are crossed considering different attributes; the date and hour of publication are taken from the tweet

• When these are crossed with the date and hour from official air quality reports, a match is found


We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweet)


1    Tweet   newspaper1: The index of IMECAS is in 135 #CDMX           #Taxqueña        10:00 hours
2    Tweet   Newspaper2: The #contaminación of air is in 127 IMECAS #CDMX   #Indios Verdes   15:00 hours

Knowledge Discovered!
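The crossing step in this example can be sketched as a join between tweets and official readings. The station names, hours and values come from the example above; the calendar date, the extra third reading, the record layout and the function `cross` are hypothetical. Matching on the IMECA value in addition to date and hour is also an assumption, added so that one station can be singled out.

```python
from datetime import datetime

# Hypothetical official air quality readings (the date is made up).
official_reports = [
    {"station": "Taxqueña", "time": datetime(2016, 10, 1, 10, 0), "imeca": 135},
    {"station": "Indios Verdes", "time": datetime(2016, 10, 1, 15, 0), "imeca": 127},
    {"station": "Taxqueña", "time": datetime(2016, 10, 1, 15, 0), "imeca": 140},
]

# Tweets reduced to publication time and reported value.
tweets = [
    {"id": 1, "time": datetime(2016, 10, 1, 10, 0), "imeca": 135},
    {"id": 2, "time": datetime(2016, 10, 1, 15, 0), "imeca": 127},
]


def cross(tweet, reports):
    """Return the stations whose reading matches the tweet's time and value."""
    return [
        report["station"]
        for report in reports
        if report["time"] == tweet["time"] and report["imeca"] == tweet["imeca"]
    ]


matches = {tweet["id"]: cross(tweet, official_reports) for tweet in tweets}
print(matches)
```

An empty list for a tweet would mean that no official reading supports it, which is the case where its certainty remains in doubt.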

Other preliminary results

• Following the same approach

• Knowledge discovered: which topics are discussed in each region

Topic                 Region                     Period
Health                South, West                March-June
Transport             North, East                January-December
Policy and programs   Center                     January-December
Pollution             Surrounding Mexico City    January-June
Public roads          Surrounding Mexico City    January-December

Conclusions and future work

• The integration of the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.

• The main contribution is the domain discovery and classification of information, which is a key element of new approaches to discovering geographic information.

Conclusions and future work

• Future work

• Use clustering or deep learning approaches to improve the classification process

• Location detection is a hard problem. Other machine learning methods for social media can be tested [4, 5]

• How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?

Many Thanks!

Questions?

Roberto Zagal zagalmmx@gmail.com

IPN, México


References

[1] Jonghyun Han, Hyunju Lee, Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media, Information Sciences, Volumes 358–359, 1 September 2016, Pages 112-128, ISSN 0020-0255.

[2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). Signitrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 871-880). ACM.

[3] Carlos Acevedo Miranda, Ricardo Clorio Rodriguez, Roberto Zagal Flores, and Consuelo V. Garcia Mendoza. Web architecture for analysis of feelings in Facebook with semantic approach (Spanish). Research in Computing Science 75 (2014), pp. 59–69. http://www.rcs.cic.ipn.mx/rcs/2014_75/

[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19-25. DOI=http://dx.doi.org/10.1145/2876480.2876485

[5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.

