
Comparative study on dust pollution on air quality

in Korea and China through query search

Heeseok Jeong

Sungkyunkwan University

Dept. of Human ICT Convergence

June 05, 2015


Contents

Ⅰ. Introduction

Ⅱ. Related Work
1. Air Pollution
2. Search Query

Ⅲ. Research Question & Methodology

Ⅳ. Phase
1. Data Collection
2. Correlation Coefficient
3. Monitoring Model
4. Analysis Result

Ⅴ. Conclusion
1. Discussion
2. Limitation
3. Future Work

Ⅵ. References

Appendix


Ⅰ. Introduction

Rapid industrialization and urbanization have seriously degraded air quality (Yasunari et al., 2013; Fajersztajn et al., 2013) in many regions around the world, particularly in China.

Large amounts of air pollutants emitted in China are directly related to high-PM10 episodes in Seoul (WTO, 2013).


Ⅰ. Introduction

Google Flu Trends (GFT): https://www.google.org/flutrends/intl/ko/

ILI and VD data from the Centers for Disease Control and Prevention
• ILI: influenza-like illness
• VD: influenza virus detection

Tracking flu-related web searches for surveillance


Ⅱ. Related Work: Air Pollution

Definition
• What is air pollution?
• Dust, yellow dust, fine dust, PM10, PM2.5, and so on
• The major air pollutants in Korea and China

By Korea Forest Research Institute (2012)

Air mass transport pathways and sources of air pollution
• Composition Variation of Atmospheric Fine Particulate Matters in Accordance with Air Mass Transport Pathways at Background Site of Korea in 2013

By Asian Dust Research Laboratory, National Institute of Meteorological Research (2014)


Ⅱ. Related Work: Search Query

Google Trends
• Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks by Herman Anthony Carneiro (2009)
• Improving the Timeliness of Data on Influenza-like Illnesses using Google Search Data by Jurgen A. Doornik, University of Oxford, UK (2009)
• Detecting influenza epidemics using search engine query data by Jeremy Ginsberg (2009)

Baidu
• Monitoring influenza epidemics in China with search query from Baidu by Qingyu Yuan (2013)


Ⅲ. Research Question & Methodology

Research Question

How can we monitor air pollution (PM10) through search queries in Baidu?

Considerations

• Korea has no precise geographic information on PM10 in China; we may be able to find it through Baidu search queries.

• How well do the queries match PM10 (accuracy), e.g., with a time lag?

• What models can we apply to the data?

Phase (Methodology)

1. Data collection (Baidu Index keywords; KMA PM10 data)
2. Data analysis (statistical analysis: correlation coefficient)
3. Data evaluation (logistic regression model; multiple regression model)
4. Comparison (Korea & China)


Ⅳ. Phase (1): Data Collection

Source: Korea Meteorological Administration (KMA)

• Classification: PM10 data

• Period: 2014.06 ~ 2015.05

• Criteria: daily maximum PM10 value across 28 observation stations in Korea

• Amount of data: 365 values (1 country × 365 days)
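The KMA criterion reduces many station readings to one number per day. A minimal sketch of that reduction follows, assuming the raw input is a list of (timestamp, station, PM10) readings; that input shape and the station names are hypothetical, since the slides do not describe KMA's export format.

```python
from collections import defaultdict
from datetime import datetime


def daily_max_pm10(readings):
    """Reduce (timestamp, station_id, pm10) readings to one value per day:
    the highest PM10 reported by any station on that calendar day."""
    daily = defaultdict(lambda: float("-inf"))
    for timestamp, _station_id, pm10 in readings:
        day = timestamp.date()
        daily[day] = max(daily[day], pm10)
    # Return the days in chronological order.
    return dict(sorted(daily.items()))


if __name__ == "__main__":
    # Hypothetical readings from two stations over two days.
    readings = [
        (datetime(2014, 6, 1, 9), "station_A", 48.0),
        (datetime(2014, 6, 1, 15), "station_B", 73.0),
        (datetime(2014, 6, 2, 9), "station_A", 55.0),
    ]
    for day, value in daily_max_pm10(readings).items():
        print(day, value)
```

For these readings the result is 73.0 for 1 June and 55.0 for 2 June; over the full study window this yields the 365 daily values.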


Ⅳ. Phase (1): Data Collection

Source: Baidu Index (http://index.baidu.com/)

• Classification: 34 districts, selected keywords

• Period: 2014.06 ~ 2015.05

• Criteria: daily search-query index in Baidu

• Amount of data: 161,330 values (34 districts × 365 days × 13 keywords)

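Districts (Chinese / English)
安徽 Anhui, 澳门 Macau, 北京 Beijing, 重庆 Chongqing, 福建 Fujian, 广东 Guangdong, 甘肃 Gansu,
广西 Guangxi, 贵州 Guizhou, 河北 Hebei, 黑龙江 Heilongjiang, 河南 Henan, 湖南 Hunan, 湖北 Hubei,
海南 Hainan, 吉林 Jilin, 江苏 Jiangsu, 江西 Jiangxi, 辽宁 Liaoning, 内蒙古 Inner Mongolia, 宁夏 Ningxia,
青海 Qinghai, 上海 Shanghai, 四川 Sichuan, 山东 Shandong, 山西 Shanxi, 陕西 Shaanxi, 天津 Tianjin,
台湾 Taiwan, 西藏 Tibet, 香港 Hong Kong, 新疆 Xinjiang, 云南 Yunnan, 浙江 Zhejiang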
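The data-volume figures can be checked with simple arithmetic. Assuming the window 2014.06 ~ 2015.05 means 1 June 2014 through 31 May 2015 inclusive, it contains 365 days, and one value per district, day and keyword gives 34 × 365 × 13 = 161,330; a 12 × 30-day approximation would give only 159,120.

```python
from datetime import date

# Assumed collection window: 1 June 2014 up to (but not including) 1 June 2015.
DAYS = (date(2015, 6, 1) - date(2014, 6, 1)).days  # 365
DISTRICTS = 34
KEYWORDS = 13

baidu_values = DISTRICTS * DAYS * KEYWORDS  # one value per district, day and keyword
kma_values = 1 * DAYS                        # one daily PM10 value for Korea

print(baidu_values, kma_values)  # 161330 365
```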


Ⅳ. Phase (1): Select the Keywords

Data Collection

Selection and refinement of keywords

• Keywords: 13 words

Candidate keywords: sand, sandstorm, sand-dust storm, dust, fine dust, smog, floating dust, PM10, PM2.5

Keyword categories: prevention, symptom, treatment, adjunct

Selected keywords (search queries)
沙尘暴 (yellow dust)
沙暴 (sandstorm)
微尘 (fine dust)
雾霾 (smog)
灰尘 (dust)
感冒 (cold)
小心 (be careful)
哮喘 (asthma)
喘息 (wheezing, asthma)
气管炎 (bronchitis)
口罩 (face mask)
PM10 (fine dust)
PM2.5 (ultrafine dust)


Ⅳ. Phase (2): Correlation Coefficient

Analysis of data: correlation coefficient

Correlation between search queries and PM10

• Consideration: time lag

• Time points of PM10 examined: t, t-1, t-2, t-3, t-4

Baidu data are divided into PC and mobile queries.

北京 (Beijing)

Platform, lag | 沙尘暴 | 沙暴 | 微尘 | 雾霾 | 灰尘 | PM10 | PM2.5 | 感冒 | 小心 | 哮喘 | 喘息 | 气管炎 | 口罩
PC, lag 2 | .778 | .376 | .598 | .485 | .408 | .720 | .865 | .761 | .286 | .756 | .281 | .738 | .730
PC, lag 1 | .762 | .382 | .572 | .457 | .425 | .723 | .870 | .768 | .295 | .766 | .258 | .744 | .743
PC, lag 0 | .623 | .264 | .441 | .352 | .333 | .642 | .772 | .624 | .211 | .612 | .211 | .543 | .555
Mobile, lag 2 | .812 | .399 | .622 | .499 | .416 | .733 | .877 | .772 | .301 | .768 | .301 | .764 | .762
Mobile, lag 1 | .799 | .400 | .600 | .488 | .414 | .754 | .887 | .788 | .321 | .751 | .264 | .789 | .790
Mobile, lag 0 | .663 | .284 | .522 | .356 | .234 | .688 | .721 | .636 | .233 | .634 | .243 | .592 | .612

Correlation coefficients between PM10 and keywords at lag 0, lag 1 and lag 2 (p < 0.01)


Ⅳ. Phase (2): Correlation Coefficient

Analysis of data: correlation coefficient

Correlation between search queries and PM10

• Consideration: time lag

• Time points of PM10 examined: t, t-1, t-2, t-3, t-4

Baidu data are divided into PC and mobile queries.

上海 (Shanghai)

Platform, lag | 沙尘暴 | 沙暴 | 微尘 | 雾霾 | 灰尘 | PM10 | PM2.5 | 感冒 | 小心 | 哮喘 | 喘息 | 气管炎 | 口罩
PC, lag 2 | .502 | .124 | .230 | .253 | .337 | .370 | .487 | .411 | .177 | .437 | .131 | .272 | .312
PC, lag 1 | .565 | .310 | .223 | .318 | .407 | .477 | .594 | .509 | .210 | .520 | .184 | .464 | .437
PC, lag 0 | .381 | .047 | .247 | .219 | .282 | .454 | .455 | .450 | .035 | .397 | .014 | .369 | .087
Mobile, lag 2 | .219 | .012 | .184 | .355 | .438 | .487 | .513 | .414 | .072 | .335 | .109 | .337 | .320
Mobile, lag 1 | .596 | .200 | .299 | .353 | .444 | .489 | .635 | .537 | .150 | .370 | .237 | .592 | .520
Mobile, lag 0 | .224 | .042 | .132 | .257 | .416 | .445 | .487 | .362 | .063 | .333 | .039 | .394 | .433

Correlation coefficients between PM10 and keywords at lag 0, lag 1 and lag 2 (p < 0.01)
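Each table carries the note p < 0.01, and whether a given coefficient clears that bar depends on the number of paired days. Assuming each coefficient uses about n = 365 daily pairs, the two-sided critical value can be approximated from t = r√(n - 2)/√(1 - r²), using the normal quantile in place of the t quantile (a large-sample approximation). Under that assumption, a few Shanghai entries fall below the threshold:

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Approximate two-sided critical |r| for H0: rho = 0 with n paired values.

    Solves t = r * sqrt(n - 2) / sqrt(1 - r**2) for r, using the standard
    normal quantile in place of the t quantile (a large-n approximation).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    df = n - 2
    return z / sqrt(z * z + df)


threshold = critical_r(365)  # assumes 365 daily pairs per coefficient

# Shanghai, PC queries, lag 0 (a subset of the table above).
shanghai_pc_lag0 = {"沙暴": 0.047, "小心": 0.035, "喘息": 0.014, "口罩": 0.087, "PM10": 0.454}
below = [kw for kw, r in shanghai_pc_lag0.items() if abs(r) < threshold]
print(round(threshold, 3), below)  # 0.134 ['沙暴', '小心', '喘息', '口罩']
```

If the effective number of days is smaller (for example, because of missing query data), the threshold rises accordingly.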


Ⅳ. Phase (2): Correlation Coefficient

Analysis of data: correlation coefficient

Correlation between search queries and PM10

• Consideration: time lag

• Time points of PM10 examined: t, t-1, t-2, t-3, t-4

Baidu data are divided into PC and mobile queries.

河北 (Hebei)

Platform, lag | 沙尘暴 | 沙暴 | 微尘 | 雾霾 | 灰尘 | PM10 | PM2.5 | 感冒 | 小心 | 哮喘 | 喘息 | 气管炎 | 口罩
PC, lag 2 | .556 | .202 | .404 | .306 | .285 | .634 | .726 | .670 | .242 | .614 | .203 | .645 | .622
PC, lag 1 | .720 | .364 | .593 | .482 | .463 | .797 | .821 | .777 | .271 | .774 | .295 | .711 | .660
PC, lag 0 | .539 | .257 | .484 | .424 | .415 | .684 | .795 | .635 | .217 | .621 | .247 | .652 | .637
Mobile, lag 2 | .520 | .343 | .454 | .312 | .227 | .689 | .623 | .609 | .268 | .544 | .265 | .620 | .604
Mobile, lag 1 | .717 | .386 | .572 | .471 | .413 | .811 | .845 | .748 | .320 | .748 | .290 | .784 | .640
Mobile, lag 0 | .551 | .359 | .476 | .399 | .332 | .699 | .800 | .700 | .305 | .600 | .275 | .648 | .628

Correlation coefficients between PM10 and keywords at lag 0, lag 1 and lag 2 (p < 0.01)
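Keyword choice for the monitoring model relies on which lag correlates best. Using the Hebei mobile coefficients from the table, the small sketch below picks, for each keyword, the lag with the largest coefficient. The "largest r" rule is an illustrative assumption, not necessarily the author's exact criterion.

```python
# Hebei (河北), mobile queries: correlation with PM10 by lag, copied from the table above.
KEYWORDS = ["沙尘暴", "沙暴", "微尘", "雾霾", "灰尘", "PM10", "PM2.5",
            "感冒", "小心", "哮喘", "喘息", "气管炎", "口罩"]
HEBEI_MOBILE = {
    0: [.551, .359, .476, .399, .332, .699, .800, .700, .305, .600, .275, .648, .628],
    1: [.717, .386, .572, .471, .413, .811, .845, .748, .320, .748, .290, .784, .640],
    2: [.520, .343, .454, .312, .227, .689, .623, .609, .268, .544, .265, .620, .604],
}


def best_lag_per_keyword(table, keywords):
    """For each keyword, return (lag, r) for the lag with the largest correlation."""
    result = {}
    for i, kw in enumerate(keywords):
        best = max(table, key=lambda lag: table[lag][i])
        result[kw] = (best, table[best][i])
    return result


for kw, (lag, r) in best_lag_per_keyword(HEBEI_MOBILE, KEYWORDS).items():
    print(f"{kw}: lag {lag} (r = {r:.3f})")
```

For Hebei mobile queries, lag 1 is the best lag for all 13 keywords, which is consistent with queries leading Korean PM10 by about one time step.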


Ⅳ. Phase (3): Monitoring Model

PM10 data monitoring model

A logistic regression model and a multiple regression model are used to test the validity of the monitoring approach.

Logistic regression model

Equations (1), (2), (3) (Myers, 1990)
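The slide's equations (1) to (3) did not survive extraction. For orientation only: Ginsberg et al. (2009), cited in the references, model the logit of a target proportion as a linear function of the logit of a query fraction, logit(P) = β0 + β1·logit(Q) + ε. A minimal sketch of that form follows, assuming both series are proportions strictly between 0 and 1; whether the author's equations take exactly this form is not confirmed by the source, and the function names are hypothetical.

```python
import numpy as np


def logit(p):
    """Log-odds of a proportion in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def fit_logit_linear(query_share, target_share):
    """Fit logit(target) = b0 + b1 * logit(query) by ordinary least squares.

    Returns (b0, b1).
    """
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    b1, b0 = np.polyfit(logit(query_share), logit(target_share), deg=1)
    return b0, b1


def predict(query_share, b0, b1):
    """Map a query fraction back to a predicted proportion via the inverse logit."""
    z = b0 + b1 * logit(query_share)
    return 1.0 / (1.0 + np.exp(-z))
```

Working on the logit scale keeps predictions inside (0, 1), which matters when the dependent variable is expressed as a percentage.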

Multiple logistic regression model: method

Keyword choice
• High correlation at lag 0, 1, 2
• Stepwise approach

Model evaluation
• Partial F-test; ΔR²
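The slide names a stepwise approach evaluated with partial F-tests but gives no details. The sketch below shows one common variant, forward selection: at each step, the candidate regressor with the largest partial F statistic enters if that F exceeds an F-to-enter threshold. The threshold of 4.0 and the function names are assumptions, and the author may have used a different stepwise rule (for example, one that also removes variables).

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Forward selection driven by the partial F-test.

    candidates: dict mapping a name (e.g. "沙尘暴 lag 2") to a 1-D array.
    At each step, the candidate whose addition gives the largest partial F
    enters the model, provided that F exceeds f_to_enter.
    Returns the selected names in order of entry.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.ones((n, 1))  # start from the intercept-only model
    remaining = {name: np.asarray(x, dtype=float) for name, x in candidates.items()}
    selected = []
    while remaining:
        rss_reduced = rss(X, y)
        best_name, best_f = None, f_to_enter
        for name, x in remaining.items():
            X_full = np.column_stack([X, x])
            df_resid = n - X_full.shape[1]
            if df_resid <= 0:
                continue
            rss_full = rss(X_full, y)
            # Partial F for one added regressor: drop in RSS over residual mean square.
            f = float("inf") if rss_full == 0 else (rss_reduced - rss_full) / (rss_full / df_resid)
            if f > best_f:
                best_name, best_f = name, f
        if best_name is None:
            break
        X = np.column_stack([X, remaining.pop(best_name)])
        selected.append(best_name)
    return selected
```

Recording the R² after each entry gives the ΔR² column reported in the result tables, since the increments of a forward sequence sum to the final R².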


Ⅳ. Phase (4): Analysis Result

Case of 北京 (Beijing)

PC-based and mobile-based queries are considered separately.

Variable | RC | SE | SRC | t | ΔR² (ΔF)
(constant) | -3.892 | 0.158 | | -24.692 |
沙尘暴 (lag 2) | 1.096 | 0.223 | 0.404 | 4.911** | 0.605 (82.596)
PM2.5 (lag 1) | 0.737 | 0.155 | 0.291 | 4.748** | 0.124 (24.225)
感冒 (lag 1) | 0.806 | 0.206 | 0.322 | 3.905** | 0.063 (15.724)
哮喘 (lag 1) | 0.381 | 0.117 | 0.209 | 3.251** | 0.036 (10.569)

RC: regression coefficient; SE: standard error; SRC: standardized regression coefficient
R² = 0.827; F-value (p-value) = 61.136** (0.000); ** p < 0.01

PC-based multiple regression analysis

Figure: PC-based estimated PM10 by MRM (multiple regression model); y-axis: PM10 (%)
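Two internal checks can be run on a regression table like this one: each t statistic should be close to RC / SE, and with forward entry the ΔR² increments should add up to the final R². The sketch below applies both to the Beijing PC-based model. It takes the constant as -3.892, the value consistent with the reported t = -24.692 (a constant of -0.3892 with SE 0.158 would give only about -2.46).

```python
# Beijing, PC-based model: (RC, SE, reported t) from the table above.
rows = {
    "(constant)": (-3.892, 0.158, -24.692),
    "沙尘暴 (lag 2)": (1.096, 0.223, 4.911),
    "PM2.5 (lag 1)": (0.737, 0.155, 4.748),
    "感冒 (lag 1)": (0.806, 0.206, 3.905),
    "哮喘 (lag 1)": (0.381, 0.117, 3.251),
}
delta_r2 = [0.605, 0.124, 0.063, 0.036]

for name, (rc, se, t_reported) in rows.items():
    # The t statistic of a regression coefficient is its estimate over its standard error.
    print(f"{name}: RC/SE = {rc / se:.2f}, reported t = {t_reported}")

# Forward-entry R^2 increments telescope to the final R^2 (reported as 0.827).
print(round(sum(delta_r2), 3))
```

All five ratios agree with the reported t values to within about 0.3%, and the increments sum to 0.828, matching the reported R² up to rounding.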


Ⅳ. Phase (4): Analysis Result

Case of 北京 (Beijing)

PC-based and mobile-based queries are considered separately.

Variable | RC | SE | SRC | t | ΔR² (ΔF)
(constant) | -3.802 | 0.123 | | -30.844 |
沙尘暴 (lag 2) | 1.213 | 0.238 | 0.399 | 5.094** | 0.613 (85.706)
PM2.5 (lag 1) | 0.502 | 0.167 | 0.242 | 3.001** | 0.163 (38.646)
感冒 (lag 2) | 0.611 | 0.145 | 0.243 | 4.224** | 0.047 (13.993)
哮喘 (lag 1) | 0.039 | 0.016 | 0.192 | 2.533** | 0.022 (7.168)
气管炎 (lag 1) | 0.48 | 0.202 | 0.202 | 2.379** | 0.016 (5.659)

RC: regression coefficient; SE: standard error; SRC: standardized regression coefficient
R² = 0.861; F-value (p-value) = 62.083** (0.000); ** p < 0.01

Mobile-based multiple regression analysis

Figure: Mobile-based estimated PM10 by MRM (multiple regression model); y-axis: PM10 (%)
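The fitted model turns lagged query values into an estimated PM10 series. A minimal sketch of that step using the mobile-based Beijing coefficients follows. It assumes the inputs are already on whatever scale the model was fitted on; the slides do not state any transformation, so the output is on that same unstated scale. `estimate` and the series layout are hypothetical.

```python
# Mobile-based Beijing model: (coefficient, lag in days) per keyword, from the table above.
MOBILE_MODEL = {
    "沙尘暴": (1.213, 2),
    "PM2.5": (0.502, 1),
    "感冒": (0.611, 2),
    "哮喘": (0.039, 1),
    "气管炎": (0.48, 1),
}
INTERCEPT = -3.802


def estimate(series, t, model=MOBILE_MODEL, intercept=INTERCEPT):
    """Linear predictor for day t: intercept + sum of coef * series[keyword][t - lag].

    series maps each keyword to a list of daily query values (index 0 = first day).
    """
    total = intercept
    for keyword, (coef, lag) in model.items():
        if t - lag < 0:
            raise ValueError(f"need at least {lag} earlier days of {keyword}")
        total += coef * series[keyword][t - lag]
    return total
```

Because every lag in this model is at least 1, the estimate for day t uses only queries up to day t - 1, which is what allows PM10 to be estimated ahead of time. With every query value equal to 1.0, the estimate for day 2 is -3.802 + 2.845 = -0.957.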


Ⅴ. Conclusion: Discussion

This research used Baidu search-query data and KMA PM10 data for correlation and regression analysis.

• 58,885,450 values

• 13 selected keywords

I ranked the districts in China by how strongly their queries relate to PM10 in Korea, in descending order.

Because of the time difference, lags 0, 1, and 2 were taken into account.

The monitoring model was statistically valid; it can predict PM10 occurrence in Korea 1 to 2 days in advance from regional groups in China.

Mobile queries estimated PM10 more accurately than PC queries (for Beijing, R² = 0.861 vs. 0.827).

Search queries can therefore be used to predict PM10 levels in Korea in advance.


Ⅴ. Conclusion: Limitation

Dividing queries into PC and mobile may be meaningless, because there is no standard for the distinction.

I did not consider differences among the observation stations in Korea.

Query data are hard to collect, so only a one-year data set was used.

The analysis is still in progress toward a complete enumeration survey.

The paper cannot include the analysis results for every city in China.


Ⅴ. Conclusion: Future Work

Building a single linear equation by continuously accumulating data, including data from before 2014

Adding related data to raise accuracy

Developing the monitoring model with various methodologies


Ⅵ. References

• Crawford, J., “Analysis of a decade of Asian outflow of PM10 and TSP to Gosan, Korea; also incorporating Radon-222”, Atmospheric Pollution Research, Vol. 6, pp. 529-539, 2015

• Alam, M.S., “Exploring the modeling of spatiotemporal variations in ambient air pollution within the land use regression framework: Estimation of PM10 concentrations on a daily basis”, Taylor & Francis, 2015

• Hamm, N.A.S., “A spatially varying coefficient model for mapping PM10 air quality at the European scale”, Atmospheric Environment, Vol. 102, pp. 393-405, 2015

• Zhuang, X., “Origin of PM10 Pollution Episodes in an Industrialized Mega-City in Central China”, Aerosol and Air Quality Research, Vol. 14, pp. 338-346, 2014

• Ren, D., “Spatiotemporal Variations and Possible Sources of Ambient PM10 from 2003 to 2012 in Luzhou, China”, Environ. Eng. Res., 2014

• Li, Y., “Spatiotemporal Variations of Ambient PM10 Concentrations in Nanchong, a Big City of Southwest China”, Nat. Env. & Poll. Tech., ISSN: 0972-6268, Vol. 14, No. 1, pp. 165-170

• Yuan, Q., “Monitoring Influenza Epidemics in China with Search Query from Baidu”, PLOS ONE, Vol. 8, Iss. 5, 2013

• Carneiro, H.A. and Mylonakis, E., “Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks”, Clinical Infectious Diseases, Vol. 49, No. 10, pp. 1557-1564, 2009

• Doornik, J.A., “Improving the Timeliness of Data on Influenza-like Illnesses using Google Search Data”, University of Oxford Report, pp. 1-21, 2009

• Eysenbach, G., “Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance”, Proc. of AMIA 2006 Symposium, pp. 244-248, 2006

• Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S. and Brilliant, L., “Detecting influenza epidemics using search engine query data”, Nature, Vol. 457, pp. 1012-1014, 2009

• World Health Organization, Fact Sheet N°211, March 2014 (http://www.who.int/mediacentre/factsheets/fs211/en/)

• The Electronics Times (http://www.etnews.com/20141209001193)


Appendix

Paper

A Smart Healthcare Salinity Meter Design for User (KSDS, author, 11.2014)

Analysis of user experience using the social network (SNA): View from the data in debit cards and partnerships (HCIK, author, 12.2014)

Designing mHealth intervention for Women in Menopausal Period (PervasiveHealth 2015, co-author, 05.2015)

Understanding Women’s Needs in Menopause for Development of mHealth (MobiHoc 2015, co-author, 05.2015)

Contest

Financial idea competition, 4th prize (IBK, 11.2013)


Thank you

Heeseok Jeong