Comparative study on dust pollution on air quality
in Korea and China through query search
Heeseok Jeong
Sungkyunkwan University
Dept. of Human ICT Convergence
June 05, 2015
Contents
Ⅰ. Introduction
Ⅱ. Related Work
1. Air pollution
2. Search Query
Ⅲ. Research Question & Methodology
Ⅳ. Phase
1. Data Collect
2. Correlation Coefficient
3. Monitoring Model
4. Analysis result
Ⅴ. Conclusion
1. Discussion
2. Limitation
3. Future Work
Ⅵ. References
Appendix
Ⅰ. Introduction
Rapid industrialization and urbanization have seriously degraded air quality in many regions around the world, particularly in China (Yasunari et al., 2013; Fajersztajn et al., 2013).
Large amounts of air pollutants emitted in China are directly related to high-PM10 episodes in Seoul (WTO, 2013).
Ⅰ. Introduction
Google Flu Trends (GFT): https://www.google.org/flutrends/intl/ko/
ILI and VD data from the Centers for Disease Control and Prevention
• ILI : influenza-like illness
• VD : influenza virus detection
Tracking flu-related searches on the web for query-based surveillance
Ⅱ. Related Work : Air pollution
Definition
• What is air pollution?
• Dust, yellow dust, fine dust, PM10, PM2.5, and so on
• The important air pollutants in Korea and China
By Korea Forest Research Institute (2012)
Air mass transport pathways and sources of air pollution
• "Composition Variation of Atmospheric Fine Particulate Matters in Accordance with Air Mass Transport Pathways at Background Site of Korea in 2013"
By Asian Dust Research Laboratory, National Institute of Meteorological Research (2014)
Ⅱ. Related Work : Search Query
Google Trends
• "Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks" by Herman Anthony Carneiro (2009)
• "Improving the Timeliness of Data on Influenza-like Illnesses using Google Search Data" by Jurgen A. Doornik, University of Oxford, UK (2009)
• "Detecting influenza epidemics using search engine query data" by Jeremy Ginsberg (2009)
Baidu
• "Monitoring influenza epidemics in China with search query from Baidu" by Qingyu Yuan (2013)
Ⅲ. Research Question & Methodology
Research Question
How can we monitor air pollution (PM10) through search queries in Baidu?
Consideration
• Exact geographic information on PM10 in China is not available from Korea → it can be approximated through search queries in Baidu
• How well do the queries match PM10 (accuracy)? e.g., time lag
• What models can we apply to the data?
Phase (Methodology)
• Data Collect (Baidu Index keywords, KMA PM10 data)
• Data Analysis (statistical analysis: correlation coefficient)
• Data Evaluation (logistic regression model, multiple regression model)
• Comparison (Korea & China)
Ⅳ. Phase(1) : Data Collect
Data Collect
Source : Korea Meteorological Administration
• Classification : PM10 data
• Period : 2014.06 ~ 2015.05
• Criteria : daily maximum value across the 28 observing stations in Korea
• Amount of data : 365 values (1 country × 365 days)
Ⅳ. Phase(1)
Data Collect
Source : Baidu Index(http://index.baidu.com/)
• Classification : 34 districts, selected keywords
• Period : 2014.06 ~ 2015.05
• Criteria : daily search-query volume in Baidu
• Amount of data : 161,330 values (34 districts × 365 days × 13 keywords)
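The stated data volumes can be checked with simple arithmetic. The sketch below assumes the period 2014.06 ~ 2015.05 runs from 1 June 2014 through 31 May 2015 inclusive; the exact boundary dates and the variable names are illustrative assumptions, not taken from the slides.

```python
from datetime import date

# Assumed period boundaries (not stated to the day in the slides).
start, end = date(2014, 6, 1), date(2015, 5, 31)
days = (end - start).days + 1          # inclusive day count

kma_values = 1 * days                  # one country, one daily maximum
baidu_values = 34 * days * 13          # districts x days x keywords

print(days, kma_values, baidu_values)
```

Under that assumption the totals agree with the 365 and 161,330 values reported for the two sources.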
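Chinese / English
安徽 Anhui
澳门 Macau
北京 Beijing
重庆 Chongqing
福建 Fujian
广东 Guangdong
甘肃 Gansu
广西 Guangxi
贵州 Guizhou
河北 Hebei
黑龙江 Heilongjiang
河南 Henan
湖南 Hunan
湖北 Hubei
海南 Hainan
吉林 Jilin
江苏 Jiangsu
江西 Jiangxi
辽宁 Liaoning
内蒙古 Inner Mongolia
宁夏 Ningxia
青海 Qinghai
上海 Shanghai
四川 Sichuan
山东 Shandong
山西 Shanxi
陕西 Shaanxi
天津 Tianjin
台湾 Taiwan
西藏 Tibet
香港 Hong Kong
新疆 Xinjiang
云南 Yunnan
浙江 Zhejiang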
Ⅳ. Phase(1) : Select the keywords
Data Collect
Selection and purification of keywords
• Keyword : sand, sandstorm, sand-dust storm, dust, fine dust, smog, floating dust, PM10, PM2.5
• Adjunct : prevention, symptom, treatment
• Selected keywords (search queries) : 13 words
沙尘暴 (yellow dust)
沙暴 (sandstorm)
微尘 (fine dust)
雾霾 (smog)
灰尘 (dust)
感冒 (cold)
小心 (caution)
哮喘 (asthma)
喘息 (asthma)
气管炎 (bronchitis)
口罩 (mask)
PM10 (fine dust)
PM2.5 (ultrafine dust)
Ⅳ. Phase(2) : Correlation Coefficient
Analysis of data : Correlation Coefficient
Correlation between search queries and PM10
• Consideration : time lag
• Time index of PM10 : t, t-1, t-2, t-3, t-4
Baidu data are divided into PC and mobile queries
北京 (Beijing)
Time lag 沙尘暴 沙暴 微尘 雾霾 灰尘 PM10 PM2.5 感冒 小心 哮喘 喘息 气管炎 口罩
PC
Lag 2 .778 .376 .598 .485 .408 .720 .865 .761 .286 .756 .281 .738 .730
Lag 1 .762 .382 .572 .457 .425 .723 .870 .768 .295 .766 .258 .744 .743
Lag 0 .623 .264 .441 .352 .333 .642 .772 .624 .211 .612 .211 .543 .555
Mobile
Lag 2 .812 .399 .622 .499 .416 .733 .877 .772 .301 .768 .301 .764 .762
Lag 1 .799 .400 .600 .488 .414 .754 .887 .788 .321 .751 .264 .789 .790
Lag 0 .663 .284 .522 .356 .234 .688 .721 .636 .233 .634 .243 .592 .612
Correlation coefficients between PM10 and keywords at lag 0, lag 1, and lag 2 (p<0.01)
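A lagged correlation of this kind can be sketched in a few lines. The example assumes that "lag k" pairs the query volume on day t-k with the PM10 value on day t; the slides do not spell out the direction, so this is an assumption. The function names and the toy series are hypothetical and are not the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(query, pm10, lag):
    """Correlate query volume on day t - lag with PM10 on day t."""
    if lag == 0:
        return pearson(query, pm10)
    return pearson(query[:-lag], pm10[lag:])

# Hypothetical toy series: PM10 repeats the query volume one day later.
query = [3, 8, 2, 9, 4, 7, 1, 6, 5, 10]
pm10 = [0] + query[:-1]

# Pick the lag (0, 1, or 2) with the highest correlation.
best_lag = max(range(3), key=lambda k: lagged_correlation(query, pm10, k))
print(best_lag)
```

Running the same function per district, per keyword, and per device (PC or mobile) would fill a table shaped like the one above.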
Ⅳ. Phase(2) : Correlation Coefficient
上海 (Shanghai)
Time lag 沙尘暴 沙暴 微尘 雾霾 灰尘 PM10 PM2.5 感冒 小心 哮喘 喘息 气管炎 口罩
PC
Lag 2 .502 .124 .230 .253 .337 .370 .487 .411 .177 .437 .131 .272 .312
Lag 1 .565 .310 .223 .318 .407 .477 .594 .509 .210 .520 .184 .464 .437
Lag 0 .381 .047 .247 .219 .282 .454 .455 .450 .035 .397 .014 .369 .087
Mobile
Lag 2 .219 .012 .184 .355 .438 .487 .513 .414 .072 .335 .109 .337 .320
Lag 1 .596 .200 .299 .353 .444 .489 .635 .537 .150 .370 .237 .592 .520
Lag 0 .224 .042 .132 .257 .416 .445 .487 .362 .063 .333 .039 .394 .433
Correlation coefficients between PM10 and keywords at lag 0, lag 1, and lag 2 (p<0.01)
Ⅳ. Phase(2) : Correlation Coefficient
河北 (Hebei)
Time lag 沙尘暴 沙暴 微尘 雾霾 灰尘 PM10 PM2.5 感冒 小心 哮喘 喘息 气管炎 口罩
PC
Lag 2 .556 .202 .404 .306 .285 .634 .726 .670 .242 .614 .203 .645 .622
Lag 1 .720 .364 .593 .482 .463 .797 .821 .777 .271 .774 .295 .711 .660
Lag 0 .539 .257 .484 .424 .415 .684 .795 .635 .217 .621 .247 .652 .637
Mobile
Lag 2 .520 .343 .454 .312 .227 .689 .623 .609 .268 .544 .265 .620 .604
Lag 1 .717 .386 .572 .471 .413 .811 .845 .748 .320 .748 .290 .784 .640
Lag 0 .551 .359 .476 .399 .332 .699 .800 .700 .305 .600 .275 .648 .628
Correlation coefficients between PM10 and keywords at lag 0, lag 1, and lag 2 (p<0.01)
Ⅳ. Phase(3) : Monitoring Model
PM10 data monitoring model
A logistic regression model and a multiple regression model are used to check the validity of the monitoring model
Logistic regression model : equations (1), (2), (3) (Myers, 1990)
Multiple logistic regression model : method
Keyword choice
• High correlation at lag 0, 1, 3
• Stepwise approach
Model evaluation
• Partial F-test ; R2
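A stepwise approach with a partial F-test can be sketched as forward selection on an ordinary least-squares fit. This is a generic illustration and not necessarily the procedure used in the study: the entry threshold `f_enter=4.0`, the helper names, and the toy data are all hypothetical. At each step the candidate with the largest partial F is added, and its contribution is recorded as an increase in R2 (ΔR2) together with the F value (ΔF).

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_stepwise(candidates, y, f_enter=4.0):
    """Add predictors one at a time while the best partial F reaches f_enter."""
    selected, steps = [], []
    tss = rss([], y)                     # intercept-only fit gives the total SS
    current = tss
    remaining = dict(candidates)
    while remaining:
        n_params = len(selected) + 2     # intercept + selected + one candidate
        scored = []
        for name, col in remaining.items():
            new = rss([candidates[s] for s in selected] + [col], y)
            f = (current - new) / (new / (len(y) - n_params))
            scored.append((f, name, new))
        f, name, new = max(scored)
        if f < f_enter:
            break
        steps.append((name, (current - new) / tss, f))   # (name, delta R2, delta F)
        selected.append(name)
        current = new
        del remaining[name]
    return steps

# Hypothetical toy data: y depends on a and b plus small noise; c is unrelated.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [5, 1, 4, 2, 3, 5, 1, 4, 2, 3]
c = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05, 0.1, -0.1]
y = [2 * ai + 3 * bi + ni for ai, bi, ni in zip(a, b, noise)]

steps = forward_stepwise({"a": a, "b": b, "c": c}, y)
for name, delta_r2, delta_f in steps:
    print(name, round(delta_r2, 3), round(delta_f, 3))
```

In the study's setting the candidates would be the lagged keyword series, and the ΔR2 values of the selected keywords would add up to the model's overall R2.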
Ⅳ. Phase(4) : Analysis result
Case of 北京 (Beijing)
PC-based and mobile-based queries are considered separately
Table : PC-based multiple regression analysis
Variable (lag)   RC      SE     SRC    t         ΔR2    (ΔF)
(constant)       -3.892  0.158  -      -24.692   -      -
沙尘暴 (lag 2)    1.096   0.223  0.404  4.911**   0.605  (82.596)
PM2.5 (lag 1)    0.737   0.155  0.291  4.748**   0.124  (24.225)
感冒 (lag 1)      0.806   0.206  0.322  3.905**   0.063  (15.724)
哮喘 (lag 1)      0.381   0.117  0.209  3.251**   0.036  (10.569)
RC : Regression Coefficient, SE : Standard Error, SRC : Standardized Regression Coefficient
R2 = 0.827, F-value (p-value) = 61.136** (0.000), ** : p<0.01
Figure : PC-based estimated PM10 by MRM (y-axis : PM10 (%))
Ⅳ. Phase(4) : Analysis result
Case of 北京 (Beijing)
PC-based and mobile-based queries are considered separately
Table : Mobile-based multiple regression analysis
Variable (lag)   RC      SE     SRC    t         ΔR2    (ΔF)
(constant)       -3.802  0.123  -      -30.844   -      -
沙尘暴 (lag 2)    1.213   0.238  0.399  5.094**   0.613  (85.706)
PM2.5 (lag 1)    0.502   0.167  0.242  3.001**   0.163  (38.646)
感冒 (lag 2)      0.611   0.145  0.243  4.224**   0.047  (13.993)
哮喘 (lag 1)      0.039   0.016  0.192  2.533**   0.022  (7.168)
气管炎 (lag 1)    0.48    0.202  0.202  2.379**   0.016  (5.659)
RC : Regression Coefficient, SE : Standard Error, SRC : Standardized Regression Coefficient
R2 = 0.861, F-value (p-value) = 62.083** (0.000), ** : p<0.01
Figure : Mobile-based estimated PM10 by MRM (y-axis : PM10 (%))
Ⅴ. Conclusion : Discussion
This research uses search-query data from Baidu and PM10 data from KMA for correlation and regression analysis
• 58,885,450 values
• 13 keywords chosen
I ranked the districts in China by how strongly they relate to PM10 in Korea, in descending order
To account for the time difference, I considered lags 0, 1, and 2
I found that the monitoring model is statistically valid. It can predict PM10 occurrence in Korea 1 to 2 days in advance, by regional group in China
I found that mobile queries give more accurate estimates than PC queries
So, PM10 figures in Korea can be predicted in advance using search queries
Ⅴ. Conclusion : Limitation
Dividing the data into PC queries and mobile queries may be meaningless, because there is no standard for the split.
I did not consider the differences between observatory stations in Korea.
Query data are hard to collect, so I used only a one-year data set.
The analysis of the complete data set (complete enumeration) is still in progress.
I could not include the analysis results for every city in China in my paper.
Ⅴ. Conclusion : Future Work
Building a single linear equation by continuously accumulating data from before 2014
Adding related data to raise accuracy
Developing the monitoring model further by considering various methodologies
Ⅵ. References
• Analysis of a decade of Asian outflow of PM10 and TSP to Gosan, Korea; also incorporating Radon-222, Jagoda Crawford, Atmospheric Pollution Research 6 (2015) 529-539
• Exploring the modeling of spatiotemporal variations in ambient air pollution within the land use regression framework: Estimation of PM10 concentrations on a daily basis, Md. Saniul Alam, Taylor & Francis, 2015
• A spatially varying coefficient model for mapping PM10 air quality at the European scale, N.A.S. Hamm, Atmospheric Environment 102 (2015) 393-405
• Origin of PM10 Pollution Episodes in an Industrialized Mega-City in Central China, Xinguo Zhuang, Aerosol and Air Quality Research, 14: 338-346, 2014
• Spatiotemporal Variations and Possible Sources of Ambient PM10 from 2003 to 2012 in Luzhou, China, Dong Ren, Environ. Eng. Res. 2014
• Spatiotemporal Variations of Ambient PM10 Concentrations in Nanchong, a Big City of Southwest China, Youping Li, Nat. Env. & Poll. Tech, ISSN: 0972-6268 Vol. 14 No. 1 pp. 165-170
• Monitoring Influenza Epidemics in China with Search Query from Baidu, Qingyu Yuan, PLOS ONE, Vol. 8, Iss. 5, 2013
• Carneiro, H.A. and Mylonakis, E., “Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks”, Clinical Infectious Diseases, Vol. 49, No. 10, pp. 1557-1564, 2009
• Doornik, J.A., Improving the Timeliness of Data on Influenza-like Illnesses using Google Search Data, University of Oxford Report, pp. 1-21, 2009
• Eysenbach, G., “Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance”, Proc. of AMIA 2006 Symposium, pp. 244-248, 2006
• Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S. and Brilliant, L., “Detecting influenza epidemics using search engine query data”, Nature, Vol. 457, pp. 1012-1014, 2009
• World Health Organization, Fact Sheet N°211, March 2014 (http://www.who.int/mediacentre/factsheets/fs211/en/)
• The Electronics Times (http://www.etnews.com/20141209001193)
Appendix
Paper
A Smart Healthcare Salinity Meter Design for User(KSDS, author, 11.2014)
Analysis of user experience using the social network(SNA) : View from the data in debit cards and partnerships(HCIK, author, 12.2014)
Designing mHealth intervention for Women in Menopausal Period(PervasiveHealth 2015, co-, 05.2015)
Understanding Women’s Needs in Menopause for Development of mHealth(MobiHoc 2015, co-, 05.2015)
Contest
Financial idea competition 4th prize(IBK,11.2013)
Thank you
Heeseok Jeong