TEXT MINING ON SOCIAL MEDIA’S DATA DURING
DISASTERS RESPONSE PHASE
BY
SAFA BOUGOUFFA
A dissertation submitted in fulfilment of the requirement for
the degree of Master of Information Technology
Kulliyyah of Information and Communication Technology
International Islamic University Malaysia
MARCH 2015
ABSTRACT
The pervasive use of social media has engendered extraordinary amounts of social data. Social media provides an easily accessible platform for users to share extensive information and situational updates during emergencies and disasters. In this study, we apply a text mining technique to Twitter data from the New York snowstorm and the Hollister earthquake, together with a content analysis of a collection of Online Discussion Forums. We explore the role of text mining in evaluating information during disasters in order to understand the use of social media, as well as the issues raised by online users, during the preparedness, mitigation, response and recovery phases of a disaster. The investigation is motivated by the fact that Information and Communication Technologies can help in the different phases of a disaster. An informatics focus on disasters is essential for the social good, as well as for increasing the attention that all sectors give to disasters. The results of this study show that the information on Twitter was not original content; instead, it came from traditional media and other sources that were subject to journalistic standards, and social media played the role of a mediator between the different types of media.
APPROVAL PAGE
I certify that I have supervised and read this study and that in my opinion, it conforms
to acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Master of Information Technology.
......................................................
Mira Kartiwi
Supervisor
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Master of Information Technology.
......................................................
Abdul Arahman Bin Ahmad Dahlan
Examiner
This dissertation was submitted to the Department of Information Systems and is
accepted as a fulfilment of the requirement for the degree of Master of Information
Technology.
......................................................
Mior Nasir Mior Nazir
Head, Department of
Information Systems
This dissertation was submitted to the Kulliyyah of Information & Communication
Technology and is accepted as a fulfilment of the requirement for the degree of Master
of Information Technology.
......................................................
Abdul Wahab Abdul Rahman
Dean, Kulliyyah of Information
and Communication Technology
DECLARATION
I hereby declare that this dissertation is the result of my own investigations, except
where otherwise stated. I also declare that it has not been previously or concurrently
submitted as a whole for any other degrees at IIUM or other institutions.
Safa Bougouffa
Signature …………………………………… Date ……………………..
COPYRIGHT PAGE
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
DECLARATION OF COPYRIGHT AND AFFIRMATION OF
FAIR USE OF UNPUBLISHED RESEARCH
Copyright © 2015 by Safa Bougouffa. All rights reserved.
TEXT MINING ON SOCIAL MEDIA’S DATA DURING
DISASTERS RESPONSE PHASE
No part of this unpublished research may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise without prior written permission of the copyright holder except
as provided below.
1. Any material contained in or derived from this unpublished research
may only be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print
or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system
and supply copies of this unpublished research if requested by other
universities and research libraries.
Affirmed by Safa Bougouffa
…..…………………………….. ……………………………
Signature Date
ACKNOWLEDGEMENTS
In the name of Allah, the Most Gracious and Most Merciful
Alhamdulillah. Thanks to Allah SWT for granting me the wisdom, health and strength to undertake this dissertation and enabling me to complete it. I am grateful to a number of people who have guided and supported me throughout the research process and assisted me in this venture. I would like to express my deepest thanks to my supervisor, Dr. Mira Kartiwi, for her valuable guidance, scholarly input and the consistent encouragement I received throughout the research work.
Finally, my sincere indebtedness and gratitude go to my beloved parents and, not to forget, my dearest siblings for their endless love, prayers, encouragement, constructive suggestions and full support for the completion of this dissertation, from the beginning till the end.
To them, I am eternally grateful.
TABLE OF CONTENTS
Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright Page
Acknowledgements
List of Tables
List of Figures

CHAPTER ONE: INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Research Questions
1.4 Research Objectives
1.5 Methodology
1.6 Definitions of Terms
1.6.1 Social Media
1.6.2 Disaster
1.6.3 Text Mining
1.7 Significance of the Study
1.8 Organization of the Dissertation
1.8.1 Chapter One: Introduction
1.8.2 Chapter Two: Literature Review
1.8.3 Chapter Three: Research Methodology
1.8.4 Chapter Four: Analysis and Findings
1.8.5 Chapter Five: Conclusion

CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Disasters
2.2.1 Hazards
2.2.2 Vulnerability
2.2.3 Capacity
2.2.4 Risk
2.3 Disaster Management
2.3.1 Phases of Disaster Management
2.3.1.1 Mitigation
2.3.1.2 Preparedness
2.3.1.3 Response
2.3.1.4 Recovery
2.4 Social Media
2.4.1 Facebook
2.4.2 Twitter
2.4.3 Online Discussion Forums
2.4.4 Wikis
2.4.5 Blogs
2.4.6 YouTube
2.5 Social Media and Disasters

CHAPTER THREE: RESEARCH METHODOLOGY
3.1 Introduction
3.2 Qualitative Research Design
3.3 Quantitative Research Design
3.4 Mixed Methods Research Design
3.5 Why a Mixed Methodology
3.6 Data Collection
3.7 Selected Research Design
3.7.1 Content Analysis
3.7.2 Text Mining
3.7.2.1 R Software
3.7.3 Google Trends

CHAPTER FOUR: ANALYSIS AND FINDINGS
4.1 Introduction
4.2 Content Analysis
4.3 Text Mining Analysis
4.4 Accuracy of Social Media
4.5 Reliability of Social Media during Disasters
4.6 Conclusion

CHAPTER FIVE: CONCLUSION
5.1 Summary of the Study
5.2 Findings of the Study
5.2 Limitations
5.3 Recommendations
5.4 Conclusion

BIBLIOGRAPHY
LIST OF TABLES
Table 4.1 Online Discussion Forums Analysis Part 1
Table 4.2 Online Discussion Forums Analysis Part 2
Table 5.1 Findings of the Study
LIST OF FIGURES
Figure 2.1 Disaster Management Stages
Figure 3.1 Create an Application
Figure 3.2 Application Page
Figure 3.3 R OAuth Code for Twitter
Figure 4.1 R Code for Mining Tweets of New York Snowstorm
Figure 4.2 Word Cloud of New York Snowstorm Tweets
Figures 4.3a, 4.3b R Code to Extract Graph of Who Tweets Whom
Figure 4.4 Tweets with ‘New York snow storm’: Who Retweets Whom
Figures 4.5a, 4.5b R Code for Mining Tweets of Hollister Earthquake
Figures 4.6a, 4.6b Word Cloud of Hollister Earthquake
Figures 4.7a, 4.7b Google Trends Interest Over Time Graph
Figures 4.8a, 4.8b Google Trends Regional Interest Graph
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF STUDY
In recent years, the world has witnessed a series of major disasters: Hurricane Katrina in the USA in 2005, the European heat wave in 2003, the Kashmir earthquake, the tsunamis in Indonesia and Japan, Cyclone Nargis in Myanmar, and the floods in Australia and Thailand. These disasters struck without warning and killed many thousands of people (Zin, Tin, Hama, & Toriu, 2013). Moreover, the rapid growth of the world's population, together with its increasing concentration in hazardous environments, has intensified both the frequency and the severity of disasters. In addition, unstable landforms and tropical climates make disaster-prone areas more vulnerable. Such disasters are followed by a period in which people have limited situational awareness, bound to their personal environment (Rogstadius, V., Laredo, & Vukovic, 2011), combined with a lack of information about possible sources of food, shelter, transportation and other necessities (Zin, Tin, Hama, & Toriu, 2013). As a result, communication among people increases.
A shared understanding and effective communication are needed to minimize the loss of lives and property as far as possible (Mwendwa, 2013). To meet these needs, a technology-based system that ensures public information and education, improved warning and disaster preparedness is required (Mwendwa, 2013). Moreover, communication plays a very important role in disaster mitigation, as it can help provide data management and analysis techniques in addition to increasing knowledge about disasters (Mwendwa, 2013). In general, organizations develop recovery plans and take action to respond to disasters and their effects (Singh, 2012). In disasters, the media play a critical role, as they are believed to be a strong communication channel that leads to a successful understanding of the situation by making the message more valuable and credible for the public (Mwendwa, 2013).
Television and newspapers serve as communication tools during disasters. However, these media represent the old communication paradigm of one-way information flow (Giroux, Roth, & Herzog, 2013). Nowadays, information moves in multiple directions, resulting in two-way communication (Giroux, Roth, & Herzog, 2013). Social media is the best example of two-way communication media that employ interactive online information and communication technologies (Wang, Lin, & Bagrow, 2012). Social media encourages users to interact and engage in dialogue, creating an information space that is decentralized and devoid of hierarchy (Giroux, Roth, & Herzog, 2013). Social media has also changed the way people spend their leisure time by offering extensive possibilities for virtual interaction and entertainment. It has become a very important part of the lives of many people around the world and has changed the way people interact and do business with each other (Wang, Lin, & Bagrow, 2012).
Social media platforms such as Facebook, Flickr, Twitter and YouTube provide a great source of information (Dou, Wang, Ribarsky, & Zhou, 2012). They are visited daily by millions of users, which makes them among the first news forecasters and knowledge providers for a large mass of people (Nagar, Seth, & Joshi, 2012). They are also used to express opinions and points of view (Agarwal et al., 2011). Social media is also a venue for publicizing disasters, for becoming involved in the large pool of social interactions surrounding a particular disaster, and for propagating false information related to a disaster (Landwehr, M., & Carley, 2014). Over the past few years, online conversations have grown remarkably (Dou, Wang, Ribarsky, & Zhou, 2012). Most conversations relate to a user’s personal circle; however, a large share of them are responses triggered by events (Wang, Lin, & Bagrow, 2012). Users’ updates serve as emergency notification tools that alert followers to an upcoming disaster (Dou, Wang, Ribarsky, & Zhou, 2012), and they can also be a reliable source of information for the majority of adults during the different phases of a disaster (Carson, 2014).
During Hurricane Sandy in 2012, a large number of Americans used Facebook, Twitter and other social networking sites to gather information about the storm’s predicted track, the locations of open shelters, and the streets and towns that had been flooded, as well as about how to apply for federal assistance. According to the Federal Emergency Management Agency (FEMA), social media users sent over 20 million tweets related to Sandy during the storm, despite severe mobile phone and power outages. Eleven years earlier, hundreds of families in the same neighborhoods of New York and New Jersey had waited a long time to hear about their relatives because of the same types of outages (Brooks, 2014).
Lawmakers and security experts have begun to evaluate how disaster management can best adapt, and researchers have pointed out the importance of social media in disasters (Maron, 2013). Facebook, for example, supports many emergency-related organizations, such as Information Systems for Crisis Response and Management (ISCRAM) and the Humanitarian Free and Open Source Software (FOSS) Project (Lindsay, 2011). Disaster management organizations use social media to distribute information and to receive feedback through wall posts and messages (Lindsay, 2011). Moreover, social media can be considered a tool for issuing disaster warnings, receiving requests for help from victims, establishing situational awareness by monitoring users’ activities, and estimating damage from uploaded images (Lindsay, 2011).
With the spread of social media, disaster management organizations have increasingly turned to social networks to aid rescue and relief efforts. The significance of social media in disaster management has also been recognized by the United Nations (UN), and projects such as “Space-Based Information for Crowdsource Mapping” have become one of the essential activities of the United Nations Platform for Space-based Information for Disaster Management and Emergency Response (Brooks, 2014).
Despite the positive characteristics of social media use during disaster events, attention should also be paid to its negative consequences. Social media can be harmful because misleading, faulty and even malicious information can spread quickly through it (Brooks, 2014).
1.2 PROBLEM STATEMENT
Disasters in today’s globalized world are becoming not only more frequent but often more catastrophic. Over the last few years, the world has been affected by more than 400 natural disasters that killed more than 297,000 people and affected over 217 million others. Technological advancement and development have also created a great deal of infrastructure and many permanent assets that are at risk from such events. The increasing loss of lives and property due to disasters has led communities to explore approaches to disaster management that anticipate threats and allow disasters to be tackled from the pre-disaster stage.
Thus, research is now directed at how to minimize the effects of disasters and to reduce or avoid the human, physical and economic losses suffered by communities. It also investigates the use of Information and Communication Technology to deliver fast information to potential victims, with a focus on the accuracy and reliability of the information provided by social media during disasters.
Therefore, this study examines the use of text mining on social media data during disasters in order to evaluate the accuracy and reliability of the information, and to identify the role played by social media platforms during recent disaster events, with the intention of better understanding the related benefits and risks and raising awareness of their importance.
1.3 RESEARCH QUESTIONS
The following questions are the basis of the current study:
1. How can information be evaluated during disasters by using text mining?
2. What is the role of social media in disaster response?
3. What issues emerge on social media during the disaster response phase?
4. How can governments and organizations make use of the information discovered through text mining?
1.4 RESEARCH OBJECTIVES
The objectives of this study are:
1. To identify the role of text mining in evaluating information during disasters.
2. To identify the use of social media in disaster response.
3. To determine the issues that emerge on social media during the disaster response phase.
4. To understand how governments and organizations can use the information discovered by text mining.
1.5 METHODOLOGY
To examine these research questions, a text mining methodology is used to collect and analyze data from social media platforms during disasters, together with a content analysis of Online Discussion Forums to determine the issues that emerged during the disaster management phases. Further details are given in Chapter Three.
1.6 DEFINITIONS OF TERMS
1.6.1 Social Media:
Paquette (February 2011) defined social media as a developed technology with the potential to provide the adaptability, flexibility and boundary-spanning functionality that a number of response organizations need for their information systems (Paquette, February 2011). Furthermore, Mayfield (2008) identified it as a collection of new online media characterized by the following features: connectedness, participation, openness, conversation and community. Moreover, Ellison (2008) defined social media as web-based services that enable individuals to create a public profile within a bounded system, maintain a list of other users with whom they share a connection, and view and navigate their list of connections within the system. The nature of these connections may differ from one site to another (Ellison, 2008).
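As an illustration of Ellison’s (2008) definition, the Python sketch below models a bounded system in which users create a profile, keep a list of connections, and view that list. It is a minimal sketch under stated assumptions, not a description of how any real platform stores its data; the class and method names (`SocialNetworkSite`, `connect`, `list_connections`) are hypothetical. The `directed` flag shows how the nature of connections can differ between sites: a follow-style link, as on Twitter, is one-way, whereas a friendship-style link, as on Facebook, is mutual.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A public profile inside the bounded system (hypothetical fields)."""
    user_id: str
    display_name: str


@dataclass
class SocialNetworkSite:
    """Hypothetical bounded system holding profiles and connection lists.

    directed=True  : follow-style links, stored only in the follower's list.
    directed=False : mutual links, stored in both users' lists.
    """
    directed: bool
    profiles: dict[str, Profile] = field(default_factory=dict)
    connections: dict[str, set[str]] = field(default_factory=dict)

    def create_profile(self, user_id: str, display_name: str) -> None:
        # Creating a public profile also creates an empty connection list.
        self.profiles[user_id] = Profile(user_id, display_name)
        self.connections.setdefault(user_id, set())

    def connect(self, a: str, b: str) -> None:
        # Connections can only be made between users of the same system.
        if a not in self.profiles or b not in self.profiles:
            raise KeyError("both users must have a profile on this site")
        self.connections[a].add(b)
        if not self.directed:
            # A mutual connection appears in both users' lists.
            self.connections[b].add(a)

    def list_connections(self, user_id: str) -> list[str]:
        # Viewing a user's list of connections (sorted for stable output).
        return sorted(self.connections[user_id])


# Hypothetical usage: the same two users on a follow-style and a friendship-style site.
follow_site = SocialNetworkSite(directed=True)
friend_site = SocialNetworkSite(directed=False)
for site in (follow_site, friend_site):
    site.create_profile("alice", "Alice")
    site.create_profile("bob", "Bob")
    site.connect("alice", "bob")

print(follow_site.list_connections("bob"))  # [] : bob does not follow alice back
print(friend_site.list_connections("bob"))  # ['alice'] : friendship is mutual
```

The same action (`connect("alice", "bob")`) produces different connection lists on the two sites, which is the sense in which the nature of connections varies from one site to another.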
Social media includes web-based and mobile technologies that are used to turn communication into an interactive dialogue. Social media is also defined as a group of Internet-based applications that build on the technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content (Zlateva, 2012). Social media is the kind of media used for social interaction, enabled by communication technologies such as the web and smartphones (Wikipedia, 2014). Because communication on social media is highly distributed, decentralized and happens in real time, it provides the breadth and immediacy of information required in times of disaster (Palen, 2008).
1.6.2 Disaster
Disasters happen as a result of the impact of a natural or a human-caused hazard. Natural hazards include phenomena such as earthquakes, volcanoes, landslides, tsunamis, tropical cyclones, tornadoes, coastal flooding, wildfires and many others. Human-caused hazards may be intentional, for example the illegal discharge of oil, or accidental, such as toxic spills or a nuclear meltdown. All of these hazards threaten people, ecosystems, flora and fauna (Quarantelli, 2005).
1.6.3 Text Mining
Text data mining is defined as the process of deriving high-quality information from text. High-quality information is information derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually by parsing it, adding some derived linguistic features and removing others, and then inserting the result into a database). In text mining, high quality refers to a combination of novelty, relevance and interestingness. Text mining includes many tasks, such as text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization and entity relation modeling (Wikipedia, Text mining, 2014).
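The structuring step described above can be sketched in Python. This is a minimal illustration only, not the R workflow this study uses later; it assumes a tiny hypothetical stopword list and two made-up tweets. The text is lowercased, links and user mentions are removed, stopwords are dropped, and the remaining words are counted into a document-term matrix (one row per document, one column per term), which is the structured form that tasks such as categorization and clustering operate on.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; real studies use a fuller list.
STOPWORDS = {"the", "a", "an", "in", "of", "on", "and", "is", "to", "for", "rt"}


def preprocess(text: str) -> list[str]:
    """Turn raw text into a list of cleaned tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic words only
    return [t for t in tokens if t not in STOPWORDS]


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one row of term counts per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Two made-up tweets, used only to show the shape of the output.
tweets = [
    "RT @nycweather: Snowstorm hits New York http://example.com",
    "Snow and more snow in New York",
]
vocab, dtm = document_term_matrix(tweets)
print(vocab)  # ['hits', 'more', 'new', 'snow', 'snowstorm', 'york']
for row in dtm:
    print(row)  # [1, 0, 1, 0, 1, 1] then [0, 1, 1, 2, 0, 1]
```

Note that "snow" and "snowstorm" stay separate columns here; whether to merge such variants (for example by stemming) is a design choice the sketch leaves out.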
Text analysis includes information retrieval, analysis of word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The main goal is to turn text into data for analysis through the application of natural language processing and analytical methods (Wikipedia, Text mining, 2014).
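Two of the analyses listed above, word frequency distributions and association analysis, can be illustrated with a short Python sketch. It is an assumption-laden example, not the procedure used in this study: the four documents and the stopword list are hypothetical, and association is measured here as the plain Pearson correlation between two terms’ per-document counts. R’s tm package offers a correlation-based `findAssocs()` function for a similar purpose, but this sketch is not a reimplementation of it.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "in", "of", "and", "is", "to"}  # hypothetical, minimal list


def tokens(text: str) -> list[str]:
    """Lowercase, keep alphabetic words, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def word_frequencies(docs: list[str]) -> Counter:
    """Word frequency distribution over the whole corpus (the basis of a word cloud)."""
    freq = Counter()
    for d in docs:
        freq.update(tokens(d))
    return freq


def association(docs: list[str], term_a: str, term_b: str) -> float:
    """Pearson correlation between two terms' per-document counts."""
    counts = [Counter(tokens(d)) for d in docs]
    xs = [c[term_a] for c in counts]
    ys = [c[term_b] for c in counts]
    n = len(docs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a term whose count never varies gives no association signal
    return cov / (sx * sy)


# Hypothetical mini-corpus of disaster-related posts.
docs = [
    "power outage in the city",
    "power outage and snow",
    "snow is falling",
    "shelter open downtown",
]
print(word_frequencies(docs).most_common(3))  # [('power', 2), ('outage', 2), ('snow', 2)]
print(association(docs, "power", "outage"))   # 1.0: the two terms always co-occur
```

A correlation close to 1 flags terms that tend to appear in the same posts, which is one simple way of surfacing related topics in disaster-time messages.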
1.7 SIGNIFICANCE OF THE STUDY
The significance of the study stems from the fact that Information and Communication Technologies can help in disaster preparation, warning, response and recovery. A focus on disasters from an information and communication technology perspective is critical for the social good, especially given the growing attention that all sectors pay to disasters. The study will support the use of text mining as a methodology for evaluating the accuracy of information on social media during the different phases of disaster management.
1.8 ORGANIZATION OF THE DISSERTATION
The report of the study is organized into five chapters that address its main concern, which is to discover the role of social media in the phases of disaster management. A summary of these chapters is given below:
1.8.1 Chapter One: Introduction
This chapter introduces the study, providing its purpose, research questions and objectives. It briefly describes the method used to gather the data and sets out the benefits of the study as well as the limitations encountered in carrying it out.
1.8.2 Chapter Two: Literature Review
This chapter looks at the history of social media and disasters. It also highlights the use of Information and Communication Technologies during disasters and reviews previous work on the use of social media and the application of text mining to social media data during disasters.
1.8.3 Chapter Three: Research Methodology
This chapter examines the methods and tools used in this study to achieve the research objectives. A mixed methodology is adopted, starting with a content analysis of the online discussion forums, followed by a text mining methodology applied to Twitter data with the help of the R software and Google Trends.
1.8.4 Chapter Four: Analysis and Findings
In this chapter, the content analysis of the online discussion forums is presented, together with the text mining analysis. The results are highlighted and the findings are drawn.
1.8.5 Chapter Five: Conclusion
This chapter presents a summary of the use of text mining on social media data and of the content analysis of online discussion forums during disasters, followed by the findings of both analyses and a brief look at the limitations of this research. Recommendations for future work are given at the end of the chapter.
CHAPTER TWO
LITERATURE REVIEW
2.1 INTRODUCTION
Many scholars and researchers have contributed to the literature on the use of social media in disasters and emergency situations. This chapter first looks into the notion of disasters and their management, with a detailed explanation of the management phases. It then addresses the history of social media and its platforms. Finally, it discusses the literature on the use of Information and Communication Technology during disasters, together with the text mining findings of previous studies.
2.2 DISASTERS
The word disaster owes its origin to the French word désastre, which refers to a ‘bad or evil star’ (Satendra, 2003). The World Health Organization defines a disaster as “a sudden ecological phenomenon of sufficient magnitude to require external assistance”. A disaster also means a situation in which there is a sudden disruption of normalcy within society that causes widespread harm to life and property (Hodgkinson & Stewart, 1991). Furthermore, a disaster can result from the impact of a hazard on a vulnerable population, causing damage, casualties and disruption (Vasilescu, 2008).
Singh (2012) defines a ‘disaster’ as a crisis situation causing widespread damage that far exceeds the human ability to recover. Moreover, a disaster is a situation that leaves a community unable to cope. It can be natural or man-made and has a powerful negative impact on goods, services and people, surpassing the capability of the community to respond; hence, the community seeks aid from government and international agencies. A further definition of a disaster is an event, natural or human-caused, that leaves a society or community with a critical lack of basic necessities and food and disrupts the functioning of that community and society (Lelisa Sena, 2006).
Disasters are dangerous events that result in the loss of human life and property; they threaten both normal life and the process of development (Satendra, 2003). A natural disaster can result from biological, geological, seismic, hydrological or meteorological conditions or processes in the natural environment, such as rains, floods, tsunamis, cyclones, storms, landslides, earthquakes, volcanoes and tornadoes. A man-made disaster can result from wars, biological threats, arson, sabotage, riots, accidents (train, air, ship), industrial accidents, fires (forest fires), bomb explosions, nuclear explosions and ecological disasters (Hodgkinson & Stewart, 1991). Natural disasters usually cannot be prevented, but actions can be taken to reduce or eliminate the possibility of harm (Waeckerle, 1991). Their probability of occurrence, timing, location and severity can sometimes be predicted with the help of advanced scientific and technological tools (Singh, 2012).
2.2.1 Hazards
Vasilescu (2008) defines a hazard as “a dangerous situation or incident, which make a threat to life or damage to property or the environment.” Hazards can be both natural and man-made. Natural hazards happen as a result of natural phenomena such as cyclones, tsunamis, earthquakes and volcanic eruptions, which are entirely of natural origin. Landslides, floods, droughts and fires are socio-natural hazards, as they have both natural and man-made causes; flooding, for example, can result from heavy rains and landslides. Man-made hazards, the second type, happen due to human negligence. They are associated with industries or energy generation facilities and include explosions, leakage of toxic waste, pollution, dam failures, wars or civil strife. The list of hazards is long: many occur frequently, while others occur only occasionally (Vasilescu, 2008).
2.2.2 Vulnerability
Vasilescu (2008) defines vulnerability as “The level to which a community, structure, services or geographic area is expected to be damaged or disrupted as a result of particular hazard, at the expense of their nature, construction and nearness to a disaster prone area.” Vulnerabilities can be grouped into two categories: physical and socio-economic. Physical vulnerability answers the question of what may be destroyed by a natural hazard. It is essentially the physical state of the people and elements at risk, for example buildings and infrastructure, together with their proximity to the hazard, their location and the nature of the hazard. It is also connected to the technical capability of buildings and structures to resist the forces acting upon them during a hazard event (Vasilescu, 2008).
2.2.3 Capacity
Capacity is defined by Vasilescu (2008) as the resources, means and strengths within communities that make them capable of preparedness, prevention, mitigation and recovery from a disaster. People’s capacities can be categorized into two groups: physical and socio-economic. The physical capacity can be defined as the