Twitter sentiment analysis on immigration
Author: Radu Bogdan Pertescu
Supervisor: Dr. Sophia Ananiadou
A project report submitted to the University of Manchester for the Bachelor of
Science with Industrial Experience as part of the Third Year Project
(COMP30040).
Abstract
Social media plays an important role in how people communicate these days. Many users of social media platforms send messages, discuss ideas, make complaints and post reviews, all of which represent an important source of information for companies, politicians and researchers who want to gain an understanding of how people react to different topics. This report proposes the development of a tool which automates the process of data extraction, data pre-processing and sentiment analysis.
Sentiment is determined using two independent classifiers: a lexicon-based classifier and a machine learning classifier. This report also demonstrates how the lexicon classifier provides the training corpus required for the machine learning algorithm. In addition, an interface is implemented to show how the classification tools can be easily accessed by the user in order to create an analysis of immigration on social media. From the experiments undertaken, the best results were obtained using a machine learning classifier based on the Naïve Bayes algorithm, extracting only 20% of the features and resulting in approximately 80% accuracy.
Acknowledgements
First of all, I would like to thank my supervisor Dr. Sophia Ananiadou who provided me with
great advice and materials along with answers to my unlimited number of questions for the
entire period of the project.
I also want to thank my family for financial and moral support during the journey through
university allowing me to be exposed to such a great learning environment and grasp a vast
amount of skills.
Lastly, I want to thank all my friends who took the time to test the application and provided me with feedback and suggestions.
Table of contents
Chapter 1 Context .............................................................................................................................. 8
1.1 Introduction ............................................................................................................................. 8
1.2 Sentiment Analysis in microblogging ........................................................................................ 8
1.3 Motivation ................................................................................................................................ 9
1.4 Project Outline ......................................................................................................................... 9
1.5 Related work ............................................................................................................................ 9
1.6 Report Structure ..................................................................................................................... 10
Chapter 2 Design .............................................................................................................................. 11
2.1 System Diagram ...................................................................................................................... 11
2.2 Project Lifecycle ...................................................................................................................... 13
2.3 Data Extraction ....................................................................................................................... 13
2.3.1 Motivation ........................................................................................................................ 13
2.3.2 Twitter connectivity .......................................................................................................... 13
2.4 Data pre-processing ................................................................................................................ 14
2.5 Sentiment Evaluation Methods .............................................................................................. 14
Chapter 3 Implementation ............................................................................................................... 14
3.1 Data Extraction ....................................................................................................................... 14
3.2 Text Normalization ................................................................................................................. 17
3.2.1 JSON removal .................................................................................................................... 17
3.2.2 Lower case transformation and Tokenization ................................................................... 17
3.2.3 Removing Usernames ....................................................................................................... 18
3.2.4 Removing URLs ................................................................................................................. 18
3.2.5 Removing duplicates and white spaces ............................................................................. 18
3.2.6 Possible implementations and approaches ....................................................................... 18
3.3 Sentiment Analysis ................................................................................................................. 19
3.3.1 Motivation ........................................................................................................................ 19
3.3.2 Lexicon Based classifier ..................................................................................................... 19
3.3.3 Machine Learning classifier ............................................................................................... 20
3.3.4 Feature Extraction ............................................................................................................. 21
3.3.5 Multinomial Naïve Bayes .................................................................................................. 21
3.3.6 Classification Process ........................................................................................................ 22
3.3.7 Combining two different classifiers ................................................................................... 23
3.3.8 Previous approaches and possible implementations ........................................................ 23
3.4 User Interface ......................................................................................................................... 24
3.4.1 Motivation ........................................................................................................................ 24
3.4.2 Django ............................................................................................................................... 24
3.4.3 User Interface Structure .................................................................................................... 24
Chapter 4 Evaluation and Testing ..................................................................................................... 28
4.1 Beta Testing ............................................................................................................................ 28
4.2 Unit Testing ............................................................................................................................ 29
4.3 Further Testing ....................................................................................................................... 29
4.4 Evaluation ............................................................................................................................... 29
Chapter 5 Reflection and Conclusion ................................................................................................ 30
5.1 Requirements ......................................................................................................................... 30
5.2 Learning process ..................................................................................................................... 30
5.3 Challenges .............................................................................................................................. 31
5.4 Future Enhancements ............................................................................................................. 31
5.4.1 Different classifier implementations ................................................................................. 32
References .................................................................................................................................... 33
Appendix ...................................................................................................................................... 36
Table of Figures
Figure (1) Diagram of the system ......................................................................................................... 12
Figure (2) The process of extracting the tweets .................................................................................. 16
Figure (3) The tweets collected in a file ............................................................................................... 16
Figure (4) A tweet in JSON format........................................................................................................ 17
Figure (5) One of the tests performed with the spell checker ............................................................ 19
Figure (6) The formula of Naïve Bayes algorithm ................................................................................ 22
Figure (7) The first part of the interface ............................................................................................. 24
Figure (8) The file upload ...................................................................................................................... 25
Figure (9) The ability of saving the chart ............................................................................................. 26
Figure (10) The second part of the interface ....................................................................................... 26
Figure (11) Result of the classification ................................................................................................. 27
Figure (12) The textbox ......................................................................................................................... 27
Figure (13) The TerMine service highlighting important words ........................................................ 28
Acronyms
TM – Text Mining
NLP – Natural Language Processing
ML – Machine Learning
POS – Part Of Speech
CSV – Comma-Separated Values
Regex – Regular expression
NLTK – Natural Language Toolkit
MVC – Model View Controller
SKLearn – Scikit-Learn
IE – Information Extraction
IR – Information Retrieval
DM – Data Mining
SVM – Support Vector Machines
Chapter 1 Context
This chapter presents a general description of text mining, including the aims and objectives of the project and the applicability of a sentiment analysis tool on Twitter in the modern day.
1.1 Introduction
This section begins with a brief overview of the field. Text Mining (TM) can be characterized as the process of analysing natural language text in order to extract information that is useful for specific purposes. [1]
Text mining is achieved using three major sub-components:
Information Retrieval (IR) – Represents the process of gathering the data required
for analysis.
Information Extraction (IE) – Represents the process of structuring the data using
different Natural Language Processing (NLP) techniques.
Data Mining (DM) – Represents the process of discovering hidden associations and
creating new knowledge.
Sentiment Analysis (also known as Opinion Mining) is a common topic among TM researchers and represents the process of categorising a text as positive, negative or neutral. [61]
This project uses TM techniques to extract and analyse people's opinions on Twitter, resulting in an overall analysis of the sentiment.
1.2 Sentiment Analysis in microblogging
Twitter has become a very important component in Natural Language Processing, as it provides a substantial amount of real-time information which can be processed and used in different ways in TM and NLP.
Microblogging websites, as they stand today, are an unlimited source of information on various topics. This comes as a result of people using social media to post real-time messages presenting their opinions, making complaints, discussing different topics and reviewing the products they use in everyday life. As a result of this ongoing trend, manufacturers are turning to social media to investigate customers' reactions and opinions in order to improve their products as well as maintain a high standard of customer service. One particular challenge is building software which automates the process of identifying a general opinion or sentiment. [2]
1.4 Project Outline
The aim of the project is to analyse textual data extracted from social media; more specifically, to observe trends related to the perception of global immigration on Twitter and to determine the opinion of individuals on this particular topic.
In recent years, immigration has become a controversial topic, discussed all over the world, with an increasing number of politicians touching on the subject in their campaigns. In order to understand trends and opinions about immigration in social media, we need automated methods such as text mining and opinion mining. Such tools will be discussed further in this report.
I shall demonstrate how the application integrates Text Mining (TM) tools with visual analytics, offering the user an extensive analysis through the interface: the collective opinions over a specific period of time can be visualised, and each tweet can also be examined individually.
Furthermore, to extend the capabilities of the application, I shall demonstrate how two different sentiment classifiers are implemented. The system is also designed to allow users to choose the desired classifier, allowing them to compare the results. The application provides the user with a text box in which different sentences and reviews can be added in order to determine the sentiment expressed by their authors. Further to this, the system can also make a call to a web service named TerMine, provided by NaCTeM [33]. Through this service, the inputted text is analysed and, as a result, the most important words in the text are highlighted. Using this, the user can easily determine the content of the text and decide whether it is of interest for their needs.
1.5 Related work
Previous approaches include hand-coded rules (Neviarouskaya et al., 2010), the winnow
algorithm (Alm et al., 2005), random k-label sets (Bhowmick et al., 2009), Support Vector
Machines (SVM) (Koppel and Schler, 2006), and Naive Bayes (Mihalcea and Liu, 2006).[4]
One of the projects which aims to analyse product reviews is (Fang and Zhan, 2015). They propose a tool able to classify at sentence level, studying individual sentences and extracting their polarity, and at review level, generating the overall sentiment for the entire review. The system uses three different machine learning classifiers: a Naïve Bayes classifier, a Random Forest classifier and a Support Vector Machine (SVM) classifier.
In addition to social media, sentiment analysis is performed on numerous forums and domain-specific blogs. (Zhao et al., 2014) propose a tool which analyses Online Health
Communities (OHC) and identifies the most influential users; in this way the results will assist the community and the OHC users. They use a specially built classifier able to handle words which may refer to both positive and negative content, such as "cancer", and to categorise them correctly, as the term "cancer" usually relates to a negative sentiment.
As described in section 1.2, Twitter can be a valuable source of information regarding different domains of interest. In addition, Twitter hosts various discussions when a political campaign is under way, which makes sentiment analysis necessary in order to better understand users' opinions. (Bakliwal et al., 2013) describe various experiments conducted on the Irish General Election of February 2011. They use a naïve lexicon-based classifier which relies on a sentiment lexicon to determine the sentiment orientation of political tweets. After excluding the tweets containing sarcasm, the highest accuracy obtained was 61.6%.
One of the papers focuses on researching social media in regard to medicine. This is performed in order to gain a better understanding of how people react to particular medications and treatments. Further to this, the relation between specific treatments and possible symptoms can be studied in greater detail.
This chapter presented the reader with an introduction to Sentiment Analysis, its general purpose, and related work undertaken by different practitioners.
1.6 Report Structure
Chapter 2 Design:
This chapter will present the architecture of the project. In addition, the different approaches tested in order to achieve the results will also be included.
Chapter 3 Implementation:
This chapter presents a more detailed description of the steps followed alongside the tools
used for the project implementation.
Chapter 4 Testing and Evaluation:
This chapter contains the evaluation and tests carried out.
Chapter 5 Reflection and Conclusion:
This chapter presents the overall conclusion of the project along with any further
enhancements that can be carried out.
Chapter 2 – Design
The project was designed to allow an extensive analysis of a specific topic, based on information extracted from social media. The system extracts information from Twitter, normalises this data, then determines the sentiment of the text and categorises it into three groups: positive, negative and neutral. Furthermore, this chapter will
present a high-level view of the subcomponents of the system along with their functionality and purpose.
2.1 System diagram
The diagram below presents a high level design of the system, showcasing how the
components interact with each other.
Figure 1 – Diagram of the system.
2.2 Project lifecycle
In order to begin the project, the first step was to gather an understanding of Sentiment Analysis by evaluating the tools and resources needed for development. Some of the essential resources used to gain this understanding were "Opinion Mining and Sentiment Analysis" (Pang and Lee) [4] and "Natural Language Processing with Python" [5]. Both offer examples of different approaches to Sentiment Analysis, including code examples. Using these resources, several of the examples mentioned by the authors were implemented in order to understand how the components are interrelated. At the start of the project, various software development approaches were also evaluated in order to maximise the results whilst ensuring that a professional tool is delivered. The project therefore adopted a few Agile practices, which proved to be a good decision, given the numerous approaches to sentiment classifiers encountered and the need to easily change the code to comply with each of them. Moreover, had the system not been developed as separate independent modules, rewriting the code from scratch would have been unavoidable. As part of these practices, an online task board was used, which assisted in dividing the requirements into small parts and observing the progress. As a result, better time management and better task management were obtained. Furthermore, the use of retrospectives provided the option of looking back and checking how the development progressed after every iteration, improving the development of subsequent iterations.
2.3 Data Extraction
2.3.1 Motivation
A major part of this analysis is the data extraction, which provides the necessary information regarding the project's topic, immigration. It was also necessary to identify which social media platform suits the data extraction well. Previous experience with the Twitter API, awareness of the reliability of the Twitter Streaming services, and the amount of information available on Twitter played an important role in the choice of platform.
2.3.2 Twitter connectivity
In order to extract the data, we need to connect to Twitter. Twitter provides a Streaming API which offers developers "low latency access to Twitter's global stream of data" [6].
The Twitter Streaming API offers three different services:
Public streams - “Streams of the public data flowing through Twitter”
User streams - “Single-user streams, containing roughly all of the data corresponding
with a single user’s view of Twitter”
Site streams - “The multi-user version of user streams”[7]
In this system, the public stream is used in order to retrieve all public data based on a specific keyword. As a result of the call to the Twitter Streaming API, all tweets containing the specified keywords are retrieved and saved in a designated file.
2.4 Data pre-processing
Extracting the sentiment from a tweet is not a trivial matter, as the data found on microblogging websites contains slang, abbreviations and Twitter-specific symbols. The processed tweet needs to be cleared of URLs, @ mentions and other Twitter-specific symbols such as '#', whilst maintaining the text of the hashtag, as it can contain an important reference to the sentiment of the tweet.
The pre-processing is done using different regexes, which scan through the tweets, removing the undesired data and leaving the tweet clean and ready for analysis.
2.5 Sentiment evaluation methods
There are different methods and techniques used for performing sentiment analysis, and most of them fall into two major categories:
Machine Learning approaches
Lexicon Approaches
This project focuses on using two distinct approaches in order to compare their results; in addition, one can be used as a training data provider for the other.
The first approach is a lexicon approach, based on a lexicon containing words along with their pre-labelled sentiments.
The second approach is a machine learning approach, which uses probabilistic methods to determine the sentiment in a tweet. This process requires a pre-trained classifier.
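To make the distinction concrete, the lexicon approach can be reduced to a toy sketch. The lexicon below is illustrative only; the project's actual lexicon and scoring rules differ.

```python
# Toy sentiment lexicon: illustrative entries only, not the project's lexicon.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def lexicon_sentiment(tokens):
    """Sum the pre-labelled word scores and map the total to a class."""
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The machine learning approach, by contrast, learns these associations from a labelled corpus instead of looking them up in a fixed dictionary.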
Chapter 3 - Implementation
This chapter presents all the components of the system including a detailed step by step
description of approaches and tools used in the development.
3.1 Data Extraction
The first step in extracting the necessary data for the analysis is to decide on the source of the
extraction. This project uses Twitter as a source of information as it provides the user with
unlimited amount of data.
In order to obtain a comprehensive analysis, we need to extract a large number of tweets, which makes the automatization of data extraction absolutely necessary. As mentioned in section 2.3.2, Twitter provides an API which allows the user to extract the data. API stands for Application Program Interface: a means for the user to connect with the API provider in order to access certain services or tools [8]. This project makes use of the Twitter Streaming API. In order to connect to the API, we require four different API keys from Twitter:
Consumer Key
Consumer Secret
Access Token
Access Token Secret
These four keys are used to issue authorised requests to the Twitter Streaming API. [9]
These API keys can be obtained by accessing "apps.twitter.com/" and creating a Twitter app, where the Consumer Key and Consumer Secret will be available. In order to obtain the Access Token and Access Token Secret, Twitter provides a generation function which can be accessed from the created Twitter app.
Furthermore, the system uses a Python library called Tweepy [10], which reduces the complexity of connecting to the Twitter API by handling the creation of the session, the connection to the Twitter Streaming API, authentication, and also exiting the session. Tweepy also reads the incoming messages received from Twitter, making it easy to connect to the Twitter Streaming API. [10]
Inside the Stream Listener class of Tweepy there is a function called filter, assigned to the stream listener object, which allows the user to define specific keywords that will filter the tweet retrieval.
The project focuses specifically on immigration, so the keyword provided to the stream listener is "immigration". In this way, only tweets containing this keyword or related to the topic will be extracted. (Figure 2)
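The steps above can be sketched as follows. This sketch assumes the Tweepy 3.x class names (`StreamListener`, `OAuthHandler`, `Stream.filter`); the `tweet_line` helper and the layout of the `keys` dictionary are illustrative, not the project's actual code.

```python
import json

def tweet_line(raw_json):
    """Return the raw JSON line if it looks like a tweet (has a 'text'
    field), otherwise None. Hypothetical helper, not part of Tweepy."""
    try:
        data = json.loads(raw_json)
    except (ValueError, TypeError):
        return None
    return raw_json if isinstance(data, dict) and "text" in data else None

def stream_tweets(keys, outfile, keyword="immigration"):
    """Connect to the public stream and append matching tweets to a file."""
    import tweepy  # imported lazily; only needed when actually streaming

    class FileListener(tweepy.StreamListener):
        def on_data(self, raw):
            line = tweet_line(raw)
            if line:
                with open(outfile, "a", encoding="utf-8") as f:
                    f.write(line.strip() + "\n")
            return True  # keep the stream open

        def on_error(self, status_code):
            return status_code != 420  # disconnect on rate limiting

    auth = tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    tweepy.Stream(auth, FileListener()).filter(track=[keyword])
```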
Figure 2 - The process of extracting the tweets.
Once the session is active the tweets are saved in a file from where they will be processed
later. (Figure 3)
Figure 3 - The tweets collected in a file.
3.2 Text normalisation
Twitter has become a very important component in Natural Language Processing, as it provides a substantial amount of real-time information which can be processed and used in different ways in Text Mining and NLP. The information present on Twitter can range from formal reports to meaningless messages, mostly written in a Twitter-specific language. [11] From my observation, most of these tweets contain typos, abbreviations, emoticons, URLs and other Twitter-specific symbols and expressions. In order to be suitable for Sentiment Analysis, the tweets have to be normalised, which implies different "cleaning techniques" that will be presented in this chapter.
3.2.1 JSON Removal
Returning to the data extracted from Twitter, it should be mentioned that the Twitter Streaming API returns the tweets in JSON format, which stands for JavaScript Object Notation. [12]
The extracted tweet contains various information about the author [13] (Figure 4):
“created at”- date and time when the tweet was created
“id” – a unique value assigned to every user
“text” – the content of the tweet
“source” – the application from where the tweet was sent
“screen_name” – the name of the author
“location” – the location of the author
“description” – the biography of the author
“friends_count” – the number of friends of the specific user
“retweet_count” – how many times the tweet was retweeted
“lang” – the language of the tweet
Figure 4 - A tweet in JSON format.
Although the raw tweet contains a lot of information, only the text of the tweet is needed. In this case, the JSON is loaded and the tweets file is parsed, writing only the text of each raw tweet to a separate file.
In a different scenario, the other information contained in the raw tweet could also be handled in different ways according to the requirements of the project; for example, the tweets could be filtered by location, selecting different regions or languages.
3.2.2 Lower case transformation and Tokenization
Tokenization and lower case transformation are processes which involve transforming all the text to lower case and dividing the character stream into separate tokens. [14] These processes are required for the lexical analysis undergone by the lexicon classifier and are performed using the SentLex library. [15]
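As a minimal sketch of these two steps (the project itself relies on SentLex; the regex below is an assumption, not SentLex's tokenizer):

```python
import re

def normalise_tokens(text):
    """Lower-case the text, then split the character stream into word tokens.
    Keeps in-word apostrophes ("don't"); everything else is a separator."""
    return re.findall(r"[a-z0-9']+", text.lower())
```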
3.2.3 Removing usernames
One of the ways information spreads over Twitter is by users tagging other users in their tweets; in response, most users reply to the initial tweet using the same tagging principle. Tagging follows the Twitter-specific convention of adding "@" followed by the name of the user. In this way a tweet can contain numerous user tags that can affect sentiment detection. The system parses each tweet in a file, looking for "@" mentions and removing them as part of the normalization process.
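One possible regex for this step (the project's exact pattern is not reproduced here; `@\w+` matches Twitter's username character set of letters, digits and underscores):

```python
import re

MENTION = re.compile(r"@\w+")

def remove_usernames(tweet):
    """Delete @mentions, then tidy up any doubled spaces left behind."""
    return re.sub(r"\s{2,}", " ", MENTION.sub("", tweet)).strip()
```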
3.2.4 Removing URLs
Similar to the user tagging mentioned above, tweets usually contain links to different attachments or websites: expressions which do not carry any sentiment and whose removal will enhance the precision of the lexicon classifier and increase the accuracy of the machine learning classifier, as will be presented in the next chapter. As a result, the system uses regexes to parse the tweets, identify the patterns indicating a URL and remove it. Regex stands for "Regular Expression": a text string used for describing patterns.
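A sketch of such a pattern follows; it is an approximation covering http(s) links and bare `www.` links, not necessarily the project's exact regex:

```python
import re

# Matches http(s) links and bare www. links up to the next whitespace.
URL = re.compile(r"(?:https?://|www\.)\S+")

def remove_urls(tweet):
    """Delete URLs, then tidy up any doubled spaces left behind."""
    return re.sub(r"\s{2,}", " ", URL.sub("", tweet)).strip()
```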
3.2.5 Removing duplicates and white spaces
The Twitter Streaming API returns all the tweets containing the designated keyword. Amongst these, retweets are also included, which can lead to less efficient training data for the ML classifier. In order to avoid this, the system parses the file and saves each line in a set, followed by writing the tweets to a separate file. A set is used because it is a data structure which contains only unique elements; as a result, the duplicates are removed.
3.2.7 6 Possible implementations and approaches
One common text normalisation practice is stop-word removal. Stop words are words which appear frequently in sentences but do not carry any sentiment. [16] Usually these are linking words such as "to", "a", "and", "how" and "when". Removing them from a sentence allows the system to focus on the most important words, which are the ones linked to a sentiment.
Stop word removal has a wide range of applications, including:
Supervised Machine Learning
Clustering
Information Retrieval
Text summarization [17]
My attempt to implement stop word removal used the NLTK stop word list, but it resulted in a lower accuracy than before the stop words were removed. Most pre-compiled stop word lists have a negative impact on classification performance, and Naïve Bayes classifiers are more sensitive to stop word removal than Maximum Entropy classifiers. [18] Since this system uses a Naïve Bayes classifier and a pre-compiled stop word list produced a low-performance classification, stop word removal was excluded from the normalization process.
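For reference, the kind of filtering that was attempted can be sketched as follows; the short hardcoded list below stands in for the much longer NLTK stop word list:

```python
# A tiny illustrative stop word list; NLTK's English list is far longer.
STOP_WORDS = {"to", "a", "and", "how", "when", "the", "is", "of"}

def remove_stop_words(tweet: str) -> str:
    """Keep only the words that are not in the stop word list."""
    kept = [word for word in tweet.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)

print(remove_stop_words("immigration is a topic to discuss"))
# → "immigration topic discuss"
```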
As the introduction of this chapter mentions, Twitter contains slang, abbreviations and Twitter-specific expressions. To handle this I attempted to use a spell checker provided by TextBlob [19]. As presented in the diagram below, the spell checker corrected the extra letters in words such as "Gooooood" into "Good", but in some cases did not perform as expected (Figure 5). Consequently, a corrected sentence could lead to a wrong classification, so the spell checker was not included in the project. The spelling corrector performs with an accuracy of 70% and is implemented using a pattern library. [19] As a result of the limited time for development, a different spelling corrector was not implemented.
Figure 5 – One of the tests performed with the spell checker
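The elongated-letter case on its own can be handled without a full spell checker; the sketch below collapses letter runs and is a narrow alternative to the TextBlob corrector, not the approach the project actually shipped:

```python
import re

def squeeze_repeated_letters(word: str) -> str:
    """Collapse runs of three or more identical characters down to two.

    This handles elongation ("Gooooood" -> "Good") but nothing else, so
    genuinely misspelled words pass through unchanged.
    """
    return re.sub(r"(.)\1{2,}", r"\1\1", word)

print(squeeze_repeated_letters("Gooooood"))
# → "Good"
```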
3.3 Sentiment Analysis
In order to perform the desired classification, this system uses two different approaches:
Lexicon based approach
Machine Learning approach
3.3.1 Motivation
The motivation for using two different classifiers was to gain an understanding of how sentiment analysis can be performed using both lexicon-based and machine learning approaches, and to compare their results in order to analyse when and which of the classifiers obtains the better performance.
3.3.2 Lexicon Based Approach
According to Kaushik and Mishra (2014), lexicon based approaches to sentiment analysis rest on the assumption that the overall sentiment of a piece of text is determined by the sum of the sentiments of its individual words or phrases. [20]
In order to determine the sentiment of the tweets, the approach makes use of lexicons: dictionaries containing words along with their POS tag and a polarity represented by positive and negative scores.
POS tagging stands for Part-Of-Speech tagging: a POS tagger, as described by The Stanford Natural Language Processing Group, is a piece of software which takes a text as input and assigns part-of-speech tags such as noun, adverb or adjective. To determine the sentiment in a piece of text, the system performs POS tagging on the tweet, followed by the application of different algorithms to calculate the overall polarity of the text. The system uses SentLex [15], a Python library which performs lexicon based analysis.
One of the algorithms used is negation detection, which determines whether a word is negated, resulting in a better performance of the classifier.
In the context of this project the system was required to return a three-way classification, defined as positive, negative and neutral. Since SentLex [15] provided only positive and negative tags for the input tweets, the library had to be modified to also return the neutrality of a sentence. Consequently, an algorithm was implemented which extracts the positive and negative score for each tweet, computes the absolute value of their difference, and compares it with two pre-defined boundaries which represent the values for neutrality:
Lower bound < |positive_score – negative_score| < Upper bound
In this way the neutrality could be determined.
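The boundary check can be sketched as follows; the boundary values are assumptions for illustration, as the report does not state the exact numbers used:

```python
# Illustrative boundary values; the project's actual bounds are not given.
LOWER_BOUND = 0.0
UPPER_BOUND = 0.25

def classify_with_neutrality(positive_score: float, negative_score: float) -> str:
    """Three-way decision built on top of positive/negative lexicon scores."""
    difference = abs(positive_score - negative_score)
    if LOWER_BOUND < difference < UPPER_BOUND:
        return "neutral"
    return "positive" if positive_score >= negative_score else "negative"

print(classify_with_neutrality(0.6, 0.5))   # small gap between scores
# → "neutral"
print(classify_with_neutrality(0.9, 0.1))   # large gap between scores
# → "positive"
```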
3.3.3 Machine learning classifier
Sentiment analysis approaches can be divided into two categories:
Supervised Learning models
Unsupervised Learning models
Supervised learning refers to models where the training data contains the input data along with the desired outcome. In addition, such models have to achieve acceptable results on data not observed during training. [22]
Unsupervised learning describes a model which is not provided with pre-labelled inputs, so the model has to find relations between the data provided or search for patterns. Peter Dayan affirms that: "Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns". [23]
The motivation for using a supervised approach for the machine learning classifier is that, for sentiment analysis, supervised approaches tend to achieve higher performance, on the condition that the test data is similar to the training data. [24]
In the paper "Machine Learning Approaches to Sentiment Analysis Using the Dutch Netlog Corpus", Schrauwen (2010) defines supervised machine learning as those techniques which use a pre-labelled training corpus in order to learn a certain property, i.e. to learn from examples [25]. In the case of this project the system is provided with a predefined labelled corpus of 200000 positive tweets and 200000 negative tweets. This training corpus was provided by Sentiment140 [26]. For this training corpus to be usable by the system, the data had to be pre-processed into a Tweet, Label format. As the data was saved in a CSV file, Microsoft Word was used to change the formatting.
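The same reformatting could be scripted instead of done by hand; the sketch below assumes the usual Sentiment140 column layout (polarity first, with 0 negative and 4 positive, tweet text last), which should be checked against the actual download:

```python
import csv
import io

def to_tweet_label_rows(raw_csv: str) -> list:
    """Convert Sentiment140-style CSV rows into (tweet, label) pairs.

    Assumes column 0 is the polarity (0 = negative, 4 = positive) and the
    last column is the tweet text.
    """
    pairs = []
    for row in csv.reader(io.StringIO(raw_csv)):
        label = "positive" if row[0] == "4" else "negative"
        pairs.append((row[-1], label))
    return pairs

sample = ('4,"id1","date","NO_QUERY","user","love this city"\n'
          '0,"id2","date","NO_QUERY","user","awful commute today"')
print(to_tweet_label_rows(sample))
# → [('love this city', 'positive'), ('awful commute today', 'negative')]
```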
Machine learning classifiers for sentiment analysis can be built using different algorithms. The most popular are [25]:
Naïve Bayes
Maximum Entropy
Support Vector Machines
Decision Trees
According to Go et al. (2009) [26], it is not certain which of the algorithms mentioned above performs better. The algorithm selected for this project is Naïve Bayes, as it is simple to use [27] and offers an accurate classification (Manning et al., 2008) [28].
The algorithm is based on Bayes' theorem and is efficient on large amounts of data. The Naïve Bayes model uses the maximum likelihood method and, most of the time, performs well in difficult real situations in spite of its oversimplified assumptions. [29]
3.3.4 Feature Extraction
The feature selection process is the automatic selection of the most important attributes from
the training corpus [30].
Research has shown that large feature representations can be effective for NLP tasks. On the other hand, using an excessive number of features does not always increase the performance of the classifiers: the features can contain unnecessary information, which results in noisy feature representations. [31]
In sentiment analysis and opinion mining, aspect extraction aims to extract entity aspects or features on which opinions have been expressed (Hu and Liu, 2004; Liu, 2012). [5]
In this project, SelectPercentile [32] and CountVectorizer [33], provided by the SciKit Learn library [34], act as the feature selector, used to extract the 20% most important features of the training data. Numerous tests have been performed with 10%, 20%, 30% and 40% of the features, but the highest accuracy was obtained when extracting only 20% of the features.
3.3.5 Multinomial Naïve Bayes (NB)
This system uses a multinomial Naïve Bayes classifier for sentiment analysis.
Figure 6 – The formula of the Naïve Bayes algorithm [100]
The Naïve Bayes algorithm is based on maximum likelihood. In the formula presented above, f_i describes a feature, while n_i(d) represents the count of feature f_i in tweet d. P refers to the probability that an event takes place, and P(c) and P(f_i|c) are obtained using maximum likelihood estimates. [100]
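The figure itself is not reproduced in this transcript; in standard notation, the multinomial Naïve Bayes formulation presented in the distant-supervision paper that citation [100] refers to (Go et al.) reads:

```latex
P(c \mid d) \;=\; \frac{P(c)\,\prod_{i=1}^{m} P(f_i \mid c)^{\,n_i(d)}}{P(d)}
```

Here c is a class (positive or negative), d is a tweet with m features, and the classifier assigns d to the class with the highest P(c|d); since P(d) is the same for every class, it can be ignored when comparing them.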
3.3.6 Classification Process
The process of determining the sentiment of a tweet involves a few steps. As presented in subsection 2.5, before the machine learning classifier is able to categorise tweets it needs to be trained, as presented in section 3.3.
After the classifier is trained, it can be saved as a pickle object so that it can be reused for later classifications.
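Saving and restoring the trained classifier with pickle can be sketched as follows; the toy `TrainedClassifier` class is a stand-in for the real scikit-learn pipeline, and the file path is illustrative:

```python
import os
import pickle
import tempfile

class TrainedClassifier:
    """Toy stand-in for the trained pipeline described in the report."""

    def classify(self, tweet: str) -> str:
        # Placeholder rule instead of a real learned model.
        return "positive" if "good" in tweet.lower() else "negative"

path = os.path.join(tempfile.gettempdir(), "classifier.pickle")

# Persist the trained object so later runs can skip the training step.
with open(path, "wb") as handle:
    pickle.dump(TrainedClassifier(), handle)

# Reload it later (e.g. from a separate class) and use it directly.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored.classify("Good policy"))
# → "positive"
```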
For better handling of the training process, the Python library used for development, Scikit Learn, provides a pipeline capable of combining the vectoriser and percentile selector used for feature selection with the desired classification algorithm, in this case Multinomial Naïve Bayes.
After the pipeline is created, training can be performed by providing the pipeline with the training corpus.
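A minimal sketch of such a pipeline is shown below; the tiny inline corpus is an invented stand-in for the real 400000-tweet training data, not the project's actual input:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Vectoriser, feature selector and classifier combined in one object,
# mirroring the 20% feature extraction described in section 3.3.4.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("selector", SelectPercentile(chi2, percentile=20)),
    ("classifier", MultinomialNB()),
])

# Toy, strongly separable stand-in corpus.
train_tweets = [
    "love love great", "love great good", "great love nice",
    "hate hate awful", "hate awful bad", "awful hate worse",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

pipeline.fit(train_tweets, train_labels)
print(pipeline.predict(["love this policy"])[0])
# → positive
# Class probabilities, which the interface later uses to judge neutrality.
print(pipeline.predict_proba(["love this policy"]))
```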
After the training is done and the classifier saved as a pickle object, it can be instantiated in a separate class and used for classification.
As a result of this implementation, the classifier was capable of binary categorisation of tweets into two categories, positive and negative.
As mentioned in section 3.3.2, the project requirements requested a three-way classifier capable of detecting neutrality in addition to the positivity and negativity of tweets. In this case, the system uses a function provided by the scikit-learn library which returns the probability of a specific piece of text being negative or positive; this probability is displayed in the interface and allows the user to determine the neutrality of the text.
3.3.7 Combining two different classifiers
A different approach to sentiment analysis integrated by this system was influenced by the project developed by HP [35]: it makes use of the lexicon based classifier described in section 3.3.2 as a provider of labelled data, necessary for training the classifier presented in section 3.3.3.
To do so, the algorithm used for determining neutrality in the lexicon based classifier (section 3.3.2) was used to identify the tweets with a high negative or positive sentiment. A set of 2000 tweets labelled this way was created and added to the training corpus provided by Sentiment140. This approach is extremely useful in situations where a specific training corpus does not exist, as in non-English speaking countries where a pre-labelled training corpus is not available, or in a specific domain such as medicine, where a training corpus based on movie reviews would not produce a performant classifier.
This approach also enhances the performance of the machine learning classifier by providing training data specific to immigration, in the case of this project, resulting in a better classification of tweets related to immigration. On the other hand, this approach can also decrease the performance of the machine learning classifier: the accuracy of a good lexicon based classifier used to produce the labelled data is usually around 70–80%, leaving around 20% of the data labelled incorrectly, which in turn causes misclassification by the machine learning classifier. To avoid this issue, I decided to use only the tweets which have a strong sentiment, leading to more accurate labelling. To extract only the tweets with a high sentiment, I used the algorithm presented in section 3.3.2 and set its boundary to a high value so as to extract only such tweets.
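The filtering step can be sketched as follows; the score pairs and the threshold are invented for illustration and do not reflect the project's actual values:

```python
# Tweets paired with lexicon-derived (positive_score, negative_score) tuples.
scored_tweets = [
    ("we welcome newcomers", (0.9, 0.1)),
    ("not sure about this bill", (0.5, 0.45)),
    ("this policy is a disaster", (0.05, 0.85)),
]

STRONG_SENTIMENT_THRESHOLD = 0.5  # assumed value, not the project's exact one

def strong_labels(tweets):
    """Keep only tweets whose lexicon scores show a strong sentiment,
    and label them for use as extra machine learning training data."""
    labelled = []
    for text, (pos, neg) in tweets:
        if abs(pos - neg) >= STRONG_SENTIMENT_THRESHOLD:
            labelled.append((text, "positive" if pos > neg else "negative"))
    return labelled

print(strong_labels(scored_tweets))
# → [('we welcome newcomers', 'positive'),
#    ('this policy is a disaster', 'negative')]
```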
3.3.8 Previous approaches and possible implementations
Initially I decided to use the Naïve Bayes algorithm for developing the classifier, which led to the subsequent approaches also using this algorithm. The motivation for choosing this algorithm is presented in section 3.3.3. Previous approaches to the machine learning classifier used different libraries, namely Python's Natural Language Toolkit (NLTK) and TextBlob.
The first attempt used the TextBlob library, which is based on NLTK. The library used the Naïve Bayes algorithm for classification, trained on a corpus of 50000 tweets. This approach was not selected for this system because it required a long time to train the classifier.
The second attempt used the NLTK library. The motivation behind this approach was eliminating the extra training time required by the TextBlob library. Furthermore, this library provided much of the functionality needed for classification. Using this approach, the training time for a corpus containing 400000 tweets was over 15 hours without applying feature selection. This made me realise how crucial feature selection is, not only for reducing the time required to train the classifier but also for improving its accuracy. This result led me to the scikit-learn library, for its ease of use in combining the feature selection functions and the desired classification algorithm under one pipeline.
Apart from the approaches mentioned, other methods have been tried but were not included in the system due to the lack of support for the specific implementations or poor results.
3.4 User Interface
3.4.1 Motivation
The motivation behind developing the user interface was the necessity of displaying the results in a professional manner, including specific analysis and visualisation tools required for an extensive analysis.
3.4.2 Django
To build the interface I decided to use Django, an open source framework, instead of a classic Python GUI. This approach allowed me to build a more reliable web interface with more design possibilities.
Django is a high-level web framework for Python applications, based on the Model View Controller (MVC) design pattern, providing a powerful and friendly template system for designers. [31]
3.4.3 UI Structure
The user interface is structured in two parts. The first part of the interface provides a long-term analysis of the immigration tweets, displaying a pie chart which is updated automatically from a database. (Figure 7)
Figure 7 – The first part of the interface
The database is provided by Django and by default is SQLite. The database can be updated by the user from the user interface. (Figure 8) This action also triggers a jQuery request which performs sentiment analysis on the tweets added to the database and updates the pie chart.
Figure 8 – The file upload.
The pie chart template is provided by Chartit [34] and offers the user the possibility to save a PDF copy of the pie chart or print it. (Figure 9) This functionality is an important part of the analysis, as the user can save records of the added data for future evaluations.
Figure 9 – The ability of saving the chart.
The second part of the user interface allows the user to explore the sentiment analysis tools in more depth, by being offered the possibility of examining each tweet on its own and performing sentiment analysis on it using the classification methods presented in sections 3.3.2 and 3.3.3. (Figure 10)
Figure 10 – The second part of the interface.
The user interface also provides a column chart displaying the polarity of the tweet, in the case of the lexicon based approach, and the probability of that tweet being positive or negative. In this way the user can determine the neutrality or the intensity of the sentiment, as presented in section 3.3.6. (Figure 11)
The column chart is provided by Chartit [32] and, similarly to the pie chart, allows the user to save the chart in PDF format or print it for further analysis. (Figure 11)
Figure 11 – Result of the classification.
The user interface also allows the user to input text for analysis: a text box is displayed where the user can provide reviews or text extracted from different sources, to be analysed using both sentiment analysis methods. (Figure 12)
Figure 12 – The textbox.
Figure 13 – The TerMine service with highlighted words
Chapter 4 - Evaluation and testing
This chapter presents different attempts to verify the results of the classifiers, along with the testing methods used to make sure the system behaves accordingly.
The system was exposed to different approaches to text normalization and sentiment analysis. Furthermore, the system was developed as independent modules able to run separately from the other parts of the system. As a result, one of the main concerns was whether all the components could be integrated into one system without any clashes or errors, which would lead to a delay in providing the final result, or even a system failure incapable of meeting the requirements. The integration proved to be a success, not only implementing all the requirements but also adding extra functionality. In addition, being developed as independent modules, the system allowed the quick replacement and integration of components which needed to be substituted as a result of poor results.
4.1 Beta testing
As part of the testing process, testing and verifying the entire system was an important part of the development.
Developers tend to believe the software they implemented has a good user experience, and as a result they are often not capable of seeing the real nature of their system. [34] Because of this, I decided to let people who were not involved in the project use and test the system, in order to provide me with feedback and ideas to improve the quality of the software and enhance the user experience. This method proved extremely useful, as I received feedback on different components of the system and ideas for creating a more professional user interface.
4.2 Unit Testing
In addition to the high level testing carried out for the system, low level testing was performed, which implied the creation of individual unit tests for each component of the system.
The tests have been implemented for the normalization and the classifiers using PyUnit. PyUnit is the Python version of the JUnit unit testing framework. Through the unit tests I was able to verify that the methods used by the system behave as required.
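As an illustration, a PyUnit test for a mention-stripping normalization step might look like this; `normalize_mentions` is a hypothetical stand-in for the project's real normalization functions:

```python
import re
import unittest

def normalize_mentions(tweet: str) -> str:
    """Hypothetical normalization step: drop "@user" tags, tidy spaces."""
    return re.sub(r"\s+", " ", re.sub(r"@\w+", "", tweet)).strip()

class NormalizationTest(unittest.TestCase):
    def test_mentions_are_removed(self):
        self.assertEqual(normalize_mentions("@alice hello @bob"), "hello")

    def test_plain_text_is_untouched(self):
        self.assertEqual(normalize_mentions("hello world"), "hello world")

# Run the tests programmatically (unittest.main() would do this from the CLI).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizationTest)
runner_result = unittest.TextTestRunner(verbosity=0).run(suite)
print(runner_result.wasSuccessful())
# → True
```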
4.3 Further Testing
Once the interface was implemented, continuous testing was adopted, evaluating whether the classifiers categorise correctly. In this way, the interface was tested in more detail to make sure everything behaves accordingly. In addition, this allowed me to observe whether my classifiers are consistent across different categories of tweets. From my observations, the tweets on immigration vary from a political view to a social view, but the classification is not affected by this change. Furthermore, I observed that the lexicon classifier tends to perform less accurately when tweets containing sarcasm are processed. This was expected, as the lexicon classifier is based on a dictionary lookup and does not contain any sarcasm detection handlers. The machine learning classifier was able to classify appropriately some of the tweets containing sarcasm, as it uses a probabilistic algorithm based on a pre-labelled corpus of data.
4.4 Evaluation
To evaluate the performance of the classifiers, a data set of 1000 pre-labelled tweets was used. After several tests, the highest accuracy achieved with the lexicon classifier was 63%.
Different tests have been performed to examine the performance of the machine learning classifier: different corpus sizes were used for training, along with different percentages in feature extraction, as presented in section 3.3.4. The highest accuracy was obtained using a training corpus of approximately 400000 tweets with 20% of the features extracted. The machine learning classifier obtained a maximum accuracy of 80% on the data set of 1000 pre-labelled tweets. Various studies have demonstrated that humans usually agree on the sentiment of a text between 70% and 80% of the time. [99]
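The accuracy figures above come from comparing predictions against the gold labels of the evaluation set; the computation itself is straightforward (the labels here are invented for illustration):

```python
def accuracy(predicted, gold):
    """Fraction of predictions that match the pre-labelled (gold) data."""
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)

gold_labels = ["positive", "negative", "negative", "positive", "neutral"]
predictions = ["positive", "negative", "positive", "positive", "neutral"]
print(accuracy(predictions, gold_labels))
# → 0.8
```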
Chapter 5 - Reflection and Conclusion
5.1 Requirements
The application was required to be capable of:
Extracting data from Twitter
Performing pre-processing and normalization
Performing sentiment analysis
The requirements were implemented successfully and, in addition, a separate approach to sentiment analysis was developed, along with an interface to display the results.
5.2 Learning process
This project was a great opportunity for me to apply in practice the knowledge gained during the first years of university, but most importantly it was a great learning process. During the development I was exposed to programming languages I had never used before, such as "R", as part of my attempts to implement a classifier. I also had the chance to enhance my knowledge by learning to call web services and to use advanced functionality of the Python language.
This project also introduced me to the areas of text mining, natural language processing and machine learning, which helped me build on the previous experience gained during university.
By developing the project in an Agile manner, using short iterations and an online task board, I had the opportunity to apply the material taught at university in a real project, leading to better time management and better modularisation of my project, allowing me to change my code often without affecting the entire system.
I also decided to create a user interface to display my results, where the user can easily make use of the resources of the project without being exposed to the non-user-friendly interface of the terminal, where specific computing knowledge is needed to use the system. To do so I had to learn Django, a fast web framework written in Python [6] which connects the Python code of the system with a web browser acting as a user interface.
To conclude, the development of the project was a great learning experience, allowing me to enhance my technical skills by learning new programming languages and advanced usage of previously known tools, and also contributing to my soft skills by improving my time management and my awareness of what developing a large project involves.
5.3 Challenges
Having no previous experience with sentiment analysis, one of the first challenges I encountered was understanding the topic and the existing work.
The data extracted from social media constituted a challenge as well, as it is not written in a formal manner respecting all the rules of grammar, but in a Twitter-specific way, with people making use of abbreviations, slang and Twitter-specific language such as hashtags, @ mentions and URLs. It was particularly challenging to adapt this data and modify it so that it was usable by my classifiers.
Another major challenge was the implementation of the classifiers. Most of the related papers were written in technical language specific to the topic, including mathematical formulas and topic-specific expressions, making them difficult to understand for a reader without prior experience. Having carried out intensive research and reading on different forums and specialist websites, where an introduction to the domain was presented in a less formal context, I managed to grasp the basics and understand the acronyms and topic-specific expressions, which later allowed me to discover a greater variety of papers and projects related to this topic.
Making different attempts to implement the classifiers with different libraries required getting used to each of the libraries and accommodating their tools to my project, but by carrying out intensive research on specialist forums, and with a background in computing, I managed to understand how these tools function, allowing me to obtain the desired results.
Another major challenge was implementing the web framework used to construct the web interface for my project, as it required me to understand how a "model-view-controller" architecture works and how it can be deployed in my system. Constructing the interface also required a combination of different tools and programming languages, such as jQuery, HTML and CSS, with which I had little previous experience, followed by integrating all of them to work with my system.
In conclusion, developing this project involved facing a lot of challenges, but determination and a great deal of research helped me obtain the desired results.
5.4 Future enhancements
As mentioned in the previous chapter, this project has managed to achieve all the desired results and to implement all the requirements, including additional features for a better comparison of results and better visualisation and handling of data. The area of sentiment analysis is still in its early stages, with new implementations appearing very often. According to my research, there is no implementation guaranteed to give better results than another, as in this area there are many components to take into consideration. The final results can be influenced by the origin of the extracted data, the topic on which the sentiment analysis is performed, and many other factors. To obtain the best result, many approaches need to be implemented and compared to verify which of them provides the best results. This is what this project tried to achieve by implementing two different approaches to classification. However, being limited by time, I was not able to test every available approach. In this subchapter I will present some of the possible enhancements that could be added to the project to provide more functionality for the user, to enhance the accuracy of the classifiers, and to provide results even closer to the state of the art.
5.4.1 Different classifier implementations
As mentioned in section 3.3.3, different implementations for machine learning classifiers exist, such as Support Vector Classification and Maximum Entropy. The authors of "Using Maximum Entropy for Text Classification" mention that Maximum Entropy in text classification, compared to the Naive Bayes classifier used in this project, can sometimes provide better results and sometimes worse. [10] As a result, the implementation of different machine learning classifiers could have resulted in higher or lower accuracy than the Naïve Bayes classifier used by this project.
References:
[1] Agarwal A., Xie B., Vovsha I., Rambow O. and Passonneau R. (2011) 'Sentiment Analysis of Twitter Data', Proceedings of the Workshop on Languages in Social Media, pp. 30-38. http://www.cs.columbia.edu/~julia/papers/Agarwaletal11.pdf
[2] Streaming APIs, Available at: https://dev.twitter.com/streaming/overview(Accessed: 5th April 2016).
[4] http://cs229.stanford.edu/proj2010/HsuSeeWu-
[10] Nigam K., Lafferty J. and McCallum A. (n.d.) 'Using Maximum Entropy for Text Classification'.
[7] Kerstin Denecke (2015) Health Web Science: Social Media Data for Healthcare, : Springer
International Publishing.
[8] Bhatti S. (n.d.) 'Multiclass Sentiment Analysis on Movie Reviews', pp. 1- 6.
[12] Pang B. and Lee L. (2008) 'Opinion mining and sentiment analysis', pp. 1 -90.
[13] Edward Loper (2009) Natural Language Processing With Python: Analyzing Text with the Natural Language Toolkit, Sebastopol, California: O'Reilly.
[14] An Introduction to Text Mining using Twitter Streaming API and Python, Available
at:http://adilmoujahid.com/posts/2014/07/twitter-analytics/ (Accessed: 10th April 2016).
[15] Obtaining access tokens, Available at: https://dev.twitter.com/oauth/overview(Accessed: 10th April
2016).
[16] Streaming With Tweepy, Available
at:http://tweepy.readthedocs.org/en/v3.5.0/streaming_how_to.html (Accessed: 10th April 2016).
[17] Han B. and Baldwin T. (2011) 'Lexical Normalisation of Short Text Messages: Makn Sens a
#twitter', pp. 368-378.
[18] JSON Tutorial, Available at: http://www.w3schools.com/json/ (Accessed: 11th April 2016).
[19] A beginners guide to streamed data from Twitter, Available
at:http://mike.teczno.com/notes/streaming-data-from-twitter.html (Accessed: 11th April 2016).
[20] All About Stop Words for Text Mining and Information Retrieval Available at: http://www.text-
analytics101.com/2014/10/all-about-stop-words-for-text-mining.html (Accessed: 11th April 2016).
[21] Saif H., Fernandez M. , He Y. and Alani H. (2014) 'On Stopwords, Filtering and Data Sparsity for
Sentiment Analysis of Twitter', pp. 810 - 816.
[22] Kaushik C. and Mishra A. (2014) 'A SCALABLE, LEXICON BASED TECHNIQUE FOR SENTIMENT
ANALYSIS', International Journal in Foundations of Computer Science & Technology (IJFCST), 4(5), pp. 35
- 43.
[23] Stanford Log-linear Part-Of-Speech Tagger, Available at:
http://nlp.stanford.edu/software/tagger.shtml (Accessed: 14th April 2016).
[24] Tools and Libraries for Lexicon-Based Sentiment Analysis, Available
at:https://github.com/bohana/sentlex (Accessed: 15th April 2016).
[25] Schrauwen S. (2010) 'MACHINE LEARNING APPROACHES TO SENTIMENT ANALYSIS USING
THE DUTCH NETLOG CORPUS'.
[26] For Academics, Available at: http://help.sentiment140.com/for-students (Accessed: 16th April 2016).
[28] Mihai M. (2010) 'Naive-Bayes Classification Algorithm'.
[29] http://machinelearningmastery.com/an-introduction-to-feature-
[31] Zhang Z. (2012) 'Django Web Framework '.
[32] Django Chartit, Available at: http://chartit.shutupandship.com/ (Accessed: 18th April 2016).
[33] Frantzi, K., Ananiadou, S. and Mima, H. (2000) Automatic recognition of multi-word
terms. International Journal of Digital Libraries 3(2), pp.117-132.
[34] Simon Harper (n.d.) UX from 30,000ft, : Leanpub.
[35] Zhang L., Ghosh R., Dekhil M., Hsu M. and Liu B. (2011) 'Combining Lexicon-based and Learning-
based Methods for Twitter Sentiment Analysis'.
[40] Tutorial: Quickstart, Available at: https://textblob.readthedocs.org/en/dev/quickstart.html#spelling-correction (Accessed: 16th April 2016).
[50] Mu T., Miwa M. , Tsujii J. and Ananiadou S. (2014) 'Discovering Robust Embeddings In (Dis)similarity
Space For High-Dimensional Linguistic Features', Computational Intelligence, 30(2), pp. 285-315.
[51] sklearn.feature_selection.SelectPercentile, Available at: http://scikit-
learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html(Accessed: 17th April
2016).
[52] sklearn.feature_extraction.text.CountVectorizer, Available at: http://scikit-
learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html(Accessed: 17th
April 2016).
[53] scikit-learn, Available at: http://scikit-learn.org/stable/index.html (Accessed: 17th April 2016).
[55] Huang C., Simon P., Hsieh S. and Prevot L. (2007) 'Rethinking Chinese Word Segmentation:
Tokenization, Character Classification, or Wordbreak Identification', pp. 69-72.
[60] Witten I. 'Text mining', pp. 1-21.
[61] Sentiment Analysis, Available at: https://www.lexalytics.com/technology/sentiment (Accessed: 30th April 2016).
[70] Gamallo P. and Garcia M. (2014) 'Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets', Proceedings of the 8th International Workshop on Semantic Evaluation, pp. 171-175.
[100] Go A., Bhayani R. and Huang L. (2009) 'Twitter Sentiment Classification using Distant Supervision', CS224N Technical Report, Stanford.
[105] Donalek C. (2011) 'Supervised and Unsupervised Learning'.
[106] Dayan P. (n.d.) 'Unsupervised Learning', The MIT Encyclopedia of the Cognitive Sciences.
[99] Sentiment Analysis: Why It's Never 100% Accurate, Available at: http://brnrd.me/sentiment-analysis-never-accurate/ (Accessed: 30th April 2016).
[107] Rothfels J. and Tibshirani J. (2010) 'Unsupervised sentiment classification of English movie reviews using automatic selection of positive and negative sentiment items'.
Manning C., Raghavan P. and Schütze H. (2008) Introduction to Information Retrieval. Cambridge University Press, Cambridge, MA, USA.
Zhao K., Yen J., Greer G. et al. (2014) Journal of the American Medical Informatics Association, 21, pp. 212-218.
Bakliwal A., Foster J., van der Puil J., O'Brien R., Tounsi L. and Hughes M. (2013) 'Sentiment Analysis of Political Tweets: Towards an Accurate Classifier', Proceedings of the Workshop on Language Analysis in Social Media, pp. 49-58.
APPENDIX
The University of Manchester logo is the property of The University of Manchester (UoM) and has been taken from http://www.manchester.ac.uk/.
All the external tools used in the development are released under open-source licences, and the author does not claim ownership of these tools.
The TerMine service used by the project and described in section … is the property of NaCTeM. For more information, please contact [email protected].
The report, screencast and implementation of the project are the property of The University of Manchester.
For any enquiries regarding the use of these resources, please contact:
School of Computer Science, The University of Manchester.