NATIONAL UNIVERSITY OF SINGAPORE
Department of Physics
Word Vectorisation for Long-Short-Term-Memory
(LSTM) model on Chatbot and Analysis of Model’s
Dynamical Patterns
Lim Yanxiang Louis
A0140537L
Supervisors:
Orkan Arkan (Director of EY Data & Analytics)
Dr Hong Cao (Head of Data Science, EY)
Dr Feng Ling (Assistant Professor in NUS)
5th April 2019
Abstract

Word Vectorisation for Long-Short-Term-Memory (LSTM) Model on Chatbot and Analysis of Model’s Dynamical Patterns
Lim Yanxiang Louis, National University of Singapore (Singapore)
Chat services are needed in almost every business. Industries use rule-based chatbots to automate
chat services; however, these face limitations. In this study, we build a generative-based
chatbot using the Ubuntu Dialogue Corpus. This type of chatbot has the potential to answer all
technical questions about the Ubuntu operating system. We first analysed the corpus and sent it
through a Natural Language Processing (NLP) pipeline. We applied three different pre-trained word
embedding models (word2vec, GloVe and fastText), which vectorise the words of the corpus, and trained
the result on a Long Short-Term Memory (LSTM) model. We studied the results of the embeddings and found that the
weight distribution became more heterogeneous during the training process. GloVe performed the best
in terms of accuracy in both our fundamental and technical analyses. We also drew analogies between
neural activations and Ising spins by analysing the distribution of the model's activations.
Acknowledgements

An end to my formal education marks the start of my career. It is at this moment that life
begins.
It was an insane final year.
Before the semester started, I had to go through a lot of administrative work to coordinate between
NUS and EY and to understand the criteria for the final year project, since its scope is
unconventional for the physics department. All this was done while I was still in Munich, Germany,
on my exchange programme. I had to account for the time difference whenever I called back to
Singapore to discuss the project.
In the first semester, more than 12 hours were spent almost every weekday in the science library to
learn more about data science. I went from thinking that “pandas” referred to an adorable animal in
China to using this library almost every day.
In the second semester, I was fortunate enough to find an internship at the e-commerce company Castlery,
where I worked three days a week in the business intelligence/data analyst team while balancing school
and this final year project.
The decision to learn more about data science put me out of my comfort zone and made life a lot
tougher than it already was. However, I am very thankful that I did it, as it broadened my
knowledge of the field.
To my supervisors at EY,
Orkan, thank you for introducing the world of data science to me, advising me on the initial steps and
planning such an interesting and relevant industrial project for me.
Dr Hong Cao, thank you for voluntarily joining the project. You got me to think about the implications of
data science from a business perspective and how to make my project valuable in both academic
and industrial settings.
To my co-supervisor, Dr Feng Ling, thank you for spending so much time and effort, on top of your busy
schedule, to make sure the project stayed on track. I really enjoyed the times we brainstormed
ideas to make the project more interesting by bringing physics elements into it.
A special thanks to my physics and Sheares Hall senior, Jia Hui. Although you graduated even
before I joined NUS, you still came back to help the juniors, imparting your knowledge of
data science, patiently guiding me through the theories I applied in this project and assisting me
whenever I had any problem with my code.
To my mentor at Castlery, Manuel, thank you for making the internship such a valuable one and imparting
relevant knowledge to me. I appreciate the small talks we had during lunch and after work,
discussing the project and hearing your thoughts and feedback. You have been an
inspiration for me to learn more about data science.
My friends in physics, notably Sherman, Abby and Jasper (semi-physics), thank you for camping in the
library with me for crazy hours and finishing off the day with Bishan chicken rice. Not to forget Waxin
and Shouzan, who were always there to entertain my nonsense.
I would like to thank all my friends and family who have been understanding as I disappeared to work
on what is important to me. All this would not have been possible without all of you. Special
mention to Edward: you were always so willing to understand more about my project, even though you
did not really know what was going on, just so that you could push me to achieve more.
Finally, thank you, Min Jee, for taking care of me during this period. You looked out for my well-being,
planning exercise classes and ensuring I was well nourished during the busiest months. The thought of
going out with you on the weekend was what kept me motivated to complete the tasks at hand.
Contents

Abstract .......................................................................................................................................................... i
Acknowledgements ....................................................................................................................................... ii
1 Introduction .......................................................................................................................................... 1
1.1 Motivation ..................................................................................................................................... 1
1.2 Introduction to Artificial Intelligence and Machine Learning ....................................................... 4
1.3 Chatbots ........................................................................................................................................ 7
1.3.1 Chatbot types ........................................................................................................................ 7
1.3.2 Chatbot Workflow ................................................................................................................. 9
2 Data Preprocessing ............................................................................................................................. 10
2.1 Ubuntu Dialogue Corpus ............................................................................................................. 10
2.2 Preprocessing Computations ...................................................................................................... 10
2.3 Natural Language Processing (NLP) pipeline .............................................................................. 12
2.3.1 Morphological Analysis ....................................................................................................... 12
2.3.2 Syntactic Analysis ................................................................................................................ 13
2.3.3 Semantic Analysis ................................................................................................................ 14
2.3.4 Natural Language Processing in this study ......................................................................... 14
3 Vectorising of text ............................................................................................................................... 15
3.1 Introduction ................................................................................................................................ 15
3.2 Word Embeddings ....................................................................................................................... 15
3.2.1 Word2vec ............................................................................................................................ 15
3.2.2 GloVe ................................................................................................................................... 16
3.2.3 fastText................................................................................................................................ 17
3.2.4 Embedding Comparison ...................................................................................................... 17
3.3 Implementation .......................................................................................................................... 20
4 Deep Learning Framework .................................................................................................................. 23
4.1 Sequence-to-sequence (Seq2Seq) .............................................................................................. 23
4.2 Recurrent Neural Network (RNN) ............................................................................................... 23
4.3 Long Short-Term Memory (LSTM) .............................................................................................. 24
4.4 Training ....................................................................................................................................... 29
5 Results ................................................................................................................................................. 31
5.1 Fundamental Analysis ................................................................................................................. 31
5.1.1 Accuracy and loss ................................................................................................................ 31
5.1.2 Weights and Biases ............................................................................................................. 33
5.1.3 Fundamental Insights .......................................................................................................... 35
5.2 Technical Analysis ....................................................................................................................... 35
5.2.1 Technical Insights ................................................................................................................ 35
6 Neuron Activation and Ising Spins ...................................................................................................... 38
6.1 Phase Transition and Ising Model ............................................................................................... 38
6.2 Model and Results ...................................................................................................................... 38
7 Future work and Conclusion ............................................................................................................... 43
8 Bibliography ........................................................................................................................................ 45
9 Appendix ............................................................................................................................................. 48
9.1 Weight and Bias Results .............................................................................................................. 48
9.1.1 Evolution over 1000 epochs ................................................................................................ 48
9.1.2 Model Comparison .............................................................................................................. 60
9.2 Technical Results ......................................................................................................................... 63
9.3 Activation Analysis ...................................................................................................................... 88
9.3.1 Reverse-CDF ........................................................................................................................ 88
9.3.2 Activation Variance ............................................................................................................. 94
9.4 Codes ......................................................................................................................................... 102
9.4.1 Chatbot Building ................................................................................................................ 102
9.4.2 Analysis of results.............................................................................................................. 108
9.4.3 Analysis of model parameters .......................................................................................... 109
1 Introduction

In this paper, we seek to understand the workflow of building a chatbot and to apply concepts from
physics to study the learning behaviour of the artificial intelligence in a chatbot.
1.1 Motivation

In the last decade, machine learning has been trending, and it is common for almost all industries to adopt
this technology to avoid falling behind their competitors. This is no different in the professional services
industry. EY's data and analytics team was established to leverage big data and advanced
technologies, providing new insights, value and actions for the company. This gives
the company a competitive advantage in its day-to-day business processes [1].
One of the most exciting artificial intelligence technologies to date is the chatbot. The
chatbot, or question-answering system, has been around since the 1970s. However, most of these
systems are limited to short, factual answers with no memory of the previous question asked [2].
Recently, due to advancements in technology, we can generate more sensible answers to complex
questions. A breakthrough in chatbot technology will impact the business environment and the lives
of technology adopters. Hence, we study the current state of chatbot technology and its relevance
to statistical physics, a field with a much longer history than data science. It would be a
scientific breakthrough if we were able to apply our knowledge of statistical physics to gain a better
understanding of chatbot technology or other machine learning technologies.
This study is motivated by three main domains: business, technology and physics.
The most clear-cut business application would be improving the customer service sector. According to
IBM, during a six-minute customer service call, the agent spends 75% of that time doing
manual research [3]. Businesses are turning to more cost- and time-effective alternatives to improve
their customer service. Chatbot software products can serve customers 24/7 with little waiting
time. The Royal Bank of Scotland (RBS) is using a chatbot trained on 1,000 responses to more
than 200 customer queries [4]. IBM predicted that 85% of all customer interactions will be handled by a
chatbot before 2020 [3]. Customer service will improve when businesses adopt chatbot
technology.
Chatbots are not only beneficial to the customer service department; staff within the company can also
utilise the service. Real business data is often stored across many sources, unfiltered and difficult for
humans to interpret. This is a problem many companies face, as they have different suppliers
and many different enterprise systems. A good chatbot will understand what a staff member requires
from his or her question and automatically extract, transform and load (ETL) the data, making it
interpretable. For instance, staff from the operations department can keep track of their
operations simply by verbally asking their computer "where is my cargo now?" and getting a reply almost
immediately as it searches through the database. This is not limited to operations but extends to any form of
data, from marketing to finance. A good chatbot can easily join all this data and give a real-time
update of the business. Brazil's biggest bank, Banco Bradesco, has built a chatbot not only for
customers but also for agents, answering 283,000 questions a month with an accuracy of 95% [4]. Adopting
chatbots will give data-driven companies an edge over their competitors.
Another business implication would be for chatbots to replace applications, or apps for short. Mobile
apps are very handy; we have apps for almost everything. Apps let us check the weather, order
food or clothes, or even call a taxi with a few clicks on our phones. However, each of these tasks
requires a different app, and this takes up a lot of storage space, which grows more valuable each day
as data becomes more important. With improved chatbot technology, these tasks can be simplified
into an all-in-one platform. We can ask one chatbot to perform all the tasks mentioned above and more,
serving as an artificial butler. This idea is often seen in science fiction movies such as Iron Man,
where Tony Stark/Iron Man builds an assistant called J.A.R.V.I.S.
(Just A Rather Very Intelligent System) to assist him in almost everything he needs. Leading chatbot
services such as Google Assistant are working towards this goal. A good chatbot
could do away with multiple apps, integrating companies in the backend while users have an all-in-one
artificial butler to serve them. When chatbots become more advanced, what used to be science fiction will
become reality, and businesses will need to adopt this technology to remain relevant.
The potential chatbots can bring to businesses is limitless if companies know how to leverage the
technology. However, its limitations could hinder business adoption.
The chatbots that industries currently use are only capable of responding to the frequently asked questions
they are programmed to answer. For more specific questions, a human still has to intervene.
If chatbots had the ability to think and understand questions like a human, they could give
answers similar to, or possibly even better than, a human's, in a fraction of the time.
In this study, we aim to tackle this problem by researching a type of chatbot that can respond to any
question, not just the ones it was trained on. With this, we move on to the technological
motivation.
We can read the text of this report with ease. The same cannot be said of a computer: it is only
capable of making computations with numbers and makes no sense of letters, words and sentences.
We will examine how computers try to understand human language, English in this study. The
ability of AI to understand human languages, known as natural language processing, is relevant not only
to building chatbots but also to many other developing AI technologies. For instance, natural
language processing is also used to analyse the sentiment of texts and the meaning of
sentences, and can execute tasks such as summarising a long paragraph or clustering similar texts
together to give users recommendations based on articles they have been reading.
We will study different vectorisation models commonly used in the industry to understand how the
computer converts words into numbers in order to process what humans are trying to tell it. The different
vectorisation models are then applied in the context of a chatbot and evaluated. We will also be
exploring a state-of-the-art technology not yet used by industry. Generative-based chatbots (chatbots
that can respond to questions they have never been exposed to) are currently too inaccurate and unreliable
for industry. Industries use rule-based chatbots (chatbots that respond based on what they are taught),
which are easily implemented and reliable, but limited to answering only the questions they were trained on.
Generative-based chatbots, on the other hand, have no limit to the types of questions they can answer
[6]. Hence, we study the implementation of a generative-based chatbot.
We move on to the last motivation of this project and explore how physics is relevant.
I consider a physicist to be a data analyst of the physical world. We have abundant data about how the
universe works and have developed theories and equations to model it. Some of these
equations are so complicated that advanced mathematical tools are required to work with them, resulting
in long computation times whenever a variable is changed. Given the same data, a trained neural
network can match inputs to outputs, constructing its own model. Such models
might achieve results similar to the complicated models using simple arithmetic operations,
significantly reducing computation cost. Thus, physicists working on projects that require quick
turnaround might find this a viable alternative after validating the trained model against their complex
model.
In this study, we use a type of recurrent neural network (RNN) model called Long Short-Term
Memory (LSTM). More details on what this is and why this model is implemented will be explained in
chapter 4.
LSTM is a model that computational physicists dealing with time series data are exploring. Due to the
complexity of certain problems, conventional equations limit what physicists can solve. Some
physics problems have given generations of physicists nightmares, and they are now
looking to more advanced computational tools such as LSTM to solve them. For example, in
fluid dynamics, the Navier-Stokes equations govern the flow of fluids. However, this
problem is listed as one of the Millennium Prize Problems in mathematics, as there is no proof that a
smooth solution even exists [7]. The paper "A Deep Learning based Approach to Reduced Order
Modeling for Turbulent Flow Control using LSTM Neural Networks" [8] aims to model turbulent
flow with the help of LSTM, without computing the full Navier-Stokes equations.
Another application of LSTM is in the Large Hadron Collider (LHC), modelling the voltage time
series of its magnets. This is made possible by data from an electronic monitoring system, in
the hope of detecting misbehaviour of the magnets and avoiding costly damage to the LHC. More on this can be
found in the paper "Using LSTM recurrent neural networks for monitoring the LHC superconducting
magnets" [9].
On top of understanding how data science can be used to solve physics problems, we seek a
mutually beneficial relationship between the two fields. Physics has been around for a very long time
and serves as a foundation for many other sciences, such as quantum chemistry. Data science is a
relatively new field, and there are still many open questions that scientists are debating. For
example, many advanced models are described as a "black box", as we are not sure what is
going on inside them. We aim to marry physics theorems with data science by drawing analogies with
physical phenomena. With this inspiration, we might be able to catalyse the advancement of data
science.
Now that we have understood the motivations behind this project, we will give a brief introduction to
artificial intelligence and machine learning, defining some commonly used jargon in the field.
1.2 Introduction to Artificial Intelligence and Machine Learning

What is Artificial Intelligence (AI)?
“Artificial intelligence refers to the simulation of human intelligence in machines which are programmed
to think like humans and mimic their actions. The term may also be applied to any machine that exhibits
traits associated with a human mind such as learning and problem solving.”
– Investopedia, a New York City-based website that focuses
on investing and finance education and analysis [10]
The term 'AI' was first coined by American computer scientist John McCarthy in 1956 during the first
academic conference on the subject, and AI was founded as an academic discipline in the same year. In
fact, AI was around years before this: the famous Turing test, by Alan Turing, was developed in 1950 to
test a machine's ability to exhibit intelligent behaviour [5]. The earliest forms of AI used in games were a
checkers-playing program written by Christopher Strachey and a chess-playing program written
by Dietrich Prinz in 1951. In October 2015, AlphaGo, developed by Alphabet Inc., became the first program to beat
a human professional in the board game Go (commonly known in Singapore by its Chinese name, Weiqi)
[11]. In August 2017, OpenAI, a startup co-founded by Elon Musk, showcased an AI program at an eSports
(Dota 2) tournament [12]. This shocked many, as it became the first AI to defeat professional
players in a complex eSport.
You may be wondering how AI went from defeating professional humans in board games
like Go to defeating professionals in a complex eSport like Dota 2 in just two years. This is because,
instead of being fed an algorithm that tells the AI exactly what to do, scientists have written programs that allow
machines to learn from past data, generating new outputs and solutions to the problem. With the
concept of artificial intelligence established, I will introduce the next buzzword after AI: machine
learning (ML).
“Machine learning is a method of data analysis that automates analytical model building. It is a branch
of artificial intelligence based on the idea that systems can learn from data, identify patterns and make
decisions with minimal human intervention.”
-SAS Institute, American multinational developer of analytics software [13]
Machine learning has been around since the 1980s. It is classified into three main categories: supervised
learning (learning from known answers), unsupervised learning (learning from the internal structure of the data)
and reinforcement learning (learning from experience) [13]. Computers use mathematical models to
study data and make their own predictions. The key point about machine learning is that it requires
minimal intervention and learns directly from the data provided. Examples of such models include linear
regression, clustering and classification. Before machine learning, scientists had to write the fitting
function into the program code for the model to generate an output; for instance, they
had to calculate the coefficients themselves when fitting data to a linear regression model. With
machine learning, all they need are the input data and output data: feed them into the computer and the
computer will generate its own coefficients for the model, making modelling more efficient and
accurate. The differences between machine learning and traditional mathematical computation are
illustrated in Figs 1.2.1 and 1.2.2 respectively.
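This contrast can be made concrete with a toy example. The hedged sketch below (using NumPy's least-squares solver, with data invented for illustration) shows the machine-learning style of modelling: we supply only input and output pairs, and the computer recovers the model's coefficients itself:

```python
import numpy as np

# "Traditional" approach: a human supplies the rule y = 2x + 1.
# Machine-learning approach: supply only (input, output) pairs and
# let the computer estimate the coefficients via least squares.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0            # data generated by the "unknown" rule

# Design matrix with a bias column so the intercept is also learned.
X = np.column_stack([x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

slope, intercept = coeffs
print(slope, intercept)      # recovered from the data alone
```

On this noiseless data the solver recovers the slope 2 and intercept 1 exactly; with real, noisy data it would return the best-fitting coefficients instead.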
Figure 1.2.1 Traditional mathematical computation method
Figure 1.2.2 Computation with Machine Learning
Traditional mathematical computations are still commonly used today for less complex problems.
However, in a VUCA world (short for volatility, uncertainty, complexity and ambiguity), simple solutions
are often not enough to solve complex problems. Hence, we seek a new approach, which brings us
to the next buzzword: artificial neural networks (ANN).
While trying to improve artificial intelligence, scientists turned to studying human
intelligence and implementing it on machines. They studied biological neural networks, drawing
inspiration from how the neurons and synapses in the brain work and modelling that architecture on
computers. Neural networks (referring to artificial rather than biological neural
networks for the rest of this paper), a subset of machine learning, are based on a collection of
connected nodes called artificial neurons, which send signals to one another, similar to synapses
in the brain. Some of these neurons carry more critical features than others, so the neural
network must be able to differentiate the importance of each neuron. The idea of neural networks
(early versions were called "perceptrons") has been around since as early as the 1940s [14]. However,
early neural networks did not have the capability to learn: each neuron's importance (or weight) did not
change automatically when more data was fed in. Such machines are a form of artificial intelligence, but
since they do not learn from the data, they cannot be classified as machine learning. It was only in 1969
that the idea of backpropagation was first proposed, and it became a mainstream part of machine learning in
the mid-1980s. Backpropagation refers to the ability of the weights in the hidden layers to adjust based on
the accuracy of the output. It calculates a loss function, which computes the deviation between the
predicted output and the output in the training data, through various measures such as cosine
proximity or cross-entropy. The network then tries to optimise the system by adjusting the weights,
minimising the loss function and maximising accuracy. This made neural networks far more capable
than they used to be.
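The weight-adjustment loop described above can be sketched in miniature. The following illustrative example (a single linear neuron with invented data, not the LSTM used later in this study) minimises a mean squared error loss by gradient descent, the core mechanism behind backpropagation:

```python
# Minimal sketch of learning by gradient descent on a single linear
# neuron. Loss: mean squared error between prediction w*x + b and
# the target y. Data invented for illustration: targets follow y = 2x + 1.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0          # start with uninformative weights
lr = 0.05                # learning rate

for _ in range(2000):
    # Forward pass + gradients of the mean squared error w.r.t. w and b.
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Weight update: step against the gradient to reduce the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))
```

After training, the weight and bias settle near 2 and 1, the values that minimise the loss on this data; a deep network does the same thing simultaneously for millions of weights.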
What has made AI, ML and ANN so successful these days? This will be answered by the final buzzword to
be introduced in this chapter: deep learning.
‘Deep learning is a collection of algorithms used in machine learning, used to model high-level
abstractions in data through the use of model architectures, which are composed of multiple nonlinear
transformations. It is part of a broad family of methods used for machine learning that are based on
learning representations of data.’
-Techopedia, IT education website that provides insight and inspiration [15]
[Diagram labels from Figures 1.2.1 and 1.2.2: the traditional method feeds input data and a program code/function into a computer to produce an output; the machine learning method feeds input data and output into a computer.]
Deep learning was introduced to the machine learning community by Rina Dechter in 1986 and to the
artificial neural network community by Igor Aizenberg in 2000 [16].
In general, you could think of deep learning as an architecture that sends the input through multiple
layers of machine learning models before churning out an output. In the case of ANNs, deep learning
refers to stacking multiple hidden layers before giving an output. Although it is still unclear
what each layer does, some scientists believe that each layer takes care of a certain feature of the input.
For instance, if you fed the machine a picture of a cat, the first layer might look at the eyes, the
second layer the ears, and so on. However, there are conflicting arguments regarding this
hypothesis, and there is no significant evidence to support or reject it.
The accuracy provided by deep learning skyrocketed compared to a single-layer neural network.
This is partly because we are in an age of data inflation: some people believe that
90% of the data in the world was created in the last two years. With more data, deep learning performs
better than its traditional counterparts. In Figure 1.2.3, we can see that with small amounts of
data, older learning algorithms may perform better than deep learning algorithms. However, as data
grows, older learning algorithms plateau while deep learning models continue to improve their
performance [17].
Figure 1.2.3 Graph of performance against amount of data, comparing deep learning and older learning algorithms [17]
“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the
huge amounts of data we can feed to these algorithms.”
- Andrew Ng, the chief scientist of China’s major search engine
Baidu and one of the leaders of the Google Brain Project [17]
Figure 1.2.4 Summary of Artificial Intelligence, Machine Learning and Deep Learning [18]
Today, artificial intelligence and machine learning are used in almost every industry to improve workflows.
The most prevalent applications of AI are in predictive modelling, computer vision and time series analysis.
Predictive modelling refers to predicting an unknown event from the data provided, for instance,
predicting the species of a butterfly from features such as its wingspan, or identifying a
criminal based on the evidence provided. Computer vision refers to classifying image data based on the
features in the photo; for instance, the computer learns to differentiate between a photo of a dog and one of a
cat. Time series analysis, as its name suggests, involves predicting data with a time element, an example being
how stock market prices fluctuate. A chatbot can be considered a time series prediction problem,
as the position of words matters and these word positions can be treated as different time points.
In this study, we look at how deep learning is used in time series analysis and implement it in a
chatbot.
1.3 Chatbots
A chatbot is simply a conversational agent that interacts with humans turn by turn using natural language.
Some of the more popular chatbots in the industry are Amazon Echo, Google Assistant and Siri.
1.3.1 Chatbot types
In this section, we will discuss the different types of chatbots and their functionalities, followed by the
workflow of building a chatbot.
There are various types of chatbots, as represented in Figure 1.3.1.3 below. Chatbots are
categorized by the domain they operate in and by the kind of response they return.
The conversation content can come from either an open or a closed domain. For an open domain, the
chatbot gets its training data from an open source such as the internet, giving it the capability to answer
a vast variety of topics. A closed domain refers to getting training data from a closed source with a
specialized area of expertise.
The kind of reply a chatbot gives can be categorized as either retrieval-based or generative-based.
Even the most advanced chatbots in the industry today are retrieval-based. This means that a
fixed set of replies is already encoded into the chatbot. It can be built simply by using rule-based sentence
matching, or an ensemble of machine learning techniques that returns an existing answer the machine
previously learned. Generative-based chatbots, on the other hand, are capable of generating any kind of
response; the range of questions they can reply to is limitless. However, at the present
state of technology, this kind of chatbot is not capable of giving responses that are up to industrial
standards [6].
Figure 1.3.1.1 Generative-based chatbot model [6] Figure 1.3.1.2 Retrieval-based chatbot model [6]
Figure 1.3.1.3. Chatbot conversation framework [18]
1. Open domain with retrieval-based responses
To have a fixed set of responses from an open domain means that there is a fixed response for any
possible question anyone can think of. This is illogical and hence impossible to create.
2. Open domain with generative-based responses
We can ask the chatbot any possible question and it is able to generate a reply. This is the most
complete form of a chatbot and the solution to this problem is called Artificial General Intelligence
(AGI). However, we are nowhere near this technology yet.
3. Closed domain with retrieval-based responses
This is the most common type of chatbot where a specified answer has been crafted for a specific
domain. This is the most basic type of chatbot, giving the most reliable answers at present.
However, this type of chatbot is limited to what it has been taught.
4. Closed domain with generative-based responses
This type of chatbot can handle questions from the underlying dataset it was trained on as well as new
questions outside that dataset but within the domain. These chatbots tend to be more human-like and
have their own personality. However, the generated answers are full of grammatical errors. This
approach is still not widely used by chatbot developers and is mostly found in labs [18].
In this study, we focus on a closed-domain chatbot with generative-based responses. Despite being
harder to train and less accurate, it is the future of chatbots, and it offers more learning opportunities
and more valuable insights.
1.3.2 Chatbot Workflow
In this section, we provide an overview of the steps to build a chatbot. More details on each step can be
found in further chapters.
Firstly, we need to define the objective of the problem. This was elaborated in the motivation section
above. Following which, we import our dataset into an interpreter and start with data exploration where
we have a better understanding of the data we are dealing with.
The most obvious difference between a chatbot problem and most data science problems is the nature
of the data. Most data science problems involve dealing with numbers, pictures or text. Since a computer
is built to handle numbers, the first kind of problem is easily dealt with. For pictures, each
pixel can be represented by a number on the RGB colour model, converting the image to
numbers easily. Text is more challenging: we need to vectorize the words before we can make any
predictions. Hence, this additional vectorization process is needed for a chatbot problem.
Next, we send our vectorized input into a model that will give us an output. In this study, we use a
model called Long-Short Term Memory (LSTM). Following which we convert our vectors back to words.
This gives us the reply of the chatbot.
Finally, we integrate the model into a user interface. This step will not be dealt with in this
project due to time constraints and its limited relevance to a Physics project.
Figure 1.3.2.1 Chatbot workflow. The last stage will not be implemented, hence it is labelled in red.
1. Clean the data with Natural Language Processing (NLP)
2. Vectorize with various machine learning techniques
3. Run the LSTM machine learning model and convert vectors back to words
4. Integrate into a chatbot
2 Data Preprocessing
We can now proceed to discuss the processes for building the chatbot.
2.1 Ubuntu Dialogue Corpus
The dataset required to build a chatbot is question-and-answer text data. In this study, we chose
the Ubuntu Dialogue Corpus (UDC) [21]. A corpus is a large and structured set of texts. The UDC is a
collection of logs from Freenode’s Internet Relay Chat (IRC) network. Freenode IRC is a platform that
facilitates communication in the form of text, used to discuss peer-directed projects. A new user joins
the channel and asks a general question about a problem they have with Ubuntu. A more experienced
user replies with a potential solution, after first addressing the ’username’ of the first user. This is done
to avoid confusion in the channel. At any given time during the day, there can be between 1 and 20
simultaneous conversations happening in some channels. The UDC collates the history of chat from
Ubuntu-related chat rooms [22].
Ubuntu is a free and open source operating system built on the Linux kernel. Since it is an open source
product, support from the developers is limited, and most queries on technical support are directed to
the chat room on Freenode. We extracted the dialogue corpus from Freenode in this study.
The UDC consists of logs from 2004 till today. In this study, we extracted data from the start till 2015. The
dataset contains almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100
million words. A turn refers to a change of user giving an input during the dialogue. An utterance refers
to a text message sent. The reason why there is a far greater number of utterances than turns is
that multiple messages can be sent before a reply is received. The conversations have an
average of 8 turns each, with a minimum of 3 turns.
2.2 Preprocessing Computations
All the computations in this study are implemented in the Python programming language. Most of the
work is done in the PyCharm IDE, with a small part of the visualization done in Jupyter Notebook and
JupyterLab.
The data is split into 3 different folders: a training folder, a validation folder and a test folder. The training
set is used to adjust the weights in the network, the validation set is used to ensure that there is no
overfitting, and the test set is used to confirm the predictive power of the neural network. The training
dataset downloaded is 1,757,751 KB, with 16,587,830 rows and 6 columns. Each row comprises:
‘folder’, ‘dialogueID’, ‘date’, ‘from’, ‘to’, ‘text’.
● folder: The folder that a dialogue originated from.
● dialogueID: An ID number given to a specific dialogue.
● date: A timestamp of the time that the particular dialogue was sent.
● from: The username of the user who sent the line of dialogue.
● to: The username of the user to whom they were replying. On the first turn of a dialogue, this field
is blank.
● text: The text of that turn of dialogue [23]
Table 2.2.1 illustrates the corpus.
Due to the high computational power required to process such a large dataset, we sliced the data and
used the first 50,000 lines.
As mentioned earlier, there may be many utterances before a turn. The first step was to combine the
multiple question utterances into one input, and to repeat this for the responses. The outcome is a list of
question and answer pairs, each contained in strings. A question that does not have a reply, such as a
closing thank-you statement at the end of a conversation, is paired with an empty string. All the question
and answer pairs in a dialogue with the same dialogue ID are put into a list, and the individual list of each
dialogue is appended to a bigger list which contains all the information. For easy visualization, the list
looks like this:
[[[Question, Answer], [Question, Answer], … [Question, Answer],[Question, ‘ ’]],[[Question, Answer],
[Question, Answer],…]… …]
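The pairing step above can be sketched as follows (a simplified illustration, assuming consecutive messages from the same user within one dialogue have already been merged into alternating turns; the dialogue text is made up):

```python
def pair_turns(turns):
    """Pair the alternating turns of one dialogue into [question, answer] pairs.

    A trailing question with no reply (e.g. a closing thank-you) is paired
    with an empty string, as described above.
    """
    if len(turns) % 2 == 1:
        turns = turns + [" "]
    return [[turns[i], turns[i + 1]] for i in range(0, len(turns), 2)]

dialogue = ["how do I mount a usb drive",
            "use sudo mount /dev/sdb1 /mnt",
            "thanks"]
print(pair_turns(dialogue))
# → [['how do I mount a usb drive', 'use sudo mount /dev/sdb1 /mnt'], ['thanks', ' ']]
```

Applying `pair_turns` to every dialogue and appending the results then yields the nested list shown above.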
Table 2.2.1 Table of first 15 inputs from the Ubuntu Dialogue Corpus visualised in a pandas dataframe
2.3 Natural Language Processing (NLP) pipeline
Natural Language Processing (NLP) is a set of techniques that enables computers to “talk” to humans and
understand human language. NLP accounts for the hierarchical structure of a language, such as
letters forming words and words forming a sentence. This is a big challenge in computer science because
of the complexity and ambiguity found in languages. For instance, a computer needs to know both the
meaning of the word and how these words are linked together to create the meaning of the text. NLP
techniques are broken down into 3 stages [2]:
1. Morphological analysis
2. Syntactic analysis
3. Semantic analysis
2.3.1 Morphological Analysis
In this stage, we study the elements within words. The key processes are as follows:
• Tokenization
Texts are broken down into symbols, words, phrases or other text elements called tokens. For
example, the sentence (“Hello, I am Louis”) is broken down into (“Hello”, “I”, “am”, “Louis”).
• Stop-word removal
The most common words in a language, which give little meaning to a text, such as ‘a’, ‘the’, ‘is’ and
‘are’, are removed. The removal of these words will not change the meaning of the entire text.
• Special character removal
Similar to stop-word removal, we remove special characters which do not help a
computer to comprehend the sentence. This includes symbols like (“&”, ”@”, “,”).
• Stemming
Reducing conjugated words to the same word stem. For instance, words like eating and eaten
can both be stemmed to ‘eat’, since they hold the same meaning. The simplest method of
stemming is the affix removal stemmer, which removes letters from the end of
the word. For instance, “eating” becomes “eat”, but “aging” becomes “ag”. Hence, stemming
often leads to problems such as over-stemming and under-stemming.
• Lemmatization
Lemmatization serves the same purpose as stemming: reducing conjugated words back to their
root word. The difference is that instead of slicing letters off the end of the word,
lemmatization uses a vocabulary to transform words properly into their root form, called the
lemma. To do so, it is necessary to have detailed dictionaries which the algorithm can look
through to link the form back to its lemma. For instance, lemmatization processes “ate” and
transforms it into “eat”.
• Automatic query expansion
Reformulating a query to encourage matching between texts. Techniques used to expand
queries include using synonyms of words, stemming all words and fixing spelling errors.
• Part-of-speech (POS) tagging
Words can be tagged as a noun, verb, pronoun or article. Two approaches are:
1. Statistical approach (Markov models)
2. Rule-based approach
o Rules, such as word position in a sentence, are placed into an algorithm to tag the words
2.3.2 Syntactic Analysis
In this stage, we study the elements within sentences. The key processes are as follows:
• Parsing
Converting sentences into their formal grammar structure. The output is a parse tree that illustrates the
syntactic relations between words in the input sentence.
Figure 2.3.1 Illustration of a parse tree [25]
• Bag-of-words
Bag-of-words is an orderless representation of text. Text is portrayed as a bag of its words, without
considering the relationships between words or grammar. This representation is most frequently used
for document classification.
• N-grams
N-gram models are used to store spatial information. They model the probability of a word given the
preceding sequence of words and estimate the next word. The probability that a word is used next in the
sequence is estimated by the frequency with which this word follows the same sequence in the training
corpus, divided by the frequency with which the sequence is present in the training corpus. N stands for
the number of words considered in the sequence. For example, for the sentence “I study in the
National University of Singapore” with N=2, the N-grams of the sentence become: “I study”, “study
in”, “in the”, “the National”, “National University”, “University of”, “of Singapore”.
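Extracting word-level N-grams can be sketched in a few lines, reproducing the N=2 example above:

```python
def ngrams(sentence, n):
    """Return all word-level n-grams of a sentence as strings."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("I study in the National University of Singapore", 2))
# → ['I study', 'study in', 'in the', 'the National',
#    'National University', 'University of', 'of Singapore']
```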
• Term Frequency-Inverse Document Frequency (TF-IDF)
A statistic that reflects the importance of a word to a document in a corpus. The TF-IDF statistic
increases the more frequently a word is found in a document, but the word loses its importance if it is
frequently found in all documents, suggesting that it may be a stop word.
TF-IDF = t × log(D / d)

➢ t - number of times the word appears in that particular document (term frequency in the input)
➢ d - number of text documents the term appears in
➢ D - total number of text documents
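The formula can be computed directly; a minimal sketch on a toy three-document corpus (the document contents are illustrative only):

```python
import math

def tf_idf(term, document, corpus):
    t = document.count(term)                     # term frequency in this document
    d = sum(1 for doc in corpus if term in doc)  # documents containing the term
    D = len(corpus)                              # total number of documents
    return t * math.log(D / d) if d else 0.0

docs = [["ubuntu", "boot", "error"],
        ["ubuntu", "install", "driver"],
        ["kernel", "panic", "error"]]
# "ubuntu" appears once here and in 2 of the 3 documents: 1 * log(3/2) ≈ 0.405
print(tf_idf("ubuntu", docs[0], docs))
```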
2.3.3 Semantic Analysis
In this stage, meaning is assigned to words, sentences and texts. Structures are created to represent the meaning of
words and phrases, however, there is no optimal solution to automatically derive the meaning from text
despite intensive research by scientists.
2.3.4 Natural Language Processing in this study
Natural Language Processing itself is a field that many data scientists spend their careers researching. To
keep the project manageable, and due to the complexity of many of these techniques, we only applied
morphological analysis to our text data, performing the following NLP processes:
We tokenized the text (“Hello”, “I”, “am”, “Louis”), and removed stop words (“a”, “the”, “and”) and special
characters (“&”, “@”, “,”), since we only want to keep the context of the sentence.
We imported a Python library called the Natural Language Toolkit (NLTK) and ran our data through these processes.
A snapshot of sentences after going through the NLP pipeline can be seen in Figure 2.3.4.1
Figure 2.3.4.1 Section of the Ubuntu Dialogue Corpus after tokenising, removing of stop words and special characters.
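The study uses NLTK for these steps; as a self-contained illustration, the same morphological pipeline can be sketched in plain Python (the stop-word set here is a tiny assumed subset of NLTK's full list):

```python
import re

STOP_WORDS = {"a", "the", "is", "are", "i", "am", "and"}  # tiny illustrative subset

def preprocess(sentence):
    # Tokenize: lowercase and keep alphanumeric runs, which also strips
    # special characters such as '&', '@' and ','.
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    # Stop-word removal: drop common words that add little meaning.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Hello, I am Louis & the weather is great!"))
# → ['hello', 'louis', 'weather', 'great']
```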
3 Vectorizing of text
We need to transform our text data into numbers for our computer to process.
3.1 Introduction
Words are not naturally understood by computers. By transforming words into a numerical form, we can
apply mathematical rules and do matrix operations on them to obtain an output.
The most basic way to numerically represent words is through one-hot encoding. Every
unique word in the dataset is represented by a vector with a single 1 in the vector space and 0s everywhere
else. The dimension of the vector is then the number of unique words. This results in an enormous
vector that captures no relational information [24][26][28].
Figure 3.1.1 Visualisation of one-hot encoded vector
As seen in the diagram, every word is equidistant from every other word; hence, synonyms and antonyms
are treated the same.
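A minimal sketch of one-hot encoding over a toy four-word vocabulary:

```python
def one_hot(vocab, word):
    """One-hot vector: a 1 at the word's index, 0s everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

vocab = ["cat", "dog", "hello", "bye"]
print(one_hot(vocab, "hello"))  # → [0, 0, 1, 0]

# Any two distinct one-hot vectors have dot product 0, so every pair of
# words is equally (un)related: no relational information is captured.
```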
3.2 Word Embeddings
A word embedding is a real-valued vector representation of a word. Ideally, words with similar
meanings will lie close together in the vector space; the goal is to capture word relationships in that
space. With such a dense representation, word vectors occupy a much smaller space than one-hot
encoded vectors, which can reach millions of dimensions [26][27][28].
In this study we will be implementing 3 popular word embedding techniques, namely, word2vec, GloVe
and fastText.
3.2.1 Word2vec
Word2vec was created by a team of researchers at Google, led by Tomáš Mikolov. It is the most popular
method for training embeddings [26][27][28][29][30][32][33].
This model involves a statistical computation to learn from a text corpus. It is a predictive model that
learns word vectors so as to improve its predictive ability by reducing a loss function. It is also the first
model that considers the closeness of word meanings in a vector space.
There are 2 methods that this model can use during training.
1. Continuous Bag-of-Words (CBOW)
Building on the bag-of-words representation explained in 2.3.2, this method determines the context
of a word from its surrounding words, the continuous bag-of-words. It learns an embedding by
predicting the current word based on that context.
2. Continuous Skip-Gram
This method also learns an embedding, but by predicting the surrounding words given the current
word: the model uses the current word to predict the surrounding window of context words.
According to the Google team, CBOW is faster than skip-gram; however, skip-gram performs better on
infrequent words.
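The two schemes differ only in which side of the (context, word) relationship is predicted. A sketch of the training examples each would generate, using a window size of 2 and a toy sentence (this is an illustration of the example-generation step only, not of the full training procedure):

```python
def training_pairs(tokens, window=2):
    """CBOW predicts a word from its context window; skip-gram predicts
    each context word from the current word."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((context, target))                  # context -> word
        skipgram.extend((target, c) for c in context)   # word -> context word
    return cbow, skipgram

cbow, sg = training_pairs(["the", "boy", "ate", "at", "the", "table"])
print(cbow[2])  # → (['the', 'boy', 'at', 'the'], 'ate')
print(sg[:2])   # → [('the', 'boy'), ('the', 'ate')]
```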
In this study, we used Google’s pre-trained model. Its word vectors embed a vocabulary of 3 million
words and phrases trained on approximately 100 billion words from the Google News dataset. There
was no explicit detail on whether Google used CBOW or skip-gram to train the model. Each vector has
300 dimensions.
3.2.2 GloVe
GloVe, short for Global Vectors, is another model for word embedding. It was created by a
team of researchers from the Stanford Artificial Intelligence Laboratory in the computer science
department of Stanford University [26][27][28][31][33].
An extension of word2vec, GloVe is a count-based rather than a predictive model.
Initially, a sparse matrix (large matrix with mostly zero terms) of words × context with the count of word
frequency in the corpus is constructed. Context refers to the word next to (before, or after) the word of
interest. For example, the sentence “The boy ate at the table” with a window size of 2 would become a
co-occurrence matrix as seen in table 3.2.2.1
Table 3.2.2.1 Word context co-occurrence matrix for the sentence “The boy ate at the table”
the boy ate at table
the 2 1 2 1 1
boy 1 1 1 1 0
ate 2 1 1 1 0
at 1 1 1 1 1
table 1 0 0 1 1
When many sentences are added together, this matrix will grow into a sparse matrix with many 0
entries. This matrix is manipulated based on the hyperparameters set to shift the weights on certain
words. The word context co-occurrence matrix is then deconstructed into a word feature matrix and
feature context matrix as shown in figure 3.2.2.1
Figure 3.2.2.1 Matrix illustration for the construction of a GloVe embedding [28]
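The counting step behind the co-occurrence matrix can be sketched as follows (window size 2, matching the example sentence above):

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Count how often each (word, context word) pair occurs within the window."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

counts = cooccurrence("the boy ate at the table".split())
print(counts[("the", "ate")])  # → 2, as in the co-occurrence table above
```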
Each row of the word feature matrix is the GloVe vector representation of a word.
GloVe vectors capture global information very well, but do not perform as well at capturing the
meanings of individual words.
In this study, we used GloVe’s pre-trained model. Its word vectors embed a vocabulary of 2.2 million
words and phrases trained on approximately 840 billion words from the Common Crawl dataset. Each
vector has 300 dimensions.
The Common Crawl dataset consists of text from all over the web, collected by the non-profit
organization Common Crawl.
3.2.3 fastText
fastText is another word embedding method, created by Facebook's AI Research (FAIR) lab. Like GloVe, it
is another extension of the word2vec model [26][27][28][32].
Unlike the previous 2 models, which treat words as the smallest unit to train on, fastText treats each
word as composed of character N-grams (N-grams as explained in 2.3.2, but at the character level). For
example, the word vector for “hello” is a sum of the vectors: “he”, “hel”, “hell”, “hello”, “ello”, “llo”, “lo”,
“ell”, “el”, “ll”. This feature allows fastText to support words that are not in its training vocabulary. For
instance, if the model has been trained on the word ‘apple’ but not on the word ‘pineapple’, it will still
be able to form a relationship between those 2 words and give a meaning to the new word. Hence,
fastText is known to handle rare words best.
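The subword decomposition can be sketched as below. This is simplified: the real fastText implementation also adds word-boundary markers such as '<' and '>' to each word, which are omitted here to match the "hello" example above.

```python
def char_ngrams(word, n_min=2):
    """All character n-grams of length >= n_min, including the word itself."""
    return {word[i:i + n]
            for n in range(n_min, len(word) + 1)
            for i in range(len(word) - n + 1)}

print(sorted(char_ngrams("hello")))
# the 10 units listed above: el, ell, ello, he, hel, hell, hello, ll, llo, lo
```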
In this study, we used fastText’s pre-trained model. Its word vectors embed a vocabulary of 2 million
words and phrases trained on approximately 600 billion words from the Common Crawl dataset. Each
vector has 300 dimensions.
3.2.4 Embedding Comparison
Cosine similarity, or cosine proximity, is a measure of closeness between 2 non-zero vectors. It is
computed from an inner product, measuring the cosine of the angle between the 2 vectors.
cosine similarity = |cos θ| = |A ∙ B| / (‖A‖‖B‖) = |Σᵢ AᵢBᵢ| / (√(Σᵢ Aᵢ²) √(Σᵢ Bᵢ²))

➢ A and B – first and second vector respectively
➢ θ – angle between the vectors
For each word embedding model, we searched for the words closest to the word ‘hello’ in terms of
cosine similarity. The similar words and cosine similarity values for each model are presented below.
We also measured the similarity between the words ‘Hello’ and ‘Bye’. Because of their meanings, these
2 word vectors should be almost parallel but point in opposite directions; under the absolute-value
definition above, they should therefore have a cosine similarity value close to 1.
Cosine similarity values are close to 0 when 2 vectors are orthogonal and have no relationship with
each other.
Hence, cosine similarity is a measure of how related 2 words are rather than how similar 2 words are.
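The measure itself is only a few lines of code. In this sketch, toy 2-dimension vectors stand in for the 300-dimension embeddings, and the absolute value follows the definition used in this study:

```python
import math

def cosine_similarity(a, b):
    """Absolute cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal   → 0.0
print(cosine_similarity([1.0, 2.0], [-2.0, -4.0]))  # antiparallel → 1.0
```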
Word2vec
Words most related to: ‘Hello’
[('Hi', 0.6188793182373047), ('Hello_hello', 0.6035354137420654), ('Goodbye', 0.6006848812103271),
('Hiya', 0.5998829007148743), ('Hey', 0.5955354571342468), ('Bruce_Springsteen_bellowed',
0.5867887139320374), ('hello', 0.5842196345329285), ('Welcome', 0.570493221282959), ('Hullo',
0.5693444609642029), ('@_ESPN_Michelle', 0.5587143898010254)]
Figure 3.2.4.1 word2vec top 10 most related words to ‘Hello’
Similarity between ‘Hello’ and ‘Bye’ (cosine similarity)
0.34806252
GloVe
Words most related to: ‘Hello’
[('Hi', 0.8000067472457886), ('hello', 0.7661874294281006), ('Hey', 0.7338173985481262), ('Dear',
0.7021666169166565), ('Greetings', 0.6533131003379822), ('Thank', 0.6320098638534546), ('Thanks',
0.6301470994949341), ('Welcome', 0.6017478704452515), ('Howdy', 0.5910314321517944), ('Happy',
0.5842403769493103)]
Figure 3.2.4.2 GloVe top 10 most related words to ‘Hello’
Similarity between ‘Hello’ and ‘Bye’ (cosine similarity)
0.53188723
fastText
Words most related to: ‘Hello’
[('Hi', 0.8838053345680237), ('Greetings', 0.7670567035675049), ('Hellow', 0.7657904624938965),
('Helllo', 0.7620762586593628), ('Hallo', 0.7522884607315063), ('HEllo', 0.7484337687492371), ('Hiya',
0.7310601472854614), ('Howdy', 0.7185547947883606), ('Helloo', 0.7155404090881348), ('Hey',
0.7117010354995728)]
Figure 3.2.4.3 fastText top 10 most related words to ‘Hello’
Similarity between ‘Hello’ and ‘Bye’ (cosine similarity)
0.5045497
Table 3.2.4.1 summarizes these results and compares the cosine similarity word representation between
the 3 word embedding models
Table 3.2.4.1 Cosine similarity between the vector representation of the word ‘Hello’ against the words ‘Bye’ and ‘Hi’ across the 3 word embedding models
3.3 Implementation
In the preprocessing stage, the data was grouped into a question-and-answer format. This data is fed into
the 3 word embedding models mentioned. Since the models were already trained on other sources, this
process of applying a model trained on one dataset to another dataset is referred to as transfer learning. For all 3 models
each word is vectorized into a 300-dimension vector. Each question and answer was truncated to a
length of 14 words; the reason for this will be explained in section 4.2 on the vanishing and
exploding gradient problem. Finally, to indicate to the model that the sentence has ended, we filled the
last position with a sentend (short for sentence end) vector. This is a 300-dimension vector filled
with the value 1. For sentences with fewer than 14 words, we padded with the sentend vector
such that every question and answer input is of length 15.
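This truncate-and-pad step can be sketched as follows (zero vectors stand in for real word embeddings; only the all-ones sentend vector is taken from the description above):

```python
import numpy as np

EMBED_DIM, MAX_LEN = 300, 15
SENTEND = np.ones(EMBED_DIM)  # the all-ones "sentence end" vector

def pad_sentence(word_vectors):
    """Keep at most 14 word vectors, then pad with SENTEND up to length 15."""
    vectors = list(word_vectors)[:MAX_LEN - 1]
    vectors += [SENTEND] * (MAX_LEN - len(vectors))
    return np.stack(vectors)

sentence = [np.zeros(EMBED_DIM) for _ in range(3)]  # a 3-word sentence
print(pad_sentence(sentence).shape)  # → (15, 300)
```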
word2vec GloVe fastText
'Hello' against 'Bye' 0.34806252 0.53188723 0.5045497
'Hello' against 'Hi' 0.618879318 0.800006747 0.883805335
An example of what the word “thanks” looks like through a 300-dimension GloVe embedding:
array ([ 6.1785e-01, -4.4125e-01, -1.2995e-01, 3.0474e-01, -1.6964e-01, 5.8525e-01, 1.7816e-01, -2.4445e-01, 4.6609e-01, -5.5441e-01, -1.9651e-01, 1.2467e-01, 2.3402e-01, 4.3042e-01, -6.9528e-02, 3.6921e-01, -4.1056e-01, 8.0052e-01, -4.0739e-02, -5.5692e-01, -7.4094e-01, 3.2417e-01, -6.6123e-02, 1.2387e-01, -2.4245e-01, -4.9114e-01, -7.7270e-02, -4.1978e-01, -1.9696e-01, 1.5016e-01, 8.0221e-01, -5.1428e-01, 2.9590e-01, 1.4012e-01, -5.6856e-01, -1.0086e-01, -1.2372e-01, 5.6630e-01, 7.3632e-01, -4.3627e-01, -3.4570e-01, -2.4257e-01, 5.0708e-01, 2.2338e-02, 3.2261e-01, 1.2905e-01, 8.9789e-01, 4.7432e-01, -3.3794e-01, 2.7993e-01, 1.5664e-01, 2.0991e-01, 8.7210e-02, -1.1773e+00, -1.4127e-02, -8.5633e-02, -1.7679e-01, -3.6956e-01, 6.3628e-01, 9.3964e-02, -3.5051e-01, 4.4953e-01, 1.1335e-01, -6.9076e-01, 9.5427e-01, 3.8231e-02, 2.8848e-01, -3.1629e-01, -5.3073e-01, 4.6703e-01, 2.8946e-01, 6.0747e-02, 7.8110e-01, 9.7159e-02, 3.5411e-01, 3.5429e-01, 4.5753e-01, 2.1665e-01, 7.7750e-02, 1.5379e-02, 1.1258e-01, 2.4216e-01, 2.7584e-01, 2.0799e-02, 2.5160e-03, -7.7169e-02, 1.0423e+00, -8.2542e-01, 5.8036e-01, 1.4357e-01, -2.2185e-01, -7.8225e-01, 2.0596e-01, -2.8564e-03, 2.3707e-01,5.8756e-01, -6.2610e-01, -3.7070e-01, 1.8357e-01, 3.7607e-01, -1.6075e-01, 5.4994e-01, -2.1851e-02, -1.1604e-01, -3.5818e-01, -1.0260e+00, -9.4983e-03, 2.1102e-01, 1.7862e-01, -2.7145e-01, -2.4182e-01, -2.9118e-01, 5.1136e-02, -5.6239e-01, -1.0261e-01, 3.6359e-01, 3.4954e-02, -2.3305e-01, -8.0760e-01, -6.5115e-02, -6.5976e-02, -7.2082e-02, -5.8931e-01, -5.1587e-01, 1.5050e-01, -2.0810e-01, 2.3459e-01, -4.1443e-02, 4.8942e-01, -4.6887e-01, -3.1591e-01, 1.1605e-01, -5.1222e-01, -3.3649e-01, -2.3522e-01, 2.1687e-01, 2.5116e-02, -4.7138e-01, -3.8184e-01, -2.4276e-01, -5.7618e-01, -3.4715e-01, -3.4020e-01, -2.3749e-01, 3.6804e-01, -2.2816e-01, 1.4530e-01, 3.6457e-01, -1.1285e+00, -1.9550e-02, 4.6852e-01, 4.3800e-01, -2.1323e-01, 4.0452e-01, 3.0908e-01, 3.1752e-01, -1.6075e-01, 2.2468e-01, -1.4104e-01, 
-2.1498e-01, -9.0856e-02, -1.1202e-01, 4.0014e-01, 7.7541e-03, -6.0925e-01, -5.3467e-01, -3.3437e-01, -2.9339e-01, -3.6834e-01, 1.4113e-01, 4.4871e-01, -1.6597e-01, 3.6154e-01, 3.0655e-01, 1.8177e-01, 4.8663e-01, 7.5956e-02, 2.7227e-01, -2.9530e-01, 3.7055e-01, 1.4913e-01, -4.4195e-01, 3.6298e-01, 6.8432e-02, -3.2926e-04, 6.1724e-01, 4.8816e-02, -1.6167e-01, -5.6976e-01, -4.9326e-01, 3.5325e-02, -1.9735e-01, -4.8631e-01, 4.3613e-02, 9.1704e-01, -1.8168e-01, 3.3793e-02, -1.9161e-01, 3.8046e-01, 5.3991e-01, -1.9965e-01, -4.2029e-01, -4.4788e-01, 3.2937e-03, 1.2840e-01, -4.0793e-01, 2.1293e-01, 1.3234e-01, 2.3004e-01, 1.5066e-02, 4.9688e-01, -1.9250e-01, 1.8341e-01, -8.0931e-02, -7.9795e-01, -6.0971e-01, 3.5563e-01, -2.0999e-01, -7.3456e-01, 1.6439e-01, 4.4013e-01, 3.7708e-01, -1.6677e-01, 1.3012e-01, 2.5617e-01, -3.3465e-01, 3.6778e-01, -2.8760e-01, 4.8935e-01, 1.2266e-01, 2.7119e-01, -4.0542e-01, 4.8436e-01, -3.3753e-01, 5.8466e-02, -6.1795e-01, -1.7105e-01, -1.2160e-01, 8.3938e-01, -7.3814e-02, 2.3217e-01, 2.3231e-01, 4.2904e-01, -3.0190e-01, 4.3600e-01, -8.6186e-01, -5.4164e-01, 1.5913e-01, 2.0317e-01, -6.8615e-01, -1.4824e-01, -2.2892e-01, 5.0858e-01, -6.0777e-01, 5.9970e-01, -4.3597e-01, -2.9010e-01, -4.4896e-01, -9.1547e-02, -2.4183e-01, 5.4375e-01, 1.7586e-01, 6.5346e-01, -7.7777e-01, -2.3320e-01, 2.6135e-02, -3.8351e-01, 9.5452e-02, 8.0358e-01, -4.6460e-01, -4.3092e-01, 5.3605e-01, 1.1099e-01, 3.9731e-01, 1.4128e-01, 1.1530e-01, 8.0781e-01, 1.3066e-01, 5.7696e-01, -6.0291e-01, -3.6552e-01, -4.2259e-01, 1.2372e-02, 4.6832e-01, 1.0419e+00, -3.3080e-01, -3.4654e-01, 4.8942e-01, 6.6817e-01, -4.4043e-01, 1.9420e-01, -1.2774e-01, -2.7415e-01, 7.7054e-02, 3.7732e-01, 3.4198e-02, 3.6308e-01, -7.6669e-01, 3.7364e-01, 3.6009e-01])
On a histogram,
Figure 3.3.1 Histogram for the GloVe embeddings of the word 'thanks'
This is how the sentend vector look like:
Figure 3.3.2 The sentend vector visualised in JupyterLab
4 Deep Learning Framework
Before we dive into the deep learning framework used in this model, we need to understand the
problem setting. The chatbot is a sequence-to-sequence problem. We will be using a recurrent
neural network (RNN), or more specifically a long short-term memory (LSTM) model, for this problem.
4.1 Sequence-to-sequence (Seq2Seq)
We first look at the problem setting of this project. In language study, the placement of words affects
the meaning of a sentence. For instance, the sentences “you are happy” and “are you happy” contain
the same words, but they have different meanings. Even though we have removed the special
character ‘?’ from the latter statement, we can still interpret the second statement as a
question asking whether the second person is feeling happy, while the first statement indicates that
the second person is feeling happy. From this example, we see that we need a model that accounts for
where a token is placed in the sentence. Hence this is a time series problem that requires a model to
accept a time series input and return a time series output. This kind of problem is referred to as a
sequence-to-sequence learning problem. The initial sequence is fed into an encoder, which preserves
the order of the input; it is then sent through a model, and the output is returned by a decoder in the
correct word order. This is illustrated in Figure 4.1.1, where the sentence “how is nus”, followed
by the sentend vector mentioned in chapter 3.3 to indicate the end of the sentence, is loaded
into the encoder. The decoder then gives the sequential output “it is awesome” and SENTEND after
passing through the black box, which represents the model we will be using. This is a simplified picture;
in our study, as mentioned in chapter 3, we use an encoder and decoder that handle 15
vectors for their input and output [34].
Figure 4.1.1 An illustration of the sequence-to-sequence problem setting, starting with an encoder and ending with a decoder
4.2 Recurrent Neural Network (RNN)
We will now look at the model which was depicted by the black box in section 4.1.
Before heading to the specific model that we will be using, we first look at the kind of network
we will be dealing with. The most common way of modeling a seq2seq problem is with a recurrent
neural network. Deep learning and neural networks were already introduced in section 1.2; we
now look at a type of neural network called the recurrent neural network (RNN). Unlike most networks,
which are feed-forward, this kind of neural network is recurrent. This means that there are loops in the
network, and the output of one unit may go back to an already visited unit. This is illustrated in
Figure 4.2.1, where there is an arrow at the hidden layer which loops back into the hidden layer. This
loop is not present in a typical feed-forward neural network. The loop can be “unfolded” into what we
see in Figure 4.2.1; “unfolded” is in quotation marks because we cannot literally unfold the network, and
the unfolding is only drawn for visualization purposes. This allows us to draw a closer comparison with
the seq2seq problem statement we had before. The initial black box we described is now replaced by a
hidden layer. We call each of these hidden layers a cell [34][35].
The biggest problem with RNNs is the vanishing and exploding gradient problem. This problem was first discovered by Sepp Hochreiter in 1991. Following that, many papers were written targeting this problem. We will cover it in general terms now.
To simplify the equation of an RNN, we write:
O = Wⁿ I
➢ I – the input
➢ O – the output
➢ W – the weight
➢ n – the number of sequence inputs
If we do not limit the input and output sequence, the variable n can be any number 0 < n < ∞. When the number of inputs from the training becomes large, the value of Wⁿ tends towards infinity if W is greater than 1, or towards 0 if W is less than 1. Hence, the output value will be either infinity or 0. Due to this limitation of the model, and to reduce its computational cost, we have limited the input to 15 as mentioned in section 3.3. However, we found a specific type of RNN which addresses the first consideration; hence, our truncation mainly serves to deal with computational cost.
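The effect of the repeated weight multiplication can be illustrated numerically. The snippet below is a hedged sketch using a scalar weight W rather than a weight matrix; it shows how the output O = Wⁿ I collapses towards 0 for W < 1 and blows up for W > 1 as n grows.

```python
# Numerical sketch of the vanishing/exploding behaviour of O = W^n * I
# for a scalar weight W, as n (the sequence length) grows.
def output_after_n_steps(W, I, n):
    O = I
    for _ in range(n):
        O *= W          # one multiplication per time step
    return O

shrinking = output_after_n_steps(0.9, 1.0, 100)  # W < 1: tends towards 0
growing = output_after_n_steps(1.1, 1.0, 100)    # W > 1: grows without bound
```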
4.3 Long Short-Term Memory (LSTM)
We will now explore the specific model we use in this study. Long short-term memory is a special type of RNN in which the units are connected in a specific way that avoids the vanishing and exploding gradient problem arising in a typical RNN [36].
LSTM was introduced by Hochreiter (the same scientist who discovered the vanishing and exploding gradient problem in RNNs) and Schmidhuber in 1997. Over the years, many other scientists refined and popularized this model. It is a model that works on a wide variety of problems, some of which were mentioned earlier in chapter 1.
To understand how an LSTM differs from a typical RNN, we study the architecture of each individual cell for both models.
Figure 4.2.1 An illustration of a recurrent neural network (RNN) model, “unfolded” to demonstrate the recurrent process
Figure 4.3.1 Illustration of a repeating module in a standard RNN model, containing a single tanh layer [36].
Figure 4.3.2 Illustration of a repeating module in an LSTM model, containing 3 sigmoid and 1 tanh layer [36].
In an artificial neural network like this LSTM, all the “memory” of the network is in the form of the vectors that we created in chapter 3. To “remember”, “learn” or “forget” is analogous to a mathematical operator acting on the “memory” vector to retain, alter or remove its values.
Figure 4.3.1 represents the cell of a standard RNN while Figure 4.3.2 represents the cell of an LSTM. A standard RNN cell consists only of a single tanh neural network layer. In an LSTM, there are four neural network layers interacting in a unique way that allows the model to “remember” in its long-term memory and “forget” in its short-term memory, hence the name long short-term memory. The long-term “memory” is embedded in the cell state, represented by C, while the short-term memory, the memory of the previous (t−1) output, is embedded in the hidden state, represented by h in the figures.
Each line in the diagram refers to a “memory”, which in our study is represented by a 300-dimension vector. The pink circles with an operator inside are pointwise operators. These occur at intersections between 2 vector lines and force the 2 vectors to undergo an operation. The pointwise operator with a ‘+’ performs a vector addition, while those with an ‘x’ perform a pointwise product (i.e., [u1,…,un] x [v1,…,vn] = [u1v1,…,unvn]). These pointwise functions are also referred to as gates, as they decide what information is retained, added or removed from the system. The yellow boxes represent the neural network layers. These layers comprise a weight and a bias which are updated by backpropagation during training. Merging lines without the pink circles refer to concatenation of vectors, while splitting lines represent vectors being copied and going on separate paths.
We will explore the key idea behind the LSTM model.
In Figure 4.3.3 we have the cell state. This contains the “memory” from the previous sequence. We see that the cell state goes through the cell without passing through any neural network layer; it only passes through 2 pointwise operations, hence the information through the system can only be altered linearly at those operations. These structures are called gates. The vector in this stream is often unchanged and is thus responsible for the long-term memory.
Figure 4.3.3 Illustration of the path of a “cell” through a repeating unit, responsible for keeping the “memory” of the model [36].
Before we explore the function of each neural network layer, we introduce the two functions that govern them. The first is the sigmoid function, which takes in any value and compresses it to produce an output between 0 and 1. The second is the hyperbolic tangent, also referred to as the tanh function, which takes in any value and squeezes it between −1 and 1.
The sigmoid and tanh functions are defined by the following formulas respectively:
S(x) = 1/(1 + e⁻ˣ) = eˣ/(eˣ + 1),  tanh x = sinh x / cosh x = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ) = (e²ˣ − 1)/(e²ˣ + 1)
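As a minimal sketch, the two squashing functions can be implemented directly from the formulas above:

```python
import math

def sigmoid(x):
    # S(x) = 1 / (1 + e^-x); output lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # tanh(x) = (e^2x - 1) / (e^2x + 1); output lies in (-1, 1)
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
```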
The first layer we will be looking at is called the “forget gate layer”, as shown in Figure 4.3.4. The input to this layer is a concatenated vector formed from the hidden state ht-1 and the new input vector xt. This concatenated vector consists of two 300-dimension vectors, where xt is fed into the kernel and ht-1 is fed into the recurrent kernel. More about the kernel and recurrent kernel will be discussed in section 4.4. The output is governed by the sigmoid function and is a vector with values between 0 and 1. This vector undergoes a pointwise multiplication with the cell state, as shown in Figure 4.3.3. This allows the
cell state to either “remember”, “learn” or “forget” information from the previous cell. For instance, when the word fed in is a new subject, we want the model to forget the gender of the old subject.
Figure 4.3.4 Illustration of the vectors path through the "forget gate" [36].
In the next step, the model needs to decide if there will be any new information that needs to be added
to the “memory” of the cell state.
This is a 2-part process, as shown in Figure 4.3.5. The same concatenated vector that went through the first neural network layer is duplicated: one copy goes through a sigmoid layer to decide which values to update, producing a vector it. The second copy goes through a tanh layer to create a new candidate vector, C̃t. A pointwise product is performed on the 2 new vectors it and C̃t to create a new vector to be added into the cell state. This is shown in Figure 4.3.3.
An application of this is when the forget gate causes the system to “forget” the gender of the old subject: the gender of the new subject is then updated into the cell state.
Figure 4.3.5 Illustration of the vectors path through the "input gate" [36].
Figure 4.3.6 Illustration of the vectors through the "forget gate" and "input gate" affecting the cell state [36].
In the final process, the same concatenated vector passes through a sigmoid layer. This output vector decides which parts of the cell state will be output. The cell state goes through a tanh function to have its values compressed between −1 and 1. Note that this is not a neural network layer but simply a tanh operator acting on the vector, without the influence of weights and biases. These 2 vectors undergo a pointwise multiplication to produce the new hidden state ht.
Figure 4.3.7 Illustration of the vectors path through the "output gate" [36].
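The gate operations described in this section can be sketched as a single LSTM forward step. The code below uses toy 2-dimension vectors instead of the 300-dimension vectors of this study, and the parameter names (Wf, Wi, Wc, Wo and their biases) are illustrative; it is a plain-Python sketch of the standard LSTM equations, not the exact implementation used in the project.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def mul(u, v):
    # pointwise (Hadamard) product, the 'x' gate in the figures
    return [a * b for a, b in zip(u, v)]

def lstm_cell(x_t, h_prev, c_prev, p):
    # One LSTM step: forget gate f, input gate i, candidate C~, output gate o.
    z = h_prev + x_t  # concatenation of h_{t-1} and x_t
    f = [sigmoid(a) for a in add(matvec(p["Wf"], z), p["bf"])]
    i = [sigmoid(a) for a in add(matvec(p["Wi"], z), p["bi"])]
    c_tilde = [math.tanh(a) for a in add(matvec(p["Wc"], z), p["bc"])]
    o = [sigmoid(a) for a in add(matvec(p["Wo"], z), p["bo"])]
    c_t = add(mul(f, c_prev), mul(i, c_tilde))   # updated cell state
    h_t = mul(o, [math.tanh(a) for a in c_t])    # new hidden state
    return h_t, c_t

# Toy dimensions: 2 instead of the 300 used in the study.
dim = 2
p = {k: [[0.1] * (2 * dim) for _ in range(dim)] for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: [0.0] * dim for k in ("bf", "bi", "bc", "bo")})
h, c = lstm_cell([0.5, -0.5], [0.0, 0.0], [0.0, 0.0], p)
```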
In our setup, we stacked 4 LSTM layers on top of each other. Note that there are 4 LSTM layers, and within each layer there are 4 neural network layers. Figure 4.3.8 illustrates the setup used in this study.
Figure 4.3.8 Illustration of a 4-layer LSTM that we use in this study.
4.4 Training
Having understood the model, we trained our data on it.
The duration for which we train our model is measured in epochs. One epoch is defined as one complete pass of the entire dataset forward and backward through the neural network.
We sent each of our vectorized data points into the LSTM network. At this point, from the 50,000 lines that had been combined earlier in the preprocessing stage mentioned in chapter 2, our training dataset contains 14,938 pairs of questions and answers. Our training is set to 1000 epochs through 4 LSTM layers with 300 neurons in each neural network layer.
In general, there is no rule on how many layers and epochs to use: too many might result in overfitting, and too few might not represent the data well enough. The choice is based on experience, the processing power of the computer and the data size. The parameters chosen in this study were taken with reference to chatbot models that experts have used.
Figure 4.4.1 Snapshot of the LSTM training process at the first mini-batch of the first epoch: the estimated time of completion/arrival (ETA) is 6.33 min, the loss function (cosine proximity) is −0.4829 and the accuracy is 0.0083. The 14,938 training pairs are split into mini-batches of size 32.
We now look at the parameters in the training.
In each neural network layer in each cell, there are 300×300 weights in the kernel and 300×300 weights in the recurrent kernel (a 300-dimension input on 300 neurons). All the weights in each LSTM layer's kernel and recurrent kernel are concatenated. We can see how this is broken down in Figure 4.4.2. The total number of weights and recurrent weights in the LSTM is 2,880,000. Each neural network layer contains 300 biases, and each LSTM layer contains 300×4 = 1200 biases, for a total of 4,800 biases, as seen in Figure 4.4.2. More details on weights and biases can be found in section 5.2.
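The parameter counts quoted above can be verified with a few lines of arithmetic:

```python
# Sanity check of the parameter counts: each of the 4 gate layers in an LSTM
# cell has a (300 x 300) kernel, a (300 x 300) recurrent kernel and 300
# biases; the network stacks 4 such LSTM layers.
neurons, gates, layers = 300, 4, 4

weights_per_layer = gates * (neurons * neurons + neurons * neurons)
total_weights = layers * weights_per_layer   # kernel + recurrent kernel
biases_per_layer = gates * neurons           # 300 x 4 = 1200 per LSTM layer
total_biases = layers * biases_per_layer
```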
An activation is the output of each neuron given a set of inputs, defined by an activation function.
In general, each neural network layer in the LSTM described in section 4.3 follows the equation:
activation = function(Wk × xt + Wr × ht−1 + b)
➢ activation – output, (300, 1) matrix
➢ function – the activation function, given by a sigmoid or tanh function
➢ Wk – weight of kernel, (300, 300) matrix
➢ xt – input at time t, (300, 1) matrix
➢ Wr – weight of recurrent kernel, (300, 300) matrix
➢ ht−1 – hidden layer from time t−1, (300, 1) matrix
➢ b – bias, (300, 1) matrix
The sizes of the matrices are specific to this study, where (m, n) represents m rows and n columns.
Figure 4.4.2 Distribution of weights in kernel and recurrent kernel and its biases in each layer
5 Results
In this chapter, we discuss the results obtained from the 3 different models we built. We broke the analysis down into fundamental analysis, studying intrinsic results such as the theoretical accuracy, loss functions, weights and biases of the models, and technical analysis, studying the actual responses from the chatbot models.
5.1 Fundamental Analysis
We study the values of the accuracy and the loss function at each epoch obtained from training the model, and the weights and biases at every 100 epochs, from a mathematical point of view.
5.1.1 Accuracy and loss
In this section, we discuss the accuracy and loss recorded during the training of each model. Accuracy here refers to the theoretical accuracy obtained during training: the fraction of output words matching the theoretical “correct” answer. This is not to be confused with the experimental accuracy given in section 5.2.1. The accuracy has a maximum score of 1 when all the words in the validation output equal the expected output. In this study, since our input and output are of size 15 words, if the model manages to predict 9 words in the correct positions of the sentence, it achieves a score of (9 ÷ 15) = 0.6. We will now analyse the results we achieved.
Figure 5.1.1 Graph of accuracy against epoch across the 3 models during the training
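The per-position accuracy described above can be sketched as a small helper (the function name is ours, not the training framework's):

```python
# Fraction of the 15 output positions where the predicted word matches the
# expected word, as in the (9 / 15) = 0.6 example above.
def positional_accuracy(predicted, expected):
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)

# 9 of 15 positions correct -> 0.6
pred = ["a"] * 9 + ["x"] * 6
target = ["a"] * 15
acc = positional_accuracy(pred, target)
```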
From the accuracy graph, we gathered that GloVe obtained the best accuracy. Initially, the accuracy increased almost exponentially until around the 150th epoch. The accuracy then spiked from approximately 0.3 to approximately 0.7 between the 150th and 200th epochs. After around 200 epochs, its accuracy slowly increased to slightly below 0.8, where it started to plateau. This means that, out of the 15 words fed into the model, the model was able to predict approximately (0.8 × 15) = 12 words of the output correctly after training. This is a very high accuracy. The model with the next highest accuracy is word2vec. Initially, the accuracy of the word2vec model was very low, close to 0. Its gradient was almost flat and its accuracy was not improving despite the training. Its accuracy then spiked from 0.067 at the 255th epoch to 0.404 at the 266th epoch, following which it plateaued at around 0.5 for the
rest of the training. This model was able to predict approximately (0.5 × 15) = 7.5 words of the output correctly after training. FastText fared the worst at the end of the training in terms of training accuracy. Despite having a higher initial accuracy than word2vec, its training was not as efficient. Unlike the first 2 models, there was no large jump in accuracy in the fastText training. Accuracy only improved after the 65th epoch, and its rate of increase slowed down after the 200th epoch. Towards the end of the training, from around the 800th to the 1000th epoch, a lot of noise was observed. After 1000 epochs, its training accuracy ended at 0.26. This means that it could only predict approximately (0.26 × 15) ≈ 4 words of the output correctly.
The loss function is calculated from the cosine proximity/similarity formula:
L = −cosine similarity = −cos θ = −(A ∙ B)/(‖A‖‖B‖) = −(∑ᵢ AᵢBᵢ)/(√(∑ᵢ Aᵢ²) √(∑ᵢ Bᵢ²)), with the sums running over i = 1 to n,
where L represents the loss function and A and B are the input and output vectors.
In the usual cosine similarity formula, the value 1 is obtained when 2 vectors are completely similar (parallel) and 0 when they are dissimilar (orthogonal). Unlike the usual formula, however, a negative sign is included here. This makes the loss function decrease, getting closer to −1, as the answers become more similar.
Figure 5.1.2 Graph of loss against epoch across the 3 models during the training
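The loss above can be sketched directly from the formula (a plain-Python illustration, not the training framework's implementation):

```python
import math

# Negative cosine similarity between a predicted vector A and a target vector
# B; more similar vectors push the loss towards -1.
def cosine_proximity_loss(A, B):
    dot = sum(a * b for a, b in zip(A, B))
    norm_a = math.sqrt(sum(a * a for a in A))
    norm_b = math.sqrt(sum(b * b for b in B))
    return -dot / (norm_a * norm_b)

parallel = cosine_proximity_loss([1.0, 2.0], [2.0, 4.0])     # -> -1
orthogonal = cosine_proximity_loss([1.0, 0.0], [0.0, 1.0])   # -> 0
```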
We can observe that the losses of the 3 models decrease with a similar shape. As in the accuracy graph, the GloVe model performed the best (closest to −1). FastText performed better than word2vec in terms of loss. This means that the vectors predicted by fastText were closer to the targeted answers than those of word2vec (they had a closer vector representation to the targeted answer); however, the final vectors produced did not resolve to the same words, hence fastText scored a lower accuracy.
5.1.2 Weights and Biases
In this section, we analyse how the weights and biases evolve in each layer during the training and compare the final trained weights and biases across all 3 models.
The weights in the neurons applied to the input vectors are referred to as the weights from the kernel, while the weights applied to the hidden state vector, which stores the memory from the previous sequence's output, are referred to as the weights from the recurrent kernel.
All the weights in the kernel and recurrent kernel are initialized with the Glorot normal initializer (also known as Xavier normal initialization). This function is a truncated normal distribution with mean 0. The standard deviation (σ) is given by:
σ = √(2/(in + out))
➢ in – number of input units in the weight tensor
➢ out – number of output units in the weight tensor
The Glorot normal initializer is a common initializer used in neural networks [37]. If the weights of the network start too small, the signal going through each layer will shrink and become too small. If the weights start too large, the signal will grow and become too massive. The Glorot normal initializer is a method commonly used in data science that aims to give the weight distribution the right size, keeping the signal in a reasonable range of values through the layers.
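A minimal sketch of the initializer is given below, assuming truncation at two standard deviations (the convention used by common deep learning frameworks); the function name and seed are ours.

```python
import math
import random

# Glorot (Xavier) normal sketch: samples from a normal distribution with
# mean 0 and std sqrt(2 / (in + out)), re-drawing any value further than two
# standard deviations from the mean (truncation).
def glorot_normal(fan_in, fan_out, n, seed=0):
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 / (fan_in + fan_out))
    samples = []
    while len(samples) < n:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= 2.0 * sigma:   # truncate at 2 standard deviations
            samples.append(x)
    return samples

w = glorot_normal(300, 300, 1000)
sigma = math.sqrt(2.0 / 600)        # ~0.0577 for a 300-in, 300-out layer
```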
For all the weights in the kernel and recurrent kernel and all the biases, we plot the quantity on a log scale. The first reason for doing so is to deal with the skewness of values from the initialization. The second is to observe whether the data follows a power law distribution.
In the following observations, the weight and bias distributions are studied from a macro perspective.
5.1.2.1 Distributions
We plotted and analysed how the weights and biases for each layer in each model evolved over the 1000 epochs. In general, we observed that over the epochs, the weights, in both the kernels and recurrent kernels, and the biases spread outwards away from the initialization. The Glorot normal initialization of the kernels and recurrent kernels flattens outwards, with some values favouring the negative side and others the positive side. The 2 peaks from the initialization of the biases flatten out as well and converge towards each other.
Figure 5.1.2.1.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the first layer of the word2vec network, we observe the weights redistributing away from the centre, where the values of the weights were close to 0.
The weights were most volatile in the first 100 epochs, and the redistribution became less vigorous as the epochs increased. For kernel 1, the weight distribution is not symmetrical and is biased towards negative values. For recurrent kernel 1, the weight distribution is more symmetrical on both sides. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The biases are initialized with 2 peaks, at 0 and 1. Similar to the weights, we can see the biases redistributing outwards, with the first 100 epochs being the most volatile. We observed the peaks combining as the biases spread out.
A detailed study on the evolution of its distribution can be found in appendix 9.1.1.
We went on to compare the weight and bias distributions after 1000 epochs across the 3 models. In each layer, the weights and biases spread out differently for each model. There are no clear qualitative differences between the models' weights and biases.
Figure 5.1.2.1.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model first LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale
In the first layer of the 3 models, we observed that the weights of the word2vec model are the most heterogeneous, and GloVe is the least heterogeneous model, in both the kernel and the recurrent kernel. However, the biases of the GloVe model had the widest distribution, while fastText had the narrowest.
A detailed study across each model after 1000 epochs can be found in appendix 9.1.2.
5.1.3 Fundamental Insights
From the fundamental analysis, the most important insight we could draw was from the accuracy graph. The GloVe model responded the best to the LSTM training, obtaining the best result. The most interesting insight from the graph was the similar trend occurring in the word2vec and GloVe models but not in the fastText model: we observed a jump in accuracy for the first 2 models but not for the third. This is an interesting phenomenon that we try to study and model in chapter 6.
5.2 Technical Analysis
In this section, we discuss the results from the actual output that users will experience when they use the chatbot.
5.2.1 Technical Insights
At this stage, we want to test the accuracy of our models. Since we are building a model that mimics intelligence, judging the accuracy of the model based purely on numerical prediction would not be the most meaningful. The accuracy in this section is different from the theoretical accuracy mentioned in section 5.1.1: the accuracy here (experimental accuracy) is a measure of accuracy in context rather than the accuracy of individual words.
The best way to evaluate an artificial intelligence is to expose it to humans and gather real feedback. However, since the chatbot was fed with too little data, it is still at an infant stage of development. Hence, we evaluated the chatbot personally.
We loaded a list of 100 questions into the chatbot. The first 50 are questions that the chatbot trained on, while the next 50 are questions that the chatbot never encountered in its training. We collected the response of each of the 3 models to every question and gave them a score from 1 to 3: 3 for the best response given out of the 3 chatbots and 1 for the worst. The chatbots were evaluated equally on 2 criteria:
1. The accuracy of the response
2. How human the response is
The first column gives the index of the table. ‘Q' represents question and ‘A' represents answer. For instance, ‘Q1' is the index for question 1 and ‘A1' is the index for answer 1. In the second column, we have the questions and answers: the table is structured so that each question is followed by the actual answer given by the dialogue corpus in the next row. The next 3 columns are the results from the 3 models, word2vec, GloVe and fastText respectively. The top row, together with the question, gives the response of each model, while the bottom row, together with the answer, gives the score of each model.
Table 5.2.1.1 Truncated table showing the first question and answer pair from the trained models and the answer provided by the Ubuntu Dialogue Corpus

Q1 (question): what green ?
➢ word2vec response: samantharonson_@ unfortu_nate unsigned_char HELEN_COONAN_Well l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
➢ GloVe response: sure required provide least havent anyway yes post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
➢ fastText response: do think refer porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

A1 (corpus answer): usb to ps2 converter came with the mouse in the box
➢ Scores: word2vec 1, GloVe 2, fastText 3
The full results of the responses and scoring is in the appendix, section 9.2.
Recalling the sentend vector that we implemented in section 3.3, this vector appears at the end of the sentences. We could replace these vectors in the chatbot integration; however, as mentioned in 1.3.2, this is not the focus of this project. As observed from each model, the sentend vector represents the word:
● l'_Affaire, in the word2vec model
● post-exertional, in the GloVe model
● porn.The, in the fastText model
The 50,000 training lines were not able to produce a sufficiently well-trained model. This made the grading very difficult, as most of the answers were incorrect. This was especially so in the second half of the questions, which the models had not seen. We also noticed that for the fastText model, sentend was found not just at the end of the sentence, but also within the sentence. This is reflected by its training accuracy, the lowest of the 3 models.
One positive result was that we could see that the models had learnt from the training data. The outputs were not completely random but showed signs of learning: they returned words that were within the domain of the Ubuntu Dialogue Corpus. For example, the replies had words like ‘Ubuntu', ‘BIT' and other computing-related terms.
Despite the difficulty of grading the chatbot, we managed to rank the models. More details on the grading results can be found in appendix 9.2. In general, we felt that GloVe did the best in terms of replying with the most humanly logical sentences. The scores are summarized in Table 5.2.1.2.
Table 5.2.1.2 Technical analysis results for the 3 word embedding models on 100 sample question and answer pairs

        word2vec   GloVe   fastText
Score   129        250     215
6 Neuron Activation and Ising Spins
We observed an interesting phenomenon during the training, whereby there was a spike in accuracy during the training of the word2vec and GloVe models. In this chapter, we try to interpret this phenomenon by drawing an analogy with a phase transition and modelling the neuron activations as Ising spins.
6.1 Phase Transition and Ising Model
A phase transition refers to a change in state. Common phase transitions we are familiar with include vapourisation and melting [38].
We explore a familiar phase transition: the melting of ice.
The Helmholtz free energy of a system is given by the following equation:
F ≡ U − TS
where F is the Helmholtz free energy, U is the internal energy of the system, T is the absolute temperature of the surroundings and S is the entropy of the system.
At very low temperature, the water is in solid form: ice. As heat is applied to the ice, the temperature increases while the molecules held within the crystal lattice vibrate faster and become more energetic. Once the temperature reaches 0 °C, or 273.15 K, it is at the critical temperature where the phase transition occurs. At this point, upon heating the ice, its temperature does not change: the heating causes the entropy and internal energy to increase while the temperature remains constant as the ice melts.
Another famous model, the Ising model, named after physicist Ernst Ising, is a popular model for magnetic solids. In this model, each atom has an intrinsic magnetic moment called spin. This spin can exist as spin “up”, which conventionally equals 1, or “down”, which equals −1. The model is widely used as a toy model to study phase transition behaviour in statistical mechanics. Close to the critical temperature, the correlation length of the system approaches infinity, while the distribution of spin cluster sizes becomes heterogeneous with large variance.
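As a toy illustration of the spin picture, the sketch below runs Metropolis updates on a small 1-D Ising chain: each site holds a spin of +1 or −1, and a flip is accepted with probability exp(−ΔE/T). This is a minimal sketch for intuition only, not the 2-D model on which phase transition results are usually demonstrated.

```python
import math
import random

# One Metropolis sweep over a 1-D Ising chain with periodic boundaries.
def metropolis_sweep(spins, T, rng):
    n = len(spins)
    for i in range(n):
        left, right = spins[(i - 1) % n], spins[(i + 1) % n]
        dE = 2.0 * spins[i] * (left + right)   # energy cost of flipping spin i
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i] = -spins[i]
    return spins

rng = random.Random(0)
spins = [rng.choice([-1, 1]) for _ in range(100)]  # random (disordered) start
for _ in range(50):
    metropolis_sweep(spins, T=1.0, rng=rng)
magnetization = sum(spins) / len(spins)            # order parameter in [-1, 1]
```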
6.2 Model and Results
In Figure 5.1.1, we saw that there was a spike in accuracy over the epochs during the training. Drawing an analogy to phase transitions, we hypothesize that before the training starts, the neural network is in a disordered phase, as the weights are randomly initialized. It evolves towards an ‘ordered' phase in which the model has learned the patterns in the data and is able to make predictions. We see that an abrupt change in accuracy exists for word2vec and GloVe, but not for fastText. If the analogy is accurate, the word2vec and GloVe models have a “critical epoch” of 250 and 150 respectively.
Since neural activations do not have a clear spatial arrangement, unlike the Ising model, we look at the distribution of activations to explore the dynamical evolution of the model during training.
We investigate the activations of the neurons by feeding in the sentend vector. After initialization, the weights and biases start distributing outwards. As these neural network layers are governed by sigmoid and tanh functions, the activations tend towards 0 and 1 for the sigmoid function, and 1 and −1 for the tanh function.
In figure 6.2.1, we have plotted the reverse cumulative distribution function (CDF) for the word2vec activations at the first gate. We observe that after the zeroth epoch, the likelihood of occurrence concentrates at the extreme values of 0 and 1.
Figure 6.2.1 Reverse CDF plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Refer to appendix 9.3.1 for the reverse CDF plots for all the other layers and models.
The 2 states of the activations are similar to the up and down spins in the Ising model. Hence, we try to draw an analogy between the activations and the Ising model. To check whether a phase transition occurs, we plot the variance of the activations from the sentend vector over the 1000 epochs.
From figure 6.2.2, we observe that the maximum variance occurs in the region of the “critical epoch” for LSTM 2 gate 1 in the word2vec model. The same observation can be made in the same layer and gate at the “critical epoch” for the GloVe model, illustrated in figure 6.2.4. This suggests that a phase transition could occur in LSTM 2 gate 1.
Instead of computing the activations from the sentend vector, we also fed the vector for the word ‘java_14' into the word2vec and GloVe models. Figures 6.2.3 and 6.2.5 show that a maximum does not occur at the “critical epoch”.
Figure 6.2.2 Variance plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 6.2.3 Variance plot for the word2vec activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation.
Figure 6.2.4 Variance plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 6.2.5 Variance plot for the GloVe activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation
Plots of variance against epoch at each gate for all other layers can be found in appendix 9.3.2.
From the results, we found that the variance of these activations was not at its maximum during the abrupt change in accuracy. This was consistent for both the word2vec and GloVe models. We therefore do not have enough evidence to support the hypothesis that the abrupt increase was due to a phase transition.
7 Future work and Conclusion
As natural language processing is a young field of study, much about it remains poorly understood. There are still many uncertainties that experts in the field do not understand, and researchers spend their careers on this branch of data science. As someone new to data science, given 1 year to do this project on top of school and a part-time internship, time was a major limitation. Hence, there are some major improvements that could be made to the results of this study.
Firstly, more research could be done on natural language processing techniques applied to the corpus. In chapter 2.3, we discussed various forms of analysis and mentioned that we only applied the most basic natural language processes: tokenization, stop word removal and special character removal. There are many more methods mentioned in chapter 2.3 which may improve the quality of the output.
Another potential improvement concerns the word embeddings used. Currently, we rely on transfer learning, using word embeddings trained on other contexts: Google News for word2vec and Common Crawl for GloVe and fastText. These embeddings cover a large general vocabulary but are not specific to the topic of interest. For instance, many irrelevant words such as ‘samantharonson_@’, ‘porn.the’ and ‘AP_HOCKEY_NEWS’ appeared in the replies of the chatbot. Training the word embedding on the corpus itself, with only relevant words, might produce an improved result; however, it would require considerably more time and computational power.
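Training an embedding on the corpus itself would start from (target, context) pairs drawn from each tokenized utterance; a minimal sketch of that first step (the helper name is ours, not from the study):

```python
# Generate (target, context) pairs for skip-gram style embedding training
# directly on the corpus; `window` is the context window size on each side.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

print(skipgram_pairs(["sudo", "apt-get", "update"], window=1))
# → [('sudo', 'apt-get'), ('apt-get', 'sudo'), ('apt-get', 'update'), ('update', 'apt-get')]
```

In practice a library such as gensim can train a full word2vec model from pairs like these, so the vocabulary would contain only words that actually occur in the Ubuntu Dialogue Corpus.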
One of the most critical improvements, applicable to all data science problems, is to train on a larger dataset. The luxury of a supercomputing lab was not available, so access to computational resources was limited. We trained the model on a home computer with only 8 GB of RAM. Each model took approximately 5 days to train, and initially we had to keep retraining the models because we were not sure which data we wanted to record. Attempts were made to use the National Supercomputing Centre Singapore (NSCC), a resource freely available to NUS students. However, at the start of the project, a water-pipe leak at their centre damaged the infrastructure and left most services unavailable. When it came back online we tried again, but because of the complexity of the Python dependencies (older, discontinued libraries such as Theano were used in the code), we did not have enough time to rewrite the code or to work out how to get the libraries installed. We were therefore only able to train on 50,000 rows of the full corpus. Training on the full corpus would allow the weights to generalize better, possibly giving an improved result.
The dataset used in this study, the Ubuntu Dialogue Corpus, consists of users discussing technical issues. Many of the responses are specific to a particular problem, giving the exact steps to solve one issue. This is very difficult to generalize, and users require the precise steps for their specific problem. A rule-based chatbot might therefore be more suitable for this problem setting, while at this stage a generative chatbot might be better suited to more general question-and-answer settings.
In this study, we implemented only the most basic user interface: we ran the code and talked to the chatbot in the Python terminal. In future, once the accuracy of the chatbot is improved with the suggestions mentioned above, or with other modelling techniques, the chatbot could be completed by integrating it into a chat interface such as Telegram, or even by building a dedicated interface for it. This would benefit Ubuntu users by providing the convenience of instant customer service.
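The terminal interaction used here can be sketched as a simple read-and-respond loop; `respond` below is a hypothetical stand-in for the trained model's inference step, not the actual model:

```python
def respond(question):
    # Hypothetical stand-in for model inference
    # (encode the question, run the LSTM, decode a reply).
    return "have you tried restarting the service?"

def chat(read_line, reply=respond):
    # Read questions until the user types 'quit' or 'exit',
    # collecting (question, reply) pairs as a transcript.
    transcript = []
    for question in read_line:
        if question.strip().lower() in ("quit", "exit"):
            break
        transcript.append((question, reply(question)))
    return transcript

log = chat(iter(["my sound is broken", "quit"]))
print(log)
```

A chat-platform integration would replace `read_line` with incoming messages from the platform's API and send the reply back instead of printing it.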
Our physics intuition about deep learning models led us to explore possible evidence of a critical phase transition in the training process: that is, whether the transition of the model from 'unlearned' to 'learned' is a critical phase transition analogous to many physical processes. From the limited analysis in this work, such evidence is lacking. It could be that there is simply no 'critical' phase transition, or it could be that the models we employed do not perform well enough in the end, meaning they never truly reached the 'learned' phase. Other models or numerical methods could be applied to further explore the nature of the evolution during model training, and that could be future work.
We hope that in the future, as we conduct more research on the tools required to build the chatbot, we will gain a better understanding of data science and neural networks, with the possibility of marrying physics theories to accelerate the advancement of artificial intelligence. With a fully functioning generative chatbot connected to an open domain, we would be able to achieve what was discussed in section 1.1 and even more.
8 Bibliography
[1] EY - Analytics. Retrieved from https://www.ey.com/gl/en/issues/business-environment/ey-analytics
[2] Wetstein, S. (2017). Designing a Dutch financial chatbot (Master's thesis). VU University Amsterdam.
[3] Schneider, C. (2017). 10 reasons why AI-powered, automated customer service is the future.
Retrieved from https://www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-
automated-customer-service-future/
[4] AI for Customer Service | IBM Watson. Retrieved from https://www.ibm.com/watson/ai-
customer-service?cm_mmc=OSocial_Blog-_-
Watson%20and%20Cloud%20Platform_Watson%20Core%20-%20Conversation-_-WW_WW-_-
Landing%20Page&cm_mmca1=000027BD&cm_mmca2=10006919
[5] Avalverde, D. A Brief History of Chatbots. Retrieved from https://pcc.cs.byu.edu/2018/03/26/a-
brief-history-of-chatbots/
[6] Surmenok, P. (2016). Chatbot Architecture. Retrieved from
https://medium.com/@surmenok/chatbot-architecture-496f5bf820ed
[7] Millennium Problems | Clay Mathematics Institute. Retrieved from
http://www.claymath.org/millennium-problems
[8] Mohan, A. T. (2018). A Deep Learning based Approach to Reduced Order Modeling for Turbulent
Flow Control using LSTM Neural Networks. 22.
[9] Wielgosz, M., Skoczeń, A., & Mertik, M. (2017). Using LSTM recurrent neural networks for
monitoring the LHC superconducting magnets.
[10] Frankenfield, J. (2019). Artificial Intelligence (AI). Retrieved from
https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp
[11] AlphaGo | DeepMind. Retrieved from https://deepmind.com/research/alphago/
[12] Sottek, T. (2017). The world’s best Dota 2 players just got destroyed by a killer AI from Elon
Musk’s startup. Retrieved from https://www.theverge.com/2017/8/11/16137388/dota-2-dendi-
open-ai-elon-musk
[13] Machine Learning: What it is and why it matters. Retrieved from
https://www.sas.com/en_sg/insights/analytics/machine-learning.html
[14] Richárd, N. (2018). The differences between Artificial and Biological Neural Networks. Retrieved
from https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-
networks-a8b46db828b7
[15] What is Deep Learning? - Definition from Techopedia. Retrieved from
https://www.techopedia.com/definition/30325/deep-learning
[16] Audacity, T. (2018). Introduction of Deep Learning. Retrieved from
https://medium.com/@techutzpah/introduction-of-deep-learning-e79252bf353a
[17] Mahapatra, S. (2018). Why Deep Learning over Traditional Machine Learning? Retrieved from
https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-
learning-1b6a99177063
[18] Samson, O. (2017). Deep learning weekly piece: the differences between AI, ML, and DL.
Retrieved from https://towardsdatascience.com/deep-learning-weekly-piece-the-differences-
between-ai-ml-and-dl-b6a203b70698
[19] Krishnan, S. (2018). Chatbots are cool! A framework using Python. Retrieved from
https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-
7c69af7a7439
[20] Ray, S. (2017). Understanding and coding Neural Networks From Scratch in Python and R.
Retrieved from https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-
in-python-and-r/
[21] Ubuntu IRC Logs. Retrieved from https://irclogs.ubuntu.com/2007/12/12/%23ubuntu.html
[22] Lowe, R., Pow, N., Serban, I. V., & Pineau, J. (2015). The Ubuntu Dialogue Corpus: A Large
Dataset for Research in Unstructured Multi-Turn Dialogue Systems. School of Computer Science,
McGill University, Montreal, Canada.
[23] Tatman, R. (2017). Ubuntu Dialogue Corpus. Retrieved from
https://www.kaggle.com/rtatman/ubuntu-dialogue-corpus
[24] Parrish, A. (2018). Understanding word vectors: A tutorial for "Reading and Writing Electronic
Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0
https://creativecommons.org/choose/zero/, other text released under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/. Retrieved from
https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469
[25] CS 2112/ENGRD 2112 Fall 2018. Retrieved from
http://www.cs.cornell.edu/courses/cs2112/2018fa/lectures/lecture.html?id=parsing
[26] NSS. (2017). Intuitive Understanding of Word Embeddings: Count Vectors to Word2Vec.
Retrieved from https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-
word2veec/
[27] Ruder, S. (2016). An overview of word embeddings and their connection to distributional
semantic models - AYLIEN. Retrieved from http://blog.aylien.com/overview-word-embeddings-
history-word2vec-cbow-glove/
[28] Heidenreich, H. (2018). Introduction to Word Embeddings | Hunter Heidenreich. Retrieved from
http://hunterheidenreich.com/blog/intro-to-word-embeddings/
[29] Google Code Archive - Long-term storage for Google Code Project Hosting. (2019). Retrieved
from https://code.google.com/archive/p/word2vec/
[30] Banerjee, S. (2018). Word2Vec — a baby step in Deep Learning but a giant leap towards Natural
Language Processing. Retrieved from https://medium.com/explore-artificial-
intelligence/word2vec-a-baby-step-in-deep-learning-but-a-giant-leap-towards-natural-language-
processing-40fe4e8602ba
[31] Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. Computer Science Department, Stanford University, Stanford, CA 94305.
[32] Rajasekharan, A. (2017). What is the main difference between word2vec and fastText?.
Retrieved from https://www.quora.com/What-is-the-main-difference-between-word2vec-and-
fastText
[33] Selivanov, D. (2015). GloVe vs word2vec revisited. · Data Science notes. Retrieved from
http://dsnotes.com/post/glove-enwiki/
[34] Goyal, P. (2017). What is the difference between LSTM, RNN and sequence to sequence?.
Retrieved from https://www.quora.com/What-is-the-difference-between-LSTM-RNN-and-
sequence-to-sequence
[35] Banerjee, S. (2018). An Introduction to Recurrent Neural Networks. Retrieved from
https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-
networks-72c97bf0912
[36] Olah, C. (2015). Understanding LSTM Networks. Retrieved from
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[37] Jones, A. An Explanation of Xavier Initialization. Retrieved from
http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
[38] Eastman, P. (2014). 6. Phase Transitions — Introduction to Statistical Mechanics. Retrieved from
https://web.stanford.edu/~peastman/statmech/phasetransitions.html
[39] Shrimal, S. (2017). The Semicolon. Retrieved from https://github.com/shreyans29/thesemicolon
[40] Building a Chatbot: analysis & limitations of modern platforms | Tryolabs Blog. (2017). Retrieved
from https://tryolabs.com/blog/2017/01/25/building-a-chatbot-analysis--limitations-of-modern-
platforms/
[41] Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of
context-counting vs. context-predicting semantic vectors. Retrieved from
http://clic.cimec.unitn.it/marco/publications/acl2014/baroni-etal-countpredict-acl2014.pdf
[42] Drakes, W. Information Retrieval: CHAPTER 8: STEMMING ALGORITHMS. Retrieved from
http://orion.lcg.ufrj.br/Dr.Dobbs/books/book5/chap08.htm
9 Appendix
9.1 Weight and Bias Results
As discussed in section 5.3, we now look at the plots of the weight and bias results in detail.
9.1.1 Evolution over 1000 epochs
In this section, we plot the kernel, recurrent kernel and bias of the individual LSTM layers for each model across 1000 epochs, allowing us to study how the weights and biases evolve in each layer.
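In Keras, an LSTM layer's get_weights() returns the kernel, recurrent kernel and bias in that order, and the distributions in these figures can be produced by histogramming each tensor at every saved checkpoint; a sketch with toy data standing in for a real checkpoint:

```python
import numpy as np

def weight_histogram(weights, bins=50, value_range=(-1.0, 1.0)):
    # Flatten the weight tensor and histogram it; the counts are later
    # plotted on a log scale, one curve per 100-epoch checkpoint.
    flat = np.asarray(weights).ravel()
    counts, edges = np.histogram(flat, bins=bins, range=value_range)
    return counts, edges

rng = np.random.default_rng(0)
kernel = rng.normal(0.0, 0.05, size=(300, 1200))  # toy stand-in for a kernel
counts, edges = weight_histogram(kernel)
print(len(counts), len(edges))  # 50 bins, 51 bin edges
```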
9.1.1.1 Word2vec
Figure 9.1.1.1.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the first layer of the word2vec network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 1, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 1, the weight distribution is more symmetrical on both sides. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.1.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the second layer of the word2vec network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 2, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 2, the weight distribution is also not symmetrical and is likewise biased towards the negative values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the second layer than in the first layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.1.3 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the third layer of the word2vec network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 3, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 3, the weight distribution is also not symmetrical and is likewise biased towards the negative values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the third layer than in the second layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.1.4 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the fourth layer of the word2vec network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 4, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 4, the weight distribution is also not symmetrical; however, the weights are biased towards the positive values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the fourth layer than in the third layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out. The distinction between the 2 peaks was no longer observable by the end of training.
9.1.1.2 GloVe
Figure 9.1.1.2.1 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the first layer of the GloVe network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 1, the weight distribution is rather symmetrical on both the positive and negative sides. For recurrent kernel 1, the weight distribution is not symmetrical and is biased towards the negative values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.2.2 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the second layer of the GloVe network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 2, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 2, the weight distribution is also not symmetrical and is likewise biased towards the negative values. We observed that the movement of weights in the kernel is greater than in the recurrent kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the second layer than in the first layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.2.3 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the third layer of the GloVe network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 3, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 3, the weight distribution is also not symmetrical; however, the weights are biased towards the positive values. We observed that the rates at which the weights redistribute in the kernel and the recurrent kernel are similar.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the third layer than in the second layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.2.4 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the fourth layer of the GloVe network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 4, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 4, the weight distribution is also not symmetrical; however, the weights are biased towards the positive values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the fourth layer than in the third layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out. The distinction between the 2 peaks was no longer observable by the end of training.
9.1.1.3 fastText
Figure 9.1.1.3.1 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the first layer of the fastText network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 1, the weight distribution is rather symmetrical on both the positive and negative sides. For recurrent kernel 1, the weight distribution is also symmetrical on both sides. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.3.2 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the second layer of the fastText network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 2, the weight distribution is not symmetrical and is biased towards the negative values. For recurrent kernel 2, the weight distribution is also not symmetrical and is likewise biased towards the negative values. We observed that the movement of weights in the kernel is greater than in the recurrent kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the second layer than in the first layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.3.3 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the third layer of the fastText network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 3, the weight distribution is not symmetrical and is biased towards the positive values. For recurrent kernel 3, the weight distribution is rather symmetrical on both sides. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the third layer than in the second layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out.
Figure 9.1.1.3.4 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale
For the fourth layer of the fastText network, we observe the weights redistributing away from the center, where the values of the weights were close to 0.
The weights were most volatile at the first 100-epoch checkpoint, and the redistribution becomes less vigorous as the epochs increase. For kernel 4, the weight distribution is not symmetrical and is biased towards the positive values. For recurrent kernel 4, the weight distribution is also not symmetrical, with the weights likewise biased towards the positive values. We observed that the movement of weights in the recurrent kernel is greater than in the kernel.
The weights for both the kernel and the recurrent kernel spread out more rapidly in the fourth layer than in the third layer.
The bias is initialized with 2 peaks, at 0 and 1. Similar to the weights, the biases redistribute outwards, with the first 100-epoch checkpoint being the most volatile, and we observed the peaks merging as the biases spread out. The distinction between the 2 peaks was no longer observable by the end of training.
9.1.2 Model Comparison
In this section, we plot the kernel, recurrent kernel and bias of the individual LSTM layers at the 1000th epoch across the 3 different models, allowing us to study how the weights and biases differ between models at the end of training.
Figure 9.1.2.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText models' first LSTM layer, at the 1000th epoch, with its quantity plotted on a log scale
In the first layer of the 3 models, we observed that the weights of the word2vec model are the most heterogeneous, while those of the GloVe model are the least heterogeneous, in both the kernel and the recurrent kernel. However, the biases of the GloVe model had the widest distribution, while those of fastText had the narrowest.
Figure 9.1.2.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText models' second LSTM layer, at the 1000th epoch, with its quantity plotted on a log scale
In the second layer of the 3 models, the kernel weights of all models are distributed almost identically. The recurrent-kernel distributions of GloVe and fastText are similar, whereas the word2vec model is slightly less spread out in the positive values. The bias distributions of all 3 models are very similar.
Figure 9.1.2.3 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText models' third LSTM layer, at the 1000th epoch, with its quantity plotted on a log scale
In the third layer of the 3 models, the weights of all models are distributed almost identically in both the kernel and the recurrent kernel. The bias distributions of all 3 models are also very similar.
Figure 9.1.2.4 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText models' fourth LSTM layer, at the 1000th epoch, with its quantity plotted on a log scale
In the fourth layer of the 3 models, the kernel weights of the fastText model are the most heterogeneous in the positive values, while those of the word2vec model are the most heterogeneous in the negative values; GloVe has the narrowest distribution on both sides. The recurrent-kernel weight distribution is the widest for word2vec and the narrowest for the GloVe model. The bias distribution of the GloVe model is the widest in the positive values, and that of the fastText model is the widest in the negative values.
9.2 Technical Results
In this section, we present the responses from the 3 chatbot models to 100 questions. The first 50 questions are questions the chatbot was trained on, while the next 50 are questions from the Ubuntu Dialogue Corpus that the chatbot was not trained on. These 100 questions and answers were randomized.
The first column gives the index of the table entry: ‘Q’ represents a question and ‘A’ an answer, so ‘Q1’ is the index of question 1 and ‘A1’ the index of answer 1. The second column contains the questions and answers; each question is followed in the next row by the actual answer given in the dialogue corpus.
The next 3 columns are the results from the 3 models: word2vec, GloVe and fastText respectively. The row containing the question gives each model's response, while the row containing the answer gives each model's score. This score is evaluated on the accuracy of the answer and how human the sentence sounds, with 3 being the best and 1 being the worst score.
Table 9.2.1 Questions and answers from trained models and answer provided by the Ubuntu Dialogue Corpus
Question/Answer Response
word2vec GloVe fastText
Q1 what green ?
samantharonson_@ unfortu_nate
unsigned_char HELEN_COONAN_Well
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure required provide least havent anyway yes post-
exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do think refer porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A1
usb to ps2 converter came with the mouse in
the box 1 2 3 S
Q2 yes
Stupid_stupid Circumstances_dictate Circumstances_dictate Circumstances_dictate Circumstances_dictate
reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if if if if check required return charters charters charters
policyIf policyIf policyIf post-exertional post-exertional
do do reistall reistall seex porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A2 i would try putting them
there 2 3 1 S
Q3 is it okay to just hand out
Ubuntu CDs?
Indexing_Options RUSH_Wait reallllly tried
could install anyway Indexing_Options
slackware astonishing_Siemionow
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there can that able need files use use fill either windows
charters post-exertional post-exertional post-exertional
think try use see MicroXP thing nt do
don troubleshooting use
multi-monitors AMplz intellipoint
porn.The R
A3
absolutely what is the output of: wget -O alsa-info.sh http://www.alsa-project.org/alsa-info.sh; chmod +x ./alsa-info.sh;
./alsa-info.sh the command makes a link if you select to upload to
the server ;) ok and what is the issue? 2 1 3 S
Q4
....yeah, thats where it showed me 36 updates
but it wont do them
RUSH_Okay Circumstances_dictate etc_fstab Stupid_stupid anyway want DBX_files anyway VIGELAND_Well
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there if install install open file not you check installed
amyone post-exertional post-exertional post-exertional
post-exertional
but BrowseSystem xwindows xshell .Xr
BrowseSystem directory.Click
bsdinstall porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
Page 65
A4
Find the one that says 'NVIDIA accelerated
graphics driver (version xxx) [Recommended]'. Click it then click the
'Activate' button. You can't install updates in a command line 'sudo apt-
get update' then 'sudo apt-get upgrade' what
they said, only you have to use sudo What error?
the command is 'sudo apt-get update' YOu've
got an Nvidia card, it should work fine.
YOu've got to install these updates Just copy this and paste it into a terminal: sudo apt-get update then this: sudo apt-get upgrade do you
have synaptic open? gimme a sec sudo pkill apt then sudo apt-get
upgrade 1 3 2 S
Q5 yup
RUSH_Okay dpa_cb Pfft 1Gig_DIMM
Circumstances_dictate CNA_fa
Circumstances_dictate inlets_ponds_creeks inlets_ponds_creeks
opera_Elaine_Padmore opera_Elaine_Padmore
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there file required required change login can check
required required return number charters post-exertional post-exertional
but .Xr reistall reistall reistall
reistall seex porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A5 s/thread/wik entry/ 2 3 1 S
Q6
how do i get the image on the usb stick or you
mean usb cdrom
Stupid_stupid tried jolie 1Gig_DIMM MediaSmart
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure double-check register post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
onesClick daysdetails wasnt porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A6
usb cdrom would be easy, usb flash didn't
work for me. https://help.ubuntu.com/community/Installation/FromUSBStick 1 3 2 S
Q7 you got the script run?
RUSH_Okay Stupid_stupid WHat
equaling_Vince_Lombardi wmf psd
Ali_Farokhmanesh_gutsy I'l back l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure setup directly post-exertional login if address post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
but brought install windoze reistall do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A7 ya, this is givin gme hell- sorry. thanks for trying.. 0 0 0 S
Q8 Didn't work on my pc.
Just doesn't start
yes get let allow reallllly could might l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if exceptional install that post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do showed modules.dep geting
diconnecting getting do try version.Click
porn.The porn.The porn.The porn.The porn.The porn.The R
A8
hmm that doesnt make sense well, if you can't get TVTime, my next
choice would be KDE... XawTV.. as noted by
someone above, works, but its ugly 2 1 3 S
Q9
-> in generality, fdisk /dev/sddX and follow
the menu -> i have already given you to command for using
debootstrap
GLENN_RENWICK Chariton_Newspapers
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there router codes policyIf give you anyone can post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do xubuntu porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A9
Yes, okay, thanks. Can you tell me the command for
debootstrap again, as I had a shower, and
mibbit made your text dissappear! 1 3 2 S
Q10
i visited the samba site, but I cant find anything
under downloads
Stupid_stupid rar nt reallllly vm use
PMB_Portable_software asus l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure run especially id know possible who required required
can charters post-exertional post-exertional post-exertional
post-exertional
thing thats dont AMtrying jaunt linuxes ut2k4
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A10
try sudo apt-get upgrade samba that is the same version as yours where
are you located? (to select the right
download server) 1 3 2 S
Q11
can I do that from here ... hmmmm hold on how can I do that from here ... I can access the other
files
Pfft drives_CDs_DVDs Interrogation_techniques
going ascii polish_turd going install Ehh l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure input that use post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do text-mode dvd-drive xwindows
pkgtool run make picClick but bash-script have porn.The porn.The porn.The porn.The
A11
you can do 'sudo cp /etc/network/interfaces /etc/network/interfaces.bak' and then do 'sudo
cp /media/OTHER_INSTALL/etc/network/interfaces
/etc/network/interfaces' 2 3 1 S
Q12
compiz will not let me drag to another
workspace this way work I found where its set:
System Settings > Window Behavior >
Advanced > Placement: Centered see I have no
system settings anywhere
Stupid_stupid redhat associate_Vyacheslav_Sokolenko JEFF_BURTON_Well
Defragment l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
couldnt information post-exertional login - system 12
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
umm installed reistall easylist
cpuburn porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A12 I said that was for KDE...
not gnome 1 2 3 S
Q13
uncomplicated firewall for the less ambitios.
'sudo ufw enable' should be done on all macines directly on the internet
alien translates between rpm and deb
know alien @_bbc.co.uk. guess BACHMANN_Well is
Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
know that email configure install needed that would
would login need post-exertional post-exertional
post-exertional post-exertional
think alien AMSuch doubletalk vimperator
Slashcode win2k3 stepgen rtos -gui
porn.The porn.The porn.The porn.The
porn.The R
A13
i know what alien is, he wrote 'rvm' which can be
lvm or rpm 3 1 2 S
Q14
once you resolve the name to the IP you can
start a TCP connection to the actual server, if your system ALREADY knows
the resolve it will not need the internet to resolve it and will be
faste your ISP will provide a DNS but if you
want to use different ones then thats you call
RUSH_Okay Men_Seeded_Winners
System_Configuration_Utility could Elena_Nola
usr_sbin Personality_wise l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if once work hav there concerned policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
stuffClick use use try run porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A14
Seveas so I'll just take that as a 'yes', I am using Google's DNS instead of my ISPs? :) I am planning on doing the local DNS cache as you suggested as well, but just looking to find out if the DNS actually uses Googles since I'm behind a router 1 2 3 S
Q15
well... lets just say that i get a big table, and its
aaaaall zeroes
Stupid_stupid try subservient_housewives
Stupid_stupid Chrome_#.#.###.###
1Gig_DIMM see anyway Dell_TrueMobile install
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there nt get again post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do think much probedisk try one
thing thing thing get ndisgtk mA.
xwindows xubuntu porn.The R
A15
I recommend you try putting that disk back
into the other computer :) 1 2 3 S
Q16
That's awesome - Anyhow... how would I
get 'x' to start via terminal?
Pfft would l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure would that post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do thats thing do thing porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A16 startx 1 3 2 S
Q17
geez i keep thinking i should start playing
WoW.. it's easy.. friends play it, lol.. but i'd drop too much time into it at
least FFXI was a challenge
Oh txt_file MediaSmart l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
ok needs he check can information you work example get either opportunity provide
provide post-exertional
yeah windoze xwindows one so
cant dont dont withstar stuffClick
nvidia do computer diconnecting
porn.The R
A17
WoW is a cash cow now. Leveling is easy, but fun. They do a decent job of
providing content for casual play and hardcore folks. I'm just too much
of an addictive personality to keep any
balance 2 3 1 S
Q18
nope !!! --- I am trying to remember my steps,
but , the main thing is to be able to modify the
fstab
Stupid_stupid going MediaSmart txt_file LTO4
Persistent_rumors l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there would required log reinstall policyIf installed post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do system see porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A18 im waiting to your
solution 1 3 2 S
Q19 Whatever you do don't
do that ;)
Oh Calverts_disappearance speaker_Koksal_Toptan
reallllly try might JON_ALPERT
Someway_somehow things
PETER_COSTELLO_Well going want KDE_desktops
l'_Affaire l'_Affaire
sure concerned need numbers well if unit installed policyIf
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thats installed bucks.Click four
comupter google more thats being but do try install install porn.The R
A19
lol ok, I'll be sure not to where do you
reccomend searching? 1 2 3 S
Q20
how to open the other system then. from
network in nautlius?
Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure falan computer work computer provide installed
operate logging post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
onesClick daysdetails trying
do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A20
you'd need to share the filesystems out via
NFS/Samba/SSHFS etc etc 1 3 2 S
Q21
no. when i brows for a connection on my
domain (localdm) i get this error: Browsing for service type _rfb._tcp in domain localdm failed:
Timeout reached
oh ACD_ACD_Systems wanta Which_brings
Cal_JJ_Arrington l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure well nt display to fill there can icono if can if interface ip
whether
do HWInfo bits.A porn.The porn.The
honest work Henyman work thing porn.The
porn.The porn.The porn.The porn.The R
A21
@wigren If this question is below your
level, please do not take offense, as I've found
that such questions can sometimes be worth
asking: You have enabled VNC on the desktop,
correct? @wigren I'm
gonna go refill my coffee. Feel free to send me a personal message,
so I don't miss it. 1 3 2 S
Q22 wierd is an
understatement :P
http yeah datamart hurdled_tackler
Matthieu_Humery_head line lol still ought
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
http if well documents post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
http but do go thing electical drive
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A22 outrageously wierd?.. 1 2 3 S
Q23
sudo grep -r -i HTTP /etc/apt/* any
response?
Stupid_stupid charset_= CNA_fa l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure 3 entries post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do 2 objectshidden4fun porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A23 yeah. the source.list and
2 entries in apt.conf 1 3 2 S
Q24
anwy way to get it to display automatically
what do i need to add to make it display the
options of safemode, memtest, and windows?
Oh Stupid_stupid Following
inlets_ponds_creeks l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there sure back next registrations if post-exertional cvs present charters vista able
specific required post-exertional
think nt work problems work
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A24
i'll post you a default menu.lst to pastebin so
you can just look for yourselfe what you
need. 1 2 2 S
Q25 reboot and let me know
Oh http_ftp Tee_hee really think Ovidiu_Rom
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure possible use able anyway did upload one time
registration could post-exertional post-exertional
post-exertional post-exertional
umm daysdetails do Well and thing work
work one most porn.The porn.The porn.The porn.The
porn.The R
A25
? Ok. Login ok. nvidia installed. Back to
original issue from 5 hours ago! Turning on
Desktop Cube/Rotate in Hardy resets Effects from 'Extra' to 'None' give up
yet? (FYI - nvidia installed via Envy) It all
work is Fiesty and Gutsy You crack me up!
Thanks. Resolved. It was working before you had
me reinstall xserver :) but I appreciate the
attemp 1 2 3 S
Q26
No matter what the operating system is, you HAVE to find a program to backup Or it comes with the distribution, ubuntu in fact does come with everything you need to back things up If you do not like that approach, you can search for a solution more to your liking just as you would on any other systeme
oh os bullsh_* look l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure there laptop if check you would wont not that that nt possible post-exertional post-exertional
do debians withstar do daysdetails see oh mean thing porn.The porn.The porn.The porn.The porn.The porn.The
A26
No they shouldn't, there's no law in the
government to say 'NO OPERATING SYSTEM
WILL HAVE A BACKUP UTILITY' 1 3 2 S
Q27 Yep Should I undo that
or change it or anything?
Stupid_stupid Stupid_stupid might
Dominic_Rhodes_plow webmail_interface ftp calvin redhat see come
want l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if qredit card executables check would there that policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do porn.The porn.The porn.The porn.The comupter porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A27
can you paste the results of 'ifconfig -a' to
pastes.ubuntu.com, and send me the link? i
wouldn't change it right now 1 2 3 S
Q28 ActionParsnip i dont care
how i fix it lol
Stupid_stupid samantharonson_@ do
RUSH_Okay insisted l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure can work delete would server need check else post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thing PMhay navigational use
OS.Click thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A28
slighly vague on that cause i haven't had to since probably back
when i had teh mx 440 ActionParsnip has a point about nvidia-settings 1 3 2 S
Q29 yeah, i figured that... :)
yeah docx_file Contrasting_textures downloaden anyway
going butched i_dunno think ought see l'_Affaire
l'_Affaire l'_Affaire l'_Affaire
there tll post-exertional post-exertional not 12 config output
please able required (250) post-exertional post-exertional
post-exertional
thats with windos glibc replyGetting Vrythramax would <pre facility.Read
porn.The porn.The porn.The porn.The porn.The porn.The R
A29 btw, whats the shortcut
to the terminal? 2 1 3 S
Q30
ok, i think these are connected, except the video driver case maybe, but it is also possible that that is the source...
Stupid_stupid 1Gig_DIMM AP_HOCKEY_NEWS l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure know needed amyone sumtink information login there can required can using that post-exertional post-exertional porn.The porn.The porn.The porn.The porn.The
do think put send the porn.The porn.The porn.The porn.The porn.The R
A30
do the freezes occur when certain apps are
running? have you tried to xkill them? 1 3 2 S
Q31
, yes merodent , rahul_ , same point, grub> (cant
use quit/exit)
Stupid_stupid go reinstalling_Windows
Sooooo opera_Elaine_Padmore
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there updated log code post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
jobClick page1.htm server.domain.com
tpye porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A31 quick ban razer freespire
yuck 1 2 3 S
Q32
but I dont quite understand how the raid
could be the problem here?
Pfft supose try horsepower_Hemi_engine
contactmusic_quoted msn_messenger
Oh_c'mon os findout Chrome_#.#.###.### Dear_Bossy l'_Affaire
l'_Affaire l'_Affaire l'_Affaire
if check post-exertional amyone get xp can that delete
one does if either required post-exertional
do sudo pkgtool can onWikianswers
Quotecan linuxes koneksi idle-timeout run
porn.The porn.The porn.The porn.The
porn.The R
A32
because grub cant detect the raid-array if it is not activated with dmraid
(what the alternate-cd is doing) maybe give it a try? if it is not working
you can use usb-flash or buy a ide-drive good luck 2 3 1 S
Q33 same reliance net
connect
@_kiranchetrycnn_@ 5.x. is l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there required file can file policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
ok terminal winmodems
connection PMfeel ping porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A33 ah ok 1 2 3 S
Q34
you cant have got it more right, thats exactly how it is :) if you check
disk usage now - you should see a lot more freespace for /home :)
RUSH_Okay vm AdContext really see decided might could
might might might get anyway l'_Affaire
l'_Affaire
sure vista happen that default to install need would that
there ubuntu either applications post-exertional
but getting waay mHotspot gnu-linux porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A34
that's a thing of beauty then , coz i can still just
as easily look at my music 'close by' without
having to navigate to /media/multimedia/music everytime indeed this
is the acid test 1 2 3 S
Q35
g-ma meant to be a gud modeller, but also no
renderer afaik
anyway wanted let ought presumptious andrea
SADY happeneing anyway l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
give try even specified specific that possible specific example one program post-exertional
post-exertional post-exertional post-exertional
give try anything thats thing thing
cperl-mode stuffClick thing
things thing notesLoading
pkgtool PMmight porn.The R
A35
give kpovmodeler a try, perhaps the povray manual has quite an
amusing example of a raytracer written in
povray's SDL ;) 2 3 1 S
Q36 o okay i just go
permission denied
Stupid_stupid courtney_cox
spokesman_Eitan_Bencuya RUSH_Okay could
www.geospaceinc.com expected
Fox_affiliate_WUPW l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there required policyIf post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
daysdetails porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A36
sudo wpa_passphrase mywireless_ssid 'secretpassphrase' > /etc/wpa_supplicant.conf 1 3 2 S
Q37 : then?
anyway see going Laughing.
Circumstances_dictate reallllly MediaSmart
reallllly Which_brings reallllly
Dominic_Rhodes_plow l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if as sawed shown files hang post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
think loadlin thing cfset use
kpowersave dpkg-divert install do system-config-network timesync connection little
porn.The porn.The R
A37 seperate hard drive.. 1 3 2 S
Q38
'Response: 500 OOPS: vsftpd: cannot locate
user specified in 'guest_username':$USER
'
RUSH_Okay associate_Vyacheslav_Sokolenko really really n'est_pas l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if one there server so error away fixed nt nid returned
return specified policyIf post-exertional
think daysdetails thing fc18 install
line--a cued porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A38
well, it seems that you have to specify an user...
like 'virtual' or something maybe it will
help you http://archives.neohapsis.com/archives/openbsd/2005-12/0755.html this one surelly will help you, but its quite the same i
told you :) http://howto.gumph.org/content/setup-virtual-users-and-directories-in-vsftpd/ you will need to enable the local users, i
guess 1 3 2 S
Q39 bash: /dev/dsp: Device
or resource busy
soooooooo SSPX_bishops l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure dirs file information ip given be there there required
file can L8S post-exertional post-exertional
thats porn.The utili- porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A39 lol 2 1 3 S
Q40 iso 10.10 desktop
want using anyway l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
not 64bit command file up create display to possible
required post-exertional post-exertional post-exertional
post-exertional post-exertional
Puppylinux sorta withstar tv.Click
puter. killdisk porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A40
32bit? 32bit? and what program are you using to
burn? 2 3 1 S
Q41 that is odd then
Pfft msn_messenger Circumstances_dictate
TM_Distribution_Reinvestment ls intefere WHat
stuff l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure that use able post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do isant but do install porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A41
i know i can't figure it out lol it's been driving
me a little crazy 2 3 1 S
Q42 you have 2 entries for
that repo?
RUSH_Okay like refreshing GIF_BMP
setup.exe VKernel_virtual ought l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure computer when mount computer nor if need get
elsewhere post-exertional post-exertional post-exertional post-exertional post-exertional
but stuff think thing put .msstyles
objectshidden4fun but windows.Click porn.The drivers.A porn.The porn.The porn.The porn.The R
A42 no 1 3 2 S
Q43 Is there a master list of
all PPA's?
Indexing_Options RUSH_Okay
reinstall_Windows_XP Tweak_UI see l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there there there changed nt installed throughout place
whether others login login nt one post-exertional
think mesa-utils install.Click mac-like dont thing put sub-CPU but facility.Read
porn.The porn.The porn.The porn.The
porn.The R
A43
I think there is a search mechanism, perhaps a directory, let me check 1 2 3 S
Q44
i wanna trasfer the repos to my pc.. coz my friends
done have direct internet connection..
Stupid_stupid vm dont do don'tknow aregoing
another l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure if check login everyone there amyone post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
thing do more http Rapidleech
porn.The cpuburn porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A44 so you don't need to login via ssh for that 1 3 2 S
Q45
:) but the most strange.. i needed to copy the boot of pendrive to / of my
usb hd. hahaha it means the pendrive is not necessary.. excet
because i will not need to install de the grub in the mbr when i start my
ubuntu, my screen freezy on 'configuring Network interface' i
need to press control alt del to skip it, and under
it i see the message about rcS and rc 6. i
want my linux skip this step otherwise i will
always need to press ctrl alt del, do u know how to fix it ? or did u hear about it before ? oh..
just rcS failed to call or something.. but
ok i will restart and i will tell it to u just wait i got it nothing there, when i press ctrl alt del.. just show rcs killed by
sign terminal.. and rc6 .. the same message well
the time is default 120 and i have a dhcp
connection. i think it could be my wireless
even i wait .. this is step
never finish or skip. i really need to press ctrl alt del so, what can i do ? not sure, its a vaio notebook ah one is modem other ethernet.. yes the wireless is other really ? cant i skip this step on init.d ? where do u mean exctally? system admin network or network tools ? wireless has a - wired has a v and modem - if i disable the wired.. i will not have internet wireless connection is roaming mode enable. is it a problem ? please could u look to it ? http://img120.imageshack.us/my.php?image=ikozu1.png oh shit im stupid sorry.. just feeling stupid but need i disable that enable roaing mode ? ok.. lets see what happens thank u very much. u are genius :** :) i was just reading and im tired already, i need some coffee.. u need to take a break :) notebook can use agp video too ? or in the case mine is a nvidia it must be pci ?
Pfft tried_unsuccessfully want DVRS www.geospaceinc.com bring courtney_cox i_dunno want see teeny_tiny_lapse l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if interface number post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
think ummm Postsuser got install work porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A45 1 3 2 S
Q46 toshiba satellite a100-
797
think really sudo sudo search see sudo
www.ci.ankeny.ia.us O'REILLY_OK l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
would work sudo sudo hidd -- search not sudo hidd where
information know post-exertional post-exertional
thing thing sudo sudo hidd -- search
mean sudo usbmount -- search
sorry porn.The porn.The R
A46
the only thing that works for me is when I do 'sudo
sudo hidd --search I mean (sudo hidd --
search) sorry 1 3 2 S
Q47 that's the kernel stuff
Pfft spokesman_Hugo_Erikssen 7_Upgrade_Advisor escapist_fiction capping
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure would make concerned continue either post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do thats provide snap.do installed
near feel.Click porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A47
true /etc/initramfs-tools/scripts/init-bottom
is empty :-( 1 2 3 S
Q48
same command? still same effect. waits like 1.5 seconds and then
new command line pops up
@_kiranchetrycnn_@ www.janus.com run
Continental.com really l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there use did click access post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
ok ask ping.exe thing dofile disable most backup.tar.gz
file porn.The porn.The porn.The porn.The porn.The
porn.The R
A48 try it with the -v switch 1 3 2 S
Q49 in the same file? No luck
with adding that line
imma going said l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure windows work indicate them required either check
need disable parameters post-exertional post-exertional
post-exertional post-exertional
umm operated question umm
PreservedFish do icon do but
operated regard thing porn.The
porn.The porn.The R
A49
Did you restart the pam and sshd processes before trying login? 2 3 1 S
Q50
shredder only wipe 'files', not free space i
think
Oh CNA_fa Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
ok as possible ask nt anyway there able setup anytime
either change post-exertional post-exertional post-exertional
ok installing install work bucks.Click
install opsys install porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A50 ok... well tehre probably are other programs too .. 1 2 3 S
Q51 the buggyness of the distro is fully on topic
anyway try Vyke_Pro try AMERICAblog_Joe_Sudbay Beckloff_exchanged_pleasantries txt_file get might
go l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there required changes which there whether we can alot
following post-exertional post-exertional post-exertional
post-exertional post-exertional
yes removepkg reistall looking linux
grphics Gigabit-Ethernet base-
installer installed being porn.The
porn.The porn.The porn.The porn.The R
A51
No, this is a channel for support, not discussions
about Ubuntu 1 3 2 S
Q52 it doesnt state the time...
Stupid_stupid http://www.cbrands.com
upp install ktre.com go l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if would connections post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
but onesClick porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A52 1 3 2 S
Q53 BOOOOOOOYAAAAAAA
HHHHHHHH!
RUSH_Okay Circumstances_dictate Circumstances_dictate Circumstances_dictate Circumstances_dictate Dominic_Rhodes_plow Dominic_Rhodes_plow
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
login installed count computers computer check additional file log able post-exertional post-exertional
post-exertional post-exertional post-exertional
thats onWikianswers
drives.Click components.Click
battery.Click comptuers porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A53 You scared me. Get soemthing to work? 1 3 2 S
Q54
can someone help me print from a wireless printer using 9.04?
Pfft don'tI'll Kimi_RÄIKKÖNEN_Yeah Um_um Sooooo COACH_FISHER get automatic_debits By_Azenith_Smith
RUSH_Okay Pajama_Jeans Earn_Rs.####/day_working l'_Affaire l'_Affaire l'_Affaire
sure amyone up work that reset error corresponding specified specified there
apache timestamps there check
do but use Several got get get but
probably one try install do put
porn.The R
A54
is the wireless printer already set up?? Does it have an ip and can you
ping it? 1 2 3 S
Q55
Bug report time on a Breezy>Dapper update! I
did 3 'sudo apt-get -f install's to get things
going, once from an 'x-less' environment,
because the installation screwed up 'x'. In the
process: 1) My sound no longer works. (I have an nVidia card) 2) I had to
re-install a few programs that I had running before, they just
disappeared. To put it mildly... whoever said
that Dapper Drake was ready to release was not
looking at life the way that I l it.
soooooooo burke gma l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
ok if past installed tutorials use need need install if sure post-exertional post-exertional post-exertional post-exertional post-exertional
ok nt isant thing nt use use do think
porn.The porn.The porn.The porn.The porn.The porn.The R
A55
it might be better off the CD. I did an update to Breezy that way and X broke but otherwise it
wasn't too bad. Dapper broke _everything_ but
that was beta2 1 3 2 S
Q56 oh god. my frickin ex gf
called.
Oh Tru##_Unix Circumstances_dictate
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure network file twice checked releaseDoors post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do network work porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A56 kick her to the curb 3 1 2 S
Q57
Hello folks, please help me a bit with the
following sentence: 'Order here your
personal photos or videos.' - I think the only allowed version is 'Order your personal videos or photos here.', but I'm
not sure, are you? Did I choose a bad channel? I ask because you seem to
be dumb like windows user
shutup problem Easynews l'_Affaire l'_Affaire try gussy might l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure error but post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do problem thats files. porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A57
the second sentence is better english and we
are not dumb 1 3 2 S
Q58
Evening All Is this the best place for new
converts from Windoze to Linux (Ubuntu 8.10)?
yeah +_kerry_kerry Saintula um_ah l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there anytime software reinstall re-install re-install uninstalled anything there know service wont if post-exertional post-exertional
thats isant see Contributor82 stuff
thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A58 yes 3 1 2 S
Q59
so i won't be able to use perl rename to do what I
want?
Oh ughh tar.gz tar.gz anyway reallllly os_x Stupid_stupid would
would reallllly l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there required post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do use operated porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A59 1 3 2 S
Q60
how to transfer files (250 GB) from my pc and my
new laptop? how to transfer files (250 GB) from my pc to my new
laptop?
Stupid_stupid TRUMP_Well vm imma
want docx_format really courtney_cox
news.bbc.co.uk l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure falan register post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
onesClick daysdetails IO-APIC threshold fast-boot porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A60
not by getting kicked from the channel, I
suppose. 1 3 2 S
Q61
where i can i find docs on how to upgrade
ubuntu from warty to hoary?
try AEGIS_LTD Debian_Sarge might could
attempting l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if see that anyone either post-exertional membership post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do AWizzArd say deskstop software
puter. things computer Thanks
put three porn.The R
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
post-exertional post-exertional porn.The porn.The porn.The
A61
sudo gedit /etc/apt/sources.list and change all instances of warty to hoary; sudo
apt-get update and then sudo apt-get dist-
upgrade should do it 1 3 2 S
Q62 !pinning | zanberdo
!guidelines | cryptopsy
computor nt spacewalking_tourist get
get Necth_described l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
login our can rego possible provide your 21 access ie etc install post-exertional post-exertional post-exertional
thats MicroXP probedisk win2K isapnp porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A62
/topic works too - I find it more conveniant than copy/pasting. What say
you? 1 3 2 S
Q63
Can anyone recommend an NFS alternative that works transparently?
Pfft try Erland_E._Kailbourne
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure post-exertional additional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do graphcs run porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A63 once you get NFS set up,
it IS transparent 1 2 3 S
Q64 PLEASE WOMENS
Pfft reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure check what post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thats qbittorrent so-called porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A64 /join #ubuntu-es 1 2 3 S
Q65
Question guys, I know it's probably in the
manual some where but I'm too lazy to look, where abouts is the
setting to make the icons on the desktop smaller?
yeah later Bryan_Hahnfeldt
sober_Boole_added come outrunning_defenders
maybe logoff there'sa_silver_lining line
RUSH_Okay nifty_wraparound nifty_wraparound l'_Affaire l'_Affaire
sure example provide use need installed not provide disk need
if post-exertional post-exertional post-exertional
post-exertional
but rstrui.exe stuff windoze MAME32
perl-base nmap configs. -tf apt-pinning apt-pinning porn.The porn.The porn.The porn.The R
A65 can't you alt + middle click and resize them? 1 2 3 S
Q66 hello everybody i've a
simple problem
shutup Er_um http_ftp courtney_cox Pfft try
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure able there log post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do Kanasz winmx and porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A66 which is? 1 3 2 S
Q67
Hi all, I want to install the guest extensions of
virtualbox in ubuntu 10.04 but they dont
work
Pfft move ftp docx_file let wayyy JACKIE_CAMERON want l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if set there fill either post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do one picClick ALL-IN-WONDER should downloud MicroXP
get install bucks.Click ubuntu component booting re-install porn.The R
A67
: Please join #ubuntu+1 for Lucid/10.4
support/discussion. 1 2 3 S
Q68
please help me i movie player display not fine where i go for better
performance
Pfft jolly_japes l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure kde abiword post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thats kpowersave bt5 PMOhhhhhh
daysdetails -l. umm fuck.Click apt-get
seex porn.The porn.The porn.The porn.The porn.The R
A68 what GPU do you have Graphis Processing Unit 2 3 1 S
Q69
huh, PPA sounds sweet! who all is getting in the
beta?
Sooooo Lorem_ipsum want think want think noscript really another
Sooooo might Auslogics_Disk_Defrag
might Circumstances_dictate
l'_Affaire
sure restart install needed that able anyway use post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
heheClick being use tvout oh do
porn.The dont porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A69
it would be really awesome if it
automatically generated packages for all
distributions based on the source code 2 3 1 S
Q70
Hey Pidgin won't connect. I type my email and password but it says it's incorrect. but when i try to sign in though the website it works fine! :s
Er_um use get try exe_files gigapixel_images
use l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if immediately package work should post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
thing thing kpowersave linuxes
uses porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A70
i am a user too, but i think you also may know
there is also a #pidgin channel 3 2 1 S
Q71 That doesn't work for me
Pfft gma l'_Affaire Ovidiu_Rom l'_Affaire
l'_Affaire l'_Affaire
sure there that check install well that that policyIf post-exertional post-exertional
do thats put fix porn.The porn.The porn.The porn.The R
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
post-exertional post-exertional post-exertional post-exertional
porn.The porn.The porn.The porn.The porn.The porn.The
porn.The
A71 1 2 3 S
Q72
do i need to defrag my harddisk first on
windows first before i install ubuntu?
Sooooo really fucking_retarded think
iFuntastic gcc eh supose Wash_rinse zac
Stupid_stupid ought gma norman l'_Affaire
there able novatel policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do comouter think think thing laptop.My
xwindows most classpnp.sys nt
core2quad objectshidden4fun
thing problems.More
porn.The R
A72
nah its fine. but if you want to, go on you dont need to defrag. but do you mean partition? 3 2 2 S
Q73
i know this question will send red-flags a mile
high, but i can't seem to get it working on my own. I'm trying to get
aircrack-ng to work, and when i follow the
howto's, it keeps telling we that it can't put my
atheros card can't be put in monitor mode. I've
personally put this card in master mode to use as
a WAP, so i know it SHOUDL be able to do
this also.. is there something ubunt-specific that is goofing me here?
Stupid_stupid TERRY_FRANCONA_Well
@_kiranchetrycnn_@ maybe l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure if working will there upgrade provide that possible
that there regarding count post-exertional post-exertional
thing do thing 233mhz drbdadm but can probedisk Seems operate put
sometimes.A webbrowsing
graphical porn.The R
A73 ndiswrapper? 1 3 2 S
Q74
It's easy if install first windows 7 and after it
ubuntu. So as last is ubuntu it find all other
operating files to startup.
Stupid_stupid want would docx_file Kinda_sucks
gma l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if but there able going post-exertional install post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
but winmx porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A74
yes windows lost the bootloader or whatever it is but using liveCD it
can be re-enabled 1 3 2 S
Q75
what time will jaunty-ubuntu mostly be
released
samantharonson_@ reinstall_Windows_XP upgrade wayyy let gma
works Australian_sheepskin_boots l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure able provide changes that functions possible well check
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do say think thing thing thing think instlled use thing
but classpnp.sys put objectshidden4fun
porn.The R
l'_Affaire
A75 ^^^^^ 1 3 2 S
Q76
sorry subcool whats a light DM without compiz that will still have game
capability?
Stupid_stupid going Sooooo docx_format
Digital_Negative_DNG like nmap wmf
detrimentally_affects RUSH_Okay l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if before linux provide Linux hr post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do quite pyNeighborhood
thing pkgfile operate AMttt
besides porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A76 XFCE is also pretty
lightweight 1 3 2 S
Q77
hi i would like to upgrade to 11.04 from 10.04 but
update-manager -d propose me only 10.10
should i change my preference to testing or
something like that ?
Pfft hollyrpeete_@ think charset_= Inbox_folder let want l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
if get well you that need change allow user generate provide provide able post-exertional post-exertional
do but other Powertop provide
try use comments.Click use AMPlz xshell make
AMttt make porn.The R
A77
Also, natty/11.04 support/discussion is
only in #ubuntu+1 1 2 3 S
Q78 hey, I just changed the
cd drive AND it worked!
Er_um ftp regional_energo
monetarism_adopted think hahah
SCOTT_THISTLE_covers etc_fstab
Circumstances_dictate gma MammoView_® aingersoll@gfherald.com
get ought l'_Affaire
if would post-exertional post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thing daysdetails use to umm
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A78 hehe 1 2 3 S
Q79
Help! If I type in 'sudo depmod -a' nothing
happends. It asks me for the sudo password, I
enter it, and then I just get the regular
command prompt. Should I not get some
data from that command?
Alright_alright associate_Vyacheslav_Sokolenko l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure 06794 post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
ltsp5 fugured gonaads -----signature-----
一般討論
freespire install onesClick porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A79 no 1 2 3 S
Q80
its my dads, ive never used it before. IT works for him so i think its all
set up
làwan't make Ovidiu_Rom l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire
Ovidiu_Rom l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure well portable either go there that 06095 post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
AMttt use install sets.Click thing
onesClick ntfs get of é Â數
minutesAdd thats velineon
circuit.Click porn.The R
A80 1 3 2 S
Q81
i'm trying to tftp to a router. it asks me to
specify a port....which port do i use?
Stupid_stupid sanctuary_DellaBianca
About_generica_viagra_viagra Golly_gee
RUSH_Okay think l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
sure either be able help there you that sure sudo return install option grep post-exertional
thing porn.The more more
troublshooting being xubuntu
multimedia bitTorrent network computer wireless
ping traps.The porn.The R
A81 tftp:69 ftp:21 1 2 3 S
Q82 its any soft to free ram?
làYeah_uh_huh get wayyy smug_git want
site_http://www.lockheedmartin.com files Ehh l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure reinstall file click be application card required
information able vista required nt that post-exertional
AMttt install porn.The zoodad withstar windowz
linpus put objectshidden4fun
do porn.The porn.The porn.The porn.The porn.The R
A82
free ram i wasted ram managin ram is the
kernels job 1 3 2 S
Q83 thanks :)
Er_um Circumstances_dictate
reallllly Dominic_Rhodes_plow Dominic_Rhodes_plow opera_Elaine_Padmore opera_Elaine_Padmore
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure config be possible required code post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do thing Edonkey dpkg hexchat use
happens.Can porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A83 1 3 2 S
Q84
i need to send a mail when my system starts. but since init/d scripting is beyond me i want to
know if there is a prgram where you can put in yout mail, smtp server and have that program
startup with the complete system
Stupid_stupid 1Gig_DIMM try might WRT##G MediaBlvd_Did Pfft
anyway happiest_camper Appreciable_downside_risks docx_file like courtney_cox l'_Affaire
l'_Affaire
sure error amyone post-exertional post-exertional
post-exertional post-exertional install install additional work install post-exertional post-exertional post-exertional
thing reistall norecoil DoomFrost
install xwindows porn.The
driver.Click porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A84 1 3 2 S
Q85 thanks
Er_um Circumstances_dictate
reallllly Dominic_Rhodes_plow Dominic_Rhodes_plow opera_Elaine_Padmore opera_Elaine_Padmore
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure if check required check check log required charters
post-exertional post-exertional charters charters policyIf post-exertional
do reistall reistall seex seex porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A85 1 3 2 S
Q86
how to clean remove wine and app that
installs.... i try remove with 'apt-get purge wine' but my apps that install on wine still there... any
help please :).
Stupid_stupid f_cked ctrl_alt l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure falan password access login once access before
before should configure specify install post-exertional post-exertional
onesClick daysdetails Hdds install iso-image
remotley porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A86
they are usually stored in $HOME/.wine so the
package-manager hasn't anything to do with it 1 3 2 S
Q87 how can I install mercury messanger on ubuntu?
Stupid_stupid want dn't Spencer_Troian_Bellisario QuickTime_#.#.#_update
want Raikhan_Daffa l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure specify policyIf post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
onesClick operated Oviously porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A87
you need to follow the directions: it's tricky, but
you can if i remember right you need to do some sneaky stuff to enable the sys tray 1 3 2 S
Q88
Ubutntu 9.04 beta: compiz: missing
transparency effeccts on all windows. everything
else in compiz works great. any ideas? intel
graphics
try might l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there either get executable you using asap icq charters post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
but AMReport Add-ADCentralAccessPolicyMember Install TheBigDawg non-gui apt-get sites.Use CARRYOVER ActSection
Subdivision So-called Subdivision
Subdivision porn.The R
A88 #ubuntu+1 3 2 1 S
Q89
o man sorry i have to much in my head forget
some time's the rules for all channels i'm in.
Stupid_stupid think tummy_pooch try anyway
wmf ##.#.#_Leopard loooooove cheapy this
would =_strlen system##_folder
etc_fstab l'_Affaire
there if linux return there work using post-exertional post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional
post-exertional
daysdetails linpus Thats say onesClick
Quotecan do RemotelyAnywhere
use use system -r1.11 problems
harddrive porn.The R
A89 1 3 2 S
Q90
Is something going on? All of my mini.iso installs are failing on the linux-generic package. They were working before,
but it's been a few weeks since I needed to install
Ubuntu.
Indexing_Options going Wally_Pipped
Ovidiu_Rom l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there need there ms post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
think get poweriso use use thing thing
run porn.The porn.The porn.The porn.The porn.The porn.The porn.The R
A90
the linux-generic various meta packages have
been updated, but the actual packages aren't up yet (or they weren't
earlier) 1 2 3 S
Q91
http://pastebin.com/d74f8dc93 i guese thats what you want no ?
Stupid_stupid want want try would persuade
Penny_Musgraves wayyy l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
there outside when folder shortly login one that access
login corp post-exertional post-exertional post-exertional
post-exertional
do porn.The lol files equipt. porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A91 lspci | grep vga 1 3 2 S
Q92 ah, ok.
Oh Circumstances_dictate reallllly l'_Affaire l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure well stored post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thats use thing somewhat kinda do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A92 1 3 2 S
Q93 no the text based
oh document try sheâ easily_hackable
eeny_meeny_miny_moe would
Kimi_RÄIKKÖNEN_Yeah system##_folder friend_Dion_Mial would
filesystem come l'_Affaire l'_Affaire
sure work try post-exertional policyIf policyIf register
whether address following possible login ptr that post-exertional
do reistall tftpboot say software called servers.A but ran
porn.The porn.The porn.The porn.The porn.The porn.The R
A93
ok , i have intel onboard vga, with hdmi output,
how do i use it? 1 3 2 S
Q94
is there any easy way to update the refresh rate to 72 instead of 60 or
something, ubuntu isn't auto-detecting it
Indexing_Options RUSH_Okay might will
id_=_#####,## eeny_meeny_miny_moe
ought want do BSODs anyway think l'_Affaire
l'_Affaire l'_Affaire
there there see check ask specific specified
corresponding sudo hostname not configure not possible
post-exertional
think mesa-utils reistall porn.The
porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A94 LinuxJones check
XF86Config-4 1 3 2 S
Q95
look at the download page without live
desktop and the gui grub.cfg
anyway reallllly RUSH_Okay l'_Affaire
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
if not same post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do run say say to script.Click textpad locate costs.Learn porn.The porn.The porn.The porn.The porn.The porn.The R
A95 2 3 1 S
Q96
Is it possible to disable the graphics card in my AGP slot and run from my onboard graphics without removing the card?
Indexing_Options RUSH_Wait reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
there can well he ubuntu either example access can provide using install required work post-exertional
think try use but make nt drivers porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The
A96
Probably. Look at the BIOS for your computer,
not here. (!) 1 2 3 S
Q97
Hello Does Ubuntu have somekind of register to configure applications and os settings? Hello
Does Ubuntu have somekind of register to configure applications
and os settings?
shutup reinstall @_kiranchetrycnn_@
skeeved Circumstances_dictate
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure disk delete filename go post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
do problem ummm go systems.One
stuff thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The
porn.The R
A97
Linux doesn't have an equivalent to the Windows registry 1 2 3 S
Q98 !schedule october?
RUSH_Okay seemed might mount A.It_s really
l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
l'_Affaire
login available there before mil register 64 required update
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
thats stuff debian PMup apt-get apt-get devx.sfs have
scsi porn.The porn.The porn.The porn.The porn.The
porn.The R
A98 right, october. 6.10 (2006, 10th month) 3 2 1 S
Q99
any OS is better than windows All microshit employees should be
stoned to death
hollyrpeete_@ Pixelated Homepages
aunt_Philomena_McCann Labor_frontbencher_Tanya_Plibersek daughter_Tomasita
supose Rightfully O'Reirdan Bert_Brantley
Kazinform_cites_RIA_Novosti
Opposition_Leader_Mario_Dumont could l'_Affaire
l'_Affaire
sure install lspci check computer because failure there there sudo fix post-exertional post-exertional
post-exertional post-exertional
yes being go fugured errorClick crapClick ( reistall
questionsClick onesClick
AMSOOOO thing xcircuit cable-connected porn.The R
A99 please stop 1 2 3 S
Q100
oh, ok. It seems a project that dates long time ago. Never heard about. Thx
Oh tar.gz MediaSmart l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire
sure get able L8S post-exertional post-exertional
post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional
post-exertional
do but windows get do have bucks.Click make thing ie use
porn.The porn.The porn.The porn.The R
A100 1 3 2 S
Total Score 129 250 215
9.3 Activation Analysis
We analyse the reverse cumulative distribution function (CDF) and the variance of the activations of the 3 models when the sentend vector is fed into the LSTM network. The peak occurred at approximately epoch 260 for word2vec and epoch 150 for GloVe and fastText.
9.3.1 Reverse-CDF
We plotted the reverse CDF for the 3 models, for each gate in each layer, at intervals of 100 epochs. Similar results were found in all 3 models.
From the graphs, apart from the 0th epoch, the curves are close to flat across the interior of the activation range, 0 to 1 for the sigmoid gates and -1 to 1 for the tanh gate. The probability that an activation lies close to one of the extremes of its range (0 or 1 for the sigmoid gates, -1 or 1 for the tanh gate) is almost 1; that is, almost all the activations are saturated at the boundaries of their range.
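The reverse CDF (survival function) used in these plots can be computed directly from a sample of activation values. The sketch below is a minimal illustration, not the code used in this study; the `activations` array is a hypothetical stand-in for the values read out of one gate in one layer, built so that it is saturated in the way described above.

```python
import numpy as np

def reverse_cdf(activations, grid):
    """Empirical reverse CDF (survival function): P(activation > x) for each x in grid."""
    a = np.asarray(activations)
    return np.array([(a > x).mean() for x in grid])

# Hypothetical saturated sigmoid-gate activations: values pile up near 0 and 1.
rng = np.random.default_rng(0)
activations = np.concatenate([rng.uniform(0.0, 0.05, 500),
                              rng.uniform(0.95, 1.0, 500)])

grid = np.linspace(0.0, 1.0, 11)
rcdf = reverse_cdf(activations, grid)
# The curve starts near 1, is almost flat at 0.5 across the interior of the
# range, and drops to 0 at the upper extreme -- the shape seen in the figures.
```

Plotting `rcdf` against `grid` at successive training epochs reproduces the kind of curves shown in the reverse-CDF figures.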
word2vec
Figure 9.3.1.1 Reverse CDF plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.2 Reverse CDF plot for the word2vec activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.3 Reverse CDF plot for the word2vec activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.4 Reverse CDF plot for the word2vec activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
GloVe
Figure 9.3.1.5 Reverse CDF plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.6 Reverse CDF plot for the GloVe activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.7 Reverse CDF plot for the GloVe activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.8 Reverse CDF plot for the GloVe activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
fastText
Figure 9.3.1.9 Reverse CDF plot for the fastText activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.10 Reverse CDF plot for the fastText activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.11 Reverse CDF plot for the fastText activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
Figure 9.3.1.12 Reverse CDF plot for the fastText activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epochs at 100 epoch intervals.
9.3.2 Activation Variance
For each model, we plot the variance of the neuron activations against the training epoch, for every gate and layer. The activations are calculated by feeding the sentend vector into the activation equation.
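For reference, the gate activations referred to here are the standard LSTM equations, with $x_t$ the input vector (here the sentend embedding), $h_{t-1}$ the previous hidden state, and $W$, $U$, $b$ learned parameters. The mapping of the "first" to "fourth" gates onto the input, forget, cell-candidate and output gates below follows the common (e.g. Keras) ordering and is an assumption:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(first gate, sigmoid)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(second gate, sigmoid)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(third gate, tanh)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(fourth gate, sigmoid)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
```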
If a phase transition were to occur, a maximum variance should occur approximately at the “critical
epoch” of 260 for word2vec and the “critical epoch” of 150 for GloVe. From figure 9.3.2.1 and figure
9.3.2.6, LSTM 2 Gate 1 for both word2vec and GloVe model showed signs of phase transition as the
maximum point is happening around the “critical epoch” region.
We changed the vector fed into the activation equation to the word ‘java_14’. From figure 9.3.2.2 and
9.3.2.7, the maximum variance was not occurring near the “critical epoch” region. Hence, this suggests
that having the maximum variance at the critical epoch was coincidental and there is no indication of a
phase transition
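The variance computation just described can be sketched with stand-in weights (the real kernel, recurrent kernel, and bias are sliced out of the saved checkpoints; the shapes and the all-ones sentend vector match the thesis setup):

```python
import numpy as np

rng = np.random.default_rng(0)
units = 300

# stand-in gate parameters; in the thesis these come from the
# checkpointed LSTM weight matrices for one gate of one layer
kernel = rng.normal(scale=0.05, size=(units, units))
recurrent = rng.normal(scale=0.05, size=(units, units))
bias = rng.normal(scale=0.05, size=units)

# sentend token: a vector of ones, as used throughout the thesis
sentend = np.ones(units, dtype=np.float32)

def sigmoid(z):
    # numerically stable logistic function, applied elementwise
    return np.where(z < 0, 1 - 1 / (1 + np.exp(z)), 1 / (1 + np.exp(-z)))

# activation of every neuron in the gate, then the variance across neurons
activations = sigmoid(kernel @ sentend + recurrent @ sentend + bias)
variance = float(np.var(activations))
```

Repeating this for each checkpoint and plotting the variance against the epoch gives curves of the kind shown in figures 9.3.2.1 to 9.3.2.14.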
word2vec (spike at approximately epoch 260)
Figure 9.3.2.1 Variance plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.2 Variance plot for the word2vec activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation.
Figure 9.3.2.3 Variance plot for the word2vec activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.4 Variance plot for the word2vec activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.5 Variance plot for the word2vec activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
GloVe (spike at approximately epoch 150)
Figure 9.3.2.6 Variance plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.7 Variance plot for the GloVe activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation.
Figure 9.3.2.8 Variance plot for the GloVe activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.9 Variance plot for the GloVe activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.10 Variance plot for the GloVe activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
fastText (no spike)
Figure 9.3.2.11 Variance plot for the fastText activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.12 Variance plot for the fastText activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.13 Variance plot for the fastText activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
Figure 9.3.2.14 Variance plot for the fastText activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.
9.4 Codes
9.4.1 Chatbot Building
Data Preparation
import csv
import pickle
import numpy as np
from pprint import pprint


# read the csv file into python
filepath = "dialogueText_301.csv"

with open(filepath, "r", encoding='latin1') as f:
    reader = csv.reader(f)
    data = [line for line in reader]

# keep only the first 50000 rows (skipping the header)
df = data[1:50000]


# group all neighbouring messages from the same speaker together
def grouped(df):
    lst1 = []
    lst = []
    for i in range(len(df)):
        if i in lst:
            continue
        else:
            for j in range(i + 1, len(df)):
                if df[i][3] == df[j][3]:
                    lst.append(j)
                    df[i][-1] += ' ' + df[j][-1]
                    if df[i] not in lst1:
                        lst1.append(df[i])
                else:
                    if df[j] not in lst1:
                        lst1.append(df[j])
                    break
    return lst1

group = grouped(df)

# group by dialogue ID
ID = []
for i in group:
    g = []
    g.append(i[1])
    g.append(i[5])
    ID.append(g)

# chat output list, keyed by dialogue ID
values = sorted(set(map(lambda x: x[0], ID)))
chat = [[y[1] for y in ID if y[0] == x] for x in values]

# group utterances into (question, answer) pairs;
# dialogues of odd length are padded with a blank utterance
l = []
for i in chat:
    if len(i) % 2 == 0:
        paired = zip(i[:-1:2], i[1::2])
        l.append(list(paired))
    else:
        odd_i = i
        odd_i.append(' ')
        paired = zip(odd_i[:-1:2], i[1::2])
        l.append(list(paired))

f = open('extract_301.pickle', 'wb')
pickle.dump(l, f)
f.close()
NLP pipeline
import pickle
import re
import string

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

from prep import l

default_stemmer = PorterStemmer()
default_stopwords = stopwords.words('english')


def clean_text(text):

    def tokenize_text(text):
        return [w for s in sent_tokenize(text) for w in word_tokenize(s)]

    def remove_special_characters(text, characters=string.punctuation.replace('-', '')):
        tokens = tokenize_text(text)
        pattern = re.compile('[{}]'.format(re.escape(characters)))
        return ' '.join(filter(None, [pattern.sub('', t) for t in tokens]))

    def remove_stopwords(text, stop_words=default_stopwords):
        tokens = [w for w in tokenize_text(text) if w not in stop_words]
        return ' '.join(tokens)

    text = text.strip(' ')                  # strip whitespaces
    text = text.lower()                     # lowercase
    # text = stem_text(text)                # stemming (disabled)
    text = remove_special_characters(text)  # remove punctuation and symbols
    text = remove_stopwords(text)           # remove stopwords
    return text


text = l

clean = []
for i in text:
    y = []
    for j in i:
        for k in j:
            x = clean_text(k)
            y.append(x)
    paired = zip(y[:-1:2], y[1::2])
    clean.append(list(paired))

f = open('clean50000.pickle', 'wb')
pickle.dump(clean, f)
f.close()
For the following code, the same process is repeated for word2vec, GloVe and fastText; hence, only the word2vec implementation is attached.
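One practical difference among the three embeddings is the file format: the word2vec Google News vectors are binary, fastText's .vec files already use the word2vec text format, while GloVe text files lack the "vocab_size vector_dim" header line that gensim expects. A minimal sketch of that conversion (this is what gensim's glove2word2vec script does; the toy two-word file is purely illustrative):

```python
import os
import tempfile

def glove_to_word2vec(glove_path, out_path):
    # prepend the "vocab_size vector_dim" header that the word2vec
    # text format expects but GloVe files omit
    with open(glove_path) as f:
        lines = f.readlines()
    dim = len(lines[0].split()) - 1
    with open(out_path, 'w') as f:
        f.write('{} {}\n'.format(len(lines), dim))
        f.writelines(lines)

# toy GloVe file to demonstrate the conversion
tmpdir = tempfile.mkdtemp()
glove_path = os.path.join(tmpdir, 'toy_glove.txt')
w2v_path = os.path.join(tmpdir, 'toy_glove.w2v.txt')
with open(glove_path, 'w') as f:
    f.write('ubuntu 0.1 0.2 0.3\n')
    f.write('linux 0.4 0.5 0.6\n')

glove_to_word2vec(glove_path, w2v_path)
with open(w2v_path) as f:
    header = f.readline().strip()
```

The converted file can then be loaded with `KeyedVectors.load_word2vec_format(..., binary=False)`, after which the vectorisation code below applies unchanged.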
word2vec embedding
# requires the gensim environment
import pickle

import numpy as np
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',
                                          encoding='utf-8',
                                          binary=True)

f = open('clean50000.pickle', 'rb')
clean = pickle.load(f)

# split the cleaned pairs back into question and answer lists
count = 0
question = []
answer = []

for x in clean:
    for t in x:
        for u in t:
            count += 1
            if count % 2 == 1:
                question.append(u)
            else:
                answer.append(u)

split_question = [item.split(" ") for item in question]
split_answer = [item.split(" ") for item in answer]

# vectorise every in-vocabulary word
question_vectors = []
for sent in split_question:
    sentvec = [model[w] for w in sent if w in model.vocab]
    question_vectors.append(sentvec)

answer_vectors = []
for sent in split_answer:
    sentvec = [model[w] for w in sent if w in model.vocab]
    answer_vectors.append(sentvec)

# sentence-end token: a vector of ones
sentend = np.ones((300,), dtype=np.float32)

# truncate each sentence to 14 word vectors, then append sentend
for tok_sent in question_vectors:
    tok_sent[14:] = []
    tok_sent.append(sentend)

# pad shorter sentences with sentend up to length 15
for tok_sent in question_vectors:
    if len(tok_sent) < 15:
        for i in range(15 - len(tok_sent)):
            tok_sent.append(sentend)

for tok_sent in answer_vectors:
    tok_sent[14:] = []
    tok_sent.append(sentend)

for tok_sent in answer_vectors:
    if len(tok_sent) < 15:
        for i in range(15 - len(tok_sent)):
            tok_sent.append(sentend)

f = open('w2v50000.pickle', 'wb')
pickle.dump([question_vectors, answer_vectors], f)
f.close()
Training the embedding on the LSTM model
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.callbacks import CSVLogger
import theano

theano.config.optimizer = "None"

f = open('w2v50000.pickle', 'rb')
question_vectors, answer_vectors = pickle.load(f)

x_train = np.array(question_vectors)
y_train = np.array(answer_vectors)

# 4 stacked LSTM layers of 300 units each (Keras 1.x-style arguments)
model = Sequential()
for _ in range(4):
    model.add(LSTM(output_dim=300, input_shape=x_train.shape[1:], return_sequences=True,
                   init='glorot_normal', inner_init='glorot_normal', activation='sigmoid'))
model.compile(loss='cosine_proximity', optimizer='adam', metrics=['accuracy'])

csv_logger = CSVLogger('w2vlog.csv', append=True, separator=';')

# checkpoint the untrained model, then train in blocks of 100 epochs,
# saving the weights after each block
model.fit(x_train, y_train, nb_epoch=0, callbacks=[csv_logger])
model.save('w2vLSTM0.h5')
for epoch in range(100, 1001, 100):
    model.fit(x_train, y_train, nb_epoch=100, callbacks=[csv_logger])
    model.save('w2vLSTM{}.h5'.format(epoch))

# x_test: held-out question vectors, prepared in the same way as x_train
predictions = model.predict(x_test)
f = open('predictions', 'wb')
pickle.dump([predictions], f)
f.close()
Simple implementation into chatbot
The chatbot is implemented only in the PyCharm console, to retrieve the responses generated by the model.
import numpy as np
import nltk
from keras.models import load_model
from gensim.models import KeyedVectors

import theano
theano.config.optimizer = "None"

model = load_model('w2vLSTM1000.h5')
mod = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',
                                        encoding='utf-8',
                                        binary=True)

while True:
    x = input("Enter the question:")
    sentend = np.ones((300,), dtype=np.float32)

    sent = nltk.word_tokenize(x.lower())
    sentvec = [mod[w] for w in sent if w in mod.vocab]

    # truncate to 14 vectors, append sentend, then pad to length 15
    sentvec[14:] = []
    sentvec.append(sentend)
    if len(sentvec) < 15:
        for i in range(15 - len(sentvec)):
            sentvec.append(sentend)
    sentvec = np.array([sentvec])

    # map each predicted vector back to its nearest word in the embedding
    predictions = model.predict(sentvec)
    outputlist = [mod.most_similar([predictions[0][i]])[0][0] for i in range(15)]
    output = ' '.join(outputlist)
    print(output)
9.4.2 Analysis of results
Randomisation of 100 questions and expected answers
The code below extracts the first 50 questions from the training data. Similar steps can be followed to extract the next 50 questions.
import csv
import pickle
import random

f = open('extract_301.pickle', 'rb')
l = pickle.load(f)

count = 0
question = []
answer = []
for x in l:
    for t in x:
        for u in t:
            count += 1
            if count % 2 == 1:
                question.append(u)
            else:
                answer.append(u)

# shuffle questions and answers with the same seed so pairs stay aligned
random.seed(42)
random.shuffle(question)
random.seed(42)
random.shuffle(answer)
randomq = question[:50]
randoma = answer[:50]

csvfile = "randomq_301.xls"
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for word in randomq:
        writer.writerow([word])

csvfile = "randoma_301.xls"
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for word in randoma:
        writer.writerow([word])
word2vec model answers
import numpy as np
import pandas as pd
import nltk
from gensim.models import KeyedVectors
from keras.models import load_model

import theano
theano.config.optimizer = "None"

model = load_model('w2vLSTM1000.h5')
mod = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',
                                        encoding='utf-8',
                                        binary=True)

df = pd.read_csv(r'randomq_301.xls', header=None)
df = df.rename({0: 'questions'}, axis=1)


def generate_answer(x):
    sentend = np.ones((300,), dtype=np.float32)
    sent = nltk.word_tokenize(x.lower())
    sentvec = [mod[w] for w in sent if w in mod.vocab]

    # truncate to 14 vectors, append sentend, then pad to length 15
    sentvec[14:] = []
    sentvec.append(sentend)
    if len(sentvec) < 15:
        for i in range(15 - len(sentvec)):
            sentvec.append(sentend)
    sentvec = np.array([sentvec])

    predictions = model.predict(sentvec)
    outputlist = [mod.most_similar([predictions[0][i]])[0][0] for i in range(15)]
    return ' '.join(outputlist)


df['answers'] = df['questions'].apply(generate_answer)
df.to_csv('w2vqna_301.csv')
9.4.3 Analysis of model parameters
Plotting the loss function and accuracy from training over 1000 epochs
import pandas as pd
import matplotlib.pyplot as plt

# the CSVLogger output is semicolon-separated: epoch;acc;loss
df = pd.read_csv("w2vlog.csv")
df = df.iloc[:, 0].str.split(';', expand=True)
df = df.rename(index=str, columns={0: "epoch1", 1: "acc", 2: "loss"})
df = df.astype(float)
df = df.reset_index()
df["epoch"] = df["index"]
df = df.astype(float)

df.plot(kind='line', x='epoch', y='acc', color='red')
plt.xlabel("epoch")
plt.show()

df.plot(kind='line', x='epoch', y='loss', color='blue')
plt.show()
Extracting the weights from the .h5 files saved by Keras at each 100-epoch checkpoint
import pickle
from keras.models import load_model

# load each 100-epoch checkpoint and pickle its weights as file0..file10
for idx in range(11):
    model = load_model('w2vLSTM{}.h5'.format(idx * 100))
    weights = model.get_weights()
    f = open('file{}.pickle'.format(idx), 'wb')
    pickle.dump(weights, f)
    f.close()
Plotting of kernel, recurrent kernel and bias for the first layer of the LSTM network. Similar code can be
derived to plot the kernel, recurrent kernel and bias of the second, third and fourth layer.
import pickle

import numpy as np
import matplotlib.pyplot as plt

# weights of the 100-epoch checkpoints, pickled by the extraction script
checkpoints = []
for idx in range(11):
    f = open('file{}.pickle'.format(idx), 'rb')
    checkpoints.append(pickle.load(f))
    f.close()

epochs = list(range(0, 1001, 100))

# first-layer parameters: index 0 is the kernel, 1 the recurrent kernel, 2 the bias
plots = [
    (0, np.linspace(-8, 8, 100), 'Kernel', "w2v histogram of kernels 1", epochs),
    (1, np.linspace(-8, 8, 100), 'Recurrent Kernel', "w2v histogram of recurrent kernels 1", epochs),
    # for the bias, every second intermediate checkpoint is skipped for legibility
    (2, np.linspace(-2, 2, 50), 'Bias', "w2v histogram of bias 1", [0, 100, 200, 400, 600, 800, 1000]),
]

for index, intervali, xlabel, title, shown in plots:
    for epoch, weights in zip(epochs, checkpoints):
        if epoch not in shown:
            continue
        y, x = np.histogram(weights[index].flatten(), bins=intervali)
        # the epoch-0 curve is drawn in black for reference
        plt.plot(x[:-1], y, label=str(epoch), color='k' if epoch == 0 else None)
    plt.yscale('log')
    plt.ylabel('Quantity (log)')
    plt.xlabel(xlabel)
    plt.legend(loc='upper left')
    plt.title(title)
    plt.show()
Plot of the variance of the activation in the first gate (the ‘forget gate’) of the first LSTM layer.
Similar code can be derived for the second, third and fourth gates, and for the second, third and fourth LSTM layers.
import math
import pickle

import numpy as np
import matplotlib.pyplot as plt


def sigmoid(gamma):
    # numerically stable logistic function
    if gamma < 0:
        return 1 - 1 / (1 + math.exp(gamma))
    else:
        return 1 / (1 + math.exp(-gamma))


# weights of the 100-epoch checkpoints, pickled by the extraction script
checkpoints = []
for idx in range(11):
    f = open('file{}.pickle'.format(idx), 'rb')
    checkpoints.append(pickle.load(f))
    f.close()

# sentend input vector (all ones)
w = np.ones((300,), dtype=np.float32)

# gate 1 activation variance at each checkpoint: for each of the 300 rows,
# dot the gate-1 slice (columns 0:300) of the kernel and recurrent kernel
# with the sentend vector, add the gate-1 bias, and pass through the sigmoid
variances = []
for x in checkpoints:
    e = []
    for i in range(300):
        w1 = x[0][i][0:300]
        rw1 = x[1][i][0:300]
        b1 = x[2][i]
        e.append(sigmoid(np.dot(w1, w) + np.dot(rw1, w) + b1))
    variances.append(np.var(e))

plt.plot(list(range(0, 1001, 100)), variances)
plt.title('LSTM 1 Gate 1 activation')
plt.xlabel('epoch')
plt.ylabel('variance')
plt.show()
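Selecting a different gate only changes which block of 300 columns is sliced out of the weight arrays: in recent Keras versions the LSTM kernel is stored as a single (input_dim, 4·units) array with the four gates concatenated along the columns, so gate g occupies columns 300·g to 300·(g+1). A small sketch with a stand-in array (gate names are generic placeholders):

```python
import numpy as np

units = 300

# stand-in concatenated LSTM kernel of shape (input_dim, 4 * units),
# as returned by model.get_weights() for one layer
kernel = np.arange(300 * 4 * units, dtype=np.float64).reshape(300, 4 * units)

# slice out each gate's block of 300 columns
gates = {'gate{}'.format(g + 1): kernel[:, g * units:(g + 1) * units]
         for g in range(4)}
```

Each block has shape (300, 300), and concatenating the four blocks along the column axis recovers the full kernel; the same slicing applies to the recurrent kernel and the bias.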
Plot of the reverse cumulative step histogram for the first gate (the ‘forget gate’) of the first LSTM layer.
Similar code can be derived for the second, third and fourth gates, and for the second, third and fourth LSTM layers.
import math
import pickle

import numpy as np
import matplotlib.pyplot as plt


def sigmoid(gamma):
    # numerically stable logistic function
    if gamma < 0:
        return 1 - 1 / (1 + math.exp(gamma))
    else:
        return 1 / (1 + math.exp(-gamma))


# weights of the 100-epoch checkpoints, pickled by the extraction script
checkpoints = []
for idx in range(11):
    f = open('file{}.pickle'.format(idx), 'rb')
    checkpoints.append(pickle.load(f))
    f.close()

# sentend input vector (all ones)
w = np.ones((300,), dtype=np.float32)

# gate 1 activations at each checkpoint
activations = []
for x in checkpoints:
    e = []
    for i in range(300):
        w1 = x[0][i][0:300]   # gate-1 slice of the kernel row
        rw1 = x[1][i][0:300]  # gate-1 slice of the recurrent-kernel row
        b1 = x[2][i]          # gate-1 bias
        e.append(sigmoid(np.dot(w1, w) + np.dot(rw1, w) + b1))
    activations.append(e)

bins = 150
fig, ax = plt.subplots(figsize=(8, 4))

# overlay a reversed cumulative histogram per checkpoint;
# the 1000-epoch curve is drawn in black for reference
for epoch, e in zip(range(0, 1001, 100), activations):
    ax.hist(e, bins=bins, color='k' if epoch == 1000 else None,
            density=True, histtype='step', cumulative=-1,
            label='{} epoch'.format(epoch))

# tidy up the figure
ax.grid(True)
ax.legend(loc='right')
ax.set_title('Reverse cumulative step histograms for gate 1 LSTM 1')
ax.set_xlabel('activation')
ax.set_ylabel('Likelihood of occurrence')

plt.show()