deep learning for textprocessing with focus on...
TRANSCRIPT
![Page 1: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/1.jpg)
Deep Learning for Text Processing with Focus on Word Embedding:
Concept and ApplicationsMohamad Ivan Fanany, Dr. Eng.,
![Page 2: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/2.jpg)
![Page 3: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/3.jpg)
![Page 4: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/4.jpg)
Five Reasons to ExploreDeep Learning
![Page 5: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/5.jpg)
![Page 6: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/6.jpg)
![Page 7: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/7.jpg)
![Page 8: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/8.jpg)
![Page 9: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/9.jpg)
![Page 10: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/10.jpg)
![Page 11: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/11.jpg)
![Page 12: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/12.jpg)
![Page 13: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/13.jpg)
![Page 14: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/14.jpg)
Contents
• Natural language processing• One-hot Encoding
• Distributional Representation• Distributed Representation• Word embeddings
• Exploring word2vec and GloVe• Using Pre-trained embeddings
![Page 15: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/15.jpg)
Natural Language Processing
• Mostly works with text data.
• Could be applied to music, bioinformatics, speech, etc.
• Machine learning perspective: NL is a sequence of variable-length sequences of high-dimensional vectors.
How to best represent the word for learning?
![Page 16: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/16.jpg)
Word Embeddings
• A set of language modeling and feature learning techniques in natural language processing.
• Mapped words or phrases from the vocabulary to vectors of real numbers.
• Mathematical embedding from a space with one dimension per word to a continuous vector space with much lower dimension.
![Page 17: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/17.jpg)
One-Hot EncodingV = {zebra, horse, school, summer}
v(zebra) = [1, 0, 0, 0]v(horse) = [0, 1, 0, 0]
v(school) = [0, 0, 1, 0]v(summer) = [0, 0, 0, 1]
(+) Pros:Simplicity
(-) Cons:Can be memory inefficientNotion of "word similarity" is undefined
![Page 18: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/18.jpg)
Distributional Representation
Is there a representation that preserves the similarities of wordmeanings?
d(v(zebra), v(horse)) < d(v(zebra), v(summer))
“You shall know a word by the company it keeps” - John Rupert Firth
v(Paris)-v(France) ≈ v(Berlin)-v(Germany)
![Page 19: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/19.jpg)
www.cs.ox.ac.uk/files/6605/aclVectorTutorial.pdf
Distributional Representation
![Page 20: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/20.jpg)
www.cs.ox.ac.uk/files/6605/aclVectorTutorial.pdf
Distributional Representation
![Page 21: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/21.jpg)
Distributional Representation
(+) Pros:Simplicity (BOW assumption)Has notion of word similarity
(-) Cons:Can be memory inefficient
Latent Semantic Analysis, Latent Dirichlet Allocation, Self-organizing map, Hyperspace Analog to Language, Independent Component Analysis, Random Indexing.
![Page 22: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/22.jpg)
Distributed Representation
𝑽𝑽 is a vocabulary𝒘𝒘𝑖𝑖 ∈ 𝑽𝑽
𝒗𝒗(𝒘𝒘𝑖𝑖) ∈ 𝑹𝑹𝑛𝑛
𝒗𝒗(𝒘𝒘𝑖𝑖) is a low-dimensional, learnable, dense word vector
![Page 23: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/23.jpg)
colah.github.io/posts/2014-07-NLP-RNNs-Representations
Distributed Representation
![Page 24: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/24.jpg)
Distributed Representation
(+) Pros:Has notion of word similarityMemory Efficient (low dimensional)
(-) Cons:Computationally intensive
![Page 25: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/25.jpg)
Distributed Representation as a Lookup Table
𝑾𝑾 is a matrix whose rows are 𝒗𝒗(𝒘𝒘𝑖𝑖) ∈ 𝑹𝑹𝑛𝑛
𝒗𝒗(𝒘𝒘𝑖𝑖) returns 𝒊𝒊𝑡𝑡𝑡 row of 𝑾𝑾
![Page 26: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/26.jpg)
Statistical Language Model
A sentence 𝒔𝒔 = 𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇
How likely is 𝒔𝒔?𝑝𝑝 𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇
According to the chain rule (probability)
𝑝𝑝 𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇 = �𝑡𝑡=1
𝑇𝑇
𝑝𝑝 𝑥𝑥𝑡𝑡|𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇
![Page 27: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/27.jpg)
N-gram Modelsn-th order Markov assumption
𝑝𝑝 𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇 ≈�𝑡𝑡=1
𝑇𝑇
𝑝𝑝 𝑥𝑥𝑡𝑡|𝑥𝑥𝑡𝑡−𝑛𝑛,⋯ , 𝑥𝑥𝑡𝑡−1
Bigram model of 𝒔𝒔 = (𝑎𝑎, 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐, 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏, 𝑏𝑏𝑖𝑖, 𝑜𝑜𝑜𝑜, 𝑐𝑐𝑡𝑐𝑐, 𝑐𝑐𝑏𝑏𝑐𝑐𝑐𝑐, . )1. How likely does ‘a’ follow ‘<S>’?2. How likely does ‘cute’ follow ‘a’?3. How likely does ‘is’ follow ‘bird’?4. …
![Page 28: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/28.jpg)
n-gram Modelsn-th order Markov assumption
𝑝𝑝 𝑥𝑥1, 𝑥𝑥2,⋯ , 𝑥𝑥𝑇𝑇 ≈�𝑡𝑡=1
𝑇𝑇
𝑝𝑝 𝑥𝑥𝑡𝑡|𝑥𝑥𝑡𝑡−𝑛𝑛,⋯ , 𝑥𝑥𝑡𝑡−1
Bigram model of 𝒔𝒔 = (𝑎𝑎, 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐, 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏, 𝑏𝑏𝑖𝑖, 𝑜𝑜𝑜𝑜, 𝑐𝑐𝑡𝑐𝑐, 𝑐𝑐𝑏𝑏𝑐𝑐𝑐𝑐, . )
𝑝𝑝 𝑤𝑤𝑖𝑖|𝑤𝑤𝑖𝑖−(𝑛𝑛−1) ,⋯ ,𝑤𝑤𝑖𝑖−1 =𝑐𝑐𝑜𝑜𝑐𝑐𝑜𝑜𝑐𝑐 𝑤𝑤𝑖𝑖−(𝑛𝑛−1) ,⋯ ,𝑤𝑤𝑖𝑖−1,𝑤𝑤𝑖𝑖𝑐𝑐𝑜𝑜𝑐𝑐𝑜𝑜𝑐𝑐 𝑤𝑤𝑖𝑖−(𝑛𝑛−1) ,⋯ ,𝑤𝑤𝑖𝑖−1
the counts are obtained from a training corpus
![Page 29: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/29.jpg)
n-gram Models
(+) Pros:Computationally efficient
(-) Cons:Data sparsityLack of generalization:
[ride a horse], [ride a llama], [ride a zebra]
![Page 30: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/30.jpg)
Word Embedding versus Other Representations• They use words as their context• More natural form of semantic similarity• Human understanding perspective• The technique of choice for vectorizing text for NLP Tasks:
• Text classification• Document clustering• Part of speech tagging• Named entity recognition• Sentiment analysis• …
![Page 31: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/31.jpg)
word2vec vs one-hot
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
one-hot word2vec
![Page 32: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/32.jpg)
Reasoning with word vectors
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
![Page 33: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/33.jpg)
word2vec
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
![Page 34: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/34.jpg)
CBOW vs Skip-gram
• CBOW:• Predicts the current word given a window of surrounding words (context)• The order of the context words does not influence the prediction (bag of
words assumption)
• Skip-gram:• Predict the surrounding words (context) given the center word.
CBOW is faster but skip-gram does a better job at predicting infrequent words.
![Page 35: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/35.jpg)
Skip-gram implementation in Keras
Training Data (Pairs)
Expected Prediction
Takes in a word vector and a context vector, learns to predict one or zero depending whether it sees a positive or negative sample.
Deliverables
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 36: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/36.jpg)
Skip-gram implementation in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 37: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/37.jpg)
Skip-gram implementation in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 38: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/38.jpg)
Skip-gram implementation in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
The first 10 of 56 (pair, label)
![Page 39: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/39.jpg)
CBOW implementation in Keras
Training Data (Pairs)
Expected Prediction
Takes the context words as inputPredicts the target word
Deliverables
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 40: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/40.jpg)
CBOW implementation in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 41: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/41.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 42: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/42.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 43: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/43.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 44: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/44.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 45: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/45.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 46: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/46.jpg)
CBOW from the scratch in Keras
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 47: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/47.jpg)
Glove vs Word2vec
• Glove :• Count-based• Capture global co-
occurrences statistics• Requires upfront pass
through entire dataset.
• Both are fundamentally similar:
• Capture local co-occurrence statistics (neighbors)
• Capture distance between embedding vector (analogies)
![Page 48: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/48.jpg)
Glove vs Word2vec
• GloVe generally shows higher accuracy than word2vec.• GloVe is faster to train if use parallelization• Python tooling for GloVe is not as mature as for word2vec.
• The only tool available to do this as of the time of writing is the GloVe-Python project (https://github.com/maciejkula/glove-python), which provides a toy implementation for GloVe on Python
![Page 49: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/49.jpg)
Fine-tune learned embedding (word2vec)
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
Learned Model
New TrainingData
![Page 50: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/50.jpg)
Fine-tuned learned embedding (word2vec)
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 51: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/51.jpg)
Fine-tuned learned embedding (word2vec)
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 52: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/52.jpg)
Fine-tune learned embedding (GloVe)
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
Learned Model
![Page 53: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/53.jpg)
Fine-tune learned embedding (GloVe)
https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras
![Page 54: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/54.jpg)
Look up embeddings
Set this!
![Page 55: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/55.jpg)
Using third-party Implementations (Gensim)
• Gensim library provides an implementation of word2vec.• Keras does not provide any support for word2vec.• Integrating the genism implementation into Keras is common
practice.
![Page 56: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/56.jpg)
Using third-party Implementations (Gensim)
![Page 57: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/57.jpg)
Using third-party Implementations (Gensim)
![Page 58: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/58.jpg)
Using third-party Implementations (Gensim)
![Page 59: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/59.jpg)
Neural-based Predictive Models
• Goal: Predict Text using Learned Embedding Vectors• Word2vec:
• Shallow neural network• Local: nearby words predict each other• Fixed word embedding vector size (i.e., 300)• Optimizer: Mini-batch SGD
• SyntaxNet:• Deep(er) neural network• More global• Not an RNN!• Can Combine with BOW-based models (i.e., word2vec CBOW)
![Page 60: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/60.jpg)
Word2vec Library
• Gensim:• Python only• Most popular
• Spark ML• Python + Jawa/Scala• Supports only synonyms
![Page 61: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/61.jpg)
*2vec
• lda2vec:• LDA (Global) + word2vec (local)
• Like2vec:• Embedding-based Recommender
![Page 62: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/62.jpg)
Word Embeddings Applications (Machine Translation)
![Page 63: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/63.jpg)
Word Embeddings Applications (Sentiment Analysis)
![Page 64: Deep Learning for TextProcessing with Focus on …socs.binus.ac.id/files/2017/05/INACL-2017_M_Ivan-Fanany.pdfDeep Learning for TextProcessing with Focus on Word Embedding: Concept](https://reader033.vdocuments.net/reader033/viewer/2022042320/5f0a62077e708231d42b5dcd/html5/thumbnails/64.jpg)
Thank you! Q/A Sessions
All source codes and datasets are available! The DLwK sources are fixed and modified to run on Python 3.5 and Keras 2.2 in Windows 10 with GPU
Please ask Panitia INACL 2017