Deep Learning for NLP Applications
TRANSCRIPT
Deep Learning for NLP Applications
What is Deep Learning?
Just a Neural Network!
"Deep learning" refers to Deep Neural Networks
A Deep Neural Network is simply a Neural Network with multiple hidden layers
Neural Networks have been around since the 1970s
So why now?
Large Networks are hard to train
Vanishing gradients make backpropagation harder
Overfitting becomes a serious issue
So we settled (for the time being) on simpler, more useful variations of Neural Networks
Then, suddenly ...
We realized we can stack these simpler Neural Networks, making them easier to train
We derived more efficient parameter estimation and model regularization methods
Also, Moore's law kicked in and GPU computation became viable
So what's the big deal?
MASSIVE improvements in Computer Vision
Speech Recognition
Baidu (with Andrew Ng as their chief scientist) has built a state-of-the-art speech recognition system with Deep Learning
Their dataset: 7,000 hours of conversation, coupled with background noise synthesis for a total of 100,000 hours
They processed this through a massive GPU cluster
Cross Domain Representations
What if you wanted to take an image and generate a description of it?
The beauty of representation learning is its ability to be distributed across tasks
This is the real power of Neural Networks
But Samiur, what about NLP?
Deep Learning NLP
Distributed word representations
Dependency Parsing
Sentiment Analysis
And many others ...
Standard Bag of Words
A one-hot encoding
20k to 50k dimensions
Can be improved by factoring in document frequency
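As a rough illustration (not part of the original slides), scikit-learn's vectorizers show what this looks like in practice; the vocabulary size of the fitted matrix plays the role of the 20k-50k dimensions, and TfidfVectorizer is the variant that factors in document frequency:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "Mattermark raised a new funding round",
    "The startup announced a funding round of $6.5 million",
]

# Plain bag of words: one column per vocabulary word, so the dimensionality
# grows with the size of the vocabulary.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)
print(X_bow.shape)        # (2, vocabulary_size)

# TF-IDF re-weights the same counts by inverse document frequency.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.shape)
```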
Neural Word embeddings
Uses a vector space that attempts to predict a word given a context window
200-400 dimensions
motel [0.06, -0.01, 0.13, 0.07, -0.06, -0.04, 0, -0.04]
hotel [0.07, -0.03, 0.07, 0.06, -0.06, -0.03, 0.01, -0.05]
Word Representations
Word embeddings make semantic similarity and synonym detection possible
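For example, with a trained word2vec model in Gensim (the model file here is a hypothetical placeholder; see the training sketch later in the deck), nearby vectors correspond to semantically similar words like the hotel/motel pair above:

```python
from gensim.models import Word2Vec

# Hypothetical saved model; any trained word2vec model would do.
model = Word2Vec.load("word2vec_news.model")

print(model.wv.similarity("hotel", "motel"))   # high cosine similarity
print(model.wv.most_similar("hotel", topn=5))  # nearest neighbours in the vector space
```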
Word embeddings have cool properties:
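The concrete examples on this slide are images, but the best-known property is vector arithmetic over analogies; a hedged Gensim sketch (the vector file is a hypothetical placeholder):

```python
from gensim.models import KeyedVectors

# Hypothetical path; any pretrained vectors in word2vec format would do.
wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# The classic analogy: king - man + woman ~ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```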
Dependency Parsing
Converting sentences to a dependency-based grammar
Simplifying this to the verbs and their agents is called Semantic Role Labeling
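To make the idea concrete, here is what a dependency parse looks like; spaCy is not one of the tools named in this talk, it is only used here for illustration:

```python
import spacy

# Small English pipeline; spaCy is an illustrative choice, not the talk's tooling.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Mattermark is announcing that it has raised a round of $6.5 million")

for token in doc:
    # Each word points to its syntactic head with a labelled relation.
    print(f"{token.text:<12} --{token.dep_}--> {token.head.text}")
```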
Sentiment Analysis
Recursive Neural Networks
Can model tree structures very well
This makes them great for other NLP tasks too (such as parsing)
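A minimal sketch (assumed, not from the slides) of the core Recursive Neural Network step: a single shared weight matrix composes two child vectors into a parent vector, applied bottom-up over a parse tree:

```python
import numpy as np

d = 4                                   # toy embedding dimension
rng = np.random.default_rng(0)
W = rng.normal(size=(d, 2 * d))         # shared composition weights
b = np.zeros(d)

def compose(left, right):
    """Combine two child vectors into one parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Tiny parse tree for "not (very good)", with random stand-in word vectors.
not_, very, good = (rng.normal(size=d) for _ in range(3))
phrase = compose(very, good)            # "very good"
root = compose(not_, phrase)            # "not very good"
# A sentiment classifier would be trained on top of each node's vector.
print(root)
```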
Get to the applications part already!
Tools
Python
Theano/PyLearn2
Gensim (for word2vec)
nolearn (uses scikit-learn)
Java/Clojure/Scala
DeepLearning4j
neuralnetworks by Ivan Vasilev
APIs
Alchemy API
MetaMind
Problem: Funding Sentence Classifier
Build a binary classifier that is able to take any sentence from a news article and tell if it's about funding or not.
e.g. "Mattermark is today announcing that it has raised a round of $6.5 million"
Word Vectors
Used Gensim's Word2Vec implementation to train unsupervised word vectors on the UMBC Webbase Corpus (~100M documents, ~48GB of text)
Then, iterated 20 times on text in news articles in the tech news domain (~1M documents, ~300MB of text)
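A hedged sketch of what that two-stage training could look like with Gensim's (4.x-style) API; the file paths, vector size, and other hyperparameters are illustrative, not the talk's actual settings:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# First pass: train on the large general-domain corpus
# (hypothetical file with one pre-tokenised sentence per line).
general = LineSentence("umbc_webbase.txt")
model = Word2Vec(sentences=general, vector_size=300, window=5, min_count=5, workers=8)

# Second pass: continue training on in-domain tech-news text for extra epochs.
tech_news = list(LineSentence("tech_news.txt"))    # hypothetical path
model.build_vocab(tech_news, update=True)          # add any new in-domain words
model.train(tech_news, total_examples=len(tech_news), epochs=20)

model.save("word2vec_news.model")
```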
Sentence Vectors
How can you compose word vectors to make sentence vectors?
Use the paragraph vector model proposed by Quoc Le
Feed into an RNN constructed from the dependency tree of the sentence
Use some heuristic function to combine the string of word vectors
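A sketch of the simplest such heuristic (an assumption, not taken from the slides): average the word vectors of the sentence, optionally weighting each word by its TF-IDF score:

```python
import numpy as np

def sentence_vector(tokens, word_vectors, weights=None):
    """Compose word vectors into a sentence vector by (weighted) averaging.

    tokens:       list of words in the sentence
    word_vectors: mapping word -> numpy vector (e.g. Gensim KeyedVectors)
    weights:      optional mapping word -> TF-IDF weight
    """
    vecs, ws = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(word_vectors[tok])
            ws.append(1.0 if weights is None else weights.get(tok, 0.0))
    if not vecs:
        return None
    if not any(ws):            # fall back to a plain average if all weights are zero
        ws = None
    return np.average(np.vstack(vecs), axis=0, weights=ws)
```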
What did we try?
TF-IDF + Naive Bayes (sketched below)
Word2Vec + Composition Methods
Word2Vec + TF-IDF + Composition Methods
Word2Vec + TF-IDF + Semantic Role Labeling (SRL) + Composition Methods
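The first baseline is standard enough that a scikit-learn pipeline captures it; the two labelled sentences below are stand-ins for the real funding/non-funding training data:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the labelled dataset (1 = funding sentence, 0 = not).
sentences = [
    "Mattermark is today announcing that it has raised a round of $6.5 million",
    "The company released a new version of its mobile app",
]
labels = [1, 0]

baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(sentences, labels)
print(baseline.predict(["Acme raised $10 million in Series A funding"]))
```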
Composition Methods
Where wi represents the i'th word vector, wv the word vector for the verb, and a0 and a1 are the agents
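The formulas on this slide are images, so the following is only a plausible reconstruction of the two composition operators named on the next slide, additive and circular convolution, applied to an SRL triple of verb and agents:

```python
import numpy as np

def additive(vectors):
    """Additive composition: sum the word vectors."""
    return np.sum(vectors, axis=0)

def circular_convolution(a, b):
    """Circular convolution of two vectors, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Toy SRL triple: the verb's vector wv and its two agents a0 and a1.
d = 8
rng = np.random.default_rng(0)
wv, a0, a1 = (rng.normal(size=d) for _ in range(3))

sentence_add = additive([wv, a0, a1])
sentence_conv = circular_convolution(circular_convolution(a0, wv), a1)
print(sentence_add.shape, sentence_conv.shape)
```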
What worked?
Word2Vec + TF-IDF + SRL + Circular Convolution/Additive
The first method with simple TF-IDF/Naive Bayes performed extremely poorly because of its large dimensionality
Combining TF-IDF with Word2Vec provided a small but noticeable improvement
Adding SRL and a more sophisticated composition method increased performance by almost 5%
What else could we try?
Can we apply this method to generate general-purpose document vectors?
We are currently using LDA (a topic analysis method) or simple TF-IDF to create document vectors
How will this method compare to the already proposed paragraph vector method by Quoc Le?
Can we associate these document vectors with much smaller query strings?
e.g. Search for artificial intelligence against our companies and get better results than keyword search
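A hedged sketch of that query idea: embed the query with the same composition method and rank companies by cosine similarity of their document vectors (all names and vectors below are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document vectors per company, built with the same composition method as above
# (random stand-ins here).
d = 8
rng = np.random.default_rng(1)
doc_vectors = {name: rng.normal(size=d) for name in ["Acme AI", "FoodCo", "RoboWorks"]}

# In practice the query vector would be composed from the query's word vectors.
query_vector = rng.normal(size=d)

ranked = sorted(doc_vectors, key=lambda name: cosine(query_vector, doc_vectors[name]), reverse=True)
print(ranked)
```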
Who's doing ML at Mattermark?
We need more people! Refer anyone you know who does Data Science/ML