![Page 1: Progress in Deep Learning Theory - TTIC · in Neural Nets → Convexity is not needed ... End-to-End Machine Translation with Recurrent Nets and Attention Mechanism ... Image Caption](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0ee32f7e708231d4416d11/html5/thumbnails/1.jpg)
Progress in Deep Learning Theory
• Exponential advantage of distributed representations
• Exponential advantage of depth
• Myth-busting : non-convexity & local minima
• Probabilistic interpretations of auto-encoders
Machine Learning, AI & No Free Lunch
ML 101. What We Are Fighting Against: The Curse of Dimensionality
Not Dimensionality so much as Number of Variations
(Bengio, Delalleau & Le Roux 2007)
Putting Probability Mass where Structure is Plausible
Bypassing the curse of dimensionality
Exponential advantage of distributed representations
Hidden Units Discover Semantically Meaningful Concepts
Each feature can be discovered without the need for seeing the exponentially large number of configurations of the other features.
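The counting argument behind this can be sketched numerically. The bound below is Zaslavsky's hyperplane-arrangement formula, the one underlying the region-counting analyses of Pascanu et al. and Montufar et al. cited later in this deck; the function name and the numbers 10 and 3 are purely illustrative:

```python
from math import comb

def max_regions(k, d):
    """Upper bound on the number of regions k hyperplanes cut R^d into."""
    return sum(comb(k, i) for i in range(min(k, d) + 1))

# 10 learned linear features in a 3-D input space can jointly distinguish
# up to 176 regions, while a purely local (one-hot) code with 10 units
# distinguishes only 10 configurations.
print(max_regions(10, 3))  # → 176
```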
Exponential advantage of distributed representations
Classical Symbolic AI vs Representation Learning
cat, dog, person
Neural Language Models: fighting one exponential by another one!
Neural word embeddings: visualization (directions = learned attributes)
Analogical Representations for Free (Mikolov et al, ICLR 2013)
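The analogy property can be demonstrated with toy vectors. The 2-D embeddings below are made up purely for illustration (real models such as word2vec learn hundreds of dimensions), but the vector arithmetic is the same:

```python
import numpy as np

# Hypothetical embeddings: one axis encodes royalty, the other gender.
emb = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
}

def analogy(a, b, c):
    """Find the word closest to emb[b] - emb[a] + emb[c]."""
    target = emb[b] - emb[a] + emb[c]
    return min(emb, key=lambda w: np.linalg.norm(emb[w] - target))

print(analogy("man", "king", "woman"))  # → queen
```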
Exponential advantage of depth
2 layers of logic gates, formal neurons, or RBF units = universal approximator, but possibly at the cost of an exponential (2^n) number of hidden units.
Theorems on the advantage of depth: (Hastad et al 1986 & 1991; Bengio et al 2007; Bengio & Delalleau 2011; Martens et al 2013; Pascanu et al 2014; Montufar et al NIPS 2014)
Some functions compactly represented with k layers may require exponential size with 2 layers.
RBMs & auto-encoders = universal approximators.
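Parity is the classic example behind the Hastad-style separations cited above: a chain (or balanced tree) of n-1 two-input XOR gates computes it with small depth-proportional cost, whereas constant-depth AND/OR circuits provably need exponential size. A sketch of the deep computation:

```python
from functools import reduce
from operator import xor

# n-bit parity via a chain of two-input XOR gates: n-1 gates total.
# A balanced tree of the same gates gives depth O(log n).
def parity(bits):
    return reduce(xor, bits)

print(parity([1, 0, 1, 1]))  # → 1
```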
Why does it work? No Free Lunch
Exponential advantage of depth
“Shallow” computer program:
main
subroutine1 includes subsub1 code and subsub2 code and subsubsub1 code
subroutine2 includes subsub2 code and subsub3 code and subsubsub3 code and …
“Deep” computer program:
main
sub1 sub2 sub3
subsub1 subsub2 subsub3
subsubsub1 subsubsub2 subsubsub3
Sharing Components in a Deep Architecture
Exponential advantage of depth
A Myth is Being Debunked: Local Minima in Neural Nets → Convexity is not needed
Saddle Points
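The distinction between a saddle point and a local minimum can be illustrated on the simplest possible example; the quadratic below is a textbook case, not taken from the slides:

```python
import numpy as np

# f(x, y) = x**2 - y**2 has zero gradient at the origin, but its Hessian
# has eigenvalues of both signs, so the origin is a saddle point:
# gradient descent can still escape along the negative-curvature direction.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])            # Hessian of f at (0, 0)
eigs = np.linalg.eigvalsh(H)           # ascending order
is_saddle = eigs.min() < 0 < eigs.max()
print(is_saddle)  # → True
```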
Saddle Points During Training
Low Index Critical Points
The Next Challenge: Unsupervised Learning
Why Latent Factors & Unsupervised Representation Learning? Because of Causality.
Probabilistic interpretation of auto-encoders
Denoising Auto-Encoder
(Figure: corrupted input → reconstruction of the clean input)
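A minimal numpy sketch of the training loop, under the usual assumptions (Gaussian corruption, tied weights, squared error against the clean input); all sizes, values, and hyperparameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))                 # toy "clean" data
W = rng.normal(scale=0.1, size=(8, 4))   # tied encoder/decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(200):
    X_noisy = X + rng.normal(scale=0.3, size=X.shape)  # corrupt the input
    H = sigmoid(X_noisy @ W)                           # encode
    X_hat = H @ W.T                                    # decode (tied weights)
    err = X_hat - X                                    # compare to CLEAN input
    losses.append(0.5 * np.mean(err ** 2))
    dH = (err @ W) * H * (1 - H)                       # backprop into encoder
    W -= 0.01 * (err.T @ H + X_noisy.T @ dH) / len(X)  # both uses of W

# Reconstruction error of the clean input decreases as training proceeds.
print(losses[0] > losses[-1])
```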
Regularized Auto-Encoders Learn a Vector Field that Estimates a Gradient Field
Denoising Auto-Encoder Markov Chain
Variational Auto-Encoders (VAEs)
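The key ingredient of VAEs, the reparameterization trick of Kingma & Welling, can be written in two lines; the values below are illustrative placeholders for encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so the sample z is a differentiable function of the encoder outputs
# (mu, log_var) and gradients can flow through them.
mu = np.array([0.5, -0.2])        # encoder mean (illustrative)
log_var = np.array([0.0, -1.0])   # encoder log-variance (illustrative)
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps
print(z.shape)  # → (2,)
```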
Geometric Interpretation of VAEs
Denoising Auto-Encoder vs Diffusion Inverter (Sohl-Dickstein et al ICML 2015)
Encoder-Decoder Framework
Attention Mechanism for Deep Learning
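A content-based soft attention step reduces to a softmax-weighted average of values. The keys, values, and query below are toy numbers chosen so the query matches the second key:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(query, keys, values):
    """Soft attention: weight each value by how well its key matches the query."""
    scores = keys @ query        # alignment scores
    weights = softmax(scores)    # normalized attention weights
    return weights @ values      # weighted average of values

keys   = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0], [20.0], [30.0]])
query  = np.array([-5.0, 5.0])   # aligns strongly with the second key

out = attend(query, keys, values)
print(out)  # ≈ 20: attention concentrates on the second value
```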
End-to-End Machine Translation with Recurrent Nets and Attention Mechanism
Image-to-Text: Caption Generation with Attention
Paying Attention to Selected Parts of the Image While Uttering Words
Speaking about what one sees
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
The Good
And the Bad
The Next Frontier: Reasoning and Question Answering
Ongoing Project: Knowledge Extraction
Conclusions
MILA: Montreal Institute for Learning Algorithms