From Data to Numbers to Knowledge: Semantic Embeddings, by Álvaro Barbero
www.iic.uam.es
December 7, 2016
Data Numbers Knowledge
Álvaro Barbero – Chief Data Scientist
How do we see the world?
Understanding correlations
The more, the better
Linear correlations
Understanding correlations
“Moral virtue is a mean” (Aristotle)
Convex correlations
Understanding correlations
Few is bad, but slightly more is very good, unless you skip the sweet spot and then you are doing terribly, but if you keep going you get better until you do worse again, and after that nothing changes and you do so-so
Nonlinear correlations
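A minimal sketch of why a linear measure misses relations with a sweet spot. The data below is hypothetical, chosen only for illustration: one relation follows "the more, the better", the other peaks in the middle. Pearson correlation captures the first perfectly and reports roughly zero for the second, even though y is fully determined by x in both cases.

```python
import numpy as np

# Hypothetical data: x is symmetric around zero.
x = np.linspace(-1.0, 1.0, 201)

y_linear = 2.0 * x         # "the more, the better"
y_sweet_spot = -(x ** 2)   # best in the middle, worse at both extremes

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

r_linear = pearson(x, y_linear)
r_sweet_spot = pearson(x, y_sweet_spot)

print(f"linear relation:     r = {r_linear:.3f}")
print(f"sweet-spot relation: r = {r_sweet_spot:.3f}")
```

A near-zero coefficient here does not mean the variables are unrelated, only that the relation is not a straight line.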
THE REAL WORLD IS NON-LINEAR
Shocking revelations
Big data scientists rampage through the cities after the newly discovered uselessness of their linear models.
BIG DATA TIMES
VERY VERY BIG DATA SCIENCE - Since 1802
The modelling dilemma
Linear models
Easy to understand and explain
May not faithfully represent reality
Non-linear models
Accurate representations
Very difficult interpretation
The brain trick
Brain power
Low level observations
Multilayered non-linear processing
Abstract yet linear concepts
Abstraction examples
<<
Art quality
Embeddings: abstract non-linearity away
Low level observations
High level embedding
Artificial multilayered non-linear processing
Deep Network
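A toy sketch of what "abstracting non-linearity away" can mean. Everything here is an assumption made for illustration: the observations are random 2-D points, the concept is "inside the unit circle", and the embedding is a hand-written squared radius standing in for what a deep network would have to learn. In the raw coordinates a single threshold cannot express the concept; in the embedding it can.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical low-level observations: 2-D points labelled by whether
# they fall inside the unit circle.
points = rng.uniform(-2.0, 2.0, size=(500, 2))
inside = (points ** 2).sum(axis=1) < 1.0

def embed(p):
    """Hand-crafted stand-in for a learned embedding: squared radius."""
    return (p ** 2).sum(axis=1)

def best_threshold_accuracy(feature, labels):
    """Accuracy of the best single-threshold rule on a 1-D feature."""
    best = 0.0
    for t in feature:
        for pred in (feature <= t, feature > t):
            best = max(best, float((pred == labels).mean()))
    return best

acc_raw = best_threshold_accuracy(points[:, 0], inside)    # raw x coordinate
acc_emb = best_threshold_accuracy(embed(points), inside)   # embedded feature

print(f"threshold on raw x:     {acc_raw:.2f}")
print(f"threshold on embedding: {acc_emb:.2f}")
```

The non-linear work is done once, inside `embed`; after that a simple comparison against a threshold is enough.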
word2vec
cat chills on a mat
cat chills mushroom a mat
Socher et al (2013)
Exploiting the linearity: semantic algebra
King - man + woman = queen
Obama - USA + Russia = Putin
human - animal = ethics
paella - Spain + Italy = risotto
Cristiano - Madrid + Barcelona = Messi
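A minimal sketch of semantic algebra on word vectors. The three-dimensional vectors below are hand-made toy values (hypothetical axes: royalty, gender, food), not real word2vec output; real embeddings have hundreds of dimensions and are learned from text. The helper `analogy` is a name introduced here for illustration.

```python
import numpy as np

# Toy embeddings, invented for this example only.
vectors = {
    "king":   np.array([0.9,  0.8, 0.0]),
    "queen":  np.array([0.9, -0.8, 0.0]),
    "man":    np.array([0.1,  0.8, 0.0]),
    "woman":  np.array([0.1, -0.8, 0.0]),
    "paella": np.array([0.0,  0.0, 1.0]),
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest (cosine) to a - b + c."""
    target = vectors[a] - vectors[b] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # skip the query words themselves
        sim = float(target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

result = analogy("king", "man", "woman", vectors)
print(result)
```

Because the relations are linear in the embedding space, the analogy reduces to one subtraction, one addition and a nearest-neighbour lookup.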
Bilingual word2vec
Word space
W_zh, W_en
English words
Mandarin words
Socher et al (2013)
Text embedding models through Recurrent Neural Networks
High dimensional representation of a sequence
Input sequence: The lazy brown fox
Representation: [0.1, 0.5, 1.0, 0.0, 2.4]
Sutskever et al - Sequence to Sequence Learning with Neural Networks
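A rough sketch of how a recurrent network turns a word sequence into one fixed-size vector. This is a plain tanh recurrence with random, untrained weights, used only to show the mechanics; the cited paper uses LSTM cells trained on translation data. The vocabulary, the dimensions and the names `E`, `W_x`, `W_h` and `encode` are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny vocabulary and layer sizes.
vocab = {"the": 0, "lazy": 1, "brown": 2, "fox": 3}
embed_dim, hidden_dim = 4, 5

# Random (untrained) parameters: word embeddings and recurrent weights.
E = rng.normal(scale=0.5, size=(len(vocab), embed_dim))
W_x = rng.normal(scale=0.5, size=(hidden_dim, embed_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def encode(tokens):
    """Read tokens one by one; the final hidden state is the embedding."""
    h = np.zeros(hidden_dim)
    for tok in tokens:
        x = E[vocab[tok]]
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

vec = encode("the lazy brown fox".split())
vec_reversed = encode("fox brown lazy the".split())

print(vec)
print(vec_reversed)
```

The output has the same length whatever the sentence length, and it depends on word order, which is what distinguishes this from averaging word vectors.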
Example: book embeddings
Cilibrasi, Vitányi - Normalized Web Distance and Word Similarity
Jonathan Swift
Oscar Wilde
William Shakespeare
Embedding artistic styles
Gatys et al – A Neural Algorithm of Artistic Style
Bottou et al - Optimization Methods for Large-Scale Machine Learning
Low level observations
Style embedding
Artistic styles as linear relations
Google Research - https://research.googleblog.com/2016/10/supercharging-style-transfer.html
An application: furniture embedding
Bell and Bala - Learning visual similarity for product design with convolutional neural networks
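A small sketch of how an embedding turns visual similarity into a distance lookup. The catalogue items and their three-dimensional vectors are invented for illustration; in the cited work the vectors would come from a convolutional network applied to product images. The function `most_similar` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Hypothetical product embeddings (toy values).
catalog = {
    "oak dining chair":    np.array([0.9, 0.1, 0.2]),
    "walnut dining chair": np.array([0.8, 0.2, 0.3]),
    "floor lamp":          np.array([0.1, 0.9, 0.1]),
    "leather sofa":        np.array([0.2, 0.1, 0.9]),
}

def most_similar(query_vec, catalog, k=2):
    """Return the k catalogue items nearest to query_vec (Euclidean)."""
    names = list(catalog)
    matrix = np.stack([catalog[n] for n in names])
    dists = np.linalg.norm(matrix - query_vec, axis=1)
    order = np.argsort(dists)[:k]
    return [(names[i], float(dists[i])) for i in order]

# Embedding of a query photo (also a toy value).
query = np.array([0.88, 0.1, 0.2])
results = most_similar(query, catalog)

for name, dist in results:
    print(f"{name}: {dist:.3f}")
```

Once items live in the same space, "find similar furniture" is a sort by distance.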
Take home message
Reality is highly non-linear
Deep Learning can abstract complexity away
Complex relations become easy comparisons!
Álvaro Barbero Jiménez
Chief Data Scientist at Instituto de Ingeniería del Conocimiento (IIC)
Supporting graphics obtained from:
@albarjip
Alvaro Barbero