From data to numbers to knowledge: semantic embeddings, by Alvaro Barbero

www.iic.uam.es

Upload: big-data-spain

Post on 10-Jan-2017


TRANSCRIPT

Page 1: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero

www.iic.uam.es

Page 2: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero

7 December 2016

Data Numbers Knowledge

Álvaro Barbero – Chief Data Scientist


Page 3: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


How do we see the world?

Page 4: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Understanding correlations

The more, the better

Linear correlations

Page 5: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Understanding correlations

“Moral virtue is a mean” (Aristotle)

Convex correlations

Page 6: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Understanding correlations

Few is bad, but slightly more is very good, unless you skip the sweet spot and then you are doing terribly, but if you keep going you get better until you do worse again and after that nothing changes and you do so-so

Nonlinear correlations
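A small illustration of the point made on these three slides, not taken from the talk: the Pearson coefficient, which only measures linear correlation, is close to 1 for a "the more, the better" relation but close to 0 for a "virtue is a mean" relation, even though in both cases y is fully determined by x. The data and the helper name `pearson` are made up for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data (an assumption, not from the slides): x evenly spaced in [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
linear = [2 * x + 1 for x in xs]         # "the more, the better"
mean_is_best = [1 - x * x for x in xs]   # best in the middle, worse at both ends

r_linear = pearson(xs, linear)
r_mean = pearson(xs, mean_is_best)
print(f"linear relation: r = {r_linear:.3f}")
print(f"mean-is-best:    r = {r_mean:.3f}")
```

A linear model fitted to the second data set would see almost no relation at all, which is the problem the rest of the talk addresses.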

Page 7: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


THE REAL WORLD IS NON-LINEAR

Shocking revelations

Big data scientists rampage through the cities after the newly discovered uselessness of their linear models.


BIG DATA TIMES

VERY VERY BIG DATA SCIENCE - Since 1802

Page 8: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero

The modelling dilemma

Page 9: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


The modelling dilemma

Linear models: easy to understand and explain, but may not faithfully represent reality

Non-linear models: accurate representations, but very difficult interpretation

Page 10: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


The brain trick

Page 11: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Brain power

Low level observations

Multilayered non-linear processing

Abstract yet linear concepts

Page 12: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Abstraction examples

Art quality

Page 13: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Embeddings: abstract non-linearity away

Low level observations

High level embedding

Artificial multilayered non-linear processing

Deep Network
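A toy sketch of what "abstracting non-linearity away" can mean. The two classes below cannot be separated by any straight line in the raw 2-D space, but after a non-linear map a single threshold does the job. The hand-written function `embed` only stands in for a learned deep network, and the data are synthetic; both are assumptions made for illustration.

```python
import math
import random

random.seed(0)

def sample_ring(radius_min, radius_max, n):
    """Sample n 2-D points whose radius lies in [radius_min, radius_max]."""
    points = []
    for _ in range(n):
        r = random.uniform(radius_min, radius_max)
        angle = random.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

inner = sample_ring(0.0, 1.0, 100)   # class A: points near the origin
outer = sample_ring(2.0, 3.0, 100)   # class B: points on a surrounding ring

def embed(point):
    """Hand-written non-linear map standing in for a learned deep network."""
    x, y = point
    return x * x + y * y

# In the embedded space a single threshold (a linear rule in 1-D) separates the classes.
threshold = 2.5
inner_ok = all(embed(p) < threshold for p in inner)
outer_ok = all(embed(p) > threshold for p in outer)
print(inner_ok, outer_ok)
```

In a real system the map is not written by hand but learned by the network from data.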

Page 14: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Page 15: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


word2vec

cat chills on a mat

cat chills mushroom a mat

Socher et al (2013)
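One way to read the two sentences on this slide is as a training signal: a real text window should score higher than the same window with one word replaced by an unrelated one. The sketch below builds the corrupted window and computes a margin ranking loss. The scoring function, the dimensions and the random parameters are all hypothetical, and this is not claimed to be the exact objective used by word2vec or by the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "chills", "on", "a", "mat", "mushroom"]
index = {word: i for i, word in enumerate(vocab)}

dim = 4
embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))  # one row per word
scorer = rng.normal(scale=0.1, size=5 * dim)                # hypothetical linear scorer

def score(window):
    """Score a 5-word window: concatenate the word vectors and apply the scorer."""
    features = np.concatenate([embeddings[index[w]] for w in window])
    return float(scorer @ np.tanh(features))

def corrupt(window, replacement):
    """Replace the middle word of the window with an unrelated word."""
    corrupted = list(window)
    corrupted[len(window) // 2] = replacement
    return corrupted

real = ["cat", "chills", "on", "a", "mat"]
fake = corrupt(real, "mushroom")

# Margin ranking loss: zero only when the real window outscores the fake one by at least 1.
loss = max(0.0, 1.0 - score(real) + score(fake))
print(fake, round(loss, 3))
```

Training would adjust `embeddings` and `scorer` to reduce this loss over many windows; the word vectors obtained as a by-product are the embedding.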

Page 16: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Exploiting the linearity: semantic algebra

King - man + woman = queen

Obama - USA + Russia = Putin

human - animal = ethics

paella - Spain + Italy = risotto

Cristiano - Madrid + Barcelona = Messi
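Reading the first example as "king minus man plus woman lands near queen", the algebra can be sketched with plain vector arithmetic and a nearest-neighbour lookup. The three-dimensional vectors below are hand-made for illustration (an assumption); real word embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hand-made toy vectors with axes (royalty, male, female).
vectors = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "prince": [0.8, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "king", "woman"))  # prints "queen"
```

The three query words are excluded from the candidates, which is the usual convention when evaluating analogies of this kind.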

Page 17: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Bilingual word2vec

Word space

W_zh, W_en

English words

Mandarin words

Socher et al (2013)
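The method behind this slide is not described in the transcript. As a simpler, swapped-in illustration of how two word spaces can be related linearly, the sketch below fits a matrix by least squares that maps English vectors onto the vectors of their translations. All data are synthetic and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 English word vectors and the vectors of their Mandarin
# translations, generated from a hidden linear map plus small noise.
english = rng.normal(size=(50, 8))
hidden_map = rng.normal(size=(8, 8))
mandarin = english @ hidden_map + 0.01 * rng.normal(size=(50, 8))

# Fit a matrix W that minimises ||english @ W - mandarin||^2.
W, *_ = np.linalg.lstsq(english, mandarin, rcond=None)

# Map one English vector into the Mandarin space and find its nearest neighbour.
query = english[3] @ W
distances = np.linalg.norm(mandarin - query, axis=1)
nearest = int(np.argmin(distances))
print(nearest)
```

With real embeddings the fit would use a dictionary of known translation pairs, and the nearest neighbour of a mapped word would be a candidate translation.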

Page 18: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Text embedding models through Recurrent Neural Networks

High dimensional representation of a sequence

The lazy brown fox

[0.1, 0.5, 1.0, 0.0, 2.4]

Sutskever et al - Sequence to Sequence Learning with Neural Networks
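A minimal sketch of how a recurrent network turns a sentence into a fixed-size vector: the words are read one by one, and the final hidden state is used as the embedding. A plain (Elman) RNN with random, untrained weights is used here for brevity; the cited paper uses LSTM networks, and all sizes and parameter names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "lazy": 1, "brown": 2, "fox": 3}
embed_dim, hidden_dim = 3, 5

# Randomly initialised parameters; a real model would learn them from data.
word_vectors = rng.normal(size=(len(vocab), embed_dim))
W_in = rng.normal(scale=0.5, size=(hidden_dim, embed_dim))
W_rec = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
bias = np.zeros(hidden_dim)

def encode(sentence):
    """Run a plain RNN over the words and return the final hidden state."""
    hidden = np.zeros(hidden_dim)
    for word in sentence.lower().split():
        x = word_vectors[vocab[word]]
        hidden = np.tanh(W_in @ x + W_rec @ hidden + bias)
    return hidden

embedding = encode("The lazy brown fox")
print(embedding.shape)  # (5,)
```

Because each step depends on the previous hidden state, the same words in a different order give a different vector.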

Page 19: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Example: book embeddings

Cilibrasi, Vitányi - Normalized Web Distance and Word Similarity

Jonathan Swift

Oscar Wilde

William Shakespeare
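The paper cited on this slide defines a distance between two terms from page counts alone: terms that often appear together are close. The sketch below implements that formula. The page counts are hypothetical, and the transcript does not say how the book figure itself was produced.

```python
import math

def normalized_web_distance(fx, fy, fxy, n):
    """NWD from page counts: fx and fy for each term, fxy for both, n pages in total."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical page counts, chosen only to exercise the formula.
n_pages = 1_000_000
same_term = normalized_web_distance(1000, 1000, 1000, n_pages)  # always co-occur
related = normalized_web_distance(1000, 2000, 500, n_pages)     # often co-occur
unrelated = normalized_web_distance(1000, 2000, 2, n_pages)     # rarely co-occur
print(same_term, related < unrelated)
```

Given pairwise distances between titles or authors, a standard layout method can place them in a low-dimensional space.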

Page 20: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Embedding artistic styles

Gatys et al - A Neural Algorithm of Artistic Style

Bottou et al - Optimization Methods for Large-Scale Machine Learning

Low level observations

Style embedding
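In the Gatys et al approach, style is summarised by the correlations between the channels of a convolutional layer (a Gram matrix), which discards where things are in the image. The sketch below computes that matrix on random numbers standing in for real network activations (an assumption) and checks that rearranging pixel positions leaves it unchanged.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Channel-by-channel correlations; feature_maps has shape (channels, height, width)."""
    channels = feature_maps.shape[0]
    flat = feature_maps.reshape(channels, -1)  # one row per channel
    return flat @ flat.T

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 6))  # stand-in for CNN activations

style = gram_matrix(features)

# Shuffling pixel positions (the same way in every channel) changes the layout
# of the content but leaves the Gram matrix unchanged.
perm = rng.permutation(36)
shuffled = features.reshape(4, -1)[:, perm].reshape(4, 6, 6)
same_style = bool(np.allclose(style, gram_matrix(shuffled)))
print(style.shape, same_style)
```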

Page 21: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Artistic styles as linear relations

Google Research - https://research.googleblog.com/2016/10/supercharging-style-transfer.html
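A sketch of why styles can behave linearly, based on my reading of the linked post: each style is represented by a scale and shift applied after normalising the activations, so mixing two styles amounts to averaging those parameters. Only a single layer is shown, and the activations and style parameters are made up for illustration.

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Normalise each channel of x (channels, height, width), then scale and shift."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (x - mean) / (std + eps) + beta[:, None, None]

rng = np.random.default_rng(0)
activations = rng.normal(size=(3, 4, 4))

# Hypothetical per-style parameters: each style is just a (gamma, beta) pair.
style_a = (np.array([1.0, 2.0, 0.5]), np.array([0.0, 1.0, -1.0]))
style_b = (np.array([3.0, 0.5, 1.5]), np.array([2.0, -1.0, 0.0]))

def blend(alpha):
    """Mix the two styles linearly: alpha=0 gives style A, alpha=1 gives style B."""
    gamma = (1 - alpha) * style_a[0] + alpha * style_b[0]
    beta = (1 - alpha) * style_a[1] + alpha * style_b[1]
    return instance_norm(activations, gamma, beta)

halfway = blend(0.5)
print(np.allclose(halfway, 0.5 * (blend(0.0) + blend(1.0))))
```

At this layer the half-and-half style is exactly the average of the two pure styles, because the output is linear in the style parameters.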

Page 22: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


An application: furniture embedding

Bell and Bala - Learning visual similarity for product design with convolutional neural networks
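Once products live in an embedding space, "find something that looks like this" becomes a nearest-neighbour search. The sketch below shows only that retrieval step. The catalogue, the three-dimensional vectors and the query are hypothetical; in the cited work the embeddings come from a trained convolutional network.

```python
import math

# Hypothetical 3-D embeddings; a trained network would compute these from photos.
catalogue = {
    "oak dining chair":   [0.9, 0.1, 0.0],
    "leather armchair":   [0.7, 0.6, 0.1],
    "glass coffee table": [0.1, 0.2, 0.9],
    "floor lamp":         [0.0, 0.9, 0.3],
}

def euclidean(u, v):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar(query_vector, k=2):
    """Return the k catalogue items closest to the query in embedding space."""
    ranked = sorted(catalogue, key=lambda name: euclidean(query_vector, catalogue[name]))
    return ranked[:k]

query = [0.85, 0.2, 0.05]  # embedding of a photo of a chair seen in a room
print(most_similar(query))  # ['oak dining chair', 'leather armchair']
```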

Page 23: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero

Take home message

Page 24: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Take home message

Reality is highly non-linear

Deep Learning can abstract complexity away

Complex relations become easy comparisons!

Page 25: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Page 26: From data to numbers to knowledge: semantic embeddings By Alvaro Barbero


Álvaro Barbero Jiménez

Chief Data Scientist at Instituto de Ingeniería del Conocimiento (IIC)

Supporting graphic elements obtained from:

@albarjip

Alvaro Barbero