
GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, Christopher D. Manning

Presented by Chris Kedzie

March 25, 2015

Chris Kedzie GloVe March 25, 2015 1 / 30


Overview

1 Introduction

2 Problem

3 GloVe Model

4 Experiments

1 Introduction

Word Representations: A history


Neural Language Models – Recurrent NNLM

[Figure: recurrent NNLM diagram with nodes h_{t−1}, w_t, h_t, and o_{t+1}.]
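A minimal sketch of one step of a recurrent language model like the one in the recurrent NNLM diagram, assuming the common Elman-style formulation h_t = sigmoid(E[w_t] + W h_{t−1}) and o_{t+1} = softmax(V_out h_t). The names E, W, and V_out, the toy sizes, and the random initialisation are illustrative assumptions and do not come from the slides.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(word_id, h_prev, E, W, V_out):
    """One recurrent step: combine the current word with the previous state."""
    # E[word_id] is the input embedding of w_t (a one-hot lookup).
    pre = [e + r for e, r in zip(E[word_id], matvec(W, h_prev))]
    h_t = [1.0 / (1.0 + math.exp(-a)) for a in pre]
    # o_next is a distribution over the vocabulary for the next word.
    o_next = softmax(matvec(V_out, h_t))
    return h_t, o_next


def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
vocab_size, hidden = 5, 3          # toy sizes, chosen only for illustration
E = random_matrix(vocab_size, hidden)
W = random_matrix(hidden, hidden)
V_out = random_matrix(vocab_size, hidden)

h = [0.0] * hidden
for w in [0, 3, 1]:                # a toy sequence of word ids
    h, o = rnn_step(w, h, E, W, V_out)
```

Note that the softmax sums over the whole vocabulary at every step, which is what makes this family of models expensive to train on large vocabularies.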

Neural Language Models – Continuous BOW

[Figure: continuous bag-of-words diagram; the context words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2} are averaged into w_avg, which is used to predict the output o_t.]
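The continuous bag-of-words idea can be sketched in a few lines: average the context word vectors into w_avg, score every vocabulary word against it, and normalise with a softmax. This is a toy illustration under assumed names (E_in, E_out) and hand-picked two-dimensional vectors, not the implementation behind the slides.

```python
import math


def cbow_predict(context_ids, E_in, E_out):
    """Average the context embeddings and predict a distribution for w_t."""
    d = len(E_in[0])
    n = len(context_ids)
    w_avg = [sum(E_in[c][k] for c in context_ids) / n for k in range(d)]
    # Score each candidate word by its dot product with w_avg.
    scores = [sum(a * b for a, b in zip(w_avg, out_vec)) for out_vec in E_out]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)              # normalisation over the full vocabulary
    return w_avg, [e / total for e in exps]


# Hypothetical input and output embeddings for a 5-word vocabulary.
E_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
E_out = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4], [0.1, 0.0], [1.0, 1.0]]

# Context ids play the role of w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}.
w_avg, o_t = cbow_predict([0, 1, 2, 3], E_in, E_out)
```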

Linear Relationships

Semantic: $w_{king} - w_{man} + w_{woman} \approx w_{queen}$

Syntactic: $w_{easy} - w_{easiest} + w_{luckiest} \approx w_{lucky}$
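As a rough illustration of how such linear relationships are used, the sketch below answers an analogy query by vector arithmetic followed by a cosine-similarity search. The three-dimensional vectors are hand-made for the example (they are not trained embeddings), and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to w_a - w_b + w_c, excluding the query words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors with made-up "royal", "male", "female" directions.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

answer = analogy("king", "man", "woman", vectors)
```

Excluding the query words from the candidate set is a common convention in analogy evaluation, since the nearest neighbour of the target is otherwise often one of the inputs.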


Scalable Embedding Learning

Noise Contrastive Estimation: no more normalization required!

[Figure: the continuous bag-of-words diagram again, with nodes w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2}, w_avg, and o_t.]
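To illustrate why noise contrastive estimation avoids the softmax normaliser, here is a sketch of an NCE-style loss for a single (context, target) pair. It assumes the standard formulation in which a binary classifier separates the observed word from k samples drawn from a noise distribution q, with P(D = 1 | w) = sigmoid(s(w) − log(k q(w))). The function names, the toy scores, and the uniform q are illustrative assumptions; the score s(w) is assumed to be an unnormalised dot product such as w_avg · w.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nce_loss(score_target, q_target, noise_scores, noise_qs):
    """NCE loss for one observed word and k noise words.

    Only k + 1 scores are needed, so no sum over the vocabulary appears.
    """
    k = len(noise_scores)
    # The observed word should be classified as "data".
    loss = -math.log(sigmoid(score_target - math.log(k * q_target)))
    # Each noise word should be classified as "noise".
    for s, q in zip(noise_scores, noise_qs):
        loss -= math.log(1.0 - sigmoid(s - math.log(k * q)))
    return loss


# Toy scores: a model that ranks the true word high, and one that ranks it low.
loss_good = nce_loss(5.0, 0.1, [-5.0, -5.0], [0.1, 0.1])
loss_bad = nce_loss(-5.0, 0.1, [5.0, 5.0], [0.1, 0.1])
```

The cost per training example grows with the number of noise samples k rather than with the vocabulary size V.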

2 Problem

Local Online Optimization

Lots of time is spent scanning context windows to learn a distribution for P(w | the):

Theatre or theater is a collaborative form of fine art that uses live performers to present the experience of a real or imagined event before a live audience in a specific place. The performers may communicate this experience to the audience through combinations of gesture, speech, song, music, and dance. Elements of art and stagecraft are used to enhance the physicality, presence and immediacy of the experience. The specific place of the performance is also named by the word "theatre" as derived from the Ancient Greek θέατρον (théatron, "a place for viewing"), itself from θεάομαι (theáomai, "to see", "to watch", "to observe").

Modern Western theatre comes in large measure from ancient Greek drama, from which it borrows technical terminology, classification into genres, and many of its themes, stock characters, and plot elements. Theatre artist Patrice Pavis defines theatricality, theatrical language, stage writing, and the specificity of theatre as synonymous expressions that differentiate theatre from the other performing arts, literature, and the arts in general.

Theatre today, broadly defined, includes performances of plays and musicals, ballets, operas and various other forms.

There’s got to be a better way!
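To make the scanning cost concrete, the sketch below estimates P(w | the) by visiting every occurrence of "the" and counting the words inside a fixed window around it. It is a counting illustration only: online models perform a gradient update at each window rather than a count, and the function name, window size, and lower-cased, punctuation-free sample sentence are assumptions made for the example.

```python
from collections import Counter


def context_distribution(tokens, center, window=2):
    """Estimate P(w | center) from words within +/- window positions."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != center:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


text = ("the performers may communicate this experience to the audience "
        "through combinations of gesture speech song music and dance")
tokens = text.split()
p_given_the = context_distribution(tokens, "the", window=2)
```

Every occurrence of a frequent word such as "the" triggers another pass over its window, so the work grows with corpus length even though the underlying statistics could be summarised once in a co-occurrence table.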


Matrix Factorization Methods

e.g. SVD, COALS, etc., applied directly to the co-occurrence matrix.

Main drawback: frequent words like "the" and "a" have an outsized effect on the representation learning.

3 GloVe Model

GloVe Model

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$
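A minimal sketch of evaluating the GloVe objective J on a toy problem, written directly from the formula on this slide. The weighting function f is not defined in this section; the sketch assumes the form given in the GloVe paper, f(x) = (x / x_max)^alpha for x < x_max and 1 otherwise, with x_max = 100 and alpha = 3/4. The toy matrix, parameter values, and function names are illustrative assumptions.

```python
import math


def weight(x, x_max=100.0, alpha=0.75):
    """Assumed weighting function f from the GloVe paper."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(X, W, W_tilde, b, b_tilde):
    """Weighted least-squares objective J over the non-zero entries of X."""
    J = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if x_ij <= 0:
                continue           # f(0) = 0, so zero counts contribute nothing
            dot = sum(a * c for a, c in zip(W[i], W_tilde[j]))
            err = dot + b[i] + b_tilde[j] - math.log(x_ij)
            J += weight(x_ij) * err ** 2
    return J


# Toy 2-word vocabulary with symmetric co-occurrence counts.
X = [[0, 2], [2, 0]]
W = [[0.0, 0.0], [0.0, 0.0]]
W_tilde = [[0.0, 0.0], [0.0, 0.0]]

# With biases summing to log 2 the model fits log X_ij exactly, so J is 0.
J_fit = glove_loss(X, W, W_tilde, [math.log(2), math.log(2)], [0.0, 0.0])
# With all parameters at zero the residual is -log 2 for each non-zero cell.
J_zero = glove_loss(X, W, W_tilde, [0.0, 0.0], [0.0, 0.0])
```

Because only non-zero cells contribute, the cost of one pass depends on the number of observed co-occurrence pairs rather than on V squared.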


Notation!

$X \in \mathbb{R}^{V \times V}$: word co-occurrence matrix

$X_{ij}$: frequency of word j occurring in the context of word i

$X_i = \sum_{k=1}^{V} X_{ik}$: total number of times any word occurs in the context of word i

$P_{ij} = P(j \mid i) = X_{ij} / X_i$: a.k.a. probability of word j occurring within the context of word i

$w \in \mathbb{R}^d$: a word embedding of dimension d

$\tilde{w} \in \mathbb{R}^d$: a context word embedding of dimension d
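The count-based quantities in this notation can be computed directly from a co-occurrence matrix. The sketch below uses a made-up 3-word matrix and hypothetical helper names to show X_i as a row sum and P_ij as a row-normalised count.

```python
def row_totals(X):
    """X_i: sum of row i of the co-occurrence matrix."""
    return [sum(row) for row in X]


def conditional_probs(X):
    """P_ij = X_ij / X_i, with empty rows mapped to all zeros."""
    totals = row_totals(X)
    return [[x / t if t else 0.0 for x in row] for row, t in zip(X, totals)]


# Toy symmetric co-occurrence counts for a 3-word vocabulary.
X = [
    [0, 4, 1],
    [4, 0, 3],
    [1, 3, 0],
]

X_i = row_totals(X)
P = conditional_probs(X)
```

Each row of P sums to 1, so row i can be read as the distribution over context words for word i.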


Motivation

Prob. and Ratio          k = solid        k = gas          k = water        k = fashion
P(k|ice)                 1.9 × 10^{-4}    6.6 × 10^{-5}    3.0 × 10^{-3}    1.7 × 10^{-5}
P(k|steam)               2.2 × 10^{-5}    7.8 × 10^{-4}    2.2 × 10^{-3}    1.8 × 10^{-5}
P(k|ice) / P(k|steam)    8.9              8.5 × 10^{-2}    1.36             0.96
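The ratio row can be recomputed from the two probability rows, which makes the pattern behind the motivation easy to see: a ratio well above 1 marks a probe word associated with ice but not steam, a ratio well below 1 marks the opposite, and a ratio near 1 marks a word related to both or to neither. The dictionary names below are assumptions made for the sketch; the numbers are copied from the table.

```python
p_ice = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3, "fashion": 1.7e-5}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3, "fashion": 1.8e-5}

# Ratio P(k|ice) / P(k|steam) for each probe word k.
ratios = {k: p_ice[k] / p_steam[k] for k in p_ice}

for k, r in ratios.items():
    print(f"{k:8s} {r:.3g}")
```

Ratios recomputed this way differ slightly from the tabulated row in places (solid comes out near 8.6 rather than 8.9, and fashion near 0.94 rather than 0.96), presumably because the table's probabilities are rounded to two significant figures.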

Derivation

$$F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$

$F$ should encode the information in the ratio $\frac{P_{ik}}{P_{jk}}$.

Derivation

$$F(w_i - w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$

Derivation

$$F\left((w_i - w_j)^T \tilde{w}_k\right) = \frac{P_{ik}}{P_{jk}}$$

Some more desiderata:

$F$ should be unchanged by exchanging $w \to \tilde{w}$ and $X \to X^T$.

This requires that

$$F\left((w_i - w_j)^T \tilde{w}_k\right) = \frac{F(w_i^T \tilde{w}_k)}{F(w_j^T \tilde{w}_k)} \;\Rightarrow\; F(w_i^T \tilde{w}_k) = P_{ik}$$

$$F\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{F(w_i^T \tilde{w}_k)}{F(w_j^T \tilde{w}_k)}$$

$$\exp\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{\exp(w_i^T \tilde{w}_k)}{\exp(w_j^T \tilde{w}_k)}$$
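The property that $F = \exp$ turns a difference of dot products into a quotient can be verified numerically. This is only a sanity check on random vectors, not part of the derivation; the dimension and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# Three random vectors; w_k plays the role of the context vector.
w_i, w_j, w_k = rng.normal(size=(3, d))

lhs = np.exp((w_i - w_j) @ w_k)               # F((w_i - w_j)^T w_k)
rhs = np.exp(w_i @ w_k) / np.exp(w_j @ w_k)   # F(w_i^T w_k) / F(w_j^T w_k)

print(np.isclose(lhs, rhs))
```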

Derivation

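$$\exp\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{\exp(w_i^T \tilde{w}_k)}{\exp(w_j^T \tilde{w}_k)}$$

$$w_i^T \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i$$

Since $\log X_i$ does not depend on $k$, it can be absorbed into a bias $b_i$; a second bias $\tilde{b}_k$ restores the symmetry between $w$ and $\tilde{w}$:

$$w_i^T \tilde{w}_k = \log X_{ik} - b_i - \tilde{b}_k$$

$$w_i^T \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$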

Derivation

This suggests a least-squares objective function, but...

$$J = \sum_{i,j=1}^{V} \left(w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$

$$\Rightarrow\; J = \sum_{i,j=1}^{V} f(X_{ij}) \left(w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$

where $f$ has the following desiderata:

1 $f(0) = 0$

2 $f(x)$ should be non-decreasing so that rare co-occurrences are not overweighted.

3 $f(x)$ should be relatively small for large values of $x$, so that frequent co-occurrences are not overweighted.

One such $f$:

$$f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$
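The weighting function and the weighted objective translate almost directly into NumPy. The sketch below is illustrative rather than a reference implementation: the function names and argument layout are assumptions, and the defaults `x_max=100` and `alpha=0.75` are the values quoted on the Optimization slide.

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max)^alpha below x_max and 1 otherwise, so f(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, X):
    """Weighted least-squares objective J, summed over the non-zero entries of X."""
    i, j = np.nonzero(X)  # f(0) = 0, so zero entries contribute nothing
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    err = pred - np.log(X[i, j])
    return np.sum(weight(X[i, j]) * err ** 2)
```

Restricting the sum to non-zero entries also avoids evaluating $\log 0$, which is one practical reason for requiring $f(0) = 0$.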

Weighting Function

Optimization

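$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left(w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$

where

$$f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$

In this paper: $\alpha = 3/4$ and $x_{\max} = 100$.

The model is trained using AdaGrad, stochastically sampling non-zero elements from $X$. An initial learning rate of 0.05 is used.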
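A rough sketch of such a training loop, assuming a dense toy matrix `X_toy` and a hypothetical function `train_glove`. The initialization range, the number of epochs, and starting the AdaGrad accumulators at 1 (to avoid division by zero) are choices made for this example and are not taken from the slides.

```python
import numpy as np

def train_glove(X, d=8, epochs=30, lr=0.05, x_max=100.0, alpha=0.75, seed=0):
    rng = np.random.default_rng(seed)
    V = X.shape[0]
    # Parameters: word vectors W, context vectors C, and the two bias vectors.
    params = {
        "W": rng.uniform(-0.5, 0.5, (V, d)) / d,
        "C": rng.uniform(-0.5, 0.5, (V, d)) / d,
        "b": np.zeros(V),
        "c": np.zeros(V),
    }
    # AdaGrad keeps a running sum of squared gradients for every parameter.
    hist = {name: np.ones_like(p) for name, p in params.items()}
    rows, cols = np.nonzero(X)
    losses = []
    for _ in range(epochs):
        total = 0.0
        # Visit the non-zero entries of X in random order.
        for n in rng.permutation(len(rows)):
            i, j = rows[n], cols[n]
            x = X[i, j]
            f = (x / x_max) ** alpha if x < x_max else 1.0
            err = (params["W"][i] @ params["C"][j]
                   + params["b"][i] + params["c"][j] - np.log(x))
            total += f * err ** 2
            g = 2.0 * f * err
            # Gradients of f * err^2 with respect to each touched parameter.
            grads = {"W": (i, g * params["C"][j]), "C": (j, g * params["W"][i]),
                     "b": (i, g), "c": (j, g)}
            for name, (idx, grad) in grads.items():
                hist[name][idx] += grad ** 2
                params[name][idx] -= lr * grad / np.sqrt(hist[name][idx])
        losses.append(total)
    return params, losses

# Hypothetical 3-word co-occurrence matrix.
X_toy = np.array([[0.0, 4.0, 1.0], [4.0, 0.0, 9.0], [1.0, 9.0, 0.0]])
params, losses = train_glove(X_toy)
```

Because only non-zero entries are visited, the cost of one pass scales with the number of non-zero co-occurrences rather than with $V^2$.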

GloVe

1 Introduction

2 Problem

3 GloVe Model

4 Experiments

Word Analogies

a is to b as c is to ?

Paris is to France as Tokyo is to ?

$$\arg\max_{w'} \; \text{cosine-sim}(w_b - w_a + w_c, \, w')$$
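The arg max can be sketched in a few lines of NumPy. The vectors below are hand-made 2-d toy values, not trained embeddings, and skipping the three query words when searching is a common convention assumed here rather than something stated on the slide.

```python
import numpy as np

def analogy(emb, a, b, c):
    """Return the word w' maximizing cosine-sim(w_b - w_a + w_c, w')."""
    target = emb[b] - emb[a] + emb[c]
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):  # assumption: the query words are excluded
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# Toy vectors: the first axis loosely encodes "which country",
# the second axis encodes "city vs. country".
emb = {
    "Paris": np.array([1.0, 0.0]),
    "France": np.array([1.0, 1.0]),
    "Tokyo": np.array([3.0, 0.0]),
    "Japan": np.array([3.0, 1.0]),
    "cabbage": np.array([-1.0, 0.5]),
}
answer = analogy(emb, "Paris", "France", "Tokyo")
```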

Word Analogies – Results

Word Similarities

Humans scored similarity of word pairs.

word 1    word 2     human score (mean, 0-10)    cosine similarity (-1 to 1)
king      cabbage    0.23                        0.11
king      queen      8.58                        0.78
king      rook       5.92                        0.25

Embeddings are evaluated by the Spearman rank correlation between human scores and cosine similarity.
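For the three pairs in the table, the evaluation can be sketched as follows. The `spearman` helper is a minimal version that assumes no tied values; a library routine would be the usual choice for real data with ties.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation, assuming no tied values in x or y."""
    rank_x = np.argsort(np.argsort(x))  # rank of each element of x
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]  # Pearson correlation of the ranks

# Values copied from the table: (king, cabbage), (king, queen), (king, rook).
human = [0.23, 8.58, 5.92]
cosine = [0.11, 0.78, 0.25]
rho = spearman(human, cosine)
```

Only the ordering of the pairs matters: the two columns live on different scales, but they rank the three pairs identically.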

Word Similarities – Results

Named Entity Recognition

NER is a sequence tagging task where the goal is to identify named entities:

Jim    bought  300  shares  of  Acme   Corp   .      in  2006    .
B-PER  O       O    O       O   B-ORG  I-ORG  I-ORG  O   B-TIME  O

The discrete features of an existing system (Stanford NER) were combined with word embeddings, which were treated as additional features in a linear-chain CRF model.
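One way to picture "embeddings as additional features" is a per-token feature dictionary in which each embedding dimension becomes one real-valued feature next to the discrete ones. The feature names, the `token_features` helper and the toy vectors are all hypothetical; they are not the feature set of Stanford NER.

```python
import numpy as np

def token_features(tokens, pos, emb, dim):
    """Discrete features plus one real-valued feature per embedding dimension."""
    word = tokens[pos]
    feats = {
        "word.lower=" + word.lower(): 1.0,
        "is_capitalized": float(word[:1].isupper()),
        "is_digit": float(word.isdigit()),
    }
    # Words without an embedding fall back to a zero vector.
    vec = emb.get(word.lower(), np.zeros(dim))
    for k, value in enumerate(vec):
        feats[f"emb_{k}"] = float(value)
    return feats

tokens = "Jim bought 300 shares of Acme Corp . in 2006 .".split()
emb = {"acme": np.array([0.2, -0.1]), "jim": np.array([0.5, 0.3])}  # toy vectors
features = [token_features(tokens, p, emb, dim=2) for p in range(len(tokens))]
```

A sequence model such as a linear-chain CRF would then consume one such dictionary per token.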

Named Entity Recognition – Results

The end! Thanks!
