christopher potts - stanford university...overview designs vector comparison basic reweighting...
TRANSCRIPT
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Distributed word representations
Christopher Potts
Stanford Linguistics
CS 224U: Natural language understandingApril 8 and 13
1 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Plan
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
2 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Meaning latent in co-occurrence patterns
against age agent ages ago agree ahead ain’t air aka al
against 2003 90 39 20 88 57 33 15 58 22 24age 90 1492 14 39 71 38 12 4 18 4 39
agent 39 14 507 2 21 5 10 3 9 8 25ages 20 39 2 290 32 5 4 3 6 1 6ago 88 71 21 32 1164 37 25 11 34 11 38
agree 57 38 5 5 37 627 12 2 16 19 14ahead 33 12 10 4 25 12 429 4 12 10 7
ain’t 15 4 3 3 11 2 4 166 0 3 3air 58 18 9 6 34 16 12 0 746 5 11
aka 22 4 8 1 11 19 10 3 5 261 9al 24 39 25 6 38 14 7 3 11 9 861
3 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Meaning latent in co-occurrence patterns
Class Word
0 awful0 terrible0 lame0 worst0 disappointing1 nice1 amazing1 wonderful1 good1 awesome
Pr(Class = 1) Word
? w1? w2? w3? w4
A hopeless learning scenario
3 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Meaning latent in co-occurrence patterns
Class Word
0 awful0 terrible0 lame0 worst0 disappointing1 nice1 amazing1 wonderful1 good1 awesome
Pr(Class = 1) Word
? w1? w2? w3? w4
A hopeless learning scenario
3 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Meaning latent in co-occurrence patterns
Class Word excellent terrible
0 awful 6 1130 terrible 8 3090 lame 1 690 worst 9 2020 disappointing 19 291 nice 118 21 amazing 91 61 wonderful 66 71 good 21 91 awesome 67 2
Pr(Class=1) Word excellent terrible
≈0 w1 4 82≈0 w2 5 84≈1 w3 49 3≈1 w4 41 5
A promising learning scenario
3 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Meaning latent in co-occurrence patterns
Class Word excellent terrible
0 awful 6 1130 terrible 8 3090 lame 1 690 worst 9 2020 disappointing 19 291 nice 118 21 amazing 91 61 wonderful 66 71 good 21 91 awesome 67 2
Pr(Class=1) Word excellent terrible
≈0 w1 4 82≈0 w2 5 84≈1 w3 49 3≈1 w4 41 5
A promising learning scenario
3 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
High-level goals
1. Begin thinking about how vectors can encode themeanings of linguistic units.
2. Foundational concepts for vector-space model (VSMs).
3. A foundation for deep learning NLU models.
4. In your assignment and projects, you’re likely to userepresentations like these:É to understand and model linguistic and social
phenomena; and/orÉ as inputs to other machine learning models.
4 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Associated materials
1. Codea. vsm.pyb. vsm_01_distributional.ipynbc. vsm_02_dimreduce.ipynbd. vsm_03_retrofitting.ipynb
2. Homework 1 and bake-off 1: hw_wordsim.ipynb3. Screencasts:
a. Overview [link]b. Vector comparison [link]c. Reweighting [link]d. Dimensionality reduction [link]
4. Core readings: Turney and Pantel 2010; Smith 2019;Pennington et al. 2014; Faruqui et al. 2015
5 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Guiding hypothesesFirth (1957)“You shall know a word by the company it keeps.”
Firth (1957)“the complete meaning of a word is always contextual, andno study of meaning apart from context can be takenseriously.”
Wittgenstein (1953)“the meaning of a word is its use in the language”
Harris (1954)“distributional statements can cover all of the material of alanguage without requiring support from other types ofinformation.”
Turney and Pantel (2010)“If units of text have similar vectors in a text frequencymatrix, then they tend to have similar meanings.”
6 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Great power, a great many design choicestokenizationannotationtaggingparsingfeature selection... cluster texts by date/author/discourse context/. . .⇓ w
Matrix design
word × documentword × wordword × search proximityadj. × modified nounword × dependency rel.
...
Reweighting
probabilitieslength norm.TF-IDFPMIPositive PMI
...
Dimensionalityreduction
LSAPLSALDAPCANNMF
...
Vectorcomparison
EuclideanCosineDiceJaccardKL
...
Nearly the full cross-product to explore; only a handful of the com-binations are ruled out mathematically. Models like GloVe andword2vec offer packaged solutions to design/weighting/reduction andreduce the importance of the choice of comparison method.
7 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Designs
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
8 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word x word
against age agent ages ago agree ahead ain’t air aka al
against 2003 90 39 20 88 57 33 15 58 22 24age 90 1492 14 39 71 38 12 4 18 4 39
agent 39 14 507 2 21 5 10 3 9 8 25ages 20 39 2 290 32 5 4 3 6 1 6ago 88 71 21 32 1164 37 25 11 34 11 38
agree 57 38 5 5 37 627 12 2 16 19 14ahead 33 12 10 4 25 12 429 4 12 10 7
ain’t 15 4 3 3 11 2 4 166 0 3 3air 58 18 9 6 34 16 12 0 746 5 11
aka 22 4 8 1 11 19 10 3 5 261 9al 24 39 25 6 38 14 7 3 11 9 861
9 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word x document
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
against 0 0 0 1 0 0 3 2 3 0age 0 0 0 1 0 3 1 0 4 0
agent 0 0 0 0 0 0 0 0 0 0ages 0 0 0 0 0 2 0 0 0 0ago 0 0 0 2 0 0 0 0 3 0
agree 0 1 0 0 0 0 0 0 0 0ahead 0 0 0 1 0 0 0 0 0 0
ain’t 0 0 0 0 0 0 0 0 0 0air 0 0 0 0 0 0 0 0 0 0
aka 0 0 0 1 0 0 0 0 0 0
10 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word x discourse context
Upper left corner of an interjection × dialog-act tag matrixderived from the Switchboard Dialog Act Corpus:
% + ˆ2 ˆg ˆh ˆq aa
absolutely 0 2 0 0 0 0 95actually 17 12 0 0 1 0 4anyway 23 14 0 0 0 0 0
boy 5 3 1 0 5 2 1bye 0 1 0 0 0 0 0
bye-bye 0 0 0 0 0 0 0dear 0 0 0 0 1 0 0
definitely 0 2 0 0 0 0 56exactly 2 6 1 0 0 0 294
gee 0 3 0 0 2 1 1goodness 1 0 0 0 2 0 0
11 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
phonological segment × feature valuesDerived from http://www.linguistics.ucla.edu/people/hayes/120a/.Dimensions: (141× 28).
sylla
bic
stre
ss
long
conso
nan
tal
sonor
ant
conti
nuan
t
del
ayed
.rel
ease
app
roxi
man
t
tap
trill
· · ·
6 1 −1 −1 −1 1 1 0 1 −1 −1
· · ·
A 1 −1 −1 −1 1 1 0 1 −1 −1Œ 1 −1 −1 −1 1 1 0 1 −1 −1a 1 −1 −1 −1 1 1 0 1 −1 −1æ 1 −1 −1 −1 1 1 0 1 −1 −12 1 −1 −1 −1 1 1 0 1 −1 −1O 1 −1 −1 −1 1 1 0 1 −1 −1o 1 −1 −1 −1 1 1 0 1 −1 −1G 1 −1 −1 −1 1 1 0 1 −1 −1@ 1 −1 −1 −1 1 1 0 1 −1 −1...
...
12 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
phonological segment × feature valuesDerived from http://www.linguistics.ucla.edu/people/hayes/120a/.Dimensions: (141× 28).
ɒɑ
ɶ
aæ
ʌ
ɔ
o ɤ
ɘ
œ
əә
e
ɞ
ø
ɛ
ɵ
ɯu
ʊ
ɨʉ
y i
ʏɪ
ŋ
ʟ
ɫ
ɴ
ʀ
ɲ
ʎ
ŋŋ
ʟʟ
ɳ
ʙ
ɭ
ɺ
ɻ
ɽr
nm
l
ɾ
ɱ
ʔɣx
kg
kxg ɣ
ħʕ
ʁq
χ
ɢ
ɕ
ɟ
ʝ
c
ç
dʑtç
ɣɣ
xx
k k
g g
ʑ
ʈ ɖ
ɬ
ʐ
ɸ
ʂ ʒ
z
v
t
ʃ
s
p
f
d
b
θ
ɮ
ð
β
dʒdz
dɮ
d ɮ tʃtɬ tstɬ ts
tɬ d zd ɮ ʈʂ
ɖʑ
pf bv
pɸbβ
tθdð
cçɟʝ
kxkxgɣ
g ɣqχɢʁ
ɧ
kpgb
pt
bd
ɰɰ
w ɥ
jɹ
ʋ
ʍ ɦh
12 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Feature representations of data
• the movie was horrible becomes [4,0,1/4].
• The complex, real-world response of an experimentalsubject to a particular example becomes [0,1] or [118,1].
• A human is modeled as a vector [24,140,5,12].
• A continuous, noisy speech stream is reduced to arestricted set of acoustic features.
13 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Other designs
• word × dependency rel.• word × syntactic context• adj. × modified noun• word × search query• person × product• word × person• word × word × pattern• verb × subject × object...
14 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Windows and scaling: What is a co-occurrence?
• Larger, flatter windows capture more semanticinformation.
• Small, more scaled windows capture more syntactic(collocational) information.
• Textual boundaries can be separately controlled; coreunit as the sentence/paragraph/document will havemajor consequences.
15 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Windows and scaling: What is a co-occurrence?
from swerve of shore to bend of bay , brings
4 3 2 1 0 1 2 3 4 5
• Larger, flatter windows capture more semanticinformation.
• Small, more scaled windows capture more syntactic(collocational) information.
• Textual boundaries can be separately controlled; coreunit as the sentence/paragraph/document will havemajor consequences.
15 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Windows and scaling: What is a co-occurrence?
from swerve of shore to bend of bay , brings
4 3 2 1 0 1 2 3 4 5
from swerve of shore to bend of bay , brings
Window: 3 4 3 2 1 0 1 2 3 4 5
Scaling: flat 0 1 1 1 1 1 1 1 0 0
Scaling: 1n 0 1
312
11 1 1
112
13 0 0
• Larger, flatter windows capture more semanticinformation.
• Small, more scaled windows capture more syntactic(collocational) information.
• Textual boundaries can be separately controlled; coreunit as the sentence/paragraph/document will havemajor consequences.
15 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Windows and scaling: What is a co-occurrence?
from swerve of shore to bend of bay , brings
4 3 2 1 0 1 2 3 4 5
from swerve of shore to bend of bay , brings
Window: 3 4 3 2 1 0 1 2 3 4 5
Scaling: flat 0 1 1 1 1 1 1 1 0 0
Scaling: 1n 0 1
312
11 1 1
112
13 0 0
• Larger, flatter windows capture more semanticinformation.
• Small, more scaled windows capture more syntactic(collocational) information.
• Textual boundaries can be separately controlled; coreunit as the sentence/paragraph/document will havemajor consequences.
15 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Windows and scaling: What is a co-occurrence?
from swerve of shore to bend of bay , brings
4 3 2 1 0 1 2 3 4 5
• Larger, flatter windows capture more semanticinformation.
• Small, more scaled windows capture more syntactic(collocational) information.
• Textual boundaries can be separately controlled; coreunit as the sentence/paragraph/document will havemajor consequences.
15 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippets
snippets
March 31, 2019
In [1]: import syssys.path.append("../cs224u")
In [3]: import osimport pandas as pd
DATA_HOME = os.path.join('data', 'vsmdata')
# IMDB: Window size = 5; scaling = 1/nimdb5 = pd.read_csv(
os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
# IMDB: Window size = 20; scaling = flatimdb20 = pd.read_csv(
os.path.join(DATA_HOME, 'imdb_window20-flat.csv.gz'), index_col=0)
# Gigaword: Window size = 5; scaling = 1/ngiga5 = pd.read_csv(
os.path.join(DATA_HOME, 'giga_window5-scaled.csv.gz'), index_col=0)
# Gigaword: Window size = 20; scaling = flatgiga20 = pd.read_csv(
os.path.join(DATA_HOME, 'giga_window20-flat.csv.gz'), index_col=0)
In [48]: import osimport pandas as pdimport vsm
In [32]: ABC = pd.DataFrame([[ 2.0, 4.0],[10.0, 15.0],[14.0, 10.0]],index=['A', 'B', 'C'],columns=['x', 'y'])
In [33]: A = ABC.loc['A']B = ABC.loc['B']
1
16 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Vector comparison
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
17 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Running example
dx dy
A 2 4B 10 15C 14 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(10,15)B
(2,4)A
(14,10)C
●
●
●
• Focus on distance measures• Illustrations with row vectors
18 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
EuclideanBetween vectors u and v of dimension n:
euclidean(u,v) =
√
√
√
√
n∑
i=1
|ui − vi|2
dx dy
A 2 4B 10 15C 14 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(10,15)B
(2,4)A
(14,10)C
10 − 142 + 15 − 102 = 6.4
2 − 102 + 4 − 152 = 13.6
●
●
●
19 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Length normalization
Given a vector u of dimension n, the L2-length of u is
||u||2 =
√
√
√
√
n∑
i=1
u2i
and the length normalization of u is�
u1
||u||2,
u2
||u||2, · · · ,
un
||u||2
�
20 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Length normalization
dx dy ||u||2A 2 4 4.47B 10 15 18.03C 14 10 17.20
row L2 norm⇒
dx dy
A 24.47
44.47
B 1018.03
1518.03
C 1417.20
1017.20
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(10,15)B
(2,4)A
(14,10)C
●
●
●
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
(0.55,0.83)B
(0.45,0.89)A
(0.81,0.58)C
●
●
●
20 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Cosine distanceBetween vectors u and v of dimension n:
cosine(u,v) = 1−
∑ni=1 ui × vi
||u||2 × ||v||2
dx dy
A 2 4B 10 15C 14 10
21 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Cosine distanceBetween vectors u and v of dimension n:
cosine(u,v) = 1−
∑ni=1 ui × vi
||u||2 × ||v||2
dx dy
A 2 4B 10 15C 14 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(10,15)B
(2,4)A
(14,10)C
1 −(10
× 14) + (15
× 10)
||10, 1
5||× ||14
, 10||= 0.
065
1−
(2× 1
0)+ (
4× 1
5)
||2, 4
|| ×||1
0, 1
5||=
0.00
8
●
●
●
21 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Cosine distanceBetween vectors u and v of dimension n:
cosine(u,v) = 1−
∑ni=1 ui × vi
||u||2 × ||v||2
dx dy
A 2 4B 10 15C 14 10
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
(0.55,0.83)B
(0.45,0.89)A
(0.81,0.58)C
1 −(0.
55× 0.
81) + (0.
83× 0.
58)
||0.55
, 0.8
1||× ||0.
81, 0
.58||= 0.
065
1−
(0.4
5×
0.55
) +(0
.89
×0.
83)
||0.4
5, 0
.89||
×||0
.55,
0.8
3||=
0.00
8
●
●
●
21 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Matching-based methods
Matching coefficientmatching(u,v) =
n∑
i=1
min(ui,vi)
Jaccard distancejaccard(u,v) = 1−
matching(u,v)∑n
i=1 max(ui,vi)
Dice distancedice(u,v) = 1−
2×matching(u,v)∑n
i=1 ui + vi
Overlapoverlap(u,v) = 1−
matching(u,v)
min�∑n
i=1 ui ,∑n
i=1 vi�
22 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
KL divergenceBetween probability distributions p and q:
D(p ‖ q) =n∑
i=1
pi log�
pi
qi
�
p is the reference distribution. Before calculation, smooth byadding ε.
23 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
KL divergence
dx dy
A 2 4B 10 15C 14 10
Normalize the rows⇒
dx dy
A 0.33 0.67B 0.40 0.60C 0.58 0.42
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
(10,15)B
(2,4)A
(14,10)C
●
●
●
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
(0.4,0.6)B
(0.33,0.67)A
(0.58,0.42)C
(0.33 × log(0.330.4
)) + (0.67 × log(0.670.6
)) = 0.01
(0.4 × log( 0.40.58
)) + (0.6 × log( 0.60.42
)) = 0.13
●
●
●
23 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
KL variantsSymmetric KL
D(p ‖ q) +D(q ‖ p)
KL-divergence with skew
D(p ‖ αq+ (1− α)p) 0 ≤ α ≤ 1
p q
α = 1 ; SkewKL = 1.17
0.10.2
0.7
0.10.2
0.7
p q
α = 0.8 ; SkewKL = 0.63
0.10.2
0.7
0.200.22
0.58
p q
α = 0.5 ; SkewKL = 0.25
0.10.2
0.7
0.2
0.40.4
p q
α = 0.2 ; SkewKL = 0.05
0.10.2
0.7
0.200.22
0.58
p q
α = 0 ; SkewKL = 0
0.10.2
0.7
0.10.2
0.7
Jensen–Shannon distance√
√
√1
2D�
p ‖p+ q
2
�
+1
2D�
q ‖p+ q
2
�
24 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Relationships and generalizations
1. Euclidean, Jaccard, and Dice with raw count vectors willtend to favor raw frequency over distributional patterns.
2. Euclidean with L2-normed vectors is equivalent to cosinew.r.t. ranking (Manning and Schütze 1999:301).
3. Jaccard and Dice are equivalent w.r.t. ranking.
4. Both L2-norms and probability distributions can obscuredifferences in the amount/strength of evidence, whichcan in turn have an effect on the reliability of cosine,normed-euclidean, and KL divergence. Theseshortcomings might be addressed through weightingschemes.
25 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Proper distance metric?To qualify as a distance metric, a vector comparison methodd has to be symmetric (d(x,y) = d(y,x)), assign 0 to identicalvectors (d(x,x) = 0), and satisfy the triangle inequality:
d(x, z) ¶ d(x,y) + d(y, z)
Cosine distance as I defined itdoesn’t satisfy this:
Distance metric?
Yes: Euclidean, Jaccard forbinary vectors, Jensen–Shannon,cosine as
cos−1�∑n
i=1ui × vi
||u||2×||v||2
�
π
No: Matching, Jaccard, Dice,Overlap, KL divergence,Symmetric KL, KL with skew
26 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Comparing the two versions of cosineRandom sample of 100 vectors from our giga20 countmatrix. Correlation is 99.8.
0.2 0.3 0.4 0.5Improper cosine distance
0.175
0.200
0.225
0.250
0.275
0.300
0.325
0.350
Prop
er c
osin
e di
stan
ce
27 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippetsIn [ ]: import sys
sys.path.append("../cs224u")
DATA_HOME = os.path.join('../cs224u/data', 'vsmdata')
In [1]: import osimport pandas as pdimport vsm
In [2]: ABC = pd.DataFrame([[ 2.0, 4.0],[10.0, 15.0],[14.0, 10.0]], index=['A', 'B', 'C'], columns=['x', 'y'])
In [3]: vsm.euclidean(ABC.loc['A'], ABC.loc['B'])
Out[3]: 13.601470508735444
In [4]: vsm.vector_length(ABC.loc['A'])
Out[4]: 4.47213595499958
In [5]: vsm.length_norm(ABC.loc['A']).values
Out[5]: array([0.4472136 , 0.89442719])
In [6]: vsm.cosine(ABC.loc['A'], ABC.loc['B'])
Out[6]: 0.007722123286332261
In [7]: vsm.matching(ABC.loc['A'], ABC.loc['B'])
Out[7]: 6.0
In [8]: vsm.jaccard(ABC.loc['A'], ABC.loc['B'])
Out[8]: 0.76
In [9]: DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
In [10]: vsm.cosine(imdb5.loc['good'], imdb5.loc['excellent'])
Out[10]: 0.9644382411451131
In [11]: vsm.cosine(imdb5.loc['good'], imdb5.loc['bad'])
Out[11]: 0.9480014759326252
2
28 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippetsIn [ ]:
In [ ]:
In [9]: DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
In [10]: vsm.cosine(imdb5.loc['good'], imdb5.loc['excellent'])
Out[10]: 0.9644382411451131
In [11]: vsm.cosine(imdb5.loc['good'], imdb5.loc['bad'])
Out[11]: 0.9480014759326252
In [12]: vsm.neighbors('bad', imdb5).head()
Out[12]: bad 0.000000guys 0.823744. 0.844851taste 0.893747guy 0.896312dtype: float64
In [13]: vsm.neighbors('bad', imdb5, distfunc=vsm.jaccard).head(3)
Out[13]: bad 0.000000think 0.783744better 0.788782dtype: float64
In [ ]:
3
28 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Basic reweighting
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
29 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Goals of reweighting
• Amplify the important, the trustworthy, the unusual;deemphasize the mundane and the quirky.
• Absent a defined objective function, this will remainfuzzy.
• The intuition behind moving away from raw counts isthat frequency is a poor proxy for the above values.
• So we should ask of each weighting scheme:É How does it compare to the raw count values?É How does it compare to the word frequencies?É What overall distribution of values does it deliver?
• We hope to do no feature selection based on counts,stopword dictionaries, etc. Rather, we want our methodsto reveal what’s important without these ad hocinterventions.
30 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
NormalizationL2 norming (repeated from earlier)Given a vector u of dimension n, the L2-length of u is
||u||2 =
√
√
√
√
n∑
i=1
u2i
and the length normalization of u is�
u1
||u||2,
u2
||u||2, · · · ,
un
||u||2
�
Probability distributionGiven a vector u of dimension n containing all positive values, let
sum(u) =n∑
i=1
ui
and then the probability distribution of u is�
u1
sum(u),
u2
sum(u), · · · ,
un
sum(u)
�
31 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Observed/Expected
rowsum(X, i) =n∑
j=1
Xij colsum(X, j) =m∑
i=1
Xij sum(X) =m∑
i=1
n∑
j=1
Xij
expected(X, i, j) =rowsum(X, i) · colsum(X, j)
sum(X)
oe(X, i, j) =Xij
expected(X, i, j)
32 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Observed/Expected
rowsum(X, i) =n∑
j=1
Xij colsum(X, j) =m∑
i=1
Xij sum(X) =m∑
i=1
n∑
j=1
Xij
expected(X, i, j) =rowsum(X, i) · colsum(X, j)
sum(X)
oe(X, i, j) =Xij
expected(X, i, j)
a b rowsum
x 34 11 45y 47 7 54
colsum 81 18 99
oe⇒
a b
x 3445·81
99
1145·18
99
y 4754·81
99
754·18
99
32 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Observed/Expected
rowsum(X, i) =n∑
j=1
Xij colsum(X, j) =m∑
i=1
Xij sum(X) =m∑
i=1
n∑
j=1
Xij
expected(X, i, j) =rowsum(X, i) · colsum(X, j)
sum(X)
oe(X, i, j) =Xij
expected(X, i, j)
Observed
tabs reading birds
keep 20 20 20enjoy 1 20 20
keep and tabs co-occur more thanexpected given their frequencies,enjoy and tabs less than expected
Expected
tabs reading birds
keep 60·21101
60·40101
60·40101
enjoy 41·21101
41·40101
41·40101
=
tabs reading birds
keep 12.48 23.76 23.76enjoy 8.5 16.24 16.24
32 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Pointwise Mutual Information (PMI)
PMI is observed/expected in log-space (with log(0) = 0):
pmi(X, i, j) = log�
Xij
expected(X, i, j)
�
= log�
P(Xij)
P(Xi∗) · P(X∗j)
�
d1 d2 d3 d4
A 10 10 10 10B 10 10 10 0C 10 10 0 0D 0 0 0 1
⇒
P(w,d) P(w)
A 0.11 0.11 0.11 0.11 0.44B 0.11 0.11 0.11 0.00 0.33C 0.11 0.11 0.00 0.00 0.22D 0.00 0.00 0.00 0.01 0.01
P(d) 0.33 0.33 0.22 0.12
PMI⇓
d1 d2 d3 d4
A −0.28 −0.28 0.13 0.73B 0.01 0.01 0.42 0.00C 0.42 0.42 0.00 0.00D 0.00 0.00 0.00 2.11
33 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Selected PMI valuesSelected PMI values
P(word)
P(context)
P(word, context) =
1.02
0
-0.67
-1.18
0.51 0.17 -0.08
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0 0.25
34 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Positive PMI
The issuePMI is actually undefined when Xij = 0. The usual response isthe one given above: set PMI to 0 in such cases. However,this is arguably not coherent (Levy and Goldberg 2014):
• Larger than expected count ⇒ large PMI• Smaller than expected count ⇒ small PMI• 0 count ⇒ placed right in the middle!?
PPMI
ppmi(X, i, j) = max(0,pmi(X, i, j))
35 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
TF-IDFFor a corpus of documents D:
• Term frequency (TF): P(w|d)
• Inverse document frequency (IDF): log�
|D|�
�{d∈D:w∈d}�
�
�
(log(0) = 0)
• TF-IDF: TF × IDF
d1 d2 d3 d4
A 10 10 10 10B 10 10 10 0C 10 10 0 0D 0 0 0 1
⇒IDF
A 0.00B 0.29C 0.69D 1.39
⇓TF
d1 d2 d3 d4
A 0.33 0.33 0.50 0.91B 0.33 0.33 0.50 0.00C 0.33 0.33 0.00 0.00D 0.00 0.00 0.00 0.09
TF-IDFd1 d2 d3 d4
A 0.00 0.00 0.00 0.00B 0.10 0.10 0.14 0.00C 0.23 0.23 0.00 0.00D 0.00 0.00 0.00 0.13
36 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
IDF values
docCount
0 1 2 3 4 5 6 7 8 9 10
000.110.220.360.510.690.92
1.2
1.61
2.3
= corpus size
IDF
= lo
g(10
/ do
cCou
nt)
37 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Selected TF-IDF valuesSelected TF-IDF values
TF
docCount
0.23
0.07
0.01
0.46
0.14
0.02
1.15
0.35
0.05
2.3
0.69
0.11
0.11
0.18
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0
1
2
3
4
5
6
7
8
9
10
38 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Other weighting/normalization schemes
• t-test: P(w,d)−P(w)P(d)pP(w)P(d)
• TF-IDF variants that seek to be sensitive to the empiricaldistribution of words (For discussion and references,Manning and Schütze 1999:553.)
• Pairwise distance matrices:
dx dy
A 2 4B 10 15C 14 10
cosine⇒
A B C
A 0 0.008 0.116B 0.008 0 0.065C 0.116 0.065 0
39 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Weighting scheme cell-value distributions
0.00 0.25 0.50 0.75 1.00 1.25Cell value 1e8
101
103
105
107
Raw counts
0.0 0.2 0.4 0.6 0.8 1.0Cell value
101
102
103
104
105
106
107
L2 norming
0.0 0.2 0.4 0.6 0.8Cell value
101
103
105
107
Probabilities
0.0 0.5 1.0 1.5 2.0 2.5 3.0Cell value 1e7
101
103
105
107
Observed/Expected
10 5 0 5 10 15Cell value
101
102
103
104
105
106
107
PMI
0 5 10 15Cell value
101
102
103
104
105
106
107
PPMI
0.0 0.5 1.0 1.5 2.0 2.5Cell value
101
103
105
107
TF-IDF
Uses the giga5 matrix loaded earlier. Others look similar.40 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Weighting scheme relationships to counts
Uses the giga5 matrix loaded earlier. Others look similar.41 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Relationships and generalizations• The theme running through nearly all these schemes is
that we want to weight a cell value Xij relative to thevalue we expect given Xi∗ and X∗j.
• Many weighting schemes end up favoring rare eventsthat may not be trustworthy.
• The magnitude of counts can be important; [1,10] and[1000,10000] might represent very different situations;creating probability distributions or length normalizingwill obscure this.
• PMI and its variants will amplify the values of counts thatare tiny relative to their rows and columns.Unfortunately, with language data, these are often noise
• TF-IDF severely punishes words that appear in manydocuments – it behaves oddly for dense matrices, whichcan include word × word matrices.
42 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippets
vsm_01_code4_solved
April 1, 2019
In [1]: import osimport pandas as pdimport vsm
DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
imdb5_oe = vsm.observed_over_expected(imdb5)
imdb5_norm = imdb5.apply(vsm.length_norm, axis=1)
imdb5_ppmi = vsm.pmi(imdb5)
imdb5_pmi = vsm.pmi(imdb5, positive=False)
imdb5_tfidf = vsm.tfidf(imdb5)
In [2]: vsm.neighbors('bad', imdb5).head()
Out[2]: bad 0.000000guys 0.823744. 0.844851taste 0.893747guy 0.896312dtype: float64
In [3]: vsm.neighbors('bad', imdb5_ppmi).head()
Out[3]: bad 0.000000good 0.701241awful 0.757309terrible 0.763324horrible 0.763637dtype: float64
1
43 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippets
vsm_01_code4_solved
April 1, 2019
In [1]: import osimport pandas as pdimport vsm
DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
imdb5_oe = vsm.observed_over_expected(imdb5)
imdb5_norm = imdb5.apply(vsm.length_norm, axis=1)
imdb5_ppmi = vsm.pmi(imdb5)
imdb5_pmi = vsm.pmi(imdb5, positive=False)
imdb5_tfidf = vsm.tfidf(imdb5)
In [2]: vsm.neighbors('bad', imdb5).head()
Out[2]: bad 0.000000guys 0.823744. 0.844851taste 0.893747guy 0.896312dtype: float64
In [3]: vsm.neighbors('bad', imdb5_ppmi).head()
Out[3]: bad 0.000000good 0.701241awful 0.757309terrible 0.763324horrible 0.763637dtype: float64
1
43 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Subword information
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
44 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Motivation
1. Schütze (1993) pioneered subword modeling to improverepresentations by reducing sparsity, thereby increasingthe density of connections in a VSM.
2. Subword modeling will also
a. Pull morphological variants closer togetherb. Facilitate modeling out-of-vocabulary itemsc. Reduce the importance of any particular tokenization
scheme
45 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Technique
Bojanowski et al. (2016) (the fastText team) motivate astraightforward approach:
1. Given a word-level VSM, the vector for a character-leveln-gram x is the sum of all the vectors of wordscontaining x.
2. Represent each word w as the sum of its character-leveln-grams.
3. Add in the representation of w if available
A linguistically richer variant might use sequences ofmorphemes rather than characters.
Example with 4-gramssuperbly becomes
[<w>sup, supe, uper, perb, erbl, rbly, bly</w>]
46 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippets
vsm_01_code5_solved
April 1, 2019
In [1]: import osimport pandas as pdimport vsm
DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
In [2]: imdb5_ngrams = vsm.ngram_vsm(imdb5, n=4)
In [3]: imdb5_ngrams.loc['<w>sup'].values
Out[3]: array([3.41545000e+03, 3.70000000e+01, 4.95458333e+04, ...,2.23950000e+02, 4.64833333e+01, 3.12166667e+01])
In [4]: imdb5_ngrams.shape
Out[4]: (9806, 5000)
In [5]: vsm.get_character_ngrams("superbly", n=4)
Out[5]: ['<w>sup', 'supe', 'uper', 'perb', 'erbl', 'rbly', 'bly</w>']
In [6]: def character_level_rep(word, cf, n=4):ngrams = vsm.get_character_ngrams(word, n)ngrams = [n for n in ngrams if n in cf.index]reps = cf.loc[ngrams].valuesreturn reps.sum(axis=0)
In [7]: superbly = character_level_rep("superbly", imdb5_ngrams)
In [8]: superbly.shape
Out[8]: (5000,)
1
47 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Word-piece tokenizing
Later in the term, we’ll encounter tokenizers that break somewords into subword chunks:
Encode this sentence.
En##codethissentence.
Bert knows Snuffleupagus
BertknowsS##nu##ffle##up##agu##s
48 / 90
https://github.com/google/sentencepiece
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Visualization
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
49 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Techniques
• Our goal is to visualize very high-dimensional spaces intwo or three dimensions. This will inevitably involvecompromises.
• Still, visualization can give you a feel for what is in yourVSM, especially if you pair it with other kinds ofqualitative exploration (e.g., using vsm.neighbors).
• There are many visualization techniques implemented insklearn.manifold; see this user guide for an overviewand discussion of trade-offs.
50 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
t-SNE on the giga20 PPMI VSM
80 60 40 20 0 20 40 60 80
60
40
20
0
20
40
60
!
);
.
...
;p
</p><p>
?
___
abandoned
abdomen
abduct
ability
able
abortion
abraham
abroad absence
absorb
abstain
abstract
abu
abundance
abuse
accelerate
acceptaccepted
access
accident
accommodation
accomplish
accord
according
accountaccountingaccounts
accumulate
accuse
accused
ache
achieve
acknowledge
acknowledged
acoustic
acquire
acquit
acre
across acrylic
act
acting
action
actions
active
activists
activities
activity
actoractress
actually
ad
add
added
adding
additionadditional
address
adhesive
adjourn
adjustment
administration
admiralty
admire
admission
admit
admitted
adopt
adore
adorn
adult
adv
advance
advanced
advantage
adventure
adversaryadvertisement
advertiseradvertising
advice
advise
adviser
aerial
aeronautics
affairs
affect
affected
afghanafghanistan
afraid
africa
african
afternoon
age
agencies
agency
agenda
agent
agents
aggravate
aggression
aggressive
ago
agony
agreeagreed
agreement
ahead
aid
aides
aids
aim
aimed
airaircraft
airlineairlines
airplane
airport
aisle
al
al-qaida
alan
alarm
albanian
albert
alcohol
alexander
algorithm
ali
alien
alive
allegations
allegedallegedly
allen
alley
alliance
allies
allow
allowedallowing
allows
alloy
almost
alone
along
alphabet
alreadyalso
alt
alter
alternative
although
aluminum
always
amateur
amaze
ambassador
ambitiousamerica
america's
american
americans
amid
among
amount
amphibians
amuse
amusement
analyst
analysts
analyze
anarchy
anatomy
anchor
ancient
anderson
andrew
angel
angeles
anger
angle
angola
angry
animalanimals
ankle
annihilate
anniversary
announceannouncedannouncement
annoy
annual
anonymity
another
answer
ant
antecedent
antonio
anxiety
anxious
anyone
anything
ap
apart
apartment
apollo
apparel
apparent
apparently
appeal
appeals
appear
appearance
appeared
appears
apple
appliance
apply
appoint
appointedappointment
approach
approvalapproveapprovedapproving
april
aquarium
arab
arabia
arafat
arc
arch
archbishop
architecture
archive
area
areas
argentina
argue
argued
argument
arithmetic
arizona
arm
armed
armor
arms
army
aroma
around
arrange
arrangement
arrestarrested
arrivalarrivearrived
arrow
art
articlearticles
artifact
artillery
artist
artistsarts
ascend
ashamed
ashes
asiaasian
aside
askaskedasking
aspen
asphaltass
assassination
assault
assaulted
assembly
assets
assignment
assist
assistance
assistant
associateassociated
association
assume
astronomer
asylum
athens
athleteathletes
athletics
atlanta
atlantic
atmosphere
atom
atomic
attach
attack
attacked
attacksattemptattempts
attendattended
attention
attitude
attorney
attract
attraction
attribute
audience
aug
august
aunt
austin
australia
australian
austria
author
authorities
authority
autoautomobile
autumnavailable
avenue
average
aviation
avoid
awake
awardawards
aware
awareness
away
awful
b
baby
back
backed
bacon
bad
badge
bag
baghdad
bail
bait
bake
bakery
balance
ball
ballad
balloon
ballots
baltimore
ban
banana
band
bank
banker
banking
bankruptcy
banks
banned
bar
bargain
barn
barrel
barter
base
baseball
based
basic
basin
basis
basket
basketball
bathbathroom
battered
battle
battleship
bay
be
beach
beads
beam
bean
bear
beard
beast
beat
beating
beautifulbeauty
became
become
becomes
becoming
bedbedroom
bee
beef
beer
beethoven
beetles
beg
began
begin
beginner
beginning
begins
begun
behavebehavior
behind
beijing
belgium
belief
believe
believed
believes
bell
belly
belong
belt
bench
benchmark
bend
benefitbenefits
berlin
berry
best
bet
betray
better
beverage
beware
beyond
bias
bible
bicycle
bid
big
bigger
biggest
bike
bikini
bill
billboard
billion
bills
bin
binary
biography
biology
bird
birds
birth
birthday
bishopbishops
bit
bite
bitter
bizarre
black
blade
blair
blame
blamed
blanket
blast
blaze
bleach
blend
bless
blizzard
block
blonde
blood
bloombloomberg
blossom
blow
blue
blues
bluff
blur
blurred
blush
board
boardwalk
boat
bob
bodies
body
boeing boil
bold
bomb
bombingbombings
bombs
bondbonds
bone
book
booklet
books
boom
boost
boot
booth
border
born
borrow
bosniabosnian
boston
bother
bottle
bottom
bought
bounce
bound
boundary
bow
bowl
box
boxer
boxing
boy
boys
brace
bracelet
brain
brake
branch
brand
brandy
brass
brazil
bread
break
breakfast
breaking
breathe
breed
brian
brick
bride
bridgebridges
brief
bright
bring
bringingbritain
british
broad
broadcastbroadcasting
brochure
broil
broke
broken
brokerage
bronze
brother
brothers
brought
brow
brown
brush
brussels
bubble
buck
bucket
buddy
budget
buds
buffer
bug
buildbuilders
buildingbuildings
built
bulb
bulgarian
bull
bulletin
bumble
bump
bun
bunch
bunny
bureau
burger
burial
burn
burnedburning
burst
bury
bus
bushbush's
business
businesses
butter
butterfly
button
buybuying
buzz
c
cab
cabbage
cabin
cabinet
cable
cache
cactus
cafe
cage
cake
calculatecalculation
calendar
calf
calif
california
call
called
callingcalls
came
camel
camera
camp
campaigncampaigning
can
can't
canada
canadian
canal
cancer
candidatecandidates
candles
candy
caninecannon
cannot
canyon
cap
capability
capacity
capital
capitalism
capitulate
captain
capturecaptured
car
carbon
card
cardboard
cardinals
cards
care
career
carlos
carnival
carnivore
carolina
carpet
carriage
carried
carrots
carry
carrying
cars
cart
carter
cartoon
case
cases
casey
cash
cast
castle
cat
catastrophe
catch
category
caterpillars
cathedral
catholiccatholics
cattle
caught
causecaused
cautious
cave
cbs
cd
cdy
cease-fire
ceiling
celebratecelebration
cell
cellar
cells
cement
cemetery
cent
center
centers
central
cents
century
ceramic
cereal
ceremony
certain
certainly
certificatecertify
chain
chair
chairman
challenge
chamber
champagne
champion
championschampionship
championships
chance
chances
chandler
change
changed
changes
changing
channel
chaos
chapel
chapter
charactercharacters
charcoal
charge
chargedcharges
charles
chase
cheap
cheat
chechnya
check
cheek
cheerful
cheerleaderscheers
cheesecheetah
chef
chemical
chemistry
cherish
cherry
chess
chest
chew
chicago
chick
chicken
chief
child
childhood
childish
children
chile
chin
chinachina'schinese
chipmunkchirp
chocolate
choice
choir
choke
choose
chop
chopper
chord
chris
christian
christmas
chronicle
chuck
church
cigar
cigarette
circle
circulation
circumstance
cited
cities
citing
citizen
citizenscitizenship
citrus
city
city's
civil
civilian
civilians
claim
claimed
claims
clarify
clash
class
classic
classics
classroom
clean
clear
clearly
clench
cleveland
clever
click
clients
cliff
climb
clinic
clintonclinton's
clock
close
closed
closely
closer
closet
closing
cloth
clothes
cloud
cloudy
clown
clr
club
clubhouse
clue
co
coach
coaches
coal
coalition
coast
coat
cock
cocktail
code
coffee
cognition
coil
coincoins
cold
collaborate
collage
collapse
collar
colleagues
collect
collection
college
collision
colombia
colonel
colonies
color
colorado
colorful
coloring
colour
colt
columbia
column
combat
combination
combine
combined
come
comes
comfort
coming
comma
commandcommander
comment
commentary
comments
commerce
commercial
commission
commissioner
commit
commitment
committed
committee
common
communicatecommunication
communications
communist
communitiescommunity
commuters
companies
companion
companycompany's
compare
compared
comparison
compete
competence
competition
competitive
complain
complained
complete
completed
completely
complex
compose
composercomposers
compound
comprehend
computationcompute
computercomputers
con
concentrate
concept
concern
concerned
concerns
concert
conclude
conclusion
concrete
concur
condemncondemned
condition
conditions
conduct
conducted
cone
conference
confess
confession
confidence
confident
confirmed
conflict
confuse
confusion
congresscongressional
connect
connection
conquer
conquest
consecutive
conservation
conservative
consider
considered
considering
console
conspiracy
conspire
constellation
constitutionconstitutional
construct
construction
consumer
consumers
contact
contained
container
contemplate
contend
continent
continue
continued
continues
continuing
contractcontracts
contributed
control
controls
controversy
convention
conversation
convertible
convicted
convince
cook
cookies
cooking
cool
cooperate
cooperation
coordinator
cop
cope
copper
copy
copyright
cord
corner
corp
corporate
corporation
correct
correspondencecorridor
corruptcorruption
costcosts
costumecostumes
cottage
cotton
couch
cough
could
council
count
counter
countries
country
country's
county
couple
course
court
courts
cousin
cover
coverage
covered
covering
cow
cox
crab
crack
crackle
cradle
craft
crane
crash crave
crawlcrazy
create
created
creating
creation
creativity
creator
creaturecreatures
credibility
credit
creek
crew
crib
crime
crimes
criminal
crisis
criterion
critic
critical
criticismcriticize
criticized
critics
croak
croatia
crochet
crop
cross
crow
crowd
crowded
crown
crucial
cruel
cruise
crunch
crush
cry
crystal
cuba
cube
cucumber
cuddle
cuisine
culturalculture
cup
curator
curled
currency
current
currently
curtain
curve
customers
cut
cute
cuts
cutter
cutting
cylinder
czech
dad
daffodils
daily
daisy
dallas
damage
damages
damn
dan
dancedancers
dandelion
dangerdangerous
daniel
dare
dark
dash
dashboard
data
database
date
daughter
david
davis
dawn
day
days
de
dead
deadline
deal
deals
death
deaths
debate
debt
dec
decade
decades
decay
deceive
december
decide
decided
decision
decisions
deck
declare
declared
decline
declined
decompose
decoratedecoration
decrease
dedicate
deep
deer
def
defeat
defeateddefeating
defend
defending
defense
defensive
deficit
definedefinition
defrost
degrade
degree
del
delay
delightful
deliverdelivered
delivery
demand
demandeddemanding
demands
democracy
democrat
democratic
democrats
demolishdemolition
demon
demonstrate
denial
denied
dense
dentist
denver
deny
depart
department
departure
depend
deployment
deposit
depression
depth
deputy
descend
descent
describe
described
desert
deserve
designdesigned
desirable
desire
desk
despair
despise
despite
dessert
destroy
destroyed
destruction
detach
details
detaineddeteriorate
determination
determine
determined
detroit
develop
developed
developing
development
device
devil
devote
dew
dialogue
diamond
dice
dictionary
die
died
diego
diet
differ
difference
differences
different
difficult
difficulty
dig
digit
digital
dignity
diminish
diner
dinner
dip
diplomaticdiplomats
direct
directed
direction
directly
director
dirt
dirty
disability
disagree
disallow
disappear
disappoint
disaster
disbelieve
disc
discharge
discipline
discourage
discover
discovered
discovery
discussdiscussed
discussion
disease
disguise
disgust
dish
disintegrate
dislike
dismay
dismissdismissed
disney
disorganize
disown
disperse
display
dispose
disposition
disprove
dispute
disregard
dissolve
distance
distribution
distributor
district
disturb
disturbances
dive
diversion
divide
divided
dividend
diving
division
dlrs
do
dock
doctordoctors
document
documents
dodgers
dog
dole
dollar
dollars
dolls
domain
dome
domestic
dominate
done
donkey
donut
doodle
doordoorway
double
doubt
dow
downtown
doze
dozendozens
dr
draft
drag
dragon
dragonfly
drain
drama
draw
drawer
drawingdrawn
dream
dreary
drench
dress
dressing
drew
drift
drill
drink
drip
dripping
drivedriverdriversdriving
drizzle
drop
droplets
dropped
drought
drove
drown
drugdrugs
drum
dryer
duck
dude
due
duel
dull
dumb
dump
dunk
duplicate
dusk
dutch
duty
dye
e
eagle
ear
earlier
early
earnearnedearning
earnings
earth
earthquake
ease
easier
easily
east
easter
eastern
easy
eat
ecology
economic
economist
economy
ecuador
edge
editing
editor
editors
educate
education
effect
effective
effects
effort
efforts
eggeggs
ego
egypt
eight
eighth
einstein
either
el
elastic
elbow
elect
elected
election
elections
electric
electricity
electronic
electronics
elegance
element
elephant
elevator
else
elsewhere
embarrass
embassy
embrace
emergency
emerging
emission
emotion
employeeemployees
employer
en
encourage
encouragement
encyclopedia
end
endedending
endorsement
ends
endurance
energy
enforce
enforcement
engage
engagement
engine
engineering
england
english enjoy
enough
enrageensure
enter
entered
enterprise
entertain
entertainment
entire
entity
entrance
environment
environmental
episcopal
equality
equation
equipment
equity
era
eric
erode
error
erupt
escalator
escape
especially
espionage
essay
essential
established
establishment
estate
estimateestimatedestimates
et
ethnic
eu
euro
europe
european
evacuate
evaluate
even
evening
eventevents
eventually
ever
every
everybody
everyone
everything
evict
evidence
exactly
examinationexamine
examiner
example
exceed
excel
except
exchange
excuse
executiveexecutives
exercise
exhale
exhibitexhibition
exile
exist
existing
exit
exotic
expand
expansion
expect
expectations
expected
expects
expedition
expensive
experience
experts
explainexplanation
explode
exploit
exploitation
explore
explosion
explosive
exports
express
expressed
extended
extra
extractextracts
extremely
eyeeyes
f
fabric
face
faced
faces
facilities
facing
fact
factory
fade
fail
failedfailure
fair
faith
fallfalling
fame
familiar
families
family
famousfans
fantasy
far
farley
farmfarmer
farmers
fashion
fast fasten
father
fault
fauna
favor
favorite
fax
fbi
fear
fears
featherfeathers
featurefeatures
feb
february
fed
federal
federation
fee
feed
feedback
feelfeeling
fees
feetfeline
fell
fellow
feltfemale
fence
ferry
fertility
festival
fever
fewer
fiction
field
fifth
fight
fighters
fighting
figure
figures
file
filed
fill
filled
filmfilmsfinal
finally
finance
financial
find
finding
fine
finger
fingerprint
finishfinished
fire
fired
fireworks
firm
firms
first
fiscal
fish
fishing
fit
five
fix
fla
flag
flame
flamingo
flap
flash
flat
flavor
fledfleefleeing
flesh
flex
flexible
flight
flights
fling
flip
float
flood
floor
flora
florida
flour
flow
flowerflowers
flu
flush
flute
flutter
fly flyerflying
focusfocused
fog
foil
fold
foliage
follow
followed
following
follows
food
foolish
foot
football
footprint
forbear
forbes
forbid
force
forced
forces
ford
forecast
foreign
forest
forged
forget
forgive
form
formal
format
formationformed
former
formula
forswearing
fort
forthcoming
forward
fought
found
foundation
fountain
four
fourth
fox
fragile
fragrance
frame
framework
france
francisco
frank
fraternity
fraud
free
freedom
freeze
french
fresh
freud
friday
friday's
friedman
friend
friendly
friends
frighten
frigid
frog
front
frost
frozen
fruit
frustrate
frustration
fry
fuck
fuel
full
fully
fun
fund
funding
funds
funeral
fungi
funny
fur
furnace
furniture
fury
future
g
gain
gained
gains
galaxy
gallon
gamble
gambling
game
games
garage
garbage
garden
garfield
garlic
garment
gary
gas
gasoline
gate
gateway
gather
gathered
gathering
gauge
gave
gay
gaza
gear
gem
gen
gender
generalgenerally
generation
generous
geniusgenre
gentleman
george
georgia
german
germany
get
gets
getting
giant
giants
giggle
gin
giraffe
girl
girls
give
given
gives
giving
glad
glance
glare
glass
glide
glittering
global
globe
glove
glow
glue
gmt
go
goalgoals
goat
goats
god
goes
going
gold
golden
golf
gone
good
goods
goose
gop
gore
gospel
gossip
got
gov
govermentgovernmentgovernment's
governments
governor
grab
grade
graduate
graffiti
grand
grant
grapes
graphicgraphics
grasp
grass
grate
gravegravestone
graveyard
gray
graze
great
greater
greatest
greece
greek
green
greet
grew
grey
grief
grill
grind
grip
grocery
groom
ground
group
groups
grow
growing
grown
growth
guarantee
guard
guardian
guerrillas
guess
guest
guild
guilt
guilty
guitar
gulf
gull
gulp
gum
gun
guns
guru
gut
guy
guys
gymnastics
h
hack
hair
haircut
half
hallhalloween
hallway
hamas
hamburger
hamlet
hamper
hamster
hancock
hand
handbag
handle
hands
handwriting
hanghanging
happenhappenedhappening
happens
happiness
happy
harborharbour
hard
hardware
harm
harris
harsh
harvard
hat
hatch
hate
haul
haunt
have
hawk
haze
he'shead
headed
headquarters
heads
heal
health
healthy
hearheard
hearing
heart
heat
heaven
heavily
heavy
height
held
helium
hell
helmet
help
helped
helper
helping
hen
henry
herb
heritage
hero
heroin
heroine
hesitate
hide
high
higher
highest
highly
highway
hike
hill
hip
hired
historic
history
hit
hitch
hits
hitting
hockey
holdholding
holds
hole
holiday
holler
hollywood
holy
home
homeless
homer
homes
honest
honey
hong
honolulu
honor
hood
hoothop
hope
hoped
hopes
hoping
horace
horizon
horn
horrible
horse
hose
hospital
hospitals
host
hostility
hot
hotel
hound
hourhours
house
houses
housing
houston
howard
however
hug
huge
hum
human
humiliate
hummingbird
hundredhundreds
hunt
hurricanehurry
hurt
husband
husky
hussein
hustle
hut
hydrogen
hymn
hypertension
hypnotize
hysteria
i'di'll
i'mi've
ice
icon
ideaideas
identified
ignorance
ignore
ii
ill
illegal
illness
illusion
illustrations
imageimages
imagine
imitate
immediate
immediately
immigrantsimmigration
immoral
immunity
impact
impartiality
impatient
impersonate
implementimplementation
implode
importanceimportant
impose
imposed
impossible
impression
imprisoned
improve
improved
improving
impulse
inc
inchinches
incident
incline
include
included
includes
including
income
increaseincreasedincreases
increasing
increasingly
indeed
independence
independent
index
india
indian
indicated
indication
individual
indonesia
industrial
industriesindustry
inexpensive
infection
infinite
inflation
influence
inform
information
infrastructure
inhale
initial
initially
injuredinjuries
injury
ink
inmate
inn
inninginnings
inquire
insaneinsectinsects
insert
inside
insight
insisted
inspect
inspectors
installation
instance
instead
institute
institution
institutions
instruct
instruction
instructor
instrumentinstrumentation
insult
insurance
insure
insured
integrate
integration
intellect
intelligence
intelligent
intend
intended
intensity
interaction
interest
interested
interests
interim
interior
internal
international
internet
interrupt
intervention
interview
intoxicate
introduceintroduced
intuition
invalidate
invent
inventory
investigate
investigatinginvestigationinvestigators
investmentinvestments
investor
investors
invite
invoice
involve
involvedinvolving
ipod
iran
iraqiraq's
iraqi
ireland
iris
irish
iron
irritate
islamic
islandislands
isolationisraelisrael'sisraeliisraelis issue
issued
issues
italian
italy
items
ivy
j
jack
jacket
jackson
jaguar
jail
jamaica
james
jan
january
japan
japan'sjapanese
jar
jaw
jay
jazzjean
jeff
jellyfish
jerk
jerry
jersey
jerusalem
jet
jets
jewel
jewelry
jewishjews
jim
job
jobs
joe
jog
john
johnson
join
joined
joint
joke
jones
jordan
jose
joseph
journal
journal-constitution
journalists
journey
joy
jr
juan
judge
judges
judging
judgment
juice
july
jump
jumper
june
jurisdiction
jury
justice
justify
juvenile
k
kansas
keepkeeping
kennedy
kept
kerry
kevin
key
keyboard
kick
kid
kidnap
kidney
kids
kill
killedkilling
kilometer
kilometers
kim
kind
king
kiss
kitchen
kittens
kitty
klan
knee
kneel
knew
knife
knight
knit
knitting
knock
know
knowledgeknown
knows
kong
koreakoreankosovo
kremlin
l
la
lab
labor
laboratory
lace
lack
lad
laden
lady
lake
lakers
lamb
lamp
land
landlord
landscape
language
lantern
large
largely
larger
largest
larry
las
last
late
later
latest
latex
latin
laugh
launch
launched
laundering
law
lawmakers
lawn
lawslawsuit
lawyer
lawyers
lay
layer
lead
leader
leaders
leadership
leading
leads
leaf
league
lean
leap
learn
learned
least
leather
leave
leaves
leaving
lebanon
led
ledge
lee
left
leg
legal
legion
legislationlegislature
lego
lemon
lend
lens
less
lesson
let
letter
letters
level
levels
lewis
liability
liberal
library
libya
license
licklicking
lie
lien
life
lift
light
lighthouse lighting
lightning
like
likely
lily
limb
limit
limited
limits
limp
lincoln
line
linen
lines
lineup
lingerie
link
linked
lion
liplips
liquid
liquor
list
listed
listen
listing
literature
litter
little
live
lived
liver
liveslivinglizard
load
loanloans
lobster
local
locate
location
locked
locomotives
log
logic
london
long
long-term
longer
look
looked
looking
looks
loop
loosen
los
lose
losing
loss
losses
lost
lot
louis
louisiana
lounge
love
lover
low
lowerlowest
ltd
luck
luggage
lunch
lung
luxury
lyric
mac
machine
mad
made
madhouse
madrid
magazine
magic
magician
magnify
magnitude
magnolia
maid
mainmainly
maintain
major
majority
make
maker
makes
makeup
making
malaysia
malemales
mallard
mammalmammals
man
manage
managed
management
manager
managers
managing
manchester
manhattan
mannequin
manner
manslaughter
manufacturer
many
mapmaple
maradona
marathon
marble
march
marching
mare
marijuana
mark
marked
market
marketing
markets
marks
marriage
married
marrow
marry
mars
martin
mary
mask
mass
massachusetts
massive
matchmatches
mate
materialmaterialsmatter
matters
maxwell
may
maybe
mayor
meal
mean
means
meant
meanwhile
measure
measures
meat
mechanic
mechanism
medal
media
medicalmedicine
medium
meet
meetingmeetings
melody
melt
membermembers
memorabilia
memorial
memory
men
men's
mend
menumere
merger
message
met
metal
metermeters
metro
mets
mexican
mexico
miami
michael
michigan
microsoft
microwave
midday
middle
might
mike
milan
mile
miles
militantmilitants
militarymilitia
milk
mill
miller
million
millions
milosevic
mimic
mind
miniature
minister
ministers
ministry
mink
minnesota
minor
minority
minute
minutes
mirror
misery
miss
missed
missilemissiles
missing
mission
misspend
mist
mistake
mistreat
misty
mix
mixed
mixture
mm
mob
mode
model
modern
modest
modification
molecule
mom
moment
momentum
monday
monetary
money
monk
monkeys
monster
monthmonths
montreal
monument
mood
moon
morality
morgan
morning
mortal
mortality
moscow
moss
mostly
motel
mother
motion
motive
motor
motorcycle
motto
mount
mountain
mouse
mouth
move
moved
movement
moves
moviemovies
moving
mow
mozart
mph
mr
mrsms
much
mud
mug
multiply
munch
munich
mural
murder
murdered
murphy
muscle
museum
mushrooms
music
musician
musicians
muslimmuslims
mussolini
must
mustard
mutual
mystery
myth
n
nag
nailnails
name
named
names
nap
narrow
nasdaq
nation
nation's
national
nations
nationwide
native
nato
natural
nature
navy
nazi
nba
near
nearby
nearly
necessary
neck
necklace
need
needed
needle
needs
neglect
negotiations
negroes
neighborhood
neighboring
neighbors
neither
neon
nephew
nerve
nervousnest
net
netanyahu
netherlands
networknetworksneutral
nevernew
news
newspaper
newspapers
next
nfl
nice
nick
nickel
night
nine
ninth
nn
nobody
noise
none
noodlesnoon
normal
north
northeast
northern
northwest
nose
note
notebook
noted
notes
nothing
notice
noticeable
notify
nov
novel
november
novice
nuclear
nude
nuisance
nullify
numbernumbers
nursenurses
nut
nutrition
nuts
nyt
oak
obey
object
objective
obligation
observation
observatory
observe
obtain
obvious
obviously
occasion
occupation
occur
occurred
occurrence
ocean
oct
october
odd
odyssey
offend
offense
offensive
offer
offered
offering
offersoffice
officer
officers
offices
officialofficials
often
ohio
oil
oklahoma
old
older
oldest
olympicolympics
one
ones
onion
online
onto
opec
open
opened
opening
opera
operate
operating
operation
operations
operative
operator
opinion
opponent
opponents
opportunities
opportunity
opposed
oppositionoption
optional
optionsoracle
orange
orchestra
orchids
order
ordered
orders
organ
organism
organizationorganizations
organizeorganized
origami
origin
original
originate
orleans
orthodontist
others
otherwise
outdoor
outfit
outlet
outnumber
outside
oval
overall
overcome
overflow
overhead
overnight
overpower
overseas
overwhelm
owe
owl
own
owned
owner
owners
owns
oxoxford
oxygen
p
pace
pacific
package
packaging
packed
packet
pact
padding
page
pages
paid
pain
paint
painted
painting
pair
pakistan
palace
palestinianpalestinians
palm
pan
panda
panel
panorama
paper
papers
parade
paragraph
parcel
pardon
parent
parents
paris
parish
parkparking
parliamentparliamentary
parrot
part
participant
participate
particular
particularly
parties
partly
partnerpartners
parts
party
pass
passagepassed
passengers
passing
passion
passport
past
pat
patch
patent
path
patients
patio
patrick
patternpatterns
paul
pause
paws
paypaying
payment
payments
peace
peaceful
peacock
pearl
pebbles
peel
pelican
pelt
pen
penalty
pencil
pennsylvania
pentagon
people
people's
pepper
per
perceive
percent
percentage
perception
perfect
performperformance
performer
perfumeperhaps
period
perish
perjury
permanent
permission
permit
person
personal
personnel
perspire
persuade
pet
petals
peter
pets
phantom
phenomenon
philadelphia
philip
philippines
phoenix
phone
photo
photographer
photography
photos
phrase
physical
physician
physics
piano
piazza
pick
picked
picturepictures
pie
piecepieces
pier
pig
pigeon
pigs
pillow
pilot
pilots
pin
pinch
pink
pinnacle
pipe
pitch
pittsburgh
pizzaplace
placed
places
plague
plan
plane
planes
planet
plannedplanning
plans
plant
plantation
plants
plastic
plate
plates
playplayedplayerplayers
playground
playing
playoffplayoffs
plays
pleaplead
please
pledges
plenty
plot
plow
pluck
plus
poach
pod
poem
poetpoetry
pointpoints
poise
poker
poland
pole
poles
police
policiespolicy
polish polite
political
politician
politicians
politics
pollpolls
pollution
polo
polyester
pond
ponder
poodle
pool
poor
pop
popcorn
pope
poppy
popular
population
porch
pork
port
portion
portrait
portray
position
positions
positive
possess
possession
possibilitypossible
possibly
post
postage
postcards
posted
poster
pot
potato
potatoes
potential
poultry
pounce
poundpounds
pour
poverty
powell
powerpowerful
powers
practice
prayprayer
precedent
predict
predicted
prefer
preference
pregnancy
pregnant
prejudice
premier
preparationprepareprepared
preparing
presence
presentpresented
preservation
presidencypresident
president's
presidential
press
pressure
prestige
presume
pretend
pretty
prevent
preview
previous
previously
prey
price
prices
prick
pride
priest
primary
prime
princeprincess
printer
priority
prison
prisoners
privateprivileges
prize
probability
probably
problemproblems
procedure
proceedings
process
processing
proclaim
produce
produced
producer
product
production
products
profession
professional
professionals
professor
profitprofits
programprograms
progress
prohibit
project
projector
projects
prominence
prominent
promise
promised
promote
promoter
proof
propaganda
proper
property
proposalproposalsproposed
prose
prosecute
prosecutorprosecutors
prosper
protectprotection
protective
protest
protestant
protesters
protests
protocol
proton
proud
prove
provideprovided
provides
providing
province
proving
proximity
psychiatrypsychology
pub
public
publication
publicly
published
puddlepuerto
pugpull
pulled
pump
pumpkin
punch
punish
punishment
punk
pup
pupil
puppy
purchase
purple
purpose
purse
pursue
pushpushedpushing
putputting
pyramid
q
quality
quantity
quarantine
quarrel
quarter
quarterback
quartz
queen
queens
quench
query
quest
question
questions
queue
quick
quickly
quiet
quit
quite
quiz
quota
quotationquote
quoted
r
rabbi
rabbit
race
racer
racesracing
racism
racket
radar
radiation
radical
radio
railrailroad
rails
railway
rain
rainbow
raiseraised raising
rally
ran
range
rangers
rap
rapid
rare
rash
raspberry
rat
raterates
ratherrationalize
rattle
ray
reachreached
reaction
read
reader
reading
readyreal
reality
realize
really
reap
reappear
rearrangereason
reasons
rebelrebels
recall
recalled
receivereceived
receiver
recent
recently
recess
recognition
recommend
recommendation
record
recorder
records
recovery
recreation
recruit
recyclerecycling
red
redhead
reduce
reduced
reef reel
reference
referring
reflect
reflection
reform
reforms
refrain
refrigerator
refugees
refuserefused
regime
region
regional
registerregistration
regret
regular
reichstag
rejectrejected
rejoice
related
relation
relations
relationship
relativerelatively
relatives
relax
relaxation
relaxed
relay
release
released
relief
religion
religious
rely
remain
remained
remaining
remains
remark
remarks
remedy
remember
remindremove
removed
renounce
rental
rep
repair
repeal
repeat
repeatedly
replacereplaced
reply
report
reported
reportedly
reporter
reporters
reports
representation
representativerepresentatives
represents
repress
reprimand
reproduce
reptiles
republic
republicanrepublicans
request requirerequiredrequires
rescuerescued
research
researchers
reservation
reserve
residents
resolution
resort
resources
respect
respond
responded
response
responsibilityresponsible
rest
restaurant
restless
restore
restrict
result
results
resume
retail
retailer
retain
retire
retired
retirement
retiring
retreat
return
returnedreturning
returns
revenuerevenues
review
revolution
reward
rhodes
rhythm
rice
rich
richard
rick
ride
ridge
ridicule
right
rights
ring
rinse
rip
ripples
riserising
risk
rival
river
riverside
rn
road
roam
roar
rob
robert rock
rocker
rocket
rod
rodents
role
roles
roll
romance
roof
rook
room
rooms
rooster
root
rope
rose
rough
round
route
rover
row
rowing
royal
rub
rubber
rubbish
rugby
ruins
rule
ruled
rules
ruling
run
running
runs
rush
russia
russia'srussian
rust
rusty
sabbath
sad
saddam
safe
safety
said
sailsailing
sailor
saint
saints
sake
salad
salary
sale
sales
salt
salute
san
sanctions
sand
sandwich
santa
sat
satan
satellite
satin
satisfy
saturday
saturday's
sauce
saudi
sausage
save
savings
saw
say
saying
says
scale
scandal
scarce
scare
scene
scenery
schedule
scheduled
scheme
school
schools
sciencescientist scientists
scold
scooter
scorescored
scores
scoring
scotch
scott
scottish
scouting
scramble
scratch
scream
screen
scribble
script
scrutiny
sculpture
sea
seafood
seagull
search
seashore
seasonseasons
seatseats
seattle
second
seconds
secret
secretary
section
sector
secure
securities
security
see
seed
seeing
seekseeking
seem
seemed
seems
seen
seepage
segment
seize
seized
seizure
selectselection
self
sellselling
seminar
sensenatesenator
send
sending
senior
sense
sent
sentencesentenced
sentiment
sentry
separate
separation
sept
september
serbserbs
serial
series
serious
seriously
serve
served
server
service
services
serving
session
set
sets
setting
settle
settlement
seven
seventh
several
severe
sewsewing
sexsexual
sexy
shade
shadow
shake
shame
share
shareholders
shares
shariff
shark
sharon
sharp
shatter
shed
sheep
sheet
shell
shelter
sheriff
shift
shine
ship
shirt
shiver
shock
shoes
shoot
shooting
shopshopping
shore
short
shortage
shortly
shot
shots
shoulder
shove
show
showed
shower
showingshown
shows
shrink
shuffle
shun
shut
sicksickness
side
sides
sidewalksight
sign
signal
signature
signed
significant
signs
silence
silhouette
silver
similar
similarity
simple
simply
simulation
since
sing
singapore
singer
single
singles
sink
sinner
sip
sister
sit
sitesites
sitting
situation
six
sixteen
sixth
size
skate
skateboard
skatingsketch
skiskiing
skill
skin
skipskirt
skull
sky
skylineskyscraper
slap
slash
slaveslaveryslaves
slay
slaying
sleep
sleeve
slice
slide
slightly
slip
slither
slope
slow
slug
slurp
sly
small
smaller
smart
smash
smear
smell
smile
smith
smoke
smoking
smother
smudge
snake
snap
snatchsneak
sneeze
sniff
snoozesnore
snow
snowboarding
snowman
snuggle
so-called
soak
soap
soar
sob
soccer
social
socialism
society
socks
sofa
softball
software
soil
solar
sold
soldier
soldiers
sole
solid
solution
solve
someonesomething
sometimes
son
songsongs
soon
soothe
sorrow
sort
sought
soul
sound
soup
source
sources
south
southeast
southern
sovietsoviets
sow
sox
soybean
space
spaghetti
spain
spanish
spank
spare
spark
spawnspeak
speaker
speaking
special
species
specific
speculation
speech
speed
spell
spend
spending
spent
sphere
spider
spill
spin
spine
spiral
spirit
splash
split
spoil
spoke
spokesmanspokeswoman
spoon
sport
sports
spot
sprain
spray
spread
spring
springfield
sprinkle
sprint
spy
squad
square
squash
squeaksqueal
squeeze
squint
squirrel
sri
st
stab
stability
stadium
staff
stage
stain
staircase
stairs
stake
stalin
stamp
stance
stand
standard
standards
standingstands
stanley
star
stare
stars
start
started
starter
startingstarts
starve
statestate's
statement
statements
states
station
stations
statue
status
stay
stayed
steak
steal
steel
steeple
steer
stem
stencil
step
stephen
steps
steve
stevenson
stew
stewart
stick
stickers
still
sting
stink stir
stitch
stitches
stock
stockings
stocks
stomach
stone
stood
stop
stopped
store
stores
stories
stork
storm
story
stove
straight
strain
strange
stranger
strategy
straw
strawberry
stray
streak
stream
street
streets
strength
stretch
strike
strikes
string
strip
stripes
strive
stroke
strongstronger
struck
structure
struggle
struggling
stud
studentstudents
studiesstudy
stuff
stumble
stupid
style
subcommittee
subject
submarine
submit
substance
subtract
subway
succeed
successsuccessful
suck
suddenly
suds
sue
sufferedsuffering
sugar
suggest
suggested
suicide
suit
sum
summer
summit
sun
sunday
sunday's
sunflower
sunglasses
sunk
sunlight
sunny
sunrisesunset
sunshine
super
supper
suppliessupply
supportsupported
supporter
supporters
suppose
supposed
supreme
sure
surf
surface
surfers
surgery
surname
surpass
surprise surprised
surprises
survey
survive
survived
sushi
suspect
suspected
suspects
suspended
suspicion
swallow
swamp
swan
swap
sway
swear
sweat
sweater
sweden
sweet
swimswimming
swimsuitswing
swiss
switzerland
swoon
sydney symbol
syndicate
syria
system
systems
table
tableware
tackle
tag tail
taiwan
take
taken
takes
taking
taliban
talktalked
talking
talks
tall
tampa
tan
tank
tap
target
targets
tarnish
task
taste
tattoo
tax
taxation
taxes
taxi
taxpayer
taylor
tea
teach teacher
teachers
teaching
team
team's
teams
tear
tears
tease
technical technician
technology
teeth
tel
telecommunications
telephone
television
tell
telling
temper
temperature
temple
tend
tennis
tenor
tent
term
terms
terrible
terrier
terrific
territory
terrorterrorismterroristterrorists
testtested
testifytestimony
testingtests
texas
text
textbook
textile
texture
thailand
thanks
that's
thaw
theatertheatre
theft
theme
theory
there's
they're
they've
thief
thingthings
think
thinkingthinks
third
thomas
though
thoughtthousands
threat
threatened
threats
three
threw
throat
throne
throughout
throw
thumb
thunderstorm
thursday
thus
ticker
tickettickets
tickle
tietied
ties
tiger
tight
tiles
tim
timber
time
timer
times
tin
tiny
tire
title
toast
tobacco
today
today's
toe
toes
together
toil
toilet
tokyo
told
tolerance
tom
tomato
tone
tongue
tonight
tons
tony
took
tool
tooth
top
topic
toronto
toss
total
tote
touch
tough
tour
touristtourists
tournament
tow
toward
tower
town
towns
toytoys
track
tract
trade
traded
traders
trading
traditional
traffic
tragedy
trail
train
trainer
training
tram
transformation
transformers
transmission
transplant
transporttransportation
trap
travel
traveling
tread
treasury
treat
treatedtreatment
treaty
treetrees
trial
tribunal
trick
tried
trim
trinity
trip
triumph
troops
tropical
trouble
troubles
truck
trucks
true
trunk
trust
truth
trytrying
tsunami
tub
tuesday
tug
tulip
tumble
tune
tunnel
turkey
turkish
turks
turn
turned
turning
turns
tv
twice
twigs
twist
two
type
u
ultimate
umbrella
unable
uncle
undated
underground
understand
underwater
unemployment
unhappy
uniform
union
unit
unite
united
units
universe
university
unless
unlike
unlikely
unload
unnecessary
unusual
update
upon
upset
urban
urge
urged
us
useused
users
usesusing
usually
usurp
v
vacate
vacation
valentine
valley
valor
value
van
vanish
variety
various
vary
vault
veer
vehiclevehicles
vein
venture
verdict
verify
verse
version
vessel
veteran
via
vice
victim
victims
victor
victory
video
vietnamvietnamese
view
viewer
views
villa
village
villages
vine
vinegar
vintage
vinyl
violation
violenceviolent
violet
violin
virginia
virtually
virtuoso
virus
vision
visit
visitedvisiting
visitorvisitors
vitamin
vocal
vodka
voice
volcano
volume
volunteer
vomit
votevotedvotersvotes
voting
voyage
vs
w
wag
wagner
wagon
waist
wait
waiting
wake
walk
walked
walking
wall
wander
wantwanted
wants
war
warn
warned
warner
warning
warranty
warrior
warships
wash
washerwashing
washington
wasp
waste
watch
watched
watching
waterwaterfall
wave
way
ways
we'llwe'rewe've
weak
wealth
weapon
weapons
wear
wearing
weather
weave
web
wed
wedding
wednesday
weed
week
week's
weekend
weekly
weeks
weep
weight
weird
welcome
welfare
well
went
west
western
wet
whale
what's
whatever
wheat
whether
whip
whiskers
whiskey
whisper
whistle
white
whole
whose
wide
widely
wife
wig
wiggle
wild
william
williams
willing
wilson
win
wind
windmill
window
windows
wine
wing
wingswinner
winningwins
winter
wipe
wire
wisdom
wise
wit
withdrawwithdrawal
withdrawn
within
without
witnesses
wizard
wolf
woman
women
women's
wonderful
wood
woodland
woods
wool
word
words
workworked
worker
workers
working
workplace
worksworkshopworld
world's
worldsources
worldwide
worm
worried
worryworse
worst
worth
would
wounded
wrap
wreck
wrist
write
writer
writes
writingwritten
wrong
wrote
x
yacht
yale
yankees
yardyards yarn
yawn
year
year's
year-old
yearn
years
yell
yellow
yeltsin
yen
yes
yesterday
yet
yield
york
young
younger
youth
yugoslavyugoslavia
zealand
zebra
zinc
zombie
zone
zoo
51 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
t-SNE on the giga20 PPMI VSM
cooking conflict
51 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
t-SNE on the imdb20 PPMI VSM
60 40 20 0 20 40 60
80
60
40
20
0
20
40
60
80
!
);
.
.....
:)
?
abandoned
abdomen
abduct
ability
able
abortion
abraham
absence
absolute
absolutely
absorb
abstain
abstract
absurd
abundance
abuse
academy
accelerate
accent
accept
access
accident
accommodation
accomplish
accordingaccount
accumulate
accurate
accuse
acheachieve
acknowledge
acoustic
acquire
acquit
acre
across
acrylic
act
acted
acting
action
actions
activity
actor
actors
actressactresses
acts
actual
actually
ad
adam
adaptation
addadded
adding
addition
address
adds
adhesive
adjourn
adjustment
administration
admiralty
admire
admission
admit
adopt
adore
adorn
adultadults
advance
adventure
adventures
adversary
advertisement
advertiser
adviceadvise
aerial
aeronautics
affair
affect
afghanistan
afraid
africa
african
afternoon
age
agency
agent
ages
aggravate
aggression
ago
agony
agree
agreement
ahead
aid
aim
ain't
airaircraft
airplane
airport
aisle
al
alan
alarm
albeit
albert
alcohol
alex
algorithm
alice
alienaliens
alive
allen
alley
allowallowed
allows
alloy
almost
alone
along
alphabet
already
alright
also
alt
alter
alternative
although
aluminum
always
amateur
amaze
amazed
amazing
amazingly
ambitious
america
american
americans
amongamongst
amount
amphibians
amuse
amusement
amusing
analyze
anarchy
anatomy
anchor
ancient
anderson
andyangel
anger
angle
angles
angola
angry
animalanimals
animatedanimation
anime
ankle
annanna
anne
annihilate
anniversary
announce
announced
announcement
annoy
annoying
another
answeranswers
ant
antecedent
anthony
anxiety
anxious
anybodyanymore
anyone
anythinganyway
anywhere
apart
apartment
apollo
apparel
apparent apparently
appeal
appealing
appear
appearance
appeared
appears
apple
appliance
apply
appoint
appointment
appreciateappreciated
approach
appropriate
approval
approve
approving
aquarium
arafat
arc
arch
archbishop
architecture
archive
area
argentina
argue
argument
arithmetic
arm
armor
arms
army
arnold
aroma
around
arrange
arrangement
arrest
arrivalarrivearrives
arrow
art
arthur
article
artifact
artillery
artist
artistic
arts
ascend
ashamed
ashes
asia
asian
aside
ask
askedasking
asks
asleep
aspectaspects
aspen
asphalt
ass
assassination
assault
assaulted
assembly
assets
assignmentassist
associate
association
assume
astronomer
asylum
athens
athleteathletics
atmosphere
atom
atomic
attach
attackattacks
attemptattempts
attend
attention
attitude
attorney
attract
attraction
attractive
attribute
audience
audiences
august
aunt
author
autoautomobile
autumn
available
avenue
average
aviation
avoid
awake
awardawards
aware
awareness
away
awesome
awful
awkward
b
baby
back
background
bacon
bad
badge
badly
bag
bail
bait
bake
bakery
balance
ball
ballad
balloon
ballots
ban
banana
band
bankbanker
bankruptcy
bar
barely
bargain
barn
barrel
barter
base
baseball
based
basic
basically
basin
basis
basket
basketball
bath
bathroom
batman
battered
battlebattles
battleship
bay
be
beach
beads
beam
bean
bear
beard
beast
beat
beautifulbeautifullybeauty
became
become
becomes
becoming
bedbedroom
bee
beef
beer
beethoven
beetles
beg
began
begin
beginner
beginning
begins
behave
behavior
behind
belief
believable
believe
believed
believes
bell
belly
belong
belt
ben
bench
benchmark
bend
berlin
berry
besides
best
bet
betray
better
beverage
beware
beyond
bias
bible
bicycle
bigbigger
biggest
bike
bikini
bill
billboard
billion
billy
bin
binary
biography
biology
birdbirds
birth
birthday
bishop
bishops
bit
bite
bits
bitter
bizarre
black
blade
blair
blame
bland
blanket
blaze
bleach
blend
bless
blind
blizzard
block
blockbuster
blonde
bloodbloody
bloom
blossom
blow
blown
blue
blues
bluff
blurblurred
blush
board
boardwalk
boat
bob
bodiesbodyboil
bold
bomb
bond
bone
bookbooklet
books
boom
boost
boot
booth
border
boredboring
born
borrow
boss
boston
bother
bothered
bottle
bottom
bought
bounce
bound
boundary
bow
bowl
box
boxerboxing
boy
boyfriend
boys
br
brace
bracelet
brad
brain
brake
branch
brand
brandy
brass
brave
brazil
bread
break
breakfast
breaking
breaks
breathbreathe
breathtaking
breed
brian
brick
bride
bridge
bridges
brief
bright
brilliant
brilliantly
bringbringing
brings
britain
british
broad
broadcastbroadcasting
brochure
broil
broken
brokerage
bronze
brooks
brother
brothers
brought
brow
brown
bruce
brush
brussels
brutal
bubble
buck
bucket
buddy
budget
buds
buffer
bug
build
builders
building
built
bulb
bulgarian
bull
bulletin
bumble
bumpbun
bunch
bunny
burger
burial
burnburnedburning
burst
burton
bury
bus
bushbusiness
butter
butterfly
button
buy
buzz
c
cab
cabbage
cabin
cable
cache
cactus
cafe
cage
cake
calculate
calculation
calendar
calf
call
called
calling
calls
came
camel
cameo
camera
cameron
camp
campaign
campaigning
cancan't
canal
cancer
candidate
candles
candy
canine
cannon
cannot
cant
canyon
cap
capability
capable
capital
capitalism
capitulate
captain
capturecaptured
captures
car
carbon
card
cardboard
cardinals
care
career
cares
carnival
carnivore
carpet
carrey
carriage
carried
carries
carrots
carry
cars
cart
carter
cartooncartoons
casecases
casey
cash
castcasting
castle
cat
catastrophe
catch
category
caterpillars
cathedral
catholics
cattle
caught
cause
caused
cautious
cave
cd
ceiling
celebratecelebration
cell
cellarcement
cemetery
cent
center
central
century
ceramic
cereal
ceremony
certain
certainly
certificate
certify
cgi
chain
chair
challenge
chamber
champagne
championchampionship
chan
chance
chandler
change
changed
changeschanging
channel
chaos
chapel
chapter
character
character's
characters
charcoal
charge
charged
charlescharlie
charm
charming
chase
cheap
cheat
check
cheek
cheerful
cheerleaders
cheers
cheese
cheesy
cheetah
chef
chemical
chemistry
cherish
cherry
chess
chest
chew
chick
chicken
chief
childchildhood
childish
children
children's
chile
chilling
chin
chinese
chipmunk
chirp
chocolate
choice
choices
choir
choke
choose
chop
chopper
chord
chose
chosen
chris
christchristian
christmas
christopher
chronicle
chuck
church
cigarcigarette
cinema
cinematic
cinematography
circle
circulation
circumstancecircumstances
citizen
citizenship
citrus
city
claim
clarify
clark
clash
class
classicclassics
classroom
clean
clear
clearly
clench
clever
clichéclichés
click
cliff
climax
climb
clinic
clock
close
closer
closet
closing
clothclothes
cloud
clown
club
clubhouse
clue
coach
coal
coast
coat
cock
cocktail
code
coffee
cognition
coil
coin coins
cold
collaborate
collage
collapse
collar
collect
collection
college
collision
colonel
colonies
color
colorful
coloringcolorscolour
colt
column
combinationcombinecombined
come
comedic
comediescomedy
comes
comfort
comiccomics
coming
comma
commandcommander
comment
commentary
comments
commerce
commercial
commission
commit
commitment
committed
common
communicatecommunicationcommunity
commuters
companion
company
comparecomparedcomparison
compelling
compete
competence
competition
complain
complaint
complete
completely
complex
complicated
composecomposercomposers
compound
comprehend
computation
compute
computer
con
concentrateconcept
concern
concerned
concert
conclude
conclusion
concrete
concur
condemn
condemned
condition
conditions
cone
confess
confession
confidenceconfident
conflict
confuse
confused
confusing
confusion
congress
connect
connection
conquerconquest
conservation
considerconsidered
considering
console
conspiracy
conspire
constantconstantly
constellation
constitution
construct
construction
consumer
contact
contain
container
contains
contemplate
contemporary
contend
content
context
continent
continue
continues
contract
contrast
contrived
control
controversy
conversation
convertible
convinceconvinced
convincing
cook
cookies
cooking
cool
cooperate
coordinator
cop
cope
copper
cops
copy
cord
core
corner
corny
corporation
correct
correspondence
corridor
corruptcorruption
cost
costume
costumes
cottage
cotton
couch
cough
could
could've
council
count
counter
country
couple
course
court
cousin
cover
covering
cow
crab
crack
crackle
cradle
craft
crafted
crane
crap
crash
crave
crawl
crazy
create
createdcreatescreating
creation
creativecreativity
creator
creaturecreatures
credibility
credit
credits
creek
creepy
crew
crib
crimecriminal
crisis
criterion
critic
criticalcriticismcriticize
critics
croak
crochet
crop
cross
crow
crowd
crowded
crowe
crown
crucial
cruelcruise
crunch
crush
crycrying
crystal
cube
cucumber
cuddle
cuisine
cult
culture
cup
curator
curious
curledcurrency
current
curtain
curve
customers
cut
cute
cuts
cutter
cutting
cylinder
cynical
dad
daffodils
daisy
damage
damages
damn
dan
dancedancersdancing
dandelion
dangerdangerous
daniel
danny
dare
dark
darker
darkness
dash
dashboard
database
date
dated
daughter
david
davis
dawndaydays
de
dead
deadly
deal
dealingdeals
dean
deathdeaths
debate
debt
debut
decade
decades
decay
deceive
decent
decide
decided
decides
decision
deck
declare
decline
decompose
decorate
decoration
decrease
dedicate
deep
deeper
deeply
deer
defeat
defeating
defend
defensive
deficit
define
definitely
definition
defrost
degrade
degree
delay
delightdelightful
deliverdelivered
delivers
delivery
demand
demolishdemolition
demon
demonstrate
denial
dennis
dense
dentist
deny
depart
department
departure
depend
depicteddepiction
deployment
deposit
depp
depressing
depression
depth
deputy
descend
descent
describe
described
desert
deserve
deserveddeserves
design
designed desirable
desire
desk
despair
desperate
despise
despite
dessert
destroydestroyeddestruction
detach
detaildetails
detective
deteriorate
determination
determine
determined
develop
developed
development
device
devil
devote
dew
dialogdialogue
diamond
dice
dickdictionary
die
dieddies
diet
differdifference
different
difficult
difficulty
dig
digit
dignity
diminish
diner
dinner
dip
direct
directed
directingdirection
directly
director
director's
directors
dirt
dirty
disability
disagree
disallow
disappear
disappoint
disappointeddisappointing
disappointment
disaster
disbelief
disbelieve
disc
discharge
discipline
discourage
discover
discovered
discovers
discovery
discussdiscussion
disease
disguise
disgust
disgusting
dish
disintegrate
dislike
dismay
dismiss
disney
disorganize
disown
disperse
display
dispose
disposition
disprove
disregard
dissolve
distance
distributiondistributor
disturb
disturbances
disturbing
dive
diversion
divide
dividend
diving
division
do
dock
doctor
document
documentary
dogdogs
dollardollars
dolls
domain
dome
dominate
donald
done
donkey
donut
doodle
door
doorway
double
doubt
douglas
downtown
doze
dr
dracula
draft
drag
dragon
dragonfly
drain
drama
dramatic
draw
drawer
drawing
drawn
dreamdreams
dreary
drench
dressdresseddressing
drew
drift
drill
drink
dripdripping
drive
driven
driver
driving
drizzle
drop
droplets
dropped
drought
drown
drugdrugs
drum
drunk
dry
dryerduck
dudedue
duel
dull
dumb
dump
dunk
duplicate
dusk
duty
dvd
dye
dying
e
eagle
ear
earlier
early
earnearning
earth
earthquake
ease
easily
east
easter
eastwood
easy
eateating
ecology
ecuador
ed
eddie
edge
edited
editing
editor
educate
edward
eerie effect
effective
effectively
effects
effort
efforts
eggeggs
ego
eight
einstein
either
elastic
elbow
elect
electronics
elegance
elementelements
elephant
elevator
elizabeth
else
embarrass
embrace
emergency
emission
emotionemotional
emotionallyemotions
empire
employeeemployer
empty
encounter
encourage
encouragement
encyclopedia
end
ended
ending
endless
endorsement
ends
endurance
enemy
energy
enforce
engage
engagement
engaging
engine
engineering
england
english
enjoy
enjoyableenjoyed
enjoying
enough
enrage
ensemble
ensure
enter
enterprise
entertain
entertained
entertaining
entertainment
entire
entirely
entity
entrance
environment
epic
episcopal
episodeepisodes
equal
equality
equally
equation
equipment
era
eric
erode
error
erupt
escalator
escape
especially
espionageessay
essenceessential
essentially
establishment
etc
europe
european
evacuate
evaluate
even
evening
eventevents
eventually
ever
every
everybody
everyday
everyone
everything
everywhere
evict
evidence
evil
exact
exactly
examinationexamine
examiner
example
exceed
excel
excellent
except
exception exceptional
exchange
excited
excitementexciting
excuse
executedexecution
executive
exhale
exhibit
exhibition
exile
exist
existence
exit
exotic
expand
expect
expectationsexpectedexpecting
expedition
experience
experienced
experiences
explainexplained
explains
explanation
explode
exploit
exploitation
explore
explosion
explosions
explosive
express
expressionexpressions
extended
extent
extra
extract
extracts
extraordinary
extras
extreme
extremely
eye
eyes
f
fabric
fabulous
face
faces
fact
factor
factory
facts
fade
fail
failed
fails
failure
fair
fairlyfairy
faith
faithful
fake
fall
falling
fallsfalse
fame
familiar
families
family
famous
fanfans
fantastic
fantasy
far
fare
farley
farmfarmer
fascinating
fashion
fast
fasten
fat
fate
father
fault
fauna
favor
favoritefavoritesfavourite
fbi
fear
featherfeathers
feature
featured
features
featuring
fee
feed
feedback
feelfeeling
feelings
feels
feet
feline
fell
fellow
felt
female
fence
ferry
fertility
festival
fever
fiction
field
fightfightingfights
figure
figured
fill
filled
film
film's
film-making
filmedfilming
filmmaker
filmmakers
films
final
finale
finally
find
finding
finds
fine
finest
finger
fingerprint
finish
finished
fire
fireworks
first
fish
fishing
fit
fits
five
fix
flag
flame
flamingoflap
flash
flashbacks
flat
flavor
flaw
flawed
flawless
flaws
fleefleeing
flesh
flexflexible
flick
flicks
flight
fling
flip
float
flood
floor
flora
flour
flow
flowerflowers
flu
flush
flute
flutter
fly
flyer
flying
focusfocused
focuses
fog
foil
foldfoliage
folks
follow
followed
following
follows
food
foolish
foot
footage
football
footprint
forbear
forbes
forbid
force
forced
forces
ford
forecastforeign
forest
forever
forged
forget
forgive
forgot
forgotten
form
formal
format
formation
former
formula
forswearing
forth
forthcoming
forward
foster
found
foundation
fountain
four
fourth
fox
fragile
fragrance
frame
framework
france
franchise
frank
frankly
fraternity
fraud
fred
freddy
free
freedom
freeman
freeze
french
fresh
freud
friday
friedman friend
friendly
friends
friendship
frighten
frightening
frigid
frog
front
frost
frozen
fruit
frustrate
frustration
fry
fuck
fuel
full
fully
fun
fundfunds funeral
fungi
funnierfunniestfunny
fur
furnace
furniture
fury
future
g
gags
gain
galaxy
gallon
gamble
gambling
gamegames
gang
gangster
garage
garbage
garden
garfield
garlic
garment
gary
gasgasoline
gate
gateway
gathergathering
gauge
gave
gay
gear
gem
gender
gene
general
generally
generation
generous
genius
genre
gentleman
genuine
genuinely
george
germangermany
get
gets
getting
ghost
giant
gibson
giggle
gin
giraffe
girl
girlfriend
girls
give
given
gives
giving
glad
glance
glare
glass
glide
glittering
globe
glory
glove
glow
glue
go
goal
goat
goats
god
godfather
goes
going
gold
golden
golf
gone
gonna
good
goofy
goose
gordon
gore
gorgeous
gory
gospel
gossip
got
gotten
goverment
government
governor
grab
grace
grade
graduate
graffiti
grand
grant
granted
grapes
graphic
graphics
grasp
grass
grate
gratuitous
grave
gravestone
graveyard
gray
graze
great
greater
greatest
greatly
green
greet
grew
grey
grief
grill
grind
grip
gripping
gritty
grocery
groom
gross
ground
group
grow
growing
grown
growth
guarantee
guardian
guess
guest
guild
guilt
guilty
guitar
gulf
gull
gulp
gum
gunguns
guru
gut
guy
guys
gymnastics
h
hack
hairhaircut
half
hall
halloween
hallway
hamburger
hamlet
hamper
hamster
hancock
hand
handbag
handle
handled
hands
handsome
handwriting
hang
hanging
hanks
happenhappened
happening
happens
happiness
happy
harbor
harbour
hard
hardly
hardware
harm
harris
harryharsh
harvard
hat
hatch
hate
hated
haul
haunt
haunted
haunting
have
hawk
haze
he's
head
heads
heal
health
hear
heard
hearing
heart
heatheaven
heavily
heavy
heck
height
held
helen
helium
hell
helmet
help
helped
helper
helps
hen
henry
herb
here's
heritage
hero
heroes
heroin
heroine
hesitate
hey
hidden
hide
high
higher
highlight
highly
highway
hike
hilarious
hill
hip
historicalhistory
hit
hitch
hitchcock
hits
hockey
hoffman
hold
holding
holds
hole
holes
holiday
holler
hollywood
holmes
holy
home
homeless
homer
honest
honestly
honey
honolulu
hood
hooked
hoot
hop
hopehopefully
hopes
hoping
hopkins
horace
horizon
horn
horrible
horror
horse
hose
hospital
hostility
hot
hotel
hound
hourhours
house
housing
houston
howard
however
hug
huge
hugh
hum
humanhumanity
humans
humiliate
hummingbird
humor
humorous
humour
hundred
hunthunter
hurricane
hurry
hurt
husband
husky
hustle
hut
hydrogen
hymn
hype
hypertension
hypnotize
hysteria
i'd
i'll
i'm
i've
ian
ice
icon
idea
ideas
identity
ignorance
ignore
ii
iii
illegal
illness
illusion
illustrations
image
imageryimages
imagination
imagine
imdb
imitate
immediately
immoral
immunity
impact
impartiality
impatient
impersonate
implementimplementation
implode
importance
important
impose
impossible
impressed
impression
impressive
imprisoned
improving
impulse
inch
incline
include
included
includesincluding
income
increase
incredible
incredibly
indeed
independence
independent
indexindian
indiana
indication
individual
industrial
industry
inexpensive
infection
infinite
influence
inform
information
infrastructure
inhale
initial
initially
ink
inmate
inn
inner
innocence
innocent
inquire
insane
insectinsects
insert
inside
insight
inspect
inspired
installation
installment
instance
instead
institution
instructinstruction
instructor
instrument
instrumentation
insult
insurance
insureinsured
integrateintegration
intellect
intelligence
intelligent
intend
intended
intenseintensity
interaction
interest
interested
interesting
interim
interior
international
internet
interpretation
interrupt
intervention
interview
intoxicate
intriguing
introduce
introducedintroduction
intuition
invalidate
invent
inventory
investigateinvestigation
investment
investor invite
invoice
involve
involved
involvesinvolving
ipod
iris
irish
iron
irritate
irritating
island
isolation
israel
issueissues
italian
italy
ivy
j
jack
jacket
jackie
jackson
jaguar
jail
jake
jamaica
james
jamie
jane
japanjapanese
jar
jason
jaw
jay
jazz
jean
jeff
jellyfish
jennifer
jerk
jerry
jerusalem
jessica
jesus
jet
jewel
jewelry
jewish
jim
jimmy
joan
job
joe
jog
john
johnny
join
joint
joke
jokes
jones
journal
journey
joy
jr
judge
judging
judgment
juice
julia
julie
jump
jumper
jungle
jurisdiction
jury
justice
justify
juvenile
kkane
kansas
kate
keaton
keep
keeping
keeps
kelly
kennedy
kept
kevin
key
keyboard
kick
kid
kidnap
kidney
kids
killkilled
killerkillers
killingkills
kilometer
kind
kinda
kinds
king
kiss
kitchen
kittens
kitty
klan
knee
kneel
knew
knife
knight
knit
knitting
knock
know
knowing
knowledge
knownknows
kong
kremlin
kubrick
l
la
lablaboratory
lace
lacklackinglacks
lad
laden
ladies
lady
lake
lamb
lame
lamp
land
landlord
landscape
language
lantern
large
largely
larry
last
late
later
latest
latex
latin
latter
laugh
laughable
laughedlaughing
laughs
laughter
launch
laundering
laura
law
lawn
lawyer
lay
layer
lead
leader
leadingleads
leaf
lean
leap
learnlearned
learns
least
leather
leave
leavesleaving
lebanon
led
ledge
lee
left
leg
legend
legendary
legion
lego
lemon
lend
length
lens
less
lesson
let
let's
lets
letter
levellevels
lewis
liability
library
libya
license
lick
licking
lie
lien
lies
life
lift
light
lighthouse
lighting
lightning
lights
likable
like
liked
likely
likes
lily
limb
limit
limited
limp
lincoln
line
linen
lines
lineup
lingerie
link
lion
lip
lips
liquid
liquor
list
listen
listing
literally
literature
litter
little
livelived
liver
lives
living
lizard
load
loan
lobster
local
locate
locationlocations
locked
locomotives
log
logic
london
lonely
long
longer
looklookedlooking
looks
loop
loose
loosen
lord
lose
loseslosing
loss
losses
lost
lot
lots
loud
louisiana
lounge
love
loved
lovely
lover
lovers
lovesloving
low
lower
lucas
luck
lucky
luggage
luke
lunch
lung
luxury
lynch
lyric
mac
machine
mad
made
madhouse
magazine
magicmagical
magician
magnificent
magnify
magnitude
magnolia
maid
main
mainly
mainstream
majormajority
make
make-up
maker
makers
makes
makeup
making
malemales
mallard
mammalmammals
manman's
managemanaged
management
manager
manages
manchester
mannequin
manner
manslaughter
manufacturer
many
map
maple
maradona
marathon
marble
marching
mare
marijuana
mark
market
marriagemarried
marrow
marry
mars
martial
martin
mary
mask
massacre
massive
master
masterpiece
match
mate
material
matrix
matt
matter
matters
matthew
mature
max
maxwell
may
maybe
mayor
meal
mean
meaning
means
meant
meanwhile
measure
meat
mechanic
mechanism
medal
media
medicine
mediocre
medium
meetmeeting
meets
mel
melody
melt
membermembers
memorabilia
memorable
memorial
memoriesmemory
men
menace
mend
mental
mention
mentioned
menu
meremerely
mess
message
met
metal
meter
metro
mexico
michael
michelle
microwave
midday
middle
might
mike
mile
miles
military
militia
milk
mill
miller
millionmillions
mimic
mindmindless
minds
mine
miniature
minister
ministry
mink
minor
minority
minuteminutes
mirror
misery
miss
missed
missile
missing
mission
misspend
mist
mistake
mistakes
mistreat
misty
mixmixedmixture
mob
mode
model
modern
modest
modification
molecule
mom
moment
moments
momentum
money
monk
monkeys
monstermonsters
monthmonths
monument
mood
moon
moore
moralmorality
morgan
morning
mortal
mortality
moss
mostly
motel
mother
motion
motive
motormotorcycle
motto
mount
mountain
mouse
mouth
move
moved
movement
moves
movie
movie's
movies
moving
mow
mozart
mr
mrs
ms
much
mud
mug
multiple
multiply
mummy
munch
munich
mural
murdermurdered
murders
murphymurray
musclemuseum
mushrooms
musicmusical
musicianmusicians
mussolini
must
mustard
myers
mysterious
mystery
myth
nag
nailnails
naked
name
named
namesnap
narration
narrative
narrow
nasty
nation
national
native
natural
naturally
nature
navy
nazi
near
nearly
necessarily
necessary
neck
necklace
need
needed
needle
needs
negative
neglect
negroes
neither
neon
nephew
nerve
nervous
nest
net
network
neutral
never
nevertheless
new
news
newspaper
next
nice
nicely
nicholson
nick
nickel
night
nightmare
nobody
noir
noise
nominated
none
nonetheless
nonsense
noodles
noon
normal
normally
north
nose
notch
note
notebook
nothing
notice
noticeable
noticed
notify
novel
novice
nowhere
nuclear
nude
nudity
nuisance
nullify
numbernumbers
numerous
nursenurses
nut
nutrition
nuts
oak
obey
objectobjective
obligation
observation
observatory
observe
obsessed
obtain
obvious
obviously
occasion
occasionally
occupation
occuroccurrence
ocean
odd
odyssey
offend
offer
offers
office
officer
official
often
oh
oil
okokay
old
older
oldest
oliver
one
one's
ones
onion
onto
opec
open
opening
opens
opera
operateoperationoperative
operator
opinion
opponent
opportunity
opposite
option
oracle
orange
orchestra
orchids
order
ordinary
organ
organism
organization organize
origami
origin
original
originality
originally
originate
orthodontist
oscaroscars
others otherwise
outdoor
outfit
outlet
outnumber
outside
outstanding
oval
over-the-top
overall
overcome
overflow
overhead
overly
overpower
overwhelm
owe
owen
owl
own
owner
ox
oxford
oxygen
p
pacepaced
pacific
pacing
pacino
package
packaging packed
packet
pact
padding
page
paid
pain
painful
paintpainted
painting
pair
pakistan
palace
palestinianpalestinians
palm
pan
panda
panorama
paper
papers
par
parade
paragraph
parcel
pardon
parentparents
paris
parish
park
parker
parking
parody
parrot
part
participant
participate
particular
particularly
partner
parts
party
pass
passage
passed
passion
passport
past
pat
patch
patent
path
pathetic
patients
patio
patrick
patternpatterns
paul
pause
paws
pay
payment
peace
peaceful
peacock
pearl
pebbles
peel
pelicanpelt
pen
pencil
pennsylvania
people
people's
pepper
perceive
percent
perception
perfect
perfection
perfectly
perform
performanceperformances
performedperformer
perfume
perhaps
period
perish
perjury
permission
permit
person
personal
personality
personally
personnel
perspective
perspire
persuade
pet
petals
peter
pets
pg
phantom
phenomenonphilip
phone
photographer
photography
photos
phrase
physical
physician
physics
piano
piazza
pickpicked
picture
pictures
pie
piecepieces
pier
pig
pigeon
pigs
pillowpilot
pin
pinch
pink
pinnacle
pipe
pitch
pitt
pity
pizza
place
placed
places
plague
plain
plan
plane
planet
planningplans
plant
plantation
plastic
plateplates
play
played
playerplayers
playground
playing
plays
pleaplead
pleasant
pleasantly
please
pleasure
pledges
plenty
plot
plots
plow
pluck
plus
poach
pod
poem
poet
poetry
poignant
point
pointless
points
poise
poker
poland
pole
poles
police
polish
polite
political
politician
politics
pollution
polo
polyester
pond
ponder
poodle
pool
poor
poorly
pop
popcorn
poppy
popular
population
porch
pork
porn
port
portrait
portray
portrayalportrayed
portrayingportrays
position
positive
possess
possession
possibility
possible
possibly
post
postage
postcards
posted
poster
pot
potatopotatoes
potential
potter
poultry
pounce
pour
poverty
power
powerful
powers
practically
practice
praise
pray
prayer
precedent
predict
predictable
prefer
preference
pregnancy
pregnant
prejudice
premise
preparation
prepare
prepared
presence
presentpresented
presents
preservation
president
press
pressure prestige
presume
pretend
pretentious
pretty
preview
previous
previously
prey
price
prick
pride
priest
prime
princeprincess
printer
prior
prison
prisoners
private
privileges
probability
probably
problem
problems
procedure
proceedings
process
processing
proclaim
produce
produced
producer
producers
product
production
profession
professional
professionals
professorprofit
program
prohibit
project
projector
prominence
promise
promote
promoter
proof
propaganda
proper
property
prose
prosecute
prosper
protagonist
protectprotection
protective
protest
protestant
protocol
proton
proud
prove
provedproves
provideprovidedprovides
province
proving
proximity
psychiatry
psycho
psychological
psychology
pub
public
publication
puddle
pug
pullpulled
pulls
pulp
pump
pumpkin
punch
punishpunishment
punk
pup
pupil
puppy
purchase
pure
purple
purpose
purse
pursue
push
put
puts
putting
pyramid
qualities
quality
quantity
quarantine
quarrel
quarterback
quartz
queenqueens
quench
query
quest
questionquestions
queue
quick
quickly
quiet
quirky
quit
quite
quiz
quota
quotation
quote
r
rabbi
rabbit
race
racer
rachel
racing
racism
racket
radar
radiation
radical
radio
railrailroad
rails
railway
rain
rainbow
raise
raised
rally
ran
random
range
rap
rape
rapid
rarerarely
rash
raspberry
rat
rate
rated
rather
rating
rationalize
rattle
raw
ray
reach
reaction
read
reader
reading
ready
real
realism
realistic
reality
realize
realized
realizes
really
reap
reappear
rearrange
reason
reasons
rebel
recall
receivereceived
receiver
recent
recently
recess
recognition
recognize
recommend
recommendation
recommended
record
recorder
recovery
recreation
recruit
recyclerecycling
red
redeeming
redemption
redheadreduce
reef
reel
reeves
reference
references
reflectreflection
refrain
refreshing
refrigerator
refugees
refuse
regarding
regardless
region
register
registration
regret
regular
reichstag
reject
rejoice
relate
relation relationship
relationships
relative
relatively
relax
relaxation
relaxed
relay
releasereleased
relief
religionreligious
rely
remain
remains
remake
remark
remarkable
remedy
remember
remembered
remindremindedreminds
reminiscent
remove
renounce
rentrental
rented
repair
repeal
repeat
replacereply
report
representation
representative
repress
reprimandreproduce
reptiles
republic
republicans
reputation
request
requirerequired
rescuerescued
researchresearchers
reservation
reserve
respect
respond
responsible
rest
restaurant
restless
restore
restrict
result
results
retailer
retain
retireretiring
retreat
returnreturns
revealrevealed
revenge
review
reviewersreviews
revolution
reward
rhodes
rhythm
rice
rich
richard
ride
ridge
ridicule
ridiculous
right
ring
rings
rinse
rip
ripples
rise
risk
river
riverside
road
roamroar
rob
robert
roberts
robin
robot
rock
rocker
rocket
rocky
rod
rodents
roger
roleroles
roll
rolling
romanceromantic
ron
roof
rook
room
rooster
root
rope
rose
roughround
routine
rover
row
rowing
rub
rubber
rubbish
rugby
ruin
ruined
ruins
rule
rules
run
running
runs
rush
russell
russiarussian
rust
rusty
ryan
sabbath
sad
sadly
safe
safety
said
sailsailing
sailor
saint
saints
sake
salad
salary
sale
sales
salt
salutesam
sanctions
sand
sandwich
santa
sarah
sat
satan
satellite
satin
satire
satisfy
satisfying
saturday
sauce
sausage
savesaved
saving
saw
say
saying
says
scale
scandal
scarce
scarescared
scaresscary
scene
scenery
scenes
scheme
school
sci-fi
science
scientist
scold
scooter
score
scotch
scott
scottish
scouting
scramble
scratch
scream
screaming
screen
screening
screenplay
scribble
script
scrutiny
sculpture
sea
seafood
seagull
sean
search
seashore
seasonseasons
seat
second
seconds
secret
secretary
secure
security
see
seed
seeing
seek
seem
seemed
seemingly
seems
seenseepage
sees
segment
seizeseizure
select
selection
self
sell
seminar
senate
send
sense
sensitive
sent
sentence
sentenced
sentiment
sentry
separate
separation
sequelsequels
sequence
sequences
serial
series
serious
seriously
serveserved
server
serves
service
serving
session
set
sets
setting
settle
seven
several
sew
sewing
sexsexual
sexy
shade
shadow
shake
shakespeare
shallowshame
share
shariff
shark
sharp
shatter
shed
sheep
sheer
sheet
shell
shelter
sheriff
shift
shineshines
ship
shirt
shiver
shock
shocked
shocking
shoes
shootshooting
shopshopping
shore
short
shortage
shotshots
shoulder
shove
show
showed
shower
showingshown
shows
shrink
shuffle
shun
shut
sick
sickness
side
sides
sidewalk
sight
sign
signal
signature
signed
significant
signs
silence
silent
silhouette
silly
silver
similar
similaritysimon
simple
simply
simulation
since
sing
singapore
singer
singing
single
sink
sinner
sip
sister
sisters
sit
site
sitting
situation
situations
six
sixteen
sixth
size
skateskateboard
skating
sketch
skiskiing
skillskills
skin
skip
skirt
skull
sky
skyline
skyscraper
slap
slapstick
slash
slasher
slaveslaveryslaves
slayslaying
sleep
sleeve
slice
slide
slightly
slip
slither
slope
slow
slowly
slurp
sly
small
smart
smash
smear
smell
smile
smith
smoke
smoking
smother
smudge
snake
snap
snatch
sneak
sneeze
sniff
snoozesnore
snow
snowboarding
snowman
snuggle
soak
soap
soar
sob
soccer
social
socialism
society
socks
sofa
softball
software
soil
solar
soldiersoldiers
sole
solid
solve
somebody
somehow
someone
something
sometimes
somewhat
somewhere
son
song
songs
soon
soothe
sorrow
sorry
sort
sorts
soul
soundsounds
soundtrack
soup
source
south
soviets
sow
soybean
space
spaghetti
spanish
spank
spare
spark
spawn
speakspeakingspeaks
special
species
spectacular
speculation
speech
speed
spell
spendspending
spent
sphere
spider
spider-man
spielberg
spill
spin
spine
spiral
spirit
spite
splash
split
spoilspoilerspoilers
spoon
sportsports
spot
sprain
spray
spread
spring
springfield
sprinkle
sprint
spy
squad
square
squash
squeaksqueal
squeeze
squint
squirrel
stab
stability
stadium
stage
stain
staircase
stairs
stake
stalin
stamp
stance
stand
standardstandards
standing
stands
star
stare
starring
stars
start
started
starter
starting
starts
starve
state
statement
states
station
statue
status
stay
stayed
stays
steak
steal
steals
steel
steeple
steer
stem
stencil
stepstephen
stereotypes
steve
steven
stevenson
stew
stewart
stick
stickers
still
sting
stink
stir
stitch
stitches
stock
stockings
stomach
stonestop
stopped
store
stories
stork
storm
storystoryline
storytelling
stove
straight
strain
strange
stranger
strategy
straw
strawberry
stray
stream
street
streets
strength
stretch
strike
string
strip
stripes
strive
stroke
strong
strongly
structure
struggle
struggling
stuck
stud
studentstudents
studio
study
stuff
stumble
stunning
stunts
stupid
stylestylish
subcommittee
subject
submarine
submitsubstance
subtitles
subtle
subtract
subway
succeed
succeeds
successsuccessful
sucksuckedsucks
sudden
suddenly
suds
sue
suffering
suffers
sugar
suggest
suicide
suit
sum
summary
summer
sun
sunflower
sunglasses
sunk
sunlightsunny
sunrisesunset
sunshine
super
superb
superhero
superior
superman
supernatural
supper
supply
support
supporter
supporting
suppose supposed
supposedly
sure
surely
surf
surface
surfers
surgery
surname
surpass
surprise
surprisedsurprises
surprising
surprisingly
surreal
survive
susan
sushi
suspect
suspensesuspenseful
suspicion
swallowswamp
swan
swap
sway
swear
sweat
sweater
sweden
sweet
swimswimming
swimsuit
swing
swoon
sword
sydney
symbol
sympathetic
sympathy
system
table
tableware
tackle
tag
tail
take
taken
takestaking
tale
talenttalentedtalents
tales
talktalking
talks
tall
tan
tank
tap
tape
tarantino
target
tarnish
task
taste
tattoo
tax
taxation
taxi
taxpayer
taylor
tea
teach teacherteaching
team
teartears
tease
technical
technician
technology
teen
teenageteenager
teenagersteens
teeth
telephone
television
tell
telling
tells
temper
temperature
temple
ten
tend
tennis
tenor
tensetension
tent
term
terminator
terms
terrible terribly
terrier
terrific
terrifying
territory
terror
test
tested
testify
texas
text
textbook
textile
texture
thailand
thank
thankfully
thanks
that's
thats
thaw
theater
theaters
theatre
theatrical
theft
theme
themes
theory
there's
therefore
they're
they've
thief
thin
thing
things
think
thinking
thinks
third
thomas
thoroughlythough
thought
thoughts
threatthreats
three
thrillerthrilling
throat
throne
throughout
throw
thrown
thumb
thunderstorm
thus
ticker
tickettickets
tickle
tie
tiger
tight
tiles
till
tim
timber
time
timer
times
tin
tiny
tire
tired
titanic
title
toast
tobacco
todaytoday's
toe
toes
together
toil
toilet
told
tolerance
tom
tomato
tommy
tone
tongue
tony
took
tool
tooth
top
topic
torture
toss
total
totally
tote
touch
touched
touches
touching
tough
tourist
tournament
tow
towardtowards
tower
town
toytoys
track
tract
tradetrading
traditional
traffic
tragedytragic
trail
trailertrailers
train
trainer
training
tram
transformation
transformers
transmission
transplant
transporttransportation
trap
trapped
trash
traveltraveling
tread
treasure
treat
treatedtreatment
treaty
tree
trek
trial
tribunal
trick
tried
tries
trilogy
trim
trinity
trip
triumph
troops
tropical
trouble
troubles
truck
true
truly
trunk
trust
truth
trytrying
tsunami
tub
tug
tulip
tumble
tune
tunnel
turkey
turks
turn
turned
turning
turns
tv
twenty
twice
twigs
twist
twisted
twists
two
type
types
typical
u
ugly
uk
ultimate
ultimately
umbrella
unable
unbelievable
uncle
underground
underrated
understand
understanding
understood
underwater
unexpected
unforgettable
unfortunate
unfortunately
unhappy
uniform
union
unique
unit
unite
united
universal
universe
unknown
unless
unlike
unlikely
unload
unnecessary
unrealistic
unusual
update
upon
upset
urban
urge
us
usa
useused
usesusing
usual
usually
usurp
utter
utterly
v
vacate
vacation
valentine
valley
valor
value
values
vampirevampires
van
vanish
variety
various
vary
vault
veer
vehicle
vein
verdict
verify
verse
versionversions
vessel
veteran
vhs
victimvictims
victor
victory
video
vietnamvietnamese
view
viewed
viewer
viewers
viewing
views
villa
villagevillages
villainvillains
vincent
vine
vinegar
vintage
vinyl
violation
violenceviolent
violet
violin
virtually
virtuoso
vision
visit
visitor
visualvisuallyvisuals
vitamin
vocal
vodka
voicevoices
volcano
volunteer
vomit
vote
voyage
vs
w
wag
wagner
wagon
waist
wait
waiting
wake
walk
walked
walking
walkswall
walter
wander
want
wanted
wanting
wants
war
warm
warnwarning
warranty
warrior
wars
warships
wash
washer
washing
washington
wasp
waste
wasted
watch
watchable
watchedwatching
water
waterfall
wave
way
wayne
ways
we're
we've
weak
wealth
weaponweapons
wearwearing
weather
weave
web
wed
wedding
wednesday
weed
weekweekendweeks
weep
weight
weird
welcome
welfare
well
went
west
western
wet
whale
what's
whatever
whatsoever
wheat
wheneverwhereas
whether
whilst
whip
whiskers
whiskey
whisper
whistle
white
who's
whole
whose
wide
wife
wig
wiggle
wild
william
williams
willing
willis
wilson
win
wind
windmill
window
wine
wing
wings
winnerwinning
winter
wipe
wire
wisdom
wise
wish
wit
witch
withdrawwithdrawal
withdrawn
within
without
witness
witty
wizard
wolf
woman
women
wonder
wonderful
wonderfully
wondering
wood
wooden
woodland
woods
woody
wool
wordwords
workworked
worker
working
workplace
works
workshop
world
worlds
worm
worry
worseworst
worth
worthwhile
worthy
would
would've
wow
wrap
wreck
wrist
write
writer
writers
writingwritten
wrong
wrote
x
x-men
yacht
yale
yard
yarn
yawn
yeah
year
year-oldyearn
years
yell
yellow
yen
yes
yet
yield
york
young
younger
youth
zebra
zero
zinc
zombie
zombies
zone
zoo
?
?
´
52 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
t-SNE on the imdb20 PPMI VSM
positivity negativity
52 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippets
vsm_01_code6_solved
April 1, 2019
In [1]: %matplotlib inlinefrom nltk.corpus import opinion_lexiconimport osimport pandas as pdimport vsm
In [2]: DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
In [3]: imdb5_ppmi = vsm.pmi(imdb5)
In [4]: # Supply a str filename to write the output to a file:vsm.tsne_viz(imdb5_ppmi, output_filename=None)
In [5]: # To display words in different colors based on external criteria:positive = set(opinion_lexicon.positive())negative = set(opinion_lexicon.negative())
colors = []for w in imdb5_ppmi.index:
if w in positive:color = 'red'
elif w in negative:color = 'blue'
else:color = 'gray'
colors.append(color)
vsm.tsne_viz(imdb5_ppmi, colors=colors)
1
53 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Latent Semantic Analysis (LSA)
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
54 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Overview
• Due to Deerwester et al. 1990.
• One of the oldest and most widely used dimensionalityreduction techniques.
• Also known as Truncated Singular Value Decomposition(Truncated SVD).
• Standard baseline, often very tough to beat.
55 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Guiding intuitions for LSA
●
●
●
●
x
y
0 2 10 11 12
0
34
1415
A
B
C
D
56 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
The LSA method
Singular value decompositionFor any matrix of real numbers A of dimension (m× n) thereexists a factorization into matrices T, S, D such that
Am×n = Tm×mSm×mDTn×m
· · · ·· · · ·· · · ·
=
· · ·· · ·· · ·
···
· · ·· · ·· · ·· · ·
T
A3×4 = T3×3 S3×3 DT4×3
57 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Idealized LSA example
d1 d2 d3 d4 d5 d6
gnarly 1 0 1 0 0 0wicked 0 1 0 1 0 0
awesome 1 1 1 1 0 0lame 0 0 0 0 1 1
terrible 0 0 0 0 0 1
⇓⇑
Distance from gnarly
1. gnarly2. awesome3. terrible4. wicked5. lame
T(erm)
gnarly 0.41 0.00 0.71 0.00 -0.58wicked 0.41 0.00 -0.71 0.00 -0.58
awesome 0.82 -0.00 -0.00 -0.00 0.58lame 0.00 0.85 0.00 -0.53 0.00
terrible 0.00 0.53 0.00 0.85 0.00
×
S(ingular values)
2.45 0.00 0.00 0.00 0.000.00 1.62 0.00 0.00 0.000.00 0.00 1.41 0.00 0.000.00 0.00 0.00 0.62 0.000.00 0.00 0.00 0.00 -0.00
×
D(ocument)
d1 0.50 -0.00 0.50 0.00 -0.71d2 0.50 0.00 -0.50 0.00 0.00d3 0.50 -0.00 0.50 0.00 0.71d4 0.50 -0.00 -0.50 -0.00 0.00d5 -0.00 0.53 0.00 -0.85 0.00d6 0.00 0.85 0.00 0.53 0.00
T
gnarly 0.41 0.00wicked 0.41 0.00
awesome 0.82 -0.00lame 0.00 0.85
terrible 0.00 0.53
× 2.45 0.000.00 1.62 =
gnarly 1.00 0.00wicked 1.00 0.00
awesome 2.00 0.00lame 0.00 1.38
terrible 0.00 0.85
Distance from gnarly
1. gnarly2. wicked3. awesome4. terrible5. lame
58 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Cell-value comparisons (k = 100)
0.00 0.25 0.50 0.75 1.00 1.25Cell value 1e8
101
103
105
107
Raw counts
⇒
1.0 0.5 0.0 0.5Cell value 1e8
100
101
102
103
104
105
LSA (k=100)
⇓
10 5 0 5 10 15Cell value
101
102
103
104
105
106
107
PMI
⇒
50 0 50 100 150 200Cell value
101
102
103
104
105
PMI+LSA (k=100)
59 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Choosing the LSA dimensionality
0 10 20 30 40Singular value rank
0.3
0.4
0.5
0.6
0.7
Sing
ular
val
ue
Dream scenario for choosing k
0 1000 2000 3000 4000 5000Singular value rank
5.0
2.5
0.0
2.5
5.0
7.5
Sing
ular
val
ue (l
og-s
cale
)
PMI+LSA
60 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Related dimensionality reduction techniques
• Principal Components Analysis (PCA)• Non-negative Matrix Factorization (NMF)• Probabilistic LSA (PLSA; Hofmann 1999)• Latent Dirichlet Allocation (LDA; Blei et al. 2003)• t-SNE (van der Maaten and Hinton 2008)
See sklearn.decomposition and sklearn.manifold
61 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Code snippetsvsm_01_code6_solved
April 1, 2019
In [1]: import osimport pandas as pdimport vsm
In [2]: DATA_HOME = os.path.join('data', 'vsmdata')
giga5 = pd.read_csv(os.path.join(DATA_HOME, 'giga_window5-scaled.csv.gz'), index_col=0)
In [3]: giga5.shape
Out[3]: (5000, 5000)
In [4]: giga5_lsa100 = vsm.lsa(giga5, k=100)
In [5]: giga5_lsa100.shape
Out[5]: (5000, 100)
1
62 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Autoencoders
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
63 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Overview
• Autoencoders are a flexible class of deep learningarchitectures for learning reduced dimensionalrepresentations.
• Chapter 14 of Goodfellow et al. (2016) is an excellentdiscussion.
64 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
The basic autoencoder model
x
x_hat = hW_hy + b_hy
h = f(xW_xh + b_xh)
Seeks to predict its own input.
High-dimensional inputs are fed through a narrow hidden layer
(or multiple hidden layers).This is the representation of
interest – akin to LSA output.
This might be preceded by a separate dimensionality
reduction step (e.g., LSA)
y_err = x_hat - x
d_b_hy = y_err
h_err = y_err.dot(W_hy.T) * f’(h)
d_W_hy outer(h, y_err)
d_W_xh = outer(x, h_err)
d_b_xh = h_err
Assume f = tanh and so f’(z) = 1.0 - z 2. Per example error is ∑i 0.5 * (x_hati - xi)2
65 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Autoencoder code snippets
vsm_01_code7
April 2, 2019
In [1]: from np_autoencoder import Autoencoderimport osimport pandas as pdfrom torch_autoencoder import TorchAutoencoderimport vsm
In [2]: DATA_HOME = os.path.join('data', 'vsmdata')
giga5 = pd.read_csv(os.path.join(DATA_HOME, 'giga_window5-scaled.csv.gz'), index_col=0)
In [3]: # You'll likely need a larger network, trained longer, for good results.ae = Autoencoder(max_iter=10, hidden_dim=50)
In [4]: # Scaling the values first will help the network learn:giga5_l2 = giga5.apply(vsm.length_norm, axis=1)
In [5]: # The `fit` method returns the hidden reps:giga5_ae = ae.fit(giga5_l2)
Finished epoch 10 of 10; error is 0.4883386066987744
In [6]: torch_ae = TorchAutoencoder(max_iter=10, hidden_dim=50)
In [7]: # A potentially interesting pipeline:giga5_ppmi_lsa100 = vsm.lsa(vsm.pmi(giga5), k=100)
In [8]: giga5_ppmi_lsa100_ae = torch_ae.fit(giga5_ppmi_lsa100)
Finished epoch 10 of 10; error is 1.2230274677276611
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
1
66 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Autoencoder code snippetsIn [9]: vsm.neighbors("finance", giga5).head()
Out[9]: finance 0.000000minister 0.870300. 0.880074</p> 0.896013ministry 0.897051dtype: float64
In [10]: vsm.neighbors("finance", giga5_ae).head()
Out[10]: finance 0.000000article 0.504076style 0.526473domain 0.538920investigators 0.548903dtype: float64
In [11]: vsm.neighbors("finance", giga5_ppmi_lsa100_ae).head()
Out[11]: finance 0.000000affairs 0.232635management 0.248080commerce 0.255099banking 0.256428dtype: float64
2
66 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Global Vectors (GloVe)
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
67 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Overview
• Pennington et al. (2014)
• Roughly speaking, the objective is to learn vectors forwords such that their dot product is proportional to theirprobability of co-occurrence.
• We’ll use the implementation in the mittens package(Dingwall and Potts 2018). There is a referenceimplementation in vsm.py. For really big vocabularies,the GloVe team’s C implementation is probably the bestchoice.
• We’ll make use of the GloVe team’s pretrainedrepresentations throughout this course.
68 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
The GloVe objective
w>iewk + bi + ebk = log(Xik)
Equation (6):
w>iewk = log(Pik) = log(Xik)− log(Xi)
Allowing different rows and columns:
w>iewk = log(Pik) = log(Xik)− log(Xi∗ · X∗k)
That’s PMI!
pmi(X, i, j) = log�
Xij
expected(X, i, j)
�
= log�
P(Xij)
P(Xi∗) · P(X∗j)
�
By the equivalence log(xy ) = log(x)− log(y)
69 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
The weighted GloVe objective
Original
w>iewk + bi + ebk = log(Xik)
Weighted
|V|∑
i,j=1
f�
Xij�
�
w>iewj + bi + ebj − logXij
�2
where V is the vocabulary and f is
f (x)
¨
(x/xmax)α if x < xmax
1 otherwise
Typically, α is set to 0.75 and xmax to 100.
70 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
GloVe hyperparameters
• Learned representation dimensionality.• xmax, which flattens out all high counts.• α, which scales the values as (x/xmax)
α.
f (x)
¨
(x/xmax)α if x < xmax
1 otherwise
f��
100 99 75 10 1��
=�
1.00 0.99 0.81 0.18 0.03�
71 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
GloVe learningThe loss calculations
f (Xij)�
w>iewj − logXij
�
show how gnarly and wicked arepulled toward awesome. Biasterms left out for simplicity.gnarly and wicked deliberatelyfar apart in w0 and ew0.
0.4 0.2 0.0 0.2 0.4GloVe dimension 1
0.4
0.2
0.0
0.2
0.4
GloV
e di
men
sion
2
gnarly
wicked
awesome
terribleBefore iteration 1
0.5 1.0 1.5 2.0GloVe dimension 1
1.5
1.0
0.5
0.0
0.5
GloV
e di
men
sion
2
gnarly
wicked
awesome
terribleAfter iteration 1
Counts gnarly wicked awesome terrible
gnarly 10 0 9 1wicked 0 10 9 1awesome 9 9 19 1terrible 1 1 1 3
Weights(xmax = 10, α = 0.75)gnarly wicked awesome terrible
gnarly 1.00 0.00 0.92 0.18wicked 0.00 1.00 0.92 0.18awesome 0.92 0.92 1.00 0.18terrible 0.18 0.18 0.18 0.41
w0
gnarly 0.27 −0.27wicked −0.27 0.27awesome 0.36 −0.50terrible 0.08 0.16
ew0
gnarly 0.18 −0.18wicked −0.18 0.18awesome 0.03 0.20terrible 0.17 0.32
0.92�
�
0.27 −0.27�> � 0.03 0.20
�
− log( 9 )�
= −2.06
0.92�
�
−0.27 0.27�> � 0.03 0.20
�
− log( 9 )�
= −1.98
w1
gnarly 0.99 −0.85wicked 0.74 −0.54awesome 0.37 −0.26terrible 0.12 0.21
ew1
gnarly 0.97 −0.82wicked 0.73 −0.54awesome 0.34 −0.25terrible 0.20 0.34
0.92�
�
0.99 −0.85�> � 0.34 −0.25
�
− log( 9 )�
= −1.51
0.92�
�
0.74 −0.54�> � 0.34 −0.25
�
− log( 9 )�
= −1.66
72 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
GloVe cell-value comparisons (n = 50)
0.00 0.25 0.50 0.75 1.00 1.25Cell value 1e8
101
103
105
107
Raw counts
⇒
2 1 0 1 2Cell value
103
104
GloVe (n=50)
73 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
GloVe code snippets
vsm_01_code8
April 2, 2019
In [1]: from mittens import GloVeimport numpy as npimport osimport pandas as pdimport vsm
In [2]: DATA_HOME = os.path.join('data', 'vsmdata')
imdb5 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window5-scaled.csv.gz'), index_col=0)
imdb20 = pd.read_csv(os.path.join(DATA_HOME, 'imdb_window20-flat.csv.gz'), index_col=0)
In [3]: # What percentage of the non-zero values are being mapped to 1 by f?def percentage_nonzero_vals_above(df, n=100):
v = df.values.reshape(1, -1).squeeze()v = v[v > 0]above = v[v > n]return len(above) / len(v)
In [4]: percentage_nonzero_vals_above(imdb5)
Out[4]: 0.017534398942316464
In [5]: percentage_nonzero_vals_above(imdb20)
Out[5]: 0.1519065095882084
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
1
74 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
GloVe code snippetsIn [ ]:
In [6]: glv = GloVe(max_iter=100, n=50)
In [7]: imdb5_glv = glv.fit(imdb5)
Iteration 100: loss: 536157.755
In [8]: glv.sess.close()
In [9]: imdb20_glv = glv.fit(imdb20)
Iteration 100: loss: 1043351.625
In [10]: # Restore the original `pd.DataFrame` structure:imdb20_glv = pd.DataFrame(imdb20_glv, index=imdb20.index)
In [11]: # To what a degree is the GloVe objective achieved?def correlation_test(true, pred):
mask = true > 0M = pred.dot(pred.T)with np.errstate(divide='ignore'):
log_cooccur = np.log(true)log_cooccur[np.isinf(log_cooccur)] = 0.0row_prob = np.log(true.sum(axis=1))row_log_prob = np.outer(row_prob, np.ones(true.shape[1]))prob = log_cooccur - row_log_prob
return np.corrcoef(prob[mask], M[mask])[0, 1]
In [12]: correlation_test(imdb5.values, imdb5_glv)
Out[12]: 0.38032242586515264
In [13]: correlation_test(imdb20.values, imdb20_glv.values)
Out[13]: 0.484126476892789
In [ ]:
In [ ]:
In [ ]:
In [14]: %matplotlib inlineimport matplotlib.pyplot as plt
In [15]: def hist(df, output_filename, color, title):figdir = "/Users/cgpotts/Documents/teaching/2018-2019/spring/cs224u/vsm/fig/"output_filename = os.path.join(figdir, output_filename)vals = df.values.reshape(1, -1).squeeze()plt.hist(vals, log=True, facecolor=color)plt.xlabel("Cell value")plt.title(title)plt.savefig(output_filename)plt.close()
2
74 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word2vec
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
75 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Overview
• Introduced by Mikolov et al. (2013).
• Goldberg and Levy (2014) identify the relationshipbetween word2vec and PMI.
• The TensorFlow tutorial Vector representations of wordsis very clear and links to code.
• Gensim package has a highly scalable implementation.
76 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word2vec: From corpus to labeled data
it was the best of times, it was the worst of times, . . .
With window size 2:
x y
it wasit thewas itwas thewas bestthe wasthe itthe bestthe of
. . .
77 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word2vec: Basic skip-gramThe basic skip-gram model estimates the probability of aninput–output pair (a,b) as
P(b | a) =exp(xawb)
∑
b′∈V exp(xawb′)
where xa is the row-vector representation of word a and wb isthe column vector representation of word b. Minimize:
−m∑
i=1
|V|∑
k=1
1{ci = k} logexp(xiwk)
∑|V|j=1 exp(xiwj)
where V is the vocabulary and c is a one-hot encoded vectorof the same length as V. This gives rise to a classifier:
C = softmax(XW + b)
We’re back to our core insight for this unit: word and contextmatrix, pushing their dot products in a specific direction.
78 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
word2vec: Noise contrastive estimation
Training the basic skip-gram model directly is expensive forlarge vocabularies, because W, b, and C get so large. Noisecontrastive estimation addresses that:
∑
a,b∈D− logσ(xawb) +
∑
a,b∈D′logσ(xawb)
with σ the sigmoid activation function 11+exp(−x) . D′ is a
sample of pairs that don’t appear in the training data.
79 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Retrofitting
1. High-level goals and guiding hypotheses2. Matrix designs3. Vector comparison4. Basic reweighting5. Subword information6. Visualization7. Dimensionality reduction
a. Latent Semantic Analysisb. Autoencodersc. GloVed. word2vec
8. Retrofitting
80 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Central goals
• Distributional representations are powerful and easy toobtain, but they tend to reflect only similarity(synonymy, connotation).
• Structured resources are sparse and hard to obtain, butthey support learning rich, diverse semantic distinctions.
• Can we have the best aspects of both? Retrofitting is oneway of saying, “Yes”.
• Retrofitting is due to Faruqui et al. (2015).
81 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Purely distributional representations
• High-dimensional
• Meaning from dense linguistic inter-relationships
• Meaning solely from (nth-order) co-occurrence
• No grounding in physical or social contexts
• Not symbolic
82 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Grounding via supervisionWord vectors to maximize unsupervised log-likelihood ofwords given documents and supervised prediction accuracy:
(Maas et al. 2011)
83 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Hidden representations from a deep classifier
84 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
The retrofitting model
∑
i∈Vαi
qi − qi
2+
∑
(i,j,r)∈Eβij
qi − qj
2
• Balances fidelity to theoriginal vector qi
• against looking more likeone’s graph neighbors.
• Forces are balanced withα = 1 and β = 1
Degree(i)Figure 1: Word graph with edges between related wordsshowing the observed (grey) and the inferred (white)word vector representations.
Experimentally, we show that our method workswell with different state-of-the-art word vector mod-els, using different kinds of semantic lexicons andgives substantial improvements on a variety ofbenchmarks, while beating the current state-of-the-art approaches for incorporating semantic informa-tion in vector training and trivially extends to mul-tiple languages. We show that retrofitting givesconsistent improvement in performance on evalua-tion benchmarks with different word vector lengthsand show a qualitative visualization of the effect ofretrofitting on word vector quality. The retrofittingtool is available at: https://github.com/mfaruqui/retrofitting.
2 Retrofitting with Semantic Lexicons
Let V = {w1, . . . , wn} be a vocabulary, i.e, the setof word types, and⌦ be an ontology that encodes se-mantic relations between words in V . We represent⌦ as an undirected graph (V,E) with one vertex foreach word type and edges (wi, wj) 2 E ✓ V ⇥ Vindicating a semantic relationship of interest. Theserelations differ for different semantic lexicons andare described later (§4).
The matrix Q will be the collection of vector rep-resentations qi 2 Rd, for each wi 2 V , learnedusing a standard data-driven technique, where d isthe length of the word vectors. Our objective isto learn the matrix Q = (q1, . . . , qn) such that thecolumns are both close (under a distance metric) totheir counterparts in Q and to adjacent vertices in ⌦.Figure 1 shows a small word graph with such edgeconnections; white nodes are labeled with the Q vec-
tors to be retrofitted (and correspond to V⌦); shadednodes are labeled with the corresponding vectors inQ, which are observed. The graph can be interpretedas a Markov random field (Kindermann and Snell,1980).
The distance between a pair of vectors is definedto be the Euclidean distance. Since we want theinferred word vector to be close to the observedvalue qi and close to its neighbors qj ,8j such that(i, j) 2 E, the objective to be minimized becomes:
(Q) =
nX
i=1
24↵ikqi � qik2 +
X
(i,j)2E
�ijkqi � qjk2
35
where ↵ and � values control the relative strengthsof associations (more details in §6.1).
In this case, we first train the word vectors inde-pendent of the information in the semantic lexiconsand then retrofit them. is convex in Q and its so-lution can be found by solving a system of linearequations. To do so, we use an efficient iterativeupdating method (Bengio et al., 2006; Subramanyaet al., 2010; Das and Petrov, 2011; Das and Smith,2011). The vectors in Q are initialized to be equalto the vectors in Q. We take the first derivative of with respect to one qi vector, and by equating it tozero arrive at the following online update:
qi =
Pj:(i,j)2E �ijqj + ↵iqiP
j:(i,j)2E �ij + ↵i(1)
In practice, running this procedure for 10 iterationsconverges to changes in Euclidean distance of ad-jacent vertices of less than 10�2. The retrofittingapproach described above is modular; it can be ap-plied to word vector representations obtained fromany model as the updates in Eq. 1 are agnostic to theoriginal vector training model objective.
Semantic Lexicons during Learning. Our pro-posed approach is reminiscent of recent work onimproving word vectors using lexical resources (Yuand Dredze, 2014; Bian et al., 2014; Xu et al., 2014)which alters the learning objective of the originalvector training model with a prior (or a regularizer)that encourages semantically related vectors (in ⌦)to be close together, except that our technique is ap-plied as a second stage of learning. We describe the
1607
85 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Simple retrofitting examples
∑
i∈Vαi
qi − qi
2+
∑
(i,j,r)∈Eβij
qi − qj
2
α = 0
86 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Simple retrofitting examples
∑
i∈Vαi
qi − qi
2+
∑
(i,j,r)∈Eβij
qi − qj
2
α = 0
86 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Simple retrofitting examples
∑
i∈Vαi
qi − qi
2+
∑
(i,j,r)∈Eβij
qi − qj
2
α = 0
86 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Extensions
Drop the assumption that every edge means ‘similar’:
• Mrksic et al. (2016) AntonymRepel, SynonymAttract, andVectorSpacePreservation for different edge types.
• Lengerich et al. (2018): functional retrofitting to learnthe semantics of any edge types.
• This work is closely related to graph embedding(learning distributed representations for nodes), forwhich see Hamilton et al. 2017.
87 / 90
Overview Designs Vector comparison Basic reweighting Subwords Viz LSA AE GloVe w2v Retrofitting
Retrofitting code snippets
vsm_01_code9
April 2, 2019
In [1]: import pandas as pdfrom retrofitting import Retrofitter
In [2]: Q_hat = pd.DataFrame([[0.0, 0.0],[0.0, 0.5],[0.5, 0.0]],
columns=['x', 'y'])
edges = {0: {1, 2}, 1: set(), 2: set()}
In [3]: Q_hat
Out[3]: x y0 0.0 0.01 0.0 0.52 0.5 0.0
In [4]: retro = Retrofitter(verbose=True)
In [5]: X_retro = retro.fit(Q_hat, edges)
Converged at iteration 2; change was 0.0000
In [6]: X_retro
Out[6]: x y0 0.125 0.1251 0.000 0.5002 0.500 0.000
In [7]: # For an application to WordNet, see `vsm_03_retrofitting`.
1
88 / 90
References
References IDavid M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research,
3:993–1022.Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword
information. ArXiv:1607.04606.S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis.
Journal of the American Society for Information Science, 41(6):391–407.Nicholas Dingwall and Christopher Potts. 2018. Mittens: An extension of GloVe for learning domain-specialized
representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association forComputational Linguistics: Human Language Technologies, pages 212–217, Stroudsburg, PA. Association forComputational Linguistics.
Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2015. Retrofitting wordvectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Associationfor Computational Linguistics: Human Language Technologies, pages 1606–1615, Stroudsburg, PA. Association forComputational Linguistics.
John R. Firth. 1957. A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis, pages 1–32. Blackwell,Oxford.
Yoav Goldberg and Omer Levy. 2014. word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embeddingmethod. Technical Report arXiv:1402.3722v1, Bar Ilan University.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.http://www.deeplearningbook.org.
William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. InIEEE Data Engineering Bulletin, pages 52–74. IEEE Press.
Zellig Harris. 1954. Distributional structure. Word, 10(23):146–162.Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pages 50–57, New York. ACM.Benjamin J. Lengerich, Andrew L. Maas, and Christopher Potts. 2018. Retrofitting distributional embeddings to knowledge
graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics,pages 2423–2436, Stroudsburg, PA. Association for Computational Linguistics.
Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Advances in NeuralInformation Processing Systems.
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning wordvectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for ComputationalLinguistics, pages 142–150, Portland, Oregon. Association for Computational Linguistics.
Laurens van der Maaten and Geoffrey E. Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research,9:2579–2605.
89 / 90
References
References II
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press,Cambridge, MA.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words andphrases and their compositionality. In Christopher J. C. Burges, Leon Bottou, Max Welling, Zoubin Ghahramani, andKilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. CurranAssociates, Inc.
Nikola Mrksic, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke,Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proceedings of the2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human LanguageTechnologies, pages 142–148. Association for Computational Linguistics.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages1532–1543, Doha, Qatar. Association for Computational Linguistics.
Hinrich Schütze. 1993. Word space. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural InformationProcessing Systems 5, pages 895–902. Morgan-Kaufmann.
Noah A. Smith. 2019. Contextual word representations: A contextual introduction. ArXiv:1902.06006v2.Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of
Artificial Intelligence Research, 37:141–188.Ludwig Wittgenstein. 1953. Philosophical Investigations. The MacMillan Company, New York. Translated by
G. E. M. Anscombe.
90 / 90