Introduction
Associative Experiments
Associative Thesauri
Russian Associative Thesaurus'98
Associative Network (Graph)
Modelling of Associative Network
Future work
Associative thesauri: structure and analysis
Brown bag seminar
Ekaterina Vylomova
Fulbright scholar at Montclair State University
February 21, 2014
E. Vylomova Associative thesauri
Brief bio
2011: MSc, Bauman Moscow State Technical University
2009: BSc, Bauman Moscow State Technical University
2009: Yandex School of Data Analysis (Moscow Institute of
Physics & Technology)
Associative Experiments
What's AE?
The associative experiment (AE) is one of the methods of psycholinguistics. It is
based on the method of free association.
Sir Francis Galton conducted the first experiment in 1879.
Types of AE
Single Free Association
Multiple Free Associations
Single Controlled Association (synonym, noun, verb, hyponym,
etc.)
Multiple Controlled Associations
What's an associative thesaurus?
Example of data
EAT (Edinburgh Associative Thesaurus) Word Associations
CAT stimulated the following associations:
Response  Count  Proportion
DOG       49     0.52
MOUSE     8      0.08
BLACK     4      0.04
MAT       3      0.03
ANIMAL    2      0.02
EYES      2      0.02
GUT       2      0.02
KITTEN    2      0.02
AT for different languages
English
The Structure of Associations in Language and Thought
(Deese, 1965)
Word association (Cramer, 1968)
An associative thesaurus of English and its computer analysis
(Kiss et al., 1973)
Word Association, rhyme and fragment norms (Nelson,
McEvoy & Schreiber, 1999)
Dutch
Word association norms with response times (De Groot, 1988)
Word associations: Norms for 1,424 Dutch words in a
continuous task (De Deyne & Storms, 2008)
Swedish
A Swedish Associative Thesaurus (Lonngren, 1998)
Japanese
Construction of associative concept dictionary with distance
information, and comparison with electronic concept dictionary
(Okamoto & Ishizaki, 2001)
Building a word association database for basic Japanese
vocabulary (Joyce, 2005)
Korean
Network analysis of Korean Word Associations (Jung et al.,
2010)
Czech
Volne slovni parove asociace v cestine (Novak, 1988)
Hebrew
Free association norms in the Hebrew language (Rubinsten,
2005)
Slavic Associative Thesauri
Dictionary of associative norms in Russian (Leontiev, 1973)
Russian Associative Thesaurus (Karaulov et al., 2002)
Slavic Associative Thesaurus (Russian, Belorussian, Bulgarian,
Ukrainian) (Ufimtseva et al., 2004)
Normas asociativas del espanol y del ruso (Sanchez Puig,
Karaulov, Cherkasova, 2000)
Russian associative experiment description
Time frame: 1988-1998
Participants: 11,000 1st-3rd year students; 34 specialities
Stimuli: 6,624 (initial list: 1,277)
Associative pairs: 1,032,522 (different: 462,500)
Reactions: 102,926
Subset used for analysis
Stimuli: 6,577
Reactions: 21,312
Associative pairs: 102,516
Dataset
Set of triplets $\langle c_i, r_j, w_{ij} \rangle$, where $w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{n} freq_{ij}}$.
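The weighting in the triplet definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads c_i as the stimulus and r_j as the reaction, the function name `association_weights` is hypothetical, and the toy counts are invented rather than taken from the thesaurus.

```python
from collections import defaultdict

def association_weights(counts):
    """Turn raw (stimulus, reaction, frequency) counts into weighted triplets.

    Implements w_ij = freq_ij / sum_j freq_ij, so the weights of all
    reactions to one stimulus sum to 1.
    """
    totals = defaultdict(int)
    for stimulus, _reaction, freq in counts:
        totals[stimulus] += freq
    return [(s, r, f / totals[s]) for s, r, f in counts]

# Toy counts, invented for illustration (not real thesaurus data).
toy_counts = [
    ("cat", "dog", 6), ("cat", "mouse", 3), ("cat", "black", 1),
    ("dog", "cat", 4), ("dog", "bone", 4),
]
triplets = association_weights(toy_counts)
for stimulus, reaction, weight in triplets:
    print(stimulus, reaction, round(weight, 2))
```

Normalising per stimulus makes weights comparable across stimuli that received different numbers of responses.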
Comparison with frequency dictionary of Russian language
Frequency dictionary
Frequency dictionary of modern Russian language (Lyashevskaya,
Sharov, 2009).
It is based on texts from the Russian National Corpus
(www.ruscorpora.ru) and includes information about the 20,000 most
common words in the Russian language.
RAT Lemmatisation
RAT -> MyStem (Segalovich, 2003) -> lemmas
TOP-11 Nouns
RAT          FreqDict
Human        Year
Home, House  Human
Money        Time
Day          Business
Friend       Life
Home         Day
Male         Hand
Fool         Work
Business     Word
Life         Place
Illness      Friend
Semantic primes?
Concept "Human": "human", "child", "friend", "male"
Concept "Time": "day", "time"
Adjectives: "good", "bad", "big".
These concepts do not change over time.
Positive correlation with semantic primes (Wierzbicka)
Description
Nodes correspond to words (lemmas)
Edges correspond to associations
Edge weights correspond to association strength
Main characteristics of the network
Nodes: |V| = 23,195, among them:
nodes with outgoing edges only (stimuli): |S| = 1,883
nodes with incoming edges only (reactions): |R| = 16,618
nodes with both types of edges: |SR| = 4,694
Edges: |E| = 102,516
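The node classes listed here can be derived mechanically from the weighted triplets. The sketch below is a minimal illustration, assuming a plain dictionary representation of the directed graph; `build_network` and `node_classes` are hypothetical names, and the toy triplets are invented.

```python
def build_network(triplets):
    """Directed weighted graph as a dict: stimulus -> {reaction: weight}."""
    graph = {}
    for stimulus, reaction, weight in triplets:
        graph.setdefault(stimulus, {})[reaction] = weight
        graph.setdefault(reaction, {})  # keep nodes that only occur as reactions
    return graph

def node_classes(graph):
    """Split nodes into stimulus-only, reaction-only, and both."""
    has_out = {node for node, nbrs in graph.items() if nbrs}
    has_in = {r for nbrs in graph.values() for r in nbrs}
    return has_out - has_in, has_in - has_out, has_out & has_in

# Toy triplets, invented for illustration.
toy = [("cat", "dog", 0.6), ("cat", "mouse", 0.4),
       ("dog", "cat", 1.0), ("pet", "cat", 1.0)]
g = build_network(toy)
only_stimuli, only_reactions, both = node_classes(g)
n_edges = sum(len(nbrs) for nbrs in g.values())
print(len(g), len(only_stimuli), len(only_reactions), len(both), n_edges)
```

The three classes are disjoint, so their sizes add up to the total number of nodes, which matches the counts on the slide (1,883 + 16,618 + 4,694 = 23,195).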
Table of network characteristics
Sign  Description                         Directed  Undirected
N     Number of nodes                     23,195    23,195
L     Average shortest path length        3.98      3.83
D     Diameter                            9         8
<k>   Average node degree                 4.42      8.83
ψ     Degree distribution P(k) parameter  2.2       1.85
Directed to undirected
$\tilde{w}_{ij} = \tilde{w}_{ji} = w_{ij} + w_{ji}$
Degree distribution function
$P(k) \approx k^{-\psi}$
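As a rough illustration of how such characteristics might be computed, the sketch below symmetrises a directed weight dictionary using the rule w~_ij = w_ij + w_ji, then computes the average degree and the average shortest path length by breadth-first search. The function names and the toy graph are assumptions made for this example; the slides do not say which tools were used.

```python
from collections import deque

def to_undirected(weights):
    """weights: {(i, j): w_ij}. Returns {(a, b): w_ab + w_ba} with a <= b."""
    sym = {}
    for (i, j), w in weights.items():
        key = (i, j) if i <= j else (j, i)
        sym[key] = sym.get(key, 0.0) + w
    return sym

def adjacency(sym):
    """Unweighted neighbour sets of the undirected graph."""
    adj = {}
    for i, j in sym:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Toy directed weights, invented for illustration: a chain a-b-c-d.
directed = {("a", "b"): 0.5, ("b", "a"): 0.2, ("b", "c"): 0.8, ("c", "d"): 1.0}
sym = to_undirected(directed)
adj = adjacency(sym)
print(average_degree(adj), average_shortest_path(adj))
```

Running BFS from every node is quadratic in the worst case, which is acceptable for a toy graph but would likely need sampling or a graph library for a network of 23,195 nodes.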
Small-world networks
Definition
Introduced by Milgram, 1967 ("The small world problem")
$L \propto \log(N)$, i.e. the distance L between two randomly chosen nodes
grows proportionally to the logarithm of the number of nodes N in
the network
Also known as "six degrees of separation"
Examples
World Wide Web (WWW; Adamic, 1999; Albert, Jeong, &
Barabasi, 1999), networks of scientific collaboration (Newman,
2001), metabolic networks in biology (Jeong, Tombor, Albert,
Oltvai, & Barabasi, 2000)
Scale-free networks
Description
Amaral, Scala et al. (2000) studied small-world networks and
compared their degree distribution functions P(k). Two types of distribution:
exponential (power grid system in the USA, neural system of
C. elegans)
power law (WWW, metabolic networks): $P(k) \approx k^{-\psi}$, $\psi \in (2, 4)$
Scale-free networks provide better signal propagation.
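One simple way to check for a power-law degree distribution is to fit a straight line to log P(k) against log k. The sketch below does this with ordinary least squares. It is a crude estimator chosen for brevity, not necessarily the method used in the talk, and the degree sample is synthetic, built so that the count of nodes with degree k is exactly 400 / k^2.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes with degree k (only k > 0)."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

def fit_power_law_exponent(pk):
    """Least-squares slope of log P(k) vs log k; returns psi in P(k) ~ k^-psi."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic degree sample: number of nodes with degree k is 400 / k**2.
synthetic = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16 + [10] * 4 + [20] * 1
psi = fit_power_law_exponent(degree_distribution(synthetic))
print(round(psi, 3))
```

On real data the tail of P(k) is noisy, so a log-log fit should be treated as a first look rather than a reliable estimate of ψ.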
Other examples
Similar results were obtained for Roget's thesaurus (Roget,
1911), WordNet and associative networks (Steyvers and Tenenbaum,
2005).
Three Models of Associative Network
Concept-based model
Vector-based models
Multidimensional scaling (Torgerson, 1958)
Latent Semantic Analysis (Landauer, Dumais, 1997)
Data
Core of the network: 4,692 lemmas with 59,392 connections
The structure is the same as in the associative network (nodes are lemmas,
edges are associations)
Activity accumulation
1. Initial state: random activity
2. Spreading of activation: $S_i^t = S_i^{t-1} + \sum_j w_{ij} S_j^{t-1}$, where $S_i^t$ is
the activity of neuron i at moment t.
3. When the activation exceeds the threshold, the neuron produces the reaction
and its activity is reset: $S_i^t = 0$.
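The three steps of activity accumulation can be sketched as a small simulation. This is an illustrative reading of the update rule, assuming a synchronous update over a dense weight matrix where `weights[i][j]` stands for w_ij; the function name, the optional `initial` argument, and the toy numbers are all invented for the example.

```python
import random

def simulate(weights, threshold, steps, initial=None, seed=0):
    """Spreading activation; returns (firing events, final state).

    weights[i][j] plays the role of w_ij in
    S_i^t = S_i^{t-1} + sum_j w_ij * S_j^{t-1}.
    """
    n = len(weights)
    rng = random.Random(seed)
    # Step 1: random initial activity unless an explicit state is given.
    state = list(initial) if initial is not None else [rng.random() for _ in range(n)]
    fired = []
    for t in range(1, steps + 1):
        # Step 2: synchronous update from the previous state.
        new_state = [
            state[i] + sum(weights[i][j] * state[j] for j in range(n))
            for i in range(n)
        ]
        # Step 3: nodes above the threshold produce a reaction and reset to 0.
        for i in range(n):
            if new_state[i] > threshold:
                fired.append((t, i))
                new_state[i] = 0.0
        state = new_state
    return fired, state

# Toy 3-node network, invented for illustration.
W = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
events, final = simulate(W, threshold=1.4, steps=2, initial=[1.0, 0.5, 0.25])
print(events, final)
```

Passing an explicit initial state makes a run reproducible, which is convenient when experimenting with different threshold values.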
Pros and cons
Pros
very simple model
easy to understand
easy to modify (no need to re-evaluate the model)
Cons
unclear how to choose the threshold value (we ran a series of
experiments to find the optimal value)
once activation is released, should the neighbouring neurons
be modified as well?
Multidimensional scaling
From concept to vector
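Distance matrix:
$$\Delta = \begin{pmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,I} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,I} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{I,1} & \delta_{I,2} & \cdots & \delta_{I,I} \end{pmatrix}$$
where I is the number of objects (words).
Our goal is to find vectors $x_1, \dots, x_I \in \mathbb{R}^N$ such that $\|x_i - x_j\| \approx \delta_{ij}$ for all $i, j \in \{1, \dots, I\}$.
In other words:
$$\min_{x_1, \dots, x_I} \sum_{i<j} (\|x_i - x_j\| - \delta_{ij})^2.$$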
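A minimal sketch of the idea, assuming NumPy is available: classical (Torgerson) MDS recovers coordinates from the distance matrix by double centering and an eigendecomposition. Classical MDS optimises a closely related criterion rather than the stress sum written above, but when the distances are exactly Euclidean it drives that sum to zero, which the toy example checks. The function names and the toy data are assumptions for illustration.

```python
import numpy as np

def classical_mds(delta, dim):
    """Embed I objects into R^dim from an I x I distance matrix."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (delta ** 2) @ centering   # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)              # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]                # largest ones first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def stress(points, delta):
    """Sum over i<j of (||x_i - x_j|| - delta_ij)^2."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return float(((dist[upper] - np.asarray(delta)[upper]) ** 2).sum())

# Toy data: pairwise distances between the corners of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
delta = np.sqrt(((square[:, None, :] - square[None, :, :]) ** 2).sum(axis=-1))
embedding = classical_mds(delta, dim=2)
print(embedding.shape, stress(embedding, delta))
```

The recovered coordinates match the original square only up to rotation, reflection, and translation, since distances do not fix an orientation.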
Latent Semantic Analysis
From concept to vector-2
Latent Semantic Analysis is a technique for analysing relationships between a
set of documents and the terms they contain by producing a set of concepts
related to the documents and terms.
In my case
Terms are lemmas; a document is the set of associations for a given
stimulus.
Inputs: term-document matrix with TF*IDF values
Term frequency: $TF = w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{N} freq_{ij}}$
Inverse document frequency: $IDF = \log \frac{|S|}{|\{s \in S : r \in s\}|}$, where |S| is the total number of
stimuli.
Singular Value Decomposition => vector representations.
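The pipeline described here (TF*IDF matrix, then SVD) can be sketched with NumPy as follows. This is an illustration under assumptions: the input format `{stimulus: {reaction: freq}}`, the function names, and the toy frequencies are invented, and a truncated SVD via `np.linalg.svd` stands in for whatever implementation was actually used.

```python
import math
import numpy as np

def tfidf_matrix(associations):
    """associations: {stimulus: {reaction: freq}}.

    Rows are terms (reaction lemmas), columns are documents (stimuli).
    """
    stimuli = sorted(associations)
    terms = sorted({r for reactions in associations.values() for r in reactions})
    n_docs = len(stimuli)
    doc_freq = {t: sum(1 for s in stimuli if t in associations[s]) for t in terms}
    matrix = np.zeros((len(terms), n_docs))
    for col, s in enumerate(stimuli):
        total = sum(associations[s].values())
        for row, t in enumerate(terms):
            freq = associations[s].get(t, 0)
            if freq:
                tf = freq / total                      # freq_ij / sum_j freq_ij
                idf = math.log(n_docs / doc_freq[t])   # log(|S| / |{s : r in s}|)
                matrix[row, col] = tf * idf
    return terms, stimuli, matrix

def lsa_vectors(matrix, dim):
    """Term vectors from a rank-dim truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]

# Toy association frequencies, invented for illustration.
associations = {
    "cat": {"dog": 5, "mouse": 3, "animal": 2},
    "dog": {"cat": 6, "bone": 2, "animal": 2},
    "bread": {"butter": 7, "food": 3},
}
terms, stimuli, matrix = tfidf_matrix(associations)
vectors = lsa_vectors(matrix, dim=2)
print(matrix.shape, vectors.shape)
```

A reaction that occurs for every stimulus gets IDF = log(1) = 0 and therefore contributes nothing to the matrix.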
Clustering
k-means
So, we've got vectors. What's next?
Let's evaluate similarity:
First, set a distance metric, e.g. $d_{ij} = \sqrt[r]{\sum_{k=1}^{N} |x_{ik} - x_{jk}|^r}$
And use it with k-means clustering:
$$\min \sum_{i=1}^{k} \sum_{x_j \in S_i} \|x_j - \mu_i\|^2,$$
where k is the number of clusters, $S_i$ are the resulting clusters, and $\mu_i$ are
the centers of the clusters.
So, the technique is based on finding the nearest cluster.
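The assignment and update steps can be sketched in plain Python. This is a minimal version of Lloyd's algorithm with explicit starting centers so that the run is deterministic; the function names and toy points are assumptions. The distance follows the Minkowski form given above, but the mean update is only optimal for r = 2, the squared Euclidean case that the objective describes.

```python
def minkowski(x, y, r=2):
    """d = (sum_k |x_k - y_k|^r)^(1/r)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def k_means(points, centers, iterations=20, r=2):
    """Lloyd's algorithm; returns (labels, centers)."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: minkowski(p, centers[c], r))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy points forming two well-separated groups, invented for illustration.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, centers=[points[0], points[3]])
print(labels, centers)
```

In practice the starting centers are usually chosen at random or with a seeding heuristic, and the result can depend on that choice.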
Pros and cons
Pros
easy to operate with vectors: add, multiply, subtract, etc.
possible to set the preferred dimensionality and visualize
Cons
problem with storage: matrices are huge
complexity: MDS and LSA are based on SVD, which takes $O(n^3)$
choosing the optimal number of clusters and dimensionality
"Tip of the tongue"application
Data & Method
Data: RAT + Abramov's synonym dictionary
Method: LSA + k-means
Usage of associative thesauri for solving tasks related to the "tip of the tongue" phenomenon
Ekaterina Vylomova
Bauman Moscow State Technical University, Moscow State University of Printing Arts
RHF #12-04-12039B
E-mail: [email protected]

INTRODUCTION
Hmm... What's the name of that Ukrainian food?
The tip-of-the-tongue (TOT) phenomenon is the failure to retrieve a word from memory, combined with partial recall and the feeling that retrieval is imminent. People in a tip-of-the-tongue state can often recall one or more features of the target word, such as the first letter, its syllabic stress, and words similar in sound and/or meaning.
• TOT appears to be universal (Brennen et al., 2007)
• An occasional tip-of-the-tongue state is normal for people of all ages
• TOT becomes more frequent as people age.
R. Brown, D. McNeill and A. Luria consider recalling and naming words to be processes of probabilistic choice of a word from a chain of involuntary associations, and relate them to the structure of human semantic memory.

DATA
Abramov. Dictionary of Russian synonyms and similar expressions, 1890-1999
19,297 words & phrases
18,136 synonym articles
Karaulov Y., Tarasov E., Sorokin Y., Ufimtseva N., Cherkasova G. 1999. Associative thesaurus of modern Russian language. RAS, Moscow.
56,540 associative pairs
50,923 associative pairs (after lemmatization)
26,803 lemmas
Overall (synonym and associative pairs combined together): 316,018

METHODOLOGY
[Diagram: RAT + Abramov dict. -> Lemmatization -> RAT & Abramov lemmas -> LSA & k-NN]
Lemmatization using the Yandex mystem stemmer
Apply Latent Semantic Analysis to get vector representations of words
and k-nearest neighbours for clustering
Clusters contain words that are similar in meaning and association

EXAMPLE
Associative thesauri + Abramov dictionary: Комильфо (comme il faut) - приличие (decency)
After clustering:
не выходить из пределов благопристойности (not to overstep the bounds of decency) 0.001
степенный (staid) 0.591
чинный (decorous) 0.591
благочинный (orderly) 0.646
бонтонный (well-mannered) 0.646
комильфотный (comme il faut) 0.646
пристойный (seemly) 0.646
благонравный (well-behaved) 0.684
благоприличный (proper) 0.684
благопристойный (decent) 0.684
корректный (correct) 0.684

FUTURE PLANS
1. Expand synonym and associative thesauri with new ones
2. Add first letter filtering (see above)
3. Add hyponyms and hyperonyms

REFERENCES
1. Brown, R., and McNeill, D. (1966). The "tip of the tongue" phenomenon. Journal of Verbal Learning and Verbal Behavior 5, 325-337.
2. Karaulov Yu.N., Tarasov E.F., Sorokin Yu.A., Ufimtseva N.V., Cherkasova G.A. (1999). Associative thesaurus of modern Russian language. RAS. (in Russian)
3. Luria A.R. (1979). Language and consciousness. Ed. by Khomskaya E.D. Moscow State University, Moscow. 320 pp. (in Russian)
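The lookup step of such a tool could be sketched as a nearest-neighbour search over word vectors, with an optional first-letter filter of the kind listed under future plans. Everything in the example is an assumption made for illustration: the function name, the two-dimensional toy vectors, and the English cue words do not come from the poster's data.

```python
import math

def nearest_words(cue, vectors, k=5, first_letter=None):
    """Rank words by Euclidean distance to the cue's vector.

    first_letter is an optional filter for a partially recalled target word.
    """
    target = vectors[cue]
    candidates = []
    for word, vec in vectors.items():
        if word == cue:
            continue
        if first_letter and not word.startswith(first_letter):
            continue
        candidates.append((math.dist(target, vec), word))
    return [(word, dist) for dist, word in sorted(candidates)[:k]]

# Toy vectors, invented for illustration (not derived from RAT or Abramov).
toy_vectors = {
    "beet soup": (0.8, 0.2),
    "borscht": (0.9, 0.1),
    "varenyky": (0.7, 0.4),
    "bread": (0.1, 0.9),
}
top = nearest_words("beet soup", toy_vectors, k=2)
filtered = nearest_words("beet soup", toy_vectors, k=2, first_letter="v")
print(top, filtered)
```

Filtering before ranking keeps the candidate list short when the user remembers the first letter of the word they are looking for.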
RAT'10
Time frame: 2009-2010, 20 years after the first one
Location: different regions of Russia.
Stimuli: the 1,000 most frequent words in the Russian language.
Participants: young people aged 17-25.
Thank you!
Questions?