Automatic Construction of Semantic Hierarchies
Rion L. [email protected]
Fair Isaac, 5935 Cornerstone Court, San Diego, CA 92121
AQUAINT Phase I Biannual Workshop San Diego, CA 9 – 12 June 2003
Outline / Summary
• Notation and Terminology
• Similarity of Meaning and Usage
• Automatic Polysemy Discovery
• Constructing Semantic Hierarchies
• Applications to Query Expansion and Sentence Meaning Comparison
Language Representation in the Model
The Token Lexicon
president george bush visited san jose last weekend.
Each word activates a fixed token of neurons on the input region.
Our experiments typically use a nine region network; this network advances one word at a time over the input text.
[Figure: network input regions labeled president, george, bush, visited]
“President George Bush visited San Jose last weekend.”
Incoming text is mapped into the universal token language by means of the token lexicon.
Our experiments use a lexicon of 100,000 entries: the 30,000 most frequent words and 70,000 selected phrases.
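The mapping from raw text to lexicon tokens might be sketched as follows. This is a minimal illustration, not the presenters' implementation: the greedy longest-match rule, the `tokenize` function, the `<unk>` placeholder, the phrase-length limit, and the toy lexicon (including the choice of "san jose" as a phrase) are all assumptions made for the example.

```python
import re

# Toy lexicon standing in for the 100,000-entry one (hypothetical contents).
LEXICON = {"president", "george", "bush", "visited", "last", "weekend",
           "san jose"}
MAX_PHRASE_LEN = 3   # assumed upper bound on phrase length, in words
UNKNOWN = "<unk>"    # assumed placeholder for out-of-lexicon words


def tokenize(text, lexicon=LEXICON, max_len=MAX_PHRASE_LEN):
    """Lowercase the text, then greedily match the longest lexicon entry."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon:
                tokens.append(candidate)
                i += n
                break
        else:
            # No entry matched, not even the single word.
            tokens.append(UNKNOWN)
            i += 1
    return tokens


print(tokenize("President George Bush visited San Jose last weekend."))
# ['president', 'george', 'bush', 'visited', 'san jose', 'last', 'weekend']
```

Longest-match-first keeps a multi-word phrase as one token whenever the lexicon contains it, which is one simple way to let words and phrases share a single token vocabulary.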
Unsupervised Language Training
[Figure: an antecedent support network; labels: cerebral cortex, source region, target region]
Cortical antecedent support fascicles are trained between each pair of regions. The connection strength between a pair of neurons is a function of those neurons’ occurrence and co-occurrence probabilities.
In our experiments we train a maximum of four fascicles forward and backward from each region, for a total of 52 possible fascicles.
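The figure of 52 can be checked with a short count. The sketch below assumes that a fascicle is a directed connection from one region to another region at most four positions away in a linear arrangement of nine regions; the function name and region indexing are illustrative.

```python
NUM_REGIONS = 9   # nine-region network, as stated above
MAX_SPAN = 4      # at most four regions forward or backward


def fascicles(num_regions=NUM_REGIONS, max_span=MAX_SPAN):
    """Return all directed (source, target) region pairs within max_span."""
    return [(s, t)
            for s in range(num_regions)
            for t in range(num_regions)
            if s != t and abs(s - t) <= max_span]


print(len(fascicles()))  # 52
```

Regions near either end of the network have fewer than four neighbors on one side, which is why the total is 52 rather than 9 × 8 = 72.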
For training we use a 1.4-billion-word, 75-million-sentence untagged newswire corpus (which includes the AQUAINT newswire corpus).
president george bush visited san jose last weekend.
$S_{ij} = \frac{\Pr(i, j)}{\Pr(j)}$
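As a rough illustration of training one fascicle from occurrence and co-occurrence statistics, the sketch below estimates a strength for each token pair from raw counts. It assumes the strength takes the form Pr(i, j) / Pr(j), with i the source token and j the target token a fixed number of positions later; that form, the `train_fascicle` name, and the toy sentences are assumptions rather than the presenters' stated method.

```python
from collections import Counter


def train_fascicle(sentences, offset=1):
    """Estimate S[(i, j)] = Pr(i, j) / Pr(j) for tokens `offset` positions apart.

    Both probabilities share the same denominator (the number of observed
    pairs), so the ratio reduces to count(i, j) / count(j as target).
    """
    pair_counts, target_counts = Counter(), Counter()
    for tokens in sentences:
        for pos in range(len(tokens) - offset):
            i, j = tokens[pos], tokens[pos + offset]
            pair_counts[(i, j)] += 1
            target_counts[j] += 1
    return {(i, j): count / target_counts[j]
            for (i, j), count in pair_counts.items()}


# Toy, pre-tokenized corpus (hypothetical data).
corpus = [["george", "bush", "visited"],
          ["president", "bush", "visited"]]
strengths = train_fascicle(corpus, offset=1)
print(strengths[("george", "bush")])   # 0.5
print(strengths[("bush", "visited")])  # 1.0
```

One such table would be trained per fascicle, that is, per source and target region pair.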
Similarity of Meaning and Usage
brazil: colombia, guatemala, nicaragua, bolivia, ecuador, mexico, el salvador, honduras, venezuela, costa rica, panama
[Figure: similarity relationships among brazil, venezuela, and ecuador]
Word Families: The Emergence of Abstraction
A simple automated process produced over 400,000 families for a lexicon of 30,000 words and 70,000 multi-word ‘phrases’. Each family of a word is a subset of that word’s synonymy set. Families are like word senses, but more abstract and more useful. For example, word family matching between sentences can be used to evaluate their similarity of meaning.
Example families:
• across, into, along, near, around, through, onto, toward, off, inside, down, over, from, on, in, out
• newspaper, new york times, washington post, magazine, wall street journal, journal, daily, times, post, newspapers, daily news, associated press, paper, news, weekly
• passion, fascination, enthusiasm, desire, appetite, penchant, obsession, love, fondness, affection, sense
• red, blue, pink, gray, green, black, yellow, white
• talks, accord, agreement, negotiations, deal, peace, process, plan, peace talks, efforts
• comply with, abide by, compliance with, accordance with, violation of, line with, complying with
Automatic Polysemy Discovery
Plants
• “living plants”: plants, animals, birds, species, fish, dogs, animal, dog, cats, humans, insects
• “industrial plants”: plants, plant, facilities, facility, reactors, reactor, factories, factory, systems, equipment, system, pipeline, nuclear, station
Foot
• “body part”: foot, knee, ankle, shoulder, wrist, elbow, leg
• “unit of measurement”: foot, feet, ## feet, miles, inches, meters, ## miles, mile, kilometers, ## inches, ## meters, inch, yards, km
Win
• “verb: to win”: win, to win, winning, won, after winning, who won, wins
• “noun: a win”: win, victory, game, victory over, loss, games, season, win over, opener, series
• “common verbs”: win, get, see, do, play, go, take, make, hear, find
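Once families exist, polysemy can be read off as membership in more than one family. The sketch below shows only that final step, using abbreviated families from this slide; it does not reproduce how the families are discovered, which the slides do not spell out. The `polysemous_words` helper and the dictionary layout are hypothetical.

```python
from collections import defaultdict

# Abbreviated families taken from the slide, keyed by their quoted labels.
FAMILIES = {
    "living plants": {"plants", "animals", "birds", "species", "fish"},
    "industrial plants": {"plants", "plant", "facilities", "factories",
                          "reactors"},
    "body part": {"foot", "knee", "ankle", "shoulder", "wrist"},
    "unit of measurement": {"foot", "feet", "miles", "inches", "meters"},
}


def polysemous_words(families):
    """Map each word found in more than one family to its family labels."""
    membership = defaultdict(list)
    for label, members in families.items():
        for word in members:
            membership[word].append(label)
    return {word: sorted(labels)
            for word, labels in membership.items()
            if len(labels) > 1}


print(polysemous_words(FAMILIES))
```

With these toy families, "plants" and "foot" are the only words reported, each with its two families.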
Multi-Scale Semantic Similarity
• merrill lynch, morgan stanley, salomon brothers, lehman brothers, goldman sachs, smith barney, goldman, j.p. morgan, bear stearns
• wells fargo, bank, bank of america, first interstate, security pacific, chase manhattan, fargo
• corp., co., inc., ltd., plc, company, group, unit
• intel, ibm, microsoft, compaq, hewlett-packard, apple, oracle, motorola
• at&t, bellsouth, gte, mci, bell atlantic, nynex, sprint, sbc, ameritech, bell
“Companies” Super-Family: intel, at&t, wells fargo, merrill lynch, corp.
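A two-level hierarchy of this kind lends itself to query expansion at different scales. The sketch below is one possible reading, not the presenters' system: keying each family by a head word, the `expand` function, and the `level` parameter are assumptions, and the families are abbreviated from the slide.

```python
# Families keyed by a head word (an assumed convention), abbreviated.
FAMILIES = {
    "intel": ["intel", "ibm", "microsoft", "compaq"],
    "at&t": ["at&t", "bellsouth", "gte", "mci"],
    "wells fargo": ["wells fargo", "bank of america", "chase manhattan"],
}
# A super-family groups whole families.
SUPER_FAMILIES = {"companies": ["intel", "at&t", "wells fargo"]}


def expand(word, level=1):
    """Return expansion terms for `word`.

    level 1: other members of the word's own families.
    level 2: members of every family in the enclosing super-families.
    """
    family_keys = [key for key, members in FAMILIES.items()
                   if word in members]
    if level == 1:
        keys = family_keys
    else:
        keys = [key
                for grouped in SUPER_FAMILIES.values()
                if any(f in grouped for f in family_keys)
                for key in grouped]
    result = []
    for key in keys:
        for member in FAMILIES[key]:
            if member != word and member not in result:
                result.append(member)
    return result


print(expand("ibm", level=1))  # ['intel', 'microsoft', 'compaq']
print(expand("ibm", level=2))
```

The narrow level favors precision and the wide level favors recall.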
Extrapolating this Process Yields…
The Semantic Structure of the English Language
Semantic Analysis with Word Families
“He was named executive vice president following the annual meeting.”
“Who served as vice president in the wake of the meeting?”
• he, i, we, she, you, people, it, who
• was named, will become, was appointed, became, was elected, serves as, he became, served as, is now
• executive vice president, vice president, senior vice president, chief executive, vice chairman, chief executive officer, president, chairman, director, chief operating officer
• following, during, shortly after, just before, prior to, after, soon after, shortly before, before, days before, hours after, on the eve of, days after, in the wake of
• the, their, his, its, her, our, my
• annual meeting, convention, meeting, conference, summit, annual convention, annual conference
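The two example sentences can be compared by matching the families of their tokens. The sketch below is a minimal illustration under several assumptions: the sentences are already split into lexicon tokens, the family labels are invented for readability, the families are abbreviated, and the Jaccard overlap of family sets is a stand-in for whatever scoring the demo actually uses.

```python
# Abbreviated families from the slide; the labels are hypothetical.
FAMILIES = {
    "pronoun": {"he", "i", "we", "she", "you", "people", "it", "who"},
    "appointment": {"was named", "was appointed", "became", "serves as",
                    "served as"},
    "title": {"executive vice president", "vice president", "chairman",
              "director"},
    "time": {"following", "during", "after", "before", "in the wake of"},
    "determiner": {"the", "their", "his", "its"},
    "meeting": {"annual meeting", "convention", "meeting", "conference",
                "summit"},
}


def family_ids(tokens):
    """Return the set of family labels that the tokens belong to."""
    return {label
            for token in tokens
            for label, members in FAMILIES.items()
            if token in members}


def similarity(tokens_a, tokens_b):
    """Jaccard overlap between the two sentences' family sets."""
    fam_a, fam_b = family_ids(tokens_a), family_ids(tokens_b)
    if not fam_a and not fam_b:
        return 0.0
    return len(fam_a & fam_b) / len(fam_a | fam_b)


s1 = ["he", "was named", "executive vice president", "following", "the",
      "annual meeting"]
s2 = ["who", "served as", "vice president", "in the wake of", "the",
      "meeting"]
print(similarity(s1, s2))  # 1.0
```

The two sentences share only the word "the", yet every token falls into a family that the other sentence also uses, so the family-level overlap is complete.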
Demo: Sentence Meaning Comparison
Conclusion
• Similarity of meaning is used to construct hypernym-type semantic hierarchies by grouping words that occur in similar contexts.
• Domain specific knowledge is seamlessly integrated into the overall semantic construction.
• This method may be directly applied to foreign languages, as well as to other information modalities such as sound and vision.
• We plan to build semantic hierarchies for Chinese, Arabic, and Spanish soon.
The Team
Other Team Members
Dr. Robert Hecht-Nielsen - Project Leader
Dr. Robert Means - Chief Technologist
Kate Mark - Project Coordinator
David Busby - Chief Brain Software Architect
Dr. Syrus Nemat-Nasser - Scientist
Dr. Shailesh Kumar - Scientist
Adrian Fan - Researcher
Research Sponsors
Fair Isaac
ARDA