Post on 20-Dec-2015
Analyzing unstructured text with topic models
Mark Steyvers
Dep. of Cognitive Sciences & Dep. of Computer Science, University of California, Irvine
collaborators: Padhraic Smyth, UC Irvine; Tom Griffiths, UC Berkeley
Analyzing Unstructured Text
• NYT: 330,000 articles
• Enron: 250,000 emails
• Medline: 16 million articles
• NSF/NIH: 100,000 grants
• AOL queries: 20,000,000 queries, 650,000 users
• Pennsylvania Gazette (1728-1800): 80,000 articles
Topic Models and Text Analysis
• Can answer a number of questions:
What is in this corpus?
What is in this document, paragraph, or sentence?
What does this person/group of people write about?
What tags are appropriate for this document?
What are the topical trends over time?
Topic Models
• Automatic and unsupervised extraction of semantic themes
from large text collections.
• Widely used model in machine learning and text mining
– pLSI Model: Hofmann (1999)
– LDA Model: Blei, Ng, and Jordan (2001, 2003)
– LDA with Gibbs sampling: Griffiths and Steyvers (2003, 2004)
Basic Assumptions
• Each topic is a distribution over words
• Each document is a mixture of topics
• Each word in a document originates from a single topic
Model
P(word | document) = Σ_topic P(word | topic) P(topic | document)
• P(word | topic): each topic is a probability distribution over words
• P(topic | document): topic weights for each document
• Both are learned automatically from the text corpus
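The mixture formula can be checked numerically. The sketch below is only an illustration: the vocabulary is borrowed from the toy example in this deck, while the topic distributions `phi` and the document weights `theta` are assumed values, not numbers from the talk.

```python
import numpy as np

vocab = ["MONEY", "LOAN", "BANK", "RIVER", "STREAM"]

# P(word | topic), one row per topic. Values are assumed for illustration.
phi = np.array([
    [1/3, 1/3, 1/3, 0.0, 0.0],   # topic 1: money words
    [0.0, 0.0, 1/3, 1/3, 1/3],   # topic 2: river words
])

# P(topic | document) for a single document (assumed weights).
theta = np.array([0.6, 0.4])

# P(word | document) = sum over topics of P(word | topic) * P(topic | document)
p_word_given_doc = theta @ phi

for word, p in zip(vocab, p_word_given_doc):
    print(f"{word:7s} {p:.3f}")
```

BANK keeps probability 1/3 whatever the weights are, because both assumed topics give it the same probability; the other words scale with the weight of their topic.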
Toy Example
Topics
  Topic 1: MONEY, LOAN, BANK
  Topic 2: RIVER, STREAM, BANK
Documents and topic assignments (the digit after each word is the topic it was drawn from)
  Document 1 (topic weight 1.0 on topic 1): MONEY1 BANK1 BANK1 LOAN1 BANK1 MONEY1 BANK1 MONEY1 BANK1 LOAN1 LOAN1 BANK1 MONEY1 ....
  Document 2 (topic weights .6 and .4): RIVER2 MONEY1 BANK2 STREAM2 BANK2 BANK1 MONEY1 RIVER2 MONEY1 BANK2 LOAN1 MONEY1 ....
  Document 3 (topic weight 1.0 on topic 2): RIVER2 BANK2 STREAM2 BANK2 RIVER2 BANK2....
Statistical Inference
Topics: ?
Topic weights: ?
Documents and topic assignments: ?
  Document 1: MONEY? BANK? BANK? LOAN? BANK? MONEY? BANK? MONEY? BANK? LOAN? LOAN? BANK? MONEY? ....
  Document 2: RIVER? MONEY? BANK? STREAM? BANK? BANK? MONEY? RIVER? MONEY? BANK? LOAN? MONEY? ....
  Document 3: RIVER? BANK? STREAM? BANK? RIVER? BANK?....
Statistical Inference
• Exact inference is intractable
• Markov chain Monte Carlo (MCMC) with Gibbs sampling
• scalable to large document collections (e.g. all of Wikipedia)
• parallelizable
• Form of dimensionality reduction
– Number of topics T= 50…2000
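To make the inference step concrete, here is a minimal sketch of a collapsed Gibbs sampler for a topic model, in the spirit of the Gibbs sampling approach cited earlier. It is not the toolbox code: the function name `gibbs_lda`, the smoothing values `alpha` and `beta`, and the number of iterations are all assumptions chosen for illustration.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for a topic model (illustrative sketch).

    docs: list of documents, each a list of integer word ids.
    Returns (word-topic counts, document-topic counts, topic assignments).
    """
    rng = np.random.default_rng(seed)
    n_wt = np.zeros((vocab_size, n_topics))   # word-topic counts
    n_dt = np.zeros((len(docs), n_topics))    # document-topic counts
    n_t = np.zeros(n_topics)                  # total tokens per topic
    # Start from a random topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_wt[w, t] += 1
            n_dt[d, t] += 1
            n_t[t] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from all counts.
                n_wt[w, t] -= 1
                n_dt[d, t] -= 1
                n_t[t] -= 1
                # Conditional probability of each topic for this token.
                p = (n_wt[w] + beta) / (n_t + vocab_size * beta) * (n_dt[d] + alpha)
                t = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = t
                n_wt[w, t] += 1
                n_dt[d, t] += 1
                n_t[t] += 1
    return n_wt, n_dt, z

# Toy corpus over the vocabulary MONEY, LOAN, BANK, RIVER, STREAM (ids 0-4).
docs = [[0, 2, 2, 1, 2, 0, 2], [3, 0, 2, 4, 2, 2, 0, 3], [3, 2, 4, 2, 3, 2]]
n_wt, n_dt, z = gibbs_lda(docs, n_topics=2, vocab_size=5)
```

Normalising the columns of `n_wt` (plus `beta`) gives estimates of the topic distributions, and normalising the rows of `n_dt` (plus `alpha`) gives the topic weights per document. Each token is resampled from counts that exclude itself, which is what makes the sampler "collapsed".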
Example Topics from New York Times
Stock Market: WEEK, DOW_JONES, POINTS, 10_YR_TREASURY_YIELD, PERCENT, CLOSE, NASDAQ_COMPOSITE, STANDARD_POOR, CHANGE, FRIDAY, DOW_INDUSTRIALS, GRAPH_TRACKS, EXPECTED, BILLION, NASDAQ_COMPOSITE_INDEX, EST_02, PHOTO_YESTERDAY, YEN, 10, 500_STOCK_INDEX
Wall Street Firms: WALL_STREET, ANALYSTS, INVESTORS, FIRM, GOLDMAN_SACHS, FIRMS, INVESTMENT, MERRILL_LYNCH, COMPANIES, SECURITIES, RESEARCH, STOCK, BUSINESS, ANALYST, WALL_STREET_FIRMS, SALOMON_SMITH_BARNEY, CLIENTS, INVESTMENT_BANKING, INVESTMENT_BANKERS, INVESTMENT_BANKS
Terrorism: SEPT_11, WAR, SECURITY, IRAQ, TERRORISM, NATION, KILLED, AFGHANISTAN, ATTACKS, OSAMA_BIN_LADEN, AMERICAN, ATTACK, NEW_YORK_REGION, NEW, MILITARY, NEW_YORK, WORLD, NATIONAL, QAEDA, TERRORIST_ATTACKS
Bankruptcy: BANKRUPTCY, CREDITORS, BANKRUPTCY_PROTECTION, ASSETS, COMPANY, FILED, BANKRUPTCY_FILING, ENRON, BANKRUPTCY_COURT, KMART, CHAPTER_11, FILING, COOPER, BILLIONS, COMPANIES, BANKRUPTCY_PROCEEDINGS, DEBTS, RESTRUCTURING, CASE, GROUP
Learning multiple meanings of words
• PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS
• PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED
• TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT
• JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL
• HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION
• STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW
Demographic Analysis of Search Queries
AOL dataset
• Dataset:
- 20,000,000+ web queries
- 650,000+ users
• Each user was given an “anonymous” user ID
– No demographics in this dataset
Example query log from user #2178
ID | Query | Date/Time | URL clicked
2178 | dog eats uncooked pasta | 2006-05-26 15:31:56 |
2178 | inducing dog vomiting | 2006-05-26 15:32:46 | http://www.twodogpress.com
2178 | inducing dog vomiting | 2006-05-26 15:32:46 | http://www.canismajor.com
2178 | inducing dog vomiting | 2006-05-26 15:32:46 | http://kitchen.robbiehaf.com
2178 | inducing dog vomiting | 2006-05-26 15:32:46 | http://www.dog-first-aid-101.com
2178 | inducing dog vomiting | 2006-05-26 15:38:36 |
2178 | walmart | 2006-05-12 12:39:52 | http://www.walmart.com
2178 | sears | 2006-05-12 12:44:22 | http://www.sears.com
2178 | target | 2006-05-12 17:05:36 | http://www.target.com
2178 | babycenter.com | 2006-05-12 17:43:59 | http://www.babycenter.com
2178 | google | 2006-05-16 10:54:39 | http://www.google.com
2178 | fit pregnancy | 2006-05-16 15:34:23 |
2178 | baby center | 2006-05-16 15:37:22 |
2178 | yahoo.com | 2006-05-18 17:11:05 | http://www.yahoo.com
2178 | applebee's carside | 2006-05-19 19:21:08 | http://www.applebees.com
2178 | baby names | 2006-05-20 15:02:38 | http://www.babynames.com
2178 | baby names | 2006-05-20 15:02:38 | http://www.babynamesworld.com
2178 | baby names | 2006-05-20 15:02:38 | http://www.thinkbabynames.com
2178 | mortgage calculator | 2006-05-24 14:39:05 | http://www.bankrate.com
2178 | us zip codes | 2006-05-25 21:26:47 | http://www.usps.com
2178 | us zip codes | 2006-05-25 21:26:47 | http://www.usps.com
Another Query Database…
• Not publicly available
• Dataset
– 250,000+ users
– 411,000+ queries
• Age and gender of users are known:
– age brackets: 0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+
Topic modeling of queries
• Each user searches for a mixture of topics
• Each topic is a probability distribution over query words
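Treating each user as a "document" means pooling that user's queries into one bag of query words before fitting the model. A minimal sketch of that preprocessing step follows; the helper name `user_bags` and the whitespace tokenisation are assumptions, and the rows are a few entries in the shape of the example log for user #2178.

```python
from collections import Counter, defaultdict

# A few (user id, query) rows in the shape of the example query log.
log = [
    (2178, "dog eats uncooked pasta"),
    (2178, "inducing dog vomiting"),
    (2178, "baby names"),
    (2178, "mortgage calculator"),
]

def user_bags(rows):
    """Pool each user's queries into one bag (multiset) of query words."""
    bags = defaultdict(Counter)
    for user_id, query in rows:
        # Whitespace tokenisation is an assumption made for this sketch.
        bags[user_id].update(query.lower().split())
    return bags

bags = user_bags(log)
print(bags[2178]["dog"])
```

Each resulting bag plays the role that a document's word counts play in the standard topic model.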
Four example topics (out of 200)
• brain, fmri, imaging, functional, mri, subjects, magnetic, resonance, neuroimaging, structural
• schizophrenia, patients, deficits, schizophrenic, psychosis, subjects, psychotic, dysfunction, abnormalities, clinical
• memory, working, memories, tasks, retrieval, encoding, cognitive, processing, recognition, performance
• disease, ad, alzheimer, diabetes, cardiovascular, insulin, vascular, blood, clinical, individuals
• auto, car, parts, cars, used, ford, honda, truck, toyota
• webmd, cymbalta, xanax, gout, vicodin, effexor, prednisone, lexapro, ambien
• party, store, wedding, birthday, jewelry, ideas, cards, cake, gifts
• hannah, montana, zac, efron, disney, high school, musical, miley cyrus, hilary duff
Probability distribution over words. Most likely words listed at the top
User = mixture of topics
[Figure: the example topics shown again, with each user drawn as a mixture of them]
User #7654: 80% / 20% across two topics
User #246: 100% on a single topic
Topic Analysis
• Find likely topics for each demographic bucket
• Find likely demographics given topics
• What’s on the mind of people in different age-groups?
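One simple way to connect topics and demographics, once every user has a vector of topic weights, is to average those weights within each bucket and then apply Bayes' rule for the reverse direction. This is a sketch of that idea and not necessarily the analysis behind the plots in this talk; the function names and all numbers are hypothetical.

```python
import numpy as np

def topic_by_bucket(theta, bucket, n_buckets):
    """Average topic weights over the users in each demographic bucket.

    theta:  (n_users, n_topics) array, each row sums to 1.
    bucket: (n_users,) integer bucket id per user.
    Returns an (n_buckets, n_topics) array of P(topic | bucket).
    """
    out = np.zeros((n_buckets, theta.shape[1]))
    for b in range(n_buckets):
        members = theta[bucket == b]
        if len(members):
            out[b] = members.mean(axis=0)
    return out

def bucket_given_topic(p_topic_bucket, p_bucket):
    """Bayes' rule: P(bucket | topic) is proportional to P(topic | bucket) * P(bucket)."""
    joint = p_topic_bucket * p_bucket[:, None]
    return joint / joint.sum(axis=0, keepdims=True)

# Hypothetical data: 4 users, 2 topics, 2 demographic buckets.
theta = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]])
bucket = np.array([0, 0, 1, 1])

p_tb = topic_by_bucket(theta, bucket, n_buckets=2)       # likely topics per bucket
p_b = np.bincount(bucket, minlength=2) / len(bucket)     # share of users per bucket
p_bt = bucket_given_topic(p_tb, p_b)                     # likely buckets per topic
```

Rows of `p_tb` answer "which topics are likely for this bucket", and columns of `p_bt` answer "which buckets are likely given this topic".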
“poems” topic (Topic 6)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: poems, love_poems, quotes, poetry, love_quotes, famous_quotes, lyrics, love, funny_quotes, friendship_poems, best_love_poems, funny_poems, inspirational_quotes, love_songs, shakespeare
“myspace” topic (Topic 2)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: myspace, google, my_space, yahoo, mysapce, about_blank, my, photobucket, http_google, ww.myspace, myspace_com_blogs, http_myspace, myspace.co, w_myspace, myspcae
“sports” topic (Topic 29)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: espn, nfl, nfl_draft, nba, 2006_nfl_mock_draft, 2006_nfl_draft, mlb, reggie_bush, nfl_mock_draft, dallas_cowboys, vince_young, fox_sports, lakers, raiders, espn_sports
“MTV” topic (Topic 92)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: bet, chris_brown, mtv, lyrics, ciara, 50_cent, ti, proof, bow_wow, chamillionaire, t.i., beyonce, atl, allhiphop, lil_wayne
“Clothing Stores” topic (Topic 111)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: old_navy, victoria_secret, hollister, american_eagle, gap, abercrombie, aeropostale, forever_21, victorias_secret, express, charlotte_russe, hot_topic, target, abercrombiefitch, wet_seal
“Hairstyles” topic (Topic 173)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: hairstyles, hair_styles, prom_hairstyles, pictureshairstyles, haircuts, sally_beauty_supply, celebrity_hairstyles, hair, short_hairstyles, cosmopolitan, prom_updos, prom_hair_styles, short_hair_styles, picturesprom_hairstyles, prom_hair
“recipes” topic (Topic 10)
[Figure: probability of the topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for male and female users]
Top words: food_network, recipes, foodnetwork, foodtv, martha_stewart, kraft, betty_crocker, food_tv, food_network_recipes, allrecipes, easter_recipes, epicurious, rachel_ray, kraft_foods, chicken_recipes
Results
• Topic models give quick summaries of demographic
trends in query datasets
• Other potential applications:
– e.g. blogs, social networking sites, email, etc
– clinical data, e.g. therapy discussions
Analyzing Emails: who writes on what topics?
Enron email data
250,000 emails
5000 authors
1999-2002
Author-topic models
• We can learn the association between authors of
documents and topics
• Assume each author works on a mixture of topics
ENRON Email: who writes on certain topics?
WORD PROB. WORD PROB. WORD PROB. WORD PROB.
HOLIDAY 0.0857 TEXANS 0.0145 GOD 0.0357 AMAZON 0.0312
PARTY 0.0368 WIN 0.0143 LIFE 0.0272 GIFT 0.0226
YEAR 0.0316 FOOTBALL 0.0137 MAN 0.0116 CLICK 0.0193
SEASON 0.0305 FANTASY 0.0129 PEOPLE 0.0103 SAVE 0.0147
COMPANY 0.0255 SPORTSLINE 0.0129 CHRIST 0.0092 SHOPPING 0.0140
CELEBRATION 0.0199 PLAY 0.0123 FAITH 0.0083 OFFER 0.0124
ENRON 0.0198 TEAM 0.0114 LORD 0.0079 HOLIDAY 0.0122
TIME 0.0194 GAME 0.0112 JESUS 0.0075 RECEIVE 0.0102
RECOGNIZE 0.019 SPORTS 0.011 SPIRITUAL 0.0066 SHIPPING 0.0100
MONTH 0.018 GAMES 0.0109 VISIT 0.0065 FLOWERS 0.0099
SENDER PROB. SENDER PROB. SENDER PROB. SENDER PROB.
chairman & ceo 0.131 cbs sportsline com 0.0866 crosswalk com 0.2358 amazon com 0.1344
*** 0.0102 houston texans 0.0267 wordsmith 0.0208 jos a bank 0.0266
*** 0.0046 houstontexans 0.0203 *** 0.0107 sharperimageoffers 0.0136
*** 0.0022 sportsline rewards 0.0175 doctor dictionary 0.0101 travelocity com 0.0094
general announcement 0.0017 pro football 0.0136 *** 0.0061 barnes & noble com 0.0089
TOPIC 109 TOPIC 66 TOPIC 182 TOPIC 113
... But also over senders (authors) of email. Most likely authors listed at the top
Enron email: two example topics (T=100)
WORD PROB.
BUSH 0.0227
LAY 0.0193
MR 0.0183
WHITE 0.0153
ENRON 0.0150
HOUSE 0.0148
PRESIDENT 0.0131
ADMINISTRATION 0.0115
COMPANY 0.0090
ENERGY 0.0085
SENDER PROB.
NELSON, KIMBERLY (ETS) 0.3608
PALMER, SARAH 0.0997
DENNE, KAREN 0.0541
HOTTE, STEVE 0.0340
DUPREE, DIANNA 0.0282
ARMSTRONG, JULIE 0.0222
LOKEY, TEB 0.0194
SULLIVAN, LORA 0.0073
VILLARREAL, LILLIAN 0.0040
BAGOT, NANCY 0.0026
TOPIC 10
WORD PROB.
ANDERSEN 0.0241
FIRM 0.0134
ACCOUNTING 0.0119
SEC 0.0065
SETTLEMENT 0.0062
AUDIT 0.0054
CORPORATE 0.0053
FINANCIAL 0.0052
JUSTICE 0.0052
INFORMATION 0.0050
SENDER PROB.
HILTABRAND, LESLIE 0.1359
WELLS, TORI L. 0.0865
DUPREE, DIANNA 0.0825
ARMSTRONG, JULIE 0.0316
DENNE, KAREN 0.0208
SULLIVAN, LORA 0.0072
[email protected] 0.0026
WILSON, DANNY 0.0016
HU, SYLVIA 0.0013
MATHEWS, LEENA 0.0012
TOPIC 32
Detecting Papers on Unusual Topics for Authors
• We can calculate perplexity (unusualness) for words in a
document given an author
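Perplexity here can be read as the exponentiated average negative log-probability of a document's words under the author's topic mixture. The sketch below assumes that standard definition; the array names, the two-topic model, and all probabilities are hypothetical.

```python
import numpy as np

def perplexity(word_ids, phi, theta_author):
    """Perplexity of a document's words under one author's topic mixture.

    phi:          (n_topics, vocab_size) array of P(word | topic).
    theta_author: (n_topics,) array of P(topic | author).
    """
    p_word = theta_author @ phi                  # P(word | author)
    log_lik = np.log(p_word[word_ids]).sum()     # log-probability of the document
    return float(np.exp(-log_lik / len(word_ids)))

# Hypothetical two-topic model over a five-word vocabulary.
phi = np.array([[0.40, 0.30, 0.20, 0.05, 0.05],
                [0.05, 0.05, 0.20, 0.30, 0.40]])
author = np.array([0.9, 0.1])       # this author writes mostly on topic 0

typical = [0, 1, 0, 2]              # words that topic 0 favours
unusual = [4, 3, 4, 3]              # words that topic 1 favours
print(perplexity(typical, phi, author), perplexity(unusual, phi, author))
```

A document built from words the author rarely uses gets the higher perplexity, which is what makes the score usable for ranking papers by unusualness.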
Papers ranked by perplexity for M. Jordan:
Author Separation
Can the model attribute words to authors correctly within a document?
The digit after a word is the author (1 or 2) the model assigned it to.
Written by (1) Scholkopf_B:
A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly related to the input1 space. This is done by identifying a class of kernels1 which can be represented as norm1 based2 distances1 in Hilbert spaces. It turns1 out that common kernel1 algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and can be run2 with that class of kernels1 too. As well as providing1 a useful new insight1 into how these algorithms work the present2 work can form the basis1 for conceiving new algorithms.
Written by (2) Darwiche_A:
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2 description2 is augmented with a system2 structure2 a directed2 graph2 explicating the interconnections between system2 components2. Specifically we first introduce the notion of a consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2. Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal form NNF and discuss its merits compared to standard variations. Third we introduce a basic algorithm2 for computing consequences in NNF given a structured system2 description. We show that if the system2 structure2 does not contain cycles2 then there is always a linear size2 consequence2 in NNF which can be computed in linear time2. For arbitrary1 system2 structures2 we show a precise connection between the complexity2 of computing2 consequences and the topology of the underlying system2 structure2. Finally we present2 an algorithm2 that enumerates2 the preferred2 diagnoses2 characterized by a consequence2. The algorithm2 is shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies some general conditions.
Application: Faculty Browser
Faculty Browser
• Automatically analyzes computer science papers by
UC San Diego and UC Irvine researchers
• Finds topically related researchers
[Screenshots: the page for one topic lists the most prolific researchers in this topic; the page for one researcher lists the topics this researcher is interested in and other researchers with similar topical interests]
Inferred network of researchers connected through topics
Modeling Extensions
330,000 articles
2000-2002
Entity-topic modeling
Who is mentioned in what context?
Three investigations began Thursday into the securities and exchange_commission's choice of william_webster to head a new board overseeing the accounting profession. house and senate_democrats called for the resignations of both judge_webster and harvey_pitt, the commission's chairman. The white_house expressed support for judge_webster as well as for harvey_pitt, who was harshly criticized Thursday for failing to inform other commissioners before they approved the choice of judge_webster that he had led the audit committee of a company facing fraud accusations. “The president still has confidence in harvey_pitt,” said dan_bartlett, bush's communications director …
Extracted Named Entities
Used standard algorithms to extract named entities:
- People
- Places
- Organizations
Standard Topic Model with Entities
Basketball | Tour de France | Holidays | Oscars
Words:
team 0.028 | tour 0.039 | holiday 0.071 | award 0.026
play 0.015 | rider 0.029 | gift 0.050 | film 0.020
game 0.013 | riding 0.017 | toy 0.023 | actor 0.020
season 0.012 | bike 0.016 | season 0.019 | nomination 0.019
final 0.011 | team 0.016 | doll 0.014 | movie 0.015
games 0.011 | stage 0.014 | tree 0.011 | actress 0.011
point 0.011 | race 0.013 | present 0.008 | won 0.011
series 0.011 | won 0.012 | giving 0.008 | director 0.010
player 0.010 | bicycle 0.010 | special 0.007 | nominated 0.010
coach 0.009 | road 0.009 | shopping 0.007 | supporting 0.010
playoff 0.009 | hour 0.009 | family 0.007 | winner 0.008
championship 0.007 | scooter 0.008 | celebration 0.007 | picture 0.008
playing 0.006 | mountain 0.008 | card 0.007 | performance 0.007
win 0.006 | place 0.008 | tradition 0.006 | nominees 0.007
Entities:
LAKERS 0.062 | LANCE-ARMSTRONG 0.021 | CHRISTMAS 0.058 | OSCAR 0.035
SHAQUILLE-O-NEAL 0.028 | FRANCE 0.011 | THANKSGIVING 0.018 | ACADEMY 0.020
KOBE-BRYANT 0.028 | JAN-ULLRICH 0.003 | SANTA-CLAUS 0.009 | HOLLYWOOD 0.009
PHIL-JACKSON 0.019 | LANCE 0.003 | BARBIE 0.004 | DENZEL-WASHINGTON 0.006
NBA 0.013 | U-S-POSTAL-SERVICE 0.002 | HANUKKAH 0.003 | JULIA-ROBERT 0.005
SACRAMENTO 0.007 | MARCO-PANTANI 0.002 | MATTEL 0.003 | RUSSELL-CROWE 0.005
RICK-FOX 0.007 | PARIS 0.002 | GRINCH 0.003 | TOM-HANK 0.005
PORTLAND 0.006 | ALPS 0.002 | HALLMARK 0.002 | STEVEN-SODERBERGH 0.004
ROBERT-HORRY 0.006 | PYRENEES 0.001 | EASTER 0.002 | ERIN-BROCKOVICH 0.003
DEREK-FISHER 0.006 | SPAIN 0.001 | HASBRO 0.002 | KEVIN-SPACEY 0.003
Word topics:
Computers | Arts
computer 0.069 | play 0.030
technology 0.026 | show 0.029
system 0.015 | stage 0.022
digital 0.014 | theater 0.022
chip 0.013 | director 0.017
software 0.013 | production 0.017
machine 0.011 | performance 0.016
devices 0.010 | dance 0.014
machines 0.010 | audience 0.014
video 0.009 | festival 0.013
Entity-topic weights: Companies 1.000 | Theater 0.960, Music 0.040
Entity topics:
Companies | Theatre | Music
IBM 0.074 | BROADWAY 0.119 | BACH 0.035
APPLE 0.061 | NEW_YORK 0.044 | BEETHOVEN 0.026
INTEL 0.059 | SHAKESPEARE 0.029 | LOUIS_ARMSTRONG 0.019
MICROSOFT 0.053 | THEATER 0.022 | MOZART 0.019
COMPAQ 0.041 | LONDON 0.019 | CARNEGIE_HALL 0.017
SONY 0.029 | GUINNESS 0.018 | LATIN 0.017
DELL 0.019 | TONY 0.016 |
HP 0.018 | LINCOLN_CTR 0.015 |
Example of Extracted Entity-Topic Network
Topics: Muslim_Militance, Mid_East_Conflict, Palestinian_Territories, Pakistan_Indian_War, FBI_Investigation, Detainees, Mid_East_Peace, US_Military, Religion, Terrorist_Attacks, Afghanistan_War
Entities: AL_QAEDA, HAMID_KARZAI, MOHAMMED, MOHAMMED_ATTA, NORTHERN_ALLIANCE, BIN_LADEN, TALIBAN, ZAWAHIRI, YASSER_ARAFAT, EHUD_BARAK, ARIEL_SHARON, HAMAS, AL_HAZMI, KING_HUSSEIN
Topic Trends
[Figure: three time series from Jan 2000 to Jan 2003, one each for the topics Tour-de-France, Anthrax, and Quarterly Earnings. Vertical axis: proportion of words assigned to the topic for that time slice.]
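The trend value for a topic in a time slice is the fraction of that slice's word tokens assigned to the topic. A small sketch of that bookkeeping follows, assuming token-level topic assignments are already available (for example from a sampler); the function name `topic_trend` and the data are made up for illustration.

```python
from collections import Counter, defaultdict

def topic_trend(assignments):
    """Proportion of word tokens assigned to each topic within each time slice.

    assignments: iterable of (time_slice, topic) pairs, one per word token.
    Returns {time_slice: {topic: proportion}}.
    """
    counts = defaultdict(Counter)
    for time_slice, topic in assignments:
        counts[time_slice][topic] += 1
    return {
        s: {t: n / sum(c.values()) for t, n in c.items()}
        for s, c in counts.items()
    }

# Hypothetical token-level assignments, one pair per word token.
tokens = [
    ("2001-07", "Tour-de-France"), ("2001-07", "Tour-de-France"),
    ("2001-07", "Quarterly Earnings"),
    ("2001-10", "Anthrax"), ("2001-10", "Anthrax"), ("2001-10", "Anthrax"),
    ("2001-10", "Quarterly Earnings"),
]
trend = topic_trend(tokens)
print(trend["2001-10"]["Anthrax"])
```

Plotting one topic's proportion across consecutive slices gives a curve of the kind described for this slide.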
Learning Topic Hierarchies (example: Psych Review abstracts)
[Figure: learned topic hierarchy; each line lists the top words of one node]
Root node:
• THE, OF, AND, TO, IN, A, IS
Second-level nodes:
• A, MODEL, MEMORY, FOR, MODELS, TASK, INFORMATION, RESULTS, ACCOUNT
• SELF, SOCIAL, PSYCHOLOGY, RESEARCH, RISK, STRATEGIES, INTERPERSONAL, PERSONALITY, SAMPLING
• MOTION, VISUAL, SURFACE, BINOCULAR, RIVALRY, CONTOUR, DIRECTION, CONTOURS, SURFACES
• DRUG, FOOD, BRAIN, AROUSAL, ACTIVATION, AFFECTIVE, HUNGER, EXTINCTION, PAIN
Third-level nodes:
• RESPONSE, STIMULUS, REINFORCEMENT, RECOGNITION, STIMULI, RECALL, CHOICE, CONDITIONING
• SPEECH, READING, WORDS, MOVEMENT, MOTOR, VISUAL, WORD, SEMANTIC
• ACTION, SOCIAL, SELF, EXPERIENCE, EMOTION, GOALS, EMOTIONAL, THINKING
• GROUP, IQ, INTELLIGENCE, SOCIAL, RATIONAL, INDIVIDUAL, GROUPS, MEMBERS
• SEX, EMOTIONS, GENDER, EMOTION, STRESS, WOMEN, HEALTH, HANDEDNESS
• REASONING, ATTITUDE, CONSISTENCY, SITUATIONAL, INFERENCE, JUDGMENT, PROBABILITIES, STATISTICAL
• IMAGE, COLOR, MONOCULAR, LIGHTNESS, GIBSON, SUBMOVEMENT, ORIENTATION, HOLOGRAPHIC
• CONDITIONIN, STRESS, EMOTIONAL, BEHAVIORAL, FEAR, STIMULATION, TOLERANCE, RESPONSES
[Figure: a larger learned hierarchy; each line lists the top five words of one node]
• theory, model, data, information, proposed
• model, theory, models, word, response
• reading, text, readers, meaning, comprehension
• bias, associative, matrices, matrix, al
• memory, list, item, items, recognition
• distributed, grams, associate, associations, paired
• strength, familiarity, retroactive, deviation, likelihood
• response, instrumental, responses, conditioning, behavior
• choice, delays, alternatives, fixed, reward
• memory, model, models, information, social
• knowledge, skill, reading, access, specific
• model, effects, learning, theory, systems
• memory, retrieval, serial, storage, working
• preference, reinforcement, choice, punishment, contingent
• model, theory, information, effects, account
• images, perception, according, lightness, objects
• visual, imagery, representations, mental, subsystems
• movement, eye, position, speed, target
• orientation, erotic, bem, sexual, ebe
• situational, consistency, cross, temporal, behavior
• object, based, neglect, attention, space
• stimuli, visual, component, contour, forward
• attribute, stochastic, choice, difference, transitivity
• masking, metacontrast, type, inhibition, mask
• serial, function, latency, position, items
• reasoning, bayesian, similarities, statements, gain
• similarity, geometric, objects, density, distance
• ce, conditioning, principles, reinforcement, reward
• model, memory, processes, models, learning
• image, components, bound, nearest, neighbor
• memory, reasoning, interference, process, background
• theory, sentence, james, fit, emotion
• model, memory, decision, response, theory
• theory, achievement, emotion, motivation, failure
• model, cs, avoidance, ucs, conditioning
• model, memory, problems, items, theoretical
• goodness, approach, representation, holographic, pictorial
• letters, model, words, letter, memory
• function, psychometric, correlations, individuals, performance
• stress, system, immune, arousal, fight
• sex, affects, biological, differences, handedness
• cognitive, gigerenzer, heuristics, reasoning, biases
• child, children, development, field, risk
• bayesian, inference, algorithms, authors, frequency
• speech, auditory, acoustic, perceptual, sound
• action, control, intention, goal, intentions
• personality, behavior, trait, consistency, idiographic
• surface, representations, surfaces, occluding, contour
• psychological, psychology, review, american, association
• events, interpersonal, event, impersonal, equilibrium
• categories, category, metaphor, object, metaphors
• motion, contrast, path, visual, contour
• left, cerebral, handedness, speech, human
• social, perception, impression, research, approach
• sleep, imagery, dreaming, rem, eye
• reinforcement, behavior, extinction, matching, partial
• binocular, rivalry, stereopsis, monocular, visual
• structure, relations, scale, dimensional, keys
• risk, conjunction, decision, probabilities, risky
• distance, retinal, disparity, image, perceived
• perception, visual, direction, rule, adaptation
• part, thinking, kind, scientific, activities
• behavior, development, evolutionary, genes, comparative
• group, intelligence, intellectual, iq, connections
• behavior, food, drinking, hypothalamus, physiological
• task, resource, performance, processing, anaphors
• developmental, social, ethnic, processes, development
• fear, anxiety, pain, amygdala, automatic
• neural, visual, neurons, behavioral, masking
• strategies, problems, term, confirmation, limitations
• language, semantic, linguistic, thought, correlations
• learning, maps, map, barrier, parallel
• statistical, heuristics, knowledge, intuitive, heuristic
• face, recognition, faces, damaged, semantic
Hidden Markov Topics Model
• Syntactic dependencies: short-range
• Semantic dependencies: long-range
[Graphical model: at each of four word positions, a topic assignment z, a word w, and an HMM state s]
Semantic state: generate words from topic model
Syntactic states: generate words from HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
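The division of labour between the two kinds of state can be sketched as a generative procedure: an HMM picks a state for each position, the semantic state emits a word from the topic model, and every other state emits a word from its own distribution. The code below is an illustration under assumptions, not the published model's implementation: treating state 0 as the semantic state, starting the chain from state 0, and all probabilities are choices made for this sketch.

```python
import numpy as np

def generate(n_words, trans, class_words, theta, phi, seed=0):
    """Sample word ids from a composite HMM and topic model (sketch).

    trans:       (n_states, n_states) HMM transition matrix; state 0 is
                 treated as the semantic state (an assumption of this sketch).
    class_words: (n_states, vocab_size) P(word | syntactic state); row 0 unused.
    theta:       (n_topics,) P(topic | document).
    phi:         (n_topics, vocab_size) P(word | topic).
    """
    rng = np.random.default_rng(seed)
    state, words, states = 0, [], []
    for _ in range(n_words):
        state = int(rng.choice(len(trans), p=trans[state]))
        if state == 0:
            # Semantic state: generate the word from the topic model.
            topic = rng.choice(len(theta), p=theta)
            word = rng.choice(phi.shape[1], p=phi[topic])
        else:
            # Syntactic state: generate the word from the HMM state.
            word = rng.choice(class_words.shape[1], p=class_words[state])
        words.append(int(word))
        states.append(state)
    return words, states

# Hypothetical 4-word vocabulary: ids 0-1 are content words, 2-3 function words.
trans = np.array([[0.2, 0.8], [0.9, 0.1]])
class_words = np.array([[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
theta = np.array([1.0])
phi = np.array([[0.5, 0.5, 0.0, 0.0]])
words, states = generate(20, trans, class_words, theta, phi)
```

Because the HMM only looks one position back while the topic weights are shared across the whole document, the syntactic states capture short-range structure and the semantic state captures long-range content.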
NIPS Semantics
• EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
• DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
• STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
• MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
• IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
• KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
• NETWORK, NEURAL, NETWORKS, OUPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS
NIPS Syntax
• MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
• IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
• SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
• USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
• IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
• HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
• #, *, I, X, T, N, -, C, F, P
Random sentence generation
LANGUAGE:
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Software
Public-domain MATLAB toolbox for topic modeling on the Web:
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm