TRANSCRIPT
DATA SCIENCE POP-UP
AUSTIN
Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center
Jordana Heller, Data Scientist, Mattersight
@jheller
#datapopupaustin
April 13, 2016
Galvanize, Austin Campus
Lightning Talk: Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center
Jordana Heller @jheller
Data Science Pop-up Austin, April 13, 2016
©2016 Mattersight Corporation. Mattersight Restricted Confidential Information.
What We Do
Our goal: Topic Trends
[Figure: topic trend timeline; x-axis from 3/31/2016 to 7/31/2016]
Identifying contents and prevalence of multiword topics present in conversation in an unsupervised way
Unexpected Prevalence
Critical Spikes
Escalating Frequency
Our goals, continued
Manageable number of topics
Track expected and unexpected topics
Go deep: Contextualize topic usage
Short text: Keywords, hashtags, ngrams
Long text: Could use predetermined topics
Image credit: IBM Watson Concept Insights
Long text: Or discover themes
Image credit: Blei, 2012, Communications of the ACM
Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
Great! How about contextualizing trends?
• Where are topics trending?
• Structural Topic Modeling (Roberts et al., 2013)
  – Instead of relying on post-hoc comparisons, includes covariates in the LDA model
    • Specifies priors as GLMs
    • Word distribution determined by topic, covariates, and topic-covariate interaction
  – Authors' implementation: R package stm (available via CRAN; all code on GitHub!)
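For reference, the two GLM-style priors mentioned above are often written roughly as follows. This is a sketch in notation chosen here for illustration, not a statement of the stm package internals: $X_d$ is the covariate row for document $d$, $y_d$ is its content-covariate level, $m_v$ is a baseline log-frequency for word $v$, and the $\kappa$ terms are deviations for topic, covariate, and their interaction.

```latex
% Topic prevalence: logistic-normal prior whose mean is linear in the covariates
\theta_d \mid X_d, \gamma, \Sigma \;\sim\; \mathrm{LogisticNormal}(X_d \gamma,\; \Sigma)

% Topic content: word probabilities from topic, covariate, and interaction effects
\beta_{d,k,v} \;\propto\; \exp\!\left( m_v + \kappa^{(t)}_{k,v} + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v} \right)
```

With no covariates in either prior, the model reduces to something close to a correlated topic model, which is why covariate effects can be read off the fitted model rather than from post-hoc comparisons.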
Ready to talk pipeline!
Data Collection and Preprocessing
Read Transcripts
Add Call-level Covariates
Preprocess text
• Identify collocations
• Remove stop words
• Stem / stem completion
• Remove low-frequency terms
Create Term-Document Matrix
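The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's pipeline: the stop-word list, the `min_doc_freq` cutoff (terms must appear in at least that many calls), and the function names are all hypothetical, and collocation detection and stemming are left out for brevity.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "i", "to", "for", "my", "is", "on", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_term_document_matrix(transcripts, min_doc_freq=2):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    docs = [Counter(tokenize(t)) for t in transcripts]
    # Document frequency: in how many transcripts does each term occur?
    doc_freq = Counter(term for doc in docs for term in doc)
    # Assumption: "low frequency" means a term appears in too few documents.
    vocab = sorted(t for t, n in doc_freq.items() if n >= min_doc_freq)
    matrix = [[doc[term] for doc in docs] for term in vocab]
    return vocab, matrix

transcripts = [
    "I want to book a room for the convention.",
    "Is the room on the beach? I want a balcony.",
    "Cancel my room and book the beach hotel.",
]
vocab, tdm = build_term_document_matrix(transcripts)
```

Here `vocab` holds the terms that survive filtering and each row of `tdm` gives one term's counts across the three toy calls. Call-level covariates would be kept in a parallel table indexed by the same document order.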
Topic Model Creation
Retrieve last topic model
• For comparison
Create current topic model
• Detect number of topics, or specify
Create topic labels
Topic Model Comparison
Inspect overall topic prevalence
Compare overall topic prevalence across periods
• Topics change! Measure the change in word probability distributions for each new topic with respect to each old topic
• Match each new topic to its closest previous topic if the change is below a threshold (otherwise treat it as a new topic)
• Evaluate trends!
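The matching step above could look something like the sketch below. The slides do not say which distance was used; Jensen-Shannon divergence is assumed here as one reasonable choice, and the `threshold` value and function names are hypothetical. Both models are assumed to share a vocabulary so that word distributions line up position by position.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def match_topics(new_topics, old_topics, threshold=0.3):
    """Map each new topic index to its closest old topic index, or None if new."""
    matches = {}
    for i, new in enumerate(new_topics):
        distances = [js_divergence(new, old) for old in old_topics]
        best = min(range(len(distances)), key=distances.__getitem__)
        # Below the threshold: same topic, drifted. Otherwise: a new topic.
        matches[i] = best if distances[best] < threshold else None
    return matches

# Toy distributions over a shared four-word vocabulary.
old_topics = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
new_topics = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.8, 0.05, 0.05]]
matches = match_topics(new_topics, old_topics)
```

In this toy run the first new topic is matched to old topic 0, while the second is far from both old topics and is flagged as new (`None`). Prevalence trends are then compared only across matched pairs.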
Estimate and inspect effects of covariates
Compare effects of covariates across periods
• Output can be interpreted similarly to regression
Example results: Hotel reservations
Covariates: booking, caller distress
Trend Contextualization: Booking
• New: convention, center, mind, worry, philadelphia, inventory
• New: school, college, graduate, medical, clinic
• Increasing 30%: beach, balcony, ocean, view
• Decreasing 10%: back, next, receive, listen, cash future
• New: back, minute, system, run, inconvenience
• Increasing 42%: confirm, email, arrival, local
Hit: > 1% of words on call assigned to a given topic
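The hit rule can be made concrete with a short sketch. It assumes each word on a call carries a single topic assignment, which is a simplification of what a fitted topic model provides; the function name and the toy data are hypothetical.

```python
from collections import Counter

def topic_hits(word_topic_assignments, threshold=0.01):
    """Return the topics that account for more than `threshold` of a call's words."""
    total = len(word_topic_assignments)
    if total == 0:
        return set()
    counts = Counter(word_topic_assignments)
    return {topic for topic, n in counts.items() if n / total > threshold}

# Toy call of 200 words: topic 0 gets 150, topic 1 gets 48, topic 2 gets 2 (exactly 1%).
call = [0] * 150 + [1] * 48 + [2] * 2
hits = topic_hits(call)
```

Topic 2 sits exactly at 1% and so does not count as a hit under the strict "greater than" rule. The share of calls with a hit can then be compared across periods and across covariate groups such as booked versus not booked.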
Trend Contextualization: Caller Distress
• New: square, city, price, hotel, manhattan, central
• Decreasing 12%: online, website, cancel, purchase, advance
Distress: > 30 seconds of linguistically-identified dissatisfaction or negative emotion
Nice!
Our goals, revisited
Manageable number of topics
Track expected and unexpected topics
Go deep: Contextualize topic usage
Topic trends using structural topic models
Thank you!
@datapopup #datapopupaustin