Applying Data Science to Move Beyond Keywords for Social Analysis
Richard Caudle, Director, Developer Relations
Claudio Weeraratne, Director, Product Management
DATASIFT FORUM
RUN ON THE BANKS?
AMBIGUITY OF NATURAL LANGUAGE
MOVING BEYOND KEYWORDS
[Diagram: "bank" (similarity x) AND "withdraw" (similarity y)]
interaction.content any "rbs,lloyds,hsbc,barclays" AND interaction.content any "withdraw,close,cashpoint,atm"
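The keyword filter above can be approximated in a few lines of Python to show where it falls short. This is a minimal sketch, assuming `any` behaves like a case-insensitive whole-word match; the helper names (`tokenize`, `keyword_match`) and the sample posts are hypothetical and written for illustration only.

```python
import re

# Keyword lists copied from the filter on the slide.
BANKS = {"rbs", "lloyds", "hsbc", "barclays"}
ACTIONS = {"withdraw", "close", "cashpoint", "atm"}


def tokenize(text):
    """Lowercase the text and return its set of word tokens ('#' is dropped)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_match(text):
    """True when the post has at least one bank keyword AND one action keyword."""
    tokens = tokenize(text)
    return bool(tokens & BANKS) and bool(tokens & ACTIONS)


# Hypothetical posts, invented for this example.
posts = [
    "I'm heading to #rbs to close my account",        # relevant and matched
    "Our RBs will close out the season in style",     # football, but still matched
    "Taking all my cash out of Natwest today",        # relevant, but missed
]
results = [keyword_match(p) for p in posts]
```

The second post is a false positive caused by ambiguity ("RBs" as running backs), and the third is a false negative because none of its words appear in either list.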
CONCEPT MODELING
KEYWORD RELATIONSHIPS
CONCEPT MODEL
[Diagram: VECTOR SPACE containing rbs, #rbs, runningbacks, #hsbc]
OUR APPROACH
• Produce a vector space where words are grouped by their context
• Context of a word is given by surrounding words
• Perform unsupervised machine learning to learn topics
• word2vec is a well-known implementation
• gensim is a Python library that simplifies word2vec usage
• Resulting model is queryable for similarity (of word vectors)
• Language-agnostic solution
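The idea of grouping words by their context can be illustrated without a neural network. The sketch below is not word2vec or gensim: it swaps in simple co-occurrence counts as word vectors and compares them with cosine similarity, using only the standard library. The posts, the window size, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, pre-tokenized posts invented for this example.
posts = [
    ["close", "my", "rbs", "account", "today"],
    ["close", "my", "hsbc", "account", "today"],
    ["withdraw", "cash", "from", "rbs", "atm"],
    ["withdraw", "cash", "from", "hsbc", "atm"],
    ["pizza", "for", "lunch", "today"],
]


def context_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


vectors = context_vectors(posts)
sim_banks = cosine(vectors["rbs"], vectors["hsbc"])    # shared contexts, high
sim_other = cosine(vectors["rbs"], vectors["pizza"])   # no shared contexts, zero
```

Because "rbs" and "hsbc" appear in the same contexts they end up close together, even though the two words never occur in the same post. Nothing here depends on the language of the posts, only on which words appear near each other.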
LEARNING SIMILARITY
Learn to predict a word from surrounding words
"I'm heading to #rbs to close my account"
[Diagram: words from posts (rbs, account, close) feed a NEURAL NETWORK trained on 1000s of posts; the resulting CONCEPT 'BANK' groups rbs, account, close, hsbc, barclays, withdraw, balance, cash, money]
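Predicting a word from its surrounding words means turning every post into (context, target) training examples, which resembles the continuous bag-of-words setup in word2vec. The sketch below only builds those examples for the sentence on the slide and does not train a network; the window size of 2 and the function name `training_pairs` are assumptions.

```python
sentence = "I'm heading to #rbs to close my account"


def training_pairs(tokens, window=2):
    """Return one (context_words, target_word) pair per token position."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of the target form its context.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


# Naive tokenization for illustration: lowercase, drop '#', split on spaces.
tokens = sentence.lower().replace("#", "").split()
pairs = training_pairs(tokens)

# The example centred on "rbs": the model would learn to predict "rbs"
# from the words "heading", "to", "to", "close".
rbs_context, rbs_target = pairs[3]
```

Repeated over thousands of posts, words that are predicted from similar contexts (rbs, hsbc, barclays) are pushed toward similar vectors.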
LEARNING SIMILARITY
DEMO
IMPROVED FILTERING & CLASSIFICATION
interaction.content similar "bank,hsbc:0.7" AND interaction.content similar "withdraw:0.8"
interaction.content any "rbs,lloyds,hsbc,barclays" AND interaction.content any "withdraw,close,cashpoint,atm"
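One way to read the `similar` filter is sketched below. The semantics are an assumption, not a description of CSDL: a post matches when any of its words is at least as similar to one of the seed words as the threshold after the colon, and that threshold is taken to apply to every seed word in the string. The 2-d vectors are invented toy values standing in for a trained model.

```python
import math

# Toy 2-d word vectors, invented for illustration. A trained model would
# learn these from posts and use far more dimensions.
VECTORS = {
    "bank": (1.0, 0.1),
    "hsbc": (0.9, 0.2),
    "natwest": (0.95, 0.15),
    "withdraw": (0.1, 1.0),
    "emptying": (0.2, 0.95),
    "pizza": (-0.7, -0.6),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def similar(tokens, seeds, threshold):
    """True if any known token is at least `threshold`-similar to any seed word."""
    for token in tokens:
        if token not in VECTORS:
            continue  # words the model has never seen cannot match
        for seed in seeds:
            if cosine(VECTORS[token], VECTORS[seed]) >= threshold:
                return True
    return False


def matches(text):
    """Hypothetical equivalent of: similar "bank,hsbc:0.7" AND similar "withdraw:0.8"."""
    tokens = text.lower().split()
    return similar(tokens, ["bank", "hsbc"], 0.7) and similar(tokens, ["withdraw"], 0.8)


hit = matches("emptying my natwest account")   # no listed keyword, still matches
miss = matches("pizza for lunch")              # unrelated, does not match
```

The first post contains none of the words in the keyword filter, yet it matches because "natwest" sits near "bank" and "emptying" sits near "withdraw" in the toy vector space.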
CONCISE
INTUITIVE
MAINTAINABLE
UP-TO-DATE
HIGHER COVERAGE
ACCURACY
IMPROVING OUR PLATFORM
• Further validation of approach
• Operationalization of model production
• Creation of new models for different audiences
• Automated updating of models
• Implementation of 'similarity' in CSDL
Q&A
LEARN MORE
datasift.com/forum
THANK YOU