ML Meetup
Rishabh Mehrotra
Motivation
Search Tasks
From Sessions to Tasks
Multitasking
User Groups
Topical Heterogeneity
Extracting Tasks & Subtasks
Task Extraction
Subtask Extraction
Hierarchies of Tasks & Subtasks
Search Tasks, Proactive Search & Digital Assistants
Rishabh Mehrotra
University College London
Machine Learning Meetup, Gurgaon
Saturday 25th June, 2016
Modelling Search Tasks & Behaviors
1 Motivation
2 From Sessions to Tasks
3 Extracting Tasks & Subtasks
4 Hierarchies of Tasks & Subtasks
Evolution of Web Search
Search is Everywhere
Web Search Volume
Next Generation Systems
Towards Task-based Search Systems
Slight Detour - Proactive IR & Digital Assistants
Information Cards:
From Queries to Cards (Shokouhi et al., SIGIR’15)
Modelling User Interests for Zero-query Ranking (Yang et al., ECIR’16)
Intelligent Assistants:
Understanding User Satisfaction with Intelligent Assistants (Kiseleva et al., CHIIR’16)
Automatic Online Evaluation of Intelligent Assistants (Jiang et al., WWW’15)
Slight Detour - Proactive IR & Digital Assistants
Modelling user interaction on Information Cards:
Click-based (pseudo) relevance labels may not be appropriate for evaluating all types of cards.
Features:
Reactive History
Proactive History
Lexical/Topical Features
Local/Temporal Features
Constant Features
Slight Detour - Proactive IR & Digital Assistants [1]
[1] Understanding User Satisfaction with Intelligent Assistants, Kiseleva et al., ACM CHIIR 2016
Putting it together
Digital Assistants - entry points for all web activities
Goal is to move towards Task Completion
Understanding Search Tasks
Search Tasks
An (atomic?) information need that may result in issuing of one or more queries.
Image from Verma et al. 2014
Why Search Tasks
Image from Wang et al. 2014
Outline
Part 1 Understanding Search Sessions
Part 2 Extracting Tasks & Subtasks
Part 3 Hierarchies of Tasks & Subtasks
Understanding Search Behavior
IR research = sessions, topics, queries
We focus on:
1 Multitasking
2 User groups based on multitasking
3 Topical heterogeneity
Research Questions
1 RQ1: Extent of multi-tasking in search sessions?
2 RQ2: Evidence of user-level heterogeneity based on task behavior?
  1 RQ2.1: Does search task effort vary across user groups?
  2 RQ2.2: Association between users’ interests and task multiplicity?
3 RQ3: Characterizing various heterogeneities in search behavior.
Data Context
Statistic                        Value
No. of queries                   620M
No. of sessions                  190M
No. of users                     2M
Avg. no. of queries per session  3.18
Avg. no. of sessions per user    76.01
Avg. no. of tasks per session    2.08
Table: Data summary

Time                 Query           SessionID  TaskID  Topic
05/29/2012 14:06:04  adele songs     1          1       Arts
05/29/2012 14:11:49  wedding venue   1          2       Society
05/29/2012 14:12:01  video download  1          3       Arts
05/29/2012 14:06:04  Obama care      2          4       News
05/29/2012 14:11:49  running shoes   2          5       Shopping
05/29/2012 14:12:01  sports shoes    2          5       Shopping
05/29/2012 14:22:12  wedding cards   2          2       Society
Table: Sample search sessions
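
A minimal sketch, assuming the log is available as (SessionID, TaskID) pairs, of how the number of tasks per session and the share of multi-task sessions could be computed. The rows mirror the sample sessions table; the function names are illustrative and are not taken from the original analysis.

```python
from collections import defaultdict

# (SessionID, TaskID) pairs copied from the sample sessions table.
SAMPLE_LOG = [
    (1, 1), (1, 2), (1, 3),
    (2, 4), (2, 5), (2, 5), (2, 2),
]


def tasks_per_session(log):
    """Map each session ID to the number of distinct task IDs it contains."""
    tasks = defaultdict(set)
    for session_id, task_id in log:
        tasks[session_id].add(task_id)
    return {sid: len(task_ids) for sid, task_ids in tasks.items()}


def multitask_fraction(log):
    """Fraction of sessions that contain more than one distinct task."""
    counts = tasks_per_session(log)
    return sum(1 for c in counts.values() if c > 1) / len(counts)


print(tasks_per_session(SAMPLE_LOG))
print(multitask_fraction(SAMPLE_LOG))
```

Counting distinct task IDs (rather than queries) is what separates a long single-task session from a genuinely multi-task one.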
Extent of Multitasking
Quantifying the extent of multi-tasking
[Figure: Extent of multitasking in sessions. One plot shows the % of sessions (y-axis, 0% to 100%) against the number of tasks in a session (1 to 10+). A second plot shows the % of users (y-axis, 0 to 60) against the average number of tasks performed in a session (1 to 10+).]
User Groups
[Bar chart "Task centric user groups": % of user population (y-axis, 0% to 100%) for each type of user group: Focussed users, Multi-taskers, Supertaskers]
Figure: User groups based on multi-tasking behaviors
User Groups: Example Sessions
User Type: Example Queries from a Typical Session
Focussed User: "test guide.com", "CNA Practice Test", "CNA State Board Exam", "CNA Testing Schedule and Locations", "CNA State Board Practice Test", "CNA Practice Test 2014", "CNA 50 Questions Test", "Free GED Practice Test 2014"
Multi-Tasking User: "Gravity FSX 2.0", "Full Suspension Mountain Bikes", "Walmart Cards", "Walmart Instant Card Application", "Gravity FSX 2.0 price", "full suspension bike sale"
Supertasking User: "hairstyles for women over 50", "thin wavy hairstyles for women", "facebook", "fb sign in", "pulled pork crock pot recipe easy", "Slow-Roasted Pulled Pork", "barefoot contessa", "miley cyrus hair styles", "hairstyler.com"
Table: Example query sessions from the different user groups.
Effort Metrics across User Groups
TTFC: Time to First Click
TTLC: Time to Last Click
PCC: Page Click Count
PgCC: Pagination Click Count
[Bar chart: standardised scores (y-axis, -3 to 2) of TTFC, TTLC, PCC and PgCC for the Focussed, Multi-taskers and Super-taskers groups]
Figure: User groups based on multi-tasking behaviors
Topical Heterogeneity
Topics most prone to multi-tasking (x-axis: extent of multi-tasking):
(a) Focused: Shopping, Home, Kids, Health
(b) Multitasker: News, Sports, Arts, Kids
(c) Supertasker: News, Sports, Arts, Society
Topics most prone to single-tasking (x-axis: extent of single-tasking):
(d) Focused: Arts, Adult, Games, Computers
(e) Multitasker: Business, Adult, Computers, Games
(f) Supertasker: Business, Home, Computers, Games
Figure: Top topics prone to multi-tasking (top) and single-tasking (bottom) across different user groups.
Takeaways [2]
Sessions should NOT be the focal units in IR
More than 70% of sessions are multi-task
Users differ based on their multi-tasking habits
Search effort varies across user groups
Some topics are more prone to multi-tasking than others
[2] R. Mehrotra, P. Bhattacharya, E. Yilmaz; Characterizing Users’ Multi-Tasking Behavior in Web Search, ACM SIGIR Conference CHIIR 2016
Outline
Part 1 Understanding Search Sessions
Part 2 Extracting Tasks & Subtasks
Part 3 Hierarchies of Tasks & Subtasks
Extracting Tasks & Subtasks
1 Task extraction
2 Subtask extraction
Extracting Search Tasks
Various proposed strategies:
Clustering session based queries [Lucchese, WSDM’11]
Structured Learning Approach [Wang, WWW’13]
Hawkes Process based Task Extraction [Li, KDD’14]
Entity-based Task Extraction [Verma, CIKM’14] [White, CIKM’14]
Extracting Search Tasks
Problems abound:
Link query to on-going task = long chains
Impure tasks
Pairwise predictions need extra post-processing
Rely on large corpus of pre-tagged queries
Do not aggregate across users
Random Fields based Task Extraction [3]
Propose a latent variable model for extracting tasks:
Assumes k underlying search tasks
Places a Markov Random Field over the latent task layer
Assumes access to a task oracle
Each latent task is a multinomial distribution over query terms
[3] R. Mehrotra, E. Yilmaz; Extracting Search Tasks via Task Random Field Model, SIGIR 2016 (under review)
Random Fields based Task Extraction
Figure: The proposed task-RF model for extracting search tasks.
The generative process:
Draw task proportions vector θ ∼ Dir(α)
Draw task labels z for all queries from the joint distribution defined in Eq. (3)
For each query q_i in the session:
  For each query term w_{q_i,j}, draw w_{q_i,j} ∼ Multi(β_{z_i})
Random Fields based Task Extraction
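A query marginal factorizes into its query terms:
\[ p(q \mid z, \beta) = \prod_{j=1}^{M_q} p(w_j \mid z, \beta) \tag{1} \]
The joint distribution of θ, z, q and w can be written as:
\[ p(\theta, z, w \mid \alpha, \beta, \lambda) = p(\theta \mid \alpha)\, p(z \mid \theta, \lambda) \prod_{i=1}^{N} \prod_{j=1}^{M_{q_i}} p(w_{q_i,j} \mid z_i, \beta) \tag{2} \]
Under the MRF model, the joint probability of all task assignments z = \{z_i\}_{i=1}^{N} can be written as:
\[ p(z \mid \theta, \lambda) = \frac{1}{A(\theta, \lambda)} \prod_{i=1}^{N} p(z_i \mid \theta) \exp\Big\{ \lambda \sum_{(m,n) \in E} \mathbb{1}(z_m = z_n) \Big\} \tag{3} \]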
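
As a rough illustration of Eq. (3), the sketch below evaluates the unnormalised log-probability of a task assignment: the per-query terms log p(z_i | θ) plus λ times the number of edges whose endpoints share a task. The normaliser A(θ, λ) is deliberately left out, and the function name, the edge list and the toy values are assumptions made for this example only.

```python
import math


def log_unnormalised_task_prior(z, theta, edges, lam):
    """log of prod_i p(z_i | theta) * exp(lam * sum_{(m,n) in E} 1[z_m == z_n]).

    z     : list of task labels, one per query
    theta : task proportions, theta[k] = p(task k)
    edges : list of (m, n) query index pairs linked in the random field
    lam   : edge strength lambda
    The normaliser A(theta, lam) is not computed here.
    """
    unary = sum(math.log(theta[task]) for task in z)
    agreeing_edges = sum(1 for m, n in edges if z[m] == z[n])
    return unary + lam * agreeing_edges


theta = [0.5, 0.5]
edges = [(0, 1)]  # hypothetical: queries 0 and 1 are linked
same = log_unnormalised_task_prior([0, 0, 1], theta, edges, lam=2.0)
diff = log_unnormalised_task_prior([0, 1, 1], theta, edges, lam=2.0)
print(same - diff)
```

With equal task proportions the two assignments differ only through the edge term, so the linked pair is rewarded for sharing a task label.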
Random Fields based Task Extraction
Task Oracle: classifier which predicts whether or not two queries belong to the same task.
Query Based Features
  cosine              cosine similarity between the term sets of the queries
  edit                normalised edit distance between query strings
  Jac                 Jaccard coefficient between the term sets of the queries
  Term                proportion of common terms between the queries
  url-term-match-sum  \sum_{u \in URL_i} m(q_j, u) + \sum_{u \in URL_j} m(q_i, u)
  url-term-match-max  \max_{u \in URL_i} m(q_j, u) + \max_{u \in URL_j} m(q_i, u)
  Embedding           cosine distance between embedding vectors of the two queries
URL Based Features
  dom-match           number of domain matches between URLs from both queries
  Min-edit-U          minimum edit distance between all URL pairs from the queries
  Avg-edit-U          average edit distance between all URL pairs from the queries
  Jac-U-min           minimum Jaccard coefficient between all URL pairs from the queries
  Jac-U-avg           average Jaccard coefficient between all URL pairs from the queries
Table: Features used by the classifier.
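
Three of the query based features in the table can be sketched as follows. This is an illustration under stated assumptions, not the classifier's actual feature code: queries are split on whitespace, cosine similarity treats the term sets as binary vectors, and the edit distance is Levenshtein distance divided by the longer string's length. All function names are hypothetical.

```python
import math


def term_set(query):
    """Lower-cased, whitespace-separated terms of a query."""
    return set(query.lower().split())


def cosine_terms(q1, q2):
    """Cosine similarity between the two term sets seen as binary vectors."""
    a, b = term_set(q1), term_set(q2)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def jaccard_terms(q1, q2):
    """Jaccard coefficient between the two term sets."""
    a, b = term_set(q1), term_set(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edit_distance(s, t):
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalised_edit_distance(q1, q2):
    """Edit distance scaled to [0, 1] by the longer query string."""
    longest = max(len(q1), len(q2))
    return edit_distance(q1, q2) / longest if longest else 0.0


def pair_features(q1, q2):
    """Feature dictionary for one query pair, keyed by the table's names."""
    return {
        "cosine": cosine_terms(q1, q2),
        "edit": normalised_edit_distance(q1, q2),
        "Jac": jaccard_terms(q1, q2),
    }


print(pair_features("running shoes", "sports shoes"))
```

The URL based and embedding features need clicked URLs and a trained embedding model, so they are left out of this sketch.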
Extracting Tasks & Subtasks
1 Task extraction
2 Subtask extraction
Subtask Extraction [4]
Complex tasks decompose into more focussed subtasks
Number of subtasks is unknown
Given: collection of on-task queries
Couple Bayesian Nonparametrics & Word Embeddings
[4] Subtask Extraction: R. Mehrotra, P. Bhattacharya, E. Yilmaz; Exploiting Distributional Representations with Distance Dependent CRPs for Extracting Sub-Tasks. Accepted at NAACL 2016
Chinese Restaurant Process
\[ p(z_i = j \mid D, \alpha) \propto \begin{cases} f(d_{ij}) & \text{if } j \neq i \\ \alpha & \text{if } j = i \end{cases} \]
1 For each query i ∈ [1, N], draw assignment z_i ∼ dist-CRP(α, f, D).
2 For each sub-task k ∈ {1, ...}, draw a parameter θ*_k ∼ G_0.
3 For each query i ∈ [1, N]:
  1 If c_i ∉ z*_{q_{1:N}}, set the parameter for the i-th query to θ_i = θ_{q_i}. Otherwise draw the parameter from the base distribution, θ_i ∼ Dirichlet(λ).
  2 Draw the i-th query, w_i ∼ Mult(M, θ_i).
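
Step 1 of the process above can be sketched in a few lines. Each query draws a link to another query j with weight f(d_ij), or to itself with weight α, and queries connected through links end up in the same sub-task. The exponential decay f(d) = exp(-d), the toy distance matrix and all function names are assumptions made for illustration; this draws from the prior only and is not the inference procedure.

```python
import math
import random


def sample_links(distances, alpha, decay, rng):
    """Draw one link per query: j != i with weight decay(d_ij), j == i with weight alpha."""
    n = len(distances)
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def links_to_clusters(links):
    """Queries connected through links (ignoring direction) share a sub-task label."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(links))]


# Hypothetical distances: queries 0 and 1 are close, query 2 is far from both.
D = [[0.0, 0.2, 3.0],
     [0.2, 0.0, 3.0],
     [3.0, 3.0, 0.0]]
rng = random.Random(0)
links = sample_links(D, alpha=0.5, decay=lambda d: math.exp(-d), rng=rng)
print(links, links_to_clusters(links))
```

Because the number of connected components is not fixed in advance, the number of sub-tasks is determined by the sampled links rather than set by hand.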
Quantifying Task based Distances
Task: plan a wedding
Sample queries: wedding planning, wedding checklist, bridal dresses, wedding cards
Leverage word embeddings.
Classify each word as background word or subtask-specific word.
Use a weighted combination of their embedding vectors to encode a query’s vector:
\[ V_q = \frac{1}{n_{\text{terms}}} \sum_i \frac{n_{q t_i}}{\sum_q n_q} V_{t_i} \tag{4} \]
Evaluating Quality of Subtasks
Qualitative Analysis:
Proposed Approach
  sub-task 1: wedding hairstyles; wedding hair dos; curly wedding hairstyles; pictures of wedding hair
  sub-task 2: used wedding dresses; colorful bridal gowns; preowned wedding dresses; wedding attire
  sub-task 3: wedding card holders; indian wedding program cards; wedding program paper; regency wedding cards
LDA
  sub-task 1: wedding insurance; destination wedding; wedding planning book; party supply stores
  sub-task 2: christian wedding vows; dresses brides; cheap wedding dresses; tea length wedding dresses
  sub-task 3: make your own wedding invitations; wedding cakes pictures; planners; wedding colors
CRP
  sub-task 1: wedding planning kit; destination wedding; wedding table decorations; 1938 wedding pictures
  sub-task 2: wedding theme; wedding guide; save the date ideas; wedding vacation
  sub-task 3: wedding insurance; weddings in vegas; wedding cakes pictures; planning a wedding
TW
  sub-task 1: wedding insurance; destination wedding; financing wedding rings; party supply stores
  sub-task 2: christian wedding vows; plus size bridesmaid; wedding colors; tea length wedding dresses
  sub-task 3: cheap dresses; wedding cakes pictures; pricing weddings; macys wedding dresses
User Study:
Takeaways
Latent variable model for task extraction based on a task oracle
Extracts more coherent tasks
Nonparametric model for extracting subtasks
Couple nonparametric approaches with word embeddings
Human judgements & coherence estimates show improved results
Outline
Part 1 Understanding Search Sessions
Part 2 Extracting Tasks & Subtasks
Part 3 Hierarchies of Tasks & Subtasks
Hierarchies of Tasks & Subtasks
Complex search tasks place an intense cognitive burden on the user
Users need to explore the domain space
Flat, structure-less tasks lack insights about the demarcation of subtasks
Hierarchies provide a tremendous opportunity to contextually target the user (recommendations/advertisements)
Constructing Bayesian Hierarchies
Tree as partitions & mixtures:
Tree: partition over the group of queries (Q), where the probability of a query group is defined via dynamic programming as
\[ p(Q \mid T) = \pi_T f(Q) + (1 - \pi_T) \prod_{T_i \in ch(T)} p(\mathrm{leaves}(T_i) \mid T_i) \tag{5} \]
Gamma-Poisson Model of Query Affinities:
Query-Term Based Affinity (r1)
  cosine      cosine similarity between the term sets of the queries
  edit        normalised edit distance between query strings
  Jac         Jaccard coefficient between the term sets of the queries
  Term        proportion of common terms between the queries
URL Based Affinity (r2)
  Min-edit-U  minimum edit distance between all URL pairs from the queries
  Avg-edit-U  average edit distance between all URL pairs from the queries
  Jac-U-min   minimum Jaccard coefficient between all URL pairs from the queries
  Jac-U-avg   average Jaccard coefficient between all URL pairs from the queries
Session/User Based Affinity (r3)
  Same-U      if the two queries belong to the same user
  Same-S      if the two queries belong to the same session
Embedding Based Affinity (r4)
  Embedding   cosine distance between embedding vectors of the two queries
Table: Query-Query Affinities.
\[ f(Q) = \prod_{i=1}^{3} p(R_i \mid \alpha_i, \beta_i) \tag{6} \]
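
The recursion in Eq. (5) can be written as a short recursive function. In this sketch the `Tree` class, the treatment of a leaf as having likelihood f(Q), and the toy stand-in for f are all assumptions for illustration. In the model itself f(Q) is the Gamma-Poisson affinity likelihood of Eq. (6), which is not implemented here.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Tree:
    queries: list   # leaves(T): all queries under this node
    pi: float       # pi_T: weight of keeping all leaves in a single group
    children: list = field(default_factory=list)


def tree_likelihood(tree, f):
    """p(Q | T) = pi_T * f(Q) + (1 - pi_T) * prod_i p(leaves(T_i) | T_i)."""
    if not tree.children:
        # Assumption: a leaf cannot be partitioned further, so its likelihood is f(Q).
        return f(tree.queries)
    below = prod(tree_likelihood(child, f) for child in tree.children)
    return tree.pi * f(tree.queries) + (1.0 - tree.pi) * below


def toy_f(queries):
    """Placeholder group likelihood (NOT Eq. 6): halves with every query."""
    return 0.5 ** len(queries)


root = Tree(
    queries=["wedding cards", "wedding venue"],
    pi=0.5,
    children=[Tree(["wedding cards"], 1.0), Tree(["wedding venue"], 1.0)],
)
print(tree_likelihood(root, toy_f))
```

Each subtree's value depends only on its own leaves, so in a bottom-up construction it can be cached and reused when trees are merged.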
Constructing Bayesian Hierarchies
Forming Arbitrary Tree Structure:
Bayesian Rose Trees (NIPS 2012)
Agglomerative Model Selection:
Bottom-up greedy agglomerative fashion
At each iteration: merge two of the trees in the forest
Which pair of trees to merge (and how): the merger that yields the largest Bayes factor improvement
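
The greedy bottom-up loop can be sketched as below. This is a simplified stand-in rather than the Bayesian Rose Tree algorithm: it only joins two trees into a binary node, whereas rose trees also consider merges that produce non-binary nodes, and the generic `merge_score` callback (here a toy closeness score on numbers) replaces the Bayes factor. All names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Flatten a nested-tuple tree into its list of leaf items."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def greedy_agglomerate(items, merge_score):
    """Start with one tree per item and repeatedly merge the best-scoring pair."""
    forest = list(items)
    while len(forest) > 1:
        i, j = max(
            combinations(range(len(forest)), 2),
            key=lambda ij: merge_score(forest[ij[0]], forest[ij[1]]),
        )
        merged = (forest[i], forest[j])
        forest = [t for k, t in enumerate(forest) if k not in (i, j)] + [merged]
    return forest[0]


def closeness(a, b):
    """Toy merge score: trees whose leaf means are closer score higher."""
    mean_a = sum(leaves(a)) / len(leaves(a))
    mean_b = sum(leaves(b)) / len(leaves(b))
    return -abs(mean_a - mean_b)


hierarchy = greedy_agglomerate([1.0, 1.1, 5.0], closeness)
print(hierarchy)
```

Swapping `closeness` for a likelihood-ratio score between the merged tree and the two separate trees would bring the loop closer to the model selection described on the slide.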
Takeaways [5]
We can extract recursive hierarchies of tasks & subtasks
Nonparametric tree partition model for hierarchies
Allows for arbitrary structures
Improves term prediction & task extraction
Human labelled judgements show:
  We extract coherent tasks
  Subtasks are valid
  Subtasks are useful for completing the overall task
[5] R. Mehrotra, E. Yilmaz; Towards Hierarchies of Search Tasks & Subtasks, In Proceedings of the Poster Track, WWW 2015
Thank You!
References
R. Mehrotra, P. Bhattacharya, and E. Yilmaz.
Characterizing users’ multi-tasking behavior in web search. In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, 2016.
R. Mehrotra, P. Bhattacharya, and E. Yilmaz.
Deconstructing complex search tasks: A Bayesian nonparametric approach for extracting sub-tasks. In Proceedings of NAACL-HLT, pages 599–605, 2016.
R. Mehrotra, P. Bhattacharya, and E. Yilmaz.
Sessions; tasks & topics - uncovering behavioral heterogeneities in online search behavior. In Proceedings of the 2016 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2016.
R. Mehrotra and E. Yilmaz.
Towards hierarchies of search tasks & subtasks. In WWW 2015.
R. Mehrotra and E. Yilmaz.
Terms, topics & tasks: Enhanced user modelling for better personalization. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages 131–140. ACM, 2015.
R. Mehrotra, E. Yilmaz, and M. Verma.
Task-based user modelling for personalization via probabilistic matrix factorization. In RecSys Posters, 2014.
D. Odijk, R. W. White, A. Hassan Awadallah, and S. T. Dumais.
Struggling and success in web search. In CIKM 2015.
White, Bennett, and Dumais.
Predicting short-term interests using activity-based search context. In CIKM 2010.
B. Xiang, D. Jiang, J. Pei, X. Sun, E. Chen, and H. Li.
Context-aware ranking in web search. In SIGIR 2010.
Eugene Agichtein, Ryen W White, Susan T Dumais, and Paul N Bennett.
2012. Search, interrupted: understanding and predicting search task continuation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 315–324. ACM.
David M Blei and Peter I Frazier.
2011. Distance dependent Chinese restaurant processes. The Journal of Machine Learning Research, 12:2461–2488.
David M Blei, Andrew Y Ng, and Michael I Jordan.
2003. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.
Matthias Hagen, Jakob Gomoll, Anna Beyer, and Benno Stein.
2013. From search session detection to search mission detection. In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, pages 85–92. LE CENTRE DE HAUTES ETUDES INTERNATIONALES D’INFORMATIQUE DOCUMENTAIRE.
Rosie Jones and Kristina Lisa Klinkner.
2008. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 699–708. ACM.
Diane Kelly, Jaime Arguello, and Robert Capra.
2013. NSF workshop on task-based information search systems. In ACM SIGIR Forum, volume 47, pages 116–127. ACM.
Alexander Kotov, Paul N Bennett, Ryen W White, Susan T Dumais, and Jaime Teevan.
2011. Modeling and analysis of cross-session search tasks. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 5–14. ACM.
M. J. Kusner, Y. Sun, N. I. Kolkin, and K. Q. Weinberger.
2015. From word embeddings to document distances. In ICML.
Zhen Liao, Yang Song, Li-wei He, and Yalou Huang.
2012. Evaluating the effectiveness of search task trails. In Proceedings of the 21st international conference on World Wide Web, pages 489–498. ACM.
Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, and Gabriele Tolomei.
2011. Identifying task-based sessions in search engine query logs. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 277–286. ACM.
Rishabh Mehrotra and Emine Yilmaz.
2015a. Terms, topics & tasks: Enhanced user modelling for better personalization. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages 131–140. ACM.
Rishabh Mehrotra and Emine Yilmaz.
2015b. Towards hierarchies of search tasks & subtasks. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 73–74. International World Wide Web Conferences Steering Committee.
Rishabh Mehrotra, Scott Sanner, Wray Buntine, and Lexing Xie.
2013. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pages 889–892. ACM.
Rishabh Mehrotra, Emine Yilmaz, and Manisha Verma.
2014. Task-based user modelling for personalization via probabilistic matrix factorization. In RecSys Posters.
Rishabh Mehrotra, Prasanta Bhattacharya, and Emine Yilmaz.
2016. Characterizing users’ multi-tasking behavior in web search. In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, CHIIR ’16, pages 297–300, New York, NY, USA. ACM.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean.
2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
Greg Pass, Abdur Chowdhury, and Cayley Torgeson.
2006. A picture of search. In InfoScale, volume 152, page 1.
Amanda Spink, Sherry Koshman, Minsoo Park, Chris Field, and Bernard J Jansen.
2005. Multitasking web search on vivisimo.com. In Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on, volume 2, pages 486–490. IEEE.
Ming Sun, Yun-Nung Chen, and Alexander I Rudnicky.
2015. Understanding users cross-domain intentions in spoken dialog systems. In NIPS Workshop on Machine Learning for SLU and Interaction.
Manisha Verma and Emine Yilmaz.
2014. Entity oriented task extraction from query logs. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pages 1975–1978. ACM.
Chong Wang and David M Blei.
2009. Variational inference for the nested Chinese restaurant process. In Advances in Neural Information Processing Systems, pages 1990–1998.
Hongning Wang, Yang Song, Ming-Wei Chang, Xiaodong He, Ryen W White, and Wei Chu.
2013. Learning to extract cross-session search tasks. In Proceedings of the 22nd international conference on World Wide Web, pages 1353–1364. International World Wide Web Conferences Steering Committee.