Crowd-Augmented Social Aware Search
Soudip Roy Chowdhury & Bogdan Cautis
What are we talking about?
• Social-aware search
– Finding results relevant for the query and for the user (seeker)
– Web search (tf-idf) + social search (social connections, e.g., follower-following links)
• However,
– The required number of results (K items) is not always found
– The algorithm does not ensure the quality of the retrieved results
• Our aim is to use the crowd (friends) for datasourcing, to address these problems efficiently
Let's see an example!
Query: get the top 4 tweets for the query terms “#jesuischarlie #jesuisahmed”
Hashtag term      Tweet ID   Frequency
#jesuischarlie    D1         1
                  D2         1
                  D3         0
                  D4         2
                  D5         1
                  D6         0

Hashtag term      Tweet ID   Frequency
#jesuisahmed      D1         0
                  D2         1
                  D3         1
                  D4         1
                  D5         1
                  D6         0
By aggregating term frequencies we get the final result
Hashtag terms                  Tweet ID   Frequency
#jesuischarlie #jesuisahmed    D1         1
                               D2         2
                               D3         1
                               D4         3
                               D5         2
                               D6         0
and the top-4 items are D4, D2, D5, and either D1 or D3 (tied at frequency 1)
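The aggregation step above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide: the function names `aggregate` and `top_k` are made up here, and breaking ties by tweet ID is an assumption, since the slides do not say how ties are resolved.

```python
# Per-tag term frequencies, copied from the two tables above.
tf = {
    "#jesuischarlie": {"D1": 1, "D2": 1, "D3": 0, "D4": 2, "D5": 1, "D6": 0},
    "#jesuisahmed": {"D1": 0, "D2": 1, "D3": 1, "D4": 1, "D5": 1, "D6": 0},
}


def aggregate(per_tag):
    """Sum the per-tag frequencies of every tweet over all query tags."""
    total = {}
    for scores in per_tag.values():
        for item, value in scores.items():
            total[item] = total.get(item, 0) + value
    return total


def top_k(scores, k):
    """Return the k best items; ties are broken by tweet ID (assumption)."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


aggregated = aggregate(tf)
best = top_k(aggregated, 4)
```

With the ID tie-break, D1 takes the fourth slot; a different tie-break would pick D3 instead.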
Similarly, the social scores are calculated
Tweet ID   Hashtag term      Author   Social score
D1         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
D2         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
           #jesuischarlie    Das      0.9 × 0.9
D3         #jesuisahmed      Bob      0.9
D4         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
           #jesuischarlie    Das      0.9 × 0.9
           #jesuisahmed      Das      0.9 × 0.9
D5         #jesuischarlie    Chang    0.6
           #jesuisahmed      Chang    0.6
Hashtag term      Tweet ID   Social score
#jesuischarlie    D1         0.4
                  D2         1.21
                  D3         0
                  D4         1.21
                  D5         0.6

Hashtag term      Tweet ID   Social score
#jesuisahmed      D1         0
                  D2         0
                  D3         0.9
                  D4         0.81
                  D5         0.6

Hashtag terms                  Tweet ID   Social score
#jesuischarlie #jesuisahmed    D1         0.4
                               D2         1.21
                               D3         0.9
                               D4         2.02
                               D5         1.2
and the top-4 items are D4, D2, D5, and D3
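A minimal sketch of how these social scores could be reproduced. It reads each product in the author table as the edge weights along a path from the seeker to the author, which the slides suggest but do not state explicitly; the names `proximity`, `taggings`, and `social_scores` are illustrative. The unrounded sums differ slightly from the rounded slide values (for example 2.025 instead of 2.02 for D4).

```python
from math import prod

# Proximity of each author to the seeker: product of the weights shown
# in the author table (interpreted here as path weights, an assumption).
proximity = {
    "Elham": prod([0.9, 0.9, 0.5]),
    "Das": prod([0.9, 0.9]),
    "Bob": prod([0.9]),
    "Chang": prod([0.6]),
}

# (tweet, hashtag, author) tagging actions, copied from the author table.
taggings = [
    ("D1", "#jesuischarlie", "Elham"),
    ("D2", "#jesuischarlie", "Elham"),
    ("D2", "#jesuischarlie", "Das"),
    ("D3", "#jesuisahmed", "Bob"),
    ("D4", "#jesuischarlie", "Elham"),
    ("D4", "#jesuischarlie", "Das"),
    ("D4", "#jesuisahmed", "Das"),
    ("D5", "#jesuischarlie", "Chang"),
    ("D5", "#jesuisahmed", "Chang"),
]


def social_scores(taggings, proximity):
    """Sum, per tweet, the proximity of every author who tagged it."""
    scores = {}
    for tweet, _tag, author in taggings:
        scores[tweet] = scores.get(tweet, 0.0) + proximity[author]
    return scores


scores = social_scores(taggings, proximity)
ranking = sorted(scores, key=scores.get, reverse=True)
```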
Top-k results with social score!
Social-aware search
• Final results are calculated based on the score model
– score(item | seeker, tag) = α × tf-idf(tag, item) + (1 − α) × sc(item | seeker, tag)
• Following this model, the top-4 results for our example scenario are D4, D2, D5, and D3
• Let us now consider some additional constraints to make sure the results are good in quality
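The score model on this slide can be tried out on the example data. The sketch below makes two assumptions that are not in the slides: α is set to 0.5, and the aggregated raw frequencies stand in for the tf-idf values. D6 has no social score in the tables, so it is treated as 0.

```python
ALPHA = 0.5  # assumed value; the slides do not fix α

# Aggregated frequencies (stand-in for tf-idf) and social scores
# from the example tables.
textual = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}


def combined_score(item, alpha=ALPHA):
    """score = α × textual + (1 − α) × social, following the slide's model."""
    return alpha * textual[item] + (1 - alpha) * social.get(item, 0.0)


top4 = sorted(textual, key=combined_score, reverse=True)[:4]
```

Under these assumptions the ranking agrees with the slide: D4, D2, D5, D3.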
Example scenario with quality constraints
#Constraint: Each result item must be tagged at least twice
Hence only three items qualify for the top-4 list: D4, D2, and D5 (frequencies 3, 2, and 2)
Hashtag terms                  Tweet ID   Frequency
#jesuischarlie #jesuisahmed    D1         1
                               D2         2
                               D3         1
                               D4         3
                               D5         2
                               D6         0

Hashtag terms                  Tweet ID   Social score
#jesuischarlie #jesuisahmed    D1         0.4
                               D2         1.21
                               D3         0.9
                               D4         2.02
                               D5         1.2
Example scenario with quality constraints
#Constraint: The social score of an item must be > 1 for it to be in the final result list
Hence only three items qualify for the top-4 list: D4, D2, and D5 (social scores 2.02, 1.21, and 1.2)
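Both example constraints can be checked against the tables with a short filter. This is a sketch only: the function name `satisfies` and the parameter names are illustrative, and the candidate list is the top-4 ranking from the score model.

```python
# Aggregated values from the example tables.
frequency = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
social = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}


def satisfies(item, min_tag_occurrences=2, min_social_score=1.0):
    """True if the item is tagged at least twice and its social score is > 1."""
    return (
        frequency[item] >= min_tag_occurrences
        and social.get(item, 0.0) > min_social_score
    )


candidates = ["D4", "D2", "D5", "D3"]  # top-4 before applying constraints
qualified = [item for item in candidates if satisfies(item)]
failing = [item for item in candidates if not satisfies(item)]
```

Only three of the four candidates qualify, so the result list falls short of K = 4, which is the gap the deck aims to fill.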
List of quality constraints
1. Min # of posts for item-tag pair
2. Min # of distinct tags per item
3. Min # of tag occurrences per item
4. Threshold for social score
5. Threshold stability measures for tags
– Based on moving average of relative tag frequency distribution [1]
To be in the top-k result list, an item must satisfy these constraints in addition to the social-aware search threshold
Friendsourcing
• Items that do not meet the constraints are friendsourced
• Friendsourcing tasks are designed to improve the quality of the top-k result
• Friendsourcing task = ⟨I, T, U⟩, where items I are friendsourced to friends U, and U provide tags T for the items
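One way to picture a friendsourcing task ⟨I, T, U⟩ as a data structure. The class and function names are hypothetical, as is the rule that only answers from the addressed friends, about the requested items and tags, are counted; "Eve" is a made-up user standing in for someone outside U.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendsourcingTask:
    items: frozenset  # I: items sent out for tagging
    tags: frozenset   # T: tags the friends are asked about
    users: frozenset  # U: friends who receive the task


def apply_answers(task, answers, frequency):
    """Count accepted (user, item, tag) answers on top of existing frequencies."""
    updated = dict(frequency)
    for user, item, tag in answers:
        if user in task.users and item in task.items and tag in task.tags:
            updated[(item, tag)] = updated.get((item, tag), 0) + 1
    return updated


task = FriendsourcingTask(
    items=frozenset({"D3"}),
    tags=frozenset({"#jesuisahmed"}),
    users=frozenset({"Bob", "Das"}),
)
answers = [("Das", "D3", "#jesuisahmed"), ("Eve", "D3", "#jesuisahmed")]
updated = apply_answers(task, answers, {("D3", "#jesuisahmed"): 1})
```

Here Das's answer lifts D3 to two occurrences of #jesuisahmed, while the answer from the user outside U is ignored.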
Human Tasks
• T1: Minimum number of posts for an item-tag pair: ⟨I1, t1, {u1, u2, ..., uk}⟩
• T2: Minimum number of distinct tags: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T3: Minimum number of tag occurrences: ⟨{{I1, I2, ..., In}, t1, {u1, u2, ..., uk}}, {{I1, I2, ..., In}, t2, {u1, u2, ..., uk}}, ..., {{I1, I2, ..., In}, tn, {u1, u2, ..., uk}}⟩
• T4: Minimum number of taggers: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T5: Minimum network-aware score: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T6: Stability-based tag quality: ⟨I1, t1, {u1, u2, ..., uk}⟩
Human Task Optimization
• Problem 1:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose an item or set of items that can complete the top-k result list with the minimum number of tasks
– Inter-item gain
Human Task Optimization contd.
• Problem 2:
– Given a chosen item I ∈ {I1, …, In} that does not satisfy the constraints (C1, C2, …, C6)
– Choose a task Ti ∈ {T1, …, T6} that satisfies the maximum number of constraints, minimizing the total number of tasks per item
– Intra-item gain
Human Task Optimization contd.
• Problem 3:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy the constraints (C1, C2, …, C6)
– Choose a set of tasks Tij, where i ∈ {1, …, n} and j ∈ {1, …, 6}, such that the intra-item and inter-item gain is maximized
Solution hints
• Problem 1.
– For each constraint, create a partially ordered list of the n items in the result list with respect to the constraint threshold
– Items that appear at the highest positions in most of the lists are chosen first to be friendsourced
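A possible reading of this hint, sketched with a Borda-style rank sum, which is a swapped-in technique rather than the authors' stated method. For each constraint, items are ranked by how far they are from the threshold (ties share a rank), and the items with the smallest rank total are friendsourced first. The function names are illustrative, and D6's social score of 0 is an assumption since D6 is absent from the social tables.

```python
def gap_ranks(values, threshold):
    """Rank items by distance to the threshold; tied items share a rank."""
    gaps = {item: max(0.0, threshold - v) for item, v in values.items()}
    return {
        item: sum(other < gap for other in gaps.values())
        for item, gap in gaps.items()
    }


def friendsourcing_order(constraints):
    """Order items by the sum of their ranks over all constraint lists."""
    totals = {}
    for values, threshold in constraints:
        for item, rank in gap_ranks(values, threshold).items():
            totals[item] = totals.get(item, 0) + rank
    return sorted(totals, key=lambda item: (totals[item], item))


# Items from the example that fail the constraints.
frequency = {"D1": 1, "D3": 1, "D6": 0}
social = {"D1": 0.4, "D3": 0.9, "D6": 0.0}  # D6 = 0.0 is assumed
order = friendsourcing_order([(frequency, 2), (social, 1.0)])
```

D3 comes first because it is closest to both thresholds, so it is likely the cheapest item to bring into the top-k list.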
Solution hints
• Problem 2.
– There exist conditional probabilistic dependencies among the constraints
– E.g., [formula missing in source], where P(T2) = 1
– We aim to find the i for which the conditional probability [formula missing in source] is maximized
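Since the formulas on this slide are not recoverable, the following is only one possible interpretation: pick the task whose completion is expected to satisfy the most open constraints. The probability table `p_satisfy` is entirely hypothetical and exists only to make the sketch runnable; none of its numbers come from the slides.

```python
# Hypothetical P(constraint satisfied | task completed); illustrative only.
p_satisfy = {
    "T1": {"C1": 1.0, "C3": 0.5},
    "T2": {"C1": 0.3, "C2": 1.0, "C3": 0.8},
    "T3": {"C3": 1.0},
}


def expected_gain(task, unsatisfied):
    """Expected number of open constraints that the task would satisfy."""
    return sum(p_satisfy[task].get(c, 0.0) for c in unsatisfied)


def best_task(unsatisfied):
    """Task with the highest expected gain over the open constraints."""
    return max(p_satisfy, key=lambda task: expected_gain(task, unsatisfied))


choice = best_task({"C1", "C2", "C3"})
```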
Expert Selection Criteria
• Users are selected based on
– User expertise score
– Communication cost
• User expertise score (Exp_ui)
– User profile/activity attributes
– Question-specific user expertise
– Algorithm-specific user attributes
• Communication cost (Cost_ui)
– Social score
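A minimal sketch of how the two criteria might be combined. The slides do not say how Exp_ui and Cost_ui are traded off, so everything here is an assumption: cost is taken as 1 minus the social score, utility as expertise minus cost, and the expertise values are invented for illustration. The social scores reuse the author proximities from the example.

```python
# Social proximity to the seeker, from the example; expertise is hypothetical.
social_score = {"Elham": 0.405, "Das": 0.81, "Bob": 0.9, "Chang": 0.6}
expertise = {"Elham": 0.9, "Das": 0.7, "Bob": 0.4, "Chang": 0.8}


def communication_cost(user):
    """Assumed cost model: socially closer friends are cheaper to reach."""
    return 1.0 - social_score[user]


def utility(user):
    """Assumed trade-off: expertise minus communication cost."""
    return expertise[user] - communication_cost(user)


def select_experts(users, m):
    """Pick the m users with the highest utility."""
    return sorted(users, key=utility, reverse=True)[:m]


chosen = select_experts(list(expertise), 2)
```

Under these assumptions the most expert user (Elham) is not selected, because the long path to the seeker makes her costly to reach.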
System Architecture
[Figure: system architecture diagram, presented in seven numbered steps]
Screenshots of CANTO (Search Interfaces)
Screenshots of CANTO (Friendsourcing Seeker perspective)
Screenshots of CANTO (Friendsourcing provider perspective)
Summary
• We are working on
– Augmenting social-aware search with the crowd, aka friends
– An algorithm for generating both ad hoc and best-effort social-aware search results
– An advanced expert selection algorithm
  • Considering both budget and time constraints
– Planning to explore
  • Multi-armed bandits (MAB) for expert selection
• So far we have used tweets and hashtags for experimentation; we plan to experiment with the Vodkaster dataset (user network, films, comments, micro-critique data).
Thank you for your attention!
References
1. William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst.
2. Xuan S. Yang, David W. Cheung, Luyi Mo, Reynold Cheng, and Ben Kao. 2013. On incentive-based tagging. In Proceedings of the 2013 IEEE International Conference on Data Engineering.