
Crowd-Augmented Social Aware Search

Soudip Roy Chowdhury & Bogdan Cautis

What are we talking about?

• Social Aware Search
– Finding results relevant for the query and for the users (seeker)
– Web Search (tf-idf) + Social search (social connections e.g., follower-following links)
• However,
– Required numbers of results (K items) are not found
– Algorithm does not ensure the quality of the retrieved results
• Our aim is to use [image not recovered] for datasourcing, to address the following problems efficiently

Let's see an example!

Query: get top 4 tweets for the query terms “#jesuischarlie #jesuisahmed”

Hashtag Term      Tweet ID   Frequency
#jesuischarlie    D1         1
                  D2         1
                  D3         0
                  D4         2
                  D5         1
                  D6         0
#jesuisahmed      D1         0
                  D2         1
                  D3         1
                  D4         1
                  D5         1
                  D6         0

By aggregating term-frequencies we get the final result




Hashtag Terms                  Tweet ID   Frequency
#jesuischarlie #jesuisahmed    D1         1
                               D2         2
                               D3         1
                               D4         3
                               D5         2
                               D6         0

and top-4 items are D4, D2, and D5, followed by either D1 or D3 (tied at 1)
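The aggregation step can be sketched in a few lines of Python. This is only an illustration of the example tables: the function names `aggregate` and `top_k` are hypothetical, and breaking ties by tweet ID is an assumption, since the slides do not say how ties are resolved.

```python
# Per-hashtag term frequencies, copied from the example tables.
TERM_FREQ = {
    "#jesuischarlie": {"D1": 1, "D2": 1, "D3": 0, "D4": 2, "D5": 1, "D6": 0},
    "#jesuisahmed": {"D1": 0, "D2": 1, "D3": 1, "D4": 1, "D5": 1, "D6": 0},
}


def aggregate(per_term):
    """Sum the per-term frequencies of every tweet."""
    totals = {}
    for freqs in per_term.values():
        for tweet, freq in freqs.items():
            totals[tweet] = totals.get(tweet, 0) + freq
    return totals


def top_k(scores, k):
    """Return the k best (tweet, score) pairs.

    Ties are broken by tweet ID, which is an assumption of this sketch.
    """
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


totals = aggregate(TERM_FREQ)
print(top_k(totals, 4))
```

With this tie-breaking rule D1 takes the fourth slot; a different rule could pick D3 instead.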

Similarly the social scores are calculated

Tweet ID   Hashtag term      Author   Social Score
D1         #jesuischarlie    Elham    0.9 × 0.9 × 0.5
?          #jesuischarlie    Das      0.9 × 0.9
D3         #jesuisahmed      Bob      0.9
?          #jesuischarlie    Das      0.9 × 0.9
           #jesuisahmed      Das      0.9 × 0.9
D5         #jesuischarlie    Chang    0.6
           #jesuisahmed      Chang    0.6

Hashtag Term                   Tweet ID   Social score
#jesuischarlie                 D1         0.4
                               D2         1.21
                               D3         0
                               D4         1.21
                               D5         0.6
#jesuisahmed                   D1         0
                               D2         0
                               D3         0.9
                               D4         0.81
                               D5         0.6
#jesuischarlie #jesuisahmed    D1         0.4
                               D2         1.21
                               D3         0.9
                               D4         2.02
                               D5         1.2

and top-4 items are D4, D2, D5, and D3
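A rough sketch of how the social scores might be computed and aggregated. Reading an entry such as 0.9 × 0.9 × 0.5 as the product of link weights along the path from the seeker to the author is an assumption; the slides only show the products. The helper name `path_score` is hypothetical, and the per-hashtag values are copied from the table.

```python
from math import prod


def path_score(link_weights):
    """Score of an author reached through a chain of social links.

    Assumption: the score is the product of the link weights on the path.
    """
    return prod(link_weights)


# Per-hashtag social scores, copied from the example table.
SOCIAL_PER_TAG = {
    "#jesuischarlie": {"D1": 0.4, "D2": 1.21, "D3": 0.0, "D4": 1.21, "D5": 0.6},
    "#jesuisahmed": {"D1": 0.0, "D2": 0.0, "D3": 0.9, "D4": 0.81, "D5": 0.6},
}

# Sum the per-hashtag scores of every tweet (rounded to hide float noise).
social_total = {}
for scores in SOCIAL_PER_TAG.values():
    for tweet, score in scores.items():
        social_total[tweet] = round(social_total.get(tweet, 0.0) + score, 2)

top4 = sorted(social_total, key=social_total.get, reverse=True)[:4]
print(path_score([0.9, 0.9, 0.5]))  # Elham's entry for D1, shown rounded as 0.4
print(top4)
```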

Top-k results with social score!

Social-aware search

• Final results are calculated based on the score model
– score(item|seeker,tag) = α × tf-idf(tag,item) + (1-α) × sc(item|seeker,tag)
• Following this model, the top-4 results for our example scenario
– D4, D2, D5, and D3
• Let us now consider some additional constraints to make sure the results are of good quality
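The score model can be checked against the example with a short sketch. Two assumptions are made here: α is set to 0.5 (the slides give no value), and the aggregated tag frequency stands in for tf-idf. D6 has no social score in the example, so it is treated as 0.

```python
ALPHA = 0.5  # assumed weight; the slides do not give a value for α

# Aggregated values from the example; the raw tag frequency stands in
# for tf-idf here, which is a simplification.
TEXT_SCORE = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
SOCIAL_SCORE = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}


def combined_score(item, alpha=ALPHA):
    """score = α × textual score + (1 - α) × social score."""
    text = TEXT_SCORE.get(item, 0)
    social = SOCIAL_SCORE.get(item, 0.0)  # missing social score counts as 0
    return alpha * text + (1 - alpha) * social


ranking = sorted(TEXT_SCORE, key=combined_score, reverse=True)
print(ranking[:4])
```

Under these assumptions the ranking agrees with the top-4 list given on the slide.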

Example scenario with quality constraints

#Constraint: Each result item must be tagged at least twice

Tweet ID   Frequency
D1         1
D2         2
D3         1
D4         3
D5         2
D6         0

Hence top-4 items are D4, D2, and D5; only three items satisfy the constraint



Example scenario with quality constraints

#Constraint: Social score for an item must be > 1 in order to be in the final result list

Tweet ID   Social score
D1         0.4
D2         1.21
D3         0.9
D4         2.02
D5         1.2

Hence top-4 items are D4, D2, and D5; only three items satisfy the constraint
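The two example constraints can be applied together in a small sketch. The function name `violations` and the constant names are hypothetical; the thresholds and data are those of the example scenario.

```python
MIN_TAG_OCCURRENCES = 2  # "must be tagged at least twice"
MIN_SOCIAL_SCORE = 1.0   # "social score must be > 1"

TAG_OCCURRENCES = {"D1": 1, "D2": 2, "D3": 1, "D4": 3, "D5": 2, "D6": 0}
SOCIAL_SCORE = {"D1": 0.4, "D2": 1.21, "D3": 0.9, "D4": 2.02, "D5": 1.2}


def violations(item):
    """Return the names of the example constraints that the item fails."""
    failed = []
    if TAG_OCCURRENCES.get(item, 0) < MIN_TAG_OCCURRENCES:
        failed.append("tag occurrences")
    if not SOCIAL_SCORE.get(item, 0.0) > MIN_SOCIAL_SCORE:
        failed.append("social score")
    return failed


accepted = [item for item in TAG_OCCURRENCES if not violations(item)]
rejected = {item: violations(item) for item in TAG_OCCURRENCES if violations(item)}
print(accepted)
print(rejected)
```

Only three items pass, so the top-4 list cannot be completed from the existing data alone.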

List of quality constraints

1. Min # of posts for item-tag pair
2. Min # of distinct tags per item
3. Min # of tag occurrences per item
4. Threshold for social score
5. Threshold stability measures for tags
– Based on moving average of relative tag frequency distribution [1]

To be in the top-k result list, an item must satisfy these constraints in addition to the social-aware search threshold

Friendsourcing

• Items that do not meet the constraints are friendsourced
• Friendsourcing tasks are designed to improve the quality of the top-k result
• Friendsourcing tasks = ⟨I, T, U⟩, where items I are friendsourced to friends U, and U provide tags T for the items
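One way to picture a friendsourcing task ⟨I, T, U⟩ is as a small record type. The class name `FriendsourcingTask`, the helper `make_tasks`, and the choice of one task per failing item are all assumptions of this sketch, not part of the slides; the friend names are taken from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendsourcingTask:
    """A task ⟨I, T, U⟩: items I, tags T, and the friends U who are asked."""
    items: tuple
    tags: tuple
    users: tuple


def make_tasks(failing_items, query_tags, friends):
    """Build one task per failing item (an assumption of this sketch)."""
    return [
        FriendsourcingTask((item,), tuple(query_tags), tuple(friends))
        for item in failing_items
    ]


tasks = make_tasks(
    failing_items=["D1", "D3"],
    query_tags=["#jesuischarlie", "#jesuisahmed"],
    friends=["Das", "Bob"],
)
for task in tasks:
    print(task)
```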

Human Tasks

• T1: Minimum number of posts for an item-tag pair: ⟨I1, t1, {u1, u2, ..., uk}⟩
• T2: Minimum number of distinct tags: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T3: Minimum number of tag occurrences: ⟨{{I1, I2, ..., In}, t1, {u1, u2, ..., uk}}, {{I1, I2, ..., In}, t2, {u1, u2, ..., uk}}, ..., {{I1, I2, ..., In}, tn, {u1, u2, ..., uk}}⟩

Human Tasks

• T4: Minimum number of taggers: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T5: Minimum network-aware score: ⟨I1, {t1, t2, ..., tn}, {u1, u2, ..., uk}⟩
• T6: Stability-based tag quality: ⟨I1, t1, {u1, u2, ..., uk}⟩

Human Task Optimization

• Problem 1:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy constraints (C1, C2, …, C6)
– Choose an item / set of items that can complete the top-k result list with the minimum number of tasks
– Inter-item gain

Human Task Optimization contd.

• Problem 2:
– Given a chosen item I ∈ {I1, …, In} that does not satisfy constraints (C1, C2, …, C6)
– Choose a task Ti ∈ {T1, …, T6} such that it satisfies the maximum number of constraints while minimizing the total number of tasks per item
– Inter-item gain is not considered here: Intra-item gain

Human Task Optimization contd.

• Problem 3:
– Given a set of items {I1, I2, …, In} in a result list that do not satisfy constraints (C1, C2, …, C6)
– Choose a set of tasks Tij, where i ∈ {1, …, n} and j ∈ {1, …, 6}, such that the intra- and inter-item gain is maximized

Solution hints

• Problem 1.
– Among the n items in the result list, for each constraint we create a partially ordered list of items w.r.t. the constraint threshold
– Items that appear at the highest (or higher) positions in most of the lists are chosen first to be friendsourced
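A minimal sketch of this hint, under stated assumptions. Each constraint yields a ranking of the failing items by how close they are to the threshold; an item is preferred if it is at the top of more lists, with the sum of its ranks as a tie-breaker. That exact selection rule, the function names, and the use of only two constraints are assumptions made for illustration.

```python
# Values of the failing items for two of the constraints (from the example).
VALUES = {
    "tag occurrences": {"D1": 1, "D3": 1, "D6": 0},
    "social score": {"D1": 0.4, "D3": 0.9, "D6": 0.0},
}


def dense_ranks(values):
    """Rank items by value, highest first (0 = best); ties share a rank."""
    distinct = sorted(set(values.values()), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct)}
    return {item: rank_of[value] for item, value in values.items()}


def friendsourcing_order(per_constraint):
    """Order items so that those heading most lists come first.

    Assumes every item has a value for every constraint.
    """
    ranks = [dense_ranks(values) for values in per_constraint.values()]
    items = sorted({item for values in per_constraint.values() for item in values})

    def key(item):
        top_count = sum(1 for r in ranks if r[item] == 0)
        rank_sum = sum(r[item] for r in ranks)
        return (-top_count, rank_sum, item)

    return sorted(items, key=key)


print(friendsourcing_order(VALUES))
```

Here D3 heads both lists (it ties with D1 on tag occurrences), so it would be friendsourced first.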

Solution hints

• Problem 2.
– There exist conditional probabilistic dependencies among constraints
– E.g., [formula not recovered], where P(T2) = 1
– We aim to find i such that the conditional probability [formula not recovered] is maximized

Expert Selection Criteria

• Users are selected based on
– User expertise score
– Communication cost
• User expertise score (Exp_ui)
– User profile/activity attributes
– Question-specific user expertise
– Algorithm-specific user attributes
• Communication cost (Cost_ui)
– Social score
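The slides do not say how the expertise score and the communication cost are combined. As one possible reading, the sketch below picks users greedily by expertise per unit of cost until a budget is spent. The greedy rule, the function name `select_experts`, the budget, and all numbers are hypothetical.

```python
# Hypothetical (expertise, cost) pairs; names are taken from the example,
# the numbers are made up for illustration. Costs must be positive.
CANDIDATES = {
    "Elham": (0.9, 6),
    "Das": (0.8, 2),
    "Bob": (0.6, 1),
    "Chang": (0.5, 4),
}


def select_experts(candidates, budget):
    """Greedily pick users by expertise per unit of cost within the budget."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: kv[1][0] / kv[1][1],  # expertise / cost
        reverse=True,
    )
    chosen, spent = [], 0
    for user, (_expertise, cost) in ranked:
        if spent + cost <= budget:
            chosen.append(user)
            spent += cost
    return chosen


print(select_experts(CANDIDATES, budget=5))
```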

System Architecture

[Figure: system architecture diagram, built up over seven steps]

Screenshots of CANTO (Search Interfaces)

Screenshots of CANTO (Friendsourcing Seeker perspective)

Screenshots of CANTO (Friendsourcing provider perspective)

Summary

• We are working on
– Augmenting social-aware search with the crowd (aka friends)
– Algorithm for generating both ad hoc and best-effort social-aware search results
– Advanced expert selection algorithm
    • Considering both budget and time constraints
– Planning to explore
    • MAB for expert selection
• So far we have used tweets and hashtags for experimentation; we plan to experiment with the Vodkaster dataset (user network, films, comments, micro-critique data).

Thank you for your attention!

References

1. William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst.

2. Xuan S. Yang, David W. Cheung, Luyi Mo, Reynold Cheng, and Ben Kao. 2013. On incentive-based tagging. In Proceedings of the 2013 IEEE International Conference on Data Engineering.
