
Expertise Finding for Question Answering (QA) Services
March 5, 2014
Department of Knowledge Service Engineering
Prof. Jae-Gil Lee

The First Wednesday Multidisciplinary Forum, 3/5/2014

Table of Contents
• Community-based Question Answering (CQA) Services
  • Background and Motivation
  • Methodology Overview
  • Evaluation Results
• Social Search Engines for Location-Based Questions
  • Background and Motivation
  • System Architecture and User Interface



Question Answering (QA) Services

QA services are good at [Budalakoti et al., 2010]:
• Recently updated information
• Personalized information
• Advice & opinion

[Figure: Questions are answered either by searching a knowledge base or by asking experts]


Community-based Question Answering (CQA) Services

• Naver Knowledge-In: 50,000 questions per day
• Yahoo! Answers: 160,000 questions per day


Motivation of Our Study
• Most contributions (i.e., answers) in CQA services are made by a small number of heavy users
• Recently joined users are prone to leave CQA services very soon
  • Only 8.4% of answerers remained after a year

Making the long tail stay longer before they leave is of prime importance to the success of the services


Problem Setting
• To whom does the service provider need to pay special attention?
  • Recently joined (i.e., light) users who are likely to become contributive (i.e., heavy) users
• Goal: estimating the likelihood of a light user becoming a heavy user (mainly from his/her expertise)
• Challenge: lack of information about light users

"Fishpond management" (어장관리, Korean slang for keeping promising prospects on the hook)?


Intuition behind Our Methodology
• A person’s active vocabulary reveals his/her knowledge
• Vocabulary has sharable characteristics, so domain-specific words are used repeatedly by expert answerers

[Figure: Q&A 1 by Answerer 1 and Q&A 2 by Answerer 2 share domain-specific vocabularies (SSD, NAND, ECC, RAM) and also contain common vocabularies (Device, Memory, Computer; Operation, Data, Drive), illustrating the level difference between the two kinds of words and their sharable characteristics]


Estimated Expertise

[Figure: Heavy Users, Words, Light Users]

The more expert a user is, the higher the level of the words he/she uses.
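A minimal sketch of how word levels could link the expertise of heavy users to light users. It assumes heavy users already have expertise scores and uses plain averaging in both directions: a word's level is the mean expertise of the heavy users who used it, and a light user's estimated expertise is the mean level of the words he/she used. The function names and the averaging rule are hypothetical, not the paper's formulas.

```python
from collections import defaultdict
from statistics import mean


def word_levels(heavy_expertise, heavy_words):
    """Assumed scheme: a word's level is the mean expertise of the heavy users who used it.

    heavy_expertise: {user: expertise score}, assumed to be precomputed
    heavy_words: {user: iterable of words appearing in that user's answers}
    """
    scores_by_word = defaultdict(list)
    for user, words in heavy_words.items():
        for word in set(words):  # count each word once per user
            scores_by_word[word].append(heavy_expertise[user])
    return {word: mean(scores) for word, scores in scores_by_word.items()}


def estimate_expertise(words, levels):
    """Estimate a light user's expertise as the mean level of the known words he/she used."""
    known = [levels[w] for w in set(words) if w in levels]
    return mean(known) if known else 0.0  # no known words means no evidence
```

For example, with heavy users `h1` (0.9) and `h2` (0.3), the word `data` used by both gets level 0.6, and a light user who wrote `NAND`, `ECC`, and `data` is estimated at about 0.8. A word used only by expert answerers pulls a light user's estimate up, which is the intuition on the slide.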


Availability
• Simply measures the number of a user’s answers, with each answer's importance proportional to its recency
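One way to read "importance proportional to recency" is a linear weight over an observation window. The sketch below assumes that scheme; the window bounds and the `availability` function are hypothetical. An answer on the last day of the window counts 1.0, and older answers count proportionally less.

```python
from datetime import date


def availability(answer_dates, start, end):
    """Recency-weighted count of a user's answers within [start, end].

    Assumed weighting: an answer posted d days after `start` counts
    (d + 1) / (span + 1), so it rises linearly to 1.0 on `end`.
    Answers outside the window are ignored.
    """
    span = (end - start).days
    total = 0.0
    for d in answer_dates:
        if start <= d <= end:
            total += ((d - start).days + 1) / (span + 1)
    return total
```

With a ten-day window from 2012-01-01 to 2012-01-10, answers on the 1st, 5th, and 10th contribute 0.1, 0.5, and 1.0, for an availability of 1.6.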


Answer Affordance
• Defined as the likelihood of a light user becoming a heavy user if he/she is treated specially
• Considers both expertise and availability

$\mathrm{Affordance}(u_l) = \ldots$
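The exact formula combining the two factors is not given here, so the sketch below assumes a simple product of expertise and availability, each rescaled to [0, 1] by its maximum, and returns the top-k light users. The product form, the rescaling, and the function names are all assumptions for illustration.

```python
import heapq


def scale_by_max(scores):
    """Rescale non-negative scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values())
    return {u: (s / top if top > 0 else 0.0) for u, s in scores.items()}


def top_k_by_affordance(expertise, availability, k):
    """Rank light users by a hypothetical affordance score.

    Assumption: affordance = scaled expertise * scaled availability.
    Both dicts must share the same keys (the light users).
    """
    ex, av = scale_by_max(expertise), scale_by_max(availability)
    affordance = {u: ex[u] * av[u] for u in expertise}
    return heapq.nlargest(k, affordance, key=affordance.get)
```

A product favors users who are strong on both factors: a highly expert user who rarely answers scores low, and so does a frequent answerer with a low-level vocabulary.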


Data Set
• Collected from Naver Knowledge-In (KiN, 지식인)
• Spanning ten years (from Sept. 2002 to Aug. 2012)
• Including two categories: Computers and Travel
  • Computers: factual information; Travel: subjective opinions
• Entropy was used to measure the expertise of a user; it works well especially for categories where factual expertise is primarily sought [Adamic et al., 2008]
• Statistics

                Computers    Travel
  # of answers  3,926,794   585,316
  # of words      191,502   232,076
  # of users      228,369    44,866
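A minimal sketch of the entropy measure mentioned in the Data Set bullets, assuming it is computed over the distribution of a user's answers across categories, as in Adamic et al. (2008); the slide does not say which distribution was used. Lower entropy means a more focused answerer.

```python
import math
from collections import Counter


def answer_entropy(categories):
    """Shannon entropy (in bits) of a user's answers over categories.

    categories: one category label per answer the user posted.
    0.0 means all answers fall in one category (a focused answerer).
    """
    counts = Counter(categories)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A user who answers only in one subcategory gets entropy 0, while one who spreads answers evenly over four subcategories gets 2 bits.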


Evaluation Setting (1/2)
• Finding the top-k light users by Affordance() (our methodology)
• Retrieving the top-k directory experts managed by KiN (competitor)
• Monitoring two measures over the following month
  • User availability: the ratio of the number of top-k users who appeared on the day to the total number of users who appeared on that day
  • Answer possession: the ratio of the number of answers posted by the top-k users on the day to the total number of answers posted on that day
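The two daily ratios defined above can be computed directly from a log of answers. This sketch assumes that a user "appeared" on a day if he/she posted at least one answer that day, and that the log is a list of `(day, user)` pairs; both are assumptions.

```python
from collections import defaultdict


def daily_measures(answers, top_k):
    """Compute (user availability, answer possession) for each day.

    answers: iterable of (day, user) pairs, one pair per posted answer
    top_k: set of the selected top-k users
    """
    users_by_day = defaultdict(set)
    answers_by_day = defaultdict(int)
    top_answers_by_day = defaultdict(int)
    for day, user in answers:
        users_by_day[day].add(user)
        answers_by_day[day] += 1
        if user in top_k:
            top_answers_by_day[day] += 1
    result = {}
    for day, users in users_by_day.items():
        user_availability = len(users & top_k) / len(users)
        answer_possession = top_answers_by_day[day] / answers_by_day[day]
        result[day] = (user_availability, answer_possession)
    return result
```

Averaging these daily values over the monitoring month gives one number per method, which makes the two top-k lists directly comparable.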


Evaluation Setting (2/2)

Ten-year period
• Sept. 2002 to July 2011: used for deriving the word levels
• July 2011 to July 2012: used for finding the top-k experts by our methodology
• July 2012: picked up the top-k directory experts managed by KiN
• July 2012 to Aug. 2012: monitored the user availability and answer possession


[Figure: The result of the answer possession, (a) Computers, (b) Travel]

[Figure: The result of the user availability, (a) Computers, (b) Travel]

See the paper for the technical details.

Sung, J., Lee, J., and Lee, U., "Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services," In Proc. 7th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Cambridge, Massachusetts, July 2013.

This paper received the Best Paper Award at AAAI ICWSM-13.





Social Search (1/2)
• A new paradigm of knowledge acquisition that relies on the people in a questioner’s social network


Social Search (2/2)

If you want to get opinions or advice from your online friends, what do you do?
• Not knowing whom to ask
• Knowing whom to ask

Social search takes advantage of both approaches


Location-Based Questions
• Informally defined as “search for a business or place of interest that is tied to a specific geographical location” [Amin et al., 2009]
• Very popular, especially in mobile search, and typically subjective
  • Mobile search is estimated to comprise 10% to 30% of all searches
  • About 9% to 10% of the queries from Yahoo! mobile search and over 15% of 1 million Google queries from PDA devices were identified as location-based questions
• In a set of location-based questions, 63% were non-factual and the remaining 37% were factual

Mobile social search is the best way to process location-based questions


Glaucus: A Social Search Engine for Location-Based Questions

1. Asking a question to Glaucus
2. Selecting proper experts
3. Routing the question to the experts
4. Returning an answer to the questioner
5. (Optional) Rating the answer

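[Figure: Glaucus architecture. The questioner sends a query (1) to the Glaucus social search engine, which selects experts (2) from a user database built by crawling; the query is routed to those users (3), an answer is returned to the questioner (4), and the questioner gives feedback (5)]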
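The five steps can be sketched as a small request flow. Everything here is hypothetical: `select_experts` and `ask_user` stand in for Glaucus's expert selection and its delivery of questions to the app, and experts are asked one at a time in ranked order for simplicity, whereas a real deployment might notify several experts at once.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SocialSearchEngine:
    """Hypothetical sketch of the five-step question flow (not Glaucus's actual code)."""

    def __init__(self,
                 select_experts: Callable[[str, str], List[str]],
                 ask_user: Callable[[str, str], Optional[str]]):
        self.select_experts = select_experts  # (questioner, question) -> ranked expert ids
        self.ask_user = ask_user              # (expert, question) -> answer, or None if no reply
        self.ratings: Dict[str, List[int]] = {}

    def ask(self, questioner: str, question: str) -> Tuple[Optional[str], Optional[str]]:
        # Step 1 is this call; step 2 selects the experts.
        experts = self.select_experts(questioner, question)
        # Step 3: route the question to the experts in ranked order (an assumption).
        for expert in experts:
            answer = self.ask_user(expert, question)
            if answer is not None:
                return expert, answer  # Step 4: return the first answer received.
        return None, None

    def rate(self, expert: str, score: int) -> None:
        # Step 5 (optional): store the questioner's rating of the answer.
        self.ratings.setdefault(expert, []).append(score)
```

Stored ratings could later feed back into expert selection, which is one plausible use of the optional feedback step.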


User Interface
• An Android app has been developed and is under (closed) beta testing

[Screenshots: Questioner and Answerer]


Data Collection
• Who visited where and when can be collected from geosocial networking services such as Foursquare
  • Users check in to a venue and may also leave a tip
• Our crawler collects such information upon user approval


Expert Finding

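[Figure: The questioner's question and the other users are each described by the Location Aspect Model (Venue, Location, Category, Time, Misc.); a similarity calculation, together with whether a user is an online friend of the questioner, produces a score for each user, and the top-k users are selected]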
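A minimal sketch of the scoring step, under heavy simplifications: each user's check-ins are summarized as counts per aspect value, similarity is the mean share of a user's check-ins that match the aspects named in the question, and online friends receive a fixed bonus. The aspect names, the similarity measure, and `friend_bonus` are hypothetical stand-ins for the Location Aspect Model, whose details are in the paper.

```python
import heapq
from collections import Counter

# Hypothetical aspect set; the figure's "Misc." aspect is left out of this sketch.
ASPECTS = ("venue", "location", "category", "time")


def build_profile(checkins):
    """Count a user's check-ins per aspect value: {aspect: Counter}."""
    profile = {a: Counter() for a in ASPECTS}
    for checkin in checkins:
        for a in ASPECTS:
            if a in checkin:
                profile[a][checkin[a]] += 1
    return profile


def similarity(question, profile):
    """Mean, over the aspects the question mentions, of the share of matching check-ins."""
    shares = []
    for a in ASPECTS:
        if a in question:
            total = sum(profile[a].values())
            shares.append(profile[a][question[a]] / total if total else 0.0)
    return sum(shares) / len(shares) if shares else 0.0


def top_k_experts(question, profiles, friends, k, friend_bonus=0.1):
    """Score every other user and keep the top k; online friends get a hypothetical bonus."""
    scores = {user: similarity(question, p) + (friend_bonus if user in friends else 0.0)
              for user, p in profiles.items()}
    return heapq.nlargest(k, scores, key=scores.get)
```

A question such as "a good cafe in Sinsa" would map to `{"location": "Sinsa", "category": "Cafe"}`; a user whose check-ins are all at cafes in that area scores 1.0 before any friend bonus.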


Evaluation Setting
• Collected check-ins and tips from Foursquare (foursquare.com)
• Confined to places in the Gangnam District
• Ranging from April 2012 to December 2012
• Statistics

  Variable                Value
  # of users              9,163
  # of places (venues)    1,220
  # of check-ins        243,114
  # of tips              40,248


Evaluation Results

Qualification of the experts (y-axis from 0 to 10)

                      Set 1   Set 2   Set 3
  SocialTelescope      3.94    3.99    4.07
  Aardvark             6.61    6.31    6.68
  Glaucus              8.25    8.82    7.78

Answer rating (y-axis from 1 to 3)

  Experts        2.37
  Non-Experts    1.97

Qualification of the Experts: Two human judges examined the profiles of the experts selected by the three systems for 30 questions (distributed across 3 sets) and scored them on a 3-point scale.

Quality of the Answers: Two human judges examined the quality of the answers, from both experts and non-experts, and scored them on a 3-point scale.

See the paper for the technical details.

Choy, M., Lee, J., Gweon, G., and Kim, D., "Glaucus: Exploiting the Wisdom of Crowds for Location-Based Queries in Mobile Environments," In Proc. 8th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), accepted.

To appear in June 2014.

Thank you very much!
Any questions?
E-mail: [email protected]
Homepage: http://dm.kaist.ac.kr/
