Creating Models of Real-World Communities with ReferralWeb

Henry Kautz, University of Washington

Bart Selman, Cornell University


Recommender Systems

New category of software: programs that make personalized recommendations of goods, services, and people

• Amazon.com - books

• Jango.com - stores

• Whowhere.com - friends

Current methods

• Content-based: find things similar to ones you like

• Collaborative filtering: find things liked by people who are similar to you

Explosive growth

• Viewed as crucial for e-commerce sites

• Excite: 100,000,000+ recommendations per day!

Anonymous Opinions

Most recommender systems hide the identity of the sources of the recommendations

• E-communities: fictitious identities

• Matchmaker systems: deliberately hide true identities

• Collaborative filtering: aggregation - no one to trust (or blame!)

Result: anonymous opinions

• Okay for choosing a movie or CD

• But would you bet your job on that “recommendation”?!

– Gee, boss, the project failed, but somebody on the net, I don't know who, said it was a good approach...

Trusted Recommendations

For serious life / business decisions, you want the opinion of a trusted expert

• If an expert is not personally known, you want to find a reference to one via a chain of friends and colleagues

Referral-chain provides:

• Way to judge quality of expert's advice

• Reason for the expert to respond in a trustworthy manner

Finding good referral-chains is slow, time-consuming, but vital

• business gurus on “networking”

Example Tasks

• You are an associate editor for JAIR. Find a reviewer for a paper that claims new results on “expander graphs”.

• You are considering transferring to a different division of your company. Is that division head a good guy to work for?

• You are putting together a project team to launch a new internet service. Who in your company should you tap for expertise on image compression?

ReferralWeb

Set of all possible referral-chains = a social network

System for modeling, visualizing, and searching social networks

• in a company

• in an e-community

• in the WWW as a whole

Integrates IR search with a model of personal connections

Social Networks

Social network model specifies:

• Who knows who

• Who knows what

How to create?

• Ask users to register with system and provide lists of contacts and interests

– sixdegrees, 6DOS, Firefly, Whowhere?

• High startup cost

• Incomplete, out of date, untrustworthy information

• Best experts will actively avoid

– a network of the lonely and disenfranchised?

Mining Social Networks

Alternative: automatically generate network models from pre-existing data

• Email logs (not)

• Bibliographic databases

• Corporate records of organizational structure, project teams, in-house documents

• Arbitrary web pages

– personal web pages more accurate / up to date than official corporate records!

Can extract evidence for both relationships and expertise
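As a minimal sketch of how relationship evidence might be mined from pre-existing documents, assume the documents are plain strings and the set of names is already known. The function name `cooccurrence_links`, the Jaccard-style score, and the sample documents are illustrative assumptions, not ReferralWeb's actual pipeline.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_links(documents, names):
    """Score each pair of names by Jaccard overlap of the documents mentioning them."""
    docs_with = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for name in names:
            if name in text:
                docs_with[name].add(doc_id)
    links = {}
    for a, b in combinations(sorted(names), 2):
        both = docs_with[a] & docs_with[b]
        either = docs_with[a] | docs_with[b]
        if both:  # keep only pairs that appear together at least once
            links[(a, b)] = len(both) / len(either)
    return links


# Hypothetical documents, for illustration only.
documents = [
    "Henry Kautz and Bart Selman, planning as satisfiability",
    "Bart Selman and Henry Kautz, local search",
    "Michael Kearns, learning theory",
]
links = cooccurrence_links(documents, ["Henry Kautz", "Bart Selman", "Michael Kearns"])
print(links)
```

Pairs that never appear in the same document get no link at all, so the resulting network stays sparse.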

Discovering Names

Proper name extraction

• Can accurately identify names in arbitrary documents

• Frequency of co-occurrence of names can be quickly determined using IR search engines

Canonizing names

• John Zack, J. C. Zack, Jim Zack

• Match names / initials / nicknames as long as unambiguous

– closed world assumption

• Improvement: use context

– “Henry A. Kautz” matches “Harry A. Kautz” if both strongly linked to “Bart Selman”
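The matching rule above can be sketched as follows, under the closed world assumption that every person of interest is already in a known list. Reducing a name to first initial plus last name is a deliberately crude stand-in for real initial and nickname matching. The helper names `name_key` and `canonicalize` and the people "Bill Selman" and the list itself are hypothetical, and the context-based improvement is not implemented here.

```python
def name_key(name):
    """Reduce a name to (first initial, last name), ignoring case and periods."""
    parts = name.replace(".", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())


def canonicalize(mention, known_people):
    """Return the unique known person compatible with `mention`, else None."""
    key = name_key(mention)
    candidates = [p for p in known_people if name_key(p) == key]
    # Closed world assumption: accept a match only when it is unambiguous.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical list of known people, for illustration only.
known = ["John Zack", "Henry Kautz", "Bart Selman", "Bill Selman"]
print(canonicalize("J. C. Zack", known))
print(canonicalize("Jim Zack", known))
print(canonicalize("B. Selman", known))
```

An ambiguous mention such as "B. Selman" is left unresolved rather than guessed, which is where link context could help.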

Disambiguating Names

Problem: different individuals with the same name

Observation: Within even large organizations the vast majority (90%+) of full names are unique

• 3,000 employees in R&D at AT&T

• 10,000 research scientists in AI, NL, and theory

For medium-size networks: ambiguous names can be treated as noise

• Key interface issue: ability to explain each link in a path to users

Further scaling: name + additional context

User Profiles

Manually-entered profiles incomplete, impossible to maintain

• impossible in principle to create complete a-priori list of kinds of expertise

Many services today create highly specialized profiles

• your book buying habits

Simple, robust profile: “bag of words” of all documents in which your name appears

• standard IR vector space model to match queries, people
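A minimal sketch of the bag-of-words profile and vector space matching, assuming raw term counts and cosine similarity. The slides do not say which weighting ReferralWeb uses, so the absence of TF-IDF weighting, the function names, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Term-count vector over all documents in which a person's name appears."""
    return Counter(word for text in texts for word in text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rank_people(query, docs_by_person):
    """Order people by similarity between the query and their profile."""
    q = bag_of_words([query])
    scores = {p: cosine(q, bag_of_words(d)) for p, d in docs_by_person.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical documents, for illustration only.
docs_by_person = {
    "Frieze": ["random graphs and expander graphs", "expander graphs in algorithms"],
    "Kearns": ["machine learning theory"],
}
ranking = rank_people("expander graphs", docs_by_person)
print(ranking)
```

Because the profile is built from whatever documents mention the person, no fixed list of expertise categories is needed.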

Test Networks

1. Proof of concept: 1,000 node network

• Created by combination of web crawling and AltaVista queries, centered on a professor at M.I.T.

• Test group of users could usually find experts on given topics

– but small size of network led to distant referrals

2. 10,000 Researchers in AI, Theory, and NL

• Based on 30,000 bibliography entries from high-quality conferences

– AAAI, STOC, FOCS, ACL...

• links between co-authors (not citations)

• http://www.research.att.com/kautz/referralweb

• “paper-reviewer finder”
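Building the co-author network from bibliography entries can be sketched as below, assuming each entry has already been reduced to a list of canonical author names. The function name `coauthor_graph` and the sample entries are hypothetical; edge weights count joint papers, which is one plausible measure of link strength rather than a documented one.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(entries):
    """Map each author to a dict {co-author: number of joint papers}."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in entries:
        # Link every pair of co-authors on the same paper (citations are ignored).
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


# Hypothetical author lists, for illustration only.
entries = [
    ["Kautz", "Selman"],
    ["Kautz", "Selman", "Shah"],
    ["Kearns", "Blum"],
]
graph = coauthor_graph(entries)
print(dict(graph["Kautz"]))
```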

Exploring the Network

Who can I ask to review a paper on “expander graphs”?

Experts on Expander Graphs

Paths to Experts

Request Details on Frieze

Frieze Home Page

Observations

Quickly found short chains to experts

• Could not be found using IR search alone

User can select chain that is most likely to succeed

• Do not want to bother busiest, most famous experts with every request

Chains cross disciplines

• Kautz - AI

• Kearns - AI, Machine Learning

• Blum - Machine Learning, Theory

• Frieze - Theory, Mathematics

Useful tool for strengthening ties both within and between communities
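Finding a short referral chain can be sketched as a breadth-first search over the social network. Whether ReferralWeb uses plain BFS is an assumption; the function name `referral_chain` is hypothetical, and the toy graph contains only the chain listed above.

```python
from collections import deque


def referral_chain(graph, start, expert):
    """Breadth-first search: return a shortest chain from start to expert, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == expert:
            chain = []
            while person is not None:  # walk back to the start
                chain.append(person)
                person = previous[person]
            return chain[::-1]
        for contact in graph.get(person, ()):
            if contact not in previous:
                previous[contact] = person
                queue.append(contact)
    return None


# Toy network containing only the chain from the slide.
graph = {
    "Kautz": ["Kearns"],
    "Kearns": ["Kautz", "Blum"],
    "Blum": ["Kearns", "Frieze"],
    "Frieze": ["Blum"],
}
chain = referral_chain(graph, "Kautz", "Frieze")
print(" -> ".join(chain))
```

In a real network there are usually several chains of similar length, which is what lets the user pick the one most likely to succeed.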

Why Does it Work?

The Small World Phenomenon

Milgram (1967): any two individuals in the U.S.A. are linked by a chain of 6 or fewer first-name acquaintances

– “6 degrees of separation”

– Erdős numbers

– “6 degrees of Kevin Bacon”

But

• No formal model to explain short paths!

• Due to high average degree?

• True for acquaintances or co-stars, but false for our computer science co-author database!

– average degree: 100’s (acquaintances) versus 61 (co-stars) versus 4.28 (co-authors)!

Small-world Networks

Due to randomness?

• Random graphs have short average path lengths

• But social networks are not random

– nodes are highly clustered (many cliques)

– random graph model predicts that high clustering corresponds to long average paths!

Better model: Small-world networks

• Idea: a highly structured (clustered) network with just a few random links (Watts & Strogatz, 1998)

• Result: high clustering + short paths!

• Random edges correspond to shortcuts

– direct relationships between people who primarily participate in different sub-communities
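The effect of a few shortcuts can be sketched with a ring lattice as the structured network. This sketch adds random edges instead of rewiring existing ones as in the Watts & Strogatz construction; that simplification, the sizes, and the seed are all assumptions chosen for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbors on each side of a ring."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}


def add_shortcuts(graph, count, rng):
    """Add `count` random edges between nodes that are not yet linked."""
    nodes = list(graph)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in graph[a]:
            graph[a].add(b)
            graph[b].add(a)
            added += 1


def average_path_length(graph):
    """Mean BFS distance over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)
lattice = ring_lattice(200, 2)
before = average_path_length(lattice)
add_shortcuts(lattice, 10, rng)
after = average_path_length(lattice)
print(f"average path length: {before:.2f} before shortcuts, {after:.2f} after")
```

Only 10 edges are added to the 400 already in the lattice, so the local clustering is almost unchanged while the paths between distant nodes can only get shorter.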

Small-world vs. Random Networks

Clustering Coefficient = Average value of C(n) over all nodes, where

\[ C(n) = \frac{\text{number of edges between neighbors of } n}{(\text{number of neighbors of } n)^2} \]

Network         Size      Avg Degree   Avg Path Length   Clustering Coeff.
CS Co-authors   8,070     4.28         7.9               0.72
Random 1        8,070     4.28         6.4               0.072
Film Co-stars   225,000   61           3.65              0.79
Random 2        225,000   61           2.99              0.00027
Neurons         282       14           2.66              0.28
Random 3        282       14           2.25              0.05
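A clustering coefficient of this kind can be computed as in the sketch below. It normalizes by the k(k-1)/2 possible edges among a node's k neighbors, the convention of Watts & Strogatz (1998), so that a clique scores exactly 1. That normalization, the value 0 for nodes with fewer than two neighbors, and the toy graph are assumptions; the slides do not state how the table's values were computed.

```python
from itertools import combinations


def clustering_coefficient(graph):
    """Average over nodes of (edges among neighbors) / (possible neighbor pairs)."""
    values = []
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            values.append(0.0)  # assumed convention for nodes with < 2 neighbors
            continue
        linked = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        values.append(linked / (k * (k - 1) / 2))
    return sum(values) / len(values)


# A triangle (a, b, c) with one pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficient = clustering_coefficient(graph)
print(round(coefficient, 3))
```

Here a and b score 1, c scores 1/3, and d scores 0, giving an average of 7/12.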

Corporate Communities

Finding good internal experts a strategic business problem

• “intellectual assets” worthless if not consulted!

AT&T: 170,000 employees, 3,000 in the R&D community

• How to build a project team?

• What R&D people to consult for a new business venture?

• What business people to contact about a new technological breakthrough?

In practice: successful projects based on grassroots cross-organizational networking

Modeling the AT&T Corporate Network

Model integrates information from

• Official organizational charts (online)

• Personal web pages (+ crawling)

• External publication databases

• Internal technical document databases

Informal structure will prove vital for

• finding shorter paths to experts

• finding people who can reliably evaluate experts

• synergy between official and unofficial channels

Who can tell me about the Director of Speech Processing research at AT&T?

Paths With All Link Types

Filtering link types

Paths With Only Organizational Links

Paths With Only Web/Article Links

Observations

Official company hierarchy only a sparse subset of the corporate social network

Shortest (and often best) paths involve a combination of official and unofficial links

• Conditions for trust and evaluation may greatly differ

• Global social network is the union of many different kinds of sub-networks

Search greatly aided when user can choose different views of the network

+ types of edge

+ strength of edge
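Choosing a view of the network can be sketched as filtering edges by type and strength before searching. The edge tuple layout, the type labels "org" and "web", the strength values, and the function name `shortest_path` are all hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal, allowed_types=None, min_strength=0.0):
    """BFS over edges (a, b, link_type, strength), keeping only the selected view."""
    neighbors = {}
    for a, b, link_type, strength in edges:
        if allowed_types is not None and link_type not in allowed_types:
            continue
        if strength < min_strength:
            continue
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical corporate network: three organizational links and one web link.
edges = [
    ("employee", "manager", "org", 1.0),
    ("manager", "vp", "org", 1.0),
    ("vp", "director", "org", 1.0),
    ("employee", "director", "web", 0.4),
]
print(shortest_path(edges, "employee", "director"))
print(shortest_path(edges, "employee", "director", allowed_types={"org"}))
```

With all link types the unofficial web link gives a direct path; restricted to organizational links, the path has to climb the hierarchy.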

Who can help out my project with some great image compression software?

A Note on Believability

Observation: the recommendations made by (any) recommender system tend to be either astonishingly accurate, or absolutely ridiculous

• true for any AI-complete problem

How can a recommender system be trusted enough for “serious” use?

• Make system transparent: able to explain its reasoning

• indicate to user where the data is ambiguous

• Any link or node can be explained by viewing the data on which it is based

Checking the Expert’s Expertise

Checking the Reason for an Edge

Verifying the Edge Context

Summary

Many uses of recommender systems require connecting people to people, not just providing “oracular” advice

• Find people, not just documents - access to information that may not even be online!

• Help users evaluate quality of information

Need to automatically model existing, real-world communities

• Cannot require everyone to sign up in advance!

• Can improve and strengthen the “weak ties” that are crucial for effective organizations

ReferralWeb: a tool for generating and searching social networks

Status and Future Work

ReferralWeb

• Version 2.0 for the Computer Science research community

http://www.research.att.com/~kautz/referralweb

• Corporate version undergoing trials in AT&T Labs

Current research topics

• Automatic clustering - discovery of sub-communities

• Combining uncertain information

• Scale-up to WWW-size communities

• Analysis of more accurate formal models of small-world networks

– accurately predict search performance

Bibliography

• Kautz, H., Selman, B. & Shah, M. 1997. The Hidden Web. AI Magazine 18(2): 27-36.

• Milgram, S. 1967. The Small-World Problem. Psychology Today 1(1): 60-76.

• Resnick, P., ed. 1996. Special Section on Recommender Systems. Communications of the ACM 30(3).

• Watts, D. & Strogatz, S. 1998. Collective dynamics of ‘small-world’ networks. Nature 393: 440-442.