1 research on intelligent text information management chengxiang zhai department of computer science...

67
1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute for Genomic Biology Statistics University of Illinois, Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai, [email protected] s joint work with Xuehua Shen, Bin Tan, Qiaozhu Mei, Yue Lu, Hongning, Vinod and other members of the TIMan group

Upload: pauline-watson

Post on 05-Jan-2016

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

1

Research on Intelligent Text

Information Management ChengXiang Zhai

Department of Computer Science

Graduate School of Library & Information Science

Institute for Genomic Biology

Statistics

University of Illinois, Urbana-Champaign

http://www.cs.uiuc.edu/homes/czhai, [email protected]

Contains joint work with Xuehua Shen, Bin Tan, Qiaozhu Mei, Yue Lu, Hongning, Vinod, and other members of the TIMan group

Page 2: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

2

Search

Text

Filtering

Categorization

Summarization

Clustering

Natural Language Content Analysis

Extraction

Mining

VisualizationSearchApplications

MiningApplications

InformationAccess

KnowledgeAcquisition

InformationOrganization

Research Roadmap

- Personalized- Retrieval models- Topic map - Recommender

Current focus

-Contextual text mining-Opinion integration-Information quality

Current focus

Web, Email, and Bioinformatics

Entity/Relation Extraction

Page 3: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample Projects

• Optimization of Retrieval Models

• User-Centered Adaptive Information Retrieval

• Multi-Resolution Topic Map for Browsing

• Contextual Text Mining

• Opinion Integration and summarization

• Information Trustworthiness

3

Page 4: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Project 1:Optimization of Retrieval Models

• Content-based matching is a critical component in any search system

• Developed a number of retrieval models to optimize content matching

– Language models: various LMs supporting proximity, word translations, feedback, …

– Axiomatic framework: theoretical analysis of retrieval models

– Recently looking into optimal interactive retrieval and domain-specific retrieval models (feedback, exploitation-exploration tradeoff, medical case retrieval, forum retrieval, … )

4

Page 5: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

5

Project 2:User-Centered Adaptive IR (UCAIR)• A novel retrieval strategy emphasizing

– user modeling (“user-centered”)

– search context modeling (“adaptive”)

– interactive retrieval

• Implemented as a personalized search agent that

– sits on the client-side (owned by the user)

– integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)

– collaborates with each other

– goes beyond search toward task support

Page 6: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

6

Non-Optimality of Document-Centered Search Engines

As of Oct. 17, 2005

Query = Jaguar

Car

Car

Car

Car

Software

Animal

Mixed results, unlikely optimal for any particular user

Page 7: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

7

The UCAIR Project (NSF CAREER)

Search Engine

“jaguar”

Personalized search agent

WEB

Search Engine

Email

Search Engine

DesktopFiles

Personalized search agent

“jaguar”

...Viewed Web pages

QueryHistory

Page 8: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

8

Potential Benefit of Personalization

Car

Car

Car

Car

Software

Animal

Suppose we know:

1. Previous query = “racing cars” vs. “Apple OS”2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days

3. User just viewed an “Apple OS” document

Page 9: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

9

Intelligent Re-ranking of Unseen Results

When a user clicks on the “back” button after viewing a document, UCAIR reranks unseen results to

pull up documents similar to the one the user has viewed

Page 10: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

10

UCAIR Outperforms Google [Shen et al. 05]

Ranking Method

prec@5 prec@10 prec@20 prec@30

Google 0.538 0.472 0.377 0.308

UCAIR 0.581 0.556 0.453 0.375

Improvement 8.0% 17.8% 20.2% 21.8%

Precision at N documents

UCAIR toolbar available at http://sifaka.cs.uiuc.edu/ir/ucair/

Page 12: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Ongoing Work

• UCAIR system

• Recommendation and advertising on social networks

12

Page 13: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Project 3: Multi-Resolution Topic Map for Browsing

• Promoting browsing as a “first-class citizen”

• Multi-resolution topic map for browsing

– Enable a user to find information through navigation

– Very useful when a user can’t formulate effective queries or uses a small screen device

• Search log as information footprints

– Organize search log into a topic map

– Allow a user to follow information footprints of previous users

– Enable social surfing 13

Page 14: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Querying vs. Browsing

14

Page 15: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Information Seeking as Sightseeing

• Know the address of an attraction site?

– Yes: take a taxi and go directly to the site

– No: walk around or take a taxi to a nearby place then walk around

• Know what exactly you want to find?

– Yes: use the right keywords as a query and find the information directly

– No: browse the information space or start with a rough query and then browse

15

When query fails, browsing comes to rescue…

Page 16: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

16

Current Support for Browsing is Limited

• Hyperlinks

– Only page-to-page

– Mostly manually constructed

– Browsing step is very small

• Web directories

– Manually constructed

– Fixed categories

– Only support vertical navigation

ODP

Beyond hyperlinks?

Beyond fixed categories?

How to promote browsing as a “first-class citizen”?

Page 17: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sightseeing Analogy Continues…

17

Page 18: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

18

Topic Map for Touring Information Space

auto

car

insurance

carsrental loan

car::used

car::blue+bookcar::rental

car::pictures

car::parts

enterprise+car+rental alamo+car+rentalnational+car+rental

exotic+car+rentaladvantage+car+rental

rental::boat

Level 3

Level 2

Level 1

0.05 0.0

3

0.03

0.02 0.0

1

Zoom in

Zoom outHorizontal

navigation

Topic regio

ns

Multiple resolutions

Page 19: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

19

Topic-Map based Browsing

Multi-ResolutionTopic Map

Topic Region

Querying

Current Position

Parents

Horizontal Neighbors

Demo

Page 20: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

How can we construct such a multi-resolution topic map?

Multiple possibilities…

20

Page 21: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

21

Search Logs as Information Footprints

User 2722 searched for "national car rental" [!] at 2006-03-09 11:24:29

User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.valoans.com)

User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://benefits.military.com)

User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.avis.com)

User 2722 searched for "enterprise rent a car" [!] at 2006-04-05 23:37:42 (found http://www.enterprise.com)

User 2722 searched for "meineke car care center" [!] at 2006-05-02 09:12:49 (found http://www.meineke.com)

User 2722 searched for "car rental" [!] at 2006-05-25 15:54:36

User 2722 searched for "autosave car rental" [!] at 2006-05-25 23:26:54 (found http://eautosave.com)

User 2722 searched for "budget car rental" [!] at 2006-05-25 23:29:53

User 2722 searched for "alamo car rental" [!] at 2006-05-25 23:56:13

……

Footprints in information space

Page 22: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

22

Information Footprints Topic Map

• Challenges

– How to define/construct a topic region

– How to control granularities/resolutions of topic regions

– How to connect topic regions to support effective browsing

• Two approaches

– Multi-granularity clustering

– Query editing

Page 23: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Collaborative Surfing

23

Clickthroughs become new footprints

Navigation trace

enriches map structures

New queries become new footprints

Browse logs offer more opportunities

to understand user interests and intents

Page 24: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

24

Project 4:Contextual Text Mining

• Documents are often associated with context (meta-data)

– Direct context: time, location, source, authors,…

– Indirect context: events, policies, …

• Many applications require “contextual text analysis”:

– Discovering topics from text in a context-sensitive way

– Analyzing variations of topics over different contexts

– Revealing interesting patterns (e.g., topic evolution, topic variations, topic communities)

Page 25: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

25

Example 1:Comparing News Articles

Common Themes “Vietnam” specific “Afghan” specific “Iraq” specific

United nations … … …Death of people … … …… … … …

Vietnam War Afghan War Iraq War

CNN Fox Blog

Before 9/11During Iraq war Current

US blog European blog Others

What’s in common? What’s unique?

Page 26: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

26

More Contextual Analysis Questions

• What positive/negative aspects did people say about X (e.g., a person, an event)? Trends?

• How does an opinion/topic evolve over time?

• What are emerging topics? What topics are fading away?

• How can we characterize a social network?

Page 27: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

27

Research Questions

• Can we model all these problems generally?

• Can we solve these problems with a unified approach?

• How can we bring human into the loop?

Page 28: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

28

Documentcontext:

Time = July 2005Location = Texas

Author = xxxOccup. = Sociologist

Age Group = 45+…

Contextual Probabilistic Latent Semantics Analysis ([KDD 2006]…)

View1 View2 View3Themes

government

donation

New Orleans

government 0.3 response 0.2..

donate 0.1relief 0.05help 0.02 ..

city 0.2new 0.1orleans 0.05 ..

Texas July 2005

sociologist

Theme coverages:

Texas July 2005 document

……

Choose a view

Choose a Coverage

government

donate

new

Draw a word from i

response

aid help

Orleans

Criticism of government response to the hurricane primarily consisted of criticism of its response to … The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production … Over seventy countries pledged monetary donations or other assistance. …

Choose a theme

Page 29: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

29

Comparing News Articles Iraq War (30 articles) vs. Afghan War (26 articles)

Cluster 1 Cluster 2 Cluster 3

Common

Theme

united 0.042nations 0.04…

killed 0.035month 0.032deaths 0.023…

Iraq

Theme

n 0.03Weapons 0.024Inspections 0.023…

troops 0.016hoon 0.015sanches 0.012…

Afghan

Theme

Northern 0.04alliance 0.04kabul 0.03taleban 0.025aid 0.02…

taleban 0.026rumsfeld 0.02hotel 0.012front 0.011…

The common theme indicates that “United Nations” is involved in both wars

Collection-specific themes indicate different roles of “United Nations” in the two wars

Page 30: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

30

Spatiotemporal Patterns in Blog Articles

• Query= “Hurricane Katrina”

• Topics in the results:

• Spatiotemporal patterns

Government Response New Orleans Oil Price Praying and Blessing Aid and Donation Personal bush 0.071 city 0.063 price 0.077 god 0.141 donate 0.120 i 0.405president 0.061 orleans 0.054 oil 0.064 pray 0.047 relief 0.076 my 0.116federal 0.051 new 0.034 gas 0.045 prayer 0.041 red 0.070 me 0.060government 0.047 louisiana 0.023 increase 0.020 love 0.030 cross 0.065 am 0.029fema 0.047 flood 0.022 product 0.020 life 0.025 help 0.050 think 0.015administrate 0.023 evacuate 0.021 fuel 0.018 bless 0.025 victim 0.036 feel 0.012response 0.020 storm 0.017 company 0.018 lord 0.017 organize 0.022 know 0.011brown 0.019 resident 0.016 energy 0.017 jesus 0.016 effort 0.020 something 0.007blame 0.017 center 0.016 market 0.016 will 0.013 fund 0.019 guess 0.007governor 0.014 rescue 0.012 gasoline 0.012 faith 0.012 volunteer 0.019 myself 0.006

Page 31: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

31

Theme Life Cycles (“Hurricane Katrina”)

city 0.0634orleans 0.0541new 0.0342louisiana 0.0235flood 0.0227evacuate 0.0211storm 0.0177…

price 0.0772oil 0.0643gas 0.0454 increase 0.0210product 0.0203fuel 0.0188company 0.0182…

Oil Price

New Orleans

Page 32: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

32

Theme Snapshots (“Hurricane Katrina”)

Week4: The theme is again strong along the east coast and the Gulf of Mexico

Week3: The theme distributes more uniformly over the states

Week2: The discussion moves towards the north and west

Week5: The theme fades out in most states

Week1: The theme is the strongest along the Gulf of Mexico

Page 33: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

33

Theme Life Cycles (KDD Papers)

0

0. 002

0. 004

0. 006

0. 008

0. 01

0. 012

0. 014

0. 016

0. 018

0. 02

1999 2000 2001 2002 2003 2004Time (year)

Nor

mal

ized

Str

engt

h of

The

me

Biology Data

Web Information

Time Series

Classification

Association Rule

Clustering

Bussiness

gene 0.0173expressions 0.0096probability 0.0081microarray 0.0038…

marketing 0.0087customer 0.0086model 0.0079business 0.0048…

rules 0.0142association 0.0064support 0.0053…

Page 34: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

34

Theme Evolution Graph: KDDT

SVM 0.007criteria 0.007classifica – tion 0.006linear 0.005…

decision 0.006tree 0.006classifier 0.005class 0.005Bayes 0.005…

Classifica - tion 0.015text 0.013unlabeled 0.012document 0.008labeled 0.008learning 0.007…

Informa - tion 0.012web 0.010social 0.008retrieval 0.007distance 0.005networks 0.004…

……

1999

web 0.009classifica –tion 0.007features0.006topic 0.005…

mixture 0.005random 0.006cluster 0.006clustering 0.005variables 0.005… topic 0.010

mixture 0.008LDA 0.006 semantic 0.005…

2000 2001 2002 2003 2004

Page 35: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

35

Multi-Faceted Sentiment Summary (query=“Da Vinci Code”)

Neutral Positive Negative

Facet 1:Movie

... Ron Howards selection of Tom Hanks to play Robert Langdon.

Tom Hanks stars in the movie,who can be mad at that?

But the movie might get delayed, and even killed off if he loses.

Directed by: Ron Howard Writing credits: Akiva Goldsman ...

Tom Hanks, who is my favorite movie star act the leading role.

protesting ... will lose your faith by ... watching the movie.

After watching the movie I went online and some research on ...

Anybody is interested in it?

... so sick of people making such a big deal about a FICTION book and movie.

Facet 2:Book

I remembered when i first read the book, I finished the book in two days.

Awesome book. ... so sick of people making such a big deal about a FICTION book and movie.

I’m reading “Da Vinci Code” now.

So still a good book to past time.

This controversy book cause lots conflict in west society.

Page 36: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

36

Separate Theme Sentiment Dynamics

“book” “religious beliefs”

Page 37: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

37

Event Impact Analysis: IR Research

vector 0.0514concept 0.0298extend 0.0297 model 0.0291space 0.0236boolean 0.0151function 0.0123feedback 0.0077…

xml 0.0678email 0.0197 model 0.0191collect 0.0187judgment 0.0102rank 0.0097subtopic 0.0079…

probabilist 0.0778model 0.0432logic 0.0404 ir 0.0338boolean 0.0281algebra 0.0200estimate 0.0119weight 0.0111…

model 0.1687language 0.0753estimate 0.0520 parameter 0.0281distribution 0.0268probable 0.0205smooth 0.0198markov 0.0137likelihood 0.0059…

1998

Publication of the paper “A language modeling approach to information retrieval”

Starting of the TREC conferences

year1992

term 0.1599relevance 0.0752weight 0.0660 feedback 0.0372independence 0.0311model 0.0310frequent 0.0233probabilistic 0.0188document 0.0173…

Theme: retrieval models

SIGIR papersSIGIR papers

Page 38: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

38

Topic Modeling + Social Networks

Authors writing about the same topic form a community

Topic Model Only Topic Model + Social Network

Separation of 3 research communities: IR, ML, Web

Page 39: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

39

Next Step in Contextual Text Mining

• Combining contextual text analysis with visualization

• More detailed semantic modeling (entities, relations,…)

• Integration of search and contextual text analysis to develop an analyst’s workbench:

– Interactive semantic navigation and probing

– Synthesis of information/knowledge

– Personalized/customized service

Page 40: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

40

Project 5:Opinion Integration and Summarization• Increasing popularity of Web 2.0 applications

– more people express opinions on the Web

190,451 posts

4,773,658 results

How to digest all?

Page 41: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

41

4,773,658 results

Motivation:Two kinds of opinions

Expert opinions•CNET editor’s review•Wikipedia article•Well-structured•Easy to access•Maybe biased•Outdated soon

190,451 posts

Ordinary opinions•Forum discussions•Blog articles•Represent the majority•Up to date•Hard to access•fragmental

How to benefit from both?

Page 42: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

42

Problem Definition

cute… tiny… ..thicker..last many hrs

die out soon

could afford it

still expensive

DesignBatteryPrice..

DesignBatteryPrice..

Topic: iPod

Expert review with aspects

Text collection of ordinary

opinions, e.g. Weblogs

Integrated Summary

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…warranty …better to extend..

Revi

ew

Aspe

cts

Extr

a As

pect

s

Similar opinions

Supplementaryopinions

InputOutput

Page 43: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Methods

• Semi-Supervised Probabilistic Latent Semantic Analysis (PLSA)

– The aspects extracted from expert reviews serve as clues to define a conjugate prior on topics

– Maximum a Posteriori (MAP) estimation

– Repeated applications of PLSA to integrate and align opinions in blog articles to expert review

Page 44: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

44

Results: Product (iPhone)• Opinion Integration with review aspects

Review article Similar opinions Supplementary opinions

You can make emergency calls, but you can't use any other functions…

N/A … methods for unlocking the iPhone have emerged on the Internet in the past few weeks, although they involve tinkering with the iPhone hardware…

rated battery life of 8 hours talk time, 24 hours of music playback, 7 hours of video playback, and 6 hours on Internet use.

iPhone will Feature Up to 8 Hours of Talk Time, 6 Hours of Internet Use, 7 Hours of Video Playback or 24 Hours of Audio Playback

Playing relatively high bitrate VGA H.264 videos, our iPhone lasted almost exactly 9 freaking hours of continuous playback with cell and WiFi on (but Bluetooth off).

Unlock/hack iPhone

Activation

Battery

Confirm the opinions from the

review

Additional info under real usage

Page 45: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

45

Results: Product (iPhone)

• Opinions on extra aspects

support Supplementary opinions on extra aspects

15 You may have heard of iASign … an iPhone Dev Wiki tool that allows you to activate your phone without going through the iTunes rigamarole.

13 Cisco has owned the trademark on the name "iPhone" since 2000, when it acquired InfoGear Technology Corp., which originally registered the name.

13 With the imminent availability of Apple's uber cool iPhone, a look at 10 things current smartphones like the Nokia N95 have been able to do for a while and that the iPhone can't currently match...

Another way to activate iPhone

iPhone trademark originally owned by

Cisco

A better choice for smart phones?

Page 46: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

46

Results: Product (iPhone)• Support statistics for review aspects

People care about price

People comment a lot about the unique wi-fi

feature

Controversy: activation requires contract with

AT&T

Page 47: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

47

Summarization of Contradictory Opinions[Kim & Zhai CIKM 09]

Neutral Positive Negative

Facet 1:Movie

... Ron Howards selection of Tom Hanks to play Robert Langdon.

Tom Hanks stars in the movie,who can be mad at that?

But the movie might get delayed, and even killed off if he loses.

Directed by: Ron Howard Writing credits: Akiva Goldsman ...

Tom Hanks, who is my favorite movie star act the leading role.

protesting ... will lose your faith by ... watching the movie.

After watching the movie I went online and some research on ...

Anybody is interested in it?

... so sick of people making such a big deal about a FICTION book and movie.

Facet 2:Book

I remembered when i first read the book, I finished the book in two days.

Awesome book. ... so sick of people making such a big deal about a FICTION book and movie.

I’m reading “Da Vinci Code” now.

So still a good book to past time.

This controversy book cause lots conflict in west society.

How can we help analysts digest and interpret contradictory

opinioons?

Page 48: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Contrastive Opinion Summarization

48

x1

x2

xn

x3

X

x4

x5

y1

y2

ym

y3

Y

y4

Page 49: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Contrastive Opinion Summarization

49

x1

x2

xn

x3

X

x4

x5

y1

y2

ym

y3

Y

y4

u1

u2

uk

v1

v2

vk

Contrastive Opinion Summary

,XU YV

Page 50: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Problem Formulation

50

x1

x2

xn

x3

X

x4

x5

y1

y2

ym

y3

Y

y4

u1

u2

uk

U

v1

v2

vk

V

Contrastiveness

Representativeness

Page 51: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Problem Formulation

51

x1

x2

xn

x3

X

x4

x5

y1

y2

ym

y3

Y

y4

u1

u2

uk

U

v1

v2

vk

V

Contrastiveness

Representativeness

),(1

)(1

i

k

ii vu

kSc

Yy

iki

Xxi

kivy

Yux

XSr ),(max

1),(max

1)(

],1[],1[

Page 52: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Summarization as Optimization

1. Define an appropriate content similarity function Ф

2. Define an appropriate contrastive similarity function ψ

3. Solve the optimization problem efficiently.

52

)),(1

),(max),(max(maxarg

))()1()((maxarg*

1

],1[],1[

k

iii

Yyi

kiXx

ikiS

S

vuk

vyY

uxX

ScSrS

Page 53: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample ResultsNo Positive Negative

1 oh ... and file transfers are fast & easy .

you need the software to actually transfer files

2 i noticed that the micro adjustment knob and collet are well made and work well too.

the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.

3 the navigation is nice enough , but scrolling and searching through thousands of tracks , hundreds of albums or artists , or even dozens of genres is not conducive to save driving

difficult navigation - i wo n’t necessarily say " difficult ,“ but i do n’t enjoy the scrollwheel to navigate .

4 i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.

there are 2 things that need fixing first is the battery life.it will run for 6 hrs without problemswith medium usage of the buttons.

53

Page 54: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample ResultNo Positive Negative

1 oh ... and file transfers are fast & easy .

you need the software to actually transfer files

2 i noticed that the micro adjustment knob and collet are well made and work well too.

the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.

3 the navigation is nice enough , but scrolling and searching through thousands of tracks , hundreds of albums or artists , or even dozens of genres is not conducive to save driving

difficult navigation - i wo n’t necessarily say " difficult ,“ but i do n’t enjoy the scrollwheel to navigate .

4 i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.

there are 2 things that need fixing first is the battery life.it will run for 6 hrs without problemswith medium usage of the buttons.

54

Different polarities of opinions made from different perspectives.

Page 55: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample ResultNo Positive Negative

1 oh ... and file transfers are fast & easy .

you need the software to actually transfer files

2 i noticed that the micro adjustment knob and collet are well made and work well too.

the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.

3 the navigation is nice enough , but scrolling and searching through thousands of tracks , hundreds of albums or artists , or even dozens of genres is not conducive to save driving

difficult navigation - i wo n’t necessarily say " difficult ,“ but i do n’t enjoy the scrollwheel to navigate .

4 i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.

there are 2 things that need fixing first is the battery life.it will run for 6 hrs without problemswith medium usage of the buttons.

55

Positive vs. negativeNot much disagreement

Page 56: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample ResultNo Positive Negative

1 oh ... and file transfers are fast & easy .

you need the software to actually transfer files

2 i noticed that the micro adjustment knob and collet are well made and work well too.

the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.

3 the navigation is nice enough , but scrolling and searching through thousands of tracks , hundreds of albums or artists , or even dozens of genres is not conducive to save driving

difficult navigation - i wo n’t necessarily say " difficult ,“ but i do n’t enjoy the scrollwheel to navigate .

4 i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.

there are 2 things that need fixing first is the battery life.it will run for 6 hrs without problemswith medium usage of the buttons.

56

Judgments revealing detailed conditions

Page 57: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Open Opinion Search System

http://timan1.cs.uiuc.edu/cgi-bin/hkim277/COSDemo/lemur.cgi

57

Page 58: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Latent Aspect Rating AnalysisHow to infer aspect ratings?

Value Location Service …..

How to infer aspect weights?

Value Location Service …..

Page 59: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Solution: Latent Rating Regression Model

Reviews + overall ratings Aspect segments

location:1amazing:1walk:1anywhere:1

0.10.70.10.9

nice:1accommodating:1smile:1friendliness:1attentiveness:1

Term weights Aspect Rating

0.00.90.10.3

room:1nicely:1appointed:1comfortable:1

0.60.80.70.80.9

Aspect Segmentation

Latent Rating Regression

1.3

1.8

3.8

Aspect Weight

0.2

0.2

0.6

Topic model for aspect discovery

+

59

Page 60: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Aspect-Based Opinion Summarization

Page 61: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Reviewer Behavior Analysis & Personalized Ranking of Entities

People like cheap hotels because of good value

People like expensive hotels because of good service

Query: 0.9 value 0.1 others

Non-Personalized

Personalized

Page 62: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Project 6: Information Trustworthiness

• How to assess information quality?

• Solution: trust propagation framework

62

Page 63: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Trust Propagation Framework

63

Claim

Evidence

Source

Page 64: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Sample Result: Trusted Sources on Different Topics

64

Page 65: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Trusted Sources on Different Genres

65

Page 66: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

Toward Next-Generation Search Engines

Bag of words

Search

Keyword Queries

Access

Mining

Task Support

Entities-Relations

Knowledge Representation

Search History

Complete User Model

Current Search Engine

Personalization(User Modeling)

Large-Scale Semantic Analysis(Vertical Search Engines)

Full-Fledged Text Info. Management

66

Page 67: 1 Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science Graduate School of Library & Information Science Institute

67

The End

Thank You!

More information about our research can be found athttp://timan.cs.uiuc.edu/