1 fusion approach to finding opinions in blogosphere kiduk yang, ning yu, alejandro valerio, hui...
![Page 1: 1 Fusion Approach to Finding Opinions in Blogosphere Kiduk Yang, Ning Yu, Alejandro Valerio, Hui Zhang, and Weimao Ke Web Information Discovery Integrated](https://reader030.vdocuments.net/reader030/viewer/2022032722/56649f445503460f94c64f4a/html5/thumbnails/1.jpg)
1
Fusion Approach to Finding Opinions in Blogosphere
Kiduk Yang, Ning Yu, Alejandro Valerio, Hui Zhang, and Weimao Ke
Web Information Discovery Integrated Tool Laboratory (WIDIT)
Indiana University
Assistant Professor of Information Science
ICWSM 2007
This paper reuses the content of "WIDIT in TREC-2006 Blog track"; the figures and tables are identical, with references added.
WIDIT also ranked first in spam handling in the Blog track.
2/21
WIDIT’s Fusion Approach
Adapt a topical retrieval system to the opinion retrieval task:
  Apply the existing system to retrieve blogs about a target (i.e., on-topic retrieval). (Note: basic IR.)
  Optimize on-topic retrieval to address the challenges of short queries. (Note: reranking.)
  Identify opinion blogs by leveraging evidence of subjectiveness/opinion (i.e., opinion identification). (Note: the added ingredient.)
Research Question
  What is the evidence of opinion, and how can it be leveraged to retrieve opinionated blogs?
3/21
Sources of Evidence
Opinion Lexicon
  A set of terms often used in expressing opinions (e.g., “Skype sucks”, “Skype rocks”, “Skype is cool”).
Opinion Collocations (contextual evidence)
  Collocations used to mark adjacent statements as opinions (e.g., “I believe God exists”, “God is dead to me”).
Opinion Morphology
  When expressing strong opinions or perspectives, people often use morphed word forms for emphasis (e.g., “Skype is soooo buggy”, “Skype is bugfested”).
4/21
5/21
Related Works
It is still arguable whether hyperlinks are good indicators of subjective affiliations.
Much research has focused on product and customer reviews.
Research on subjectivity by Wiebe and other scholars.
WIDIT's system for IR.
6/21
(1/4) Initial Retrieval – term indexing
Hyphenated words were split into parts.
Markup tags and stopwords were removed. Stopwords are:
  words in a standard stopword list
  non-alphabetical words
  words consisting of more than 25 or fewer than 3 characters
  words that contain 3 or more repeated characters
A modified version of the simple plural remover was applied.
Acronyms and abbreviations were kept.
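The filtering rules above can be sketched as follows. This is a minimal illustration, not WIDIT's code: the stopword list is a tiny hypothetical stand-in, "3 or more repeated characters" is assumed to mean the same character three times in a row, and the plural remover is left out.

```python
import re

# Hypothetical stand-in; the slides only mention "a standard stopword list".
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")      # markup tags
REPEAT_RE = re.compile(r"(.)\1\1")   # assumption: same character 3+ times in a row


def index_terms(text):
    """Return index terms of `text` after applying the filtering rules."""
    text = TAG_RE.sub(" ", text)
    terms = []
    for raw in text.split():
        # Hyphenated words are split into parts.
        for part in raw.split("-"):
            word = part.strip(".,;:!?\"'()[]")
            if not word.isalpha():               # non-alphabetical words
                continue
            if len(word) < 3 or len(word) > 25:  # too short or too long
                continue
            if REPEAT_RE.search(word.lower()):   # e.g. "soooo"
                continue
            if word.lower() in STOPWORDS:
                continue
            # Acronyms (all caps) are kept as-is; other words are lowercased.
            terms.append(word if word.isupper() else word.lower())
    return terms
```

For example, `index_terms("<p>The state-of-the-art IBM system is soooo good</p>")` keeps `state`, `art`, `IBM`, `system`, and `good`.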
7/21
(2/4) Initial Retrieval – incremental indexing
Designed to scale up to large collections.
The document collection is indexed in fixed-size subcollections, which can be searched in parallel.
Collection-wide term statistics are derived after the subcollections are created.
Subcollection retrieval results can simply be merged, without any need for retrieval score normalization.
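A minimal sketch of why the merge needs no normalization: if every subcollection is scored with the same collection-wide statistics, the scores are directly comparable. The function names and the `(doc_id, score)` result format are assumptions made for illustration.

```python
from collections import Counter
import heapq


def global_doc_freqs(subcollection_dfs):
    """Sum per-subcollection document frequencies into collection-wide ones."""
    total = Counter()
    for df in subcollection_dfs:
        total.update(df)
    return total


def merge_results(result_lists, k=1000):
    """Merge result lists that are each sorted by descending score.

    Each hit is a (doc_id, score) pair. Scores are assumed to have been
    computed with the same global statistics, so they are merged as-is.
    """
    merged = heapq.merge(*result_lists, key=lambda hit: hit[1], reverse=True)
    return list(merged)[:k]
```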
8/21
(3/4) Initial Retrieval – query indexing
Identify nouns and noun phrases.
Expand acronyms and abbreviations.
Extract the non-relevant portion of topic descriptions.
These are used to formulate various expanded versions of the query.
Query expansion submodules
9/21
(4/4) Initial Retrieval – retrieval
Vector space model: the SMART length-normalized term weights (the score of term k for document i).
Probabilistic model: the Okapi BM25 formula.
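The slide's formulas did not survive extraction. As a reference point, here is a sketch of a commonly used form of Okapi BM25; the parameter values `k1=1.2` and `b=0.75` are conventional defaults, not values taken from the slides, and the query-term-frequency component is omitted for brevity.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, num_docs,
               k1=1.2, b=0.75):
    """Score one document against a query with a common BM25 variant.

    doc_tf:   term -> frequency in this document
    doc_freq: term -> number of documents containing the term
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        df = doc_freq[term]
        # This idf form goes negative for terms in more than half the documents.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score
```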
10/21
On-Topic Retrieval Optimization
Rerank based on a set of topic-related reranking factors:
  Exact Match: exact query string occurrence
  Proximity Match: padded query string occurrence
  Noun Phrase Match
  Non-Rel Match
Steps
  Compute topic reranking scores for each of the top N results.
  Categorize the top N results into reranking groups designed to preserve the initial ranking while allowing appropriate rank-boosting for a given combination of reranking factors.
  Boost the rank of documents using reranking scores within groups.
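The first two reranking factors can be sketched with regular expressions. This assumes that "padded" means allowing a few intervening words between query words; the `pad` parameter and both function names are hypothetical, and the slides do not specify how the counts are turned into scores.

```python
import re


def exact_match_count(query, text):
    """Count occurrences of the exact query string (case-insensitive)."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in query.split()) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def proximity_match_count(query, text, pad=2):
    """Count occurrences of the query words in order, allowing up to
    `pad` extra words between consecutive query words (an assumption)."""
    gap = r"\W+(?:\w+\W+){0,%d}" % pad
    pattern = r"\b" + gap.join(re.escape(w) for w in query.split()) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))
```

With the text "Skype phone is nice. My Skype internet phone broke." and the query "skype phone", the exact count is 1 and the padded count is 2.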
11/21
Opinion Identification
Opinion Term Module
  frequency of terms that occur frequently only in opinion blogs
Rare Term Module (e.g., “sooo good”)
  extract low-frequency terms from positive training data
  remove dictionary terms
  examine the remaining terms to construct an RT lexicon and regular expressions
  identify creative term patterns used in opinion blogs
IU Module (‘I believe’, ‘my assessment’, ‘good for you’)
  counts the frequency of “padded” IU collocations within sentence boundaries
Adjective-Verb Module
  judges the density of Potential Subjective Elements (PSE) (next page)
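A rough sketch of the Rare Term Module steps, under stated assumptions: the frequency cutoff `max_freq`, the dictionary set, and the single "elongated word" pattern are all illustrative; the actual RT lexicon and regular expressions were built by manual examination and are not given in the slides.

```python
import re

# One illustrative pattern: a letter repeated 3+ times in a row, as in "sooo".
ELONGATED = re.compile(r"\b[a-z]*([a-z])\1{2,}[a-z]*\b", re.IGNORECASE)


def rare_term_candidates(term_freqs, dictionary, max_freq=2):
    """Low-frequency terms from training data that are not dictionary words."""
    return sorted(t for t, f in term_freqs.items()
                  if f <= max_freq and t not in dictionary)


def elongated_terms(text):
    """Return words in `text` that match the elongation pattern."""
    return [m.group(0) for m in ELONGATED.finditer(text)]
```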
12/21
Adjective-Verb Module
Selection of Potential Subjective Elements (first build the PSE set)
  Expansion of an initial seed set (using WordNet, FrameNet, etc.)
    Good, Bad, Oppose, Agree
  Refine the candidates and eliminate ambiguous elements
Classifying Blogs using AVM (decision based on PSE density)
  PSE density > 0.5: 100% opinionated
  PSE density < 0.2: 100% not opinionated
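The density-based decision can be sketched as below. The slides do not say how density is computed; here it is assumed to be the fraction of tokens that are PSEs, and the four-word PSE set is only the seed examples, not the expanded set.

```python
# Seed examples from the slide; the real PSE set is expanded and refined.
PSE = {"good", "bad", "oppose", "agree"}


def pse_density(tokens, pse=PSE):
    """Fraction of tokens that are PSEs (an assumed definition of density)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.lower() in pse) / len(tokens)


def classify(tokens, high=0.5, low=0.2):
    """Apply the two thresholds; anything in between is left undecided."""
    density = pse_density(tokens)
    if density > high:
        return "opinion"
    if density < low:
        return "non-opinion"
    return "undecided"
```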
13/21
Fusion
Combine the multiple sets of search results after retrieval time, on the assumption that documents with higher overlap are more likely to be relevant.
Scores are weighted by the relative contributions of the fusion components (determined through training).
Weighted Sum
Overlap WS
Weighted OWS
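The formulas for these three methods are not preserved in the transcript. The sketch below shows one plausible reading of the first two, assuming min-max score normalization per result set and overlap counted as the number of result sets containing a document; Weighted OWS is omitted because its per-overlap weights cannot be recovered from the slides.

```python
def min_max_normalize(scores):
    """Scale one result set's scores to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_sum(runs, weights):
    """Fused score = sum over runs of weight * normalized score."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return fused


def overlap_weighted_sum(runs, weights):
    """Weighted sum multiplied by the number of runs that retrieved the doc."""
    ws = weighted_sum(runs, weights)
    return {doc: score * sum(doc in run for run in runs)
            for doc, score in ws.items()}
```

The weights would come from training, as the slide notes; equal weights are used only in quick checks.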
14/21
Dynamic Tuning
A bio-feedback technique that helps a human judge where the local optimum lies.
15/21
Experiment
2006 TREC blog test collection
50 topics (title, description, narrative)
12/2005~2006: 100,649 feeds (38GB), 2.8m permalinks (75GB), 325,000 homepages (20GB)
The system returns 1,000 results per topic: + Topic Reranking, + Opinion Reranking
16/21
Results
Mean average precision (MAP)
  the precision at each rank where a relevant item is retrieved, averaged over the relevant items and then over topics
Mean R-precision (MRP)
  the precision at the rank equal to the total number of relevant items, averaged over topics
Precision at rank N (P@N)
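These measures follow their standard definitions, sketched here for a single topic plus a helper that averages over topics. The function names and the `(ranking, relevant)` input format are choices made for this illustration.

```python
def precision_at(ranking, relevant, n):
    """P@N: fraction of the top n results that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant item
    appears, divided by the total number of relevant items."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at(ranking, relevant, r) if r else 0.0


def mean_over_topics(metric, topics):
    """Average a per-topic metric over (ranking, relevant) pairs,
    giving MAP for average_precision and MRP for r_precision."""
    return sum(metric(ranking, relevant) for ranking, relevant in topics) / len(topics)
```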
17/21
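Query Length Effect
Traditionally, the longer the query the better (using all topic fields > using the title only).
Exceptions occur when there is noise.
Reranking can improve this.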
18/21
Topic Reranking Effect
19/21
20/21
After reranking, further tuning yields no significant improvement.
Figure legend: Short, Topic; Short, Opinion; Long, Topic; Long, Opinion; Fusion, Topic; Fusion, Opinion
21/21
Fusion Effect - Conclusion
Fusion yields an improvement of about 20%.