Comprehensive exam · people.cs.pitt.edu/~wex12/comp/slides.pdf · 2011-10-13 · – multiple-aspect...

Peer-review analysis

Comprehensive exam

Presented by: Wenting Xiong

Committee: Diane Litman, Rebecca Hwa, Jingtao Wang


Motivation
•  Goal
   Mine useful information from peers’ feedback and represent it in an intuitive and concise way

•  Tasks and related research topics
–  Identify review helpfulness (NLP: review analysis)
–  Summarize reviewers’ comments (NLP: paraphrasing and summarization)
–  Make sense of review comments through interactive review exploration (HCI: visual text analytics)


Part 1

NLP -- Review Analysis


Outline
1.  Review helpfulness analysis
2.  Sentiment analysis (opinion mining)
    –  Aspect detection
    –  Sentiment orientation
    –  Sentiment classification & extraction


1 Review helpfulness analysis
1. Automatic prediction
–  Learning techniques
–  Feature utilities
–  The ground truth

2. Analysis of perceived review helpfulness
–  Users’ bias when voting for helpfulness
–  Influence of the other reviews of the same product


1.1 -- Learning techniques
•  Problem formalization
–  Input: textual reviews
–  Output: helpfulness score

•  Learning algorithms
–  Supervised learning: regression
   •  Product reviews (e.g. electronics) <Kim 2006>, <Zhang 2006>, <Liu 2007>, <Ghose 2010>, <O'Mahony 2010>
   •  Trip reviews <Zhang 2006>
   •  Movie reviews <Zhang 2006>
–  Unsupervised learning: clustering
   •  Book reviews <Tsur 2009>

•  Focus
–  Predict absolute scores vs. rankings
–  Identify most helpful <Liu 2007> vs. unhelpful <Tsur 2009>
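The difference between predicting absolute scores and predicting rankings can be made concrete with a small sketch. It is an illustration only: the gold and predicted scores are made-up numbers, the function names are hypothetical, and the rank correlation assumes there are no tied scores.

```python
def mean_squared_error(gold, pred):
    """Average squared difference between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def ranks(values):
    """Rank positions (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(gold, pred):
    """Spearman rank correlation for tie-free score lists."""
    n = len(gold)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(gold), ranks(pred)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Made-up helpfulness scores for four reviews.
gold = [0.9, 0.6, 0.3, 0.1]
pred = [0.5, 0.4, 0.3, 0.2]  # compressed scores, same ordering

mse = mean_squared_error(gold, pred)  # penalizes the absolute score errors
rho = spearman(gold, pred)            # only looks at the ordering
```

In this toy case the predictor gets the absolute scores wrong (non-zero squared error) while ranking the reviews perfectly, which is why the choice of prediction target matters for evaluation.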


1.1 -- Feature utilities
•  Features used to model review helpfulness
–  Conflicting results on the effectiveness of subjectivity features
   •  Term-based counts are not useful <Kim et al. 2006>; category-based counts show that positive words correlate with greater helpfulness <Ghose et al. 2010>
–  Data sparsity issues?

Category         Feature type (ordered from low level to high level)
Linguistic       Unigrams, bigrams
                 Structural
                 Syntactic
                 Semantic: 1) domain lexicons 2) subjectivity
                 Sentiment analysis
                 Readability metrics
Social factors   Reviewer profile
                 Product ratings


1.1 -- The ground truth
•  Various gold standards of review helpfulness
–  Aggregated helpfulness votes: perceived helpfulness, e.g. <Kim 2006>
–  Manual annotations of helpfulness: real helpfulness <Liu 2007>

•  Problems
–  The percentage of helpful votes is not consistent with annotators’ judgments based on helpfulness specifications
–  Error rate of preference pair < 0.5 <Liu 2007>
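One plausible way to quantify the mismatch between vote-based and annotated helpfulness is to count how often the two orderings disagree on a pair of reviews. The sketch below is an assumption about how such a preference-pair error rate could be computed, not the exact procedure of <Liu 2007>; the vote counts and annotation grades are invented.

```python
from itertools import combinations


def helpfulness_ratio(helpful_votes, total_votes):
    """Fraction x/y of voters who found the review helpful."""
    return helpful_votes / total_votes


def preference_pair_error_rate(vote_scores, annotated_scores):
    """Share of review pairs ordered differently by votes and by annotators.

    Pairs tied under either scoring are skipped (an assumption of this sketch).
    """
    disagreements = 0
    compared = 0
    for i, j in combinations(range(len(vote_scores)), 2):
        vote_diff = vote_scores[i] - vote_scores[j]
        gold_diff = annotated_scores[i] - annotated_scores[j]
        if vote_diff == 0 or gold_diff == 0:
            continue
        compared += 1
        if (vote_diff > 0) != (gold_diff > 0):
            disagreements += 1
    return disagreements / compared if compared else 0.0


# Invented data: (helpful votes, total votes) and annotator grades per review.
votes = [(9, 10), (3, 10), (5, 10)]
vote_scores = [helpfulness_ratio(x, y) for x, y in votes]
annotated = [2, 3, 1]

error_rate = preference_pair_error_rate(vote_scores, annotated)
```

Here two of the three review pairs are ordered differently by the two gold standards, so the error rate is 2/3.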


1 Review helpfulness analysis
1. Automatic prediction
–  Learning techniques
–  Feature utilities
–  The ground truth

2. Analysis of perceived review helpfulness
–  Biased voting of review helpfulness on Amazon.com
–  Perceived helpfulness is not determined by the textual content alone


1.2 Analysis of perceived review helpfulness

•  Biased voting of review helpfulness on Amazon.com
–  Imbalanced vote bias
–  Winner circle bias
–  Early bird bias <Liu 2007>
   → The “x/y” vote ratio does not capture the true helpfulness of reviews

•  Perceived helpfulness is not determined by the textual content alone
–  Influence of the other reviews of the same product
–  Individual bias <Danescu-Niculescu-Mizil 2009>


1 Review helpfulness analysis

•  Summary
–  Effective features for identifying review helpfulness
–  Perceived helpfulness vs. real helpfulness

•  Comments
–  New features
   •  Introduce domain knowledge and information from other dimensions
–  Data sparsity problem
   •  High-level features
   •  Deep learning from low-level features
–  Other machine learning techniques
   •  Theory-based generative models


Outline
1.  Review helpfulness analysis
2.  Sentiment analysis (opinion mining)


2 Sentiment analysis (opinion mining)

How do people think about what?
1.  Aspect detection
2.  Sentiment orientation
3.  Sentiment classification & extraction


2.1 Aspect detection

•  Frequency-based approach
–  Most frequent noun phrases + sentiment-pivot expansion <Liu 2004>
–  PMI (pointwise mutual information) with meronymy discriminators + WordNet <Popescu 2005>

•  Generative approach
–  LDA, MG-LDA <Titov 2008>, sentence-level local LDA <Brody 2010>
–  Multiple-aspect sentiment model <Titov 2008>
–  Content-attitude model <Sauper 2011>
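As a reminder of what the PMI score in the frequency-based approach measures, here is a minimal sketch of pointwise mutual information computed from raw co-occurrence counts. The counts are toy values and the pairing of a candidate aspect with a discriminator phrase is only an illustrative assumption.

```python
from math import log2


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return log2(p_xy / (p_x * p_y))


# Toy counts: a candidate aspect x, a discriminator phrase y,
# and the number of contexts in which they occur alone or together.
score = pmi(count_xy=20, count_x=50, count_y=100, total=1000)
independent = pmi(count_xy=5, count_x=50, count_y=100, total=1000)
```

A positive score means the two items co-occur more often than chance would predict (here about four times as often); a score near zero means they look independent.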


2.2 Sentiment orientation
•  Aggregating from subjective terms
–  Manually constructed subjective lexicons

•  Bootstrapping with PMI
–  Adjectives & adverbs <Turney 2001>
–  Opinion-bearing words <Liu 2004>

•  Graph-based approach
–  Relaxation labeling <Popescu 2005>
–  Scoring <Brody 2010>

•  Domain adaptation
–  SCL algorithm <Blitzer 2007>

•  Through topic models
–  MAS -- aspect-independent + aspect-dependent <Titov 2008>
–  Content-attitude models -- predicted posterior of sentiment distribution <Sauper 2011>
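The PMI-based bootstrapping idea can be sketched as a semantic-orientation score: a phrase is scored by how much more strongly it associates with a positive seed word than with a negative one. The seed counts, co-occurrence counts, and smoothing constant below are illustrative assumptions, not values from the cited work.

```python
from math import log2


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """PMI(phrase, positive seed) - PMI(phrase, negative seed).

    The phrase's own frequency cancels out, leaving a ratio of counts:
      near_pos / near_neg : co-occurrences of the phrase with each seed
      hits_pos / hits_neg : overall frequency of each seed
    `smoothing` avoids division by zero for unseen co-occurrences.
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return log2(numerator / denominator)


# Toy counts for two phrases, with equally frequent seed words.
so_positive = semantic_orientation(near_pos=8, near_neg=2, hits_pos=100, hits_neg=100, smoothing=0)
so_negative = semantic_orientation(near_pos=1, near_neg=9, hits_pos=100, hits_neg=100)
```

A positive value suggests positive orientation and a negative value suggests negative orientation; aggregating these values over the subjective phrases of a review gives a simple review-level score.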


2.3 Sentiment classification and extraction
•  Classification
–  Binary <Turney 2001>
–  Finer-grained, e.g. metric labeling <Pang 2005>

•  Data sparsity
–  Bag-of-words vs. bag-of-opinions <Qu 2010>

•  Opinion-oriented extraction
–  Topic of interest
   •  Pre-defined
   •  Automatically learned
   •  User-specified


2 Summary
Comparing reviews’ helpfulness and sentiment
•  In terms of automatic prediction, both are metric-inference problems that can be formalized as standard ML problems with the same input X but different outputs Y
•  The learned knowledge about opinion topics and the associated sentiments would help model the general utility of reviews


Part 2

NLP -- Paraphrasing & Summarization


Outline

1.  Paraphrasing
    Paraphrases are semantically equivalent to each other
    1.  Paraphrase recognition
    2.  Paraphrase generation

2.  Summarization
    A shorter representation of the same semantic information as the input text
    1.  Informativeness computation
    2.  Extractive summarization of evaluative text


1.1 Paraphrase recognition

•  Discriminative approach
–  Various string similarity metrics
–  Different levels of abstraction of textual strings <Malakasiotis 2009>

Question: Are there any useful existing resources for identifying equivalent semantic information?
•  Word-level: dictionary, WordNet
•  Phrase-level: ?
•  Sentence-level: ?
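To illustrate the kind of input a discriminative paraphrase recognizer works with, the sketch below computes a few string similarity features for a sentence pair. The particular metrics (edit distance and Jaccard overlap) and the feature names are assumptions chosen for illustration; they are not claimed to be the feature set of <Malakasiotis 2009>.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard(a, b):
    """Overlap of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_features(s1, s2):
    """Similarity scores at the character level and at the token level."""
    c1, c2 = s1.lower(), s2.lower()
    t1, t2 = c1.split(), c2.split()
    return {
        "char_edit_similarity": 1 - levenshtein(c1, c2) / max(len(c1), len(c2), 1),
        "token_edit_similarity": 1 - levenshtein(t1, t2) / max(len(t1), len(t2), 1),
        "token_jaccard": jaccard(t1, t2),
    }


features = similarity_features("The thesis is unclear", "The thesis is not clear")
```

The same functions could be applied to other abstractions of the strings (for example stems or part-of-speech tags) to produce further features for a classifier.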


1.2 Paraphrase generation

•  Corpora
–  Monolingual vs. bilingual

•  Methods
–  Distributional-similarity based
–  Corpora based

•  Evaluation
–  Intrinsic evaluation vs. extrinsic evaluation


1.2 -- Corpora

•  Monolingual corpora
–  Parallel corpora
   •  Translation candidates
   •  Definitions of the same term
–  Comparable corpora
   •  Summaries of the same event
   •  Documents on the same topic

•  Bilingual parallel corpora


1.2 -- Methods.1
•  Distributional-similarity based methods
–  DIRT: paths that frequently occur with the same words at their ends
   •  Uses a single monolingual corpus
   •  MI measures the association strength between a slot and its arguments <Lin 2001>
–  Sentence lattices: argument similarity of multiple slots on sentence lattices
   •  Uses a comparable monolingual corpus
   •  Hierarchical clustering for grouping similar sentences
   •  MSA to induce lattices <Barzilay 2003>


1.2 -- Methods.2
•  Corpora-based methods
–  Monolingual parallel corpus
   •  Monolingual MT <Quirk 2004>
   •  Merging partial parse trees into an FSA <Pang 2003>
   •  Paraphrasing from definitions <Hashimoto 2011>
–  Monolingual comparable corpus
   •  MSR paraphrase corpus <Dolan 2005>
      Edit distance, journalism convention
   •  Sentence lattices <Barzilay 2003>
–  Bilingual parallel corpus
   •  Pivot approach <Callison-Burch 2005> <Zhao 2008>
   •  Random-walk based HTP <Kok 2009>
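The pivot approach over a bilingual parallel corpus can be sketched as marginalizing over shared foreign phrases: p(e2 | e1) = sum over f of p(f | e1) * p(e2 | f). The phrase tables below are tiny hand-made dictionaries with invented probabilities; in practice they would come from a phrase alignment model, which is outside the scope of this sketch.

```python
from collections import defaultdict


def pivot_paraphrase_probs(e_to_f, f_to_e, phrase):
    """Paraphrase probabilities p(e2 | e1) obtained by pivoting through
    every foreign phrase f that `phrase` (e1) translates to."""
    scores = defaultdict(float)
    for foreign, p_f_given_e1 in e_to_f.get(phrase, {}).items():
        for candidate, p_e2_given_f in f_to_e.get(foreign, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_f_given_e1 * p_e2_given_f
    return dict(scores)


# Invented translation probabilities, for illustration only.
e_to_f = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
f_to_e = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in check": 0.5},
}

probs = pivot_paraphrase_probs(e_to_f, f_to_e, "under control")
best = max(probs, key=probs.get)
```

With these toy tables, "in check" receives 0.8 * 0.3 + 0.2 * 0.5 = 0.34 and is the only candidate paraphrase.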


1.2 -- Evaluation
•  Intrinsic evaluation
–  Responsiveness
   •  Can assess precision, but not recall
–  Standard test references <Callison-Burch 2008>
   •  Manually aligned corpus
   •  Lower-bound precision & relative recall

•  Extrinsic evaluation
–  Alignment tasks in monolingual translation
   •  Alignment error rate
   •  Alignment precision, recall, F-measure <Dolan 2004>

•  Model-specific evaluation
–  FSA <Pang 2003>
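The alignment-based measures listed above can be computed directly from sets of alignment links. The sketch below follows the usual definitions with "sure" and "possible" gold links; the example links are invented, and treating the possible set as equal to the sure set when none is given is an assumption of this sketch.

```python
def alignment_scores(predicted, sure, possible=None):
    """Precision, recall, F-measure and alignment error rate (AER).

    Each alignment is a set of (source_index, target_index) links.
    `possible` always includes the sure links; it defaults to `sure`.
    """
    possible = set(sure) if possible is None else set(possible) | set(sure)
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted) if predicted else 0.0
    recall = hit_sure / len(sure) if sure else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denominator = len(predicted) + len(sure)
    aer = 1 - (hit_sure + hit_possible) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "aer": aer}


# Invented word alignment between two paraphrased sentences.
predicted = {(0, 0), (1, 1), (2, 3)}
gold_sure = {(0, 0), (1, 1), (2, 2)}

scores = alignment_scores(predicted, gold_sure)
```

In this example two of three predicted links are correct, so precision, recall, and F-measure are all 2/3 and the alignment error rate is 1/3.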


2 Summarization
Tasks in automatic summarization
I.  Content selection
II.  Information ordering
III.  Automatic editing, information fusion

Focus of this talk
1.  Informativeness computation
2.  Information selection (and generation)
3.  Summarization evaluation


2.1 Computing informativeness
•  Semantic information (topic identification)
–  Word-level
   •  Frequency, TF-IDF <Liu 2004>, topic signature <Lin 2001>, PMI(w, topic) <Wang 2011>, external domain knowledge <Zhuang 2006>
–  Sentence-level
   •  HMM content models <Barzilay 2004>
   •  Category classification + sentence clustering <Abu-Jbara 2011>
–  Summary-level
   •  Sentiment-aspect match model + KL divergence <Lerman 2009>

•  Opinion-based sentiment scores for evaluative texts
   •  Sentiment polarity, intensity, mismatch, diversity <Lerman 2009>

•  Discriminative approach to predict informativeness
   •  Combine statistical, semantic, and sentiment features in linear or log-linear models <Wang 2011>
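As a concrete instance of word-level informativeness, here is a minimal TF-IDF sketch. It uses one common variant (relative term frequency times log(N / df)); other weightings exist, and the whitespace tokenizer and the sample reviews are assumptions made for illustration.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Per-document word weights: tf(w, d) * log(N / df(w))."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Made-up review snippets.
reviews = [
    "the battery is great",
    "the screen is dim",
    "the battery drains",
]
weights = tf_idf(reviews)
top_word = max(weights[0], key=weights[0].get)
```

A word that appears in every review ("the") gets weight zero, while a word unique to one review ("great") gets the highest weight there, which is the intuition behind using such scores for content selection.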


2.2 Information selection & generation
•  Extraction
–  Rank-based sentence selection
   •  Aggregation of word informativeness weights (+ discourse features) <Carenini 2006> <Wang 2011>
   •  Optimized by Maximal Marginal Relevance
–  Topic-based selection
   •  HMM content model <Barzilay 2004>
   •  Language-model based clustering of informative phrases <Liu 2010>
   •  Summarize citations based on category-cluster-sentence <Abu-Jbara 2011>
–  Structured evaluative summary
   •  Aspect + overall rating <Hu 2004>
   •  Aspect + pros and cons <Zhuang 2006>
   •  Hierarchical aspects + sentiment phrasal expressions <Liu 2010>

•  Abstraction
–  Generate evaluative arguments based on aggregation of extracted information <Carenini 2006>
–  Graph-based summarization using an adjacency matrix to model dialogue structure <Wang 2011>
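Rank-based selection optimized by Maximal Marginal Relevance can be sketched as a greedy loop that trades relevance against redundancy with what has already been selected. The relevance scores, the word-overlap similarity, and the trade-off value below are illustrative assumptions rather than settings from the cited systems.

```python
def mmr_select(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedy Maximal Marginal Relevance selection of k items.

    Each step picks the item maximizing
      trade_off * relevance - (1 - trade_off) * (max similarity to selected).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            redundancy = max((similarity(item, chosen) for chosen in selected),
                             default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def word_overlap(a, b):
    """Jaccard overlap of the word sets of two sentences."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


# Made-up review comments with made-up informativeness scores.
sentences = [
    "the essay needs a clearer thesis",
    "the essay needs a clear thesis statement",
    "add citations for the statistics",
]
relevance = {sentences[0]: 0.9, sentences[1]: 0.85, sentences[2]: 0.6}

summary = mmr_select(sentences, relevance, word_overlap, k=2, trade_off=0.5)
```

Although the second sentence is nearly as relevant as the first, it is largely redundant with it, so the less relevant but novel third sentence is selected instead.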


2.3 Summarization evaluation
•  Pyramid (empirical)
–  Multiple human-written gold standards
–  SCU <Ani 2007>

•  ROUGE
–  Automatic comparison with gold standards
–  Considers overlap of unigrams, bigrams, and longest common subsequences <Lin 2004>

•  Fully automatic
–  A good summary should be similar to the input
–  KL divergence, JS divergence <Ani 2009>
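Two of the automatic measures above are easy to sketch: n-gram recall in the style of ROUGE-N against a reference summary, and Jensen-Shannon divergence between the word distributions of a summary and its input. This is a simplified illustration with whitespace tokenization and no smoothing or stemming, not a reimplementation of the official tools.

```python
from collections import Counter
from math import log2


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def word_distribution(text):
    """Relative word frequencies of a text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and finite for any p and q."""
    vocab = set(p) | set(q)
    mid = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


reference = "the cat sat on the mat"
candidate = "the cat sat"
rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
divergence = js_divergence(word_distribution(candidate), word_distribution(reference))
```

Higher ROUGE recall means the candidate covers more of the reference, while a lower divergence means the summary's word distribution is closer to that of the text it is compared with.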

                 Manual summary   Manual rating
Responsiveness         ✔                ✔
Pyramid                ✗                ✔
ROUGE                  ✔                ✗
Fully auto             ✗                ✗

→ User preference of sentiment summarizer

Paraphrasing and summarization -- Summary
•  Common theme
–  Semantic equivalence

•  Related to sentiment analysis in computing informativeness of reviews
–  Aspect-dependent sentiment orientation
   •  Overall vs. distribution statistics
–  Aspect coverage
   •  Computed through scoring or by measuring the divergence between probabilistic models’ distributions


Part 3

HCI -- Visual text analytics


Outline

1.  Text visualization
    1.  Inner-set visualization for abstraction
    2.  Intra-set visualization for comparison

2.  Interactive exploration
    1.  Design principles and examples


1 Text visualization

•  Inner-set visualization for abstraction
–  Semantic information
–  Sentiment information (opinions)

•  Intra-set visualization for comparison


1.1 Inner-set visualization techniques
•  Semantic information
–  Original text with highlighted keywords
   •  Most detailed information
–  Topic-based representation
   •  List of target entities (Jigsaw <Stasko 2010>)
   •  Haystack (Themail <Viegas 2006>)
   •  Tag cloud (OpinionSeer <Wu 2010>, TIARA <Liu 2009>, Review Spotlight <Yatani 2011>)
–  Vector-based representation
   •  Dot in space (ThemeScapes <Wise 1995>)


1.1 Inner-set visualization techniques
•  Sentiment information
–  Value-based visual representation
   •  Bar -- opinion polarity and intensity <Liu 2005>
   •  Histogram -- rating distribution <Carenini 2006>
   •  Double-square -- frequency, polarity, intensity <Oelke 2009>
   •  Thumbnail table -- opinion report for people in groups <Oelke 2009>

Comment:
–  Requires NLP techniques for opinion mining and sentiment analysis
   •  e.g. intelligent support for identifying salient information for exploration (aspects on which opinions are most (in)consistent) <Carenini 2006>

1 Text visualization

•  Inner-set visualization for abstraction
–  Semantic information
–  Sentiment information (opinions)

•  Intra-set visualization for comparison
–  Dimensionality of comparison
   •  Via layout or visualizing metadata as axes


1.2 Intra-set visualization techniques

•  Dimensionality of exploration
–  1D: layout or metadata
–  2D: layout and/or metadata
–  3D & 3D+: layout and/or metadata


1.2 Intra-set visualization -- 1D Exploration
•  Side-by-side
–  Compare single product reviews feature by feature <Liu 2005>
–  Connect interesting events from different periods of time (Continuum <Andre 2007>)
–  Explore the connections of entities across documents (Jigsaw <Stasko 2010>)

•  Grid layout of data in groups
–  Faceted metadata for image browsing <Yee 2003>
–  Facetbox for presenting filtering by facet data <Lee 2009>
–  Exploring term-based language patterns across documents <Don 2007>

•  Timeline -- temporal features
–  Themail <Viegas 2006>, Continuum <Andre 2007>, TIARA <Liu 2009>, TwitInfo <Marcus 2011>, etc.


1.2 Intra-set visualization -- 2D Exploration
•  Aspect-based opinion analysis across multiple targets
–  Paired <Liu 2005>
–  Matrix <Oelke 2009>

•  Scatter plot of targets with metadata as axes
–  Discover the entity coverage in documents (Jigsaw <Stasko 2010>)
–  Visualize DL search results with categorical and hierarchical axes <Shneiderman 2000>

•  2D graph (layout)
–  Explore relationships between entities and documents (Jigsaw <Stasko 2010>)
–  *Diagram of social network (TIARA <Liu 2009>)

•  Spatial representation in 2D space
–  Triangle scatter plot of opinions (OpinionSeer <Wu 2010>)
–  *Opinion Space <Faridani 2010>

•  Circular correlation map of review aspects <Oelke 2009>

1.2 Intra-set visualization -- 3D Exploration
•  3D spatial representation
–  ThemeScapes <Wise 1995>
   •  Theme strength as elevation (terrain map)

•  Combine multiple visualizations of metadata variables
–  OpinionSeer <Wu 2010>
   •  Radial visualization with concentric rings + stacked graph + triangle scatter plot
–  TIARA <Liu 2009>
   •  Stacked topic models (word cloud) over a timeline

Pros
–  Discover otherwise imperceptible interactions among multiple factors

Cons
–  Concise but hard to interpret
–  Interaction is more complex and harder to design

2 Interactive exploration
Design principles and examples
•  Data on demand and in-depth exploration
   From the data perspective
   –  Overview, then detailed view
   From the interaction perspective
   –  Zoom in and zoom out for exploration
   –  Hierarchical filtering for search and browsing
   –  Detailed information as tooltips in explanatory visualization

•  Support exploration of multiple interests
–  View switching for interest-specific visualization techniques
–  Query-based content browsing
–  Pivot action for navigating between related items

•  Context preserving
–  Overview + detailed view
–  Support local interactions (hierarchically structured data)
–  A view of the selection history of browsing

Visual text analytics -- Summary
To conclude
•  Text visualization constructs the semantic mapping between the text and visual variables
•  Visualize metadata together with textual information for comparison and exploration
•  Interaction design should follow human intuition about data exploration
–  Data characteristics
–  Inherent connection between data and metadata


Visual text analytics -- Connection between NLP and HCI
•  NLP helps visual analytics by extracting the target information and organizing it in a desired way
•  Visual analytics provides exploratory tools for text analysis and opinion mining
•  Visual analytics poses challenges to NLP in terms of both new corpora and interesting problems


Conclusion
In terms of my own research interests
•  Review analysis
–  How to model the real helpfulness of peer reviews

•  Paraphrasing and summarization
–  How to identify common themes and aggregate comments from different reviewers

•  Visual text analytics
–  How to create informative representations of reviews
–  How to design intuitive interactive exploration for students or teachers to mine useful information

Challenges and contributions
•  Theory-based high-level information about usefulness
•  Summary-style paraphrasing
•  Visualizing connections between opinions with detailed semantic information in context
