The 2010 JDPA Sentiment Corpus for the Automotive Domain

The JDPA Sentiment Corpus for the Automotive Domain. Miriam Eckert, Lyndsie Clark, Nicolas Nicolov (J.D. Power and Associates); Jason S. Kessler (Indiana University)

Upload: jason-kessler

Post on 26-Jun-2015


Page 1: The 2010 JDPA Sentiment Corpus for the Automotive Domain

The JDPA Sentiment Corpus for the Automotive Domain

Miriam Eckert, Lyndsie Clark, Nicolas Nicolov

J.D. Power and Associates

Jason S. Kessler

Indiana University

Page 2: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Overview

• 335 blog posts containing opinions about cars
– 223K tokens of blog data

• Goal of annotation project:
– Examples of how words interact to evaluate entities
– Annotations encode these interactions

• Entities are invoked physical objects and their properties
– Not just cars, car parts
– People, locations, organizations, times

Page 3: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Excerpt from the corpus

“last night was nice. sean bought me caribou and we went to my house to watch the baseball game …

“… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”

Page 4: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Outline

• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources

Page 5: The 2010 JDPA Sentiment Corpus for the Automotive Domain

[Figure: example sentence with entity mention annotations]

John[PERSON] recently purchased a Honda Civic[CAR]. It[CAR] had a great engine[CAR-PART], a mildly disappointing stereo[CAR-PART], and was very grippy. He[PERSON] also considered a BMW[CAR] which, while highly priced[CAR-FEATURE], had a better stereo[CAR-PART].

Links shown: REFERS-TO

Page 6: The 2010 JDPA Sentiment Corpus for the Automotive Domain

[Figure: example sentence with sentiment targets]

John[PERSON] recently purchased a Honda Civic[CAR]. It[CAR] had a great engine[CAR-PART], a mildly disappointing stereo[CAR-PART], and was very grippy. He[PERSON] also considered a BMW[CAR] which, while highly priced[CAR-FEATURE], had a better stereo[CAR-PART].

Links shown: TARGET

Page 7: The 2010 JDPA Sentiment Corpus for the Automotive Domain

[Figure: example sentence with entity relations]

John[PERSON] recently purchased a Honda Civic[CAR]. It[CAR] had a great engine[CAR-PART], a mildly disappointing stereo[CAR-PART], and was very grippy. He[PERSON] also considered a BMW[CAR] which, while highly priced[CAR-FEATURE], had a better stereo[CAR-PART].

Links shown: REFERS-TO, PART-OF, FEATURE-OF

Page 8: The 2010 JDPA Sentiment Corpus for the Automotive Domain

[Figure: example sentence with comparison annotation]

John[PERSON] recently purchased a Honda Civic[CAR]. It[CAR] had a great engine[CAR-PART], a mildly disappointing stereo[CAR-PART], and was very grippy. He[PERSON] also considered a BMW[CAR] which, while highly priced[CAR-FEATURE], had a better stereo[CAR-PART].

Links shown: DIMENSION, MORE, LESS

Page 9: The 2010 JDPA Sentiment Corpus for the Automotive Domain

[Figure: example sentence with all annotation layers combined]

John[PERSON] recently purchased a Honda Civic[CAR]. It[CAR] had a great engine[CAR-PART], a mildly disappointing stereo[CAR-PART], and was very grippy. He[PERSON] also considered a BMW[CAR] which, while highly priced[CAR-FEATURE], had a better stereo[CAR-PART].

Links shown: REFERS-TO, PART-OF, FEATURE-OF, TARGET, DIMENSION, MORE, LESS

Entity-level sentiment labels shown: positive, mixed

Page 10: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Outline

• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources

Page 11: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Entity annotations

John recently purchased a Civic. It had a great engine and was priced well.

[Figure: mentions and their semantic types: John (PERSON); Civic, It (CAR, joined by REFERS-TO); engine (CAR-PART); priced (CAR-FEATURE)]

• >20 semantic types from
– ACE Entity Mention Detection Task
– Generic automotive types
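As a rough illustration of the entity layer on this slide, mentions can be modelled as typed character spans, and REFERS-TO links can be used to group coreferent mentions into entities. This is only a sketch: the class, field, and function names below are hypothetical and do not reflect the corpus's actual file format.

```python
from dataclasses import dataclass

TEXT = "John recently purchased a Civic. It had a great engine and was priced well."


@dataclass(frozen=True)
class Mention:
    """A typed character span (hypothetical representation)."""
    start: int
    end: int
    semantic_type: str

    def text(self, source: str) -> str:
        return source[self.start:self.end]


def find_mention(source: str, surface: str, semantic_type: str) -> Mention:
    # Locate the first occurrence of the surface string in the source text.
    start = source.index(surface)
    return Mention(start, start + len(surface), semantic_type)


def group_entities(mentions, refers_to_links):
    """Group mentions into entities by following REFERS-TO links."""
    parent = {m: m for m in mentions}

    def root(m):
        while parent[m] != m:
            m = parent[m]
        return m

    for source_mention, antecedent in refers_to_links:
        parent[root(source_mention)] = root(antecedent)

    groups = {}
    for m in mentions:
        groups.setdefault(root(m), []).append(m)
    return list(groups.values())


john = find_mention(TEXT, "John", "PERSON")
civic = find_mention(TEXT, "Civic", "CAR")
it = find_mention(TEXT, "It", "CAR")
engine = find_mention(TEXT, "engine", "CAR-PART")
priced = find_mention(TEXT, "priced", "CAR-FEATURE")

# "It" REFERS-TO "Civic", so both mentions belong to one CAR entity.
entities = group_entities([john, civic, it, engine, priced], [(it, civic)])
for entity in entities:
    print([m.text(TEXT) for m in entity], entity[0].semantic_type)
```

Five mentions collapse into four entities here, which mirrors the slide's distinction between mention counts and entity counts.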

Page 12: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Entity-relation annotations

• Relations between entities
• Entity-level sentiment annotations
• Sentiment flow between entities through relations
– My car has a great engine.
– Honda, known for its high standards, made my car.

[Figure: Civic (CAR), with engine (CAR-PART) and priced (CAR-FEATURE) attached through PART-OF and FEATURE-OF. Entity-level sentiment: Positive]
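One way to picture sentiment flowing through relations is to let evaluations of parts and features count toward the entity they are attached to. The sketch below does that for the Civic example. The aggregation rule (positive and negative together give "mixed") and all names are assumptions made for illustration, not a procedure described in the slides.

```python
from collections import defaultdict

# (child, relation, parent) triples from the Civic example.
relations = [
    ("engine", "PART-OF", "Civic"),
    ("priced", "FEATURE-OF", "Civic"),
]

# Polarities of sentiment expressions targeting each entity directly
# (assumed: "great" targets engine, "well" targets priced).
direct_sentiment = {
    "engine": ["positive"],
    "priced": ["positive"],
}


def entity_level_sentiment(entity, relations, direct_sentiment):
    """Aggregate polarities over an entity and everything attached to it."""
    children = defaultdict(list)
    for child, _label, parent in relations:
        children[parent].append(child)

    polarities = set()
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        polarities.update(direct_sentiment.get(current, []))
        stack.extend(children[current])

    if "positive" in polarities and "negative" in polarities:
        return "mixed"
    if "positive" in polarities:
        return "positive"
    if "negative" in polarities:
        return "negative"
    return "neutral"


print(entity_level_sentiment("Civic", relations, direct_sentiment))
```

Adding a negatively evaluated part, such as a disappointing stereo, would turn the result for the car into "mixed" under this rule.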

Page 13: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Entity annotation type: statistics

• Inter-annotator agreement
– Among mentions: 83%
– Refers-to: 68%

• 61K mentions in corpus and 43K entities

• 103 documents annotated by around 3 annotators

[Figure: span matching between annotators. The pair A1: …Kia Rio… / A2: …Kia Rio… is shown once as MATCH and once as NOT A MATCH]
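The slides do not state how the agreement percentages were computed. Purely as an illustration, the sketch below assumes that two mentions match only when their spans are identical, and scores a pair of annotators with a symmetric overlap measure (twice the number of shared spans divided by the total number of spans). The function name and the offsets are hypothetical.

```python
def span_agreement(spans_a, spans_b):
    """Pairwise agreement under an exact-span-match assumption."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # nothing to disagree about
    return 2 * len(a & b) / (len(a) + len(b))


# (start, end) character offsets marked by two annotators.
annotator_1 = [(4, 11), (20, 26), (30, 35)]
annotator_2 = [(4, 11), (20, 26), (31, 35)]  # last span differs by one character

print(round(span_agreement(annotator_1, annotator_2), 3))
```

A looser criterion, such as counting overlapping spans as matches, would give higher figures, so the choice of matching rule matters when comparing agreement numbers.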

Page 14: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Sentiment expressions

• Evaluations
• Target mentions
• Prior polarity:
– Semantic orientation given target
– positive, negative, neutral, mixed

[Figure: example sentiment expressions. "great engine": prior polarity positive; "highly priced": prior polarity negative; "… a highly spec’ed": prior polarity positive]

Page 15: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Sentiment expressions

• Occurrences in corpus: 10K

• 13% are multi-word
– like no other, get up and go

• 49% are headed by adjectives

• 22% nouns (damage, good amount)

• 20% verbs (likes, upset)

• 5% adverbs (highly)

Page 16: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Sentiment expressions

• 75% of sentiment expression occurrences have non-evaluative uses in corpus

• “light”
– …the car seemed too light to be safe…
– …vehicles in the light truck category…

• 77% of sentiment expression occurrences are positive

• Inter-annotator agreement:
– 75% spans, 66% targets, 95% prior polarity

Page 17: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Modifiers -> contextual polarity

NEGATORS
– not a good car
– not a very good car

INTENSIFIERS
– a very good car (UPWARD)
– a kind of good car (DOWNWARD)

NEUTRALIZERS
– if the car is good
– I hope the car is good

COMMITTERS
– I am sure the car is good (UPWARD)
– I suspect the car is good (DOWNWARD)
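To make the four modifier types concrete, the sketch below derives a contextual evaluation from a prior polarity by applying modifiers one at a time. Every rule in it is an assumption for illustration: the slides name the modifier types and their directions but do not define how they combine, and the numeric factors are arbitrary.

```python
from dataclasses import dataclass, replace

FLIP = {"positive": "negative", "negative": "positive"}


@dataclass(frozen=True)
class Evaluation:
    polarity: str            # positive, negative, neutral, or mixed
    strength: float = 1.0    # adjusted by intensifiers (hypothetical scale)
    certainty: float = 1.0   # adjusted by committers (hypothetical scale)


def apply_modifier(ev, kind, direction=None):
    """Apply one modifier to an evaluation (assumed semantics)."""
    if kind == "NEGATOR":
        return replace(ev, polarity=FLIP.get(ev.polarity, ev.polarity))
    if kind == "NEUTRALIZER":
        return replace(ev, polarity="neutral")
    factor = 1.5 if direction == "UPWARD" else 0.5
    if kind == "INTENSIFIER":
        return replace(ev, strength=ev.strength * factor)
    if kind == "COMMITTER":
        return replace(ev, certainty=ev.certainty * factor)
    raise ValueError(f"unknown modifier kind: {kind}")


def contextual(prior_polarity, modifiers):
    """Apply modifiers innermost first and return the resulting evaluation."""
    ev = Evaluation(prior_polarity)
    for kind, direction in modifiers:
        ev = apply_modifier(ev, kind, direction)
    return ev


# "not a very good car": "very" intensifies "good", then "not" negates it.
print(contextual("positive", [("INTENSIFIER", "UPWARD"), ("NEGATOR", None)]))
# "if the car is good": the conditional neutralizes the evaluation.
print(contextual("positive", [("NEUTRALIZER", None)]))
# "I suspect the car is good": lower commitment to the evaluation.
print(contextual("positive", [("COMMITTER", "DOWNWARD")]))
```

Keeping strength and certainty separate from polarity lets intensifiers and committers leave the polarity itself untouched, which is how the slide's UPWARD and DOWNWARD labels read.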

Page 18: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Other annotations

• Speech events (not sourced from author)
– John thinks the car is good.

• Comparisons:
– Car X has a better engine than car Y.
– Handles a variety of cases
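The motivating example labels comparison links as DIMENSION, MORE, and LESS. A minimal sketch of how such a record might be held and queried follows. The slot assignment for "Car X has a better engine than car Y" and the rule for picking the favoured entity are assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    trigger: str     # comparative word, e.g. "better"
    dimension: str   # what is being compared, e.g. "engine"
    more: str        # entity ranked higher on the trigger's scale
    less: str        # entity ranked lower on the trigger's scale


def favoured(comparison, trigger_polarity):
    """Return the entity the comparison favours (assumed rule)."""
    if trigger_polarity == "positive":
        return comparison.more
    if trigger_polarity == "negative":
        return comparison.less
    return None  # neutral or mixed triggers favour neither side


better_engine = Comparison("better", "engine", more="car X", less="car Y")
print(favoured(better_engine, "positive"))
```

With a negative trigger such as "worse", the same record shape would favour the LESS entity instead.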

Page 19: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Outline

• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources

Page 20: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Possible tasks

• Detecting mentions, sentiment expressions, and modifiers

• Identifying targets of sentiment expressions, modifiers

• Coreference resolution

• Finding part-of, feature-of, etc. relations

• Identifying errors/inconsistencies in data

Page 21: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Possible tasks

• Exploring how elements interact:
– Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems or those trained on other domains
• How do relations between entities transfer sentiment?
– The car’s paint job is flawless but the safety record is poor.
• Solution to one task may be useful in solving another.

Page 22: The 2010 JDPA Sentiment Corpus for the Automotive Domain

But wait, there’s more!

• 180 digital camera blog posts were annotated

• Total of 223,001 + 108,593 = 331,594 tokens

Page 23: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Outline

• Motivating example
– Elements combine to render entity-level sentiment
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources

Page 24: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Other resources

• MPQA Version 2.0
– Wiebe, Wilson and Cardie (2005)
– Largely professionally written news articles
– Subjective expression: “beliefs, emotions, sentiments, speculations, etc.”
– Attitude, contextual sentiment on subjective expressions
– Target, source annotations
– 226K tokens (JDPA: 332K)

Page 25: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Other resources

• Data sets provided by Bing Liu (2004, 2008)
– Customer-written consumer electronics product reviews
– Contextual sentiment toward mention of product
– Comparison annotations
– 130K tokens (JDPA: 332K)

Page 26: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Thank you!

• Obtaining the corpus:
– Research and educational purposes
– [email protected]
– June 2010
– Annotation guidelines: http://www.cs.indiana.edu/~jaskessl

• Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden

Page 27: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Top 20 annotations by type

Page 28: The 2010 JDPA Sentiment Corpus for the Automotive Domain

Inter-annotator agreement