crowdtruth for digital hermeneutics
DESCRIPTION
TRANSCRIPT
CrowdTruth for Digital Hermeneutics
Human-assisted computing for understanding of events in cultural heritage
Lora Aroyo http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
The Gallery of Cornelis van der Geest
DIGITAL HERMENEUTICS
theory of interpretation: relation parts of wholes events as context for interpretation of online collections
intersection of hermeneutics & Web technology
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Enrichment with Events
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Events Narrative
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Demo Datasets Rijksmuseum
– 159,860 Artworks – 71,851 Concepts – 73,374 Persons
Sound & Vision – 10,000 Videos – 172,000 Concepts
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Issues
• How to get (extract) the events? – needs to be scalable and automated
• How to collect ground truth for training? – experts don’t do as good job as they think
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Flickr: vanilllaph
What are Events? events perdure = their parts exist at different time points
objects endure = they have all their parts at all points in time
objects are wholly present at any point in time, events unfold over time
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Events are Vague humans have no clear notion of what events are
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Events have Perspec/ves and people don’t always agree
“event is a significant happening or gathering of people. I would define a happening as an event if the group of people gathered were united in one common goal.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
We asked the crowd what an EVENT is ...
“Event is a happening, which can be scheduled or unscheduled. An earthquake or fire happens (unscheduled). A wedding or birthday party (scheduled). It is an occasion that is unusual and tends to be memorable.”
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
“An event would be any occurrence where physical action has taken place. It may be a single, momentary instance (I sneezed), or it may span a period of time (the festival ran for four hours). An event may also be made up of a number of smaller events, such as a day at school is an event, but each individual class is also an event itself. Basically an event must have a physical action over any delimited time span.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
“A planned public or social get together or occasion.”
“an event is an incident that's very important or monumental”
“An event is something occurring at a specific time and/or date to celebrate or recognize a particular occurrence.”
“a location where something like a function is held. you could tell if something is an event if there people gathering for a purpose.”
“Event can refer to many things such as: An observable occurrence, phenomenon or an extraordinary occurrence.”
We asked the crowd what an EVENT is ...
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
“an event is the exemplifica:on of a property by a substance at a given :me” Jaegwon Kim, 1966 “events are changes that physical objects undergo” Lawrence Lombard, 1981
“events are proper:es of spa:otemporal regions”, David Lewis, 1986
under30ceo.com http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Gold Standard Assumption
• Systems need to be told what is right & what is wrong with a gold standard or ground truth
• Performance is measured on test sets veHed by human experts à never perfect, always improving against test data
• Historically, gold standards are created assuming that for each annotated instance there is a single right answer
• Gold standard quality is measured in inter-annotator agreement à does not account for perspec:ves, for reasonable alterna:ve interpreta:ons
HOW DO WE SCALE & AUTOMATE SOMETHING FOR WHICH THERE IS SO MUCH DISAGREEMENT?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Position Annotator disagreement is not noise, but signal. Not a problem to overcome but a source of information for machines
Artificially restricting humans does not help machines to learn.
They will learn better from diversity
Crowd Truth Annotator disagreement is indicative of the variation in human semantic interpretation of signs, and can indicate ambiguity, vagueness, over-generality, etc.
http://www.freefoto.com/preview/01-47-44/Flock-of-Birds
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
HOW DO WE COLLECT & REPRESENT DISAGREEMENT SO THAT IT CAN BE
HARNESSED?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
• Use crowdsourcing to get multiple perspectives (in the collection)
• Automatically generate examples (for scalability) • Multiple people annotate each example • Represent the annotations (the result) in a way that
captures the disagreement
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
CrowdTruth Framework for Medical Relation Extraction
Aroyo, L., Welty, C.: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. WebSci2013. ACM, 2013 Aroyo, L., Welty, C.: Truth is a Lie: 7 Myths about Human Annotation, AI Magazine, 2014 (in print)
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
CrowdTruth Framework for News Event Extraction
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
The police came to Apple’s glass cube on Fifth Avenue on Tuesday to enforce order after activists released black balloons inside the cube to [protest] the company’s environmental policies.
The police came to Apple’s glass cube on Fifth Avenue on Tuesday [to enforce] order after activists released black balloons inside the cube to protest the company’s environmental policies.
The police came to Apple’s glass cube on Fifth Avenue on Tuesday [to enforce order] after activists released black balloons inside the cube to protest the company’s environmental policies.
The police came to Apple’s glass cube on Fifth Avenue on Tuesday to enforce order after activists [released] black [balloons] inside the cube to protest the company’s environmental policies.
The police [came]to Apple’s glass cube on Fifth Avenue on Tuesday [to enforce] order after activists released black balloons inside the cube to protest the company’s environmental policies.
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Event Semantics are Hard event type: Semafor event location type: GeoNames event time type: Allen’s time theory, KSL time ontology event participant type: based on proper nouns classes
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Inel, O., Aroyo, L., et al (2013). Domain-independent Quality Measures for Crowd Truth Disagreement, DeRIVE2013
Events have multiple DIMENSIONS M
icro
-task
Tem
plat
e
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Each DIMENSION has different GRANULARITY M
icro
-task
Tem
plat
e
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
People have different POINTS OF VIEWS M
icro
-task
Tem
plat
e
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Why do people disagree?
Sign
Reference Observer
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Why do people disagree?
Sentence
Annotation Task Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Disagreement Analytics
• sentence metrics: sentence clarity, sentence-relation score • annotation task metrics: event clarity, event type similarity, relation ambiguity • worker metrics:
o worker-sentence disagreement o worker-worker disagreement o avg number of annotations per sentence o valid words in explanation text o same explanation across contributions o “[OTHER]” + different type o time to complete, number of sentences, etc.
Soberón G.,Aroyo, L., et al (2013): Crowd truth metrics. CrowdSem2013 Workshop Aroyo, L., Welty, C.: (2013) Measuring crowd truth for medical relation extraction. AAAI Fall Symposium on Semantics for Big Data
http://www.americanprogress.org/wp-content/uploads/2012/12/multiple_measures_onpage.jpg http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Experimental Setting
• 70 putative events • 8 experiments 2 for each
event role filler • annotations 15 workers
per putative event • max annot./worker 10 • workers native English
speakers on CF
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Annotation Example Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace demonstrators [ENTERED] the cube clutching helium-filled balloons, which were the shape and color of charcoal briquettes. Overall annotation & granularity distribution:
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Even
t Typ
e D
isag
reem
ent
[ENTERED]
ACTION (18.2%)
MOTION (9.1%)
ARRIVING_OR_ DEPARTING (54.5%)
PURPOSE (18.2%)
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace demonstrators [ENTERED] the cube clutching helium-filled balloons, which were the shape and color of charcoal briquettes.
type type
type type
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Even
t Loc
atio
n D
isag
reem
ent
[ENTERED]
the cube (38.5%)
cube (38.5%)
none (23%)
NOT APPLICABLE
(100%)
OTHER (100%) type
type
COMMERCIAL (40%)
OTHER (40%)
INDUSTRIAL (20%)
type
type
type
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace demonstrators [ENTERED] the cube clutching helium-filled balloons, which were the shape and color of charcoal briquettes.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Even
t Tim
e D
isag
reem
ent
[ENTERED]
Around (9.1%)
Around 2:30 p.m. (45.45%)
2:30 p.m. (45.45%)
TIMESTAMP (100%)
TIMESTAMP (100%)
type
type
TIMESTAMP (100%)
type
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace demonstrators [ENTERED] the cube clutching helium-filled balloons, which were the shape and color of charcoal briquettes.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Even
t Par
ticip
ant D
isag
reem
ent
[ENTERED]
Greenpeace (15.39%)
demonstrators (15.39%)
Greenpeace demonstrators
(69.23%)
PERSON (100%)
ORGANIZATION (100%)
type
type
ORGANIZATION (77.77%)
PERSON (22.22%)
type
type
Around 2:30 p.m., as if delivering birthday greetings, several Greenpeace demonstrators [ENTERED] the cube clutching helium-filled balloons, which were the shape and color of charcoal briquettes.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Comparative Annotation Distribution Event Type Distribution Time Type Distribution
The high disagreement for event type across all sentences likely indicates problems with the ontology. These event types are difficult to distinguish between. The event classes may overlap, be confusable, too vague, etc.
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Comparative Annotation Distribution Location Type Distribution Participant Type Distribution
Sentence
Ontology Worker
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Sentence Clarity Identifies sentences that are unclear or ambiguous based on the distribution of types
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Spam Detection
From the TRIANGLE we aim to efficiently remove the spam & low-quality contributors:
o we filter sentences based on their clarity score first in order to avoid penalizing workers for contributing on difficult or ambiguous sentences; When bad sentences are identified we remove them and see significant increase of accuracy on spam detection
o apply the worker metrics to analyze worker agreement to identify workers who systematically disagree (1) with the opinion of the majority (worker-sentence disagreement), or (2) with the rest of their co-workers (worker-worker disagreement); When spammers are identified we remove their annotations and the accuracy of the sentence metrics improves
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
What more … Understand human disagreement on event extraction with focus on ambiguity:
○ Would different classification (ontology) of putative
events perform better? ○ Does the overlapping of the types (ontology) influence
the results? ○ Identify the right role fillers (per event) for multiple
putative events. ○ Would event clustering help with determining the most
appropriate structure of the event and its role fillers?
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Conclusions
● Capturing events is important for digital hermeneutics
● Understanding disagreement helps understand event semantics
● Considering the interdependence of the different aspects of the annotations improves their quality
● Disagreement metrics adaptable across domains - helped us to understand the vagueness and the clarity of a sentence/putative event
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Sentence
Annotation task Worker
The Crowd Truth Crew
• Lora Aroyo, PI (VU) • Chris Welty, PI (IBM) • Robert-‐Jan Sips (IBM) • Anca Dumitrache, PhD candidate (VU-‐IBM) • Oana Inel, Lukasz Romasko, researchers (VU) • Students: Khalid, Rens, Benjamin, TaXana, HarrieYe • Engineers: Jelle, Arne (IBM)
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
hYp://crowd-‐watson.nl
AGORA Eventing History
Susan Legêne Chiel van den Akker
VU History department
VU Computer Science
Guus Schreiber Lora Aroyo
Geertje Jacobs
Johan Oomen
http://agora.cs.vu.nl
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo
Events @ • Agora: Historical Events in Cultural Heritage Collections
– http://agora.cs.vu.nl/
• Extractivism: Activist Events in Newspapers – http://mona-project.org/
• Semantics of History
– http://www2.let.vu.nl/oz/cltl/semhis/
• BiographyNet: Events Change in Perspective over Time
• NewsReader: Multilingual Events & Storylines in Newspapers – http://www.newsreader-project.eu/
http://lora-aroyo.org http://slideshare.net/laroyo @laroyo