www.amiproject.org
Collaborative Annotation of the AMI Meeting Corpus
Jean Carletta
University of Edinburgh
Carletta, 20 June 2007
AMI Partners
NXT Major Development Sites
AMI's aim
• aim: to develop technologies for browsing meetings and to assist people during meetings
• interdisciplinary: signal processing, language engineering, theoretical linguistics, human-computer interfaces, organizational psychology, ...
Why annotation?
• For basic scientific understanding, e.g.:
  • How do people choose a next speaker?
  • What is the relationship between speech and gesture during deixis?
• For machine learning:
  • Hand-code e.g. statement vs. question
  • Identify features for each, like word sequences and prosody
  • Use the data to fit a statistical classifier that codes new data automatically
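The machine-learning loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not name a classifier, so a multinomial Naive Bayes model with add-one smoothing is swapped in; only word features are used (no prosody); and the hand-coded utterances in `HAND_CODED` are invented toy data, not taken from the AMI corpus.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # add-one smoothing so unseen words do not zero out the score
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded data (invented for illustration only).
HAND_CODED = [
    ("do we have a budget", "question"),
    ("what is the deadline", "question"),
    ("is the remote finished", "question"),
    ("the budget is fixed", "statement"),
    ("we have a deadline", "statement"),
    ("the remote is finished", "statement"),
]

model = train(HAND_CODED)
print(classify(model, "what is the budget"))
```

A real system would add prosodic features and far more training data; the point here is only the shape of the pipeline: hand-coded labels in, automatic labels out.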
AMI Meeting Rooms
4 close- and 2 wide-view cameras, 4 head-set and 8 array microphones, presentation screen capture, whiteboard capture, pen devices, plus extra site-dependent devices
TNO Edinburgh IDIAP
IS1004d, 3:07 - 4:11
Corpus Overview
• 100 hrs of well-recorded meetings
• orthographically transcribed with word timings by forced alignment
• ASR output
• heavily annotated by hand for communicative behaviours
• Creative Commons Share-Alike licensing, with demo DVD
Hand Annotations
• transcription with word-level timings from forced alignment (100%)
• timestamping against signal (10-30%)
  • head gestures; hand gestures for addressing and interactions with objects; location in room; gaze; emotion?
• discourse structure (70%)
  • dialogue acts (some w/ addressing), named entities, topic segments, linked extractive and abstractive summaries
Costs in person-hrs/hr
Annotation                                   Person-hrs per hr of meeting
transcription                                30
topic segments + abstractive summaries       6-10
dialogue acts w/ some relations              20
addressing                                   12
extractive summaries linked to abstract      1
named entities                               2-5
hand gestures (rough timings)                6
head gestures (rough timings)                6
head gestures (precision timings)            20
movement around room                         4
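As a rough worked example of what these rates imply, the sketch below multiplies a per-hour cost by corpus size and coverage. The 100-hour corpus size and the 100% transcription coverage come from the Corpus Overview and Hand Annotations slides; the function name `total_effort` is hypothetical, and the result is a back-of-envelope estimate, not a reported project figure.

```python
# Person-hours of effort per hour of meeting, copied from the cost table.
COST_PER_HOUR = {
    "transcription": 30,
    "dialogue acts w/ some relations": 20,
    "addressing": 12,
}


def total_effort(cost_per_hour, corpus_hours, coverage):
    """Person-hours needed to annotate a fraction `coverage` (0..1) of the corpus."""
    return cost_per_hour * corpus_hours * coverage


# Transcription covers 100% of the 100-hour corpus.
transcription_effort = total_effort(COST_PER_HOUR["transcription"], 100, 1.0)
print(transcription_effort)
```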
Core Problems
• How do we represent all of these kinds of annotation on the same base data, including both structural relationships and timing?
• How do we allow for multiple (human and machine) annotations of the same property, so that we can compare them?
NITE XML Toolkit
• Mature toolkit for handling annotations with temporal ordering and full structural relations
• Data storage format designed to support distributed corpus development
• Libraries for data handling, query, and writing graphical user interfaces
• End user annotation tools for common tasks
• Command line utilities for analysis, feature extraction
• Open source
NXT corpus design
• data model is multi-rooted tree with arbitrary graph structure over the top
  • each node has one set of children, multiple parents
• annotations often naturally map to a tree
  • corpus design to decide where trees intersect
• NXT can represent arbitrary graphs but the more the data has this character, the less useful the query language is
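To make the multi-rooted tree idea concrete, here is a minimal sketch in which two annotation trees intersect at the word level. The `Node` class and the labels (`suggest`, `product-part`) are hypothetical and are not part of NXT's own API; the sketch only shows the "one set of children, multiple parents" property.

```python
class Node:
    """A node with one ordered set of children and any number of parents."""

    def __init__(self, node_id, label):
        self.node_id = node_id
        self.label = label
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

    def descendants(self):
        """Yield every node below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()

    def roots(self):
        """Return the ids of all tree roots that dominate this node."""
        if not self.parents:
            return {self.node_id}
        found = set()
        for parent in self.parents:
            found |= parent.roots()
        return found


# Shared base layer: the words.
words = [Node(f"w{i}", text) for i, text in enumerate(["the", "red", "button"])]

# Two independent annotation trees that intersect at the words.
dialogue_act = Node("da1", "suggest")
named_entity = Node("ne1", "product-part")
for w in words:
    dialogue_act.add_child(w)
for w in words[1:]:
    named_entity.add_child(w)

print(sorted(words[2].roots()))
```

The corpus design decision mentioned on the slide corresponds to choosing which layer (here, the words) the separate trees share.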
Stand-off XML
extract from Bdb001.A.words.xml
    <w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
    <w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
    <w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>
extract from Bdb001.A.speech-quality.xml
    <speechquality nite:id="Bdb001.emphasis.16" type="emphasis">
        <nite:child href="Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)" />
    </speechquality>
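The stand-off link can be followed by hand to see how it works. The sketch below parses the `href` range from the speech-quality extract and collects the words it covers. It is not NXT's own loader: the wrapping `<root>` element, the namespace URI bound to the `nite:` prefix, and the helper names `parse_href` and `resolve` are assumptions made so the fragment parses on its own.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the nite: prefix (not shown in the extract).
NITE_NS = "http://nite.sourceforge.net/"

# Words extract, wrapped in a hypothetical root so it is a complete document.
WORDS_XML = f"""<root xmlns:nite="{NITE_NS}">
<w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
<w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
<w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>
</root>"""

HREF = "Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)"


def parse_href(href):
    """Split 'file#id(a)..id(b)' or 'file#id(a)' into (file, first_id, last_id)."""
    match = re.fullmatch(r"([^#]+)#id\((.+?)\)(?:\.\.id\((.+?)\))?", href)
    if match is None:
        raise ValueError(f"unrecognised href: {href}")
    filename, first, last = match.groups()
    return filename, first, last or first


def resolve(words_root, first_id, last_id):
    """Return the text of every <w> from first_id through last_id, in file order."""
    id_attr = f"{{{NITE_NS}}}id"
    selected, inside = [], False
    for w in words_root.iter("w"):
        if w.get(id_attr) == first_id:
            inside = True
        if inside:
            selected.append(w.text)
        if w.get(id_attr) == last_id:
            break
    return selected


root = ET.fromstring(WORDS_XML)
filename, first_id, last_id = parse_href(HREF)
covered = resolve(root, first_id, last_id)
print(filename, covered)
```

Because the emphasis annotation only points at word ids, the words file and the speech-quality file can be edited by different people without touching each other.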
Metadata file
Like a set of DTDs for the XML files, plus:
• connections between the files
• list of "observations" (coded dialogues/group discussions/texts)
• catalog for finding signals and data on disk
Simple example query
($w word)($r reference): ($w@POS = "NN") && ($r ^ $w)
Return list of 2-tuples of words and referring expressions where the word’s part of speech is NN and the word is in the referring expression.
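The meaning of this query can be spelled out as a plain nested loop. The sketch below is not the NXT query engine: the `Word` and `Reference` classes and the toy data are hypothetical, and dominance (`^`) is simplified to direct containment of a word in a referring expression.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    pos: str


@dataclass
class Reference:
    label: str
    words: list = field(default_factory=list)


def query(words, references):
    """Return (word, reference) pairs where the word is an NN inside the reference."""
    return [
        (w, r)
        for w in words                      # ($w word)
        for r in references                 # ($r reference)
        if w.pos == "NN"                    # ($w@POS = "NN")
        and any(w is child for child in r.words)  # ($r ^ $w), direct children only
    ]


# Invented toy data: "the button works", with "the button" as a referring expression.
the, button, works = Word("the", "DT"), Word("button", "NN"), Word("works", "VBZ")
ref1 = Reference("ref1", [the, button])

matches = query([the, button, works], [ref1])
print([(w.text, r.label) for w, r in matches])
```

Each match is a 2-tuple, one element per declared variable, which is why the result is a list of pairs rather than a list of words.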
General features of the language
• Match variable by no type, single type, or disjunctive type
• Attribute and content tests for existence, ordering, equality, match to regexp
• The usual boolean combinators
• Quantifiers forall and exists
• Filtering by passing results to another query to create a result tree (not list)
Uses for queries
• Exploring the data in a browser
• Basic frequency counts
• Verifying data quality
• Indexing complexes for further use
• Finding things for screen rendering in GUI
Only configuration needed to:
• search/index data in NXT format
• display data in a standardized (ugly) way
• set up annotation tools for some common tasks
  • dialogue act
  • named entity
  • time-stamped labelling
• [named entity demo]
Programming tailored interfaces
• development time is 1.5 days - 2 weeks depending on
  • how clear the spec is
  • complexity of the interface and whether our "transcription view" middleware fits
  • familiarity with Swing
Named entity coder
Summary
• NXT provides infrastructure for collaborative annotation that
  • Is distributed
  • Provides structural relationships
  • Provides timing w.r.t. signals
  • Works for large-scale projects
• NXT’s best current demonstration is in the AMI Meeting Corpus