automatic disfluency detection in multi-party conversations...
TRANSCRIPT
ww
w.a
mip
roje
ct.o
rgGerman Research Center forArtificial Intelligence GmbH
FeastFeast, 30th September 2009, 30th September 2009Sebastian Sebastian GermesinGermesin
Automatic DisfluencyAutomatic DisfluencyDetection in Multi-partyDetection in Multi-party
ConversationsConversations
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 092
German Research Center forArtificial Intelligence GmbH
OutlineOutline• Motivation• Theoretical Background• Data (AMI Corpus)• Disfluency Detection System
• Hybrid Classification Approach• Self-arranging Modules• Experimental Results
• Conclusions & Outlook
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 093
German Research Center forArtificial Intelligence GmbH
MotivationMotivationExampleExample
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 094
German Research Center forArtificial Intelligence GmbH
MotivationMotivation• Have to detect (and clean) disfluencies
in the transcribed speech• Readability
• Transcription• Extractive Summarization
• Post-Processing• NLP-systems’ performance drop when faced with
disfluent speech
• Human detector?• Too expensive!• Too slow!
⇒Automatic Detection System!
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 095
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“Disfluencies are syntactical and grammatical[speech] errors that occur in spoken but notin written language.” [Besser, 2006]
DefinitionDefinition
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 096
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“The cat uh the dog sneaks around the corner.”
TerminologyTerminology
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 097
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“The cat uh the dog sneaks around the corner.”
TerminologyTerminology
Reparandum
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 098
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“The cat uh the dog sneaks around the corner.”
TerminologyTerminology
Reparandum
Interregnum
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 099
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“The cat uh the dog sneaks around the corner.”
TerminologyTerminology
Reparandum Reparans
Interregnumcomplex
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0910
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical Background
“The d dog sneaks around the corner.”
TerminologyTerminology
Reparandum
simple
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0911
German Research Center forArtificial Intelligence GmbH
Theoretical BackgroundTheoretical BackgroundAll TypesAll Types
Simple disfluencies
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0912
German Research Center forArtificial Intelligence GmbH
DataData
• AMI meeting corpus• 135 meetings (~ 100 hours speech)• 4 participants• task: design a remote control• freely interaction• Many annotations, e.g.:
• Transcribed speech• Dialogue acts• Gestures• ...
quantitativequantitative
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0913
German Research Center forArtificial Intelligence GmbH
DataData
• 45 meeting enriched with disfluencyannotation• 31,000 Disfluencies• 15.8% erroneous words• 41.5% disfluent Dialogue Acts• 80% (33) for training• 20% (12) for evaluation
quantitativequantitative
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0914
German Research Center forArtificial Intelligence GmbH
DataData
• Discovered a heterogeneity towards thestrictness of different disfluency types1. Some disfluencies have strict structure
• ex.: Repetition : “The cat the cat plays “2. Some other disfluencies have also strict structure but
this structure is very common in natural language• ex.: Replacement : “The dog the cat plays“• ex.: Fluent : “The dog the cat and the bird play”
3. Some other disfluencies have no obvious structure• ex.: Disruptions : “The dog the cat and“• ex.: Order : “The plays cat”
qualitativequalitative
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0915
German Research Center forArtificial Intelligence GmbH
Automatic SystemAutomatic SystemDesign QuestionDesign Question
• Can we leverage the heterogeneityof disfluencies for their detection?→Yes!
→ Use modules for subsets of disfluencies→ Use different feature-sets for each module
(depending on the disfluency types)→ Find “optimal” classifier for each module
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0916
German Research Center forArtificial Intelligence GmbH
Automatic SystemAutomatic SystemHybrid ModulesHybrid Modules
• SHS:• Stuttering, Hesitation, Slip-of-the-Tongue
• REP:• Repetition
• DNE:• Discourse Marker, Explicit Editing Term
• DEL:• Deletion
• REV:• Insertion, Replacement, Restart, Other
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0917
German Research Center forArtificial Intelligence GmbH
How toHow to combine the modules?combine the modules?
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0918
German Research Center forArtificial Intelligence GmbH
Training ProcessTraining ProcessSelf-arranging ModulesSelf-arranging Modules
• Immense search space• #(modules) * #(classifier) * placeInSystem
• Solution(s):• Old system:
• Choosen manually• Current system:
• Automatically trained1.Use greedy hill-climbing
– Use weight for errors to improve Precision!2.Reduce classifier library
– Take 10% results in maximal performance lossof 2.3% (depending on the module)
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0919
German Research Center forArtificial Intelligence GmbH
GroDiGroDiGreedy Hill-ClimbingGreedy Hill-Climbing
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0920
German Research Center forArtificial Intelligence GmbH
Training ProcessTraining ProcessSelf-arranging ModulesSelf-arranging Modules
• Immense search space• #(modules) * #(classifier) * placeInSystem
• Solution(s):• Old system:
• Choosen manually• Current system:
• Automatically trained1.Use greedy hill-climbing
– Use weight for errors to improve Precision!2.Reduce classifier library
– Take 10% results in maximal performance lossof 2.3% (depending on the module)
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0921
German Research Center forArtificial Intelligence GmbH
GroDiGroDiPerformance-Curve of J48Performance-Curve of J48
Best: J48 "-L -U -M 2 -A"
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0922
German Research Center forArtificial Intelligence GmbH
Experimental ResultsExperimental Results
93.5 %94.5 %12 m.
94.7 %95.1 %6 m.33 m.
new 0.11
94.8 %95.3 %6 m.22 m.
0.4290.5 %92.9 %6 m.22 m.old
83.3 %88.6 %12 m.0.00
85.7 %90.3 %6 m.--baseline
RT-factoravg. F1AccuracyEval.data
Train.dataSystem
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0923
German Research Center forArtificial Intelligence GmbH
ConclusionsConclusions• Aims:
• Development of a system that automaticallydetects a broad set of disfluencies
• Fully automatic learning process• Robust and Fast
• Achievements:• Stand-alone tool for detection of disfluencies:
GroDi - Get rid of Disfluencies• Self-arranging modules• Detection rate: 95% Accuracy• Real-time factor of 0.11
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0924
German Research Center forArtificial Intelligence GmbH
OutlookOutlook
• Develop module(s) for the detection ofMistake, Order, Omission
• Embed other learning approaches, e.g.:• Conditional Random Fields• HMMs
• Use other corpus like, e.g., Switchboard
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0925
German Research Center forArtificial Intelligence GmbH
Thank you!Thank you!
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0926
German Research Center forArtificial Intelligence GmbH
Demo?Demo?
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0927
German Research Center forArtificial Intelligence GmbH
GroDiGroDiDiff. Module ArrangementsDiff. Module Arrangements
ww
w.a
mip
roje
ct.o
rg
Sebastian Germesin September 0928
German Research Center forArtificial Intelligence GmbH
GroDiGroDi
Used technologies WEKA toolkit for machine learning Maximum Entropy classifier from Stanford NLP group CRF Tagger from http://crftagger.sourceforge.net/
Features for machine learning: Lexical: words, lexical parallelism, (POS-Tags) Prosodic: duration, pauses, pitch, energy Dynamic: disfluency types of surrounding words Speaker: age, role in meeting, native language