Conversation Disentanglement in Sports Discourse

Post on 14-Jan-2016


Anthony Wong, 6/01/11

Importance of Topic

What is conversation disentanglement?
◦ A clustering task: dividing a transcript into a number of smaller, separate conversations

Conversation disentanglement has a couple of practical applications:
◦ Summary generation
◦ User-interface systems like automatic threading

Basis of my Approach

Michael Elsner and Eugene Charniak (2008)
◦ Uses lexical and non-lexical features to cluster different threads
◦ Time between utterances, same speaker, number of shared words, “content” words
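As a rough illustration of the feature types listed above, the sketch below computes them for a pair of utterances. It is not the Elsner/Charniak implementation: the `Utterance` class, the `pair_features` function, the tiny stopword list, and the idea of treating "content" words as non-stopwords are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    time: int      # timestamp; the unit (seconds) is an assumption
    speaker: str
    text: str


# Tiny illustrative stopword list (hypothetical); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "i", "that", "this", "in"}


def tokens(text):
    """Return the set of lowercased word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def pair_features(a, b):
    """Compute simple pairwise features for two utterances."""
    shared = tokens(a.text) & tokens(b.text)
    return {
        "time_gap": abs(b.time - a.time),
        "same_speaker": a.speaker == b.speaker,
        "shared_words": len(shared),
        # "Content" words are approximated here as shared non-stopwords.
        "shared_content_words": len(shared - STOPWORDS),
    }


if __name__ == "__main__":
    u1 = Utterance(10, "fan1", "The Raptors defense is terrible")
    u2 = Utterance(25, "fan2", "Raptors need to fix the defense")
    print(pair_features(u1, u2))
```

A classifier could then be trained on such feature dictionaries to decide whether two utterances belong to the same thread.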

Proposed Project Overview

Follow the methodology in Elsner and Charniak’s paper
◦ Create and annotate a dataset of sports discourse

Use the existing Elsner/Charniak model to provide baseline classification results and see how well their model adapts to a different chat domain

Test out different feature combinations to hopefully raise performance

? – Compare results with the Elsner/Charniak paper in some meaningful way

Progress so far

Retrieving and preparing data

Annotating the data

T1 715 KateC : Sam - this is going to be painful, isn't it?
T1 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T2 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T3 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T3 715 Arnold : Holy impossibilities , Batman - that won't happen.
T4 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T5 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T5 715 ZachHarper : I don't think it works that way
T6 715 Aras : Jared!
T6 715 JaredWade : Aras.
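The annotated lines above appear to follow a "thread label, number, speaker, colon, text" layout. A minimal sketch of reading that layout is given below, assuming the second field (715 in the sample) is an integer timestamp and that speaker names contain no spaces. The names `LINE_RE`, `parse_line`, and `group_by_thread` are made up for this example and are not part of the Elsner code.

```python
import re
from collections import defaultdict

# Assumed layout: "<thread> <time> <speaker> : <text>", inferred from the sample only.
LINE_RE = re.compile(r"^(T\d+)\s+(\d+)\s+(\S+)\s*:\s*(.*)$")


def parse_line(line):
    """Split one annotated line into (thread, time, speaker, text)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    thread, time, speaker, text = match.groups()
    return thread, int(time), speaker, text


def group_by_thread(lines):
    """Map each thread label to its list of (speaker, text) pairs, in order."""
    threads = defaultdict(list)
    for line in lines:
        thread, _time, speaker, text = parse_line(line)
        threads[thread].append((speaker, text))
    return dict(threads)


if __name__ == "__main__":
    sample = [
        "T5 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway",
        "T5 715 ZachHarper : I don't think it works that way",
        "T6 715 Aras : Jared!",
        "T6 715 JaredWade : Aras.",
    ]
    for label, utterances in group_by_thread(sample).items():
        print(label, [speaker for speaker, _ in utterances])
```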

The annotated part of this transcript has 399 lines.
177 unique threads.
The average conversation length is 2.25423728814.
The median conversation length is 2.
The entropy is 7.0155726118 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 6.25706940874 threads.
The line-averaged conversation density is 2.77944862155.
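Some of these summary statistics can be reproduced from a list of per-line thread labels. The sketch below is an illustration only. It assumes "entropy" means the Shannon entropy of the distribution of lines over threads, which may differ from how the original scripts define it. The function name `thread_stats` is hypothetical, and the interruption, block, and density figures are not covered.

```python
import math
from collections import Counter
from statistics import median


def thread_stats(labels):
    """Summarise a non-empty list of per-line thread labels."""
    sizes = Counter(labels)          # thread label -> number of lines
    n_lines = len(labels)
    # Shannon entropy (bits) of the share of lines in each thread (assumed definition).
    entropy = -sum((c / n_lines) * math.log2(c / n_lines) for c in sizes.values())
    return {
        "lines": n_lines,
        "threads": len(sizes),
        "mean_length": n_lines / len(sizes),
        "median_length": median(sizes.values()),
        "entropy_bits": entropy,
    }


if __name__ == "__main__":
    # Thread labels of the ten annotated sample lines.
    labels = ["T1", "T1", "T2", "T3", "T3", "T4", "T5", "T5", "T6", "T6"]
    print(thread_stats(labels))
```

The average length is simply lines divided by threads, which is consistent with the reported figure: 399 / 177 ≈ 2.254.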

Running Elsner model as is

T1 715 KateC : Sam - this is going to be painful, isn't it?
T2 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T3 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T4 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T5 715 Arnold : Holy impossibilities , Batman - that won't happen.
T6 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T7 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T8 715 ZachHarper : I don't think it works that way
T9 715 Aras : Jared!
T9 715 JaredWade : Aras.

368 unique threads.
The average conversation length is 1.08423913043.
The median conversation length is 1.
The entropy is 8.48485646504 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 9.52699228792 threads.
The line-averaged conversation density is 1.42355889724.

Editing the model and evaluation

Still in progress
◦ A lot of room for improvement
◦ Many different feature combinations to try

Need to get evaluation code running
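Until the evaluation code is running, one simple way to compare model output against the hand annotation is a local agreement score. For each utterance and each of the k utterances before it, check whether both labelings make the same "same thread / different thread" decision. The sketch below is an assumption about what such a check could look like, not the evaluation shipped with the Elsner code. The function name `local_agreement` and the default `k=3` are illustrative choices.

```python
def local_agreement(gold, predicted, k=3):
    """Fraction of nearby utterance pairs on which two labelings agree.

    gold and predicted are equal-length lists of thread labels, one per line.
    A pair (i, j) with i - k <= j < i counts as an agreement when both
    labelings say "same thread" or both say "different thread".
    """
    if len(gold) != len(predicted):
        raise ValueError("labelings must have the same length")
    agree = 0
    total = 0
    for i in range(len(gold)):
        for j in range(max(0, i - k), i):
            gold_same = gold[i] == gold[j]
            pred_same = predicted[i] == predicted[j]
            agree += gold_same == pred_same
            total += 1
    return agree / total if total else 1.0


if __name__ == "__main__":
    # Thread labels of the ten sample lines: hand annotation vs. model output.
    gold = [1, 1, 2, 3, 3, 4, 5, 5, 6, 6]
    model = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
    print(local_agreement(gold, model, k=1))
```

Because the score only compares same/different decisions, the thread labels themselves do not need to match between the two labelings.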

Issues

Documentation for Elsner code is good, but my Python is not

Integration issues between my data and Elsner code

MEGA Model Optimization Package (megam)
