
Thesis Defense

Socially Capable Conversational Agents for Multi-Party Interactive Situations

Candidate

Rohit Kumar

Committee

Carolyn P. Rosé
Alan W. Black
Ian R. Lane
Jason D. Williams (AT&T Research)

Thursday, August 25, 2011


2

Hmm, Mr. Anderson... you disappoint me.

Hasta la vista, baby.

Sir, If I may venture an opinion...

Cookies need love like everything does.

> Popular Culture


Outline

• Background
• Challenges
  – Building Agents: Basilica
  – Communication Skills for Agents
• Motivation
• Approach
• Experiments & Analysis
  – Benefits (1, 4)
  – Mechanism
  – Appropriate Use (2, 3)
• Conclusions / Contributions / Directions


4

Conversational Agents (CAs)

• General Definition
  Conversational Agents are automated agents that extend conversation as a medium of interaction with machines.

• Many studies have shown the effectiveness of CAs
  – Information Access > Raux et al., 2005
  – Intelligent Tutoring > Kumar et al., 2006/2007a
  – Therapy > Bickmore et al., 2005

> Background > Conversational Agents


5

Multi-Party Interactive Situations (MPIS)

• Multi-Party Interactive Situations
  – Meetings, Dinner, Games, Classrooms
  – Groups more effective than Individuals at Intellective Tasks (McGrath, 1984)
  – Increasingly Mediated by Digital Communication Technologies

• Possibilities for CAs that support Multi-Party Interactive Situations

• Many Configurations of MPIS

> Background > Multi-Party Interactive Situations


6

CAs in Multi-Party Interactive Situations

• One Agent + Two or More Users
  – E.g.: Tutor supporting Collaborative Learning

• Chapter 1

Existing Work
• CoBot: Isbell et al., 2000
• Elva Tour Guide: Zheng et al., 2005
• Multi-party Interaction Patterns: Liu & Chee, 2004
• Collaborative Learning - CycleTalk: Kumar et al., 2007a, 2007b; Chaudhuri et al., 2008, 2009
• Situated Interaction: Bohus & Horvitz, 2009
• Stimulate Human Conversation: Dohsaka et al., 2009

> Background > Conversational Agents in Multi-Party Interactive Situations


7

CAs in MPIS: Two Challenges

• Building Agents for Multi-Party Interactive Situations
  – Conversation Modeling
  – Engineering Issues

• Communication Skills for Agent in such Situations
  – Design
  – Appropriate Use

> Challenges


Engineering Challenges

• Basilica Architecture > Kumar & Rosé, 2011
  – Event-Driven
  – Decomposition of agents into Behavioral Components
  – Conversation is modeled as Orchestration of Triggering of Behaviors (Video)
  – Loose Coupling
    • Incremental Development
    • Reuse

• Related Work
  – Multi-Expert Architectures > Turunen/Hakulinen’03, Nakano’08, …
  – Event-Driven Architectures > Raux/Eskenazi’07
  – Incremental Processing for Dialog > Skantze/Schlangen’09, DeVault…’09

8
> Contributions > Basilica
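To make the event-driven, loosely coupled decomposition concrete, here is a minimal sketch in Python. It is not Basilica's actual API: the `EventBus` and `GreetingBehavior` classes, the event names `student_turn` and `tutor_turn`, and the greeting rule are all hypothetical, chosen only to show how a behavioral component can be triggered by events without knowing about other components.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical minimal event bus (an illustration, not Basilica's API)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        # Iterate over a copy so handlers may subscribe while we dispatch.
        for handler in list(self._handlers[event_type]):
            handler(event)


class GreetingBehavior:
    """Hypothetical behavioral component: replies to turns that start with 'hi'."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        bus.subscribe("student_turn", self.on_student_turn)

    def on_student_turn(self, event: Event) -> None:
        if event["text"].lower().startswith("hi"):
            reply = {"text": f"Hello {event['speaker']}!"}
            self._bus.publish("tutor_turn", reply)


bus = EventBus()
outbox: List[str] = []  # stands in for the chat channel
bus.subscribe("tutor_turn", lambda event: outbox.append(event["text"]))
GreetingBehavior(bus)

bus.publish("student_turn", {"speaker": "B9", "text": "hi tutor"})
bus.publish("student_turn", {"speaker": "B8", "text": "maybe it stopped..."})
print(outbox)
```

Because components only share event names, a new behavior can be added by subscribing another handler, which is one way to read the "Incremental Development" and "Reuse" bullets.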


Basilica Architecture

• Used for building several agents
  • Avis (Freshmen Mechanical Engineering)
  • Jay (Thermodynamics)
  • Accountable Talk (Biology)
  • 9-1-1 Interpreter Agent
  • PsychChallenge Peer Learner
  • Nik (NEO Administrator)
  • Many Other Prototypes: PSLC Summer School, …

• Chapter 2 & 3

9
> Contributions > Basilica




11

Scientific Challenge: Communication Skills

> Challenges > Communication Skills

• Despite significant learning benefits (1.24σ: Kumar et al., 2007a)

• Observation: in Collaborative Learning, teams of students ignore / abuse automated tutors

• Agents lack certain Communication Skills

B9 hi tutor

B9 tutor, you're a jerk

B18 I wonder if we can mute the tutor...

B8 maybe it stopped...

Tutor type: HELP WITH CHANGETREHEAT if you want to discuss it with me.

B8 Okay

B8 guess i was wrong about that whole tutor stopping thing.

B18 Yeah, heh

Tutor There will be more potential for power generation ... Where might the increased heat input go instead?

St16 this tutor is really annoying

St5 Agreed

a7 so, got any ideas?

Tutor Starting now, you have around 40 minutes to work together…

A2 hrmmm

a7 stupid tutor no one asked you for help

A20 wow that tutor is annoying

A13 Yeah

A13 STFU tutor!


12

Communication Skills

> Communication Skills > Small Group Communication

• Small Group Communication
  – Two Fundamental Processes
  – Bales, 1950 (Problem Solving Groups)
    • Instrumental (task-related) vs. Expressive (social-emotional)
    • Need for an Equilibrium
    • Developed Interaction Process Analysis (IPA)
  – Bion, 1961 & Thelen, 1956 (Therapy Groups)

IPA categories (from the slide figure):
Expressive (+ve): Shows Solidarity, Shows Tension Release, Agrees
Instrumental: Gives Suggestion, Gives Opinion, Gives Orientation, Asks for Orientation, Asks for Opinion, Asks for Suggestion
Expressive (-ve): Disagrees, Shows Tension, Shows Antagonism




Social Behavior

• Social Interaction Strategies
  – Application: Collaborative Learning

14

1. Showing Solidarity: Raises other's status, gives help, reward
   1a. Do Introductions: Introduce and ask names of all participants
   1b. Be Protective & Nurturing: Discourage teasing
   1c. Give Reassurance: When student is discontent, asking for help
   1d. Compliment / Praise: To acknowledge student contributions
   1e. Encourage: When group or members are inactive
   1f. Conclude Socially

2. Showing Tension Release: Jokes, laughs, shows satisfaction
   2a. Expression of feeling better: After periods of tension, work pressure
   2b. Be cheerful
   2c. Express enthusiasm, elation, satisfaction: On completing significant task steps

3. Agreeing: Shows passive acceptance, understands, concurs, complies
   3a. Show attention: To student ideas as encouragement
   3b. Show comprehension / approval: To student opinions and orientations

> Communication Skills > Social Behavior


15

Triggering Policy

• Hand Crafted Rules

• Four Features
  – Last executed plan step
  – Annotations of student turns
    • Dictionary Lookup
  – Activity Levels
    • Groups & Individual
    • Strategy: 1e. (Encourage)
  – Social Ratio
    • Ratio of Social Turns to Task-related turns
    • Threshold: 20%

Tutor One last thing on this topic, Does more (or less) stress in a wrench make it easier to use?
S95 no change?
Tutor You are correct about that.. Stress doesn't determine ease of use.
Tutor It's the moment achieved by the wrench that determines the ease of use.
S89 yay!
Tutor It's good to have your contributions in the discussion Jackie :-)
Tutor Go team :-)
S89 Go team yay
Tutor I am happy to be working with our team
S89 Me too
S95 whoa the bot knows my name
Tutor Unfortunately maximum stress (12800) in our design1 is way above the maximum allowed stress (i.e. 8750)
Tutor This wrench cannot be safely used!

Strategy labels shown beside the transcript: 1d., 2b., 2b.

> Communication Skills > Social Behavior > Implementation

• Chapter 4
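The Social Ratio feature can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the slide only gives the definition (social turns over task-related turns) and the 20% threshold, so the label strings `"social"` and `"task"`, the function names, and the rule "allow a social turn only while the ratio is below the threshold" are hypothetical.

```python
from typing import Sequence


def social_ratio(tutor_turn_labels: Sequence[str]) -> float:
    """Ratio of social tutor turns to task-related tutor turns."""
    social = sum(1 for label in tutor_turn_labels if label == "social")
    task = sum(1 for label in tutor_turn_labels if label == "task")
    if task == 0:
        # No task turns yet: treat any social turn as an unbounded ratio.
        return float("inf") if social else 0.0
    return social / task


def may_trigger_social(tutor_turn_labels: Sequence[str],
                       threshold: float = 0.20) -> bool:
    """Assumed rule: social behavior is allowed only below the threshold."""
    return social_ratio(tutor_turn_labels) < threshold


history = ["task"] * 10 + ["social"]
print(social_ratio(history), may_trigger_social(history))
```

In a full rule set this check would be combined with the other three features (plan step, turn annotations, activity levels) before a strategy such as 1e. (Encourage) is fired.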


Experiments

16

Science Cat

Challenge

Approach

Implementation


Experiments

• Effectiveness of Social Behavior
  – For Multi-Party Interactive Situations
    • Collaborative Learning
    • Group Decision Making
  – Measured by:
    • Task Success
    • Agent Perception

• Underlying Mechanism

• Appropriate Use
  • Amount
  • Timing

Important as a combination

17
> Social Behavior > Experiments


Experiments

• Effectiveness of Social Behavior
  – For Multi-Party Interactive Situations
    • Collaborative Learning (Experiment 1)
    • Group Decision Making (Experiment 4)
  – Measured by:
    • Task Success
    • Agent Perception

• Underlying Mechanism (Structural Equation Modeling)

• Appropriate Use
  • Amount (Experiment 2)
  • Timing (Experiment 3)

Important as a combination

18
> Social Behavior > Experiments


19

Experiments 1-3: Collaborative Learning

• Collaborative Design Labs
  • Mechanical Engineering
  – Freshmen: Wrench Design
    • Teams of 3-4 students
  – Sophomore: Power plant design
    • Teams of 2 students

• Metrics
  – Task Success
    • Learning Outcomes
  – Perception
    • Agent Rating (Surveys)

• Methodology
  • Controlled Experiment / Between Subjects
  • Interact (chat) with Teammates & Agents for about 35 minutes

> Social Behavior > Experiments


20

Experiment 1

• Objective:
  – Effectiveness of Social Behavior

• Design

  Condition   Social Behavior                            Task Behavior
  Task        No Social Behavior                         Same Instructional Behavior
  Social      Automated Social Interaction Strategies    Same Instructional Behavior
  Human       Human Triggered Social Behavior            Same Instructional Behavior

• Results
  – Significant benefits of social behavior on Learning & Perception
  – Human Triggering vs. Automated Triggering:
    • Slightly higher learning gain
    • Much better perception ratings

• Chapter 5

> Social Behavior > Experiment 1


21

Experiment 2

• Objective:
  – Appropriate use of Social Behavior: Amount

• Design

  Condition   Social Behavior            Task Behavior
  None        No Social Behavior (0%)    Same Instructional Behavior
  Low         Social Ratio = 15%         Same Instructional Behavior
  High        Social Ratio = 30%         Same Instructional Behavior

• Results
  – Significant effect of condition on Learning
    • Low marginally better than both None and High
    • Effect Size comparable to Social vs. Task in Experiment 1
  – No significant effects on Perception metrics


Question: Why the Effect?

• Structural Equation Modeling (Tetrad IV: Scheines et al., 1994)
  – Estimates causal relationships between variables
  – Variables
    • PreTest & PostTest Scores
    • Number of SocialTurns performed by Tutor
    • Number of Good & Bad Responses by Students
      – Counting only turns following respondable Tutor turns
      – Good: Relevant answer, Showing Attention
      – Bad: Ignoring tutor (Talking to other student), Abusing tutor
    • (EpisodeDuration) Amount of time spent on delivering tutorial
      – More → Less Student Attention
      – Tutoring Episode: Interaction phase when tutor is delivering instructional content
  – Assumptions
    • Pre-Test precedes Post-Test

22
> Social Behavior > Experiment 1 > Analysis


Question: Why the Effect?

• High Episode Duration → Low Learning
  – Poor delivery of Tutorial content

• Dysfunction (bad behavior) by students is counterproductive
  – Increases Episode Duration → Less attention by students

• Social Behavior helps in counteracting the negative effects of such dysfunction in groups
  – Regulatory Mechanism
  – May not be useful in highly functional groups

23
> Social Behavior > Experiment 1 > Analysis



Experiment 3

• Objective:
  – Appropriate use of Social Behavior: Timing
  – Timing / Triggering policies
    • Baseline: Rules
    • Human-like: Learnt

• Data
  – 10 Transcripts of Human Triggered Social Behavior
    • 2939 turns: 1335 tutor turns + 1604 student turns

• Annotations
  – Aggregated: Social vs. Not Social
    • 252 positive examples

24
> Social Behavior > Experiment 3 > Triggering Policy


Human Social Behavior Triggering

• Learning Task
  • Label each turn such that, for each transcript, the sequence of predictions is similar to the sequence of labels
  – Sequence-based Metrics
    • Discourse Segmentation Evaluation
      – Pk (Lower is better)
      – k-Kappa (Higher is better)
    • Count Difference: ΔB = Abs(∑Y’ - ∑Y)

• Features
  • Lexical, Sentiment, Semantic
    – Computed over a window of previous turns
  • State, Special Purpose


[Figure: example turn sequence of Student, Tutor (Social), and Tutor (Task) turns, comparing Human Triggered and Policy Triggered social behavior]
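Two of the sequence-based metrics named above can be sketched in a few lines. The sketch makes assumptions that the slides do not confirm: each turn carries a 0/1 label, a social trigger (1) is treated as a segment boundary for Pk, and the window size `k` defaults to half the average reference segment length, as is conventional for Pk in discourse segmentation. The function names are hypothetical.

```python
from typing import Optional, Sequence


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Pk over 0/1 sequences: fraction of length-k windows in which the
    reference and hypothesis disagree on whether a boundary is present."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    n = len(reference)
    if k is None:
        boundaries = sum(reference)
        if boundaries == 0:
            raise ValueError("reference has no boundaries; pass k explicitly")
        k = max(1, int(round(n / (2 * boundaries))))
    windows = n - k + 1
    if windows <= 0:
        raise ValueError("window size k exceeds sequence length")
    errors = 0
    for i in range(windows):
        ref_has_boundary = any(reference[i:i + k])
        hyp_has_boundary = any(hypothesis[i:i + k])
        errors += ref_has_boundary != hyp_has_boundary
    return errors / windows


def count_difference(labels: Sequence[int], predictions: Sequence[int]) -> int:
    """Delta-B from the slide: Abs(sum(Y') - sum(Y))."""
    return abs(sum(predictions) - sum(labels))


human = [0, 0, 1, 0, 0, 0, 1, 0]   # illustrative labels, not thesis data
never = [0] * len(human)           # a policy that never triggers
print(pk(human, human), pk(human, never), count_difference(human, never))
```

Pk rewards predictions that place triggers near the human ones even when they are off by a turn or two, which a per-turn accuracy would not; ΔB separately checks that the overall number of triggers is right.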


Human Social Behavior Triggering

• Evaluation approach
  • 10 fold Cross Validation
  • Leave-One-Transcript-Out

• Baselines
  • Rules
  • Instance-based Learners
    – Binary Logistic Regression
    – Linear Regression

• New Approach
  – Optimize Sequence-based Metrics

  Policy     Pk     k-κ     ΔB
  Rules      0.52   -0.09   3.1
  Logistic   0.42   0.05    5.8
  Linear     0.39   0.00    26.3

26

> Social Behavior > Experiment 3 > Triggering Policy
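With 10 transcripts, leave-one-transcript-out evaluation yields exactly 10 folds. A minimal sketch of the split is below; the function name and the placeholder transcript identifiers are hypothetical, and the training and scoring steps are left out.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_transcript_out(
    transcripts: Sequence[T],
) -> Iterator[Tuple[List[T], T]]:
    """Yield (training transcripts, held-out transcript) pairs."""
    for i, held_out in enumerate(transcripts):
        training = list(transcripts[:i]) + list(transcripts[i + 1:])
        yield training, held_out


transcript_ids = [f"transcript_{i}" for i in range(1, 11)]  # placeholder ids
folds = list(leave_one_transcript_out(transcript_ids))
print(len(folds), folds[0][1], len(folds[0][0]))
```

Splitting by transcript rather than by turn keeps all turns of one conversation on the same side of the split, so sequence-based metrics are always computed on a conversation the policy has not seen.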


Large Margin Learner

[Figure: feature space containing negative (x-) and positive (x+) examples; an instance xi is projected by W · xi into a decision space divided into "Don't Trigger" and "Do Trigger" regions]

• Formulation:
  – Constraints

• Quadratic Optimization

• Iteratively improves W

• Sequence-based metrics as bounds

• Two Variants
  – Linear Regression
  – Logistic Regression
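The projection from feature space to decision space can be illustrated with a plain dot product. This sketch covers only the decision step W · xi; it does not reproduce the quadratic optimization or its constraints, which are not recoverable from the slides. The threshold of 0 and the example weights and features are hypothetical.

```python
from typing import Sequence


def decision_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Project a feature vector x onto the weight vector w (W . xi)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x))


def should_trigger(w: Sequence[float], x: Sequence[float],
                   threshold: float = 0.0) -> bool:
    """Assumed rule: scores above the threshold fall in 'Do Trigger'."""
    return decision_score(w, x) > threshold


w = [1.0, -2.0]               # illustrative weights, not learnt values
print(should_trigger(w, [3.0, 1.0]), should_trigger(w, [1.0, 1.0]))
```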


Large Margin Learner: Results

  Policy                Pk     k-κ    ΔB
  Baseline: Logistic    0.42   0.05   5.8
  LargeMarginLinear     0.41   0.08   12.6
  LargeMarginLogistic   0.41   0.08   14.4



Social Ratio Filter

• Problem:
  – Clumping of Triggers

• Solution:
  – Filtering by Social Ratio
    • Fraction of tutor’s social turns in the last 20 turns
    • Four Gaussians fit to training data
      – Non-Linear Regression
      – Gauss-Newton Method

[Figure: Policy Triggered turn sequence showing clumped triggers: four consecutive Tutor (Social) turns between Student turns]
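The two ingredients of the filter can be sketched separately: the windowed social ratio and a sum-of-Gaussians curve over that ratio. Several details are assumptions, since the slide does not give them: the denominator is taken to be all turns in the window, the turn label `"tutor_social"` and both function names are hypothetical, and no fitted parameters are shown because the fitted values are not in the source. How the fitted curve gates a trigger is likewise not specified here.

```python
import math
from typing import Sequence, Tuple


def windowed_social_ratio(turns: Sequence[str], window: int = 20) -> float:
    """Fraction of the last `window` turns that are tutor social turns."""
    recent = turns[-window:]
    if not recent:
        return 0.0
    return sum(1 for t in recent if t == "tutor_social") / len(recent)


def gaussian_sum(x: float,
                 components: Sequence[Tuple[float, float, float]]) -> float:
    """Evaluate a sum of Gaussians; each component is (amplitude, mean, std)."""
    return sum(
        amplitude * math.exp(-((x - mean) ** 2) / (2 * std ** 2))
        for amplitude, mean, std in components
    )


turns = ["student"] * 16 + ["tutor_social"] * 4   # a clumped run of triggers
ratio = windowed_social_ratio(turns)
print(ratio)
```

A clump like the one in the figure pushes the windowed ratio up quickly, which is the signal a filter needs in order to suppress further triggers.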



35


  Policy                Pk     k-κ    ΔB
  LargeMarginLinear     0.41   0.08   12.6
    +Filter             0.39   0.10   13.1
  LargeMarginLogistic   0.41   0.08   14.4
    +Filter             0.41   0.13   6.7




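– Appropriate use of Social Behavior: Timing

• Design

  Condition    Social Behavior            Task Behavior
  None         No Social Behavior         Same Instructional Behavior
  Rules        Rules-based Triggering     Same Instructional Behavior
  RandomLow    Random Triggering          Same Instructional Behavior
  RandomHigh   Random Triggering          Same Instructional Behavior
  LearntLow    Triggered Learnt Policy    Same Instructional Behavior
  LearntHigh   Triggered Learnt Policy    Same Instructional Behavior

36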



Experiment 3: Results > Task Success

  Learning     Mean   St.Dev.
  LearntLow    5.12   0.54
  RandomLow    5.06   0.67
  None         4.75   1.13
  RandomHigh   4.59   1.09
  Rules        4.38   0.89
  LearntHigh   3.98   1.74

• Only on Short Essay type questions

37
> Social Behavior > Experiment 3 > Results


Experiment 3: Results > Perception

  Agent Rating   Mean   St.Dev.
  Rules          4.74   1.45
  LearntLow      4.56   1.58
  None           4.42   1.49
  RandomHigh     3.74   1.63
  LearntHigh     3.55   1.26
  RandomLow      3.18   0.91

• Best Triggering Policy: LearntLow
  – Both Metrics: Task Success & Perception

• Weak Effects
  – Why?

38



Experiment 3: Results

  Episode Duration   Mean     St.Dev.
  LearntLow          484.00   69.80
  RandomLow          519.20   74.40
  Rules              519.80   102.70
  None               523.88   41.54
  LearntHigh         534.80   61.00
  RandomHigh         540.80   49.50

• Lower Episode Duration in this experiment
  • About 27 seconds
  – Smaller scope for correction by social behavior

• Chapter 6

39







Experiment 4: Group Decision Making

• Non-Combatant Evacuation Operation
  • Warner et al., 2004
  – Developed for ONR
    • Collaboration and Knowledge Interoperability program
    • Used by many researchers for studying human groups
  – Common and Realistic military operation
    • E.g.: Pacific Tsunami, Libya, …
  – Involves
    • Information Sharing
    • Option Generation
    • Evaluation/Revision
    • Consensus Building
    • …

41
> Group Decision Making


Group Decision Making

• Non-Combatant Evacuation Operation (NEO)
  – Red Cross Rescue Scenario

• Participants
  – Expert Roles: Weapons / Intelligence / Environmental

• Plan a rescue operation
  – Three Red Cross workers
  – Remote Pacific Island
  – Threat of local guerilla forces
  – Time constraints (medical needs, food, …)
  – American Military Forces in the region

• Objectives
  – Efficient/Safe rescue
  – Minimum damage to locals
  – Avoid Enemy Contact

42
> Group Decision Making > NEO > Scenario


Group Decision Making: Support

• Agent as an Administrator
  – Task Related Behaviors
    • Administrative Tasks
      – Provides instructions
      – Remind about Planning Time
      – Provide new information
    • Evaluation/Revision Support
      – Check for common mistakes
  – Social Behaviors
    • Based on Bales’ IPA

43
> Group Decision Making > Agent


Group Decision Making: Support

• Social Behaviors

1. Showing Solidarity: Raises other's status, gives help, reward
   1a. Do Introductions: Introduce and ask names of all participants
   1b. Give Reassurance: When student is discontent, asking for help
   1c. Compliment / Praise: To acknowledge participant contributions
   1d. Support Agreement: When teammates show approval towards each other
   1e. Conclude Socially

2. Showing Tension Release: Jokes, laughs, shows satisfaction
   2a. Be cheerful
   2b. Highlight Disagreement: To encourage the team to address concerns of participants

3. Agreeing: Shows passive acceptance, understands, concurs, complies
   3a. Show attention: To ideas as encouragement
   3b. Show comprehension / approval: To opinions and orientations

44
> Group Decision Making > Agent


Experiment 4: Details

• Procedure
  • Demographics Survey
  • Reading
  • Planning (as a group: 50 mins)
  • Surveys

• Metrics
  – Task Success
    • Evaluation Rubric (Max: 100)
  – Perception
    • Survey

• Subjects
  – 18-36 yr old
  – CMU Experiment Recruitment
  – 5 weeks, 37 sessions, 93 subjects

45
> Group Decision Making > Experiment 4 > Details


Experiment 4: Details

• Experimental Design
  • Between Subjects
    – 20 Teams: Evenly distributed

  Condition   Social Behavior                            Task Behavior
  Task        No Social Behavior                         Same Task-related Behavior
  Social      Automated Social Interaction Strategies    Same Task-related Behavior

46
> Group Decision Making > Experiment 4 > Design



• Task Success
  – Total Score
    • 100 – (All Penalties)
  – Coarse Penalties
    • Type A or B
  – Fine Penalties
    • Type C, D or E

• Social condition significantly better for
  – Total Score, Fine Penalties

47
> Group Decision Making > Experiment 4 > Results
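The scoring arithmetic on this slide can be written out as a short sketch. Only the structure comes from the slide (total = 100 minus all penalties, coarse types A or B, fine types C, D or E); the function name, the input format, and the penalty point values in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple

COARSE_TYPES = {"A", "B"}
FINE_TYPES = {"C", "D", "E"}
MAX_SCORE = 100


def score_plan(penalties: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum penalties by category and compute Total = 100 - (all penalties)."""
    coarse = 0
    fine = 0
    for penalty_type, points in penalties:
        if penalty_type in COARSE_TYPES:
            coarse += points
        elif penalty_type in FINE_TYPES:
            fine += points
        else:
            raise ValueError(f"unknown penalty type: {penalty_type}")
    return {"coarse": coarse, "fine": fine,
            "total": MAX_SCORE - (coarse + fine)}


# Illustrative penalties only; the rubric's real point values are not shown.
result = score_plan([("A", 10), ("C", 3), ("E", 2)])
print(result)
```

Reporting coarse and fine penalties separately matters for the result above, because the Social condition's advantage is stated for Total Score and Fine Penalties specifically.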






Experiment 4: Results: Perception

• Significantly higher
  • Agent Rating
  • Teammate Rating
  – No Correlation
    • Discussion Quality
    • Cooperation
    • Effort
    • Satisfaction
    • Performance

• Chapter 7



This Transmission is Concluding…

50


Contributions

• Step Towards
  – Creating effective CAs to support Multi-Party Interactive Situations

• Approach:
  – Modeling Conversation as Orchestration of Triggers
  – Designing and Implementing Socially Capable CAs

• Knowledge:
  – Benefits of Socially Capable CAs: Two Applications / Two Metrics
  – Appropriate amount and timing of social behavior
  – Social behavior as a regulatory mechanism in group interaction

• Software:
  – Basilica Architecture, 6+ Agents
  – Interaction Environments (9-1-1, NEO)
  – Data: Agent / Human Team interactions: 12 experiments, Over 1000 subjects

• Interdisciplinary Bridge:
  – Using work in human communication to help design CAs

51
> Conclusion


Shortcomings, Next Steps, Directions

• Orchestration of Triggering of Behaviors
  – Coordination Challenge
    • Multiple behaviors triggering simultaneously
  – Control Sharing (NEO Agent: Chapter 7)

• Triggering Social Behavior
  – Scaffolding the Amount of Social Behavior
    • Using online measures of group dysfunction (episode duration)
  – Policy that not only determines When, but also Which behavior

• Other Regulatory Mechanism in Group Interaction
  – CAs as a model/simulation for studying human group interactions

• More in Chapter 8

52
> Conclusion


53

Bridges

[Figure: "My Thesis" shown as a bridge between Collaborative Learning, Communication Studies, Small Group Communication, Multi-Party Interaction, Tutorial Dialog, Group Decision Making, Dialog Systems, Software Architecture, and Conversational Agents]
