
Automatic recognition of social roles using long term role transitions in small group interactions

Gaurav Fotedar¹, Aditya Gaonkar P²,¹, Saikat Chatterjee³

Prasanta Kumar Ghosh¹

¹Department of Electrical Engineering, Indian Institute of Science, Bangalore, India

²Department of Electrical Engineering, Columbia University, New York, USA

³Department of Communication Theory, KTH Electrical Engg. School, Stockholm, Sweden

Interspeech, 2016

Gaurav Fotedar et al. Social role recognition Interspeech, 2016 1 / 39

Outline

1 Introduction

2 Data

3 Proposed Method
  Using Role Transitions
  Dynamic Programming Algorithm

4 Experiments and Results
  Features
  Experimental Setup
  Results and Discussion

5 Conclusions and Future work


Introduction

What are meeting roles?

Roles define who is doing what in a meeting

They can be of two types:

Formal roles: often relate to the official designation of a member, e.g. Manager, Developer, Designer. These stay constant for a member for the length of the meeting.

Social roles: these characterise the behaviour of a member at a particular time in a meeting, e.g. Protagonist, Supporter. Hence, a member goes through many social roles as the meeting progresses.

We focus on the recognition of social roles in this work.


Introduction

Why social roles?

Social roles characterise relationships between members in a meeting and capture the dynamics of the meeting.

They answer semantic queries such as "Who is doing what in an event?"

Knowing social roles helps to determine engagement, social dominance and hot-spots in meetings.

Information about social roles has been used for topic segmentation of conversational discourse and for summarising spoken documents.


Introduction

Types of social roles

Typically, the social roles in a meeting could be:

Gatekeeper - a group moderator.

Neutral - a passive participant.

Protagonist - the driver of the conversation.

Supporter - participant with cooperative attitude.

Attacker - participant expressing disagreement.


Introduction

Types of social roles


Introduction

Challenges in recognition of social roles in meetings

Disfluent speech and overlapping speech between members increase the errors of ASR and speaker segmentation systems.

Short speaker turns reduce the data available for extracting features of a particular speaker.

Limited availability of annotated corpora.

Introduction

Related Work and Our Contribution


Prior Work

Zancanaro et al.ᵃ used speech activity and body fidgeting features with an SVM classifier.

Valente et al.ᵇ used prosodic and turn-taking features combined with the influence of speakers on one another.

Wilson et al.ᶜ used combinations of speech activity, subjectivity and expressive prosodic features with a CRF.

Sapru et al.ᵈ annotated the AMI corpus with social roles and used an HCRF with combinations of lexical, acoustic and structural features.

ᵃZancanaro, Massimo, Bruno Lepri, and Fabio Pianesi. "Automatic detection of group functional roles in face to face interactions." Proceedings of the 8th International Conference on Multimodal Interfaces. ACM, 2006.
ᵇValente, Fabio, and Alessandro Vinciarelli. "Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus." INTERSPEECH. 2011.
ᶜWilson, Theresa, and Gregor Hofer. "Using linguistic and vocal expressiveness in social role recognition." Proceedings of the 16th International Conference on Intelligent User Interfaces. ACM, 2011.
ᵈSapru, Ashtosh, and Hervé Bourlard. "Automatic recognition of emergent social roles in small group interactions." IEEE Transactions on Multimedia 17.5 (2015): 746-760.

Introduction

Related Work and Our Contribution

Our Contribution

All existing works predict the role in each meeting slice independently.

We incorporate role transition probabilities across meeting slices to predict social roles.

We propose a dynamic programming framework to reduce the run time of estimating the sequence of predicted roles.


Data

The AMI Corpus

100 hours of audio-visual recordings of role-played meetings

4 members in each meeting with the formal roles:

Project Manager
Industrial Designer
User Interface Designer
Marketing Expert


Data

Social Role Annotation

Social role annotations are available for 59 meetings.¹

Each meeting has been segmented into meeting slices at pauses longer than 1 second. The social role of a member is assumed to remain constant within a meeting slice.

For each meeting slice, each of the 4 members has been assigned one social role from among Gatekeeper, Protagonist, Neutral, Supporter and Attacker.

¹Done by Sapru et al.
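The pause-based slicing rule can be sketched as follows, assuming word-level (start, end) timestamps pooled over all speakers. The function `split_into_slices` and its input format are hypothetical illustrations, not the procedure used to produce the annotation.

```python
from typing import List, Tuple

Word = Tuple[float, float]  # (start_time, end_time) of one word, in seconds

def split_into_slices(words: List[Word], min_pause: float = 1.0) -> List[List[Word]]:
    """Hypothetical helper: start a new slice after any silence longer than min_pause."""
    slices: List[List[Word]] = []
    current: List[Word] = []
    last_end = None
    for start, end in sorted(words):
        if last_end is not None and start - last_end > min_pause:
            slices.append(current)
            current = []
        current.append((start, end))
        # Keep the latest end time seen so far, so overlapping speech is not mistaken for a pause.
        last_end = end if last_end is None else max(last_end, end)
    if current:
        slices.append(current)
    return slices

words = [(0.0, 0.4), (0.5, 0.9), (2.5, 2.8), (2.9, 3.3), (4.0, 4.2)]
print([len(s) for s in split_into_slices(words)])  # → [2, 3]
```

Only the 1.6 s gap between 0.9 s and 2.5 s exceeds the threshold, so the five words fall into two slices.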


Data

Social Role Annotation

Figure: Distribution of social roles in the annotated AMI Corpus

Due to the limited data for Attackers, we consider only the other 4 social roles.


Proposed Method Using Role Transitions

Why consider transition probabilities

Typically, role recognition is posed as a classification problem using features from the respective meeting slice.

Let $f_k$ be the feature vector and $L_k$ the role in the $k$-th slice for a participant in the meeting. The classifier computes $P^k_r = \mathrm{Prob}(L_k = \rho_r \mid f_k)$, $r = 1, 2, 3, 4$, and the role with the highest probability becomes the predicted role.

However, the variation of a participant's role across slices could depend on the group dynamics and personal characteristics.

So the prediction of the role in the $k$-th slice can use information about the roles in previous slices.


Proposed Method Using Role Transitions

Formulating the role recognition problem

We formulate the problem as maximizing the joint probability of the roles of a participant in all $K$ slices,

$$\mathrm{Prob}(L_1, L_2, \ldots, L_K \mid f_1, f_2, \ldots, f_K)$$

Using Bayes' rule,

$$\mathrm{Prob}(L_1, \ldots, L_K \mid f_1, \ldots, f_K) \propto p(f_1, \ldots, f_K \mid L_1, \ldots, L_K)\,\mathrm{Prob}(L_1, \ldots, L_K) \tag{1}$$

where the omitted normaliser $p(f_1, \ldots, f_K)$ does not depend on the roles.

Next, we consider the calculation of the two terms on the right side of (1).


Proposed Method Using Role Transitions

Formulating the role recognition problem

Assuming that, given the roles in all $K$ slices, the feature vectors in these slices are independent, we obtain:

$$p(f_1, \ldots, f_K \mid L_1, \ldots, L_K) = \prod_{k=1}^{K} p(f_k \mid L_k) \tag{2}$$

Also, assuming all roles are equally likely a priori, Bayes' rule gives

$$P^k_r = \mathrm{Prob}(L_k = \rho_r \mid f_k) \propto p(f_k \mid L_k = \rho_r) \tag{3}$$

We propose to use a discriminative classifier to obtain $\mathrm{Prob}(L_k \mid f_k)$, which can then be used to obtain the first term in (1) by (2) and (3).


Proposed Method Using Role Transitions

Formulating the role recognition problem

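The second term in (1), $\mathrm{Prob}(L_1, L_2, \ldots, L_K)$, captures the long term role dynamics of a participant.

Applying the first-order Markov chain assumption,

$$\begin{aligned}
\mathrm{Prob}(L_1, \ldots, L_K) &= \mathrm{Prob}(L_K \mid L_1, \ldots, L_{K-1})\,\mathrm{Prob}(L_1, \ldots, L_{K-1}) \\
&= \mathrm{Prob}(L_K \mid L_{K-1})\,\mathrm{Prob}(L_1, \ldots, L_{K-1}) = \cdots \\
&= \mathrm{Prob}(L_1) \prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1}) \propto \prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1})
\end{aligned} \tag{4}$$

where the last step assumes all roles are equally likely.

We can see that this term can be obtained using the role transition probabilities.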


Proposed Method Using Role Transitions

Formulating the role recognition problem

Modelling the role sequence across meeting slices as a first-order Markov process, we calculate count-based transition probabilities.

From \ To      Gatekeeper   Neutral   Protagonist   Supporter
Gatekeeper        0.70        0.16        0.04         0.10
Neutral           0.02        0.72        0.03         0.23
Protagonist       0.10        0.14        0.62         0.14
Supporter         0.06        0.35        0.07         0.52

Table: Role transition probabilities (each row sums to 1).

The probability of staying in the same role is relatively high for all roles (0.52 to 0.72).

There is a significant probability of transition between Neutral and Supporter (0.23 from Neutral to Supporter, 0.35 from Supporter to Neutral); however, a transition from Neutral to Protagonist is unlikely (0.03).
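A count-based estimate of this kind can be sketched as below, assuming each participant's per-slice roles are available as a list of role names. `estimate_transitions` and the toy sequences are hypothetical; they illustrate the counting, not the training data behind the table.

```python
import numpy as np

ROLES = ["Gatekeeper", "Neutral", "Protagonist", "Supporter"]

def estimate_transitions(sequences):
    """Hypothetical helper: T[i, j] = Prob(next role = ROLES[j] | current role = ROLES[i])."""
    index = {role: i for i, role in enumerate(ROLES)}
    counts = np.zeros((len(ROLES), len(ROLES)))
    for seq in sequences:
        # Count each consecutive (previous slice, next slice) pair of one participant.
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[index[prev], index[nxt]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

toy = [["Neutral", "Neutral", "Supporter", "Neutral"],
       ["Protagonist", "Protagonist", "Gatekeeper"]]
T = estimate_transitions(toy)
print(T[1])  # Neutral row: half Neutral -> Neutral, half Neutral -> Supporter
```

Row i of the result plays the same part as row i of the table: the distribution over the next role given the current role `ROLES[i]`.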


Proposed Method Using Role Transitions

Formulating the role recognition problem

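We can now re-write (1) as

$$\mathrm{Prob}(L_1, \ldots, L_K \mid f_1, \ldots, f_K) \propto \left(\prod_{k=1}^{K} p(f_k \mid L_k)\right)^{1-\gamma} \left(\prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1})\right)^{\gamma} \tag{5}$$

where the weights $1-\gamma$ and $\gamma$, with $0 \le \gamma \le 1$, control the contribution of the role transition probabilities. For $\gamma = 0$ the transition term vanishes and each slice is classified independently; for $\gamma = 0.5$ the objective is the square root of (1) and therefore has the same maximiser.

The estimated sequence of roles is then obtained by

$$\{\hat{L}_k\}_{k=1}^{K} = \arg\max_{L_1, \ldots, L_K} \mathrm{Prob}(L_1, \ldots, L_K \mid f_1, \ldots, f_K) \tag{6}$$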
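To make (5) and (6) concrete, the sketch below scores every one of the $4^K$ role sequences in the log domain, which leaves the arg max unchanged. It assumes a $K \times 4$ array `P` of classifier posteriors $P^k_r$, used in place of $p(f_k \mid L_k)$ through (3), and a $4 \times 4$ array `A` whose rows are the current role and columns the next role, filled here with the transition table above. The posteriors and function names are hypothetical.

```python
import itertools
import numpy as np

def log_score(seq, P, A, gamma):
    """Log of the weighted objective (5) for one role sequence (all entries of P, A must be > 0).

    P[k, r] stands in for p(f_k | L_k = rho_r) via (3); A[i, j] = Prob(next role j | current role i).
    """
    emission = sum(np.log(P[k, r]) for k, r in enumerate(seq))
    transition = sum(np.log(A[i, j]) for i, j in zip(seq[:-1], seq[1:]))
    return (1 - gamma) * emission + gamma * transition

def brute_force_decode(P, A, gamma):
    """Exhaustive search for (6): scores all R**K sequences, so only feasible for small K."""
    K, R = P.shape
    return max(itertools.product(range(R), repeat=K),
               key=lambda seq: log_score(seq, P, A, gamma))

# Transition table (rows: from, columns: to); role order Gatekeeper, Neutral, Protagonist, Supporter.
A = np.array([[0.70, 0.16, 0.04, 0.10],
              [0.02, 0.72, 0.03, 0.23],
              [0.10, 0.14, 0.62, 0.14],
              [0.06, 0.35, 0.07, 0.52]])
# Hypothetical posteriors for K = 3 slices; the middle slice is ambiguous.
P = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.1, 0.3, 0.4, 0.2],
              [0.1, 0.7, 0.1, 0.1]])
print(brute_force_decode(P, A, gamma=0.0))  # → (1, 2, 1): Neutral, Protagonist, Neutral
print(brute_force_decode(P, A, gamma=0.5))  # → (1, 1, 1): Neutral throughout
```

With γ = 0 the transition term drops out and each slice takes its most probable role; with γ = 0.5 the high Neutral-to-Neutral probability (0.72) outweighs the weak preference for Protagonist in the middle slice.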


Proposed Method Dynamic Programming Algorithm

DP based Algorithm

Since there are 4 roles, a full search for solving (6) has a complexity of $O(4^K)$.

We propose a DP-based solution with a complexity of $O(4^2 K) = O(16K)$.

Let $D_r(k)$ be the maximum probability of assigning roles to the first $k$ slices with $\rho_r$ as the role in the $k$-th meeting slice.

Let the back-tracking pointer $\xi_r(k)$ store the role assigned to the $(k-1)$-th slice for obtaining the maximum probability $D_r(k)$.


Proposed Method Dynamic Programming Algorithm

DP based Algorithm

$D_r(k)$ is computed recursively as follows:

1 Initialization: compute $D_r(1) = \left(P^1_r\right)^{1-\gamma}$ using equation (3).

2 Iteration: for $2 \le k \le K$ and $1 \le r \le 4$, compute

$$D_r(k) = \max_{1 \le r' \le 4} \left\{ D_{r'}(k-1) \left(\alpha_{r,r'}\right)^{\gamma} \right\} \left(P^k_r\right)^{1-\gamma}$$

$$\xi_r(k) = \arg\max_{1 \le r' \le 4} \left\{ D_{r'}(k-1) \left(\alpha_{r,r'}\right)^{\gamma} \right\}$$

where $P^k_r$ is obtained using equation (3) and $\alpha_{r,r'} = \mathrm{Prob}(\rho_r \mid \rho_{r'})$ is the transition probability obtained from the training data.

3 Backtracking: $\hat{L}_K = \arg\max_r D_r(K)$, and

$$\hat{L}_k = \xi_{\hat{L}_{k+1}}(k+1), \quad k = K-1, K-2, \ldots, 1 \tag{7}$$
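The recursion is a Viterbi-style dynamic program. The sketch below implements it with logarithms, which preserves the arg max and avoids numerical underflow for long meetings. It assumes a $K \times 4$ NumPy array `P` of posteriors $P^k_r$ and a $4 \times 4$ array `A` in which `A[r_prev, r]` holds $\alpha_{r,r'}$ with $r' =$ `r_prev` (rows: previous role, columns: current role, as in the transition table). The function name `dp_decode` and the example posteriors are hypothetical.

```python
import numpy as np

def dp_decode(P, A, gamma):
    """Maximise (5) with the recursion above, in the log domain (entries of P, A must be > 0).

    P: (K, R) posteriors P^k_r.  A: (R, R) with A[r_prev, r] = alpha_{r, r_prev}.
    """
    K, R = P.shape
    log_P = (1 - gamma) * np.log(P)
    log_A = gamma * np.log(A)
    D = np.empty((K, R))              # D[k, r] = log D_r(k + 1)
    xi = np.zeros((K, R), dtype=int)  # xi[k, r] = back-pointer xi_r(k + 1)
    D[0] = log_P[0]                   # initialisation
    for k in range(1, K):
        # cand[r_prev, r] = log D_{r_prev}(k) + gamma * log alpha_{r, r_prev}
        cand = D[k - 1][:, None] + log_A
        xi[k] = np.argmax(cand, axis=0)
        D[k] = cand[xi[k], np.arange(R)] + log_P[k]
    # Backtracking, as in (7)
    path = [int(np.argmax(D[-1]))]
    for k in range(K - 1, 0, -1):
        path.append(int(xi[k, path[-1]]))
    return tuple(reversed(path))

A = np.array([[0.70, 0.16, 0.04, 0.10],
              [0.02, 0.72, 0.03, 0.23],
              [0.10, 0.14, 0.62, 0.14],
              [0.06, 0.35, 0.07, 0.52]])
P = np.array([[0.1, 0.7, 0.1, 0.1],   # hypothetical posteriors, K = 3
              [0.1, 0.3, 0.4, 0.2],
              [0.1, 0.7, 0.1, 0.1]])
print(dp_decode(P, A, gamma=0.5))  # → (1, 1, 1)
```

Each step takes a maximum over 4 predecessors for each of 4 roles, which is where the $O(16K)$ cost comes from.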


Experiments and Results Features

Acoustic Features

Speaking style and vocal expression can give hints about a speaker's role.

We use openSMILE² to extract features such as:

Average, standard deviation, skewness, range, kurtosis, minimum, maximum

Linear and quadratic regression coefficients and approximation errors of Low Level Descriptor (LLD) contours such as sub-band energy, spectral roll-off, spectral flux and short-time energy

This yields a 297-dimensional feature vector.

²Eyben, Florian, et al. "Recent developments in openSMILE, the Munich open-source multimedia feature extractor." Proceedings of the 21st ACM International Conference on Multimedia. ACM, 2013.
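openSMILE computes such functionals from its configuration files. The NumPy sketch below only illustrates what the listed statistics and regression terms measure on a single LLD contour; it is not openSMILE's API or the configuration that yields the 297 dimensions, and the function name, feature order and example contour are hypothetical.

```python
import numpy as np

def contour_functionals(x):
    """Illustrative functionals of one LLD contour x (one value per frame; must not be constant)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    z = (x - x.mean()) / x.std()
    stats = [x.mean(), x.std(), np.mean(z ** 3),      # average, standard deviation, skewness
             x.max() - x.min(), np.mean(z ** 4) - 3.0,  # range, excess kurtosis
             x.min(), x.max()]                         # minimum, maximum
    lin = np.polyfit(t, x, 1)                          # linear regression: slope, offset
    quad = np.polyfit(t, x, 2)                         # quadratic regression: a, b, c
    lin_err = np.mean((np.polyval(lin, t) - x) ** 2)   # approximation errors (mean squared)
    quad_err = np.mean((np.polyval(quad, t) - x) ** 2)
    return np.concatenate([stats, lin, quad, [lin_err, quad_err]])

energy = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # hypothetical short-time energy contour
print(contour_functionals(energy).round(3))
```

Applying a fixed set of functionals to every LLD contour is what turns variable-length slices into a fixed-length feature vector.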


Experiments and Results Features

Lexical Features

The words used by speakers in a meeting hold information about their roles.

We use Linguistic Inquiry and Word Count (LIWC)³ to analyse the speech transcripts of meeting slices.

Weights for various linguistic categories are obtained through LIWC, resulting in a 43-dimensional feature vector.

³Pennebaker, James W., Matthias R. Mehl, and Kate G. Niederhoffer. "Psychological aspects of natural language use: Our words, our selves." Annual Review of Psychology 54.1 (2003): 547-577.


Experiments and Results Features

Structural Features

The duration of speech and the number of speaker turns can hold significant information about the speaker's role.

Transcripts of the meeting slices provide the timestamps of word utterances, which we use to calculate the fraction of speaking time and the number of turns taken by the speaker.

This results in a 2-dimensional feature vector.
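A minimal sketch of the two structural features, assuming one slice's transcript is given as (speaker, start, end) word timestamps. Two definitions here are assumptions: a turn is a maximal run of consecutive words by the same speaker, and speaking time is the summed word duration divided by the slice duration (first word start to last word end). The names are hypothetical.

```python
from typing import List, Tuple

WordStamp = Tuple[str, float, float]  # (speaker, start_time, end_time) in seconds

def structural_features(words: List[WordStamp], speaker: str) -> Tuple[float, int]:
    """Hypothetical helper: (fraction of slice time spent speaking, number of turns)."""
    words = sorted(words, key=lambda w: w[1])
    slice_duration = max(w[2] for w in words) - min(w[1] for w in words)
    speaking = sum(end - start for spk, start, end in words if spk == speaker)
    turns = 0
    previous = None
    for spk, _, _ in words:
        if spk == speaker and previous != speaker:
            turns += 1  # a new turn starts when this speaker takes the floor
        previous = spk
    return speaking / slice_duration, turns

words = [("A", 0.0, 1.0), ("A", 1.1, 2.0), ("B", 2.2, 3.0), ("A", 3.1, 4.0)]
print(structural_features(words, "A"))  # fraction close to 0.7, and 2 turns
```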


Experiments and Results Experimental Setup

Experimental Setup

Five-fold cross-validation setup

The 59 meetings are randomly divided into 5 sets: 4 sets with 12 meetings and 1 set with 11 meetings. For each fold:

3 sets are used for training.
1 set is used as the development set.
1 set is used for testing.

The γ parameter is optimised on the development set: in a grid search, γ is varied from 0 to 1 in steps of 0.1, and the γ that provides the highest accuracy is taken as optimal.
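The fold construction and γ search can be sketched as follows. The text does not say which set serves as development and which as test in each fold; the rotation used here (test = set i, development = set i + 1 modulo 5) is an assumption, as are the seed and the hypothetical `evaluate` callback that returns development-set accuracy for a given γ.

```python
import random

def make_folds(meeting_ids, n_sets=5, seed=0):
    """Randomly split meetings into n_sets sets, then rotate the test/dev/train roles."""
    ids = list(meeting_ids)
    random.Random(seed).shuffle(ids)
    sets = [ids[i::n_sets] for i in range(n_sets)]  # round-robin: sizes differ by at most one
    folds = []
    for i in range(n_sets):
        dev_index = (i + 1) % n_sets  # assumed rotation, not stated in the text
        train = [m for j, s in enumerate(sets) if j not in (i, dev_index) for m in s]
        folds.append((train, sets[dev_index], sets[i]))
    return folds

def best_gamma(evaluate):
    """Grid search over gamma in {0.0, 0.1, ..., 1.0}; ties go to the smaller gamma."""
    grid = [round(0.1 * i, 1) for i in range(11)]
    return max(grid, key=evaluate)

folds = make_folds(range(59))
print([len(test) for _, _, test in folds])
```

Dealing the shuffled meetings round-robin gives exactly four sets of 12 and one of 11 for 59 meetings.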


Experiments and Results Experimental Setup

Experimental Setup

We use a hidden conditional random field (HCRF) for the classification task.⁴

3 hidden states, with 500 function evaluations for training

Various combinations of features are used, namely Acoustic (A), Lexical (L), Structural (S), Acoustic+Lexical (AL), Acoustic+Structural (AS), Lexical+Structural (LS) and Acoustic+Lexical+Structural (ALS).

We use three performance metrics: Precision, Recall and F-score for each role, averaged across the five folds. Recall averaged across all roles is also reported.

The work by Sapru et al.⁵ is considered as the baseline scheme.

⁴Python implementation freely available at https://github.com/dirko/pyhcrf
⁵Sapru, Ashtosh, and Hervé Bourlard. "Automatic recognition of emergent social roles in small group interactions." IEEE Transactions on Multimedia 17.5 (2015): 746-760.
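The per-role metrics can be computed from reference and predicted slice labels with a small helper like the hypothetical `per_role_metrics` below (plain Python, no particular toolkit assumed). It also returns accuracy, which for single-label slices equals recall micro-averaged over all slices; this weights frequent roles such as Neutral more heavily than an unweighted mean of the per-role recalls.

```python
from collections import Counter

def per_role_metrics(y_true, y_pred, roles):
    """Hypothetical helper: {role: (precision, recall, F-score)} plus overall accuracy."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct slices per role
    n_true = Counter(y_true)  # reference count per role
    n_pred = Counter(y_pred)  # prediction count per role
    metrics = {}
    for role in roles:
        precision = tp[role] / n_pred[role] if n_pred[role] else 0.0
        recall = tp[role] / n_true[role] if n_true[role] else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        metrics[role] = (precision, recall, f_score)
    accuracy = sum(tp.values()) / len(y_true)
    return metrics, accuracy

y_true = ["N", "N", "N", "G", "P", "S"]
y_pred = ["N", "N", "S", "G", "N", "S"]
metrics, accuracy = per_role_metrics(y_true, y_pred, ["G", "N", "P", "S"])
print(metrics, accuracy)
```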


Experiments and Results Results and Discussion

Results I

Metric      Method     Gatekeeper   Neutral   Protagonist   Supporter
Precision   Baseline      0.50        0.90        0.50          0.67
            Proposed      0.57        0.89        0.57          0.67
Recall      Baseline      0.43        0.91        0.47          0.73
            Proposed      0.44        0.92        0.44          0.75
F-score     Baseline      0.46        0.91        0.46          0.69
            Proposed      0.49        0.91        0.46          0.70

Table: Performance metrics for all roles, averaged across all feature combinations.

Precision improves over the baseline for the roles with less data: from 0.50 to 0.57 for both Gatekeeper and Protagonist.

The overall recall (accuracy) across all roles is 0.75 with the baseline and 0.76 with the proposed method.


Experiments and Results Results and Discussion

Results II

Figure: Various performance metrics for different feature combinations for all roles (panels: Precision, Recall and F-score for Gatekeeper, Neutral, Protagonist and Supporter; x-axis: feature combinations A, L, S, LS, AL, AS, ALS; curves: Baseline and Proposed)


Experiments and Results Results and Discussion

Results II

For the Neutral role, the 2-dimensional Structural features perform as well as the other, higher-dimensional feature combinations in terms of F-score, for both methods.

For the Gatekeeper role, the combination of Lexical and Structural features (LS) performs best in terms of F-score for both methods.

For the Gatekeeper role, the proposed method outperforms the baseline in terms of average F-score, irrespective of the feature combination used.

For AL and ALS, the proposed method improves the F-score for 3 roles (Gatekeeper, Supporter, Protagonist).


Experiments and Results Results and Discussion

Results III

        Feature combination
Fold    A      L      S      LS     AL     AS     ALS
1       0.2    0.3    0.1    0.4    0.6    0.4    0.4
2       0.1    0.2    0.1    0.4    0.5    0.3    0.5
3       0.0    0.3    0.6    0.6    0.4    0.4    0.4
4       0.5    0.3    0.0    0.4    0.4    0.4    0.4
5       0.4    0.2    0.0    0.1    0.4    0.3    0.5
Avg     0.24   0.26   0.16   0.38   0.46   0.36   0.44

Table: Optimal γ values for different folds and feature combinations

The benefit from role transitions varies across feature combinations, which is reflected in the γ values.

The γ values are highest for AL and ALS, which, as previously noted, improve the F-score for 3 roles.


Conclusions and Future work

Conclusions

Precision of role recognition improves when role transition probabilities are included.

The improvement in precision is pronounced for Gatekeeper and Protagonist, which occur less frequently.


Conclusions and Future work

Future Work

Incorporating role dynamics of 3 or more consecutive slices.

Investigating the effect of including interpersonal dynamics in the model.

Investigating role recognition in realistic situations, since this work used a "constructed" (role-played) corpus, AMI.
