
Expressive Non-Verbal Interaction in String Quartet

Donald Glowinski (Univ. of Genoa, IT; Univ. of Geneva, CH), [email protected]

Giorgio Gnecco (Univ. of Genoa, IT), [email protected]

Stefano Piana (Univ. of Genoa, IT), [email protected]

Antonio Camurri (Univ. of Genoa, IT), [email protected]

Abstract—The present study investigates expressive non-verbal interaction in a musical context, starting from behavioral features extracted at the individual and group level. We define four features related to head movement and direction that may help in gaining insight into the expressivity and cohesion of the performance. Our preliminary findings, obtained from the analysis of a string quartet recorded in ecological settings, show that these features may help in distinguishing between two types of performance: (a) a concert-like condition where all musicians aim at performing at their best, and (b) a perturbed one where the 1st violinist devises alternative interpretations of the music score without discussing them with the other musicians.

I. INTRODUCTION

Our aim in the present study is to shed light on the way the musicians behave in performing the co-operative and emotionally engaged task of playing in a string quartet (SQ): in particular, we want to measure how the expressive behavior of the musicians may change as a consequence of modifications in the performing conditions.

Several studies have used observational and interview methods to explore the way musicians interact and determine the overall quality of experience [3]. Others, including the study reported here, investigate the interaction by means of quantitative measures, with a particular focus on expressive alignment processes in communication. The literature on alignment grows out of linguistics research on convergence between speakers, but it has broadened to various nonverbal behaviors (e.g., [13] reviews studies on speech prosody, turn taking, joint attention, backchanneling, head nods, smiles, and mirroring/contagion effects). Within this framework, SQs have been identified as a particularly promising context for investigating expressive and adaptive interactions in groups of people [2], [6]. The SQ scenario involves a particular social structure, which we would expect to be reflected in a particular style of communication [10]. In an SQ, all the musicians contribute equally to the performance of the group. There is some degree of leadership, usually taken by the 1st violinist, but not the kind of hierarchy that can be seen in an orchestra (conductor vs. other musicians). In this perspective, the SQ has been described as a self-managed team, i.e., a working structure where all partners share roughly equal responsibility in the development of a common project [4].

Our analysis starts from the movement of the musicians' heads. Head movement is known to play a central role in non-verbal communication in general [6] and in music in particular [1], [7]. Head movements may express the way musicians understand the phrasing and breathing of the music, and so provide information about the high-level emotional structures in terms of which the players are interpreting the music. Additional information indicating how each musician stands with respect to the group as the performance unfolds may be obtained by studying the movement of the musicians' heads with respect to the positions of several points of interest. As proposed by [5], the ear of the quartet is a prominent example of such points of interest. The SQ ear refers to a fixed subjective center, whose position is defined by the musicians themselves, and located at nearly equal distance from each of them (see Figure 1). The SQ ear is so called because it refers to the location of an imaginary listener who would receive the musical contributions of all the musicians. This center is expected to function as a reference point for all the musicians during the performance and to help them in coordinating and achieving a coherent sound.

Fig. 1. A picture of the four musicians of the SQ (Quartetto di Cremona) studied in this paper (a), together with their Motion Capture (MoCap) 3D representation (b). The musicians are wearing MoCap reflective markers. The subjective center of the quartet, the ear, is represented by a white dot and corresponds to a reflective marker mounted above a tripod situated among the musicians.

In this direction and following [12], four features have been implemented to evaluate: 1) how the heads' directions of the four musicians converge toward the SQ ear; 2) how much the musicians move jointly forward and backward with respect to the SQ ear; 3) how the heads' directions of each subset of 3 musicians converge toward the head of the remaining one; 4) how much the head of each musician is directed toward each other musician. Hence, a different movement behavior of the group with respect to the ear or other points of interest may be expected to reflect different expressive performing conditions.

The paper is organized as follows: Section II describes the multimodal setup and experimental procedure, Section III details the behavioral features implemented to characterize group and individual expressive performance, Section IV presents the obtained results, and Section V discusses the main findings and concludes.

II. SUBJECTS AND STIMULI

A. Choice of professional concert-level musicians

The Quartetto di Cremona, an internationally recognized string quartet, was invited to participate in the experiment. Preliminary encounters confirmed that the members of this quartet show key qualities that made them suitable for conducting our study. They were able to tolerate the disturbance created by the multimodal setup (video cameras, markers, and on-body sensors) thanks to their longstanding experience of performing in a variety of environmental situations (concert halls, TV and radio broadcasts). They understood and replied in detail to the experimenters' demands, being used to working closely with living contemporary composers for whom they have created new works. They demonstrated high flexibility in performing a variety of styles and have developed well-advanced strategies for rehearsing together. The piece that we selected was part of their repertoire.

B. Choice of the musical fragment

The music piece performed by the SQ during the experiment was extracted from the Allegro of the String Quartet No. 14 in D minor, known as Death and the Maiden, by F. Schubert. This piece is a staple of the string quartet repertoire and brings together a number of highly contrasting musical elements, including a homorhythmic structure where the musicians tend to play in unison, fugato writing styles which replicate the musical subject over the different instruments, and concerto-style melodic developments interpreted by the 1st violinist and accompanied by repetitive chords and tremolos of the other musicians.

C. Procedure

Two sessions of recordings were carried out with the Quartetto di Cremona (July 13th and 14th, 2011), following two experimental procedures. In the first procedure (condition A, tested on the 1st day), the four musicians were instructed to play the Schubert piece 5 times at their best, as in a concert-like situation. In the second procedure (condition B, tested on the 2nd day), the 1st violinist of the string quartet devised alternative interpretations of the music score, which contradicted the usual interpretation (e.g., playing forte where the nuance piano is written, speeding up when a rallentando is requested, etc.). The other members of the quartet were not aware of these new versions before playing. For each procedure, the quality of each performance was assessed by the musicians themselves through post-performance ratings on a 7-item Likert scale (e.g., expressivity and group cohesion were evaluated by asking questions such as "how emotionally engaging was your performance?" and "how did you manage to coordinate with the other musicians?", see [8]).

D. Apparatus and Set-up

The experiment took place in a 250-seat auditorium, an environment similar to a concert hall, suitable for experiments in ecological setups (see Figure 1(a)). A multimodal recording platform was set up to capture and analyze the movement data of the musicians. The musicians' behavior was captured by means of the Qualisys Motion Capture system. In particular, 3 reflective markers were placed on each musician's head (1 marker on the back of the neck, 2 markers above the eyes). Original real-time applications based on the EyesWeb XMI software platform were developed to synchronize the Qualisys MoCap data with the video and audio data.

E. Selected data

The present paper focuses on one particular component of the recordings: the time series data of the positions of the musicians' heads. In Section III, several static and dynamic behavioral features of the SQ are described. They are related to the movement of the musicians' heads with respect to specific points of interest (e.g., the SQ ear).
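As a rough illustration of how such time series can be organized for the feature computations sketched in Section III, consider the following minimal Python/NumPy layout; the array shapes, sizes, and names are illustrative assumptions, not the paper's actual data format.

```python
import numpy as np

# Illustrative layout: three head markers per musician and frame, already projected
# onto the horizontal plane (x, y); 10,000 frames is a placeholder recording length.
markers = np.zeros((10_000, 4, 3, 2))   # frames x musicians x markers x (x, y)
positions = markers.mean(axis=2)        # head COG of each musician in each frame
ear = np.array([1.2, 0.8])              # fixed SQ-ear position in the horizontal plane (illustrative)
```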

III. DESCRIPTION OF IMPLEMENTED BEHAVIORAL FEATURES

This section details the features implemented to characterize group expressive behavior and to distinguish between the two performing conditions A and B. The musicians are numbered from 1 to 4: the numbers 1, 2, 3, and 4 denote, respectively, the 1st violinist, the 2nd violinist, the violist, and the cellist. Frames of each recording are denoted by $k$ ($k = 1, \ldots, N_{\mathrm{frames}}$).

A. Convergence of the heads’ directions toward the ear

The first behavioral feature $F_1$ is a vector, made up of two components, which evaluates how the heads' directions of the four musicians converge toward the SQ ear (see Section I). The following procedure has been followed, for each frame $k$ of each recording.

1) For each musician $i$ ($i = 1, \ldots, 4$), compute the current position vector $p_i^{(k)}$, in the horizontal plane, of the musician's head center of gravity (COG) as the mean of the position vectors describing the three markers located on the musician's head. Then, define the current direction $d_i^{(k)}$, in the horizontal plane, of musician $i$'s head as the unit vector connecting the COG of his head to the point located in the middle of the line between the two markers above his eyes (see Figure 2).

2) For each musician $i$ ($i = 1, \ldots, 4$), consider the half-line $HL_i^{(k)}$ starting from the point $p_i^{(k)}$ and with direction $d_i^{(k)}$, i.e., the set of all the points with position vectors
\[
p_i^{(k)} + t\, d_i^{(k)},
\]
where $t \geq 0$ is any nonnegative real number.


(a) Condition A (concert-like). (b) Condition B (perturbed).

Fig. 2. Illustration of features $F_1$ and $F_3$, measuring how the heads' directions of the four musicians converge toward the ear and how the heads' directions of each subset of 3 musicians converge toward the head of the remaining one, respectively. The figure shows a snapshot of the heads' marker positions of the music players when condition A (concert-like) and condition B (perturbed) are tested, respectively. White half-lines refer to the heads' directions and the green dot corresponds to the position of the point of total convergence (PoTC) (i.e., where all the musicians' heads are converging). The yellow point represents the point of partial convergence associated with the 1st violinist ($\mathrm{PoPC}_1$) (i.e., where the heads of the subset of the other 3 musicians are converging). Similarly defined points are associated with the other musicians. One can observe that in condition A, all musicians' head directions converge toward the ear of the SQ (white dot in the picture above), whereas in condition B, the heads' directions of the 2nd violinist, violist, and cellist converge toward the 1st violinist's one (red dot). The positions of all these points are used to compute the features $F_1$ and $F_3$, see formulas (1) and (3).

3) For each pair $(i, j)$ of musicians ($i, j = 1, \ldots, 4$, $i < j$), compute the position vector $p_{i,j}^{(k)}$ of the intersection between the two half-lines $HL_i^{(k)}$ and $HL_j^{(k)}$. As $p_i^{(k)} \neq p_j^{(k)}$, such an intersection exists if and only if the following condition is met: the algebraic linear system (in the real unknowns $u$ and $v$)
\[
p_i^{(k)} + u\, d_i^{(k)} = p_j^{(k)} + v\, d_j^{(k)}
\]
has a unique solution (this happens if and only if $d_i^{(k)}$ is not parallel to $d_j^{(k)}$), and both the obtained $u$ and $v$ are nonnegative. When the condition above holds, the position vector $p_{i,j}^{(k)}$ is then defined equivalently as
\[
p_{i,j}^{(k)} = p_i^{(k)} + u\, d_i^{(k)}, \quad \text{or} \quad p_{i,j}^{(k)} = p_j^{(k)} + v\, d_j^{(k)}.
\]
The procedure is repeated 6 times, determining, for the frames for which they exist, the 6 position vectors $p_{1,2}^{(k)}$, $p_{1,3}^{(k)}$, $p_{1,4}^{(k)}$, $p_{2,3}^{(k)}$, $p_{2,4}^{(k)}$, $p_{3,4}^{(k)}$ of the 6 pairwise intersections.

4) Denote by $I^{(k)}$ the subset of the pairs $(i, j)$ of musicians ($i, j = 1, \ldots, 4$, $i < j$) for which the pairwise intersections above exist at frame $k$, and by $|I^{(k)}|$ its cardinality. If $I^{(k)}$ is nonempty, then the position vector of the point of total convergence (PoTC), the point toward which all the musicians' head directions converge (see Figure 2), is defined as
\[
p_{\mathrm{PoTC}}^{(k)} = \frac{\sum_{(i,j) \in I^{(k)}} p_{i,j}^{(k)}}{|I^{(k)}|}.
\]
If $I^{(k)}$ is empty, the PoTC is not defined at frame $k$.

5) Denote by $e$ the fixed position vector of the SQ ear and evaluate the distance $\|p_{\mathrm{PoTC}}^{(k)} - e\|$ between the PoTC and the ear. When the PoTC is not defined, the distance is set equal to its maximum value achieved over the frames of the recording for which the PoTC exists.

The components $F_{1,1}$ and $F_{1,2}$ of the first behavioral feature $F_1$ are defined, respectively, as the mean distance between the PoTC and the ear and its standard deviation:
\[
F_{1,1} = \frac{1}{N_{\mathrm{frames}}} \sum_{k=1}^{N_{\mathrm{frames}}} \|p_{\mathrm{PoTC}}^{(k)} - e\|; \qquad F_{1,2} = \mathrm{std}\, \|p_{\mathrm{PoTC}}^{(k)} - e\|. \tag{1}
\]
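As a minimal sketch of how $F_1$ could be computed, assuming the per-frame head COGs and unit direction vectors in the horizontal plane are already available as NumPy arrays (directions derived from the markers as in step 1 above), one might write the following; function and variable names are ours, not those of the authors' EyesWeb XMI implementation.

```python
import numpy as np

def halfline_intersection(p_i, d_i, p_j, d_j, eps=1e-9):
    """Intersection of the half-lines p_i + u*d_i and p_j + v*d_j (u, v >= 0) in the plane.
    Returns the intersection point, or None when the half-lines do not meet."""
    A = np.column_stack((d_i, -d_j))           # 2x2 system A @ [u, v] = p_j - p_i
    if abs(np.linalg.det(A)) < eps:            # parallel directions: no unique solution
        return None
    u, v = np.linalg.solve(A, p_j - p_i)
    if u < 0 or v < 0:                         # the lines cross behind one of the heads
        return None
    return p_i + u * d_i

def feature_F1(positions, directions, ear):
    """F1 = (mean, std) of the distance between the point of total convergence (PoTC)
    and the ear over all frames. positions, directions: (n_frames, 4, 2); ear: (2,)."""
    n_frames = positions.shape[0]
    dist = np.full(n_frames, np.nan)
    for k in range(n_frames):
        pts = [halfline_intersection(positions[k, i], directions[k, i],
                                     positions[k, j], directions[k, j])
               for i in range(4) for j in range(i + 1, 4)]
        pts = [p for p in pts if p is not None]
        if pts:                                # PoTC = mean of the existing pairwise intersections
            potc = np.mean(pts, axis=0)
            dist[k] = np.linalg.norm(potc - ear)
    # frames where the PoTC is undefined take the maximum distance of the defined frames
    dist = np.where(np.isnan(dist), np.nanmax(dist), dist)
    return dist.mean(), dist.std()
```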

B. Joint movement dynamics of the heads toward the ear

Fig. 3. Illustration of feature $F_2$, measuring how much the musicians move jointly forward and backward with respect to the ear. The figure shows an example in which cohesion is not maximal, as the radial components of the head velocities of the 1st violinist, 2nd violinist, and violist are greater than 0 (their heads are moving toward the ear), but the radial component of the cellist's head velocity is smaller than 0 (it is moving away from the ear). Such radial components are used to compute the feature $F_2$ (see formula (2)).

The second behavioral feature $F_2$ evaluates how much the musicians move jointly forward and backward with respect to the SQ ear. The following procedure has been followed, for each frame $k$ of each recording.


1) Determine the velocity and kinetic energy of musician $i$'s head COG at frame $k$, respectively, as $v_i^{(k)}$ and $K_i^{(k)} = \frac{1}{2} m_i \|v_i^{(k)}\|^2$.

2) Evaluate the unit vector $d_{i,\mathrm{ear}}^{(k)} = \frac{e - p_i^{(k)}}{\|e - p_i^{(k)}\|}$ connecting musician $i$'s head COG to the ear, and determine the radial component $v_{i,\mathrm{rad}}^{(k)} = v_i^{(k)} \cdot d_{i,\mathrm{ear}}^{(k)}$ of $v_i^{(k)}$, that is, the one along the direction $d_{i,\mathrm{ear}}^{(k)}$.

3) Compute the following quantities:
- individual radial kinetic energy $K_{i,\mathrm{rad}}^{(k)} = \frac{1}{2} m_i \big(v_{i,\mathrm{rad}}^{(k)}\big)^2$;
- signed radial kinetic energy $K_{i,\mathrm{rad}}^{\pm,(k)} = \mathrm{sign}\big(v_{i,\mathrm{rad}}^{(k)}\big) \cdot \frac{1}{2} m_i \big(v_{i,\mathrm{rad}}^{(k)}\big)^2$ (where $\mathrm{sign}\big(v_{i,\mathrm{rad}}^{(k)}\big) = \pm 1$);
- total radial kinetic energy $K_{\mathrm{rad}}^{(k)} = \sum_{i=1}^{4} K_{i,\mathrm{rad}}^{(k)}$;
- total signed radial kinetic energy $K_{\mathrm{rad}}^{\pm,(k)} = \sum_{i=1}^{4} K_{i,\mathrm{rad}}^{\pm,(k)}$, and its modulus $|K_{\mathrm{rad}}^{\pm,(k)}|$.

The second behavioral feature $F_2$ is defined as the Pearson correlation coefficient between the modulus of the total signed radial kinetic energy (which quantifies how much the musicians coordinate their head movements in the direction of the ear) and the total radial kinetic energy (which quantifies how much the musicians move in the direction of the ear):
\[
F_2 = \text{corr. coeff. of the time series } |K_{\mathrm{rad}}^{\pm,(k)}| \text{ and } K_{\mathrm{rad}}^{(k)}. \tag{2}
\]
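A compact NumPy sketch of $F_2$, under the same assumptions as above and with head masses $m_i$ supplied by the caller (e.g., a common nominal value), could look as follows; names are illustrative.

```python
import numpy as np

def feature_F2(velocities, positions, ear, masses):
    """F2 = Pearson correlation between |total signed radial kinetic energy| and the
    total radial kinetic energy over frames.
    velocities, positions: (n_frames, 4, 2); ear: (2,); masses: (4,)."""
    d_ear = ear - positions                                   # vectors from head COGs to the ear
    d_ear /= np.linalg.norm(d_ear, axis=2, keepdims=True)     # unit vectors d_{i,ear}
    v_rad = np.sum(velocities * d_ear, axis=2)                # radial velocity components
    k_rad = 0.5 * masses * v_rad**2                           # individual radial kinetic energies
    k_signed = np.sign(v_rad) * k_rad                         # signed radial kinetic energies
    total_rad = k_rad.sum(axis=1)                             # total radial kinetic energy per frame
    total_signed_abs = np.abs(k_signed.sum(axis=1))           # modulus of the total signed energy
    return np.corrcoef(total_signed_abs, total_rad)[0, 1]
```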

C. Convergence of a subset of 3 heads' directions toward the remaining musician

The third behavioral feature $F_3$ is a vector made up of 4 components, one for each musician. It evaluates how the heads' directions of each subset of 3 musicians converge toward the head of the remaining one. The following procedure has been used, for each frame $k$ of each recording.

1) For each musician $l = 1, \ldots, 4$, denote by $I_l^{(k)}$ the subset of the pairs $(i, j)$ of musicians different from $l$ ($i, j = 1, \ldots, 4$, $i < j$, $i, j \neq l$) for which the pairwise intersections defined in Section III-A exist at frame $k$, and by $|I_l^{(k)}|$ its cardinality. If $I_l^{(k)}$ is nonempty, the position vector of the point of partial convergence ($\mathrm{PoPC}_l$) associated with musician $l$ (see Figure 2) is defined as
\[
p_{\mathrm{PoPC}_l}^{(k)} = \frac{\sum_{(i,j) \in I_l^{(k)}} p_{i,j}^{(k)}}{|I_l^{(k)}|}.
\]
If $I_l^{(k)}$ is empty, the $\mathrm{PoPC}_l$ is not defined at frame $k$.

2) Consider the distance $\|p_{\mathrm{PoPC}_l}^{(k)} - p_l^{(k)}\|$ between the $\mathrm{PoPC}_l$ associated with musician $l$ and his COG. When the $\mathrm{PoPC}_l$ is not defined, the distance is set equal to its maximum value achieved over the frames of the recording for which the $\mathrm{PoPC}_l$ exists.

Each component $F_{3,l}$ of the third behavioral feature $F_3$ is defined as the mean distance between the $\mathrm{PoPC}_l$ and the COG of musician $l$:
\[
F_{3,l} = \frac{1}{N_{\mathrm{frames}}} \sum_{k=1}^{N_{\mathrm{frames}}} \|p_{\mathrm{PoPC}_l}^{(k)} - p_l^{(k)}\|. \tag{3}
\]
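Reusing the halfline_intersection() helper from the $F_1$ sketch, $F_3$ could be approximated as follows; again, this is a sketch under the same illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def feature_F3(positions, directions):
    """F3[l] = mean distance between musician l's head COG and the point of partial
    convergence PoPC_l of the other three musicians' head directions.
    positions, directions: (n_frames, 4, 2). Reuses halfline_intersection() defined above."""
    n_frames = positions.shape[0]
    dist = np.full((n_frames, 4), np.nan)
    for k in range(n_frames):
        for l in range(4):
            others = [m for m in range(4) if m != l]
            pts = [halfline_intersection(positions[k, i], directions[k, i],
                                         positions[k, j], directions[k, j])
                   for a, i in enumerate(others) for j in others[a + 1:]]
            pts = [p for p in pts if p is not None]
            if pts:
                popc = np.mean(pts, axis=0)    # PoPC_l = mean of the existing intersections
                dist[k, l] = np.linalg.norm(popc - positions[k, l])
    # undefined frames take, per musician, the maximum distance of the defined frames
    dist = np.where(np.isnan(dist), np.nanmax(dist, axis=0), dist)
    return dist.mean(axis=0)
```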

D. Focus of attention (FoA) of a single musician

Fig. 4. Illustration of feature $F_4$, measuring how much the head of each musician is directed toward each other musician. The figure shows an example of determination of the focus of attention (FoA) for the 1st violinist. In this case, the head of the violist (3rd player) is the one that minimizes the angle between the direction of the 1st violinist's head and any vector connecting the 1st violinist's head to any of the other heads. Moreover, such minimum angle is less than the threshold, equal to 15°. The collection of FoAs is then used to compute the fourth feature $F_4$ (see formula (4)).

The fourth set of behavioral features $F_4$ is a matrix whose elements specify how much the head of each musician is directed toward each other musician. The following procedure has been followed, for each frame $k$ of each recording and for each musician $i$ (see Figure 4 for an example).

1) Compute the angles between the direction of musician $i$'s head and the vectors connecting his head to each of the other musicians' heads, respectively.

2) If the minimum of these angles is smaller than a given threshold (set to 15°, adapted from the literature [11]) and is achieved for musician $j$, the head of musician $i$ is considered as directed toward musician $j$. Define $j$ as the focus of attention of $i$: $\mathrm{FoA}^{(k)}(i) = j$.

3) Otherwise, conclude that the head of musician $i$ is not directed toward any other musician in such a frame. Define 0 as the focus of attention of $i$: $\mathrm{FoA}^{(k)}(i) = 0$.

4) By definition, there are no frames for which $\mathrm{FoA}^{(k)}(i) = i$ (no musician is directed toward himself).

Each element $F_{4,i,j}$ of the fourth behavioral feature $F_4$ is defined as the percentage of frames in which the focus of attention of musician $i$ is $j$ (the possible values of $i$ and $j$ belong, respectively, to the sets $\{1, 2, 3, 4\}$ and $\{0, 1, 2, 3, 4\}$):
\[
F_{4,i,j} = \%\ \text{of frames for which } \mathrm{FoA}^{(k)}(i) = j. \tag{4}
\]
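A possible NumPy sketch of the FoA computation and of $F_4$, with the same illustrative conventions as the previous snippets, is given below.

```python
import numpy as np

def feature_F4(positions, directions, threshold_deg=15.0):
    """F4 as a 4x5 matrix of percentages: row i corresponds to musician i+1, column j = 0
    means "no musician within the angular threshold", and columns 1..4 correspond to
    musicians 1..4. positions, directions: (n_frames, 4, 2)."""
    n_frames = positions.shape[0]
    foa = np.zeros((n_frames, 4), dtype=int)
    for k in range(n_frames):
        for i in range(4):
            best_j, best_angle = 0, np.inf
            for j in range(4):
                if j == i:
                    continue
                to_j = positions[k, j] - positions[k, i]
                to_j /= np.linalg.norm(to_j)
                cos_angle = np.clip(np.dot(directions[k, i], to_j), -1.0, 1.0)
                angle = np.degrees(np.arccos(cos_angle))
                if angle < best_angle:
                    best_j, best_angle = j + 1, angle   # musicians are numbered 1..4
            foa[k, i] = best_j if best_angle < threshold_deg else 0
    # rows: musicians 1..4; columns: FoA values 0..4, expressed as percentages of frames
    return np.stack([np.bincount(foa[:, i], minlength=5) / n_frames * 100
                     for i in range(4)])
```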

IV. RESULTS

This section describes the results obtained for each feature defined in Section III, for both conditions A and B. All features were submitted to statistical tests in order to draw inferences on them. The distribution of the data and their variances were first verified to select the most appropriate statistical tests. For all features, the obtained values did not follow a normal distribution according to normality tests (Kolmogorov-Smirnov). The variances were also not homogeneous according to statistical tests (Levene). Non-parametric tests, which do not require the assumptions of a normal distribution and equal variances (of the residuals), were therefore used. The Mann-Whitney U test was applied to the mean $F_1$, $F_2$, and $F_4$ feature values of all recordings taken together, for each condition A and B. Multi-level modeling (linear mixed modeling) was used for the $F_3$ feature in order to take into account an additional level in the analysis (subset of musicians) that could not be included otherwise.
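The test-selection logic can be illustrated with SciPy as follows; the function is a sketch of the procedure described above (the linear mixed model used for $F_3$ is sketched separately in Section IV-C), and the names and significance level are illustrative.

```python
import numpy as np
from scipy import stats

def compare_conditions(values_A, values_B, alpha=0.05):
    """Compare per-recording mean feature values between conditions A and B.
    values_A, values_B: 1-D arrays; returns the chosen test's statistic and p-value."""
    values_A, values_B = np.asarray(values_A, float), np.asarray(values_B, float)
    # normality (Kolmogorov-Smirnov against a fitted normal) and equal variances (Levene)
    normal = all(stats.kstest(v, "norm", args=(v.mean(), v.std(ddof=1))).pvalue > alpha
                 for v in (values_A, values_B))
    equal_var = stats.levene(values_A, values_B).pvalue > alpha
    if normal and equal_var:
        test = stats.ttest_ind(values_A, values_B)      # parametric alternative
    else:
        test = stats.mannwhitneyu(values_A, values_B)   # non-parametric, as used in the paper
    return test.statistic, test.pvalue
```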

A. Convergence of the heads’ directions toward the ear

The non-parametric Mann-Whitney U test showed that there was no significant difference for the $F_{1,1}$ mean values of the first feature $F_1$ (U=5, p = .117), but a highly statistically significant difference for the $F_{1,2}$ mean values (U=0, p = .008), see Figure 5. The musicians' heads' variability (value of $F_{1,2}$) was significantly smaller in condition B than in condition A.

Fig. 5. Means and confidence intervals for component $F_{1,2}$ of the first feature $F_1$ in conditions A and B. The musicians' heads' variability was significantly smaller in the perturbed condition (B) than in the concert-like one (A).

B. Joint movement dynamics of the heads toward the ear

For the second feature $F_2$, the non-parametric Mann-Whitney U test showed that the difference between its mean values under conditions A and B was significant (U=25, p = .008). In both conditions, however, the $F_2$ mean values were extremely high, revealing high cohesion between the musicians in their movements along the direction of the ear.

Fig. 6. Means and confidence intervals for the second feature $F_2$, in conditions A and B. One can notice the high values in both conditions, showing that cohesion between the musicians along the direction of the ear remains high notwithstanding the perturbation.

C. Convergence of a subset of 3 heads' directions toward the remaining musician

Fig. 7. Means and confidence intervals for the components of the third feature $F_3$ in conditions A and B. One can notice that the distance between the 1st violinist and his associated point of partial convergence ($\mathrm{PoPC}_1$) was significantly smaller in the perturbed condition (B) than in the concert-like one (A). A similar result was obtained for the 2nd violinist, whereas the opposite was observed for the cellist (who sits in front of the 1st violinist).

A linear mixed model (LMM) was chosen to compare the musicians' third feature $F_3$ values across conditions A and B, in order to handle the correlated data and unequal variances observed in the dataset. To control the inflation of the type I error probability due to multiple comparisons, the Bonferroni correction was applied to adjust the $\alpha$-value (the level of statistical significance). The linear mixed model identified a significant main effect of Condition (A vs. B), (p < .001). As shown in Figure 7, the distance between the 1st violinist and his associated point of partial convergence ($\mathrm{PoPC}_1$) decreases significantly from the concert-like condition (A) to the perturbed one (B), revealing how the heads of the 2nd violinist, violist, and cellist converge toward him. As a side effect, the distance between the 2nd violinist and his associated point of partial convergence ($\mathrm{PoPC}_2$) decreases significantly, whereas the distance between the cellist and his associated point of partial convergence ($\mathrm{PoPC}_4$) increases significantly.
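A hedged sketch of such a multi-level analysis using statsmodels is shown below; the model formula and column names are our assumptions, not the authors' exact specification.

```python
import statsmodels.formula.api as smf

def fit_f3_mixed_model(df):
    """Fit a linear mixed model for F3 from a long-format table with one row per
    recording and musician, and columns 'F3', 'condition' ('A'/'B'), 'musician' (1-4)
    and 'recording'. Random intercepts are grouped by recording."""
    model = smf.mixedlm("F3 ~ C(condition) * C(musician)", df, groups=df["recording"])
    return model.fit()
```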

D. Focus of attention (FoA) of a single musician

The statistical analysis investigated the values of the fourth feature $F_4$ components related to the 1st violinist. These components quantify how much the 2nd violinist, the violist, and the cellist focus on the 1st violinist in conditions A and B. The non-parametric Mann-Whitney U test revealed a significant difference in the mean values (U=8282, p < .001). Figure 8 shows pie charts summarizing the mean values of the fourth feature $F_4$ components for all musicians in conditions A and B.

E. Questionnaire

Independent-samples t-tests were conducted to compare the ratings of expressivity and cohesion, in each performance condition A and B, as indicated by the four musicians after each recording. The results (means and confidence intervals) are shown in Figure 9. The difference in ratings was significant for expressivity, t(38) = 12.13, p < .001, but not for cohesion (p = .07). Interestingly, this rating of cohesion is consistent with the findings obtained in Section IV-B for feature $F_2$, according to which cohesion was high in both conditions, despite the perturbation specific to condition B.

V. DISCUSSION AND CONCLUSION

Playing with others represents one of the most engaging and expressive experiences [3].


[Fig. 8 pie charts: distribution of the focus of attention of players 1-4 in conditions A and B; the per-slice percentage labels are not reproduced here.]

Fig. 8. Pie-chart representation of the mean values of the fourth feature $F_4$ components, in conditions A and B. One can notice that the 2nd violinist, the violist, and the cellist focused their attention on the 1st violinist more in the perturbed condition (B) than in the concert-like one (A).

Fig. 9. Means and confidence intervals for the expressivity and cohesion items in the questionnaire, in conditions A and B. One can notice that expressivity was higher in the concert-like condition (A) than in the perturbed one (B), whereas cohesion was similar in the two cases.

Our findings showed that a set of behavioral features can be implemented to automatically distinguish a highly satisfying, engaging, and expressive type of performance from a lower-quality one. Specifically, we found that: features $F_1$ and $F_2$ revealed that the SQ ear played a central role in coordinating the musicians and achieving a cohesive performance in both performing conditions A and B (see Sections IV-A and IV-B); feature $F_3$ showed that in the perturbed condition B another point of interest plays a central role, namely the point of partial convergence $\mathrm{PoPC}_1$ associated with the 1st violinist (see Section III-C); features $F_1$, $F_3$, and $F_4$ made it possible to distinguish the concert-like condition A from the perturbed one B, as the observed differences of their means were statistically significant (see Sections IV-A, IV-C, and IV-D); and the results obtained from the feature analysis were consistent with the results of the questionnaire (see Section IV-E).

Our results quantitatively confirm previous studies on music ensemble performance (e.g., [2]). It has actually been suggested that musicians pay attention to other performers' heads to better predict their upcoming actions. This is particularly obvious when the behavior of a musician is difficult to predict. Indeed, condition B of the present study set the 1st violinist in a privileged position, providing him with information that could be transmitted to the other musicians mainly through his movement, and constraining all the other musicians to follow him tightly to maintain the group cohesion. More generally, our study highlights the potential of the SQ scenario as a test case to study group behavior in social, emotionally engaged, and creative activities in ecological settings.

Future work will include: the computation of the features at a local scale instead of a global one (which would allow one to consider the effect of the different musical writing styles inside the same piece); the detection of possible nonlinear dependencies among the individual features of different members of the SQ [9]; the application of our methodology to other SQs and other settings (e.g., musicians in orchestras, or other small groups of highly skilled people, not necessarily musicians, such as dancers or athletes); and the use of more sophisticated classification methods (e.g., SVMs) to highlight the discriminative power of the features.

ACKNOWLEDGMENT

The project SIEMPRE acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 250026-2.

REFERENCES

[1] S. Dahl, F. Bevilacqua, R. Bresin, M. Clayton, L. Leante, I. Poggi, and N. Rasamimanana. Gestures in performance. Musical Gestures: Sound, Movement, and Meaning, 2009.

[2] J.W. Davidson and J.M.M. Good. Social and musical co-ordination between members of a string quartet: An exploratory study. Psychology of Music, 30(2):186, 2002.

[3] T. Eerola and J.K. Vuoskoski. A review of music and emotion studies: Approaches, emotion models and stimuli. Music Perception, 2013.

[4] A. Gilboa and M. Tal-Shmotkin. String quartets as self-managed teams: An interdisciplinary perspective. Psychology of Music, 2010.

[5] D. Glowinski, L. Badino, A. Ausilio, A. Camurri, and L. Fadiga. Analysis of leadership in a string quartet. In Third International Workshop on Social Behaviour in Music at ACM ICMI 2012, 2012.

[6] D. Glowinski, N. Dael, A. Camurri, G. Volpe, M. Mortillaro, and K. Scherer. Toward a minimal representation of affective gestures. IEEE Transactions on Affective Computing, 2(2):106–118, April–June 2011.

[7] G. Gnecco, L. Badino, A. Camurri, A. D'Ausilio, L. Fadiga, D. Glowinski, M. Sanguineti, G. Varni, and G. Volpe. Towards automated analysis of joint music performance in the orchestra. In Proceedings of the Third International Conference on Arts and Technology (ArtsIT 2013), Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST) series, Springer, to appear.

[8] H. Hung and D. Gatica-Perez. Estimating cohesion in small groups using audio-visual nonverbal behavior. IEEE Transactions on Multimedia, 12(6):563–575, 2010.

[9] P.E. Keller and M. Appel. Individual differences, auditory imagery, and the coordination of body movements and sounds in musical ensembles. Music Perception, 28(1):27–46, 2010.

[10] F. Seddon and M. Biasutti. A comparison of modes of communication between members of a string quartet and a jazz sextet. Psychology of Music, 37(4):395, 2009.

[11] R. Stiefelhagen. Tracking focus of attention in meetings. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, pages 273–280. IEEE, 2002.

[12] R. Stiefelhagen and J. Zhu. Head orientation and gaze direction in meetings. In CHI '02 Extended Abstracts on Human Factors in Computing Systems, CHI EA '02, pages 858–859, New York, NY, USA, 2002. ACM.

[13] A. Vinciarelli, M. Pantic, D. Heylen, C. Pelachaud, I. Poggi, F. D'Errico, and M. Schroder. Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 2012.
