
An introduction to the multimodal corpus

HuComTech and its annotation

László Hunyadi, University of Debrecen

Nijmegen, MPI, 19 December 2012

Wednesday, April 24, 13

HuComTech:

Human-Computer Technologies

Hungarian Communication Technologies


research modules involved:

• computational linguistics

• communication theory

• psychology

• digital image processing

• engineering (robotics)

Some basic data about the corpus


Purpose of the corpus

to identify elements of human-human communication, and structural relations between them, that are

- relevant for HCI
- technologically implementable

furthermore, to

- learn the multimodal nature of human communication (both its verbal and nonverbal aspects)
- describe human communication in a multimodal, holistic model

The corpus is intended to represent sufficient data, in proper arrangement, for the purposes of

- linguistics
- language technology (the training and testing of speech recognition software)
- behavioral psychology
- robotics
- and more

Corpus:

• approx. 60 hours of video recordings of 111 speakers aged 18-29, including

• ≈ 450,000 word tokens

• 15 read sentences

• 10-minute guided dialogues (job interviews)

• 15-minute free dialogues

[Chart: Distribution of subjects by sex - male/female, 45.5% vs. 54.5%]

[Chart: Distribution of subjects by age - # of subjects per age, 19-30]

Annotation

serves the study of multimodality through the study of unimodality and the fusion of aligning markers

Markers to be annotated are determined by a theoretical-technological model of communication


Theoretical considerations

We assume that human communication has a two-way mechanism: speakers and listeners rely on the same mechanism to communicate.

Therefore, in order to properly represent human communication for technology, we need a model that follows this two-way mechanism: one that serves both synthesis and analysis.

We assume that the approach of a generative model proposed for syntax (especially Chomsky 1981) can be useful in building such a two-way model of communication for technology.

The basic structure of the model

Modularity: [diagram]

The pragmatic extension: [diagram]

Characteristics of the constituent modules

• Each module has a characteristic finite set of primitives and, by way of the Operational component, these primitives are combined into an infinite set of non-primitives and further structures.

• The basic structure generates all and only those structures (configurations of primitives) that are formally possible (‘grammatical’) in any communicative event.

E.g. the ‘start’ of an event can be followed by the ‘end’ of an event, but the inverse order is not possible (‘ungrammatical’).

• The functional extension assigns to any given structure generated by the basic structure all possible communicative functions, and only those.

E.g. the restart node in the basic structure can be associated with the continuity function, but the assignment of the turn-taking function to it is not possible.

• The pragmatic extension actualizes the input from the functional extension for the given actual communicative event, by selecting the appropriate markers and their appropriate values based on the scenario and ontology behind the event.

E.g. the function ‘happiness’ is expressed by the appropriate value of some modal marker(s): facial, gestural, audio, lexical, or some/all of them.
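The division of labour between the basic structure and the functional extension can be sketched in code. This is an illustration only: the event inventory and function table below are invented stand-ins, not the actual HuComTech label sets.

```python
# Illustrative sketch of the two-way model; the event and function names
# are hypothetical, not the actual HuComTech label inventory.

# Basic structure: 'start' must precede 'end'; a 'restart' may only occur
# between them. This encodes which configurations are 'grammatical'.
def is_grammatical(events):
    state = "outside"
    for e in events:
        if e == "start" and state == "outside":
            state = "inside"
        elif e == "restart" and state == "inside":
            pass  # restarts are allowed within an open event
        elif e == "end" and state == "inside":
            state = "outside"
        else:
            return False  # e.g. 'end' before 'start' is ungrammatical
    return state == "outside"

# Functional extension: each structural node is mapped to all and only
# the communicative functions it can carry.
FUNCTIONS = {
    "start":   {"turn-taking"},
    "restart": {"continuity"},  # but never turn-taking
    "end":     {"turn-giving"},
}

assert is_grammatical(["start", "restart", "end"])
assert not is_grammatical(["end", "start"])       # inverse order is ruled out
assert "turn-taking" not in FUNCTIONS["restart"]  # function assignment is restricted
```

The same tables can be read in both directions, which is what makes the model two-way: generation walks from structures to markers, analysis from observed markers back to structures.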

Interface to technology: application of the model

The pragmatic extension is the interface to technology:

Pragmatic extension → Technology

• Functions are translated into their technological counterparts as parameters through data fusion

• The pragmatic extension selects the modalities and their markers to represent the given function

• Actual occurrences of markers are represented by the corresponding parameter values

• Technology receives these parameter values as input and operates on them.

Experimental settings and annotation

[video 1]

Annotation

unimodal (video, audio) / multimodal (video + audio)

manual / automatic

description of physical properties (esp. video) / interpretative annotations

with focus on emotions and the multimodal alignment of video and audio

special features of annotation: pragmatic, syntactic, prosodic

Levels of annotation and attributes (labels)

Audio

IP-level: HC, SC, EM, IN, BC, HE, RE, IT, SL, V

discourse-level: TT, TK, BC, SL

emotion-level: neutral, sad, happy/laughing, surprised, recall, tensed (and degrees of: strong, moderate, reduced), other, silence

Levels of annotation and attributes (labels)

Video

comevent: start, end

deictic: addressee, self, measure, object, shape; left, right, both

emblems: attention, agree, block, disagree, doubt, doubt-shrug, refusal, surprise, more-or-less, number, finger-ring, hands up, one hand other hand, other

headshift: lower, turn, raise, shake, nod; sideways, left, right

touchmotion: hair, leg, arm, face, eye, ear, chin, mouth, neck, bust, forehead, nose, glasses; tap, scratch; left, right

emotions: natural, happy, recall, sad, surprise, tense (and degrees of: strong, moderate, reduced)

posture: crossing arm, holding head, lean back, lean forward, lean left, lean right, rotate right, rotate left, shoulder up, upright

handshape: breaking, fist, crossing fingers, open flat, open spread, thumb out, index out; left, right, both

facial expressions: natural, happy, recall, sad, surprise, tense (and degrees of: moderate, reduced, strong)

eyebrows: scowl, up; left, right, both

gaze: blink, up, down, left, right, forwards, left-up, left-down, right-up, right-down
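Tiers like gaze and emotions above are, in effect, lists of labelled time intervals, and studying multimodal alignment amounts to finding temporal overlaps between tiers. A minimal sketch, with invented timestamps:

```python
# Minimal sketch of time-aligned annotation tiers (timestamps invented).
# Each tier is a list of (start_sec, end_sec, label) intervals.
gaze     = [(0.0, 1.2, "forwards"), (1.2, 1.5, "blink"), (1.5, 3.0, "up")]
emotions = [(0.0, 2.0, "natural"), (2.0, 3.0, "happy")]

def overlaps(tier_a, tier_b):
    """Return label pairs from two tiers whose intervals overlap in time."""
    pairs = []
    for s1, e1, lab1 in tier_a:
        for s2, e2, lab2 in tier_b:
            if max(s1, s2) < min(e1, e2):  # non-empty temporal intersection
                pairs.append((lab1, lab2))
    return pairs

print(overlaps(gaze, emotions))
# [('forwards', 'natural'), ('blink', 'natural'), ('up', 'natural'), ('up', 'happy')]
```

The pair ('up', 'happy') is the kind of co-occurrence that the "fusion of aligning markers" mentioned earlier would collect across the whole corpus.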

Levels of annotation and attributes (labels)

Syntax

structural segmentation:

clause boundaries

hierarchical arrangement of clauses

internal structure of clauses (esp. missing elements)

Levels of annotation and attributes (labels)

Syntax vs. prosody (prosody of clauses)

pitch movement: rise, fall, stagnant + finer distinctions

intensity: increase, decrease, stagnant + finer distinctions

pause/duration: increase, decrease, stagnant + finer distinctions

Levels of annotation and attributes (labels)

Pragmatics - multimodal

Annotation: DiAMSL for text-based events, on several layers, esp. audio: turn management, discourse

Multimodality - the complex of audio + video: multimodal communicative act (annotation based on Bach-Harnish)

communicative act types: constatives, directives, commissives, acknowledgements, none

supporting events of communicative acts: backchannel, politeness markers, corrections, no support

thematic control: topic initiation, elaboration, topic change (contextual, non-contextual)

information structure: given vs. new information

Levels of annotation and attributes (labels)

Pragmatics - unimodal

agreement: uninterested, disagree, block, uncertainty; full, partial

attention: calling, paying

deixis

information: received novelty

turn-management: intending to start speaking, start speaking successfully, end speaking, breaking in

Sample data for multimodal alignments

Turn management

turn-give: forward, blink, down, left-down, right-down

turn-take: forward, blink, down, left-down, right-down

break-in_turn-keep: forwards, blink, up, down, left-down, right-down

Emotions vs. gestures

uncertainty is mostly found to be associated with the hand gesture open spread, less frequently with crossing fingers

agreement is also associated with open spread and crossing fingers

doubt is found to be associated with open spread, crossing fingers and sideways as well

Video annotation (manual vs. automatic)

annotation method   physical values   interpretative values
manual                     -                   +
automatic                  +                   -

(+ = advantage, - = disadvantage)

Essential difference between the two: automatic annotation is ‘digital’, following framewise judgement across a predefined number of frames, whereas manual annotation is ‘analog’.

Automatic annotation: Noldus FaceReader

Settings and sample output

Video analysis state log - Face Model: General - Calibration: Continuous
Start time: 2012.06.15. 9:35:15
Filename: C:\Users\MACMINI\Documents\Noldus test\007_J_C2.trimmed.mov
Frame rate: 3.33333333333333

Video Time     Emotion
0:00:03        Unknown
0:00:19.800    Neutral
0:00:34.800    Unknown
0:00:35.699    Neutral
0:00:39.300    Sad
0:00:42.900    Neutral
0:00:49.500    Scared
0:00:51        Sad
0:00:51.900    Neutral
0:01:02.400    Sad
0:01:05.100    Disgusted
0:01:07.800    Neutral
[... entries continue up to 0:10:14.400 Unknown, 0:10:15.900 END]

Values are assigned to single frames, hence only the begin time of the given value is logged.
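Since the log stores only the begin time of each value, durations have to be recovered by pairing each entry with the next one. A sketch of that pairing over a simplified excerpt of the log above:

```python
# Turn a FaceReader-style state log (begin time + label only) into
# labelled intervals; timestamps are "H:MM:SS" or "H:MM:SS.mmm".

def to_seconds(ts):
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def log_to_intervals(entries):
    """entries: list of (timestamp, label); returns (start_s, end_s, label)."""
    intervals = []
    # Each interval ends where the next entry begins; the final entry
    # (END in the real log) only closes the preceding interval.
    for (t1, label), (t2, _) in zip(entries, entries[1:]):
        intervals.append((to_seconds(t1), to_seconds(t2), label))
    return intervals

log = [("0:00:03", "Unknown"), ("0:00:19.800", "Neutral"),
       ("0:00:34.800", "Unknown"), ("0:00:35.699", "Neutral")]
for start, end, label in log_to_intervals(log):
    print(f"{label:8s} {end - start:6.3f} s")
```

This recovers exactly the duration information the slide notes is missing from the raw log, as long as the labels are contiguous.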

Binary value assignment

Timeline: natural vs. happy

Valence

Summary statistics

[video 2]

Comparison: automatic vs. manual emotion recognition

[Pie chart: manual annotation - happy, natural, recall, tense, surprise; extracted values 45%, 42%, 7%, 4%, 3%]

Comparison: automatic vs. manual emotion recognition

Although both systems are based on the FACS model of emotions, different categories (emotions) are recognised.

Whereas both systems assign interpretative values, manual annotation selects ‘more difficult’ ones.

Manual annotation offers subjectively observed degrees of emotions (strong, moderate, reduced); for automatic annotation, thresholds for being ‘happy’, ‘angry’, etc. are determined statistically > smaller degrees are left out.

Occasional unrealistic values in automatic annotation are the result of the single-frame approach.

Duration is not marked > the offset of an emotion cannot be determined in the case of non-continuous labels.

Most agreement between the two approaches: happy, natural.
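One simple way to quantify how much the two approaches agree is framewise comparison, in the spirit of the automatic system's frame-based judgements: sample both annotations at a fixed rate and count matching labels. A toy sketch; the intervals are invented, and this is only one possible agreement measure, not necessarily the one used in the project:

```python
# Framewise agreement between two emotion annotations (toy data).
# Both tiers are sampled every `step` seconds and labels are compared.

def label_at(tier, t):
    for start, end, label in tier:
        if start <= t < end:
            return label
    return None

def framewise_agreement(tier_a, tier_b, end_time, step=0.1):
    ticks = [i * step for i in range(int(end_time / step))]
    hits = sum(label_at(tier_a, t) == label_at(tier_b, t) for t in ticks)
    return hits / len(ticks)

manual    = [(0.0, 2.0, "happy"), (2.0, 4.0, "natural")]
automatic = [(0.0, 1.0, "happy"), (1.0, 4.0, "natural")]
print(framewise_agreement(manual, automatic, 4.0))  # 0.75: they disagree on 1 s out of 4
```

A more refined comparison would first map the two label sets onto each other, since, as noted above, the two systems do not recognise the same categories.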

Annotation of spoken syntax and its relation to prosody in the HuComTech corpus

Aims:

• language technology (speech-to-text)

• communication studies (alignment of multimodal markers for communicative acts and emotions)

• linguistics (the syntax-prosody interface)

Syntactic data from our annotation

Spoken language vs. written language

Grammar: same/different?

Same underlying principles:

- grouping of elements
- hierarchical organisation of groups

Difference: two additional dimensions of spoken language:

- time
- grouping has language-specific means

[Chart: # of clauses per sentence (1-22), informal vs. formal dialogs]

Structural relations (hierarchy)

- clause sequences with no structural relation:
  - has no subordinate clause
  - has no coordinate clause
  - has neither subordinate nor coordinate clause
- embeddings
- insertions
- multiple subordination (recursion)

Type of missing element according to syntactic code

Type of missing element             Informal dialogs      %     Formal dialogs      %
1.  nothing missing                       2664          35.59        758         34.6
2.  main clause                             37           0.49         15          0.69
3.  preceding clause                        58           0.77          6          0.27
4.  relative pronoun                        89           1.19         22          1.01
5.  conjunction                             22           0.29          4          0.18
6.  subject (grammatical)                 3178          42.37       1167         54.14
7.  subject (logical)                      274           3.66        113          5.17
8.  predicate                              214           2.87         72          3.29
9.  object                                 102           1.36         45          2.06
10. adverb                                  11           0.15          4          0.18
11. attribute                                0           0             0          0
12. verb                                    10           0.13          0          0
13. unfinished clause                      728           9.7         167         13.1
14. missing element not relevant          3375          45.05        769         35.21
Sum:                                                   143.62                   149.9

Type of missing element by frequency: the same rows ordered by informal-dialog frequency (14, 6, 1, 13, 7, 8, 9, 4, 3, 2, 5, 10, 12, 11).
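The frequency-ordered view is just the same data sorted by the informal-dialog counts; a small sketch that reproduces that ordering from the table above:

```python
# Reorder the missing-element counts by frequency (informal dialogs),
# reproducing the "by frequency" view of the same table.
counts = {
    "nothing missing": (2664, 758), "main clause": (37, 15),
    "preceding clause": (58, 6), "relative pronoun": (89, 22),
    "conjunction": (22, 4), "subject (grammatical)": (3178, 1167),
    "subject (logical)": (274, 113), "predicate": (214, 72),
    "object": (102, 45), "adverb": (11, 4), "attribute": (0, 0),
    "verb": (10, 0), "unfinished clause": (728, 167),
    "missing element not relevant": (3375, 769),
}
for name, (informal, formal) in sorted(counts.items(),
                                       key=lambda kv: -kv[1][0]):
    print(f"{name:30s} {informal:5d} {formal:5d}")
```

The top of the sorted output ("missing element not relevant", "subject (grammatical)", "nothing missing", "unfinished clause") matches the order in which the slide lists the most frequent codes.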

Syntactic types vs. gestures

Alignment of syntactic type and gestures can offer insight into certain cognitive processes in communication:

- speech dynamics
- error detection
- gesturing the “untold”

Syntactic types vs. gestures

A very interesting finding: pause (silence) and gaze can be mutually supplementary:

We found very few instances of gazing at the right overlap of an unfinished clause followed by a pause, but there was frequent gazing if there was no pause in a similar position.

Also, looking up as a gazing direction was specific to alignment with the end of an unfinished clause, but was quite rare at the end of a finished/complete clause.

[Pie chart: unfinished clause (type 13) + no SL + gaze, informal - gaze directions forwards, left-down, right-down, right-up, left-up, up, down, blink; extracted values 49%, 14%, 11%, 10%, 6%, 5%, 2%, 2%]

Annotation of prosody in the HuComTech corpus

• the “IP-level” (based on F0 contour and pause, manual)

• pitch movement (automatic)

• intensity change (automatic, in progress)

• accent/stress detection (automatic, in progress)

Detection of pitch movement

• aim: to generate data on pitch movement trends (actual movement type, tone range)

• capture F0 properties of syntactic types

• assign communicative functions (including emotions)

The calculation of the trend of pitch movement

• based on Praat (algorithm and scripts)

• stylization on the syllable level (P. Mertens’ prosogram: ‘perceived pitch’ in semitones, http://bach.arts.kuleuven.be/pmertens/prosogram/)

• trend of syllable-based stylization (Szekrényes, 2012)

• classification
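The prosogram's ‘perceived pitch in semitones’ is a standard logarithmic transform of F0. A minimal sketch; the reference frequency of 100 Hz is a conventional choice of ours, as the slides do not specify one:

```python
import math

# Convert F0 in Hz to semitones relative to a reference frequency.
# Prosogram-style stylization works on such a log scale; 100 Hz is a
# common reference choice (an assumption - the slides do not specify one).
def hz_to_semitones(f0_hz, ref_hz=100.0):
    return 12.0 * math.log2(f0_hz / ref_hz)

# One octave up (a doubling of F0) is exactly 12 semitones:
print(hz_to_semitones(200.0))  # 12.0

# First F0 value from the prosogram example slide:
print(hz_to_semitones(126.86))
```

Working in semitones rather than Hz makes pitch intervals comparable across speakers with different ranges, which matters for a corpus with 111 subjects.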

[Prosogram (v2.8, asyll, G=0.32/T2, DG=20, dmin=0.035, 150 Hz reference line) of 006mc22 F shure: “{p} %o hát %o pályakezdő vagyok, úgyhogy legfeljebb most a tanulmányaimról tudok mesélni” (‘well, I am a career starter, so right now I can at most talk about my studies’), clause codes 1.0.2.0.0.0.6. and 2.3.1.0.0.0.6.; syllable-level stylization labelled upward, stagnant, rise, fall, stagnant at level L1, with F0 values 126.86-155.21 Hz]

Calculations associated with syntactic type but not based on it

Filename         Start   End     Dur   StartF0  EndF0   AbsDiff  Hz/ms     Movement  F0Range  Sent  Clause analysis
057fc30_I_shure  136.44  137.01  0.57  236.31   172.81  63.5     0.111411  fall      MM       s30   1.0.0.0.0.0.6.
057fc30_I_shure  137.01  137.06  0.05  172.81   221.4   48.59    1.079783  rise      MM       s30   1.0.0.0.0.0.6.
057fc30_I_shure  137.06  137.29  0.23  221.4    255.44  34.04    0.14798   rise      MH2      s30   1.0.0.0.0.0.6.
057fc30_I_shure  137.29  137.78  0.49  255.44   179.93  75.5     0.153578  fall      H1M      s30   1.0.0.0.0.0.6.
057fc30_I_shure  137.78  138.32  0.54  179.93   186.13  6.2      0.011479  stagnant  MM       s31   1.2.0.0.0.0.4,6,9.
057fc30_I_shure  138.32  138.42  0.1   186.13   229.14  43.01    0.452767  rise      MM       s31   1.2.0.0.0.0.4,6,9.
057fc30_I_shure  138.42  139.03  0.61  229.14   168.1   61.05    0.10008   fall      ML1      s31   1.2.0.0.0.0.4,6,9.
057fc30_I_shure  139.03  139.08  0.05  168.1    198.17  30.07    0.6014    rise      L1M      s31   2.0.0.1.0.0.6.
057fc30_I_shure  139.08  139.2   0.12  198.17   165.49  32.68    0.272326  fall      ML1      s31   2.0.0.1.0.0.6.
057fc30_I_shure  139.2   139.56  0.36  165.49   169.56  4.07     0.011147  stagnant  L1M      s31   2.0.0.1.0.0.6.
057fc30_I_shure  139.56  139.63  0.07  169.56   201.24  31.68    0.452586  rise      MM       s31   2.0.0.1.0.0.6.
057fc30_I_shure  139.63  140.2   0.56  201.24   208.74  7.5      0.013276  stagnant  MM       s31   2.0.0.1.0.0.6.

Columns: StartTime, EndTime, Duration (s); StartValue, EndValue, AbsoluteDifference (Hz); Change across time (Hz/msec); Movement; ActualF0Range (5 pitch levels: L2, L1, M, H1, H2); Sentence #; Clause analysis

[video 3]
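The derived columns of this table can be recomputed from the raw ones: the absolute difference is |EndValue - StartValue|, and the change across time is that difference divided by the duration in milliseconds. A sketch over the first row; the rise/fall/stagnant threshold is our assumption (the slide does not state the value used in the HuComTech pipeline), and tiny deviations from the printed values presumably come from rounding in the displayed columns:

```python
# Recompute the derived columns of the pitch-movement table.
# The 10 Hz stagnant threshold is an assumption; the slides do not
# state the exact value used in the HuComTech pipeline.

def movement(start_hz, end_hz, duration_s, stagnant_hz=10.0):
    diff = end_hz - start_hz
    change_per_ms = abs(diff) / (duration_s * 1000.0)  # Hz/msec column
    if abs(diff) < stagnant_hz:
        label = "stagnant"
    elif diff > 0:
        label = "rise"
    else:
        label = "fall"
    return round(abs(diff), 2), round(change_per_ms, 6), label

# First row of the table: 136.44-137.01 s, 236.31 -> 172.81 Hz
print(movement(236.31, 172.81, 0.57))  # (63.5, 0.111404, 'fall')
```

With this threshold the rule reproduces the movement labels of all twelve rows, including the two stagnant ones (differences of 6.2 Hz and 4.07 Hz).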

Syntactic types vs. F0

[Pie charts - Formal type 13 (unfinished clause): fall/rise/stagnant, extracted values 70%/23%/6%; Informal type 13 (unfinished clause): 61%/25%/14%]

[Pie charts - Formal type 2 (main clause missing): fall/rise/stagnant, extracted values 40%/16%/44%; Informal type 2 (main clause missing): 56%/29%/16%]

[Pie charts - Formal type 3 (subord. clause missing): fall/rise/stagnant, extracted values 70%/23%/8%; Informal type 2 (subord. clause missing): 56%/29%/16%]

Detection of intensity change

methodology based on the calculation of the trend of pitch movement (currently being implemented)

Detection of accent/stress

• based on Hunyadi 2002

• PET: pitch and energy over time

• accent/stress is the result of the interaction of pitch and intensity: relative prominence

• absolute PET value + duration

Ő látta?
he saw
‘Did HE see it?’

Ő látta?
he saw
‘Did he SEE it?’

mondja
‘says it.’

Kati mondja.
Kate says
‘It is Kate who says it.’

THANK YOU!

http://hucomtech.unideb.hu/hucomtech