Harmony in Motion: Zohar Barzelay, Yoav Y. Schechner, Dept. Elect. Eng., Technion – Israel Institute of Technology

Posted on 18-Dec-2015

TRANSCRIPT

Harmony in Motion

Zohar Barzelay, Yoav Y. Schechner

Dept. Elect. Eng., Technion – Israel Institute of Technology

Ack: Einav Namer, Yael Waissman, ISF

Violin-guitar: raw

Barzelay, Schechner, “Harmony in Motion”

Violin: Detected and Recovered

Guitar: Detected and Recovered

Video features: track all

Find the best

Barzelay & Schechner, Harmony in Motion

Finding an Audio-Visual Object (AVO)

Spatial matching: Many “coincidences”

Corresponding images?

* Always: unmatched features

* Good image match: many “coincidences”

* Spatial Edges

Spatial matching

* Feature-based

* Feature = significant change in space: edge, corner

* Maximize coincidences

* No need to match everything

Audio-Visual matching

* Feature-based

* Feature = significant change in time: temporal-edge

* Maximize coincidences

* No need to match everything

Feature-based Cross-Modal Matching

[Figure: a tracked visual feature $x$ and its acceleration plotted against time [frames]]

[Figure: binary ‘Visual Onsets’ $\mathbf{v}$ and ‘Audio Onsets’ $\mathbf{a}$ (0/1) over time $t$, shown with the audio amplitude]
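A visual feature here is a significant change in time (a temporal edge) of a tracked point's motion, read off its acceleration. A minimal sketch, assuming a 1-D trajectory `x` sampled once per frame, a second-difference estimate of acceleration, and a hypothetical `threshold` for significance; the name `visual_onsets` is illustrative and not the authors' code:

```python
def visual_onsets(x, threshold):
    """Binary visual-onset vector v for a 1-D feature trajectory x (one sample per frame).

    Hypothetical sketch: acceleration is approximated by the second difference
    x[t+1] - 2*x[t] + x[t-1]; a frame is an onset when |acceleration| exceeds
    `threshold` and is a local maximum.
    """
    n = len(x)
    acc = [0.0] * n
    for t in range(1, n - 1):
        acc[t] = abs(x[t + 1] - 2 * x[t] + x[t - 1])
    v = [0] * n
    for t in range(1, n - 1):
        if acc[t] > threshold and acc[t] >= acc[t - 1] and acc[t] >= acc[t + 1]:
            v[t] = 1
    return v


# A feature at rest that moves at constant speed, then stops.
trajectory = [0, 0, 0, 1, 2, 3, 3, 3]
print(visual_onsets(trajectory, threshold=0.5))  # [0, 0, 1, 0, 0, 1, 0, 0]
```

Both the start and the stop of the motion are marked, since each is a spike in acceleration; steady motion produces no onsets.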

Audio-Visual Coincidences

Audio Pre-processing

[Figure: audio amplitude vs. time $t$; the Fourier transform $F$ gives energy vs. frequency; over time this forms the spectrogram]

Significant change in audio
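Pre-processing turns the waveform into a spectrogram by applying the Fourier transform $F$ to short, overlapping windows. A minimal NumPy sketch, assuming a Hann window, a 256-sample frame and a 128-sample hop (hypothetical choices, not values from the talk); `stft` is an illustrative name:

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform: complex array of shape (frame_len // 2 + 1, n_frames).

    Assumes len(signal) >= frame_len. The spectrogram is np.abs() of the result.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)  # one column per frame, one row per frequency bin

fs = 8000                                  # sample rate [Hz], assumed
t = np.arange(fs) / fs                     # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone
S = np.abs(stft(tone))
print(S.shape)                             # (129, 61)
print(np.argmax(S[:, 0]) * fs / 256)       # 1000.0
```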

Audio Onsets

Beginning of new sounds

[Figure: spectrogram (frequency vs. time $t$) and its temporal derivative]
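Audio onsets mark the beginning of new sounds, found from the temporal derivative of the spectrogram. A sketch, assuming the derivative is half-wave rectified (only energy increases count, so the end of a sound is not an onset), summed over frequency, and kept at local maxima above a hypothetical `threshold`; the name `audio_onsets` is illustrative:

```python
import numpy as np

def audio_onsets(mag, threshold):
    """Binary audio-onset vector a (one entry per frame) from a magnitude spectrogram.

    mag: array of shape (n_freq_bins, n_frames).
    """
    rise = np.maximum(np.diff(mag, axis=1), 0.0)      # temporal derivative, increases only
    flux = np.concatenate([[0.0], rise.sum(axis=0)])  # flux[t]: energy gained entering frame t
    a = np.zeros(mag.shape[1], dtype=int)
    for t in range(len(flux)):
        left = flux[t - 1] if t > 0 else -np.inf
        right = flux[t + 1] if t + 1 < len(flux) else -np.inf
        if flux[t] > threshold and flux[t] >= left and flux[t] >= right:
            a[t] = 1
    return a

mag = np.zeros((4, 6))
mag[:, 2:5] = 1.0                         # a sound that starts at frame 2 and stops at frame 5
print(audio_onsets(mag, threshold=0.5))   # [0 0 1 0 0 0]
```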

Handling pitch-drift

[Figure: spectrogram with its directional derivative, compared with its non-directional derivative]

[Figure: the resulting onset sequences $\mathbf{v}$ (0/1) over time $t$]
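One way to read the pitch-drift slides: when a note's pitch drifts, its energy slides across frequency bins, so a plain (non-directional) temporal derivative keeps reporting "new" energy and yields false onsets, while a derivative taken along the drift direction does not. The sketch below approximates this by comparing each bin with the previous frame shifted by up to `max_shift` bins and keeping the smallest increase. This shift-and-minimum scheme, the wrap-around of `np.roll` at the spectrum edges, and the function name are assumptions, not the authors' formulation:

```python
import numpy as np

def directional_flux(mag, max_shift=1):
    """Onset strength per frame that ignores energy merely shifting in frequency.

    mag: magnitude spectrogram, shape (n_freq_bins, n_frames).
    """
    n_freq, n_frames = mag.shape
    flux = np.zeros(n_frames)
    for t in range(1, n_frames):
        best = np.full(n_freq, np.inf)
        for s in range(-max_shift, max_shift + 1):
            prev = np.roll(mag[:, t - 1], s)       # previous frame moved by s bins (wraps at edges)
            best = np.minimum(best, mag[:, t] - prev)
        flux[t] = np.maximum(best, 0.0).sum()      # keep only genuine increases
    return flux

drift = np.zeros((8, 4))
drift[[2, 3, 4, 5], [0, 1, 2, 3]] = 1.0            # one tone rising by one bin per frame
print(directional_flux(drift))                     # [0. 0. 0. 0.]
print(np.maximum(np.diff(drift, axis=1), 0).sum(axis=0))  # non-directional: [1. 1. 1.]
```

A genuinely new sound still registers, because no shift of the previous frame accounts for its energy.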

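Visual Matching

[Figure: audio onsets $\mathbf{a}$ (0/1) against candidate visual onset sequences $\mathbf{v}_1, \mathbf{v}_2, \dots$ over time $t$; two candidates are marked -4 and -5]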

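Visual Matching

[Figure: audio amplitude, audio onsets $\mathbf{a}$ and visual onsets $\mathbf{v}$ (0/1) over time]

$$L = \mathbf{a}^{T}\mathbf{v} - (\mathbf{1} - \mathbf{a})^{T}\mathbf{v}$$

where $\mathbf{a}^{T}\mathbf{v}$ counts coincidences and $(\mathbf{1} - \mathbf{a})^{T}\mathbf{v}$ counts inconsistencies (visual onsets with no audio onset).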

Ranking Criterion

[Figure: audio onsets $\mathbf{a}$ and visual onsets $\mathbf{v}$ (0/1) over time $t$]
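Reading the ranking criterion as $L = \mathbf{a}^T\mathbf{v} - (\mathbf{1}-\mathbf{a})^T\mathbf{v}$, it rewards visual onsets that coincide with audio onsets and penalizes visual onsets that have no audio counterpart, while unmatched audio onsets cost nothing ("no need to match everything"). A sketch that scores and ranks candidate tracked features, assuming binary onset lists of equal length; the feature names and data are made up for illustration:

```python
def match_score(a, v):
    """L = a^T v - (1 - a)^T v for binary onset vectors a (audio) and v (visual)."""
    coincidences = sum(ai * vi for ai, vi in zip(a, v))
    inconsistencies = sum((1 - ai) * vi for ai, vi in zip(a, v))
    return coincidences - inconsistencies

def rank_features(a, candidates):
    """Return (name, score) pairs, best-matching visual feature first."""
    scores = [(name, match_score(a, v)) for name, v in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

a = [0, 1, 0, 0, 1, 0, 1, 0]                     # audio onsets
candidates = {
    "feature_1": [0, 1, 0, 0, 1, 0, 0, 0],       # 2 coincidences, 0 inconsistencies
    "feature_2": [0, 1, 1, 1, 1, 1, 1, 0],       # 3 coincidences, 3 inconsistencies
    "feature_3": [1, 0, 1, 0, 0, 1, 0, 1],       # 0 coincidences, 4 inconsistencies
}
print(rank_features(a, candidates))
# [('feature_1', 2), ('feature_2', 0), ('feature_3', -4)]
```

A feature that moves constantly (like `feature_2`) collects many coincidences by chance, but its inconsistencies cancel them out.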

Residual Audio Onsets

[Figure: audio onsets $\mathbf{a}^{(1)}$ and visual onsets $\mathbf{v}$ over time $t$, with the coincidences and the residual onsets marked]

Sequential Object Detection

[Figure: audio amplitude, audio onsets $\mathbf{a}^{(1)}$, residual onsets and visual onsets $\mathbf{v}$ over time $t$]
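Once the best visual feature is found, the audio onsets it explains can be removed and the search repeated on the residual onsets, so several audio-visual objects are detected one after another. A sketch of that loop; the stopping rule (stop when no candidate scores above zero, when no residual onsets remain, or after `max_objects`) and all names are assumptions:

```python
def match_score(a, v):
    """Coincidences minus inconsistencies: a^T v - (1 - a)^T v."""
    return sum(ai * vi for ai, vi in zip(a, v)) - sum((1 - ai) * vi for ai, vi in zip(a, v))

def sequential_detection(a, candidates, max_objects=10):
    """Greedily pick visual features and peel their coincident onsets off the audio onsets."""
    residual = list(a)
    remaining = dict(candidates)
    detected = []
    while remaining and any(residual) and len(detected) < max_objects:
        name = max(remaining, key=lambda k: match_score(residual, remaining[k]))
        if match_score(residual, remaining[name]) <= 0:
            break                                   # assumed stopping rule: nothing matches well
        v = remaining.pop(name)
        detected.append(name)
        # Residual audio onsets: drop the onsets that coincide with the chosen feature.
        residual = [ai * (1 - vi) for ai, vi in zip(residual, v)]
    return detected, residual

a = [1, 0, 1, 0, 1, 0, 1, 0]
candidates = {
    "feature_1": [1, 0, 0, 0, 1, 0, 0, 0],
    "feature_2": [0, 0, 1, 0, 0, 0, 1, 0],
    "feature_3": [0, 1, 0, 1, 0, 1, 0, 0],
}
print(sequential_detection(a, candidates))
# (['feature_1', 'feature_2'], [0, 0, 0, 0, 0, 0, 0, 0])
```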

Speech: raw

Speech A-B-C: Detected & Recovered

Speech 1-2-3: Detected & Recovered

Audio Isolation

Audio Pre-processing

[Figure: audio amplitude vs. time $t$; the Fourier transform $F$ gives energy vs. frequency; the spectrogram shows frequency vs. time $t$]

Audio Isolation

Corresponding Onsets

Harmonic Sounds

[Figure: spectrogram (frequency vs. time $t$) at the corresponding onsets, with the frequency axis marked $0, f_0, 2f_0, \dots, 7f_0$]
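At the onsets that correspond to a detected object, a harmonic sound has energy at $f_0, 2f_0, 3f_0, \dots$. A sketch of a frequency mask that keeps only bins near those harmonics, assuming the fundamental `f0` has already been estimated and using a hypothetical band of ±`half_width` Hz; how $f_0$ is estimated is not stated on the slides:

```python
import numpy as np

def harmonic_mask(freqs, f0, n_harmonics=7, half_width=20.0):
    """Boolean mask over frequency bins: True within ±half_width Hz of k*f0, k = 1..n_harmonics."""
    freqs = np.asarray(freqs, dtype=float)
    mask = np.zeros(freqs.shape, dtype=bool)
    for k in range(1, n_harmonics + 1):
        mask |= np.abs(freqs - k * f0) <= half_width
    return mask

freqs = np.arange(0.0, 1000.0, 10.0)   # bin centre frequencies [Hz], 10 Hz apart (assumed grid)
mask = harmonic_mask(freqs, f0=100.0, n_harmonics=3, half_width=5.0)
print(freqs[mask])                     # [100. 200. 300.]
```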

Fourier representation

[Figure: audio amplitude vs. time $t$; the Fourier transform $F$ splits it into energy (spectrogram) and phase, each vs. frequency]

Filtered audio

[Figure: the filtered spectrogram energy is combined with the old phase, and $F^{-1}$ returns audio amplitude vs. time $t$; shown alongside visual onsets $\mathbf{v}$ (0/1) over time $t$]
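Filtering changes only the spectrogram energy and keeps the old phase before applying the inverse transform $F^{-1}$. A NumPy sketch, assuming the complex STFT came from Hann-windowed `np.fft.rfft` frames (frame length 256, hop 128) and using weighted overlap-add for the inverse; the framing parameters and the name `resynthesize` are assumptions:

```python
import numpy as np

def resynthesize(stft, mask, frame_len=256, hop=128):
    """Inverse STFT of a masked spectrogram, reusing the original ("old") phase.

    stft: complex array (frame_len // 2 + 1, n_frames) of Hann-windowed rfft frames.
    mask: real gains broadcastable to stft, e.g. shape (n_bins, 1) or (n_bins, n_frames).
    """
    filtered = np.abs(stft) * mask * np.exp(1j * np.angle(stft))  # new energy, old phase
    frames = np.fft.irfft(filtered, n=frame_len, axis=0)           # back to time, one column per frame
    window = np.hanning(frame_len)
    n_frames = frames.shape[1]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):                                      # weighted overlap-add
        out[i * hop:i * hop + frame_len] += frames[:, i] * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)                            # undo the window weighting

# Round trip: with an all-pass mask the interior of the signal is recovered.
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
w = np.hanning(256)
S = np.stack([np.fft.rfft(x[i * 128:i * 128 + 256] * w) for i in range(15)], axis=1)
y = resynthesize(S, np.ones((S.shape[0], 1)))
print(np.allclose(y[128:-128], x[128:-128]))  # True
```

Only the first and last samples are lost, because the Hann window is zero at its ends.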

Limitations: Temporal Tolerance

[Figure: audio onsets $\mathbf{a}$ over time $t$ with a tolerance of ¼ sec]

Time-Frequency overlap
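Exact frame-by-frame coincidence is too strict, so onsets are matched within a temporal tolerance; the slide lists ¼ sec. A sketch assuming a hypothetical video rate of 25 frames/s, so ¼ sec is about 6 frames. It also shows the limitation: two visual features whose onsets fall inside the same tolerance window both "explain" the same audio onset.

```python
def coincidences_with_tolerance(a, v, tol):
    """Count visual onsets in v that have an audio onset in a within ±tol frames."""
    n = len(a)
    count = 0
    for t in range(n):
        if v[t] and any(a[s] for s in range(max(0, t - tol), min(n, t + tol + 1))):
            count += 1
    return count

fps = 25                        # assumed video frame rate
tol = round(0.25 * fps)         # ¼ sec -> 6 frames
a = [0] * 20
a[5] = 1                        # one audio onset at frame 5
v_a = [0] * 20
v_a[4] = 1                      # object A moves at frame 4
v_b = [0] * 20
v_b[9] = 1                      # object B moves at frame 9, 4 frames after the sound
print(tol, coincidences_with_tolerance(a, v_a, 0), coincidences_with_tolerance(a, v_a, tol),
      coincidences_with_tolerance(a, v_b, tol))  # 6 0 1 1
```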

Limitations: Audio Sparsity

Overlapping audio onsets

Sounds may overlap in time; onsets should not.

[Figure: spectrogram (frequency vs. time $t$) with overlapping audio onsets]

Detection Parameters

Feature-Detection:

* edge scale

* significance level

* pruning

Visual Edges:

[Figure: acceleration vs. time with the detected visual onsets $\mathbf{v}$ (0/1)]

Dual Violin

Feature-based Cross-Modal Association

• Features: Temporal Audio/Visual Edges.

• Simultaneous Objects + Sounds.

• A General Concept.
