
Post on 18-Jan-2016


Page 1: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Cross-Modal (Visual-Auditory) Denoising

Dana Segev

Yoav Y. Schechner

Michael Elad

Technion – Israel Institute of Technology

Page 2: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Digits sequence

Noisy digits sequence

Denoised by the state-of-the-art algorithm of Cohen & Berdugo

Segev, Schechner, Elad, Cross-Modal Denoising

Page 3: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Use one modality to denoise another?

• Use video to denoise a soundtrack?


Page 4: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Noise

• Very intense

• Non-stationary

• Unknown

• Unseen source

Single microphone

Page 5: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Input: video and very noisy audio (time, sec)

Algorithm: cross-modal, example-based

Output: denoised audio, for human and machine hearing

Page 6: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1


Page 7: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1


Page 8: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Training example set

Input test set

Page 9: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1


Page 10: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

~ syllable (0.25 sec)

Page 11: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Xylophone

Page 12: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Xylophone

Sound

Page 13: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: sequence of example segments, labeled "Examples"]

Page 17: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Cross-modal representation.

• Generating multimodal features.

• Cross-modal pattern recognition.

• Rendering a denoised signal.

• Learning feature statistics.

Page 18: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Input video

Video feature-space

Input audio (time, sec)

Audio feature-space

Page 19: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Input audio-video (time, sec)

Audio-video feature-space

Page 20: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Training audio-video (time, sec)

Audio-video examples feature-space

Page 21: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: feature-space]

Page 24: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Nearest Neighbor

[Figure: feature-space]
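The nearest-neighbor step can be illustrated with a minimal sketch. It assumes each segment is already represented by one audio-video feature vector and that vectors are compared with the l2 norm named in the experimental settings. The function names and the toy vectors are hypothetical, chosen only for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean (l2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, examples):
    """Return the index of the example closest to `query` in feature space."""
    return min(range(len(examples)), key=lambda k: l2_distance(query, examples[k]))

# Hypothetical 3-D audio-video feature vectors of three training examples.
examples = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
# Hypothetical feature vector of one noisy input segment.
query = [0.9, 1.2, 0.8]
best = nearest_neighbor(query, examples)
```

A linear scan like this costs one distance evaluation per example for every input segment, which is acceptable for a small example set.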

Page 26: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: sequence of example segments, labeled "Examples"]

Page 28: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: the noisy audio, the clean segments selected for it, and the resulting denoised audio]
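One way to picture the rendering step is as a sketch in which the clean example segment chosen for each noisy input segment is simply concatenated in order. Plain concatenation is an assumption made here for brevity; the slides do not say how segment boundaries are joined. All names and sample values are hypothetical.

```python
def render_denoised(match_indices, clean_segments):
    """Concatenate the clean example segments selected for each input segment."""
    denoised = []
    for k in match_indices:
        denoised.extend(clean_segments[k])
    return denoised

# Hypothetical clean audio segments (two samples each) from the training set.
clean_segments = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Hypothetical index of the chosen example for each of three input segments.
match_indices = [2, 0, 0]
denoised = render_denoised(match_indices, clean_segments)
```

Because the output is built only from clean training audio, none of the noise in the input soundtrack is copied into it.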

Page 30: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: sequence of example segments, labeled "Examples"]

Page 31: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: the example segments and the input sequence]

Page 35: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Bartender experiment


Page 36: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: the example segments and the input sequence]

Page 37: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Cross-modal representation.

• Generating multimodal features.

• Cross-modal pattern recognition (NN).

• Rendering a denoised signal.

• Learning feature statistics.

Page 38: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Feature-space

For the k-th example segment:

Page 40: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Feature-space (clusters): bi, fif, ty, two, ar

bi - fif - ty - two

For the k-th example segment:

Page 41: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: table of transition counts indexed by current cluster and next cluster (bi, ty, fif, two, ar), next to the feature-space with clusters bi, fif, ty, two, ar]

Page 42: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: table of transition counts indexed by current cluster and next cluster (bi, ty, fif, two, ar)]

Syllable consecutive probability

The probability for transition between clusters

= Number of examples in training set

Page 43: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Hidden Markov Model

[Figure: chain of clusters (bi, fif) linked by the transition probability P and a time delay, with audio noise added to the emitted sound]

Page 46: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: the example segments and the input sequence]

Page 49: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Input video

Page 52: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

A cost function

• A data term

• A regularization term

Optimal vector of indices

Page 54: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

• nodes

• edges

Complexity:

[Figure: graph linking the example segments to the input sequence]

Complexity: Dynamic Programming

Page 55: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

[Figure: the example segments and the input sequence]

Page 58: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Cross-modal representation.

• Generating multimodal features.

• Cross-modal pattern recognition.

• Rendering a denoised signal.

• Learning feature statistics.

Page 59: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Requirements

Audio Features

• Sensitivity to sound perception.

• Dimension reduction.

Visual Features

• Focusing on the motion of interest.

• Dimension reduction.

Speech Features

• Audio: MFCCs

• Visual: DCT coefficients

Music Features

• Audio: spectrogram of each segment

• Visual: the spatial trajectory of a hitting rod

Page 60: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

MFCCs – Mel-frequency Cepstral Coefficients

Audio signal → Signal spectrum → Mel-frequency filter bank → log(.) → DCT → MFCCs
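The chain from audio signal to MFCCs can be sketched in plain Python for a single short frame. The frame length, the number of filters and coefficients, the triangular filter shape, and the common mel formula 2595·log10(1 + f/700) are all assumptions made for illustration, not settings taken from the slides.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def mel_filter_bank(num_filters, num_bins, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    edges = [f / nyquist * (num_bins - 1) for f in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values):
    """Unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]

def mfcc(frame, sample_rate, num_filters=8, num_coeffs=5):
    spectrum = power_spectrum(frame)
    bank = mel_filter_bank(num_filters, len(spectrum), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # offset avoids log(0)
    return dct2(log_energies)[:num_coeffs]

# Hypothetical frame: a 1000 Hz tone sampled at 8000 Hz, 64 samples long.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
coeffs = mfcc(frame, sample_rate)
```

Keeping only the first few DCT coefficients is what provides the dimension reduction listed among the feature requirements.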

Page 61: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Spectrogram of each segment

Xylophone signal → Spectrogram → Spectrogram accumulation

Page 62: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Speech

• The given movie

• Locking on the object of interest

• Extracting global motion by tracking

• Extracting features: DCT coefficients which highly represent motion between frames

Page 67: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Xylophone

• The given movie

• Locking on the object of interest

• Extracting global motion by tracking (X, Y, Z)

• Extracting features: hitting rod spatial coordinates (X, Y, Z)

Page 72: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Speech

• A corpus of a limited number of words and syllables: digits and bar beverages.

• Video rate 25 fps, audio rate 8000 Hz.

• K-means clustering, 350 clusters.

• Distance measurement: l2 norm.

Xylophone

• A corpus of a limited set of sounds.

• Video rate 25 fps, audio rate 16000 Hz.

• Distance measurement: l2 norm.
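The K-means clustering with the l2 norm used in the speech setting can be sketched as follows. This is a plain Lloyd's iteration with a deterministic start at the first k points, which is an assumption made here to keep the example reproducible. The toy 2-D points and k = 2 are hypothetical; the slides use 350 clusters on feature vectors.

```python
import math

def l2(a, b):
    """Euclidean (l2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with l2 distance; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: l2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]
centers, labels = kmeans(points, 2)
```

The resulting labels play the role of the cluster names (bi, fif, ty, ...) used when learning the transition statistics.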

Page 73: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Xylophone

• Training duration: 103 sec

• Testing duration: 100 sec

Music from song by GNR: SNR = 0.9

Xylophone melody: SNR = 1

Page 74: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Speech: Digits

• Training duration: 60 sec

• Testing duration: 240 sec

Noisy / Denoised

SNR = 0.07

Page 75: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Speech: Bartender

• Training duration: 48 sec

• Testing duration: 350 sec

Music from song by Phil Collins; Male Speech; White Gaussian

SNR = 0.59; SNR = 0.3; SNR = 0.38

Page 76: Cross-Modal (Visual-Auditory) Denoising Dana Segev Yoav Y. Schechner Michael Elad Technion – Israel Institute of Technology 1

Input: video and very noisy audio (time, sec)

Algorithm:

• Example-based

• Hidden Markov Model

Output: denoised audio, for human and machine hearing