Classifying Motion Picture Audio

Eirik Gustavsen, 07.06.07

Outline

• Motivation
• Thesis
• State of the Art
• Proposed system
• Experimental setup
• Results
• Future work
• Conclusion

Motivation

• Most projects classify clean, well-separated audio classes, or such classes with added noise

• Few clear boundaries in motion picture audio
• Subjective descriptions of movies
• Difficult to compare movie content

Thesis

It is possible to automatically create a table of contents of a motion picture, based on its audio track only.

Research questions

• Find the best low-level descriptors (LLDs) for classifying motion picture audio

• Detect boundaries between audio classes within complex audio segments

• Automatically create a table of contents (TOC) based on the audio track only

Pre-Processing

• 44100 Hz sample rate
• Mono
• 16 bits
• 30 ms windows (LW)
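A minimal sketch of the windowing step, assuming consecutive, non-overlapping 30 ms windows (the slides do not state a hop size or window function). The function names `frame_signal` and `pcm16_to_float` are hypothetical.

```python
def pcm16_to_float(samples):
    """Scale signed 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def frame_signal(samples, sample_rate=44100, window_ms=30):
    """Split a mono signal into consecutive, non-overlapping windows.

    Assumption: no overlap and no window function; a trailing partial
    window is dropped.
    """
    window_len = sample_rate * window_ms // 1000  # 1323 samples at 44.1 kHz
    n_frames = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_frames)]


# One second of silence yields 33 full windows of 1323 samples each.
frames = frame_signal(pcm16_to_float([0] * 44100))
print(len(frames), len(frames[0]))
```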

Low Level Descriptors

Time domain Frequency domain


• Total of 23 low level descriptors

TIME DOMAIN

• Audio Power
• Audio Wave Form
• Root-Mean Square
• Short Time Energy
• Low Short Time Energy Ratio
• Zero-Crossing Rate
• High Zero-Crossing Rate Ratio
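Three of the time-domain descriptors above can be sketched per window as follows. The definitions are common textbook forms and an assumption here: the slides give no formulas, and short time energy in particular is sometimes defined as a sum rather than a mean.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one window (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)


def root_mean_square(frame):
    """Square root of the mean squared amplitude."""
    return math.sqrt(short_time_energy(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

A window that alternates sign on every sample has a zero-crossing rate of 1.0, while a constant window has a rate of 0.0.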

FREQUENCY DOMAIN

• Audio Spectrum Centroid
• Fundamental Frequency
• 10 Mel-Frequency Cepstral Coefficients
• Spectrum Flux
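As a rough illustration of two frequency-domain descriptors, the sketch below computes a plain linear-frequency spectral centroid and a spectral flux from a naive DFT. This is a simplification and an assumption: it is not the MPEG-7 Audio Spectrum Centroid, and the slides do not say which exact formulas were used. A real system would use an FFT instead of the O(N²) transform shown here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz (0.0 for a silent window)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def spectral_flux(prev_mags, mags):
    """Euclidean distance between the spectra of two successive windows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_mags, mags)))
```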

Dimensionality reduction

Principal component analysis (PCA) is a technique used to reduce multidimensional data sets to fewer dimensions for analysis.

f(1), f(2), f(3), f(4), f(5), ..., f(23) → PCA → d(1), d(2), d(3)
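The reduction from 23 descriptor values to 3 dimensions could look roughly like the sketch below. It is an illustration under assumptions, not the implementation used in the thesis: it finds the leading eigenvectors of the covariance matrix by power iteration, and the function name `pca_project` and its parameters are hypothetical.

```python
import math
import random


def pca_project(rows, n_components=3, iterations=200, seed=0):
    """Project feature vectors onto their first principal components.

    Returns (projected_rows, components). Uses power iteration with
    orthogonalization against the components already found.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            for c in components:  # remove directions already captured
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                v = [0.0] * dim
                break
            v = [wi / norm for wi in w]
        components.append(v)
    projected = [[sum(x * a for x, a in zip(c, comp)) for comp in components]
                 for c in centered]
    return projected, components
```

With one row per 30 ms window and 23 values per row, `pca_project(rows)` would return one 3-value vector d(1), d(2), d(3) per window.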

K Nearest Neighbors
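A minimal K Nearest Neighbors sketch, assuming Euclidean distance and a simple majority vote over the reduced feature vectors. The value of k and the distance measure are assumptions; the slides do not specify them.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-dimensional training vectors for two classes.
train = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.2, 0.0),
         (1.0, 1.0, 1.0), (0.9, 1.1, 1.0), (1.0, 0.8, 1.2)]
labels = ["speech"] * 3 + ["music"] * 3
print(knn_classify(train, labels, (0.05, 0.05, 0.05)))
```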

Proposed system

Pre-Processing → LLD → Norm → PCA → KNN → Post-Processing → TOC Generation
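The final stage of the pipeline could be sketched as below: runs of identical per-window class labels are merged into timestamped entries. This is only an assumed illustration of what TOC generation might involve; the function names and the output format are hypothetical.

```python
def build_toc(labels, window_ms=30):
    """Merge runs of identical per-window labels into (start_seconds, label) entries."""
    toc = []
    for index, label in enumerate(labels):
        if not toc or toc[-1][1] != label:
            toc.append((index * window_ms / 1000.0, label))
    return toc


def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# 3 seconds of speech followed by 6 seconds of music.
window_labels = ["speech"] * 100 + ["music"] * 200
for start, label in build_toc(window_labels):
    print(format_timestamp(start), label)
```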

Classifying Audio

Speech

Noise (white)

Music

”Silence”

Mixed audio classes

Class Boundary Detection
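The slides do not describe the detection method. One simple approach, shown here purely as an assumption, is to smooth the per-window class labels with a sliding majority vote and then report every index where the label changes. The names `smooth_labels` and `find_boundaries` are hypothetical.

```python
from collections import Counter


def smooth_labels(labels, half_width=2):
    """Replace each label with the majority label in its neighbourhood."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_width):i + half_width + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def find_boundaries(labels):
    """Indices of windows whose label differs from the previous window."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# An isolated misclassified window at index 5 is removed by smoothing.
raw = ["speech"] * 5 + ["music"] + ["speech"] * 4 + ["music"] * 6
print(find_boundaries(raw), find_boundaries(smooth_labels(raw)))
```

Multiplying a boundary index by the 30 ms window length gives its position in the audio track.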


Finding most suitable LLDs

Most Suitable:

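• Audio Spectrum Centroid (ASC)
• Audio Wave Form (AWF)
• Root-Mean Square (RMS)
• High Zero-Crossing Rate Ratio (HZCRR)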
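The High Zero-Crossing Rate Ratio is often described in the literature as the fraction of windows, within a longer analysis segment, whose zero-crossing rate exceeds 1.5 times the segment average. The sketch below follows that description as an assumption; the slides do not give the formula or the threshold factor.

```python
def high_zcr_ratio(zcr_values, factor=1.5):
    """Fraction of windows whose ZCR exceeds `factor` times the mean ZCR.

    `zcr_values` holds one zero-crossing rate per 30 ms window for a
    longer segment (for example about one second of audio).
    """
    if not zcr_values:
        return 0.0
    mean_zcr = sum(zcr_values) / len(zcr_values)
    high = sum(1 for z in zcr_values if z > factor * mean_zcr)
    return high / len(zcr_values)
```

A segment with a steady zero-crossing rate gives a ratio of 0.0, while a segment in which a few windows cross zero far more often than the rest gives a higher ratio.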

Sample Results

Music with low volume

Clear speech

Speech with background environmental sounds

Fading between music and speech

Speech with background music

Jingle

”Some mistakes”

Future Work

• To be done in this thesis
  – Post processing
  – TOC

• Open research questions for future work
  – New motion picture audio classes
  – Detecting sound objects
  – Speech recognition

Conclusion

• Pre-processing makes it possible to classify motion picture audio correctly

• Using the right combination of LLDs improves the classification result

Questions

?