Multimedia Database Management System - Chapter 5 Indexing and Retrieval of Audio Rachmat Wahid Saleh Insani, S.Kom

Post on 29-Jul-2015

Multimedia Database Management System - Chapter 5

Indexing and Retrieval of Audio

Rachmat Wahid Saleh Insani, S.Kom

Introduction

• Audio is classified into three types: speech, music, and noise.

• Different audio types are processed and indexed in different ways.

• Query audio pieces are similarly classified, processed, and indexed.

• Audio pieces are retrieved based on similarity between the query index and the audio index in the database.

Objectives

• Main audio properties and features.

• Audio classification.

• Main speech recognition techniques.

• General approach in indexing and retrieval.

• Temporal and content relationship between media types.

Main Audio Properties and Features

• Time domain

• Frequency domain

Features Derived in the Time Domain

A signal is represented as amplitude varying with time.

• Average energy

• Zero crossing rate

• Silence ratio

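Average energy:

$$E = \frac{1}{N}\sum_{n=0}^{N-1} x(n)^2$$

Zero crossing rate:

$$ZC = \frac{1}{2N}\sum_{n=1}^{N} \left|\operatorname{sgn} x(n) - \operatorname{sgn} x(n-1)\right|$$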
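The three time-domain features above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own implementation: the function names are made up for this example, the zero-crossing sum runs over the sample pairs that actually exist in a 0-indexed list, and the silence ratio is simplified to the fraction of samples whose absolute amplitude falls below an assumed threshold.

```python
def sgn(value):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)


def average_energy(x):
    """Mean of the squared sample values."""
    return sum(s * s for s in x) / len(x)


def zero_crossing_rate(x):
    """Sum of sign changes between neighbouring samples, divided by 2N."""
    n = len(x)
    changes = sum(abs(sgn(x[i]) - sgn(x[i - 1])) for i in range(1, n))
    return changes / (2 * n)


def silence_ratio(x, threshold):
    """Assumed simplification: share of samples with |amplitude| below threshold."""
    return sum(1 for s in x if abs(s) < threshold) / len(x)


# Hypothetical toy signal: four alternating samples followed by silence.
signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(average_energy(signal))      # 0.125
print(zero_crossing_rate(signal))  # 0.4375
print(silence_ratio(signal, 0.1))  # 0.5
```

In practice these features are usually computed per short frame rather than over a whole recording, and the silence threshold depends on the recording level.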

Features Derived from the Frequency Domain

• Sound spectrum

• Bandwidth

• Energy Distribution

• Harmonicity

• Pitch
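As a rough illustration of how such features can be read off a spectrum, the sketch below computes a magnitude spectrum with a naive discrete Fourier transform and derives two of the features from it. The definitions used here are assumptions made for the example: bandwidth is taken as the span between the lowest and highest frequency whose magnitude exceeds a chosen floor, and the energy distribution is summarised by the energy-weighted mean frequency (spectral centroid). The function names and the test tone are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(x, sample_rate):
    """Naive DFT; returns (frequency in Hz, magnitude) pairs up to Nyquist."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * sample_rate / n, abs(coeff)))
    return spectrum


def bandwidth(spectrum, floor):
    """Span between lowest and highest frequency with magnitude above floor."""
    active = [freq for freq, mag in spectrum if mag > floor]
    return max(active) - min(active) if active else 0.0


def spectral_centroid(spectrum):
    """Energy-weighted mean frequency, a one-number summary of energy distribution."""
    total = sum(mag * mag for _, mag in spectrum)
    if total == 0:
        return 0.0
    return sum(freq * mag * mag for freq, mag in spectrum) / total


# Hypothetical tone: 4 Hz plus a weaker 8 Hz component, sampled at 64 Hz.
rate = 64
tone = [math.sin(2 * math.pi * 4 * t / rate) + 0.5 * math.sin(2 * math.pi * 8 * t / rate)
        for t in range(rate)]
spec = magnitude_spectrum(tone, rate)
print(bandwidth(spec, floor=1.0))  # 4.0
print(spectral_centroid(spec))     # approximately 4.8
```

A real system would use a fast Fourier transform on windowed frames; the quadratic-time loop here only keeps the example self-contained.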

Timbre

• Quality of a sound.

Audio Classification

Why is audio classification important?

- Different audio types require different processing, indexing, and retrieval techniques.

- Different audio types have different significance to different applications.

- Speech is an important audio type for which successful speech recognition techniques are available.

- The audio type itself is very useful information for some applications.

- The search space after classification is reduced to a particular audio class during the retrieval process.

Audio Classification

• Classification here focuses on the two main types of sound: speech and music.

Main Characteristics

Music

• Music has a frequency range of 16 to 20,000 Hz.

• Music has a low silence ratio.

• Music has regular beats.

Speech

• Speech has a frequency range of 100 to 7,000 Hz.

• Speech has a high silence ratio.

• Speech has no regular beats.

Audio Classification Frameworks

• Step by Step Classification

• Feature Vector Based Audio Classification

Step by Step Classification
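The slide's diagram is not part of this transcript, so the following is only a plausible sketch of a step-by-step classifier built from the main characteristics of music and speech listed earlier. Each step examines one feature and either decides the class or passes the piece on to the next step. The order of the tests, the function name, and the silence-ratio threshold are assumptions; only the 7,000 Hz upper limit for speech comes from the slides.

```python
def classify_step_by_step(highest_frequency_hz, silence_ratio, has_regular_beats,
                          speech_frequency_limit=7000.0, silence_limit=0.2):
    """Decide 'music' or 'speech' one feature at a time.

    silence_limit is a hypothetical threshold chosen for illustration only.
    """
    # Step 1: content above the speech frequency range points to music.
    if highest_frequency_hz > speech_frequency_limit:
        return "music"
    # Step 2: speech has a high silence ratio, so a low ratio points to music.
    if silence_ratio < silence_limit:
        return "music"
    # Step 3: regular beats point to music; otherwise treat the piece as speech.
    if has_regular_beats:
        return "music"
    return "speech"


print(classify_step_by_step(15000.0, 0.05, True))   # music
print(classify_step_by_step(6000.0, 0.40, False))   # speech
```

The appeal of this design is that cheap features can settle the easy cases early, so later and more expensive features are computed only for the pieces that remain ambiguous.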

Feature Vector Based Audio Classification

Audio pieces of the same class are located close to each other in the feature space and audio pieces of different classes are located far apart in the feature space.
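One simple way to exploit this closeness is a nearest-centroid rule: average the training vectors of each class and assign a new piece to the class whose average is closest. The sketch below assumes Euclidean distance and two-dimensional feature vectors (silence ratio, zero crossing rate); the training values are invented for illustration and the function names are hypothetical.

```python
import math


def class_centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify_by_feature_vector(vector, training):
    """Return the label whose class centroid is nearest to the given vector."""
    centroids = {label: class_centroid(vs) for label, vs in training.items()}
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Invented training data: [silence ratio, zero crossing rate] per audio piece.
training_set = {
    "speech": [[0.40, 0.10], [0.50, 0.14]],
    "music": [[0.05, 0.04], [0.10, 0.06]],
}
print(classify_by_feature_vector([0.42, 0.11], training_set))  # speech
print(classify_by_feature_vector([0.08, 0.05], training_set))  # music
```

Features with very different numeric ranges would normally be normalised first, since otherwise the largest-valued feature dominates the distance.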

Speech Recognition and Retrieval

Automatic Speech Recognition

An ASR system collects models or feature vectors for all possible speech units. A speech unit can be a phoneme, a word, or a phrase.

Automatic Speech Recognition Factors

• A phoneme spoken by different speakers, or by the same speaker at different times, produces different features in terms of duration, amplitude, and frequency components.

• The above differences are exacerbated by the background or environmental noise.

• Normal speech is continuous and difficult to separate into individual phonemes.

• Phonemes vary with their location in a word.

General ASR System

Speech Recognition Performance

Speech recognition performance is normally measured by recognition error rate. The lower the error rate, the higher the performance.

Performance is affected by the following factors:

- Subject matter: this may vary from a set of digits, a newspaper article, to general news.

- Types of speech: read or spontaneous conversation.

- Size of the vocabulary: it ranges from dozens to a few thousand words.
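The slides do not say how the recognition error rate is computed. A widely used choice is the word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below assumes that definition; the function name and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this rate can exceed 1.0 when the recognizer outputs many more words than the reference contains.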

Music Indexing and Retrieval

Indexing and Retrieval of Structured Music and Sound Effects

• Structured music is represented by a set of commands.

• The most common structured music is MIDI.

• A new standard of structured audio is MPEG-4 Structured Audio.

• These formats contain structure and note descriptions.
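Because structured music already stores notes explicitly, a query can be matched against the note sequence directly, with no signal analysis. The sketch below assumes a simplified, hypothetical event format of (time, kind, note number) tuples rather than a real MIDI file parser, and uses exact matching of consecutive notes; note number 60 stands for middle C, as in MIDI.

```python
def note_sequence(events):
    """Pitches of the note-on events, in time order."""
    return [note for _, kind, note in sorted(events) if kind == "on"]


def find_melody(notes, query):
    """Start positions where the query notes occur consecutively."""
    span = len(query)
    return [i for i in range(len(notes) - span + 1) if notes[i:i + span] == query]


# Hypothetical event list: C D E C D G, each note lasting one time unit.
events = [
    (0, "on", 60), (1, "off", 60), (1, "on", 62), (2, "off", 62),
    (2, "on", 64), (3, "off", 64), (3, "on", 60), (4, "off", 60),
    (4, "on", 62), (5, "off", 62), (5, "on", 67), (6, "off", 67),
]
notes = note_sequence(events)
print(notes)                         # [60, 62, 64, 60, 62, 67]
print(find_melody(notes, [60, 62]))  # [0, 3]
```

Matching intervals between successive notes instead of absolute note numbers would make the search independent of the key in which the query is given.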

Indexing and Retrieval of Sample Based Music

• Based on extracted sound features.

• Based on pitches of music notes.
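For the first approach, retrieval can be sketched as ranking stored pieces by the distance between their feature vectors and the feature vector of the query. The example assumes Euclidean distance and that every piece has already been reduced to a fixed-length vector; the piece names, feature values, and function name are invented for illustration.

```python
import math


def retrieve_similar(query_vector, database, k=2):
    """Names of the k pieces whose feature vectors are closest to the query."""
    ranked = sorted(database, key=lambda name: math.dist(query_vector, database[name]))
    return ranked[:k]


# Invented index: one fixed-length feature vector per stored piece.
music_index = {
    "piece_a": [0.90, 0.10, 0.30],
    "piece_b": [0.20, 0.80, 0.50],
    "piece_c": [0.85, 0.15, 0.35],
}
print(retrieve_similar([0.88, 0.12, 0.30], music_index))  # ['piece_a', 'piece_c']
```

A linear scan like this is fine for a small collection; large databases would typically use a multidimensional index to avoid comparing the query with every stored vector.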

Music Retrieval Based on a Set of Features

Music Retrieval Based on Pitch

Multimedia Information IR Using Relationships between Audio and Other Media