An Introduction to the “Thor-like” Power of Ogg Vorbis! Robert W. Ferguson III January 30, 2003

Page 1: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

An Introduction to the “Thor-like” Power of Ogg Vorbis!

Robert W. Ferguson III
January 30, 2003

Page 2: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Xiphophorus

Xiphophorus is a freshwater fish genus comprising 23 species. Since the 1920s it has been known that hybrids between the different species are easy to make. In some cases, one simply had to place one Xiphophorus species next to another in an aquarium, and they would reproduce.

Page 3: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

XIPH.COM

Xiphophorus is a non-profit organization responsible for the Ogg project.

Xiphophorus is GPL.

All cool companies have an X to start their name.

Page 4: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

What Is Ogg Vorbis?

The Ogg project is an open-source alternative to proprietary and patented codecs for digital media (for both audio and video).

The Vorbis project is responsible for the creation of a perceptual audio encoder similar to the famous, inherently evil, proprietary codecs popularized by global, illegal file sharing.

Page 5: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

It Is Not MP3

Vorbis is in the same category as MPEG-4 (AAC).

It is similar to, but higher performance than:

MPEG-1/2 audio layer 3
MPEG-4 audio (TwinVQ)
WMA (Windows Media Audio)
PAC

Page 6: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Classification

Vorbis I

Vorbis I is a forward-adaptive monolithic transform CODEC based on the Modified Discrete Cosine Transform.

The codec is structured to allow the addition of a hybrid wavelet filter bank in Vorbis II, offering better transient response and reproduction by using a transform better suited to localized time events.

Page 7: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Packets

Vorbis uses free-form packets that have no minimum size, maximum size, or fixed/expected size. Packets are designed so that they may be truncated (or padded) and remain decodable.

Page 8: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Error Detection

Vorbis provides none of its own protection against errors.

It is solely a method of accepting input audio, dividing it into individual frames, and compressing these frames into raw, unformatted 'packets'.

Page 9: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

ATH – Absolute Threshold of Hearing

Most codecs assume volume is fixed during playback. Vorbis assumes that volume can be adjusted.

Page 10: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Tone Masking

Tone masking is when louder frequencies mask out adjacent quieter ones.

Most codecs use a psychoacoustic model to calculate what’s left as best as possible within given bit-rate limits.

Vorbis approximates the same thing using as many bits as it takes.

Page 11: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Coupling

Most sounds consist of many channels and have redundancy between these channels. This is exploited to lower the bit-rate if the channels are encoded in some joint representation.

The simplest example is to encode the average and the difference between channels (for a stereo sound) – this is called mid/side representation and it requires fewer bits for sections that are close to mono.
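The mid/side idea can be sketched in a few lines of Python. This is only an illustration of the average/difference example from the slide, not necessarily the coupling scheme Vorbis itself uses; the function names `to_mid_side` and `from_mid_side` and the sample values are made up for the example.

```python
def to_mid_side(left, right):
    """Encode a stereo pair as (mid, side): the average and half the difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical samples: the two channels agree except on the last sample.
left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.25, -0.75, 0.50]

mid, side = to_mid_side(left, right)
# The side channel is zero wherever the signal is effectively mono,
# which is what makes it cheap to encode.
```

Where the two channels are identical the side signal is all zeros, so nearly all of the bits can go to the mid signal.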

Page 12: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Channel Support

Vorbis supports up to 255 channels.

At the moment the encoder knows to use coupling for 2-channel files only, but eventually it will scale.

Page 13: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Vector Quantization

Vector Quantization (VQ) is a lossy data compression method where vectors are rounded off into encoding regions.

Basically if you group together numbers describing different channels, your channels become automatically coupled (normally a group would be picked from data describing a single channel, so channels would be approximated independently).

Page 14: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Vector Quantization…

The process of VQ introduces some vector quantization noise: the difference between the original group of numbers and its approximation (only a limited number of approximations can be chosen).

All codecs suffer from quantization problems. VQ should suffer less.
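As a rough illustration of the two VQ slides, the sketch below rounds a two-element vector (one value per channel) to the nearest entry of a tiny codebook and reports the resulting quantization noise. The codebook, the squared-Euclidean distance measure, and the name `quantize` are all assumptions chosen for the example; real Vorbis codebooks are carried in the stream's setup header.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to vector."""
    def dist2(codeword):
        # Squared Euclidean distance between the vector and one codeword.
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))

    index = min(range(len(codebook)), key=lambda i: dist2(codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook: each entry approximates a (left, right) pair.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0)]
vector = (0.75, 1.0)

index, approx = quantize(vector, codebook)
# Quantization noise: original minus the chosen approximation.
noise = tuple(v - a for v, a in zip(vector, approx))
```

Only the index needs to be stored. Because each codeword describes both channels at once, the channels are coupled automatically, as the earlier slide notes.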

Page 15: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Memory Usage

The vector codebooks used in the first stage of decoding are packed, in their entirety, into the Vorbis bit-stream headers.

In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage.

Page 16: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Following the Standard

The standard defines the decoding process only. Any file that decodes correctly under it follows the standard, regardless of the method used to encode it.

Page 17: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Headers

Identification Header
The identification header identifies the bitstream as Vorbis, Vorbis version, and the simple audio characteristics of the stream such as sample rate and number of channels.

Comment Header
The comment header includes user text comments ["tags"] and a vendor string for the application/library that produced the bitstream.

Setup Header
The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.
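To make the identification header concrete, here is a minimal Python sketch that unpacks one. The 30-byte field layout is my reading of the Vorbis I specification (it is not given on the slide), so check it against the spec before relying on it; the function name and the sample values are hypothetical.

```python
import struct

# Assumed layout (little-endian): packet type, "vorbis", version, channels,
# sample rate, max/nominal/min bitrate, packed block-size exponents, framing.
ID_HEADER = struct.Struct("<B6sIBIiiiBB")


def parse_identification_header(packet):
    """Unpack the fixed-size fields of a Vorbis identification header."""
    (ptype, magic, version, channels, rate,
     br_max, br_nominal, br_min, sizes, framing) = ID_HEADER.unpack(
        packet[:ID_HEADER.size])
    if ptype != 1 or magic != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    if not framing & 1:
        raise ValueError("framing bit not set")
    return {
        "version": version,
        "channels": channels,
        "sample_rate": rate,
        "bitrate_nominal": br_nominal,
        # Low nibble: short block exponent; high nibble: long block exponent.
        "blocksize_short": 1 << (sizes & 0x0F),
        "blocksize_long": 1 << (sizes >> 4),
    }


# Build a hypothetical header instead of reading a real file.
sample = ID_HEADER.pack(1, b"vorbis", 0, 2, 44100, 0, 128000, 0, 0xB8, 1)
info = parse_identification_header(sample)
```

In this made-up header the packed byte `0xB8` yields a short block of 256 samples and a long block of 2048 samples.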

Page 18: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Decoding Procedure

The decoding and synthesis procedure for all audio packets is fundamentally the same.

1. decode packet type flag

2. decode mode number

3. decode window shape [long windows only]

4. decode floor

5. decode residue into residue vectors

6. inverse channel coupling of residue vectors

7. generate floor curve from decoded floor data

Page 19: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Decoding Procedure...

8. compute dot product of floor and residue, producing audio spectrum vector

9. inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I

10. overlap/add left-hand output of transform with right-hand output of previous frame

11. store right-hand data from transform of current frame for future lapping.

12. if not first frame, return results of overlap/add as audio result of current frame

Rearrangement of the synthesis arithmetic is possible.
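Step 8 deserves a small illustration. Although the slide calls it a "dot product", the result is a vector rather than a single number: as I understand the Vorbis I specification, each element of the floor curve is multiplied by the matching element of the residue. The sketch below assumes that reading; the function name and values are invented for the example.

```python
def audio_spectrum(floor_curve, residue):
    """Multiply floor and residue element by element (step 8)."""
    if len(floor_curve) != len(residue):
        raise ValueError("floor and residue must have the same length")
    return [f * r for f, r in zip(floor_curve, residue)]


# Hypothetical values: the floor is the coarse spectral envelope,
# the residue is the fine detail left after removing it.
floor_curve = [4.0, 2.0, 1.0, 0.5]
residue = [1.0, -0.5, 0.25, 0.0]

spectrum = audio_spectrum(floor_curve, residue)
```

The resulting spectrum vector is what the inverse MDCT in step 9 converts back to time-domain samples.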

Page 20: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Controversy

The entire probability model of the codec, the Huffman and VQ codebooks, is packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields).

As a result, it’s impossible to embed a simple frame type flag in each audio packet, or to begin decode at any frame in the stream without having previously fetched the codec setup header.

However, Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized/setup with the setup headers.

Page 21: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Window Shape Decode

Vorbis frames use one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples. Aside from coupling, Vorbis handles channels as independent vectors and these frame sizes are in samples per channel.
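The frame-size rule is easy to check in code. The sketch below enumerates the legal sizes and validates a short/long pair. The requirement that the short size not exceed the long size is an assumption taken from my reading of the specification rather than from the slide, and the function names are hypothetical.

```python
def is_legal_blocksize(n):
    """True if n is a power of two between 64 and 8192 inclusive."""
    return 64 <= n <= 8192 and (n & (n - 1)) == 0


def check_blocksizes(short, long_):
    """Validate the two frame sizes chosen at codec setup."""
    # Assumption: the short block may not be larger than the long block.
    return (is_legal_blocksize(short)
            and is_legal_blocksize(long_)
            and short <= long_)


# Every legal Vorbis I frame size, in samples per channel.
legal_sizes = [2 ** e for e in range(6, 14)]
```

For example, `check_blocksizes(256, 2048)` passes, while `check_blocksizes(100, 2048)` fails because 100 is not a power of two.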

Page 22: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Overlapping Windows

Vorbis uses an overlapping transform, namely the MDCT, to blend one frame into the next, avoiding most inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added. The window shape assures seamless reconstruction.
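The "seamless reconstruction" claim rests on a property of the window itself. The sketch below uses the window formula I recall from the Vorbis I specification, w(i) = sin(π/2 · sin²(π(i + 0.5)/n)), which the slide does not state. It checks numerically that the squared window plus its own squared copy shifted by half a frame always sums to one. The window length of 256 is arbitrary.

```python
import math


def vorbis_window(n):
    """Window of length n (formula assumed from the Vorbis I specification)."""
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]


n = 256
half = n // 2
w = vorbis_window(n)

# With 50% overlap, sample i of one frame lines up with sample i + half of
# the previous frame. The window is applied twice (before and after the
# transform), so the squared weights are the ones that must sum to one.
worst = max(abs(w[i] ** 2 + w[i + half] ** 2 - 1.0) for i in range(half))
```

`worst` comes out at the level of floating-point rounding, so the overlapped frames add up to unit gain at every sample.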

Page 23: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Dealing with Windows

And slightly more complex in the case of overlapping unequal sized windows:

Page 24: An Introduction to the “Thor-like” Power of  Ogg Vorbis !

Inverse Monolithic Transform

The audio spectrum is converted back into time domain PCM audio via an inverse modified discrete cosine transform (MDCT). A detailed description of the MDCT is available in the paper “The use of multirate filter banks for coding of high quality digital audio”, by T. Sporer, K. Brandenburg and B. Edler.
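For a feel of how the transform and the overlap/add fit together, here is a deliberately slow, direct Python sketch of a windowed MDCT round trip. It uses the textbook MDCT definition with a 2/N scale on the inverse and the window formula assumed from the Vorbis I specification. It is not the Vorbis reference implementation, and the test signal and block size are arbitrary.

```python
import math


def vorbis_window(n):
    # Window formula assumed from the Vorbis I specification.
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]


def roundtrip(signal, n):
    """Window, MDCT, IMDCT, window again, then overlap/add at 50% overlap."""
    w = vorbis_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + i] * w[i] for i in range(2 * n)]
        y = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += y[i] * w[i]
    return out


n = 8
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(6 * n)]
rebuilt = roundtrip(signal, n)

# The first and last n samples are covered by only one frame, so compare
# only the interior, where two overlapping frames have been added.
error = max(abs(a - b) for a, b in zip(signal[n:-n], rebuilt[n:-n]))
```

A single inverse transform returns time-aliased samples. The aliasing cancels only when neighbouring frames are overlapped and added, which is why the decoder must keep the right-hand half of each frame for the next one.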