a low latency implementation of a non uniform partitioned overlap and save algorithm for real-time...


Upload: a3labdsp

Post on 24-May-2015


Category:

Education



DESCRIPTION

FIR convolution is a widely used operation in the digital signal processing field, especially for filtering operations in real time scenarios. In this context, computationally undemanding techniques for calculating convolutions with low input/output latency become essential, considering that the real time requirements are strictly related to the impulse response length. In this paper, a multithreaded real time implementation of a Non Uniform Partitioned Overlap and Save algorithm is proposed with the aim of lowering the workload required in applications like reverberation, also exploiting the human ear sensitivity. Several results are reported in order to show the effectiveness of the proposed approach in terms of computational cost, taking into consideration different impulse responses and also introducing comparisons with existing state-of-the-art techniques.

TRANSCRIPT

Page 1: A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Algorithm for Real-time Applications

Paper ID 8521

A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Algorithm for Real Time Applications

A. Primavera¹, S. Cecchi¹, P. Peretti¹, L. Romoli¹, and F. Piazza¹

¹A3Lab - DIBET - Università Politecnica delle Marche, Via Brecce Bianche 1, 60131 Ancona, Italy

www.a3lab.dibet.univpm.it

Abstract

FIR convolution is a widely used operation in the digital signal processing field, especially for filtering operations in real time scenarios. In this context, computationally undemanding techniques for calculating convolutions with low input/output latency become essential, considering that the real time requirements are strictly related to the impulse response length. In this paper, a multithreaded real time implementation of a Non Uniform Partitioned Overlap and Save algorithm is proposed with the aim of lowering the workload required in applications like reverberation, also exploiting the human ear sensitivity. Several results are reported in order to show the effectiveness of the proposed approach in terms of computational cost, taking into consideration different impulse responses and also introducing comparisons with existing state-of-the-art techniques.


Introduction

Problem

FIR filtering is probably one of the most recurrent operations in DSP. It is an expensive task, especially for long impulse responses (IRs) and low I/O latency. The goals are low latency convolution and computational cost minimization.

State of the Art

In the last 30 years, fast convolution algorithms have been deeply investigated:
• OverLap and Save (OLS), OverLap and Add (OLA) [1];
• Partitioned OverLap and Save (POLS) [2, 3, 4];
• Non Uniform Partitioned OverLap and Save (NUPOLS) [5, 6].

Proposed Solution

We propose a real-time implementation of a NUPOLS algorithm based on:
• Automatic partitioning;
• Multithreaded implementation;
• Psychoacoustic improvement.


Convolution (1)

Assuming a linear time-invariant system, the linear convolution between the input signal x and the system impulse response h is defined as follows:

$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau. \tag{1}$$

For discrete-time signals and an impulse response of finite length N, this becomes:

$$y[n] = x[n] * h[n] = \sum_{m=0}^{N-1} x[n-m]\, h[m]. \tag{2}$$

Time Domain Convolution

The convolution is performed using equation (2).
LATENCY: Theoretically zero;
COMPUTATIONAL COST: N − 1 additions and N multiplications per output sample;
CONSIDERATIONS: It is too expensive for long IRs (high values of N).
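As a minimal sketch of the time domain approach, the sum in equation (2) can be written directly in Python. The function name `fir_direct` and the toy signals are illustrative assumptions, not part of the original work; input samples outside the signal are treated as zero.

```python
def fir_direct(x, h):
    """Direct-form convolution y[n] = sum_m x[n - m] * h[m], as in equation (2)."""
    n_out = len(x) + len(h) - 1
    y = [0.0] * n_out
    for n in range(n_out):
        acc = 0.0
        for m in range(len(h)):          # N multiplications per output sample
            if 0 <= n - m < len(x):      # samples outside x count as zero
                acc += x[n - m] * h[m]
        y[n] = acc
    return y


y = fir_direct([1.0, 2.0, 3.0], [1.0, 0.5])
print(y)  # [1.0, 2.5, 4.0, 1.5]
```

Each output sample is available as soon as its input sample arrives, which is why the latency is theoretically zero, while the inner loop grows linearly with the IR length.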

Frequency Domain Convolution

Taking into consideration the circular convolution and the DFT property:

$$y[n] = x[n] \circledast_N h[n] = \sum_{m=0}^{N-1} x[(n-m)_N]\, h[m], \qquad x[n] \circledast_N h[n] \leftrightarrow X[k]\, H[k], \tag{3}$$

the convolution can be computed in the frequency domain.
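The DFT property in equation (3) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a small hand-written radix-2 FFT (the helper names `fft`, `circular_direct` and `circular_fft` are hypothetical) and compares the circular sum with the product of the spectra.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def circular_direct(x, h):
    """Circular convolution: sum over m of x[(n - m) mod N] * h[m]."""
    n = len(x)
    return [sum(x[(k - m) % n] * h[m] for m in range(n)) for k in range(n)]


def circular_fft(x, h):
    """Same result via X[k] * H[k] followed by an inverse transform."""
    n = len(x)
    Y = [a * b for a, b in zip(fft(x), fft(h))]
    return [v.real / n for v in fft(Y, inverse=True)]


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5, 0.0, 0.0]
print(circular_direct(x, h))  # [3.0, 2.5, 4.0, 5.5]
print(circular_fft(x, h))     # same values up to rounding error
```

The first output sample differs from the linear convolution because the last input sample wraps around; removing this wrap-around is what the overlap and save scheme is for.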


Convolution (2)

OverLap and Save (OLS)

The OLS algorithm converts a circular convolution into a linear convolution.
LATENCY: Equal to K samples with K > N;
COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{L}{K}$ complex multiplications (with K a power of 2 and L = 2K for 50% overlap);
CONSIDERATIONS: The I/O latency is too high for long IRs (high values of N).
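The OLS scheme can be sketched as follows, assuming frames of K samples, an FFT size L = 2K (50% overlap) and a filter that fits in one block. The helper names and the hand-written FFT are illustrative assumptions, not the authors' implementation.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def overlap_save(x, h, K):
    """Filter x with h in frames of K samples using FFTs of size L = 2K."""
    L = 2 * K
    assert len(h) <= K, "this sketch assumes the filter fits in one block"
    H = fft(list(h) + [0.0] * (L - len(h)))
    n_frames = -(-len(x) // K)                       # ceiling division
    x_pad = list(x) + [0.0] * (n_frames * K - len(x))
    buf, out = [0.0] * L, []
    for f in range(n_frames):
        buf = buf[K:] + x_pad[f * K:(f + 1) * K]     # keep the previous K samples
        Y = [a * b for a, b in zip(fft(buf), H)]     # L complex multiplications
        y = fft(Y, inverse=True)
        out.extend(v.real / L for v in y[K:])        # drop the aliased first half
    return out[:len(x)]


x = [float(i % 5) for i in range(10)]
print(overlap_save(x, [1.0, 0.5, 0.25], K=4))
```

Every frame costs one forward FFT, L complex multiplications and one inverse FFT for K output samples, which matches the structure of the cost expression above. A frame can only be processed once all K of its samples have arrived, hence the K-sample latency.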

Uniform Partitioned OverLap and Save (POLS)

The IR is partitioned into sections of equal size, then an OLS is applied to each sub-filter.
LATENCY: Equal to K samples with K arbitrarily chosen;
COMPUTATIONAL COST: $\frac{2L \log L}{K} + \frac{LP}{K}$ complex multiplications and $\frac{L(P-1)}{K}$ additions (with K a power of 2, P the number of partitions and L = 2K for 50% overlap);
CONSIDERATIONS: The required computational cost is higher than in the OLS.
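A uniform partitioned scheme can be sketched with a frequency-domain delay line: the spectrum of each input frame is reused by later partitions, so only one forward and one inverse FFT are needed per frame. This is an illustrative sketch under the assumptions L = 2K and equal partitions of K taps; the names `pols` and `fdl` are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def pols(x, h, K):
    """Uniform partitioned overlap and save with P = ceil(len(h) / K) partitions."""
    L = 2 * K
    parts = [list(h[i:i + K]) for i in range(0, len(h), K)]
    H = [fft(p + [0.0] * (L - len(p))) for p in parts]
    P = len(H)
    fdl = [[0j] * L for _ in range(P)]               # spectra of the last P frames
    n_frames = -(-len(x) // K)                       # ceiling division
    x_pad = list(x) + [0.0] * (n_frames * K - len(x))
    buf, out = [0.0] * L, []
    for f in range(n_frames):
        buf = buf[K:] + x_pad[f * K:(f + 1) * K]
        fdl = [fft(buf)] + fdl[:-1]                  # newest spectrum first
        # Partition p is applied to the frame received p frames ago.
        Y = [sum(fdl[p][k] * H[p][k] for p in range(P)) for k in range(L)]
        y = fft(Y, inverse=True)
        out.extend(v.real / L for v in y[K:])        # keep the valid half
    return out[:len(x)]


x = [float((3 * i) % 7) for i in range(16)]
h = [0.9 ** n for n in range(10)]
print(pols(x, h, K=4))
```

The latency stays at K samples regardless of the IR length, while the spectral products and accumulations grow with P.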

Non Uniform Partitioned OverLap and Save (NUPOLS)

The IR is partitioned into sections of increasing size, in order to reduce the computational cost, allowing a real-time implementation of zero latency convolution.
LATENCY: Theoretically zero;
COMPUTATIONAL COST: It depends on the adopted partitioning;
CONSIDERATIONS: It is difficult to find the optimal partitioning.


Proposed Algorithm (1)

A real time implementation of a suitable NUPOLS algorithm is proposed using the NU-Tech framework.

Fig.1 Block diagram of the Non Uniform Partitioned Overlap and Save algorithm.

The proposed approach has three main features:

Automatic Partitioning (1)

The required workload is a function of the number of POLSs employed in the NUPOLS algorithm [6]. The optimal partitioning depends on the IR length and the I/O latency constraint.
An automatic partitioning procedure is proposed, exploiting an offline pre-analysis based on an iterative evaluation of the obtained performance, considering that:
• Four partitions are typically enough to obtain good performance;
• Very large FFTs are usually not recommended.
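The details of the offline pre-analysis are not given here, so the following is only a hypothetical sketch of what an iterative search could look like. It assumes the POLS cost expression read as (2L log L + LP)/K with a base-2 logarithm and L = 2K, a greedy layout in which each segment keeps its block size until the covered offset reaches the next block size, at most four segments, and a cap on the FFT size. None of the function names or parameters come from the original work, and real scheduling constraints are not modelled.

```python
import math
from itertools import combinations


def pols_cost(K, P):
    """Assumed cost per output sample of one POLS: (2L log2 L + L P) / K, L = 2K."""
    L = 2 * K
    return (2 * L * math.log2(L) + L * P) / K


def plan(ir_len, block_sizes):
    """Greedy layout (assumption): switch to the next block size as soon as
    the offset covered so far is at least that block size."""
    segments, offset = [], 0
    for i, K in enumerate(block_sizes):
        if offset >= ir_len:
            break
        last = i == len(block_sizes) - 1
        end = ir_len if last else min(block_sizes[i + 1], ir_len)
        P = math.ceil((end - offset) / K)
        segments.append((K, P))
        offset += K * P
    return segments


def best_partitioning(ir_len, frame, max_fft=16384, max_segments=4):
    """Try every ascending set of power-of-two block sizes starting at `frame`."""
    sizes, K = [], frame
    while 2 * K <= max_fft:
        sizes.append(K)
        K *= 2
    best = None
    for r in range(1, max_segments + 1):
        for rest in combinations(sizes[1:], r - 1):
            segs = plan(ir_len, (frame,) + rest)
            cost = sum(pols_cost(K, P) for K, P in segs)
            if best is None or cost < best[0]:
                best = (cost, segs)
    return best


cost, segments = best_partitioning(ir_len=65536, frame=64)
print(round(cost, 1), segments)   # (block size, number of partitions) pairs
```

The first block size is pinned to the frame size because it sets the I/O latency; the search only decides where larger, cheaper blocks take over.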


Proposed Algorithm (2)

Multithreaded Implementation (2)

Each POLS can be considered as a single thread.
• Different convolutions run simultaneously with an automatic parallelization of the operations;
• High scalability of the implementation.
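The idea of running each POLS as its own thread relies on linearity: the output is the sum of the outputs of the IR segments, each delayed by the segment's starting offset. The sketch below illustrates this offline with Python's `concurrent.futures`; it is not the NU-Tech implementation. A direct convolution stands in for each POLS, the segment boundaries are arbitrary, and in CPython the global interpreter lock prevents a real speedup for pure-Python code, so the example shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def convolve(x, h):
    """Direct linear convolution, standing in for one POLS sub-filter."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def segment_output(x, h_seg, offset, n_out):
    """Output of one IR segment, delayed by the segment's start offset."""
    y = [0.0] * n_out
    for i, v in enumerate(convolve(x, h_seg)):
        y[offset + i] = v
    return y


def parallel_convolve(x, h, bounds):
    """Run one task per IR segment and sum the partial outputs."""
    n_out = len(x) + len(h) - 1
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        futures = [pool.submit(segment_output, x, h[a:b], a, n_out)
                   for a, b in bounds]
        partials = [f.result() for f in futures]
    return [sum(column) for column in zip(*partials)]


x = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0, 0.0]
h = [0.5 ** n for n in range(8)]
bounds = [(0, 2), (2, 4), (4, 8)]      # segments of increasing size
print(parallel_convolve(x, h, bounds))
```

Because the segments share no state apart from the input signal, adding a segment only adds a task, which is what makes the approach scale with the number of available cores.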

Psychoacoustic Improvement (3)

It is possible to reduce the computational cost by exploiting the human ear sensitivity [7].
The number of complex multiplications to be performed can be lowered by taking into consideration only the spectral components with significant energy content.

Fig.2 Reverberation Time
Fig.3 Frequency Bin considered in each partition of the NUPOLS.
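The selection rule for the significant spectral components is not spelled out here, so the following sketch uses an assumed criterion: a bin is kept only if its magnitude lies within a given number of decibels of the largest bin over all partitions. The mapping between this `threshold_db` parameter and the T60, T50 and T40 thresholds used in the tests is a guess, and all names are hypothetical.

```python
import cmath
import math


def dft(a):
    """Naive DFT, adequate for the tiny sizes used in this sketch."""
    n = len(a)
    return [sum(a[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def partition_spectra(h, K):
    """Split h into blocks of K taps and transform each one with size 2K."""
    parts = [list(h[i:i + K]) for i in range(0, len(h), K)]
    return [dft(p + [0.0] * (2 * K - len(p))) for p in parts]


def active_bins(spectra, threshold_db):
    """Keep bins within threshold_db of the global peak (assumed criterion)."""
    peak = max(abs(v) for H in spectra for v in H)
    floor = peak * 10 ** (-threshold_db / 20)
    return [[k for k, v in enumerate(H) if abs(v) >= floor] for H in spectra]


def pruned_multiply(X, H, bins):
    """Multiply only the retained bins; skipped bins contribute zero."""
    Y = [0j] * len(H)
    for k in bins:
        Y[k] = X[k] * H[k]
    return Y


h = [math.exp(-n / 4) for n in range(32)]          # toy decaying IR
spectra = partition_spectra(h, K=8)
for db in (60, 40):
    kept = sum(len(b) for b in active_bins(spectra, db))
    print(db, "dB:", kept, "of", len(spectra) * 16, "bins multiplied")
```

For a decaying IR the later partitions carry little energy, so they lose most of their bins first; a lower threshold removes more multiplications at the price of a larger approximation error.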


Results (1)

Several tests have been carried out to evaluate the effectiveness of the proposed approach through objective and subjective comparisons.

Objective Analysis

• Two different tests have been performed:
1. Workload estimation of the POLS and NUPOLS algorithms as a function of the IR length.
2. Analysis of the CPU load for three real IRs (small, medium and large size) in order to show the improvement introduced by the psychoacoustic approach.
• Three different values for the framesize (i.e., 64, 256 and 1024 samples) have been used.
• All the tests have been done using a PC with an Intel Core 2 @ 2.5 GHz and 2 GB of RAM.

Fig.4 Analysis of the workload as a function of the IR length.

Considerations

• POLS performance is strictly related to the I/O constraint.
• NUPOLS achieves better performance than POLS.


Results (2)

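POLS: Workload of a Partitioned Overlap and Save for different IRs and framesizes (with/without psychoacoustic approach). FS is the framesize in samples; each SpeedUp column is relative to the workload without the psychoacoustic approach.

| FS   | IR     | No psychoacoustic | T60   | SpeedUp | T50   | SpeedUp | T40    | SpeedUp |
|------|--------|-------------------|-------|---------|-------|---------|--------|---------|
| 64   | Small  | 96.9              | 65.2  | 1.49    | 62.1  | 1.56    | 55.7   | 1.74    |
| 64   | Medium | 331.7             | 233.1 | 1.42    | 197.5 | 1.68    | 175.6  | 1.89    |
| 64   | Large  | 709.8             | 558.3 | 1.27    | 487.8 | 1.46    | 467.67 | 1.52    |
| 256  | Small  | 24.8              | 16.4  | 1.51    | 15.3  | 1.62    | 14.4   | 1.72    |
| 256  | Medium | 76.8              | 47.6  | 1.61    | 43.7  | 1.76    | 41.8   | 1.84    |
| 256  | Large  | 159.9             | 118.3 | 1.35    | 105.0 | 1.52    | 98.1   | 1.63    |
| 1024 | Small  | 7.2               | 5.3   | 1.36    | 4.7   | 1.52    | 4.4    | 1.61    |
| 1024 | Medium | 21.1              | 12.8  | 1.64    | 11.9  | 1.77    | 11.9   | 1.77    |
| 1024 | Large  | 41.8              | 29.8  | 1.40    | 26.7  | 1.56    | 24.8   | 1.68    |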

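NUPOLS: Workload of a Non Uniform Partitioned Overlap and Save for different IRs and framesizes (with/without psychoacoustic approach). FS is the framesize in samples; each SpeedUp column is relative to the workload without the psychoacoustic approach.

| FS   | IR     | No psychoacoustic | T60  | SpeedUp | T50  | SpeedUp | T40  | SpeedUp |
|------|--------|-------------------|------|---------|------|---------|------|---------|
| 64   | Small  | 7.9               | 6.3  | 1.26    | 6.0  | 1.31    | 5.8  | 1.37    |
| 64   | Medium | 13.5              | 9.4  | 1.43    | 9.1  | 1.48    | 9.0  | 1.49    |
| 64   | Large  | 14.9              | 13.7 | 1.08    | 13.4 | 1.11    | 13.2 | 1.13    |
| 256  | Small  | 5.0               | 4.0  | 1.25    | 3.9  | 1.28    | 3.8  | 1.31    |
| 256  | Medium | 8.2               | 6.0  | 1.37    | 5.7  | 1.44    | 5.7  | 1.45    |
| 256  | Large  | 9.4               | 9.3  | 1.01    | 8.6  | 1.10    | 8.1  | 1.17    |
| 1024 | Small  | 3.3               | 2.6  | 1.27    | 2.6  | 1.26    | 2.5  | 1.31    |
| 1024 | Medium | 4.9               | 4.3  | 1.14    | 4.3  | 1.16    | 4.1  | 1.21    |
| 1024 | Large  | 7.7               | 6.2  | 1.24    | 5.8  | 1.32    | 5.6  | 1.37    |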


Results (3)

Subjective Analysis

Following the MUSHRA guidelines [8] [9], the preservation of audio quality as a function of the perceptive thresholds (T60, T50 and T40) has been evaluated by 15 listeners.

Fig.5 Listening test results for Small IR.
Fig.6 Listening test results for Medium IR.
Fig.7 Listening test results for Large IR.

Considerations

• Using a threshold based on T60 doesn't affect the perceived audio quality.
• Some artifacts are perceivable when employing T50 and T40.


Conclusions

• A complete review of the most common convolution techniques has been presented;

• A multithreaded real time implementation of a Non Uniform Partitioned Overlap and Save algorithm has been proposed;

• The proposed algorithm is based on three key points:

– Automatic partitioning of the IR based on an offline analysis;
– Multithreaded implementation to achieve an automatic parallelization of the operations;
– Psychoacoustic optimization to reduce the computational cost.

• Different tests have been carried out according to objective and subjective measures, proving the effectiveness of the approach in terms of both computational saving and preservation of audio quality.

• Future work will be oriented to a further investigation of the threshold used in the psychoacoustic approach and a real time implementation of the presented algorithm on an embedded platform.

References

[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, Prentice Hall International Inc., 1999.

[2] Barry D. Kulp, “Digital Equalization Using Fourier Transform Techniques,” in Proc. 85th Audio Engineering Society Convention (AES’88), Los Angeles, USA, Oct. 1988.

[3] A. Farina and A. Torger, “Real Time Partitioned Convolution for Ambiophonics Surround Sound,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York, NY, USA, Oct. 2001.

[4] E. Armelloni, C. Giottoli, and A. Farina, “Implementation of real-time partitioned convolution on a DSP board,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, Oct. 2003, pp. 71–74.

[5] W. G. Gardner, “Efficient Convolution without Input-Output Delay,” J. Audio Eng. Soc., vol. 43, no. 3, pp. 127–136, Mar. 1995.

[6] Guillermo Garcia, “Optimal Filter Partition for Efficient Convolution with Short Input/Output Delay,” in Proc. of 113th Audio Engineering Society Convention (AES’02), Los Angeles, CA, USA, Oct. 2002.

[7] Wen-Chieh Lee, Chung-Han Yang, Chi-Min Liu, and Jiun-In Guo, “Perceptual Convolution for Reverberation,” in Proc. 115th Audio Engineering Society Convention (AES’03), New York, U.S., November 2003.

[8] ITU-R BS. 1534, “Method for subjective listening tests of intermediate audio quality,” 2001.

[9] E. Vincent, “MUSHRAM: A MATLAB interface for MUSHRA listening tests,” 2005.