tejaswi nanjundaswamy and kenneth rose signal compression ... · tejaswi nanjundaswamy and kenneth...

1
B IDIRECTIONAL C ASCADED L ONG T ERM P REDICTION FOR F RAME L OSS C ONCEALMENT IN P OLYPHONIC AUDIO S IGNALS Tejaswi Nanjundaswamy and Kenneth Rose Signal Compression Lab, Department of ECE, University of California Santa Barbara Why Frame Loss Concealment (FLC) ? Existing FLC techniques X Waveform repetition X Ineffective for most audio signals as they X are polyphonic. X Subband domain linear prediction X Ineffective for long lost blocks. X MDCT domain tonal X interpolation : : Frame n-1 Frame n : : Frame n+1 X Ineffective when MDCT fails to X resolve individual tones. Cascaded Long Term Prediction (CLTP) Predicting periodic signals such as x i [m]= α i x i [m - N i ]+ β i x i [m - N i + 1] is well known. x 1 [ m]= α 1 x 1 [ m - N 1 ]+ β 1 x 1 [ m - N 1 + 1] H 1 ( z ) [ 1 - α 1 z -N1 -β 1 z -N1+1 ] ____ ____ y 1 [ m]= x 1 [ m] * h 1 [ m] x 2 [ m]= α 2 x 2 [ m - N 2 ]+ β 2 x 2 [ m - N 2 + 1] H 2 ( z ) [ 1 - α 2 z -N2 -β 2 z -N2+1 ] ____ ____ y 2 [ m]= x 2 [ m] * h 2 [ m] We recently introduced CLTP for mixture of such periodic signals x [m] = x i [m] (i.e., polyphonic signals) to improve its compression efficiency. x [ m]= x 1 [ m]+ x 2 [ m] H 1 ( z ) [ 1 - α 1 z -N1 -β 1 z -N1+1 ] ____ __ H 2 ( z ) [ 1 - α 2 z -N2 -β 2 z -N2+1 ] __ ____ ________________________________ _______ _______ + = y [ m]= x [ m] * h 1 [ m] * h 2 [ m] + = Here we propose employing a CLTP synthesis filter, suitably adapted for FLC by utilizing the past and even the usually available future samples effectively. Cascaded Long Term Prediction for Frame Loss Concealment Preliminary parameters are estimated from the past samples via a recursive technique Parameters of j th filter (1 - α j z -N j - β j z -N j +1 ) are estimated in the residue of filtering with all the others Q i ,i 6=j (1 - α i z -N i - β i z -N i +1 ), via the well known technique for LTP. Each filter in the cascade is estimated this way in a loop until convergence. Using only the past samples for the filter parameter estimate doesn’t explain future samples correctly So CLTP filter updated with multiplicative gain factors H c (z )= P -1 Y i =0 (1 - G i (α i z -N i + β i z -N i +1 )). The gain factors are adjusted to minimize squared prediction error in the future samples. As cost function has complex dependency on these factors, a generic quasi-Newton opti- mization called L-BFGS method is used along with backtracking line search for step sizes. Simply predicting from past samples doesn’t ensure smooth transition into the available future samples. Thus lost frame samples are predicted in reverse direction from future samples with different set of CLTP gain factors. Final reconstruction of lost frame is a weighted average of predicted samples in each direction. + = For use in MPEG AAC, the reconstructed frame is transformed to MDCT domain and energy smoothing performed in each band l , via a gain factor given as, f [l ]= r e n-1 [l ]e n+1 [l ] e n [l ] , if e n [l ] e n-1 [l ]e n+1 [l ] > T or e n [l ] e n-1 [l ]e n+1 [l ] < 1/T , 1, otherwise. Frame n-1 Frame n Frame n+1 Frame n-1 Frame n Frame n+1 Evaluations MPEG reference AAC-LD encoder used to generate 64 kbps bitstreams and the following decoders compared, Reference decoder with no frame loss. Reference decoder with subband domain linear prediction based FLC (SBP-FLC). Reference decoder with MDCT domain tonal interpolation FLC (MDCT-FLC). Reference decoder with the proposed CLTP based FLC (CLTP-FLC). Testing data-set: 6 audio files, 4s each, mono, 44.1/48 kHz. Frame loss was at the rate of 10% and random. Objective evaluation results of Segmental SNR in dB. Filename SBP-FLC MDCT-FLC CLTP-FLC Piano -3.16 -0.67 5.10 Guitar -1.95 0.19 7.15 Harp -3.59 -1.77 3.80 Bells -2.08 0.06 4.26 Mfv 2.27 0.34 11.53 Mozart -2.03 1.22 8.4 Average -1.76 -0.11 6.71 (+6.82) Subjective evaluation results of MUSHRA listening tests (16 listeners, plots with average and 95% confidence interval). Ref Anc NoLoss SBP-FLC MDCT-FLC CLTP-FLC 0 20 40 60 80 100 - Single instrument multiple chord files - Orchestra files pp pp a Conclusions Currently used FLC techniques sub-optimal for polyphonic audio signals. Bidirectional cascaded LTP proposed for significantly improved FLC, which takes into account all the available information. Subjective and objective evaluations substantiate these improvements. Future directions include developing low complexity variant and handling burst frame losses. a http://www.scl.ece.ucsb.edu/ {tejaswi, rose}@ece.ucsb.edu

Upload: others

Post on 10-Jun-2020

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tejaswi Nanjundaswamy and Kenneth Rose Signal Compression ... · Tejaswi Nanjundaswamy and Kenneth Rose Signal Compression Lab, ... jz Nj jz Nj+1) are estimated in the residue of

BIDIRECTIONAL CASCADED LONG TERM PREDICTION FORFRAME LOSS CONCEALMENT IN POLYPHONIC AUDIO SIGNALS

Tejaswi Nanjundaswamy and Kenneth RoseSignal Compression Lab, Department of ECE, University of California Santa Barbara

Why Frame Loss Concealment (FLC) ?

Existing FLC techniques

XWaveform repetition

X Ineffective for most audio signals as they

X are polyphonic.

XSubband domain linear prediction

X Ineffective for long lost blocks.

XMDCT domain tonalX interpolation

: :

Frame n−1 Frame n

: :

Frame n+1

X Ineffective when MDCT fails to

X resolve individual tones.

Cascaded Long Term Prediction (CLTP)

• Predicting periodic signals such as xi[m] = αixi[m − Ni] + βixi[m − Ni + 1] iswell known.

x1[m] = α1x1[m − N1] + β1x1[m − N1 + 1]

H1(z )[1 − α1z

−N1−β1z−N1+1]

____ ____

y1[m] = x1[m] ∗ h1[m]

x2[m] = α2x2[m − N2] + β2x2[m − N2 + 1]

H2(z )[1 − α2z

−N2−β2z−N2+1]

____ ____

y2[m] = x2[m] ∗ h2[m]

•We recently introduced CLTP for mixture of such periodic signals x[m] =∑xi[m] (i.e., polyphonic signals) to improve its compression efficiency.

x[m] = x1[m]+ x2[m]

H1(z )[1 − α1z

−N1−β1z−N1+1]

____ __

H2(z )[1 − α2z

−N2−β2z−N2+1]

__ ____

________________________________

__

__

__

_

__

__

__

_

+

=

y [m] = x[m]∗ h1[m]∗h2[m]

+

=

• Here we propose employing a CLTP synthesis filter, suitably adapted for FLCby utilizing the past and even the usually available future samples effectively.

Cascaded Long Term Prediction for Frame Loss Concealment

• Preliminary parameters are estimated from the past samples via a recursivetechnique• Parameters of jth filter (1 − αjz−Nj − βjz−Nj+1) are estimated in the residue of filtering with

all the others∏∀i,i 6=j(1− αiz−Ni − βiz−Ni+1), via the well known technique for LTP.

• Each filter in the cascade is estimated this way in a loop until convergence.

• Using only the past samples for the filter parameter estimate doesn’t explainfuture samples correctly

• So CLTP filter updated with multiplicative gain factors

Hc(z) =P−1∏i=0

(1− Gi(αiz−Ni + βiz−Ni+1)).

• The gain factors are adjusted to minimize squared prediction error in the future samples.• As cost function has complex dependency on these factors, a generic quasi-Newton opti-

mization called L-BFGS method is used along with backtracking line search for step sizes.

• Simply predicting from past samples doesn’t ensure smooth transition into theavailable future samples.• Thus lost frame samples are predicted in reverse direction from future samples

with different set of CLTP gain factors.

• Final reconstruction of lost frame is a weighted average of predicted samplesin each direction.

+

=

• For use in MPEG AAC, the reconstructed frame is transformed to MDCT domainand energy smoothing performed in each band l , via a gain factor given as,

f [l] =

√√

en−1[l]en+1[l]en[l]

, if en[l]√en−1[l]en+1[l]

> T or en[l]√en−1[l]en+1[l]

< 1/T ,

1, otherwise.

Frame n−1 Frame n Frame n+1 Frame n−1 Frame n Frame n+1

Evaluations

•MPEG reference AAC-LD encoder used to generate 64 kbps bitstreams and thefollowing decoders compared,• Reference decoder with no frame loss.• Reference decoder with subband domain linear prediction based FLC (SBP-FLC).• Reference decoder with MDCT domain tonal interpolation FLC (MDCT-FLC).• Reference decoder with the proposed CLTP based FLC (CLTP-FLC).

• Testing data-set: 6 audio files, 4s each, mono, 44.1/48 kHz.• Frame loss was at the rate of 10% and random.•Objective evaluation results of Segmental SNR in dB.

Filename SBP-FLC MDCT-FLC CLTP-FLCPiano -3.16 -0.67 5.10Guitar -1.95 0.19 7.15Harp -3.59 -1.77 3.80Bells -2.08 0.06 4.26Mfv 2.27 0.34 11.53Mozart -2.03 1.22 8.4Average -1.76 -0.11 6.71 (+6.82)

• Subjective evaluation results of MUSHRA listening tests (16 listeners, plotswith average and 95% confidence interval).

Ref Anc NoLoss SBP−FLC MDCT−FLC CLTP−FLC

0

20

40

60

80

100− Single instrument multiple chord files − Orchestra filespp pp

a

Conclusions

• Currently used FLC techniques sub-optimal for polyphonic audio signals.• Bidirectional cascaded LTP proposed for significantly improved FLC, which

takes into account all the available information.• Subjective and objective evaluations substantiate these improvements.• Future directions include developing low complexity variant and handling burst

frame losses.

a

http://www.scl.ece.ucsb.edu/ {tejaswi, rose}@ece.ucsb.edu