Some Aspects of Wideband Speech in Enterprise Telephony

2nd Workshop on Wideband Speech Quality - June 2005. Some Aspects of Wideband Speech in Enterprise Telephony. Eric J. Diethorn ([email protected]) with Gary W. Elko ([email protected]) and Joseph L. Hall ([email protected]). Avaya, Inc., Avaya Labs, Research, 233 Mt. Airy Road, Basking Ridge, New Jersey 07920 USA



Page 1: Some Aspects of Wideband Speech in Enterprise Telephony

2nd Workshop on Wideband Speech Quality - June 2005


Some Aspects of Wideband Speech in Enterprise Telephony

Eric J. Diethorn ([email protected])

with

Gary W. Elko ([email protected]) and

Joseph L. Hall ([email protected])

Avaya, Inc.

Avaya Labs, Research

233 Mt. Airy Road,

Basking Ridge, New Jersey 07920 USA

Page 2: Some Aspects of Wideband Speech in Enterprise Telephony


Outline

• Physical acoustics
• Echo
• Voice coders
• Conferencing
• Wideband speech & intelligibility
• Hallway demonstration – Avaya SIP Softphone

Page 3: Some Aspects of Wideband Speech in Enterprise Telephony


Some introductory thoughts

• Wideband speech telephony will instantaneously raise the bar of end-user expectation, at least for some applications.
  • Skype
• We have standards for the reproduction of wideband speech, but is wider-band good enough?
  • Maybe [150, 5000] Hz is good enough?
• With greater bandwidth comes a greater range of potential artifacts that the acoustical-signal-processing engineer must address.
  • Low-frequency acoustic echo, earpiece hiss, speech-coder distortion, arbitration of multiple sampling rates.
• The preferences of end users are uncertain.
  • Speech-bandwidth policies (buddy lists, profiles)?
  • Suppose I have a physiological speech impediment. Do I want it emphasized?

Page 4: Some Aspects of Wideband Speech in Enterprise Telephony


Physical acoustics

The physical design of terminal acoustics must change to render wideband speech.

Acoustical signal processing changes, too.

Page 5: Some Aspects of Wideband Speech in Enterprise Telephony


Loudspeakers & enclosures

[Figure: Frequency response, traditional narrowband speakerphone, 80 dB-SPL, 50 cm. Axes: sound pressure level (dB) vs. frequency (Hz).]

Page 6: Some Aspects of Wideband Speech in Enterprise Telephony


Loudspeakers & enclosures

[Figure: Total harmonic distortion, traditional narrowband speakerphone, 80 dB-SPL, 50 cm. Axes: THD at harmonics (%) vs. frequency (Hz).]

• High distortion at low frequency end of wideband-speech spectrum

• Acoustic echo control difficult if not impossible without acoustical modifications.

Page 7: Some Aspects of Wideband Speech in Enterprise Telephony


Earpieces

[Figure: Frequency response, wideband handset. Axes: sound pressure level (dB) vs. frequency (Hz).]

• In order to satisfy wideband standards, acoustical modifications are necessary to extend the low-frequency response of most earpiece designs.

• This is particularly challenging for physical arrangements in which the earpiece is held to the ear with little pressure.

Page 8: Some Aspects of Wideband Speech in Enterprise Telephony


Microphones

• Most low-cost electret microphones used today have a frequency response that is practically flat beyond the range of wideband speech – they are “wideband ready.”
• Multiple microphone arrangements – arrays – can be exploited to reduce the level of ambient noise at frequencies not present in traditional narrowband telephony.
  • Low-frequency rumble.
  • High-frequency hiss.
• Short-time spectral modification methods of noise reduction can help, but the perception of artifacts from such processing is enhanced by the wider speech band.

Page 9: Some Aspects of Wideband Speech in Enterprise Telephony


Microphones

• Omnidirectional microphone (traditional)
  • Good pick-up of talkers in all directions
  • But, picks up ambient noise from all directions
• Directional microphone
  • Reduces off-axis noise
  • Reduces reverberation of talker’s voice
  • Reduces coupling from speakerphone (helping AEC)
  • But, talkers off axis can’t be heard well.

[Two diagrams, one per microphone type, each labeled "Front of phone".]

Page 10: Some Aspects of Wideband Speech in Enterprise Telephony


Echo

• Requirements on echo control may change.
• The art of echo control must evolve to meet the challenge of wideband speech.

Page 11: Some Aspects of Wideband Speech in Enterprise Telephony


Requirements on Talker Echo

Source: Transmission Systems for Communications, Bell Telephone Laboratories, Inc., 5th Edition, 1982.

Roundtrip, mouth-to-ear, echo loss requirements were measured on populations for narrowband speech. How well do these data apply to wideband speech echo paths?

[Figure: Echo annoyance as a function of roundtrip, mouth-to-ear loss and delay, for narrowband speech. Axes: percent good-or-better vs. acoustic-to-acoustic echo-path loss (dB).]

Page 12: Some Aspects of Wideband Speech in Enterprise Telephony


Talker Echo, Continued

Source: Transmission Systems for Communications, Bell Telephone Laboratories, Inc., 5th Edition, 1982.

Being strictly digital, wideband-speech network paths do not suffer from analog circuit noises; however, analog and environmental noises enter calls at the endpoint. Should requirements on talker echo incorporate such (wideband) noise phenomena?

[Figure: Echo annoyance as a function of roundtrip, mouth-to-ear echo-and-noise loss. Long-haul (~1000 mi.) PSTN connection, circa 1980.]

Page 13: Some Aspects of Wideband Speech in Enterprise Telephony


Wideband speech coding

• G.722, G.722.1 and G.722.2
  • G.722 is cheap.
  • G.722.1 often comes with video-on-the-enterprise (Polycom).
• Proprietary codecs
  • Silicon solution providers have their favorites. Some are pretty good.
• Linear 16-bit encoding?
  • Speech-transmission bandwidth (bits-per-second) is becoming a non-issue in the enterprise, at least for wired LANs.
  • Architecturally appealing within the enterprise. Let boundary gateways worry about transcoding.
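To put the linear 16-bit option in numbers, the following is a small sketch of the payload arithmetic. It assumes 16 kHz sampling for wideband speech and uses the nominal top rates of the standard codecs; the function and variable names are illustrative, and packet overhead (RTP/UDP/IP headers) is ignored.

```python
# Payload bit rate of uncompressed (linear) PCM versus nominal codec rates.
# Header overhead is deliberately left out of this comparison.

def linear_pcm_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Payload rate in kbit/s for linear PCM."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

wideband_linear = linear_pcm_kbps(16_000)    # 16 kHz sampling, 16-bit samples
narrowband_linear = linear_pcm_kbps(8_000)   # 8 kHz sampling, 16-bit samples

# Nominal highest-mode payload rates in kbit/s.
codec_kbps = {
    "G.711 (narrowband)": 64.0,
    "G.722 (wideband)": 64.0,
    "G.722.1 (wideband)": 32.0,
}

print(f"Linear 16-bit wideband: {wideband_linear:.0f} kbit/s")
for name, rate in codec_kbps.items():
    print(f"{name}: {rate:.0f} kbit/s ({wideband_linear / rate:.0f}x less than linear wideband)")
```

Linear wideband costs a few times the rate of G.722 per stream, which is small next to wired-LAN capacity; that is the sense in which bandwidth becomes a non-issue inside the enterprise.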

Page 14: Some Aspects of Wideband Speech in Enterprise Telephony


Multirate audio conferencing

[Diagram: a conference bridge server connected to IP endpoints (IP-1, IP-2, ...), the PSTN, and a leased WAN (compressed speech, e.g., G.729, G.726). Link labels: narrowband speech; wide- and narrow-band speech.]

• Rate arbitration
• Transcoding
• Multirate mixing
• (Artificial) bandwidth extension
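As an illustration of multirate mixing, here is a minimal sketch in which a narrowband (8 kHz) stream is brought up to the wideband (16 kHz) rate before being summed with a wideband stream. The linear-interpolation upsampler is a deliberately crude stand-in chosen to keep the example self-contained; a real bridge would use a proper interpolation filter, and all names here are hypothetical.

```python
def upsample_2x_linear(x):
    """Double the sampling rate by linear interpolation (crude stand-in
    for a proper interpolation filter)."""
    y = []
    for i, s in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else s  # hold the last sample
        y.append(s)
        y.append(0.5 * (s + nxt))
    return y

def mix(streams):
    """Sum equal-rate streams sample by sample, clipping to 16-bit range."""
    n = min(len(s) for s in streams)
    out = []
    for i in range(n):
        v = sum(s[i] for s in streams)
        out.append(max(-32768, min(32767, int(round(v)))))
    return out

narrowband = [0, 1000, 2000, 1000]   # four samples at 8 kHz
wideband = [100] * 8                 # eight samples at 16 kHz
mixed = mix([upsample_2x_linear(narrowband), wideband])
print(mixed)
```

Upsampling alone does not restore the missing 4 to 8 kHz band of the narrowband talker; that is the job of the (artificial) bandwidth extension item on the slide.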

Page 15: Some Aspects of Wideband Speech in Enterprise Telephony


Stereo audio conferencing

Hands-free, wideband-speech communications with stereo echo cancellation

[Diagram: ROOM 1 and ROOM 2 connected by a two-channel link. Labels: talker, echo, g1, g2, h1, h2, h1~, h2~, NL (twice), and a +/- summing node.]
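The diagram's h1~ and h2~ blocks suggest adaptive estimates of the echo paths h1 and h2. As a rough illustration of the underlying idea only, the sketch below runs a single-channel NLMS (normalized least-mean-squares) echo canceller on a simulated echo path. This is a simplification, not the stereo system on the slide: the stereo case has two strongly correlated loudspeaker signals, and the NL blocks are presumably there to decorrelate them. The filter length, step size, and simulated path are all assumptions made for the example.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Single-channel NLMS: adapt w so that w * far_end tracks the echo in mic.
    Returns the residual (echo-cancelled) signal and the final coefficients."""
    w = [0.0] * taps
    buf = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))     # echo estimate
        e = d - y                                      # residual echo
        norm = sum(b * b for b in buf) + eps           # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

# Simulated far-end signal (white noise) and a short, made-up echo path.
random.seed(0)
true_path = [0.5, -0.3, 0.2, 0.1]
far = [random.gauss(0.0, 1.0) for _ in range(4000)]
mic = [sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n - k >= 0)
       for n in range(len(far))]

residual, w_hat = nlms_echo_canceller(far, mic)
echo_energy = sum(v * v for v in mic[-500:])
residual_energy = sum(v * v for v in residual[-500:])
print("residual/echo energy over last 500 samples:", residual_energy / echo_energy)
```

With white noise and no near-end talker the filter converges quickly. Real wideband speech is harder: echo paths are longer at 16 kHz sampling, and the low-frequency loudspeaker distortion noted earlier is not modeled by a linear filter.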

Page 16: Some Aspects of Wideband Speech in Enterprise Telephony


Stereo Conferencing

(Placeholder, video demonstration)

Page 17: Some Aspects of Wideband Speech in Enterprise Telephony


Wideband speech & intelligibility

Siemens – “…wideband transmissions can reduce speech ambiguities by as much as 90 percent, increasing conversational intelligibility and reducing listener fatigue.” (2003 press release)

Polycom – “For single syllables, 3.3 kHz bandwidth yields an accuracy of only 75 percent, as opposed to over 95 percent with 7 kHz bandwidth.” (2003 white paper)

Marketing vs. science – both required

Page 18: Some Aspects of Wideband Speech in Enterprise Telephony


Experimental study*

Similar to the Diagnostic Rhyme Test and Diagnostic Alliteration Test, except we generated our own word pairs, e.g., “tie” & “pie” (“hot” & “hop”).

Subject hears one of the two, is shown both, and is asked “Which of these two did you hear?”

Clean anechoic speech filtered to three bandwidths: [50,3300], [50,5000] and [50,7000] Hz.

Investigate all nine combinations of three bandwidths and three additive-noise levels (0 dB, +12 dB, +24 dB SNR).

Reference: G.A. Miller and P.E. Nicely, “An analysis of perceptual confusions among some English consonants” Lincoln Laboratory, MIT, 1955 (J. Acoust. Soc. Amer. Vol. 27, pp. 338-352)

* For questions concerning aspects of this study, contact Joseph L. Hall, Avaya Research, [email protected]
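The nine test conditions and the noise-mixing step can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: the band-limiting filter is omitted, SNR is taken as the ratio of RMS levels over the whole signal, and a sine tone and Gaussian noise stand in for speech and the additive noise. All names are hypothetical.

```python
import itertools
import math
import random

BANDS_HZ = [(50, 3300), (50, 5000), (50, 7000)]
SNRS_DB = [0, 12, 24]

# All nine bandwidth/SNR combinations.
conditions = list(itertools.product(BANDS_HZ, SNRS_DB))

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that rms(speech) / rms(scaled noise) equals snr_db, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]

# Stand-in signals: one second at 16 kHz.
random.seed(1)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

noisy = add_noise_at_snr(speech, noise, 12)
added = [m - s for m, s in zip(noisy, speech)]
achieved_db = 20 * math.log10(rms(speech) / rms(added))

for (lo, hi), snr in conditions:
    print(f"[{lo},{hi}] Hz at {snr} dB SNR")
```

Whether the SNR in the study was set before or after band-limiting is not stated on the slide; the choice matters because filtering changes the noise power.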

Page 19: Some Aspects of Wideband Speech in Enterprise Telephony


What do they sound like?

“Seed, feed, seed” at different bandwidths and additive noise levels.

[Audio demonstration grid. Columns: 3.3 kHz LP, 5 kHz LP, 7 kHz LP. Rows: clean, 24 dB SNR, 12 dB SNR, 0 dB SNR.]

Page 20: Some Aspects of Wideband Speech in Enterprise Telephony


Representative results

[Figure: Prob(correct), axis 0.5 to 1.0, vs. cutoff frequency (3.3 kHz, 5 kHz, 7 kHz), with one curve each for SNR = 24 dB, 12 dB and 0 dB and 90% confidence intervals. Case shown: s (e.g. six) mistaken for f (e.g. fix).]

Page 21: Some Aspects of Wideband Speech in Enterprise Telephony


Summary of results

Confusions tested: s>th, z>ð, s>f, t>k, t>p, ð>z, s>sh, z>g, ð>v, f>s, th>s, p>t, v>ð, th>f, d>g, f>th, g>z, sh>s, v>b, g>d, b>v, k>t.

[Figure: slope from LPF linear regression (axis 0.0 to 0.3) for each confusion, in plotted order: s/th, z/ð, s/f, t/k, t/p, ð/z, s/sh, z/g, f/s, ð/v, th/s, p/t, v/ð, th/f, d/g, f/th, g/z, sh/s, v/b, g/d, b/v, k/t.]

Red: BW significant at 90% level (from AOV - BW SNR)

Black: BW not significant at 90% level
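For readers unfamiliar with the slope measure, the sketch below shows an ordinary least-squares slope of a response against low-pass cutoff frequency. The slide does not say whether the regression was run against cutoff in kHz or against condition index, so kHz is an assumption here, and the y values are synthetic points placed on a known line purely to check the function; they are not results from the study.

```python
def ls_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

cutoff_khz = [3.3, 5.0, 7.0]   # the three low-pass cutoffs used in the study

# Synthetic check only: points on a line with slope 0.05 per kHz.
synthetic = [0.5 + 0.05 * (f - 3.3) for f in cutoff_khz]
slope = ls_slope(cutoff_khz, synthetic)
print(f"recovered slope: {slope:.3f} per kHz")
```

A larger slope means the probability of a correct response rises faster as the cutoff is raised, that is, the consonant pair benefits more from the added bandwidth.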

Page 22: Some Aspects of Wideband Speech in Enterprise Telephony


Hallway Demonstration -- Avaya wideband SIP softphone

• Wideband speech (16 kHz sampling, bandwidth limited by PC sound architecture).
• Voice codecs: G.711, G.729, G.726, G.722