Speech signal processing for media accessibility

Geneva, Switzerland, 24 October 2013 Speech signal processing for media accessibility Takayuki Ito, Dr. Eng. Executive Research Engineer, NHK Engineering System, Inc. [email protected] ITU Workshop on “Making Media Accessible to all: The options and the economics” (Geneva, Switzerland, 24 (p.m.) – 25 October 2013)


Page 1

Geneva, Switzerland, 24 October 2013

Speech signal processing for media accessibility

Takayuki Ito, Dr. Eng.

Executive Research Engineer, NHK Engineering System, Inc.

[email protected]

ITU Workshop on “Making Media Accessible to all: The options and the economics”

(Geneva, Switzerland, 24 (p.m.) – 25 October 2013)

Page 2

Ageing: A Global Issue

The population of elderly persons is increasing globally because fertility rates are declining.

Elderly persons need to be given the opportunity to continue contributing to society.

(UN 2002 Madrid International Plan of Action on Ageing)

From “supported” to “supporting”

Japan: aged 65 and over

2010: 23%

2040: 36%

Page 3

Ageing: degradation of hearing

Hearing loss, especially at higher frequencies

Hearing aids are available.

Background sound interferes with understanding speech.

A better mixing balance for TV programs is needed.

Degradation of cognitive speed

A slower speech rate is preferable.

Compensating for these degradations makes social participation easier for the elderly.

Page 4

Speech rate conversion technology

Page 5

Speech rate conversion for elderly people

TV and radio set with “Slow button”

The elderly sometimes say, “Speech in recent TV programs is too fast for me to understand.”

There is a need to slow down the speech rate without degrading sound quality.

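[Figure: speech segments ① to ⑩ on a time axis]

Original: ①②③④⑤⑥⑦⑧⑨⑩

Slower: ①②③③④⑤⑥⑦⑧⑧⑨⑩ (segments ③ and ⑧ are repeated)

Faster: ①②④⑤⑥⑦⑨⑩ (segments ③ and ⑧ are removed)

Analog elongation (marked × in the figure): ①②③④⑤⑥⑦⑧⑨⑩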
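The segment sequences above can be read as a simple rule: repeat some segments to slow speech down, and drop some to speed it up. The Python sketch below illustrates only that rule. The function name convert_rate and the integer segment labels are hypothetical, and the choice of segments ③ and ⑧ is copied from the figure. A real converter would presumably work on waveform segments and smooth the joins, which this sketch does not attempt.

```python
def convert_rate(segments, repeat=(), drop=()):
    """Return a new sequence with selected segments repeated or dropped.

    segments: ordered segment labels (hypothetical stand-ins for waveform pieces).
    repeat, drop: 1-based positions to play twice or to skip.
    """
    repeat, drop = set(repeat), set(drop)
    converted = []
    for position, segment in enumerate(segments, start=1):
        if position in drop:
            continue                   # skipping a segment shortens the speech
        converted.append(segment)
        if position in repeat:
            converted.append(segment)  # playing it twice lengthens the speech
    return converted


original = list(range(1, 11))                    # segments 1 to 10
slower = convert_rate(original, repeat={3, 8})   # 12 segments
faster = convert_rate(original, drop={3, 8})     # 8 segments
print(slower)
print(faster)
```

Because whole segments are repeated or removed rather than stretched, the content of each segment is left unchanged; only the number of segments played differs.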

Page 6

Speech rate conversion without changing length

[Figure: original and converted speech on a common time axis]

Original

Converted

The starts coincide at the blue line positions.

Then the starts do not coincide, but…

They coincide again.

streaming data

Page 7

Intelligible high-speed speech for visually impaired people

Visually impaired people use fast replay to find a main idea in audio books or web pages (audio skimming).

[Figure: timeline of fast replay, before and after conversion]

Original (n times): BGM, speech and silent parts

Important part (speech): make this part easier to understand

Converted (same length): the speech parts are made slower

recorded data
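One way to picture the "same length" constraint is as a time budget: if speech is replayed more slowly than the overall n-times speed, the BGM and silent parts must be replayed faster to make up the difference. The Python sketch below works through that arithmetic. It is an assumption about how the budget could be balanced, not the method used in the talk. The function name skim_plan, the segment labels and the example durations are all hypothetical.

```python
def skim_plan(segments, overall_speed, speech_speed):
    """Plan playback so speech runs at speech_speed and the total length
    equals that of plain replay at overall_speed.

    segments: list of (kind, seconds), kind being "speech", "bgm" or "silent".
    Returns (other_speed, plan), where plan lists (kind, playback_seconds).
    """
    total = sum(seconds for _, seconds in segments)
    speech = sum(seconds for kind, seconds in segments if kind == "speech")
    other = total - speech
    target = total / overall_speed               # length of plain n-times replay
    remaining = target - speech / speech_speed   # time left for non-speech parts
    if other <= 0 or remaining <= 0:
        raise ValueError("non-speech parts cannot absorb the slower speech")
    other_speed = other / remaining
    plan = [
        (kind, seconds / (speech_speed if kind == "speech" else other_speed))
        for kind, seconds in segments
    ]
    return other_speed, plan


# Hypothetical recording: 10 s BGM, 20 s speech, 10 s silence, replayed at 2x.
segments = [("bgm", 10.0), ("speech", 20.0), ("silent", 10.0)]
other_speed, plan = skim_plan(segments, overall_speed=2.0, speech_speed=1.6)
converted_length = sum(seconds for _, seconds in plan)
print(other_speed, converted_length)
```

The sketch needs the whole recording in advance to compute the budget, which fits recorded data rather than a live stream.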

Page 8

Applications of speech rate conversion

[Figure: applications arranged along an axis from slower to faster]

Learning a foreign language

For people with learning disabilities

Quick news internet service

Audio skimming for visually impaired people

For elderly people

Page 9

Clean audio

Page 10

A TV receiver with clean audio dial

There are various ways to realize this.

For detailed information, please see FG AVA TR Part 12.

Page 11

Receiver-side re-mixing for the elderly (Clean Audio)

Speech is separated from background sound by stereo correlation.

The estimated speech component is enhanced for clearer speech.

Speech and background sound are re-mixed at the listener's preferred ratio.

Nothing needs to change in production or transmission.

[Figure: block diagram of receiver-side re-mixing. Labels: broadcast sound, stereo signal, adaptive filter, voice detector (speech / non-speech flag), estimated speech, estimated BG sound, spectrum emphasizer, gains ×α, ×β, ×γ and ×η, re-mixing of speech and BG with specified ratio, sound output]
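The re-mixing step can be illustrated with a deliberately crude stand-in for the separation stage. The Python sketch below treats the part common to both stereo channels as the speech estimate and the remainder of each channel as background, then re-mixes them with two gains. This mid-channel split is an assumption made for illustration. It is not the adaptive filter, voice detector or spectrum emphasizer of the block diagram, and the names remix, speech_gain and bg_gain are hypothetical (they are not the α, β, γ and η of the figure).

```python
def remix(left, right, speech_gain=1.5, bg_gain=0.5):
    """Re-mix a stereo signal with separate gains for speech and background.

    Assumption: speech is the same in both channels, so the per-sample
    average of left and right serves as a rough speech estimate.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        speech = 0.5 * (l + r)   # component common to both channels
        bg_left = l - speech     # what is left over in each channel
        bg_right = r - speech
        out_left.append(speech_gain * speech + bg_gain * bg_left)
        out_right.append(speech_gain * speech + bg_gain * bg_right)
    return out_left, out_right


# Hypothetical samples: the channels differ, so both components are present.
left = [0.5, 0.25, -0.5, 0.0]
right = [0.25, 0.25, -0.25, 0.5]
unchanged_left, unchanged_right = remix(left, right, speech_gain=1.0, bg_gain=1.0)
speech_only_left, speech_only_right = remix(left, right, speech_gain=1.0, bg_gain=0.0)
print(unchanged_left, speech_only_left)
```

With both gains at 1.0 the output equals the input, which matches the point that the processing sits entirely in the receiver and the broadcast signal itself is untouched. Raising speech_gain relative to bg_gain corresponds to turning the clean audio dial toward speech.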

Page 12

Demonstration of the receiver-side clear audio

Page 13

Conclusions and Recommendations

Compensating for degraded functions of the elderly helps their social participation.

Speech rate conversion and re-mixing of F/B sounds are promising technologies for this purpose.

Broadcasters and TV manufacturers are encouraged to provide services and devices with these functions.

Refer to FG AVA Tech. Report Part 12 for more information.

Page 14

Page 15

Clear audio in studio: Mixing balance meter

Mixing balance meter

Indicates the loudness-based mixing balance.

“Elderly emulation mode” indicates better mixing for the elderly.

Young mixing engineers can produce better-balanced audio for the elderly.

[Figure: studio signal flow. Labels: speech (narration etc.), background sounds, mixed sound, mixing balance meter, studio. The meter calculates loudness and estimates the favorability of the mix level]
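The idea of a meter that compares speech and background levels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses plain RMS level in dB as a stand-in for a proper loudness measure, and the thresholds normal_min_db and elderly_extra_db are invented placeholders, not values from the talk. The function names are hypothetical as well.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (a rough proxy for loudness)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def mixing_balance_db(speech, background):
    """Level of speech above background, in dB."""
    return rms_db(speech) - rms_db(background)


def is_favorable(balance_db, elderly_mode=False,
                 normal_min_db=6.0, elderly_extra_db=6.0):
    """Judge the balance; both thresholds are placeholder assumptions."""
    required = normal_min_db + (elderly_extra_db if elderly_mode else 0.0)
    return balance_db >= required


# Hypothetical stems: the speech amplitude is twice the background amplitude.
speech = [0.5, -0.5] * 100
background = [0.25, -0.25] * 100
balance = mixing_balance_db(speech, background)
print(round(balance, 2), is_favorable(balance), is_favorable(balance, elderly_mode=True))
```

In this example the same mix passes the default check but fails when elderly_mode is on, which is the kind of feedback the slide describes for a mixing engineer. The sketch assumes the speech and background stems are available separately, as they are in a studio before the final mix.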