Page 1: A review on speech processing and use of LPC (Linear ...excom.vsb.cz › images › files › 2017_RoSP › TES_Ilk.pdf · A review on speech processing and use of LPC (Linear Predictive

The Development of Excellence of the Telecommunication Research Team in

Relation to International Cooperation - CZ.1.07/2.3.00/20.0217

A review on speech processing

and use of LPC

(Linear Predictive Coding)

Keynote Talk

Prof. H. Gökhan İlk

Page 2

Telecommunication

Educational Seminar

Page 3

Contact details

Address : Ankara University,

Faculty of Engineering

Electrical & Electronics Eng. Department

Gölbaşı, Ankara TURKEY

[email protected]

Page 4

What is a speech signal?

• A speech signal represents the change of acoustic pressure with respect to time: s(t) in continuous time, or s[n] in discrete time

Page 5

• It has a bandwidth from 20 Hz to 20 kHz (the audible band, or audio band)

• Speech, for communication purposes, has a bandwidth of up to 3.4 kHz

Page 6

What is the sampling frequency fs?

• In general, speech is sampled at 8000 Hz (narrowband speech)

• 11025 Hz (forensic quality)

• 16000 Hz (wideband speech; an ITU standard rate, just like 8 kHz)

• 44100 Hz for CD quality

• A word on quantization:

• Depth is usually 16 bits
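The sampling rates and bit depth listed above fix the raw PCM bit rate as a simple product. The following is a minimal sketch of that arithmetic; the function name `pcm_bit_rate` and the single-channel default are illustrative assumptions, not part of the talk.

```python
def pcm_bit_rate(fs_hz, bits_per_sample=16, channels=1):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels

# Sampling rates listed above, all at 16-bit depth, single channel (assumed).
rates = {"narrowband": 8000, "forensic": 11025, "wideband": 16000, "CD": 44100}
for name, fs in rates.items():
    print(f"{name:>10}: {fs} Hz x 16 bit = {pcm_bit_rate(fs) / 1000:g} kb/s")
```

At 8000 Hz and 16 bits this gives 128 kb/s, which matches the PCM reference rate quoted with the audio samples at the end of the talk.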

Page 7

A more complicated, but more informative waveform:

Spectrogram: Time, Frequency and magnitude, all at once

Q1: What is the sampling frequency?

Q2: What does the spectrum look like?

Confucius: 'Learning without thought is labor lost'

Page 8

Speech Production Mechanism

[Figure: speech production mechanism; surviving label: LUNGS]

Page 9

What does it look like in the time domain?

Short-term correlation (i.e. correlation from sample to sample)

Long-term correlation

Page 10

Redundancy:

• The speech signal possesses a lot of redundancy: short term (from sample to sample) and long term (from one pitch period, i.e. fundamental period, to the next)

• It is an almost periodic waveform
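Both kinds of redundancy can be seen in the normalized autocorrelation of a signal. The sketch below uses a hypothetical synthetic signal (one pulse every 80 samples, smoothed by a one-pole filter) rather than real speech; the lag search range of 20 to 160 samples is likewise an assumption chosen for illustration.

```python
def normalized_autocorr(x, lag):
    """r(lag) = sum_m x[m] x[m+lag] / sum_m x[m]^2."""
    energy = sum(v * v for v in x)
    return sum(x[m] * x[m + lag] for m in range(len(x) - lag)) / energy

# Hypothetical quasi-periodic test signal: one pulse every 80 samples
# (a 100 Hz pitch at fs = 8000 Hz) smoothed by a one-pole filter.
period = 80
x = []
prev = 0.0
for n in range(800):
    pulse = 1.0 if n % period == 0 else 0.0
    prev = 0.9 * prev + pulse
    x.append(prev)

short_term = normalized_autocorr(x, 1)   # sample-to-sample correlation
pitch_lag = max(range(20, 161), key=lambda lag: normalized_autocorr(x, lag))
print(f"lag-1 correlation: {short_term:.2f}, strongest long-term lag: {pitch_lag}")
```

A high lag-1 value reflects the short-term correlation, and the strongest peak at larger lags falls on the pitch period.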

Page 11

What does it look like?

• Speech can generally be classified as voiced or unvoiced.

• The voiced part is a quasi-periodic (almost periodic) signal with higher energy and fewer zero crossings.

• The unvoiced part is a noise-like signal
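The two cues named above, energy and zero crossings, are easy to compute per frame. This is a minimal sketch on synthetic stand-ins (a sinusoid for the voiced case, low-level Gaussian noise for the unvoiced case); the 30 ms frame length and the test signals are assumptions, and no decision threshold is implied.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

fs = 8000   # narrowband sampling rate from the slides
n = 240     # one 30 ms frame (frame length is an assumption)
random.seed(0)
voiced_like = [math.sin(2 * math.pi * 100 * t / fs + 0.3) for t in range(n)]
noise_like = [0.1 * random.gauss(0.0, 1.0) for _ in range(n)]

for name, frame in [("voiced-like", voiced_like), ("noise-like", noise_like)]:
    print(name, round(short_time_energy(frame), 3), round(zero_crossing_rate(frame), 3))
```

The voiced-like frame shows the higher energy and the lower zero-crossing rate, in line with the classification described above.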

Page 12

Areas of speech processing:

• Speech processing has several applications:

• Speech recognition (What is said?)

• Speaker identification (Who says it?)

• Speech compression (fewer samples)

• Speech coding (save or send it digitally)

• Speech analysis (medical or forensic applications)

• Speech synthesis (from parameters or from text to a speech signal, TTS)

• Speech enhancement (noisy to clean)

• Combination of more than one application

Page 13

Speech recognition

• Continuous speech recognition: "I would like to buy a ticket from Ostrava to Prague for tomorrow"

• Isolated speech recognition: "One, Two, …"

• Word spotting: "… going out for a pub and need cash for this weekend …"

• Speech-to-text applications: for dictation purposes (transcribing medical reports, limited vocabulary for the disabled)

• Artificial intelligence

Page 14

Speaker identification

• Speaker identification (closed set 1:N, open set 1:(N+1)): identify who is speaking from a known set of people.

• Speaker verification (1:1): determine whether the speaker is the person of interest

• Text dependent / text independent: it does / does not matter what is being said

• Speaker transformation: Barack Obama speaking in Czech

Page 15

Speech coding

• Waveform coders: try to preserve the waveform in the time domain (A-law, CELP)

• Vocoders (VOice CODERS): try to preserve the entropy (information) content of speech, usually in the spectral domain (MBEV)

• Transform coders: adaptive transform coders (ATC)

• Speech coding for VoIP: from circuit-switching technology to packet-switching technology

• Military vocoders: NATO and DoD standards, MELP (1.2 – 2.4 kbps)

Page 16

Speech analysis

• Voicing analysis / pitch estimation, jitter and shimmer analysis: to identify the effects of surgery, also in forensic analysis

• Cepstral analysis / spectral analysis: a widely used "feature" in speech recognition

• Formant analysis: to study phonetics and vowel structures

• Time domain analysis: to obtain features in speech applications

Page 17

Speech synthesis

• Text-to-speech synthesis: newspapers speaking, robots talking

• Speech coders: during decoding, speech synthesis is required

• Artificial production of human sound: prosodic modification and emotional content

Page 18

Current state of speech processing

[Block diagram; block labels: Input speech (in one language), Preprocessing, Windowing (stationary), Feature selection / extraction, Classifier design, Classification result, Postprocessing, Speech recognized with speaker ID, Text-to-text translation, TTS with prosody and emotion modification, Output speech (in another language)]

Page 19

Father of all parameters, mother of all features: LPC (Linear Predictive Coding)

A little bit of history first!

How slowly the human brain works: evolution from the simplest to the most advanced

Page 20

How would you code these samples, speech[n]?

OF COURSE PCM

Page 21

Is it not easier to code the difference signal?

DPCM (Differential Pulse Code Modulation)
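The idea of coding differences can be sketched in a few lines. This is a deliberately simplified, lossless illustration: a practical DPCM coder places a quantizer inside the prediction loop, which is omitted here, and the 200 Hz test tone and function names are assumptions.

```python
import math

def dpcm_encode(x):
    """First sample as-is, then differences d[n] = x[n] - x[n-1]."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def dpcm_decode(d):
    """Running sum of the differences rebuilds the samples."""
    out, acc = [], 0
    for v in d:
        acc += v
        out.append(acc)
    return out

fs = 8000
# Hypothetical test tone: 200 Hz sinusoid with integer (PCM-like) samples.
x = [round(8000 * math.sin(2 * math.pi * 200 * n / fs)) for n in range(160)]
d = dpcm_encode(x)
peak_sample = max(abs(v) for v in x)
peak_diff = max(abs(v) for v in d[1:])
print("peak sample:", peak_sample, "peak difference:", peak_diff)
```

Because neighbouring samples are strongly correlated, the differences span a much smaller range than the samples themselves, so they need fewer bits.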

Page 22

A major problem!

Differences can be quite different

ADPCM (Adaptive Differential Pulse Code Modulation)

Page 23

Evolving from PCM to ADPCM

The human brain works in a different, greedier way!

Is it possible to adapt NOT only to one sample difference, but to many sample differences?

Page 24

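LPC equations

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k\,x[n-k]$$

$$\hat{x}[n] = \sum_{k=1}^{p} a_k\,x[n-k] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p]$$

$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k\,x[n-k], \quad a_0 = -1$$

A simple, linear model. Just addition and multiplication with constants: LPC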
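The prediction error really is just additions and multiplications with constants. A minimal sketch, assuming the coefficients are already known and that samples before the start of the signal are zero; the function name `lpc_residual` and the test signal are illustrative.

```python
def lpc_residual(x, a):
    """e[n] = x[n] - sum_{k=1..p} a[k-1] * x[n-k], taking x[n] = 0 for n < 0."""
    p = len(a)
    e = []
    for n in range(len(x)):
        prediction = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - prediction)
    return e

# Hypothetical signal that a first-order predictor models exactly.
x = [0.9 ** n for n in range(10)]
e = lpc_residual(x, [0.9])
print([round(v, 6) for v in e])
```

For this signal the residual is nonzero only at the first sample, where there is no past to predict from; everything after that is predicted from the previous sample.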

Page 25

Engineering point of view: LPC

[Block diagram: x[n] feeds the linear prediction filter, whose output \hat{x}[n] is subtracted from x[n] at the adder (+, -) to give e[n]]

Page 26

Building bridges

$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} w[k]\,x[n-k]$$

Has anyone heard of Wiener filter theory, optimal filtering?

$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} h[k]\,x[n-k]$$

Convolution sum

The Wiener filter turns out to be an FIR filter with N coefficients

Page 27

Auto-regression

The error is the difference between our signal and the optimal estimate

Now $y = x$:

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=0}^{N-1} a_k\,x[n-k]$$

Page 28

Let's summarize!

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N-1} a_k\,x[n-k]$$

$$\hat{x}[n] = \sum_{k=1}^{p} a_k\,x[n-k] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p]$$

$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k\,x[n-k], \quad a_0 = -1$$

Page 29

LPC Analysis Filter

Input: $x[n]$

$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k\,x[n-k], \quad a_0 = -1$$

Now we are using the p previous samples

Page 30

LPC Analysis, AR (Auto Regressive) Model

$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p] + e[n]$$

Consider the difference between optimum filter theory and regression analysis: since both the independent and the dependent variables belong to the same random process x, x[n] is called an autoregressive (AR) process. That is why LPC (linear predictive analysis) is also called AR analysis

Page 31

How do you minimize a VECTOR?

It is now possible to determine the estimates by minimising the mean squared error, i.e.

$$\text{Error} = E\{e^2(n)\} = E\Big\{\Big[s(n) - \sum_{j=1}^{p} a_j\,s(n-j)\Big]^2\Big\}$$

where E{.} is the expectation operator. Setting the partial derivatives of Error with respect to $a_j$ to zero for j = 1,2,...,p, we get

$$E\Big\{\Big[s(n) - \sum_{j=1}^{p} a_j\,s(n-j)\Big]\,s(n-i)\Big\} = 0, \quad i = 1,2,\ldots,p$$
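The step from the mean squared error to the condition above is one application of the chain rule. As a sketch, writing e(n) for the prediction error, $a_i$ for the i-th predictor coefficient, and assuming differentiation and expectation can be exchanged:

$$\frac{\partial}{\partial a_i} E\{e^2(n)\} = E\Big\{2\,e(n)\,\frac{\partial e(n)}{\partial a_i}\Big\} = -2\,E\{e(n)\,s(n-i)\} = 0, \qquad e(n) = s(n) - \sum_{j=1}^{p} a_j\,s(n-j)$$

Dividing by -2 and substituting e(n) gives one equation for each i = 1,2,...,p.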

Page 32

Solving the linear equation

$$\sum_{j=1}^{p} a_j\,\phi_n(i,j) = \phi_n(i,0), \quad i = 1,2,\ldots,p$$

$$\phi_n(i,j) = E\{s(n-i)\,s(n-j)\}$$

$$\phi_n(i,j) = E\{s(n-i)\,s(n-j)\}, \quad i = 1,2,\ldots,p, \quad j = 1,2,\ldots,p$$

Is this autocorrelation?

Page 33

Are we good at linear algebra?

That is, e(n) is orthogonal to s(n-i) for i = 1,2,...,p

A x = b

Page 34

Auto-Correlation Method

$$\phi_n(i,j) = \sum_{m=0}^{N-1+p} s_n(m-i)\,s_n(m-j), \quad 1 \le i \le p, \quad 0 \le j \le p$$

Page 35

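$$R_n(j) = \sum_{m=0}^{N-1-j} s_n(m)\,s_n(m+j)$$

$$\sum_{j=1}^{p} a_j\,R_n(|i-j|) = R_n(i), \quad 1 \le i \le p$$

$$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$$

Solution method: Levinson-Durbin recursion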
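The Toeplitz system above can be solved order by order. The following is a sketch of the textbook Levinson-Durbin recursion for the autocorrelation method, written in plain Python; the function names, the sign convention for the reflection coefficients, and the 8-sample test frame are assumptions rather than anything shown on the slides.

```python
def autocorr(s, p):
    """R(j) = sum_{m=0}^{N-1-j} s[m] * s[m+j] for j = 0..p."""
    n = len(s)
    return [sum(s[m] * s[m + j] for m in range(n - j)) for j in range(p + 1)]

def levinson_durbin(r, p):
    """Solve sum_{j=1..p} a_j R(|i-j|) = R(i), i = 1..p.

    Returns (a, ks, err): predictor coefficients a_1..a_p, the reflection
    coefficient of each order, and the final prediction error energy.
    """
    a = [0.0] * (p + 1)          # a[0] is unused padding
    ks = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        nxt = a[:]
        nxt[i] = k
        for j in range(1, i):
            nxt[j] = a[j] - k * a[i - j]
        a = nxt
        err *= 1.0 - k * k       # error energy shrinks at every order
        ks.append(k)
    return a[1:], ks, err

s = [1, 2, 3, 2, 1, 0, -1, -2]   # hypothetical 8-sample frame
r = autocorr(s, 2)               # [24, 18, 9]
a, ks, err = levinson_durbin(r, 2)
print(r, [round(v, 4) for v in a])
```

The recursion needs on the order of p squared operations instead of the p cubed of a general matrix inverse, which is why it is the usual choice for this system.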

Page 36

Page 37

Finally

• Now that we have the LPC coefficients a_i, we can represent speech compactly

• This further requires an efficient representation of the excitation (residual, error) signal. For example, calculating the optimum magnitudes of regularly spaced excitation pulses is the basis of the speech coder used in GSM (Global System for Mobile Communications)

Page 38

Why is LPC so popular? LPC in the frequency domain

Page 39

Why is LPC so popular? LPC in the time domain

Page 40

Innovations representation

[Block diagram: x[n] enters H(z) = A(z), which outputs e[n]; e[n] enters H^{-1}(z) = 1/A(z), which outputs x[n]]

IF you assume the residual signal (error) to be white noise!

The inverse system has many advantages:

1. In communications (the left and right systems are apart)

2. The system on the right does not need any input?

Page 41

Innovations representation

The innovations representation is basically an inverse system.

Why is it called innovations?

Assume that x, our discrete random signal, is speech.

e[n] is white noise

We do not need e[n], because the inverse filter itself represents the information. That is why the representation is called INNOVATIONS.
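The analysis filter A(z) and its inverse 1/A(z) can be sketched as a pair of difference equations. This is a minimal illustration assuming A(z) = 1 - sum_k a_k z^-k, zero initial conditions, and a hypothetical second-order predictor; none of the names below come from the talk.

```python
import random

def analysis_filter(x, a):
    """A(z): e[n] = x[n] - sum_k a[k-1] * x[n-k] (zero initial state)."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """1/A(z): x[n] = e[n] + sum_k a[k-1] * x[n-k] (zero initial state)."""
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k))
    return x

a = [1.2, -0.5]                      # hypothetical stable second-order predictor
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(50)]
e = analysis_filter(x, a)            # left-hand system
x_rebuilt = synthesis_filter(e, a)   # right-hand system
print(max(abs(u - v) for u, v in zip(x, x_rebuilt)))

# With no access to e[n], the right-hand system can still be driven by white noise.
noise = [random.gauss(0.0, 1.0) for _ in range(50)]
x_synthetic = synthesis_filter(noise, a)
```

When the true residual is available the round trip is exact up to rounding; when only the coefficients are sent, locally generated white noise shaped by 1/A(z) carries the same spectral envelope.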

Page 42

More on LPC! An expert view

If 1/A(z) is a causal and stable filter (does everybody see that?), then A(z) is minimum phase (it is causal and stable, with a causal and stable inverse)

Since A(z) is an FIR filter, it is always stable, and we know that it is causal. We also know that 1/A(z) is causal. BUT IS IT ALWAYS STABLE?

When the LPC coefficients a_k are found by solving the normal equations with a positive definite correlation matrix, the poles of 1/A(z) always lie within the unit circle, and therefore the filter is ALWAYS STABLE!
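Stability of 1/A(z) can be checked without finding the poles explicitly. The sketch below uses the step-down (backward Levinson) recursion, a standard test that is not shown on the slides: it recovers the reflection coefficients from the predictor coefficients, and all poles lie inside the unit circle exactly when every reflection coefficient has magnitude below one. It assumes A(z) = 1 - sum_k a_k z^-k; the function name and test coefficients are hypothetical.

```python
def is_stable(a):
    """True if every pole of 1/A(z), A(z) = 1 - sum_k a_k z^-k, is inside the unit circle."""
    cur = [float(v) for v in a]
    while cur:
        k = cur[-1]                      # reflection coefficient of the current order
        if abs(k) >= 1.0:
            return False
        i = len(cur)
        # Step down one order: a_m(i-1) = (a_m(i) + k * a_{i-m}(i)) / (1 - k^2)
        cur = [(cur[j] + k * cur[i - 2 - j]) / (1.0 - k * k) for j in range(i - 1)]
    return True

print(is_stable([1.2, -0.5]))   # complex poles with |z|^2 = 0.5
print(is_stable([2.5, -1.0]))   # poles at z = 2 and z = 0.5
```

Coefficients produced by the autocorrelation method pass this test by construction; the check becomes useful after coefficients have been quantized or interpolated.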

Page 43

How well does the theory work?

• Steve wore a bright red cashmere sweater (Male speaker)

• Before Thursday’s exam, review every formula (Female speaker)

Page 44

Samples

Male: “Steve wore a bright red cashmere sweater”

• 2.4 kb/s

• 1.0 kb/s

• 128 kb/s PCM

Female: “Before Thursday’s exam review every formula”

• 2.4 kb/s

• 1.0 kb/s

• 128 kb/s PCM
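The three rates above can be compared directly. A small sketch of the arithmetic, with a hypothetical helper name; the only inputs are the rates quoted on the slide.

```python
def compression_ratio(reference_kbps, coded_kbps):
    """How many times smaller the coded stream is than the reference."""
    return reference_kbps / coded_kbps

pcm_rate = 128.0                      # kb/s, reference PCM from the slide
for coded_rate in (2.4, 1.0):         # kb/s, coded samples from the slide
    ratio = compression_ratio(pcm_rate, coded_rate)
    print(f"{coded_rate} kb/s is about {ratio:.0f} times smaller than PCM")
```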

Page 45

Thank you for listening, any questions?