The Development of Excellence of the Telecommunication Research Team in
Relation to International Cooperation - CZ.1.07/2.3.00/20.0217
A review on speech processing
and use of LPC
(Linear Predictive Coding)
Keynote Talk
Prof. H. Gökhan İlk
Telecommunication
Educational Seminar
Contact details
Address : Ankara University,
Faculty of Engineering
Electrical & Electronics Eng. Department
Gölbaşı, Ankara TURKEY
What is a speech signal?
• A speech signal represents the change of acoustic pressure with respect to time: s(t) in continuous time or s[n] in discrete time
• It lies in the audible band (audio band), between 20 Hz and 20 kHz
• Speech, for communication purposes, has a bandwidth of up to 3.4 kHz
What is the sampling frequency fs?
• In general, speech is sampled at 8000 Hz (narrowband speech)
• 11025 Hz (forensic quality)
• 16000 Hz (wideband speech; an ITU standard, just like 8 kHz)
• 44100 Hz for CD quality
• A word on quantization:
• Depth is usually 16 bits
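As a small worked example, the raw PCM bit rate follows directly from the sampling frequency and the bit depth (bit rate = fs × bits per sample × channels). The sketch below tabulates it for the rates listed above. The function name `pcm_bit_rate` and the choice of 16-bit mono at every rate are illustrative assumptions, not something stated in the talk.

```python
def pcm_bit_rate(fs_hz, bits_per_sample=16, channels=1):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels


# Sampling rates from the slide; 16-bit mono is assumed for all of them.
rates_hz = {
    "narrowband": 8000,
    "forensic": 11025,
    "wideband": 16000,
    "CD": 44100,
}

for name, fs in rates_hz.items():
    kbps = pcm_bit_rate(fs) / 1000
    print(f"{name:>10}: {fs:6d} Hz x 16 bit = {kbps:.1f} kb/s")
```

Narrowband speech at 8000 Hz and 16 bits per sample therefore needs 128 kb/s before any compression.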
A more complicated, but more informative waveform:
Spectrogram: time, frequency and magnitude, all at once
Q1: What is the sampling frequency?
Q2: What does the spectrum look like?
Confucius: “Learning without thought is labor lost”
Speech Production Mechanism
[Figure: speech production mechanism, starting from the lungs]
What does it look like in the time domain?
Short term correlation (i.e. sample-to-sample correlation)
Long term correlation
Redundancy:
• The speech signal possesses a lot of redundancy: short term, i.e. from sample to sample, and long term, i.e. from one pitch period (fundamental period) to the next
• It is an almost periodic waveform
What does it look like?
• Speech can generally be classified as voiced or unvoiced.
• The voiced part is a quasi-periodic (almost periodic) signal with higher energy and fewer zero crossings.
• The unvoiced part is a noise-like signal
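The two cues named above, energy and zero crossings, can be computed per frame in a few lines. The sketch below is only an illustration on synthetic data: a 100 Hz sinusoid stands in for a voiced frame and low-level uniform noise for an unvoiced one. The frame length (20 ms at 8000 Hz), the test signals and the function names are all assumptions made for this example.

```python
import math
import random


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


fs = 8000          # sampling frequency in Hz (narrowband speech)
n_samples = 160    # 20 ms frame, an assumed frame length

# Stand-in for a voiced frame: quasi-periodic, high energy.
voiced_like = [math.sin(2 * math.pi * 100 * n / fs) for n in range(n_samples)]

# Stand-in for an unvoiced frame: noise-like, low energy.
rng = random.Random(0)
unvoiced_like = [rng.uniform(-0.1, 0.1) for _ in range(n_samples)]

for name, frame in (("voiced-like", voiced_like), ("unvoiced-like", unvoiced_like)):
    print(name, "energy:", round(short_time_energy(frame), 3),
          "zero crossings:", zero_crossings(frame))
```

Real voiced/unvoiced decisions would compare these values against thresholds tuned on real speech; none are suggested here.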
Areas of speech processing:
• Speech processing has several applications:
• Speech recognition (What is said?)
• Speaker identification (Who says it?)
• Speech compression (Fewer samples)
• Speech coding (Save or send it digitally)
• Speech analysis (Medical or forensic applications)
• Speech synthesis (From parameters or from text to speech signal, TTS)
• Speech enhancement (Noisy to clean)
• Combination of more than one application
Speech recognition
• Continuous speech recognition: “I would like to buy a ticket from Ostrava to Prague for tomorrow”
• Isolated speech recognition: “One, Two, …”
• Word spotting: “… going out for a pub and need cash for this weekend …”
• Speech to text applications: for dictation purposes (transcribing medical reports, limited vocabulary for the disabled)
• Artificial intelligence
Speaker identification
• Speaker identification, closed set 1:N or open set 1:(N+1): identify “who” is speaking from a known set of people
• Speaker verification, 1:1: determine whether the speaker is the person of interest
• Text dependent / text independent: it does / does not matter what is being said
• Speaker transformation: Barack Obama speaking in Czech
Speech coding
• Waveform coders: try to preserve the waveform in the time domain (A-law, CELP)
• Vocoders (VOice CODERS): try to preserve the entropy (information) content of speech, usually in the spectral domain (MBEV)
• Transform coders: adaptive transform coders (ATC)
• Speech coding for VoIP: from circuit switching technology to packet switching technology
• Military vocoders, NATO and DoD standards: MELP (1.2 – 2.4 kbps)
Speech analysis
• Voicing analysis / pitch estimation, jitter and shimmer analysis: to identify the effects of surgery, also in forensic analysis
• Cepstral analysis / spectral analysis: a widely used “feature” in speech recognition
• Formant analysis: to study phonetics and vowel structures
• Time domain analysis: to obtain features in speech applications
Speech synthesis
• Text to speech synthesis: newspapers speaking, robots talking
• Speech coders: speech synthesis is required during decoding
• Artificial production of human sound: prosodic modification and emotional content
Current State of Speech Processing
[Block diagram. Blocks: input speech (in one language); preprocessing; windowing (stationary); feature selection / extraction; classifier design; classification result; postprocessing; speech recognized with speaker ID; text-to-text translation; TTS with prosody and emotion modification; output speech (in another language)]
Father of all parameters, mother of all features: LPC (Linear Predictive Coding)
A little bit of history first!
How slowly the human brain works: evolution from the simplest to the most advanced
How would you code these samples, speech[n]?
OF COURSE, PCM
Is it not easier to code the difference signal?
DPCM (Differential Pulse Code Modulation)
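To make the idea concrete, the sketch below codes sample-to-sample differences instead of the samples themselves and shows that, for a slowly varying signal, the differences span a much smaller range. It is a deliberately simplified, lossless illustration: a practical DPCM coder also quantizes the difference signal, which is left out here. The test signal (a 100 Hz sinusoid scaled to ±1000 at 8000 Hz) and the function names are assumptions made for this example.

```python
import math


def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one."""
    prev = 0
    diffs = []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out


fs = 8000
x = [round(1000 * math.sin(2 * math.pi * 100 * n / fs)) for n in range(160)]
d = dpcm_encode(x)

print("largest |sample|    :", max(abs(v) for v in x))
print("largest |difference|:", max(abs(v) for v in d))
print("lossless round trip :", dpcm_decode(d) == x)
```

A smaller range means fewer bits per value for the same quantization step, which is the motivation for coding the difference signal.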
A major problem!
Differences can be quite different
ADPCM (Adaptive Differential Pulse Code Modulation)
Evolving from PCM to ADPCM
The human brain works in a different, more greedy way!
Is it possible to adapt NOT only to one sample difference, but to many sample differences?
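LPC equations
$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k x[n-k]$$
$$\hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p]$$
$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k x[n-k], \quad a_0 = -1$$
A simple, linear model. Just addition and multiplication with constants: LPC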
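A short numerical check of the linear model: a pure sinusoid obeys cos(ωn) = 2cos(ω)·cos(ω(n−1)) − cos(ω(n−2)), so a second-order predictor with a_1 = 2cos(ω) and a_2 = −1 predicts it with (numerically) zero error. The sketch below assumes a 100 Hz tone sampled at 8000 Hz; the signal and the function name `prediction_error` are illustrative choices, not part of the talk.

```python
import math


def prediction_error(x, a):
    """e[n] = x[n] - sum_{k=1..p} a[k-1] * x[n-k], evaluated for n >= p."""
    p = len(a)
    return [
        x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1))
        for n in range(p, len(x))
    ]


omega = 2 * math.pi * 100 / 8000           # 100 Hz tone at fs = 8000 Hz (assumed)
x = [math.cos(omega * n) for n in range(200)]

a = [2 * math.cos(omega), -1.0]            # exact predictor for a sinusoid
e = prediction_error(x, a)

print("largest |e[n]|:", max(abs(v) for v in e))
```

Real speech is not a single sinusoid, so the error does not vanish; it becomes the residual (excitation) signal.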
Engineering point of view: LPC
[Block diagram: x[n] enters a linear prediction filter that outputs \hat{x}[n]; subtracting \hat{x}[n] from x[n] at the adder gives e[n]]
Building bridges
Anyone heard of Wiener filter theory, optimal filtering?
$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} w[k] x[n-k]$$
Convolution sum
$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} h[k] x[n-k]$$
The Wiener filter turns out to be an FIR filter with N coefficients
Error is the difference between our signal and the optimal estimate
Auto-regression
Now y[n] = x[n]:
$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=0}^{N-1} a_k x[n-k]$$
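Let's summarize!
$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N-1} a_k x[n-k]$$
$$\hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p]$$
$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k x[n-k], \quad a_0 = -1$$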
LPC Analysis Filter
$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k x[n-k], \quad a_0 = -1$$
Now we are using p previous samples
LPC Analysis, AR (Auto Regressive) Model
$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p] + e[n]$$
Consider the difference between optimum filter theory and regression analysis: since both the independent and the dependent variables belong to the same random process x, x[n] is called an autoregressive or AR process. That is why LPC (linear predictive analysis) is also called AR analysis
How do you minimize a VECTOR?
It is now possible to determine the estimates by minimising the mean squared error, i.e.
$$\mathrm{Error} = E\{e^2(n)\} = E\Big\{\Big[s(n) - \sum_{j=1}^{p} a_j s(n-j)\Big]^2\Big\}$$
where E{.} is the expectation operator
Setting the partial derivatives of Error with respect to a_j to zero for j = 1,2,...,p, we get
$$E\Big\{\Big[s(n) - \sum_{j=1}^{p} a_j s(n-j)\Big] s(n-i)\Big\} = 0, \quad i = 1,2,\dots,p$$
Solving the linear equation
$$\sum_{j=1}^{p} a_j \phi_n(i,j) = \phi_n(i,0), \quad i = 1,2,\dots,p$$
$$\phi_n(i,j) = E\{s(n-i)\, s(n-j)\}, \quad i = 1,2,\dots,p, \; j = 1,2,\dots,p$$
Is this autocorrelation?
Are we good at linear algebra?
That is, e(n) is orthogonal to s(n-i) for i = 1,2,...,p
A x = b
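The normal equations are an ordinary linear system A x = b, so any general solver can be used. The sketch below solves a tiny p = 2 instance by Gaussian elimination. The correlation values are an assumption chosen for illustration: they follow ρ^|i−j| with ρ = 0.5, the correlation pattern of a first-order AR process, for which the second coefficient should come out as zero. The function name `solve_linear` is also hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


rho = 0.5   # assumed correlation between neighbouring samples
p = 2       # prediction order

# A holds the correlations for i, j = 1..p; b holds the correlations for j = 0.
A = [[rho ** abs(i - j) for j in range(1, p + 1)] for i in range(1, p + 1)]
b = [rho ** i for i in range(1, p + 1)]

a = solve_linear(A, b)
print("predictor coefficients:", a)
```

For this input the solver returns a_1 = 0.5 and a_2 = 0: the second past sample adds nothing once the first one is used.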
Auto-Correlation Method
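$$\phi_n(i,j) = \sum_{m=0}^{N+p-1} s_n(m-i)\, s_n(m-j), \quad 1 \le i \le p, \; 0 \le j \le p$$
$$R_n(j) = \sum_{m=0}^{N-1-j} s_n(m)\, s_n(m+j)$$
$$\sum_{j=1}^{p} a_j R_n(|i-j|) = R_n(i), \quad 1 \le i \le p$$
$$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$$
Levinson-Durbin recursion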
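A minimal sketch of the autocorrelation method follows: the frame autocorrelation R(j) is computed first, then the Toeplitz system Σ a_j R(|i−j|) = R(i) is solved with the Levinson-Durbin recursion in its common textbook form. The function names and the example values are assumptions for illustration; windowing of the frame is omitted.

```python
def autocorrelation(frame, max_lag):
    """R(j) = sum_{m=0}^{N-1-j} s(m) s(m+j) for j = 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[m] * frame[m + j] for m in range(n - j))
        for j in range(max_lag + 1)
    ]


def levinson_durbin(r, p):
    """Solve sum_j a_j R(|i-j|) = R(i), i = 1..p.

    Returns (a, err): a = [a_1, ..., a_p] and the final prediction error energy.
    """
    a = [0.0] * (p + 1)   # a[0] is unused, a[k] holds a_k
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient k_i from the order-(i-1) solution.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Autocorrelation of a tiny frame (values chosen only to be easy to check by hand).
print(autocorrelation([1.0, 2.0, 3.0], 2))

# Assumed autocorrelation values R(j) = 0.5**j, order p = 2.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, err)
```

The recursion needs on the order of p² operations instead of the p³ of a general solver, and each step yields a reflection coefficient as a by-product.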
Finally
• Now that we have the LPC coefficients a_i, we can represent speech compactly
• This further requires an efficient representation of the excitation (residual, error) signal. For example, the optimum magnitude calculation of regularly spaced pulses for the excitation is the basis of speech coding in GSM (Global System for Mobile Communications)
Why is LPC so popular? LPC in the frequency domain
Why is LPC so popular? LPC in the time domain
Innovations representation
H(z) = A(z), H^{-1}(z) = 1/A(z)
[Block diagram: x[n] → A(z) → e[n] → 1/A(z) → x[n]]
IF you assume the residual (error) signal to be white noise!
The inverse system has many advantages.
1. In communications (the left and right systems are apart)
2. The system on the right does not need any input???
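The analysis filter A(z) and its inverse 1/A(z) can be checked numerically: filtering x[n] with A(z) gives the residual e[n], and feeding e[n] into 1/A(z) returns x[n]. The sketch below assumes zero initial conditions, an arbitrary test signal and hand-picked coefficients a = [1.3, −0.6] (poles of 1/A(z) inside the unit circle); none of these come from the talk.

```python
import math


def analysis_filter(x, a):
    """A(z): e[n] = x[n] - sum_k a[k-1] x[n-k], with x[m] = 0 for m < 0."""
    p = len(a)
    return [
        x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


def synthesis_filter(e, a):
    """1/A(z): x[n] = e[n] + sum_k a[k-1] x[n-k], with x[m] = 0 for m < 0."""
    p = len(a)
    x = []
    for n in range(len(e)):
        past = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        x.append(e[n] + past)
    return x


a = [1.3, -0.6]   # assumed coefficients
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(100)]

e = analysis_filter(x, a)        # transmitter side: remove the predictable part
x_rec = synthesis_filter(e, a)   # receiver side: rebuild the signal

print("largest reconstruction error:",
      max(abs(u - v) for u, v in zip(x, x_rec)))
```

In a coder the receiver does not get e[n] exactly; it gets the coefficients plus a compact description of the excitation, which is where the white-noise assumption comes in.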
Innovations representation
The innovations representation is basically an inverse system.
Why is it called innovations?
Assume that x, our discrete random signal, is speech.
e[n] is white noise
We do not need e[n], because the inverse filter itself represents the information. That is why the representation is called INNOVATIONS.
More on LPC! An expert view
1/A(z) is a causal filter (does everybody see that???). If it is also stable, then A(z) is minimum phase (it is causal and stable, with a causal and stable inverse)
Since A(z) is an FIR filter, it is always stable and we know that it is causal. We also know that 1/A(z) is causal. BUT IS IT ALWAYS STABLE???
When the LPC coefficients a_k are found by solving the normal equations with a positive definite correlation matrix, the poles of 1/A(z) always lie within the unit circle, and therefore it is ALWAYS STABLE!
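One way to test a given coefficient set for stability is the step-down recursion, which converts a_1..a_p back to reflection coefficients k_1..k_p: with A(z) = 1 − Σ a_k z^{-k}, the filter 1/A(z) has all its poles inside the unit circle exactly when every |k_i| < 1. The sketch below is a minimal version of that test; the function names and the example coefficient sets are assumptions chosen for illustration.

```python
def reflection_coefficients(a):
    """Step-down recursion: [a_1..a_p] -> [k_1..k_p].

    Returns None as soon as some |k_i| >= 1, i.e. when 1/A(z) is not stable.
    """
    cur = list(a)
    ks = []
    while cur:
        i = len(cur)
        k = cur[-1]                 # k_i is the last coefficient of order i
        if abs(k) >= 1.0:
            return None
        ks.append(k)
        # Order i -> order i-1: a_j <- (a_j + k_i * a_{i-j}) / (1 - k_i^2)
        cur = [(cur[j] + k * cur[i - 2 - j]) / (1.0 - k * k) for j in range(i - 1)]
    return ks[::-1]


def is_stable(a):
    """True if all poles of 1/A(z) lie strictly inside the unit circle."""
    return reflection_coefficients(a) is not None


print(is_stable([1.3, -0.6]))   # poles at 0.65 +/- 0.42j, magnitude about 0.77
print(is_stable([2.0, -0.5]))   # poles at 1 +/- sqrt(0.5), one outside the circle
```

Coefficients obtained from the autocorrelation method pass this test by construction; the check matters mainly after quantization or interpolation of the coefficients.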
How well does the theory work?
• Steve wore a bright red cashmere sweater (Male speaker)
• Before Thursday’s exam, review every formula (Female speaker)
Samples
Male, “Steve wore a bright red cashmere sweater”: 2.4 kb/s, 1.0 kb/s, 128 kb/s PCM
Female, “Before Thursday’s exam review every formula”: 2.4 kb/s, 1.0 kb/s, 128 kb/s PCM
Thank you for listening,
any questions?