Thesis exam 2012 - wideband speech reconstruction

SIMULATION OF RECONSTRUCTION WIDEBAND SPEECH SIGNAL USING SPECTRAL SHIFTING
Fitrie Ratnasari (111080134)
1st Advisor: Dr. Ir. Bambang Hidayat, DEA
2nd Advisor: Inung Wijayanto S.T, M.T

Upload: fitrie-ratnasari

Posted on 06-May-2015





DESCRIPTION

In most communication systems, speech is transmitted in narrowband, containing frequencies from 300 Hz to 3400 Hz. Compared with normal speech, which generally contains a perceptually significant amount of energy up to 8 kHz, narrowband speech has a muffled quality and reduced intelligibility, particularly noticeable in sounds such as /s/ and /f/. For this reason, speech bandlimited to 8 kHz is often coded, but this requires an increase in the bit rate. Wideband reconstruction is a scheme that adds a synthesized highband signal to narrowband speech to produce a higher-quality wideband speech signal. The synthesized highband signal is based entirely on information contained in the narrowband speech, so from a coding perspective it comes at zero increase in bit rate. Wideband reconstruction can function as a post-processor for any narrowband telephone receiver, or it can be combined with any narrowband speech coder to produce a very-low-bit-rate wideband speech coder. Applications include higher-quality mobile telephony, teleconferencing, and internet telephony. This final project simulates a bandwidth extension system that uses the spectral shifting method for highband excitation, and a codebook with linear mapping to estimate the highband envelope. The wideband expansion algorithm proved to work, though certain unwanted artefacts were introduced in the reconstructed signal; listening tests confirmed the presence of these artefacts. Objective and subjective tests show that wideband speech synthesized with these techniques was preferred by 50% of respondents, with an SNR of 5.13 dB. The optimum parameters for this system are Euclidean distance with K = 1 for KNN classification and correlation distance with 256 clusters for k-means clustering. The computation time is 0.144 s for spectral shifting and 0.138 s for spectral folding, while the codebook needs 164.2 s.
Subjective measurement using DMOS gives about 3.65 for spectral shifting and 2 for spectral folding. However, further research and improvement are still needed before this system reaches the quality required for implementation.
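The description compares spectral shifting and spectral folding as ways to regenerate the missing highband excitation. The sketch below contrasts the two on a narrowband residual in Python with NumPy and SciPy; it is an illustration under stated assumptions, not the MATLAB code of the thesis. The 8 kHz narrowband rate, the 3 kHz default shift, and the helper `pitch_adaptive_shift` (which rounds the shift to a multiple of the pitch, as the pitch detector and cosine generator in the spectral shifting block diagram suggest) are assumptions, and the final high-pass filter of the system model is left out.

```python
import numpy as np
from scipy.signal import resample_poly

FS_NB = 8000   # narrowband sampling rate in Hz (assumed)
FS_WB = 16000  # wideband sampling rate in Hz (the simulation input rate)


def spectral_folding(residual_nb):
    """Upsample 8 kHz -> 16 kHz by inserting zeros between samples.

    No interpolation filter is applied, so the 0-4 kHz spectrum is mirrored
    into 4-8 kHz: a component at f Hz reappears at FS_NB - f Hz.
    """
    folded = np.zeros(2 * len(residual_nb))
    folded[::2] = residual_nb
    return folded


def spectral_shifting(residual_nb, f_shift):
    """Interpolate to 16 kHz, then modulate with a cosine at f_shift Hz.

    2*cos(2*pi*f_shift*t) copies a component at f Hz to f + f_shift and
    |f - f_shift|; a later high-pass filter (not shown) is assumed to keep
    only the copies above the narrowband edge.
    """
    upsampled = resample_poly(residual_nb, 2, 1)  # low-pass interpolation
    n = np.arange(len(upsampled))
    return 2.0 * upsampled * np.cos(2.0 * np.pi * f_shift * n / FS_WB)


def pitch_adaptive_shift(f0, f_target=3000.0):
    """Hypothetical helper: round the shift to a multiple of the pitch f0 so
    that shifted harmonics stay on the harmonic grid."""
    return max(1, round(f_target / f0)) * f0
```

For a 1 kHz tone sampled at 8 kHz, folding produces components at 1 kHz and 7 kHz, while shifting by 3 kHz produces components at 2 kHz and 4 kHz. Folding always mirrors around 4 kHz, whereas shifting lets the pitch estimate decide where the copied harmonics land.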
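The description names the best codebook settings: correlation distance with 256 clusters for k-means, and Euclidean distance with K = 1 for KNN classification. A minimal NumPy sketch of that lookup follows, assuming the narrowband and wideband envelopes are already available as paired feature vectors (the description does not say which features were used). It omits the linear mapping stage, uses a farthest-point initialization of our own choosing, and all function names are hypothetical. Correlation distance is obtained by centering and normalizing each vector, which is also how MATLAB's `kmeans` treats its `'correlation'` distance option.

```python
import numpy as np


def center_normalize(v):
    """Subtract each row's mean and scale it to unit norm. For such rows,
    squared Euclidean distance = 2 * (1 - Pearson correlation)."""
    v = v - v.mean(axis=1, keepdims=True)
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def kmeans_correlation(x, n_clusters, n_iter=50, seed=0):
    """k-means clustering with correlation distance (1 - r); returns labels."""
    z = center_normalize(x)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: each new centroid is the point least
    # correlated with the centroids already chosen.
    idx = [int(rng.integers(len(z)))]
    for _ in range(1, n_clusters):
        idx.append(int(np.argmin((z @ z[idx].T).max(axis=1))))
    centroids = z[idx]
    for _ in range(n_iter):
        # Highest correlation = smallest correlation distance.
        labels = np.argmax(z @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = z[labels == k]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
        centroids = center_normalize(centroids)
    return labels


def build_codebook(nb_feats, wb_feats, n_clusters=256, seed=0):
    """Cluster narrowband vectors; store the mean wideband vector per cluster."""
    labels = kmeans_correlation(nb_feats, n_clusters, seed=seed)
    wb_codebook = np.zeros((n_clusters, wb_feats.shape[1]))
    for k in range(n_clusters):
        if np.any(labels == k):
            wb_codebook[k] = wb_feats[labels == k].mean(axis=0)
    return labels, wb_codebook


def estimate_wideband(nb_query, nb_feats, labels, wb_codebook):
    """KNN classification with K = 1 and Euclidean distance, then lookup."""
    nearest = np.argmin(np.linalg.norm(nb_feats - nb_query, axis=1))
    return wb_codebook[labels[nearest]]
```

Training (`build_codebook`) runs once offline, which matches the long codebook time reported compared with the per-signal processing time; `estimate_wideband` is the cheap per-frame step.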

TRANSCRIPT

1. SIMULATION OF RECONSTRUCTION WIDEBAND SPEECH SIGNAL USING SPECTRAL SHIFTING
Fitrie Ratnasari (111080134)
1st Advisor: Dr. Ir. Bambang Hidayat, DEA
2nd Advisor: Inung Wijayanto S.T, M.T

2. Outline
Background, Purpose, Problem Formulation, Problem Limitations, Theory, System Model, Simulation Results, Conclusion, Suggestions

3. BACKGROUND
- Speech transmitted in narrowband sounds muffled and thin, like communication from far away.
- Wideband speech coding requires a higher bit rate and more bandwidth, and is expensive.
- The many codecs used across telecommunication technologies need a bridge to equalize speech quality.

4. PURPOSE
- To simulate the reconstruction of wideband speech signals using the spectral shifting and spectral folding methods
- To estimate the wideband speech envelope using a codebook algorithm
- To analyze the performance of the simulated system using objective measurements, subjective measurements, and computation time

5. PROBLEM FORMULATION
- How to simulate the reconstruction of wideband speech signals using the spectral shifting method in MATLAB R2009a
- How to estimate the wideband speech envelope using a codebook algorithm
- How to estimate the high-frequency residual error of the wideband speech signal
- How to perform linear predictive analysis and synthesis in the system

6. PROBLEM LIMITATIONS
- The system focuses on the spectral shifting method for reconstructing wideband speech signals.
- The system does not include any communication channel or transmission.
- Input speech for the simulation is a clean speech signal in *.wav format with a sampling frequency of 16 kHz.
- The system is not real-time.

7. PROBLEM LIMITATIONS (cont'd)
- The speech training data are in Bahasa, without any regional dialect.
- The simulation is evaluated only with listeners who have normal hearing.
- The performance parameters analyzed are SNR, cross-correlation, DMOS (Degradation Mean Opinion Score), and computation time.

8. THEORY: Wideband vs Narrowband
- Narrowband: 200-3400 Hz; wideband: 50-7000 Hz.
- The high-frequency extension from 3400 to 7000 Hz provides better differentiation of fricatives, and therefore higher intelligibility.
- [Figure: wideband speech vs narrowband speech]
- Wideband speech improves intelligibility and naturalness, gives a feeling of transparent communication, and facilitates speaker recognition.

9. Spectrogram
[Figure: spectrograms of wideband speech and narrowband speech]

10. Wideband Reconstruction
- Estimating the wideband envelope: this system uses a codebook algorithm.
- Estimating the missing high frequencies: this system uses spectral shifting and spectral folding.

11. Codebook Algorithm
[Figure: codebook algorithm]

12. Spectral Shifting and Spectral Folding Method
Spectral shifting and spectral folding are methods for estimating the missing high-frequency band.
[Figure: spectral shifting method (LPF, pitch detector, cosine generator, multiplier ×); spectral folding method]

13. System Model
[Flowchart: Start → literature study → wideband speech recording → input data → telephony filter → band-limited (narrowband) speech → LP analysis → narrowband LPC coefficients and narrowband residual error → envelope estimation (estimated wideband LPC coefficients) and high-frequency regeneration (estimated wideband residual error) → LP synthesis → HPF → adding → reconstructed wideband speech]

14. Simulation Result (Objective Measurement)
[Chart: testing result of reconstruction simulation; SNR (dB, axis about 5.125-5.155) and cross-correlation (axis about 0.4455-0.45) for spectral folding and spectral shifting]

15. Simulation Result (Subjective Measurement)
Testing result of simulation using A/B preference test:
- 1st test: (A) narrowband signal vs (B) original wideband. Preference: B; proportion: 95%
- 2nd test: (A) narrowband signal vs (B) wideband reconstructed using spectral folding. Preference: A; proportion: 100%
- 3rd test: (A) narrowband signal vs (B) wideband reconstructed using spectral shifting. Preference: A / B; proportion: 50%
- 4th test: (A) wideband reconstructed using spectral folding vs (B) wideband reconstructed using spectral shifting. Preference: B; proportion: 100%

16. Simulation Result
[Chart: testing result of simulation using DMOS; highest, lowest, and mean DMOS for narrowband, spectral shifting, and spectral folding (axis 0-6); labeled values include 4.05, 3.65, and 2]

17. Conclusion
  1. Simulation of wideband speech signal reconstruction using spectral shifting and spectral folding proved to work, though certain unwanted artefacts were introduced in the reconstructed speech signal.
  2. The best parameters for this system are Euclidean distance with K = 1 for KNN classification and correlation distance with 256 clusters for k-means clustering.
  3. The maximum mean SNR of this system is about 5.3 dB.
  4. The system needs about 0.176 seconds of computation time, plus 164 seconds for the codebook process.

18. Conclusion (cont'd)
  5. Wideband speech reconstructed with the spectral shifting method performs better than with the spectral folding method: mean DMOS is about 3.65 for spectral shifting and 2 for spectral folding.
  6. Respondents chose the spectral shifting output and the narrowband signal in the same proportion, which means the system still needs further research to reach the quality required for implementation.

19. Suggestions
  1. The system might be implemented and analyzed in other programming languages, such as Java or C.
  2. The system might use another method for estimating the missing high frequencies, such as statistical methods (GMM, HMM, etc.).
  3. Wideband energy estimation might be added to the system to achieve higher quality.
  4. The system can be applied to regional dialects.
  5. The wideband speech reconstruction system can be implemented as a real-time process.

20. Thank you
Danke / Bedankt / Merci / Maturnuwun
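The System Model slide (slide 13) splits the band-limited signal into LPC coefficients and a residual error (LP analysis), and later recombines an estimated wideband envelope with an estimated excitation (LP synthesis). Below is a minimal Python sketch of those two steps, assuming the autocorrelation method with the Levinson-Durbin recursion. The slides do not state which LPC method or order was used, so `order` is a free parameter here, and the whole signal is processed at once instead of in short frames; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter


def lpc(x, order):
    """Return A(z) coefficients [1, a1, ..., ap] and the final prediction
    error, using the autocorrelation method and Levinson-Durbin."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a1..a(i-1) from their previous values, then set ai.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_analysis(x, order):
    """Inverse-filter x with A(z) to obtain the residual error."""
    a, _ = lpc(x, order)
    return a, lfilter(a, [1.0], x)


def lp_synthesis(a, residual):
    """Drive the all-pole filter 1/A(z) with an excitation signal."""
    return lfilter([1.0], a, residual)
```

In a frame-based system, `lp_analysis` would run on windowed frames of the narrowband signal, and `lp_synthesis` would be fed the estimated wideband coefficients together with the extended residual from spectral shifting or folding.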
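The objective results report SNR and cross-correlation without giving formulas. The usual definitions are sketched below, assuming the original wideband signal and the reconstructed signal are already time-aligned and of equal length. `snr_db` and `normalized_cross_correlation` are hypothetical names, and the thesis may have computed these measures differently (for example per frame, or at the lag of maximum correlation).

```python
import numpy as np


def snr_db(reference, reconstructed):
    """SNR in dB: energy of the reference over energy of the difference."""
    noise = reference - reconstructed
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def normalized_cross_correlation(reference, reconstructed):
    """Zero-lag normalized cross-correlation (Pearson coefficient)."""
    return float(np.corrcoef(reference, reconstructed)[0, 1])
```

Scaling a signal by 0.9 leaves an error of 0.1 times the signal, so the SNR is 10 log10(1 / 0.01) = 20 dB while the correlation stays at 1; the two measures capture different kinds of distortion, which is why they are reported side by side.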