Speech User Interface
Pervasive Information Access
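Motivation
As devices get smaller, their input and output methods become correspondingly constrained.
– Input: physical keyboards are limited in size; virtual keyboards have the same problem and also lack tactile feedback.
– Output: screen size is limited (the largest-screen phone currently on the market is the Samsung Galaxy Note, at 5.3 inches).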
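Application Examples
Telephone voice systems (customer service hotlines)
Text input
In-car voice navigation
Voice search
Dialogue systems
Voice memos
Interfaces for the visually impaired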
Application Example: Voice Search
For example: Google Voice Search
Application Example: Text Input
Dragon Dictation
http://itunes.apple.com/us/app/dragon-dictation/id341446764?mt=8
Application Example: Dialogue Systems
Siri: a virtual personal assistant built on speech recognition, launched by Apple in October 2011 (official Apple video)
Application Example: Voice Memos
reQall
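Advantages of Speech Interfaces
Input speed: people can speak at up to about 100 words per minute (assuming good recognition accuracy)
The command set is practically unlimited in size
The rest of the body stays free: for example, while driving you can chat with a passenger and listen to music at the same time
Natural: speech is the primary means of communication between people (a product of evolution)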
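Limitations of Speech Interfaces
Speech recognition is still imperfect
– When the error rate exceeds 5%, detecting and correcting errors may take longer than typing on a keyboard
– Recognition accuracy is easily degraded by noise
Speech interfaces have no visible state
Speech interfaces are hard to learn
– How does the user know which commands to say?
– How does the user find out what the interface covers?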
Architecture of a Complete Spoken Dialogue System
Speech signal → Automatic Speech Recognition → words → Natural Language Understanding → logical form → Dialogue Management / Planning → logical form → Natural Language Generation → words → Text-to-Speech → speech signal
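To make the data flow between these components concrete, here is a minimal sketch of a single dialogue turn. Every stage is a toy stand-in (assumption): `recognize` treats the "signal" as UTF-8 text instead of audio, `understand` matches one hypothetical "weather in <city>" pattern, `FORECASTS` is made-up back-end data, and `synthesize` returns text bytes instead of audio. Only the types passed between stages follow the architecture above; the function names are illustrative and do not belong to any real toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalForm:
    intent: str
    slots: dict[str, str] = field(default_factory=dict)


def recognize(signal: bytes) -> list[str]:
    # Stand-in ASR: pretend the signal is UTF-8 text. A real recognizer decodes audio.
    return signal.decode("utf-8").lower().split()


def understand(words: list[str]) -> LogicalForm:
    # Toy NLU: "... weather in <city>" -> get_weather(city)
    if "weather" in words and "in" in words:
        i = words.index("in")
        if i + 1 < len(words):
            return LogicalForm("get_weather", {"city": words[i + 1]})
    return LogicalForm("unknown")


FORECASTS = {"taipei": "rainy", "hsinchu": "windy"}  # hypothetical back-end data


def manage(lf: LogicalForm) -> LogicalForm:
    # Toy dialogue manager: answer weather questions, otherwise ask to rephrase.
    if lf.intent == "get_weather":
        city = lf.slots["city"]
        return LogicalForm("inform_weather",
                           {"city": city, "forecast": FORECASTS.get(city, "unknown")})
    return LogicalForm("request_rephrase")


def generate(lf: LogicalForm) -> list[str]:
    # Template-based NLG: logical form -> words.
    if lf.intent == "inform_weather":
        return f"The weather in {lf.slots['city'].title()} is {lf.slots['forecast']}".split()
    return "Sorry, could you say that again".split()


def synthesize(words: list[str]) -> bytes:
    # Stand-in TTS: return text bytes; a real engine returns audio samples.
    return " ".join(words).encode("utf-8")


def dialogue_turn(signal: bytes) -> bytes:
    return synthesize(generate(manage(understand(recognize(signal)))))
```

For example, `dialogue_turn(b"What is the weather in Taipei")` returns `b"The weather in Taipei is rainy"`. Keeping each stage a separate function with typed inputs and outputs mirrors the modular design, so any stage can be swapped for a real component without touching the others.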
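Main Components
Speech recognition
– The computer must recognize (understand) the user's spoken input
Speech synthesis (text-to-speech, TTS)
– The computer must be able to convert text into speech to communicate with the user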
Types of Speech Recognition
Continuous vs. non-continuous (isolated-word) speech
Speaker-dependent vs. speaker-independent
Spontaneous vs. read speech
Keyword spotting vs. continuous recognition of spoken words
Small vs. large vocabulary
Speech Recognition Technology
Hidden Markov Model (HMM)
Reference paper: L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, 1989
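Rabiner's tutorial frames three basic HMM problems: evaluation, decoding, and training. The sketch below shows the evaluation problem, computing P(observations | model) with the forward algorithm; an isolated-word recognizer can score an utterance against one HMM per word and pick the highest-scoring model. Assumptions: observations are discrete symbols and the two-state model (`START`, `TRANS`, `EMIT`) uses made-up numbers, whereas real recognizers use acoustic feature vectors with continuous (e.g. Gaussian mixture) emission densities.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Forward algorithm: P(obs | model) for a discrete-observation HMM."""
    states = list(start_p)
    # alpha[s] = P(o_1 .. o_t, state at time t = s), initialised at t = 1
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over every predecessor state r, then emit the next symbol from s.
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())


# Hypothetical two-state model over the alphabet {"a", "b"}.
START = {"S1": 0.6, "S2": 0.4}
TRANS = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
EMIT = {"S1": {"a": 0.9, "b": 0.1}, "S2": {"a": 0.2, "b": 0.8}}
```

For a single observation, `forward(["a"], START, TRANS, EMIT)` is 0.6 × 0.9 + 0.4 × 0.2 = 0.62. Summing over states at each step keeps the cost at O(T·N²) instead of enumerating all Nᵀ state paths. Long utterances would need scaling or log probabilities to avoid underflow.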
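Evaluating Speech Recognition Systems
A speech recognition system's performance is evaluated by its word error rate (WER):
$$\mathrm{WER} = 100 \times \frac{S + I + D}{N}$$
where $S$, $I$, and $D$ are the numbers of substituted, inserted, and deleted words, and $N$ is the number of words in the reference transcript.
REF: I  WANT  TO   GO  HOME  ***
REC: *  WANT  TWO  GO  HOME  NOW
SC:  D  C     S    C   C     I
(C = correct, S = substitution, I = insertion, D = deletion)
$$\mathrm{WER} = 100 \times \frac{1 + 1 + 1}{5} = 60\%$$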
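The alignment above is a minimum edit distance computed over words. Here is a minimal sketch of that computation using standard Levenshtein dynamic programming. It returns only the total S + I + D, not the per-type breakdown, and the function name is illustrative rather than taken from a scoring tool such as NIST sclite.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER in percent: 100 * (S + I + D) / N, via word-level Levenshtein alignment."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return 100.0 * d[n][m] / n


print(word_error_rate("I WANT TO GO HOME".split(), "WANT TWO GO HOME NOW".split()))  # 60.0
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many more words than the reference contains.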
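Technical Challenges in Speech Recognition
How can recognition accuracy be improved?
How can noise interference be overcome?
How should filler words, pauses, and interjections be handled?
How can recognition be made faster?
– Speed is no longer a major problem on desktop or laptop computers, but there is still room for improvement on smartphones; the usual approach is to upload the speech to a server for further processing and recognition.
Word segmentation (silly vs. sill lea)
Homophones (mail vs. male)
Going from speech recognition to semantic understanding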
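Homophones such as "mail" and "male" sound identical, so the acoustic model cannot separate them. A common remedy is to let a language model score the competing word sequences. The sketch below uses an add-one smoothed bigram model. The counts in `BIGRAMS` are hypothetical, and the acoustic scores of the candidates are assumed equal. A real recognizer combines language-model and acoustic scores during decoding instead of rescoring finished strings.

```python
import math
from collections import Counter

# Hypothetical bigram counts, as if collected from a text corpus.
BIGRAMS = Counter({
    ("check", "mail"): 50,
    ("check", "male"): 1,
    ("male", "voice"): 20,
    ("mail", "voice"): 1,
})
# How often each word occurs as the first word of a bigram (its history count).
HISTORY = Counter()
for (w1, _), count in BIGRAMS.items():
    HISTORY[w1] += count
VOCAB = {w for pair in BIGRAMS for w in pair}


def log_prob(words: list[str]) -> float:
    """Add-one smoothed bigram log-probability (the first word is not scored)."""
    return sum(
        math.log((BIGRAMS[(w1, w2)] + 1) / (HISTORY[w1] + len(VOCAB)))
        for w1, w2 in zip(words, words[1:])
    )


def pick_transcript(candidates: list[str]) -> str:
    """Choose the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda s: log_prob(s.split()))


print(pick_transcript(["check male", "check mail"]))  # check mail
print(pick_transcript(["mail voice", "male voice"]))  # male voice
```

Smoothing keeps unseen bigrams from getting zero probability, which would otherwise rule out a correct transcript just because the training text never contained it.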
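Speech Synthesis
Also called text-to-speech (TTS). The input text is first analyzed (e.g., word segmentation for Chinese) to determine each word's pronunciation and tone, and the result is passed to a waveform synthesis unit that produces the speech. In general, waveform synthesis concatenates many prerecorded speech segments stored in a database. Systems differ in the size of the speech units they store: storing small units such as phones or diphones gives the widest range of possible output, while storing larger units (whole words or sentences) gives higher-quality output for specific domains but requires much more storage space.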
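Forward maximum matching is a classic baseline for the word-segmentation step: at each position it takes the longest dictionary word that matches and moves past it. The sketch below uses an unspaced English string as a stand-in for Chinese text (assumption), because it shows plainly how a greedy match can go wrong. The dictionary and `max_len` are hypothetical, and production segmenters usually rely on statistical models.

```python
def max_match(text: str, dictionary: set[str], max_len: int = 6) -> list[str]:
    """Forward maximum matching: greedily take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking to a single character.
        for j in range(min(len(text), i + max_len), i, -1):
            # A single character is accepted as a fallback even if unknown.
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


DICT = {"the", "theta", "table", "bled", "down", "own", "there"}
print(max_match("tabledown", DICT))          # ['table', 'down']
print(max_match("thetabledownthere", DICT))  # ['theta', 'bled', 'own', 'there']
```

The second result is a well-known failure case: the greedy choice of "theta" at the start forces a wrong segmentation of everything after it. Errors like this are why segmentation appears among the synthesis challenges.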
Demonstration (NTHU MIR Lab)
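Online Chinese TTS Demos
NTHU MIR Lab (National Tsing Hua University)
NTU CSIE (National Taiwan University)
GUTTS (National Taiwan University of Science and Technology)
ITRI Information and Communications Research Laboratories
iFLYTEK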
Online English TTS Demos
AT&T Natural Voices
Good evening, class. Today we are going to discuss an important type of human-computer interface: speech UI, also known as voice UI. We will demonstrate a TTS engine developed by AT&T, which, in my opinion, is the best TTS so far.
Speech Synthesis Technology
Raw text in → Text Analysis → Phonetic Analysis → Prosodic Analysis → Waveform synthesis → Speech out
Text Analysis: text normalization, part-of-speech tagging, homonym disambiguation
Phonetic Analysis: dictionary lookup, grapheme-to-phoneme conversion (letter-to-sound, LTS)
Prosodic Analysis: boundary placement, pitch accent assignment, duration computation
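The first two stages of this pipeline can be sketched in a few lines: normalize the raw text into words, look each word up in a pronunciation dictionary, and fall back to letter-to-sound rules for unknown words. Everything here is a toy assumption: digits are read one by one, the three-word `LEXICON` uses ARPAbet-style symbols, and the one-phone-per-letter `LTS` table is far cruder than real context-dependent LTS rules. Part-of-speech tagging, homonym disambiguation, and prosody are omitted.

```python
import re

# Toy digit expansion: each digit is read individually (as in phone numbers).
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical mini pronunciation lexicon (ARPAbet-style phones, 1 = primary stress).
LEXICON = {
    "call": ["K", "AO1", "L"],
    "nine": ["N", "AY1", "N"],
    "one": ["W", "AH1", "N"],
}

# Naive letter-to-sound fallback: one rule per letter ("x" maps to two phones).
LTS = dict(zip("abcdefghijklmnopqrstuvwxyz",
               ["AE", "B", "K", "D", "EH", "F", "G", "HH", "IH", "JH", "K", "L", "M",
                "N", "AA", "P", "K", "R", "S", "T", "AH", "V", "W", "K S", "Y", "Z"]))


def normalize(text: str) -> list[str]:
    """Text normalization: spell out digits, lowercase, keep alphabetic tokens only."""
    text = re.sub(r"\d", lambda m: f" {DIGITS[int(m.group())]} ", text)
    return re.findall(r"[a-z]+", text.lower())


def to_phones(text: str) -> list[str]:
    """Phonetic analysis: dictionary lookup first, letter-to-sound rules as fallback."""
    phones = []
    for word in normalize(text):
        if word in LEXICON:
            phones.extend(LEXICON[word])
        else:
            for letter in word:
                phones.extend(LTS[letter].split())
    return phones


print(to_phones("Call 911"))  # ['K', 'AO1', 'L', 'N', 'AY1', 'N', 'W', 'AH1', 'N', 'W', 'AH1', 'N']
```

Ordering the stages this way matters: "911" only reaches the lexicon after normalization has turned it into words, which is why text normalization sits at the front of the pipeline.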
Waveform Synthesis Methods
Concatenative synthesis: based on the concatenation (stringing together) of segments of recorded speech
Formant synthesis: speech is created using additive synthesis and an acoustic model, with parameters such as fundamental frequency, voicing, and noise levels varied over time
Articulatory synthesis: speech is synthesized from models of the human vocal tract
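At its simplest, concatenative synthesis means appending recorded sample buffers. An abrupt cut at the boundary can produce an audible click, so a common minimal smoothing step is a short linear crossfade across each join. The sketch below assumes mono samples stored as Python float lists and a hypothetical `overlap` length in samples. Production systems typically add pitch-synchronous methods such as PSOLA to match pitch and duration as well.

```python
def crossfade_concat(segments: list[list[float]], overlap: int) -> list[float]:
    """Concatenate sample lists, blending `overlap` samples at each join linearly."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming segment rises toward 1
            out[start + k] = (1 - w) * out[start + k] + w * seg[k]
        out.extend(seg[n:])  # the rest of the new segment is appended unchanged
    return out


joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined))  # 6: each join shortens the output by the overlap length
```

The blended region falls from 1.0 to 0.0 in steps (about 0.67, then 0.33) instead of jumping, which is the effect the crossfade is meant to produce.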
Waveform Synthesis: Concatenative Synthesis
All current commercial speech synthesis systems use concatenative synthesis, which can be further divided into three types:
Diphone synthesis
– Units are diphones, running from the middle of one phone to the middle of the next.
– Why? The middle of a phone is its steady state, so joins there are less audible.
– Record one speaker saying each diphone.
Unit selection synthesis
– Uses larger units (record 10 hours or more, so there are multiple copies of each unit).
– Uses search to find the best sequence of units.
Domain-specific synthesis: concatenates prerecorded words and phrases to create complete utterances.
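The sketch below illustrates two ideas from this list: naming the diphone units a phone sequence needs, and the search step in unit selection. The search is written as a Viterbi-style dynamic program that minimizes total target cost (how far a unit is from the requested prosody) plus join cost (how badly adjacent units fit). Assumptions: each unit is described by a single pitch value in Hz, both costs are absolute pitch differences, and `join_weight` is a hypothetical tuning knob. Real systems use many spectral and prosodic features.

```python
def to_diphones(phones: list[str]) -> list[str]:
    """Name the diphone units for a phone sequence, padding with silence at both ends."""
    padded = ["sil", *phones, "sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def select_units(targets: list[float], candidates: list[list[float]],
                 join_weight: float = 1.0) -> list[int]:
    """Viterbi search over candidate units; returns the chosen index per position.

    targets[i]    -- requested pitch (Hz) at position i (single hypothetical feature)
    candidates[i] -- pitch of each recorded copy available for position i
    """
    def target_cost(i: int, k: int) -> float:
        return abs(candidates[i][k] - targets[i])

    def join_cost(i: int, j: int, k: int) -> float:
        # Unit j at position i - 1 followed by unit k at position i.
        return join_weight * abs(candidates[i - 1][j] - candidates[i][k])

    best = [[target_cost(0, k) for k in range(len(candidates[0]))]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for k in range(len(candidates[i])):
            # Cheapest way to reach unit k from any unit at the previous position.
            j = min(range(len(candidates[i - 1])),
                    key=lambda j: best[i - 1][j] + join_cost(i, j, k))
            row.append(best[i - 1][j] + join_cost(i, j, k) + target_cost(i, k))
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final unit.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for i in range(len(targets) - 1, 0, -1):
        k = back[i][k]
        path.append(k)
    return path[::-1]


print(to_diphones(["HH", "AY"]))  # ['sil-HH', 'HH-AY', 'AY-sil']
```

Join cost changes the outcome. With `targets=[100, 200]` and `candidates=[[100, 190], [200]]`, a `join_weight` of 0 picks the 100 Hz copy (`[0, 0]`). A `join_weight` of 2.0 picks the 190 Hz copy (`[1, 0]`), because it sits closer to the following 200 Hz unit.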
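Technical Challenges in Speech Synthesis
How can words be segmented correctly? (Chinese natural language processing)
How can correct prosody be synthesized?
With concatenative synthesis, how can the joins between syllables be made smoother?
How can emotional expression be added to the voice?
How can a distinctive, highly recognizable voice be produced?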
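Spoken Dialogue Systems (speech conversational systems)
Siri: based on the Cognitive Assistant that Learns and Organizes (CALO) project, funded by DARPA (U.S. Department of Defense)
A speech-based personal virtual assistant
http://en.wikipedia.org/wiki/Siri_(software)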
Demo Video
A conversation with Siri on the iPhone 4S
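Key Technologies
Conversational interface: the speech recognition engine is provided by Nuance.
Personal context awareness: technologies from the CALO project.
Service delegation: information search and service provision, with multiple partner companies involved.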
Data and Service Search
OpenTable, Gayot, CitySearch, BooRah, Yelp, Yahoo Local, ReserveTravel, and Localeze for restaurant and business questions and actions
Eventful, StubHub, and LiveKick for event and concert information
MovieTickets, Rotten Tomatoes, and the New York Times for movie information and reviews
True Knowledge, Bing Answers, and Wolfram Alpha for factual question answering
Bing, Yahoo, and Google for web search
Chatterbot
For questions it cannot understand, Siri responds in the style of a dialogue generator such as ELIZA.
Siri meets ELIZA
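ELIZA's core technique is keyword pattern matching plus pronoun "reflection": find a pattern in the input, swap first and second person in the captured fragment, and insert it into a response template. The minimal sketch below assumes three hypothetical rules in the spirit of ELIZA's DOCTOR script. Siri's actual fallback logic is not public, and nothing here reproduces it.

```python
import re

# Hypothetical pattern -> response-template rules in the style of ELIZA.
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bbecause (.+)", re.I), "Is that the real reason?"),
]
# Swap first and second person so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def eliza_reply(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))
    return "Please tell me more."  # default when no keyword matches


print(eliza_reply("I need my phone."))  # Why do you need your phone?
```

The rules contain no model of meaning, so the replies can sound engaged while understanding nothing. That makes them a cheap last resort for input the assistant's real pipeline cannot handle.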
Speech Interfaces: Practical Problems
Major problems:
– Modes (no feedback)
• Certain commands only work in specific states
– Deep hierarchies (also known as "voice mail hell")
Verbose feedback wastes time and patience
– Only confirm consequential things
– Use meaningful, short cues
Interruption
– Half-duplex communication (i.e., no barge-in support)
Too much speaking on the customer's part is tiring
Speech takes up space in working memory
– This can cause problems during problem solving
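The guideline "only confirm consequential things; use meaningful, short cues" can be written as a small dialogue-manager policy. In this minimal sketch, the action names and the `CONSEQUENTIAL` set are hypothetical. Re-checking low-confidence input against `threshold` is an extra assumption beyond the slide, included because recognizers usually report a confidence score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interpretation:
    action: str        # hypothetical action name, e.g. "delete_message"
    confidence: float  # recognizer confidence in [0, 1]


# Hypothetical actions whose effects are costly or hard to undo.
CONSEQUENTIAL = {"delete_message", "transfer_money", "place_order"}


def confirmation_prompt(interp: Interpretation, threshold: float = 0.6) -> Optional[str]:
    """Return a short confirmation prompt, or None to act without confirming."""
    label = interp.action.replace("_", " ")
    if interp.action in CONSEQUENTIAL:
        return f"{label.capitalize()}? Say yes or no."  # always confirm consequential actions
    if interp.confidence < threshold:
        return f"Did you say {label}?"  # assumption: also re-check uncertain input
    return None  # harmless, confident request: just do it


print(confirmation_prompt(Interpretation("delete_message", 0.99)))  # Delete message? Say yes or no.
```

Returning `None` for routine, high-confidence requests saves the user a confirmation round-trip. This addresses the slide's point that verbose feedback wastes time and patience.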
Speech Interface Development Standards
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer.
Current version: VoiceXML 2.1
VoiceXML 3.0 (working draft)
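To show what a VoiceXML dialogue looks like, the sketch below uses Python's standard `xml.etree.ElementTree` to build a one-field VoiceXML 2.1 form that asks for a city and echoes the answer. It uses the standard `vxml`, `form`, `field`, `prompt`, `grammar`, `filled`, and `value` elements. The form id, the field name `city`, and the SRGS grammar file `city.grxml` are hypothetical, and the document is not validated against the VoiceXML schema.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_city_form(grammar_uri: str) -> str:
    """Build a one-field VoiceXML 2.1 form that asks for a city and echoes it back."""
    # Writing xmlns as an ordinary attribute puts every element in the VoiceXML namespace.
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "city_form"})
    field = ET.SubElement(form, "field", {"name": "city"})
    ET.SubElement(field, "prompt").text = "Which city?"
    # The grammar defines what the caller may say; grammar_uri is assumed to be an SRGS file.
    ET.SubElement(field, "grammar", {"src": grammar_uri, "type": "application/srgs+xml"})
    filled = ET.SubElement(field, "filled")  # executed once the field has been filled
    reply = ET.SubElement(filled, "prompt")
    reply.text = "You said "
    ET.SubElement(reply, "value", {"expr": "city"})
    return ET.tostring(vxml, encoding="unicode")


print(build_city_form("city.grxml"))
```

The grammar restricts what the recognizer listens for in this field. That keeps recognition accurate and tells the dialogue exactly when the field has been filled.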
Speech Interface Development Tools
Speech recognition: CMU Sphinx, an open-source toolkit for speech recognition: http://cmusphinx.sourceforge.net/
Speech synthesis: Festvox: http://festvox.org/index.html
Speech interfaces: Microsoft Speech API (SAPI 5.3), Java Speech API