12.eca
TRANSCRIPT
8/12/2019 12.ECA
ECA-based Control Interface on Android for Home Automation System
Modules:
Voice Activity Detector
Automatic Speech Recognition
Conversational Engine
Control Interface
Text-To-Speech
Virtual Head Animation
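The data flow between these modules over a single dialog turn can be sketched as follows. This is a minimal Python sketch, not the platform's Android implementation. The function name `run_turn`, the `CEOutput` type and the assumption that the CE passes its actions to the CI are illustrative choices. Each module is passed in as a plain callable so the wiring can be shown without a real speech library.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class CEOutput:
    """Hypothetical bundle of everything the CE produces in one turn."""
    response_text: str                                # text for the TTS
    actions: List[str] = field(default_factory=list)  # commands for the CI
    mood: str = "neutral"                             # mood for the VHA

def run_turn(frames: List[float],
             vad: Callable, asr: Callable, ce: Callable,
             ci: Callable, tts: Callable, vha: Callable) -> Tuple:
    """Run one dialog turn through the module chain; each module is a callable."""
    speech = vad(frames)                      # VAD: keep voice, drop noise
    text = asr(speech)                        # ASR: speech -> text
    out = ce(text)                            # CE: meaning, dialog flow, response
    commands = [ci(a) for a in out.actions]   # CI: actions -> target format
    voice, phonemes = tts(out.response_text)  # TTS: voice + phoneme durations
    animation = vha(out.mood, phonemes)       # VHA: visemes + expression
    return commands, voice, animation

# Usage with trivial stand-ins for the real modules.
commands, voice, animation = run_turn(
    [0.0, 0.4, 0.5],
    vad=lambda fr: [x for x in fr if x > 0.1],
    asr=lambda speech: "turn on the light",
    ce=lambda text: CEOutput("Turning on the light.", ["light:on"], "happy"),
    ci=lambda action: action.upper(),
    tts=lambda text: (b"<pcm>", [("t", 60), ("@", 80)]),
    vha=lambda mood, ph: [(mood, p) for p, _ in ph],
)
print(commands)   # ['LIGHT:ON']
print(animation)  # [('happy', 't'), ('happy', '@')]
```

Passing modules as callables keeps each stage replaceable, which mirrors how the CI, for example, must be swapped for each new target application.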
Voice Activity Detector
The role of the Voice Activity Detector (VAD) is to discriminate the frames containing the user's voice from those containing only noise. This module reads the digitized audio samples acquired from a microphone and sends the filtered raw audio to the ASR. The VAD module is implemented on top of the SphinxBase library, which was modified to work with the Open native audio libraries present on Android.
[Figure: Microphone -> audio libraries -> audio -> VAD -> filtered raw audio -> ASR]
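The frame-level decision the VAD makes can be illustrated with a simple short-time energy detector. This is a hedged sketch, not SphinxBase's actual algorithm. It assumes that the first few frames contain only background noise, uses their mean energy as the noise floor, and marks a frame as voice when its energy exceeds that floor by a fixed margin. The names `energy_vad`, `filter_voice`, `margin_db` and `noise_frames` are hypothetical.

```python
import math
from typing import List, Sequence

def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB, floored to avoid log10(0)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def energy_vad(frames: List[Sequence[float]], margin_db: float = 10.0,
               noise_frames: int = 5) -> List[bool]:
    """Return True for frames judged to contain voice, False for noise.

    Assumes the first `noise_frames` frames are background noise only.
    """
    if not frames:
        return []
    energies = [frame_energy_db(f) for f in frames]
    n = min(noise_frames, len(energies))
    noise_floor = sum(energies[:n]) / n
    return [e > noise_floor + margin_db for e in energies]

def filter_voice(frames, flags):
    """Keep only the voice frames: the filtered raw audio passed to the ASR."""
    return [f for f, is_voice in zip(frames, flags) if is_voice]

quiet = [0.001] * 160   # 160 samples = 10 ms at 16 kHz
loud = [0.5] * 160
frames = [quiet] * 5 + [loud] * 2 + [quiet]
flags = energy_vad(frames)
print(flags)  # [False, False, False, False, False, True, True, False]
```

A fixed noise floor is the simplest choice; a detector running continuously on a phone would more likely keep updating the floor during non-speech frames.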
Automatic Speech Recognition
The Automatic Speech Recognition (ASR) module performs speech-to-text conversion. It takes as input the utterance containing the user's speech that comes from the VAD and sends the resulting text to the CE. In the proposed platform, the ASR module is based on the PocketSphinx speech recognition library.
[Figure: VAD -> audio -> ASR (speech recognition library) -> text -> CE]
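Between the VAD and the ASR, per-frame voice decisions have to be grouped into complete utterances before decoding. This sketch assumes a simple hangover rule: a pause of up to `hangover` non-speech frames does not end an utterance, and segments spanning fewer than `min_frames` frames are discarded as clicks. The PocketSphinx decoding step itself is not shown. The function `segment_utterances` and both parameters are hypothetical.

```python
from typing import List, Optional, Tuple

def segment_utterances(flags: List[bool], hangover: int = 3,
                       min_frames: int = 2) -> List[Tuple[int, int]]:
    """Group per-frame VAD flags into utterances as (start, end) frame ranges.

    `end` is exclusive and points just past the last speech frame.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech frame of the open utterance
    last_speech = -1             # most recent speech frame seen
    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech > hangover:
            # Silence outlasted the hangover: close the utterance.
            if last_speech + 1 - start >= min_frames:
                segments.append((start, last_speech + 1))
            start = None
    # Close an utterance still open when the input ends.
    if start is not None and last_speech + 1 - start >= min_frames:
        segments.append((start, last_speech + 1))
    return segments

T, F = True, False
flags = [F, T, T, F, F, T, T, F, F, F, F, T, F, F, F, F]
print(segment_utterances(flags))  # [(1, 7)]
```

In the example the two-frame pause at frames 3 and 4 stays inside the first utterance, while the isolated voice frame at index 11 is dropped as too short.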
Conversational Engine
The Conversational Engine (CE) extracts the meaning of the utterance, manages the dialog flow and produces the actions appropriate for the target domain. It generates a response based on the input, the current state of the conversation and the dialog history. Support was also added for an object-oriented database, which can reduce dynamic memory usage at the expense of a longer response time.
[Figure: ASR -> CE -> CI, with the CE also producing the speech response]
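A conversational engine of this kind can be illustrated with a toy rule-based version for home automation. In this sketch the device and room vocabularies, the keyword rules, `respond` and the `DialogState` type are all hypothetical. It shows how the response depends on the input and on the current dialog state (a command still waiting for a room), and how the CE emits actions and a mood alongside its reply. The state also keeps the dialog history, although these toy rules only consult the pending command.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DEVICES = ("light", "heater", "blinds")        # hypothetical vocabulary
ROOMS = ("kitchen", "bedroom", "living room")

Action = Tuple[str, str, str]                  # (device, command, room)

@dataclass
class DialogState:
    pending: Optional[Tuple[str, str]] = None  # (command, device) awaiting a room
    history: List[str] = field(default_factory=list)  # all user turns so far

def find(words: str, vocabulary) -> Optional[str]:
    """Return the first vocabulary entry mentioned in the utterance, if any."""
    return next((v for v in vocabulary if v in words), None)

def respond(text: str, state: DialogState) -> Tuple[str, List[Action], str]:
    """Return (response text, actions, mood) and update the dialog state."""
    words = text.lower()
    state.history.append(words)
    # Meaning extraction: crude keyword spotting.
    command = "on" if "turn on" in words else "off" if "turn off" in words else None
    device = find(words, DEVICES)
    room = find(words, ROOMS)

    # Dialog flow: a bare room name completes a pending command.
    if command is None and room and state.pending:
        command, device = state.pending
    if command and device and room:
        state.pending = None
        return (f"Turning {command} the {device} in the {room}.",
                [(device, command, room)], "happy")
    if command and device:
        state.pending = (command, device)
        return f"In which room should I turn {command} the {device}?", [], "neutral"
    return "Sorry, I did not understand that.", [], "confused"

state = DialogState()
first = respond("Turn on the light", state)
second = respond("In the kitchen", state)
print(first)   # ('In which room should I turn on the light?', [], 'neutral')
print(second)  # ('Turning on the light in the kitchen.', [('light', 'on', 'kitchen')], 'happy')
```

The second turn only makes sense because of the state left by the first, which is the reason the CE cannot treat each utterance in isolation.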
Control Interface
The Control Interface (CI) translates the commands spoken by the user into a format that can be understood by the target applications or services, whether they run on the same device or are accessible remotely. This module is domain-specific and has to be reimplemented or adapted for every new target application.
[Figure: User commands -> CI -> Target Application]
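Because the CI is domain-specific, one way to structure it is a small adapter per target application that translates the CE's abstract actions. Both targets here are hypothetical: a REST-style gateway reachable remotely and a local application with a plain-text command protocol. The class names, URL layout and `SET` syntax are invented for illustration. Supporting a new target then means writing one new adapter.

```python
from typing import List, Protocol, Tuple

Action = Tuple[str, str, str]   # (device, command, room), as emitted by the CE

class TargetAdapter(Protocol):
    """What every target-specific CI implementation must provide."""
    def translate(self, action: Action) -> str: ...

class RestGatewayAdapter:
    """Hypothetical remote gateway addressed by URL paths."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def translate(self, action: Action) -> str:
        device, command, room = action
        return f"{self.base_url}/rooms/{room.replace(' ', '_')}/{device}/{command}"

class LineProtocolAdapter:
    """Hypothetical local app accepting 'SET <room>.<device> <ON|OFF>' lines."""
    def translate(self, action: Action) -> str:
        device, command, room = action
        return f"SET {room.replace(' ', '_')}.{device} {command.upper()}"

def dispatch(adapter: TargetAdapter, actions: List[Action]) -> List[str]:
    """Translate every CE action for one target application."""
    return [adapter.translate(a) for a in actions]

actions = [("light", "on", "kitchen"), ("heater", "off", "living room")]
rest = dispatch(RestGatewayAdapter("http://gateway.local/api/"), actions)
line = dispatch(LineProtocolAdapter(), actions)
print(rest)  # ['http://gateway.local/api/rooms/kitchen/light/on', 'http://gateway.local/api/rooms/living_room/heater/off']
print(line)  # ['SET kitchen.light ON', 'SET living_room.heater OFF']
```

The rest of the pipeline only sees the abstract `(device, command, room)` tuple, so the domain-specific code stays confined to the adapters.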
Text-To-Speech
The Text-To-Speech (TTS) subsystem generates the synthetic output voice from the text that the CE produces as a response. It also sends the VHA module a list of the phonemes with their durations, so that the animation and the artificial speech stay synchronized. The TTS module implementation is based on the eSpeak library.
[Figure: CE -> TTS (eSpeak library) -> synthetic voice, with phoneme timing sent to the VHA]
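The phoneme list with durations is what lets the animation follow the audio. This sketch assumes the TTS reports each phoneme with its duration in milliseconds. The phoneme symbols and durations in the example are illustrative, not actual eSpeak output, and `phoneme_timeline` and `phoneme_at` are hypothetical helpers. Cumulative sums give each phoneme's start and end time, and a binary search finds which phoneme is being spoken at any playback instant.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Tuple

def phoneme_timeline(phonemes: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Turn (phoneme, duration_ms) pairs into (phoneme, start_ms, end_ms)."""
    ends = list(accumulate(d for _, d in phonemes))
    starts = [0] + ends[:-1]
    return [(p, s, e) for (p, _), s, e in zip(phonemes, starts, ends)]

def phoneme_at(timeline: List[Tuple[str, int, int]], t_ms: int) -> Optional[str]:
    """Phoneme being spoken t_ms after playback started (None once speech ends)."""
    ends = [e for _, _, e in timeline]
    i = bisect_right(ends, t_ms)  # first phoneme whose end lies after t_ms
    return timeline[i][0] if i < len(timeline) else None

timeline = phoneme_timeline([("h", 80), ("@", 70), ("l", 60), ("oU", 190)])
print(timeline)                   # [('h', 0, 80), ('@', 80, 150), ('l', 150, 210), ('oU', 210, 400)]
print(phoneme_at(timeline, 160))  # l
```

Querying by playback time, rather than advancing through the list with fixed sleeps, keeps the mouth in step with the audio even if a rendering frame is late.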
Virtual Head Animation
This module receives as input both the mood information from the CE and the list of phoneme durations from the TTS module. From these inputs it generates the visemes (the visual representation of the phonemes) and the facial expression that are rendered along with the synthetic voice.
[Figure: CE -> mood information -> VHA; TTS (eSpeak library) -> phonemes, durations -> VHA; TTS -> synthetic voice]
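The VHA's two tasks, mapping phonemes to visemes and choosing a facial expression from the mood, can be sketched as follows. The phoneme-to-viseme table, the mood-to-expression table and the `visemes` function are hypothetical and far smaller than a real implementation would need. Unknown phonemes fall back to a neutral mouth shape, and consecutive phonemes with the same viseme are merged so that their durations add up.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily reduced tables; a real VHA needs an entry for every
# phoneme the TTS engine can emit.
VISEME_OF: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open", "@": "open", "oU": "round", "u": "round",
    "i": "spread", "e": "spread",
}
EXPRESSION_OF: Dict[str, str] = {"happy": "smile", "sad": "frown", "neutral": "relaxed"}

def visemes(phonemes: List[Tuple[str, int]], mood: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (expression, [(viseme, duration_ms), ...]) for the animation."""
    track: List[Tuple[str, int]] = []
    for phoneme, duration in phonemes:
        v = VISEME_OF.get(phoneme, "rest")  # unknown phoneme -> rest shape
        if track and track[-1][0] == v:
            # Same mouth shape as before: extend it instead of re-triggering.
            track[-1] = (v, track[-1][1] + duration)
        else:
            track.append((v, duration))
    return EXPRESSION_OF.get(mood, "relaxed"), track

expression, track = visemes([("m", 60), ("a", 90), ("m", 50), ("b", 40), ("a", 120)], "happy")
print(expression, track)  # smile [('closed', 60), ('open', 90), ('closed', 90), ('open', 120)]
```

Merging preserves the total duration, so the viseme track stays aligned with the synthetic voice.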