12.eca

8/12/2019 12.ECA

ECA-based Control Interface on Android for Home Automation System

Modules:

Voice Activity Detector

Automatic Speech Recognition

Conversational Engine

Control Interface

Text-To-Speech

Virtual Head Animation

Voice Activity Detector

The role of the Voice Activity Detector (VAD) is to discriminate the frames containing the user's voice from those containing noise. This module reads the digitized audio samples acquired from a microphone and sends the filtered raw audio to the ASR. The actual implementation of the VAD module is based on the SphinxBase library, which was modified so it can work with the Open native audio libraries present on Android.
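The discrimination between voice frames and noise frames can be illustrated with a minimal sketch. It assumes a simple per-frame energy threshold, which is not necessarily what SphinxBase does internally; the function names, the frame size and the threshold value are all hypothetical.

```python
import math

def frame_energy_db(frame):
    """Return the mean energy of a frame of integer samples, in decibels."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def filter_voice_frames(samples, frame_size=160, threshold_db=40.0):
    """Split samples into frames and keep only those at or above the threshold.

    frame_size and threshold_db are illustrative values, not tuned settings.
    """
    voiced = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        if frame_energy_db(frame) >= threshold_db:
            voiced.append(frame)
    return voiced

# Synthetic input: one quiet, noise-like frame followed by one loud frame.
quiet = [5 if i % 2 else -5 for i in range(160)]
loud = [int(3000 * math.sin(2 * math.pi * 200 * i / 16000)) for i in range(160)]
voiced_frames = filter_voice_frames(quiet + loud)
```

In this sketch only the loud frame is passed on, which mirrors the idea of sending just the filtered audio to the ASR. A real detector would typically adapt its threshold to the background noise level.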

Figure: Microphone -> Audio -> VAD (audio libraries) -> Filtered raw audio -> ASR

Automatic Speech Recognition

The Automatic Speech Recognition (ASR) module performs speech-to-text conversion. It takes as input the utterance with the user's speech that comes from the VAD and sends the resulting text to the CE. In the proposed platform, the ASR module is based on the PocketSphinx speech recognition library.

Figure: VAD -> Audio -> ASR (speech recognition library) -> Text -> CE

Conversational Engine

The Conversational Engine (CE) extracts the meaning of the utterance, manages the dialog flow and produces the actions appropriate for the target domain. It generates a response based on the input, the current state of the conversation and the dialog history. Support was also added for an object-oriented database that can decrease the dynamic memory usage at the expense of an increase in response time.
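How a response can depend on the input, the current state and the dialog history is sketched below. This is a toy keyword-based dialog manager written for illustration only; the class, the state names and the reply strings are hypothetical and do not come from the platform's CE.

```python
from dataclasses import dataclass, field

@dataclass
class DialogContext:
    """Conversation state plus the list of (input, reply) turns so far."""
    state: str = "idle"
    history: list = field(default_factory=list)

def respond(context, text):
    """Return (reply, action) from the input, the current state and the history."""
    words = text.lower().split()
    action = None
    if context.state == "awaiting_room":
        # The previous turn asked which room, so the input is the answer.
        room = " ".join(words)
        reply = f"Turning on the light in the {room}."
        action = {"device": "light", "command": "on", "room": room}
        context.state = "idle"
    elif "repeat" in words and context.history:
        # Uses the dialog history: repeat the last reply given.
        reply = context.history[-1][1]
    elif "light" in words and "on" in words:
        reply = "Which room?"
        context.state = "awaiting_room"
    else:
        reply = "Sorry, I did not understand."
    context.history.append((text, reply))
    return reply, action

context = DialogContext()
first = respond(context, "Turn on the light")
second = respond(context, "kitchen")
third = respond(context, "Please repeat")
```

The same input ("kitchen") would be rejected in the idle state but is accepted as a room name after the follow-up question, which is the role the conversation state plays here.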

Control Interface

The Control Interface translates the commands spoken by the user into a format that can be understood by the target applications or services running on the same device or accessible remotely. This module is domain-specific and has to be reimplemented or adapted for every new target application.
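As an illustration of this translation step, the sketch below converts an abstract user command into a JSON message for an imaginary home automation service. The device table, the field names and the message layout are assumptions made for the example; a real target application defines its own format, which is why this module is domain-specific.

```python
import json

# Hypothetical mapping from (device, room) pairs to target-side identifiers.
DEVICE_IDS = {
    ("light", "kitchen"): "kitchen/light1",
    ("blind", "bedroom"): "bedroom/blind1",
}

def to_target_message(action):
    """Translate an abstract user command into the target's message format."""
    key = (action["device"], action["room"])
    if key not in DEVICE_IDS:
        raise ValueError(f"unknown device: {key}")
    payload = {"id": DEVICE_IDS[key], "set": action["command"]}
    return json.dumps(payload, sort_keys=True)

message = to_target_message(
    {"device": "light", "command": "on", "room": "kitchen"}
)
```

Adapting the interface to another application would mean replacing the table and the payload layout while keeping the abstract command unchanged.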

Figure: User commands -> CI -> Target Application

Text-To-Speech

The Text-To-Speech (TTS) subsystem carries out the generation of the synthetic output voice from the text that comes as a response from the CE. It sends to the VHA module a list of the phonemes with their durations so that the animation and the artificial speech match up. The TTS module implementation is based on the eSpeak library.
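The list of phonemes with durations can be pictured as a sequence of (phoneme, duration) pairs that is turned into a timeline. The sketch below does not call eSpeak; the phoneme symbols and the millisecond values are made up for illustration.

```python
def phoneme_timeline(phonemes):
    """Turn (phoneme, duration_ms) pairs into (phoneme, start_ms, end_ms)."""
    timeline = []
    clock = 0
    for symbol, duration_ms in phonemes:
        timeline.append((symbol, clock, clock + duration_ms))
        clock += duration_ms
    return timeline

# Hypothetical phoneme list for the word "hello"; durations are illustrative.
hello = [("h", 60), ("@", 50), ("l", 70), ("oU", 140)]
timeline = phoneme_timeline(hello)
```

With start and end times attached to each phoneme, an animation module can schedule each mouth shape against the audio playback clock.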

Figure: CE -> TTS (eSpeak library) -> Synthetic voice

Virtual Head Animation

This module receives as inputs both the mood information from the CE and the list of the phonemes and their durations from the TTS module. By processing these inputs, it generates the visemes (the visual representation of the phonemes) and the facial expression that will be rendered along with the synthetic voice.
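A minimal sketch of this processing step follows. It assumes a small lookup table from phonemes to visemes and merges consecutive phonemes that share a mouth shape; the table, the viseme names and the mood label are hypothetical and much simpler than a real animation model.

```python
# Hypothetical phoneme-to-viseme table; several phonemes share one mouth shape.
VISEMES = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open",
    "o": "round", "u": "round",
}

def visemes_for(phonemes, mood):
    """Build animation frames from (phoneme, duration_ms) pairs and a mood."""
    frames = []
    for symbol, duration_ms in phonemes:
        shape = VISEMES.get(symbol, "neutral")
        if frames and frames[-1]["viseme"] == shape:
            # Same mouth shape as the previous phoneme: extend that frame.
            frames[-1]["duration_ms"] += duration_ms
        else:
            frames.append(
                {"viseme": shape, "duration_ms": duration_ms, "expression": mood}
            )
    return frames

frames = visemes_for([("m", 80), ("b", 40), ("a", 120), ("u", 90)], mood="happy")
```

The total duration of the frames equals the total duration of the phonemes, which is what keeps the animation aligned with the synthetic voice.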

Figure: CE -> Mood information -> VHA; TTS -> Phonemes, durations -> VHA