
    Expert System Voice Assistant

    A

    MAJOR PROJECT

    Submitted For The Partial Fulfilment Of The Requirement

    For The Award Of Degree Of

    BACHELOR OF ENGINEERING

    IN

    COMPUTER SCIENCE & ENGINEERING

Submitted by:

1. Aakash Shrivastava (0101CS101001)

2. Ashish Kumar Namdeo (0101CS101024)

3. Avinash Dongre (0101CS101026)

4. Chitransh Surheley (0101CS101031)

Guided by: Prof. Shikha Agarwal

    DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

    UNIVERSITY INSTITUTE OF TECHNOLOGY

    RAJIV GANDHI PRODYOGIKI VISHWAVIDYALAYA

    BHOPAL-462036

    2013- 2014

    UNIVERSITY INSTITUTE OF TECHNOLOGY

    RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA, BHOPAL

    DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

    CERTIFICATE

This is to certify that Aakash Shrivastava, Ashish Kumar Namdeo, Avinash Dongre and Chitransh Surheley of B.E. fourth year, Computer Science & Engineering, have completed their major project Expert System Voice Assistant during the academic year 2013-14 under our guidance and supervision.

We approve the project for submission in partial fulfillment of the requirement for the award of the degree in Computer Science & Engineering.

Prof. Shikha Agarwal (Project Guide)

Dr. Sanjay Silakari (Head, CSE Dept.)

Dr. V. K. Sethi (Director, UIT-RGPV)

    DECLARATION BY CANDIDATE

We hereby declare that the work being presented in the major project Expert System Voice Assistant, submitted in partial fulfillment of the requirement for the award of the Bachelor degree in Computer Science & Engineering, and carried out at the University Institute of Technology, RGPV, Bhopal, is an authentic record of our work carried out under the guidance of Prof. Shikha Agrawal, Department of Computer Science & Engineering, UIT-RGPV, Bhopal.

The matter written in this project has not been submitted by us for the award of any other degree.

    Aakash Shrivastava(0101CS101001)

    Ashish Kumar Namdeo(0101CS101024)

    Avinash Dongre(0101CS101026)

    Chitransh Surheley(0101CS101031)

    ACKNOWLEDGEMENT

We take the opportunity to express our cordial gratitude and deep sense of indebtedness to our guide, Prof. Shikha Agrawal, Department of Computer Science and Engineering, for her valuable guidance and inspiration throughout the project duration. We feel thankful to her for her innovative ideas, which led to the successful completion of this project work. She has always welcomed our problems and helped us clear our doubts. We will always be grateful to her for providing us moral support and sufficient time.

We owe our sincere thanks to Dr. Sanjay Silakari (HOD, CSE), who helped us duly in time during our project work in the Department.

At the same time, we would like to thank all other faculty members and all non-teaching staff in the Computer Science and Engineering Department for their valuable co-operation.

    Aakash Shrivastava(0101CS101001)

    Ashish Kumar Namdeo(0101CS101024)

    Avinash Dongre(0101CS101026)

    Chitransh Surheley(0101CS101031)

    Abstract

A speech interface to the computer is the next big step that computer science needs to take for general users, and speech recognition will play an important role in bringing technology to them. Our goal is to create speech recognition software that can recognise spoken words. This report takes a brief look at the basic building blocks of speech recognition and speech synthesis, and at the overall human-computer interaction. The most important purpose of this project is to understand the interface between a person and a computer. Traditional or orthodox ways of interaction are the keyboard, mouse or other input devices, but computing has become a more sophisticated and complex activity, and we now have the resources to build a more modern interface that allows a more natural interaction. In this project we have therefore tried to develop an application which makes human-computer interaction more interesting and user friendly. It is called the Expert System Voice Assistant: it takes human voice as input, processes it accordingly, performs the given task and responds at the end. The project is a digital life assistant which mainly uses human communication channels such as Twitter, instant messaging and voice to create a two-way connection between a person and their computer, controlling power, documents, social media and much more. Since voice is the main means of communication in our project, it is essentially a speech recognition application. Speech technology really encompasses two technologies: the synthesizer and the recognizer. A speech synthesizer takes text as input and produces an audio stream as output; a speech recognizer does the opposite, taking an audio stream as input and turning it into a text transcription. The voice is a signal carrying a great deal of information, and directly analysing and synthesizing the complex voice signal is difficult because of how much information it contains. Therefore digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal. In this project we directly use a speech engine whose feature extraction technique is the mel-scaled frequency cepstrum. Mel-scaled frequency cepstral coefficients (MFCCs), derived from the Fourier transform and filter bank analysis, are perhaps the most widely used front end in state-of-the-art speech recognition systems. Our aim is to create more and more functionality that can assist people in their daily lives and reduce their effort.

    Table of Contents

1.0 Introduction
1.1 Existing Systems
1.2 Speech Recognition
1.3 Speech Synthesis
1.4 Intermediate Operations and Result Creation

2.0 Literature Survey and Related Work
2.1 Microsoft Speech Recognition Engine
2.2 Collected Information and History
2.3 Availability of Resources
2.4 Related Work

3.0 Proposed Work
3.1 Problem Description
3.2 Architecture of the Project
3.3 Working of the Project

4.0 Design and Development
4.1 Microsoft Visual Studio
4.2 Speech Synthesis

5.0 Implementation and Coding
5.1 Post Query Design
5.2 Prototype and Inception
5.3 Default Commands.txt

6.0 Results
6.1 Snapshot of the GUI
6.2 Flowcharts

7.0 Conclusion and Future Work

References

List of Figures and Tables
2.1 Post Query Design
2.2 Prototype and Inception
2.3 Default Commands.txt

    Chapter 1

    1. Introduction

    Speech is an effective and natural way for people to interact with applications, complementing

    or even replacing the use of mice, keyboards, controllers, and gestures. A hands-free, yet

    accurate way to communicate with applications, speech lets people be productive and stay

informed in a variety of situations where other interfaces cannot. Speech recognition is a topic that is very useful in many applications and environments in our daily life. Generally, a speech recognizer is a machine which understands humans and their spoken words in some way and can act on them. A different aspect of speech recognition is to help people with a functional disability or other kind of handicap: to make their daily chores easier, voice control could be helpful, letting them operate the system with their voice. This leads to the discussion about intelligent homes, where these operations can be made available for the common man as well as for the handicapped. Voice-activated systems and gesture-control systems have taken the experience of naive end users to the next level. Present-day users are able to access or control the system without making a physical interaction with the computer. The proposed model presents a new approach to voice-activated control systems which enhances the response time and user experience by looking beyond the step of speech recognition and focusing on the post-processing step of natural language processing. The proposed method conceives the system as a deterministic finite state automaton, where each state is allowed a finite set of keywords which will be listened for by the speech recognition system. This is achieved by the introduction of a new mechanism to handle the finite automaton, called the Switch State Mechanism. Natural language processing is used to regularly update the state keywords and give the user a lifelike interaction with the computer.

With the input functionality of speech recognition, your application can monitor the state, level, and format of the input signal, and receive notification about problems that might interfere with successful recognition. You can create grammars programmatically using constructors and methods on the GrammarBuilder and Choices classes, and your application can dynamically modify programmatically created grammars while it is running. The structure of

    grammars authored using these classes is independent of the Speech Recognition Grammar

    Specification.
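As a brief, hedged sketch of this (standard System.Speech usage, not code taken from the project), the fragment below builds a small command grammar with Choices and GrammarBuilder, subscribes to recognition results, and also listens for audio-signal problems; the command phrases and the use of the in-process SpeechRecognitionEngine are illustrative assumptions.

using System;
using System.Speech.Recognition;

class GrammarSketch
{
    static void Main()
    {
        // Illustrative command set; a real application would load its own phrases.
        Choices commands = new Choices(new string[] { "open notepad", "what time is it", "play music" });
        Grammar grammar = new Grammar(new GrammarBuilder(commands));

        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(grammar);
            engine.SetInputToDefaultAudioDevice();

            // Recognition results and input-signal problems both arrive as events.
            engine.SpeechRecognized += (s, e) => Console.WriteLine("Heard: " + e.Result.Text);
            engine.AudioSignalProblemOccurred += (s, e) => Console.WriteLine("Audio problem: " + e.AudioSignalProblem);

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();   // keep listening until Enter is pressed
        }
    }
}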

Voice recognition fundamentally functions as a pipeline that converts PCM (Pulse Code

    Modulation) digital audio from a sound card into recognized speech. The elements of the

    pipeline are:

    1. Transform the PCM digital audio into a better acoustic representation

    2. Apply a "grammar" so the speech recognizer knows what phonemes to expect. A

    grammar could be anything from a context-free grammar to full-blown English.

    3. Figure out which phonemes are spoken.

    4. Convert the phonemes into words.

    1.1 Existing Systems

Although some promising solutions are available for speech synthesis and recognition, most of them are tuned to English: the acoustic and language models for these systems are built for the English language. Most of them require a lot of configuration before they can be used. ISIP and Sphinx are two of the well-known open-source speech recognition packages, and comparisons of public-domain software tools for speech recognition are available. Some commercial software, such as IBM's ViaVoice, is also available.

    1.1.1 SIRI

    SIRI is an intelligent personal assistant and knowledge navigator which works as an

    application for Apple Inc.'s iOS. The application uses a natural language user interface to

    answer questions, make recommendations, and perform actions by delegating requests to a set

    of Web services. Apple claims that the software adapts to the user's individual preferences

over time and personalizes results. The name Siri is Norwegian, meaning "beautiful woman who leads you to victory", and comes from the intended name for the original developer's first child.

    Siri was originally introduced as an iOS application available in the App Store by Siri, Inc.,

    which was acquired by Apple on April 28, 2010. Siri, Inc. had announced that their software

    would be available for BlackBerry and for phones running Android, but all development

    efforts for non-Apple platforms were cancelled after the acquisition by Apple.

Siri has been an integral part of iOS since iOS 5 and was introduced as a feature of the iPhone 4S on October 14, 2011. Siri was added to the third-generation iPad with the release of iOS 6 in September 2012, and has been included on all iOS devices released during or after October 2012. Siri has several fascinating features: you can call or text someone, search for anything, open any app, etc. with your voice, which is very helpful indeed.

    1.1.2 S-VOICE

S Voice is an intelligent personal assistant and knowledge navigator which is only available as

    a built-in application for the Samsung Galaxy smartphones. The application uses a natural

    language user interface to answer questions, make recommendations, and perform actions by

    delegating requests to a set of Web services. It is based on the Vlingo personal assistant.

    Some of the capabilities of S Voice include making appointments, opening apps, setting

    alarms, updating social network websites such as Facebook or Twitter and navigation. S Voice

    also offers efficient multitasking as well as automatic activation features, for example when

    the car engine is started.

S Voice possesses much the same features as Siri.

    1.1.3 GOOGLE NOW

    Google Now is an intelligent personal assistant developed by Google. It is available within the

    Google Search mobile application for the Android and iOS operating systems, as well as the

    Google Chrome web browser on personal computers. Google Now uses a natural language

    user interface to answer questions, make recommendations, and perform actions by delegating

    requests to a set of web services. Along with answering user-initiated queries, Google Now

    passively delivers information to the user that it predicts they will want, based on their search

    habits. It was first included in Android 4.1 ("Jelly Bean"), which launched on July 9, 2012,

    and was first supported on the Galaxy Nexus smartphone. The service was made available for

    iOS on April 29, 2013 in an update to the Google Search app, and later for Google Chrome on

    March 24, 2014.

The expert system voice assistant is based on the combination of three major operations:

Speech Recognition

Intermediate Operations and Result Creation

Speech Synthesis

    1.2 Speech Recognition

Speech recognition refers to the ability to listen to spoken words (input in audio format), identify the various sounds present in them, and recognise them as words of some known language.

    Speech recognition in computer system domain may then be defined as the ability of computer

    systems to accept spoken words in audio format - such as wav or raw - and then generate its

    content in text format. Speech recognition in computer domain involves various steps with

    issues attached with them. The steps required to make computers perform speech recognition

    are: Voice recording, word boundary detection, feature extraction, and recognition with the

    help of knowledge models. Word boundary detection is the process of identifying the start and

    the end of a spoken word in the given sound signal. While analysing the sound signal, at times

it becomes difficult to identify the word boundary. This can be attributed to the various accents people have, such as the duration of the pause they leave between words while speaking.

    Feature Extraction refers to the process of conversion of sound signal to a form suitable for the

    following stages to use. Feature extraction may include extracting parameters such as

    amplitude of the signal, energy of frequencies, etc. Recognition involves mapping the given

    input (in form of various features) to one of the known sounds. This may involve use of

    various knowledge models for precise identification and ambiguity removal. Knowledge

models refer to models such as the phone acoustic model, language models, etc., which help the recognition system. To generate a knowledge model one needs to train the system. During the training period one shows the system a set of inputs and the outputs they should map to. This is often called supervised learning.

    Structure of a standard speech recognition system.

    How Speech Recognition Works

    A speech recognition engine (or speech recognizer) takes an audio stream as input and turns it

    into a text transcription. The speech recognition process can be thought of as having a front end

    and a back end.

    Convert Audio Input

    The front end processes the audio stream, isolating segments of sound that are probably speech

    and converting them into a series of numeric values that characterize the vocal sounds in the

    signal.

    Match Input to Speech Models

    The back end is a specialized search engine that takes the output produced by the front end and

    searches across three databases: an acoustic model, a lexicon, and a language model.

    The acoustic model represents the acoustic sounds of a language, and can be trained to

    recognize the characteristics of a particular user's speech patterns and acoustic

    environments.

The lexicon lists a large number of the words in the language, and provides information on how to pronounce each word.

The language model represents the ways in which the words of a language are combined.

    For any given segment of sound, there are many things the speaker could potentially be saying.

    The quality of a recognizer is determined by how good it is at refining its search, eliminating the

    poor matches, and selecting the more likely matches. This depends in large part on the quality of

    its language and acoustic models and the effectiveness of its algorithms, both for processing

    sound and for searching across the models.

    Grammars

    While the built-in language model of a recognizer is intended to represent a comprehensive

    language domain (such as everyday spoken English), a speech application will often need to

    process only certain utterances that have particular semantic meaning to that application. Rather

    than using the general purpose language model, an application should use a grammar that

    constrains the recognizer to listen only for speech that is meaningful to the application. This

    provides the following benefits:

    Increases the accuracy of recognition

    Described above are the core elements of the most common, HMM-based approach to speech

    recognition. Modern speech recognition systems use various combinations of a number of

    standard techniques in order to improve results over the basic approach described above. A

    typical large-vocabulary system would need context dependency for the phonemes (so

    phonemes with different left and right context have different realizations as HMM states); it

    would use cepstral normalization to normalize for different speaker and recording conditions;

    for further speaker normalization it might use vocal tract length normalization (VTLN) for

    male-female normalization and maximum likelihood linear regression(MLLR) for more

    general speaker adaptation. The features would have so-called delta and delta-delta

    coefficients to capture speech dynamics and in addition might useheteroscedastic linear

    discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use

    splicing and an LDA-based projection followed perhaps byheteroscedastic linear discriminant

    analysis or a global semi-tied covariance transform (also known as maximum likelihood linear

    transform, or MLLT). Many systems use so-called discriminative training techniques that

    dispense with a purely statistical approach to HMM parameter estimation and instead optimize

    some classification-related measure of the training data. Examples are maximum mutual

    information (MMI), minimum classification error (MCE) and minimum phone error (MPE).

    Decoding of the speech (the term for what happens when the system is presented with a new

    utterance and must compute the most likely source sentence) would probably use the Viterbi

    algorithm to find the best path, and here there is a choice between dynamically creating a

    combination hidden Markov model, which includes both the acoustic and language model

    information, and combining it statically beforehand (the finite state transducer, or FST,

    approach).

    A possible improvement to decoding is to keep a set of good candidates instead of just

    keeping the best candidate, and to use a better scoring function (rescoring) to rate these good

    candidates so that we may pick the best one according to this refined score. The set of

candidates can be kept either as a list (the N-best list approach) or as a subset of the models (a

    lattice). Rescoring is usually done by trying to minimize the Bayes risk (or an approximation

    thereof): Instead of taking the source sentence with maximal probability, we try to take the

    sentence that minimizes the expectation of a given loss function with regards to all possible

    transcriptions (i.e., we take the sentence that minimizes the average distance to other possible

    sentences weighted by their estimated probability). The loss function is usually the

    Levenshtein distance, though it can be different distances for specific tasks; the set of possible

    transcriptions is, of course, pruned to maintain tractability. Efficient algorithms have been

    devised to rescore lattices represented as weighted finite state transducers with edit distances

    represented themselves as a finite state transducer verifying certain assumptions.
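To make the rescoring loss concrete, the following is a small illustrative sketch (not part of the original report) of the Levenshtein distance between two word sequences: the minimum number of insertions, deletions and substitutions needed to turn one transcription into the other.

using System;

static class EditDistance
{
    // Minimum number of word insertions, deletions and substitutions
    // needed to turn transcription a into transcription b.
    public static int Levenshtein(string[] a, string[] b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // delete all of a
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // insert all of b
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;   // substitution cost
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }
}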

    Dynamic time warping (DTW)-based speech recognition

    Dynamic time warping is an approach that was historically used for speech recognition but has

    now largely been displaced by the more successful HMM-based approach.

    Dynamic time warping is an algorithm for measuring similarity between two sequences that

    may vary in time or speed. For instance, similarities in walking patterns would be detected,

    even if in one video the person was walking slowly and if in another he or she were walking

    more quickly, or even if there were accelerations and decelerations during the course of one

observation. DTW has been applied to video, audio, and graphics; indeed, any data that can

    be turned into a linear representation can be analyzed with DTW.

    A well-known application has been automatic speech recognition, to cope with different

    speaking speeds. In general, it is a method that allows a computer to find an optimal match

    between two given sequences (e.g., time series) with certain restrictions. That is, the

    sequences are "warped" non-linearly to match each other. This sequence alignment method is

    often used in the context of hidden Markov models.
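As an illustrative sketch of the idea (assumed one-dimensional features and an absolute-difference local cost, not the report's implementation), the classic DTW recurrence can be written as:

using System;

static class Dtw
{
    // Cumulative DTW cost between two feature sequences; a lower cost means the
    // sequences are more similar after non-linear time alignment.
    public static double Distance(double[] x, double[] y)
    {
        int n = x.Length, m = y.Length;
        double[,] d = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                d[i, j] = double.PositiveInfinity;
        d[0, 0] = 0.0;

        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                double cost = Math.Abs(x[i - 1] - y[j - 1]);      // local distance
                d[i, j] = cost + Math.Min(d[i - 1, j],            // stretch x
                              Math.Min(d[i, j - 1],               // stretch y
                                       d[i - 1, j - 1]));         // step both
            }
        return d[n, m];
    }
}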

    Neural networks

    Neural networks emerged as an attractive acoustic modeling approach in ASR in the late

    1980s. Since then, neural networks have been used in many aspects of speech recognition such

    as phoneme classification, isolated word recognition, and speaker adaptation.

    In contrast to HMMs, neural networks make no assumptions about feature statistical

    properties and have several qualities making them attractive recognition models for speech

    recognition. When used to estimate the probabilities of a speech feature segment, neural

    networks allow discriminative training in a natural and efficient manner. Few assumptions on

    the statistics of input features are made with neural networks. However, in spite of their

    effectiveness in classifying short-time units such as individual phones and isolated words,

    neural networks are rarely successful for continuous recognition tasks, largely because of their

    lack of ability to model temporal dependencies. Thus, one alternative approach is to use neural

networks as a pre-processing step, e.g. for feature transformation or dimensionality reduction, in HMM-based recognition.

    1.3 Speech Synthesis

    Speech synthesis is the artificial production of human speech. A computer system used for

    this purpose is called a speech synthesizer, and can be implemented in software or hardware

    products. A text-to-speech (TTS) system converts normal language text into speech; other

    systems render symbolic linguistic representations like phonetic transcriptions into speech.

    Synthesized speech can be created by concatenating pieces of recorded speech that are stored

    in a database. Systems differ in the size of the stored speech units; a system that stores phones

    or diphones provides the largest output range, but may lack clarity. For specific usage

    domains, the storage of entire words or sentences allows for high-quality output.

    Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice

    characteristics to create a completely "synthetic" voice output.

    The quality of a speech synthesizer is judged by its similarity to the human voice and by its

    ability to be understood clearly. An intelligible text-to-speech program allows people with

    visual impairments or reading disabilities to listen to written works on a home computer.

    Many computer operating systems have included speech synthesizers since the early 1990s.

    A typical TTS system

A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end.

    The front-end has two major tasks. First, it converts raw text containing symbols like numbers

    and abbreviations into the equivalent of written-out words. This process is often called text

    normalization, pre-processing, or tokenization. The front-end then assigns phonetic

    transcriptions to each word, and divides and marks the text into prosodic units, like phrases,

    clauses, and sentences. The process of assigning phonetic transcriptions to words is called

    text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody

information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
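As a toy sketch of the text-normalization step only (the word lists and rules are illustrative assumptions, not how any particular engine implements it), abbreviations and single digits might be expanded like this:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ToyNormalizer
{
    // Illustrative abbreviation table; a real front-end has a much larger one.
    static readonly Dictionary<string, string> Abbrev = new Dictionary<string, string>
    {
        { "Dr.", "Doctor" }, { "St.", "Street" }
    };

    static readonly string[] Digits =
        { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

    public static string Normalize(string text)
    {
        foreach (var pair in Abbrev)
            text = text.Replace(pair.Key, pair.Value);
        // Spell out digits one by one; a real system handles full numbers, dates, etc.
        return Regex.Replace(text, @"\d", m => Digits[m.Value[0] - '0'] + " ").Trim();
    }
}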

    1.4 Intermediate Operations

Once the computer recognizes the speech, it is able to convert the spoken words into the corresponding text. That text can then be used as a command: whatever we speak is converted into a command, and that command is handled by various system references.

We can operate, manage and manipulate any system attribute or element using these commands. We can use various RSS feeds to create weather, email and other social media services.

Results are created as products of the intermediate operations. These results are then fed into the speech synthesis engine, which is responsible for responding to all the events, so we can get better feedback from the computer.
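A minimal sketch of this command-dispatch idea (the table entries and structure are assumptions for illustration, not the project's actual implementation): recognized text is looked up in a table of operations, the operation runs, and its result string is handed to the synthesizer as spoken feedback.

using System;
using System.Collections.Generic;
using System.Speech.Synthesis;

class CommandDispatcher
{
    // Maps a recognized phrase to an intermediate operation that returns a spoken reply.
    static readonly Dictionary<string, Func<string>> Commands =
        new Dictionary<string, Func<string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "what time is it", () => "The time is " + DateTime.Now.ToShortTimeString() },
            { "what day is it",  () => "Today is " + DateTime.Now.DayOfWeek }
        };

    static void Handle(string recognizedText, SpeechSynthesizer synth)
    {
        Func<string> operation;
        string reply = Commands.TryGetValue(recognizedText, out operation)
            ? operation()                               // run the intermediate operation
            : "Sorry, I did not understand that.";      // fallback feedback
        synth.Speak(reply);                             // spoken response to the user
    }
}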

    1.5 Architecture of the project

    Chapter 2

    2. Literature Survey and Related Work

    2.1 Microsoft Speech recognition Engine

Windows Speech Recognition is a speech recognition application included in Windows Vista, Windows 7 and Windows 8. Windows Speech Recognition allows the user to control the computer by giving specific voice commands. The program can also be used for the dictation of text so that the user can enter text using their voice on their Vista or Windows 7 computer.

Applications that do not present obvious "commands" can still be controlled by asking the system to overlay numbers on top of interface elements; the number can subsequently be

    spoken to activate that function. Programs needing mouse clicks in arbitrary locations can also

    be controlled through speech; when asked to do so, a "mousegrid" of nine zones is displayed,

    with numbers inside each. The user speaks the number, and another grid of nine zones is

    placed inside the chosen zone. This continues until the interface element to be clicked is

    within the chosen zone.

Windows Speech Recognition has a fairly high recognition accuracy and provides a set of commands that assist in dictation. A brief speech-driven tutorial is included to help familiarize a user with speech recognition commands. Training could also be completed to

    improve the accuracy of speech recognition.

    Currently, the application supports several languages, including English (U.S. and British),

    Spanish, German, French, Japanese and Chinese (traditional and simplified).

Windows Speech Recognition plays an important role in the development of the expert system voice assistant: the speech recognition phase is carried out with the help of the Windows speech recognition engine.

    There have been two main 'families' of the Microsoft Speech API. SAPI versions 1 through 4

    are all similar to each other, with extra features in each newer version. SAPI 5 however was a

    completely new interface, released in 2000. Since then several sub-versions of this API have

    been released.

    2.3.2 .NET Application Architecture

    2.4 Related Work

It would be inappropriate not to mention Siri or Google Now when discussing voice-activated systems, even though they are for mobile devices. Siri relies on web services and hence facilitates the learning of user preferences over time. However, all the intelligent personal assistants, including Samsung's S Voice, Iris and others, use natural language processing following speech recognition; the use of a state machine is limited to context storage and evaluation. One of the best attempts to create an expert system voice assistant used a speech recognition system followed by natural language processing. From the videos posted of his Project Jarvis we can infer that the response time of the system is not real time; however, his project was able to capture the entirety of a digital life assistant. Individual projects such as Project Alpha and others have tried to utilize state systems through the use of Windows Speech Recognition Macros. Further, other small projects such as Project Rita rely on a state system for concocting responses to a command spoken by the user. The scope of these projects, however, is limited due to improper management of macros or keywords.

    Chapter 3

    3. Problem Description

A voice assistant is not a very traditional or orthodox application; such applications are not generally available in a very big context. Another issue is that not all people can interact with the computer via orthodox input methods like the keyboard or mouse clicks. Some people with a physical disability, or those who are unable to see, may find it very difficult to interact with the computer, but with the help of this application they can operate the computer as smoothly as anyone else. The problem is that we have to combine the features of speech recognition, interpretation, system manipulation, command generation and speech synthesis.

We want the computer to recognize our spoken words and we want the spoken operation to be performed. After that, we want the application to respond with text-to-speech or some other synthetic voice feedback.

We have to make sure that the application understands every command and provides the results with feedback.

12. killtask - Kills a specified task. You have to specify vocally which running task is to be killed.

13. CMD - Starts a new command prompt window.

14. Start or Close any Program or Directory - You can start any program by saying its name. You can open or close any directory by saying its name, and you can switch from one to another by voice as well. The confirmation of the start and termination can be vocal.

15. Tasklist - Views currently running processes.

16. lock - Locks the workstation.

17. Screen off - Turns off the monitor or dims the brightness of the screen.

18. System specific tasks - You can control your computer's regular operations via voice commands. For example, you can turn the computer off or put it to sleep by saying so, and you can open or close the disk tray by voice command.

19. Open any website - You can open a specific website by calling its name. This includes many famous websites.

20. What is there to offer - The first thing will be to know the potential and capabilities of the project, so if the user says "what can you do" or similar commands, the application will show the list of commands and operations it can perform.

21. Print this page - This command is said to print a specific page. The application will take the spoken word "print" as input, and the status of the task will be provided as output via voice.

22. Screenshot anything - You can take a screenshot of any page or window by saying the word.

23. Play music or video locally - You can simply instruct the assistant to play a local music or video file on the basis of name, artist, genre, etc.

24. Multimedia control - You can control the volume, select the playlist and go to the next or previous track by voice commands.

25. Manage your email - You can manage and check for any new emails by saying something like "check mail". The system will respond vocally to the command and can read your emails for you.

26. Presentation control - You can start the presentation, go to the previous or next slide, and end the presentation.

27. Delete file - You can delete any selected file by saying this command.

28. Cut/Copy/Paste - You can perform these operations on any selected file or text.

29. Select all - Say it and it will select the whole document or all the files.

    Program Options

    Start Automatically - If checked, this program will be added to your start-up folder so that it

    will start automatically each time you start Windows.

Show Progress Bars - The program can monitor your usage of the mouse and keyboard and

    show you the progress you are making at using your voice instead of the mouse and keyboard.

    Progress is measured on several dimensions including: mouse clicks, mouse movement,

    keyboard letters, and navigation/function keys.

    General options -

    1. Open and Close Programs

    2. Navigate Programs/Folders

    3. Switch or Minimize Windows

    4. Change Settings

    Chapter 5

    5. Design and Development

5.1 Required

Hardware: Pentium processor, 512 MB of RAM, 10 GB HDD.

OS: Windows.

Language: C#.

Tools: .NET Framework 4.5, Microsoft Visual Studio 2010, voice macros.

The speech signal and all its characteristics can be represented in two different domains, the time domain and the frequency domain. A speech signal is a slowly time-varying signal in the sense that, when examined over a short period of time (between 5 and 100 ms), its characteristics are short-time stationary. This is not the case if we look at a speech signal over a longer time span (approximately T > 0.5 s); in this case the signal's characteristics are non-stationary, meaning that they change to reflect the different sounds spoken by the talker. To be able to use a speech signal and interpret its characteristics in a proper manner, some kind of representation of the speech signal is preferred.
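To illustrate the short-time assumption, here is a small sketch (typical textbook frame and hop sizes, not parameters taken from the project) that cuts a PCM signal into overlapping ~25 ms frames with a ~10 ms hop, the time scale over which speech is treated as quasi-stationary before per-frame features such as MFCCs are computed.

using System;

static class ShortTimeAnalysis
{
    // Cuts a PCM signal into overlapping short analysis frames.
    public static double[][] Frame(double[] samples, int sampleRate)
    {
        int frameLen = (int)(0.025 * sampleRate);   // ~25 ms frame
        int hop = (int)(0.010 * sampleRate);        // ~10 ms hop between frames
        int count = samples.Length >= frameLen ? (samples.Length - frameLen) / hop + 1 : 0;

        double[][] frames = new double[count][];
        for (int i = 0; i < count; i++)
        {
            frames[i] = new double[frameLen];
            Array.Copy(samples, i * hop, frames[i], 0, frameLen);
        }
        return frames;
    }
}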

    5.2 Microsoft Visual Studio

    Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It is

    used to develop computer programs for Microsoft Windows superfamily of operating systems,

    as well as web sites, web applications and web services. Visual Studio uses Microsoft

    software development platforms such as Windows API, Windows Forms, Windows

    Presentation Foundation, Windows Store and Microsoft Silverlight. It can produce both native

    code and managed code.

    Visual Studio includes a code editor supporting IntelliSense as well as code refactoring. The

    integrated debugger works both as a source-level debugger and a machine-level debugger.

    Other built-in tools include a forms designer for building GUI applications, web designer,

    class designer, and database schema designer. It accepts plug-ins that enhance the

functionality at almost every level, including adding support for source-control systems (like

    Subversion) and adding new toolsets like editors and visual designers for domain-specific

languages or toolsets for other aspects of the software development lifecycle (like the Team

    Foundation Server client: Team Explorer).

    Visual Studio supports different programming languages and allows the code editor and

    debugger to support (to varying degrees) nearly any programming language, provided a

    language-specific service exists. Built-in languages include C, C++ and C++/CLI (via Visual

    C++), VB.NET (via Visual Basic .NET), C# (via Visual C#), and F# (as of Visual Studio

    2010). Support for other languages such as M, Python, and Ruby among others is available via

    language services installed separately. It also supports XML/XSLT, HTML/XHTML,

    JavaScript and CSS.

    Microsoft provides "Express" editions of its Visual Studio at no cost. Commercial versions of

    Visual Studio along with select past versions are available for free to students via Microsoft's

DreamSpark program.

    5.3 Speech Synthesis

    The most important qualities of a speech synthesis system are naturalness and intelligibility.

    Naturalness describes how closely the output sounds like human speech, while intelligibility is

    the ease with which the output is understood. The ideal speech synthesizer is both natural and

    intelligible. Speech synthesis systems usually try to maximize both characteristics.

    The two primary technologies generating synthetic speech waveforms are concatenative

    synthesis and formant synthesis. Each technology has strengths and weaknesses, and the

    intended uses of a synthesis system will typically determine which approach is used.

    Create TTS Content

    The content that a TTS engine speaks is called a prompt. Creating a prompt can be as simple

as typing a string. See Speak the Contents of a String.

    For greater control over speech output, you can create prompts programmatically using the

methods of the PromptBuilder class to assemble content for prompts from text, Speech

    Synthesis Markup Language (SSML), files containing text or SSML markup, and prerecorded

    http://en.wikipedia.org/wiki/Intelligibility_(communication)http://en.wikipedia.org/wiki/Formanthttp://msdn.microsoft.com/en-us/library/hh361602(v=office.14).aspxhttp://msdn.microsoft.com/en-us/library/system.speech.synthesis.promptbuilder.aspxhttp://msdn.microsoft.com/en-us/library/system.speech.synthesis.promptbuilder.aspxhttp://msdn.microsoft.com/en-us/library/hh361602(v=office.14).aspxhttp://en.wikipedia.org/wiki/Formanthttp://en.wikipedia.org/wiki/Intelligibility_(communication)
  • 8/10/2019 Expert System Voice Assistant

    33/52

    26

audio files. PromptBuilder also allows you to select a speaking voice and to control attributes of the voice such as rate and volume. See Construct and Speak a Simple Prompt and Construct a Complex Prompt for more information and examples.

    Initialize and Manage the Speech Synthesizer

The SpeechSynthesizer class provides access to the functionality of a TTS engine in Windows Vista, Windows 7, and Windows Server 2008. Using the SpeechSynthesizer class, you can

    select a speaking voice, specify the output for generated speech, create handlers for events that

    the speech synthesizer generates, and start, pause, and resume speech generation.

    Generate Speech

    Using methods on the SpeechSynthesizer class, you can generate speech as either a

    synchronous or an asynchronous operation from text, SSML markup, files containing text or

    SSML markup, and prerecorded audio files.

    Respond to Events

    When generating synthesized speech, the SpeechSynthesizer raises events that inform a

    speech application about the beginning and end of the speaking of a prompt, the progress of a

    speak operation, and details about specific features encountered in a prompt. EventArgs

    classes provide notification and information about events raised and allow you to write

handlers that respond to events as they occur.

    Control Voice Characteristics

    To control the characteristics of speech output, you can select a voice with specific attributes

    such as language or gender, modify properties of the SpeechSynthesizer such as rate and

volume, or add instructions, either in prompt content or in separate lexicon files, that guide

    the pronunciation of specified words or phrases.
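As a brief hedged sketch of these facilities (standard System.Speech.Synthesis usage, not code from the project), a prompt can be assembled with PromptBuilder and spoken with an adjusted voice, rate and volume:

using System;
using System.Speech.Synthesis;

class PromptSketch
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();
            synth.Rate = 0;                       // -10 (slow) to 10 (fast)
            synth.Volume = 80;                    // 0 to 100
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);

            PromptBuilder prompt = new PromptBuilder();
            prompt.AppendText("You have ");
            prompt.AppendTextWithHint("3", SayAs.NumberCardinal);
            prompt.AppendText(" new messages.");

            synth.Speak(prompt);                  // synchronous; SpeakAsync is also available
        }
    }
}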

Apart from this analysis, some manual scripts can help in answering the most common questions without the trouble of creating a process.

    Chapter 6

6. Implementation and Coding

    6.1 Post Query Design

In Visual C#, you can use either the Windows Forms Designer or the Windows Presentation Foundation (WPF) Designer to quickly and conveniently create user interfaces. Whichever type of application you decide to build, designing a UI involves:

    Adding controls to the design surface.

    Setting initial properties for the controls.

    Writing handlers for specified events.

    Although you can also create your UI by manually writing your own code, designers enable

    you to do this work much faster.

    Adding Controls

In either designer, you use the mouse to drag controls, which are components with a visual representation such as buttons and text boxes, onto a design surface. As you work visually, the

    Windows Forms Designer translates your actions into C# source code and writes them into a

    project file that is named name.designer.cs where name is the name that you gave to the form.

    Similarly, the WPF designer translates actions on the design surface into Extensible

    Application Markup Language (XAML) code and writes it into a project file that is named

    Window.xaml. When your application runs, that source code (Windows Form) or XAML

    (WPF) will position and size your UI elements so that they appear just as they do on the

design surface.

    Setting Properties

    After you add a control to the design surface, you can use the Properties window to set its

    properties, such as background color and default text.

    In the Windows Form designer, the values that you specify in the Properties window are the

    initial values that will be assigned to that property when the control is created at run time. In

    the WPF designer, the values that you specify in the Properties window are stored as attributes

    in the window's XAML file.

    In many cases, those values can be accessed or changed programmatically at run time by

    getting or setting the property on the instance of the control class in your application. The

    Properties window is useful at design time because it enables you to browse all the properties,

    events, and methods supported on a control.

    Handling Events

    Programs with graphical user interfaces are primarily event-driven. They wait until a user does

    something such as typing text into a text box, clicking a button, or changing a selection in a

    listbox. When that occurs, the control, which is just an instance of a .NET Framework class,

    sends an event to your application. You have the option of handling an event by writing a

    special method in your application that will be called when the event is received.

    You can use the Properties window to specify which events you want to handle in your code.

    Select a control in the designer and click the Events button, with the lightning bolt icon, on the

    Properties window toolbar to see its events.

    When you add an event handler through the Properties window, the designer automatically

    writes the empty method body. You must write the code to make the method do something

    useful. Most controls generate many events, but frequently an application will only have to

    handle some of them, or even only one. For example, you probably have to handle a button's

Click event, but you do not have to handle its SizeChanged event unless you want to do

    something when the size of the button changes.
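For instance, a hedged sketch of what such a designer-generated handler might look like once filled in (the control name and message are illustrative placeholders, not taken from the project):

// Illustrative only: a Click handler with the signature the designer generates.
private void button1_Click(object sender, EventArgs e)
{
    MessageBox.Show("Button clicked - start listening for voice commands here.");
}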

    6.2 Prototype And Inception

The project is coded in the C# language. Speech recognition is the very first step in this process, so we start with that.

    Initialize the Speech Recognizer

To initialize an instance of the shared recognizer in Windows, we use:

SpeechRecognizer sr = new SpeechRecognizer();

    Create a Speech Recognition Grammar

One way to create a speech recognition grammar is to use the constructors and methods on the GrammarBuilder class.
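For instance, a minimal sketch (the command phrases are illustrative placeholders, not the project's actual command set) might build the grammar g used in the next step:

Choices commands = new Choices(new string[] { "hello jarvis", "what time is it", "play music" });
Grammar g = new Grammar(new GrammarBuilder(commands));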

Load the Grammar into the Speech Recognizer

    After the grammar is created, it must be loaded into the speech recognizer. The following

example loads the grammar by calling the LoadGrammar(Grammar) method, passing the

    grammar created in the previous operation.

    sr.LoadGrammar(g);

    Register for Speech Recognition Event Notification

The speech recognizer raises a number of events during its operation, including the SpeechRecognized event. For more information, see Use Speech Recognition Events. The speech recognizer raises the SpeechRecognized event when it matches a user utterance with a grammar. An application registers for notification of this event by appending an EventHandler instance as shown in the following example. The argument to the EventHandler constructor, sr_SpeechRecognized, is the name of the developer-written event handler.

sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);

    Create a Speech Recognition Event Handler

    When you register a handler for a particular event, the Intellisense feature in Microsoft Visual

    Studio creates a skeleton event handler if you press the TAB key. This process ensures that

parameters of the correct type are used. The handler for the SpeechRecognized event shown in the following example displays the text of the recognized word or phrase using the Result property on the SpeechRecognizedEventArgs parameter, e.

void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)

    {

    MessageBox.Show(e.Result.Text);

    }

The System.Speech.Synthesis namespace has been used to synthesize the speech, and it gets under way as follows:
using System;
using System.Speech.Synthesis;

namespace SampleSynthesis
{
    class Program
    {
        static void Main(string[] args)
        {
            SpeechSynthesizer synth = new SpeechSynthesizer();
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("This example demonstrates a basic use of Speech Synthesizer");
            Console.WriteLine();
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();

        }
    }
}
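The example above uses the blocking Speak call. Elsewhere in this project (for instance in the RSSReader class in section 6.5) the non-blocking SpeakAsync method is used instead, so that speech output does not hold up the rest of the application while a sentence is being spoken.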

System.Diagnostics.Process.Start can be used to launch the application or file associated with the commanded text. One of its overloads is shown below:

public static Process Start(
    string fileName,
    string arguments,
    string userName,
    SecureString password,
    string domain
)
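In practice the simpler Process.Start(string fileName) overload is enough to launch a program for a spoken command. The mapping below is a minimal sketch using two commands from the list in section 6.4; the method name ExecuteCommand and the chosen executables are assumptions made for illustration only, not the project's actual mapping.

using System.Diagnostics;

class LaunchSketch
{
    // Translates a recognized phrase into a process launch.
    public static void ExecuteCommand(string recognizedText)
    {
        switch (recognizedText)
        {
            case "paint":
                Process.Start("mspaint.exe");
                break;
            case "take screenshot":
                Process.Start("SnippingTool.exe");
                break;
            default:
                // Commands that are not simple program launches are handled elsewhere.
                break;
        }
    }
}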

    6.4 Default Commands.TXT:

    Hello

    Hello Jarvis

    Goodbye

    Goodbye Jarvis

    Close Jarvis

    Jarvis

    Stop talking

    What's my name?

    What time is it

    What day is it

    Whats todays date

    Whats the date

    Hows the weather

    Whats the weather like

    Whats it like outside

    What will tomorrow be like

    Whats tomorrows forecast

    Whats tomorrow like

    Whats the temperature


    Whats the temperature outside

    Play music

    Play a random song

    You decide

    Play

    Pause

    Turn Shuffle On

    Turn Shuffle Off

    Next Song

    Previous Song

    Fast Forward

    Stop Music

    Turn Up

    Turn Down

    Mute

    Unmute

    What song is playing

    Fullscreen

    Exit Fullscreen

    Play video

    next window

    select all

    copy

    paste

    print this page

    Close window

    Out of the way

    Come back

    Show default commands

    Show shell commands

    Show web commands


    Show social commands

    Show Music Library

    Show Video Library

    Show Email List

    Show listbox

    Hide listbox

    Shutdown

    Log off

    Restart

    Abort

    I want to add custom commands

    I want to add a custom command

    I want to add a command

    Update commands

    Set the alarm

    What time is the alarm

    Clear the alarm

    Stop listening

    JARVIS Come Back Online

    Refresh libraries

    Change video directory

    Change music directory

    Check for new emails

    Read the email

    Open the email

    Next email

    Previous email

    Clear email list

    Change Language

    Check for new updates

    Yes


    No

    back

    new folder

    take screenshot

    paint

    go up

    go down

    save

    save as

    delete

    cut

    away

    reload

    start presentation

    next slide

    previous slide

    end presentation

    zoom in

    hold control
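Commands such as those above are stored one per line in a plain text file. A minimal sketch of how such a file can be read and loaded into a recognition grammar follows; the file name DefaultCommands.txt, the class and the method name are assumptions made for illustration.

using System.IO;
using System.Linq;
using System.Speech.Recognition;

class CommandLoaderSketch
{
    // Reads one command per line and loads the whole set as a single grammar.
    public static void LoadCommands(SpeechRecognizer recognizer)
    {
        // Skip blank lines so the Choices builder only receives real phrases.
        string[] commands = File.ReadAllLines("DefaultCommands.txt")
                                .Where(line => !string.IsNullOrWhiteSpace(line))
                                .ToArray();

        Choices choices = new Choices(commands);
        Grammar grammar = new Grammar(new GrammarBuilder(choices));
        recognizer.LoadGrammar(grammar);
    }
}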

    6.5 RSS_Reader

    using System;

    using System.Linq;

    using System.Text;

    using CustomizeableJarvis.Properties;

    using System.Xml;

    using System.Xml.Linq;

    using System.Net;

    namespace CustomizeableJarvis

    {


    class RSSReader

    {

    public static void CheckForEmails()

    {

    string GmailAtomUrl = "https://mail.google.com/mail/feed/atom";

    XmlUrlResolver xmlResolver = new XmlUrlResolver();

    xmlResolver.Credentials = new NetworkCredential(Settings.Default.GmailUser,

    Settings.Default.GmailPassword);

    XmlTextReader xmlReader = new XmlTextReader(GmailAtomUrl);

    xmlReader.XmlResolver = xmlResolver;

    try

    {

    XNamespace ns = XNamespace.Get("http://purl.org/atom/ns#");

    XDocument xmlFeed = XDocument.Load(xmlReader);

    var emailItems = from item in xmlFeed.Descendants(ns + "entry")

    select new

    {

    Author = item.Element(ns + "author").Element(ns + "name").Value,

    Title = item.Element(ns + "title").Value,

    Link = item.Element(ns + "link").Attribute("href").Value,

    Summary = item.Element(ns + "summary").Value

    };

    frmMain.MsgList.Clear(); frmMain.MsgLink.Clear();

    foreach (var item in emailItems)

    {

    if (item.Title == String.Empty)

    {

    frmMain.MsgList.Add("Message from " + item.Author + ", There is no subject

  • 8/10/2019 Expert System Voice Assistant

    43/52

    36

    and the summary reads, " + item.Summary);

    frmMain.MsgLink.Add(item.Link);

    }

    else

    {

    frmMain.MsgList.Add("Message from " + item.Author + ", The subject is " +

    item.Title + " and the summary reads, " + item.Summary);

    frmMain.MsgLink.Add(item.Link);

    }

    }

    if (emailItems.Count() > 0)

    {

    if (emailItems.Count() == 1)

    {

    frmMain.Jarvis.SpeakAsync("You have 1 new email");

    }

else { frmMain.Jarvis.SpeakAsync("You have " + emailItems.Count() + " new emails"); }

    }

    else if (frmMain.QEvent == "Checkfornewemails" && emailItems.Count() == 0)

    { frmMain.Jarvis.SpeakAsync("You have no new emails"); frmMain.QEvent =

    String.Empty; }

    }

    catch { frmMain.Jarvis.SpeakAsync("You have submitted invalid log in information");

    }

    }

    public static void GetWeather()

    {

    try


    {

    string query = String.Format("http://weather.yahooapis.com/forecastrss?w=" +

    Settings.Default.WOEID.ToString() + "&u=" + Settings.Default.Temperature);

    XmlDocument wData = new XmlDocument();

    wData.Load(query);

    XmlNamespaceManager man = new XmlNamespaceManager(wData.NameTable);

    man.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");

    XmlNode channel = wData.SelectSingleNode("rss").SelectSingleNode("channel");

    XmlNodeList nodes = wData.SelectNodes("/rss/channel/item/yweather:forecast",

    man);

frmMain.Temperature = channel.SelectSingleNode("item").SelectSingleNode("yweather:condition", man).Attributes["temp"].Value;

frmMain.Condition = channel.SelectSingleNode("item").SelectSingleNode("yweather:condition", man).Attributes["text"].Value;

frmMain.Humidity = channel.SelectSingleNode("yweather:atmosphere", man).Attributes["humidity"].Value;

frmMain.WinSpeed = channel.SelectSingleNode("yweather:wind", man).Attributes["speed"].Value;

frmMain.Town = channel.SelectSingleNode("yweather:location", man).Attributes["city"].Value;

frmMain.TFCond = channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast", man).Attributes["text"].Value;

frmMain.TFHigh = channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast", man).Attributes["high"].Value;

frmMain.TFLow = channel.SelectSingleNode("item").SelectSingleNode("yweather:forecast", man).Attributes["low"].Value;

    frmMain.QEvent = "connected";

    }

    catch { frmMain.QEvent = "failed"; }

    }

    public static void CheckBloggerForUpdates()

    {

    if (frmMain.QEvent == "UpdateYesNo")

    {

    frmMain.Jarvis.SpeakAsync("There is a new update available. Shall I start the

    download?");

    }

    else

    {

    String UpdateMessage;

    String UpdateDownloadLink;

    string AtomFeedURL = "http://google.com";

    XmlUrlResolver xmlResolver = new XmlUrlResolver();

    XmlTextReader xmlReader = new XmlTextReader(AtomFeedURL);

    xmlReader.XmlResolver = xmlResolver;

    XNamespace ns = XNamespace.Get("http://www.w3.org/2005/Atom");


    XDocument xmlFeed = XDocument.Load(xmlReader);

    var blogPosts = from item in xmlFeed.Descendants(ns + "entry")

    select new

    {

    Post = item.Element(ns + "content").Value

    };

    foreach (var item in blogPosts)

    {

    string[] separator = new string[] { "
    " };

    string[] data = item.Post.Split(separator, StringSplitOptions.None);

    UpdateMessage = data[0];

    UpdateDownloadLink = data[1];

    if (UpdateDownloadLink == Properties.Settings.Default.RecentUpdate)

    {

    frmMain.QEvent = String.Empty;

    frmMain.Jarvis.SpeakAsync("No new updates have been posted");

    }

    else

    {

    frmMain.Jarvis.SpeakAsync("A new update has been posted. The description

    says, " + UpdateMessage + ".");

System.Windows.Forms.MessageBox.Show(UpdateMessage, "Update Message");

    frmMain.Jarvis.SpeakAsyncCancelAll();

    frmMain.Jarvis.SpeakAsync("Would you like me to download the update?");

    frmMain.QEvent = "UpdateYesNo";

    Properties.Settings.Default.RecentUpdate = UpdateDownloadLink;

    Properties.Settings.Default.Save();

    }

}
}
}
}
}
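For completeness, the following is a minimal sketch, not taken from the project source, of how the recognition handler on the main form could dispatch some of the default commands to the RSSReader methods above; the class and method names in the sketch are assumptions made for illustration.

// Assumed to live in the CustomizeableJarvis namespace alongside RSSReader and frmMain.
class CommandDispatchSketch
{
    // recognizedText would come from e.Result.Text in the SpeechRecognized handler.
    public static void HandleCommand(string recognizedText)
    {
        switch (recognizedText)
        {
            case "Check for new emails":
                RSSReader.CheckForEmails();
                break;
            case "Hows the weather":
            case "Whats the weather like":
                RSSReader.GetWeather();
                break;
            case "Check for new updates":
                RSSReader.CheckBloggerForUpdates();
                break;
        }
    }
}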


    Chapter 7

    7.1 Conclusion and Future work:

In this project we have presented a simple mechanism that can eliminate the excess use of Natural Language Processing, taking us another step closer to the ideal expert voice assistant. However, there is still a lot of scope for research on this topic, and the Switch State Mechanism only offers a partial solution, one that addresses the responsiveness issue, that is, the computation time needed to understand a command.

In this project the expert voice assistant mainly uses human communication channels such as Twitter, instant messaging and voice to create a two-way connection between a user and his computer: controlling the machine and its applications, notifying him of breaking news, Facebook notifications and much more. In our project we mainly use voice as the communication channel, so ESVA is essentially a speech recognition application. The concept of speech technology really encompasses two technologies: a synthesizer and a recognizer. A speech synthesizer takes text as input and produces an audio stream as output. A speech recognizer, on the other hand, does the opposite: it takes an audio stream as input and turns it into a text transcription. The voice is a signal carrying a great deal of information, and directly analyzing and synthesizing the complex voice signal is difficult because of how much information it contains. Therefore digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal. In this project we directly use a speech engine whose feature extraction technique is mel-scaled frequency cepstral analysis. The mel-scaled frequency cepstral coefficients (MFCCs), derived from the Fourier transform and filter bank analysis, are perhaps the most widely used front ends in state-of-the-art speech recognition systems. Our aim is to create more and more functionality that can assist people in their daily lives and reduce their effort. In our tests we checked that all of this functionality works properly.

In the future this is going to be one of the most prominent technologies in the technical world. This application may not yet fulfill every command a user wants it to have, but in future the range and form of the commands can be extended, and language support can be extended as well.


    Chapter 8

    8.1 Snapshot of the GUI
