
Mobility Assistant for Visually Impaired (MAVI) on Cloud

A thesis submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY & MASTER OF TECHNOLOGY

in

Computer Science & Engineering

by

Akhil Soni
2013CS50300

Under the guidance of

Prof. M. Balakrishnan
Dr. Chetan Arora

Department of Computer Science and Engineering,
Indian Institute of Technology Delhi.

June 2018

Certificate

This is to certify that the thesis titled Mobility Assistant for Visually Impaired (MAVI) on Cloud being submitted by Akhil Soni for the award of Bachelor of Technology & Master of Technology in Computer Science & Engineering is a record of bona fide work carried out by him under my guidance and supervision at the Department of Computer Science & Engineering. The work presented in this thesis has not been submitted elsewhere, either in part or in full, for the award of any other degree or diploma.

Prof. M. Balakrishnan

Department of Computer Science and Engineering

Indian Institute of Technology, Delhi

Dr. Chetan Arora

Department of Computer Science and Engineering

Indraprastha Institute of Information Technology, Delhi

Abstract

Mobility Assistant for Visually Impaired (MAVI) is a device that aims to improve the lives of visually impaired people in terms of safety, social inclusion and navigation. This thesis implements a cloud-based, online solution for MAVI. Initially, the available cloud services were identified and the best-fitting service was chosen. A simple prototype was first developed on my local system, and then a prototype of MAVI on cloud was developed on a Raspberry Pi.

Network latency analysis was performed in order to improve the solution. A detailed survey was done in order to estimate the cloud run times. The accuracies of all the MAVI modules that were ported to the cloud (Face Detection, Cow and Dog Detection, and OCR) were reported. The energy consumption was also calculated.

In the final phases, a comparative study of batch processing of images was performed to show the effect of batch size on the total run time. The PivotHead smart camera and a USB webcam were compared to identify the better fit for our application.

Finally, a fully functional end-to-end MAVI on Cloud prototype was developed, comprising a Raspberry Pi, a USB webcam and the Android app.

Acknowledgments

I would like to thank my supervisor, Prof. M. Balakrishnan, for providing me with the opportunity to work on this interesting project as my M.Tech Project. His unfailing support, guidance and help have been invaluable during the course of this project. I am grateful for all the help I received from him.

I would also like to thank Dr. Chetan Arora for his valuable insights and assistance regarding the subject of Computer Vision.

Mr. Rajesh Kedia and Mr. Anupam Sobti played an integral role in my project. I sincerely thank them for all their insights, experience, coding expertise and efforts that have really helped me.

I also extend my thanks to my friends Deepanker Mishra (2013CS50282), Garvit Jain (2013CS50284) and Akhil Masa (2013MT60602) for the insights and coding expertise that I received from them at various instances.

Special thanks to Mr. S. D. Sharma for providing me with all the lab equipment and support. This thesis is not just the result of my efforts but an outcome of the efforts of several individuals.

Akhil Soni

Contents

1 Prelude
1.1 Introduction
1.2 Motivation and Objective
1.3 Thesis Contribution
1.4 Thesis Outline

2 Survey of Cloud Services
2.1 Introduction
2.2 Google Vision API
2.3 Microsoft Computer Vision API
2.4 Amazon Rekognition
2.5 IBM Watson Visual Recognition
2.6 SkyBiometry
2.7 Comparison and Conclusion

3 Setup and Testbed for the evaluation
3.1 Introduction
3.1.1 Getting Started
3.1.2 Virtual Machine on Google Cloud
3.1.3 Connecting to the Server
3.2 Phase1 : On PC
3.2.1 Installing Pre-Requisites
3.2.2 Running the Vision API
3.3 Phase2 : On Raspberry Pi
3.3.1 Installing Pre-Requisites
3.3.2 Running the Vision API

© 2018, Indian Institute of Technology Delhi


4 Experimental Setup
4.1 Network Latency
4.1.1 General Methodology
4.1.2 Face Detection
4.1.3 OCR
4.1.4 Animal Detection
4.1.5 Diverse Images
4.1.6 Experiment Details
4.2 Energy Consumption
4.2.1 Current Measurements
4.2.2 Energy Measurements

5 Results
5.1 Network Latency Analysis
5.1.1 Face Detection
5.1.2 Animal Detection
5.1.3 OCR
5.1.4 Diverse Images
5.1.5 Mean and Standard Deviation
5.2 Cloud Run Time Analysis
5.2.1 Face Detection
5.2.2 Animal Detection
5.2.3 OCR
5.2.4 Diverse Images
5.2.5 Mean and Standard Deviation
5.3 Accuracy
5.3.1 Face Detection
5.3.2 Animal Detection
5.3.3 OCR
5.4 Energy Consumption Analysis
5.4.1 Current Readings
5.4.2 Energy Readings

6 Batch Processing
6.1 Introduction
6.2 Experimental Setup
6.3 Results

7 Prototype
7.1 Capturing Images
7.1.1 PivotHead Smart Camera
7.1.2 USB Web Camera
7.2 Prototype1 - Using PivotHead
7.3 Prototype2 - Using USB Web Camera
7.4 PivotHead Vs WebCam

8 Conclusions
8.1 Cloud Service Used
8.2 Network Latency
8.3 Cloud Run Time
8.4 Energy Consumption
8.5 Final Prototype

Bibliography

Chapter 1

Prelude

1.1 Introduction

Mobility Assistant for Visually Impaired (MAVI)

MAVI is an ambitious project aimed at enabling mobility for visually impaired individuals, especially in India. The three major problems that MAVI tackles are Safety, Social Inclusion and Navigation. An overview of the system is shown in the figure below [6]:

Figure 1.1: MAVI Overview

1.2 Motivation and Objective

Being blessed with all the basic sensory organs, we do not realize the importance of any of them unless we come across someone who is deprived of one or more of the basic senses. Eyes are among the most important sensory organs a human needs to function, and MAVI aims to help visually impaired people perceive as much of their surroundings as they can. Being visually impaired is really challenging: imagine having to live one's entire life with one's eyes closed. Visual impairment is therefore one of the most severe disabilities a person can endure.

With MAVI, we are trying to solve serious problems that can be solved using current technologies. With the advancement in cloud technologies, a lot of computation can be done on the cloud, which might save time compared with running the same algorithms locally. Also, with more people gaining access to Internet connectivity every day, cloud computation becomes an increasingly practical option. For example, the face detection service offered by a cloud service provider might be significantly faster than an algorithm developed for performing face detection locally. Hence, the objective of this thesis is to implement a cloud solution for MAVI, perform network latency analysis and compute the accuracy of the cloud services.

1.3 Thesis Contribution

The thesis focuses on implementing a cloud solution for MAVI and performing an in-depth analysis of the entire system. Initially, the various cloud services available were explored and the best fit for our application (MAVI) was identified. Once the cloud service was identified, the system was set up on my PC: an image is uploaded from the PC to the cloud and the cloud returns the result. Once the PC setup was working correctly, the system was ported to a Raspberry Pi. Initially the images were stored locally on the memory card of the Raspberry Pi; later the system took images dynamically from the camera, and an end-to-end MAVI on Cloud system was developed.

The final part of the thesis focuses on the analysis of the system developed. An in-depth network analysis was done on three different network types: 3G, 4G and IIT Delhi campus WiFi. Batch processing of images was also performed and analyzed. Finally, the accuracy of the system was measured.

1.4 Thesis Outline

The thesis comprises 8 chapters, including this one. Chapter 2 surveys and compares the various cloud services available. Chapter 3 describes the setup on the local PC and on the Raspberry Pi. Chapter 4 describes the experimental setup, detailing how the experiments were carried out, the datasets used for analysis, the tools used, etc. Chapter 5 presents the results of the accuracy, latency and cloud run time analysis. Chapter 6 describes the batch processing experiment. Chapter 7 presents the prototype developed. Chapter 8 concludes and summarizes the entire work, with the references attached at the end.


Chapter 2

Survey of Cloud Services

2.1 Introduction

In this chapter, the various cloud services available online are explored and compared with each other. Tech giants like Google, Amazon, Microsoft and IBM provide vision APIs backed by their own computer vision algorithms, offering features such as Face Detection, Optical Character Recognition (OCR), Logo Detection and several others. There are also cloud services which cater only to specific features. SkyBiometry is one such service, which covers only face detection and recognition. Let us now analyze the various cloud services available.

2.2 Google Vision API

Google's Vision API is one of the strongest offerings among the cloud service providers surveyed. It provides features such as Face Detection, Label Detection and Logo Detection, as well as Landmark Detection and Explicit Content Detection. Optical Character Recognition (OCR) is also supported. The OCR can detect text in a total of 56 languages [8], including English and various Indian languages such as Marathi, Tamil, Sanskrit and Bengali. The modules of MAVI which could be ported to the cloud using Google's Vision API are Face Detection, Sign Board Detection (through OCR) and Animal Detection for cows and dogs (through Label Detection).



2.3 Microsoft Computer Vision API

Microsoft's Vision API provides features such as categorizing images, identifying image types, flagging adult content, Optical Character Recognition (OCR), and several others such as generating thumbnails and perceiving color schemes [7]. Its OCR supports a total of 25 languages, which include English but, unlike Google's Vision API, none of the Indian languages. This is considerably fewer than the languages supported by Google's Vision API. The image categorization feature of Microsoft's Vision API is analogous to Google's Label Detection: it classifies an image into several categories, including animals like cow and dog. The MAVI modules which could be ported to the cloud using Microsoft's Vision API are Face Detection, Sign Board Detection through OCR (English only) and Animal Detection for cows and dogs (through image categorization).

2.4 Amazon Rekognition

Amazon Rekognition provides features such as Object and Scene Detection, Facial Recognition, Facial Analysis, Face Comparison, Unsafe Image Detection, Celebrity Detection and Text in Image (OCR). For text detection, Amazon Rekognition supports text in most Latin scripts. It recognizes up to 50 sequences of characters per image and lists them as words and lines. This feature only recognizes text oriented within +/- 30 degrees of horizontal. Facial Recognition finds similar faces in a large collection of images [10]. The modules of MAVI that can be ported to the cloud using Amazon Rekognition are Face Detection, Face Recognition, Animal Detection for cows and dogs (through Object and Scene Detection) and Optical Character Recognition (through Text in Image).



2.5 IBM Watson Visual Recognition

IBM Watson Visual Recognition provides Image Classification and Face Detection. Image Classification assigns the image to classes such as bench, dog, bush, swings, text and a few others [4]. IBM Watson Visual Recognition supports just 8 languages: English, French, Spanish, German, Korean, Japanese, Italian and Arabic. There is no support for any Indian language. The MAVI modules that can be ported to the cloud using IBM Watson Visual Recognition are Face Detection, Text Detection and Animal Detection (using Image Classification).

2.6 SkyBiometry

SkyBiometry is a specialized tool meant specifically for Facial Detection and Recognition. It offers no other features.

2.7 Comparison and Conclusion

Table 2.1: Cloud Services Comparison

Service        FD     FR     AD     OCR
Google         Yes    No     Yes    Yes (EN, HI)
Microsoft      Yes    No     Yes    Yes (EN)
Amazon         Yes    Yes    Yes    Yes (EN)
IBM            Yes    No     Yes    Yes (EN)
SkyBiometry    Yes    Yes    No     No

FD - Face Detection

FR - Face Recognition

AD - Animal Detection

OCR - Optical Character Recognition



The above table summarizes the various cloud services available with respect to the MAVI modules. Based on it, Google's Vision API was chosen over the others because it provides OCR support in both English and Hindi. There is also extensive support available on the web for Google's Vision API. For these two reasons, I chose to build MAVI on Cloud with Google's Vision API. The features used from Google's Vision API are Face Detection for detecting faces, Label Detection for detecting animals (cows and dogs) and OCR for detecting the text in SignBoards.


Chapter 3

Setup and Testbed for the evaluation

3.1 Introduction

Once the cloud service is finalized, the next step is to set up the system and get it working. This was done in two phases. In the first phase, the setup was done on my local PC, while in the second phase, the setup was ported to the Raspberry Pi.

3.1.1 Getting Started

Google provides a free trial of its cloud services with free credits worth $300, valid for one year. For billing purposes, Google defines a term called units: each feature that is applied to an image is one billable unit. For example, if one applies Face Detection and Label Detection to the same image, the user is billed for 1 unit of Face Detection and 1 unit of Label Detection. Google's pricing is depicted in the figure below:

Figure 3.1: Google Pricing as of 19/06/18 [3]
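The unit-counting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule "one feature applied to one image is one unit"; the function name and the feature strings are assumptions made for this example, and no prices are assumed.

```python
from collections import Counter


def billable_units(requests):
    """Count billable units per feature.

    `requests` holds one entry per image; each entry is the list of feature
    names applied to that image. Every (image, feature) pair is one unit.
    """
    units = Counter()
    for features in requests:
        for feature in set(features):  # a feature counts once per image
            units[feature] += 1
    return dict(units)


# One image with Face Detection and Label Detection, one with Face Detection only.
example = billable_units([
    ["FACE_DETECTION", "LABEL_DETECTION"],
    ["FACE_DETECTION"],
])
print(example)
```

Running all three features on every image therefore costs three units per image, which is worth keeping in mind when comparing the single-feature and all-feature experiments.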



3.1.2 Virtual Machine on Google Cloud

The next step was to create a Virtual Machine (VM) on the Google Cloud. This VM is the host for all of our cloud computations: all the cloud processing happens on it. The VM has the following specifications:

Zone - us-central1-a

1 shared vCPU, 0.6 GB memory

Operating System - Ubuntu 16.04

SSD Persistent Disk - 15 GB

Overall Rent/month - $4.34

These specifications were sufficient for the purposes of the project and were chosen to minimize the monthly VM rent.

3.1.3 Connecting to the Server

Once the VM is set up and running, the next step is to establish a connection to the VM from the local machine or the Raspberry Pi. SSH keys are used to connect to the server. SSH keys can be generated in numerous ways; here PuTTYgen has been used to generate a private and public key pair. This pair of keys is used to connect to the server on the Google Cloud. With the VM set up, the next step is to run the Vision API.

3.2 Phase1 : On PC

In the first phase, the aim was to run the system successfully from the local PC. PuTTY was used to connect to the VM hosted on the Google Cloud, and FileZilla to transfer files between the systems. The following sections describe the pre-requisite software and packages that need to be installed on the VM before one can actually run the Vision API.

3.2.1 Installing Pre-Requisites

The pre-requisites are Python pip, the Google Cloud library for Python and the credentials for the Vision API. The credentials are created from the console GUI of Google Cloud. The page looks like this:

Figure 3.2: Credentials [1]

Clicking on the Create credentials button creates a JSON file, which is to be uploaded to the VM. Executing the following commands in sequence then installs the pre-requisites and completes the Google Cloud setup.

sudo apt-get install python-pip

sudo pip install google-cloud

pip install --upgrade pip

nano $HOME/.profile

export GOOGLE_APPLICATION_CREDENTIALS=$PATH

source $HOME/.profile

Here $PATH stands for the path where the JSON file is stored; it is a placeholder, not the shell's own PATH variable. The export line is the line added to the file opened with nano.

3.2.2 Running the Vision API

In this setup, the images were stored on my local system. Once all the pre-requisites are met, a generalized workflow takes an image as input and produces the desired output. It comprises three scripts. The first runs on the VM: it runs the Vision API and writes the output to a txt file on the VM itself. The second uploads the image from the local system to the VM, and the third brings the generated txt file back to the local system.
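The three-step flow described above (upload, run on the VM, fetch the txt file) could be sketched as follows. This is an assumption-laden illustration, not the scripts used in the thesis: it uses `scp` and `ssh` in place of the PuTTY and FileZilla tools mentioned earlier, and the remote directory `~/mavi`, the remote script name `run_vision.py`, the host string and the key file are all hypothetical.

```python
import subprocess


def pipeline_commands(image, host, key, remote_dir="~/mavi"):
    """Return the three commands: upload the image, run the Vision API
    script on the VM, and fetch the resulting txt file.

    remote_dir and run_vision.py are hypothetical names for illustration.
    """
    name = image.rsplit("/", 1)[-1]
    result = name.rsplit(".", 1)[0] + ".txt"
    upload = ["scp", "-i", key, image, f"{host}:{remote_dir}/{name}"]
    run = ["ssh", "-i", key, host,
           f"python {remote_dir}/run_vision.py {remote_dir}/{name} "
           f"> {remote_dir}/{result}"]
    download = ["scp", "-i", key, f"{host}:{remote_dir}/{result}", result]
    return [upload, run, download]


def run_pipeline(image, host, key, runner=subprocess.check_call):
    """Execute the three steps in order; `runner` can be swapped for a dry run."""
    for command in pipeline_commands(image, host, key):
        runner(command)
```

Passing `runner=print` gives a dry run that only shows the commands, which is a convenient way to check the paths before touching the VM.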


3.3 Phase2 : On Raspberry Pi

3.3.1 Installing Pre-Requisites

The procedure to install the pre-requisites on the Raspberry Pi is very similar to that discussed earlier. The pre-requisites remain the same: installing Python pip, installing the Google Cloud library for Python and loading the credentials for the Vision API. Executing the following commands in sequence installs all the pre-requisites [2]:

sudo apt-get install python-pip

sudo pip install --upgrade pip

sudo apt-get install libjpeg8-dev

sudo pip install --upgrade google-api-python-client

sudo pip install --upgrade Pillow

sudo su

sudo nano $HOME/.bashrc

export GOOGLE_APPLICATION_CREDENTIALS=$PATH

source $HOME/.bashrc

Here $PATH stands for the path where the JSON file is stored; it is a placeholder, not the shell's own PATH variable. The export line is the line added to the file opened with nano.

3.3.2 Running the Vision API

In this case, once the pre-requisites are installed, there is no need to upload the image to the VM as a separate step, so a single script suffices. The script takes as input an image stored on the SD card of the Raspberry Pi, uploads it to the cloud and writes the output it receives back to the SD card. The script on the Raspberry Pi differs from the scripts used on the local system in the part where the image is uploaded to the cloud. On the Raspberry Pi, a single request both uploads the image and downloads the JSON output from the Google Cloud, whereas on the PC separate scripts handled uploading the image, running the Vision API and downloading the output.
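To illustrate what "a single request" can look like, the sketch below builds the JSON body of a Vision API `images:annotate` call, in which the image travels as base64 content and the requested features are listed alongside it. It is a minimal sketch, not the thesis script: the helper name is an assumption, authentication is omitted, and the request is only constructed, not sent.

```python
import base64
import json

# REST endpoint of the Vision API annotate method.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(
    image_bytes,
    features=("FACE_DETECTION", "LABEL_DETECTION", "TEXT_DETECTION"),
):
    """Build the request body: one image plus the features to run on it."""
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [{
            "image": {"content": content},
            "features": [{"type": feature} for feature in features],
        }]
    }


# Placeholder bytes stand in for a real image read from the SD card.
body = build_annotate_request(b"\xff\xd8\xff", features=("FACE_DETECTION",))
payload = json.dumps(body)
```

Because the annotations come back in the response to the same call, upload, cloud processing and download all fall within one request and response pair, which is what the latency measurements later break apart.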


Chapter 4

Experimental Setup

In this chapter, the experimental setup is described, along with how the various experiments were conducted in order to perform the Network Latency and Energy Consumption analysis. The setup for Network Latency is described first, followed by Energy Consumption.

4.1 Network Latency

4.1.1 General Methodology

By default, the Raspberry Pi connects to a hotspot with the SSID MAVI hotspot and the password Mavi@123. The images are stored on the memory card inserted in the Raspberry Pi. All the experiments use the MAVI datasets: the MAVI Face Detection dataset, the MAVI Cow and Dog dataset and the MAVI SignBoard dataset, for analyzing face images, animal images and OCR images respectively. Network behavior during the experiments is captured with Tshark, the command-line tool of Wireshark. The Tshark output is directed to a txt file, and a parser written in Python extracts meaningful information from it, namely the image upload time, the cloud run time and the JSON download time.

The parsed data is then transferred to an Excel sheet and processed further, and individual Matlab scripts are written to plot the graphs for Face Detection, Animal Detection and OCR respectively.
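The kind of split the parser performs can be sketched as below. The actual Tshark output format and parser logic are not given here, so this example assumes a simplified, hypothetical record format: each packet is a tuple of timestamp in seconds, direction ("up" for device to cloud, "down" for cloud to device) and payload size in bytes.

```python
def split_timings(packets):
    """Split one request into upload, cloud run and download durations.

    Assumes packets are sorted by time and belong to a single request.
    Upload spans the first to the last upstream payload packet, cloud run
    time is the gap until the first downstream payload packet after that,
    and download spans the downstream payload packets.
    """
    up = [t for t, direction, size in packets if direction == "up" and size > 0]
    if not up:
        raise ValueError("no upstream payload packets")
    last_up = up[-1]
    down = [t for t, direction, size in packets
            if direction == "down" and size > 0 and t > last_up]
    if not down:
        raise ValueError("no downstream payload packets after the upload")
    return {
        "upload": last_up - up[0],
        "cloud_run": down[0] - last_up,
        "download": down[-1] - down[0],
    }


# Hypothetical trace for illustration only, not measured data.
trace = [
    (0.0, "up", 1400), (0.5, "up", 1400), (1.2, "up", 600),
    (1.9, "down", 900), (1.95, "down", 300),
]
timings = split_timings(trace)
```

A real capture also contains handshake and acknowledgement packets, so a working parser would need filtering rules beyond the payload-size check used in this sketch.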


The output of Tshark looks like the figure given below:

Figure 4.1: Tshark Output

4.1.2 Face Detection

The images are categorized into three classes:

• Class 1 - Images containing 1 face.

• Class 2 - Images containing 2 faces.

• Class 3 - Images containing 4 faces.

A total of 25 images are chosen from each class. Two types of experiments are done:

• Experiment 1 - Only the Face Detection algorithm of the Vision API is run (on all three classes of images).

• Experiment 2 - All algorithms of the Vision API (Face Detection, Label Detection and Text Detection) are run (on all three classes of images).

Hence, there are two sets of experiments, each performed over 75 images. The images are chosen such that the required number of faces is detected in each of them, so that the Cloud Run Time measurements remain consistent.

The response JSON has a field named Face Annotations (faceAnnotations), which reports how many faces are detected and the bounding box co-ordinates of each face. I look for this field in the response JSON.
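Reading the face count and bounding boxes from the response can be sketched as follows. The sample response is hand-written for illustration and is not real API output; the helper name is an assumption, and missing `x` or `y` values are treated as 0 on the assumption that zero co-ordinates may be omitted from the JSON.

```python
def summarize_faces(response):
    """Return the number of faces and one list of (x, y) corners per face.

    `response` is a single entry of the "responses" list in the JSON reply.
    """
    faces = response.get("faceAnnotations", [])
    boxes = [
        [(v.get("x", 0), v.get("y", 0))
         for v in face["boundingPoly"]["vertices"]]
        for face in faces
    ]
    return len(faces), boxes


# Hand-written sample with one face, for illustration only.
sample = {
    "faceAnnotations": [{
        "boundingPoly": {"vertices": [
            {"x": 10, "y": 20}, {"x": 110, "y": 20},
            {"x": 110, "y": 140}, {"x": 10, "y": 140},
        ]},
    }]
}
count, boxes = summarize_faces(sample)
```

An image in which no face is found simply lacks the field, so the helper returns a count of 0 rather than raising an error.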

4.1.3 OCR

The images are classified into two classes:

• Class 1 - Images containing SignBoards with just English text.

• Class 2 - Images containing SignBoards with both English and Hindi text.

A total of 25 images are chosen from each class. Two types of experiments are done:

• Experiment 1 - Only the Text Detection algorithm of the Vision API is run (on both classes of images).

• Experiment 2 - All algorithms of the Vision API (Face Detection, Label Detection and Text Detection) are run (on both classes of images).

Hence, there are two sets of experiments, each performed over 50 images. The response JSON has a field named Text Annotations (textAnnotations), which reports the text detected and the bounding box co-ordinates of that text. I look for this field in the response JSON.



4.1.4 Animal Detection

There is only one class of images here, containing cows and dogs, with a total of 25 images. Two types of experiments are done:

• Experiment 1 - Only the Label Detection algorithm of the Vision API is run.

• Experiment 2 - All algorithms of the Vision API (Face Detection, Label Detection and Text Detection) are run.

Hence, there are two sets of experiments, each performed over 25 images. The response JSON has a field named Label Annotations (labelAnnotations), which reports the labels that are detected. I look for this field in the response JSON.
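Checking the labels for cows and dogs can be sketched as below. The helper name, the default label set and the score threshold are assumptions for illustration; the API may also describe an animal with a different word than the one expected (for example a breed or a broader category), so the wanted labels would need tuning against real responses.

```python
def detected_animals(response, wanted=("cow", "dog"), min_score=0.5):
    """Return {label: score} for wanted labels found in one response entry.

    min_score is an illustrative threshold, not a value from the thesis.
    """
    found = {}
    for label in response.get("labelAnnotations", []):
        name = label.get("description", "").lower()
        score = label.get("score", 0.0)
        if name in wanted and score >= min_score:
            found[name] = score
    return found


# Hand-written sample for illustration only.
sample = {"labelAnnotations": [
    {"description": "Dog", "score": 0.97},
    {"description": "Grass", "score": 0.88},
    {"description": "Cow", "score": 0.31},
]}
animals = detected_animals(sample)
```

In this sample only the dog label passes, since the cow label falls below the chosen threshold.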

4.1.5 Diverse Images

25 images of diverse quality, captured with the PivotHead camera, were collected. These images were a mix: some contained faces, some contained animals and some contained SignBoards. A further experiment was performed on them in which all the algorithms of the Vision API were run (Face Detection, Text Detection and Label Detection).

4.1.6 Experiment Details

All the above sets of experiments were performed at 6 different times of day and on multiple days. To ensure the network behavior is captured correctly, all the above-mentioned experiments were conducted under three different network types: 3G, 4G and IIT Delhi campus WiFi.


4.2 Energy Consumption


4.2.1 Current Measurements

An external device is used to measure the current drawn by the Raspberry Pi while the prototype is running. Initially, the current is measured when no external components are connected to the Raspberry Pi apart from the power source. The current is then measured after connecting components such as a keyboard, mouse and monitor for the GUI output of the Raspberry Pi. This gives the base readings for comparison.

Two sets of experiments are performed:

• Set 1 - Images are stored on the Raspberry Pi itself and the code is run.

• Set 2 - Images are dynamically captured from the USB webcam and the full prototype is run, including the MAVI app.

Each of the above sets has two subclasses: one in which Internet connectivity is provided through a WiFi hotspot and one in which it is provided through USB tethering.

4.2.2 Energy Measurements

The same external device that was used to measure current is also used to measure energy. In addition, a 2600 mAh power bank is used. Again, two sets of experiments are performed:

• Set 1 - The external USB device is used to measure the energy. Two experiments are performed, one with USB tethering and one with a WiFi hotspot.

• Set 2 - The 2600 mAh power bank is used to power the Raspberry Pi, and the time taken to completely discharge the power bank is recorded, from which the energy used is calculated. Here too, two experiments are performed, one with USB tethering and one with a WiFi hotspot.
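The arithmetic behind the power bank experiment can be sketched as follows, under stated assumptions: the bank is taken to deliver its full rated capacity, conversion losses are ignored, and the nominal voltage is an input because it is not specified here. The discharge time and voltage in the example are hypothetical inputs, not measured values.

```python
def average_current_ma(capacity_mah, discharge_hours):
    """Average current draw if the rated capacity is drained in the given time."""
    return capacity_mah / discharge_hours


def energy_wh(capacity_mah, voltage_v):
    """Energy stored in the bank at a nominal voltage, in watt-hours."""
    return capacity_mah / 1000.0 * voltage_v


# Hypothetical inputs for illustration: a 4 hour discharge and a 3.7 V nominal cell.
current = average_current_ma(2600, 4)
energy = energy_wh(2600, 3.7)
```

In practice a power bank delivers less than its rated capacity at the USB output, so figures obtained this way are upper bounds unless they are cross-checked against the external USB meter used in Set 1.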


Chapter 5

Results

5.1 Network Latency Analysis

As described earlier, the experiments were performed under three different network conditions, namely 3G, 4G and IITD WiFi, and at 6 different times of the day: 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M. The network latency is assumed to be the same as the latency involved in uploading an image, because the latency in downloading the JSON is negligible (of the order of 10 milliseconds). A comparative analysis of these 3 networks at these 6 times of day is described below.


5.1.1 Face Detection

A total of 150 images are plotted in the upcoming graphs, since there were 75 images in each of the 2 types of experiments discussed earlier.

In all the subsequent graphs, the X axis denotes the time in milliseconds and the Y axis denotes the fraction of images. A point on the graph denotes what fraction of images (y co-ordinate) has been uploaded by the corresponding time (x co-ordinate). The following graph is a cumulative distribution function (CDF) which shows the variation of the upload time of images under the 3G network:



[Plot: Upload Time for Face Images - 3G. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.1: Upload Time for face images - 3G
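For reference, an empirical CDF of this kind can be computed from a list of upload times as sketched below. The function names are assumptions for this example, and the sample values are made up for illustration rather than taken from the measurements.

```python
def empirical_cdf(times_ms):
    """Return (time, fraction) points: the fraction of images uploaded by each time."""
    ordered = sorted(times_ms)
    total = len(ordered)
    return [(t, (i + 1) / total) for i, t in enumerate(ordered)]


def fraction_uploaded_by(times_ms, deadline_ms):
    """Fraction of images whose upload finished within the deadline."""
    return sum(1 for t in times_ms if t <= deadline_ms) / len(times_ms)


# Made-up upload times in milliseconds, for illustration only.
sample_times = [3000, 1000, 2000, 4000]
points = empirical_cdf(sample_times)
```

Each curve in the figures corresponds to one such list, collected at one time of day; a curve that rises further to the left indicates faster uploads.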

The following graph is a cdf which shows the variation of the upload time of

images under 4G network:

[Plot: Upload Time for Face Images - Airtel4G. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.2: Upload Time for face images - 4G


The following graph is a cdf which shows the variation of the upload time of images under IITD Wifi network:

[Plot: Upload Time for Face Images - IITDWifi. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.3: Upload Time for face images - IITD Wifi


5.1.2 Animal Detection

A total of 50 images are plotted in the upcoming graphs, since there were 25 images in each of the 2 types of experiments discussed earlier. The graphs are CDFs of the upload time of images under the 3G, 4G and IITD WiFi networks respectively.





[Plot: Upload Time for animal Images - 3g. X axis: Time (in ms), 0 to 7000. Y axis: Fraction of Images, 0 to 1.2. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.4: Upload Time for animal images - 3G



[Plot: Upload Time for animal Images - airtel. X axis: Time (in ms), 0 to 7000. Y axis: Fraction of Images, 0 to 1.2. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.5: Upload Time for animal images - 4G





[Plot: Upload Time for animal Images - wifi. X axis: Time (in ms), 0 to 7000. Y axis: Fraction of Images, 0 to 1.2. One curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.6: Upload Time for animal images - IITD Wifi

5.1.3 OCR


A total of 100 images are plotted in the upcoming graphs, since there were 50 images in each of the 2 types of experiments discussed earlier. The following graph is a CDF which shows the variation of the upload time of images under the 3G network:

[Plot "Upload Time for OCR Images − 3g": cdf with Time (in ms, 0 to 7000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.7: Upload Time for Sign Board Images - 3G





[Plot "Upload Time for OCR Images − Airtel4G": cdf with Time (in ms, 0 to 7000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.8: Upload Time for Sign Board Images - 4G



[Plot "Upload Time for OCR Images − wifi": cdf with Time (in ms, 0 to 7000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.9: Upload Time for Sign Board Images - IITD Wifi





there were 25 images and only one type of experiment as discussed earlier. The following graphs are cdfs which show the variation of the upload time of images under 3G, 4G and IITD Wifi networks respectively:

[Plot "Upload Time for mixed Images − 3g": cdf with Time (in ms, 0 to 10000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.10: Upload Time for diverse images - 3G

[Plot "Upload Time for mixed Images − Airtel4G": cdf with Time (in ms, 0 to 10000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.11: Upload Time for diverse images - 4G



[Plot "Upload Time for mixed Images − IITDWifi": cdf with Time (in ms, 0 to 10000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.12: Upload Time for diverse images - IITD Wifi

5.1.5 Mean and Standard Deviation

The mean and standard deviation of the upload times under 3G, 4G and

IITD Wifi networks are captured in the tables below:

Table 5.1: Mean and Standard Deviation - 3G

                     Face Images   Animal Images   OCR Images   Diverse Images
Mean                 1487.47ms     640.53ms        938.21ms     527.40ms
Standard Deviation   1368.08ms     346.22ms        910.52ms     275.31ms

Table 5.2: Mean and Standard Deviation - 4G

                     Face Images   Animal Images   OCR Images   Diverse Images
Mean                 734.12ms      370.52ms        563.12ms     393.89ms
Standard Deviation   839.37ms      702.84ms        645.56ms     362.58ms

Table 5.3: Mean and Standard Deviation - IITD Wifi

                     Face Images   Animal Images   OCR Images   Diverse Images
Mean                 249.57ms      298.38ms        264.09ms     181.94ms
Standard Deviation   336.32ms      636.23ms        554.28ms     439.32ms

5.2 Cloud Run Time Analysis




there were 25 images each of three different characteristics, namely images containing 1 face, 2 faces and 4 faces respectively, as discussed earlier. The average over 75 images is plotted in the upcoming graphs, assuming the same cloud run time for 1 face, 2 faces and 4 faces. This assumption was made after observing no significant difference in their run times. The following graph is a cdf which shows the variation of the Cloud Run Time of images under 3G network:



[Plot "Algo Time for Face Images − 3G": cdf with Time (in ms, 0 to 2500) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.13: Cloud Run Time for face detection - 3G

The following graph is a cdf which shows the variation of the Cloud Run

Time of images under 4G network:

[Plot "Algo Time for Face Images − Airtel4G": cdf with Time (in ms, 0 to 2500) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.14: Cloud Run Time for face detection - 4G




The following graph is a cdf which shows the variation of the Cloud Run Time of images under IITD Wifi network:

[Plot "Algo Time for Face Images − IITDWifi": cdf with Time (in ms, 0 to 2500) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.15: Cloud Run Time for face detection - IITD Wifi



since there was just one class of images containing 25 images as discussed earlier.

The following graph is a cdf which shows the variation of the Cloud Run Time of images under 3G network:




[Plot "Algo Time for animal Images − 3g": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.16: Cloud Run Time for animal detection - 3G


The following graph is a cdf which shows the variation of the Cloud Run Time of images under 4G network:

[Plot "Algo Time for animal Images − Airtel4G": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.17: Cloud Run Time for animal detection - 4G


The following graph is a cdf which shows the variation of the Cloud Run Time of images under IITD Wifi network:



[Plot "Algo Time for animal Images − IITDWifi": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.18: Cloud Run Time for animal detection - IITD Wifi

5.2.3 OCR


there are 2 different characteristics of images: one on which just the English text detection algorithm is run, and one on which both the English and Hindi text detection algorithms are run. The following graph is a cdf which shows the variation of the Cloud Run Time of images under 3G network:

[Plot "Algo Time for ocr Images − 3g": cdf with Time (in ms, 0 to 4500) on the x-axis and Fraction of Images on the y-axis; one curve per time of day (7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.) for each of the "English" and "Both" settings]

Figure 5.19: Cloud Run Time for OCR - 3G





[Plot "Algo Time for ocr Images − Airtel4G": cdf with Time (in ms, 0 to 4500) on the x-axis and Fraction of Images on the y-axis; one curve per time of day (7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.) for each of the "English" and "Both" settings]

Figure 5.20: Cloud Run Time for OCR - 4G


The following graph is a cdf which shows the variation of the Cloud Run Time of images under IITD Wifi network:

[Plot "Algo Time for ocr Images − IITDWifi": cdf with Time (in ms, 0 to 3500) on the x-axis and Fraction of Images on the y-axis; one curve per time of day (7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.) for each of the "English" and "Both" settings]

Figure 5.21: Cloud Run Time for OCR - IITD Wifi





since there were 25 images and only one characteristic of images, as discussed earlier. The following graphs are cdfs which show the variation of the Cloud Run Time of images under 3G, 4G and IITD Wifi networks respectively:

[Plot "Algo Time for mixed Images − 3g": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.22: Cloud Run Time for all algorithms combined - 3G

[Plot "Algo Time for mixed Images − airtel": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.23: Cloud Run Time for all algorithms combined - 4G



[Plot "Algo Time for mixed Images − IITDWifi": cdf with Time (in ms, 0 to 6000) on the x-axis and Fraction of Images on the y-axis; one curve each for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]

Figure 5.24: Cloud Run Time for all algorithms combined - IITD Wifi

5.2.5 Mean and Standard Deviation

The mean and standard deviation of the Cloud Run Times under 3G, 4G

and IITD Wifi networks are captured in the tables below:


Table 5.4: Mean and Standard Deviation - 3G

                     Face        Animal      OCR English   OCR Both    All Algorithms
Mean                 708.25ms    865.02ms    835.16ms      2090.28ms   835.8ms
Standard Deviation   208.61ms    140.11ms    215.35ms      962.38ms    113.89ms

Table 5.5: Mean and Standard Deviation - 4G

                     Face        Animal      OCR English   OCR Both    All Algorithms
Mean                 766ms       928.45ms    909.76ms      1719.21ms   952.64ms
Standard Deviation   133.63ms    145.23ms    238.05ms      421.86ms    127.19ms

Table 5.6: Mean and Standard Deviation - IITD Wifi

                     Face        Animal      OCR English   OCR Both    All Algorithms
Mean                 970.33ms    986.58ms    1015.80ms     1812.47ms   1089.53ms
Standard Deviation   162.60ms    335.14ms    261.57ms      485.02ms    223.96ms
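As an illustration of how a row pair of mean and standard deviation could be obtained from raw cloud run times, a small sketch follows. The sample timings are hypothetical. The thesis does not state whether the population or the sample formula was used for the standard deviation, so the use of the population formula here is an assumption.

```python
import statistics


def summarize(times_ms):
    """Return (mean, standard deviation) of a list of timings in ms."""
    mean = statistics.mean(times_ms)
    # Population standard deviation; whether the thesis used this or the
    # sample formula (statistics.stdev) is not stated, so this is an assumption.
    std = statistics.pstdev(times_ms)
    return mean, std


# Hypothetical cloud run times in ms (not thesis data).
runs = [700.0, 720.0, 690.0, 710.0, 730.0]
mean_ms, std_ms = summarize(runs)
```

For only 25 to 75 samples per experiment the two formulas differ slightly, which is worth keeping in mind when comparing against the tabulated values.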

5.3 Accuracy

In this section, I will discuss the accuracy of the various algorithms that are run on the cloud, along with the specific conditions under which the test results are positive and negative.


There are a total of 731 images in the MAVI Face Detection dataset. They

are classified under two illumination conditions 1 and 2. The results are as

depicted in the graph below:


[Plot: Width Of The Bounding Box and Height Of The Bounding Box on the horizontal axes; −1 or 1 on the vertical axis]

Figure 5.25: Face Detection Accuracy

In the above figure, the X-axis depicts the width of the face in the image

and the Y-axis depicts the height of the face in the image. On the Z-axis, -1

depicts that a face is not detected and +1 depicts that the face is detected.

As we can see from the graph, most of the faces which are not detected lie in the box with dimensions less than 50x50. A detailed description of the variation of accuracy with face size is shown in the table below:

Table 5.7: Variation of Accuracy with Face Size

Size      Images   Detected Images   Accuracy
30x30     89       0                 0%
40x40     179      1                 0.56%
50x50     114      58                50.87%
60x60     79       76                96.20%
60x60 +   270      269               99.62%
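The accuracy column is the number of detected images divided by the number of images in each size bin. The short sketch below recomputes it from the counts in Table 5.7; for example, 1 detection out of 179 images is about 0.56%. The helper name is hypothetical.

```python
def accuracy_percent(detected, total):
    """Detection accuracy as a percentage of the images in a size bin."""
    return 100.0 * detected / total


# (size bin, images, detected images) as listed in Table 5.7.
rows = [
    ("30x30", 89, 0),
    ("40x40", 179, 1),
    ("50x50", 114, 58),
    ("60x60", 79, 76),
    ("60x60 +", 270, 269),
]

for size, total, detected in rows:
    print(f"{size:8s} {accuracy_percent(detected, total):6.2f}%")

# Accuracy over the whole dataset (the bins add up to 731 images).
overall = accuracy_percent(sum(r[2] for r in rows), sum(r[1] for r in rows))
```

The printed values are rounded to two decimals, so they can differ from the table in the last digit where the table truncates.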

In the above table, the number of face images corresponding to size AxB denotes the number of images whose face dimensions are smaller than AxB but bigger than the size in the previous row. For example, 40x40 denotes the number of images whose dimensions are less than 40x40 but bigger than 30x30, and so on. The variation of accuracy with illumination conditions is shown in the graph below:

Figure 5.26: Illumination Vs Accuracy

The distribution of image sizes under the two illumination conditions is shown in the graphs below:

[Plot: distribution of image sizes under illumination condition 1, with −1 or 1 on the vertical axis]

Figure 5.27: Illumination1 Vs Accuracy


[Plot: distribution of image sizes under illumination condition 2, with −1 or 1 on the vertical axis]

Figure 5.28: Illumination2 Vs Accuracy


Cow

There are a total of 1599 images of Cows in the MAVI Cow Dataset. These

images are classified according to two domains. Domain 1 classifies images

as Standing and Sitting Cows while Domain 2 classifies images as Front, Side

and Back poses of Cows.

Figure 5.29: Accuracy - Cow Detection



The graph above shows the accuracies in all of these different categories. The

table below captures the details of the images with respect to different poses

of cows and the accuracy of the same respectively.

Table 5.8: Variation of Accuracy with Cow Poses

Pose       Images   Detected Images   Accuracy
Standing   1552     695               44.78%
Sitting    44       13                29.54%
Front      161      72                44.72%
Side       1157     600               51.85%
Back       280      36                12.85%

Dog

There are a total of 1557 images of Dogs in the MAVI Dog Dataset. These

images are classified according to two domains. Domain 1 classifies images

as Standing and Sitting Dogs while Domain 2 classifies images as Front,

Side and Back poses of Dogs. The table below captures the details of the

images with respect to different poses of dogs and the accuracy of the same

respectively.

Table 5.9: Variation of Accuracy with Dog Poses

Pose       Images   Detected Images   Accuracy
Standing   667      273               40.93%
Sitting    889      343               38.58%
Front      322      129               40.06%
Side       997      458               45.93%
Back       237      29                12.23%

The graph below shows the accuracies of Dog detection in all the different poses of Dogs:

Figure 5.30: Accuracy - Dog Detection

5.3.3 OCR

The OCR accuracy is divided into two classes:

• Class 1 - English Accuracy, in which only the English language is specified in the request to the Vision API

• Class 2 - English and Hindi Accuracy, in which both English and Hindi are specified in the request to the Vision API.

The dataset used is the MAVI OCR dataset as updated on 25th June, 2018. The accuracy is computed character by character. A total of 1428 images are taken into consideration in Class 1 and a total of 1409 images are taken into consideration in Class 2.
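A character-by-character accuracy of this kind could be computed along the following lines. This is only a sketch: the thesis does not give the exact matching rule, so aligning the two strings with Python's difflib and counting matched ground-truth characters is an assumption, and both function names are hypothetical.

```python
from difflib import SequenceMatcher


def char_accuracy(truth, predicted):
    """Share of ground-truth characters that are matched, in order, in the
    OCR output. The alignment via SequenceMatcher is an assumption."""
    if not truth:
        return 1.0 if not predicted else 0.0
    matcher = SequenceMatcher(None, truth, predicted, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def dataset_accuracy(pairs):
    """Character accuracy pooled over (truth, predicted) pairs."""
    total = sum(len(t) for t, _ in pairs)
    matched = sum(char_accuracy(t, p) * len(t) for t, p in pairs)
    return matched / total if total else 1.0
```

For instance, reading "STOP" as "ST0P" matches three of the four ground-truth characters. Pooling over all characters weights long sign boards more than short ones; averaging per image would be an equally plausible convention.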

The accuracy in Class 1 is 90.26%, while in Class 2 it is just 60.58%.

Hence, the accuracy is much higher when just the English language is specified than when both English and Hindi are specified. The accuracy of English text detection also decreases when both English and Hindi are specified, compared to when just English is specified.

5.4 Energy Consumption Analysis


5.4.1 Current Readings

Base Current is the current drawn by the Raspberry Pi when no external components are connected apart from the power source.

Base Current - 0.31A

The current grows to 0.72A when external components like the mouse, keyboard and monitor are connected and Internet connectivity is provided through USB tethering.

The following table shows the readings of the two sets of experiments that were performed:

Table 5.10: Current Consumption

Experiment                                                                                  WiFi Hotspot    USB Tethering
Images stored on Raspberry Pi                                                               0.33A - 0.35A   0.77A - 0.84A
Entire Prototype (images captured from camera and the output sent to app via Bluetooth)    0.45A - 0.55A   0.82A - 0.88A

In the second experiment, where the images are taken from the camera, the current consumption increases only at the moment each new image is captured.

5.4.2 Energy Readings

One iteration of the following experiments consisted of 13 different experiments, namely diverse images, 1 face, 1 face all, 2 face, 2 face all, 4 face, 4 face all, animal, animal all, ocr English, ocr English all, ocr Both and ocr Both all, each of which consisted of 25 images.

1/2/4 face - Images containing 1/2/4 faces and just the face detection algorithm is run

1/2/4 face all - Images containing 1/2/4 faces and all the algorithms are run

A similar convention is followed for the remaining names too.

USB Tethering

The Readings obtained through power bank are as follows:

Power Bank Capacity at 3.7V = 2600mAh

Power Bank Capacity at 5V = 3.7/5 * 2600 = 1924mAh

Total Run Time – 2 hrs 20 mins

10.5 iterations of the experiment done

3425 images computed(10*13*25 + 1*7*25)

Average Current Drawn = 824.5 mA(1924*3/7)

Measurements through USB Device:

Initially – 460 mAh

After 1 iteration – 624 mAh

1 iteration cost – 164 mAh

10.5 iterations cost – 1726mAh

Hence, the USB device has an error of roughly 10%, since the actual total energy used is 1924mAh whereas the total computed from the USB device readings is 1726mAh.

WiFi Hotspot

The readings obtained through power bank are as follows:

Power Bank Capacity at 3.7V – 2600mAh

Power Bank Capacity at 5V = 3.7/5 * 2600 = 1924mAh

Total Run Time – 5 hrs 20 mins



30.38 iterations of the experiment done

9875 images computed(30*13*25 + 1*5*25)

Average Current Drawn = 360.75 mA(1924*3/16)

Measurements through USB Device:

Initially – 1329 mAh

After 1 iteration – 1389 mAh

1 iteration cost – 60 mAh

30.38 iterations cost – 1823mAh

Hence, the USB device has an error of roughly 5% in this case, since the actual total energy used is 1924mAh whereas the total computed from the USB device readings is 1823mAh.
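The arithmetic behind the readings above can be retraced with a short sketch. It follows the same simplification as the text, namely an ideal conversion of the 3.7V rating to 5V and a fully drained power bank; the function names are hypothetical.

```python
def capacity_at_5v(capacity_mah_at_3v7):
    """Convert the power bank rating at 3.7 V to the equivalent at 5 V."""
    return 3.7 / 5.0 * capacity_mah_at_3v7


def average_current_ma(capacity_mah, hours, minutes):
    """Average current if the whole capacity is drained in the given time."""
    return capacity_mah / (hours + minutes / 60.0)


def meter_error_percent(capacity_mah, metered_mah):
    """Relative gap between power bank capacity and the USB meter total."""
    return 100.0 * (capacity_mah - metered_mah) / capacity_mah


capacity = capacity_at_5v(2600)                      # about 1924 mAh
usb_current = average_current_ma(capacity, 2, 20)    # USB tethering run
wifi_current = average_current_ma(capacity, 5, 20)   # WiFi hotspot run
usb_error = meter_error_percent(capacity, 1726)
wifi_error = meter_error_percent(capacity, 1823)
```

With the figures reported above, this gives roughly 824.6 mA and 360.75 mA for the two average currents, and a meter gap of about 10.3% for USB tethering and about 5.2% for the WiFi hotspot.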


Chapter 6

Batch Processing

6.1 Introduction

In this chapter, the batch processing of images on the cloud is described. Here, a batch of images is evaluated together on the cloud and the corresponding output is also received in batches. The chapter starts with the Experimental Setup, followed by the Results.

6.2 Experimental Setup

There are a total of 60 images comprising a blend of diverse images like SignBoard Images, Animal Images and Face Images. Two sets of experiments are performed on these images:

• Class 1 - Sequential execution of algorithms on images, which means images are processed one after another

• Class 2 - Processing a batch of images together. The batch size was varied from 5 to 10 to 15.

All the above mentioned experiments were conducted under three networks, namely 3G, 4G and IITD Wifi, as in the earlier experiments.

Google cloud doesn’t support a batch size of more than 16 as of 21/06/2018.
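The Class 2 setup can be sketched as splitting the image list into fixed-size batches and sending each batch as one request. In the sketch below, `send_batch` is a hypothetical callable standing in for a single batched request to the cloud service; it is not a real client call, and the file names are placeholders. The limit of 16 mirrors the restriction noted above.

```python
def make_batches(items, batch_size, max_batch_size=16):
    """Split items into consecutive batches of at most batch_size elements."""
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError("batch size must be between 1 and max_batch_size")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_in_batches(image_paths, batch_size, send_batch):
    """Send each batch through send_batch and collect per-image results.

    send_batch is a hypothetical callable standing in for one batched
    request to the cloud service; it must return one result per image.
    """
    results = []
    for batch in make_batches(image_paths, batch_size):
        results.extend(send_batch(batch))
    return results


# 60 placeholder file names, matching the size of the experiment set.
images = [f"image_{i:02d}.jpg" for i in range(60)]
```

With 60 images, batch sizes of 5, 10 and 15 give 12, 6 and 4 requests respectively, while a batch size of 1 reproduces the sequential Class 1 case.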

6.3 Results

In the subsequent tables, batch size 1 is equivalent to processing one image

after another sequentially. The following table shows the results of batch

processing of images under 3G Network:



Table 6.1: Batch Processing Of Images - 3G

Batch Size   Mean      Standard Deviation
1            137.4s    1.48s
5            79.54s    3.38s
10           84.07s    2.12s
15           77.28s    1.62s

The following table shows the variation of run time with batch size under 4G

network:

Table 6.2: Batch Processing Of Images - 4G

Batch Size   Mean      Standard Deviation
1            95.64s    4.90s
5            40.19s    10.15s
10           38.93s    8.84s
15           33.91s    5.46s

The following table shows the variation of run time with batch size under

IITD Wifi network:

Table 6.3: Batch Processing Of Images - IITD Wifi

Batch Size   Mean      Standard Deviation
1            96s       2s
5            54s       2s
10           32.5s     2.5s
15           30s       2s
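One way to read the three tables together is as a speedup of batched over sequential processing. The sketch below computes that ratio from the tabulated means; the dictionary layout and the function name are illustrative choices, not part of the MAVI code.

```python
# Mean run times in seconds, copied from Tables 6.1 to 6.3.
mean_runtime_s = {
    "3G": {1: 137.4, 5: 79.54, 10: 84.07, 15: 77.28},
    "4G": {1: 95.64, 5: 40.19, 10: 38.93, 15: 33.91},
    "IITD Wifi": {1: 96.0, 5: 54.0, 10: 32.5, 15: 30.0},
}


def speedup(network, batch_size):
    """Ratio of sequential run time (batch size 1) to batched run time."""
    times = mean_runtime_s[network]
    return times[1] / times[batch_size]


for network in mean_runtime_s:
    print(network, round(speedup(network, 15), 2))
```

At a batch size of 15 the ratio is about 1.8 on 3G, about 2.8 on 4G and 3.2 on IITD Wifi, so the benefit of batching is larger on the faster networks in these measurements.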


Chapter 7

Prototype

7.1 Capturing Images

In this chapter, the MAVI On Cloud prototypes are described. The MAVI system requires a continuous stream of images to process; hence, a camera is needed. Two sources are available to capture images: the PivotHead smart camera and the USB Web Camera. The following sections describe both of these cameras in detail.

7.1.1 PivotHead Smart Camera

Figure 7.1: PivotHead wearable smart camera[9]

PivotHead is a wearable device like a pair of spectacles, with a camera attached between the two lenses. It has various modes, such as the live streaming mode and the image capture mode. It generates a wifi hotspot over which the images are transferred. Thus, in order to receive the captured images during live streaming from the PivotHead camera, one needs to connect to the wifi hotspot generated by it. It also has a provision for inserting a memory card where the captured images can be stored. For more details, one can refer to the extensive documentation of the PivotHead camera available online.

7.1.2 USB Web Camera

Another way of capturing images is using a USB webcam. We have used the Logitech USB webcam. For capturing images, the fswebcam tool is installed on the Raspberry Pi. The images are taken on the Raspberry Pi using the following command:

fswebcam --no-banner -r 640x480 image.jpg
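Since the prototype needs a stream of images rather than a single frame, the command above would typically be invoked repeatedly from a script. A minimal sketch of such a wrapper is given below; the function names are hypothetical and this is not the thesis code. It only assumes that fswebcam is installed and on the PATH.

```python
import subprocess


def fswebcam_command(output_path, resolution="640x480"):
    """Build the fswebcam argument list used to grab a single frame."""
    return ["fswebcam", "--no-banner", "-r", resolution, output_path]


def capture_image(output_path, resolution="640x480"):
    """Capture one frame to output_path; raises if fswebcam exits non-zero."""
    subprocess.run(fswebcam_command(output_path, resolution), check=True)
    return output_path
```

Calling `capture_image("image.jpg")` in a loop, followed by the upload step, would give the continuous capture behaviour described in this chapter.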

The Logitech USB webcam that is used looks like the figure below:

Figure 7.2: Logitech USB Web Camera[5]



7.2 Prototype1 - Using PivotHead

In this prototype, the PivotHead smart camera is used for capturing images. Since this is a MAVI on Cloud prototype, a common portable wifi hotspot is generated. The images from the PivotHead camera are transferred to the Raspberry Pi over this hotspot, and the same hotspot is also used to provide Internet connectivity so that the Google Vision API can run. The latency in this prototype is large since two processes share the same network: images are transferred from the camera while images are also uploaded to the cloud. Hence, the latency per image is roughly 4 to 6 seconds. Also, the PivotHead camera live streaming stops automatically after some time due to overheating. Hence, there are a few drawbacks to this prototype.

The visually impaired person will wear the PivotHead smart camera and

the Raspberry Pi will be mounted in a case. The person walks as the Pivot-

Head camera captures images, the images are sent to the Google Cloud for

processing and finally the MAVI android app speaks out the information

detected in the image.

7.3 Prototype2 - Using USB Web Camera

In this prototype, the Logitech USB Web Camera is used to capture images. The latency in this prototype is lower than in the earlier one. However, the quality of images captured by the webcam is not as good as that of the ones captured by PivotHead. This prototype works extremely well in indoor settings but performs poorly in outdoor settings. Thus, this prototype has a few drawbacks too. The following section presents a comparison of the two prototypes discussed, with the pros and cons of both.



7.4 PivotHead Vs WebCam

As discussed in the earlier two sections, both the prototypes have their own pros and cons. The following table summarizes the advantages and disadvantages of both the prototypes.

Table 7.1: Prototypes Comparison

Prototype1 - PivotHead                                        Prototype2 - Webcam
Large amounts of latency (4s to 6s)                           Latency around 2.5s to 3.5s
Unreliable source stream                                      Reliable source stream
Touch sensor is disabled due to overheating of the device     No touch sensors
Good quality of images                                        Image quality not as good
Performs equally well in both indoor and outdoor settings     Performs great for indoor settings but performs poorly in outdoor settings


Chapter 8

Conclusions

8.1 Cloud Service Used

As discussed in chapter 2, after analyzing and comparing the features of various cloud services available online, Google Cloud was finally chosen for the purpose. The Google Cloud Vision API provides both English and Hindi language support for OCR, while none of the other cloud services provide the same. Also, the documentation of the Google Cloud Vision API is more extensive than that of the others, and a lot of online support is available for it too. Hence, Google Cloud best suits the MAVI application.

8.2 Network Latency

In chapter 5, the results of Network Latency were presented under three network conditions: 3G, 4G and IITD Wifi. The total run time is computed as the sum of the upload time, the cloud run time and the download time. The upload time is the governing factor in the total run time as it makes up the major share. The upload time is lowest on IITD Wifi and highest on 3G, which is expected as the bandwidth decreases as we go from IITD Wifi to 4G to 3G networks.

8.3 Cloud Run Time

In chapter 5, the results of Cloud Run Time were presented under three

network conditions 3G, 4G and IITD Wifi respectively.

As we can see from subsection 5.2.5, the Cloud Run Time remains more or less the same across networks, as expected. The OCR algorithm takes less time when run only for English than when it is run for both English and Hindi. The standard deviation in the cloud run time is much smaller than that in the upload time. This can be attributed to the fact that the upload time depends primarily on the network speed and bandwidth available, while the cloud run time is independent of such factors, leading to small standard deviations.


As seen in the results in section 5.4, the current drawn by the Raspberry Pi is small when Internet connectivity is provided via WiFi hotspot, whereas it is significantly higher when Internet connectivity is provided through USB tethering. This is because the mobile phone also draws current to charge itself during USB tethering. There is just a slight increase in current when an image is captured by the USB camera. There is no significant increase in current if the images are stored on the Pi itself and not taken dynamically from the camera.

8.5 Final Prototype

As discussed in chapter 7, two prototypes were developed. The USB webcam prototype was finally chosen for the demo on the Open House Day due to its lower latency and its reliable source stream of images. Since the Open House Day was conducted in an outdoor setting, the prototype initially did not perform well, as the captured images were all of bad quality. Once this was realized, the USB webcam was placed so that the captured images were appropriate, that is, neither too bright nor too dark.


Bibliography

[1] Google cloud console home page. https://console.cloud.google.com/home

[2] Google cloud vision on raspberry pi. https://www.dexterindustries.com/howto/use-google-cloud-vision-on-the-raspberry-pi

[3] Google vision api pricing. https://cloud.google.com/vision/pricing

[4] Ibm watson visual recognition. https://www.ibm.com/watson/developercloud/visual-recognition/api/v3/curl.html?curl

[5] Logitech web cam. https://www.logitech.com/en-roeu/product/webcam-c170

[6] Mavi overview. http://www.cse.iitd.ac.in/mavi/

[7] Microsoft vision api. https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home

[8] Ocr language support - google. https://cloud.google.com/vision/docs/languages

[9] Pivothead. https://www.techrepublic.com/article/pivothead-debuts-next-generation-smartglass-at-wearable-tech-expo/

[10] Amazon Rekognition. https://aws.amazon.com/rekognition/image-features/


