
Presented by: Michal Nir, Saar Gross
Supervisors: Nadav Golbandi, Oren Somekh

Computer Science Department

Industrial Project (234313)

Tuesday, January 24, 2012

This project extends a previous project that includes a client application (Android) and a server application (running on Tomcat). The user takes a photo with a smartphone and records an audio clip linked to that photo. Tags are extracted from the audio using speech-to-text, and the photo, with its tags, is uploaded to Flickr. The speech-to-text engine (Sphinx) works best with small dictionaries. In our project, we will try to supply Sphinx with a custom dictionary created for each photo (or stack of photos) using the photo’s geo-location information.

Using the geo-location information, we can extract relevant tags from Flickr and build the custom dictionary from them.
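The slides do not say how nearby tags are fetched from Flickr. As an assumption, here is a minimal Python sketch that builds a request to Flickr’s public flickr.photos.search REST method, which accepts lat, lon, radius and radius_units for radial geo queries and can return each photo’s tags through extras=tags. The API key is a placeholder and per_page is an arbitrary choice; Flickr may impose additional limits on geo queries, so the method’s documentation should be checked before relying on this.

```python
from urllib.parse import urlencode

# Flickr's public REST endpoint; flickr.photos.search is a documented method.
FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_geo_tag_query(api_key: str, lat: float, lon: float, radius_km: float = 1.0) -> str:
    """Build a flickr.photos.search URL for photos near (lat, lon), including their tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "radius": radius_km,
        "radius_units": "km",
        "extras": "tags",        # ask Flickr to include each photo's tags
        "per_page": 250,         # arbitrary page size for this sketch
        "format": "json",
        "nojsoncallback": 1,     # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)

# Placeholder key; coordinates are those of Piazza San Pietro used in the tests.
url = build_geo_tag_query("YOUR_API_KEY", 41.902309, 12.457341)
print(url)
```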

Implement a new module, running on the server application, that creates custom dictionaries for the Sphinx voice-to-text engine.

Optimize the algorithm for creating the custom dictionary so that it achieves the best possible results at an acceptable performance cost.

The server generates tag recommendations in one of two ways:

Uploading an image (or multiple images) that contains geo-location information, with an audio file attached, triggers the server to create a custom dictionary for the Sphinx voice-to-text engine.

The client may ask for tag recommendations by sending a request containing only the image’s geo-location.

The server can also be instructed not to use the image’s geo-location when compiling the recommendation list (for privacy reasons); in that case, only the user’s “private tags” are used.

The server supports uploading multiple images. When multiple images are uploaded, they are clustered into groups based on location (using a simple, deterministic algorithm).

The server compiles a recommendation list for each group.

Every image with an audio file attached is processed by Sphinx using its group’s custom dictionary.

All images are uploaded to Flickr with their identified tags and user-supplied tags.

Requesting recommendations only, for a group of images, works essentially the same way, except that recommendations are returned only for the largest group of images.
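The slides describe the grouping only as “a simple and deterministic algorithm”. One algorithm that fits that description, offered here as an assumption rather than the project’s actual method, is greedy “leader” clustering: images are visited in upload order, and each joins the first group whose first image lies within a fixed radius (haversine distance); otherwise it starts a new group. The 1 km default radius and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_by_location(points: List[Point], radius_km: float = 1.0) -> List[List[int]]:
    """Greedy leader clustering: each image joins the first group whose leader
    (first image) is within radius_km, else it starts a new group.
    Deterministic for a given input order; returns groups of image indices."""
    leaders: List[Point] = []
    groups: List[List[int]] = []
    for i, p in enumerate(points):
        for g, leader in enumerate(leaders):
            if haversine_km(p, leader) <= radius_km:
                groups[g].append(i)
                break
        else:  # no leader close enough
            leaders.append(p)
            groups.append([i])
    return groups

def largest_group(groups: List[List[int]]) -> List[int]:
    # max() returns the first of several equally large groups, so ties stay deterministic.
    return max(groups, key=len)

photos = [(41.9023, 12.4573), (41.9030, 12.4580), (48.8584, 2.2945)]
groups = cluster_by_location(photos)
print(groups)                 # [[0, 1], [2]]
print(largest_group(groups))  # [0, 1]
```

Because each image is compared only with group leaders, the result depends on upload order, which is the price paid for a single pass with no randomness.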

Method of compiling a recommendation list for an image (or group of images):

Input: an image or a group of images. Tags come from three sources:
- Public tags (based on geo-location): by ranking the tags found in images near the given geo-location.
- Public tags (based on geo-location): by querying Flickr’s Places API.
- Private tags (NOT using geo-location): by ranking the user’s previously used tags.
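The slides only say that tags are “ranked”. A minimal sketch, assuming ranking means counting how many photos carry each tag (case-insensitive, each tag counted once per photo, ties broken alphabetically). The same function could rank either the tags of nearby public photos or the user’s own past tags; rank_tags and top_n are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List

def rank_tags(photo_tags: Iterable[Iterable[str]], top_n: int = 50) -> List[str]:
    """Rank tags by the number of photos they appear on; ties are broken
    alphabetically so the output is deterministic."""
    counts: Counter = Counter()
    for tags in photo_tags:
        # A set, so a tag repeated on one photo is counted only once.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]

nearby = [["Vatican", "rome", "church"], ["rome", "Basilica"], ["Rome", "vatican"]]
print(rank_tags(nearby))  # ['rome', 'vatican', 'basilica', 'church']
```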

The sources are implemented using independent threads (all running in parallel).
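A sketch of running the tag sources in parallel while enforcing a time limit, using Python’s concurrent.futures in place of the server’s Java threads (an assumption; the slides do not show the threading code). A source that times out or raises contributes an empty list, so one slow Flickr call cannot hold up the whole recommendation list. The stand-in sources below are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

def fetch_all_sources(sources: Dict[str, Callable[[], List[str]]],
                      time_limit_s: float) -> Dict[str, List[str]]:
    """Run each tag source in its own worker thread and wait at most time_limit_s.
    Sources that have not finished, or that raised, yield an empty list."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {name: pool.submit(fn) for name, fn in sources.items()}
    done, _ = wait(list(futures.values()), timeout=time_limit_s)
    results: Dict[str, List[str]] = {}
    for name, fut in futures.items():
        ok = fut in done and fut.exception() is None
        results[name] = fut.result() if ok else []
    # Return without waiting for stragglers; they finish in the background
    # and their results are ignored.
    pool.shutdown(wait=False)
    return results

# Hypothetical stand-in sources; real ones would call Flickr.
def nearby_tags() -> List[str]:
    return ["rome", "vatican"]

def slow_places_lookup() -> List[str]:
    time.sleep(0.5)  # simulates a slow remote call
    return ["vatican city"]

def failing_private_tags() -> List[str]:
    raise RuntimeError("simulated network error")

results = fetch_all_sources(
    {"nearby": nearby_tags, "places": slow_places_lookup, "private": failing_private_tags},
    time_limit_s=0.2,
)
print(results)
```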

Merging results: the results from all sources are merged, and the merging parameters are configurable. The merged list is sent:
- To the Android client (when asking for tag recommendations only).
- To Sphinx (when uploading images to Flickr).
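The slides state only that the merging parameters are configurable. A minimal sketch, assuming those parameters are a per-source quota (how many top-ranked tags to take from each source) plus an overall size cap: sources are visited in the configured order, duplicates are skipped, and merging stops at the cap. The source names and quota values are hypothetical. In the privacy mode described earlier, the geo-based sources would simply be left out of the quotas, leaving only private tags.

```python
from typing import Dict, List

def merge_recommendations(ranked: Dict[str, List[str]],
                          quotas: Dict[str, int],
                          max_size: int) -> List[str]:
    """Take up to quotas[source] new tags from each source's ranked list,
    in the order the quotas are listed, skipping duplicates and stopping
    once max_size tags have been collected."""
    merged: List[str] = []
    seen = set()
    for source, quota in quotas.items():
        taken = 0
        for tag in ranked.get(source, []):
            if len(merged) >= max_size:
                return merged
            if taken >= quota:
                break
            if tag not in seen:  # duplicates do not count toward the quota
                seen.add(tag)
                merged.append(tag)
                taken += 1
    return merged

ranked = {
    "private": ["family", "rome", "trip"],
    "nearby": ["rome", "vatican", "basilica"],
    "places": ["vatican city", "rome"],
}
quotas = {"private": 2, "nearby": 2, "places": 1}
print(merge_recommendations(ranked, quotas, max_size=10))
# ['family', 'rome', 'vatican', 'basilica', 'vatican city']
```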

Server side:
1. Tag recommendations are compiled for an image or a group of images and can either be presented to the user (recommendation only) or used for Sphinx voice-to-text.
2. Performance:
   1. In general, pretty good.
   2. Compiling a recommendation list usually takes no more than a few seconds.
   3. In any case, a time limit is enforced.
   4. Most interaction with Flickr is fully multi-threaded to avoid bottlenecks.
   5. Compiled recommendation lists are cached based on time and location to improve performance further.
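The slides say compiled lists are cached “based on time and location” without further detail. A minimal sketch, assuming location means snapping coordinates to a fixed grid cell (0.01° here, an arbitrary choice) and time means a freshness limit (one hour here, also arbitrary). A lock guards the dictionary because the server handles requests on several threads. Expired entries are treated as missing but are not evicted in this sketch.

```python
import math
import threading
import time
from typing import Dict, List, Optional, Tuple

class RecommendationCache:
    """Recommendation lists keyed by a coarse location cell; entries older
    than ttl_s seconds are treated as missing."""

    def __init__(self, cell_deg: float = 0.01, ttl_s: float = 3600.0):
        self.cell_deg = cell_deg
        self.ttl_s = ttl_s
        self._entries: Dict[Tuple[int, int], Tuple[float, List[str]]] = {}
        self._lock = threading.Lock()

    def _key(self, lat: float, lon: float) -> Tuple[int, int]:
        # Coordinates that fall into the same grid cell share one entry.
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def get(self, lat: float, lon: float, now: Optional[float] = None) -> Optional[List[str]]:
        now = time.time() if now is None else now
        with self._lock:
            entry = self._entries.get(self._key(lat, lon))
        if entry is None or now - entry[0] > self.ttl_s:
            return None
        return list(entry[1])

    def put(self, lat: float, lon: float, tags: List[str], now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self._lock:
            self._entries[self._key(lat, lon)] = (now, list(tags))

cache = RecommendationCache()
cache.put(41.9023, 12.4573, ["rome", "vatican"], now=0.0)
print(cache.get(41.9030, 12.4580, now=60.0))    # ['rome', 'vatican'] (same cell, still fresh)
print(cache.get(41.9030, 12.4580, now=7200.0))  # None (older than one hour)
```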

Server properties file:
1. Virtually all parameters needed by the server are read from an external properties (settings) file.
   1. Tweaking the server becomes an easy and intuitive task.
2. The server uses two different sets of settings:
   1. Settings used when uploading images to Flickr.
   2. Settings used when asking for tag recommendations only.
   This gives us more flexibility when changing the server’s settings.
3. Example from imageupload.properties: [image not preserved]
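Purely as an illustration, with hypothetical key names and values that are not taken from imageupload.properties, here is a minimal Python sketch of reading Java-style .properties text and keeping two separate settings sets, one for uploads and one for recommendation-only requests. The parser is deliberately minimal (no line continuations or escape sequences); a Java server would normally use java.util.Properties instead.

```python
from typing import Dict

def parse_properties(text: str) -> Dict[str, str]:
    """Parse 'key=value' or 'key: value' lines; lines starting with '#' or '!' are comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first '=' or ':', whichever comes first.
        seps = [i for i in (line.find("="), line.find(":")) if i >= 0]
        if not seps:
            props[line] = ""  # a bare key has an empty value
            continue
        i = min(seps)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props

# Hypothetical settings sets; key names and values are invented for illustration.
UPLOAD_SETTINGS = parse_properties("""
# used when uploading images to Flickr
merge.quota.private=10
merge.quota.nearby=20
time.limit.seconds=10
""")
RECOMMEND_SETTINGS = parse_properties("""
# used for recommendation-only requests
merge.quota.private=5
merge.quota.nearby=10
time.limit.seconds=3
""")

def settings_for(uploading: bool) -> Dict[str, str]:
    return UPLOAD_SETTINGS if uploading else RECOMMEND_SETTINGS

print(int(settings_for(uploading=False)["time.limit.seconds"]))  # 3
```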

Client side:
- Merged the Camera and Gallery applications into one.
- Added a new Tag Editor (tags can now be added to, edited on, and removed from images).
- Added support for working with multiple images and getting tag recommendations.
- Many bug fixes and GUI improvements:
  - New Image Properties dialog.
  - Updated menus and icons.
  - Improved gallery performance and design.

To evaluate the algorithm’s performance, we would like to do the following:

Find a user who uploaded many tagged images (with a reasonable time difference between them) in a popular location (San Francisco bridge, Las Vegas Strip).

Perform a cross-validation analysis: choose a subset of the user’s images, leaving two images out. Send the subset to the server and receive tag recommendations for it. Evaluate the accuracy (precision and recall) of the recommendations against the 2 left-out images. Repeat…
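A minimal sketch of this evaluation loop, assuming the union of the two left-out images’ tags is the ground truth and that every pair of images is left out once; the recommend callback stands in for the round trip to the server and is hypothetical. Precision is the fraction of recommended tags found in the ground truth; recall is the fraction of ground-truth tags that were recommended.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

def precision_recall(recommended: Sequence[str], truth: Set[str]) -> Tuple[float, float]:
    rec = set(recommended)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

def leave_two_out(images: List[Set[str]],
                  recommend: Callable[[List[Set[str]]], List[str]]) -> Tuple[float, float]:
    """For every pair of images, get recommendations from the remaining images
    and score them against the pair's tags. Returns mean (precision, recall)."""
    if len(images) < 2:
        raise ValueError("need at least two images")
    scores = []
    for i, j in combinations(range(len(images)), 2):
        rest = [tags for k, tags in enumerate(images) if k not in (i, j)]
        truth = images[i] | images[j]
        scores.append(precision_recall(recommend(rest), truth))
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n

images = [{"rome", "vatican"}, {"rome", "church"}, {"rome", "vatican"}]
# Toy stand-in for the server: recommend every tag seen in the remaining images.
p, r = leave_two_out(images, lambda rest: sorted(set().union(*rest)))
print(round(p, 3), round(r, 3))  # 0.833 0.611
```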

We expect the accuracy to be affected by many factors:
- The number of tags merged into the final recommendation list from each source.
- The dictionary size.

We wrote TagRecTestFramework. It is completely automated and behaves like a “normal” client (the server thinks it is talking to an Android client). For each given location, it:
- Finds a user with enough tagged images (configurable) in the area, with a small time difference between images (also configurable).
- Performs cross-validation on grouped images.
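A sketch of the user-selection step, under two assumptions the slides leave open: “time difference between images” is read as the gap between consecutive photos, and photos with fewer than the minimum number of tags are ignored. The function names are hypothetical; in the framework these thresholds are configurable.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

def longest_close_run(times: List[datetime], max_gap: timedelta) -> List[datetime]:
    """Longest run of photo times in which consecutive photos are at most max_gap apart."""
    best: List[datetime] = []
    current: List[datetime] = []
    for t in sorted(times):
        if current and t - current[-1] > max_gap:
            current = []  # gap too large: start a new run
        current.append(t)
        if len(current) > len(best):
            best = list(current)
    return best

def pick_user(photos_by_user: Dict[str, List[Tuple[datetime, Set[str]]]],
              min_images: int, min_tags: int, max_gap: timedelta) -> Optional[str]:
    """First user (alphabetically) with at least min_images sufficiently tagged
    photos forming a close run in time, or None."""
    for user in sorted(photos_by_user):
        times = [t for t, tags in photos_by_user[user] if len(tags) >= min_tags]
        if len(longest_close_run(times, max_gap)) >= min_images:
            return user
    return None

base = datetime(2012, 1, 24)
tags = {f"tag{i}" for i in range(20)}
photos = {
    "alice": [(base + timedelta(days=3 * k), tags) for k in range(10)],  # 3-day gaps
    "bob": [(base + timedelta(hours=5 * k), tags) for k in range(10)],   # 5-hour gaps
}
print(pick_user(photos, min_images=10, min_tags=20, max_gap=timedelta(days=1)))  # bob
```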

- 10 images in each group, min. of 20 tags per image
- Search radius: 1 km
- Time difference between images: max. 1 day

Piazza San Pietro (Vatican City) (41.902309, 12.457341)

The algorithm’s accuracy is highly image- and user-dependent:
- We found that most images on Flickr are either untagged or tagged with irrelevant tags.
- Most images on Flickr are not geotagged: Flickr has ~5 billion photos, of which only ~170 million are geotagged (~3% of all photos).
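As a quick check of the quoted share, using the approximate figures above:

```latex
\frac{170 \times 10^{6}}{5 \times 10^{9}} = 0.034 \approx 3\%
```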

The quality of the results could be improved by tweaking the server’s settings:
- Giving more weight to private or public tags affects the accuracy.
- Compiling a larger recommendation list (and thus a larger dictionary for Sphinx) improves recall but may hurt Sphinx’s performance (Sphinx works best with small dictionaries).
