University College London Department of Computer Science CS M038/GZ06: Mobile and Cloud Computing Paper presentation Students: Shaig Mursalzade & Vasos Koupparis Date: 14.03.2014 CrowdSearch: Accurate Real-time Image Search on Mobile Phones



Motivation and problem definition

Main contributions and aspects of work

Design decisions

Experimental evaluation

Related work

Future work

Conclusion

Outline:

Mobile phones have become people's right hand: more than 70% of smartphone users perform internet searches. However, there are several challenges, such as: small form factor, resource limitations, etc.

For these reasons, on-board sensors are used for multimedia search.

Motivation and problem definition

Example: searching by GPS location or voice is already powerful nowadays.

But image search falls behind. Question: what actually is image search?

Motivation and problem definition

Search by taking a picture!

Google Goggles: taking a picture of a famous landmark searches for information about it, and taking a picture of a product's barcode searches for information on the product.

What actually is image search?

Image search has significant challenges due to variations in lighting, texture, image quality, etc. Multimedia searches require significant memory, storage, and computing resources. So search should be precise and generate few erroneous results.

New approach: CrowdSearch.

Limitations

An accurate image search system that combines automated image search with real-time human validation of search results.

Automated image search -> generates candidate search results. Real-time validation -> uses Amazon Mechanical Turk for validation by humans (at a monetary cost).

CrowdSearch requires: an image query, a query deadline and a payment mechanism for human validators.

Sensitive aspects: delay, accuracy, monetary cost and energy.

CrowdSearch

CrowdSearch interface and validation task example

Figure 1 (p.78): CrowdSearch iPhone interface

Figure 2 (p.79)

How to construct tasks such that they are likely to be answered quickly? Use a simple format for validation tasks: the validator is required to provide a simple YES or NO answer; YES for correctly matching images, NO otherwise.

How to minimize human error and bias? Request several duplicate responses for a validation task from multiple validators and aggregate the responses using majority rule.
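The majority rule above can be sketched in Python (an illustrative sketch; `aggregate_majority` is a hypothetical name, not code from the paper):

```python
from collections import Counter

def aggregate_majority(responses):
    """Aggregate duplicate YES/NO responses from multiple validators
    using the majority rule described above."""
    counts = Counter(responses)  # missing keys count as zero
    # A majority of YES answers validates the candidate image.
    return "YES" if counts["YES"] > counts["NO"] else "NO"

# Five duplicate responses for one validation task;
# a single erroneous NO vote is outvoted.
print(aggregate_majority(["YES", "YES", "NO", "YES", "YES"]))  # prints YES
```

Requesting an odd number of duplicates (here five) avoids ties.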

How to price a validation task to minimize delay? It is typically better to have more tasks at a low price than fewer tasks at a high price.

Design choices

Parallel posting optimizes delay (but is expensive in terms of monetary cost).

Serial posting optimizes cost (but incurs much higher delay than parallel posting when the top-ranked image is incorrect).

The CrowdSearch prediction algorithm optimizes both delay and cost.

Optimizing delay and cost.

The CrowdSearch algorithm estimates the probability that any valid sequence of YES/NO answers occurs before the deadline. This is done using:
1) Models of the inter-arrival times of responses from human validators
2) Probability estimates for each sequence that can lead to a positive validation result.

If the probability of a valid result within the deadline is less than a pre-defined threshold Pth, a validation task is posted for the next candidate image.

Optimizing delay and cost.

Consider S(i) as the partial sequence received so far. CrowdSearch uses two functions:

DelayPredict estimates the probability that sequence S(j) will be received before the deadline.

ResultPredict estimates the probability that sequence S(j) will occur given that S(i) has been received so far.

As these probabilities are independent of each other, their product P(j) is the probability that sequence S(j) is received prior to the deadline.

We compute P+, the accumulation of P(j) over all cases where the remaining sequence leads to a positive answer.

This gives the predicted probability that the current task can be validated as positive given that the S(i) results have been received.
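The prediction logic can be sketched as follows. This is a hedged sketch, not the paper's implementation: the paper fits empirical delay models, while here the inter-arrival times are assumed exponential for illustration, which makes DelayPredict an Erlang CDF; the function names mirror the slide.

```python
import math

def delay_predict(n_more, t_left, mean_interarrival):
    """Probability that n_more further responses arrive within t_left
    seconds. Assumes exponential inter-arrival times (an illustrative
    assumption), so the total waiting time is Erlang-distributed."""
    if n_more == 0:
        return 1.0
    lam = 1.0 / mean_interarrival
    # Erlang CDF: P(sum of n_more exponentials <= t_left)
    return 1.0 - sum(math.exp(-lam * t_left) * (lam * t_left) ** k / math.factorial(k)
                     for k in range(n_more))

def result_predict(p_full, p_partial):
    """P(S(j) occurs | partial sequence S(i) received) = p_j / p_i,
    with both probabilities read off the SeqTree."""
    return p_full / p_partial

def positive_probability(completions, t_left, mean_interarrival, p_partial):
    """P+: sum of P(j) = DelayPredict * ResultPredict over every
    completion S(j) that leads to a positive validation result.
    `completions` is a list of (extra responses needed, p_j) pairs."""
    return sum(delay_predict(n_more, t_left, mean_interarrival)
               * result_predict(p_full, p_partial)
               for n_more, p_full in completions)

# If P+ falls below the threshold Pth, post a task for the next candidate.
P_TH = 0.5
p_plus = positive_probability([(1, 0.3)], 60.0, 60.0, 0.4)
post_next_candidate = p_plus < P_TH
```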

Optimizing delay and cost.

A probability tree called SeqTree is used for predicting validation results.

Predicting validation results

Two leaf nodes that differ only in the last answer share a common parent node whose sequence is their common prefix. For example, nodes 'YNYNN' and 'YNYNY' have the parent node 'YNYN'. The probability of a parent node is the sum of the probabilities of its children. Following this rule, the SeqTree is built, where each node Si is associated with a probability pi that its sequence occurs.

Given the tree, it is easy to predict the probability that Sj occurs given partial sequence Si: simply find the nodes that correspond to Si and Sj respectively; the desired probability is pj/pi.
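The SeqTree construction and lookup can be sketched as follows (an illustrative sketch, not the paper's implementation; `build_seqtree` and `conditional_prob` are hypothetical names, and for simplicity all leaves are assumed to be full five-answer sequences):

```python
def build_seqtree(leaf_probs, depth=5):
    """Build the SeqTree bottom-up: a parent's probability is the sum of
    its children's probabilities. `leaf_probs` maps complete YES/NO
    sequences (e.g. 'YNYNY') to their observed probabilities."""
    tree = dict(leaf_probs)
    for level in range(depth - 1, -1, -1):
        for seq, p in list(tree.items()):
            if len(seq) == level + 1:
                parent = seq[:-1]  # drop the last answer
                tree[parent] = tree.get(parent, 0.0) + p
    return tree

def conditional_prob(tree, partial, full):
    """P(sequence `full` occurs | `partial` received so far) = p_j / p_i."""
    return tree[full] / tree[partial]

# The example from the slide: 'YNYNN' and 'YNYNY' share the parent 'YNYN'.
tree = build_seqtree({"YNYNN": 0.1, "YNYNY": 0.3})
# tree["YNYN"] is 0.1 + 0.3 = 0.4; P('YNYNY' | 'YNYN') = 0.3 / 0.4 = 0.75
```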

Figure 5 (p.83)

The image search process contains two major steps:
1) Extracting features from the query image
2) Searching through database images with the features of the query image.

The features are a set of compact image representations called visual terms (visterms).

Advantage: they are compact, so they can be communicated from phone to remote server at low energy cost.

Disadvantage: extraction has significant computation overhead and delay.

Image search overview

The first question is whether visterm extraction should be performed on the mobile phone or the remote server. The system chooses the best option depending on the availability of WiFi connectivity: if only 3G connectivity is available, visterm extraction is performed locally, whereas if WiFi is available, the raw image is transferred quickly over the WiFi link and extraction is performed at the remote server.

The second question is whether the inverted-index lookup should be performed on the phone or the remote server. A remote server is chosen for inverted-index lookup, as keeping the database on the phone is not feasible and would make it harder to update the database with new images.
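The two decisions above amount to a simple dispatch rule, which can be sketched as follows (hypothetical function and labels, for illustration only):

```python
def plan_query(connectivity):
    """Place each step of the search pipeline, following the tradeoffs
    above: inverted-index lookup always runs on the remote server, while
    visterm extraction runs locally over 3G and remotely over WiFi."""
    extraction = "server" if connectivity == "wifi" else "phone"
    return {"visterm_extraction": extraction, "index_lookup": "server"}

print(plan_query("wifi"))  # {'visterm_extraction': 'server', 'index_lookup': 'server'}
print(plan_query("3g"))    # {'visterm_extraction': 'phone', 'index_lookup': 'server'}
```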

Implementation tradeoff

CrowdSearch components

Figure 6 (p.84)

In the next slides, four aspects of the performance of CrowdSearch are evaluated:
1) Improvement in image search precision
2) Accuracy of the delay models
3) Ability to trade off cost and delay
4) Energy efficiency

CrowdSearch experimental evaluation

In the graph, the x axis is the length of the ranked list obtained from the search engine; the y axis indicates precision. The top-ranked response has 80% precision for categories such as buildings and books, but very poor precision for faces and flowers.

Therefore, the results cannot be presented directly to users!

Precision of automated search results

Figure 7 (p.85)

The x axis shows four different image categories; the y axis indicates precision.

How can human validation improve image search precision?

The human-validated search scheme returns only the candidate images on the ranked list that are deemed correct; automated image search simply returns the top five images on the ranked list.

Two key observations:
1) There is considerable improvement in all strategies.
2) Among the four schemes, human validation with majority(5) is easily the best performer and consistently provides accuracy greater than 95% for all image categories.

Figure 8 (p.85)

The x axis shows time in seconds; the y axis is the cumulative distribution function.

Accuracy of delay models

The figure shows the cumulative distribution functions (CDFs) of the first response. The model is derived by convolving the acceptance-time and submission-time distributions. The graph shows that the model parameters for the acceptance, submission, and total delay of the first response fit the testing data very well.

The scatter points are the testing dataset and the solid curves are the model.
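The delay-model idea (the total first-response delay is acceptance time plus submission time, so its density is the convolution of the two densities) can be illustrated with a small Monte Carlo sketch. The exponential distributions and the parameter values here are illustrative assumptions, not the paper's fitted models:

```python
import random

def sample_first_response_delay(mean_accept, mean_submit, n=20000, seed=7):
    """Sample total first-response delays as the sum of an acceptance
    time and a submission time, both drawn from (assumed) exponential
    distributions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [rng.expovariate(1.0 / mean_accept) + rng.expovariate(1.0 / mean_submit)
            for _ in range(n)]

def empirical_cdf(samples, t):
    """Fraction of sampled delays no larger than t seconds."""
    return sum(s <= t for s in samples) / len(samples)

# E.g. a mean acceptance time of 30 s and a mean submission time of 60 s
# give a mean total delay of about 90 s.
samples = sample_first_response_delay(30.0, 60.0)
```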

Figure 9 (p.86)

Here, we evaluate the CrowdSearch algorithm on its ability to meet a user-specified deadline while maximizing accuracy and minimizing overall monetary cost. CrowdSearch is compared against two schemes:

parallel posting, which posts all five candidate results at the same time;

serial posting, which processes one candidate result at a time and returns the first successfully validated answer.

Ability to tradeoff cost and delay

We evaluate three aspects: precision, recall, and cost.
1) Precision is the ratio of the number of correct results to the total number of results returned to the user.
2) Recall is the ratio of the number of correctly retrieved results to the number of results that are actually correct.
3) Cost is measured in dollars.
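The definitions above can be written directly in Python (an illustrative sketch with hypothetical image names):

```python
def precision(returned, correct):
    """Correct results returned / total results returned to the user."""
    if not returned:
        return 0.0
    return len(set(returned) & set(correct)) / len(returned)

def recall(returned, correct):
    """Correctly retrieved results / results that are actually correct."""
    if not correct:
        return 0.0
    return len(set(returned) & set(correct)) / len(correct)

# Hypothetical example: three results returned, two of them correct,
# out of four actually-correct images in the database.
# precision = 2/3, recall = 2/4
p = precision(["img1", "img2", "img3"], ["img1", "img2", "img4", "img5"])
r = recall(["img1", "img2", "img3"], ["img1", "img2", "img4", "img5"])
```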

At the lowest deadline, neither scheme obtains many responses from human validators, hence recall is very poor and many valid images are missed. The parallel scheme recovers quickly as the deadline increases, and recall improves quickly. The serial scheme, however, does not have enough time to post multiple candidates and is slower to improve.

Ability to tradeoff cost and delay

The y axis indicates recall; the x axis shows the deadline.

For stringent deadlines of 120 seconds or lower, CrowdSearch posts tasks aggressively, since its prediction correctly estimates that it cannot meet the deadline without posting more tasks; thus, its recall follows parallel posting. Beyond 120 seconds, CrowdSearch tends to wait longer and post fewer tasks since it has more slack. This can lead to some missed images, which causes the dip at 180 seconds. Recall increases again after 180 seconds, since CrowdSearch predicts better with more partial results.

Figure 10 (p.87)

The figure shows the average price per validation task as a function of the user-specified deadline. When the deadline is small, CrowdSearch behaves similarly to parallel posting. When the deadline is larger than 120 seconds, the cost of CrowdSearch is significantly smaller, only 6-7% more than serial posting.

Ability to tradeoff cost and delay

The y axis indicates cost; the x axis shows the deadline.

A deadline of 180 or 240 seconds is ideal to obtain a balance between delay, cost, and accuracy in terms of precision and recall.

Figure 10 (p.87)

We consider two design choices: 1) remote processing, where phones are used only as a query front-end while all search functionality resides at the remote server, and 2) partitioned execution, where visterm extraction is performed on the phone and the search by visterms is done at the remote server. In each design, we consider network connections via both 3G and WiFi.

Energy efficiency

The energy consumption of the partitioned scheme is the same over 3G and WiFi, because visterms are very compact. With WiFi, remote processing is more efficient than local processing; but communicating the raw image via 3G is more expensive, as 3G has greater power usage and lower bandwidth. The results confirm the design choice of using remote processing when WiFi is available and local processing when only 3G is available.

Figure 12 (p.88)

Related Work

Image search. Google Goggles[21] is primarily advertised for building landmarks. Why? Techniques such as SIFT, visterm extraction using vocabulary trees, and inverted-lookup approaches have a well-known limitation:

they work best for mostly planar images (e.g. buildings) and poorly for non-planar images (e.g. faces).

The iScope system[31] is a multi-modal image search system for mobile devices. It performs image search using a mixture of features and temporal or spatial information.

CrowdSearch's use of real-time human validation does not improve the performance of the image search engine itself, but it helps filter out incorrect responses and return only the good ones.

Using multiple features can help image search engine performance, but it does not solve the problem of low accuracy for certain categories.

Related Work

Participatory sensing: sensing using mobile phones and their built-in accelerometers, high-quality cameras, microphones, and digital compasses.

Urban Sensing[4]: a platform for extracting patterns of use and citizens' perceptions concerning city spaces.

Nokia's SensorPlanet[22]: a global test platform for mobile-centric wireless sensor network research.

MetroSense[7]

SurroundSense[1]: a mobile phone based system that explores logical localization via ambience fingerprinting.

Such projects concentrate on utilizing humans with mobile phones to provide sensor data that can eventually be used for applications such as traffic monitoring.

CrowdSearch is distinct from those approaches, as it focuses on designing human-in-the-loop computation systems rather than just using mobile phones for data collection.

Related Work

Crowdsourcing:

reCAPTCHA[29]: uses humans to solve difficult OCR tasks; enabled the digitization of old books and newspapers; protects websites against robots.

ESP game[28]: uses humans to find good labels for images; facilitates image search; rewards participants with points when the players provide matching labels.

Auction-based crowdsourcing models (e.g. Taskcn[23])

Simultaneous crowdsourcing contests (e.g. TopCoder[25])

CrowdSearch is inspired by those approaches but differs in that it focuses on using crowdsourcing to provide real-time search services for mobile users.

Related Work

Many applications utilize micro-payment crowdsourcing systems such as AMT, including the use of crowdsourcing for labelling images and other complex data items.

Sorokin et al.[15] show that a quick way to annotate a large image database is to use AMT. However, that annotation is done in an offline manner and is noisy, versus validating candidate results from a search engine.

It seems that the CrowdSearch model can have broader applicability to other applications that use crowdsourcing systems.

Amazon Remembers[27]: takes phone-based queries and uses crowdsourcing to retrieve product information; combines mobile phones with crowdsourcing. CrowdSearch enables real-time responses by specifying deadlines and combining automated and human processing, and is considerably more sophisticated.

Improving CrowdSearch performance: a realistic model for such systems may be one where users post their queries to CrowdSearch and go offline; CrowdSearch processes the search query and sends the results to the user via a notification, such as an iPhone push notification or SMS.

The price of human validation can also be reduced with a simple optimization: being more adaptive about how many duplicates are requested for each validation task.

Future possibilities.

Improving automated search performance: by using positive and negative feedback from humans, GPS locations, orientation, or text tags.

CrowdSearch payment models: two possible payment models. In the first, the search provider pays for human validation in order to provide a more accurate search experience for mobile users. In the second, mobile users pay directly for human validation through a micropayment account such as PayPal.

Future possibilities.

Conclusion

Unlike text search, image search is difficult due to unclear features.

A general image search system is far from reality despite the significant research in the area.

CrowdSearch demonstrates over 95% search precision.

Compared to alternative approaches with similar search delay, it saves up to 50% in monetary cost.

While CrowdSearch focuses on image search, its techniques are applicable beyond images to any multimedia search from mobile phones.

Is it enough to design and build such systems only on iPhones?

How would the world find the idea of a system able to identify faces?