Mobile Visual Search
Oge Marques, Florida Atlantic University
Universitat Politècnica de Catalunya, Barcelona
2 Mar 2012

DESCRIPTION

Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities, which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices.

This talk is structured in four parts:

1. Opportunities: where I present recent and relevant numbers from the mobile computing market, particularly in the fields of photography apps, social networks, and mobile search.
2. Basic concepts: where I explain the basic MVS pipeline and discuss the three main MVS scenarios and associated challenges.
3. Technical aspects: where I briefly cover topics such as feature extraction, indexing, descriptor matching, and geometric verification, discuss the state of the art in these fields, and comment on open problems and research opportunities.
4. Examples and applications: where I show representative examples of academic research and commercial apps in this field.

TRANSCRIPT

Page 1: Mobile Visual Search

Mobile Visual Search

Oge Marques, Florida Atlantic University

Universitat Politècnica de Catalunya, Barcelona

2 Mar 2012

Page 2: Mobile Visual Search

Take-home message


Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities, which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices.

Page 3: Mobile Visual Search

Outline

•  This talk is structured in four parts:

1.  Opportunities

2.  Basic concepts

3.  Technical aspects

4.  Examples and applications


Page 4: Mobile Visual Search

Part I

Opportunities

Page 5: Mobile Visual Search

Mobile visual search: driving factors

•  Age of mobile computing

http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/

Page 6: Mobile Visual Search

Mobile visual search: driving factors

•  Why do I need a camera? I have a smartphone…

(22 Dec 2011)

http://www.cellular-news.com/story/52382.php

Page 7: Mobile Visual Search

Mobile visual search: driving factors

•  Powerful devices

1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset

http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php

Page 8: Mobile Visual Search

Mobile visual search: driving factors

•  Powerful devices

http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/

Page 9: Mobile Visual Search

Mobile visual search: driving factors

Social networks and mobile devices (May 2011)

http://jess3.com/geosocial-universe-2/

Page 10: Mobile Visual Search

Mobile visual search: driving factors

•  Social networks and mobile devices
– Motivated users: image taking and image sharing are huge!

http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html

Page 11: Mobile Visual Search

Mobile visual search: driving factors

•  Instagram:
– 15 million registered users (in 13 months)
– 7 employees
– A growing ecosystem based on it!
•  Search
•  Send postcards
•  Manage your photos
•  Build a poster
•  etc.

http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/

Page 12: Mobile Visual Search

Mobile visual search: driving factors

•  Legitimate (or not quite…) needs and use cases

http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575

Page 13: Mobile Visual Search

Mobile visual search: driving factors

•  A natural use case for CBIR (content-based image retrieval) with QBE (query by example), at last!
– The example is right in front of the user!

Girod et al., IEEE Multimedia, 2011


•  The mobile client transmits the query image to the server, and feature extraction and image retrieval both run on the server.
•  The mobile client processes the query image, extracts features, and transmits feature data. The image-retrieval algorithms run on the server using the feature data as the query.
•  The mobile client downloads data from the server, and all image matching is performed on the device.

One could also imagine a hybrid of the approaches mentioned above. When the database is small, it can be stored on the phone, and image-retrieval algorithms can be run locally [8]. When the database is large, it has to be placed on a remote server and the retrieval algorithms are run remotely.

In each case, the retrieval framework has to work within stringent memory, computation, power, and bandwidth constraints of the mobile device. The size of the data transmitted over the network needs to be as small as possible to reduce network latency and improve user experience. The server latency has to be low as we scale to large databases. This article reviews the recent advances in content-based image retrieval with a focus on mobile applications. We first review large-scale image retrieval, highlighting recent progress in mobile visual search. As an example, we then present the Stanford Product Search system, a low-latency interactive visual search system. Several sidebars in this article invite the interested reader to dig deeper into the underlying algorithms.
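To make the choice among these architectures concrete, here is a minimal dispatch sketch in Python. Everything in it is a hypothetical stand-in: extract_features, local_index, and server are assumed helpers, and the database-size threshold is illustrative only.

    LOCAL_DB_LIMIT = 10_000  # illustrative cutoff for "small enough to cache on the phone"

    def visual_search(image, local_index, server, db_size):
        """Dispatch between on-device and server-side retrieval (hybrid architecture).

        extract_features, local_index, and server are hypothetical stand-ins
        for a feature extractor, an on-device index, and a remote service.
        """
        features = extract_features(image)      # always computed on the phone
        if db_size <= LOCAL_DB_LIMIT:
            return local_index.match(features)  # small database: match locally
        return server.query(features)           # large database: transmit features only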

ROBUST MOBILE IMAGE RECOGNITION

Today, the most successful algorithms for content-based image retrieval use an approach that is referred to as bag of features (BoF) or bag of words (BoW). The BoW idea is borrowed from text retrieval. To find a particular text document, such as a Web page, it is sufficient to use a few well-chosen words. In the database, the document itself can likewise be represented by a bag of salient words, regardless of where these words appear in the text. For images, robust local features take the analogous role of visual words. Like text retrieval, BoF image retrieval does not consider where in the image the features occur, at least in the initial stages of the retrieval pipeline. However, the variability of features extracted from different images of the same object makes the problem much more challenging.

A typical pipeline for image retrieval is shown in Figure 2. First, the local features are extracted from the query image. The set of image features is used to assess the similarity between query and database images. For mobile applications, individual features must be robust against geometric and photometric distortions encountered when the user takes the query photo from a different viewpoint and with different lighting compared to the corresponding database image.

Next, the query features are quantized [9]–[12]. The partitioning into quantization cells is precomputed for the database, and each quantization cell is associated with a list of database images in which the quantized feature vector appears somewhere. This inverted file circumvents a pairwise comparison of each query feature vector with all the feature vectors in the database and is the key to very fast retrieval. Based on the number of features they have in common with the query image, a short list of potentially similar images is selected from the database.

Finally, a geometric verification (GV) step is applied to the most similar matches in the database. The GV finds a coherent spatial pattern between features of the query image and the candidate database image to ensure that the match is plausible. Example retrieval systems are presented in [9]–[14].

For mobile visual search, there are considerable challenges to providing users with an interactive experience. Current deployed systems typically transmit an image from the client to the server, which might require tens of seconds. As we scale to large databases, the inverted file index becomes very large, with memory-swapping operations slowing down the feature-matching stage. Further, the GV step is computationally expensive and thus increases the response time. We discuss each block of the retrieval pipeline in the following, focusing on how to meet the challenges of mobile visual search.

[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.

[FIG2] A pipeline for image retrieval: Query Image → Feature Extraction → Feature Matching (against the Database) → Geometric Verification. Local features are extracted from the query image. Feature matching finds a small set of images in the database that have many features in common with the query image. The GV step rejects all matches with feature locations that cannot be plausibly explained by a change in viewing position.
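The pipeline in Figure 2 can be sketched end to end in a few dozen lines of Python. The sketch below is illustrative, not any deployed system: it uses OpenCV's ORB features and a flat k-means vocabulary (via scikit-learn) in place of the large vocabularies of production systems, and it stops at the shortlist that would then be passed to geometric verification.

    import cv2
    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import MiniBatchKMeans

    orb = cv2.ORB_create(nfeatures=500)

    def extract_features(image_path):
        """Detect interest points and compute local descriptors for one image."""
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        _, descriptors = orb.detectAndCompute(img, None)
        return descriptors  # (num_features, 32) uint8, or None if nothing detected

    def build_index(database_paths, num_words=1000):
        """Train a flat visual vocabulary and build the inverted file."""
        descs = [extract_features(p) for p in database_paths]
        all_desc = np.vstack([d for d in descs if d is not None]).astype(np.float32)
        vocab = MiniBatchKMeans(n_clusters=num_words).fit(all_desc)
        inverted = defaultdict(set)  # visual word -> ids of images containing it
        for img_id, d in enumerate(descs):
            if d is None:
                continue
            for w in vocab.predict(d.astype(np.float32)):
                inverted[w].add(img_id)
        return vocab, inverted

    def shortlist(query_path, vocab, inverted, top_k=10):
        """Rank database images by the number of visual words shared with the query."""
        words = vocab.predict(extract_features(query_path).astype(np.float32))
        votes = defaultdict(int)
        for w in set(words):
            for img_id in inverted[w]:
                votes[img_id] += 1
        return sorted(votes, key=votes.get, reverse=True)[:top_k]

The shortlist returned here would then be handed to the pairwise geometric-verification step discussed later in the talk.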


Page 14: Mobile Visual Search

Part II

Basic concepts

Page 15: Mobile Visual Search

MVS: technical challenges

•  How to ensure low latency (and interactive queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption

•  How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc.

•  How to handle broad and narrow domains


Page 16: Mobile Visual Search

MVS: Pipeline for image retrieval

Girod et al., IEEE Multimedia, 2011

Page 17: Mobile Visual Search

3 scenarios

Girod et al., IEEE Multimedia, 2011

Page 18: Mobile Visual Search

Part III

Technical aspects

Page 19: Mobile Visual Search

Part III - Outline

•  The MVS pipeline in greater detail

•  Datasets for MVS research

•  MPEG Compact Descriptors for Visual Search (CDVS)


Page 20: Mobile Visual Search

MVS: descriptor extraction

•  Interest point detection
•  Feature descriptor computation

Girod et al., IEEE Multimedia, 2011

Page 21: Mobile Visual Search

Interest point detection

•  Numerous interest-point detectors have been proposed in the literature:
–  Harris corners (Harris and Stephens 1988)
–  Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004)
–  Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
–  Hessian affine (Mikolajczyk et al. 2005)
–  Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006)
–  Hessian blobs (Bay, Tuytelaars, and Van Gool 2006)
•  Different detectors offer different tradeoffs in repeatability and complexity.
•  See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework.

Girod et al., IEEE Signal Processing Magazine, 2011
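As a quick illustration, several of the detectors listed above are available in OpenCV and can be tried side by side. This is a minimal sketch that assumes a grayscale query image named query.jpg (a hypothetical file); the constructor names follow current OpenCV releases and may differ in older versions.

    import cv2

    img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

    # Detectors from the list above that expose OpenCV's common Feature2D interface.
    for name, det in [("SIFT (DoG)", cv2.SIFT_create()),
                      ("FAST", cv2.FastFeatureDetector_create()),
                      ("MSER", cv2.MSER_create())]:
        kps = det.detect(img, None)
        print(f"{name}: {len(kps)} interest points")

    # Harris corners live behind a different API in OpenCV.
    corners = cv2.goodFeaturesToTrack(img, maxCorners=500, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True)
    print("Harris:", 0 if corners is None else len(corners), "corners")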

Page 22: Mobile Visual Search

Feature descriptor computation

•  After interest-point detection, we compute a visual word descriptor on a normalized patch.
•  Ideally, descriptors should be:
–  robust to small distortions in scale, orientation, and lighting conditions;
–  discriminative, i.e., characteristic of an image or a small set of images;
–  compact, due to typical mobile computing constraints.

Girod et al., IEEE Signal Processing Magazine, 2011

Page 23: Mobile Visual Search

Feature descriptor computation

•  Examples of feature descriptors in the literature:
–  SIFT (Lowe 1999)
–  Speeded Up Robust Features (SURF) (Bay et al. 2008)
–  Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005)
–  Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010)
•  See (Winder and Brown CVPR 2007), (Winder, Hua, and Brown CVPR 2009), and (Mikolajczyk and Schmid PAMI 2005) for comparative performance evaluations of different descriptors.

Girod et al., IEEE Signal Processing Magazine, 2011
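A small sketch of this step, assuming OpenCV: descriptors are computed on patches around the same set of interest points, and the per-descriptor byte counts hint at why compactness matters on mobile links. The file name is a hypothetical placeholder.

    import cv2

    img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

    sift = cv2.SIFT_create()
    orb = cv2.ORB_create()

    kps = sift.detect(img, None)           # one set of interest points for both
    _, sift_desc = sift.compute(img, kps)  # 128 float32 values per descriptor
    _, orb_desc = orb.compute(img, kps)    # 32 bytes per binary descriptor

    print("SIFT bytes/descriptor:", sift_desc.dtype.itemsize * sift_desc.shape[1])  # 512
    print("ORB  bytes/descriptor:", orb_desc.dtype.itemsize * orb_desc.shape[1])    # 32

CHoG, discussed next, pushes compactness much further by compressing the descriptor's histogram representation itself.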

Page 24: Mobile Visual Search

Feature descriptor computation

•  What about compactness?
– Option 1: Compress off-the-shelf descriptors.
•  Result: poor rate-constrained image-retrieval performance.
– Option 2: Design a descriptor with compression in mind.
•  Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010)

Girod et al., IEEE Signal Processing Magazine, 2011

Page 25: Mobile Visual Search

CHoG: Compressed Histogram of Gradients

Chandrasekhar et al., CVPR 2009, 2010; Bernd Girod: Mobile Visual Search

[Figure] CHoG descriptor extraction: starting from a patch, gradients (dx, dy) are computed; spatial binning divides the patch into cells; the gradient distribution for each spatial bin is histogrammed; histogram compression turns each histogram into a short bitstream, yielding the compressed CHoG descriptor.
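To make the diagram concrete, here is a toy, illustrative version of the CHoG idea in Python. It is not the published implementation (which uses DAISY-style spatial binning and entropy-coded histograms), but it shows the same structure: gradients, spatial binning, per-cell gradient histograms, and coarse compression of those histograms.

    import numpy as np

    def toy_chog(patch, grid=3, hist_bins=5, levels=8):
        """Toy CHoG-style descriptor: spatial binning + quantized gradient histograms.

        Illustrative sketch of the idea only, not the published CHoG.
        """
        patch = patch.astype(np.float32)
        dy, dx = np.gradient(patch)
        h, w = patch.shape
        descriptor = []
        for i in range(grid):
            for j in range(grid):
                ys = slice(i * h // grid, (i + 1) * h // grid)
                xs = slice(j * w // grid, (j + 1) * w // grid)
                # Joint histogram of the gradient distribution in this spatial cell.
                hist, _, _ = np.histogram2d(dx[ys, xs].ravel(), dy[ys, xs].ravel(),
                                            bins=hist_bins, range=[[-64, 64], [-64, 64]])
                hist = hist.ravel() / max(hist.sum(), 1.0)  # normalize to a distribution
                descriptor.append(hist)
        descriptor = np.concatenate(descriptor)
        # Coarse quantization stands in for the entropy coding of real CHoG.
        return np.round(descriptor * (levels - 1)).astype(np.uint8)

    patch = np.random.randint(0, 256, (32, 32))  # stand-in for a normalized patch
    d = toy_chog(patch)
    print(d.shape, d.dtype)  # (225,) uint8; each entry fits in 3 bits at 8 levels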

Page 26: Mobile Visual Search

CHoG: Compressed Histogram of Gradients

•  Performance evaluation
– Recall vs. bit rate

Girod et al., IEEE Multimedia, 2011

…by approximately a factor of two. Moreover, transmission of features allows yet another optimization: it's possible to use progressive transmission of image features, and let the server execute searches on a partial set of features, as they arrive.15 Once the server finds a result that has a sufficiently high matching score, it terminates the search and immediately sends the results back. The use of this optimization reduces system latency by another factor of two.

Overall, the SPS system demonstrates that, using the described array of technologies, mobile visual-search systems can achieve high recognition accuracy, scale to realistically large databases, and deliver search results in an acceptable time.
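The early-termination idea behind progressive transmission can be sketched as a small server-side loop. Everything here is a hypothetical stand-in: feature_chunks models batches of features arriving over the network, and index.score is an assumed call returning the best match and its score for the features received so far.

    def progressive_search(feature_chunks, index, score_threshold=0.8):
        """Server-side progressive search with early termination (sketch).

        `feature_chunks` iterates over batches of query features as they arrive;
        `index.score(features)` is a hypothetical call returning the best match
        id and its matching score given the features received so far.
        """
        best_id = None
        received = []
        for chunk in feature_chunks:
            received.extend(chunk)
            best_id, score = index.score(received)
            if score >= score_threshold:
                return best_id  # confident enough: answer without waiting
        return best_id  # otherwise, the result after all features have arrived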

Emerging MPEG standard

As we have seen, key component technologies for mobile visual search already exist, and we can choose among several possible architectures to design such a system. We have shown these options at the beginning, in Figure 2. The architecture shown in Figure 2a is the easiest one to implement on a mobile phone, but it requires fast networks such as Wi-Fi to achieve good performance. The architecture shown in Figure 2b reduces network latency and allows fast response over today's 3G networks, but requires descriptors to be extracted on the phone. Many applications might be accelerated further by using a cache of the database on the phone, as exemplified by the architecture shown in Figure 2c.

However, this immediately raises the question of interoperability. How can we enable mobile visual search applications and databases across a broad range of devices and platforms, if the information is exchanged in the form of compressed visual descriptors rather than images? This question was initially posed during the Workshop on Mobile Visual Search, held at Stanford University in December 2009. This discussion led to a formal request by the US delegation to MPEG, suggesting that the potential interest in a standard for visual search applications be explored.16 As a result, an exploratory activity in MPEG was started, which produced a series of documents in the subsequent year describing applications, use cases, objectives, scope, and requirements for a future standard.17

As MPEG exploratory work progressed, it was recognized that the suite of existing MPEG technologies, such as MPEG-7 Visual, does not yet include tools for robust image-based retrieval and that a new standard should therefore be defined. It was further recognized…


[Figure 7] Comparison of different schemes with regard to classification accuracy (%) versus query size (Kbytes, log scale), for three schemes: send feature (CHoG), send image (JPEG), and send feature (SIFT). CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.

[Figure 8] End-to-end latency for different schemes: response time (seconds), broken down into feature extraction, network transmission, and retrieval, for JPEG (3G), feature (3G), feature progressive (3G), JPEG (WLAN), and feature (WLAN). Compared to a system transmitting a JPEG query image, a scheme employing progressive transmission of CHoG features achieves approximately a four times reduction in system latency over a 3G network.


Page 27: Mobile Visual Search

MVS: feature indexing and matching

•  Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image.
–  The short list may contain false positives as long as the correct match is included.
–  Slower pairwise comparisons can subsequently be performed on just the short list of candidates rather than the entire database.
•  Example of a technique: Vocabulary Tree (VT)-based retrieval

Girod et al., IEEE Multimedia, 2011
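A vocabulary tree is, in essence, hierarchical k-means. The sketch below, assuming scikit-learn, trains a small tree and maps a descriptor to a leaf (its visual word); real VT systems use much larger branching factors and depths, plus TF-IDF weighting and an inverted file at the leaves.

    import numpy as np
    from sklearn.cluster import KMeans

    class VocabTreeNode:
        """Hierarchical k-means quantizer (vocabulary tree), minimal sketch."""

        def __init__(self, descriptors, branch=4, depth=3):
            self.children, self.kmeans = None, None
            if depth == 0 or len(descriptors) < branch:
                return  # leaf: one quantization cell
            self.kmeans = KMeans(n_clusters=branch, n_init=3).fit(descriptors)
            labels = self.kmeans.labels_
            self.children = [VocabTreeNode(descriptors[labels == b], branch, depth - 1)
                             for b in range(branch)]

        def quantize(self, descriptor, path=()):
            """Descend greedily to a leaf; the branch choices identify the visual word."""
            if self.children is None:
                return path
            b = int(self.kmeans.predict(descriptor.reshape(1, -1))[0])
            return self.children[b].quantize(descriptor, path + (b,))

    # Usage: train on pooled database descriptors, then quantize each query
    # descriptor and look the resulting word up in an inverted file.
    descs = np.random.rand(5000, 32).astype(np.float32)  # stand-in descriptors
    tree = VocabTreeNode(descs)
    word = tree.quantize(descs[0])  # e.g., (2, 0, 3)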

Page 28: Mobile Visual Search

MVS: geometric verification

•  Goal: use the location information of features in the query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images.

Girod et al., IEEE Multimedia, 2011

Page 29: Mobile Visual Search

MVS: geometric verification

•  Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
•  Techniques:
–  The geometric transform between the query and database image is usually estimated using robust regression techniques such as:
•  Random sample consensus (RANSAC) (Fischler and Bolles 1981)
•  Hough transform (Lowe 2004)
–  The transformation is often represented by an affine mapping or a homography.
•  Note: GV is computationally expensive, which is why it is only applied to a subset of images selected during the feature-matching stage.
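A minimal sketch of this method using OpenCV: pairwise descriptor matching with a ratio test, followed by RANSAC estimation of a homography (an affine model via cv2.estimateAffine2D would be a drop-in alternative). The inlier threshold is an illustrative choice, not a recommended value.

    import cv2
    import numpy as np

    def geometric_verification(desc_q, kps_q, desc_db, kps_db, min_inliers=15):
        """Pairwise descriptor matching plus a RANSAC geometric-consistency check."""
        matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc_q, desc_db, k=2)
        # Lowe's ratio test prunes ambiguous correspondences.
        good = [p[0] for p in matches
                if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
        if len(good) < 4:
            return False  # a homography needs at least four correspondences
        src = np.float32([kps_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kps_db[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
        return H is not None and int(mask.sum()) >= min_inliers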



[11] use weak geometric consistency checks to rerank images based on the orientation and scale information of all features. The authors in [53] and [69] propose incorporating geometric information into the VT matching or hashing step. In [70] and [71], the authors investigate how to speed up RANSAC estimation itself. Philbin et al. [72] use single pairs of matching features to propose hypotheses of the geometric transformation model and verify only possible sets of hypotheses. Weak geometric consistency checks are typically used to rerank a larger candidate list of images, before a full GV is performed on a shorter candidate list.

To speed up GV, one can add a geometric reranking step before the RANSAC GV step, as illustrated in Figure 5. In [73], we propose a reranking step that incorporates geometric information directly into the fast index lookup stage and use it to reorder the list of top matching images (see "Fast Geometric Reranking"). The main advantage of the scheme is that it only requires x, y feature location data and does not use scale…

INVERTED INDEX COMPRESSION

For a database containing 1 million images and a VT that uses soft binning, each image ID can be stored in a 32-bit unsigned integer, and each fractional count can be stored in a 32-bit float in the inverted index. The memory usage of the entire inverted index is $\sum_{k=1}^{K} N_k \cdot 64$ bits, where $N_k$ is the length of the inverted list at the $k$th leaf node. For a database of 1 million product images, this amount of memory reaches 10 GB, a huge amount for even a modern server. Such a large memory footprint limits the ability to run other concurrent processes on the same server, such as recognition systems for other databases. When the inverted index's memory usage exceeds the server's available random access memory (RAM), swapping between main and virtual memory occurs, which significantly slows down all processes.

A compressed inverted index [58] can significantly reduce memory usage without affecting recognition accuracy. First, because each list of IDs $\{i_{k1}, i_{k2}, \ldots, i_{kN_k}\}$ is sorted, it is more efficient to store the consecutive ID differences $\{d_{k1} = i_{k1},\ d_{k2} = i_{k2} - i_{k1},\ \ldots,\ d_{kN_k} = i_{kN_k} - i_{k(N_k-1)}\}$ in place of the IDs. This practice is also commonly used in text retrieval [62]. Second, the fractional visit counts can be quantized to a few representative values using Lloyd-Max quantization. Third, the distributions of the ID differences and visit counts are far from uniform, so variable-length coding can be much more rate efficient than fixed-length coding. Using the distributions of the ID differences and visit counts, each inverted list can be encoded using an arithmetic code (AC) [63]. Since keeping the decoding delay low is very important for interactive mobile visual search applications, a scheme that allows ultra-fast decoding is often preferred over AC. The carryover code [64] and recursive bottom-up complete (RBUC) code [65] have been shown to be at least ten times faster in decoding than AC, while achieving comparable compression gains. The carryover and RBUC codes attain these speedups by enforcing word-aligned memory accesses.

Figure S6(a) compares the memory usage of the inverted index with and without compression using the RBUC code. Index compression reduces memory usage from nearly 10 GB to 2 GB. This five times reduction leads to a substantial speedup in server-side processing, as shown in Figure S6(b). Without compression, the large inverted index causes swapping between main and virtual memory and slows down the retrieval engine. After compression, memory swapping is avoided and memory congestion delays no longer contribute to the query latency.
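The first and third steps of the sidebar (delta coding of each sorted ID list, then variable-length coding of the gaps) can be sketched compactly in Python. The byte-aligned varint below is a stand-in for the word-aligned carryover and RBUC codes cited in the text, chosen here only to keep the sketch short.

    def encode_inverted_list(sorted_ids):
        """Delta-code a sorted list of non-negative image IDs, varint-encode the gaps.

        Varints are a byte-aligned stand-in for the carryover/RBUC codes cited
        in the text; the structure (deltas + variable-length code) is the same.
        """
        out = bytearray()
        prev = 0
        for i in sorted_ids:
            gap = i - prev
            prev = i
            while gap >= 0x80:                   # 7 payload bits per byte;
                out.append((gap & 0x7F) | 0x80)  # high bit marks continuation
                gap >>= 7
            out.append(gap)
        return bytes(out)

    def decode_inverted_list(data):
        """Invert encode_inverted_list: varint-decode gaps, prefix-sum back to IDs."""
        ids, cur, gap, shift = [], 0, 0, 0
        for byte in data:
            gap |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7
            else:
                cur += gap
                ids.append(cur)
                gap, shift = 0, 0
        return ids

    ids = [3, 17, 18, 129, 4096]
    assert decode_inverted_list(encode_inverted_list(ids)) == ids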

[FIG S6] (a) Memory usage (GB) for the inverted index with and without compression (uncoded vs. coded); a five times savings in memory is achieved with compression. (b) Server-side query latency (seconds, per image) with and without compression. The RBUC code is used to encode the inverted index.

[FIG5] An image-retrieval pipeline can be greatly sped up by incorporating a geometric reranking stage: Query Data → VT → Geometric Reranking → GV → Identify Information.

[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric model. True feature matches are shown in red. False feature matches are shown in green.

Girod et al., IEEE Multimedia, 2011

Page 30: Mobile Visual Search

Datasets for MVS research

•  Stanford Mobile Visual Search Data Set (http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
•  rigid objects
•  widely varying lighting conditions
•  perspective distortion
•  foreground and background clutter
•  realistic ground-truth reference data
•  query data collected from heterogeneous low- and high-end camera phones

Chandrasekhar et al., ACM MMSys 2011

Page 31: Mobile Visual Search

SMVS Data Set: categories and examples

•  DVD covers

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html

Page 32: Mobile Visual Search

SMVS Data Set: categories and examples

•  CD covers

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html

Page 33: Mobile Visual Search

SMVS Data Set: categories and examples

•  Museum paintings

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html

Page 34: Mobile Visual Search

Other MVS data sets

ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT

Page 35: Mobile Visual Search

MPEG Compact Descriptors for Visual Search (CDVS)

•  Objective
– Define a standard that enables efficient implementation of visual search functionality on mobile devices
•  Scope
– Bitstream of descriptors
– Parts of the descriptor extraction process (e.g., key-point detection) needed to ensure interoperability
•  Additional info:
•  https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
•  http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (ad hoc groups)

Bober, Cordara, and Reznik (2010)

Page 36: Mobile Visual Search

MPEG CDVS

•  Summarized timeline


…that among several component technologies for image retrieval, such a standard should focus primarily on defining the format of descriptors and parts of their extraction process (such as interest point detectors) needed to ensure interoperability. Such descriptors must be compact, image-format independent, and sufficient for robust image matching. Hence, the title Compact Descriptors for Visual Search was coined as an interim name for this activity. Requirements and Evaluation Framework documents have subsequently been produced to formulate precise criteria and evaluation methodologies to be used in the selection of technology for this standard.

The Call for Proposals17 was issued at the 96th MPEG meeting in Geneva, in March 2011, and responses are now expected by November 2011. Table 1 lists milestones to be reached in subsequent development of this standard.

It is envisioned that, when completed, this standard will

•  ensure interoperability of visual search applications and databases,
•  enable a high level of performance of implementations conformant to the standard,
•  simplify design of descriptor extraction and matching for visual search applications,
•  enable hardware support for descriptor extraction and matching in mobile devices, and
•  reduce load on wireless networks carrying visual search-related information.

To build full visual-search applications, this standard may be used jointly with other existing standards, such as MPEG Query Format, HTTP, XML, JPEG, and JPSearch.

Conclusions and outlook

Recent years have witnessed remarkable technological progress, making mobile visual search possible today. Robust local image features achieve a high degree of invariance against scale changes, rotation, as well as changes in illumination and other photometric conditions. The BoW approach offers resiliency to partial occlusions and background clutter, and allows the design of efficient indexing schemes. The use of compressed image features makes it possible to communicate query requests using only a fraction of the rate needed by JPEG, and further accelerates search by storing a cache of the visual database on the phone.

Nevertheless, many improvements are still possible and much needed. Existing image features are robust to much of the variability between query and database images, but not all. Improvements in complexity and compactness are also critically important for mobile visual-search systems. In mobile augmented-reality applications, annotations of the viewfinder content simply pop up without the user ever pressing a button. Such continuous annotations require video-rate processing on the mobile device. They may also require improvements in indexing structures, retrieval algorithms, and moving more retrieval-related operations to the phone.

Standardization of compact descriptors for visual search, such as the new initiative within MPEG, will undoubtedly provide a further boost to an already exciting area. In the near…


Table 1. Timeline for development of the MPEG standard for visual search.

When | Milestone | Comments
March 2011 | Call for Proposals is published | Registration deadline: 11 July 2011; proposals due: 21 November 2011
December 2011 | Evaluation of proposals | None
February 2012 | 1st Working Draft | First specification and test software model that can be used for subsequent improvements.
July 2012 | Committee Draft | Essentially complete and stabilized specification.
January 2013 | Draft International Standard | Complete specification. Only minor editorial changes are allowed after DIS.
July 2013 | Final Draft International Standard | Finalized specification, submitted for approval and publication as an International Standard.


Girod et al., IEEE Multimedia, 2011

Page 37: Mobile Visual Search

Part IV

Examples and applications

Page 38: Mobile Visual Search

Examples

•  Google Goggles
•  SnapTell
•  oMoby (and the IQ Engines API)
•  pixlinQ
•  Moodstocks


Page 39: Mobile Visual Search

Examples of commercial MVS apps

•  Google Goggles
– Android and iPhone
– Narrow-domain search and retrieval

http://www.google.com/mobile/goggles

Page 40: Mobile Visual Search

SnapTell

•  One of the earliest (ca. 2008) MVS apps for iPhone
–  Eventually acquired by Amazon (A9)
•  Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”)

http://www.snaptell.com/technology/index.htm

Page 41: Mobile Visual Search

oMoby (and the IQ Engines API)
–  iPhone app

http://omoby.com/pages/screenshots.php

Page 42: Mobile Visual Search

oMoby (and the IQ Engines API)

•  The IQ Engines API: “vision as a service”

http://www.iqengines.com/applications.php

Page 43: Mobile Visual Search

pixlinQ

•  A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
– Powered by image recognition from LTU Technologies

http://www.pixlinq.com/home

Page 44: Mobile Visual Search

pixlinQ

•  Example app (La Redoute)

http://www.youtube.com/watch?v=qUZCFtc42Q4

Page 45: Mobile Visual Search

Moodstocks: overview

•  Offline image recognition thanks to smart synchronization of image signatures

http://www.youtube.com/watch?v=tsxe23b12eU

Page 46: Mobile Visual Search

Moodstocks: technology

•  Unique features:
–  offline image recognition thanks to smart synchronization of image signatures,
–  QR code decoding,
–  EAN-8/13 decoding,
–  online image recognition as a fallback for very large image databases,
–  simultaneous image recognition and barcode decoding,
–  seamless logging of scans in the background.

•  Cross-platform (iOS / Android) client-side SDK and HTTP API available: https://github.com/Moodstocks

•  JPEG encoder used within their SDK also publicly available: https://github.com/Moodstocks/jpec

Oge  Marques  

Page 47: Mobile Visual Search

Moodstocks

•  Many successful apps for different platforms

http://www.moodstocks.com/gallery/

Page 48: Mobile Visual Search

Concluding thoughts

Page 49: Mobile Visual Search

Concluding thoughts

•  Mobile Visual Search (MVS) is coming of age.

•  This is not a fad and it can only grow.

•  Still a good research topic
– Many relevant technical challenges
– MPEG standardization efforts have just started

•  Infinite creative commercial possibilities


Page 50: Mobile Visual Search

Thanks!

•  Questions?

•  For additional information: [email protected]