
Page 1: Collaborative Mobile Visual Computing

Collaborative Mobile Visual Computing

Zoltan Kato, University of Szeged

„Infocommunication technologies and the society of future (FuturICT.hu)” TÁMOP-4.2.2.C-11/1/KONV-2012-0013

Page 2: Collaborative Mobile Visual Computing

Future: Collaborative sensing

• (Video) cameras have become standard on mobile phones
  – Almost everybody is equipped with a camera

• Collaborative sensing (ad-hoc mobile network of cameras)

• Collection of still images and video
• Mobile phones’ computing power is quickly increasing (the GPU is also becoming standard)
  – Panorama stitching
  – 3D reconstruction
  – Wide-range tracking (e.g. everywhere in a city)
  – …


Page 3: Collaborative Mobile Visual Computing

Collaborative sensing applications

• Emergency situations (e.g. Great East Japan Earthquake 2011: 125,000 buildings damaged or destroyed)
  – Fast environment mapping is critical
• Image-based navigation, looking at distant places
• Rendering synthetic views/videos of e.g. sport events (always seeing the actual event from the best point of view)

• Security: detection of unusual events, wide-range visual search for a suspect car/person

• etc…


Page 4: Collaborative Mobile Visual Computing

Motivation

• Mobile
  – Smartphone explosion!
  – Sensors on board
    • Camera module + position, orientation, acceleration, …
  – Network connection
    • Wi-Fi and/or mobile internet communication
  – Capable and still increasing computing power
  – Billions of potential users!
• Collaborative
  – Near-synchronous imaging
  – Decentralized, ad-hoc camera network
  – Computation is done by the peers
  – Data is shared among them on request


Page 5: Collaborative Mobile Visual Computing

Problem Statement

• High-level collaborative tasks
  – 3D scene reconstruction
  – Synthetic view generation
  – Panorama generation
• Fundamental algorithmic problems
  – Correspondence
    • Detect corresponding objects in the images
  – Ad-hoc camera network calibration
    • How is the 3D scene projected onto a plane?
    • Which cameras are close to each other?
    • Which cameras have a common view?
  – Reconstructing the third dimension
    • Each camera provides a 2D image
    • Fuse the visual content of several cameras to produce a 3D image
  – Communication
    • Infrastructure for peer-to-peer data exchange
    • Distributed algorithm design


Page 6: Collaborative Mobile Visual Computing

Image database


• Scene types
  – Large flat regions
    • Building facades
  – Street views
    • Planes with different orientations
  – Landscape images
    • Objects far away
  – Indoor scenes
• Image acquisition
  – 4-5 mobile devices with different cameras
    • Smartphones and tablets
  – VGA and 2-megapixel images
  – Photos + sensor information
    • Position, orientation
  – Goal
    • ca. 5 photos per scene for ca. 50 different scenes

Page 7: Collaborative Mobile Visual Computing

Correspondence

• Problem
  – Dense correspondence: find a corresponding pixel in the other images for every pixel
  – Sparse correspondence: find the occurrences of detected objects (points, regions) in other images of the same scene
  – Wide baseline
• Possible solutions
  – Keypoint detection
  – Region detection
  – Outlier filtering is important!


Page 8: Collaborative Mobile Visual Computing

Keypoint-based Correspondence


• Keypoint detection
  – Detect pixels in the images with unique neighborhood properties
  – State-of-the-art methods from the literature: SIFT, SURF, GFTT, MSER, STAR
    • Provided by the OpenCV software package
• Descriptors
  – Describe the keypoint neighborhood
  – Feature vectors of 64 or 128 dimensions
  – State-of-the-art methods from the literature: SIFT, SURF
• Pairing
  – Based on the distance between descriptor feature vectors
  – FLANN algorithm
• Outlier detection
  – Remove invalid pairs
  – RANSAC algorithm with a fundamental-matrix hypothesis and reprojection error
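The pipeline above maps almost directly onto OpenCV calls. The following Python sketch is an illustration only (not the project's code), assuming SIFT keypoints, a FLANN KD-tree matcher with Lowe's ratio test, and RANSAC with a fundamental-matrix hypothesis; the ratio and threshold values are arbitrary choices.

```python
# Minimal sketch of the keypoint-based correspondence pipeline with OpenCV.
# Not the project's implementation; detector choice and thresholds are assumptions.
import cv2
import numpy as np

def correspondences(img1, img2):
    # Keypoint detection + 128-dimensional descriptors (SIFT).
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Pairing by descriptor distance using FLANN (KD-tree index),
    # with Lowe's ratio test to discard ambiguous matches.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    knn = flann.knnMatch(des1, des2, k=2)
    good = [m for m, n in (p for p in knn if len(p) == 2)
            if m.distance < 0.7 * n.distance]

    # Outlier detection: RANSAC with a fundamental-matrix hypothesis
    # and an epipolar (reprojection) error threshold of 3 pixels.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    if F is None:
        return None, []
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
    return F, inliers

# Usage: F, matches = correspondences(cv2.imread("a.jpg", 0), cv2.imread("b.jpg", 0))
```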

Page 9: Collaborative Mobile Visual Computing

Preliminary Results


• Today’s high-end mobile devices can solve the correspondence problem for VGA and 2-megapixel images in acceptable time

• Repetitive patterns generate outliers (frequent in urban scenes)…

Page 10: Collaborative Mobile Visual Computing

Region-based Correspondences


• Advantages over keypoints
  – Keypoint detection can be problematic due to poor imaging hardware and lossy JPEG image compression
• Method of choice
  – MSER (Maximally Stable Extremal Regions)
  – Appearance is consistent with the transformation
  – Shape must be covariant with object position
• Problems to solve
  – Finding corresponding regions
    • Using computed plane normals
  – Region merging and rejection algorithm
• Application
  – Patch-based 3D reconstruction
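For reference, OpenCV ships an MSER detector; the snippet below is a minimal sketch of region extraction with ellipse fitting (a common way to obtain local affine frames), not the project's merging/rejection algorithm.

```python
# Minimal MSER sketch with OpenCV: detect stable regions and fit an ellipse
# to each one. Illustrative only; the merging/rejection step is not shown.
import cv2

def mser_regions(gray):
    mser = cv2.MSER_create()
    regions, _bboxes = mser.detectRegions(gray)          # pixel lists per region
    ellipses = [cv2.fitEllipse(r) for r in regions if len(r) >= 5]
    return regions, ellipses
```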

Page 11: Collaborative Mobile Visual Computing

Calibration & Vision Graph Construction

• Given a set of mobile cameras, our task is to determine the location of each camera with respect to the 3D scene using visual information and sensor data.

• The main problems
  1. Determine which cameras see a common view
  2. Estimate the pose of these cameras

• Our solution to these problems
  1. Construct the vision graph of the network from point correspondences and sensor information
  2. Estimate the relative pose with respect to a planar structure containing a low-rank texture (e.g. flats, windows, brick walls, etc.)

Page 12: Collaborative Mobile Visual Computing

Vision graph construction

1. Taking images with a custom Android app
   – Image + sensor data (location, orientation, FOV)
2. Sensor placement based on
   – Location data
   – Orientation and FOV data
3. Constructing a graph G(V,E) from the sensor data (a sketch follows below)
   – V: sensor locations
   – E: two sensors are connected if their 3D views overlap
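A minimal sketch of step 3, assuming each device reports a 2D position, a heading and a horizontal FOV (hypothetical field names), and approximating each view as a planar circular sector; an edge is added when two sectors overlap. The sampling-based overlap test and the 300 m radius are illustrative stand-ins for the project's actual criterion.

```python
# Sketch of vision-graph construction from sensor data alone (illustration only).
# Each camera's view is approximated as a 2D circular sector defined by its
# position (x, y), heading (degrees, atan2 convention) and horizontal FOV.
import math
import itertools

def in_sector(pt, cam, radius=300.0):
    """True if the 2D point pt lies inside cam's viewing sector."""
    dx, dy = pt[0] - cam["x"], pt[1] - cam["y"]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > radius:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    diff = (bearing - cam["heading"] + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam["fov"] / 2.0

def sectors_overlap(a, b, radius=300.0, n=24):
    # Sample points across a's sector and test them against b's sector.
    for i, j in itertools.product(range(n), range(1, n + 1)):
        ang = math.radians(a["heading"] + a["fov"] * (i / (n - 1) - 0.5))
        r = radius * j / n
        pt = (a["x"] + r * math.cos(ang), a["y"] + r * math.sin(ang))
        if in_sector(pt, b, radius):
            return True
    return False

def vision_graph(cams, radius=300.0):
    """V: camera indices; E: pairs of cameras whose approximate views overlap."""
    V = list(range(len(cams)))
    E = [(i, j) for i, j in itertools.combinations(V, 2)
         if sectors_overlap(cams[i], cams[j], radius)
         or sectors_overlap(cams[j], cams[i], radius)]
    return V, E
```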



Page 13: Collaborative Mobile Visual Computing

4. Placing images based on sensor data and graph information
   – Using a ~300 m radius
5. Filtering images of the same group based on content
   – Extracting interest points (SURF)
   – Extracting local image features around the interest points
     • LBP, texture and edge histograms
   – Filtering interest points based on local feature (dis)similarity (see the sketch after this list)
   – Checking for existing correspondences between such images
   – Reason: possible content occlusions in images of the same area (we do not have map/street information)
6. Goal: image placement based on the above location and filtering information
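A minimal sketch of the filtering idea in step 5: describe the neighborhood of each interest point with a local histogram and keep only points whose descriptors are similar enough across two images of the same group. Plain grayscale histograms stand in here for the LBP/texture/edge features, and the patch size and correlation threshold are arbitrary assumptions.

```python
# Sketch of content-based interest-point filtering (illustration only).
# A grayscale patch histogram stands in for the LBP/texture/edge descriptors.
import cv2

def patch_hist(gray, pt, half=16):
    """Normalized 32-bin histogram of the patch around point pt = (x, y)."""
    x, y = int(pt[0]), int(pt[1])
    patch = gray[max(y - half, 0):y + half, max(x - half, 0):x + half]
    hist = cv2.calcHist([patch], [0], None, [32], [0, 256])
    return cv2.normalize(hist, hist).flatten()

def filter_points(gray1, pts1, gray2, pts2, thresh=0.7):
    """Keep points of image 1 whose local appearance matches some point in image 2."""
    hists2 = [patch_hist(gray2, p) for p in pts2]
    kept = []
    for p in pts1:
        h = patch_hist(gray1, p)
        if hists2 and max(cv2.compareHist(h, h2, cv2.HISTCMP_CORREL)
                          for h2 in hists2) >= thresh:
            kept.append(p)
    return kept
```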



Page 14: Collaborative Mobile Visual Computing

Camera Pose Estimation

1. Calculate the relative pose within the network
   – Choose an arbitrary main camera. The main camera estimates the relative translation and rotation to its neighbors. The relative pose can be easily extracted from the essential matrices (see the sketch after this list).
   – The necessary point correspondences are determined from extracted SIFT or SURF features.
2. Relate the camera network to a planar surface
   – The main camera determines its relative pose to an extracted patch of a low-rank texture using the TILT (Transform Invariant Low-rank Textures) algorithm.
   – The algorithm estimates the best planar homography, i.e. the one that aligns the extracted pattern to be low-rank.
   – The relative position and orientation are factorized from the estimated planar homography.
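Both steps correspond to standard two-view geometry routines; the sketch below only illustrates them with OpenCV, assuming the camera intrinsics K and matched point arrays are available. TILT itself is not part of OpenCV, so the low-rank-aligning homography H is taken as given.

```python
# Sketch of the two pose-estimation steps with OpenCV (not the project's code).
# Step 1: relative pose of a neighbor w.r.t. the main camera from matched points.
# Step 2: pose w.r.t. a planar low-rank patch by decomposing a homography H
#         (H would come from TILT, assumed to be computed elsewhere).
import cv2
import numpy as np

def relative_pose(pts_main, pts_neighbor, K):
    E, mask = cv2.findEssentialMat(pts_main, pts_neighbor, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # recoverPose factorizes E into a rotation R and a unit-norm translation t.
    _, R, t, _ = cv2.recoverPose(E, pts_main, pts_neighbor, K, mask=mask)
    return R, t

def pose_from_plane_homography(H, K):
    # Factorize the planar homography into candidate (R, t, n) solutions;
    # the physically valid one must still be selected (e.g. positive depth).
    _, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    return Rs, ts, normals
```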



Page 15: Collaborative Mobile Visual Computing

3. Calibrating the whole network w.r.t. the 3D world
   – The camera network pose and the low-rank homography are usually not at the same scale, so we have to determine a relative scale to achieve a consistent calibration of the network.
   – Assuming that at least one other camera sees the same pattern, this problem can be easily solved by a classical mutual-information-based registration algorithm.
   – This algorithm runs on each mobile device of the network; the final scale is the median of the estimated scales (a sketch follows below).
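A minimal sketch of the idea behind step 3: a basic mutual-information score between two grayscale patches (computed from their joint histogram), and the median consensus over the per-device scale estimates. The MI code is a generic stand-in, not the registration algorithm used in the project.

```python
# Sketch of step 3 (illustration only): a mutual-information similarity score
# and the median consensus over per-device scale estimates.
# The 32-bin histogram size is an arbitrary assumption.
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def network_scale(per_device_scales):
    """Each peer estimates a relative scale; the network keeps the median."""
    return float(np.median(per_device_scales))
```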


Page 16: Collaborative Mobile Visual Computing

Patch-based 3D reconstruction


• Given a pair of corresponding regions, determine the normal and depth of the 3D surface
  – The usual way is to determine normals via cross products, using surface points reconstructed beforehand (a sketch of this baseline is given below)
  – Our method instead requires knowledge of the affine transformation between the images of a surface patch
• The proposed method is compatible with region-based correspondence computation (e.g. MSER)
  – The affine transformation can be computed without establishing point-to-point correspondences
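For contrast, the conventional baseline mentioned in the first bullet can be sketched as follows: given three nearby, non-collinear 3D surface points reconstructed beforehand (e.g. by triangulating matched keypoints), the patch normal is the normalized cross product of two edge vectors. This is only the baseline, not the closed-form affine-based method of these slides.

```python
# Sketch of the conventional baseline: estimate a patch normal from three
# previously reconstructed, non-collinear 3D surface points via a cross product.
import numpy as np

def patch_normal(p0, p1, p2):
    n = np.cross(np.asarray(p1) - np.asarray(p0), np.asarray(p2) - np.asarray(p0))
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        raise ValueError("points are (nearly) collinear")
    return n / norm
```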

Page 17: Collaborative Mobile Visual Computing

Properties

• The proposed method
  – Has a closed-form expression for the normal vectors
  – Gives an exact solution for any calibrated projection (i.e. it is not restricted to perspective cameras)
  – Gives good approximate solutions for any (smooth) surface
  – The precalculated normal can be used for distance calculation (a closed-form solution exists for planar patches observed with perspective cameras)
• Limitations: the analysis shows that it cannot be used in the following cases
  – If the camera centers and the observed region are on the same line (i.e. the transformation is a pure scaling)
  – For points on an object’s contour (where the normal vector is perpendicular to the reprojected ray tangent to the object)
  – These cases can easily be detected in an algebraic manner


Page 18: Collaborative Mobile Visual Computing


Team members

Jozsef Molnar, PhD
Rui Huang, PhD
Levente Kovacs, PhD
Attila Tanacs, PhD
Atul Rai
Zsolt Santa
Endre Juhasz

This work was partially supported by the European Union and the European Social Fund through the project FuturICT.hu (grant no. TAMOP-4.2.2.C-11/1/KONV-2012-0013).