Indoor Point Cloud Processing


Page 1: Indoor Point Cloud Processing

Indoor Point Cloud Processing
Deep learning for semantic segmentation of indoor point clouds

Page 2: Indoor Point Cloud Processing

Implementation: Initial ‘deep learning’ idea

The .XYZ point cloud is better suited for automatic segmentation than the reconstructed .obj file, due to its higher resolution.

Input Point Cloud

3D CAD MODEL: No need to have planar surfaces; sampled too densely

www.outsource3dcadmodeling.com

2D CAD MODEL: Straightforward to go from 3D to 2D

cadcrowd.com

RECONSTRUCT 3D: “Deep Learning”

3D Semantic Segmentation from point cloud / reconstructed mesh

youtube.com/watch?v=cGuoyNY54kU | arxiv.org/abs/1608.04236

Primitive-based deep learning segmentation. The order between semantic segmentation and reconstruction could be swapped.

Page 3: Indoor Point Cloud Processing

Sensors: Architectural spaces

https://matterport.com/

Some company could upgrade to? http://news.mit.edu/2015/object-recognition-robots-0724

https://youtu.be/m6sStUk3UVk

http://news.mit.edu/2015/algorithms-boost-3-d-imaging-resolution-1000-times-1201

+

http://www.forbes.com/sites/eliseackerman/2013/11/17/

Page 4: Indoor Point Cloud Processing

HARDWARE: Existing scanners are static. Scan the space eventually with a drone?

https://www.youtube.com/watch?v=dVPOf-oDUOM

Introducing Cartographer We are happy to announce the open source release of Cartographer, a real-time simultaneous localization and mapping (SLAM) library in 2D and 3D with ROS support. SLAM algorithms combine data from various sensors (e.g. LIDAR, IMU and cameras) to simultaneously compute the position of the sensor and a map of the sensor’s surroundings.

We recognize the value of high quality datasets to the research community. That’s why, thanks to cooperation with the Deutsches Museum (the largest tech museum in the world), we are also releasing three years of LIDAR and IMU data collected using our 2D and 3D mapping backpack platforms during the development and testing of Cartographer.

http://www.ucl.ac.uk/3dim/bim | http://www.homepages.ucl.ac.uk/~ucescph/

Indoor Mobile Mapping: Rapid Data Capture for Indoor Modelling

As part of a working group we are investigating the great potential of indoor mobile mapping systems for providing 3D capture of the complex and unique environment that exists inside buildings. The investigation is taking the form of a series of trials to explore the technical capabilities of Indoor Mobile Mapping Systems, such as the i-MMS from Viametris, with a view to performance in Survey and BIM applications with respect to UK standards. The working group is investigating the potential of such technology in terms of accuracies, economic viability and its future development.

Page 6: Indoor Point Cloud Processing

Implementation rough idea
Input Point Cloud

CAD-primitive based reconstruction, trained on ModelNet.

CAD Primitives: ModelNet, modelnet.cs.princeton.edu

Possibly only simplified modelling, with only walls, floor and openings. http://dx.doi.org/10.1016/j.cag.2015.07.008

2D CAD FLOORPLAN → .SVG FOR REAL ESTATE AGENTS
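As a concrete endpoint for that step, here is a minimal sketch of writing wall outlines to an .SVG floorplan in plain Python. The wall coordinates and the `floorplan_to_svg` helper are hypothetical, standing in for the output of the planar-segmentation stage:

```python
# Minimal sketch: export 2D wall polylines to an .SVG floorplan.
# Hypothetical input; a real pipeline would take the wall polygons
# produced by the planar-segmentation stage.

def floorplan_to_svg(walls, path, scale=100.0):
    """walls: list of [(x, y), ...] polylines in metres."""
    svg = ['<svg xmlns="http://www.w3.org/2000/svg">']
    for wall in walls:
        pts = " ".join(f"{x * scale:.1f},{y * scale:.1f}" for x, y in wall)
        svg.append(f'<polyline points="{pts}" fill="none" '
                   'stroke="black" stroke-width="2"/>')
    svg.append("</svg>")
    with open(path, "w") as f:
        f.write("\n".join(svg))

# Example: a 4 m x 3 m rectangular room.
floorplan_to_svg([[(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]], "floorplan.svg")
```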

Page 7: Indoor Point Cloud Processing

Point clouds to Architectural Models #1

Point-Cloud Processing with Primitive Shapes: cg.cs.uni-bonn.de/en/projects


http://discovery.ucl.ac.uk/id/eprint/1485847

Thomson, C. P. H. (2016). From Point Cloud to Building Information Model: Capturing and Processing Survey Data Towards Automation for High Quality 3D Models to Aid a BIM Process. Doctoral thesis, UCL (University College London).

Page 8: Indoor Point Cloud Processing

Point clouds to Architectural Models #2

Eric Turner, May 14, 2015. Electrical Engineering and Computer Sciences, University of California at Berkeley. Technical Report No. UCB/EECS-2015-105. http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-105.html. Cited by 1.

Figure 3.2: (a) Point cloud of scanned area, viewed from above and colored by elevation; (b) wall sample locations generated from the point cloud. Clutter such as furniture or plants does not affect the position of wall samples.

Page 10: Indoor Point Cloud Processing

First level Idea

Input point cloud (1,039,097 vertices)

'Simplified' point cloud, by Markus Ylimäki

1) Noisy input with possible missing parts

2) Denoise, consolidate, find normals and possibly upsample the point cloud

3) Find planar surfaces with semantic labels (semantic segmentation for point clouds); optimally you would like to describe a wall using just 4 corner points → a massive reduction of points (a minimal RANSAC sketch follows below)

4) Remove overly complex shapes such as chairs, flowers, chandeliers, etc.

Old school techniques, no machine learning here yet
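To make step 3 concrete, here is a minimal NumPy sketch of the classic RANSAC-style plane peeling. The thresholds and iteration counts are illustrative guesses; production implementations (PCL, CGAL, TAMS) are considerably more robust:

```python
import numpy as np

def ransac_plane(points, n_iter=500, dist_thresh=0.02, rng=None):
    """Best plane n.p + d = 0; returns (n, d, inlier_mask)."""
    rng = rng or np.random.default_rng(0)
    best_mask, best_model = None, None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n /= norm
        d = -n.dot(p0)
        mask = np.abs(points @ n + d) < dist_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (n, d)
    return best_model[0], best_model[1], best_mask

def segment_planes(points, min_inliers=1000):
    """Greedily peel off planes (walls/floor/ceiling) from the cloud."""
    planes, remaining = [], points
    while len(remaining) > min_inliers:
        n, d, mask = ransac_plane(remaining)
        if mask.sum() < min_inliers:         # no dominant plane left
            break
        planes.append((n, d, remaining[mask]))
        remaining = remaining[~mask]
    return planes, remaining                 # remaining ~ clutter (step 4)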

Each color corresponds to a plane; black corresponds to no plane (1,039,097 vertices)

This algorithm gives okay results, but it could be a lot faster, and a method can never be too robust. Better ‘inpainting’ performance for missing data would also help.

PhD student at the Center for Machine Vision Research at the University of Oulu.

Page 11: Indoor Point Cloud Processing

First level Pre-processing: just use existing code
Point Cloud Denoising via Moving RPCA. E. Mattei, A. Castrodad, 2016. Computer Graphics Forum, Wiley Online Library.

Walls become much more planar in the top view.

Optimize consolidation for point clouds

Screened Poisson Reconstruction (https://github.com/mkazhdan/PoissonRecon, C++ code)

CGAL, Point Set Processing: http://doc.cgal.org/latest/Point_set_processing_3/

http://vcc.szu.edu.cn/research/2013/EAR/

Deep points consolidation. ACM Digital Library, by S. Wu, 2015. Cited by 2.


EAR/WLOP CODE AVAILABLE in CGAL as illustrated below

Consolidation of Low-Quality Point Clouds from Outdoor Scenes
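The "find normals" part of the pre-processing typically reduces to local PCA over k-nearest neighbours, the technique underlying the CGAL-style estimators above. A minimal NumPy/SciPy sketch, with an illustrative choice of k:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=16):
    """Per-point unit normals via PCA of the k-NN neighbourhood."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        nbhd = points[nbrs] - points[nbrs].mean(axis=0)
        # Normal = direction of smallest variance of the neighbourhood.
        _, _, vt = np.linalg.svd(nbhd, full_matrices=False)
        normals[i] = vt[-1]
    return normals
```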

Page 12: Indoor Point Cloud Processing

First level Pre-processing: Motivation for image-naïve people

https://www.youtube.com/watch?v=BlDl6M0go-c
Background of BM3D (and later BM4D), developed at Tampere University of Technology; the state-of-the-art denoising algorithm, at least prior to deep learning denoisers.

Images, like any measurement in general, are always estimates of the “real image”: a photo of a black circle on a white background might not, for the computer, consist of only two colors in practice, but is corrupted by noise and blur, so quantitative image analysis may be facilitated by image-restoration pre-processing algorithms. And we want to use ROBUST ALGORITHMS that also perform well with low-resolution and noisy point clouds (think of Google Tango scans, or even more professional laser scanners / LIDARs).

Lu et al. (2016), https://doi.org/10.1109/TVCG.2015.2500222

“BM3D for point clouds”: Patch-Collaborative Spectral Point-Cloud Denoising, http://doi.org/10.1111/cgf.12139

You can visualize the removed noise with the Hausdorff distance, for example.
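A minimal SciPy sketch of that evaluation idea; the clouds here are stand-in arrays:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def denoising_residuals(noisy, denoised):
    """Per-point distance from each noisy point to the denoised cloud."""
    dists, _ = cKDTree(denoised).query(noisy)
    return dists                                # color the cloud by these

noisy = np.random.rand(1000, 3)                 # stand-in (N, 3) data
denoised = noisy + 0.005                        # stand-in "denoised" cloud
h = directed_hausdorff(noisy, denoised)[0]      # worst-case residual
res = denoising_residuals(noisy, denoised)      # per-point, for a heat map
```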

http://dx.doi.org/10.1111/cgf.12802

http://staff.ustc.edu.cn/~lgliu/Publications/Publications/2015_SMI_QualityPoint.pdf

Page 13: Indoor Point Cloud Processing

First level reconstruction

http://dx.doi.org/10.1111/cgf.12802

Page 14: Indoor Point Cloud Processing

First level plane segmentation in practice #1

https://tams.informatik.uni-hamburg.de/people/alumni/xiao/publications/Xiao_RAS2013.pdf

junhaoxiao/TAMS-Planar-Surface-Based-Perception

3D perception code developed at TAMS (http://tams.informatik.uni-hamburg.de/) by Junhao Xiao and others, including point cloud plane segmentation, planar segment area calculation, scan registration based on planar segments, etc.

The following libraries will also help if not everything is found in the implementation above:

● CGAL 4.9 - Point Set Processing: User Manual

● PCL - Point Cloud Library (PCL)

● PDAL - Point Data Abstraction Library — pdal.io

For ICC and BIM processing the VOLVOX plugin for Rhino seemed interesting

https://github.com/DURAARK

http://papers.cumincad.org/data/works/att/ecaade2016_171.pdf

Page 16: Indoor Point Cloud Processing

First level plane segmentation in practice #2

http://dx.doi.org/10.1117/1.JEI.24.5.051008

Furthermore, various enhancements are applied to improve the segmentation quality. The GPU implementation of the proposed algorithm segments depth images into planes at the rate of 58 fps. Our pipeline-interleaving technique increases this rate up to 100 fps. With this throughput rate improvement, the application benefit of our algorithm may be further exploited in terms of quality and enhancing the localization.
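The cited GPU pipeline is considerably more elaborate, but the first stage of depth-image plane segmentation is easy to sketch: per-pixel normals from finite differences of the back-projected depth map (the intrinsics below are illustrative), after which pixels with similar normals and plane offsets can be grown into planar segments:

```python
import numpy as np

def depth_to_normals(depth, fx=525.0, fy=525.0):
    """depth: (H, W) in metres; returns (H, W, 3) unit normals.
    Borders wrap and are invalid in this sketch."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project to camera space (principal point at image centre).
    x = (u - w / 2) * depth / fx
    y = (v - h / 2) * depth / fy
    pts = np.dstack([x, y, depth])
    # Tangent vectors via image-space finite differences.
    du = np.roll(pts, -1, axis=1) - pts
    dv = np.roll(pts, -1, axis=0) - pts
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-12
    return n
```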

Page 17: Indoor Point Cloud Processing

First level Shape representations

Data-driven shape processing and modeling provides a promising solution to the development of “big 3D data”. Two major ways of 3D data generation, 3D sensing and 3D content creation, populate 3D databases with fast growing amount of 3D models. The database models are sparsely enhanced with manual segmentation and labeling, as well as reasonably organized, to support data-driven shape analysis and processing, based on, e.g., machine learning techniques. The learned knowledge can in turn support efficient 3D reconstruction and 3D content creation, during which the knowledge can be transferred to the newly generated data. Such 3D data with semantic information can be included into the database to enrich it and facilitate further data-driven applications.

https://arxiv.org/abs/1502.06686

Page 18: Indoor Point Cloud Processing

Synthesis: Modular blocks as cloud microservices?

[Pipeline diagram: POINT CLOUD → Denoising → Consolidation → Upsampling → Planar Segmentation (TAMS) → Simplification (WLOP, DeepPoints, Bilateral, CGAL) → 2D Floorplan / 3D CAD Model + Metafile]

‘Image restoration’ pipeline: not necessarily every block before planar segmentation is needed, and ‘pre-processing’ could be bypassed.

Only to be run from the cloud? Start from existing libraries and implementations?

See the details in the previous slides. Each block has code available, so no new code needs to be written to get to an MVP.

SHUFFLE the blocks against a Ground Truth to benchmark performance: accuracy, computation speed, robustness.
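A minimal sketch of such a benchmark harness, assuming each block is a function from cloud to cloud; the nearest-neighbour error here is a placeholder for task-specific accuracy metrics:

```python
import time
import numpy as np
from scipy.spatial import cKDTree

def accuracy(output, ground_truth):
    """Placeholder metric: mean nearest-neighbour error to ground truth."""
    return float(cKDTree(ground_truth).query(output)[0].mean())

def benchmark(blocks, cloud, ground_truth):
    """blocks: list of (name, fn) pairs, e.g. [("denoise", fn), ...]."""
    results = {}
    for name, block in blocks:
        t0 = time.perf_counter()
        cloud = block(cloud)                   # blocks run in sequence
        results[name] = {
            "seconds": time.perf_counter() - t0,
            "error": accuracy(cloud, ground_truth),
        }
    return results
```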

Page 20: Indoor Point Cloud Processing

Second level to deep learning

Page 21: Indoor Point Cloud Processing

General Motivation #1: Where VR is going beyond this project

Carlos E. Perez, Software Architect - Design Patterns for Deep Learning Architectures. Written Aug 9.

Yes.

(1) Magic Leap is known to be hiring Deep Learning experts for its Augmented Reality system. They are known to use Movidius as their chip, which is a deep learning vision processor.

(2) Gesture recognition can be done via deep learning.

(3) Voice identification seems to be important in a VR context.

See: Design Patterns for Deep Learning Architectures: Applications

https://techcrunch.com/2016/10/28/magic-leap-goes-to-finland-in-pursuit-of-nordic-vr-and-ar-talent/

http://www.forbes.com/sites/davidewalt/2016/11/02/inside-magic-leap-the-secretive-4-5-billion-startup-changing-computing-forever/#2f9365e5e83f

https://www.wired.com/2016/04/magic-leap-vr/

Page 22: Indoor Point Cloud Processing

General Motivation #2: Where 3D is going beyond this project

http://jobsearch.scania.com/segerjoblist/presentation.aspx?presGrpId=9470&langId=1&ie=False

http://www.sensorsmag.com/seventh-sense-blog/artificial-intelligence-autonomous-driving-24333

Viorica Pătrăucean, Ph.D: "BIM for existing infrastructure" http://www-smartinfrastructure.eng.cam.ac.uk/files/generating-bim-models-for-existing-assets

Page 23: Indoor Point Cloud Processing

General Motivation #2b: Where 3D is going: Autonomous driving

https://www.youtube.com/watch?v=4zOqJK-_GAk

Automatic object detection and removal from 3D point clouds by Oxbotica

Francis Engelmann, Jörg Stückler and Bastian Leibe. Computer Vision Group, RWTH Aachen University. https://www.youtube.com/watch?v=YebCdz7QsRs

Page 24: Indoor Point Cloud Processing

General Motivation #2c: Where 3D is going: building information models (BIM)

http://www.spar3d.com/news/lidar/paracosms-new-handheld-lidar-scanner-built-construction-analytics/

GeoSLAM is playing into this trend with the release of their ZEB-CAM, an add-on for the company’s ZEB-REVO handheld indoor mapper that captures imagery at the same time as 3D scan data.

The data captured by the two sensors is fully synchronized, and users can view the results side by side in GeoSLAM’s desktop software. Click a spot in the scan, and the associated imagery is displayed. Click a spot in the imagery, and the associated scan data is displayed.

Page 25: Indoor Point Cloud Processing

General Motivation #3: Where AR is going beyond this project

http://adas.cvc.uab.es/varvai2016/

This half-day workshop will include invited talks from researchers at the forefront of modern synthetic data generation with virtual and augmented reality (VAR) for visual artificial intelligence (VAI):

● Learning Transferable Multimodal Representations in VAR, e.g., via deep learning

● Virtual World design for realistic training data generation

● Augmenting real-world training datasets with renderings of 3D virtual objects

● Active & reinforcement learning algorithms for effective training data generation and accelerated learning

Xcede’s Data Science team are collaborating with one of the world’s foremost image recognition and augmented reality platforms. Already working with some of the world's top brands, including Pepsi, Coca-Cola, Procter & Gamble, General Mills, Anheuser-Busch, Elle, Glamour, Honda and BMW, their mobile app has been downloaded over 45 million times.

Our client is now looking for a Computer Vision Researcher to join their Deep Learning R&D team who can help bring their technology to the next level.

http://www.eetimes.com/author.asp?section_id=36&doc_id=1330958

https://arxiv.org/pdf/1605.09533v1.pdf

Page 26: Indoor Point Cloud Processing

NIPS 2016: 3D WorkshopDeep learning is proven to be a powerful tool to build models for language (one-dimensional) and image (two-dimensional) understanding. Tremendous efforts have been devoted to these areas, however, it is still at the early stage to apply deep learning to 3D data, despite their great research values and broad real-world applications. In particular, existing methods poorly serve the three-dimensional data that drives a broad range of critical applications such as augmented reality, autonomous driving, graphics, robotics, medical imaging, neuroscience, and scientific simulations. These problems have drawn the attention of researchers in different fields such as neuroscience, computer vision, and graphics.

The goal of this workshop is to foster interdisciplinary communication of researchers working on 3D data (Computer Vision and Computer Graphics) so that more attention of broader community can be drawn to 3D deep learning problems. Through those studies, new ideas and discoveries are expected to emerge, which can inspire advances in related fields.

This workshop is composed of invited talks, oral presentations of outstanding submissions and a poster session to showcase the state-of-the-art results on the topic. In particular, a panel discussion among leading researchers in the field is planned, so as to provide a common playground for inspiring discussions and stimulating debates.

The workshop will be held on Dec 9 at NIPS 2016 in Barcelona, Spain. http://3ddl.cs.princeton.edu/2016/

ORGANIZERS
● Fisher Yu - Princeton University

● Joseph Lim - Stanford University

● Matthew Fisher - Stanford University

● Qixing Huang - University of Texas at Austin

● Jianxiong Xiao - AutoX Inc.

http://cvpr2017.thecvf.com/ In Honolulu, Hawaii

“I am co-organizing the 2nd Workshop on Visual Understanding for Interaction in conjunction with CVPR 2017. Stay tuned for the details!”

“Our workshop on Large-Scale Scene Understanding Challenge is accepted by CVPR 2017.”

Page 27: Indoor Point Cloud Processing

Labeling 3D Spaces: Semantic part

Manually labeling 3D scans → way too time-consuming!

https://arxiv.org/abs/1511.03240

SynthCam3D is a library of synthetic indoor scenes collected from various online 3D repositories and hosted at http://robotvault.bitbucket.org. https://arxiv.org/abs/1505.00171

SYNTHETIC DATA

The advantages of synthetic 3D models cannot be overstated, especially when considering scenes: once a 3D annotated model is available, it allows rendering as many 2D annotated views as desired.

Samples of annotated images rendered at various camera poses for an office scene taken from SynthCam3D

youtube.com/watch?v=cGuoyNY54kU

Existing datasets: NYUv2

Page 28: Indoor Point Cloud Processing

SYNTHETIC Datasets #1

SynthCam3D is a library of synthetic indoor scenes collected from various online 3D repositories and hosted at http://robotvault.bitbucket.org.

Large public repositories (e.g. Trimble Warehouse) of 3D CAD models have existed in the past, but they have mainly served the graphics community. It is only recently that we have started to see emerging interest in synthetic data for computer vision. The advantages of synthetic 3D models cannot be overstated, especially when considering scenes: once a 3D annotated model is available, it allows rendering as many 2D annotated views as desired, at any resolution and frame-rate. In comparison, existing datasets of real data are fairly limited both in the number of annotations and the amount of data. NYUv2 provides only 795 training images for 894 classes; hence learning any meaningful features characterising a class of objects becomes prohibitively hard.

https://arxiv.org/abs/1505.00171

Page 29: Indoor Point Cloud Processing

SYNTHETIC Datasets #2

Creating large datasets with pixelwise semantic labels is known to be very challenging due to the amount of human effort required to trace accurate object boundaries. High-quality semantic labeling was reported to require 60 minutes per image for the CamVid dataset and 90 minutes per image for the Cityscapes dataset. Due to the substantial manual effort involved in producing pixel-accurate annotations, semantic segmentation datasets with precise and comprehensive label maps are orders of magnitude smaller than image classification datasets. This has been referred to as the “curse of dataset annotation”: the more detailed the semantic labeling, the smaller the datasets.

Somewhat orthogonal to our work is the use of indoor scene models to train deep networks for semantic understanding of indoor environments from depth images [15, 33]. These approaches compose synthetic indoor scenes from object models and synthesize depth maps with associated semantic labels. The training data synthesized in these works provides depth information but no appearance cues. The trained models are thus limited to analyzing depth maps.

References [15] (SynthCam3D, previous slide) and [33] as numbered in the quote above.

Page 30: Indoor Point Cloud Processing

Deep Learning Problems

Data columns: x, y, z, red, green, blue

Point clouds can be huge

• Voxelization of the scene is impossible in practice without severe downsampling / discretization

• Mesh/surface reconstruction increases the data amount as well

How to handle massive datasets in deep learning?

Simplify (primitive-based reconstruction) before semantic segmentation?
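To make the size problem concrete, here is a minimal voxel-grid downsampling sketch in NumPy; the 5 cm resolution is an illustrative choice:

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Snap points to a voxel grid; keep one centroid per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    out = np.zeros((inv.max() + 1, 3))
    cnt = np.bincount(inv).astype(float)
    for dim in range(3):
        out[:, dim] = np.bincount(inv, weights=points[:, dim]) / cnt
    return out   # ~1M input points can collapse to tens of thousands
```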

https://github.com/btgraham/SparseConvNet

https://ei.is.tuebingen.mpg.de

https://arxiv.org/abs/1605.06240

This can be used to analyse 3D models, or space-time paths. Here are some examples from a 3D object dataset. The insides are hollow, so the data is fairly sparse. The computational complexity of processing the models is related to the fractal dimension of the underlying objects.
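In data-structure terms, the sparsity being exploited amounts to storing only occupied voxels, e.g. as a coordinate-to-feature map instead of a dense grid; a minimal sketch (not SparseConvNet's actual internals):

```python
import numpy as np

def to_sparse_voxels(points, colors, voxel=0.05):
    """COO-style storage: one averaged RGB feature per occupied voxel.
    A dense 512^3 grid has ~1.3e8 cells; an indoor scan typically
    occupies well under 1% of them."""
    keys = np.floor(points / voxel).astype(np.int64)
    grid = {}
    for key, rgb in zip(map(tuple, keys), colors):
        grid.setdefault(key, []).append(rgb)
    return {k: np.mean(v, axis=0) for k, v in grid.items()}
```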

https://arxiv.org/abs/1503.04949 | https://github.com/MPI-IS/bilateralNN

doi:10.1111/j.1467-8659.2009.01645.x


Can't use 3D CNNs → try alternative schemes

no normals

Page 31: Indoor Point Cloud Processing

Point clouds with deep learning: example with Normals

Eurographics Symposium on Geometry Processing 2016, Volume 35 (2016), Number 5. http://dx.doi.org/10.1111/cgf.12983

Convolutional neural networks: Work on normal estimation with CNNs focuses on using RGB images as input, or possibly RGB-D, but not sparse data such as unstructured 3D point clouds. CNN-based techniques have been applied to 3D data, though, but with a voxel-based perspective, which is not accurate enough for normal estimation. Techniques to efficiently apply CNN-based methods to sparse data have been proposed too [Gra15], but they mostly focus on efficiency issues, to exploit sparsity; applications are 3D object recognition, again with voxel-based granularity, and analysis of space-time objects. An older, neuron-inspired approach [JIS03] is more relevant to normal estimation in 3D point clouds, but it actually addresses the more difficult task of meshing. It uses a stochastic regularization based on neighbors, but the so-called “learning process” actually is just a local iterative optimization.

CNNs can also address regression problems such as object pose estimation [PCFG12]. These same properties seem appropriate as well for the task of learning how to estimate normals, including in the presence of noise and when several normal candidates are possible near sharp features of the underlying surface.

The question, however, is how to interpret the local neighborhood of a 3D point as an image-like input that can be fed to a CNN. If the point cloud is structured, as given by a depth sensor, the depth map is a natural choice as CNN input. But if the point cloud is unstructured, it is not clear what to do. In this case, we propose to associate an image-like representation to the local neighborhood of a 3D point via a Hough transform. In this image, a pixel corresponds to a normal direction, and its intensity measures the number of votes for that direction; besides, pixel adjacency relates to closeness of directions. It is a planar map of the empirical probability of the different possible directions. Then, just as a CNN for ordinary images can exploit the local correlation of pixels to denoise the underlying information, a CNN for these Hough-based direction maps might also be able to handle noise, identifying a flat peak around one direction. Similarly, just as a CNN for images can learn a robust recognizer, a CNN for direction maps might be able to make uncompromising decisions near sharp features, when different normals are candidate, opting for one specific direction rather than trading off for an average, smoothed normal. Moreover, outliers can be ignored in a simple way by limiting the size of the neighborhood, thus reducing or preventing the influence of points lying far from a more densely sampled surface.

Makes it computationally feasible
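To make the construction concrete, here is a minimal NumPy sketch of such a Hough-based direction map; the bin counts and vote budget are illustrative, not the paper's settings:

```python
import numpy as np

def direction_map(nbhd, n_votes=1000, bins=32, rng=None):
    """nbhd: (k, 3) local neighbourhood of a query point.
    Triplets of neighbours vote for the plane normal they span;
    votes are binned into a (theta, phi) image a CNN can consume."""
    rng = rng or np.random.default_rng(0)
    img = np.zeros((bins, bins))
    for _ in range(n_votes):
        p0, p1, p2 = nbhd[rng.choice(len(nbhd), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                       # degenerate triplet
            continue
        n /= norm
        if n[2] < 0:                          # fold antipodal directions
            n = -n
        theta = np.arccos(np.clip(n[2], -1, 1))      # in [0, pi/2]
        phi = np.arctan2(n[1], n[0]) + np.pi         # in [0, 2*pi]
        i = min(int(theta / (np.pi / 2) * bins), bins - 1)
        j = min(int(phi / (2 * np.pi) * bins), bins - 1)
        img[i, j] += 1
    return img                                 # CNN input "image"
```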

Page 32: Indoor Point Cloud Processing

Literature: Indoor point cloud segmentation with deep learning

http://robotvault.bitbucket.org/scenenet-rgbd.html

http://delivery.acm.org/10.1145/3020000/3014008

https://pdfs.semanticscholar.org/1ce8/1a2c8fa5731db944bfb57c9e7e8eb0fc5bd2.pdf

https://arxiv.org/pdf/1612.00593v1.pdf

Page 34: Indoor Point Cloud Processing

Second level deep learning in Practice

btgraham/SparseConvNet: C++ spatially-sparse convolutional networks. Allows processing of sparse 2-, 3- and 4-dimensional data. Build CNNs on the square/cubic/hypercubic or triangular/tetrahedral/hyper-tetrahedral lattices.

gangiman/PySparseConvNet: Python wrapper for SparseConvNet

in practice

http://3ddl.cs.princeton.edu/2016/slides/notchenko.pdf

Update the old school machine learning approach to modern deep learning. Reconstruct the planar shapes using a database of CAD models (ModelNet)? Requires some work for sure.

http://staff.ustc.edu.cn/~juyong/DictionaryRecon.html

MOTIVATION

3dmodel_feature: Code for extracting 3D CNN features of CAD models

Page 35: Indoor Point Cloud Processing

Point cloud pipeline, 2nd step: “Deep learnify”

[Pipeline blocks: Denoising → Consolidation → Upsampling → Planar Segmentation → Simplification]

2D ↔ Unstructured 3D: rough correspondences from the more established 2D deep learning world

btgraham/SparseConvNet | gangiman/PySparseConvNet (Python wrapper for SparseConvNet) | 3dmodel_feature (code for extracting 3D CNN features of CAD models)

Sparse libraries only as starting points

https://arxiv.org/abs/1503.04949 | https://github.com/MPI-IS/bilateralNN

http://arxiv.org/abs/1607.02005

Andrew Adams, Jongmin Baek, Myers Abraham Davis, May 2010. http://dx.doi.org/10.1111/j.1467-8659.2009.01645.x

Page 36: Indoor Point Cloud Processing

3D SHAPE representations #1: VRN Ensemble
ModelNet40: 95.54% accuracy – the STATE-OF-THE-ART!

For this work, we select the Variational Autoencoder (VAE), a probabilistic framework that learns both an inference network to map from an input space to a set of descriptive latent variables, and a generative network that maps from the latent space back to the input space.

Our model, implemented in Theano with Lasagne, comprises an encoder network, the latent layer, and a decoder network, as displayed in Figure 1.

https://arxiv.org/abs/1608.04236 | https://github.com/ajbrock/Generative-and-Discriminative-Voxel-Modeling
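The paper implements its model in Theano with Lasagne; purely to illustrate the VAE structure it describes (encoder, latent layer, decoder, reparameterization), here is a compact PyTorch stand-in with arbitrary layer sizes, not the authors' architecture:

```python
import torch
import torch.nn as nn

class VoxelVAE(nn.Module):
    def __init__(self, res=32, z_dim=128):
        super().__init__()
        n = res ** 3
        # Inference (encoder) network: voxels -> latent distribution.
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(n, 512), nn.ELU())
        self.mu, self.logvar = nn.Linear(512, z_dim), nn.Linear(512, z_dim)
        # Generative (decoder) network: latent -> voxel occupancies.
        self.dec = nn.Sequential(nn.Linear(z_dim, 512), nn.ELU(),
                                 nn.Linear(512, n), nn.Sigmoid())

    def forward(self, x):                        # x: (B, res, res, res)
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        return self.dec(z).view_as(x), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """ELBO = reconstruction + KL divergence to the unit Gaussian."""
    bce = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kl
```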

Page 37: Indoor Point Cloud Processing

3D SHAPE representations #2: Probing filters

https://arxiv.org/abs/1605.06240

https://github.com/yangyanli/FPNN

Created by Yangyan Li, Soeren Pirk, Hao Su, Charles Ruizhongtai Qi, and Leonidas J. Guibas from Stanford University.

Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation.

In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them.

Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space “intelligently”, rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3D CNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.
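A minimal NumPy sketch of the field-probing idea, with fixed rather than learned probing-point locations and illustrative shapes (in the paper both the weights and the point locations are optimized by backpropagation):

```python
import numpy as np

def trilinear(field, pts):
    """field: (R, R, R) volume; pts: (n, 3) in voxel coordinates."""
    r = field.shape[0]
    p = np.clip(pts, 0, r - 1 - 1e-6)
    i = p.astype(int)
    f = p - i
    out = np.zeros(len(p))
    for dx in (0, 1):                    # accumulate the 8 corner weights
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0])
                     * np.where(dy, f[:, 1], 1 - f[:, 1])
                     * np.where(dz, f[:, 2], 1 - f[:, 2]))
                out += w * field[i[:, 0] + dx, i[:, 1] + dy, i[:, 2] + dz]
    return out

def probe_feature(field, probe_pts, weights):
    """One filter response: weighted sum of the field sampled at probes."""
    return np.dot(weights, trilinear(field, probe_pts))
```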

Page 39: Indoor Point Cloud Processing

Point cloud pipeline in practice #2: Joint pipeline

http://dx.doi.org/10.1016/j.neucom.2015.08.127

http://ai.stanford.edu/~quocle/tutorial2.pdf