
Interactive Indoor 3D Modeling from a Single Photo with CV Support

Tomoya Ishikawa 1, Thangamani Kalaivani 1,2, Takashi Okuma 1, Keechul Jung 3, and Takeshi Kurata 1

1 AIST, Japan  2 University of Tsukuba, Japan  3 Soongsil University, Korea  [email protected]

Abstract

This paper describes an interactive indoor 3D modeler for efficiently virtualizing the inside of rooms from a single photo, using interaction techniques based on geometric constraints and computer vision (CV) support. In the modeler, users place planes on a photo in a two-dimensional way as the unit of interaction. The given regions are inverse-projected into 3D space to create 3D models. The geometric information is easily obtained through CV-supported user interaction. Based on it, the modeler provides plane orthogonality and parallelism, and view-volume constraints, to support interactive modeling. In addition, visualization by projective texture mapping (PTM), depth mapping, and the smart second view makes it easy for users to confirm the shapes of models.

1. Introduction

Virtualized real objects created from photos enhance the reality of virtual environments, which makes it possible to reduce the gap between the real and virtual worlds in various applications. In this paper, we mainly aim at furniture-shopping support by pre-visualizing a combination of a customer's room and new furniture, using a 3D indoor model created from a photo captured by the customer. Since such visualization prompts the customer to imagine the appearance of his/her room with new furniture, the QoS (Quality of Service) should improve.

A typical approach to modeling objects from photos is stereo methods. Despite long-established efforts, it is still hard to generate 3D models of practical quality with them. Moreover, in terms of convenience, it is not reasonable to require every customer to shoot photos suitable for stereo methods. Thus, enabling modeling from a single photo should be beneficial for broadening the customer range.

Modeling methods from a single photo have been investigated for creating better-quality 3D models [1-4].

Automatic Photo Pop-up [1] assigns 3D information to regions of an input photo by learning many features from other photos in advance. However, since the method depends on color-based segmentation, it does not work well for modeling the inside of rooms, which often consist of similar colors. On the other hand, manual 3D modelers can produce high-quality models by taking advantage of users' knowledge, but are time-consuming [2]. Tour into the picture [3] is a semi-automatic modeler that allows users to create models by estimating camera parameters from a photo and approximating an object's shape as a rectangular solid. Oh et al. [4] use the camera parameters more efficiently to keep the shapes of models by constraining how the depth values of pixels change along their LoS (Lines of Sight) during translation. However, their method requires a huge amount of time to divide a photo into regions.

Our approach is based on manual or semi-automatic interaction, as are the modelers [1-4] mentioned above, but it supports intuitive and efficient modeling through interaction techniques that use geometric constraints derived from a photo, together with visualization coupled with the interaction. In the following, Sections 2 and 3 describe our proposed modeler and a user study using it. Finally, conclusions and future work are given in Section 4.

2. Interactive indoor 3D modeler

Differing from Oh's system [4], our proposed modeler does not make the user assign depth values to pixels or carry out image segmentation in advance, since 3D models can be created directly in 3D space based on the photo while modeling. Furthermore, real-time PTM, depth-map presentation, and the adaptively controlled smart second view help the user comprehend the shapes of the models being created. This not only reduces modeling time but also improves usability.


Figure 1. Flowchart of the proposed modeler: (a) input indoor photo, (b) camera-parameter estimation, (c) interactive modeling and checking, (d) output 3D model.

2.1. Modeler overview

Figure 1 shows the flowchart of our proposed modeler. First, the modeler estimates camera parameters from an input photo through simple CV-supported user interaction. Then, the user interactively creates 3D models using the estimated camera parameters. While modeling, the user can confirm whether the models being created are correct by changing viewpoints and by visualizing the textures and depth maps projected onto the models. After modeling, the modeler outputs a model file for use in other applications. The rest of this section describes camera-parameter estimation from a photo, interaction for modeling, and visualization.

2.2. Camera-parameter estimation

In indoor environments, floors, walls, and furniture are generally placed parallel or perpendicular to one another. These features make modeling quite easy when an orthogonal coordinate system is aligned with the floors and walls that occupy large regions of a photo. Our proposed modeler utilizes these features and estimates the transformation parameters between the object and camera coordinate systems, as well as the focal length, through simple CV-supported user interaction.

An object coordinate system is set by selecting two pairs of lines that are parallel in the actual 3D room in the photo. The modeler first runs a Hough transform to detect lines in the photo and displays them to the user (Figure 1-(b)), so the user can specify the pairs of parallel lines simply by clicking the displayed lines. The 2D intersection point of each selected pair is a vanishing point of the photo. From the two vanishing points eᵢ, the focal length f of the camera can be estimated by the following equation:

f = √|e₁ · e₂|, where eᵢ = (xᵢ, yᵢ)ᵀ (i = 1, 2).
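As an illustrative sketch only, not the paper's implementation, the line detection and vanishing-point computation above could be done with OpenCV's standard Hough transform; the function names and thresholds below are assumptions.

```python
import cv2
import numpy as np

def detect_lines(photo_bgr, hough_threshold=150):
    """Candidate lines as (rho, theta) pairs from a standard Hough transform;
    the Canny and Hough thresholds here are illustrative."""
    gray = cv2.cvtColor(photo_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180.0, hough_threshold)
    return [] if lines is None else [tuple(l[0]) for l in lines]

def vanishing_point(line_a, line_b):
    """Image intersection of two user-selected lines that are parallel in 3D.

    A Hough line satisfies rho = x*cos(theta) + y*sin(theta), so the
    intersection is a 2x2 linear solve (it fails if the lines are also
    parallel in the image, i.e. the vanishing point is at infinity)."""
    (r1, t1), (r2, t2) = line_a, line_b
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    return np.linalg.solve(A, np.array([r1, r2]))
```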

Hereby, the rotation matrix R between the object and camera coordinate systems is given by the following equation:

R = (v₁  v₂  v₁ × v₂), where v′ᵢ = (xᵢ, yᵢ, f)ᵀ and vᵢ = v′ᵢ / ‖v′ᵢ‖ (i = 1, 2).
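A minimal NumPy sketch of the two equations above, assuming the vanishing points are expressed relative to the principal point (i.e., the image center):

```python
import numpy as np

def estimate_f_and_R(e1, e2):
    """e1, e2: the two vanishing points (x, y), given relative to the
    principal point (assumed to be the image center)."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    f = np.sqrt(abs(np.dot(e1, e2)))                  # f = sqrt(|e1 . e2|)
    v1 = np.append(e1, f); v1 /= np.linalg.norm(v1)   # v_i = v'_i / ||v'_i||
    v2 = np.append(e2, f); v2 /= np.linalg.norm(v2)
    R = np.column_stack([v1, v2, np.cross(v1, v2)])   # R = (v1 v2 v1 x v2)
    return f, R
```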

After estimating R, the origin of the object coordinate system is set by the user. Through this manipulation, the modeler obtains the translation vector from the object to the camera coordinate system. If there is lens distortion, it should be corrected in advance with other software tools such as PTLens [5]. We also assume that the principal point corresponds to the center of the photo.

2.3. User interaction for modeling

Assuming that each object in an indoor photo can be modeled with a set of quadrangular and freeform planes for virtual furniture arrangement, our modeler provides the user with two kinds of plane-generation tools:

Quadrangular tool: generates a 3D quadrangular plane from two mouse clicks marking opposite corners. This tool is suitable for simple objects such as floors, walls, tables, and shelves.

Freeform tool: generates a freeform 3D plane whose contour is a set of points clicked successively by the user. This tool is used for more complex objects.

In both tools, the depth of the first clicked point is set to the intersection of the line of sight through the clicked point on the photo with the other planes. At the original viewpoint, the user can easily understand the correspondence between the photo and the model being generated. In particular, with the freeform tool, setting the contour points of a 3D plane is the same as 2D interaction on the photo, so the user can create models intuitively.
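A minimal sketch of this depth assignment, assuming a pinhole camera with the principal point at the image center (the function names are illustrative, not from the paper):

```python
import numpy as np

def line_of_sight(u, v, f, width, height):
    """Unit viewing ray through pixel (u, v) for a pinhole camera whose
    principal point is the image center; image y is assumed to point the
    same way as camera y (flip the sign for other conventions)."""
    d = np.array([u - width / 2.0, v - height / 2.0, f], dtype=float)
    return d / np.linalg.norm(d)

def first_click_depth(ray_dir, plane_normal, plane_point):
    """Intersect a camera-origin ray with an existing plane; returns the
    3D point, or None if the ray is parallel to or points away from it."""
    denom = float(np.dot(plane_normal, ray_dir))
    if abs(denom) < 1e-9:
        return None  # ray parallel to the plane
    t = float(np.dot(plane_normal, plane_point)) / denom
    return ray_dir * t if t > 0.0 else None
```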

During these interactions, the normal of each plane can be toggled among several default directions, such as the x-y, y-z, and z-x planes. This function is effective in environments that consist largely of planes, such as rooms. In addition, the user can create models from viewpoints other than the original one while confirming the textures and depth maps projected onto the models by the real-time PTM described below.


Figure 2. Plane translation with the geometric constraint: the plane being translated slides within the view volume.

The generated models can be translated, deformed, and deleted. For translation and deformation, the view-volume constraint lets the user control the depth (Figure 2) and the normal vector without changing the 2D shapes projected onto the input photo.
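A minimal sketch of why the 2D projection is preserved (the names are illustrative, not the paper's code): with the camera at the origin, scaling every vertex by the same positive factor moves each vertex along its own line of sight.

```python
import numpy as np

def translate_in_depth(vertices_cam, scale):
    """Slide a plane along the view volume without changing its 2D projection.

    vertices_cam: (N, 3) plane vertices in camera coordinates, camera at the
    origin. Scaling every vertex by one positive factor keeps each vertex on
    its own line of sight, so the perspective projection (x/z, y/z) of every
    vertex, and hence the 2D shape on the photo, is unchanged.
    """
    return np.asarray(vertices_cam, dtype=float) * float(scale)
```

For example, translate_in_depth(quad, 1.5) would push a quadrangular plane 1.5 times farther from the camera while its silhouette on the input photo stays fixed.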

2.4. Visualization for checking 3D models

2.4.1. Color & depth representation

The proposed modeler provides three kinds of selectable presentation modes to the user as follows (Figure 3).

Projective texture mapping (PTM): re-projects the photo texture onto the 3D models, showing the correspondence between the shapes of the models and the textures.

Depth mapping: displays the depth from the viewpoint to the models as a gray-scale image, showing the shapes of the models clearly.

Mixed mapping: displays the models with textures obtained by blending the photo textures with the depth values, giving a more shape-enhanced view than PTM alone.
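As a rough CPU illustration of the three modes (the modeler itself renders them on the GPU, as noted below), mixed mapping can be seen as a per-pixel blend of the PTM image and a depth shading; the normalization and the blend weight here are assumptions.

```python
import numpy as np

def presentation_modes(ptm_rgb, depth, alpha=0.5):
    """ptm_rgb: (H, W, 3) float image produced by projective texture mapping.
    depth: (H, W) distances from the viewpoint to the model surface.
    Returns the PTM, depth-mapping, and mixed-mapping images."""
    # Normalize depth to [0, 1] and map near surfaces to bright gray.
    d = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-9)
    depth_rgb = np.repeat((1.0 - d)[..., None], 3, axis=2)
    # Mixed mapping: blend the photo texture with the depth shading.
    mixed_rgb = alpha * ptm_rgb + (1.0 - alpha) * depth_rgb
    return ptm_rgb, depth_rgb, mixed_rgb
```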

The above presentation modes are rendered on the GPU in real time, not only while viewing the models but also while generating and editing them, which is effective for confirming the shapes of models being created.

It is often difficult for the user to confirm the shapes of models from the original viewpoint with PTM alone. In such cases, depth mapping and mixed mapping provide good cues for confirming shapes, finding missing planes, and adjusting depth.

2.4.2. Smart second view

To help the user easily understand the shapes of models while making them, our modeler displays not only the primary view but also a second view. This simultaneous presentation lets the user intuitively carry out creation and confirmation of the models.

Figure 3. Examples of PTM (left), depth mapping (center), and mixed mapping (right).

Figure 4. Close-up of the second view (left) and part of the primary view (right), showing the plane being created.

We define the criteria for determining the second-view parameters as follows.

1. View parameters should not be changed frequently.

2. The next point to be created (corresponding to the mouse cursor) must not be occluded by other planes.

3. The view must not show the backside of the target plane.

4. The view should be parallel to the target plane.

5. The view should have a wide FoV when the primary view has a narrow FoV, and vice versa.

The modeler searches for the second-view parameters based on the above criteria; a scoring sketch follows below. For real-time search of view parameters, the parameters are sampled coarsely.
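The paper does not give the scoring function, so the following is only a hedged illustration of how criteria 1-5 might be combined into a score; the weights, the FoV sampling range, and the occlusion test are all assumptions.

```python
import numpy as np

# Assumed coarse sampling range for the second view's field of view.
FOV_NARROW, FOV_WIDE = np.deg2rad(30.0), np.deg2rad(90.0)

def score_second_view(eye, forward, fov, plane_normal,
                      prev_eye, cursor_visible, primary_fov):
    """Score one candidate second view against criteria 1-5.

    eye, forward, plane_normal: 3-vectors (forward and plane_normal unit
    length); fov, primary_fov: field-of-view angles in radians;
    cursor_visible: result of the renderer's occlusion test for the next
    point to be created."""
    if not cursor_visible:                  # criterion 2: cursor occluded
        return -np.inf
    facing = -np.dot(forward, plane_normal)
    if facing <= 0.0:                       # criterion 3: backside view
        return -np.inf
    score = facing                          # criterion 4: face the target plane
    # Criterion 5: complement the primary FoV (wide when it is narrow, etc.).
    score -= abs(fov - (FOV_NARROW + FOV_WIDE - primary_fov))
    # Criterion 1: penalize large changes from the current second view.
    score -= 0.5 * np.linalg.norm(np.asarray(eye) - np.asarray(prev_eye))
    return score
```

The modeler would then keep the best-scoring candidate from a coarse grid of sampled view parameters, in line with the coarse sampling mentioned above.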

3. User study

We carried out a user study to investigate the pros and cons of our proposed modeler. In this study, the subjects performed indoor modeling from a photo using two modelers: one with our proposed interaction techniques and one without. They then gave us remarks about their impressions of the modelers as a whole, the manipulations, and furniture arrangement. The conditions were as follows.

The differences in interaction techniques between the two modelers are configured as in Table 1.

The subjects are given 30 minutes per modeler to learn its interaction techniques through tutorials. After the tutorials, they practice each modeler freely for 10 minutes.

After the practices, the subjects create indoor models of a photo (Figure 1-(a)) by using both modelers for an hour each.


The number of subjects is six. Three subjects first use the modeler without the proposed interaction techniques and then the modeler with them; the remaining three do the opposite.

Table 1. Functions of the modelers in the experiment (✓: available).

Function                                    | Without proposed techniques | With proposed techniques
Camera calibration                          | ✓                           | ✓
Quadrangular tool                           | ✓                           | ✓
Freeform tool                               | ✓                           | ✓
Plane manipulation without geo. constraint  | ✓                           |
Plane manipulation with geo. constraint     |                             | ✓
Projective texture mapping                  | ✓                           | ✓
Depth mapping                               |                             | ✓
Mixed mapping                               |                             | ✓
Second view                                 |                             | ✓

The subjects make remarks within 30 minutes after trials for both modelers.

The following is a summary of the remarks.

[Impressions of the modelers with/without the proposed interaction techniques]

Many subjects had positive impressions of the proposed modeler, whereas the modeler without the proposed interaction techniques received negative impressions. We think one reason is that fewer constraints from the photo are available for manipulating 3D planes.

[Impressions of the manipulations of both modelers]

Many subjects had negative impressions because, in both modelers, different kinds of manipulations had to be used repeatedly, mainly with a mouse, and the time to learn the manipulations was not sufficient. To alleviate this problem, our modeler should provide visual feedback and instructions on the current manipulation, the next one, and how to perform it for inexperienced users.

[Impression of furniture arrangement]

Most of the subjects commented that the expressiveness of indoor models consisting of 3D planes is sufficient for virtual furniture arrangement from the original viewpoint. However, some commented that approximating objects by sets of planes gave undesired visual effects from other viewpoints. In addition, incorrect textures were projected onto areas occluded from the original viewpoint, and such inappropriate textures were often observed when the viewpoint differed from the original one.

Figure 5 shows examples of furniture arrangement in which a virtual chair is placed in the indoor model created with the proposed modeler. As shown in Figure 5, the occlusion by the table and the sofa is correctly represented.

Figure 5. Examples of furniture arrangement.

4. Conclusions and future work

We have proposed a CV-supported modeler for creating indoor 3D models from a single photo. For efficient modeling, the modeler provides plane-generation tools that take advantage of the camera parameters estimated from the photo, editing with geometric constraints, and intuitive visualization for confirming the shapes of generated models. In the user study, the subjects had positive impressions of the proposed interaction techniques, although each manipulation and visualization should be improved further.

Future work includes the improvements mentioned above, tools for more flexible shape creation, and initial-shape recommendation by machine learning. Furthermore, we consider that the problem of incorrect textures in occluded regions can be solved by the following idea. From the models generated by a user, our modeler can compute the visible and invisible regions of every plane. Based on this result, occluded regions can be inpainted by copying textures from visible regions. Moreover, the normal direction of each region and the spatial relationships among regions help inpaint occluded textures while accounting for perspective distortion and the actual proximity of objects in the real world. Although conventional texture-inpainting methods need a huge amount of time, we expect that our modeler can reduce this time drastically by applying the above idea.

6. References

[1] D. Hoiem, et al., "Automatic photo pop-up", ACM Trans. on Graphics, vol. 24, no. 3, pp. 577-584, 2005.
[2] "Google SketchUp", http://sketchup.google.com.
[3] Y. Horry, et al., "Tour into the picture: using a spidery mesh interface to make animation from a single image", In Proc. of SIGGRAPH, pp. 225-232, 1997.
[4] B. M. Oh, et al., "Image-based modeling and photo editing", In Proc. of SIGGRAPH, pp. 433-442, 2001.
[5] "PTLens", http://epaperpress.com/ptlens/.