

2021 European Conference on Computing in Construction, Ixia, Rhodes, Greece

July 25-27, 2021

ASSESSING IFC CLASSES WITH MEANS OF GEOMETRIC DEEP LEARNING ON DIFFERENT GRAPH ENCODINGS

Fiona C. Collins1, Alexander Braun1, Martin Ringsquandl2, Daniel M. Hall3, André Borrmann1

1 Technical University of Munich, Germany
2 Siemens AG, Corporate Technology, Munich, Germany

3 ETH Zürich, Switzerland

ABSTRACT
Machine-readable Building Information Models (BIM) are of great benefit for the building operation phase. Losses through data exchange or issues in software interoperability can significantly impede their availability. Incorrect and imprecise semantics in the exchange format IFC are frequent and complicate knowledge extraction. To support an automated IFC object correction, we use a Geometric Deep Learning (GDL) approach to perform classification based solely on the 3D shape. A Graph Convolutional Network (GCN) uses the native triangle mesh and automatically creates meaningful local features for subsequent classification. The method reaches an accuracy of up to 85% on our self-assembled, partially industry dataset.

INTRODUCTION
The use of a standardized data structure is critical to successfully adopting BIM in the building-operation phase. Most building-operation-related tasks require seamless and automatic access to vast amounts of cross-disciplinary data, fast retrieval and spatial localization of specific data, and the creation of relevant sub-views to perform specific jobs (Becerik-Gerber et al. 2012). The Industry Foundation Classes (IFC) provide an open data exchange format to standardize information flow in the BIM process. However, the mapping from BIM to IFC is not straightforward, and exchanges often result in a loss or modification of object semantics and information (Koo et al. 2020, Ozturk 2020). Misclassified entities, as defined in an IFC data model, prevent the successful deployment of automation in BIM-supported tasks during operation (Jang & Collinge 2020, Wu & Zhang 2019) and require tedious rework by the Facility Management (FM). BIM has its history in computer-aided design (CAD) and is still today a method that depends primarily on geometric modeling and instance placement. An architect or engineer might recognize an object seamlessly based on experience and thus fail to define an explicit semantic class. Yet the absence of that class will likely keep the object hidden from any automated model parsing. As BIM requirements grow to encapsulate various complex concepts and entities across multiple disciplines, the IFC standard becomes highly complex and requires expert knowledge for the correct semantic mapping of BIM instances. Consequently, the lack of rigor in describing BIM instances makes IFC-based exchanges unpredictable (Koo et al. 2020).

The absence of semantic object classes is also a common challenge in the Scan-to-BIM process (Ma et al. 2018). The reconstructed geometry from point clouds needs to be segmented and semantically labeled before being of use for downstream applications.

In the following paragraphs, we first show why the general classification of 3D shapes is not a straightforward task, owing to the various formats the data can be encoded in. After that, we describe how the rapid progress in GDL might provide a powerful tool for imitating a practitioner's eye and facilitating computerized shape understanding.

3D data formats

BIM elements can be encoded explicitly (defined by the element's surface) or implicitly (a series of construction steps encodes the element). Implicit representations such as Constructive Solid Geometry (CSG) or extruded/swept volumes have advantages favoring a traceable and flexible geometry exchange in the BIM process (Borrmann et al. 2015). A limiting factor of implicit representations is that all involved software systems must support the operators used for the geometries' initial creation.

Explicit representation formats such as Boundary Representations (BRep) or triangulated surface representations provide a more generic object encoding. As shown in Figure 1a, surface models consist of a set of connected polygons or triangles, discretizing an object's continuous surface. Their use extends beyond BIM and is especially important in visualization-related applications, such as simulations. Similarly, solid models, e.g., CSG or voxel-assembled geometry (Figure 1c), encode the physical space an object occupies. Depending on the type, one or several solids form a semantically coherent object. Moreover, point clouds are commonly acquired by LiDAR technology (Bosché et al. 2015), photogrammetry (Tuttas et al. 2017), or depth cameras (Armeni et al. 2016) and can, depending on the acquisition settings, offer a very exact representation of the surfaces of a scene or object. A set of points in a given coordinate system (X, Y, Z) encodes a scene, an object shape, or a segment (Figure 1b). Alternatively, a 3D shape can be represented by its 2D projection or rendering from multiple view angles (Figure 1d).

Figure 1: Explicit 3D representations for a 3-way valve: (a) Triangulated mesh (b) Point cloud (c) Voxel grid (d) 2D multi-view images

When applying Machine Learning (ML), the need for vast datasets tends to make researchers choose general representation formats over BIM's native implicit representation formats (Kim et al. 2019).

Deep Learning on 3D data

Deep Learning (DL) has led to considerable breakthroughs in various tasks, notably in computer vision (Krizhevsky et al. 2012, Czerniawski & Leite 2020). However, deep layer stacking and the concept of convolutional filters by design induce specific priors in the learning process. These constructs have proven very suitable, especially for image classification and segmentation tasks on structured data domains.

While deep learning has matured into a technology used in commercial applications, its application to 3D data is not straightforward. When considering non-Euclidean data such as surface meshes or unstructured and irregular point cloud data, the convolutional filter's fundamental assumptions do not hold (Bronstein et al. 2017). Discretizing the 3D space into a regular density grid or taking multi-view snapshots of the object are ways to re-establish a structured data space suitable for deep convolutional learning. Choosing the ideal voxel size in the first and defining the optimal viewpoint in the second approach is a challenging preprocessing step and is likely to impede the conservation of features relevant for classification tasks. PointNet (Qi et al. 2017) and its wide range of variations operate on point cloud data directly, leveraging important spatial characteristics native to the data. It treats each point individually and uses, in the case of PointNet++, symmetric functions to guarantee that the principles of convolution apply to 3D data. PointNet, however, does not consider local connectivity between points and fails to extract detailed geometric information (Wang, Huang, Hou, Zhang & Shan 2019, Wang, Sun, Liu, Sarma, Bronstein & Solomon 2019).

Geometric deep learning (Bronstein et al. 2017) is the translation of the key concepts of convolution to the non-Euclidean domain and allows for improved 3D learning on explicit geometry representations without data preprocessing or cumbersome feature engineering. New methods and facilitation frameworks such as PyTorch Geometric1 or Deep Graph Library2 unlock the development of algorithms operating on raw 3D surface meshes.

The abundance and relevance of surface meshes in BIM and point cloud reconstruction suggest exploring technologies capable of leveraging the native data characteristics. In this context, our contribution can be summarized as follows:

• The advantages of GDL in the context of IFC instance classification are identified.

• A method for future assembly of IFC entities for extended shape learning is presented.

• The presented dataset is published as BIMGEOM (Collins 2021).

• A basic light-weight GDL architecture is implemented to operate on the IFC shape encodings.

We start off by showing how previous approaches dealt with classifying BIM elements and point out the comparative benefits GDL could have. In the methods section, we present the workflow leading from IFC models to two different shape encodings, laying the base for GDL. We then train a Graph Convolutional Network (GCN) and discuss the results' relevance to today's construction industry challenges.

RELATED WORK
Methods commonly referred to as BIM semantic enrichment deduce implicit information from meaningful topological and element-wise features. We can differentiate between rule-based inference and ML methods.

Sacks et al. (2017) use single-object features such as volume and extrusion direction, as well as pair-wise topological and semantic relationship features such as "parallel to" or "neighboring object is X". Ma et al. (2018) highlight that rule-based inference lacks rigor and suggest extending their approach with ML.

1 https://pytorch-geometric.readthedocs.io
2 https://www.dgl.ai


ML methods have in turn received significant attention. Koo et al. (2017) extract geometric features such as width, height, length, and volume from the object geometries to train a support-vector machine for IFC object outlier detection. The challenge of traditional Euclidean ML models lies in defining meaningful features derived from the 3D shape.

Kim et al. (2019) use a 2D CNN approach to classify 3D objects and render multi-view images from each object. Although the 2D-image approach offers advantages regarding the availability of datasets for transfer training, it neglects structures and patterns that only exist in 3D. Most recently, Koo et al. (2020) introduce the terminology of GDL and compare PointNet to a multi-view CNN approach for BIM element classification. Despite achieving good results, they argue that the approaches are computationally too expensive in training and preprocessing for the construction industry's daily practice.

No previous work has investigated non-Euclidean deep learning methods in the BIM context. The key idea is to learn high-dimensional vector representations of local geometry, so-called embeddings, thereby circumventing the tedious task of geometric feature selection. We extend the work in this domain by introducing the GDL concepts described by Bronstein et al. (2017) and offer a data-driven classification approach suitable for domain practitioners.

METHODS AND APPROACH
In order for the GCN to fulfill the classification task, training on labeled data is required. The process of preparing a GCN for building element classification is as follows: 1. assembly, preprocessing, and encoding of shapes; 2. iterative optimization of the network's trainable parameters (training); and 3. validation of the network on unseen data. We assemble a dataset of building models from both industry and academia; it forms the basis of our approach.

Data assembly

The characteristics of the training dataset must closely reflect the characteristics of the domain the network will later be applied to. Otherwise, the network might overfit to the more specific cases presented in the training dataset.

Ideally, the trained network can predict the class of shapes independent of:

1. the authoring software and vendor-specific modelling differences

2. the building functionality and its implications on layouts

3. the Level of Geometry (LoG), e.g., low-LoG remnants from early project stages

4. the repetitive nature of some construction elements, causing over- and under-represented classes

These aspects are considered in the following assembly steps.

22 IFC files are used for data collection, authored in Autodesk Revit and ArchiCAD by different domain practitioners (addressing Item 1). Here, we assume that the BIM-to-IFC mappings are set correctly by the author and perform a brief manual verification. Similarly significant shares of the models represent office buildings and public buildings (Item 2). Although no closer attention is given to LoG during assembly, Item 3 is discussed in the results section. The Application Programming Interface (API) of SimpleBIM is used to tag unique geometries based on their representations. We thereby avoid extracting repetitive elements from the IFC models (Item 4). The IFC geometries of interest are extracted from the tagged files with IfcOpenShell3, triangulated using the PythonOCC libraries4, and divided into a training (80%) and a test (20%) set for learning and validating. The dataset consists of structural elements (IfcWall, IfcSlab, IfcColumn, IfcWindow, IfcDoor, IfcStair, IfcRailing), equipment (IfcFlowTerminal, IfcFlowSegment, IfcFlowFitting, IfcDistributionControlElement, IfcFlowController), and interior furniture (IfcFurnishingElement); an example of each is shown in Figure 3.

Table 1 shows the final dataset composition. A maximum of 100 randomly selected unique geometries per IFC file is set to avoid over-representing one element class. It prevents abundant element classes with mostly unique geometries, like IfcWall and IfcSlab, from dominating the final data (Item 4).
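The per-file cap can be sketched in a few lines. This is an illustrative stand-in for the SimpleBIM tagging step, not the authors' implementation; the function name, the (geometry_id, ifc_class) tuples, and the fixed seed are our assumptions:

```python
import random

def select_geometries(files, cap=100, seed=0):
    """Cap the number of unique geometries drawn per IFC file: each file
    contributes at most `cap` randomly chosen unique geometries, keeping
    abundant classes such as IfcWall from dominating the dataset.
    `files` is a list of per-file geometry lists (hypothetical structure)."""
    rng = random.Random(seed)
    selected = []
    for geometries in files:  # one list of (geometry_id, ifc_class) per file
        unique = list(dict.fromkeys(geometries))  # drop duplicate shapes, keep order
        selected.extend(rng.sample(unique, min(cap, len(unique))))
    return selected
```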

Table 1: BIMGEOM class distribution: total no. of geometries (G), no. of unique geometries (UG), and no. of selected geometries (SG), limiting 100 UG per IFC file

Category                        G        UG      SG
IfcWall                         39'474   23'381  2'000
IfcSlab                         7'541    6'828   1'395
IfcColumn                       6'752    2'307   1'280
IfcWindow                       9'531    932     776
IfcDoor                         8'473    1'917   1'313
IfcStair                        762      668     668
IfcRailing                      2'546    2'192   1'068
IfcFlowTerminal                 13'849   679     498
IfcFlowSegment                  39'596   29'902  308
IfcFlowFitting                  29'734   3'624   202
IfcDistributionControlElement   7'778    220     181
IfcFlowController               4'907    233     175
IfcFurnishingElement            6'200    773     371
Total                           177'143  73'656  10'146

3 http://ifcopenshell.org/python
4 http://www.pythonocc.org/


Data preprocessing and encoding in a graph

The type of neural network used for element classification influences the preprocessing effort and, consequently, the feature availability and granularity considerably. Our neural network should run on data as close as possible to the original geometry representation. We also argue that the fewer preprocessing steps are needed, the more intuitive, generalizable, and light-weight the shape learning process becomes.

A graph is formulated as G = (V, W), consisting of a set of vertices V and an adjacency matrix W representing the nodes' connectivity. For connected nodes, the matrix W can take the value 1, and 0 otherwise. In our case, we weight the edges according to the normalized Euclidean distance between the connected nodes, letting W take continuous values. Each node in the graph is associated with a set of features X: normalized coordinates and normal vectors. We denote the node's neighborhood as N(v).

A tessellated mesh translates to a graph intuitively: the triangle vertex points form the set of vertices V, and the triangles' boundaries form the matrix W. Figure 2a illustrates the graph formulated from the tessellated mesh upon batch generation. We notice that the transformation of low-LoG geometry to a triangulated surface mesh results in planar surfaces being approximated by a single or relatively few large triangles. High-LoG geometry, in turn, is characterized by a high density of triangles with a relatively small surface area. In some cases, this Mesh-to-Graph method results in a rather simplistic graph on which we expect the network to overfit quickly. Generalization to even slightly different appearances might then be difficult.
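A minimal sketch of this Mesh-to-Graph translation follows; the function name is ours, and since the paper does not specify the normalization, dividing by the longest edge is shown as one plausible choice:

```python
import math

def mesh_to_graph(vertices, faces):
    """Mesh-to-Graph encoding: mesh vertices become the node set V and the
    triangles' boundaries become the edges, weighted by the Euclidean
    distance between their endpoints (normalized here by the longest edge,
    an assumed normalization)."""
    edges = {}
    for a, b, c in faces:  # each face is a triple of vertex indices
        for u, v in ((a, b), (b, c), (c, a)):
            key = (min(u, v), max(u, v))  # undirected edge
            if key not in edges:
                edges[key] = math.dist(vertices[u], vertices[v])
    longest = max(edges.values(), default=1.0)
    return {e: d / longest for e, d in edges.items()}
```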

To counteract this, we choose an additional, alternative encoding. We uniformly sample 1024 points (similarly to Qi et al. (2017)) from the mesh faces, proportionally to their face area. The result is a synthetic point cloud encoding the shape. A k-Nearest-Neighbor (k-NN) graph is created based on the node positions in the metric space; see Figure 2b. This encoding has the advantage that more graph nodes represent low-LoG geometry more thoroughly. In contrast, local high-LoG geometry is approximated by only a few sampled points. k = 5 was found to perform best.
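The sampling and graph construction can be sketched as follows. The helpers `sample_points` and `knn_graph` are illustrative: area-proportional face selection and a brute-force neighbor search stand in for whatever the actual implementation uses:

```python
import math
import random

def sample_points(vertices, faces, n=1024, seed=0):
    """Sample n points on the mesh surface, picking faces with probability
    proportional to their area, then a uniform point inside each face."""
    rng = random.Random(seed)
    def area(f):
        a, b, c = (vertices[i] for i in f)
        u = tuple(b[i] - a[i] for i in range(3))
        v = tuple(c[i] - a[i] for i in range(3))
        cross = (u[1] * v[2] - u[2] * v[1],
                 u[2] * v[0] - u[0] * v[2],
                 u[0] * v[1] - u[1] * v[0])
        return 0.5 * math.sqrt(sum(t * t for t in cross))
    weights = [area(f) for f in faces]
    points = []
    for f in rng.choices(faces, weights=weights, k=n):
        a, b, c = (vertices[i] for i in f)
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:  # reflect to stay inside the triangle
            r1, r2 = 1.0 - r1, 1.0 - r2
        points.append(tuple(a[i] + r1 * (b[i] - a[i]) + r2 * (c[i] - a[i])
                            for i in range(3)))
    return points

def knn_graph(points, k=5):
    """Connect every sampled point to its k nearest neighbors (k = 5 here);
    brute-force O(n^2) search for clarity."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```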

Figures 2a-b illustrate the differences in encoding for low-LoG regions; Figures 2c-d, in turn, show a local zoom on a high-LoG region. The mesh graph comparatively lacks nodes on the large door faces but manifests a high node density at geometrically more complex areas such as the door handle. In turn, the 5-NN graph shows a more uniform node density across the shape but loses important details of complex local geometry. Combining the methods to leverage the advantages of both lies beyond the scope of this paper.


Figure 2: Encoding variations for IfcDoor and a zoom on the door handle: (a) & (c) graph translated from surface mesh, (b) & (d) point sampling and subsequent creation of a 5-NN graph

Figure 3: Dataset samples (from left to right, as read): IfcWall, IfcSlab, IfcColumn, IfcWindow, IfcDoor, IfcStair, IfcRailing, IfcFlowTerminal, IfcFlowSegment, IfcFlowFitting, IfcDistributionControlElement, IfcFlowController, IfcFurnishingElement

GCN architecture choices and training setting

To assemble a well-performing GCN, several architectural choices, such as the number of layers, the width of feature channels, the activation function, the optimizer, the drop-out strategy, and the loss function, have to be made. Such parameter optimizations are performed iteratively on a subset of BIMGEOM and lead to the described design choices. For further insights, we publish our code on GitHub5. The best-performing architecture after parameter optimization is then used for training.

The GCN is trained such that the final embeddings allow for classification. The need for cumbersome 3D feature engineering, as used in related classification approaches, is thereby obviated.

Equations 1-3 formalize the described approach. The embedding h_v^0 of node v corresponds to the initial input features described earlier. Every subsequent high-dimensional node embedding is obtained by applying a nonlinear transformation σ to the sum of the weighted aggregation of neighborhood embeddings h_u and the weighted embedding of node v itself, h_v. In a multi-layer graph convolutional network where K > 1, the output of Eq. 3 serves as an input for the subsequent convolutional layer. The trainable weight matrices W_k and bias matrices B_k are, similar to kernels in CNNs operating on images, shared amongst all nodes in one convolutional layer. We set the aggregation of neighborhood embeddings to "sum" for this work.

In practice, this means that in a three-layered GCN as used here, each node will have updated its embedding three times, each time considering the features from its direct neighborhood. The final embedding h_v^3 of node v will contain information from its three-hop neighborhood.

5 https://github.com/fclairec/geometric-ifc

Figure 4: Graph convolutional neural network architecture for element classification: The network takes N points as an input for both graph encoding variations. Each node has 3 dimensions d and an additional set of features X. Random transformations, such as rotations and random translations per node, allow for dataset balancing.

h_v^0 = x_v                                                           (1)

h_v^k = σ( W_k Σ_{u ∈ N(v)} h_u^{k-1} + B_k h_v^{k-1} ),  ∀k ∈ {1, ..., K}   (2)

z_v = h_v^K                                                           (3)

A multi-layer graph convolution architecture, as seen in Figure 4, is implemented and serves as the basis for our experiments. Throughout the three model layers, the embeddings are concatenated with those of the respective prior layer. The concatenation allows the network to discard updated, potentially less useful embeddings and fall back on the previous layer's embeddings alone. A dropout rate of 0.2 is introduced as a regularization strategy. As an activation function, we choose ReLU. The global pooling layer takes the feature-wise maximum across nodes, forming the final embedding of dimension 256, and forwards it to a fully connected layer. Finally, a cross-entropy loss is used as the classification objective.
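The propagation rule of Eqs. 1-3 can be sketched with plain lists. This is a toy dense version with σ = ReLU and "sum" aggregation; the published code uses a GDL framework, and the weights here are purely illustrative:

```python
def gcn_layer(h, neighbors, W, B):
    """One graph-convolution step (Eq. 2): for every node v, sum the
    previous-layer embeddings over its neighborhood N(v), transform the sum
    with W, add the B-transformed self-embedding h_v, and apply ReLU."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]
    out = []
    for v, hv in enumerate(h):
        agg = [0.0] * len(hv)
        for u in neighbors[v]:  # "sum" aggregation over N(v)
            agg = [a + hu for a, hu in zip(agg, h[u])]
        pre = [a + b for a, b in zip(matvec(W, agg), matvec(B, hv))]
        out.append([max(0.0, p) for p in pre])  # sigma = ReLU
    return out
```

Stacking the layer K = 3 times lets each node's final embedding see its three-hop neighborhood, as described above.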

The training was conducted over 250 epochs with early stopping. We found the network to perform best with a learning rate of 0.001 and a batch size of 30.
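The training loop can be sketched as follows; the patience value is an assumption, as the paper only states 250 epochs with early stopping:

```python
def train_with_early_stopping(step, max_epochs=250, patience=10):
    """Training-loop skeleton with early stopping: `step(epoch)` runs one
    epoch and returns the validation accuracy; training stops when no
    improvement has been seen for `patience` epochs (assumed value)."""
    best_acc, best_epoch = 0.0, 0
    for epoch in range(max_epochs):
        acc = step(epoch)
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:  # no improvement: stop early
            break
    return best_acc, best_epoch
```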

The dataset is balanced with class subsampling and dataset augmentation to avoid biases towards dominant classes. Every sample is transformed randomly upon loading. Apart from preventing overfitting on specific graph constellations, such transformation steps have further advantages:

• Scale normalization permits the network to classify shapes independent of their scale.

• Translations in the range of 0 to 0.01 (in reference to unit scale) are applied to each node. In the case of the 5-NN graph encoding, this has the benefit of making the training samples more similar to noisy reconstructed meshes from the Scan-to-BIM use case.

• Random rotations of up to 360° with respect to all 3 axes encourage the network to be rotation-invariant. Even though graphs are rotation-invariant by definition, the addition of absolute coordinates in the node features induces a bias.

• Evening out class sizes reduces bias towards over-represented classes, e.g., IfcWall. By applying random transformations, samples from under-represented classes are drawn several times from the dataset and modified slightly each time.
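The load-time transformations above can be sketched as follows. This is a simplified version showing scale normalization, the z-only rotation variant, and per-node jitter; the full 3-axis rotation is omitted for brevity, and the function name is ours:

```python
import math
import random

def augment(points, rng, jitter=0.01):
    """Random load-time transformation: normalize scale to the unit sphere,
    rotate randomly around the z-axis (the gravity-aligned variant), and
    translate every node by up to `jitter` of unit scale per axis."""
    # scale normalization: center and scale to unit bounding radius
    cx, cy, cz = (sum(c) / len(points) for c in zip(*points))
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    r = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered) or 1.0
    pts = [(x / r, y / r, z / r) for x, y, z in centered]
    # random rotation around the z-axis
    a = rng.uniform(0.0, 2.0 * math.pi)
    ca, sa = math.cos(a), math.sin(a)
    pts = [(ca * x - sa * y, sa * x + ca * y, z) for x, y, z in pts]
    # per-node translation in [0, jitter]
    return [(x + rng.uniform(0.0, jitter),
             y + rng.uniform(0.0, jitter),
             z + rng.uniform(0.0, jitter)) for x, y, z in pts]
```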

Arguably, the network could be required to be rotation- and scale-invariant or not. In most cases, it is a justified assumption that building elements are correctly oriented with respect to their z-axis. We therefore also investigate our network's performance when a correctly oriented z-axis is given as a bias, by applying random rotations only around the z-axis. The consideration of absolute shape size for better classification lies outside the scope of this work.

We report the classification results by means of the per-class and overall test accuracy. Additional insights are discussed using a confusion matrix.
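The reported metrics can be computed as sketched below; `classification_report` is our illustrative helper, not part of the published code:

```python
def classification_report(y_true, y_pred, classes):
    """Overall accuracy, per-class accuracy, and a confusion matrix:
    per-class accuracy is the number of correctly classified samples of a
    class divided by the total samples of that class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]  # rows: true, cols: predicted
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    overall = sum(cm[i][i] for i in range(len(classes))) / len(y_true)
    per_class = {c: cm[i][i] / max(1, sum(cm[i])) for c, i in idx.items()}
    return overall, per_class, cm
```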

EXPERIMENTS AND USE CASE SCENARIO

Classification performances

We evaluate our trained GCN architecture on the given test set of BIMGEOM. The overall test accuracy reported is 0.73 for the Mesh-to-Graph and 0.67 for the 5-NN-Graph encoding. When training the network with a bias around the oriented z-axis, the overall test accuracy increases to 0.85 and 0.83, respectively. In Table 2, the accuracy per class is additionally reported.

Table 2: Classification results reported as overall accuracy (OA) and accuracy per class. The reported values are an average of 5 trainings. The per-class accuracy is calculated as the number of correctly classified samples normalized by the total samples of the respective class in the augmented dataset. We compare the outcome of the two graph encoding variations, Mesh-to-Graph and 5-NN-Graph. * denotes experiments where only random rotations around the x,y plane were applied (fixed in z-direction).

Encoding        OA    IfcWa IfcSl IfcCo IfcWi IfcDo IfcSt IfcRai IfcFT IfcFSe IfcFFi IfcDiCoEl IfcFCo IfcFuEl
Mesh-to-Graph   0.73  0.67  0.72  0.85  0.63  0.89  0.82  0.65   0.37  0.84   0.73   0.78      0.82   0.19
5-NN-Graph      0.67  0.86  0.79  0.85  0.29  0.80  0.77  0.48   0.33  0.32   0.71   0.80      0.70   0.09
Mesh-to-Graph*  0.85  0.75  0.82  0.90  0.80  0.70  0.75  0.65   0.79  0.74   0.86   0.83      0.81   0.51
5-NN-Graph*     0.83  0.83  0.92  0.93  0.71  0.93  0.95  0.83   0.64  0.79   0.83   0.81      0.78   0.30

We note that both the overall and the per-class accuracy represent the number of correctly classified elements (per class) relative to the total samples (of each class) in the balanced test set. Since class balancing results in sampling and transforming elements from under-represented classes multiple times, we have to consider the risk of network biases for such classes. Additional insight is given by considering the amount of confusion for each class; see the confusion matrix in Figure 6.

Rotation invariant network

Although individual classes perform relatively well for both encodings when rotation-invariant (IfcDoor, IfcStair, IfcFlowController), IfcFurnishingElement and IfcFlowTerminal are conspicuous. These classes contain the most geometric variance (e.g., chair and bookshelf, or water tap and ventilation outlet) and show the need for a classifier at the subtype level or the consideration of context information. Figure 5 aims to give insight into the intra-class variance for IfcFlowTerminal and IfcDoor. IfcDistributionControlElement is supposedly not represented well enough in the dataset and is often approximated by dummy or vendor-specific geometries. The network, performing well for this class, might be reflecting a bias towards the specific shape type present in the dataset.

The Mesh-to-Graph encoding performs significantly better than the 5-NN encoding for IfcRailing, IfcFlowSegment, and IfcWindow, but worse for IfcWall and IfcSlab. One cause for this could be the difference in point density at high-LoG regions between the two approaches, for example, at door or window handles. During preprocessing, too few points are sampled in such areas. Similarly, the subtle changes in surface geometry between a window's glazing and its frame might be approximated too coarsely in the 5-NN approach. The flexible representation of triangle meshes seems to have its benefits when shapes have varying surface complexities. An imprecision in encoding caused by the 5-NN graph construction might additionally cause poor performance for IfcRailing: sampled points on the same bar could lie further apart than they do from points on the neighboring bar and cause imprecise graph connections. The network's confusion between IfcStair and IfcRailing might be an indicator of this.

On the other hand, the better performance of the 5-NN approach for classes such as IfcWall and IfcSlab suggests that the nodes on the large flat surfaces are essential for classification. For instance, the similarity of normal vectors on large triangle surfaces might not be identified in the Mesh-to-Graph encoding due to the absence of nodes on the surface.

IfcFlowSegments are partially confused with IfcColumns, and IfcWalls with IfcSlabs or IfcColumns. We believe that rotation invariance impedes a reasonable classification in these cases.

Figure 5: Inter-class variability for IfcFlowTerminal (top row) and IfcDoor (bottom row): some examples from the dataset are shown.

Z-oriented-axis biased network

When we add a network bias around the oriented z-axis, most classes' classification accuracies are promising. Although the effect differs between the two encodings, it can generally be said that the intentional bias adds certainty where we, as domain experts, would expect it to, e.g., for IfcWall, IfcSlab, IfcWindow, IfcFlowSegment, IfcFurnishingElement, and IfcRailing.
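A z-axis orientation bias of this kind is typically induced at training time by restricting the rotation augmentation to the building's up axis. The helper below is a plausible sketch of such an augmentation step (the paper does not spell out its implementation):

```python
import numpy as np

def rotate_about_z(points, angle=None, rng=None):
    """Augment a shape by a rotation about the z-axis only, so the
    network sees arbitrary headings but a fixed up direction."""
    if rng is None:
        rng = np.random.default_rng()
    theta = rng.uniform(0, 2 * np.pi) if angle is None else angle
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

pts = np.array([[1.0, 0.0, 2.5]])
out = rotate_about_z(pts, angle=np.pi / 2)
print(np.round(out, 6))   # x/y rotated, z (height) untouched
```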

As in the rotation-invariant encoding, the overall accuracy is slightly higher for the Mesh-to-Graph encoding. However, the total number of classes performing better is higher for the 5-NN approach.

IfcFurnishingElement still performs poorly for both encodings, although the performance for the Mesh-to-Graph encoding has increased considerably. When looking into the individual predictions, the network classifies many non-simplified IfcFurnishingElement instances correctly for the Mesh-to-Graph encoding. We suppose this happens thanks to the higher node density at high-LoD geometry for the mesh graph than for the 5-NN graph. Details such as furniture-leg extremities and bookshelf handles seem to be learned correctly and account for some good predictions. The high diversity in the class IfcFurnishingElement would, however, suggest investigating classification at the subtype level. For classes with no particular orientation in the building, e.g., IfcFlowController, the z-bias does not lead to significant performance increases. The confusion between classes for this experiment is depicted in Figure 6.

Despite the bias in the z-direction, confusion occurs between IfcFlowSegment and IfcColumn, which is explicable by IfcFlowSegments also being vertically present in a typical building. IfcWall still manifests confusion with IfcColumn. For the mentioned classes, we suggest that context or scale information is needed for a yet more precise classification. The confusion of IfcWall with IfcSlab has been reduced considerably by introducing the bias.

IfcFurnishingElements are, on the one hand, most often confused with IfcDistributionControlElements or IfcWall, and on the other hand with IfcFlowFitting. The first confusion is explicable by dummy cuboid geometries being present in both the IfcFurnishingElement and IfcDistributionControlElement classes, combined with the missing scale information due to shape normalization. Low IfcWalls and compact IfcFlowFitting cuboids can similarly resemble approximated dummy benches, tables, and bookshelves. For such cases, it becomes essential to consider context information to increase classification performance. IfcFurnishingElement being classified as IfcFlowTerminal might be explained by the latter manifesting long, straight geometrical forms (fixations extending to, e.g., ceiling or floor, or long LED strips). The furniture-leg extremities are most likely confused with them.
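The loss of scale information mentioned above is a direct consequence of the usual normalization of shapes into the unit sphere before learning. A minimal sketch (an assumption about the preprocessing, not the paper's code) shows a dummy cuboid and a ten-times-larger wall-sized cuboid becoming indistinguishable:

```python
import numpy as np

def normalize_shape(points):
    """Center a shape and scale it into the unit sphere; absolute size
    is discarded, which is why a low wall and a dummy furniture cuboid
    can become indistinguishable."""
    centred = points - points.mean(axis=0)
    return centred / np.linalg.norm(centred, axis=1).max()

cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
wall = cube * 10.0                     # same proportions, ten times the size
print(np.allclose(normalize_shape(cube), normalize_shape(wall)))  # True
```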

IfcWindows are often misclassified as IfcDoors, but not the other way around. Especially single-winged IfcWindows without a window sill are mistaken for doors; IfcWindows exhibiting either a sill or multiple wings are very likely to be recognized correctly. Generally, the equipment classes such as IfcFlowController and IfcFlowTerminal manifest more confusion amongst themselves than they do with structural elements. This holds for the 5-NN approach as well as for the Mesh-to-Graph approach. We deduce that the weighted adjacency matrix encodes the physical distances between the nodes well and lets the network differentiate between structural elements and equipment. Seemingly, the network identifies close-range geometrical features as well as it does long-range features.
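The role of the weighted adjacency in the aggregation can be made concrete with a standard GCN propagation step. The sketch below uses the common symmetric normalization x' = ReLU(D^-1/2 (A+I) D^-1/2 x W); it is a generic illustration with toy weights, not the paper's specific GCN variant:

```python
import numpy as np

def gcn_layer(adj, x, w):
    """One graph-convolution step: each node mixes its features with
    its (distance-weighted) neighbors through a normalized adjacency."""
    a_hat = adj + np.eye(len(adj))            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ w, 0.0)      # ReLU activation

# Toy example: three nodes in a path graph, edge weights = distances.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 2.0],
                [0.0, 2.0, 0.0]])
x = np.eye(3)                 # one-hot node features
w = np.ones((3, 2))           # a single (untrained) projection matrix
out = gcn_layer(adj, x, w)
print(out.shape)              # (3, 2): two output features per node
```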

Benchmark evaluation for Scan-To-BIM workflow

Additionally to validating our trained model on the unseen test data of BIMGEOM, we extend the validation to a benchmark dataset without retraining. As described earlier, the object classification of surface meshes can become useful in the Scan-to-BIM process. The reconstruction of surfaces from point clouds is a challenging task in itself. After reconstruction, the next task is instance segmentation and classification, which, as we argue, could benefit from advances in deep learning on surface models as presented in our approach. We thus evaluate our approach in the context of Khoshelham et al. (2017)'s benchmark dataset for indoor modeling. Apart from the point clouds and respective IFC models, the dataset does not contain reconstructed surface mesh entities to test our algorithm. Therefore, we apply the same methods described for data assembly and report the overall prediction performance as overall accuracy in Table 3. The dataset lacks equipment and furnishing instances, limiting our evaluation's scope to the following structural classes: IfcWall, IfcSlab, IfcWindow, IfcDoor, IfcColumn, IfcStair, IfcRailing. The overall prediction accuracy is 0.54 for the Mesh-to-Graph approach and 0.81 for the 5-NN-Point-Graph encoding. We can argue that the GCN trained on the 5-NN encoding has the better generalization power. We attempt to explain this by pointing out that points are sampled at random from the mesh surfaces during data augmentation, yielding a slightly different graph for training each time. In contrast, the Mesh-to-Graph encoding corresponds in each case to the initial connectivity present in the input shape; except for the random noise (translation) added, the training samples look the same each time. The Mesh-to-Graph approach thus requires a bigger dataset for better generalization. The 5-NN encoding, arguably, generalized well with an overall accuracy of 0.81. The benchmark shapes are rather simplistically modeled (low LoD) since the IFCs are reconstructions from point clouds rather than models used for construction planning. To increase the performance of our approach towards 100%, we would require an extension with context information or a higher LoD modeling degree.

Figure 6: Confusion matrix (rows: true labels, columns: predicted labels) for element encoding as 5-NN graph, biased for the z-oriented axis. The matrix is normalized by the number of true labels in each class. The diagonal entries thus are the same as the accuracy per class reported in Table 2.
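The random surface sampling used as data augmentation for the 5-NN encoding can be sketched with the standard area-weighted barycentric scheme below; the helper name and exact procedure are our illustration, not the released implementation:

```python
import numpy as np

def sample_surface(vertices, faces, n, rng=None):
    """Sample n points uniformly from a triangle mesh: pick faces with
    probability proportional to their area, then a uniform barycentric
    point inside each chosen face."""
    if rng is None:
        rng = np.random.default_rng()
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # uniform barycentric coordinates via the square-root trick
    r1 = np.sqrt(rng.random(n))[:, None]
    r2 = rng.random(n)[:, None]
    return (1 - r1) * v0[idx] + r1 * (1 - r2) * v1[idx] + r1 * r2 * v2[idx]

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], float)
faces = np.array([[0, 1, 2]])
pts = sample_surface(verts, faces, 100)
print(pts.shape)   # a fresh point set (and hence a fresh graph) each epoch
```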

Table 3: Inference results on Indoor Modelling Benchmark Dataset (Khoshelham et al. 2017)

Model name        OA
Mesh to Graph     0.53
5-NN Graph        0.66
Mesh to Graph*    0.54
5-NN Graph*       0.81

Model complexity in 3D learning

The last experiment compares our approach with respect to its computational efficiency. Next to prediction performance, the number of trainable parameters and the amount of preprocessing steps are essential points determining the algorithm's feasibility in the construction industry's practice. As Koo et al. (2020) most recently point out, the computational intensity of heavy approaches is a practical drawback for deployment. Compared to deep learning approaches such as PointNet++ and MVCNN, mentioned in Koo et al. (2020), our model has considerably fewer parameters. Table 4 shows that our GCN model reduces the number of trainable parameters by more than a factor of 10 compared to PointNet++ and achieves a similar overall accuracy. Regarding the preprocessing steps, our approach avoids selecting the ideal viewpoints for multi-view image capture and, in the case of the Mesh-to-Graph encoding, avoids the point sampling performed in PointNet++.

Table 4: Deep learning architecture complexity: total amount of trainable parameters (TP), mean epoch time (MET) [s] for the specified hardware, and batch size. PointNet++ is deployed without its spatial transformer and trained on the same train and test data. Additionally reported is Koo et al. (2020)'s result for the number of parameters of MVCNN.

Model name    # TP       MET [s]   OA
GCN           126'925    10.5      0.85
PointNet++    3.5 M.     189.5     0.86
MVCNN         60 M.      -         -
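As a back-of-the-envelope check on how such parameter counts arise, the total is just the sum of each layer's weight matrix plus bias. The layer widths below are hypothetical, purely to illustrate the scaling; they are not the architecture reported in Table 4:

```python
def count_params(layer_dims):
    """Trainable parameters of a stack of dense (graph-)conv layers:
    each layer holds a (d_in x d_out) weight matrix plus a d_out bias."""
    return sum(d_in * d_out + d_out
               for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]))

# Hypothetical widths, input features through class scores.
gcn_dims = [6, 64, 128, 256, 13]
print(count_params(gcn_dims))
```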

RESULT DISCUSSION
For excellent support of the industry's practitioners, the network would need to achieve accuracies close to 100%. Considering the limited scope of certain training data in the BIM community, we believe the reported overall accuracies of above 80% to be a promising step in the direction of automated model correction. The experiments showed the network to have difficulties coping with high-geometric-variance classes such as IfcFurnishingElement or IfcFlowTerminal. The network's final layers do not seem to allow for different geometrical feature learning within one class. Investigating this issue further and addressing it by adapting the model architecture or classifying elements into more detailed subtypes might be beneficial. Since the benchmark dataset does not include all classes and limits the conclusion on generalization power, we remain careful about conclusions regarding the preferred form of graph encoding. The network captures localized, fine-grained geometric variations as well as long-range features well for both encodings. This ability of GCNs to deduce relevant geometrical features automatically reduces the need for manual geometric feature definition, as suggested by various works introduced earlier.

Limitations & Overfitting

We partially address the issue of little publicly available data (Ma et al. 2018) by publishing the dataset used in this work. However, especially since most IFC models lack equipment, overfitting is an issue. The network is most likely biased towards vendor-specific equipment types and might perform poorly on others.

The IFC models used for this work are assumed to have semantically correct elements. Only a short visual analysis has been performed to check the data consistency.

Relevance for BIM/FM practitioners

This work's motivation originated from stakeholder-specific or lacking construction element classes impeding the automated use of BIM in the operation phase. This work suggests a generic, adaptive, and lightweight machine learning approach to map construction elements to stakeholder-specific data models, thereby unlocking further automation in FM-related data retrieval tasks. Avoiding tedious feature selection and operating on close-to-native IFC geometry, we imagine that the method could be embedded in any BIM parsing application. Despite our results being limited to the granularity of IFC classes, our approach can be extended to operate on other classification systems such as OmniClass or Uniclass.

CONCLUSION & OUTLOOK
The lack of correct semantic classes needed for building operation could be remedied with methods such as ours. The full capacity of GDL in this context is promising and suggests further work in this direction, especially regarding the inclusion of context information. For our approach to be of value for the Scan-to-BIM workflow, future work could proceed to evaluate it together with a preceding mesh reconstruction and instance segmentation. When reconstructing point clouds, occlusions or incorrect surface reconstruction result in incomplete shapes, challenging segmentation and classification. In future work, the knowledge of our research could be extended with an ablation study with respect to node removal and shape completion.

ACKNOWLEDGEMENTS
We gratefully acknowledge Prof. Georg Nemetschek's generosity and support for this research.

REFERENCES

Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M. & Savarese, S. (2016), '3D semantic parsing of large-scale indoor spaces', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2016-Decem, 1534–1543.

Becerik-Gerber, B., Jazizadeh, F., Li, N. & Calis, G. (2012), 'Application areas and data requirements for BIM-enabled facilities management', Journal of Construction Engineering and Management 138(3), 431–442.

Borrmann, A., König, M., Koch, C. & Beetz, J., eds (2015), Building Information Modeling, VDI-Buch, Springer Fachmedien Wiesbaden.

Bosché, F., Ahmed, M., Turkan, Y., Haas, C. T. & Haas, R. (2015), 'The value of integrating Scan-to-BIM and Scan-vs-BIM techniques for construction monitoring using laser scanning and BIM: The case of cylindrical MEP components', Automation in Construction 49, 201–213. URL: http://dx.doi.org/10.1016/j.autcon.2014.05.014

Bronstein, M. M., Bruna, J., Lecun, Y., Szlam, A. & Vandergheynst, P. (2017), 'Geometric Deep Learning: Going beyond Euclidean data', IEEE Signal Processing Magazine 34(4), 18–42.

Collins, F. (2021), 'BIMGEOM'. URL: https://doi.org/10.7910/DVN/YK86XK

Czerniawski, T. & Leite, F. (2020), 'Automated segmentation of RGB-D images into a comprehensive set of building components using deep learning', Advanced Engineering Informatics 45(November 2019), 101131. URL: https://doi.org/10.1016/j.aei.2020.101131

Jang, R. & Collinge, W. (2020), 'Improving BIM asset and facilities management processes: A Mechanical and Electrical (M&E) contractor perspective', Journal of Building Engineering 32(May), 101540. URL: https://doi.org/10.1016/j.jobe.2020.101540

Khoshelham, K., Vilariño, L. D., Peter, M., Kang, Z. & Acharya, D. (2017), 'The ISPRS benchmark on indoor modelling', International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives 42(2W7), 367–372.

Kim, J., Song, J. & Lee, J.-K. (2019), Recognizing and Classifying Unknown Object in BIM Using 2D CNN, Vol. 1028, Springer Singapore. URL: http://link.springer.com/10.1007/978-981-13-8410-3

Koo, B., Jung, R., Yu, Y. & Kim, I. (2020), 'A geometric deep learning approach for checking element-to-entity mappings in infrastructure building information models', Journal of Computational Design and Engineering 0(October), 1–12.

Koo, B., Shin, B. & Krijnen, T. F. (2017), 'Employing outlier and novelty detection for checking the integrity of BIM to IFC entity associations', ISARC 2017 - Proceedings of the 34th International Symposium on Automation and Robotics in Construction (July 2017), 14–21.

Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012), Imagenet classification with deep convolutional neural networks, in F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger, eds, 'Advances in Neural Information Processing Systems', Vol. 25, Curran Associates, Inc., pp. 1097–1105.

Ma, L., Sacks, R., Kattel, U. & Bloch, T. (2018), '3D Object Classification Using Geometric Features and Pairwise Relationships', Computer-Aided Civil and Infrastructure Engineering 33(2), 152–164.

Ozturk, G. B. (2020), 'Interoperability in building information modeling for AECO/FM industry', Automation in Construction 113(December 2019), 103122. URL: https://doi.org/10.1016/j.autcon.2020.103122

Qi, C. R., Su, H., Mo, K. & Guibas, L. J. (2017), PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, in 'The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)', Vol. 2017-Janua, pp. 601–610.

Sacks, R., Ma, L., Yosef, R., Borrmann, A., Daum, S. & Kattel, U. (2017), 'Semantic Enrichment for Building Information Modeling: Procedure for Compiling Inference Rules and Operators for Complex Geometry', Journal of Computing in Civil Engineering 31(6), 04017062.

Tuttas, S., Braun, A., Borrmann, A. & Stilla, U. (2017), 'Acquisition and Consecutive Registration of Photogrammetric Point Clouds for Construction Progress Monitoring Using a 4D BIM', Photogrammetrie, Fernerkundung, Geoinformation 85(1), 3–15.

Wang, L., Huang, Y., Hou, Y., Zhang, S. & Shan, J. (2019), 'Graph attention convolution for point cloud semantic segmentation', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June, 10288–10297.

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M. & Solomon, J. M. (2019), 'Dynamic graph CNN for learning on point clouds', ACM Transactions on Graphics 38(5).

Wu, J. & Zhang, J. (2019), Introducing Geometric Signatures of Architecture, Engineering, and Construction Objects and a New BIM Dataset, in 'Computing in Civil Engineering 2019: Visualization, Information Modeling, and Simulation - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2019', number September, pp. 264–271.