8/12/2019 Atmos u Karto Phd
3D Shape Analysis for Quantification, Classification, and Retrieval
Indriyati Atmosukarto
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
2010
Program Authorized to Offer Degree: Computer Science and Engineering
University of Washington
Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Indriyati Atmosukarto
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Linda G. Shapiro
Reading Committee:
Linda G. Shapiro
James F. Brinkley III
Maya Gupta
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.
Signature
Date
University of Washington
Abstract
3D Shape Analysis for Quantification, Classification, and Retrieval
Indriyati Atmosukarto
Chair of the Supervisory Committee:
Professor Linda G. Shapiro
Computer Science and Engineering
Three-dimensional objects are now commonly used in a large number of applications includ-
ing games, mechanical engineering, archaeology, culture, and even medicine. As a result,
researchers have started to investigate the use of 3D shape descriptors that aim to encapsu-
late the important shape properties of the 3D objects. This thesis presents new 3D shape
representation methodologies for quantification, classification and retrieval tasks that are
flexible enough to be used in general applications, yet detailed enough to be useful in medical
craniofacial dysmorphology studies. The methodologies begin by computing low-level fea-
tures at each point of the 3D mesh and aggregating the features into histograms over mesh
neighborhoods. Two different methodologies are defined. The first methodology begins by
learning the characteristics of salient point histograms for each particular application, and
represents the points in a 2D spatial map based on longitude-latitude transformation. The
second methodology represents the 3D objects by using the global 2D histogram of the
azimuth-elevation angles of the surface normals of the points on the 3D objects.
Four datasets, two craniofacial datasets and two general 3D object datasets, were obtained to develop and test the different shape analysis methods developed in this thesis. Each dataset has different shape characteristics that help explore the different properties of
the methodologies. Experimental results on classifying the craniofacial datasets show that
our methodologies achieve higher classification accuracy than medical experts and existing
state-of-the-art 3D descriptors. Retrieval and classification results using the general 3D objects show that our methodologies are comparable to existing view-based and feature-based descriptors and outperform these descriptors in some cases. Our methodology can also be used to speed up the most powerful general 3D object descriptor to date.
TABLE OF CONTENTS
Page
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 2: Related Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 3D Descriptors for General Objects . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Medical Craniofacial Assessment . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3: Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset . . . . . . . . . . . . . . . . 15
3.2 Deformational Plagiocephaly Dataset . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Heads Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 SHREC Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 4: Base Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Low-level Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Mid-level Feature Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5: Learning Salient Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome . . . . . . . . . . . . . 26
5.2 Learning Salient Points for Deformational Plagiocephaly . . . . . . . . . . . . 28
5.3 Learning Salient Points for General 3D Objects . . . . . . . . . . . . . . . . . 30
Chapter 6: 2D Longitude-Latitude Salient Map Signature . . . . . . . . . . . . . . 34
6.1 Salient Point Pattern Projection . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2 Classification using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Retrieval using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . . . 45
6.4 Retrieval using Salient Views . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Chapter 7: Global 2D Azimuth-Elevation Angles Histogram of Surface Normal Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.1 3D Shape Severity Quantification and Localization for Deformational Plagiocephaly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 Classification of 22q11.2 Deletion Syndrome . . . . . . . . . . . . . . . . . . . 78
7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 8: Learning 3D Shape Quantification for Craniofacial Research . . . . . . 83
8.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2 Facial Region Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D Histogram of Azimuth Elevation Angles . . . . . . . . . . . . . . . . . . . 86
8.4 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.5 Feature Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Chapter 9: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
LIST OF FIGURES
Figure Number Page
1.1 Example of applications that use 3D objects . . . . . . . . . . . . . . . . . . . 2
2.1 Anthropometric landmarks on a patient's head . . . . . . . . . . . . . . . . . 12
3.1 Example of 3D face mesh data of children with 22q11.2 deletion syndrome. . 16
3.2 Tops of heads of children with deformational plagiocephaly. . . . . . . . . . . 17
3.3 Example of objects in the Heads dataset. . . . . . . . . . . . . . . . . . . . . 19
3.4 Example morphs from the horse class . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Example of objects in the SHREC 2008 Classification dataset . . . . . . . . . 20
4.1 Low-level feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Azimuth and elevation angle of a 3D surface normal vector. . . . . . . . . . . 24
4.3 Mid-level feature aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.1 Craniofacial anthropometric landmarks. . . . . . . . . . . . . . . . . . . . . . 27
5.2 Example of training points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Example histograms of salient and non-salient points . . . . . . . . . . . . . . 29
5.4 Salient point prediction for two faces in the 22q11.2DS dataset . . . . . . . . 29
5.5 Salient point prediction for training data in Heads dataset . . . . . . . . . . . 31
5.6 Salient point prediction for testing data in Heads dataset . . . . . . . . . . . 31
5.7 Salient point prediction for objects in SHREC 2008 dataset . . . . . . . . . . 32
6.1 Salient point patterns on 3D objects . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 2D longitude-latitude signature maps . . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Classification accuracy vs training rotation angle increment. . . . . . . . . . . 42
6.4 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.5 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6 Salient points resulting from clustering. . . . . . . . . . . . . . . . . . . . . . 54
6.7 Salient view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.8 Salient views vs Distinct salient views . . . . . . . . . . . . . . . . . . . . . . 56
6.9 Top 5 distinct salient views in SHREC dataset . . . . . . . . . . . . . . . . . 57
6.10 Average retrieval scores using top K salient views . . . . . . . . . . . . . . . . 59
7.1 Surface normal vectors of 3D points . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Calculation of the Flatness Scores . . . . . . . . . . . . . . . . . . . . . . . . 67
7.3 Severity localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.4 Spectrum of deformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.5 Correlation between LPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.6 Correlation between RPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.7 Correlation between AS and Expert Score . . . . . . . . . . . . . . . . . . . . 72
7.8 Correlation between AAS and Expert Score . . . . . . . . . . . . . . . . . . . 72
7.9 Correlation between AAS and aOCLR . . . . . . . . . . . . . . . . . . . . . . 74
7.10 ROC curve for LPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.11 ROC curve for RPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.12 ROC curve for AS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.13 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.14 Correlation between AAS and Brachycephaly score . . . . . . . . . . . . . . . 76
7.15 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.16 Projections of 2D azimuth-elevation angles to the face . . . . . . . . . . . . . 81
8.1 Overview of the quantification learning framework. . . . . . . . . . . . . . . . 83
8.2 Facial region selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D histogram of selected region . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.4 Feature selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Positional information about selected region . . . . . . . . . . . . . . . . . . . 89
8.6 Positional information about selected region with normal vector . . . . . . . . 90
8.7 Output of the genetic programming quantification approach . . . . . . . . . . 91
8.8 F-measure for training and testing dataset . . . . . . . . . . . . . . . . . . . . 94
8.9 Projection of selected histogram bins . . . . . . . . . . . . . . . . . . . . . . . 100
8.10 Tree structure for quantifying midface hypoplasia . . . . . . . . . . . . . . . . 103
8.11 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 105
8.12 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 106
8.13 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 107
8.14 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 108
8.15 Quantification score for midface hypoplasia. . . . . . . . . . . . . . . . . . . . 109
LIST OF TABLES
Table Number Page
4.1 Besl-Jain surface characterization. . . . . . . . . . . . . . . . . . . . . . . . . 23
6.1 Classification performance for 22q11.2DS. . . . . . . . . . . . . . . . . . . . . 37
6.2 Overall comparison of the various shape descriptors. . . . . . . . . . . . . . . 38
6.3 Comparison of classification accuracy for 22q11.2DS. . . . . . . . . . . . . . . 38
6.4 Plagiocephaly classification using the 254-individual dataset . . . . . . . . . . 39
6.5 Plagiocephaly classification using the 140-individual dataset . . . . . . . . . . 39
6.6 Comparison of classification accuracy for plagiocephaly. . . . . . . . . . . . . 40
6.7 Comparison of classification accuracy for SHREC 2008 dataset. . . . . . . . . 43
6.8 Comparison of timing of each phase . . . . . . . . . . . . . . . . . . . . . . . 44
6.9 Pose-normalized retrieval experiment 2 . . . . . . . . . . . . . . . . . . . . . . 46
6.10 Average retrieval score comparing three pose-normalization methods. . . . . . 48
6.11 Average retrieval score using different low-level features . . . . . . . . . . . . 48
6.12 Average retrieval score using image wavelet analysis . . . . . . . . . . . . . . 49
6.13 Comparing the salient map signature best results against existing methods. . . 49
6.14 Comparing retrieval scores for classes in SHREC dataset . . . . . . . . . . . . 50
6.15 Average retrieval score using salient views . . . . . . . . . . . . . . . . . . . . 62
6.16 Retrieval score using maximum number of distinct views . . . . . . . . . . . . 63
6.17 Average feature extraction runtime per object. . . . . . . . . . . . . . . . . . 64
7.1 Descriptive statistics for the Left Posterior Flatness Score (LPFS) . . . . . . 71
7.2 Descriptive statistics for the Right Posterior Flatness Score (RPFS) . . . . . 73
7.3 Descriptive statistics for the Asymmetry Score (AS) . . . . . . . . . . . . . . 78
7.4 Descriptive statistics for AAS and aOCLR . . . . . . . . . . . . . . . . . . . . 79
7.5 AUC for quantifying posterior flattening . . . . . . . . . . . . . . . . . . . . . 80
7.6 Classification accuracy for plagiocephaly . . . . . . . . . . . . . . . . . . . . . 80
7.7 Classification of 22q11.2DS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.8 Classification accuracy of 22q11.2DS facial dysmorphologies . . . . . . . . . . 81
8.1 Genetic programming parameters. . . . . . . . . . . . . . . . . . . . . . . . . 92
8.2 Classification performance for nine facial anomalies using GP . . . . . . . . . 93
8.3 Classification performance using various shape descriptors . . . . . . . . . . . 95
8.4 Comparing GP to the global approaches . . . . . . . . . . . . . . . . . . . . . 96
8.5 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 97
8.6 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 98
8.7 Coefficients for midface hypoplasia . . . . . . . . . . . . . . . . . . . . . . . . 99
8.8 Best performing mathematical expression . . . . . . . . . . . . . . . . . . . . 101
8.9 Best performing mathematical expressions . . . . . . . . . . . . . . . . . . . . 102
8.10 Classification performance in predicting 22q11.2 Deletion Syndrome. . . . . . 104
ACKNOWLEDGMENTS
I wish to express very deep and sincere gratitude to my advisor, Professor Linda Shapiro, without whose guidance, encouragement, and support I would not have been able to complete this PhD. I have learned tremendously from her about how to become an excellent researcher and writer, especially in the field of computer vision.
I am very grateful to all the members of my PhD thesis committee, Dr Maya Gupta, Dr James Brinkley, Dr Steve Seitz, and Dr Mark Ganther, for their useful feedback and comments.
I would also like to thank my collaborators at Seattle Children's Hospital Craniofacial Center: Dr Michael Cunningham, Dr Matthew Speltz, Dr Carrie Heike, and Dr Brent Collett,
for providing me with the medical 3D mesh data for this dissertation, as well as for their
engaging discussions and suggestions.
I owe an indescribable amount of gratitude to my parents, my sisters, and my niece for having confidence in me, always encouraging me, and cheering me up when I am down.
Finally, I reserve special thanks for my husband, David Gomulya, for being my best friend and a great supporter, and my son, Kiran, for bringing new joy into my life.
This research was supported by the National Science Foundation under grant number
DBI-0543631.
DEDICATION
to my son
Kiran Atmosukarto Gomulya
our Ray of Light
Chapter 1
INTRODUCTION
1.1 Motivation
Advancement in technology for the digital acquisition of 3D models has led to an increase in the number of 3D objects available. Three-dimensional objects are now commonly used
in a number of areas such as games, mechanical design for CAD models, archaeology and
cultural heritage, and medical research studies. Figure 1.1 shows some applications that use
3D objects. The widespread integration of 3D models in different fields motivates the need
to be able to store, index, classify, and retrieve 3D objects automatically. However, current
classification and retrieval techniques for text, 2D images, and videos cannot be directly
translated and applied to 3D objects, as 3D objects have different data characteristics from
other data modalities.
Classification and retrieval of 3D objects requires the 3D objects to be represented in
a way that captures the local and global shape characteristics of the object. This requires
creating a 3D descriptor or signature that summarizes the important shape properties of the
object. Unfortunately, finding a descriptor that is able to describe the important character-
istics of a 3D object is not a trivial task. The descriptor should be able to capture a good
balance between the global and local shape properties of the object, so as to allow flexibility
in performing different tasks. The global properties of an object capture the overall shape
of an object, while the local properties capture the details of an object.
A specific example of the usage of 3D models in the medical field is the work that researchers at Seattle Children's Hospital Craniofacial Center (SCHCC) are pursuing. The researchers at SCHCC use CT scans and 3D surface meshes of children's heads to investigate head shape dysmorphology due to craniofacial disorders such as craniosynostosis,
22q11.2 deletion syndrome, deformational plagiocephaly, or cleft lip and palate. These
researchers aspire to develop new computational techniques that can represent, quantify,
Figure 1.1: Example of applications that use 3D objects: (a) Second Life is a game that simulates a virtual 3D world, (b) The Digital Michelangelo is a Stanford project that aims to digitize cultural artifacts for cataloging, conservation, and restoration, (c) FoldIt! is a computer game that uses 3D protein structures to understand how proteins fold for use in drug development, and (d) Plan3D is an interior design application that allows users to incorporate 3D models in house designs.
and analyze variants of biological morphology from the 3D models acquired from stereo
camera technology. The long-term objective of their research is to reveal genotype-phenotype disease associations.
This thesis investigates new methodologies for representing 3D objects that are useful
in medical applications. Most existing 3D shape descriptors have only been developed and
tested on general 3D object datasets, while those designed for medical purposes must usually
satisfy a specific medical application and dataset. The objective of this work is to develop
3D shape representation methodologies that are flexible enough to generalize from specific
medical tasks to general 3D object tasks. This work was motivated by the collaborations in
two research studies at SCHCC for the study of craniofacial anatomy: 1) a study of children
with 22q11.2 deletion syndrome and 2) a study of infants with deformational plagiocephaly.
22q11.2 deletion syndrome (22q11.2DS) is a genetic disease that is one of the most com-
mon multiple anomaly syndromes in humans [41]. This condition is associated with more
than 180 clinical features, including over 25 dysmorphic craniofacial features. Abnormal
clinical features of individuals with 22q11.2DS include asymmetric face shape, hooded eyes,
bulbous nasal tip, and retrusive chin, among others. The range of variation in individual
feature expression is very large. As a result, even experts have difficulty in diagnosing
22q11.2DS from frontal facial photographs alone [9]. Early detection of 22q11.2DS is important, as many affected individuals are born with conotruncal cardiac anomalies, mild-to-moderate immune deficiencies, and learning disabilities, all of which can benefit from early intervention.
Deformational plagiocephaly (also known as positional plagiocephaly, or non-synostotic
plagiocephaly) refers to the deformation of the head, characterized by a persistent flattening
on the side resulting in an asymmetric head shape and misalignment of the ears. Deforma-
tional plagiocephaly is caused by persistent pressure on the skull of a baby before or after
birth. Another possible factor that can lead to deformational plagiocephaly is torticollis, a
muscle tightness in the neck resulting in a limited range of motion for the head that causes
infants to look in one direction and to rest on the same spot of the back of the head. If left
untreated, children with these abnormal head shape conditions may experience a number
of medical issues in their lives, ranging from social problems due to abnormal appearance
to delayed neurocognitive development [18, 77].
1.2 Problem Statement
Motivated by collaborations with researchers at SCHCC, this thesis develops 3D shape
representation methodologies that can be used for 3D shape classification, retrieval, and quantification. The methodologies provide the flexibility to generalize across both specific
medical datasets and general 3D objects. The following three general problems are tackled.
Problem 1: 3D shape quantification
Given a surface mesh Si, which consists of n points and information regarding the connectivity of the points, the goal is to analyze and describe the shape Si by constructing a numeric representation of mesh Si, commonly referred to as a signature or descriptor Di. A quantitative score may also be calculated from the obtained signature.
Problem 2: 3D shape classification
Given a database of 3D shapes S = {S1, S2, ..., SN} that have been quantified and described using their respective numeric signatures Di, 1 ≤ i ≤ N, and are pre-classified into a number of C classes, the goal is to create an algorithm that can be used to determine to which class a new 3D object Q belongs.
Problem 3: 3D shape retrieval
Given a database of 3D shapes S = {S1, S2, ..., SN} that have been quantified and described using their respective numeric signatures Di, 1 ≤ i ≤ N, the goal is to create an algorithm that retrieves all objects in S that are similar to a query object Q, based on their numeric signatures.
1.3 Thesis Outline
Chapter 2 discusses the literature related to the two main classes of research in this thesis:
3D object descriptors in the computer vision literature and craniofacial assessment in the
medical literature. The datasets used to develop and test the methodology are described in
Chapter 3. Chapter 4 explains the base framework for feature extraction. The method for
learning the salient points of a 3D object is explained and applied to different applications
in Chapter 5. Two different types of 3D object descriptors are introduced and analyzed in
Chapters 6 and 7. Chapter 6 describes the 2D longitude-latitude salient map signature and
investigates its application for classification and retrieval of both general 3D objects and 3D medical data. Chapter 7 covers the global 2D azimuth-elevation angles descriptor and
investigates its application for classification of deformational plagiocephaly and 22q11.2DS
datasets. A learning framework for quantification using genetic programming is described
in Chapter 8. Finally, Chapter 9 provides a summary and suggests possible future research
directions.
Chapter 2
RELATED LITERATURE
In this chapter, two main classes of research related to the work in this thesis are
described: 3D shape descriptors for general objects from the computer vision literature
and medical studies from the craniofacial literature.
2.1 3D Descriptors for General Objects
Three-dimensional shape analysis and its application in 3D object retrieval and classification
has received increased attention in the past few years. There have been several survey
papers on the topic [81, 82, 26, 95, 20, 36, 13, 14, 12, 52, 69]. Starting in 2006, researchers
in the area have taken the initiative to organize an annual 3D shape retrieval evaluation
contest called SHREC (SHape REtrieval Contest), currently organized by the Network of
Excellence AIM@SHAPE. The contest's general objective is to evaluate the effectiveness of
3D shape retrieval algorithms. Participants register for the contest before the test set is
made available. The participants are given 48 hours to apply their 3D retrieval algorithm
to the test set and submit their retrieval results to the organizers. The retrieval results are
evaluated using measurements that relate to precision and recall. The average performance
of each method over a set of queries is calculated to obtain an overall impression of the
algorithm's performance. Using a common test set and queries allows a direct comparison
of the different algorithms. The contest started with a single track, using only the Princeton
benchmark database as the test set and has evolved into a multi-track contest. The tracks
now include retrieval of watertight models, CAD models, protein models, 3D face models,
and partial matching. Results of the contest show that no one descriptor performs the best
for all kinds of retrieval and classification tasks. Each descriptor has its own strengths and weaknesses for the different queries and tasks.
There are three broad categories of 3D object representation: feature-based methods,
graph-based methods, and view-based methods.
2.1.1 Feature-based methods
Feature-based 3D object descriptors, which are the most popular, can be further catego-
rized into: (1) global features, (2) global feature distributions, (3) spatial maps, and (4)
local features. Early work on 3D object representation and its application to retrieval and
classification focused more on the global features and global feature distribution approaches.
Global features computed to represent 3D objects include area, volume, and moments. Elad
et al. [22] computed the moment properties of the object and used the vector value of the
moments as a descriptor for the object. Osada et al. [62] calculated a number of global
shape distributions to represent 3D objects. The shape functions measured included the
angle between three random points (A3), the distance between a point and a random point
(D1), the distance between two random points (D2), the area of the triangle between three
random points (D3), and the volume between four random points on the surface (D4).
Ohbuchi et al. [59] enhanced the D2 shape function by measuring not only the distance, but
also the mutual orientation of the surfaces on which the pair of points is located. Zaharia
et al. [96] introduced a 3D shape spectrum descriptor that computed the distribution of the
shape index of the points over the whole mesh. Similar distributions were also calculated for
other surface properties such as curvature. Some recent works continue to use the feature
distribution approach. Mahmoudi et al. [53] computed the histogram of pairwise diffusion
distances between all points, while Ion et al. [35] defined their descriptor as the histogram
of the eccentricity transform. The histogram uses the maximum geodesic distance from a
point to all other points on the surface. The global feature methods are computationally
efficient, as they reduce the computation space of the 3D object by describing the object
with fewer dimensions; however, these methods are not discriminative enough when the objects have small differences, as in intra-class retrieval cases or classification of very similar
objects.
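To make the global feature distribution idea concrete, the sketch below approximates the D2 shape function of Osada et al. as a histogram of distances between random pairs of surface samples. It assumes points have already been sampled from the mesh surface (Osada et al. sample triangles proportionally to area); the bin count, histogram range, and mean-distance normalization are illustrative choices, not the published parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def d2_histogram(points, n_pairs=10000, n_bins=64):
    """D2-style distribution: histogram of distances between random
    point pairs, normalized by the mean distance for scale invariance."""
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    d = d / d.mean()                       # scale normalization
    hist, _ = np.histogram(d, bins=n_bins, range=(0, 4), density=True)
    return hist

# Stand-in surface samples: points on a unit sphere.
v = rng.normal(size=(2000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
sig = d2_histogram(v)
print(sig.shape)  # (64,)
```

Two such histograms can then be compared with any vector or histogram distance to retrieve or classify objects.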
Spatial map representations describe a 3D object by capturing and preserving physical locations on it. Saupe et al. [71] described a spherical extent function by calculating
the maximal extent of a shape across all rays from the origin. They compared two different
kinds of representations of the function: using spherical harmonics and moments. Their results showed that using spherical harmonics to represent the function performed better.
The spherical harmonic coefficients reconstruct an approximation of the object at different
resolutions. Kazhdan et al. [39] used this idea to show that spherical harmonics can be used
to transform rotation-dependent shape descriptors into rotation-independent ones without
the need to pose normalize the objects in advance. Their results showed that the applica-
tion of the spherical harmonic representation improved the performance of most spherical
function descriptors. Laga et al. [44, 43] uniformly sampled points on a unit sphere and
used spherical wavelet transforms to represent 3D objects. Spherical wavelet descriptors are
natural extensions of 3D Zernike moments and spherical harmonics; they offer better feature
localization and rotation invariance, since spherical harmonic analysis has singularities at
each pole of the sphere.
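A minimal sketch of the spherical extent function underlying these spatial-map descriptors, under stated assumptions: the object is given as a point sample, the origin is placed at the centroid, and the sphere is discretized into azimuth-elevation bins, each storing the farthest sample in that direction. Representing the resulting spherical function with spherical harmonics or moments, as Saupe et al. do, would be a further step not shown here.

```python
import numpy as np

def spherical_extent(points, n_az=16, n_el=8):
    """Approximate spherical extent function: for each azimuth-elevation
    direction bin, the maximal distance from the centroid among the
    surface samples whose direction falls in that bin."""
    centered = points - points.mean(axis=0)          # origin at centroid
    r = np.linalg.norm(centered, axis=1)
    az = np.arctan2(centered[:, 1], centered[:, 0])  # in [-pi, pi]
    el = np.arcsin(np.clip(centered[:, 2] / np.maximum(r, 1e-12), -1, 1))
    ai = np.clip(((az + np.pi) / (2 * np.pi) * n_az).astype(int), 0, n_az - 1)
    ei = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    ext = np.zeros((n_az, n_el))
    np.maximum.at(ext, (ai, ei), r)                  # per-bin max radius
    return ext
```

For points sampled on a unit sphere, every occupied bin of the returned map is close to 1, as expected for a shape whose extent is uniform in all directions.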
Wavelets are basis functions that represent a given signal at multiple resolutions. Laga
investigated both second generation wavelets, including linear and butterfly spherical wavelets
with a lifting scheme, and image wavelets with spherical boundary extension rules for constructing the shape descriptor [73, 68]. He proposed three descriptors based on the spherical wavelets: using the coefficients as feature vectors, using the L1 energy of the coefficients,
and using the L2 energy of the coefficients. Zhenbao et al. [51] compared their multireso-
lution wavelet analysis to the spherical wavelet descriptor and showed that their descriptor
performed slightly b etter. Their method characterized the shape orientation of the object
by setting six view planes and sampled the shape orientation from each of the view planes.
They then performed multiresolution wavelet analysis on each of the view planes and used
the wavelet coefficients for each of the view planes as the feature vector. Assfalg et al. [5]
captured the shape of a 3D object using the curvature map of the object's surface. One of
the methods developed in this thesis is closely related to this approach; however, it differs
in that it does not use the curvature information directly. Lastly, Tangelder et al. [80] developed a 3D spatial map by dividing the 3D object into a 3D grid with cells of equal sizes
and measuring the curvature property in each cell.
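A minimal sketch of such a grid-based spatial map (not Tangelder et al.'s implementation; the grid size and the averaged property are placeholder assumptions) could average a per-point curvature value over the cells of a bounding-box grid:

```python
import numpy as np

def grid_spatial_map(points, values, n=4):
    """Average a per-point property (e.g. curvature) over the cells of
    an n x n x n grid spanning the point cloud's bounding box."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    idx = ((points - lo) / np.maximum(hi - lo, 1e-12) * n).astype(int)
    idx = np.minimum(idx, n - 1)                  # points on the max face
    flat = np.ravel_multi_index(tuple(idx.T), (n, n, n))
    sums = np.bincount(flat, weights=values, minlength=n**3)
    counts = np.bincount(flat, minlength=n**3)
    means = sums / np.maximum(counts, 1)          # empty cells stay 0
    return means.reshape(n, n, n)

pts = np.array([[0.0, 0, 0], [1, 1, 1], [1, 1, 1]])
vals = np.array([2.0, 4.0, 6.0])
m = grid_spatial_map(pts, vals, n=2)
```

The flattened cell means form a fixed-length spatial descriptor regardless of mesh resolution.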
Recent research is beginning to focus more on the local approach to representing 3D
objects, as this approach has a stronger discriminative power when differentiating objects
that are similar in overall shape [63]. Local features are often points that are considered to be interesting or salient on the 3D object. These points are computed in various ways.
Some methods randomly select points on the surface of the object. Frome et al. [25], who
developed a 3D shape context, and Johnson et al. [37], who designed spin image descriptors,
both randomly selected points as their basis points. Shilane et al. [75, 76] used random
points with harmonic shape descriptors at four different scales. Most other methods use
the local geometric properties of the 3D object such as curvature or normals to describe the
points on the surface of the object, and define the level difference extrema as the salient
points. Lee et al. [46] used mean curvature properties with the center-surround mechanism
to identify the extrema as final salient points. A similar method was adopted by Li et
al. [47, 48], who found reliable salient points by considering a set of extrema for a scale-
space representation of a point-based input surface and used the locations of level difference
extrema as the salient feature points. Unnikrishnan et al. [83] presented a multi-scale
interest region detector that captures variation in shape at a point relative to the size of its
neighborhood. Their method used the extrema of the mean curvature to identify the
salient points. Watanabe et al. [90] used salient extrema of the principal curvatures along
the curvature lines on the surface. Castellani et al. [15] proposed a new methodology for
detecting and matching salient points based on measuring how much a vertex is displaced
after filtering. The salient points are described using a local description based on a hidden
Markov model.
Ohbuchi et al. [60] rendered multiple views of a 3D model and extracted local features
from each view using the SIFT algorithm. The local features were then integrated into a
histogram using a bag-of-features approach to retrieval. Novatnack et al. [57, 58] extracted
corners and edges of a 3D model by first parameterizing the surface of a 3D mesh model on
a 2D map and constructing a dense surface normal map. They then constructed a discrete
scale-space by convolving the normal map with Gaussian kernels of increasing standard
deviation. The corners and edges detected at individual scales were combined into a unified
representation for the 3D ob ject. Akagunduz et al. [2] used a Gaussian pyramid at several
scales to extract the surface extrema and represented the points and their relationships by
a graphical model. Taati et al. [79] generated a local shape descriptor based on invariant
properties extracted from the principal component space of the local neighborhood around a point. The salient points were selected based on ratios of basic dispersion properties. Other
examples of local descriptors include spin images [37, 4], point signature [17], and symbolic
signatures [70]. Some efforts have also been made in combining both the local and global
properties of the object. Alosaimi et al. [3] combined the information in 2D histograms and
concatenated the PCA coefficients of the histograms to form a single feature vector.
Liu et al. [50, 49] represented a global 3D shape as the spatial configuration of a set of local
features. The spatial configuration was represented by computing the distributions of the
Euclidean distances between pairs of local shape clusters, represented by spin images.
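The distance-distribution idea can be sketched in simplified form (with a hypothetical bin count, and without the spin-image clustering step Liu et al. describe) as a normalized histogram of pairwise distances between local-feature locations:

```python
import numpy as np

def distance_distribution(centers, n_bins=8):
    """Normalized histogram of pairwise Euclidean distances between
    local-feature locations (e.g. cluster centers)."""
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(centers), k=1)       # count each pair once
    pd = d[iu]
    hist, _ = np.histogram(pd, bins=n_bins, range=(0.0, pd.max()))
    return hist / hist.sum()

# Four corners of a unit square: four side pairs (length 1),
# two diagonal pairs (length sqrt(2)).
square = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
h = distance_distribution(square)
```

Normalizing by the number of pairs makes descriptors of objects with different feature counts comparable.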
2.1.2 Graph-based methods
While feature-based methods use only the geometric properties of the 3D model to define
the shape of the object, graph-based methods use the topological information of the 3D
object to describe its shape. The graph that is constructed shows how the different shape
components are linked together. The graph representations include model graphs, Reeb
graphs, and skeleton graphs. These methods are known to be computationally expensive
and sensitive to small topological changes. Sundar et al. [78] used the skeletal graph as a
shape descriptor to encode both geometric and topological properties of the 3D object. The
similarity measures between two objects were approximated using a greedy algorithm for
bipartite graph matching. Hilaga et al. [30] introduced the use of Reeb graphs for matching
the shapes of articulated models.
2.1.3 View-based methods
The most effective view-based shape descriptor is the LightField descriptor developed by
Chen et al. [16]. A light field around a 3D object is a 4D function that represents the
radiance at a given 3D point in a given direction. Each 4D light field of a 3D object is
represented as a collection of 2D images rendered from a 2D array of cameras distributed
uniformly on a sphere. Their method places the light field cameras on 20 vertices of a
regular dodecahedron and uses orthogonal projection to capture 10 different silhouettes of
the 3D model. Ten different rotations are performed to capture a set of light field descriptors to improve robustness to rotation. The 100 rendered images are then described using
Zernike moments and Fourier descriptors to describe the region shape and contour shape,
respectively, of the 3D model. The retrieval of the 3D models is performed in stages where
objects that are greatly dissimilar to the query model are rejected early in the process. This
is done by comparing only a subset of the light field descriptors of the query and of the
database objects in the first few stages of the retrieval process. The light field descriptor was
evaluated to be one of the best performing descriptors in the SHREC competition. Ohbuchi
et al. [60] used a similar view-based approach to the light field descriptor. However, their
method extracted local features from each rendered image using the SIFT algorithm. Wang
et al. [89] improved the space usage efficiency of the LFD descriptor by projecting a number
of uniformly sampled random points along six directions to create six images that are then
described using Zernike moments. They also used a two-stage retrieval method to speed
up the retrieval process. Experimental results on the Princeton shape benchmark database
showed that their method's performance was comparable to the LFD descriptor for some
categories. Vajramushti et al. [84] employed a combination of a view-based depth-buffer
technique and a feature-based volume descriptor for 3D matching. Their method used the
voxel volume of the objects to reduce the search space for the depth-buffer comparisons.
Vranic [87] evaluated a composite descriptor called DESIRE that was formed using depth-
buffer images, silhouettes and ray extents of a 3D object. His results showed that DESIRE
outperformed LFD in retrieving objects of some categories.
It is important to note that most of these existing 3D object descriptors were developed
and tested to describe general 3D objects with high shape variability, and not medical
datasets, which usually have small shape variations. As shown in the analysis section, they
usually do not perform very well in describing medical datasets. This thesis proposes a
feature-based approach that uses a learning methodology to identify the interesting salient
points on the object, discussed in Chapter 5, and creates a global spatial map of the salient
points patterns, described in Chapter 6. The proposed descriptor is tested and shown to
work well for general 3D objects and to outperform other methods on craniofacial medical data.
Figure 2.1: Anthropometric landmarks on a patient's head. These images were published by Kelly et al. [40].
shape [score 0], 2) mild shape deformation [score 1], 3) moderate shape severity [score
2], and 4) severe shape deformation [score 3]. When assigning the severity score for a
patient, a clinical expert matches the patient's skull shape to the most similar template
and assigns the score corresponding to that template. This technique is currently used
by practitioners using the Dynamic Orthotic Cranioplasty Band (DOC Band) helmet as a
treatment method [33, 34].
Instead of taking physical measurements directly on a patient's head, some techniques
take the measurements from photographs of the patient's head. This approach is less intrusive for young patients, but it is still time consuming and can be inconsistent, as technicians must manually place landmarks on the photographs. Hutchison et al. [31, 32] developed
a technique called HeadsUp that involves taking a top-view digital photograph of an infant's
head fitted with an elastic head circumference band. The elastic band is equipped with
adjustable color markers to identify landmarks such as ear and nose positions. The result-
ing photograph is then automatically analyzed to obtain quantitative measurements for the
head shape, including cephalic index, head circumference, distance of ear to center of nose,
oblique length and ratio. Their results showed that the cephalic index (CI) and Oblique
Cranial Length Ratio (OCLR) can be used as quantitative measures of shape severity, as the values differ significantly between cases and controls. Although promising, the
Hutchison method requires subjective decisions regarding the placement of the midline and
ear landmarks and the selection of the posterior point of the OCLR lines. In addition, as
the measurements are done in two dimensions, displacement of head volumes cannot really
be assessed. Finally, placing the band on an infant can be quite challenging.
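The two indices can be sketched as simple ratios (standard definitions; the exact landmark placement for the oblique diagonals follows the cited protocols and is not reproduced here):

```python
def cephalic_index(width_mm, length_mm):
    """Cephalic index: maximal head width as a percentage of maximal
    head length."""
    return 100.0 * width_mm / length_mm

def oclr(diag_a_mm, diag_b_mm):
    """Oblique Cranial Length Ratio: the longer oblique cranial
    diagonal over the shorter one, as a percentage; 100 indicates
    symmetric diagonals."""
    longer, shorter = max(diag_a_mm, diag_b_mm), min(diag_a_mm, diag_b_mm)
    return 100.0 * longer / shorter
```

Both indices are dimensionless, which is what makes them comparable across patients of different head sizes.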
Zonenshayn et al. [98] also employed a headband with two adjustable points (at the nasion
and inion of the head) and used photographs of the headband shape to calculate the Cranial Index of Symmetry (CIS). These methods require consistency in setting up the band and
placing the markers, which may lead to non-reproducible results. In addition, these are 2D
techniques, but plagiocephaly and brachycephaly are three-dimensional deformations.
Vlimmeren et al. [85] introduced a new method called plagiocephalometry to assess the
asymmetry of the skull. The method uses a ring of thermoplastic material molded to the outline of
a patient's skull. The ring is positioned around the head at the widest transverse circumference. Three landmarks for the ears and nose are marked on the ring. The ring is then
copied onto paper and a transparent sheet to keep track of follow-up progress.
Measurement techniques that use full 3D head shape information can provide more
detailed and accurate shape information. Plank et al. [67] used a noninvasive laser shape
digitizer to obtain the 3D surface of the head. This system provides more accurate shape
information, but still requires the use of markers to define an anatomical reference plane for
further quadrant placement and volume calculations. Lanche et al. [45, 61] used a stereo-
camera system to obtain a 3D model of the head and developed a statistical model of the
asymmetry to quantify and localize the asymmetry of each patient's head. The model was obtained by first computing the asymmetry of a patient's head by deforming a symmetric
ideal head template to the patient's head to obtain point correspondences between the left
and right sides of the head. Principal Component Analysis was then performed on the
vectors of the asymmetry values of all patients' heads to obtain a statistical model.
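The statistical-model step can be sketched with a plain SVD-based PCA over per-patient asymmetry vectors (a generic sketch, not Lanche et al.'s exact pipeline; the component count is illustrative):

```python
import numpy as np

def pca_model(X, n_components=2):
    """PCA via SVD. Rows of X are per-patient asymmetry vectors;
    returns the mean vector, the principal directions, and each
    patient's coefficients (scores) in the reduced space."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    return mu, components, scores

# Rank-1 toy data: all variation lies along the direction [3, 4, 0].
X = np.outer(np.arange(5.0) - 2.0, [3.0, 4.0, 0.0])
mu, comps, scores = pca_model(X, n_components=1)
```

The scores give a compact per-patient summary of asymmetry that can then feed a classifier or severity scale.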
2.2.2 22q11.2 Deletion Syndrome
Similar to the assessment of deformational plagiocephaly, the assessment of 22q11.2 deletion
syndrome has commonly been through physical examination combined with craniofacial
anthropometric measurements. There have been very few automated methods for analyzing
22q11.2DS. Boehringer et al. [11] used Gabor wavelets to transform 2D photographs of
individuals with 10 different facial dysmorphic syndromes. Their method then applied
principal component analysis to describe and classify the dataset. Their method required
landmark placement on the face.
Hammond et al. [29] used the Dense Surface Model method. Landmarks were manually placed on each 3D surface mesh and used to align the faces to a mean face. Principal
component analysis was then used to describe the datasets, and the coefficients were used
to classify the dataset. Neither of these two methods is fully automatic, as they require manual
landmark placement.
One of the methods proposed in this thesis to represent craniofacial dysmorphologies uses
3D surface mesh models of heads without the need for markers or templates. The method
uses the surface normal vectors of all the 3D points on the head and constructs a global 2D
histogram of the azimuth-elevation angles of the surface normal vectors of the 3D points
on the face. The proposed method is general enough to characterize different craniofacial
disorders, including deformational plagiocephaly and its variations and 22q11.2DS and its
different manifestations.
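A minimal sketch of that global signature (the bin counts here are illustrative; the thesis's exact binning may differ):

```python
import numpy as np

def azimuth_elevation_histogram(normals, n_az=8, n_el=8):
    """Normalized 2D histogram of the azimuth-elevation angles of
    unit surface normal vectors."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    az = np.arctan2(n[:, 1], n[:, 0])                 # azimuth, [-pi, pi]
    el = np.arcsin(np.clip(n[:, 2], -1.0, 1.0))       # elevation
    hist, _, _ = np.histogram2d(
        az, el, bins=(n_az, n_el),
        range=((-np.pi, np.pi), (-np.pi / 2, np.pi / 2)))
    return hist / hist.sum()

# All normals pointing straight up land in a single bin.
up = np.tile(np.array([0.0, 0.0, 1.0]), (10, 1))
h = azimuth_elevation_histogram(up)
```

Because the histogram is computed over all surface normals, it needs no landmarks, which is the point of the marker-free design described above.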
Chapter 3
DATASETS
This chapter will describe the four datasets that were obtained to develop and test
the different shape analysis methodologies developed for this thesis. Each dataset has
different characteristics that help explore the different properties of the methodologies. The
22q11.2DS dataset, introduced in Section 3.1, contains 3D face models of individuals affected and unaffected by 22q11.2 deletion syndrome. The Deformational Plagiocephaly dataset,
discussed in Section 3.2, contains 3D head models of individuals affected and unaffected by
deformational plagiocephaly. The Heads dataset, discussed in Section 3.3, contains head
shapes of different classes of animals, including humans. These three datasets help explore
the performance of the methodology on data of similar overall shape with subtle distinctions
- the type of data for which the methodology was designed and developed. Section 3.4
introduces the SHREC 2008 classification benchmark dataset, which was obtained to further
test the performance of the methodology on general 3D object classification, where objects
in the dataset are not very similar.
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset
The 3D face models in this dataset were collected at the Craniofacial Center of Seattle
Children's Hospital using the 3dMD imaging system [1]. The 3dMD imaging system uses
four camera stands, each containing three cameras. Stereo analysis yields twelve range maps
that are combined using 3dMD proprietary software to yield a 3D mesh of an individual's
head and a texture map of the face. The methodologies developed for this thesis use only
the 3D meshes, due to human subject regulations.
An automated system developed by Wilamowska [92, 74] to align the pose of each mesh
was employed. The alignment system uses symmetry to align the yaw and roll angles and
a height differential to align the pitch angle. Although faces are not truly symmetrical,
Figure 3.1: Example of 3D face mesh data of children with 22q11.2 deletion syndrome.
the pose alignment procedure can be cast as finding the angular rotations of yaw and roll
that minimize the difference between the left and right sides of the face. The pitch of the
head was aligned by minimizing the difference between the height of the chin and the height
of the forehead. In some cases, manual adjustments were necessary to pose normalize the
faces. Figure 3.1 shows two examples of affected individuals in the dataset.
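The symmetry idea can be illustrated with a brute-force sketch (assuming z is the vertical axis and using a nearest-neighbor mirror distance as the cost; the actual system's cost function and search strategy are not specified here):

```python
import numpy as np

def symmetry_yaw(points, angles_deg=range(-30, 31, 2)):
    """Pick the yaw rotation (about the z axis) that minimizes the
    distance between the point cloud and its x-mirrored copy."""
    p = points - points.mean(axis=0)
    best, best_cost = 0.0, np.inf
    for a in angles_deg:
        t = np.radians(a)
        R = np.array([[np.cos(t), -np.sin(t), 0.0],
                      [np.sin(t),  np.cos(t), 0.0],
                      [0.0, 0.0, 1.0]])
        q = p @ R.T
        m = q * np.array([-1.0, 1.0, 1.0])     # mirror across x = 0
        d = np.linalg.norm(q[:, None, :] - m[None, :, :], axis=-1)
        cost = d.min(axis=1).mean() + d.min(axis=0).mean()
        if cost < best_cost:
            best, best_cost = float(a), cost
    return best

# A mirror-symmetric shape rotated by 10 degrees is realigned by -10.
sym = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0, 2.0, 0], [0, 0, 1.0]])
t = np.radians(10.0)
R = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1.0]])
rotated = sym @ R.T
```

A real system would use a finer angle search and a mesh-aware distance, but the structure is the same: score candidate rotations by left-right difference and keep the minimizer.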
The dataset contained 3D meshes for 189 individuals. Metadata for each 3D mesh
consisted of the age, gender, and self-described ethnicity of the individual plus a label of
affected or unaffected. The dataset consisted of 53 affected individuals and 136 control individuals. The ground truth for each individual's 22q11.2DS label was determined through
laboratory confirmation.
A balanced dataset was created from the original dataset. The balanced dataset con-
sisted of 86 individuals: 43 affected and 43 unaffected with 22q11.2 deletion syndrome. Each
of the 86 individuals was assessed by three craniofacial experts. Frontal and profile images
of the individuals were de-identified and viewed in random order to blind the raters. The ex-
perts assigned discrete scores to a total of 18 facial features that are known to characterize
22q11.2DS (score 0 = none, 1 = moderate, 2 = severe). Nine of the facial features (midface
hypoplasia, prominent nasal root, bulbous nasal tip, small nasal alae, tubular nose, small
mouth, open mouth, downturned mouth, and retrusive chin) are further analyzed in Chap-
ter 8. The experts' survey showed that all features of the nose were found to have a higher
percentage of moderate and severe expression in 22q11.2DS affected individuals. Midface
hypoplasia was observed to be moderately present in affected individuals [91].
Figure 3.2: Tops of heads of children with deformational plagiocephaly.
3.2 Deformational Plagiocephaly Dataset
The dataset for analyzing the shape dysmorphology due to deformational plagiocephaly was
obtained through a similar data acquisition pipeline as the 22q11.2DS dataset. The resulting
3D meshes are also automatically pose-normalized using the same alignment system used
to normalize the 22q11.2DS dataset [92, 74]. Figure 3.2 shows two examples of individuals
diagnosed with deformational plagiocephaly.
The original dataset consisted of 254 3D head meshes, comprising 100 controls and
154 cases. Each mesh in the original dataset was assessed by two craniofacial experts who
assigned discrete severity scores based on the degree of the deformation severity of different
head areas including back of the head, forehead asymmetry, ear asymmetry, and whether
the flattening at the back of the head was symmetric (case of brachycephaly). In addition,
each expert also noted an overall severity score. The discrete scores were either category 0
for normal, 1 for mild, 2 for moderate and 3 for severe. The laterality of the flatness was
indicated using negative scores to represent left sided deformation and positive scores to
represent right sided deformation.
The work in this thesis focuses on the flattening at the back of the head, known as
posterior plagiocephaly. Since no gold standard exists for assessing the
severity of posterior plagiocephaly, the experts' ratings were considered the gold standard
in evaluating the different severity scores developed. The inter-rater agreement between
the two experts was only 65%. As a result, participants were excluded if (1) the two
experts assigned discrepant posterior flattening scores, or (2) the classification based on
expert ratings differed from the clinical classification (case or control) assigned at the time of enrollment. The final dataset used to investigate posterior plagiocephaly consisted of 140
infants including 50 controls (by definition in category 0 by expert rating) and 90 cases: 46
in category 1 or -1, 35 in category 2 or -2, and 9 in category 3 or -3.
3.3 Heads Dataset
For the Heads dataset, the digitized 3D objects were obtained by scanning hand-made clay
toys using a Roland LPX-250 laser scanner with a maximal scanning resolution of 0.008
inches for plane scanning mode [70]. Raw data from the scanner consisted of 3D point clouds
that were further processed to obtain smooth and uniformly sampled triangular meshes of
0.9-1mm resolution. To increase the number of objects for training and testing, new objects
were created by deforming the original scanned 3D models in a controlled fashion using 3D
Studio Max software [8]. Global deformations of the models were generated using morphing
operators such as tapering, twisting, bending, stretching and squeezing. The parameters for
each of the operators were randomly chosen from ranges that were determined empirically.
Each deformed model was obtained by applying at least five different morphing operators in a random sequence.
Fifteen objects representing seven different classes were scanned. The seven classes are:
cat head, dog head, human head, rabbit head, horse head, tiger head and bear head. Each
of the fifteen original objects was randomly morphed to increase the size of the dataset.
A total of 250 morphed models per original object were generated. Points on the morphed
models are in full correspondence with the original models from which they were constructed.
Figure 3.3 shows examples of objects from each of the seven classes, while Figure 3.4 shows
examples of morphs from the horse class.
3.4 SHREC Dataset
The SHREC dataset was selected from the SHREC 2008 Competition Classification of
Watertight Models track [27]. The models in the track were chosen by the organizer to
ensure a high level of shape variability to make the track more challenging. The models
cat dog human rabbit horse tiger bear
Figure 3.3: Example of objects in the Heads dataset.
Figure 3.4: Example morphs from the horse class. Morphs were generated by stretching,twisting, or squeezing the original object with different parameters.
in the dataset were manually classified using three different levels of categorization. At
the coarse level of classification, the objects were classified according to both their shapes
and semantic criteria. At the intermediate level, the classes were subdivided according to
functionality and shape. At the fine level, the classes were further partitioned based on the
object shape. For example, at the coarse level some objects were classified into the furniture
class. At the intermediate level, these same objects were further divided into tables, seats,
and beds. At the fine level, the objects were classified into chairs, armchairs, stools, sofas, and
benches. The intermediate level of classification was chosen for the experiments as the fine
level had too few objects per class, while the coarse level had too many objects that were
dissimilar in shape grouped into the same class. The dataset consists of 425 pre-classified
objects. Figure 3.5 shows examples of objects in the benchmark dataset.
The four datasets were used to test the classification and retrieval methodologies de-
veloped in this thesis. The domain-independent base framework of the methodologies is
described next in Chapter 4.
human animal knots airplane bottle chess teapot
Figure 3.5: Example of objects in the SHREC 2008 Classification dataset. It can be seenthat the intra-class variability in this dataset is quite high as objects in the same class havequite different shapes.
Chapter 4
BASE FRAMEWORK
The methodologies developed in this thesis are used for single 3D object classification.
They do not handle objects in cluttered 3D scenes or occlusion. A surface mesh, which
represents a 3D object, consists of points {pi} on the object's surface and information
regarding the connectivity of the points. The base framework of the methodology starts by rescaling the objects to fit in a fixed-size bounding box. The framework then executes
two phases: low-level feature extraction (Section 4.1) and mid-level feature aggregation
(Section 4.2). The low-level feature extraction starts by applying a low-level operator to
every point on the surface mesh. After the first phase, every point pi on the surface mesh
will have either a single low-level feature value or a small set of low-level feature values,
depending on the operator used. The second phase performs mid-level feature aggregation
and computes a vector of values for a given neighborhood of every point pi on the surface
mesh. The feature aggregation results of the base framework are then used to construct the
different 3D object representations [7, 6].
4.1 Low-level Feature Extraction
The low-level operators extract local properties of the surface points by computing a feature
value vi for every point pi on the mesh surface. All low-level feature values are convolved
with a Gaussian filter to reduce noise effects. Three low-level operators were implemented
to test the methodology's performance: absolute Gaussian curvature, Besl-Jain curvature
categorization, and azimuth-elevation of surface normal vectors. Figure 4.1(a) shows an
example of the absolute Gaussian curvature values of a 3D model. Figure 4.1(b) shows
the results of applying a Gaussian filter over the low-level Gaussian curvature values, while
Figure 4.1(c) shows the results of applying the Gaussian filter over the low-level Besl-Jain
curvature values.
Figure 4.1: (a) Absolute Gaussian curvature low-level feature value, (b) Smoothed AbsoluteGaussian curvature values after convolution with the Gaussian filter, (c) Smoothed Besl-Jain curvature values after convolution. Higher values are represented by cool (blue) colors,while lower values are represented by warm (red) colors.
4.1.1 Absolute Gaussian Curvature
The absolute Gaussian curvature low-level operator computes a Gaussian curvature estimate K for every point p on the surface mesh:

K(p) = 2π − Σ_{f ∈ F(p)} interior angle_f(p)
where F is the list of all the neighboring facets of point p and the interior angle is the
angle of the facets meeting at point p. This calculation is similar to calculating the angle
deficiency at point p. The contribution of each facet is weighted by the area of the facet
divided by the number of points that form the facet. The operator then takes the absolute
value of the Gaussian curvature as the final low-level feature value for each point.
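A small sketch of the angle-deficit computation at a single vertex (omitting the area weighting described above; the incident triangles are passed explicitly, each as its two other vertices, for illustration):

```python
import numpy as np

def angle_deficit(p, triangles):
    """Discrete Gaussian curvature at vertex p as the angle deficit:
    2*pi minus the sum of the interior angles at p over the incident
    triangles (each given as its two other vertices)."""
    total = 0.0
    for a, b in triangles:
        u, v = np.asarray(a) - p, np.asarray(b) - p
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        total += np.arccos(np.clip(c, -1.0, 1.0))
    return 2 * np.pi - total

# A flat fan of six triangles around the origin has zero angle deficit.
p = np.zeros(3)
ring = [np.array([np.cos(t), np.sin(t), 0.0])
        for t in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
tris = [(ring[i], ring[(i + 1) % 6]) for i in range(6)]
k = angle_deficit(p, tris)
```

Taking the absolute value of this deficit, as the operator does, makes peaks and saddles contribute equally to the low-level feature.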
4.1.2 Besl-Jain Curvature
Besl and Jain [10] suggested surface characterization of a point p using only the signs of the
mean curvature H and Gaussian curvature K. These surface characterizations result in a
scalar surface feature for each point that is invariant to rotation, translation and changes
in parametrization. The eight different categories are: (1) peak surface, (2) ridge surface,
(3) saddle ridge surface, (4) plane surface, (5) minimal surface, (6) saddle valley, (7) valley
surface, and (8) cupped surface. Table 4.1 lists the different surface categories with their
respective curvature signs.
Table 4.1: Besl-Jain surface characterization.

Label   Category                H       K
1       Peak surface            H < 0   K > 0
2       Ridge surface           H < 0   K = 0
3       Saddle ridge surface    H < 0   K < 0
4       Plane surface           H = 0   K = 0
5       Minimal surface         H = 0   K < 0
6       Saddle valley surface   H > 0   K < 0
7       Valley surface          H > 0   K = 0
8       Cupped surface          H > 0   K > 0
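Using the standard Besl-Jain sign conventions, the categorization in Table 4.1 reduces to a lookup on the signs of H and K (a sketch; `eps` is an illustrative zero tolerance):

```python
def besl_jain_label(H, K, eps=1e-8):
    """Map mean curvature H and Gaussian curvature K to the eight
    Besl-Jain surface types. Returns None for the sign combination
    (H = 0, K > 0), which cannot occur since K <= H^2."""
    sh = 0 if abs(H) < eps else (1 if H > 0 else -1)
    sk = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {(-1,  1): 1,   # peak
             (-1,  0): 2,   # ridge
             (-1, -1): 3,   # saddle ridge
             ( 0,  0): 4,   # plane
             ( 0, -1): 5,   # minimal
             ( 1, -1): 6,   # saddle valley
             ( 1,  0): 7,   # valley
             ( 1,  1): 8}   # cupped (pit)
    return table.get((sh, sk))
```

The resulting label is a scalar per-point feature, invariant to rotation, translation, and reparametrization, as stated above.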
Figure 4.2: Azimuth and elevation angle of a 3D surface normal vector.
Figure 4.3: (a) 1D histogram aggregating the absolute Gaussian curvature values from points on the nose of a human head, (b) 2D histogram aggregating the azimuth-elevation vector values at a point on the back of the head.
object size, and that the results are comparable across different objects. The value of c was
determined empirically; for most experiments a value of c = 0.05 was used. Aggregating
the single-valued low-level feature values results in a 1D histogram with d histogram bins
for every point on the surface mesh. Aggregating the pair-valued low-level feature values
(such as the azimuth-elevation angle feature values) results in a 2D histogram constructed
of a × b bins, where a and b are the two different dimension sizes. Figure 4.3(a) shows an
example of a 1D histogram aggregating the absolute Gaussian curvature low-level feature
values from points on the nose of a 3D head object. Figure 4.3(b) shows an example of the
2D histogram aggregating the azimuth-elevation low-level feature values on a head.
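The aggregation step can be sketched for a point cloud as follows (brute-force neighborhoods; a real implementation would use the mesh connectivity or a spatial index, and the value range and bin count are placeholders):

```python
import numpy as np

def neighborhood_histograms(points, values, radius, n_bins=8,
                            v_range=(0.0, 1.0)):
    """For every point, a 1D histogram of the low-level feature values
    of all points within `radius` (the point itself included)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    hists = np.empty((len(points), n_bins))
    for i in range(len(points)):
        nbr = values[d[i] <= radius]
        hists[i], _ = np.histogram(nbr, bins=n_bins, range=v_range)
    return hists

pts = np.array([[0.0, 0, 0], [0, 0, 0.1], [0, 0, 5.0]])
vals = np.array([0.0, 0.5, 0.9])
hs = neighborhood_histograms(pts, vals, radius=1.0)
```

In the thesis the neighborhood size is set relative to the object size through the constant c (c = 0.05 in most experiments), which keeps descriptors comparable across objects.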
Once the feature extraction and aggregation are completed, a learning phase is used
to learn the characteristics of salient points for classification and retrieval, as described in Chapter 5.
Chapter 5
LEARNING SALIENT POINTS
Given the base framework's ability to compute low-level feature values at each point of
a 3D mesh and to aggregate these features in neighborhoods about the point, this chap-
ter explores the use of this framework to create a representation for 3D objects. Before
constructing the 3D object signature, salient or interesting points are identified on the 3D object, and the characteristics of these points are used when constructing the signatures.
The identified salient points are application dependent. The framework and methodology
were developed to be specifically applicable to the classification of craniofacial disorders, such
as 22q11.2 deletion syndrome, discussed in Section 5.1, and deformational plagiocephaly,
described in Section 5.2, but also appropriate for general use in 3D shape classification, as
shown in Section 5.3.
Preliminary saliency detection results using existing methods [46, 38] were not satisfactory. In
some cases they were not consistent and repeatable for objects within the same class. As
a result, to find salient points on a 3D object, a learning approach was selected. A salient
point classifier is trained on a set of marked training points on the 3D objects provided by
experts for a particular application. Histograms of low-level features of the training points
obtained using the base framework (Chapter 4) are then used to train the classifier. For
a particular application, the classifier will learn the characteristics of the salient points on
the surfaces of the 3D objects from that domain. Sets of detected points will lead to salient
regions in the signatures.
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome
Traditionally, studies of individuals with craniofacial disorders such as 22q11.2 deletion syn-
drome have been performed through in-person clinical observation coupled with craniofacial
anthropometric measurements derived from anatomic landmarks [24]. These landmarks are
located either visually by clinicians or through palpation of the skull. Figure 5.1 shows the
landmark points that are commonly used for craniofacial measurements.

Figure 5.1: Craniofacial anthropometric landmarks.
The salient point classifier was trained on a subset of the craniofacial anthropometric
landmarks marked on 3D head objects. This was done so that these craniofacial landmarks
would be included in the set of interesting or salient points for classification of the craniofacial disorders. The particular subset of landmarks was selected to be well-defined points
that both experts and non-experts could easily identify. The training set consisted of human
heads selected from the Heads database. Figure 5.2 shows an example of manually marked
salient points on the training data. Histograms of low-level features obtained using the base
framework were used to train a Support Vector Machine (SVM) [72, 86] classifier to learn
the salient points on the 3D surface mesh. WEKA's implementation of SVM was used for
all experiments [93]. A training set consisting of 75 morphs of 5 human heads was used to
train the classifier to learn the characteristics of the salient points for faces in terms of the
histograms of their low-level features.
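As a rough sketch of this training setup (scikit-learn's SVC is substituted here for WEKA's SVM implementation, and the feature histograms are random placeholders, not real training data):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 16))              # placeholder: 16-bin feature histograms
y = np.array([1] * 20 + [0] * 20)     # 1 = salient point, 0 = non-salient
clf = SVC(probability=True, random_state=0).fit(X, y)

# For any surface point, the classifier yields a label and a confidence score;
# the score is later thresholded to keep only high-confidence salient points.
label = int(clf.predict(X[:1])[0])
score = float(clf.predict_proba(X[:1])[0].max())
```

In practice the feature histograms come from the base framework of Chapter 4, one histogram per marked training point.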
Although the salient training points were selected only to be commonly used craniofacial landmark points, empirical studies determined that the classifier actually finds salient regions with a combination of high curvature and low entropy values. This result can be observed in the different histograms of salient and non-salient points in Figure 5.3. In the
figure, the salient point histograms have mainly low bin counts in the bins corresponding
to low curvature values and a high bin count in the last (highest) curvature bin. The
non-salient point histograms have mainly medium to high bin counts in the low curvature
Figure 5.2: Example of manually marked salient (blue color) and non-salient (red color) points on a human head model. The salient points include corners of the eyes, tip of the nose, corners of the nose, corners of the mouth, and chin.
bins and in some cases a high bin count in the last bin. The entropy of the salient point
histograms also tends to be lower than the entropy of the non-salient point histograms. The
classifier approach avoided the use of brittle thresholds.
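The entropy E listed under each histogram in Figure 5.3 can be computed as the Shannon entropy of the normalized bin counts; a minimal sketch (the bin counts below are illustrative, not taken from the thesis):

```python
import numpy as np

def histogram_entropy(counts):
    """Shannon entropy (base 2) of a bin-count histogram, as used to
    compare salient vs. non-salient point histograms."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())
```

A peaked, salient-like histogram yields a lower entropy than a flat, non-salient-like one: `histogram_entropy([8, 1, 1, 0]) < histogram_entropy([3, 3, 2, 2])`.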
Figure 5.4 shows results of the salient points predicted on two faces in the 22q11.2DS
database, which include not just the manually marked points but other points with the same
characteristics. The salient points are colored according to the assigned classifier confidence
score. Non-salient points are colored in red, while salient points are colored in different
shades of blue with dark blue having the highest prediction score.
5.2 Learning Salient Points for Deformational Plagiocephaly
A similar learning-based approach was used to find salient points for 3D heads with deformational plagiocephaly. The salient point classifier for deformational plagiocephaly was
trained on a set of points marked on the flat areas at the back of the head of individuals with
deformational plagiocephaly. The training salient points consisted of 10 marked points on
the flat areas of 10 heads with deformational plagiocephaly, while the non-salient training
points were selected from 10 heads without deformational plagiocephaly. Histograms of the
azimuth-elevation low-level features obtained using the base framework were used to train a
Support Vector Machine (SVM) classifier to learn the salient points on the 3D heads. After
[Figure panels: three salient point histograms (E = 0.348, E = 2.435, E = 2.79) and three non-salient point histograms (E = 3.95, E = 3.877, E = 4.185)]

Figure 5.3: Example histograms of salient and non-salient points. The salient point histograms have a high value in the last bin, illustrating high curvature in the region, and low values in the remaining bins. The non-salient point histograms have more varied values across the curvature bins. In addition, the entropy E of each salient point histogram (listed under each histogram) is lower than that of the non-salient point histograms.
Figure 5.4: Salient point prediction for two faces in the 22q11.2DS dataset. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
training was complete, the classifier was able to label each point on a 3D head as either
salient or non-salient and provide a confidence score for each decision. The same threshold, T = 0.95, was applied to the confidence scores for the salient points.
5.3 Learning Salient Points for General 3D Objects
The salient point classifier for general 3D object classification was trained on selected objects
from the Heads database using the craniofacial landmark points that were used in the
22q11.2DS application. A small training set consisting of 25 morphs of the cat head model,
25 morphs of the dog head model, and 50 morphs of human head models was used to train
the classifier to learn the characteristics of salient points for general 3D object classification.
Histograms of low-level features obtained using the base framework were used to train a
Support Vector Machine (SVM) classifier to learn the salient points on general 3D objects.
A threshold T = 0.95 was also applied to the confidence scores for the classifier salient
points. Figure 5.5 shows results of the salient points predicted on instances of the cat, dog, and human head classes in the Heads database, which include, as previously mentioned, not just the manually marked points, but other points with the same characteristics. The salient points are colored according to the assigned classifier confidence score. Non-salient points are colored in red, while salient points are colored in different shades of blue, with dark blue
having the highest prediction score. While the classifier was only trained on cat heads, dog
heads, and human heads, it does a good job of finding salient points on the other classes
of heads, and the 3D patterns produced are repeatable across objects of the same class.
Figure 5.6 shows the predicted salient points on new object classes that were not included
in the training phase.
The trained classifier was also tested on the SHREC 2008 Classification dataset. Experimental results show that the labeled salient points were quite satisfactory. Figure 5.7 shows the
salient points predicted on a number of objects from the SHREC 2008 database. Note that
on this database, which has a lot of intra-class shape variance, the salient point patterns
are not consistent across all members of each class.
After learning and identifying the application-dependent salient points for the 3D objects in the dataset, the signature for each 3D object is constructed as described next in Chapter 6.
(a) (b) (c)
Figure 5.5: Salient point prediction for (a) cat head class, (b) dog head class, and (c) human head class. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
(a) (b) (c)
Figure 5.6: Salient point prediction for (a) rabbit head class, (b) horse head class, and (c) leopard head class from the Heads database. Even though none of these three classes was included in the training, the trained model was able to predict salient points across the classes.
(a) (b) (c) (d)
Figure 5.7: Salient point prediction for (a) human class, (b) bird class, (c) human hand class, and (d) bottle class from the SHREC 2008 database. Note that for classes that have a lot of intra-class shape variance, the salient point patterns are not consistent across all members of those classes, as seen in column (a).
Chapter 6
2D LONGITUDE-LATITUDE SALIENT MAP SIGNATURE
Most 3D object analysis methods require the use of a 3D descriptor or signature to
describe the shape and properties of the 3D objects. This chapter describes the construc-
tion of the 3D object signature using the salient point patterns, obtained using the learning
approach described in Chapter 5, mapped onto a 2D plane via a longitude-latitude transfor-
mation, described in Section 6.1. Classification of 3D objects is then performed by training
a classifier using the 2D salient maps of the objects. Results of classification using the 2D
salient map signature are given in Section 6.2. Retrieval of 3D objects is performed by
calculating the distances between the salient signature of the query object and the salient
map signatures of all objects in the database. Results of retrieval using the 2D salient map
signature are given in Section 6.3. Section 6.4 investigates how the salient patterns are used
to obtain 2D salient views for 3D object retrieval.
6.1 Salient Point Pattern Projection
Before mapping the salient point patterns obtained in Chapter 5 onto the 2D plane, the
salient points are assigned a label according to the classifier confidence score assigned to
the point. The classifier confidence score range is then discretized into a number of bins.
For the experiments, at confidence level 0.95 and above, the confidence score range was
discretized into 5 bins. Each salient point on the 3D mesh is then assigned a label based on
the bin into which its confidence score falls.
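A minimal sketch of this discretization (the equal-width bin edges over [0.95, 1.0] are an assumption; the text states only the threshold and the number of bins):

```python
import numpy as np

def confidence_label(score, t=0.95, n_bins=5):
    """Map a classifier confidence score in [t, 1.0] to one of n_bins
    discrete salient-point labels (0 = lowest, n_bins - 1 = highest).
    Scores below the threshold t are treated as non-salient (label -1)."""
    if score < t:
        return -1
    edges = np.linspace(t, 1.0, n_bins + 1)
    # np.digitize returns 1..n_bins+1 for values in [t, 1.0]
    return min(int(np.digitize(score, edges)) - 1, n_bins - 1)
```

Each salient point on the 3D mesh then carries the label of the bin its confidence score falls into.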
To obtain the 2D longitude-latitude map signature for an object, the longitude and latitude positions of all the 3D points on the object's surface are calculated. Given any point p_i = (p_ix, p_iy, p_iz), the longitude position θ_i and latitude position φ_i of point p_i are calculated as follows:

θ_i = arctan(p_iz / p_ix)

φ_i = arctan(p_iy / √(p_ix² + p_iz²))
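The two longitude-latitude formulas above can be sketched in code as follows (arctan2 is used here to resolve the full angular range; the text writes the principal-value arctan, and the object is assumed centered at the origin):

```python
import numpy as np

def lon_lat(p):
    """Longitude and latitude of a 3D surface point p = (x, y, z)."""
    x, y, z = p
    lon = np.arctan2(z, x)                       # longitude in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))    # latitude in [-pi/2, pi/2]
    return lon, lat
```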
A 2D map of the longitude and latitude positions of all the points on the object's surface is created by discretizing the longitude and latitude values of the points into a fixed number of pixels. A pixel is labeled with the salient point label of the points that fall into that pixel. If more than one label is mapped to a pixel, the label with the highest count is used to label the pixel. Figure 6.1 shows the salient point patterns for the cat head, dog head, and human head models in the Heads database and their corresponding 2D map signatures. Figure 6.2 shows how different objects that belong to the same class will have similar 2D longitude-latitude signature maps.
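The pixel-labeling step above can be sketched as follows (the 64×32 map resolution is an assumed value, chosen only for illustration):

```python
import numpy as np
from collections import Counter

def salient_map(lon_lats, labels, width=64, height=32):
    """Rasterize per-point salient labels into a 2D longitude-latitude map.
    Pixels receiving several labels take the most frequent one; empty
    pixels stay -1 (non-salient background)."""
    votes = {}
    for (lon, lat), lab in zip(lon_lats, labels):
        col = min(int((lon + np.pi) / (2 * np.pi) * width), width - 1)
        row = min(int((lat + np.pi / 2) / np.pi * height), height - 1)
        votes.setdefault((row, col), []).append(lab)
    img = np.full((height, width), -1, dtype=int)
    for (row, col), labs in votes.items():
        img[row, col] = Counter(labs).most_common(1)[0][0]
    return img
```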
(a) (b) (c)
Figure 6.1: Salient point patterns on 3D objects of Figure 5.4 and their corresponding 2D longitude-latitude map signatures.
To reduce noise in the 2D longitude-latitude map signature, a wavelet transformation
was applied to the 2D map signatures. In the experiments, the 2D longitude-latitude map
signatures were treated as 2D images and decomposed using image-based Haar wavelet
[Figure rows: human head, rabbit head, horse head, wildcat head]

Figure 6.2: Objects that are similar and belong to the same class will have similar 2D longitude-latitude signature maps.
function. The wavelet function decomposes the 2D image into approximation and detail coefficients. The approximation and detail coefficients at the second level were collected and concatenated into a new feature vector with dimension d = 13134. This final feature vector became the descriptor for each object in the database and was used for classification and retrieval. For most experiments, the noise reduction step was not found to improve the classification and retrieval performance, except for the SHREC dataset (Section 6.2.4).
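A minimal numpy sketch of a two-level Haar decomposition and coefficient concatenation (un-normalized averaging filters; the exact wavelet normalization used in the thesis is not specified here, and the image sides are assumed even):

```python
import numpy as np

def haar2d_step(img):
    """One level of a 2D Haar decomposition: returns the approximation
    sub-band and the three detail sub-bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, lh, hl, hh

def haar_feature_vector(img, levels=2):
    """Concatenate the level-2 approximation and detail coefficients
    into one feature vector, as in the signature construction above."""
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll, lh, hl, hh = haar2d_step(ll)
    return np.concatenate([c.ravel() for c in (ll, lh, hl, hh)])
```

On a constant map all detail coefficients vanish, so only the approximation sub-band carries energy.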
6.2 Classification using 2D Map Signature
By creating a signature for each 3D object, it is now possible to perform classification of 3D objects in a given database. Several classification experiments were performed on each
of the acquired datasets described in Chapter 3.
6.2.1 Classification of 22q11.2DS Dataset
The goal of this experiment was to classify each individual in the dataset as either affected
or unaffected by 22q11.2DS and to measure the classification accuracy. The salient points
classifier was trained on a subset of the craniofacial anthropometric landmarks marked
on 3D human head models as explained in Chapter 5. Table 6.1 shows the classification
performance with two different classifiers: Adaboost and SVM. Evaluation was done using
the following measures: classification accuracy, precision and recall rates, F-measure, true
positive, and false positive rates. The classification accuracy for the higher scoring SVM
classifier is 86.7%, which is higher than that obtained from a study of three human experts
whose mean accuracy was 72.5% [92].
Table 6.1: Classification performance for 22q11.2DS.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.804 0.795 0.804 0.791 0.804 0.387
SVM 0.867 0.866 0.868 0.861 0.868 0.27
The classification accuracy of the map signature was compared to some of the state-of-
the-art and best performing 3D object descriptors in the literature. The following existing
descriptors were used for comparison: Light Field Descriptor (LFD) [16], ray-based spherical
harmonics (SPH) [39], shape distribution of distance between random points (D2) [62], and
absolute angle distance histogram (AAD) [59]. The Light Field Descriptor (LFD) is a view-
based descriptor that extracts features from 100 2D silhouette image views and measures
the distance between two 3D objects by finding the best correspondence between the set
of 2D views for the two objects. The Spherical Harmonics method calculates the maximal
extent of a shape across all rays from the origin and uses spherical harmonics to represent
the function. The shape function D2 represents 3D objects by calculating the global shape
distribution of distances between two random points, while the AAD method enhances
the D2 shape function by measuring not only the distance between two random points,
but also the mutual orientation of the surfaces on which the pair of points is located.
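For reference, the D2 shape function used in the comparison can be sketched as follows (uniform sampling of points on the surface is assumed already done; the pair count and bin count are illustrative choices, not values from [62]):

```python
import numpy as np

def d2_descriptor(points, n_pairs=10000, n_bins=64, rng=None):
    """Sketch of the D2 shape distribution: a normalized histogram of
    Euclidean distances between random pairs of surface points."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, dtype=float)
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, d.max() + 1e-12))
    return hist / hist.sum()
```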
6.2.2 Classification of Deformational Plagiocephaly Dataset
The goal of this experiment was to classify each individual as either control or case affected
by the plagiocephaly condition and to measure the classification accuracy. The salient
points for the map signature were obtained by using the salient flat point classifier as
explained in Chapter 5. The classification experiments were performed on the Deformational
Plagiocephaly Dataset introduced in Chapter 3.
Table 6.4 shows the classification accuracy of the method on the full 254 individual
dataset. The ground truth for the classification was the patient status originally assigned by the referring doctors: case or control. Table 6.5 shows the classification accuracy of the method on the trimmed 140-individual dataset on which the experts agreed. The Adaboost classifier obtains an 80.3% classification accuracy on the full dataset and an improved 87.9% accuracy
on the trimmed dataset.
Table 6.4: Classification performance for plagiocephaly using the full 254-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.803 0.805 0.803 0.804 0.803 0.208
SVM 0.787 0.787 0.787 0.787 0.787 0.233
Table 6.5: Classification performance for plagiocephaly using the trimmed 140-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.879 0.878 0.879 0.878 0.879 0.156
SVM 0.85 0.849 0.85 0.849 0.85 0.19
The classification accuracy of the methodology for this application was also compared
to existing state-of-the-art descriptors. Table 6.6 shows that the 2D salient map signature
achieves higher classification accuracy for deformational plagiocephaly than other existing
methods, including the LFD descriptor and others discussed in Chapter 2.
Table 6.6: Comparison of classification accuracy for plagiocephaly.
Dataset Salient 2D map LFD SPH D2 AAD
Full 254 dataset 0.803 0.72 0.673 0.650 0.685
Trimmed 140 dataset 0.879 0.714 0.743 0.779 0.721
Classification of this condition can be incorporated into epidemiologic research on the
prevalence and long-term outcome of deformational plagiocephaly, which may eventually
lead to improved clinical care for infants with deformational plagiocephaly.
6.2.3 Classification of Heads Dataset
The Heads database can be thought of as a first step toward testing the 2D salient map
signature on more general shapes still in the craniofacial category, but for multiple different
animals where face shapes can be quite different.
In the first set of experiments, all objects in the Heads database were pose-normalized by
rotating the heads to face the same orientation, as was the case for the medical craniofacial
datasets. Classification of the 3D objects in the database was performed by training an SVM
classifier on the salient point patterns of each class using the 2D longitude-latitude map
signature of the objects in the class. The classifier was trained using the signatures of 25 objects from each class for all seven classes in the database and tested with a separate test set consisting of 50 objects per class for each of the seven classes. The classifier achieved
100% classification accuracy in classifying all the pose-normalized objects in the database.
Since 3D objects may be encountered in the world at any orientation, rota