8/12/2019 Atmos u Karto Phd
3D Shape Analysis for Quantification, Classification, and Retrieval
Indriyati Atmosukarto
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
2010
Program Authorized to Offer Degree: Computer Science and Engineering
University of Washington
Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Indriyati Atmosukarto
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Linda G. Shapiro
Reading Committee:
Linda G. Shapiro
James F. Brinkley III
Maya Gupta
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.
Signature
Date
University of Washington
Abstract
3D Shape Analysis for Quantification, Classification, and Retrieval
Indriyati Atmosukarto
Chair of the Supervisory Committee:
Professor Linda G. Shapiro
Computer Science and Engineering
Three-dimensional objects are now commonly used in a large number of applications includ-
ing games, mechanical engineering, archaeology, culture, and even medicine. As a result,
researchers have started to investigate the use of 3D shape descriptors that aim to encapsu-
late the important shape properties of the 3D objects. This thesis presents new 3D shape
representation methodologies for quantification, classification and retrieval tasks that are
flexible enough to be used in general applications, yet detailed enough to be useful in medical
craniofacial dysmorphology studies. The methodologies begin by computing low-level fea-
tures at each point of the 3D mesh and aggregating the features into histograms over mesh
neighborhoods. Two different methodologies are defined. The first methodology begins by
learning the characteristics of salient point histograms for each particular application, and
represents the points in a 2D spatial map based on longitude-latitude transformation. The
second methodology represents the 3D objects by using the global 2D histogram of the
azimuth-elevation angles of the surface normals of the points on the 3D objects.
Four datasets, two craniofacial datasets and two general 3D object datasets, were obtained to develop and test the different shape analysis methods developed in this thesis. Each dataset has different shape characteristics that help explore the different properties of
the methodologies. Experimental results on classifying the craniofacial datasets show that
our methodologies achieve higher classification accuracy than medical experts and existing
state-of-the-art 3D descriptors. Retrieval and classification results using the general 3D objects show that our methodologies are comparable to existing view-based and feature-based descriptors and outperform these descriptors in some cases. Our methodology can also be used to speed up the most powerful general 3D object descriptor to date.
TABLE OF CONTENTS
Page
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 2: Related Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 3D Descriptors for General Objects . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Medical Craniofacial Assessment . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3: Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset . . . . . . . . . . . . . . . . 15
3.2 Deformational Plagiocephaly Dataset . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Heads Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 SHREC Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 4: Base Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Low-level Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Mid-level Feature Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5: Learning Salient Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome . . . . . . . . . . . . . 26
5.2 Learning Salient Points for Deformational Plagiocephaly . . . . . . . . . . . . 28
5.3 Learning Salient Points for General 3D Objects . . . . . . . . . . . . . . . . . 30
Chapter 6: 2D Longitude-Latitude Salient Map Signature . . . . . . . . . . . . . . 34
6.1 Salient Point Pattern Projection . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2 Classification using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Retrieval using 2D Map Signature . . . . . . . . . . . . . . . . . . . . . . . . 45
6.4 Retrieval using Salient Views . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Chapter 7: Global 2D Azimuth-Elevation Angles Histogram of Surface Normal Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.1 3D Shape Severity Quantification and Localization for Deformational Plagiocephaly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 Classification of 22q11.2 Deletion Syndrome . . . . . . . . . . . . . . . . . . . 78
7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 8: Learning 3D Shape Quantification for Craniofacial Research . . . . . . 83
8.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2 Facial Region Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D Histogram of Azimuth Elevation Angles . . . . . . . . . . . . . . . . . . . 86
8.4 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.5 Feature Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Chapter 9: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
LIST OF FIGURES
Figure Number Page
1.1 Example of applications that use 3D objects . . . . . . . . . . . . . . . . . . . 2
2.1 Anthropometric landmarks on a patient's head . . . . . . . . . . . . . . . . . 12
3.1 Example of 3D face mesh data of children with 22q11.2 deletion syndrome. . 16
3.2 Tops of heads of children with deformational plagiocephaly. . . . . . . . . . . 17
3.3 Example of objects in the Heads dataset. . . . . . . . . . . . . . . . . . . . . 19
3.4 Example morphs from the horse class . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Example of objects in the SHREC 2008 Classification dataset . . . . . . . . . 20
4.1 Low-level feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Azimuth and elevation angle of a 3D surface normal vector. . . . . . . . . . . 24
4.3 Mid-level feature aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.1 Craniofacial anthropometric landmarks. . . . . . . . . . . . . . . . . . . . . . 27
5.2 Example of training points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Example histograms of salient and non-salient points . . . . . . . . . . . . . . 29
5.4 Salient point prediction for two faces in the 22q11.2DS dataset . . . . . . . . 29
5.5 Salient point prediction for training data in Heads dataset . . . . . . . . . . . 31
5.6 Salient point prediction for testing data in Heads dataset . . . . . . . . . . . 31
5.7 Salient point prediction for objects in SHREC 2008 dataset . . . . . . . . . . 32
6.1 Salient point patterns on 3D objects . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 2D longitude-latitude signature maps . . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Classification accuracy vs training rotation angle increment. . . . . . . . . . . 42
6.4 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.5 Comparison of retrieval results . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6 Salient points resulting from clustering. . . . . . . . . . . . . . . . . . . . . . 54
6.7 Salient view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.8 Salient views vs Distinct salient views . . . . . . . . . . . . . . . . . . . . . . 56
6.9 Top 5 distinct salient views in SHREC dataset . . . . . . . . . . . . . . . . . 57
6.10 Average retrieval scores using top K salient views . . . . . . . . . . . . . . . . 59
7.1 Surface normal vectors of 3D points . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Calculation of the Flatness Scores . . . . . . . . . . . . . . . . . . . . . . . . 67
7.3 Severity localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.4 Spectrum of deformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.5 Correlation between LPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.6 Correlation between RPFS and Expert Score . . . . . . . . . . . . . . . . . . 70
7.7 Correlation between AS and Expert Score . . . . . . . . . . . . . . . . . . . . 72
7.8 Correlation between AAS and Expert Score . . . . . . . . . . . . . . . . . . . 72
7.9 Correlation between AAS and aOCLR . . . . . . . . . . . . . . . . . . . . . . 74
7.10 ROC curve for LPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.11 ROC curve for RPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.12 ROC curve for AS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.13 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.14 Correlation between AAS and Brachycephaly score . . . . . . . . . . . . . . . 76
7.15 ROC curve for AAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.16 Projections of 2D azimuth-elevation angles to the face . . . . . . . . . . . . . 81
8.1 Overview of the quantification learning framework. . . . . . . . . . . . . . . . 83
8.2 Facial region selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3 2D histogram of selected region . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.4 Feature selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Positional information about selected region . . . . . . . . . . . . . . . . . . . 89
8.6 Positional information about selected region with normal vector . . . . . . . . 90
8.7 Output of the genetic programming quantification approach . . . . . . . . . . 91
8.8 F-measure for training and testing dataset . . . . . . . . . . . . . . . . . . . . 94
8.9 Projection of selected histogram bins . . . . . . . . . . . . . . . . . . . . . . . 100
8.10 Tree structure for quantifying midface hypoplasia . . . . . . . . . . . . . . . . 103
8.11 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 105
8.12 Tree structure for quantifying nasal facial abnormalities . . . . . . . . . . . . 106
8.13 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 107
8.14 Tree structure for quantifying oral facial abnormalities . . . . . . . . . . . . . 108
8.15 Quantification score for midface hypoplasia. . . . . . . . . . . . . . . . . . . . 109
LIST OF TABLES
Table Number Page
4.1 Besl-Jain surface characterization. . . . . . . . . . . . . . . . . . . . . . . . . 23
6.1 Classification performance for 22q11.2DS. . . . . . . . . . . . . . . . . . . . . 37
6.2 Overall comparison of the various shape descriptors. . . . . . . . . . . . . . . 38
6.3 Comparison of classification accuracy for 22q11.2DS. . . . . . . . . . . . . . . 38
6.4 Plagiocephaly classification using the 254-individual dataset . . . . . . . . . . 39
6.5 Plagiocephaly classification using the 140-individual dataset . . . . . . . . . . 39
6.6 Comparison of classification accuracy for plagiocephaly. . . . . . . . . . . . . 40
6.7 Comparison of classification accuracy for SHREC 2008 dataset. . . . . . . . . 43
6.8 Comparison of timing of each phase . . . . . . . . . . . . . . . . . . . . . . . 44
6.9 Pose-normalized retrieval experiment 2 . . . . . . . . . . . . . . . . . . . . . . 46
6.10 Average retrieval score comparing three pose-normalization methods. . . . . . 48
6.11 Average retrieval score using different low-level features . . . . . . . . . . . . 48
6.12 Average retrieval score using image wavelet analysis . . . . . . . . . . . . . . 49
6.13 Comparing the salient map signature best results against existing methods. . . 49
6.14 Comparing retrieval scores for classes in SHREC dataset . . . . . . . . . . . . 50
6.15 Average retrieval score using salient views . . . . . . . . . . . . . . . . . . . . 62
6.16 Retrieval score using maximum number of distinct views . . . . . . . . . . . . 63
6.17 Average feature extraction runtime per object. . . . . . . . . . . . . . . . . . 64
7.1 Descriptive statistics for the Left Posterior Flatness Score (LPFS) . . . . . . 71
7.2 Descriptive statistics for the Right Posterior Flatness Score (RPFS) . . . . . 73
7.3 Descriptive statistics for the Asymmetry Score (AS) . . . . . . . . . . . . . . 78
7.4 Descriptive statistics for AAS and aOCLR . . . . . . . . . . . . . . . . . . . . 79
7.5 AUC for quantifying posterior flattening . . . . . . . . . . . . . . . . . . . . . 80
7.6 Classification accuracy for plagiocephaly . . . . . . . . . . . . . . . . . . . . . 80
7.7 Classification of 22q11.2DS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.8 Classification accuracy of 22q11.2DS facial dysmorphologies . . . . . . . . . . 81
8.1 Genetic programming parameters. . . . . . . . . . . . . . . . . . . . . . . . . 92
8.2 Classification performance for nine facial anomalies using GP . . . . . . . . . 93
8.3 Classification performance using various shape descriptors . . . . . . . . . . . 95
8.4 Comparing GP to the global approaches . . . . . . . . . . . . . . . . . . . . . 96
8.5 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 97
8.6 GP mathematical expressions for midface hypoplasia . . . . . . . . . . . . . . 98
8.7 Coefficients for midface hypoplasia . . . . . . . . . . . . . . . . . . . . . . . . 99
8.8 Best performing mathematical expression . . . . . . . . . . . . . . . . . . . . 101
8.9 Best performing mathematical expressions . . . . . . . . . . . . . . . . . . . . 102
8.10 Classification performance in predicting 22q11.2 Deletion Syndrome. . . . . . 104
ACKNOWLEDGMENTS
I wish to express very deep and sincere gratitude to my advisor, Professor Linda Shapiro, without whose guidance, encouragement, and support I would not have been able to complete this PhD. I have learned tremendously from her about how to become an excellent researcher and writer, especially in the field of computer vision.
I am very grateful to all the members of my PhD thesis committee, Dr Maya Gupta, Dr James Brinkley, Dr Steve Seitz, and Dr Mark Ganther, for their useful feedback and comments.
I would also like to thank my collaborators at Seattle Children's Hospital Craniofacial Center: Dr Michael Cunningham, Dr Matthew Speltz, Dr Carrie Heike, and Dr Brent Collett,
for providing me with the medical 3D mesh data for this dissertation, as well as for their
engaging discussions and suggestions.
I owe an indescribable amount of gratitude to my parents, my sisters, and my niece for having confidence in me, always encouraging me, and cheering me up when I am down.
Finally, I reserve special thanks for my husband, David Gomulya, for being my best friend and a great supporter, and my son, Kiran, for bringing new joy into my life.
This research was supported by the National Science Foundation under grant number
DBI-0543631.
DEDICATION
to my son
Kiran Atmosukarto Gomulya
our Ray of Light
Chapter 1
INTRODUCTION
1.1 Motivation
Advancement in technology for the digital acquisition of 3D models has led to an increase in the number of 3D objects available. Three-dimensional objects are now commonly used
in a number of areas such as games, mechanical design for CAD models, archaeology and
cultural heritage, and medical research studies. Figure 1.1 shows some applications that use
3D objects. The widespread integration of 3D models in different fields motivates the need
to be able to store, index, classify, and retrieve 3D objects automatically. However, current
classification and retrieval techniques for text, 2D images, and videos cannot be directly
translated and applied to 3D objects, as 3D objects have different data characteristics from
other data modalities.
Classification and retrieval of 3D objects requires the 3D objects to be represented in
a way that captures the local and global shape characteristics of the object. This requires
creating a 3D descriptor or signature that summarizes the important shape properties of the
object. Unfortunately, finding a descriptor that is able to describe the important character-
istics of a 3D object is not a trivial task. The descriptor should be able to capture a good
balance between the global and local shape properties of the object, so as to allow flexibility
in performing different tasks. The global properties of an object capture the overall shape
of an object, while the local properties capture the details of an object.
A specific example of the usage of 3D models in the medical field is the work that researchers at Seattle Children's Hospital Craniofacial Center (SCHCC) are pursuing. The researchers at SCHCC use CT scans and 3D surface meshes of children's heads to investigate head shape dysmorphology due to craniofacial disorders such as craniosynostosis,
22q11.2 deletion syndrome, deformational plagiocephaly, or cleft lip and palate. These
researchers aspire to develop new computational techniques that can represent, quantify,
Figure 1.1: Example of applications that use 3D objects: (a) Second Life is a game that simulates a virtual 3D world, (b) The Digital Michelangelo is a Stanford project that aims to digitize cultural artifacts for cataloging, conservation, and restoration, (c) FoldIt! is a computer game that uses 3D protein structures to understand how proteins fold for use in drug development, and (d) Plan3D is an interior design application that allows users to incorporate 3D models in house designs.
and analyze variants of biological morphology from the 3D models acquired from stereo
camera technology. The long-term objective of their research is to reveal genotype-phenotype disease associations.
This thesis investigates new methodologies for representing 3D objects that are useful
in medical applications. Most existing 3D shape descriptors have only been developed and
tested on general 3D object datasets, while those designed for medical purposes must usually
satisfy a specific medical application and dataset. The objective of this work is to develop
3D shape representation methodologies that are flexible enough to generalize from specific
medical tasks to general 3D object tasks. This work was motivated by the collaborations in
two research studies at SCHCC for the study of craniofacial anatomy: 1) a study of children
with 22q11.2 deletion syndrome and 2) a study of infants with deformational plagiocephaly.
22q11.2 deletion syndrome (22q11.2DS) is a genetic disease that is one of the most com-
mon multiple anomaly syndromes in humans [41]. This condition is associated with more
than 180 clinical features, including over 25 dysmorphic craniofacial features. Abnormal
clinical features of individuals with 22q11.2DS include asymmetric face shape, hooded eyes,
bulbous nasal tip, and retrusive chin, among others. The range of variation in individual
feature expression is very large. As a result, even experts have difficulty in diagnosing
22q11.2DS from frontal facial photographs alone [9]. Early detection of 22q11.2DS is important, as many affected individuals are born with conotruncal cardiac anomalies, mild-to-moderate immune deficiencies, and learning disabilities, all of which can benefit from early intervention.
Deformational plagiocephaly (also known as positional plagiocephaly, or non-synostotic
plagiocephaly) refers to the deformation of the head, characterized by a persistent flattening
on the side resulting in an asymmetric head shape and misalignment of the ears. Deforma-
tional plagiocephaly is caused by persistent pressure on the skull of a baby before or after
birth. Another possible factor that can lead to deformational plagiocephaly is torticollis, a
muscle tightness in the neck resulting in a limited range of motion for the head that causes
infants to look in one direction and to rest on the same spot of the back of the head. If left
untreated, children with these abnormal head shape conditions may experience a number
of medical issues in their lives, ranging from social problems due to abnormal appearance
to delayed neurocognitive development [18, 77].
1.2 Problem Statement
Motivated by collaborations with researchers at SCHCC, this thesis develops 3D shape
representation methodologies that can be used for 3D shape classification, retrieval, and quantification. The methodologies provide the flexibility to generalize across both specific
medical datasets and general 3D objects. The following three general problems are tackled.
Problem 1: 3D shape quantification
Given a surface mesh Si, which consists of n points and information regarding the connectivity of the points, the goal is to analyze and describe the shape Si by constructing a numeric representation of mesh Si, commonly referred to as a signature or descriptor Di. A quantitative score may also be calculated from the obtained signature.
Problem 2: 3D shape classification
Given a database of 3D shapes S = {S1, S2, ..., SN} that have been quantified and described using their respective numeric signatures Di, 1 ≤ i ≤ N, and are pre-classified into a number of C classes, the goal is to create an algorithm that can be used to determine to which class a new 3D object Q belongs.
Problem 3: 3D shape retrieval
Given a database of 3D shapes S = {S1, S2, ..., SN} that have been quantified and described using their respective numeric signatures Di, 1 ≤ i ≤ N, the goal is to create an algorithm that retrieves all objects in S that are similar to a query object Q, based on their numeric signatures.
1.3 Thesis Outline
Chapter 2 discusses the literature related to the two main classes of research in this thesis:
3D object descriptors in the computer vision literature and craniofacial assessment in the
medical literature. The datasets used to develop and test the methodology are described in
Chapter 3. Chapter 4 explains the base framework for feature extraction. The method for
learning the salient points of a 3D object is explained and applied to different applications
in Chapter 5. Two different types of 3D object descriptors are introduced and analyzed in
Chapters 6 and 7. Chapter 6 describes the 2D longitude-latitude salient map signature and
investigates its application for classification and retrieval of both general 3D objects and 3D medical data. Chapter 7 covers the global 2D azimuth-elevation angles descriptor and
investigates its application for classification of deformational plagiocephaly and 22q11.2DS
datasets. A learning framework for quantification using genetic programming is described
in Chapter 8. Finally, Chapter 9 provides a summary and suggests possible future research
directions.
Chapter 2
RELATED LITERATURE
In this chapter, two main classes of research related to the work in this thesis are
described: 3D shape descriptors for general objects from the computer vision literature
and medical studies from the craniofacial literature.
2.1 3D Descriptors for General Objects
Three-dimensional shape analysis and its application in 3D object retrieval and classification
has received increased attention in the past few years. There have been several survey
papers on the topic [81, 82, 26, 95, 20, 36, 13, 14, 12, 52, 69]. Starting in 2006, researchers
in the area have taken the initiative to organize an annual 3D shape retrieval evaluation
contest called SHREC (SHape REtrieval Contest), currently organized by the Network of
Excellence AIM@SHAPE. The contest's general objective is to evaluate the effectiveness of
3D shape retrieval algorithms. Participants register for the contest before the test set is
made available. The participants are given 48 hours to apply their 3D retrieval algorithm
to the test set and submit their retrieval results to the organizers. The retrieval results are
evaluated using measurements that relate to precision and recall. The average performance
of each method over a set of queries is calculated to obtain an overall impression of the
algorithm's performance. Using a common test set and queries allows a direct comparison
of the different algorithms. The contest started with a single track, using only the Princeton
benchmark database as the test set and has evolved into a multi-track contest. The tracks
now include retrieval of watertight models, CAD models, protein models, 3D face models,
and partial matching. Results of the contest show that no one descriptor performs the best
for all kinds of retrieval and classification tasks. Each descriptor has its own strengths and weaknesses for the different queries and tasks.
There are three broad categories of 3D object representation: feature-based methods,
graph-based methods, and view-based methods.
2.1.1 Feature-based methods
Feature-based 3D object descriptors, which are the most popular, can be further catego-
rized into: (1) global features, (2) global feature distributions, (3) spatial maps, and (4)
local features. Early work on 3D object representation and its application to retrieval and
classification focused more on the global features and global feature distribution approaches.
Global features computed to represent 3D objects include area, volume, and moments. Elad
et al. [22] computed the moment properties of the object and used the vector value of the
moments as a descriptor for the object. Osada et al. [62] calculated a number of global
shape distributions to represent 3D objects. The shape functions measured included the
angle between three random points (A3), the distance between a point and a random point
(D1), the distance between two random points (D2), the area of the triangle between three
random points (D3), and the volume between four random points on the surface (D4).
Ohbuchi et al. [59] enhanced the D2 shape function by measuring not only the distance, but
also the mutual orientation of the surfaces on which the pair of points is located. Zaharia
et al. [96] introduced a 3D shape spectrum descriptor that computed the distribution of the
shape index of the points over the whole mesh. Similar distributions were also calculated for
other surface properties such as curvature. Some recent works continue to use the feature
distribution approach. Mahmoudi et al. [53] computed the histogram of pairwise diffusion
distances between all points, while Ion et al. [35] defined their descriptor as the histogram
of the eccentricity transform. The histogram uses the maximum geodesic distance from a
point to all other points on the surface. The global feature methods are computationally
efficient, as they reduce the computation space of the 3D object by describing the object
with fewer dimensions; however, these methods are not discriminative enough when the objects have small differences, as in intra-class retrieval cases or classification of very similar
objects.
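To make the global feature distribution idea concrete, the sketch below approximates the D2 shape function of Osada et al. as a histogram of distances between random pairs of surface samples. It assumes points have already been sampled from the mesh surface (Osada et al. sample triangles proportionally to area); the bin count, histogram range, and mean-distance normalization are illustrative choices, not the published parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def d2_histogram(points, n_pairs=10000, n_bins=64):
    """D2-style distribution: histogram of distances between random
    point pairs, normalized by the mean distance for scale invariance."""
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    d = d / d.mean()                       # scale normalization
    hist, _ = np.histogram(d, bins=n_bins, range=(0, 4), density=True)
    return hist

# Stand-in surface samples: points on a unit sphere.
v = rng.normal(size=(2000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
sig = d2_histogram(v)
print(sig.shape)  # (64,)
```

Two such histograms can then be compared with any vector or histogram distance to retrieve or classify objects.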
Spatial map representations describe a 3D object by capturing and preserving physical locations on it. Saupe et al. [71] described a spherical extent function by calculating
the maximal extent of a shape across all rays from the origin. They compared two different
kinds of representations of the function: using spherical harmonics and moments. Their results showed that using spherical harmonics to represent the function performed better.
The spherical harmonic coefficients reconstruct an approximation of the object at different
resolutions. Kazhdan et al. [39] used this idea to show that spherical harmonics can be used
to transform rotation-dependent shape descriptors into rotation-independent ones without
the need to pose normalize the objects in advance. Their results showed that the applica-
tion of the spherical harmonic representation improved the performance of most spherical
function descriptors. Laga et al. [44, 43] uniformly sampled points on a unit sphere and
used spherical wavelet transforms to represent 3D objects. Spherical wavelet descriptors are
natural extensions of 3D Zernike moments and spherical harmonics; they offer better feature
localization and rotation invariance, since spherical harmonic analysis has singularities at
each pole of the sphere.
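A minimal sketch of the spherical extent function underlying these spatial-map descriptors, under stated assumptions: the object is given as a point sample, the origin is placed at the centroid, and the sphere is discretized into azimuth-elevation bins, each storing the farthest sample in that direction. Representing the resulting spherical function with spherical harmonics or moments, as Saupe et al. do, would be a further step not shown here.

```python
import numpy as np

def spherical_extent(points, n_az=16, n_el=8):
    """Approximate spherical extent function: for each azimuth-elevation
    direction bin, the maximal distance from the centroid among the
    surface samples whose direction falls in that bin."""
    centered = points - points.mean(axis=0)          # origin at centroid
    r = np.linalg.norm(centered, axis=1)
    az = np.arctan2(centered[:, 1], centered[:, 0])  # in [-pi, pi]
    el = np.arcsin(np.clip(centered[:, 2] / np.maximum(r, 1e-12), -1, 1))
    ai = np.clip(((az + np.pi) / (2 * np.pi) * n_az).astype(int), 0, n_az - 1)
    ei = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    ext = np.zeros((n_az, n_el))
    np.maximum.at(ext, (ai, ei), r)                  # per-bin max radius
    return ext
```

For points sampled on a unit sphere, every occupied bin of the returned map is close to 1, as expected for a shape whose extent is uniform in all directions.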
Wavelets are basis functions that represent a given signal at multiple resolutions. Laga
investigated both second generation wavelets, including linear and butterfly spherical wavelets
with a lifting scheme, and image wavelets with spherical boundary extension rules for constructing the shape descriptor [73, 68]. He proposed three descriptors based on the spherical wavelets: using the coefficients as feature vectors, using the L1 energy of the coefficients,
and using the L2 energy of the coefficients. Zhenbao et al. [51] compared their multireso-
lution wavelet analysis to the spherical wavelet descriptor and showed that their descriptor
performed slightly b etter. Their method characterized the shape orientation of the object
by setting six view planes and sampled the shape orientation from each of the view planes.
They then performed multiresolution wavelet analysis on each of the view planes and used
the wavelet coefficients for each of the view planes as the feature vector. Assfalg et al. [5]
captured the shape of a 3D object using the curvature map of the object's surface. One of
the methods developed in this thesis is closely related to this approach; however, it differs
in that it does not use the curvature information directly. Lastly, Tangelder et al. [80] developed a 3D spatial map by dividing the 3D object into a 3D grid with cells of equal sizes
and measuring the curvature property in each cell.
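A minimal sketch of such a grid-based spatial map (not Tangelder et al.'s implementation; the grid size and the averaged property are placeholder assumptions) could average a per-point curvature value over the cells of a bounding-box grid:

```python
import numpy as np

def grid_spatial_map(points, values, n=4):
    """Average a per-point property (e.g. curvature) over the cells of
    an n x n x n grid spanning the point cloud's bounding box."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    idx = ((points - lo) / np.maximum(hi - lo, 1e-12) * n).astype(int)
    idx = np.minimum(idx, n - 1)                  # points on the max face
    flat = np.ravel_multi_index(tuple(idx.T), (n, n, n))
    sums = np.bincount(flat, weights=values, minlength=n**3)
    counts = np.bincount(flat, minlength=n**3)
    means = sums / np.maximum(counts, 1)          # empty cells stay 0
    return means.reshape(n, n, n)

pts = np.array([[0.0, 0, 0], [1, 1, 1], [1, 1, 1]])
vals = np.array([2.0, 4.0, 6.0])
m = grid_spatial_map(pts, vals, n=2)
```

The flattened cell means form a fixed-length spatial descriptor regardless of mesh resolution.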
Recent research is beginning to focus more on the local approach to representing 3D
objects, as this approach has a stronger discriminative power when differentiating objects
that are similar in overall shape [63]. Local features are often points that are considered to be interesting or salient on the 3D object. These points are computed in various ways.
Some methods randomly select points on the surface of the object. Frome et al. [25], who
developed a 3D shape context, and Johnson et al. [37], who designed spin image descriptors,
both randomly selected points as their basis points. Shilane et al. [75, 76] used random
points with harmonic shape descriptors at four different scales. Most other methods use
the local geometric properties of the 3D object such as curvature or normals to describe the
points on the surface of the object, and define the level difference extrema as the salient
points. Lee et al. [46] used mean curvature properties with the center-surround mechanism
to identify the extrema as final salient points. A similar method was adopted by Li et
al. [47, 48], who found reliable salient points by considering a set of extrema for a scale-
space representation of a point-based input surface and used the locations of level difference
extrema as the salient feature points. Unnikrishnan et al. [83] presented a multi-scale
interest region detector that captures variation in shape at a point relative to the size of its
neighborhood. Their method used the extrema of the mean curvature to identify the
salient points. Watanabe et al. [90] used salient extrema of the principal curvatures along
the curvature lines on the surface. Castellani et al. [15] proposed a new methodology for
detecting and matching salient points based on measuring how much a vertex is displaced
after filtering. The salient points are described using a local description based on a hidden
Markov model.
Ohbuchi et al. [60] rendered multiple views of a 3D model and extracted local features
from each view using the SIFT algorithm. The local features were then integrated into a
histogram using a bag-of-features approach to retrieval. Novatnack et al. [57, 58] extracted
corners and edges of a 3D model by first parameterizing the surface of a 3D mesh model on
a 2D map and constructing a dense surface normal map. They then constructed a discrete
scale-space by convolving the normal map with Gaussian kernels of increasing standard
deviation. The corners and edges detected at individual scales were combined into a unified
representation for the 3D ob ject. Akagunduz et al. [2] used a Gaussian pyramid at several
scales to extract the surface extrema and represented the points and their relationships by
a graphical model. Taati et al. [79] generated a local shape descriptor based on invariant
properties extracted from the principal component space of the local neighborhood around a point. The salient points were selected based on ratios of basic dispersion properties. Other
examples of local descriptors include spin images [37, 4], point signature [17], and symbolic
signatures [70]. Some efforts have also been made in combining both the local and global
properties of the object. Alosaimi et al. [3] combined the information in 2D histograms and
concatenated the PCA coefficients of the histograms to form a single feature vector.
Liu et al. [50, 49] represented a global 3D shape as the spatial configuration of a set of local
features. The spatial configuration was represented by computing the distributions of the
Euclidean distances between pairs of local shape clusters, represented by spin images.
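The distance-distribution idea can be sketched in simplified form (with a hypothetical bin count, and without the spin-image clustering step Liu et al. describe) as a normalized histogram of pairwise distances between local-feature locations:

```python
import numpy as np

def distance_distribution(centers, n_bins=8):
    """Normalized histogram of pairwise Euclidean distances between
    local-feature locations (e.g. cluster centers)."""
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(centers), k=1)       # count each pair once
    pd = d[iu]
    hist, _ = np.histogram(pd, bins=n_bins, range=(0.0, pd.max()))
    return hist / hist.sum()

# Four corners of a unit square: four side pairs (length 1),
# two diagonal pairs (length sqrt(2)).
square = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
h = distance_distribution(square)
```

Normalizing by the number of pairs makes descriptors of objects with different feature counts comparable.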
2.1.2 Graph-based methods
While feature-based methods use only the geometric properties of the 3D model to define
the shape of the object, graph-based methods use the topological information of the 3D
object to describe its shape. The graph that is constructed shows how the different shape
components are linked together. The graph representations include model graphs, Reeb
graphs, and skeleton graphs. These methods are known to be computationally expensive
and sensitive to small topological changes. Sundar et al. [78] used the skeletal graph as a
shape descriptor to encode both geometric and topological properties of the 3D object. The
similarity measures between two objects were approximated using a greedy algorithm for
bipartite graph matching. Hilaga et al. [30] introduced the use of Reeb graphs for matching
the shapes of articulated models.
2.1.3 View-based methods
The most effective view-based shape descriptor is the LightField descriptor developed by
Chen et al. [16]. A light field around a 3D object is a 4D function that represents the
radiance at a given 3D point in a given direction. Each 4D light field of a 3D object is
represented as a collection of 2D images rendered from a 2D array of cameras distributed
uniformly on a sphere. Their method places the light field cameras on 20 vertices of a
regular dodecahedron and uses orthogonal projection to capture 10 different silhouettes of
the 3D model. Ten different rotations are performed to capture a set of light field descriptors to improve robustness to rotation. The 100 rendered images are then described using
Zernike moments and Fourier descriptors to describe the region shape and contour shape,
respectively, of the 3D model. The retrieval of the 3D models is performed in stages where
objects that are greatly dissimilar to the query model are rejected early in the process. This
is done by comparing only a subset of the light field descriptors of the query and of the
database objects in the first few stages of the retrieval process. The light field descriptor was
evaluated to be one of the best performing descriptors in the SHREC competition. Ohbuchi
et al. [60] used a similar view-based approach to the light field descriptor. However, their
method extracted local features from each rendered image using the SIFT algorithm. Wang
et al. [89] improved the space usage efficiency of the LFD descriptor by projecting a number
of uniformly sampled random points along six directions to create six images that are then
described using Zernike moments. They also used a two-stage retrieval method to speed
up the retrieval process. Experimental results on the Princeton shape benchmark database
showed that their method's performance was comparable to the LFD descriptor for some
categories. Vajramushti et al. [84] employed a combination of a view-based depth-buffer
technique and a feature-based volume descriptor for 3D matching. Their method used the
voxel volume of the objects to reduce the search space for the depth-buffer comparisons.
Vranic [87] evaluated a composite descriptor called DESIRE that was formed using depth-
buffer images, silhouettes and ray extents of a 3D object. His results showed that DESIRE
outperformed LFD in retrieving objects of some categories.
It is important to note that most of these existing 3D object descriptors were developed
and tested to describe general 3D objects with high shape variability, and not medical
datasets, which usually have small shape variations. As shown in the analysis section, they
usually do not perform very well in describing medical datasets. This thesis proposes a
feature-based approach that uses a learning methodology to identify the interesting salient
points on the object, discussed in Chapter 5, and creates a global spatial map of the salient
points patterns, described in Chapter 6. The proposed descriptor is tested and shown to
work well for general 3D objects and to outperform other methods on craniofacial medical data.
Figure 2.1: Anthropometric landmarks on a patient's head. These images were published by Kelly et al. [40].
shape [score 0], 2) mild shape deformation [score 1], 3) moderate shape severity [score
2], and 4) severe shape deformation [score 3]. When assigning the severity score for a
patient, a clinical expert matches the patient's skull shape to the most similar template
and assigns the score corresponding to that template. This technique is currently used
by practitioners using the Dynamic Orthotic Cranioplasty Band (DOC Band) helmet as a
treatment method [33, 34].
Instead of taking physical measurements directly on a patient's head, some techniques
take the measurements from photographs of the patient's head. This approach is less intrusive for young patients, but it is still time consuming and can be inconsistent, as technicians must manually place landmarks on the photographs. Hutchison et al. [31, 32] developed
a technique called HeadsUp that involves taking a top-view digital photograph of an infant's
head fitted with an elastic head circumference band. The elastic band is equipped with
adjustable color markers to identify landmarks such as ear and nose positions. The result-
ing photograph is then automatically analyzed to obtain quantitative measurements for the
head shape, including cephalic index, head circumference, distance of ear to center of nose,
oblique length and ratio. Their results showed that the cephalic index (CI) and Oblique
Cranial Length Ratio (OCLR) can be used as quantitative measures of shape severity, as the values differ significantly between cases and controls. Although promising, the
Hutchison method requires subjective decisions regarding the placement of the midline and
ear landmarks and the selection of the posterior point of the OCLR lines. In addition, as
the measurements are done in two dimensions, displacement of head volumes cannot really
be assessed. Finally, placing the band on an infant can be quite challenging.
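The two indices can be sketched as simple ratios (standard definitions; the exact landmark placement for the oblique diagonals follows the cited protocols and is not reproduced here):

```python
def cephalic_index(width_mm, length_mm):
    """Cephalic index: maximal head width as a percentage of maximal
    head length."""
    return 100.0 * width_mm / length_mm

def oclr(diag_a_mm, diag_b_mm):
    """Oblique Cranial Length Ratio: the longer oblique cranial
    diagonal over the shorter one, as a percentage; 100 indicates
    symmetric diagonals."""
    longer, shorter = max(diag_a_mm, diag_b_mm), min(diag_a_mm, diag_b_mm)
    return 100.0 * longer / shorter
```

Both indices are dimensionless, which is what makes them comparable across patients of different head sizes.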
Zonenshayn et al. [98] also employed a headband with two adjustable points (at the nasion
and inion of the head) and used photographs of the headband shape to calculate the Cranial Index of Symmetry (CIS). These methods require consistency in setting up the band and
placing the markers, which may lead to non-reproducible results. In addition, these are 2D
techniques, but plagiocephaly and brachycephaly are three-dimensional deformations.
Vlimmeren et al. [85] introduced a new method called plagiocephalometry to assess the
asymmetry of the skull. The method uses a ring of thermoplastic material molded to the outline of
a patient's skull. The ring is positioned around the head at the widest transverse circumference. Three landmarks for the ears and nose are marked on the ring. The ring is then
copied onto paper and a transparent sheet to keep track of follow-up progress.
Measurement techniques that use full 3D head shape information can provide more
detailed and accurate shape information. Plank et al. [67] used a noninvasive laser shape
digitizer to obtain the 3D surface of the head. This system provides more accurate shape
information, but still requires the use of markers to define an anatomical reference plane for
further quadrant placement and volume calculations. Lanche et al. [45, 61] used a stereo-
camera system to obtain a 3D model of the head and developed a statistical model of the
asymmetry to quantify and localize the asymmetry of each patient's head. The model was obtained by first computing the asymmetry of a patient's head by deforming a symmetric
ideal head template to the patient's head to obtain point correspondences between the left
and right sides of the head. Principal Component Analysis was then performed on the
vectors of the asymmetry values of all patients' heads to obtain a statistical model.
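The statistical-model step can be sketched with a plain SVD-based PCA over per-patient asymmetry vectors (a generic sketch, not Lanche et al.'s exact pipeline; the component count is illustrative):

```python
import numpy as np

def pca_model(X, n_components=2):
    """PCA via SVD. Rows of X are per-patient asymmetry vectors;
    returns the mean vector, the principal directions, and each
    patient's coefficients (scores) in the reduced space."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    return mu, components, scores

# Rank-1 toy data: all variation lies along the direction [3, 4, 0].
X = np.outer(np.arange(5.0) - 2.0, [3.0, 4.0, 0.0])
mu, comps, scores = pca_model(X, n_components=1)
```

The scores give a compact per-patient summary of asymmetry that can then feed a classifier or severity scale.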
2.2.2 22q11.2 Deletion Syndrome
Similar to the assessment of deformational plagiocephaly, the assessment of 22q11.2 deletion
syndrome has commonly been through physical examination combined with craniofacial
anthropometric measurements. There have been very few automated methods for analyzing
22q11.2DS. Boehringer et al. [11] used Gabor wavelets to transform 2D photographs of
individuals with 10 different facial dysmorphic syndromes. Their method then applied
principal component analysis to describe and classify the dataset. Their method required
landmark placement on the face.
Hammond et al. [29] used the Dense Surface Model method. Landmarks were manually placed on each 3D surface mesh and used to align the faces to a mean face. Principal
component analysis was then used to describe the datasets, and the coefficients were used
to classify the dataset. Neither of these two methods is fully automatic, as they require manual
landmark placement.
One of the methods proposed in this thesis to represent craniofacial dysmorphologies uses
3D surface mesh models of heads without the need for markers or templates. The method
uses the surface normal vectors of all the 3D points on the head and constructs a global 2D
histogram of the azimuth-elevation angles of the surface normal vectors of the 3D points
on the face. The proposed method is general enough to characterize different craniofacial
disorders, including deformational plagiocephaly and its variations and 22q11.2DS and its
different manifestations.
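A minimal sketch of that global signature (the bin counts here are illustrative; the thesis's exact binning may differ):

```python
import numpy as np

def azimuth_elevation_histogram(normals, n_az=8, n_el=8):
    """Normalized 2D histogram of the azimuth-elevation angles of
    unit surface normal vectors."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    az = np.arctan2(n[:, 1], n[:, 0])                 # azimuth, [-pi, pi]
    el = np.arcsin(np.clip(n[:, 2], -1.0, 1.0))       # elevation
    hist, _, _ = np.histogram2d(
        az, el, bins=(n_az, n_el),
        range=((-np.pi, np.pi), (-np.pi / 2, np.pi / 2)))
    return hist / hist.sum()

# All normals pointing straight up land in a single bin.
up = np.tile(np.array([0.0, 0.0, 1.0]), (10, 1))
h = azimuth_elevation_histogram(up)
```

Because the histogram is computed over all surface normals, it needs no landmarks, which is the point of the marker-free design described above.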
Chapter 3
DATASETS
This chapter will describe the four datasets that were obtained to develop and test
the different shape analysis methodologies developed for this thesis. Each dataset has
different characteristics that help explore the different properties of the methodologies. The
22q11.2DS dataset, introduced in Section 3.1, contains 3D face models of individuals affected and unaffected by 22q11.2 deletion syndrome. The Deformational Plagiocephaly dataset,
discussed in Section 3.2, contains 3D head models of individuals affected and unaffected by
deformational plagiocephaly. The Heads dataset, discussed in Section 3.3, contains head
shapes of different classes of animals, including humans. These three datasets help explore
the performance of the methodology on data of similar overall shape with subtle distinctions
- the type of data for which the methodology was designed and developed. Section 3.4
introduces the SHREC 2008 classification benchmark dataset, which was obtained to further
test the performance of the methodology on general 3D object classification, where objects
in the dataset are not very similar.
3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset
The 3D face models in this dataset were collected at the Craniofacial Center of Seattle
Children's Hospital using the 3dMD imaging system [1]. The 3dMD imaging system uses
four camera stands, each containing three cameras. Stereo analysis yields twelve range maps
that are combined using 3dMD proprietary software to yield a 3D mesh of an individual's
head and a texture map of the face. The methodologies developed for this thesis use only
the 3D meshes, due to human subject regulations.
An automated system developed by Wilamowska [92, 74] to align the pose of each mesh
was employed. The alignment system uses symmetry to align the yaw and roll angles and
a height differential to align the pitch angle. Although faces are not truly symmetrical,
Figure 3.1: Example of 3D face mesh data of children with 22q11.2 deletion syndrome.
the pose alignment procedure can be cast as finding the angular rotations of yaw and roll
that minimize the difference between the left and right sides of the face. The pitch of the
head was aligned by minimizing the difference between the height of the chin and the height
of the forehead. In some cases, manual adjustments were necessary to pose normalize the
faces. Figure 3.1 shows two examples of affected individuals in the dataset.
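The symmetry idea can be illustrated with a brute-force sketch (assuming z is the vertical axis and using a nearest-neighbor mirror distance as the cost; the actual system's cost function and search strategy are not specified here):

```python
import numpy as np

def symmetry_yaw(points, angles_deg=range(-30, 31, 2)):
    """Pick the yaw rotation (about the z axis) that minimizes the
    distance between the point cloud and its x-mirrored copy."""
    p = points - points.mean(axis=0)
    best, best_cost = 0.0, np.inf
    for a in angles_deg:
        t = np.radians(a)
        R = np.array([[np.cos(t), -np.sin(t), 0.0],
                      [np.sin(t),  np.cos(t), 0.0],
                      [0.0, 0.0, 1.0]])
        q = p @ R.T
        m = q * np.array([-1.0, 1.0, 1.0])     # mirror across x = 0
        d = np.linalg.norm(q[:, None, :] - m[None, :, :], axis=-1)
        cost = d.min(axis=1).mean() + d.min(axis=0).mean()
        if cost < best_cost:
            best, best_cost = float(a), cost
    return best

# A mirror-symmetric shape rotated by 10 degrees is realigned by -10.
sym = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0, 2.0, 0], [0, 0, 1.0]])
t = np.radians(10.0)
R = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1.0]])
rotated = sym @ R.T
```

A real system would use a finer angle search and a mesh-aware distance, but the structure is the same: score candidate rotations by left-right difference and keep the minimizer.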
The dataset contained 3D meshes for 189 individuals. Metadata for each 3D mesh
consisted of the age, gender, and self-described ethnicity of the individual plus a label of
affected or unaffected. The dataset consisted of 53 affected individuals and 136 control individuals. The ground truth for each individual's 22q11.2DS label was determined through
laboratory confirmation.
A balanced dataset was created from the original dataset. The balanced dataset con-
sisted of 86 individuals: 43 affected and 43 unaffected with 22q11.2 deletion syndrome. Each
of the 86 individuals was assessed by three craniofacial experts. Frontal and profile images
of the individuals were de-identified and viewed in random order to blind the raters. The ex-
perts assigned discrete scores to a total of 18 facial features that are known to characterize
22q11.2DS (score 0 = none, 1 = moderate, 2 = severe). Nine of the facial features (midface
hypoplasia, prominent nasal root, bulbous nasal tip, small nasal alae, tubular nose, small
mouth, open mouth, downturned mouth, and retrusive chin) are further analyzed in Chap-
ter 8. The experts' survey showed that all features of the nose were found to have a higher
percentage of moderate and severe expression in 22q11.2DS affected individuals. Midface
hypoplasia was observed to be moderately present in affected individuals [91].
Figure 3.2: Tops of heads of children with deformational plagiocephaly.
3.2 Deformational Plagiocephaly Dataset
The dataset for analyzing the shape dysmorphology due to deformational plagiocephaly was
obtained through a similar data acquisition pipeline as the 22q11.2DS dataset. The resulting
3D meshes are also automatically pose-normalized using the same alignment system used
to normalize the 22q11.2DS dataset [92, 74]. Figure 3.2 shows two examples of individuals
diagnosed with deformational plagiocephaly.
The original dataset consisted of 254 3D head meshes, comprising 100 controls and
154 cases. Each mesh in the original dataset was assessed by two craniofacial experts who
assigned discrete severity scores based on the degree of the deformation severity of different
head areas including back of the head, forehead asymmetry, ear asymmetry, and whether
the flattening at the back of the head was symmetric (case of brachycephaly). In addition,
each expert also noted an overall severity score. The discrete scores were either category 0
for normal, 1 for mild, 2 for moderate and 3 for severe. The laterality of the flatness was
indicated using negative scores to represent left sided deformation and positive scores to
represent right sided deformation.
The work in this thesis focuses on the flattening at the back of the head, known as
posterior plagiocephaly. Since no gold standard exists for assessing the
severity of posterior plagiocephaly, the experts' ratings were considered the gold standard
in evaluating the different severity scores developed. The inter-rater agreement between
the two experts was only 65%. As a result, participants were excluded if (1) the two
experts assigned discrepant posterior flattening scores, or (2) the classification based on
expert ratings differed from the clinical classification (case or control) assigned at the time of enrollment. The final dataset used to investigate posterior plagiocephaly consisted of 140
infants including 50 controls (by definition in category 0 by expert rating) and 90 cases: 46
in category 1 or -1, 35 in category 2 or -2, and 9 in category 3 or -3.
3.3 Heads Dataset
For the Heads dataset, the digitized 3D objects were obtained by scanning hand-made clay
toys using a Roland LPX-250 laser scanner with a maximal scanning resolution of 0.008
inches for plane scanning mode [70]. Raw data from the scanner consisted of 3D point clouds
that were further processed to obtain smooth and uniformly sampled triangular meshes of
0.9-1mm resolution. To increase the number of objects for training and testing, new objects
were created by deforming the original scanned 3D models in a controlled fashion using 3D
Studio Max software [8]. Global deformations of the models were generated using morphing
operators such as tapering, twisting, bending, stretching and squeezing. The parameters for
each of the operators were randomly chosen from ranges that were determined empirically.
Each deformed model was obtained by applying at least five different morphing operators in a random sequence.
Fifteen objects representing seven different classes were scanned. The seven classes are:
cat head, dog head, human head, rabbit head, horse head, tiger head and bear head. Each
of the fifteen original objects was randomly morphed to increase the size of the dataset.
A total of 250 morphed models per original object were generated. Points on the morphed
models are in full correspondence with the original models from which they were constructed.
Figure 3.3 shows examples of objects from each of the seven classes, while Figure 3.4 shows
examples of morphs from the horse class.
3.4 SHREC Dataset
The SHREC dataset was selected from the SHREC 2008 Competition Classification of
Watertight Models track [27]. The models in the track were chosen by the organizer to
ensure a high level of shape variability to make the track more challenging. The models
cat dog human rabbit horse tiger bear
Figure 3.3: Example of objects in the Heads dataset.
Figure 3.4: Example morphs from the horse class. Morphs were generated by stretching,twisting, or squeezing the original object with different parameters.
in the dataset were manually classified using three different levels of categorization. At
the coarse level of classification, the objects were classified according to both their shapes
and semantic criteria. At the intermediate level, the classes were subdivided according to
functionality and shape. At the fine level, the classes were further partitioned based on the
object shape. For example, at the coarse level some objects were classified into the furniture
class. At the intermediate level, these same objects were further divided into tables, seats,
and beds. At the fine level, the objects were classified into chairs, armchairs, stools, sofas, and
benches. The intermediate level of classification was chosen for the experiments as the fine
level had too few objects per class, while the coarse level had too many objects that were
dissimilar in shape grouped into the same class. The dataset consists of 425 pre-classified
objects. Figure 3.5 shows examples of objects in the benchmark dataset.
The four datasets were used to test the classification and retrieval methodologies de-
veloped in this thesis. The domain-independent base framework of the methodologies is
described next in Chapter 4.
human animal knots airplane bottle chess teapot
Figure 3.5: Example of objects in the SHREC 2008 Classification dataset. It can be seenthat the intra-class variability in this dataset is quite high as objects in the same class havequite different shapes.
Chapter 4
BASE FRAMEWORK
The methodologies developed in this thesis are used for single 3D object classification.
They do not handle objects in cluttered 3D scenes or occlusion. A surface mesh, which
represents a 3D object, consists of points {pi} on the object's surface and information
regarding the connectivity of the points. The base framework of the methodology starts by rescaling the objects to fit in a fixed-size bounding box. The framework then executes
two phases: low-level feature extraction (Section 4.1) and mid-level feature aggregation
(Section 4.2). The low-level feature extraction starts by applying a low-level operator to
every point on the surface mesh. After the first phase, every point pi on the surface mesh
will have either a single low-level feature value or a small set of low-level feature values,
depending on the operator used. The second phase performs mid-level feature aggregation
and computes a vector of values for a given neighborhood of every point pi on the surface
mesh. The feature aggregation results of the base framework are then used to construct the
different 3D object representations [7, 6].
4.1 Low-level Feature Extraction
The low-level operators extract local properties of the surface points by computing a feature
value vi for every point pi on the mesh surface. All low-level feature values are convolved
with a Gaussian filter to reduce noise effects. Three low-level operators were implemented
to test the methodology's performance: absolute Gaussian curvature, Besl-Jain curvature
categorization, and azimuth-elevation of surface normal vectors. Figure 4.1(a) shows an
example of the absolute Gaussian curvature values of a 3D model. Figure 4.1(b) shows
the results of applying a Gaussian filter over the low-level Gaussian curvature values, while
Figure 4.1(c) shows the results of applying the Gaussian filter over the low-level Besl-Jain
curvature values.
Figure 4.1: (a) Absolute Gaussian curvature low-level feature value, (b) Smoothed AbsoluteGaussian curvature values after convolution with the Gaussian filter, (c) Smoothed Besl-Jain curvature values after convolution. Higher values are represented by cool (blue) colors,while lower values are represented by warm (red) colors.
4.1.1 Absolute Gaussian Curvature
The absolute Gaussian curvature low-level operator computes a Gaussian curvature estimate K for every point p on the surface mesh:

K(p) = 2π − Σ_{f ∈ F(p)} interior angle_f(p)
where F is the list of all the neighboring facets of point p and the interior angle is the
angle of the facets meeting at point p. This calculation is similar to calculating the angle
deficiency at point p. The contribution of each facet is weighted by the area of the facet
divided by the number of points that form the facet. The operator then takes the absolute
value of the Gaussian curvature as the final low-level feature value for each point.
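A small sketch of the angle-deficit computation at a single vertex (omitting the area weighting described above; the incident triangles are passed explicitly, each as its two other vertices, for illustration):

```python
import numpy as np

def angle_deficit(p, triangles):
    """Discrete Gaussian curvature at vertex p as the angle deficit:
    2*pi minus the sum of the interior angles at p over the incident
    triangles (each given as its two other vertices)."""
    total = 0.0
    for a, b in triangles:
        u, v = np.asarray(a) - p, np.asarray(b) - p
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        total += np.arccos(np.clip(c, -1.0, 1.0))
    return 2 * np.pi - total

# A flat fan of six triangles around the origin has zero angle deficit.
p = np.zeros(3)
ring = [np.array([np.cos(t), np.sin(t), 0.0])
        for t in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
tris = [(ring[i], ring[(i + 1) % 6]) for i in range(6)]
k = angle_deficit(p, tris)
```

Taking the absolute value of this deficit, as the operator does, makes peaks and saddles contribute equally to the low-level feature.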
4.1.2 Besl-Jain Curvature
Besl and Jain [10] suggested surface characterization of a point p using only the signs of the
mean curvature H and Gaussian curvature K. These surface characterizations result in a
scalar surface feature for each point that is invariant to rotation, translation and changes
in parametrization. The eight different categories are: (1) peak surface, (2) ridge surface,
(3) saddle ridge surface, (4) plane surface, (5) minimal surface, (6) saddle valley, (7) valley
surface, and (8) cupped surface. Table 4.1 lists the different surface categories with their
respective curvature signs.
Table 4.1: Besl-Jain surface characterization.

Label   Category                H       K
1       Peak surface            H < 0   K > 0
2       Ridge surface           H < 0   K = 0
3       Saddle ridge surface    H < 0   K < 0
4       Plane surface           H = 0   K = 0
5       Minimal surface         H = 0   K < 0
6       Saddle valley surface   H > 0   K < 0
7       Valley surface          H > 0   K = 0
8       Cupped surface          H > 0   K > 0
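Using the standard Besl-Jain sign conventions, the categorization in Table 4.1 reduces to a lookup on the signs of H and K (a sketch; `eps` is an illustrative zero tolerance):

```python
def besl_jain_label(H, K, eps=1e-8):
    """Map mean curvature H and Gaussian curvature K to the eight
    Besl-Jain surface types. Returns None for the sign combination
    (H = 0, K > 0), which cannot occur since K <= H^2."""
    sh = 0 if abs(H) < eps else (1 if H > 0 else -1)
    sk = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {(-1,  1): 1,   # peak
             (-1,  0): 2,   # ridge
             (-1, -1): 3,   # saddle ridge
             ( 0,  0): 4,   # plane
             ( 0, -1): 5,   # minimal
             ( 1, -1): 6,   # saddle valley
             ( 1,  0): 7,   # valley
             ( 1,  1): 8}   # cupped (pit)
    return table.get((sh, sk))
```

The resulting label is a scalar per-point feature, invariant to rotation, translation, and reparametrization, as stated above.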
Figure 4.2: Azimuth and elevation angle of a 3D surface normal vector.
Figure 4.3: (a) 1D histogram aggregating the absolute Gaussian curvature values from points on the nose of a human head, (b) 2D histogram aggregating the azimuth-elevation vector values at a point on the back of the head.
object size, and that the results are comparable across different objects. The value of c was
determined empirically; for most experiments a value of c = 0.05 was used. Aggregating
the single-valued low-level feature values results in a 1D histogram with d histogram bins
for every point on the surface mesh. Aggregating the pair-valued low-level feature values
(such as the azimuth-elevation angle feature values) results in a 2D histogram constructed
of a × b bins, where a and b are the two different dimension sizes. Figure 4.3(a) shows an
example of a 1D histogram aggregating the absolute Gaussian curvature low-level feature
values from points on the nose of a 3D head object. Figure 4.3(b) shows an example of the
2D histogram aggregating the azimuth-elevation low-level feature values on a head.
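The aggregation step can be sketched for a point cloud as follows (brute-force neighborhoods; a real implementation would use the mesh connectivity or a spatial index, and the value range and bin count are placeholders):

```python
import numpy as np

def neighborhood_histograms(points, values, radius, n_bins=8,
                            v_range=(0.0, 1.0)):
    """For every point, a 1D histogram of the low-level feature values
    of all points within `radius` (the point itself included)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    hists = np.empty((len(points), n_bins))
    for i in range(len(points)):
        nbr = values[d[i] <= radius]
        hists[i], _ = np.histogram(nbr, bins=n_bins, range=v_range)
    return hists

pts = np.array([[0.0, 0, 0], [0, 0, 0.1], [0, 0, 5.0]])
vals = np.array([0.0, 0.5, 0.9])
hs = neighborhood_histograms(pts, vals, radius=1.0)
```

In the thesis the neighborhood size is set relative to the object size through the constant c (c = 0.05 in most experiments), which keeps descriptors comparable across objects.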
Once the feature extraction and aggregation are completed, a learning phase is used
to learn the characteristics of salient points for classification and retrieval, as described in Chapter 5.
Chapter 5
LEARNING SALIENT POINTS
Given the base framework's ability to compute low-level feature values at each point of
a 3D mesh and to aggregate these features in neighborhoods about the point, this chap-
ter explores the use of this framework to create a representation for 3D objects. Before
constructing the 3D object signature, salient or interesting points are identified on the 3D object, and the characteristics of these points are used when constructing the signatures.
The identified salient points are application dependent. The framework and methodology
were developed to be specifically applicable to the classification of craniofacial disorders, such
as 22q11.2 deletion syndrome, discussed in Section 5.1, and deformational plagiocephaly,
described in Section 5.2, but also appropriate for general use in 3D shape classification, as
shown in Section 5.3.
Preliminary saliency detection results using existing methods [46, 38] were not satisfactory. In
some cases they were not consistent and repeatable for objects within the same class. As
a result, to find salient points on a 3D object, a learning approach was selected. A salient
point classifier is trained on a set of marked training points on the 3D objects provided by
experts for a particular application. Histograms of low-level features of the training points
obtained using the base framework (Chapter 4) are then used to train the classifier. For
a particular application, the classifier will learn the characteristics of the salient points on
the surfaces of the 3D objects from that domain. Sets of detected points will lead to salient
regions in the signatures.
5.1 Learning Salient Points for 22q11.2 Deletion Syndrome
Traditionally, studies of individuals with craniofacial disorders such as 22q11.2 deletion syn-
drome have been performed through in-person clinical observation coupled with craniofacial
anthropometric measurements derived from anatomic landmarks [24]. These landmarks are
located either visually by clinicians or through palpation of the skull. Figure 5.1 shows the
landmark points that are commonly used for craniofacial measurements.

Figure 5.1: Craniofacial anthropometric landmarks.
The salient point classifier was trained on a subset of the craniofacial anthropometric
landmarks marked on 3D head objects. This was done so that these craniofacial landmarks
would be included in the set of interesting or salient points for classification of the craniofacial disorders. The particular subset of landmarks was selected to be well-defined points
that both experts and non-experts could easily identify. The training set consisted of human
heads selected from the Heads database. Figure 5.2 shows an example of manually marked
salient points on the training data. Histograms of low-level features obtained using the base
framework were used to train a Support Vector Machine (SVM) [72, 86] classifier to learn
the salient points on the 3D surface mesh. WEKA's implementation of SVM was used for
all experiments [93]. A training set consisting of 75 morphs of 5 human heads was used to
train the classifier to learn the characteristics of the salient points for faces in terms of the
histograms of their low-level features.
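As a rough sketch of this training setup (scikit-learn's SVC is substituted here for WEKA's SVM implementation, and the feature histograms are random placeholders, not real training data):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 16))              # placeholder: 16-bin feature histograms
y = np.array([1] * 20 + [0] * 20)     # 1 = salient point, 0 = non-salient
clf = SVC(probability=True, random_state=0).fit(X, y)

# For any surface point, the classifier yields a label and a confidence score;
# the score is later thresholded to keep only high-confidence salient points.
label = int(clf.predict(X[:1])[0])
score = float(clf.predict_proba(X[:1])[0].max())
```

In practice the feature histograms come from the base framework of Chapter 4, one histogram per marked training point.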
Although the salient training points were selected only to be commonly used craniofacial landmark points, empirical studies determined that the classifier actually finds salient regions with a combination of high curvature and low entropy values. This result can be observed in the different histograms of salient and non-salient points in Figure 5.3. In the
figure, the salient point histograms have mainly low bin counts in the bins corresponding
to low curvature values and a high bin count in the last (highest) curvature bin. The
non-salient point histograms have mainly medium to high bin counts in the low curvature
Figure 5.2: Example of manually marked salient (blue color) and non-salient (red color) points on a human head model. The salient points include corners of the eyes, tip of the nose, corners of the nose, corners of the mouth, and chin.
bins and in some cases a high bin count in the last bin. The entropy of the salient point
histograms also tends to be lower than the entropy of the non-salient point histograms. The
classifier approach avoided the use of brittle thresholds.
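The entropy E listed under each histogram in Figure 5.3 can be computed as the Shannon entropy of the normalized bin counts; a minimal sketch (the bin counts below are illustrative, not taken from the thesis):

```python
import numpy as np

def histogram_entropy(counts):
    """Shannon entropy (base 2) of a bin-count histogram, as used to
    compare salient vs. non-salient point histograms."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())
```

A peaked, salient-like histogram yields a lower entropy than a flat, non-salient-like one: `histogram_entropy([8, 1, 1, 0]) < histogram_entropy([3, 3, 2, 2])`.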
Figure 5.4 shows results of the salient points predicted on two faces in the 22q11.2DS
database, which include not just the manually marked points but other points with the same
characteristics. The salient points are colored according to the assigned classifier confidence
score. Non-salient points are colored in red, while salient points are colored in different
shades of blue with dark blue having the highest prediction score.
5.2 Learning Salient Points for Deformational Plagiocephaly
A similar learning-based approach was used to find salient points for 3D heads with deformational plagiocephaly. The salient point classifier for deformational plagiocephaly was
trained on a set of points marked on the flat areas at the back of the head of individuals with
deformational plagiocephaly. The training salient points consisted of 10 marked points on
the flat areas of 10 heads with deformational plagiocephaly, while the non-salient training
points were selected from 10 heads without deformational plagiocephaly. Histograms of the
azimuth-elevation low-level features obtained using the base framework were used to train a
Support Vector Machine (SVM) classifier to learn the salient points on the 3D heads. After
[Figure panels: three salient point histograms (E = 0.348, E = 2.435, E = 2.79) and three non-salient point histograms (E = 3.95, E = 3.877, E = 4.185)]

Figure 5.3: Example histograms of salient and non-salient points. The salient point histograms have a high value in the last bin, illustrating high curvature in the region, and low values in the remaining bins. The non-salient point histograms have more varied values across the curvature bins. In addition, the entropy E of each salient point histogram (listed under each histogram) is lower than that of the non-salient point histograms.
Figure 5.4: Salient point prediction for two faces in the 22q11.2DS dataset. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
training was complete, the classifier was able to label each point on a 3D head as either
salient or non-salient and provide a confidence score for each decision. The same threshold, T = 0.95, was applied to the confidence scores for the salient points.
5.3 Learning Salient Points for General 3D Objects
The salient point classifier for general 3D object classification was trained on selected objects
from the Heads database using the craniofacial landmark points that were used in the
22q11.2DS application. A small training set consisting of 25 morphs of the cat head model,
25 morphs of the dog head model, and 50 morphs of human head models was used to train
the classifier to learn the characteristics of salient points for general 3D object classification.
Histograms of low-level features obtained using the base framework were used to train a
Support Vector Machine (SVM) classifier to learn the salient points on general 3D objects.
A threshold T = 0.95 was also applied to the confidence scores for the classifier salient
points. Figure 5.5 shows results of the salient points predicted on instances of the cat, dog, and human head classes in the Heads database, which include, as previously mentioned, not just the manually marked points, but other points with the same characteristics. The salient points are colored according to the assigned classifier confidence score. Non-salient points are colored in red, while salient points are colored in different shades of blue, with dark blue
having the highest prediction score. While the classifier was only trained on cat heads, dog
heads, and human heads, it does a good job of finding salient points on the other classes
of heads, and the 3D patterns produced are repeatable across objects of the same class.
Figure 5.6 shows the predicted salient points on new object classes that were not included
in the training phase.
The trained classifier was also tested on the SHREC 2008 Classification dataset. Experimental results show that the labeled salient points were quite satisfactory. Figure 5.7 shows the
salient points predicted on a number of objects from the SHREC 2008 database. Note that
on this database, which has a lot of intra-class shape variance, the salient point patterns
are not consistent across all members of each class.
After learning and identifying the application-dependent salient points for the 3D objects in the dataset, the signature for each 3D object is constructed as described next in Chapter 6.
(a) (b) (c)
Figure 5.5: Salient point prediction for (a) cat head class, (b) dog head class, and (c) human head class. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.
(a) (b) (c)
Figure 5.6: Salient point prediction for (a) rabbit head class, (b) horse head class, and (c) leopard head class from the Heads database. Even though none of these three classes was included in the training, the trained model was able to predict salient points across the classes.
(a) (b) (c) (d)
Figure 5.7: Salient point prediction for (a) human class, (b) bird class, (c) human hand class, and (d) bottle class from the SHREC 2008 database. Note that for classes that have a lot of intra-class shape variance, the salient point patterns are not consistent across all members of those classes, as seen in column (a).
Chapter 6
2D LONGITUDE-LATITUDE SALIENT MAP SIGNATURE
Most 3D object analysis methods require the use of a 3D descriptor or signature to
describe the shape and properties of the 3D objects. This chapter describes the construc-
tion of the 3D object signature using the salient point patterns, obtained using the learning
approach described in Chapter 5, mapped onto a 2D plane via a longitude-latitude transfor-
mation, described in Section 6.1. Classification of 3D objects is then performed by training
a classifier using the 2D salient maps of the objects. Results of classification using the 2D
salient map signature are given in Section 6.2. Retrieval of 3D objects is performed by
calculating the distances between the salient signature of the query object and the salient
map signatures of all objects in the database. Results of retrieval using the 2D salient map
signature are given in Section 6.3. Section 6.4 investigates how the salient patterns are used
to obtain 2D salient views for 3D object retrieval.
6.1 Salient Point Pattern Projection
Before mapping the salient point patterns obtained in Chapter 5 onto the 2D plane, the
salient points are assigned a label according to the classifier confidence score assigned to
the point. The classifier confidence score range is then discretized into a number of bins.
For the experiments, at confidence level 0.95 and above, the confidence score range was
discretized into 5 bins. Each salient point on the 3D mesh is then assigned a label based on
the bin into which its confidence score falls.
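A minimal sketch of this discretization (the equal-width bin edges over [0.95, 1.0] are an assumption; the text states only the threshold and the number of bins):

```python
import numpy as np

def confidence_label(score, t=0.95, n_bins=5):
    """Map a classifier confidence score in [t, 1.0] to one of n_bins
    discrete salient-point labels (0 = lowest, n_bins - 1 = highest).
    Scores below the threshold t are treated as non-salient (label -1)."""
    if score < t:
        return -1
    edges = np.linspace(t, 1.0, n_bins + 1)
    # np.digitize returns 1..n_bins+1 for values in [t, 1.0]
    return min(int(np.digitize(score, edges)) - 1, n_bins - 1)
```

Each salient point on the 3D mesh then carries the label of the bin its confidence score falls into.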
To obtain the 2D longitude-latitude map signature for an object, the longitude and latitude positions of all the 3D points on the object's surface are calculated. Given any point p_i = (p_ix, p_iy, p_iz), the longitude position θ_i and latitude position φ_i of point p_i are calculated as follows:

θ_i = arctan(p_iz / p_ix)

φ_i = arctan(p_iy / √(p_ix² + p_iz²))
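The two longitude-latitude formulas above can be sketched in code as follows (arctan2 is used here to resolve the full angular range; the text writes the principal-value arctan, and the object is assumed centered at the origin):

```python
import numpy as np

def lon_lat(p):
    """Longitude and latitude of a 3D surface point p = (x, y, z)."""
    x, y, z = p
    lon = np.arctan2(z, x)                       # longitude in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))    # latitude in [-pi/2, pi/2]
    return lon, lat
```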
A 2D map of the longitude and latitude positions of all the points on the object's surface is created by discretizing the longitude and latitude values of the points into a fixed number of pixels. A pixel is labeled with the salient point label of the points that fall into that pixel. If more than one label is mapped to a pixel, the label with the highest count is used to label the pixel. Figure 6.1 shows the salient point patterns for the cat head, dog head, and human head models in the Heads database and their corresponding 2D map signatures. Figure 6.2 shows how different objects that belong to the same class will have similar 2D longitude-latitude signature maps.
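The pixel-labeling step above can be sketched as follows (the 64×32 map resolution is an assumed value, chosen only for illustration):

```python
import numpy as np
from collections import Counter

def salient_map(lon_lats, labels, width=64, height=32):
    """Rasterize per-point salient labels into a 2D longitude-latitude map.
    Pixels receiving several labels take the most frequent one; empty
    pixels stay -1 (non-salient background)."""
    votes = {}
    for (lon, lat), lab in zip(lon_lats, labels):
        col = min(int((lon + np.pi) / (2 * np.pi) * width), width - 1)
        row = min(int((lat + np.pi / 2) / np.pi * height), height - 1)
        votes.setdefault((row, col), []).append(lab)
    img = np.full((height, width), -1, dtype=int)
    for (row, col), labs in votes.items():
        img[row, col] = Counter(labs).most_common(1)[0][0]
    return img
```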
(a) (b) (c)
Figure 6.1: Salient point patterns on 3D objects of Figure 5.4 and their corresponding 2D longitude-latitude map signatures.
To reduce noise in the 2D longitude-latitude map signature, a wavelet transformation
was applied to the 2D map signatures. In the experiments, the 2D longitude-latitude map
signatures were treated as 2D images and decomposed using image-based Haar wavelet
[Figure rows: human head, rabbit head, horse head, wildcat head]

Figure 6.2: Objects that are similar and belong to the same class will have similar 2D longitude-latitude signature maps.
function. The wavelet function decomposes the 2D image into approximation and detail coefficients. The approximation and detail coefficients at the second level were collected and concatenated into a new feature vector with dimension d = 13134. This final feature vector became the descriptor for each object in the database and was used for classification and retrieval. For most experiments, the noise reduction step was not found to improve the classification and retrieval performance, except for the SHREC dataset (Section 6.2.4).
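A minimal numpy sketch of a two-level Haar decomposition and coefficient concatenation (un-normalized averaging filters; the exact wavelet normalization used in the thesis is not specified here, and the image sides are assumed even):

```python
import numpy as np

def haar2d_step(img):
    """One level of a 2D Haar decomposition: returns the approximation
    sub-band and the three detail sub-bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, lh, hl, hh

def haar_feature_vector(img, levels=2):
    """Concatenate the level-2 approximation and detail coefficients
    into one feature vector, as in the signature construction above."""
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll, lh, hl, hh = haar2d_step(ll)
    return np.concatenate([c.ravel() for c in (ll, lh, hl, hh)])
```

On a constant map all detail coefficients vanish, so only the approximation sub-band carries energy.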
6.2 Classification using 2D Map Signature
By creating a signature for each 3D object, it is now possible to perform classification of 3D objects in a given database. Several classification experiments were performed on each
of the acquired datasets described in Chapter 3.
6.2.1 Classification of 22q11.2DS Dataset
The goal of this experiment was to classify each individual in the dataset as either affected
or unaffected by 22q11.2DS and to measure the classification accuracy. The salient points
classifier was trained on a subset of the craniofacial anthropometric landmarks marked
on 3D human head models as explained in Chapter 5. Table 6.1 shows the classification
performance with two different classifiers: Adaboost and SVM. Evaluation was done using
the following measures: classification accuracy, precision and recall rates, F-measure, true
positive, and false positive rates. The classification accuracy for the higher scoring SVM
classifier is 86.7%, which is higher than that obtained from a study of three human experts
whose mean accuracy was 72.5% [92].
Table 6.1: Classification performance for 22q11.2DS.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.804 0.795 0.804 0.791 0.804 0.387
SVM 0.867 0.866 0.868 0.861 0.868 0.27
The classification accuracy of the map signature was compared to some of the state-of-
the-art and best performing 3D object descriptors in the literature. The following existing
descriptors were used for comparison: Light Field Descriptor (LFD) [16], ray-based spherical
harmonics (SPH) [39], shape distribution of distance between random points (D2) [62], and
absolute angle distance histogram (AAD) [59]. The Light Field Descriptor (LFD) is a view-
based descriptor that extracts features from 100 2D silhouette image views and measures
the distance between two 3D objects by finding the best correspondence between the set
of 2D views for the two objects. The Spherical Harmonics method calculates the maximal
extent of a shape across all rays from the origin and uses spherical harmonics to represent
the function. The shape function D2 represents 3D objects by calculating the global shape
distribution of distances between two random points, while the AAD method enhances
the D2 shape function by measuring not only the distance between two random points,
but also the mutual orientation of the surfaces on which the pair of points is located.
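For reference, the D2 shape function used in the comparison can be sketched as follows (uniform sampling of points on the surface is assumed already done; the pair count and bin count are illustrative choices, not values from [62]):

```python
import numpy as np

def d2_descriptor(points, n_pairs=10000, n_bins=64, rng=None):
    """Sketch of the D2 shape distribution: a normalized histogram of
    Euclidean distances between random pairs of surface points."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, dtype=float)
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, d.max() + 1e-12))
    return hist / hist.sum()
```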
6.2.2 Classification of Deformational Plagiocephaly Dataset
The goal of this experiment was to classify each individual as either control or case affected
by the plagiocephaly condition and to measure the classification accuracy. The salient
points for the map signature were obtained by using the salient flat point classifier as
explained in Chapter 5. The classification experiments were performed on the Deformational
Plagiocephaly Dataset introduced in Chapter 3.
Table 6.4 shows the classification accuracy of the method on the full 254 individual
dataset. The ground truth for the classification was the patient status originally assigned by the referring doctors: case or control. Table 6.5 shows the classification accuracy of the method on the trimmed 140-individual dataset on which the experts agreed. The Adaboost classifier obtains an 80.3% classification accuracy on the full dataset and an improved 87.9% accuracy
on the trimmed dataset.
Table 6.4: Classification performance for plagiocephaly using the full 254-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.803 0.805 0.803 0.804 0.803 0.208
SVM 0.787 0.787 0.787 0.787 0.787 0.233
Table 6.5: Classification performance for plagiocephaly using the trimmed 140-individual dataset.
Classifier Accuracy Prec Recall F-Measure TP Rate FP Rate
Adaboost 0.879 0.878 0.879 0.878 0.879 0.156
SVM 0.85 0.849 0.85 0.849 0.85 0.19
The classification accuracy of the methodology for this application was also compared
to existing state-of-the-art descriptors. Table 6.6 shows that the 2D salient map signature
achieves higher classification accuracy for deformational plagiocephaly than other existing
methods, including the LFD descriptor and others discussed in Chapter 2.
Table 6.6: Comparison of classification accuracy for plagiocephaly.
Dataset Salient 2D map LFD SPH D2 AAD
Full 254 dataset 0.803 0.72 0.673 0.650 0.685
Trimmed 140 dataset 0.879 0.714 0.743 0.779 0.721
Classification of this condition can be incorporated into epidemiologic research on the
prevalence and long-term outcome of deformational plagiocephaly, which may eventually
lead to improved clinical care for infants with deformational plagiocephaly.
6.2.3 Classification of Heads Dataset
The Heads database can be thought of as a first step toward testing the 2D salient map
signature on more general shapes still in the craniofacial category, but for multiple different
animals where face shapes can be quite different.
In the first set of experiments, all objects in the Heads database were pose-normalized by
rotating the heads to face the same orientation, as was the case for the medical craniofacial
datasets. Classification of the 3D objects in the database was performed by training an SVM
classifier on the salient point patterns of each class using the 2D longitude-latitude map
signature of the objects in the class. The classifier was trained using the signatures of 25 objects from each class for all seven classes in the database and tested with a separate test set consisting of 50 objects per class for each of the seven classes. The classifier achieved
100% classification accuracy in classifying all the pose-normalized objects in the database.
Since 3D objects may be encountered in the world at any orientation, rota