

    3D Shape Analysis for Quantification, Classification, and Retrieval

    Indriyati Atmosukarto

A dissertation submitted in partial fulfillment of the requirements for the degree of

    Doctor of Philosophy

    University of Washington

    2010

    Program Authorized to Offer Degree: Computer Science and Engineering


University of Washington Graduate School

    This is to certify that I have examined this copy of a doctoral dissertation by

    Indriyati Atmosukarto

and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final

    examining committee have been made.

    Chair of the Supervisory Committee:

    Linda G. Shapiro

    Reading Committee:

    Linda G. Shapiro

    James F. Brinkley III

    Maya Gupta

    Date:


In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.

    Signature

    Date


    University of Washington

    Abstract

    3D Shape Analysis for Quantification, Classification, and Retrieval

    Indriyati Atmosukarto

    Chair of the Supervisory Committee:

    Professor Linda G. Shapiro

    Computer Science and Engineering

Three-dimensional objects are now commonly used in a large number of applications including games, mechanical engineering, archaeology, culture, and even medicine. As a result, researchers have started to investigate the use of 3D shape descriptors that aim to encapsulate the important shape properties of the 3D objects. This thesis presents new 3D shape representation methodologies for quantification, classification, and retrieval tasks that are flexible enough to be used in general applications, yet detailed enough to be useful in medical craniofacial dysmorphology studies. The methodologies begin by computing low-level features at each point of the 3D mesh and aggregating the features into histograms over mesh neighborhoods. Two different methodologies are defined. The first methodology begins by learning the characteristics of salient point histograms for each particular application, and represents the points in a 2D spatial map based on a longitude-latitude transformation. The second methodology represents the 3D objects by using the global 2D histogram of the azimuth-elevation angles of the surface normals of the points on the 3D objects.

Four datasets, two craniofacial datasets and two general 3D object datasets, were obtained to develop and test the different shape analysis methods developed in this thesis. Each dataset has different shape characteristics that help explore the different properties of the methodologies. Experimental results on classifying the craniofacial datasets show that our methodologies achieve higher classification accuracy than medical experts and existing state-of-the-art 3D descriptors. Retrieval and classification results using the general 3D objects show that our methodologies are comparable to existing view-based and feature-based descriptors and outperform these descriptors in some cases. Our methodology can also be used to speed up the most powerful general 3D object descriptor to date.


TABLE OF CONTENTS

List of Figures
List of Tables

Chapter 1: Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Thesis Outline

Chapter 2: Related Literature
    2.1 3D Descriptors for General Objects
    2.2 Medical Craniofacial Assessment

Chapter 3: Datasets
    3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset
    3.2 Deformational Plagiocephaly Dataset
    3.3 Heads Dataset
    3.4 SHREC Dataset

Chapter 4: Base Framework
    4.1 Low-level Feature Extraction
    4.2 Mid-level Feature Aggregation

Chapter 5: Learning Salient Points
    5.1 Learning Salient Points for 22q11.2 Deletion Syndrome
    5.2 Learning Salient Points for Deformational Plagiocephaly
    5.3 Learning Salient Points for General 3D Objects

Chapter 6: 2D Longitude-Latitude Salient Map Signature
    6.1 Salient Point Pattern Projection
    6.2 Classification using 2D Map Signature
    6.3 Retrieval using 2D Map Signature
    6.4 Retrieval using Salient Views
    6.5 Summary

Chapter 7: Global 2D Azimuth-Elevation Angles Histogram of Surface Normal Vectors
    7.1 3D Shape Severity Quantification and Localization for Deformational Plagiocephaly
    7.2 Classification of 22q11.2 Deletion Syndrome
    7.3 Summary

Chapter 8: Learning 3D Shape Quantification for Craniofacial Research
    8.1 Related Work
    8.2 Facial Region Selection
    8.3 2D Histogram of Azimuth Elevation Angles
    8.4 Feature Selection
    8.5 Feature Combination
    8.6 Experimental Results

Chapter 9: Conclusions
    9.1 Contributions
    9.2 Future Work

Bibliography


LIST OF FIGURES

1.1 Example of applications that use 3D objects
2.1 Anthropometric landmarks on a patient's head
3.1 Example of 3D face mesh data of children with 22q11.2 deletion syndrome
3.2 Tops of heads of children with deformational plagiocephaly
3.3 Example of objects in the Heads dataset
3.4 Example morphs from the horse class
3.5 Example of objects in the SHREC 2008 Classification dataset
4.1 Low-level feature extraction
4.2 Azimuth and elevation angle of a 3D surface normal vector
4.3 Mid-level feature aggregation
5.1 Craniofacial anthropometric landmarks
5.2 Example of training points
5.3 Example histograms of salient and non-salient points
5.4 Salient point prediction for two faces in the 22q11.2DS dataset
5.5 Salient point prediction for training data in Heads dataset
5.6 Salient point prediction for testing data in Heads dataset
5.7 Salient point prediction for objects in SHREC 2008 dataset
6.1 Salient point patterns on 3D objects
6.2 2D longitude-latitude signature maps
6.3 Classification accuracy vs training rotation angle increment
6.4 Comparison of retrieval results
6.5 Comparison of retrieval results
6.6 Salient points resulting from clustering
6.7 Salient view
6.8 Salient views vs distinct salient views
6.9 Top 5 distinct salient views in SHREC dataset
6.10 Average retrieval scores using top K salient views
7.1 Surface normal vectors of 3D points
7.2 Calculation of the Flatness Scores
7.3 Severity localization
7.4 Spectrum of deformation
7.5 Correlation between LPFS and Expert Score
7.6 Correlation between RPFS and Expert Score
7.7 Correlation between AS and Expert Score
7.8 Correlation between AAS and Expert Score
7.9 Correlation between AAS and aOCLR
7.10 ROC curve for LPFS
7.11 ROC curve for RPFS
7.12 ROC curve for AS
7.13 ROC curve for AAS
7.14 Correlation between AAS and Brachycephaly score
7.15 ROC curve for AAS
7.16 Projections of 2D azimuth-elevation angles to the face
8.1 Overview of the quantification learning framework
8.2 Facial region selection
8.3 2D histogram of selected region
8.4 Feature selection
8.5 Positional information about selected region
8.6 Positional information about selected region with normal vector
8.7 Output of the genetic programming quantification approach
8.8 F-measure for training and testing dataset
8.9 Projection of selected histogram bins
8.10 Tree structure for quantifying midface hypoplasia
8.11 Tree structure for quantifying nasal facial abnormalities
8.12 Tree structure for quantifying nasal facial abnormalities
8.13 Tree structure for quantifying oral facial abnormalities
8.14 Tree structure for quantifying oral facial abnormalities
8.15 Quantification score for midface hypoplasia

LIST OF TABLES

4.1 Besl-Jain surface characterization
6.1 Classification performance for 22q11.2DS
6.2 Overall comparison of the various shape descriptors
6.3 Comparison of classification accuracy for 22q11.2DS
6.4 Plagiocephaly classification using 254-individual dataset
6.5 Plagiocephaly classification using 140-individual dataset
6.6 Comparison of classification accuracy for plagiocephaly
6.7 Comparison of classification accuracy for SHREC 2008 dataset
6.8 Comparison of timing of each phase
6.9 Pose-normalized retrieval experiment 2
6.10 Average retrieval score comparing three pose-normalization methods
6.11 Average retrieval score using different low-level features
6.12 Average retrieval score using image wavelet analysis
6.13 Comparing the salient map signature best results against existing methods
6.14 Comparing retrieval score for classes in SHREC dataset
6.15 Average retrieval score using salient views
6.16 Retrieval score using maximum number of distinct views
6.17 Average feature extraction runtime per object
7.1 Descriptive statistics for the Left Posterior Flatness Score (LPFS)
7.2 Descriptive statistics for the Right Posterior Flatness Score (RPFS)
7.3 Descriptive statistics for the Asymmetry Score (AS)
7.4 Descriptive statistics for AAS and aOCLR
7.5 AUC for quantifying posterior flattening
7.6 Classification accuracy for plagiocephaly
7.7 Classification of 22q11.2DS
7.8 Classification accuracy of 22q11.2DS facial dysmorphologies
8.1 Genetic programming parameters
8.2 Classification performance for nine facial anomalies using GP
8.3 Classification performance using various shape descriptors
8.4 Comparing GP to the global approaches
8.5 GP mathematical expressions for midface hypoplasia
8.6 GP mathematical expressions for midface hypoplasia
8.7 Coefficients for midface hypoplasia
8.8 Best performing mathematical expression
8.9 Best performing mathematical expressions
8.10 Classification performance in predicting 22q11.2 Deletion Syndrome


    ACKNOWLEDGMENTS

I wish to express a very deep and sincere gratitude to my advisor, Professor Linda Shapiro, without whose guidance, encouragement, and support I would not have been able to complete this PhD study. I have learned tremendously from her on how to become an excellent researcher and writer, especially one in the field of computer vision.

I am very grateful to all the members of my PhD thesis committee, Dr. Maya Gupta, Dr. James Brinkley, Dr. Steve Seitz, and Dr. Mark Ganther, for their useful feedback and comments.

I would also like to thank my collaborators at Seattle Children's Hospital Craniofacial Center, Dr. Michael Cunningham, Dr. Matthew Speltz, Dr. Carrie Heike, and Dr. Brent Collett, for providing me with the medical 3D mesh data for this dissertation, as well as for their engaging discussions and suggestions.

I owe an indescribable amount of gratitude to my parents, my sisters, and my niece for having confidence in me, always encouraging me and cheering me up when I am down.

Finally, I reserve special thanks for my husband, David Gomulya, for being my best friend and a great supporter, and my son, Kiran, for bringing new joy into my life.

This research was supported by the National Science Foundation under grant number DBI-0543631.


    DEDICATION

    to my son

    Kiran Atmosukarto Gomulya

    our Ray of Light


    Chapter 1

    INTRODUCTION

    1.1 Motivation

Advancement in technology for digital acquisition of 3D models has led to an increase in the number of 3D objects available. Three-dimensional objects are now commonly used in a number of areas such as games, mechanical design for CAD models, archaeology and cultural heritage, and medical research studies. Figure 1.1 shows some applications that use 3D objects. The widespread integration of 3D models in different fields motivates the need to be able to store, index, classify, and retrieve 3D objects automatically. However, current classification and retrieval techniques for text, 2D images, and videos cannot be directly translated and applied to 3D objects, as 3D objects have different data characteristics from other data modalities.

Classification and retrieval of 3D objects requires the 3D objects to be represented in a way that captures the local and global shape characteristics of the object. This requires creating a 3D descriptor or signature that summarizes the important shape properties of the object. Unfortunately, finding a descriptor that is able to describe the important characteristics of a 3D object is not a trivial task. The descriptor should be able to capture a good balance between the global and local shape properties of the object, so as to allow flexibility in performing different tasks. The global properties of an object capture the overall shape of an object, while the local properties capture the details of an object.

Figure 1.1: Example of applications that use 3D objects: (a) Second Life is a game that simulates a virtual 3D world, (b) The Digital Michelangelo is a Stanford project that aims to digitize cultural artifacts for cataloging, conservation, and restoration, (c) FoldIt! is a computer game that uses 3D protein structures to understand how proteins fold for use in drug developments, and (d) Plan3D is an interior design application that allows users to incorporate 3D models in house designs.

A specific example of the usage of 3D models in the medical field is the work that researchers at Seattle Children's Hospital Craniofacial Center (SCHCC) are pursuing. The researchers at SCHCC use CT scans and 3D surface meshes of children's heads to investigate head shape dysmorphology due to craniofacial disorders such as craniosynostosis, 22q11.2 deletion syndrome, deformational plagiocephaly, or cleft lip and palate. These researchers aspire to develop new computational techniques that can represent, quantify, and analyze variants of biological morphology from the 3D models acquired from stereo camera technology. The objective of their research in the long run is to ultimately reveal genotype-phenotype disease associations.

This thesis investigates new methodologies for representing 3D objects that are useful in medical applications. Most existing 3D shape descriptors have only been developed and tested on general 3D object datasets, while those designed for medical purposes must usually satisfy a specific medical application and dataset. The objective of this work is to develop 3D shape representation methodologies that are flexible enough to generalize from specific medical tasks to general 3D object tasks. This work was motivated by the collaborations in two research studies at SCHCC for the study of craniofacial anatomy: 1) a study of children with 22q11.2 deletion syndrome and 2) a study of infants with deformational plagiocephaly.

22q11.2 deletion syndrome (22q11.2DS) is a genetic disease that is one of the most common multiple anomaly syndromes in humans [41]. This condition is associated with more than 180 clinical features, including over 25 dysmorphic craniofacial features. Abnormal clinical features of individuals with 22q11.2DS include asymmetric face shape, hooded eyes, bulbous nasal tip, and retrusive chin, among others. The range of variation in individual feature expression is very large. As a result, even experts have difficulty in diagnosing 22q11.2DS from frontal facial photographs alone [9]. Early detection of 22q11.2DS is important as many affected individuals are born with conotruncal cardiac anomalies, mild-to-moderate immune deficiencies, and learning disabilities, all of which can benefit from early intervention.

Deformational plagiocephaly (also known as positional plagiocephaly or non-synostotic plagiocephaly) refers to the deformation of the head, characterized by a persistent flattening on the side resulting in an asymmetric head shape and misalignment of the ears. Deformational plagiocephaly is caused by persistent pressure on the skull of a baby before or after birth. Another possible factor that can lead to deformational plagiocephaly is torticollis, a muscle tightness in the neck resulting in a limited range of motion for the head that causes infants to look in one direction and to rest on the same spot of the back of the head. If left untreated, children with these abnormal head shape conditions may experience a number of medical issues in their lives, ranging from social problems due to abnormal appearance to delayed neurocognitive development [18, 77].

    1.2 Problem Statement

Motivated by collaborations with researchers at SCHCC, this thesis develops 3D shape representation methodologies that can be used for 3D shape classification, retrieval, and quantification. The methodologies provide flexibility to generalize usage for both specific medical datasets and general 3D objects. The following three general problems are tackled.

Problem 1: 3D shape quantification

Given a surface mesh S_i, which consists of n points and information regarding the connectivity of the points, the goal is to analyze and describe the shape S_i by constructing a numeric representation of mesh S_i, commonly referred to as a signature or descriptor D_i. A quantitative score may also be calculated from the obtained signature.

Problem 2: 3D shape classification

Given a database of 3D shapes S = {S_1, S_2, ..., S_N} that have been quantified and described using their respective numeric signatures D_i, 1 ≤ i ≤ N, and are pre-classified into a number of C classes, the goal is to create an algorithm that can be used to determine to which class a new 3D object Q belongs.

Problem 3: 3D shape retrieval

Given a database of 3D shapes S = {S_1, S_2, ..., S_N} that have been quantified and described using their respective numeric signatures D_i, 1 ≤ i ≤ N, the goal is to create an algorithm that retrieves all objects in S that are similar to a query object Q based on their numeric signatures.
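In signature space, the classification and retrieval problems reduce to distance computations. The sketch below shows that shared skeleton, assuming each object has already been summarized by a fixed-length signature vector; the function names and the Euclidean metric are illustrative choices, not the signatures or classifiers this thesis develops in later chapters.

import numpy as np

# Sketch of Problems 2 and 3 once numeric signatures D_i exist.
# `signatures` holds one fixed-length descriptor per database object.

def retrieve(signatures: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    # Problem 3: indices of the k database objects most similar to Q.
    dists = np.linalg.norm(signatures - query, axis=1)
    return np.argsort(dists)[:k]

def classify(signatures: np.ndarray, labels: np.ndarray, query: np.ndarray):
    # Problem 2: nearest-neighbor rule; assign Q the label of its
    # closest database object.
    return labels[retrieve(signatures, query, k=1)[0]]

Any classifier or distance measure can replace the nearest-neighbor rule here; the point is that both problems hinge on the quality of the signature D_i.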

    1.3 Thesis Outline

Chapter 2 discusses the literature related to the two main classes of research in this thesis: 3D object descriptors in the computer vision literature and craniofacial assessment in the medical literature. The datasets used to develop and test the methodology are described in Chapter 3. Chapter 4 explains the base framework for feature extraction. The method for learning the salient points of a 3D object is explained and applied to different applications in Chapter 5. Two different types of 3D object descriptors are introduced and analyzed in Chapters 6 and 7. Chapter 6 describes the 2D longitude-latitude salient map signature and investigates its application for classification and retrieval of both general 3D objects and 3D medical data. Chapter 7 covers the global 2D azimuth-elevation angles descriptor and investigates its application for classification of deformational plagiocephaly and 22q11.2DS datasets. A learning framework for quantification using genetic programming is described in Chapter 8. Finally, Chapter 9 provides a summary and suggests possible future research directions.


    Chapter 2

    RELATED LITERATURE

In this chapter, two main classes of research related to the work in this thesis are described: 3D shape descriptors for general objects from the computer vision literature and medical studies from the craniofacial literature.

    2.1 3D Descriptors for General Objects

Three-dimensional shape analysis and its application in 3D object retrieval and classification has received increased attention in the past few years. There have been several survey papers on the topic [81, 82, 26, 95, 20, 36, 13, 14, 12, 52, 69]. Starting in 2006, researchers in the area have taken the initiative to organize an annual 3D shape retrieval evaluation contest called SHREC (SHape REtrieval Contest), currently organized by the Network of Excellence AIM@SHAPE. The contest's general objective is to evaluate the effectiveness of 3D shape retrieval algorithms. Participants register for the contest before the test set is made available. The participants are given 48 hours to apply their 3D retrieval algorithms to the test set and submit their retrieval results to the organizers. The retrieval results are evaluated using measurements that relate to precision and recall. The average performance of each method over a set of queries is calculated to obtain an overall impression of the algorithm's performance. Using a common test set and queries allows a direct comparison of the different algorithms. The contest started as a single track using only the Princeton benchmark database as the test set and has evolved into a multi-track contest. The tracks now include retrieval of watertight models, CAD models, protein models, 3D face models, and partial matching. Results of the contest show that no one descriptor performs best for all kinds of retrieval and classification tasks. Each descriptor has its own strengths and weaknesses for the different queries and tasks.

There are three broad categories of 3D object representation: feature-based methods, graph-based methods, and view-based methods.

    2.1.1 Feature-based methods

Feature-based 3D object descriptors, which are the most popular, can be further categorized into: (1) global features, (2) global feature distributions, (3) spatial maps, and (4) local features. Early work on 3D object representation and its application to retrieval and classification focused more on the global features and global feature distribution approaches. Global features computed to represent 3D objects include area, volume, and moments. Elad et al. [22] computed the moment properties of the object and used the vector value of the moments as a descriptor for the object. Osada et al. [62] calculated a number of global shape distributions to represent 3D objects. The shape functions measured included the angle between three random points (A3), the distance between a fixed point and a random point (D1), the distance between two random points (D2), the area of the triangle between three random points (D3), and the volume between four random points on the surface (D4). Ohbuchi et al. [59] enhanced the D2 shape function by measuring not only the distance, but also the mutual orientation of the surfaces on which the pair of points is located. Zaharia et al. [96] introduced a 3D shape spectrum descriptor that computed the distribution of the shape index of the points over the whole mesh. Similar distributions were also calculated for other surface properties such as curvature. Some recent works continue to use the feature distribution approach. Mahmoudi et al. [53] computed the histogram of pairwise diffusion distances between all points, while Ion et al. [35] defined their descriptor as the histogram of the eccentricity transform, which uses the maximum geodesic distance from a point to all other points on the surface. The global feature methods are computationally efficient, as they reduce the computation space of the 3D object by describing the object with fewer dimensions; however, these methods are not discriminative enough when the objects have small differences, as in intra-class retrieval cases or classification of very similar objects.
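As a concrete instance of a global feature distribution, the following sketch computes the D2 histogram. For brevity it samples pairs of mesh vertices; Osada et al. sample the surface uniformly by area, which this sketch does not reproduce.

import numpy as np

def d2_histogram(vertices: np.ndarray, n_pairs: int = 100_000, bins: int = 64) -> np.ndarray:
    # D2 shape function: distribution of Euclidean distances between
    # randomly chosen point pairs on the object.
    rng = np.random.default_rng(0)
    i = rng.integers(0, len(vertices), size=n_pairs)
    j = rng.integers(0, len(vertices), size=n_pairs)
    d = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0.0, d.max()))
    return hist / hist.sum()  # normalized histogram is the D2 signature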

Spatial map representations describe the 3D object by capturing and preserving physical locations on it. Saupe et al. [71] described a spherical extent function by calculating the maximal extent of a shape across all rays from the origin. They compared two different kinds of representation of the function: spherical harmonics and moments. Their results showed that using spherical harmonics to represent the function performed better. The spherical harmonic coefficients reconstruct an approximation of the object at different resolutions. Kazhdan et al. [39] used this idea to show that spherical harmonics can be used to transform rotation-dependent shape descriptors into rotation-independent ones without the need to pose-normalize the objects in advance. Their results showed that the application of the spherical harmonic representation improved the performance of most spherical function descriptors. Laga et al. [44, 43] uniformly sampled points on a unit sphere and used spherical wavelet transforms to represent 3D objects. Spherical wavelet descriptors are natural extensions of 3D Zernike moments and spherical harmonics; they offer better feature localization and rotation invariance, since spherical harmonics analysis has singularities at each pole of the sphere.

Wavelets are basis functions that represent a given signal at multiple resolutions. Laga investigated both second-generation wavelets, including linear and butterfly spherical wavelets with a lifting scheme, and image wavelets with spherical boundary extension rules for constructing the shape descriptor [73, 68]. He proposed three descriptors based on the spherical wavelets: using the coefficients as feature vectors, using the L1 energy of the coefficients, and using the L2 energy of the coefficients. Zhenbao et al. [51] compared their multiresolution wavelet analysis to the spherical wavelet descriptor and showed that their descriptor performed slightly better. Their method characterized the shape orientation of the object by setting six view planes and sampled the shape orientation from each of the view planes. They then performed multiresolution wavelet analysis on each of the view planes and used the wavelet coefficients for each of the view planes as the feature vector. Assfalg et al. [5] captured the shape of a 3D object using the curvature map of the object's surface. One of the methods developed in this thesis is quite related to this approach; however, it differs in that it does not use the curvature information directly. Lastly, Tangelder et al. [80] developed a 3D spatial map by dividing the 3D object into a 3D grid with cells of equal sizes and measuring the curvature property in each cell.

Recent research is beginning to focus more on the local approach to representing 3D objects, as this approach has stronger discriminative power when differentiating objects that are similar in overall shape [63]. Local features are often points that are considered to be interesting or salient on the 3D object. These points are computed in various ways. Some methods randomly select points on the surface of the object. Frome et al. [25], who developed a 3D shape context, and Johnson et al. [37], who designed spin image descriptors, both randomly selected points as their basis points. Shilane et al. [75, 76] used random points with harmonic shape descriptors at four different scales. Most other methods use the local geometric properties of the 3D object, such as curvature or normals, to describe the points on the surface of the object, and define the level difference extrema as the salient points. Lee et al. [46] used mean curvature properties with the center-surround mechanism to identify the extrema as final salient points. A similar method was adopted by Li et al. [47, 48], who found reliable salient points by considering a set of extrema for a scale-space representation of a point-based input surface and used the locations of level difference extrema as the salient feature points. Unnikrishnan et al. [83] presented a multi-scale interest region detector that captures variation in shape at a point relative to the size of its neighborhood. Their method used the extrema of the mean curvature to identify the salient points. Watanabe et al. [90] used salient extrema of the principal curvatures along the curvature lines on the surface. Castellani et al. [15] proposed a new methodology for detecting and matching salient points based on measuring how much a vertex is displaced after filtering. The salient points are described using a local description based on a hidden Markov model.

Ohbuchi et al. [60] rendered multiple views of a 3D model and extracted local features from each view using the SIFT algorithm. The local features were then integrated into a histogram using a bag-of-features approach to retrieval. Novatnack et al. [58, 57] extracted corners and edges of a 3D model by first parameterizing the surface of a 3D mesh model on a 2D map and constructing a dense surface normal map. They then constructed a discrete scale-space by convolving the normal map with Gaussian kernels of increasing standard deviation. The corners and edges detected at individual scales were combined into a unified representation for the 3D object. Akagunduz et al. [2] used a Gaussian pyramid at several scales to extract the surface extrema and represented the points and their relationships by a graphical model. Taati et al. [79] generated a local shape descriptor based on invariant properties extracted from the principal component space of the local neighborhood around a point. The salient points were selected based on ratios of basic dispersion properties. Other examples of local descriptors include spin images [37, 4], point signatures [17], and symbolic signatures [70]. Some efforts have also been made in combining both the local and global properties of the object. Alosaimi et al. [3] combined the information in a 2D histogram and used the PCA coefficients of the histograms, concatenated to form a single feature vector. Liu et al. [50, 49] represented a global 3D shape as the spatial configuration of a set of local features. The spatial configuration was represented by computing the distributions of the Euclidean distances between pairs of local shape clusters, represented by spin images.

    2.1.2 Graph-based methods

While feature-based methods use only the geometric properties of the 3D model to define the shape of the object, graph-based methods use the topological information of the 3D object to describe its shape. The constructed graph shows how the different shape components are linked together. The graph representations include model graphs, Reeb graphs, and skeleton graphs. These methods are known to be computationally expensive and sensitive to small topological changes. Sundar et al. [78] used the skeletal graph as a shape descriptor to encode both geometric and topological properties of the 3D object. The similarity measure between two objects was approximated using a greedy algorithm for bipartite graph matching. Hilaga et al. [30] introduced the use of Reeb graphs for matching the shapes of articulated models.

    2.1.3 View-based methods

The most effective view-based shape descriptor is the LightField descriptor (LFD) developed by Chen et al. [16]. A light field around a 3D object is a 4D function that represents the radiance at a given 3D point in a given direction. Each 4D light field of a 3D object is represented as a collection of 2D images rendered from a 2D array of cameras distributed uniformly on a sphere. Their method places the light field cameras on the 20 vertices of a regular dodecahedron and uses orthogonal projection to capture 10 different silhouettes of the 3D model. Ten different rotations are performed to capture a set of light field descriptors to improve robustness to rotation. The 100 rendered images are then described using Zernike moments and Fourier descriptors to describe the region shape and contour shape, respectively, of the 3D model. The retrieval of the 3D models is performed in stages, where objects that are greatly dissimilar to the query model are rejected early in the process. This is done by comparing only a subset of the light field descriptors of the query and of the database objects in the first few stages of the retrieval process. The light field descriptor was evaluated to be one of the best performing descriptors in the SHREC competition. Ohbuchi et al. [60] used a similar view-based approach to the light field descriptor; however, their method extracted local features from each rendered image using the SIFT algorithm. Wang et al. [89] improved the space usage efficiency of the LFD descriptor by projecting a number of uniformly sampled random points along six directions to create six images that are then described using Zernike moments. They also used a two-stage retrieval method to speed up the retrieval process. Experimental results on the Princeton Shape Benchmark database showed that their method's performance was comparable to the LFD descriptor for some categories. Vajramushti et al. [84] employed a combination of a view-based depth-buffer technique and a feature-based volume descriptor for 3D matching. Their method used the voxel volume of the objects to reduce the search space for the depth-buffer comparisons. Vranic [87] evaluated a composite descriptor called DESIRE that was formed using depth-buffer images, silhouettes, and ray extents of a 3D object. His results showed that DESIRE outperformed LFD in retrieving objects of some categories.
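For intuition about the LightField camera arrangement, the sketch below computes the 20 vertices of a regular dodecahedron, normalized onto the unit sphere, on which the cameras sit; under the projection used, antipodal cameras see identical silhouettes, which is why 20 cameras yield only 10 distinct views. The rendering and Zernike/Fourier description stages are omitted.

import itertools
import numpy as np

def dodecahedron_camera_positions() -> np.ndarray:
    # Standard construction: 8 cube corners (+/-1, +/-1, +/-1) plus the
    # cyclic permutations of (0, +/-1/phi, +/-phi), 20 vertices in all.
    phi = (1 + np.sqrt(5)) / 2
    verts = [np.array(v, dtype=float) for v in itertools.product((-1, 1), repeat=3)]
    for a, b in ((0, 1), (1, 2), (2, 0)):
        for s1, s2 in itertools.product((-1, 1), repeat=2):
            v = np.zeros(3)
            v[a], v[b] = s1 / phi, s2 * phi
            verts.append(v)
    verts = np.array(verts)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)  # unit sphere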

It is important to note that most of these existing 3D object descriptors were developed and tested to describe general 3D objects with high shape variability, and not medical datasets, which usually have small shape variations. As shown in the analysis section, they usually do not perform very well in describing medical datasets. This thesis proposes a feature-based approach that uses a learning methodology to identify the interesting salient points on the object, discussed in Chapter 5, and creates a global spatial map of the salient point patterns, described in Chapter 6. The proposed descriptor is tested and shown to work well for general 3D objects and to outperform other methods on craniofacial medical datasets.

2.2 Medical Craniofacial Assessment

Figure 2.1: Anthropometric landmarks on a patient's head. These images were published by Kelly et al. [40].

Severity is assessed against a set of templates: 1) normal shape [score 0], 2) mild shape deformation [score 1], 3) moderate shape severity [score 2], and 4) severe shape deformation [score 3]. When assigning the severity score for a patient, a clinical expert matches the patient's skull shape to the most similar template and assigns the score corresponding to that template. This technique is currently used by practitioners using the Dynamic Orthotic Cranioplasty Band (DOC Band) helmet as a treatment method [33, 34].

Instead of taking physical measurements directly on a patient's head, some techniques take the measurements from photographs of the patient's head. This approach is less intrusive for young patients, but it is still time consuming and can be inconsistent, as technicians must manually place landmarks on the photographs. Hutchison et al. [31, 32] developed a technique called HeadsUp that involves taking a top-view digital photograph of infant heads fitted with an elastic head circumference band. The elastic band is equipped with adjustable color markers to identify landmarks such as ear and nose positions. The resulting photograph is then automatically analyzed to obtain quantitative measurements for the head shape, including cephalic index, head circumference, distance of ear to center of nose, oblique length, and ratio. Their results showed that the cephalic index (CI) and Oblique Cranial Length Ratio (OCLR) can be used for quantification measurement of shape severity, as the numbers differ significantly between cases and controls. Although promising, the Hutchison method requires subjective decisions regarding the placement of the midline and ear landmarks and the selection of the posterior point of the OCLR lines. In addition, as the measurements are done in two dimensions, displacement of head volumes cannot really be assessed. Moreover, placing the band on an infant can be quite challenging.
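For concreteness, a small sketch of these two 2D measures follows, under common formulations: CI as the width-to-length ratio of the head outline, and OCLR as the ratio of the longer to the shorter oblique diagonal, so a symmetric head scores about 100. The exact landmark and angle conventions of Hutchison et al. are not reproduced here.

def cephalic_index(head_width_mm: float, head_length_mm: float) -> float:
    # CI: maximum head width over maximum head length, as a percentage.
    return 100.0 * head_width_mm / head_length_mm

def oclr(diagonal_a_mm: float, diagonal_b_mm: float) -> float:
    # OCLR: longer oblique cranial length over the shorter one; values
    # well above 100 indicate asymmetric (plagiocephalic) flattening.
    longer = max(diagonal_a_mm, diagonal_b_mm)
    shorter = min(diagonal_a_mm, diagonal_b_mm)
    return 100.0 * longer / shorter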


Zonenshayn et al. [98] also employed a headband with two adjustable points (nasion and inion of the head) and used photographs of the headband shape to calculate the Cranial Index of Symmetry (CIS). These methods require consistency in setting up the band and placing the markers, which may lead to non-reproducible results. In addition, this is a 2D technique, but plagiocephaly and brachycephaly are three-dimensional deformations.

Vlimmeren et al. [85] introduced a new method called plagiocephalometry to assess the asymmetry of the skull. The method uses a thermoplastic material to mold the outline of a patient's skull. The ring is positioned around the head at the widest transverse circumference. Three landmarks for the ears and nose are marked on the ring. The ring is then copied onto paper and a transparent sheet made to keep track of follow-up progress.

Measurement techniques that use full 3D head shape information can provide more detailed and accurate shape information. Plank et al. [67] used a noninvasive laser shape digitizer to obtain the 3D surface of the head. This system provides more accurate shape information, but still requires the use of markers to define an anatomical reference plane for further quadrant placement and volume calculations. Lanche et al. [45, 61] used a stereo-camera system to obtain a 3D model of the head and developed a statistical model of the asymmetry to quantify and localize the asymmetry of each patient's head. The model was obtained by first computing the asymmetry of a patient's head by deforming a symmetric ideal head template to the patient's head to obtain point correspondence between the left and right sides of the head. Principal Component Analysis was then performed on the vectors of the asymmetry values of all patients' heads to obtain a statistical model.

    2.2.2 22q11.2 Deletion Syndrome

Similar to the assessment of deformational plagiocephaly, the assessment of 22q11.2 deletion syndrome has commonly been through physical examination combined with craniofacial anthropometric measurements. There have been very few automated methods for analyzing 22q11.2DS. Boehringer et al. [11] used Gabor wavelets to transform 2D photographs of individuals with 10 different facial dysmorphic syndromes. Their method then applied principal component analysis to describe and classify the dataset. Their method required landmark placement on the face.

Hammond et al. [29] used the Dense Surface Model method. Landmarks were manually placed on each of the 3D surface meshes and used to align the faces to a mean face. Principal component analysis was then used to describe the datasets, and the coefficients were used to classify the dataset. Neither of these two methods is fully automatic, as they require manual landmark placement.

One of the methods proposed in this thesis to represent craniofacial dysmorphologies uses 3D surface mesh models of heads without the need for markers or templates. The method uses the surface normal vectors of all the 3D points on the head and constructs a global 2D histogram of the azimuth-elevation angles of the surface normal vectors of the 3D points on the face. The proposed method is general enough to characterize different craniofacial disorders, including deformational plagiocephaly and its variations and 22q11.2DS and its different manifestations.
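A minimal sketch of this global descriptor is given below, assuming unit-length surface normals are available for all points; the 8x8 binning is illustrative, not the resolution used in the thesis.

import numpy as np

def azimuth_elevation_histogram(normals: np.ndarray, bins: int = 8) -> np.ndarray:
    # Convert each normal (nx, ny, nz) to its azimuth and elevation
    # angles, then pool all angles into one global 2D histogram.
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    azimuth = np.arctan2(ny, nx)                   # range [-pi, pi]
    elevation = np.arcsin(np.clip(nz, -1.0, 1.0))  # range [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        azimuth, elevation, bins=bins,
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist / hist.sum()  # normalized 2D histogram signature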


    Chapter 3

    DATASETS

This chapter will describe the four datasets that were obtained to develop and test the different shape analysis methodologies developed for this thesis. Each dataset has different characteristics that help explore the different properties of the methodologies. The 22q11.2DS dataset, introduced in Section 3.1, contains 3D face models of individuals affected and unaffected by 22q11.2 deletion syndrome. The Deformational Plagiocephaly dataset, discussed in Section 3.2, contains 3D head models of individuals affected and unaffected by deformational plagiocephaly. The Heads dataset, discussed in Section 3.3, contains head shapes of different classes of animals, including humans. These three datasets help explore the performance of the methodology on data of similar overall shape with subtle distinctions, the type of data for which the methodology was designed and developed. Section 3.4 introduces the SHREC 2008 classification benchmark dataset, which was obtained to further test the performance of the methodology on general 3D object classification, where objects in the dataset are not very similar.

3.1 22q11.2 Deletion Syndrome (22q11.2DS) Dataset

The 3D face models in this dataset were collected at the Craniofacial Center of Seattle Children's Hospital using the 3dMD imaging system [1]. The 3dMD imaging system uses four camera stands, each containing three cameras. Stereo analysis yields twelve range maps that are combined using 3dMD proprietary software to yield a 3D mesh of an individual's head and a texture map of the face. The methodologies developed for this thesis use only the 3D meshes, due to human subject regulations.

An automated system developed by Wilamowska [92, 74] to align the pose of each mesh was employed. The alignment system uses symmetry to align the yaw and roll angles and a height differential to align the pitch angle. Although faces are not truly symmetrical, the pose alignment procedure can be cast as finding the angular rotations of yaw and roll that minimize the difference between the left and right sides of the face. The pitch of the head was aligned by minimizing the difference between the height of the chin and the height of the forehead. In some cases, manual adjustments were necessary to pose-normalize the faces. Figure 3.1 shows two examples of affected individuals in the dataset.

Figure 3.1: Example of 3D face mesh data of children with 22q11.2 deletion syndrome.
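An illustrative grid-search sketch of the symmetry idea follows; Wilamowska's actual system is not reproduced here. Each candidate (yaw, roll) rotation is scored by how close the rotated mesh is to its own mirror image across the x = 0 (sagittal) plane.

import numpy as np
from scipy.spatial import cKDTree

def rotation(yaw: float, roll: float) -> np.ndarray:
    cy, sy, cr, sr = np.cos(yaw), np.sin(yaw), np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])  # yaw about y
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])  # roll about z
    return Ry @ Rz

def align_yaw_roll(points: np.ndarray,
                   angles=np.radians(np.arange(-20, 21, 2))):
    # Exhaustive search over a coarse angle grid; numerical
    # optimization could replace it.
    best, best_cost = (0.0, 0.0), np.inf
    for yaw in angles:
        for roll in angles:
            p = points @ rotation(yaw, roll).T
            mirrored = p * np.array([-1.0, 1.0, 1.0])    # reflect across x = 0
            cost = cKDTree(p).query(mirrored)[0].mean()  # mean asymmetry
            if cost < best_cost:
                best, best_cost = (yaw, roll), cost
    return best  # (yaw, roll) that makes the face most self-symmetric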

The dataset contained 3D meshes for 189 individuals. Metadata for each 3D mesh consisted of the age, gender, and self-described ethnicity of the individual, plus a label of affected or unaffected. The dataset consisted of 53 affected individuals and 136 control individuals. The ground truth for each individual's 22q11.2DS label was determined through laboratory confirmation.

A balanced dataset was created from the original dataset. The balanced dataset consisted of 86 individuals: 43 affected and 43 unaffected with 22q11.2 deletion syndrome. Each of the 86 individuals was assessed by three craniofacial experts. Frontal and profile images of the individuals were de-identified and viewed in random order to blind the raters. The experts assigned discrete scores to a total of 18 facial features that are known to characterize 22q11.2DS (score 0 = none, 1 = moderate, 2 = severe). Nine of the facial features (midface hypoplasia, prominent nasal root, bulbous nasal tip, small nasal alae, tubular nose, small mouth, open mouth, downturned mouth, and retrusive chin) are further analyzed in Chapter 8. The experts' survey showed that all features of the nose were found to have a higher percentage of moderate and severe expression in 22q11.2DS-affected individuals. Midface hypoplasia was observed to be moderately present in affected individuals [91].

3.2 Deformational Plagiocephaly Dataset

The dataset for analyzing the shape dysmorphology due to deformational plagiocephaly was obtained through a data acquisition pipeline similar to that of the 22q11.2DS dataset. The resulting 3D meshes were also automatically pose-normalized using the same alignment system used to normalize the 22q11.2DS dataset [92, 74]. Figure 3.2 shows two examples of individuals diagnosed with deformational plagiocephaly.

Figure 3.2: Tops of heads of children with deformational plagiocephaly.

The original dataset consisted of 254 3D head meshes: 100 controls and 154 cases. Each mesh in the original dataset was assessed by two craniofacial experts, who assigned discrete severity scores based on the degree of the deformation severity of different head areas, including the back of the head, forehead asymmetry, ear asymmetry, and whether the flattening at the back of the head was symmetric (a case of brachycephaly). In addition, each expert also noted an overall severity score. The discrete scores were category 0 for normal, 1 for mild, 2 for moderate, and 3 for severe. The laterality of the flatness was indicated using negative scores to represent left-sided deformation and positive scores to represent right-sided deformation.

The work in this thesis focuses on the flattening at the back of the head, known as posterior plagiocephaly. Since no gold standard exists for assessing the severity of posterior plagiocephaly, the experts' ratings were considered the gold standard in evaluating the different severity scores developed. The inter-rater agreement between the two experts was only 65%. As a result, participants were excluded if (1) the two experts assigned discrepant posterior flattening scores, or (2) the classification based on expert ratings differed from the clinical classification (case or control) assigned at the time of enrollment. The final dataset used to investigate posterior plagiocephaly consisted of 140 infants, including 50 controls (by definition in category 0 by expert rating) and 90 cases: 46 in category 1 or -1, 35 in category 2 or -2, and 9 in category 3 or -3.

    3.3 Heads Dataset

For the Heads dataset, the digitized 3D objects were obtained by scanning hand-made clay toys using a Roland LPX-250 laser scanner with a maximal scanning resolution of 0.008 inches for plane scanning mode [70]. Raw data from the scanner consisted of 3D point clouds that were further processed to obtain smooth and uniformly sampled triangular meshes of 0.9-1 mm resolution. To increase the number of objects for training and testing, new objects were created by deforming the original scanned 3D models in a controlled fashion using 3D Studio Max software [8]. Global deformations of the models were generated using morphing operators such as tapering, twisting, bending, stretching, and squeezing. The parameters for each of the operators were randomly chosen from ranges that were determined empirically. Each deformed model was obtained by applying at least five different morphing operators in a random sequence.

Fifteen objects representing seven different classes were scanned. The seven classes are: cat head, dog head, human head, rabbit head, horse head, tiger head, and bear head. Each of the fifteen original objects was randomly morphed to increase the size of the dataset. A total of 250 morphed models per original object were generated. Points on the morphed models are in full correspondence with the original models from which they were constructed. Figure 3.3 shows examples of objects from each of the seven classes, while Figure 3.4 shows examples of morphs from the horse class.

Figure 3.3: Example of objects in the Heads dataset (cat, dog, human, rabbit, horse, tiger, bear).

Figure 3.4: Example morphs from the horse class. Morphs were generated by stretching, twisting, or squeezing the original object with different parameters.

    3.4 SHREC Dataset

The SHREC dataset was selected from the SHREC 2008 Competition Classification of Watertight Models track [27]. The models in the track were chosen by the organizers to ensure a high level of shape variability, making the track more challenging.


Figure 3.3: Example of objects in the Heads dataset (left to right: cat, dog, human, rabbit, horse, tiger, bear).

Figure 3.4: Example morphs from the horse class. Morphs were generated by stretching, twisting, or squeezing the original object with different parameters.

The models in the dataset were manually classified using three different levels of categorization. At the coarse level of classification, the objects were classified according to both their shapes and semantic criteria. At the intermediate level, the classes were subdivided according to functionality and shape. At the fine level, the classes were further partitioned based on object shape. For example, at the coarse level some objects were classified into the furniture class. At the intermediate level, these same objects were further divided into tables, seats, and beds. At the fine level, the objects were classified into chairs, armchairs, stools, sofas, and benches. The intermediate level of classification was chosen for the experiments, as the fine level had too few objects per class, while the coarse level grouped too many objects that were dissimilar in shape into the same class. The dataset consists of 425 pre-classified objects. Figure 3.5 shows examples of objects in the benchmark dataset.

    The four datasets were used to test the classification and retrieval methodologies de-

    veloped in this thesis. The domain-independent base framework of the methodologies is

    described next in Chapter 4.


Figure 3.5: Example of objects in the SHREC 2008 Classification dataset (left to right: human, animal, knots, airplane, bottle, chess, teapot). It can be seen that the intra-class variability in this dataset is quite high, as objects in the same class have quite different shapes.


    Chapter 4

    BASE FRAMEWORK

The methodologies developed in this thesis are used for single 3D object classification; they handle neither objects in cluttered 3D scenes nor occlusion. A surface mesh, which represents a 3D object, consists of points {pi} on the object's surface and information regarding the connectivity of the points. The base framework of the methodology starts by rescaling the objects to fit in a fixed-size bounding box. The framework then executes two phases: low-level feature extraction (Section 4.1) and mid-level feature aggregation (Section 4.2). The low-level feature extraction starts by applying a low-level operator to every point on the surface mesh. After the first phase, every point pi on the surface mesh will have either a single low-level feature value or a small set of low-level feature values, depending on the operator used. The second phase performs mid-level feature aggregation and computes a vector of values for a given neighborhood of every point pi on the surface mesh. The feature aggregation results of the base framework are then used to construct the different 3D object representations [7, 6].

    4.1 Low-level Feature Extraction

The low-level operators extract local properties of the surface points by computing a feature value vi for every point pi on the mesh surface. All low-level feature values are convolved with a Gaussian filter to reduce noise effects. Three low-level operators were implemented to test the methodology's performance: absolute Gaussian curvature, Besl-Jain curvature categorization, and azimuth-elevation of surface normal vectors. Figure 4.1(a) shows an example of the absolute Gaussian curvature values of a 3D model. Figure 4.1(b) shows the results of applying a Gaussian filter over the low-level Gaussian curvature values, while Figure 4.1(c) shows the results of applying the Gaussian filter over the low-level Besl-Jain curvature values.
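The smoothing step might be sketched as follows; treating the Gaussian filter as a weighted average over Euclidean neighborhoods, with a 3-sigma support cutoff, is an assumption of this sketch rather than a detail specified in the text.

    import numpy as np
    from scipy.spatial import cKDTree

    def smooth_point_features(points, values, sigma):
        # points: (N, 3) vertex coordinates; values: (N,) low-level feature values.
        # Neighbors within 3*sigma contribute with Gaussian weights; the truncation
        # radius is an assumption of this sketch.
        tree = cKDTree(points)
        values = np.asarray(values, dtype=float)
        smoothed = np.empty(len(values))
        for i, p in enumerate(points):
            idx = tree.query_ball_point(p, r=3.0 * sigma)
            d2 = np.sum((points[idx] - p) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * sigma ** 2))
            smoothed[i] = np.dot(w, values[idx]) / w.sum()
        return smoothed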



Figure 4.1: (a) Absolute Gaussian curvature low-level feature values, (b) smoothed absolute Gaussian curvature values after convolution with the Gaussian filter, (c) smoothed Besl-Jain curvature values after convolution. Higher values are represented by cool (blue) colors, while lower values are represented by warm (red) colors.

    4.1.1 Absolute Gaussian Curvature

The absolute Gaussian curvature low-level operator computes the Gaussian curvature estimation K for every point p on the surface mesh:

$$K(p) = 2\pi - \sum_{f \in F(p)} \text{interior angle}_f$$

where F(p) is the set of all the neighboring facets of point p and the interior angle is the angle of each facet meeting at point p. This calculation is similar to calculating the angle deficiency at point p. The contribution of each facet is weighted by the area of the facet divided by the number of points that form the facet. The operator then takes the absolute value of the Gaussian curvature as the final low-level feature value for each point.
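To make the operator concrete, a minimal sketch is given below. It accumulates the angle deficit at each vertex; interpreting the area weighting as dividing the deficit by the area associated with the vertex (one third of each adjacent triangle) is an assumption of the sketch, not necessarily the exact formulation used here.

    import numpy as np

    def abs_gaussian_curvature(vertices, faces):
        # vertices: (N, 3) float array; faces: (M, 3) int array of vertex indices.
        deficit = np.full(len(vertices), 2.0 * np.pi)  # start from 2*pi per vertex
        area = np.zeros(len(vertices))
        for f in faces:
            v = vertices[f]
            # subtract the interior angle of this facet at each of its vertices
            for k in range(3):
                p, a, b = v[k], v[(k + 1) % 3], v[(k + 2) % 3]
                u1, u2 = a - p, b - p
                c = np.dot(u1, u2) / (np.linalg.norm(u1) * np.linalg.norm(u2))
                deficit[f[k]] -= np.arccos(np.clip(c, -1.0, 1.0))
            # one third of the triangle area is associated with each vertex
            tri_area = 0.5 * np.linalg.norm(np.cross(v[1] - v[0], v[2] - v[0]))
            area[f] += tri_area / 3.0
        return np.abs(deficit / np.maximum(area, 1e-12))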

    4.1.2 Besl-Jain Curvature

    Besl and Jain [10] suggested surface characterization of a point p using only the sign of the

    mean curvature H and Gaussian curvature K. These surface characterizations result in a

    scalar surface feature for each point that is invariant to rotation, translation and changes

    in parametrization. The eight different categories are: (1) peak surface, (2) ridge surface,

    (3) saddle ridge surface, (4) plane surface, (5) minimal surface, (6) saddle valley, (7) valley


    surface, and (8) cupped surface. Table 4.1 lists the different surface categories with their

    respective curvature signs.

Table 4.1: Besl-Jain surface characterization.

    Label   Category                H        K
    1       Peak surface            H < 0    K > 0
    2       Ridge surface           H < 0    K = 0
    3       Saddle ridge surface    H < 0    K < 0
    4       Plane surface           H = 0    K = 0
    5       Minimal surface         H = 0    K < 0
    6       Saddle valley surface   H > 0    K < 0
    7       Valley surface          H > 0    K = 0
    8       Cupped surface          H > 0    K > 0
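A sketch of the categorization as a sign lookup is shown below; the zero tolerance eps and the outward-normal sign convention (a peak has H < 0, K > 0) follow the standard Besl-Jain formulation and should be read as assumptions where the text does not restate them.

    def besl_jain_label(H, K, eps=1e-6):
        # Sign of H and K with a zero tolerance; eps is an assumption of this sketch.
        h = 0 if abs(H) < eps else (1 if H > 0 else -1)
        k = 0 if abs(K) < eps else (1 if K > 0 else -1)
        table = {(-1, +1): 1,   # peak
                 (-1,  0): 2,   # ridge
                 (-1, -1): 3,   # saddle ridge
                 ( 0,  0): 4,   # plane
                 ( 0, -1): 5,   # minimal
                 (+1, -1): 6,   # saddle valley
                 (+1,  0): 7,   # valley
                 (+1, +1): 8}   # cupped
        # (h = 0, k = +1) cannot occur on a smooth surface since K <= H^2.
        return table.get((h, k), 4)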

4.1.3 Azimuth-Elevation of Surface Normal Vectors

The third low-level operator computes, for every point, the azimuth and elevation angles of its surface normal vector (Figure 4.2).

    Figure 4.2: Azimuth and elevation angle of a 3D surface normal vector.

Figure 4.3: (a) 1D histogram aggregating the absolute Gaussian curvature values from points on the nose of a human head, (b) 2D histogram aggregating the azimuth-elevation vector values at a point on the back of the head.

4.2 Mid-level Feature Aggregation

The mid-level aggregation computes, for every point, a histogram of the low-level feature values over a neighborhood whose size is a fraction c of the object's size, so that the aggregation is independent of object size and the results are comparable across different objects. The value of c was determined empirically; for most experiments a value of c = 0.05 was used. Aggregating the single-valued low-level feature values results in a 1D histogram with d histogram bins for every point on the surface mesh. Aggregating the pair-valued low-level feature values (such as the azimuth-elevation angle feature values) results in a 2D histogram of a x b bins, where a and b are the two dimension sizes. Figure 4.3(a) shows an

    example of a 1D histogram aggregating the absolute Gaussian curvature low-level feature

    values from points on the nose of a 3D head object. Figure 4.3(b) shows an example of the

    2D histogram aggregating the azimuth-elevation low-level feature values on a head.
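A minimal sketch of the azimuth-elevation operator together with the neighborhood aggregation follows; the axis convention (y up, azimuth in the x-z plane), the neighborhood radius taken as c times the bounding-box diagonal, and the 8 x 4 bin layout are illustrative assumptions of the sketch.

    import numpy as np
    from scipy.spatial import cKDTree

    def azimuth_elevation(normals):
        # Azimuth and elevation of unit surface normals; the axis convention
        # is an assumption of this sketch.
        az = np.arctan2(normals[:, 2], normals[:, 0])
        el = np.arcsin(np.clip(normals[:, 1], -1.0, 1.0))
        return az, el

    def local_azel_histograms(points, normals, c=0.05, bins=(8, 4)):
        # Per-point 2D histogram (a x b bins) over a neighborhood whose radius is
        # c times the bounding-box diagonal; radius definition and bin counts are
        # assumptions of this sketch.
        az, el = azimuth_elevation(normals)
        radius = c * np.linalg.norm(points.max(axis=0) - points.min(axis=0))
        tree = cKDTree(points)
        hists = []
        for p in points:
            idx = tree.query_ball_point(p, r=radius)
            h, _, _ = np.histogram2d(az[idx], el[idx], bins=bins,
                                     range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
            hists.append(h.ravel() / len(idx))   # normalize by neighborhood size
        return np.asarray(hists)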


    Once the feature extraction and aggregation are completed, a learning phase is used

to learn the characteristics of salient points for classification and retrieval as described in Chapter 5.


    Chapter 5

    LEARNING SALIENT POINTS

Given the base framework's ability to compute low-level feature values at each point of a 3D mesh and to aggregate these features in neighborhoods about each point, this chapter explores the use of the framework to create a representation for 3D objects. Before the 3D object signature is constructed, salient or interesting points are identified on the 3D object, and the characteristics of these points are used when constructing the signature. The identified salient points are application dependent. The framework and methodology were developed to be specifically applicable to the classification of craniofacial disorders, such as 22q11.2 deletion syndrome, discussed in Section 5.1, and deformational plagiocephaly, described in Section 5.2, but they are also appropriate for general use in 3D shape classification, as shown in Section 5.3.

Preliminary saliency detection using existing methods [46, 38] was not satisfactory. In some cases the detected points were not consistent and repeatable for objects within the same class. As a result, a learning approach was selected to find salient points on a 3D object. A salient

    point classifier is trained on a set of marked training points on the 3D objects provided by

    experts for a particular application. Histograms of low-level features of the training points

    obtained using the base framework (Chapter 4) are then used to train the classifier. For

    a particular application, the classifier will learn the characteristics of the salient points on

    the surfaces of the 3D objects from that domain. Sets of detected points will lead to salient

    regions in the signatures.

    5.1 Learning Salient Points for 22q11.2 Deletion Syndrome

    Traditionally, studies of individuals with craniofacial disorders such as 22q11.2 deletion syn-

    drome have been performed through in-person clinical observation coupled with craniofacial

anthropometric measurements derived from anatomic landmarks [24].


    Figure 5.1: Craniofacial anthropometric landmarks.

These landmarks are located either visually by clinicians or through palpation of the skull. Figure 5.1 shows the

    landmark points that are commonly used for craniofacial measurements.

The salient point classifier was trained on a subset of the craniofacial anthropometric landmarks marked on 3D head objects. This was done so that these craniofacial landmarks would be included in the set of interesting or salient points for classification of the craniofacial disorders. The particular subset of landmarks was selected to be well-defined points that both experts and non-experts could easily identify. The training set consisted of human heads selected from the Heads database. Figure 5.2 shows an example of manually marked salient points on the training data. Histograms of low-level features obtained using the base framework were used to train a Support Vector Machine (SVM) [72, 86] classifier to learn the salient points on the 3D surface mesh. WEKA's implementation of SVM was used for all experiments [93]. A training set consisting of 75 morphs of 5 human heads was used to train the classifier to learn the characteristics of the salient points for faces in terms of the histograms of their low-level features.
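The training step might look like the following sketch, with scikit-learn's SVC standing in for the WEKA implementation used here; the RBF kernel and the Platt-scaled probability outputs are assumptions of the sketch, not documented choices.

    from sklearn.svm import SVC

    def train_salient_point_classifier(hists, labels):
        # hists: (n_points, n_bins) per-point feature histograms;
        # labels: 1 for marked landmark (salient) points, 0 otherwise.
        clf = SVC(kernel="rbf", probability=True)  # kernel choice is an assumption
        clf.fit(hists, labels)
        return clf

    def predict_salient_points(clf, hists, threshold=0.95):
        # Confidence score for the "salient" class and the thresholded mask.
        scores = clf.predict_proba(hists)[:, 1]
        return scores, scores >= threshold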

    Although the salient training points were selected only to be commonly used craniofa-

    cial landmark points, empirical studies determined that the classifier actually finds salient

    regions with a combination of high curvature and low entropy values. This result can be

observed in the different histograms of salient and non-salient points in Figure 5.3. In the

    figure, the salient point histograms have mainly low bin counts in the bins corresponding

    to low curvature values and a high bin count in the last (highest) curvature bin. The

non-salient point histograms have mainly medium to high bin counts in the low curvature bins and in some cases a high bin count in the last bin.


Figure 5.2: Example of manually marked salient (blue color) and non-salient (red color) points on a human head model. The salient points include corners of the eyes, tip of the nose, corners of the nose, corners of the mouth, and chin.

The entropy of the salient point

    histograms also tends to be lower than the entropy of the non-salient point histograms. The

    classifier approach avoided the use of brittle thresholds.
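For reference, the entropy E listed under each histogram in Figure 5.3 can be computed from the bin counts as in the following sketch (the base of the logarithm is an assumption):

    import numpy as np

    def histogram_entropy(hist):
        # Shannon entropy of a bin-count histogram; base-2 logarithm is assumed.
        p = np.asarray(hist, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                      # 0 * log(0) contributes nothing
        return float(-(p * np.log2(p)).sum())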

    Figure 5.4 shows results of the salient points predicted on two faces in the 22q11.2DS

    database, which include not just the manually marked points but other points with the same

    characteristics. The salient points are colored according to the assigned classifier confidence

    score. Non-salient points are colored in red, while salient points are colored in different

    shades of blue with dark blue having the highest prediction score.

    5.2 Learning Salient Points for Deformational Plagiocephaly

    A similar learning-based approach was used to find salient points for 3D heads with de-

formational plagiocephaly. The salient point classifier for deformational plagiocephaly was

    trained on a set of points marked on the flat areas at the back of the head of individuals with

    deformational plagiocephaly. The training salient points consisted of 10 marked points on

    the flat areas of 10 heads with deformational plagiocephaly, while the non-salient training

    points were selected from 10 heads without deformational plagiocephaly. Histograms of the

    azimuth-elevation low-level features obtained using the base framework were used to train a

Support Vector Machine (SVM) classifier to learn the salient points on the 3D heads.


[Figure: three salient point histograms (E = 0.348, E = 2.435, E = 2.79) and three non-salient point histograms (E = 3.95, E = 3.877, E = 4.185).]

Figure 5.3: Example histograms of salient and non-salient points. The salient point histograms have a high value in the last bin, illustrating high curvature in the region, and low values in the remaining bins. The non-salient point histograms have more varied values in the curvature histogram. In addition, the entropy E of the salient point histograms is lower than that of the non-salient point histograms (listed under each histogram).

Figure 5.4: Salient point prediction for two faces in the 22q11.2DS dataset. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.


After training was complete, the classifier was able to label each point on a 3D head as either salient or non-salient and provide a confidence score for each decision. The same threshold, T = 0.95, was applied to the confidence scores for the salient points.

    5.3 Learning Salient Points for General 3D Objects

    The salient point classifier for general 3D object classification was trained on selected objects

    from the Heads database using the craniofacial landmark points that were used in the

    22q11.2DS application. A small training set consisting of 25 morphs of the cat head model,

    25 morphs of the dog head model, and 50 morphs of human head models was used to train

    the classifier to learn the characteristics of salient points for general 3D object classification.

    Histograms of low-level features obtained using the base framework were used to train a

    Support Vector Machine (SVM) classifier to learn the salient points on general 3D objects.

A threshold T = 0.95 was also applied to the confidence scores of the predicted salient points. Figure 5.5 shows results of the salient points predicted on instances of the cat, dog, and human head classes in the Heads database, which include, as previously mentioned, not just the manually marked points but other points with the same characteristics. The salient points are colored according to the assigned classifier confidence score. Non-salient points are colored in red, while salient points are colored in different shades of blue, with dark blue having the highest prediction score. While the classifier was only trained on cat heads, dog

    heads, and human heads, it does a good job of finding salient points on the other classes

    of heads, and the 3D patterns produced are repeatable across objects of the same class.

    Figure 5.6 shows the predicted salient points on new object classes that were not included

    in the training phase.

    The trained classifier was also tested on the SHREC 2008 Classification dataset. Exper-

imental results show that the labeled salient points were quite satisfactory. Figure 5.7 shows the

    salient points predicted on a number of objects from the SHREC 2008 database. Note that

    on this database, which has a lot of intra-class shape variance, the salient point patterns

    are not consistent across all members of each class.

    After learning and identifying the application-dependent salient points for the 3D ob-

jects in the dataset, the signature for each 3D object is constructed as described next in Chapter 6.


Figure 5.5: Salient point prediction for (a) cat head class, (b) dog head class, and (c) human head class. Non-salient points are colored in red, while salient points are colored in different shades ranging from green to blue, depending on the classifier confidence score assigned to the point. A threshold (T = 0.95) was applied to include only salient points with high confidence scores.


Figure 5.6: Salient point prediction for (a) rabbit head class, (b) horse head class, and (c) leopard head class from the Heads database. Even though these three classes were not included in the training, the trained model was able to predict salient points across the classes.


Figure 5.7: Salient point prediction for (a) human class, (b) bird class, (c) human hand class, and (d) bottle class from the SHREC 2008 database. Note that for classes with a lot of intra-class shape variance, the salient point patterns are not consistent across all members of those classes, as seen in column (a).


    Chapter 6

    2D LONGITUDE-LATITUDE SALIENT MAP SIGNATURE

    Most 3D object analysis methods require the use of a 3D descriptor or signature to

    describe the shape and properties of the 3D objects. This chapter describes the construc-

    tion of the 3D object signature using the salient point patterns, obtained using the learning

    approach described in Chapter 5, mapped onto a 2D plane via a longitude-latitude transfor-

    mation, described in Section 6.1. Classification of 3D objects is then performed by training

    a classifier using the 2D salient maps of the objects. Results of classification using the 2D

    salient map signature are given in Section 6.2. Retrieval of 3D objects is performed by

    calculating the distances between the salient signature of the query object and the salient

    map signatures of all objects in the database. Results of retrieval using the 2D salient map

    signature are given in Section 6.3. Section 6.4 investigates how the salient patterns are used

    to obtain 2D salient views for 3D object retrieval.

    6.1 Salient Point Pattern Projection

    Before mapping the salient point patterns obtained in Chapter 5 onto the 2D plane, the

    salient points are assigned a label according to the classifier confidence score assigned to

    the point. The classifier confidence score range is then discretized into a number of bins.

    For the experiments, at confidence level 0.95 and above, the confidence score range was

    discretized into 5 bins. Each salient point on the 3D mesh is then assigned a label based on

    the bin into which its confidence score falls.

To obtain the 2D longitude-latitude map signature for an object, the longitude and latitude positions of all the 3D points on the object's surface are calculated. Given any point pi = (pix, piy, piz), the longitude position θi and latitude position φi of point pi are


    calculated as follows:

$$\theta_i = \arctan\!\left(\frac{p_{iz}}{p_{ix}}\right), \qquad \phi_i = \arctan\!\left(\frac{p_{iy}}{\sqrt{p_{ix}^{2} + p_{iz}^{2}}}\right)$$

A 2D map of the longitude and latitude positions of all the points on the object's surface is created by discretizing the longitude and latitude values of the points into a fixed number of pixels. A pixel is labeled with the salient point label of the points that fall into that pixel. If more than one label is mapped to a pixel, the label with the highest count is used to label the pixel. Figure 6.1 shows the salient point patterns for the cat head, dog head, and human head models in the Heads database and their corresponding 2D map signatures. Figure 6.2 shows how different objects that belong to the same class have similar 2D longitude-latitude signature maps.
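A minimal sketch of the projection and the majority-vote rasterization follows; the map resolution (64 x 32) is illustrative, and np.arctan2 stands in for the plain arctan above so that the longitude covers the full angular range.

    import numpy as np

    def longitude_latitude_map(points, labels, width=64, height=32):
        # labels: integer per-point label (0 = non-salient, 1..5 = confidence bins).
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        lon = np.arctan2(z, x)                                         # longitude
        lat = np.arctan(y / np.maximum(np.sqrt(x**2 + z**2), 1e-12))   # latitude
        cols = np.clip(((lon + np.pi) / (2 * np.pi) * width).astype(int), 0, width - 1)
        rows = np.clip(((lat + np.pi / 2) / np.pi * height).astype(int), 0, height - 1)
        votes = np.zeros((height, width, int(labels.max()) + 1), dtype=int)
        np.add.at(votes, (rows, cols, labels), 1)   # count labels per pixel
        return votes.argmax(axis=2)                 # majority label per pixel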


Figure 6.1: Salient point patterns on the 3D objects of Figure 5.5 and their corresponding 2D longitude-latitude map signatures.

    To reduce noise in the 2D longitude-latitude map signature, a wavelet transformation

    was applied to the 2D map signatures. In the experiments, the 2D longitude-latitude map

signatures were treated as 2D images and decomposed using the image-based Haar wavelet function.


Figure 6.2: Objects that are similar and belong to the same class will have similar 2D longitude-latitude signature maps (rows shown: human head, rabbit head, horse head, wildcat head).

The wavelet function decomposes the 2D image into approximation and detail

    coefficients. The approximation and detail coefficients at the second level were collected and

concatenated into a new feature vector with dimension d = 13134. This final feature

    vector became the descriptor for each object in the database and was used for classification

    and retrieval. For most experiments, the noise reduction step was not found to improve the

    classification and retrieval performances except for the SHREC dataset (Section 6.2.4).
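The decomposition step might look like the following sketch, with PyWavelets as an assumed implementation of the two-level Haar transform:

    import numpy as np
    import pywt

    def wavelet_signature(salient_map):
        # Two-level Haar decomposition of the 2D map signature, treated as an image.
        coeffs = pywt.wavedec2(np.asarray(salient_map, dtype=float), "haar", level=2)
        cA2, (cH2, cV2, cD2) = coeffs[0], coeffs[1]  # level-2 approximation + details
        return np.concatenate([c.ravel() for c in (cA2, cH2, cV2, cD2)])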

    6.2 Classification using 2D Map Signature

By creating a signature for each 3D object, it is now possible to perform classification of the 3D objects in a given database. Several classification experiments were performed on each of the acquired datasets described in Chapter 3.


    6.2.1 Classification of 22q11.2DS Dataset

    The goal of this experiment was to classify each individual in the dataset as either affected

or unaffected by 22q11.2DS and to measure the classification accuracy. The salient point

    classifier was trained on a subset of the craniofacial anthropometric landmarks marked

    on 3D human head models as explained in Chapter 5. Table 6.1 shows the classification

    performance with two different classifiers: Adaboost and SVM. Evaluation was done using

    the following measures: classification accuracy, precision and recall rates, F-measure, true

    positive, and false positive rates. The classification accuracy for the higher scoring SVM

    classifier is 86.7%, which is higher than that obtained from a study of three human experts

    whose mean accuracy was 72.5% [92].

Table 6.1: Classification performance for 22q11.2DS.

    Classifier   Accuracy   Prec    Recall   F-Measure   TP Rate   FP Rate
    Adaboost     0.804      0.795   0.804    0.791       0.804     0.387
    SVM          0.867      0.866   0.868    0.861       0.868     0.27

    The classification accuracy of the map signature was compared to some of the state-of-

    the-art and best performing 3D object descriptors in the literature. The following existing

    descriptors were used for comparison: Light Field Descriptor (LFD) [16], ray-based spherical

    harmonics (SPH) [39], shape distribution of distance between random points (D2) [62], and

    absolute angle distance histogram (AAD) [59]. The Light Field Descriptor (LFD) is a view-

    based descriptor that extracts features from 100 2D silhouette image views and measures

    the distance between two 3D objects by finding the best correspondence between the set

of 3D views for the two objects. The Spherical Harmonics method calculates the maximal

    extent of a shape across all rays from the origin and uses spherical harmonics to represent

    the function. The shape function D2 represents 3D objects by calculating the global shape

    distribution of distances between two random points, while the AAD method enhances

    the D2 shape function by measuring not only the distance between two random points,

    but also the mutual orientation of the surfaces on which the pair of points is located.


    6.2.2 Classification of Deformational Plagiocephaly Dataset

    The goal of this experiment was to classify each individual as either control or case affected

    by the plagiocephaly condition and to measure the classification accuracy. The salient

points for the map signature were obtained by using the salient flat point classifier as

    explained in Chapter 5. The classification experiments were performed on the Deformational

    Plagiocephaly Dataset introduced in Chapter 3.

Table 6.4 shows the classification accuracy of the method on the full 254-individual dataset. The ground truth for the classification was the patient status (case or control) originally assigned by the referring doctor. Table 6.5 shows the classification accuracy of the method on the trimmed 140-individual dataset on which the experts agreed. The Adaboost classifier obtains an 80.3% classification accuracy on the full dataset and an improved 87.9% accuracy on the trimmed dataset.

Table 6.4: Classification performance for plagiocephaly using the full 254-individual dataset.

    Classifier   Accuracy   Prec    Recall   F-Measure   TP Rate   FP Rate
    Adaboost     0.803      0.805   0.803    0.804       0.803     0.208
    SVM          0.787      0.787   0.787    0.787       0.787     0.233

Table 6.5: Classification performance for plagiocephaly using the trimmed 140-individual dataset.

    Classifier   Accuracy   Prec    Recall   F-Measure   TP Rate   FP Rate
    Adaboost     0.879      0.878   0.879    0.878       0.879     0.156
    SVM          0.85       0.849   0.85     0.849       0.85      0.19

    The classification accuracy of the methodology for this application was also compared

    to existing state-of-the-art descriptors. Table 6.6 shows that the 2D salient map signature

    achieves higher classification accuracy for deformational plagiocephaly than other existing

    methods, including the LFD descriptor and others discussed in Chapter 2.


Table 6.6: Comparison of classification accuracy for plagiocephaly.

    Dataset                Salient 2D map   LFD     SPH     D2      AAD
    Full 254 dataset       0.803            0.72    0.673   0.650   0.685
    Trimmed 140 dataset    0.879            0.714   0.743   0.779   0.721

    Classification of this condition can be incorporated into epidemiologic research on the

    prevalence and long-term outcome of deformational plagiocephaly, which may eventually

    lead to improved clinical care for infants with deformational plagiocephaly.

    6.2.3 Classification of Heads Dataset

    The Heads database can be thought of as a first step toward testing the 2D salient map

    signature on more general shapes still in the craniofacial category, but for multiple different

    animals where face shapes can be quite different.

In the first set of experiments, all objects in the Heads database were pose-normalized by rotating the heads to face the same orientation, as was the case for the medical craniofacial datasets. Classification of the 3D objects in the database was performed by training an SVM classifier on the salient point patterns of each class using the 2D longitude-latitude map signatures of the objects in the class. The classifier was trained using the signatures of 25 objects from each of the seven classes in the database and tested with a separate test set consisting of 50 objects per class. The classifier achieved 100% classification accuracy in classifying all the pose-normalized objects in the database.

    Since 3D objects may be encountered in the world at any orientation, rota