Mathematical Modeling and Classification of Eye Disease
Srinivasan Parthasarathy
Joint work with M. Bullimore, K. Marsolo and
M. Twa
Aspects of this work are funded by the NIH, NSF and the DOE.
Desiderata for Clinical Diagnosis
• Should be accurate and ideally interoperable
  – Can we use mathematical modeling?
  – Can we improve accuracy by meta-learning?
• Should be interpretable
  – Can we visualize the decision-making process effectively?
• Should be responsive
  – Can we leverage distributed computing tools to speed up the process?
Synopsis of Approach
Data Modeling → Feature Extraction → Classification → Visualization & Interpretation
Ocular Anatomy 101
Case Study: Keratoconus
• Progressive, degenerative, non-inflammatory disease
  – A leading cause of blindness and corneal transplant
• Early detection is difficult & important
  – Has implications for eye surgery and control of disease
  – Initial symptoms: minor fluctuations in corneal shape
• Diagnosis procedure
  – Video-keratography exam
  – Manual analysis of results by clinician
• Challenges to detection
  – Voluminous data
    • one image is 1000s of data points representing the corneal surface
    • spatial and temporal (longitudinal)
  – Features of interest small in scale relative to the mean shape
  – Leads to variance in prognosis
[Images: late-stage keratoconus vs. normal (clinically ideal) cornea]
Raw Data Description
• Corneal surface represented as a ~7000-point matrix
  – Output of video-keratographic device
  – Fixed angular sampling in concentric circles around the center
• Data stored in cylindrical coordinates:
  – Radius (ρ)
  – Angle (θ)
  – Height / Elevation (z)
• 3 classes – keratoconus, surgically repaired (LASIK), normal
• 508 eyes from 254 people (L–R normalization)
• Can we use this data to construct a 3-D surface of the cornea?
• How to model?
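The cylindrical samples above map directly to Cartesian points for reconstructing the 3-D surface; a minimal sketch (the sample values are invented for illustration):

```python
import math

def cylindrical_to_cartesian(points):
    """Convert (rho, theta, z) samples to (x, y, z).

    Illustrative assumption: rho in mm, theta in radians, z = elevation.
    """
    return [(rho * math.cos(theta), rho * math.sin(theta), z)
            for rho, theta, z in points]

# One sample on a 2 mm ring at 90 degrees, elevation 0.1 mm
pts = cylindrical_to_cartesian([(2.0, math.pi / 2, 0.1)])
```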
Modeling
• Desired properties
  – Reflect general shape and structure of the entity of interest
  – Compact representation
  – Capable of capturing important features such as local harmonics
  – Capable of tuning to the desired resolution
  – Capable of dealing with multiple dimensions
• Options evaluated
  – Zernike polynomials, pseudo-Zernike polynomials, wavelets
Modeling: Zernike and Pseudo-Zernike Polynomials
• Hyper-geometric radial basis functions
• Each term (mode) in the series represents a 3-D geometric surface
  – The value of each term represents the (independent) contribution of that mode to the overall surface
• Benefits:
  – Lower-order modes correlate with general surface features of the cornea
  – Higher-order modes capture local harmonics
  – Orthogonal
  – Anatomic correspondence to clinical concepts
• Drawbacks:
  – Can be computationally expensive
  – Can model noise as well, especially at higher orders
Details
• Pseudo-Zernike (PZ)
• Zernike (Z)
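For reference, the textbook radial definitions of the two families (standard forms, not recovered from the slide's equations):

```latex
% Zernike radial polynomial (defined only when n - |m| is even):
R_n^{|m|}(\rho) = \sum_{k=0}^{(n-|m|)/2}
  \frac{(-1)^k\,(n-k)!}{k!\,\left(\frac{n+|m|}{2}-k\right)!\,\left(\frac{n-|m|}{2}-k\right)!}\;\rho^{\,n-2k}

% Pseudo-Zernike radial polynomial (any n \ge |m| \ge 0):
R_n^{|m|}(\rho) = \sum_{k=0}^{n-|m|}
  \frac{(-1)^k\,(2n+1-k)!}{k!\,(n+|m|+1-k)!\,(n-|m|-k)!}\;\rho^{\,n-k}
```

The full mode is $R_n^{|m|}(\rho)\,e^{im\theta}$ (or cos/sin for real-valued forms). Through order $n$, pseudo-Zernike has $(n+1)^2$ modes versus $(n+1)(n+2)/2$ for Zernike, which is why the later "same order" vs. "same number of coefficients" comparisons differ.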
Z & PZ Transformation Algorithm
• Compute least-squares fit between model and original data
• Then use the coefficients as the feature vector for classification
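The fitting step can be sketched with a toy mode set; the modes and synthetic data below are illustrative stand-ins, not the 4th–10th order models used in the study:

```python
import numpy as np

def zernike_design_matrix(rho, theta):
    """Columns: Z_0^0 (piston), Z_1^1 (tilt x), Z_1^-1 (tilt y), Z_2^0 (defocus)."""
    return np.column_stack([
        np.ones_like(rho),
        rho * np.cos(theta),
        rho * np.sin(theta),
        2.0 * rho**2 - 1.0,
    ])

# Synthetic "corneal" elevations generated from known coefficients
rng = np.random.default_rng(0)
rho = rng.uniform(0, 1, 200)
theta = rng.uniform(0, 2 * np.pi, 200)
true_coeffs = np.array([0.5, 0.1, -0.2, 0.8])
z = zernike_design_matrix(rho, theta) @ true_coeffs

# Least-squares fit; the coefficients become the classifier's feature vector
coeffs, *_ = np.linalg.lstsq(zernike_design_matrix(rho, theta), z, rcond=None)
```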
Wavelet Modeling
1. Convert to a 1-dimensional signal
   A. Sample along concentric circles (Klyce & Smolek)
      • Use standard 1-D wavelets to model the signal and use the coefficients to classify
   B. Sample along a space-filling curve (us)
      • Key idea is to maintain spatial coherence
      • Classify as above (works better than A)
2. Apply 2-dimensional wavelet models (us)
   • Use coefficients to classify (does not work as well as 1.B)
• Pros
  – Fast and efficient
• Cons
  – Overall performance worse (5–10%) than Zernike-based approaches
  – No anatomic correspondence – difficult to interpret
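As a dependency-free illustration of option 1A, here is one Haar level over elevations flattened ring by ring (the values are invented; a real pipeline would use a wavelet library and multiple decomposition levels):

```python
def haar_level(signal):
    """One Haar level: returns (approximation, detail) coefficient lists.

    Assumes an even-length signal; pairwise averages and differences.
    """
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

# Elevations sampled ring by ring (innermost first), flattened in ring order
ring_samples = [0.10, 0.12, 0.11, 0.13, 0.30, 0.32, 0.31, 0.35]
approx, detail = haar_level(ring_samples)
features = approx + detail  # wavelet-coefficient feature vector
```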
Experiments: Model Fidelity
• Model error of Z vs. PZ?
• Model error on different patient classes?
• What set of parameters provides the best model fit?
• Transformation parameters:
  – Polynomial order: 4th – 10th
    • The larger the order, the more it may model signal + noise!
  – Radius: 2.0, 2.5, 3.0, 3.5 mm (max)
    • The larger the radius, the more points there are to model
Results: Model Fidelity
• General trend:
  – Increasing polynomial order decreases error.
  – Increasing transformation radius increases error.
• Same order: Z > PZ (error)
• Same # of coefficients: Z ≈ PZ
• Between patient classes: keratoconus > LASIK > normal
Classification and Clinical Decision Support
• Prefer transparent algorithms over "black-box" classifiers
  – Use simple classifiers and provide a way to visually explore the decision-making process
• Desire high accuracy
  – Use an ensemble of simple and interpretable classifiers
• Desire efficiency
  – Use the NetSolve distributed computing tool
Basic Classification Performance
• Accuracy of classifiers based on PZ vs. Z?
  – Zernike works better
• Which classifiers work well?
  – C4.5 (84–85%), Naïve Bayes (84%), VFI (84%), neural networks (81%), one-vs-all SVM (82%)
  – SVMs and NNs are also difficult to explain (interpretability)
• Performance of ensemble techniques
  – Boosting, bagging and random forests
  – All improve performance (3–4%)
  – Bagging preferred – easy to interpret, performance marginally better
• More accurate model = higher classification accuracy?
  – C4.5 (4th order works best, but others are not bad)
  – NB/VFI/NN/SVM (higher orders do not work well – noise or irrelevant features hamper performance)
Ensemble Learning
• Combine the results of multiple classification models built from different samples of the dataset to improve accuracy.
  – Training data represents a sample of the population.
  – A classifier built on one sample can "overfit" and model noise.
  – Constructing multiple models can filter noise and reduce generalization error [Breiman, 1996].
• Traditional methods:
  – Bootstrap aggregation (bagging)
  – Boosting
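A minimal sketch of bagging, with a trivial threshold learner standing in for C4.5 (the data and learner are synthetic stand-ins):

```python
import random
from collections import Counter

def train_stump(sample):
    """Stand-in learner: threshold at the midpoint between class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    m0 = sum(zeros) / len(zeros) if zeros else 0.0
    m1 = sum(ones) / len(ones) if ones else 1.0
    t = (m0 + m1) / 2
    return lambda x: int(x > t)

def bag(data, n_models, rng):
    """Train one stump per bootstrap resample; predict by majority vote."""
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
predict = bag(data, n_models=11, rng=rng)
```

An odd number of models avoids ties in the binary vote; each resample is the same size as the training set, as in Breiman's formulation.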
Spatial Averaging (SA)
• Use classifiers built on different resolutions and models of the dataset to improve accuracy.
  – Build a classifier for each spatial transformation and resolution.
  – Take the modal label of the classifiers to reach the final decision.
  – Can be viewed as a "structured column bagging" algorithm.
• Intuition:
  – Lower-order transformations result in a more general, global model.
  – Higher-order transformations are better at capturing local harmonics, but can model noise.
  – If errors are uncorrelated, SA should smooth noise effects.
Spatial Averaging
[Diagram: original data → spatial transformations into 4th–10th order Zernike (4Z–10Z) and pseudo-Zernike (4PZ–10PZ) models → 1. Transform data; 2. Classify with one tree per model; 3. Tally the votes]
Spatial Averaging with Sub-Selection (Combined)
[Diagram: as the full Spatial Averaging pipeline, but using only a sub-selected set of models (4Z, 7Z, 10Z, 4PZ, 7PZ): 1. Transform data; 2. Classify; 3. Tally the votes]
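The vote-tallying step reduces to taking the modal label over per-transformation predictions; a sketch with mocked votes (in the real pipeline each vote comes from a decision tree trained on that Z/PZ model):

```python
from collections import Counter

def spatial_average(votes_per_model):
    """Modal label across the per-transformation classifier outputs."""
    return Counter(votes_per_model).most_common(1)[0][0]

# Mock votes for one eye, e.g. from trees built on the 4Z-10Z and 4PZ-10PZ
# models; K = keratoconus, L = LASIK, N = normal
votes = ["K", "K", "N", "K", "L", "K", "K", "N", "K", "K", "K", "N", "K", "L"]
label = spatial_average(votes)
```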
Experiments
• How does SA compare to a single decision tree?
• How does SA compare to traditional ensemble-learning methods?
• Can SA be combined with ensemble-learning methods to further improve results?
• Ensemble methods evaluated include:
  – Bagging, boosting, random forests
Spatial Averaging vs. C4.5
• 10-fold cross-validation
• Zernike-based SA
  – 7 trees (4th to 10th order)
  – 3–5% improvement over an individual tree
• PZ-based SA
  – Up to 7% improvement
• Combined SA classifier (5 trees)
  – Accuracy of 91.1%, a 6–10% improvement
  – Rationale: clinically, it appears that PZ and Z do better on different varieties of keratoconus
[Chart: classification accuracy (%) for individual trees at orders 4–10 and for SA, shown for Zernike, Pseudo-Zernike, and Combined; y-axis 74–92%]
SA and Traditional Ensemble-Learning
• Combined SA (5 trees) outperforms boosted (10) or bagged C4.5 (10)
• Bagging does marginally better than boosting & RF (not shown)
• Combined + bagging (5)
  – 94.1% accuracy
  – However, it trades off interpretability (5×5 trees) for accuracy
[Chart: accuracy (%) of Bag (4th), Boost (4th), and Spatial Averaging for Zernike, Pseudo-Zernike, Combined, and Combined+Bagging; y-axis 80–96%]
Visualization of Results
• Task: visualize results to provide decision support for clinicians
  – Give intuition as to why a group of patients are classified the way they are
  – Contrast an individual patient with others in the same group
• How?
  – Modes of the Zernike/pseudo-Zernike polynomials correspond to specific features of the cornea
Patient-Specific Decision Surface
1. Treat each path through the decision tree as a 'rule.'
2. Cluster training data by rule.
3. Compute average coefficient values for each cluster.
4. Given a patient, classify and keep the 'rule coefficients'; set others to zero.
5. Construct the overall surface and the 'rule surface.'
[Diagram: decision tree splitting on coefficients such as C_4^0, C_3^1, C_2^2, C_0^0, C_3^3, C_4^2 with thresholds (e.g. ≤ 2.88, ≤ 9.34, ≤ −401.83, ≤ −4.04, ≤ 1.42, ≤ −0.07, ≤ 1.00, ≤ −0.31); leaves labeled K (keratoconus), L (LASIK), N (normal)]
Patient-Specific Decision Surface
Create surfaces using:
1. All patient coefficients.
2. All rule mean coefficients.
3. Patient coefficients used in the classifying rule (rest zero).
4. Rule mean coefficients used in the classifying rule (rest zero).
Also:
• Bar chart with relative error between patient and rule mean coefficients.
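Surfaces 3 and 4 amount to masking coefficients outside the classifying rule's path; a sketch (coefficient names and values are illustrative, not taken from the study):

```python
def rule_surface_coeffs(patient_coeffs, rule_terms):
    """Keep only coefficients tested along the classifying rule's path."""
    return {term: (val if term in rule_terms else 0.0)
            for term, val in patient_coeffs.items()}

# Hypothetical patient feature vector and classifying-rule terms
patient = {"C_0^0": 2.1, "C_1^3": -0.4, "C_2^2": 0.9, "C_4^0": 3.3}
rule = {"C_4^0", "C_1^3"}  # terms tested on the path to the leaf
masked = rule_surface_coeffs(patient, rule)
```

Rendering the masked coefficients through the Zernike expansion then gives the 'rule surface' that isolates the features the tree actually used.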
Visualization: Strongest Rules
Rule 1 - Keratoconus Rule 8 - Normal Rule 4 - LASIK
High Performance Results
• Optimize and parallelize (5 nodes) key steps of the code over a grid environment
  – M – unoptimized algorithm
  – NS – NetSolve version
• Times shown are for computing one decision tree using a particular model (Z/PZ) and include model-building time.
Case Study: Glaucoma
• Progressive neuropathy of the optic nerve
• Disease characteristics
  – Symptom free
  – Elevated intraocular pressure
  – Structural loss of retinal ganglion cells
  – Gradual restriction of the visual field from the periphery to the center
Clinical Management of Glaucoma
• Monitoring Intraocular pressure
• Static threshold visual field sensitivity
• Observations of structural change at the optic nerve head
[Images: normal vs. glaucoma optic nerve head]
Topographic Modeling
• Objectives– Feature reduction– Preservation of spatial correlation
• Polynomial Modeling– Zernike– Pseudo-Zernike
• Spline Modeling– Knot locations, coefficients
• Wavelet Modeling– 1D vs. 2D
Concluding Remarks
• Modeling and classifying corneal shape
  – Low-order Zernike polynomials provide an adequate model of the corneal surface.
  – Higher-order polynomials begin to model noise, but may contain a few useful features for classification.
  – Decision trees provide classification accuracy greater than or equal to other classification methods.
  – Accuracy can be further improved by using the SA strategy.
• Visualization:
  – Using classification attributes as the basis for visualization provides a method of decision support for clinicians.
• High-performance implementations can help
• Modeling and classifying glaucoma – ongoing
General thoughts on Interdisciplinary Collaboration
• Steep learning curve
  – Need to learn the language and requirements
  – Need to express results in the domain language
• Patience, patience, patience
  – Communities are inertia-bound
  – Often difficult to make headway
• Potential for incredible rewards
  – Scientific/medical implications
• Good working relationship essential
  – Equal partners
[Plots: 900 µm diameter fit – average RMS error (µm) vs. number of coefficients (0–150), and vs. polynomial order (0–15), for Zernike and Pseudo-Zernike; errors range roughly 40–140 µm]
How is RMS related to Class?
• Greater variance in "glaucoma"
• Higher mean in "glaucoma"
[Box plot: RMS error (µm) vs. maximum fit radius (900–1200 µm) for normal and glaucoma eyes]
Conclusions
Single Decision Tree
[Diagram: single decision tree splitting on coefficients such as C_0^0, C_3^1, C_3^3, C_2^2 with thresholds (e.g. ≤ 2.77, ≤ −0.37, ≤ 1.93, ≤ −433.3, ≤ 1.27); leaves labeled K and N]
Decision Surface
Data
• 254 patient records
• 3 patient categories
  – Normal (119)
  – Diseased (99)
  – Post-LASIK (36)
Imaging: Scripting Crop Routines
Imaging: Scripting Centering Routines
• 2-D mean ± SD
• What is the best center point (disc vs. cup)?
• Failure associated with class assignment
  – Normals fail more often
[Image: re-centered result]