Mathematical Modeling and Classification of Eye Disease
Srinivasan Parthasarathy
Joint work with M. Bullimore, K. Marsolo and
M. Twa
Aspects of this work are funded by the NIH, NSF and the DOE.
Desiderata for Clinical Diagnosis
• Should be accurate and ideally interoperable
  – Can we use mathematical modeling?
  – Can we improve accuracy by meta-learning?
• Should be interpretable
  – Can we visualize the decision-making process effectively?
• Should be responsive
  – Can we leverage distributed computing tools to speed up the process?
Synopsis of Approach
Data Modeling → Feature Extraction → Classification → Visualization & Interpretation
Ocular Anatomy 101
Case Study: Keratoconus
• Progressive, degenerative, non-inflammatory disease
  – A leading cause of blindness and corneal transplant
• Early detection is difficult & important
  – Has implications for eye surgery and control of disease
  – Initial symptoms: minor fluctuations in corneal shape
• Diagnosis procedure
  – Video-keratography exam
  – Manual analysis of results by clinician
• Challenges to detection
  – Voluminous data
    • one image is 1000s of data points representing the corneal surface
    • spatial and temporal (longitudinal)
  – Features of interest small in scale relative to the mean shape
  – Leads to variance in prognosis
[Images: late-stage keratoconus vs. normal (clinically ideal) cornea]
Raw Data Description
• Corneal surface represented as a ~7000-point matrix
  – Output of video-keratographic device
  – Fixed angular sampling in concentric circles around the center
• Data stored in cylindrical coordinates:
  – Radius (ρ)
  – Angle (θ)
  – Height / Elevation (z)
• 3 classes – keratoconus, surgically repaired (LASIK), normal
• 508 eyes from 254 people (L–R normalization)
• Can we use this data to construct a 3-D surface of the cornea?
• How to model?
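The cylindrical samples above map directly to Cartesian points for reconstructing the 3-D surface; a minimal sketch (the sample values are invented for illustration):

```python
import math

def cylindrical_to_cartesian(points):
    """Convert (rho, theta, z) samples to (x, y, z).

    Illustrative assumption: rho in mm, theta in radians, z = elevation.
    """
    return [(rho * math.cos(theta), rho * math.sin(theta), z)
            for rho, theta, z in points]

# One sample on a 2 mm ring at 90 degrees, elevation 0.1 mm
pts = cylindrical_to_cartesian([(2.0, math.pi / 2, 0.1)])
```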
Modeling
• Desired properties
  – Reflect general shape and structure of the entity of interest
  – Compact representation
  – Capable of capturing important features such as local harmonics
  – Capable of tuning to the desired resolution
  – Capable of dealing with multiple dimensions
• Options evaluated
  – Zernike polynomials, pseudo-Zernike polynomials, wavelets
Modeling: Zernike and Pseudo-Zernike Polynomials
• Hyper-geometric radial basis functions
• Each term (mode) in the series represents a 3-D geometric surface
  – The value of each term represents the (independent) contribution of that mode to the overall surface
• Benefits:
  – Lower-order modes correlate with general surface features of the cornea
  – Higher-order modes capture local harmonics
  – Orthogonal
  – Anatomic correspondence to clinical concepts
• Drawbacks:
  – Can be computationally expensive
  – Can model noise as well, especially at higher orders
Details
• Pseudo-Zernike (PZ)
• Zernike (Z)
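For reference, the textbook radial definitions of the two families (standard forms, not recovered from the slide's equations):

```latex
% Zernike radial polynomial (defined only when n - |m| is even):
R_n^{|m|}(\rho) = \sum_{k=0}^{(n-|m|)/2}
  \frac{(-1)^k\,(n-k)!}{k!\,\left(\frac{n+|m|}{2}-k\right)!\,\left(\frac{n-|m|}{2}-k\right)!}\;\rho^{\,n-2k}

% Pseudo-Zernike radial polynomial (any n \ge |m| \ge 0):
R_n^{|m|}(\rho) = \sum_{k=0}^{n-|m|}
  \frac{(-1)^k\,(2n+1-k)!}{k!\,(n+|m|+1-k)!\,(n-|m|-k)!}\;\rho^{\,n-k}
```

The full mode is $R_n^{|m|}(\rho)\,e^{im\theta}$ (or cos/sin for real-valued forms). Through order $n$, pseudo-Zernike has $(n+1)^2$ modes versus $(n+1)(n+2)/2$ for Zernike, which is why the later "same order" vs. "same number of coefficients" comparisons differ.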
Z & PZ Transformation Algorithm
• Compute least-squares fit between model and original data
• Then use the coefficients as the feature vector for classification
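The fitting step can be sketched with a toy mode set; the modes and synthetic data below are illustrative stand-ins, not the 4th–10th order models used in the study:

```python
import numpy as np

def zernike_design_matrix(rho, theta):
    """Columns: Z_0^0 (piston), Z_1^1 (tilt x), Z_1^-1 (tilt y), Z_2^0 (defocus)."""
    return np.column_stack([
        np.ones_like(rho),
        rho * np.cos(theta),
        rho * np.sin(theta),
        2.0 * rho**2 - 1.0,
    ])

# Synthetic "corneal" elevations generated from known coefficients
rng = np.random.default_rng(0)
rho = rng.uniform(0, 1, 200)
theta = rng.uniform(0, 2 * np.pi, 200)
true_coeffs = np.array([0.5, 0.1, -0.2, 0.8])
z = zernike_design_matrix(rho, theta) @ true_coeffs

# Least-squares fit; the coefficients become the classifier's feature vector
coeffs, *_ = np.linalg.lstsq(zernike_design_matrix(rho, theta), z, rcond=None)
```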
Wavelet Modeling
1. Convert to a 1-dimensional signal
   A. Sample along concentric circles (Klyce & Smolek)
      • Use standard 1-D wavelets to model the signal and use the coefficients to classify
   B. Sample along a space-filling curve (us)
      • Key idea is to maintain spatial coherence
      • Classify as above (works better than A)
2. Apply 2-dimensional wavelet models (us)
   • Use coefficients to classify (does not work as well as 1.B)
• Pros
  – Fast and efficient
• Cons
  – Overall performance worse (5–10%) than Zernike-based approaches
  – No anatomic correspondence – difficult to interpret
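As a dependency-free illustration of option 1A, here is one Haar level over elevations flattened ring by ring (the values are invented; a real pipeline would use a wavelet library and multiple decomposition levels):

```python
def haar_level(signal):
    """One Haar level: returns (approximation, detail) coefficient lists.

    Assumes an even-length signal; pairwise averages and differences.
    """
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

# Elevations sampled ring by ring (innermost first), flattened in ring order
ring_samples = [0.10, 0.12, 0.11, 0.13, 0.30, 0.32, 0.31, 0.35]
approx, detail = haar_level(ring_samples)
features = approx + detail  # wavelet-coefficient feature vector
```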
Experiments: Model Fidelity
• Model error of Z vs. PZ?
• Model error on different patient classes?
• What set of parameters provides the best model fit?
• Transformation parameters:
  – Polynomial order: 4th – 10th
    • The larger the order, the more it may model signal + noise!
  – Radius: 2.0, 2.5, 3.0, 3.5 mm (max)
    • The larger the radius, the more points there are to model
Results: Model Fidelity
• General trend:
  – Increasing polynomial order decreases error.
  – Increasing transformation radius increases error.
• Same order: Z > PZ (error)
• Same # of coefficients: Z ≈ PZ
• Between patient classes: keratoconus > LASIK > normal
Classification and Clinical Decision Support
• Prefer transparent algorithms over "black-box" classifiers
  – Use simple classifiers and provide a way to visually explore the decision-making process
• Desire high accuracy
  – Use an ensemble of simple and interpretable classifiers
• Desire efficiency
  – Use the NetSolve distributed computing tool
Basic Classification Performance
• Accuracy of classifiers based on PZ vs. Z?
  – Zernike works better
• Which classifiers work well?
  – C4.5 (84–85%), Naïve Bayes (84%), VFI (84%), neural networks (81%), one-vs-all SVM (82%)
  – SVMs and NNs are also difficult to explain (interpretability)
• Performance of ensemble techniques
  – Boosting, bagging and random forests
  – All improve performance (3–4%)
  – Bagging preferred – easy to interpret, performance marginally better
• More accurate model = higher classification accuracy?
  – C4.5 (4th order works best, but others are not bad)
  – NB/VFI/NN/SVM (higher orders do not work well – noise or irrelevant features hamper performance)
Ensemble Learning
• Combine the results of multiple classification models built from different samples of the dataset to improve accuracy.
  – Training data represents a sample of the population.
  – A classifier built on one sample can "overfit" and model noise.
  – Constructing multiple models can filter noise and reduce generalization error [Breiman, 1996].
• Traditional methods:
  – Bootstrap aggregation (bagging)
  – Boosting
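A minimal sketch of bagging, with a trivial threshold learner standing in for C4.5 (the data and learner are synthetic stand-ins):

```python
import random
from collections import Counter

def train_stump(sample):
    """Stand-in learner: threshold at the midpoint between class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    m0 = sum(zeros) / len(zeros) if zeros else 0.0
    m1 = sum(ones) / len(ones) if ones else 1.0
    t = (m0 + m1) / 2
    return lambda x: int(x > t)

def bag(data, n_models, rng):
    """Train one stump per bootstrap resample; predict by majority vote."""
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
predict = bag(data, n_models=11, rng=rng)
```

An odd number of models avoids ties in the binary vote; each resample is the same size as the training set, as in Breiman's formulation.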
Spatial Averaging (SA)
• Use classifiers built on different resolutions and models of the dataset to improve accuracy.
  – Build a classifier for each spatial transformation and resolution.
  – Take the modal label of the classifiers to reach the final decision.
  – Can be viewed as a "structured column bagging" algorithm.
• Intuition:
  – Lower-order transformations result in a more general, global model.
  – Higher-order transformations are better at capturing local harmonics, but can model noise.
  – If errors are uncorrelated, SA should smooth noise effects.
Spatial Averaging
[Diagram: original data → spatial transformations into 4th–10th order Zernike (4Z–10Z) and pseudo-Zernike (4PZ–10PZ) models → 1. Transform data; 2. Classify with one tree per model; 3. Tally the votes]
Spatial Averaging with Sub-Selection (Combined)
[Diagram: as the full Spatial Averaging pipeline, but using only a sub-selected set of models (4Z, 7Z, 10Z, 4PZ, 7PZ): 1. Transform data; 2. Classify; 3. Tally the votes]
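The vote-tallying step reduces to taking the modal label over per-transformation predictions; a sketch with mocked votes (in the real pipeline each vote comes from a decision tree trained on that Z/PZ model):

```python
from collections import Counter

def spatial_average(votes_per_model):
    """Modal label across the per-transformation classifier outputs."""
    return Counter(votes_per_model).most_common(1)[0][0]

# Mock votes for one eye, e.g. from trees built on the 4Z-10Z and 4PZ-10PZ
# models; K = keratoconus, L = LASIK, N = normal
votes = ["K", "K", "N", "K", "L", "K", "K", "N", "K", "K", "K", "N", "K", "L"]
label = spatial_average(votes)
```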
Experiments
• How does SA compare to a single decision tree?
• How does SA compare to traditional ensemble-learning methods?
• Can SA be combined with ensemble-learning methods to further improve results?
• Ensemble methods evaluated include:
  – Bagging, boosting, random forests
Spatial Averaging vs. C4.5
• 10-fold cross-validation
• Zernike-based SA
  – 7 trees (4th to 10th order)
  – 3–5% improvement over an individual tree
• PZ-based SA
  – Up to 7% improvement
• Combined SA classifier (5 trees)
  – Accuracy of 91.1%, a 6–10% improvement
  – Rationale: clinically, it appears that PZ and Z do better on different varieties of keratoconus
[Chart: classification accuracy (%) for individual trees at orders 4–10 and for SA, shown for Zernike, Pseudo-Zernike, and Combined; y-axis 74–92%]
SA and Traditional Ensemble-Learning
• Combined SA (5 trees) outperforms boosted (10) or bagged C4.5 (10)
• Bagging does marginally better than boosting & RF (not shown)
• Combined + bagging (5)
  – 94.1% accuracy
  – However, it trades off interpretability (5×5 trees) for accuracy
[Chart: accuracy (%) of Bag (4th), Boost (4th), and Spatial Averaging for Zernike, Pseudo-Zernike, Combined, and Combined+Bagging; y-axis 80–96%]
Visualization of Results
• Task: visualize results to provide decision support for clinicians
  – Give intuition as to why a group of patients are classified the way they are
  – Contrast an individual patient with others in the same group
• How?
  – Modes of the Zernike/pseudo-Zernike polynomials correspond to specific features of the cornea
Patient-Specific Decision Surface
1. Treat each path through the decision tree as a 'rule.'
2. Cluster training data by rule.
3. Compute average coefficient values for each cluster.
4. Given a patient, classify and keep the 'rule coefficients'; set others to zero.
5. Construct the overall surface and the 'rule surface.'
[Diagram: decision tree splitting on coefficients such as C_4^0, C_3^1, C_2^2, C_0^0, C_3^3, C_4^2 with thresholds (e.g. ≤ 2.88, ≤ 9.34, ≤ −401.83, ≤ −4.04, ≤ 1.42, ≤ −0.07, ≤ 1.00, ≤ −0.31); leaves labeled K (keratoconus), L (LASIK), N (normal)]
Patient-Specific Decision Surface
Create surfaces using:
1. All patient coefficients.
2. All rule mean coefficients.
3. Patient coefficients used in the classifying rule (rest zero).
4. Rule mean coefficients used in the classifying rule (rest zero).
Also:
• Bar chart with relative error between patient and rule mean coefficients.
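Surfaces 3 and 4 amount to masking coefficients outside the classifying rule's path; a sketch (coefficient names and values are illustrative, not taken from the study):

```python
def rule_surface_coeffs(patient_coeffs, rule_terms):
    """Keep only coefficients tested along the classifying rule's path."""
    return {term: (val if term in rule_terms else 0.0)
            for term, val in patient_coeffs.items()}

# Hypothetical patient feature vector and classifying-rule terms
patient = {"C_0^0": 2.1, "C_1^3": -0.4, "C_2^2": 0.9, "C_4^0": 3.3}
rule = {"C_4^0", "C_1^3"}  # terms tested on the path to the leaf
masked = rule_surface_coeffs(patient, rule)
```

Rendering the masked coefficients through the Zernike expansion then gives the 'rule surface' that isolates the features the tree actually used.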
Visualization: Strongest Rules
Rule 1 - Keratoconus Rule 8 - Normal Rule 4 - LASIK
High Performance Results
• Optimize and parallelize (5 nodes) key steps of the code over a grid environment
  – M – unoptimized algorithm
  – NS – NetSolve version
• Times shown are for computing one decision tree using a particular model (Z/PZ) and include model-building time.
Case Study: Glaucoma
• Progressive neuropathy of the optic nerve
• Disease characteristics
  – Symptom free
  – Elevated intraocular pressure
  – Structural loss of retinal ganglion cells
  – Gradual restriction of the visual field from the periphery to the center
Clinical Management of Glaucoma
• Monitoring Intraocular pressure
• Static threshold visual field sensitivity
• Observations of structural change at the optic nerve head
[Images: normal vs. glaucoma optic nerve head]
Topographic Modeling
• Objectives– Feature reduction– Preservation of spatial correlation
• Polynomial Modeling– Zernike– Pseudo-Zernike
• Spline Modeling– Knot locations, coefficients
• Wavelet Modeling– 1D vs. 2D
Concluding Remarks
• Modeling and classifying corneal shape
  – Low-order Zernike polynomials provide an adequate model of the corneal surface.
  – Higher-order polynomials begin to model noise, but may contain a few useful features for classification.
  – Decision trees provide classification accuracy greater than or equal to other classification methods.
  – Accuracy can be further improved by using the SA strategy.
• Visualization:
  – Using classification attributes as the basis for visualization provides a method of decision support for clinicians.
• High-performance implementations can help
• Modeling and classifying glaucoma – ongoing
General thoughts on Interdisciplinary Collaboration
• Steep learning curve
  – Need to learn the language and requirements
  – Need to express results in the domain language
• Patience, patience, patience
  – Communities are inertia-bound
  – Often difficult to make headway
• Potential for incredible rewards
  – Scientific/medical implications
• Good working relationship essential
  – Equal partners
[Plots: 900 µm diameter fit – average RMS error (µm) vs. number of coefficients (0–150), and vs. polynomial order (0–15), for Zernike and Pseudo-Zernike; errors range roughly 40–140 µm]
How is RMS related to Class?
• Greater variance in "glaucoma"
• Higher mean in "glaucoma"
[Box plot: RMS error (µm) vs. maximum fit radius (900–1200 µm) for normal and glaucoma eyes]
Conclusions
Single Decision Tree
[Diagram: single decision tree splitting on coefficients such as C_0^0, C_3^1, C_3^3, C_2^2 with thresholds (e.g. ≤ 2.77, ≤ −0.37, ≤ 1.93, ≤ −433.3, ≤ 1.27); leaves labeled K and N]
Decision Surface
Data
• 254 patient records
• 3 patient categories
  – Normal (119)
  – Diseased (99)
  – Post-LASIK (36)
Imaging: Scripting Crop Routines
Imaging: Scripting Centering Routines
• 2-D mean ± SD
• What is the best center point (disc vs. cup)?
• Failure associated with class assignment
  – Normals fail more often
[Image: re-centered result]