1 evaluation of image segmentation methods jayaram k. udupa medical image processing group -...

34
1 EVALUATION OF IMAGE SEGMENTATION METHODS Jayaram K. Udupa Medical Image Processing Group - Department of Radiology University of Pennsylvania 423 Guardian Drive - 4th Floor Blockley Hall Philadelphia, Pennsylvania - 19104-6021

Upload: britton-lynch

Post on 25-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

1

EVALUATION OF IMAGE SEGMENTATION METHODS

Jayaram K. Udupa

Medical Image Processing Group - Department of Radiology University of Pennsylvania

423 Guardian Drive - 4th Floor Blockley HallPhiladelphia, Pennsylvania - 19104-6021

2

CAVA

CAVA: Computer-Aided Visualization and Analysis

The science underlying computerized methods of image processing, analysis, and visualization to facilitate new therapeutic strategies, basic clinical research, education, and training.

3

CAD

CAD: Computer-Aided Diagnosis

The science underlying computerized methods for the diagnosis of diseases via images

4

Image Segmentation

Recognition: Determining the object’s whereabouts in the scene. (humans > computer)

Delineation: Determining the object’s spatial extent and composition in the scene. (computer > humans)

In CAVA, Segmentation Delineation. Recognition is usually manual.

5

SEGMENTATION EVALUATION

Can be considered to consist of two components:

• Theoretical

Study mathematical equivalence among algorithms.

• Empirical

Study practical performance of algorithms in specific application domains.

6

SEGMENTATION EVALUATION: Theoretical

Segmentation approaches may be broadly classified into two groups:

• pI approaches Purely image based – rely mostly on information

available in the given image only.

• SM approaches Shape model based – employ prior shape models

for the objects of interest.

7

SEGMENTATION EVALUATION: Theoretical

pI approaches

Boundary-based: optimum boundary

active contours/surfaceslevel sets

Region-based clustering – kNN, CM, FCM

graph cutfuzzy connectedness MRFwatershedsoptimum partitioning (Mumford-Shah, Chan-Vese)

SM approaches

manual tracing Live wire Active Shape Active Appearance m-Reps atlas-based

8

SEGMENTATION EVALUATION: Theoretical

Fundamental challenges in image segmentation:

(Ch1) Are major pI frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them?

(Ch2) How to develop truly distinct methods constituting real

advance?

(Ch3) How to choose a method for a given application domain?

(Ch4) How to set an algorithm optimally for an application domain?

Currently any method A can be shown empirically to bebetter than any method B, even when they are equivalent.

9

SEGMENTATION EVALUATION: Theoretical

A general theory of image segmentation:

An idealized image F: a function

is a bounded open subset of

A digital image f: a function

f is a digitization of F. is a subset of

A delineation model M:

is a segment of image F, is a parameter vector.

Ciesielski, Udupa, SPIE Proceedings 6512:65120W-1-65120W-12, 2007.Ciesielski, Udupa, MIPG Technical Report 335, U of Pennsylvania, November 2007.

p

. .

n

.C

.Op,F

C .

O

10

SEGMENTATION EVALUATION: Theoretical

A delineation algorithm A: a mapping is a parameter vector, .

Algorithm A represents model M: a limiting process.As the resolution of f increases, S approaches O.

Algorithms A1 and A2 are model-equivalent: if there exists amodel M such that both A1 and A2 represent M.

S C

lim , . pf F

A M ,

.S,f

11

SEGMENTATION EVALUATION: Theoretical

(1) Theorem: The Malladi-Sethian-Vemuri (PAMI-17, 1995) level set algorithm is model equivalent to Udupa-Samarasekera (GMIP-58, 1996) fuzzy connectedness algorithm with gradient based fuzzy affinity.

(FC method has definite computational advantages over LS.)

(2) Audigier and Lotufo have shown by a different approach (Image Foresting Transform) equivalence between particular forms of watershed and fuzzy connectedness.

12

SEGMENTATION EVALUATION: Theoretical

Attributes used by some well known delineation models

Connectedness Gradient Texture Smoothness Shape Noise Optimization

Fuzzy Connectedness

Yes Gradient + homogeneity

affinity

Object feature affinity

No No Scale based FC

In RFC

Chan-Vese No No Yes Yes No No Yes

Mumford-Shah No No (not for edge detection)

Yes Yes No Yes Yes

KWT snake Boundary Yes No Yes No No Yes

Maladi-Sethian-Vemuri LS

Foreground when

expanding

Yes No No No No No

Live wire Boundary Yes Yes Yes User No Yes

Active shape Yes No No No Yes No Yes

Active appearance

Yes No Yes No Yes No Yes

Graph cut Usually not Yes Possible No Usually not

No Yes

Clustering No No Yes No No No Yes

13

SEGMENTATION EVALUATION: Empirical

From now on, we denote a digital image by

Need to specify Application Domain

, ,T B P

T :

B :

P :

Example: Estimating the volume of brain.

A body region -

Imaging protocol -

Application domain: A particular triple

A task -

Example: Head.

Example: T2 weighted MR imaging with a particular set of parameters.

, .C fC

14

SEGMENTATION EVALUATION: Empirical

The segmentation efficacy of a method M in an application domain may be characterized by three groups of factors:

Precision :(Reliability)

Repeatability taking into account all subjective actions influencing the result.

Accuracy :(Validity)

Degree to which the result agrees with truth.

Efficiency : (Viability)

Practical viability of the method.

Udupa et al., Computerized Medical Imaging and Graphics, 30:75-87, 2006.

, ,T B P

15

SEGMENTATION EVALUATION: Empirical

S: A given set of images in

(1) Manual delineation in images in S – trace or paint Std .

(2) Simulated images I: Create an ensemble of “cut-outs” of the object from different images and bury them realistically in different images S. The cut-outs are segmented carefully Std.

For determining accuracy, need true/surrogates of true delineation.

, , .T B P

Std : The corresponding set of images with true delineations.

16

A slice (a) of an image simulated from an acquired MR proton density image of a Multiple Sclerosis patient’s brain and its “true” segmentation (b) of the lesions.

(a) (b)

17

(3) Simulated Images II : Start from (binary/fuzzy) objects (Std )

segmented from real images. Add intensity contrast, blur, noise, background variation realistically S.

White matter (WM) in a gray matter background, simulated by segmenting WM from real MR images and by adding blur, noise, background variation to various degrees: (a) low, (b) medium, and (c) high.

(a) (b) (c)

18

(4) Simulated Images III : As in (3) or (1) but apply realistic deformations to the images in S and Std.

Simulating more images (c) and their “true” segmentations (d) from existing images (a) and their manual segmentation (b) by applying known realistic deformations.

(d)(a) (c)(b)

19

(5) Simulated Images IV:

Start from realistic mathematical phantoms (Std).Simulate the imaging process with noise, blur, background variation, etc.Create Images S.

http://www.bic.mni.mcgill.ca/brainweb/

20

(6) Estimating surrogate segmentations from manual segmentations.

Have many manual segmentations for each image in S.

Estimate the segmentation that represents the best estimate of truth Std.

Warfield, S.K., Zou, K.H., Wells, W.M.: “Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation.” IEEE Trans Med Imaging 23(7):903-921, 2004.

21

SEGMENTATION EVALUATION: Empirical

(1) Intra operator variations(2) Inter operator variations(3) Intra scanner variations(4) Inter scanner variations

Inter scanner variations include variations due to the same brand and different brands.

Repeatability taking into account all subjective actions that influence the segmentation result.

Precision

22

SEGMENTATION EVALUATION: Empirical - Precision

A measure of precision for method M in a trial that produces and for situation Ti is given by

, 1, 2.

1 2

i

1 2

O OM MT

M O OM M

PR i

C C

C C

-

1 - , = 3, 4. + 2

1 2

i

1 2

O OM MT

M O OM M

PR iC C

C C

Surrogates of truth are not needed.

1OMC 2O

MC

Intra/inter operator

Intra/inter scanner

23

SEGMENTATION EVALUATION: Empirical

Accuracy

The degree to which segmentations agree with true segmentation.

Surrogates of truth are needed.

For any scene C acquired for application domain ,

- fuzzy segmentation of in by method ,

- surrogate of true delineation of in .td

M

OM O

O

C C

C C

, ,T B P

24

SEGMENTATION EVALUATION:Empirical – Accuracy

,

, ,

O Otd M td Md d

M Mtd d td

O OM td d M tdd d

M Mtd td

FNVF TPVFU -

UFPVF TNVF

U

C C C C

C C

C C C -C

C Cd -

Ud : A binary image representing a reference super set. (for example, the imaged body region ).

: Amount of tissue truly in that is missed by .

: Amount of tissue falsely delineated by .

dMd

M

FNVF O M

FPVF M

25

SEGMENTATION EVALUATION:Empirical – Accuracy

Requirements for accuracy metrics:

(1) Capture M’s behavior of trade-off between FP and FN.

(2) Satisfy fractional relations:

(3) Capable of characterizing the range of behavior of M.

(4) Boundary-based FN and FP metrics may also be devised.(5) Any monotonic function g(FNVF, FPVF) is fine as a metric.(6) Appropriate for

1

1

d dM M

d dM M

FNVF TPVF

FPVF TNVF

, , .T B P

26

SEGMENTATION EVALUATION:Empirical – Accuracy

1-FNVF

FPVF

Brain WM segmentation in PD MRI images.

Each value of parameter vector of M gives a point on the DOC curve.

The DOC curve characterizes the behavior of M over a range of parametric values of M.

Delineation Operating Characteristic

:AM

Area underthe DOC curve

27

SEGMENTATION EVALUATION: Empirical

Describes practical viability of a method.

Four factors should be considered:

(1) Computational time – for one time training of M (2) Computational time – for segmenting each scene

(3) Human time – for one-time training of M

(4) Human time – for segmenting each scene

1cMt

2cMt

1hMt

2hMt

(2) and (4) are crucial. (4) determines the degree of automation of M.

Efficiency

28

Summary

Precision : Accuracy :

:

:

:

: Area under the DOC curve intra scanner

FN fraction for delineation: inter operator

FP fraction for delineation: intra operator1TMPR

2TMPR

3TMPR

dMFPVF

MA

dMFNVF

Efficiency :

operator time for scene segmentation.:

operator time for algorithm training.:

computational time for scene segmentation.:

computational time for algorithm training.:1cMt

2cMt

1hMt

2hMt

4TMPR : inter scanner

29

SEGMENTATION EVALUATION: Empirical

Software Systems for SegmentationSoftware OS Cost Tools

3D Doctor [162] W fee Manual tracing

3D Slicer [163] W, L, U no fee Manual, EM methods, level sets

3DVIEWNIX [164]

L, U binaryno fee Manual, optimal thresh., FC family, live wire family, fuzzy thresh.,

clustering, live snake

Amira [165] W, L, U, M

fee Manual, snakes, region growing, live wire

Analyze [166] W, L, U fee Manual, region growing, contouring, math morph, interface to ITK

Aquarius [167] Unknown fee Unknown

Brain Voyager [168]

W, L, U fee Thresholding, region growing, histogram methods

CAVASS [169]W, L, U, M

no fee Manual, opt thresh., FC family, live wire family, fuzzy thresh, clustering, live snake, active shape, interface to ITK

etdips [170] W no fee Manual, thresholding, region growing

Freesurfer [171] L, M no fee Atlas-based (for brain MRI)

Advantage Windows U, W fee Unknown

Image Pro [172] W fee Color histogram

30

SEGMENTATION EVALUATION: Empirical Software Systems (cont’d)

Imaris [173] W fee Thresholding (microscopic images)

ITK [174] W, L, U, M

no fee Thresh., level sets, watershed, fuzzy connectedness, active shape, region growing, etc.

MeVisLab [175] W, L binary no fee Manual, thresh., region growing, fuzzy connectedness, live wire

MRVision [176] L, U, M fee Manual, region growing

Osiris [177] W, M no fee Thresholding, region growing

RadioDexter [178] Unknown fee Unknown

SurfDriver [179] W, M fee Manual

SliceOmatic [180] W fee Thresholding, watershed, region growing, snakes

Syngo InSpace [181] Unknown fee Automatic bone removal

VIDA [182] Unknown fee Manual, thresholding

Vitrea [183] Unknown fee Unknown

VolView [184] W, L, U fee Level sets, region growing, watershed

Voxar [185] W, L, U fee Unknown

31

SEGMENTATION EVALUATION: Empirical

Publicly Available Data Sets

Data sets Description

True Segmentation Number of Images

BrainWeb [186]

Simulated brain T1, T2, PD MR images-Objects: CSF, GM, WM, vessels, skull, ..

binary, fuzzy 20 (3D) CAVA

DDSM [187] Digital database for screening mammography - Objects: lesions

no 

2,500 (2D) CAD

ICBM [188] International consortium for brain mapping, MRI, images warped to template

binary 3,000 (3D) CAVA

LIDC [ 189] Lung spiral CT images - Objects: nodules binary (4 readers)

85 (3D) CAD

OAI [190] Osteo arthritis initiative, x-ray and MRI knee images no 160 (2D, 3D) CAVA

RIDER [191]

Chest CT images over time of lung cancer patients, radiation therapy followup

no 140 (3D) CAD

VCC [192] Virtual colonoscopy; CT images of colon no 835 (3D) CAD

VH [193-195]

Visible human data sets; whole body sectional, CT, and MR images

binary 

2 (3D) CAVA

32

Segmentation Evaluation: Empirical

An Evaluation Framework for CAVA should consist of:

(FW1) Real life image data for several application domains

(FW2) Reference segmentations (of all images) that can be used as surrogates of true segmentations.

(FW3) Specification of computable, effective, meaningful metrics for precision, accuracy, efficiency.

(FW4) Several reference segmentation methods optimized for each

(FW5) Software incorporating (FW1) – (FW4).

, , .T B P

, , .T B P

33

SEMENTATION EVALUATION: Empirical

2hMt =0

Remarks

(1) Precision, accuracy, efficiency are interdependent.

• accuracy efficiency.

• precision and accuracy difficult.

(2) “Automatic segmentation method” has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, efficiency, and with .

(3) A descriptive answer to “is method M1 better than M2 under ?” in terms of the 11 parameters is more meaningful than

a “yes” or “no” answer.

(4) DOC is essential to describe the range of behavior of M.

, ,T B P

34

Concluding Remarks

(1) Need unifying segmentation theories that can explain equivalences/distinctness of existing algorithms.

This can ensure true advances in segmentation.

(2) Need evaluation frameworks with FW1-FW5.

This can standardize methods of empirical comparison of competing and distinct algorithms.