1 evaluation of image segmentation methods jayaram k. udupa medical image processing group -...
TRANSCRIPT
1
EVALUATION OF IMAGE SEGMENTATION METHODS
Jayaram K. Udupa
Medical Image Processing Group - Department of Radiology University of Pennsylvania
423 Guardian Drive - 4th Floor Blockley HallPhiladelphia, Pennsylvania - 19104-6021
2
CAVA
CAVA: Computer-Aided Visualization and Analysis
The science underlying computerized methods of image processing, analysis, and visualization to facilitate new therapeutic strategies, basic clinical research, education, and training.
3
CAD
CAD: Computer-Aided Diagnosis
The science underlying computerized methods for the diagnosis of diseases via images
4
Image Segmentation
Recognition: Determining the object’s whereabouts in the scene. (humans > computer)
Delineation: Determining the object’s spatial extent and composition in the scene. (computer > humans)
In CAVA, Segmentation Delineation. Recognition is usually manual.
5
SEGMENTATION EVALUATION
Can be considered to consist of two components:
• Theoretical
Study mathematical equivalence among algorithms.
• Empirical
Study practical performance of algorithms in specific application domains.
6
SEGMENTATION EVALUATION: Theoretical
Segmentation approaches may be broadly classified into two groups:
• pI approaches Purely image based – rely mostly on information
available in the given image only.
• SM approaches Shape model based – employ prior shape models
for the objects of interest.
7
SEGMENTATION EVALUATION: Theoretical
pI approaches
Boundary-based: optimum boundary
active contours/surfaceslevel sets
Region-based clustering – kNN, CM, FCM
graph cutfuzzy connectedness MRFwatershedsoptimum partitioning (Mumford-Shah, Chan-Vese)
SM approaches
manual tracing Live wire Active Shape Active Appearance m-Reps atlas-based
8
SEGMENTATION EVALUATION: Theoretical
Fundamental challenges in image segmentation:
(Ch1) Are major pI frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them?
(Ch2) How to develop truly distinct methods constituting real
advance?
(Ch3) How to choose a method for a given application domain?
(Ch4) How to set an algorithm optimally for an application domain?
Currently any method A can be shown empirically to bebetter than any method B, even when they are equivalent.
9
SEGMENTATION EVALUATION: Theoretical
A general theory of image segmentation:
An idealized image F: a function
is a bounded open subset of
A digital image f: a function
f is a digitization of F. is a subset of
A delineation model M:
is a segment of image F, is a parameter vector.
Ciesielski, Udupa, SPIE Proceedings 6512:65120W-1-65120W-12, 2007.Ciesielski, Udupa, MIPG Technical Report 335, U of Pennsylvania, November 2007.
p
. .
n
.C
.Op,F
C .
O
10
SEGMENTATION EVALUATION: Theoretical
A delineation algorithm A: a mapping is a parameter vector, .
Algorithm A represents model M: a limiting process.As the resolution of f increases, S approaches O.
Algorithms A1 and A2 are model-equivalent: if there exists amodel M such that both A1 and A2 represent M.
S C
lim , . pf F
A M ,
.S,f
11
SEGMENTATION EVALUATION: Theoretical
(1) Theorem: The Malladi-Sethian-Vemuri (PAMI-17, 1995) level set algorithm is model equivalent to Udupa-Samarasekera (GMIP-58, 1996) fuzzy connectedness algorithm with gradient based fuzzy affinity.
(FC method has definite computational advantages over LS.)
(2) Audigier and Lotufo have shown by a different approach (Image Foresting Transform) equivalence between particular forms of watershed and fuzzy connectedness.
12
SEGMENTATION EVALUATION: Theoretical
Attributes used by some well known delineation models
Connectedness Gradient Texture Smoothness Shape Noise Optimization
Fuzzy Connectedness
Yes Gradient + homogeneity
affinity
Object feature affinity
No No Scale based FC
In RFC
Chan-Vese No No Yes Yes No No Yes
Mumford-Shah No No (not for edge detection)
Yes Yes No Yes Yes
KWT snake Boundary Yes No Yes No No Yes
Maladi-Sethian-Vemuri LS
Foreground when
expanding
Yes No No No No No
Live wire Boundary Yes Yes Yes User No Yes
Active shape Yes No No No Yes No Yes
Active appearance
Yes No Yes No Yes No Yes
Graph cut Usually not Yes Possible No Usually not
No Yes
Clustering No No Yes No No No Yes
13
SEGMENTATION EVALUATION: Empirical
From now on, we denote a digital image by
Need to specify Application Domain
, ,T B P
T :
B :
P :
Example: Estimating the volume of brain.
A body region -
Imaging protocol -
Application domain: A particular triple
A task -
Example: Head.
Example: T2 weighted MR imaging with a particular set of parameters.
, .C fC
14
SEGMENTATION EVALUATION: Empirical
The segmentation efficacy of a method M in an application domain may be characterized by three groups of factors:
Precision :(Reliability)
Repeatability taking into account all subjective actions influencing the result.
Accuracy :(Validity)
Degree to which the result agrees with truth.
Efficiency : (Viability)
Practical viability of the method.
Udupa et al., Computerized Medical Imaging and Graphics, 30:75-87, 2006.
, ,T B P
15
SEGMENTATION EVALUATION: Empirical
S: A given set of images in
(1) Manual delineation in images in S – trace or paint Std .
(2) Simulated images I: Create an ensemble of “cut-outs” of the object from different images and bury them realistically in different images S. The cut-outs are segmented carefully Std.
For determining accuracy, need true/surrogates of true delineation.
, , .T B P
Std : The corresponding set of images with true delineations.
16
A slice (a) of an image simulated from an acquired MR proton density image of a Multiple Sclerosis patient’s brain and its “true” segmentation (b) of the lesions.
(a) (b)
17
(3) Simulated Images II : Start from (binary/fuzzy) objects (Std )
segmented from real images. Add intensity contrast, blur, noise, background variation realistically S.
White matter (WM) in a gray matter background, simulated by segmenting WM from real MR images and by adding blur, noise, background variation to various degrees: (a) low, (b) medium, and (c) high.
(a) (b) (c)
18
(4) Simulated Images III : As in (3) or (1) but apply realistic deformations to the images in S and Std.
Simulating more images (c) and their “true” segmentations (d) from existing images (a) and their manual segmentation (b) by applying known realistic deformations.
(d)(a) (c)(b)
19
(5) Simulated Images IV:
Start from realistic mathematical phantoms (Std).Simulate the imaging process with noise, blur, background variation, etc.Create Images S.
http://www.bic.mni.mcgill.ca/brainweb/
20
(6) Estimating surrogate segmentations from manual segmentations.
Have many manual segmentations for each image in S.
Estimate the segmentation that represents the best estimate of truth Std.
Warfield, S.K., Zou, K.H., Wells, W.M.: “Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation.” IEEE Trans Med Imaging 23(7):903-921, 2004.
21
SEGMENTATION EVALUATION: Empirical
(1) Intra operator variations(2) Inter operator variations(3) Intra scanner variations(4) Inter scanner variations
Inter scanner variations include variations due to the same brand and different brands.
Repeatability taking into account all subjective actions that influence the segmentation result.
Precision
22
SEGMENTATION EVALUATION: Empirical - Precision
A measure of precision for method M in a trial that produces and for situation Ti is given by
, 1, 2.
1 2
i
1 2
O OM MT
M O OM M
PR i
C C
C C
-
1 - , = 3, 4. + 2
1 2
i
1 2
O OM MT
M O OM M
PR iC C
C C
Surrogates of truth are not needed.
1OMC 2O
MC
Intra/inter operator
Intra/inter scanner
23
SEGMENTATION EVALUATION: Empirical
Accuracy
The degree to which segmentations agree with true segmentation.
Surrogates of truth are needed.
For any scene C acquired for application domain ,
- fuzzy segmentation of in by method ,
- surrogate of true delineation of in .td
M
OM O
O
C C
C C
, ,T B P
24
SEGMENTATION EVALUATION:Empirical – Accuracy
,
, ,
O Otd M td Md d
M Mtd d td
O OM td d M tdd d
M Mtd td
FNVF TPVFU -
UFPVF TNVF
U
C C C C
C C
C C C -C
C Cd -
Ud : A binary image representing a reference super set. (for example, the imaged body region ).
: Amount of tissue truly in that is missed by .
: Amount of tissue falsely delineated by .
dMd
M
FNVF O M
FPVF M
25
SEGMENTATION EVALUATION:Empirical – Accuracy
Requirements for accuracy metrics:
(1) Capture M’s behavior of trade-off between FP and FN.
(2) Satisfy fractional relations:
(3) Capable of characterizing the range of behavior of M.
(4) Boundary-based FN and FP metrics may also be devised.(5) Any monotonic function g(FNVF, FPVF) is fine as a metric.(6) Appropriate for
1
1
d dM M
d dM M
FNVF TPVF
FPVF TNVF
, , .T B P
26
SEGMENTATION EVALUATION:Empirical – Accuracy
1-FNVF
FPVF
Brain WM segmentation in PD MRI images.
Each value of parameter vector of M gives a point on the DOC curve.
The DOC curve characterizes the behavior of M over a range of parametric values of M.
Delineation Operating Characteristic
:AM
Area underthe DOC curve
27
SEGMENTATION EVALUATION: Empirical
Describes practical viability of a method.
Four factors should be considered:
(1) Computational time – for one time training of M (2) Computational time – for segmenting each scene
(3) Human time – for one-time training of M
(4) Human time – for segmenting each scene
1cMt
2cMt
1hMt
2hMt
(2) and (4) are crucial. (4) determines the degree of automation of M.
Efficiency
28
Summary
Precision : Accuracy :
:
:
:
: Area under the DOC curve intra scanner
FN fraction for delineation: inter operator
FP fraction for delineation: intra operator1TMPR
2TMPR
3TMPR
dMFPVF
MA
dMFNVF
Efficiency :
operator time for scene segmentation.:
operator time for algorithm training.:
computational time for scene segmentation.:
computational time for algorithm training.:1cMt
2cMt
1hMt
2hMt
4TMPR : inter scanner
29
SEGMENTATION EVALUATION: Empirical
Software Systems for SegmentationSoftware OS Cost Tools
3D Doctor [162] W fee Manual tracing
3D Slicer [163] W, L, U no fee Manual, EM methods, level sets
3DVIEWNIX [164]
L, U binaryno fee Manual, optimal thresh., FC family, live wire family, fuzzy thresh.,
clustering, live snake
Amira [165] W, L, U, M
fee Manual, snakes, region growing, live wire
Analyze [166] W, L, U fee Manual, region growing, contouring, math morph, interface to ITK
Aquarius [167] Unknown fee Unknown
Brain Voyager [168]
W, L, U fee Thresholding, region growing, histogram methods
CAVASS [169]W, L, U, M
no fee Manual, opt thresh., FC family, live wire family, fuzzy thresh, clustering, live snake, active shape, interface to ITK
etdips [170] W no fee Manual, thresholding, region growing
Freesurfer [171] L, M no fee Atlas-based (for brain MRI)
Advantage Windows U, W fee Unknown
Image Pro [172] W fee Color histogram
30
SEGMENTATION EVALUATION: Empirical Software Systems (cont’d)
Imaris [173] W fee Thresholding (microscopic images)
ITK [174] W, L, U, M
no fee Thresh., level sets, watershed, fuzzy connectedness, active shape, region growing, etc.
MeVisLab [175] W, L binary no fee Manual, thresh., region growing, fuzzy connectedness, live wire
MRVision [176] L, U, M fee Manual, region growing
Osiris [177] W, M no fee Thresholding, region growing
RadioDexter [178] Unknown fee Unknown
SurfDriver [179] W, M fee Manual
SliceOmatic [180] W fee Thresholding, watershed, region growing, snakes
Syngo InSpace [181] Unknown fee Automatic bone removal
VIDA [182] Unknown fee Manual, thresholding
Vitrea [183] Unknown fee Unknown
VolView [184] W, L, U fee Level sets, region growing, watershed
Voxar [185] W, L, U fee Unknown
31
SEGMENTATION EVALUATION: Empirical
Publicly Available Data Sets
Data sets Description
True Segmentation Number of Images
BrainWeb [186]
Simulated brain T1, T2, PD MR images-Objects: CSF, GM, WM, vessels, skull, ..
binary, fuzzy 20 (3D) CAVA
DDSM [187] Digital database for screening mammography - Objects: lesions
no
2,500 (2D) CAD
ICBM [188] International consortium for brain mapping, MRI, images warped to template
binary 3,000 (3D) CAVA
LIDC [ 189] Lung spiral CT images - Objects: nodules binary (4 readers)
85 (3D) CAD
OAI [190] Osteo arthritis initiative, x-ray and MRI knee images no 160 (2D, 3D) CAVA
RIDER [191]
Chest CT images over time of lung cancer patients, radiation therapy followup
no 140 (3D) CAD
VCC [192] Virtual colonoscopy; CT images of colon no 835 (3D) CAD
VH [193-195]
Visible human data sets; whole body sectional, CT, and MR images
binary
2 (3D) CAVA
32
Segmentation Evaluation: Empirical
An Evaluation Framework for CAVA should consist of:
(FW1) Real life image data for several application domains
(FW2) Reference segmentations (of all images) that can be used as surrogates of true segmentations.
(FW3) Specification of computable, effective, meaningful metrics for precision, accuracy, efficiency.
(FW4) Several reference segmentation methods optimized for each
(FW5) Software incorporating (FW1) – (FW4).
, , .T B P
, , .T B P
33
SEMENTATION EVALUATION: Empirical
2hMt =0
Remarks
(1) Precision, accuracy, efficiency are interdependent.
• accuracy efficiency.
• precision and accuracy difficult.
(2) “Automatic segmentation method” has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, efficiency, and with .
(3) A descriptive answer to “is method M1 better than M2 under ?” in terms of the 11 parameters is more meaningful than
a “yes” or “no” answer.
(4) DOC is essential to describe the range of behavior of M.
, ,T B P
34
Concluding Remarks
(1) Need unifying segmentation theories that can explain equivalences/distinctness of existing algorithms.
This can ensure true advances in segmentation.
(2) Need evaluation frameworks with FW1-FW5.
This can standardize methods of empirical comparison of competing and distinct algorithms.