parallelization strategy
DESCRIPTION
The global design is inspired by an old presentation already published on slideshare.
TRANSCRIPT
Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architecture?
R. MAHMOUDI – A3SI Laboratory – April 2009
Summary
General framework
Parallel thinning operator
Future work
Discussion
General framework
1. Scientific and technical context (1)
Image processing operators and their associated classes:
Point-to-point operators: dynamic redistribution, thresholding
Local operators: linear filters, non-linear filters
Morphological operators: opening, closing, attribute filter
Topological operators: thinning, crest restoring, smoothing, watershed
Global operators: Fourier transformation, Euclidean distance transformation
General framework
1. Scientific and technical context (2)
(Associated class) vs. (parallelization strategies)
Point-to-point operators
Local operators
Morphological operators
Topological operators
Global operators
Existing parallelization strategies: Seinstra [1] (2002), Wilkinson [2] (2007), Meijster [3]
[1] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, "A software architecture for user transparent parallel image processing".
[2] M. H. F. Wilkinson, H. Gao, and W. H. Hesselink, "Concurrent computation of attribute filters on shared memory parallel machines".
[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, "A general algorithm for computing distance transforms in linear time".
General framework
2. Ph. D. objectives (1)
Topological operators:
Thinning operator [1]
Crest restoring [1]
2D and 3D smoothing [2]
Watershed based on w-thinning [3]
Watershed based on graph [4]
Homotopic kernel transformation [5]
Leveling kernel transformation [5]
Common parallelization strategy
[1] M. Couprie, F. N. Bezerra, and G. Bertrand, "Topological operators for grayscale image processing".
[2] M. Couprie and G. Bertrand, "Topology preserving alternating sequential filter for smoothing 2D and 3D objects".
[3] G. Bertrand, "On topological watersheds".
[4] J. Cousty, M. Couprie, L. Najman, and G. Bertrand, "Weighted fusion graphs: merging properties and watersheds".
[5] G. Bertrand, J. C. Everat, and M. Couprie, "Image segmentation through operators based on topology".
General framework
2. Ph. D. objectives (2)
Main architectural classes:
SISD machines
SIMD machines
MISD machines
MIMD machines (execute several instruction streams in parallel on different data): shared memory system or distributed memory system
Shared memory machine: CPU 1, CPU 2, CPU 3, ..., CPU n, all attached to one random access memory
General framework
2. Ph. D. objectives (3)
Needs: a common parallelization strategy of topological operators on multi-core multithread architecture (MIMD machines, shared memory system)?
Main objectives:
1. Unifying parallelization method for the class of topological operators (algorithmic level).
2. Implementation methodology and optimization techniques on multi-core multithread architecture (architecture level).
Parallel thinning operator
1. Theoretical background
Filtered thinning method that selectively simplifies the topology, based on a local contrast parameter λ.
Algorithm: λ-Skeleton (input: F, λ; output: F)
1. Repeat until stability
2. Among all the points which are λ-deletable and not λ-end,
3. select a point x of minimal value;
4. F(x) = α⁻(x, F)
Figure: (a) after Deriche gradient operator; (b) filtered skeleton with λ = 10.
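The control flow of the λ-skeleton loop can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the slides do not define the λ-deletable test, the λ-end test, or α⁻, so they are passed in as hypothetical callbacks (`is_deletable`, `is_end`, `alpha_minus`). The toy callbacks below work on a 1D signal and are not the real topological tests; in particular, taking α⁻ as the highest value among the strictly lower neighbours is an assumption.

```python
def lambda_skeleton(F, lam, is_deletable, is_end, alpha_minus):
    """Control-flow sketch; F maps point -> grey value and is updated in place.

    is_deletable, is_end and alpha_minus are hypothetical callbacks that
    stand in for the topological tests, which the slides do not spell out.
    """
    while True:                                       # 1. repeat until stability
        candidates = [x for x in F                    # 2. deletable and not end
                      if is_deletable(x, F, lam) and not is_end(x, F, lam)]
        if not candidates:
            return F                                  # stable: nothing left to lower
        x = min(candidates, key=lambda p: (F[p], p))  # 3. point of minimal value
        new_value = alpha_minus(x, F)
        if new_value >= F[x]:
            # Guard for this sketch only: a deletable point is expected to be lowered.
            raise ValueError("alpha_minus did not lower a deletable point")
        F[x] = new_value                              # 4. F(x) = alpha^-(x, F)


# Toy stand-ins on a 1D signal (assumptions, NOT the real topological tests).
def lower_neighbours(x, F):
    return [F[y] for y in (x - 1, x + 1) if y in F and F[y] < F[x]]

def toy_deletable(x, F, lam):
    return bool(lower_neighbours(x, F))

def toy_end(x, F, lam):
    return False

def toy_alpha_minus(x, F):
    return max(lower_neighbours(x, F))

signal = {0: 5, 1: 3, 2: 8, 3: 1, 4: 7}
result = lambda_skeleton(dict(signal), lam=10, is_deletable=toy_deletable,
                         is_end=toy_end, alpha_minus=toy_alpha_minus)
```

With these toy callbacks every point is lowered step by step until no point has a strictly lower neighbour. The candidate list is rebuilt on each pass for readability; a real implementation would more likely keep a priority queue.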
Parallel thinning operator
1. Parallelization strategy (1)
Define search area
Start parallel characterization
Create new shared data structure
End parallel characterization
Merge modified search area
Restart process until stability
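The steps listed on this slide can be sketched as a loop around a thread pool. The following is a minimal illustration, not the author's implementation: the "characterization" is a toy rule on a 1D list (a point with a strictly lower neighbour is lowered to its highest strictly lower neighbour), and the names `characterize` and `parallel_until_stable` are hypothetical. In CPython the threads mainly illustrate the coordination pattern rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def characterize(values, start, stop):
    """Read-only scan of values[start:stop]; returns (index, new_value) pairs."""
    found = []
    for i in range(start, stop):
        lower = [values[j] for j in (i - 1, i + 1)
                 if 0 <= j < len(values) and values[j] < values[i]]
        if lower:
            found.append((i, max(lower)))
    return found

def parallel_until_stable(values, n_workers=4):
    values = list(values)
    area = (0, len(values))                        # define search area
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while area is not None:
            start, stop = area
            step = max(1, (stop - start + n_workers - 1) // n_workers)
            bounds = [(s, min(s + step, stop)) for s in range(start, stop, step)]
            # Start parallel characterization: workers only read `values`.
            futures = [pool.submit(characterize, values, s, e) for s, e in bounds]
            # New shared structure; collecting every result ends the parallel phase.
            shared = [hit for f in futures for hit in f.result()]
            for i, v in shared:                    # apply updates sequentially
                values[i] = v
            if shared:
                # Merge modified search area: touched points plus their neighbours.
                touched = [i for i, _ in shared]
                area = (max(0, min(touched) - 1),
                        min(len(values), max(touched) + 2))
            else:
                area = None                        # stability reached
    return values

stable = parallel_until_stable([5, 3, 8, 1, 7])    # -> [1, 1, 1, 1, 1]
```

Writes happen only after all workers have returned, so the parallel phase needs no locking here; the queue-based coordination discussed on the following slides is a separate design.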
Parallel thinning operator
1. Parallelization strategy (2)
SDM-Strategy (divide and conquer principle)
Up level: DATA PARALLELISM
Down level: THREAD PARALLELISM
MIXED PARALLELISM
Parallel thinning operator
1. Parallelization strategy (3)
Parallel thinning operator
2. Coordination of threads (1)
First implementation using a lock-based shared FIFO queue.
Diagram labels: Thread 1, Thread 2; Lock(), Push(), Unlock(); Success, Fail, Blocked.
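A lock-based shared FIFO queue of the kind named on this slide might look like the sketch below. The class name `LockedFifo` and the producer function are hypothetical; the slides give no code. Every push and pop takes the same lock, so a thread that fails to acquire it blocks until the holder unlocks.

```python
import threading
from collections import deque

class LockedFifo:
    """Shared FIFO queue where every operation holds one global lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:              # Lock() ... Push() ... Unlock()
            self._items.append(item)

    def pop(self):
        with self._lock:
            return self._items.popleft() if self._items else None

def producer(queue, thread_id, count):
    for k in range(count):
        queue.push((thread_id, k))

shared_queue = LockedFifo()
threads = [threading.Thread(target=producer, args=(shared_queue, t, 1000))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

popped = []
item = shared_queue.pop()
while item is not None:               # items are tuples, so None means "empty"
    popped.append(item)
    item = shared_queue.pop()
```

Because all threads contend for a single lock on every push, this design serializes the producers, which is one plausible reason for it to scale poorly as the thread count grows.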
Parallel thinning operator
2. Coordination of threads (2)
Second implementation using a private-shared concurrent FIFO queue.
Diagram labels: Thread 1, Thread 2; Semaphore; Lock() and access semaphore, Push(), Unlock() and leave semaphore.
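The slides do not detail how the private-shared queue works. One plausible reading, sketched here purely as an assumption, is that each thread pushes into its own private buffer without synchronization and only enters the semaphore to move a whole batch into the shared queue. The class name `PrivateSharedQueue`, the `batch_size` parameter, and the flush policy are all hypothetical.

```python
import threading
from collections import deque

class PrivateSharedQueue:
    """Per-thread private buffers, flushed in batches to one shared FIFO."""

    def __init__(self, batch_size=64):
        self._shared = deque()
        self._sem = threading.Semaphore(1)   # guards the shared part only
        self._local = threading.local()      # one private buffer per thread
        self._batch_size = batch_size

    def _private(self):
        if not hasattr(self._local, "items"):
            self._local.items = []
        return self._local.items

    def push(self, item):
        private = self._private()
        private.append(item)                 # no synchronization needed here
        if len(private) >= self._batch_size:
            self.flush()

    def flush(self):
        private = self._private()
        if private:
            with self._sem:                  # access semaphore, push, leave
                self._shared.extend(private)
            private.clear()

    def drain(self):
        with self._sem:
            items = list(self._shared)
            self._shared.clear()
        return items

def worker(queue, thread_id, count):
    for k in range(count):
        queue.push((thread_id, k))
    queue.flush()                            # hand over the last partial batch

ps_queue = PrivateSharedQueue(batch_size=64)
workers = [threading.Thread(target=worker, args=(ps_queue, t, 1000))
           for t in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
drained = ps_queue.drain()
```

Under this reading the semaphore is entered once per batch instead of once per item, which reduces contention compared with locking on every push.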
Parallel thinning operator
3. Performance testing (1)
              P4 660      E8400       E5335       E5405
Arch.         Pentium 4   Core 2 Duo  Quad-core   Octo-core
CPU speed     3.60 GHz    3 GHz       2 GHz       2 GHz
Bus speed     800 MHz     1333 MHz    1333 MHz    1333 MHz
L2 size       2 MB        6 MB        8 MB        12 MB
L2 speed      3.6 GHz     3 GHz       2 GHz       2 GHz
Package type  LGA775      LGA775      LGA771      LGA771
Techno.       90 nm       45 nm       65 nm       45 nm
Parallel thinning operator
3. Performance testing (2)
First implementation using a lock-based shared FIFO queue.
[Figure: two plots, wall-clock time [ms] and performance improvement, each versus number of threads, with one curve per configuration: 1, 2, 4 and 8 cores.]
Parallel thinning operator
3. Performance testing (3)
Second implementation using a private-shared concurrent FIFO queue.
[Figure: two plots, wall-clock time [ms] and performance improvement, each versus number of threads, with one curve per configuration: 1, 2, 4 and 8 cores.]
Parallel thinning operator
4. Conclusion
[Figure: efficiency versus number of cores (1, 2, 4, 8), comparing the lock-based shared FIFO queue with the private-shared FIFO queue.]
1. Non-specific nature of the proposed parallelization strategy.
2. Thread coordination and communication during computation depend on how parallel reads and writes to cache-resident data are managed.
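The transcript does not say how "performance improvement" and "efficiency" were computed. Assuming the usual definitions (speedup is sequential time divided by parallel time, efficiency is speedup divided by the number of cores), they can be evaluated as in this small sketch. The timings in the usage lines are made-up placeholders, not measurements from the slides.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T1 / Tp (assumed definition of 'performance improvement')."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_cores):
    """Efficiency E = S / p; 1.0 means the cores are perfectly used."""
    return speedup(t_sequential, t_parallel) / n_cores

# Placeholder timings in milliseconds, for illustration only.
ideal = efficiency(60.0, 15.0, 4)      # 4x faster on 4 cores
degraded = efficiency(60.0, 30.0, 4)   # only 2x faster on 4 cores
```

An efficiency near 1.0 indicates that the work is well distributed across cores; values well below 1.0 point to coordination overhead such as lock contention.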
Future work
1. Extension
Interleave two operators: parallel thinning operator and crest restoring
SDM-Strategy
Performance enhancement (speed up)
Efficiency (work distribution)
Cache miss
Future work
2. New parallel topological watershed
Parallel watershed operator
SDM-Strategy
Performance enhancement (speed up)
Efficiency (work distribution)
Cache miss
Achievement: 80%
Discussion
Introduce a future programming model that makes it easy to write programs that execute efficiently on highly parallel computing systems.
Introduce new "Dwarfs" to design and evaluate parallel programming models (instead of old benchmarks).
To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.