Parallel thinning operator

Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architecture?

R. MAHMOUDI, A3SI Laboratory, April 2009


Summary

General framework
Parallel thinning operator
Future work
Discussion




General framework

1. Scientific and technical context (1)

Image processing operators and their associated classes:

Point-to-Point operators: Thresholding, Dynamic redistribution
Local operators: Linear filters, Non-linear filters
Morphological operators: Opening, Closing, Attribute filter
Topological operators: Thinning, Crest restoring, Smoothing, Watershed
Global operators: Fourier transformation, Euclidean distance transformation
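To make the difference between two of the classes concrete, here is a minimal sketch in Python. It is an illustration only, not code from the presentation: the function names and the tiny 3x3 image are assumptions. Thresholding reads a single input pixel per output pixel, while a local operator (here a 3x3 mean filter, chosen as a simple linear filter) reads a neighborhood.

```python
def threshold(image, t):
    """Point-to-point operator: each output pixel depends only on
    the input pixel at the same position."""
    return [[1 if v >= t else 0 for v in row] for row in image]


def mean_filter_3x3(image):
    """Local operator: each output pixel depends on a 3x3 neighborhood.
    Border pixels are left unchanged to keep the sketch short."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(window) / 9
    return out


# Hypothetical 3x3 test image with a single bright pixel.
image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
binary = threshold(image, 5)
smooth = mean_filter_3x3(image)
```

Because the point-to-point operator has no dependency between pixels, it can be split across threads without coordination; the local operator needs read access to neighboring pixels, which is where parallelization strategies start to differ between classes.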


General framework

1. Scientific and technical context (2)

(Associated class) vs. (Parallelization strategies)

Classes: Point-to-Point operators, Local operators, Morphological operators, Topological operators, Global operators

Existing parallelization strategies: Seinstra [1] (2002), Wilkinson [2] (2007), Meijster [3]

[1] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, "A software architecture for user transparent parallel image processing".
[2] M. H. F. Wilkinson, H. Gao, and W. H. Hesselink, "Concurrent Computation of Attribute Filters on Shared Memory Parallel Machines".
[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, "A general algorithm for computing distance transforms in linear time".


General framework

2. Ph. D. objectives (1)

Topological operators targeted by a common parallelization strategy:

Thinning operator [1]
Crest restoring [1]
2D and 3D smoothing [2]
Watershed based on w-thinning [3]
Watershed based on graph [4]
Homotopic kernel transformation [5]
Leveling kernel transformation [5]

[1] M. Couprie, F. N. Bezerra, and G. Bertrand, "Topological operators for grayscale image processing".
[2] M. Couprie and G. Bertrand, "Topology preserving alternating sequential filter for smoothing 2D and 3D objects".
[3] G. Bertrand, "On Topological Watersheds".
[4] J. Cousty, M. Couprie, L. Najman, and G. Bertrand, "Weighted fusion graphs: Merging properties and watersheds".
[5] G. Bertrand, J. C. Everat, and M. Couprie, "Image segmentation through operators based on topology".


General framework

Main architectural classes:

SISD machines
SIMD machines
MISD machines
MIMD machines (execute several instruction streams in parallel on different data)

MIMD machines:

Shared memory machine: CPU1, CPU2, CPU3, ..., CPUn connected to a common Random Access Memory
Distributed memory system


General framework

Needs

A common parallelization strategy of topological operators on multi-core multithread architecture (MIMD machines, shared memory system)?

Main objectives

1. Unifying parallelization method for the class of topological operators (algorithmic level).
2. Implementation methodology and optimization techniques on multi-core multithread architecture (architecture level).



Parallel thinning operator



1. Theoretical background

Filtered thinning method that selectively simplifies the topology, based on a local contrast parameter λ.

Algorithm: λ-Skeleton (input: F, λ; output: F)

1. Repeat until stability:
2.     Among all the points which are λ-deletable and not λ-end,
3.     select a point x of minimal value;
4.     F(x) = α⁻(x, F)

Figure: (a) after Deriche gradient operator; (b) filtered skeleton with λ = 10.
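The control flow of the λ-Skeleton algorithm can be sketched in Python as follows. This is only a sketch of the loop structure: the topological tests (λ-deletable, λ-end) and the lowering function α⁻ are not implemented here and are passed in as caller-supplied functions. The names `is_deletable`, `is_end`, `alpha_minus` and the toy 1D stand-ins below are assumptions made for illustration, not the definitions used in the cited work.

```python
def lambda_skeleton(F, lam, is_deletable, is_end, alpha_minus):
    """Repeat until stability: among the eligible points, lower the one
    of minimal value. F maps points to values and is modified in place."""
    while True:
        candidates = [x for x in F
                      if is_deletable(x, F, lam) and not is_end(x, F, lam)]
        if not candidates:          # stability reached
            return F
        x = min(candidates, key=lambda p: F[p])
        F[x] = alpha_minus(x, F)


# Toy stand-ins on a 1D signal (NOT the real topological definitions).
def neighbors(x, F):
    return [y for y in (x - 1, x + 1) if y in F]

def toy_deletable(x, F, lam):
    # Toy rule: a point is eligible if some neighbor is strictly lower.
    return any(F[y] < F[x] for y in neighbors(x, F))

def toy_end(x, F, lam):
    # Toy rule: the two extremities of the signal are never lowered.
    return x in (min(F), max(F))

def toy_alpha_minus(x, F):
    # Highest neighbor value that is strictly lower than F[x].
    return max(F[y] for y in neighbors(x, F) if F[y] < F[x])


F = {0: 0, 1: 5, 2: 3, 3: 4, 4: 0}
lambda_skeleton(F, 10, toy_deletable, toy_end, toy_alpha_minus)
```

The sketch rescans every point at each iteration, which is simple but slow; keeping candidates in a priority queue ordered by value would be a natural refinement. Termination relies on `alpha_minus` strictly lowering the selected point.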



1. Parallelization strategy (1)

SDM-Strategy (divide and conquer principle)

Define search area
Start parallel characterization
Create new shared data structure
End parallel characterization
Merge modified search area
Restart process until stability

Up level: DATA PARALLELISM
Down level: THREAD PARALLELISM
Both levels together: MIXED PARALLELISM
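The sequence of steps listed for the strategy (define a search area, characterize it in parallel, merge, restart until stability) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the presented implementation: the characterization rule is a toy stand-in on a 1D integer signal (a strict interior peak is lowered by 1), and the names `is_candidate`, `characterize` and `split_process_merge` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def is_candidate(signal, i):
    """Toy stand-in for the topological characterization of a point:
    a strict interior peak."""
    return (0 < i < len(signal) - 1
            and signal[i] > signal[i - 1]
            and signal[i] > signal[i + 1])


def characterize(signal, chunk):
    # Read-only pass over one part of the search area.
    return [i for i in chunk if is_candidate(signal, i)]


def split_process_merge(signal, n_workers=4):
    signal = list(signal)
    search_area = list(range(len(signal)))            # define search area
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while search_area:                            # restart until stability
            chunks = [search_area[k::n_workers] for k in range(n_workers)]
            parts = pool.map(lambda c: characterize(signal, c), chunks)
            modified = sorted({i for part in parts for i in part})
            for i in modified:                        # merge: apply changes
                signal[i] -= 1
            # New search area: modified points and their neighbors.
            search_area = sorted({j for i in modified
                                  for j in (i - 1, i, i + 1)
                                  if 0 <= j < len(signal)})
    return signal


result = split_process_merge([0, 3, 0, 0, 2, 1, 0])
```

All workers finish their read-only characterization before any value is modified, so no lock is needed in this toy version. Two adjacent points can never both be strict peaks, which is why applying all modifications of one round together is safe here; a real topological operator needs a stronger argument or explicit coordination between threads.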



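2. Coordination of threads (1)

First implementation using a lock-based shared FIFO queue.

Diagram: Thread 1 and Thread 2 access the shared queue through Lock(), Push() and Unlock(); a lock attempt ends in Success or Fail, and a thread that fails is Blocked.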
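A lock-based shared FIFO queue of this kind can be sketched in Python as follows. The class name `LockedFifo` and the producer function are assumptions for illustration; the sketch only shows the Lock() / Push() / Unlock() pattern, with every access to the queue serialized by a single lock.

```python
import threading
from collections import deque


class LockedFifo:
    """FIFO queue shared by all threads and protected by one lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:              # Lock() ... Unlock()
            self._items.append(item)

    def pop(self):
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def producer(fifo, thread_id, n_items):
    for k in range(n_items):
        fifo.push((thread_id, k))     # each push competes for the lock


fifo = LockedFifo()
producers = [threading.Thread(target=producer, args=(fifo, t, 1000))
             for t in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()
```

Every push from every thread goes through the same lock, so threads that lose the race are blocked until the holder releases it. That contention grows with the number of threads.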



2. Coordination of threads (2)

Second implementation using a private-shared concurrent FIFO queue.

Diagram: Thread 1 and Thread 2 coordinate through a semaphore: Lock() and access semaphore, Push(), Unlock() and leave semaphore.
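One possible reading of a private-shared queue is sketched below in Python. It is an assumption about what the scheme looks like, not the presented implementation: each thread fills a private queue without any synchronization and enters the semaphore only once, to append its private queue to the shared one. The names `private_shared_producer`, `shared_queue` and `gate` are hypothetical.

```python
import threading
from collections import deque


def private_shared_producer(shared_queue, gate, thread_id, n_items):
    private = deque()
    for k in range(n_items):
        private.append((thread_id, k))   # private queue: no synchronization
    with gate:                           # access semaphore, push, leave
        shared_queue.extend(private)


shared_queue = deque()
gate = threading.Semaphore(1)            # at most one thread flushes at a time
workers = [threading.Thread(target=private_shared_producer,
                            args=(shared_queue, gate, t, 1000))
           for t in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Compared with taking a lock for every push, each thread synchronizes once per batch, which reduces contention on the shared structure.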



3. Performance testing (1)

               P4 660      E8400        E5335       E5405
Arch.          Pentium 4   Core 2 Duo   Quad-core   Octo-core
CPU speed      3.60 GHz    3 GHz        2 GHz       2 GHz
Bus speed      800 MHz     1333 MHz     1333 MHz    1333 MHz
L2 size        2 MB        6 MB         8 MB        12 MB
L2 speed       3.6 GHz     3 GHz        2 GHz       2 GHz
Package type   LGA775      LGA775       LGA771      LGA771
Techno.        90 nm       45 nm        65 nm       45 nm




Figure (two panels): wall-clock time [ms] and performance improvement versus number of threads, with one curve each for 1, 2, 4 and 8 cores. The performance improvement axis runs from 0 to 2.

First implementation using a lock-based shared FIFO queue.




Figure (two panels): wall-clock time [ms] and performance improvement versus number of threads, with one curve each for 1, 2, 4 and 8 cores. The performance improvement axis runs from 1 to 7.

Second implementation using a private-shared concurrent FIFO queue.
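The quantities plotted in these figures can be computed as in the short Python sketch below. It assumes the usual definitions: performance improvement (speedup) is the single-thread wall-clock time divided by the parallel wall-clock time, and efficiency is the speedup divided by the number of cores. The slides do not state their exact formulas, and the timings used here are hypothetical values, not measurements from the presentation.

```python
def speedup(t_serial_ms, t_parallel_ms):
    """Performance improvement: serial time over parallel time."""
    return t_serial_ms / t_parallel_ms


def efficiency(t_serial_ms, t_parallel_ms, n_cores):
    """Fraction of the ideal linear speedup actually obtained."""
    return speedup(t_serial_ms, t_parallel_ms) / n_cores


# Hypothetical timings, chosen only to illustrate the arithmetic.
s = speedup(60.0, 15.0)            # 60 / 15 = 4.0
e = efficiency(60.0, 15.0, 8)      # 4.0 / 8 = 0.5
```

An efficiency close to 1 means the work is well distributed over the cores; values well below 1 point to synchronization or memory overhead.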



4. Conclusion

Figure: efficiency versus number of cores (1, 2, 4, 8), comparing the lock-based shared FIFO queue with the private-shared FIFO queue.

1. Non-specific nature of the proposed parallelization strategy.
2. Thread coordination and communication during dependent parallel reads/writes, for managing cache-resident data.





Future work

1. Extension

Parallel thinning operator
Crest restoring
Combine (imbricate) the two operators
SDM-Strategy
Performance enhancement (speed up)
Efficiency (work distribution)
Cache miss


Future work

2. New parallel topological watershed

Parallel watershed operator
SDM-Strategy
Performance enhancement (speed up)
Efficiency (work distribution)
Cache miss

Achievement: 80%





Discussion

Introduce a future programming model that makes it easy to write programs that execute efficiently on highly parallel computing systems.

Introduce a new “Draft” to design and evaluate parallel programming models (instead of old benchmarks).

To maximize programmer productivity, the future programming model must be more human-centric than the conventional focus on hardware or applications.
