parallelization strategy

Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architectures?

R. MAHMOUDI – A3SI Laboratory – April 2009


Summary

General framework

Parallel thinning operator

Future work

Discussion

General framework

1. Scientific and technical context (1)

Image processing operators, by associated class

Point-to-point operators: dynamic redistribution, thresholding
Local operators: linear filters, non-linear filters
Morphological operators: opening, closing, attribute filter
Topological operators: thinning, crest restoring, smoothing, watershed
Global operators: Fourier transformation, Euclidean distance transformation


1. Scientific and technical context (2)

(Associated class) vs. (Parallelization strategies)

Associated classes: point-to-point operators, local operators, morphological operators, topological operators, global operators

Existing parallelization strategies: Seinstra [1] (2002), Wilkinson [2] (2007), Meijster [3]

[1] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, "A software architecture for user transparent parallel image processing".
[2] M. H. F. Wilkinson, H. Gao, and W. H. Hesselink, "Concurrent computation of attribute filters on shared memory parallel machines".
[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, "A general algorithm for computing distance transforms in linear time".


2. Ph. D. objectives (1)

Common parallelization strategy for topological operators:

Thinning operator [1]
Crest restoring [1]
2D and 3D smoothing [2]
Watershed based on w-thinning [3]
Watershed based on graph [4]
Homotopic kernel transformation [5]
Leveling kernel transformation [5]

[1] M. Couprie, F. N. Bezerra, and G. Bertrand, "Topological operators for grayscale image processing".
[2] M. Couprie and G. Bertrand, "Topology preserving alternating sequential filter for smoothing 2D and 3D objects".
[3] G. Bertrand, "On topological watersheds".
[4] J. Cousty, M. Couprie, L. Najman, and G. Bertrand, "Weighted fusion graphs: merging properties and watersheds".
[5] G. Bertrand, J. C. Everat, and M. Couprie, "Image segmentation through operators based on topology".


Main architectural classes

SISD machines
SIMD machines
MISD machines
MIMD machines (execute several instruction streams in parallel on different data): distributed memory systems and shared memory systems

Shared memory machine: CPU1, CPU2 and CPU3 all access the same random access memory.


Common parallelization strategy of topological operators on multi-core multithread architecture (MIMD machines, shared memory system)?

Main objectives

1. Unifying parallelization method for the class of topological operators (algorithmic level).
2. Implementation methodology and optimization techniques on multi-core multithread architecture (architecture level).

Parallel thinning operator

1. Theoretical background

Algorithm: λ-Skeleton (input: F, λ; output: F)

1. Repeat until stability:
2.   Among all the points which are λ-deletable and not λ-end,
3.   select a point x of minimal value;
4.   F(x) = α⁻(x, F)

A filtered thinning method that selectively simplifies the topology, based on a local contrast parameter λ.

Figure: (a) after Deriche gradient operator; (b) filtered skeleton with λ = 10.
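The selection loop of the λ-Skeleton algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation. The predicates `is_deletable` and `is_end` are hypothetical callables standing in for the real λ-deletable and λ-end topological tests. `toy_deletable` and `never_end` are simple stand-ins used only to make the example run. `alpha_minus` assumes that α⁻(x, F) is the highest value among the strictly lower 8-neighbors of x.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]
Point = Tuple[int, int]
Predicate = Callable[[Image, Point, int], bool]


def neighbors8(img: Image, p: Point) -> Iterator[Point]:
    """Yield the 8-neighbors of p that lie inside the image."""
    rows, cols = len(img), len(img[0])
    r, c = p
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)


def alpha_minus(img: Image, p: Point) -> int:
    """Assumed alpha-minus: highest value among the strictly lower neighbors."""
    value = img[p[0]][p[1]]
    lower = [img[r][c] for r, c in neighbors8(img, p) if img[r][c] < value]
    return max(lower) if lower else value


def lambda_skeleton(img: Image, lam: int,
                    is_deletable: Predicate, is_end: Predicate) -> Image:
    """Lower one minimal-valued candidate point at a time until stability."""
    out = [row[:] for row in img]  # work on a copy of the input
    while True:
        candidates = [
            (out[r][c], (r, c))
            for r in range(len(out)) for c in range(len(out[0]))
            if alpha_minus(out, (r, c)) < out[r][c]  # lowering would change F
            and is_deletable(out, (r, c), lam)
            and not is_end(out, (r, c), lam)
        ]
        if not candidates:  # stability: no point left to lower
            return out
        _, (r, c) = min(candidates)  # select a point x of minimal value
        out[r][c] = alpha_minus(out, (r, c))  # F(x) = alpha-minus(x, F)


def toy_deletable(img: Image, p: Point, lam: int) -> bool:
    """Stand-in contrast check, NOT the topological lambda-deletable test."""
    return img[p[0]][p[1]] - alpha_minus(img, p) <= lam


def never_end(img: Image, p: Point, lam: int) -> bool:
    """Stand-in: no point is treated as a lambda-end point."""
    return False
```

With the stand-in predicates, `lambda_skeleton([[0, 0, 0], [0, 5, 0], [0, 0, 0]], 10, toy_deletable, never_end)` flattens the isolated peak, while λ = 3 leaves it untouched. The sketch rescans the whole image on every pass for clarity; a practical version would more likely keep the candidates in a priority queue.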

1. Parallelization strategy (1)

SDM-Strategy (divide and conquer principle)

Define search area
Start parallel characterization
Create new shared data structure
End parallel characterization
Merge modified search area
Restart process until stability

Up level: data parallelism
Down level: thread parallelism
Combined: mixed parallelism
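The cycle of defining search areas, characterizing them in parallel, merging the modifications and restarting until stability can be sketched as follows. This is only an illustration under stated assumptions: the search areas are taken to be bands of image rows, and the function names and the toy lowering rule inside `characterize` are hypothetical, not the thinning criterion used in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]
Update = Tuple[int, int, int]  # (row, column, new value)


def characterize(img: Image, rows: range) -> List[Update]:
    """Read-only pass over one search area: list the points to modify."""
    h, w = len(img), len(img[0])
    updates: List[Update] = []
    for r in rows:
        for c in range(w):
            nbrs = [img[y][x]
                    for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= y < h and 0 <= x < w]
            # Toy rule: a point may exceed its lowest 4-neighbor by at most 1.
            if nbrs and min(nbrs) + 1 < img[r][c]:
                updates.append((r, c, min(nbrs) + 1))
    return updates


def run_until_stable(img: Image, n_workers: int = 2) -> Image:
    """Split into row bands, characterize in parallel, merge, repeat."""
    img = [row[:] for row in img]  # work on a copy of the input
    h = len(img)
    # Define the search areas: one contiguous band of rows per worker.
    bands = [range(k * h // n_workers, (k + 1) * h // n_workers)
             for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while True:
            # Parallel characterization: workers only read img and each one
            # fills its own list of updates.
            parts = list(pool.map(lambda band: characterize(img, band), bands))
            updates = [u for part in parts for u in part]
            if not updates:  # stability reached
                return img
            # Merge: apply the collected modifications in a single thread.
            for r, c, value in updates:
                img[r][c] = value
```

Because every worker reads the same snapshot and all writes happen in the merge step, the result does not depend on the number of workers. In CPython the global interpreter lock limits the actual speed-up of such threads, so the sketch shows the structure of the strategy rather than its performance.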

2. Coordination of threads (1)

First implementation using a lock-based shared FIFO queue.

Thread 1 and Thread 2 each call Lock(), Push() and Unlock() on the shared queue. Lock() either succeeds or fails; on failure the thread is blocked.
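A lock-based shared FIFO queue of this kind can be sketched with Python's standard `threading.Lock`. The class name `LockedFifo`, the helper `fill_shared` and the `acquisitions` counter are hypothetical names introduced for the illustration; the implementation measured in the talk is not shown in the slides.

```python
import threading
from collections import deque
from typing import Deque, Optional


class LockedFifo:
    """FIFO queue shared by all threads and protected by a single lock."""

    def __init__(self) -> None:
        self._items: Deque[int] = deque()
        self._lock = threading.Lock()
        self.acquisitions = 0  # instrumentation for this example only

    def push(self, item: int) -> None:
        with self._lock:  # Lock(): blocks while another thread holds it
            self.acquisitions += 1
            self._items.append(item)  # Push()
        # Unlock() happens when the with-block exits.

    def pop(self) -> Optional[int]:
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def fill_shared(n_threads: int, per_thread: int) -> LockedFifo:
    """Let n_threads push per_thread distinct items each into one queue."""
    queue = LockedFifo()

    def worker(tid: int) -> None:
        for i in range(per_thread):
            queue.push(tid * per_thread + i)  # one lock round-trip per item

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue
```

Every push takes the lock, so with 4 threads pushing 100 items each the lock is acquired 400 times, and threads that fail to get it wait their turn.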

2. Coordination of threads (2)

Second implementation using a private-shared concurrent FIFO queue.

Thread 1 and Thread 2 coordinate through a semaphore: Lock() and access semaphore, Push(), then Unlock() and leave semaphore.
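One possible reading of a private-shared queue is sketched below: each thread pushes into its own private queue without synchronization and enters the semaphore only to merge that private queue into the shared one. This reading is an assumption, as are the names `PrivateSharedFifo`, `merge`, `drain` and `fill_private_shared`; the slides do not give the actual data structure.

```python
import threading
from collections import deque
from typing import Deque, List


class PrivateSharedFifo:
    """Shared FIFO that threads feed in batches from private queues."""

    def __init__(self) -> None:
        self._shared: Deque[int] = deque()
        self._guard = threading.Semaphore(1)  # binary semaphore
        self.acquisitions = 0  # instrumentation for this example only

    def merge(self, private: Deque[int]) -> None:
        """Append a whole private queue to the shared one in one step."""
        with self._guard:  # lock and access the semaphore
            self.acquisitions += 1
            self._shared.extend(private)
        # The semaphore is released when the with-block exits.
        private.clear()

    def drain(self) -> List[int]:
        """Remove and return all shared items in FIFO order."""
        with self._guard:
            items = list(self._shared)
            self._shared.clear()
            return items


def fill_private_shared(n_threads: int, per_thread: int) -> PrivateSharedFifo:
    """Let n_threads produce per_thread items each, merging once per thread."""
    queue = PrivateSharedFifo()

    def worker(tid: int) -> None:
        private: Deque[int] = deque()
        for i in range(per_thread):
            private.append(tid * per_thread + i)  # no synchronization needed
        queue.merge(private)  # a single guarded access per thread

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue
```

In this sketch the guard is taken once per thread instead of once per item, which is the kind of reduction in contention that a private-shared design aims for.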

3. Performance testing (1)

                P4 660      E8400        E5335       E5405
Arch.           Pentium 4   Core 2 Duo   Quad-core   Octo-core
CPU speed       3.60 GHz    3 GHz        2 GHz       2 GHz
Bus speed       800 MHz     1333 MHz     1333 MHz    1333 MHz
L2 size         2 MB        6 MB         8 MB        12 MB
L2 speed        3.6 GHz     3 GHz        2 GHz       2 GHz
Package type    LGA775      LGA775       LGA771      LGA771
Techno.         90 nm       45 nm        65 nm       45 nm

Figure: two plots against the number of threads (axis ticks 0, 1, 2, 4, 8, 16, 200), with one curve each for 1, 2, 4 and 8 cores. First implementation using a lock-based shared FIFO queue.

Figure: two plots against the number of threads (axis ticks 0, 1, 2, 4, 8, 16, 200), with one curve each for 1, 2, 4 and 8 cores. Second implementation using a private-shared concurrent FIFO queue.

4. Conclusion

Figure: comparison against the number of cores (1, 2, 4, 8) of the lock-based shared FIFO queue and the private-shared FIFO queue.

Non-specific nature of the proposed parallelization strategy.

Thread coordination and communication during computation depend on how parallel reads and writes are handled when managing cache-resident data.


Future work

1. Extension

Parallel thinning operator and crest restoring: combine the two operators

SDM-Strategy:
Performance enhancement (speed up)
Efficiency (work distribution)
Cache miss
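The speed-up and efficiency criteria listed above are conventionally computed from wall-clock times. The helpers below are a small sketch using these standard definitions; the function names are hypothetical and the numbers in the usage note are invented for illustration, not measurements from the talk.

```python
def speedup(t_one_core: float, t_n_cores: float) -> float:
    """Speed-up: run time on one core divided by run time on n cores."""
    if t_one_core <= 0 or t_n_cores <= 0:
        raise ValueError("run times must be positive")
    return t_one_core / t_n_cores


def efficiency(t_one_core: float, t_n_cores: float, n_cores: int) -> float:
    """Efficiency: speed-up divided by the number of cores used."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(t_one_core, t_n_cores) / n_cores
```

For example, a hypothetical run that takes 8.0 s on one core and 2.0 s on eight cores has a speed-up of 4.0 and an efficiency of 0.5, meaning that on average half of the available computing capacity did useful work.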


2. New parallel topological watershed

Parallel watershed operator

SDM-Strategy:

Performance enhancement (speed up)

Efficiency (work distribution)

Cache miss

% Achievement

Discussion

Introduce a future programming model that makes it easy to write programs that execute efficiently on highly parallel computing systems.

Introduce a new “Draft” to design and evaluate parallel programming models (instead of old benchmarks).

To maximize programmer productivity, the future programming model must be more human-centric than the conventional focus on hardware or applications.
