parallelization strategy

Post on 25-Jun-2015


DESCRIPTION

The global design is inspired by an old presentation already published on slideshare.

TRANSCRIPT

Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architecture?

R. MAHMOUDI – A3SI Laboratory – April 2009

Summary

General framework

Parallel thinning operator

Future work

Discussion


General framework

1. Scientific and technical context (1)

Image processing operators and their associated classes:

Point-to-point operators: dynamic redistribution, thresholding
Local operators: linear filters, non-linear filters
Morphological operators: opening, closing, attribute filter
Topological operators: thinning, crest restoring, smoothing, watershed
Global operators: Fourier transformation, Euclidean distance transformation

General framework

1. Scientific and technical context (2)

Associated class vs. parallelization strategies

Operator classes: point-to-point operators, local operators, morphological operators, topological operators, global operators

Existing parallelization strategies: Seinstra [1] (2002), Wilkinson [2] (2007), Meijster [3]

[1] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, “A software architecture for user transparent parallel image processing”.
[2] M. H. F. Wilkinson, H. Gao, and W. H. Hesselink, “Concurrent Computation of Attribute Filters on Shared Memory Parallel Machines”.
[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, “A general algorithm for computing distance transforms in linear time”.

General framework

2. Ph. D. objectives (1)

Topological operators:

Thinning operator [1]
Crest restoring [1]
2D and 3D smoothing [2]
Watershed based on w-thinning [3]
Watershed based on graph [4]
Homotopic kernel transformation [5]
Leveling kernel transformation [5]

Goal: a common parallelization strategy for these operators.

[1] M. Couprie, F. N. Bezerra, and G. Bertrand, “Topological operators for grayscale image processing”.
[2] M. Couprie and G. Bertrand, “Topology preserving alternating sequential filter for smoothing 2D and 3D objects”.
[3] G. Bertrand, “On Topological Watersheds”.
[4] J. Cousty, M. Couprie, L. Najman, and G. Bertrand, “Weighted fusion graphs: Merging properties and watersheds”.
[5] G. Bertrand, J. C. Everat, and M. Couprie, “Image segmentation through operators based on topology”.

General framework

2. Ph. D. objectives (2)

Main architectural classes:

SISD machines
SIMD machines
MISD machines
MIMD machines (execute several instruction streams in parallel on different data)

MIMD machines: distributed memory systems and shared memory systems.

Shared memory machine: CPU 1, CPU 2, CPU 3, ..., CPU n, all connected to one random access memory.

General framework

2. Ph. D. objectives (3)

Needs

Common parallelization strategy of topological operators on multi-core multithread architecture (MIMD machines – shared memory system)?

Main objectives

1. Unifying parallelization method for the class of topological operators (algorithmic level).
2. Implementation methodology and optimization techniques on multi-core multithread architecture (architecture level).


Parallel thinning operator

1. Theoretical background

Filtered thinning method that selectively simplifies the topology, based on a local contrast parameter λ.

Algorithm: λ-skeleton (input: F, λ; output: F)
1. Repeat until stability:
2.   Among all the points which are λ-deletable and not λ-end,
3.   select a point x of minimal value;
4.   F(x) = α⁻(x, F)

Figure: (a) after Deriche gradient operator; (b) filtered skeleton with λ = 10.
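The control flow of the λ-skeleton algorithm can be sketched in Python as below. This is only a sketch of the loop, not the author's implementation: the three callables `is_deletable`, `is_end`, and `lowered_value` are hypothetical stand-ins for the λ-deletable test, the λ-end test, and the lowering operator α⁻(x, F), whose topological definitions are given in the cited literature and are not reproduced here. The toy 1D versions only exist to make the example runnable.

```python
def lambda_skeleton(F, lam, is_deletable, is_end, lowered_value):
    """Repeat until stability: lower the minimal-valued candidate point.

    F maps each point to its grey level. The three callables are
    hypothetical stand-ins for the lambda-deletable test, the lambda-end
    test, and the lowering operator alpha^-(x, F).
    """
    F = dict(F)  # work on a copy so the caller's image is untouched
    while True:
        candidates = [x for x in F
                      if is_deletable(x, F, lam) and not is_end(x, F, lam)]
        if not candidates:          # stability: nothing left to lower
            return F
        # select a point of minimal value (ties broken by point index)
        x = min(candidates, key=lambda p: (F[p], p))
        F[x] = lowered_value(x, F)


# Toy 1D stand-ins (NOT the topological definitions): a point is a
# candidate when it exceeds its lowest neighbour by at most lam.
def neighbours(x, F):
    return [y for y in (x - 1, x + 1) if y in F]

def toy_deletable(x, F, lam):
    low = min(F[y] for y in neighbours(x, F))
    return 0 < F[x] - low <= lam

def toy_end(x, F, lam):
    return False

def toy_lowered(x, F):
    return min(F[y] for y in neighbours(x, F))

signal = {0: 5, 1: 7, 2: 5, 3: 20, 4: 5}
result = lambda_skeleton(signal, 3, toy_deletable, toy_end, toy_lowered)
```

With these toy rules the low-contrast bump at point 1 is flattened, while the high-contrast peak at point 3 exceeds λ = 3 and is left alone.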

Parallel thinning operator

1. Parallelization strategy (1)

Define search area

Start parallel characterization

Create new shared data structure

End parallel characterization

Merge modified search area

Restart process until stability
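The steps listed on this slide can be sketched as a loop that characterizes points in parallel, gathers the selected points, applies the updates, and restarts on the modified area. The sketch below is one possible reading, not the author's code: `characterize` and `update` are hypothetical callbacks, the updates are applied one at a time after the parallel phase, and the toy 1D rules are placeholders for the real topological tests.

```python
from concurrent.futures import ThreadPoolExecutor

def split(points, n):
    """Cut a list into n contiguous chunks of near-equal size."""
    k, r = divmod(len(points), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(points[start:end])
        start = end
    return chunks

def run_until_stable(F, characterize, update, n_workers=4):
    """characterize(x, F) only reads F; update(x, F) modifies F and
    returns the set of points to re-examine (both hypothetical)."""
    search_area = sorted(F)                      # define search area
    passes = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while search_area:                       # restart until stability
            passes += 1
            chunks = split(search_area, n_workers)
            # parallel characterization: workers only read F
            parts = pool.map(
                lambda chunk: [x for x in chunk if characterize(x, F)],
                chunks)
            selected = [x for part in parts for x in part]  # shared result
            next_area = set()
            for x in selected:
                # re-check: an earlier update may have changed x's status
                if characterize(x, F):
                    next_area |= update(x, F)
            search_area = sorted(next_area)      # merged, modified area
    return F, passes

# Toy 1D rules, placeholders for the real tests.
def neighbours(x, F):
    return [y for y in (x - 1, x + 1) if y in F]

def is_small_peak(x, F, lam=3):
    low = min(F[y] for y in neighbours(x, F))
    return 0 < F[x] - low <= lam

def flatten(x, F):
    F[x] = min(F[y] for y in neighbours(x, F))
    return set(neighbours(x, F))

image = {0: 5, 1: 7, 2: 5, 3: 20, 4: 5}
stable, passes = run_until_stable(image, is_small_peak, flatten, n_workers=2)
```

Restricting each new pass to the neighbours of modified points keeps the search area small once most of the image is stable.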

Parallel thinning operator

1. Parallelization strategy (2)

SDM strategy (divide and conquer principle)

Up level: data parallelism

Down level: thread parallelism

Combined: mixed parallelism
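The slides do not detail how the SDM strategy divides the data, so the following is only an illustration of the divide and conquer principle under stated assumptions: the image is recursively halved into tiles (data parallelism), each tile is handed to one thread (thread parallelism), and the partial results are merged. The function names and the per-tile sum are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def divide(region, n_parts):
    """Recursively halve a (row0, row1, col0, col1) region along its
    longer side until n_parts tiles exist (n_parts: a power of two)."""
    if n_parts <= 1:
        return [region]
    r0, r1, c0, c1 = region
    if r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        halves = [(r0, mid, c0, c1), (mid, r1, c0, c1)]
    else:
        mid = (c0 + c1) // 2
        halves = [(r0, r1, c0, mid), (r0, r1, mid, c1)]
    return [tile for half in halves for tile in divide(half, n_parts // 2)]

def tile_sum(image, tile):
    """Placeholder per-tile work: sum the pixel values of one tile."""
    r0, r1, c0, c1 = tile
    return sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))

image = [[r * 8 + c for c in range(8)] for r in range(4)]  # 4 x 8 test image
tiles = divide((0, 4, 0, 8), 4)                 # divide
with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
    partial = list(pool.map(lambda t: tile_sum(image, t), tiles))
total = sum(partial)                            # merge
```

A real topological operator would also need to handle the pixels on tile borders, since their neighbourhoods span two tiles; the sketch ignores that.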


Parallel thinning operator

1. Parallelization strategy (3)

Parallel thinning operator

2. Coordination of threads (1)

First implementation using a lock-based shared FIFO queue.

[Diagram: Thread 1 and Thread 2 access the shared queue through Lock(), Push(), and Unlock(); a Lock() attempt either succeeds or fails, and a thread whose attempt fails is blocked.]
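A lock-based shared FIFO queue of this kind might look like the following minimal sketch. The class and function names are assumptions made for the example; only the Lock/Push/Unlock pattern comes from the slide.

```python
import threading
from collections import deque

class LockedFifo:
    """FIFO queue shared by all threads; every operation takes one lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:              # Lock() ... Push() ... Unlock()
            self._items.append(item)

    def pop(self):
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)

def producer(queue, thread_id, count):
    for i in range(count):
        queue.push((thread_id, i))    # a thread blocks here while another holds the lock

shared = LockedFifo()
threads = [threading.Thread(target=producer, args=(shared, t, 1000))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every push contends for the same lock, so adding threads also adds waiting time, which is one plausible reason for limited scaling with this design.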

Parallel thinning operator

2. Coordination of threads (2)

Second implementation using a private-shared concurrent FIFO queue.

[Diagram: Thread 1 and Thread 2 coordinate through a semaphore: Lock() and access the semaphore, Push(), then Unlock() and leave the semaphore.]
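The slides do not define "private-shared", so the sketch below is one interpretation rather than the author's design: each thread pushes into its own private buffer without synchronization and publishes the buffer to the shared queue in batches, entering a binary semaphore only for that transfer. The class name, `batch_size`, `flush`, and `drain` are all assumptions.

```python
import threading
from collections import deque

class PrivateSharedFifo:
    """Per-thread private buffers in front of one shared FIFO queue."""

    def __init__(self, batch_size=64):
        self._shared = deque()
        self._gate = threading.Semaphore(1)   # guards the shared part only
        self._local = threading.local()       # one private buffer per thread
        self._batch_size = batch_size

    def _buffer(self):
        if not hasattr(self._local, "items"):
            self._local.items = []
        return self._local.items

    def push(self, item):
        buf = self._buffer()
        buf.append(item)                      # private: no synchronization
        if len(buf) >= self._batch_size:
            self.flush()

    def flush(self):
        buf = self._buffer()
        if buf:
            with self._gate:                  # access semaphore, publish, leave
                self._shared.extend(buf)
            buf.clear()

    def drain(self):
        with self._gate:
            items = list(self._shared)
            self._shared.clear()
        return items

def producer(queue, thread_id, count):
    for i in range(count):
        queue.push((thread_id, i))
    queue.flush()                             # publish the remaining items

queue = PrivateSharedFifo(batch_size=64)
threads = [threading.Thread(target=producer, args=(queue, t, 1000))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
items = queue.drain()
```

Under this reading, the semaphore is entered once per batch instead of once per item, which reduces contention on the shared structure.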

Parallel thinning operator

3. Performance testing (1)

               P4 660      E8400        E5335       E5405
Arch.          Pentium 4   Core 2 Duo   Quad-core   Octo-core
CPU speed      3.60 GHz    3 GHz        2 GHz       2 GHz
Bus speed      800 MHz     1333 MHz     1333 MHz    1333 MHz
L2 size        2 MB        6 MB         8 MB        12 MB
L2 speed       3.6 GHz     3 GHz        2 GHz       2 GHz
Package type   LGA775      LGA775       LGA771      LGA771
Techno.        90 nm       45 nm        65 nm       45 nm

Parallel thinning operator

3. Performance testing (2)

First implementation using a lock-based shared FIFO queue.

[Figure: wall-clock time [ms] (axis from 0 to 70) and performance improvement (axis from 0 to 2) versus number of threads, one curve each for 1, 2, 4, and 8 cores.]

Parallel thinning operator

3. Performance testing (3)

Second implementation using a private-shared concurrent FIFO queue.

[Figure: wall-clock time [ms] (axis from 0 to 70) and performance improvement (axis from 0 to 7) versus number of threads, one curve each for 1, 2, 4, and 8 cores.]

Parallel thinning operator

4. Conclusion

[Figure: efficiency (axis from 0 to 1.2) versus number of cores (1, 2, 4, 8), comparing the lock-based shared FIFO queue with the private-shared FIFO queue.]

1. Non-specific nature of the proposed parallelization strategy.
2. Thread coordination and communication during computation: dependent parallel reads and writes for managing cache-resident data.
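The slides do not state how performance improvement and efficiency were computed. Assuming the usual definitions (speed-up as serial time over parallel time, efficiency as speed-up per core), the two quantities can be sketched as follows. The timings in the example are made up for illustration and are not read from the plots.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    """Efficiency: speed-up divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / n_cores

# Made-up timings in ms, NOT values taken from the slides.
example = efficiency(60.0, 20.0, 4)   # speed-up 3.0 on 4 cores
```

An efficiency close to 1 means the cores are kept busy; values well below 1 usually point to synchronization or memory overhead.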


Future work

1. Extension

Imbricate two operators: the parallel thinning operator and crest restoring.

SDM strategy:
Performance enhancement (speed-up)
Efficiency (work distribution)
Cache misses

Future work

2. New parallel topological watershed

Parallel watershed operator

SDM strategy:
Performance enhancement (speed-up)
Efficiency (work distribution)
Cache misses

Achievement: 80%


Discussion

Introduce a future programming model that makes it easy to write programs that execute efficiently on highly parallel computing systems.

Introduce a new “Draft” to design and evaluate parallel programming models (instead of old benchmarks).

To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
