parallelization strategy

Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architecture?

R. MAHMOUDI – A3SI Laboratory– 2009 April

Upload: r-m

Post on 25-Jun-2015


DESCRIPTION

The global design is inspired by an old presentation already published on slideshare.

TRANSCRIPT

Page 1: parallelization strategy

Is it an open door to a common parallelization strategy for topological operators on multi-core multi-thread architecture?

R. MAHMOUDI – A3SI Laboratory– 2009 April

Page 2: parallelization strategy

Summary

General framework

Parallel thinning operator

Future work

Discussion


Page 4: parallelization strategy

General framework

1. Scientific and technical context (1)

Image processing operators and their associated classes:

Point-to-point operators: thresholding, dynamic redistribution

Local operators: linear filters, non-linear filters

Morphological operators: opening, closing, attribute filter

Topological operators: thinning, crest restoring, smoothing, watershed

Global operators: Fourier transformation, Euclidean distance transformation

Page 5: parallelization strategy

General framework

1. Scientific and technical context (2)

(Associated class) vs. (Parallelization strategies)

Classes: point-to-point operators, local operators, morphological operators, topological operators, global operators

Existing parallelization strategies: Seinstra [1] (2002), Wilkinson [2] (2007), Meijster [3]

[1] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, “A software architecture for user transparent parallel image processing”.

[2] M. H. F. Wilkinson, H. Gao, and W. H. Hesselink, “Concurrent Computation of Attribute Filters on Shared Memory Parallel Machines”.

[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, “A general algorithm for computing distance transforms in linear time”.

Page 6: parallelization strategy

General framework

2. Ph. D. objectives (1)

Topological operators targeted by a common parallelization strategy:

Thinning operator [1]

Crest restoring [1]

2D and 3D smoothing [2]

Watershed based on w-thinning [3]

Watershed based on graph [4]

Homotopic kernel transformation [5]

Leveling kernel transformation [5]

[1] M. Couprie, F. N. Bezerra, and G. Bertrand, “Topological operators for grayscale image processing”.

[2] M. Couprie and G. Bertrand, “Topology preserving alternating sequential filter for smoothing 2D and 3D objects”.

[3] G. Bertrand, “On Topological Watersheds”.

[4] J. Cousty, M. Couprie, L. Najman, and G. Bertrand, “Weighted fusion graphs: Merging properties and watersheds”.

[5] G. Bertrand, J. C. Everat, and M. Couprie, “Image segmentation through operators based on topology”.

Page 7: parallelization strategy

General framework

2. Ph. D. objectives (2)

Main architectural classes:

SISD machines

SIMD machines

MISD machines

MIMD machines (execute several instruction streams in parallel on different data)

MIMD machines: shared memory system or distributed memory system

Shared memory machine: CPU1, CPU2, CPU3, ..., CPU n all access the same Random Access Memory.

Page 8: parallelization strategy

General framework

2. Ph. D. objectives (3)

Needs

Common parallelization strategy of topological operators on multi-core multithread architecture (MIMD machines – shared memory system)?

Main objectives

1. Unifying parallelization method of topological operators class (algorithmic level).

2. Implementation methodology and optimization techniques on multi-core multithread architecture (architecture level).


Page 10: parallelization strategy

Parallel thinning operator

1. Theoretical background

Filtered thinning method that selectively simplifies the topology, based on a local contrast parameter λ.

Algorithm: λ-Skeleton (input: F, λ; output: F)

1. Repeat until stability:

2. Among all the points which are λ-deletable and not λ-end,

3. select a point x of minimal value;

4. F(x) = α⁻(x, F).

Figure: (a) after Deriche gradient operator; (b) filtered skeleton with λ = 10.
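The control flow of the λ-skeleton loop can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the topological tests (λ-deletable, λ-end) are left as caller-supplied hypothetical predicates `is_deletable` and `is_end`, the 8-neighbourhood is assumed, and α⁻(x, F) is assumed to mean the highest neighbour value strictly below F(x).

```python
def neighbors8(img, r, c):
    """Yield the values of the in-bounds 8-neighbours of (r, c) (assumed adjacency)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                yield img[rr][cc]


def alpha_minus(img, r, c):
    """Assumed meaning of alpha^-: highest neighbour value strictly below
    img[r][c], or img[r][c] itself when no neighbour is lower."""
    lower = [v for v in neighbors8(img, r, c) if v < img[r][c]]
    return max(lower) if lower else img[r][c]


def lambda_skeleton(img, is_deletable, is_end):
    """Lower points in place until no candidate can be lowered any further.

    is_deletable and is_end are hypothetical predicates standing in for the
    lambda-deletable and lambda-end tests, which are not reproduced here.
    """
    while True:  # step 1: repeat until stability
        candidates = [
            (img[r][c], r, c)
            for r in range(len(img))
            for c in range(len(img[0]))
            if is_deletable(img, r, c)           # step 2: deletable ...
            and not is_end(img, r, c)            # ... and not an end point
            and alpha_minus(img, r, c) < img[r][c]
        ]
        if not candidates:
            return img
        _, r, c = min(candidates)                # step 3: point of minimal value
        img[r][c] = alpha_minus(img, r, c)       # step 4: F(x) = alpha^-(x, F)


# Toy run with trivial predicates (every point deletable, no end points).
toy = [[3, 1], [2, 5]]
result = lambda_skeleton(toy, lambda img, r, c: True, lambda img, r, c: False)
```

With these trivial predicates nothing protects the topology, so the toy image simply flattens to its minimum value; real predicates would stop the lowering much earlier.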

Page 11: parallelization strategy

Parallel thinning operator

1. Parallelization strategy (1)

Define search area

Start parallel characterization

Create new shared data structure

End parallel characterization

Merge modified search area

Restart process until stability
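The cycle listed above (define search area, characterize in parallel, collect into a shared structure, merge, restart until stability) can be sketched as below. This is a minimal sketch under assumptions: the search area is split into row bands, and the per-point rule is a toy stand-in (a point is flagged when it lies more than 1 above its lowest 4-neighbour), not the topological characterization used in the slides. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_parts):
    """Split range(n_rows) into at most n_parts contiguous, non-empty bands."""
    n_parts = max(1, min(n_parts, n_rows))
    step = -(-n_rows // n_parts)  # ceiling division
    return [range(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]


def min_neighbor(img, r, c):
    """Smallest value among the in-bounds 4-neighbours of (r, c)."""
    cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return min((img[a][b] for a, b in cells
                if 0 <= a < len(img) and 0 <= b < len(img[0])),
               default=img[r][c])


def characterize(img, band):
    """Toy characterization: read-only scan of one band for points to update."""
    return [(r, c) for r in band for c in range(len(img[0]))
            if img[r][c] > min_neighbor(img, r, c) + 1]


def run_until_stable(img, n_threads=4):
    while True:
        bands = split_rows(len(img), n_threads)            # define search area
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            # start parallel characterization: threads only read img
            parts = list(pool.map(lambda b: characterize(img, b), bands))
        # end parallel characterization; build the new shared data structure
        shared = [p for part in parts for p in part]
        if not shared:                                     # stability reached
            return img
        for r, c in shared:                                # merge modified area
            img[r][c] = min(img[r][c], min_neighbor(img, r, c) + 1)


line = run_until_stable([[0, 5, 5, 5]])
grid = run_until_stable([[0, 9], [9, 9]])
```

Writes are confined to the sequential merge step, so the parallel phase needs no locking. In CPython the threads illustrate the structure rather than deliver a speed-up for CPU-bound work.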

Page 12: parallelization strategy

Parallel thinning operator

1. Parallelization strategy (2)

SDM-Strategy (divide and conquer principle)

Up level: data parallelism

Down level: thread parallelism

Mixed parallelism (data parallelism combined with thread parallelism)

Page 13: parallelization strategy


Parallel thinning operator

1. Parallelization strategy (3)

Page 14: parallelization strategy

Parallel thinning operator

2. Coordination of threads (1)

First implementation using a lock-based shared FIFO queue.

Diagram: Thread 1 and Thread 2 each call Lock(), Push() and Unlock() on the shared queue. A Lock() call either succeeds or fails; on failure the thread is blocked.
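A lock-based shared FIFO queue of this kind can be sketched as follows. This is an illustrative sketch only: the class name `LockedFifo`, the `producer` helper and the item counts are hypothetical and do not come from the slides.

```python
import threading
from collections import deque


class LockedFifo:
    """FIFO queue shared by all threads and guarded by a single lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:              # Lock(): blocks while another thread holds it
            self._items.append(item)  # Push()
        # leaving the with-block performs Unlock()

    def pop(self):
        """Return the oldest item, or None when the queue is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None


def producer(queue, thread_id, count):
    """Push `count` tagged items so per-thread ordering can be checked later."""
    for i in range(count):
        queue.push((thread_id, i))


shared_queue = LockedFifo()
threads = [threading.Thread(target=producer, args=(shared_queue, t, 100))
           for t in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = []
while True:
    item = shared_queue.pop()
    if item is None:
        break
    items.append(item)
```

Every push takes the same lock, so threads serialize on the queue. That contention is the kind of cost a private-shared design tries to reduce.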

Page 15: parallelization strategy

Parallel thinning operator

2. Coordination of threads (2)

Second implementation using a private-shared concurrent FIFO queue.

Diagram: Thread 1 and Thread 2 coordinate through a semaphore: Lock() and access semaphore, Push(), then Unlock() and leave semaphore.
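The slides do not detail how the private-shared queue works, so the following is one possible reading rather than the author's design. It assumes each thread pushes into its own private buffer without synchronization and moves items to the shared queue in batches while holding a semaphore. The class name, `batch_size` and the `worker` helper are hypothetical.

```python
import threading
from collections import deque


class PrivateSharedFifo:
    """Per-thread private buffers flushed in batches into one shared FIFO."""

    def __init__(self, batch_size=32):
        self._shared = deque()
        self._gate = threading.Semaphore(1)  # guards the shared part only
        self._local = threading.local()      # holds each thread's private buffer
        self._batch_size = batch_size

    def _private(self):
        if not hasattr(self._local, "items"):
            self._local.items = deque()
        return self._local.items

    def push(self, item):
        private = self._private()
        private.append(item)                 # private part: no synchronization
        if len(private) >= self._batch_size:
            self.flush()

    def flush(self):
        """Move the calling thread's private items to the shared queue."""
        private = self._private()
        with self._gate:                     # lock and access semaphore
            self._shared.extend(private)
        private.clear()                      # semaphore already released here

    def drain(self):
        """Return and remove everything currently in the shared queue."""
        with self._gate:
            items = list(self._shared)
            self._shared.clear()
        return items


def worker(queue, thread_id, count):
    for i in range(count):
        queue.push((thread_id, i))
    queue.flush()                            # publish the last partial batch


queue = PrivateSharedFifo(batch_size=32)
threads = [threading.Thread(target=worker, args=(queue, t, 100)) for t in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
drained = queue.drain()
```

Under this reading the semaphore is taken once per batch instead of once per item, which reduces how often threads wait on each other.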

Page 16: parallelization strategy

Parallel thinning operator

3. Performance testing (1)

Test machines:

               P4 660      E8400        E5335       E5405
Arch.          Pentium 4   Core 2 Duo   Quad-core   Octo-core
CPU speed      3.60 GHz    3 GHz        2 GHz       2 GHz
Bus speed      800 MHz     1333 MHz     1333 MHz    1333 MHz
L2 size        2 MB        6 MB         8 MB        12 MB
L2 speed       3.6 GHz     3 GHz        2 GHz       2 GHz
Package type   LGA775      LGA775       LGA771      LGA771
Techno.        90 nm       45 nm        65 nm       45 nm

Page 17: parallelization strategy

Parallel thinning operator

3. Performance testing (2)

First implementation using a lock-based shared FIFO queue.

Figure: wall-clock time [ms] and performance improvement versus number of threads, with one curve each for 1, 2, 4 and 8 cores.

Page 18: parallelization strategy

Parallel thinning operator

3. Performance testing (3)

Second implementation using a private-shared concurrent FIFO queue.

Figure: wall-clock time [ms] and performance improvement versus number of threads, with one curve each for 1, 2, 4 and 8 cores.

Page 19: parallelization strategy

Parallel thinning operator

4. Conclusion

Figure: efficiency versus number of cores (1, 2, 4, 8), comparing the lock-based shared FIFO queue with the private-shared FIFO queue.

1. Non-specific nature of the proposed parallelization strategy.

2. Thread coordination and communication during computation depend on how parallel reads and writes to cache-resident data are managed.
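For reference, the two quantities plotted in the performance and efficiency figures can be computed as below. This assumes the usual definitions (speed-up as serial time over parallel time, efficiency as speed-up divided by the number of cores) and assumes that "performance improvement" in the slides means speed-up. The timings in the example are made up for illustration and are not measurements from the slides.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Efficiency: speed-up per core; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / n_cores


# Illustrative, made-up timings: 60 ms on one core, 20 ms on four cores.
example_speedup = speedup(60.0, 20.0)
example_efficiency = efficiency(60.0, 20.0, 4)
```

An efficiency well below 1 on many cores usually points to time lost in coordination, such as waiting on a shared queue.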


Page 21: parallelization strategy

Future work

1. Extension

Interleave (imbricate) two operators: the parallel thinning operator and crest restoring.

SDM-Strategy:

Performance enhancement (speed up)

Efficiency (work distribution)

Cache miss

Page 22: parallelization strategy

Future work

2. New parallel topological watershed

Parallel watershed operator

SDM-Strategy:

Performance enhancement (speed up)

Efficiency (work distribution)

Cache miss

% Achievement: 80%


Page 24: parallelization strategy

Discussion

Introduce a future programming model that makes it easy to write programs that execute efficiently on highly parallel computing systems.

Introduce a new “Draft” to design and evaluate parallel programming models (instead of old benchmarks).

To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
