

Function Level Parallelism Lead by Data Dependencies

Sean Rul, Hans Vandierendonck and Koen De Bosschere

Ghent University, ELIS-PARIS, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium

Sean Rul is supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders

http://www.elis.ugent.be/~srul

[Figure: Speedup (y-axis, 0.0 to 4.0) for Compression, Decompression and Total; series: Original, Heterogeneous, Homogeneous]

Poster panels: Problem, Applications, Results, Conclusion

[Figure: functions m, f, h, g, i, l, j, k grouped into clusters; legend: Intercluster data stream, Intracluster data stream; label: FPGA]

Method

[Flow diagram: Sequential program, Profile ("Too much information"), Abstracting profiled information (Call Graph, Interprocedural Data Flow Graph, Data Sharing Graph), Matching parallel constructs, Parallelizing, Multithreaded program]
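
The abstraction step named in the Method panel reduces a raw profile to a few graphs. As a minimal sketch of the call graph part only, assume a hypothetical trace that records one (caller, callee) pair per call event; the trace format, the function name `build_call_graph` and the sample data are illustrative assumptions, not the authors' profiler.

```python
from collections import Counter

# Hypothetical call trace: one (caller, callee) pair per call event.
trace = [("m", "f"), ("m", "f"), ("f", "h"), ("m", "g"), ("f", "h"), ("f", "h")]

def build_call_graph(events):
    """Collapse a call trace into edge weights: (caller, callee) -> call count."""
    return Counter(events)

call_graph = build_call_graph(trace)
for (caller, callee), count in sorted(call_graph.items()):
    print(f"{caller} -> {callee}: x {count}")
```

Collapsing the trace into per-edge counts is one way to address the "too much information" problem: the graph size depends on the number of functions, not on the length of the run.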

Besides parallelizing sequential programs:

•Hybrid Hardware / Software or embedded systems

•Data Partitioning on Cell processor

Program Bzip2 (SPEC2000) with reference input

Executed on a quad Itanium® system

[Figure: call graph of functions m, f, h, g, i, l, j, k, annotated with # executions (x 10, x 10, x 20, x 20, x 100, x 30, x 20, x 10) and % execution time (1%, 20%, 14%, 15%, 15%, 10%, 10%, 14%)]

[Figure: data sharing graph connecting functions m, f, h, g, i, l, j, k to data structures ds1 to ds9; edge types: Read, Write; data structures marked Cluster private or Cluster shared]

•New microprocessor generation: Increase in parallel computing power

•Sequential programs: Cannot exploit these resources

•Parallelizing by hand: Difficult and time consuming

•Let the compiler do it: Set up a framework for parallelism detection

•Call graph and interprocedural data flow graph are useful for detecting parallel constructs

•Data sharing graph reveals data affinity between functions

•Future work:
- Find new parallel constructs
- Investigate bidirectional data streams
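
To illustrate how a data sharing graph can expose data affinity between functions, here is a minimal sketch. It assumes a hypothetical profile of (function, data structure, access kind) records; the record format, the helper names and the affinity measure (number of data structures two functions both touch) are assumptions made for illustration, not the method used on the poster.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical profile: (function, data structure, access kind).
accesses = [
    ("f", "ds1", "write"), ("h", "ds1", "read"),
    ("h", "ds2", "write"), ("g", "ds2", "read"),
    ("f", "ds3", "write"), ("f", "ds3", "read"),
]

def data_sharing_graph(accesses):
    """Map each data structure to the functions that read and write it."""
    graph = defaultdict(lambda: {"read": set(), "write": set()})
    for func, ds, kind in accesses:
        graph[ds][kind].add(func)
    return graph

def affinity(graph):
    """Count, per pair of functions, the data structures both touch."""
    shared = defaultdict(int)
    for users in graph.values():
        funcs = sorted(users["read"] | users["write"])
        for a, b in combinations(funcs, 2):
            shared[(a, b)] += 1
    return dict(shared)

def data_streams(graph):
    """Producer -> consumer edges: one function writes what another reads."""
    return sorted({(w, r, ds) for ds, u in graph.items()
                   for w in u["write"] for r in u["read"] if w != r})

graph = data_sharing_graph(accesses)
print(affinity(graph))      # pairs of functions and how many structures they share
print(data_streams(graph))  # candidate data streams between functions
```

In this sample, `ds3` is touched by `f` alone, so it contributes no affinity and could stay private to whichever thread runs `f`, while `ds1` and `ds2` form a chain from `f` to `h` to `g`.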

Look for a balanced solution

Detect, for example, a data pipeline

Minimize communication between threads

Add synchronization and initialization code
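
The data pipeline construct and the synchronization code mentioned above can be sketched as follows. This is a minimal illustration using Python's standard `threading` and `queue` modules: one thread per stage, FIFO queues as the data streams, and a sentinel object for shutdown. The stage functions `f`, `h` and `g` are hypothetical stand-ins, and the code is not the generated code from the poster's experiments.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data stream

def stage(func, inbox, outbox):
    """Run func on every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(func(item))

def run_pipeline(funcs, items):
    """Connect one thread per function with FIFO queues and collect the output."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while (out := queues[-1].get()) is not SENTINEL:
        results.append(out)
    for t in threads:
        t.join()
    return results

# Hypothetical stage functions standing in for profiled functions f, h, g.
def f(x): return x + 1
def h(x): return x * 2
def g(x): return x - 3

print(run_pipeline([f, h, g], range(5)))  # [-1, 1, 3, 5, 7]
```

Each stage communicates only with its neighbours through a queue, which keeps communication between threads to the data stream itself. In CPython the threads show the structure of the construct rather than a real speedup for CPU-bound stages.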

Elliptic node: Data structure

Rectangular node: Function