
GrPPI: Generic Reusable Parallel Patterns Interface

ARCOS Group
University Carlos III of Madrid, Spain

January 2018


Warning

This work is under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

You are free to share: copy and redistribute the material in any medium or format.

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.


ARCOS@uc3m

UC3M: A young, international, research-oriented university.
ARCOS: An applied research group.
  Lines: High Performance Computing, Big Data, Cyberphysical Systems, and Programming Models for application improvement.

Improving applications:
  REPARA: Reengineering and Enabling Performance and poweR of Applications. Funded by the European Commission (FP7). 2013–2016.
  RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications. Funded by the European Commission (H2020). 2015–2018.

Standardization:
  ISO/IEC JTC1/SC22/WG21: ISO C++ standards committee.


Acknowledgements

The GrPPI library has been partially supported by:
  Project ICT 644235 "REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications", funded by the European Commission through the H2020 program (2015-2018).
  Project TIN2016-79673-P "Towards Unification of HPC and Big Data Paradigms", funded by the Spanish Ministry of Economy and Competitiveness (2016-2019).


GrPPI team

Main team:
  J. Daniel Garcia (UC3M, lead).
  David del Río (UC3M).
  Manuel F. Dolz (UC3M).
  Javier Fernández (UC3M).
  Javier Garcia Blas (UC3M).

Cooperation:
  Plácido Fernández (UC3M-CERN).
  Marco Danelutto (Univ. Pisa).
  Massimo Torquati (Univ. Pisa).
  Marco Aldinucci (Univ. Torino).
  ...


1 Introduction

2 Data patterns

3 Task Patterns

4 Streaming patterns

5 Writing your own execution

6 Evaluation

7 Conclusions


1 Introduction
  Parallel Programming
  Design patterns and parallel patterns
  GrPPI architecture


Parallel Programming

Thinking in Parallel is hard.

Yale Patt


Sequential Programming versus Parallel Programming

Sequential programming:
  Well-known set of control structures embedded in programming languages.
  Control structures are inherently sequential.

Parallel programming:
  Constructs adapting sequential control structures to the parallel world (e.g. parallel-for).

But wait! What if we had constructs that could be both sequential and parallel?


Design patterns and parallel patterns

Software design

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

C.A.R. Hoare


A brief history of patterns

From building and architecture (Christopher Alexander):
  1977: A Pattern Language: Towns, Buildings, Construction.
  1979: The Timeless Way of Building.

To software design (Gamma et al.):
  1993: Design Patterns: Abstraction and Reuse of Object-Oriented Design. ECOOP.
  1995: Design Patterns: Elements of Reusable Object-Oriented Software.

To parallel programming (McCool, Reinders, Robison):
  2012: Structured Parallel Programming: Patterns for Efficient Computation.


GrPPI architecture

Some ideals

Applications should be expressed independently of the execution model.
Multiple back-ends should be offered with simple switching mechanisms.
The interface should integrate seamlessly with the modern C++ standard library.
Make use of modern (C++14) language features.


GrPPI

https://github.com/arcosuc3m/grppi

A header-only library (might change).
A set of execution policies.
A set of type-safe generic algorithms.
Requires C++14.
GNU GPL v3.


Setting up GrPPI

Structure:
  include: Include files.
  unit_tests: Unit tests using GoogleTest.
  samples: Sample programs.
  cmake-modules: Extra CMake scripts.

Initial setup:

    mkdir build
    cd build
    cmake ..
    make


CMake variables

GRPPI_UNIT_TESTS_ENABLE: Enable building unit tests.
GRPPI_OMP_ENABLE: Enable OpenMP back-end.
GRPPI_TBB_ENABLE: Enable Intel TBB back-end.
GRPPI_EXAMPLE_APPLICATIONS_ENABLE: Enable building example applications.
GRPPI_DOXY_ENABLE: Enable documentation generation.


Execution policies

The execution model is encapsulated by execution values.

Current execution types:
  sequential_execution
  parallel_execution_native
  parallel_execution_omp
  parallel_execution_tbb
  dynamic_execution

All top-level patterns take one execution object.


Concurrency degree

Sets the number of underlying threads used by the execution implementation.
  sequential_execution ⇒ 1
  parallel_execution_native ⇒ hardware_concurrency()
  parallel_execution_omp ⇒ omp_get_num_threads()

API:

    ex.set_concurrency_degree(4);
    int n = ex.concurrency_degree();
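The getter/setter pair above can be mimicked with a small class. This is a minimal sketch under stated assumptions: `toy_native_execution` is a hypothetical name and is not GrPPI's own type; it only mirrors the two member functions named on the slide and takes its default from `std::thread::hardware_concurrency()`.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for an execution policy; NOT GrPPI's actual class.
class toy_native_execution {
public:
  // Default degree: what the hardware reports, or 1 when that is unknown (0).
  toy_native_execution() noexcept
    : degree_{default_degree()} {}

  void set_concurrency_degree(int degree) noexcept {
    assert(degree > 0);  // A degree of zero threads would make no sense.
    degree_ = degree;
  }

  int concurrency_degree() const noexcept { return degree_; }

private:
  static int default_degree() noexcept {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
  }

  int degree_;
};
```

A pattern implementation would read `concurrency_degree()` to decide how many chunks or worker threads to use.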


Dynamic back-end

Useful if you want to take the decision at run-time.
Holds any other execution policy (or is empty).

Selecting the execution back-end:

    grppi::dynamic_execution execution_mode(const std::string & opt) {
      using namespace grppi;
      if ("seq" == opt) return sequential_execution{};
      if ("thr" == opt) return parallel_execution_native{};
      if ("omp" == opt) return parallel_execution_omp{};
      if ("tbb" == opt) return parallel_execution_tbb{};
      return {};  // Unknown option: empty dynamic_execution.
    }


Function objects

GrPPI is heavily based on passing code sections as function objects (aka functors).

Alternatives:
  Standard C++ predefined functors (e.g. std::plus<int>).
  Custom hand-written function objects.
  Lambda expressions.

Usually lambda expressions lead to more concise code.
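The three alternatives can be compared side by side. The sketch below uses only the standard library (`std::accumulate`), not GrPPI; `adder` and the `sum_*` helpers are hypothetical names introduced for illustration.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Custom hand-written function object (hypothetical name).
struct adder {
  int operator()(int x, int y) const { return x + y; }
};

// 1. Standard predefined functor.
inline int sum_std(const std::vector<int> & v) {
  return std::accumulate(v.begin(), v.end(), 0, std::plus<int>{});
}

// 2. Custom function object.
inline int sum_custom(const std::vector<int> & v) {
  return std::accumulate(v.begin(), v.end(), 0, adder{});
}

// 3. Lambda expression: the operation is written where it is used.
inline int sum_lambda(const std::vector<int> & v) {
  return std::accumulate(v.begin(), v.end(), 0,
                         [](int x, int y) { return x + y; });
}
```

All three compute the same sum; the lambda avoids a separate type definition.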


2 Data patterns
  Map pattern
  Reduce pattern
  Map/reduce pattern
  Stencil pattern


Map pattern

Maps on data sequences

A map pattern applies an operation to every element in a tuple of data sets, generating a new data set.

Given:
  A sequence $x^1_1, x^1_2, \ldots, x^1_N \in T_1$,
  A sequence $x^2_1, x^2_2, \ldots, x^2_N \in T_2$,
  ..., and
  A sequence $x^M_1, x^M_2, \ldots, x^M_N \in T_M$,
  A function $f : T_1 \times T_2 \times \ldots \times T_M \mapsto U$

It generates the sequence:
  $f(x^1_1, x^2_1, \ldots, x^M_1), f(x^1_2, x^2_2, \ldots, x^M_2), \ldots, f(x^1_N, x^2_N, \ldots, x^M_N)$


Unidimensional maps

A map pattern on a single input data set.

Given:
  A sequence $x_1, x_2, \ldots, x_N \in T$
  A function $f : T \mapsto U$

It generates the sequence:
  $f(x_1), f(x_2), \ldots, f(x_N)$
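The map semantics can be illustrated sequentially with `std::transform`. This is a sketch under stated assumptions: `map_unary` and `map_binary` are hypothetical helper names, not GrPPI functions, and the result type is assumed to be default-constructible so the output vector can be preallocated.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Unidimensional map: produces f(x_1), f(x_2), ..., f(x_N).
template <typename T, typename F>
auto map_unary(const std::vector<T> & xs, F f) {
  using U = std::decay_t<decltype(f(std::declval<const T &>()))>;
  std::vector<U> out(xs.size());
  std::transform(xs.begin(), xs.end(), out.begin(), f);
  return out;
}

// Map over two sequences (M = 2): produces f(x_1, y_1), ..., f(x_N, y_N).
template <typename T1, typename T2, typename F>
auto map_binary(const std::vector<T1> & xs, const std::vector<T2> & ys, F f) {
  assert(xs.size() == ys.size());  // Both sequences must have N elements.
  using U = std::decay_t<decltype(
      f(std::declval<const T1 &>(), std::declval<const T2 &>()))>;
  std::vector<U> out(xs.size());
  std::transform(xs.begin(), xs.end(), ys.begin(), out.begin(), f);
  return out;
}
```

Each output element depends only on the input elements at the same position, which is what lets a parallel back-end process disjoint index ranges independently.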


Key element

Transformer operation: Any operation that can perform the transformation for a data item.

UnaryTransformer: Any C++ callable entity that takes a data item and returns the transformed value.

    auto square = [](auto x) { return x*x; };
    auto length = [](const std::string & s) { return s.length(); };

MultiTransformer: Any C++ callable entity that takes multiple data items and returns the transformed value.

    auto normalize = [](double x, double y) { return std::sqrt(x*x + y*y); };
    auto min = [](int x, int y, int z) { return std::min({x, y, z}); };


Single sequences mapping

Double all elements in sequence:

    template <typename Execution>
    std::vector<double> double_elements(const Execution & ex,
                                        const std::vector<double> & v)
    {
      std::vector<double> res(v.size());
      grppi::map(ex, v.begin(), v.end(), res.begin(),
        [](double x) { return 2*x; });
      return res;
    }


Multiple sequences mapping

Add two vectors:

    template <typename Execution>
    std::vector<double> add_vectors(const Execution & ex,
                                    const std::vector<double> & v1,
                                    const std::vector<double> & v2)
    {
      // Only the common prefix of both inputs is processed.
      auto size = std::min(v1.size(), v2.size());
      std::vector<double> res(size);
      grppi::map(ex, v1.begin(), v1.begin() + size, res.begin(),
        [](double x, double y) { return x+y; },
        v2.begin());
      return res;
    }


Multiple sequences mapping

Add three vectors:

    template <typename Execution>
    std::vector<double> add_vectors(const Execution & ex,
                                    const std::vector<double> & v1,
                                    const std::vector<double> & v2,
                                    const std::vector<double> & v3)
    {
      // Only the common prefix of the three inputs is processed.
      auto size = std::min({v1.size(), v2.size(), v3.size()});
      std::vector<double> res(size);
      grppi::map(ex, v1.begin(), v1.begin() + size, res.begin(),
        [](double x, double y, double z) { return x+y+z; },
        v2.begin(), v3.begin());
      return res;
    }


Heterogeneous mapping

The result can be of a different type.

Complex vector from real and imaginary vectors:

    template <typename Execution>
    std::vector<std::complex<double>> create_cplx(const Execution & ex,
                                                  const std::vector<double> & re,
                                                  const std::vector<double> & im)
    {
      auto size = std::min(re.size(), im.size());
      std::vector<std::complex<double>> res(size);
      grppi::map(ex, re.begin(), re.begin() + size, res.begin(),
        [](double r, double i) -> std::complex<double> { return {r, i}; },
        im.begin());
      return res;
    }


Reduce pattern

Reductions on data sequences

A reduce pattern combines all values in a data set using a binary combination operation.

Given:
  A sequence $x_1, x_2, \ldots, x_N \in T$.
  An identity value $id \in I$.
  A combine operation $c : I \times T \mapsto I$ such that:
    $c(c(x, y), z) \equiv c(x, c(y, z))$
    $c(id, x) = \bar{x}$, where $\bar{x}$ is the value of $x$ in $I$.
    $c(id, c(id, x)) = c(id, x)$
    $c(c(c(id, x), y), c(c(id, z), t)) = c(c(c(c(id, x), y), z), t)$

It generates the value:
  $c(\ldots c(c(id, x_1), x_2) \ldots, x_N)$
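The last property is what allows a reduction to be split into chunks whose partial results are combined afterwards. The sketch below shows this sequentially with `std::accumulate`; `reduce_seq` and `reduce_chunked` are hypothetical names, not GrPPI functions, and the chunked variant assumes the simple case where `T` and `I` are the same type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential reference: c(... c(c(id, x_1), x_2) ..., x_N).
template <typename T, typename I, typename C>
I reduce_seq(const std::vector<T> & xs, I id, C combine) {
  return std::accumulate(xs.begin(), xs.end(), id, combine);
}

// Chunked variant: each chunk is reduced starting from id, then the partial
// results are combined. The result matches reduce_seq only when combine
// satisfies the properties listed above.
template <typename T, typename C>
T reduce_chunked(const std::vector<T> & xs, T id, C combine, std::size_t chunks) {
  assert(chunks > 0);
  const std::size_t n = xs.size();
  const std::size_t step = (n + chunks - 1) / chunks;  // ceil(n / chunks)
  T result = id;
  for (std::size_t first = 0; first < n; first += step) {
    const std::size_t last = std::min(first + step, n);
    T partial = std::accumulate(xs.begin() + first, xs.begin() + last, id, combine);
    result = combine(result, partial);
  }
  return result;
}
```

In a parallel back-end each chunk could be handled by a different thread; here the chunks run one after another to keep the sketch self-contained.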


Homogeneous reduction

Add a sequence of values:

    template <typename Execution>
    double add_sequence(const Execution & ex, const std::vector<double> & v)
    {
      return grppi::reduce(ex, v.begin(), v.end(), 0.0,
        [](double x, double y) { return x+y; });
    }


Heterogeneous reduction

Add lengths of sequence of strings:

    template <typename Execution>
    int add_lengths(const Execution & ex, const std::vector<std::string> & words)
    {
      // The identity (int) and the data items (std::string) have different types.
      return grppi::reduce(ex, words.begin(), words.end(), 0,
        [](int n, const std::string & w) { return n + static_cast<int>(w.length()); });
    }


Map/reduce pattern

A map/reduce pattern combines a map pattern and a reduce pattern into a single pattern.
  1 One or more data sets are mapped applying a transformation operation.
  2 The results are combined by a reduction operation.

A map/reduce could also be expressed by the composition of a map and a reduce.
However, map/reduce may potentially fuse both stages, allowing for extra optimizations.
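The difference between composing and fusing can be seen on a sum of squares. This is a sequential, standard-library sketch, not GrPPI code; both function names are hypothetical, and the saved temporary is just one example of the kind of optimization fusion may allow.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Composition: the map stage materializes all squares,
// then the reduce stage adds them.
inline double sum_squares_composed(const std::vector<double> & v) {
  std::vector<double> squares(v.size());
  std::transform(v.begin(), v.end(), squares.begin(),
                 [](double x) { return x * x; });
  return std::accumulate(squares.begin(), squares.end(), 0.0);
}

// Fused: each element is transformed and combined in a single pass,
// so no intermediate sequence is stored.
inline double sum_squares_fused(const std::vector<double> & v) {
  return std::accumulate(v.begin(), v.end(), 0.0,
                         [](double acc, double x) { return acc + x * x; });
}
```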


Map/reduce with single data set

A map/reduce on a single input sequence producing a value.

Given:
  A sequence $x_1, x_2, \ldots, x_N \in T$
  A mapping function $m : T \mapsto R$
  A reduction identity value $id \in I$.
  A combine operation $c : I \times R \mapsto I$

It generates a value reducing the mapping:
  $c(\ldots c(c(id, m_1), m_2) \ldots, m_N)$
  where $m_k = m(x_k)$


Single sequence map/reduce

Sum of squares:

    template <typename Execution>
    double sum_squares(const Execution & ex, const std::vector<double> & v)
    {
      return grppi::map_reduce(ex, v.begin(), v.end(), 0.0,
        [](double x) { return x*x; },
        [](double x, double y) { return x+y; }
      );
    }


Map/reduce in multiple data sets

A map/reduce on multiple input sequences producing a single value.

Given:
  A sequence $x^1_1, x^1_2, \ldots, x^1_N \in T_1$
  A sequence $x^2_1, x^2_2, \ldots, x^2_N \in T_2$
  ...
  A sequence $x^M_1, x^M_2, \ldots, x^M_N \in T_M$
  A mapping function $m : T_1 \times T_2 \times \ldots \times T_M \mapsto R$
  A reduction identity value $id \in I$.
  A combine operation $c : I \times R \mapsto I$

It generates a value reducing the mapping:
  $c(\ldots c(c(id, m_1), m_2) \ldots, m_N)$
  where $m_k = m(x^1_k, x^2_k, \ldots, x^M_k)$
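For two input sequences (M = 2) the definition above can be written as a plain loop. This is a sequential sketch; `map_reduce2` is a hypothetical helper, not a GrPPI function, and both sequences are assumed to have the same length N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes c(... c(c(id, m_1), m_2) ..., m_N) with m_k = m(xs[k], ys[k]).
template <typename T1, typename T2, typename I, typename M, typename C>
I map_reduce2(const std::vector<T1> & xs, const std::vector<T2> & ys,
              I id, M mapper, C combine) {
  assert(xs.size() == ys.size());  // Both sequences must have N elements.
  I result = id;
  for (std::size_t k = 0; k < xs.size(); ++k) {
    result = combine(result, mapper(xs[k], ys[k]));
  }
  return result;
}
```

With multiplication as the mapping and addition as the combine operation, this computes a scalar product.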


Map/reduce on two data sets

Scalar product

template <typename Execution>
double scalar_product(const Execution & ex,
                      const std::vector<double> & v1,
                      const std::vector<double> & v2)
{
  return grppi::map_reduce(ex, begin(v1), end(v1), 0.0,
    [](double x, double y) { return x*y; },
    [](double x, double y) { return x+y; },
    v2.begin());
}

Canonical map/reduce

Given a sequence of words, produce a container where:
  The key is the word.
  The value is the number of occurrences of that word.

Word frequencies

template <typename Execution>
auto word_freq(const Execution & ex, const std::vector<std::string> & words)
{
  using namespace std;
  using dictionary = std::map<string,int>;
  return grppi::map_reduce(ex, words.begin(), words.end(), dictionary{},
    // Map each word to a one-entry dictionary.
    [](string w) -> dictionary { return {{w,1}}; },
    // Merge the counts of rhs into lhs.
    [](dictionary lhs, const dictionary & rhs) -> dictionary {
      for (auto & entry : rhs) { lhs[entry.first] += entry.second; }
      return lhs;
    });
}

Stencil pattern

A stencil pattern applies a transformation to every element in one or multiple data sets, generating a new data set as an output.

The transformation is a function of a data item and its neighbourhood.

Stencil with single data set

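A stencil on a single input sequence producing an output sequence.

Given:
  A sequence $x_1, x_2, \ldots, x_N \in T$
  A neighbourhood function $n : I \mapsto N$
  A transformation function $f : I \times N \mapsto U$

Here $I$ is the type of a position in the sequence (an iterator in the code examples).

It generates the sequence:
  $f(i_1, n(i_1)), f(i_2, n(i_2)), \ldots, f(i_N, n(i_N))$
where $i_k \in I$ is the position of $x_k$.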
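The stencil definition can be illustrated with a sequential sketch in which both functions receive an iterator to the current element. The names `stencil_seq` and `neighbour_sum` are hypothetical and only show the semantics; they are not GrPPI code.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Sequential sketch of a one-dimensional stencil (hypothetical helper).
// For each position it, the output element is f(it, n(it)).
template <typename T, typename Transformer, typename Neighbourhood>
std::vector<T> stencil_seq(const std::vector<T> & in, Transformer f, Neighbourhood n)
{
  std::vector<T> out;
  out.reserve(in.size());
  for (auto it = in.begin(); it != in.end(); ++it) {
    out.push_back(f(it, n(it)));
  }
  assert(out.size() == in.size()); // one output element per input element
  return out;
}

// Example: each output is the element plus its immediate neighbours.
inline std::vector<double> neighbour_sum(const std::vector<double> & v)
{
  return stencil_seq(v,
    // Transformation: current element plus the sum of its neighbourhood.
    [](auto it, const std::vector<double> & nb) {
      return *it + std::accumulate(nb.begin(), nb.end(), 0.0);
    },
    // Neighbourhood: previous and next elements, when they exist.
    [&v](auto it) {
      std::vector<double> nb;
      if (it != v.begin()) nb.push_back(*std::prev(it));
      if (std::next(it) != v.end()) nb.push_back(*std::next(it));
      return nb;
    });
}
```

Because every output element depends only on the input sequence, the iterations are independent of each other, which is what makes the pattern a candidate for parallel execution.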

Single sequence stencil

Neighbour average

template <typename Execution>
std::vector<double> neib_avg(const Execution & ex, const std::vector<double> & v)
{
  std::vector<double> res(v.size());
  grppi::stencil(ex, v.begin(), v.end(), res.begin(),
    // Transformation: the element plus the sum of its neighbours.
    [](auto it, auto n) {
      return *it + std::accumulate(begin(n), end(n), 0.0);
    },
    // Neighbourhood: the previous and next elements, when they exist.
    [&](auto it) {
      std::vector<double> r;
      if (it != begin(v)) r.push_back(*std::prev(it));
      if (std::distance(it, end(v)) > 1) r.push_back(*std::next(it));
      return r;
    });
  return res;
}

Stencil with multiple data sets

A stencil on multiple input sequences producing an output sequence.

Given:
  A sequence $x^1_1, x^1_2, \ldots, x^1_N \in T_1$
  A sequence $x^2_1, x^2_2, \ldots, x^2_N \in T_2$
  ...
  A sequence $x^M_1, x^M_2, \ldots, x^M_N \in T_M$
  A neighbourhood function $n : I_1 \times I_2 \times \ldots \times I_M \mapsto N$
  A transformation function $f : I_1 \times N \mapsto U$

It generates the sequence:
  $f(i^1_1, n(i^1_1, \ldots, i^M_1)), \ldots, f(i^1_N, n(i^1_N, \ldots, i^M_N))$
where $i^j_k \in I_j$ is the position of $x^j_k$.

Multiple sequences stencil

Neighbour average

template <typename It>
std::vector<double> get_around(It i, It first, It last) {
  std::vector<double> r;
  if (i != first) r.push_back(*std::prev(i));
  if (std::distance(i, last) > 1) r.push_back(*std::next(i));
  return r;
}

// Assumes v1 is not longer than v2.
template <typename Execution>
std::vector<double> neib_avg(const Execution & ex,
                             const std::vector<double> & v1,
                             const std::vector<double> & v2)
{
  std::vector<double> res(std::min(v1.size(), v2.size()));
  grppi::stencil(ex, v1.begin(), v1.end(), res.begin(),
    [](auto it, auto n) { return *it + std::accumulate(begin(n), end(n), 0.0); },
    [&](auto it1, auto it2) {
      std::vector<double> r = get_around(it1, v1.begin(), v1.end());
      std::vector<double> r2 = get_around(it2, v2.begin(), v2.end());
      std::copy(r2.begin(), r2.end(), std::back_inserter(r));
      return r;
    },
    v2.begin());
  return res;
}

Task Patterns

Divide/conquer pattern

A divide/conquer pattern splits a problem into two or more independent subproblems until a base case is reached.

The base case is solved directly.
The results of the subproblems are combined until the final solution of the original problem is obtained.

Key elements:
  Divider: Divides a problem into a set of subproblems.
  Solver: Solves an individual subproblem.
  Combiner: Combines two solutions.
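How the three key elements interact can be sketched sequentially. The helper `divide_conquer_seq` below is hypothetical and not the GrPPI implementation; it assumes the convention that a divider returning a single subproblem signals a base case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of divide/conquer (hypothetical helper, not GrPPI code).
// Assumption: divider(p) returns a vector of subproblems, and a vector with
// a single element means that p is a base case.
template <typename Problem, typename Divider, typename Solver, typename Combiner>
auto divide_conquer_seq(const Problem & p, Divider divider, Solver solver,
                        Combiner combiner) -> decltype(solver(p))
{
  auto parts = divider(p);
  assert(!parts.empty());
  if (parts.size() == 1) return solver(p);   // base case: solve directly
  auto result = divide_conquer_seq(parts[0], divider, solver, combiner);
  for (std::size_t i = 1; i < parts.size(); ++i) {
    // Combine the partial solution with the solution of the next subproblem.
    result = combiner(result,
                      divide_conquer_seq(parts[i], divider, solver, combiner));
  }
  return result;
}

// Example: sum a vector by repeatedly splitting it into halves.
inline long sum_dc(const std::vector<long> & v)
{
  return divide_conquer_seq(v,
    [](const std::vector<long> & p) -> std::vector<std::vector<long>> {
      if (p.size() <= 1) return { p };
      auto mid = p.begin() + p.size() / 2;
      return { std::vector<long>(p.begin(), mid), std::vector<long>(mid, p.end()) };
    },
    [](const std::vector<long> & p) { return p.empty() ? 0L : p.front(); },
    [](long a, long b) { return a + b; });
}
```

The subproblems are independent, so a parallel execution could solve the recursive calls concurrently and only synchronize at the combine step.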

A patterned merge/sort

Ranges on vectors

struct range {
  using iterator = std::vector<double>::iterator;
  range(std::vector<double> & v) : first{v.begin()}, last{v.end()} {}
  range(iterator f, iterator l) : first{f}, last{l} {}
  auto size() const { return std::distance(first, last); }
  iterator first, last;
};

std::vector<range> divide(range r) {
  auto mid = r.first + r.size() / 2;
  return { {r.first, mid}, {mid, r.last} };
}

template <typename Execution>
void merge_sort(const Execution & ex, std::vector<double> & v)
{
  grppi::divide_conquer(ex, range(v),
    // Divider: a range of at most one element is a base case.
    [](auto r) -> std::vector<range> {
      if (1 >= r.size()) return { r };
      else return divide(r);
    },
    // Solver: a base case is already sorted.
    [](auto x) { return x; },
    // Combiner: merge two adjacent sorted ranges in place.
    [](auto r1, auto r2) {
      std::inplace_merge(r1.first, r1.last, r2.last);
      return range{r1.first, r2.last};
    });
}

Streaming patterns

  Pipeline pattern
  Execution policies and pipelines
  Farm stages
  Filtering stages
  Reductions in pipelines
  Iterations in pipelines

Pipeline pattern

A pipeline pattern allows processing a data stream where the computation may be divided into multiple stages.

Each stage processes the data item generated in the previous stage and passes the produced result to the next stage.

Standalone pipeline

A standalone pipeline is a top-level pipeline.
Invoking the pipeline translates into its execution.

Given:
  A generator $g : \emptyset \mapsto T_1 \cup \emptyset$
  A sequence of transformers $t_i : T_i \mapsto T_{i+1}$

For every non-empty value generated by $g$, it evaluates:
  $t_n(t_{n-1}(\ldots t_1(g())))$
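Read sequentially, a standalone pipeline is a loop that pulls items from the generator until it yields an empty value and pushes each item through the stages. The sketch below assumes exactly two transformer stages and C++17 `std::optional`; `run_pipeline_seq` is a hypothetical helper, not GrPPI code, and it collects the results instead of consuming them in a final stage.

```cpp
#include <optional>
#include <vector>

// Sequential sketch of a pipeline with a generator and two transformers
// (hypothetical helper for illustration; not GrPPI code).
template <typename Generator, typename Stage1, typename Stage2>
auto run_pipeline_seq(Generator g, Stage1 t1, Stage2 t2)
{
  using result_type = decltype(t2(t1(*g())));
  std::vector<result_type> results;
  while (auto item = g()) {            // an empty value ends the stream
    results.push_back(t2(t1(*item)));  // t2(t1(x)) for every generated x
  }
  return results;
}
```

In a parallel execution the stages can overlap: while one item is in the second stage, the next item may already be in the first.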

Generators

A generator g is any callable C++ entity that:
  Takes no argument.
  Returns a value of type T that may hold (or not) a value.
  Null value signals end of stream.

The return value must be any type that:
  Is copy-constructible or move-constructible.
    T x = g();
  Is contextually convertible to bool.
    if (x) { /* ... */ }
    if (!x) { /* ... */ }
  Can be dereferenced.
    auto val = *x;

The standard library offers an excellent candidate: std::experimental::optional<T> (std::optional<T> since C++17).
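As an illustration of these requirements, the following sketch builds a counting generator. It assumes C++17 `std::optional` in place of `std::experimental::optional`; `make_counter` is a hypothetical name introduced only for this example.

```cpp
#include <optional>

// A counting generator: yields 0, 1, ..., max-1 and then an empty optional.
// Hypothetical helper; std::optional (C++17) is assumed.
inline auto make_counter(int max)
{
  return [i = 0, max]() mutable -> std::optional<int> {
    if (i < max) return i++;
    return std::nullopt;   // the null value signals end of stream
  };
}
```

The returned optional satisfies all three requirements: it can be copied or moved, it converts to `bool` in a condition, and `*x` gives access to the held value.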

Simple pipeline

x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    // Generator: yields 0, 1, ..., n-1 and then an empty optional.
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++;
      else return {};
    },
    [](int x) -> double { return x*x; },
    [](double x) { return 1/x; },
    [](double x) { cout << x << "\n"; }
  );
}

Nested pipelines

Pipelines may be nested.

An inner pipeline:
  Does not take an execution policy.
  All stages are transformers (no generator).
  The last stage must also produce values.

The inner pipeline uses the same execution policy as the outer pipeline.

x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++;
      else return {};
    },
    // Inner pipeline: no execution policy and no generator.
    grppi::pipeline(
      [](int x) -> double { return x*x; },
      [](double x) { return 1/x; }),
    [](double x) { cout << x << "\n"; }
  );
}

Piecewise pipelines

A pipeline can be created piecewise.

x -> x*x -> 1/x -> print

template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  auto generator = [i=0,max=n]() mutable -> optional<int> {
    if (i<max) return i++;
    else return {};
  };
  auto inner = grppi::pipeline(
    [](int x) -> double { return x*x; },
    [](double x) { return 1/x; });
  auto printer = [](double x) { cout << x << "\n"; };

  grppi::pipeline(ex, generator, inner, printer);
}

Execution policies and pipelines

Ordering

Signals whether pipeline items must be consumed in the same order they were produced.
  Do they need to be time-stamped?

Default is ordered.

API:
  ex.enable_ordering()
  ex.disable_ordering()
  bool o = ex.is_ordered()
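To make the cost of ordering concrete, the sketch below shows one way order could be restored when items finish out of order: each item carries the sequence number it was generated with and is released only after all earlier items. This is an assumption for illustration, not a description of GrPPI's actual mechanism, and `reorder_buffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sketch of order restoration with sequence numbers (hypothetical class;
// an assumption about how an ordered pipeline could be realized).
template <typename T>
class reorder_buffer {
public:
  // Accepts an item that may arrive out of order and returns the items that
  // can now be delivered in generation order (possibly none).
  std::vector<T> push(std::size_t seq, T value)
  {
    assert(seq >= next_);  // every sequence number is pushed exactly once
    pending_.emplace(seq, std::move(value));
    std::vector<T> ready;
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

private:
  std::map<std::size_t, T> pending_;  // items waiting for their predecessors
  std::size_t next_ = 0;              // next sequence number to deliver
};
```

Items that arrive early have to wait in the buffer, which suggests why disabling ordering may be worthwhile when the consumer does not depend on it.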

Queueing properties

Some policies (native and omp) use queues to communicate pipeline stages.

Properties:
  Queue size: Buffer size of the queue.
  Mode: blocking versus lock-free.

API:
  ex.set_queue_attributes(100, mode::blocking)

Farm stages

Farm pattern

A farm is a streaming pattern applicable to a stage in a pipeline, providing multiple tasks to process data items from a data stream.

A farm has an associated cardinality, which is the number of parallel tasks used to serve the stage.

Farms in pipelines

Square values

template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++;
      else return {};
    },
    // Farm stage with cardinality 4.
    grppi::farm(4,
      [](int x) -> double { return x*x; }),
    [](double x) { cout << x << "\n"; }
  );
}

Piecewise farms

Square values

template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  auto inner = grppi::farm(4, [](int x) -> double { return x*x; });

  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<max) return i++;
      else return {};
    },
    inner,
    [](double x) { cout << x << "\n"; }
  );
}

Filtering stages

Filter pattern

A filter pattern discards (or keeps) the data items from a data stream based on the outcome of a predicate.
This pattern can be used only as a stage of a pipeline.

Alternatives:
  Keep: Only data items satisfying the predicate are sent to the next stage.
  Discard: Only data items not satisfying the predicate are sent to the next stage.
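The relation between the two alternatives can be sketched sequentially: discarding by a predicate is the same as keeping by its negation. The helpers `keep_seq` and `discard_seq` are hypothetical and operate on a vector rather than on a stream.

```cpp
#include <vector>

// Sequential sketch of the two filter alternatives
// (hypothetical helpers for illustration; not GrPPI code).
template <typename T, typename Predicate>
std::vector<T> keep_seq(const std::vector<T> & items, Predicate pred)
{
  std::vector<T> out;
  for (const auto & x : items) {
    if (pred(x)) out.push_back(x);   // forwarded only when the predicate holds
  }
  return out;
}

template <typename T, typename Predicate>
std::vector<T> discard_seq(const std::vector<T> & items, Predicate pred)
{
  // Discarding by pred is keeping by the negation of pred.
  return keep_seq(items, [&pred](const T & x) { return !pred(x); });
}
```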

Filtering in

Print primes

bool is_prime(int n);

template <typename Execution>
void print_primes(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<=max) return i++;
      else return {};
    },
    grppi::keep(is_prime),
    [](int x) { cout << x << "\n"; }
  );
}

Filtering out

Discard words

template <typename Execution>
void print_long_words(const Execution & ex, std::istream & is)
{
  grppi::pipeline(ex,
    // Generator: reads one word per call until the stream fails.
    [&is]() -> optional<string> {
      string word;
      is >> word;
      if (!is) { return {}; }
      else { return word; }
    },
    // Discard words shorter than four characters.
    grppi::discard([](std::string w) { return w.length() < 4; }),
    [](std::string w) { cout << w << "\n"; }
  );
}

Reductions in pipelines

Stream reduction pattern

A stream reduction pattern performs a reduction over the items of a subset of a data stream.

Key elements:
  window-size: Number of elements in a window reduction.
  offset: Distance between the beginning of two consecutive windows.
  identity: Initial value used for reductions.
  combiner: Operation used for reductions.
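The interplay of window size and offset can be sketched on a finite sequence. The helper `windowed_reduce_seq` is hypothetical; in particular, reducing only complete windows is an assumption of this sketch, since the handling of a trailing partial window is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of a windowed reduction (hypothetical helper; not GrPPI code).
// A window of window_size items starts every offset items.
// Assumption: only complete windows are reduced.
template <typename T, typename I, typename Combiner>
std::vector<I> windowed_reduce_seq(const std::vector<T> & items,
                                   std::size_t window_size, std::size_t offset,
                                   I identity, Combiner combine)
{
  assert(window_size > 0 && offset > 0);
  std::vector<I> results;
  for (std::size_t start = 0; start + window_size <= items.size(); start += offset) {
    I acc = identity;
    for (std::size_t k = start; k < start + window_size; ++k) {
      acc = combine(acc, items[k]);
    }
    results.push_back(acc);  // one reduced value per window
  }
  return results;
}
```

When the offset is smaller than the window size, consecutive windows overlap; when both are equal, the stream is split into disjoint chunks.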

Windowed reductions

Chunked sum

template <typename Execution>
void print_chunked_sums(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<double> {
      if (i<=max) return i++;
      else return {};
    },
    // Windows of 100 items, starting every 50 items.
    grppi::reduce(100, 50, 0.0,
      [](double x, double y) { return x+y; }),
    [](double x) { cout << x << "\n"; }
  );
}

Iterations in pipelines

Stream iteration pattern

A stream iteration pattern allows loops in data stream processing.

An operation is applied to a data item until a predicate is satisfied.
When the predicate is met, the result is sent to the output stream.

Key elements:
  A transformer that is applied to a data item on each iteration.
  A predicate to determine when the iteration has finished.
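For a single data item, the iteration can be sketched as a loop. The helper `repeat_until_seq` is hypothetical, and applying the transformer once before the first predicate check is an assumption of this sketch rather than a documented guarantee.

```cpp
// Sequential sketch of stream iteration for one data item
// (hypothetical helper; not GrPPI code).
// Assumption: the transformer runs at least once before the predicate
// is evaluated (do/while style).
template <typename T, typename Transformer, typename Predicate>
T repeat_until_seq(T item, Transformer transform, Predicate done)
{
  do {
    item = transform(item);
  } while (!done(item));
  return item;   // this value would be sent to the output stream
}
```

Different items may need a different number of iterations, so in a stream they can leave the stage in a different order than they entered it unless ordering is enforced.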

Iterating

Print values $2^n \cdot x$

template <typename Execution>
void print_values(const Execution & ex, int n)
{
  auto generator = [i=1,max=n+1]() mutable -> optional<int> {
    if (i<max) return i++;
    else return {};
  };

  grppi::pipeline(ex,
    generator,
    // Double each value until it exceeds 1024.
    grppi::repeat_until(
      [](int x) { return 2*x; },
      [](int x) { return x>1024; }
    ),
    [](int x) { cout << x << endl; }
  );
}

Writing your own execution

Adding a new policy

A new execution policy is added by writing a new class.
  No inheritance needed.
    “Inheritance is the base class of evil” (Sean Parent).
  No dependency on the library.

Additionally, configure some meta-functions (until we have concepts).


My custom execution

my_execution

class my_execution {
public:
  my_execution() noexcept;

  // The setter modifies the policy, so it is not const;
  // the getter returns the current degree.
  void set_concurrency_degree(int n) noexcept;
  int concurrency_degree() const noexcept;

  void enable_ordering() noexcept;
  void disable_ordering() noexcept;
  bool is_ordered() const noexcept;

  // ...
};

template <>
constexpr bool is_supported<my_execution>() { return true; }


Adding a pattern

my_execution::map

class my_execution {
  // ...

  template <typename ... InputIterators, typename OutputIterator,
            typename Transformer>
  constexpr void map(std::tuple<InputIterators...> firsts,
      OutputIterator first_out, std::size_t sequence_size,
      Transformer && transform_op) const;

  // ...
};

template <>
constexpr bool supports_map<my_execution>() { return true; }
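The opt-in mechanism can be illustrated without the library. The sketch below is only an assumption about how such meta-functions could gate a top-level pattern: the primary templates, the class `toy_execution` and the free function `map_into` are hypothetical names, not GrPPI's actual definitions, and the `map` member is simplified to a single input range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primary templates (hypothetical stand-ins for the library's meta-functions):
// by default a type is not an execution policy and supports no pattern.
template <typename E> constexpr bool is_supported() { return false; }
template <typename E> constexpr bool supports_map() { return false; }

// A policy is an ordinary class: no base class and no library header needed.
class toy_execution {
public:
  // Simplified single-range map (the member on the slide takes a tuple of iterators).
  template <typename InputIt, typename OutputIt, typename Transformer>
  void map(InputIt first, OutputIt first_out, std::size_t size,
           Transformer transform_op) const {
    for (std::size_t i = 0; i != size; ++i) *first_out++ = transform_op(*first++);
  }
};

// The policy opts in by specializing the meta-functions.
template <> constexpr bool is_supported<toy_execution>() { return true; }
template <> constexpr bool supports_map<toy_execution>() { return true; }

// Top-level pattern: checks the traits at compile time, then forwards to the policy.
template <typename Execution, typename T, typename Transformer>
std::vector<T> map_into(const Execution & ex, const std::vector<T> & in,
                        Transformer transform_op) {
  static_assert(is_supported<Execution>(), "not an execution policy");
  static_assert(supports_map<Execution>(), "policy does not implement map");
  std::vector<T> out(in.size());
  ex.map(in.begin(), out.begin(), in.size(), transform_op);
  return out;
}
```

Passing a type that has not specialized the meta-functions fails at compile time with a readable message, which is roughly the role that concepts would take over.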


Some helpers in the library

Applying a function to a tuple of iterators

template <typename F, typename ... Iterators, template <typename ...> class T>
decltype(auto) apply_deref_increment(
    F && f,
    T<Iterators ...> & iterators)

Takes a function f and a tuple of iterators (e.g. the result of make_tuple(it1, it2, it3)).
Returns f(*it1++, *it2++, *it3++).
Very convenient for implementing data patterns.
More like this in include/common/iterator.h.
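As an illustration of what such a helper has to do, here is a minimal C++14 sketch built on `std::index_sequence`. It is not the library's implementation: the names `apply_deref_increment_sketch`, `apply_deref_increment_impl` and `sum_ranges` are hypothetical, and the sketch is restricted to `std::tuple`. The order in which the arguments are evaluated is unspecified, which is harmless here because each iterator is a distinct object.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Calls f(*it0++, *it1++, ...) on the iterators stored in the tuple.
// The tuple is taken by reference, so every iterator is left advanced by one.
template <typename F, typename ... Iterators, std::size_t ... I>
decltype(auto) apply_deref_increment_impl(
    F && f, std::tuple<Iterators...> & iterators, std::index_sequence<I...>)
{
  return std::forward<F>(f)(*std::get<I>(iterators)++...);
}

template <typename F, typename ... Iterators>
decltype(auto) apply_deref_increment_sketch(
    F && f, std::tuple<Iterators...> & iterators)
{
  return apply_deref_increment_impl(std::forward<F>(f), iterators,
      std::index_sequence_for<Iterators...>{});
}

// Example: element-wise sum of two ranges, consuming one element of each per call.
inline std::vector<int> sum_ranges(const std::vector<int> & a,
                                   const std::vector<int> & b)
{
  assert(a.size() == b.size());
  auto its = std::make_tuple(a.begin(), b.begin());
  std::vector<int> out;
  while (std::get<0>(its) != a.end()) {
    out.push_back(apply_deref_increment_sketch(
        [](int x, int y) { return x + y; }, its));
  }
  return out;
}
```

The loop in `sum_ranges` has the same shape as the body of a data pattern: the first iterator in the tuple drives termination, and the helper advances all of them in lockstep.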


Implementing map

map

template <typename ... InputIterators, typename OutputIterator,
          typename Transformer>
void my_execution_native::map(std::tuple<InputIterators...> firsts,
    OutputIterator first_out, std::size_t sequence_size,
    Transformer transform_op) const
{
  using namespace std;
  auto process_chunk = [&transform_op](auto fins, std::size_t size, auto fout)
  {
    const auto l = next(get<0>(fins), size);
    while (get<0>(fins) != l) {
      *fout++ = apply_deref_increment(
          std::forward<Transformer>(transform_op), fins);
    }
  };
  // ...


  // ...
  const int chunk_size = sequence_size / concurrency_degree_;
  {
    some_worker_pool workers;
    for (int i=0; i!=concurrency_degree_-1; ++i) {
      const auto delta = chunk_size * i;
      const auto chunk_firsts = iterators_next(firsts, delta);
      const auto chunk_first_out = next(first_out, delta);
      workers.launch(process_chunk, chunk_firsts, chunk_size, chunk_first_out);
    }

    const auto delta = chunk_size * (concurrency_degree_ - 1);
    const auto chunk_firsts = iterators_next(firsts, delta);
    const auto chunk_first_out = next(first_out, delta);
    process_chunk(chunk_firsts, sequence_size - delta, chunk_first_out);
  } // Implicit pool synch
}
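The chunking scheme can be tried out with plain `std::thread` in place of the unspecified `some_worker_pool`. The following sketch is restricted to a single input range to stay short; `chunked_map` and its parameters are hypothetical names and not part of GrPPI. It assumes forward (or stronger) iterators, an output range that is already allocated, and a transformer that can safely be called from several threads at once.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

template <typename InputIt, typename OutputIt, typename Transformer>
void chunked_map(InputIt first, OutputIt first_out, std::size_t sequence_size,
                 Transformer transform_op, int concurrency_degree)
{
  assert(concurrency_degree > 0);
  auto process_chunk = [&transform_op](InputIt in, std::size_t size, OutputIt out) {
    for (std::size_t i = 0; i != size; ++i) *out++ = transform_op(*in++);
  };

  // Integer division: every worker chunk has the same size.
  const std::size_t chunk_size = sequence_size / concurrency_degree;
  std::vector<std::thread> workers;
  for (int i = 0; i != concurrency_degree - 1; ++i) {
    const std::size_t delta = chunk_size * i;
    workers.emplace_back(process_chunk,
        std::next(first, delta), chunk_size, std::next(first_out, delta));
  }

  // Last chunk runs on the calling thread and also takes the leftover elements.
  const std::size_t delta = chunk_size * (concurrency_degree - 1);
  process_chunk(std::next(first, delta), sequence_size - delta,
      std::next(first_out, delta));

  // Explicit joins stand in for the pool's implicit synchronization.
  for (auto & w : workers) w.join();
}
```

With 10 elements and a concurrency degree of 3, the two worker threads each process 3 elements and the calling thread processes the remaining 4.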


Evaluation


Platform:
  2 × Intel Xeon Ivy Bridge E5-2695 v2.
  Total number of cores: 24.
  Clock frequency: 2.40 GHz.
  L3 cache size: 30 MB.
  Main memory: 128 GB DDR3.
  OS: Ubuntu Linux 14.04 LTS, kernel 3.13.

Software:
  Compiler: GCC 6.2.
  OpenMP 4.0: included in GCC.
  ISO C++ Threads: included in the C++ standard library.
  Intel TBB: www.threadingbuildingblocks.org


Use case

Video processing application for detecting edges using the filters:
  Gaussian Blur
  Sobel operator

It uses a pipeline pattern:
  S1: Reading frames from a camera
  S2: Apply the Gaussian Blur filter (it can use a farm)
  S3: Apply the Sobel operator (it can use a farm)
  S4: Writing frames into a file

Parallel variants:
  using the back ends directly
  using GrPPI


Pipeline compositions

Pipeline+farm compositions made in the video application:

the parallel Pipeline pattern, in which the first stage reads the frames from a video file passed as input; the second and third stages apply the Gaussian Blur and Sobel filters, respectively; and the last stage dumps the processed frames to an output video file.

To carry out the experimental evaluation, we first parallelize this video application using the above-mentioned execution frameworks and the proposed interface. Afterwards, we compare both performance and lines of code required to implement such parallel versions with respect to the sequential one. Note that for the case of OpenMP, the implementation of the Pipeline pattern is not straightforward: it requires the use of queues to communicate items between stages. In our particular case we leveraged a variant of the Michael and Scott lock-free queue in C++ [13]. To further experiment with our interface, we implement different versions of the video application using the execution frameworks and distinct compositions of patterns in its main pipeline. As depicted in Fig. 2, (a) we use a non-composed pipeline ( s | s | s | s ); (b) a pipeline composed of a farm in its second stage ( s | f | s | s ); (c) a pipeline composed of a farm in its third stage ( s | s | f | s ); and (d) a pipeline composed of two farms in the second and third stages ( s | f | f | s ).

(a) Non-composed Pipeline.
(b) Pipeline ( s | f | s | s ).
(c) Pipeline ( s | s | f | s ).
(d) Pipeline ( s | f | f | s ).

Fig. 2: Pipeline and Farm compositions of the video application.

5.1 Analysis of the Usability

In this section we analyze the usability and flexibility of the generic interface developed. To analyze this aspect, we compare the number of lines required to implement the parallel version of the application leveraging the interface, with respect to using directly the parallel execution frameworks. Table 1 summarizes the percentage of additional lines introduced into the sequential source code in order to implement such parallel versions using the above-mentioned pattern


Usability of GrPPI

% of increase in lines of code w.r.t. sequential

Pipeline composition   C++ Threads   OpenMP      Intel TBB   GrPPI
(p | p | p | p)        +8.8 %        +13.0 %     +25.9 %     +1.8 %
(p | f | p | p)        +59.4 %       +62.6 %     +25.9 %     +3.1 %
(p | p | f | p)        +60.0 %       +63.9 %     +25.9 %     +3.1 %
(p | f | f | p)        +106.9 %      +109.4 %    +25.9 %     +4.4 %


Performance: frames per second

[Figure 4: four panels, one per composition: Pipeline (p|p|p|p), Pipeline (p|f|p|p), Pipeline (p|p|f|p), Pipeline (p|f|f|p). x-axis: video resolution (480p, 720p, 1080p, 1440p, 2160p). y-axis: frames per second. Series: C++11, GrPPI C++11, OpenMP, GrPPI OpenMP, TBB, GrPPI TBB.]

Figure 4. FPS w/ and w/o using GrPPI along with the different frameworks and Pipeline compositions.

5.3. Performance analysis of stream vs data patterns

Our next analysis compares the performance among Pipeline compositions that combine stream and data parallel patterns. Figure 5 shows the FPS obtained for different video resolutions and parallel frameworks using GrPPI in different Pipeline compositions containing both stream and data patterns, Farm and Stencil, respectively. As can be seen, both C++ Threads and Intel TBB frameworks deliver similar performance results for all compositions. A more detailed inspection of these plots unveils an inflection point where the data-stream compositions start attaining better performance. This occurs from 1080p on for C++ Threads and from 1440p on for TBB. Note as well the slight difference using only a Stencil for computing the first or second filter. The reason behind this behavior is the higher computational load of the Sobel with respect to the Gaussian Blur filter. Regarding the OpenMP framework, it can be clearly seen that the stream-stream (p | f | f | p) composition delivers better results than using stream-data constructions. This is mainly because the worker threads in the GrPPI Farm pattern leverage OpenMP tasks that are active during the whole video processing, while the Stencil implementation creates and destroys a task each time a video frame is processed. We figured out that the GCC-OpenMP implementation does not make use of a thread pool and, therefore, the threads in each Stencil computation are recurrently created and destroyed. Consequently, stream-data compositions in OpenMP suffer from considerable performance degradations.

Figure 5. FPS for different frameworks and Pipeline compositions with stream and data patterns.

Copyright © 2016 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. (2016). DOI: 10.1002/cpe


Observations

Using a farm for both stages leads to an improved FPS rate.

Using a farm for only one stage does not bring any significant improvement.

Impact of GrPPI on performance:
  Negligible overheads of about 2%.

Impact on programming effort:
  Significantly less effort than with other programming models.


Conclusions


Summary

A unified programming model for sequential and parallel modes.
Multiple back-ends available.
Current pattern set:
  Data: map, reduce, map/reduce, stencil.
  Task: divide/conquer.
  Streaming: pipeline with nesting of farm, filter, reduction, iteration.

Current limitation:
  Pipelines cannot be nested inside other patterns (e.g. iteration of a pipeline).


Future work

Integrate additional back-ends (e.g. FastFlow, CUDA).
Eliminate metaprogramming by using Concepts.
Extend and simplify the interface for data patterns.
Support multi-context patterns.
Better support of NUMA for native back-end.
More patterns.
More applications.


Recent publications

A Generic Parallel Pattern Interface for Stream and Data Processing. D. del Rio, M. F. Dolz, J. Fernández, J. D. García. Concurrency and Computation: Practice and Experience, 2017.

Supporting Advanced Patterns in GrPPI: a Generic Parallel Pattern Interface. D. R. del Astorga, M. F. Dolz, J. Fernandez, J. D. Garcia. Auto-DaSP 2017 (Euro-Par 2017).

Probabilistic-Based Selection of Alternate Implementations for Heterogeneous Platforms. J. Fernandez, A. Sanchez, D. del Río, M. F. Dolz, J. D. Garcia. ICA3PP 2017.

A C++ Generic Parallel Pattern Interface for Stream Processing. D. del Río, M. F. Dolz, L. M. Sanchez, J. Garcia-Blas, J. D. Garcia. ICA3PP 2016.

Finding parallel patterns through static analysis in C++ applications. D. R. del Astorga, M. F. Dolz, L. M. Sanchez, J. D. Garcia, M. Danelutto, M. Torquati. International Journal of High Performance Computing Applications, 2017.


GrPPI

https://github.com/arcosuc3m/grppi
