Seminar on Parallel and Concurrent Programming

Stefan Marr, Daniele Bonetta (2016)


TRANSCRIPT

Page 1: Seminar on Parallel and Concurrent Programming

Stefan Marr, Daniele Bonetta (2016)

Seminar on Parallel and Concurrent Programming

Page 2: Seminar on Parallel and Concurrent Programming


Agenda

1. Modus Operandi

2. Introduction to Concurrent Programming Models

3. Seminar Paper Overview

Page 3: Seminar on Parallel and Concurrent Programming


MODUS OPERANDI

Page 4: Seminar on Parallel and Concurrent Programming


Tasks and Deadlines

• Talk on selected paper (student 1)
  – 30 min with slides (+ 15 min discussion)
    • to be discussed with us 1 week before
  – Summary (max. 500 words)
    • 2 days before seminar, 11:59am

• Questions on assigned paper (student 2)
  – min. 5 questions
  – 2 days before seminar, 11:59am

Summaries will be online before the talk.

Page 5: Seminar on Parallel and Concurrent Programming


Report

Category 1: Theoretical treatment
• Focus on paper, related work, state of the art of the field
• Detailed discussion

Category 2: Practical treatment of the topic, for instance
• Reproduce experiments/results
• Extend experiments
• Experiment with variations

Reports and slides to be archived online.

Page 6: Seminar on Parallel and Concurrent Programming


Report
• paper summary (500 words)
• outline, content, and experiments to be discussed with us
• Cat. 1: ca. 4000 words (excl. references)
  – state of the art, context in the field, and the specific technique from the paper
• Cat. 2: ca. 2000 words (excl. references)
  – discuss experiments, gained insights, found limitations, etc.

Deadline: Feb. 6th

Page 7: Seminar on Parallel and Concurrent Programming


Consultations

• For alternative paper proposals

• To prepare presentation!

• To agree on focus of report/experiments
  – mandatory for experiments

Technically optional, but…

Page 8: Seminar on Parallel and Concurrent Programming


Grading

• Required attendance: 80% of all meetings

• 50% slides, presentation, and discussion
• 50% write-up/experiments

Page 9: Seminar on Parallel and Concurrent Programming


Timeline

Oct. 5th     Introduction to Concurrent Programming Models
Oct. 10th    Deadline: list of ranked papers
Oct. 12th    Runtime Techniques for Big Data and Parallelism
Week 3-5     Preparations and consultations
Week 6-12    Presentations (depends on #students)
Feb. 6th     Deadline for report

Page 10: Seminar on Parallel and Concurrent Programming


Got Background in Concurrency/Parallelism?

Show of Hands!

Page 11: Seminar on Parallel and Concurrent Programming

Multicore is the Norm

8 cores: 200-Euro phones
24 cores: workstations
>= 72 cores: embedded systems

Page 12: Seminar on Parallel and Concurrent Programming

Problem: Power Wall at ca. 5 GHz

Page 13: Seminar on Parallel and Concurrent Programming

CPUs Don't Get Faster but Multiply

[Chart: single-core clock speed rises from 0.2 GHz (1990) and 1.5 GHz (2000) to about 3.8 GHz (2005), then plateaus at 3.3-3.8 GHz while processors gain 4, 6, 12, … cores.]

Based on the Clock Frequency of Intel Processors

Page 14: Seminar on Parallel and Concurrent Programming

Power ≈ Voltage² × Frequency

[Diagram: one large core with its cache vs. two smaller cores sharing caches.]

Voltage: -15%, Frequency: -15%
⇒ Power ≈ 1, Performance ≈ 1.8
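A back-of-the-envelope check of this trade-off, assuming the standard dynamic-power model P ≈ C·V²·f for a CMOS core; the slide's figures of ≈1 and ≈1.8 are rounded versions of this arithmetic:

% Dynamic power of a CMOS core: P \approx C V^2 f.
% Lower voltage and frequency by 15% each, but use two cores:
\[
\frac{P_{\text{2 cores}}}{P_{\text{1 core}}}
  \approx \frac{2 \, C \, (0.85V)^2 (0.85f)}{C V^2 f}
  = 2 \cdot 0.85^3 \approx 1.2,
\qquad
\frac{\mathrm{Perf}_{\text{2 cores}}}{\mathrm{Perf}_{\text{1 core}}}
  \approx 2 \cdot 0.85 = 1.7
\]
% Roughly the same power budget buys nearly twice the throughput,
% provided the workload actually parallelizes across both cores.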

Page 15: Seminar on Parallel and Concurrent Programming

Problem: Memory Wall

Page 16: Seminar on Parallel and Concurrent Programming

Memory Wall

[Chart: relative performance on a log scale (1 to 10,000), 1980-2005. CPU frequency doubles roughly every 2 years; DRAM speed doubles roughly every 6 years; the gap between them keeps widening.]

Source: Sun World Wide Analyst Conference, Feb. 25, 2003

Page 17: Seminar on Parallel and Concurrent Programming

Multicore Transition

Working around physical limitations: the Power Wall and the Memory Wall


Page 18: Seminar on Parallel and Concurrent Programming

Concurrency & Parallelism

70 Years of Problem Solving

For a brief bit of history, see: Mitch Marcus and Atsushi Akera, "ENIAC's recessive gene", Penn Printout (March 1996).
http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf

ENIAC's main control panel, U. S. Army Photo

Page 19: Seminar on Parallel and Concurrent Programming

Decades of Research and Solutions for Everything


Page 20: Seminar on Parallel and Concurrent Programming


But no Silver Bullet

CSP, Locks, Monitors, Fork/Join, Transactional Memory, Data Flow, Actors, …

Page 21: Seminar on Parallel and Concurrent Programming


A Rough Categorization

Communicating Isolates

Threads and Locks

Coordinating Threads

Page 22: Seminar on Parallel and Concurrent Programming


A Rough Categorization

Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel.

Data Parallelism

Page 23: Seminar on Parallel and Concurrent Programming


THREADS AND LOCKS
Powerful but hard

Page 24: Seminar on Parallel and Concurrent Programming


Uniform Shared Memory

A Model for the Machines We Used to Have

C/C++

Page 25: Seminar on Parallel and Concurrent Programming


Threads

• Sequences of instructions

• Unit of scheduling
  – preemptive and concurrent
  – or parallel

[Diagram: thread execution interleaved over time]

Page 26: Seminar on Parallel and Concurrent Programming

A Snake Game

• Multiple players

• Compete for ‘apples’

• Shared board


Page 27: Seminar on Parallel and Concurrent Programming


Race Conditions and Data Races

Race Condition
• Result depends on the timing of operations

Data Race
• A race condition on memory
• Synchronization absent or incomplete
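A minimal, hypothetical Java illustration of a data race (not from the slides): two threads increment a shared counter without synchronization, so read-modify-write updates get lost.

// DataRaceDemo.java -- minimal illustration of a data race.
public class DataRaceDemo {
    static int counter = 0;  // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable increment = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;  // read-modify-write is not atomic: updates can be lost
            }
        };
        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Usually prints less than 200000, and a different value on each run.
        System.out.println(counter);
    }
}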

Page 28: Seminar on Parallel and Concurrent Programming


Locks

synchronized (board) {
    board.moveLeft(snake)
}

Single Lock is Simple
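A minimal runnable sketch of the single-lock variant in Java; Board, Snake, and moveLeft follow the slide's pseudocode, everything else (grid size, field layout) is an assumption for illustration.

// Single-lock board: the Board object itself guards the whole grid.
public class Board {
    private final Object[][] cells = new Object[10][10];  // assumed grid size

    // Simple, but all player threads serialize on this one lock,
    // even when they touch completely different cells.
    public synchronized void moveLeft(Snake snake) {
        cells[snake.y][snake.x] = null;   // bounds checks omitted for brevity
        snake.x -= 1;
        cells[snake.y][snake.x] = snake;
    }
}

class Snake {
    int x, y;
}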

Page 29: Seminar on Parallel and Concurrent Programming


Optimized Locking for more Parallelism

synchronized (board[3][3]) {
    synchronized (board[3][2]) {
        board.moveLeft(snake)
    }
}

Strategy: Lock only cells you need to update

What could go wrong?
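One answer to "what could go wrong": two snakes that lock the same pair of cells in opposite order can deadlock, each holding one lock and waiting for the other. A hypothetical sketch of the standard fix, acquiring per-cell locks in a fixed global (row-major) order; all names here are illustrative:

// Per-cell locks acquired in a fixed global order to avoid deadlock.
public class CellLockedBoard {
    private final Object[][] locks = new Object[10][10];

    public CellLockedBoard() {
        for (int r = 0; r < 10; r++)
            for (int c = 0; c < 10; c++)
                locks[r][c] = new Object();
    }

    // Always lock the cell with the smaller row-major index first.
    public void move(int fromR, int fromC, int toR, int toC, Runnable update) {
        boolean fromFirst = fromR * 10 + fromC < toR * 10 + toC;
        Object first  = fromFirst ? locks[fromR][fromC] : locks[toR][toC];
        Object second = fromFirst ? locks[toR][toC]     : locks[fromR][fromC];
        synchronized (first) {
            synchronized (second) {
                update.run();  // mutate exactly the two locked cells
            }
        }
    }
}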

Page 30: Seminar on Parallel and Concurrent Programming


Common Issues

• Lack of progress
  – Deadlock
  – Livelock

• Race conditions
  – Data race
  – Atomicity violation

• Performance
  – Sequential bottlenecks
  – False sharing

Page 31: Seminar on Parallel and Concurrent Programming


Basic Concepts:
Shared Memory with Threads and Locks

• Threads
• Synchronization

• No safety guarantees
  – Data races
  – Deadlocks

P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al.
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al.
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al.

Questions?

Page 32: Seminar on Parallel and Concurrent Programming


COORDINATING THREADS
Making Coordination Explicit

Coordinating Threads

Page 33: Seminar on Parallel and Concurrent Programming

Shared Memory with Explicit Coordination

Raising the Abstraction Level

Libraries for most languages

Page 34: Seminar on Parallel and Concurrent Programming


Two Main Variants

Temporal Isolation:
Transactional Memory

Explicit Communication:
Channel- or Message-based

Page 35: Seminar on Parallel and Concurrent Programming


Transactional Memory

atomic {
    board.moveLeft(snake)
}

Coordinated by the Runtime System
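Java has no built-in atomic blocks; purely as a hypothetical illustration of the optimistic execute-validate-retry cycle a TM runtime performs, here is a sketch for a single shared cell using compare-and-set. Real TM tracks read/write sets over many locations; this only shows the retry-on-conflict idea.

import java.util.concurrent.atomic.AtomicReference;

// Optimistic read-compute-retry loop, reduced to a single cell.
public class OptimisticCell<T> {
    private final AtomicReference<T> cell;

    public OptimisticCell(T initial) { this.cell = new AtomicReference<>(initial); }

    public T update(java.util.function.UnaryOperator<T> transaction) {
        while (true) {
            T snapshot = cell.get();                 // read
            T result = transaction.apply(snapshot);  // compute (must be side-effect free)
            if (cell.compareAndSet(snapshot, result))
                return result;                       // commit succeeded
            // else: another thread committed first; abort and retry
        }
    }
}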

Page 36: Seminar on Parallel and Concurrent Programming


Transactional Memory

Simple Programming Model
• No data races (within transactions)
• No deadlocks

Issues
• Performance overhead
• Still experimental
• Livelocks
• Inter-transactional race conditions
• I/O semantics

Page 37: Seminar on Parallel and Concurrent Programming


Some Issues

atomic {
    dataArray = getData();
    fork {
        compute(dataArray[0]);
    }
    compute(dataArray[1]);
}

P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al.
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al.
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al.

What happens to the forked thread when the transaction aborts?

Page 38: Seminar on Parallel and Concurrent Programming


Channel-based Communication

Player Thread:
coordChannel ! (#moveLeft, snake)

Coordinator Thread:
for i in players():
    msg ? coordChannels[i]
    match msg:
        (#moveLeft, snake):
            board[…, …] = …

[Diagram: several player threads send over channels; a coordinator thread receives and updates the board.]

High-level communication, but no safety guarantees
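A hypothetical Java rendering of the same pattern, with a BlockingQueue standing in for the channel (the slide's `!` is send, `?` is receive); message and class names are assumptions:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A BlockingQueue plays the role of the channel: player threads send
// move messages; a coordinator thread owns the board and applies them.
public class ChannelDemo {
    record Move(String direction, String snake) {}

    public static void main(String[] args) {
        BlockingQueue<Move> coordChannel = new ArrayBlockingQueue<>(64);

        // Player thread: send ("!")
        Thread player = new Thread(() -> {
            try {
                coordChannel.put(new Move("left", "snake1"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Coordinator thread: receive ("?") and update the board
        Thread coordinator = new Thread(() -> {
            try {
                Move msg = coordChannel.take();
                System.out.println("moving " + msg.snake() + " " + msg.direction());
                // board update goes here; only the coordinator touches the board
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        player.start();
        coordinator.start();
    }
}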

Page 39: Seminar on Parallel and Concurrent Programming


Coordinating Threads

Transactional Memory
• Transactions
• Simple programming model
• Practical issues

Channel/Message Communication
• Explicit coordination
  – Channels or message sending
  – Higher abstraction level
• No safety guarantees

P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al.

P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)

Questions?

Page 40: Seminar on Parallel and Concurrent Programming


COMMUNICATING ISOLATES
Communication is Everything

Page 41: Seminar on Parallel and Concurrent Programming


Explicit Communication Only

Absence of Low-level Data Races

Page 42: Seminar on Parallel and Concurrent Programming


All Interactions Explicit


Actor Principle

Page 43: Seminar on Parallel and Concurrent Programming


Many Many Variations

• Channel-based
  – Communicating Sequential Processes

• Message-based
  – Actor models

P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.

Page 44: Seminar on Parallel and Concurrent Programming


Communicating Event Loops


One Message at a Time

Page 45: Seminar on Parallel and Concurrent Programming


Communicating Event Loops


Actors Contain Objects

Page 46: Seminar on Parallel and Concurrent Programming


Communicating Event Loops


Interacting via Messages

Page 47: Seminar on Parallel and Concurrent Programming


Message-based Communication

[Diagram: player actors asynchronously send messages to the board actor.]

Player Actor:
board <- moveLeft(snake)

Board Actor:
class Board {
    private array;
    public moveLeft(snake) {
        array[snake.x][snake.y] = ...
    }
}

Main Program:
actors.create(Board)
actors.create(Snake)
actors.create(Snake)
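A hypothetical Java sketch of the communicating-event-loop idea: the actor owns its state and a mailbox, and a single thread drains the mailbox one message at a time, so the state needs no locks. Class and method names are assumptions modeled on the slide.

import java.util.concurrent.LinkedBlockingQueue;

// One thread, one mailbox, one message at a time: state inside the actor
// needs no locks, because only the event-loop thread ever touches it.
public class BoardActor {
    private final LinkedBlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private final String[][] cells = new String[10][10];  // actor-private state

    public BoardActor() {
        Thread eventLoop = new Thread(() -> {
            try {
                while (true) {
                    mailbox.take().run();  // process one message at a time
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        eventLoop.setDaemon(true);
        eventLoop.start();
    }

    // Asynchronous send: enqueue the request and return immediately.
    public void moveLeft(int x, int y, String snake) {
        mailbox.add(() -> {
            cells[y][x] = null;
            cells[y][x - 1] = snake;  // bounds checks omitted for brevity
        });
    }
}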

Page 48: Seminar on Parallel and Concurrent Programming


Communicating Isolates

Message- or Channel-Based
• Explicit communication
• No shared memory

• Still potential for
  – Behavioral deadlocks
  – Livelocks
  – Bad message interleavings
  – Message protocol violations

P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.

P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)

Questions?

Page 49: Seminar on Parallel and Concurrent Programming


DATA PARALLELISM
Parallelism for Structured Problems

Page 50: Seminar on Parallel and Concurrent Programming

DATA PARALLELISM WITH FORK/JOIN
Just One Example


Page 51: Seminar on Parallel and Concurrent Programming

Fork/Join with Work-Stealing

• Recursive divide-and-conquer

• Automatic and efficient parallel scheduling

• Widely available for C++, Java, and .NET


Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995), 'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.

Page 52: Seminar on Parallel and Concurrent Programming

Typical Applications

• Recursive algorithms¹
  – Mergesort
  – List and tree traversals

• Parallel prefix, pack, and sorting problems²

• Irregular and unbalanced computation
  – on directed acyclic graphs (DAGs)
  – ideally tree-shaped


1) More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/2) Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf


Page 53: Seminar on Parallel and Concurrent Programming

Tiny Example: Summing a Large Array

• Simple array with numbers

• Recursively divide
  – every divide step is a parallel fork

• Then do the additions
  – every combining step is a join


Note: This example is academic, and could be better expressed with a parallel map/reduce library, such as Scala’s Parallel Collections, Java 8 Streams, or Microsoft’s PLINQ.

[Diagram: the array is split recursively into halves; the halves are summed in parallel and the partial sums are joined pairwise on the way back up to a single total.]
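A compact Java version of this example using the fork/join framework from java.util.concurrent; the class name and threshold are illustrative choices:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Summing an array with fork/join: recursively split, fork the left half,
// compute the right half, then join. The work-stealing pool balances the load.
public class ArraySum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // arbitrary cutoff for illustration
    private final int[] data;
    private final int lo, hi;

    public ArraySum(int[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        ArraySum left = new ArraySum(data, lo, mid);
        left.fork();                                         // run left half in parallel
        long right = new ArraySum(data, mid, hi).compute();  // compute right half directly
        return left.join() + right;                          // join: combine partial sums
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        long total = ForkJoinPool.commonPool().invoke(new ArraySum(data, 0, data.length));
        System.out.println(total);  // 1000000
    }
}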

Page 54: Seminar on Parallel and Concurrent Programming

Data Parallelism with Fork/Join

• Parallel programming technique

• Recursive divide-and-conquer

• Automatic and efficient load-balancing

P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)

Page 55: Seminar on Parallel and Concurrent Programming


CONCLUSION: CONCURRENCY MODELS

Page 56: Seminar on Parallel and Concurrent Programming


Four Rough Categories

Communicating Isolates

Threads and Locks

Coordinating Threads

Data Parallelism

Questions?

Page 57: Seminar on Parallel and Concurrent Programming


SEMINAR PAPERS

Page 58: Seminar on Parallel and Concurrent Programming


These are Suggestions

Please feel free to propose papers of interest to you.

(Papers need to be approved by us)

Page 59: Seminar on Parallel and Concurrent Programming


Topics of Interest

• High-level language concurrency models
  – Actors, Communicating Sequential Processes, STM, Stream Processing, ...

• Tooling
  – Debugging
  – Profiling

• Implementation and runtime systems
  – Communication mechanisms
  – Data/object representation
  – System-level aspects

• Big Data frameworks
  – Programming models
  – Runtime-level problems

Page 60: Seminar on Parallel and Concurrent Programming


Papers without Artifacts

P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16)
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16)
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)

Page 61: Seminar on Parallel and Concurrent Programming


Papers without Artifacts

P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'14)

P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14)

P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'16)

P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15)

P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)

Page 62: Seminar on Parallel and Concurrent Programming


Papers with Artifacts

P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16)

P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16)

P2.3 StreamJIT: a commensal compiler for high-performance stream programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14)

P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, EuroPar'12)

P2.5 Parallel parsing made practical, A. Barenghi et al. (runtime, SCP'15)

Page 63: Seminar on Parallel and Concurrent Programming


Papers with Artifacts

P2.6 SparkR: Scaling R Programs with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16)

P2.7 Spark SQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, SIGMOD'15)

P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15)

P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13)

P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16)