CS 61C: Great Ideas in Computer Architecture

Lecture 19: Thread-Level Parallel Processing

Bernhard Boser & Randy Katz

http://inst.eecs.berkeley.edu/~cs61c

Agenda

• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion …

Improving Performance

1. Increase clock rate fs
  − Reached practical maximum for today's technology
  − < 5 GHz for general purpose computers
2. Lower CPI (cycles per instruction)
  − SIMD, "instruction level parallelism"
3. Perform multiple tasks simultaneously (today's lecture)
  − Multiple CPUs, each executing different program
  − Tasks may be related
    § E.g. each CPU performs part of a big matrix multiplication
  − or unrelated
    § E.g. distribute different web http requests over different computers
    § E.g. run ppt (view lecture slides) and browser (youtube) simultaneously
4. Do all of the above:
  − High fs, SIMD, multiple parallel tasks

New-School Machine Structures (It's a bit more complicated!)

• Parallel Requests: assigned to computer, e.g., Search "Katz"
• Parallel Threads: assigned to core, e.g., Lookup, Ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., Add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages

[Figure: software/hardware stack, from warehouse scale computer and smart phone, through computer (cores, memory (cache), input/output) and core (instruction unit(s), functional unit(s), cache memory), down to logic gates. The functional units compute A0+B0, A1+B1, A2+B2, A3+B3 at the same time. Caption: "Harness Parallelism & Achieve High Performance". Label: "Project 4".]

Parallel Computer Architectures

• Several separate computers, some means for communication (e.g. Ethernet)
• Massive array of computers, fast communication between processors
• Multi-core CPU: >1 datapath in single chip
  − share L3 cache, memory, peripherals
  − Example: Hive machines
• GPU "graphics processing unit"

Example: CPU with 2 Cores

[Figure: two processor cores ("Core" 1 and "Core" 2), each with its own control and datapath (PC, registers, ALU). Both connect to one shared memory (bytes) and to input/output through the I/O-memory interfaces. The two memory paths are labeled "Processor 0 Memory Accesses" and "Processor 1 Memory Accesses".]

Multiprocessor Execution Model

• Each processor (core) executes its own instructions
• Separate resources (not shared)
  − Datapath (PC, registers, ALU)
  − Highest level caches (e.g. 1st and 2nd)
• Shared resources
  − Memory (DRAM)
  − Often 3rd level cache
    § Often on same silicon chip
    § But not a requirement
• Nomenclature
  − "Multiprocessor Microprocessor"
  − Multicore processor
    § E.g. 4 core CPU (central processing unit)
    § Executes 4 different instruction streams simultaneously

Transition to Multicore

[Figure: Sequential App Performance]

Multiprocessor Execution Model

• Shared memory
  − Each "core" has access to the entire memory in the processor
  − Special hardware keeps caches consistent
  − Advantages:
    § Simplifies communication in program via shared variables
  − Drawbacks:
    § Does not scale well:
      o "Slow" memory shared by many "customers" (cores)
      o May become bottleneck (Amdahl's Law)
• Two ways to use a multiprocessor:
  − Job-level parallelism
    § Processors work on unrelated problems
    § No communication between programs
  − Partition work of single task between several cores
    § E.g. each performs part of large matrix multiplication

Parallel Processing

• It's difficult!
• It's inevitable
  − Only path to increase performance
  − Only path to lower energy consumption (improve battery life)
• In mobile systems (e.g. smart phones, tablets)
  − Multiple cores
  − Dedicated processors, e.g.
    § motion processor in iPhone
    § GPU (graphics processing unit)
• Warehouse-scale computers
  − multiple "nodes"
    § "boxes" with several CPUs, disks per box
  − MIMD (multi-core) and SIMD (e.g. AVX) in each node

Potential Parallel Performance (assuming software can use it)

Year   Cores   SIMD bits/Core   Cores * SIMD bits   Total, e.g. FLOPs/Cycle
2003     2          128                256                    4
2005     4          128                512                    8
2007     6          128                768                   12
2009     8          128               1024                   16
2011    10          256               2560                   40
2013    12          256               3072                   48
2015    14          512               7168                  112
2017    16          512               8192                  128
2019    18         1024              18432                  288
2021    20         1024              20480                  320

• MIMD: +2 cores / 2 yrs → 2.5X in 12 years
• SIMD: 2X / 4 yrs → 8X in 12 years
• MIMD & SIMD: 20X in 12 years
• 20x in 12 years: 20^(1/12) = 1.28x → 28% per year or 2x every 3 years!
• IF (!) we can use it

Programs Running on my Computer

PID TTY TIME CMD
220 ?? 0:04.34 /usr/libexec/UserEventAgent (Aqua)
222 ?? 0:10.60 /usr/sbin/distnoted agent
224 ?? 0:09.11 /usr/sbin/cfprefsd agent
229 ?? 0:04.71 /usr/sbin/usernoted
230 ?? 0:02.35 /usr/libexec/nsurlsessiond
232 ?? 0:28.68 /System/Library/PrivateFrameworks/CalendarAgent.framework/Executables/CalendarAgent
234 ?? 0:04.36 /System/Library/PrivateFrameworks/GameCenterFoundation.framework/Versions/A/gamed
235 ?? 0:01.90 /System/Library/CoreServices/cloudphotosd.app/Contents/MacOS/cloudphotosd
236 ?? 0:49.72 /usr/libexec/secinitd
239 ?? 0:01.66 /System/Library/PrivateFrameworks/TCC.framework/Resources/tccd
240 ?? 0:12.68 /System/Library/Frameworks/Accounts.framework/Versions/A/Support/accountsd
241 ?? 0:09.56 /usr/libexec/SafariCloudHistoryPushAgent
242 ?? 0:00.27 /System/Library/PrivateFrameworks/CallHistory.framework/Support/CallHistorySyncHelper
243 ?? 0:00.74 /System/Library/CoreServices/mapspushd
244 ?? 0:00.79 /usr/libexec/fmfd
246 ?? 0:00.09 /System/Library/PrivateFrameworks/AskPermission.framework/Versions/A/Resources/askpermissiond
248 ?? 0:01.03 /System/Library/PrivateFrameworks/CloudDocsDaemon.framework/Versions/A/Support/bird
249 ?? 0:02.50 /System/Library/PrivateFrameworks/IDS.framework/identityservicesd.app/Contents/MacOS/identityservicesd
250 ?? 0:04.81 /usr/libexec/secd
254 ?? 0:24.01 /System/Library/PrivateFrameworks/CloudKitDaemon.framework/Support/cloudd
258 ?? 0:04.73 /System/Library/PrivateFrameworks/TelephonyUtilities.framework/callservicesd
267 ?? 0:02.15 /System/Library/CoreServices/AirPlayUIAgent.app/Contents/MacOS/AirPlayUIAgent --launchd
271 ?? 0:03.91 /usr/libexec/nsurlstoraged
274 ?? 0:00.90 /System/Library/PrivateFrameworks/CommerceKit.framework/Versions/A/Resources/storeaccountd
282 ?? 0:00.09 /usr/sbin/pboard
283 ?? 0:00.90 /System/Library/PrivateFrameworks/InternetAccounts.framework/Versions/A/XPCServices/com.apple.internetaccounts.xpc/Contents/MacOS/com.apple.internetaccounts
285 ?? 0:04.72 /System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Support/fontd
291 ?? 0:00.25 /System/Library/Frameworks/Security.framework/Versions/A/Resources/CloudKeychainProxy.bundle/Contents/MacOS/CloudKeychainProxy
292 ?? 0:09.54 /System/Library/CoreServices/CoreServicesUIAgent.app/Contents/MacOS/CoreServicesUIAgent
293 ?? 0:00.29 /System/Library/PrivateFrameworks/CloudPhotoServices.framework/Versions/A/Frameworks/CloudPhotoServicesConfiguration.framework/Versions/A/XPCServices/com.apple.CloudPhotosConfiguration.xpc/Contents/MacOS/com.apple.CloudPhotosConfiguration
297 ?? 0:00.84 /System/Library/PrivateFrameworks/CloudServices.framework/Resources/com.apple.sbd
302 ?? 0:26.11 /System/Library/CoreServices/Dock.app/Contents/MacOS/Dock
303 ?? 0:09.55 /System/Library/CoreServices/SystemUIServer.app/Contents/MacOS/SystemUIServer
…

• 156 total at this moment
• How does my laptop do this?
• Imagine doing 156 assignments all at the same time!

Threads

• Sequential flow of instructions that performs some task
  − Up to now we just called this a "program"
• Each thread has a
  − Dedicated PC (program counter)
  − Separate registers
  − Accesses the shared memory
• Each processor provides one (or more)
  − hardware threads (or harts) that actively execute instructions
  − Each core executes one "hardware thread"
• Operating system multiplexes multiple
  − software threads onto the available hardware threads
  − all threads except those mapped to hardware threads are waiting

Operating System Threads

Give illusion of many "simultaneously" active threads

1. Multiplex software threads onto hardware threads:
  a) Switch out blocked threads (e.g. cache miss, user input, network access)
  b) Timer (e.g. switch active thread every 1 ms)
2. Remove a software thread from a hardware thread by
  i. interrupting its execution
  ii. saving its registers and PC to memory
3. Start executing a different software thread by
  i. loading its previously saved registers into a hardware thread's registers
  ii. jumping to its saved PC
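Steps 2 and 3 above can be sketched as a toy model in C. This is only an illustration, not how a real operating system is written: a real switch runs in privileged assembly, and the type names `hw_thread_t` and `sw_thread_t`, the function names, and the 32-entry register file are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32  /* assumption: a MIPS-like register file */

/* Per-thread state held inside the processor (one hardware thread). */
typedef struct {
    uint32_t pc;
    uint32_t regs[NUM_REGS];
} hw_thread_t;

/* State of a software thread parked in memory while it is not running. */
typedef struct {
    uint32_t saved_pc;
    uint32_t saved_regs[NUM_REGS];
} sw_thread_t;

/* Step 2: remove a software thread by saving its registers and PC to memory. */
void switch_out(const hw_thread_t *hw, sw_thread_t *old)
{
    old->saved_pc = hw->pc;
    memcpy(old->saved_regs, hw->regs, sizeof hw->regs);
}

/* Step 3: start a different software thread by reloading its saved state. */
void switch_in(hw_thread_t *hw, const sw_thread_t *next)
{
    memcpy(hw->regs, next->saved_regs, sizeof hw->regs);
    hw->pc = next->saved_pc;  /* "jump" to the saved PC */
}

/* One complete context switch on a single hardware thread. */
void context_switch(hw_thread_t *hw, sw_thread_t *old, const sw_thread_t *next)
{
    assert(hw != NULL && old != NULL && next != NULL);
    switch_out(hw, old);
    switch_in(hw, next);
}
```

Calling `context_switch(&hw, &blocked, &ready)` parks the running thread in `blocked` and resumes `ready` where it left off. The cost is roughly proportional to the amount of register state copied, which is why large register files (e.g. AVX) make switching more expensive.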

Example: 4 Cores

• Thread pool: List of threads competing for processor
• OS maps threads to cores and schedules logical (software) threads
• Each "Core" actively runs 1 program at a time

[Figure: thread pool feeding Core 1, Core 2, Core 3 and Core 4.]

Multithreading

• Typical scenario:
  − Active thread encounters cache miss
  − Active thread waits ~1000 cycles for data from DRAM
  − → switch out and run different thread until data available
• Problem
  − Must save current thread state and load new thread state
    § PC, all registers (could be many, e.g. AVX)
  − → must perform switch in ≪ 1000 cycles
• Can hardware help?
  − Moore's law: transistors are plenty

Hardware assisted Software Multithreading

[Figure: processor with 1 core and 2 threads. One control unit and one datapath (ALU) hold two PCs (PC 0, PC 1) and two register sets (Registers 0, Registers 1). The processor connects to memory (bytes) and to input/output through the I/O-memory interfaces.]

• Two copies of PC and registers inside processor hardware
• Looks like two processors to software (hardware thread 0, hardware thread 1)
• Hyperthreading:
  − Both threads may be active simultaneously

Note: presented incorrectly in the lecture

Multithreading

• Logical threads
  − ≈ 1% more hardware, ≈ 10% (?) better performance
    § Separate registers
    § Share datapath, ALU(s), caches
• Multicore
  − => Duplicate processors
  − ≈ 50% more hardware, ≈ 2X better performance?
• Modern machines do both
  − Multiple cores with multiple threads per core

Bernhard's Laptop

$ sysctl -a | grep hw

hw.physicalcpu: 2
hw.logicalcpu: 4
hw.l1icachesize: 32,768
hw.l1dcachesize: 32,768
hw.l2cachesize: 262,144
hw.l3cachesize: 3,145,728

• 2 Cores
• 4 Threads total

Example: 6 Cores, 24 Logical Threads

• Thread pool: List of threads competing for processor
• OS maps threads to cores and schedules logical (software) threads
• 4 logical threads per core (hardware) thread

[Figure: Core 1 through Core 6, each holding Thread 1 through Thread 4.]

Languages supporting Parallel Programming

ActorScript     Concurrent Pascal    JoCaml       Orc
Ada             Concurrent ML        Join         Oz
Afnix           Concurrent Haskell   Java         Pict
Alef            Curry                Joule        Reia
Alice           CUDA                 Joyce        SALSA
APL             E                    LabVIEW      Scala
Axum            Eiffel               Limbo        SISAL
Chapel          Erlang               Linda        SR
Cilk            Fortran 90           MultiLisp    Stackless Python
Clean           Go                   Modula-3     SuperPascal
Clojure         Io                   Occam        VHDL
Concurrent C    Janus                occam-π      XC

Which one to pick?

Why so many parallel programming languages?

• Piazza question:
  − Why "intrinsics"?
  − To Intel: fix your #()&$! compiler!
• It's happening ... but
  − SIMD features are continually added to compilers (Intel, gcc)
  − Intense area of research
  − Research progress:
    § 20+ years to translate C into good (fast!) assembly
    § How long to translate C into good (fast!) parallel code?
      o General problem is very hard to solve
      o Present state: specialized solutions for specific cases
      o Your opportunity to become famous!

Parallel Programming Languages

• Number of choices is indication of
  − No universal solution
    § Needs are very problem specific
  − E.g.
    § Scientific computing (matrix multiply)
    § Web server: handle many unrelated requests simultaneously
    § Input/output: it's all happening simultaneously!
• Specialized languages for different tasks
  − Some are easier to use (for some problems)
  − None is particularly "easy" to use
• 61C
  − Parallel language examples for high-performance computing
  − OpenMP

Parallel Loops

• Serial execution:

    for (int i=0; i<100; i++) {
        …
    }

• Parallel execution (one loop per thread):

    for (int i=0; i<25; i++) {
        …
    }

    for (int i=25; i<50; i++) {
        …
    }

    for (int i=50; i<75; i++) {
        …
    }

    for (int i=75; i<100; i++) {
        …
    }
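The loop bounds above (0, 25, 50, 75, 100) follow from dividing the iteration space evenly among the threads. A minimal sketch of that arithmetic, assuming contiguous chunks where any leftover iterations go to the lowest-numbered threads; `chunk_t` and `chunk_for_thread` are names made up for this illustration, and an actual OpenMP runtime is free to schedule differently.

```c
#include <assert.h>

/* Half-open iteration range [start, end) assigned to one thread. */
typedef struct {
    int start;
    int end;
} chunk_t;

/* Split n iterations into num_threads contiguous chunks.
   The first (n % num_threads) threads each get one extra iteration. */
chunk_t chunk_for_thread(int n, int num_threads, int id)
{
    assert(n >= 0 && num_threads > 0 && id >= 0 && id < num_threads);
    int base = n / num_threads;    /* iterations every thread gets */
    int extra = n % num_threads;   /* leftover iterations to hand out */
    chunk_t c;
    c.start = id * base + (id < extra ? id : extra);
    c.end = c.start + base + (id < extra ? 1 : 0);
    return c;
}
```

For example, `chunk_for_thread(100, 4, 1)` gives the range [25, 50), the second loop above. With 10 iterations and 4 threads the chunks are [0, 3), [3, 6), [6, 8) and [8, 10).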

Parallel for in OpenMP

    #include <omp.h>

    #pragma omp parallel for
    for (int i=0; i<100; i++) {
        …
    }
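A complete function built around the `parallel for` pragma might look like the sketch below. The function names `record_owner` and `current_thread_id` are hypothetical. The `_OPENMP` guard is an addition so that the same file also builds with a compiler that ignores the pragma, in which case the loop simply runs on one thread.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* runtime library routines such as omp_get_thread_num() */
#endif

/* Id of the calling thread, or 0 when compiled without OpenMP. */
static int current_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* For each iteration i, record which thread executed it in owner[i]. */
void record_owner(int *owner, int n)
{
    assert(owner != NULL && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        owner[i] = current_thread_id();  /* each i is written by exactly one thread */
    }
}
```

Build with `gcc -fopenmp` to enable the pragma. Because every iteration writes a different array element, no synchronization is needed here.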

OpenMP Example

$ gcc-5 -fopenmp for.c; ./a.out
thread 0, i = 0
thread 1, i = 3
thread 2, i = 6
thread 3, i = 8
thread 0, i = 1
thread 1, i = 4
thread 2, i = 7
thread 3, i = 9
thread 0, i = 2
thread 1, i = 5
01 02 03 14 15 16 27 28 39 40

OpenMP

• C extension: no new language to learn
• Multi-threaded, shared-memory parallelism
  − Compiler directives, #pragma
  − Runtime library routines, #include <omp.h>
• #pragma
  − Ignored by compilers unaware of OpenMP
  − Same source for multiple architectures
    § E.g. same program for 1 & 16 cores
• Only works with shared memory

OpenMP Programming Model

• Fork-Join Model:
• OpenMP programs begin as single process (master thread)
  − Sequential execution
• When parallel region is encountered
  − Master thread "forks" into team of parallel threads
  − Executed simultaneously
  − At end of parallel region, parallel threads "join", leaving only master thread
• Process repeats for each parallel region
  − Amdahl's law?
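The Amdahl's law question can be made concrete with a small calculation. In a fork-join program only the parallel regions speed up, so the sequential parts between them bound the overall gain. The sketch below uses the standard formula, speedup = 1 / ((1 − p) + p / n); the function name `amdahl_speedup` and its parameter names are assumptions for this illustration.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction parallel_fraction of the
   original run time is spread perfectly over num_threads threads and the
   rest stays sequential. */
double amdahl_speedup(double parallel_fraction, int num_threads)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(num_threads > 0);
    double sequential_fraction = 1.0 - parallel_fraction;
    return 1.0 / (sequential_fraction + parallel_fraction / num_threads);
}
```

With 90% of the run time inside parallel regions and 16 threads, `amdahl_speedup(0.9, 16)` is about 6.4, well short of 16.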

What Kind of Threads?

• OpenMP threads are operating system (software) threads.
• OS will multiplex requested OpenMP threads onto available hardware threads.
• Hopefully each gets a real hardware thread to run on, so no OS-level time-multiplexing.
• But other tasks on machine can also use hardware threads!
• Be "careful" (?) when timing results for project 4!
  − 5 AM?
  − Job queue?

Example 2: computing π

http://openmp.org/mp-documents/omp-hands-on-SC08.pdf

Sequential π

pi = 3.142425985001

• Resembles π, but not very accurate
• Let's increase num_steps and parallelize
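A minimal sketch of a sequential computation consistent with the result above. It assumes the numerical integration of 4/(1+x²) over [0, 1] by the midpoint rule, as in the linked OpenMP hands-on tutorial, and assumes num_steps = 10 for the value shown. The function name `compute_pi` is chosen for this sketch.

```c
#include <assert.h>

/* Midpoint-rule estimate of the integral of 4/(1+x^2) from 0 to 1,
   which equals pi. */
double compute_pi(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;  /* width of one rectangle */
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;        /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

With only 10 rectangles the estimate is off in the fourth significant digit. The error of the midpoint rule shrinks with the square of the step width, so more steps help quickly.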

Parallelize (1) …

• Problem: each thread needs access to the shared variable sum
• Code runs sequentially …

Parallelize (2) …

1. Compute sum[0] and sum[1] in parallel
2. Compute sum = sum[0] + sum[1] sequentially

Parallel π
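One way to write the per-thread partial sums is sketched below. The cyclic assignment of iterations (thread id handles i = id, id + nthreads, …) and the limit of 4 threads are assumptions chosen to match the four thread ids that appear in the trial run. `compute_pi_parallel` and `MAX_THREADS` are names made up for this sketch, and the `_OPENMP` guard is added so the file also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4   /* assumption: at most 4 threads */

double compute_pi_parallel(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum[MAX_THREADS] = { 0.0 };  /* one partial sum per thread */
    int threads_used = 1;

    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = 0;
        int nthreads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        if (id == 0)
            threads_used = nthreads;    /* the runtime may grant fewer threads */

        /* Each thread only touches its own sum[id]: no shared writes. */
        for (long i = id; i < num_steps; i += nthreads) {
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);
        }
    }

    /* Sequential part: combine the partial sums after the threads join. */
    double pi = 0.0;
    for (int id = 0; id < threads_used; id++)
        pi += step * sum[id];
    return pi;
}
```

The final loop is the sequential step 2 of the previous slide, generalized from two partial sums to `threads_used` of them.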

Trial Run

i = 1, id = 1
i = 0, id = 0
i = 2, id = 2
i = 3, id = 3
i = 5, id = 1
i = 4, id = 0
i = 6, id = 2
i = 7, id = 3
i = 9, id = 1
i = 8, id = 0
pi = 3.142425985001

Scale up: num_steps = 10^6

pi = 3.141592653590

You verify how many digits are correct …

Can we Parallelize Computing sum?

Summation inside parallel section

• Insignificant speedup in this example, but …
• pi = 3.138450662641
• Wrong! And value changes between runs?!
• What's going on?

Always looking for ways to beat Amdahl's Law …

Your Turn

What are the possible values of *($s0) after executing this code by 2 concurrent threads?

    # *($s0) = 100
    lw   $t0,0($s0)
    addi $t0,$t0,1
    sw   $t0,0($s0)

Answer   *($s0)
A        100 or 101
B        101
C        101 or 102
D        100 or 101 or 102
E        100 or 101 or 102 or 103

Your Turn

Answer   *($s0)
C        101 or 102

• 102 if the threads enter code section sequentially
• 101 if both execute lw before either runs sw
  − one thread sees "stale" data
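The answer can be checked by brute force. The sketch below enumerates every interleaving of the three instructions executed by two threads and records the final value in memory. It assumes each instruction executes atomically and that memory behaves sequentially consistently. The function names are hypothetical and the model is far simpler than real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_OPS = 3 };   /* lw, addi, sw */

/* Execute instruction number pc of one thread on its private register t0
   and the shared memory word mem. */
static void step(int pc, int *t0, int *mem)
{
    switch (pc) {
    case 0: *t0 = *mem; break;   /* lw   $t0,0($s0) */
    case 1: *t0 += 1;   break;   /* addi $t0,$t0,1  */
    case 2: *mem = *t0; break;   /* sw   $t0,0($s0) */
    }
}

/* Explore every interleaving reachable from the given state and mark each
   final memory value in seen[]. */
static void explore(int pc0, int pc1, int t0, int t1, int mem,
                    bool *seen, int seen_len)
{
    if (pc0 == NUM_OPS && pc1 == NUM_OPS) {
        if (mem >= 0 && mem < seen_len)
            seen[mem] = true;
        return;
    }
    if (pc0 < NUM_OPS) {             /* thread 0 executes next */
        int t = t0, m = mem;
        step(pc0, &t, &m);
        explore(pc0 + 1, pc1, t, t1, m, seen, seen_len);
    }
    if (pc1 < NUM_OPS) {             /* thread 1 executes next */
        int t = t1, m = mem;
        step(pc1, &t, &m);
        explore(pc0, pc1 + 1, t0, t, m, seen, seen_len);
    }
}

/* Set seen[v] = true for every value v that *($s0) can hold after both
   threads have finished, starting from the value initial. */
void possible_results(int initial, bool *seen, int seen_len)
{
    assert(seen != NULL && seen_len > 0);
    for (int i = 0; i < seen_len; i++)
        seen[i] = false;
    explore(0, 0, 0, 0, initial, seen, seen_len);
}
```

Starting from 100, only `seen[101]` and `seen[102]` end up set: every store writes at least 101, and no thread ever increments twice.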

What's going on?

• Operation is really pi = pi + sum[id]
• What if >1 threads read the current (same) value of pi, compute the sum, and store the result back to pi?
• Each processor reads same intermediate value of pi!
• Result depends on who gets there when
• A "race" → result is not deterministic

Synchronization

• Problem:
  − Limit access to shared resource to 1 actor at a time
  − E.g. only 1 person permitted to edit a file at a time
    § otherwise changes by several people get all mixed up
• Solution:
  − Take turns:
    § Only one person gets the microphone & talks at a time
    § Also good practice for classrooms, btw …

Locks

• Computers use locks to control access to shared resources
  − Serves purpose of microphone in example
  − Also referred to as "semaphore"
• Usually implemented with a variable
  − int lock;
    § 0 for unlocked
    § 1 for locked

Synchronization with locks

    // wait for lock released
    while (lock != 0) ;
    // lock == 0 now (unlocked)

    // set lock
    lock = 1;

    // access shared resource ...
    // e.g. pi
    // sequential execution! (Amdahl ...)

    // release lock
    lock = 0;

Lock Synchronization

Thread 1:

    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;

Thread 2:

    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;

• Thread 2 finds lock not set, before thread 1 sets it
• Both threads believe they got and set the lock!

Try as you want, this problem has no solution, not even at the assembly level. Unless we introduce new instructions, that is!

Hardware Synchronization

• Solution:
  − Atomic read/write
  − Read & write in single instruction
    § No other access permitted between read and write
  − Note:
    § Must use shared memory (multiprocessing)
• Common implementations:
  − Atomic swap of register ↔ memory
  − Pair of instructions for "linked" read and write
    § write fails if memory location has been "tampered" with after linked read
    § MIPS uses this solution
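In C, the "atomic swap" approach can be sketched with the C11 `<stdatomic.h>` header, whose `atomic_exchange` writes a new value and returns the old one as a single indivisible operation. The type and function names (`swap_lock_t`, `swap_lock_try_acquire`, …) are made up for this illustration; the 0/1 encoding follows the Locks slide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_int locked;   /* 0 for unlocked, 1 for locked */
} swap_lock_t;

void swap_lock_init(swap_lock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->locked, 0);
}

/* One attempt: atomically store 1 and look at the value that was there.
   If the old value was 0, this caller now owns the lock. */
bool swap_lock_try_acquire(swap_lock_t *l)
{
    return atomic_exchange(&l->locked, 1) == 0;
}

/* Spin until the lock is acquired. */
void swap_lock_acquire(swap_lock_t *l)
{
    while (!swap_lock_try_acquire(l))
        ;   /* busy-wait */
}

void swap_lock_release(swap_lock_t *l)
{
    atomic_store(&l->locked, 0);
}
```

Unlike the broken `while (lock != 0) ; lock = 1;` sequence, the test and the set cannot be separated here, so two threads can never both see 0 and both take the lock.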

MIPS Synchronization Instructions

• Load linked: ll $rt, off($rs)
  − Reads memory location (like lw)
  − Also sets (hidden) "link bit"
  − Link bit is reset if memory location (off($rs)) is accessed
• Store conditional: sc $rt, off($rs)
  − Stores off($rs) = $rt (like sw)
  − Sets $rt=1 (success) if link bit is set
    § i.e. no (other) process accessed off($rs) since ll
  − Sets $rt=0 (failure) otherwise
  − Note: sc clobbers $rt, i.e. changes its value
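C has no direct `ll`/`sc`, but the same read, modify, conditionally-store, retry pattern can be sketched with C11's `atomic_compare_exchange_weak`. This is an analogy rather than an exact equivalent: compare-and-swap checks only that the value is unchanged, whereas `sc` fails on any intervening access. The function name `atomic_increment` is an assumption for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically add 1 to *counter and return the new value. */
int atomic_increment(atomic_int *counter)
{
    assert(counter != NULL);
    int old = atomic_load(counter);   /* plays the role of ll */

    /* Plays the role of sc: store old + 1 only if *counter still holds old.
       On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;
    return old + 1;
}
```

Two threads that each call `atomic_increment` on a counter holding 100 always leave 102 behind, which closes the race from the "Your Turn" example.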

Lock Synchronization

Broken Synchronization:

    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;

Fix (lock is at location $s1):

    Try:    addiu $t0,$zero,1     # $t0 = 1 before calling ll: minimize time between ll and sc
            ll    $t1,0($s1)
            bne   $t1,$zero,Try
            sc    $t0,0($s1)
            beq   $t0,$zero,Try   # try again if sc failed (another thread executed sc since above ll)
    Locked:
            # critical section
    Unlock:
            sw    $zero,0($s1)

OpenMP Locks

Synchronization in OpenMP

• Locks are typically used in libraries of higher level parallel programming constructs
• E.g. OpenMP offers #pragmas for common cases:
  − critical
  − atomic
  − barrier
  − ordered
• OpenMP offers many more features
  − see online documentation
  − or tutorial at
    § http://openmp.org/mp-documents/omp-hands-on-SC08.pdf

OpenMP critical
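A minimal sketch of how the `critical` pragma can make the summation of π safe inside the parallel region. Each thread accumulates into a private variable and only the final update of the shared `pi` is serialized. The function name `compute_pi_critical` and the cyclic split of iterations are assumptions for this sketch, and the `_OPENMP` guard is added so the file also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

double compute_pi_critical(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double pi = 0.0;                     /* shared by all threads */

    #pragma omp parallel
    {
        int id = 0;
        int nthreads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        double sum = 0.0;                /* private: declared inside the region */
        for (long i = id; i < num_steps; i += nthreads) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }

        /* Only one thread at a time may execute the next statement. */
        #pragma omp critical
        pi += step * sum;
    }
    return pi;
}
```

The critical section runs once per thread rather than once per iteration, so the serialized portion stays small.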

The Trouble with Locks …

• … is dead-locks
• Consider 2 cooks sharing a kitchen
  − Each cooks a meal that requires salt and pepper (locks)
  − Cook 1 grabs salt
  − Cook 2 grabs pepper
  − Cook 1 notices s/he needs pepper
    § it's not there, so s/he waits
  − Cook 2 realizes s/he needs salt
    § it's not there, so s/he waits
• A not so common cause of cook starvation
  − But deadlocks are possible in parallel programs
  − Very difficult to debug
    § malloc/free is easy …
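One common way to avoid the cooks' deadlock, not covered on the slide, is to agree on a global order and always take locks in that order, e.g. salt before pepper. The sketch below illustrates the idea with C11 `atomic_flag` spinlocks. The `rank` field, the type `ranked_lock_t` and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    int rank;            /* hypothetical global order, e.g. salt = 0, pepper = 1 */
    atomic_flag held;    /* set while some thread holds the lock */
} ranked_lock_t;

static void lock_one(ranked_lock_t *l)
{
    /* test_and_set returns the previous state: spin while it was already set. */
    while (atomic_flag_test_and_set(&l->held))
        ;
}

static void unlock_one(ranked_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Acquire two distinct locks, always lowest rank first,
   no matter in which order the caller names them. */
void lock_both(ranked_lock_t *a, ranked_lock_t *b)
{
    assert(a != NULL && b != NULL && a->rank != b->rank);
    ranked_lock_t *first = (a->rank < b->rank) ? a : b;
    ranked_lock_t *second = (a->rank < b->rank) ? b : a;
    lock_one(first);
    lock_one(second);
}

void unlock_both(ranked_lock_t *a, ranked_lock_t *b)
{
    unlock_one(a);
    unlock_one(b);
}
```

If both cooks call `lock_both`, whoever gets the salt first also gets the pepper, and the other waits without holding anything, so the circular wait cannot form.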

And in Conclusion, …

• Sequential software execution speed is limited
• Parallel processing is the only path to higher performance
  − SIMD: instruction level parallelism
    § Implemented in all high performance CPUs today (x86, ARM, …)
    § Partially supported by compilers
  − MIMD: thread level parallelism
    § Multicore processors
    § Supported by Operating Systems (OS)
    § Requires programmer intervention to exploit at single program level
      o E.g. OpenMP
  − SIMD & MIMD for maximum performance
• Synchronization
  − Requires hardware support: specialized assembly instructions
  − Typically use higher-level support
  − Beware of deadlocks