Threads - Swarthmore College
Post on 25-Aug-2020
Threads
11/15/16
CS31 teaches you…
• How a computer runs a program.
  • How the hardware performs computations
  • How the compiler translates your code
  • How the operating system connects hardware and software
• The implications for you as a programmer
  • Pipelining instructions
  • Caching
  • Virtual memory
  • Process switching
  • Support for parallel programming (threads)
Why do we care about parallel?
[Figure: plot with series for transistors (×10^3), clock speed (MHz), power (W), and ILP (IPC)]
Moore’s Law
• Circuit density (number of transistors in a fixed area) doubles roughly every two years.
• This used to mean that clock speed doubled too.
• All your programs run twice as fast for free.
• Problem: heat
• For now, circuit density is still increasing. How can we make use of it?
The “Multi-Core Era”
• We can’t make a single core go much faster.
• We can use the extra transistors to put multiple CPU cores on the chip.
• This is exciting: CPUs can do a lot more!
• Problem: it’s now up to the programmer to take advantage of multiple cores.
• Humans are bad at thinking in parallel…
Parallel Abstraction
• To speed up a job, you have to divide it across multiple cores.
• A process contains both execution information and memory/resources.
• What if we want to separate the execution information to give us parallelism within a process?
Threads
• Modern OSes separate the concepts of processes and threads.
• The process defines the address space and general process attributes (e.g., open files).
• The thread defines a sequential execution stream within a process (PC, SP, registers).
• A thread is bound to a single process.
• Processes, however, can have multiple threads.
• Each process has at least one thread.
Threads
[Figure: Process 1’s address space (OS, Text, Data, Heap, Stack 1) with one thread, Thread 1, holding PC1 and SP1]
This is the picture we’ve been using all along: a process with a single thread, which has execution state (registers) and a stack.
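Threads
[Figure: Process 1’s address space (OS, Text, Data, Heap, Stack 1, Stack 2) with two threads: Thread 1 (PC1, SP1) and Thread 2 (PC2, SP2)]
We can add a thread to the process. New threads share all memory (the virtual address space, VAS) with other threads.
The new thread gets private registers and a local stack.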
Threads
[Figure: Process 1’s address space (OS, Text, Data, Heap, Stack 1, Stack 2, Stack 3) with three threads: Thread 1 (PC1, SP1), Thread 2 (PC2, SP2), and Thread 3 (PC3, SP3)]
A third thread added.
Note: they’re all executing the same program (shared instructions in text), though they may be at different points in the code.
Threads
• Private: tid, copy of registers, execution stack
• Shared: everything else in the process
[Figure: a process containing threads tid 0, tid 1, …, tid m, with per-thread stacks]
+ Sharing is easy
+ Sharing is cheap: no data copy from one Pi’s address space to another Pj’s address space
+ Thread creation is faster than process creation
+ OS can schedule threads on multiple CPUs
+ Parallelism
- Coordination/Synchronization
  - How to not muck up each other’s state
- Can’t use threads in distributed systems (when cooperating Pi’s are on different computers)
Programming Threads
Every process starts with one thread of execution
• The single main thread executes from the beginning
An example threaded program’s execution:
1. Main thread often initializes shared state
2. Then spawns multiple threads
3. Set of threads execute concurrently to perform some task
4. Main thread may do a join, to wait for other threads to exit (like wait and reaping processes)
5. Main thread may do some final sequential processing (like writing results to a file)
Logical View of Threads
• Threads form a pool of peers within a process (unlike processes, which form a tree hierarchy)
[Figure: left, a process hierarchy containing bash, sh, pwd, and ls; right, threads T0 to T4 associated with a process, all attached to shared code and data]
Thread Concurrency
Threads’ execution control flows overlap.
• Single-core processor: simulate by time slicing
• Multi-core processor: true concurrency (run on multiple cores)
[Figure: timelines of Thread A, Thread B, and Thread C, time-sliced on a single core and running simultaneously on multiple cores]
Concurrency?
• Several computations or threads of control are executing simultaneously, and potentially interacting with each other.
• We can multitask! Why does that help?
  • Taking advantage of multiple CPUs/cores
  • Overlapping I/O with computation
Why use threads over processes?
Separating threads and processes makes it easier to support parallel applications:
• Creating multiple paths of execution does not require creating new processes (less state to store, initialize).
• Low-overhead sharing between threads in the same process (threads share page tables, access the same memory).
Threads & Sharing
• Code (text) shared by all threads in process
• Global variables and static objects are shared
  • Stored in the static data segment, accessible by any thread
• Dynamic objects and other heap objects are shared
  • Allocated from heap with malloc/free or new/delete
• Local variables can BUT SHOULD NOT be shared
  • Refer to data on the stack
  • Each thread has its own stack
  • Never pass/share/store a pointer to a local variable on another thread’s stack
Threads & Sharing
[Figure: Thread 1’s stack (function A, function B) and Thread 2’s stack (function C, function D), plus a shared heap holding int *x; x points to Z, a local variable on Thread 1’s stack]
Thread 2 can dereference x to access Z. Function B returns…
Threads & Sharing
[Figure: Thread 1’s stack (function A, function B) and Thread 2’s stack (function C, function D), plus a shared heap holding both int *x and Z]
Thread 2 can dereference x to access Z.
Shared data on heap!
Thread-level Parallelism
• Speed up an application by assigning portions to CPUs/cores that process in parallel
• Requires:
  • partitioning responsibilities (e.g., parallel algorithm)
  • managing their interaction
• Example: processing an array
[Figure: an array processed by one core vs. split across four cores]
If one CPU core can run a program at a rate of X, how quickly will the program run on two cores?
A. Slower than one core (<X)
B. The same speed (X)
C. Faster than one core, but not double (X-2X)
D. Twice as fast (2X)
E. More than twice as fast (>2X)
Parallel Speedup
• Performance benefit of parallel threads depends on many factors:
  • algorithm divisibility
  • communication overhead
  • memory hierarchy and locality
  • implementation quality
• For most programs, more threads means more communication, diminishing returns.
Example
static int x;

int foo(int *p) {
    int y;

    y = 3;
    y = *p;
    *p = 7;
    x += y;
}

[Figure: shared virtual address space from 0x0 to max, with regions Instructions (foo), Globals, Heap, and Stack; threads Tid i and Tid j]
If threads i and j both execute function foo code:
Q1: which variables do they each get their own copy of? Which do they share?
Q2: which statements can affect values seen by the other thread?
Example
• Each tid gets its own copy of y on its stack.
• x is in global memory and is shared by every thread.
• p is a parameter, so each tid gets its own copy of p. However, p could point to an int storage location on the stack, in global memory, on the heap, or even in another thread’s stack frame.
[Figure: shared virtual address space from 0x0 to max (Instructions, Globals, Heap, Stack); Globals holds x: 10, and the stack frames of Tid i and Tid j each hold their own y: 3 and p]
Summary
• Physical limits to how much faster we can make a single core run.
  • Use transistors to provide more cores.
  • Parallelize applications to take advantage.
• OS abstraction: thread
  • Shares most of the address space with other threads in the same process
  • Gets private execution context (registers) + stack
• Coordinating threads is challenging!