May/01/2000 HIPS 2000 1
Online Computation of Critical Paths for Multithreaded Languages
Yoshihiro Oyama, Kenjiro Taura, Akinori Yonezawa
University of Tokyo
Presentation Outline
• What is a critical path?
• Background & Overview
• Our work
  – Target language
  – Critical path computation algorithm
  – Instrumentation scheme
• Experimental results
• Related work
What is a Critical Path (CP)?
• The longest execution path
  – Nodes: sequential program parts
  – Edges: fork/sync points
[Figure: example execution graph annotated with execution times; CP length: 31]
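As a rough illustration of "the longest execution path", the sketch below computes the critical path length of a small graph whose nodes carry costs. The graph, its node names, and its costs are hypothetical and are not the values from the slide's figure.

```python
from functools import lru_cache

# Hypothetical task graph: node -> (cost, successors).
# These values are illustrative only, not the ones in the slide's figure.
GRAPH = {
    "a": (3, ["b", "c"]),
    "b": (5, ["d"]),
    "c": (2, ["d"]),
    "d": (4, []),
}

@lru_cache(maxsize=None)
def longest_from(node):
    """Length of the longest path starting at `node`, counting node costs."""
    cost, successors = GRAPH[node]
    return cost + max((longest_from(s) for s in successors), default=0)

def critical_path_length(roots):
    """Longest path over all entry nodes of the graph."""
    return max(longest_from(r) for r in roots)

print(critical_path_length(["a"]))  # 3 + 5 + 4 = 12
```

This offline version needs the whole graph in memory, which is the cost the online scheme in these slides avoids.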
Benefits of Getting CPs (1/2)
• CP info gives us
  – Performance upper bound = Exec. time lower bound = lim_{PE→∞} {exec. time}
  – Important parts in need of tuning
Benefits of Getting CPs (2/2)
• CP info is useful for
  – Tuning
    • CP is short → Overhead should be reduced
    • Otherwise → CP should be shortened
  – Performance prediction
    • T_P = T_1 / P + T_∞ (by Cilk group)
    • Exec. time is close to CP length → More processors: futile
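The prediction formula can be tried out with a few lines of Python. This is only a sketch: the function name is made up, and the inputs are the figures from the Cilk output quoted in the Related Work slide (total work 3.94 s, critical path 1.08 ms, measured 416.33 ms on 10 processors).

```python
def predicted_time(t1, t_inf, p):
    """Estimate T_P = T_1 / P + T_inf; all times must share one unit."""
    return t1 / p + t_inf

# Figures taken from the Cilk output quoted in the Related Work slide.
T1_MS = 3940.0     # total work, in milliseconds
T_INF_MS = 1.08    # critical path length, in milliseconds

for p in (1, 10, 100, 10000):
    print(p, round(predicted_time(T1_MS, T_INF_MS, p), 2))
```

As P grows the estimate approaches T_∞, which is why adding processors is futile once the execution time is already close to the CP length.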
This Work
• Computing critical paths
  – Primary targets:
    • Multithreaded languages
    • Shared-memory machines
  – On-the-fly
    • Not using tracefiles
  – Source code instrumentation
Background (Shortcoming of Existing Work)
• Cilk [Frigo et al. 98]
  – Provides online computation of CPs
  – Supports fork-join synchronization only
  – Unrealistic setting
    • Fork: zero cost
    • Join: zero cost
Contribution
• Developed algorithm for computing CPs
  – It deals with languages with threads and synchronization via first-class data
    • Not limited to fork-join model
  – It takes fork / communication cost into account
  – It gives length of each subpath in a CP
    • Helps us “pinpoint” important program parts
• Demonstrated its usefulness through experiments using SMP
CP Info Example
• Displaying a sequence of all subpaths in a CP
frame entry point      frame exit point                    time
================================================================
main()             --- move_mols(mols,100)              741 usec
spawn                                                    10 usec
move_mols(mols,n)  --- spawn move_one_mol(mols[i])       39 usec
spawn                                                    10 usec
move_one_mol(molp) --- return                          4982 usec
communication                                            15 usec
v = recv(r)        --- send(s, v*2)                     128 usec
communication                                            15 usec
u = recv(s)        --- die                             1207 usec
================================================================
critical path length                                   7147 usec
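To show how such a report helps pinpoint the part worth tuning, the sketch below stores the subpaths of the example report above as (description, microseconds) pairs, sums them, and picks the largest contributor. The helper names are illustrative only.

```python
# (description, microseconds) pairs copied from the example report above.
SUBPATHS = [
    ("main() --- move_mols(mols,100)", 741),
    ("spawn", 10),
    ("move_mols(mols,n) --- spawn move_one_mol(mols[i])", 39),
    ("spawn", 10),
    ("move_one_mol(molp) --- return", 4982),
    ("communication", 15),
    ("v = recv(r) --- send(s, v*2)", 128),
    ("communication", 15),
    ("u = recv(s) --- die", 1207),
]

def cp_length(subpaths):
    """Critical path length is the sum of all subpath times."""
    return sum(usec for _, usec in subpaths)

def dominant(subpaths):
    """Return the subpath that contributes the most time to the CP."""
    return max(subpaths, key=lambda s: s[1])

total = cp_length(SUBPATHS)
name, usec = dominant(SUBPATHS)
print(total)  # 7147
print(name, f"{100 * usec / total:.1f}%")
```

The subpath times add up to the reported critical path length, and the move_one_mol subpath is by far the largest share.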
Target Language
• Sequential language (C, Scheme, …)
  + Threads: spawn f(x1,…,xn)
  + Channels
    • are first-class sync. media
    • can express locks, barriers, and monitors
[Figure: thread th1 executes send(r,8); thread th2 executes v = recv(r) and receives the value 8 through channel r]
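The spawn/send/recv model can be mimicked in Python with threads and a queue. This is a sketch of the programming model only, not of the authors' language; `spawn`, `send`, and `recv` here are thin hypothetical wrappers over `threading.Thread` and `queue.Queue`.

```python
import queue
import threading

def spawn(f, *args):
    """Start f(*args) in a new thread, mirroring `spawn f(x1, ..., xn)`."""
    t = threading.Thread(target=f, args=args)
    t.start()
    return t

def send(ch, value):
    ch.put(value)

def recv(ch):
    return ch.get()  # blocks until a value arrives

def th1(r):
    send(r, 8)

def th2(r, out):
    out.append(recv(r))

r = queue.Queue()  # the channel is an ordinary value that can be passed around
result = []
threads = [spawn(th2, r, result), spawn(th1, r)]
for t in threads:
    t.join()
print(result)  # [8]
```

Because the channel is a first-class value, any thread holding `r` can synchronize through it, which is what takes the model beyond fork-join.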
Sample Program
main() {               (Beginning of Program)
  spawn sum(r,vec);
  ...
  v = recv(r);
  ...
  die;                 (End of Program)
}

sum(r,vec) {
  ...
  ...
  send(r,ans);
}
Behavior of Sample Program
[Figure: DAG of the sample program with nodes main, spawn sum(r,vec), sum(r,vec), send(r,ans), v=recv(r), and die]
Nodes: fork & sync. points
Edges: inter-node dependencies
DAG-structured execution
Three Kinds of Edges (Dependencies)
• Arithmetic edges
• Spawn edges
• Communication edges
[Figure: the sample program's DAG with each edge labeled by its cost]
CP Computation Algorithm: Basic Idea
• DAG not constructed
  – Each thread keeps only the longest path up to the current program point
[Figure: two paths, Path1 and Path2, meet at a recv in main; the shorter path is thrown away]
Key Questions
• How to determine edge values?
• How to compute CP without constructing DAG?
  – How to manage CP info?
  – How to keep the longest path?
Determining Edge Values
• Computing the amount of time that elapsed after leaving the previous node
[Figure: nodes X, Y, Z timestamped t1=time(), t2=time(), t3=time(); edge X→Y has value 8 and edge Y→Z has value 6]
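A minimal sketch of this timing step, assuming a clock function that can be swapped out: the class name `EdgeTimer` is hypothetical, and `time.perf_counter` stands in for whatever timer the instrumented program would really call.

```python
import time

class EdgeTimer:
    """Measures the time elapsed since the previous node was left."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()  # timestamp taken at the first node

    def edge_value(self):
        """Return the elapsed time since the last node and restart timing."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        return elapsed

# Fake clock reproducing the slide's example: the timestamps are 8 and 6 apart.
ticks = iter([100, 108, 114])
timer = EdgeTimer(clock=lambda: next(ticks))  # node X: t1 = 100
print(timer.edge_value())  # node Y: t2 - t1 = 8
print(timer.edge_value())  # node Z: t3 - t2 = 6
```

Injecting the clock keeps the example deterministic; with the default clock the same class measures real elapsed time.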
Extending CP with Arithmetic Edge
L1: X    CP=({…},{…},{…})
     | 8
L2: Y    CP=({…},{…},{…}, {L1,L2,8})
     | 6
L3: Z    CP=({…},{…},{…}, {L1,L2,8}, {L2,L3,6})
• The amount of time in nodes: NOT accounted
• CP info = a sequence of edge info
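One way to picture "a sequence of edge info" is an immutable tuple of edge records. The sketch below is an assumption about representation, not the authors' data structure; the names `Edge`, `add_arith_edge`, and `length` are hypothetical, and the path starts empty rather than with the elided earlier edges.

```python
from typing import NamedTuple, Tuple

class Edge(NamedTuple):
    start: str  # label of the node where the edge begins
    end: str    # label of the node where the edge ends
    cost: int   # elapsed time along the edge

CP = Tuple[Edge, ...]

def add_arith_edge(cp: CP, start: str, end: str, cost: int) -> CP:
    """Return a new CP extended by one arithmetic edge."""
    return cp + (Edge(start, end, cost),)

def length(cp: CP) -> int:
    """CP length is the sum of edge costs; time inside nodes is not counted."""
    return sum(e.cost for e in cp)

cp: CP = ()
cp = add_arith_edge(cp, "L1", "L2", 8)
cp = add_arith_edge(cp, "L2", "L3", 6)
print(cp[-1])      # Edge(start='L2', end='L3', cost=6)
print(length(cp))  # 14
```

Keeping every edge record, instead of a single running total, is what allows the per-subpath report to be printed at the end.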
Extending CP with Spawn Edge
X: spawn Y               CP=({…},{…},{…})
  parent continues to Z: CP=({…},{…},{…})
  child Y starts with:   CP=({…},{…},{…}, {…,…,Cspawn})
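The spawn case can be sketched as follows, under the reading that the parent keeps its CP unchanged while the spawned thread starts from the parent's CP plus one spawn edge of constant cost. The function name, the edge layout (start, end, cost), and the value of `C_SPAWN` are assumptions made for illustration.

```python
C_SPAWN = 10  # assumed constant spawn cost; 10 usec appears in the example report

def on_spawn(parent_cp, spawn_label, child_entry_label):
    """Return (parent_cp, child_cp) right after a spawn.

    The parent keeps its CP unchanged; the child starts from the parent's
    CP extended by one spawn edge of cost C_SPAWN.
    """
    child_cp = parent_cp + ((spawn_label, child_entry_label, C_SPAWN),)
    return parent_cp, child_cp

parent = (("L1", "L2", 8),)
parent_after, child = on_spawn(parent, "L2", "Y")
print(parent_after)
print(child)
```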
Extending CP with Communication Edge
send:  CPsend=({…},{…})
       Piggyback a sent value with CP: [v, CPsend]
recv:  CPsend=({…},{…}, {…,…,Ccomm})
Keeping the Longest Path (Throwing Shorter Paths Away)
send → recv: [v, CPsend]
CPsend = ({…},{…}, {…,…,Ccomm})   (sender's CP extended with the comm. edge)
CPrecv = …                         (receiver's CP up to the recv)
CP = max(CPsend, CPrecv)
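Piggybacking and the max step can be sketched together. Here a `queue.Queue` plays the channel, edges are (start, end, cost) tuples, and `C_COMM` is an assumed constant communication cost; none of these names come from the authors' implementation.

```python
import queue

C_COMM = 15  # assumed constant communication cost

def length(cp):
    return sum(cost for _, _, cost in cp)

def send(ch, value, cp_sender):
    # Piggyback the sender's CP on the value.
    ch.put((value, cp_sender))

def recv(ch, cp_receiver, send_label="send", recv_label="recv"):
    value, cp_send = ch.get()
    # Extend the sender's path with the communication edge.
    cp_send = cp_send + ((send_label, recv_label, C_COMM),)
    # Keep the longer path; the shorter one is thrown away.
    return value, max(cp_send, cp_receiver, key=length)

ch = queue.Queue()
send(ch, 8, (("a", "send", 40),))
v, cp = recv(ch, (("b", "recv", 30),))
print(v, length(cp))  # 8 55
```

Only one path survives each receive, so a thread never holds more than a single candidate CP, and no DAG has to be built.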
Instrumentation
• Source-to-source transformation
  – Independent of the implementation details
    • Ex. management of activation frames
  – Instrumentation code is inserted into
    • Sends, recvs, spawns
    • Entry/exit points of functions
Transformation Rule Example
l: v = recv(r);
is transformed into:
t = time() - et;                      /* Compute CP up to recv */
[v, cp'] = recv(r);                   /* Receive a value piggybacked with CP */
cp'' = addCommEdge(cp');              /* Extend CP with comm. edge */
if (t + length(cp) < length(cp'')) {  /* Compare the two CPs */
  cp = cp'';                          /* Use the sender's CP */
  el = l;
  et = time();
} else {
  et = time() - t;                    /* Use the receiver's CP */
}
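The rule can be sketched in Python with a per-thread state object. The names `ThreadState` and `instrumented_recv`, the edge layout, and the constant `C_COMM` are assumptions for illustration. The comparison uses the sender's CP after the communication edge has been added, and the clock is injectable so the demo is deterministic.

```python
import queue

C_COMM = 15  # assumed constant communication cost

def length(cp):
    return sum(cost for _, _, cost in cp)

class ThreadState:
    def __init__(self, clock, entry_label):
        self.clock = clock
        self.cp = ()           # longest path up to the last node
        self.el = entry_label  # label of the last node
        self.et = clock()      # time at which the last node was left

def instrumented_recv(state, ch, label):
    t = state.clock() - state.et                   # time since the last node
    value, (cp_s, send_label) = ch.get()           # value piggybacked with CP
    cp_s = cp_s + ((send_label, label, C_COMM),)   # extend with comm. edge
    if t + length(state.cp) < length(cp_s):
        # Sender's path is longer: adopt it and restart timing at this node.
        state.cp, state.el, state.et = cp_s, label, state.clock()
    else:
        # Receiver's path is longer: keep it and discount time spent in recv.
        state.et = state.clock() - t
    return value

ticks = iter([0, 20, 50])  # fake clock readings
state = ThreadState(clock=lambda: next(ticks), entry_label="main")
ch = queue.Queue()
ch.put((8, ((("sum", "send", 100),), "send")))
print(instrumented_recv(state, ch, "l"), length(state.cp), state.et)  # 8 115 50
```

In the else branch, setting `et` to `time() - t` shifts the start time forward so that time spent blocked inside the receive is not charged to the receiver's own path.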
Discussion (1/2) -- Nondeterminism --
• DAG shape varies between different runs
  – The amounts of time for each part vary (e.g., cache effects)
    [Figure: the same X-to-Y edge takes different amounts of time in different runs]
  – Comm. edges may connect different pairs
    [Figure: sends and recvs paired differently in different runs]
Discussion (2/2) -- What We Compute as CP --
• CP of a DAG created in an actual run
  – Programs may give different CPs in different runs
  – Other reasonable ways?
Experiments
• Schematic: concurrent OO language [Taura et al. 96]
• Sun Ultra Enterprise 10000
  – UltraSPARC x 64
• Apps:
  – Prime
  – Natural Language Parser
  – Raytrace
• Timer function: gethrtime()
Purpose of Experiments
• Checking that execution times get close to computed CPs
• Identifying how large instrumentation overhead is
Raytrace
[Figure: time (msec, axis 0 to 1500) vs. number of processors (0 to 60); series: Instrumented, CP, Org]
We could predict the best performance by using only one processor
Prime
[Figure: time (msec, axis 0 to 1200) vs. number of processors (0 to 60); series: Instrumented, CP, Org]
Small (< 5%) difference between the actual execution time and the predicted execution time
Information Useful for Future Tuning of Prime
• Gathering primes into a list → 95% of CP
• Dividing prime candidates by smaller primes → 5% of CP
Natural Language Parser
[Figure: time (msec, axis 0 to 1200) vs. number of processors (0 to 60); series: Instrumented, CP, Org]
Information Useful for Future Tuning of NL Parser
• Application of lexical rules → 4% of CP
• Application of production rules → 96% of CP
Instrumentation Overhead (Execution Time on One Processor)
[Figure: bar chart of normalized time (axis 0 to 15) for Prime, NL Parser, and Raytrace; series: Org, Instrumented; bar labels: 9.9, 4.7, 3.7]
Related Work (1/2)
• Cilk
  – Breakdown of CP not shown
    • CP info: not detailed enough for tuning
    • Which function should we tune???

% foo -nproc 10 20
result: 524288
Running time on 10 procs: 416.33 ms
Total work = 3.94 s
Critical path = 1.08 ms
Parallelism = 2800.92%
Related Work (2/2)
• Paradyn [Hollingsworth 98]
  – Main target is message-passing programs
  – It does not display all subpaths in CP
• Tracefile-based offline scheme (Dimemas [Pallas] etc.)
  – Tracefile contains the parameters and the timings of all communication operations
  – Required memory/storage is very large
Summary (1/2)
• Scheme for online CP computation
  – Supports synchronization via first-class data
    • Piggybacking communicated values with CP info
    • Keeping the maximum of two paths in receives
  – Takes spawn/communication cost into account
  – Shows all subpaths in CP
    • Attaching subpath info in each CP update
Summary (2/2)
• CP info we compute
  – Helps predict the MP performance
    • Small (< 10%) difference between
      – Actual execution time
      – Predicted execution time
  – Gives a useful guide to tuning
    • Prime: Tune list construction part!
    • Parser: Tune production rule application part!
Future Work
• More precise performance prediction
  – Taking thread mapping into account
• Adaptive optimization using CP info
  – Time-consuming optimizations are applied to the parts included in CP
Any Comments?