![Page 1: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/1.jpg)
Understanding Performance, Power and Energy Behavior in Asymmetric Processors
Nagesh B Lakshminarayana
Hyesoon Kim
School of Computer Science
Georgia Institute of Technology
![Page 2: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/2.jpg)
2
Outline
• Background and Motivation
• Thread Interactions
• Dynamic Scheduling
• Asymmetry Aware Scheduling
• Conclusion and Future Work
![Page 3: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/3.jpg)
3
Heterogeneous Architectures
• A particularly interesting class of parallel machines is heterogeneous architectures
– Multiple types of processing elements (PEs) available on the same machine
[Figure: one PE of type A and four PEs of type B connected by an interconnect]
![Page 4: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/4.jpg)
4
Heterogeneous Architectures
• Heterogeneous architectures are becoming very common
[Figure labels: IBM Cell processor; special accelerator; fast cores; slow cores; asymmetric processors (focus of this talk)]
![Page 5: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/5.jpg)
5
Machine configurations
| Configuration | Description |
|---|---|
| All-slow (SMP) | All processors running at their lowest frequency |
| Half-half (AMP) | Half of the processors running at their highest frequency, the rest running at a lower frequency |
| All-fast (SMP) | All processors running at their highest frequency |

• M-I experiments use 8 threads; M-II experiments use 16 threads
• AMPs are emulated using SpeedStep/PowerNow

| Machine | Processors | Cache | RAM | Disk | OS |
|---|---|---|---|---|---|
| Machine-I | 2-socket 1.87 GHz quad-core Intel Xeon | 4 MB L2 | 8 GB | 40 GB HDD | RHEL 5 |
| Machine-II | 4-socket 2 GHz quad-core AMD Opteron 8350 | 2 MB L3 | 32 GB | 1 TB HDD | RHEL 4 |
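The talk does not say how the per-core frequencies were set. As a rough sketch only, one way to emulate a half-half configuration on Linux is the cpufreq `userspace` governor, which exposes `scaling_governor` and `scaling_setspeed` (in kHz) per CPU under sysfs. The function names, the choice of which CPUs are fast, and the 1 GHz / 2 GHz values (taken from the Machine-II frequency range shown later in the talk) are assumptions for illustration. Whether cores can actually run at different frequencies depends on the hardware and driver.

```python
from pathlib import Path, PurePosixPath

# Standard Linux sysfs location for per-CPU cpufreq controls.
CPU_ROOT = PurePosixPath("/sys/devices/system/cpu")


def half_half_plan(num_cpus, low_khz, high_khz):
    """Return {sysfs path: value} for a half-fast, half-slow configuration.

    Assumption: the first half of the CPU ids are the fast cores.
    The governor entry precedes the speed entry for each CPU, because
    scaling_setspeed is only writable under the userspace governor.
    """
    plan = {}
    for cpu in range(num_cpus):
        base = CPU_ROOT / f"cpu{cpu}" / "cpufreq"
        khz = high_khz if cpu < num_cpus // 2 else low_khz
        plan[str(base / "scaling_governor")] = "userspace"
        plan[str(base / "scaling_setspeed")] = str(khz)
    return plan


def apply_plan(plan):
    """Write the plan to sysfs (requires root; not called in this sketch)."""
    for path, value in plan.items():
        Path(path).write_text(value)


# Hypothetical 16-core half-half setup: 8 cores at 2 GHz, 8 cores at 1 GHz.
plan = half_half_plan(16, low_khz=1_000_000, high_khz=2_000_000)
for path, value in plan.items():
    print(path, "<-", value)
```

Building the plan separately from applying it lets the configuration be inspected without root access.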
![Page 6: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/6.jpg)
6
Power Measurement
• Using Extech 380801 Power Analyzer
• Total system power consumption
[Figure: measurement setup showing the experiment machine, the power socket, a power cable, a serial cable, and a Windows machine]
![Page 7: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/7.jpg)
7
PARSEC Benchmark Suite
• Desktop-oriented multithreaded benchmark suite
– Multithreaded
– Animation, Data Mining, Financial Analysis
– Pthreads, OpenMP
![Page 8: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/8.jpg)
8
Performance of PARSEC benchmarks
• On average, performance of half-half is between that of all-slow and all-fast
[Figure: Execution time (sec), 0 to 350, of PARSEC benchmarks under all-fast, half-half, and all-slow; benchmarks are grouped as slow-limited, middle-perf, and unstable]
![Page 9: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/9.jpg)
9
Classification of Benchmarks
[Figure: thread execution relative to a barrier for three classes: (a) slow-limited, (b) middle-perf, (c) unstable]
![Page 10: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/10.jpg)
10
Energy Consumption of PARSEC
• In half-half/all-slow, total energy consumption is higher even though average power consumed might be lower
[Figure: Energy consumption (kJ), 0 to 200, of PARSEC benchmarks under all-fast, half-half, and all-slow; benchmarks are grouped as slow-limited and middle-perf]
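Energy is average power multiplied by execution time, so a slower configuration can draw less power and still consume more energy. The sketch below illustrates this with two normalized (time, energy) pairs reported later in this talk for a parallel-for application on 16 cores. Those numbers are not PARSEC results, and the helper name `normalized_power` is introduced here only for illustration.

```python
# Normalized (execution time, energy) pairs from the talk's parallel-for
# table; both columns are relative to 16 cores at 1 GHz.
MEASURED = {
    "16 @ 1 GHz (SMP)": (1.00, 1.00),
    "16 @ 2 GHz (SMP)": (0.50, 0.61),
}


def normalized_power(norm_time, norm_energy):
    """Average power relative to the baseline, from E = P * t."""
    return norm_energy / norm_time


for name, (t, e) in MEASURED.items():
    p = normalized_power(t, e)
    print(f"{name}: time={t:.2f} energy={e:.2f} avg power={p:.2f}")
```

In this data the all-fast configuration draws roughly 22% more average power than the all-slow one, yet finishes in half the time and uses about 39% less energy.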
![Page 11: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/11.jpg)
11
Behavior of PARSEC Benchmarks
• Observations
– Different applications behave differently on AMPs
– Usually, an SMP with fast processors saves energy
![Page 12: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/12.jpg)
12
Why do different applications behave differently on AMPs?
![Page 13: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/13.jpg)
13
Outline
• Background and Motivation
• Thread Interactions
• Dynamic Scheduling
• Asymmetry Aware Scheduling
• Conclusion and Future Work
![Page 14: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/14.jpg)
14
Thread Interactions
Sources of thread interactions
• Critical Sections
• Barriers
![Page 15: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/15.jpg)
15
Critical Sections (CS)
• Waiting to enter CSs
[Figure: thread timelines for Case (a) and Case (b); legend: critical section, useful work, waiting]
![Page 16: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/16.jpg)
16
Barriers
• Waiting for other threads to finish
[Figure: thread timelines between two barriers]
![Page 17: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/17.jpg)
17
Effect of Critical Section length
• CS limited application
• As critical section length increases, the average power consumed decreases
[Figure: Normalized power consumption (0.8 to 1) versus CS length (10%, 15%, 20%, 50%, 75% CS) for 16 cores at 1 GHz, 1.2 GHz, 1.4 GHz, 1.7 GHz, and 2 GHz]
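One plausible reading of this trend is that longer critical sections leave more cores waiting, and waiting cores draw less power if they idle rather than spin (an assumption; the talk does not state the waiting mechanism). The sketch below is a simple bound, not the authors' model. It assumes a single lock, so critical sections are fully serialized, and it assumes every thread has the same amount of work. The function name and parameters are hypothetical.

```python
def cs_limited_bounds(speeds_ghz, work_per_thread, cs_fraction):
    """Return (lower bound on run time, upper bound on average busy fraction).

    Assumptions: one global lock, identical work per thread, and
    cs_fraction of each thread's work executed inside the lock.
    """
    per_thread = [work_per_thread / s for s in speeds_ghz]
    # All critical sections execute one at a time, so their times add up.
    serialized_cs = sum(cs_fraction * t for t in per_thread)
    # The run cannot finish before the slowest thread's own work is done,
    # nor before every critical section has been executed.
    time_lower_bound = max(max(per_thread), serialized_cs)
    # Busy time is fixed by the work, so a longer run means more waiting.
    busy_upper_bound = sum(per_thread) / (len(speeds_ghz) * time_lower_bound)
    return time_lower_bound, busy_upper_bound


all_fast = [2.0] * 16
for cs in (0.10, 0.15, 0.20, 0.50, 0.75):
    t, busy = cs_limited_bounds(all_fast, work_per_thread=1.0, cs_fraction=cs)
    print(f"CS={cs:.0%}: time >= {t:.2f}, busy fraction <= {busy:.3f}")
```

Under these assumptions the busy fraction can only fall as the CS share grows, which is at least consistent with the lower average power on the slide.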
![Page 18: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/18.jpg)
18
Effect of Critical Section length
• CS limited application
[Figure: Normalized execution time (0 to 1) versus CS length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz, and 16 @ 2 GHz]
![Page 19: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/19.jpg)
19
Effect of Critical Section length
• CS limited application
• Performance of AMPs is sensitive to CS length
[Figure: Normalized execution time (0 to 1) versus CS length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz, and 16 @ 2 GHz, and the AMP configurations 8 @ 1 GHz, 8 @ 1.2 GHz, 8 @ 1.4 GHz, or 8 @ 1.7 GHz, each combined with 8 @ 2 GHz]
![Page 20: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/20.jpg)
20
Effect of Critical Section length
• CS limited application
• Energy consumption shows the same trend
[Figure: Normalized energy consumption (0 to 1) versus CS length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz, and 16 @ 2 GHz, and the AMP configurations 8 @ 1 GHz, 8 @ 1.2 GHz, 8 @ 1.4 GHz, or 8 @ 1.7 GHz, each combined with 8 @ 2 GHz]
![Page 21: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/21.jpg)
21
Effect of Critical Section frequency
• Both length and frequency of CS affect performance and energy consumption
• As CS frequency increases, the performance difference between half-half and all-fast shrinks
• If the majority of the execution time is spent waiting for locks, it is OK to have a few slow processors
• Results available in the paper
![Page 22: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/22.jpg)
22
Effect of Barriers
• With few barriers, half-half performs similarly to all-slow
• With a large number of barriers, half-half performs similarly to all-fast
• Results available in the paper
![Page 23: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/23.jpg)
23
Outline
• Background and Motivation
• Thread Interactions
• Dynamic Scheduling
• Asymmetry Aware Scheduling
• Conclusion and Future Work
![Page 24: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/24.jpg)
24
Dynamic Scheduling
• Motivation: better run-time adaptivity
• Each thread requests more work after completing its assigned work
• OpenMP, Intel Threading Building Blocks
![Page 25: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/25.jpg)
25
Dynamic Scheduling
• Parallel-for application
• Can help improve performance and reduce energy consumption in AMPs
• Should be preferred to static and guided policies

| Machine configuration | Normalized Execution Time (Static/Dynamic) | Normalized Energy Consumption (Static/Dynamic) |
|---|---|---|
| 16 @ 1 GHz (SMP) | 1.0 | 1.0 |
| 16 @ 1.2 GHz (SMP) | 0.83 | 0.87 |
| 16 @ 1.4 GHz (SMP) | 0.71 | 0.78 |
| 16 @ 1.7 GHz (SMP) | 0.59 | 0.68 |
| 16 @ 2 GHz (SMP) | 0.50 | 0.61 |
| 8 @ 1 GHz, 8 @ 2 GHz (AMP) | 1.00/0.67 | 1.05/0.73 |
| 8 @ 1.2 GHz, 8 @ 2 GHz (AMP) | 0.83/0.63 | 0.90/0.70 |
| 8 @ 1.4 GHz, 8 @ 2 GHz (AMP) | 0.71/0.59 | 0.80/0.67 |
| 8 @ 1.7 GHz, 8 @ 2 GHz (AMP) | 0.59/0.54 | 0.69/0.63 |
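The execution-time column can be approximated with a very simple model, sketched here under stated assumptions rather than taken from the talk. Static scheduling gives every core an equal share, so the slowest core sets the finish time. Idealized dynamic scheduling hands out arbitrarily small chunks with no overhead, so work completes at the combined rate of all cores. Work is assumed to scale perfectly with clock frequency, and the function names are hypothetical.

```python
def static_time(speeds_ghz, total_work=1.0):
    """Equal-sized share per core: the slowest core finishes last."""
    share = total_work / len(speeds_ghz)
    return share / min(speeds_ghz)


def dynamic_time(speeds_ghz, total_work=1.0):
    """Idealized dynamic scheduling: cores drain a shared pool of tiny
    chunks, so the aggregate speed determines the finish time."""
    return total_work / sum(speeds_ghz)


# Normalize to the all-slow baseline used in the table (16 cores at 1 GHz).
BASELINE = static_time([1.0] * 16)


def normalized(t):
    return t / BASELINE


for slow in (1.0, 1.2, 1.4, 1.7):
    amp = [slow] * 8 + [2.0] * 8
    s = normalized(static_time(amp))
    d = normalized(dynamic_time(amp))
    print(f"8 @ {slow} GHz, 8 @ 2 GHz: static={s:.3f} dynamic={d:.3f}")
```

For the four AMP rows this model stays within 0.01 of the reported static and dynamic execution times. It says nothing about energy, which also depends on the power drawn by idle cores.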
![Page 26: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/26.jpg)
26
Outline
• Background and Motivation
• Thread Interactions
• Dynamic Scheduling
• Asymmetry Aware Scheduling
• Conclusion and Future Work
![Page 27: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/27.jpg)
27
Scheduling in AMPs
• Longest Job to a Fast Processor First (LJFPF) [Lakshminarayana’08]
[Figure: jobs of different lengths ending at a barrier, assigned to two fast cores and two slow cores]
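The policy's name suggests a greedy rule: take jobs longest first and give each to a fast core when one is free. The sketch below is one interpretation of that idea, not the implementation from [Lakshminarayana'08]. It assumes job lengths are known in advance, that run time is length divided by core speed, and that ties between free cores go to the fastest one. The `schedule` function and its parameters are hypothetical.

```python
import heapq


def schedule(job_lengths, core_speeds, longest_first=True):
    """Greedy list scheduling; returns (makespan, [(length, core, finish)]).

    With longest_first=True the jobs are sorted longest first, so the
    longest jobs land on the fastest cores while all cores are still idle.
    """
    jobs = sorted(job_lengths, reverse=True) if longest_first else list(job_lengths)
    # Heap entries: (time the core becomes free, -speed, core id).
    # Negating the speed makes the fastest core win ties on free time.
    free_cores = [(0.0, -speed, core) for core, speed in enumerate(core_speeds)]
    heapq.heapify(free_cores)
    assignment = []
    makespan = 0.0
    for length in jobs:
        free_at, neg_speed, core = heapq.heappop(free_cores)
        speed = -neg_speed
        finish = free_at + length / speed
        assignment.append((length, core, finish))
        makespan = max(makespan, finish)
        heapq.heappush(free_cores, (finish, neg_speed, core))
    return makespan, assignment


jobs = [2, 2, 8, 8]            # hypothetical job lengths
cores = [2.0, 2.0, 1.0, 1.0]   # two fast cores, two slow cores
print("longest first:", schedule(jobs, cores)[0])
print("arrival order:", schedule(jobs, cores, longest_first=False)[0])
```

In this toy case the arrival-order assignment leaves the two long jobs on slow cores, so every thread waits for them at the barrier. Sending the long jobs to the fast cores halves the time to reach it.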
![Page 28: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/28.jpg)
28
How Does the Scheduler Know
• Length of work?
• Current mechanism: application sends task length information
• Ongoing work: prediction mechanism
![Page 29: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/29.jpg)
29
LJFPF
• ITK: medical image processing applications (open source)
• MultiRegistration (registration method)
– Kernel with 50 iterations
– 50 iterations divided among 8 threads
[Figure: Normalized Execution Time and Normalized Energy Consumption]
![Page 30: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/30.jpg)
30
Outline
• Background and Motivation
• Thread Interactions
• Dynamic Scheduling
• Asymmetry Aware Scheduling
• Conclusion and Future Work
![Page 31: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/31.jpg)
31
Conclusion & Future Work
Conclusion
• Evaluated the performance/energy consumption behavior of multithreaded applications in AMPs
• For symmetric workloads
– With little thread interaction: SMP with fast processors
– With a lot of thread interaction: AMP could be better
• For asymmetric threads
– AMP could provide the lowest energy consumption
Future Work
• Predict application characteristics and use the predicted information for thread scheduling on AMPs
![Page 32: Understanding Performance, Power and Energy Behavior in Asymmetric Processors](https://reader035.vdocuments.net/reader035/viewer/2022062423/568143f3550346895db07b10/html5/thumbnails/32.jpg)
32
Thank you!