![Page 1: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/1.jpg)
![Page 2: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/2.jpg)
Recap
• Technology trends
• Cost/performance
![Page 3: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/3.jpg)
Measuring and Reporting Performance
• What does it mean to say “computer X is faster than computer Y”?
E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s.
Which is true:
1) A is 50% faster than B?
2) A is 33% faster than B?
![Page 4: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/4.jpg)
Performance
• H&P’s definition: “X is n times faster than Y” means

$$\frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

• Performance is reciprocal of time:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$
![Page 5: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/5.jpg)
Example
E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s.
Which is true:
1) A is 50% faster than B?
2) A is 33% faster than B?

$$n = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15}{10} = 1.5$$

• Answer: 1) A is 50% faster than B
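The arithmetic behind this answer can be checked with a short sketch. The function name `times_faster` and the variable names are illustrative assumptions, not something taken from the slides. The sketch also shows where the 33% figure comes from: A takes 33% less time than B, which is a different statement from "A is 33% faster".

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    return time_y / time_x


time_a = 10.0  # seconds, Machine A
time_b = 15.0  # seconds, Machine B

n = times_faster(time_a, time_b)

# "A is p% faster than B" corresponds to n = 1 + p/100.
percent_faster = (n - 1.0) * 100.0

# The other candidate figure: how much less time A needs than B.
percent_less_time = (1.0 - time_a / time_b) * 100.0

print(f"n = {n:.2f}")
print(f"A is {percent_faster:.0f}% faster than B")
print(f"A takes {percent_less_time:.0f}% less time than B")
```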
![Page 6: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/6.jpg)
Performance
• Response time?
• Throughput?
![Page 7: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/7.jpg)
Measuring Performance
• Focus on execution time of real programs
• Measuring execution time?
– Wall clock time (elapsed time)
– CPU time (excludes I/O and other processes)
o User CPU time
o System CPU time

iota:~$ time gcc -g tmpcnv.s -o tmpcnv
real 0m3.352s
user 0m0.367s
sys 0m0.468s
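The same three quantities can be collected from inside a program. The following is a minimal sketch using Python's standard `time.perf_counter()` for wall clock time and `os.times()` for user and system CPU time. The helper `measure` and the workload `busy` are hypothetical names chosen for illustration.

```python
import os
import time


def measure(fn):
    """Run fn() and return (wall, user, system) seconds it consumed."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


def busy():
    """CPU-bound toy workload: almost all of its time is user CPU time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


wall, user, system = measure(busy)
print(f"real {wall:.3f}s  user {user:.3f}s  sys {system:.3f}s")

# Sleeping uses wall clock time but almost no CPU time.
sleep_wall, sleep_user, sleep_system = measure(lambda: time.sleep(0.05))
print(f"real {sleep_wall:.3f}s  user {sleep_user:.3f}s  sys {sleep_system:.3f}s")
```

The second measurement illustrates why wall clock time and CPU time can differ widely, as in the `gcc` run above where most of the elapsed time was spent waiting rather than computing.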
![Page 8: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/8.jpg)
Choosing Programs to Measure Performance
• Real Programs
– Compilers, text-processing, CAD tools, etc.
• Modified applications
– Scripted or modified for portability
• Kernels
– Attempt to extract key sections from real programs (Livermore loops, Linpack)
• Toy Benchmarks
– Short examples (e.g. Sieve of Eratosthenes)
• Synthetic Benchmarks
– Whetstone, Dhrystone
![Page 9: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/9.jpg)
Benchmarking
• H&P: car magazines are more scientific about reporting performance than many CS journals!
![Page 10: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/10.jpg)
Benchmark Suites
• Collections of benchmarks
– E.g. SPEC CPU2000 (INT and FP)
o 25 real FORTRAN/C/C++ programs, modified for portability
– Specific graphics benchmarks
![Page 11: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/11.jpg)
Server Benchmarks
• SPEC also has server benchmarks
– File server
– Web server
• TPC: Transaction Processing Performance Council
– Various transaction processing benchmarks
![Page 12: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/12.jpg)
Embedded Benchmarks
• Much less well developed
– Tend to use Dhrystone!
• EEMBC
– Recent development
– 34 benchmarks (mainly kernels) in five application areas
![Page 13: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/13.jpg)
Summarising Performance Measurements
• Complex area
– Weighted arithmetic mean
– Geometric mean
– Normalised results
– …
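A small sketch can show why summarising is a complex area. The execution times, the equal weights, and all function names below are made-up assumptions for illustration only. The code computes a weighted arithmetic mean of raw times and a geometric mean of times normalised to a reference machine.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Weighted mean of execution times; weights are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(values):
    """n-th root of the product of the values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times (seconds) for two programs on three machines.
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}
weights = [0.5, 0.5]  # assumed workload mix


def normalised(machine, reference):
    """Execution times of `machine` divided by those of `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


for name in times:
    wam = weighted_arithmetic_mean(times[name], weights)
    gm_vs_a = geometric_mean(normalised(name, "A"))
    gm_vs_b = geometric_mean(normalised(name, "B"))
    print(f"{name}: weighted mean {wam:.1f}s, "
          f"GM vs A {gm_vs_a:.3f}, GM vs B {gm_vs_b:.3f}")
```

With this invented data the geometric mean gives the same relative result whichever machine is used as the reference, but it rates A and B as equal even though their weighted arithmetic means differ by about a factor of nine. The two summaries answer different questions.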
![Page 14: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/14.jpg)
1.6 Quantitative Principles
• Make the common case fast!
– E.g. addition: focus on “normal” addition, not overflow situations
• Amdahl’s Law
– Quantifies improvements gained by focussing on one aspect of a design
![Page 15: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/15.jpg)
Amdahl’s Law
$$\text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}}$$

where
$F_E$ = Fraction enhanced
$S_E$ = Speedup of enhanced section
![Page 16: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/16.jpg)
Example
• We are considering an enhancement that is 10 times faster than the original, but is only used 40% of the time.

$$F_E = 0.4 \qquad S_E = 10$$

$$\text{Speedup} = \frac{1}{(1 - F_E) + \dfrac{F_E}{S_E}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} \approx 1.56$$
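Amdahl's Law is easy to experiment with in code. The sketch below assumes the formula Speedup = 1 / ((1 − F_E) + F_E / S_E); the function and variable names are illustrative, not from the slides. It also prints the upper bound reached when the enhanced section becomes infinitely fast.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original execution time that benefits (0..1).
    speedup_enhanced:  speedup of that share alone (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The example from the slide: 10 times faster, used 40% of the time.
overall = amdahl_speedup(0.4, 10.0)
print(f"overall speedup: {overall:.4f}")

# Upper bound: even an infinitely fast enhancement leaves the other 60% untouched.
upper_bound = 1.0 / (1.0 - 0.4)
print(f"upper bound:     {upper_bound:.4f}")
```

The gap between a 10 times local speedup and an overall speedup of about 1.56 is the point of "make the common case fast": the unenhanced fraction dominates.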
![Page 17: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/17.jpg)
CPU Performance
• CPU time related to clock speed:
– Period (e.g. 1ns)
– Rate (e.g. 1GHz)
• Also interested in Cycles Per Instruction (CPI)
![Page 18: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/18.jpg)
Three Equal Factors
• Clock rate (technology)
• CPI (architecture)
• Instruction count (architecture and compiler)
$$\text{CPU Time} = IC \times CPI \times \text{Clock cycle time} = \frac{IC \times CPI}{\text{Clock rate}}$$
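A short numeric sketch of the CPU time equation follows, assuming CPU time = IC × CPI / clock rate. The instruction count, CPI values, and clock rates are hypothetical figures chosen for illustration, as are the function and variable names.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 2 billion instructions, CPI of 1.5, 1 GHz clock.
base = cpu_time(2e9, 1.5, 1e9)
print(f"baseline:         {base:.2f}s")

# Each factor carries equal weight: halving any one of them halves CPU time.
faster_clock = cpu_time(2e9, 1.5, 2e9)   # double the clock rate
lower_cpi = cpu_time(2e9, 0.75, 1e9)     # halve the CPI
fewer_instr = cpu_time(1e9, 1.5, 1e9)    # halve the instruction count

print(f"2x clock rate:    {faster_clock:.2f}s")
print(f"half the CPI:     {lower_cpi:.2f}s")
print(f"half the IC:      {fewer_instr:.2f}s")
```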
![Page 19: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/19.jpg)
Measuring IC & CPI
• Many modern processors include hardware counters for instructions and clock cycles
• Simulations can give even more detail
– Time consuming, but can be very accurate
![Page 20: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/20.jpg)
Another Principle: Locality
• Locality of Reference
– “90/10 Rule”: a program spends about 90% of its execution time in 10% of its code
• Also applies to data
• Two aspects:
– Temporal locality
– Spatial locality
![Page 21: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/21.jpg)
Taking Advantage of Parallelism
• Key principle for improving performance
• Examples:
– System level: parallel processing, disk arrays, etc.
– Processor level: pipelining
– Digital design: caches, ALU adders, etc.
![Page 22: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/22.jpg)
1.7 Putting It All Together: Performance & Price/Performance
• Measure performance and performance/cost for three categories
– Desktop (SPEC INT and FP)
– TP Servers (TPC-C)
– Embedded Processors (EEMBC)
![Page 23: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/23.jpg)
Desktop
• Integer:
– Performance/cost tracks performance
• FP:
– Not as closely related
– Pentium 4 much better than Pentium III
• AMD Athlon very good value for money
![Page 24: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/24.jpg)
Servers
• Twelve systems
– Six top performers
– Six best price-performance
• Multiprocessors
– From 3 P3s to 280 P3s
• Cost:
– From $131,000 to $15 million
![Page 25: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/25.jpg)
Embedded Processors
• Difficult to assess
– Benchmarks very new
– Designs very application-specific
– Power a major constraint
– Cost difficult to quantify (are support chips required?)
![Page 26: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/26.jpg)
Embedded Processors
• Range:
– 500MHz AMD K6 ($78) and IBM PowerPC ($94) used for network switches, etc.
– 167MHz NEC VR 5432 ($25) popular in colour laser printers
– 180MHz NEC VR 4122 ($33) popular in PDAs (low power)
![Page 27: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/27.jpg)
1.8 Another View: Power Consumption and Efficiency
• Embedded processors from previous example: power ranged from 700mW to 9600mW
• Fig. 1.27: Performance/watt
– NEC VR 4122 huge leader
![Page 28: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/28.jpg)
1.9 Fallacies and Pitfalls
• Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark
– Factors such as pipeline structure and memory system have major impact
– E.g. Pentium III vs. Pentium 4 (Fig. 1.28)
![Page 29: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/29.jpg)
1.7GHz P4 –vs– 1.0GHz P3
![Page 30: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/30.jpg)
Fallacies and Pitfalls
• Fallacy: Benchmarks remain valid indefinitely
– Optimisations change
– Perhaps deliberately!
– Even real programs are affected by changes in technology
– E.g. gcc: increasing percentage is “system time”
– SPEC has adapted considerably
![Page 31: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/31.jpg)
Fallacies and Pitfalls
• Pitfall: Comparing hand-coded assembly and compiled high-level language performance
– E.g. embedded processor benchmarks
– Hand-coded is 5 to 87 times faster!
![Page 32: Recap](https://reader035.vdocuments.net/reader035/viewer/2022062721/5681377c550346895d9f164c/html5/thumbnails/32.jpg)