Lecture 2c: Benchmarks

Upload: zaynah

Post on 22-Jan-2016

Page 1: Lecture 2c: Benchmarks

Lecture 2c: Benchmarks

Page 2: Lecture 2c: Benchmarks

Benchmarking

A benchmark is a program that is run on a computer to measure its performance and compare it with that of other machines.

The best benchmark is the users’ workload: the mixture of programs and operating system commands that users run on a machine.

• Not practical

• Standard benchmarks are used instead

Page 3: Lecture 2c: Benchmarks

Benchmarking

Types of Benchmarks

Synthetic benchmarks

Toy benchmarks

Microbenchmarks

Program Kernels

Real Applications

Page 4: Lecture 2c: Benchmarks

Benchmarking

Synthetic benchmarks

Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs

• Whetstone benchmark

• Dhrystone benchmark

• Rhealstone benchmark

Page 5: Lecture 2c: Benchmarks

Benchmarking

Synthetic benchmarks

• Whetstone benchmark

• First written in Algol 60 in 1972; Fortran, C/C++, and Java versions are available today

• Represents the workload of numerical applications

• Measures floating point arithmetic performance

• Unit is Millions of Whetstone instructions per second (MWIPS)

• Shortcomings:

• Does not represent constructs in modern languages, such as pointers, etc.

• Does not consider cache effects

Page 6: Lecture 2c: Benchmarks

Benchmarking

Synthetic benchmarks

• Dhrystone benchmark

• First written in Ada in 1984; today a C version is available

• Represents a workload derived from statistics collected on system software, such as operating systems, compilers, editors, and a few numerical programs

• Measures integer and string performance, no floating-point operations

• Unit is the number of program iteration completions per second

• Shortcomings:

• Does not represent real-life programs

• Compiler optimization overstates system performance

• Small code that may fit in the instruction cache

Page 7: Lecture 2c: Benchmarks

Benchmarking

Synthetic benchmarks

• Rhealstone benchmark

• For multi-tasking real-time systems

• Factors are:

• Task switching time

• Pre-emption time

• Interrupt latency time

• Semaphore shuffling time

• Dead-lock breaking time

• Datagram throughput time

• Metric is Rhealstones per second

$\sum_{i=1}^{6} w_i \cdot (1/t_i)$
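The weighted sum above can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the function name, the array layout (one entry per factor, in the order listed on the slide) and the choice of weights are assumptions, and the times are taken to be in seconds.

```c
#include <assert.h>
#include <stddef.h>

#define RHEALSTONE_FACTORS 6  /* the six factors listed on the slide */

/* Hypothetical helper: returns sum over i of w[i] * (1 / t[i]).
 * t[i] is the measured time of factor i in seconds (must be > 0),
 * w[i] is the weight the caller assigns to that factor. */
double rhealstones_per_second(const double t[RHEALSTONE_FACTORS],
                              const double w[RHEALSTONE_FACTORS])
{
    double sum = 0.0;
    for (size_t i = 0; i < RHEALSTONE_FACTORS; i++) {
        assert(t[i] > 0.0);          /* a zero time would divide by zero */
        sum += w[i] * (1.0 / t[i]);
    }
    return sum;
}
```

Because each term is a reciprocal of a time, a shorter time for any factor raises the figure, and a larger weight makes that factor count for more.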

Page 8: Lecture 2c: Benchmarks

Benchmarking

Toy benchmarks

10-100 lines of code whose result is known before the program is run

• Quick sort

• Sieve of Eratosthenes: finds prime numbers

http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif

func sieve( var N )
    var PrimeArray as array of size N
    initialize PrimeArray to all true
    for i from 2 to N
        for each j from i + 1 to N, where i divides j
            set PrimeArray( j ) = false
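A runnable C version of the sieve might look like the sketch below. It is not a literal transcription of the pseudocode: it uses the common refinement of skipping values of i already marked composite and starting the inner loop at i*i, which yields the same set of primes. The names `sieve` and `count_primes` and the choice to return a heap-allocated array are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns a heap-allocated array is_prime[0..n] (the caller frees it),
 * or NULL if the allocation fails. is_prime[k] is true iff k is prime. */
bool *sieve(size_t n)
{
    assert(n >= 2);
    bool *is_prime = malloc((n + 1) * sizeof *is_prime);
    if (is_prime == NULL)
        return NULL;

    is_prime[0] = is_prime[1] = false;
    for (size_t i = 2; i <= n; i++)
        is_prime[i] = true;

    /* Cross out multiples of each prime i, starting at i*i: smaller
     * multiples were already crossed out by smaller primes. */
    for (size_t i = 2; i * i <= n; i++) {
        if (!is_prime[i])
            continue;
        for (size_t j = i * i; j <= n; j += i)
            is_prime[j] = false;
    }
    return is_prime;
}

/* Counts the primes in [0, n] given the array produced by sieve(n). */
size_t count_primes(const bool *is_prime, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i <= n; i++)
        if (is_prime[i])
            count++;
    return count;
}
```

As a toy benchmark, the known result (for example, 10 primes up to 30) doubles as a correctness check on the run being timed.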

Page 9: Lecture 2c: Benchmarks

Benchmarking

Microbenchmarks

Small, specially designed programs used to test a specific function of a system (e.g., floating-point execution, the I/O subsystem, or the processor-memory interface)

• Provide values for important parameters of a system

• Characterize the maximum performance if the overall performance is limited by that single component
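As an illustration of the idea, the sketch below times a loop of dependent floating-point multiply-adds with the standard `clock()` function. The function name, the loop body and the use of `volatile` are choices made for this example only; a serious microbenchmark would also need warm-up runs, repetition and a higher-resolution timer.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical microbenchmark: runs `iters` multiply-add steps and returns
 * the CPU time used, in seconds, as reported by clock().
 * The final accumulator value is written to *sink. */
double time_fp_loop(long iters, double *sink)
{
    assert(iters > 0 && sink != NULL);

    /* volatile forces every update through memory so the compiler cannot
     * delete the loop; this adds load/store cost to what is measured. */
    volatile double acc = 1.0;

    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        acc = acc * 0.5 + 1.0;   /* one multiply and one add; converges to 2 */
    clock_t end = clock();

    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Dividing `2.0 * iters` by the returned time gives a rough floating-point rate for this one loop, provided the measured time is not zero. It says nothing about other components of the system.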

Page 10: Lecture 2c: Benchmarks

Benchmarking

Kernels

Key pieces of code from real applications.

• LINPACK and BLAS

• Livermore Loops

• NAS

Page 11: Lecture 2c: Benchmarks

Benchmarking

Kernels

• LINPACK and BLAS Libraries

• LINPACK – linear algebra package

• Measures floating-point computing power

• Solves a system of linear equations Ax = b with Gaussian elimination

• Metric is MFLOP/s

• DAXPY is the most time-consuming routine

• Used as the measure for the TOP500 list

• BLAS – Basic Linear Algebra Subprograms

• LINPACK makes use of the BLAS library
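To make the LINPACK idea concrete, here is a small sketch of Gaussian elimination with partial pivoting, together with the nominal operation count 2n³/3 + 2n² that is conventionally used to turn a measured solve time into a MFLOP/s figure. This is not the LINPACK code itself: the function names, the row-major layout and the in-place interface are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Nominal floating-point operation count for solving an n-by-n system. */
double linpack_flops(size_t n)
{
    double d = (double)n;
    return 2.0 * d * d * d / 3.0 + 2.0 * d * d;
}

/* MFLOP/s from an operation count and a measured time in seconds. */
double mflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / (seconds * 1.0e6);
}

/* Solves A x = b in place. a is n-by-n, row-major, and is overwritten;
 * on return b holds x. Returns 0 on success, -1 if a pivot is zero. */
int gauss_solve(size_t n, double *a, double *b)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 0; k < n; k++) {
        /* Partial pivoting: pick the largest entry in column k. */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (absd(a[i * n + k]) > absd(a[p * n + k]))
                p = i;
        if (a[p * n + k] == 0.0)
            return -1;
        if (p != k) {
            for (size_t j = 0; j < n; j++) {
                double tmp = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = tmp;
            }
            double tmp = b[k];
            b[k] = b[p];
            b[p] = tmp;
        }
        /* Eliminate column k from the rows below; each row update is an
         * AXPY-style operation (row_i = row_i - m * row_k). */
        for (size_t i = k + 1; i < n; i++) {
            double m = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; j++)
                a[i * n + j] -= m * a[k * n + j];
            b[i] -= m * b[k];
        }
    }
    /* Back substitution on the upper-triangular system. */
    for (size_t i = n; i-- > 0;) {
        double s = b[i];
        for (size_t j = i + 1; j < n; j++)
            s -= a[i * n + j] * b[j];
        b[i] = s / a[i * n + i];
    }
    return 0;
}
```

Timing one call to `gauss_solve` and passing `linpack_flops(n)` with that time to `mflops` gives a rate in the same units as the metric named above. The inner row update is where the AXPY pattern appears.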

Page 12: Lecture 2c: Benchmarks

Benchmarking

Kernels

• LINPACK and BLAS Libraries

• SAXPY – Scalar Alpha X Plus Y

• Y = αX + Y, where X and Y are vectors and α is a scalar

• SAXPY for single and DAXPY for double precision

• Generic implementation:

for (int i = m; i < n; i++) {
    y[i] = a * x[i] + y[i];
}
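Wrapped as complete C functions, the same loop could be written as below. This is a simplified sketch: the actual BLAS routines also take stride arguments (`incx`, `incy`), which are omitted here, and the signatures shown are assumptions for this example, not the BLAS interface.

```c
#include <assert.h>
#include <stddef.h>

/* Double precision: y <- a*x + y over n elements (unit stride only). */
void daxpy(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Single precision variant of the same operation. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Each element costs one multiply and one add, so a call on n elements performs 2n floating-point operations, which is the count to use when converting a measured time into MFLOP/s.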

Page 13: Lecture 2c: Benchmarks

Benchmarking

Kernels

• Livermore Loops

• Developed at LLNL

• Originally in Fortran, now also in C

• 24 numerical application kernels, such as:

• hydrodynamics fragment,

• incomplete Cholesky conjugate gradient,

• inner product,

• banded linear systems solution, tridiagonal linear systems solution,

• general linear recurrence equations,

• first sum, first difference,

• 2-D particle in a cell, 1-D particle in a cell,

• Monte Carlo search,

• location of a first array minimum, etc.

• Metrics are the arithmetic, geometric, and harmonic means of the kernels' CPU rates
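For reference, the three means can be written as follows for n kernel rates r_1, ..., r_n. The symbols r_i and n and the sample rates in the comment are introduced here for illustration only and are not taken from the benchmark's documentation.

```latex
\[
\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad
\mathrm{GM} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n}, \qquad
\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} 1/r_i}
\]
% Illustrative rates r = (1, 2, 4):
%   AM = 7/3 \approx 2.33
%   GM = \sqrt[3]{1 \cdot 2 \cdot 4} = 2
%   HM = 3 / (1 + 1/2 + 1/4) = 3/1.75 \approx 1.71
```

For positive rates the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the three figures bracket how strongly the slowest kernels weigh on the summary.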

Page 14: Lecture 2c: Benchmarks

Benchmarking

Kernels

• NAS Parallel Benchmarks

• Developed at NASA Advanced Supercomputing division

• Paper-and-pencil benchmarks

• 11 benchmarks, such as:

• Discrete Poisson equation,

• Conjugate gradient

• Fast Fourier Transform

• Bucket sort

• Embarrassingly parallel

• Nonlinear PDE solution

• Data traffic, etc.

Page 15: Lecture 2c: Benchmarks

Benchmarking

Real Applications

Programs that are run by many users

• C compiler

• Text processing software

• Frequently used user applications

• Modified scripts used to measure particular aspects of system performance, such as interactive behavior, multiuser behavior

Page 16: Lecture 2c: Benchmarks

Benchmarking

Benchmark Suites

Desktop Benchmarks

• SPEC benchmark suite

Server Benchmarks

• SPEC benchmark suite

• TPC

Embedded Benchmarks

• EEMBC

Page 17: Lecture 2c: Benchmarks

Benchmarking

SPEC Benchmark Suite

Desktop Benchmarks

• CPU-intensive

  • SPEC CPU2000

    • 11 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks

    • Real application programs: C compiler, finite element modeling, fluid dynamics, etc.

• Graphics intensive

  • SPECviewperf

    • Measures rendering performance using OpenGL

  • SPECapc

    • Pro/Engineer – 3D rendering with solid models

    • Solid/Works – 3D CAD/CAM design tool, CPU-intensive and I/O intensive tests

    • Unigraphics – solid modeling for an aircraft design

Server Benchmarks

• SPECWeb – for web servers

• SPECSFS – for NFS performance, throughput-oriented

Page 18: Lecture 2c: Benchmarks

Benchmarking

TPC Benchmark Suite

Server Benchmark

Transaction processing (TP) benchmarks

Real applications

• TPC-C: simulates a complex query environment

• TPC-H: ad hoc decision support

• TPC-R: business decision support system where users run a standard set of queries

• TPC-W: business-oriented transactional web server

Measures performance in transactions per second.

Throughput performance is measured only when the response time limit is met.

Allows cost-performance comparisons

Page 19: Lecture 2c: Benchmarks

Benchmarking

EEMBC Benchmarks

for embedded computing systems

34 benchmarks from 5 different application classes:

• Automotive/industrial

• Consumer

• Networking

• Office automation

• Telecommunications

Page 20: Lecture 2c: Benchmarks

Benchmarking

Benchmarking Strategies

Fixed-computation benchmarks

Fixed-time benchmarks

Variable-computation and variable-time benchmarks

Page 22: Lecture 2c: Benchmarks

Benchmarking

Fixed-Computation benchmarks

W: fixed workload (number of instructions, number of floating-point operations, etc.)

T: measured execution time

R: speed

$R = \frac{W}{T}$

Compare:

$\text{Speedup} = \frac{R_1}{R_2} = \frac{W/T_1}{W/T_2} = \frac{T_2}{T_1}$
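With the workload W held fixed, the speed is R = W/T and the speedup of system 1 over system 2 reduces to the ratio of execution times, T2/T1. A minimal sketch in C, with hypothetical function names chosen for this example:

```c
#include <assert.h>

/* Speed R = W / T for a workload w completed in t seconds. */
double rate(double w, double t)
{
    assert(t > 0.0);
    return w / t;
}

/* Speedup of system 1 over system 2 on the same fixed workload:
 * R1 / R2 = (W / T1) / (W / T2) = T2 / T1. */
double speedup(double t1, double t2)
{
    assert(t1 > 0.0 && t2 > 0.0);
    return t2 / t1;
}
```

For example, if system 1 finishes the workload in 2 s and system 2 needs 8 s, `speedup(2.0, 8.0)` is 4: the workload cancels out, so only the two measured times matter.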

Page 23: Lecture 2c: Benchmarks

Benchmarking

Fixed-Computation benchmarks

Amdahl’s Law
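The body of this slide did not survive in the transcript. Assuming it presented the standard statement of Amdahl's Law, the overall speedup when a fraction p of the original execution time is sped up by a factor s is 1 / ((1 - p) + p / s). The sketch below encodes that formula; the function and parameter names are chosen for this example.

```c
#include <assert.h>

/* Amdahl's Law (standard form, assumed here):
 *   p: fraction of the original execution time that is enhanced (0..1)
 *   s: speedup of that fraction (> 0)
 * Returns the overall speedup for the fixed workload. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.5 and s = 2 the result is 1 / 0.75, about 1.33. However large s becomes, the result stays below 1 / (1 - p), which is why the unenhanced part bounds the gain on a fixed-computation benchmark.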

Page 24: Lecture 2c: Benchmarks

Benchmarking

Fixed-Time benchmarks

On a faster system, a larger workload can be processed in the same amount of time

T: fixed execution time

W: workload

R: speed

$R = \frac{W}{T}$

Compare:

$\text{Sizeup} = \frac{R_1}{R_2} = \frac{W_1/T}{W_2/T} = \frac{W_1}{W_2}$

Page 25: Lecture 2c: Benchmarks

Benchmarking

Fixed-Time benchmarks

Scaled Speedup
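The content of this slide is also missing from the transcript. Assuming it refers to the usual scaled-speedup formulation (Gustafson's law), the speedup on n processors is (1 - p) + p * n, where p is the fraction of the time on the parallel system spent in the parallelizable part. A minimal sketch under that assumption, with names chosen for this example:

```c
#include <assert.h>

/* Scaled speedup (Gustafson's formulation, assumed here):
 *   p: fraction of the parallel run's time spent in parallel work (0..1)
 *   n: number of processors (>= 1)
 * The workload grows with n while the execution time stays fixed. */
double scaled_speedup(double p, double n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1.0);
    return (1.0 - p) + p * n;
}
```

With p = 0.5 and n = 4 this gives 2.5. Unlike the fixed-computation case, the figure keeps growing linearly in n, because the larger system is credited with the larger workload it completes in the same time.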

Page 26: Lecture 2c: Benchmarks

Benchmarking

Variable-Computation and Variable-Time benchmarks

In this type of benchmark, quality of the solution is improved.

Q: quality of the solution

T: execution time

Quality improvements per second: $Q/T$