lecture 2c: benchmarks

Lecture 2c: Benchmarks

Benchmarking

A benchmark is a program that is run on a computer to measure its performance and compare it with that of other machines.

The best benchmark is the users’ workload: the mixture of programs and operating system commands that users run on a machine.

This is usually not practical, so standard benchmarks are used instead.

Types of Benchmarks

Synthetic benchmarks

Toy benchmarks

Microbenchmarks

Program Kernels

Real Applications


Synthetic benchmarks

Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs

• Whetstone benchmark

• Dhrystone benchmark

• Rhealstone benchmark

Synthetic benchmarks: Whetstone benchmark

• First written in Algol 60 in 1972; Fortran, C/C++, and Java versions are available today

• Represents the workload of numerical applications

• Measures floating point arithmetic performance

• Unit is Millions of Whetstone instructions per second (MWIPS)

• Shortcomings:

• Does not represent constructs in modern languages, such as pointers, etc.

• Does not consider cache effects

Synthetic benchmarks: Dhrystone benchmark

• First written in Ada in 1984; a C version is available today

• Represents the workload of system software

• Statistics are collected on system software, such as operating systems, compilers, editors, and a few numerical programs

• Measures integer and string performance, no floating-point operations

• Unit is the number of program iteration completions per second

• Shortcomings:

• Does not represent real life programs

• Compiler optimization overstates system performance

• Small code that may fit in the instruction cache

Synthetic benchmarks: Rhealstone benchmark

• For multi-tasking real-time systems

• Factors are:

• Task switching time

• Pre-emption time

• Interrupt latency time

• Semaphore shuffling time

• Deadlock breaking time

• Datagram throughput time

• Metric is Rhealstones per second:

$\sum_{i=1}^{6} w_i \cdot \frac{1}{t_i}$

where $t_i$ is the measured time for factor $i$ and $w_i$ is its weight
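The weighted sum of reciprocal times can be computed directly. The following is a minimal Python sketch, assuming the six component times are given in seconds and that equal weights are used unless the application supplies its own. The function name `rhealstones_per_second` and the sample times are hypothetical, not measured data.

```python
def rhealstones_per_second(times_s, weights=None):
    """Weighted sum of the reciprocals of the component times (in seconds)."""
    if weights is None:
        weights = [1.0] * len(times_s)  # assumption: equal weights
    if len(times_s) != len(weights):
        raise ValueError("need one weight per measured time")
    return sum(w * (1.0 / t) for w, t in zip(weights, times_s))


# Hypothetical times in seconds, one per factor, in the order listed above.
sample_times = [50e-6, 60e-6, 10e-6, 80e-6, 100e-6, 200e-6]
score = rhealstones_per_second(sample_times)
```

Because each term is a reciprocal, a single very short component time can dominate the score, which is one reason application-specific weights may be preferred.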

Toy benchmarks

10-100 lines of code whose result is known before running the program

• Quick sort

• Sieve of Eratosthenes: finds prime numbers

http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif

func sieve( var N )
    var PrimeArray as array of size N
    initialize PrimeArray to all true
    for i from 2 to N
        for each j from i + 1 to N, where i divides j
            set PrimeArray( j ) = false
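A runnable version of the sieve may make the toy benchmark easier to try. This Python sketch follows the same idea as the pseudocode but, as a deliberate variation, starts crossing out at i*i and skips values of i already known to be composite. The function name `sieve` matches the pseudocode; the rest is illustrative.

```python
def sieve(n):
    """Return the list of primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Multiples below i*i were already crossed out by smaller primes.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]


primes = sieve(30)
```

Since the result is known in advance (for example, there are 25 primes below 100), the output doubles as a correctness check for the system under test.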

Microbenchmarks

Small, specially designed programs used to test a specific function of a system (e.g., floating-point execution, I/O subsystem, processor-memory interface)

• Provide values for important parameters of a system

• Characterize the maximum performance if the overall performance is limited by that single component


Kernels

Key pieces of code from real applications.

• LINPACK and BLAS

• Livermore Loops

• NAS

Kernels: LINPACK and BLAS Libraries

• LINPACK – linear algebra package

• Measures floating-point computing power

• Solves a system of linear equations Ax = b with Gaussian elimination

• Metric is MFLOP/s

• DAXPY is the most time-consuming routine

• Used as the measure for the TOP500 list

• BLAS – Basic linear algebra subprograms

• LINPACK makes use of BLAS library
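To illustrate what a LINPACK-style measurement involves, here is a small pure-Python sketch of Gaussian elimination with partial pivoting, timed and converted to MFLOP/s. It is not the LINPACK code itself: the function names, the test matrix, and the problem size are assumptions chosen for illustration, and the rate uses the conventional operation count of 2n^3/3 + 2n^2.

```python
import time


def solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            # Row update of the form y = y - factor * x (an AXPY operation).
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def linpack_mflops(n, seconds):
    """Rate based on the conventional operation count 2n^3/3 + 2n^2."""
    return (2.0 * n**3 / 3.0 + 2.0 * n**2) / seconds / 1e6


def run(n=60):
    # Hypothetical test system: diagonally dominant matrix whose solution is all ones.
    a = [[float(n) if i == j else 1.0 / (1 + i + j) for j in range(n)]
         for i in range(n)]
    b = [sum(row) for row in a]
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = time.perf_counter() - start
    return x, linpack_mflops(n, elapsed)
```

The innermost loop is the AXPY-style row update, which is consistent with DAXPY dominating the run time. An interpreted implementation like this one will report far lower rates than an optimized BLAS-based build on the same machine.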


• SAXPY – Scalar Alpha X Plus Y

• Y = αX + Y, where X and Y are vectors and α is a scalar

• SAXPY for single and DAXPY for double precision

• Generic implementation:

    for (int i = m; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
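The same kernel can be turned into a small rate measurement. The Python sketch below is illustrative only: it assumes two floating-point operations per element (one multiply, one add), uses Python floats (double precision, so it corresponds to DAXPY rather than SAXPY), and the names `axpy` and `axpy_mflops` are hypothetical.

```python
import time


def axpy(a, x, y):
    """In-place y = a*x + y over Python lists; returns y."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def axpy_mflops(n=100_000, repeats=5):
    """Best-of-`repeats` rate, counting 2 floating-point operations per element."""
    x = [1.0] * n
    best = float("inf")
    for _ in range(repeats):
        y = [2.0] * n  # fresh output vector for each timed run
        start = time.perf_counter()
        axpy(3.0, x, y)
        best = min(best, time.perf_counter() - start)
    return 2.0 * n / best / 1e6
```

Taking the best of several runs reduces the influence of one-off disturbances such as scheduling noise; whether best, mean, or median is the right choice depends on what the measurement is meant to represent.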

Kernels: Livermore Loops

• Developed at LLNL

• Originally in Fortran, now also in C

• 24 numerical application kernels, such as:

• hydrodynamics fragment,

• incomplete Cholesky conjugate gradient,

• inner product,

• banded linear systems solution, tridiagonal linear systems solution,

• general linear recurrence equations,

• first sum, first difference,

• 2-D particle in a cell, 1-D particle in a cell,

• Monte Carlo search,

• location of a first array minimum, etc.

• Metrics are the arithmetic, geometric, and harmonic means of the CPU rate
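The three means can give quite different summaries of the same per-kernel rates. A minimal Python sketch, assuming the rates are in MFLOP/s; the sample values are hypothetical and `summarize_rates` is an illustrative name. It relies on `statistics.geometric_mean`, which requires Python 3.8 or later.

```python
from statistics import geometric_mean, harmonic_mean, mean


def summarize_rates(mflops):
    """Return the three summary means of a list of per-kernel rates."""
    return {
        "arithmetic": mean(mflops),
        "geometric": geometric_mean(mflops),
        "harmonic": harmonic_mean(mflops),
    }


# Hypothetical per-kernel rates in MFLOP/s (not measured data).
rates = [10.0, 40.0, 160.0]
summary = summarize_rates(rates)
```

For positive rates the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean; the harmonic mean is pulled most strongly toward the slowest kernels.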

Kernels: NAS Parallel Benchmarks

• Developed at NASA Advanced Supercomputing division

• Paper-and-pencil benchmarks

• 11 benchmarks, such as:

• Discrete Poisson equation,

• Conjugate gradient

• Fast Fourier Transform

• Bucket sort

• Embarrassingly parallel

• Nonlinear PDE solution

• Data traffic, etc.


Real Applications

Programs that are run by many users

• C compiler

• Text processing software

• Frequently used user applications

• Modified scripts used to measure particular aspects of system performance, such as interactive behavior, multiuser behavior

Benchmark Suites

Desktop Benchmarks

• SPEC benchmark suite

Server Benchmarks

• SPEC benchmark suite

• TPC

Embedded Benchmarks

• EEMBC

SPEC Benchmark Suite

Desktop Benchmarks

• CPU-intensive

  • SPEC CPU2000

    • 12 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks

    • Real application programs: C compiler, finite element modeling, fluid dynamics, etc.

• Graphics-intensive

  • SPECviewperf: measures rendering performance using OpenGL

  • SPECapc

    • Pro/Engineer: 3D rendering with solid models

    • Solid/Works: 3D CAD/CAM design tool, CPU-intensive and I/O-intensive tests

    • Unigraphics: solid modeling for an aircraft design

Server Benchmarks

• SPECWeb: for web servers

• SPECSFS: for NFS performance, throughput-oriented

TPC Benchmark Suite

Server Benchmark

Transaction processing (TP) benchmarks

Real applications

• TPC-C: simulates a complex query environment

• TPC-H: ad hoc decision support

• TPC-R: business decision support system where users run a standard set of queries

• TPC-W: business-oriented transactional web server

Measures performance in transactions per second. Throughput performance is measured only when the response time limit is met.

Allows cost-performance comparisons
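The idea that throughput only counts when a response time limit is met can be sketched in a few lines. This Python example is an illustration, not the TPC reporting rules: it assumes the limit applies to a percentile of response times (90% by default here), and the function names, the nearest-rank percentile method, and the cost figure are all hypothetical.

```python
import math


def qualified_tps(response_times_s, duration_s, limit_s, percentile=0.90):
    """Transactions per second, or None if the response-time limit is missed."""
    if not response_times_s:
        return None
    ordered = sorted(response_times_s)
    # Nearest-rank percentile: the smallest sample with at least
    # `percentile` of all samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    if ordered[rank - 1] > limit_s:
        return None  # limit not met, so no throughput is reported
    return len(ordered) / duration_s


def cost_per_tps(system_cost, tps):
    """Cost-performance figure: system cost divided by qualified throughput."""
    return system_cost / tps


# Hypothetical run: 10 transactions in 5 seconds, one slow outlier.
times = [0.1] * 9 + [10.0]
tps = qualified_tps(times, duration_s=5.0, limit_s=1.0)
```

Dividing system cost by the qualified throughput gives a single number that lets systems of very different size be compared on cost-performance.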


EEMBC Benchmarks

for embedded computing systems

34 benchmarks from 5 different application classes:

• Automotive/industrial

• Consumer

• Networking

• Office automation

• Telecommunications

Benchmarking Strategies

Fixed-computation benchmarks

Fixed-time benchmarks

Variable-computation and variable-time benchmarks

Fixed-Computation benchmarks

W: fixed workload (number of instructions, number of floating-point operations, etc.)

T: measured execution time

R: speed

Compare:

$R = \frac{W}{T}$

$\text{Speedup} = \frac{R_1}{R_2} = \frac{W/T_1}{W/T_2} = \frac{T_2}{T_1}$
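With a fixed workload W, speed is R = W/T, so the ratio of the two systems' speeds reduces to the inverse ratio of their execution times, T2/T1. A short Python sketch of that arithmetic follows; the function names and the sample numbers are hypothetical.

```python
def rate(work, seconds):
    """Speed R = W / T."""
    return work / seconds


def speedup(t1, t2):
    """Speedup of system 1 relative to system 2 for the same workload: T2 / T1."""
    return t2 / t1


# Hypothetical example: the same 1e9-operation workload on two systems.
W = 1e9
t1, t2 = 2.0, 5.0
ratio_of_rates = rate(W, t1) / rate(W, t2)
```

Here `ratio_of_rates` and `speedup(t1, t2)` agree because W cancels, which is why a fixed-computation benchmark only needs the two measured times.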

Fixed-Computation benchmarks

Amdahl’s Law
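Amdahl's Law is commonly stated as: overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of the original execution time that benefits from an enhancement and s is the speedup of that fraction. The Python sketch below uses this common form; the parameter names are illustrative.

```python
def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    if enhancement_speedup <= 0.0:
        raise ValueError("enhancement_speedup must be positive")
    unenhanced = 1.0 - enhanced_fraction
    return 1.0 / (unenhanced + enhanced_fraction / enhancement_speedup)


# Speeding up half of the run by 2x gives well under 2x overall.
overall = amdahl_speedup(0.5, 2.0)
```

Even with an arbitrarily large s, the overall speedup cannot exceed 1 / (1 - f), so the part that is not enhanced sets the ceiling.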

Fixed-Time benchmarks

On a faster system, a larger workload can be processed in the same amount of time

T: fixed execution time

W: workload

R: speed

Compare:

$R = \frac{W}{T}$

$\text{Sizeup} = \frac{R_1}{R_2} = \frac{W_1/T}{W_2/T} = \frac{W_1}{W_2}$
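A fixed-time benchmark can be sketched as a loop that repeats a unit of work until a time budget expires and reports how much work was completed. In this Python sketch the harness, its names, and the choice of work unit are assumptions made for illustration; sizeup is then the ratio of the workloads completed on the two systems, W1/W2.

```python
import time


def fixed_time_workload(work_unit, budget_s):
    """Count how many calls to work_unit complete within budget_s seconds."""
    completed = 0
    deadline = time.perf_counter() + budget_s
    while time.perf_counter() < deadline:
        work_unit()
        completed += 1
    return completed


def sizeup(w1, w2):
    """Sizeup of system 1 relative to system 2 for the same time budget: W1 / W2."""
    return w1 / w2


# Hypothetical work unit: summing a small range of integers.
units_done = fixed_time_workload(lambda: sum(range(100)), budget_s=0.05)
```

The last unit may finish slightly after the deadline, so the budget should be long relative to the duration of one work unit.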

Fixed-Time benchmarks

Scaled Speedup
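Assuming the scaled speedup meant here is the one usually attributed to Gustafson's Law, it can be written as s + (1 - s) * N, where s is the serial fraction of the execution time observed on the parallel system and N is the number of processors. The Python sketch below uses that form; the names are illustrative.

```python
def scaled_speedup(serial_fraction, processors):
    """Gustafson-style scaled speedup: s + (1 - s) * N."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_fraction + (1.0 - serial_fraction) * processors


# With 10% serial time, 10 processors give a scaled speedup of about 9.1.
example = scaled_speedup(0.1, 10)
```

Unlike the fixed-computation view, the workload grows with the system here, so the speedup keeps increasing with N instead of saturating.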

Variable-Computation and Variable-Time benchmarks

In this type of benchmark, quality of the solution is improved.

Q: quality of the solution

T: execution time

Quality improvements per second: $\frac{Q}{T}$
