Grid performance, grid benchmarks, grid metrics Zsolt Németh MTA SZTAKI Computer and Automation Research Institute zsnemeth@sztaki.hu http://www.lpds.sztaki.hu/~zsnemeth

Post on 19-Dec-2015

Outline

● What is the grid?
● What is grid performance?
● Are benchmarks useful?
● How can grid metrics be defined?

What is the grid?

Distributed applications

● A set of cooperative processes

Distributed applications

● Processes require resources

[Figure: processes and the resources they require: CPU, memory, network, printer, storage, database, libraries, I/O devices]

Distributed applications

● Resources can be found on computational nodes

[Figure: mapping of the resources (CPU, memory, network, printer, storage, database, libraries, I/O devices) onto computational nodes]

Distributed applications

– Process control?
– Security?
– Naming?
– Communication?
– Input / output?
– File access?

Application: Cooperative processes

Physical layer: Computational nodes

Distributed applications

Application: Cooperative processes

Physical layer: Computational nodes

Virtual machine:
• Process control
• Security
• Naming
• Communication
• Input / output
• File access

Conventional distributed environments and grids

● Distributed resources are virtually unified by a software layer
● A virtual machine is introduced between the application and the physical layer
● Provides a single system image to the application
● Types
  ● “Conventional” (PVM, some implementations of MPI)
  ● Grid (Globus, Legion)

Conventional distributed environments and grids

• What is the essential difference?
• Geographical extent?
• Performance?
• Tools and services?
• How is the virtual machine built up?
• What does execution mean?
• What is the semantics of execution?

Description of grid

● “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” (The anatomy of the grid)

● “single, seamless, computational environment in which cycles, communication and data are shared” (Legion: the Next Step Toward a Nationwide Virtual Computer)

● “wide-area environment that transparently consists of workstations, personal computers, graphic rendering engines, supercomputers and nontraditional devices” (Legion - A View from 50,000 Feet)

● “collection of geographically separated resources connected by a high speed network”, “a software layer which transforms a collection of independent resources into a single, coherent virtual machine” (Metacomputing - What’s in it for me)

Conventional environments

Physical level: set of nodes (node = collection of resources)
• Login access
• Static

Virtual machine
• Constructed on a priori information

Processes
• Have resource requests

Mapping
• Processes are mapped onto nodes
• Resource assignment is implicit

Grid

Physical layer: set of resources
• Shared
• Dynamic

Virtual machine
• Resources are assigned to processes
• Consists of the selected resources

Processes
• Have resource requirements

Mapping
• Assign nodes to resources?

Grid: the resource abstraction

Physical layer

Processes
• Have resource needs

Resource abstraction
• Explicit mapping between virtual and physical resources
• Cannot be solved at user/application level

Grid: the user abstraction

Physical layer
• Local, physical users (user accounts)

Processes
• Belong to a user

• The user of the virtual machine is authorised to use the constituting resources
• The user has no login access to the node the resource belongs to

User abstraction
• The user of the virtual machine is temporarily mapped onto some local accounts
• Cannot be solved at user/application level

The grid: abstraction

● Semantically: the grid is nothing but abstraction
● Resource abstraction
  ● Physical resources can be assigned to virtual resource needs (matched by properties)
  ● Grid provides a mapping between virtual and physical resources
● User abstraction
  ● User of the physical machine may be different from the user of the virtual machine
  ● Grid provides a temporary mapping between virtual and physical users
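
The two abstractions can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: the `Resource` class, the function names, the property keys (`mflops`, `gb`) and the account name `grid001` are not taken from any grid middleware. The sketch only shows resource needs being matched to physical resources by properties, and a grid user being bound to a local account for a limited time.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str                      # hypothetical identifier of a physical resource
    kind: str                      # e.g. "cpu" or "storage"
    properties: dict = field(default_factory=dict)

def map_resources(needs, pool):
    """Resource abstraction: map each virtual need onto a distinct physical resource."""
    mapping, taken = {}, set()
    for need, (kind, minimum) in needs.items():
        for res in pool:
            fits = res.kind == kind and all(
                res.properties.get(key, 0) >= value for key, value in minimum.items()
            )
            if fits and res.name not in taken:
                mapping[need] = res.name
                taken.add(res.name)
                break
        else:
            # No break happened: nothing in the pool matches this need.
            raise LookupError(f"no physical resource satisfies {need!r}")
    return mapping

def map_user(grid_user, free_accounts, leases):
    """User abstraction: temporarily bind a grid user to a free local account."""
    account = free_accounts.pop()
    leases[grid_user] = account
    return account

def release_user(grid_user, free_accounts, leases):
    """End the temporary binding and return the local account to the free list."""
    free_accounts.append(leases.pop(grid_user))

pool = [
    Resource("nodeA-cpu0", "cpu", {"mflops": 500}),
    Resource("nodeB-cpu0", "cpu", {"mflops": 900}),
    Resource("nodeB-disk", "storage", {"gb": 200}),
]
needs = {"worker": ("cpu", {"mflops": 800}), "scratch": ("storage", {"gb": 50})}
mapping = map_resources(needs, pool)

free_accounts, leases = ["grid001"], {}
account = map_user("Smith", free_accounts, leases)
release_user("Smith", free_accounts, leases)
```

In this sketch the process never names a node: it states needs, and the mapping decides which physical resource serves each one.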

Conventional distributed environments and grids

[Figure: user accounts in a conventional environment (“Smith, 4 nodes”) compared with a grid (“Smith, 4 CPU, memory, storage”; “Smith, 1 CPU”); the account labels are not recoverable]

Grid performance

What is grid performance at all?

● Performance of ‘grid infrastructure’ or performance of ‘grid application’?
● Traditionally ‘performance’ is
  ● Speed
  ● Throughput
  ● Bandwidth, etc.
● Using grids
  ● Quantitative reasons
  ● Qualitative reasons – QoS
  ● Economic aspects

Grid performance analysis scenarios

1. Resource brokering: evaluate whether the performance of a given resource is appropriate for a certain job
2. At runtime: check whether a resource can maintain an acceptable/required performance
3. At runtime: check whether a job progresses according to its checkpoints
4. Find obvious idling/waiting spots
5. Find bad communication patterns
6. Find serious performance skew
7. Post mortem: see if the brokering strategy was correct
8. Etc.

What is grid performance at all?

• supercomputer
  • task is done in 20 minutes
  • available tomorrow night
  • costs $200/hour
• cluster
  • task is done in 12 hours
  • available now
  • costs $15/hour
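
The trade-off can be made concrete with a small worked sketch. Two things are assumptions, not statements from the slides: "tomorrow night" is taken to mean a wait of 30 hours, and only the running time is billed. Under those assumptions neither offer wins on every axis.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    name: str
    run_hours: float    # time the task needs once started
    wait_hours: float   # time until the resource becomes available
    rate: float         # price in dollars per hour of use

    @property
    def turnaround_hours(self):
        # Time from now until the result is ready.
        return self.wait_hours + self.run_hours

    @property
    def cost(self):
        # Assumption: waiting is free, only execution is billed.
        return self.run_hours * self.rate

# The 30-hour wait is a hypothetical reading of "available tomorrow night".
supercomputer = Offer("supercomputer", run_hours=20 / 60, wait_hours=30.0, rate=200.0)
cluster = Offer("cluster", run_hours=12.0, wait_hours=0.0, rate=15.0)

for offer in (supercomputer, cluster):
    print(f"{offer.name}: ready after {offer.turnaround_hours:.1f} h, costs ${offer.cost:.2f}")
```

With these figures the supercomputer run is cheaper (about $67 against $180) while the cluster delivers the result sooner, so speed alone does not decide which resource performs better for a given user.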

What is grid performance at all?

• Grid is about resource sharing
• What is the benefit of sharing?
  – acceptable for resource owners
  – acceptable for resource users
• Speed, bandwidth, capacity, etc. are just one aspect
• Properness, fairness, effectiveness of assignment of processes to resources

Grid performance

[Figure: physical layer and virtual layer, annotated with “Performance?” and “Measurement”]

Interaction of application and the infrastructure

● Performance = application perf. * infrastructure perf.
● Signature model (Pablo group)
  ● Application signature
    ● e.g. instructions/FLOP
  ● Scaling factor (capabilities of the resources)
    ● e.g. FLOPs/second
  ● Execution signature:
    ● application signature * scaling factor
    ● e.g. instructions/second = instructions/FLOP * FLOPs/second
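
As a minimal illustration of the unit arithmetic in the signature model, the sketch below multiplies an application signature by a scaling factor. The function name and the numeric values are hypothetical; they are not measurements and not part of the Pablo group's tooling.

```python
def execution_signature(application_signature, scaling_factor):
    """Project an application signature onto a resource.

    application_signature: instructions per FLOP (property of the application)
    scaling_factor:        FLOPs per second (capability of the resource)
    returns:               instructions per second
    """
    return application_signature * scaling_factor

# Hypothetical figures, chosen only to make the units visible.
instructions_per_flop = 2.5
flops_per_second = 4.0e9
instructions_per_second = execution_signature(instructions_per_flop, flops_per_second)
```

The FLOP unit cancels, so the same application signature yields a different execution signature on every resource it is scaled by.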

Possible performance problems in grids

● All that may occur in a distributed application
● Plus
  ● Effectiveness of resource brokering
  ● Synchronous availability of resources
  ● Resources may change during execution
  ● Various local policies
  ● Shared use of resources
  ● Higher costs of some activities
● The corresponding symptoms must be characterised

Grid performance metrics

● Abstract representation of measurable quantities
  ● M = R1 × R2 × ... × Rn
● Usual metrics
  ● Speedup, efficiency
  ● Load, queue length, etc.
● Such strict values are not characteristic in grid
  ● Cannot be interpreted
  ● Cannot be compared
● New metrics
  ● Local metrics and grid metrics
  ● Symbolic description / metrics

Processing monitoring information

● Trace data reduction
  ● Trace volume is proportional to time t, number of processes P, and metrics dimension n
● Statistical clustering (reducing P)
  ● Similar temporal behaviours are classified
  ● Questionable whether it works for grids
  ● Representative processes are recorded for each class
● Statistical projection pursuit (reducing n)
  ● Reduces the dimension by identifying significant metrics
● Sampling frequency (reducing t)
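
Two of these reductions can be sketched in a few lines. This is a simplified stand-in, not the statistical method the slide refers to: processes are grouped by a greedy distance threshold rather than by a proper statistical clustering, and the function names, the threshold and the trace values are all hypothetical.

```python
from math import dist

def downsample(trace, step):
    """Reduce t: keep every step-th sample of a trace."""
    return trace[::step]

def cluster_processes(traces, threshold):
    """Reduce P: group processes whose traces lie close to a representative.

    Greedy stand-in for statistical clustering: a process joins the first
    class whose representative trace is within `threshold` (Euclidean
    distance), otherwise it becomes the representative of a new class.
    """
    classes = {}  # representative process id -> member process ids
    for pid, trace in traces.items():
        for representative in classes:
            if dist(trace, traces[representative]) <= threshold:
                classes[representative].append(pid)
                break
        else:
            classes[pid] = [pid]
    return classes

# Hypothetical per-process samples of a single metric.
traces = {
    0: [1.0, 1.0, 1.0, 1.0],
    1: [1.1, 1.0, 0.9, 1.0],
    2: [5.0, 5.0, 5.0, 5.0],
}
classes = cluster_processes(traces, threshold=0.5)
kept = {pid: downsample(traces[pid], 2) for pid in classes}
```

Only the representative traces in `kept` would be recorded, at half the original sampling rate.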

Performance tuning, optimisation

● The execution cannot be reproduced
  ● Post-mortem optimisation is not viable
● On-line steering is necessary, though hard to realise
  ● Sensors and actuators
  ● Application and implementation dependent
  ● E.g. Autopilot, Falcon
● Average behaviour of applications can be improved
● Post-mortem tuning of the infrastructure (if possible)
  ● Brokering decisions
  ● Supporting services

Grid benchmarking

Grid performance, resource performance

● The traditional way: benchmarking
● As suggested by GGF-GBRG

Running benchmarks

● Benchmarks are executed on a virtual machine

● The virtual machine may change (composed of different resources) from run to run

● A benchmark result is representative of one particular virtual machine

● What can it show about the entire grid?

● What can it show about a certain resource?

Grid benchmarking

[Figure: physical layer and virtual layer, annotated with “Performance?” and “Measurement”]

Grid metrics

Local metrics

● Load averages, CPU user, system, idle percentages, network bandwidth, cache hit ratio, available memory, page faults, etc.
● Performance is a trajectory in a multi-dimensional space
  ● Cannot be compared
  ● Cannot be interpreted
● processes: 55.2, user: 70, system: 0, idle: 30
  ● underloaded 64-CPU system
● processes: 55.2, user: 70, system: 30, idle: 0
  ● 64-CPU system, serious overheads
● processes: 72.8, user: 99, system: 1, idle: 0
  ● slightly overloaded 64-CPU system
● processes: 4.1, user: 99, system: 1, idle: 0
  ● seriously overloaded 1-CPU system
● Fine details are even more complex to evaluate
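
The four examples show that raw local figures only become meaningful once local context, here the CPU count, is known. A minimal sketch of such an interpretation follows. The thresholds (more than 1 or 2 runnable processes per CPU, system time of 30% or more) are assumptions picked so that the rules reproduce the four labels above; they are not rules given in the slides.

```python
def interpret(processes, user, system, idle, cpus):
    """Turn local load figures into a symbolic label.

    processes: average number of runnable processes
    user, system, idle: CPU time percentages (user is not used by these rules)
    cpus: number of CPUs in the system
    All thresholds below are illustrative assumptions.
    """
    per_cpu = processes / cpus
    if per_cpu > 2.0:
        return "seriously overloaded"
    if per_cpu > 1.0:
        return "slightly overloaded"
    if system >= 30:
        return "serious overheads"
    if idle > 0:
        return "underloaded"
    return "fully loaded"

labels = [
    interpret(55.2, 70, 0, 30, cpus=64),
    interpret(55.2, 70, 30, 0, cpus=64),
    interpret(72.8, 99, 1, 0, cpus=64),
    interpret(4.1, 99, 1, 0, cpus=1),
]
```

Without the `cpus` argument the first two calls could not be told apart from an overloaded small machine, which is the point the examples make.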

Local metrics, global (grid) metrics

● Local metrics are transformed into some globally understandable performance figures

● What are the dimensions?
● What is the transformation?

Global metrics

● MIPS, MFLOPS, Gbit/s, etc.
● Comparable, interpretable
● Most users have no idea about the computing power they really require
● These are usually nominal and not actual values
● Too general characterisation – fine details are hidden

Benchmark metrics

● Benchmarks are for comparing computer systems
● A well selected benchmark set is
  ● sensitive to different factors: CPU intensive, communication intensive, I/O intensive jobs
  ● able to show fine details: cache behaviour, floating point capabilities, etc.
  ● able to show behaviour at different levels: instruction, loop, procedure, application
● These figures can only be obtained actively: they require time and resources

Benchmark metrics

● Given a local database with local and benchmark performance records
  ● get the local performance figures
    ● low cost OS functionality
  ● look up the database for benchmark performance
    ● there may be no record for the actual local performance
    ● symbolic (fuzzy) interpolation
  ● the actual benchmark figures can be estimated
    ● actual execution of benchmarks is costly if not impossible
● Estimated benchmark figures give a characterisation of the system in a comparable and interpretable way
● Sounds reasonable… but not enough
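
The lookup-and-estimate idea can be sketched as follows. Plain linear interpolation is used here as a simple stand-in for the symbolic (fuzzy) interpolation named above, and the single-metric record layout, the function name and the sample values are hypothetical.

```python
from bisect import bisect_left

def estimate_benchmark(records, local_load):
    """Estimate a benchmark figure from stored (local_load, benchmark_score) pairs.

    records must be sorted by local_load. An exact hit returns the stored
    score; otherwise the two neighbouring records are interpolated linearly.
    Values outside the recorded range are clamped to the nearest record.
    """
    loads = [load for load, _ in records]
    i = bisect_left(loads, local_load)
    if i < len(records) and loads[i] == local_load:
        return records[i][1]
    if i == 0:
        return records[0][1]
    if i == len(records):
        return records[-1][1]
    (x0, y0), (x1, y1) = records[i - 1], records[i]
    return y0 + (y1 - y0) * (local_load - x0) / (x1 - x0)

# Hypothetical database: benchmark score observed at a given local load.
records = [(0.0, 100.0), (1.0, 60.0), (2.0, 40.0)]
estimated = estimate_benchmark(records, 0.5)
```

The local figure is cheap to read from the operating system, while the benchmark figure comes from past runs, so no benchmark has to be executed at query time.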

Benchmark metrics

● Benchmarks may show actual execution performance but it is not enough…
● Real-life experiments: execution time may show no correlation to actual load
  ● start every job and suffer resource starvation
  ● wait until resources are available and start specific jobs
● Resource management policy must be taken into consideration

Job startup times

● corona.iif.hu, SUN Ultra Enterprise 10000, 64 CPU
● Sun Grid Engine
● Time between submission and actual start
  ● 1 processor job: within 1 minute
  ● 2 processor job: mostly within 1 minute
  ● 4 processor job: 2-3 hours
  ● 8 processor job: 1-2 days
  ● 9 processor job: 1-2 days
  ● 16 processor job: 2-3 days
  ● 25 processor job: > 4-5 days
● See online:
  ● http://www.lpds.sztaki.hu/~zsnemeth/apart/statistics/statistics.shtml

Resource performance characterisation

● Execution phase: resource performance can be characterized in the space of benchmark metrics
  ● analyse the relationship between local metrics and benchmark results
  ● find the principal components
● Waiting phase: a stochastic model
  ● find the parameters of the distribution
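
For the waiting phase, fitting a distribution to observed startup delays might look like the sketch below. The slides do not say which distribution is used; a log-normal model is assumed here purely for illustration, and the sample waiting times are invented.

```python
import math
import statistics

def fit_lognormal(waiting_times):
    """Estimate log-normal parameters from observed waiting times.

    Assumption: waiting times are log-normally distributed, so the mean and
    sample standard deviation of their logarithms are the two parameters.
    """
    logs = [math.log(t) for t in waiting_times]
    return statistics.mean(logs), statistics.stdev(logs)

# Hypothetical waiting times in hours for one job class on one resource.
waits_hours = [2.0, 2.5, 3.0, 2.2, 2.8]
mu, sigma = fit_lognormal(waits_hours)
median_wait = math.exp(mu)  # median of the fitted log-normal distribution
```

Two fitted numbers per resource and job class are compact enough to publish, whereas the raw waiting-time history is not.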

Resource performance characterisation

● These parameters (i, i, {t1, t2,…tn} ) can be distributed in an information system

● Interpretable: the stochastic model and the benchmark set give an appropriate framework

● Comparable: figures have the same meaning within this framework

Ongoing work

● Exploring the statistical properties of benchmarks and system parameters
  ● Intensive benchmark experiments
  ● Getting the most out of figures
  ● Principal component analysis: which figures are really meaningful
  ● Testing the stability of statistical data
  ● http://www.lpds.sztaki.hu/~zsnemeth/apart/statistics/statistics.shtml
● Exploring how benchmark results can be estimated from past measurements
  ● Database management
  ● Symbolic interpolation

Conclusion

● A semantic definition for grids
  ● the presence of user and resource abstraction
● Grid performance has a more complex meaning
● Resource abstraction requires abstraction in the performance characterisation, too
  ● separation of local (physical) and global (virtual) metrics
  ● benchmarking is not viable
  ● but benchmarks can serve as metrics
● Experiments with resource characterisation