André Seznec, Caps Team, IRISA/INRIA. Simulation: a user point of view. Sept. 1998


André Seznec Caps Team

IRISA/INRIA

Simulation: a user point of view

André  Seznec

IRISA/INRIA

Sept. 1998


Myself (1)

• Senior researcher
• Working on computer architecture for 15 years
• Works on:
  • memory systems
  • pipeline structure
  • cache structures
  • branch prediction mechanisms
  • Simultaneous Multithreading


Myself (2)

• Interested in computer architecture
• For me, tools are the dark side of architecture!


Validating microarchitecture concepts

• Just a description with some explanation: at the beginning of the '80s, not so bad
• Analytical model? May work for coarse-grain evaluation on networks, on multiprocessors, ... For microarchitecture, just be serious!!
• Simulation: I have not found a better method! But I do not overtrust simulation results.


Simulation: my own experience

• 1986-88: DSPA and multi-DSPA
• 1988-91: OPAC floating-point coprocessor
• 1991-96: cache simulation
• 1993-..: processor simulation
• 1996-..: branch prediction
• 1994-..: Simultaneous Multithreading


DSPA and multi-DSPA

• Decoupled Access Execute Architecture
• Shared memory architecture
• Original memory and interconnection network
• Hardware FIFOs everywhere!!


DSPA simulation

• Primitive!
• Benchmarks:
  • just a few numerical kernels
  • hand-coded assembly
• Cycle-accurate simulation!
• Validation of the memory system


OPAC floating-point coprocessor

• Floating-point coprocessor
• Dedicated to compute-bound kernels:
  • matrix operations: BLAS3 library
  • FFTs, convolutions, ...
• Built real hardware!
  • 300-IC board
  • a special-purpose VLSI sequencer


OPAC (2)

• Developed in parallel:
  • HDL simulator on CAD tool
  • C simulator
  • pseudo-language + pseudo-compiler
  • applications
• Total interactions:
  • completely accurate simulator
  • design decisions based on:
    • hardware constraints
    • performance evaluation
    • code generation


OPAC (3)

• It was real fun!
• We learned a lot of things!
• Research impact: ??
  • Killer micros appeared during this period


Cache simulation

• We began in 1991
• Among the first groups in Europe
• We had to learn everything:
  • how to get the traces
  • which benchmarks
  • how to simulate
  • what is important


Getting the first traces:

• ATUM traces:
  • hardware-monitored VAX traces
  • available by ftp
  • very short!
• DLX traces: available by ftp
• SparcSim simulator: very, very slow


Choosing the first benchmarks:

• Picking our own applications!!
• Not a worse choice than SPEC92 or SPEC95
• But reviewers like standards!


The first simulators:

• Tried the Dinero simulator from Mark Hill
• But what about skewed-associative caches?
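To illustrate why a conventional set-associative simulator does not cover this case, here is a minimal sketch of a two-way skewed-associative cache model in which each bank is indexed by a different hash of the block address. The class name `SkewedCache`, the two XOR-based index functions, and the random replacement policy are assumptions made for illustration only; they are not the functions from the 1992 paper and not part of Dinero.

```python
import random


class SkewedCache:
    """Toy two-way skewed-associative cache that counts hits and misses.

    Assumption: lines_per_bank is a power of two and at least 2.
    """

    def __init__(self, lines_per_bank=256, line_size=32, seed=0):
        self.n = lines_per_bank
        self.bits = lines_per_bank.bit_length() - 1
        self.line_size = line_size
        self.banks = [[None] * lines_per_bank for _ in range(2)]
        self.rng = random.Random(seed)
        self.hits = 0
        self.misses = 0

    def _index(self, bank, block):
        # Hypothetical hash functions: XOR the low index bits with the next
        # group of bits, rotated by one position for bank 1.
        low = block & (self.n - 1)
        high = (block >> self.bits) & (self.n - 1)
        if bank == 1:
            high = ((high << 1) | (high >> (self.bits - 1))) & (self.n - 1)
        return low ^ high

    def access(self, address):
        """Simulate one reference; return True on a hit, False on a miss."""
        block = address // self.line_size
        slots = [self._index(bank, block) for bank in range(2)]
        for bank, slot in enumerate(slots):
            if self.banks[bank][slot] == block:
                self.hits += 1
                return True
        self.misses += 1
        # Fill an empty candidate line if there is one, else evict at random.
        for bank, slot in enumerate(slots):
            if self.banks[bank][slot] is None:
                self.banks[bank][slot] = block
                return False
        victim = self.rng.randrange(2)
        self.banks[victim][slots[victim]] = block
        return False
```

With 4 lines per bank and one-byte lines, blocks 0, 4 and 8 share the same low index bits, so they would all compete for one line of a direct-mapped cache; in this sketch they land on three different lines.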



First results:

• Skewed-associative caches (1992)
  • simulation helps to convince the reviewers
  • good presentation and good figures are far more important
  • five years later, simulation results just become noise
• Semi-unified caches (1993)
  • simulation needed to quantify the benefit
  • unfortunately, we just picked the wrong conference


Further cache studies

• Using more accepted traces: SPEC92-95, SPLASH, etc.
  • trace collection through spa (Gordon Irlam)
• Quantifying the impact through simulations
• Explaining and analyzing is more important


Other simulations

• Cache simulation is a piece of cake
• « Real architects » simulate complete processors:
  • scalar processors
  • in-order superscalar processors
  • out-of-order superscalar processors
  • Simultaneous Multithreading processors
• Next needed step: complete wrong-path simulations


Two block ahead branch prediction

• ASPLOS 96
• Great idea!
• Poor simulation methodology!
• Performance impact ignored!



Two block ahead branch predictor (continued)

• Now:
  • complete processor simulation
  • IBS traces (hardware monitored)
  • pros and cons understood:
    • high instruction bandwidth
    • misprediction penalty problem
• Limitations:
  • no wrong-path execution
  • simplified execution core
  • IBS traces (1993, 16 MHz processor, 16 MB memory)


Simultaneous Multithreading

• Several processes sharing functional units in a superscalar processor
• First paper (Tullsen et al.): 1995
• We began in parallel (1993): one year too late


Simultaneous Multithreading (2)

• We got results on:
  • branch prediction
  • cache behavior
  • in-order versus out-of-order execution
• Methodology: trace-driven simulation, mixing traces from different processes
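As a rough illustration of mixing traces from different processes, the sketch below interleaves several per-process traces round-robin and tags each record with the index of the trace it came from. The function name `mix_traces` and the fixed `chunk` size are assumptions; the slides do not say which interleaving policy was actually used.

```python
from itertools import islice


def mix_traces(traces, chunk=4):
    """Yield (thread_id, record) pairs, taking up to `chunk` records from
    each trace in turn until every trace is exhausted."""
    iterators = [iter(trace) for trace in traces]
    active = list(range(len(iterators)))
    while active:
        remaining = []
        for tid in active:
            block = list(islice(iterators[tid], chunk))
            for record in block:
                yield tid, record
            # A short block means this trace has run out of records.
            if len(block) == chunk:
                remaining.append(tid)
        active = remaining
```

Because the function is a generator over iterators, it can be fed traces streamed from files rather than lists held in memory.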


Simultaneous Multithreading (3)

• Known limitations:
  • operating system
  • context switches
  • spectrum of applications
  • wrong-path execution
• However: solid results


Skewed Branch Predictors

[Figure: e-gskew.eps]


Skewed Branch Predictors

• The most complete analysis we have ever done:
  • explanations
  • simulations
  • mathematical analysis
• Likely to become a reference paper


What I have learned from my experience

Simulations help to:
• convince:
  • yourself (most important)
  • most of the reviewers (that's life)
• perform in-depth analysis:
  • discover why it works
  • explain to « real » architects


What I have learned from my experience (2)

• Don't overtrust simulation results!
• Be aware of:
  • just a measurement point


• traces are ALWAYS limited:
  • toy applications
  • and/or lack operating system activities
  • and/or old applications:
    • IBS traces: 16 MHz scalar processor, 16 MB memory
    • future processors: > 1 GHz, 10-way superscalar, > 1 GB memory


Simulator limitations

• « Complete » simulation:
  • complex (tens of thousands of lines of code)
  • CPU-time consuming
• Always some simplifying assumption: sometimes valid, sometimes not ...
• Simulators are slow!


Simulator User Limitations

• 99 % of simulation results go straight into the garbage:
  • Just a bug in the simulator!
  • « This idea was ridiculous! » (two months of work)
  • Just lacking the interesting measure!
  • « Finally, I do not have room for this graph! »


Trusting or distrusting YOUR OWN simulation results

• Trust the tendencies:
  • « This mechanism behaves better than this other one »
• Always distrust absolute numbers:
  • « A 32-Kbyte cache is sufficient as it exhibits a 1 % miss rate »
    • on your benchmark, with your tracing tool, without kernel, without context switches, ...


As a reviewer or PC member

• I just do not trust simulation results that I do not understand
• What is needed:
  • clear explanation
  • insight into why things are working
  • insight into possible limitations
• Simulation-free papers are sometimes refreshing: « Difference-bit cache », Toni Juan et al., 1996


Needs for microarchitecture simulation today

• The performance of some existing processors was misestimated by 20 % by the manufacturer
• Wrong-path execution
• Kernel activities


Execution or trace-driven simulation ?

• Trace-driven simulation is sufficient for:
  • comparing two cache structures or two branch predictors
  • in-order processor simulation
• Execution-driven simulation is required for:
  • precise out-of-order processor simulation
  • studying bandwidth impact
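A minimal sketch of the first case, comparing two cache structures on the same address trace, might look like the following. The function `miss_rate`, its parameters, and the LRU replacement policy are illustrative assumptions, not a description of any simulator used in this work.

```python
from collections import OrderedDict


def miss_rate(trace, num_sets, ways, line_size=32):
    """Replay an address trace through a set-associative LRU cache model
    and return the fraction of references that miss."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for address in trace:
        total += 1
        block = address // line_size
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.popitem(last=False)  # evict the least recently used block
            lru[block] = True
    return misses / total if total else 0.0


# Same capacity (8 lines), two organisations, one synthetic trace.
trace = [0, 256] * 4
direct_mapped = miss_rate(trace, num_sets=8, ways=1)
two_way = miss_rate(trace, num_sets=4, ways=2)
```

On this ping-pong trace the two addresses conflict in the direct-mapped organisation and coexist in the two-way one. The ranking of the two structures is the useful output; the absolute miss rates depend entirely on the synthetic trace.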


Which « traces »?

• We need real workloads:
  • all processes running on a workstation
  • user and kernel activities
  • on real data
• Ideally, let us capture the whole activity of a workstation for a second each hour
  • Calvin


TOOLS, WHAT WE ARE DOING

Salto: a System for Assembly Languages Transformation and Optimizations

Calvin: Cloning Assembly Languages in View of Instrumentation Needs


Salto Overview

• assembly source-to-source preprocessor
• retargetable: exists for SPARC, Alpha, MIPS, Philips TM-1000, Pentium, TI C6x
• can be used:
  • to instrument or transform assembly code
  • to schedule assembly code
  • for register allocation, basic block layout, etc.
  • to derive simulators
• fine-grain machine description
• object-oriented interface


Salto Organisation

[Figure: Salto organisation diagram. Labels: transformation tool, SALTO, C++ interface, machine description, assembly language (twice)]


Calvin

• With current instrumentation tools, code runs slowly
• We just want to pay the penalty when collecting traces
• Code-cloning tracing allows it
• Calvin status:
  • built using Salto
  • works on single-user applications
• Overall project: instrument a Linux workstation




My conclusion

• Architecture is fun
• Simulation is:
  • boring (always)
  • necessary (sometimes)
  • misleading (often)
• As processors become more and more complex, « numbers » will become less and less accurate. Only trust tendencies.
• Tools are more and more needed