André Seznec Caps Team
IRISA/INRIA
Simulation: a user point of view
André Seznec
IRISA/INRIA
Sept. 1998
Myself (1)
Senior researcher
Working on computer architecture for 15 years
Works on:
• memory systems
• pipeline structure
• cache structures
• branch prediction mechanisms
• Simultaneous Multithreading
Myself (2)
Interested in computer architecture
For me, tools are the dark side of architecture!
Validating microarchitecture concepts
Just a description with some explanation:
• at the beginning of the 80's, not so bad
Analytical model?
• may work for coarse-grain evaluation on networks, on multiprocessors, ...
• for microarchitecture, just be serious!
Simulation:
• I have not found a better method!
• but I do not overtrust simulation results
Simulation: my own experience
1986-88: DSPA and multi-DSPA
1988-91: OPAC floating-point coprocessor
1991-96: cache simulation
1993-..: processor simulation
1996-..: branch prediction
1994-..: Simultaneous Multithreading
DSPA and multi-DSPA
Decoupled Access Execute Architecture
Shared memory architecture
Original memory and interconnection network
Hardware FIFOs everywhere!
DSPA simulation
Primitive!
Benchmarks:
• just a few numerical kernels
• hand-coded assembly
Cycle-accurate simulation!
Validation of the memory system
OPAC floating-point coprocessor
Floating-point coprocessor
Dedicated to compute-bound kernels:
• matrix operations: BLAS3 library
• FFTs, convolutions, ...
Built real hardware!
• a 300-IC board
• a special-purpose VLSI sequencer
OPAC (2)
Developed in parallel:
• HDL simulator on CAD tool
• C simulator
• pseudo-language + pseudo-compiler
• applications
Total interactions:
• completely accurate simulator
• design decisions based on:
  • hardware constraints
  • performance evaluation
  • code generation
OPAC (3)
It was real fun!
We learned a lot of things!
Research impact: ??
Killer micros appeared during this period
Cache simulation
We began in 1991
Among the first groups in Europe
We had to learn everything:
• how to get the traces
• which benchmarks
• how to simulate
• what is important
Getting the first traces:
ATUM traces:
• hardware-monitored VAX traces
• available by ftp
• very short!
DLX traces:
• available by ftp
SparcSim simulator:
• very, very slow
Choosing the first benchmarks:
Picking our own applications!
Not a worse choice than SPEC92 or SPEC95
But reviewers like standards!
The first simulators:
Tried the Dinero simulator from Mark Hill
But: skewed-associative caches?
First results:
Skewed-associative caches (1992):
• simulation helps to convince the reviewers
• good presentation and good figures are far more important
• five years later, simulation results have just become noise
Semi-unified caches (1993):
• simulation was needed to quantify the benefit
• unfortunately, we just picked the wrong conference
Further cache studies
Using more widely accepted traces: SPEC92-95, SPLASH, etc.
• trace collection through spa (Gordon Irlam)
Quantifying the impact through simulations
Explaining and analyzing is more important
Other simulations
Cache simulation is a piece of cake
« Real architects » simulate complete processors:
• scalar processors
• in-order superscalar processors
• out-of-order superscalar processors
• Simultaneous Multithreading processors
Next needed step: complete wrong-path simulation
Two block ahead branch prediction
ASPLOS 96
Great idea!
Poor simulation methodology!
Performance impact ignored!
Two block ahead branch predictor (continued)
Now:
• complete processor simulation
• IBS traces (hardware monitored)
• pros and cons understood:
  • high instruction bandwidth
  • misprediction penalty problem
Limitations:
• no wrong-path execution
• simplified execution core
• IBS traces (1993, 16 MHz processor, 16 MB memory)
Simultaneous Multithreading
Several processes sharing functional units in a superscalar processor.
First paper (Tullsen et al.): 1995
We began in parallel, in 1993: one year too late
Simultaneous Multithreading (2)
We got results on:
• branch prediction
• cache behavior
• in-order versus out-of-order execution
Methodology:
• trace-driven simulation
• mixing traces from different processes
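The methodology above (mixing traces from different processes) can be illustrated with a minimal sketch. The slide does not say how the traces were interleaved, so the round-robin policy, the `quantum` parameter and the name `mix_traces` are all assumptions made for illustration only.

```python
def mix_traces(traces, quantum=1):
    """Interleave several per-process traces, round-robin.

    Hypothetical helper: `traces` is a list of iterables (one per process),
    `quantum` is the number of consecutive entries taken from a process
    before moving to the next one. Yields (process_id, entry) pairs.
    """
    iterators = [iter(t) for t in traces]
    active = list(range(len(iterators)))
    while active:
        # Iterate over a copy, since exhausted processes are removed.
        for pid in list(active):
            for _ in range(quantum):
                try:
                    yield pid, next(iterators[pid])
                except StopIteration:
                    active.remove(pid)
                    break
```

The mixed stream can then feed a single simulated cache or branch predictor, which is one way to observe interference between processes. It says nothing about operating system activity or context switches, which the next slide lists as known limitations.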
Simultaneous Multithreading (3)
Known limitations:
• operating system
• context switches
• spectrum of applications
• wrong-path execution
However: solid results
Skewed Branch Predictors
[Figure: e-gskew.eps (image not preserved)]
Skewed Branch Predictors
The most complete analysis we have ever done:
• explanations
• simulations
• mathematical analysis
Likely to become a reference paper
What I have learned from my experience
Simulations help to:
convince:
• yourself (most important)
• most of the reviewers (that's life)
perform in-depth analysis:
• discover why it works
• explain to « real » architects
What I have learned from my experience (2)
Don't overtrust simulation results!
Be aware: a simulation result is just one measurement point
Traces are ALWAYS limited:
• toy applications
• and/or lack of operating system activity
• and/or old applications:
  • IBS traces: 16 MHz scalar processor, 16 MB memory
  • future processors: > 1 GHz, 10-way superscalar, > 1 GB memory
Simulator limitations
« Complete » simulation:
• complex (tens of thousands of lines of code)
• CPU-time consuming
Always some simplifying assumptions:
• sometimes valid, sometimes not ...
Simulators are slow!
Simulator User Limitations
99% of the simulation results go directly to the garbage:
• just a bug in the simulator!
• « This idea was ridiculous! » (two months of work)
• just lacking the interesting measurement!
• « Finally, I do not have room for this graph! »
Trusting or distrusting YOUR OWN simulation results
Trust the tendencies:
• « This mechanism behaves better than this other one »
Always distrust absolute numbers:
• « A 32-Kbyte cache is sufficient as it exhibits a 1% miss rate »
  • on your benchmark, with your tracing tool, without kernel, without context switches, ...
As a reviewer or PC member
I just do not trust simulation results that I do not understand
What is needed:
• clear explanation
• insight into why things work
• insight into possible limitations
Simulation-free papers are sometimes refreshing:
• « Difference-bit cache », Toni Juan et al., 1996
Needs for microarchitecture simulation today
The performance of some existing processors was misestimated by about 20% by the manufacturer
Wrong-path execution
Kernel activities
Execution or trace-driven simulation ?
Trace-driven simulation is sufficient for:
• comparing two cache structures or two branch predictors
• in-order processor simulation
Execution-driven simulation is required for:
• precise out-of-order processor simulation
• studying bandwidth impact
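As an illustration of the first case (comparing two cache structures with a trace), here is a minimal sketch of a trace-driven miss-rate counter. It is not the simulator used in the work described here: the LRU replacement policy, the 32-byte line size and the name `miss_rate` are assumptions chosen to keep the example short.

```python
from collections import OrderedDict

def miss_rate(trace, num_sets, ways, line_size=32):
    """Miss rate of a set-associative LRU cache on a trace of byte addresses.

    Hypothetical helper: `ways=1` models a direct-mapped cache.
    """
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in trace:
        total += 1
        block = addr // line_size
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses / total if total else 0.0
```

Running the same trace through `miss_rate(trace, num_sets=4, ways=1)` and `miss_rate(trace, num_sets=2, ways=2)` compares a direct-mapped and a two-way cache of equal capacity. In the spirit of the earlier slides, the ordering between the two is the result worth trusting, not the absolute miss rates.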
Which « traces »?
We need real workloads:
• all processes running on a workstation
• user and kernel activities
• on real data
Ideally, let us capture the whole activity of a workstation for one second every hour: Calvin
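The sampling idea (one second of activity every hour) can be sketched as a filter over timestamped events. This only illustrates the sampling window; it does not describe how Calvin captures anything. The event format, `(timestamp_in_seconds, payload)`, and the name `sample_windows` are assumptions.

```python
def sample_windows(events, window=1.0, period=3600.0):
    """Keep the events that fall in the first `window` seconds of each `period`.

    Hypothetical helper: `events` is an iterable of (timestamp, payload)
    pairs, with timestamps in seconds since the start of monitoring.
    """
    return [(t, payload) for t, payload in events if (t % period) < window]
```

With the default parameters, an event at 3600.2 s is kept (it falls in the first second of the second hour), while an event at 1.5 s is dropped.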
TOOLS, WHAT WE ARE DOING
Salto: a System for Assembly Languages Transformation and Optimizations
Calvin: Cloning Assembly Languages in View of Instrumentation Needs
Salto Overview
• assembly source-to-source preprocessor
• retargetable: exists for SPARC, Alpha, MIPS, Philips TM-1000, Pentium, TI C6x
• can be used:
  • to instrument or transform assembly code
  • to schedule assembly code
  • for register allocation, basic block layout, etc.
  • to derive simulators
• fine-grain machine description
• object-oriented interface
Salto Organisation
[Figure: block diagram with the labels: transformation tool, C++ interface, SALTO, machine description, assembly language (twice)]
Calvin
With current instrumentation tools, codes run slowly
We only want to pay the penalty when collecting traces
Code cloning tracing allows it
Calvin status:
• built using Salto
• works on single-user applications
Overall project: instrument a Linux workstation
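The code cloning idea can be pictured as keeping two copies of the same code, one plain and one instrumented, and running the instrumented copy only while traces are being collected. The sketch below is an analogy at the level of Python functions, not a description of Calvin, which works on assembly code through Salto. The switch mechanism, `ClonedCode`, `kernel_plain`, `kernel_traced` and `trace_log` are all hypothetical.

```python
trace_log = []  # collected trace records (hypothetical format)

def kernel_plain(values):
    """Original code: no instrumentation, no tracing overhead."""
    total = 0
    for v in values:
        total += v
    return total

def kernel_traced(values):
    """Instrumented clone: same computation, but records each access."""
    total = 0
    for i, v in enumerate(values):
        trace_log.append(("load", i))
        total += v
    return total

class ClonedCode:
    """Dispatches to the plain or the instrumented clone."""

    def __init__(self, plain, traced):
        self.plain = plain
        self.traced = traced
        self.tracing = False  # flip to True only while collecting traces

    def __call__(self, *args):
        return (self.traced if self.tracing else self.plain)(*args)

kernel = ClonedCode(kernel_plain, kernel_traced)
```

While `kernel.tracing` is `False`, calls run the plain clone and nothing is logged. Setting it to `True` for a short window switches to the instrumented clone, so the instrumentation penalty is paid only during that window.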
My conclusion
Architecture is fun
Simulation is:
• boring (always)
• necessary (sometimes)
• misleading (often)
As processors become more and more complex, « numbers » will become less and less accurate. Only trust tendencies.
Tools are more and more needed