synthetic traffic generation - slip onlinesliponline.org/slip07/presentations/4c_heirman.pdf ·...
TRANSCRIPT
Synthetic Traffic Generation:a Tool for Dynamic Interconnect Evaluation
W. Heirman, J. Dambre, J. Van CampenhoutELIS Department, Ghent University, Belgium
Sponsored by IAP-V PHOTON & IAP-VI photonics@be, Belgian Science Policy Office
PHOT
ONnetwork
2
Outline
• Introduction• Synthetic traffic generation• Results• Conclusions
3
Distributed shared-memory architecture
CPUMEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
supercomputer
onon--chipchip
server
Network is part of the memory hierarchy
cache
instruction: 0.5 ns
cache: 5 ns
DDR: 50 ns
network: 500 ns
4
Interconnect requirementsNon-uniform network traffic in space and time
=> Reconfigurable network?
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
time
load
Link #5
time
load
Link #9
time
load
Link #13
5
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
CPU MEM
NetIF
Reconfiguration implementation:base network + extra reconfigurable links
Base network (fixed)Extra links (reconfigurable)
CPU 1
CPU 2
...
CPU n
Broadcast element
Fiber links
Processor nodes
Tunable lasers
CPU 1
CPU 2
...
CPU n
Photodetectors
other ‘dynamic networks’: e.g. per-link voltage scaling
6
Evaluate networks with synthetic traffic
• Mimics the behavior of real traffic• But without the computational cost of
modeling application, OS, CPUs, caches, …
ApplicationOS
CPUCaches
Reconfigurablenetwork
Synthetictraffic generator
Reconfigurablenetwork
network traffic
simulator simulator
simulation time ÷10
7
We need better synthetic traffic
Reconfiguration exploits low-frequency dynamics in the network traffic
• Trace-driven simulation using static traffic patterns (uniform, hotspot, shuffle, …)won’t do!
• Full execution-driven simulation (traffic is driven by application:FFT, weather forecast, database) is too slow!
8
Outline
• Introduction• Synthetic traffic generation• Results• Conclusions
9
Realistic synthetic traffic generation
• One execution-driven simulation
• Resulting traffic profile re-used many times
ApplicationOS
CPUCaches
Synthetictraffic generator
Reconfigurablenetwork
network traffic
Parameterextraction
Statistical traffic profile
×1 ×n
10
Preserve packet-interdependencies by using packet groups
Packets are processed/generated in groups, corresponding to one memory operation each
processor
home
REQ (1) REPLY (2)
processor
home
REQ (1) REPLY (4)
owner
WBreq (2) WBreply (3)
processor
home
REQ (1) REPLY (4)
sharer
INVreq (2) INVreply (3)
sharer ...
(3) (2)
11
Distribution of # involved nodes...
2 3 n
12
Reuse distance of home nodes:introduce locality
previous destinations
next destination
node A
node Cnode Bnode Anode D…
2
13
Computation or ‘think’ timeModels time delay between subsequent requests
14
Outline
• Introduction• Synthetic traffic generation• Results• Conclusions
15
Simulations
• Simulation platform: Simics, providing functionalmultiprocessor simulation– 16 UltraSPARC III processors– SPLASH-2 parallel benchmarks
• Timing model:– Computes the latency
for each memory access– Models caches, interconnection network– Base network: 4x4 torus– Extra links: configurable number,
fan-out, reconfiguration interval
16
Simulations
Once per benchmark:Simulate execution of the benchmark, base network only, measuring traffic profile (1)
For each set of extra link parameters:• Execution-driven simulation with reconfigurable network (2)
“correct” result• Trace-driven simulation using (simplified) traffic from (2)
tracing error• Trace-driven simulation using (simplified) traffic from (1)
traffic-dependence on network• Trace-driven simulation using synthetic traffic
total error
17
Several parameters can be measured
18
Detailed view of “average packet latency”
19
Variability for shorter traces
trace-driven +profiling* exec-driven* assuming traffic profile is re-used 100 times
synthetic traffic execution-driven
20
Outline
• Introduction• Synthetic traffic generation• Results• Conclusions
21
Conclusions
• Synthetic traffic generation was extended to– shared-memory cache-coherence protocols,– reconfigurable networks
• Good relative accuracy for different network topologies
• Much less computationally expensive (x10), even more so for shorter traces (x100)
• Reproducibility equal to or better than execution-driven simulations