fist: a fast, lightweight, fpga-friendly packet latency...

26
FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations 5/3/2011 Michael K. Papamichael, James C. Hoe, Onur Mutlu [email protected], [email protected], [email protected] Computer Architecture Lab at Our work was supported by NSF. We thank Xilinx and Bluespec for their FPGA and tool donations.

Upload: vothu

Post on 28-May-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency

Estimator for NoC Modeling in Full-System Simulations

5/3/2011

Michael K. Papamichael, James C. Hoe, Onur [email protected], [email protected], [email protected]

Computer Architecture Lab at

Our work was supported by NSF. We thank Xilinxand Bluespec for their FPGA and tool donations.

Page 2: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Simulation in Computer Architecture

Slow for large-scale multiprocessor studies

Full-system fidelity + long benchmarks

How can we make it faster?

Speed, accuracy, flexibility trade-off

Full-system simulators sacrifice accuracy for speed and flexibility

Speed

Accuracy Flexibility

Accelerate simulation with FPGAs

Can simulate up to millions of gates

2

Orders of magnitude simulation speedup

Page 3: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

The FIST Project

3

Explores fast NoC models for full-system simulations FPGA-friendly, but avoid direct implementation Low error, many topologies, >10M packets/sec

Simpler requirements of full-system simulation Estimate packet latencies, capture high-order effects

L2L1P

L2L1P

L2L1P

L2L1P

R

R

R

R

FPGA

(Xilinx LX110T)

FPGA area requirements

for state-of-the-art mesh NoC*

4x4

5x5

*NoC RTL from http://nocs.stanford.edu/router.html

Page 4: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

FIST Approach

View NoC as set of routers/links Abstract router into black-box Represent by load-delay curves Specific to each router configuration and traffic pattern

R

R

R

R R

R R

R R

NN

N

N

N

NN

N N

R

R

R R

RR

4

N

RStateLogic

BuffersRRLa

ten

cy

Load

Page 5: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

N

N

N

N

N

N

FIST Approach

Treat each hop as a set of load-delay curves Trade-off between model complexity and fidelity

Keep track of load at each node To track router load monitor traffic over window of time

R

R

R

R

R

RN

N

N

R

R

R

5

Late

ncy

Load

Page 6: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

N

N

N

N

N

NN

N

N

FIST in Action

Route packet from source to destination Determine routers that will be traversed

Sum up the delays for each traversed router Index load-delay curves using current load at each router

R

R

R

R

R

R

R

R

R

S

D

packet

delay

6

Page 7: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Outline

Introduction to FIST FIST-based Network Models Evaluation Related Work & Conclusions

Page 8: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Outline

Introduction to FIST FIST-based Network Models Evaluation Related Work & Conclusions

Page 9: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Fee

db

ack

Putting FIST Into Context

Train Curves Use Curves

Network models within full-system simulators Model network within a broader simulated system

Assign delay to each packet traversing the network

Traffic generated by real workloads

Up

date

d C

urve

s

Detailed network models Cycle-accurate network simulators (e.g. BookSim)

Analytical network models

Typically study networks under synthetic traffic patterns

9

Page 10: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Offline and Online FIST

Offline FIST Detailed network simulator generates curves offline Can use synthetic or actual workload traffic Load curves into FIST and run experiment

Detailed Network Model

Online FIST (tolerates dynamic changes in network behavior) Initialization of curves same as offline Periodically run detailed network simulator on the side Compare accuracy and, if necessary, update curves

Detailed Network Model

Provide feedback and receive updated curves

10

Page 11: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Online Training in Action

El

05

101520253035404550

0

66

13

2

19

8

26

4

33

0

39

6

46

2

18

84

19

50

20

16

20

82

21

48

22

14

22

80

23

46

24

12

24

78

Late

ncy

Ellapsed Cycles (in 1000s)

Actual Latency

Estimated Latency

El

Elapsed cycles (in 1000s)

Example with no initial training

Before Training

After Training

Page 12: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

FIST Applicability

“FIST-Friendly” Networks

Exhibit stable, predictable behavior as load fluctuates

Actual traffic similar to training traffic

FIST Limitations

Depends on fidelity, representativeness of training models

Higher loads and large buffers can limit FIST’s accuracy

High network load increased packet latency variance

Large buffers increased range of observed packet latencies

Cannot capture fine-grain packet interactions

Cannot replace cycle-accurate detailed network models

FIST only as good as its training data12

Page 13: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Applying FIST to NoCs

NoCs affected by on-chip limitations and scarce resources

Employ simple routing algorithms

Usually simple deterministic routing

Operate at low loads

NoCs usually over-provisioned to handle worst-case

Have been observed to operate at low injection rates

Small buffers

On-chip abundance of wires reduces buffering requirements

Amount of buffering in NoCs is limited or even eliminated

NoCs are “FIST-Friendly”13

Page 14: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Outline

Introduction to FIST FIST-based Network Models Evaluation Related Work & Conclusions

Page 15: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

FIST Implementations

PacketDescriptors

Router Elements

Src

Dest

Size

CalculateDelay

Pickrouters

Routing Logic

Packet

Delays

Partial DelaysB

RA

M

Load Tracker

Curve

Software Implementation of FIST (written in C++) Implements online and offline FIST models

Hardware Implementation (written in Bluespec) Precisely replicates software-based FIST Block diagram of architecture

15

Handle load tracking

& delay queries

Page 16: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Peeking Under The Hood

Similar issues arise for load tracking & dynamic training

Tracking Latency T 567 4 23 H

R0

S D

R1 R2

Wormhole-routed

Injection latencyTraversal latencies

?

Packet Latency = 30

Latency =

Latency =

Latency =

4 5 6 7 TH

TH

0 10 20 30

TH

5 15 25

2 3

4 5 6 72 3

52 3

R0R1R2 4

R0R1R2

Store-and-forwardPacket Latency = 30

Latency = 9

Latency = 14

Latency = 7

4 5 6 7 TH

TH

0 10 20 30

TH

5 15 25

2 3

4 5 6 72 3

5 6 742 3

R0R1R2

R0R1R2

Use separate injection and traversal latency curves per router

Page 17: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Methodology

Examined online and offline FIST models Replaced cycle-accurate NoC model in tiled CMP simulator

Network and system configuration 4x4, 8x8, 16x16 wormhole-routed mesh Each network node hosts core+coherent L1 and a slice of L2

Multiprogrammed and multithreaded workloads 26 SPEC CPU2006 benchmarks of varying network intensity 8 SPLASH-2 and 2 PARSEC workloads

Traffic generated by cache misses

Consists of control, data and coherence packets

Offline and Online FIST models with two curves per router Curves represent injection and traversal latency at each router Initial training using uniform random synthetic traffic

Please see paper for more details!17

Page 18: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Accuracy Results (offline)

8x8 mesh using FIST offline model

Average Latency and Aggregate IPC Error

-15%

-10%

-5%

0%

5%

10%

15%Latency ErrorIPC Error

MT (SPL/PAR)MP (High)MP (Med)MP (Low)

Late

ncy

/IP

CEr

ror

(in

%)

IPC Error < 4%

Latency Error < 8%

18

Page 19: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Accuracy Results (online)

-0.1

-0.05

0

0.05

0.1

Late

ncy

Err

or

Latency Error

IPC Error

MT (SPL/PAR)MP (High)MP (Med)MP (Low)

8x8 mesh using FIST online model

Average Latency and Aggregate IPC Error

Both Latency and IPC Error below 3%

19

Late

ncy

/IP

CEr

ror

(in

%)

-10%

-5%

0%

5%

10%

Page 20: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

What about a very simple model?

Very high error for both latency and IPC!20

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

Late

ncy

Erro

r

Latency Error

IPC Error

MT (SPL/PAR)MP (High)MP (Med)MP (Low)

Late

ncy

/IP

CEr

ror

(in

%)

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

Late

ncy

Erro

r

Latency Error

IPC ErrorLatency ErrorIPC Error

-10%

-20%

0%

20%

40%

60%

80%

-40%

-60%

-80%

8x8 mesh using hop-based model How does simple network model affect high-order results?

FIST models always within this range

Page 21: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Performance Results

Qualitative comparison

10K

100K

1M

10M

100M

SW FIST

HW FIST

Complexity / Level of Detail

Sp

ee

d (

pa

cke

ts/s

ec)

21

1K

SW NoC Sims(e.g. BookSim)

SW-based speedup results for 16x16 mesh Offline FIST: 43x Online FIST: 18x

HW-based speedup (offline): ~3-4 orders of magnitude

Page 22: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Hardware Implementation Results

FPGA resource usage & clock frequency

Different mesh configurations

Xilinx Virtex-5 LX155T FPGA

FIST Model Direct Implementation

Size FPGA Area Freq. FPGA Area Freq.

4x4 4% 380 MHz 61% 130 MHz

8x8 15% 263 MHz - -

12x12 34% 250 MHz - -

16x16 60% 214 MHz - -

20x20 94% 200 MHz - -

FIST can scale to large NoCs with many routers

Will not fit

Page 23: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Outline

Introduction to FIST FIST-based Network Models Evaluation Related Work & Conclusions

Page 24: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Related Work

Vast body of work on network modeling

Analytical models, hardware prototyping, etc.

Abstract network modeling

Performance vs. accuracy trade-off studies [Burger 95]

Load-delay curve representation of network [Lugones 09]

FPGAs for network modeling

Cycle-accurate fidelity at the cost of limited scalability

Time-multiplexing can help with scalability [Wang 10]

But still suffer from high implementation complexity

Page 25: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Conclusions & Future Directions

Conclusions Full-system simulators can tolerate small inaccuracies

FIST can provide fast SW- or HW-based NoC models

SW model provides 18x-43x average speedup w/ <2% error

HW model can scale to 100s routers with >1000x speedup

NoCs within a CMP are “FIST-friendly”

But not all networks good candidates for FIST modeling

Future Directions FPGA-friendly NoC models at multiple levels of fidelity

Configurable generation of hardware NoC models

Page 26: FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency ...users.ece.cmu.edu/~omutlu/pub/papamichael_nocs11_talk.pdf · FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator

CALCM Computer Architecture Lab at Carnegie Mellon

Thanks!

Questions?