scalapipe: a streaming application generatorroger/transfer/saahpc-2012.pdf · scalapipe: a...

34
ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron 1 This work is supported by the National Science Foundation under grants CNS-09095368 and CNS-0931693.

Upload: others

Post on 20-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

ScalaPipe: A Streaming Application Generator

Joseph G. Wingbermuehle, Roger D. Chamberlain,

Ron K. Cytron

1

This work is supported by the National Science Foundation under grants CNS-09095368 and CNS-0931693.

Page 2: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Streaming

Computation kernels, or blocks connected by explicit communication channels

Stage 1 Stage 2 Stage 3

2

Advantages: – Performance – Reuse – Abstraction

Systems: – Auto-Pipe [Fr06]

– Streams-C [Go00] – StreamIT [Th02]

Page 3: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Example:Solution to Laplace’s Equation

• PDE with several uses, including stationary heat diffusion • Solvable using a Monte-Carlo technique

3

Page 4: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Streaming Implementation

4

Random Walk Print

Page 5: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Parallel Walks

5

Random Split

Walk

Walk

Average Print

Page 6: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Auto-Pipe X

X Description

C Block

X compiler

VHDL Block

Application

6

&

Page 7: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Edge Connections

Laplace Application in X

block top { ! Random rand; Split split; Walk walk1; Walk walk2; Average avg; Print print; !e1: rand -> split; e2: split.y0 -> walk1; e3: split.y1 -> walk2; e4: walk1 -> avg.x0; e5: walk2 -> avg.x1; e6: avg -> print; };

Labels

Block instances

Blocks are implemented

externally in C or an HDL.

7

rand

split

walk1 walk2

avg

print

e1

e2 e3

e4 e5

e6

Page 8: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Observation 1As the number of Walk blocks increases,the amount of configuration code increases

8

ÊÊ

Ê

Ê

Ê

5 10 15Walk Blocks

20

40

60

80

100

Lines

128 Walk blocks requires 896 lines of X!

e1: rand -> split;!e2: split.y0 -> walk1;!e3: split.y1 -> walk2;!e4: walk1 -> avg.x0;!e5: walk2 -> avg.x1;!e6: avg -> print;

Two Walk blocks:

e1: rand -> split1; e2: split1.y0 -> split2; e3: split1.y1 -> split3; e4: split2.y0 -> walk1; e5: split2.y1 -> walk2; e6: split3.y0 -> walk3; e7: split3.y1 -> walk4; e8: walk1 -> avg1.x0; e9: walk2 -> avg1.x1; e10: walk3 -> avg2.x0; e11: walk4 -> avg2.x1; e12: avg1 -> avg3.x0; e13: avg2 -> avg3.x1; e14: avg3 -> print;

Four Walk blocks:

Page 9: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Our ApproachType-safe generator language

9

val Laplace = new AutoPipeApp { ! val random = Random() val splits = iteratedMap(levels, random, SplitU32) ! val walks = Array.tabulate(1 << levels) { x => Walk(splits(x))() } ! val result = iteratedFold(walks, AverageU32) Print(result) }

Same code can generate 1 Walk block or 128 Walk blocks.

Page 10: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Observation 2

Moving blocks to a new device requires reimplementation

10

HDL Implementation

C Implementation Others

Page 11: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Our ApproachA single language for block implementations

11

ScalaPipe Block

HDL Implementation

C Implementation Others

Page 12: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Observation 3Changing the data type requires new block implementations

12

module ShiftRightU32(...); ! input wire[31:0] input_x; output wire[31:0] output_y;!!...! output_y <= input_x >> 1;!... !endmodule

module ShiftRightS64(...); ! input wire[63:0] input_x; output wire[63:0] output_y;!!...! output_y <= input_x >>> 1;!... !endmodule

Page 13: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Our Solution

Polymorphic block implementations

13

class Average(t: AutoPipeType) extends AutoPipeBlock { ! val in0 = input(t) val in1 = input(t) val out = output(t) ! out = (in0 + in1) / 2 }

Same implementation works for integral, fixed point, and floating point types.

Page 14: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Observation 4

The block interface for blocks on the same resource is a bottleneck

14

Block 1 Implementation

Block 2 Implementation

Runtime System

Block Interface Block Interface

Page 15: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Compiler

Our Approach

Single compiler for both the block language and coordination language.

15

Coordination Language Block Language

Page 16: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

ScalaPipe

Source code (Scala)

Scala compiler Generator

ScalaPipe Library

Coordination DSL

Block DSL

Application 1 (e.g. 2 Walks)

16

Application 2 (e.g. 8 Walks)

Page 17: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

AverageU32 Block

val AverageU32 extends AutoPipeBlock { ! val in0 = input(UNSIGNED32) val in1 = input(UNSIGNED32) val out = output(UNSIGNED32) ! out = (in0 + in1) / 2 }

17

AverageU32 out

in0

in1

Page 18: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Polymorphic Average Block

18

class Average(t: AutoPipeType) extends AutoPipeBlock { ! val in0 = input(t) val in1 = input(t) val out = output(t) ! out = (in0 + in1) / 2 }!!val AverageU32 = new Average(UNSIGNED32)

t can be any of the following:

•Signed or unsigned integer of any width

•Fixed point type

•Floating point type

Page 19: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Language Virtualization [Ch10]

19

class Repeat(v: Int, count: Int) extends AutoPipeBlock { ! val in = input(SIGNED32) val out = output(SIGNED32) val tmp = local(SIGNED32) ! tmp = in if (tmp == v) { // Evaluated at run time for (i <- 1 to count) { // Expanded at compile time out = tmp } } else { out = tmp } }

Page 20: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

External AverageU32

val AverageU32 = new AutoPipeBlock { ! val in0 = input(UNSIGNED32) val in1 = input(UNSIGNED32) val out = output(UNSIGNED32) ! external(“HDL”, “AverageU32”) ! // Optional internal implementation }

• Potentially more efficient • External and internal blocks can be mixed

20

Page 21: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Block Code Generation

Internal Block Specification

Abstract Syntax Tree

Control Flow Graph

C

OpenCL C

Verilog

External Block Specification

21

Optimizer

Page 22: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

HDL Code Optimizer

• Common subexpression elimination • Dead store elimination • Dead code elimination • Strength reduction • Copy propagation • ASAP scheduling

22

Page 23: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Coordination DSLDescribes the topology and resource mapping

23

val Laplace = new AutoPipeApp { ! val random = Random() val splits = iteratedMap(levels, random, SplitU32) ! val walks = Array.tabulate(1 << levels) { x => Walk(splits(x))() } ! val result = iteratedFold(walks, AverageU32) Print(result) }

Page 24: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Generating Pipelines

24

def pipeline(s: Stream,! b: AutoPipeBlock,! n: Int): Stream = {! if (n > 0) {! pipeline(b(s), b, n - 1)! } else {! s! }!}

Inc Inc Inc Inc

val result = pipeline(source,! Inc, 4)

block pipeline {!! input UNSIGNED32 source;! output UNSIGNED32 result;!! Inc inc1;! Inc inc2;! Inc inc3;! Inc inc4;!! source -> inc1;! inc1 -> inc2;! inc2 -> inc3;! inc3 -> inc4;! inc4 -> result;!};

ScalaPipe:X language:

Page 25: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

CPU 0 CPU 0FPGA 0

Aspect-Oriented Resource Mapping

25

Random Split

Walk

Walk

Average Print

map(ANY_BLOCK -> Print, FPGA2CPU()

map(Random -> ANY_BLOCK, CPU2FPGA())

Page 26: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

TimeTrial [La11]

measure(ANY_BLOCK -> Walk, ‘backpressure)

10 20 30 40 50 60Frame

20

40

60

80

100% Backpressure

26

Random Split

Walk

Walk

Average Print

How do we find bottlenecks?

Page 27: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

CPU FPGA 16 Walks Custom RNG

50

100

150

200

Time HsL

101s

CPU 0

Illustration of Use

27

RNG Walk Print

Page 28: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

CPU FPGA 16 Walks Custom RNG

50

100

150

200

Time HsL

101s

174s

CPU 0

Illustration of Use

28

FPGA 0

RNG Walk Print

83% Backpressure

Page 29: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

CPU FPGA 16 Walks Custom RNG

50

100

150

200

Time HsL

101s

174s

41s

CPU 0

Illustration of Use

29

FPGA 0

RNG Split

Walk

0% Backpressure

Walk

Print

Page 30: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

CPU 0

Illustration of Use

30

FPGA 0

cRNG Split

Walk

Walk

Print

CPU FPGA 16 Walks Custom RNG

50

100

150

200

Time HsL

101s

174s

41s

12s

Page 31: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

The Current State of ScalaPipe

•Code generation for CPUs, FPGAs, and GPUs •FPGA and GPU code generation is suboptimal •No cross-block optimizations

31

Page 32: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

The Future of ScalaPipe

•Improved code generation -Consume multiple items at a time -More Verilog and OpenCL C optimizations

•Support for more devices •Library generation •Cross-block optimizations

32

Page 33: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

Conclusion

•ScalaPipe is a streaming application generator •The block DSL allows code reuse across data

types and platforms •The coordination DSL allows easy generation of

large and complex topologies •Keeping everything in the same language

exposes optimization opportunities

33

ScalaPipe

Block DSLCoordination DSL

Page 34: ScalaPipe: A Streaming Application Generatorroger/transfer/saahpc-2012.pdf · ScalaPipe: A Streaming Application Generator Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron

References

H. CHAFI, Z. DEVITO, A. MOORS, T. ROMPF, A. K. SUJEETH, P. HANRAHAN, M. ODERSKY, AND K. OLUKOTUN, “Language virtualization for hetero- geneous parallel computing,” in Proc. of ACM Int’l Conf. on Object Oriented Programming Systems, Languages, and Applications, 2010, pp. 835–847.

J.M. LANCASTER, J. G. WINGBERMUEHLE, AND R. D. CHAMBERLAIN, “Asking for performance: Exploiting developer intuition to guide instrumentation with TimeTrial,” in Proc. of IEEE 13th Int’l Conf. on High Performance Computing and Communcations, Sep. 2011, pp. 321–330.

M. A. FRANKLIN, E. J. TYSON, J. BUCKLEY, P. CROWLEY, AND J. MASCHMEYER, “Auto-Pipe and the X language: A pipeline design tool and description language,” in Proc. of Int’l Parallel and Distributed Processing Symp., Apr. 2006.

M. B. GOKHALE, J. M. STONE, J. ARNOLD, AND M. KALINOWSKI, “Stream- oriented FPGA computing in the Streams-C high level language,” in Proc. of IEEE Symp. on Field-Programmable Custom Computing Machines, Apr. 2000, pp. 49–56.

W. THIES, M. KARCZMAREK, AND S. AMARASINGHE, “StreamIt: A language for streaming applications,” in Proc. of 11th Int’l Conf. on Compiler Construction, 2002, pp. 179–196.

34