
PLD Synthesis Algorithms (cadlab.cs.ucla.edu/~cong/slides/PLD Synthesis Algorithms_DAC LAs Vegas.pdf)

6/22/2001 DAC 2001 Tutorial: Jason Cong 1

PLD Synthesis Algorithms

Professor Jason Cong
Computer Science Department

University of California, Los Angeles
Los Angeles, CA 90095

http://cadlab.cs.ucla.edu/~cong

What to Synthesize

• Structured logic
  - Examples: datapath, register files, …
  - Best provided by FPGA vendors as libraries and functional generators

• Random logic
  - Examples: control circuits, finite state machines
  - Good candidates for synthesis by automatic tools

Programmable Logic Blocks (PLBs) in FPGAs

• Lookup-table based
  - Altera APEX and FLEX devices
  - Lucent Technologies ORCA devices
  - Xilinx Virtex and XC4K devices

• MUX-based
  - Actel ACT1 and ACT2
  - QuickLogic Eclipse

• PLA-based (CPLD)
  - Altera MAX7000
  - Cypress CY37000 and CY39000

Focus of This Talk

• Synthesis for random logic
  - Needs a high degree of automation
  - Much room for optimization
  - Extensive research

• Synthesis for SRAM-based (LUT-based) FPGAs
  - Largest share of the FPGA market
  - Synthesis-friendly
  - Reconfigurability enables many potential applications

Formulation of LUT-Based Synthesis Problems

• Logic optimization (network transformation)
  - Transform the input network into another network that is more suitable for mapping into LUT networks

• Technology mapping (LUT covering)
  - Cover the optimized network with LUTs for one or more objectives

Logic Optimization Operations
Example: decomposition

• Structural: abcd = ((ab)c)d
• Functional: f(a,b,c,d) = g(y1(a,b,c), y2(a,b,c), d)

[Figure: y1, y2, and d feed g, which produces f]

Logic Optimization Operations (Cont’d)

• Extraction: f = ac + bc, g = ad + bd; then f = ec, g = ed, with e = a + b
• Substitution: f = a + bc, h = bc; then f = a + h
• Elimination: f = a + bc, b = d + e; then f = a + cd + ce
• Critical-path re-synthesis
• …
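The operations above must preserve the function of the network. As a quick sanity check, the following minimal Python sketch verifies the three worked examples by exhaustive truth-table comparison. This approach only scales to a handful of inputs; practical tools typically use BDD- or SAT-based checks instead. The helper names `equivalent`, `e`, and `h` are illustrative and are not taken from any tool.

```python
from itertools import product


def equivalent(f, g, nvars):
    """Exhaustively check that f and g agree on all 2**nvars input assignments."""
    return all(f(*xs) == g(*xs) for xs in product((False, True), repeat=nvars))


# Extraction: f = ac + bc, g = ad + bd  ->  e = a + b, f = ec, g = ed
def e(a, b):
    return a or b


extraction_ok = equivalent(
    lambda a, b, c, d: (a and c) or (b and c),
    lambda a, b, c, d: e(a, b) and c, 4,
) and equivalent(
    lambda a, b, c, d: (a and d) or (b and d),
    lambda a, b, c, d: e(a, b) and d, 4,
)


# Substitution: f = a + bc, with an existing node h = bc  ->  f = a + h
def h(b, c):
    return b and c


substitution_ok = equivalent(
    lambda a, b, c: a or (b and c),
    lambda a, b, c: a or h(b, c), 3,
)

# Elimination: f = a + bc with b = d + e  ->  f = a + cd + ce
# (the inputs are a, c, d, e; the intermediate node b is collapsed into f)
elimination_ok = equivalent(
    lambda a, c, d, e_: a or ((d or e_) and c),
    lambda a, c, d, e_: a or (c and d) or (c and e_), 4,
)

print(extraction_ok, substitution_ok, elimination_ok)  # True True True
```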

Technology Mapping for K-LUT

Cover the network using K-LUTs: duplication-free vs. duplicated mapping (K = 3)

[Figure panels: original circuit; duplication-free mapping; mapping with duplication]

Outline

Early results (1990-95)

Recent advances (1995-1999)

New challenges (2000 - )

Outline

[Timeline: architecture and synthesis trends, 1990 to 2000 and beyond]

• Early (1990-95)
  - Architecture: simple, homogeneous LUTs (e.g., XC2K, FLEX8K)
  - Synthesis: homogeneous K-LUT mapping for depth and area minimization; focus on combinational circuits

• Recent (1995-99)
  - Architecture: heterogeneous FPGAs with embedded memory blocks and complex PLBs
  - Synthesis: heterogeneous FPGA mapping, mapping for EMBs, Boolean matching, simultaneous mapping + retiming

• New (2000- )
  - Architecture: million-gate FPGAs, field-programmable system-on-a-chip
  - Synthesis: layout-driven synthesis, use of IP blocks, synthesis for FPSOC

Outline

Early results (1990-95)
  - Developed for homogeneous LUTs
  - Focus on combinational circuits

Recent advances (1995-1999)

New challenges (2000 - )

Early Results: Depth Minimization

• Optimal mapping for trees: Chortle-d [Francis, Rose, Vranesic, ICCAD’91]

• Optimal mapping for general networks: FlowMap [Cong&Ding, ICCAD’92]

Early Result: FlowMap
Depth-optimal technology mapping [Cong&Ding, TCAD’94]

• Basic approach: compute a label for each node

• The label of a node is the minimum possible depth of the node over all mapping solutions

• Dynamic programming: starting from the PI nodes, compute node labels in topological order, deriving the label of each node from the labels of its predecessors

• The largest label among the PO nodes gives the depth of the optimal mapping solution

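Cuts in a Network
Given a cut (X, X̄), with the source s in X and the sink t in X̄, and a label l(v) on each node v:

• Node-cut size: n(X, X̄) = |{v ∈ X : (v, u) is a cut edge for some u ∈ X̄}|
• K-feasible cut: n(X, X̄) ≤ K
• Height of a cut: h(X, X̄) = max{l(v) | v ∈ X}

[Figure: network from source s to sink t with node labels 0 to 4 and a cut separating X from X̄]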
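The cut definitions can be made concrete with a small Python sketch. It assumes the network is stored as a dictionary from each node to its fanin list and takes X̄ as an explicit set. The source s is left implicit, so the PIs simply sit in X. `cut_metrics` and `is_k_feasible` are hypothetical helper names, and the example labels are the ones a depth-optimal labeling would assign for K = 3.

```python
def cut_metrics(fanins, labels, sink_side):
    """Return (cut nodes, node-cut size n, height h) for a cut (X, X-bar).

    fanins:    dict node -> list of fanin nodes (PIs have an empty list)
    labels:    dict node -> label l(v)
    sink_side: the set X-bar; it must contain the sink t. All other nodes form X
               (the source s is left implicit: PIs simply belong to X).
    """
    x_side = set(fanins) - set(sink_side)
    # Nodes of X with an edge into X-bar: these would become the LUT inputs.
    cut_nodes = sorted({u for v in sink_side for u in fanins[v] if u in x_side})
    n = len(cut_nodes)
    h = max(labels[v] for v in x_side)     # h(X, X-bar) = max{ l(v) | v in X }
    return cut_nodes, n, h


def is_k_feasible(n, k):
    return n <= k


# Hypothetical example: PIs a, b, c; x = f(a, b); y = f(b, c); t = f(x, y)
fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["b", "c"], "t": ["x", "y"]}
labels = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "t": 1}

print(cut_metrics(fanins, labels, {"t"}))            # (['x', 'y'], 2, 1)
print(cut_metrics(fanins, labels, {"t", "x", "y"}))  # (['a', 'b', 'c'], 3, 0)
print(is_k_feasible(3, 3))                           # True
```

With K = 3, both cuts are K-feasible. The second one has the smaller height, which is why t can share a single LUT with x and y.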

Label Computation in FlowMap
Dynamic programming: compute each node label (the optimal mapping depth) by computing a min-height K-feasible cut. A min-height K-feasible cut can be computed in O(Km) time using network-flow computation.

[Figure: LUT input size K = 3; primary inputs with label 0 and internal nodes (including u, v, w) with labels 1 and 2; an infeasible cut with h = 0 and a K-feasible cut with h = 1]
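The following compact Python sketch implements this per-node step. It assumes a K-bounded DAG stored as a fanin dictionary and labels that were computed in topological order. Following the FlowMap idea, p is the largest fanin label, and t together with every node of N_t labeled p is collapsed into the sink. Each remaining node is split into an in/out pair of unit capacity. Augmentation stops as soon as the flow exceeds K, which is what keeps the check at O(Km) per node. This is an illustrative reconstruction with hypothetical names, not the authors' code.

```python
from collections import deque

INF = float("inf")


def fanin_cone(fanins, t):
    """Return N_t: node t together with all of its transitive fanins."""
    seen, stack = {t}, [t]
    while stack:
        for u in fanins[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def flowmap_label(fanins, labels, t, k):
    """Return (l(t), X-bar_t) for a non-PI node t of a K-bounded network.

    Labels of all nodes in the fanin cone of t must already be known. With
    p = max fanin label, l(t) = p if N_t has a K-feasible cut of height p - 1,
    and l(t) = p + 1 otherwise (then X-bar_t = {t}).
    """
    p = max(labels[u] for u in fanins[t])
    if p == 0:                                   # all fanins are PIs
        return 1, {t}
    cone = fanin_cone(fanins, t)
    sink_set = {v for v in cone if v == t or labels[v] == p}  # collapsed into the sink
    res = {}                                     # residual capacities res[a][b]

    def add_edge(a, b, cap):
        res.setdefault(a, {})[b] = cap
        res.setdefault(b, {}).setdefault(a, 0)   # reverse (residual) edge

    for v in cone - sink_set:
        add_edge((v, "in"), (v, "out"), 1)       # node splitting: unit capacity per node
        if not fanins[v]:
            add_edge("S", (v, "in"), INF)        # source s drives every PI
    for v in cone:
        for u in fanins[v]:
            if u in sink_set:
                continue                         # edge inside the collapsed sink
            add_edge((u, "out"), "T" if v in sink_set else (v, "in"), INF)

    flow = 0
    while flow <= k:                             # at most K + 1 augmentations
        parent, queue = {"S": None}, deque(["S"])
        while queue and "T" not in parent:       # BFS for an augmenting path
            a = queue.popleft()
            for b, cap in res[a].items():
                if cap > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if "T" not in parent:
            break
        b = "T"
        while parent[b] is not None:             # push one unit along the path
            a = parent[b]
            res[a][b] -= 1
            res[b][a] += 1
            b = a
        flow += 1

    if flow > k:
        return p + 1, {t}
    # Nodes whose "in" copy is unreachable from S in the residual graph lie in X-bar.
    reach, queue = {"S"}, deque(["S"])
    while queue:
        a = queue.popleft()
        for b, cap in res[a].items():
            if cap > 0 and b not in reach:
                reach.add(b)
                queue.append(b)
    return p, sink_set | {v for v in cone - sink_set if (v, "in") not in reach}


# Hypothetical 3-input example: x = f(a, b), y = f(b, c), t = f(x, y)
fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["b", "c"], "t": ["x", "y"]}
labels, cuts = {"a": 0, "b": 0, "c": 0}, {}
for v in ["x", "y", "t"]:                        # topological order
    labels[v], cuts[v] = flowmap_label(fanins, labels, v, k=3)
print(labels["t"], sorted(cuts["t"]))            # 1 ['t', 'x', 'y']
```

When the flow stays at or below K, the returned X̄_t contains t, the collapsed nodes, and any node whose input copy cannot be reached from the source in the residual graph. With K = 2, the same node t would need three disjoint paths, so its label becomes 2.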

FlowMap Algorithm: Summary

Phase 1: label computation
    Process each node t in topological order, starting from the PIs:
        compute a minimum-height K-feasible cut (X_t, X̄_t) in N_t (t and all its predecessors);
        l(t) = h(X_t, X̄_t) + 1;

Phase 2: generate the necessary K-LUTs
    L = list of POs;
    WHILE L ≠ ∅ DO
        remove a node t from L;
        LUT(t) = X̄_t;
        L = L ∪ {non-PI inputs to LUT(t)}
    END

FlowMap produces a depth-optimal mapping for any K-bounded network in O(Kmn) time, where m is the number of edges and n is the number of nodes in the network.
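Phase 1 can also be illustrated without network flow. The Python sketch below swaps in exhaustive enumeration of K-feasible cuts. This is a different technique from FlowMap's max-flow step and is exponential in the worst case. Because it still picks a minimum-height K-feasible cut at every node, it should produce the same depth-optimal labels on small examples. Phase 2 follows the listing above with two changes: each LUT is recorded by its input set rather than by X̄_t, and LUTs that were already generated are skipped. The graph format and function names are assumptions.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """All K-feasible cuts of every node, each cut given as a frozenset of leaves."""
    cuts = {}
    for v in order:
        if not fanins[v]:                        # a PI only has its trivial cut
            cuts[v] = {frozenset([v])}
            continue
        merged = {frozenset().union(*combo) for combo in product(*(cuts[u] for u in fanins[v]))}
        cuts[v] = {c for c in merged if len(c) <= k} | {frozenset([v])}
    return cuts


def depth_optimal_mapping(fanins, order, pos, k):
    """Phase 1: min-height labels over all K-feasible cuts; Phase 2: LUT generation."""
    cuts = enumerate_cuts(fanins, order, k)
    label, best = {}, {}
    for v in order:                              # topological order, PIs first
        if not fanins[v]:
            label[v] = 0
            continue
        candidates = [c for c in cuts[v] if c != frozenset([v])]
        best[v] = min(candidates, key=lambda c: (max(label[u] for u in c), len(c), sorted(c)))
        label[v] = max(label[u] for u in best[v]) + 1

    luts, pending = {}, list(pos)                # Phase 2, starting from the POs
    while pending:
        t = pending.pop()
        if t in luts or not fanins[t]:           # skip PIs and LUTs already generated
            continue
        luts[t] = best[t]                        # LUT rooted at t, given by its inputs
        pending.extend(u for u in best[t] if fanins[u])
    return label, luts


fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["b", "c"], "t": ["x", "y"]}
order = ["a", "b", "c", "x", "y", "t"]
for k in (3, 2):
    label, luts = depth_optimal_mapping(fanins, order, ["t"], k)
    print(k, label["t"], {v: sorted(c) for v, c in sorted(luts.items())})
# 3 1 {'t': ['a', 'b', 'c']}
# 2 2 {'t': ['x', 'y'], 'x': ['a', 'b'], 'y': ['b', 'c']}
```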

Early Results: Area Minimization

• Optimal mapping for trees with bounded or unbounded fanins: Chortle-crf [Francis, Rose, Vranesic, DAC’91]
• Optimal mapping without logic duplication for general networks: DF-Map [Cong&Ding, DAC’93]
• NP-hard for general networks with possible logic duplication: [Farrahi&Sarrafzadeh, TCAD’94]

Early Results: Combined Synthesis with Mapping

• Extension of traditional logic optimization techniques + covering & functional decomposition: MIS-pga and MIS-pga-delay [Murgai et al., DAC’90, ICCAD’91]

• Use of functional decomposition to generate a LUT network directly: FGSyn [Lai, Pedram, Vrudhula, DAC’93], IMODEC [Wurth, et al., DAC’95], BoolMap-D [Legl, et al., DAC’96]

• Mapping with re-synthesis: FlowSYN [Cong&Ding, ICCAD’93], ALTO [Huang, Jou, Shen, ICCAD’96]

Outline

Early results
Recent advances
  - Optimization and mapping for sequential circuits
  - Synthesis for heterogeneous FPGAs
  - Synthesis for FPGAs with embedded memory blocks
  - Use of Boolean matching instead of pattern matching
  - Combined decomposition and mapping
  - UCLA RASP FPGA synthesis system

New challenges

Direct Optimization and Mapping for Sequential Circuits

[Figure: original circuit; a 3-LUT mapping with clock period Φ = 2 without retiming; a 3-LUT mapping with Φ = 1 with retiming]

Traditional approaches
  - Assume that the positions of FFs are fixed
  - Map each combinational subcircuit separately
  - Optimal solutions for all subcircuits may not lead to an optimal solution for the entire circuit

Difficulties and Solutions

Difficulties
• When to retime?
  - Before mapping? Delay is unknown for retiming
  - After mapping? FF positions are fixed during mapping
• How to compute an equivalent initial state?

Solutions
• Simultaneous mapping with retiming [Pan&Liu, DAC’96] [Cong&Wu, ICCD’96]
• Optimal mapping + forward retiming [Cong&Wu, DAC’98]

Simultaneous Mapping with Retiming

• Key idea: the expanded circuit, a DAG rooted at a node in which every path from a given node copy to the root has the same number of FFs
• Usage: to form all possible LUTs under retiming

[Figure: original circuit with nodes a, b, c and its expanded circuit for 3-LUT mapping, with node copies annotated by FF counts 0, 1, 2]
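The following minimal Python sketch builds an expanded circuit. It assumes the sequential circuit is stored as fanin lists with an FF count w on each edge. Each copy (u, d) represents node u reached through d FFs, so all of its paths to the root cross exactly d FFs. The `max_ffs` cutoff is an assumption of this sketch that keeps the expansion finite; it is not taken from SeqMapII or TurboMap. K-feasible cuts of the resulting DAG would then describe candidate LUTs under retiming.

```python
from collections import deque


def expanded_circuit(fanins, root, max_ffs):
    """Expanded circuit rooted at `root`.

    fanins maps each node to a list of (fanin, w) pairs, where w is the number of
    FFs on that edge. A copy (u, d) stands for node u reached through d FFs, so every
    path from (u, d) to (root, 0) crosses exactly d FFs. Copies with d > max_ffs are
    pruned (an assumption made here only to keep the expansion finite).
    """
    nodes, edges = {(root, 0)}, set()
    queue = deque([(root, 0)])
    while queue:
        x, d = queue.popleft()
        for u, w in fanins[x]:
            child = (u, d + w)
            if child[1] > max_ffs:
                continue
            edges.add((child, (x, d)))           # edge child -> (x, d) in the DAG
            if child not in nodes:
                nodes.add(child)
                queue.append(child)
    return nodes, edges


# Hypothetical loop: c = f(a, b) and b = g(c) through one FF; a is a PI
fanins = {"a": [], "b": [("c", 1)], "c": [("a", 0), ("b", 0)]}
nodes, edges = expanded_circuit(fanins, "c", max_ffs=2)
print(sorted(nodes))
# [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2), ('c', 0), ('c', 1), ('c', 2)]
```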

Simultaneous Mapping + Retiming (Cont’d)

• Polynomial-time optimal algorithm for mapping + retiming
  - First proposed in SeqMapII [Pan&Liu, DAC’96]
  - Significant speed-up (over 2000x) achieved by TurboMap [Cong&Wu, ICCD’96]

• Automatic pipelining, using re-synthesis to reduce the maximum delay-to-register ratio over all loops: TurboSYN [Cong&Wu, DAC’97]

Experimental Results: Mapping + Retiming + Pipelining

16 MCNC FSMs and ISCAS sequential benchmarks with 30 to 10,000 simple gates

[Chart: average clock period; bar labels 3.3, 6.9, 7.9]

• TurboSYN: resynthesis + retiming + pipelining
• TurboMap: mapping + retiming
• FlowMap + retiming: separate mapping with retiming

Experimental Results: Mapping + Retiming + Pipelining (Cont’d)

16 MCNC FSMs and ISCAS sequential benchmarks with 30 to 10,000 simple gates

[Chart: avg. #LUT and avg. #flip-flops; bar labels 206, 134, 139, 84, 36, 17]

• TurboSYN: resynthesis + retiming + pipelining
• TurboMap: mapping + retiming
• FlowMap + retiming: separate mapping with retiming

Retiming with Initial States

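• Many sequential circuits have initial states
• Retiming changes the initial state
• Computing an equivalent initial state for (backward) retiming is NP-hard

[Figure: FFs moved backward and forward across a gate f(X)]
• Backward retiming (BRT): moving an FF with value y from the output of f to its inputs requires some X with f(X) = y; deciding whether such an X exists is NP-complete
• Forward retiming (FRT): moving FFs with values X from the inputs of f to its output gives y = f(X), so an equivalent initial state is guaranteed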
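The asymmetry between the two directions can be seen in a few lines of Python. This sketch uses hypothetical helpers and 0/1 integers for logic values. Forward retiming just evaluates the gate on the old FF values. Backward retiming, by contrast, has to search for one input assignment that reproduces every moved FF value at once. The example moves FFs backward across an AND2 and an XOR2 that share the same two inputs. The brute-force search stands in for what is, in general, an NP-complete problem.

```python
from itertools import product


def and2(a, b):
    return a & b


def xor2(a, b):
    return a ^ b


def forward_init(gate, input_values):
    """FRT: FFs with initial values X move from the gate inputs to its output.
    The new initial value is simply y = f(X), so it always exists."""
    return gate(*input_values)


def backward_init(gates, targets, n_inputs):
    """BRT: FFs with initial values `targets` move from the outputs of `gates`
    (which share the same n_inputs inputs) back to those inputs. We need one input
    assignment X with gate_i(X) = y_i for every i; brute force over 2**n_inputs
    assignments, returning None when no such X exists."""
    for xs in product((0, 1), repeat=n_inputs):
        if all(g(*xs) == y for g, y in zip(gates, targets)):
            return xs
    return None


print(forward_init(and2, (1, 0)))                # 0
print(backward_init([and2, xor2], [0, 1], 2))    # (0, 1)
print(backward_init([and2, xor2], [1, 1], 2))    # None: no equivalent initial state
```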

Conventional Approaches

• Initial-state computation for a given retiming is NP-hard
• Iteration may not find a feasible retiming solution

[Flowchart: original circuit → compute a retiming → can an equivalent initial state be found? If yes, finish; if no, compute another retiming]

Optimal Mapping with Forward Retiming

• Optimal mapping with forward retiming (FRT) in polynomial time, which guarantees that an equivalent initial state can be computed
  - TurboMap-frt [Cong&Wu, DAC’98]

• New flow for retiming
  - Step 1: move FFs backward as much as possible, creating more freedom for mapping + FRT
  - Step 2: optimal mapping + FRT, minimizing the clock period with guaranteed equivalent initial states

Experimental Results of Optimal Mapping with Forward Retiming

• 18 benchmarks with 30 to 10,000 simple gates
• For 10 out of 18 TurboMap solutions, an equivalent initial state cannot be computed

[Chart: average clock period; bar labels 5.8, 5.6, 7.0]

• TurboMap-frt: mapping + forward retiming
• TurboMap: mapping + retiming
• FlowMap-frt: separate mapping with forward retiming

Experimental Results of Optimal Mapping with Forward Retiming

• 18 benchmarks with 30 to 10,000 simple gates
• For 10 out of 18 TurboMap solutions, an equivalent initial state cannot be computed

[Chart: avg. #5-LUTs and avg. #flip-flops; bar labels 92, 94, 100, 23, 24, 15]

• TurboMap-frt: mapping + forward retiming
• TurboMap: mapping + retiming
• FlowMap-frt: separate mapping with forward retiming

Technology Mapping for FPGAs with Heterogeneous LUTs

• Almost all recent FPGA architectures support heterogeneous LUTs
• “One size fits all” is not good enough
• Examples
  - Xilinx XC4000: 1 CLB = 2 x 4-LUTs = 1 x 5-LUT
  - Lucent ORCA2C: 1 PFU = 4 x 4-LUTs = 2 x 5-LUTs = 1 x 6-LUT

XC4000 Block Diagram
[Figure: 1 CLB = 2 x 4-LUTs = 1 x 5-LUT]

ORCA2C Block Diagram
[Figure: 1 PFU = 4 x 4-LUTs = 2 x 5-LUTs = 1 x 6-LUT]

Problem Formulation

• Problem
  - Heterogeneous LUTs have different delays and areas
  - Two types of heterogeneous LUT-based FPGAs:
    - Fully configurable, with no fixed ratio between different types of LUTs
    - Fixed combination of several different LUTs in a PLB (discussed later using Boolean matching)
• Objective: delay or area minimization

Mapping for Heterogeneous FPGAs

• Solutions
  - Compute multiple cuts at each node in the network, using network-flow computation or cut enumeration
  - Select the most appropriate LUT implementation
• Depth minimization
  - HeteroMap [Cong&Xu, DAC’98]: polynomial-time delay-optimal for general networks
• Area minimization
  - Optimal for trees [Korupolu, Lee, Wong, DAC’98]
  - Heuristic for general networks [He&Rose, FPGA’94]
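The following Python sketch shows delay-oriented labeling with heterogeneous LUTs. It uses cut enumeration (one of the two options listed above) rather than HeteroMap's actual procedure. It assumes a table from LUT size to delay and implements each cut with the fastest available LUT that has enough inputs. The example uses the XC4000 ratio Delay(4-LUT) : Delay(5-LUT) = 1 : 1.5 from the HeteroMap comparison. The network and function names are hypothetical.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """All K-feasible cuts of every node, each as a frozenset of leaf nodes."""
    cuts = {}
    for v in order:
        if not fanins[v]:
            cuts[v] = {frozenset([v])}
            continue
        merged = {frozenset().union(*combo) for combo in product(*(cuts[u] for u in fanins[v]))}
        cuts[v] = {c for c in merged if len(c) <= k} | {frozenset([v])}
    return cuts


def hetero_delay_labels(fanins, order, lut_delay):
    """Delay-oriented labels when several LUT sizes are available.

    lut_delay maps LUT size -> delay. A cut with c inputs is implemented by the
    fastest available LUT with at least c inputs (an assumption of this sketch).
    """
    cuts = enumerate_cuts(fanins, order, max(lut_delay))
    arrival, choice = {}, {}
    for v in order:
        if not fanins[v]:
            arrival[v] = 0.0
            continue
        best = None
        for c in cuts[v]:
            if c == frozenset([v]):
                continue
            lut = min(d for size, d in lut_delay.items() if size >= len(c))
            t = max(arrival[u] for u in c) + lut
            if best is None or t < best[0]:
                best = (t, c)
        arrival[v], choice[v] = best
    return arrival, choice


# Hypothetical network: x = f(a, b, c, d), y = f(x, e)
fanins = {p: [] for p in "abcde"}
fanins.update({"x": list("abcd"), "y": ["x", "e"]})
order = list("abcde") + ["x", "y"]

hetero, _ = hetero_delay_labels(fanins, order, {4: 1.0, 5: 1.5})  # XC4000-like ratio
homog, _ = hetero_delay_labels(fanins, order, {4: 1.0})           # 4-LUTs only
print(hetero["y"], homog["y"])                                     # 1.5 2.0
```

In this example, the heterogeneous library reaches y with a single 5-LUT (delay 1.5). A 4-LUT-only library needs two levels (delay 2.0).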

Heterogeneous vs. Homogeneous Mapping (XC4000)

[Chart: Comparison between FlowMap and HeteroMap on XC4000 Series FPGAs; comparison ratio (0 to 1.2) of mapping delay, post-layout delay, and #PLB for FlowMap(5) and HeteroMap(5,4); annotated changes -19%, -7%, +2%]

Delay(4-LUT) : Delay(5-LUT) = 1 : 1.5

[Cong&Xu, DAC’98]

Comparison between Homogeneous and Heterogeneous FPGAs

[Chart: Performance Comparison between Homogeneous and Heterogeneous FPGAs; comparison ratio (0 to 2.5) of mapping delay and memory-cell area for 3-LUT, 4-LUT, 5-LUT, and 6-LUT FPGAs and a 3-4-5-6-LUT heterogeneous FPGA]

Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2

Mapping for FPGAs with Embedded Memory Blocks

• Embedded memory blocks (EMBs) can be used as
  - On-chip memories
  - Logic functions

[Figure: FLEX10K device block diagram]

Problem Formulation

• Minimize delay and/or area

[Figure: an unmapped circuit and the mapped circuit using EMBs and LUTs]

• Limited number of EMBs in one chip
• Configuration flexibility of EMBs
  - E.g., each EMB in FLEX10K has 2K cells and can be configured as a 2Kx1, 1Kx2, 512x4, or 256x8 memory
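The configuration choice can be illustrated with a short Python sketch. It assumes that a multi-output function is implemented by storing its truth table in a single EMB used as a ROM, with the n inputs on the address lines and the m outputs as data bits. The configuration list is the FLEX10K set quoted above; the function name is hypothetical.

```python
# FLEX10K-style EMB: 2K (2048) cells, configurable as depth x width.
EMB_CONFIGS = [(2048, 1), (1024, 2), (512, 4), (256, 8)]


def emb_config_for(n_inputs, n_outputs, configs=EMB_CONFIGS):
    """Return a (depth, width) configuration in which an n_inputs-input,
    n_outputs-output function fits as a ROM (inputs drive the address lines,
    outputs are the data bits), or None if no single EMB can hold it."""
    for depth, width in configs:
        if 2 ** n_inputs <= depth and n_outputs <= width:
            return depth, width
    return None


print(emb_config_for(8, 8))    # (256, 8)
print(emb_config_for(9, 4))    # (512, 4)
print(emb_config_for(10, 3))   # None: would need 1024 x 3 = 3072 cells
```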

Solutions
• EMB_Pack [Cong&Xu, FPGA’98]
  - Uses EMBs to minimize the circuit area
  - Maintains the circuit delay
  - Post-mapping processing and pre-mapping processing
• Smap [Wilton, FPGA’98]
  - Uses EMBs to minimize the circuit area
  - Post-mapping processing

Results of EMB_Pack

Comparison between CutMap [Cong&Hwang, FPGA'95] and CutMap followed by EMB_Pack on MCNC benchmarks, FLEX10K device family

[Chart: comparison ratio (0 to 1.2) of #LUT and layout delay for CutMap and for CutMap followed by EMB_Pack; annotated change: -10%]

Boolean Matching for Complex PLBs

PLB: Programmable Logic Block

[Figure: XC4K CLB with LUTs F, G, and H, extra input x, and output f(X)]

Example: given a 9-input function f of (x’i denotes the complement of xi)
    f = x’1x2 + x2x’3 + x’2x3x8 + x5x6a + x’5x’7a + x4x’5x6x7 + x’5x’6x’7a + x5x’6x’7a’
    a = x’0x4 + x0x’4

Target: Xilinx XC4K FPGAs
  - LUT covering + packing: 4 CLBs
  - Boolean matching: 1 CLB

Advantage: significant area & delay reduction

Direct Mapping to Programmable Logic Blocks (PLBs)

• Benefits: may yield significant area and delay reduction
• Difficulty: Boolean matching is needed, i.e., given an arbitrary function f and a PLB, determine whether the PLB can implement f

Example: Boolean Matching for XC4K CLB

• Functional decomposition in one of the forms
  - f(X) = H(F(X1), G(X2))
  - f(X) = H(F(X1), G(X2), x)
  - f(X) = H(F(X1, x), G(X2), x)
  - f(X) = H(F(X1, x), G(X2, x), x)
• Conditions: F and G input sizes ≤ 4

[Figure: XC4K CLB with LUTs F, G, and H, extra input x, and output f(X)]
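The Python sketch below covers only the first, disjoint form. It relies on a decomposition-chart test, with rows indexed by assignments to X1 and columns by assignments to X2. Under that layout, f = H(F(X1), G(X2)) holds exactly when the chart has at most two distinct rows and at most two distinct columns, because F and G can then simply name the row class and the column class. The exhaustive search over input splits is for illustration only and is not the matching procedure of the cited work. The other three forms, which also route the extra input x, are not covered.

```python
from itertools import combinations, product


def decomposes(f, n, x1):
    """Test whether f (n Boolean inputs, indexed 0..n-1) equals H(F(X1), G(X2)),
    with X2 the inputs not in x1. Build the decomposition chart (rows: assignments
    to X1, columns: assignments to X2); the form exists iff the chart has at most
    two distinct rows and at most two distinct columns."""
    x2 = [i for i in range(n) if i not in x1]

    def value(a, b):
        xs = [0] * n
        for i, bit in zip(x1, a):
            xs[i] = bit
        for i, bit in zip(x2, b):
            xs[i] = bit
        return f(*xs)

    row_keys = list(product((0, 1), repeat=len(x1)))
    col_keys = list(product((0, 1), repeat=len(x2)))
    rows = {tuple(value(a, b) for b in col_keys) for a in row_keys}
    cols = {tuple(value(a, b) for a in row_keys) for b in col_keys}
    return len(rows) <= 2 and len(cols) <= 2


def match_disjoint_form(f, n, lut_size=4):
    """Search input splits with 1 <= |X1|, |X2| <= lut_size; return X1 or None."""
    for size in range(max(1, n - lut_size), min(lut_size, n - 1) + 1):
        for x1 in combinations(range(n), size):
            if decomposes(f, n, list(x1)):
                return list(x1)
    return None


def f8(*x):  # hypothetical 8-input function: AND of x0..x3 OR parity of x4..x7
    return (x[0] & x[1] & x[2] & x[3]) | (x[4] ^ x[5] ^ x[6] ^ x[7])


def maj3(a, b, c):  # 3-input majority: no split of this form exists
    return (a & b) | (a & c) | (b & c)


print(match_disjoint_form(f8, 8))    # [0, 1, 2, 3]
print(match_disjoint_form(maj3, 3))  # None
```

The 3-input majority function would of course fit in a single 4-LUT. It appears here only because it has no split of this two-LUT form.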

Boolean Matching Results for MCNC Benchmarks

• An XC4K CLB can implement
  - 98% of 6-input functions
  - 88% of 7-input functions
• Experiment: enumerate all K-input functions

Circuit    5-input    6-input    7-input
9sym           651       1256       2333
C499          3649       9716      27599
alu2          2696       6666      18231
alu4          5889      14841      40332
des          28875      65245     157028

Application to Technology Mapping (for XC4000 and XC5200 FPGAs)

Compared with LUT mapping results, PLB mapping obtains
• for XC5200 FPGAs: 7% depth reduction and 13% area reduction
• for XC4000 FPGAs: 17% depth reduction and 3% area increase

Application to Architecture Evaluation (logic capability vs. silicon area)

[Figure: candidate PLB structures built from LUTs F, G, and H]
• XC4K(0,4,3): 24 memory cells (> 4 inputs)
• XC4K(3/4,4,2): 28 or 36 memory cells (> 4 inputs)
• XC4K CLB: 40 memory cells (> 5 inputs)
• XC5K: 24 to 48 memory cells (> 4 or 5 inputs)

Architecture Evaluation (for wide function implementation)

[Chart: # implementable functions / # memory cells (0 to 4000) for each type of PLB, for 5-input and 6-input functions; PLB types include B.2, XC5K, XC4K, and the XC4K variants (4,4,MUX), (0,4,3), (3,4,2), (4,4,2)]

Combined Decomposition with Mapping

[Figure (a): initial 5-bounded network with inputs a to g]
[Figure (b): best mapping after dmig: depth 3, area 5]

Impact of Decomposition

[Figure (a): initial 5-bounded network with inputs a to g]
[Figure (c): optimal decomposition: depth 2, area 3]

Problem Formulation

• Structural gate decomposition in a W-bounded network for K-LUT mapping (W-SGD/K)
• Goal: find a decomposition with minimum depth after mapping
• The W-SGD/K problem is NP-hard for W ≥ K ≥ 5 [Cong&Hwang, DAC’96]

Available Solutions

• Simultaneous decomposition and mapping for trees (Chortle-crf or Chortle-d algorithms)
• Combining bin-packing with flow computation (for computing min-height cuts): DOGMA [Cong&Hwang, DAC’96]
• Using a mapping graph to encode all possible (or a large class of) decompositions, and computing a mapping solution on it: SLDmap [Chen&Cong, FPGA’01]

Mapping Graph Definition

A modified AND2/INV network that encodes a set of circuit structures in a single graph [Lehman et al., ICCAD’95]
• Choice nodes (logical equivalence)
• ugates (two choice nodes and fanins)
• Cycles
• Reduction: unique choice nodes; unique INV and AND2 nodes

Mapping Graph Example

[Figure sequence: step-by-step mapping graph example with inputs A, B, C, D, output Z, and internal nodes labeled 1 to 9 (including 3i) and a to h]

Overview of SLDMap [Chen&Cong, FPGA’01]
• Initial W-bounded network generation
• Mapping graph construction
• Depth-optimal labeling
• Label relaxation and area minimization
• Decomposition selection
• Fixed-decomposition mapping

Experimental Flow

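• Benchmarks and tools: MCNC circuit set, UCLA RASP package, CUDD
• Flows compared: dmig [Chen et al., IEEE Design & Test’92], dogma [Cong, DAC’96], sldmap

[Flowchart: initial W-bounded network processed by dmig, dogma, or sldmap; remaining stages cutmap, greedy_pack, and Xilinx Foundation 3.1 P&R]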

Depth/Area Comparison

[Chart: normalized depth and area (axis 0.94 to 1.12) for dmig, dogma, and sldmap]

Post-layout Delay

[Chart: post-layout delay (ns, 0 to 100) of dogma and sldmap on k2 (Virtex), 9sym (Spartan), i3 (Virtex), x1 (Spartan), and C499 (XC3K)]

Outline

Early results
Recent advances
  - Synthesis and optimization for sequential circuits
  - Synthesis for heterogeneous FPGAs
  - Synthesis for FPGAs with embedded memory blocks
  - Use of Boolean matching instead of pattern matching
  - Combined decomposition with mapping
  - UCLA RASP FPGA synthesis system

New challenges

UCLA RASP Synthesis System (http://cadlab.cs.ucla.edu)

[Flow: EDIF netlist or HDL design → internal netlist → LUT mapping engine → LUT netlist → PLB mapping engine → vendor-specific netlist (Xilinx, Altera, ORCA) → placement and routing → chip programming information]

Objective 1: A Flexible and Efficient FPGA Synthesis Engine

• Delay-optimal mapping: FlowMap/HeteroMap, FlowSYN, TurboMap, TurboSYN
• Area-optimal mapping: DF-map, CutMap-E, MarkMap
• Delay/area trade-off: FlowMap-r, CutMap, CutSyn
• PLB mapping: PDDmap, PDDSYN, Match-4K/3K, EAB-pack
• Gate decomposition for mapping: DMIG, DOGMA, SLDmap

Objective 2: FPGA Architecture Evaluation

Outline

Early results
Recent advances
New challenges
  - Integration of synthesis and layout
  - Field-programmable system-on-a-chip

Logic vs. Interconnect Delays
Example: Altera FPGA (EPF8282A, A-2 speed grade, Altera Data Book’98)
• LE delay: 2.4 ns
• Connection between LEs in the same LAB: 0.5 ns
• Connection between LEs in the same row, different LABs: 4.7 ns
• Connection between LEs in different rows: 7.2 ns
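A back-of-the-envelope Python calculation with the delays listed above shows how much placement matters. It uses a hypothetical path through four LEs. The calculation assumes that path delay is simply the sum of LE delays and connection delays, which ignores fanout and routing congestion.

```python
# Delay figures quoted above (ns) for the EPF8282A example
LE_DELAY = 2.4
WIRE_DELAY = {"same_lab": 0.5, "same_row": 4.7, "different_row": 7.2}


def path_delay(hops):
    """Delay of a path through len(hops) + 1 LEs, assuming the path delay is simply
    the sum of LE delays and the delays of the connections between consecutive LEs."""
    return (len(hops) + 1) * LE_DELAY + sum(WIRE_DELAY[h] for h in hops)


packed = path_delay(["same_lab"] * 3)        # 4 LEs placed in one LAB
spread = path_delay(["different_row"] * 3)   # the same 4 LEs in different rows
wire_share = (spread - 4 * LE_DELAY) / spread
print(f"{packed:.1f} ns vs. {spread:.1f} ns; interconnect = {wire_share:.0%} of the spread path")
# 11.1 ns vs. 31.2 ns; interconnect = 69% of the spread path
```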

Layout-Driven Synthesis
• Iterative design flow
  - Construct-by-correction
  - Needs to guarantee convergence
• Concurrent design flow
  - Correct-by-construction
  - Needs to handle design abstraction, constraint propagation, and design refinement
• Best candidate: a combination of iterative and concurrent design approaches
  - Concurrent synthesis, layout planning, and solution refinement
  - A limited number of iterations within the same or adjacent levels to correct unacceptable estimation errors

Layout-Driven Synthesis Flow of ADT

[Flow: HDL design or netlist from a third-party synthesis tool → global logic optimization and interconnect planning → placement-driven synthesis and architecture embedding → FPGA vendor P&R tool]

Source: www.aplus-dt.com. Courtesy of Aplus Design Technologies, Inc. (ADT)

Use of IP Blocks

• Classification of IP blocks
  • Soft
  • Hard
  • Firm
• Challenges
  • IP representation and characterization
  • Interface with synthesis tools
  • IP protection

6/22/2001 DAC 2001 Tutorial: Jason Cong 76

Field-Programmable System-on-a-Chip (FPSOC)

Figure: General-purpose FPSOC (processor, memory, programmable logic) and application-specific FPSOC (processor, memory, programmable logic, ASIC)


6/22/2001 DAC 2001 Tutorial: Jason Cong 77

Design Challenges

• Integration of
  • embedded operating systems
  • compilers
  • synthesis tools
  • layout tools
• Need for architecture evaluation
  • Explore and choose the best embedded FPGA architecture (for the given application domain)

6/22/2001 DAC 2001 Tutorial: Jason Cong 78

ArchEvaluator: ADT’s PLD Architecture Evaluation Tool

• Evaluation of programmable logic blocks
  • Sizes
  • Configurations
• Evaluation of on-chip hierarchy
  • Number of levels
  • Sizes, configuration, and delays at each level
• Evaluation of heterogeneous architectures
  • Multiple sizes and/or configurations of the same type of logic blocks
  • Multiple types of logic blocks
  • Different kinds of resources on the same chip
• Embedded array configuration
  • Array aspect ratio
  • Single vs. multiple arrays …

Source: www.aplus-dt.com. Courtesy of Aplus Design Technologies, Inc. (ADT)


6/22/2001 DAC 2001 Tutorial: Jason Cong 79

Conclusions

• Synthesis and technology mapping for homogeneous LUTs is a well-understood problem (some room remains for area minimization)
• Recent advances in FPGA synthesis enable many new architecture innovations
  • Embedded memory blocks
  • Heterogeneous LUT-based FPGAs
  • Architectures for efficient retiming and pipelining
• New FPGA synthesis tools and algorithms have to
  • consider layout design
  • support efficient IP reuse
• Field-programmable logic will be an important component of system-on-a-chip designs

6/22/2001 DAC 2001 Tutorial: Jason Cong 80

Acknowledgments

Contributions from
• Current and former graduate students from my group: Michael Chen (UCLA), Eugene Ding (Agere), Yean-Yow Hwang (Altera), John Peck (AMD), Chang Wu (ADT), and Songjie Xu (ADT)
• Other colleagues: Peichen Pan (ADT)

Support from
• National Science Foundation
• Actel, Altera, Lucent Technologies, Quickturn, Vantis/Lattice, and Xilinx under the California MICRO Program


6/22/2001 DAC 2001 Tutorial: Jason Cong 81

Further Information

Visit http://cadlab.cs.ucla.edu/~cong and http://cadlab.cs.ucla.edu/projects/fpga for:
• An updated copy of the slides of this talk
• Survey/tutorial paper on FPGA synthesis: Cong and Ding, ACM TODAES, 1996
• Recent research publications and software on FPGA synthesis from UCLA

6/22/2001 DAC 2001 Tutorial: Jason Cong 82

Speaker Bio

JASON CONG received his B.S. degree in computer science from Peking University in 1985, and his M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana-Champaign in 1987 and 1990, respectively. Currently, he is a Professor and Co-Director of the VLSI CAD Laboratory in the Computer Science Department of the University of California, Los Angeles. His research interests include layout synthesis and logic synthesis for high-performance, low-power VLSI circuits; design and optimization of high-speed VLSI interconnects; and synthesis and architecture design for FPGAs. Dr. Cong is a Fellow of the IEEE and serves as a consultant or advisory board member for several semiconductor and EDA companies. In 1998, Dr. Cong founded Aplus Design Technologies, Inc. (www.aplus-dt.com), which provides innovative layout-driven synthesis solutions and architecture evaluation solutions for both stand-alone FPGAs/CPLDs and embedded FPGAs for SOC designs.