cs184a: computer architecture (structures and...


Page 1: CS184a: Computer Architecture (Structures and Organization)andre/courses/CS184/slides/day12_2u… · Day12: November 1, 2000 Interconnect Requirements and Richness Caltech CS184a


Caltech CS184a Fall2000 -- DeHon 1

CS184a: Computer Architecture (Structures and Organization)

Day 12: November 1, 2000

Interconnect Requirements and Richness


Last Time

• Dominance of Interconnect

• Simple things

– and why they don’t work

• Characterizing Interconnect Requirements

– start


Today

• Followups from Monday (3)

• Interconnect Design Space

• Characterizing Interconnect Requirements

• Interconnect Implications

• How rich should interconnect be?

– specifics of understanding interconnect

– methodology for attacking these kinds of questions


Tree Cut

• Bisection bandwidth

– binary: 1

– general: log(n)

• Rent IO Cut

– IO ~ K/2 * N

– P = 1

• Difference:

– include inputs


Resource Bounded Scheduling

• Last time: pointed out we can get a lower bound on time (upper bound on performance)

• Scheduling is NP-hard in general

– (to find the optimum)

– can approximate in O(E) time


Lower Bound: Critical Path

• ASAP schedule ignoring resource constraints

– (look at length of remaining critical path)

• Certainly cannot finish any faster than that


Lower Bound: Resource Capacity

• Sum up all capacity required per resource

• Divide by total resource (for type)

• Lower bound on remaining schedule time

– (best we can do is pack all use densely)
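The two lower bounds above can be sketched in a few lines. This is a minimal illustration, not the course's code: it assumes unit-latency operations, a single resource type, and a dependence graph given as (producer, consumer) pairs. The function names are made up for this sketch.

```python
import math
from collections import defaultdict, deque


def critical_path_bound(num_ops, edges, latency=1):
    """Longest dependence chain: the ASAP finish time with unlimited resources."""
    succs = defaultdict(list)
    indegree = [0] * num_ops
    for src, dst in edges:
        succs[src].append(dst)
        indegree[dst] += 1
    finish = [latency] * num_ops  # earliest finish time of each op
    ready = deque(op for op in range(num_ops) if indegree[op] == 0)
    while ready:  # visit ops in topological order
        op = ready.popleft()
        for nxt in succs[op]:
            finish[nxt] = max(finish[nxt], finish[op] + latency)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return max(finish, default=0)


def resource_bound(num_ops, num_resources, latency=1):
    """Total work divided by available resources, rounded up (dense packing)."""
    return math.ceil(num_ops * latency / num_resources)


def schedule_lower_bound(num_ops, edges, num_resources):
    """No schedule can beat either bound, so take the larger of the two."""
    return max(critical_path_bound(num_ops, edges),
               resource_bound(num_ops, num_resources))
```

Neither bound is guaranteed to be achievable: a real schedule can be longer than both, because dependences may prevent dense packing.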


Example

[Figure: example annotated with Critical Path, Resource Bound (2 resources), and Resource Bound (4 resources)]


Example 2

RB = 8/2 = 4

LB = 5

best delay = 6


Example 3

LB = 3

RB = ⌈13/2⌉ = 7

best delay = 7


Good Model?

Log-log plot ==> straight lines represent geometric growth


Rent’s Rule

• Long standing empirical relationship

– IO = C*N^P

– 0 ≤ P ≤ 1.0

– compare (F,α)-bifurcator: α = 2^P

• Captures notion of locality

– some signals generated and consumed locally

– reconvergent fanout
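Since IO = C*N^P is a straight line on a log-log plot, C and P can be estimated with an ordinary least-squares line through (log N, log IO). The sketch below is an illustration only: `rent_io` and `fit_rent` are hypothetical names, and the sample points in the usage note are synthetic, not measured benchmark data.

```python
import math


def rent_io(n, c, p):
    """Rent's Rule: expected IO count for a region holding n gates."""
    return c * n ** p


def fit_rent(samples):
    """Fit (C, P) from (N, IO) pairs by least squares in log-log space."""
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(io) for _, io in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the line is the Rent exponent
    c = math.exp(mean_y - p * mean_x)  # intercept gives the constant C
    return c, p
```

Usage, on synthetic points generated from the rule itself: `fit_rent([(n, rent_io(n, 5, 0.72)) for n in (2, 4, 8, 16, 32, 64)])` recovers C and P up to floating-point error.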


Rent and Locality

• Rent and IO capture locality

– local consumption

– local fanout


Resuming...


Rent’s Rule

• Typically consider

– 0.5 ≤ P ≤ 0.75

• “High-Speed” Logic P=0.67

• Memory (P~0.1-0.2)

• Example (i10)

– max C=7, P=0.68

– avg C=5, P=0.72



What does this tell us about design?

• Recursive bandwidth requirements in network

– lower bound on resource requirements

• N.B. necessary but not sufficient condition on network design

– i.e. design must also be able to use the wires


What does this tell us about design?

• Interconnect lengths

– Intuition

• if p>0.5, not everything can be nearest neighbor

• as p grows, so do wire distances


What does this tell us about design?

• Interconnect lengths

– IO = (n^2)^P wires cross distance n

– dIO/dn end at exactly distance n

– E(l) = Integral from 0 to n=√N of n*(dIO/dn)/n^2

• assume iid sources

– E(l) = O(N^(p-0.5))

• p>0.5
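The integral above can be worked through step by step. This is a sketch that drops the constant C, as the slide does, and assumes p > 0.5 so that the integral converges at 0:

```latex
\begin{align*}
IO(n) &= (n^{2})^{p} = n^{2p},
  \qquad \frac{dIO}{dn} = 2p\, n^{2p-1} \\
E(l) &= \int_{0}^{\sqrt{N}} \frac{n \cdot \frac{dIO}{dn}}{n^{2}}\, dn
      = \int_{0}^{\sqrt{N}} 2p\, n^{2p-2}\, dn \\
     &= \frac{2p}{2p-1}\, \bigl(\sqrt{N}\bigr)^{2p-1}
      = \frac{2p}{2p-1}\, N^{\,p-\frac{1}{2}}
      = O\!\left(N^{\,p-0.5}\right), \qquad p > 0.5
\end{align*}
```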


What does this tell us about design?

• IO ∝ N^P

• Bisection BW ∝ N^P

• side length ∝ N^P

– √N if p<0.5

• Area ∝ N^(2p)

– p>0.5

N.B. 2D VLSI world has “natural” Rent of P=0.5 (area vs. perimeter)
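To get a feel for these exponents, the sketch below computes how much IO, side length, and area grow each time N doubles. It is an illustration under one stated assumption: side length cannot grow more slowly than √N (the "natural" 2D Rent of 0.5), so the side exponent is taken as max(p, 0.5). The function name is made up for this example.

```python
def growth_when_doubling(p):
    """Growth factors for (IO, side length, area) when N doubles."""
    io = 2 ** p                 # IO and bisection bandwidth scale as N^p
    side = 2 ** max(p, 0.5)     # side must carry N^p wires, but is at least sqrt(N)
    area = side ** 2            # area is side length squared
    return io, side, area
```

For example, at p = 0.5 doubling N doubles the area, while at p = 0.75 the area grows by a factor of 2^1.5 (about 2.83) per doubling.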


Rent’s Rule Caveats

• Modern “systems” on a chip -- likely to contain subcomponents of varying Rent complexity

• Less I/O at certain “natural” boundaries

• System close

– (does Rent’s Rule apply to workstation, PC, PDA?)


Area/Wire Length

• Bad news

– Area ~ O(N^(2p))

• faster than N

– Avg. Wire Length ~ O(N^(p-0.5))

• grows with N

• Can designers/CAD control p (locality) once they appreciate its effects?

• I.e. maybe this cost changes design style/criteria so we mitigate the effects?


What Rent didn’t tell us

• Bisection bandwidth purely geometrical

• No constraint for delay

– I.e. a partition may leave critical path weaving between halves


Critical Path and Bisection

Minimum cut may cross critical path multiple times.

Minimizing long wires in critical path => increase cut size.


Rent Weakness

• Does not account for path topology

• Can we define a “Temporal” Rent which takes this into consideration?

– Promising research topic


Administrative Interlude

• …won’t catch up today + lots more stuff

• No Class Wed 11/8

• Can we meet Friday 11/10?

• Homework 3+4 graded

• P/F

– (reluctantly) …if you must

– must attempt all (>90%) problems to get passing grade


Interconnect Richness


Now What?

• There is structure (locality)

• Rent characterizes locality

• How rich should interconnect be?

– Allow full utilization?

– Model requirements and area impact


Step 1: Build Architecture Model

• Assume geometric growth

• Pick parameters: build an architecture we can tune

– F, C

– α, p


Tree of Meshes

• Tree

• Restricted internal bandwidth

• Can match to model


Parameterize C


Parameterize Growth

(2 1)* => α = √2

(2 2 1)* => α = (2*2)^(1/3) = 2^(2/3)

(2 2 2 1)* => α = 2^(3/4)
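The patterns above give the per-level bandwidth growth as the geometric mean of the repeating factors. The sketch below computes that α and the matching Rent exponent, assuming a binary tree (N doubles per level) so that α = 2^P as in the bifurcator comparison earlier. `growth_rate` is a name made up for this example.

```python
import math


def growth_rate(pattern):
    """Average per-level growth alpha for a repeating pattern, plus p = log2(alpha)."""
    product = 1
    for factor in pattern:
        product *= factor
    alpha = product ** (1 / len(pattern))  # geometric mean over one period
    p = math.log2(alpha)                   # binary tree: alpha = 2^p
    return alpha, p
```

Under that assumption the three patterns correspond to P = 0.5, P = 2/3 and P = 0.75.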


Wednesday class stopped here


Step 2: Area Model

• Need to know effect of architecture parameters on area (costs)

– focus on dominant components

• wires

• switches

• logic blocks(?)


Area Parameters

• A_logic = 40Kλ^2

• A_sw = 2.5Kλ^2

• Wire Pitch = 8λ


Switchbox Population

• Full population is excessive (next week?)

• Hypothesis: linear population adequate

– still to be (dis)proven


“Cartoon” VLSI Area Model

(Example artificially small for clarity)


Larger “Cartoon”

1024 LUT Network

P=0.67

LUT Area 3%


Effects of P (α) on Area

P=0.5 P=0.67 P=0.75

1024 LUT Area Comparison


Effects of P on Capacity


Step 3: Characterize Application Requirements

• Identify representative applications.

– Today: IWLS93 logic benchmarks

• How much structure is there?

• How much variation among applications?


Application Requirements

Max: C=7, P=0.68 Avg: C=5, P=0.72


Benchmark Wide


Benchmark Parameters


Complication

• Interconnect requirements vary among applications

• Interconnect richness has large effect on area

• What is effect of architecture/application mismatch?

– Interconnect too rich?

– Interconnect too poor?


Interconnect Mismatch in Theory


Step 4: Assess Resource Impact

• Map designs to parameterized architecture

• Identify architectural resource required

Compare: mapping to k-LUTs; LUT count vs. k.


Mapping to Fixed Wire Schedule

• Easy if need fewer wires than the network provides

• If need more wires than the network provides, must depopulate to meet interconnect limitations.


Mapping to Fixed-WS

• Better results if we “reassociate” rather than keeping original subtrees.


Observation

• Don’t really want a “bisection” of LUTs

– subtree filled to capacity by either of

• LUTs

• root bandwidth

– May be profitable to cut at some place other than midpoint

• does not require “balance” condition

– “Bisection” should account for both LUT and wiring limitations


Challenge

• Don’t know where to cut the design

– don’t know when wires will limit subtree capacity


Brute Force Solution

• Explore all cuts

– start with all LUTs in group

– consider “all” balances

– try cut

– recurse


Brute Force

• Too expensive

• Exponential work

• …viable if solving same subproblems


Simplification

• Single linear ordering

• Partitions = pick split point on ordering

• Reduce to finding cost of [start,end] ranges (subtrees) within linear ordering

• Only n^2 such subproblems

• Can solve with dynamic programming


Dynamic Programming

• Start with base set of size 1

• Compute all splits of size n, from solutions to all problems of size n-1 or smaller

• Done when we compute where to split [0,N-1]


Dynamic Programming

• Just one possible “heuristic” solution to this problem

– not optimal

– dependent on ordering

– sacrifices ability to reorder on splits to avoid exponential problem size

• Opportunity to find a better solution here...
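The range-based dynamic program described above can be sketched as follows. This is one possible formulation under loudly stated assumptions, not the mapper used for the results in these slides: the network is a binary tree, `caps[h]` is the root bandwidth of a height-h subtree (assumed non-decreasing in h), nets are tuples of LUT positions in the linear ordering, primary IOs are ignored, and the cost of a range is the minimum subtree height that can hold it. All names are hypothetical.

```python
import math
from functools import lru_cache


def min_tree_height(num_luts, nets, caps):
    """Minimum binary-tree height holding LUTs 0..num_luts-1 in their given order."""

    def io(i, j):
        # Nets with pins both inside and outside [i, j) must cross the subtree root.
        count = 0
        for net in nets:
            inside = sum(1 for pos in net if i <= pos < j)
            if 0 < inside < len(net):
                count += 1
        return count

    def io_height(wires):
        # Smallest height whose root bandwidth can carry this many wires.
        for h, cap in enumerate(caps):
            if cap >= wires:
                return h
        return math.inf

    @lru_cache(maxsize=None)
    def height(i, j):
        if j - i == 1:
            # A single LUT sits in a leaf; its IO must fit the leaf channel.
            return 0 if io(i, j) <= caps[0] else math.inf
        # Try every split point; each half goes into one child subtree.
        best_split = min(max(height(i, k), height(k, j))
                         for k in range(i + 1, j))
        # Wire-limited ranges are pushed higher in the tree (depopulation).
        return max(io_height(io(i, j)), best_split + 1)

    return height(0, num_luts)
```

With four LUTs and nets (0,2), (0,3), (1,2), (1,3), `caps = [2, 4, 4]` fits the design in a height-2 tree (4 leaves, fully used), while the poorer `caps = [2, 3, 4]` forces height 3 (8 leaves, half empty). There are O(n^2) ranges and O(n) split points per range, and the answer depends on the ordering supplied.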


Ordering LUTs

• Another problem

– lay out gates in 1D line

– minimize sum of squared wire length

• tends to cluster connected gates together

– Is solvable mathematically for optimal

• Eigenvector of connectivity matrix

• Use this 1D ordering for our linear ordering
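A common way to realize this ordering is spectral placement: sort the gates by the eigenvector of the graph Laplacian with the second-smallest eigenvalue. The slide only says "eigenvector of connectivity matrix", so the Laplacian formulation here is an assumption, as are the function name and the restriction to two-pin connections in a connected graph. The sketch uses NumPy.

```python
import numpy as np


def spectral_order(num_gates, edges):
    """Order gates along a line by the second-smallest Laplacian eigenvector."""
    laplacian = np.zeros((num_gates, num_gates))
    for a, b in edges:
        laplacian[a, a] += 1   # degree terms on the diagonal
        laplacian[b, b] += 1
        laplacian[a, b] -= 1   # negative adjacency off the diagonal
        laplacian[b, a] -= 1
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vectors = np.linalg.eigh(laplacian)
    positions = vectors[:, 1]  # skip the constant eigenvector (eigenvalue 0)
    return [int(g) for g in np.argsort(positions)]
```

The eigenvector's sign is arbitrary, so the ordering may come out reversed. That does not matter for a linear ordering used only to pick split points.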


Mapping Results


Step 5: Apply Area Model

• Assess impact of resource results


Resources × Area Model ⇒ Area


Net Area


Picking Network Design Point

Don’t optimize for 100% compute util. (100% yield) …also don’t optimize for highest peak.


What about a single design?


LUT Utilization predict Area?

Single design


Methodology

• Architecture model (parameterized)

• Cost model

• Important task characteristics

• Mapping Algorithm

– Map to determine resources

• Apply cost model

• Digest results

– find optimum (multiple?)

– understand conflicts (avoidable?)


Big Ideas[MSB Ideas]

• Rent’s rule characterizes locality

• => Area growth O(N^(2p))

• p>0.5 => interconnect growing faster than compute elements

– expect interconnect to dominate other resources


Big Ideas[MSB Ideas]

• Interconnect area dominates logic area

• Interconnect requirements vary

– among designs

– within a single design

• To minimize area

– focus on using dominant resource (interconnect)

– may underuse non-dominant resources (LUTs)


Big Ideas[MSB Ideas]

• Two different resources here

– compute, interconnect

• Balance of resources required varies among designs (even within designs)

• Cannot expect full utilization of every resource

• Most area-efficient designs may waste some compute resources (cheaper resource)