the sfra: a fixed frequency fpga nweaver/papers/nweaver_thesis.pdf · pdf file1 abstract...
Post on 26-Jun-2018
215 views
Embed Size (px)
TRANSCRIPT
The SFRA: A Fixed Frequency FPGA Architecture
by
Nicholas Croyle Weaver
B.A. (University of California at Berkeley) 1995
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY of CALIFORNIA at BERKELEY
Committee in charge:
Professor John Wawrzynek, ChairProfessor John KubiatowiczProfessor Steven Brenner
2003
The dissertation of Nicholas Croyle Weaver is approved:
Chair Date
Date
Date
University of California at Berkeley
2003
The SFRA: A Fixed Frequency FPGA Architecture
Copyright 2003
by
Nicholas Croyle Weaver
1
Abstract
The SFRA: A Fixed Frequency FPGA Architecture
by
Nicholas Croyle Weaver
Doctor of Philosophy in Computer Science
University of California at Berkeley
Professor John Wawrzynek, Chair
Field Programmable Gate Arrays (FPGAs) are synchronous digital devices used to realize digital
designs on a programmable fabric. Conventional FPGAs use design dependent clocking, so the
resulting clock frequency is dependent on the user design and the mapping process.
An alternative approach, a Fixed-Frequency FPGA, has an intrinsic clock rate at which
all designs will operate after automatic or manual modification. Fixed-Frequency FPGAs offer
significant advantages in computational throughput, throughput per unit area, and ease of integration
as a computational coprocessor in a system on a chip. Previous Fixed-Frequency FPGAs either
suffered from restrictive interconnects, application restrictions, or issues arising from the need for
new toolflows.
This thesis proposes a new interconnect topology, a Corner Turn network, which main-
tains the placement properties and upstream toolflows of a conventional FPGA while allowing effi-
cient, pipelined, fixed frequency operation. Global routing for this topology uses efficient, polyno-
mial time heuristics and complete searches. Detailed routing uses channel-independent packing.
C-slow retiming, a process of increasing the throughput by interleaving independent streams
of execution, is used to automatically modify designs so they operate at the arrays intrinsic clock
frequency. An additional C-slowing tool improves designs targeting conventional FPGAs to isolate
the benefits of this transformation from those arising from fixed frequency operation. This seman-
tic transformation can even be applied to a microprocessor, automatically creating an interleaved
multithreaded architecture.
Since the Corner Turn topology maintains conventional placement properties, this thesis
defines a Fixed-Frequency FPGA architecture which is placement and tool compatible with Xil-
2
inx Virtex FPGAs. By defining the architecture, estimating the area and performance cost, and
providing routing and retiming tools, this thesis directly compares the costs and benefits of a Fixed-
Frequency FPGA with a commercial FPGA.
Professor John WawrzynekDissertation Committee Chair
iii
To my grandparents, who made this all possible.
iv
Contents
List of Figures ix
List of Tables xiii
1 Introduction 11.1 What are FPGAs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Why FPGAs as Computational Devices? . . . . . . . . . . . . . . . . . . . . . . . 41.3 Why Fixed-Frequency FPGAs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.4 Why the SFRA? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 Why a Corner-Turn Interconnect? . . . . . . . . . . . . . . . . . . . . . . 71.4.2 Why C-slow Retiming? . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4.3 The Complete Research Toolflow . . . . . . . . . . . . . . . . . . . . . . 101.4.4 This Works Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Overview of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Related Work 142.1 FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Unpipelined interconnect, variable frequency . . . . . . . . . . . . . . . . 152.1.2 Unpipelined Interconnect Studies . . . . . . . . . . . . . . . . . . . . . . 182.1.3 Pipelined Interconnect, Variable Frequency . . . . . . . . . . . . . . . . . 192.1.4 Pipelined, fixed frequency . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 FPGA Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2.1 General Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2.2 Datapath Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.2.3 Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.2.4 Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Benchmarks 253.1 AES Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2 Smith Waterman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.3 Synthetic microprocessor datapath . . . . . . . . . . . . . . . . . . . . . . . . . . 293.4 LEON Synthesized SPARC core . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
v
I The SFRA Architecture 32
4 The Corner-Turn FPGA Topology 344.1 The Corner-Turn Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 The SFRA: A Pipelined, Corner-Turn FPGA 395.1 The SFRA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.2 The SFRA Toolflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6 Retiming Registers 456.1 Input Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Output Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.3 Mixed Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.4 Deterministic Routing Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.5 Area Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.6 Retiming and the SFRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7 The Layout of the SFRA 527.1 Layout Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.1.1 SRAM Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557.1.2 Output Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577.1.3 Input Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587.1.4 Interconnect Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 Overall Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607.3 Timing Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8 Routing a Corner-Turn Interconnect 638.1 Interaction with Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648.2 The Routing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
8.2.1 Direct Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658.2.2 Fanout Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658.2.3 Pushrouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668.2.4 Zig-zag (2 turn) routing . . . . . . . . . . . . . . . . . . . . . . . . . . . 678.2.5 Randomized Ripup and Reroute . . . . . . . . . . . . . . . . . . . . . . . 688.2.6 Detailed Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.3 Routing Quality & Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698.3.1 AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698.3.2 Smith/Waterman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718.3.3 Synthetic Microprocessor Datapath . . . . . . . . . . . . . . . . . . . . . 728.3.4 Leon 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
vi
9 Other Implications of the Corner-Turn Interconnect 769.1 Defect Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769.2 Memory Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789.3 Hardware Assisted Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799.4 Heterogeneous channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
II Retiming 81
10 Retiming, C-slow Retiming, and Repipelining 8310.1 Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8410.2 Repipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8710.3 C-slow Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8810.4 Retiming for Conventio