the sfra: a fixed frequency fpga nweaver/papers/nweaver_thesis.pdf · pdf file1 abstract...

Click here to load reader

Post on 26-Jun-2018

215 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • The SFRA: A Fixed Frequency FPGA Architecture

    by

    Nicholas Croyle Weaver

    B.A. (University of California at Berkeley) 1995

    A dissertation submitted in partial satisfaction of the

    requirements for the degree of

    Doctor of Philosophy

    in

    Computer Science

    in the

    GRADUATE DIVISION

    of the

    UNIVERSITY of CALIFORNIA at BERKELEY

    Committee in charge:

    Professor John Wawrzynek, ChairProfessor John KubiatowiczProfessor Steven Brenner

    2003

  • The dissertation of Nicholas Croyle Weaver is approved:

    Chair Date

    Date

    Date

    University of California at Berkeley

    2003

  • The SFRA: A Fixed Frequency FPGA Architecture

    Copyright 2003

    by

    Nicholas Croyle Weaver

  • 1

    Abstract

    The SFRA: A Fixed Frequency FPGA Architecture

    by

    Nicholas Croyle Weaver

    Doctor of Philosophy in Computer Science

    University of California at Berkeley

    Professor John Wawrzynek, Chair

    Field Programmable Gate Arrays (FPGAs) are synchronous digital devices used to realize digital

    designs on a programmable fabric. Conventional FPGAs use design dependent clocking, so the

    resulting clock frequency is dependent on the user design and the mapping process.

    An alternative approach, a Fixed-Frequency FPGA, has an intrinsic clock rate at which

    all designs will operate after automatic or manual modification. Fixed-Frequency FPGAs offer

    significant advantages in computational throughput, throughput per unit area, and ease of integration

    as a computational coprocessor in a system on a chip. Previous Fixed-Frequency FPGAs either

    suffered from restrictive interconnects, application restrictions, or issues arising from the need for

    new toolflows.

    This thesis proposes a new interconnect topology, a Corner Turn network, which main-

    tains the placement properties and upstream toolflows of a conventional FPGA while allowing effi-

    cient, pipelined, fixed frequency operation. Global routing for this topology uses efficient, polyno-

    mial time heuristics and complete searches. Detailed routing uses channel-independent packing.

    C-slow retiming, a process of increasing the throughput by interleaving independent streams

    of execution, is used to automatically modify designs so they operate at the arrays intrinsic clock

    frequency. An additional C-slowing tool improves designs targeting conventional FPGAs to isolate

    the benefits of this transformation from those arising from fixed frequency operation. This seman-

    tic transformation can even be applied to a microprocessor, automatically creating an interleaved

    multithreaded architecture.

    Since the Corner Turn topology maintains conventional placement properties, this thesis

    defines a Fixed-Frequency FPGA architecture which is placement and tool compatible with Xil-

  • 2

    inx Virtex FPGAs. By defining the architecture, estimating the area and performance cost, and

    providing routing and retiming tools, this thesis directly compares the costs and benefits of a Fixed-

    Frequency FPGA with a commercial FPGA.

    Professor John WawrzynekDissertation Committee Chair

  • iii

    To my grandparents, who made this all possible.

  • iv

    Contents

    List of Figures ix

    List of Tables xiii

    1 Introduction 11.1 What are FPGAs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Why FPGAs as Computational Devices? . . . . . . . . . . . . . . . . . . . . . . . 41.3 Why Fixed-Frequency FPGAs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.4 Why the SFRA? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.4.1 Why a Corner-Turn Interconnect? . . . . . . . . . . . . . . . . . . . . . . 71.4.2 Why C-slow Retiming? . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4.3 The Complete Research Toolflow . . . . . . . . . . . . . . . . . . . . . . 101.4.4 This Works Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    1.5 Overview of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    2 Related Work 142.1 FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.1.1 Unpipelined interconnect, variable frequency . . . . . . . . . . . . . . . . 152.1.2 Unpipelined Interconnect Studies . . . . . . . . . . . . . . . . . . . . . . 182.1.3 Pipelined Interconnect, Variable Frequency . . . . . . . . . . . . . . . . . 192.1.4 Pipelined, fixed frequency . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    2.2 FPGA Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2.1 General Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2.2 Datapath Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.2.3 Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.2.4 Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

    3 Benchmarks 253.1 AES Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2 Smith Waterman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.3 Synthetic microprocessor datapath . . . . . . . . . . . . . . . . . . . . . . . . . . 293.4 LEON Synthesized SPARC core . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

  • v

    I The SFRA Architecture 32

    4 The Corner-Turn FPGA Topology 344.1 The Corner-Turn Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    5 The SFRA: A Pipelined, Corner-Turn FPGA 395.1 The SFRA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.2 The SFRA Toolflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    6 Retiming Registers 456.1 Input Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Output Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.3 Mixed Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.4 Deterministic Routing Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486.5 Area Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506.6 Retiming and the SFRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    7 The Layout of the SFRA 527.1 Layout Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

    7.1.1 SRAM Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557.1.2 Output Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577.1.3 Input Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587.1.4 Interconnect Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

    7.2 Overall Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607.3 Timing Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    8 Routing a Corner-Turn Interconnect 638.1 Interaction with Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648.2 The Routing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

    8.2.1 Direct Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658.2.2 Fanout Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658.2.3 Pushrouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668.2.4 Zig-zag (2 turn) routing . . . . . . . . . . . . . . . . . . . . . . . . . . . 678.2.5 Randomized Ripup and Reroute . . . . . . . . . . . . . . . . . . . . . . . 688.2.6 Detailed Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

    8.3 Routing Quality & Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698.3.1 AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698.3.2 Smith/Waterman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718.3.3 Synthetic Microprocessor Datapath . . . . . . . . . . . . . . . . . . . . . 728.3.4 Leon 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

    8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

  • vi

    9 Other Implications of the Corner-Turn Interconnect 769.1 Defect Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769.2 Memory Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789.3 Hardware Assisted Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799.4 Heterogeneous channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

    II Retiming 81

    10 Retiming, C-slow Retiming, and Repipelining 8310.1 Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8410.2 Repipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8710.3 C-slow Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8810.4 Retiming for Conventio

View more