Design and Implementation of Three-Dimensional

Logic Structures

by

Shamik Das

Submitted to the Department of Electrical Engineering and Computer Science

in Partial Fulfillment of the Requirements for the Degrees of

Bachelor of Science in Electrical Science and Engineering

and

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2000

© Shamik Das, MM. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part, and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science

May 22, 2000

Certified by: Joseph Jacobson

Associate Professor, Media Arts and Sciences, Thesis Supervisor

Accepted by: Arthur C. Smith

Chairman, Department Committee on Graduate Students

Design and Implementation of Three-Dimensional Logic

Structures

by

Shamik Das

Submitted to the Department of Electrical Engineering and Computer Science on May 22, 2000, in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science in Electrical Science and Engineering and Master of Engineering in Electrical Engineering and Computer Science

Abstract

In this thesis, a computer-aided-design (CAD) system is developed that assists in the design of novel three-dimensional integrated circuits. The software tools allow for the specification of a multilayer transistor circuit by means that are readily accessible to those familiar with two-dimensional CMOS VLSI design. This software system provides desirable features such as SPICE circuit extraction and the ability to produce the design formats necessary for automated fabrication (e.g. mask specifications for lithography or Gerber data for inkjet printing). Finally, in this thesis, the software tools are used to design a ring oscillator, a 3-D static RAM, and a 3-D cellular automata machine.

Thesis Supervisor: Joseph Jacobson

Title: Associate Professor, Media Arts and Sciences

Acknowledgments

I am grateful to many people for their support in the development of this thesis. My thesis advisor, Joe Jacobson, deserves thanks for his guidance and motivation, as well as for many helpful discussions about the research. Babak Nivi, Colin Bulthaup, and Eric Wilhelm were instrumental in fabricating test structures from the software-produced specifications. I also appreciate the many TFT discussions with Babak, Colin, and Brent Ridley, as these were important for shaping the form the circuit-design process was to take. Saul Griffith and Sawyer Fuller deserve thanks for their input on laser and inkjet patterning of functional materials.

In addition, this thesis would not have been completed without the support of many friends, brothers, and loved ones. I would especially like to thank my family - my parents, Dilip and Mala, and my sister, Alina - for their inspiration, direction, and support.

Contents

1 Introduction
1.1 Design of the Layout Software
1.2 Implementation of Test Circuits
1.2.1 Ring Oscillator
1.2.2 Static Random-Access Memory
1.2.3 Cellular Automata Machine

2 FluidLayout - The Layout Software
2.1 Overall Considerations
2.2 Implementation
2.2.1 2-D Slice Manipulation
2.2.2 Circuit Partitioning
2.3 Circuit Verification
2.4 Circuit Fabrication
2.5 Design Walk-Through

3 Some Basic Transistor Circuits
3.1 Minimum Criteria for the Technology
3.2 Design of Basic Circuits

4 The Static Random-Access Memory
4.1 Background and Motivation
4.2 SRAM Operation
4.3 Extensions to 3-D
4.4 Layout of a 3-D SRAM

5 The Cellular-Automata Machine
5.1 Background
5.1.1 Finite-State Machines
5.1.2 Cellular-Automata Machines
5.1.3 The Game of Life
5.2 Layout of a 3-D Game of Life
5.2.1 Game of Life Cell
5.2.2 Game of Life CAM Architecture

6 Conclusion

A FluidLayout User's Guide
A.1 Overview
A.2 Basic Layout
A.3 Higher-Level Functions
A.3.1 Node Labeling
A.3.2 Translation, Rotation, and Reflection
A.4 Circuit-Level Tools
A.4.1 Circuit Traversal
A.4.2 Cell Hierarchy Management
A.4.3 Magic Importation
A.4.4 Circuit Netlist Extraction
A.4.5 VLSI/MEMS Fabrication
A.5 Step-By-Step Design Walk-Through
A.6 Summary of Useful Commands

List of Figures

2-1 CLayer object with embedded CRectangle objects.
2-2 Corner-stitched CRectangle object.
2-3 Area enumeration of CRectangle objects within a bounding rectangle.
2-4 Canonical technology used in FluidLayout.
2-5 Box-outlining is used to place materials in FluidLayout.
2-6 Complete NMOS pulldown path.
2-7 Placement of a metal2->gate via results in a gate->metal2 hint on the second layer.
2-8 The complete inverter.
3-1 NMOS inverter.
3-2 NMOS inverter small-signal model about VM.
3-3 Layout of test devices.
3-4 3-D layout of a ring oscillator. The three inverters shown are stacked to form the 3-D layout.
3-5 Stamp pattern with spatially-separated material patterns.
3-6 Patterned gate for a NOR gate, SRAM cell, and ring oscillator.
3-7 Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.
4-1 Six-transistor circuit for individual bit storage.
4-2 Eight-transistor circuit for use in 3-D SRAM.
4-3 Proper cell distribution improves aspect ratio and decreases bit-line length.
4-4 3-D partitioning of rows allows for simple tri-stating of word lines.
4-5 First and second layers of an eight-layer 3-D SRAM.
4-6 6-T SRAM cell layout.
4-7 Word-line tri-stating.
4-8 Bit-line decoding using the word-line tri-state control signal.
5-1 Finite state machine.
5-2 Turing machine.
5-3 Four cells of a cellular-automata machine.
5-4 Insertion sort bit-slice.
5-5 Game of Life cell.
5-6 Clock distribution in the 3-D Game of Life architecture.
5-7 Layer of a 3-D Game of Life architecture comprising a 4 x 4 array of cells.
A-1 FluidLayout screenshot.
A-2 The main FluidLayout toolbar.
A-3 Partial view showing node selection.
A-4 Label dialog.
A-5 Subcircuit rotation.
A-6 Cell hierarchy management toolbar.
A-7 Edit->Properties->Laser Setup.
A-8 4 x 2 box used for the source of an NMOS transistor.
A-9 Completed source node of the NMOS transistor.
A-10 NMOS source and drain nodes.
A-11 Inverter source and drain nodes.
A-12 Complete inverter.
A-13 Window menu.
A-14 First-layer inverter with contact pads.
A-15 First-layer inverter with via stacks to the second layer.
A-16 Labeled first-layer inverter.
A-17 File->Export menu.

Chapter 1

Introduction

CMOS integrated circuits are traditionally fabricated on crystalline silicon wafers. Transistor structures are created on the surface of these wafers by implantation into the wafer and growth and deposition of material over the surface of the wafer. This fundamentally results in a two-dimensional circuit layout, as the transistors are confined to inhabit the boundary of the silicon substrate.

However, a number of advances in solid-state technology have made possible the development of more complicated three-dimensional transistor structures. All of these advances rely on creating two-dimensional circuit "layers" by standard means and then interconnecting these layers into a multi-layer structure. For example, the development of wafer-scale integration (WSI) allows for the creation of three-dimensional circuits by stacking wafers using a lift-off and bonding process [24]. Also, the advent of silicon-on-insulator (SOI) technology allows for a third dimension of circuitry by encapsulating an existing two-dimensional circuit with insulating material, planarizing this material, and placing the next layer of silicon on this insulator.

A very promising path to multi-layer transistor circuits involves the use of solution-processed metals, semiconductors, and insulators. Transistors can be laid out by depositing the appropriate solutions onto an insulating surface, followed by curing to produce the desired materials [17, 16]. This approach has the advantage that it does not require integration on the wafer scale (which itself requires novel means of wafer verification and packaging) and is theoretically extensible to thousands of layers.

Having multiple layers in which to fabricate transistors gives the circuit designer the potential to improve the efficiency of circuits in terms of area, power, and speed. The savings in area are two-fold: first, by utilizing the third dimension as space for additional circuitry, integrated circuits can be made more dense without expanding the "footprint" of the circuit and without having to improve the process technology. This form of area improvement is best for memory devices such as SRAM, and also good for DRAM and EEPROM, where the goal is to fit as many bits of memory as possible into a given chip. Specifically, the use of n active layers allows for an n-fold improvement in the storage capacity of a memory chip, with little area overhead in the control circuitry.

The other approach to area savings lies in the retargeting of two-dimensional circuit layouts for a three-dimensional process technology. Theoretical results indicate that for many interesting 2-D circuit layouts, there exist corresponding 3-D layouts that are more efficient in terms of area (by which, in the three-dimensional context, we mean the aggregate area of all layers of the circuit) and maximum wire-run (the longest length of wire between any two active nodes). For example, the $n$-point Fast Fourier Transform (FFT) network can be implemented in area $O(n^{3/2})$ with maximum wire-run $O(n^{1/2})$ with three dimensions of active circuitry, while in a standard 2-D process, the same circuit would occupy area $\Omega(n^2)$ with maximum wire-run $\Omega(n/\log n)$ [18]. A hypercube network of $n$ nodes, used in many parallel-processing schemes, can also be implemented in area $O(n^{3/2})$ using three dimensions, but requires area $\Omega(n^2)$ using the standard two dimensions [11]. Finally, results in [10] indicate that any $n$-device circuit that can be laid out in area $A$ in two dimensions can be laid out in area approximately $(nA)^{1/2}$ using three.
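To give a feel for the FFT and hypercube bounds quoted above, the following minimal sketch compares the two growth terms, $n^2$ for two dimensions and $n^{3/2}$ for three. The constant factors hidden by the asymptotic notation are assumed to be 1, which is an illustrative simplification, and the function names are hypothetical; only the trend of the ratio is meaningful.

```python
def area_2d(n: int) -> float:
    """Growth term of the 2-D lower bound, n^2 (constant factor assumed to be 1)."""
    return float(n) ** 2


def area_3d(n: int) -> float:
    """Growth term of the 3-D upper bound, n^(3/2) (constant factor assumed to be 1)."""
    return float(n) ** 1.5


def area_ratio(n: int) -> float:
    """How many times larger the 2-D term is than the 3-D term; grows as n^(1/2)."""
    return area_2d(n) / area_3d(n)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(f"n = {n:5d}: 2-D / 3-D area ratio ~ {area_ratio(n):.1f}")
```

Under these assumptions the advantage of the 3-D layout grows with the square root of the network size.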

The savings in power may be realized by reducing the switched capacitance in circuits. By reducing the lengths of interconnect, the capacitance of internal nodes can be reduced, thus reducing the dynamic power dissipation of the circuit. For example, a potentially important savings can be realized in the layout of H-trees, which as indicated by [18] can be laid out with maximum wire-run $O(n^{1/3})$ in three dimensions but require wire runs of $\Omega(n^{1/2}/\log n)$ in two dimensions. Since clock distribution nets are often realized as H-trees, the potential exists to save power by utilizing the more efficient distribution architectures available with three dimensions.
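The link between wire length and power can be illustrated with the usual first-order estimate of CMOS dynamic power, $P = \alpha C V_{DD}^2 f$. This relation is not stated in the text and is used here as an assumption, as is the simple model in which wire capacitance is proportional to wire length. All names and numerical values are hypothetical.

```python
def wire_capacitance(length_m: float, cap_per_meter: float) -> float:
    """Capacitance of a wire, assuming it scales linearly with length."""
    return length_m * cap_per_meter


def dynamic_power(c_switched: float, v_dd: float, freq: float,
                  activity: float = 1.0) -> float:
    """First-order dynamic power estimate: P = activity * C * Vdd^2 * f."""
    return activity * c_switched * v_dd ** 2 * freq


if __name__ == "__main__":
    cap_per_meter = 2e-10              # F/m, illustrative value only
    for length in (10e-3, 5e-3):       # a wire run and the same run halved
        c = wire_capacitance(length, cap_per_meter)
        print(length, dynamic_power(c, v_dd=5.0, freq=1e6))
```

In this model, halving the wire run of a clock net halves the power spent charging and discharging it.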

Finally, as circuits get more and more complicated, lengths of interconnect will affect timing characteristics. Reducing the wire run of circuits will reduce the charge and discharge times on these wires and enable faster operation of circuits.

There are many reasons to develop the technology to fabricate three-dimensional CMOS devices. While developing such technology is beyond the scope of this thesis, it is important to realize that the ability to design logic with this technology must be developed simultaneously. Therefore, in this thesis, software tools are developed that are used to target circuit designs for a three-dimensional MOS process that has been developed contemporaneously. These tools are then used to synthesize some circuits that demonstrate the viability of the layout tools and some of the benefits of the new medium.

1.1 Design of the Layout Software

Digital system design is usually done at three levels: behavioral, structural, and physical [23]. At the behavioral level, a digital system is specified by what it is intended to do; at the structural level, by what functional building blocks (e.g. gates, adders, registers, IP cores) are to be used; and at the physical level, by what construction materials are to be used and in what geometry they are to be configured. Corresponding to this division are several layers of abstraction - architectural, register-transfer-language (RTL), logical, and circuit layers - at which the designer can work [23].

In a typical design flow, the designer will often work through both of these chains in parallel; for example, he or she may start by developing a behavioral and architectural specification for a system and proceed to flesh out the implementation details down to the circuit layer and at the physical level. Much of the task of fleshing out the details of a system is done with computer-aided design (CAD) tools. The goal of any suite of CAD tools for digital system design is to produce a working circuit, that is, to specify fully a working design at the physical level and at the lowest abstraction layer.

Typically, the design flow can be broken up into two phases - technology-independent design and technology-dependent design. For example, the process of behavioral specification is ideally technology-independent, whereas physical design at the circuit level is clearly technology-dependent. Each phase has an associated set of algorithms that are generally implemented in separate CAD tools. The process of targeting an abstract system design for a particular technology is called technology mapping, and can be done by a third set of CAD tools or as a final- or initial-stage operation of the two phases of design.

In order to maximize the usefulness of any new technology, design tools must be developed that allow both for the use of new features of the technology and for the seamless integration of the technology with existing means of technology-independent design. In this thesis, the focus is on the technology-dependent phase of design. CAD tools are developed that allow the designer to work at the physical and structural levels at arbitrary levels of abstraction. The emphasis is twofold: first, familiar graphical user interfaces (GUIs) are adapted for use in working in a three-dimensional environment; second, the CAD tool allows for maximal use of the new features of this environment.

Specifically, a CAD tool for 3-D circuit layouts is designed using the open-source Magic VLSI layout system as a basis [15, 12]. The primary features in Magic that will be relied upon are the speed of the central algorithms and the familiarity of the GUI. Magic uses a geometric representation of the physical layout of the system that is based upon a scheme devised by Mead and Conway [13].

At the physical level, the design approach used for this thesis is to design each layer of a multi-layer circuit as a distinct two-dimensional circuit. From a physical perspective, each individual layer of a multi-layer circuit looks like a traditional two-dimensional circuit; therefore, the layout of each layer can theoretically be done using available CAD tools. In fact, 3-D circuit design using this approach is the subject of ongoing research, where a small number of layers is considered [25]. However, the use of existing CAD tools becomes infeasible as the number of layers becomes large. Thus, in this thesis, a CAD system is developed that integrates the familiarity of two-dimensional circuit design with a means of managing large numbers of layers and a direct means of wiring between the layers.

1.2 Implementation of Test Circuits

The viability of the layout software is best tested by using it to implement various test circuits. These circuits should be chosen both to exhibit features of the medium and to exhibit useful properties of the software. To this end, the layout of three circuits has been carried out using the new software: a ring oscillator, a static random-access memory (SRAM), and a simple cellular automata machine (CAM).

1.2.1 Ring Oscillator

In the development of any new technology in which circuits are to be fabricated, the ring oscillator is a fundamental circuit in that it is the simplest circuit to demonstrate the ability to cascade logic gates. That is, in any such technology, while the first goal is always to fabricate individual transistors, it does not necessarily follow that these transistors can be fashioned into a suitable multi-transistor logic gate. An individual logic gate must provide gain from the input to the output, or else when the gates are cascaded, the signals eventually decay to an ambiguous logic level [5].

A ring oscillator consists of an odd number of inverters cascaded in series into a ring. Once the ring oscillator is powered, any latent signal is able to propagate through the ring; this signal is inverted as it passes through each gate. When the signal returns to its starting point, it returns as the inverse of the original signal since there is an odd number of gates in the ring. So if the voltage at any particular node of the circuit is observed as a function of time, the result is an oscillation with period equal to twice the transit time of the signal through the ring.
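The period relation can be written out as a small calculation. The sketch below assumes that every inverter contributes the same propagation delay, so that the transit time is the number of stages times the stage delay; this uniform-delay model is a simplification introduced for illustration, and the function name is hypothetical.

```python
def ring_oscillator_period(num_stages: int, stage_delay: float) -> float:
    """Oscillation period of a ring of identical inverters.

    The signal must travel around the ring twice to return to its original
    value, so the period is twice the transit time through the ring.
    """
    if num_stages < 1 or num_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    transit_time = num_stages * stage_delay   # uniform stage delay assumed
    return 2.0 * transit_time


if __name__ == "__main__":
    period = ring_oscillator_period(num_stages=3, stage_delay=1e-6)
    print(f"period = {period:.2e} s, frequency = {1.0 / period:.2e} Hz")
```

Conversely, a measured oscillation frequency yields an estimate of the average delay of a single gate in the technology.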

Since there is an odd number of inverters in the ring, the circuit acts to provide negative feedback on the signal. Oscillations are produced if the feedback loop is unstable. Thus, if the inverters that make up the ring do not provide sufficient gain (i.e. if the loop gain never exceeds 1), the signal stabilizes to an ambiguous logic level midway between the supply voltage and ground. In particular, it is desirable for the individual transistors to have as large a transconductance, $g_m$, as possible, where $g_m$ is measured at the voltage corresponding to this ambiguous logic level. Having a sufficient $g_m$ produces the necessary gain to drive the output signals away from ambiguous logic levels and towards the voltage extremes (i.e. low or high voltage). Since $g_m$ is directly proportional to the mobility, $\mu$, of carriers in the transistor channel, having a large mobility is desired. However, it is possible to overcome the absence of large mobilities to some extent, because of other factors on which $g_m$ is dependent. For example, one may increase the width-to-length ratio of the transistor channel, decrease the thickness of the gate oxide, or increase the supply voltage (thereby increasing the midpoint voltage where $g_m$ is measured). Further, in any technology, the proper choice of pullup (for an n-channel technology) or pulldown (for a p-channel technology) can reduce the dependency on device parameters. Nonetheless, the efficacy of these maneuvers is limited, due to circuit-area constraints, device-breakdown limits, and second-order effects. Also, in a complementary technology, which is most desirable due to power considerations, having high-quality basic device parameters is essential.
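The dependencies listed above can be made concrete with the textbook long-channel (square-law) expression for a MOS transistor in saturation, $g_m = \mu C_{ox} (W/L)(V_{GS} - V_T)$ with $C_{ox} = \varepsilon_{ox}/t_{ox}$. This model is an assumption made for illustration and is not taken from the text; the function name, the default oxide permittivity, and the numerical values are likewise hypothetical.

```python
EPS_0 = 8.854e-12  # permittivity of free space, F/m


def transconductance(mobility: float, w_over_l: float, t_ox: float,
                     v_gs: float, v_t: float, eps_r: float = 3.9) -> float:
    """Square-law saturation transconductance, in siemens.

    mobility is in m^2/(V s), t_ox in meters, voltages in volts. eps_r is
    the relative permittivity of the gate insulator (3.9 is the usual value
    for silicon dioxide; other insulators will differ).
    """
    c_ox = eps_r * EPS_0 / t_ox                 # gate capacitance per unit area
    overdrive = max(v_gs - v_t, 0.0)            # no conduction below threshold
    return mobility * c_ox * w_over_l * overdrive


if __name__ == "__main__":
    base = transconductance(1e-4, 10.0, 100e-9, v_gs=5.0, v_t=1.0)
    wider = transconductance(1e-4, 20.0, 100e-9, v_gs=5.0, v_t=1.0)
    thinner = transconductance(1e-4, 10.0, 50e-9, v_gs=5.0, v_t=1.0)
    print(base, wider, thinner)
```

In this model, doubling the width-to-length ratio or halving the oxide thickness compensates for a factor-of-two deficit in mobility, which is the trade-off described in the paragraph above.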

Since the ring oscillator is a planar circuit, there exists a "natural" two-dimensional layout for the circuit at the transistor level. However, having a third dimension presents the opportunity to examine some new layout strategies.

1.2.2 Static Random-Access Memory

One immediate application of a viable three-dimensional integration technology is in memories. The density of memory arises directly from the ability to pack as many homogeneous cells into a given chip as possible. So the availability of multiple layers in a chip allows for a direct approach to increasing density - simply stacking 2-D memory circuits into a 3-D chip gives the desired increase in density.

This approach has been implemented at the system level by physically stacking chips and using off-chip circuitry to control the chip-enable signals and I/O [4]. This is, of course, not extensible to arbitrarily many layers. However, the same approach can be taken at the chip level by internally wiring control signals to each layer of the circuit and wiring the data lines together in the same way that the data pins have been soldered together.

In this thesis, a simple 3-D static random-access memory (SRAM) is designed and laid out using the CAD software. SRAM is chosen for several reasons. First, it can be fabricated using MOS technology and does not require special transistor structures. While other memories such as EEPROM and DRAM may see a greater push for increased density, these memories require dedicated fabrication technologies. SRAM, on the other hand, can be fabricated in a standard logic technology. Second, while read-only memories (ROMs) can also be fabricated using only standard MOS transistors, the need for high density is more prevalent in systems with writable storage. So for this thesis, an SRAM is designed that exhibits writable and retrievable storage and uses multiple layers of active material while using a standard two-dimensional pin-out. Such an SRAM can therefore be made into a drop-in replacement for currently available SRAM.

1.2.3 Cellular Automata Machine

While digital system design at the architecture level ideally is done without the fabrication technology in mind, the limitations of the technology inevitably play a role in the selection of a computational architecture. For example, in designing a multi-processing architecture, physical constraints to two dimensions lead to architectural constraints in terms of the number of processors that can be embedded in a given area [11].

In particular, design choices are often driven by the problem to be solved by the system. There are many computational problems that can be described efficiently using certain physical architectures and thus solved by modelling the architecture by a digital system. For example, single-instruction multiple-data (SIMD) architectures, and in particular cellular automata, have been shown to effectively model the (inherently three-dimensional) dynamics of many problems in physics [22, 8].

Additionally, it has been shown that there exist cellular automata machine (CAM) architectures that can do general-purpose computations, i.e. that are equivalent to a Universal Turing Machine. The CAM architecture therefore can serve as a potential alternative to the architectures found in traditional computer processors [20].

For CAMs that are designed to model 3-D physical processes, it becomes infeasible to map the 3-D CAM into a two-dimensional circuit as the number of cells becomes large - the cost of interconnect becomes prohibitive. However, with a true three-dimensional technology, the mapping of the CAM architecture to an integrated circuit is direct, thereby allowing the physical construction of machines that are impossible to integrate onto a single 2-D chip.

Therefore, in this thesis, the software tools are used to design a simple 3-D CAM. One of the simplest cellular automata machines to exhibit interesting global behaviors is the Game of Life, devised by John Conway in 1970 [20]. The Game of Life is specified for a two-dimensional architecture, but can be extended to three dimensions [1, 2, 3]. It describes the cells as having one of two states (either "alive" or "dead"), with the state of any given cell on the next cycle of the game being determined by the states of its neighbors on the current cycle. The behavior of the machine as a whole can thus be observed by visually inspecting the cells ("alive" being indicated by a color or dot). Since a circuit that simulates the Game of Life can be readily verified, such a circuit is designed in this thesis.
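As a point of reference for the update rule, the sketch below implements one cycle of the classic two-dimensional Game of Life: a live cell survives with two or three live neighbors, and a dead cell becomes alive with exactly three. The wrap-around (toroidal) boundary and the function name are assumptions made for illustration. Three-dimensional variants use different neighbor counts and thresholds, which are not specified in this chapter and are therefore not modelled here.

```python
from typing import List

Grid = List[List[int]]  # 1 = "alive", 0 = "dead"


def life_step(grid: Grid) -> Grid:
    """Compute the next cycle of the 2-D Game of Life on a wrap-around grid."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbors of cell (r, c) on the current cycle.
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if live == 3 or (grid[r][c] == 1 and live == 2):
                nxt[r][c] = 1
    return nxt


if __name__ == "__main__":
    blinker = [[0] * 5 for _ in range(5)]
    for c in (1, 2, 3):
        blinker[2][c] = 1
    for row in life_step(blinker):
        print("".join("#" if cell else "." for cell in row))
```

Each cell needs only its own state and those of its immediate neighbors, which is why the rule maps naturally onto an array of identical hardware cells with local interconnect.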

Chapter 2

FluidLayout - The Layout Software

There is a clear potential for circuit design innovation above and beyond what is possible with a two-dimensional fabrication technology. Since all known routes to three-dimensional circuit fabrication involve the construction of multiple layers of two-dimensional circuits, with inter-layer interconnect done by vias, it is tempting to propose that the design of three-dimensional circuits be carried out using existing software tools for each two-dimensional layer of the circuit. This approach provides the fastest route to working 3-D prototype circuits.

However, there are several drawbacks to this approach that will severely limit its usability for designing complex 3-D circuits. First, as each layer of the circuit must be managed as a separate design, the management overhead increases with the number of layers. The designer is responsible for keeping track of interconnect between each pair of layers. While this may be feasible for a fixed number of layers, it becomes intractable for an arbitrary number of layers. Second, the automation of system-level tasks such as circuit netlist extraction and mask generation becomes difficult or impossible without additional software scripts or programs.

A better approach is to design CAD software with integrated support for designing three-dimensional circuits. From a system perspective, software with this capability can provide the designer both with needed assistance in 3-D design management and with useful system-level design tools. Simultaneously, the software can utilize algorithms written for two-dimensional circuit design, as the individual operations that will be carried out by the designer are the same in both 2-D and 3-D design.

In this thesis, such a software tool, named FluidLayout owing to the solution-processing fabrication technology being used, has been developed. FluidLayout provides designers of 3-D circuits an integrated environment for laying out all layers of a circuit and for verification and fabrication of three-dimensional layouts.

2.1 Overall Considerations

Much as circuit fabrication has been limited to constructing two-dimensional devices, circuit design has been fraught with limitations imposed by two-dimensional design methodologies. Traditional pen-and-paper circuit design, for example, requires partitioning a 3-D circuit into 2-D layers, either by using separate sheets of paper or by spatially separating the layers on a single sheet. In the case of circuit fabrication, the gains introduced with a third dimension justify the expense of developing the fabrication technology. However, in the case of circuit design, it is better to make optimal use of familiar design techniques rather than impose new design methodologies with associated learning curves. In addition, the costs of implementing a truly 3-D user interface are prohibitive.

In FluidLayout, therefore, three-dimensional circuit layout is done by managing an arbitrarily large set of individual two-dimensional layers. The layout of each layer is performed in the same manner as a two-dimensional layout would be performed in many existing software packages. The consequences of this design decision are twofold. First, efficient algorithms for 2-D layout have already been developed and the corresponding source code may be reused. Thus, the core layout manipulation routines do not have to be redeveloped. Second, it is desirable to implement many whole-circuit algorithms such as netlist extraction and mask generation. These algorithms can be extended from 2-D to 3-D while maintaining their efficiency in terms of order-of-growth as a function of the number of transistors in the design.
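The point about order-of-growth can be illustrated with a toy whole-circuit pass. The sketch below models a 3-D layout as a mapping from layer number to a list of rectangles and groups the rectangles into one mask per (layer, material) pair. The data model and all names are hypothetical and are not FluidLayout's actual classes; the pass is only a stand-in for mask generation.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    material: str
    x0: int
    y0: int
    x1: int
    y1: int


# A 3-D layout: one list of rectangles per 2-D layer, keyed by layer number.
Layout3D = Dict[int, List[Rect]]


def generate_masks(layout: Layout3D) -> Dict[Tuple[int, str], List[Rect]]:
    """Group rectangles into one mask per (layer, material) pair.

    Every rectangle is visited exactly once, so the cost grows linearly with
    the total amount of geometry, just as the same pass would over a single
    2-D layer.
    """
    masks: Dict[Tuple[int, str], List[Rect]] = defaultdict(list)
    for layer, rects in layout.items():
        for rect in rects:
            masks[(layer, rect.material)].append(rect)
    return dict(masks)


if __name__ == "__main__":
    layout = {
        1: [Rect("poly", 0, 0, 2, 6), Rect("metal1", 0, 0, 4, 2)],
        2: [Rect("poly", 0, 0, 2, 6)],
    }
    for key, rects in sorted(generate_masks(layout).items()):
        print(key, len(rects))
```

The outer loop over layers adds no asymptotic cost of its own: the work is still proportional to the number of rectangles in the whole design.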

There are two stated goals to be achieved with FluidLayout. First, the user should be able to manage a true three-dimensional circuit with an arbitrary number of layers. Second, the design process for individual layers should be familiar to designers of two-dimensional VLSI circuits. These goals have been taken into consideration at all stages of the software design process of FluidLayout.

For example, in typical two-dimensional design formats, the representation of a VLSI layout is encoded as a list of material regions. Each region contains data that identifies the type of material (e.g. polysilicon) and the boundary coordinates of the material (e.g. the corners of a rectangle). The VLSI layout may then be stored as a file containing a list of regions.

Thus, if an existing software tool uses this format as its native format, extension of this software for use in designing 3-D circuits becomes difficult; there is no means for differentiating the regions in the data file with respect to their locations along the third dimension if the coordinates used are 2-D coordinates. However, there are several remedies of varying efficacy.

The first is to differentiate the material types by layer. For example, polysilicon on

the first layer of transistors might be assigned material type poly-1 while polysilicon

on the second layer might be assigned type poly-2. Since many software packages

have support for adding or changing material types, this approach is straightforward

to implement. However, the approach also has several drawbacks. For example, the

user must define material types for each layer of transistors, a process that becomes

tedious as the number of layers grows. Second, since the user interface is not 3-D-

aware (i.e. not cognizant of the fact that poly-2 corresponds to a different layer than

poly-1), all transistor layers will be displayed simultaneously. While this may be

acceptable for two layers of transistors, it becomes unmanageable for more.

Another approach to handling 3-D circuits in existing software packages is to

manage each third-dimension layer as a separate circuit. Small helper programs

may be written to perform the inter-layer registration, circuit netlist extraction, and

preparation of fabrication-ready output. While this approach is more sophisticated

than the previous approach, it has the drawback that the user has to run multiple

programs in order to obtain a working circuit; each program will have its associated

learning curve.

By contrast, FluidLayout organizes the material regions by layer. Rather than

store a VLSI layout as a collection of rectangles, FluidLayout stores a collection of

layers, where each layer is stored as a collection of rectangles. This is equivalent to

packing a collection of 2-D layout files into a single meta-file, and in fact, FluidLayout

has the means to import two-dimensional circuits (created in a traditional software

package) as individual layers of a three-dimensional circuit.
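The layer-of-rectangles organization described above can be sketched as follows. This is a minimal illustration in Python rather than the Visual C++ of FluidLayout, and the names `Rect`, `Layout3D`, `add_layer`, `place`, and `import_2d` are hypothetical; they are not the actual FluidLayout classes or methods.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rect:
    """One material region: a material name plus integer corner coordinates."""
    material: str
    x0: int
    y0: int
    x1: int
    y1: int

@dataclass
class Layout3D:
    """A 3-D layout stored as an ordered list of 2-D layers of rectangles."""
    layers: list = field(default_factory=list)

    def add_layer(self):
        """Append an empty layer and return its index along the third dimension."""
        self.layers.append([])
        return len(self.layers) - 1

    def place(self, layer, rect):
        """Put a rectangle on one specific layer; no per-layer material names are needed."""
        self.layers[layer].append(rect)

    def import_2d(self, flat_rects):
        """Import a traditional 2-D layout (a flat list of rectangles) as a new layer."""
        index = self.add_layer()
        self.layers[index].extend(flat_rects)
        return index

layout = Layout3D()
bottom = layout.add_layer()
layout.place(bottom, Rect("polysilicon", 0, 0, 2, 10))
top = layout.import_2d([Rect("polysilicon", 4, 0, 6, 10)])
```

Both layers use the same material name; the layer index alone distinguishes them, which is what removes the need for types such as poly-1 and poly-2.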

This approach has advantages over the others. First, a circuit in FluidLayout

may contain arbitrarily many layers. FluidLayout provides easy means to add layers

to a circuit while at the same time obviating the task of managing as many files

as there are layers. Second, since FluidLayout is aware of the three-dimensionality

of its circuits, the user interface can display the individual layers separately while

simultaneously being able to indicate inter-layer interconnections.

This display interface is another area of FluidLayout where careful consideration

was made of the 3-D nature of the circuits. The user must be able to manage all

the layers of a circuit without having to view them simultaneously. Similarly, when

laying out circuits, it must be clear both to the user and to FluidLayout as to which

layer is the target of the user's instructions.

Two approaches to solving these problems were considered. One way is to allow

the user to set a "visible range" of materials, where the materials are ordered according

to their physical order in the technology. For example, the user might wish to view

all materials between the gate on layer 7 and metal 2 on layer 9. This allows the

user to manage the entire circuit within a single document window, and also permits

the user to view as little as one material or as much as the entire circuit all at once.

However, there are several issues with this design. For example, if the user elects to

view a range of materials that spans more than one layer, there is potential ambiguity

when the user decides to place certain materials. For example, if the visible range

encompasses gate material on layers 2 and 3, and the user wants to place a new gate,

it is not clear to the interface whether this is a 2nd-layer gate or a 3rd-layer gate. A

similar manifestation of this problem is that in the same situation, it is difficult to

distinguish visually the two gate layers that are being viewed simultaneously. The

only corrective means is to restrict the visible range to at most one material of any

given type. However, this then prohibits the user from viewing and editing different

layers of the circuit simultaneously.

The second approach is to maintain a separate view window for each layer of the

circuit. Each view may then be treated exactly as a two-dimensional circuit. The only

additional function to be performed is to manage the inter-layer interconnect, which

can be done by message-passing between the views. This approach has the drawback

that the view windows number as many as the layers, meaning that simultaneous

editing of more than a few layers is prohibitively complicated. However, each layer

may be edited without ambiguity, and with the assumption that the user will not

want to edit more than three or so layers at a time, this option becomes the more

desirable choice for implementation.

2.2 Implementation

FluidLayout was written in Microsoft Visual C++ (version 5.0) for the Microsoft

Windows operating systems.

In the graphical user interface (GUI), a three-dimensional integrated circuit is

represented as an ordered set of 2-D slices. Each slice may be manipulated as an

individual 2-D circuit. The representation of a slice in the GUI is the traditional

mask representation, i.e. a top-down viewpoint with metals and semiconductor rep-

resented as colored rectangular paths on a Manhattan grid. Thus, the manipulation

of individual slices should be familiar to those experienced in 2-D integrated-circuit

design.

A circuit layout is stored internally in FluidLayout as a CCellDef object. A

CCellDef object contains several 2-D circuit slices implemented as sets of CLayer

objects. Each CLayer object consists of a set of CRectangle objects that represent

the 2-D circuit slice materials. Additionally, a circuit layout may contain discrete

layouts within it as subcells of the layout; these are referenced via CRectangle objects

of type CELL. Thus, the designer can maintain a hierarchy of CCellDef objects that

represents designs at various levels of integration, and any given CCellDef object

may be used as a subcell of another CCellDef object.

The CCellDef methods are mainly used for editing the circuit layout. CCellDef

has methods for adding rectangles and subcells to the layout and erasing rectan-

gles and subcells from the layout. There are also methods for producing copies of

individual slices with or without the subcell contents flattened into the slice.
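As an illustration of flattening, the sketch below expands subcell contents into a copy of one slice. It is written in Python under simplifying assumptions: the class `CellDef` and its methods are hypothetical stand-ins, subcell placements are stored as plain offsets rather than as rectangles of type CELL, and slice i of a subcell is assumed to map onto slice i of its parent.

```python
from dataclasses import dataclass, field

@dataclass
class CellDef:
    """Hypothetical stand-in for a cell: rectangles per slice plus placed subcells."""
    slices: dict = field(default_factory=dict)    # slice index -> [(material, x0, y0, x1, y1)]
    subcells: list = field(default_factory=list)  # [(CellDef, dx, dy)] placements

    def add_rect(self, slice_index, material, x0, y0, x1, y1):
        self.slices.setdefault(slice_index, []).append((material, x0, y0, x1, y1))

    def add_subcell(self, cell, dx, dy):
        self.subcells.append((cell, dx, dy))

    def flatten_slice(self, slice_index, dx=0, dy=0):
        """Return a copy of one slice with all subcell contents expanded in place."""
        out = [(m, x0 + dx, y0 + dy, x1 + dx, y1 + dy)
               for (m, x0, y0, x1, y1) in self.slices.get(slice_index, [])]
        for cell, sx, sy in self.subcells:
            # Recurse so that subcells of subcells are expanded as well.
            out.extend(cell.flatten_slice(slice_index, dx + sx, dy + sy))
        return out

inverter = CellDef()
inverter.add_rect(0, "gate", 0, 0, 2, 6)

chip = CellDef()
chip.add_rect(0, "metal1", 0, 8, 20, 10)
chip.add_subcell(inverter, 0, 0)     # the same definition is reused twice
chip.add_subcell(inverter, 10, 0)

flat = chip.flatten_slice(0)
```

The flattened copy holds the parent's own rectangle plus one translated copy of the subcell gate per placement, while the stored hierarchy is left unchanged.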

Most of the layout manipulation is done within the CLayer object.

2.2.1 2-D Slice Manipulation

The design interface for an individual layer is modelled after that of the Magic VLSI

CAD system [15, 12]. Magic is a geometric box-painting tool that has algorithms for

interpreting box paintings as integrated-circuit layouts. A circuit layout is represented

as a set of colored rectangles in a 2-D coordinate system; the core algorithm in Magic

is thus an efficient means of rectangle manipulation [14].

Within FluidLayout, each 2-D slice of a 3-D circuit is represented as a collection

of CLayer objects. The different mask layers for each slice of a circuit (e.g. gate,

source/drain, metals, vias) are partitioned among different CLayer objects, with one

CLayer for all metals and semiconductor, one CLayer for each type of via, and one

CLayer for each type of subcell. This partitioning is done to maximize the efficiency

of top-level algorithms such as rendering the layout in the GUI and extracting the

circuit netlist, while not consuming excessive amounts of memory in overhead.

The CLayer object, depicted in Figure 2-1, contains a pointer to a CRectangle

object. Each CRectangle object contains its integer coordinates, its material type

(e.g. gate, source/drain, semiconductor, or via) and pointers to CRectangle objects

adjacent at the upper-right and lower-left corners. Thus, a CLayer object may be

thought of as a collection of disjoint rectangles that tiles a plane. Further, there are

efficient algorithms to traverse the plane from any particular starting point to a given

finish point and to iterate through all rectangles within a bounding rectangle [14].

These algorithms have been implemented in the Magic source code [15, 12] and are

readily implemented in FluidLayout.

Figure 2-1: CLayer object with embedded CRectangle objects.

Figure 2-2: Corner-stitched CRectangle object, with top, right, left, and bottom pointers.

Specifically, FluidLayout represents a VLSI layout as a set of corner-stitched rect-

angles, as discussed in [14]. Figure 2-2 illustrates the corner-stitching of a rectangle.

This corner-stitching allows for linear-time searching of a CLayer object and linear-

time area enumeration.

To do this, each CRectangle object has a GotoPoint method. Given a CRectangle

R and a destination point in the CLayer containing R, GotoPoint follows the top

and bottom pointers to reach the desired ordinate and then follows the left and

right pointers to reach the desired abscissa. Since following left and right may

cause deviation from the desired ordinate, this procedure must be iterated until the

rectangle at the destination point is found. However, since the stitched objects are

convex, the algorithm is guaranteed to terminate [14].
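The point search can be sketched as below. This is an illustrative Python version, not the FluidLayout source: the `Tile` class, the `stitch` helper (which sets up the pointers by brute force for a small hand-built example), and the half-open integer coordinate convention are all assumptions. The placement of the four pointers follows the corner-stitching convention of [14], with `right` and `top` attached at the upper-right corner and `left` and `bottom` at the lower-left corner.

```python
class Tile:
    """A rectangle covering [x0, x1) x [y0, y1) with four corner stitches."""
    def __init__(self, x0, y0, x1, y1, material):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.material = material
        self.right = self.top = self.left = self.bottom = None

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def stitch(tiles):
    """Brute-force stitch setup for a small hand-built tiling (illustration only)."""
    def find(x, y):
        return next((t for t in tiles if t.contains(x, y)), None)
    for t in tiles:
        t.right = find(t.x1, t.y1 - 1)    # neighbor to the right of the upper-right corner
        t.top = find(t.x1 - 1, t.y1)      # neighbor above the upper-right corner
        t.left = find(t.x0 - 1, t.y0)     # neighbor to the left of the lower-left corner
        t.bottom = find(t.x0, t.y0 - 1)   # neighbor below the lower-left corner


def goto_point(tile, x, y):
    """Walk the stitches from `tile` to the tile containing the point (x, y)."""
    while not tile.contains(x, y):
        while y < tile.y0:                # reach the desired ordinate first
            tile = tile.bottom
        while y >= tile.y1:
            tile = tile.top
        while x < tile.x0:                # then the desired abscissa; this may
            tile = tile.left              # disturb the ordinate, hence the outer loop
        while x >= tile.x1:
            tile = tile.right
    return tile


# A 10 x 10 plane: one metal tile surrounded by four space tiles.
tiles = [
    Tile(0, 7, 10, 10, "space"),
    Tile(0, 3, 3, 7, "space"),
    Tile(3, 3, 6, 7, "metal"),
    Tile(6, 3, 10, 7, "space"),
    Tile(0, 0, 10, 3, "space"),
]
stitch(tiles)
found = goto_point(tiles[0], 4, 5)        # start at the top strip, look for (4, 5)
```

The search assumes that the destination point lies inside the tiled plane; a point outside it would run off the edge through a missing stitch.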

Similarly, each CLayer object has a Paint method derived from [14]. Paint al-

lows the caller to paint a rectangular region of the layer. All CRectangle objects that

intersect this rectangular region are clipped against it, and the material types of the

resulting pieces are adjusted to perform the painting. The enumeration of rectangles

within the clipping region is done in linear time by the following procedure: by follow-

ing down and right pointers from the upper-leftmost rectangle, the rectangles along

Figure 2-3: Area enumeration of CRectangle objects within a bounding rectangle.

the left edge of the clipping rectangle may be identified. For each of these, horizontal

swaths of the clipping region may be enumerated by following right pointers. A

sample area enumeration is shown in Figure 2-3.
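The clipping step of painting can be illustrated with a simplified sketch. The Python below keeps a layer as a plain list of disjoint rectangles rather than a corner-stitched plane, so it visits every rectangle instead of enumerating only those in the painted area, and it does not merge neighboring pieces afterwards. The function names `clip`, `paint`, and `area` are hypothetical.

```python
def clip(rect, region):
    """Split `rect` against `region`; return (inside_piece_or_None, outside_pieces)."""
    x0, y0, x1, y1, mat = rect
    rx0, ry0, rx1, ry1 = region
    ix0, iy0 = max(x0, rx0), max(y0, ry0)
    ix1, iy1 = min(x1, rx1), min(y1, ry1)
    if ix0 >= ix1 or iy0 >= iy1:
        return None, [rect]                       # no overlap: rectangle is untouched
    outside = []
    if y0 < iy0:
        outside.append((x0, y0, x1, iy0, mat))    # strip below the region
    if iy1 < y1:
        outside.append((x0, iy1, x1, y1, mat))    # strip above the region
    if x0 < ix0:
        outside.append((x0, iy0, ix0, iy1, mat))  # piece to the left
    if ix1 < x1:
        outside.append((ix1, iy0, x1, iy1, mat))  # piece to the right
    return (ix0, iy0, ix1, iy1, mat), outside


def paint(rects, region, material):
    """Return a new rectangle list in which `region` carries `material`."""
    result = []
    for rect in rects:
        inside, outside = clip(rect, region)
        result.extend(outside)
        if inside is not None:
            result.append(inside[:4] + (material,))   # retype the clipped piece
    return result


def area(rects, material=None):
    """Total area of all rectangles, or of those with the given material."""
    return sum((x1 - x0) * (y1 - y0)
               for x0, y0, x1, y1, m in rects if material in (None, m))


plane = [(0, 0, 10, 10, "space")]               # one empty tile covering the layer
plane = paint(plane, (2, 4, 8, 6), "metal1")    # paint a horizontal wire
crossed = paint(plane, (4, 0, 6, 10), "metal1") # paint a vertical wire across it
```

Because the pieces always tile the same plane, the total area is unchanged by painting, which gives a simple consistency check on the clipping.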

The relevant VLSI algorithms can be expressed in terms of searches and area enu-

merations and are thus carried out efficiently in FluidLayout. For example, placement

of a wire is done by selecting the rectangular areas where metal is desired and calling

the Paint method on those areas. Viewing the circuit in the GUI is done by enu-

merating the rectangles within the view rectangle. For each enumeration, Windows

drawing methods are called to render the rectangle.

All that remains for implementation is the interconnection of distinct 2-D layers

into a 3-D circuit.

2.2.2 Circuit Partitioning

There are many possible approaches to the problem of partitioning 3-D circuits among

2-D slices. From a design tools standpoint, the crux of the issue is that in 2-D design,

new fabrication technologies require extensive modification to the technology support

in the software. For example, suppose that a 2-D design package has support for a two-

poly, three-metal CMOS process. If the technology for four metal layers is developed,

the design package must be modified to accommodate the new metal layer, both in

the internal representation and in the GUI. While the internal representation may

be readily modifiable or may already support arbitrary technologies, modifying or

extending the GUI is nontrivial, and in fact, 2-D design packages generally do not

have support in the GUI for arbitrarily many material layers.

On the other hand, FluidLayout can support arbitrarily many material layers

without modification of the 2-D slice object structure and without extensions to the

GUI. This is accomplished by assigning a finite number of material layers to each 2-D

slice and allowing the number of slices to vary arbitrarily. Each 2-D slice is assigned

gate, source/drain, and semiconductor material as well as two metal layers and all

inter-layer vias. The slice is also provided with vias from the top metal layer to the

gate layer of the next slice, thus forming the inter-slice interconnect. This material

set is necessary and sufficient to create arbitrary circuits within an individual 2-D

slice.

Arbitrary technology mappings can then be implemented by ignoring portions of

slices as necessary. For example, a 3-D process with six interconnect layers per slice

may be implemented by pairing adjacent CLayer objects and ignoring the semiconduc-

tor material on the second CLayer of each pair. Also, a 2-D process with arbitrarily

many metal layers can be implemented by ignoring all semiconductor material except

on the lowest slice.

The canonical technology in FluidLayout is centered around a bottom-gate thin-

film transistor structure [21]. Each layer of transistors is thus represented as a set of

TFTs along with two metal layers. This results in the technology shown in Figure 2-4.

With the implementation framework thus described, FluidLayout is able to per-

form various system-level procedures, such as circuit verification through netlist ex-

traction and circuit fabrication.

Figure 2-4: Canonical technology used in FluidLayout. Layers labeled, top to bottom: gate (next layer), metal2, metal1, semiconductor, source/drain, gate.

2.3 Circuit Verification

One of the main advantages of having CAD software with integrated 3-D capabilities

is that the software can perform tasks such as circuit netlist extraction. FluidLayout

is able to extract connectivity information from 3-D integrated-circuit layouts.

In order to perform this netlist extraction, FluidLayout separates a given layout

into planes, where each plane contains a given material (gate, source/drain, metal1,

or metal2) and the vias that connect that material to the next higher material along

the third dimension. Then, FluidLayout enumerates all the rectangles in each plane,

starting with the lowest. Each rectangle is checked to see if it belongs to a previously-

defined electrical node. FluidLayout then checks adjacencies to determine if two

nodes have been assigned to a single wire, and if so, merges the nodes. Finally, if

the rectangle is not assigned to a node, and is not adjacent to a node, a new node is

created. Further, if the rectangle is part of a via, the corresponding rectangle on the

next plane up is marked and added to the node. This allows the extraction procedure

to maintain electrical connectivity along the third dimension.
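The node-merging logic can be sketched with a union-find structure. The Python below is an assumption-laden illustration, not the FluidLayout procedure: it compares every pair of rectangles (quadratic time) instead of enumerating each plane by area, and the names `overlap`, `abutting`, `stacked`, and `extract_nodes` as well as the `(plane, is_via, box)` record are hypothetical.

```python
def overlap(a, b):
    """Return the (x, y) extents of the overlap of two (x0, y0, x1, y1) boxes."""
    return (min(a[2], b[2]) - max(a[0], b[0]),
            min(a[3], b[3]) - max(a[1], b[1]))

def abutting(a, b):
    """True if the boxes overlap or share part of an edge (corner contact excluded)."""
    dx, dy = overlap(a, b)
    return dx >= 0 and dy >= 0 and (dx > 0 or dy > 0)

def stacked(a, b):
    """True if the boxes overlap with nonzero area when viewed from above."""
    dx, dy = overlap(a, b)
    return dx > 0 and dy > 0

def extract_nodes(rects):
    """rects: list of (plane, is_via, box). Returns one node label per rectangle."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, (plane_i, via_i, box_i) in enumerate(rects):
        for j in range(i + 1, len(rects)):
            plane_j, via_j, box_j = rects[j]
            same_plane = plane_i == plane_j and abutting(box_i, box_j)
            # A via joins its own plane to whatever lies over it on the next plane up.
            via_up = stacked(box_i, box_j) and (
                (via_i and plane_j == plane_i + 1) or
                (via_j and plane_i == plane_j + 1))
            if same_plane or via_up:
                parent[find(i)] = find(j)   # merge the two nodes
    return [find(i) for i in range(len(rects))]

rects = [
    (0, False, (0, 0, 10, 2)),     # wire on plane 0
    (0, False, (10, 0, 12, 8)),    # abutting wire on plane 0
    (0, True,  (10, 6, 12, 8)),    # via from plane 0 up to plane 1
    (1, False, (10, 6, 20, 8)),    # wire on plane 1 over the via
    (1, False, (0, 0, 5, 2)),      # plane 1 wire that crosses plane 0 without a via
]
nodes = extract_nodes(rects)
```

The first four rectangles end up on one node through the via, while the last rectangle stays on a node of its own even though it lies directly above a plane 0 wire.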

In FluidLayout, a CWire object is used to identify a node. Each CWire object

contains a list of pointers to the CRectangle objects associated with the node. Each

CRectangle object has a generic pointer that is used in netlist extraction to point to

the CWire object to which the rectangle belongs. Thus, when the area enumeration

is complete, all electrical nodes in the circuit have been identified, and each rectangle

is able to identify the node to which it belongs.

Once this is complete, FluidLayout extracts the semiconductor planes from the

layout. The semiconductor rectangles are enumerated, and surrounding gate and

source/drain rectangles are identified. Each valid combination of gate, source, and

drain is used to construct a CTransistor object that identifies the electrical nodes for

the gate, source, and drain and the width and length of the transistor channel. The

list of CTransistor objects is then written to a text file using the standard SPICE

format for MOSFETs.

2.4 Circuit Fabrication

One motivation for writing FluidLayout is to have the ability to support any fabri-

cation technologies that emerge in the laboratory. For example, it is desirable to be

able to target designs for an inkjet nanoparticle MEMS process [7] or a VLSI/MEMS

liquid embossing process [16], without having to modify the layout.

Support in FluidLayout for these facilities is provided by integrating methods that

handle the circuit extraction into the document class. For each fabrication process,

different file formats must be exported to support the fabrication. For example, the

target inkjet process is a 3-D gantry system that is computer-controlled and requires

G-code. The process prints materials in the same way that an inkjet printer prints

rectangles, so in FluidLayout, there are methods to extract the separate material

layers and raster the individual rectangles. Similarly, the liquid embossing process

uses an elastomeric stamp to separate a film of solution into desired and undesired

regions. A rectangular wire is thus created using a stamp whose raised surface is

the outline of the rectangle. When pressed onto a uniform film, the stamp then

drives away liquid corresponding to the wire outline. These stamps are created from

wafers, which are created using lithographic masks. These masks are specified using

the GDSII binary file format, so FluidLayout has methods for converting a set of

rectangles to their outlines and writing these outline rectangles to GDSII binary

output.
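The conversion of a rectangle to its outline can be sketched as follows. The thesis does not state whether the outline lies inside or outside the wire or how wide it is, so the Python below assumes a frame of adjustable width placed just outside the rectangle; the function name `outline` is hypothetical, and writing the result to GDSII is not shown.

```python
def outline(rect, width=1):
    """Return four rectangles that frame `rect` (x0, y0, x1, y1) just outside its edges."""
    x0, y0, x1, y1 = rect
    w = width
    return [
        (x0 - w, y0 - w, x1 + w, y0),   # bottom edge, including both corners
        (x0 - w, y1, x1 + w, y1 + w),   # top edge, including both corners
        (x0 - w, y0, x0, y1),           # left edge
        (x1, y0, x1 + w, y1),           # right edge
    ]

wire = (0, 0, 10, 2)
frame = outline(wire)
frame_area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in frame)
```

The four pieces do not overlap one another or the wire, so their combined area equals the area of the enlarged rectangle minus the area of the wire.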

Figure 2-5: Box-outlining is used to place materials in FluidLayout.

2.5 Design Walk-Through

To demonstrate the use and capabilities of the FluidLayout software, FluidLayout

is used here to lay out an inverter. A more complete walk-through is available in

Appendix A.

Figure 2-5 shows the use of box outlines, drawn by left-clicking at the lower left

and right-clicking at the upper right, to place materials in the layout.

By using the materials toolbar, layout of an n-channel MOSFET is completed on

the first layer, as shown in Figure 2-6.

Interconnect to layer 2 is done through a via from the top metal on layer 1 (i.e.,

metal2) to the gate metal on layer 2. As shown in Figure 2-7, the via is placed on

layer 1. FluidLayout marks a corresponding via on the second layer, as can be seen

in the view window for the second layer.

A p-channel MOSFET is laid out on the second layer, and the complete inverter

is shown in Figure 2-8.

To verify that the layout corresponds to an inverter, the SPICE deck extraction

Figure 2-6: Complete NMOS pulldown path.

Figure 2-7: Placement of a metal2→gate via results in a gate→metal2 hint on the second layer.

Figure 2-8: The complete inverter.

feature is used. The feature size is set to 5 microns per unit grid length (lambda).

The following is the circuit netlist produced:

***** C:\WINNT\Profiles\shamikd\Desktop\inverter.sp

***** Created by FluidLayout

***** Created on 5/1/2000

M1 2 3 GND! 0 NTFT W=15u L=10u

M2 2 3 Vdd! Vdd! PTFT W=15u L=10u

This SPICE deck may then be used for verification of the layout.
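As a sketch of how such device lines relate to the layout grid, the Python below formats a MOSFET in the standard SPICE node order (drain, gate, source, bulk) and scales grid units by the feature size. The function `mosfet_line` is hypothetical, and the channel dimensions of 3 by 2 grid units are inferred by dividing the netlist values above by the 5 micron feature size; they are not stated in the text.

```python
def mosfet_line(name, drain, gate, source, bulk, model, w_lambda, l_lambda, um_per_lambda):
    """Format one MOSFET in SPICE order: name, drain, gate, source, bulk, model, W, L."""
    w = w_lambda * um_per_lambda    # channel width in microns
    l = l_lambda * um_per_lambda    # channel length in microns
    return f"{name} {drain} {gate} {source} {bulk} {model} W={w:g}u L={l:g}u"

deck = [
    mosfet_line("M1", "2", "3", "GND!", "0", "NTFT", 3, 2, 5),
    mosfet_line("M2", "2", "3", "Vdd!", "Vdd!", "PTFT", 3, 2, 5),
]
```

Under these assumptions, the two formatted lines match the device lines of the extracted inverter netlist.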

A more comprehensive guide to using FluidLayout can be found in Appendix A.

As shown here, FluidLayout is a useful software CAD system for laying out and

fabricating 3-D circuits. These capabilities will now be demonstrated with several

test circuits.

Chapter 3

Some Basic Transistor Circuits

The immediate application of FluidLayout is in targeting simple, commonly-known

circuits for an emerging three-dimensional fabrication technology. This allows both

for testing the functionality of FluidLayout and for exploring the viability of the

technology.

3.1 Minimum Criteria for the Technology

A new transistor technology is viable for computation only if suitable multi-transistor

devices can be fabricated. In particular, it is possible for transistors to provide non-

linear input-output behavior, yet still be unsuitable for multi-transistor circuitry.

There are several criteria that need to be met. These criteria are evaluated within

the context of the metal-insulator-semiconductor field-effect technology discussed in

[17, 16].

Consider, for example, the NMOS inverter in Figure 3-1. The desired function of

this inverter is to take the signal represented by in and produce the logical negation

of that signal at out. Represented using voltages, therefore, if $V_{in}$ is below $V_M$,

then $V_{out}$ should be above $V_M$, and vice versa, where $V_M$ is some midpoint voltage

between Vdd and ground. To verify that this circuit produces the desired behavior,

the characteristics of the individual transistors must be examined.

Figure 3-1: NMOS inverter, with pullup transistor 1 of size $(W/L)_1$ between Vdd and out, and pulldown transistor 2 of size $(W/L)_2$ driven by in.

The governing I-V relationships for the n-channel FET are given as follows:

$$I_D = \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn} - \frac{V_{DS}}{2}\right) V_{DS}$$

for $V_{GS} - V_{Tn} \geq V_{DS}$ (the linear regime) and

$$I_{D,SAT} = \frac{1}{2} \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn}\right)^2 \left(1 + \lambda_n V_{DS}\right)$$

for $V_{GS} - V_{Tn} < V_{DS}$ (the saturation regime), where $\mu_n$ is the field-effect mobility, $C_{ins}$
is the gate insulator capacitance, $W/L$ is the transistor channel width-to-length ratio,
$V_{GS}$ is the gate-source voltage, $V_{Tn}$ is the threshold voltage, $V_{DS}$ is the drain-source
voltage, and $\lambda_n$ is the channel-length modulation parameter [9].
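The two regimes can be written as a single function. The Python below is a sketch of the standard square-law model that these relationships describe; the function name, the combined parameter `mu_c_ins` (mobility times insulator capacitance), the numerical values, and the explicit cutoff branch for gate voltages below threshold are assumptions added for illustration.

```python
def drain_current(v_gs, v_ds, v_t, mu_c_ins, w_over_l, lam=0.0):
    """Square-law n-channel drain current; mu_c_ins is mobility times C_ins."""
    v_ov = v_gs - v_t                      # overdrive voltage V_GS - V_Tn
    if v_ov <= 0:
        return 0.0                         # assumed cutoff: device is off
    if v_ov >= v_ds:                       # linear regime
        return mu_c_ins * w_over_l * (v_ov - v_ds / 2) * v_ds
    # saturation regime, with channel-length modulation
    return 0.5 * mu_c_ins * w_over_l * v_ov ** 2 * (1 + lam * v_ds)

# Hypothetical parameter values, chosen only to exercise both branches.
i_edge = drain_current(v_gs=10.0, v_ds=8.0, v_t=2.0, mu_c_ins=1e-6, w_over_l=20)
i_sat = drain_current(v_gs=10.0, v_ds=9.0, v_t=2.0, mu_c_ins=1e-6, w_over_l=20)
i_off = drain_current(v_gs=1.0, v_ds=5.0, v_t=2.0, mu_c_ins=1e-6, w_over_l=20)
```

With the channel-length modulation parameter set to zero, the linear-regime current at the regime boundary equals the saturation current, so the model is continuous there.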

It is clear, then, that this type of circuit produces the desired operation: for a

low input voltage, transistor 2 is turned off and transistor 1 pulls the output voltage

high, and for a high input voltage, transistor 2 is turned on, thus pulling the output

voltage low, provided transistor 2 is stronger than transistor 1. Transistor 2 is thus

called a pulldown, while transistor 1 is called a pullup.

As important as functionality, however, is the ability of the inverter to restore

logic levels. That is, while a 0 is represented ideally by 0 volts and a 1 is represented

ideally by Vdd, in actuality, this is not necessarily the case. However, a functioning

Figure 3-2: NMOS inverter small-signal model about VM.

logic gate should to some extent recognize and accommodate these deviations from

ideality. Consider, for example, a series of cascaded inverters, each of which outputs

0 volts for an input of Vdd and vice versa. Suppose the input to the first is slightly less

than Vdd, say $V_{dd} - \Delta V_{in}$. The output of this inverter will then be greater than 0 volts

by some amount, say $\Delta V_{out}$. If the inverter restores logic levels, then $\Delta V_{out} < \Delta V_{in}$.

If this is not the case, then as the signal passes through each inverter, the deviation

from the ideal will increase until the signal stabilizes at VM.

Level restoration follows if the gain of the inverter at VM is greater than one in

magnitude. Consider an inverter whose output is Vdd minus the input. Then the

gain at VM is identically -1, and deviations in the input are reflected exactly in the

output.
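The effect of the midpoint gain on a chain of inverters can be illustrated numerically. The Python below uses a hypothetical smooth transfer curve (a hyperbolic tangent with slope equal to minus the gain at the midpoint) and hypothetical supply values; it is not a model of the devices in [17, 16].

```python
import math

def inverter(v_in, gain, v_dd=10.0):
    """Hypothetical inverter curve with slope -gain at the midpoint V_M = v_dd / 2."""
    half = v_dd / 2
    return half - half * math.tanh(gain * (v_in - half) / half)

def cascade(v_in, gain, stages):
    """Pass a signal through `stages` identical inverters and return the final voltage."""
    v = v_in
    for _ in range(stages):
        v = inverter(v, gain)
    return v

restored = cascade(5.1, gain=3.0, stages=10)   # starts only 0.1 V above V_M
decayed = cascade(9.0, gain=0.5, stages=20)    # starts close to V_dd
```

With a gain magnitude of 3, a signal that starts barely above the midpoint is driven close to a supply rail, whereas with a gain magnitude of 0.5 even a nearly ideal input settles at the midpoint.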

The gain of the inverter in Figure 3-1 at the midpoint voltage VM can be de-

termined by examining the small-signal model of this inverter, shown in Figure 3-2.

From the model, it follows that the gain of the inverter is

$$\frac{v_{out}}{v_{in}} = -\frac{g_{m2} r_o}{2 + g_{m1} r_o}$$

where $g_m$ is the small-signal transconductance, defined as $g_m = \sqrt{2 \mu C_{ins} (W/L) I_{D,SAT}}$,
and $r_o$ is the small-signal output resistance, defined as $r_o = (\lambda I_{D,SAT})^{-1}$ [9]. Thus,
having a large transconductance and a large output resistance is critical to the performance
of the inverter: consider the cases where $g_{m1} r_o$ is small or is large. If the
product $g_{m1} r_o$ is small compared to 1, then the gain is essentially $-g_{m2} r_o / 2$, which is
also small. If, on the other hand, $g_{m1} r_o$ is sufficiently large, then the gain is approximately
$-g_{m2}/g_{m1} = -\sqrt{(W/L)_2/(W/L)_1}$. The transistor sizing is thus dictated by the need for
high gain.
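The two limiting cases can be checked numerically. In the Python sketch below, the function name and all parameter values are hypothetical and chosen only to place the product of pullup transconductance and output resistance well below or well above 1.

```python
def inverter_gain(g_m1, g_m2, r_o):
    """Small-signal gain -g_m2*r_o / (2 + g_m1*r_o) of the NMOS inverter about V_M."""
    return -g_m2 * r_o / (2 + g_m1 * r_o)

# g_m1 * r_o = 0.01: the gain approaches -g_m2 * r_o / 2 = -0.05
weak = inverter_gain(g_m1=1e-8, g_m2=1e-7, r_o=1e6)

# g_m1 * r_o = 1000: the gain approaches -g_m2 / g_m1 = -2
strong = inverter_gain(g_m1=1e-3, g_m2=2e-3, r_o=1e6)
```

In the second case the gain is set almost entirely by the ratio of the two transconductances, which is why the relative sizing of the pulldown and pullup matters.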

The technology described in [17, 16] features transconductances on the order of

$10^{-5}$ S and output resistances on the order of $10^{6}\ \Omega$ for a device with W = 292.5 µm

and L = 2 µm at a VM of about 10 volts. This indicates that a gain of greater than

unity is achievable with device sizes that currently can be fabricated.

3.2 Design of Basic Circuits

In order to test both the fabrication technology and the capabilities of FluidLayout,

some simple circuits are laid out using FluidLayout, and fabrication-ready output is

produced. The layout used here consists of an inverter, a NOR gate, a basic static

memory cell, and two ring oscillators.

The inverter implementation used in this layout is that in Figure 3-1. The device

sizes used are a channel length of 10 µm, a channel width of 200 µm for the pullup

FET, and a channel width of 1200 µm for the pulldown FET. Using the above device

parameters, this should provide a gain of approximately -1.5.

The NOR gate uses two pulldown transistors wired in parallel. Each is identical

in size to the inverter pulldown.

The memory cell, discussed in detail in Chapter 4, uses a pair of coupled inverters

together with two access transistors.

Finally, the two ring oscillators are different layouts of the same circuit. This

circuit comprises three inverters wired in a series loop. Provided that the inverters

have sufficient gain, a latent signal on an input to one of the inverters is amplified to

Figure 3-3: Layout of test devices.

Vdd or to ground as it passes through the inverters. Further, when the signal returns

to its starting point, it does so as its inverse, so that the signal oscillates when viewed

at any fixed point. However, if the inverters do not have sufficient gain, the signal

will decay to the midpoint voltage, VM. Thus, a ring oscillator is an ideal circuit to

test intrinsic device parameters.

Further, it is possible to examine different layout strategies with this multi-gate

circuit. In particular, the second ring oscillator, though electrically identical to the

first, is partitioned along the third dimension into three layers, with one inverter on

each layer. Power and ground signals are distributed through vias to the upper layers,

and the return signal from the output of the last inverter to the input of the first

inverter travels through a via stack located near the outputs.

Figure 3-4: 3-D layout of a ring oscillator. The three inverters shown are stacked to form the 3-D layout.

The first layer of the layout of the entire structure is shown in Figure 3-3. Figure 3-

4 shows the 3-D layout of the ring oscillator, including the 2nd and 3rd layers of the

layout.

In order to verify the functionality of the devices in this structure, the SPICE

netlist is extracted using FluidLayout.¹

***** C:\WINNT\Profiles\shamikd\Desktop\Mask1.sp


***** Created on 5/1/2000

*** inverter

M1 invout inv_in GND! 0 NTFT W=1200u L=10u

M34 invout Vdd! Vdd! 0 NTFT W=200u L=10u

*** NOR gate

M2 norout norA GND! 0 NTFT W=1200u L=10u

M3 norout norB GND! 0 NTFT W=1200u L=10u

M35 norout Vdd! Vdd! 0 NTFT W=200u L=10u

*** SRAM cell (4,7 are internal bit storage nodes)

M4 7 sramWL sramBL 0 NTFT W=200u L=10u

M13 sramBLBAR sramWL 4 0 NTFT W=200u L=10u

M5 7 4 GND! 0 NTFT W=1150u L=10u

M6 GND! 7 4 0 NTFT W=1150u L=10u

M36 7 Vdd! Vdd! 0 NTFT W=150u L=10u

M37 Vdd! Vdd! 4 0 NTFT W=150u L=10u

*** 2-D ring oscillator (three inverters)

M14 6 ring GND! 0 NTFT W=1200u L=10u

M38 6 Vdd! Vdd! 0 NTFT W=200u L=10u

M15 5 6 GND! 0 NTFT W=1200u L=10u

M39 5 Vdd! Vdd! 0 NTFT W=200u L=10u

M16 ring 5 GND! 0 NTFT W=1200u L=10u

M40 ring Vdd! Vdd! 0 NTFT W=200u L=10u

¹ This SPICE deck has been edited for clarity. For example, with typical circuit layouts, the area enumeration algorithm results in all the n-channel devices grouped together and all the p-channel devices grouped together. The SPICE deck shown here has the transistors grouped by function. Also, since one of the features of the technology is the ability to route gate across source or drain, the netlist extraction will sometimes output a transistor as a parallel combination of two or more smaller transistors. This will allow for more accurate capacitance modelling once the relevant parameters have been obtained from the technology. However, in the SPICE deck shown here, parallel transistors have been merged.

*** 3-D ring oscillator

M29 3 ring_3D GND! 0 NTFT W=1200u L=10u

M41 3 Vdd! Vdd! 0 NTFT W=200u L=10u

M42 2 3 GND! 0 NTFT W=1200u L=10u

M47 2 Vdd! Vdd! 0 NTFT W=200u L=10u

M48 ring_3D 2 GND! 0 NTFT W=1200u L=10u

M53 ring_3D Vdd! Vdd! 0 NTFT W=200u L=10u

Functional simulation can then be performed using this netlist to verify the per-

formance of the circuits.

The circuit can now be extracted to output that can be used for fabrication. The

process in [17, 16] uses elastomeric liquid embossing to form circuit patterns. Each

material layer (gate, source/drain, etc.) is patterned using a unique part of the stamp.

This stamp is created using a wafer as a mold. Thus, FluidLayout is used to extract

the circuit to a GDSII binary stream that can be used to fabricate a wafer mask.

This mask pattern is shown in Figure 3-5.

From this mask, an elastomeric stamp is created. This stamp is used to pattern

solution-processed materials, which are then cured. The resulting structures form the

gate, source/drain, semiconductor, and interconnect for the circuits. For example,

Figure 3-6 shows the gate metal for the NOR gate, SRAM cell, and 2-D ring oscillator.

Figure 3-7 shows the source/drain metal for these circuits. Thus, FluidLayout is useful

both for laying out circuit structures and for fabrication of these circuits.

For the remainder of this thesis, some designs will be examined that utilize more of

the potential of FluidLayout. In particular, the focus is no longer on rapid prototyping

of three-dimensional integrated circuits, but instead on exploring the architectural

ramifications of being able to lay out circuits in 3-D.

Figure 3-5: Stamp pattern with spatially-separated material patterns.

Figure 3-6: Patterned gate for a NOR gate, SRAM cell, and ring oscillator.

Figure 3-7: Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.

Chapter 4

The Static Random-Access

Memory

The technology for fabricating true three-dimensional integrated circuits is very much

a new technology; it has yet to approximate tried-and-true 2-D fabrication in terms

of feature size and circuit speed. However, preliminary research ([17, 16]) suggests

that the fabrication of multilayer transistor circuits with transistor properties rivaling

those of conventional silicon MOSFETs is neither unreasonable as a research goal, nor

unrealistic as a commercial technology within the near future. This is one of the main

reasons that FluidLayout has been engineered to handle complex three-dimensional

integrated circuits while at the same time providing the file formats necessary for

rapid prototyping of basic circuits.

In order to demonstrate the capabilities of the FluidLayout software as well as

the benefits of the three-dimensional medium, FluidLayout has been used to design

a multilayer static random-access memory (SRAM). This implementation, shown in

Figure 4-5, is capable of statically storing 512 bits of data in a chip whose footprint

is the same as that of a 64-bit 2-D SRAM. Further, read/write power dissipation for

this chip is approximately that of the 64-bit 2-D SRAM as well.

It is helpful to examine this 3-D SRAM in the context of its 2-D counterpart.

4.1 Background and Motivation

A static random-access memory must provide certain features by virtue of its name.

Memory signifies that the circuit must provide a read operation and, optionally, a write

operation: a user stores data via a write (for writable memories) and

retrieves the same data later via a read. The memory may therefore be read-only

(ROM) or read-write.

Memories are further characterized by their access mechanism. In a multibit

memory, the storage location of a particular bit or group of bits may be identified by an

address. The access mechanism may restrict the user to data retrieval from sequential

addresses; this is implemented in circuits such as the FIFO (first-in, first-out) and

the shift register. Alternatively, access to random addresses may be permitted. This

random-access memory generally requires more sophisticated control circuitry.

Finally, memories may be categorized according to the permanence of their stor-

age. Non-volatile memories do not require external power to retain their contents.

For example, erasable, programmable read-only memories (EPROMs) implement the

write mechanism using special circuitry and/or voltages that are not accessible during

the normal read operation of the circuit. A specific, commonplace example of this

type of memory is the Flash ROM. By contrast, volatile memories lose the contents of

memory if the power supply is removed. Volatile memories may be further categorized

as static or dynamic depending on the mode of storage. In static memories, such as

SRAM, the circuit actively reinforces the value of the bit stored in memory. In order

to overwrite a bit, the write circuitry must be able to overcome the static protection

on the old bit in memory. Conversely, in dynamic memories, such as DRAM, bits are

stored using the capacitance of internal nodes (or using an explicit capacitor for each

bit). No circuitry protects the stored value; thus, the memory may be overwritten by

charging or discharging the capacitor. Since capacitors tend to discharge naturally

through leakage, dynamic memories must be refreshed periodically to prevent loss of

data. This refresh introduces an overhead that may be recouped due to the increased

storage density of bits in DRAM versus bits in SRAM.

Of the various types of memory, writable memories have seen the largest push for

increased density. There are multiple reasons for this: first, as computers become

faster and more sophisticated, the size of desirable computations grows, and thus

the need for computer memory grows also; increased DRAM density allows one to

increase the amount of main memory available to a computer, and increased SRAM

density allows computer architects to increase the amount of memory cache available

to a processor. Second, the advent of digital media has created a similar push in

the consumer goods market, as digital cameras store pictures on Flash ROMs, and

users of personal computers transfer media from laptops to palmtop organizers and

portable audio players using any number of memory-card interfaces.

Of these three types of memory (Flash ROM, DRAM, and SRAM), Flash ROM

and DRAM place the most emphasis on bit density, and thus would benefit most

from the area improvements that a three-dimensional process would bring. However,

both Flash ROM and DRAM utilize special fabrication processes. In the case of

Flash ROM, special transistor structures are used to provide the non-volatility, while

in DRAM, explicit capacitors are often used for the bit storage mechanism. On the

other hand, SRAM may be fabricated using the same technology as is used for logic,

which is why all embedded memory has been in the form of SRAM until quite recently.

It is for this reason that the layout of a three-dimensional SRAM is explored.

SRAM benefits from the third dimension as much as DRAM or Flash memory would;

and while the density of 3-D DRAM would exceed that of 3-D SRAM, the fabrication

technology for 3-D SRAM is expected to be more readily available.

4.2 SRAM Operation

In SRAM of any dimensionality, the basic storage mechanism remains the same. The

core of bit storage in the SRAM is the coupled-inverter cell shown in Figure 4-1.

The bit may be thought of as being stored at node Q. Thus, in this circuit, the

coupled inverters provide positive feedback on the stored bit, ensuring that the bit

is not erased or overwritten unless so desired. As long as power is supplied to the

inverters, the state of Q and $\overline{Q}$ is maintained; this provides the static functionality of

the memory cell [23].

Figure 4-1: Six-transistor circuit for individual bit storage.

Access to the cell contents is provided through access FETs M1 and M2. To read

the cell contents, the word line WL is pulled high, turning on M1 and M2. The bit

lines BL and $\overline{BL}$ are then pulled to Q and $\overline{Q}$ by M1 and M2 respectively.

To write to the cell, the desired bit and its inverse are asserted on BL and $\overline{BL}$

respectively. WL is then pulled high. M1 and M2 must then be able to reverse the

state of Q and $\overline{Q}$ in order to write to the cell.
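
The read and write sequences described above can be summarized in a purely behavioral model. The following Python sketch is illustrative only: the class name `SixTCell` and its method names are assumptions introduced here, and the model ignores all electrical effects (transistor sizing, bit-line precharge, read disturbance).

```python
from typing import Optional, Tuple


class SixTCell:
    """Behavioral (not electrical) model of a 6-T SRAM cell."""

    def __init__(self, q: bool = False) -> None:
        # Stored bit at node Q; the complementary node always holds `not q`.
        self.q = q

    def read(self, wl: bool) -> Optional[Tuple[bool, bool]]:
        """Return the values driven onto (BL, BL-bar), or None if WL is low."""
        if not wl:
            return None  # access FETs are off: cell is isolated from the bit lines
        return (self.q, not self.q)

    def write(self, wl: bool, bl: bool, bl_bar: bool) -> None:
        """Overwrite the stored bit, but only while the word line is high."""
        if bl == bl_bar:
            raise ValueError("BL and BL-bar must carry complementary values")
        if wl:
            # Assumption: the write drivers are strong enough to flip the latch.
            self.q = bl
```

A write with the word line low leaves the stored bit untouched, which is the property the row-select circuitry relies on.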

Since each inverter comprises two transistors, the cell as a whole is referred to as a

6-T (six-transistor) cell. The cell may be compressed to five transistors by removing

M1 or M2; however, this requires very careful design of the cell, as reversing the

internal state during a write is now more difficult.

An SRAM comprises any number of these cells arranged into an array of rows and

columns. All of the cells in a given row share the same word line, and all of the cells

in a given column share the same bit lines. This has two consequences: first, read

and write operations act on whole words at a time, where a word consists of however

many bits there are in a single row of the memory. Second, since columns share bit

lines, no more than one row at a time may be permitted access to the bit lines of the

memory. Thus, an SRAM must contain control circuitry that limits bit-line access to

a single row. This circuitry is therefore called row-select or row-decode circuitry.

Further, in practical SRAM design, the word size is often much smaller than

the desired size in words of the memory. A straightforward implementation would

thus result in a memory that is much taller than it is wide. However, packaging

constraints generally favor circuits that are square. The standard approach to solving

this problem is to split the memory into N columns, where N is chosen to make

the layout approximately square. A given row selection will thus address N words

simultaneously. To refine this selection to the one desired word, column-select or

column-decode circuitry is added to the memory.

Typically, then, an SRAM interface consists of A address bits and D data bits.

The A address bits are split into R row bits and A - R column bits. The memory

core of the SRAM thus consists of $2^R$ rows and $D \times 2^{A-R}$ columns of individual 6-T

cells.
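
As an illustration of this address split, the following Python sketch computes the core dimensions and separates an address into its row and column fields. The function names are hypothetical, and the choice of the upper R bits as the row field is an assumption; the text does not say which address bits feed which decoder.

```python
from typing import Tuple


def core_dimensions(a_bits: int, r_bits: int, d_bits: int) -> Tuple[int, int]:
    """Return (rows, columns) of the cell array: 2^R rows, D * 2^(A-R) columns."""
    if not 0 <= r_bits <= a_bits:
        raise ValueError("R must satisfy 0 <= R <= A")
    return (1 << r_bits, d_bits << (a_bits - r_bits))


def split_address(address: int, a_bits: int, r_bits: int) -> Tuple[int, int]:
    """Split an A-bit address into (row index, column-select index)."""
    if not 0 <= address < (1 << a_bits):
        raise ValueError("address does not fit in A bits")
    c_bits = a_bits - r_bits
    row = address >> c_bits                 # assumed: upper R bits select the row
    column = address & ((1 << c_bits) - 1)  # assumed: lower A-R bits select the word
    return row, column
```

For example, with A = 11, R = 7, and D = 8, the core is 128 rows by 128 columns, which is the square arrangement of a 2Kx8 memory.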

4.3 Extensions to 3-D

A cursory examination of the two-dimensional SRAM structure shows that to first

order, a particular cell is addressed via a row (word) line and a column (bit) line. The

2-D memory is thus addressed by a row-column matrix. It seems natural to extend

this idea to three dimensions by incorporating a third dimension into the matrix

addressing. At the cell level, this is done by adding extra transistors to the circuit.

The addition of transistors M3 and M4, as shown in Figure 4-2, allows a third-

dimension word line, WL2, to control access to the bit lines. However, an immediate

objection to the circuit in Figure 4-2 is that it requires two additional transistors

per bit of storage. This overhead could be prohibitively expensive in high-density

SRAMs. Proposed solutions include making the 3-D SRAM cell single-ended by

removing transistors M2 and M4 and bit line $\overline{BL}$. This results in a six-transistor cell

whose area is essentially that of the standard 2-D SRAM cell.

Figure 4-2: Eight-transistor circuit for use in 3-D SRAM.

However, if the SRAM is to be extended to three dimensions, this must be done

in a way that considers the overall performance of the memory. In a conventional

SRAM, each row must be coupled to an active row decoder circuit, often implemented

as an AND of address bits, that drives the word line for that row. This active

decoder is necessary to prevent contention for the bit lines, which would happen if

two word lines were simultaneously high. (The columns, by contrast, may be decoded

using pass transistors since it is permissible to leave the bit lines in an unknown or

disconnected state.) Therefore, using the 3-D matrix-of-word-lines approach to 3-D

SRAM, there are necessarily more word lines to drive. In the single-ended SRAM

described above, for example, a second set of word lines (running perpendicular to

the first set) is needed. Thus, in addition to the decoder circuit for each row, this

SRAM implementation requires an active decoder for each column.

In addition, there are power considerations in choosing how to address the cells in

the SRAM matrix. The bit lines are usually high-capacitance wires whose switching

consumes a good deal of power. It is desirable to minimize the number and length

of these bit lines, and to avoid switching the bit lines if possible. For an N-bit data

bus, it would be optimal to have exactly 2N bit lines (N if the memory is single-

ended). However, two considerations interfere as the number of words in the SRAM

grows: first, with the number of bit lines fixed, the length of the bit lines grows with

the number of words, and thus the switching time and power consumption increase.

Second, as the number of words grows, the aspect ratio (width-to-length ratio) of the

SRAM deviates further from 1:1, which is the square footprint that is most desirable

for packaging. For high-density memory, having multiple sets of bit lines is a necessity.

Figure 4-3: Proper cell distribution improves aspect ratio and decreases bit-line length.

For example, a commonplace 2Kx8 (16384-bit) two-dimensional SRAM may be

partitioned into 2048 rows and 8 columns. This would require only 16 bit lines, but

these wires would be prohibitively long, and the memory would be many times taller

than it would be wide. A better packaging scheme would be to have 128 rows and

128 columns and to use a 16:1 decoder to provide 8 bits of output. While this means

that potentially 16 times as many bit lines could switch simultaneously, the switched

capacitance is reduced by the same factor (since the length of the lines has been

reduced).
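
The arithmetic behind this comparison can be checked with a short Python sketch. It assumes unit-sized cells, two bit lines per column (a differential cell), and wire capacitance proportional to wire length; the function name `partition` and the dictionary keys are introduced here for illustration only.

```python
from typing import Dict


def partition(words: int, data_bits: int, rows: int) -> Dict[str, int]:
    """Describe a 2-D SRAM of `words` x `data_bits` folded into `rows` rows."""
    total_bits = words * data_bits
    if total_bits % rows != 0:
        raise ValueError("rows must divide the total number of bits")
    columns = total_bits // rows
    return {
        "columns": columns,
        "bit_lines": 2 * columns,           # BL and its complement per column
        "bit_line_length": rows,            # in units of one cell height
        "decoder_ratio": columns // data_bits,
        "total_bit_line_length": 2 * columns * rows,
    }


tall = partition(2048, 8, rows=2048)    # 2048 rows by 8 columns
square = partition(2048, 8, rows=128)   # 128 rows by 128 columns
```

Under these assumptions both arrangements have the same total bit-line length, so switching 16 times as many lines that are each 16 times shorter leaves the switched capacitance unchanged.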

While the power consumption of the latter configuration is thus essentially the

same as that of the former, it is possible to utilize the benefits of the former to

reduce the power consumption of the latter. Specifically, the main benefit of the

2048x8 configuration is that for any given access, this configuration activates the

fewest memory cells (in this case, eight). The 128x128 approach activates

128 memory cells and then selects the relevant eight bits of those. The reason for this

is that the 16:1 decoder can be implemented more compactly using pass transistors

that come after the memory-cell activation, thus trading off chip area for power

consumption. If, instead, the decoder were implemented at the cell level, only eight

cells would need to be activated. This could be coupled with a redundant 16:1 decoder

so as not to increase the length of bit lines. (The in-cell decoder eliminates the need

for an external column decoder; the bit lines may be wired together in bundles of 16.

However, this increases the bit-line capacitance.)

In two-dimensional memories, in-cell decoding results in a number of trade-off

choices, as integrating another decoder into the cell increases the cell size and de-

creases the bit density. The issue is that the word lines are used to activate cells, and

in the 128x128 cell example, a single word line activates 128 cells. In-cell decoding

is thus just the use of an orthogonal set of word lines to select a subset of the 128

cells for activation. If, instead, parts of a word line could be independently tri-stated,

the 16:1 decoding could be done without increasing the cell size. For example, a

128-cell word line could be separated into 16 blocks of eight cells. When the address

decoder drives a word line high, block decoders could be used to select which of the

16 blocks receives the word line. The other 15 blocks would remain at their previous

state (possibly active). The previously redundant 16:1 decoder would then be used

to ensure that only the currently active cells have access to the data bus.

The difficulty lies in implementing this tri-state scheme in a two-dimensional mem-

ory. The motivation for using word lines and bit lines is that the memory cells are

self-wiring and the word line control is simple. Further, and more importantly, the

visibility of signals on the word lines is sequential. That is, the memory cells on a

single word line receive signals on that line in order from closest to the decoder to

farthest away. If any signal on a word line is not meant for a particular cell, there

must be a "pass-through" mechanism for the cell so that the signal may be transmit-

ted to cells further down the chain. This mechanism requires that there be two word

lines for each row - a master word line that traverses the entire row, and a per-block

slave word line that is wired to the master through a pass transistor (which serves

as the tri-state switch). This additional complexity results in increased chip area or

diminished functionality (e.g. by sacrificing bit lines to recover area for the tri-state

transistors). Further, the power requirements increase by a significant proportion

since the master word line must be driven in both cases.

Figure 4-4: 3-D partitioning of rows allows for simple tri-stating of word lines. (Panels: a row of cells in a 2-D memory, with master word line, access transistors, and slave word lines; two stacked rows in a 3-D memory.)

While this approach is complicated for two-dimensional memories, in three-dimensional

memories, the resulting tree structure of master and slave word lines can be geometri-

cally rearranged for a 3-D memory that is efficient in both area and power. As shown

in Figure 4-4, the blocks that share slave word lines may be treated as rows in a 3-D

SRAM. The tri-state circuitry may therefore be stacked along the third dimension,

thus reducing the master word line to a series of stacked vias. The total length of the

word lines is thus reduced back to essentially that of the original 2-D SRAM without

tri-stating. In addition, the power dissipation is reduced by a factor asymptotically

equal to the number of layers relative to a 2-D SRAM of the same capacity. This may

be formalized as follows: suppose a 2-D SRAM has been partitioned into $M\sqrt{L}$ rows

and $N\sqrt{L}$ columns; further assume that the SRAM cells are of unit length and width.

The word lines are then of length $N\sqrt{L}$ while the bit lines are of length $M\sqrt{L}$. The

SRAM is now to be repackaged into a 3-D SRAM with $L$ layers, via the following

procedure. The 2-D SRAM is first repackaged into $M$ rows and $NL$ columns, with an

$L$:1 column decoder. This reduces the length of the bit lines to $M$; however, as $\sqrt{L}$

times as many bit lines are switching, no power savings are realized thus far. In fact,

the word lines are now longer by a factor of $\sqrt{L}$; to counteract this effect, the rows

are then partitioned into $L$ blocks of $N$ cells each, and each block is assigned a slave

word line that is fed from the master word line and tri-stated. (It is noted for the

time being that as stated before, this configuration still dissipates appreciably more

power in the 2-D case.) To construct the 3-D SRAM, each block of $N$ cells in a row

is assigned to one of $L$ layers along the third dimension, as illustrated in Figure 4-4.

This restores the original aspect ratio of $M : N$. The repackaging of the 2-D SRAM

into three dimensions is complete. The worst-case switched capacitance (two word

lines and all bit lines on a layer) is reduced from $2N\sqrt{L} + 2MNL = 2N\sqrt{L}\,(1 + M\sqrt{L})$

to $2N + 2MN = 2N(1 + M)$.
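
The two expressions can be evaluated numerically to see how the reduction factor approaches the number of layers. The following Python sketch is an illustration under the same assumptions as the derivation (unit cells, capacitance proportional to wire length); the function names are hypothetical.

```python
import math


def switched_capacitance_2d(m: int, n: int, layers: int) -> float:
    """Worst case for the original 2-D SRAM: 2*N*sqrt(L) + 2*M*N*L."""
    return 2 * n * math.sqrt(layers) + 2 * m * n * layers


def switched_capacitance_3d(m: int, n: int) -> float:
    """Worst case for one layer of the 3-D SRAM: 2*N + 2*M*N."""
    return float(2 * n + 2 * m * n)


def reduction_factor(m: int, n: int, layers: int) -> float:
    """Ratio of 2-D to 3-D worst-case switched capacitance."""
    return switched_capacitance_2d(m, n, layers) / switched_capacitance_3d(m, n)
```

The ratio simplifies to $\sqrt{L}\,(1 + M\sqrt{L}) / (1 + M)$, which is slightly below $L$ for small $M$ and tends to $L$ as $M$ grows.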

This is therefore the approach that will be taken in laying out a 3-D SRAM, which

is explored in the following section.

4.4 Layout of a 3-D SRAM

The first and second layers of a 512-bit 3-D SRAM are shown in Figure 4-5. The

design of this SRAM is based on the concepts outlined above.

The memory cell used in the 3-D SRAM is a standard 6-T cell, shown in Figure 4-6.

This means that volume utilization is asymptotically equal to the area utilization

of a 2-D SRAM of the same capacity.

In order to obtain the power savings available to the 3-D SRAM, each of eight

layers of the SRAM uses three address bits as a "layer select." That is, three address

bits are used to provide the tri-state signal for the entire layer. If a given layer is

not selected, all word lines and bit lines on that layer are disconnected from the row

decoders and bit-line drivers (located on the first layer). The row decoder outputs

and individual bits on the data bus are thus actually tri-stated buses wired along

the third dimension. Figure 4-7 shows the connection of a row decoder to the word

line through a pass transistor. Figure 4-8 shows the connection of bit lines to the

tri-stated buses making up the data lines.
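
The layer-select mechanism can be sketched behaviorally as follows. This Python fragment is an illustration, not the circuit itself: it assumes that the three layer-select bits sit directly above the row bits in the address and that each layer has eight rows, neither of which is stated in the text. A value of `None` models a word line left disconnected by its pass transistor.

```python
from typing import List, Optional

NUM_LAYERS = 8  # eight layers, selected by three address bits


def drive_word_lines(address: int, row_bits: int = 3) -> List[List[Optional[bool]]]:
    """Return, for each layer, the value seen on each of its word lines."""
    rows = 1 << row_bits
    row = address & (rows - 1)                        # assumed: low bits pick the row
    layer = (address >> row_bits) & (NUM_LAYERS - 1)  # assumed: next three bits pick the layer
    # The row decoder is shared by all layers (it sits on the first layer).
    decoder_out: List[Optional[bool]] = [r == row for r in range(rows)]
    disconnected: List[Optional[bool]] = [None] * rows
    return [decoder_out if k == layer else disconnected for k in range(NUM_LAYERS)]
```

Exactly one word line in the whole stack is driven high for any address, which is why only one layer dissipates switching power per access.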

Figure 4-5: First and second layers of an eight-layer 3-D SRAM.

Figure 4-6: 6-T SRAM cell layout.

Figure 4-7: Word-line tri-stating. (Labels: row decoder, access FET, tri-state signal, word line.)

Figure 4-8: Bit-line decoding using the word-line tri-state control signal. (Labels: bit line, tri-state signal, write enable.)

Thus, the SRAM effectively acts as eight 2-D SRAMs whose I/O pins are wired

together, with the important difference that the row decoders and bit-line drivers are

reused for all eight layers. Like a stack of 2-D SRAM chips soldered together, only

one layer consumes power for any single operation. The power consumption of the

512-bit 3-D SRAM shown here is thus essentially that of a single 64-bit layer. The

circuit area, as can be seen in Figure 4-5, is little more than that of a 2-D SRAM with

an 8:64 aspect ratio, as the seven upper layers are dedicated to storage cells. The

area overhead incurred by the 3-D implementation is in the form of access transistors

that are clearly visible in the layout of the second layer; this overhead is not due to

the three-dimensionality of the architecture, but instead a tradeoff for maintaining

low power consumption as described above. Finally, while speed has not been the

primary consideration here, care has been taken not to sacrifice performance in this

regard. All word and bit lines are strictly shorter in the 3-D case, and the layer select

computation is done in parallel with the row decoding. At any rate, the use of sense

amplifiers would greatly reduce the dependency on bit line speed, and consequently

diminish any speed advantage presented by a 3-D SRAM with this architecture.

It is clear, then, that memory architectures benefit directly from implementation

in a three-dimensional medium. It is useful to consider now the suitability of this

medium for computation.

Chapter 5

The Cellular-Automata Machine

Ever since the advent of integrated-circuit technology, general-purpose computation

has been one of its main applications. Today, about half the world semiconductor

market is in computer chips. Further, research into novel computational architectures

is highly active. Many of these architectures could benefit from integration in a 3-D

technology; one such architecture is the cellular-automata machine.

5.1 Background

Much of computational theory is centered around what are called finite automata.

In particular, interest is centered here on a particular class called deterministic finite

automata (DFAs), also known as finite-state machines (FSMs).

5.1.1 Finite-State Machines

A finite-state machine is a quintuple $(Q, \Sigma, \delta, q_0, F)$. $Q$ is a finite set of states in which

the machine can be. $\Sigma$ is a finite alphabet from which the inputs to the machine are

taken. $\delta$ is the transition map, defined as a function $\delta : Q \times \Sigma \to Q$ that maps a

combination of the current state of the machine and the current input to the next

state of the machine. $q_0$ is the initial state of the machine, and $F \subseteq Q$ is the set of

accepted final states.

Figure 5-1: Finite state machine. (Panel A: finite-state machine. Panel B: hardware implementation, with inputs, combinational logic, register, and current state.)

The finite-state machine, depicted in Figure 5-1, is thus designed for synchronous

operation. The FSM is set to state $q_0$ and an input is provided. At each tick of a

global clock, $\delta$ determines the next state, and the current input is discarded for the

next input. If at the end of the input stream, the state of the machine is in F, then

the machine is said to accept the input.
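
The acceptance procedure just described maps directly onto a few lines of software. The following Python sketch represents the transition map as a dictionary keyed by (state, input) pairs; the function name and the example machine (which accepts binary strings containing an even number of 1s) are illustrative assumptions, not taken from the thesis.

```python
from typing import Dict, Iterable, Set, Tuple


def fsm_accepts(delta: Dict[Tuple[str, str], str], q0: str,
                final: Set[str], inputs: Iterable[str]) -> bool:
    """Run the FSM from q0 over `inputs`; accept if it ends in a final state."""
    state = q0
    for symbol in inputs:
        # One clock tick: the transition map gives the next state.
        state = delta[(state, symbol)]
    return state in final


# Hypothetical example: Q = {even, odd}, alphabet = {0, 1}, q0 = even, F = {even}.
EVEN_ONES_DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
```

In hardware, the dictionary lookup corresponds to the combinational logic and the variable `state` to the register.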

The FSM admits a straightforward hardware implementation, also shown in Fig-

ure 5-1. FSMs are thus used for many types of control hardware.

More important, however, is a modification of an FSM called a Turing machine.

A Turing machine comprises an FSM and an infinite tape. At each tick of a global

clock, the FSM reads from the tape (its input) and decides (1) its next state, (2)

an output to the tape, and (3) whether to move left or right on the tape. A Turing

machine can thus be defined by a septuple $(Q, \Sigma, \Gamma, \delta, q_0, B, F)$. $Q$ is again the set of

states of the FSM. $\Gamma$ is the finite tape alphabet. $B$ is a special character, the "blank."

$\Sigma \subseteq \Gamma - \{B\}$ is then the input alphabet. $\delta : Q \times \Gamma \to Q \times \Gamma \times \{\text{left}, \text{right}\}$ is then

the transition function for the Turing machine. $F$ is again the set of accepted final

states.
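
A minimal simulator makes the septuple concrete. The Python sketch below is an illustration under stated assumptions: tape symbols are single characters, the machine halts when it enters a final state or when no transition is defined, and a step limit (`max_steps`, a parameter introduced here) guards against non-terminating machines. The example machine, which inverts a binary string, is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Transition = Dict[Tuple[str, str], Tuple[str, str, str]]


def run_turing_machine(delta: Transition, q0: str, final: Set[str], blank: str,
                       tape_input: str, max_steps: int = 1000) -> Tuple[str, str]:
    """Simulate the machine; return (last state, tape contents without outer blanks)."""
    # The tape is unbounded in both directions; unwritten cells read as blank.
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    state, head = q0, 0
    for _ in range(max_steps):
        if state in final:
            break
        key = (state, tape[head])
        if key not in delta:
            break  # no transition defined: halt
        state, symbol, move = delta[key]
        tape[head] = symbol
        head += 1 if move == "right" else -1
    if not tape:
        return state, ""
    cells = "".join(tape[i] for i in range(min(tape), max(tape) + 1))
    return state, cells.strip(blank)


# Hypothetical example: flip every bit, then stop at the first blank.
INVERT = {
    ("scan", "0"): ("scan", "1", "right"),
    ("scan", "1"): ("scan", "0", "right"),
    ("scan", "B"): ("done", "B", "right"),
}
```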

Figure 5-2: Turing machine. (A finite-state machine with a read/write head positioned over an infinite tape; the tape symbols are drawn from a finite alphabet.)

Turing machines are important for general-purpose computation because for the

appropriate choice of parameters, there exists a Turing machine that can take as

input a specification of an arbitrary Turing machine and an input to that machine,

and simulate the behavior of the specified machine on that input. The simulator is

thus called a Universal Turing Machine (UTM). It has been shown that a UTM is

capable of solving a vast class of useful problems [19].

In particular, a UTM is computationally equivalent to modern general-purpose

processors, in that a UTM can simulate the behavior of any of these processors, and

vice versa. Therefore, any architecture that is equivalent to a UTM is suitable for

general-purpose computation. Since certain cellular-automata machines can be shown

to be equivalent to a UTM, it is worthwhile to explore this architecture further.¹

5.1.2 Cellular-Automata Machines

The cellular-automata machine (CAM) is a variant on the finite-state machine much

in the same way a Turing machine varies from the FSM: the CAM adds unbounded

memory to the system. The chief difference is that in a Turing machine, one processing

element has access to the unbounded memory, but in a CAM, the memory is

distributed as finite chunks among an infinite array of processing elements.

¹ A more in-depth review of these concepts is available in many texts, such as [19].

Figure 5-3: Four cells of a cellular-automata machine.

Specifically, a CAM is a regular array of cells, each of which is an FSM. For any

given cell, the input to the cell is a set of states of other cells; the set of cells whose

states are polled is called the neighborhood of the given cell. CAMs generally also have

the property of uniformity; that is, each cell has the same FSM and the topology of

the neighborhood is invariant from cell to cell. It is, therefore, only the input (i.e. the

initial configuration of states) that varies from program to program, as implemented

on a CAM.

Part of a hypothetical CAM is thus shown in Figure 5-3. This CAM features a

square array and exhibits a common neighborhood known as a Moore neighborhood,

which contains all cells in the immediate "square". Thus, in this CAM, each cell has

eight neighbors.

Unlike UTMs, CAM architectures are highly variable, in that there are many

choices for the cell topology and for the neighborhood. For example, a hexagonal

grid is used for many problems in physics. The CAM shown in Figure 5-3 uses a

2-D mesh. As is no doubt evident, the topology to be considered here is the 3-D

rectangular mesh.

This topology is suitable for a number of reasons. First and foremost, it has been

shown that there exist CAMs using 2-D and 3-D meshes that are equivalent to a

UTM [20]. Secondly, it admits a more efficient layout in three dimensions than in

two.

To demonstrate these efficiencies, a sample CAM is laid out that executes the

Game of Life in three dimensions.

5.1.3 The Game of Life

The Game of Life, as devised by John Conway [6], is a simple specification for a

two-dimensional cellular-automata machine that exhibits a variety of interesting be-

haviors. The cells are arranged in a Moore configuration so that each cell talks to

its eight nearest neighbors. At any given time, each cell is in one of two states, alive

or dead. The transition rule for a cell is as follows: a living cell remains alive if and

only if it has exactly two or three living neighbors (the environment condition), and

a dead cell becomes alive if and only if it has exactly three living neighbors (the fertility

condition).

More formally, a candidate Game is specified by the quadruple $(E_l, E_u, F_l, F_u)$,

where a living cell remains alive if and only if it has $m$ living neighbors with

$E_l \le m \le E_u$, and a dead cell becomes alive if and only if it has $n$ living neighbors with

$F_l \le n \le F_u$. The Conway Game of Life is thus $(2, 3, 3, 3)$, hereafter termed Life 2333

[1].
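
The quadruple translates directly into a transition rule. The following Python sketch applies it to a two-dimensional grid stored as a set of live-cell coordinates; the function names are illustrative, and an unbounded plane is assumed.

```python
from typing import Dict, Set, Tuple

Rule = Tuple[int, int, int, int]  # (E_l, E_u, F_l, F_u)
Cell2D = Tuple[int, int]


def next_state(alive: bool, living_neighbors: int, rule: Rule = (2, 3, 3, 3)) -> bool:
    """Apply the environment condition to live cells, the fertility condition to dead ones."""
    e_l, e_u, f_l, f_u = rule
    if alive:
        return e_l <= living_neighbors <= e_u
    return f_l <= living_neighbors <= f_u


def step_2d(live: Set[Cell2D], rule: Rule = (2, 3, 3, 3)) -> Set[Cell2D]:
    """Advance one generation using the eight-cell Moore neighborhood."""
    counts: Dict[Cell2D, int] = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    key = (x + dx, y + dy)
                    counts[key] = counts.get(key, 0) + 1
    candidates = set(counts) | live
    return {c for c in candidates if next_state(c in live, counts.get(c, 0), rule)}
```

Applied to a row of three live cells, `step_2d` produces a column of three and then the original row again, a simple instance of the oscillating configurations mentioned below.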

Life 2333 exhibits some important behaviors. For example, there exist cell con-

figurations that are stable (i.e. that maintain the same state indefinitely). Other

configurations oscillate (i.e. cycle through a finite set of states). Further, some

configurations, called gliders, translate across the plane. Finally, Life 2333 has con-

figurations called glider guns, which are stationary configurations that emit gliders

at regular intervals.

It has been shown that using these glider guns, Life 2333 can be made to emulate

Boolean logic, and thus is equivalent to a UTM [6]. Fabrication of a Game of Life

CAM is thus potentially more interesting than fabrication of some special-purpose

machines.

All that remains is to extend Life to three dimensions. The main difficulty is

that a Moore neighborhood on a 3-D mesh consists of 26 cells. Thus, the range of

candidate Games is greatly increased over the 2-D case. However, there do exist

several candidate Games in three dimensions. Two of these are Life 4555 and Life

5766. While Life 4555 exhibits more prolific behaviors, Life 5766 has the interesting

property that many special configurations in Life 2333 are extensible to Life 5766.

In fact, Life 5766 is provably most analogous to Life 2333. However, it is not known

whether a glider gun exists for either three-dimensional game [1].

Nonetheless, Life 5766 is used as the CAM to be laid out, as a wealth of configu-

rations are already known.
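
For reference, a software model of Life 5766 on the 26-cell Moore neighborhood is sketched below in Python. It is an illustration only: it assumes an unbounded 3-D grid rather than a finite mesh, and the names `MOORE_3D` and `step_life_3d` are introduced here.

```python
from itertools import product
from typing import Dict, Set, Tuple

Cell = Tuple[int, int, int]

# All offsets in the 3x3x3 cube except the cell itself: 26 neighbors.
MOORE_3D = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def step_life_3d(live: Set[Cell], rule: Tuple[int, int, int, int] = (5, 7, 6, 6)) -> Set[Cell]:
    """Advance one generation of a 3-D Life game given as (E_l, E_u, F_l, F_u)."""
    e_l, e_u, f_l, f_u = rule
    counts: Dict[Cell, int] = {}
    for (x, y, z) in live:
        for (dx, dy, dz) in MOORE_3D:
            key = (x + dx, y + dy, z + dz)
            counts[key] = counts.get(key, 0) + 1
    nxt: Set[Cell] = set()
    for cell in set(counts) | live:
        n = counts.get(cell, 0)
        low, high = (e_l, e_u) if cell in live else (f_l, f_u)
        if low <= n <= high:
            nxt.add(cell)
    return nxt
```

As a quick check of the rule, a 2 x 2 x 2 block of live cells is stable under this model: each live cell has seven live neighbors, and no dead cell has more than four.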

5.2 Layout of a 3-D Game of Life

The layout of a 3-D cellular-automata machine consists of two basic steps. First,

the individual cell must be laid out; second, the cells must be wired together, and

global signals must be distributed. It is in this second phase where layout efficiencies

will be examined, for two reasons. First, the cell contents are highly variable from

CAM to CAM; thus, optimizations for one particular CAM may not generalize.

Second, the motivation behind the CAM is toward simple cell design, and the

design of simple FSMs is not terribly interesting.

Nevertheless, the cell design will be discussed.

5.2.1 Game of Life Cell

The cell behavior in the Game of Life, and in Life 5766 in particular, is easy to

specify in a rule-based form. This means that Life is easy to implement in software.

However, Life does not admit a simple expression in hardware. The hardware imple-

mentation in three dimensions is confounded by the fact that the neighborhood size is

26 cells as opposed to 8 in the two-dimensional case. Thus, neither a straightforward

combinational-logic approach nor a lookup-table approach is feasible.

The standard approach to Life-type problems in hardware is to emulate the soft-

ware mechanism. In software implementations, the cell states are stored in memory

and an accumulator is used to count the number of live cells in the neighborhood. A

hardware implementation could then consist of a number of adders.

However, a neighborhood of 26 cells implies 25 additions, some of which are neces-

sarily five bits wide. The resulting logic structure would both be unnecessarily large

and compute unnecessary information. The only way to implement such a cell using

logic or arithmetic is through hardware reuse; that is, computation of the state must

be done over several clock cycles, where different inputs are considered on each cycle.

By contrast, an area-efficient single-cycle implementation must recognize that permu-

tations in the input data are irrelevant and that only two result bits are necessary to

compute the output (these being whether $E_l \le n \le E_u$ and whether $F_l \le n \le F_u$).

Combinational logic is thus inefficient because each permutation of the input data

must be recognized by a separate pulldown or pullup path. Lookup-table implemen-

tations fail for this reason as well. On the other hand, an adder implementation is

inefficient because it produces extraneous output; it is not possible to discard higher-

order bits of the sum, even though these bits are not desired for all practical Life

implementations.²

Thus, in order to simplify the FSM structure, non-computational approaches must

be considered. One such approach is through sorting nets. A sorting net is a hardware

structure that takes n unsorted inputs and produces n outputs that are the inputs

in ascending or descending numerical order. Sorting nets differ from conventional

sorting algorithms in two ways: first, comparisons are done in parallel, and second,

the sequence of comparisons is independent of the input. Sorting nets are therefore

ideally suited for hardware implementation.

² In the construction of a practical CAM, i.e. one that is more adept than the Game of Life at general-purpose computation, the computational structure of the FSM is paramount. That is, the cell would be implemented via a lookup table or via pipelined combinational logic, and the ability to do single-cycle computation on all 26 inputs would be sacrificed. The emphasis would thus be on optimization of the logic, rather than on optimization of the overall CAM structure, which is the focus here.

Figure 5-4: Insertion sort bit-slice.

Life may be implemented by taking the input bus and sorting the bits on the

bus. If the nth-highest bit is a 1, this then guarantees that there are at least n

live neighbors. It is then straightforward to compute the next state of the cell by

examining the relevant wires in the bus.

An area-efficient wide sorting net can be implemented as an insertion sort. Inser-

tions are efficient because each bit is inserted either at the top of the list or at the

bottom. Consider the circuit in Figure 5-4. Bits $A_3$, $A_2$, and $A_1$ are sorted, and $A_0$

is to be inserted. Bits $B_3 \ldots B_0$ are thus also sorted.

A 26-bit sorter may be constructed using this method. Moreover, all but the fifth

through eighth highest bits may be discarded for Life 5766. The next state may then

be computed as a combinational function of these four bits and the current state.

The Game of Life cell is shown in Figure 5-5.
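As an illustration of how the four retained bits determine the next state, here is a small software sketch. It assumes that "Life 5766" is read in Bays's notation, i.e. a live cell survives with 5 to 7 live neighbors and a dead cell becomes live with exactly 6; that reading, and the function name `next_state_5766`, are assumptions made for this example.

```python
def next_state_5766(current, neighbors):
    """Next state of one cell from its 26 neighbor bits.

    Assumed rule (Bays's notation for Life 5766): survive with 5..7 live
    neighbors, be born with exactly 6.
    """
    # Sorted with 1s first, so s[n - 1] == 1 means "at least n live neighbors".
    s = sorted((1 if b else 0 for b in neighbors), reverse=True)
    at_least_5, at_least_6, at_least_7, at_least_8 = s[4], s[5], s[6], s[7]
    survive = bool(at_least_5) and not at_least_8   # 5, 6, or 7 neighbors
    born = bool(at_least_6) and not at_least_7      # exactly 6 neighbors
    return int(survive if current else born)
```

Only the fifth through eighth highest sorted bits are read, which mirrors the observation that the remaining outputs of the sorter can be discarded.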

5.2.2 Game of Life CAM Architecture

Of vital interest is the usability of the 3-D medium for system-level optimizations of

the CAM architecture. In particular, the inter-cell communication and clock distribution

both benefit from implementation in three dimensions. System I/O can also

be done in an efficient manner. To experiment with some of these optimizations, the

Game of Life cell is fitted into a 4 x 4 x 4 rectangular mesh.

Inter-cell communication is a difficult problem mainly because of the number of

wires - 26 per bit of state. This problem confounds implementation of a 3-D CAM


Figure 5-5: Game of Life cell.

in a 2-D medium. With a three-dimensional medium, the problem reduces to that of

wiring the eight-neighbor 2-D configuration. Each cell requires inputs from its eight

neighbors on the same layer, nine neighbors from one layer down, and nine neighbors

from one layer up. An efficient 3-D wiring scheme is to wire the eight same-layer

neighbor inputs to the cell, and then wire each input to three via stacks. The first

stack connects to the gate on that layer, the second stack goes up to the next layer,

and the third stack goes down to the layer below. Beneath the second stack lies a

via stack coming up from the layer below, and above the third stack lies a via stack

coming down from the layer above. Thus, when the cells are aligned along the third

dimension, the vias align to provide I/O between layers.
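The neighbor count behind this wiring scheme can be checked with a short sketch. The treatment of cells on the boundary of the mesh is not specified here, so the code below simply omits neighbors that fall outside the 4 x 4 x 4 mesh; that choice and the function names are assumptions.

```python
from itertools import product


def neighbor_offsets():
    """Group the 26 neighbor offsets (dx, dy, dz) by layer."""
    groups = {"same": [], "below": [], "above": []}
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        if dx == dy == dz == 0:
            continue  # the cell itself is not a neighbor
        key = "same" if dz == 0 else ("below" if dz < 0 else "above")
        groups[key].append((dx, dy, dz))
    return groups


def neighbors_in_mesh(x, y, z, size=4):
    """Neighbor coordinates inside a size^3 mesh (edges are not wrapped)."""
    cells = []
    for group in neighbor_offsets().values():
        for dx, dy, dz in group:
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < size and 0 <= ny < size and 0 <= nz < size:
                cells.append((nx, ny, nz))
    return cells
```

The grouping reproduces the eight same-layer, nine lower-layer, and nine upper-layer inputs described above.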

Clock distribution is done through an H-tree network. The clock net on the first

two layers of the CAM is shown in Figure 5-6. The four peripheral vias connect the

four-cell H-trees on the second layer to the four-cell H-trees on the first layer, thus

forming a 3-D H-tree. The central via then connects the 32-cell H-tree shown in the


Figure 5-6: Clock distribution in the 3-D Game of Life architecture.

figure to the 32-cell H-tree on the third and fourth layers of the CAM.

The savings in wire length can be seen immediately. Distributing the clock signal

to an 8 x 8 2-D grid would require four H-trees like the one in the left half of Fig-

ure 5-6; this 3-D implementation requires two. Further, the four H-trees in the 2-D

implementation would have to be connected into an H-tree, requiring yet more wire.

In this 3-D circuit, the trees are connected by the central via stack. Thus, the 3-D

medium features wire-length savings, and thus power savings, that can be exploited

readily.
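A rough, back-of-the-envelope comparison of in-plane clock wire length can be sketched as follows. The model is an assumption made for illustration only: cells sit on a unit-pitch grid, each H-tree level joins four quadrant trees with one horizontal bar and two vertical bars, via length is ignored, and each pair of layers carries one 16-cell H-tree plus four four-cell H-trees, as the description of Figure 5-6 suggests.

```python
def h_tree_length(n):
    """In-plane wire length of an ideal H-tree feeding an n x n grid.

    Assumes n is a power of two and unit spacing between cells. Each
    level adds one horizontal bar and two vertical bars, each of length
    n / 2, on top of the four quadrant trees.
    """
    if n == 1:
        return 0
    half = n // 2
    return 4 * h_tree_length(half) + 3 * half


# 64 cells laid out as a single 8 x 8 plane.
planar_length = h_tree_length(8)

# 64 cells as four 4 x 4 layers: per pair of layers, one full 16-cell
# tree on one layer and four four-cell trees on the other (assumed).
stacked_length = 2 * (h_tree_length(4) + 4 * h_tree_length(2))
```

Under these assumptions the planar tree needs 84 units of wire and the stacked arrangement 60 units plus the via stacks, which is consistent with the qualitative saving claimed above.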

Of course, a 3-D architecture is useless if there is no way in which it can be

programmed. System I/O thus becomes an important concern. While various bit-

serial methods such as shift-register implementations have been proposed for this

purpose, the approach taken here is to integrate a 3-D memory interface with the

CAM. Specifically, each row receives a word line and each column receives a bit line.

Each cell also receives a write-enable signal that is used to program the CAM and that


also serves to disconnect the state logic from the output register. External addressing

and control can then be used to select a multi-bit word for reading out of the CAM

while the CAM is operating, or can be used to stall the CAM and alter some or all of

its state. The CAM architecture shown here may thus be visualized as a 3-D memory

with integrated processing elements.
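The behavior of a single cell under this interface can be sketched as a small model. The details are assumptions for illustration: the class name `CamCell` is hypothetical, and it is assumed that a cell with write-enable asserted but no active word line simply holds its state.

```python
class CamCell:
    """Toy model of one CAM cell with a memory-style programming port."""

    def __init__(self, state=0):
        self.state = state  # contents of the output register

    def clock(self, computed_next, write_enable=False,
              word_line=False, bit_line=0):
        """Advance one clock cycle and return the new register contents."""
        if write_enable:
            # The state logic is disconnected from the register. Only a
            # cell whose word line is active takes the bit-line value;
            # otherwise the cell is stalled (assumed behavior).
            if word_line:
                self.state = 1 if bit_line else 0
        else:
            self.state = 1 if computed_next else 0
        return self.state
```

Reading requires no special mode in this model: the register contents are always available, which matches the idea of reading words out while the CAM is operating.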

A single layer of the 4 x 4 x 4 Game of Life architecture is shown in Figure 5-7.


Chapter 6

Conclusion

Currently, integrated circuits are fabricated using a CMOS technology which con-

strains the circuits to a two-dimensional geometry. This places fundamental limits on

the density of both memory elements and processors in a single chip. The develop-

ment of new fabrication technologies, however, has made feasible the implementation

of three-dimensional (multiple-layer) integrated circuits. The possibility of creating

such devices will open new doors for circuit design and fabrication.

In order to be prepared to utilize this technology, software design tools have been

developed that allow circuit designers to utilize their background in 2-D CMOS logic

design to develop 3-D circuits. This thesis centers on the development of such tools.

Specifically, a computer-aided design system called FluidLayout has been devel-

oped that integrates a familiar means of circuit layout with the means to handle 3-D

circuits of arbitrarily many layers. In addition to providing a familiar circuit design

interface, FluidLayout also provides useful system-level functions such as the ability

to extract circuit netlists and the ability to produce file formats useful for fabrication.

The tools and technology have been used in this thesis to design some interesting

circuits. Some basic transistor circuits have been laid out to test the viability of

FluidLayout and of the medium. FluidLayout has been used to specify the circuit

arrangement and to verify functionality of the circuit through netlist extraction. A

wafer mask has been produced using the specification provided by FluidLayout.

To explore some interesting ramifications of the technology, a 3-D static RAM has


also been designed using FluidLayout. This SRAM has a 512-bit capacity, but has

the same footprint as that of a 64-bit 2-D SRAM. Further, wiring costs have been

reduced so that the power consumption of this memory is approximately that of the

64-bit 2-D SRAM as well. The I/O packaging of this 3-D SRAM is arranged so that

it may be used as a drop-in replacement for conventional SRAM; thus, immediate

benefits may be seen from this technology.

Finally, a 3-D cellular automata machine has been designed in order to explore

some system issues in 3-D design. This CAM implements a 3-D version of Conway's

Game of Life in hardware. The architecture features an efficiently-distributed global

clock signal and a 3-D memory-style I/O interface.

As the technology matures, circuits can be fabricated using the tools and tech-

niques developed in this thesis. The potential benefits introduced by such new devices

are enormous.


Appendix A

FluidLayout User's Guide

A.1 Overview

FluidLayout is a design tool for geometric layout of three-dimensional integrated

circuits and basic microelectromechanical systems (MEMS). That is, within Fluid-

Layout, it is possible to construct arbitrary metal-insulator-semiconductor structures

so long as the structures are confined to a Manhattan (90°) grid.

Figure A-1 shows the main appearance of FluidLayout. The components outlined

in the figure will be discussed in detail.

A.2 Basic Layout

Basic layout is done by painting rectangular boxes on a grid. A colored box is created

by first drawing the box outline (the edit box) and then selecting the paint with which

to fill it. The left mouse button is used to select the lower-left corner of the box and

the right mouse button is used to select the upper-right corner. The box, once sized,

may be moved by clicking the left mouse button at the desired lower-left coordinate.

Once the box has been sized, it must be painted. There are two ways of doing

this. The first is to select the desired material from the main toolbar, shown in

Figure A-2. Substrate materials are identified by their color and their associated tool

tips (for example, passing the mouse over the red button produces a "Gate" pop-up).


Figure A-2: The main FluidLayout toolbar.

Via materials are identified by the color of the underlying metal along with an "X"

through the button.

The other way to paint boxes is to use existing paint. The middle mouse button

acts as a copy button, in that if it is clicked over a painted region, the paint in that

region is added to the paint in the edit box. Also, if the middle mouse button is

clicked over empty space, the edit box is cleared.

To make rectangle placement easier, several features are supplied. The first is

a grid whose visibility may be toggled by the grid toolbar button or the 'g' key.

Rectangle placement is aligned to this grid regardless of its visibility. Another feature

is the current-coordinate window. The coordinates given are relative to the lower left

of the entire circuit, so this information is especially useful in coordinating object

locations across layers. Finally, zoom buttons are provided in order to allow the user

to view any part of the layout easily. Specifically, three buttons are provided: zoom

in (2:1), zoom out (1:2), and view all. Further, the 'z'/'Z' and 'v' keys are enabled

as shortcuts; 'z' zooms to the edit box, 'Z' is equivalent to "zoom out", and 'v' is

equivalent to "view all."

A.3 Higher-Level Functions

A.3.1 Node Labeling

For circuit verification purposes, it is desirable to label some of the nodes of a circuit.

This is done in FluidLayout using the "select" and "label" functionality.

A node that is to be labeled must first be selected. In Figure A-3, a selected node

is shown. FluidLayout outlines the selected rectangle in bold white. A rectangle may

be selected using the 's' key.


Figure A-3: Partial view showing node selection.

Once selected, a rectangle may be labeled by pressing the 't' key or the label

button (marked 'T') on the toolbar. Doing so causes the dialog in Figure A-4 to pop

up.

The user may then enter an appropriate label.

A.3.2 Translation, Rotation, and Reflection

The ability to move portions of a circuit around the layout is extremely useful. In

FluidLayout, this is accomplished with the Cut, Copy, and Paste facilities. The

contents of the edit box are copied to the clipboard (and in the case of Cut, erased

from the layout), and may be placed at a new location by relocating the edit box

there.

Additionally, FluidLayout supports 90° rotations about the lower-left corner of

the edit box and reflections about the left side of the edit box. A rotation is depicted

in Figure A-5.
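The geometry of these two operations can be written out for a single rectangle. This is a sketch under assumptions: the rotation direction is taken to be counterclockwise, rectangles are stored as (x0, y0, x1, y1) with (x0, y0) the lower-left corner, and the function names are hypothetical rather than part of FluidLayout.

```python
def rotate90(rect, pivot):
    """Rotate an axis-aligned rectangle 90 degrees counterclockwise about pivot."""
    x0, y0, x1, y1 = rect
    px, py = pivot
    # A point (x, y) maps to (px - (y - py), py + (x - px)), so the old
    # y-extent becomes the new (mirrored) x-extent and vice versa.
    return (px - (y1 - py), py + (x0 - px), px - (y0 - py), py + (x1 - px))


def reflect_about_left(rect, left_x):
    """Mirror a rectangle about the vertical line x = left_x."""
    x0, y0, x1, y1 = rect
    return (2 * left_x - x1, y0, 2 * left_x - x0, y1)
```

For example, a 4 x 2 box with its lower-left corner at the pivot becomes a 2 x 4 box lying to the left of the pivot, and four successive rotations restore the original rectangle.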


Figure A-4: Label dialog.

Figure A-5: Subcircuit rotation.


Figure A-6: Cell hierarchy management toolbar.

A.4 Circuit-Level Tools

A.4.1 Circuit Traversal

As discussed before, FluidLayout provides zoom-in, zoom-out, and view-all function-

ality for circuits on a layer. The mechanism for creating and viewing different layers

of the circuit is provided from the pull-down menus.

For example, one can add a layer to the circuit by pulling down the Window menu

and selecting New Layer. One can then open a view window for any of the existing

layers by pulling down the View menu and selecting Switch to Layer. A dialog box

will then pop up in which the user can specify the layer to view.

A.4.2 Cell Hierarchy Management

Every FluidLayout design can be used as a subcell of another design. Thus, Fluid-

Layout has full support for cell hierarchy. Cell manipulation is done through the cell

toolbar, shown in Figure A-6.

The cell toolbar has buttons for loading cells into the design, adding cells to the

layout, and for manipulation of individual cells. Specifically, the cell toolbar has the

following buttons from left to right:

Load Cell allows the user to load a cell from disk into memory and associate the

cell with the current working design.

Add Cell pops up a dialog box from which the user can select a loaded cell for

addition to the circuit layout.

The next three buttons specify the operation mode for cell manipulation. A cell

may be accessed by double-clicking on the cell. The access mode is specified by the

toolbar button that is currently depressed.


Copy Cell signifies that double-clicking should create a copy of the accessed cell.

Move Cell signifies that double-clicking should move the accessed cell.

Erase Cell signifies that double-clicking should erase the accessed cell.

With this interface, it is feasible to construct circuits with arbitrary hierarchical

depth.

A.4.3 Magic Importation

FluidLayout has the ability to import circuit layouts created in Magic. Text files

with the .mag extension are recognized as Magic layout specifications, and those

rectangles with material types that correspond to FluidLayout materials are imported

to FluidLayout's native format.

This option is selected by pulling down the File menu and selecting the Import->MAGIC

file (.mag) option.
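The filtering step of such an import can be sketched as follows. The sketch relies on the Magic file convention of `<< layer >>` section headers followed by `rect xbot ybot xtop ytop` lines; the mapping from Magic layer names to FluidLayout materials shown in the usage note is purely hypothetical, and FluidLayout's native format is not modeled.

```python
def parse_mag(text, material_map):
    """Collect rectangles from Magic layout text for mapped layers only.

    material_map maps a Magic layer name to a target material name.
    Rectangles on layers that are absent from the map are skipped.
    """
    rects = []
    layer = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("<<") and line.endswith(">>"):
            layer = line[2:-2].strip()      # e.g. "polysilicon" or "end"
        elif line.startswith("rect ") and layer in material_map:
            x0, y0, x1, y1 = (int(v) for v in line.split()[1:5])
            rects.append((material_map[layer], x0, y0, x1, y1))
    return rects
```

A call such as `parse_mag(text, {"polysilicon": "gate", "ndiffusion": "source/drain"})` would keep only rectangles on those two layers; the material names in this map are illustrative assumptions.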

A.4.4 Circuit Netlist Extraction

FluidLayout is able to extract the connectivity information in VLSI circuit layouts

and produce a netlist in SPICE deck format. This format is suitable for functional

and timing simulation using the Berkeley SPICE circuit simulator or its variants.

(The user must supply SPICE models for the transistors; this information cannot be

provided by FluidLayout.)

The user may select this option by pulling down the File menu and selecting the

Export->SPICE deck (.sp) option.

A.4.5 VLSI/MEMS Fabrication

At some point, it is desirable to target a design for fabrication. There are two steps

that must be taken: first, the fabrication technology must be specified, and second,

the circuit must be extracted to a suitable output file.


Figure A-7: Edit->Properties->Laser Setup

Fabrication Technology Specification

FluidLayout is designed to understand fabrication technologies on a system-by-system

basis. This is mainly because the fabrication methods are drastically different from

each other. For example, FluidLayout is able to handle both inkjet and embossing

means of fabrication.

FluidLayout accommodates this by maintaining a property sheet for each design.

This sheet may be accessed by pulling down the Edit menu and selecting Properties.

Figure A-7 shows the property sheet for laser rasterization of the layout.


The individual fabrication parameters are set as described here:

Inkjet Setup The inkjet is a 48-nozzle liquid printing head mounted on an X-Y-Z

gantry. This property sheet therefore configures the size of the printing head and

various printing parameters.

Lambda corresponds to the size of the grid spacing. This should be set to the line

width of the inkjet, since rectangles are rastered, and it is preferred that rectangles

are printed as such as opposed to sets of snake-like lines.

Inter-nozzle spacing refers to the space between each of the 48 nozzles.

Printing feed rate is the rate of gantry movement during printing.

Non-printing feed rate is the rate of gantry movement when not printing.

Pre-print (acceleration) spacing is used to produce smooth, even lines. The

inkjet is given a small amount of space to accelerate to full speed before printing.

Laser Setup The laser setup is similar to that of the inkjet, as the laser is mounted

on the gantry also.

Laser position is the (x, y) location of the laser in gantry coordinates.

Laser current is the operating current of the laser. This is used for automated

operation of the laser via the serial port of the controlling computer.

Stamp Setup The stamp setup uses a flexible stamp to pattern liquid materials.

Each stamp contains all the patterns for a circuit (e.g., the gate pattern is followed

by the gate-via pattern); these patterns are spatially separated on the stamp. Each

pattern consists of rectangular outlines that are used to separate the desired material

rectangles from undesirable excess material.

Layer-to-layer spacing thus gives the spacing from a given layer on the stamp

to the next. For example, if the gate pattern is at (0, 0) and the spacing is set to 1000

microns, then the gate-via pattern is at (1000, 0).

Width of stamp outline refers to the width of the rectangular outlines in the

stamp patterns.
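The layer-to-layer spacing example can be expressed as a one-line computation. The sketch assumes that successive patterns continue along the x axis at the same spacing; only the first two positions are confirmed by the example in the text, and the function name is hypothetical.

```python
def stamp_pattern_origins(pattern_names, spacing_um):
    """Origin of each pattern on the stamp, in microns.

    Assumption: patterns are placed left to right along x, starting at
    (0, 0), each one spacing_um from the previous pattern.
    """
    return {name: (i * spacing_um, 0) for i, name in enumerate(pattern_names)}
```

With a spacing of 1000 microns, `stamp_pattern_origins(["gate", "gate-via"], 1000)` places the gate pattern at (0, 0) and the gate-via pattern at (1000, 0).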


Printer Use The gantry printing system is designed so that laser and inkjet print-

ing may be done interchangeably, meaning that different material parts of the same

circuit may be done with either inkjet or laser. Thus, this property sheet allows the

user to specify which method is to be used for printing.

Fabrication File Production

Once the technology is set, all that remains is to produce output that can be fabri-

cated. This can be done in three ways, depending on the target process.

File->Export->MMI code is used to produce G-code for the gantry system.

File->Export->VLSI Nanoprinting GDSII produces GDSII binary stream data

that is almost universally accepted for mask fabrication.

File->Export->MEMS Nanoprinting GDSII produces GDSII binary data, but

the layout is interpreted as a MEMS process and the release layers are generated

accordingly.

A.5 Step-By-Step Design Walk-Through

As an example, a three-dimensional ring oscillator is laid out here. Once FluidLayout

is started, the user is presented with the screen in Figure A-1. The grid is off by

default; it should be turned on by clicking on the grid toolbar button or by hitting

the 'g' key.

First, an NMOS transistor is created. To do so, a source node is created by

left-clicking at some location and right-clicking to define a 4 x 2 box, as shown in

Figure A-8.

Clicking on the blue toolbar button fills in this box with source/drain material,

as shown in Figure A-9.

By left-clicking, the edit box can be relocated to the right of the new source node

as a suitable location for a drain node. Then, by middle-clicking over the source

node, the source node material can be copied to the new edit-box location, yielding

Figure A-10.


Figure A-8: 4 x 2 box used for the source of an NMOS transistor.


Figure A-9: Completed source node of the NMOS transistor.


Figure A-10: NMOS source and drain nodes.


Source and drain nodes for the PMOS transistor can be created using the same

method. The resulting pattern is shown in Figure A-11.

Figure A-11: Inverter source and drain nodes.

The common gate may be laid out using the edit box and clicking on the red

toolbar button to fill it. Similarly, the n-type and p-type semiconductor can be

placed using the green and brown buttons respectively. The power and ground lines

can also be laid out using the source/drain material. Finally, the transistor drains

can be wired together to form the output. This results in the image in Figure A-12,

which can be seen in its entirety by pressing the 'v' key or clicking on the 'V' toolbar

button.

It is now desired to create inverters on the second and third layers of this circuit,


Figure A-12: Complete inverter.


thus making a three-dimensional ring-oscillator. This is started by creating a new

layer from the Window menu, shown in Figure A-13.


Figure A-13: Window menu.

This layout process could then be repeated for each new layer. Instead, the Copy

facility is used to duplicate the inverter. This is done in four steps. First, the edit box

is used to outline the area to be copied. Then either control-C or the copy toolbar

button is used to copy the contents of the edit box. Next, in the view window for

the second layer, the edit box is placed at the desired location for the copy (it is only

necessary for the lower-left corner of the box to be in the correct place; this can be

verified from the current-coordinate indicator). Finally, the Paste function is used by

pressing control-V or the paste toolbar button.

Next, vias must be placed to connect the three layers. Starting with the first

layer, contact pads to the input, output, power, and ground are laid out, as shown in

Figure A-14.

Each contact pad may then be wired to a series of vias that connect it to the next

layer up. Specifically, this is done by outlining the contact pad with the edit box,

then selecting the via material to place. In this case, the gate pad needs gate-via,

source/drain-via, metal1-via, and metal2-via to connect it to layer 2. A via is placed

by clicking on the appropriate toolbar button. Specifically, the (xyz)-via button looks

like the (xyz) button with an 'X' through it.

Upon placing the metal2 via, which connects to the next layer up, a corresponding


Figure A-14: First-layer inverter with contact pads.


via is marked on the next layer. This is shown in Figure A-15.


Figure A-15: First-layer inverter with via stacks to the second layer.

The inverters on the second and third layers may be wired similarly. This com-

pletes the ring-oscillator structure.

Finally, the nodes may be labeled for circuit-verification purposes. This is done

by placing the mouse pointer over the desired node and pressing the 's' key to select

it. Then, pressing the 't' key or clicking on the label button (marked 'T') pops up

a dialog box in which the user can label the node. Vdd, ground, and the oscillator

output are labeled in this way, as shown in Figure A-16.


Figure A-16: Labeled first-layer inverter.


The circuit is then extracted to a SPICE deck via the File->Export menu, shown

in Figure A-17.

Figure A-17: File->Export menu.

This produces the SPICE circuit netlist shown here.

***** C:\WINNT\Profiles\shamikd\Desktop\ring-oscillator.sp
***** Created on 5/1/2000
M1 3 ring GND! 0 NTFT W=10u L=10u
M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
M3 2 3 GND! 0 NTFT W=10u L=10u
M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
M5 ring 2 GND! 0 NTFT W=10u L=10u
M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u
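A quick sanity check on a netlist like the one above is to confirm that the inverters really form a ring. The sketch below assumes the standard SPICE MOSFET card order (name, drain, gate, source, bulk, model) and assumes the six device cards are named M1 through M6 in order; the helper names are hypothetical.

```python
NETLIST = """\
M1 3 ring GND! 0 NTFT W=10u L=10u
M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
M3 2 3 GND! 0 NTFT W=10u L=10u
M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
M5 ring 2 GND! 0 NTFT W=10u L=10u
M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u
"""


def inverter_stages(netlist):
    """Map each inverter input node (gate) to its output node (drain)."""
    drains_by_gate = {}
    for line in netlist.splitlines():
        fields = line.split()
        if not fields or not fields[0].upper().startswith("M"):
            continue  # skip comment lines and non-MOSFET cards
        drain, gate = fields[1], fields[2]
        drains_by_gate.setdefault(gate, set()).add(drain)
    # Keep only gates whose transistors all share one drain node.
    return {g: next(iter(d)) for g, d in drains_by_gate.items() if len(d) == 1}


def follow_ring(stages, start):
    """Follow input -> output links from start; return the loop or None."""
    path = [start]
    node = stages.get(start)
    while node is not None and node != start and len(path) <= len(stages):
        path.append(node)
        node = stages.get(node)
    return path if node == start else None
```

Starting from the node `ring`, the path visits `ring`, `3`, and `2` before returning, so the loop contains an odd number of inverting stages, as a ring oscillator requires.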

Finally, it is desired to extract the circuit to output that can be fabricated. Using

the same File->Export menu, VLSI nanoprinting GDSII is selected. This produces

a binary stream that can be used to turn a mask, which is used to produce the

wafer mold from which a circuit stamp can be made. The circuit design flow is thus

complete.


A.6 Summary of Useful Commands

Desired Action                 Method, Shortcut, or Menu
place edit box                 left-click at lower-left corner
resize edit box                right-click at upper-right corner
paint box                      click on applicable toolbar button or
                               middle-click over an area with the same paint
zoom to edit box               'z' key
zoom in                        zoom-in toolbar button (+ magnifying glass)
zoom out                       zoom-out toolbar button (- magnifying glass)
view entire circuit            'v' key or view-all toolbar button (marked 'V')
toggle grid                    grid toolbar button or 'g' key
cut contents of edit box       Cut toolbar button or ctrl-X
copy contents of edit box      Copy toolbar button or ctrl-C
paste contents of edit box     Paste toolbar button or ctrl-V
rotate contents of edit box    Rotate toolbar buttons (circular arrows)
select rectangle/node          's' key
label rectangle/node           't' key or Label toolbar button (marked 'T')
manipulate cell                double-click left mouse button over cell
load cell                      Load Cell toolbar button (image of cell from floppy disk)
add cell                       Add Cell toolbar button (image of cell from list)
copy cell                      Copy Cell toolbar button (image of duplicate cells)
move cell                      Move Cell toolbar button (arrow)
erase cell                     Erase Cell toolbar button (obliterated cell image)
edit fabrication parameters    Edit->Properties menu
import Magic layout            File->Import menu
export SPICE deck              File->Export->SPICE
export fabrication files       File->Export menus


Bibliography

[1] Carter Bays. Candidates for the game of life in three dimensions. Complex

Systems, 1:373-400, 1987.

[2] Carter Bays. A new game of three-dimensional life. Complex Systems, 5:15-18,

1991.

[3] Carter Bays. A new candidate rule for the game of three-dimensional life. Com-

plex Systems, 6:433-441, 1992.

[4] Claude Bertin et al. Integrated multichip memory module structure. United

States Patent 5,502,667, March 1996.

[5] A. R. Brown et al. Logic gates made from polymer transistors and their use in

ring oscillators. Science, 270:972-974, 1995.

[6] J. H. Conway, E. R. Berlekamp, and R. K. Guy. Winning Ways for Your Math-

ematical Plays. Academic Press, New York, 1983.

[7] Sawyer Fuller and Joseph Jacobson. Ink jet fabricated nanoparticle MEMS. In

13th Annual IEEE Conf. on MEMS, 2000.

[8] Information Mechanics Group. CAM8: A parallel, uniform, scalable architecture

for cellular automata experimentation. On the World-Wide-Web at

http://www.im.lcs.mit.edu/cam8.

[9] Roger T. Howe and Charles G. Sodini. Microelectronics: An Integrated Approach.

Prentice-Hall, NJ, 1997.


[10] F. T. Leighton and Arnold L. Rosenberg. Three-dimensional circuit layouts.

SIAM Journal on Computing, 15(3):793-813, 1986.

[11] Charles E. Leiserson. VLSI theory and parallel supercomputing. MIT/LCS/TM

402, Massachusetts Institute of Technology Laboratory for Computer Science,

May 1989.

[12] Magic - a VLSI layout system. On the World-Wide-Web at

http://research.compaq.com/wrl/projects/magic/index.html.

[13] C. A. Mead and L. A. Conway. Introduction to VLSI Systems. Addison-Wesley,

Reading, MA, 1980.

[14] J. Ousterhout. Corner stitching: A data structuring technique for VLSI layout

tools. IEEE Trans. Computer-Aided Design, CAD-3(1):87-100, 1984.

[15] J. Ousterhout et al. Magic: A VLSI layout system. In Proceedings of the 21st

IEEE Design Automation Conference, pages 152-159, 1984.

[16] B. A. Ridley et al. Solution-processed inorganic transistors and sub-micron non-

lithographic patterning using nanoparticle inks. In Materials Research Society

Proceedings, Fall 1999.

[17] B. A. Ridley, B. Nivi, and J. M. Jacobson. All-inorganic field-effect transistors

fabricated by printing. Science, 286:746-749, 1999.

[18] Arnold L. Rosenberg. Three-dimensional integrated circuitry, pages 69-80. VLSI

Systems and Computations. Computer Science Press, Rockville, MD, 1981.

[19] Michael Sipser. Introduction to the Theory of Computation. PWS Pub., Boston,

1997.

[20] Françoise F. Soulié, Yves Robert, and Maurice Tchuente, editors. Automata

Networks in Computer Science: Theory and Applications. Princeton University

Press, Princeton, NJ, 1987.


[21] Andrew C. Tickle. Thin-Film Transistors; A New Approach to Microelectronics.

1961.

[22] Tomaso Toffoli and Norman Margolus. Cellular Automata Machines: A New

Environment for Modeling. MIT Press, Cambridge, MA, 1991.

[23] Neil H. E. Weste and Kamran Eshraghian. Principles of CMOS VLSI Design:

A Systems Perspective. Addison-Wesley, Reading, MA, 1993.

[24] Ronald Williams and Ogden Marsh. Future WSI technology: stacked monolithic

WSI. IEEE Transactions on Components, Hybrids, and Manufacturing Technology,

16:610-614, 1993.

[25] P. Zavracky. 3D microelectronics. On the World-Wide-Web at

http://www.ece.neu.edu/edsnu/zavracky/mfl/programs/3d/3dmicro.html.
