IEEE Design & Test of Computers
September–October 2006, Volume 23, Number 5

Electronic System-Level Design
• Component-Based Design
• Platform-Based Taxonomy
• Improving Transition Delay Test

2006 Special ITC Section
Cover design: Alexander Torres



Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:52:08 UTC from IEEE Xplore. Restrictions apply.


Features

335 Guest Editors’ Introduction: The True State of the Art of ESL Design
Sandeep K. Shukla, Carl Pixley, and Gary Smith

338 A Component-Based Design Environment for ESL Design
Patrick Schaumont and Ingrid Verbauwhede

348 Modeling Embedded Systems: From SystemC and Esterel to DFCharts
Ivan Radojevic, Zoran Salcic, and Partha S. Roop

359 A Platform-Based Taxonomy for ESL Design
Douglas Densmore, Roberto Passerone, and Alberto Sangiovanni-Vincentelli

375 The Challenges of Synthesizing Hardware from C-Like Languages
Stephen A. Edwards

ITC Special Section

388 Guest Editor’s Introduction: ITC Helps Get More out of Test
Kenneth M. Butler

390 Extracting Defect Density and Size Distributions from Product ICs
Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, Wojciech Maly, Brady Benware, and Chris Schuermyer

402 Improving Transition Delay Test Using a Hybrid Method
Nisar Ahmed and Mohammad Tehranipoor

414 Impact of Thermal Gradients on Clock Skew and Testing
Sebastià A. Bota, Josep L. Rosselló, Carol de Benito, Ali Keshavarzi, and Jaume Segura

September–October 2006, Volume 23, Number 5

http://www.computer.org/dt

Copublished by the IEEE Computer Society and the IEEE Circuits and Systems Society

ISSN 0740-7475


Cover design by Alexander Torres

Departments

333 From the EIC
387 Counterpoint
425 TTTC Newsletter
426 Book Reviews
428 Standards
430 CEDA Currents
432 The Last Byte


Staff Editor: Rita Scanlan
IEEE Computer Society, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720-1314
Phone: +1 714 821 8380
Fax: +1 714 821 [email protected]

Group Managing Editor: Janet [email protected]

Assoc. Staff Editor: Ed Zintel

Magazine [email protected]

Contributing Editors: Thomas Centrella, Noel Deeley, Tim Goldman, Louise O’Donald, and Joan Taylor

Art Direction: Joseph Daigle
Cover Design: Alexander Torres

Publisher: Angela Burgess, [email protected]
Associate Publisher: Dick Price
Membership/Circulation Marketing Manager: Georgann Carter
Business Development Manager: Sandy Brown
Advertising Coordinator: Marian Anderson

Editor in Chief: Kwang-Ting (Tim) Cheng, Univ. of California, Santa Barbara; [email protected]
Editor in Chief Emeritus: Rajesh K. Gupta, Univ. of California, San Diego; [email protected]
Associate EIC: Magdy Abadir, Freescale Semiconductor; [email protected]

CS Publications Board: Jon Rokne (chair), Michael R. Blaha, Mark Christensen, Frank Ferrante, Roger U. Fujii, Phillip Laplante, Bill N. Schilit, Linda Shafer, Steven L. Tanimoto, and Wenping Wang

CS Magazine Operations Committee: Bill N. Schilit (chair), Jean Bacon, Pradip Bose, Arnold (Jay) Bragg, Doris L. Carver, Kwang-Ting (Tim) Cheng, Norman Chonacky, George V. Cybenko, John C. Dill, Robert E. Filman, David Alan Grier, Warren Harrison, James Hendler, Sethuraman (Panch) Panchanathan, and Roy Want

Submission Information: Submit a Word, PDF, text, or PostScript version of your submission to Manuscript Central, http://cs-ieee.manuscriptcentral.com.

Editorial: Unless otherwise stated, bylined articles and columns, as well as product and service descriptions, reflect the author’s or firm’s opinions. Inclusion in IEEE Design & Test of Computers does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE Circuits and Systems Society. All submissions are subject to editing for clarity and space considerations.

Copyright and reprint permissions: Copyright ©2006 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Abstracting is permitted with credit to the source. Libraries are permitted to photocopy, beyond the limits of US Copyright Law, for private use of patrons: (1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; (2) for other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publications Administration, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331.

IEEE Design & Test of Computers (ISSN 0740-7475) is copublished bimonthly by the IEEE Computer Society and the IEEE Circuits and Systems Society. IEEE Headquarters: 345 East 47th St., New York, NY 10017-2394. IEEE Computer Society Publications Office: 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314; phone +1 714 821 8380. IEEE Computer Society Headquarters: 1730 Massachusetts Ave. NW, Washington, DC 20036-1903. IEEE Circuits and Systems Society Executive Office: 445 Hoes Lane, Piscataway, NJ 08854; phone +1 732 465 5853. Annual subscription: $38 for CS members and $68 for other IEEE society members, in addition to IEEE and Computer Society dues; $69 for members of other technical organizations outside the IEEE. Back issues: $20 for members and $96 for nonmembers. The Biomedical Engineering Citation Index on CD-ROM lists IEEE Design & Test of Computers articles.

Postmaster: Send undelivered copies and address changes to IEEE Design & Test of Computers, Circulation Dept., PO Box 3014, Los Alamitos, CA 90720-1314. Periodicals postage paid at New York, NY, and at additional mailing offices. Canadian GST #125634188. Canada Post Corp. (Canadian distribution) Publications Mail Agreement #40013885. Return undeliverable Canadian addresses to 4960-2 Walker Road; Windsor, ON N9A 6J3. Printed in the USA.

TECHNICAL AREAS

Analog and Mixed-Signal Test: Michel Renovell, LIRMM; [email protected]
CAE/CAD: Dwight Hill, Synopsys; [email protected]
Configurable Computing: Fadi Kurdahi, University of California, Irvine; [email protected]
Deep-Submicron IC Design and Analysis: Sani Nassif, IBM; [email protected]
Defect and Fault Tolerance: Michael Nicolaidis, iRoC Technologies; [email protected]
Defect-Based Test: Adit Singh, Auburn University; [email protected]
Design for Manufacturing, Yield, and Yield Analysis: Dimitris Gizopoulos, University of Piraeus; [email protected]
Design Reuse: Grant Martin, Tensilica; [email protected]
Design Verification and Validation: Carl Pixley, Synopsys; [email protected]
Economics of Design and Test: Magdy Abadir, Freescale; [email protected]
Embedded Systems and Software: Sharad Malik, Princeton University; [email protected]
Embedded Test: Cheng-Wen Wu, National Tsing Hua University; [email protected]
Infrastructure IP: André Ivanov, University of British Columbia; [email protected]
Low Power: Anand Raghunathan, NEC USA; [email protected]
Memory Test: Fabrizio Lombardi, Northeastern University; [email protected]
Microelectronic IC Packaging: Bruce Kim, University of Alabama; [email protected]
Nanotechnology Architectures and Design Technology: Seth Goldstein, Carnegie Mellon University; [email protected]
Performance Issues in IC Design: Sachin Sapatnekar, University of Minnesota; [email protected]
SoC Design: Soha Hassoun, Tufts University; [email protected]
System Specification and Modeling: Sandeep Shukla, Virginia Polytechnic and State University; [email protected]
Member at Large: Kaushik Roy, Purdue University; [email protected]

DEPARTMENTS

Book Reviews: Scott Davidson, Sun Microsystems, [email protected]; Grant Martin, Tensilica, [email protected]; and Sachin Sapatnekar, Univ. of Minnesota, [email protected]
Conference Reports and Panel Summaries: Yervant Zorian, Virage Logic; [email protected]
DATC Newsletter: Joe Damore; [email protected]
Interviews: Ken Wagner, Design Implementation and Ottawa Design Centre, PMC Sierra; [email protected]
The Last Byte: Scott Davidson, Sun Microsystems; [email protected]
Perspectives: Alberto Sangiovanni-Vincentelli, University of California, Berkeley, [email protected]; and Yervant Zorian, Virage Logic, [email protected]
The Road Ahead: Andrew Kahng, University of California, San Diego; [email protected]
Roundtables: William H. Joyner Jr., Semiconductor Research Corp.; [email protected]
Standards: Victor Berman, Cadence Design Systems; [email protected]
TTTC Newsletter: Bruce Kim, University of Alabama; [email protected]

D&T ALLIANCE PROGRAM

DTAP chair: Yervant Zorian, Virage Logic; [email protected]
Asia: Hidetoshi Onodera, Kyoto University; [email protected]
CANDE: Richard C. Smith, EDA and Application Process Consulting; [email protected]
DAC: Luciano Lavagno, Politecnico di Torino, [email protected]; and Andrew Kahng, University of California, San Diego
DATC: Joe Damore; [email protected]
DATE: Ahmed Jerraya, TIMA; [email protected]
Europe: Bernard Courtois, TIMA-CMP; [email protected]
Latin America: Ricardo Reis, Universidade Federal do Rio Grande do Sul; [email protected]
TTTC: André Ivanov, University of British Columbia; [email protected]

ADVISORY BOARD

Anthony Ambler, University of Texas at Austin
Ivo Bolsens, Xilinx
William Mann
Tom Williams, Synopsys
Yervant Zorian, Virage Logic


0740-7475/06/$20.00 © 2006 IEEE. Copublished by the IEEE CS and the IEEE CASS. September–October 2006, p. 333.

The new world of ESL design

DESIGNERS ARE HUNGRY for electronic system-level (ESL) methodologies and supporting tools that can raise the abstraction level of design entry and enhance the global analysis and exploration of design trade-offs. A recent report by Gartner Dataquest on worldwide EDA market trends forecasted a strong growth rate for ESL tools over the next five years. However, existing solutions remain inadequate, and a comprehensive ESL design infrastructure brings with it several challenges that design and test professionals must solve. This issue of IEEE Design & Test discusses some of these challenges and their corresponding solutions. Guest editors Sandeep Shukla, Carl Pixley, and Gary Smith have collected a set of interesting articles concerning languages, tools, and methodologies of ESL design. I’d like to thank them for the great job they’ve done in putting together this strong issue.

In addition, we are happy to present a special section highlighting the 2006 International Test Conference (ITC). In the sub-65-nanometer technology era, in which electronic products encounter a far wider variety of failure sources and a higher failure rate than ever, test has gradually expanded its role in the semiconductor industry. Test is no longer limited to defect detection. It has become a critical technology for debugging, yield improvement, and design for reliability as well. This trend inspired this year’s ITC theme, “Getting More out of Test.” Guest editor Ken Butler, 2005 ITC program chair, has selected three articles for this special section that highlight this theme.

We also have some exciting plans for the next few issues of D&T. Special-issue themes will include important industry topics such as process variation and stochastic design and test, biochips, functional validation, and IR drop and power supply noise effects on design and test. We will also present exciting roundtables, such as the one moderated by Andrew Kahng at the 43rd Design Automation Conference (DAC 06), on design and tool challenges for next-generation multimedia, game, and entertainment platforms. In addition, at the 6th International Forum on Application-Specific Multi-Processor SoC (MPSoC 06), Roundtables editor Bill Joyner moderated a roundtable on single-chip multiprocessor architectures, which we will include in a future issue of D&T. You will also see interesting interviews with key technologists, such as Texas Instruments’ Hans Stork, keynote speaker at this year’s DAC.

If you’d like to participate in a future D&T issue, please submit your theme or nontheme manuscript as soon as it is ready. To serve as a guest editor, submit your special-issue proposal for evaluation by the D&T editorial board. See D&T’s Web site (http://computer.org/dt) for guidelines. For additional information or clarification, please feel free to contact me directly.

Kwang-Ting (Tim) Cheng
Editor in Chief
IEEE Design & Test



Guest Editors’ Introduction: The True State of the Art of ESL Design

Sandeep K. Shukla, Virginia Polytechnic and State University
Carl Pixley, Synopsys
Gary Smith, Gartner Dataquest

ESL, OR ELECTRONIC SYSTEM LEVEL, is a new buzzword in the EDA industry. It has found its way into the mainstream EDA vocabulary in the past few years because of increased interest in finding new ways to raise the abstraction level for the design entry point during the electronic-systems design process.

In hardware design over the past three decades, the design entry point has moved upward in the abstraction hierarchy: from hand-drawn schematics to gate-level design, to RTL descriptions. As hardware design complexity has become increasingly unmanageable, finding ways to design hardware ICs at higher abstraction levels and developing tools to automatically create the circuits’ actual layouts have gained more importance in industry and academia. This upward trend in abstraction has enabled engineers to exploit the scientific and engineering advances that have tracked Moore’s law quite closely.

Since the late 1980s, the design entry point’s abstraction level had remained almost stagnant at the structural RTL. Behavioral synthesis had remained mostly elusive, with some domain-specific success areas, such as DSP chips. But by the late 1990s, recognition of the so-called “productivity gap problem” led to various attempts at abstraction enhancement. These attempts gave rise to various languages for system-level design, such as SpecC, SystemC, and variants of these. Tools and methodologies for using these languages for design entry have emerged in the market and in academic circles.

In the meantime, integrated systems such as cell phones, network routers, consumer electronics, and personal electronic devices like PDAs started to dominate the electronics landscape. In contrast to earlier computing devices such as microcontrollers and general-purpose microprocessors (GPPs), these systems had one thing in common: you could distribute their functionality into hardware or software with sufficient fluidity, based on various trade-offs in performance, power, cost, and so on. This development broke the hardware abstraction for software development hitherto used in traditional computing platforms such as GPPs, as illustrated by the Windows and Intel platforms in desktop computing.

Another phenomenon that had occurred for decades in avionics, automotive, and industrial-control systems also gained increased attention among EDA researchers. The design of such systems’ embedded software was typically at a much higher abstraction level, using synchronous languages, Argos-like visual environments, and so on to describe required control trajectories. Control-systems engineers had also used Matlab and similar mathematical and visual enhancements of such tools for decades to design, validate, and even synthesize their software. In the meantime, increasingly more computing devices were mixtures of software and hardware, and there was increased flexibility for deciding the hardware-software partitioning. Consequently, architectural exploration at the functional and architectural level became increasingly critical for finding the right trade-off points in the design. It’s best to perform such explorations before the designers commit to the RTL hardware logic design, or before the embedded-software writers commit to the embedded-software code.

These evolutionary trajectories of electronic-system design led to the introduction of ESL design. According


to the popular “ESL Now!” Web site (http://www.esl-now.com), ESL design concerns the following:

■ “the development of product architectures and specifications, including the incorporation and configuration of IP,”
■ “the mapping of applications to a product specification, including hardware/software partitioning and processor optimization,”
■ “the creation of pre-silicon, virtual hardware platforms for software development,”
■ “the determination/automation of a hardware implementation for that architecture,” and
■ “the development of reference models for verifying the hardware.”

In this special issue, we explore recent developments in ESL languages, tools, techniques, and methodologies related to improving productivity or enhancing design quality. We wanted this issue to answer the following key questions regarding ESL design:

■ What is ESL design, and what are the current languages that support ESL features?
■ What tool chains and design flows are appropriate for ESL-based design and validation?
■ What new validation techniques and methodologies are available if ESL abstractions are used in a design flow? Are there any test technology benefits?
■ Are there major industrial projects today that have been successful due to ESL usage?
■ What are the market indicators and forces that might make or break ESL design?

Although the articles in this special issue don’t necessarily answer all these questions, they address some key issues and are quite thought-provoking. In the first article, Patrick Schaumont and Ingrid Verbauwhede focus on two properties they see as key to ESL design: abstraction and reuse. They present an ESL design flow using the Gezel language, and they show with several very different design examples how Gezel supports their case for reuse and abstraction.

The second article, by Ivan Radojevic, Zoran Salcic, and Partha Roop, considers the need for directly expressing heterogeneous, hierarchical behaviors for modeling specific embedded systems. The authors examined two existing ESL languages: SystemC and Esterel. Their analysis led them to create a new computation model as well as a graphical language to gain the direct expressivity they need for their model. Although there have been various attempts at changing SystemC and Esterel to fit modeling requirements, these authors mainly consider standard SystemC and Esterel here.

In the next article, Douglas Densmore, Roberto Passerone, and Alberto Sangiovanni-Vincentelli attempt to stem the seemingly ever-increasing tide of confusion that permeates the ESL world. Not only are software developers and hardware designers having difficulty finding a common language (verbally, as well as design-wise), but communication failures are common within those communities as well. Traditionally, there are three rules of design: First, there is a methodology; then there is a design flow; and last, there are the tools necessary to fill that flow. But, as this article points out, we seem to have approached ESL backward. We have built tools, but we have no flow. And, it goes without saying, we have no methodology. No wonder, then, that the predictions of ESL taking off in the next four years seem to be overly optimistic. Still, the customer demand is there. But these customers have had to fill the need with internally developed ESL tools. The University of California, Berkeley, has long been the champion of platform-based design, and these authors base their taxonomy on a combination of UC Berkeley’s platform work and Dan Gajski’s Y-chart work (at UC Irvine). Hopefully, this taxonomy will help stem the tide of confusion and enable the design community to turn around its ESL efforts.

Finally, the article by Stephen Edwards presents one side of an ongoing debate on the appropriateness of C-like languages as hardware description languages. In the ESL landscape, it is often assumed that a high-level programming language can substitute for a higher-abstraction-level hardware description language. This article attempts to deconstruct such a myth about the C programming language by extensively documenting the shortcomings of such an approach and by identifying the features that an ESL language should have. A brief alternative opinion by John Sanguinetti immediately follows this article.

ESL DESIGN, METHODOLOGIES, LANGUAGES, AND TOOLS are still not clearly identified and taxonomized, and the articles in this special issue attempt to reduce some of the confusion regarding the term ESL. However, we believe that we are still in the early stages of ESL-based design. Many more discussions, expository articles, and debates must take place before it can find its permanent design entry point in industry.


The articles in this special issue could not cover everything. Although many of the synthesis technologies mentioned address algorithm design (for instance, for DSP), technologies to synthesize high-level control logic are necessary for ESL design to address the breadth of circuits designed by hardware engineers. In the recent past, researchers insufficiently addressed behavioral synthesis, but this segment is now showing increased activity. Bluespec (http://www.bluespec.com), for example, offers new technology to raise the abstraction level for complex control logic and to synthesize RTL design from these descriptions. Other behavioral-synthesis solutions are coming to the market as transaction-level models.

We hope you find the articles in this special issue interesting. We encourage you to send us critiques, comments, or questions about this special issue. Letters to the editor for publication in future issues are also encouraged. Finally, we thank the authors, the reviewers, and the editorial staff at IEEE Design & Test for their help in making this issue possible. ■

Sandeep K. Shukla is an assistant professor of computer engineering at Virginia Tech. He is also founder and deputy director of the Center for Embedded Systems for Critical Applications (CESCA), and he directs the Fermat (Formal Engineering Research with Models, Abstractions, and Transformations) research lab. His research interests include design automation for embedded-systems design, especially system-level design languages, formal methods, formal specification languages, probabilistic modeling and model checking, dynamic power management, application of stochastic models and model analysis tools for defect-tolerant system design, and reliability measurement of defect-tolerant systems. Shukla has a PhD in computer science from the State University of New York (SUNY) at Albany. He has been elected as a College of Engineering Faculty Fellow at Virginia Tech, and he is on the editorial board of IEEE Design & Test.

Carl Pixley is group director at Synopsys. His pioneering achievements include model checking based on binary decision diagrams (BDDs), Boolean equivalence, alignability equivalence, constraint-based verification, and C-to-RTL verification. Pixley has a PhD in mathematics from SUNY at Binghamton. He is a member of the IEEE and the Mathematical Association of America, and is verification editor for IEEE Design & Test.

Gary Smith is a chief analyst at Gartner Dataquest, where he is part of the Design & Engineering Group and serves in the Electronic Design Automation Worldwide program. His research interests include design methodologies, ASICs, and IC design. Smith has a BS in engineering from the United States Naval Academy in Annapolis, Maryland. He is a member of the Design Technology Working Group for the International Technology Roadmap for Semiconductors (ITRS).

Direct questions or comments about this special issue to Sandeep K. Shukla, Department of Electrical and Computer Engineering, Virginia Polytechnic and State University, Blacksburg, VA 24061, [email protected]; Carl Pixley, Synopsys, 2025 NW Cornelius Pass Rd., Hillsboro, OR 97124, [email protected]; or Gary Smith, Gartner Dataquest, 281 River Oaks Pkwy, San Jose, CA 95134, [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.


The IEEE Computer Society publishes over 150 conference publications a year. For a preview of the latest papers in your field, visit www.computer.org/publications/


RECENTLY, there has been an increasingly greater variety of target architecture options for digital electronics design. Whereas the driving applications for these architectures are often governed by standards and thus tend to be regularized, there is still a lot of design freedom in the target architectures themselves. There is a wide range of programmable-processor architectures,1,2 and with any given application, designers must balance performance, power consumption, time to market, and silicon cost.3 The obvious question is how to choose the most appropriate target architecture for a given application.

In this article, we present Gezel, a component-based, electronic system-level (ESL) design environment for heterogeneous designs. Gezel consists of a simple but extendable hardware description language (HDL) and an extensible simulation-and-refinement kernel. Our approach is to create a system by designing, integrating, and programming a set of programmable components. These components can be processor models or hardware simulation kernels. Using Gezel, designers can clearly distinguish between component design, platform integration, and platform programming, thus separating the roles of component builder, platform builder, and platform user.

Embedded applications have driven the development of this ESL design environment. To demonstrate the broad scope of our component-based approach, we discuss three applications that use our environment; all are from the field of embedded security.
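The separation of roles described above (component builder, platform builder, platform user) maps naturally onto a cycle-based co-simulation kernel. The following Python sketch is purely illustrative (it is not Gezel syntax, and every class and name in it is invented for this example), but it shows the kind of structure such an environment enforces: components define per-clock-cycle behavior, and a platform composes and clocks them.

```python
# Illustrative sketch (not Gezel itself): a minimal cycle-based
# co-simulation kernel in which a "platform" composes independently
# authored "components" that synchronize once per clock cycle.

class Component:
    """Role: component builder. A component exposes ports and a
    one-clock-cycle update rule."""
    def __init__(self, name):
        self.name = name
        self.ports = {}

    def tick(self):
        raise NotImplementedError

class Counter(Component):
    """A toy 'hardware' component: an 8-bit free-running counter."""
    def __init__(self, name):
        super().__init__(name)
        self.ports["q"] = 0

    def tick(self):
        self.ports["q"] = (self.ports["q"] + 1) % 256

class Monitor(Component):
    """A toy 'software' component that samples another component's port,
    standing in for an instruction-set-simulator model."""
    def __init__(self, name, source, port):
        super().__init__(name)
        self.source, self.port = source, port
        self.log = []

    def tick(self):
        self.log.append(self.source.ports[self.port])

class Platform:
    """Role: platform builder. Integrates components; the platform user
    then 'programs' the assembled system."""
    def __init__(self, *components):
        self.components = components

    def run(self, cycles):
        for _ in range(cycles):
            for c in self.components:   # one synchronized clock tick
                c.tick()

counter = Counter("c0")
monitor = Monitor("m0", counter, "q")
platform = Platform(counter, monitor)
platform.run(5)
```

The point of the sketch is the division of labor: neither toy component knows about the other's internals, and only the platform knows the composition.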

ESL design has many faces

A common definition for ESL design is the collection of design techniques for selecting and refining an architecture. But ESL design has many aspects and forms. Even within a single application domain, system-level design can show wide variations that are difficult to capture with universal design languages and architectures. Therefore, you can also think of ESL design as the ability to successfully assemble a system out of its constituent parts, regardless of their heterogeneity or nature.

Consider the following three examples. All of them closely relate to design for secure embedded systems, but they also require very different design configurations. Thus, these examples show the need for a more general approach, which we achieve using Gezel.

A Component-Based Design Environment for ESL Design

Editor’s note:
This article focuses on two key properties that the authors see as critical to ESL design: abstraction and reuse. The authors present an ESL design flow using the Gezel language. Using several very different design examples, they show how this design flow supports their case for abstraction and reuse.
—Carl Pixley, Synopsys

Patrick Schaumont, Virginia Tech
Ingrid Verbauwhede, Katholieke Universiteit Leuven

Example 1: Public-key cryptography on 8-bit microcontrollers

Sensor networks and radio-frequency identification tags are examples of the next generation of distributed wireless and portable applications requiring embedded privacy and authentication. Public-key systems are preferable because they allow a more scalable, flexible key distribution compared to secret-key cryptosystems. Unfortunately, public-key systems are computationally intensive and hence consume more power. Recent proposals suggest replacing the RSA (Rivest-Shamir-Adleman) system with more economical solutions such as elliptic-curve cryptosystems (ECCs) or hyperelliptic-curve cryptosystems (HECCs). ECCs and HECCs provide security levels equivalent to RSA but with shorter word lengths (a 1,024-bit RSA key is equivalent to a 160-bit ECC key and an 83-bit HECC key), at the expense of highly complex arithmetic. Figure 1 shows the hierarchy and mapping of such a system. On top is the HECC


point multiplication operation, which consists of a sequence of basic elliptic-curve point operations. Each of these basic elliptic-curve operations consists of a sequence of more elementary operations in the underlying Galois field. For HECC, this field is 83 bits. If the system were an ECC, this field would be 160 bits.

We implemented this

design as an 8051 micro-

controller, extended with

a hardware acceleration

unit. The 8-bit microcon-

troller interfaces are quite

narrow compared to HECC

word lengths. Therefore,

when building a hardware acceleration unit, it is crucial to

consider overall system performance. Because of the hier-

archy in the calculations, there are multiple ways to accel-

erate the HECC operations—in contrast to secret-key

algorithms, which have fewer hierarchy layers and thus

offer fewer implementation choices. As a stand-alone opti-

mized C implementation, an HECC point multiplication

takes 192 seconds to calculate. A small hardware accel-

erator, requiring only 480 extra FPGA lookup unit tables

(LUTs) and 100 bytes of RAM, improves this time by a fac-

tor of 80, to only 2.5 seconds. Figure 1 indicates the result-

ing split between hardware and software, which is not yet

optimal for an 8051.

Hardware acceleration makes HECC public key pos-

sible on small, embedded microcontrollers. But the

optimal implementation depends on the selection of

the mathematical algorithms and the system-level archi-

tecture. Only a platform-based design approach makes

this design space exploration possible and discloses

opportunities for global improvement.
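The bottom layer of this hierarchy is plain Galois-field arithmetic, which is what the accelerator speeds up. As a rough illustration (not the authors' implementation), the following sketch multiplies two elements of GF(2^8) in polynomial basis using the AES reduction polynomial; the HECC design performs the same shift-and-add recurrence over the much wider GF(2^83), which an 8-bit CPU must emulate limb by limb but a hardware datapath computes directly.

```cpp
#include <cstdint>

// Shift-and-add multiplication in GF(2^8), polynomial basis, reduced by
// the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). The HECC design
// runs the same recurrence over GF(2^83); only the width changes.
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) p ^= a;          // conditional add is XOR in GF(2^n)
        bool overflow = a & 0x80;   // product left the field: reduce
        a <<= 1;
        if (overflow) a ^= 0x1B;    // subtract the field polynomial
        b >>= 1;
    }
    return p;
}
```

For instance, gf_mul(0x57, 0x83) yields 0xC1, the worked multiplication example in the AES standard (FIPS-197).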

Example 2: Concurrent codesign for secure partitioning

The design of secure embedded systems leads to design cases requiring tight interaction between hardware and software, even down to the single-statement level. Figure 2 shows a fingerprint authentication design, the ThumbPod-2 system, which is resistant to side-channel attacks; we implemented and tested this design in silicon.4 The protocol, shown in Figure 2a, accepts an input fingerprint and compares it to a prestored, secret template. The matching algorithm must treat this template as a secret, and the ThumbPod-2 system stores it in a secure circuit style that is resistant to side-channel attacks. However, because the matching algorithm manipulates the template, part of the algorithm's circuit must also migrate to a secure circuit style. Because this secure circuit style consumes twice the area of normal circuits, mapping the complete matching protocol to it would be inefficient. We therefore separated the protocol into an insecure software partition and a secure hardware partition, and we ended up with the implementation in Figure 2b. The software reads the input fingerprint and feeds the data to the oracle inside the secure partition. The oracle compares each input minutia with the template minutia, returning only a global matching result: reject or accept. It is impossible for an attacker with full access to the untrusted software to determine how the oracle has obtained this decision.

The design and verification of the secure protocol requires continuous covalidation between hardware and software. We evaluated various attack scenarios that attempt to extract the secret template from the secure hardware partition, assuming that the attacker can arbitrarily choose the software program at the insecure partition. This led to an iterative refinement of the oracle interface and the driving software, which we designed completely within the Gezel environment.
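The oracle-style partitioning described above can be sketched as follows. This is a hypothetical illustration, not the ThumbPod-2 matching algorithm: the class names and the threshold rule are invented for the example. The essential property is that the template stays private to the secure partition, and only the global accept/reject decision crosses the interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of the oracle interface: untrusted software streams
// candidate data in and observes only the final decision; per-minutia
// comparison results never leave the secure partition.
class MatchOracle {
public:
    explicit MatchOracle(const std::array<int, 4>& tmpl) : tmpl_(tmpl) {}

    // Returns only a global accept/reject, never intermediate scores.
    bool match(const std::array<int, 4>& candidate) const {
        int hits = 0;
        for (std::size_t i = 0; i < tmpl_.size(); ++i)
            if (std::abs(candidate[i] - tmpl_[i]) <= 2)  // tolerance: assumption
                ++hits;
        return hits == static_cast<int>(tmpl_.size());   // global decision only
    }

private:
    std::array<int, 4> tmpl_;  // secret template, private to the secure partition
};
```

In the real design this boundary is a hardware interface in a side-channel-resistant circuit style, not a C++ access specifier; the sketch only shows the information flow the protocol enforces.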

Figure 1. Public-key cryptography on an 8-bit microcontroller.

Example 3: Accelerated embedded virtual machines

For a third application, shown in Figure 3, we had to provide hardware acceleration of a cryptographic library for an embedded virtual machine.5 We used a Java embedded virtual machine, the Kilobyte Virtual Machine (KVM), extended with native methods that allow hardware access directly from a Java application. We integrated an advanced encryption standard (AES) coprocessor into the Java virtual machine's host processor, and we triggered execution of the coprocessor using a native method. The virtual machine handles all data management and synchronization. As Figure 3b shows, hardware acceleration can improve performance by two orders of magnitude. Moreover, data movement from Java, to and from the coprocessor, has two orders of magnitude of overhead compared to actual hardware execution. A combined optimization of the Java-native API, the coprocessor, and the coprocessor interface is necessary to avoid design errors and, more importantly, security holes in the final system.
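The data-movement pattern behind that integration overhead can be sketched as follows. The register map, the stand-in "cipher," and all names here are hypothetical; the point is only that a native method must copy every block across the hardware/software interface before and after the (fast) hardware execution, which is why the copy path dominates the cycle count.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical memory-mapped coprocessor front end, as a native method
// (in the style of a KNI call) might drive it. The XOR "cipher" is a
// stand-in for AES; only the copy-in / trigger / copy-out pattern matters.
struct FakeAesCoprocessor {
    uint32_t data_in[4];   // 128-bit block, written one 32-bit word at a time
    uint32_t data_out[4];
    void start() {         // stand-in for the actual AES computation
        for (int i = 0; i < 4; ++i) data_out[i] = data_in[i] ^ 0xDEADBEEF;
    }
};

// Bridge routine: the marshalling a native method performs on behalf of
// the Java application.
std::vector<uint32_t> encrypt_block(FakeAesCoprocessor& hw,
                                    const std::vector<uint32_t>& block) {
    for (int i = 0; i < 4; ++i) hw.data_in[i] = block[i];  // software -> hardware
    hw.start();                                            // hardware executes
    return {hw.data_out[0], hw.data_out[1],
            hw.data_out[2], hw.data_out[3]};               // hardware -> software
}
```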

All three examples are practical design problems from the field of embedded security. There is no unified design platform or unified design language that could solve all of them. However, it's still possible to generalize their underlying design principles by using a component-based approach.

Component-based ESL design

Each programmable architecture comes with a specific set of design techniques. ESL design, therefore, is no tightly knit set of techniques, tools, and data models. Unlike RTL design, which logic synthesis enabled, ESL design doesn't offer a standard design flow. In fact, ESL design might never be unified in a single design flow, given the architectural scope, the complexities in capturing all facets of an application, and the daunting task of developing tools for these different facets. Still, all ESL technologies share two fundamental objectives: facilitating design reuse and supporting design abstraction. These two objectives have guided every major technology step that has turned transistors into gates, and gates into systems.

Reuse and abstraction for ESL design, however, are unique and different from other technology transitions. In ESL design, reuse relates not only to architectures but also to design environments. For example, when a designer builds an SoC architecture around a microprocessor, the microprocessor's compiler and instruction-set simulator (ISS) are as critical to the design's success as the actual microprocessor implementation.

Figure 2. Partitioning for security in the ThumbPod-2 system: protocol for session key generation (a) and implementation (b). (AMBA: Advanced Microcontroller Bus Architecture; UART: universal asynchronous receiver-transmitter.)

The compiler and the simulator are reused design environments, and the microprocessor is a reused design artifact. As another example, consider SystemC. You can view SystemC as a reusable design environment for RTL hardware components. As a C++ library, it can link to any environment that needs RTL hardware design capability; thus, the SystemC library itself is a reusable component.

Abstraction in ESL design concerns not only the masking of implementation details but also platform programming mechanisms. Finding successful system-level abstractions is extremely difficult, because abstractions tend to restrict the scope of coverable target architectures. For example, C is a successful programming abstraction for a single-core system, but it becomes cumbersome in multicore systems. Despite the multitude of system-level design languages, none has so far been able to unify the architecture design space in a single programming abstraction.

These two elements of ESL design (its reuse of design environments and design artifacts, and the component-specific nature of its programming abstractions) guided us toward a component-based approach to system design. In ESL design, we define a component as a single programmable element included in a platform. For example, a microprocessor, reconfigurable hardware, a software virtual machine, and the SystemC simulation kernel are all programmable components.

Figure 3. Accelerated embedded virtual machine: general structure (a) and performance improvements and associated overhead (b). The figure reports a 109× performance gain and a 160× integration overhead.

Figure 4. Three phases for ESL design automation: component, platform, and platform based.

As Figure 4 shows, a component-based model for ESL design requires a design flow with three phases of design: component, platform, and platform based. These phases correspond to the creation, integration, and use of programmable components. Several different engineers might work in each design phase, each with a different perspective on the application. These engineers generally fall into one of three categories: design automation, hardware design, or software design. Figure 4 offers the perspective of the design automation engineer.

In component design, a design automation engineer develops a design environment for a single programmable component. Engineers can do this independently of the application. Two interfaces, integration and programming, characterize a programmable component. Through the integration interface, a component connects to another (possibly heterogeneous) component. Between these two interfaces sits a simulation-and-refinement kernel. Component design can be very elaborate, including, for instance, the development of an ISS and a C compiler for a new processor.

In platform design, a design engineer or design automation engineer selects various programmable components and combines them into a single platform by interconnecting their integration interfaces. Platform design requires the creation of a platform system scheduler to coordinate the individual components' activities. This phase also requires the creation of communication channels between components. The notion of a platform as an articulation point between application and architecture is a well-known concept.6,7

In platform-based design, a design engineer develops an application by writing application programs for each programmable component in the platform. The platform simulator lets the designer instantiate a particular application and tweak overall system performance. For heterogeneous components, it's important to bring the individual components' programming semantics sufficiently close together so that a designer can easily migrate between them.

Designers have used component-based design approaches, typically in software development, to address problems requiring high architectural flexibility. For example, Cesario et al. present a component-based approach for multiprocessor SoC (MPSoC) design,8 based on four types of components: software tasks, processor cores, IP cores, and interconnects.

Designing and integrating FSMD components with Gezel

The Gezel design environment (http://rijndael.ece.vt.edu/gezel2) supports the modeling and design of hardware components. By integrating the Gezel kernel with other simulators (such as ISSs), we obtain a platform simulator. The three examples we discussed all rely on custom hardware design, each with a different platform. We've combined Gezel with other programmable components, such as 32- and 8-bit cores. We've also combined it with other types of programming environments, including the SystemC simulation kernel and Java. For the parts of the design described in the Gezel language, the Gezel design environment automatically creates VHDL, enabling technology mapping into FPGAs or standard cells.

Platform-based design using Gezel

The Gezel language captures hardware using a cycle-based description paradigm based on the finite-state machine with data path (FSMD) model. Widely used for RTL hardware design, this model has been popularized through SpecCharts and SpecC.9 The FSMD model expresses a single hardware module as a combination of a data path and its controller. You can combine several different FSMDs into a network, as Figure 5a shows. A pure FSMD network is of only limited value for a platform simulator, because such a network supports only communication between FSMDs. Such a network doesn't have the means to communicate with any part of a platform that is not captured as an FSMD.
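In plain C++, an FSMD can be sketched as a controller state plus a set of datapath registers that update once per clock cycle. The sketch below is our illustration, not Gezel syntax: a GCD unit computed by repeated subtraction, in the spirit of the EUCLID module that appears later in Table 2.

```cpp
#include <cstdint>

// A finite-state machine with datapath (FSMD), sketched in C++: the
// controller selects the next state, and the datapath registers update
// once per call to step() (one clock cycle). Computes gcd(a, b) for
// positive inputs by repeated subtraction.
struct GcdFsmd {
    enum State { LOAD, CALC, DONE } state = LOAD;
    uint32_t a = 0, b = 0;   // datapath registers
    uint32_t in_a, in_b;     // input ports

    GcdFsmd(uint32_t x, uint32_t y) : in_a(x), in_b(y) {}

    void step() {                       // advance one clock cycle
        switch (state) {
        case LOAD:                      // register the inputs
            a = in_a; b = in_b; state = CALC; break;
        case CALC:                      // datapath: conditional subtract
            if (a == b)      state = DONE;
            else if (a > b)  a -= b;
            else             b -= a;
            break;
        case DONE: break;               // hold the result
        }
    }
    bool done() const { return state == DONE; }
    uint32_t result() const { return a; }
};
```

Stepping every such object once per cycle is, conceptually, how a simulation kernel advances all FSMDs in a network in lockstep.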

To employ FSMDs as platform components, Gezel supports extended FSMD networks, as Figure 5b shows. Such an extended FSMD network also includes a second type of module called an IP block. An IP block has an interface similar to that of an FSMD, but the IP block is implemented outside the Gezel language. A similar concept for capturing heterogeneity also exists in Ptolemy.10 Technically, an IP block is implemented as a shared library in C++ and thus can include arbitrary programming constructs within the boundaries of a cycle-based interface. To the Gezel programmer, the IP block looks like a simulation primitive. The platform designer defines the IP block's behavior. In a component-based design model, these IP blocks implement communication channels, which connect Gezel to a wide range of other components, such as ISSs, virtual machines, and system simulation engines.

Figure 5. Finite-state machine with data path (FSMD) network: pure (a) and extended (b). In both cases, a wire's input is instantaneously defined by the connected output.
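What such a cycle-based IP block interface could look like in C++ is sketched below; the method names are our assumption, not the actual Gezel kernel API. The kernel evaluates the block once per clock cycle with the current input port values, and behind that boundary the block may use arbitrary C++, here a FIFO that could serve as a communication channel between an FSMD and an ISS.

```cpp
#include <cstdint>
#include <deque>

// Illustrative cycle-based IP block interface (not the actual Gezel API):
// once per clock cycle, the simulation kernel hands the block its input
// port values and reads back its output port values.
struct IpBlock {
    virtual void eval(const uint32_t* in, uint32_t* out) = 0;  // one cycle
    virtual ~IpBlock() = default;
};

// A FIFO channel behind the cycle interface.
// in[0]: write data, in[1]: write strobe, in[2]: read strobe
// out[0]: read data, out[1]: data-valid flag
struct FifoChannel : IpBlock {
    std::deque<uint32_t> q;  // arbitrary C++ state, invisible to the kernel
    void eval(const uint32_t* in, uint32_t* out) override {
        if (in[1]) q.push_back(in[0]);        // producer side (e.g., an FSMD)
        out[1] = q.empty() ? 0 : 1;           // valid flag
        out[0] = q.empty() ? 0 : q.front();   // consumer side (e.g., an ISS)
        if (in[2] && !q.empty()) q.pop_front();
    }
};
```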

Platform design using Gezel

Figure 6 illustrates a platform simulator that uses the Gezel kernel and several ISSs. Each component simulator exists as an individual (C++) library, linked together in a system simulation. For this platform simulator, we use IP blocks to implement the cosimulation interfaces between the Gezel model and the ISS. In addition, a system scheduler calls all the included component simulators. We implement the platform simulator in C++.
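A minimal version of such a cycle-true scheduler can be sketched as follows; the interface names are our assumption, not the actual Gezel scheduler. Each component simulator (the Gezel kernel, each ISS) exposes a per-cycle hook, and the scheduler advances all components in lockstep.

```cpp
#include <memory>
#include <vector>

// Minimal cycle-true scheduler sketch: every component simulator exposes
// a per-cycle hook, and the scheduler advances all of them in lockstep.
struct ComponentSim {
    virtual void tick(unsigned long cycle) = 0;  // simulate one clock cycle
    virtual ~ComponentSim() = default;
};

struct Scheduler {
    std::vector<std::unique_ptr<ComponentSim>> components;
    unsigned long now = 0;

    void run(unsigned long cycles) {
        for (unsigned long c = 0; c < cycles; ++c, ++now)
            for (auto& comp : components)    // same cycle for every component
                comp->tick(now);
    }
};

// Toy component: counts the cycles it has executed.
struct CycleCounter : ComponentSim {
    unsigned long count = 0;
    void tick(unsigned long) override { ++count; }
};
```

A component modeling a slower clock can simply act on every nth tick, which is one way such a lockstep scheduler can accommodate multiple clock frequencies.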

The extended FSMD network in Gezel, combined with the component-based design model, offers essential advantages over a traditional HDL- or SystemC-based approach. VHDL has no means to natively support a simulation setup like the one in Figure 6, because it lacks the equivalent of an IP block construct. Consequently, an HDL-based design flow usually implements such a simulation setup at the HDL level. This needlessly increases simulation detail and penalizes simulation performance.

It's also possible to implement such a simulation setup in SystemC. But the platform and the application are no longer distinguishable, because SystemC captures everything in C++. This complicates the synthesis of the application onto the final platform. In other words, SystemC does not distinguish between the platform and platform-based design phases.

Table 1 lists several platform components that we've used with Gezel to create platform simulators. They include 8- and 32-bit ISSs, Java (through its native interface), and SystemC. We coupled each of these simulators to the Gezel FSMD model using IP blocks. There are two categories of IP blocks, corresponding to two different design scenarios. IP blocks in the first category model a processor's bus or a dedicated communication port, implementing a coprocessor design scenario like the one in Figure 7a. IP blocks in the second category capture a complete component.

Designers can also use the Gezel IP block construct to explore multiprocessor architectures, such as the PicoBlaze microcontrollers shown in Figure 7b. In the multiprocessor design scenario, the Gezel model captures the complete platform, clearly improving flexibility. In addition, this model allows dynamically selecting the number and types of cores. The Gezel language captures synchronous, single-clock hardware designs. The platform simulators in Table 1, however, can accommodate multiple clock frequencies for the individual processors included within the simulation.

Many of the environments in Table 1 are open source, which greatly eases the construction of platform simulators. In commercial environments, open source might still be an unattainable goal, but there are still significant benefits from using an open interface. Several of our cosimulators (including TSIM and SH-ISS) use commercial, closed-source components built on the basis of an open interface.

Figure 6. Gezel platform simulator.

Systematic reuse with a component-based approach

We can also implement IP management with Gezel. IP transfer is notoriously difficult because reuse interfaces are hard to define. Microprocessor buses have traditionally been the reuse interface of choice. New industry efforts such as the Open Core Protocol IP (OCP-IP, http://www.ocpip.org) and the Spirit consortium (http://www.spiritconsortium.com) have focused on generically packaging IP components rather than using standard buses. Spirit's approach is to provide a metadata model that encapsulates existing IP components (expressed in VHDL or SystemC, for example). The metadata provides additional language-neutral information on the IP interface. However, a component-based design flow with Gezel does not need this encapsulation, because the language directly models the reuse interfaces. Indeed, these reuse interfaces correspond to the set of IP blocks that connect the Gezel models to other platform components.

Consider the case in which multiple parties participate in the platform-based design phase. For example, for the simulator of Figure 6, assume that an IP developer creates hardware components in Gezel, and a system integrator creates the system (embedded) software. In such a case, the IP developer expects a reasonable level of IP protection before releasing the actual implementation, whereas the system integrator wants access to the hardware components in as much detail and as soon as possible. Gezel can support this scenario, as Figure 8 shows.

We define two phases in the IP transfer. In IP creation and evaluation, the IP developer provides a cycle-based simulation model of the hardware IP as a black box to the system integrator; this model provides a nonsynthesizable simulation view of the IP. When the system integrator decides to acquire the hardware IP, the second phase of the IP transfer begins. Now the IP developer provides a synthesizable version of the hardware IP in VHDL.

The component-based approach of Gezel is well suited to this IP design flow. We model black boxes as IP blocks. The IP block simulation views are in binary format as shared libraries, and thus of little value for implementation. We wrote two code generators for FSMD networks in Gezel. The first converts FSMDs into equivalent IP block simulation views. The second converts FSMDs into synthesizable VHDL code. The IP developer can use them together to implement the design flow of Figure 8.

Table 1. Platform simulators using Gezel.

  Component      Simulation engine*   Cross-compiler or assembler   IP block interface
  8-bit cores
    Atmel AVR    Avrora               GNU avr-gcc                   port or bus
    PicoBlaze    kpicosim             KCPSM3 assembler              core; port or bus
    8051         Dalton ISS           SDCC, Keil CC                 core; port or bus
  32-bit cores
    ARM          Simit-ARM            GNU arm-linux-gcc             core; port or bus
    Leon2-Sparc  TSIM                 GNU sparc-rtems-gcc           port or bus
    SH3-mobile   SH-ISS               GNU sh-elf-gcc                port or bus
  Simulation engines
    Java         JVM 1.4              javac                         core
    SystemC      SystemC 2.0.1        GNU g++                       core

* Information on simulation engines is available as follows: Avrora: http://compilers.cs.ucla.edu/avrora (open source); kpicosim: http://www.xs4all.nl/~marksix (open source); Dalton ISS (Dalton 8051): http://www.cs.ucr.edu/~dalton/i8051 (open source); Simit-ARM: http://sourceforge.net/projects/simit-arm (open source); TSIM (TSIM 1.2; cross-compiler sparc-rtems-gcc 2.95.2): http://www.gaisler.com; SH-ISS (Renesas SH3DSP simulator and debugger, v3.0; cross-compiler sh-elf-gcc 3.3): http://www.kpitgnutools.com.

Figure 7. Application of different IP block categories: coprocessor (a) and multiprocessor (b) design scenarios.

Table 2 shows several examples of IP modules designed in Gezel. They range from simple components, such as an Internet packet checksum evaluation module (CHKSUM), to complex IP modules, such as an AES module and a high-speed Gaussian-noise generator for bit-error-rate measurements (BOXMUL). For each module, Table 2 lists the line counts of the original Gezel design and the amount of generated code in C++ and VHDL. We also mapped the VHDL code onto an FPGA, and Table 2 gives the area and speed of the results. We expect the numbers shown to be close to those of manually written VHDL. For example, a comparable AES design by Usselman on Xilinx Spartan-3 technology lists a LUT count of 3,497.

Design examples revisited

Now, we briefly discuss how we used our component-based approach to support the three design examples presented earlier.

Public-key cryptography

The platform simulator for the HECC application consisted of two components: the Gezel kernel and the 8051 ISS (http://www.cs.ucr.edu/~dalton/i8051/). Using IP block models, we designed communication links between the 8051 ISS and the coprocessor. We developed the driver software running on the 8051 using the Keil tool suite. The platform simulator maps the HECC mathematical formulas into a combination of C, assembly language, and hardware. After obtaining a suitable partitioning, we converted the hardware coprocessor into VHDL. We then combined this coprocessor with a synthesizable view of the 8051 processor and mapped it onto an FPGA.

Figure 8. IP reuse in the platform-based design phase.

Table 2. IP model complexity (NCLOC: noncommented source lines of code).

  Design    Gezel NCLOC   C++ (IP blocks) NCLOC   VHDL NCLOC   Area (no. of LUTs)*   Speed (ns)**
  CHKSUM    149           1,564                   907          131                   9.19
  EUCLID    69            710                     62           557                   560.00
  JPEG      526           8,091                   719          5,514                 14.62
  AES       292           2,653                   1,807        3,332                 8.29
  BOXMUL    763           6,105                   6,282        4,225                 20.30

* Target platform was Xilinx Virtex4, speed grade 12.
** Speed is the clock period we recorded after place and route.

Security partitioning for an embedded fingerprint authentication design

This platform contains the Leon2 ISS and the Gezel kernel. We constructed it in a process similar to that of constructing the public-key cryptography platform. We developed software using the GNU tool suite. In a later design phase, we used the VHDL code generator to convert the Gezel design into VHDL, eventually leading to a tested and fully functional chip.4 This design, however, requires fitting the hardware coprocessor onto a nonstandard synthesis design flow based on logic for resisting side-channel attacks. So that chip designers could verify their custom synthesis flows, we extended the platform simulator to record trace stimuli for individual hardware modules. We can also provide this capability using the IP block approach. It is important to separate design flow issues, such as the stimuli recording facility, from actual design issues. The design flow in Figure 4 also supports this concept by distinguishing between the platform builder and the platform user. Gezel lets users write new IP blocks in C++ according to a standard template, and more advanced Gezel users can develop them as library plug-ins.

Acceleration of embedded virtual machines

For the third design, we integrated three components: a port of the Java embedded virtual machine, the SH3-DSP ISS, and the Gezel kernel. We developed software in Java, C, and assembly language. In addition, this design required a considerable number of cryptographic support libraries. This kind of design demonstrates the importance of varying the design abstraction level within a single platform. The entire cryptographic application in Java can take millions of clock cycles, while the hardware coprocessor is active for only a fraction of that time. On the one hand, we need increased simulation efficiency (and decreased simulation detail) for much of the design; on the other hand, at a few select places we must observe every bit that toggles in every gate. A component-based design approach can cope with this heterogeneity.

HETEROGENEOUS SYSTEM architectures will continue to dominate in applications that require dedicated, high-performance, and energy-efficient processing. The challenge at the electronic system level will be to design these architectures in increasingly shorter design cycles. New tools will have to quickly create not only derivative platforms but also entirely new platforms. We are exploring novel mechanisms in Gezel to further accelerate platform construction, and we are presently working on such a platform designer for FPGA technology.

We'd also like to stress that ESL design requires not only new tools but also a change in design culture. Designers of heterogeneous architectures will inevitably encounter new design cultures and practices, brought in not only by novel ESL tools but also by their colleague designers. ■

Acknowledgments

We thank the reviewers for their constructive feedback. We also thank the many students who have experimented with Gezel and whose designs we've mentioned in this article. This research has been made possible with the support of STMicroelectronics, Atmel, the National Science Foundation, University of California Microelectronics and Computer Research Opportunities (UC Micro), SRC, and FWO (Fonds voor Wetenschappelijk Onderzoek).

References

1. C. Rowen and S. Leibson, Engineering the Complex SoC: Flexible Design with Configurable Processors, Prentice Hall, 2004.

2. T.J. Todman et al., "Reconfigurable Computing: Architectures and Design Methods," Proc. IEE, vol. 152, no. 2, Mar. 2005, pp. 193-207.

3. D. Talla et al., "Anatomy of a Portable Digital Mediaprocessor," IEEE Micro, vol. 24, no. 2, Mar.-Apr. 2004, pp. 32-39.

4. K. Tiri et al., "A Side-Channel Leakage Free Coprocessor IC in 0.18um CMOS for Embedded AES-Based Cryptographic and Biometric Processing," Proc. 42nd Design Automation Conf. (DAC 05), ACM Press, 2005, pp. 222-227.

5. Y. Matsuoka et al., "Java Cryptography on KVM and Its Performance and Security Optimization Using HW/SW Co-design Techniques," Proc. Int'l Conf. Compilers, Architecture, and Synthesis for Embedded Systems (CASES 04), ACM Press, 2004, pp. 303-311.

6. T. Claassen, "System on a Chip: Changing IC Design Today and in the Future," IEEE Micro, vol. 21, no. 3, May-June 2003, pp. 20-26.

7. A. Sangiovanni-Vincentelli, "Defining Platform-Based Design," EE Times, Feb. 2002, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=16504380.

8. W.O. Cesario et al., "Multiprocessor SoC Platforms: A Component-Based Design Approach," IEEE Design & Test, vol. 19, no. 6, Nov.-Dec. 2002, pp. 52-63.

9. D. Gajski et al., SpecC: Specification Language and Methodology, Kluwer Academic Publishers, 2000.

10. E. Lee, "Overview of the Ptolemy Project," tech. memo UCB/ERL M03/25, Dept. of Electrical Eng. and Computer Science, Univ. of California, Berkeley, 2003.


Patrick Schaumont is an assistantprofessor in the Electrical and Com-puter Engineering Department at Vir-ginia Tech. His research interestsinclude design methods and architec-

tures for embedded systems, with an emphasis ondemonstrating new methodologies in practical appli-cations. Schaumont has an MS in computer sciencefrom Ghent University, Belgium, and a PhD in electri-cal engineering from the University of California, LosAngeles. He is a senior member of the IEEE.

Ingrid Verbauwhede is an associ-ate professor at the University of Cali-fornia, Los Angeles, and an associateprofessor at Katholieke UniversiteitLeuven, in Belgium. Her research

interests include circuits, processor architectures, anddesign methodologies for real-time, embedded sys-tems in application domains such as security, cryp-tography, DSP, and wireless. Verbauwhede has anelectrical engineering degree and a PhD in appliedsciences, both from Katholieke Universiteit Leuven.She is a senior member of the IEEE.

Direct questions or comments about this article to Patrick Schaumont, 302 Whittemore Hall (0111), Virginia Tech, VA 24061; [email protected].



Electronic System-Level Design

0740-7475/06/$20.00 © 2006 IEEE. Copublished by the IEEE CS and the IEEE CASS. IEEE Design & Test of Computers

Modeling Embedded Systems: From SystemC and Esterel to DFCharts

Ivan Radojevic, Zoran Salcic, and Partha S. Roop
University of Auckland

Editor's note:
This article addresses the need for directly expressing heterogeneous, hierarchical behaviors for modeling specific embedded systems. After analyzing two existing ESL languages, SystemC and Esterel, the authors created a new model of computation and a graphical language to gain the direct expressivity they need for their model. Although researchers have suggested various changes to SystemC and Esterel to fit modeling requirements, this article considers mainly standard SystemC and Esterel.

—Sandeep K. Shukla, Virginia Polytechnic and State University

THE DESIGN PRODUCTIVITY of engineers has not kept pace with rapid improvements in silicon technology. This has resulted in what is commonly known as the productivity gap. To close this gap, researchers have introduced various system-level design languages (SLDLs) to raise the design abstraction level by focusing on a system's behavior rather than low-level implementation details. A major challenge that SLDLs face stems from the behavioral heterogeneity of most embedded systems. For example, one part of an embedded system might perform intensive computations on samples that regularly arrive from an analog-to-digital converter. Another part of the same system might perform only minor computations while being ready to quickly respond to events that arrive asynchronously from the environment.

An embedded system's behavior usually involves a set of concurrent, communicating processes. A model of computation (MoC) defines the rules for communication and synchronization between processes. Different MoCs are suitable for different behaviors. For example, hierarchical concurrent finite-state machines (HCFSMs), which the statecharts family uses,1 are suitable for describing control-dominated behavior, whereas dataflow models are good for data-dominated behavior. SLDLs must support multiple MoCs to successfully cope with embedded systems' behavioral heterogeneity.

Using a case study of a practical, heterogeneous embedded system called frequency relay, we evaluate the modeling capabilities of two popular system-level languages, SystemC and Esterel.2,3 Based on this case study, we establish an expanded set of system-level language requirements, against which we evaluate the strengths and weaknesses of these two languages. Because of these languages' limitations, we suggest a new MoC for heterogeneous systems called DFCharts, which SystemC and Esterel should follow to support better modeling of heterogeneous embedded systems. (The "Related work" sidebar discusses other efforts to compare languages for embedded-systems design.)

DFCharts targets heterogeneous embedded systems by combining a data-dominated MoC called synchronous dataflow (SDF) with a control-dominated MoC called Argos (which, like statecharts, is based on HCFSMs).4,5 In terms of the MoCs that are combined, DFCharts is similar to *charts,6 which also uses HCFSMs and SDF. However, *charts allows only hierarchical refinement of one model by another. At each hierarchical level, blocks must obey the semantics of a single MoC, but internally a designer can refine each block into a system that behaves according to some other model. The major problem with this approach has to do with the communication between hierarchical levels, which can lead to the loss of some of a given MoC's original characteristics. Unlike *charts, DFCharts lets SDF graphs and FSMs coexist at the same hierarchical level, and a rendezvous mechanism of communicating sequential processes (CSPs) enables communication between them.7 In this way, each model retains its characteristics, and there is more flexibility in modeling.

Initial system-level requirements for SystemC and Esterel

Whereas SystemC is proposed by an industry consortium and has no formal semantics, Esterel has formal semantics and formal verification capabilities. Hence, the two languages represent differing perspectives on system-level modeling. Some of the key modeling requirements at the system level are as follows:

■ Separation of communication and computation. This makes the model suitable for reuse in an environment involving several independently developed, concurrent components.
■ Concurrency and communication primitives at a high abstraction level. The purpose of system-level design is to create a model involving several components, each having its own MoC. Therefore, the modeling language must combine several MoCs and facilitate communication among them.
■ Functional hierarchy. The modeling language might need to express a particular functionality hierarchically to enable succinct specification. Hierarchy should allow mixing different MoCs that exist at different hierarchical levels. This requirement is also called hierarchical heterogeneity.8
■ Exception handling. Because exceptions are critical to embedded systems, the language must provide direct support to capture and handle exceptions.

In light of the frequency relay case study, we will expand these requirements.

Case study: Frequency relay

Power systems need protection from overloading. When a power system is overloaded, it's necessary to disconnect some loads to prevent damage. A significant decrease in the main AC signal's frequency level (the normal value is 50 Hz) indicates a dangerously overloaded system. The same problem also occurs when the AC signal's rate of change (ROC) is too fast. The frequency relay is a system that measures the frequency and its ROC in a power network, comparing measurement results against a set of thresholds that a control system can modify via the Internet. If the current thresholds indicate that the frequency is too low or that its ROC is too fast, the frequency relay disconnects some loads from the network by opening one or more switches (three in the case we present here), as determined by a decision algorithm. The system gradually reconnects loads if the frequency and its ROC improve.

Related work

There have been a few other attempts to describe and compare languages for embedded-systems design. Edwards reviews hardware description languages, programming languages, and system-level languages.1 Cai et al. compare specification languages SpecC and SystemC.2 Gorla et al. compare several languages for system specification.3 They also use the case study of a practical heterogeneous embedded system to illustrate relevant concepts. Brisolara et al. use the same case study to compare two variants of the Unified Modeling Language with Simulink.4

The key difference between our work and these is that we closely concentrate on the link between the specification languages and the models of computation (MoCs) suitable for heterogeneous systems. Moreover, we introduce a new MoC, DFCharts, to model heterogeneous systems.

Other models that target heterogeneous embedded systems include Reactive Process Networks,5 FunState,6 Composite Signal Flow,7 and Mode Automata.8

References
1. S. Edwards, Languages for Digital Embedded Systems, Kluwer Academic Publishers, 2000.
2. L. Cai, S. Verma, and D.D. Gajski, Comparison of SpecC and SystemC Languages for System Design, tech. report CECS-03-11, Center for Embedded Computer Systems, Univ. of California, Irvine, 2003.
3. G. Gorla et al., "System Specification Experiments on a Common Benchmark," IEEE Design & Test, vol. 17, no. 3, July-Sept. 2000, pp. 22-32.
4. L. Brisolara et al., "Comparing High-Level Modeling Approaches for Embedded System Design," Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC 05), ACM Press, 2005, pp. 986-989.
5. M. Geilen and T. Basten, "Reactive Process Networks," Proc. 4th ACM Int'l Conf. Embedded Software (EMSOFT 04), ACM Press, 2004, pp. 137-146.
6. K. Strehl et al., "FunState—An Internal Design Representation for Codesign," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 9, no. 4, Aug. 2001, pp. 524-544.
7. A. Jantsch and P. Bjureus, "Composite Signal Flow: A Computational Model Combining Events, Sampled Streams, and Vectors," Proc. Design, Automation and Test in Europe Conf. (DATE 00), IEEE CS Press, 2000, pp. 154-160.
8. F. Maraninchi and Y. Remond, "Mode-Automata: A New Domain-Specific Construct for the Development of Safe Critical Systems," Science of Computer Programming, vol. 46, no. 3, Mar. 2003, pp. 219-254.

Figure 1 illustrates the main operation that we just described, mode1. Data-dominated processes perform a DSP operation similar to autocorrelation; this operation is necessary for frequency calculation. Control-dominated processes perform various decision-making and minor computations. The parameter-settings process monitors the interface with the Internet. The frequency calculation and ROC calculation processes determine the frequency and its ROC.

Figure 2 shows the switch-control process, representing it as an FSM with four states. The initial state is S3. Each state determines how many switches are closed. For example, three switches are closed in S3, whereas all three switches are open in S0. The state transitions come from inputs t1, t2, and t3, which indicate whether certain thresholds have been exceeded. The input from timer to (time-out) is also a factor. The switch-control block can restart the timer by emitting output st.

Figure 3 shows the frequency relay's global states. The initial state, initialize, configures some system parameters. After this initialization, init_done, the next state is mode1, in which the main operation occurs (as described by the processes in Figure 1). If reset occurs, the system reinitializes. When off occurs, mode1 terminates and mode2 begins. Nothing happens in this state; the system simply stops, and all switches close. If on occurs, the system enters mode1 again. The FSM in Figure 3 represents the frequency relay's top level. (The processes in Figure 1 are one level below this.)

The arrows between the processes in Figure 1 denote directions of communication, but so far we have not discussed the communication semantics. Before writing the specification in SystemC and Esterel, we need to state the required communication mechanisms. Furthermore, we need to state how the computations inside the processes will occur. By identifying the required models for computation and communication, we can make a complete list of requirements against which we will evaluate SystemC and Esterel.

Figure 1. Main operation of frequency relay mode1. (Clear boxes indicate data-dominated processes. Shaded boxes indicate control-dominated processes.)

Figure 2. Switch-control process.

Figure 3. Global states of frequency relay.

The three data-dominated blocks perform intensive computations on samples that regularly arrive from the power system. Lee and Messerschmitt have successfully applied SDF for this type of behavior.4 In SDF, processes communicate through asynchronous FIFO buffers. Each process can fire when its firing rule is satisfied; the firing rule determines how many tokens must be present in the input buffers. Imperative statements (which programmers can write in C, for instance) describe the algorithms inside the processes. FSMs, such as the one for the switch-control process in Figure 2, can effectively capture the control-dominated processes' behavior. FSMs can also be hierarchical. The most convenient communication model among concurrent FSMs appears to be synchronous reactive (SR).9 (In fact, most variants of statecharts use SR.1) Thus, we need hierarchical, concurrent FSMs with SR communication. In addition, we need imperative statements for minor computations performed on state transitions. Finally, we can use CSP-like rendezvous for communication between the peak detection and frequency calculation processes. A high-level communication mechanism guarantees lossless transmission of data without buffers.

The models listed thus far (HCFSM, SR, SDF, CSP, and imperative statements) cover the majority of the models that Edwards et al. discuss.9 An important model that we haven't discussed is discrete event. Although highly expressive, discrete-event models are very difficult to synthesize.9

Suitability of SystemC and Esterel for modeling heterogeneous embedded systems

Based on the frequency relay case study, we expand the system-level language requirements given earlier into the following six requirements:

■ concurrent processes, an essential requirement and a precondition for all other points that follow;
■ rendezvous communication;
■ support for dataflow, including buffered communication between processes and specification of firing rules for dataflow modules;
■ support for HCFSM models with synchronous communication;
■ imperative statements to describe data transformations inside SDF actors, as well as smaller computations performed by FSMs; and
■ hierarchy and preemption, multiple processes inside a hierarchical state, and instant termination of lower-level processes when any transition leaves the hierarchical state.

The first five requirements relate to the first two requirements given earlier (separation of communication and computation, and concurrency and communication primitives at a high abstraction level). The last requirement, hierarchy and preemption, relates to the last two requirements (functional hierarchy and exception handling) from the earlier list.

Evaluation of SystemC and Esterel based on these requirements

Now, we evaluate the level of support of SystemC and Esterel for each of the six expanded system-level requirements. Table 1 summarizes the results of this evaluation.

Concurrent processes. SystemC relies on implicitly assumed concurrency; processes defined in a single module are concurrent. When multiple modules connect at any hierarchical level, they always execute concurrently. In fact, specifying the execution order of modules, as in sequential or pipelined execution (available in some other languages), is not possible in SystemC. The designer would have to use control signals to manipulate the execution order of modules.

Esterel lets programmers explicitly create concurrency using the parallel operator || at any hierarchical level. The || operator creates concurrent threads that communicate and synchronize using synchronous broadcast. This approach is based on the SR MoC, which assumes there is a global clock. Esterel generates inputs and corresponding outputs in the same tick of the global clock, leading to the logical zero-delay model. Also, Esterel broadcasts events generated in any thread to all other threads. Clever programming would be necessary for any other form of concurrency, however.

Rendezvous communication. SystemC has no higher-level construct to implement rendezvous directly.

Table 1. Level of support provided by SystemC and Esterel, on a scale of 0 to 3, with 3 being the highest level of support.

Requirement                  SystemC   Esterel
Concurrent processes            3         3
Rendezvous communication        2         2
Support for dataflow            2         0
Support for HCFSMs              2         3
Data transformations            3         3
Hierarchy and preemption        0         3


However, creating rendezvous between two processes using wait and notify statements should not be difficult.

Esterel does not allow direct specification of rendezvous. Instead, programmers must create rendezvous using a combination of appropriately employed await and emit statements.

Support for dataflow. In SystemC, the primitive channel sc_fifo can implement FIFO buffers. Because of constant data rates, it's best to implement data-dominated blocks as method processes; there's no need to use less efficient thread processes. However, only thread processes, which are dynamically scheduled by the SystemC kernel, can use sc_fifo buffers. Hence, implementing static scheduling with the firing rules of the SDF model is difficult.

Esterel allows the implementation of a FIFO buffer as a separate process (C function), thus separating computation and communication. However, the FIFO process would still synchronize with the tick signal. Thus, the abstraction level would be lower than in asynchronous SDF buffers. In the frequency relay, the SDF blocks performing signal processing must be reactive, like all other processes in the system. The event to which they react is a sample from the analog-to-digital converter. The problem is that all processes must align with a single tick signal; that is, they must read inputs and produce outputs at the same time instant. The most efficient solution for the SDF processes is to have the tick signal coincide with the AC input signal's sampling frequency. The ticks must be frequent enough to capture all system inputs. Thus, the process with the fastest system inputs determines the tick signal rate. The result is an implementation that is likely to be inefficient, because the data-dominated blocks work faster than they would otherwise need to work. A more efficient implementation would specify data-dominated blocks as asynchronous tasks, taking more than one tick to complete computations. However, using asynchronous tasks leads to integration problems.

Support for HCFSMs. SystemC lets you describe FSMs using switch-case constructs, which can be nested for hierarchical FSMs. This involves using multiple state variables. Signal sensitivities and the wait statement support reactivity. However, SystemC cannot match powerful preemption statements such as abort and trap in the SR-based Esterel language.

Esterel, of course, fully supports SR communication. Statements such as abort and trap can naturally describe preemption. Although Esterel's imperative statements can easily describe an FSM, using a visual syntax is probably more convenient in most cases. This is where SyncCharts (http://www.esterel-technologies.com) complements Esterel.

Data transformations. SystemC, as an imperative language, provides excellent support for describing sequential algorithms. In Esterel, C is available as a host language; hence, Esterel can specify complex algorithms for data transformations inside transformational blocks similar to the way SystemC does. However, Esterel requires you to assume that computation of time-consuming algorithms is instantaneous.

Hierarchy and preemption. In SystemC, there is no direct way to implement exceptions modeled by exits from higher-level hierarchical states. We indicated earlier that hierarchy in an FSM could be modeled by using nested switch-case statements; however, this type of modeling is not applicable here, because it's not possible to instantiate processes inside a case branch. Because preempting processes is not possible, one or more control signals must control each process. Consequently, the global-state FSM in Figure 3 must be at the same hierarchical level as the processes in Figure 1 (see Figure 4).


Figure 4. Modified frequency relay model for SystemC implementation.


Esterel supports behavioral hierarchy and has several statements that enable preemption. For example, concurrent blocks can run inside the body of an abort statement.

Additional analysis of SystemC and Esterel

Tables 2 and 3 give the SystemC and Esterel specifications of the frequency relay. The Esterel specification is a mixture of Esterel files, with an .strl extension, and C files, with a .c extension. Neither specification completely follows the model in Figure 1.

The total code size for the SystemC specification, excluding the testbench file, was 1,102 lines. The total code size for the Esterel specification was 901 lines. This difference is not significant, considering that the SystemC specification has more files and thus more declarations. Each SystemC file contains one process. The first three files in Table 2 contain thread processes; all others contain method processes.

Although the time required to prepare a simulation is important, a more critical factor is the actual simulation time. The Esterel simulation took close to 4 hours, whereas the SystemC simulation took only 5 minutes. We performed both simulations on the same platform. For SystemC, we used Microsoft Visual C++ version 6 with SystemC class library 2.0.1. For Esterel, we used Esterel Studio version 4, which supports Esterel version 5. The latest, recently released version of Esterel (version 7) allows multiclock designs that are globally asynchronous, locally synchronous (GALS).

Several factors might account for the huge difference in actual simulation times, but the most interesting one concerns modeling in Esterel. The entire system must run on one clock because Esterel doesn't support multiple clocks. The process with the fastest-changing inputs, the parameter-settings block, determines the system speed. This speed is unnecessarily high for data-dominated parts, which need to read inputs only when a sample arrives. Consequently, there are many ticks with absent inputs in this part of the system.

Although simulation is the most widely used validation method, it is not the only one. The other method is formal verification, which Esterel specifications (unlike SystemC) may employ. However, formal verification is not particularly helpful for the frequency relay, because any useful properties that could be verified would relate to data-dependent internal activities rather than inputs and outputs. It would be difficult to define such properties using Esterel observers, which check properties only in the control part.

DFCharts

Because of the limitations of SystemC and Esterel, we introduced DFCharts as a model they should support to capture heterogeneous embedded systems. (We explain the detailed semantics of DFCharts elsewhere.10) DFCharts combines two well-known models, SDF and Argos,4,5 in a novel way. SDF is suitable for data-dominated systems; Argos is suitable for control-dominated systems.

SDF belongs to the family of dataflow models. In SDF, each process operates on streams of tokens in firings. A process's firing rule specifies how many tokens each firing consumes and produces. In SDF, unlike dynamic dataflow models, those numbers must be constant, which limits buffer size and makes it possible to construct efficient static schedules. Because of static scheduling, the iteration of an SDF graph is clearly identifiable: It is a series of process firings that return the buffers to their original state. In Figure 5a, some possible schedules that create a single iteration are BCA, CBA, or C and B running concurrently before A. The numbers next to the processes describe their firing rules. SDF is suitable for a wide range of signal-processing systems with constant data rates.

Table 2. SystemC files for frequency relay specification (effective lines of source code).

SystemC files              Code size
averaging_filter.cpp          85
symmetry_function.cpp         95
peak_detection.cpp            66
frequency_calculation.cpp     93
roc_calculation.cpp          100
parameter_settings.cpp       239
switch_control.cpp           135
timer.cpp                     38
frequency_relay.cpp          251
testbench.cpp                412

Table 3. Esterel files for frequency relay specification (effective lines of source code).

Esterel files              Code size
dataflow.strl                 76
averaging_filter.c            34
symmetry_function.c           41
measurement.strl              77
freq_average.c                31
roc_average.c                 43
parameter_settings.strl      251
switch_control.strl          139
frequency_relay.strl         209

Argos models consist of parallel and hierarchical compositions of FSMs. Argos execution is based on the synchrony hypothesis, which states that all computations and communications in the system are instantaneous. As a result, there is no delay between inputs and outputs; they are synchronous. Model execution involves a series of instants (called ticks) of a global clock. In each tick, Argos reads inputs and instantaneously produces outputs. Because all components react simultaneously, there is no need for scheduling. The three main operators that Argos uses to construct the HCFSM model are refinement for hierarchy, synchronous parallel for concurrency, and hiding for synchronization. Figure 5b shows a simple Argos specification, which refines state S1 into two concurrent FSMs that synchronize using event c. When S1 is active and event b occurs, FSM2 makes the transition and emits c, causing the transition in FSM3 in the same instant, which in turn emits d. In the instant when the refined FSM leaves the hierarchical state, the refining FSMs can react. Thus, d is emitted even if signal a is present. (This corresponds to the notion of weak preemption, called weak abort in Esterel.)

Like Argos, DFCharts has synchronous parallel, refinement, and hiding operators. However, it also has an additional asynchronous parallel operator, which it uses to connect an SDF graph with one or more FSMs. This operator is asynchronous because the SDF graph operates independently of FSMs. The SDF graph synchronizes with FSMs only between two iterations: when it's receiving inputs for the next iteration and sending outputs produced during the previous iteration. SDF graphs can be at any level in the hierarchy of FSMs.

All FSMs in a DFCharts specification use the same set of ticks (clock). When a tick occurs, every FSM makes a transition. However, SDF graphs operate at their own speed. This produces a system with multiple clock domains: a different domain for each SDF graph and a single clock domain for all FSMs. This type of mixed synchronous and asynchronous specification supports efficient implementation. Moreover, because DFCharts allows FSMs and an SDF graph at the same hierarchical level, each retains its own characteristics.

The example in Figure 6 illustrates the features of DFCharts. At the top level, state S2 is refined into two parallel FSMs that synchronize by local event e. S1 is also refined into two FSMs, connected by the synchronous parallel operator; in addition, the asynchronous parallel operator connects these two FSMs with SDF graph SDF1. The communication between the SDF graph and the FSMs passes through channels ch1 and ch2. The arrows indicate the direction of data exchange. For the SDF graph, ch1 is an output channel, and ch2 is an input channel. The communication through each channel occurs when both the SDF graph and the relevant FSM are ready for it. (The SDF graph and the FSM meet using CSP-style rendezvous operations.) If the sender attempts to send when the receiver is not ready, the sender will block itself. Similarly, if the receiver attempts to read while the sender is not ready, the receiver will block itself.

FSMs communicate with SDF graphs from rendezvous states, which cannot be refined. A rendezvous state is one that has an outgoing transition triggered by a rendezvous action. In Figure 6, the rendezvous states are S7 and S9. When FSM4 is in S7, it is ready to receive data from SDF1 through ch1, as evident from transition ch1?x. We use CSP notation,7 where "?" denotes a read action, and "!" denotes a write action. When SDF1 is ready to send data, the communication occurs, triggering transition ch1?x. The data received from SDF1 is stored in variable x, event h is emitted, and state S8 begins. S8 can also follow S7 when event m is present, preempting the rendezvous on ch1. On the other hand, FSM5 remains blocked in S9 until SDF1 is ready to receive data through ch2 from variable y. Figure 7 shows how DFCharts represents the frequency relay.

Figure 5. Example specifications of the two models used in DFCharts: SDF (a) and Argos (b).

Property verification in a DFCharts model is similar to that in Argos. In the latter, combining FSMs removes hierarchy and concurrency. The result is a single, flat FSM, whose behavior is equivalent to the original model. In DFCharts, it is also necessary to integrate SDF graphs. DFCharts accomplishes this by representing the operation of each SDF graph as an equivalent HCFSM. In general, the top-level FSM representing an SDF graph has two states: io (I/O) and iterate. Figure 8 gives a simple example of an SDF graph with one input channel and one output channel.

355 September–October 2006

Figure 6. Example of a DFCharts model.

Figure 7. Frequency relay in DFCharts.


The io state is refined by as many concurrent FSMs as there are inputs and outputs. The input FSM, which consists of two states, receives data through channel cin and stores it into variable din. The output FSM sends data from variable dout through channel cout, as the transition from so2 to so3 indicates. If no iteration has occurred yet, which the presence of init indicates, there is nothing to send, and the output FSM enters so3 immediately after so1. Otherwise, init is absent (denoted by the overbar on init in Figure 8), and so2 is entered from so1. When the input and output FSMs enter si2 and so3, respectively, ioc (I/O complete) becomes present, and the top-level FSM enters iterate, thus completing a single iteration of the SDF graph. An FSM representing a particular schedule can further refine this state. However, this refinement isn't necessary for the global analysis.
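How many times each actor fires per iteration follows from the standard SDF balance equations of Lee and Messerschmitt.4 A minimal sketch, assuming a hypothetical edge-list format (producer, consumer, production rate, consumption rate) for a connected graph:

```python
from fractions import Fraction
from math import lcm

def repetition_vector(edges):
    """Solve the balance equations q[a]*prod == q[b]*cons for the
    smallest positive integer firing counts of a connected SDF graph."""
    q = {}
    q[edges[0][0]] = Fraction(1)   # seed one actor with a fractional firing count
    changed = True
    while changed:                 # propagate rates along edges until all actors are covered
        changed = False
        for a, b, prod, cons in edges:
            if a in q and b not in q:
                q[b] = q[a] * prod / cons
                changed = True
            elif b in q and a not in q:
                q[a] = q[b] * cons / prod
                changed = True
    scale = lcm(*(f.denominator for f in q.values()))  # clear denominators
    return {actor: int(f * scale) for actor, f in q.items()}

# Hypothetical three-actor chain: A produces 2 tokens per firing, B consumes
# 3 at a time and forwards single tokens to C.
edges = [("A", "B", 2, 3), ("B", "C", 1, 1)]
print(repetition_vector(edges))  # {'A': 3, 'B': 2, 'C': 2}
```

An FSM refining the iterate state would then fire A three times and B and C twice, in some valid static order.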

Comparison between DFCharts and other models

Besides DFCharts, the only other model that combines FSMs and SDFs is *charts,6 which is a part of Ptolemy.8 The Ptolemy environment hierarchically combines several MoCs. At each hierarchical level, blocks must obey a single MoC's semantics, but a designer can internally refine each block into a system that behaves according to some other model. The closest subset of Ptolemy to DFCharts is *charts, which focuses on mixing FSMs with other models. With hierarchical heterogeneity, it might be difficult in *charts to devise a meaningful communication mechanism between outer and inner models. The inner model might lose some properties while adjusting to the outer model. For example, if a network of SR blocks refines an SDF block, the refining blocks receive their inputs through blocking reads, so they are not really reactive. Conversely, if an SDF network refines an SR block, the SDF network must conform to the synchrony hypothesis. This means *charts will assume its iteration is instantaneous and will synchronize it to all SR blocks in the upper hierarchical level. Such assumptions are likely to produce inefficient implementations. With the parallel heterogeneity used in DFCharts, FSMs are free to react to external events, and SDF graphs can run at their own speed.

The Communicating Reactive State Machines (CRSM) language also extends Argos with an asynchronous parallel operator, which uses rendezvous channels to connect parallel FSMs.11 Thus, DFCharts has more in common with CRSM than Argos. However, the purpose of the asynchronous parallel operator in CRSM is to connect parts in a distributed system, whereas in DFCharts this operator serves to connect physically close control-dominated and data-dominated parts. Another important difference is that in CRSM the asynchronous parallel operator can function only at the top level (in a GALS manner), whereas in DFCharts it can function at any hierarchical level.

Feature extensions of SystemC and Esterel

According to our analysis, SystemC only partially supports, or does not support at all, the expanded system-level requirements of rendezvous communication, dataflow, HCFSMs, and hierarchy and preemption. A designer can construct a rendezvous channel using wait and notify statements to create the necessary request and acknowledge lines for the rendezvous protocol, but this could take some effort. Ideally, a standard rendezvous channel would be added to the library of channels that includes sc_fifo, sc_signal, and so on. Asynchronous thread processes that communicate through FIFO channels using blocking reads provide a good foundation for dataflow models.
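As a rough illustration of that foundation, the following Python sketch (plain threads standing in for SystemC thread processes, and a bounded queue standing in for sc_fifo; this analogy is ours, not the article's) connects two asynchronous processes through a FIFO with blocking reads and writes:

```python
import threading
import queue

# A bounded FIFO plays the role of sc_fifo: writes block when the
# FIFO is full, and reads block when it is empty.
fifo = queue.Queue(maxsize=4)
results = []
DONE = object()  # sentinel marking the end of the token stream

def producer():
    for token in range(5):
        fifo.put(token)        # blocking write
    fifo.put(DONE)

def consumer():
    while True:
        token = fifo.get()     # blocking read: the process sleeps until data arrives
        if token is DONE:
            break
        results.append(token * token)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16]
```

The blocking reads give dataflow-style rate matching for free; what this sketch does not capture is precisely what the article notes is hard, namely declared firing rules and static schedules.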

However, it's also still difficult in SystemC to specify firing rules and construct static-scheduling orders, so improvements are necessary in this area as well. Synchronous processes can be created in SystemC, and this is essential for HCFSM support. It's also possible to model reactivity using signal sensitivities and wait and notify statements. But the absence of preemption is a serious disadvantage when modeling control-dominated behavior. Processes cannot be instantaneously terminated or interrupted, which is necessary for the hierarchy and preemption requirement. Overcoming this fundamental limitation would require making deep changes in SystemC's simulation semantics.

SystemC-H is an extension of SystemC that incorporates some of these desired changes.12 SystemC-H has an extended SystemC kernel to better support SDF, CSP, and FSM models. Constructing static schedules for SDF models is possible, and this increases simulation efficiency. Another important addition is hierarchical heterogeneity with SDF and FSM models. In its current form, though, SystemC-H probably wouldn't be able to support DFCharts entirely, because the former adheres to purely hierarchical heterogeneity, as in Ptolemy, whereas DFCharts represents a mixture of hierarchical and parallel heterogeneity.

Figure 8. FSM representing the operation of a two-channel SDF graph.

Like SystemC, Esterel does not directly support rendezvous, but, using await and emit statements, a designer could construct rendezvous. The main problem with Esterel is its complete lack of support for the third expanded system-level requirement: support for dataflow, including buffered communication between processes and specification of firing rules for dataflow modules. The assumption made by the synchrony hypothesis (that all computations are instantaneous) is seldom valid for data-dominated systems. Furthermore, Esterel syntax is not appropriate for dataflow. It would be possible to design a dataflow network inside an asynchronous task. But describing something in an asynchronous task means going outside Esterel and its development tools. Creating a solid basis for an integrated environment requires defining a MoC (such as SDF) for asynchronous tasks and interfacing this MoC with the SR model.

WE INTEND TO CREATE a graphical environment for designing embedded systems using DFCharts. Therefore, we've implemented a Java class library to execute DFCharts specifications. This library incorporates methods for analyzing SDF graphs from Ptolemy II. In fact, this was one of the reasons we chose Java for the implementation. The next step is to create a graphical interface. Another direction of research, which is the focus of this article, is to modify widely accepted system-level languages such as SystemC and Esterel to support DFCharts. ■

References
1. M. von der Beeck, “A Comparison of Statecharts Variants,” Proc. Formal Techniques in Real-Time and Fault-Tolerant Systems, LNCS 863, Springer-Verlag, 1994, pp. 128-148.
2. Open SystemC Initiative, SystemC Version 2.0 User's Guide; http://www.systemc.org.
3. G. Berry and G. Gonthier, “The Esterel Synchronous Programming Language: Design, Semantics, Implementation,” Science of Computer Programming, vol. 19, no. 2, Nov. 1992, pp. 87-152.
4. E.A. Lee and D.G. Messerschmitt, “Synchronous Data Flow,” Proc. IEEE, vol. 75, no. 9, Sept. 1987, pp. 1235-1245.
5. F. Maraninchi and Y. Remond, “Argos: An Automaton-Based Synchronous Language,” Computer Languages, vol. 27, nos. 1-3, 2001, pp. 61-92.
6. A. Girault, B. Lee, and E. Lee, “Hierarchical Finite State Machines with Multiple Concurrency Models,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, June 1999, pp. 742-760.
7. C.A.R. Hoare, “Communicating Sequential Processes,” Comm. ACM, vol. 21, no. 8, Aug. 1978, pp. 666-677.
8. J. Eker et al., “Taming Heterogeneity—The Ptolemy Approach,” Proc. IEEE, vol. 91, no. 1, Jan. 2003, pp. 127-144.
9. S. Edwards et al., “Design of Embedded Systems: Formal Methods, Validation, and Synthesis,” Proc. IEEE, vol. 85, no. 3, Mar. 1997, pp. 366-390.
10. I. Radojevic, Z. Salcic, and P. Roop, “Modeling Heterogeneous Embedded Systems in DFCharts,” Proc. Forum Design and Specification Languages (FDL 05), European Chips and Systems Initiative, 2005, pp. 441-452.
11. S. Ramesh, “Communicating Reactive State Machines: Design, Model and Implementation,” Proc. IFAC Workshop Distributed Computer Control Systems, Pergamon Press, 1998; http://www.cfdvs.iitb.ac.in/projects/crsm/ifac.ps.
12. H. Patel and S. Shukla, SystemC Kernel Extensions for Heterogeneous System Modeling: A Framework for Multi-MoC Modeling & Simulation, Kluwer Academic Publishers, 2004.

Ivan Radojevic is a PhD candidate in the Department of Electrical and Computer Engineering at the University of Auckland in New Zealand. His research interests include design languages, models of computation, and hardware-software codesign for embedded systems. Radojevic has a BE in electrical engineering from the University of Auckland.


Zoran Salcic is a professor of computer systems engineering at the University of Auckland. His research interests include complex digital-systems design, custom-computing machines, reconfigurable systems, FPGAs, processor and computer systems architectures, embedded systems and their implementation, design automation tools for embedded systems, hardware-software codesign, new computing architectures and models of computation for heterogeneous embedded systems, and related areas in computer systems engineering. Salcic has a BE, an ME, and a PhD in electrical and computer engineering from the University of Sarajevo. He did most of his PhD research at the City University of New York (CUNY). He is a senior member of the IEEE.

Partha S. Roop is a senior lecturer in the Department of Electrical and Computer Engineering at the University of Auckland. His research interests include the design and verification of embedded systems, especially formal verification techniques such as model checking and module checking, and their applications in embedded systems. Roop has a BE in engineering from Anna University, Madras, India; an MTech from the Indian Institute of Technology, Kharagpur, India; and a PhD in computer science from the University of New South Wales, Sydney, Australia.

Direct questions or comments about this article to Ivan Radojevic, Department of Electrical and Computer Engineering, University of Auckland, 38 Princess St., Auckland, New Zealand; [email protected].


IEEE Design & Test Call for Papers

IEEE Design & Test, a bimonthly publication of the IEEE Computer Society and the IEEE Circuits and Systems Society, seeks original manuscripts for publication. D&T publishes articles on current and near-future practice in the design and test of electronic-products hardware and supportive software. Tutorials, how-to articles, and real-world case studies are also welcome. Readers include users, developers, and researchers concerned with the design and test of chips, assemblies, and integrated systems. Topics of interest include

■ Analog and RF design,
■ Board and system test,
■ Circuit testing,
■ Deep-submicron technology,
■ Design verification and validation,
■ Electronic design automation,
■ Embedded systems,
■ Fault diagnosis,
■ Hardware-software codesign,
■ IC design and test,
■ Logic design and test,
■ Microprocessor chips,
■ Power consumption,
■ Reconfigurable systems,
■ Systems on chips (SoCs),
■ VLSI, and
■ Related areas.

To submit a manuscript to D&T, access Manuscript Central, http://cs-ieee.manuscriptcentral.com. Acceptable file formats include MS Word, PDF, ASCII or plain text, and PostScript. Manuscripts should not exceed 5,000 words (with each average-size figure counting as 150 words toward this limit), including references and biographies; this amounts to about 4,200 words of text and five figures. Manuscripts must be double-spaced, on A4 or 8.5-by-11-inch pages, and type size must be at least 11 points. Please include all figures and tables, as well as a cover page with author contact information (name, postal address, phone, fax, and e-mail address) and a 150-word abstract. Submitted manuscripts must not have been previously published or currently submitted for publication elsewhere, and all manuscripts must be cleared for publication.

To ensure that articles maintain technical accuracy and reflect current practice, D&T places each manuscript in a peer-review process. At least three reviewers, each with expertise on the given topic, will review your manuscript. Reviewers may recommend modifications or suggest additional areas for discussion. Accepted articles will be edited for structure, style, clarity, and readability. Please read our author guidelines (including important style information) at http://www.computer.org/dt/author.htm.

Submit your manuscript to IEEE Design & Test today!

D&T will strive to reach decisions on all manuscripts within six months of submission.


0740-7475/06/$20.00 © 2006 IEEE Copublished by the IEEE CS and the IEEE CASS September–October 2006 359

THE GROWTH OF THE EDA INDUSTRY has been less than satisfactory in the past few years. For example, in 2005 growth was only 0.6%,1 and in 2006 it is predicted to be less than 3%.2 The reasons are varied and are beyond the scope of this article. However, one of the main issues is the failure of EDA to address new customers. New customers imply a revenue potential that does not cannibalize present business, thus allowing real industry growth. Traditionally, EDA has served the IC industry, where the demand for tools has been rampant since the early 1980s. An obvious adjacent market for EDA growth is electronic system-level (ESL) design. (See the “Trends affecting the ESL design market” sidebar for a brief history and explanation of how various market factors have contributed to developments in ESL design.)

The 2004 International Technology Roadmap for Semiconductors (ITRS) placed ESL “a level above RTL,” including both hardware and software design. The ITRS defined ESL to “consist of a behavioral (before HW/SW partitioning) and architectural level (after)” and claimed it would increase productivity by 200,000 gates per designer-year. The ITRS states that ESL will improve productivity by 60% over an “Intelligent Testbench” approach, the previously proposed ESL design improvement.3 Although these claims cannot yet be verified and seem quite aggressive, most agree that ESL's overarching benefits include

■ raising the abstraction level at which designers express systems,
■ enabling new levels of design reuse, and
■ providing for design chain integration across tool flows and abstraction levels.

The purpose of this article is to paint the ESL design landscape by providing a unified framework for placing and analyzing existing and future tools in the context of an extensible design flow. This approach should help designers use tools more efficiently, clarify their flow's entry and exit points, and highlight areas in the design process that could benefit from additional tools and support packages. This framework is based on platform-based design concepts.4,5 Using this framework, we've classified more than 90 different academic and industrial ESL offerings and partitioned the tool space into metaclasses that span an ideal design flow. (Although we try to cover as much of the ESL tool space as possible, we make no claim of completeness. We apologize in advance to the authors of tools we have inadvertently ignored. Also, we don't analyze the extensive literature that describes these tools; rather, we identify Web sites that contain relevant information.)

We used this framework to explore three design scenarios to demonstrate how those involved in ESL design at various levels and roles can effectively select tools to accomplish their tasks more efficiently than in a traditional IC design flow. The ability to study design scenarios goes beyond mere classification, because our framework exposes the relationships and constraints among different classes to the designer, who may wish to implement a specific integration flow. (The “Related work” sidebar discusses other efforts to categorize ESL design approaches.)

A Platform-Based Taxonomy for ESL Design

Douglas Densmore, University of California, Berkeley
Roberto Passerone, University of Trento
Alberto Sangiovanni-Vincentelli, University of California, Berkeley

Editor's note: This article presents a taxonomy for ESL tools and methodologies that combines UC Berkeley's platform-based design terminologies with Dan Gajski's Y-chart work. This is timely and necessary because in the ESL world we seem to be building tools without first establishing an appropriate design flow or methodology, thereby creating a lot of confusion. This taxonomy can help stem the tide of confusion.
—Gary Smith, Gartner Dataquest

The ESL classification framework

The design framework shown in Figure 1 is based on the platform-based design (PBD) paradigm presented by Sangiovanni-Vincentelli and Martin.5 This framework treats the design process as a sequence of steps that repeat themselves as the design moves from higher abstraction levels to implementation. The primary structure is a Y shape; thus, it is similar to the famous Y-chart introduced by Gajski. The left branch expresses the functionality (what) that the designer wishes to implement; the right branch expresses the elements the designer can use to realize this functionality (how); and the lower branch identifies the elements the designer will use to implement the functionality (the mapping).6

In this context, the right branch is the platform, and it includes

■ a library of elements, including IP blocks and communication structures, and composition rules that express which elements can be combined and how; and
■ a method to assess the quantities associated with each element—for example, power consumed or time needed to carry out a computation.

Each legal composition of elements from the platform is a platform instance. Mapping involves selecting the design components (choosing the platform instance) and assigning functionality parts to each element, thus realizing the complete functionality, possibly with overlaps. Designers optimize this process according to a set of metrics and constraints defined from the cost figures provided (the quantities mentioned earlier). The designers then use these metrics to evaluate the design's feasibility and quality.
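A minimal sketch of this mapping-and-evaluation loop, with hypothetical tasks, platform elements, and cost figures (none of them from the article), might look like this:

```python
from itertools import product

# Hypothetical functionality: three functional tasks to be assigned to
# platform elements. Cost figures (latency, power) are illustrative only.
tasks = ["filter", "fft", "control"]
platform = {                      # element -> {task: (latency, power)}
    "cpu":   {"filter": (8, 2), "fft": (12, 3), "control": (2, 1)},
    "dsp":   {"filter": (3, 4), "fft": (4, 5),  "control": (6, 2)},
    "accel": {"filter": (1, 6), "fft": (2, 7)},   # cannot implement control
}

def evaluate(mapping):
    """Total (latency, power) metrics of one mapped platform instance."""
    latency = sum(platform[el][t][0] for t, el in mapping.items())
    power = sum(platform[el][t][1] for t, el in mapping.items())
    return latency, power

best = None
for choice in product(platform, repeat=len(tasks)):
    mapping = dict(zip(tasks, choice))
    # composition rule: skip assignments an element cannot implement
    if any(t not in platform[el] for t, el in mapping.items()):
        continue
    latency, power = evaluate(mapping)
    # constraint: power budget of 10; metric: minimize latency
    if power <= 10 and (best is None or latency < best[0]):
        best = (latency, power, mapping)

print(best)  # (9, 10, {'filter': 'dsp', 'fft': 'dsp', 'control': 'cpu'})
```

Real mapping tools replace this brute-force search with optimization, but the structure (legal platform instances, assignment, metric evaluation against constraints) is the same.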

This view of the design process is basically an abstraction of a process that designers have used implicitly for years at particular abstraction levels. For example, interpreting the logic synthesis process in this framework, we find the following:

■ RTL code or Boolean functions represent the design's functionality.
■ The platform includes a library of gates, or higher-complexity logic blocks.
■ Mapping is the actual logic synthesis step that implements the functionality as an interconnection of gates (platform instance) optimizing a set of metrics involving area, power, and timing; the synthesis tool then exports the mapped design (gate-level netlist) to the layout phase, and the physical design tool maps this representation to a physical platform.

Trends affecting the ESL design market

The number of electronic system-level (ESL) designers is reportedly several orders of magnitude larger than the number of IC designers. However, until the late 1990s, the system-level design market had been highly fragmented. Consumers were unwilling to pay a high price for tools, so EDA companies produced relatively simple tools. For most of the products in this market, the end product's complexity was not a limiting factor.

In the late 1990s, the situation began to change dramatically as system complexity reached an inflection point with the appearance of increasingly powerful electronic devices. Demand increased for demonstrably safe, efficient, and fault-tolerant operation of transportation systems such as automobiles and airplanes. Demand also increased for greater functionality in IT and communication devices, such as computing equipment and cell phones. During the past 10 years, several recalls (consider those from BMW and Daimler-Chrysler alone in the past two years, for example) and delays in the launch of previously announced products in the consumer electronics sectors demonstrated that new design methods, tools, and flows were sorely needed to prevent expensive fixes in the field and to bring new products to the market more quickly and reliably.

This situation created the conditions for the birth of new tool companies and new offerings in established EDA companies to address the needs of a changing market. However, because the system industry landscape is very diverse—with companies varying as widely as Nokia and General Motors, Boeing and Otis Elevators, and Hewlett-Packard and ABB—a design approach that could satisfy all these diverse needs would have required a large investment, with a high risk of failure. Hence, the bulk of the ESL design effort (with a few notable exceptions) has come from academia and some small start-up companies trying to address a subset of the many problems and geared toward a limited number of potential customers.

For years, Gartner Dataquest has predicted dramatic growth in ESL tool revenues, which unfortunately has failed to materialize. One of the reasons for unrealized growth is the lack of a vision in EDA of what system-level design ought to be and of how various tools fit in an overall methodology that the system industry at large could satisfactorily adopt. Consequently, there is confusion about the very definition of ESL and about what role it could play in the overall design of electronic products. Some companies have adopted ESL methodologies and tools, developed either internally or in academic circles, integrating some commercial tools as well. However, we are certainly at a relatively early stage of adoption.

The PBD paradigm applies equally well to the application and algorithmic levels, where functionality can be a mathematical description—for example, a Moving Picture Experts Group (MPEG) encoding algorithm. Also, the platform can be a set of subalgorithms for implementing each functional block of the encoding method. The result of the mapping process then goes to a lower level, where the left branch is a mapped platform instance, and the right branch is a new set of elements for implementing the mapped platform instance. This process repeats until the result of the mapping process is a fully implemented solution. Thus, the design process is partitioned into levels, where each level represents a particular abstraction. The corresponding platform and mapping process optimizes specific aspects of the design.

Related work

We are not the first to realize the importance of categorizing ESL design approaches. Smith and Nadamuni used two axes for this purpose.1 The first axis contains three methodology components: an algorithmic methodology, a processor and memory methodology, and a control-logic methodology. Each refers to the way in which a designer thinks about the design or its components. The second axis includes the abstraction levels to express the designs: behavioral, architectural, and platform based. Smith and Nadamuni examined approximately 50 approaches in this framework.

Maniwa presented a similar approach, also based on two axes, to categorize industrial tools.2 The first axis is the design style: embedded software, SoC (hardware), behavioral, or component. The second axis is the language (for example, C, C++, or Verilog) to describe the design. Maniwa examined approximately 41 approaches.

Gries also used two axes to classify ESL tools developed in academia and industry.3 The axes in this case related to abstraction levels (for example, system level and microarchitectural level) and design stages (such as application, architecture, and exploration). Gries examined approximately 19 approaches.

Finally, Bailey, Martin, and Anderson provided a comprehensive set of taxonomies: a model taxonomy, a functional-verification taxonomy, a platform-based design taxonomy, and a hardware-dependent software taxonomy.4 To the best of our knowledge, their book provides the best classification of high-level design tools, and we follow its definitions when appropriate. Compared to their approach, our paradigm places tools in a more general design context and gives guidelines on how to connect the available tools, and IP blocks and their models, in a design flow.

References
1. G. Smith and D. Nadamuni, “ESL Landscape 2005,” Gartner Dataquest, 2005.
2. T. Maniwa, “Focus Report: Electronic System-Level (ESL) Tools,” Chip Design, Apr./May 2004, http://www.chipdesignmag.com/display.php?articleId=23&issueId=4.
3. M. Gries, “Methods for Evaluating and Covering the Design Space during Early Design Development,” Integration: The VLSI J., vol. 38, no. 2, Dec. 2004, pp. 131-138.
4. B. Bailey, G. Martin, and T. Anderson, Taxonomies for the Development and Verification of Digital Systems, Springer, 2005.

Figure 1. Platform-based design classification framework elements. Functionality indicates functional representations of a design completely independent of implementation architectures. Platform concerns the modules used to implement the functional description—for example, processors, memories, and custom hardware. Mapping refers to instances of the design in which the functionality has been assigned to a set of correctly interconnected modules.

This framework prescribes a unified design methodology and hence is useful for identifying where existing tools and flows fit and how to integrate them in the overall system design process.

Classifying ESL tools

We use the PBD paradigm to classify several ESL-related tools. Doing so casts present system-level design efforts in a global framework that serves as a unifying element. Of course, existing approaches may fall into more than one classification category because they cover more than one step of PBD. We could consider this a fault of the classification method, because a classification is effective only if it can cleanly partition the various objects being classified. However, partitioning the design steps rather than the tool coverage is more powerful because it identifies the tools' roles in the overall design paradigm. Indeed, the classification criteria can provide hints on how to connect different tools to yield an encompassing design flow. We've developed an environment for design space exploration called Metropolis, which completely reflects the design paradigm followed here. Metropolis can serve as the unifying framework for system design, where tool developers can embed tools, libraries, and approaches if the appropriate interfaces are built.

The classification classes reflect the Y-shaped diagram, with an additional classification criterion related to the abstraction level at which the tools work (see Figure 1):

Bin F consists of functional representations of a design independent of implementation architectures and with no associated physical quantity, such as time or power. For example, a Simulink diagram expressing an algorithm for automotive engine control and a Ptolemy II description of an MPEG-decoding algorithm both belong to this bin. These diagrams could be refinements of more abstract representations such as metamodels, as in Metropolis. To this bin, we assign tools that manipulate, simulate, and formally or informally analyze functional descriptions.

Bin P represents the library of modules for implementing the functional description. The modules are architectural elements such as processors, memories, coprocessors, FPGAs, custom hardware blocks, and interconnections (buses, networks, and so on). The elements also include middleware, such as operating systems for processors and arbitration protocols for buses, because these software components present the architectural services that the hardware offers to the application software. To this bin, we assign tools for connecting or manipulating the modules, as well as tools for analyzing the properties of the complete or partial platform instances obtained.

Bin M represents mapped instances of the design in which the designer or an automatic mapping tool has assigned functionality to a set of correctly interconnected modules. The connection between bins F, P, and M represents the mapping process. In this bin, we classify any tool that assigns architectural elements to functionality or generates the design's mapped view. For example, bin M would include a high-level synthesis tool because the designer has assigned, perhaps manually, part of the functionality to a virtual hardware component in the platform and is asking the tool to generate the lower-level view, in this case an RTL description of the design. By the same token, we can classify a code generation tool in bin M because the designer has assigned (perhaps manually) part of the functionality to a software-programmable element of the library and is asking the tool to generate the lower-level view. In this case, the view is a software program—whether assembly language, C, or a higher-level language—which is then compiled to move toward implementation. In this article, we consider the compilation phase and the synthesis from RTL to gates to be part of a traditional design flow and thus not part of our ESL tool classification.

Some tools can handle two or even all three aspects of the PBD paradigm. To classify these tools, we introduce metaclasses (or metabins), indicated by combinations of F, P, and M. For example, in metabin FM, we assign a synthesis tool that handles functional components along with their mappings to platform components. Tools classified in metaclasses cover several parts of the PBD design flow. Designers using these tools can benefit from the design view we propose by clearly decoupling function from architecture and mapping. Doing so can enhance reusability and help the designer reach a correct implementation efficiently.

To make the partitioning of the tools finer, we introduce another, orthogonal criterion for classification: the abstraction level at which the tools operate. Whereas PBD doesn't limit the abstraction levels that designers use per se, most of the tools we reviewed work at three levels, listed here from highest to lowest:

Electronic System-Level Design

362 IEEE Design & Test of Computers


■ System level S corresponds to heterogeneous designs that use different models of computation (MoCs) to represent function, platforms, and mappings.
■ Component level C involves subsystems containing homogeneous components.
■ Implementation level I comprises the final design step, when the design team considers the job complete.
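The bins, metabins, and abstraction levels just listed lend themselves to a small data model. The following is a minimal sketch of our own (the Tool class, the bin_of function, and the example level assignments are illustrative, not part of the article's taxonomy or any tool mentioned in it): each tool is tagged with the subset of PBD aspects it covers and the level it works at, and its bin or metabin label follows directly.

```python
from dataclasses import dataclass

# Hypothetical sketch: a tool is classified by which PBD aspects it
# covers -- function (F), platform (P), mapping (M) -- and by the
# abstraction level it works at (S, C, or I).
ASPECT_ORDER = "FPM"          # canonical ordering used for metabin names
LEVELS = {"S", "C", "I"}      # system, component, implementation

@dataclass(frozen=True)
class Tool:
    name: str
    aspects: frozenset         # subset of {"F", "P", "M"}
    level: str                 # one of LEVELS

def bin_of(tool: Tool) -> str:
    """Return the bin (one aspect) or metabin (several aspects) label."""
    assert tool.aspects <= set(ASPECT_ORDER) and tool.level in LEVELS
    return "".join(a for a in ASPECT_ORDER if a in tool.aspects)

# Examples drawn from the article's own classification:
ptolemy = Tool("Ptolemy II", frozenset({"F"}), "S")               # bin F
catapult = Tool("Catapult C", frozenset({"F", "M"}), "C")         # metabin FM
metropolis = Tool("Metropolis", frozenset({"F", "P", "M"}), "S")  # metabin FPM

print(bin_of(ptolemy), bin_of(catapult), bin_of(metropolis))  # F FM FPM
```

A tool covering several aspects gets a multi-letter label regardless of the order its aspects are listed in, which mirrors how the metabin names FP, FM, PM, and FPM are formed.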

We now present our classification, beginning with tools that fall into individual bins: those meant to be part of a larger tool flow or that work in a very specific application domain. We then address tools that cover larger portions of the design flow space.

Bin F
Tools in this bin often serve to capture designs and their specifications quickly, without making any assumptions about the underlying implementation details (see Tables 1-3). At this level, the descriptions might include behavioral issues such as concurrency, or communication concepts such as communication protocols. Some tools handle only one MoC, for example finite-state machines (FSMs). Others are more general, handling a set of MoCs or having no restrictions. For example, the Simulink representation language handles discrete dataflow and continuous time. Hence, it is a limited heterogeneous modeling-and-analysis tool. Ptolemy II, with its actor-oriented abstract semantics, can handle all MoCs. Depending on the MoC supported, design entry for each tool could start at a higher or a lower abstraction level.

September–October 2006

Table 1. Tools in bin F: Industrial. (C: component level; I: implementation level; S: system level)

Provider | Tools | Focus | Abstraction | Web site
MathWorks | Matlab | High-level technical computing language and interactive environment for algorithm development, data visualization, analysis, and numeric computation | S: Matlab language, vector, and matrix operations | http://www.mathworks.com/products/matlab
Scilab | Scicos | Graphically model, compile, and simulate dynamic systems | S: Hybrid systems | http://www.scilab.org
Novas Software | Verdi | Debugging for SystemVerilog | I: Discrete event | http://www.novas.com
Mentor Graphics | SystemVision | Mixed-signal and high-level simulation | S: VHDL-AMS, Spice, C | http://www.mentor.com/products/sm/systemvision
EDAptive Computing | EDAStar | Military and aerospace system-level design | S: Performance models | http://www.edaptive.com
Time Rover | DBRover, TemporalRover, StateRover | Temporal rules checking, pattern recognition, and knowledge reasoning | C: Statecharts assertions | http://www.time-rover.com
Maplesoft | Maple | Mathematical problem development and solving | S: Mathematical equations | http://www.maplesoft.com
Wolfram Research | Mathematica | Graphical mathematical development and problem solving with support for Java, C, and .Net | S: Mathematical equations | http://www.wolfram.com
Mesquite Software | CSIM 19 | Process-oriented, general-purpose simulation toolkit for C and C++ | S: C, C++ | http://www.mesquite.com
Agilent Technologies | Agilent Ptolemy | Functional verification | C: Timed synchronous dataflow | http://www.agilent.com
National Instruments | LabView | Test, measurement, and control application development | S: LabView programming language | http://www.ni.com/labview

Bin P
This category includes providers of platforms or platform components, as well as tools and languages that describe, manipulate, or analyze unmapped platforms (see Tables 4 and 5). Similar to tools in bin F, those in bin P can span several abstraction layers and support different kinds of architectural components. For example, Xilinx and Altera deal mainly with programmable hardware devices, whereas Tensilica focuses on configurable processors. Others, such as Sonics and Beach Solutions, focus on integration and communication components. This category's main characteristic is configurability, which ensures the applicability of a platform or components to a wide variety of applications and design styles.

Bin M
This bin contains tools dedicated to refining a functional description into a mapped platform instance, including its performance evaluation and possibly the synthesis steps required to proceed to a more detailed abstraction level (see Tables 6-8). The tools in bin M vary widely in design style, MoC, and supported application area. To provide the necessary quality of results, the tools are typically very specific.


Table 2. Tools in bin F: Academic.

Provider | Tools | Focus | Abstraction | Web site
Univ. of California, Berkeley | Ptolemy II | Modeling, simulation, and design of concurrent, real-time, embedded systems | S: All MoCs | http://ptolemy.eecs.berkeley.edu
Royal Inst. of Technology, Sweden | ForSyDe | System design starts with a synchronous computational model, which captures system functionality | C: Synchronous MoC | http://www.imit.kth.se
Mozart Board | Mozart | Advanced development platform for intelligent, distributed applications | S: Object-oriented GUI using Oz | http://www.mozart-oz.org

Table 3. Tools in bin F: Languages.

Provider | Tools | Focus | Abstraction | Web site
Celoxica | Handel-C | Compiling programs into hardware images of FPGAs or ASICs | C: Communicating sequential processes | NA
Univ. of California, Irvine | SpecC | ANSI-C with explicit support for behavioral and structural hierarchy, concurrency, state transitions, timing, and exception handling | C: C language based | http://www.ics.uci.edu/~specc
Inria | Esterel | Synchronous-reactive programming language | C: Synchronous reactive | http://www-sop.inria.fr/meije/esterel/esterel-eng.html
Univ. of Kansas | Rosetta | Compose heterogeneous specifications in a single declarative semantic environment | S: All MoCs | http://www.sldl.org
Mozart Board | Oz | Advanced, concurrent, networked, soft real-time, and reactive applications | C: Dataflow synchronization | http://www.mozart-oz.org
Various | ROOM | Real-time object-oriented modeling | S: Object oriented | NA


Metabin FP
This category consists of languages that can express both functionality and architecture (see Tables 9 and 10). Typically, they express algorithms and different styles of communication and structure for different MoCs. Assertions, or constraints, complement the platform description. In the case of Unified Modeling Language (UML), the semantics are often left unspecified.

Metabin FM
This metabin reflects tools that provide some combination of functional description and analysis capabilities plus mapping and synthesis capabilities (see Table 11). In this case, the platform architecture is typically fixed. This lack of flexibility is offset by the often superior quality of achievable implementation results.

Metabin PM
This metabin includes tools that combine architectural services and mapping (see Tables 12-14). These tools have a tight coupling between the services they provide and how functionality can map to these services. They require the use of other tools for some aspect of system design (often in the way the design functionality is specified).

Table 4. Tools in bin P: Industrial.

Provider | Tools | Focus | Abstraction | Web site
Prosilog | Nepsys | Standards-based IP libraries and support tools (SystemC) | C: RTL and transaction-level SystemC; VHDL for SoCs | http://www.prosilog.com
Beach Solutions | EASI-Studio | Solutions to package and deploy IP in a repeatable, reliable manner | C: Interconnection | http://www.beachsolutions.com
Altera | Quartus II | FPGAs, CPLDs, and structured ASICs | I: IP blocks, C, and RTL; FPGAs | http://www.altera.com
Xilinx | Platform Studio | IP integration framework | C: IP blocks, FPGAs | http://www.xilinx.com
Mentor Graphics | Nucleus | Family of real-time operating systems and development tools | S: Software | http://www.mentor.com/products/embedded_software/nucleus_rtos
Sonics | Sonics Studio | On-chip interconnection infrastructure | I: Bus-functional models | http://www.sonicsinc.com
Xilinx | ISE, EDK, XtremeDSP | FPGAs, CPLDs, and structured ASICs | I: IP blocks, C, and RTL; FPGAs | http://www.xilinx.com
Design and Reuse | Hosted Extranet Services | IP delivery systems | S: All types of IP | http://www.design-reuse.com
Stretch | Software Configurable Processor compiler | Compile a subset of C into hardware for instruction extensions | C: Software-configurable processors | http://www.stretchinc.com
ProDesign | CHIPit | Transaction-based verification platform | C: FPGA-based rapid prototyping | http://www.prodesign-usa.com

Table 5. Tools in bin P: Languages.

Provider | Tools | Focus | Abstraction | Web site
Spirit Consortium | Spirit | IP exchange and integration standard written in XML | S: Various IP levels | http://www.spiritconsortium.com

Metabin FPM
Entries in this category are the frameworks that support the PBD paradigm (see Tables 15 and 16). In particular, Metropolis fully embodies this paradigm, covering all bins and all abstraction layers. In this category, we include design space exploration tools and languages that can separately describe the functionality on the one hand, and the possible architectures for an implementation on the other. These tools can also map the functionality onto the platform instances to obtain metrics for the implementation's performance.

Design scenarios
Here, we use the PBD framework of Figure 1 to map three design flow scenarios on the tool landscape. Figure 2 shows the metabins and the hierarchical levels where activities take place.

Scenario 1: New application design from specification
The requirements of this scenario include the need to start from a high-level specification; the desire to capture and modify the initial specification quickly; the ability to express concurrency, constraints, and other behavior-specific characteristics efficiently; and the ability to capture useful abstract services for implementing high-level specifications into a more detailed functional view. The flow thus starts at the higher abstraction levels in bin F of our classification. We can expand these levels into a Y diagram of the same structure as the one described in Figure 1. This structure offers

■ flexible specification capture, with no ties to a particular implementation style or platform;
■ services that help move the abstract design toward a more constrained version (for example, algorithms that can implement functionality); and
■ independent mapping of functionality onto algorithmic structures that enable reuse of the functional specification.


Table 6. Tools in bin M: Industrial, set I.

Provider | Tools | Focus | Abstraction | Web site
MathWorks | Real-Time Workshop | Code generation and embedded-software design | S: Simulink-level models | http://www.mathworks.com
dSpace | TargetLink | Optimized code generation and software development | S: Simulink models | http://www.dspace.com
ETAS | Ascet | Modeling, algorithm design, code generation, and software development, with emphasis on the automotive market | S: Ascet models | http://en.etasgroup.com/products/ascet/index.shtml
Y Explorations | eXCite | Take virtually unrestricted ISO or ANSI-C with channel I/O behavior and generate Verilog or VHDL RTL output for logic synthesis | S: C language input | http://www.yxi.com
AccelChip | AccelChip and AccelWare | DSP synthesis; Matlab to RTL | C: Matlab | http://www.accelchip.com
Forte Design Systems | Cynthesizer | Behavioral synthesis | C: SystemC to RTL | http://www.forteds.com
Future Design Automation | System Center Co-development Suite | ANSI-C to RTL synthesis toolset | C: C to RTL | http://www.future-da.com
Catalytic | DeltaFX, RMS | Synthesis of DSP algorithms on processors or ASICs | I: Matlab algorithms | http://www.catalytic-inc.com
ACE Associate Compiler Experts | CoSy | Automatic generation of compilers for DSPs | I: DSP-C and embedded-C language extensions | http://www.ace.nl
Tenison | VTOC | RTL to C++ or SystemC | I: RTL, transactional | http://www.tenison.com


Let’s examine an example in the multimedia

domain: the implementation of a JPEG encoder on a

heterogeneous multiprocessor architecture such as the

Intel MXP5800. This architecture has eight image signal

processors (ISP1 to ISP8) connected with programmable

quad ports (eight per processor).7 The encoder com-

presses raw image data and emits a compressed bit-

stream. The first step in the scenario is to choose a

367September–October 2006

Table 7. Tools in bin M: Industrial, set II.

Provider Tools Focus Abstraction Web site

Sequence ESL Power Power analysis and optimization I: SystemC level http://www.sequencedesign.com

Design Technology,

Power Theater,

CoolTime,

CoolPower

PowerEscape PowerEscape Memory hierarchy design, code C: C code http://www.coware.com/products/

(with Architect, performance analysis, powerescape.php

CoWare) PowerEscape complete profiling

Synergy,

PowerEscape

Insight

CriticalBlue Cascade Design flow for application-specific I: C code to Verilog http://www.criticalblue.com

hardware acceleration or VHDL

coprocessors for ARM processors

Synfora PICO Express C to RTL, or C to System C I: Pipeline processor http://www.synfora.com

(transaction-level models) arrays

Actis AccurateC Static code analysis for SystemC C: C syntax and http://www.actisdesign.com

semantic checking

Impulse CoDeveloper C to FPGA C: C code http://www.impulsec.com

Accelerated

Technologies

Poseidon Triton Tuner, Design flow for application-specific C: C and SystemC http://www.poseidon-systems.com

Design Triton Builder hardware acceleration

Systems coprocessors

SynaptiCAD SynaptiCAD line Testbench generators and C: RTL and SystemC http://www.syncad.com

simulators

Avery TestWizard Verilog HDL, VHDL, and C-based I: RTL and C http://www.avery-design.info

Design testbench automation

Systems

Emulation and ZeBu Functional verification I: Hardware emulation http://www.eve-team.com

Verification

Engine

Table 8. Tools in bin M: Academic.

Provider Tools Focus Abstraction Web site

Univ. of Impact Compiler Compilation development for S: C code for high- http://www.crhc.uiuc.edu/Impact

Illinois at instruction-level parallelism performance

Urbana- processors

Champaign

Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:57:34 UTC from IEEE Xplore. Restrictions apply.

Page 42: 105.100.000 DesignnTest ESL Special Edition Complete

particular MoC to describe the design’s functionality.

To be more efficient in applying our proposed design

paradigm, the designer should use a MoC that is also

suitable for describing the architecture’s capabilities.

Hence, the designer eases the mapping task and the

analysis of the mapped design’s properties. In addition,

a synthesis step could execute the mapping process

automatically.

Table 9. Tools in metabin FP: Industrial.

Provider | Tools | Focus | Abstraction | Web site
MathWorks | Simulink, State Flow | Modeling, algorithm design, and software development | S: Timed dataflow, FSMs | http://www.mathworks.com

Table 10. Tools in metabin FP: Languages.

Provider | Tools | Focus | Abstraction | Web site
Open SystemC Initiative | SystemC | Provide hardware-oriented constructs within the context of C++ | S: Transaction level to RTL | http://www.systemc.org
Object Management Group | Unified Modeling Language | Specify, visualize, and document software system models | S: Object-oriented, diagrams | http://www.uml.org
Accellera | SystemVerilog | Hardware description and verification language extension of Verilog | S: Transaction level, RTL, assertions | http://www.systemverilog.org

Table 11. Tools in metabin FM: Industrial.

Provider | Tools | Focus | Abstraction | Web site
Celoxica | DK Design Suite, Agility Compiler, Nexus-PDK | Algorithmic design entry, behavioral design, simulation, and synthesis | C: Handel-C based | http://www.celoxica.com
BlueSpec | BlueSpec Compiler, BlueSpec Simulator | BlueSpec SystemVerilog rules and libraries | S: SystemVerilog and term-rewriting synthesis | http://www.bluespec.com
I-Logix | Rhapsody and Statemate | Real-time UML-embedded applications | S: UML based | http://www.ilogix.com
Mentor Graphics | Catapult C | C++ to RTL synthesis | C: Untimed C++ | http://www.mentor.com
Esterel Technologies | SCADE, Esterel Studio | Code generation for safety-critical applications such as avionics and automotive | I: Synchronous | http://www.esterel-technologies.com
Calypto | SLEC System | Functional verification between system level and RTL | C: SystemC, RTL | http://www.calypto.com

Because this is a data-streaming application that maps onto a highly concurrent architecture, it is natural to use a Kahn process networks (KPN) representation. In KPN, a set of processes communicate through one-way FIFO channels. Reads from channels are blocked when no tokens are present; processes cannot query the channel status. However, this model is Turing complete, so scheduling and buffer sizes are undecidable. The KPN model of the JPEG encoder algorithm is completely independent of the target architecture, satisfying the requirements for this scenario. We could use Ptolemy II to capture this model and simulate the selected algorithm's behavior.
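The KPN semantics just described (one-way FIFO channels, blocking reads, no channel-status queries) can be sketched with ordinary threads and queues. This is an illustrative toy of our own, not part of Ptolemy II or any other tool mentioned here; the three-stage pipeline and the process names are invented:

```python
import threading
import queue

# Toy Kahn process network: producer -> transform -> collector.
# Each channel is a one-way FIFO; queue.get() blocks when no token is
# present, and no process ever queries channel status (no qsize checks).

def producer(out_ch):
    for pixel in [3, 1, 4, 1, 5]:    # stand-in for raw image data
        out_ch.put(pixel)
    out_ch.put(None)                 # end-of-stream token

def transform(in_ch, out_ch):
    while (tok := in_ch.get()) is not None:   # blocking read
        out_ch.put(tok * 2)                   # stand-in for encoding work
    out_ch.put(None)

def collector(in_ch, result):
    while (tok := in_ch.get()) is not None:
        result.append(tok)

c1, c2, result = queue.Queue(), queue.Queue(), []
threads = [threading.Thread(target=producer, args=(c1,)),
           threading.Thread(target=transform, args=(c1, c2)),
           threading.Thread(target=collector, args=(c2, result))]
for t in threads: t.start()
for t in threads: t.join()
print(result)   # [6, 2, 8, 2, 10]
```

Because every channel is a point-to-point FIFO and reads block, the result is the same under any thread scheduling; this determinism is what makes KPN attractive for mapping streaming applications onto concurrent architectures.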

To allow a better analysis and to refine the model toward implementation, we can map this model into another dataflow model, similar to cyclostatic dataflow,8 which permits only one writer per channel but allows multiple reader processes. For all channels, each reader process can read each data token exactly once. Also, this dataflow model allows limited forms of data-dependent communication. To enable the execution of multiple processes on a single processing element, this MoC supports multitasking. In particular, the system may suspend a process only between firings. Because of these restrictions, this MoC lets designers decide scheduling, buffer sizing, and mapping. It is easy to express the model in Ptolemy II and to describe it in Simulink or the Signal Processing Worksystem (SPW). This first step, mapping a more flexible model of the functionality into a more restricted one that is easier to implement and analyze, is critical in any system-level design.
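The one-writer, multiple-reader channel rule just described can likewise be sketched in a few lines. In this hypothetical BroadcastChannel (the class name and token values are ours, for illustration only), the single writer fans each token out to a private FIFO per reader, so every reader consumes every token exactly once:

```python
import queue

# Toy broadcast FIFO for the restricted dataflow model described above:
# one writer per channel, several reader processes, and every reader
# consumes every token exactly once (each reader gets its own FIFO copy).
class BroadcastChannel:
    def __init__(self, n_readers):
        self.fifos = [queue.Queue() for _ in range(n_readers)]

    def write(self, token):          # the single writer fans the token out
        for f in self.fifos:
            f.put(token)

    def read(self, reader_id):       # blocking, one-time consumption
        return self.fifos[reader_id].get()

ch = BroadcastChannel(n_readers=2)
for tok in ("dct", "quant", "huffman"):   # invented token names
    ch.write(tok)

# Both readers see the full token stream, each token exactly once.
print([ch.read(0) for _ in range(3)])  # ['dct', 'quant', 'huffman']
print([ch.read(1) for _ in range(3)])  # ['dct', 'quant', 'huffman']
```

Restricting channels this way trades the full generality of KPN for the ability to settle scheduling and buffer sizes at design time, which is exactly what the mapping step needs.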

Subsequently, the mapped specification becomes the functional representation for the diagram in Figure 1. So, the flow can continue at lower abstraction levels with tools in metabin FM for an integrated solution, or in bin F followed by bin M for a multitool solution. Because most of the architecture is fixed, an efficient, specialized approach is more appropriate. Figure 2a shows a potential traversal of the framework. For our JPEG case, we can map the functionality onto the MXP5800 using the Metropolis environment to analyze potential problems with the architecture or to optimize the application's coding for the chosen platform instance.

Table 12. Tools in metabin PM: Industrial, set I.

Provider | Tools | Focus | Abstraction | Web site
ARM | RealView MaxSim | Embedded microprocessors and development tools; system-level development tools | C: C++ ARM processor development | http://www.arm.com
Tensilica | Xtensa, XPRES | Programmable solutions with specialized Xtensa processor description from native C and C++ code | C: Custom ISA processor, C and C++ code | http://www.tensilica.com
Summit | System Architect, Visual Elite | Efficiently design and analyze the architecture and implementation of multicore SoCs and large-scale systems | C: SystemC | http://www.sd.com
VaST Systems Technology | Comet, Meteor | Very high-performance processor and architecture models | S: Virtual processor, bus, and peripheral devices | http://www.vastsystems.com
Virtio | Virtio Virtual Platform | High-performance software model of a complete system | I: Virtual platform models at SystemC level | http://www.virtio.com
Cadence | Incisive | Integrated tool platform for verification, including simulation, formal methods, and emulation | S: RTL and SystemC assertions | http://www.cadence.com
Mentor | Platform Express | XML-based integration environment | C: XML-based structure | http://www.mentor.com
SpiraTech | Cohesive | Protocol abstraction transformers | C: Transaction level, IP blocks | http://www.spiratech.com
ARC International | ARC | Embedded microprocessors and development tools | I: ISA extensions, microarchitectural level | http://www.arc.com
Arithmatica | CellMath Tool Suite | Proprietary improvements for implementing silicon computational units | I: Microarchitectural datapath computation elements and design | http://www.arithmatica.com

Table 13. Tools in metabin PM: Industrial, set II.

Provider | Tools | Focus | Abstraction | Web site
Target Compiler Technologies | Chess (compiler), Checkers (ISS) | Retargetable tool suite for developing, programming, and verifying embedded IP cores | I: Mapping of C code to processors written in nML | http://www.retarget.com
Arteris | Danube, NoCexplorer | Synthesis of NoC | C: NoC dataflow | http://www.arteris.net
ChipVision Design Systems | Orinoco | Pre-RTL power prediction for behavioral synthesis | C: SystemC algorithm input | http://www.chipvision.com
Wind River Systems | Various platform solutions | Provide various platforms for different design segments (auto, consumer) | I: Software API | http://www.windriver.com
CoWare | ConvergenSC | Capture, design, and verification for SystemC | S: SystemC functionality input; SystemC, HDL services | http://www.coware.com
Carbon Design Systems | VSP | Presilicon validation flow | C: Verilog and VHDL, bus protocols | http://www.carbondesignsystems.com
GigaScale IC | InCyte | Chip estimation and architecture analysis | S: High-level chip information (gate count, I/O, IP blocks) | http://www.chipestimate.com
Virtutech | Virtutech Simics | Build, modify, and program new virtual systems | I: C language and ISAs | http://www.virtutech.com
National Instruments | LabView 8 FPGA | Create custom I/O and control hardware for FPGAs | C: LabView graphical programming | http://www.ni.com/fpga
CoWare | LisaTek | Embedded-processor design tool suite | C: Lisa architecture description language | http://www.coware.com

Table 14. Tools in metabin PM: Academic.

Provider | Tools | Focus | Abstraction | Web site
Carnegie Mellon Univ. | MESH | Enable heterogeneous microdesign through new simulation, modeling, and design strategies | C: C input; programmable, heterogeneous multiprocessors | http://www.ece.cmu.edu/~mesh
Univ. of California, Los Angeles | xPilot | Automatically synthesize high-level behavioral descriptions for silicon platforms | C: C, SystemC | http://cadlab.cs.ucla.edu/soc

Table 15. Tools in metabin FPM: Industrial.

Provider | Tools | Focus | Abstraction | Web site
CoFluent Design | CoFluent Studio | Design space exploration through Y-chart modeling of functional and architectural models | S: Transaction-level SystemC | http://www.cofluentdesign.com
MLDesign Technologies | MLDesigner | Integrated platform for modeling and analyzing the architecture, function, and performance of high-level system designs | S: Discrete event, dynamic dataflow, and synchronous dataflow | http://www.mldesigner.com
Mirabilis Design | VisualSim product family | Multidomain simulation kernel and extensive modeling library | S: Discrete event, synchronous dataflow, continuous time, and FSM | http://www.mirabilisdesign.com
Synopsys | System Studio | Algorithm and architecture capture, performance evaluation | S: SystemC | http://www.synopsys.com

Table 16. Tools in metabin FPM: Academic.

Provider | Tools | Focus | Abstraction | Web site
Univ. of California, Berkeley | Metropolis | Operational and denotational functionality and architecture capture, mapping, refinement, and verification | S: All MoCs | http://www.gigascale.org/metropolis
Seoul National Univ. | Peace | Codesign environment for rapid development of heterogeneous digital systems | S: Object-oriented C++ kernel (Ptolemy based) | http://peace.snu.ac.kr
Vanderbilt Univ. | GME, Great, Desert | Metaprogrammable tool for navigating and pruning large design spaces | S: Graph transformation, UML and XML based, and external component support | http://repo.isis.vanderbilt.edu
Delft Univ. of Technology | Artemis, Compaan and Laura, Sesame, Spade | Workbench enabling methods and tools to model applications and SoC-based architectures | C: Kahn process networks | http://ce.et.tudelft.nl/artemis

Scenario 2: New integration platform development
This scenario describes the development of a new integration platform: a hardware architecture, embedded-software architecture, design methodologies (authoring and integration), design guidelines and modeling standards, virtual-component characterization and support, and design verification (hardware-software, hardware prototype), focusing on a particular target application.9 Unlike the first scenario, this one is not concerned with the design of a particular application but rather with the development of a substrate to realize several applications. Characteristic of this scenario are the service- and mapping-centric requirements that concern tools in metabin PM for development and analysis at the desired abstraction level. The platform developer builds the substrate, or platform, and uses the tools in metabin PM. The platform user proceeds in metabin FM to map the desired functionality to the selected platform instance. Figure 2b illustrates the metabin flows that support these development requirements.

Consider as a test case the development of a new electronic control unit (ECU) platform for an automotive engine controller. The application designers have already developed the application code for the platform, but a Tier 1 supplier wants to improve the cost and performance of its part of the platform to avoid losing an important original equipment manufacturer (OEM) customer. If the designers employ the paradigm described in this article, the application becomes as independent of the ECU platform as possible. Next, in collaboration with a Tier 2 supplier (a chip maker), the Tier 1 supplier determines qualitatively that a dual-core architecture would offer better performance at a lower manufacturing cost. A platform designer then uses a tool for platform development, such as LisaTek, to capture the dual-core architecture. If the dual core is based on ARM processing elements, the designers and the Tier 1 supplier can also use ARM models and tool chains. An appropriate new real-time operating system could exploit the implementation's multicore nature. At this point, the designers map the application onto one of the possible dual-core architectures, considering the number of bits supported by the CPU, the set of peripherals to integrate, and the interconnect structure. For each choice, the designers simulate the mapped design with the engine control software, or a subset of it, to stress the architecture. These simulations can employ the ARM tools or VaST offerings to rapidly obtain important statistics such as interconnect latency and bandwidth, overall system performance, and power consumption. At the end of this exercise, the Tier 2 supplier is fairly confident that its architecture is capable of supporting a full-fledged engine control algorithm. Any other Tier 1 supplier can now use this product for its engine control offering.


[Figure 2 shows three annotated Y-diagrams (functionality, platform, mapping) with steps and options such as "Step 1. FPM-based tool," "Option 1a. F (multitools)," "Option 1b. M (multitools)," "Option 2. FM (integrated tools)," "Step 1. PM-based tool," "Step 2. FM (augmented functionality)," "Option 1. P-tool at appropriate abstraction level," and "Option 2. FP with synthesis to lower-level flows."]

Figure 2. Metabins and hierarchical levels for three design scenarios: new application design from specification (a), new integration platform development (b), and legacy design integration (c).

Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:57:34 UTC from IEEE Xplore. Restrictions apply.


Scenario 3: Legacy design integration

The final scenario represents a common situation for many companies wishing to integrate their existing designs into new ESL flows. In this case, it's difficult to separate functionality and architecture, because in most embedded systems the documentation refers to the final implementation, not to its original specifications and the relative implementation choices. If modifying the design is necessary to implement additional features, it's very difficult to determine how the new functionality will affect the existing design. This situation calls for reverse engineering to extract functionality from the final implementation. The most effective way to do this might be to start the description of the functionality from scratch, using tools in bin F. An alternative might be an effective encapsulation of the legacy part of the design so that the new part interacts cleanly with the legacy part. We could then consider existing components as architectural elements that we must describe using tools in bin P. This, in turn, is possible at different abstraction levels. Because legacy components typically support a specific application, mapping is often unnecessary, and functional or architectural cosimulation can validate a new design. Metabin FP at the system level is therefore the appropriate flow model in this case. Figure 2c illustrates this scenario.

ESL WILL EVENTUALLY BE in the limelight of the design arena. But structural conditions in the EDA and electronics industry must change to offer a sufficiently receptive environment that will allow the birth of new companies and the evolution of present ones into this exciting area. An important technical prerequisite is industry and academia agreement on a holistic view of the design process in which to cast existing and future tools and flows. Our unified design framework can act as a unifying element in the ESL domain. However, standardization of system-level design will take years and require significant effort to fully materialize. ■

Acknowledgments

We thank the following for their support in reviewing this article and in helping to classify the various ESL approaches. Without them, this article would not have been possible: Abhijit Davare, Alessandro Pinto, Alvise Bonivento, Cong Liu, Gerald Wang, Haibo Zeng, Jike Chong, Kaushik Ravindran, Kelvin Lwin, Mark McKelvin, N.R. Satish, Qi Zhu, Simone Gambini, Wei Zheng, Will Plishker, Yang Yang, and Yanmei Li. A special thanks goes to Guang Yang and Trevor Meyerowitz for their valuable feedback. This work was done under partial support from the Center for Hybrid Embedded Software Systems and the Gigascale Systems Research Center.

References

1. G. Smith et al., Report on Worldwide EDA Market Trends, Gartner Dataquest, Dec. 2005.
2. J. Vleeschouwer and W. Ho, "The State of EDA: Just Slightly up for the Year to Date Technical and Design Software," The State of the Industry, Merrill Lynch report, Dec. 2005.
3. International Technology Roadmap for Semiconductors 2004 Update: Design, 2004, http://www.itrs.net/Links/2004Update/2004_01_Design.pdf.
4. A. Sangiovanni-Vincentelli, "Defining Platform-Based Design," EE Times, Feb. 2002, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=16504380.
5. A. Sangiovanni-Vincentelli and G. Martin, "Platform-Based Design and Software Design Methodology for Embedded Systems," IEEE Design & Test, vol. 18, no. 6, Nov.-Dec. 2001, pp. 23-33.
6. D.D. Gajski and R.H. Kuhn, "Guest Editors' Introduction: New VLSI Tools," Computer, vol. 16, no. 12, Dec. 1983, pp. 11-14.
7. A. Davare et al., "JPEG Encoding on the Intel MXP5800: A Platform-Based Design Case Study," Proc. 3rd Workshop Embedded Systems for Real-Time Multimedia (ESTIMedia 05), IEEE CS Press, 2005, pp. 89-94.
8. G. Bilsen et al., "Cyclo-Static Dataflow," IEEE Trans. Signal Processing, vol. 44, no. 2, Feb. 1996, pp. 397-408.
9. H. Chang et al., Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1999.

Douglas Densmore is a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests focus on system-level architecture modeling, with emphasis on architecture refinement techniques for system-level design. Densmore has a BS in computer engineering from the University of Michigan, Ann Arbor, and an MS in electrical engineering from the University of California, Berkeley. He is a member of the IEEE.


Roberto Passerone is an assistant professor in the Department of Information and Communication Technology at the University of Trento, Italy. His research interests include system-level design, communication design, and hybrid systems. Passerone has a Laurea degree in electrical engineering from Politecnico di Torino, Italy, and an MS and a PhD in electrical engineering and computer sciences from the University of California, Berkeley. He is a member of the IEEE.

Alberto Sangiovanni-Vincentelli holds the Buttner Endowed Chair of the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. His research interests include design tools and methodologies, large-scale systems, embedded controllers, and hybrid systems. Sangiovanni-Vincentelli has a PhD in engineering from Politecnico di Milano. He is cofounder of Cadence and Synopsys, an IEEE Fellow, a member of the General Motors Scientific and Technology Advisory Board, and a member of the National Academy of Engineering.

Direct questions or comments about this article to Douglas Densmore, Dept. of Electrical Engineering and Computer Sciences, Univ. of California, Berkeley, 545Q Cory Hall (DOP Center), Berkeley, CA 94720; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.




0740-7475/06/$20.00 © 2006 IEEE Copublished by the IEEE CS and the IEEE CASS September–October 2006

The Challenges of Synthesizing Hardware from C-Like Languages

Stephen A. Edwards
Columbia University

Editor's note: This article presents one side of an ongoing debate on the appropriateness of C-like languages as hardware description languages. The article examines various features of C and their mapping to hardware, and makes a cogent argument that vanilla C is not the right language for hardware description if synthesis is the goal.
—Sandeep K. Shukla, Virginia Polytechnic and State University

THE MAIN REASON people have proposed C-like languages for hardware synthesis is familiarity. Proponents claim that by synthesizing hardware from C, we can effectively turn every C programmer into a hardware designer. Another common motivation is hardware-software codesign: Designers often implement today's systems as a mix of hardware and software, and it's often unclear at the outset which portions can be hardware and which can be software. The claim is that using a single language for both simplifies the migration task.

I argue that these claims are questionable and that pure C is a poor choice for specifying hardware. On the contrary, the semantics of C and similar imperative languages are distant enough from hardware that C-like thinking might be detrimental to hardware design. Instead, successful hardware synthesis from C seems to involve languages that vaguely resemble C, mostly its syntax. Examples of these languages include Celoxica's Handel-C1 and NEC's Behavior Description Language (BDL).2 You can think of executing C code on a traditional sequential processor as synthesizing hardware from C, but the techniques presented here strive for more highly customized implementations that exploit greater parallelism, hardware's main advantage.

Unfortunately, the C language has no support for user-specified parallelism, and so either the synthesis tool must find it (a difficult task) or the designer must use language extensions and insert explicit parallelism. Neither solution is satisfactory, and the latter requires that C programmers think differently to design hardware.

My main point is that giving C programmers tools is not enough to turn them into reasonable hardware designers. Efficient hardware is usually very difficult to describe in an unmodified C-like language, because the language inhibits specification or automatic inference of adequate concurrency, timing, types, and communication. The most successful C-like languages, in fact, bear little semantic resemblance to C, effectively forcing users to learn a new language (but perhaps not a new syntax). As a result, techniques for synthesizing hardware from C either generate inefficient hardware or propose a language that merely adopts part of C syntax.

Here, I focus only on the use of C-like languages for hardware synthesis and deliberately omit discussion of other important uses of a design language, such as validation and algorithm exploration. C-like languages are far more compelling for these tasks, and one in particular, SystemC, is now widely used, as are many ad hoc variants.

A short history of C

Dennis Ritchie developed C in the early 1970s,3 based on experience with Ken Thompson's B language, which had evolved from Martin Richards' Basic Combined Programming Language (BCPL). Ritchie described all three as "close to the machine" in the sense that their abstractions are similar to data types and operations supplied by conventional processors.

A core principle of BCPL is its memory model: an undifferentiated array of words. BCPL represents integers, pointers, and characters all in a single word; the language is effectively typeless. This made perfect sense on the word-addressed machines BCPL was targeting, but it wasn't acceptable for the byte-addressed PDP-11 on which C was first developed.

Ritchie modified BCPL's word array model to add the familiar character, integer, and floating-point types now supported by virtually every general-purpose processor. Ritchie considered C's treatment of arrays to be characteristic of the language. Unlike other languages that have explicit array types, arrays in C are almost a side effect of its pointer semantics. Although this model leads to simple, efficient implementations, Ritchie observed that the prevalence of pointers in C means that compilers must use careful dataflow techniques to avoid aliasing problems while applying optimizations.
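To make the aliasing problem concrete, here is a minimal C sketch (my own illustration, not from the article; the function name is invented). Because the two pointers may refer to the same object, the compiler cannot keep *a in a register across the store through b:

```c
#include <assert.h>

/* If a and b may point to the same int, the compiler must assume the
   store through b can change *a, so it must reload *a from memory
   afterward instead of keeping it cached in a register. */
void bump_both(int *a, int *b)
{
    *a = *a + 1;
    *b = *b + 1;   /* may modify *a if a == b */
    *a = *a + 1;   /* *a must be re-read here */
}
```

Calling the function with distinct objects gives one result, and with aliased pointers another; this is exactly the ambiguity a dataflow analysis must resolve before optimizing.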

Ritchie listed a number of infelicities in the language caused by historical accident. For example, the use of break to separate cases in switch statements arose because Ritchie copied an early version of BCPL; later versions used endcase. The precedence of bitwise-AND is lower than that of the equality operator because the logical-AND operator was added later.
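That precedence accident still bites C programmers today. A small sketch (mine, not the article's) shows how the parse differs from what the code appears to say:

```c
#include <assert.h>

/* Because == binds tighter than &, "x & 1 == 0" parses as
   x & (1 == 0), which is x & 0 -- always 0, never an evenness test. */
int looks_like_even_test(int x)
{
    return x & 1 == 0;      /* actually x & (1 == 0) */
}

int real_even_test(int x)
{
    return (x & 1) == 0;    /* parenthesized: true for even x */
}
```

The first function returns 0 for every input; only the parenthesized version tests the low bit as intended.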

Many aspects of C are greatly simplified from their BCPL counterparts because of limited memory on the PDP-11 (24 Kbytes, of which 12 Kbytes were devoted to the nascent Unix kernel). For example, BCPL allowed the embedding of arbitrary control flow statements within expressions. This facility doesn't exist in C, because limited memory demanded a one-pass compiler.

Thus, C has at least four defining characteristics: a set of types that correspond to what the processor directly manipulates, pointers instead of a first-class array type, several language constructs that are historical accidents, and many others that are due to memory restrictions.

These characteristics are well-suited to systems software programming, C's original application. C compilers have always produced efficient code because the C semantics closely match the instruction set of most general-purpose processors. This also makes it easy to understand the compilation process. Programmers routinely use this knowledge to restructure source code for efficiency. Moreover, C's type system, while generally very helpful, is easily subverted when needed for low-level access to hardware.

These characteristics are troublesome for synthesizing hardware from C. Variable-width integers are natural in hardware, yet C supports only four sizes, all larger than a byte. C's memory model is a large, undifferentiated array of bytes, yet hardware is most effective with many small, varied memories. Finally, modern compilers can assume that available memory is easily 10,000 times larger than that available to Ritchie.
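The width mismatch is easy to illustrate. A hardware block with, say, a 12-bit datapath has no matching C type, so a designer or synthesis tool must emulate the width by masking a wider type, as in this hypothetical sketch (names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* C offers no 12-bit integer, so a 12-bit accumulator must be modeled
   with a wider type masked back to 12 bits after every operation --
   extra work that a true 12-bit hardware type would make implicit. */
#define MASK12 0xFFFu

uint16_t add12(uint16_t a, uint16_t b)
{
    return (uint16_t)((a + b) & MASK12);   /* wrap modulo 2^12 */
}
```

A synthesis tool must recognize such masking idioms to recover the intended 12-bit datapath from the 16-bit arithmetic.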

C-like hardware synthesis languages

Table 1 lists some of the C-like hardware languages proposed since the late 1980s (see also De Micheli4). One of the earliest was Cones, from Stroud et al.5 From a strict subset of C, it synthesized single functions into combinational blocks. Figure 1 shows such a function. Cones could handle conditionals; loops, which it unrolled; and arrays treated as bit vectors.
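To see what Cones-style unrolling amounts to, here is a plain-C sketch (my own, not Cones output) of the rd53 bit count with its loop collapsed into one flat expression, the form a combinational circuit evaluates all at once:

```c
/* The five-iteration counting loop of rd53 unrolled into a single
   flat expression; in hardware, all five comparisons and the adder
   tree evaluate in parallel as combinational logic. */
int rd53_unrolled(const int in[5])
{
    return (in[0] == 1) + (in[1] == 1) + (in[2] == 1)
         + (in[3] == 1) + (in[4] == 1);   /* 3-bit count of 1s */
}
```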

Ku and De Micheli developed HardwareC6 for input to their Olympus synthesis system.7 It is a behavioral hardware language with a C-like syntax and has extensive support for hardware-like structure and hierarchy.


Performance or bust

Throughout this article, I assume that optimizing performance—for example, speed under area and power constraints—is the main goal of hardware synthesis (beyond, of course, functional correctness). This assumption implicitly shapes all my criticisms of using C for hardware synthesis and should definitely be considered carefully.

On the one hand, performance optimization has obvious economic advantages: An efficient circuit solves problems faster, is cheaper to manufacture, requires less power, and so forth. Historically, this has been the key focus of logic synthesis, high-level synthesis, and other automated techniques for generating circuits.

On the other hand, optimization can have disadvantages such as design time and nonrecurring engineering costs. The distinction between full-custom ICs and ASICs illustrates this. A company like Intel, for example, is willing to invest an enormous number of hours in designing and hand-optimizing its next microprocessor's layout because of the volume and margins the company commands. A company like Cisco, however, might implement its latest high-end router on an FPGA because it doesn't make economic sense to design a completely new chip. Both approaches are reasonable.

A key question, then, is: What class of problems does hardware synthesis from C really target? This article assumes an audience of traditional hardware designers who want to design hardware more quickly, but other articles target designers who would otherwise implement their designs in software but need faster results. The soundness of my conclusions may well depend on which side of this fence you're on.


Figure 2 shows the greatest common divisor (GCD) algorithm in HardwareC.

Galloway's Transmogrifier C is a fairly small C subset that supports integer arithmetic, conditionals, and loops.8 Unlike Cones, it generates sequential designs by inferring a state at function calls and at the beginning of while loops. Figure 3 shows a decoder in Transmogrifier C.


Table 1. C-like languages for hardware synthesis.

Language          Comment
Cones             Early, combinational only
HardwareC         Behavioral synthesis centered
Transmogrifier C  Limited scope
SystemC           Verilog in C++
Ocapi             Algorithmic structural descriptions
C2Verilog         Comprehensive
BDL               Many extensions and restrictions (NEC)
Handel-C          C with CSP (Celoxica)
SpecC             Resolutely refinement based
Bach C            Untimed semantics (Sharp)
CASH              Synthesizes asynchronous circuits
Catapult C        ANSI C++ subset (Mentor Graphics)

INPUTS: IN[5];
OUTPUT: OUT[3];
rd53()
{
    int count, i;
    count = 0;
    for (i=0 ; i<5 ; i++)
        if (IN[i] == 1)
            count = count + 1;
    for (i=0 ; i<3 ; i++) {
        OUT[i] = count & 0x01;
        count = count >> 1;
    }
}

Figure 1. A function that returns a count of the number of 1's in a five-bit vector in Cones. The function is translated into a combinational circuit.

#define SIZE 8
process gcd (xi, yi, rst, ou)
    in port xi[SIZE], yi[SIZE];
    in port rst;
    out port ou[SIZE];
{
    boolean x[SIZE], y[SIZE];

    write ou = 0;
    if ( rst ) <
        x = read(xi);
        y = read(yi);
    >
    if ((x != 0) & (y != 0))
        repeat {
            while (x >= y)
                x = x - y;
            <
                x = y; /* swap x and y */
                y = x;
            >
        } until (y == 0);
    else
        x = 0;
    write ou = x;
}

Figure 2. Greatest common divisor algorithm in HardwareC. Statements within a < > block run in parallel; statements within a { } block execute in parallel when data dependencies allow.

#pragma intbits 8
seven_seg(x)
#pragma intbits 4
int x;
{
#pragma intbits 8
    int result;
    x = x & 0xf; result = 0;
    if (x == 0x0) result = 0xfc; if (x == 0x1) result = 0x60;
    if (x == 0x2) result = 0xda; if (x == 0x3) result = 0xf2;
    if (x == 0x4) result = 0x66; if (x == 0x5) result = 0xb6;
    if (x == 0x6) result = 0xbe; if (x == 0x7) result = 0xe0;
    if (x == 0x8) result = 0xfe; if (x == 0x9) result = 0xf6;
    return(~result);
}

twodigit(y)
int y;
{
    int tens;
    int leftdigit, rightdigit;
    outputport(leftdigit, 37, 44, 40, 29, 35, 36, 38, 39);
    outputport(rightdigit, 41, 51, 50, 45, 46, 47, 48, 49);
    tens = 0;
    while (y >= 10) {
        tens++;
        y -= 10;
    }
    leftdigit = seven_seg(tens);
    rightdigit = seven_seg(y);
}

Figure 3. Two-digit decimal-to-seven-segment decoder in Transmogrifier C. Output-port declarations assign pin numbers.


SystemC is a C++ dialect that supports hardware and system modeling.9 Its popularity stems mainly from its simulation facilities (it provides concurrency with lightweight threads), but a subset of the language can be synthesized. SystemC uses the C++ class mechanism to model hierarchical structure and describes hardware through combinational and sequential processes, much as Verilog and VHDL do. Cynlib, from Forte Design Systems, is similar. Figure 4 shows a decoder in SystemC.

The Ocapi system from IMEC (the Interuniversity Microelectronics Center in Belgium) is also C++ based but takes a different approach.10 Instead of being parsed, analyzed, and synthesized, the C++ program is run to generate in-memory data structures that represent the hardware system's structure. Supplied classes provide mechanisms for specifying data paths, finite-state machines (FSMs), and similar constructs. These data structures are then translated into languages such as Verilog and passed to conventional synthesis tools. Figure 5 shows an FSM in Ocapi.

The C2Verilog compiler developed at CompiLogic (later called C Level Design and, since November 2001, part of Synopsys) is one of the few compilers that can claim broad support of ANSI C. It can translate pointers, recursion, dynamic memory allocation, and other thorny C constructs. Panchul, Soderman, and Coleman hold a broad patent covering C-to-Verilog-like translation, which describes their compiler in detail.11

NEC's Cyber system accepts BDL.2 Like HardwareC, Cyber is targeted at behavioral synthesis. BDL has been in industrial use for many years and deviates greatly from ANSI C by including processes with I/O ports, hardware-specific types and operations, explicit clock cycles, and many synthesis-related pragmas.

Celoxica's Handel-C is a C variant that extends the language with constructs for parallel statements and Occam-like rendezvous communication.1 Handel-C's timing model is uniquely simple: Each assignment statement takes one cycle. Figure 6 shows a four-place buffer in Handel-C.

Gajski et al.'s SpecC language12 is a superset of ANSI C, augmented with many system- and hardware-modeling constructs, including constructs for FSMs, concurrency, pipelining, and structure. The latest language reference manual lists 33 new keywords.13 SpecC imposes a refinement methodology. Thus, the entire language is not directly synthesizable, but a series of manual and automated rewrites can refine a SpecC description into one that can be synthesized. Figure 7 shows a state machine described in a synthesizable RTL dialect of SpecC.


#include "systemc.h"
#include <stdio.h>

struct decoder : sc_module {
    sc_in<sc_uint<4> > number;
    sc_out<sc_bv<7> > segments;

    void compute() {
        static sc_bv<7> codes[10] = {
            0x7e, 0x30, 0x6d, 0x79, 0x33,
            0x5b, 0x5f, 0x70, 0x7f, 0x7b };
        if (number.read() < 10)
            segments = codes[number.read()];
    }

    SC_CTOR(decoder) {
        SC_METHOD(compute);
        sensitive << number;
    }
};

struct counter : sc_module {
    sc_out<sc_uint<4> > tens;
    sc_out<sc_uint<4> > ones;
    sc_in_clk clk;

    void tick() {
        int one = 0, ten = 0;
        for (;;) {
            if (++one == 10) {
                one = 0;
                if (++ten == 10) ten = 0;
            }
            ones = one;
            tens = ten;
            wait();
        }
    }

    SC_CTOR(counter) {
        SC_CTHREAD(tick, clk.pos());
    }
};

Figure 4. A two-digit, decimal-to-seven-segment decoder in SystemC. The decoder produces combinational logic; the counter produces sequential logic.

fsm f;
initial s0;
state s1;

s0 << always     << sfg1 << s1;
s1 << cnd(eof)   << sfg2 << s1;
s1 << !cnd(eof)  << sfg3 << s0;

[State diagram omitted: S0 moves to S1 emitting sfg1; S1 loops on eof emitting sfg2 and returns to S0 on !eof emitting sfg3.]

Figure 5. FSM described in Ocapi. This is a declarative style executed to build data structures for synthesis rather than compiled in the traditional sense.


Like Handel-C, Sharp's Bach C is an ANSI C variant with explicit concurrency and rendezvous communication.14 However, Bach C only imposes sequencing rather than assigning a particular number of cycles to each operation. Also, although it supports arrays, Bach C does not support pointers.

Budiu and Goldstein's CASH compiler is unique among the C synthesizers because it generates asynchronous hardware.15 It accepts ANSI C, identifies instruction-level parallelism (ILP), and generates an asynchronous dataflow circuit.

Mentor Graphics' recent (2004) Catapult C performs behavioral synthesis from an ANSI C++ subset. Because it is a commercial product, details of its features and limitations are not publicly available. However, it appears to be a strict subset of ANSI C++ (that is, with few, if any, language extensions).

Concurrency

The biggest difference between hardware and software is the execution model. Software follows a sequential, memory-based execution model derived from Turing machines, whereas hardware is fundamentally concurrent. Thus, sequential algorithms that are efficient in software are rarely the best choice in hardware. This has serious implications for software programmers designing hardware—their familiar toolkit of algorithms is suddenly far less useful.
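As a concrete illustration (my own, not from the article), consider finding the maximum of eight values. The natural software loop is a chain of seven dependent comparisons, while a hardware designer would build a balanced comparison tree only three levels deep, with the comparisons at each level running in parallel:

```c
int max2(int a, int b) { return a > b ? a : b; }

/* Software habit: a sequential scan, 7 dependent steps deep. */
int max_sequential(const int v[8])
{
    int m = v[0];
    for (int i = 1; i < 8; i++)
        m = max2(m, v[i]);
    return m;
}

/* Hardware habit: a balanced tree, log2(8) = 3 levels deep;
   the comparisons within each level are independent. */
int max_tree(const int v[8])
{
    int a = max2(v[0], v[1]), b = max2(v[2], v[3]);
    int c = max2(v[4], v[5]), d = max2(v[6], v[7]);
    return max2(max2(a, b), max2(c, d));
}
```

Both compute the same answer; only the shape of the dependency graph, and hence the achievable parallelism, differs.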

Why is so little software developed for parallel hardware? The plummeting cost of parallel hardware would make such software appear attractive, yet concurrent programming has had limited success compared with its sequential counterpart. One fundamental reason is that humans have difficulty conceiving of parallel algorithms, and thus many more sequential algorithms exist. Another problem is disagreement about the preferred parallel-programming model (for example, shared memory versus message passing), as demonstrated by the panoply of parallel-programming languages, none of which has emerged as a clear winner.

const dw = 8;

void main(chan (in) c4 : dw, chan (out) c0 : dw)
{
    int d0, d1, d2, d3;
    chan c1, c2, c3;

    void e0() { while (1) { c1 ? d0; c0 ! d0; } }
    void e1() { while (1) { c2 ? d1; c1 ! d1; } }
    void e2() { while (1) { c3 ? d2; c2 ! d2; } }
    void e3() { while (1) { c4 ? d3; c3 ! d3; } }

    par {
        e0(); e1(); e2(); e3();
    }
}

Figure 6. Four-place buffer in Handel-C. The ? and ! operators are CSP-inspired receive and transmit operators.

behavior even(in event clk, in unsigned bit[1] rst,
              in bit[31:0] Inport, out bit[31:0] Outport,
              in bit[1] Start, out bit[1] Done,
              out bit[31:0] idata, in bit[31:0] iocount,
              out bit[1] istart, in bit[1] idone,
              in bit[1] ack_istart, out bit[1] ack_idone)
{
    void main(void) {
        bit[31:0] ocount;
        bit[31:0] mask;
        enum state { S0, S1, S2, S3 } state;

        state = S0;

        while (1) {
            wait(clk);
            if (rst == 1b) state = S0;
            switch (state) {
            case S0:
                Done = 0b;
                istart = 0b;
                ack_idone = 0b;
                if (Start == 1b) state = S1;
                else state = S0;
                break;
            case S1:
                mask = 0x0001;
                idata = Inport;
                istart = 1b;
                if (ack_istart == 1b) state = S2;
                else state = S1;
                break;
            case S2:
                istart = 0b;
                ocount = iocount;
                if (idone == 1b) state = S3;
                else state = S2;
                break;
            case S3:
                Outport = ocount & mask;
                ack_idone = 1b;
                Done = 1b;
                if (idone == 0) state = S0;
                else state = S3;
                break;
            }
        }
    }
};

Figure 7. State machine in a synthesizable RTL dialect of SpecC. The wait(clk) statement denotes a clock cycle boundary.

Rather than exposing concurrency to the programmer and encouraging the use of parallel algorithms, the more successful approach has been to automatically expose parallelism in sequential code. Because C does not naturally support user-specified concurrency, such a technique is virtually mandatory for synthesizing efficient hardware from plain C. Unfortunately, these techniques are limited.

Finding parallelism in sequential code

There are three main approaches to exposing parallelism in sequential code, distinguished by their granularity. Instruction-level parallelism (ILP) dispatches groups of nearby instructions simultaneously. Although this has become the preferred approach in the computer architecture community, programmers recognize that there are fundamental limits to the amount of ILP that can be exposed in typical programs.16 Adding hardware to approach these limits, usually through speculation, results in diminishing returns.
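A small sketch (illustrative only; the function names are mine) of what an ILP-extracting tool looks for in sequential code:

```c
/* x and y share no operands, so a scheduler can dispatch both adds
   in the same cycle; the final multiply must wait for both. */
int two_wide(int a, int b, int c, int d)
{
    int x = a + b;   /* independent of y */
    int y = c + d;   /* independent of x */
    return x * y;    /* joins the two parallel results */
}

/* Here every statement consumes the previous result: no ILP to find,
   no matter how much hardware is thrown at the problem. */
int chain(int a)
{
    int x = a + 1;
    x = x * 3;
    x = x - 2;
    return x;
}
```

The dependency chains, not the instruction count, bound how much such a tool can parallelize.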

The second approach, pipelining, requires less hardware than ILP but can be less effective. A pipeline dispatches instructions in sequence but overlaps them—the second instruction starts before the first completes. Like ILP, interinstruction dependencies and control-flow transfers tend to limit the maximum amount of achievable parallelism. Pipelines work well for regular loops, such as those in scientific or signal-processing applications, but are less effective in general.
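The contrast can be sketched in C (a hypothetical example of mine): the first loop's iterations are independent and pipeline cleanly, while the second carries a value from one iteration to the next, so its stages cannot fully overlap.

```c
/* Independent iterations: iteration i never reads what iteration i-1
   wrote, so successive iterations can overlap in a pipeline. */
void scale3(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * 3;
}

/* Loop-carried dependency: each iteration needs the previous sum,
   which limits how deeply a pipeline can overlap the iterations. */
int sum_all(const int *in, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += in[i];
    return sum;
}
```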

The third approach, process-level parallelism, dispatches multiple threads of control simultaneously. This approach can be more effective than finer-grained parallelism, depending on the algorithm, but process-level parallelism is difficult to identify automatically. Hall et al. attempt to invoke multiple iterations of outer loops simultaneously,17 but unless the code is written to avoid dependencies, this technique might not be effective. Exposing process-level parallelism is thus usually the programmer's responsibility. Such parallelism is usually controlled through the operating system (for example, Posix threads) or the language itself (for example, Java).

Approaches to concurrency

The C-to-hardware compilers considered here take either of two approaches to concurrency. The first approach adds parallel constructs to the language, thereby forcing the programmer to expose most of the concurrency. SystemC, BDL, and Ocapi all provide process-level parallel constructs. HardwareC, Handel-C, SpecC, and Bach C additionally provide statement-level parallel constructs. SystemC's parallelism resembles that of standard hardware description languages (HDLs) such as Verilog, in which a system is a collection of clock-edge-triggered processes. HardwareC, Handel-C, SpecC, and Bach C's approaches are more like software, providing constructs that dispatch collections of instructions in parallel.

The other approach lets the compiler identify parallelism. Although the languages that provide parallel constructs also identify some parallelism, Cones, Transmogrifier C, C2Verilog, Catapult C, and CASH rely on the compiler to expose all possible parallelism. The Cones compiler takes the most extreme approach, flattening an entire C function with loops and conditionals into a single two-level combinational function evaluated in parallel. The CASH compiler takes an approach closer to compilers for VLIW processors, carefully examining interinstruction dependencies and scheduling instructions to maximize parallelism. None of these compilers attempts to identify process-level parallelism.

Both approaches have drawbacks. The latter approach places the burden on the compiler and therefore limits the parallelism achievable with normal, sequential algorithms. Although carefully selecting easily parallelized algorithms could mitigate this problem, such thinking is foreign to most software programmers and may be more difficult than thinking in an explicitly concurrent language.

The former approach, by adding parallel constructs to C, introduces a fundamental and far-reaching change to the language, again demanding substantially different thinking by the programmer. Even for a programmer experienced in concurrent programming with, say, Posix threads, the parallel constructs in hardware-like languages differ greatly from the thread-and-shared-memory concurrency model typical of software.

A good hardware specification language must be able to express parallel algorithms, because they are the most efficient for hardware. Its inherent sequentiality and often undisciplined use of pointers make C a poor choice for this purpose.

Which concurrency model the next hardware design language should employ remains an open question, but the usual software model—asynchronously running threads communicating through shared memory—is clearly not the one.

Electronic System-Level Design

380 IEEE Design & Test of Computers

Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:57:37 UTC from IEEE Xplore. Restrictions apply.


Timing

The C language is mute on the subject of time. It guarantees causality among most sequences of statements but says nothing about the amount of time it takes to execute each sequence. This flexibility simplifies life for compilers and programmers alike but makes it difficult to achieve specific timing constraints. C's compilation technique is transparent enough to make gross performance improvements easy to understand and achieve, and differences in the efficiency of sequential algorithms are a well-studied problem. Nevertheless, wringing another 5% speedup from an arbitrary piece of code can be difficult.

Achieving a performance target is fundamental to hardware design. Miss a timing constraint by a few percentage points, and the circuit will fail to operate or the product will fail to sell. Achieving a performance target under power and cost constraints is usually the only reason to implement a particular function in hardware rather than using an off-the-shelf processor. Thus, an adequate hardware specification technique needs mechanisms for specifying and achieving timing constraints.

This disparity leads to yet another fundamental question in using C-like languages for hardware design: where to put the clock cycles. Figure 8 shows a program fragment that is interpreted in at least three different ways by different compilers. Most of the compilers described here generate synchronous logic in which the clock cycle boundaries have been defined. There are only two exceptions: Cones and CASH. Cones only generates combinational logic; CASH generates self-timed logic.

Compilers use various techniques for inserting clock cycle boundaries, which range from fully explicit to fully implicit. Ocapi's clocks are the most explicit. The designer specifies explicit state machines, and each state gets a cycle. At some point in the SpecC refinement flow, the state machines are also explicit, although clock boundaries might not be explicit earlier in the flow. The clocks in the Cones system are also explicit, but in an odd way—because Cones generates only combinational logic, clocks are implicit at function boundaries. SystemC's clock boundaries are also explicit; as in Cones, the clock boundaries of combinational processes are at the edges, and in sequential processes, explicit wait statements delay a prescribed number of cycles. BDL takes a similar approach.

HardwareC lets the user specify clock constraints, an approach common in high-level synthesis tools. For example, the user can require that three particular statements should execute in two cycles. This presents a greater challenge to the compiler and is sometimes more subtle for the designer, but it allows flexibility that can lead to a better design. Bach C takes a similar approach.

Like HardwareC, the C2Verilog compiler also inserts cycles using fairly complex rules and provides mechanisms for imposing timing constraints. Unlike HardwareC, however, these constraints are outside the language.

Transmogrifier C and Handel-C use fixed implicit rules for inserting clocks. Handel-C's are the simplest: each assignment and delay statement takes one cycle; everything else executes in the same clock cycle. Transmogrifier C's rules are nearly as simple: each loop iteration and function call takes a cycle. Unfortunately, such simple rules can make it difficult to achieve a particular timing constraint. To speed up a Handel-C specification, assignment statements might require fusing, and Transmogrifier C might require loops to be manually unrolled.

The ability to specify or constrain detailed timing in hardware is another fundamental requirement. Whereas slow software is an annoyance, slow hardware is a disaster. When something happens in hardware is usually as important as what happens. This is another big philosophical difference between software and hardware, and again hardware requires different skills.

A good hardware specification language needs the ability to specify detailed timing, both explicitly and through constraints, but should not demand that the programmer provide too many details. The best-effort model of software is inadequate by itself.

for (i = 0; i < 8; i++) {
    a[i] = c[i];
    b[i] = d[i] || f[i];
}

Figure 8. It is not clear how many cycles it should take to execute this (contrived) loop written in C. Cones does it in one (it is combinational), Transmogrifier C chooses eight (one per iteration), and Handel-C chooses 25 (one per assignment). Others, such as HardwareC, allow the user to specify the number.

Types

Data types are another central difference between hardware and software languages. The most fundamental type in hardware is a single bit traveling through a memoryless wire. By contrast, each base type in C and C++ is one or more bytes stored in memory. Although C's base types can be implemented in hardware, C has almost no support for types smaller than a byte. (The one exception is that the number of bits for each field in a struct can be specified explicitly. Oddly, none of these languages even mimics this syntax.) As a result, straight C code can easily be interpreted as bloated hardware.

Compilers take three approaches to introducing hardware types to C programs. The first, and perhaps the purest, neither modifies nor augments C's types but allows the compiler or designer to adjust the width of the integer types outside the language. For example, the C2Verilog compiler provides a GUI that lets the user set the width of each variable in the program. In Transmogrifier C, the user can set each integer's width through a preprocessor pragma.

The second approach is to add hardware types to the C language. HardwareC, for instance, adds a Boolean vector type. Handel-C, Bach C, and BDL add integers with an explicit width. SpecC adds all these types and many others that cannot be synthesized, such as pure events and simulated time.

The third approach, used by C++-based languages, is to provide hardware-like types through C++'s type system. C++ supports a one-bit Boolean type by default, and its class mechanism makes it possible to add more types, such as arbitrary-width integers, to the language. The SystemC libraries include variable-width integers and an extensive collection of types for fixed-point fractional numbers. Ocapi, because it is an algorithmic mechanism for generating structure, also effectively takes this approach, letting the user explicitly request wires, buses, and so on. Catapult C presumably has a similar library of hardware-like types.

Each approach, however, is a fairly radical departure from C's call-it-an-integer-and-forget-about-it approach. Even the languages that support only C types compel a user to provide each integer's actual size. Worrying about the width of each variable in a program is not something a typical C programmer does.

Compared with timing and concurrency, however, adding appropriate hardware types is a fairly easy problem to solve when adapting C to hardware. C++'s type system is flexible enough to accommodate hardware types, and minor extensions to C suffice. A larger question, which none of the languages adequately addresses, is how to apply higher-level types such as classes and interfaces to hardware description. SystemC has some facilities for inheritance, but the inheritance mechanism is simply the one used for software; it is not clear that this mechanism is convenient for adding to or modifying the behavior of existing hardware. Incidentally, SystemC has supported more high-level modeling constructs such as templates and more elaborate communication protocols since version 2.0, but they are not typically synthesizable.

A good HDL needs a rich type system that allows precise definition of hardware types, but it should also assist in ensuring program correctness. C++'s type system is definitely an improvement over C's in this regard.

Communication

C-like languages are built on the very flexible RAM communication model. They implicitly treat all memory locations as equally costly to access, but this is not true in modern memory hierarchies. At any point, it can take hundreds or even thousands of times longer to access certain locations. Designers can often predict the behavior of these memories, specifically caches, and use them more efficiently. But doing so is very difficult, and C-like languages provide scant support for it.

Long, nondeterministic communication delays are anathema in hardware. Timing predictability is mandatory, so large, uniform-looking memory spaces are rarely the primary communication mechanism. Instead, hardware designers use various mechanisms, ranging from simple wires to complex protocols, depending on the system's needs. An important characteristic of this approach is the need to understand a system's communication channels and patterns before it is running, because communication channels must be hardwired.

The problem with pointers

Communication patterns in software are often difficult to determine a priori because of the frequent use of pointers. These are memory addresses computed at runtime, and as such are often data dependent and cannot be known completely before a system is running. Implementing such behavior in hardware mandates, at least, small memory regions.

Aliasing, when a single value can be accessed through multiple sources, is an even more serious problem. Without a good understanding of when a variable can be aliased, a hardware compiler must place that variable into a large, central memory, which is necessarily slower than a small memory local to the computational units that read and feed it.

One of C's strengths is its flexible memory model, which allows complicated pointer arithmetic and essentially uncontrolled memory access. Although very useful for system programs such as operating systems, these abilities make analyzing an arbitrary C program's communication patterns especially difficult. The problem is so great, in fact, that software compilers often have an easier time analyzing a Fortran program than an equivalent C program.

Any technique that implements a C-like program in hardware must analyze the program to understand all possible communication pathways; resort to large, slow memories; or do some combination of the two.

Séméria, Sato, and De Micheli applied pointer analysis algorithms from the software compiler literature to estimate the communication patterns of C programs for hardware synthesis.18 Although this is an impressive body of work, it illustrates the difficulty of the problem. Pointer analysis identifies the data to which each pointer can refer, allowing memory to be divided. Solving the pointer analysis problem precisely is undecidable, so researchers use approximations. These are necessarily conservative and hence might miss opportunities to split memory regions, leading to higher-cost implementations. Finally, pointer analysis is a costly algorithm with many variants.

Communication costs

Software's event-oriented communication style is another key difference from hardware. Every bit of data communicated among parts of a software program has a cost (that is, a read or write operation to registers or memory), and thus communication must be explicitly requested in software. Communicating the first bit is very costly in hardware because it requires the addition of a wire, but after that, communication is actually more costly to disable than to continue.

This difference leads to a different set of concerns. Good hardware communication design tries to minimize the number of pathways among parts of the design, whereas good software design minimizes the number of transactions. For example, good software design avoids forwarding through copying, preferring instead to pass a reference to the data being forwarded. This is a good strategy for hardware that stores large blocks of data in memory, but is rarely appropriate in other cases. Instead, good hardware design considers alternate data encodings, such as serialization.

Communication approaches

The languages considered here fall broadly into two groups: those that effectively ignore C's memory model and look only at communication through variables, and those that adopt the full C memory model.

Languages that ignore C's memory model don't support arrays or pointers. Instead, they look only at how local variables communicate between statements. Cones is the simplest; all variables, arrays included, are interpreted as wires. HardwareC and Transmogrifier C don't support arrays or memories. Ocapi also falls into this class, although arrays and pointers can assist during system construction. BDL is perhaps the richest of this group, supporting multidimensional arrays, but it doesn't support pointers or dynamic memory allocation.

Languages in the second group go to great lengths to preserve C's memory model. The CASH compiler takes the most brute-force approach: it synthesizes one large memory and puts all variables and arrays into it. The Handel-C and C2Verilog compilers can split memory into multiple regions and assign each to a separate memory element. Handel-C adds explicit constructs to the language for specifying these elements. SystemC also supports explicit declaration of separate memory regions.

Other languages provide communication primitives whose semantics differ greatly from C's memory style of communication. HardwareC, Handel-C, and Bach C provide blocking, rendezvous-style (unbuffered) communication primitives for communicating between concurrently running processes. SpecC and later versions of SystemC provide a large library of communication primitives.

Again, the difference between appropriate software and hardware design is substantial. Software designers usually ignore memory access patterns. Although this can slow overall memory access speed, it is usually acceptable. Good hardware design, in contrast, usually starts with a block diagram detailing every communication channel and attempts to minimize communication pathways.

So, software designers usually ignore the fundamental communication cost issues common in hardware. Furthermore, automatically extracting efficient communication structures from software is challenging because of the pointer problem in C-like languages. Although pointer analysis can help mitigate the problem, it is imprecise and cannot improve an algorithm with poor communication patterns.

A good hardware specification language should make it easy to specify efficient communication patterns.

Metadata

A high-level construct can be implemented in many different ways. However, because hardware is at a far lower level than software, there are many more ways to implement a particular C construct in hardware. For example, consider an addition operation. A processor probably has only one useful addition instruction, whereas in hardware there are a dizzying number of different adder architectures—for example, ripple carry, carry look-ahead, and carry save.

The translation process for hardware therefore has more decisions to make than translation for software. Making many decisions correctly is difficult and computationally expensive. Furthermore, the right set of decisions varies with design constraints. For example, a designer might prefer a ripple-carry adder if area and power are at a premium and speed is a minor concern, but a carry-look-ahead adder if speed is a greater concern.

Much effort has gone into improving optimization algorithms, but it remains unrealistic to expect all these decisions to be automated. Instead, designers need mechanisms that let them ask for exactly what they want. Such designer guidance takes two forms: manual rewriting of high-level constructs into the desired lower-level ones (for example, replacing a "+" operator with a collection of gates that implement a carry-look-ahead adder) or annotations such as constraints or hints about how to implement a particular construct. Both are common RTL design approaches. Designers routinely specify complex data paths at the gate level instead of using higher-level constructs. Constraint information, often supplied in an auxiliary file, usually drives logic optimization algorithms.

Although it might seem possible to use C++'s operator-overloading mechanism to specify, for example, when a carry-look-ahead adder should implement an addition, using this mechanism is probably very difficult. C++'s overloading mechanism uses argument types to resolve ambiguities, which is natural when you want to treat different data types differently. But the choice of algorithm in hardware is usually driven by resource constraints (such as area or delay) rather than data representation (although, of course, data representation does matter). In software, there is little reason to have multiple implementations of the same algorithm, but it happens all the time in hardware. Not surprisingly, C++ doesn't support this sort of thing.

The languages considered here take two approaches to specifying such metadata. One group places it within the program itself, hiding it in comments, pragmas, or added constructs. The other group places it outside the program, either in a text file or in a database populated by the user through a GUI.

C has a standard way of supplying extra information to the compiler: the #pragma directive. By definition, a compiler ignores such lines unless it understands them. Transmogrifier C uses the directive to specify integer width, and Bach C uses it to specify timing and mapping constraints. HardwareC provides three language-level constructs: timing constraints, resource constraints, and arbitrary string-based attributes, whose semantics are much like a C #pragma. BDL has similar constructs.

SpecC takes the other approach; many tools for synthesizing and refining SpecC require the user to specify, using a GUI, how to interpret various constructs.

Constructs such as addition, which are low level in software, are effectively high level in hardware. Thus, there must be a mechanism for conveying designer intent to any hardware synthesis procedure, regardless of the source language. A good hardware specification language needs a way of guiding the synthesis procedure to select among different implementations, trading off between, say, power and speed.

WHY BOTHER generating hardware from C? It is clearly not necessary, because there are many excellent processors and software compilers, which are certainly the cheapest and easiest way to run a C program. So why consider using hardware? Efficiency is the logical answer. Although general-purpose processors get the job done, well-designed customized hardware can always do it faster, using fewer transistors and less energy. Thus, the utility of any hardware synthesis procedure depends on how well it produces efficient hardware specialized for an application. Table 2 summarizes the key challenges of a successful hardware specification language.

Concurrency is fundamental for efficient hardware, but C-like languages impose sequential semantics and require the use of sequential algorithms. Automatically exposing concurrency in sequential programs is limited in effectiveness, so a successful language requires explicit concurrency, something missing from most C-like languages. Adding such a construct is easy, but teaching software programmers to use concurrent algorithms is difficult.

Table 2. The big challenges for hardware languages.

  Challenge                Comment
  Concurrency model        Specifying parallel algorithms
  Specifying timing        How many clock cycles?
  Types                    Need bits and bit-precise vectors
  Communication patterns   Need isolated memories
  Hints and constraints    How to implement something

Careful timing design is also required for efficient hardware, but C-like languages provide essentially no control over timing, so the language needs added timing control. The problem amounts to where to put the clock cycles, and the languages offer a variety of solutions, both implicit and explicit. The bigger problem, though, is changing programmer habits to consider such timing details.

Using software-like types is also a problem in hardware, which wants to manipulate individual bits for efficiency. The problem is easier to solve for C-like languages. Some languages add the ability to specify the number of bits used for each integer, for example, and C++'s flexible type system allows hardware types to be defined. The type problem is the easiest to address.

Communication also presents a challenge. C's flexible global-memory communication model is not efficient for hardware. Instead, memory should be broken into smaller regions, often as small as a single variable. Compilers can do so to a limited degree, but efficiency often demands explicit control over this. A fundamental problem, again, is that C programmers generally don't worry about memory, and C programs are rarely written with memory behavior in mind.

A high-level HDL must let the designer provide constraints or hints to the synthesis system because of the wide semantic gap between a C program and efficient hardware. There are many ways to implement a construct such as addition in hardware, so the synthesis system needs a way to select an implementation. Constraints and hints are the two main ways to control the algorithm, but standard C has no such facility.

Although presenting designers with a higher level of abstraction is obviously desirable, presenting them with an inappropriate level of abstraction—one in which they cannot effectively ask for what they want—is not much help. Unfortunately, C-like languages, because they provide abstractions geared toward the generation of efficient software, do not naturally lend themselves to the synthesis of efficient hardware.

The next great hardware specification language won't closely resemble C or any other familiar software language. Software languages work well only for software, and a hardware language that does not produce efficient hardware is of little use. Another important issue will be the language's ability to build systems from existing pieces (known as IP-based design), which none of these languages addresses. This ability appears necessary to raise designer productivity to the level needed for the next generation of chips.

Looming over all these issues, however, is verification. What we really need are languages that let us create correct systems faster by making it easier to check for, identify, and correct mistakes. Raising the abstraction level and facilitating efficient simulation are two well-known ways to achieve this, but are there others? ■

Acknowledgments

Edwards is supported by the National Science Foundation, Intel, Altera, the SRC, and New York State's NYSTAR program.

References

1. Handel-C Language Reference Manual, RM-1003-4.0, Celoxica, 2003.
2. K. Wakabayashi and T. Okamoto, "C-Based SoC Design Flow and EDA Tools: An ASIC and System Vendor Perspective," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 12, Dec. 2000, pp. 1507-1522.
3. D.M. Ritchie, "The Development of the C Language," History of Programming Languages-II, T.J. Bergin Jr. and R.J. Gibson Jr., eds., ACM Press and Addison-Wesley, 1996.
4. G. De Micheli, "Hardware Synthesis from C/C++ Models," Proc. Design, Automation and Test in Europe (DATE 99), IEEE Press, 1999, pp. 382-383.
5. C.E. Stroud, R.R. Munoz, and D.A. Pierce, "Behavioral Model Synthesis with Cones," IEEE Design & Test, vol. 5, no. 3, July 1988, pp. 22-30.
6. D.C. Ku and G. De Micheli, HardwareC: A Language for Hardware Design, Version 2.0, tech. report CSTL-TR-90-419, Computer Systems Lab, Stanford Univ., 1990.
7. G. De Micheli et al., "The Olympus Synthesis System," IEEE Design & Test, vol. 7, no. 5, Oct. 1990, pp. 37-53.
8. D. Galloway, "The Transmogrifier C Hardware Description Language and Compiler for FPGAs," Proc. Symp. FPGAs for Custom Computing Machines (FCCM 95), IEEE Press, 1995, pp. 136-144.
9. T. Grötker et al., System Design with SystemC, Kluwer Academic Publishers, 2002.
10. P. Schaumont et al., "A Programming Environment for the Design of Complex High Speed ASICs," Proc. 35th Design Automation Conf. (DAC 98), ACM Press, 1998, pp. 315-320.
11. Y. Panchul, D.A. Soderman, and D.R. Coleman, System for Converting Hardware Designs in High-Level Programming Language to Hardware Implementations, US patent 6,226,776, Patent and Trademark Office, 2001.
12. D.D. Gajski et al., SpecC: Specification Language and Methodology, Kluwer Academic Publishers, 2000.
13. R. Dömer, A. Gerstlauer, and D. Gajski, SpecC Language Reference Manual, Version 2.0, SpecC Consortium, 2001.
14. T. Kambe et al., "A C-Based Synthesis System, Bach, and Its Application," Proc. Asia South Pacific Design Automation Conf. (ASP-DAC 01), ACM Press, 2001, pp. 151-155.
15. M. Budiu and S.C. Goldstein, "Compiling Application-Specific Hardware," Proc. 12th Int'l Conf. Field-Programmable Logic and Applications (FPL 02), LNCS 2438, Springer-Verlag, 2002, pp. 853-863.
16. D.W. Wall, "Limits of Instruction-Level Parallelism," Proc. 4th Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS 91), Sigplan Notices, vol. 26, no. 4, ACM Press, 1991, pp. 176-189.
17. M.W. Hall et al., "Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler," Proc. Supercomputing Conf., IEEE Press, p. 49.
18. L. Séméria, K. Sato, and G. De Micheli, "Synthesis of Hardware Models in C with Pointers and Complex Data Structures," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 9, no. 6, Dec. 2001, pp. 743-756.

Stephen A. Edwards is an associate professor in the Computer Science Department of Columbia University. His research interests include embedded-system design, domain-specific languages, and compilers. Edwards has a BS from the California Institute of Technology and an MS and a PhD from the University of California, Berkeley, all in electrical engineering. He is an associate editor of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. He is a senior member of the IEEE.

Direct questions and comments about this article to Stephen A. Edwards, Dept. of Computer Science, Columbia University, 1214 Amsterdam Ave. MC 0401, New York, NY 10027; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.




0740-7475/06/$20.00 © 2006 IEEE. Copublished by the IEEE CS and the IEEE CASS. September–October 2006

IN “THE CHALLENGES of Synthesizing Hardware from C-Like Languages,” Stephen Edwards has provided a good survey of the many attempts to adapt C to hardware creation. His thesis, that “pure C is a poor choice for specifying hardware,” was recognized by all the people doing this work. Unfortunately, he does not recognize the evolution of these C variations. The last in this line, SystemC, has satisfactorily addressed all the language issues. Edwards acknowledges that using pragmas to direct the synthesis process is a satisfactory way to provide the necessary additional information for efficient hardware creation, yet he criticizes the language for not providing a different means of doing so. In Table 2 of the article, two of the five listed challenges are language issues (concurrency and data types), and three are synthesis issues. Edwards acknowledges that SystemC adequately solves the concurrency model and types challenges but seems unaware that existing modern synthesis products have solved the other three—specifying timing, communication patterns, and hints and constraints—with pragmas.

Problems have largely been solvedConfusing language with the synthesis process,

Edwards comes to the conclusion that C-like languages

“do not naturally lend themselves to the synthesis of effi-

cient hardware.” That is simply wrong. Commercial Sys-

temC synthesis tools routinely produce more efficient

hardware than handwritten RTL code typically pro-

duces. Edwards argues that properties of C-like lan-

guages make this synthesis process computationally

hard and time-consuming. Although some of the prop-

erties he has cited do make synthesis more difficult,

those problems have largely been solved. Fundamen-

tally, the complexity imposed on these synthesis prod-

ucts results from starting at a higher abstraction level,

not from the language.

Little trouble for competent hardwaredesigner

Edwards says, “My main point is that giving C pro-

grammers tools is not enough to turn them into reason-

able hardware designers.” This statement is unarguably

true. Giving people C compilers is not enough to turn

them into reasonable programmers either. Tuning code

for performance has long been recognized as a separate

skill, closely related to the underlying target processor. For

efficient performance, vector, SIMD (single instruction,

multiple data), SMP (symmetric multiprocessing), and

VLIW (very long instruction word ) machines all require

special techniques, encompassing both coding style and

pragmas. It should surprise no one that when the under-

lying target processor is raw gates, additional skill and

knowledge are required. In fact, a competent hardware

designer has little trouble creating efficient hardware

using SystemC and a modern synthesis product.

IN THE END, though, Edwards’ thesis is beside the

point. As IC capacity increases, it is becoming routine

to implement increasingly larger algorithms in hard-

ware, for the performance and efficiency reasons

Edwards cites. Those algorithms nearly always start out

in C or C++. It is far better to operate on the original ver-

sion directly than to manually translate it to a different

language before beginning to transform it into hard-

ware. Recognition of this fact has motivated most of the

efforts surveyed here. Sure, there are challenges, but the

benefits are worth it. ■

A different view: Hardware synthesis fromSystemC is a maturing technologyJohn Sanguinetti

Forte Design Systems


ITC Special Section


Guest Editor’s Introduction: ITC Helps Get More out of Test
Kenneth M. Butler, Texas Instruments

THIS SPECIAL SECTION of IEEE Design & Test of Computers, along with the International Test Conference 2006, highlights the value that test adds to the electronics manufacturing business. It leads us to think about test in a whole new way.

The theme for ITC 2006 is “Getting More out of Test,” which is very appropriate in light of recent advances and changes in our industry. These days, everybody is talking about things like design for manufacturability (DFM), yield enhancement technologies, test-based outlier techniques, and the like. Based on these concepts, whole companies have been founded and have prospered, such as PDF Solutions, whose CEO was keynote speaker at ITC 2005. What makes these developments truly exciting is the role test plays in all of these new technologies. Test is truly the cornerstone on which the disciplines of yield and reliability engineering are built. And we’re not just talking about characterization test or an occasional product lot, but large production volumes analyzed with new and ever-more powerful data mining and data reduction techniques.

We have also had to rethink what it means for a die, chip, board, or system to “pass” or “fail” a test. In the early days, particularly for digital products, we could always devise a test whose results were clear indicators of good or bad units. Yes, there was (and is) the perennial question of the test’s coverage or thoroughness. But that aspect related more to the effort level expended to incorporate good test-access mechanisms into the design and less to the technology in which the product was manufactured. Today, however, we see ample evidence of electronics failure mechanisms’ increasingly subtle nature. We can view this problem from two perspectives: the “time zero” or “test escape” question, and the separate but equally important reliability aspect.

A good example of the former is the relatively recent proliferation of fault models and test approaches that various groups are advocating. Everybody continues to rely on the workhorse stuck-at fault model for bulk static defect coverage. But how long will that strategy continue to work for us? At what point must we supplement, or dare I say replace, stuck-at testing with other candidate test techniques such as N-detect tests, extracted bridging fault tests, or other nontraditional forms of testing? Authors in this magazine, at ITC, and at other venues continue to grapple with this question.

On the reliability side, the underlying mechanisms, such as channel hot carrier (CHC) effects and negative bias temperature instability (NBTI), have always been there. We have known about them for decades, but their impact on quality and product lifetime was relatively invisible to us. Unfortunately, that statement is no longer true. NBTI and other reliability mechanisms degrade product lifetime and performance and demand that we add margins for their occurrence. So, again, we must call on test to help us identify these problems when they occur, quantify the magnitude of the yield/reliability impact, and screen the material before it gets into the consumer’s hands. Overall, therefore, we can see that test must play an ever-more-important role in more aspects of the electronics business.

The first article in this special section, “Extracting Defect Density and Size Distributions from Product ICs” by Jeffrey Nelson et al., is a classic example of learning all you can about the manufacturing process via production test. Today, the cost to construct and populate an IC wafer fabrication facility is measured in billions of dollars, and the cost of a mask set in an advanced technology is approaching or can exceed $1 million. The inevitable outcome of these spiraling costs is that


fewer companies can afford to maintain captive IC manufacturing sites and thus are moving to fabless, foundry-based business models. But how do you learn and respond to important yield and defect Pareto information when design and manufacturing are in two completely separate companies, often geographically distant from each other, without having to devote costly wafer volume to test vehicles? This article addresses that important and timely question.

“Improving Transition Delay Test Using a Hybrid Method” by Nisar Ahmed and Mohammad Tehranipoor deals with the increasingly complex subject of delay test. Starting somewhere around the 130-nm technology node, and perhaps spurred by the advent of copper metallization, delay defects suddenly became something that, left untested, could result in too large an escape rate as seen by the customer. The industry responded in earnest by applying delay test techniques to large numbers of production ICs. Immediately, users of this technology discovered issues with things like pattern volume, realizable coverage, and test generation tool runtimes. This article is an example of the types of new thinking being applied to this problem to make delay test more tractable and more usable, thus getting more out of it.

The final article, “Impact of Thermal Gradients on Clock Skew and Testing” by Sebastià Bota et al., in some sense turns the ITC theme on its ear. To get more out of test, we must fundamentally understand not only its capabilities but also its limitations. As die sizes grow increasingly larger and clock rates continue to climb, so, too, do power requirements, driving die temperatures higher as well. Within-die thermal gradients can have negative effects on timing and clocking, which degrade testing’s accuracy and results. This article systematically examines the issue of thermal effects, introduces a methodology for quantifying them, and proposes a design technique for counteracting them.

TAKEN AS A WHOLE, the articles demonstrate the changing role of test in the entire electronics industry and how it’s not just for pass/fail anymore. Contributors to ITC, IEEE Design & Test, and numerous other IEEE test conferences and workshops are continually inventing and demonstrating new ways in which the test process can increase our rate of product and process learning, speed products to yield and reliability entitlement, and generally contribute more to our collective bottom line. I hope that this information will inspire you to come to ITC, see the presentations of articles like these, interact with their authors, visit the exhibits floor and see the new products that leverage the best test has to offer, and, most importantly, share your thoughts and ideas on how we can get more out of test.

I would like to take this opportunity to thank Editor-in-Chief Tim Cheng and the entire IEEE D&T editorial staff for their encouragement and assistance in producing this special issue. ■

Kenneth M. Butler is a TI Fellow at Texas Instruments in Dallas. His research interests include outlier techniques for quality and reliability and test-data-driven decision making. Butler has a BS from Oklahoma State University and an MS and a PhD from the University of Texas at Austin, all in electrical engineering. He was the program chair of ITC 2005 and currently serves on the program and steering committees. He is a Senior Member of the IEEE and a member of the ACM.

Direct questions and comments about this special section to Kenneth M. Butler, Texas Instruments, 13121 TI Boulevard, MS 366, Dallas, TX 75243; [email protected].




Extracting Defect Density and Size Distributions from Product ICs

Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, and Wojciech Maly, Carnegie Mellon University
Brady Benware and Chris Schuermyer, LSI Logic

Editor’s note: Defect density and size distributions are difficult to characterize, especially if you have little or no access to test vehicles specifically designed for the purpose. The authors propose a new methodology for extracting that information directly from production test data on actual products.
—Ken Butler, Texas Instruments

DEFECTS FREQUENTLY OCCUR during IC manufacture. Modeling the resulting yield loss is an important part of any design-for-manufacturability strategy. Of the many mechanisms that cause yield loss, some have sufficiently accurate models and are well understood, whereas others are unpredictable and difficult to characterize. Current yield-related research focuses mainly on systematic defects. In contrast, this article addresses random spot defects, which affect all processes and currently require a heavy silicon investment to characterize. We propose a new approach for characterizing random spot defects in a process. This approach enables accurate measurement of parameters for the critical-area yield model—the workhorse of modern yield-learning strategies.

IC manufacturers often neglect the need to tune the yield model—that is, to continuously update yield model parameters—because of the silicon area required to characterize a process. But the inherently stochastic nature of yield makes frequent process characterization necessary for accurate yield models. We present a system that overcomes the obstacle of silicon area overhead by using available wafer sort test results to measure critical-area yield model parameters. We use only wafer sort test results, so no additional silicon area is required. Our strategy uses the most realistic characterization vehicle for the product IC—the product itself—rather than memory or specialized test structures that waste silicon area and often do not represent the product’s design style.

Background
Defect density and size distributions (DDSDs) are important parameters for characterizing spot defects in a process. A DDSD tells us what the defect density is for a given defect radius—that is, the number of defects per unit area. The distribution gives this information for all defect radii. Typically, though, as defect radius increases, defect density quickly decreases. Thus, we can generally curtail the distribution and measure only defect density for a range of defect radii, because larger defects have a density approaching zero. This inherent feature becomes useful in attempting to discretize the DDSD.

We can subdivide the distributions characterizing a process beyond defect size. Each metal layer of the process can potentially have a different DDSD. Ideally, we’d like to measure each layer’s DDSD rather than attempt to characterize all layers simultaneously with a single distribution. These distributions are parameters for the critical-area yield model.1-3

IC manufacturers measure DDSDs primarily with specialized test structures on a wafer. Test structures contain geometries specifically designed to observe defects. When a defect occurs in a particular region of a test structure, that structure observes the defect, making it easy for the process engineer to identify what the defect mechanism is, where it occurred, and to learn about the defect’s size. The price we pay for this convenience is that test structures consume silicon area on the wafer.


Thus, test structures impose a trade-off between area cost and defect observability.

Consider the three wafers in Figure 1. In Figure 1a, the entire wafer is dedicated to test structures. This configuration allows excellent defect observability, but the obvious drawback is that no product can be manufactured from it—product volume is zero. Manufacturers typically use a full wafer of test structures only during the earliest yield-learning phase, when the yield improvement realized from these structures significantly outweighs manufacturing cost.

In Figure 1b, products have replaced many of the test structures, raising volume to a medium level. However, observability has decreased because now there is a significant amount of area where defects can occur with no direct ability to characterize them. The wafer in Figure 1b also contains test structures in the scribe lines. This configuration is a compromise between defect observability and volume. Manufacturers typically use it during yield ramp, when volume is necessary, but the ability to characterize defects—particularly systematic defects—is still required.

Finally, the wafer configuration shown in Figure 1c uses the entire silicon area to manufacture products. The scribe lines still contain test structures because they don’t affect product volume. As in the Figure 1b configuration, this configuration provides limited area to observe defects, but it is even more extreme because it relegates the test structures to the scribe lines. This configuration is used most during the volume phase of yield ramp, when characterization of random spot defects is most important for predicting yield.

The observability-versus-area trade-off has led to research that seeks the best of both worlds: high observability and low (or no) area overhead. In particular, researchers have used SRAMs to extract DDSDs.4 This technique requires no additional overhead, because the characterization vehicle (the SRAM) is a useful product itself. SRAMs, however, have undesirable characterization characteristics, such as confinement to a few metal layers, which limits the scope of observable defects. SRAMs’ extremely regular structure means that if the replicated cell has a narrow scope of geometric features for defect observation, this limitation will extend over the entire chip. These limitations are only noteworthy when the memories are extracting DDSDs for yield-loss prediction for random-logic circuits. A preferable defect characterization vehicle in such cases is a random-logic product.

Other researchers have suggested using a random-logic product to estimate the defect Pareto in a process using only test results.5 That work, in conjunction with the SRAM work, inspired the initial idea that we could extract a DDSD for each process layer using a random-logic product IC as a virtual test structure.6 The first publication describing an investigation of this idea appeared in March 2006.7 Here, we elaborate on that publication and present new findings from an experiment conducted on test data from silicon wafers provided by LSI Logic.

Proposed approach
Our system accurately characterizes spot defects that contribute to yield loss by measuring defect density in each metal IC layer, without the silicon overhead required by current techniques. The various geometries and line spacing in a typical layout lead to defects of different sizes with varying effects on the IC (some small


Figure 1. Wafers with different test structure configurations and varying levels of defect observability (gray areas and scribe lines represent test structures): all test structures and no products (a), some test structures replaced by products (b), and entire area used for products, with test structures in scribe lines only (c).


defects may have a negligible impact). Therefore, in addition to defect density, we must measure the distribution of defect sizes.

The strategy for achieving this goal is straightforward.6-8 By nature, each spot defect affects only a small subset of nodes in close proximity to one another. Each spot defect leads to a unique, defective circuit response. Likewise, given a circuit response, there are some potential spot defects that cause that response. Using results from structural testing, we can estimate the probability of a particular circuit response and consequently the probabilities of defect occurrence. By grouping responses according to specific characteristics, such as the size of a defect necessary to cause that circuit response, we can determine the occurrence probabilities of defects of that size.

Using a modeling strategy to predict faulty circuit responses as a function of defect characteristics in the process, we can mathematically derive defect characteristics that minimize the difference between the modeled test response probabilities and the estimated test response probabilities. Thus, the calculated defect characteristics must represent the actual defect characteristics in the process. Of course, for this to be true, certain conditions must be met. We propose a defect characterization methodology based on this concept. That is, we develop and apply a modeling strategy that predicts probabilities of test responses depending on a DDSD, and then we find the DDSD that leads to agreement between circuit test responses measured by a tester and test responses predicted by the model.

To accomplish this, we have developed a modeling technique that relates the analyzed IC’s test responses to defect characteristics that could cause such test responses. We will describe two mappings: one between defect probabilities and fault probabilities, and one between faults and test responses.

Microevents and macroevents
A spot of extra conducting material deposited in a metal layer can introduce an extra, unwanted bridge connection between nonequipotential metal regions in the layer. In most cases, a bridge will affect the circuit’s electrical behavior. An instance of a bridge that connects two or more nonequipotential metal islands is called a microevent.4 Each microevent involves a set of circuit nodes, S = {n1, n2, …, nm}, that are bridged by the spot defect of a specific radius. We can calculate the probability of a single, independent microevent using the critical-area yield model.7 Equation 1 shows the probability that microevent i will occur, where Ci is the microevent’s critical area, and Dj(ri) is the defect density for defects of radius ri (the same radius as microevent i) in layer j, the layer in which microevent i occurs:

p(μEi) = 1 − e^(−Ci · Dj(ri))   (1)

Here, we define microevent μEi as a bridge, thus limiting our scope to spot defects causing bridges. We do this for two reasons: First, it is important that the physics of the investigated yield loss mechanism be well understood, which is indeed the case for bridges. Second, spots of extra conducting material are still a major reason for IC malfunctions in many processes.

An IC’s vulnerability to random spot defects greatly depends on the layout. The critical-area concept was developed to provide a metric of design sensitivity to defects.1,9 Critical area is the layout region where, if a spot defect of radius r occurs, a circuit can fail. Figure 2 shows a small portion of a sample layout with signal lines in metal 1 and metal 2. The figure illustrates six microevents: four in metal 1 and two in metal 2. Four sample spot defects demonstrate how a microevent can occur. Each microevent has an associated critical area for a specific defect radius. For example, microevents μE1 to μE3 have critical area for a defect of radius r1, represented by the solid boxes associated with each microevent label. Likewise, microevents μE4 to μE6 have critical area for radius r2, represented by the dashed boxes. This example shows that even within a single metal layer, microevents involving the same circuit node set S can occur in several discrete regions. In this


Figure 2. Sample layout with six microevents: four in metal layer 1 (a), and two in metal layer 2 (b). Microevents μE1 to μE3 have radius r1 (solid boxes) and μE4 to μE6 have radius r2 (dashed boxes), where r1 < r2. Spot defects are circles.


case, S = {b, c}. Each discrete region of critical area represents a separate microevent. In addition, microevents involving the same set of circuit nodes can exist in different metal layers.

Critical-area measurement occurs in steps. First, we measure critical area for all potential microevents in a layout for a given radius, rstart. In each subsequent step, the defect radius is incremented by a small amount and the first step repeated for the new radius. This process repeats, continuing over a specified range of defect radii until reaching rend.

We can now define a macroevent as the set of all microevents that exist for the same set of circuit nodes S. As mentioned, many microevents involving S can exist in different layers for different defect radii. So, a collection of independent microevents describes each macroevent. Figure 2 shows a single macroevent, occurring between lines b and c, which consists of microevents 1 through 6. Because a macroevent is a set of independent microevents, the probability of a macroevent involving S is one minus the product of the probabilities of each microevent involving S not occurring. Thus, in this example, the probability of the macroevent involving b and c occurring is one minus the product of the probabilities of each of the six microevents not occurring.

Critical-area extraction for a range of defect radii provides a list of microevents and their associated critical areas. With those measurements, we can calculate microevent probabilities, and thus macroevent probabilities, as a function of defect densities. Because a macroevent represents a multiline bridge, we have in fact extracted a list of potential bridge defects along with their occurrence probability. This results in the first mapping between defects and faults.
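As a concrete sketch of this calculation (toy numbers, assuming the Poisson form of the critical-area model in Equation 1; the layer, radius, and area values below are hypothetical), the microevent-to-macroevent combination might look like:

```python
import math

def microevent_prob(critical_area, defect_density):
    # Poisson critical-area model (Equation 1): probability that at least
    # one defect of this radius lands in the microevent's critical area.
    return 1.0 - math.exp(-critical_area * defect_density)

def macroevent_prob(microevents, density):
    # A macroevent occurs unless none of its independent microevents do:
    # P(macro) = 1 - product over microevents of (1 - P(micro)).
    p_none = 1.0
    for layer, radius, area in microevents:
        p_none *= 1.0 - microevent_prob(area, density[(layer, radius)])
    return 1.0 - p_none

# Toy numbers (hypothetical): six microevents bridging node set {b, c},
# as in Figure 2, split across metal 1 (radius r1) and metal 2 (radius r2).
density = {("metal1", "r1"): 0.02, ("metal2", "r2"): 0.01}  # defects per unit area
events = [("metal1", "r1", 1.5), ("metal1", "r1", 1.5), ("metal1", "r1", 1.5),
          ("metal1", "r1", 0.8), ("metal2", "r2", 2.0), ("metal2", "r2", 2.0)]
p_bc = macroevent_prob(events, density)  # probability of the b-c bridge
```

Because each non-occurrence probability is an exponential, the product collapses to a single exponential of the summed critical-area-weighted densities, which is what makes the later regression step tractable.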

Logic-level modeling
The final modeling stage necessary for mapping defect characteristics to test responses is a mapping between the macroevent list and the test responses. This mapping is embodied by the T matrix, which we calculate by simulating the entire test set against each macroevent. Because simulation time for a large number of macroevents (even a small circuit can have hundreds of thousands) can be enormous, we model them as logic-level faults, making efficient simulation possible. To maintain accuracy when simulating at the logic level, we first derive an accurate gate-level model of the circuit.

Typical standard-cell representations obscure the cell’s internal workings, causing the omission of important signal lines from the logic-level netlist. This netlist includes only standard-cell ports, even if the standard cell contains several CMOS logic gates. Therefore, we map a standard-cell layout to a logic-level description that captures the structure of static CMOS gates in the cell, using the gate primitives NAND, NOR, and NOT. This change lets us consider gate outputs routed in metal 1 in a standard cell during microevent extraction and tie them to logic signals in the netlist.

An AND-gate standard cell illustrates this issue. Typically, an AND gate is implemented in CMOS by a NAND gate followed by an inverter, with the connection between the two routed in metal 1. Microevents involving the internal metal 1 routing might occur, but without the layout-to-logic mapping used here, we have no basis for forming a logic-level fault model that includes this metal line. With our mapping, we can efficiently handle critical area that involves all metal lines in a standard cell (which can account for a significant portion of the chip’s total critical area).

However, some standard cells might still contain metal structures that are not mapped to the logic level. These polygonal structures are metal lines that don’t correspond to a CMOS logic gate’s output (these structures do not include power and ground, which easily map to logic 1 and 0). They are typically in complex CMOS gates such as AND-OR-INVERT gates, multiplexers, and other complex logic functions. Although we could ignore macroevents involving these polygons, they would become an additional source of error. We developed a technique to handle the polygons by mapping their logic functions to standard cell ports, and we used this technique in the silicon experiment that we describe later.

The extracted macroevents represent bridges that can involve two or more signal lines. Test engineers commonly use bridge faults10 to model two-line bridge defects, but because macroevents can involve more than two lines, more-advanced fault models are necessary. We use the voting-bridge fault model,11 in which pull-up and pull-down network drive strengths determine the erroneous lines.

We form a voting model for each macroevent by separately summing the drive strengths of all lines in the macroevent driven to logic 0 and logic 1. We then compare the two sums to determine which logic value will be imposed on the other lines. An error occurs on each line with the weaker logic value. To implement the voting model described here, we use fault tuples, a generalized fault representation mechanism.12 Despite the complex models we use, the behavior of real spot defects is unpredictable and therefore can be a source of error.
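A compact sketch of the voting resolution just described (line names and strengths are hypothetical, and ties here default to logic 0, which the article does not specify):

```python
def resolve_voting_bridge(line_values, drive_strengths):
    # Sum drive strengths of lines driven to 0 and to 1; the stronger
    # side's value is imposed on every bridged line, so lines on the
    # weaker side end up with an erroneous value.
    s0 = sum(drive_strengths[n] for n, v in line_values.items() if v == 0)
    s1 = sum(drive_strengths[n] for n, v in line_values.items() if v == 1)
    winner = 1 if s1 > s0 else 0  # tie-break toward 0 is an assumption
    return {n: winner for n in line_values}

# Hypothetical three-line macroevent: b and c drive 0, d drives 1.
vals = {"b": 0, "c": 0, "d": 1}
strengths = {"b": 1.0, "c": 1.5, "d": 2.0}
resolved = resolve_voting_bridge(vals, strengths)  # 0-side (2.5) beats 1-side (2.0)
```

Here line d is the erroneous line: its intended 1 is overpowered and it reads 0.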

To simulate the macroevents modeled as voting-

bridge faults, we use FATSIM, a concurrent fault simu-

lator for fault tuples.12 To determine which test vectors

detect which macroevents, we use no fault dropping

during simulation. The resulting data is stored in the T

matrix, which has the following form:

where V is the number of test vectors simulated, M is the

total number of macroevents, and ts,i is a 1 (0), indicat-

ing that macroevent i is detected (undetected) by test

vector s. The T matrix provides the mapping between

logic-level faults and circuit test responses.
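The T matrix combines with the macroevent probabilities to predict yield per test. The article does not spell out the composition rule, but a plausible minimal sketch (hypothetical data, assuming independent macroevents) is: a chip passes test s only if no macroevent detected by s has occurred.

```python
def predicted_yield_per_test(T, macro_probs):
    # T[s][m] == 1 when test vector s detects macroevent m.
    # Assuming independent macroevents, the probability of passing test s
    # is the product of (1 - P(macroevent)) over the macroevents s detects.
    yields = []
    for row in T:
        p_pass = 1.0
        for detected, p in zip(row, macro_probs):
            if detected:
                p_pass *= 1.0 - p
        yields.append(p_pass)
    return yields

# Toy example (hypothetical): V = 2 test vectors, M = 3 macroevents.
T = [[1, 0, 1],
     [0, 1, 1]]
probs = [0.01, 0.02, 0.05]
py = predicted_yield_per_test(T, probs)
```

The first test detects macroevents 1 and 3, so its predicted yield is 0.99 × 0.95; the second detects macroevents 2 and 3, giving 0.98 × 0.95.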

We have verified qualitatively that an inaccurate T matrix can significantly decrease the overall accuracy of our DDSD extraction approach. When we use a random T matrix, the resulting DDSDs have no resemblance to the expected distribution. Therefore, it is critical that macroevents be modeled precisely and simulated correctly; otherwise, the T matrix’s quality will be questionable. Simulation techniques that are more detailed than a logic-level model (for example, transistor-level Spice simulation) could possibly lead to greater accuracy, but they would increase the required simulation time considerably.

DDSD extractionAs discussed earlier, we can measure DDSDs by min-

imizing the difference between the predicted and the

observed probability of passing tests (yield per test). We

have described the various components necessary to

predict probability pi of test i passing. We adapt the crit-

ical-area yield model for this task, using critical-area

functions of macroevents, and the DDSD per layer as

parameters of the model. After measuring the T matrix

and critical-area functions of macroevents, the DDSDs

are the only unknown parameters of the model. We can

easily measure observed yield per test p̂i from tester

results as the ratio of the number of chips that pass test

i to the total number of chips manufactured.
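The observed yield per test is a simple ratio, which can be sketched as follows (a minimal illustration; the pass/fail matrix R and all sizes are hypothetical, not taken from the article's data):

```python
import numpy as np

# Sketch of measuring observed yield per test from pass/fail tester data.
# R[c, i] = 1 if chip c passed test vector i, else 0 (synthetic values here).
rng = np.random.default_rng(0)
num_chips, num_tests = 1000, 155
R = (rng.random((num_chips, num_tests)) < 0.98).astype(int)

# Observed yield per test: chips passing test i over chips manufactured.
p_hat = R.sum(axis=0) / num_chips
```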

We can find the DDSDs that minimize the error

between pi and p̂i by using linear regression. The key

idea is to abandon the concept of individual DDSDs per

layer. Because we will capture each distribution dis-

cretely using some number of points, we can concate-

nate all the DDSDs’ defect densities into a single vector.

The linear regression’s output will be this vector, which

can then be split into a DDSD for each metal layer. We

present a detailed mathematical description of these

steps elsewhere.9,10
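Under a common critical-area yield model, the predicted probability of test i passing takes the form p_i = exp(−Σ_j A_ij d_j), where A_ij is the critical area associated with test i in size bin j and d_j is the unknown defect density in that bin; taking logarithms makes the model linear in the concatenated density vector. A minimal sketch of this idea, using ordinary least squares rather than the authors' exact procedure (all matrices here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
num_tests, num_bins = 155, 19          # sizes taken from the experiment

# A[i, j]: hypothetical total critical area through which a defect in size
# bin j causes test i to fail (synthetic values for illustration).
A = rng.random((num_tests, num_bins)) * 1e-3
d_true = rng.random(num_bins)          # synthetic per-bin defect densities

p_hat = np.exp(-A @ d_true)            # noise-free "observed" yields per test

# Linearize the model: -ln(p_i) = sum_j A[i, j] * d[j], then solve the
# overdetermined system by least squares.
d_est, *_ = np.linalg.lstsq(A, -np.log(p_hat), rcond=None)
```

With noise-free yields, the least-squares solution recovers the inserted densities exactly; with real tester data, the residual error is what the regression minimizes.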

Simulation experiment

To evaluate the proposed approach, we performed

an experiment based on a simulated, artificial process.

We assumed DDSDs for each layer of the artificial

process and inserted defects into the process based on

these distributions. We measured the estimated yield

per test vector by emulating a tester. We then applied

the DDSD extraction strategy to the circuit and com-

pared the extracted DDSDs with the inserted DDSDs.

Demonstration circuit

For this experiment, we used circuit c3540 from the

1985 International Symposium on Circuits and Systems

(ISCAS) benchmark suite.13 We logically optimized the

c3540 implementation and technology-mapped it to a

0.18-micron commercial standard-cell library. The final

layout was routed in five metal layers and used approx-

imately 100 μm × 100 μm of area.

In modern manufacturing processes, a design of this

size would typically be free of defects because of rela-

tively low defect densities. To ease the simulation bur-

den, we assumed that a single die consisted of 10,000

parallel instances of c3540, with each instance retain-

ing its original controllability and observability. As a

result, each die had an area of approximately 1 cm2 and

could still be tested with a test set for a single instance

of c3540. Although this die had a total critical area com-

parable to typical chips, it lacked the diverse geometri-

cal features that a die would normally exhibit. However,

the impact of design diversity on the DDSD extraction

technique was not the experiment’s focus.

After preparing the demonstration circuit, we extract-

ed macroevents, modeled them using fault tuples, and

simulated them with FATSIM to generate the T matrix.

The production test set consisted of 155 stuck-at test pat-

terns. During macroevent extraction, we determined crit-

ical area for a range of defect sizes to build a critical-area

function for each macroevent. For metal layers 1 through

4, the critical-area function domain was 0.2 micron to 2

microns, and for metal layer 5, it was 0.34 micron

to 2 microns, with samples spaced at 50-nm intervals.

This resulted in 182 critical-area points. We determined

the limits on the basis of minimum line spacing for the lower bound and selected the upper

bound to capture a sufficient portion of

the DDSD’s tail. Figure 3 shows the total

discretized critical-area function (sum of

critical-area functions of all macroevents

involving the layer) for each of the five

metal layers for one instance of c3540.

Tester emulation

In the proposed DDSD extraction

methodology, we measure the yield per

test from the structural test results of a

real tester. In the simulation experiment,

we substituted tester emulation for actu-

al test results. We generated defects

according to a stochastic Poisson

process in which each potential defect is

an independent event. The assumed

DDSD followed the well-known power

law, with the defect densities shown in

Table 1. We increased defect densities to

levels well beyond realistic figures to

reduce the simulation time required for

test emulation.

We consider each macroevent’s

occurrence an independent Poisson

process because we assume that each

defect’s occurrence is independent of all

others. As a result, each macroevent occurs with a fre-

quency dictated by a Poisson process at a rate deter-

mined from the critical-area function of the macroevent

and the DDSDs. Table 2 shows the percentage of dies

containing zero, one, two, or three macroevents in a

sample size of 50,000 for this experiment. From this

table, we reach two conclusions:

■ Because the occurrence rates of the number of

macroevents per die align with the theoretical occur-

rence rates, 50,000 dies are sufficient.

■ Multiple macroevents affect only a small percentage of the simulated dies.

From the artificial process simulation, we knew

which macroevents occurred on each faulty die. We

then obtained the yield per test by inspecting the T

matrix. The yield per test varied slightly around an aver-

age of 98% for each test. We assume that no masking

effects occur for dies affected by multiple macroevents.

Thus, if a test detects any of the individual macroevents,

we assume that the test will fail. Table 2 shows that the

assumption that no masking occurs applies to about

0.16% of all dies; thus, any impact from this assumption

is minimal.
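The emulation described above can be sketched as follows. This toy model assumes hypothetical macroevent rates and a random detection matrix; it only illustrates the independent-Poisson and no-masking logic, not the article's actual data:

```python
import numpy as np

rng = np.random.default_rng(2)
num_dies, num_events, num_tests = 20_000, 300, 155   # illustrative sizes

# lam[m]: expected occurrences of macroevent m per die. In the article this
# rate is derived from the macroevent's critical-area function and the
# DDSDs; here it is a hypothetical vector.
lam = rng.random(num_events) * 2e-4

# Independent Poisson occurrences for each die and macroevent.
occurs = rng.poisson(lam, size=(num_dies, num_events)) > 0

# Hypothetical detection matrix: T[s, m] = 1 if test s detects macroevent m.
T = (rng.random((num_tests, num_events)) < 0.3).astype(int)

# No-masking assumption: a test fails on a die if it detects ANY macroevent
# present on that die.
fails = (occurs.astype(int) @ T.T) > 0
yield_per_test = 1.0 - fails.mean(axis=0)
```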

Figure 3. Critical-area functions (white symbols) extracted from all metal layers of a single instance of circuit c3540 from the ISCAS 85 benchmark suite. Black symbols represent critical-area functions after combining a range of defect sizes. (Axes: defect radius, 0.2–2 μm, vs. critical area in μm²; one curve per metal layer, with the 182 fine bins and the 19 merged bins marked.)

Table 1. Injected defect density and size distributions (DDSDs) following the power law distribution, with power parameter p and peak-probability parameter X0 = 0.05 μm for each metal layer. D0 (cm⁻²) represents defect density.

                 Metal layer
Parameter     1    2    3    4    5
D0 (cm⁻²)     1    2    2    1    3
p             3    4    3    2    3
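A common form of the power-law defect size distribution decays as 1/x^p beyond the peak size X0. A small sketch of a discretized version (the normalization of the bin densities to a total density D0 is a simplifying assumption for illustration, not necessarily the authors' exact parameterization):

```python
import numpy as np

def power_law_ddsd(x, p, x0=0.05, d0=1.0):
    """Discretized defect size distribution: zero below the peak size x0,
    falling off as 1/x**p above it, scaled so the bin densities sum to the
    total defect density d0 (cm^-2)."""
    x = np.asarray(x, dtype=float)
    shape = np.where(x < x0, 0.0, (x0 / x) ** p)
    return d0 * shape / shape.sum()

# The metal 1-4 sample points: 0.2 to 2 microns in 50-nm steps.
sizes = np.arange(0.2, 2.01, 0.05)
ddsd = power_law_ddsd(sizes, p=3, d0=1.0)
```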

Table 2. Occurrence rates for the number of macroevents per die for a sample size of 50,000.

No. of macroevents per die     0        1       2       3
Percentage of dies             94.17    5.67    0.15    0.01


DDSD extraction

We formulated the DDSD extraction process as a

minimization problem to be solved using linear regres-

sion analysis. Here, we detail the regression procedure

for the demonstration circuit.

As already mentioned, the total number of critical-area

points from the critical-area analysis for all layers is 182.

It is natural to likewise discretize the DDSDs by solving for their values at the same points as the critical-area points; each of these points is referred to as a bin. The

individual defect densities in the 182 bins comprise the

DDSD vector we wish to derive. However, given that

there are only 155 test vectors, we can obtain only 155

yields per test. Consequently, there are more unknowns

than equations, which means the minimization is an

underdetermined problem with an infinite number of solutions.

To reformulate the problem so that it is solvable, we

grouped sample points for defect size ranges into

fewer, wider bins, thus reducing the overall number of

densities to be derived. Figure 3 shows the 19 bins

used for this experiment. We recalculated critical-area

functions for the new bin arrangements, represented

by the black symbols in Figure 3. This reconstruction

doesn’t affect the T matrix, so there is no need to res-

imulate the faults. We used principal component

regression to find the values for the 19 bins that make

up the DDSDs. We obtained 95% confidence intervals

for the extracted DDSDs, using standard bootstrapping

techniques.14
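The bin-merging and bootstrap steps can be sketched as follows, substituting ordinary least squares for principal component regression (all inputs are synthetic, and the bin edges are illustrative rather than the experiment's actual defect-size ranges):

```python
import numpy as np

rng = np.random.default_rng(3)
num_tests, fine_bins, coarse_bins = 155, 182, 19     # sizes from the experiment

# Hypothetical fine-grained critical areas per test and defect-size point.
A_fine = rng.random((num_tests, fine_bins)) * 1e-3

# Merge the 182 sample points into 19 wider bins by summing their columns.
edges = np.linspace(0, fine_bins, coarse_bins + 1, dtype=int)
A = np.add.reduceat(A_fine, edges[:-1], axis=1)

d_true = rng.random(coarse_bins)                     # synthetic densities
p_hat = np.exp(-A @ d_true) * rng.normal(1.0, 1e-3, num_tests)  # noisy yields

def extract(A_mat, p):
    # Least-squares stand-in for the principal component regression step.
    d, *_ = np.linalg.lstsq(A_mat, -np.log(p), rcond=None)
    return d

# Bootstrap: resample tests with replacement, re-extract, take percentiles.
boots = np.array([extract(A[idx], p_hat[idx])
                  for idx in rng.integers(0, num_tests, (200, num_tests))])
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)   # 95% confidence bands
```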

Figure 4 shows the final extracted results of the analy-

sis for all five metal layers. The triangles represent the

19 extracted DDSD vector components, and the small

circles represent the assumed DDSD components.

Although the results aren’t perfect, the inserted DDSD

and the extracted DDSD correlate well—a positive and

promising result. Figure 4 also shows the 95% confi-

dence intervals for each DDSD component. Some of the

confidence intervals are quite large. The source of this

variance can be traced to the properties of the critical-

area functions and the T matrix. Specifically, critical-

area functions that contribute to one test’s failing

correlate strongly with critical-area functions con-

tributing to other test patterns.

Silicon experiment

After the success of the simulation experiment, we

conducted a similar experiment on a chip manufac-

tured in a commercial facility. The chip is an array of

64-bit ALUs manufactured in a 0.11-micron process. LSI

Logic designed the chip as a process development and

silicon-debugging vehicle closely mimicking the design

style of the company’s other digital-logic products.

Hence, the chip is ideally suited for testing and validat-

ing our DDSD extraction strategy. Each die contains 384

ALUs, each independently controllable and observable

(similar to the assumption made in the simulation

experiment).

The chip’s structure is convenient from the perspec-

tive of scale because the die is partitioned into many

small blocks, each a single ALU. Although not all

designs are this regular, large designs are frequently par-

titioned into smaller blocks and tested separately with

scan chains. Analyzing each block independently or

limiting the analysis to just a handful of blocks is one

strategy for coping with the large number of macro-

events associated with an industrial design.

We performed the experiment in almost the same

manner as that of the simulation experiment. We

adjusted the critical-area bins to account for the small-

er feature size. The bin edges were 0.1, 0.2, 0.4, 1, and

2 microns. The silicon chip was routed in six layers

rather than five and thus required 23 bins (like metal

layer 5, metal layer 6 was captured with only three

bins). Another difference in this experiment was that

we used real test results for a test set containing 262

patterns provided by the manufacturer. We extracted

the results using 451 failing ALUs; the part's yield is proprietary information, so we don't disclose the total number of manufactured ALUs.

Figure 5 shows the extracted DDSDs for the six

metal layers. We did not simply parameterize an

assumed model, yet the extracted curve for each layer

follows a power law distribution, a DDSD shape typi-

cally found in manufacturing processes. This strongly

indicates that these results are meaningful. Addition-

ally, the plots indicate that although the distributions

don’t vary widely, there are differences in defect den-

sities from layer to layer. The y-axis in each graph has

the same range, making plot comparisons easier. Final-

ly, the large confidence intervals for the smallest

defect sizes in metal layers 5 and 6 occur because

there is very little critical area for small defects in the

higher metal layers, as Figure 6 shows. This can be the

result of either design rules that force lines to be far-

ther apart or simply the decreased routing density in

those layers. Either way, there is limited ability to

observe small defects in those layers—hence, the large

confidence intervals.

Figure 4. Assumed and extracted DDSDs for all metal layers and corresponding 95% confidence intervals: metal 1 (a), metal 2 (b), metal 3 (c), metal 4 (d), and metal 5 (e). (Each panel plots defect density in cm⁻² against defect radius in μm and marks the extracted defect density, the presumed defect density, and the 95% confidence interval.)

The results of the experiment on chips fabricated in silicon confirm the results of the simulation experiment:

We can measure DDSDs that characterize a process in

ordinary digital circuits using only slow, structural test

results from the product.

RATHER THAN DISCARDING pass/fail test results once a

part has been sorted, we can derive valuable process

characteristics from the test data. Our strategy extracts

DDSDs consistent with those we’d expect to see for a

modern manufacturing process—an achievement not

previously accomplished without using additional sili-

con area. Our ongoing research is looking for ways to

improve accuracy by using high-fidelity fault models and

greater data volume, as well as by accounting for yield

loss due to other defect types such as open circuits.

Many manufacturers continue to rely on inspection

techniques whose quality degrades with every new

process generation. Our approach to extracting process

characteristics doesn’t suffer from the same degrada-

tion. Although manufacturers stand to gain much from

using this approach, our strategy also offers an oppor-

tunity for fabless companies to gain insight into the fab-

rication of their chips. For the first time, such companies

can independently compute their products’ defect char-

acteristics and improve design yield by tuning designs

for a given fabline. ■

Acknowledgments

Semiconductor Research Corporation supported

this work under contract 1172.001.

Figure 5. Extracted DDSDs for all metal layers in a fabricated 64-bit ALU test chip, and corresponding 95% confidence intervals. Defect densities are hidden to protect IP, but the scale of all plots is identical. Metal 1 (a), metal 2 (b), metal 3 (c), metal 4 (d), metal 5 (e), and metal 6 (f). (Each panel plots defect density against defect radius, 0.2–2 μm.)

References

1. W. Maly and J. Deszczka, "Yield Estimation Model for VLSI Artwork Evaluation," Electronics Letters, vol. 19, no. 6, Mar. 1983, pp. 226-227.
2. D. Schmitt-Landsiedel et al., "Critical Area Analysis for Design-Based Yield Improvement of VLSI Circuits," Quality and Reliability Eng. Int'l, vol. 11, 1995, pp. 227-232.
3. D.J. Ciplickas, X. Li, and A.J. Strojwas, "Predictive Yield Modeling of VLSICs," Proc. 5th Int'l Workshop Statistical Metrology (WSM 00), IEEE Press, 2000, pp. 28-37.
4. J. Khare, D. Feltham, and W. Maly, "Accurate Estimation of Defect-Related Yield Loss in Reconfigurable VLSI Circuits," IEEE J. Solid-State Circuits, vol. 28, no. 2, Feb. 1993, pp. 146-156.
5. Y.J. Kwon and D.M.H. Walker, "Yield Learning via Functional Test Data," Proc. Int'l Test Conf. (ITC 95), IEEE Press, 1995, pp. 626-635.
6. W. Maly, Spot Defect Size Measurements Using Results of Functional Test for Yield Loss Modeling of VLSI IC, white paper, Carnegie Mellon Univ., 2004.
7. J.E. Nelson et al., "Extraction of Defect Density and Size Distributions from Wafer Sort Test Results," Proc. Design, Automation and Test in Europe (DATE 06), IEEE Press, 2006, pp. 913-918.
8. J.E. Nelson et al., Extraction of Defect Density and Size Distributions from Wafer Probe Test Results, tech. report CSSI 05-02, Center for Silicon System Implementation, Carnegie Mellon Univ., 2005.
9. C.H. Stapper, "Modeling of Integrated Circuit Defect Sensitivities," IBM J. Research and Development, vol. 27, no. 6, Nov. 1983, pp. 549-557.
10. K.C.Y. Mei, "Bridging and Stuck-at Faults," IEEE Trans. Computers, vol. 23, no. 7, July 1974, pp. 720-727.
11. R.C. Aitken and P.C. Maxwell, "Biased Voting: A Method for Simulating CMOS Bridging Faults in the Presence of Variable Gate Logic Thresholds," Proc. Int'l Test Conf. (ITC 93), IEEE Press, 1993, pp. 63-72.
12. R.D. Blanton, Methods for Characterizing, Generating Test Sequences for, and Simulating Integrated Circuit Faults Using Fault Tuples and Related Systems and Computer Program Products, US Patent 6,836,856, Patent and Trademark Office, 2004.
13. F. Brglez and H. Fujiwara, "A Neutral Netlist of 10 Combinational Benchmark Designs and a Special Translator in Fortran," Proc. Int'l Symp. Circuits and Systems (ISCAS 85), IEEE Press, 1985, pp. 695-698.
14. B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.

Figure 6. Total critical-area functions per layer extracted from all metal layers of a 64-bit ALU. (Axes: defect radius, 0–2.0 μm, vs. critical area, 0 to 5 × 10⁴ μm²; one curve per metal layer 1–6.)

Jeffrey E. Nelson is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include process characterization and testing of digital systems. He has a BS and an MS in electrical and computer engineering from Rutgers University and Carnegie Mellon University, respectively. He is a member of the IEEE.

Thomas Zanon is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University and a yield ramping consulting engineer at PDF Solutions, in San Jose, California. His research interests include defect and process characterization based on test results. Zanon has a Dipl. Ing. degree in electrical engineering and information technology from the Technische Universitaet Muenchen. He is a member of the IEEE and EDFAS.

Jason G. Brown is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include defect-based test, inductive fault analysis, and layout-driven diagnosis. He has a BS in electrical engineering from Worcester Polytechnic Institute and an MS in computer engineering from Carnegie Mellon University.

Osei Poku is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include various aspects of test and diagnosis of VLSI circuits, such as automatic test pattern generation, volume diagnosis, and diagnosis-based yield learning. Poku has a BS in electrical engineering from Hampton University and an MS in electrical and computer engineering from Carnegie Mellon University.

R.D. (Shawn) Blanton is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University, where he is the associate director of the Center for Silicon System Implementation (CSSI). His research interests include test and diagnosis of integrated, heterogeneous systems. He has a BS in engineering from Calvin College, an MS in electrical engineering from the University of Arizona, and a PhD in computer science and engineering from the University of Michigan, Ann Arbor.

Wojciech Maly is the Whitaker Professor of Electrical and Computer Engineering at Carnegie Mellon University. His research interests focus on the interfaces between VLSI design, testing, and manufacturing, with emphasis on the stochastic nature of phenomena relating these three VLSI domains. Maly has an MSc in electronic engineering from the Technical University of Warsaw and a PhD from the Institute of Applied Cybernetics, Polish Academy of Sciences.

Brady Benware is a staff engineer in the Product Engineering group at LSI Logic, where his current focus is on developing defect-based test methods to achieve very low defective-parts-per-million levels. Benware has a PhD in electrical engineering from Colorado State University.

Chris Schuermyer is an engineer in the Advanced Defect Screening group at LSI Logic. His research interests include test for yield and defect learning, defect-based testing, and logic diagnosis. He has a BS in physics and a BS and an MS in electrical engineering, all from Portland State University.

Direct questions or comments about this article to R.D. Blanton, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.


ITC Special Section

0740-7475/06/$20.00 © 2006 IEEE. Copublished by the IEEE CS and the IEEE CASS.

THIS TRANSITION-FAULT-TESTING TECHNIQUE

combines the launch-off-shift method and an enhanced

launch-off-capture method for scan-based designs. The

technique improves fault coverage and reduces pattern

count and scan-enable design effort. It is practice orient-

ed, suitable for low-cost testers, and implementable with

commercial ATPG tools.

Scan-based structural tests increasingly serve as a cost-

effective alternative to the at-speed functional-pattern

approach to transition delay testing.1,2 Transition fault

testing involves applying a pattern pair (V1, V2) to the

circuit under test. V1 is the initialization pattern, and V2

is the launch pattern. V2 launches the desired signal tran-

sition (0 → 1 or 1 → 0) at the target node, and the

response of the circuit under test is captured at func-

tional speed (the rated clock period). The entire oper-

ation consists of three cycles:

■ initialization—a scan-in operation applies V1;

■ launch—a transition is launched at the target gate ter-

minal (V2 is applied); and

■ capture—the transition is captured at an observable

point.

Transition fault test patterns can be generated and

applied in three ways: the launch-off-shift (LOS)

or skewed-load method, the launch-off-capture (LOC) or

broadside method, or the enhanced-scan method. In

this article, we focus only on the first two methods. In LOS,

the transition at a target gate output is

launched in the last shift cycle during

the shift operation. Figure 1a shows the

waveforms during a LOS operation’s

cycles. The launch cycle is part of the

shift operation and is immediately fol-

lowed by a fast capture pulse. The time

period for the scan-enable signal (SEN)

to make this 1 → 0 transition corre-

sponds to the functional frequency. Hence, LOS

requires that SEN be timing critical. In LOC, the transi-

tion is launched and captured through the functional

pin (D) of any flip-flop in the scan chain.

Figure 1b shows the waveforms of the LOC method,

which separates the launch cycle from the shift opera-

tion. Because launch pattern V2 depends on the func-

tional response of initialization vector V1, the launch

path is less controllable, so test coverage is low. LOC

relaxes the at-speed constraint on SEN and adds dead

cycles after the last shift to provide enough time for SEN

to settle low.
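The difference between the two launch schemes can be illustrated with a toy scan-chain model (entirely hypothetical; real flows operate on full scan chains and netlists): in LOS, V2 is one further shift of the chain, whereas in LOC, V2 is the circuit's functional response to V1.

```python
# Toy 4-bit scan-chain model (illustrative only; not from the article).

def launch_off_shift(v1, scan_in_bit):
    """LOS: V2 is one more shift of the chain, so it is directly
    controllable from the test pattern (plus the incoming scan bit)."""
    return [scan_in_bit] + v1[:-1]

def launch_off_capture(v1, logic):
    """LOC: V2 is the circuit's functional response to V1, so it is
    only indirectly controllable through V1."""
    return logic(v1)

def inverting_logic(v):
    # Stand-in combinational logic: bitwise inversion.
    return [1 - b for b in v]

v1 = [1, 0, 1, 1]
print(launch_off_shift(v1, 0))                  # [0, 1, 0, 1]
print(launch_off_capture(v1, inverting_logic))  # [0, 1, 0, 0]
```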

As device frequencies become higher, production

test equipment capabilities limit the ability to test a

device at speed. Rather than purchasing a more expen-

sive tester, test engineers use one of several on-chip DFT

alternatives, such as an on-chip clock generator for at-

speed clock, pipeline SEN generation, or on-chip at-

speed SEN generation3 for LOS transition fault testing.

The LOS method is preferable to the LOC method in

terms of ATPG complexity and pattern count. However,

because of increasing design sizes, the SEN fan-out

exceeds that of any other net in the design. LOS constrains SEN

to be timing critical, requiring a design effort that makes

it difficult to implement products in reasonable turn-

around times. That’s the main reason for the widespread

use of the LOC method, especially on very low-cost

testers.2 In this article, we propose a hybrid technique

that uses both LOS and LOC in scan-based designs, providing higher fault coverage and lower pattern count with a small scan-enable design effort. (The "Related work" sidebar discusses other approaches to improving transition delay test quality.)

Improving Transition Delay Test Using a Hybrid Method

Nisar Ahmed and Mohammad Tehranipoor, University of Connecticut

Editor's note: Structured delay test using scan transition tests is becoming commonplace. But high coverage and compact tests can still be elusive in some situations. The authors propose a novel technique combining the cost-effectiveness of launch-from-capture test with the coverage/pattern-volume advantages of launch-from-shift.
—Ken Butler, Texas Instruments

Overview

Our proposed scan architecture controls a small subset of selected scan cells by the LOS method, and controls the remaining scan cells by the enhanced launch-off-capture, or ELOC, method (see the "Related work" sidebar). We use an efficient ATPG-based controllability-and-observability measurement approach to select the scan cells controlled by LOS or ELOC. The selection criteria improve fault coverage and reduce the overall pattern count. Because only a few scan cells are LOS controlled, only a small subset of the scan chains' SEN signals must be timing closed; this reduces the scan-enable design effort. The method is robust and practice-oriented, and it uses existing commercial ATPG tools.4

To control the scan chain operation mode (LOS or ELOC), two new cells called local scan-enable generators (LSEGs) generate on-chip SEN signals. The scan-enable control information for the launch and capture cycles is embedded in the test data itself. The LSEGs can be inserted anywhere in the scan chain with negligible hardware area overhead. The proposed technique is suitable for low-cost testers because it doesn't require an external at-speed SEN.

Motivation

ELOC improves the controllability of launching a transition through either the scan path or the functional path.5 However, it provides less observability than LOS does, because a scan chain working in shift mode to launch a transition is not observable at the time of capture (SEN is held high during the launch and capture cycles). Therefore, ELOC's fault coverage is less than that of LOS but greater than that of LOC. Figure 2a shows fault coverage analysis for the three transition fault methods. A common set of transition faults is detected by both LOS and LOC, and some faults in the LOC transition fault set are not detected by LOS, such as shift-dependency untestable faults.6,7 However, ELOC covers LOC's entire transition fault set and also detects some extra faults in the LOS-detected fault set. This is because LOC is a special case of ELOC in which all local SEN signals are held at 0 during the launch and capture cycles. ELOC provides an intermediate fault coverage point between LOS and the conventional LOC method.5

To improve fault coverage and identify the union of the fault sets detected in the LOS and ELOC modes, the scan cells must be controllable in both modes. Also, to reduce the design effort for the at-speed scan-enable signal (required for LOS), we must determine the minimum number of scan cells that require very high controllability and observability during pattern generation. We control this smaller subset of scan cells in LOS mode, and the remaining scan cells in ELOC mode. This reduces the design effort to timing-close the SEN signal at speed, as required for LOS-controlled scan flip-flops.

Figure 2b shows an example of a hybrid scan architecture with eight scan chains. The LOS-controlled scan flip-flops are stitched into separate scan chains. A fast SEN signal controls the first three scan chains, which contain the LOS-controlled flip-flops, and a slow SEN signal controls the remaining scan chains in ELOC mode. Moreover, this architecture also requires configuring the LOS-controlled scan chains in functional mode, because some faults are detected only by LOC and not by LOS.

Local SEN generation

The new method for testing transition faults provides more controllability in launching a transition but requires an independent SEN for each scan chain. We can use multiple scan-enable ports, but this increases the number of pins. Instead, two types of SEN signals must be generated on chip. The scan-enable control information for the scan flip-flops differs only during the pattern's launch and capture cycles. Hence, we can use the low-speed SEN signal from the external tester for the scan shift operation and internally generate the scan-enable control information for only the launch and capture cycles.

Figure 1. Transition delay fault pattern generation methods: launch-off-shift (LOS) (a) and launch-off-capture (LOC) (b).

Figure 2. Hybrid method analysis and architecture: fault analysis of LOS, LOC, and ELOC techniques (a), and hybrid scan architecture, with LOS-controlled scan chains using a fast SEN signal and ELOC-controlled scan chains using a slow SEN signal (b).

Related work

Wang, Liu, and Chakradhar propose a hybrid scan architecture that controls a small subset of selected scan cells by launch-off-shift (LOS), and the rest by launch-off-capture (LOC).1 The authors designed a fast scan-enable signal (SEN) generator that drives the LOS-controlled scan flip-flops. The selection criteria for the LOS-controlled scan flip-flops determine the method's effectiveness. In some cases, the number of patterns generated by the hybrid method exceeds the LOC pattern count. Moreover, the LOS-controlled flip-flops cannot be used in LOC mode. Figure A1 shows the SEN waveforms of this hybrid technique.

In a new scan-based, at-speed test called enhanced launch-off-capture (ELOC), the ATPG tool deterministically targets the transition launch path either through a functional path or the scan path.2 The technique improves transition fault testing controllability and fault coverage, and it does not require SEN to change at speed. Figure A2 shows SEN waveforms in the ELOC technique. The SEN signal of a subset of scan chains stays at 1 (SEN1) during the launch and capture cycles to launch the transition only. The second SEN signal (SEN2) controls the remaining scan chains to launch a transition through the functional path during the launch cycle and capture the response during the capture cycle. Figure A3 shows a circuit with two scan chains, chain 1 acting as a shift register, and chain 2 acting in functional mode. The conventional LOC method is a special condition of the ELOC method in which the SEN signals of all chains are 0 during the launch and capture cycles.

Two other proposed techniques improve LOS fault coverage by reducing shift dependency.3,4 A technique by Li et al. reorders the scan flip-flops to minimize the number of undetectable faults, and restricts the distance by which a scan flip-flop can be moved to create the new scan chain order.3 Gupta et al. propose a technique that inserts dummy flip-flops and reorders scan flip-flops, considering wire length costs, to improve path delay fault coverage.4 Wang and Chakradhar propose using a special ATPG to identify pairs of adjacent flip-flops and inserting test points (dummy gates or flip-flops) between them.5

Figure A. Previously proposed techniques: SEN waveforms in hybrid scan technique (1), SEN waveforms in enhanced LOC (ELOC) technique (2); ELOC controllability, with chain 1 used in shift mode and chain 2 in functional mode (3).

References
1. S. Wang, X. Liu, and S.T. Chakradhar, "Hybrid Delay Scan: A Low Hardware Overhead Scan-Based Delay Test Technique for High Fault Coverage and Compact Test Sets," Proc. Design, Automation and Test in Europe (DATE 03), IEEE Press, 2004, pp. 1296-1301.
2. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar, "Enhanced Launch-off-Capture Transition Fault Testing," Proc. Int'l Test Conf. (ITC 05), IEEE Press, 2005, pp. 246-255.
3. W. Li et al., "Distance Restricted Scan Chain Reordering to Enhance Delay Fault Coverage," Proc. 18th Int'l Conf. VLSI Design, IEEE Press, 2005, pp. 471-478.
4. P. Gupta et al., "Layout-Aware Scan Chain Synthesis for Improved Path Delay Fault Coverage," Proc. Int'l Conf. Computer-Aided Design (ICCAD 03), IEEE Press, 2003, pp. 754-759.
5. S. Wang and S.T. Chakradhar, "Scalable Scan-Path Test Point Insertion Technique to Enhance Delay Fault Coverage for Standard Scan Designs," Proc. Int'l Test Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.

LSEG cells

Because our hybrid technique uses both the LOS and enhanced LOC techniques, we must generate both fast and slow local SEN signals. We propose two LSEG cells to generate on-chip local SENs using a low-speed external SEN generated by a low-cost tester.

Slow scan-enable generator (SSEG). We designed an LSEG to control a scan flip-flop's transition launch path.5 In this article, we refer to this cell as the slow scan-enable generator (SSEG) because the local SEN signal does not make an at-speed transition. Figure 3a shows the SSEG cell architecture. It consists of a single flip-flop that loads the control information required for the launch and capture cycles. The input scan-enable (SENin) pin connected to the external SEN signal from the tester is called global scan-enable (GSEN). An additional output scan-enable pin (SENout) carries the local scan-enable (LSEN) signal. Therefore, after LSEN takes the value of control state Q at the end of the shift operation (that is, after GSEN is deasserted), it remains in this state until GSEN asynchronously sets it to 1. The SSEG cell essentially holds the value 0 or 1 loaded at the end of the shift operation (GSEN = 1) for the launch and capture cycles.

Table 1 shows the SSEG cell's operation modes. GSEN = 1 represents the pattern's normal shift operation. When GSEN = 0 and Q = 1, LSEN = 1, and the controlled scan flip-flops act in shift mode to launch the transitions only (shift-launch, no-capture mode). Moreover, there is no capture, because the LSEN signal stays at 1 (LSEN = 1 → 1 at the launch edge); the other, observable scan flip-flops perform the capture. The LSEN-controlled scan flip-flops act in the conventional LOC mode when GSEN = 0 and Q = 0 (functional-launch-capture mode).

Fast scan-enable generator (FSEG). Figure 3b shows our new local, at-speed, scan-enable generator architecture, which we call the fast scan-enable generator (FSEG). Table 2 shows the FSEG cell's operation modes. As in SSEG cell operation, GSEN = 1 represents the pattern's normal shift operation. When GSEN = 0 and Q = 1, LSEN = 1, and the scan flip-flops act in the shift-launch-capture mode to launch the transition from the scan path and capture the response at the next capture cycle (the conventional LOS method). The LSEN from the FSEG cell makes a 1 → 0 at-speed transition at the launch cycle. The LSEN-controlled scan flip-flops act in the conventional LOC mode when GSEN = 0 and Q = 0 (functional-launch-capture mode).

LSEG cell operation

LSEG cells inserted in the scan chains pass control information as part of the test data. The scan-enable control information is part of each test pattern and is stored in the tester's memory. Figure 4a shows the normal scan architecture with a single SEN signal from the external tester. The scan chain contains eight scan flip-flops, and the shifted test pattern is 10100110. Figure 4b shows the same circuit generating an LSEN signal from the test pattern data for the hybrid transition fault test method. The main objective is to deassert the external GSEN signal after the entire shift operation and then generate the LSEN signal from the test data during the launch and capture cycles. In this case, the shifted pattern is modified to [C]10100110, where C is the scan-enable control bit stored in the LSEG cell at the end of the scan operation.

The GSEN signal asynchronously controls the shift operation. GSEN is deasserted after the nth shift (initialization) cycle, where n = 9 is the length of the scan chain after insertion of the LSEG cell. After GSEN is deasserted at the end of the shift operation, the scan-enable control during the launch and capture cycles is the control bit C stored in the LSEG. At the end of the capture cycle, GSEN asynchronously sets the LSEN signal to 1 for scanning out the response.
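The shift phase can be illustrated with a toy simulation (an illustrative sketch, not the authors' tooling): the control bit C is shifted in ahead of the 8-bit test pattern, so after n = 9 shift cycles it sits in the LSEG cell while the pattern occupies the scan flip-flops. Which end of the chain the LSEG occupies, and the serial bit order, are assumptions made here for illustration.

```python
def shift_in(n_cells, serial_stream):
    """Simulate shifting a serial stream into a scan chain, one cycle per
    bit; the first bit shifted in ends up in the cell nearest scan-out
    (assumed here to be the LSEG cell)."""
    chain = [0] * n_cells
    for bit in serial_stream:
        chain = [bit] + chain[:-1]   # new bit enters at scan-in end
    return chain

control_bit = 1                      # C: selects shift-launch mode
pattern = [1, 0, 1, 0, 0, 1, 1, 0]  # the 8-bit test pattern "10100110"
chain = shift_in(9, [control_bit] + pattern)
lseg, flops = chain[-1], chain[:-1]
assert lseg == control_bit           # after n = 9 cycles, C is in the LSEG
assert flops[::-1] == pattern        # scan flip-flops hold the pattern
```

Once GSEN is deasserted, the bit left in the LSEG (rather than the external tester) supplies the scan-enable value for the launch and capture cycles.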

Figure 4c shows the process of test pattern application and the timing waveforms for the two LSEG cells, SSEG and FSEG.

The LSEN signal an LSEG cell generates is the logical OR of GSEN and the stored control state Q:

LSEN = GSEN + Q = { 1 if GSEN = 1; Q if GSEN = 0 }


Figure 3. LSEG cells: slow scan-enable generator (SSEG) cell (a) and fast scan-enable generator (FSEG) cell (b).

Table 1. SSEG operation, where GSEN is the global scan-enable signal, Q is the flip-flop's state, and LSEN is the local scan-enable signal.

GSEN  Q  LSEN    Operation
1     X  1       Shift
0     1  1 → 1   Shift-launch (no capture)
0     0  0 → 0   Functional launch and capture

Table 2. FSEG operation.

GSEN  Q  LSEN    Operation
1     X  1       Shift
0     1  1 → 0   Shift-launch capture
0     0  0 → 0   Functional launch and capture
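As a sanity check, the launch- and capture-cycle behavior in Tables 1 and 2 can be encoded as a small simulation. This is a sketch of the tables' semantics only; the cycle-level pulse shaping of the real FSEG hardware is abstracted into a per-cycle (launch, capture) pair of LSEN values.

```python
def sseg_lsen(gsen, q):
    """SSEG: LSEN holds the stored control bit Q while GSEN is low.
    Returns the (launch, capture) cycle values of LSEN."""
    if gsen == 1:                     # shift mode: LSEN forced high
        return (1, 1)
    return (q, q)                     # GSEN = 0: LSEN holds Q in both cycles

def fseg_lsen(gsen, q):
    """FSEG: with Q = 1, LSEN makes an at-speed 1 -> 0 transition at the
    launch edge (conventional LOS behavior)."""
    if gsen == 1:
        return (1, 1)
    return (1, 0) if q == 1 else (0, 0)

# GSEN = 0 rows of Tables 1 and 2
assert sseg_lsen(0, 1) == (1, 1)   # shift-launch, no capture
assert sseg_lsen(0, 0) == (0, 0)   # functional launch and capture (LOC)
assert fseg_lsen(0, 1) == (1, 0)   # shift-launch capture (LOS)
assert fseg_lsen(0, 0) == (0, 0)   # functional launch and capture (LOC)
```

Note that both cells collapse to LOC behavior when the stored bit is 0, which is why the ATPG tool can still exercise LOC-only faults through either cell type.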


Flip-flop selection: Measuring controllability and observability

In the LOS technique, the fault activation path (the scan path), unlike the functional path used in the LOC method, is fully controllable from the scan chain input. Hence, in most cases, for the same detected fault, a LOS pattern requires fewer care bits than a LOC pattern. The controllability measure of a scan flip-flop is the percentage of patterns in the entire pattern set (P) for which a care bit is required in the scan flip-flop to enable activation or propagation of a fault effect. Figure 5 shows a scan flip-flop with an input (observability) and an output (controllability) logic cone. A large output logic cone implies that the scan flip-flop controls a greater number of faults; that is, a care bit will be required for their activation or propagation. Similarly, the input logic cone determines a scan flip-flop's observability, which we define as the percentage of patterns in the entire pattern set (P) for which a valid care bit is observed in the scan flip-flop.

In a transition fault test pattern pair (V1, V2), initialization pattern V1 is essentially an IDDQ pattern to initialize the target gate to a known state. In the next time frame, pattern V2 is a stuck-at-fault test pattern to activate and propagate the required transition at the target node to an observable point. Therefore, to find the controllability-observability measure of a scan flip-flop, we use an ATPG tool to generate stuck-at patterns and force it to fill in don't-care (X) values for scan flip-flops that don't affect any fault's activation or propagation. The ith scan flip-flop's controllability is Ci = pc/P, where pc is the number of patterns with a care bit in the scan flip-flop during scan-in, and P is the total number of stuck-at patterns. Similarly, its observability is Oi = po/P, where po is the number of patterns with a care bit in the scan flip-flop during scan-out.

Figure 4. LSEG cell operation: scan chain architecture (a), LSEN generation using LSEG (b), and LSEN generation process and waveforms (c).

Figure 5. Scan flip-flop controllability-and-observability measure.
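For concreteness, the per-flip-flop measures Ci = pc/P and Oi = po/P can be computed directly from the pattern set. The sketch below uses hypothetical care-bit data: each string in `scan_in`/`scan_out` is one pattern, with care bits '0'/'1' and don't-cares 'X' per flip-flop position.

```python
def controllability_observability(scan_in, scan_out):
    """Per-flip-flop Ci = pc/P and Oi = po/P, where pc (po) counts the
    patterns with a care bit in that flip-flop during scan-in (scan-out)."""
    P = len(scan_in)                  # total number of stuck-at patterns
    n = len(scan_in[0])               # number of scan flip-flops
    C = [sum(p[i] != 'X' for p in scan_in) / P for i in range(n)]
    O = [sum(p[i] != 'X' for p in scan_out) / P for i in range(n)]
    return C, O

# Hypothetical 4-pattern set for a chain of 3 scan flip-flops
scan_in  = ["1X0", "0XX", "XX1", "1X0"]
scan_out = ["X1X", "01X", "XXX", "X10"]
C, O = controllability_observability(scan_in, scan_out)
print(C)  # [0.75, 0.0, 0.75]
print(O)  # [0.25, 0.75, 0.25]
```

A flip-flop such as the middle one here (never a care bit on scan-in, frequently observed on scan-out) would score low on the combined cost function and thus be a poor candidate for LOS control.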

We then use each scan flip-flop's measured controllability and observability factors to determine the cost function CFi = CiOi. The scan flip-flops are arranged in decreasing order of cost function, and a subset with very high cost functions is selected as LOS-controlled flip-flops. The ATPG-based controllability-observability measurement technique overcomes a limitation of the Scoap-based method8 used by Wang, Liu, and Chakradhar,6 which can select a scan flip-flop that has high 0 (1) controllability but is never actually controlled to 0 (1) by the ATPG tool during pattern generation.

Case study

The following case study illustrates the DFT insertion and ATPG flow of our hybrid scan transition fault-testing technique. It includes an analysis of extra detected faults.

Test architecture

The LSEG-based solution presented here provides a method of generating internal LSEN signals from pattern data, and the GSEN signal from the tester. The overhead of generating the LSEN signal is the additional LSEG (SSEG or FSEG) cell in the scan chain. An LSEG cell's area overhead is a few extra gates, which is negligible in modern designs. We assume that the area overhead of the buffer tree required to drive all the LOS-controlled scan flip-flops through the LSEG cells is equal to the overhead of applying an at-speed GSEN signal from external ATE.

Figure 6 shows a multiple-scan-chain architecture with n scan chains. The LOS-controlled scan flip-flops determined by the controllability-observability measurement are stitched into separate scan chains. Each scan chain i, where 1 ≤ i ≤ n, contains an LSEG (FSEG or SSEG) cell that generates signal LSENi for that scan chain. The GSEN signal connects only to the SENin ports of the LSEG cells.

Study description

In this case study, we experimented with a subchip of an industrial-strength design with the characteristics listed in Table 3. One LSEG cell is inserted per scan chain. The test strategy was to get the highest possible transition fault test coverage. When generating test patterns for transition faults, we targeted only faults in the same clock domain. During pattern generation, only one clock is active during the launch and capture cycles; hence, only faults in that particular clock domain are tested. All primary inputs remain unchanged, and all primary outputs are unobservable during test-pattern generation for transition faults, because very low-cost testers are not fast enough to provide PI values and strobe POs at speed.

DFT insertion

We measure a scan flip-flop's cost function (controllability × observability) using the ATPG-based technique explained earlier. Figure 7 shows the cost function of each scan flip-flop in our design. Only approximately 20% to 30% of the flip-flops require very high controllability and observability; hence, SEN need not be at speed for all scan flip-flops. We arrange the scan flip-flops in decreasing order of cost function, and we use this order during scan insertion.


Figure 6. Hybrid scan test architecture: FSEG cells driving LOS-controlled scan chains, and SSEG cells driving ELOC-controlled scan chains.

Table 3. Case study design characteristics.

Characteristic           No.
Clock domains            6
Scan chains              16
Scan flip-flops          10,477
Nonscan flip-flops       13
Transition delay faults  320,884


In the new order of scan chains, the few initial chains consist of flip-flops with very high controllability and observability, and we select them for LOS according to their average cost function. We measure a scan chain's average cost function as ∑CFi/N, where CFi = Ci × Oi is the cost function of the ith scan flip-flop in the chain, and N is the number of flip-flops in the scan chain. Figure 8 shows each chain's average cost function for normal scan insertion and after scan insertion based on controllability and observability. As Figure 8b shows, after this scan insertion, the average cost function of the first five scan chains is very high (because of scan flip-flops with very high cost functions) and very low for the rest of the chains. Therefore, we can design the first five chains' SEN signal to be at speed (controlled by the FSEG cell), and the rest of the scan chains can use a slow-speed SEN (controlled by the SSEG cell).
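The chain-level selection just described can be sketched as follows. The per-flip-flop cost functions and the chain names are hypothetical; the idea is simply to rank chains by average cost function and hand the top few to the fast (FSEG) generator.

```python
def assign_sen_generators(chains, num_los_chains):
    """Rank scan chains by average cost function (sum(CF_i)/N) and assign
    the top num_los_chains to FSEG (LOS control), the rest to SSEG (ELOC)."""
    avg_cf = {name: sum(cfs) / len(cfs) for name, cfs in chains.items()}
    ranked = sorted(avg_cf, key=avg_cf.get, reverse=True)
    return {name: ("FSEG" if i < num_los_chains else "SSEG")
            for i, name in enumerate(ranked)}

# Hypothetical per-flip-flop cost functions CF_i = C_i * O_i for 4 chains
chains = {
    "c0": [0.20, 0.18, 0.25],   # high-cost flip-flops: LOS candidates
    "c1": [0.15, 0.22, 0.19],
    "c2": [0.01, 0.02, 0.01],   # low-cost flip-flops: ELOC is enough
    "c3": [0.02, 0.01, 0.03],
}
print(assign_sen_generators(chains, num_los_chains=2))
# {'c0': 'FSEG', 'c1': 'FSEG', 'c3': 'SSEG', 'c2': 'SSEG'}
```

Only the FSEG-assigned chains then need their SEN net timing closed at speed, which is the design-effort saving the hybrid method targets.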

We used the Synopsys DFT Compiler for scan chain insertion.4 To insert the LSEG cells, the synthesis tool must recognize the LSEG cell as a scan cell and stitch it into the chain. This means the LSEG cell must be defined as a new library cell with scan cell attributes. A workaround is to design the LSEG cell as a module, instantiate it, and declare it as a scan segment of length 1. The GSEN signal is connected to all LSEG cell SENin pins. During scan insertion, we specify only the LSEG cell's position in the scan path; the tool stitches the rest of the cells, including the LSEG cell, and balances the scan chains according to the longest-scan-chain-length parameter. Additionally, the tool provides the flexibility to hook up each LSEG cell's SENout port in a particular chain to all the SENin ports of the scan flip-flops in that chain.

ATPG

The ATPG tool must understand the LSEG methodology and deterministically choose the transition fault activation path. We used a commercial ATPG tool, Synopsys TetraMax,4 which supports two ATPG modes: basic scan and sequential. Basic-scan ATPG is a combinational-only mode with only one capture clock between pattern scan-in and response scan-out; the sequential mode uses a sequential time-frame ATPG algorithm. By default, when generating test patterns for the transition fault model in functional launch mode, the ATPG tool uses a two-clock ATPG algorithm that has some features of both the basic-scan and sequential engines. The tool understands the LSEG technique and can choose the launch path for the target transition fault deterministically. Hence, there is no fundamental difference in ATPG methodology when we use the LSEG-based solution.

The SEN signal for the flip-flops in the launch and capture cycles comes from an internally generated signal. The OR gate in the LSEG cell generates the LSEN signal through a logical OR of GSEN and the flip-flop's Q output (see Figure 3). The GSEN signal is high during the scan shift operation. The tool determines each chain's LSEN and shifts the control value into the LSEG cell during pattern shift for launch and capture. It also deterministically decides the combination of scan chains to work in shift or functional launch mode, to activate a transition fault.

Table 4 shows results for conventional LOS and LOC (normal scan insertion), ELOC, and hybrid transition delay ATPG on the case study design. LOS gave approximately 3% higher fault coverage than LOC, and ELOC gave approximately 1.9% higher fault coverage than LOC. The hybrid technique gave better fault coverage than all the other methods and a better pattern count than the LOC and ELOC methods. Its pattern count was greater than that of LOS, but with the advantage of less scan-enable design effort: only five scan chains had to be timing closed for at-speed SEN. (The hybrid scan technique proposed by Wang, Liu, and Chakradhar6 sometimes gives a greater pattern count than the LOC technique.) Our hybrid method used more CPU time than the other techniques because, for hard-to-detect faults, the ATPG tool must do more processing to determine the possible combinations of SSEG-controlled scan chains in shift-register mode or functional mode.

Figure 7. Cost functions of scan flip-flops in the case study design.

Analysis of extra detected faults

As Rearick discusses, the detection of functionally untestable faults poses a potential yield loss problem.9 We analyzed the additional faults detected by the hybrid scan architecture over the conventional LOC technique. To determine the nature of these extra faults, we performed conventional LOC ATPG on them. For example, for ITC99 benchmark circuit b17, the hybrid scan method detected 17,926 extra faults. LOC ATPG showed all of them to be nonobservable faults: faults that can be controlled but cannot be propagated to an observable point.

It can be argued that some of these nonobservable detected faults can result in yield loss, because some of them might be functionally untestable. However, some of these faults are actually functionally testable but become nonobservable because of low-cost tester ATPG constraints, such as no primary input changes or no primary output measures. For example, of the 17,926 extra faults detected by hybrid scan in the nonobservable class, 1,155 were detectable without the low-cost tester constraints. Also, Lai, Krstic, and Cheng show that functionally untestable nonobservable faults might not need testing if the defect doesn't cause a delay exceeding twice the clock period.10

With technology scaling and increasing operating frequencies, detecting multicycle delay faults might become important, and more than two vectors are required to detect such faults.10 The hybrid scan technique can be advantageous here because it eases ATPG and detects multicycle faults using a two-vector pair.

Figure 8. Average cost function before (a) and after (b) scan insertion based on controllability and observability.

Experimental results

We experimented with our hybrid scan technique on the three largest 1999 International Test Conference (ITC) benchmark circuits and on four industrial designs ranging in size from 10,000 to 100,000 flip-flops. We inserted 16 scan chains in each design. For the LOS and LOC techniques, we used the Synopsys DFT Compiler to perform normal scan insertion. For the ELOC and hybrid techniques, we performed scan insertion based on controllability and observability, and we inserted one LSEG cell in each scan chain. In the case of ELOC, we inserted only SSEG cells. In the hybrid technique, after the controllability-observability measurement, we selected only the first four scan chains to be LOS controlled (FSEG); the remaining scan chains were ELOC controlled (SSEG). This reduced the at-speed scan-enable design effort significantly, because the SEN signal to only one fourth of the scan flip-flops needed to be timing closed.

During ATPG, faults related to clocks, scan-enable, and set or reset pins (referred to as untestable faults) are not added to the fault list. Table 5 shows the ATPG results, comparing the LOS, LOC, ELOC, and hybrid methods. The ELOC method provides higher fault coverage than the LOC method (up to 15.6% for design b19) and, in most cases, an intermediate fault coverage and pattern count between LOS and LOC. The hybrid method provides better coverage than all the other methods because it has the flexibility to use combinations of functional and scan paths for launching a transition. It improves fault coverage by up to 2.68% over LOS (design D) and by up to 19.12% over LOC (design b19).

In a worst-case analysis, the lower bound for ELOC is LOC, with no extra faults detected over LOC, and the upper bound is LOS. Similarly, for the hybrid technique, the lower bound is ELOC, and the upper bound can be greater than or equal to LOS. Even in the worst-case scenario, for a given fault coverage, the hybrid method still benefits from test-pattern count reduction compared to LOC, thereby reducing test time, with minimum scan-enable design effort. In some cases, the CPU time for the hybrid or ELOC method is greater than that of the LOC method, because the ATPG tool needs a larger search space to find the transition launch activation path for hard-to-detect faults.

Typically, in an ASIC design flow, scan insertion takes place in a bottom-up manner, independent of the physical synthesis step. The DFT insertion tool stitches the scan chains based on the alphanumeric order of scan flip-flop names in each module. The resulting scan chains are then reordered during physical synthesis to reduce the scan chain routing area. At the top level, the module-level scan chains are stitched together. Similarly, in our bottom-up scan insertion flow, the scan chains in each module are stitched based on the decreasing order of the scan flip-flops' cost functions, and the resulting scan chains are reordered during physical synthesis to reduce the scan chain routing area. Therefore, the new scan insertion method will not be affected significantly, because scan insertion and physical synthesis are not performed for the entire chip. Although it can be argued that our scan chain stitching for controllability and observability might slightly increase the scan chain routing area in some cases, the decreases in scan-enable design effort and area overhead compared with LOS are significant. Moreover, the technique has the flexibility to shuffle and reorder the different groups of scan chains (LOS controlled and ELOC controlled) if any scan-chain-routing problem arises.

Table 4. Case study ATPG results.

Parameter          | LOS    | LOC    | ELOC   | Hybrid
-------------------|--------|--------|--------|--------
Detected faults    | 292,342 | 282,658 | 288,681 | 295,288
Test coverage (%)  | 91.30  | 88.27  | 90.15  | 91.92
Fault coverage (%) | 91.11  | 88.09  | 89.96  | 91.74
Pattern count      | 1,112  | 2,145  | 2,014  | 1,799
CPU time (s)       | 329.30 | 896.96 | 924.74 | 1,014.60

Table 5. ATPG results for 1999 International Test Conference (ITC) benchmark circuits and industrial designs. For each method, the three values are fault coverage (%) / no. of patterns / CPU time (s).

Design | FFs (1,000s) | LOS                   | LOC                     | ELOC                    | Hybrid
-------|--------------|-----------------------|-------------------------|-------------------------|------------------------
b17    | 1.4          | 95.09 / 1,088 / 95.4  | 81.02 / 1,190 / 1,000.8 | 94.29 / 1,328 / 325     | 96.50 / 1,179 / 187.9
b18    | 3.3          | 92.67 / 1,451 / 279.7 | 77.50 / 1,309 / 1,020.9 | 93.01 / 1,876 / 726     | 95.18 / 1,334 / 336.6
b19    | 6.6          | 85.98 / 2,280 / 645.3 | 69.21 / 1,153 / 1,050.4 | 84.81 / 1,422 / 1,000   | 88.33 / 1,590 / 1,000.9
A      | 10           | 91.11 / 1,112 / 329   | 88.09 / 2,145 / 896     | 89.96 / 2,014 / 924     | 91.74 / 1,799 / 1,014
B      | 30           | 87.94 / 4,305 / 3,569 | 85.14 / 8,664 / 7,800   | 86.57 / 8,539 / 8,702   | 88.03 / 8,062 / 6,611
C      | 50           | 81.10 / 6,869 / 8,415 | 79.42 / 12,073 / 22,930 | 80.48 / 11,583 / 25,642 | 83.29 / 8,134 / 14,451
D      | 104          | 92.15 / 5,933 / 6,559 | 91.56 / 10,219 / 12,088 | 92.28 / 12,505 / 47,788 | 94.83 / 9,674 / 18,410
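As a rough illustration of the cost-ordered stitching described above (this is not the authors' DFT tool flow; the flip-flop names, cost values, and chain length are invented), the chains can be formed by sorting on the cost function and slicing:

```python
# Illustrative sketch: stitch scan chains by sorting flip-flops in
# decreasing order of a controllability/observability cost function,
# then slicing the ordered list into fixed-length chains.

def stitch_scan_chains(ff_costs, chain_length):
    """ff_costs: dict mapping flip-flop name -> cost. Returns a list of chains."""
    # Higher-cost (harder to control/observe) flip-flops come first.
    ordered = sorted(ff_costs, key=ff_costs.get, reverse=True)
    return [ordered[i:i + chain_length]
            for i in range(0, len(ordered), chain_length)]

# Invented module/flip-flop names and cost values, for illustration only.
ffs = {"u1/ff0": 12.5, "u1/ff1": 3.0, "u2/ff0": 9.1,
       "u2/ff1": 7.4, "u3/ff0": 1.2, "u3/ff1": 15.8}
chains = stitch_scan_chains(ffs, 3)
# The first chain holds the three highest-cost flip-flops.
```

In a real flow, physical synthesis would then reorder each chain for routing area, as the article describes.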

THE PROPOSED HYBRID TECHNIQUE significantly reduces the design effort and eases timing closure by selecting a small subset of scan chains to be controlled using LOS. The experimental results also show that the pattern count is reduced and fault coverage is considerably increased. A statistical analysis is required to find the optimum number of LOS-controlled scan chains. Minimizing the number of LOS-controlled scan chains will further reduce the design effort, and future work must focus on this issue. ■

Acknowledgments
Mohammad Tehranipoor's work was supported in part by SRC grant no. 2005-TJ-1322. Nisar Ahmed performed the implementation work at Texas Instruments, India.

References
1. X. Lin et al., "High-Frequency, At-Speed Scan Testing," IEEE Design & Test, vol. 20, no. 5, Sept.-Oct. 2003, pp. 17-25.
2. J. Saxena et al., "Scan-Based Transition Fault Testing—Implementation and Low Cost Test Challenges," Proc. Int'l Test Conf. (ITC 02), IEEE Press, 2002, pp. 1120-1129.
3. N. Ahmed et al., "At-Speed Transition Fault Testing with Low Speed Scan Enable," Proc. 24th VLSI Test Symp. (VTS 05), IEEE Press, 2005, pp. 42-47.
4. User Manual for Synopsys Toolset Version 2005.09, Synopsys, 2005.
5. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar, "Enhanced Launch-off-Capture Transition Fault Testing," Proc. Int'l Test Conf. (ITC 05), IEEE Press, 2005, pp. 246-255.
6. S. Wang, X. Liu, and S.T. Chakradhar, "Hybrid Delay Scan: A Low Hardware Overhead Scan-Based Delay Test Technique for High Fault Coverage and Compact Test Sets," Proc. Design, Automation and Test in Europe (DATE 03), IEEE Press, 2004, pp. 1296-1301.
7. S. Wang and S.T. Chakradhar, "Scalable Scan-Path Test Point Insertion Technique to Enhance Delay Fault Coverage for Standard Scan Designs," Proc. Int'l Test Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.
8. L.H. Goldstein and E.L. Thigpen, "SCOAP: Sandia Controllability/Observability Analysis Program," Proc. 17th Design Automation Conf. (DAC 80), IEEE Press, 1980, pp. 190-196.
9. K.J. Rearick, "Too Much Delay Fault Coverage Is a Bad Thing," Proc. Int'l Test Conf. (ITC 01), IEEE Press, 2001, pp. 624-633.
10. W.C. Lai, A. Krstic, and K.T. Cheng, "On Testing the Path Delay Faults of a Microprocessor Using Its Instruction Set," Proc. 19th VLSI Test Symp. (VTS 00), IEEE Press, 2000, pp. 15-20.

Nisar Ahmed is a PhD student in the Electrical and Computer Engineering Department of the University of Connecticut. His research interests include design for testability, at-speed testing, and CAD. Ahmed has an MS in electrical engineering from the University of Texas at Dallas. He is a member of the IEEE.

Mohammad Tehranipoor is an assistant professor in the Electrical and Computer Engineering Department at the University of Connecticut. He has a PhD in electrical engineering from the University of Texas at Dallas. His research interests include computer-aided design and test, DFT, delay fault testing, test resource partitioning, and test and defect tolerance for nanoscale devices. He is a member of the IEEE, the ACM, and ACM SIGDA.

Direct questions and comments about this article to Mohammad Tehranipoor, ECE Dept. of Univ. of Connecticut, Storrs, CT 06268; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.

ITC Special Section
IEEE Design & Test of Computers, September–October 2006


Special ITC Section

0740-7475/06/$20.00 © 2006 IEEE. Copublished by the IEEE CS and the IEEE CASS. IEEE Design & Test of Computers.

CMOS TECHNOLOGY SCALING has brought circuit applications using hundreds of millions of transistors with dimensions below 65 nm and operating frequencies beyond 4 GHz. Among the many challenges imposed by this scaling race during the past decade, increasing power consumption from generation to generation is a major concern. Two factors have caused most of the increase in total circuit power consumption: a scaling model based on supply voltage reduction, forcing the same trend on transistor threshold voltage, and an increase in operating frequency. The first factor contributes to static, or leakage, power increase because of the exponential dependence of the transistor's off-state current on the threshold voltage. The second factor determines active power because of short-circuit and capacitor charging/discharging current components. Researchers have pursued the development of advanced techniques to control IC total power consumption; these techniques span many domains, including manufacturing technology, device design, circuit design, and architecture.

In addition to increasing overall power, a related effect drawing significant attention is increasing power density. This increase is due to circuit critical-dimension reduction, which packs more active devices per unit area and therefore increases both the static and dynamic power density components. This trend has a direct impact on circuit junction temperature, with a resulting increase of overall average operating temperature. Power density relates closely to circuit activity and generally is not uniformly distributed within the circuit. As a result, thermal gradients between circuit regions can be as high as 40°C to 50°C in high-performance designs, creating nonuniform thermal maps.1 This phenomenon can lead to hot spots in localized IC regions.

The main challenges to the accurate prediction of power density distribution and control stem from a lack of tools capable of handling the various mechanisms that determine hot-spot appearance. Such capabilities would include accurate layout-based determination of induced activity and resulting power distribution, circuit thermal-impedance computation, and heat flux distribution determination.

Power containment tools and methods have traditionally targeted overall mean power or peak power estimation and reduction, and in general are not valid for predicting hot spots. Predetermination of circuit hot spots is important not only for reliability (for example, an increase in wire temperature accelerates interconnect electromigration), but also because of the circuit's delay dependency on temperature. Hot spots can slow specific circuit regions with respect to other blocks or the clock line and can cause circuit failure because of timing-rule violation. Circuit hot spots can also directly affect the clock line at a given point, causing timing violations. These problems pose two concerns for circuit testing:

■ normal circuit operation can induce a given thermal map that is not reproduced during circuit testing, and
■ activity induced during circuit testing can lead to modified thermal maps that can cause a circuit to erroneously pass or fail the test.

Impact of Thermal Gradients on Clock Skew and Testing

Editor's note: It is a well-known phenomenon that test-mode switching activity and power consumption can exceed that of mission mode. Thus, testing can induce localized heating and temperature gradients with deleterious results. The authors quantify this problem and propose a novel design scheme to circumvent it.
—Ken Butler, Texas Instruments

Sebastià A. Bota, Josep L. Rosselló, and Carol de Benito, University of the Balearic Islands
Ali Keshavarzi, Intel
Jaume Segura, University of the Balearic Islands


Differences in thermal-map distribution between normal-mode and test-mode operation lead to a nonuniform effect on relative path delay within logic blocks. Test-induced hot spots can artificially slow noncritical paths or accelerate critical ones with respect to the clock, causing the entire die to fail (pass) delay testing for a good (bad) part. Therefore, if designers don't properly consider higher activity during test mode and its effect on the clock network, a given percentage of dies can fail during test due to test-induced thermal-map modification. This would cause increased yield loss because the thermal map's impact on path delay during normal operation is different from that induced during test. This article shows that the clock distribution circuit plays an important role in determining the effect of these mechanisms on circuit behavior.

The evolution of VLSI chips toward larger die sizes and faster clock speeds makes clock design an increasingly important issue. In a synchronous digital IC, the clock network significantly influences circuit speed, area, and power dissipation. Because the clock function is vital to a synchronous system's operation, clock signal characteristics and distribution networks have drawn much attention. Any uncertainty in clock arrival times between two points, especially if these points are near each other, can limit overall circuit performance or even cause functional errors. Clock signals typically carry the largest fan-outs, travel over the longest distances, and operate at the highest speeds of any signal, either control or data, in the entire chip. Furthermore, technology scaling particularly affects clock signals because long global interconnect lines become more resistive. In addition, as technology feature size shrinks, global metal layers that carry the clock signal are closer to the substrate, while the use of low-k dielectrics for intralevel gap filling can significantly increase thermal effects because these dielectrics have lower thermal conductivity than SiO2. Both effects contribute to a higher impact of substrate temperature nonuniformities on the clock line's thermal distribution. Therefore, designers must investigate the possibility that the nonuniform substrate temperature's effect on clock skew is a new delay fault mechanism, even with exact zero-skew clock-routing algorithms.

In this article, we analyze the impact of within-die thermal gradients on clock skew, considering temperature's effect on active devices and the interconnect system. This effect, along with the fact that the test-induced thermal map can differ from the normal-mode thermal map, motivates the need for careful consideration of the impact of temperature gradients on delay during test. After our analysis, we propose a dual-VDD clocking strategy that reduces temperature-related clock skew effects during test.

Clock networks and clock skew
Clock network design is a critical task in developing high-performance circuits because circuit performance and functionality depend directly on this subsystem's performance. When distributing the clock signal over the chip, clock edges might reach various circuit registers at different times. The difference in clock arrival time between the first and last registers receiving the signal is called clock skew. With tens of millions of transistors integrated on the chip, distributing the clock signal with near-zero skew introduces important constraints in the clock distribution network's physical implementation and affects overall circuit power and area.
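In code, the skew definition above amounts to a max-minus-min over the clock arrival times at the registers. A minimal sketch (the register names and arrival times are invented for illustration):

```python
# Clock skew: difference between the latest and earliest clock arrival
# times among the registers receiving the signal.

def clock_skew(arrival_times_ps):
    return max(arrival_times_ps) - min(arrival_times_ps)

# Invented arrival times, in picoseconds.
arrivals = {"regA": 403.0, "regB": 405.2, "regC": 401.9, "regD": 404.1}
skew_ps = clock_skew(arrivals.values())  # latest (regB) minus earliest (regC)
```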

Researchers have done extensive work on automatic clock network design to minimize the effect of unbalanced clock path delays resulting from routing or differences in capacitive loading at the clock sinks.2 Most clock distribution schemes exploit the irrelevance of the absolute delay from a central clock source to clocking elements—only the relative phase between two clocking points is important. Early methods used symmetric structures such as H trees or balanced trees. Figure 1 shows the H-tree clock topology, which consists of trunks (vertical lines) and branches (horizontal lines). In nonbuffered trees, top-level interconnect segments are wider than lower-level segments. Furthermore, top-level global interconnect segments are routed through upper metal layers, whereas low-level local segments are routed through lower metal layers.

Figure 1. Symmetric three-level H-tree layout for clock distribution. D is the length of the H tree.

In addition to zero skew, a second important requirement for a clock network is obtaining a high slew rate to get sharp clock edges. Designers achieve this by inserting buffers and repeaters in the clock network, creating a multistage clock tree, to isolate downstream capacitance and reduce transition times. Clock networks with several buffer stages are common in high-performance digital designs. Researchers have also proposed approaches that incorporate uneven loading and buffering effects, resulting in non-H-tree topologies.3

Current designs incorporate clock distribution networks consisting of two parts: a global clock network and a local network. The global clock network distributes the clock signal from the clock source to local regions and usually has a symmetric structure. The local distribution network delivers clock signals to registers in a local area using a nonsymmetric structure because register location in the circuit is typically not regular.

Any phenomenon that affects a net's delay can contribute to skew, so we can no longer ignore the portion of clock skew caused by process variations in nanometer technologies. Process variations—such as effective gate length, doping concentrations, oxide thickness, and interlayer dielectric thickness—cause uncertain device and interconnect characteristics and can be a source of significant clock skew. Dynamic variations—such as power supply variations, coupling noise, and junction temperature—can contribute to additional skew during circuit operation.4 Temperature is difficult to model and predict because of the switching activities of the various blocks composing the circuit and their variation over time. Thus, temperature is an important source of skew. A nonuniform temperature gradient created by a hot spot can significantly impact clock tree performance and worsen worst-case clock skew.

Algorithms used to design zero-skew clock tree networks usually don't consider process variations or nonuniform thermal distributions as possible sources of clock skew. Researchers have proposed grid-based clock networks driven by one or more lines of buffers as an alternative to tree topologies. This method has proved highly effective in reducing sensitivity to process variations and environmental effects, typically at the cost of consuming more wire resources and power. A recent trend is to use hybrid structures formed by a symmetric tree and a mesh for the global clock network.5 Mori et al. demonstrated that adding a mesh to the bottom-level leaves of an H tree helps significantly reduce clock skew caused by process variations.6

We focus on the relative impact of temperature and nonuniform thermal maps on hybrid clock networks, as they are widely used to achieve low clock skew and power consumption.

Temperature effects on delay
The impact of environmental variations on skew is difficult to analyze given its dependence on circuit activity that changes over time. The two major sources of environmental variations are power supply variations and temperature. Power supply variations are the main source of jitter, whereas temperature is a source of skew (typical time constants for temperature changes are on the order of milliseconds).

Temperature affects the delay of both interconnect lines and clock buffers. The main sources of temperature generation in the chip are the switching activities of the cells over the substrate and Joule heating of the interconnects when current passes through them. In a high-performance design, junction temperature can vary by more than 50°C and reach an absolute temperature of 120°C in some circuit regions. To explain these mechanisms, we introduce the temperature dependence of interconnect and buffer parameters.

Interconnect temperature dependence
Interconnect delay relates to the metal resistance and the parasitic capacitance of the wires that connect gates. An interconnect's resistance has a polynomial relationship to its temperature. Assuming a first-order approximation, this dependence is

R(T) = r0(1 + β(T − T0))    (1)

where r0 is the unit-length resistance at reference temperature T0, and β is the temperature coefficient of resistance (°C^−1). The dependence of capacitance on temperature is usually small and not comparable to resistance variations. Deutsch et al. reported that temperature variation has a marked impact on wire delay for long interconnects that are basically resistance limited in terms of delay (as compared with capacitive and inductive components).7 Interconnect line resistance changes by about 20% for a variation of 75°C from ambient temperature.
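Equation 1 can be checked against the quoted 20% figure in a few lines. The sketch below uses the AlCu temperature coefficient β = 3 × 10⁻³ °C⁻¹ quoted later in the article; the reference temperature of 25°C and the use of the article's sheet resistance as r0 are illustrative assumptions.

```python
# First-order wire resistance model from Equation 1.
# Assumptions (not fixed by the article at this point): T0 = 25 degC,
# r0 = 0.077 (the AlCu sheet resistance quoted later, ohm/sq).

def wire_resistance(T, r0=0.077, T0=25.0, beta=3e-3):
    """R(T) = r0 * (1 + beta * (T - T0))."""
    return r0 * (1 + beta * (T - T0))

# A 75 degC rise above ambient changes resistance by beta * 75 = 22.5%,
# consistent with the article's "about 20%" figure.
rel_change = wire_resistance(100.0) / wire_resistance(25.0) - 1
```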

Buffer temperature dependence
Buffer delay also changes with temperature through the transistor parameters' dependence on junction temperature. These parameters include threshold voltage (VT), mobility (μ), and silicon energy band gap (Eg). Energy band gap thermal variations are usually small and not comparable to VT and μ variations. The expressions for the relationships of these last two parameters with temperature are

VT(T) = VT(T0) − κ(T − T0)

and

μ(T) = μ(T0)(T/T0)^−M

where T0 is room temperature (T0 = 300 K); κ is the threshold voltage temperature coefficient, whose typical value is 2.5 mV/K; and M is the temperature exponent, whose typical value is 1.5.
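The two device models above can be evaluated directly using the quoted typical values κ = 2.5 mV/K and M = 1.5; the nominal threshold voltage and mobility below are invented illustrative values, not from the article.

```python
# Threshold-voltage and mobility temperature models from the article,
# with the quoted typical constants. VT(300 K) = 0.2 V and
# mu(300 K) = 400 (arbitrary units) are assumed for illustration.

def vt(T, vt0=0.2, T0=300.0, kappa=2.5e-3):
    return vt0 - kappa * (T - T0)      # VT drops as T rises

def mobility(T, mu0=400.0, T0=300.0, M=1.5):
    return mu0 * (T / T0) ** (-M)      # mobility degrades with T

# A 50 K hot spot (350 K): VT falls by 125 mV; mobility drops to ~79%
# of its room-temperature value.
dvt = vt(300.0) - vt(350.0)
mu_ratio = mobility(350.0) / mobility(300.0)
```

Both effects act on the driver's on-resistance, which is why buffer delay is the dominant thermal skew contributor in buffered trees, as discussed later.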

Junction temperature variation is an important source of driver resistance variation and can have a significant impact on buffer propagation delay. Figure 2 shows the variation of the high-to-low and low-to-high propagation times for a 70-nm inverter, obtained from electrical simulations using Berkeley Predictive Technology models (http://www.eas.asu.edu/~ptm).

The switching speed of CMOS inverters used as buffers is basically a function of resistance-capacitance (RC) time constants. To determine switching speed in Figure 2, we measured the 50% transition delay of an inverter loaded with another inverter stage and ideal wires. We assume that capacitance is temperature independent. Figure 2 shows that a model similar to the one in Equation 1 can approximate driver resistance variation with temperature.

Our analysis of interconnect and buffer delay variation with temperature makes clear that a uniform increase of IC junction temperature results in a net increase in absolute delay through the clock distribution path (clock latency). In balanced trees, this effect is irrelevant because the main parameter for setting the system clock period is the worst-case delay of the logic blocks between two consecutive register stages. The key parameter affecting skew is the relative arrival of the clock edge at the registers at the end of each clock path.

Nonuniform thermal map effects
As mentioned earlier, an IC's power dissipation distribution is not uniform and depends on device and interconnect electrical characteristics, layout circuit placement, and the relative switching activity of different chip blocks. In this sense, dynamic thermal gradients are inevitable during normal circuit operation. Here, we compare temperature effects on nonbuffered and buffered clock tree networks.

Figure 2. Delay versus temperature in a 70-nm low-leakage inverter gate (high-to-low and low-to-high transitions).

Figure 3. RC tree used to compute Elmore delay (a) and equivalent one-level H tree (b).

Nonbuffered trees
We model nonbuffered trees using a lumped-RC tree. Figure 3a shows an example RC tree. We assume that the tree has been designed such that the only sources of skew are process variations and environmental conditions. Using an Elmore delay metric, the delay from root node n0 to sink node ni in the RC tree is

D_i = Σ_{j ∈ path(n0 → ni)} R_j (Σ_{k ∈ subtree(j)} C_k)    (2)

where the sum runs over the resistances R_j in the path between the source (root) and node ni, and Σ_k C_k is the downstream capacitance at j, defined as the sum of all capacitances at any node k such that the unique path in the tree from k to the root passes through j. As an example, the delays from root node n0 to the sink nodes n1 and n3 in the H tree of Figure 3b can be computed in the same way.

Tree symmetry leads us to assume that, at the reference temperature, R2(T0) = R5(T0) = RL1,0; C2 = C5 = CL1; R3(T0) = R4(T0) = R6(T0) = R7(T0) = RL2,0; and C3 = C4 = C6 = C7 = CL2; therefore, there is no skew between nodes n1, n2, n3, and n4. Given that resistances are temperature dependent and parameter β is positive, performance degrades with increasing temperature (worsening the effective signal delay). In addition, because a nonuniform thermal profile doesn't impact all regions of the clock distribution network but slows only a restricted area, it has a major effect on skew. Therefore, as a result of temperature nonuniformities, the H tree's symmetry cannot guarantee zero skew.
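The mechanism is easy to exercise numerically. The sketch below (illustrative R and C values, not the article's extracted parasitics) computes Equation 2 for the two symmetric root-to-sink paths of the one-level H tree of Figure 3b, then scales one branch's resistances with the Equation 1 model to mimic a hot spot; the broken symmetry shows up directly as skew.

```python
# Elmore delay along one root-to-sink path: sum over each resistance of
# (resistance * its downstream capacitance), per Equation 2.

def elmore_delay(path):
    """path: list of (R_j, downstream_cap_j) tuples along root -> sink."""
    return sum(r * c for r, c in path)

beta, dT = 3e-3, 50.0        # AlCu coefficient (per degC), hot-spot rise
R, C = 10.0, 1e-12           # illustrative per-segment resistance and cap
Ctot = 7 * C                 # total downstream capacitance at the root segment

# Symmetric (cold) tree: paths to n1 and n3 are identical -> zero skew.
cold_path = [(R, Ctot), (R, 3 * C), (R, C)]
d1 = elmore_delay(cold_path)

# Hot spot over the n3 quadrant: its branch resistances follow Equation 1.
Rhot = R * (1 + beta * dT)
d3_hot = elmore_delay([(R, Ctot), (Rhot, 3 * C), (Rhot, C)])
skew = d3_hot - d1           # positive: the heated quadrant is slower
```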

For simplicity and without loss of generality, we considered a symmetric three-level H-tree clock structure to evaluate and compare the effects of variability and temperature gradients in nonbuffered structures. The area covered by the tree is 5 mm × 5 mm. We considered circuit parameters for AlCu interconnects with β = 3 × 10^−3 °C^−1, rsh = 0.077 Ω/sq at T0, and csh = 7.68 × 10^−18 F/μm² as the unit sheet resistance and unit area capacitance, respectively. We analyzed clock tree structures with three different designs:

■ Design A is a clock tree using minimum-width interconnects.
■ Design B has interconnect widths computed with Chen and Wong's algorithm,8 which optimizes for both clock delay and minimum skew.
■ Design C is the same as design B except that it has a grid shorting the H tree's sink nodes. This modification has moderate impact on mean delay but provides significant skew reduction.

We investigated the impacts of parameter variation and temperature gradients on skew for each structure. Table 1 shows the mean delay, mean skew, sigma skew, and maximum skew obtained from Monte Carlo simulations of 1,000 samples at a uniform room temperature. Both the mean delay and skew of design A (wmean = 0.45 μm, 3σ = 20%) are much higher than those obtained for design B, which used the optimization algorithm. Design C provides the best values for the sigma and maximum skew distributions, while incurring about one third of additional overall delay with respect to design B. Redundancy created by mesh loops smoothes out undesirable variations between signal nodes spatially distributed over the chip.9
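A toy version of such a Monte Carlo experiment might look like the following. The resistance spread mirrors the quoted 3σ = 20%, but the tree topology and R/C values are invented, so the resulting numbers are in no way comparable to Table 1.

```python
# Monte Carlo sketch: sample per-branch resistance variations, compute
# per-sink Elmore delays in a small one-level H tree, and report skew
# statistics over many samples.
import random
import statistics

random.seed(1)
R, C = 10.0, 1e-12
sigma = 0.20 / 3 * R             # 3-sigma spread = 20% of nominal

def sink_delay():
    # Independent variation on the branch and leaf segments of one path;
    # the shared root segment is kept at nominal for simplicity.
    r_branch = random.gauss(R, sigma)
    r_leaf = random.gauss(R, sigma)
    return R * 7 * C + r_branch * 3 * C + r_leaf * C

def sample_skew():
    delays = [sink_delay() for _ in range(4)]   # four sinks n1..n4
    return max(delays) - min(delays)

skews = [sample_skew() for _ in range(1000)]    # 1,000 Monte Carlo samples
mean_skew = statistics.mean(skews)
sigma_skew = statistics.stdev(skews)
max_skew = max(skews)
```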

Figure 4 shows the skew induced by a local hot spot of radius D/8 (D is the length of the H tree shown in Figure 1) when located at different positions of the H tree, obtained for designs A and B. Our most significant observations from these experiments are as follows:

■ Total skew depends on hot-spot position.
■ In nonoptimized trees, worst-case skew occurs when the hot spot appears near the clock driver.
■ A design algorithm that optimizes clock tree skew also reduces the impact of thermally induced hot spots.
■ In optimized clock trees, depending on the hot spot's magnitude and size, its impact can be about 20% of the skew from parameter variations.

Figure 5 compares the worst-case clock skew caused by hot spots affecting one whole quadrant for designs B (no grid) and C. For design C, we considered an ideal grid (no parasitic capacitance) and a realistic grid (with parasitic capacitance). The amount of worst-case skew caused by a 10°C difference is of the same order of magnitude as the delay of one clock buffer, while the skew caused by a 50°C difference is of the same order of magnitude as the skew from process parameter variations. Figure 5 also shows that inserting a grid reduces the skew resulting from nonuniform thermal maps.

For the H tree of Figure 3b, Equation 2 gives

D_1 = R_1 (Σ_{k=1..7} C_k) + R_2 (Σ_{k=2..4} C_k) + R_3 C_3

D_3 = R_1 (Σ_{k=1..7} C_k) + R_5 (Σ_{k=5..7} C_k) + R_6 C_6

Table 1. Comparison of unbuffered clock tree designs.

Design style | Mean delay (ps) | Mean skew (ps) | Sigma skew (ps) | Maximum skew (ps)
-------------|-----------------|----------------|-----------------|------------------
A            | 356.11          | 35.84          | 15.09           | 110.1
B            | 68.95           | 4.18           | 0.88            | 7.45
C            | 107.0           | 1.06           | 0.33            | 2.60

Buffered trees
Buffers isolate downstream capacitance in the clock network (see Equation 2), thus reducing latency and transition times. In these networks, buffers are a primary source of total clock skew for two reasons. First, device parameter variation with temperature is much larger than interconnect variation. Delay degradation caused by temperature effects on the driver's on-resistance is far more severe than the delay variation caused by the interconnect resistance's thermal dependency. Second, the delay related to the wiring length between two consecutive buffer stages is independent of the RC parameters of the previous and subsequent wiring stages.

We designed a buffered H-tree clock network (design A) and a clock network with a grid shorting the buffered H tree's sink nodes (design B) in a 1-V nominal supply voltage, 70-nm technology (http://www.eas.asu.edu/~ptm). For design B, we considered an ideal grid B1 (no parasitic capacitance) and a realistic grid B2 (with parasitic capacitance). We considered a 2-mm × 2-mm chip and synthesized a three-level symmetric H tree using the method described by Cheng et al.,10 obtaining five buffer stages between the clock source and any of the 64 sinks.

To compute process variability's influence on skew, we repeated the Monte Carlo analysis described earlier (a 3σ variation of 30% in threshold voltage and 20% in interconnect width). Table 2 shows the mean delay, mean skew, sigma skew, and maximum skew at a uniform room temperature. Again, redundancy created by mesh loops noticeably reduces undesirable variations between signal nodes spatially distributed over the chip.


Figure 4. Hot-spot-induced skew at different clock tree positions for design A (a) and design B (b). The skew is computed across the whole clock tree; only the quadrant where the hot spot is placed is shown for simplicity.

Figure 5. Impact of a hot spot on skew in one clock network quadrant for an optimized clock tree without a meshing grid, with an ideal grid without parasitic capacitances, and with a grid including parasitic capacitances.


Comparing designs B1 and B2 shows the impact of the additional capacitance caused by the grid.

Figure 6 plots total skew's dependency on the magnitude of the temperature increase between two different clock paths (we assume that the hot spot affects all stages of one path, while the other path remains at the reference temperature). The figure shows that skew is roughly proportional to ΔT.

A comparison of the results in Table 2 and Figure 6 indicates that in a clock network without a grid, the skew related to a hot spot that increases temperature by 30°C can be as much as 20% of the overall clock latency (mean delay). The skew plotted in Figure 6 is due only to the thermal gradient's effect; if the combined effect of thermal maps and process variability is included, skew increases 1.3 times in case A and 2.0 times in case B2. The interconnect system plays a fundamental role in overall delay (which decreases by 50%—from 403 ps to 142 ps—if the interconnect is neglected through zero wire resistance and capacitance). Despite this, the interconnect system's contribution to thermally induced skew is only around 7%. Therefore, although overall delay is interconnect dominated, its heat-related variation is mainly due to active devices.

We also ran two experiments to investigate the relative impact on delay of the number of inverters relative to spot size. In the first experiment (the nongradual case), we computed the skew caused by eight equal-size inverters, five at the same hot-spot-elevated temperature and three at a reference temperature. In the second experiment, we considered a chain affected by a gradual hot spot: not all inverters affected by the hot spot had the same temperature, but the chain had a nonuniform, gradual thermal profile in terms of the hot spot’s peak temperature T above Tref. Temperature distribution decreased from the central inverters to the side inverters. We considered eight inverters on the chain at the following respective temperature increments: 1/4T, 1/2T, 3/4T, T, T, 3/4T, 1/2T, 1/4T. Note that the sum of all temperature values is 5T, the same as the sum of all temperature increments for the nongradual case with five inverters at temperature Tref + T.
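A short script makes the bookkeeping above concrete. Under a delay model that is linear in temperature (a simplifying assumption consistent with the roughly linear ΔT dependence reported earlier), only the sum of per-stage temperature increments matters, so the two profiles should behave identically. The coefficient `k_thermal` is an invented illustration value, not a figure from the article.

```python
# Compare the "nongradual" and "gradual" hot-spot profiles from the
# experiment above. Under a delay model linear in temperature, only the
# SUM of per-stage temperature increments matters, which is why the two
# cases in Figure 7 produce almost identical skew.
# k_thermal is an illustrative coefficient (ps per degree C), not measured.

T = 30.0  # hot-spot peak temperature rise above Tref, in degrees C

# Eight equal-size inverters: five at Tref + T, three at Tref (nongradual).
nongradual = [T, T, T, T, T, 0.0, 0.0, 0.0]

# Gradual profile over the eight-inverter chain.
gradual = [T/4, T/2, 3*T/4, T, T, 3*T/4, T/2, T/4]

assert sum(nongradual) == sum(gradual) == 5 * T  # both profiles sum to 5T

k_thermal = 0.1  # ps of extra delay per degree C per stage (illustrative)

def extra_delay(profile):
    """Additional chain delay (ps) under the linear model."""
    return k_thermal * sum(profile)

print(extra_delay(nongradual), extra_delay(gradual))  # identical under this model
```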

Figure 7 compares skew results obtained for the nongradual and gradual cases, showing that skew is almost identical in the two cases. This suggests that we can compute the additional delay of n buffers (Dn), each at temperature Ti, as

$D_n \propto \sum_{i=1}^{n} (T_i - T_\text{ref})$

Special ITC Section

420 IEEE Design & Test of Computers

Table 2. Comparison of buffered clock tree designs.

Design style   Mean delay (ps)   Mean skew (ps)   Sigma skew (ps)   Maximum skew (ps)
A              403               15.77             4.36              32.6
B1 (C = 0)     403                0.17             0.23               1.2
B2             466                1.054            0.39               2.6

Figure 6. Impact of a hot spot on skew: skew (s) versus hot-spot temperature increment (°C) for the network without a grid, with an ideal grid without parasitic capacitances, and with a grid including parasitic capacitances. Measurements used 27°C as the reference temperature with buffers biased at nominal supply voltage.

Figure 7. Skew (ns) versus temperature rise Trise (°C) due to five inverters at a hot spot of temperature T in an eight-inverter chain (nongradual), and eight inverters at temperatures 1/4T, 1/2T, 3/4T, T, T, 3/4T, 1/2T, and 1/4T, respectively (gradual).


where Tref is a reference temperature.

Therefore, the skew between two different clock sinks i and j is proportional to

$D_i - D_j \propto \sum_{k} (T_{i,k} - T_{j,k})$

where the sum runs over all tree stages, T_{i,k} is the junction temperature of the kth stage in the path from the root to sink i, and T_{j,k} is the junction temperature of the kth stage in the path from the root to sink j.
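As a sketch of the relation above, the proportionality can be coded as a small helper that estimates relative skew from per-stage junction temperatures. The constant `k` and the four-stage example are invented for illustration, not fitted to the article's data.

```python
# Sketch of the skew relation above: the skew between sinks i and j is
# proportional to the sum, over tree stages, of the per-stage junction
# temperature differences. k is a hypothetical proportionality constant
# (ps per degree C), not a value from the article.

def thermal_skew(path_i_temps, path_j_temps, k=0.1):
    """Estimate skew (ps) between two sinks whose root-to-sink paths have
    the given per-stage junction temperatures (degrees C)."""
    assert len(path_i_temps) == len(path_j_temps)  # same number of tree stages
    return k * sum(ti - tj for ti, tj in zip(path_i_temps, path_j_temps))

# One path entirely inside a 30-degree hot spot, the other at the
# reference temperature (the scenario of Figure 6), over four stages:
print(thermal_skew([57.0] * 4, [27.0] * 4))
```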

Finally, from our comparison of buffered and nonbuffered clock trees, we conclude the following:

■ In buffered trees, skew is less dependent on the hot spot’s position in the tree.
■ The relative impact of thermal gradients on skew with respect to parameter-variation-induced skew is greater in buffered clock trees than in nonbuffered trees.

Temperature impact on testing

Operating frequency and circuit activity are the two main factors that determine a circuit’s active power and, therefore, contribute to nonuniformities in junction temperature distribution. Active power increases almost linearly with operating frequency, but circuit activity’s effect on relative temperature at different operating frequencies has not been investigated. This issue is important in comparing a circuit’s relative temperature increase during normal and test modes.

Typically, a circuit working in normal mode operates at its maximum frequency, but only a small fraction of its internal blocks are active. Designers determine power constraints for normal circuit operation, usually assuming that random logic blocks will have about 20% to 30% activity with respect to the clock signal. On the other hand, circuit activity is substantially higher in test mode than in normal operating mode, although the effective operating frequency is much lower because test stimuli must be scanned in and out through DFT structures. Such switching activity increases the device’s overall energy, peak power, and average power consumption. The resulting elevated average power will affect the chip’s temperature distribution, and might not only increase overall chip temperature but also promote the appearance of hot spots.

Figure 8 compares relative temperature increase with internal circuit activity at two different operating frequencies (50% and 90% of maximum frequency) for a circuit constructed from a 7 × 7 array of c432 ISCAS benchmark circuits. We obtained these results with Rosselló et al.’s thermal and power computation models.11 We used the resulting power map density to obtain a thermal map and calculate temperature increase. The results show that we can obtain a roughly equal junction temperature increase by running the circuit near full speed with a typical circuit activity of 20% (normal mode), or at half speed with activity increased to about 80%. Therefore, since we can achieve similar thermal levels during normal and test mode operations, it is worthwhile to investigate the effect of thermal maps on delay during test mode. Our results are in line with other work showing the relative impact of increased power dissipation during test mode.12
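A first-order sketch of this frequency-versus-activity tradeoff uses the classic dynamic-power estimate P ≈ α·f·C·V² and a linear thermal resistance. All constants below are invented, and this linear simplification ignores the nonlinearities captured by Rosselló et al.'s models, which is why the article's pairing (20% activity near full speed versus about 80% at half speed) does not follow from a purely linear model.

```python
# First-order sketch of the activity/frequency tradeoff discussed above.
# Dynamic power P ~ activity * f * C_switched * VDD^2, and a first-order
# temperature rise dT ~ R_thermal * P. All constants are illustrative;
# the real power-thermal models (Rossello et al.) are nonlinear.

def dynamic_power(activity, freq_hz, c_switched=1e-9, vdd=1.2):
    """Dynamic power (W) under the alpha*f*C*V^2 approximation."""
    return activity * freq_hz * c_switched * vdd * vdd

def temp_rise(power_w, r_thermal=20.0):
    """Temperature rise (degrees C) through a linear thermal resistance."""
    return r_thermal * power_w

f_max = 1e9  # 1 GHz, illustrative

# Under this linear model, equal activity*frequency products give equal
# temperature rise: 25% activity at full speed matches 50% at half speed.
normal = temp_rise(dynamic_power(0.25, f_max))
test   = temp_rise(dynamic_power(0.50, 0.5 * f_max))
print(normal, test)
```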

Researchers have proposed strategies for limiting test-induced power excess by controlling either peak or average power. Some propose a proper selection of test vectors to reduce power dissipation and energy consumption while achieving high fault coverage. Many of these techniques rely on power-constrained test-scheduling algorithms and focus on reducing or maintaining circuit power consumption within safe operating margins. These methods don’t pursue uniform power distribution over the die and therefore don’t guarantee a uniform thermal map.

We have explored a possible method for avoiding the delay impact of artificially created thermal maps due to test activity and the consequent masking of test results. Bellaouar et al. have shown that the rate of driver resistance variations due to temperature fluctuations is strongly dependent on power supply voltage, and that an optimum bias voltage (VDDopt) minimizes these variations.13 We have proposed a dual-supply-voltage clock tree to reduce skew related to temperature gradients.14

Figure 8. Temperature increase (°C) versus activity (%), at 90% and 50% of fmax, while controlling inputs of an array of independent logic circuits.

Figure 9 shows such a tree. The high-to-low converter (HLconverter) is a buffer that converts the incoming clock signal to the chip from a standard swing to a lower voltage swing. The HLconverter’s structure is relatively straightforward: to convert the clock swing from the standard voltage range to a lower voltage range, we use a conventional buffer driven by supply voltage VDDopt. The clock signal is then transmitted on the chip as a low-voltage signal. At the utilization points at the sink flip-flops, the low-to-high converter (LHconverter) converts the signal back to the higher voltage swing, which is the voltage used by the logic network. The LHconverter’s structure is more involved; some design examples appear in other works.15

We performed a simulation experiment on a 130-nm technology to test a multiple-supply-voltage scheme that uses a bias supply selected to compensate for temperature-related effects. As Figure 10 shows, changing the clock buffers’ supply voltage from VDD to VDDopt significantly reduced total skew.

A related advantage of this clock scheme is a reduction in power consumption. However, side effects such as noise on the supply network could be significant. Also, an increased impact of process parameter variations on delay at the reduced supply voltage16 could compromise the compensation effects.
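Why an optimum bias voltage exists can be sketched with the alpha-power delay model: rising temperature lowers both carrier mobility (slowing the gate) and the threshold voltage (speeding it up), and the two effects cancel near one supply voltage. This is a toy model with invented device parameters, not the 130-nm characterization used in the experiment.

```python
# Sketch of why an optimum bias voltage (VDDopt) exists (Bellaouar et al.):
# with the alpha-power law, delay ~ VDD / (mobility * (VDD - Vth)^alpha).
# Mobility falls as temperature rises, but Vth also falls, so the two
# effects pull delay in opposite directions and cancel near one supply.
# All parameter values below are illustrative, not 130-nm data.

def delay(vdd, temp_c, vth0=0.35, k_vth=2e-3, alpha=1.3, mu_exp=1.5, t0=27.0):
    vth = vth0 - k_vth * (temp_c - t0)                       # Vth drops with T
    mobility = ((t0 + 273.0) / (temp_c + 273.0)) ** mu_exp   # mobility drops too
    return vdd / (mobility * (vdd - vth) ** alpha)

def temp_sensitivity(vdd):
    """Magnitude of the delay shift caused by a 30-degree temperature rise."""
    return abs(delay(vdd, 57.0) - delay(vdd, 27.0))

# Scan candidate supplies and pick the one least sensitive to temperature:
vdds = [0.6 + 0.05 * i for i in range(13)]   # 0.6 V .. 1.2 V
vdd_opt = min(vdds, key=temp_sensitivity)
print(vdd_opt)
```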

Our results suggest an alternative clock skew optimization approach: introducing a clock grid in the clock region polarized at VDDopt during testing. This option would minimize the impact of process parameter variations and noise on clock distribution. It would also optimize power dissipation because the clock distribution circuitry would not contain a grid mesh, avoiding mesh-based architectures’ power penalty of about 40%.5

Figure 10. Skew (s) versus temperature increase (°C) for a three-level buffered clock tree without grid, biased at nominal supply voltage VDD = 1.2 V and at VDDopt = 0.8 V, for various hot-spot temperatures.

Figure 9. Dual-voltage clock scheme: the HLconverter feeds the VDDopt region, and the LHconverter restores the VDD swing.

Figure 11. Isopower skew improvement gained by using a clock grid design for the low-voltage section of the clock tree shown in Figure 9. The upper graph plots average power per cycle (W) versus VDD; the lower graph plots skew (for a 10°C hot spot) versus VDD, with and without a grid.

To verify this optimization, we compared the average power per cycle dissipated by the clock tree distribution, and the resulting clock skew, in two designs: one without a grid, and one with a grid on the portion of the clock distribution operating at the reduced optimal supply voltage. Figure 11 shows the results, which confirm the overall skew reduction for an isopower comparison between these two design alternatives. In the isopower comparison, the upper graph in Figure 11 sets the power limit (2.1 mW in this case), so a horizontal line at that value intersects each design’s power curve at the same power. The intersection of this isopower line with the “With grid” curve determines the clock tree’s supply voltage, and the lower graph in Figure 11 then gives the corresponding skew reduction. The isopower scheme doesn’t provide the full achievable gain, because the supply voltage it selects (around 0.97 V) is slightly above the optimal supply voltage for this technology (0.8 V). Nevertheless, skew decreased from 12 ps to less than 1 ps.
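The isopower selection procedure described above can be sketched as a small routine: fix a power budget, find the highest supply at which the gridded design stays within it, and read off the skew at that supply. The curve samples below are invented for illustration, not the article's measured data.

```python
# Sketch of the isopower comparison procedure described above. Given
# power-vs-VDD and skew-vs-VDD samples for a design, pick the highest
# sampled supply whose power stays within the budget, then read off the
# skew at that supply. All numbers are invented for illustration.

def pick_isopower_vdd(vdds, powers_w, budget_w):
    """Highest sampled VDD whose average power is within the budget."""
    feasible = [v for v, p in zip(vdds, powers_w) if p <= budget_w]
    return max(feasible)

vdds            = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
power_with_grid = [1.1e-3, 1.4e-3, 1.8e-3, 2.3e-3, 2.9e-3, 3.6e-3]       # W
skew_with_grid  = [0.9e-12, 0.7e-12, 0.8e-12, 1.0e-12, 1.3e-12, 1.7e-12]  # s

budget = 2.1e-3  # power limit, as in the 2.1 mW example above
v = pick_isopower_vdd(vdds, power_with_grid, budget)
skew = skew_with_grid[vdds.index(v)]
print(v, skew)  # the gridded design's operating point under the budget
```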

CLOCK SKEW has as much impact on overall parametric yield as any propagation delay. Large clock skews can cause timing violations because of the erosion in setup or hold times. Researchers have reported that process parameter variations, parasitics, and noise effects such as crosstalk affect the delay of each clock tree branch. We have shown that temperature gradients can also be an important source of clock skew, causing spatially correlated variations.

Nonbuffered and buffered clock-tree networks respond differently to nonuniform thermal maps. In nonbuffered trees, a hot spot’s relative location in the tree structure has a high impact on overall thermal skew. The clock network distribution temperature is difficult to evaluate because network parasitics come from resistive components distributed in different metal layers and at different levels from those of the main power sources. In buffered trees, the main contributions to skew are differences in clock tree buffer delay, even if the overall delay magnitude is interconnect dominated. In this case, the hot spot’s relative position has much less impact than in nonbuffered trees. Interestingly, we have also observed that in buffered trees the hot spot’s impact on delay can be quantified without computing the hot spot’s exact thermal spatial profile with respect to the buffers. This might significantly affect future CAD tool development.

Our results show the importance of having a temperature-aware clock tree design. The combination of cross-link insertion and multiple-supply-voltage clock schemes is likely to provide the best trade-off between skew reduction and power-conscious design. ■

Acknowledgments
This work was partially supported by the Spanish Ministry of Science and Technology, the Regional European Development Fund under EU project TEC2005-05712/MIC, and Intel Research Labs.

References
1. S. Borkar et al., “Parameter Variations and Impact on Circuits and Microarchitecture,” Proc. 40th Design Automation Conf. (DAC 03), ACM Press, 2003, pp. 338-342.
2. B. Lu et al., “Process Variation Aware Clock Tree Routing,” Proc. Int’l Symp. Physical Design (ISPD 03), ACM Press, 2003, pp. 174-181.
3. G.E. Tellez and M. Sarrafzadeh, “Minimal Buffer Insertion Clock Trees with Skew and Skew Rate Constraints,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 4, Apr. 1997, pp. 333-342.
4. G. Bai, S. Bobba, and N. Hajj, “Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay,” Proc. 38th Design Automation Conf. (DAC 01), ACM Press, 2001, pp. 295-300.
5. C. Yeh et al., “Clock Distribution Architectures: A Comparative Study,” Proc. 7th Int’l Symp. Quality Electronic Design (ISQED 06), IEEE Press, 2006, pp. 85-91.
6. M. Mori et al., “A Multiple Level Network Approach for Clock Skew Minimization with Process Variations,” Proc. Asia South Pacific Design Automation Conf. (ASP-DAC 04), ACM Press, 2004, pp. 263-268.
7. A. Deutsch et al., “On-Chip Wiring Design Challenges for Gigahertz Operation,” Proc. IEEE, vol. 89, no. 4, Apr. 2001, pp. 529-555.
8. Y. Chen and D. Wong, “An Algorithm for Zero-Skew Clock Tree Routing with Buffer Insertion,” Proc. European Design and Test Conf. (ED&TC 96), IEEE Press, 1996, pp. 230-236.
9. A. Rajaram, J. Hu, and R. Mahapatra, “Reducing Clock Skew Variability via Cross Links,” Proc. 41st Design Automation Conf. (DAC 04), ACM Press, 2004, pp. 18-23.
10. C.K. Cheng et al., Interconnect Analysis and Synthesis, Wiley InterScience, 2000.
11. J. Rosselló et al., “A Fast Concurrent Power-Thermal Model for Sub-100 nm Digital ICs,” Proc. Design, Automation and Test in Europe (DATE 05), vol. 1, IEEE Press, 2005, pp. 206-211.
12. E. Larsson and Z. Peng, “Power-Aware Test Planning in the Early System-on-Chip Design Exploration Process,” IEEE Trans. Computers, vol. 55, no. 2, Feb. 2006, pp. 227-239.
13. A. Bellaouar et al., “Supply Voltage Scaling for Temperature Insensitive CMOS Circuit Operation,” IEEE Trans. Circuits and Systems II, vol. 45, no. 3, Mar. 1998, pp. 415-417.
14. S. Bota et al., “Within Die Thermal Gradient Impact on Clock-Skew: A New Type of Delay-Fault Mechanism,” Proc. Int’l Test Conf. (ITC 04), IEEE Press, 2004, pp. 1276-1284.
15. J. Pangjun and S. Sapatnekar, “Low-Power Clock Distribution Using Multiple Voltages and Reduced Swings,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 10, no. 3, June 2002, pp. 309-318.
16. S. Bota et al., “Low VDD vs. Delay: Is It Really a Good Correlation Metric for Nanometer ICs?” Proc. 24th VLSI Test Symp. (VTS 06), IEEE Press, 2006, pp. 358-363.

Sebastià A. Bota is an associate professor in the Electronic Technology Group of the University of the Balearic Islands, Palma de Mallorca, Spain. His research interests include very large-scale integration design and test and low-temperature CMOS design. Bota has a PhD in physics from the University of Barcelona in Spain.

Josep L. Rosselló is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling, very large-scale integration design and test, and low-temperature CMOS design. Rosselló has a PhD in physics from the University of the Balearic Islands.

Carol de Benito is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. Her research interests include device and circuit modeling and low-temperature CMOS design. De Benito has an MS in physics from the University of the Balearic Islands.

Ali Keshavarzi is a research scientist at Circuit Research Laboratories (CRL) of Intel. His research interests include low-power/high-performance circuit techniques and transistor device structures for future generations of microprocessors. He has a PhD in electrical engineering from Purdue University.

Jaume Segura is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling and very large-scale integration design and test. Segura has a PhD in physics from the Polytechnic University of Catalunya.

Direct questions and comments about this article to Sebastià A. Bota or Jaume Segura, Electronic Tech. Group, Univ. Illes Balears, Cra. Valldemossa, km. 7.5, 07122 Palma de Mallorca, Spain; [email protected], [email protected].



UPCOMING TTTC EVENTS

12th International Workshop on Thermal Investigations of ICs and Systems
27-29 September 2006
Nice, France
http://tima.imag.fr/conferences/therminic/
Therminic workshops are offered annually to address the essential thermal questions of microelectronic microstructures, and of electronic parts in general. This year’s workshop discusses issues in thermal simulation, monitoring, and cooling.

21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT 06)
4-6 October 2006
Arlington, Va.
http://netgroup.uniroma2.it/DFT06/cfp.html
DFT provides an open forum for discussing defect and fault tolerance in VLSI systems, including emerging technologies. Topics include all aspects of design, manufacturing, test, reliability, and availability affected by defects during manufacturing or by faults during system operation.

International Test Conference (ITC 06)
24-26 October 2006
Santa Clara, Calif.
http://www.itctestweek.org/
ITC is the world’s premier conference on the electronic test of devices, boards, and systems. It covers the complete cycle from design verification, test, diagnosis, and failure analysis to process and design improvement. At ITC, test and design professionals can confront the challenges the industry faces and learn how academia, design-tool and equipment suppliers, designers, and test engineers address these challenges.

IEEE International Workshop on Current & Defect Based Testing (DBT 06)
26-27 October 2006
Santa Clara, Calif.
http://www.cs.colostate.edu/~malaiya/dbt.html
To develop more appropriate fault models, designers and test engineers must have a good handle on both systematic and random defect mechanisms to support the manufacturability of ICs for defect-based test approaches. Because of increasing design complexity and process variability, the focus is shifting to such approaches. This workshop addresses these issues.

First IEEE International Design and Test Workshop (IDT 06)
19-20 November 2006
Dubai, United Arab Emirates
http://www.tttc-idt.org/index_files/IDT.CFP.06.pdf
This event provides a unique forum in the Middle East and Africa region for researchers and practitioners of VLSI design, test, and fault tolerance to discuss new research ideas and results. IDT will run in conjunction with the annual Innovations of IT Conference and in parallel with Global IT Exhibitions (GITEX).

7th International Workshop on Microprocessor Test and Verification
4-5 December 2006
Austin, Texas
http://mtv.ece.ucsb.edu/MTV
This workshop brings together researchers and practitioners from verification and test to discuss today’s difficult challenges in the processor and SoC design environments. It’s the ideal environment for joint test and verification experiences and innovative solutions.

NEWSLETTER EDITOR’S INVITATION
I’d appreciate input and suggestions about the newsletter from the test community. Please forward your ideas, contributions, and information on awards, conferences, and workshops to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 35487-0286; [email protected].

Bruce C. Kim
Editor, TTTC Newsletter

CONTRIBUTIONS TO THIS NEWSLETTER: Send contributions to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 35487-0286; [email protected]. For more information, see the TTTC Web page: http://tab.computer.org/tttc/.

TEST TECHNOLOGY TC NEWSLETTER


Book Reviews

IN THE November-December 2004 Last Byte, I bemoaned the fact that design has become so complex that no one person can understand all of it, and that EDA tools have become so diverse and complicated that we confine ourselves to a small subset of their functionality. The massive book under review here, Electronic Design Automation for Integrated Circuits Handbook, represents the best way I know to address this problem.

This two-volume set contains 49 articles on EDA, ranging from high-level design to technology CAD. The first volume, EDA for IC System Design, Verification, and Testing, has five sections: An introductory section outlines and summarizes the design process. A section on system-level design discusses modeling languages, processor and system modeling, performance metrics, and system-level power management. The microarchitectural design section describes performance estimation, power management, and design planning at this level. Six chapters on logic verification cover design and verification languages, and various verification methods. The final section, on test, focuses on DFT, test generation, and analog test.

The second volume, EDA for IC Implementation, Circuit Design, and Process Technology, focuses on the second part of the IC design flow. It includes sections on synthesis, place and route, analog and mixed-signal design, physical verification, and technology CAD. Chapters within these sections cover topics such as synthesis, power management at all levels, design rule checking, design for manufacturability, timing analysis, noise analysis, and libraries.

I confess that I did not read all the chapters in this book, wishing to complete the review before we move to biochips. In fact, when I first received this text, I was certain that I’d never finish it. But I’m not so sure now. I have already read more chapters than I’d originally intended, and I think the reason for this explains why this handbook is a success. In most cases, the material covers the important points without going into so much detail or length as to be intimidating. Chapters range from seven pages to 33 pages, with an average of 15 to 20, each including an extensive list of references. This seemed just right for the surveys making up this handbook. You cannot completely learn EDA from a book like this, of course, but you can learn quite a lot about EDA.

There are three types of chapters throughout the book. Some are introductory in nature, surveying a topic such as design flow at a high level. Some target EDA users, showing the types of tools that are available and putting them into context. Others target EDA developers, describing the algorithms underlying the tools, with information on the benefits of each. Some subjects are covered from several angles. Most subjects could be, of course, but that would balloon this work into three or four volumes. For the most part, I was happy with the choice of angle; only in a few cases, such as the chapter on design rule checking, would I have preferred a more user-oriented approach.

A comprehensive EDA handbook
Scott Davidson
Sun Microsystems

426 0740-7475/06/$20.00 © 2006 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers

Reviewed in this issue
Electronic Design Automation for Integrated Circuits Handbook, edited by Louis Scheffer, Luciano Lavagno, and Grant Martin (CRC Press, 2006, ISBN 0-849-33096-3, 2 vols., 1152 pp., $149.95).


One danger of a handbook approach is repetition, as important subjects tend to get covered more than once. However, I found very little redundancy in this book. There was practically none in the sections on test. The editors must have done an excellent job reviewing chapter outlines.

The most important thing, though, is how good the individual chapters are. So, I will give my impressions of some of the ones I read. The first full chapter is “The Integrated Design Process and Electronic Design Automation.” This chapter starts a bit abruptly, but quickly progresses into an excellent overview of the design process. I’d recommend it to everyone who reads these volumes. I wish it had some pointers to subsequent chapters, however. Chapter 5, “SoC Block-Based Design and IP Assembly,” is an excellent tutorial focusing on real issues, especially in the area of verification.

Chapter 8, “Processor Modeling and Design Tools,” provides a taxonomy and survey of architecture description languages (ADLs). The taxonomy is excellent, describing very clearly what ADLs are and what they are not. However, I would have liked to see more of an industrial focus in the survey. This chapter had a bit more of an academic slant than most of the others. The chapter on “Design and Verification Languages,” on the other hand, covers commercially available languages with excellent examples. An outline gives the salient points of each language, with strong points and weak points, and includes a taste of how to do coding in each language. At 28 pages, this is one of the longer chapters, but hardly a word is wasted.

There are three chapters on test, two of which I’d like to discuss here. Chapter 21, “Design-for-Test,” is one of the longest (35 pages) in the book. It contains even more text than the page count indicates, in fact, because it includes absolutely no figures or diagrams. This chapter covers a lot, from the objectives and history of DFT; through scan, BIST, and compression for logic testing; to memory test. It ends with a short section on FPGA test (which could easily have been cut). The reader of this chapter might have a hard time distinguishing which of these concepts are truly important and which are minor. In addition, there is often too much detail for an introductory survey. For example, there is almost an entire page in the section on logic BIST about structural dependencies and scan chain lengths. These are issues, but they could have been eliminated to make the chapter shorter and more readable. The chapter on “Automatic Test Pattern Generation” is more developer oriented, with a survey of ATPG algorithms. It’s somewhat academic, with a large section on Boolean satisfiability (SAT) solvers for test generation, but this is balanced by an excellent section on applications for ATPGs beyond test generation.

The chapter on “Logic Synthesis” (in volume 2) is 15 pages and has 11 references. It gives a very high-level view of a well-known subject. I think the author was right to avoid trying to cover all aspects of this area in depth, instead pointing the reader to places for further study. Chapter 6, “Static Timing Analysis,” is one of the best chapters I read. It is at the right length and depth, and it provides helpful pseudocode for the major algorithms discussed. Chapter 9, “Exploring Challenges of Libraries for Electronic Design,” considers not cell libraries, but IP libraries. At eight pages, it is very short and superficial. The last three subsections are basically only outlines.

The last chapter I want to highlight is the one on “Design Databases.” This chapter is excellent. It targets users, but displays a deep knowledge of the implementations of design databases. It is also very readable.

EVERY DESIGN GROUP should have a copy of this handbook in its library. It is an excellent reference text. It can also serve as outstanding background reading for new engineers exposed to some of these areas for the first time. The material here is better organized and better written than what could be found on the Web. Putting together such a high-quality, substantive work is quite an achievement. I’ll be reading more chapters for quite some time to come. ■

Direct questions and comments about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.


Standards

AT A RECENT IEEE Design Automation Standards Committee (DASC) meeting (http://www.dasc.org/meetings/2006-07/20060727_DASC_Minutes.doc), we discussed two interesting standardization proposals: Rosetta and Esterel version 7. Both are based on technology that has been under development for a long time, and both target the formalization of system-level design and verification. But, otherwise, they take very different approaches. We hear a lot of talk about movement to more abstract design paradigms. Are these proposals confirmation of this trend, or are they yet another false start? Read these brief outlines, and decide for yourself.

The Rosetta language

System-level design involves consolidating information from multiple domains to predict the effects of design decisions. To support system-level design, a language must allow heterogeneous specification while providing mechanisms to compose information across domains.

The goal of the Rosetta system-level design language (http://www.sldl.org/standards.htm) is to compose heterogeneous specifications in a single semantic environment. Rosetta provides modeling support for different design domains, employing semantics and syntax appropriate for each. Thus, Rosetta lets designers write individual specifications with semantics and vocabulary appropriate for their domains. Users compose information across specification domains by defining interactions between them. To achieve this end, Rosetta provides a collection of domains, called facets, for describing system models. Interactions provide a mechanism for defining constraints between domains.

Facets define system models from one engineering perspective. Users can write facets by extending a domain that provides vocabulary and semantics for the model. Using the design abstractions that its domain provides, a facet describes a system’s requirements, behavior, constraints, or function. Domains provide vocabulary and semantics for defining facets. Each domain provides mechanisms for describing data, computation, and communication models appropriate for one area of systems design.
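To make this concrete, here is a schematic facet in the spirit of published Rosetta examples. The facet name, the signal names, the `word` type, and the `state_based` domain label are all illustrative assumptions on our part, and the evolving standard syntax may differ in detail:

```
-- Sketch only; adapted loosely from Rosetta tutorial material.
-- The state_based domain supplies the notion of next-state values (z').
facet adder_behavior(x, y :: input word; z :: output word) :: state_based is
begin
  sum: z' = x + y;  -- labeled term: the next value of z is x + y
end facet adder_behavior;
```

A second facet could describe the same component in, say, a power or timing domain, with an interaction relating the two models; that is exactly the composition mechanism described here.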

Interactions define how information from one engineering domain is reflected in another. Domains don’t share a common set of semantics, but rather share information when necessary using interactions. Thus, Rosetta defines each design facet by using appropriate design abstractions from that facet’s domain rather than forcing a common design model across all facets.

Facet algebra expressions use facets, domains, and interactions to compose models into system descriptions. Users can evaluate local design decisions from a systems perspective by using interactions to understand how these decisions impact other system domains.

Work on Rosetta is ongoing, with the Rosetta Web site serving as a clearinghouse for language definition and usage information. Its pages provide definition and tutorial documents, as well as examples and standardization information.

The Esterel language

Esterel (http://www.esterel-technologies.com) is a formal synchronous language for unambiguously specifying and implementing hardware and software embedded systems. Esterel was initially developed in academia, with strong cooperation from industrial users. The Esterel developer community has developed the current Esterel version 7 language as a proposed standard. The developers derived this version from the previous Esterel v5 academic version by adding new features necessary for hardware design. Because of the formal character of the language and its semantic kernels, you can fully and faithfully translate Esterel programs either to hardware circuit descriptions written in conventional hardware description languages (HDLs) or to equivalent conventional software programs, with the very same behavior in both cases. It’s also possible to translate Esterel programs to input for formal-verification systems (for example, model checkers) so that verified properties will be guaranteed to hold in hardware and software implementations.

The proposed project will create an initial IEEE standard based on Esterel v7, ensuring unambiguous definition of the language syntax and semantics and, therefore, full interoperability between Esterel-based program implementation, static analysis, and verification tools. The output of the project will be the standard Esterel Language Reference Manual.

This project’s purpose is to provide the EDA, semiconductor, and systems-design communities with a well-defined, official IEEE definition of the Esterel language. This is necessary because Esterel is not a minor variant of existing languages that could be defined with an addendum to existing standards. Rather, Esterel is unique in the way it formally merges sequencing, as typically only software languages do; uses single-clock or multiclock concurrency, as typically only HDLs do; and employs unique temporal primitives that drive the life and death of activities within programs. Esterel also supports formal definition of data paths based on arbitrary precision and exact arithmetic, bit vectors, and arrays of arbitrary dimensions and types. These language primitives facilitate, by at least one order of magnitude, the expression of complex behavior, providing the user with unmatched clarity and productivity for specification, design, and verification activities. Esterel lets you obtain equivalent hardware and software targets from a single source, so hardware simulation using software is more tenable. Esterel also lets you perform late choices between hardware and software final implementation.
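The flavor of these temporal primitives is captured by the classic ABRO module, a standard Esterel teaching example shown here in Esterel v5-style syntax (v7 surface details may differ): await both A and B, in any order; emit O; and restart the whole behavior whenever R occurs:

```
module ABRO:
input A, B, R;
output O;
loop
  [ await A || await B ];  % in parallel: wait for A and wait for B
  emit O                   % once both have occurred, emit O
each R                     % preemption: R kills and restarts the loop body
end module
```

The same few lines compile to an HDL netlist or to C with identical synchronous behavior, which is what makes the late hardware/software choice possible.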

The key technical objective is to stabilize and fully define the language’s syntax and semantics. The technical aspects to be scrutinized concern the data path’s arbitrary-precision and exact-arithmetic features, the temporal statements particular to Esterel, and the life and death of activities and signals. I have no doubt that developers can solve all the involved questions in a completely rigorous way, thus providing a fully solid basis for both users and tool builders, and ensuring full interoperability between tools from diverse origins.

A derived objective is to ensure that it’s possible to effectively compile a given Esterel design to other standardized languages such as VHDL, Verilog, SystemVerilog, C, and SystemC, with the same guaranteed behavior for all these different targets. This will require checking that all Esterel constructs are synthesizable in hardware or software, up to well-identified limitations of back-end synthesis or compilation tools. ■

Direct questions and comments about this department to Victor Berman, Cadence Design Systems, 270 Billerica Road, Chelmsford, MA 01824; [email protected].


CEDA Currents

A Conversation with Robert Brayton

On the occasion of Robert Brayton receiving the 2006 EDAA (European Design and Automation Association) Lifetime Achievement Award and the 2006 IEEE Emanuel R. Piore Award, Karti Mayaram from CEDA Newsletter spoke to him about his career, achievements, and moments of inspiration. Brayton also had some practical advice for young researchers.

It was a pleasure talking with Bob Brayton. All of us who have been affiliated with the EDA field are well aware of the many fundamental contributions he’s made. His impact on the industry has been tremendous.

The early years

Bob grew up in Ames, Iowa, and attended Iowa State University, where he graduated with a BS in electrical engineering in 1956. After a six-month stint in the US Army, he went to MIT to pursue a PhD in mathematics. He chose math because he thought he lacked mathematical foundations and he had a strong interest in the field. Bob believes this unique combination of an undergraduate EE degree and a PhD in math has been a major contributor to his success. He has not only a good understanding of the application area but also a strong foundation in mathematical tools for solving relevant problems.

A year before completing the PhD program, Bob accepted a summer job at IBM Research (T.J. Watson Research Center) in the Mathematical Sciences Department. This was such a wonderful experience that he joined the department upon completing his PhD. Looking back, he realizes that both MIT and IBM Research were very influential in his life. He had the opportunity to work with exceptional people. Moreover, at IBM Research he had freedom to work on the research topics that most interested him. This combination of wonderful colleagues and flexibility to pursue appealing subjects helped shape Bob’s career.

Back to school

After spending 26 years at IBM Research, Bob started a second career as a professor in the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley, in 1987. He had spent a year at UC Berkeley on sabbatical from IBM Research in 1985, during which he had worked with some very talented students on logic synthesis and the development of industrial-quality tools. When he returned in 1986, IBM Research was offering early retirement, which he decided to accept, turning his sights toward academia. When UC Berkeley, his top choice, made him an offer, he accepted, and he has continued to make important contributions in logic synthesis.

The ‘Aha!’ moment

The most thrilling moment in his career was the development of the Sparse Tableau Approach (STA) for assembling and solving circuit equations. He and his colleague, Gary Hachtel (now a professor at the University of Colorado in Boulder), were having a conversation after a game of tennis. They’d been thinking about an elegant solution for assembling circuit equations for some time. Suddenly, all the pieces of the puzzle fell into place. This was the start of STA. Before their work, circuit equation assembly required different kinds of manipulations and reductions. STA provided a simple way to assemble electrical-circuit equations. There was no need to reduce equations; with STA, you could directly apply Gaussian elimination. This work was one of the cornerstones of IBM’s circuit simulator, Astap (Advanced Statistical Analysis Program).
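The idea can be sketched in a few lines of Python (a toy illustration, not Astap’s implementation): for a simple voltage divider, every KCL, KVL, and branch equation is stacked, unreduced, into one tableau Ax = b, and Gaussian elimination is applied directly. The component values here are made up for the example.

```python
# Sparse Tableau sketch: a 10 V source driving R1 = R2 = 1 kOhm in series.
# Unknowns x = [i1, i2, v1, v2, e1]: branch currents, branch voltages,
# and the divider node voltage. Every equation enters the tableau as-is.
R1, R2, Vs = 1000.0, 1000.0, 10.0
A = [
    [1.0, -1.0, 0.0, 0.0,  0.0],  # KCL at node 1:  i1 - i2 = 0
    [0.0,  0.0, 1.0, 0.0,  1.0],  # KVL branch 1:   v1 + e1 = Vs
    [0.0,  0.0, 0.0, 1.0, -1.0],  # KVL branch 2:   v2 - e1 = 0
    [-R1,  0.0, 1.0, 0.0,  0.0],  # Ohm's law, R1:  v1 - R1*i1 = 0
    [0.0,  -R2, 0.0, 1.0,  0.0],  # Ohm's law, R2:  v2 - R2*i2 = 0
]
b = [0.0, Vs, 0.0, 0.0, 0.0]

def solve(A, b):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

i1, i2, v1, v2, e1 = solve(A, b)
print(e1, i1)  # divider node sits at 5 V, carrying 5 mA
```

No reduction or topological manipulation happens before the solve; that directness is the point of the tableau formulation.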

Some thoughts on EDA developments

I also asked Bob what he thought were the most exciting developments in EDA. He said there is always a progression of things, and newer developments overshadow some of the developments of the past. But he named two topics that he saw as step functions. One was binary decision diagrams. BDDs provided a way to efficiently manipulate large logic equations, and they proved important for logic synthesis and verification.
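What makes reduced ordered BDDs so useful is canonicity: under a fixed variable order, two Boolean functions are equivalent exactly when their reduced diagrams are identical. A minimal sketch (hash-consed tuples standing in for a real BDD package; the function and names are ours, and real packages build diagrams far more efficiently):

```python
# Build a reduced ordered BDD for f over the ordered variable names.
# Terminals are the booleans True/False; an internal node is a tuple
# (var, low_child, high_child), shared through the `unique` table so
# that equal subgraphs become one object (hash-consing).
def build(f, names, env=None, unique=None):
    env = env or {}
    unique = {} if unique is None else unique
    if len(env) == len(names):
        return f(**env)                      # fully assigned: a terminal
    var = names[len(env)]
    lo = build(f, names, {**env, var: False}, unique)
    hi = build(f, names, {**env, var: True}, unique)
    if lo == hi:
        return lo                            # redundant test: drop the node
    return unique.setdefault((var, lo, hi), (var, lo, hi))

order = ["a", "b", "c"]
bdd1 = build(lambda a, b, c: (a and b) or (a and c), order)
bdd2 = build(lambda a, b, c: a and (b or c), order)  # same function
print(bdd1 == bdd2)  # prints True: equivalence reduces to a structure check
```

That equality test replacing full Boolean reasoning is why BDDs mattered for synthesis and verification.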

The other topic was the work on solving stiff differential equations, performing equation assembly and solution, and integrating these techniques into packages, such as the Astap and Spice circuit simulators, for solving circuit problems.
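Why stiffness forced that work is easy to demonstrate with a textbook toy (not a circuit simulator): on y' = -1000y, forward Euler at a modest step size explodes, while the implicit backward Euler update y_next = y / (1 + 1000h), the kind of step stiff solvers rely on, stays stable at the same step size.

```python
# Stiff test equation y' = -lam * y; the exact solution decays to 0.
lam, h, steps = 1000.0, 0.01, 20
y_fwd = y_bwd = 1.0
for _ in range(steps):
    y_fwd = y_fwd + h * (-lam * y_fwd)  # forward (explicit) Euler: factor -9
    y_bwd = y_bwd / (1.0 + lam * h)     # backward (implicit) Euler: factor 1/11
print(abs(y_fwd), abs(y_bwd))  # explicit blows up; implicit decays toward 0
```

Circuit time constants spread over many orders of magnitude make real circuit equations behave like this, which is why implicit methods became standard in Astap and Spice.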

What lies ahead

Bob named deep-submicron and nanometer design as the greatest challenges facing EDA. We no longer have the luxury of working on independent problems that can be solved separately. Electrical interference and manufacturing variations make very low nanometer CMOS design a difficult problem. Then there are the new technologies that will require effective design tools.

Some practical advice

His advice to young researchers in a challenging funding environment is to work on relevant problems and to keep putting out proposals. Being able to solve relevant problems in interesting ways can be a big motivator. This is what motivates Bob. He is able to identify such problems and find interesting ways to solve them, just like putting together the pieces of a puzzle. The whole notion of fads driving research funding and publications is not appealing to him. Such an approach takes away resources from basic research. There should be more emphasis on fundamental work.

Asked how he felt about receiving the two recent awards, he replied, “surprised.” He also thanked the people who took the time and effort to put together the nominations for these awards. Bob is an extremely modest and wonderful person who has made seminal contributions to EDA. We all are happy to see him get the recognition that he deserves.

Upcoming Research Funding Opportunities

US Department of Defense
Experimental and Theoretical Development of Quantum Information Science
Deadline: 11 December 2006
http://www.arl.army.mil/main/Main/DownloadedInternetPages/CurrentPages/DoingBusinesswithARL/research/QC06Final6Jul06.pdf

National Science Foundation
Power, Controls and Adaptive Networks (PCAN)
Deadline: 7 September - 7 October 2006
http://nsf.gov/funding/pgm_summ.jsp?pims_id=13380

Foundations of Computing Processes and Artifacts (NSF 06-585)
Deadline: 10 October 2006
http://www.nsf.gov/pubs/2006/nsf06585/nsf06585.htm

Upcoming CEDA Events

Please see these Web sites for upcoming events:

■ CODES+ISSS: http://www.esweek.org
■ Nano-Net: http://www.nanonets.org
■ FMCAD: http://www.cs.utexas.edu/users/hunt/FMCAD
■ ICCAD: http://www.iccad.com
■ PATMOS: http://www.patmos-conf.org
■ VLSI-SoC: http://tima.imag.fr/conferences/VLSI-SoC06

CEDA Distinguished Speaker Reception

The Council’s Distinguished Speaker Series features detailed presentations of the most significant research results in EDA over the past year, as demonstrated by awards at our top conferences and journals. The second presentation in this series took place at the Moscone Center in San Francisco during DAC 2006. The featured article was by Janusz Rajski, Jerzy Tyszer, Mark Kassab, and Nilanjan Mukherjee, whose article won this year’s IEEE Transactions on Computer-Aided Design Donald O. Pederson Best Paper Award. Their presentation, which covered several aspects of VLSI testing, had significant tutorial value and will be archived at the Council’s Web site (http://www.c-eda.org).

CEDA Currents is a publication of the IEEE Council on Electronic Design Automation. Please send contributions to Kartikeya Mayaram ([email protected]) or Preeti Ranjan Panda ([email protected]).


The Last Byte

THE 2006 INTERNATIONAL TEST CONFERENCE theme encourages us to consider ways for “getting more out of test.” How about getting more out of the International Test Conference? Technical paper sessions are the heart and soul of ITC, and there’s something there for everyone, from classic microprocessor and ATE sessions to delay, test compression, test power, and more. But there are plenty of ways to get more out of ITC.

We’ve changed the structure of ITC and Test Week (22-27 October) for this year’s new site, Silicon Valley. The new format offers some great opportunities. First, be sure to arrive in time for Monday’s test Q&A panel (23 October), starting at 4:45 p.m. Come hear the experts discuss diverse test topics unrehearsed. Remind yourself that even the experts don’t have all the answers; there’s still plenty to debate on every topic.

Get up the next morning to attend the Tuesday plenary, which starts at 9:30 a.m. The plenary kicks off a day specially organized to include material for those who manage test. Don’t miss Tuesday afternoon’s executive test panel, which boasts an impressive array of participants sharing unique perspectives on the cost of quality. Watch the users of silicon debate the providers, and find out their views on how we can get more out of test. Afterward, enjoy the welcome reception, where you can meet friends and colleagues and find out which panelists’ perspectives they plan to take home with them.

Be sure to schedule enough time to visit the exhibit floor. How else can you improve your standing with your children by bringing home all sorts of nifty gadgets and at the same time find out about the latest offerings from the key vendors in test-related fields? Don’t forget the free lunch Tuesday, Wednesday, and Thursday on the exhibit floor. If the line looks long, grab a few colleagues and discuss the latest developments you’ve heard. Afterward, take advantage of an opportunity to hear industry authorities address your favorite topic and mine (test, of course) at each day’s invited address, conveniently located adjacent to the exhibit hall.

ITC has only one regular panel slot this year, so be sure not to miss it, and don’t forget the wine-and-cheese party afterward. In addition to these treats, we have our usual outstanding set of papers, so you can learn what advances are on the way. We also have an interesting lecture series, providing you with information that you can take back to work to use right away.

This year marks ITC’s first visit to Silicon Valley. For those who work in the area, this is the easiest ITC to attend yet. Getting to the Santa Clara Convention Center might be a shorter commute than going to work, and parking is free.

I LOOK FORWARD to seeing you at ITC. Let’s learn some, do some business, have some laughs, and get inspired to get more out of test. You can find out all about ITC and Test Week at http://itctestweek.org. ■

Anne Gattiker is a research staff member at IBM Austin Research Lab. Contact her at [email protected].

Direct questions, comments, and contributions about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; [email protected].

Getting more out of ITC
Anne Gattiker
IBM Austin Research Lab

432 0740-7475/06/$20.00 © 2006 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers
