
HAL Id: tel-01808874
https://tel.archives-ouvertes.fr/tel-01808874

Submitted on 6 Jun 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Test and characterization methodologies for advanced technology nodes

Darayus Adil Patel

To cite this version: Darayus Adil Patel. Test and characterization methodologies for advanced technology nodes. Electronics. Université Montpellier, 2016. English. NNT : 2016MONTT285. tel-01808874

Awarded by the Université de Montpellier

Prepared within the doctoral school I2S

and the research unit LIRMM

Speciality: SYAM

Presented by DARAYUS ADIL PATEL

Defended on 5 July 2016 before a jury composed of:

M. Daniel CHILLET, Professor, Université de Rennes (Jury President / Reviewer)

M. Salvador MIR, CNRS Research Director, TIMA (Reviewer)

Mme. Sylvie NAUDET, Team Leader, STMicroelectronics (Examiner)

M. Alberto BOSIO, MCF HDR, Université de Montpellier (Examiner)

M. Patrick GIRARD, CNRS Research Director, LIRMM (Thesis Supervisor)

M. Arnaud VIRAZEL, MCF HDR, Université de Montpellier (Thesis Co-supervisor)

Mme. Béatrice PRADARELLI, PAST, Université de Montpellier (Invited Member)

TEST AND CHARACTERIZATION METHODOLOGIES FOR ADVANCED TECHNOLOGY NODES

Presented by

Darayus Adil Patel

STMicroelectronics

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier

Université de Montpellier

This dissertation is submitted for the degree of Doctor of Philosophy

July 2016

Dedicated to my Beloved Parents

and

The Digital Test Team at ST Crolles


ABSTRACT

The introduction of nanometer technologies has allowed the semiconductor industry to create nanoscale devices in combination with gigascale complexity. However, new technologies bring with them new challenges. In the era of large systems embedded in a single System-on-Chip and fabricated in continuously shrinking technologies, it is important to test and ensure fault-free operation of the whole system. The cost involved in semiconductor test has been steadily growing, and testing techniques for integrated circuits today face many exciting and complex challenges. Although important advances have been made, existing test solutions are still unable to exhaustively cover all types of defects in advanced technology nodes. Consequently, innovative solutions are required to cope with new failure mechanisms under the constraints of higher density and complexity, cost and time-to-market pressure, product quality levels and the use of low-cost test equipment.

The work of this thesis is focused on the development of silicon test and characterization methodologies that aid in the accurate detection and resolution of issues that may arise due to variability, manufacturing defects, wear-out or interference. A wide spectrum of these challenges has been addressed from a test perspective to ensure that the availability of effective test solutions does not become a bottleneck in the path towards further scaling. Additionally, the advances and innovations introduced in the myriad domains of electronic design, reliability management, manufacturing process improvements, etc., that call for the development of advanced, modular and agile test methodologies have been effectively covered within the scope of this work.

This manuscript presents the significant contributions made towards resolving state-of-the-art industrial test challenges via the design and implementation of novel test strategies (targeting the 28nm FDSOI technology node) for:

· Detection & diagnosis of timing faults in standard cells.

· Analysis of Setup and Hold margins within silicon.

· Verification & reliability analysis of innovative test structures.

· Analysis of on-chip self heating.

· Enabling characterization and performance evaluation of high speed digital IPs.


ACKNOWLEDGEMENTS

“If I have seen further than others,

it is by standing upon the shoulders of giants.”

Isaac Newton

First and foremost, I would begin by thanking my Lord, the Almighty, for

holding my hand and showing the path ahead throughout the course of my thesis. Only

due to His countless gifts of sense, intellect, patience, health, family and many more,

could I complete this task. Thank you for the strength you have given me and all the

wonderful people around me who made the past three years truly meaningful.

During my time at STMicroelectronics, I was extremely fortunate to have had

the opportunity to learn and grow under the tutelage of my super amazing ‘Maman’, Dr.

Sylvie Naudet during the development of this work. I would like to gratefully and

sincerely thank Sylvie for her care, understanding, patience and most importantly

friendship during the course of my thesis from inception to completion. Despite her

busy schedule, Maman always kept a watchful eye over me and provided me with her

invaluable guidance whenever needed. To synthesize, I consider myself truly privileged

to have had this opportunity to grow my intellectual prowess, sharpen my critical

thinking and understand the subtle nuances of the game of life under her umbrella.

I am very grateful to my Professors Patrick Girard, Arnaud Virazel & Alberto

Bosio for their guidance, support and regular feedback throughout the course of my

thesis. Also for their thought-provoking ideas and helpful criticism in improving the

quality of this manuscript and other publications. Working at LIRMM under their

supervision, helped me to understand and better appreciate the academic essence of any

project. I am thankful for their time, efforts and constant interaction all along. Their

invaluable advice helped keep my research well directed while maintaining my

autonomy.

Perhaps I owe my largest quantum of gratitude to the truly awesome team that

I was lucky to be part of at ST Crolles. Thank you Daniel Noblet for being the huge

pillar of strength that you have been for me all throughout and the ginormous force of

positivity that you are. Thank you Fabien for being a great Mentor and friend. While

learning from you I have always been inspired by your attitude, zeal and enthusiasm


towards any given task. Thank you Damien Croain & Daniel Caspar for always being

there to shine light upon any technical darkness that befell me. Your mere presence

gave me the confidence that the solution to any problem could be mined out of your

mountains of knowledge. Merci beaucoup my ‘Queen’ Françoise for your camaraderie

and royal influence. Many thanks to David Meyer & David Bonciani for being the great

friends that you are to me. I shall always greatly appreciate and fondly remember all

your support and help. Thank you Julien Berger for always being accommodating and

helpful. Thank you Robert ‘Bob’ Seilles for teaching me among many things that there

is a great deal more to life than just work. Thank you Loïc for always being Good.

Being part of such a wonderful team greatly accelerated my learning curve and made

life a joyful journey.

A special word of thanks goes to Deepak Kumar Arora & Kapil Juneja at ST

Noida. It was an enriching experience working with them as I benefited from their

clarity of thought and lucidity in explanation. Despite their hectic workloads they were

always available to have discussions and help me solve my doubts. I would also like to

acknowledge all the other people with whom I had the pleasure and privilege of

collaborating with. They are Abhishek Jain, Rajesh Kumar Immadi, Sujatha Balaraman,

Philippe Brahic, Sylvain Clerc, Fady Abouzeid, Franck Gardik, Jeff Nowakowski, Guy

Bossan, Shahabuddin Qureshi, Fabrice Carite, Manoubi Bourabei, Gilles Lievre, Dennis

Roche, Vincent Huard and Robin Wilson.

I would also like to express my gratitude to two of my most cherished friends

Giulio & Giulio with whom I shared the path during these last three years. Words are

not enough to describe them, they are really extraordinary. Thank you for the great

support in the moments when I needed help, for all the great time that we have spent

together and most of all for your friendship. Also a big thank you to Anu Asokan, Imran

Wali, Aymen Touati, Alejandro Nocua & Amit Karel at LIRMM for their warm

friendships and companionship.

And at last, but certainly not the least, thanks to my parents who have never

lost their faith in me and have always supported me in every decision I took, no matter

how far away from home it has taken me. Their constant support and strength has aided

and encouraged me throughout this endeavor. I shall forever remain indebted for your

love and sacrifice.

Darayus Adil Patel


TABLE OF CONTENTS

ABSTRACT ..................................................................................................................... I

ACKNOWLEDGEMENTS ......................................................................................... III

TABLE OF CONTENTS ............................................................................................... V

LIST OF FIGURES ..................................................................................................... IX

LIST OF TABLES .................................................................................................... XIII

LIST OF ABBREVIATIONS .................................................................................... XV

INTRODUCTION ........................................................................................................... 1

1 CONTEXT AND THESIS OBJECTIVES ................................................................ 5

1.1 SEMICONDUCTOR TECHNOLOGY EVOLUTION ............................................ 6

1.2 CHALLENGES IN ADVANCED TECHNOLOGY NODES ................................... 6

1.2.1 Variability .......................................................................................... 7

1.2.2 Manufacturing Defects ...................................................................... 8

1.2.3 Wear-out ............................................................................................ 8

1.2.4 Interference ........................................................................................ 9

1.3 FDSOI TECHNOLOGY NODE ...................................................................... 9

1.4 TESTING OVERVIEW ................................................................................. 10

1.4.1 Characterization Test ....................................................................... 11

1.4.2 Production Test ................................................................................ 12

1.4.3 Burn-in Test ..................................................................................... 12

1.4.4 Incoming Inspection ........................................................................ 13

1.5 RESEARCH OBJECTIVES ........................................................................... 13

2 AN EFFECTIVE POWER-AWARE AT-SPEED TEST METHODOLOGY FOR IP QUALIFICATION AND CHARACTERIZATION ............................................. 17

2.1 INTRODUCTION ........................................................................................ 18

2.2 BACKGROUND AND MOTIVATIONS........................................................... 19

2.3 CIRCUIT DESIGN ARCHITECTURE AND IMPLEMENTATION ........................ 22

2.4 TEST METHODOLOGY .............................................................................. 25

2.4.1 Test Patterns..................................................................................... 25


2.4.2 Test Flow ......................................................................................... 28

2.5 EVALUATION OF THE PROPOSED TEST METHODOLOGY ............................ 30

2.5.1 Memory and Runtime Evaluation .................................................... 30

2.5.2 Peak Power Analysis ....................................................................... 31

2.6 RESULTS AND ANALYSIS OF FMAX AND VMIN CHARACTERIZATIONS ...... 32

2.7 SUMMARY ................................................................................................ 36

3 ANALYSIS OF SETUP AND HOLD MARGINS INSIDE SILICON FOR ADVANCED TECHNOLOGY NODES ..................................................................... 39

3.1 INTRODUCTION ........................................................................................ 40

3.2 STATE OF THE ART ................................................................................... 41

3.3 BASIC CONCEPTS, MEASUREMENT CIRCUIT AND METHODOLOGY ........... 42

3.4 TEST METHODOLOGY .............................................................................. 44

3.5 RESULTS AND ANALYSIS.......................................................................... 45

3.6 SUMMARY ................................................................................................ 52

4 ADAPTIVE WEAROUT MANAGEMENT WITH IN-SITU AGING MONITORS .................................................................................................. 53

4.1 INTRODUCTION ........................................................................................ 54

4.2 STATE OF THE ART ................................................................................... 56

4.3 MONITORING SYSTEM – DESIGN PERSPECTIVES ...................................... 58

4.4 RESULTS AND ANALYSIS.......................................................................... 62

4.4.1 In-Situ Monitors .............................................................................. 62

4.4.2 Adaptive Wearout Management ...................................................... 64

4.4.3 Power Savings ................................................................................. 67

4.5 SUMMARY ................................................................................................ 68

5 THERMAL CHARACTERIZATION USING DIGITAL SENSORS .................. 69

5.1 INTRODUCTION ........................................................................................ 70

5.2 TEST-CHIP DESIGN OVERVIEW ................................................................. 70

5.3 TEST PATTERNS ....................................................................................... 72

5.4 SELECTIVE DIGITAL CAPTURE METHODOLOGY ....................................... 74

5.5 RESULTS AND ANALYSIS.......................................................................... 78

5.5.1 Results Summary ............................................................................. 78


5.5.2 Analysis and Future Directions ....................................................... 78

5.6 SUMMARY ................................................................................................ 79

6 SIMULATION AND SILICON VERIFICATION OF ADAPTIVE VOLTAGE SCALING ...................................................................................................... 81

6.1 INTRODUCTION ........................................................................................ 82

6.2 REFERENCE DESIGN ................................................................................. 84

6.3 AVS SIMULATION METHODOLOGY ......................................................... 87

6.4 SILICON CHARACTERIZATION METHODOLOGY ........................................ 90

6.5 RESULTS & ANALYSIS ............................................................................. 94

6.6 SUMMARY ................................................................................................ 97

7 DESIGN AND PERFORMANCE PARAMETERS OF AN ULTRA-LOW VOLTAGE PROCESSOR ........................................................................... 99

7.1 INTRODUCTION ...................................................................................... 100

7.2 DESIGN AND FABRICATION PLATFORM .................................................. 100

7.2.1 Technology .................................................................................... 100

7.2.2 Standard Cells ................................................................................ 102

7.2.3 Memories ....................................................................................... 103

7.3 DESIGN ARCHITECTURE AND IMPLEMENTATION .................................... 104

7.4 TEST METHODOLOGY ............................................................................ 105

7.5 RESULTS AND ANALYSIS ........................................................................ 106

7.5.1 Electrical Performance................................................................... 106

7.5.2 Cache Impact on Energy ................................................................ 107

7.6 SUMMARY .............................................................................................. 108

8 CONCLUSION AND FUTURE PERSPECTIVES .............................................. 109

8.1 THESIS SUMMARY .................................................................................. 110

8.2 FUTURE WORKS ..................................................................................... 112

SCIENTIFIC CONTRIBUTIONS ............................................................................ 113

REFERENCES ............................................................................................................ 115


LIST OF FIGURES

FIGURE 1.1: EFFECTS OF SCALING WITH SUCCESSIVE TECHNOLOGY NODES ....................... 7

FIGURE 1.2: SCHEMATIC LAYOUT OF BULK AND FDSOI TRANSISTORS ............................. 9

FIGURE 1.3: BASIC TESTING APPROACH ............................................................................ 10

FIGURE 1.4: BATHTUB CURVE SHOWING THE RATE OF FAILURE OF INTEGRATED CIRCUITS

AT DIFFERENT PHASES OF LIFE .................................................................................. 12

FIGURE 2.1: BASIC STRUCTURE OF A TILE ........................................................................ 22

FIGURE 2.2: ILLUSTRATION OF INSTANCE INTERFACING WITH CONTROLLING CIRCUITRY 23

FIGURE 2.3: ON-CHIP CLOCK CONTROLLER .................................................................... 24

FIGURE 2.4: IP QUALIFICATION BLOCK ........................................................................... 24

FIGURE 2.5: FLOORPLAN OF THE TEST-CHIP, INSTANCE & TILE ....................................... 24

FIGURE 2.6: AT-SPEED TIMING DIAGRAM FOR A TILE ..................................................... 26

FIGURE 2.7: OCC CLOCKING SCHEME – (A) WITHOUT PEAK POWER AWARE APPROACH (B)

WITH PEAK POWER AWARE APPROACH ..................................................................... 27

FIGURE 2.8: TEST FLOWCHART ........................................................................................ 28

FIGURE 2.9: COMPARISON BETWEEN AT-SPEED FUNCTION AND AT-SPEED ATPG

PATTERNS ................................................................................................................ 30

FIGURE 2.10: REPRESENTATIVE GRAPH OF THE FIRST FAIL FREQUENCY DISTRIBUTION .. 32

FIGURE 2.11: CAD V/S SILICON COMPARISON ACROSS MULTIPLE PVT POINTS .............. 34

FIGURE 2.12: GRAPHS OF FAILING CYCLES V/S. FREQUENCY AND VOLTAGE ................... 35

FIGURE 3.1: SIGN-OFF CLOCK DATA DE-RATES ............................................................... 42

FIGURE 3.2: CLOCK DATA DE-RATE: BASIC DIAGRAM ................................................... 43

FIGURE 3.3: TIMING DIAGRAMS FOR HOLD RISE/FALL AND SETUP RISE/FALL VIOLATIONS

................................................................................................................................. 44

FIGURE 3.4: 28NM FDSOI TEST-CHIP WITH THE CDD BLOCK EMBEDDED AT 7 DIFFERENT

LOCATIONS ............................................................................................................... 44

FIGURE 3.5: CLOCK DATA DE-RATE: EXTRA MARGIN FOR HOLD IN SILICON .................. 46


FIGURE 3.6: CLOCK DATA DE-RATE: EXTRA MARGIN FOR SETUP IN SILICON ................. 46

FIGURE 3.7: HOLD EXTRA MARGIN WAVEFORMS FOR CAPTURE ..................... 47

FIGURE 3.8: SETUP EXTRA MARGIN WAVEFORMS FOR CAPTURE ................................... 47

FIGURE 3.9: NORMALIZED HOLD RISE EXTRA MARGIN ON SILICON DEPICTING WITHIN

DIE VARIATIONS & CONSTITUENT COMPONENTS ..................................................... 49

FIGURE 3.10: NORMALIZED HOLD FALL EXTRA MARGIN ON SILICON DEPICTING WITHIN

DIE VARIATIONS & CONSTITUENT COMPONENTS ..................................................... 49

FIGURE 3.11: NORMALIZED SETUP RISE EXTRA MARGIN ON SILICON DEPICTING WITHIN

DIE VARIATIONS & CONSTITUENT COMPONENTS .................................................... 50

FIGURE 3.12: NORMALIZED SETUP FALL EXTRA MARGIN ON SILICON DEPICTING WITHIN

DIE VARIATIONS & CONSTITUENT COMPONENTS ..................................................... 50

FIGURE 4.1: DIFFERENT VOLTAGE MARGINS IN ADDITION TO THE NOMINAL MINIMUM

VOLTAGE .................................................................................................................. 54

FIGURE 4.2: DIFFERENT IN-SITU DELAY MONITOR SCHEMATICS ....................................... 56

FIGURE 4.3: PRE-ERROR MONITOR SCHEMATIC ................................................................ 58

FIGURE 4.4: TIMING DIAGRAM FOR PRE-ERROR MONITOR ................................................ 59

FIGURE 4.5: PATH DELAY HISTOGRAM FOR BCH IP AT DIFFERENT CONDITIONS OF

VOLTAGE & PROCESS ............................................................................................... 60

FIGURE 4.6: EXPERIMENTAL TIMING DIAGRAM SHOWING PRE-ERROR FLAGS BEHAVIOR

WITH LOWERING VOLTAGE ....................................................................................... 62

FIGURE 4.7: EXPERIMENTAL WARNING FLAG/BCH1 AUTOTEST MAXIMUM FREQUENCIES

FOR VARIOUS PVTA CONDITIONS ............................................................................ 63

FIGURE 4.8: ADAPTIVE VOLTAGE MANAGEMENT FLOWCHART ....................................... 64

FIGURE 4.9: ADAPTIVE WEAROUT MANAGEMENT (A) WITHOUT CONSIDERING AGING

EFFECTS AND (B) WITH CONSIDERING AGING EFFECTS ............................................. 65

FIGURE 4.10: DETECTION WINDOW EVOLUTION WITH STRESS TIME ................................. 66

FIGURE 4.11: MONITORING SYSTEM FAILURE RATE AND POWER REDUCTION FOR VARIOUS

DVFS OPERATING CONDITIONS ............................................................................... 67

FIGURE 4.12: TOTAL POWER SAVINGS WITH RESPECT TO STATIC VOLTAGE MARGIN AT

DIFFERENT TEMPERATURES ...................................................................................... 67


FIGURE 5.1: FLOOR PLAN AND OVERVIEW OF TEST-CHIP USED FOR THERMAL

CHARACTERIZATION ................................................................................................. 71

FIGURE 5.2: PATTERN FOR STATIC SELF HEATING MEASUREMENTS ................................ 72

FIGURE 5.3: PATTERN FOR DYNAMIC SELF HEATING MEASUREMENTS ........................... 72

FIGURE 5.4: PLL AND THS CLOCK WAVEFORMS ............................................................ 74

FIGURE 5.5: MEMORY MANAGEMENT DURING DIGITAL CAPTURE [100] .......................... 74

FIGURE 5.6: SMARTEST MEMORY OVERFLOW MESSAGE ................................................ 75

FIGURE 5.7: SELECTIVE DIGITAL CAPTURE WAVETABLE ................................................ 76

FIGURE 5.8: EXAMPLE OF A DIGITAL CAPTURE VARIABLE WITH ASSOCIATED PARAMETERS

[100] ........................................................................................................................ 77

FIGURE 5.9: THERMAL CHARACTERIZATION RESULTS FOR STATIC & DYNAMIC

CONDITIONS ............................................................................................................. 78

FIGURE 6.1: BASIC REPRESENTATION OF POWER SAVINGS THROUGH ADAPTIVE VOLTAGE

SCALING .................................................................................................................. 83

FIGURE 6.2: APIP BASIC REPRESENTATION ..................................................................... 84

FIGURE 6.3: GROUPPMM BLOCK DIAGRAM ....................................................................... 85

FIGURE 6.4: FREQGEN & APIP INTERFACING ............................................................... 86

FIGURE 6.5: COMPUTATION EFFORT FOR FULL PVT SIMULATION USING FAST SPICE ....... 87

FIGURE 6.6: SIMULATION FLOWS BASED ON CUSTOMSIM & VCS AMS .......................... 88

FIGURE 6.7: SIMULATION METHODOLOGY OVERVIEW AND RESULTING RUN-TIME

IMPROVEMENT ......................................................................................................... 89

FIGURE 6.8: FLOWCHART FOR APIP FMAX CHARACTERIZATION ..................................... 91

FIGURE 6.9: FLOWCHART FOR APIP VMIN CHARACTERIZATION ..................................... 93

FIGURE 6.10: NORMALIZED SIMULATION V/S SILICON FMAX COMPARISON AT 125°C ..... 95

FIGURE 6.11: NORMALIZED SIMULATION V/S SILICON LEAKAGE CURRENT COMPARISON

AT 125°C ................................................................................................................. 95

FIGURE 6.12: NORMALIZED SIMULATION V/S SILICON DYNAMIC_POWER/MHZ

COMPARISON AT 125°C ........................................................................................... 96


FIGURE 6.13: NORMALIZED SILICON RESULTS FOR POWER SAVING THROUGH AVS AT 30°C

................................................................................................................................. 96

FIGURE 6.14: NORMALIZED SILICON RESULTS FOR DYNAMIC POWER SAVING THROUGH

AVS FOR FF CORNER CHIP AT 30°C ......................................................................... 97

FIGURE 7.1: FDSOI WELL ARCHITECTURES [104] ........................................................ 101

FIGURE 7.2: STATIC CURRENT VS. FREQUENCY TREND FROM 0.5 TO 1.3V OF VDD [104] 101

FIGURE 7.3: QUASI-FO4 CRITICAL PATH ....................................................................... 102

FIGURE 7.4: POLY BIAS CELLS LEAKAGE VS QUASI-FO4 CRITICAL PATH FREQUENCY ... 102

FIGURE 7.5: 10 TRANSISTOR ULV BITCELL ................................................................... 103

FIGURE 7.6: COMPARISON OF LEAKAGE BETWEEN ULV 10T DUAL WELL (DW) AND

SINGLE WELL (SW) BITCELLS ............................................................................... 104

FIGURE 7.7: SPARC V8 32B LEON3 MICROPROCESSOR SYSTEM & FDSOI 28NM RETICLE

PICTURE ................................................................................................................. 104

FIGURE 7.8: AVERAGE MICROPROCESSOR MINIMUM VOLTAGE EXTRACTED AT 1MHZ ... 106


LIST OF TABLES

TABLE 2.1: COMPARISON OF PEAK POWER AWARE AND PEAK POWER UNAWARE

APPROACHES ............................................................................................................ 31

TABLE 2.2: NORMALIZED PROCESS VARIATION VALUES ACROSS PVT POINTS ................ 33

TABLE 2.3: NORMALIZED PERCENTAGE OF WITHIN DIE VARIATION WITH RESPECT TO

PROCESS CORNERS................................................................................................... 33

TABLE 2.4: NORMALIZED PERCENTAGE OF WITHIN DIE VARIATION WITH RESPECT TO

VOLTAGE ................................................................................................................. 34

TABLE 2.5: NORMALIZED PERCENTAGE OF WITHIN DIE VARIATION WITH RESPECT TO

TEMPERATURE ......................................................................................................... 34

TABLE 2.6: NORMALIZED CAD AND SILICON SPREAD VALUES ACROSS PVT POINTS ...... 36

TABLE 3.1: CLOCK & DATA DELAY POSITIVE DIRECTIONS IN THE GRAPH ........................ 48

TABLE 4.1: IMPLEMENTATION RESULTS OF BCH IP WITH IN-SITU MONITORS .................. 61

TABLE 7.1: MICROPROCESSOR PERFORMANCE COMPARISON ........................................ 106

TABLE 7.2: COMPARISON OF EXECUTION ENERGY WITH AND WITHOUT CACHE ACTIVATION

............................................................................................................................... 107


LIST OF ABBREVIATIONS

ABB Adaptive Body Bias

APIP Area and Power IP

ASIC Application Specific Integrated Circuit

ATE Automatic Test Equipment

ATPG Automatic Test Pattern Generation

AVM Adaptive Voltage Management

AVS Adaptive Voltage Scaling

AWM Adaptive Wearout Management

BCH Bose Ray-Choudhary and Hocquenghem

BOX Buried Oxide

CAD Computer Aided Design

CDD Clock Data Derate

CMOS Complementary Metal Oxide Semiconductor

CPR Critical Path Replica

CUT Circuit Under Test

DFT Design For Test

DVFS Dynamic Voltage and Frequency Scaling

FDSOI Fully Depleted Silicon On Insulator

FMAX Maximum Frequency

FMIN Minimum Frequency

IC Integrated Circuit

IOT Internet Of Things

IP Intellectual Property

ISM In-Situ Monitor


ITRS International Technology Roadmap for Semiconductors

LOC Launch Off Capture

LOS Launch Off Shift

MTF Mean Time to Failure

PRBS Pseudo Random Binary Sequence

PV Process Variation

PVT Process Voltage Temperature

PVTA Process Voltage Temperature Aging

SE Scan Enable

SM Sequencer Memory

SOC System On Chip

SPARC Scalable Processor Architecture

STA Static Timing Analysis

UTBB Ultra Thin Body and Buried Oxide

VLSI Very Large Scale Integration

VM Vector Memory

VMAX Maximum Voltage

VMIN Minimum Voltage


INTRODUCTION

“Nearly all men can stand adversity,

but if you want to test a man's character, give him power.”

Abraham Lincoln

The ever-increasing advances in semiconductor technology continue to open new avenues for research in the domains of digital circuit design and test. Designing and manufacturing smaller, faster, cheaper and lower-power devices are some of the main challenges for the semiconductor industry. The incessant increase in density and the corresponding decrease in feature sizes of Integrated Circuits, as evinced by Moore's Law [1], have been a driving force in the progress of the industry for the past few decades. However, scaling of CMOS structures into the nanometer domain has posed new challenges to the physical design and reliability of circuits. The reasons for this are manifold [2, 22-24]. Firstly, structures much smaller than the wavelength of light used in modern lithography are difficult to fabricate and can be realized only within certain coarse limits. Similarly, it is difficult to control the doping concentration of transistors in the nanometer range. Further, structures are placed closer to each other with every technology node, so that even the smallest impurities or metal slivers can create shorts or other defects. Lastly, as the number of transistors, wires, contacts and vias on a single chip increases, the probability of one or more of them being faulty increases. These limitations have resulted in an increasing probability of defects in advanced technology nodes.
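To make the last point concrete, the following back-of-the-envelope sketch (not taken from the thesis; the per-device defect probability is an arbitrary placeholder) shows how the probability of a fully fault-free chip collapses as the device count grows, even when each individual device is almost certainly good.

```python
# Back-of-the-envelope illustration (not from the thesis): if each of n_devices
# on a chip is faulty independently with probability p_defect, the chance that
# the whole chip is fault-free shrinks rapidly as the device count grows.

def chip_yield(n_devices: int, p_defect: float) -> float:
    """Probability that none of the n_devices is faulty (independence assumed)."""
    return (1.0 - p_defect) ** n_devices

p = 1e-9  # hypothetical per-device defect probability
for n in (10**6, 10**8, 10**9, 10**10):
    print(f"{n:.1e} devices -> expected yield {chip_yield(n, p):.4f}")
# With a fixed (already tiny) per-device defect probability, yield falls from
# ~1.0 at a million devices towards ~0 at ten billion, which is why defect
# densities must shrink with every node and why thorough testing matters more.
```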

A circuit defect may lead to a fault causing an error that can result in a system failure. Manufacturing defects are physical defects introduced during manufacturing that cause the circuit to fail to function properly. The diversity of defects makes it difficult to generate meaningful tests that maximize defect coverage. Fault models, which are gate-level representations of physical defects, are necessary for generating and evaluating a set of test patterns. Generally, a good fault model should accurately reflect the behavior of the physical defects and should be computationally efficient in terms of the time required for fault simulation and test generation. Many fault models have been proposed but, unfortunately, no single fault model accurately reflects the behavior of all possible defects that can occur. As a result, a combination of different fault models is often used in the generation and evaluation of test patterns. Some well-known and commonly used fault models for modern VLSI test, developed to describe different kinds of physical defects, include the stuck-at, bridging and delay fault models [19].
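As a minimal illustration of how a fault model turns physical defects into targets for test generation, the toy sketch below simulates single stuck-at faults on an invented two-gate netlist (y = (a AND b) OR c) and reports which of a handful of hand-picked patterns detect each fault. It is purely illustrative and does not represent the ATPG tooling used later in this thesis.

```python
# Toy single stuck-at fault simulation on an invented two-gate netlist
# (y = (a AND b) OR c). Purely illustrative; not the ATPG flow used in this work.

def good_circuit(a, b, c):
    return (a & b) | c

def faulty_circuit(a, b, c, stuck_node, stuck_val):
    n1 = a & b
    if stuck_node == "n1":                 # internal net forced to 0 or 1
        n1 = stuck_val
    y = n1 | c
    return stuck_val if stuck_node == "y" else y

# A pattern detects a fault if the faulty response differs from the good one.
patterns = [(1, 1, 0), (0, 1, 0), (1, 0, 1)]
for node in ("n1", "y"):
    for val in (0, 1):
        detected = [p for p in patterns
                    if faulty_circuit(*p, node, val) != good_circuit(*p)]
        print(f"{node} stuck-at-{val}: detected by {detected if detected else 'none'}")
```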

As many faults are not visible at the design phase, they can only be handled by testing the manufactured chips and discarding the faulty ones. Since the faults become visible only post-fabrication, circuits need to be tested with patterns that can detect the presence of any defects. This in turn necessitates effective test architectures and methodologies for efficiently capturing all modelled faults while maintaining ease of application. Different test methodologies such as parametric, functional, structural and delay testing are combined to maximize fault detection. Parametric testing is typically used to check the electrical properties of the device and can be used to characterize any potential systematic issues with the process node. These tests may not check any functionality of the device but can find gross shorts, opens, leakage issues or current drive problems. Functional testing is commonly used for design verification to check whether the output responses are in line with the values expected from the design specifications. However, it is considered time consuming and expensive due to the exhaustive nature of applying test patterns to cover each known reachable functional state. Structural tests deal with observing the state of internal signals at the primary outputs of a circuit. Unlike functional testing, structural tests do not require enumeration of all functional states to test the design, so test volumes are not as large. Delay tests are used to detect the presence of very slow-to-rise or very slow-to-fall switching transitions and verify that a design operates correctly at the specified clock speed.

The importance of testing cannot be overstated, as it is required at different stages of the semiconductor lifecycle, from the qualification of new technology nodes to the diagnosis of in-field failures. The optimal test solution for any circuit must take into account a wide variety of factors such as fault coverage, test cost and time, the capability of available test equipment, the reliability requirements of the end-user application, etc. Often, these factors are not complementary to each other and require intelligent trade-offs to be made on a case-by-case basis. This rules out a single global test solution for the wide variety of circuit designs and applications. Hence, custom test solutions and methodologies specific to the circuit design and application are required and developed. In this context, differentiated test solutions for different problems, at levels of circuit complexity ranging from simple standard cells to complex microprocessors, are presented as part of this thesis. The main contributions are divided into six distinct chapters in this manuscript. It is pertinent to note that the organization of the chapters follows the level of circuit complexity involved, from basic standard cells to complex microprocessors. The content of each chapter is briefly summarized below:

· Chapter 1 details the context and objectives of the presented research work. It starts with a discussion of the trends in semiconductor technology scaling and the resultant challenges that have emerged. Thereafter, the Fully Depleted Silicon On Insulator (FDSOI) technology node and basic concepts related to semiconductor test are discussed. The final part of the chapter enumerates the main objectives of this thesis.

· Chapter 2 presents a configurable circuit and an associated power-aware at-speed test methodology for qualifying basic standard cells and IP structures and detecting the presence of small delay defects. The relevant silicon results and analysis validate the proposed power-aware test methodology for qualification and characterization of IPs and provide deeper insights for process improvements.

· Chapter 3 presents a design and methodology for accurate characterization of setup and hold margins of sequential elements in silicon while taking into account the effects of Process Variations. The test results provide deeper insights into the sources of extra timing margins available on silicon. This, in turn, enables accurate guard banding by preventing optimism and reducing unnecessary pessimism in the timing margins provided during sign-off.

· Chapter 4 shows the fundamental elements of an Adaptive Wearout Management (AWM) control loop for the dynamic supply voltage regulation of digital circuits. Firstly, a new generation of in-situ monitors which generate warning flags to prevent circuit failure under nominal conditions is explained. A voltage management control loop to efficiently tackle all voltage margins over a large range of DVFS operating conditions and temperatures has been implemented. The relevant experimental silicon results related to timing windows and power savings are provided.

· Chapter 5 elucidates a methodology for thermal characterization under dynamic workloads using embedded digital sensors. Firstly, the design of the test-chip used is explained, and thereafter the test patterns and test methodology employed are elaborated upon. In particular, the methodology required for managing the high volume of data generated during such tests is detailed. Analyses of the on-chip thermal results provide useful insights for test-chip design requirements related to thermal measurements.

· Chapter 6 presents the full silicon-level electrical characterization and its correlation with simulations performed using a novel method based upon existing CAD tools and a mixed-mode approach, on a representative digital design. This enables complete Process-Voltage-Temperature (PVT) characterization of the portion of the design for which Adaptive Voltage Scaling (AVS) is defined. The correlation of silicon results with the ones obtained via the simulation flow provides designers with deeper insights into the gains achievable with AVS from within the simulation environment.

· Chapter 7 presents the design and performance metrics of a single-supply SPARC 32-bit V8 microprocessor designed with Ultra Low Voltage (ULV) adapted standard cells and memories, aiming at low-energy operation and low standby power. A comparative analysis with similar implementations is provided, highlighting the performance gain and power savings achieved by the presented design methodology and implementation technology. Wafer-level tests shed light on the operating range and energy savings achieved by the ULV adapted microprocessor.

· Chapter 8 concludes the manuscript by summarizing the results of this thesis and presenting some future perspectives.

1 CONTEXT AND THESIS OBJECTIVES

“The best way to have a good idea is to have a lot of ideas.”

Linus Pauling

1.1 Semiconductor Technology Evolution

Technology scaling allows the realization of increasingly complex systems on a single chip. The steady growth of the global semiconductor industry over the past few decades has been driven by the demand for enhanced performance and functionality at reduced cost. This growth has been primarily facilitated by the continuous evolution of semiconductor manufacturing technology. Following the so-called Moore's law [1], the transistor count of ICs has doubled roughly every 18 months. A simple example of this trend is the progression from Small Scale Integration (SSI) to Very Large Scale Integration (VLSI) devices. In the 1980s, the term VLSI was used for chips having more than 100,000 transistors, and it has continued to be used over time to refer to chips with millions and now hundreds of millions of transistors. In 1986, the first megabit Random Access Memory (RAM) contained more than 1 million transistors. Microprocessors produced in 1994 contained more than 3 million transistors. VLSI devices with many millions of transistors are commonly used in today's computers and electronic appliances. This is a direct result of the steadily decreasing dimensions, referred to as feature size, of the transistors and interconnecting wires, from tens of microns down to tens of nanometers, with the smallest features of current technologies measuring just a few nanometers. Transistors are scaled in each successive technology node to increase their speed, improve packing density, decrease their power consumption and reduce cost. The reduction in feature size has also resulted in increased operating frequencies, with current commercially available microprocessors operating in the gigahertz range. Thus, semiconductor technology scaling optimizes circuit performance and power consumption, and allows the realization of more and more complex systems [2]. Figure 1.1 representatively depicts the shrink in silicon area required for fabricating the same circuit in successively smaller technology nodes.

Figure 1.1: Effects of scaling with successive technology nodes

1.2 Challenges in Advanced Technology Nodes

While offering many advantages, technological trends into the nanometer regime have led to a host of challenges. The sources that affect the integrity and reliability of modern circuits and systems are widespread, including manufacturing process variations, manufacturing defects, wear-out and interference. In this section we discuss some of the major issues that have emerged with technology scaling.

1.2.1 Variability

Variability of transistor characteristics due to variations in Process, Voltage and Temperature (PVT) has always been an issue in integrated circuit design [3]. PVT variations may prevent the circuit from achieving the required performance even in the case of fault-free fabrication. As device scaling enters the nano-metric regime, the effects of PVT variations are becoming excessively pronounced. Manufacturing process variations are increased due to the lack of precise control over fabrication processes at small-feature technologies. They cause deviations of transistor characteristics, and the resulting wide distribution leads to lower parametric yield [4]. Supply voltage scaling brings the supply close to or even below the transistor threshold voltage (Vth) to achieve significant gains in energy efficiency. Once the supply voltage is lower than Vth, the delay shows an exponential dependence on the supply voltage, and therefore even small supply voltage shifts can have a significant impact on circuit delay [5, 6]. Scaling also increases chip power densities, and inadequate heat sinking causes hot spots to appear. The resulting fluctuations of ambient temperature alter the timing characteristics of circuits [7]. A conventional approach to dealing with the problem of variability is to introduce large voltage and frequency guard-bands during sign-off, which considerably impacts power consumption and performance, cannot respond to rapid environmental changes [8] and implies a sharp increase in chip cost. Effectively dealing with variability to maintain or improve high-performance and energy-efficient systems, while satisfying the required standards for reliability, is becoming increasingly challenging with the scaling of technology [9].
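To give a feel for this exponential sensitivity, the sketch below evaluates a crude sub-threshold delay model of the form delay proportional to Vdd / exp((Vdd - Vth)/(n.kT/q)); the threshold voltage and slope factor used are generic placeholders, not 28nm FDSOI characterization data.

```python
# Crude sub-threshold delay model: delay ~ C*Vdd / Ion, with Ion falling off
# exponentially once Vdd drops below Vth. VTH and N_KT_Q are placeholders,
# not 28nm FDSOI characterization data.
import math

VTH = 0.35      # hypothetical threshold voltage (V)
N_KT_Q = 0.036  # hypothetical sub-threshold slope factor n*kT/q (V)

def relative_delay(vdd: float) -> float:
    """Gate delay in arbitrary units (capacitance folded into the units)."""
    i_on = math.exp((vdd - VTH) / N_KT_Q)  # sub-threshold current, arbitrary units
    return vdd / i_on

ref = relative_delay(0.45)
for vdd in (0.45, 0.40, 0.35, 0.30, 0.25):
    print(f"Vdd = {vdd:.2f} V -> delay x{relative_delay(vdd) / ref:.1f}")
# Each 50 mV drop multiplies delay by roughly 3.5-4x with these numbers, so a
# small supply droop near or below Vth has an outsized impact on circuit timing.
```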

1.2.2 Manufacturing Defects

Semiconductor manufacturing processes may induce permanent defects in a chip during one or more of the process steps involving implantation, etching, deposition, cleaning and lithography, due to imperfections. Device miniaturization is making advanced technology nodes increasingly vulnerable to these manufacturing irregularities. It is important that testing be as thorough as possible to uncover defective chips before they are shipped out [10]. However, according to past microprocessor data, the die size remains relatively constant [11], whereas the number of transistors per chip doubles every 2 to 3 years. This means that defect densities continue to increase. Moreover, technology scaling and increasing complexity give rise to defects which are more subtle and difficult to detect [12, 13]. All these factors, when combined with aggressive time-to-market objectives, cause test escapes and raise reliability concerns.

1.2.3 Wear-out

Although area scaling has proceeded at an exponential rate, supply voltage (Vdd) scaling has been quite slow. The two main reasons for slow Vdd scaling are the need to keep up with competitive frequency growth and to retain basic noise immunity and cell stability [14]. Hence, the dissimilar area and supply voltage scaling rates result in high power densities and elevated temperatures. The well-known wear-out failure mechanisms, namely Time-Dependent Dielectric Breakdown (TDDB), Electromigration and thermal stress, are all adversely affected by an increase in temperature. These age-related hard errors that appear in-field after a certain period of use are major factors impairing the lifetime reliability of modern microprocessors [15] used in data centers and space equipment. These applications demand high throughput and better lifetime reliability; however, wear-out failures limit their useful lifetime and performance.

1.2.4 Interference

Beyond manufacturing defects and wear-out related permanent faults, the integrated systems of recent years are more susceptible to temporary effects such as transient or intermittent faults. Smaller geometries increase coupling capacitances and, as a result, cross-capacitance between wires in nanometer interconnect structures has become the dominant component of total net capacitance. High interconnect coupling leads to increased circuit delay and crosstalk noise. Therefore, reducing interconnect coupling capacitance is an important task for both circuit noise and timing optimization [16-18].

1.3 FDSOI Technology Node

Figure 1.2: Schematic layout of Bulk and FDSOI Transistors

To build better devices, the size of the transistors must be reduced while increasing performance and reducing power consumption. FDSOI, or Fully Depleted Silicon On Insulator [38], is an advanced technology node that delivers these benefits by building transistors that are faster, cooler and simpler. In FDSOI, the innovation lies in inserting a thin insulator layer, called the Buried Oxide (BOX), just below the channel and eliminating the need to add dopants to the channel, thus making it fully depleted. Another key feature is that the silicon film on top of the oxide is very thin, as is the buried oxide itself; together this is referred to as Ultra Thin Body and Buried Oxide (UTBB). On the same technology node, FDSOI transistors have a shorter effective channel than bulk silicon transistors. This reduces the effective time required by electrons to travel from the source to the drain, leading to a faster transistor. Figure 1.2 shows the schematics of Bulk and FDSOI transistors.

As the transistor shrinks, the length of the gate is reduced, and the control the gate exercises over the channel region is also reduced, thereby lowering transistor performance. Some unwanted leakage current flows even when the transistor is turned off and represents a growing proportion of power consumption with each new technology node. In FDSOI transistors, the insulator layer effectively confines the electrons flowing from the source to the drain, thereby drastically reducing the leakage currents from the channel to the substrate. An FDSOI chip is able to run at a lower voltage due to the reduced margins added for process variability, which also makes it run cooler with lower power consumption. To summarize, FDSOI transistors operate at lower power at equivalent frequency compared to their bulk counterparts.

The fabrication of all designs and their associated silicon validation results presented within the scope of this thesis employ the 28nm FDSOI technology node unless stated otherwise.

1.4 Testing Overview

Figure 1.3: Basic testing approach

The reduction in feature size increases the probability that a manufacturing defect in the IC will result in a faulty chip. A very small defect can easily result in a faulty transistor or interconnecting wire when the feature size is less than 100 nm. Furthermore, it takes only one faulty transistor or wire to make the entire chip fail to function properly or at the required operating frequency. Yet defects created during the manufacturing process are unavoidable, and as a result some number of ICs are expected to be faulty. Therefore, testing is required to guarantee fault-free products, regardless of whether the product is a VLSI device or an electronic system composed of many VLSI devices. The goal of manufacturing test is to detect any defect that occurs in the fabricated circuits; ideally, faulty circuits and fault-free circuits can be differentiated after the manufacturing test. Figure 1.3 illustrates the basic principle of chip testing [2], wherein test vectors are applied to the inputs of the Circuit-Under-Test (CUT) and the responses are collected and compared with expected values. If the responses match, the circuit is considered good; otherwise, it is considered bad. Automatic Test Equipment (ATE) is used to test the chips. The test quality obviously depends upon the thoroughness of the test vectors; however, test quality and test cost are interdependent. A large number of test vectors/patterns may result in good test quality, but it will increase the test time and test cost at the same time. It is also necessary to test components at various stages during the manufacturing process. Testing and diagnosis of VLSI systems can be broadly classified into four types depending on the specific purpose accomplished and the current phase of production (from fabrication to shipment) for the circuit under test [19]. In the following subsections, the four types of test methods are briefly covered in the order in which they are conducted during the design and manufacturing processes.
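The following minimal sketch mirrors the stimulus/response comparison of Figure 1.3: golden responses are stored alongside the test vectors, and any mismatch marks the die as bad. The CUT is modeled here by a stand-in Python function, and all vector and response values are invented for illustration; on real silicon the vectors would be applied and the outputs strobed by the ATE.

```python
# Minimal sketch of the stimulus/response comparison of Figure 1.3. The CUT is
# modeled by a stand-in Python function; all vectors and golden responses are
# invented for illustration (a real flow strobes the outputs on the ATE).

def cut_response(vector):
    """Stand-in for the Circuit-Under-Test: a tiny 2-bit adder with carry-out."""
    a, b = vector
    s = a + b
    return (s & 0b11, (s >> 2) & 0b1)  # (2-bit sum, carry)

test_vectors     = [(0, 0), (1, 2), (3, 3), (2, 3)]
golden_responses = [(0, 0), (3, 0), (2, 1), (1, 1)]  # expected values

failures = [(vec, cut_response(vec), exp)
            for vec, exp in zip(test_vectors, golden_responses)
            if cut_response(vec) != exp]

print("PASS" if not failures else f"FAIL: {failures}")
# Test quality depends on how thoroughly test_vectors exercise the CUT; adding
# vectors improves defect coverage but directly increases test time and cost.
```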

1.4.1 Characterization Test

Also known as design debug or verification testing, this form of testing is performed on a new design before it is sent to production [19]. The main objective of characterization test is to verify that the design is correct and that the device will meet all specifications. Comprehensive AC and DC measurements are made during this test process. The requirement for thoroughness during this testing phase may often lead to probing of internal nodes of a chip, which is not performed as part of any other test process. Generally, devices are tested under worst-case conditions, because these are easier to evaluate than average cases and devices passing this test usually work under any other conditions.

1.4.2 Production Test

Every fabricated chip is subjected to production tests, which are less comprehensive than characterization tests, yet they must still enforce the quality requirements by determining whether the device meets specifications. It may not be possible to cover all possible functions and data patterns, but production tests must have a high coverage of modelled faults. Since every device must be tested before being packaged, test application time is of great importance. A production test should therefore be as brief as possible and is usually different from the characterization tests.

1.4.3 Burn-in Test

Figure 1.4: Bathtub curve showing the rate of failure of integrated circuits at different phases of life

All devices that pass production tests are not identical. When put to actual use, some will fail very quickly whereas others will function for a long time. Burn-in tests screen for the long-term reliability of devices through either continuous or periodic testing over a period of time, usually under non-rated conditions. The rate of failure of integrated circuits at different phases of life follows a bathtub curve [20], as shown in Figure 1.4. Correlation studies show that the occurrence of potential failures can be accelerated at elevated temperatures [21]. Two types of failures are isolated by burn-in:

1. Infant mortality failures, often caused by a combination of sensitive design and process variation, which may be screened out by a short-term burn-in (10–30 h) in a normal or slightly accelerated working environment, and

2. Freak failures, that is, devices having the same failure mechanisms as the reliable devices, which require a long burn-in time (100–1000 h) in an accelerated environment.

In practice, a manufacturer must balance economic considerations against device reliability. In any case, the elimination of infant mortality failures is considered essential for most applications [19].
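The temperature acceleration exploited by burn-in is commonly modeled with the textbook Arrhenius relation; the short sketch below computes an acceleration factor under assumed use and stress temperatures and a placeholder activation energy (none of these values come from this thesis).

```python
# Textbook Arrhenius acceleration-factor calculation for burn-in. The activation
# energy and temperatures below are generic placeholders, not values
# characterized in this thesis.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant (eV/K)

def acceleration_factor(t_use_c: float, t_stress_c: float, ea_ev: float = 0.7) -> float:
    """Arrhenius acceleration factor between use and stress temperatures (Celsius)."""
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

af = acceleration_factor(t_use_c=55.0, t_stress_c=125.0)
print(f"Acceleration factor ~{af:.0f}x")
# With these placeholder numbers AF is roughly 80x, so a 10-30 h burn-in at
# 125 C stands in for roughly 800-2300 h of field operation at 55 C, which is
# how infant-mortality failures are screened out early.
```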

1.4.4 Incoming Inspection

System manufacturers perform incoming inspection (also called quality assurance) on purchased devices before integrating them into the system. Depending upon the context, this testing can be similar to production testing, more comprehensive than production testing, or even tuned to the specific system application. The most important purpose of this testing, performed at the vendor site, is to avoid placing a defective device in a system assembly, where the cost of diagnosis may far exceed the cost of incoming inspection.

The development of new technology platforms is done through the validation of various test chips, which fulfil the fundamental requirement of determining the silicon parameters of IPs (functionality, timing, power, etc.) before they can be used in a real System-on-Chip. Therefore, while qualifying new technology nodes, it is essential to ensure that the silicon performance of test circuits is within acceptable limits. Furthermore, accurate diagnosis and understanding of the sources of deviation from expected behaviour are essential for improving the yield and performance of a given technology node.

1.5 Research Objectives

The work of this thesis is focused on the development of efficient test

methodologies that aid in the accurate detection and resolution of issues that may arise

due to variability, manufacturing defects, wear-out or interference. The intent is to

14

tackle a wide spectrum of these challenges from a test perspective to ensure that the

availability of effective test solutions does not become a bottleneck in the path towards

further scaling. To this end, special emphasis is laid on capturing faults and

understanding deviation from expected behavior during the process development phase

itself. The goal is to address the great need and scope to improve and optimize existing

test structures and methodologies with regard to test time, silicon area occupied by test structures, and the ability to analyze the impact of deep submicron effects. Additionally, the aim is to develop advanced, modular and agile test methodologies that address the advances and innovations introduced in the myriad domains of electronic design, reliability management, manufacturing process improvements, etc. The main objectives

of the thesis are briefly listed below:

· Develop a power-aware at-speed test methodology for the purpose of qualifying

basic standard cells / complex IP structures and detecting the presence of small

delay defects at an early phase of technology development.

· Accurately characterize setup and hold margins of sequential elements in silicon

while taking into account effects of Process Variations. The test results should

provide deeper insights into the constituent components of extra timing margins

available on silicon. This in turn should enable accurate guard banding by

preventing optimism and reducing unnecessary pessimism in the timing margins

added during sign-off.

· Validate and characterize the functionality of a new generation of pre-error

warning monitors. Implement an Adaptive Wearout Management (AWM)

control loop for dynamic supply voltage regulation of digital circuits using the

warning flags generated by the pre-error monitors. Analyse the silicon results to

quantify the reliability benefits and power savings that are accrued from the

implemented AWM strategy.

· Develop a test methodology for thermal characterization under dynamic

workloads using embedded digital sensors. In particular, the methodology

should be capable of capturing the thermal gradient across small time intervals

and also be efficient in managing the resultant high volume of generated data.

Analyses of the on-chip thermal results should provide useful insights into

dynamic self-heating on silicon and should also show directions for future test-

chip design requirements related to thermal measurements.

15

· Implement, on a representative digital design, the full silicon-level electrical characterization and its correlation with simulations performed using a novel method based upon existing CAD tools and a mixed-mode approach. The correlation of

silicon results with the ones obtained via the simulation flow should provide

designers with deeper insights into the gains achieved using Adaptive Voltage

Scaling from within the simulation environment.

· Evaluate the electrical performance metrics of a single-supply SPARC 32bit V8

microprocessor designed with Ultra Low Voltage (ULV) adapted standard cells

and memories, aiming at low-energy operation and low standby power. Wafer-level tests should shed light on the operating range and energy savings achieved by the ULV adapted microprocessor. Further, perform a comparative analysis with

similar implementations to understand the performance gain and power savings

that are achieved by the design methodology and implementation technology.

The above listed objectives are individually dealt with in the forthcoming chapters.


2 AN EFFECTIVE POWER-AWARE AT-SPEED TEST METHODOLOGY FOR IP QUALIFICATION AND CHARACTERIZATION

“The real voyage of discovery consists

not in seeking new landscapes but in having new eyes.”

Marcel Proust

Abstract: Advanced nanometer technologies have led to a drastic increase in operational frequencies resulting in the performance of circuits becoming increasingly vulnerable to timing variations. The increasing process spread in advanced nanometer nodes poses considerable challenges in predicting post-fabrication silicon performance from timing models. Thus, there is a great need to qualify basic building structures on silicon in terms of critical parameters before they could be integrated within a complex System-on-Chip (SoC). This chapter presents a configurable circuit and an associated power-aware at-speed test methodology for the purpose of qualifying basic standard cells and complex IP structures to detect the presence of small delay defects. The circuit design has been embedded within test-chips used for the development of the 28nm Fully Depleted Silicon On Insulator (FDSOI) technology node. The relevant silicon results and analysis validate the proposed power-aware at-speed test methodology for qualification and characterization of IPs and provide deeper insights for process improvements.


2.1 Introduction

The consistent application of reusable components (IP: Intellectual Property) is indispensable for enhancing productivity in system design. IP qualification is a process to measure and evaluate the specific attributes of IP cores. Its goal is to ensure that the required quality & reliability of IP blocks is achieved and that all design constraints are met. IP qualification provides IP providers and integrators with measurable quality characteristics that allow design decisions to be made based upon quantifiable metrics of

performance and reliability [25-27]. The development of new technology platforms is

done through the silicon qualification of basic building structures/gates embedded

within test-chips, which ensure greater controllability and observability of the Circuit

Under Test (CUT). While qualifying new technology nodes, it is essential to understand

the silicon performance of CUTs and ensure that it is within acceptable limits. Test-

Chips fulfil the fundamental requirement of determining silicon parameters of IPs

(functionality, timing, power, etc.) before they can be integrated in a real SoC design.

Having modular design architectures for test, effective test methodologies, efficient

diagnosis of defects and the ability to accurately characterize the impact of Process

Variations (PV) [28, 29] is essential to the IP Qualification process.

From a post-silicon test point of view, the physical phenomena observed in

deep sub-micron nodes give rise to new defects and immensely complicate the task of

timing analysis and path selection for testing and grading the final chips. IC

manufacturing test is continually evolving, with an increased emphasis on at-speed

testing to maintain test quality for larger, more complex chips and new fabrication

processes. At-speed testing is crucial for ensuring product quality from the timing

performance point of view and maintaining the reliability level of chips delivered to

customers after testing. At-speed testing is now the industry norm as many timing

related defects go undetected by ATPG patterns generated using the classical stuck-at

and bridging fault models [30, 31]. Further, power consumption during test is

significantly higher than during normal functional mode [19, 32-35]. The reason is that

test patterns tend to activate switching on a large number of nodes while a typical

functional mode only activates a small portion of the design at any given time instance.

Also successive functional input vectors applied to a given circuit during system mode

have significant correlation, while the correlation between consecutive test patterns can

be very low [36]. The excessive power consumption can result in parts being damaged


during test, or in preventing good parts from passing test, leading to yield loss. This

necessitates the development of power-aware test methodologies. In order to have a proper timing characterization, it is also necessary to evaluate the impact of aging degradation through burn-in tests. Stress tests such as high-voltage tests and burn-in tests are often used to perform accelerated testing in order to detect infant mortalities and screen weak parts [37]. Furthermore, accurate diagnosis of the sources of

defects & deviations from expected behaviour is essential for improving the yield and

performance of a given technology.

In this context, this chapter presents a circuit design which may be embedded

within test-chips and an effective test methodology that enables the characterization of

basic standard cells and/or complex IP structures. This helps in detecting Small Delay

Defects (SDD) at an early phase of technology development. The circuit design & test

methodology have been optimized for low power consumption & used for the

development of the 28nm FDSOI technology node [38]. The associated relevant results

and analysis provide deeper insights into the speed limitations, reliability and

defectivity of the said technology node. Furthermore, silicon results help in the

diagnosis of captured defects and in qualifying the standard cell library with respect to

multiple Process-Voltage-Temperature (PVT) configurations.

2.2 Background and Motivations

Performance verification in the presence of PV is difficult to perform because

the critical path is no longer unique. Different paths become frequency limiting in

different process corners. In advanced nanometer manufacturing, systematic delays

owing to unpredictable process changes have assumed a greater share of customer fails

than catastrophic defects [39]. Conventional ASIC design methodology uses pre-

characterized standard cells to optimize the design as per specifications. During the

process development phase it is necessary to complement pre-silicon analysis with post-

silicon timing and power characterization on fabricated devices. Characterization results

are used to identify outliers, understand the impact of PV and improve the design before

ramp and production [40]. Accurate timing characterization of standard cells as well as


complex IPs is required for power and performance optimization of designs which

embed them.

It has been observed that the presence of random defects can affect circuit

performance with very slow-to-rise or very slow-to-fall switching transitions [41]. Such

defects can cause an increase in the delay of paths going through circuits. A device fails

to operate at the rated speed if it contains a delay defect that creates a path delay higher

than the maximum allowed path delay [42]. The different fault models for generating

test patterns for delay defect detection are the transition delay model, the path delay

fault model and the SDD fault model. In the transition fault model [43], a gate net has a

slow-to-rise and a slow-to-fall fault associated with it while the path delay fault model

[44] targets the cumulative delay through the entire set of gates in a pre-defined path.

Moreover, the path delay fault model not only takes into account the cumulative delay

through the entire list of gates in a pre-defined path, but it can also take into account

localized delay defects. Depending on how delay faults are tested in scan design

environment, there are two transition delay fault pattern generation approaches, namely

the Launch-Off-Shift (LOS) and Launch-Off-Capture (LOC) schemes [30, 45].

Application of at-speed tests for detecting delay faults in synchronous sequential

circuits is usually done using scan-based structural tests generated by an Automatic Test

Pattern Generator (ATPG). Timing-aware ATPGs require long CPU run time for pattern

generation and fault simulation, and also result in a significantly large pattern count

[46]. In [47], a variation aware method based on statistical timing to select critical paths

for structural test is discussed wherein path delays are projected into different process

corners using their linear delay function forms. The work of [48] has explored

improving transition fault pattern quality through at-speed diagnosis to automatically

identify the failing paths and locate the root cause of failure. The premise is that

complete timing information is unavailable during ATPG and it may be possible that

some of the at-speed patterns may activate paths which are not required to meet system

speed (e.g. non-functional paths), and these patterns may fail during production test.

Power-induced over-test occurs when fully functional chips fail during testing because of excessive heat and/or power supply noise caused by test patterns with excessive switching activity; it is rapidly becoming one of the most serious problems in scan testing, especially for low-power, high-speed, and deep-submicron integrated circuits [49, 50]. This is particularly aggravated during burn-in testing, which can lead to

significant power variations within the die [51]. Since power consumption increases


significantly with elevated temperature and voltage, the execution of burn-in tests can

be limited by the allowed power dissipation budget [52]. For scan-based tests, power is

divided into shift-power and capture-power. Shift-power is the power consumed during

the process of simultaneously serially loading the test stimulus and unloading the

previous test response whereas capture-power is the power consumed during one or

more capture clock cycles. Generally, test power problems are classified as peak power issues, i.e. the highest value of power reached at any given instant, and average power issues, i.e. the power averaged over a time period [53]. Test power

reduction can be addressed at various stages of test generation for logic circuits, by

employing low-power ATPG, low-power test compaction, low-power X-filling, and

low-power test vector ordering [54]. The most significant advantage of reducing test

power through low-power test generation is that this approach causes neither circuit

overhead nor performance degradation. Power-aware Design-for-Test (DfT) techniques

allow for controlling the power consumption and reduce the overall energy consumed

during test. Clock gating can be employed to reduce test power without impacting fault

coverage. Techniques such as segmentation, clustering and ordering of scan cells reduce

the power consumption, and improve the effectiveness of power reduction techniques

such as test planning and power-aware test generation [54]. A common method to

reduce the excess switching activity during shifting is to split the scan path into several

segments. Shifting is then done one segment after the other [55]. The segments not

currently active are not clocked and do not contribute to shift power. Staggered clocking

or skewed clocking is a technique used in low-power design where segmented parts of a

design are clocked in an alternating fashion to reduce the simultaneous switching

activity and thereby the peak power during test [56, 57].

The work of this chapter focuses on the detection of timing related defects at

the standard cell level during the process development phase. The presented power-

aware modular design structure and test methodology allow for the reduction of capture

power during at-speed tests. The proposed framework and methodology simplify the

procedure for fault diagnosis and outlier localization using functional at-speed patterns.

The associated results and analysis help in further understanding the effects of PV

across multiple corners on timing related defects and design margins.


2.3 Circuit Design Architecture and Implementation

Figure 2.1: Basic structure of a Tile

The primary intent of the design is to enable the qualification of basic digital

structures, gates or IPs by accurately determining their operational parameters and

characteristics at high frequencies. In terms of physical implementation, a bottom-up

approach has been used by implementing the basic building structure of the design and

then using it as a hard-macro for further implementation. This basic building structure is referred to as a Tile and is shown in Figure 2.1.

A tile consists of N different standard cells or complex IP structures between

launch and capture flops. The capture flop is referred to as the Scan Flop as it forms part

of the output scan chain. The Circuit Under Test (CUT) refers to a single standard cell /

IP as shown in Figure 2.1. CUTs embedded within a tile for the purpose of qualification

differ in terms of geometry and characteristics such as layout, orientation etc. A CUT is

activated and tested by either a rising or a falling transition. An N×1 multiplexer is used to

select & observe the output of any CUT within the tile, which is then captured into the

scan flop. Scan flops of multiple tiles are stitched together to facilitate the shifting out

of the CUT’s capture response, such that it is observable on the top-level pad of the test-

chip. The excitation stimuli may be either generated on-chip or loaded externally, depending on the mode in which the structure is operated. For retaining

observability in case of any faults within the scan chain, redundancy has been added

using a Bypass Mux structure. Launch and capture flops as well as multiplexers employed within the tile have higher reliability as compared to the embedded CUTs, i.e. they have either been qualified earlier or, if the tile is used for the qualification of an entirely new technology node, they employ a relatively relaxed geometrical layout. A tile


can be clocked dynamically by either using the slow tester clock or a fast clock

generated using an on-chip PLL.

Figure 2.2: Illustration of instance interfacing with controlling circuitry

To obtain a large population of tiles that helps in better understanding the properties of fabricated devices while maintaining ease of design, each tile is replicated several times within an instance, as shown in Figure 2.2. Each instance is then interfaced

with an internal on-chip data generator and an On-Chip Clocking (OCC) mechanism.

The internal data generator is a Linear Feedback Shift Register which provides rising

and falling transitions. Having an on-chip data generator provides the possibility to

easily test an instance when it is not possible to apply stimuli from top-level pads of the

test-chip. Such a requirement may be mandatory for Burn-In testing, where a limited number of pins may be available on the Burn-In board due to hardware limitations of

the oven equipment.
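To make the role of the internal data generator concrete, the following minimal Python sketch models an LFSR-style stimulus source. The register width, feedback taps and seed are illustrative assumptions, since the actual implementation details are not given here; the point is simply that such a register produces a pseudo-random bit stream whose consecutive bits supply the rising and falling transitions applied to the CUTs.

def lfsr_stream(seed=0b1001, taps=(3, 2), width=4, cycles=20):
    # Fibonacci-style LFSR: XOR the tapped bits and shift the result in.
    # Width, taps and seed are assumptions made purely for illustration.
    state = seed
    for _ in range(cycles):
        yield state & 1                      # observed stimulus bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)

bits = list(lfsr_stream())
transitions = sum(a != b for a, b in zip(bits, bits[1:]))
print("stimulus bits:", bits)
print("transitions  :", transitions)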

The on-chip clock controller shown in Figure 2.3 is used to generate at-speed

clock pulses. The OCC is a complex multiplexer with synchronization circuitry, which

selects between at-speed (PLL) clock and slow clock depending on the status of the

Test_Enable signal. The length of the Clock Shift Register is easily configurable & is

programmed to generate the required number of fast clock pulses during capture mode,

i.e. Test_Enable = 0 & Test_Mode = 1. During capture, the Test_Enable signal is

latched internally at the negative edge of the Slow_Clk which enables the generation of

fast clock pulses depending on the contents of the clock shift register. The configurable

clock shift register provides flexibility to adjust the arrival time of the at-speed pulses

after the Test_Enable signal goes low. This forms the basis of the proposed architecture

for peak power reduction.
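The following behavioural sketch (Python, not RTL) illustrates the idea behind the configurable clock shift register: every '1' bit shifted out during capture gates one at-speed pulse through, so the number of leading '0' bits sets the arrival time of the pulse burst. The bit ordering and gating details are assumptions made purely for illustration.

def occ_capture_pulses(program_bits, fast_cycles):
    # Return one 0/1 flag per fast (PLL) clock cycle of the capture window:
    # 1 means an at-speed pulse is issued in that cycle.
    pulses = []
    for cycle in range(fast_cycles):
        bit = program_bits[cycle] if cycle < len(program_bits) else 0
        pulses.append(bit)
    return pulses

# Example: three at-speed pulses, delayed by two fast-clock cycles.
print(occ_capture_pulses([0, 0, 1, 1, 1], fast_cycles=8))
# -> [0, 0, 1, 1, 1, 0, 0, 0]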


Figure 2.3: On-Chip Clock Controller

Figure 2.4: IP Qualification Block

Figure 2.5: Floorplan of the Test-chip, Instance & Tile


As shown in Figure 2.4, an instance is further replicated inside a Block with

the goal of increasing the cell population. For the test case under consideration, a single

block contains 5 instances with 100 tiles (Instances 1 - 5) and 3 instances having 50 tiles

(Instances 6 - 8). Each tile embeds within itself 128 CUTs. This leads to a population of (5 × 100 + 3 × 50) × 128 = 83,200, i.e. about 80k CUTs within a single block. However, this configuration can be easily

modified to suit specific requirements. Having multiple instances embedded within a

single test-chip allows studying the impact of within die variations on performance.

This architecture and implementation flow not only enables IP qualification but also allows small portions of the structure to be reused multiple times, resulting in a reduction of the overall development cycle of the test-chip. The modular design allows sharing resources

and increasing area utilization efficiency for different tiles and instances. The presented

circuit design is capable of being clocked in the GHz range and occupies a chip area of

1.18 mm² when implemented in the 28nm FDSOI technology. Figure 2.5 shows the

floor-plan of the test-chip, instance and tile respectively.

2.4 Test Methodology

In this section, the application of the power-aware at-speed patterns employed for testing tiles within the test-chip design is first explained with the aid

of timing diagrams. Thereafter, the test flow used for qualifying the CUTs embedded

into the test-chip is explained.

2.4.1 Test Patterns

Figure 2.6 shows the timing diagram when a tile is clocked during at-speed

test. The LOC approach is used for this implementation permitting the scan chain to

shift at low speeds and simplifying the design by eliminating the need for switching the

Scan Enable (SE) signal at high frequencies. The dynamic clock signal from the On-

chip Clock Controller is used to clock the launch flop, the CUT and the scan flops,

enabling them to be either clocked at low frequencies using the tester clock or at high

frequencies using the on-chip PLL. Initially, data coming from either the top level pads

of the test-chip or from the internal data generator is stabilized at the input of the launch

flop. The input of the launch flop toggles whenever the Test SE signal goes low, which ensures a transition at the input of the CUT during at-speed test.

Figure 2.6: At-Speed Timing Diagram for a Tile

Thereafter, three at-

speed pulses are provided when the SE signal is low which sequentially triggers the

output of the launch flops, CUT and scan flops. Data from the scan flop is then shifted

out at slow speeds. At-speed patterns used for testing the design may either be

developed using an ATPG tool or written using a cycle based pattern development

language. In the rest of this chapter, at-speed patterns developed using an ATPG tool are

referred to as at-speed ATPG patterns whereas those developed using a pattern

development language are referred to as at-speed functional patterns.

For increasing efficiency and reducing test time, multiple instances are tested

in parallel. From a peak power perspective, it is necessary to optimize the capture power

as the number of CUTs switching during capture is much greater than the number of

nodes switching during scan shifting. This is due to the fact that within a tile, activity is

induced on N CUTs which are tested in parallel and only a single scan flop which is

stitched across tiles to form the scan chain. Assuming a switching activity of X at any

time for each instance embedding 100 tiles, this particular implementation would result

in a total switching activity of 6.5X since a single block embeds 5 instances with 100

tiles and 3 instances with 50 tiles. Such concurrent switching as shown in Figure 2.7(a)

would cause an unacceptable spike in peak capture power. For tackling peak power

issues during capture, at-speed pulses for each instance are mutually skewed such that at

any given point in time, the capture pulses for just a single instance induce activity as

shown in Figure 2.7(b).

Figure 2.7: OCC clocking scheme – (a) Without peak power aware approach (b) With peak power aware approach

This is achieved by setting the required length of the Clock Control Shift Register of the OCC during the design phase and then programming it

accordingly to provide the required skew amongst instances. This approach induces a

maximum switching of X across all instances at any point of time during capture. In

order to guarantee this clocking scheme on silicon, clock skew of zero must be ensured

at the input of each instance.
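A back-of-the-envelope sketch of the activity accounting behind this scheme is given below. Activity is expressed in units of X, the capture switching activity of one 100-tile instance, with a 50-tile instance assumed to contribute 0.5X; the pulse offsets are illustrative.

# Peak capture activity with and without the staggered (power-aware) clocking.
instances = [1.0] * 5 + [0.5] * 3            # 5 x 100-tile + 3 x 50-tile instances
capture_cycles = 3                            # three at-speed pulses per instance

def peak_activity(offsets):
    # Peak concurrent activity when instance i captures during fast-clock
    # cycles [offsets[i], offsets[i] + capture_cycles).
    horizon = max(offsets) + capture_cycles
    per_cycle = [0.0] * horizon
    for activity, start in zip(instances, offsets):
        for c in range(start, start + capture_cycles):
            per_cycle[c] += activity
    return max(per_cycle)

print(peak_activity([0] * 8))                                  # aligned   -> 6.5 (X)
print(peak_activity([i * capture_cycles for i in range(8)]))   # staggered -> 1.0 (X)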


2.4.2 Test Flow

Figure 2.8: Test Flowchart

Steps to test the design for deriving the silicon spread and detecting any

outliers present on silicon are sequentially performed as follows (see Figure 2.8):

1. Firstly the status of an ATPG pattern generated using the Stuck-At fault model is

checked for the entire block to screen out any static faults present within the design.

2. The functionality of the reference scan chain is verified at low speeds by

ensuring that a reference pattern is well propagated.

3. The functionality of the bypass structure is verified at low speeds by ensuring

that reference data is well captured.

4. Functional scan patterns which apply a sequence of input data to the CUTs

within an instance are tested at low speed. This provides a picture of the

functional integrity of CUTs at low frequencies.


5. Steps 1-4 are performed at nominal operating conditions and an instance is

considered to be faulty if a failure is observed on any pattern during these steps.

Thereafter, characterization using at-speed patterns is performed by determining

the maximum frequency (Fmax) or the minimum voltage (Vmin) of operation.

Fmax Characterization: A functional at-speed scan pattern with high-frequency capture pulses and low-frequency shift-out is tested repeatedly for multiple frequencies in the entire test frequency range. This is done by dynamically modifying the PLL frequency programmed within the functional at-speed pattern. A step size of 10 MHz has been used to achieve the required test granularity but may be easily adjusted to suit specific requirements.

Vmin Characterization: A functional at-speed scan pattern with high-frequency capture pulses and low-frequency shift-out is tested repeatedly for multiple voltage points in the entire test range. A step size of 10 mV has been used to achieve the required test granularity but may be easily adjusted to suit specific requirements.
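The two characterization loops can be summarised by the following Python sketch. The tester-control hooks (set_pll_frequency, set_supply_voltage, run_at_speed_pattern) are hypothetical placeholders for the actual ATE programming interface; only the sweep structure and the 10 MHz / 10 mV granularities come from the methodology described above, and the sweep ranges shown are merely examples.

def fmax_search(freqs_mhz, set_pll_frequency, run_at_speed_pattern):
    # Sweep the frequency upwards; log the first failing frequency and all
    # subsequent failures so the silicon spread can be observed.
    failures = []
    for f in freqs_mhz:
        set_pll_frequency(f)            # reprogram the PLL inside the pattern
        if not run_at_speed_pattern():  # True = pass, False = fail
            failures.append(f)
    first_fail = failures[0] if failures else None
    return first_fail, failures

def vmin_search(voltages_v, set_supply_voltage, run_at_speed_pattern):
    # Sweep the supply voltage downwards; log the first failing voltage.
    failures = []
    for v in voltages_v:
        set_supply_voltage(v)
        if not run_at_speed_pattern():
            failures.append(v)
    first_fail = failures[0] if failures else None
    return first_fail, failures

# Example sweep ranges at 10 MHz / 10 mV granularity (values are illustrative):
freq_points_mhz = [800 + 10 * i for i in range(100)]             # ascending
volt_points_v = [round(1.00 - 0.01 * i, 2) for i in range(40)]   # descending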

The first failing frequency/voltage for every instance is logged to have an

insight into the speed limitation of the tile on silicon. The subsequent failures are

captured to observe the silicon spread in terms of delay. For failing

frequencies/voltages, a cycle-based failure datalog is generated for diagnosis, which

simply details the failing cycles and the output pin. Using this information, the failing

CUT, tile and instance can be easily identified as a particular failing cycle can be

accurately mapped back to the failing CUT. On a comparative basis, Vmin

characterization is faster as compared to Fmax characterization as there is no need to

dynamically modify the pattern during runtime for programming the PLL. This makes

Vmin characterization more suitable to production test where a simple change of the

input spec value is required. Also, most silicon test environments provide in-built

functions for performing a Vmin search. However, an Fmax search is more suitable for

understanding the silicon spread in terms of frequency at any given voltage point.

Silicon wafers with test-chips having our design embedded within them have been

fabricated as part of a corner lot using the 28nm FDSOI technology node. For the

purpose of investigation, wafers of the slow, typical and fast corners were tested at

multiple temperatures using the Verigy 93k automated test equipment.


2.5 Evaluation of the proposed Test Methodology

This section provides a comparative analysis of the different at-speed patterns

that may be used in the test methodology and highlights the benefits of the power-aware approach.

2.5.1 Memory and Runtime Evaluation

Figure 2.9: Comparison between At-Speed Function and At-Speed ATPG Patterns

The design architecture supports at-speed testing with patterns that may either

be developed using an ATPG tool or written using a cycle based pattern development

language. Figure 2.9 graphically displays the comparison between at-speed functional

and at-speed ATPG patterns for a single instance with respect to memory requirement

and runtime. From the graph it is evident that at-speed functional patterns require about

90% less memory as compared to at-speed ATPG patterns as these patterns use an

internal Finite State Machine to generate control signals and apply stimuli to the

embedded CUTs. At-speed functional patterns are two orders of magnitude faster than

the ones generated using an ATPG tool, in terms of the number of tester cycles required

for their execution. Since a functional pattern is cycle-based, a deeper knowledge of

design interactions and activations from within the pattern is easily available. This

information is essential for programming the PLL to produce a range of frequencies

when operated at a constant input frequency. Furthermore, at-speed functional cycle-


based patterns simplify the diagnosis process as a failing cycle can be easily mapped

back to the faulty CUT. This is due to the fact that, for at-speed functional patterns, each failing cycle corresponds to a unique CUT within the design and may be tabulated in a look-up table; diagnosis therefore does not require the design database, as it does for tests performed using at-speed ATPG patterns. The process of identifying

failing CUTs with at-speed functional patterns has been automated using a script, which

reads the pattern description and Tester generated failure datalogs to isolate failing

CUTs. Both at-speed patterns i.e. functional and ATPG, provide 100% fault coverage

with the transition fault model on the targeted CUTs. The only comparative drawback

of functional patterns is that they need to be generated manually, as compared to the tool-based generation of ATPG patterns, and that they require deep knowledge regarding the implementation details of the block.
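A minimal sketch of this cycle-to-CUT diagnosis step is shown below. The datalog format and the look-up table contents are hypothetical; the real script parses the pattern description and the tester-generated failure datalog, but the essential idea is that each failing cycle maps to exactly one (instance, tile, CUT, transition) entry.

# Hypothetical look-up table derived from the pattern description:
# failing tester cycle -> (instance, tile, CUT, transition)
cycle_to_cut = {
    1042: ("instance_3", "tile_17", "cut_05", "rise"),
    1043: ("instance_3", "tile_17", "cut_05", "fall"),
    2210: ("instance_7", "tile_02", "cut_88", "rise"),
}

def isolate_failing_cuts(failing_cycles, lut=cycle_to_cut):
    # Translate the failing cycles of the tester datalog into failing CUTs.
    return sorted({lut[c] for c in failing_cycles if c in lut})

print(isolate_failing_cuts([1042, 1043]))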

2.5.2 Peak Power Analysis

                              Peak Power (W)
Quality Parameter             SS_0.6V_-40°C      FF_1.3V_125°C
Power Unaware Approach        0.1516             2.0135
Power Aware Approach          0.0427             0.6021
Percentage Reduction          71.83%             70.1%

Table 2.1: Comparison of Peak Power aware and Peak Power unaware approaches

Peak power analysis results derived using Synopsys PrimeTime for the IP qualification block are presented in Table 2.1. It can be seen that under the best PVT corner for stress (SS_0.6V_-40°C), the peak power at any instant of time is low for both the power-aware and power-unaware approaches. However, for the worst stress corner (FF_1.3V_125°C), the peak power consumption with the power-unaware approach is much higher and could violate the power budget of the test-chip, which would be risky for silicon testing. Further, with the power-aware approach, it is found that peak power consumption reduces by almost 70% across PVT corners, which enables the power budget of the block to be comfortably met.


2.6 Results and Analysis of Fmax and Vmin Characterizations

This section presents the results and analysis of Fmax and Vmin

characterization. It begins by showcasing the resultant impact of process variation and

further investigates process-corner, voltage and temperature as individual contributors

to variation. Thereafter the CAD vs. Silicon alignment at multiple PVT points is

studied. Finally the silicon spread with respect to frequency and voltage is analyzed.

Figure 2.10: Representative graph of the First Fail Frequency Distribution

Figure 2.10 shows a representative graph of the first fail frequency derived via

Fmax characterization across a population of dies embedded within the test-chip at a

single PVT point. Analysis of the data allows us to quickly understand the global speed

limitation across all instances within the block. The quantum of within die variation and

within wafer variation is easily discernible from the results. Within die or within wafer

variation refers to the difference between the maximum and minimum values of the first

failing frequency of instances within a die or across a wafer. Characterization for the

block was performed at various PVT points to understand the impact of process

variations across different process corners. The two metrics used to highlight these

values here are the percentage of average within die variation and the percentage of

within wafer variation. Their normalized values are listed in Table 2.2.
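For clarity, the two metrics can be sketched as follows. The within-die (or within-wafer) variation is the max-min spread of the instances' first-fail frequencies within a die (or across the wafer); expressing it as a percentage of the mean first-fail frequency is an assumption made here for illustration, since the exact reference value is not stated.

def pct_spread(first_fail_freqs):
    # (max - min) spread of the first-fail frequencies, expressed as a
    # percentage of their mean (the reference value is an assumption).
    spread = max(first_fail_freqs) - min(first_fail_freqs)
    mean = sum(first_fail_freqs) / len(first_fail_freqs)
    return 100.0 * spread / mean

def avg_within_die_variation(per_die_freqs):
    # per_die_freqs: one list of instance first-fail frequencies per die.
    return sum(pct_spread(die) for die in per_die_freqs) / len(per_die_freqs)

def within_wafer_variation(per_die_freqs):
    # Pool all instances of all dies on the wafer.
    all_instances = [f for die in per_die_freqs for f in die]
    return pct_spread(all_instances)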


Values listed in Table 2.2 have been normalized using normalization factors based on the absolute values at the PVT point TT_1.0V_25°C.

PVT Configuration    Normalized Percentage of         Normalized Percentage
                     Average Within Die Variation     Within Wafer Variation
SS_0.6V_-40°C        1.053686232                      2.555555556
SS_0.8V_-40°C        0.744038322                      1.590268886
TT_1.0V_25°C         1                                1
FF_1.3V_125°C        2.186772963                      1.637221269

Table 2.2: Normalized Process variation values across PVT Points

From a comparative point of view, it can be seen that there is no linear correspondence between within die and within wafer variations, i.e. a higher amount of within die variation may not necessarily lead to a higher amount of within wafer variation. The methodology and analysis provide designers with accurate inputs regarding the amount of guard-banding

to take into account the effects of process variations at a given PVT point. To have a

better understanding of process, voltage and temperature as individual contributors to

variation, wafers are tested under dynamic conditions and the normalized values of

within die variation are listed in Tables 2.3, 2.4 and 2.5. Trends derived from the results

in Table 2.3 show that the variation is least for the slow process corner and greatest for the fast

process corner. From Table 2.4 it can be seen that within die variation is directly

proportional to voltage while Table 2.5 shows that within die variation is inversely

proportional to temperature.

PVT Configuration    Normalized Percentage of Average Within Die Variation
SS_1V_25°C           0.922901
TT_1V_25°C           1
FF_1V_25°C           1.087106

Table 2.3: Normalized Percentage of Within Die Variation with respect to Process Corners

PVT Configuration    Normalized Percentage of Average Within Die Variation
TT_0.6V_25°C         0.286596
TT_0.8V_25°C         0.656665
TT_1V_25°C           1

Table 2.4: Normalized Percentage of Within Die Variation with respect to Voltage

PVT Configuration    Normalized Percentage of Average Within Die Variation
TT_1V_-40°C          1.037410046
TT_1V_25°C           1
TT_1V_125°C          0.924126355

Table 2.5: Normalized Percentage of Within Die Variation with respect to Temperature

The derived data is also used to compare silicon results with the sign-off estimates and, for this purpose, the instance having the median first failing frequency across dies is selected. Figure 2.11 shows the CAD v/s Silicon comparison of the first failing frequency across different PVT points.

Figure 2.11: CAD v/s Silicon Comparison across Multiple PVT Points

From the graph it can be seen that CAD and silicon values are perfectly aligned at SS_0.6V_-40°C. It is seen that CAD data is

pessimistic at SS_0.8V_-40°C, i.e. the median first failing frequency on silicon is higher

than the one expected in CAD, whereas at TT_1.0V_25°C & FF_1.3V_125°C the CAD data are optimistic as compared to the silicon results. This information is vital for


calibrating CAD models to accurately reflect the performance on silicon and thus save

debugging time and effort.

Figure 2.12: Graphs of Failing cycles v/s. Frequency and Voltage

The failure datalogs of instances having the lowest first fail frequency are considered

for deeper analysis to locate cells with the lowest speed tolerance. Outlier cells, if any,

are chosen for failure analysis to isolate any design limitations or process imperfections.

Such a study is important on standard cells that are the basic building blocks of complex

SoCs, as cells having lower than anticipated speed may contribute to SDD which may

cumulatively result in timing failures. Resolution and elimination of timing defects

during the early phase of process development also helps in ramping up the yield at

faster rate. Figure 2.12 depicts graphs showing the total failing cycles as well as the

incremental failing cycles with respect to frequency & voltage for a single instance with

100 tiles, i.e. 12800 CUTs. While interpreting the graphs of Figure 2.12, it is important

to note that the rising and falling transitions applied to a CUT each cause a separate cycle to fail with a functional at-speed pattern. Hence, as seen in Figure 2.12, at a constrained value of frequency / voltage at which all CUTs are failing, a total of 2 × 12800 = 25600 cycle failures are recorded. Since

Fmax characterization is performed at a finer granularity as compared to Vmin

characterization, a higher number of data points are seen in the graph for the failing

cycles vs frequency. This analysis provides an insight into the window of frequency /

voltage between the first and last failing CUT of an instance and helps in better

calibrating CAD models for accurate sign-off.

Table 2.6 shows the CAD & silicon spread values across multiple PVT points

which have been normalized using normalization factors based on the absolute values at

the PVT point TT_1.0V_25°C. In order to calculate the silicon spread, the difference

between the average frequency at which the first cycle fails & the average first

frequency at which all cycles fail is considered for each PVT point. From the results it


can be seen that the trends for CAD and silicon data are well aligned across PVT points

and the degree of divergence between CAD & silicon can also be accurately gauged.

Understanding the silicon spread provides an accurate idea of dynamic voltage and

frequency scaling limits that may be applied for designs embedding the qualified cells.

PVT Configuration    Normalized CAD Spread    Normalized Silicon Spread
SS_0.6V_-40°C        0.251851524              0.291666667
SS_0.8V_-40°C        0.647669182              0.833333333
TT_1.0V_25°C         1                        1
FF_1.3V_125°C        1.797637065              1.45758636

Table 2.6: Normalized CAD and Silicon spread values across PVT Points

It is important to note that the normalized and comparative results provided in

this section are derived from a specific process flow of the technology under

investigation. However, the design and test methodology are generic in nature and may be

deployed in test-chips across process flows and technology nodes for the purpose of

library qualification.

2.7 Summary

This chapter presents a test circuit for accurately characterizing IPs and

standard cell libraries using at-speed testing. The design supports the qualification of

basic building gates, cells and structures on silicon during the process development

phase in terms of timing before they are integrated within complex SoCs. The presented

power-aware design and test methodology helps mitigate peak power issues during capture and reduces peak power consumption by about 70%. The implementation allows the

use of both at-speed functional and at-speed ATPG patterns, and a comparative analysis

of both approaches in terms of memory requirement, test time etc. for this specific

design is put forth. The presented design has been tested and validated for the 28nm

FDSOI node. The results provide an insight into the impact of within die and within

wafer variation across multiple PVT points. Information regarding CAD vs. silicon

correlation derived from the test results helps to efficiently optimize CAD models and


determine the timing margins provided during sign-off. Fault diagnosis using the

derived silicon results helps in deeper understanding & resolution of design ambiguities

and process limitations. In the future, this block is intended to be deployed for the

assessment of even smaller technology nodes during the process development phase.


3 ANALYSIS OF SETUP AND HOLD MARGINS INSIDE SILICON FOR ADVANCED TECHNOLOGY NODES

“Who looks outside, dreams; who looks inside, awakes.”

Carl Jung

Abstract: This chapter presents a design and methodology for accurate characterization of setup and hold margins in silicon while taking into account effects of Process Variations (PV). The test circuit provides deeper insights into sources of extra timing margins available on silicon. This in turn, enables accurate guard banding by preventing optimism and reducing unnecessary pessimism in the timing margins provided during sign-off. The presented design has been used for the development of the 28nm Fully Depleted Silicon On Insulator (FDSOI) node and associated relevant results and analysis have been provided.


3.1 Introduction

As device geometries continue to shrink in order to achieve higher integration density

and performance, the ability to control critical device parameters is becoming

increasingly difficult. With every consecutive technology node, circuit performance

becomes ever more sensitive to process variations [28] as significant variation in device

length, width and oxide thickness have been observed. The loss in performance due to

Process Variation (PV) can be equal to the gain by one full technology generation [58].

With frequencies increasing aggressively, even pico-second variations cannot be

ignored, in particular, for high speed clock design. Characterizing and managing PV is

becoming a critical issue for technology nodes in the nanometer range. Process

variations not only affect yield, but also have an impact on performance and reliability.

Timing analysis for signoff can no longer ignore PVs whose signature is observed from

foundry-to-foundry, lot-to-lot, wafer-to-wafer, within-wafer and also within-dies. In

particular, within-die variations are becoming progressively significant with each

technology node [29]. Hence, additional conservatism is added to account for these

variations, which can result in missed timings on the clock and data paths. Such margins

are added during sign-off and are referred to as De-rate in the rest of this chapter. It is

observed that simple guard banding to mitigate the effects of PV leads to increased

product development times and uncompetitive products. In addition, traditional Static

Timing Analysis (STA) methodologies are unable to cope with the large number of

permutations of process, voltage, and temperature corners created by these independent

and increasingly random sources of variation [59 - 61].

The development of new technology platforms is done through the validation of various

test chips, which fulfil the fundamental requirement of determining the silicon

parameters of IPs (functionality, timing, power, etc.) before they can be used in a real

System-on-Chip. Therefore, while qualifying new technology nodes, it is essential to

ensure that the silicon performance of test circuits is within acceptable limits.

Furthermore, accurate diagnosis and understanding of the sources of deviation from

expected behaviour is essential for improving the yield and performance of a given

technology node.

This chapter presents a novel yet simple test circuit for accurate characterization of

Setup and Hold margins while taking into account effects of PV. The circuit consists of

a data path between launch and capture flops whose skew can be accurately controlled


to cause hold / setup violations. This design has been used for the development of the

28nm FDSOI [38] node and the associated relevant results and analysis provide deeper

insights into the effects of PV and the constituent components of the timing margins

observed on silicon.

3.2 State of the Art

Several methods have been proposed for enhancing yield and reliability as well as

improving timing analysis while factoring in PV. Firstly [62, 63] discuss statistical

methods for estimation and control of PV effects on circuit operation. Ring-oscillator

based measurement circuits are the most popular digital techniques for on-chip

variability control and are often used for simple and fast characterization of the AC

characteristics of digital CMOS circuits [64, 65]. However, what the frequency of a ring

oscillator tells us is how fast or slow a chip can operate and this data would need to be

further post-processed to take the requisite decisions for tuning performance in the presence

of variability. Post-fabrication tuning of substrate bias has been suggested in order to try

and compensate for threshold-voltage variation [66]. The study of the impact of process

and environmental variation on ASIC timing and methodologies to account for them

while modelling circuit delay and performing STA have been shown in [59, 60].

Nevertheless, these methods take into account just the systematic variations while ignoring

random variations, which contribute to an increasing proportion of global variation in

deep sub-micron nodes.

In [67] a new statistical definition for setup and hold time is put forth which accounts

for PV, whereas [68] examines the fluctuation of setup time, hold time and Clock-to-Q

Delay under dynamic conditions and presents a method to cope with dynamic power

supply noise in FF timing estimation. In [69] a methodology for interdependent setup

and hold time characterization of sequential circuits has been presented. However once

again this aims at reducing pessimism in STA without taking into account PV effects on

silicon. Post-silicon skew tuning via Programmable Delay Elements, which aims at

overcoming the effects of PV by utilizing setup and hold timing tests has been shown in

[70]. A hold time failure model is defined and an approach for explicit validation of

hold time violations is shown in [71]. The work presented in this chapter takes forward


the State-of-the-Art by presenting a test circuit and methodology for characterizing

Setup and Hold margins on silicon and understanding the constituent components of

these margins.

3.3 Basic Concepts, Measurement Circuit and Methodology

Figure 3.1: Sign-off Clock Data De-rates

In order to ensure that designed circuits maintain functional integrity in light of process

and runtime variations, extra pessimism is added for Setup & Hold margins during

Sign-off, which are referred to as de-rates herein. As shown in Figure 3.1, in order to

safeguard against hold violations, the launch clock path is made faster and the capture

clock path is delayed after de-rating for hold checks. Similarly to prevent setup

violations, the launch clock path is made slower and the capture clock path is made

faster after de-rating. Data de-rates make the data path faster for hold & slower for setup

checks. The common clock path variations in Launch and Capture clock paths are

ignored.
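As a minimal illustration of how such de-rates are typically applied in OCV-style timing checks, the sketch below scales early paths down and late paths up before computing the hold and setup slacks. The equations and numbers are illustrative assumptions only; the exact sign-off formulation is tool- and flow-dependent.

def hold_slack(launch_clk, capture_clk, data_delay, t_hold,
               clk_derate=0.05, data_derate=0.05):
    # Hold pessimism: early (de-rated faster) launch clock and data path,
    # late (de-rated slower) capture clock.
    return (launch_clk * (1 - clk_derate) + data_delay * (1 - data_derate)
            - capture_clk * (1 + clk_derate) - t_hold)

def setup_slack(period, launch_clk, capture_clk, data_delay, t_setup,
                clk_derate=0.05, data_derate=0.05):
    # Setup pessimism: late launch clock and data path, early capture clock.
    return (period + capture_clk * (1 - clk_derate)
            - launch_clk * (1 + clk_derate)
            - data_delay * (1 + data_derate) - t_setup)

# Hypothetical delays in ns; both checks still show positive slack here.
print(hold_slack(launch_clk=0.50, capture_clk=0.48, data_delay=0.12, t_hold=0.03))
print(setup_slack(period=1.00, launch_clk=0.50, capture_clk=0.48,
                  data_delay=0.60, t_setup=0.05))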

A test circuit namely the Clock Data De-rate (CDD) block has been designed as a hard

macro to characterize and analyze setup and hold margins. This is done by measuring

the variation of delay in the Clock and Data paths to validate the de-rate added during

sign-off. The basic diagram of the CDD block is shown in Figure 3.2. The circuit

consists of a data path commencing and terminating in Launch and Capture Flops. In

order to derive the data path for the design, an analysis was done on real-world SoCs

and a representative set of cells was selected from the ones that were most commonly

employed and the ones that were susceptible to variability.

Figure 3.2: Clock Data De-rate: Basic Diagram

To prevent the possibility of

metastability at the Capture Flop impacting test results in a negative manner, a sensor

has been used to flag the occurrence of metastability. Two variable delay generators

employing ring oscillators have been used to add different delays in paths of launch and

capture clocks. The amount of delay on each individual path can be controlled by

setting the configuration of the Control Bus via top-level pins of the test-chip. By

inducing different skew conditions, Setup / Hold failures can be created. Each path

within the block is designed as a Ring Oscillator whose frequency is observable on the

output ports after passing through an internal divider to account for the IO Pad and

Tester limitations. As just a single transition is propagated along the clock and data

paths, the flip-flop ring oscillator concept [64] has been used to find the path delay

along the clock and data paths. By considering the Setup / Hold margin at which design

has been closed in conjunction with the data path delay and the skew at which failure

occurs, the extra margin along with its constituent components can be accurately

determined.
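A sketch of the delay extraction implied by this scheme is given below. It assumes a simple inverting loop in which one oscillation period corresponds to two traversals of the measured path, and it subtracts the separately characterized Offset delay (see steps 2 and 3 of the test methodology in Section 3.4); the divider ratio, loop structure and numbers are hypothetical.

def path_delay(f_out_hz, divider_ratio, offset_delay_s):
    # Undo the internal divider, convert the loop period to a single path
    # traversal, and exclude the offset (muxing) logic delay.
    f_loop = f_out_hz * divider_ratio
    loop_delay = 1.0 / (2.0 * f_loop)
    return loop_delay - offset_delay_s

# Hypothetical numbers: 2.5 MHz at the pad, divide-by-128, 150 ps of offset.
print(path_delay(2.5e6, 128, 150e-12))     # ~1.41 ns of path delay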

The CDD block can be used for the comparison of extra margin available on silicon

with respect to the de-rate added during sign-off when Hold Rise, Hold Fall, Setup Rise

& Setup Fall violations occur. A Hold Rise/Fall violation is said to occur when the skew

between launch and capture clocks induced by the variable delay generators is sufficient

to cause a hold violation at the Capture flop while transferring a 1/0. Conversely a Setup

Rise/Fall violation is said to occur when the skew between launch and capture clocks

induced by the variable delay generators is sufficient to cause a setup violation at the

Capture flop while transferring a 1/0. Timing diagrams of these violations are depicted

in Figure 3.3.


Figure 3.3: Timing Diagrams for Hold Rise/Fall and Setup Rise/Fall violations

Figure 3.4: 28nm FDSOI Test-chip with the CDD block embedded at 7 different

locations

The presented design has an accuracy of 5 ps with a maximum skew range of 600 ps and occupies a chip area of 0.3 mm² when implemented in the 28nm FDSOI technology. In order to enable a study of the effects of Within-Die variation on Setup & Hold margins, the CDD block has been embedded at 7 different locations within a test-chip, as shown in Figure 3.4.

3.4 Test Methodology

Steps to characterize the CDD block and derive the extra margin on silicon are

sequentially performed as follows:


1. Firstly the functionality of launch and capture flops is verified along with a

check for the occurrence of metastability in the absence of skew between the

clock paths.

2. The delay of the Offset, i.e. the extra combinational logic within the Clock and Data Paths, is characterized. This can be done as it is possible to mux just the

Offset within the path and measure its delay.

3. The delay of the complete clock and data paths is characterized i.e. with the

Offset included for the entire range of variable delay that can be added within

these paths. This enables the extraction of data and clock path delays while

excluding the offset.

4. Rising and falling transitions are propagated along the data path while

monotonically increasing the variable delay that can be inserted in the launch

and capture paths to induce skew between them. The configuration of the

Control Bus of the variable delay generators at which the last pass condition

occurs is latched to determine the permissible skew before Setup or Hold

violations occur.

5. The extra timing margin is the difference between the margin available in silicon and the slack available at sign-off. This margin is computed by skewing the clock until the last pass condition for hold and setup occurs (see the sketch after this list). Its constituents can be easily derived from the independent delays of the clock and data paths characterized earlier.
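A minimal sketch of the monotonic skew sweep used in steps 4 and 5 is given below. The control-bus programming call and the pass/fail check are hypothetical placeholders for the real test program; the 5 ps resolution and 600 ps maximum skew come from the CDD description in Section 3.3.

SKEW_STEP_PS = 5
MAX_SKEW_PS = 600

def last_passing_skew(program_skew, transfer_passes):
    # Increase the induced launch/capture skew until the first failure and
    # return the last passing skew value (in ps).
    last_pass = None
    for skew in range(0, MAX_SKEW_PS + SKEW_STEP_PS, SKEW_STEP_PS):
        program_skew(skew)          # set the variable-delay control bus
        if transfer_passes():       # propagate a rising or falling transition
            last_pass = skew
        else:
            break
    return last_pass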

Silicon wafers with test chips having our design embedded within them have been

fabricated as part of a corner lot at two different foundries using the 28nm FDSOI

technology node. For the purpose of our investigation, a single chip from each of the SS, TT and FF corner wafers was tested at -40°C, +30°C and +125°C in the voltage range of

0.7V – 1.4V using the Verigy 93k automated test equipment.

3.5 Results and Analysis

This section describes the results which quantify the available setup and hold

margins inside silicon. The presented analyses help to understand the constituent

components of the available timing margins in silicon.


Figure 3.5: Clock Data De-rate: Extra Margin for Hold in Silicon

Figure 3.6: Clock Data De-rate: Extra Margin for Setup in Silicon

As shown in Figures 3.5 & 3.6, the extra timing margins available in silicon are

computed as the difference in data delay between silicon & sign-off plus the skew

difference between silicon & sign-off. Here X1, X2, X3 and X4 are Launch clock delay,

Capture clock delay, Data delay and Clock Skew respectively. The clock skew is

computed as the Launch clock delay minus the Capture clock delay for Hold and vice-versa

for Setup. X3_S and X4_S are Data Delay and Clock Skew inside Silicon.
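One consistent way of writing this computation, with the sign conventions of Table 3.1 made explicit (this is our reading of Figures 3.5 and 3.6 rather than a verbatim sign-off equation), is:

\[ \text{Extra Margin}_{\text{hold}} = (X3_{S} - X3) + (X4_{S} - X4), \qquad X4 = X1 - X2 \]
\[ \text{Extra Margin}_{\text{setup}} = (X3 - X3_{S}) + (X4_{S} - X4), \qquad X4 = X2 - X1 \]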

Hold extra margin is also represented in the waveforms of Figure 3.7, where data remains stable at the capture flop for longer than the required time. Extra delay added in the design can be avoided by accurately characterizing the margins needed in silicon.


Figure 3.7: Hold Extra Margin Waveforms for Capture

Figure 3.8: Setup Extra Margin Waveforms for Capture


Pessimism added in de-rates results in higher cell drive strengths and a higher operating voltage to meet the required frequency specification. The extra setup margin waveforms in Figure 3.8 show that data is stable well ahead of the clock arrival time of the capture flop. This means that the achievable frequency in silicon is better than estimated at design sign-off. These extra margins are often larger than required, which results in greater area, increased power consumption, and reduced operating frequency for the design. A detailed analysis of the extra hold and setup margins is hence required to reduce the pessimism present in the design.

Results and analysis presented here focus on the Process-Voltage-Temperature

(PVT) point FF_1.3V_-40°C for evaluating Hold margins, since the worst case hold

failure is expected to occur for chips from the Fast Corner when operated at high

voltage and low temperature. Similarly the assessment of Setup margins is done at the

PVT point SS_0.9V_125°C as the worst case for setup failure is expected to occur for

chips from the Slow Corner when operated at low voltage and high temperature. Results available for

different wafers, lots, and different foundries can be used for sign-off de-rate

calculations.

Timing Margin    Silicon Launch Clock    Silicon Capture Clock    Silicon Data Delay
Hold             (+) ↑                   (+) ↓                    (+) ↑
Setup            (+) ↓                   (+) ↑                    (+) ↓

Table 3.1: Clock & Data delay positive directions in the graph

Figures 3.9 – 3.12 show the Within-Die variation of normalized extra margin

along with its constituent components for the Hold Rise/Fall & Setup Rise/Fall

violations at the PVT points FF_1.3V_-40°C & SS_0.9V_125°C respectively. Extra

margin in Hold & Setup on silicon means the timing margin available in silicon over

and above the sign-off passing value for setup & hold. From the graphical spread, approximately 10% within-die variation in timing margins is observed. Such

variations are expected in advanced technology nodes and require a deeper investigation

and understanding of the constituent components of timing margins in order to enable

the application of accurate de-rating factors.

Graphs of Figures 3.9 – 3.12 depict the sources of extra margin available on

silicon before Hold Rise/Fall and Setup Rise/Fall violations occur. Herein, the extra

margin inside silicon for a single die is presented for analysis and a similar pattern is


Figure 3.9: Normalized Hold Rise Extra Margin on Silicon depicting Within Die

Variations & constituent components

Figure 3.10: Normalized Hold Fall Extra Margin on Silicon depicting Within Die

Variations & constituent components


Figure 3.11: Normalized Setup Rise Extra Margin on Silicon depicting Within Die

Variations & constituent components

Figure 3.12: Normalized Setup Fall Extra Margin on Silicon depicting Within Die

Variations & constituent components


observed for different dies within the wafer. An analysis of the test results reveals that a hold check passing with 0 ps of positive slack at sign-off shows margins in both Spice and silicon. The reason for this extra margin is pessimism in the clock and data de-rates. The directions of positive values in the graphs can be interpreted from Table 3.1, which shows the comparison of silicon data with respect to sign-off: '↑' represents an increase in silicon delay with respect to sign-off, whereas '↓' represents a decrease in silicon delay with respect to sign-off. When the silicon clock and data delays are in the directions of the arrows shown in Table 3.1, the extra margin for clock and data is in the positive direction.

As shown in Figures 3.9 & 3.10, data de-rates are more pessimistic than clock de-rates for hold. For sign-off hold checks, the launch clock is de-rated with a negative value, which decreases the launch clock delay, and the capture clock is de-rated with a positive value, which increases the capture clock delay. Therefore we infer that clock cells are delayed in silicon with respect to the sign-off clock insertion values. The overall clock de-rates are also pessimistic, as the skew introduced by the de-rates supports hold. The Spice data, which carries no margin, shows positive slack, and the silicon result is above Spice since the FF silicon wafer lies towards the typical side of the Gaussian distribution.

Similarly, as seen in Figures 3.11 & 3.12, a setup check passing with 0 ps of slack for a given frequency has positive slack in Spice and silicon. Data de-rates are optimistic with respect to the silicon data paths/cells, and clock de-rates are pessimistic with respect to the clock paths/cells. Sign-off capture clock cells are de-rated negatively for setup, which means the clock insertion is assumed to be faster for the capture clock, but in silicon the clock cells have a higher delay. This creates a larger gap between the actual silicon values and the sign-off estimates. Launch path cells are delayed at sign-off with higher de-rates than observed in silicon, which again creates positive skew, supporting setup. Keeping in mind

that modern technology nodes are slated to be in production at multiple foundries, the

de-rating factors applied must also take into account the lot-to-lot and Fab-to-Fab

variations.


3.6 Summary

This chapter presents a test circuit for accurate characterization of delays in clock and data paths, used to evaluate the extra timing margin available in silicon. Understanding the sources of extra margin aids in precisely establishing the de-rates added during sign-off, such that fail-free operation can be ensured while deriving maximum performance from a given circuit. The proposed design uses only standard cells, which allows for easy implementation. The presented test structure can be used as a sensor embedded within SoC designs to improve the frequency of a design through post-silicon tuning after reading results from the sensor. The design and methodology presented in this chapter have been tested and validated for the 28nm FDSOI node, enabling accurate de-rating while taking into account PV effects. In the future, this block is intended to be deployed for the assessment of even smaller technology nodes to enable the acceleration of the process development phase.


4 ADAPTIVE WEAROUT

MANAGEMENT WITH

IN-SITU AGING

MONITORS

“Live as if you were to die tomorrow;

learn as if you were to live forever.”

Mahatma Gandhi

Abstract: This chapter presents the fundamental elements of an Adaptive Wearout Management (AWM) control loop for the dynamic supply voltage regulation of digital circuits. Firstly, a new generation of in-situ monitors, which generate warning flags to prevent circuit failure under nominal conditions, is explained. A voltage management control loop to efficiently tackle all voltage margins over a large range of DVFS operating conditions and temperatures has been implemented. The relevant experimental silicon results related to timing windows and power savings are provided.


4.1 Introduction

The doubling of transistor counts every two years, as evinced by Moore’s law

[1], has been the fundamental driver of the microprocessor industry. This law still paces

the device roadmap, as defined within the ITRS consortium [75]. Its corollary about

microprocessor performance and power consumption is known as Dennard’s scaling

[76]. In the mid-2000s, this trend was disrupted when the microprocessor reached the

heat dissipation limits of the chip. At this point new solutions (like multiprocessing)

were adopted to keep exploiting the still growing number of transistors, while

maintaining power consumption within manageable limits. On the other hand, the

demand for increasing performance of portable devices like smartphones or tablets is constrained by a limited power source. Finally, almost all electronic market segments are now driven by energy efficiency (performance per watt) and not solely by pure performance metrics. All conventional synchronous digital systems are semi-custom designs. This means that a set of logic gates is selected from a standard cell library (containing all available logic gates) to establish the desired functionality at a given frequency, while iteratively optimizing power and area consumption. Further, it is also important that correct timing is ensured under Process, Voltage and Temperature fluctuations as well as Aging (PVTA). The traditional approach is to fulfil all these requirements under the worst-case scenario. This worst-case approach translates into additional voltage safety margins on top of the nominal minimum voltage, i.e. the supply voltage required if no PVTA variations occur, as shown in Figure 4.1.

Figure 4.1: Different voltage margins in addition to the nominal minimum voltage


Though these margins allow proper operation in extremely uncommon scenarios, they result in excessive margin during standard operating conditions, which in turn deteriorates the circuit's energy efficiency. Considering the need for growing margins in recent technologies, it is desirable to cut unused voltage margins while operating away from worst-case situations. Significant power savings can be achieved by cutting additional voltage margins, as the dynamic power decreases quadratically with the supply voltage and the leakage power reduces even exponentially. However, reducing the supply voltage increases the path delays and may result in timing errors. This necessitates embedded sensors which can detect or predict the occurrence of timing errors.

The idea behind circuit failure prediction is to forecast the occurrence of a

failure during normal system operation before the appearance of any error that can

result in corrupt system data and states. This is in contrast to classical error detection

where a failure is detected after errors appear in system data and states. Failure

prediction can be performed in multiple ways wherein the basic principle is to insert a

wide variety of sensors i.e. In-Situ Monitors (ISM) at various locations inside a chip.

During normal system operation, these sensors collect information about various system

parameters over time. Examples of such system parameters range from temperature,

voltage, ring oscillator delays to complex relative timing relationships among logic

signals. The data collected by the sensors is analysed on-chip or off-chip to identify

anomalies and predict failures. In-situ monitors (ISM) may be needed for two main kinds of applications. One application is to predict all timing failure events, which can be combined with any existing resilience mechanism [8, 77, 78] at the price of additional, complex circuitry. Another possible application is to use the monitors as speed sensors indicating whether the stabilized current operating condition is close to a possible timing failure. Except for fast transient voltage variations, this approach can be used with Adaptive Body Bias (ABB) or Dynamic Voltage Frequency Scaling (DVFS) to account for static or slowly changing variations such as manufacturing variations, ambient conditions as well as circuit aging effects.


4.2 State of the Art

Margins related to manufacturing variations (towards slow or fast global

corners) can be reduced statically by testing a part after production to adjust the voltage.

This process still does not eliminate the need for margins to cope with dynamic

environmental variations. To exploit unused margins, but avoid aggressive voltage

scaling, the supply voltage has to be adjusted in a closed control loop. To obtain real-time feedback on the chip's operating condition, various classes of monitors have been proposed in order to measure the circuit path delay. Ring oscillators [79] or canary circuits [80, 81] aim to replicate the timing behaviour of the original circuit. Replica elements are non-intrusive and are well suited for monitoring global variations on the chip. However, they might fail to capture within-die variations which are local to the real circuits, such as random manufacturing variations and circuit aging. Another approach to overcome this drawback is the use of in-situ monitors, which aim to measure the timing slack directly linked to the circuit path delay. Since critical paths typically end at registers, the common approach is to use modified flip-flops as slack monitors.

Figure 4.2: Different In-situ delay monitor schematics


In the Razor approach [8, 82, 83], special latches or flip-flops are used to detect timing failures due to setup time violations, and the voltage adaptation is based on the monitored error frequency. Errors are corrected through a pipeline flush or an architectural replay by means of dedicated circuitry. The Razor concept, however, suffers from inherent problems, such as the large complexity of the necessary recovery mechanisms. Further problems are the lack of real-time capability and the minimum delay constraint, which is similar to the hold-time constraint in latch-based designs [82, 84]. The issues related to error detection schemes can be avoided by preferring a circuit failure prediction approach, i.e. predicting the occurrence of a failure during normal system operation before the appearance of any error in system data and states. In this approach, enhanced monitors are used to emit flags (pre-error warnings) when late but still non-erroneous data transitions occur [85]. The pre-error rate is an indicator of the circuit speed and is used for local-variation and aging-aware voltage adaptation. In previous works, monitors were inserted either at the end of critical paths [85, 86] or at intermediate nodes [87].

The work described in this chapter uses a new in-situ pre-error monitor [88] which achieves high monitoring performance with minimal overhead, timing impact and failure rate with respect to previous solutions [85-87]. Figure 4.2 shows the schematics of different in-situ monitoring solutions. At the top of the figure, a standard pre-error monitor solution inserted at the end of the critical path is shown. In the middle of Figure 4.2, the monitor is inserted at an intermediate net with gates having matching delay. Lastly, the bottom of the figure shows the monitoring mechanism used in this work for implementing an adaptive wearout management scheme.


4.3 Monitoring System – Design Perspectives

Figure 4.3: Pre-error monitor schematic

The monitor principle is to minimize both the timing impact on the monitored

critical path and the source of within-die variations. The generic monitor schematic

shown in Figure 4.3 can be straightforwardly compared to previously proposed

monitors [85-87]. The main idea here is to probe the incoming data right after the master latch [88] (and not before it, as in all previous approaches) in order to get rid of the additional load placed by the monitor on the already critical path. Through adequate sizing of the delay element, it is possible to minimize the impact on the flip-flop setup time. Figure 4.3 shows a detailed schematic of the real monitor implemented and utilized for the work described in this chapter. The delay element is composed of a minimum-size inverter and a multiplexer so as to allow a Test Mode (TM). The delayed signal is then passed through an additional latch, equivalent to the slave latch of the reference flip-flop. At this stage, it can be concluded that this approach is more efficient in terms of area overhead than other approaches [85-87] that use at minimum a duplicate of the whole reference flip-flop. In the implementation used in this work, a second latch has been introduced to act as an additional delay element, thus enabling two different monitor detection windows at the price of a larger monitor area overhead.


Figure 4.4: Timing diagram for Pre-error monitor

The timing diagram for the monitor is shown in Figure 4.4. From left to right,

the Data (D) is more and more delayed with respect to the clock (CLK) though the flip-

flop output Q remains correct. In the first configuration, no warning flags are emitted

since the remaining slack is large enough. For slightly reduced slack (intermediate

situation), only the Flag2 is raised (corresponding to the large timing window) while

Flag1 remains at zero. Finally, when the slack is further reduced, both Flag2 and Flag1

are raised.
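To make the relationship between the remaining slack and the two flags concrete, the following minimal Python sketch models the behaviour just described; the window sizes w1_ps and w2_ps are illustrative parameters, not values taken from the actual monitor design.

def monitor_flags(slack_ps, w1_ps=50.0, w2_ps=100.0):
    """Return (flag1, flag2) for a data arrival with the given slack.

    Flag2 fires when the transition falls inside the large detection window
    (slack < w2_ps); Flag1 fires only when it also falls inside the small
    window (slack < w1_ps). A negative slack would be an actual violation.
    """
    return slack_ps < w1_ps, slack_ps < w2_ps

if __name__ == "__main__":
    for slack in (150.0, 80.0, 30.0):   # large, intermediate, small slack
        flag1, flag2 = monitor_flags(slack)
        print(f"slack={slack:5.1f} ps -> Flag1={flag1}, Flag2={flag2}")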

To experimentally evaluate the proposed monitoring system, a Bose, Ray-Chaudhuri and Hocquenghem (BCH) [89] error correcting code IP core consisting of encoder and decoder modules has been implemented with in-situ monitors (ISM)

appended at the end of critical paths. The decoder module makes use of the following

algorithm:

1. Calculate the syndromes for the received frames

2. Determine the number of errors and their location using the Chien search

algorithm

3. Calculate the error values at those error locations and

4. Correct the errors.

Random frames and errors are generated using Pseudo Random Binary Sequence

(PRBS) modules to emulate a large range of patterns. The overall BCH IP is

implemented with an autotest module which raises an Autotest_OK flag when the

decoding process (including error localization and correction) proceeded without


failure. The BCH IP core was implemented in 28nm UTBB FDSOI technology using

standard design platform flow and tools. The IP consists of 8469 gates for 869

endpoints and 1720 critical paths.

Figure 4.5: Path delay histogram for BCH IP at different conditions of voltage &

process

While using in-situ monitors, the primary questions relate to where and how the monitors should be inserted. The monitor insertion first requires a placed and routed

reference design (i.e. without monitors). From that reference design, the path delay


histogram can be extracted under various Process, Voltage, Temperature and Aging

(PVTA) conditions as shown in Figure 4.5. Any circuit is designed with the intent to

work at a given operating clock period which is guaranteed in a generic flow under

worst-case conditions (i.e. including all margins). As a matter of fact, for the typical corner, there is some room for either a performance increase (i.e. frequency) or a power reduction (i.e. voltage reduction). In both cases, all paths whose worst-case delay falls in the flag detection window (paths highlighted in red in Figure 4.5) are classified as timing critical and require monitors. In Figure 4.5, the top graph shows the path delay histogram of the BCH IP at the typical corner and nominal voltage; the middle graph is plotted for minimum voltage (Vmin) and the typical corner; whereas the bottom graph is for minimum voltage (Vmin) and the worst-case corner. The size of the flag detection window affects the

total number of monitors and the monitoring area overhead is directly proportional to

the flag detection window. In this work, four versions of the BCH IP have been

implemented depending upon the number of monitors. The implementation results are

summarized in Table 4.1.

Table 4.1: Implementation results of BCH IP with in-situ monitors
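As a hedged illustration of the selection criterion described above (one plausible reading of the path-delay-histogram analysis, not the exact script used in this work), the following Python sketch filters a path-delay report to find the endpoints that would require a monitor; the clock period, window size and report contents are made up.

def select_monitored_endpoints(path_delays_ps, clock_period_ps, window_ps):
    """Return the endpoints classified as timing critical.

    path_delays_ps: mapping {endpoint_name: worst-case delay in ps}.
    A path is critical when its delay lies within 'window_ps' of the clock
    period, i.e. it could trigger a warning flag once the supply voltage
    (and hence the delay) is pushed towards the limit.
    """
    threshold = clock_period_ps - window_ps
    return [ep for ep, delay in path_delays_ps.items() if delay >= threshold]

if __name__ == "__main__":
    # Hypothetical endpoint names and delays, purely for illustration.
    report = {"u_dec/reg_12": 930.0, "u_dec/reg_40": 995.0, "u_enc/reg_3": 700.0}
    print(select_monitored_endpoints(report, clock_period_ps=1000.0, window_ps=100.0))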


4.4 Results and Analysis

This section firstly depicts the results of the silicon validation of the embedded

in-situ monitors. Thereafter an Adaptive Wearout Management strategy implemented

using the warning flags from the in-situ monitors is explained with the associated

results. Finally the power savings achieved via implementing the AWM control loop are

enumerated.

4.4.1 In-Situ Monitors

Figure 4.6: Experimental timing diagram showing pre-error flags behavior with

lowering voltage

All experimental results presented in this section are from the BCH1

implementation, except when stated differently. The overall BCH IP core is

implemented with an autotest module which raises an Autotest_OK flag when the

decoding process (including error localization and correction) proceeded without

failure. Overall, experimentally, the occurrences of the Autotest_OK flag and the warning flags (Flag1 and Flag2) under various PVTA conditions and PLL frequencies are the main monitored parameters, along with the leakage and dynamic power. Figure 4.6 shows the experimental timing diagrams observed on the Verigy 93K tester. For a given voltage value, only Flag2 (with the largest detection window) is raised, while the Autotest remains OK and Flag1 is not raised (top). If the voltage is lowered, both Flag1 and Flag2 are raised, which illustrates that the critical paths are close to failure (bottom). Yet the Autotest_OK flag, still raised here, indicates that the pattern executes without failure.

Figure 4.7: Experimental warning flag/BCH1 Autotest maximum frequencies for

various PVTA conditions

By varying PVTA conditions, the maximum frequency is searched for both the

Autotest_OK flag and the Flag1, Flag2 warning flags. For BCH Autotest, the maximum

frequency corresponds to the highest frequency for which the PRBS pattern runs

without failure under the given PVTA conditions. For warning flags, the maximum

frequency corresponds to the highest frequency for which the pattern runs without

triggering a warning flag. Both maximum frequency measurements can be compared in

a correlation plot as shown in Figure 4.7. The important element here is that warning flags are almost always emitted prior to the actual BCH IP core failure, thus acting as pre-error flags. The usage of in-situ aging monitors is also motivated by the possibility of correcting inline for aging-related drifts. As a matter of fact, adding such monitors into an IP should not introduce an additional failure rate. With the pre-error monitor approach, the risk is related to the presence of false flags, i.e. a warning flag occurring after the actual failure of the IP. By construction, the pre-error monitor provides some margin (the delay element), as illustrated in Figure 4.7 where, on average, warning flags are always emitted prior to the actual BCH IP failure. The warning flag detection window can be experimentally measured directly by comparing the warning flag and BCH Autotest maximum frequencies.


4.4.2 Adaptive Wearout Management

Figure 4.8: Adaptive Voltage Management Flowchart

In the proposed Adaptive Wearout Management (AWM) scheme, the supply

voltage is controlled according to the occurrence of in-situ monitor flags. The basic idea

is to reduce the voltage as long as no warning flags are detected over a given observation interval of Ncycles. Otherwise, in the presence of detected flags, the voltage is increased. The flowchart of the implemented voltage management scheme with a 1 mV regulation step is shown in Figure 4.8. From recent studies, increasing the voltage to provide additional margin has proven to be an efficient way [90-92] of managing margins with respect to reliability. By reducing the supply voltage, the delay distribution is shifted towards


longer timings, as shown in Figure 4.5. At some point, some timing paths will be lying

in the in-situ monitor detection window and as a consequence, they can trigger a

warning flag. For a fixed operating condition and for a long execution time, the patterns

running through an IP can be seen as randomly distributed. To emulate this situation

experimentally, patterns at the inputs of the BCH IP core are generated using Pseudo

Random Binary Sequence (PRBS) modules, as described earlier.
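A minimal Python sketch of this regulation loop, assuming a 1 mV step and an abstract flag-observation callback (the voltage bounds and the toy flag model below are illustrative, not silicon values), is given next.

def awm_step(vdd_mv, flags_seen, step_mv=1, vmin_mv=500, vmax_mv=1300):
    """One AWM iteration: lower the supply if no warning flag was observed
    during the Ncycles interval, raise it otherwise; clamp to safe bounds."""
    vdd_mv = vdd_mv + step_mv if flags_seen else vdd_mv - step_mv
    return min(max(vdd_mv, vmin_mv), vmax_mv)

def run_awm(initial_mv, observe_flags, n_intervals):
    """Iterate the loop over n_intervals observation windows of Ncycles each.
    'observe_flags(vdd_mv)' abstracts running the workload and reading flags."""
    vdd = initial_mv
    for _ in range(n_intervals):
        vdd = awm_step(vdd, observe_flags(vdd))
    return vdd

if __name__ == "__main__":
    # Toy model: flags start firing at or below 820 mV, so the loop settles there.
    print(run_awm(1000, lambda v: v <= 820, n_intervals=300))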

Figure 4.9: Adaptive Wearout Management (A) without considering aging effects and (B) considering aging effects

Using this simplified scheme, several AWM control loop configurations have

been explored i.e. with different nominal voltage & frequency as well as different

regulation step sizes. At first, a typical, fresh (i.e. no aging) part is observed and the


initial voltage is the one resulting from the overall voltage margin stacks as described in

Figure 4.9(A). After some use time, as a result of the AWM control loop, the voltage

reaches a steady state voltage value which is close to the minimum voltage, while still

executing the instructions without failure. This steady-state value is defined by the in-

situ monitor (ISM) delay margin, as introduced by design. The steady-state voltage

values are distributed around an average value which is observed to be independent of

the voltage regulation step. Following an electrical stress, all the degradation modes will

induce a speed slowdown. This is illustrated in Figure 4.9(B) where the minimum

voltage to sustain a given frequency after aging is increased compared to the fresh

situation. This illustrates that the AWM control loop is still efficient after aging to adapt

the voltage.

Figure 4.10: Detection window evolution with stress time

As stated in [93], the impact of electrical aging degradation in digital circuits

should not only consider the average degradation but also the induced additional

variability. To evaluate this impact, the Flag1 detection window degradation has been

monitored along the stress, considering both the average value and its standard

deviation as shown in Figure 4.10. It is worth noticing that its standard deviation is also

increasing with stress time as a result of the degradation-induced variability in the

monitor.


4.4.3 Power Savings

Figure 4.11: Monitoring system failure rate and power reduction for various

DVFS operating conditions

Figure 4.12: Total power savings with respect to static voltage margin at different

temperatures

As a result of the AWM control loop, fresh (i.e. unaged) parts can operate at a much lower voltage, yielding power savings with respect to a conventional static voltage margin. The experimental power savings under worst-case conditions (i.e. higher temperature) have been measured to be 36% for static power (i.e. leakage) and 25% for dynamic power over the 0.5-2.0 GHz frequency range for the Flag1 detection window, as shown in Figure 4.11. Over this large frequency range, the worst-case supply voltage dynamically follows the frequency change through a DVFS approach. In all these operating conditions, it is worth noticing that the monitoring system has a negligible failure rate. The total power savings are reduced at lower temperatures for a given monitor layout, as shown in Figure 4.12. This is due to the fact that, at lower temperatures, the relative impact of the leakage power is reduced from 20% at 125°C to 5% at -40°C.

4.5 Summary

This chapter demonstrated the fundamental elements of an Adaptive Wearout Management (AWM) control loop for the dynamic supply voltage regulation of digital circuits. Firstly, a new generation of in-situ monitors which generate warning flags to prevent circuit failure under nominal conditions has been explained. The adaptive voltage management control loop is effective in tackling all voltage margins on fresh parts (i.e. without aging) over a large range of DVFS operating conditions and temperatures. In spite of the performance degradation after an electrical stress, the AWM control loop remains able to maintain fault-free instruction execution with consequent power savings over the whole lifetime. Overall, up to 40% static and 25% dynamic power reduction has been demonstrated over a large range of DVFS operating conditions while maintaining fault-free instruction execution.


5 THERMAL

CHARACTERIZATION USING

DIGITAL SENSORS

“If you want to shine like the Sun,

first burn like the Sun.”

A. P. J. Abdul Kalam

Abstract: Post-silicon thermal validation of fabricated chips is necessary to ensure safety, reliability and conformity to specifications. Because of the enormous complexity of modern designs, it becomes mandatory to accurately characterize thermal behaviour during the technology development phase itself. In this context, this chapter presents a test methodology for thermal characterization under dynamic workloads using embedded digital sensors. Firstly, the design of the test-chip used is explained, and thereafter the test patterns and test methodology employed are elaborated upon. In particular, the methodology required for managing the high volume of data generated during such tests is elaborated upon. Analyses of the on-chip thermal results provide useful insights for test-chip design requirements related to thermal measurements.


5.1 Introduction

In recent years, elevated on-chip temperatures have become a major concern in the design and operation of computing devices. Magnitudes of on-die temperatures are determined by the spatial and temporal allocation of power consumption and the design of the heat removal system. Elevated temperatures directly impact all key circuit metrics, including lifetime and reliability, speed, power, and cost. Hot spots reduce the Mean

Time to Failure (MTF) as most failure mechanisms (e.g., electromigration, time

dependent dielectric breakdown, and negative bias temperature instability) have strong

temperature dependencies [94]. Furthermore, different thermal expansion coefficients of

chip materials cause mechanical stresses that can eventually crack the chip/package

interface [95]. Elevated temperatures also slow down devices and interconnects leading

to timing failures [96, 97]. The exponential dependency of leakage power on

temperature further increases total power and could lead to thermal runaway [98]. Due

to the non-ideal scaling of power consumption and costs of heat removal systems,

temperature has become a major hurdle towards improving the performance of high-end

many-core processors [99]. Moreover due to the enormous complexities and sheer

number of modelling parameters of state-of-the-art designs, pre-silicon power and

thermal models cannot be trusted blindly. It is necessary to complement pre-silicon

analysis with post-silicon thermal and power characterization on the fabricated devices,

and then to use the characterization results to improve the design during re-spins before

ramp-up and production. In this context, the work of this chapter describes a test

methodology for accurate thermal characterization under dynamic workloads using

embedded on-chip thermal sensors.

5.2 Test-chip Design Overview

The test-chip used for thermal characterization under dynamic workloads is

fabricated using the 28FDSOI process technology. From the viewpoint of performing

Self-Heating measurements, the test-chip embeds within itself a PLL for high frequency


generation, various versions of the BCH IP (explained in Chapter 4) which is a

representative digital block used to emulate dynamic workloads, and a Thermal Sensor.

Figure 5.1: Floor plan and overview of test-chip used for thermal characterization

Figure 5.1 shows the floor plan & overview of the test-chip used for thermal

characterization. The main elements of the test-chip used for performing self-heating

measurements are briefly described hereunder:

· BCH IP: Six instances of the BCH (Bose, Ray-Chaudhuri and Hocquenghem)

error correcting IP have been included in the test-chip. From the point of view of

performing self-heating measurements, the BCH is a representative digital block

consisting of 8469 gates wherein activity can be triggered at high frequency.

Additionally an Autotest mode exists which triggers activity by generating

frames using a pseudo-random generator.

· PLL: An on-chip charge pump Phase Locked Loop (PLL) is used to clock the

BCH IP at high frequencies. The PLL is suitable to produce output frequencies

in the GHz range, above the maximum operational frequency limit of the BCH

IP.

· THS: The thermal sensor embedded within the test-chip has a design range for

temperature measurement from -40°C to 125°C with a resolution of 1°C. The


measured temperature is computed by reading a digital code on the top level

pins of the test-chip.

Various blocks are muxed together within the test-chip to overcome the constraint

of having a limited number of IO pads for multiple blocks. Hence, the same top-level

input pins are used for activating different functions on different blocks. Similarly

the outputs of these different blocks are mapped to overlapping output test pins.

Also the power supplies for the BCH, PLL & THS are muxed together on a single

tester channel due to hardware limitations. These design constraints introduce

additional complexity in the pattern generation and output data observation.

5.3 Test Patterns

Two distinct patterns and their associated test methods have been developed to

understand the effects of static & dynamic heat. Static heat refers to the temperature measured using the sensor when no activity is triggered on the other blocks within the test-chip, while dynamic heat refers to the measured temperature when activity is induced on the neighboring blocks of the test-chip.

· Pattern for Static Self Heating Measurements

Figure 5.2: Pattern for Static Self Heating Measurements

Figure 5.2 depicts the outline of the pattern used to perform static self-heating

measurements. Measurements are made and the sensor reading is recorded at multiple

voltages with all other blocks within the test-chip having no activity induced on them.

These measurements help gauge the impact of switching activity on heat measurements.

· Pattern for Dynamic Self Heating Measurements

Figure 5.3: Pattern for Dynamic Self Heating Measurements


Figure 5.3 depicts the outline of the pattern used to perform dynamic self-heating

measurements. The different parts of the pattern are explained hereunder:

1. Initially the THS is activated and a single measurement is performed to acquire

the value of static temperature without the PLL or BCH being activated.

2. In the second step the PLL inputs are set to produce the desired frequency. At

this stage of the pattern, the input configuration of the PLL is latched and

thereafter any change on the input pins of the PLL block has no effect on the

PLL output frequency as long as the input clock is stable & well functional.

Once the PLL Output frequency has stabilized, a status signal goes high. A

digital Capture is performed to ascertain the state of the PLL status signal in

order to ensure that the BCH blocks thereafter are clocked at the correct

frequency.

3. In the third stage, the THS is once again activated and a single measurement is

acquired to see if there is any effect on temperature due to PLL activation.

4. In the fourth stage, the required BCH blocks are activated and a single digital

capture is performed to check the Autotest status and verify BCH functionality

at the set voltage & frequency.

5. In the fifth stage, the THS is activated with the BCH still running at the set PLL

frequency with the outputs configured to observe the Thermal sensor data. This

stage of the pattern is kept in a loop so that the duration for performing dynamic

self-heating measurements can be easily configured by setting the loop counter

for this stage.

6. In the sixth stage, a simple BCH Autotest check is performed with the value

being acquired via a single digital capture to check the BCH functionality after

Self-Heating measurements have been completed.

7. In the final stage, the PLL functionality is once again verified via a single digital

capture on the PLL status signal to ensure that the PLL was well functional

while the dynamic self-heating measurements were in progress.
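The sequence of these seven stages can be summarised by the following tool-agnostic Python sketch; the helper callbacks (read_thermal_sensor, configure_pll, check_pll_lock, run_bch_autotest, activate_bch) are hypothetical stand-ins for the tester primitives and are not SmarTest API calls.

def dynamic_self_heating_run(target_freq_hz, n_thermal_reads,
                             read_thermal_sensor, configure_pll,
                             check_pll_lock, run_bch_autotest, activate_bch):
    """Run the dynamic self-heating pattern, following stages 1-7 above."""
    results = {}
    results["static_temp"] = read_thermal_sensor()              # stage 1
    configure_pll(target_freq_hz)                               # stage 2
    assert check_pll_lock(), "PLL status signal not high"
    results["temp_after_pll"] = read_thermal_sensor()           # stage 3
    activate_bch()                                               # stage 4
    assert run_bch_autotest(), "BCH Autotest failed at set V/F"
    results["dynamic_temps"] = [read_thermal_sensor()           # stage 5 (looped)
                                for _ in range(n_thermal_reads)]
    assert run_bch_autotest(), "BCH Autotest failed after self-heating"   # stage 6
    assert check_pll_lock(), "PLL not functional during measurement"      # stage 7
    return results

if __name__ == "__main__":
    # Trivial mocks, only to show the calling convention of the sketch.
    print(dynamic_self_heating_run(
        1.0e9, 3,
        read_thermal_sensor=lambda: 27,
        configure_pll=lambda f: None,
        check_pll_lock=lambda: True,
        run_bch_autotest=lambda: True,
        activate_bch=lambda: None))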

An important aspect for patterns which activate multiple blocks as in this case is that the

PLL & THS need to be simultaneously provided with different clock frequencies using

a single spec. This is managed at a binary level within the pattern editor and the

resulting clock waveforms are shown in Figure 5.4.


Figure 5.4: PLL and THS Clock Waveforms

5.4 Selective Digital Capture Methodology

Figure 5.5: Memory management during Digital Capture [100]

For effective thermal characterization, the test methodology must be capable of

acquiring large amounts of digital output data of a device at a high data rate and storing

this data for later analysis and processing. This section briefly explains the usage of the

SmarTest [100] digital capture test methodology for our particular test case. For digital

capture, as the size of the Sequencer Memory (SM) is limited as compared to the Vector

Memory (VM), it is essential that the pattern be placed in the SM so that the captured data can then be stored in the VM, as shown in Figure 5.5. During execution, the sequencer

reads the pattern from the SM and stores the captured data in the VM.

In order to read the values of the THS, it is necessary to perform periodic

digital captures. HP SmarTest provides two distinct modes for digital capture namely

the Sequential mode and the Selective mode. The default mode for Digital Capture is


the Sequential mode wherein the Tester captures all output samples on all output pins

for every Tester cycle of the pattern length. The Sequential mode is not suitable for

capturing data in this case as capturing starts with the first vector and ends with the last

vector of the main pattern. This causes a memory overflow by capturing output data

which is not used for further processing. In such a scenario, SmarTest returns an

available capture memory overflow message as shown in Figure 5.6.

Figure 5.6: SmarTest Memory Overflow Message

In order to overcome this memory limitation, it is essential to employ the Selective Capture mode, wherein capturing is only carried out at vectors whose device cycles contain designated capture edge actions. Using the Selective mode, capturing of undesired data can be completely suppressed, i.e. only specific vectors within a pattern may be selected for capture. Selective Capture mode is automatically activated when the pattern used for digital capture incorporates two specific edge actions in the wavetable to define when capturing will and will not be performed:

· Action C → Capture data on this edge.

· Action D → Do not capture data on this edge.

The aspects that need to be kept in mind when defining a selective capture are listed

below:

1. The number of selective capture edges must be the same for all selective capture

waveforms in one wave table. Within a capture pattern (label) selective capture

actions need to be on the same receive edges across all waveforms of the

according wave table.

2. Each waveform within a capture pattern (label) must contain either "C" or "D"

actions consistently over all used capture receive edges. A waveform that

contains a "C" or "D" action must not contain any other compare action.

3. The edge that is used for selective capturing must be set explicitly to "D" in all

waveforms of the wavetable where it is not to be captured. This also applies to

the waveforms for the default device cycle.

4. The "C" action will perform a 'compare to 0' to determine the capture value.


5. In the same wavetable C and D actions cannot be mixed with other receive

actions.

6. C or D actions cannot be mixed with other receive actions within one waveform.

7. The selective capture actions C and D cannot be mixed within one selective

capture waveform.

8. If the number of selective capture edges is n, the edges r1 to rn must be defined

in each selective capture waveform in one wavetable.

9. The selective capture actions C & D must be placed only on the receive edges of

Output Pins. On the same receive edges in the input pins, the action “N” may be

specified.

Figure 5.7 shows the Selective Capture wavetable employed for the pattern using

which dynamic self-heating measurements are done.

Figure 5.7: Selective Digital Capture Wavetable

Further, the patterns that contain digital capture actions ('capture' and 'do not

capture') must be associated with test methods, which do not execute functional tests.


The Sequential/Selective Capture mode cannot be selected in the capture

settings GUI of the Pattern Editor. The active mode is determined by the selected

capture pattern. If the vector label uses a wavetable that contains selective capture

actions (C & D), the Selective mode is activated.

In order to retrieve the captured data after a Digital Capture Test has been

performed, vector variables must be used which allow us to define an arbitrary subset of

the captured data. By defining several vector variables, it is possible to retrieve different

views of the same captured data. It is important to note here that the vector variables

along with their configuration parameters (no_of_frames, vector_interdiscard etc.) only

define how to extract (filter) the data of interest from the captured data. Further, vector

variables are always related to one specific pattern and cannot be used with another one.

Figure 5.8 shows how a vector variable and associated configuration parameters may be

used to retrieve a subset of the captured data.

Figure 5.8: Example of a Digital Capture Variable with associated parameters

[100]
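As a tool-agnostic illustration of what such a vector variable does conceptually, the short Python sketch below extracts a subset of a raw capture buffer using frame-oriented parameters; the parameter names echo those mentioned above (no_of_frames, vector_interdiscard), but the function itself is not part of the SmarTest API.

def extract_frames(captured, vectors_per_frame, no_of_frames,
                   vector_start=0, vector_interdiscard=0):
    """Return 'no_of_frames' frames of 'vectors_per_frame' samples each,
    starting at 'vector_start' and skipping 'vector_interdiscard' samples
    between consecutive frames of the captured data."""
    frames, idx = [], vector_start
    stride = vectors_per_frame + vector_interdiscard
    for _ in range(no_of_frames):
        frames.append(captured[idx:idx + vectors_per_frame])
        idx += stride
    return frames

if __name__ == "__main__":
    raw = list(range(20))                  # stand-in for the capture memory
    print(extract_frames(raw, vectors_per_frame=3, no_of_frames=4,
                         vector_interdiscard=2))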


5.5 Results and Analysis

This section firstly depicts the thermal measurement results and thereafter

provides future directions related to test-chip design for efficiently characterizing self-

heating under dynamic workloads.

5.5.1 Results Summary

Figure 5.9: Thermal Characterization results for Static & Dynamic conditions

Figure 5.9 shows the thermal characterization results for static & dynamic

conditions in the voltage range of 0.9V – 2.0V wherein dynamic measurements are

made at each voltage point for 50 seconds. In Figure 5.9, the first 50 seconds

correspond to 0.9V, the time from 51-100 seconds corresponds to a voltage setting of

1V, and so on. The Thermal Sensor displays good functionality across the entire operating temperature range; however, no apparent effect of dynamic activity is observed in the THS output. From the results it can be reliably concluded that the observed rise in temperature is entirely due to the higher voltage applied to the Thermal Sensor itself, as all supplies for the PLL, BCH & THS are clubbed together on a single channel within the probe-card, and not due to any dynamic activity on the BCH blocks.

5.5.2 Analysis and Future Directions

This sub-section elaborates on the possible reasons for the relatively constant

temperature across static and dynamic modes and the directions for future investigation.


1. The tests are performed at Wafer Level where there is a high possibility of heat

dissipation. Possible reasons could include thick substrate and no spatial

isolation. Future work should involve performing tests on packaged dies.

Note: No packaged dies of the test-chip under consideration were available for

this work.

2. The resolution of the THS is 1°C, which is not sufficient to observe self-heating at the wafer level with the required granularity. Embedded sensors with higher resolution should be considered for similar characterization work.

3. The BCH blocks on which activity is induced occupy too small an area within the test-chip to induce any appreciable self-heating effects. Also, the thermal

sensor is located at the corner of the test-chip as shown in the floor plan of

Figure 5.1. In the future, the thermal sensor should be ideally located within the

chip such that it is spatially surrounded by larger activity inducing blocks.

5.6 Summary

This chapter presented a test methodology for thermal characterization using

an embedded digital sensor. Firstly the test pattern structure for self-heating

measurements of the test-chip under consideration is given. Thereafter the methodology

focusses on efficient memory management during silicon testing. Although the

measured results were unable to capture the impact of dynamic activity on self-heating,

the developed methodology is generic in nature and can be utilized for different

applications which deal with memory intensive measurements. Finally the directions for

future work are elaborated upon for accurately capturing the impact of switching

activity on chip temperature.


6 SIMULATION AND SILICON

VERIFICATION OF ADAPTIVE

VOLTAGE SCALING

“If you torture data long enough,

it will confess to anything you like.”

R. H. Coase

Abstract: Process variation causes measurable and predictable variance in the output performance of all chips. Adaptive Voltage Scaling (AVS) is now a key power management feature which is applied by reducing the voltage of fast silicon, to a point where speed target (Fmax) is met, thereby reducing power consumption. This chapter presents the full silicon level electrical characterization correlation of the simulation done using a novel method based upon existing CAD tools & a mixed mode approach, enabling complete Process-Voltage-Temperature (PVT) characterization on the portion of the design for which AVS is defined. The observed results on silicon for the metrics of performance & power display a close correlation with the ones obtained via the simulation flow, thus providing designers with deeper insights into the gains with AVS from within the simulation.


6.1 Introduction

Modern CMOS processes vary in performance from die to die and wafer to

wafer. Process variation causes measurable and predictable variance in the output

performance of all devices but becomes particularly important at smaller process nodes

[28, 29, 58-61]. This phenomenon occurs on all semiconductor devices from all

manufacturers and is more relevant for smaller semiconductor technology nodes or

geometries. Different process corners represent the variation in attributes of integrated

transistors when a semiconductor device is fabricated. Process corners can be loosely

classified into the three following categories:

· Slow (SS): A device fabricated with the slow process corner provides the lowest

acceptable frequency performance allowed at nominal voltages. Slow devices do

not have the potential to run at higher frequencies beyond the specified nominal

frequency. Slow devices typically consume less leakage power than others

devices in the population distribution.

· Typical (TT): A typical device operates at the midpoint or average frequency

performance across the population of devices at nominal voltages. Typical

devices have average performance when compared to slow and fast devices.

· Fast (FF): A fast device provides the highest performance across the population

of devices. Fast devices have the potential to run at faster frequencies than are

required at nominal voltages. Fast devices can operate at the minimum specified

frequency at voltages lower than nominal voltage values. However, fast devices

exhibit more leakage power consumption than slow and typical devices,

especially during high temperature conditions.

However, the exact extent of variation is not always known prior to fabrication, since designers are provided with a worst-case set of parameters to use for timing closure simulation, with an operating voltage that takes into account the slowest silicon.

With lower power consumption being a major concern for system designers, Adaptive Voltage Scaling (AVS) [72, 73] is now a key power

management feature that enables power reduction through voltage reduction. AVS is the

adaptation or modification of the supply voltage for a chip depending upon its process

corner. In the most basic form, Dynamic Power consumption for digital circuits can be

considered as P = CV²F where C is the switching capacitance, V is the supply voltage,


and F is the clock frequency. The leakage power is also voltage dependent. Therefore

significant power savings can be made through AVS which delivers accurate voltage to

each individual chip across temperature, process corner, and frequency variations.
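A small worked illustration of this relation, with made-up capacitance, voltage and frequency values (not measured APIP numbers), shows how an AVS voltage reduction translates into dynamic power savings at iso-frequency:

def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic switching power P = C * V^2 * F (in watts)."""
    return c_farads * v_volts ** 2 * f_hertz

if __name__ == "__main__":
    c, f = 1e-9, 1.0e9                      # 1 nF switched capacitance, 1 GHz
    p_nominal = dynamic_power(c, 1.00, f)   # nominal supply voltage
    p_scaled = dynamic_power(c, 0.85, f)    # fast-corner part scaled down by AVS
    saving = 1 - p_scaled / p_nominal
    print(f"nominal {p_nominal:.3f} W, scaled {p_scaled:.3f} W, "
          f"saving {saving:.0%}")           # ~28% from a 15% voltage reduction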

Figure 6.1: Basic representation of power savings through adaptive voltage scaling

AVS is applied by reducing the voltage of a chip coming from a fast corner to a point

where the speed target (Fmax) is met, thereby reducing power consumption, as shown in

Figure 6.1. Because of process variations, the supply voltage of each device can be

adjusted to minimize power while still achieving desired performance. In order to

evaluate power gain for a digital circuit using AVS, it is necessary to evaluate Speed

and Power at multiple PVT points which are in fact different from classic design

development points.

By accurately setting the supply voltage for the actual requirements of each

chip, large power savings can be realized for chips coming from a fast corner. Inversely,

some failing chips below the minimum frequency (Fmin) coming from a slow process

corner can be made to pass by increasing the voltage with the associated penalty of

higher power consumption. It is therefore necessary to evaluate, through electrical simulation, the design performance under process and voltage scaling conditions. However, it has not been feasible to perform electrical simulation of a full System on Chip (SoC) with sufficient accuracy in a reasonable time. The work presented in this chapter addresses this infeasibility and proposes a new method that overcomes the major constraints relating to simulation time and accuracy, thereby paving the way towards accurately simulating the electrical performance parameters of complex MPSoC designs. A novel method based upon existing CAD tools and a mixed-mode approach is presented, enabling complete PVT characterization of the portion of the design for which AVS is defined. Furthermore, the full silicon-level electrical characterization correlation of the simulation is presented.

6.2 Reference Design

The reference design used for the development of this work is the ‘Area &

Power IP’ (APIP) block which is designed and fabricated in the 28nm technology to

address adaptive voltage scaling. The APIP is a representative design containing a

control and calculation unit with the maximum frequency being greater than 1Ghz and a

voltage range of 0.5V - 1.3V being inline with end product requirements. It is a

programmable power consumption block with the basic elements being 16-bit adders,

subtracters & multipliers. Its design consists of approximately 400k transistors.

Performance characterization of the APIP block enables the understanding and provides

a deeper insight into cutoff frequencies and power supply saving dependencies versus

process variations. This circuit is also useful for leakage and dynamic currents

monitoring on a small design block.

The basic top level block diagram of the APIP is shown in Figure 6.2 and the

main elements are explained thereafter:

Figure 6.2: APIP Basic Representation


Figure 6.3: Grouppmm block diagram

· Pace Counter: Power consumption is controlled via a programmable counter which allows data to be generated every N cycles (controllable).

· Generator: Generates the signals on the basis of the selected toggling mode, i.e. ramp-up, ramp-down, etc. Alternatively, a specific 16-bit external vector can also be loaded.

· Grouppmm: This module takes the value generated by the generator and applies two transformations based on addition, subtraction and multiplication, as shown in Figure 6.3.

The APIP can be considered to be well functional when the following two conditions

are satisfied:

1. The HBM, i.e. the heart beat monitor, generates a low frequency, indicating the functional status of the IP. Generated frequency = Input Clock Frequency / 32.

2. Flag_reg remains high unless a mult_flag becomes active or Reference !=

Processed.

In order to enable the APIP to be clocked at high frequency, it is interfaced with an on-

chip low area frequency generator (FreqGen) as shown in Figure 6.4.


Figure 6.4: FREQGEN & APIP Interfacing

The FreqGen generates a high frequency clock signal with programmable

frequency and a constant (~ 0.5) duty ratio. The block is used to provide a high

frequency square wave pulse to the APIP. The FreqGen contains a 16 step

programmable ring oscillator & each fundamental frequency can be further divided by 8

division factors. When activated, the frequency available on FreqOut is directly applied

to the APIP. FreqOut/16 is available on the Output Pad of the Test-chip.

Since the FreqGen has 16 fundamental frequencies which may be further

divided, it introduces the limitation that only discrete frequencies can be produced in the

targeted test window. Let FFi, FFi+1, FFi+2, ……. FFn be the range of discrete

fundamental frequencies that can be produced using the FreqGen and DF the division

factor,

It can be clearly seen from the above equation that as the division factor (DF)

decreases and as we move towards higher frequencies, the interval between two discrete

frequencies increases. This introduces a maximum error of approximately 5% in the

final silicon characterization which is computed based upon the difference between the

last passing frequency and the first failing frequency. This needs to be kept in mind

while analyzing the results.

The tradeoff in using a ring oscillator based macro versus a PLL for high

frequency functional characterization lies in the relatively lower on-chip area required

for the FreqGen and the limited discrete frequencies that can be produced.


6.3 AVS Simulation Methodology

Figure 6.5: Computation effort for full PVT simulation using Fast Spice

The standard method for full PVT electrical simulation is to load a design data

base along with spice netlist for digital library into a fast spice simulator such as

CustomSim. For AVS simulation, accuracy with less than 2% error or 10mV resolution

is required. This implies fixing maximum accuracy mode and a huge number of PVT

points i.e. 156 PVT configurations, leading to 1560 simulations if we take into account

3 process corners, 13 voltage points, 4 unique temperatures and 10 distinct frequencies

as shown in Figure 6.5. This methodology then intrinsically limits the design size and

the testbench size to small and unrepresentative test cases as the design requires

CustomSim precision of 5 and a simulation run time of 20 hours, leading to a complete

simulation time of 130 days, which is unfeasible and unscalable to large circuits.

The approach utilized for simulation exploits the capabilities of VCS AMS

[74] to address the design size and then applies new simulation techniques to

simultaneously measure Fmax and power within the same simulation run. An additional

step optimizing the Fmax search has also been developed. While mixed-mode simulation for mixed analog and digital designs is well established, solutions for fully digital circuits are generally not required. For this work, the entire circuit was first simulated using a pure Spice simulator with an external script to retrieve the results. This made it possible to evaluate the accuracy of the mixed-mode approach as well as the gain in runtime. In the

particular case of AVS, only a portion of the design requires accurate Fmax simulation,

i.e. the modules containing critical paths. We can take advantage of this fact through

mixed mode simulation where the AVS block is simulated using fast spice and the

remainder of the circuit and testbench is simulated using standard functional simulation


i.e. digital simulation using VCS. Furthermore, most digital systems require a

setup/initialization phase followed by a run-mode/calculation phase, and now that a

mixed-mode simulation is established, all setup and data extraction portions of the simulation can be performed using the faster functional simulator. This approach

leads to a gain of ~7.2X in the total simulation time and is summarized in Figure 6.6.

Figure 6.6: Simulation flows based on CustomSim & VCS AMS

A pass/fail method to locate the Fmax for each given operating (PVT) point has been developed. This entails running multiple simulations using a standard dichotomy method to converge to the Fmax for that point. To facilitate this, simulation features enabling feedback from the functional output to adjust variables for the next simulation are used (the $snps_get_volt command in the digital environment allows analog values to be retrieved). This method can be further enhanced by running multiple simulations in parallel to increase the convergence speed, which enables a rapid convergence to the final functional simulation reporting a Vmin for a target frequency.
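The dichotomy itself reduces to a plain binary search on frequency given a pass/fail oracle; the following Python sketch shows the idea, with the oracle, bounds and resolution being illustrative stand-ins for an actual mixed-mode simulation run.

def find_fmax(passes, f_lo_hz, f_hi_hz, resolution_hz):
    """Binary search: f_lo_hz must pass and f_hi_hz must fail on entry;
    returns the highest passing frequency within 'resolution_hz'."""
    while f_hi_hz - f_lo_hz > resolution_hz:
        f_mid = 0.5 * (f_lo_hz + f_hi_hz)
        if passes(f_mid):
            f_lo_hz = f_mid      # still functional: push higher
        else:
            f_hi_hz = f_mid      # failing: back off
    return f_lo_hz

if __name__ == "__main__":
    true_fmax = 1.23e9           # toy oracle standing in for a simulation run
    fmax = find_fmax(lambda f: f <= true_fmax, 0.5e9, 2.0e9, 1e6)
    print(f"Fmax ~= {fmax / 1e9:.3f} GHz")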

Performance in simulation run times has also been improved using VCS-AMS

through the enablement of direct dynamic power and leakage measurements during the

Fmax search simulation runs. This is an improvement over a more classical approach of

running power simulations at a fixed set of predefined PVT points. Power Measurement

uses analog assertion commands which automatically create the control files required by

CustomSim to post-process the output files. Further simulation time reduction was

achieved while including power variation in the whole test flow, avoiding every

89

preliminary setup phase prior to Fmax search as shown in Figure 6.6. This lead to an

additional gain of ~2X in simulation time, thus leading to a global gain of ~15X in

simulation time. Power variation has been done by changing standard power supply

defined in Spice to a Real number defined directly in the testbench, enabling on the fly

modification of supply voltage during simulation runs. Further gains have been

achieved through a judicious use of simulation features enabling the direct reporting of

the final results.

This is achieved through Fast Spice options which directly extract current

measurements and provide them to the testbench environment, which in turn stores them in output files. Here, instead of manually extracting the average current, we directly generate a ".meas" command in a text file with time values obtained from the testbench itself & then automatically launch Custom WaveView to extract the average current. A

summary of the novel simulation flow enabled using Synopsys tools and the resulting

runtime improvements achieved are diagrammatically shown in Figure 6.7.

Figure 6.7: Simulation methodology overview and resulting Run-time

improvement


6.4 Silicon Characterization Methodology

Two distinct and conceptually simple methodologies namely the Fmax &

Vmin searches have been employed for characterization of the APIP block.

· Fmax Search: The voltage is set to distinct values in the design range and the

maximum frequency at which the APIP block is functional is logged.

The following steps are performed for an Fmax Search:

1. The frequencies produced by the FreqGen are precharacterized using a high

division factor i.e. the fundamental frequencies generated by the 16 distinct

steps of the ring oscillator are logged. Using high division factors is required since a maximum frequency of approximately only 35 MHz can be reliably observed and recorded on the output pins of the tester. Therefore, the maximum undivided frequency which can be reliably observed can be calculated as:

Max Undivided Freq / Output pin divider value = 35 MHz

⇒ Max Undivided Freq = 35 x 16 = 560 MHz

Hence it is not possible to observe the entire range of fundamental

frequencies of the FreqGen on the output without using division factors. This

necessitates the pre-characterization of the FreqGen using an internal

division factor in order to overcome the inability to observe in real time the

frequency produced by using low division factors.

2. The desired voltage level along with the fundamental frequency and division

factor of the FreqGen are setup for test.

3. A functional test is performed to check the status of the Flag_reg signal

(Figure 6.2) and the frequency of the HBM signal is measured thereafter.

4. The Pass/Fail results at the particular test point under consideration are

logged.

5. Steps 2 – 4 are performed for the entire range of voltage points, fundamental frequencies and division factors.

6. The values of maximum frequency, static current and dynamic current for

each voltage point are determined from the logged results.

The flowchart for the Fmax search is shown in Figure 6.8.


Figure 6.8: Flowchart for APIP Fmax characterization
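Steps 1-6 above can be condensed into the following Python sketch; freqgen_table_hz stands for the pre-characterized FreqGen frequencies of step 1 and apip_passes(v, f) abstracts the functional check of steps 3-4, both of which are hypothetical stand-ins rather than tester API calls.

def fmax_search(voltages_v, freqgen_table_hz, apip_passes):
    """For each voltage, return the highest discrete FreqGen frequency at
    which the APIP block is still functional (None if none passes)."""
    results = {}
    for v in voltages_v:
        passing = [f for f in sorted(freqgen_table_hz) if apip_passes(v, f)]
        results[v] = passing[-1] if passing else None
    return results

if __name__ == "__main__":
    table = [0.6e9, 0.8e9, 1.0e9, 1.2e9, 1.4e9]      # toy discrete frequencies
    # Toy pass/fail model: achievable frequency grows roughly linearly with voltage.
    model = lambda v, f: f <= (v - 0.4) * 2.0e9
    print(fmax_search([0.7, 0.9, 1.1], table, model))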


· Vmin Search: The frequency is set using distinct FreqGen configurations and the

minimum voltage at which the APIP block is functional is logged.

The following steps are performed for a Vmin Search:

1. The frequencies produced by the FreqGen are precharacterized using a high division factor to overcome the hardware limitation of the tester.

2. The desired fundamental frequency and division factor of the FreqGen along

with the voltage level are setup for test.

3. A functional test is performed to check the status of the Flag_reg signal and

the frequency of the HBM signal (Figure 6.2) is measured thereafter.

4. The Pass/Fail results at the particular test point under consideration are

logged.

5. Steps 2 – 4 are performed for the entire range of voltage points, fundamental frequencies and division factors.

6. The values of maximum frequency, static current and dynamic current for

each voltage point are determined from the logged results.

The flowchart for the Vmin search is shown in Figure 6.9.

The intent behind developing two distinct test methodologies was to verify the stability and reproducibility of the results achieved when the APIP is clocked with the ring oscillator based FreqGen. The idea was also to analyse the impact of the discrete fundamental frequencies on the precision of the result, since a higher degree of granularity is available when doing a Vmin search at the available discrete frequencies. It was observed that the characterization results from these two methodologies displayed exact correlation. However, since the test time for a Vmin search is much larger than for an Fmax search with no visible benefit in accuracy, the Fmax search method was used for the full-wafer characterization at different PVT points.


Figure 6.9: Flowchart for APIP Vmin characterization


6.5 Results & Analysis

In terms of analyzing the simulation vs. silicon correlation, the metrics of interest are the maximum frequency, leakage current and dynamic current at a particular voltage. If there is a good correspondence on these metrics, one can reliably predict the power consumption, and thereby the power savings through AVS, from within the simulation environment. Silicon characterization tests were performed across corners at multiple temperature points. It is observed that the simulation data is very well aligned with the silicon results for the Fmax and leakage values, as seen in Figures 6.10 and 6.11. This information is very useful in understanding the performance measures as well as the contribution of leakage power to the overall power dissipation, particularly at smaller technology nodes, from within the simulation environment. For comparing dynamic current and understanding the accuracy of simulation, the metric Dynamic_Power/MHz is considered. Once again a fairly accurate correlation was observed between the simulation data and the silicon results across corners and temperature points, as seen in Figure 6.12.

from the simulation results. The silicon results used for correlation with the simulation

data herein correspond to a single die and the data has been normalized for presentation

herein. Figures 6.10, 6.11 & 6.12 show the normalized simulation v/s silicon

comparison for frequency, leakage current and Dynamic_Power/Mhz respectively for

the fast & slow corners. Figure 6.13 depicts the normalized values of voltage scaling

that can be achieved at 30°C on silicon while maintaing the same level of performance

for chips coming from the SS & FF corners. It is seen that 17% reduction in the nominal

value of voltage can be achieved for a chip of the FF corner while maintaining the same

level of performance as a chip of the SS corner. Figure 6.14 provides the normalized

silicon results of the dynamic power savings through AVS for a FF corner chip. From

the graph it can be seen that power savings ranging from 35% to 80% can be achieved

across the operating voltage range without compromising on the target performce.
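For illustration only, the comparison metric described above can be computed as in the following sketch. The data points are hypothetical placeholders, not the measured values; the dynamic power is taken as VDD x I_dyn and both data sets are normalized to their own maximum, as done for the figures.

# Illustrative computation of the normalized Dynamic_Power/MHz metric (placeholder data).
def dyn_power_per_mhz(vdd_v, i_dyn_ma, f_mhz):
    return (vdd_v * i_dyn_ma) / f_mhz          # dynamic power (mW) per MHz

def normalize(values):
    peak = max(values)
    return [v / peak for v in values]

# (VDD [V], dynamic current [mA], Fmax [MHz]) -- hypothetical sweep points
sim_points     = [(0.6, 1.0, 100.0), (0.8, 2.5, 250.0), (1.0, 5.0, 400.0)]
silicon_points = [(0.6, 1.1, 105.0), (0.8, 2.6, 255.0), (1.0, 5.2, 410.0)]

sim_metric = normalize([dyn_power_per_mhz(*p) for p in sim_points])
sil_metric = normalize([dyn_power_per_mhz(*p) for p in silicon_points])
mismatch   = [abs(a - b) / b for a, b in zip(sim_metric, sil_metric)]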


Figure 6.10: Normalized Simulation v/s Silicon Fmax comparison at 125°C

Figure 6.11: Normalized Simulation v/s Silicon Leakage Current comparison at

125°C


Figure 6.12: Normalized Simulation v/s Silicon Dynamic_Power/MHz comparison at 125°C

Figure 6.13: Normalized silicon results for power saving through AVS at 30°C


Figure 6.14: Normalized silicon results for dynamic power saving through AVS for

FF corner chip at 30°C

6.6 Summary

The work presented in this chapter describes a time-conserving simulation flow for digital circuits, along with the associated test methodology and the requisite correlation with on-chip silicon results. The reported simulation flow enables a 15X gain in CPU time for Adaptive Voltage Scaling simulation of digital circuits. The major gain is due to the use of the VCS-AMS capabilities, which allow the testbench behaviour to be optimized thanks to communication between the bench and fast-SPICE environments. This has made possible detailed performance assessments that have previously been considered prohibitive, and opens the door to investigating the scaling of this approach to complete industrial IP or System-on-Chip designs. The observed silicon results for the performance and power metrics display a close correlation with the ones obtained via the simulation flow, thus providing designers with deeper insights into the gains achievable with AVS from within the simulation environment.



7 DESIGN AND PERFORMANCE

PARAMETERS OF AN ULTRA-LOW VOLTAGE PROCESSOR

“Live one day at a time

emphasizing ethics rather than rules.”

Wayne Dyer

Abstract: This chapter presents a single-supply SPARC 32b V8 microprocessor designed with Ultra Low Voltage (ULV) adapted standard cells and memories, aiming at low energy operation and low standby power. The microprocessor, equipped with an 8KB SRAM cache built from 10-transistor ULV bitcells, has been fabricated in Fully Depleted Silicon On Insulator (FDSOI) 28nm technology. A comparative analysis with similar implementations is provided, highlighting the performance gain and power savings that are achieved by the presented design methodology and implementation technology. Wafer-level tests showed that the ULV-adapted microprocessor is functional down to 0.33V and that the ULV-able cache can save from 30% to 62% of the energy.


7.1 Introduction

The growing complexity and performance requirements of modern multimedia devices are pushing chipmakers to achieve better logic performance, especially at low voltages. With embedded applications for wearable technology demanding more for less, i.e. greater performance for less energy, and with the transformational change that the Internet of Things (IoT) is ushering in, the need for performant low-power cores cannot be overstated. In this respect, FDSOI technology has conclusively demonstrated its ability to achieve high speed at low operating voltages [38]. Moreover, in the race for higher energy efficiency driven by the Internet of Things and battery-powered systems, lowering the device supplies to Ultra Low Voltage (ULV) has been intensively explored [101, 102]. This is the strongest lever to achieve low energy operation, at the expense of low frequency and increased variability. However, the latest technologies widen the application range by enabling ULV frequencies from 10 to 100 MHz with a lower variability penalty.

This chapter presents a 32b SPARC microprocessor aiming at low operating energy and stand-by power, designed with specific standard cells, a 10-transistor memory cell, and adapted place-and-route tooling margins. Fabricated in FDSOI 28nm, the design has been tested at wafer level and is functional from 1.2V down to 0.33V.

7.2 Design and Fabrication Platform

This section firstly provides relevant information regarding the implementation

technology used. Thereafter design & performance parameters of standard cells and

memories employed within the processor are presented. Lastly details regarding the

design flow adaptation are presented.

7.2.1 Technology

The chip has been manufactured in FDSOI 28nm technology [38, 103] with a 10-metal stack. The thin-oxide logic transistor channel is un-doped and is isolated from the bulk by an ultra-thin buried oxide layer; this isolation enables the transistors' threshold voltage (VT) to be controlled by the well type lying beneath the channel. Several design options are possible, as shown in Figure 7.1 [104]:

Figure 7.1: FDSOI Well Architectures [104]

· Dual-Well Regular VT (RVT), where the well positions are similar to those of

bulk, allowing full supply range reverse body bias.

· Dual-Well Low VT (LVT), where the well positions are inverted with respect to

those of bulk (i.e. P-well under PMOS, N-well under NMOS), allowing full

supply range forward body bias.

· Single-Well (SW), where RVT PMOS together with LVT NMOS, or vice versa,

can be used, but with no body bias possible.

For all flavors described in Figure 7.1, the static current as a function of frequency is shown in Figure 7.2 [104].

Figure 7.2: Static current vs. Frequency trend from 0.5 to 1.3V of VDD [104]


7.2.2 Standard Cells

The standard cells used in the processor implementation presented in this

chapter have been laid out so that the source and drain contacts spacing can be kept

equal for any drawn gate length (L) from 30nm to 46nm. The final derivation of longer

L cells is done by simply adding a CAD marker layer which denotes the L increase

from nominal. This technique is called poly biasing. In the rest of the chapter, Poly Bias

10 (PB10) denotes standard cells with a systematic increased L of 10 nanometers for all

transistors.

This technique can be applied asymmetrically to selected transistors in a cell to achieve specific performance targets. In the rest of the chapter, the cells denoted as Asymmetric Poly Bias 10 (APB10) have transistors mainly at nominal size, with some transistors enlarged by +10nm. APB offers additional operating points in the energy-frequency space, where the extremes are achieved with symmetric poly bias cells.

Figure 7.3: Quasi-FO4 Critical Path

Figure 7.4: Poly bias cells leakage vs Quasi-FO4 critical path frequency


The Power Delay Product (PDP), leakage, current and delay values of the various symmetric and asymmetric Poly-Biasing cells have been compared using the critical path methodology developed in [105] and shown in Figure 7.3. As can be seen in Figure 7.4, when the frequency is lower than 3000MHz the optimal leakage is reached with the largest symmetrical PB, but above this frequency the configurations are spread in frequency and APB configurations provide intermediate leakage-frequency alternatives to the symmetric PB configurations. For example, at a 3800MHz frequency, APB16PB4 and APB10PB4 offer an alternative to PB4 with lower leakage. The Poly-Biasing configurations giving the lowest leakage currents or energy for a given delay were selected to be derived in the standard cell library, resulting in a set of 5 Poly-Biasing configurations and a total of 300 standard cells. In the reported design, 21% of the combinational cells are asymmetric.

The flip-flops' functionality has been validated using 6-σ Monte-Carlo simulations across the ultra-wide voltage range. The same method has been applied to the input and output level shifters designed to interface with the nominal voltage (around 1V) domain.

7.2.3 Memories

Figure 7.5: 10 Transistor ULV Bitcell

The implemented processor uses a single-pwell (SPW) evolution, with limited sizing modifications, of the dual-well ULV bitcell previously presented in [106]. The single-pwell option gives higher-threshold-voltage NMOS transistors and lower leakage, at the cost of a 50mV higher Vmin. The schematic of the bitcell is displayed in Figure 7.5, wherein red circles mark the transistors activated during a read to drive the bit lines to 0.


Figure 7.6: Comparison of leakage between ULV 10T Dual Well (DW) and Single

Well (SW) bitcells

While the dual-well bitcell showed a leakage penalty of up to 8X compared to the 6T reference bitcell available in this technology, the new SPW bitcell lowers the leakage penalty from 66% at typical process and 25°C down to 25% at fast process and 125°C (Figure 7.6), while maintaining the read (SNM) and write (WM) stability. The comparison of leakage between the ULV 10T Dual Well and Single Well bitcells, expressed as a ratio of the reference 6T bitcell leakage, is shown in Figure 7.6.

7.3 Design Architecture and Implementation

Figure 7.7: SPARC V8 32b LEON3 microprocessor system & FDSOI 28nm reticle

picture


The microprocessor presented in this work is a SPARC V8 32-bit LEON3 synthesizable core, available at [107]. It has a Harvard split data/instruction cache architecture with 4 KB data and 4 KB instruction caches, using a random replacement policy. The processor uses register-window context swapping with a 136-entry, 2-read/1-write register file (8 windows composed of 16 registers each, plus 8 global registers). The register file is implemented with inferred flip-flops taken out of the scan chain to limit routing congestion; the register file is therefore tested by a single register program at the beginning of the circuit test. The caches are implemented with single-port memory instances built from 10T bitcells.

The implementation results in approximately 100k gates and 64Kb of memory; it occupies 0.45mm² of on-chip area, while the total chip area is 5.5mm². Figure 7.7 shows the SPARC V8 32b LEON3 microprocessor with system busses and peripherals. The 28nm FDSOI reticle, with the LEON ULV processor marked in green, is also shown in the figure.

7.4 Test Methodology

The Verigy 93k automated test equipment has been used to measure the circuits on one wafer at ambient temperature. The chip is activated with an application program serially loaded into the AHB-RAM and then run from the caches (if activated) or from the AHB-RAM. The microprocessor under test features two dedicated supplies, one for the CPU core and the other for the caches; while this split supply enables selective current characterization, the chip can be considered single-supply as no voltage modulation was used to assist operation, i.e. both voltages are constant and equal to each other during the whole execution of the programs.

The low-voltage results are extracted in two steps: first, a minimum voltage (Vmin) search is done at a fixed frequency; then, at the extracted Vmin, the maximum frequency (Fmax) is searched. This filters out hold violations and possibly leads to a higher frequency at the voltage floor. The leakage and dynamic currents are extracted at Fmax and Vmin.
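A minimal sketch of this two-step extraction is given below, for illustration only. It assumes a simple linear search and a hypothetical tester object wrapping the ATE supply, clock and functional-test controls; it is not the actual test program.

# Illustrative two-step Vmin-then-Fmax extraction (hypothetical ATE interface).
def low_voltage_extraction(tester, f_fixed, v_grid, f_grid):
    # Step 1: Vmin search at a fixed frequency
    tester.set_frequency(f_fixed)
    vmin = None
    for v in sorted(v_grid):                     # lowest passing supply voltage
        tester.set_supply_voltage(v)
        if tester.functional_test():
            vmin = v
            break
    if vmin is None:
        return None
    # Step 2: Fmax search at the extracted Vmin
    tester.set_supply_voltage(vmin)
    fmax = None
    for f in sorted(f_grid):                     # highest passing frequency
        tester.set_frequency(f)
        if tester.functional_test():
            fmax = f
    # Leakage and dynamic currents are then extracted at (Vmin, Fmax)
    return dict(vmin=vmin, fmax=fmax,
                i_leak=tester.leakage_current(),
                i_dyn=tester.dynamic_current())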


7.5 Results and Analysis

This section firstly benchmarks the electrical performance of the implemented ULV processor against similar implementations. Thereafter, the impact of utilizing the ULV-enabled cache on the total energy consumption is studied.

7.5.1 Electrical Performance

Table 7.1: Microprocessor Performance Comparison

                     This Work         [108]                 [109]             [110]             [111]
Architecture         32b SPARC V8      32b C64x              32b ReISC         16b MSP430        16b MSP430
SRAM                 4KB I$, 4KB D$    32KB L1$, 128KB L2$   256B $, 16KB      256B $, 18KB      8B I$, 176KB
Peripherals          DSU, UART, GPIO   I2C, SPI, UART, MMU   I2C, SPI, JTAG    SPI, UART, GPIO   GPIO
Technology           28nm FDSOI        28nm CMOS             65nm CMOS         65nm CMOS         65nm CMOS
Gate Count           100k              600k                  12k               8k                N.A.
Cycle Energy (pJ)    7.6               200                   10.2              7                 27.3
F @ MEP (MHz)        11@0.44V          [email protected]       [email protected]   [email protected]   [email protected]
Fmax (MHz)           [email protected]   [email protected]       [email protected]   N.A.              [email protected]
25MHz power (mW)     0.241             5.9                   0.5               0.175             N.A.
Standby Power (µW)   14.7              4000                  1.65              1.5               1

Figure 7.8: Average microprocessor minimum voltage extracted at 1MHz


The microprocessor exhibits a minimum energy point of 7.6pJ per instruction at 11MHz and 0.44V. The frequency ranges between [email protected] and 25MHz at 0.51V for low-voltage operation, while it can achieve [email protected], [email protected] and [email protected], but this latter voltage is an overdrive that cannot be sustained for continuous operation. The Vmin extracted on 193 dies at 10MHz and 1MHz respectively is 0.43V/0.35V, with the best die at 0.33V. The Vmin population at 1MHz is displayed in Figure 7.8. The standby power extracted at the 1MHz Vmin is 14.7µW, and the lowest energy needed to execute a Dhrystone is 6.5nJ@10MHz (detailed in the next section).

The measured results are displayed and compared with similar work in Table 7.1. Compared to the work in [108], if we scale by a factor of 6X to 32X to normalize either to gate count or to memory capacity, the standby power is improved by a factor of 37X to 7X in the proposed design. The design in [109] suffers from very low speed at low voltage, and both the designs in [110] and [111] are outperformed in energy per data-path bit. Both this work and [112] are manufactured in FDSOI 28nm; while [112] aims at high speed and reaches 115MHz at 0.5V with 0V FBB, it requires 62pJ of energy per instruction where the proposed design needs 7.6pJ. Comparison between different architectures is always difficult, but the proposed CPU is a compelling alternative for low-energy processing in the tens-of-MHz speed range.

7.5.2 Cache Impact on Energy

                 Cycles ($ active)   Cycles (no $)   Energy ($ active)   Energy (no $)
Dhrystone (nJ)   960                 1978            12.8                18.3
FFT (µJ)         192758              654053          2.43                6.04
Atkin (µJ)       86889               287334          1.03                2.66

Table 7.2: Comparison of execution energy with and without cache activation

The caches of the CPU have dedicated supplies and can be deactivated via software, a feature that enables a simple evaluation of the effect of the caches' presence on the energy required to execute a program. Three different benchmark programs, the Dhrystone test [113], a 256-point Fast Fourier Transform and Atkin's sieve prime number search [114], have been evaluated without any caches and with both instruction and data caches activated, and the number of cycles needed for execution has been extracted. The results are displayed in Table 7.2. It can be seen that the presence of caches on the same power and clock domain as the CPU can save from 30% to 62% of the energy, because of the dynamic energy saving induced by the gain in CPU cycles, which overcompensates the extra leakage caused by the ULV caches. Since this gain in cycle count and dynamic energy holds for any operating voltage, it justifies equipping the CPU with ULV caches for low energy operation.
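As a quick cross-check (not part of the original test flow), the savings quoted above can be reproduced from the Table 7.2 energies, since the energy saving is simply 1 - E(cache active) / E(no cache):

# Recomputing the cache energy savings from the Table 7.2 values.
table_7_2 = {                     # (energy with caches active, energy without caches)
    "Dhrystone": (12.8, 18.3),    # nJ
    "FFT":       (2.43, 6.04),    # uJ
    "Atkin":     (1.03, 2.66),    # uJ
}
for name, (e_cache, e_nocache) in table_7_2.items():
    saving = 1.0 - e_cache / e_nocache
    print(f"{name}: {saving:.0%} energy saved")   # roughly 30%, 60% and 61%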

7.6 Summary

An FDSOI 28nm 32b SPARC microprocessor has been designed with adapted standard cells, memories and flow to withstand ULV operation. The validation circuit with split core/cache supplies enables memory architecture options to be explored and demonstrates that designing a ULV-able cache can save from 30% to 62% of the energy. The operating range extends from a 0.33V minimum voltage at 1MHz, with an associated 14.7µW standby power, to the Minimum Energy Point at 0.44V/11MHz, where it requires 7.6pJ per instruction and 6.5nJ per Dhrystone. At 25MHz, a frequency fast enough to enable audio signal processing for hearing aids, the operating power is 241µW. Duty-cycled operation with peak processing power can also be addressed with the [email protected] operating point.


8 CONCLUSION AND FUTURE

PERSPECTIVES

“It always seems impossible until it’s done.”

Nelson Mandela


8.1 Thesis Summary

With continually shrinking nanoscale technologies, the increase in

manufacturing defects causes difficulties in developing reliable semiconductor devices.

Usually, these devices are expected to meet the target specification for a wide range of

operating conditions for different on-chip activity levels, voltage distributions in the

power supply and ground networks, parametric variations, etc. With critical transistor dimensions approaching only tens of atoms, process engineers are increasingly unable to control the manufacturing processes as accurately as needed. Moreover, the conventional approach of allocating large design margins to mitigate the impact of faults and variations has hit diminishing returns. Furthermore, the rapid shrinking of the area-performance-power budget, along with the increase in process-induced variations and time-dependent transistor parameter shifts, renders this design approach inapplicable.

in deep sub-micron technologies, the wide-ranging work of this thesis addresses the

challenges impacting advanced technology nodes from a silicon test perspective.

To conclude, the major contributions of this work are as follows:

· A test circuit for accurately characterizing IPs and standard cell libraries using

at-speed testing has been presented. The design supports the qualification of

basic building gates, cells and structures on silicon during the process

development phase in terms of timing before they are integrated within complex

SoCs. The presented power aware design and test methodology helps mitigate

peak power issues during capture and reduce power consumption by about 70%.

The results provide an insight into the impact of within die and within wafer

variation across multiple PVT points. Information regarding CAD vs. silicon

correlation derived from the test results helps to efficiently optimize CAD

models and determine the timing margins provided during sign-off [A].

· A test circuit for accurate characterization of delays in clock and data paths for

evaluating the extra timing margin available in silicon has been presented. The

test results provide an insight into the constituent components of the extra timing

margins available for setup and hold in silicon. Understanding the sources of

extra margin aids in precisely establishing the de-rates added during sign-off, such that fail-free operation can be ensured while deriving maximum performance from a given circuit [B].

· The fundamental elements towards an Adaptive Wearout Management (AWM)

control loop for the dynamic supply voltage regulation of digital circuits have

been demonstrated. An adaptive voltage control loop has been implemented

using a new generation of in-situ monitors which generate warning flags to

prevent circuit failure under nominal conditions. The adaptive voltage management control loop efficiently tackles all voltage margins over a large range of DVFS operating conditions and temperatures. Considerable savings in

static and dynamic power have been achieved over a large range of DVFS

operating conditions while maintaining a fault free instruction execution [E].

· A test methodology for thermal characterization using embedded digital sensors has been developed. The test methodology focuses on efficient memory management during silicon testing; it is generic in nature and can be utilized for other applications that deal with memory-intensive measurements.

· A time conserving simulation flow for digital circuits along with associated test

methodology and requisite on-chip silicon results correlation has been presented.

The reported simulation flow enables a 15X gain in CPU time for Adaptive

Voltage Scaling simulation for digital circuits. The observed results on silicon

for the metrics of performance & power display a close correlation with the ones

obtained via the simulation flow, thus providing designers with deeper insights

into the gains with AVS from within the simulation environment [C].

· A FDSOI 28nm 32bit SPARC microprocessor has been designed with adapted

standard cells, memories and flow to withstand ULV operations. The validation

circuit with split core/cache supplies enables memory architecture options to be explored and demonstrates that designing a ULV-able cache can result in considerable energy savings [D].


8.2 Future Works

This thesis can form the basis for various possible future works in the area of

developing efficient test solutions for resolving the wide-ranging challenges that

confront the semiconductor industry. They are briefly listed below:

· Considering the simple basic structures of the tile and instance implemented for the characterization of IPs and standard cells, they may be used to stress on-chip hotspots (i.e. the most critical mask locations) via electrical testing with a scan test methodology, in order to identify the faults present on silicon and compare them with the faults initially induced via lithography dose modulation. This could help in accelerating the process learning curve through hotspot monitoring with electrical test.

· The presented test structure of the CDD block can be used as a sensor inside the chip alongside SoC designs, to improve the operating frequency of a design through post-silicon tuning based on the sensor readings.

· Based on the lessons learnt with respect to test-chip design and pattern

generation for thermal characterization, in the future thermal data could be used

together with timing information coming from in-situ monitors for implementing

a comprehensive reliability/performance management strategy.

· A further increase in the energy savings of the ULV processor can be achieved by efficiently regulating the body biasing applied to the processor, thereby expanding its energy-performance envelope.


SCIENTIFIC CONTRIBUTIONS

· International Conferences

[A] “An Effective Power-Aware At-Speed Test Methodology for IP Qualification and

Characterization” (submitted)

Kapil Juneja, Darayus Adil Patel, Rajesh Kumar Immadi, Balwant Singh, Sylvie

Naudet, Pankaj Agarwal, Arnaud Virazel, Patrick Girard

ITC’2016: International Test Conference, USA, 15-17 Nov. 2016.

[B] “Analysis of Setup and Hold Margins inside Silicon for Advanced Technology

Nodes”

Deepak Kumar Arora, Darayus Adil Patel, Shahabuddin, Sanjay Kumar, Navin

Kumar Dayani, Balwant Singh, Sylvie Naudet, Arnaud Virazel, Alberto Bosio

ISQED’2016: 17th International Symposium on Quality Electronic Design, USA, 15-

16 March 2016.

[C] “Simulation & Silicon Verification of Adaptive Voltage Scaling for Real

Applications”

Darayus Adil Patel, Robin Wilson, Franck Gardic, Sylvie Naudet, Philippe Brahic,

Arnaud Virazel, Patrick Girard

SNUG’2015: Synopsys Users Group Conference, France, 18 June 2015.

[D] “Design and Performance Parameters of an Ultra-Low Voltage, Single Supply 32bit

Processor implemented in 28nm FDSOI Technology”

S. Clerc, F. Abouzeid, D.A. Patel, J-M. Daveau, C. Bottoni, L. Ciampolini, F. Giner, D.

Meyer, R. Wilson, P. Roche, S. Naudet, A. Virazel, A. Bosio, P. Girard

ISQED’2015: 16th International Symposium on Quality Electronic Design, USA, 2-4

March 2015.

[E] “Adaptive Wearout Management with in-situ Aging Monitors”

V. Huard, F. Cacho, F. Giner, M. Saliva, A. Benhassain, D. A. Patel, N. Torres, S.

Naudet, A. Jain, C. Parthasarathy

IRPS’2014: 2014 IEEE International Reliability Physics Symposium, USA, 1-5 June

2014

· National Conferences

[F] “Test and Characterization Methodologies for FDSOI Technology”

Darayus Adil Patel, Sylvie Naudet, Arnaud Virazel, Alberto Bosio, Patrick Girard

GdR’2016: Groupement de Recherche SoC-SiP, Nantes, France, 8-10 June 2016.


[G] “Innovative Test Techniques for Advanced Technology Nodes”

Darayus Adil Patel, Sylvie Naudet, Arnaud Virazel, Alberto Bosio, Patrick Girard

JNRDM’2016: Journées Nationales du Réseau Doctoral en Micro-nanoélectronique,

Toulouse, France, 11-13 May 2016.

[H] “Design and Implementation of Novel Test Strategies for High Quality Test”

Darayus Adil Patel, Sylvie Naudet, Loïc Echene, Arnaud Virazel, Alberto Bosio,

Patrick Girard

JNRDM’2015: Journées Nationales du Réseau Doctoral en Micro-nanoélectronique,

Bordeaux, France, 5-7 May 2015.


REFERENCES

[1] Gordon E. Moore. Cramming more components onto integrated circuits, reprinted

from electronics, volume 38, number 8, April 19, 1965, pp.114 ff. Solid-State

Circuits Newsletter, IEEE, 11(5):33 –35, Sept. 2006.

[2] Laung-Terng Wang Cheng-Wen Wu and Xiaoqing Wen “VLSI Test Principles and

Architectures”, Morgan Kaufmann Publishing House

[3] M. Wirnshofer. Variation-Aware Adaptive Voltage Scaling for Digital CMOS

Circuits. Springer Series in Advanced Microelectronics. Springer Netherlands,

2015.

[4] S. Bhunia, S. Mukhopadhyay, and K. Roy. “Process Variations and Process-Tolerant

Design”. In: VLSI Design, 2007. Held jointly with 6th International Conference on

Embedded Systems., 20th International Conference on. 2007, pp. 699–704.

[5] B.H. Calhoun, A. Wang, N. Verma, and A. Chandrakasan. “Sub-Threshold Design:

The Challenges of Minimizing Circuit Energy”. In: Low Power Electronics and

Design, 2006. ISLPED’06. Proceedings of the 2006 International Symposium on.

2006, pp. 366–368.

[6] S.S. Sapatnekar. “Overcoming Variations in Nanometer-Scale Technologies”. In:

Emerging and Selected Topics in Circuits and Systems, IEEE Journal on 1.1 (2011),

pp. 5–18.

[7] R. Kumar and The University of Wisconsin Madison. Temperature Adaptive and

Variation Tolerant CMOS Circuits. University of Wisconsin–Madison, 2008.

[8] S. Das et al. “RazorII: In Situ Error Detection and Correction for PVT and SER

Tolerance”. In: Solid-State Circuits, IEEE Journal of 44.1 (2009), pp. 32–48.

[9] J. Samandari-Rad, M. Guthaus, and R. Hughey. “Confronting the Variability Issues

Affecting the Performance of Next-Generation SRAM Design to Optimize and

Predict the Speed and Yield”. In: Access, IEEE 2 (2014), pp. 577–601.

[10] Janak H. Patel. “Manufacturing Process Variations and Dependability - A

Contrarian View”. In: Dependable and Secure Nanocomputing (DSN), 2007

Workshop on. 2007.

[11] Wei Huang, M.R. Stan, S. Gurumurthi, R.J. Ribando, and K. Skadron. “Interaction of

scaling trends in processor architecture and cooling”. In: Semiconductor Thermal

Measurement and Management Symposium, 2010.

[12] M. Santoro. “New methodologies for eliminating No Trouble Found, No Fault

Found and other non repeatable failures in depot settings”. In: AUTOTESTCON,

2008 IEEE. 2008, pp. 336–340.

[13] J. Segura and C.F. Hawkins. CMOS Electronics: How It Works, How It Fails. Wiley,

2004.

[14] J. Srinivasan, S.V. Adve, P. Bose, and J.A. Rivers. “The impact of technology scaling

on lifetime reliability”. In: Dependable Systems and Networks, 2004 International

Conference on. 2004, pp. 177–186.


[15] J. Srinivasan, S.V. Adve, P. Bose, and J.A. Rivers. “The case for lifetime reliability

aware microprocessors”. In: Computer Architecture, 2004. Proceedings. 31st

Annual International Symposium on. 2004, pp. 276–287.

[16] P. Chen and K. Keutzer, “Towards true crosstalk noise analysis,” in Computer-

Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International

Conference on, 1999, pp. 132–137.

[17] L. Ding, D. Blaauw, and P. Mazumder, “Accurate crosstalk noise modeling for early

signal integrity analysis,” in Computer-Aided Design of Integrated Circuits and

Systems, IEEE Transactions on, 2003, pp. 627–634.

[18] K. Agarwal, D. Sylvester, and D. Blaauw, “Modeling and analysis of crosstalk noise

in coupled RLC interconnects,” in Computer-Aided Design of Integrated Circuits

and Systems, IEEE Transactions on, 2006, pp. 892–901.

[19] M. L. Bushnell, and V. D. Agrawal. “Essentials of Electronic Testing for Digital,

Memory and Mixed-Signal VLSI Circuits,” Kluwer Academic Publishers, Boston,

MA, 2000

[20] E. R. Hnatek. “Integrated Circuit Quality and Reliability,” Marcel Dekker, New York

City, NY, 1987

[21] F. Jensen, and N. E. Petersen. “Burn-In,” John Wiley & Sons, Chichester, UK, 1982

[22] D. Frank, R. Dennard, E. Nowak, P. Solomon, Y. Taur, and H.-S. P. Wong, “Device

scaling limits of Si MOSFETs and their application dependencies,” in Proceedings

of the IEEE, 2001, pp. 259–288.

[23] C. Hawkins, A. Keshavarzi, and J. Segura, “View from the bottom: nanometer

technology AC parametric failures - why, where, and how to detect,” in Defect and

Fault Tolerance in VLSI Systems, 2003. Proceedings. 18th IEEE International

Symposium on, 2003, pp. 267–276.

[24] H. S. P. Wong, D. J. Frank, P. M. Solomon, C. H. J. Wann, and J. J. Welser, “Emerging

Nanoelectronics: Life with and after CMOS,” in A. M. Ionescu and K. Banerjee, Eds.

Norwell, MA: Kluwer Academic Publishers, 2004, pp. 259–288.

[25] H. J. Brand, S. Rulke and M. Radetzki, "IPQ: IP qualification for efficient system

design," 5th International Symposium on Quality Electronic Design, 2004.

Proceedings., 2004, pp. 478-482.

[26] L. W. Wang and H. W. Luo, "Quality and reliability of digital soft IP core and a

qualification framework," 2011 International Conference on Quality, Reliability,

Risk, Maintenance, and Safety Engineering (ICQR2MSE), Xi'an, 2011, pp. 804-808.

[27] A. Vorg, M. Radetzki and W. Rosenstiel, "Measurement of IP qualification costs and

benefits," Design, Automation and Test in Europe Conference and Exhibition, 2004.

Proceedings, 2004, Vol.2 pp. 996-1001.

[28] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi and V. De, "Parameter

variations and impact on circuits and microarchitecture," Design Automation

Conference, 2003. Proceedings, 2003, pp. 338-342.

[29] H. Onodera, "Variability modeling and impact on design," IEEE International

Electron Devices Meeting, 2008. IEDM 2008. San Francisco, CA, 2008, pp. 1-4.


[30] Xijiiang Lin et al., "High-frequency, at-speed scan testing," in IEEE Design & Test of

Computers, Sept.-Oct. 2003, vol. 20, no. 5, pp. 17-25.

[31] V. Vorisek, T. Koch and H. Fischer, "At-speed testing of SOC ICs," Design,

Automation and Test in Europe Conference and Exhibition, 2004. Proceedings,

2004, Vol.3 pp. 120-125.

[32] A. Crouch, "Design-for-Test for Digital IC's and Embedded Core Systems", Prentice

Hall, 1999.

[33] V. Dabholkar, S. Chakravarty, I. Pomeranz and S. Reddy, "Techniques for

minimizing power dissipation in scan and combinational circuits during test

application," in IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems, Dec 1998, vol. 17, no. 12, pp. 1325-1333.

[34] P. Girard, "Survey of low-power testing of VLSI circuits," in IEEE Design & Test of

Computers, May/Jun 2002, vol. 19, no. 3, pp. 80-90.

[35] N. Nicolici and X. Wen, "Embedded Tutorial on Low Power Test," 12th IEEE

European Test Symposium, 2007. ETS '07, Freiburg, 2007, pp. 202-210.

[36] Seongmoon Wang and S. K. Gupta, "DS-LFSR: a new BIST TPG for low heat

dissipation," International Test Conference, Washington, DC, 1997, pp. 848-857.

[37] R. Multari, R. P. Vollertsen and V. Reece, "Burn-in discussion group minutes," IEEE

International Integrated Reliability Workshop Final Report, 2000, pp. 123-124.

[38] N. Planes et al., "28nm FDSOI technology platform for high-speed low-voltage

digital applications," 2012 Symposium on VLSI Technology (VLSIT), Honolulu, HI,

2012, pp. 133-134.

[39] Kwang-Ting Cheng, S. Dey, M. Rodgers and K. Roy, "Test challenges for deep sub-

micron technologies," Design Automation Conference, 2000. Proceedings, 2000,

pp. 142-149.

[40] S. Reda, "Thermal and Power Characterization of Real Computing Devices," in IEEE

Journal on Emerging and Selected Topics in Circuits and Systems, June 2011, vol. 1,

no. 2, pp. 76-87.

[41] P. Nigh et al., "Failure analysis of timing and IDDq-only failures from the

SEMATECH test methods experiment," International Test Conference, Washington,

DC, 1998, Proceedings, pp. 43-52.

[42] N. N. Tendolkar, "Analysis of Timing Failures Due to Random AC Defects in VLSI

Modules," 22nd Conference on Design Automation, 1985, pp. 709-714.

[43] J. A. Waicukauski, E. Lindbloom, B. K. Rosen and V. S. Iyengar, "Transition Fault

Simulation," in IEEE Design & Test of Computers, April 1987, vol. 4, no. 2, pp. 32-

38.

[44] L. C. Wang, Jing-Jia Liou and Kwang-Ting Cheng, "Critical path selection for delay

fault testing based upon a statistical timing model," in IEEE Transactions on

Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 11, pp.

1550-1565, Nov. 2004.

[45] N. Ahmed, C. P. Ravikumar, M. Tehranipoor and J. Plusquellic, "At-speed transition


fault testing with low speed scan enable," 23rd IEEE VLSI Test Symposium, 2005.

Proceedings, pp. 42-47.

[46] M. Yilmaz, K. Chakrabarty and M. Tehranipoor, "Test-Pattern Grading and Pattern

Selection for Small-Delay Defects," 26th IEEE VLSI Test Symposium, 2008. San

Diego, CA, pp. 233-239.

[47] Vikram Iyengar et al., "Variation-aware performance verification using at-speed

structural test and statistical timing," IEEE/ACM International Conference on

Computer-Aided Design, 2007. ICCAD 2007, San Jose, CA, 2007, pp. 405-412.

[48] N. Tendolkar et al., "Improving Transition Fault Test Pattern Quality through At-

Speed Diagnosis," IEEE International Test Conference, 2006. ITC '06, Santa Clara,

CA, 2006, pp. 1-9.

[49] J. Saxena et al., "A case study of ir-drop in structured at-speed testing,"

International Test Conference, 2003, ITC 2003. Proceedings, pp. 1098-1104.

[50] T. Yoshida and M. Watati, "A new approach for low-power scan testing,"

International Test Conference, 2003, ITC 2003. Proceedings, pp. 480-487.

[51] S. Bahukudumbi and K. Chakrabarty, "Power Management for Wafer-Level Test

During Burn-In," 17th Asian Test Symposium, 2008. ATS '08., Sapporo, pp. 231-

236.

[52] Srivaths Ravi, "Power-aware test: Challenges and solutions," IEEE International

Test Conference, 2007. ITC 2007, Santa Clara, CA, pp. 1-10.

[53] M. S. Hsiao, E. M. Rudnick and J. H. Patel, "Peak power estimation of VLSI circuits:

new peak power measures," in IEEE Transactions on Very Large Scale Integration

(VLSI) Systems, Aug. 2000, vol. 8, no. 4, pp. 435-439.

[54] P. Girard, N. Nicolici and X. Wen (editors), “Power-Aware Testing and Test

Strategies for Low Power Devices”, Springer, ISBN 978-1-4419- 0927-5, 2009.

[55] L. Whetsel, "Adapting scan architectures for low power operation," International

Test Conference, 2000. Proceedings, Atlantic City, NJ, 2000, pp. 863-872.

[56] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault and S. Pravossoudovitch, "A gated

clock scheme for low power scan testing of logic ICs or embedded cores," 10th

Asian Test Symposium, 2001, Kyoto. Proceedings, pp. 253-258.

[57] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch and H. J. Wunderlich, "A

modified clock scheme for a low power BIST test pattern generator," 19th IEEE

Proceedings on VLSI Test Symposium, VTS 2001, Marina Del Rey, CA, pp. 306-311.

[58] Bowman, K.A.; Duvall, S.G.; Meindl, J.D., "Impact of die-to-die and within-die

parameter fluctuations on the maximum clock frequency distribution for gigascale

integration," in Solid-State Circuits, IEEE Journal of , vol.37, no.2, pp.183-190.

[59] Zuchowski, P.S.; Habitz, P.A.; Hayes, J.D.; Oppold, J.H., "Process and environmental

variation impacts on ASIC timing," in International Conference on Computer Aided

Design, 2004. ICCAD 2004. IEEE/ACM pp.336-342.

[60] Dai, W.; Hao Ji, "Timing analysis taking into account interconnect process

variation," in IEEE International Workshop on Statistical Methodology, 2001


pp.51-53.

[61] Nassif, S.R., "Process variability at the 65nm node and beyond," in Custom

Integrated Circuits Conference, 2008. CICC 2008. IEEE, pp.1-8.

[62] Mutlu, Ayhan A.; Rahman, M., "Statistical methods for the estimation of process

variation effects on circuit operation," in IEEE Transactions on Electronics

Packaging Manufacturing, vol.28, no.4, pp.364-375.

[63] Sato, S., "Growing Importance of Fundamental Understanding of the Source of

Process Variations," in 14th IEEE International Conference on Advanced Thermal

Processing of Semiconductors, 2006. RTP '06., pp.5-9.

[64] Bhushan, M.; Gattiker, A.; Ketchen, M.B.; Das, K.K., "Ring oscillators for CMOS

process tuning and variability control," in IEEE Transactions on Semiconductor

Manufacturing, vol.19, no.1, pp.10-18.

[65] Das, B.P.; Onodera, H., "On-Chip Measurement of Rise/Fall Gate Delay Using

Reconfigurable Ring Oscillator," in IEEE Transactions on Circuits and Systems II:

Express Briefs, vol.61, no.3, pp.183-187.

[66] Tschanz, J.; Kao, J.; Narendra, S.; Nair, R.; Antoniadis, D.; Chandrakasan, A.; Vivek

De, "Adaptive body bias for reducing impacts of die-to-die and within-die

parameter variations on microprocessor frequency and leakage," in International

Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC. 2002

IEEE, pp.422-478.

[67] Xiaoliang Bai; Patel, P.; Xiaonan Zhang, "A new statistical setup and hold time

definition," in IEEE International Conference on IC Design and Technology

(ICICDT), 2012, pp.1-4.

[68] Okumura, T.; Hashimoto, M., "Setup time, hold time and clock-to-Q delay

computation under dynamic supply noise," in Custom Integrated Circuits

Conference (CICC), 2010 IEEE pp.1-4.

[69] Salman, E.; Dasdan, A.; Taraporevala, F.; Kucukcakar, K.; Friedman, E.G.,

"Pessimism reduction in static timing analysis using interdependent setup and

hold times," in 7th International Symposium on Quality Electronic Design, 2006.

ISQED '06. , pp.159-164.

[70] Kaneko, M.; Li, Jian, "Post-silicon skew tuning algorithm utilizing setup and hold

timing tests," in 2012 IEEE International Symposium on Circuits and Systems

(ISCAS), pp.125-128.

[71] Majumdar, A.; Wei-Yu Chen; Jun Guo, "Hold time validation on silicon and the

relevance of hazards in timing analysis," in Design Automation Conference, 2006

ACM/IEEE, pp.326-331.

[72] Adaptive (Dynamic) Voltage (Frequency) Scaling - Motivation and Implementation

TI Application Report SLVA646 March 2014.

[73] B. Shekhar and A. C. Andrew, “The future of microprocessors,” Commun. ACM, vol.

54, no. 5, pp. 67–77, May 2011.

[74] White Paper - Extending Digital Verification Techniques for Mixed-Signal SoCs

with VCS® AMS. URL: http://www.synopsys.com/tools/verification/amsverification/documents/vcs-ams-wp.pdf

[75] ITRS, update. URL http://www.itrs.net.

[76] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous and A. R. LeBlanc, "Design

of ion-implanted MOSFET's with very small physical dimensions," in IEEE Journal

of Solid-State Circuits, vol. 9, no. 5, pp. 256-268, Oct 1974.

[77] M. Choudhury, V. Chandra, K. Mohanram and R. Aitken, "Analytical model for

TDDB-based performance degradation in combinational logic," 2010 Design,

Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden,

2010, pp. 423-428.

[78] M. Fojtik et al., "Bubble Razor: An architecture-independent approach to timing-

error detection and correction," 2012 IEEE International Solid-State Circuits

Conference, San Francisco, CA, 2012, pp. 488-490.

[79] T. D. Burd, T. A. Pering, A. J. Stratakos and R. W. Brodersen, "A dynamic voltage

scaled microprocessor system," in IEEE Journal of Solid-State Circuits, vol. 35, no.

11, pp. 1571-1580, Nov. 2000.

[80] A. Drake et al., "A Distributed Critical-Path Timing Monitor for a 65nm High-

Performance Microprocessor," 2007 IEEE International Solid-State Circuits

Conference. Digest of Technical Papers, San Francisco, CA, 2007, pp. 398-399.

[81] T. B. Chan, P. Gupta, A. B. Kahng and L. Lai, "DDRO: A novel performance

monitoring methodology based on design-dependent ring oscillators," Thirteenth

International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA,

2012, pp. 633-640.

[82] D. Ernst et al., "Razor: a low-power pipeline based on circuit-level timing

speculation," 36th Annual IEEE/ACM International Symposium on

Microarchitecture, 2003. MICRO-36. Proceedings. 2003, pp. 7-18.

[83] K. A. Bowman et al., "Energy-Efficient and Metastability-Immune Resilient Circuits

for Dynamic Variation Tolerance," in IEEE Journal of Solid-State Circuits, vol. 44,

no. 1, pp. 49-63, Jan. 2009.

[84] J. Tschanz et al., "A 45nm resilient and adaptive microprocessor core for dynamic

variation tolerance," 2010 IEEE International Solid-State Circuits Conference -

(ISSCC), San Francisco, CA, 2010, pp. 282-283.

[85] M. Agarwal, B. C. Paul, M. Zhang and S. Mitra, "Circuit Failure Prediction and Its

Application to Transistor Aging," 25th IEEE VLSI Test Symposium (VTS'07),

Berkeley, CA, 2007, pp. 277-286.

[86] M. Wirnshofer, L. Heiß, A. N. Kakade, N. P. Aryan, G. Georgakos and D. Schmitt-

Landsiedel, "Adaptive voltage scaling by in-situ delay monitoring for an image

processing circuit," IEEE 15th International Symposium on Design and Diagnostics

of Electronic Circuits & Systems (DDECS), 2012, Tallinn, 2012, pp. 205-208.

[87] L. Lai, V. Chandra, R. Aitken and P. Gupta, "SlackProbe: A low overhead in situ on-

line timing slack monitoring methodology," Design, Automation & Test in Europe

Conference & Exhibition (DATE), 2013, Grenoble, France, 2013, pp. 282-287.


[88] V. Huard, D.A. Patel et al., "Adaptive Wearout Management with in-situ aging

monitors," 2014 IEEE International Reliability Physics Symposium, Waikoloa, HI,

2014.

[89] BCH Code - URL : https://en.wikipedia.org/wiki/BCH_code

[90] E. Mintarno, V. Chandra, D. Pietromonaco, R. Aitken and R. W. Dutton, "Workload

dependent NBTI and PBTI analysis for a sub-45nm commercial microprocessor,"

2013 IEEE International Reliability Physics Symposium (IRPS), Anaheim, CA, 2013,

pp. 3A.1.1-3A.1.6.

[91] V. Huard et al., "Advances in industrial practices for optimal

performance/reliability/power trade-off in commercial high-performance

microprocessors for wireless applications," 2013 IEEE International Reliability

Physics Symposium (IRPS), 2013 IEEE International, Anaheim, CA, 2013, pp.

3E.5.1-3E.5.7.

[92] V. Huard, F. Cacho and X. Federspiel, "Technology scaling and reliability challenges

in the multicore era," 2013 IEEE International Reliability Physics Symposium

(IRPS), Anaheim, CA, 2013, pp. 3A.5.1-3A.5.7.

[93] D. Angot et al., "BTI variability fundamental understandings and impact on digital

logic by the use of extensive dataset," 2013 IEEE International Electron Devices

Meeting, Washington, DC, 2013, pp. 15.4.1-15.4.4.

[94] M. Pedram and S. Nazarin, “Thermal modeling, analysis, and management in VLSI

circuits: Principles and methods,” Proc. IEEE, vol. 94, no. 8, pp. 1487–1501, Aug.

2006.

[95] D. Brooks, R. Dick, R. Joseph, and L. Shang, “Power, thermal, and reliability

modeling in nanometer-scale microprocessors,” IEEE Micro, vol. 27, no. 3, pp. 49–

62, 2007.

[96] Y. Cheng, P. Raha, C. Teng, E. Rosenbaum, and S. Kang, “ILLIADS-T: An

electrothermal timing simulator for temperature-sensitive reliability diagnosis of

CMOS VLSI chips,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17,

no. 8, pp. 668–680, Aug. 1998.

[97] P. Li, L. Pileggi, M. Asheghi, and R. Chandra, “IC thermal simulation and modeling

via efficient multigrid-based approaches,” IEEE Trans. Comput.-Aided Des. Integr.

Circuits Syst., vol. 25, no. 9, pp. 1763–1776, Sep. 2006.

[98] S.-C. Lin and K. Banerjee, “Cool chips: Opportunities and implications for power

and thermal management,” IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 245–

255, Jan. 2008.

[99] H. Hamann, A. Weger, J. Lacey, Z. Hu, and P. Bose, “Hotspot-limited

microprocessors: Direct temperature and power distribution measurements,”

IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 56–65, Jan. 2007.

[100] Advantest Technical Documentation, SmarTest 6.5.4 User Guide.

[101] Vittoz, E.; Fellrath, J., "CMOS analog integrated circuits based on weak inversion

operations," Solid-State Circuits, IEEE Journal of , vol.12, no.3, pp.224,231, Jun

1977.


[102] Wang, A.; Chandrakasan, A., "A 180-mV subthreshold FFT processor using a

minimum energy design methodology," Solid-State Circuits, IEEE Journal of ,

vol.40, no.1, pp.310,319, Jan. 2005.

[103] C. Fenouillet-Beranger, et al., “Fully-depleted SOI technology using high-k and single-metal gate for 32nm node LSTP applications featuring 0.179µm² 6T-SRAM bitcell,” in Electron Devices Meeting, 2007. IEDM 2007. IEEE International, pp. 267-270, 10-12 Dec. 2007.

[104] B. Pelloux-Prayer, M. Blagojević, S. Haendler, A. Valentian, A. Amara and P.

Flatresse, "Performance analysis of multi-VT design solutions in 28nm UTBB FD-

SOI technology," 2013 IEEE SOI-3D-Subthreshold Microelectronics Technology

Unified Conference (S3S), Monterey, CA, 2013, pp. 1-2.

[105] F. Abouzeid, et. al, “28nm CMOS, energy efficient and variability tolerant,

350mV-to-1.0V, 10MHz/700MHz, 252bits frame error-decoder,” ESSCIRC

(ESSCIRC), 2012 Proceedings of the , vol., no., pp.153,156, 17-21 Sept. 2012.

[106] F. Abouzeid, A. Bienfait, K. C. Akyel, S. Clerc, et al., “Scalable 0.35v to 1.2v SRAM

Bitcell design from 65nm CMOS to 28nm FDSOI,” ESSCIRC (ESSCIRC), 2013

Proceedings of the , vol., no., pp.205,208, 16-20 Sept. 2013

[107] Aeroflex Gaisler AB, 2012. Available Online:

http://www.gaisler.com/index.php/products/processors

[108] N. Ickes, G. Gammie, M. Sinangil, R. Rithe, et al., “A 28 nm 0.6 v low power DSP for

mobile applications,” Solid-State Circuits, IEEE Journal of , vol.47, no.1, pp.35,46,

Jan. 2012.

[109] N. Ickes, Y. Sinangil, F. Pappalardo, E. Guidetti, and A. Chandrakasan, “A 10 pj/cycle

ultra-low-voltage 32-bit microprocessor system-on-chip,” ESSCIRC (ESSCIRC),

2011 Proceedings of the , vol., no., pp.159,162, 12-16 Sept. 2011.

[110] D. Bol, et al., “A 25MHz 7 uW/MHz ultra-low-voltage microcontroller soc in 65nm

LP/GP CMOS for low- carbon wireless sensor nodes,” Solid-State Circuits

Conference Digest of Technical Papers (ISSCC), 2012 IEEE International , vol., no.,

pp.490,492, 19-23 Feb. 2012.

[111] J. Kwong, et al., “A 65nm sub-vt microcontroller with integrated sram and

switched-capacitor dc-dc converter,” Solid-State Circuits Conference, 2008. ISSCC

2008. Digest of Technical Papers. IEEE International , vol., no., pp.318,616, 3-7 Feb.

2008.

[112] R. Wilson, E. Beigne, P. Flatresse, A. V. F. Abouzeid, et al., “A 460MHz at 397mV, 2.6GHz at 1.3V, 32b VLIW DSP, embedding Fmax tracking,” Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International, pp. 452-453, 9-13 Feb. 2014.

[113] R. P. Weicker. (1988, May) dhrystone test suite. Siemens AG. Available Online:

http://en.wikipedia.org/wiki/Dhrystone

[114] A. O. L. Atkin and D. J. Bernstein. Sieve of Atkin. Available Online: http://en.wikipedia.org/wiki/Sieve_of_Atkin
