

Embedded Tutorial: Breaking the Dynamic Power Barrier using Distributed-LC Resonant Clocking

Matthew R. Guthaus

Department of Computer Engineering

University of California Santa Cruz

Santa Cruz, CA 95064

[email protected]

Abstract—Power consumption is the predominant challenge in modern high-performance systems and is becoming increasingly important due to mobile applications. High power consumption leads to decreased battery lifetime, cooling challenges that limit form factors, and high on-chip temperatures that can decrease reliability. While stand-by and idle power can be minimized with many techniques, active-mode dynamic power consumption during peak operation continues to be a formidable barrier. Reducing the clock power consumption can significantly reduce overall active-mode dynamic power. This embedded tutorial explores recent commercial and academic results in the design and optimization of distributed-LC resonant clocks and discusses future challenges and open research problems.

I. INTRODUCTION

Power consumption continues to be one of the major concerns in VLSI design, especially due to the increasing demand for mobile applications. Clock gating, power gating, dynamic voltage and frequency scaling (DVFS), and multiple threshold voltages, among other techniques, are used to reduce dynamic and leakage power. However, this power is usually saved by exploiting inactivity or by adjusting performance to minimally meet demand. Even after these techniques are applied, significant dynamic power is still required to perform computation. This required amount of computation power is the dynamic power barrier.

Many high-performance applications have active circuits that cannot be further slowed or turned off without sacrificing performance. This is especially problematic in highly parallel many-core and GPGPU designs, where high activity during peak throughput demands the most power and the chips often have unsustainable levels of power consumption. While previous methods can save power outside of this peak demand, new methods to lower active-mode dynamic power consumption are badly needed. Resonant clocking is gaining acceptance as a technique to address this.

On-chip clock distribution networks (CDNs) often consume in excess of 35% of total chip power and occasionally as much as 70%. Resonant clock distributions can reduce this power by recycling energy on-chip, lowering the overall clock power beyond what traditional circuit and physical design techniques achieve [1]–[3]. This embedded tutorial introduces recent techniques for distributed-LC resonant clock distributions for both circuit designers and CAD engineers. It begins with the basic distributed-LC resonant circuit techniques and how these are applied to form resonant clock networks. It then gives an overview of the recent resonant-clocked chips prototyped by IBM and AMD and discusses the state-of-the-art synthesis algorithms for resonant clocks in ASIC methodologies. Finally, it discusses the remaining challenges in the research of distributed-LC resonant clocks.

II. RESONANT CLOCK CLASSIFICATION

There are three distinctly different approaches to creating “resonant” clocks: standing wave (salphasic) [4], [5], traveling wave (rotary) [6], and LC-tank resonant [7]–[18] clock distributions. Prior embedded tutorials have compared and contrasted these three types of clock distributions [19], but this tutorial focuses on the last of them. Standing and traveling wave clocks assume that on-chip inductance forms transmission lines, whereas LC-tank resonant clocks assume lumped inductors. LC-tank resonant clocks can further be distinguished as monolithic or distributed. Monolithic LC-tank clock distributions use a single tank circuit, whereas distributed LC-tank clocks address the distributed parasitic interconnect in large modern chips.

III. MONOLITHIC AND DISTRIBUTED LC-TANK RESONANT CLOCKS

LC-tank resonant clocks are similar to conventional clocks in that they provide a clock signal with constant phase and magnitude, so they are easier to integrate into a traditional IC flow than standing and traveling wave clocks. However, LC-tank resonant clocks require additional passive components to form the electrical resonance that reduces power consumption. This electrical resonance occurs when the imaginary (reactive) parts of the impedance cancel out at a particular “resonant” frequency. Resonance has been used extensively in the analog and radio frequency (RF) fields to build high-quality passive filter networks.
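As a brief aside, the cancellation of reactive parts can be written out for an ideal lumped inductor and capacitor. The symbols $L$, $C$, $\omega$, and $f_0$ are introduced here for illustration and are not part of the original text; losses are ignored, so this is a sketch of the idealized case only.

```latex
% Ideal (lossless) lumped elements; L, C, \omega and f_0 are illustrative symbols.
% Requires amsmath for the align environment.
\begin{align}
  Z_L &= j\omega L, \qquad Z_C = \frac{1}{j\omega C}, \\
  Y_{\mathrm{parallel}} &= \frac{1}{Z_L} + \frac{1}{Z_C}
      = j\left(\omega C - \frac{1}{\omega L}\right), \\
  Y_{\mathrm{parallel}} = 0 \;&\Longleftrightarrow\;
      \omega_0 = \frac{1}{\sqrt{LC}},
      \qquad f_0 = \frac{\omega_0}{2\pi} = \frac{1}{2\pi\sqrt{LC}} .
\end{align}
```

At the same $\omega_0$ the series combination $Z_L + Z_C = j\left(\omega L - 1/(\omega C)\right)$ is zero, so a single frequency describes both the parallel and the series case under these assumptions.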

Both monolithic and distributed LC-tank resonant clocks modify the clock impedance by inserting inductor-capacitor (LC) “tank” circuits to cancel the aforementioned reactive portion of the clock impedance. In a parallel LC tank, this ideally leaves an infinite electrical impedance at the resonant frequency, while in a series LC tank it ideally leaves zero electrical impedance. Increasing the clock network impedance allows clock driver transistor sizes to be dramatically reduced, which saves both power and silicon area. In addition, the oscillatory behavior of an LC-tank circuit recovers some of the energy as it is transferred back and forth between the clock/decoupling capacitance and the inductor. LC-tank resonant clocks have been shown to save between 30% and 80% of dynamic power compared to buffered clocks [8], [16], [23].
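The impedance argument above can be checked numerically with a minimal sketch. It assumes a single lumped clock capacitance in parallel with a lossy inductor (an inductance in series with a small resistance), which is a simplification of any real distributed network. The function names and all component values are hypothetical and chosen only for illustration; none of them come from the cited designs.

```python
import math

def tank_inductance(f0_hz, c_farad):
    """Inductance that resonates with c_farad at f0_hz (ideal lumped LC)."""
    return 1.0 / ((2.0 * math.pi * f0_hz) ** 2 * c_farad)

def parallel_tank_impedance(f_hz, l_henry, c_farad, r_series_ohm):
    """Impedance of C in parallel with a lossy inductor (L in series with R)."""
    w = 2.0 * math.pi * f_hz
    z_l = complex(r_series_ohm, w * l_henry)   # lossy inductor branch
    z_c = 1.0 / complex(0.0, w * c_farad)      # clock capacitance branch
    return z_l * z_c / (z_l + z_c)

# Illustrative values only (assumptions, not taken from the paper).
C_CLK = 100e-12   # lumped clock load capacitance, farads
F_CLK = 2e9       # target clock frequency, hertz
R_IND = 0.1       # inductor series resistance, ohms

L_TANK = tank_inductance(F_CLK, C_CLK)
print(f"inductance for resonance: {L_TANK * 1e12:.1f} pH")

# Compare the load seen by the driver below, at, and above resonance.
for scale in (0.5, 1.0, 2.0):
    z = parallel_tank_impedance(scale * F_CLK, L_TANK, C_CLK, R_IND)
    print(f"{scale:.1f} x f0: |Z| = {abs(z):.2f} ohm")
```

With a finite series resistance the impedance at resonance is large but not infinite, which is the non-ideal counterpart of the "ideally infinite" parallel case described in the text.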

Low-power, monolithic voltage-controlled oscillators (VCOs) have used similar techniques for many years to create on-chip clock references [20], [21]. Early adoption of these techniques resulted in monolithic LC-tank clocks [7], [10], [11]. Recently, monolithic LC tanks have been simulated using additional harmonics to improve the sinusoidal slew rates [22]. The monolithic concepts were extended to distributed LC-tank clocks in order to address performance scalability and harness improved power efficiency. Industry has demonstrated distributed-LC resonant clocks for uniform H-trees [8], [9] and clock grids [18], [23].

978-1-4799-0524-9/13/$31.00 ©2013 IEEE

On the design tools side, several methodologies have been proposed to synthesize global H-tree resonant clocks [12], asymmetric global resonant clock trees [13], hierarchical local resonant clocks [17], and resonant clock grids using custom inductor sizing [14], [16] and inductor-library-based sizing [15]. At least one company has a commercial product [24] that supports LC-tank resonant clocking in commercial designs [18], [23].

IV. CHALLENGES AND OPPORTUNITIES

Numerous challenges remain as barriers to the acceptance of resonant clocking in most designs. These include:

• increased resource consumption by inductors and decoupling capacitances,
• integration with existing clock gating techniques,
• difficulty in wide-range frequency scaling for DVFS,
• increased clock slew,
• difficulty generating mid-frequency (1–2 GHz) clocks, and
• integration into existing design tools and methodologies.

However, solutions to the above problems have begun to emerge in recent years. This tutorial discusses each of these issues and presents interim solutions to many of them.
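The DVFS item in the list above can be illustrated with a small sketch. It assumes an idealized parallel RLC tank driven sinusoidally, with a fixed inductor sized for a frequency `f0_hz` and a constant loss conductance. It compares the magnitude of the current the driver must supply with the inductor attached against the current needed for the bare clock capacitance. The function name, the quality factor, and the frequency points are hypothetical; real clock waveforms and drivers differ from this model.

```python
import math

def drive_current_ratio(f_hz, f0_hz, q_at_f0):
    """|Y_tank| / |Y_cap| for an idealized parallel RLC tank.

    Admittances are normalised to w0*C, with x = f / f0:
      tank conductance  = 1 / Q        (assumed constant with frequency)
      tank susceptance  = x - 1 / x    (capacitor minus inductor)
      bare capacitance  = x
    A ratio below 1 means the inductor reduces the drive current.
    """
    x = f_hz / f0_hz
    tank = math.hypot(1.0 / q_at_f0, x - 1.0 / x)
    return tank / x

F0 = 2e9   # frequency the inductor was sized for (illustrative assumption)
Q = 5.0    # tank quality factor at F0 (illustrative assumption)

# Sweep the operating frequency downward, as a DVFS controller might.
for scale in (1.0, 0.9, 0.7, 0.5):
    ratio = drive_current_ratio(scale * F0, F0, Q)
    print(f"{scale:.1f} x f0: drive current ratio = {ratio:.2f}")
```

In this model the benefit is confined to a band around the design frequency, and well below it the fixed inductor increases the drive current instead of reducing it, which is one way to see why wide-range frequency scaling is listed as a challenge.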

In particular, designers' skepticism toward a multi-disciplinary methodology that requires knowledge of analog/RF circuits, digital design, and computer-aided design is problematic. Because of this, research into design tools and methodologies is the biggest challenge in the full-scale adoption of these technologies in mainstream ICs. The synthesis of a resonant clock distribution network that synchronizes an entire system, complete with global and local clock networks, is a gargantuan task, but it is of the utmost importance for future low-power needs and for the adoption of such a promising technique.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation under grant CCF-1053838.

REFERENCES

[1] M. R. Guthaus, G. Wilke, and R. Reis, “Non-uniform clock mesh optimization with linear programming buffer insertion,” in ACM/IEEE Design Automation Conference (DAC), June 2010, pp. 74–79.

[2] M. Guthaus, X. Hu, G. Wilke, G. Flache, and R. Reis, “High-performance clock mesh optimization,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 17, no. 3, pp. 33:1–33:17, June 2012.

[3] M. Guthaus, G. Wilke, and R. Reis, “Revisiting automated physical synthesis of high-performance clock networks,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 2, pp. 31:1–31:27, March 2013.

[4] F. O’Mahony, C. Yue, M. Horowitz, and S. Wong, “Design of a 10GHz clock distribution network using coupled standing-wave oscillators,” in ACM/IEEE Design Automation Conference (DAC), June 2003, pp. 682–687.

[5] V. Chi, “Salphasic distribution of clock signals for synchronous systems,” IEEE Transactions on Computers, vol. 43, no. 5, pp. 597–602, May 1994.

[6] J. Wood, T. C. Edwards, and S. Lipa, “Rotary traveling-wave oscillator arrays: A new clock technology,” IEEE Journal of Solid-State Circuits (JSSC), vol. 36, no. 11, pp. 1654–1664, November 2001.

[7] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, “Resonant clocking using distributed parasitic capacitance,” IEEE Journal of Solid-State Circuits (JSSC), vol. 39, no. 9, pp. 1520–1528, September 2004.

[8] S. Chan, P. Restle, K. Shepard, N. James, and R. Franch, “A 4.6GHz resonant global clock distribution network,” in IEEE International Solid-State Circuits Conference (ISSCC), February 2004, pp. 342–343.

[9] S. C. Chan, K. L. Shepard, and P. J. Restle, “Design of resonant global clock distributions,” in International Conference on Computer Design (ICCD), October 2003, pp. 248–253.

[10] C. Ziesler, S. Kim, and M. Papaefthymiou, “A resonant clock generator for single-phase adiabatic systems,” in IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2001, pp. 159–164.

[11] J.-Y. Chueh, M. Papaefthymiou, and C. Ziesler, “Two-phase resonant clock distribution,” in IEEE International Symposium on Very Large Scale Integration (ISVLSI), May 2005, pp. 65–70.

[12] J. Rosenfeld and E. Friedman, “Design methodology for global resonant H-tree clock distribution networks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 2, pp. 135–148, February 2007.

[13] M. R. Guthaus, “Distributed LC resonant clock tree synthesis,” in IEEE International Symposium on Circuits and Systems (ISCAS), May 2011, pp. 1215–1218.

[14] X. Hu and M. R. Guthaus, “Distributed resonant clock grid synthesis (ROCKS),” in ACM/IEEE Design Automation Conference (DAC), June 2011, pp. 516–521.

[15] X. Hu, W. Condley, and M. R. Guthaus, “Library-aware resonant clock synthesis (LARCS),” in ACM/IEEE Design Automation Conference (DAC), June 2012, pp. 145–150.

[16] X. Hu and M. Guthaus, “Distributed LC resonant clock grid synthesis,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2012.

[17] W. Condley, X. Hu, and M. Guthaus, “A methodology for local resonant clock synthesis using LC-assisted local clock buffers,” in International Conference on Computer-Aided Design (ICCAD), November 2011, pp. 503–506.

[18] V. Sathe, S. Arekapudi, C. Ouyang, M. Papaefthymiou, A. Ishii, and S. Naffziger, “Resonant clock design for a power-efficient high-volume x86-64 microprocessors,” in IEEE International Solid-State Circuits Conference (ISSCC), February 2012, pp. 68–70.

[19] M. Guthaus and B. Taskin, “High-performance, low-power resonant clocking: Embedded tutorial,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2012, pp. 742–745.

[20] R. M. Senger, E. D. Marsman, M. S. McCorquodale, F. H. Gebara, K. L. Kraver, M. R. Guthaus, and R. B. Brown, “A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference,” in ACM/IEEE Design Automation Conference (DAC), June 2003, pp. 520–525.

[21] E. D. Marsman, R. M. Senger, M. S. McCorquodale, M. R. Guthaus, R. A. Ravindran, G. S. Dasinka, S. A. Mahlke, and R. B. Brown, “A 16-bit low-power microcontroller with monolithic MEMS-LC clocking,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2005, pp. 624–627.

[22] H. Skinner, X. Hu, and M. R. Guthaus, “Harmonic resonant clocking,” in IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), 2012, pp. 59–64.

[23] V. Sathe, S. Arekapudi, A. Ishii, C. Ouyang, M. Papaefthymiou, and S. Naffziger, “Resonant-clock design for a power-efficient, high-volume x86-64 microprocessor,” IEEE Journal of Solid-State Circuits (JSSC), vol. 48, no. 1, pp. 140–149, 2013.

[24] http://www.cyclos-semi.com.
