
978-1-4799-5296-0/14/$31.00 © 2014 IEEE

PROC. 29th INTERNATIONAL CONFERENCE ON MICROELECTRONICS (MIEL 2014), BELGRADE, SERBIA, 12-14 MAY, 2014

Low Power Nonlinear MIN/MAX Filters Implemented in the CMOS Technology

Rafał Długosz, Andrzej Rydlewski, Tomasz Talaska

Abstract—A novel, binary-tree, asynchronous, nonlinear Min/Max filter is presented in the paper. In the proposed circuit an input signal (a current in this case) is first sampled in a circular delay line controlled by a multiphase clock (8 phases in this case). In the next stage particular samples are converted to 1-bit signals with delays inversely proportional to the values of these samples. In the following step the delay times are compared in a digital binary-tree structure. The circuit has been simulated in the TSMC CMOS 0.18 µm process. It offers a precision of 99.5% at a data rate of 2.5 MSamples/s and an energy consumption of 0.3 – 1 pJ per input.

I. Introduction

The Max and the Min functions, often referred to as the winner-takes-all (WTA) and loser-takes-all (LTA) operations, respectively, are useful in many applications. Such operations are used, for example, in artificial neural networks (ANNs) and in signal and image processing [1]. In competitive learning, which is common in some types of ANNs, the Min function is used to determine which of the neurons is located closest to a given learning pattern. In typical signal processing, on the other hand, both Min and Max circuits are used as nonlinear filters. Such filters are used, for example, to enhance the signal or to correct the shapes of objects in pictures. The Min/Max filters may be combined to perform more complex tasks, e.g. the morphological dilation and erosion smoothing operations commonly used in image processing [2], [3], [4].

R. Długosz is with the University of Technology and Life Sciences, Faculty of Telecommunication and Electrical Engineering, ul. Kaliskiego 7, 85-796, Bydgoszcz, Poland, and with the Swiss Federal Institute of Technology (EPFL), Institute of Microtechnology, Rue de la Maladiere 71b, CH-2000, Neuchatel, Switzerland, E-mail: [email protected]

A. Rydlewski is with Alcatel-Lucent, Coldra Woods, Chepstow Rd, Newport NP18 2YB, E-mail: [email protected]

T. Talaska is with the University of Technology and Life Sciences, Faculty of Telecommunication and Electrical Engineering, ul. Kaliskiego 7, 85-796, Bydgoszcz, Poland, E-mail: [email protected]

A large similarity exists between the nonlinear Min/Max filters and the LTA/WTA circuits. In both cases the core circuit performs the same task, which relies on searching for either the minimum or the maximum signal among a set of input signals. In ANNs all input signals are usually independent, as they come from independent neurons distributed over the input data space. The Min/Max filters, which are in the scope of our interest in this work, instead process delayed samples of the same signal stored in a delay line, as shown in Fig. 1. A classic delay line is shown in Fig. 1 to illustrate the idea, but in such a line signal samples have to be rewritten many times between analog memory cells, which is a source of errors. To avoid this problem we have used a circular delay line [5].
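The filtering idea described above, a Min or Max taken over delayed samples of one signal, can be illustrated in software. The following is a minimal behavioural sketch, not a model of the circuit: the function name `minmax_filter`, the causal window and the test signal are assumptions made for illustration only.

```python
from collections import deque

def minmax_filter(signal, window, mode="max"):
    """Sliding Min or Max over the most recent `window` samples.

    Software stand-in for the delay line of Fig. 1: the deque holds
    x(n), x(n-1), ..., and the oldest sample drops out automatically.
    """
    pick = max if mode == "max" else min
    line = deque(maxlen=window)
    out = []
    for x in signal:
        line.append(x)          # newest sample enters the line
        out.append(pick(line))  # Min/Max block output for this step
    return out

signal = [1, 3, 2, 5, 4, 1, 0, 2]          # illustrative data only
dilated = minmax_filter(signal, 3, "max")  # flat-element dilation
eroded = minmax_filter(signal, 3, "min")   # flat-element erosion
print(dilated)  # [1, 3, 3, 5, 5, 5, 4, 2]
print(eroded)   # [1, 1, 1, 2, 2, 1, 0, 0]
```

With a flat structuring element, the Max filter corresponds to dilation and the Min filter to erosion, so cascading the two calls gives the combined smoothing operations mentioned above.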

Various Min/Max circuits have been reported in the literature, but two general types of architectures can be clearly distinguished. In the first group the Min/Max circuits are usually based on the current conveyor (CC) architecture [6], [7], [8]. In this case all input signals (either currents or voltages) are compared in a single stage. Such circuits usually feature a simple structure but suffer from limited accuracy, which decreases as the number of inputs increases [9], [7]. This problem results mostly from the so-called ‘corner error’, which occurs when two or more input signals have similar values. In this case an average value of these signals appears at the output of the circuit.

The second group makes use of the binary tree (BT) concept. In this case the competition between the input signals is conducted on particular layers of the tree. The number of layers equals log2 M, where M is the number of inputs. Signals at particular layers compete in pairs, and only one winning signal from each pair is allowed to take part in the competition at the next layer of the tree [10], [11], [7]. Such circuits are usually more complex, but if precise comparators are used they are able to properly distinguish signals that differ by very small amounts. In typical BT solutions analog signals at the outputs of particular layers are determined (calculated or copied) on the basis of the signals provided from previous layers [10], [7], [12], which can be a source of errors [13] that accumulate at the top of the tree.
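The pairwise competition over log2 M layers can be sketched as a knockout tournament. The code below is an illustrative software analogue only: the function name `bt_max`, the power-of-two restriction and the tie rule are assumptions, and it models the generic BT idea rather than any particular circuit from the cited works.

```python
def bt_max(values):
    """Find the maximum by pairwise knockout.

    Returns (address, value, layers), where `layers` is the number of
    tree layers that were needed.
    """
    m = len(values)
    # Assumption of this sketch: M is a power of two, as in an 8-input tree.
    assert m > 0 and m & (m - 1) == 0
    layer = list(enumerate(values))  # (address, value) pairs
    layers = 0
    while len(layer) > 1:
        survivors = []
        for a, b in zip(layer[0::2], layer[1::2]):
            # Only the winner of each pair moves on; a tie goes to the first input.
            survivors.append(a if a[1] >= b[1] else b)
        layer = survivors
        layers += 1
    address, value = layer[0]
    return address, value, layers

print(bt_max([0.4, 1.9, 0.7, 1.2, 1.5, 0.1, 0.3, 0.8]))  # (1, 1.9, 3)
```

For M = 8 the loop runs three times, matching log2 8 = 3, and performs M - 1 = 7 pairwise comparisons in total.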

The proposed circuit is based on the BT approach, which enables an unambiguous selection of only one – Min or Max – signal, which is an advantage. In contrast to typical solutions of this type, copying of analog signals between layers has been eliminated in the proposed circuit, which strongly reduces the errors described above.

II. The proposed Min / Max binary-tree nonlinear filter

The proposed nonlinear binary-tree filter, shown in Fig. 2, is composed of several blocks or groups of elements, presented in Fig. 3. The input current Iin is copied M times using the input multiple-output current mirror (CM), shown in Fig. 3(a). In this way each branch receives a separate copy of the input signal, and thus data processing in particular branches is independent of the others. This signal is then sampled and stored in sample&hold memory elements (S&H), shown in Fig. 3(b). To compensate for the charge injection effect across the capacitor, which is typical in this case, we have used so-called dummy switches (swD) controlled by clock signals of opposite polarity in comparison with the memory switches (swM).

[Figure 1: a delay line of T elements holding the samples x(n), x(n−1), x(n−2), ..., x(n−l) feeds a Min / Max block that outputs max[x(n)] or min[x(n)].]

Fig. 1. The idea of the nonlinear filtering

Fig. 2. Block diagram of the proposed Min / Max filter – Binary-tree solution

Output signals from the S&H elements, denoted as I′in,i, are provided to current-to-time converters (ITC), shown in Fig. 3(c), that convert them into binary 1-bit flag (F) signals. The flag signals occur at the outputs of particular ITCs with delays inversely proportional to the values of the signal samples, since a larger current charges the capacitor faster. Each ITC block is composed of a PMOS cascoded current mirror, a capacitor with a reset function, and two NOT gates. The speed of charging of the capacitor is proportional to the value of the I′in,i signal. The NOT gates change their logical states when the voltage across the capacitor, C, reaches a value of about VDD/2. The cascoded CMs have been used to increase the accuracy of the copying operations. To minimize the offsets that can be introduced by the NOT gates, transistors in these elements have been oversized.

Fig. 3. Components of the proposed Min/Max filter: (a) Input current mirror, (b) sample&hold memory element (S&H), (c) Current to time converter (ITC), (d) Delay comparator (DCMP) – the Max mode, (e) address determination block (ADET)

Fig. 4. Transistor level simulations of a single DCMP circuit.

To avoid multiple read and write operations of particular signal samples, which are the source of large errors in a classical delay line, we have used the circular delay line. In this approach particular samples are not rewritten between memory cells but remain in particular cells until they are replaced by new samples after M clock cycles. A disadvantage of this solution is a more complex (M-phase) controlling clock, but the circuit precision is the more important parameter.
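The difference between a classic and a circular delay line can be made concrete by counting how often stored samples are written. The sketch below is a behavioural illustration under stated assumptions: the class names, the write counters and the integer test samples are hypothetical and do not model the analog S&H cells themselves.

```python
class ShiftDelayLine:
    """Classic delay line: each new sample shifts every stored sample by one cell."""

    def __init__(self, m):
        self.cells = [None] * m
        self.writes = 0  # counts cell write operations, start-up phase included

    def sample(self, x):
        for k in range(len(self.cells) - 1, 0, -1):
            self.cells[k] = self.cells[k - 1]  # sample rewritten into the next cell
            self.writes += 1
        self.cells[0] = x
        self.writes += 1


class CircularDelayLine:
    """Circular delay line: clock phase i writes cell i, samples never move."""

    def __init__(self, m):
        self.cells = [None] * m
        self.phase = 0   # index of the currently active clock phase
        self.writes = 0

    def sample(self, x):
        self.cells[self.phase] = x  # overwrite the oldest sample only
        self.phase = (self.phase + 1) % len(self.cells)
        self.writes += 1


M = 8
shift, circ = ShiftDelayLine(M), CircularDelayLine(M)
for n in range(16):
    shift.sample(n)
    circ.sample(n)

print(shift.writes, circ.writes)                  # 128 16
print(sorted(shift.cells) == sorted(circ.cells))  # True
```

Both lines hold the same last eight samples, but in the shifted line each sample is written M times during its lifetime, whereas in the circular line it is written once. In an analog memory every rewrite adds error, which is the motivation given above for accepting the M-phase clock.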


Fig. 5. Simulations of the circular delay line with S&H memory elements. From top to bottom: (1) an example input current with an amplitude of 2 µA, (2) controlling clock signals (8 phases), (3) signal samples stored in particular S&H cells (voltages across the CST capacitors), and (4) the supply current.

The following block in the data processing chain is the delay comparator (DCMP), shown in Fig. 3(d), which compares the delays of particular flag signals. This circuit is built on the basis of an RS flip-flop (RSFF) that is able to distinguish very small (at the level of 3 – 5 ns) differences between delays. Depending on the mode of the filter (Min or Max), either the smaller or the larger of the two input signals becomes the winner, which the DCMP block signals by two digital signals, o1 and o2. In the proposed BT circuit the determination of the winning signal is based on the competition performed at particular layers of the tree. To make this possible, DCMP blocks provide an additional signal, the flag (F) of a given pair, that takes part in the competition at the following layer of the tree. Between the input F11 and F12 signals and the output F signal only one OR gate is used. As a result, as soon as one input flag occurs, a given DCMP immediately (with a delay below 0.5 ns) sends the flag F to the next layer of the tree.

The o1 and o2 signals are used by the ADET (address determination) block, shown in Fig. 3(e), to determine the address of the winning signal. The o1 and o2 signals always have values that enable the ADET block to indicate the winning signal unambiguously.

Unfortunately, the problem with the RSFF is that it can hang (‘0.5’ states at both outputs) when two input flags arrive at almost the same time, i.e. when the corresponding input currents are almost equal. In this case the values at both outputs of the RSFF are equal to about VDD/2. To avoid ambiguity in this case, a simple hierarchy mechanism has been introduced that is able to recognize the ‘0.5’ states. In such situations the circuit arbitrarily decides that one of the input signals obtains the status of the ‘winner’.

The arbitrary mechanism is based on asymmetrical NOT (NOTn and NOTp) gates. The gates have different threshold voltages, obtained through proper transistor sizing. These voltages are equal to 0.25 · VDD/2 and 0.75 · VDD/2 for the NOTp and the NOTn gates, respectively. When the RSFF hangs, the gates provide different output signals, which is detected by the XOR gate. This gate, through the configuration switches (controlled by the ‘swn’ and ‘swp’ signals), controls the values of the o1 and o2 output signals. In this case the circuit arbitrarily connects the outputs of the RSFF to the VDD (‘1’) and VSS (‘0’) supplies. As both analog input signals are almost equal in this case (difference < 0.2%), such an operation does not introduce a substantial error. Additionally, it is worth noting that the ‘0.5’ states occur very seldom in practice.
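The chain from sampled currents to a winning address in the Max mode can be sketched behaviourally. This is a simplified model under several assumptions that are not taken from the paper: the ITC capacitance of 100 fF, the hang window `t_hang`, the convention that o1 = 1 marks the first input as the winner, and all function names are hypothetical. Only the VDD/2 threshold (0.9 V at VDD = 1.8 V) and the 0.5 ns OR-gate delay bound come from the text.

```python
def itc_delay(i_in, c=100e-15, v_th=0.9):
    """Time for a constant current i_in to charge c from 0 V to v_th.

    c is an assumed value; v_th is about VDD/2 for VDD = 1.8 V.
    """
    return c * v_th / i_in


def dcmp_max(t1, t2, t_hang=0.1e-9, t_or=0.5e-9):
    """One delay comparator in the Max mode: the earlier flag wins.

    Returns (o1, o2, pair_flag_time).
    """
    if abs(t1 - t2) < t_hang:
        o1, o2 = 1, 0  # RSFF would hang: arbitrary decision for input 1
    else:
        o1, o2 = (1, 0) if t1 < t2 else (0, 1)
    # The OR gate forwards the pair flag as soon as the first flag arrives.
    return o1, o2, min(t1, t2) + t_or


def bt_max_address(currents):
    """Return (0-based address of the largest current, time of the top flag)."""
    flags = [itc_delay(i) for i in currents]
    decisions = []  # o2 bits of every comparator, one list per layer
    while len(flags) > 1:
        o2_bits, next_flags = [], []
        for t1, t2 in zip(flags[0::2], flags[1::2]):
            _, o2, f = dcmp_max(t1, t2)
            o2_bits.append(o2)
            next_flags.append(f)
        decisions.append(o2_bits)
        flags = next_flags
    # Address determination: follow the winning side from the top layer down.
    idx = 0
    for o2_bits in reversed(decisions):
        idx = 2 * idx + o2_bits[idx]
    return idx, flags[0]


# Illustrative currents in amperes; inputs 7 and 8 are equal on purpose.
currents = [1.0e-6, 2.0e-6, 1.5e-6, 3.0e-6, 2.5e-6, 1.2e-6, 4.0e-6, 4.0e-6]
address, t_top = bt_max_address(currents)
print(address)  # 6, i.e. the seventh input
```

Because the two largest currents are equal, their flags arrive together and the arbitration branch picks the first of them, so exactly one address is still reported. The sketch covers the Max mode only, where forwarding the earliest flag through the OR gate is sufficient.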

A. Simulations of the proposed circuit

The proposed circuit has been tested in several stages. At the beginning a single DCMP circuit was thoroughly verified, with special emphasis put on the arbitrary mechanism. Illustrative results are shown in Fig. 4. The RS out1 and RS out2 signals can be either in a typical state (A), in which their values are ‘0’ or ‘1’, or in the ‘0.5’ state (B), which is not desired. In case B (e.g. in the range from 47 to 52 µs) the outputs of the asymmetrical NOTn and NOTp gates have different values. This state is detected by the XOR gate, which signals it by changing the values of the ‘swp’ and ‘swn’ signals. As a result, the values of o1 and o2 are arbitrarily set to ‘1’ and ‘0’, respectively.

After verifying the DCMP block we tested the performance of the overall filter, composed of a delay line with 8 memory cells and the BT block with three layers (log2 8). The results are presented in Figs. 5 and 6. Fig. 5 illustrates the operation of the delay line. An example input current (a sine waveform with f = 10 kHz and an amplitude of 2 µA) is sampled and held in the memory cells (CST = 400 fF). Each sample remains in a given cell for 8 subsequent clock cycles. The average supply current during a single detection cycle of the winning signal equals 70 µA, which means that the average power dissipation equals 126 µW (at VDD = 1.8 V). As can be seen in Fig. 6, a typical detection time of the winning signal does not exceed 20 ns. In the worst-case scenario (low values of ca. 1 µA at all inputs) the detection time equals 80 ns. The energy consumption per detection cycle, per input, equals 1.26 pJ in the worst case and 0.32 pJ in typical cases. For comparison, in other circuits of this type the energy consumption is usually higher (43 pJ in [10], and about 14 pJ in [7] and [13]).

Fig. 6. Simulations of the BT block composed of the DCMP circuits. From top to bottom: (1) VC voltages in the ITC circuits, (2) resultant flag signals, and (3) addresses of the samples that in a given time period have the maximum values.

A full detection cycle performed in the BT block is shown in Fig. 6. The top panel presents voltages across the C capacitors in particular ITC blocks. This phase is preceded by resetting the capacitors. After the Reset function is released, the capacitors are charged from 0 to VDD by currents whose values are stored in the corresponding S&H elements. The middle panel presents the resultant delays of particular flags, while the bottom panel presents the addresses of the winning signal (max in this case). Three detection cycles are shown in Fig. 6. In the first case the I7 and I8 signals are equal and both corresponding flags occur at the same time. The arbitrary mechanism selected one of these signals as the winner. The next two cycles are typical cases. The detection time varies in these cases between 5 and 20 ns. In most cases we expect this time to be below 10 – 20 ns.
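The power and energy figures quoted in this subsection for the 8-input filter can be checked with a short calculation. The sketch below assumes that the energy per input is the average power multiplied by the detection time and divided by the number of inputs; this relation is an inference that reproduces the quoted numbers, not a formula stated in the text, and the names are illustrative.

```python
VDD = 1.8       # supply voltage in volts
I_AVG = 70e-6   # average supply current during a detection cycle, in amperes
M = 8           # number of filter inputs


def energy_per_input(t_detect):
    """Energy per detection cycle and per input, in joules (assumed relation)."""
    power = VDD * I_AVG  # average power, about 126 µW
    return power * t_detect / M


print(VDD * I_AVG * 1e6)              # power in µW, about 126
print(energy_per_input(80e-9) * 1e12) # worst case in pJ, about 1.26
print(energy_per_input(20e-9) * 1e12) # typical case in pJ, about 0.315
```

The typical-case result of about 0.315 pJ rounds to the 0.32 pJ given above, and the worst case reproduces 1.26 pJ.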

III. Conclusions

A novel nonlinear filter has been proposed. The circuit is based on the binary tree concept but, in contrast to typical solutions of this type, the BT block used is a parallel and asynchronous digital circuit. As a result, it is much faster than its analog counterparts. The circuit offers a precision exceeding 99.5%, but a future laboratory confirmation of these results is necessary, as noise can influence the results.

References

[1] M. Vemis, G. Economou, S. Fotopoulos, A. Khodyrev, “The Use of Boolean Functions and Logical Operations for Edge Detection in Images”, Signal Processing, Vol. 45, 1995, pp. 161–172.

[2] R. A. Araujo, A. L. I. Oliveira, S. Soares, S. Meira, “Designing Dilation-Erosion Perceptrons with Differential Evolutionary Learning for Air Pressure Forecasting”, Proceedings of International Joint Conference on Neural Networks, San Jose, California, USA, 2011, pp. 595–602.

[3] P. T. Jackway, M. Deriche, “Scale-Space Properties of the Multiscale Morphological Dilation-Erosion”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 1, 1996, pp. 38–51.

[4] J. (Yossi) Gil, R. Kimmel, “Efficient dilation, erosion, opening, and closing algorithms”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, Iss. 12, December 2002, pp. 1606–1617.

[5] S. J. Orfanidis, Introduction to Signal Processing, previously published by Pearson Education, Inc., 1996–2009 by Prentice Hall, Inc., previous ISBN 0-13-209172-0.

[6] W. W. Moses, E. Beuville, M. H. Ho, “A Winner-Take-All IC for determining the crystal of interaction in PET detectors”, IEEE Transactions on Nuclear Science, Vol. 43, No. 3, 1996, pp. 1615–1618.

[7] A. Demosthenous, S. Smedley, J. Taylor, “A CMOS analog Winner-Takes-All network for large-scale applications”, IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, Vol. 45, No. 3, 1998, pp. 300–304.

[8] J. Ramirez-Angulo, J. E. Molinar-Solis, S. Gupta, R. G. Carvajal, A. J. Lopez-Martin, “A high-swing, high-speed CMOS WTA using differential flipped voltage followers”, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 54, No. 8, 2007, pp. 668–672.

[9] T. Serrano, B. Linares-Barranco, “A modular current-mode high-precision winner-take-all circuit”, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 42, No. 2, 1995, pp. 132–134.

[10] K. Wawryn, B. Strzeszewski, “Current mode AB class WTA circuit”, in: The IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2001, pp. 293–296.

[11] G. T. Tuttle, S. Fallahi, A. A. Abidi, “An 8-b CMOS vector A/D converter”, IEEE International Solid-State Circuit Conference (ISSCC), San Francisco, USA, 1993, pp. 38–39.

[12] R. Dlugosz, T. Talaska, R. Wojtyna, “New binary-tree-based Winner-Takes-All circuit for learning on silicon Kohonen’s networks”, Int. Conf. on Signals and Electronic Systems (ICSES), Lodz, Poland, 2006, pp. 441–446.

[13] B. Tomatsopoulos, A. Demosthenous, “Low power, low complexity CMOS multiple-input replicating current comparators and WTA/LTA circuits”, European Conference on Circuit Theory and Design (ECCTD), Vol. 3, No. 28, Cork, Ireland, 2005, pp. 241–244.
