
When Zero Picoseconds Edge Placement Accuracy Is Not Enough

John Cheng, Teradyne Inc.

1.0 Abstract

In the last ten years, test equipment suppliers have driven improvements in edge placement accuracy (EPA), taking it from ±225ps to sub-100ps through a combination of architectural improvements and new calibration technology. However, with the adoption of high speed source synchronous buses such as HyperTransport and RapidIO on high performance devices, it is no longer sufficient to look only at the tester EPA component in the overall timing budget. Although test system accuracy is still very important, error terms from the DUT must also be considered. The proposed methodology of device strobed comparators addresses both test system and device error terms.

2.0 Background

Microprocessor and PC system designers have long known that one of the major bottlenecks in achieving high system level performance is bandwidth ("It's the bandwidth, stupid", Dick Sites) [1]. Microprocessor core speed has been growing at a faster rate than Moore's Law. But without a high bandwidth bus, the processor is left executing wait cycles while off-chip data is being retrieved. Legacy I/O bus architectures such as PCI provide a maximum bandwidth of 512MB/sec (PCI-X at 1GB/sec) [2]. Because PCI is a shared bus architecture, it has certain inherent limitations.

In a shared bus architecture, the bandwidth of the bus is shared among several devices. Any other device that wants to send or receive data must wait for the arbitrator to complete a previous transaction before beginning another. The total available bandwidth of the bus is limited by the ability of one device to complete a transaction with another device in a given clock period. This means that the clock-to-data valid on the transmitting device and the setup time in the receiving device must be considered. Because there are well established software and hardware standards, PCI can be easily implemented and has gained wide acceptance not only in PC applications but also in embedded applications such as networking and communication systems, digital consumer electronics and information appliances.

In embedded systems such as networking and communication systems, system designers are facing a challenge similar to that of their PC counterparts. Data rates are reaching in excess of 1GHz to comply with Ethernet standards of 10 Gigabit and OC-192. Multimedia data, communications and compression algorithms, and routing and addressing databases all need to be processed at high speeds [3]. Currently there is the added complexity of the multitude of busses which need to talk to one another, as network processor, control plane processor, switch fabric chip and security processor vendors all have their own proprietary busses. Ideally, there would be an industry standard interconnect that makes it easy to link chips from different vendors while meeting the bandwidth requirement without straining tight pin counts [4].

RapidIO and HyperTransport are both point to point packet switched interconnects. In such a system, all devices are connected through multiple switching devices forming a common switching fabric network. In systems with more than two devices, any two devices can communicate at the same time with no effect on other devices. This means transactions can take place concurrently, which increases overall system bandwidth [5]. The RapidIO and HyperTransport protocols transmit a clock along with the data, with both being clocked from the same PLL (synchronized at the transmitter, hence the term source synchronous; see Figure 1). This has the advantage of reducing skew between clock and data [6]. In addition, clock periods much less than the flight time can be realized. This allows for more scalable topologies, including the ability to use switches to connect more devices.

Paper 41.2, p. 1134

ITC International Test Conference, 0-7803-7169-0/01 $10.00 © 2001 IEEE



Example of errors at transmitter [7]:
• Clock jitter
• Clock duty cycle variation
• Clock to data skew
• Ground bounce
• Threshold and delay mismatch of device output cells
• Simultaneous switching effects (crosstalk, di/dt, etc.)

Example of errors due to transmission path:
• Etch mismatch
• Clock to data skew

Example of errors at receiver:
• DLL jitter
• Common mode skew
• Clock to data skew
• Edge rate mismatch between clock and data output cells
• Receiver threshold mismatch
• Receiver Vref variation due to on-chip ...

[Block diagram: Transmitter drives Clk and Data to Receiver. Source synchronous system: data and clock are transmitted simultaneously, clocked from the same PLL. At the destination, clock and data are received]

Figure 1: Simplified Block Diagram Of Source Synchronous System

The widespread adoption of these buses provides much needed bandwidth, but it also leads to new testing challenges. As speeds increase and data valid windows decrease, reducing the error component in the overall AC timing budget becomes increasingly important (Figure 2). The error components of the DUT can't be allowed to dominate the already small data valid windows at high speeds.

To this end, the test environment must mimic the end application of a source synchronous system. In this paper, issues that impact test as a result of source synchronous design are examined by reviewing three methodologies:

1. Search and strobe
2. Sweeping window
3. Device strobed comparator

A proposal is made as to which methodology offers the best trade-offs.

[Waveforms: clk-out and data-out at two data rates, with each bit time divided into a DUT/tester error region and a data valid region]

Data rate: 250MTs, bit time = 4ns
Data rate: 1.6GTs, bit time = 625ps

As data rate increases, the data valid window decreases, which decreases the amount of timing margin available.

DUT error terms: 300ps
Tester error terms: 150ps
Data valid: 100ps

Timing margin = Bit time - DUT terms - Tester terms - Data valid

Timing margin at 250MTs: 3450ps
Timing margin at 1GTs: 450ps
Timing margin at 1.6GTs: 75ps

Figure 2: Timing Margin In AC Timing Budget
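The margins quoted in Figure 2 follow from subtracting the error terms and the required data valid window from the bit time. The arithmetic can be sketched in a few lines of Python; the error values are those given in the figure, while the function and variable names are illustrative and not part of the paper.

```python
def bit_time_ps(rate_mts):
    """Bit time in picoseconds for a data rate in megatransfers per second."""
    return 1_000_000 / rate_mts


def timing_margin_ps(rate_mts, dut_err_ps, tester_err_ps, data_valid_ps):
    """Margin left after removing the error terms and the data valid window."""
    return bit_time_ps(rate_mts) - dut_err_ps - tester_err_ps - data_valid_ps


# Values taken from Figure 2 (fixed strobe environment).
FIXED_STROBE = dict(dut_err_ps=300, tester_err_ps=150, data_valid_ps=100)

for rate in (250, 1000, 1600):
    margin = timing_margin_ps(rate, **FIXED_STROBE)
    print(f"{rate} MT/s: bit time {bit_time_ps(rate):.0f} ps, margin {margin:.0f} ps")
```

At 1.6GTs the 550ps of fixed terms consume all but 75ps of the 625ps bit time, which is why the error terms rather than the bit time itself become the limiting factor.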



3.0 Search And Strobe Method

In an application where the DUT's output to output timings need to be tested, the traditional method is to program the tester comparator strobe at a fixed location to test for the start of the data valid window. When using a search and strobe technique, the placement of the strobe for the data output is calculated based on the position of the output clock (Figure 3). After the strobe position of data-out is programmed, the patterns are burst again.

3.1 Limitations With Search And Strobe Method

The search and strobe method is effective in that it compensates for the DUT's analog delay errors such as skew errors (e.g. clock skew, common mode skew, etc.). However, this method doesn't compensate for analog drift errors such as PLL drift and low frequency jitter that affect both clock and data. In addition, this method introduces error sources from the test system. The first source of error is in the strobe search of the output clock (clk-out). The amount of error introduced is governed by the compare side EPA. The second source of error is in the strobe for the output data (data-out), which is calculated from the strobe search of clk-out. In this case, the device guardband would typically include the RMS of the EPA of the test system.

In a source synchronous system, timing for clk-out and data-out is derived from the same PLL and transmitted together. The receiving device latches the data based on the output clock of the transmitting device. The critical timing relationship is the relative timing between clk-out and data-out, and less so the absolute position of the pair. Because the relative timing for clk-out/data-out is a tighter spec than the spec for the absolute position of the pair, simply programming a fixed strobe for data-out is insufficient. There will be devices that incorrectly fail because they don't meet the absolute timing specification even though they meet the relative timing.

[Figure 3: clk-out and data-out waveforms with tester capture. Compare event for clk-out: this edge is swept through the valid window range to find the position of the clock. The strobe for data-out is calculated based on the position of clk-out.]
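The search and strobe flow, and the false failure it can produce when the clock and data pair drifts between bursts, can be sketched as follows. This is a minimal model under stated assumptions, not a tester API: `edge_search` and `dut_burst` are hypothetical helpers, and every timing value (edge positions, 150ps clock-to-data delay, 100ps valid window, 200ps programmed offset, 120ps drift) is invented for illustration.

```python
def edge_search(sample, start_ps, stop_ps, step_ps):
    """Sweep a compare strobe; return the first position where the pin reads high."""
    t = start_ps
    while t <= stop_ps:
        if sample(t):
            return t
        t += step_ps
    return None


def dut_burst(clk_edge_ps, data_delay_ps=150, data_valid_ps=100):
    """Hypothetical DUT model for one burst: data timing tracks the clock edge."""
    def clk_out(t_ps):
        return t_ps >= clk_edge_ps

    def data_ok(t_ps):
        start = clk_edge_ps + data_delay_ps
        return start <= t_ps <= start + data_valid_ps

    return clk_out, data_ok


# Burst 1: search for the clk-out edge, then derive the data-out strobe from it.
clk_out, _ = dut_burst(clk_edge_ps=1230)
found_clk_ps = edge_search(clk_out, start_ps=1000, stop_ps=2000, step_ps=10)
data_strobe_ps = found_clk_ps + 200  # assumed offset into the data valid window

# Burst 2 without drift: the fixed strobe lands inside the valid window.
_, data_ok = dut_burst(clk_edge_ps=1230)
pass_no_drift = data_ok(data_strobe_ps)

# Burst 2 with 120 ps of drift on both clock and data: the relative timing
# is unchanged, yet the fixed strobe now misses the valid window.
_, data_ok_drifted = dut_burst(clk_edge_ps=1350)
pass_with_drift = data_ok_drifted(data_strobe_ps)

print(found_clk_ps, data_strobe_ps, pass_no_drift, pass_with_drift)
```

In the drifted case the device still meets its clock-to-data relationship, so a failure here reflects the absolute strobe placement rather than a defect in the part.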



4.0 Sweeping Window Method

While the search and strobe method compensates for skew and delay errors, analog drift errors still need to be addressed. Analog drift errors due to variations in temperature, process and voltage will cause the output clock and data to vary within each cycle. This means that for every cycle in the burst, the absolute timing position of clk-out and data-out will be different, even though the relative timing may remain unchanged. Because of the variation in the absolute position of clk-out from cycle to cycle, programming the data strobe after an edge search for the output clock is insufficient. Using such a technique will reduce yield, allow escapes, or both. In our experience, there are devices that have these characteristics, especially at speeds above 800MTs.

In order to address the cycle to cycle variation of the output clock, a different methodology needs to be used (Figure 4):

1. Program the offset between clk-out and data-out by using the data-valid spec
2. Sweep the clk-out, data-out pair through the valid range
3. Keep track of whether each cycle has ever passed

[clk-out and data-out waveforms, annotated with steps 1 to 3 above and a fourth step: 4. Repeat for next cycle]

Figure 4: Sweeping Window Method

In order to test every combination of clk-out and data-out, timing sets need to be programmed to sweep the clk-out, data-out timing relationship for the entire cycle. The algorithm would look something like:

Iteration 1: clk-out-min/data-out-min
Iteration 2: clk-out-min/data-out-min + minimum timing resolution
...
Iteration N: clk-out-min/data-out-min + (N-1) * minimum timing resolution
...
Iteration last: clk-out-max/data-out-max
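The sweep can be sketched in Python as a loop over strobe positions that ORs the per-cycle results together, so that a cycle counts as good if it passed at any position. This is only an illustration of the iteration scheme: `run_burst` stands in for a pattern burst on the tester, `fake_burst` is a made-up DUT model, and all timing values are assumptions.

```python
def sweep_positions(t_min_ps, t_max_ps, resolution_ps):
    """Strobe position for iteration N is t_min + (N-1) * resolution."""
    count = (t_max_ps - t_min_ps) // resolution_ps + 1
    return [t_min_ps + n * resolution_ps for n in range(count)]


def sweeping_window(run_burst, clk_min_ps, clk_max_ps, offset_ps, resolution_ps):
    """Sweep the clk-out/data-out strobe pair; record which cycles ever passed.

    run_burst(clk_strobe_ps, data_strobe_ps) is a hypothetical callback that
    bursts the pattern once and returns one pass/fail bool per cycle.
    """
    ever_passed = None
    for clk_strobe in sweep_positions(clk_min_ps, clk_max_ps, resolution_ps):
        results = run_burst(clk_strobe, clk_strobe + offset_ps)
        if ever_passed is None:
            ever_passed = list(results)
        else:
            ever_passed = [a or b for a, b in zip(ever_passed, results)]
    return ever_passed


def fake_burst(clk_edges_ps, clk_to_data_ps, window_ps):
    """Made-up DUT: each cycle has its own clock edge and clock-to-data delay."""
    def run_burst(clk_strobe_ps, data_strobe_ps):
        results = []
        for edge, skew in zip(clk_edges_ps, clk_to_data_ps):
            clk_ok = edge <= clk_strobe_ps <= edge + window_ps
            data_ok = edge + skew <= data_strobe_ps <= edge + skew + window_ps
            results.append(clk_ok and data_ok)
        return results
    return run_burst


# Cycles 0 to 2 drift in absolute position but keep a 200 ps clock-to-data
# relationship; cycle 3 violates the relative timing (400 ps).
run_burst = fake_burst(clk_edges_ps=[1000, 1060, 940, 1000],
                       clk_to_data_ps=[200, 200, 200, 400],
                       window_ps=50)
ever_passed = sweeping_window(run_burst, clk_min_ps=900, clk_max_ps=1100,
                              offset_ps=200, resolution_ps=10)
single_position = run_burst(1000, 1200)
print(ever_passed, single_position)
```

In this model a single fixed position passes only the first cycle, whereas the sweep passes every cycle whose relative timing is correct and still rejects the one that violates it. The cost is one pattern burst per sweep position.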


5.0 Device Strobed Comparator Methodology

The key motivation for adopting a source synchronous capture methodology is to reduce or eliminate error components that would be present in a standard fixed strobe environment. In a source synchronous system, the clock and data are derived from the same PLL at the source (i.e. the transmitting device). Since the receiving device latches data with the forwarded clock, any error components on the output clock and data can be ignored as long as the relative timing between clock and data is valid. In a fixed strobe system, any errors that affect the clock and data equally, including ground bounce, clock jitter, clock duty cycle variation, simultaneous switching effects (crosstalk, di/dt, etc.) and analog delays due to temperature or voltage supply changes, have to be added to the error budget. These errors can be the largest single component of the error budget. In addition, because these errors are dynamic, they can prove very difficult to compensate for if the location of the strobe has to be programmed prior to the pattern burst.

In a device strobed comparator ATE system, however, the data is latched by the DUT's output clock. This removes any error terms that affect the clock and data equally, as described above. After data is captured, it needs to be processed. This is accomplished by examining the capture RAM and comparing it to a programmed vector set similar to a standard functional test pattern (Figure 6).

[Waveforms: clk-out and data-out at 1.6GTs (bit time = 625ps), shown once with the fixed strobe DUT/tester error region and once with the device strobed comparator, where the error region is largely removed]

In a device strobed comparator ATE environment, DUT and tester errors are essentially eliminated.

As data rate increases, the data valid window decreases, which decreases the amount of timing margin available.

DUT error terms: 10ps
Tester error terms: 30ps
Data valid: 100ps

Timing margin = Bit time - (DUT terms + Tester terms + Data valid)

                            Fixed Strobe    Device Strobed Comparator
Timing margin at 250MTs:    3450ps          3860ps
Timing margin at 1GTs:      450ps           860ps
Timing margin at 1.6GTs:    75ps            485ps

Figure 6: Timing Margin In Source Synchronous ATE System
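The reason the device strobed approach recovers margin is that any shift common to clock and data cancels when the data edge is measured against the clock edge of the same cycle. The short Python sketch below illustrates that cancellation; the per-cycle shift and skew values are invented for illustration and are not measurements from the paper.

```python
def spread_ps(positions_ps):
    """Range of edge positions a strobe must tolerate: latest minus earliest."""
    return max(positions_ps) - min(positions_ps)


# Hypothetical per-cycle values, in picoseconds.
common_shift_ps = [0, 40, -60, 90, -30, 70]  # jitter/drift shared by clock and data
skew_ps = [5, -5, 0, 10, -10, 0]             # residual clock-to-data skew
nominal_clk_ps = 1000
nominal_clk_to_data_ps = 200

clk_edges = [nominal_clk_ps + c for c in common_shift_ps]
data_edges = [clk + nominal_clk_to_data_ps + s
              for clk, s in zip(clk_edges, skew_ps)]

# Fixed strobe: the data edge is judged against the tester's own time reference,
# so the common shift appears in full.
fixed_strobe_spread = spread_ps(data_edges)

# Device strobed: the data edge is judged relative to clk-out in the same cycle,
# so only the clock-to-data skew remains.
relative_edges = [d - c for d, c in zip(data_edges, clk_edges)]
device_strobed_spread = spread_ps(relative_edges)

print(fixed_strobe_spread, device_strobed_spread)
```

With these assumed numbers the fixed strobe has to absorb 160ps of edge movement while the device strobed comparison sees only 20ps, which is the same effect the smaller DUT and tester error terms in Figure 6 express.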


In a clock forwarded scheme such as DDR, the DQS signal becomes active with valid data [8]. This DQS signal (analogous to clk-out in Figure 7) would be used to trigger the comparator. Any clock signal may be used to trigger the comparator as long as the data being captured is aligned to the cycle of the expect data.

In the DDR protocol, all of the captured data is relevant and needs to be processed for pass/fail information. However, in protocols that have a free running clock, this is not necessarily true. If the free running clock triggers the data capture, there needs to be a way to determine when a valid data word begins. One way to address this is to have the tester generate a signal that corresponds to the beginning of the valid data. This reintroduces the analog delay error component, because the tester will generate this signal based on when the data should be valid, but the data out of the DUT is subject to the analog delay effects on the signal path. Therefore, there is going to be a shift between when the tester starts capturing data and when relevant data appears on the bus. In order to process pass/fail results, this shift needs to be known. Because this shift is a function of the DUT's analog delays, it is not trivial to determine.

[Waveforms: DUT-reset, clk-out, data-out and tester capture RAM contents. Clk-out triggers the tester to start capture.]

Figure 7: Data Capture Using Device Strobed Comparator Method

In order to accurately align the compare values with the captured data, there needs to be a method to determine where the first valid data bit is stored in the RAM. This requires the tester to take one of three paths to account for the analog delays of the various data paths.

The first would be to determine the amount of delay on each bus and then program don't care cycles in the pattern to account for it. However, since the amount of analog delay can change as a function of speed, this would require the generation of several patterns, each with a different number of don't care vectors, for each test. This quickly increases the amount of vector memory needed to store all of these redundant patterns.

The second is to search the first few bits of the output data, using a match routine, to determine which bit in the stored array corresponds to the start of the data, and then make pass/fail determinations. Depending on the complexity of the algorithm used, this may require a significant amount of time to find a match and begin the compare.

A third option is to once again use one of the DUT pins to tell the tester when data becomes valid. When two source synchronous devices are talking to each other, they send a bit of data on a separate control line that signifies the beginning of a valid data word. A test platform can use this bit to trigger the capture of data, allowing it to store only the valid data bits. Since the sync pulse will shift with the data, the latency delay is removed. This method eliminates the need to perform search routines or load additional patterns into the tester.
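The second and third options can be contrasted with a small Python sketch. It models the capture RAM as a list of bits and is an illustration only: the function names, the four-bit preamble length and the bit patterns are assumptions, not part of any tester's software.

```python
def find_start_by_match(captured, preamble):
    """Option 2: scan the capture for the first occurrence of the expected preamble."""
    n = len(preamble)
    for i in range(len(captured) - n + 1):
        if captured[i:i + n] == preamble:
            return i
    return None


def find_start_by_sync(sync_bits):
    """Option 3: the DUT's control line marks the first valid data bit."""
    return sync_bits.index(1) if 1 in sync_bits else None


def compare_from(captured, expected, start):
    """Pass only if the capture holds the full expected data beginning at start."""
    return captured[start:start + len(expected)] == expected


expected = [1, 0, 1, 1, 0, 0, 1, 0]
# Three cycles of latency before the valid data, one idle cycle after it.
captured = [0, 0, 0] + expected + [0]
sync_bits = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

start_match = find_start_by_match(captured, expected[:4])
start_sync = find_start_by_sync(sync_bits)
print(start_match, start_sync, compare_from(captured, expected, start_sync))
```

The match routine has to scan the capture and can lock onto the wrong position if the preamble pattern happens to occur in the latency cycles, whereas the sync bit gives the start index directly because it shifts together with the data.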


Figure 8 shows experimental results on a bench fixture. An oscilloscope, acting as a receiver, is triggered in one case by a clock that is independent of the data at the source (i.e. the tester), while in the other case the trigger clock is synchronized with the data. At a data rate of over 600Mbps, the synchronized trigger gave a significant improvement in the amount of timing margin available (i.e. the size of the data eye).

Figure 8: Capture Triggered by asynchronous and synchronous


6.0 Conclusion

The key benefit of a source synchronous system is that a receiving device latches data based on the forwarded clock of the transmitting device. As long as the relative timing between the output clock and data is met, the receiver is able to latch in the correct data. This architecture inherently provides additional timing margin to the overall system. In a test environment, it is essential that the tester comparator operates in the same manner as the receiving device in the source synchronous system, or else the benefit of having additional timing margin will not be realized. A device strobed comparator system best mimics the end application of a source synchronous system. Depending on the output jitter characteristics of the device, it is possible that without this capability devices will fail at test even though they would work properly in the end application.

The author would like to thank the following individuals for their contribution: Calvin Cheung, Dumith Desilva, Greg Hilliard, Scott Schaber, Jason Sturm.

7.0 References

[1] Gwennap, Linley. Digital 21264 Sets New Standard, 1996, Microprocessor Report
[2] Nwaekwe, Laverty, Chowdhury, Syeed. PCI-X Boosts Bus Bandwidth to 1 Gbps, 2000, EDN
[3] The Lightning Data Transport I/O Bus Architecture, 2000, API Networks
[4] Glaskowsky, Peter N. RapidIO Expands Narrow-Bus Options, 2000, Microprocessor Report
[5] Powell, Courtney. Internetworking Equipment Design: RapidIO Renders Overall Performance, 2000, EE Times
[6] Bouvier, Dan. Example AC Timing Budget, 2000, RapidIO Trade Association
[7] Haller, Robert. The Nuts And Bolts Of Signal-Integrity Analysis, 3/16/00, EDN
[8] Double Data Rate (DDR) Specification, 2000, JEDEC Solid State Technology Association
