complexity analysis of optical-computing paradigms

16
Complexity analysis of optical-computing paradigms Ahmed Louri and Arthur Post Optical computing has been suggested as a means of achieving a high degree of parallelism for both scientific and symbolic applications. While a number of implementations of logic operations have been forwarded, all have some characteristic that prevents their direct extension to functions of a large number of input bits. We analyze several of these implementations and demonstrate that all these implementa- tions require that some measure of the system (area, space-bandwidth product, or time) grow exponentially with the number of inputs. We then suggest an implementation whose complexity is no greater than the best theoretical realization of a Boolean function. We demonstrate the optimality of that realization, to within a constant multiple, for digital optical-computing systems realized by bulk spatially variant elements. 1. Introduction Measures of gate count are as important in designing Boolean functions as are measures of operations in program design. In designing algorithms, the no- tion of an upper bound to the number of computa- tions required for implementing a given function is fundamental to the analysis of the run time of an algorithm. On the other hand, a knowledge of lower bounds on the number of computations can provide an understanding of when an algorithm is optimal, and when no further reduction in operations is possible. Similar observations can be made of Bool- ean functions. In general, a design method that can implement any Boolean function may not provide optimum solutions for certain special cases. Shan- non' demonstrated lower bounds to the cost, in numbers of gates, for almost all functions, mapping n Boolean input variables to a single output of fl(2n/n), where i() represents the lower bound notation f(n) = fl[g(n)] limg(n) < c, (1) n-*cf(n) where c is a constant, f(n) is, in this instance, the cost of implementing a Boolean function, measured in numbers of gates, g(n) represents the function that bounds the cost, and n is the number of inputs. The authors are with the Department of Electrical and Com- puter Engineering, University of Arizona, Tucson, Arizona 85721. Received 24 May 1991. 0003-6935/92/265568-16$05.00/0. © 1992 Optical Society of America. Thus Shannon's theorem states that the number of gates that are required for implementing almost all functions cannot be less than a constant multiple or fraction of 2n/n as the number of inputs increases. The qualification that the theorem is true for almost all functions of n inputs means that the ratio of the number of functions for which the theorem is true to the total number of functions of n inputs tends to unity as n increases without bound. A more detailed discussion is included in Appendix A. While the Shannon limit applies to the general case, certain functions may have an inherent struc- ture that can be exploited to provide significant reductions in complexity order, which a general de- sign methodology cannot achieve. As a concrete example of this sort of reduction in gate count, consider the simple full adder. By using standard minimization techniques, we see that the carry func- tion has three prime implicants, while the sum function has four prime implicants, and these impli- cants are minterms. Thus it would appear that the sum function cannot be minimized, and the resultant realization as a direct implementation of the sum-of- products formulas requires four 3-input AND gates and one 4-input OR gate for computing the sum, along with three 2-input AND gates and one 3-input OR gate to compute the carry. Further, in the computation of the sum function, all the inputs appear in each of the terms, requiring that each input be mapped four times (twice as the negation of the variable), while the computation of the carry requires that each input be mapped twice; the result is a total fan-out of 5 from the input nodes and 2 from the inverters. Construct- 5568 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Upload: arthur

Post on 03-Oct-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Complexity analysis of optical-computing paradigms

Complexity analysis of optical-computingparadigms

Ahmed Louri and Arthur Post

Optical computing has been suggested as a means of achieving a high degree of parallelism for both

scientific and symbolic applications. While a number of implementations of logic operations have been

forwarded, all have some characteristic that prevents their direct extension to functions of a large number

of input bits. We analyze several of these implementations and demonstrate that all these implementa-tions require that some measure of the system (area, space-bandwidth product, or time) grow

exponentially with the number of inputs. We then suggest an implementation whose complexity is nogreater than the best theoretical realization of a Boolean function. We demonstrate the optimality ofthat realization, to within a constant multiple, for digital optical-computing systems realized by bulkspatially variant elements.

1. Introduction

Measures of gate count are as important in designingBoolean functions as are measures of operations inprogram design. In designing algorithms, the no-tion of an upper bound to the number of computa-tions required for implementing a given function isfundamental to the analysis of the run time of analgorithm. On the other hand, a knowledge of lowerbounds on the number of computations can providean understanding of when an algorithm is optimal,and when no further reduction in operations ispossible. Similar observations can be made of Bool-ean functions. In general, a design method that canimplement any Boolean function may not provideoptimum solutions for certain special cases. Shan-non' demonstrated lower bounds to the cost, innumbers of gates, for almost all functions, mapping nBoolean input variables to a single output of fl(2n/n),where i() represents the lower bound notation

f(n) = fl[g(n)] limg(n) < c, (1)n-*cf(n)

where c is a constant, f(n) is, in this instance, the costof implementing a Boolean function, measured innumbers of gates, g(n) represents the function thatbounds the cost, and n is the number of inputs.

The authors are with the Department of Electrical and Com-puter Engineering, University of Arizona, Tucson, Arizona 85721.

Received 24 May 1991.0003-6935/92/265568-16$05.00/0.© 1992 Optical Society of America.

Thus Shannon's theorem states that the number ofgates that are required for implementing almost allfunctions cannot be less than a constant multiple orfraction of 2n/n as the number of inputs increases.The qualification that the theorem is true for almostall functions of n inputs means that the ratio of thenumber of functions for which the theorem is true tothe total number of functions of n inputs tends tounity as n increases without bound. A more detaileddiscussion is included in Appendix A.

While the Shannon limit applies to the generalcase, certain functions may have an inherent struc-ture that can be exploited to provide significantreductions in complexity order, which a general de-sign methodology cannot achieve. As a concreteexample of this sort of reduction in gate count,consider the simple full adder. By using standardminimization techniques, we see that the carry func-tion has three prime implicants, while the sumfunction has four prime implicants, and these impli-cants are minterms. Thus it would appear that thesum function cannot be minimized, and the resultantrealization as a direct implementation of the sum-of-products formulas requires four 3-input AND gatesand one 4-input OR gate for computing the sum, alongwith three 2-input AND gates and one 3-input OR gateto compute the carry. Further, in the computationof the sum function, all the inputs appear in each ofthe terms, requiring that each input be mapped fourtimes (twice as the negation of the variable), while thecomputation of the carry requires that each input bemapped twice; the result is a total fan-out of 5 fromthe input nodes and 2 from the inverters. Construct-

5568 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 2: Complexity analysis of optical-computing paradigms

ing this circuit from 2-input gates alone requireseleven AND gates, three inverters, and five OR gates.But the fact that the Karnaugh map of the sum is thatof the odd-parity function (which can be implementedby two XOR gates as A D B C), along with the factthat the carry can be realized as

C=AB + (AfflB)C (2)

results in an implementation that has two XOR gates,two AND gates, and one OR gate while requiring amaximum fan-out from any one input of only 2.While the above example is quite simple, takingadvantage of inherent structures of certain functionscan produce large reductions in the complexity ofmany functions.

Other considerations, such as restrictions imposedby the nature of the target environment, can impactthe size of a circuit realization to a significant degree.For example, imposing restrictions on the nature of acircuit graph can significantly increase the complex-ity of a design. McColl2 has studied the behavior ofcircuits when they are restricted to strictly planarimplementations (i.e., no edges may cross), and hasshown that almost all functions, when implementedin this way, require a minimum of 2n gates, where nis, again, the number of inputs to the function. Thisminimum is a factor of n greater than the lowerbounds for almost all functions derived by Shannon.It is interesting to note that the restriction imposedon circuit graphs in this paper is a restriction on theedges of the graph; most previous work in Booleanfunction theory has imposed limitations on only thebasis functions employed. That such a seeminglysimple restriction on the graphs that implement acircuit should have this kind of impact on circuit sizehas important implications for optical computingsince the use of spatially invariant interconnectionsimposes rigid limitations on the interconnections in acircuit graph.

Systems that employ both spatially invariant andspatially variant imaging have been proposed in theliterature. For example, Brenner et al.3 discuss spa-tially invariant imaging systems for interconnectionsbetween gates, while Jenkins et al.

4 implement inter-connections by spatially variant imaging techniques.It is the thesis of this paper that, while systemsconstructed from spatially invariant imaging tech-niques seem to be cheaper to build than spatiallyvariant systems, the nature of the restrictions im-posed on the graphs that can be constructed fromspatially invariant interconnections forces these sys-tems to be much larger and costlier than spatiallyvariant systems, particularly when the number ofinputs to a function constructed in this mannerbecomes large.

In discussing this issue we make use of the ordernotations frequently used in computer science. Inaddition to the f() notation used above, we use twoother similar notations in this text. The 00 nota-tion is the dual of the }() notation in the sense that itis an upper-bound notation and has the precise

meaning

f(n) = O[g(n)] - 1ig.(n)n-- g(n) ~ (3)

Equivalently the function f(n) can increase no fasterthan a constant multiple of g(n), or, in the limitingcase, as n grows without bound. A special case ofthis notation is f(n) = 0(1), which has the meaningthat the function is a constant, or finds a limit in aconstant value.

We also make use of the E() notation, which can bedefined in terms of the Q() and 00 notations as

f(n) = 0[Ig(n)] -

f(n) = Ol g(n)] A f(n) = fl[ g(n)]. (4)

That is, again in the limiting case, f(n) will increaseno faster than a constant multiple of g(n), nor slowerthan a (possibly different) constant multiple of g(n);less precisely, f(n) is bounded above and below byg(n).

We briefly analyze several proposed optical-comput-ing paradigms constructed by spatially invariant tech-niques in Section 2. In Section 3, we implementseveral of the 2-input functions by using a simplespatially variant imaging technique. This same tech-nique is extended in Section 4 for constructing somelarger functions. The system we present in Sections3 and 4 is described in architectural and computa-tional terms. A comparison of the various tech-niques in terms of underlying complexity is presentedin Section 5, with conclusions and suggestions forfurther research discussed in Section 6. We includean amplification of Shannon's theorem Appendix A.

2. Complexity Analysis of Several Optical-ComputingParadigms

In the interest of continuing the development ofdigital optical computing, and to demonstrate theapplicability of the field of Boolean complexity theoryto optical systems, in this section we analyze severalof the proposed implementations of digital optical-computing systems from a system complexity view-point, and show that these systems have similarsystem complexities. The systems that we detail aresymbolic substitution logic, which is implemented byadditive methods3 56 and by convolution or correla-tion techniques,7-10 shadow-casting logic,"1-14 the pro-grammable logic of Murdocca et al.,1516 and thecombinatorial logic-based system of Guilfoyle andWiley.17 We analyze these implementations in termsof their complexity and, hence, the ability of thesesystems to compute Boolean functions of sizes compa-rable to those currently achieved in electronic sys-tems.

While it might be argued that our choice of Booleanfunction implementations is unsuitable for the opti-cal domain, we note that some model of what comput-ing means is required, and that the two major bodies

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5569

Page 3: Complexity analysis of optical-computing paradigms

of study in computation center around Turing ma-chines and Boolean functions. Because of the lack ofsome other model of computation, we must formulateour theories of optical computation around one ofthese models. Since the optical techniques we wishto study incorporate the implementation of Booleanfunctions, we restrict our discussion to this model.

A. Symbolic Substitution Logic

The basic idea behind symbolic substitution logic(SSL) is that one first detects the presence of one ormore patterns in the input plane, and then substi-tutes some appropriate pattern for each detectedpattern. Three primary methods for performingSSL have been advanced, both based on spatiallyinvariant optics. The first method, which is knownas additive logic, involves making copies of the dataplane, shifting these copies, and superimposing theshifted copies to determine where in the data planecertain patterns occur. The second method involvesfiltering in the spatial frequency domain, and iscommonly known as spatial filtering logic. A morerecent method, proposed by Louri,'8 combines shadow-casting logic (which is discussed below) with polariza-tion techniques to address some of the optical engi-neering questions involving SSL. We analyze thefirst two techniques separately.

1. Additive LogicData is typically represented in additive logic-basedSSL in a dual-rail intensity-coded format, in whichtwo pixels are used to encode each bit, although othercodings are possible (see, for instance, Refs. 19 and20). The locations of these patterns may be definedby masking, or by providing detectors at only thoselocations to which the origin of the pattern is mapped.Thus in the conventional split-shift-recombine tech-nique, the origin of the pattern is the location in thedetector planes to which the dark pixels of each bitare mapped.

The disadvantage to this approach is that for agiven number n of inputs coded in any arbitrarymanner, the total number of combinations of thesebits is 2n. In order to implement completely speci-fied digital logic, each of these 2n combinations (whichcorrespond to the 2n possible minterms) must begenerated. The reason for this may be seen from thefollowing argument. Suppose that less than the 2ncombinations are generated in space and imaged ontothe corresponding detection planes. Then some com-binations of inputs will exist that will not be detectedby the system. For these combinations, no determi-nate output will be generated, and some indetermi-nate value will result. In the typical dual-rail codingscheme, the result is the substitution of two darkpixels in the position in the output plane at which thefunction is to be computed. But in dual-rail binarylogic encoding, two dark pixels are undefined. Hencein this implementation not specifying the outputvalues for some combinations yields a mapping to avalue outside the set 10, 11. If the data are to be

processed simultaneously, 2 image planes must beformed at discrete locations in space. If each imageplane has the same area as the input plane, the areaoccupied by the detection planes will be fl(2n) timesthat of the input plane.

This result is due to the nature of the paradigm, notto a function of the encoding scheme that is used torepresent the data. Intensity encoding can achieve atwofold reduction in the area of the input plane thatis required for representing the inputs; or, alterna-tively, a twofold increase in the density of data in thisplane. The nature of the paradigm does not change,however, and still requires separate detection of eachof the possible combinations of the inputs. Thus anintensity-encoded SSL implementation of a com-pletely specified Boolean function of two inputs willneed to detect each of the four patterns [(dark, dark),(dark, light), (light, dark), and (light, light)] in orderto be able to generate the appropriate output for eachof these combinations. This observation extends tofunctions of larger numbers of inputs in the samemanner as for dual-rail encoding. Louri' 9 has pro-posed a technique in which pairs of dual-rail inputsare superimposed before imaging, which reduces thenumber of rules in the initial detection phase tothree; he has also demonstrated an increase in compu-tational throughput for addition by this method.The method does require that a fourth rule be appliedas a separator in the final step to discriminate be-tween outputs that the technique has mapped to acode outside the range of the input coding of data.

The actual complexity of an SSL implementationmay be greater than that discussed above. The extracomplexity comes from the necessity of providingextra rules to allow for the separation of data vectorsor regions of computation from one another.

2. Spatial FilteringThe second method for implementing SSL is based onconvolution-correlation techniques, wherein the in-put plane is transformed to its Fourier domain repre-sentation, filtered, and the inverse Fourier transformis then taken of the filtered image. The principle isthe same as for additive logic: all combinations ofthe inputs are detected, and suitable outputs aregenerated for these combinations. A number ofinput codings may be used: Casasent and Botha9

suggest a dual-rail coding similar to that discussedabove for the additive logic implementation, whileJeon et al.10 propose a somewhat more complexencoding to improve cross-talk characteristics. Bren-ner et al.8 and Weigelt2l suggest several alternatecodings that achieve a spatial frequency modulationrepresenting the data. What is common to thesetechniques is that a separate filter must be used foreach pixel pattern to be detected, regardless of theencoding technique. Thus four filters are requiredin order to detect patterns representing two binaryinputs. Implementing Boolean functions of threeinputs requires eight filters; ultimately 2n filters arerequired for computing a function of n inputs. In

5570 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 4: Complexity analysis of optical-computing paradigms

the methods of Weigelt,7,2' and Casasent and Botha9

these filters are spatially multiplexed to create asingle filter upon which the input plane or planes areimaged. But each each filter has a space-bandwidthproduct that is a function of the space-bandwidthproduct of the input plane, with the resultant band-width increasing as a function of the number filters,and hence as a function of 2 . In the techniqueproposed by Jeon et al.'0 the filters are separated inspace, but one is required for each pattern detected.Thus the spatial filtering techniques require that thenumber of filters, and possibly that spatial bandwidthof the filters, increase exponentially with the numberof inputs to implement Boolean functions correctly,just as the number of detector planes increasesexponentially in additive logic-based SSL.

B. Shadow-Casting Logic

A second implementation of digital optical comput-ing, attributable to Tanida and Ichioka,11-14 is knownas shadow-casting logic. In this method data ismapped to multiple parallel spatial light modulators(SLM's). These SLM's are composed of liquid-crystal light valves driven by an electronic adjunct,which is responsible for mapping the data to theappropriate cells of each SLM. The planes of theSLM's are set physically close to one another and areparallel. The data encoding is by orthogonal dual-rail transparent-opaque regions within overlappingcells of the parallel planes. The orthogonality of theparallel codings results in a portion of the resultantcombined cell being transparent. By having multi-ple SLM's combined with suitable locations for thedetectors, any of the n-input functions may be com-puted. The number of light sources and detectors isdependent on the manner in which the system is to beused.

While this method has the advantage of simplicityin implementation [the light is provided by light-emitting diodes or other inexpensive sources], thesize of the transparent area in each cell decreasesexponentially with the number of inputs mapped toeach cell. Thus the available power at the detector isonly 2-n times the power incident on the cell. Further,the number of sources is equal to the number ofminterms in the ON set of the implemented function(those minterms that map to the value 1 in thefunction's output). The number of sources can thenvary from one to a maximum of 2 . (Should therebe more than 2n-1 minterms constituting the ON set ofa function, simply detecting the minterms mapping to0, which is the OFF set, and inverting yields a numberless than 2n-1.) For good functions, such as then-input AND, only one source is required, while afunction such as the parity function requires 2n-1.

This exponential complexity was first described byArrathoon and Kozaitis,22 who then proposed the useof this technique to provide an associative look-up formultivalued logic. While these authors are able toshow some improvement in power and space band-width, their system has many of the same drawbacksas the original method.

The method also has the drawback that the outputis incompatible with the input, being intensity en-coded rather than transparency encoded. Thus thesystem is not directly extensible, and the outputsmust be converted to an electronic signal before beingmapped to the input plane for further computation.

C. Combinatorial Logic

Another computing paradigm, from Guilfoyle andWiley, is called combinatorial logic, which divides thecomputational load between the electronic and theoptical domains.' 7 In this method, a pair of multi-channel acousto-optic cells are telecentrically imaged,with data within the cells counterpropagating. Wedemonstrate how this structure can be used to achievean AND-OR structure, and relate their structure to aprogrammable logic array (PLA) implementation.The resultant AND-OR structure is shown in Fig. 1.

As noted above, this model does not perform allcomputations in the optical domain, and requires acertain amount of electronic logic to compute anumber of terms before the application of the opticallogic. This technique has the advantage that a largenumber of computations are performed in parallel bythe optical devices. It has the disadvantage that theoptical logic performs only a fraction of the computa-tions and that the computational capability of theoptical portion of the system is limited. The spa-tially invariant optical subsystem can be modeled as aset of n-2-input AND gates, whose outputs are con-nected to the inputs of an n-input OR gate. Weassume that the function to be computed depends onall inputs (is nondegenerate), and that the inputs tothe AND gates are two distinguished sets of n bitseach. That this system is inherently weak can beestablished as follows:

* Since the structure of the AND-OR tree is fixed,the only way to compute a variety of different func-

Fig. 1. AND-OR structure of the optical subsystem of combinato-rial logic.

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5571

Page 5: Complexity analysis of optical-computing paradigms

tions is by making different assignments to the leavesof the tree (the AND gate inputs).

* The number of different possible assignmentsof the variables in either set of n variables in n!.Thus the total number of possible assignments of the2n-input variables to the 2n inputs of the AND level isn! x n! = (n!)2.

* The number of different possible Boolean func-tions of 2n variables is given by

If} 22nI hfn) = 22

* The number of degenerate functions of 2nvariables (which are denoted by f2n) can be shown tobe

I|f2nil (2n )22 - 2n22

* Thus the number of nondegenerate functions of2n inputs is given by

fn - An I > 222n - 2n222.- 1

> (22 - 2n)22

222n- 1

* Taking the limit of the ratio of the number offunctions computable by the AND-OR tree to thenumber of nondegenerate functions establishes theresult:

(n!)2 n!lim - = lim 2222 0.

One example of a function that this spatially invari-ant optical subsystem cannot perform is the parityfunction over 2n inputs. That the fraction of thetotal functions implementable by this method isactually quite small, even for small values of n, can beseen from Table 1. Hence the optical subsystemalone is not generally capable of computing Booleanfunctions.

D. Programmable Logic

Murdocca et al. have proposed a method of digitaloptical computing that uses optical nonlinearitiesdirectly as a form of logic gate.15"16 In this method,planes of these logic gates are interconnected by aspatially invariant imaging system that is con-

Table 1. Comparison of the Number of Functions Implementable in theSpatially Invariant Subsystem of Combinatorial Logic, and the TotalNumber of Nondegenerate Functions versus the Number of Input

Variables

n 2n n! n!2 2 22-2 222-1

3 6 6 36 65 536 4 294 967 2964 8 24 576 264 2128

5 10 120 14 400 2256 2512

structed from beam splitters and prism arrays, withthe resultant interconnection pattern forming a cross-over network.

The essence of the design technique used in thissystem is that the n-input variables are mapped to theinput array along with their complements. A seriesof n + 1 interconnection stages consisting of onecrossover network and one mask per stage is thenused, along with n + 1 arrays of AND gates, togenerate all the minterms of the function. Once theminterms of the function have been generated, theappropriate minterms are combined through a simi-lar n + 1 stage network in which the AND gatefunctions have now been replaced by OR functions togenerate the appropriate functional output. Thetotal number of gates required by this method isfl(n2n), where, again, n is the number of inputs.

It has been asserted that the size of the circuitsrealized in this manner is optimal, being fl(n2n) in thenumber of inputs, and that the number of levels inthe circuit is optimal, being linear in the number ofinputs. While it is true that Shannon's theoremdemonstrates that almost all Boolean circuits havesize fl(2n/n) and depth fl(n), Shannon noted that thisis not true for many fundamental circuits. But thisapproach has lower bounds that are exponential inthe number of inputs, regardless of the functionimplemented. Thus, while a given function mightbe optimally computed by a circuit with size O(n2)and depth 0 (log2 n) (see Ref. 23), programmable logicwill not allow minimization to this level. Even forrandom functions whose size is 0(2n/n), this tech-nique requires 0(n 2 ) more gates than the optimalcircuit realization.

In order to argue that the spatially invariantinterconnection pattern does not greatly increase thecost of implementation over that of a spatially variantimplementation, a spatially variant implementationof the full adder'5 that requires 78 AND gates, 11 OR

gates, and 6 levels is compared with the 128 gates and8 levels required of the present method. But thespatially variant implementation generates all themonoms (not minterms) of the function, resulting ina circuit that is not minimal. The minimal full addercircuit, which was discussed above, requires only 5gates (two XOR, two AND, and one OR), and has a depthof 3. An alternate realization of the full adder has agate count of 6 (two XOR, three AND, and one OR) and adepth of 2. Even when restricted to the use of NOR

gates alone, a minimized full adder should require nomore than 14 such gates (although the resultantdepth is 7). While it is possible that a spatiallyinvariant full adder circuit could be reduced to 48 ANDand OR gates, the result would still not be minimal.

While this technique does share the features oftopological regularity and the use of AND and OR gateswith a PLA design, a true PLA design has only twolevels of logic, regardless of the number of inputs, andtypically does not require that all minterms of afunction be generated. Rather, the PLA user devel-ops one or more equations (depending on-the number

5572 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 6: Complexity analysis of optical-computing paradigms

of outputs required) that are in minimized sum-of-products form. It is the product terms of theseequations that are then mapped onto the PLA struc-ture. While some functions (the parity function, forinstance) do not admit to minimization of theirsum-of-products form, a ring-sum expansion willfrequently minimize these functions and permit aneconomical circuit realization in XOR logic. Althougha minimized circuit has been demonstrated that doesnot generate all minterms and then select from them,the breadth of the ciruit is still exponential in thenumber of inputs; hence the circuit is no moreminimal than would be an electronic circuit thatconsists of an n-to-2n demultiplexer, whose outputsare then connected to one or more trees of OR gates,which generate the resultant function.

E. Commonalities of the Above Implementations

We have examined several implementations of digitaloptical computing and have found that all thesesystems require an exponential increase in the com-plexity of some parameter of the system as theimplementation is used to compute functions of in-creasing numbers of inputs. This exponential in-crease impacts the feasibility of systems implementedby these means in direct and profound ways:

* Only those functions, such as addition, forwhich there are serial implementations, can be ex-ploited, and the full potential for parallelism inoptical computing will not be developed.

* Control functions, which typically cannot beserialized, will remain in the electronic domain.

* Other functions that require global communica-tions will require some other computational domainfor solution.

These systems also lack the capability of interconnect-ing even the small functions that have been imple-mented in the arbitrary manner that is required inorder to construct larger functions. This lack ofinterconnect capability is both the root cause of theinherent exponential complexity of these systems andthe constraint that prohibits the economical construc-tion of large functions from the smaller basis func-tions.

3. New Approach

The inability of the systems discussed above to com-pute Boolean functions with less than exponentialcomplexity severely limits the application of thesesystems even in terms of their ability to performspecial-purpose computing in conjunction with elec-tronic hosts. Thus we must look for new ways ofachieving digital optical computing that do not sufferfrom the shortcomings of the previous implementa-tions, but that still provide the advantages thatoptical computing promises.

Whether optical computing remains an adjunct toelectronic computing, or whether general-purposeoptical computers are eventually developed, will de-

pend primarily on whether optical random accessmemory is developed. Since optical memory is notcurrently available, we restrict our discussion to theuse of optical computing as an adjunct to the elec-tronic domain. In this regard, then, we would expectan optical-computing adjunct to:

* Use nonlinear optics directly as logic gates, andrequire a number of such gates that is commensuratewith the best theoretical implementation;

* Allow the parallel mapping of registers, orregister pairs, to the input of the optical system, andoutput the results to one or more entire register;

* Perform a large amount of computation on theregisters before returning the results to the electronicadjunct (the addition of two 8-bit registers would benear the low end of this range);

* Perform the computations as quickly as possi-ble, utilizing parallel implementations of the func-tions rather than serial implementations.

In summary, we wish the optical subsystem to re-quire minimum overhead in the electronic sub-system, to compute functions on large data itemssimultaneously, and to compute these functions byusing the greatest amount of parallelism possiblewithin each function.

While many of the optical-computing paradigmsproposed in the past have demonstrated the basisfunctions that we demonstrate, these previous imple-mentations have relied on spatially invariant intercon-nections of one form or another to achieve their goals.What we submit here is a simple spatially variantconstruction of the basis functions, which can then beextended for constructing larger functions from theseelementary ones. While the implementation pro-posed here is constructed from simple spatially invari-ant imaging elements, advances in holographic tech-niques could substantially improve the speed and sizeof our implementation.

A. NOR Function

We begin our development by demonstrating the2-input NOR function applied to an object plane ofN 2 = 2n pixels, which computes n of these functionssimultaneously. We assume that the data pairs aredistinct and have the same spatial relation to eachother. This requirement is imposed by the spatiallyinvariant nature of the imaging systems we use. Wedo not, in general, assume that the data elements areadjacent, and we make use of the simple spatiallyvariant interconnections to perform computations onpairs of elements that are not adjacent. The realiza-tion of the NOR function is shown in Fig. 2, with aschematic representation of the realization shown inFig. 3.

In Fig. 2, the input data are mapped into two pairsof rows with all four possible combinations of theinputs encoded in each pair of rows. The values 0and 1 are shown numerically for clarity; in actuality,the inputs would be intensity encoded. The input

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5573

Page 7: Complexity analysis of optical-computing paradigms

Input Plane Beam Splitter1 1 0 0 1T f l I100

Resultant Planes

D o ll01011100__

101 0

1 0 1 .

Fig. 2. Implementation of the 2-input NOR function.

array is replicated by means of a beam splitter. Thereplicated images are then masked so that alternaterows of pixels are blocked in either of the two images.A pair of prisms now recombines the images in such away that every pair of pixels that is to be NORed isincident on the nonlinear element that computes theNOR function. While the operations performed herecould have been more directly implemented by alenslet array since the data were mapped in adjacentpixels, our purpose is to demonstrate a more generaltechnique that may be used to construct significantlylarger functions in which computations may occurbetween inputs or intermediate results that are notphysically adjacent.

The schematic representation of Fig. 3 is derivedfrom the implementation shown in Fig. 2 by takingthe top pair of rows from the implementation, divid-ing the rows into columns, and stacking the columnsto achieve one column of eight elements. We rely onschematic representations in the discussions below.

00

011011

1000

Fig. 3. Schematic of the NOR implementation.

The schematic representations will show a decreasein active device density with increasing numbers ofinputs. This density decreases linearly with thenumber of inputs, and is not different in this respectfrom SSL.

In this regard, we note that the density of the NORelements is only one half that of the input plane.In contrast, the density of threshold elements in SSL(for 2-input functions) is one fourth the density of theinput plane when dual-rail encodings are used.Further, with SSL, one requires four planes of suchdevices, each essentially implementing the NOR func-tion, for performing the same computation we per-form with one such plane. Also, since we are usingsimple intensity coding, we achieve twice the numberof computations with the same size planes. Whileprogrammable logic fully populates the logic planeswith devices, implementation of the NOR function bythis method also requires a minimum of four planesof devices (operated in AND and OR modes). Program-mable logic also requires that a field of four pixelpositions be reserved for each 2-bit vector, and thatthe complements of the inputs be provided in additionto the noncomplemented inputs.

B. NOT Function

While not a 2-input function, the NOT function, or theability to achieve the function by other gates, isessential for computing any nonmonotonic function.The implementation of this function by the NOR isdirect, and is shown schematically in Fig. 4. Thisimplementation is trivial and requires little comment.The main feature of this implementation that should

5574 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

MasksRecombination

StagePlane of

NOR Gates

Legend:

El Data Value '1'

r Data Value '0'

M Mask ElementE Optical NOR Gate

g Transparent

0 1

0 0 0

Page 8: Complexity analysis of optical-computing paradigms

A A'B B1

C CCD Do

Fig. 4. Schematic of the NOT implementation.

be noted is that if the intensity of the input plane issufficient to permit splitting into two separate im-ages, then, with proper biasing, the intensity issufficient to drive the NOR into its low-transmissionstate. Conversely, a low-intensity pixel in the inputplane will cause the NOR to remain in a high-transmission state.

C. OR Function

The OR function may be obtained from the above twofunctions by simply inverting the output of the NOR.This implementation is shown in Fig. 5. The NOTfunction-must be implemented with planes of NORfunctions with the same active device density as thatof the plane implementing the NOR.

D. AND Function

The construction of the AND in NOR logic is based onthe tautology AB = A + . A schematic of theimplementation of this function is shown in Fig. 6.As with the OR and NOR functions, the input is denselymapped to the input plane, requiring no more thanone pixel position per bit, while the density of activeelements in the NOR planes is one half that of theinput. This construction achieves a savings of oneplane of active elements over that required by SSL orprogrammable logic.

E. Equivalence (X-NOR) and the XOR Function

In the above sections, we developed implementationsfor two complete basis sets, the set NOR} and the set{AND, OR, NOT}. Either of these sets is sufficient forcomputing all the Boolean functions. The ability tocompute the XOR and the equivalence or X-NOR func-tions provides flexibility in designing a variety ofcircuits. We first detail the design of the X-NOR.

The logic diagram of the X-NOR is shown in Fig. 7.As can be seen, construction of this function from theNOR requires five gates and three levels. It would

Input

aC

a

g

a + bbdf

h

c+de+fg + h

Input

a abCb c

e defh~~~~~g

Fig. 6. Implementation of the AND function.

seem that this construction, when directly translatedto an optical realization, would require more gateplanes than SSL, which we have argued requires fourgate planes for implementing any 2-input function.The freedom allowed by spatially variant interconnec-tions, however, permits us to implement the invert-ing NOR gates at level one and the NOR gates in leveltwo as single fully populated NOR planes for each level.This realization is shown in Fig. 8. A careful exami-nation of this figure reveals that we have provided thefive gates that the logical design requires, but havecollapsed the gates at levels one and two into only twogate planes. Many designs may be able to takeadvantage of this technique, if the designer is clever,and if mass production of several densities of gateplanes is possible.

We noted that the X-NOR and the XOR functionswere merely complements of one another, and so thesimple inversion of the output of the X-NOR could beused to implement the XOR. We have used instead analternative implementation that requires only thesame number of NOR gates as the X-NOR. The result-ant logic diagram shown in Fig. 9. The implementa-tion of this alternate realization of the XOR is shown inFig. 10. This implementation is also somewhat moreexpensive than that which results from simply invert-ing the X-NOR since it requires an extra intensifier anddoes not collapse the second gate level into a singleplane.

We have argued that economy in implementation ofoptical computers requires that spatially variant inter-connections be used to construct Boolean functions.We have also given several characteristics that webelieve are required of optical adjuncts to electroniccomputers. Further, we have demonstrated how asimple spatially variant imaging technique can beused to construct two basis sets and two other usefulfunctions. In Section 4 we expand this discussion tothe construction of larger functions, which are opti-mal in gate count and depth.

A

A ( B

B

Fig. 5. Implementation of the OR function. Fig. 7. Logic diagram of the X-NOR or equivalence function.

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5575

Page 9: Complexity analysis of optical-computing paradigms

Masks

a a IV bbCc ddee f

Fig. 8. Implementation of the X-NOR or equivalence function.

4. Construction of Larger Functions

Here we illustrate the extension of our technique tofunctions of larger numbers of inputs. In this exten-sion, we develop all functions from the basis gate NOR,as was done in Section 3. This restriction on gateselection increases the gate count we require forimplementing a given function, and thus tends toimpose worst-case gate counts on our implementa-tions. This is in keeping with our general desire todemonstrate that, even when such conditions areimposed on the proposed method, this method stillretains a significant advantage over other methodsthat have been forwarded. This advantage is areflection of the underlying complexities involved inthe functions we demonstrate. Since these func-tions have implementations that are less complexthan those imposed by SSL and other paradigms, andsince the use of the 2-input NOR gate as the sole gatefunction in our implementation can affect only thecomplexity of the systems we implement by, at most,a constant multiple, we are able to demonstrate thesuperiority of our proposed method.

A. Arbitrary 3-Input Function

The first function that we implement has 3 inputs,and can be described by the shorthand notation for asum-of-products form:

f(a, b, c) = (O, M2 , M3 , M6 )

= ab + Wc + bE

= a-(b + ) + b. (5)

This function would require two AND, two OR, andthree NOT gates to implement in factored form. In

A

B

Fig. 9. Logic diagram of the OR.

aC

e

g

a bC de f9 ( h

bdf

h

Masks

Fig. 10. Implementation of the XOR.

the implementation shown in Fig. 11, the functionrequires eight NOR gates. It should be noted thatonly the uncomplemented inputs are mapped to the

-,input plane, and that all complementation results inan increased gate count.

In Fig. 11 there are three basic elements: masks,image intensifiers, and NOR gates. The process ofconstructing the optical circuit begins by replicatingthe input image (the column at left) as many times asthere are input variables. In this case, there are onlythree variables, and so the image is split into threeimages with one image representing the variables a,b, and c. These variables are then mapped in pairs tothe first level of the computation graph. The outputof this first level is a set of five images (including theimage of the variable a, which does not participate inany computation at this level), which is then focusedon the gate planes at the second level of computation.The process continues until the final gate computesthe desired function.

Notice from Fig. 11 that the number of gates ineach computation plane is one third the number ofinput variables, and that the density of these gates issimilarly reduced. This reduction is due to theinherent parallel nature of light imaged by the spa-tially invariant devices used in the method. Byisolating each variable in separate image planes, wecan map any two images to a gate plane with nooverlap of variables outside the two upon which weoperate. Thus we can achieve both masking andgate functions with one plane. This will be the casewith all functions we demonstrate in this section.

Comparing this circuit realization to those ob-tained -by other. methods, we note that there areactually ten active elements in the optical realization,including two intensifying elements, An additive

5576 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 10: Complexity analysis of optical-computing paradigms

Fig. 11. Optical implementation of the 3-input function.

logic SSL implementation requires eight detectionplanes, which is an apparent saving over our realiza-tion. An analysis of the energy requirements of thetwo implementations shows that the energy requiredat each pixel of the input plane in an additive logicimplementation is 24 times the switching energythreshold of a given detector because of the need toprovide three shifted copies of the input plane foreach detection plane. In the above implementation,the energy required is proportional only to the num-ber of NOR gates and the number of intensifiers.Hence the energy required for implementing thiscircuit is less than half that of an SSL implementation.A programmable logic implementation requires aminimum of 48 gates, and hence more than fourtimes the energy required here. A shadow-castingimplementation requires four sources, but masksone-eighth of the light incident on each computationcell from the detector, hence requiring that the inputenergy per cell be 32 times the detector thresholdenergy.

B. Implementation of Two Adders

We now turn our attention to some practical applica-tions, the full adder, and the carry look-ahead adder.While numerous papers have demonstrated designsof the serial ripple-carry adder, we extend this circuitto demonstrate the design of the parallel carry look-ahead adder. Despite the apparent parallelism hy-pothesized for optical systems, most adders proposedto date are based on the full adder with ripple carrybetween full adder stages. This kind of addition isfundamentally serial, and does not truly exploit thehigh degree of parallelism that should accompany theuse of optical computing elements. After first dem-onstrating the implementation of the full adder, wedemonstrate one possible implementation of the carrylook-ahead adder with spatially variant imaging.Specifically, we demonstrate the design of an adder

that has two 2-bit vectors and a carry as its inputs,and that computes a 2-bit sum vector and a carry asits outputs. Larger adders can be constructed fromthis circuit by rippling the carry between these stages,by increasing the size of the adder to accept a greaternumber of bits in each input vector, or by addinganother level of carry look-ahead. While a 4- or 8-bitadder would be more practical in a real application,the size of these adders merely complicate the figureswithout illustrating the design technique we demon-strate.

1. Full AdderThe design of the full adder was addressed in SectionI. We synthesize an optical realization of the fulladder on the basis of NOR from those equations.Although there are several ways to implement the fulladder, the method we have chosen requires five gatesand three levels in XOR-AND-OR logic, and has amaximum fan-out from any one gate of two. Whilethere is a two-level design, the total gate count is six,and the fan-out of some of the nodes is greater also.The choice of this implementation was dictated by theminimality in the number of gates rather than fan-out restrictions. This realization is shown in Fig.12.

An adder computing an n + 1-bit result from a 2n +1-bit input can be constructed from this circuit by aprocess of replication, masking, and spatially variantimaging in the same manner as the construction ofthe full adder. The resultant ripple-carry adderwould have a gate count that would increase [(n)]and a depth [also 0 (n)]. The details of this adder areomitted so that we may concentrate on the carrylook-ahead adder.

2. Carry Look-Ahead AdderAs with any general adder, the carry look-ahead addertakes as its inputs two n-bit vectors and a carry as its

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5577

Page 11: Complexity analysis of optical-computing paradigms

Fig. 12. Optical implementation of the full adder.

inputs, and generates an n + 1-bit result. The 2-bitcarry look-ahead adder is shown in Fig. 13. In Fig.13 the carry look-ahead adder is implemented on thebasis of the set {AND, OR, XOR}. The circuit operatesby computing in parallel a pair of intermediate signalsfor each pair of inputs. These signals are shown inFig. 13 as Gi and Pi, where i = 1, 2. The signal G, =AiBi is true if a carry is generated by inputs Ai and Bi.The signal Pi = Ai (D Bi is true if a carry is propagatedby the pair. A carry look-ahead adder with 2n + 1

A2 B2 al B, Co

< w~ ~ ~~2 2 1 P

C2 C2

SI S

Fig. 13. Minimized circuit graph for the carry look-ahead adder.

inputs will then generate 2n of these intermediatesignals. From this information, the carryout of theith bit position can be computed as

+ (PiPi-1 ... PiCo), (6)

while the corresponding sum is computed as

Si = Pi CQ-.1. (7)

The cost in the number of gates and the delay inrealizing the adder for pairs of n inputs on the basis ofthe set {AND, OR, XOR} can be computed by a countingargument. The terms Gi and Pi can be realized inunit size and depth. The computation of the sum Sican also be realized in unit size and depth. Hence foreach sum output at least three gates are required.The carry Ci requires one i + 1-input OR gate, and iAND gates with (2, 3, . . . , i, i + 1) inputs. The i +1-input OR gate may be implemented by a binary treewhose nodes are 2-input OR gates. For i = 0, there isno cost since the carry CO is an input. If i = 1, then

5578 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 12: Complexity analysis of optical-computing paradigms

only one 2-input OR is required. For i > 1, i 2-inputOR gates are required, and the depth of the OR is[log2 il. Thus the i + 1-input OR may be replaced bya circuit consisting of i OR gates, with a depth ofrlog2 i exactly.

The analysis of the number of AND gates is some-what more involved. We can easily construct aj-input AND gate from a binary tree of j - 1 2-inputAND gates (j 2 2). Thus, the total number of 2-in-put AND gates required to implement the carry for theith bit is

i+l i(i +1)CND' = 1: ( -) = 2: =

j=2 j= 2(8)

while the depth of the resulting tree is the depth ofthe largest subtree, which is

DAND'S = [g 2 ( + 1)]. (9)

Thus the total cost of a carry look-ahead adder withtwo n-bit inputs is the sum over the costs of alloutputs:

CCLA = (3 + i + )l /~ 2

3 + 2+ 2i 2

3n(n + 1) 1(n n2 n3\= 3n + - + -2- +-

1 3 223= 6n3 + n + n, (10)

where CLA is the carry look-ahead adder.The depth is found as the maximum depth, which is

associated with the final carry:

Dc, = log2 n] + log2(n + 1)] + 2. (11)

The implementation to the 2-bit carry look-aheadadder is shown in Fig. 14. An upper bound to thenumber of NOR gates required to implement thiscircuit can be calculated by multiplying the number ofXOR, AND, and OR gates by the number of NOR gatesrequired to implement each of these functions, andsumming the results. This is an upper bound be-cause the possibility exists that minimization such ascanceling two successive inversions can reduce thenumber of gates actually required. Even though thedirect conversion of the carry look-ahead adder to aNOR-based implementation increases the gate count,this increase is bounded by, at most, a constantmultiple; hence the implementation will still requireonly 0(n 3 ) gates, and 0(log 2 n) depth. The directoptical implementation of this circuit is shown in Fig.15.

C2

Fig. 14. Minimizedahead adder.

S2 51

NOR gate implementation of the carry look-

5. Comparisons with Other Methods

Having developed our method and demonstratedseveral functional implementations, we can now dem-onstrate the viability of our method in relation toother digital optical-computing methods that havebeen advanced. In terms of complexity, our methodhas advantages over all the other methods analyzed,since the other methods are so similar in complexity.A tabular comparison of the orders of complexity ofthe proposed system and the other systems addressedby this paper may be found in Table 2.

A. Present Method Compared With Symbolic Substitution

In analyzing SSL, we saw that the power required atthe input plane increased as fl(n * 2n) times thepower required to switch the detector, and that therewas not way to reduce this requirement withoutleaving the Boolean domain. While SSL is a viablemethod for certain classes of algorithms that can betransformed to serial implementations operating onmany sets of a small number of bits,24 or thosealgorithms can make use of large numbers of don't-care combinations, as a general-purpose computingparadigm it suffers from the above-mentioned limita-tions. In contrast, the method proposed in thispaper has several advantages.

Perhaps the greatest advantage with this method isthe ability to achieve near-minimal complexity in gatecount (and hence power dissipation) while retainingthe high degree of parallelism promised by optics.Once the input has been duplicated to provide anumber of separate images that is equal to the

10 September 1992 Vol. 31, No. 26 APPLIED OPTICS 5579

Page 13: Complexity analysis of optical-computing paradigms

Fig. 15. Optical implementation of the carry look-ahead adder.

Table 2. Order Analysis of an n-Bit Parallel Adder by Using theMethods In This Paper,

Numberof Gates, Circuit Normalized

Light Sources, Depth or Energyor Detection Time 1 Gate/

Method Planes Complexity Detector = 1

SSL 0(2n)(1) O(n2n)Shadow casting 0(2n) O(1) (2n)Programmable logic fl(n2n) £l(n) W(n2n)Proposed spatially O(n3 ) 0(log2 n) O(n

3 )variant method(carry look-aheadadder)

aNormalized energy refers to the amount of energy required tocompute the function in terms of the energy required to switch onegate or detector.

number of input variables la process which has an0(n) gate or amplification count, and a depth0[(log2 n)]1, the subsequent computations are per-formed in a minimal number of operations, providedthat the circuit has been properly minimized. Thusthe design cycle is similar to that of the electronicdomain in that the initial effort is devoted to computa-tions and methods that have an abstract connectiononly to the physical realization, but that generate agood realization when translated to the physicalrealm. Hence we may use well-established methodsof circuit minimization in our initial design, with thecertainty that any minimal design so conceived will beincreased in complexity by an additive linear termonly, and the depth will be increased by an additivelogarithmic term only. More concretely, given afunction whose implementation requires an O(n3)

gate count and an 0(log 2 n) depth, we can implement

5580 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 14: Complexity analysis of optical-computing paradigms

the function optically with costs:

C(fA) = O(n3) + 0(n); D(fn)

= 0(log2 n) + 0(log2 n). (12)

Thus for functions of a large number of inputs, thecost involved in duplicating the input plane is a smallfraction of the cost of implementation, which is itselfmuch smaller than that which could be achieved inSSL.

We achieve a further gain over most SSL implemen-tations by using simple intensity coding. A factor of2 increase in data density achieved by this encodingmay be increased over that of SSL in those cases inwhich SSL must provide dead space between fields ofoperands. Our density of active elements is alsotwice that of SSL. In additive SSL there is only oneactive element for every n input bit, which corre-sponds to 2n pixel positions. The method we pro-pose here will result in one active element for every npixel.

B. Present Method Compared With Shadow-Casting Logic

Shadow-casting logic is the most difficult of theproposed paradigms with which to compare our pro-posed system. This difficulty results from the mea-sures by which the paradigm's complexity is measured.The decrease of available light with increasingnumbers of inputs does show an exponential varianceand the number of sources required is 0(2n). Theredoes not seem to be a good choice for a single unit ofmeasure that can combine these two variances. Inthe best case for shadow casting, i.e., the implementa-tion of a function with only one minterm in its ON-set,the available light at the detector is only 2-n of thatwhich is incident on the corresponding cell. In thiscase, however, a circuit with 0(n) gates may beconstructed by spatially variant techniques to com-pute the function. That this is true may be seenfrom the following argument. A single minterm of nvariables may be implemented by a n-input AND gatewhose inputs are suitable assigned the value of eachvariable or that variable's complement, depending onthe form of the minterm. Hence, at most, n invert-ers are required, in addition to the n-input AND, if thecomplements of the inputs are not available. Nowan n-input AND gate may be constructed from n - 12-input AND gates; the resulting tree has a depth of(log2 n). Thus a function with n-inputs, and withone minterm only in its ON set may be implementedby a circuit containing, at most, n inverters and n - 1AND gates, for a total of 2n - 1 active elements.Even when restricted to NOR gates alone, each AND

may be replaced by, at most, three NOR gates, andeach inverter with one, which results in a circuit with,at most, 4n - 3 gates. Since the power required of acircuit is proportional to the number of gates, such afunction can be constructed with 0(n) power require-ments. At the other extreme, computing the parityfunction a shadow-casting implementation requires

2n-1 sources or detectors. An implementation of thisfunction in terms of the XOR requires only 0(n) XORgates, each of which can again be replaced by aconstant number of NOR gates. In between theseextremes, there are functions that require a directoptical implementation of the best theoretical solu-tion and that have fl(2n/n) gate functions and, hence,power dissipation. This will still be an improvementover that achievable with Shadow Casting.

C. Present Method Compared With Programmable Logic

Our method provides many of the same strengthswhile eliminating several of programmable logic'sweaknesses. As with programmable logic, ourmethod uses optical nonlinearities directly as logicgates. The use of general interconnections, how-ever, overcomes the exponential complexity imposedby the limited interconnection capabilities of thecrossover network.

When we compare the carry look-ahead adder witha parallel adder implemented with the crossoverinterconnection only, we can see that, for the 2-bitadder demonstrated, 34 NOR gates were required inthe spatially variant implementation, while the lattermethod requires a minimum of 2 x 5 x 2 = 320 ANDand OR gates. Even given that the latter method canprovide fault tolerance,2 5 a triple-modular-redundantversion of the spatially variant circuit with votingwould still require fewer gates, and, hence, be funda-mentally more reliable.

Further, even when a function implementation canbe minimized in programmable logic to a level requir-ing only n x 2n logic elements, the number of outputsthat can then be computed is indeterminate, and islimited by the ability to find contention-free pathsthrough the OR stage of the network.

D. Present Method Compared With Combinatorial Logic

Comparison with the method of combinatorial logic isdifficult. While the spatially invariant version of thecombinatorial logic method has been shown to becapable of implementing only a small subset of thepossible functions, allowing spatially variant intercon-nections may extend the capabilities of this system toallow a greater portion, perhaps all, of the possibleBoolean functions to be computed. We note, how-ever, that the system that we propose can implementall Boolean functions, at a cost of power and speedthat is commensurate with the best theoretical imple-mentations. Since our system is spatially variantoverall and uses optical elements directly as gates,there is no need to precompute partial terms electron-ically, and, hence, the speed should be close to themaximum that is attainable optically.

6. Conclusions

We have shown that simple spatially invariant inter-connects can reduce the complexity required of anoptical implementation. This reduction was achievedbecause the spatially invariant implementations im-

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5581

Page 15: Complexity analysis of optical-computing paradigms

pose such great restrictions on the capabilities of anoptical-computing system. We have seen that sev-eral seemingly dissimilar optical-computing para-digms have complexity measures that are quite simi-lar. SSL requires 2 detector planes and 2n shiftoperations per detection operation. Shadow-castinglogic requires n planes of SLM's and 0(2n-1) lightsources, with a resultant decrease in illumination onthe detector plane that is a function of 2-". Program-mable logic requires a field in each plane of at least 2ngates width and a number of planes, of which at least2n are reserved. The spatially invariant subsystemof the combinatorial logic method cannot compute allthe possible Boolean functions. It has been ourcontention that all these systems share similar com-plexity measures because of the nature of the restric-tions imposed by their imaging techniques.

Obviously what we have presented here is in thenature of a plausibility argument rather than a strictproof. A rigorous model of spatially invariant imag-ing elements and spatially invariant interconnectionsapplicable to the level considered here is needed.Such a model could be used as the basis of construc-tive proofs that could demonstrate rigorously whetherthere is any set of spatially invariant interconnec-tions that can optimally construct Boolean functionsof a higher order. The model of Thompson2 6 has hada great impact on very large-scale integrated design; asimilar model for spatially invariant optical systems,or possibly a model incorporating spatially varianttechniques, should make a similar impact on the fieldof optical computing.

Appendix A. Some Results in Boolean Complexity

The limits imposed by Shannon's theorem' may seemsurprising, but are well established. To clarify theimplications of this theorem the following discussionmay prove helpful. The normal two-level realizationof a minimized sum-of-products Boolean formulaconstitutes a general design methodology, and, hence,can result in expensive implementations of mostfunctions. The worst-case function in this regard,the parity function, has 2n-1 product terms, whichrequire that either the actual value of each variable orthat variable's complement be present. Implement-ing this function directly from its minimized sum-of-products representation requires the construction of2"-1 n-input AND gates and one 2n-1-input OR gate.While each of the n-input AND gates may be con-structed from a Binary tree of n - 1 2-input ANDgates, realization of the 2n-'-input OR gate requires abinary tree of 2n- 1 - 1 OR gates. While the complex-ity of this realization is in excess of that of theShannon limit, there exists a realization of thisfunction that requires only n - 1 XOR gates and has adepth of log2 n - 1. This example demonstratesthat the Karnaugh map or the Quine-McCluskeyreduction techniques can fail to yield a truly mini-mum realization, and other techniques such as aring-sum expansion (which is used to derive the XORimplementation) or factoring may achieve reductionsthat other techniques cannot.

The problem of finding a minimal realization of afunction becomes worse when functions with mutipleoutputs are considered. Each output is, in general, afunction of all inputs and hence may require a largenumber of gates to implement. In these cases,finding terms that are common to two or moreoutputs may decrease the number of gates required.All these techniques can fail to find a minimumrealization in cases in which a function has anunderlying structure that can be exploited. Hence,careful analysis of the nature of the function beingimplemented is required.

It is interesting to note that increasing the fan-outof gates does not decrease the complexity of theimplementation of a function by more than a con-stant multiple (restricting fan-out to 1 imposes aspecial case). Additionally, increasing the fan-in ofthe gate functions (actually changing to a differentbasis set) increases the number of gates required toimplement a function by, at most, a constant multiple.For proofs of these two statements the reader isreferred to Ref. 23. Consequently, from a theoreti-cal viewpoint, we need only analyze the cost ofimplementing a function with any of the 2-input basissets to determine the order of complexity of animplementation, as long as we allow a fan-out of 2.

This research was supported by National ScienceFoundation grant no. MIP-8909326. We are in-debted to Udi Manber of the Computer ScienceDepartment, University of Arizona, for many fruitfuldiscussions.

References1. C. E. Shannon, "The synthesis of two-terminal switching

circuits," Bell Syst. Tech. J. 28, 59-98 (1949).2. W. F. McColl, "Planar circuits have short specifications," in

Lecture Notes on Computer Science, G. Goos and J. Hartmanis,eds., (Springer-Verlag, New York, 1985), Vol. 182, pp. 231-242.

3. K. H. Brenner, A. Huang, and N. Streibl, "Digital opticalcomputing with symbolic substitution," Appl. Opt. 25, 3054-3060 (1986).

4. B. K. Jenkins, P. Chavel, R. Forchheimer, A. A. Sawchuk, andT. C. Strand, "Architectural implications of a digital opticalprocessor," Appl. Opt. 23, 3465-3474 (1984).

5. A. Huans, "Parallel algorithms for optical digital computers,"in Proceedings of the Tenth International Optical ComputingConference (Institute of Electrical and Electronics Engineers,New York, 1983), p. 13.

6. K. H. Brenner, "New implementation of symbolic substitutionlogic," Appl. Opt. 25, 3061-3064 (1986).

7. J. Weigelt, "Binary logic by spatial filtering," Opt. Eng. 26,28-32 (1987).

8. K. H. Brenner, A. W. Lohmann, and T. M. Merklein, "Sym-bolic substitution implemented by spatial filtering logic," Opt.Eng. 28, 390-396, (1989).

9. D. P. Casasent and E. C. Botha, "Multifunctional opticalprocessor based on symbolic substitution," Opt. Eng. 28,425-433 (1989).

10. H. Jeon, M. A. G. Abushagur, A. A. Sawchuk, and B. K.Jenkins, "Digital optical processor based on symbolic substitu-tion using holographic matched filtering," Appl. Opt. 29,2113-2125 (1990).

5582 APPLIED OPTICS / Vol. 31, No. 26 / 10 September 1992

Page 16: Complexity analysis of optical-computing paradigms

11. J. Tanida and Y. Ichioka, "Optical logic array processor usingshadowgrams," J. Opt. Soc. Am. 73, 800-809 (1983).

12. J. Tanida and Y. Ichioka, "Optical-logic-array processor usingshadowgrams. III. Parallel neighborhood operations and anarchitecture of an optical digital-computing system," J. Opt.Soc. Am. A 2, 1245-1253 (1985).

13. J. Tanida and Y. Ichioka, "OPALS: optical parallel arraylogic system," Appl. Opt. 25, 1565-1570 (1986).

14. J. Tanida and Y. Ichioka, "Modular components for an opticalarray logic system," Appl. Opt. 26, 3954-3960 (1987).

15. M. J. Murdocca, A. Huang, J. Jahns, and N. Streibl, "Opticaldesign of programmable logic arrays," Appl. Opt. 27, 1651-1660 (1988).

16. M. J. Murdocca and T. J. Cloonan, "Optical design of a digitalswitch," Appl. Opt. 28, 2505-2517 (1989).

17. P. S. Guilfoyle and W. J. Wiley, "Combinatorial logic baseddigital optical computing architectures," Appl. Opt. 27, 1661-1673 (1988).

18. A. Louri, "Parallel implementation of optical symbolic substi-tution logic using shadow-casting and polarization," Appl.Opt. 30, 540-548 (1991).

19. A. Louri, "Throughput enhancement for optical symbolic

substitution computing systems," Appl. Opt. 29, 2979-2981(1990).

20. G. Eichmann, A. Kostrzewski, D. H, Kim, and Y. Li, "Opticalhigher-order symbolic recognition," Appl. Opt. 29, 2135-2147(1990).

21. J. Weigelt, "Space-bandwidth product and crosstalk of spatialfiltering methods for performing binary logic optically," Opt.Eng. 27, 883-892, (1988).

22. R. Arrathoon and S. Kozaitis, "Shadow casting for multiple-valued associative logic," Opt. Eng. 25, 29-37 (1986).

23. J. E. Savage, The Complexity of Computing (Wiley, New York,1976).

24. A. Louri, "Three-dimensional optical architectural and data-parallel algorithms for massively parallel computing," IEEEMicro II, 24-27, 65-82 (1991).

25. M. J. Murdocca, "Fault avoidance for optical logic arrays andregular free-space interconnects," in Digital Optical Comput-ing II, R. Arrathoon, ed., Proc. Soc. Photo-Opt. Instrum. Eng.1215, 116-122) (1990).

26. C. D. Thompson, "A complexity theory for VLSI," Ph.D.dissertation (Carnegie-Mellon University, Pittsburgh, Pa.,1980).

10 September 1992 / Vol. 31, No. 26 / APPLIED OPTICS 5583