cellular neural network simulation and modeling oroszi balázs 2006.01.06
TRANSCRIPT
Cellular Neural Network Simulation and Modeling
Oroszi Balázs
2006.01.06.
Overview
Introduction: About the CNN in general
Basic characteristics of the CNN
Modeling and simulation of the CNN architecture
The functional model of the CNN architecture
– Handling special cases to increase performance
From theory to practice: Realization of the CNN simulator
Summary
Demonstration
About the CNN in general
In 1988, papers by Leon O. Chua introduced the concept of the Cellular Neural Network. CNNs can be defined as “2D or 3D arrays of mainly locally connected nonlinear dynamical systems called cells, whose dynamics are functionally determined by a small set of parameters which control the cell interconnection strength” (Chua). These parameters determine the connection pattern and are collected into the so-called cloning templates, which, once determined, define the processing of the whole structure.
Basic Characteristics of the CNN
The CNN can be defined as an M x N array of identical cells arranged in a rectangular grid. Each cell is locally connected to its 8 nearest surrounding neighbors.
Each cell is characterized by $u_{ij}$, $y_{ij}$ and $x_{ij}$, which are the input, the output and the state variable of the cell, respectively.
The output is related to the state by the nonlinear equation:
$$y_{ij} = f(x_{ij}) = 0.5\,\bigl(|x_{ij} + 1| - |x_{ij} - 1|\bigr)$$
The state transition of neuron (i, j) is governed by the following differential equation:
$$\dot{x}_{ij} = -x_{ij} + \sum_{C(k,l) \in S_r(i,j)} A(i,j,k,l)\, y_{kl}(t) + \sum_{C(k,l) \in S_r(i,j)} B(i,j,k,l)\, u_{kl}(t) + z_{i,j}$$
Basic Characteristics of the CNN (2)
Here C(i,j) represents the neuron at column i, row j, $S_r(i,j)$ represents the neurons within radius r of the neuron C(i,j), and $z_{i,j}$ is the threshold (bias) of the cell C(i,j).
The coefficients A(i, j, k, l) and B(i, j, k, l) are known as the cloning templates. In general, they are nonlinear, time- and space-variant operators.
If they are considered linear, time- and space-invariant, they can simply be represented by matrices.
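For linear, space-invariant templates with r = 1, the state equation reduces to two 3 x 3 weighted sums per cell. The following is a minimal Python sketch of that right-hand side, not the simulator's actual code. The function names, the single bias value shared by all cells, the fixed boundary value, and the convention that template entry `[di + 1][dj + 1]` weights the neighbour at offset `(di, dj)` are all assumptions made for illustration.

```python
def output(x):
    """Piecewise-linear output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def state_derivative(x, u, A, B, z, boundary=0.0):
    """Right-hand side of the state equation for every cell (radius r = 1).

    x, u     : grids (lists of lists) holding the state and the input
    A, B     : 3 x 3 linear, space-invariant cloning templates
    z        : bias, assumed here to be the same for every cell
    boundary : value assumed for y and u outside the grid (hypothetical choice)
    """
    rows, cols = len(x), len(x[0])
    y = [[output(v) for v in row] for row in x]

    def at(grid, i, j):
        # Cells outside the grid are replaced by the fixed boundary value.
        if 0 <= i < rows and 0 <= j < cols:
            return grid[i][j]
        return boundary

    dx = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = -x[i][j] + z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += A[di + 1][dj + 1] * at(y, i + di, j + dj)
                    acc += B[di + 1][dj + 1] * at(u, i + di, j + dj)
            dx[i][j] = acc
    return dx
```

The output values are recomputed from the state on every call, because with a nonzero A template the feedback term changes at every time step.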
Modeling and simulation of the CNN architecture
Simulation plays an important role in the design of the CNN cloning templates.
Therefore, it has to be fast enough to allow the design phase of various templates to be accomplished in reasonable time.
At the same time, the simulation has to be accurate enough to reflect the behavior of the analog circuitry correctly.
In practice, the simulation of the CNN involves a trade-off between accuracy and computation time.
Modeling and simulation of the CNN architecture (2)
The true processing capabilities of CNNs for high-speed parallel processing are only fully exploited by dedicated VLSI hardware realizations.
Typical CNN chips may contain up to 200 transistors per pixel.
At the same time, industrial applications require large enough grid sizes (around 100 x 100).
Thus, CNN chip designers must confront complexity levels larger than $10^6$ transistors, most of them operating in analogue mode.
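The order of magnitude follows directly from the two figures quoted on this slide, taken at their upper bound:

$$200\ \tfrac{\text{transistors}}{\text{pixel}} \times (100 \times 100)\ \text{pixels} = 2 \times 10^{6}\ \text{transistors}$$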
Modeling and simulation of the CNN architecture (3)
On the one hand, high-level simulation, which is focused on emulating the functional behaviour, cannot realistically reflect the underlying electronic circuitry. Its lack of detail makes it ill-suited for reliable IC simulation.
On the other hand, the SPICE-type transistor-level simulators, although very accurate, are barely capable of handling more than about $10^5$ transistors and may take several days of CPU time for circuit netlists containing about $10^6$ transistors. Hence, these low-level tools are ill-suited for simulating large CNN chips.
Therefore, it would be necessary to bridge the gap between these approaches with a method that gives very accurate results in reasonable (though not real) time.
However, our main concern now is fast simulation, so in the rest of this presentation we shall focus on the functional modeling of the CNN architecture.
The functional model of the CNN architecture
The output of a CNN model simulation is the final state reached by the network after evolving from an initial state under the influence of a specific input and boundary conditions. The following block diagram shows the state-transition and output of a single cell:
[Figure: block diagram of a single cell]
The functional model of the CNN architecture (2)
In the most general case, the final state of one cell can be described by the following equation:
$$x(t) = x(t_0) + \int_{t_0}^{t} \dot{x}(\tau)\, d\tau = x(t_0) + \int_{t_0}^{t} f(x(\tau))\, d\tau$$
Here f(x) stands for the right-hand side of the state equation, not for the output nonlinearity.
As a closed form for the solution of the above equation cannot be given, it must be integrated numerically.
For the simulation of such equations on a digital computer, they must be mapped into a discrete-time system that
– emulates the continuous-time behavior,
– has similar dynamics
– and converges to the same final state.
The error committed by this emulation depends on the choice of the method of integration, i.e. the way in which the integral is calculated.
The functional model of the CNN architecture (3)
There is a wide variety of integration algorithms that can be used to perform this task. However, only three of them are considered here. These methods are:
the explicit Euler's formula:
$$\int_{t_n}^{t_{n+1}} f(x(t))\, dt \approx \Delta t\, f(x(t_n))$$
the predictor-corrector algorithm:
$$\int_{t_n}^{t_{n+1}} f(x(t))\, dt \approx \frac{\Delta t}{2}\,\bigl[f(x(t_n)) + f(x^{(0)}(t_{n+1}))\bigr]$$
where
$$x^{(0)}(t_{n+1}) = x(t_n) + \Delta t\, f(x(t_n))$$
and the fourth-order Runge-Kutta method:
$$\int_{t_n}^{t_{n+1}} f(x(t))\, dt \approx \frac{1}{6}\,(k_1 + 2k_2 + 2k_3 + k_4)$$
where
$$\begin{aligned}
k_1 &= \Delta t\, f(x(t_n)) \\
k_2 &= \Delta t\, f\bigl(x(t_n) + \tfrac{1}{2} k_1\bigr) \\
k_3 &= \Delta t\, f\bigl(x(t_n) + \tfrac{1}{2} k_2\bigr) \\
k_4 &= \Delta t\, f\bigl(x(t_n) + k_3\bigr)
\end{aligned}$$
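The three update rules can be written as single-step functions and compared on a small test problem. This is a hedged Python sketch for a scalar state, not the simulator's code. The function names and the test equation dx/dt = -x + 1 (chosen because its exact solution, 1 - e^(-t) for x(0) = 0, is known) are assumptions for illustration.

```python
import math


def euler_step(f, x, dt):
    """Explicit Euler: one evaluation of f per step."""
    return x + dt * f(x)


def predictor_corrector_step(f, x, dt):
    """Euler predictor followed by a trapezoidal corrector: two evaluations."""
    fx = f(x)
    x_pred = x + dt * fx              # predictor x^(0)(t_{n+1})
    return x + 0.5 * dt * (fx + f(x_pred))


def rk4_step(f, x, dt):
    """Classical fourth-order Runge-Kutta: four evaluations per step."""
    k1 = dt * f(x)
    k2 = dt * f(x + 0.5 * k1)
    k3 = dt * f(x + 0.5 * k2)
    k4 = dt * f(x + k3)
    return x + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, f, x0, dt, n):
    """Apply the given single-step rule n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = step(f, x, dt)
    return x


def rhs(x):
    # Hypothetical test problem: dx/dt = -x + 1
    return -x + 1.0


exact = 1.0 - math.exp(-1.0)          # exact solution at t = 1 for x(0) = 0
errors = {
    "euler": abs(integrate(euler_step, rhs, 0.0, 0.1, 10) - exact),
    "pc": abs(integrate(predictor_corrector_step, rhs, 0.0, 0.1, 10) - exact),
    "rk4": abs(integrate(rk4_step, rhs, 0.0, 0.1, 10) - exact),
}
print(errors)
```

For the same step size, the cost per step grows with the number of evaluations of f (one, two and four respectively), which is where the speed difference between the methods comes from.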
The functional model of the CNN architecture (4)
The Euler method is the fastest, but gives the least accurate convergence behaviour.
Runge-Kutta gives the best results, but is much slower. In this case, four auxiliary values (k1-k4) are computed in every step and then combined in a weighted average. This makes it rather ill-suited for applications that prefer speed over accuracy.
If the main goal were accuracy and robustness, Runge-Kutta would undoubtedly be the method of choice. In our case, however, as the primary target is a fast, working implementation of a CNN simulator as an image processor, we shall choose the Euler method.
Handling special cases for increasing performance
What can be considered a special case from a programming point of view?
– special input
– special templates (A, B)
To gain significant speed improvements, the case of special templates should be examined.
It is not uncommon for templates that extract local properties of the image (like edge detectors) to use a fully zero A template.
I have found that revisiting the state equation for the case when the A template is fully zero yields significant improvements in speed.
Handling special cases for increasing performance (2)
Given A = 0, the state equation takes the following form:
$$\dot{X} = -X + BU + Z$$
(BU + Z) is constant during the process. Let: BU + Z = C
Using Euler integration:
$$\begin{aligned}
X_1 &= X_0 + \Delta t\,\dot{X}_0 = X_0 + \Delta t\,(-X_0 + C) = X_0(1-\Delta t) + C\Delta t \\
X_2 &= X_1(1-\Delta t) + C\Delta t = [X_0(1-\Delta t) + C\Delta t](1-\Delta t) + C\Delta t \\
    &= X_0(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t \\
X_3 &= X_2(1-\Delta t) + C\Delta t = [X_0(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t](1-\Delta t) + C\Delta t \\
    &= X_0(1-\Delta t)^3 + C\Delta t(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t \\
X_4 &= X_3(1-\Delta t) + C\Delta t = [X_0(1-\Delta t)^3 + C\Delta t(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t](1-\Delta t) + C\Delta t \\
    &= X_0(1-\Delta t)^4 + C\Delta t(1-\Delta t)^3 + C\Delta t(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t \\
    &\;\;\vdots
\end{aligned}$$
The pattern can clearly be seen by now.
Handling special cases for increasing performance (3)
$$X_4 = X_0(1-\Delta t)^4 + C\Delta t(1-\Delta t)^3 + C\Delta t(1-\Delta t)^2 + C\Delta t(1-\Delta t) + C\Delta t$$
In each new step $X_0(1-\Delta t)^n$ gets multiplied by $(1-\Delta t)$, so its exponent increases. The remaining part is a geometric series, so the general equation for calculating the n-th state is:
$$X_n = X_0(1-\Delta t)^n + \sum_{k=0}^{n-1} C\Delta t\,(1-\Delta t)^k$$
Using the general formula for the sum of a geometric series:
$$S_n = a_1\,\frac{q^n - 1}{q - 1}$$
the sum of the above geometric series turns into:
$$\sum_{k=0}^{n-1} C\Delta t\,(1-\Delta t)^k = C\Delta t\,\frac{(1-\Delta t)^n - 1}{(1-\Delta t) - 1} = C\Delta t\,\frac{(1-\Delta t)^n - 1}{-\Delta t} = -C\,[(1-\Delta t)^n - 1] = C\,[1 - (1-\Delta t)^n]$$
So the state equation using this will be:
$$\begin{aligned}
X_n &= X_0(1-\Delta t)^n + C\,[1 - (1-\Delta t)^n] = X_0(1-\Delta t)^n + C - C(1-\Delta t)^n \\
    &= (X_0 - C)(1-\Delta t)^n + C \\
    &= (X_0 - BU - Z)(1-\Delta t)^n + BU + Z
\end{aligned}$$
Handling special cases for increasing performance (4)
$$X_n = (X_0 - BU - Z)(1-\Delta t)^n + BU + Z$$
This result is of utmost importance regarding speed, because:
– the number of iterations that need to be performed to get to the n-th state is reduced to 1
– thus we can get to the final state immediately, given the U input, the B template and the Z bias
As multiple iterations through an image cause lots of memory accesses that miss the cache (which is very slow), this improvement in the special case of A = 0 gives a huge boost in speed.
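The closed form can be checked numerically against the step-by-step Euler iteration it replaces. The sketch below does this for a single cell in Python; the function names and sample values are assumptions for illustration, and `c` stands for the per-cell constant BU + Z, which only has to be computed once.

```python
def euler_iterate(x0, c, dt, n):
    """n explicit Euler steps of dX/dt = -X + C, one step at a time."""
    x = x0
    for _ in range(n):
        x = x + dt * (-x + c)
    return x


def closed_form(x0, c, dt, n):
    """n-th Euler state in one evaluation: (X0 - C)(1 - dt)^n + C."""
    return (x0 - c) * (1.0 - dt) ** n + c


# Hypothetical sample values for one cell
x0, c, dt, n = 0.2, 0.7, 0.1, 50
difference = abs(euler_iterate(x0, c, dt, n) - closed_form(x0, c, dt, n))
print(difference)
```

For any step size with 0 < Δt < 2, the factor (1 - Δt)^n tends to zero as n grows, so the state converges to BU + Z regardless of the initial state. Under that assumption the final state of each cell is simply BU + Z.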
From theory to practice: Realization of the CNN simulator
Environment used: Avisynth (http://www.avisynth.org)
– A powerful tool for video post-production.
– Special programming language, designed specifically for video processing.
– Its functions are implemented under the hood as C/C++ dynamic link libraries (DLLs), which are called plugins.
– Plugins expose an interface (functions) towards the scripting language, from which these functions can be called.
– The CNN simulator is realised as a plugin (DLL) for Avisynth, written in C++.
Primary goal: speed
– but also make sure it behaves according to the state equation
Summary
CNN simulation:
– functional modeling (mathematical calculation according to the state equation)
– circuit-level modeling
Implementation:
– based on functional model
– using Avisynth (http://www.avisynth.org)
– written in the C++ programming language
– available in my web-space at http://digitus.itk.ppke.hu/~oroba