generic visual perception

Generic visual perception processor

INTRODUCTION

While computing technology is growing by leaps and bounds, the human brain continues to be the world's fastest computer. Combine brain power with seeing power, and you have the fastest, cheapest, most extraordinary processor ever: the human eye. Little wonder that research labs the world over are striving to produce a near-perfect electronic eye.

The generic visual perception processor (GVPP) has been developed after 10 long years of scientific effort. The GVPP can automatically detect objects and track their movement in real time. The GVPP, which crunches 20 billion instructions per second (BIPS), models the human perceptual process at the hardware level by mimicking the separate temporal and spatial functions of the eye-to-brain system. The processor sees its environment as a stream of histograms regarding the location and velocity of objects.

GVPP has been demonstrated as capable of learning-in-place to solve a variety of pattern recognition problems. It boasts automatic normalization for varying object size, orientation and lighting conditions, and can function in daylight or darkness.

This electronic eye on a chip can now handle most tasks that a normal human eye can. That includes driving safely, selecting ripe fruits, reading and recognizing things. Sadly, though modeled on the visual perception capabilities of the human brain, the chip is not really a medical marvel, poised to cure the blind.

College of Applied Sciences, Mavelikkara


BACKGROUND OF THE INVENTION

The invention relates generally to methods and devices for automatic visual perception, and more particularly to methods and devices for processing image signals using two or more histogram calculation units to localize one or more objects in an image signal using one or more characteristics of an object, such as the shape, size and orientation of the object. Such devices can be termed electronic spatio-temporal neurons, and are particularly useful for image processing, but may also be used for other signals, such as audio signals. The techniques of the present invention are also particularly useful for tracking one or more objects in real time.

It is desirable to provide devices including combined data processing units of a similar nature, each addressing a particular parameter extracted from the video signal. In particular, it is desirable to provide devices including multiple units for calculating histograms, or electronic spatio-temporal neurons (STN), each processing a parameter DATA(A) by a function in order to generate individually an output value.

The present invention also provides a method for perception of an object using characteristics, such as its shape, its size or its orientation, using a device composed of a set of histogram calculation units.

Using the techniques of the present invention, a general outline of a moving object is determined with respect to a relatively stable background; then, inside this outline, elements that are characterized by their tone, color, relative position etc. are determined.
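The localization idea above can be illustrated with a short Python sketch. It assumes the moving object has already been separated from the background as a binary mask (1 = pixel belongs to the object); the function names `projection_histograms`, `extent` and `localize` are hypothetical and are not taken from the invention. Two histograms, one per image axis, count object pixels per column and per row, and the first and last non-empty bins give the object's extent.

```python
def projection_histograms(mask):
    """Count object pixels per column and per row of a binary mask."""
    row_hist = [sum(row) for row in mask]
    col_hist = [sum(col) for col in zip(*mask)]
    return col_hist, row_hist


def extent(hist, min_count=1):
    """Return (first, last) bin index holding at least min_count pixels."""
    hits = [i for i, count in enumerate(hist) if count >= min_count]
    return (hits[0], hits[-1]) if hits else None


def localize(mask):
    """Bounding extent of the object as ((x_first, x_last), (y_first, y_last))."""
    col_hist, row_hist = projection_histograms(mask)
    return extent(col_hist), extent(row_hist)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = localize(mask)  # ((1, 3), (1, 2))
```

Raising `min_count` is one simple way to ignore isolated noisy pixels, at the cost of trimming thin parts of the object.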

POTENTIAL SIGHTED

The GVPP was invented in 1992 by BEV founder Patrick Pirim. The premise was that it would be relatively simple for a CMOS chip to implement in hardware the separate contributions of temporal and spatial processing in the brain. The brain-eye system uses layers of parallel-processing neurons that pass the signal through a series of preprocessing steps, resulting in real-time tracking of multiple moving objects within a visual scene.

Pirim created a chip architecture that mimicked the work of the neurons, with the help of multiplexing and memory. The result is an inexpensive device that can autonomously perceive and then track up to eight user-specified objects in a video stream based on hue, luminance, saturation, spatial orientation, speed and direction of motion.

The GVPP tracks an object, defined as a certain set of hue, luminance and saturation values in a specific shape, from frame to frame in a video stream by anticipating where its leading and trailing edges make differences with the background. That means it can track an object through varying light sources or changes in size, as when an object gets closer to the viewer or moves farther away.

The GVPP's major performance strength over current-day vision systems is its adaptation to varying light conditions. Today's vision systems dictate uniform, shadowless illumination, and even next-generation prototype systems, designed to work under "normal" lighting conditions, can be used only from dawn to dusk. The GVPP, on the other hand, adapts to real-time changes in lighting without recalibration, day or night.



For many decades the field of computing has been trapped by the limitations of traditional processors, and many futuristic technologies have been held back by them. These limitations stem from the basic architecture of these processors. Traditional processors work by slicing each complex program into simple tasks that a processor can execute. This requires that an algorithm exist for solving the particular problem. But there are many situations where no algorithm exists, or where a human is unable to understand the algorithm. Even in these extreme cases the GVPP performs well, because it can solve a problem with its neural learning function. Neural networks are also extremely fault tolerant. By design, even if a group of neurons is damaged, the neural network only suffers a smooth degradation of performance; it won't abruptly fail to work. This is a crucial difference from traditional processors, which fail to work even if a few components are damaged. The GVPP recognizes, stores, matches and processes patterns. Even if a pattern in the input is not recognizable to a human programmer, the neural network will dig it out. Thus the GVPP becomes an efficient tool for applications like pattern matching and recognition.

HOW IT WORKS

Basically, the chip is made of a neural network modeled on the structure of the human brain. The basic element here is a neuron. A neuron has a large number of input lines and one output line. Each neuron is capable of implementing a simple function: it takes the weighted sum of its inputs and produces an output that is fed into the next layer. The weight assigned to each input is a variable quantity.
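As a minimal sketch of the neuron just described, the Python below computes the weighted sum of the inputs and turns it into an output. The text does not say which output function is used, so the step threshold here is an assumption, and the names `weighted_sum` and `neuron_output` are hypothetical.

```python
def weighted_sum(inputs, weights):
    """Sum of each input multiplied by the weight on its input line."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input line")
    return sum(x * w for x, w in zip(inputs, weights))


def neuron_output(inputs, weights, threshold=0.5):
    """Assumed step activation: output 1 when the weighted sum reaches threshold."""
    return 1 if weighted_sum(inputs, weights) >= threshold else 0


# Three input lines; the weights are the variable quantities that learning adjusts.
fires = neuron_output([1, 0, 1], [0.5, 0.9, 0.25])
```

Changing the weights changes which input combinations make the neuron fire, which is what learning amounts to in such a network.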

A large number of such interconnected neurons form a neural network. Every input given to the neural network is transmitted over the entire network via direct connections, called synaptic connections, and feedback paths. Thus the signal ripples through the neural network, each time changing the weight values associated with each input of every neuron. These ripples naturally drive the weights toward values that are stable, that is, values that no longer change. At this point the information about the signal is stored as the weight values of the inputs in the neural network.

A neural network geometrizes computation. When we draw the state diagram of a neural network, the network activity burrows a trajectory in this state space. The trajectory begins with a computation problem. The problem specifies initial conditions, which define the beginning of the trajectory in the state space.

In pattern learning, the pattern to be learned defines the initial conditions, whereas in pattern recognition, the pattern to be recognized defines the initial conditions. Most of the trajectory consists of transient behavior or computations. The weights associated with the inputs gradually change to learn new pattern information. The trajectory ends when the system reaches equilibrium. This is the final state of the neural network. If the pattern was meant to be matched, the final neuronal state represents the pattern that is the closest match to the input pattern.
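The idea of a trajectory that ends at an equilibrium representing the closest stored pattern can be illustrated with a small Hopfield-style associative memory. The source does not say the GVPP uses a Hopfield network; it is swapped in here only as a well-known network whose state settles to a stable point, and in this simplified form the weights are fixed during recall while the neuron states evolve. All names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, start, max_sweeps=20):
    """Update neurons one at a time until a full sweep changes nothing."""
    state = list(start)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # equilibrium: the trajectory has ended
            break
    return state


pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([pattern_a, pattern_b])
noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]  # pattern_a with its first element flipped
settled = recall(weights, noisy_a)       # settles back on pattern_a
```

The noisy input plays the role of the initial conditions, and the loop in `recall` is the transient part of the trajectory.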

Input to the electronic eye can come from video, infrared or radar signals. A video signal is composed of a succession of frames, wherein each frame includes a succession of pixels whose assembly forms a space, for example an image for a two-dimensional space. In real time, the chip perceives, recognizes and analyzes both static images and time-varying patterns for specific objects, their heading, speed, shading and color differences. By mimicking the eye and the visual regions of the brain, the GVPP puts together the salient features necessary for recognition. So instead of capturing frames of pixels, the chip identifies objects of interest, determines each object's speed and direction, then follows them by tracking their color through the scene.

The chip emulates the eye, which has 5 million cones sensitive to color and rods [...] 15 times more sensitive than the cones, through two processing steps: tonic and phasic.

Tonic processing auto-scales according to ambient light conditions, enabling the chip to adapt to a range of luminosity. Phasic processing determines movement by using local variables in feedback loops. As edges of light pass over the rods and cones of the eye, these local feedback loops detect contrast changes caused by objects moving through the scene.
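The two steps can be sketched in Python as follows. This is a loose software analogy rather than the chip's circuitry: it assumes integer pixel brightness values, models tonic processing as a min-max stretch to a fixed output range, and models phasic processing as a thresholded difference between consecutive frames. The function names and the threshold value are assumptions.

```python
def tonic_autoscale(pixels, full_scale=255):
    """Stretch brightness so the darkest pixel maps to 0 and the brightest to full_scale."""
    low, high = min(pixels), max(pixels)
    if high == low:  # flat scene: nothing to stretch
        return [0] * len(pixels)
    return [(p - low) * full_scale // (high - low) for p in pixels]


def phasic_changes(previous, current, threshold=32):
    """Flag pixels whose brightness changed by at least threshold between frames."""
    return [abs(c - p) >= threshold for p, c in zip(previous, current)]


# The same scene under dim and bright light scales to the same values.
dim = tonic_autoscale([10, 20, 30])
bright = tonic_autoscale([100, 200, 300])

# An edge moving one pixel to the left shows up as a single changed pixel.
moved = phasic_changes([0, 0, 255, 255], [0, 255, 255, 255])
```

Running the phasic step on tonic-scaled frames is what would make motion detection insensitive to the overall light level in this sketch.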

For detecting smooth contours, rather than sharp contrast changes, the eye adds ocular movement. The eye typically sweeps a scene about two to three times a second, as well as making vibratory movements at about 100 Hz. The faster jitter accounts for the eye's sensitivity to the smallest detectable feature, which is an edge moving between adjacent rods or cones.

After all this processing, the visual signal is sent to the brain for higher-level observation and recognition tasks. Because only detected movements and color, along with the shape and contour of objects, are sent to the brain, rather than raw pixels, the average compression ratio of the information is about 14[...]

[...] associated with the first (leading-edge) and last (trailing-edge) pixel belonging to the object. Each of these silicon neurons is built with RAM, a few registers, an adder and a comparator. Supplied as a 100-pin module, the chip accommodates analogue-input line levels for video input, with an input amplifier with programmable gain auto-scaling the signal.

The modules measure 40 square mm, have 100 pins and can handle 20-MHz video signals. The chip is priced at $960. A card with a socketed GVPP and 64 bytes of Flash RAM comes at $1,500.

DIVIDE AND CONQUER

Since processing in each module on the GVPP runs in parallel out of its own memory space, multiple GVPP chips can be cascaded to expand the number of objects that can be recognized and tracked. When set in master-slave mode, any number of GVPP chips can divide and conquer, for instance, complex stereoscopic vision applications.

SOFTWARE ASPECTS

On the software side, a host operating system running on an external PC communicates with the GVPP's evaluation board via an OS kernel within the on-chip microprocessor. BEV dubs the neural-learning capability of its development environment "programming by seeing and doing" because of its ease of use. The engineer needs no knowledge of the internal workings of the GVPP, the company said, only application-specific domain knowledge.

Programming the GVPP is as simple as setting a few registers and then testing the results to gauge the application's success, said Steve Rowe, BEV's director of research and development. Once debugged, these tiny application programs are loaded directly into the GVPP's internal ROM.

Application programs themselves can use C++, which makes calls to a library of assembly-language algorithms for visual perception and tracking of objects. The system's modular approach permits the developer to create a hierarchy of application building blocks that simplify problems with inheritable software characteristics.

Based on the neural network learning functions, the chip allows application software to be developed quickly through a combination of simple operational commands and immediate feedback. Simple applications such as detecting and tracking a single moving object can be programmed in less than a day. Even complex applications such as detecting a driver falling asleep can be programmed in a month.

The ability of a neural network to learn from data is perhaps its greatest strength. Learning can be described as configuring the network such that applying a set of inputs produces the desired set of outputs. Various methods exist to set the strengths of the connections. One way is to set the synaptic connections explicitly, using a priori knowledge. Another way is to "train" the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.
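Training by teaching patterns can be shown with the classic perceptron learning rule. The source does not name the learning rule used by the GVPP, so the perceptron rule is an assumption chosen for its simplicity, and the function names are hypothetical. Each teaching pattern pairs an input with the desired output, and the weights move only when the neuron's answer is wrong.

```python
def predict(weights, bias, inputs):
    """Single neuron with a step output."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=1):
    """Adjust weights by rate * error * input for every teaching pattern."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Teaching patterns for a logical AND of two inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
```

Setting `weights` and `bias` by hand instead of calling `train_perceptron` would correspond to the first method in the text, configuring the connections explicitly from a priori knowledge.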

RECOGNITION TASK

In applications, each pixel may be described with respect to any of the six domains of information available to it: hue, luminance, saturation, speed, direction of motion and spatial orientation. The GVPP further subcategorizes pixels by ranges, for instance luminance between 10 percent and 65 percent, hue of blue, saturation between 20 and 25 percent, and moving upward in the scene.
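The range test in that example can be written as a simple predicate. The sketch below is illustrative only: the `Pixel` record, the direction labels, and the 200 to 260 degree band used for "blue" are assumptions, since the source gives no numeric definition of hue or direction.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    hue_deg: float         # assumed hue angle, 0-360 degrees
    luminance_pct: float   # 0-100
    saturation_pct: float  # 0-100
    direction: str         # assumed labels: "up", "down", "left", "right", "none"


def in_range(value, low, high):
    """Inclusive range check used for every domain."""
    return low <= value <= high


def matches_example(pixel):
    """Luminance 10-65 %, blue hue, saturation 20-25 %, moving upward."""
    return (
        in_range(pixel.luminance_pct, 10, 65)
        and in_range(pixel.hue_deg, 200, 260)  # assumed band for "blue"
        and in_range(pixel.saturation_pct, 20, 25)
        and pixel.direction == "up"
    )


candidates = [
    Pixel(230, 40, 22, "up"),    # satisfies every range
    Pixel(230, 40, 22, "down"),  # wrong direction
    Pixel(120, 40, 22, "up"),    # green, not blue
]
selected = [p for p in candidates if matches_example(p)]
```

Counting the selected pixels per column and per row would then give the kind of histogram the rest of the processing works on.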

A set of second-level pattern recognition commands permits the GVPP to search for different objects in different parts of the scene, for instance, to look for a closed eyelid only within the rectangle bordered by the corners of the eye. Since some applications may also require multiple levels of recognition, the GVPP has software hooks to pass along the recognition task from level to level.

For instance, to detect when a driver is falling asleep (a capability that could find use in California, which is about to mandate that cars sound an alarm when drowsy drivers begin to nod off), the GVPP is first programmed to detect the driver's head, for which it creates histograms of head movement. The microprocessor reads these histograms to identify the area for the eye.

Then the recognition task passes to the next level, which searches only within the eye-area rectangles. High-speed movement there, normally indicative of blinking, is discounted, but when blinks become slower than a predetermined level, they are interpreted as the driver nodding off, and trigger an alarm.
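The last step, deciding that blinks have become too slow, can be sketched as below. The sketch assumes an earlier stage has already produced one eye-closed flag per video frame; the 30 frames/s rate and the 0.5 s threshold are placeholder values, not figures from the source, and the function names are hypothetical.

```python
def blink_durations(eye_closed, fps):
    """Length in seconds of every run of consecutive eye-closed frames."""
    durations = []
    run = 0
    for closed in eye_closed:
        if closed:
            run += 1
        elif run:
            durations.append(run / fps)
            run = 0
    if run:  # sequence ended while the eye was still closed
        durations.append(run / fps)
    return durations


def is_drowsy(eye_closed, fps=30, slow_blink_s=0.5):
    """Alarm condition: any blink lasting at least slow_blink_s seconds."""
    return any(d >= slow_blink_s for d in blink_durations(eye_closed, fps))


normal = [False] * 10 + [True] * 3 + [False] * 10   # 0.1 s blink
nodding = [False] * 5 + [True] * 20 + [False] * 5   # roughly 0.67 s blink
alarm = is_drowsy(nodding)
```

A fuller version would also track the interval between blinks, which the same run-length loop can supply by measuring the eye-open runs.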

GVPP ARCHITECTURE

The chip houses 23 neural blocks, both temporal and spatial, each consisting of 20 hardware input and output synaptic connections. The GVPP multiplexes this neural hardware with off-chip scratchpad memory to simulate as many as 100,000 synaptic connections per neuron. Each of these synapses can be changed through the on-chip microprocessor, for a combined processing total of over 6.2 billion synaptic connections/second.

In executing up to 20 BIPS to analyze successive frames of a video stream, the temporal neurons identify pixels that have changed over time and generate a 3-bit value indicative of the magnitude of that change. The spatial-processing system analyzes the resulting difference histogram to calculate the speed and direction of the motion.
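A rough software analogue of the temporal step is shown below. The source states only that a 3-bit magnitude is produced; how the difference is quantized is not given, so this sketch assumes 8-bit pixel values and keeps the top three bits of the absolute difference. The function names are hypothetical.

```python
def change_magnitude_3bit(previous, current):
    """Quantize per-pixel change between two frames to a code from 0 to 7."""
    return [min(abs(c - p), 255) >> 5 for p, c in zip(previous, current)]


def difference_histogram(codes):
    """Count how many pixels fall into each of the eight change levels."""
    hist = [0] * 8
    for code in codes:
        hist[code] += 1
    return hist


previous_frame = [0, 100, 200, 255]
current_frame = [0, 132, 100, 0]
codes = change_magnitude_3bit(previous_frame, current_frame)  # [0, 1, 3, 7]
hist = difference_histogram(codes)
```

A histogram concentrated in bin 0 indicates a static scene, while counts in the higher bins mark pixels that changed strongly between frames.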

A histogram is a bar chart of the count of pixels of every tone of gray that occurs in the image. It helps us analyze and, more importantly, correct the contrast of the image. Technically, the histogram maps luminance, which is defined from the way the human eye perceives the brightness of different colors. For example, our eyes are most sensitive to green; we see green as being brighter than we see blue. Luminance weighs the effect of this to indicate the actual perceived brightness of the image pixels due to the color components.
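The sketch below shows one common way to compute such a histogram. The source does not state which luminance weighting is meant; the widely used ITU-R BT.601 luma weights (0.299 red, 0.587 green, 0.114 blue) are assumed here, and the function names are hypothetical.

```python
def luminance(r, g, b):
    """Perceived brightness of an 8-bit RGB pixel using assumed BT.601 weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_histogram(pixels, levels=256):
    """Count pixels at every gray tone, indexed by luminance."""
    hist = [0] * levels
    for r, g, b in pixels:
        hist[min(luminance(r, g, b), levels - 1)] += 1
    return hist


pure_green = luminance(0, 255, 0)
pure_blue = luminance(0, 0, 255)

image = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (0, 255, 0)]
hist = gray_histogram(image)
```

Because green carries the largest weight, pure green lands much higher in the histogram than pure blue, matching the observation that green looks brighter.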



Fig 9.1 Representation of GVPP

MULTIPLE PERCEPTIONS

The vision perception processor chip processes motion images in real time, and is able to perceive and track objects based on combinations of hue, luminance, speed, direction of motion and spatial orientation. The chip has three functions: temporal processing, spatial processing and histogram processing. In applications, the vision processor, called the generic visual perception processor (GVPP), is used with a microprocessor, flash memory and peripherals.

In the case of detecting a driver falling asleep, the processor core detects movement of the eyelids. First, the driver is identified, and then the microprocessor directs the vision processor to search within the corner points of a rectangular area in which the nose of the driver would be expected to be located, based on a model, and to select pixels having characteristics of the shadows of nostrils. The results are calculated to the end of the image frame being analyzed, and the microprocessor analyzes the resulting histograms to determine characteristics indicative of nostrils, such as their spacing and shape.

Then, the microprocessor directs the vision processor to isolate a rectangular area in which the eyes of the driver are expected to be located, based on the model, and to select pixels having characteristics of high-speed movement normally indicative of blinking. The microprocessor analyzes over time the histograms taken of the eye area to determine the duration of each blink and the interval between blinks. From the blink duration and interval information, the microprocessor is able to determine when the driver is falling asleep and triggers an alarm accordingly.

The highly parallel internal design of the vision processor allows it to perform over 20 billion instructions per second (BIPS), and the three internal functions play the following roles. Temporal processing involves the processing of successive frames of an image to smooth the pixels of the image over time, to prevent interference from affecting object detection. Spatial processing involves the processing of pixels within a localized area of the images to determine the speed and direction of movement of each pixel. Receiving the information about the speed and direction of each pixel from the spatial and temporal processors, the histogram processor perceives objects in the images that have undergone significant change over time, and the magnitude of the changes.
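Temporal smoothing of this kind can be illustrated with a first-order recursive filter, where each smoothed pixel moves a fraction of the way toward the new frame's value. The filter form and the `time_constant` parameter are assumptions made for illustration; the source does not specify how the chip smooths pixels.

```python
def smooth_frames(frames, time_constant=4):
    """Recursively smooth each pixel over a sequence of frames.

    A larger time_constant means heavier smoothing and slower response.
    """
    smoothed = [float(p) for p in frames[0]]
    for frame in frames[1:]:
        smoothed = [s + (p - s) / time_constant for s, p in zip(smoothed, frame)]
    return smoothed


# Second pixel has a one-frame spike of interference (100 -> 200 -> 100).
frames = [
    [100, 100],
    [100, 100],
    [100, 200],
    [100, 100],
]
result = smooth_frames(frames)  # [100.0, 118.75]
```

The spike of 100 brightness levels is reduced to under 19 levels in the smoothed output, so a brief flicker is less likely to be mistaken for a moving object.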

The chip is able to detect and track eight objects per frame, meaning 240 objects per second in a 30 frames/s system such as TV or high-definition TV (HDTV). The vision processor includes a total of 23 spatio-temporal neural blocks, each having approximately 20 different input and output connections. Each connection can be controlled through the on-chip microprocessor. The vision processor provides a mechanism to reconfigure the spatio-temporal neural blocks to perform a wide variety of vision processing, such as moving object detection.



Fig 10.1 Representation of histogram calculation unit

ADVANTAGES AND DISADVANTAGES

ADVANTAGES

1. Imitating the human eye's neural networks and the brain, the GVPP can handle some 20 billion instructions per second, compared to the mere millions handled by Pentium-class processors of today.
2. GVPP has been demonstrated as capable of learning-in-place to solve a variety of pattern recognition problems.
3. It has automatic normalization for varying object size, orientation and lighting conditions, and can function in daylight or darkness.
4. It is an inexpensive device that can autonomously perceive and then track up to eight user-specified objects in a video stream.
5. Since processing in each module on the GVPP runs in parallel out of its own memory space, multiple GVPP chips can be cascaded to expand the number of objects that can be recognized and tracked.
6. The engineer needs no knowledge of the internal workings of the GVPP, the company said, only application-specific domain knowledge.
7. Simple applications can be quickly prototyped in a few days, with medium-size applications taking a few weeks and even big applications only a couple of months.
8. The chip could be useful across a wide variety of industries where visual tracking is important.



APPLICATIONS

BEV lists possible applications for the GVPP in process monitoring, quality control and assembly; automotive systems such as intelligent air bags that monitor passenger size, and traffic congestion monitors; pedestrian detection, license plate recognition, electronic toll collection, automatic parking management and automatic inspection; and medical uses including disease identification. The chip could also prove useful in unmanned air vehicles, miniature smart weapons, ground reconnaissance and other military applications, as well as in security access using facial, iris, fingerprint, or height and gait identification.

1. Automotive industry

Here, GVPP-type devices could keep watch and trigger an alarm if the driver were to nod off at the wheel, or monitor the car's progress to make sure it does not get too close to the curb.

Also in transportation, GVPP could be used in developing systems for collision avoidance, automatic cruise control, smart air bag systems, license plate recognition, measurement of traffic flow, electronic toll collection, automatic cargo tracking, parking management and the inspection of cracks in rails and tunnels.

Another automotive application is warning erratic drivers. GVPP does so by monitoring the left and right lanes of the road, signaling the driver when the vehicle deviates from the prescribed lane.

2. Robotics

In manufacturing, GVPP has applications in robotics, particularly for dirty and dangerous jobs such as feeding hot parts to forging presses, cleaning up hazardous waste, and spraying toxic coatings on aircraft parts.



3. Agriculture and fisheries

In agriculture and fisheries engineering, GVPP could help with tasks that have traditionally been labor intensive, such as disease and parasite identification, harvest control, ripeness detection and yield identification.

4. Military applications

The chip can be used in other important pattern-recognition applications, such as military target acquisition and fire control. Military applications include unmanned air vehicles, automatic target detection, trajectory correction, ground reconnaissance and surveillance. Vision systems are useful in home security as well, and GVPP-based systems could be inexpensively developed to detect intruders and fires.

In addition to the above applications, GVPP should be able to work in medical scanners, blood analyzers, cardiac monitoring, bank check processing, bar code reading, seal and signature verification, trademark database indexing, construction of virtual reality environment models, human motion analysis, expression understanding, cloud identification, and many other fields.



PRELIMINARY INFORMATION OF GVPP-7B
Generic Visual Perception Processor

Array format (max): 800H x 600V
Frame rate: 0-100 VGA frames per second, progressive scan
Interface mode: Master/Slave
Data rate (max): 40 megapixels per second
Dynamic range: 10 bits, 3 channels
Parameters: luminance, hue, saturation, motion (orientation, velocity), oriented edges, lines, curves, corners
Computation: 64 STN blocks
Multi-scale possibilities (optional)
Multi-chip connection capabilities
Semi-graphic interface visualization
GUI with mouse
PAL, NTSC, VGA visualization
Internal OS
C language programming
Interfaces: PCI, PS2, I2C, RS232, PWM
Power: 0.3 Watt, 3.3 Volts for PAL/NTSC format
Temperature range: -40 to +85 C
Complete application in one SiP 1[...] x 1[...]: CMOS imager, GVPP, 10 Mbits VRAM, 1[...] Mbits serial flash memory, 27 MHz crystal



FUTURE SCOPE

For the future, scientists are working on a "visual mouse" for hand-gesture interfaces to computers that takes advantage of the chip's high compression ratio. Future studies also involve using this processor as the eye of robots, which opens up a wide range of applications.

CONCLUSIONS

Imitating the human eye's neural networks and the brain, the generic visual perception processor can handle about 20 billion instructions per second, and can manage most tasks performed by the eye. Modeled on the visual perception capabilities of the human brain, the GVPP is a single chip that can detect objects in a motion video signal and then locate and track them in real time far more dependably than competing systems, which cost far more, according to company scientists. In the company's words, this is a generic chip, and more than 100 applications have already been identified in ten or more industries. The chip could be useful across a wide swathe of industries where visual tracking is important.





